A mighty 'tock'? Will the just-announced Skylake, Intel's latest processor architecture, be a boon or a bust for editors and video professionals? Rakesh Malik investigates.
Skylake, the successor to Broadwell, is now officially shipping, at least to system integrators. This is the 6th generation of Intel's Core architecture and the second on Intel's 14nm process. The first 14nm processor was Broadwell, whose focus was largely the process shrink, the "tick" part of Intel's tick-tock release cadence. Skylake, the "tock", is a new architecture built on the same 14nm process.
Due to the difficulty of getting the 14nm FinFET process to yield at scale, Broadwell was delayed to the point where its desktop variants were simply skipped. The ultra-low-voltage models have seen moderate success in tablets and ultrabooks, but the higher-performance markets have continued to rely on Haswell, the architecture Intel designed for its 22nm FinFET process.
Broadwell wound up offering very little to high-end customers like gamers and filmmakers. The desktop models didn't even match the performance of their predecessors, let alone outperform them.
Skylake, on the other hand, seems like it will be worth the wait.
First of all, Skylake moves to a new socket and chipset with DDR4 support. That means it won't be possible to upgrade a Haswell or Broadwell system to Skylake, but it also means mainstream Skylake systems get DDR4: faster memory than the current mainstream DDR3, and a maximum capacity of 64GB instead of 32GB.
For now, the architectural details of Skylake remain largely under wraps. Intel has revealed some of the changes and updates in the core, but most of the information it has shared relates to the new GPU. While reducing power consumption is Intel's priority, it has made some enhancements to improve performance as well.
One improvement that affects performance only indirectly is power management. Skylake has multiple independent clock domains, so it can ramp clock speeds up or down per domain based on power consumption and usage. It can ratchet up the clock speed of the GPU's video codec hardware while scaling back the clock speed of its execution units, effectively allocating power where it's needed most.
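Intel hasn't published how Skylake's power controller actually divides its budget, but the idea behind independent clock domains can be sketched with a toy proportional policy. The function and the domain names below are invented for illustration, not Intel's design; the point is simply that power can flow to whichever domain is busiest.

```python
def split_power_budget(budget_watts, demand):
    """Divide a fixed power budget across clock domains by relative demand.

    `demand` maps each domain to a unitless load estimate. This is a toy
    proportional policy, not Intel's actual controller; it only shows what
    independent clock domains make possible: the busiest block gets the
    headroom instead of the budget being split evenly.
    """
    total = sum(demand.values())
    if total == 0:
        return {domain: 0.0 for domain in demand}
    return {domain: budget_watts * load / total
            for domain, load in demand.items()}

# 4K playback: the video codec block is busy while the GPU's general-purpose
# execution units sit nearly idle, so the codec domain receives most of the power.
shares = split_power_budget(15.0, {"cores": 2, "codec": 6, "gpu_eus": 1, "ring": 1})
```

With the illustrative loads above, the codec domain receives 9 of the 15 watts; flip the loads and the execution units would get that share instead.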
On the performance side, Skylake can issue more instructions per cycle than Broadwell and can track more instructions in flight at once, enabling it to make better use of its execution resources.
Prefetching is where a processor predicts what data a running process will need next and fetches it before it's asked for. When the prediction is correct, it saves time because the software doesn't have to wait for the data to load; when it's wrong, the effort is wasted, and so is the power, since the software ends up requesting the data it actually needs anyway. Many modern processors prefetch aggressively, because it reduces fetch latency and because power consumption has historically been a secondary consideration next to performance.
Some workloads, like processing a stream of video, are very predictable, and prefetch works extremely well with them. Others, like ray tracing and Monte Carlo photon mapping, are much harder to predict, so prefetching helps far less. Skylake still prefetches aggressively, but it now detects how predictable the data pattern is and adapts. With less predictable access patterns, it reduces prefetch and, therefore, power draw on the memory subsystem, freeing that power for use elsewhere.
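Intel hasn't disclosed how Skylake's prefetch throttling works internally, but the feedback loop it implies - predict the next address, score the prediction, back off when the stream proves random - can be sketched in a few lines. The class, thresholds and scoring window below are illustrative inventions, not Intel's hardware:

```python
class AdaptivePrefetcher:
    """Toy stride prefetcher that throttles itself when accuracy drops.

    Real hardware prefetchers are far more elaborate; this sketch only
    shows the adaptive loop: predict the next address from the last
    stride, score the prediction, and stop prefetching (saving power)
    when the access pattern proves unpredictable.
    """

    def __init__(self, threshold=0.5, window=8):
        self.last_addr = None
        self.stride = 0
        self.history = []           # 1 = correct prediction, 0 = miss
        self.threshold = threshold  # minimum hit rate to keep prefetching
        self.window = window        # how many recent accesses to score

    def access(self, addr):
        """Record a memory access; return the prefetched address, or None."""
        if self.last_addr is not None:
            predicted = self.last_addr + self.stride
            self.history.append(1 if addr == predicted else 0)
            self.history = self.history[-self.window:]
            self.stride = addr - self.last_addr
        self.last_addr = addr
        if self.history and sum(self.history) / len(self.history) < self.threshold:
            return None             # pattern too random: save the power
        return addr + self.stride   # predictable: fetch ahead

pf = AdaptivePrefetcher()
# A streaming pattern (like reading through a video frame) keeps prefetch on:
stream = [pf.access(a) for a in range(0, 640, 64)]
# A scattered pattern (like ray tracing) drives the hit rate down, and the
# prefetcher backs off, returning None instead of wasting memory bandwidth:
scattered = [pf.access(a) for a in [7, 900, 13, 5000, 42, 8191]]
```

After the sequential run the prefetcher is confidently fetching the next 64-byte stride; a few scattered accesses later, its hit rate falls below the threshold and it stops issuing prefetches entirely.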
The ring bus that the four cores use to communicate with each other and with the rest of the system (GPU, memory, IO, etc.) is faster now, as are some instructions. The ring can scale down to reduce power draw, or deliver higher throughput at the same power.
The cache hierarchy has seen some interesting improvements. The 3rd-level cache is larger, now 2MB per core, and what used to be the 4th-level cache is now a more general-purpose cache, available to the GPU and to devices on the PCI Express bus. This cache also ensures coherence (making sure that all of the cores always see current data), which matters in parallel applications, where one core might read data that another core has updated but hasn't yet written out to main memory. The 4th-level cache, now called a Memory Side Cache, sees all of these writes, so it can maintain a coherent picture of memory. The Memory Side Cache will be available in 64 and 128MB configurations.
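Intel hasn't detailed the Memory Side Cache's protocol, but its key property - it sits in front of main memory, so every agent's writes funnel through it and every agent's reads see the freshest copy - can be modelled with a toy in a few lines. The class, its write-through policy and the agent names are inventions for illustration only:

```python
class MemorySideCache:
    """Toy model of a cache that sits in front of main memory.

    Because every memory write from every agent (CPU core, GPU, PCIe
    device) passes through this one cache, a later read by any other
    agent is automatically served the most recent data, without extra
    coherence traffic. That positional property is what the article
    describes; the interface and write-through policy here are invented.
    """

    def __init__(self, memory):
        self.memory = memory   # backing store: address -> value
        self.lines = {}        # cached copies: address -> value

    def write(self, agent, addr, value):
        # Write-through for simplicity: update cache and memory together.
        self.lines[addr] = value
        self.memory[addr] = value

    def read(self, agent, addr):
        # Serve from cache if present, otherwise fill from memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

ram = {0x1000: 0}
msc = MemorySideCache(ram)
msc.write("cpu_core_0", 0x1000, 42)  # a core updates a buffer word
latest = msc.read("gpu", 0x1000)     # the GPU reads through the same cache
```

The GPU's read returns the core's freshly written value because there is only one path to memory and the cache is on it, which is why a memory-side cache stays coherent by construction.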
Skylake's GPU has quite a few improvements. It has support for DirectX 12, OpenGL 4.4 and OpenCL 2.0, potentially enabling thin-and-light laptops and tablets, such as the Surface Pro, to run software like Resolve, at least in models that include the eDRAM memory-side cache. How well they'll perform is uncertain at the moment, but hopefully they'll find their way into the wild soon so we can find out.
Skylake adds support for HDMI 2.0 via Thunderbolt 3 and a DisplayPort-to-HDMI 2.0 adapter. The hardware codec gains support for H.265, as well as Motion JPEG encoding and decoding. The GPU can power down its general-purpose execution units while the hardware codec is running, allowing for 4K video playback with very low power consumption.
Where Haswell and Broadwell supported up to 48 GPU execution units, Skylake raises the limit to 72, and the execution units have been upgraded to support wider SIMD instructions. The new units can execute packed instructions on 32-bit integer and float values, as well as on 8- and 16-bit values. This is potentially a big help for color grading software, since most color suites use 32-bit floats in their color pipelines. The execution units can also switch between tasks more quickly than before, and take advantage of the higher data throughput offered by the updated ring bus and system memory.
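The reason wider 32-bit float SIMD matters to grading software is that a grade is the same few float operations repeated for every pixel. The scalar Python below is only a sketch of that per-pixel math (a basic lift/gain/gamma adjustment with made-up coefficients, not any real application's pipeline); hardware that packs 32-bit floats runs this identical arithmetic on many pixels per instruction.

```python
def grade_pixel(r, g, b, lift=0.02, gain=1.1, gamma=1.8):
    """Apply a simple lift/gain/gamma adjustment to one RGB pixel.

    Values are 32-bit-float-style math in the 0.0-1.0 range, and the
    coefficients are arbitrary examples, not taken from any real grade.
    Because every pixel gets the same arithmetic, a SIMD unit that packs
    32-bit floats can process several pixels with each instruction.
    """
    def channel(x):
        x = min(max(x * gain + lift, 0.0), 1.0)  # gain, lift, then clamp
        return x ** (1.0 / gamma)                # gamma correction
    return (channel(r), channel(g), channel(b))

# The same function maps over every pixel in a frame, a perfectly
# data-parallel workload:
frame = [(0.1, 0.2, 0.3), (0.8, 0.7, 0.6)]
graded = [grade_pixel(*px) for px in frame]
```

Scale the two-pixel "frame" up to the eight million pixels of a 4K image and the appeal of doing the float math several lanes at a time is obvious.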
The jury is still out on Skylake, but overall it's promising. The overclocker-friendly models are shipping now and, hopefully, eDRAM-equipped models will follow soon. Smaller, lower-power variants aimed at devices such as tablets and phones are coming as well.
Skylake is designed to scale from a thermal design power of five watts all the way up to 95 watts. That by itself is quite an achievement. The big question for RedShark News readers, however, is whether or not it's worth the wait for video editing, color grading, compositing and VFX.
At present, the answer seems to be a guarded "yes" given the improvements, but until Skylake systems are out in the wild where end users can test them, not much is certain.
Intel is clearly striving to make inroads into the discrete GPU market. While it's unlikely that software such as Resolve will run well on Skylake models without eDRAM, it's entirely possible that Resolve 12 will run acceptably on a Skylake model with eDRAM and, with 72 execution units, Skylake may even let Intel claim some market share from Nvidia and AMD at the lower end of the GPU market. It certainly won't be competitive with the top-end GPUs, but since software like Resolve Studio can use two GPUs, it might still offer some performance benefit to go along with its reduced power consumption.
The largest growth market in computing right now is gaming, which probably explains why the first Skylake models are overclocker-oriented. Laptop and mainstream models will follow soon, along with Xeon models and models with 72 GPU execution units and 64 or 128MB of eDRAM. When those start appearing in systems, we'll get a much better picture of how big an improvement Skylake really is.
Mobile products based on Skylake will be very interesting indeed, especially higher-end tablets like the Surface Pro and thin-and-light laptops like the MacBook Air. While it won't compete with high-end discrete GPUs like the Titan, Skylake's new GPU will probably let Intel push further into the lower end of the discrete GPU market, possibly even competing with AMD's APUs and their integrated Radeon GPUs.