Haswell, Xeon & Core Demystified - Part Two. Guest author Rakesh Malik looks at the different flavours of Haswell processors, examines where Xeon fits into the pattern, and looks ahead to the introduction next year of Intel's 14nm iterative 'tick' of the Haswell microarchitecture, Broadwell.
The Haswell Variations
There are quite a few variations of Haswell processors, with varying numbers of cores, drive voltages, clock speeds, cache sizes, and GPU options. Several low-voltage parts are designed for tablet computers like the Microsoft Surface; they are limited to two cores, lower clock speeds, smaller caches, and GPUs with fewer execution resources than the higher-end parts. Desktop parts have more cores, higher clock speeds (facilitated by higher drive voltages), bigger caches, and bigger GPUs. Odds are, anyone using a four- or six-core variant will also be using a discrete graphics adapter with its own dedicated frame buffer.
Unless you’re looking at a form factor with particular technical limitations, like a tablet or an ultrabook, the primary factors determining which variant is ideal for you are your target workload and your budget.
That said, in most cases, if your target application is video post production, your best bet will be to maximize the GPU.
What about Xeon?
Xeon has for many years been the brand that Intel assigns to its x86 processors made for servers. While the Xeon family generally lags behind the consumer-oriented Pentium (and now Core) family of processors, the differences are usually pretty small: primarily options like support for multiple sockets, bigger caches, and generally more cores at the expense of lower maximum clock speeds. Games and 3D animation packages need the host processor to run things like enemy AI and inverse kinematics solvers, so gamers and animators depend a lot more on the host CPU than users of applications like DaVinci Resolve, which uses GPUs aggressively. That said, unlike applications with less flexible architectures like Adobe Premiere, Resolve will happily push your CPU almost as hard as it pushes your GPU(s).
Since Xeons are aimed at servers, they generally have more cores, because server applications rarely favor single-threaded performance over throughput. In applications that parallelize well, such as ray tracers and video and image manipulation, bigger caches and more cores can also deliver better overall performance than processors with fewer cores, smaller caches, and higher clock speeds.
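To make the cores-versus-clock-speed trade-off concrete, here is a rough sketch using Amdahl's law. It is only an illustration: the two configurations are hypothetical, not real Intel parts, and the model ignores cache size, memory bandwidth, turbo modes, and everything else that matters in practice. The function name `relative_throughput` and all of the numbers are assumptions made for the example.

```python
def relative_throughput(cores, clock_ghz, parallel_fraction):
    """Estimate throughput with Amdahl's law, scaled by clock speed.

    The result is in arbitrary units; only comparisons between
    configurations are meaningful.
    """
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / cores)
    return clock_ghz * speedup


# Hypothetical parts, not real Intel SKUs.
desktop = {"cores": 4, "clock_ghz": 3.5}
server = {"cores": 8, "clock_ghz": 2.6}

for parallel_fraction in (0.50, 0.95):
    d = relative_throughput(parallel_fraction=parallel_fraction, **desktop)
    s = relative_throughput(parallel_fraction=parallel_fraction, **server)
    winner = "desktop" if d > s else "server"
    print(f"parallel fraction {parallel_fraction:.2f}: "
          f"desktop {d:.2f}, server {s:.2f} -> {winner} part ahead")
```

With these made-up numbers, the faster four-core part comes out ahead when only half of the work can run in parallel, and the slower eight-core part comes out ahead when 95% of it can.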
Xeons also support Error Checking and Correcting (ECC) memory. While this slows memory performance down a bit and adds cost to the memory, it is necessary in a datacenter, where a bit flip that goes uncaught can lead to a transaction error in which a $100.00 transaction turns into a $1,000.00 transaction. In a large-scale scientific simulation, a single bit flip that goes undetected can invalidate an entire simulation run, and in a cluster with 4400 DIMMs, there’s almost certainly a bit error per hour of simulation time. To get around this without ECC, a researcher would have to run each simulation until they got two results that matched exactly. For this reason, Virginia Tech had to replace their entire PowerMac G5 cluster less than a year after installing it: the first version didn’t have ECC support, and the researchers had to sacrifice so much performance to the required redundancy that the cluster wasn’t worth it.
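For a feel of how "checking and correcting" works, here is a toy sketch of a Hamming(7,4) code, which protects four data bits with three check bits and can repair any single flipped bit. This is an illustration of the principle only: real ECC memory works on much wider words (commonly 64 data bits plus 8 check bits) and does it in hardware, and the function names here are made up for the example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome 0 means no error was seen.

    A non-zero syndrome is the 1-based position of the flipped bit,
    which is corrected before the data bits are extracted.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


original = [1, 0, 1, 1]
stored = hamming74_encode(original)
stored[4] ^= 1  # simulate a bit flip at position 5
recovered, position = hamming74_decode(stored)
print(f"recovered {recovered}, corrected bit at position {position}")
```

The extra check bits are where the added memory cost comes from, and computing and verifying them is the source of the small performance penalty.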
In a consumer application like a game, on a gaming system with 4 DIMMs, there might be an off-color pixel, or the dead reckoning algorithms might mispredict an enemy’s location for a few frames until the game engine receives an update from the enemy’s engine. On average, a memory error will cause something along these lines about once per 1000 hours of continuous run time.
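The cluster and gaming-system figures are consistent with each other, as a quick back-of-the-envelope calculation shows. This sketch assumes that "a bit error per hour" in the 4400-DIMM cluster is an average rate, that every DIMM fails at the same rate, and that errors scale linearly with the number of DIMMs; these are simplifications for illustration, not measured data.

```python
CLUSTER_DIMMS = 4400
# Assumption: treat "a bit error per hour" as an average rate.
CLUSTER_ERRORS_PER_HOUR = 1.0

# Assumption: every DIMM contributes equally to the error rate.
PER_DIMM_ERRORS_PER_HOUR = CLUSTER_ERRORS_PER_HOUR / CLUSTER_DIMMS


def mean_hours_between_errors(dimms):
    """Average run time between memory errors for a system with `dimms` DIMMs."""
    return 1.0 / (dimms * PER_DIMM_ERRORS_PER_HOUR)


for label, dimms in (("cluster", CLUSTER_DIMMS), ("gaming system", 4)):
    hours = mean_hours_between_errors(dimms)
    print(f"{label}: {dimms} DIMMs, about {hours:.0f} hours between errors")
```

Under those assumptions, four DIMMs work out to roughly 1100 hours between errors, which is in line with the figure of about 1000 hours.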
For consumer applications, ECC isn’t worth the cost or the relatively small performance penalty.
Intel developed a new type of memory called Fully Buffered DIMMs, which was supposed to improve scalability as well as performance. It worked on a technical level, but it didn’t get much industry acceptance, so its cost remained rather high. Current Xeon models use DDR3.
Which one should I get? Or should I wait for Broadwell?
The answer to the first question depends a lot on what you plan on doing with the computer. If your primary workloads require the fastest CPUs (e.g. games, character animation, ZBrush), you want the fastest clock speed you can get. If your main workload is more GPU-centric, like DaVinci Resolve, then a single Haswell plus several graphics cards (or an nVidia Tesla cluster if your budget allows) might be a better choice. If you’re looking to build a render farm for LightWave, you might be better off with a large cluster of single-processor, quad-core Haswell systems with boatloads of memory and no discrete graphics card. In the end, you’ll have to evaluate your own needs to get a good answer.
The choice about whether or not to wait for Broadwell is a little bit different. If you need a computer now, then obviously no, you shouldn’t wait. If you’re doing fine with your current computing solution, why are you looking to upgrade? Do some work and bank some savings so that you can get a nicer machine when it’s time to upgrade.
As for which model to get, just pick the fastest one you can afford. Where clock speeds are equal, in most cases a Core i5 will be faster than a Core i3, and a Core i7 will be the fastest of the three. The combination of larger caches and HyperThreading adds up to around 20% better performance on average with an i7 than with an i5 when all else is equal.
We’re reaching a point where even a Surface tablet is powerful enough to serve as a workstation for a lot of applications, including video editing. At the same time that host processors are getting powerful enough to make a tablet into a workstation, graphics processors are broadening their reach. What has not changed is that there’s always something new coming.