What exactly is GPU computing and what does and doesn't use it?
As we saw back in December in Eyeon's informative video, GPU computing is a very powerful technique that, during the last year or two, has begun to break out of a niche. The original application of graphics cards was, obviously, video games. 3D rendering software such as Max and Lightwave has been using games-oriented graphics hardware to produce approximate previews of the scene for some time. More recently, the world's most popular operating system learned how to draw its user interface using more features on the graphics card, saving the CPU from spending its valuable time working out which window is on top.
What's new is the application of graphics processing units to calculations which are not, at least directly, graphics-related. Projects such as Folding@Home have used GPUs for simulation in medical research, and in the last couple of versions, some postproduction software has begun to apply the same technology to rendering effects. Even video games have followed the curve, and now commonly run physics simulation on the GPU for rigid objects, soft bodies, smoke, and liquid.
This is all good, but it could be better.
To understand why, it's probably worth recapping how modern GPUs work and what they're therefore capable of doing.
Doing a large number of things at once
The fundamental principle at work is parallel computing, the concept of doing a large number of things at once. Computing has traditionally been focussed on doing one – or sometimes four or eight – complicated tasks at once, as quickly as possible. GPUs take the opposite approach, simplifying each processing core to the point where a single graphics card can have literally hundreds of them. Even though a GPU isn't generally clocked as quickly as a CPU, the performance advantage can be huge. The limitation is generally that all of these processing cores must perform the same operation, each on its own piece of the same set of data, which frequently means chunks of an image, although as we've seen with Folding@Home, other things can be done too. Tasks which lend themselves to this sort of repetitive, parallel approach are extremely common in image processing and compositing. Notwithstanding ARM's apparent dedication to more cores in their CPUs, it's becoming increasingly obvious when a particular application is making a CPU do work that's better done on a more parallel architecture.
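To make the "same simple job, many chunks of an image" idea concrete, here is a minimal sketch in Python. It is not GPU code: a handful of CPU threads stand in for the hundreds of GPU cores, and the image is just a short list of 8-bit pixel values. The function names (brighten, process_chunk, split, brighten_parallel) and the brightness amount are made up for illustration. Python threads will not actually make this faster, so read it as a picture of how the work is divided rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=40):
    """The per-pixel job: one simple operation, clamped to 8 bits."""
    return min(255, pixel + amount)


def process_chunk(chunk):
    """Apply the same operation to every pixel in one chunk of the image."""
    return [brighten(p) for p in chunk]


def split(pixels, n_chunks):
    """Cut the flat pixel list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(pixels) // n_chunks))  # ceiling division, never zero
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def brighten_parallel(pixels, n_workers=4):
    """Hand each chunk to a worker; every worker runs identical code."""
    chunks = split(pixels, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_chunk, chunks)  # map keeps chunk order
        return [p for chunk in results for p in chunk]


# A tiny stand-in "image" of eight greyscale pixels (hypothetical data).
image = [0, 10, 100, 200, 230, 250, 255, 128]
serial = [brighten(p) for p in image]    # one pixel after another
parallel = brighten_parallel(image)      # same job, split across workers
print(parallel == serial)
```

The important property is that no pixel's result depends on any other pixel's, so the chunks can be processed in any order, or all at once, and the answer is the same as the one-at-a-time version. That independence is what makes a task a good fit for the hardware described above.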