Why the arrival of GPU-based audio is an inflection point that David Shapton reckons could be as pivotal as when MIDI rewrote the rulebook in the early 1980s.
Audio on a GPU? Wait, what?
GPUs are for graphics, aren't they? Not exclusively. Not any more. It's now possible to process audio on a GPU. It's hard to describe how big a breakthrough this is.
Putting it in context...
Digital audio is a mature discipline. Work on Pulse Code Modulation dates back to the late 1930s, and the principles of sampling and digital audio processing were established in the decades following. Work on the Compact Disc started in the late 1970s, and by the end of the 1990s, digital audio was an established part of the recording scene. That decade was like a Cambrian period for digital audio, with extraordinary - and sometimes weird - digital audio editing systems materialising seemingly out of nowhere. Most of them didn't survive, but those that did (such as Pro Tools - originally Digidesign Sound Tools) are now household names.
Some of the earliest commercial digital audio products used discrete logic chips rather than a central processor. A digital audio mixing console would consist of dozens of logic boards, each about the size of a coffee table. The first dedicated DSP (Digital Signal Processing) chips arrived in the mid-1980s but weren't meant for digital audio. Instead, they were found in devices like echo-cancelling modems, where they processed the modem signals to remove the line reflections that could lower the bitrate.
Some people (including me!) found a way to use these DSP chips to build audio mixing desks. This was a revolutionary approach, and better chips dedicated to digital audio came along quickly, helping to establish the digital domain as the default for professional work.
Computers couldn't process audio
Computers at the time - based around Intel 80286, 80386 and 80486 processors - couldn't process audio without the help of plug-in DSP cards. But around the late 1990s, something changed. An audio editing application called Cool Edit Pro proved that you could work with digital audio on a PC without assistance from dedicated audio hardware (apart from a sound card). Suddenly, those big-iron digital audio hardware products seemed like overkill. The following decade saw digital audio fully integrated with computers and even built into the operating system. Digital audio workstations ceased to be furniture and became applications.
Today's powerful PCs and Macs can handle dozens of audio tracks, as well as virtual instruments, effects and powerful virtual mixing consoles. It is incredible to see how far digital audio has come in just a few decades. But there are limits.
I've mentioned before that while digital audio as a format and a medium is a done deal and has been for around twenty years, audio processing still has a long way to go. Specifically, if there were more real-time processing power available, then digital audio could enter a new golden age.
There's nothing to stop you from processing digital audio on a central heating controller. But it wouldn't be real-time. It might even take a month. Here's the thing about digital audio: for it to take place smoothly and without glitches, you have to be able to run an entire software program in between each sample. Typically, that would mean in 1/48,000th of a second, or roughly 21 microseconds.
And every time you add a track or a channel, add a VST plug-in, mix multiple channels, or output to a multi-channel format, you're stacking tasks that must be completed within that tiny allowance.
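To put rough numbers on that allowance, here is a minimal Python sketch. It takes the simplified per-sample view used above (real systems typically process samples in blocks), and it assumes, purely for illustration, that every stacked task gets an equal share of the time between samples. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # the sample rate quoted above


def per_sample_budget_us(sample_rate_hz: int) -> float:
    """Time available between two consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


def budget_per_task_us(sample_rate_hz: int, num_tasks: int) -> float:
    """Equal share of the per-sample budget when num_tasks must all fit in it.

    Equal sharing is an illustrative assumption; real plug-ins and mixers
    vary widely in cost.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    return per_sample_budget_us(sample_rate_hz) / num_tasks


if __name__ == "__main__":
    print(f"Budget per sample: {per_sample_budget_us(SAMPLE_RATE_HZ):.2f} us")
    for tasks in (1, 10, 100):
        share = budget_per_task_us(SAMPLE_RATE_HZ, tasks)
        print(f"{tasks:>3} stacked tasks -> {share:.3f} us each")
```

The point is simply that the window is fixed by the sample rate, so every extra track or plug-in shrinks the slice available to all the others.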
Modern audio software is incredibly accomplished. The fact that it mostly runs on a CPU is an extraordinary engineering feat, even more so when you realise that, apart from a few DSP-like tweaks, CPUs aren't designed for audio.
Enter GPU Audio
GPU Audio is a company whose name very neatly describes what it does. Its sole focus is making it possible for GPUs to process audio. And yes, it's a surprising move because the "G" in GPU stands for Graphics.
But GPUs are also extraordinarily powerful processors that can be repurposed for other tasks. This, in itself, is not new. For some years now, GPUs have been accelerating Machine Learning and AI. GPUs aren't general-purpose processors, but they can help with a lot of things.
Ironically, when computers are running Digital Audio Workstations, the GPU sits mostly idle. With nothing to do apart from drawing a mostly static user interface, that's understandable, but it's an almost criminal waste of real-time processing potential.
Processing audio on a GPU is not straightforward, though. The biggest obstacle is that processing video, or creating 3D assets and landscapes, is a mostly parallel task, and GPUs are designed for massive parallelism. To massively over-simplify, each pixel (or indeed each element of a 3D object, called a "vertex") could potentially have its own dedicated GPU core. That doesn't help audio in an obvious way, because samples have to be output sequentially. But the profound differences between parallel and sequential get blurred when you realise that GPUs are so blazingly fast they can carry out individual tasks on an audio stream and reassemble the parts within the space of an audio sample.
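The split-and-reassemble idea can be sketched in a few lines of Python. This is only an illustration of the general principle, not a description of how GPU Audio's software works: CPU threads stand in for GPU cores, and the function names, the gain task and the one-pole filter are all assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_gain(chunk, gain):
    """Parallel-friendly work: each output sample depends only on its input."""
    return [sample * gain for sample in chunk]


def split(block, num_chunks):
    """Cut a block of samples into up to num_chunks contiguous pieces."""
    if not block:
        return []
    size = -(-len(block) // num_chunks)  # ceiling division
    return [block[i:i + size] for i in range(0, len(block), size)]


def process_parallel(block, gain, num_workers=4):
    """Process chunks independently, then reassemble them in original order."""
    chunks = split(block, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so the stream stays intact.
        processed = pool.map(apply_gain, chunks, [gain] * len(chunks))
        return [sample for chunk in processed for sample in chunk]


def one_pole_lowpass(block, alpha):
    """Sequential work: each output depends on the previous output."""
    out, prev = [], 0.0
    for sample in block:
        prev = prev + alpha * (sample - prev)
        out.append(prev)
    return out


if __name__ == "__main__":
    block = [0.0, 0.25, 0.5, -0.5, 1.0]
    print(process_parallel(block, gain=0.5))
    print(one_pole_lowpass([1.0, 1.0], alpha=0.5))
```

The gain stage can be cut up freely because no sample needs to know about its neighbours. The filter cannot be split so naively, since each output feeds the next, which is the kind of dependency that makes audio awkward on parallel hardware.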
None of this gets even close to describing the work that GPU Audio is doing. The company is rewriting the lowest level of software that runs on a GPU so that other audio software can run on it. That essentially re-maps the processing resources on a GPU to be meaningful to audio software.
So far, the company has released some "proof of concept" audio software and has convinced audio companies across the fields of music production and gaming that it is on to something. The hope is that we will see new standards emerge that make GPU audio accessible to developers.
Two key advantages
There are two key advantages in being able to process audio on a GPU.
Low latency
Latency is the enemy of real-time processing because you add a delay every time you process audio. A delay of 20 ms is about the maximum you can tolerate if you're playing a virtual instrument on a keyboard. More than that, and you won't be able to play in time.
Latency is always in the minds of audio developers. It's a hard limit they're constantly battling against. Without it, we'd have better and more powerful audio software.
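As a rough illustration of how those delays add up, the sketch below totals the delay of a chain of processing stages and compares it with the 20 ms figure. It assumes, for simplicity, that each stage delays the signal by one full buffer of samples at 48 kHz. The buffer sizes and function names are hypothetical, and real signal chains are more varied than this.

```python
MAX_PLAYABLE_LATENCY_MS = 20.0  # the tolerance figure quoted above


def buffer_latency_ms(buffer_frames, sample_rate_hz=48_000):
    """Delay introduced by holding one buffer of samples, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def total_latency_ms(stage_buffers, sample_rate_hz=48_000):
    """Sum of the delays of every stage (one buffer per stage is assumed)."""
    return sum(buffer_latency_ms(frames, sample_rate_hz) for frames in stage_buffers)


def is_playable(stage_buffers, sample_rate_hz=48_000):
    """True if the whole chain stays within the 20 ms tolerance."""
    return total_latency_ms(stage_buffers, sample_rate_hz) <= MAX_PLAYABLE_LATENCY_MS


if __name__ == "__main__":
    for chain in ([256, 256, 256], [256, 256, 256, 256]):
        total = total_latency_ms(chain)
        print(f"{len(chain)} stages: {total:.2f} ms, playable: {is_playable(chain)}")
```

Under these assumptions, three 256-sample stages come to 16 ms, while a fourth pushes the chain past the limit, which is why developers treat every added stage as a cost.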
Almost unlimited processing power
GPUs are so massively powerful - and that power is scalable - that they have the potential to provide seemingly limitless processing power for audio software. It's genuinely transformative. It's entirely reasonable to imagine that GPU audio could be as pivotal a moment as the arrival of MIDI in the early '80s. Why was MIDI so important? Because it spawned whole new industries. It meant you could control synthesisers with computers, leading to the inexorable rise of DAWs.
So far, all of this is incredibly abstract. So here's a concrete example of how GPU audio could help.
Approximately nine years ago, Roland released a new kind of software instrument modelling based not merely on the instrument's overall sound but on the individual behaviour of its internal components. The synthesiser company called it "ACB", or Analogue Circuit Behaviour. This is how Roland put it at the time:
"ACB is drastically different from conventional methods of modelling, and reproduces each analog component by thoroughly analysing each detail of the original design drawings. By combining the analysed components in exactly the same manner as the original analog components, detailed characteristics of the original musical instruments emerge and can be reproduced completely."
It's a fantastic idea in theory, and in practice, but it has one major drawback: it's a processor hog. Most modern computers can run dozens of plug-ins, but just one instance of an ACB-based synth soaks up way too much processing power, bringing almost any DAW to its knees if combined with other tasks. This method of emulation makes some of the best virtual instruments ever heard: so good that it's worth working around the drawbacks. But what if there simply weren't drawbacks?
I don't know if GPU Audio has spoken to Roland about this. But you can see where this is leading. If techniques like Roland's ACB could harness the processing power of even a small GPU (and almost every computer now has one), then we could see even more complex and convincing instrument emulations in the future - and we could have as many as we like running simultaneously on our DAWs.
It's early days yet. I have spoken briefly to GPU Audio (the company) and was struck by its focus on making this a usable technology. It's doing the hard stuff (and it's very hard) by writing extremely low-level code for GPUs to be able to "understand" audio. It wants to promote standards and hopes that audio companies will join it in creating a new-generation platform for audio. And while the music creators among us will look forward to the day when we have almost limitless power for our virtual studios, it's not just music that will benefit. There are huge implications across the whole of leisure and entertainment, for multi-directional sound and even for AI-generated audio. Remember, audio, wherever it came from, always needs processing at some point.
It's hard to overestimate the significance of GPU audio. It's a remarkable development because there are no downsides. And it's even good for sustainability because it uses a resource you already have on your computer.