DAWs revolutionised the way we work with audio, and now AI tools look set to take us on to the next stage.
Tape recorders are iconic pieces of audio gear that still conjure up feelings of high-tech nostalgia. Maybe it's because they were more than a "black box" with their always-in-motion tape reels and warm sound. They represented something far more tangible than digital recordings on a computer. At the end of a session, you could put the reel of tape on a shelf, knowing that the physical reel and the recording were one and the same thing.
But they had their downsides. Edits meant physically slicing the tape. To mix tracks or change levels, you had to copy from one tape to another via a mixing desk. "Topping and tailing" tracks was easy, but cleaning up spoken word was an utter nightmare, with potentially an edit every few seconds.
Digital Audio Workstations (DAWs) gave us superpowers relative to tape, leapfrogging all of those limitations by light years.
The brilliance of DAWs
DAWs are brilliant for timeline-based editing. You can chop and change things in a DAW that would require a mile of splicing tape with a tape recorder. Underpinning all this usefulness is the idea of non-linear editing (NLE), which stems from the fact that files have a quite separate existence from the medium that contains them.
With a reel of tape (or film), the medium is literally the message. If you wanted, for some reason, to collect all the instances of "the" in a podcast, you could cut out all the segments of tape and put them in a jam jar. It would literally be a jar full of "the".
You can't do that with a digital medium (unless it's digital tape - a hybrid of the old world and the new). With a hard disk or an SD card, if you start slicing it up, it simply won't work.
But the upsides of digital are that you can make perfect copies and do mathematics on the content. That's called Digital Signal Processing (DSP), and it has taken us a long way in the 40 or so years it has existed. So much so that anyone recording in the 1980s would be stunned if they saw what even the simplest DAW could achieve - never mind the music-oriented "giants" such as Logic Pro, Ableton, Cubase, Reason and more. At the same time, dedicated audio editors like Adobe Audition are now precision tools for busy audio professionals.
DSP can do things like EQ, compression (the dynamic-range kind), delay, reverb and even quite effective noise reduction. Techniques like speeding audio up or slowing it down without changing its pitch are now routine, and "Autotune" has spawned entire music genres.
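To give a flavour of how simple the maths behind some of these effects can be, here's a minimal sketch of a feedback delay (echo) in Python. The function name and parameter values are illustrative, not taken from any real DAW or plug-in:

```python
def feedback_delay(samples, delay_samples=4, feedback=0.5, mix=0.5):
    """Apply a simple feedback delay (echo) to a list of float samples.

    delay_samples: echo spacing in samples (real effects use milliseconds)
    feedback:      how much of each echo is fed back in for the next one
    mix:           dry/wet balance of the output
    """
    out = []
    buffer = [0.0] * delay_samples   # circular delay line
    idx = 0
    for s in samples:
        delayed = buffer[idx]
        buffer[idx] = s + delayed * feedback   # feed the echo back in
        idx = (idx + 1) % delay_samples
        out.append(s * (1 - mix) + delayed * mix)
    return out

# A single click (impulse) makes the echo tail easy to see:
# repeats appear every delay_samples samples, each half as loud.
echoes = feedback_delay([1.0] + [0.0] * 11, delay_samples=4,
                        feedback=0.5, mix=0.5)
```

Real delay effects work on thousands of samples per second and interpolate between them, but the core idea - a buffer that feeds a fading copy of the signal back into itself - is exactly this.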
Some things, though, remain stubbornly tricky: not because there isn't enough processing power or there aren't talented enough developers, but because of the laws of physics.
Using conventional DSP, removing room acoustics, unmixing a mix, undistorting a signal, removing clipping, or simply making a microphone sound better is tough. All of that is about to change and, in some cases, has changed already, and that's all due to AI.
AI: taking it to the next level
Why are these things difficult? Because artefacts like clipping and distortion represent information that is missing. In a few simple, idealised cases you can fix clipping - say, when the signal is a predictable waveform like a sine or triangle wave - but those are as rare as a perfectly straight line in nature. In most real-world examples, you can't just fill in the gaps, because doing so means creating new material. And that's something AI is very adept at doing.
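The idealised case mentioned above can actually be written down. Here's a hedged sketch, assuming we already know the sine's frequency (a luxury you never have with real audio): fit amplitude and phase to the surviving, unclipped samples, then redraw the flattened tops from the fit.

```python
import math

def declip_sine(samples, freq, sample_rate, clip_level):
    """Reconstruct a hard-clipped sine of KNOWN frequency (idealised case).

    Least-squares fit of a*sin(wt) + b*cos(wt) using only the unclipped
    samples, then replace every clipped sample with the fitted value.
    """
    w = 2 * math.pi * freq / sample_rate
    saa = sab = sbb = sya = syb = 0.0
    for n, y in enumerate(samples):
        if abs(y) >= clip_level:        # skip the flattened tops
            continue
        s, c = math.sin(w * n), math.cos(w * n)
        saa += s * s; sab += s * c; sbb += c * c
        sya += y * s; syb += y * c
    det = saa * sbb - sab * sab         # solve the 2x2 normal equations
    a = (sya * sbb - syb * sab) / det
    b = (syb * saa - sya * sab) / det
    return [a * math.sin(w * n) + b * math.cos(w * n)
            if abs(y) >= clip_level else y
            for n, y in enumerate(samples)]
```

For a clipped speech or music waveform there is no such formula to fit, which is exactly why this problem needed AI rather than more DSP.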
In precisely the same way that AI can "autofill" - for example, uncropping a photo or changing its aspect ratio and "inventing" new but plausible material to fill in the extra space - AI can do the same with audio. And AI helps in other ways, too.
Creative people are rightly concerned that AI might ultimately take their jobs away. But the other side of the coin is that it can help them do their jobs better, taking away the tedium and drudgery. For example, how many audio professionals actually enjoy taking out all the "um"s and "er"s from a recording? Depending on how "tight" you want the edit to be, there could be hundreds in a 60-minute podcast. But AI can search for these filler words and subtly remove them.
There are already tools that can find over-long gaps between words and subtly shorten silences. This, alone, would be a massive timesaver.
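The silence-shortening part, at least, doesn't need AI at all, and a toy version shows the idea. This is a minimal sketch with made-up threshold values, working sample-by-sample (real tools analyse windows of audio and fade edits in and out rather than cutting hard):

```python
def shorten_silences(samples, threshold=0.02, max_gap=3):
    """Truncate any run of near-silent samples longer than max_gap.

    threshold: amplitude below which a sample counts as silence
    max_gap:   the longest run of silent samples allowed to survive
    (Both values are illustrative, not from any real product.)
    """
    out, quiet_run = [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            if quiet_run > max_gap:   # drop the excess silence
                continue
        else:
            quiet_run = 0
        out.append(s)
    return out

# A five-sample gap between two loud samples is cut down to two samples.
trimmed = shorten_silences([0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6], max_gap=2)
```

The hard part in practice isn't detecting the gaps; it's shortening them so the result still breathes naturally, which is where the AI tools earn their keep.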
Essentially - and this is a vast and somewhat anthropomorphic oversimplification - AI "imagines" what the recording would sound like without noise, distortion or filler words. It can also unmix multiple tracks. Need an isolated vocal track? No problem. And these techniques are getting better all the time.
Some remarkably effective AI audio tools already exist, including this one from Adobe that will help improve your microphone setup to make sure you sound podcast-ready. You can remove room ambience, compensate for a cheap and nasty microphone, and make it sound like you're not too far away from the mic when, in fact, you were.
Nothing is perfect, but if all you've got is a damaged recording, you now have a good chance of rescuing it.
One final thing that AI is very good at is transcribing audio. This has always been challenging, but it's now fast and remarkably reliable. You can even click on a word in the transcript and be taken to the exact part of the recording - another massive timesaver.
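The click-to-seek trick works because modern transcribers emit a timestamp for every word, not just a wall of text. Here's a minimal sketch of what an editor does with that data; the function names and the transcript values are invented for illustration:

```python
def seek_position(transcript, clicked_index):
    """Return the playback position (seconds) for a clicked word.

    transcript is a list of (word, start_time, end_time) tuples, the
    shape of data that word-level transcribers typically produce.
    """
    _word, start, _end = transcript[clicked_index]
    return start

def find_word(transcript, word):
    """Return the start times of every occurrence of a word."""
    return [start for w, start, _end in transcript
            if w.lower() == word.lower()]

# Illustrative data, not real transcriber output.
transcript = [("welcome", 0.00, 0.42), ("to", 0.42, 0.55),
              ("the", 0.55, 0.68), ("show", 0.68, 1.10)]
```

The same word-to-timestamp mapping is what makes text-based editing possible: delete a word from the transcript and the editor deletes the matching slice of audio.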
Breathtaking AI advances like these will make at least some parts of audio professionals' lives much easier. It's early days yet, but it will be exciting to see what these new tools evolve into.