Professional audio seems like a done deal. We solved the hard stuff decades ago. Video has lagged because the challenges are much bigger. But the two disciplines are related. So what can we learn from this?
For most of us, our first encounter with digital audio was the Compact Disc. By today's standards, early CD players could have been better. It wasn't the format's fault but that of primitive (and cheap) digital-to-analogue converters. These devices, which we rarely think about as consumers, became the weakest link in the audio chain. It didn't matter so much at the time because the novelty of a non-linear playback system, with luxurious digital silence between tracks, gave the impression of better sound than was objectively there.
Modern converters are much better and can be stratospherically good if you're prepared to pay. If you care about quality, you might consider that nothing less than 24-bit, 192kHz sampling will do. If you don't care, you'll be happy with compressed audio from Spotify. But the reality is that old-fashioned CDs can reproduce a vast dynamic range with a frequency spectrum that's more than enough for most of us. Whoever dreamed up the specifications - actually, it was Philips and Sony - did an excellent job.
When it comes to recording, there are different priorities. 16 bits gives you about 96dB of dynamic range: that's more than enough for listening at home. But digital audio can be unforgiving if you get the levels wrong, which is tricky to avoid in a live recording session. So for recording, it's best to choose 24 bits. The extra headroom means you can still capture without damage if there's an unexpected peak. Creating a 16-bit "windowed" version is relatively easy once the recording's finished. Some audio gear can record in 32-bit. That's probably enough dynamic range to record the Big Bang without going into clipping. No individual analogue-to-digital converter can go anywhere near that dynamic range, but by stacking multiple converters, with their input gains offset, it's possible to record pretty much anything without thinking about levels until post-production.
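For anyone who wants to check the arithmetic behind these figures, here is a minimal sketch. It assumes plain fixed-point (integer) samples, where the theoretical dynamic range is 20 x log10(2^bits), or roughly 6dB per bit. The function name dynamic_range_db is made up for this illustration, and real converters fall short of these ideal numbers.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an ideal fixed-point format, in dB.

    Assumption: integer samples, so the ratio between the largest and
    smallest representable step is 2**bits.
    """
    return 20 * math.log10(2 ** bits)


# Compare the bit depths mentioned in the text.
for bits in (16, 24, 32):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

The 16-bit result is where the familiar "96dB" comes from. Each extra bit adds about 6dB, so 24 bits gives roughly 48dB more room for unexpected peaks.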
We don't need extreme sample rates that can capture the small talk of a fruit bat. But more detailed recordings mean smaller artefacts relative to the wanted audio.
Base spec? Job done!
Looking back at the last few decades, I think we can agree we have achieved the base level of professional audio performance. It's no longer a challenge to record and reproduce high-quality digital audio, in multiple tracks, in a variety of physical, file and streaming formats. Arguably, we reached this stage twenty years ago (to pluck an imprecise figure out of the air).
Since then, the audio industry hasn't stood still. Instead, the emphasis moved from recording to processing. Digital reverbs are now outstanding, with "convolution" based algorithms able to "sample" the impulse response of, say, a cathedral and apply that same response to successive audio samples to make it sound like you're actually in that reverberant space.
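To make the idea of convolution a little more concrete, here is a toy sketch in plain Python. The impulse response is three made-up numbers rather than a measured cathedral, and the function name convolve is just for this illustration. Real reverbs work with impulse responses that are many thousands of samples long and typically use FFT-based methods for speed, so treat this as the principle only.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two sample lists.

    Every input sample triggers a scaled copy of the impulse response,
    and all of those overlapping copies are summed into the output.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


# Hypothetical data: a "dry" signal with two clicks, and a tiny decaying
# impulse response standing in for a sampled room.
dry = [1.0, 0.0, 0.0, 0.5]
room = [1.0, 0.6, 0.3]

wet = convolve(dry, room)
print(wet)
```

Each click in the dry signal comes out followed by its own decaying tail, which is what happens to every sample of a recording when it passes through a convolution reverb.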
Techniques like auto-tune can fix pitching errors or be employed as a creative effect. Cher's brilliant Believe, with its "life after love" hook, was a stunning early example of this.
As with all corrective processing, the more accomplished it gets, the less you notice it. For example, iZotope's audio toolkit can "fix" a bad recording, or optimise an already good one, in ways that used to be impossible.
And, as is inevitable in any discussion about the future of content creation, AI comes into it: it is about to play an increasing role in audio. It's almost pointless to make predictions at this point, but expect to see it become the dominant technology in recording, music production, and film and TV audio.
But what about video? Is video "done"? Where can it go next?
We've solved this...for now
I think it's reasonable to say that video is "done" for many, if not most, purposes - as long as you add the qualification "for now". So let me explain what I mean by this.
Today, we can capture cinema-quality video at high frame rates, high dynamic range, and staggeringly high resolution - 8K and beyond. 8K still has a role, even in the living room. About three years ago, Samsung loaned me one of their high-end 65-inch 8K TVs. At the time, it was almost pointless because there was very little 8K material around. But it taught me that, while 4K is more than adequate for most people, if you sat close enough to the screen you could imagine something better. And then, seeing authentic 8K material, effectively straight from the camera, in studios and at trade shows, I had little doubt that it was a worthwhile improvement over 4K. As ever, I have to qualify this by saying you need (obviously!) good or well-corrected eyesight, a big screen, and to be sufficiently discerning to notice the difference. As with very high audio sample rates, the exceptionally high spatial resolution of 8K makes any intrinsic artefacts smaller in comparison to the perceived image, as well as revealing details in texture and micro-shadows.
I'd say that our ability to record and reproduce video is, in the above strict sense, "done". We can, of course, imagine improvements, but unless we move outside our cone of current normality, any advances will be hard to spot.
I should clarify that I don't mean there's no point in improving cameras, lenses, or sensors. Such gains will always be welcome, both cumulatively and for the sheer flexibility and adaptability they give filmmakers. But the "medium" itself - our ability to record excellent video - is good enough for now.
Finally, I have to add that I don't feel at all confident about this claim! I want to say it, but I wonder if I'm missing something. So please let me know what you think, and I will eagerly listen.
Beyond this, there's a whole new world of immersive video, the metaverse, virtual reality and every other direction that the future might take us. And again, some, if not all, of this will be driven by AI.
In the next article, I'll begin to explain how this is likely to pan out.