Phil Rhodes, our Technical Editor, may incur the wrath of audiophiles and engineers with the idea that "the audio equipment world has it fairly easy," but he draws intriguing parallels between audio and camera equipment development, which may be instructive for manufacturers.
Do audio engineers feel like they're living in a world where equipment is suddenly the easy part? Probably not, to be honest; most technical people can find a reason to suck air through their teeth and offer a headshake at the performance of some piece of gear. Putting aside the world-weary curmudgeonliness that we expect from the engineering staff, though, it's hard to avoid the fact that audio equipment is now fairly transparent. This is a carefully chosen term and one to use guardedly, of course, because otherwise someone will write four hundred words in the comments explaining in excruciating detail how terrible everything sounds since the popularisation of a randomly-selected technique or device. But overwhelmingly it's true: modern microphones, preamplifiers, analogue-to-digital conversion equipment and recording systems are good enough that, at the very least, industry quality-control checks are fairly easy to pass from a technical point of view. It is rare at the professional level, I think, for someone to fail QC on a noisy mic preamp.
So, yes, the bandwidth and precision of the human auditory system are certainly less than those of a 192kHz, 24-bit audio signal, even once it's been through a few stages of processing and manipulation, even taking into account the vanishingly minute noise of modern recording electronics. It's possible to buy things which fulfil these requirements for relatively small amounts of money. This brings to mind the concept of a post-scarcity society, as widely discussed in science fiction, wherein advancements in technology, sociology and politics have created a world where nobody need suffer any disadvantage through material want. The Star Trek canon is possibly the best-known example; the crew of the Enterprise do not get paid, because they don't need to be.
There is an argument that, at least from an equipment point of view, the audio world lives in a post-scarcity society of this sort. Equipment adequate (if not ideal) for practically any purpose is affordable to practically any person. Needless to say, this is not intended to imply that location sound recording, music mastering or mixing and editing are anything other than highly skilled professions or that we don't need really good microphone amplifiers to satisfy 24-bit recordings; it's just that they are, increasingly, highly skilled professions that don't rely on highly expensive equipment. Watercolour paints and brushes aren't particularly hard to come by, but that's no reflection on Turner.
What this gives the audio department is flexibility, both artistically and technically. Sound designers can now apply a huge degree of processing to audio without causing unacceptable technical problems and they can do so using tools that afford a lot of convenience. Sometimes, the extreme technical flexibility of modern audio equipment can become a complexity in itself; often, the digital audio workstation software and the input-output hardware will both offer extensive patch and routing options, which can create complex situations in which things do not always behave intuitively.
Freedom to sell
This happens because designers are under pressure to come up with features that will sell products. It's possible to add enormous feature sets because the workload associated with transporting and manipulating audio data is, literally, more than an order of magnitude less than that of video. A really good eight-channel, 192kHz, 24-bit audio signal, such as would be used to handle a 7.1-channel surround sound track, represents about four and a half megabytes per second of data. Double that, if you like, to include the audio description track for people with sight problems, foreign languages, perhaps a stereo downmix, and so on. The twenty-four-frame, ten-bit HD video signal that might accompany that audio represents nearly 178 megabytes per second, and we're already pushing for quad HD, at four times that. Video is so gigantic that we still routinely compress it, and video compression is actually a huge disadvantage from every perspective (including processing power, latency and image quality) other than that it makes storage easier. Audio is now only compressed in distribution; there was barely a time during the existence of digital audio as a working tool, MiniDisc notwithstanding, when it was really necessary to compress audio at acquisition.
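For anyone who wants to check the sums, the figures above can be reproduced with a few lines of Python. This is only a rough sketch under assumptions the article doesn't spell out: a 1920x1080 frame, three full-resolution ten-bit components per pixel (4:4:4), no blanking or container overhead, and "megabytes" taken in the binary sense for the display figures. The function names are purely illustrative.

```python
# Uncompressed data rates for the audio and video formats discussed above.
# Assumed, not stated in the article: 1920x1080 frame, 4:4:4 sampling
# (three full-resolution components per pixel), no blanking or file overhead.

MIB = 1024 * 1024  # binary megabyte, used only for the printed figures


def audio_bytes_per_second(channels, sample_rate_hz, bits_per_sample):
    """Raw PCM data rate in bytes per second."""
    return channels * sample_rate_hz * bits_per_sample // 8


def video_bytes_per_second(width, height, components, bits_per_component, fps):
    """Raw picture data rate in bytes per second."""
    bits_per_frame = width * height * components * bits_per_component
    return bits_per_frame * fps // 8


audio = audio_bytes_per_second(channels=8, sample_rate_hz=192_000, bits_per_sample=24)
video = video_bytes_per_second(width=1920, height=1080, components=3,
                               bits_per_component=10, fps=24)

print(f"audio: {audio / MIB:.2f} MiB/s")
print(f"video: {video / MIB:.2f} MiB/s")
print(f"video is about {video / audio:.1f} times the audio rate")
```

Under those assumptions the video stream works out at roughly forty times the size of the audio, which is where the gap in processing workload comes from. Chroma subsampling or a different frame size would change the video figure, so treat the result as a ballpark.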
All of this goes to reinforce the idea that the audio equipment world has it fairly easy. Good work can be done on inexpensive workstations. The flash cards don't cost thousands. What we want – and this is really the subject of this article – is this sort of freedom in camerawork, and as at 4:09pm on May 21 2015, we don't have it. We're getting there; fifteen-stop cameras with considerably beyond-HD resolution are being talked about, even if those numbers are frequently a little optimistic. Companies such as Atomos and Blackmagic have made recording, previously a six-figure proposition involving HDCAM-SR tape decks, extremely affordable. Say what you like about 8K, though; neither cameras, nor recorders, nor displays are capable of acquiring, recording or displaying pictures which will fool the human eye into believing that the scene is real. A movie screen never looks like a window into an alternate reality, stereoscopy be damned. Conversely, audio can, with care, fool the ear into believing that a person is actually in the room, speaking. More than that, the audio department now has the flexibility to do that and still have a big sack of signal quality left over to perform significant processing and alteration without perceptible degradation in performance.
The manufacturing industry and the primary science which supports its development work are both clearly reaching for these things. Since we can build almost arbitrarily powerful recording systems, most of the problem is in acquisition. The most rarefied, expensive and artistically controversial parts of the audio chain are often the microphone, the preamplifier and, to some extent, the analogue-to-digital converter. People who follow developments in camera sensors will likely recognise that the equivalents of all three are part of the sensor in a CMOS camera. Sound and vision are analogue phenomena and recording them requires analogue transducers which will never achieve mathematically absolute performance. The question is whether we can expect current techniques to provide cameras with the same sort of excess performance available to sound people.
Frankly, there seems to have been something of a logjam recently, with an appetite for increased resolution reducing photosite sizes to the point where gains in sensitivity and noise-floor are more or less offset. The physics does what it does, although advances such as the SiLM multi-layer sensor technology developed by Lumiense can offer significant advantages here and it may be this sort of thing that allows more than the incremental improvements we've seen over the last few years.
What's really important to realise here is that completely transparent imaging technology, while desirable, doesn't represent any kind of zenith in the artistry of filmmaking. We've said a thousand times that modern cameras are very good, so good that they are all, in their own way, capable of equalling the highest-priced toys of yesteryear which made some very fine movies. The opening theme to Skyfall does not make it sound as if Adele is standing in the room singing and isn't intended to; the point is that making it sound the way it does was greatly facilitated by the fact that at any level (and certainly at the level of mastering a James Bond theme) nobody is struggling for really excellent recordings. One day, the camera will be the same.