Replay: In the future, video will be neither digital, nor analogue. It will exist in a third domain.
The early signs are there.
I was writing a piece recently trying to explain how the “digital world” and the “analogue world” co-exist, one as a virtual overlay on the other, when it dawned on me that there’s no such thing as the “analogue world”. There is, in fact, just “the world”.
I’m pretty sure that, until we started “going digital”, not a single person throughout history had ever referred to the reality that surrounds them as “the analogue world”. Our world is only analogue relative to the digital world. It’s analogue by virtue of not being digital.
Despite all that, I think it is useful to consider our actual reality-container as analogue.
One of the most obvious features of a digital world is that it is quantised. That means that a continuously changing phenomenon like a blue sky is broken up into separate but adjacent colours, and each is given a number. The more steps you have for a given range of values the better it will look (or sound), but, ultimately, in a digital world, even smoothly changing quantities are given discrete values.
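To make quantisation concrete, here is a minimal Python sketch that treats a sky gradient as a smooth ramp from 0.0 to 1.0 and snaps it to a fixed number of evenly spaced levels. The function names and the 0-to-1 range are illustrative assumptions, not any particular image format. The worst-case error is half a step, so it shrinks as the number of levels grows.

```python
def quantise(value, levels):
    """Snap a value in [0.0, 1.0] to the nearest of `levels` evenly spaced steps."""
    step = 1.0 / (levels - 1)
    return round(value / step) * step


def max_error(levels, samples=1000):
    """Worst-case gap between a smooth ramp and its quantised version."""
    worst = 0.0
    for i in range(samples + 1):
        v = i / samples  # a smoothly changing quantity, like a sky gradient
        worst = max(worst, abs(v - quantise(v, levels)))
    return worst


for bits in (2, 4, 8, 10):
    levels = 2 ** bits
    print(f"{bits:2d} bits, {levels:4d} levels: max error {max_error(levels):.5f}")
```

At 8 bits (256 levels) the worst error is under 0.2% of the full range, and 10 bits cuts that by roughly a further factor of four.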
It’s easy to show mathematically that if you have enough numbers - or steps - to describe a scene or a sound, then there will be no way for the human ear or eye to distinguish analogue from digital. Perhaps the biggest difference is that the digital version will consist of a set of numerical values. Once you have these values, you can store them, do maths on them, make perfect copies of them and share them over a network.
There are huge advantages to “digital”. Done well, digital can be better than analogue in every measurable way. Plenty of people would disagree with that, and their gripe would probably centre on the meaning of the word “better”. It’s easy to demonstrate that high-end digital recordings are more accurate than analogue ones. But are they nicer? Do they give a more pleasurable experience? Certainly, playing a vinyl record or watching a spectacularly well-shot movie captured on film can be wonderful - but it won’t be because of accuracy.
Arguments about analogue vs digital always seem to come back to the idea that analogue phenomena are continuous whereas digital versions are quantised, or stepped. It’s possible to prove that, with enough resolution in the digital recording, that’s never going to be a problem. I think complaining about it is along the same lines as complaining that your glass tabletop feels rough because it’s made of atoms. Even the smoothest material looks rough if you zoom in far enough - and you still won’t see the atoms!
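A standard way to make the “enough steps” argument concrete is the signal-to-quantisation-noise ratio (SQNR) of an ideal N-bit quantiser driven by a full-scale sine wave. This is textbook audio arithmetic rather than anything specific to this article, and it ignores the imperfections of real converters:

```latex
\mathrm{SQNR} \approx 6.02\,N + 1.76\ \text{dB}, \qquad 6.02 \approx 20\log_{10} 2
```

Each extra bit halves the step size and buys about 6 dB. For 16 bits that gives about 98 dB; for 24 bits, about 146 dB.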
The third domain
Wouldn’t it be great to be able to describe the analogue world with all the robustness of digital but without the quantisation? It would be, and we can, and we have been doing it for decades. Look no further than a page description language called PostScript. Launched by Adobe in 1984, it soon caught the attention of Steve Jobs, who ambitiously incorporated it into Apple’s LaserWriter, quickly spawning the Desktop Publishing revolution. PostScript was also an important antecedent of the PDF format.
Its relevance here is that it broke the pixel mould. Instead of representing typefaces as bitmaps (arrays of dots), PostScript stored them as mathematical descriptions. A diagonal line is a diagonal line. It’s a concept, stored as a mathematical expression, which for the letter I might be as simple as “start a vertical line this thick here and finish it here”, where the thickness and the start and end points are the specifications of the character. An O would be, simply, “draw a circle” - with the qualification that typefaces are way more sophisticated than that and that an O is rarely a true circle. But that doesn’t matter. What matters is that you can characterise any shape with an expression.
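PostScript’s `curveto` operator describes outlines as cubic Bézier curves, so a quarter of an “O” can be stored as just four points. The Python sketch below is a toy, not real font code: the names are hypothetical, and the constant 4/3(√2 − 1) is the usual approximation for fitting a circular arc with one cubic curve. The same description can be evaluated at 4 points or 400; nothing in it fixes a resolution.

```python
import math

# Usual constant for approximating a quarter circle with a single cubic Bézier.
K = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)  # about 0.5523


def cubic_bezier(p0, p1, p2, p3, t):
    """Point on the cubic Bézier curve defined by p0..p3 at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


# Hypothetical outline data: one quarter of an "O" (unit radius) as four control points.
QUARTER_ARC = ((1.0, 0.0), (1.0, K), (K, 1.0), (0.0, 1.0))


def sample(curve, n):
    """Evaluate the same description at n + 1 points; n is chosen by the output device."""
    return [cubic_bezier(*curve, t=i / n) for i in range(n + 1)]


coarse = sample(QUARTER_ARC, 4)   # e.g. a low-resolution screen
fine = sample(QUARTER_ARC, 400)   # e.g. a high-resolution printer
```

Every sampled point lies within a fraction of a percent of the unit circle, whichever density you ask for.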
These PostScript characters exist in the third domain. They can be expressed and stored digitally but without quantisation, and they are reproduced at the resolution of the output device. If you had an output device with infinite resolution, so would the characters it prints or displays.
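The other half of the idea is that the description only turns into dots at the last moment. This hypothetical Python sketch keeps an “O” as a centre and two radii (resolution-free numbers) and samples it onto whatever grid the output device offers; the radii and the text-art output are arbitrary choices for illustration.

```python
import math


def rasterise_ring(size, outer=0.45, inner=0.30):
    """Render an "O" described by two radii (fractions of the glyph box, centred)
    onto a size x size grid of '#' (ink) and '.' (paper).

    The description never changes; only the sampling grid does."""
    rows = []
    for j in range(size):
        row = []
        for i in range(size):
            # Sample at the pixel centre, in glyph-box units centred on (0, 0).
            x = (i + 0.5) / size - 0.5
            y = (j + 0.5) / size - 0.5
            r = math.hypot(x, y)
            row.append("#" if inner <= r <= outer else ".")
        rows.append("".join(row))
    return rows


for line in rasterise_ring(12):
    print(line)
```

`rasterise_ring(12)` gives a blocky O; `rasterise_ring(256)` gives a smooth one from exactly the same description.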
These so-called vector descriptions take us a long way, but they’re problematic on both a practical and a mathematical level. The practical problem is that, to turn camera footage into vectors, you first have to detect outlines in a bitmap, and the algorithms that do this are imprecise. Worst of all, they tend to produce different results for adjacent frames. Outlines would appear to be “fizzing” as the imprecisions come through as a form of noise. Far from looking better, the result would be much worse. Bear this in mind while we look at the next stage.
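The “fizzing” problem shows up even in one dimension. This Python sketch is a deliberately simple stand-in for outline detection (a threshold crossing with sub-pixel interpolation, not any real tracing algorithm): it locates the same soft edge in ten frames that differ only by a little sensor noise. The noise level and edge shape are made-up parameters.

```python
import random


def soft_edge(width, edge_pos, ramp=4.0):
    """Scanline that is dark (0) left of edge_pos and bright (1) right of it,
    with a linear ramp `ramp` pixels wide, like a slightly blurred camera image."""
    return [min(1.0, max(0.0, (x - edge_pos) / ramp + 0.5)) for x in range(width)]


def noisy(scanline, sigma, rng):
    """Add Gaussian sensor noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in scanline]


def find_edge(scanline, threshold=0.5):
    """Toy outline detector: first upward threshold crossing, interpolated to sub-pixel."""
    for i in range(1, len(scanline)):
        a, b = scanline[i - 1], scanline[i]
        if a < threshold <= b:
            return (i - 1) + (threshold - a) / (b - a)
    return None  # no edge found


rng = random.Random(42)          # fixed seed so the run is repeatable
clean = soft_edge(40, 20.3)      # the "true" edge sits at 20.3 pixels
positions = [find_edge(noisy(clean, 0.05, rng)) for _ in range(10)]  # ten "frames"
print(find_edge(clean))
print([round(p, 2) for p in positions])
```

Without noise the edge is found at 20.3 pixels; with noise, each frame reports a slightly different position, which in a traced outline would show up as shimmering.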
AI thinks in vectors
On the face of it, there’s a lot wrong with that heading. AI doesn’t “think” (yet), and when I’m talking about AI I use “vector” in a different sense from the PostScript vectors.
Machine learning - the bedrock process of most AI - is getting very good at generating photorealistic images of things that have never existed. Remarkably, this also applies to images of people: not anyone that you or I know, but imaginary people synthesised from data about hundreds of thousands of faces. It works surprisingly well, but occasionally you see artefacts that give it away - an extra tooth, or strange-looking ear lobes. Mostly, though, it’s convincing. Three or four years on from when this particular technique was introduced (it’s a type of Generative Adversarial Network, or GAN), it has improved and become more controllable, but the original demonstration at www.thispersondoesnotexist.com is still pretty amazing.
This kind of “AI visualisation”, which has parallels with how your own deep memories work, is going to define the future of video, because AI thinks not in geometrical vectors but in conceptual ones.
Essentially, the AI “tends towards” patterns that it recognises, along pathways that it has built up during training, where it will have seen enough examples of real-world data to be able to offer plausible outputs for a given input.
Curiously, we don’t know from the outside what these conceptual vectors are. There’s no dashboard for you to tweak the size of someone’s nose or their eyebrow density. That mystery is gradually unravelling, though, and progress in this field is extraordinarily rapid.
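In machine-learning terms these conceptual vectors are usually called latent vectors: lists of numbers that a trained generator turns into an image. The sketch below leaves the generator out (it would be a large trained network) and shows only the vectors themselves, plus spherical interpolation (slerp), a technique often used to move smoothly between two points in a GAN’s latent space. The 8-dimensional size and the random vectors are hypothetical.

```python
import math
import random


def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b, t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))  # angle between a and b
    if omega < 1e-6:  # (nearly) the same direction: plain linear interpolation is fine
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    return [(math.sin((1.0 - t) * omega) * x + math.sin(t * omega) * y) / s
            for x, y in zip(a, b)]


DIM = 8                      # hypothetical; real face generators use hundreds of dimensions
rng = random.Random(0)
face_a = [rng.gauss(0.0, 1.0) for _ in range(DIM)]   # an unlabelled point in latent space
face_b = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
# No entry is "nose size" or "eyebrow density": each number is an unnamed direction.
path = [slerp(face_a, face_b, i / 4) for i in range(5)]
```

Nothing in `face_a` is labelled; with a real generator, the intermediate vectors in `path` would typically decode to faces part-way between the two, but which number controls what is not known in advance.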
By the way - if you’re having trouble visualising this process, try to recall what happens when you photocopy an image multiple times. Little specks or undulations in the image density tend to get accentuated. Dark spots get darker and light spots get lighter. Even artefacts that have nothing to do with the image get more prominent and, eventually, a pattern emerges that can look like a leopard skin or some other mottled natural phenomenon.
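The photocopier effect is easy to reproduce as a toy model. In this Python sketch each “copy” slightly blurs a row of brightness values and then boosts contrast around mid-grey; the blur weights and the gain of 1.3 are arbitrary assumptions chosen only to show the feedback loop, not to model any real copier.

```python
import math


def photocopy(row, gain=1.3):
    """One hypothetical copier generation on brightness values (0.0 black, 1.0 white):
    a slight blur, then a contrast boost around mid-grey, clipped to what paper
    and toner can show."""
    n = len(row)
    blurred = [
        0.25 * row[max(i - 1, 0)] + 0.5 * row[i] + 0.25 * row[min(i + 1, n - 1)]
        for i in range(n)
    ]
    return [min(1.0, max(0.0, 0.5 + gain * (v - 0.5))) for v in blurred]


# A nearly uniform mid-grey strip with faint undulations (a few percent at most).
page = [0.5 + 0.01 * math.sin(2 * math.pi * i / 20) + 0.005 * math.sin(2 * math.pi * i / 8)
        for i in range(40)]

copies = [page]
for _ in range(30):
    copies.append(photocopy(copies[-1]))

final = copies[-1]
print("original: ", " ".join(f"{v:.2f}" for v in page[:20]))
print("30th copy:", " ".join(f"{v:.2f}" for v in final[:20]))
```

After 30 generations the faint undulations have grown into solid light and dark bands, even though no new information was added.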
This is the same sort of pattern of reinforcement that is the basis of machine learning. In the case of the photocopier, the process doesn’t add any information. With AI, the emerging patterns are based on guidance from the real world and - you could say - represent the probability that a given input will be correctly guessed or represented in the AI version.
Since this is more about probability than pixels, it is resolution-independent. It’s also frame-rate-independent: you’ll be able to output at any frame rate.
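Here is a loose illustration of what resolution and frame-rate independence mean. In this Python sketch the “scene” is a function of position and time rather than a stack of stored frames; a real AI codec would use a learned model where this uses a hand-written formula, so treat the function and its moving disc as purely hypothetical. The renderer then samples it at whatever size and frame rate is requested.

```python
import math


def scene(x, y, t):
    """Hypothetical continuous scene: a bright disc drifting left to right.

    x and y are screen positions in [0, 1), t is time in seconds.
    Nothing here is a pixel or a frame; a real AI codec would put a
    learned model where this hand-written formula is."""
    cx = 0.2 + 0.6 * (t % 1.0)   # the disc crosses the screen once a second
    cy = 0.5
    return 1.0 if math.hypot(x - cx, y - cy) < 0.1 else 0.0


def render(width, height, fps, seconds):
    """Sample the scene at whatever resolution and frame rate the display asks for."""
    frames = []
    for f in range(int(fps * seconds)):
        t = f / fps
        frame = [[scene((i + 0.5) / width, (j + 0.5) / height, t) for i in range(width)]
                 for j in range(height)]
        frames.append(frame)
    return frames


cinema = render(32, 18, fps=24, seconds=1)   # 24 fps, low resolution
sports = render(64, 36, fps=60, seconds=1)   # 60 fps, four times the pixels
```

The same `scene` yields 24 frames at 32×18 or 60 frames at 64×36; neither output format is stored anywhere in the description.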
This will happen - maybe sooner than you think
I’ve been thinking about vector video for about twenty years. When I first talked about the idea to video encoding experts (the first time I mentioned it was to a video codec guru at JVC’s headquarters in London), they were understandably sceptical. The response I mostly got was that we would need a thousand-fold increase in processing power for this to happen. That seemed like a lot at the time. But Apple has recently revealed that its new iPad Pro is 1,500 times more powerful than the original iPad, released about ten years ago. Add the advances in AI, and the work that Nvidia is doing in AI-assisted video compression, and I think we can reasonably say we have, in fact, had an overall million-fold increase in capability since the turn of the century.
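One way to make the arithmetic behind that estimate explicit (the split between hardware and AI here is an illustrative reading, not a measured figure): a million-fold gain is a thousand-fold gain twice over, and the iPad figure on its own already exceeds the thousand-fold threshold the codec experts asked for.

```latex
10^{6} = 10^{3} \times 10^{3}, \qquad \frac{10^{6}}{1500} \approx 667
```

If hardware accounts for roughly 1,500×, the remaining factor of about 670× would have to come from better algorithms, including AI-assisted compression.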
A lot of what you’ve just read might sound ludicrously theoretical and out of touch with the real world. I understand that. But when you factor in the extraordinary rate of change - and the fact that there are instances of this already happening - I think it is not only likely, but almost inevitable that we will have a high quality AI-based codec within a few years. And from that point on, there won’t be any need to resort to pixels.