The future of video is much more than 4K or 8K; much more than pixels, and more than screens. The future of video is here, in this article.
We talk a lot about the future at RedShark. These are exciting times to live in. People alive now will probably see the point where machines become more intelligent than us, although they will probably not outlive the debate about what exactly that means.
We always try to be objective - i.e. sticking to the facts - and we also like to be positive. These are difficult times for many, but they’re hopeful as well, because the future is taking shape as we watch. We don’t have to wait decades for new and radical technology to come along: we just have to check our RSS feeds or Engadget in the morning to see what direction the Earth’s tilted on its axis overnight.
Something big happened just this week, when Facebook announced that it was buying Oculus Rift, a small but clever Virtual Reality company that’s clearly on to something when it can attract the attention - and the billions of dollars - of a social media company. A 21-year-old became a billionaire: another day, another billion.
(Is Virtual Reality exciting to video makers? It certainly is! Read on to avoid being left in any doubt whatsoever!)
The standard or “received” vision of the future of video is that it’s going to be high resolution: 4K at least, probably 8K eventually and possibly more. There may be higher framerates and possibly (another!) revival of stereoscopic 3D, especially if “glasses-free” displays become good enough.
Yes, this may happen, and we may even get enough bandwidth eventually to make it work, but there is another way. And that’s what we’re going to talk about today.
There is another way
Now, when you read this, you should bear in mind that this is a long way off, and we don't even know what we mean by "a long way off"! The trouble with exponential progress is that if you’re out by a year or two then you can also be out by a factor of ten or a hundred. It’s also jerky: sometimes progress in one area has to wait for another area to catch up.
We mentioned in our article on Vector Video (our most popular ever, with over 100,000 page impressions!) that we didn’t think resolution and framerate would ultimately be issues, because we would move to a system that was independent of them - in the same way that PostScript stores images and documents independently of the pixel density of either the image file or the output device, the printer. The massive positive is that your output resolution can be anything: in the case of video, virtually any pixel density and virtually any framerate. This doesn’t mean that resolution isn’t important: it’s more important than ever that the video is captured - however it is captured - well.
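The PostScript analogy above can be sketched in a few lines of code. In this toy example (all names are illustrative, not from any real system), a scene is stored as a shape description in normalised coordinates rather than as pixels, so the same description can be rasterised at any output resolution:

```python
# A minimal sketch of resolution-independent ("vector") imagery, in the
# spirit of PostScript: the scene is stored as a shape description, not
# pixels, and is rasterised only when we know the output resolution.

def render_circle(cx, cy, r, width, height):
    """Rasterise a circle described in normalised [0, 1] coordinates.

    Because the description is resolution-independent, the same circle
    works for any width/height pixel grid.
    """
    image = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Map this pixel's centre back into normalised scene space.
            u = (x + 0.5) / width
            v = (y + 0.5) / height
            if (u - cx) ** 2 + (v - cy) ** 2 <= r ** 2:
                image[y][x] = 1
    return image

# One scene description, rasterised at two different resolutions:
low = render_circle(0.5, 0.5, 0.25, 8, 8)
high = render_circle(0.5, 0.5, 0.25, 64, 64)
```

The point is that the output grid size is chosen at display time, just as a PostScript document prints equally well on a 300dpi or 1200dpi printer.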
There were strong positive and negative reactions to this article, with some truth in most of them. The bottom line is that the piece was highly speculative when we wrote it: it certainly wasn’t saying that vector video technology is here and now, just that it’s coming.
One way of understanding vector video is to think about computer generated animation, where 3D models are created and then placed in a scene, lit and animated. Because the objects in the film are virtual recreations of real things - visible from any angle - it’s pretty easy to animate camera moves as well as the characters. You can think of vector video as being a CGI model of the image in front of the camera.
But there’s a snag. Someone has to make the models. This is a big deal.
It’s big, because it’s hard work. It’s like building a real model. Have you seen what happens when broadcasters try to provide a real-time subtitling service? The surprise is that it’s possible at all, but there are so many mistakes! It would be much worse than that if someone had to make a model of a scene in real time. In fact, you’d probably need something like a thousand times real time to make it work.
So, right now, you can’t just point a camera at a scene and derive a detailed, high resolution 3D model from it.
You can do it now
Except that, very nearly, you can.
Progress towards this is being made on two important fronts.
First, you have depth-sensing cameras. These come in many forms. In a sense, just about every camera with an autofocus system has depth sensing, but what we need for this is the ability to sense the depth of every pixel!
And we’re getting there. The original Kinect was able to provide low-ish resolution depth maps. The detail was actually pretty low but easily enough to detect basic movements and the shapes of objects in front of it: and it worked in real time. Some developers even used this to create 3D models, although not very detailed.
Kinect 2 is out now and is far more detailed - though not 4K detailed. But that’s OK because, in a way, depth information is a bit like colour in relation to luminance (i.e. brightness) values: it doesn’t need to be as detailed as your brightness data, although there are times when extra detail might help - like if you’re trying to track the end of a cat’s whisker.
These systems will continue to get better, and the lack of depth detail on a per-pixel basis will eventually be overcome by the cleverness of the algorithms that interpolate between the depth values, smoothing out any bumps and discrepancies, and doing the best possible job in the absence of pixel-by-pixel depth information.
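To make the idea concrete, here is a deliberately simple sketch of how a coarse depth map might be stretched to match a higher-resolution image. Real depth upsamplers use far cleverer, edge-aware filters; plain bilinear interpolation is shown purely to illustrate the principle, and the function name is our own invention:

```python
# A toy sketch: a low-resolution depth map (Kinect-style) is upsampled
# to match a higher-resolution image by interpolating between the
# sparse depth samples.

def upsample_depth(depth, out_w, out_h):
    """Bilinearly interpolate a small 2D depth map up to out_w x out_h."""
    in_h, in_w = len(depth), len(depth[0])
    result = []
    for y in range(out_h):
        # Position of this output pixel in input-map coordinates.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            # Blend the four surrounding depth samples.
            top = depth[y0][x0] * (1 - tx) + depth[y0][x1] * tx
            bot = depth[y1][x0] * (1 - tx) + depth[y1][x1] * tx
            row.append(top * (1 - ty) + bot * ty)
        result.append(row)
    return result

# A 2x2 depth map (in metres) stretched to 5x5: the corner samples are
# preserved and the in-between values are smoothly interpolated.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = upsample_depth(coarse, 5, 5)
```

This is the cat's-whisker problem in miniature: the interpolated values are smooth but invented, which is why real systems also consult the full-resolution colour image when filling the gaps.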
The second way to create real-time 3D models is to use multiple cameras, and powerful software to create new angles of view in the gaps between them. In doing this, at least some sort of 3D model has to be created.
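The simplest possible caricature of this "new angles between cameras" idea is a position-weighted blend of two views. Real view-synthesis systems build an actual 3D model and re-project it; the blend below (our own illustrative code, not any shipping algorithm) only shows the notion of a virtual viewpoint parameterised between two real ones:

```python
# A deliberately naive sketch of view interpolation between two cameras:
# a "virtual" viewpoint between the real ones is approximated by
# blending their images with position-dependent weights.

def interpolate_views(left, right, t):
    """Blend two same-sized grayscale images; t=0 gives left, t=1 gives right."""
    return [
        [(1 - t) * l + t * r for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]

# Two tiny 2x2 "camera images" and a viewpoint halfway between them:
cam_a = [[0.0, 0.0], [0.0, 0.0]]
cam_b = [[1.0, 1.0], [1.0, 1.0]]
midpoint = interpolate_views(cam_a, cam_b, 0.5)
```

A straight pixel blend produces ghosting on real footage, which is exactly why, as the paragraph above says, some sort of 3D model has to be created to do this properly.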
3D scanning can create amazingly detailed images that can be rotated in all directions.
We’ve covered virtually all these techniques in RedShark. But until now, we haven’t connected them together. So why are we making this leap of imagination now?
Critical to the future of digital video
It’s not because of an overnight scientific breakthrough. It’s because of some business news that we’ve already reported on. (If you’re not into social media, stay with us as we talk about it over the next few paragraphs, because it’s crucial to the bit about the future of video!)
Facebook bought Oculus Rift this week. It’s a big deal, not just because it involves $ billions, but because Facebook might be playing a long game, the outcome of which is exactly what we’ve been talking about.
Of course, Facebook might have bought Oculus Rift because Mark Zuckerberg likes the technology. He might have bought the company in the same way that a mere multi-millionaire (as opposed to multi-billionaire) might have bought a custom-built sports car: just to have fun with and as a bit of a status symbol.
But while this is possible, it’s not likely, because Facebook is a publicly traded company. The shareholders would probably not approve of company purchases as exercises in vanity.
No, what’s more likely is that there’s a bigger vision here.
Right now, that vision is probably not clear. There are too many variables and contingencies, but what there is of it does - of course - include virtual reality.
Don’t be distracted here by the thought of having to wear ungainly virtual reality helmets! These are a transitional technology until high resolution retinal projection comes into common use. This is where an image is projected directly onto your retina as opposed to you viewing it on a screen in front of you. At best, this can superimpose an image on top of what you’re already seeing, but all you need to do is mask your eyes and you’ll see 100% of what is being projected.
Social media’s current platform is the web and mobile apps, which pretty much duplicate the web experience. There’s nothing very immersive about it. It’s a flat experience. What Facebook is trying to do is identify the next big platform. At this stage, it’s anybody’s guess. Mobile is clearly a platform in itself but it is no more immersive.
Meanwhile, more in the gaming domain than social networks, products like Sony Home on the Playstation 3 (but not the Playstation 4!) have introduced Virtual Worlds to large populations of gamers, with the ability to walk around and talk to other people in a shared environment. You can even buy virtual goods like clothes, apartments and yachts. More recently, there’s Avakin (full disclosure: the two owners of Avakin are part of my immediate family), which is the first 3D world to embrace mobile and be available across all platforms (Android, OS X, the web, etc.).
This type of thing is instantly compatible with Virtual Reality: the difference being that you would be able to walk around amongst your virtual friends as opposed to waving to them on a screen.
So there’s at least one connection between VR and social media. But what about video? What if we could watch video in VR instead of on a screen?
Walk around a movie set
Wouldn’t it be great if we could just walk around a movie set, walking up to the actors and watching them from all angles?
Well, probably not, because that’s not what film making is about. It’s more about giving you a carefully crafted (and controlled!) experience that’s going to fall apart if you walk onto the set. Don’t forget that most movie sets aren’t what they seem: if you walk behind them you’ll see that they’re just hoardings.
But sometimes, VR is exactly what you want. It can take you to places you’ve never seen and give you an immersive experience. And it can give you experiences you’ve never had - experiences nobody has even thought of before, because until now they weren’t possible!
And maybe VR is ultimately going to be the way we were really supposed to see 3D in films: not as a stereoscopic image, where there is only ever going to be one valid viewpoint, but as a real 3D stage, where we’re looking at objects, not pixels.
And to get there, we will have solved the problem of how to produce vector video. We will no longer be slaves to pixel resolution or frame rates, although we will still have to capture images as well as we possibly can.
I just want to be clear about this: at some point you have to capture pixels, whether it’s conventional ones that describe the visual image, 3D pixels that have a depth value, or pixels from multiple camera viewpoints. Then you have to create a moving 3D “scene” and in doing so you will create a description of how the elements within the scene change over time. Then, to actually see these images, you’ll need to turn them back into pixels, but you’ll do so at the resolution of the device.
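The last step of that pipeline - turning the scene description back into pixels at the resolution of the device - can be sketched with a single pinhole-camera projection. This is a hedged illustration, not a real renderer (which would also handle occlusion, shading and time), and every name in it is our own:

```python
# Sketch of re-rasterising a 3D scene at whatever resolution the
# display device happens to have: the same scene point is projected
# onto differently sized pixel grids.

def project_point(point, focal, width, height):
    """Project a 3D point (x, y, z in camera space) onto a pixel grid."""
    x, y, z = point
    # Perspective divide: camera-space coordinates to the normalised
    # image plane.
    u = focal * x / z
    v = focal * y / z
    # Map the normalised coordinates to this device's pixel grid, so
    # the same scene yields sensible pixels at any output resolution.
    px = int((u + 0.5) * width)
    py = int((v + 0.5) * height)
    return px, py

# The same scene point rendered for an HD display and an 8K display:
p = (0.25, 0.0, 1.0)          # a point one metre in front of the camera
hd = project_point(p, 1.0, 1920, 1080)
uhd8k = project_point(p, 1.0, 7680, 4320)
```

The scene description never changes; only the final rasterisation step knows, or cares, how many pixels the display has.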
Talking to the brain
But ultimately, we might not need to output our video as pixels, because at some point we’ll talk to the brain directly. I realise that’s quite a statement. Here’s what I mean:
When we see with our eyes, we don’t use pixels at all. Pixels are a digital phenomenon, but our perception is not quantized in conventional digital terms.
Instead, we recognise objects - at a high level and a low level simultaneously. Here’s the crucial bit: it is actually our brain (or, to be quite specific, our brain’s “reality engine”) that makes whatever we see look high resolution. When we “look” at our memories, they don’t look fuzzy or faded: they’re as clear as the room I’m sitting in now. This is not to say they’re accurate - and that’s the point! Think about your old school playing field, if you had one. Now, mentally zoom in until you can see an individual blade of grass. And now go in even closer until that blade of grass fills your field of “vision”. Can you see all the little imperfections on it? And is it still sharp?
What’s going on here is that you’re not accessing an exabyte of bitmap data stored in your brain. Instead, you’re looking up what your idea of a single blade of grass should look like and you’re mentally sprinkling a magic ingredient called “reality” onto it. Essentially, you’re flagging it as “real” and “accurate”, even if it’s not!
Does that final part of the previous-but-one paragraph remind you of anything? Have you ever been hypnotised? I have, although it never seems to work well on me. But what should happen in hypnosis is that your sense of reality is diverted to something other than what is happening around you in your current environment. It’s a very powerful phenomenon. It’s related to what happens when we get absorbed in a film. We get caught up in the action and completely transfer our sense of reality to the movie - even if it’s a fairy story or a cartoon.
Eventually, we’ll be able to “transfer” the reality-generating part of our cognition to a 3D virtual video world. We will be inside the video to the same extent as we’re inside our dreams.
And when we do that, we won’t be feeding our perception with pixels. There will be a more efficient way to do it than that. Quite what that is I don’t know exactly, but it will probably be similar, in a sense to the graphics “primitives” that are used to create 3D models in games - except they will be in a more concise, precise language of perceptual primitives rather than representations of physical objects.
Do you still wonder whether this will ever happen? Large parts of it are in place already.
We asked Matteo Shapiro, co-founder and chief creative and technical officer of Replay-Technologies, what he thought about the future of 360-degree video and virtual (and augmented) reality:
“Imagine a simple camera array implemented in the walls of your home and office that, when coupled with an AR device, literally lets you immerse yourself in another ‘freeD’ space - such as having dinner with your family from your Las Vegas hotel room, with complete and utter immersion. This would be, I believe, the next huge social revolution: the elimination of physical boundaries. Imagine we could meet for coffee and chat while being 3,000 miles apart.”
This technology is going to affect every aspect of our lives, all the way from the living room scene described above to new types of movie and immersive video entertainment.
The technology is so nearly here. All we need is our imagination to make use of it.