The Z List: One of the great technological leaps forward that failed to really catch on the first time round was the 3D camera - a unit that could automatically capture the z-axis (the "depth axis") of a scene to create depth map information. Perhaps it's time for the industry to look at it again.
If there is one aspect of filmmaking that makes huge leaps year on year, it is computer-generated imagery. Ever since James Cameron's “The Abyss” and subsequently “Terminator 2: Judgment Day” exploded onto the scene, seamlessly integrating CGI with live-action footage, we have expected our CGI to come with a firm dose of photo-realism.
With ever more powerful computers available to, well, pretty much everyone, this CGI revolution has gradually made its way down the chain to become accessible to all of us. Anyone with After Effects and a suitable 3D animation package can try their hand at the sorts of special effects that were once the preserve of million dollar movies.
There is one link missing in all of this, however. From the most expensive films down to the lowliest of independent movies, nothing has really come along to make compositing (the controlled blending together of multiple layers) truly easy. I mean truly point-and-click easy.
It is true that software has become better at detecting outlines, giving us more tools to extract objects from the background, and moviemakers have become more adept at shooting well-lit green screen. What happens, though, if you are shooting against a real background and you wish to integrate CGI or other effects? What then?
Traditionally this has meant long nights of hand-tracing outlines and other archaic methods. For the animator it is about as much fun as root canal surgery. Despite all the promises of amazing edge detection in the latest versions of After Effects, for example, the software isn't perfect, and it would be impossible for it to be so. After all, the world is a complex place, and a computer cannot possibly have the intelligence to know what an individual object in a scene actually is. At least not yet.
So what can be done about this? The answer I believe lies inside your Xbox Kinect device. The Kinect is not just a camera. The secret to how it works is that it creates a Z-depth map of the scene in front of it.
Z-depth maps have long been used by 3D animators for compositing purposes and effects. They can be used by apps such as Photoshop to create depth-of-field effects for pre-rendered 3D animations in post, for example. A Z-depth map is a greyscale render of a scene that depicts the distance of each surface from the camera by the tone of its shading: the furthest point is rendered as black, while surfaces closer to the camera are lighter in shade, going towards white.
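To see how that tone mapping works, here is a minimal sketch in Python with NumPy. The scene, the sensor's near/far range, and all the distance values are invented for illustration; the only point is the convention described above, where the nearest surface comes out white and the furthest black.

```python
import numpy as np

# Hypothetical depth readings in metres for a tiny 2x3 "scene"
# (values are made up for illustration).
depth_m = np.array([[0.5, 2.0, 8.0],
                    [1.0, 4.0, 8.0]])

near, far = 0.5, 8.0  # assumed working range of our imaginary sensor

# Map distance to tone: nearest surface -> white (255), furthest -> black (0)
normalized = (far - depth_m.clip(near, far)) / (far - near)
tone = (normalized * 255).astype(np.uint8)

print(tone)
```

A real camera would output this per frame at the footage's resolution; the principle is identical.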
Such depth maps are now also used to assist in creating 3D conversions from 2D sources.
So what on earth has this got to do with the cameras that we all use? As the Xbox Kinect and other similar devices show, it is possible to combine depth mapping with a standard camera. Such devices go by many names, including range cameras and RGB-D cameras, to name a couple.
If we can capture an accurate depth channel at the same time that we shoot our footage we will immediately have an extremely powerful tool for post production work.
The no-screen green screen
Imagine, for example, that instead of having to use green screen, with all its associated drawbacks, you could eliminate the background based upon a depth filter. You could film your presenter or actor against any colour of backdrop and obtain a seamless extraction based upon their distance from it. You could even extract multiple individual objects by layering such filters.
Want to integrate an alien flying saucer flying through your footage of the London skyline? No problem. A depth map would tell the computer how far away the buildings were, so compositing the saucer into the scene so that it can fly behind and in front of various objects would be far easier than it is currently.
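At its core this is a per-pixel depth test: wherever the CG element is nearer the camera than the live-action surface, the CG pixel wins. A minimal sketch, with every pixel and depth value invented:

```python
import numpy as np

# Live-action plate and its depth channel (metres), both invented:
# left column is a near building at 5 m, right column is sky at 20 m.
plate_rgb = np.full((2, 2, 3), 100, np.uint8)
plate_z   = np.array([[5.0, 20.0],
                      [5.0, 20.0]])

# Rendered CG saucer element, flying at a constant 10 m.
saucer_rgb = np.full((2, 2, 3), 200, np.uint8)
saucer_z   = np.full((2, 2), 10.0)

in_front = saucer_z < plate_z                    # per-pixel depth test
comp = np.where(in_front[..., None], saucer_rgb, plate_rgb)

print(comp[:, :, 0])
```

The saucer is automatically hidden behind the 5 m building and visible against the 20 m sky, with no hand-drawn mattes at all.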
Recording a high-resolution depth map would also make it possible to add true depth-of-field effects in post, using specially designed filters that take advantage of the depth information. Such a filter would not be able to correct already out-of-focus footage (although in theory depth information could help today's crude correction filters to be more accurate), but it would allow footage shot with a small-chip camera, for example, to have its background or foreground defocused with a filter in an NLE, without the need to trace around objects or actors manually.
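The heart of such a filter is computing, per pixel, how much blur a given distance from the focal plane deserves. The sketch below uses a deliberately simplified blur model (a real implementation would derive the circle of confusion from aperture and focal length, then apply a spatially varying kernel); the distances, focus plane, and maximum blur are all assumptions.

```python
import numpy as np

# Per-pixel blur strength driven by distance from the focal plane.
# Simplified model: blur grows with the pixel's proportional distance
# from focus. All values are invented for illustration.
depth_m = np.array([1.0, 2.0, 4.0, 8.0])  # one "row" of depth samples
focus_m = 2.0                             # the plane we want sharp
max_blur_px = 10.0

# Pixels exactly at the focal plane get zero blur.
coc = max_blur_px * np.abs(depth_m - focus_m) / depth_m

print(coc)
```

An NLE plug-in would then blur each pixel by its computed radius, sharp subject and defocused background falling out automatically from the depth channel.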
In short, the ability of our cameras to record an integrated depth channel into our footage would be a step change in post-production speed, fluidity, and capability. Not only would merging CGI and live action be made that much easier, but there are also some rather amazing alternative uses.
Because a depth map records the distance of every surface from the camera, it is possible to create a basic 3D model of the scene from that information. Of course the depth map cannot capture anything hidden behind objects, but a basic 3D model of a scene can be extracted, allowing for slight rotation.
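Turning a depth map into such a model means back-projecting each pixel into 3D space. A minimal sketch using a pinhole camera model follows; the focal length, principal point, and depth values are all invented, and a real pipeline would use the camera's calibrated intrinsics.

```python
import numpy as np

# Back-project a tiny 2x2 depth map into a rough 3D point cloud
# using a pinhole model. All numbers are invented for illustration.
H, W = 2, 2
depth_m = np.array([[2.0, 2.0],
                    [4.0, 4.0]])
f = 100.0               # assumed focal length, in pixels
cx, cy = W / 2, H / 2   # assume principal point at image centre

v, u = np.mgrid[0:H, 0:W].astype(float)   # pixel row/column grids
x = (u - cx) * depth_m / f                # pinhole back-projection
y = (v - cy) * depth_m / f
points = np.dstack([x, y, depth_m]).reshape(-1, 3)  # (N, 3) cloud

print(points.shape)
```

Each frame of footage would yield one such "projected" surface, which is exactly the relief that allows the slight virtual camera moves described above.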
If you shot footage by rotating 360 degrees around a scene or an object, it would be possible to calculate a very accurate 3D model, perhaps much more accurate than current software can manage from straightforward video footage alone.
The ability to make a “projected” 3D height map of the scene could have other benefits too, including the ability to add extra lighting to a scene. This could not, of course, correct for the noise that comes from brightening a shadow area. However, it could allow for subtle light accenting to increase the realism of different times of day, among other effects.
Currently the equipment to record depth maps in real time is fairly low resolution, but the technology is improving all the time. The accuracy of a depth map is also affected by the number of tones in its greyscale. Remember that a depth map, once recorded, is merely a black-and-white picture or animation, so the higher the bit depth, the finer and more accurate the depth map. Once recorded depth maps match the resolution of the footage itself, we will have a very useful tool indeed.
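The bit-depth point is easy to quantify: over a fixed working range, each grey level can only distinguish one slice of distance, so each extra bit roughly halves the coarsest step. The 0.5 m to 8 m range below is an assumed sensor range, chosen purely for illustration.

```python
# How grey-level bit depth limits depth precision over an assumed
# 0.5-8 m sensor working range (figures are illustrative only).
near, far = 0.5, 8.0  # metres

# Metres per grey level at each bit depth
steps = {bits: (far - near) / (2 ** bits - 1) for bits in (8, 10, 16)}

for bits, step in steps.items():
    print(f"{bits}-bit: {step * 1000:.2f} mm per grey level")
```

At 8 bits the steps are centimetre-scale, which is why current depth keys look crude at object edges; a 16-bit channel over the same range resolves to fractions of a millimetre.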
The advantage of recording depth maps alongside footage hasn't gone unnoticed, and projects such as Depthkit, which integrates the Xbox Kinect with conventional cameras, are already in development. The results are rather crude so far, but they demonstrate the construction of animated 3D objects from the Kinect data for graphical effects.
Ironically, the technology inside the Kinect came from a company called 3DV, which developed a camera device called the Z-Cam and claimed that the technology could in fact be placed inside any camera. It is an irony, then, that 3DV no longer exists, its technology having been bought by Microsoft, while a group of enthusiasts works to get the Kinect operating in harmony with standard video cameras!