We've been following the progress of the Fraunhofer Institute's work on lightfield arrays for a while, catching up at NAB and IBC as demonstrations moved from stills to moving pictures. The most recent work has been on the post processing requirements of the technique, but first, let's recap exactly what this is all about.
A lightfield array is a number of cameras arranged – generally – in a grid, each of which therefore sees the scene from a very slightly different angle due to its differing position with respect to the others. In some respects this is rather like a stereoscopic camera rig, only more so: instead of the rather limited horizontal offset of two stereoscopic cameras, a lightfield array provides a variety of views of the scene which are distributed in both axes of the two-dimensional image. Common experimental arrays have used nine, twelve or sixteen cameras – note that there is no requirement that the array be square.
This approach leads to several interesting techniques. Using post production software, the multiple views from different cameras can be interpolated to produce a novel camera position not represented in the original grid; this camera can be positioned not only horizontally and vertically, but also, to a limited extent, in depth. This position can be animated to any point within the original lightfield, producing true tracking shots, albeit of limited range. The point of focus may be selected in post, with the lightfield images being used to calculate an optically accurate blur of any desired radius which behaves correctly with respect to the depth of objects in the scene.
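The refocusing idea can be illustrated with the classic shift-and-add approach: each camera's view is shifted in proportion to that camera's offset within the array, then all the views are averaged, so that objects at one chosen depth align (and stay sharp) while everything else smears into blur. The sketch below is a deliberately simplified illustration, not Fraunhofer's actual pipeline – the function name, the integer-pixel shifting and the `alpha` refocus parameter are all assumptions for the purpose of the example:

```python
import numpy as np

def refocus(images, offsets, alpha):
    """Synthetic-aperture refocus sketch: shift each view by its
    camera offset scaled by a refocus parameter, then average.

    images:  list of HxW (or HxWx3) arrays, one per array camera
    offsets: list of (dx, dy) camera positions relative to the centre
    alpha:   refocus parameter selecting the plane of focus
    """
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dx, dy) in zip(images, offsets):
        # Integer-pixel shift for simplicity; a real implementation
        # would use sub-pixel interpolation.
        sx, sy = int(round(alpha * dx)), int(round(alpha * dy))
        acc += np.roll(np.roll(img, sy, axis=0), sx, axis=1)
    return acc / len(images)
```

With `alpha = 0` this simply averages the unshifted views; sweeping `alpha` moves the plane of focus through the scene, which is exactly the post-selected focus the article describes.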
A depth map of the scene can also be calculated, permitting compositing and grading tools to operate three-dimensionally. No longer is the colourist required to isolate parts of the scene for separate treatment based on their existing hue, saturation and brightness; objects in the deep background can be separated based on their three-dimensional position. Compositing can also take place, with decisions about which pixels represent the background and which the foreground made according to the value of the depth map.
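Depth-based keying of this sort reduces, in its simplest form, to thresholding the depth map and using the result as a matte. The helper names below (`depth_key`, `composite`) are hypothetical, and a production matte would of course be soft-edged rather than boolean, but the sketch shows the principle:

```python
import numpy as np

def depth_key(depth, near, far):
    """Return a boolean matte selecting pixels whose depth lies
    within [near, far] -- a hard-edged sketch of depth keying."""
    return (depth >= near) & (depth <= far)

def composite(fg, bg, matte):
    """Place matte-selected foreground pixels over a background plate."""
    m = matte[..., None] if fg.ndim == 3 else matte
    return np.where(m, fg, bg)
```

A grade could use the same matte to treat, say, only the deep background, without any reference to hue, saturation or brightness.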
It also becomes possible to relight the scene in a way that even the most advanced pre-existing grading tools struggle to match. The depth map can be used to calculate a normal map – that is, a value for each pixel representing the orientation of the surface that pixel depicts. Between this and the 3D depth information, true three-dimensional lighting can be calculated, and even changes in the specularity of objects simulated. In extremis, the entire scene can be rebuilt in a 3D package as a cloud of points, each representing a pixel, and subjected to any of the many techniques available in 3D graphics.
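Deriving normals from a depth map is, at its crudest, a matter of taking the depth gradient at each pixel and normalising. The function below is a minimal sketch under simplifying assumptions – it ignores camera intrinsics and applies no smoothing, both of which a real relighting pipeline would need:

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a depth map using
    finite differences. Returns an HxWx3 array of unit vectors.
    A sketch only: real pipelines smooth the (noisy) depth first
    and account for the camera's focal length and pixel pitch."""
    # Gradient of depth with respect to image y and x.
    dzdy, dzdx = np.gradient(depth.astype(np.float64))
    # Surface normal of z = f(x, y) is proportional to (-dz/dx, -dz/dy, 1).
    n = np.dstack((-dzdx, -dzdy, np.ones_like(depth, dtype=np.float64)))
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n
```

Given normals and depth, a standard shading model can then be evaluated per pixel against any virtual light position – which is what makes post relighting, including specular changes, plausible.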
And now, the downside
There are, of course, caveats to all this. The principal inconveniences of lightfield arrays are the multiplication of storage – one stream for each camera – and the requirement that the cameras be reasonably small, so that the resolution of the captured lightfield is reasonably high with respect to the scale of objects in the scene. It is, though, theoretically possible to use dissimilar cameras – a central high-end digital cinematography device, perhaps, to record image data, surrounded by smaller, simpler cameras to record the lightfield.
Post is hard work too. To realise the benefits we've discussed, calculations must be performed that are not dissimilar to the motion estimation inside a video codec. In the codec, areas of the image are matched between frames; with a lightfield array, the images from all the cameras are compared, with the difference in the apparent position of an object between two cameras – its disparity – being inversely proportional to the depth of that object within the scene: nearby objects shift a lot between views, distant ones barely at all. With, say, 16 cameras, each of which must be matched against 15 others – 120 unique pairings – that's a very big job.
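For a single pair of views, the matching step can be sketched as sum-of-absolute-differences block matching: slide a small block from one image along the other and keep the shift with the lowest difference. This is a toy version for horizontally offset, rectified views – the function name and parameters are illustrative, and it conveys the cost of the job as much as the method, since even this naive form is three nested loops per image pair:

```python
import numpy as np

def disparity_sad(left, right, block=3, max_disp=16):
    """Naive SAD block matching between one rectified pair of views.
    For each block in `left`, find the horizontal shift into `right`
    that minimises the sum of absolute differences. The resulting
    disparity is inversely proportional to scene depth."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            # Only search shifts that stay inside the image.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                sad = np.abs(patch - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

Multiply this by 120 camera pairings, per frame, at cinema resolutions, and the scale of the post-processing problem becomes clear.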
The technique as it is being pursued at the moment is imperfect. As with all extant depth-sensing technologies, the depth map is somewhat noisy, and unfortunately the edges of foreground objects are likely to be exactly the place where precision and cleanliness are most required for compositing (although the results are already good enough to make foreground-background selectivity in grading trivially straightforward). There is also an unexpected side issue with antialiasing in depth maps, regardless of the source technology.
But, despite the wrinkles, Fraunhofer is continuing an admirable tradition of researching not only theoretical possibilities but also practical applications, and lightfield arrays in general seem on the very cusp of usability as a working tool. Other depth-sensing technologies, such as time-of-flight cameras, which measure the time required for light to bounce back from an object, suffer noise problems at least as severe, and usually more so – and they don't provide any image data about the appearance of objects partially occluded from the notional camera position, in the way that a lightfield array does.
It's certainly possible to imagine a future, and one not too far off, in which the tedium of rotoscoping VFX shots is only necessary in particular problem cases. Lightfield arrays will also improve with the general improvement of cameras, and are a worthwhile application for the very high resolution sensors which continue to become available.
People thinking really big might imagine a studio entirely walled in cameras, or perhaps a sphere, within which the camera could be placed arbitrarily in post production. The storage and computing requirements make that impractical at the moment, but one day, peel-and-stick camera wallpaper might make grip, camera and lighting entirely redundant.