RedShark Replay: Using some very clever algorithms, unusable footage is made to look like it was shot with a Steadicam [first published August 2014]
As video cameras get smaller and more portable, they get attached to things that shake, vibrate and wobble. You can’t blame the owners: they just want to get the most spectacular footage possible, but while the ideal place for a camera is a tripod, that ideal is about as far as you can get from the mounting point of a typical action cam.
The situation deteriorates further if you want to speed up the action, because playing the material fast, or shooting time lapse, exaggerates the erratic movement.
Luckily, there are ways to “smooth out” the action, but they’re not perfect and simply don’t work with time-lapse, where the movement between frames is too severe to compensate for.
There’s no doubt that video stabilisation is a very clever process. There are different ways to do it but it mostly revolves around the idea that you can track points or objects in successive frames, and then move the whole picture such that those points are in the same place in subsequent frames. The system takes into account the “proper” movement of objects and simply evens out the movement so that it looks relatively natural.
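To make the idea concrete, here is a minimal sketch of the simplest form of this approach: follow one tracked point from frame to frame, smooth its path, and shift each frame by the difference. Everything in it is an assumption for illustration (the function names, the moving-average smoothing and the sample coordinates); real stabilisers track many points and correct for more than simple shifts.

```python
def moving_average(values, radius):
    """Smooth a sequence with a centred window that shrinks at the ends."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def stabilising_shifts(track, radius=2):
    """Given one tracked point's (x, y) position in each frame, return the
    (dx, dy) shift to apply to each frame so the point follows a smoothed path."""
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    smooth_xs = moving_average(xs, radius)
    smooth_ys = moving_average(ys, radius)
    return [(sx - x, sy - y)
            for x, y, sx, sy in zip(xs, ys, smooth_xs, smooth_ys)]


# Hypothetical data: a point drifting to the right (an intended pan)
# with frame-to-frame jitter on top.
track = [(100, 50), (112, 47), (118, 55), (133, 49), (139, 52), (151, 48)]
shifts = stabilising_shifts(track)
for frame, (dx, dy) in enumerate(shifts):
    print(f"frame {frame}: shift by ({dx:+.1f}, {dy:+.1f}) pixels")
```

Because the smoothing follows the overall drift of the point rather than pinning it in place, the intended pan survives and only the jitter is removed.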
You don't get something for nothing
Stabilisation comes at a cost, in two ways.
First, in order to stabilise the picture, the system will have to zoom in to allow the edges of the original frame to be used to absorb the movement necessary to stabilise the main part of the image. So the overall resolution is reduced. Mostly, it’s a worthwhile sacrifice, because the overall viewability is improved by the process. (This doesn't have to happen if the stabilisation is done in-camera, using an over-sized sensor to provide the "safety zones" around the edges).
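The size of that sacrifice is easy to estimate. The sketch below assumes the stabiliser needs to shift the picture by up to a given number of pixels in each direction and that the crop keeps the original aspect ratio; the function name and the example figures are hypothetical.

```python
def crop_for_stabilisation(width, height, max_dx, max_dy):
    """Return (zoom, crop_width, crop_height) for a frame that must absorb
    shifts of up to max_dx and max_dy pixels in either direction."""
    usable_w = width - 2 * max_dx
    usable_h = height - 2 * max_dy
    if usable_w <= 0 or usable_h <= 0:
        raise ValueError("shift is larger than the frame can absorb")
    # Use the larger of the two ratios so the crop fits on both axes
    # while keeping the original aspect ratio.
    zoom = max(width / usable_w, height / usable_h)
    return zoom, width / zoom, height / zoom


# Assumed example: an HD frame that has to absorb 5% of movement each way.
zoom, crop_w, crop_h = crop_for_stabilisation(1920, 1080, 96, 54)
print(f"zoom {zoom:.2f}x, effective picture {crop_w:.0f} x {crop_h:.0f}")
```

In this example a 5% allowance on every edge means zooming in by roughly 1.11x, so an HD frame is left with about 1728 x 972 pixels of real picture.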
There’s another, more subtle problem, which is that if the unwanted movement of the camera results in it being rotated from side to side or up and down, then correcting the image simply by repositioning it will fail to take into account that the camera will be looking at the scene from a different angle. If this happens by more than a very small amount, objects will appear distorted and the corrected image will look as though it’s being projected onto a moving rubber diaphragm. This is called "Planar Distortion" and it's a particularly nasty one to have to correct.
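A little trigonometry shows why repositioning alone cannot fix this. The sketch assumes an idealised pinhole camera, a distant scene and a pure side-to-side rotation; the focal length (expressed in pixels) and the angles are made-up values.

```python
import math


def displacement_after_yaw(focal_px, point_angle_deg, yaw_deg):
    """Horizontal image shift, in pixels, of a distant point when the
    camera rotates sideways by yaw_deg (simple pinhole model)."""
    before = focal_px * math.tan(math.radians(point_angle_deg))
    after = focal_px * math.tan(math.radians(point_angle_deg - yaw_deg))
    return after - before


focal_px = 1000  # assumed focal length in pixels
centre = displacement_after_yaw(focal_px, 0, 5)   # point on the optical axis
edge = displacement_after_yaw(focal_px, 40, 5)    # point near the frame edge
print(f"centre moves {centre:.1f} px, edge moves {edge:.1f} px")
```

With these figures the point near the edge moves around 50 pixels further than the point at the centre, so no single shift of the whole picture can put both back where they were.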
The new technique that we’re going to see actually uses the fact that the camera can point in different directions to help with the correction, to spectacular effect. It’s not a method that can be used in every case - in fact, it can currently only be used with time lapse material, but it does have implications and potential for other uses, and the way it works is just incredible.
It’s been developed - and is still a “work in progress” - by Microsoft. If you’re surprised by that you might be unaware that Microsoft has always had a lab where projects which seemingly have nothing to do with the main business of Microsoft are developed. Some are utterly speculative - and that’s their strength.
You can easily imagine products like Microsoft’s Kinect for the Xbox coming out of this environment (although part of Kinect came from Microsoft's Gaming Division, and other parts were from third-party developers). Another technology, which is available for you to try, is Photosynth. And we suspect that it is Photosynth technology that forms the basis of the technique that is the subject of this article.
Let’s look at Photosynth first. It’s an extremely clever technique that takes still photographs (from an individual or completely crowdsourced) and places them in a 3D space, meshing them together to create a complete environment, which, in the Photosynth viewer, you can look around in any direction. Sometimes the viewing “sphere” is incomplete - you can’t rely on there being a picture for every direction unless the “Photosynth” picture is planned in advance.
It goes further than that. If you see a part of the scene that you’d like to view in more detail, just zoom in. Imagine a clock tower. You first see it as part of a wide angle shot, and then, as you close in, the image changes to a shot of the clock that someone has taken with a telephoto lens. Ultimately as you continue to get closer, the system might be able to use a picture that someone took while up a ladder, and only six inches away from the clock face. All the image transitions, at the various scales, blend into each other to look almost like a video.
What Photosynth is, ultimately, is a way to construct a pseudo 3D world from a series of flat images. Using image-matching techniques, the system “understands” the 3D space that the pictures were taken in, and can warp and blend the pictures to give a final image that is correct for the viewing angle of the arbitrarily-placed “virtual” camera controlled by the viewer. Note that this isn't a genuine 3D model, but a set of flat images that the system can position in 3D space. It's kind of 2.5D.
You really need to see this at work to understand it. There are many examples on Microsoft’s Photosynth site, and you can even make your own “Photosynths”.
The key takeaway here is that the Photosynth software can understand the position and direction of a camera by building a virtual world from the information in a set of frames.
So far this only applies to stills. But there's more...
Now, this technique obviously has nothing to do with video so far. But we suspect that some significant parts of it are used in a new way of stabilising time lapse footage from action cameras.
Cameras like GoPro and dozens of similar devices are pretty universally worn these days by anyone who takes part in action sports. Mountain bikers and rock climbers routinely take their cameras with them on their adventures. Often the footage they capture is fantastic, but more often than that, it can be boring, because in between the exciting parts of their adventures are tedious, dull intervals with no intrinsic merit.
To get around this issue, and to avoid having to edit what might be a very long video, these adventurous video-makers sometimes use time lapse instead to speed up the action and minimise the uninteresting bits.
But there’s a problem with this. Even the real-time footage from a camera on a bike is shaky, and so time lapse footage is even worse.
Conventional stabilisation doesn’t work with time lapse, because there is too much distance between the frames to smooth out the motion. You can imagine that in between time lapse frames the mountain bike might have gone around a corner - so there's no consistent frame of reference for stabilisation.
Microsoft steps in
This is where Microsoft’s technique steps in.
The first thing the new method does is look at all the frames in the video and then identify images that inform the system where the camera is in space, and where it is pointing. Using this information, it works out the exact 3D path that the camera took during the ride.
Then, it looks for individual frames - and parts of frames - that it can use to construct a new image that would be seen correctly from a new, idealised, smoother path. Remember that the technology can blend, warp and zoom into parts of an image.
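The two steps just described can be sketched in a few lines. This is only an illustration of the general idea, not Microsoft's algorithm: it assumes the camera positions have already been recovered, uses a plain moving average to produce the smoother path, and simply picks the source frames nearest to each new camera position as candidates for warping and blending. All names and numbers are hypothetical.

```python
def smooth_path(positions, radius=2):
    """Moving-average smoothing of a list of (x, y, z) camera positions."""
    smoothed = []
    for i in range(len(positions)):
        lo = max(0, i - radius)
        hi = min(len(positions), i + radius + 1)
        window = positions[lo:hi]
        smoothed.append(tuple(
            sum(p[axis] for p in window) / len(window) for axis in range(3)))
    return smoothed


def nearest_source_frames(positions, virtual_positions, count=2):
    """For each virtual camera position, list the indices of the source
    frames that were shot closest to it."""
    picks = []
    for v in virtual_positions:
        def sq_dist(i):
            return sum((positions[i][a] - v[a]) ** 2 for a in range(3))
        ranked = sorted(range(len(positions)), key=sq_dist)
        picks.append(ranked[:count])
    return picks


# Assumed recovered path: steady forward travel in x, bouncing up and down in y.
positions = [(0, 0.3, 0), (1, -0.2, 0), (2, 0.4, 0), (3, -0.3, 0),
             (4, 0.2, 0), (5, -0.4, 0), (6, 0.3, 0), (7, -0.1, 0)]
smooth = smooth_path(positions)
virtual = smooth[::3]  # time lapse: one output frame per three input frames
picks = nearest_source_frames(positions, virtual)
for v, frames in zip(virtual, picks):
    print(f"virtual camera at x={v[0]:.1f}: build from source frames {frames}")
```

The point of the second function is that an output frame need not come from a single input frame: any nearby frame that happens to see the scene from a useful position is a candidate.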
The result is almost unbelievably different from the original, and better. You really do have to see it to believe it.
Where this gets really exciting is when you consider what else could be done with this technique. The biggest limitation of the current method is that it only works with time lapse. Maybe, if you were to use a very high frame rate for the original footage, you could apply this to “real-time” video.
You may ultimately be able to move a “virtual camera” to a different position based only on the images from a single camera. And if you shot with multiple cameras you could do so much more.
You can imagine a movie director sitting in the post production suite with a joystick, controlling the “virtual camera” as if it were a real one.
It’s just possible that this might turn out to be the most powerful video processing technique that there’s ever been.
Two example videos after the break
Microsoft video showing the Hyperlapse technique