NVIDIA has released a demo of how it is using AI to revolutionise video conferencing. This is the future of video codecs.
NVIDIA Maxine AI video codec. Image: NVIDIA.
Called NVIDIA Maxine, the new AI system can not only cram high-quality video streaming into a tenth of the bandwidth that traditional streaming takes, but it can improve the quality as well. It's an impressive demonstration of where video codecs are heading. And saving bandwidth isn't its only trick. The system also provides some extremely impressive noise removal (the demo below shows the sounds of children in the background being removed), as well as realtime language translation. I'll come to the implications in a moment, but first take a look at the demo below.
The primary selling point of the new system is the incredibly low bandwidth it requires for streaming. We are all having to do more online conferencing or online chats with family members these days, and not everyone has a good internet connection. If you're in a rural area you could still be stuck with a 1Mb/s connection, or in some cases worse.
NVIDIA's Maxine system can stream high quality video at around 0.1165KB per frame. Clearly this will be of huge benefit to someone on either a slow rural connection or on mobile. So, how does it work?
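To put that per-frame figure into perspective, here is a rough back-of-the-envelope calculation. Only the 0.1165KB per frame number comes from the demo; the frame rate of 30fps and the convention that 1KB is 1,000 bytes are my own assumptions, so treat the result as an illustration rather than a specification.

```python
# Figure quoted in the article.
KB_PER_FRAME = 0.1165

# Assumptions for illustration only (not stated in the article).
FRAMES_PER_SECOND = 30
BYTES_PER_KB = 1000
LINK_KBIT_PER_S = 1000  # a 1Mb/s connection, as in the rural example


def stream_kbit_per_s(kb_per_frame, fps, bytes_per_kb=BYTES_PER_KB):
    """Convert a per-frame payload in KB into a stream rate in kbit/s."""
    bits_per_frame = kb_per_frame * bytes_per_kb * 8
    return bits_per_frame * fps / 1000


rate = stream_kbit_per_s(KB_PER_FRAME, FRAMES_PER_SECOND)
share = rate / LINK_KBIT_PER_S
print(f"{rate:.2f} kbit/s, or {share:.1%} of a 1Mb/s link")
```

Under those assumptions the video stream works out at roughly 28kbit/s, a small fraction of even a poor 1Mb/s line. That still leaves the question of how so little data can describe a moving face.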
The layman's take on things is that it uses a reference image of the caller, isolates various control points, such as the facial outline, the eyes, the mouth and the nose, and then the AI effectively constructs the detail itself. Only the positions of those points need to be sent for each frame, which is why so little data is required. It's an odd thought, because, much like the idea of upscaling and interpolating old film images, much of the image is constructed with 'fakery'.
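As a toy illustration of why control points are so cheap to send, the sketch below packs a set of facial points into a binary payload and unpacks them again. Everything in it is hypothetical: NVIDIA has not published its wire format here, the count of 29 points and the 16-bit quantisation are my own choices, and the function names `pack_keypoints` and `unpack_keypoints` are invented for the example. The AI model that rebuilds the face from the reference image is not shown at all.

```python
import struct

# Hypothetical number of control points, chosen for illustration only.
NUM_KEYPOINTS = 29


def pack_keypoints(points):
    """Pack (x, y) pairs, each normalised to 0..1, as 16-bit unsigned ints."""
    flat = []
    for x, y in points:
        for value in (x, y):
            clamped = min(max(value, 0.0), 1.0)
            flat.append(round(clamped * 65535))
    return struct.pack(f"<{len(flat)}H", *flat)


def unpack_keypoints(payload):
    """Reverse pack_keypoints, returning a list of (x, y) floats."""
    count = len(payload) // 2
    values = struct.unpack(f"<{count}H", payload)
    return [(values[i] / 65535, values[i + 1] / 65535)
            for i in range(0, count, 2)]


# Made-up points standing in for a detected facial outline.
points = [(i / (NUM_KEYPOINTS - 1), 1 - i / (NUM_KEYPOINTS - 1))
          for i in range(NUM_KEYPOINTS)]
payload = pack_keypoints(points)
restored = unpack_keypoints(payload)
print(len(payload), "bytes per frame")
```

A few dozen coordinates stored this way come to a little over a hundred bytes per frame, the same order of magnitude as the figure quoted for Maxine. Everything the viewer sees beyond those coordinates is synthesised at the receiving end.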
Nevertheless, the results are astoundingly good, and combined with the other features, such as intelligent noise reduction, realtime language translation, transcription and virtual assistants that take notes, it will likely be a very popular service.
But the video manipulation doesn't stop at bandwidth. The system can also re-align your face so that it is looking directly at the camera. As anyone who does regular calls knows, looking off camera reduces the 'face-to-face' effect that video calling is supposed to replicate. Unlike Apple's experimentation with eye realignment, NVIDIA's system looks pretty good, and thankfully it doesn't make you look like Christopher Lloyd in Who Framed Roger Rabbit.
NVIDIA Maxine, you'll be unsurprised to hear, is a cloud-based service, utilising the power of NVIDIA GPU Tensor Cores to provide a lot of the grunt. You can sign up for early adopter access on its website.
Knowing that AI is effectively rebuilding your face may put off some of the more technology-averse. But it cannot be ignored that the system seeks to solve a problem that many of us are at the mercy of. As such, it makes video conferencing more convenient, and apparently more immediate. No price for the service has been made available yet, but regardless of the cost you can bet that this type of technology will start finding its way into other video streaming systems and codecs in the future. It's a solid glimpse of what's to come.