Artificial intelligence for video compression is a technology that is coming to a streaming service near you, and it can't arrive quickly enough.
A year ago, everyone decamped home overnight and overheated global demand for the internet. In a selfless act, Netflix, YouTube and Disney+ dialled down their bitrates to ease bandwidth consumption, in the process deliberately compromising the quality of their service (for about a month).
That immediate crisis may have subsided, but in a world where online video use is soaring and bandwidth remains at a premium, a longer-term solution is required. Even in a world with universal 5G, bandwidth is not an infinite resource. Not when 5G promises bandwidth-hogging, video-centric applications like 8K VR.
New video compression technologies are the conventional answer, but the ‘Moore’s Law’ of codec development has reached the end of the line. The coding algorithm has been tweaked over and over, but it is still based on the same original scheme.
Even the great new hope, Versatile Video Coding (VVC), which MPEG is targeting at ‘next-gen’ immersive applications, is only an evolutionary step forward from HEVC, itself several generations removed from the Neanderthal H.261 of 1988.
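The scheme that H.261, HEVC and VVC share is hybrid block-based coding: each block of a frame is predicted (typically from a motion-shifted block of an earlier frame), the prediction residual is transformed, and the transform coefficients are quantised, which is where information is thrown away. The Python sketch below walks a single block through that loop. It is a simplification under stated assumptions: the one-dimensional blocks, floating-point DCT, single integer motion vector and uniform quantiser are illustrative choices, the function names are hypothetical, and real codecs add 2-D integer transforms, many prediction modes, in-loop filters and entropy coding.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                                  for i in range(n)))
    return coeffs


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (a scaled DCT-III)."""
    n = len(c)
    samples = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def encode_block(current, reference, mv, qstep):
    """Hypothetical hybrid encoder for one 1-D block.

    mv is an integer offset into the previous frame's row (the 'motion vector').
    Returns the prediction (which a real decoder rebuilds itself from mv) and
    the quantised transform levels that would be transmitted."""
    prediction = reference[mv:mv + len(current)]                 # motion-compensated prediction
    residual = [c - p for c, p in zip(current, prediction)]
    levels = [round(coef / qstep) for coef in dct_ii(residual)]  # the lossy step
    return prediction, levels


def decode_block(prediction, levels, qstep):
    """Decoder mirror: dequantise, inverse-transform, add the prediction back."""
    residual = idct_ii([lvl * qstep for lvl in levels])
    return [p + r for p, r in zip(prediction, residual)]


if __name__ == "__main__":
    reference = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]  # earlier frame
    current = [16, 18, 20, 22, 24, 26, 28, 31]                    # same ramp, moved
    prediction, levels = encode_block(current, reference, mv=3, qstep=2.0)
    print(levels)  # all zeros for this example
    print(decode_block(prediction, levels, qstep=2.0))
```

In this example the motion-compensated prediction leaves a residual of one unit in the last sample, every coefficient quantises to zero, and the decoder simply reuses the prediction. Decades of codec generations have refined each stage of this loop rather than replacing it.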
It’s not only the concept which has reached its limit. So too has physical capacity on a silicon chip. Codecs are at an evolutionary cul-de-sac. What we need is a new species.
AI compression enters the frame
Codec developers are turning their smarts to artificial intelligence, machine learning and neural networks.
AI/ML techniques differ fundamentally from traditional methods because they can solve multi-dimensional problems that are difficult to model mathematically. They are also software-based and therefore better suited to an environment in which applications run on generic hardware or are virtualised in the cloud.
“We think that you could use AI to retain essentially the same schema as currently but using some AI modules,” says Lionel Oisel, director, Imaging Science Lab, InterDigital, which owns patents in HEVC and VVC. “This would be quite conservative and be pushed by the more cost-conscious manufacturers. We also think that we could throw the existing schema away and start again using a complete end-to-end chain for AI, a neural network design.”
Some vendors have used ML to optimise the selection of encoding parameters, and others have incorporated techniques at a much deeper level, for example, to assist with the prediction of elements of output frames.
First AI-driven solutions
V-Nova lays claim to being the first company to have standardised an AI-based codec. It teamed with video analysis provider Metaliquid to build V-Nova’s Perseus Pro codec into an AI solution for contribution workflows, now enshrined as VC-6 (SMPTE standard 2117).
Its AI algorithms can calculate the bitrate needed to optimise bandwidth usage while maintaining an appropriate level of quality, and do so at high speed.
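The article does not say how the V-Nova and Metaliquid system arrives at its bitrate. One generic way to frame the task is as a search for the cheapest bitrate that a quality model predicts will meet a target score. In the minimal Python sketch below, `predicted_quality` is a hypothetical saturating curve standing in for a learned predictor, and the bitrate range, target and complexity values are assumptions chosen for illustration.

```python
import math


def predicted_quality(bitrate_kbps, complexity=1.0):
    """Hypothetical stand-in for a learned quality predictor (0-100 scale).

    A saturating curve: more bits always help, with diminishing returns, and
    harder content (higher complexity) needs more bits for the same score."""
    return 100.0 * (1.0 - math.exp(-bitrate_kbps / (1500.0 * complexity)))


def lowest_bitrate_for_target(target, lo=100.0, hi=20000.0, complexity=1.0, tol=1.0):
    """Bisect for the smallest bitrate (kbps) whose predicted quality meets target.

    Assumes predicted quality rises monotonically with bitrate. Returns None
    when even the top of the range cannot reach the target."""
    if predicted_quality(hi, complexity) < target:
        return None
    if predicted_quality(lo, complexity) >= target:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predicted_quality(mid, complexity) >= target:
            hi = mid  # good enough: try something cheaper
        else:
            lo = mid  # falls short: more bits needed
    return hi
```

With the default complexity, `lowest_bitrate_for_target(90)` lands within 1 kbps of 1500 × ln 10, about 3,454 kbps, and doubling `complexity` roughly doubles the answer. Bisection works here only because the toy model is monotonic; a real learned predictor would need that property checked rather than assumed.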
Nvidia’s Maxine system uses AI to compress video for very-low-bandwidth video conferencing.
Haivision offers Lightflow Encode, which uses ML to analyse video content (per title or per scene) to determine the optimal bitrate ladder and encoding configuration.
It uses a video quality metric called LQI, which represents how well the human visual system perceives video content at different bitrates and resolutions. Haivision claims this results in “significant” bitrate reductions and “perceptual quality improvements, ensuring that an optimised cost-quality value is realised.”
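LQI’s internals are not described here, so the sketch below illustrates only the general per-title idea: encode the title at several resolution and bitrate points, score each trial encode with a perceptual metric, then keep, for each bitrate, the resolution that scores best. The scores in `MEASURED`, the `min_gain` threshold and the function name are hypothetical; this is not Haivision’s algorithm.

```python
# Hypothetical trial-encode results: (height, bitrate_kbps) -> quality score.
# A real per-title system would measure these with a perceptual metric;
# these numbers are made up for illustration.
MEASURED = {
    (1080, 1000): 62.0, (1080, 2000): 78.0, (1080, 4000): 90.0, (1080, 8000): 96.0,
    (720, 1000): 68.0, (720, 2000): 80.0, (720, 4000): 87.0, (720, 8000): 89.0,
    (480, 1000): 70.0, (480, 2000): 76.0, (480, 4000): 79.0, (480, 8000): 80.0,
}


def build_ladder(measured, min_gain=3.0):
    """Return rungs as (bitrate_kbps, height, score), lowest bitrate first.

    For each bitrate, take the resolution with the best score; keep the rung
    only if it beats the previous kept rung by at least min_gain points."""
    bitrates = sorted({br for _, br in measured})
    ladder = []
    for b in bitrates:
        height, score = max(((h, q) for (h, br), q in measured.items() if br == b),
                            key=lambda hq: hq[1])
        if not ladder or score - ladder[-1][2] >= min_gain:
            ladder.append((b, height, score))
    return ladder
```

With these made-up scores the ladder starts at 480p for 1,000 kbps and steps up to 720p and then 1080p as the bitrate rises, mirroring the common observation that at low bitrates a lower resolution can look better than a bit-starved higher one. Raising `min_gain` prunes rungs that add little visible quality.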
Perceptual quality, rather than ‘broadcast quality’, is increasingly being used to rate video codecs and automate bitrate tuning. Metrics like VMAF (Video Multi-method Assessment Fusion) combine human vision modelling with machine learning and seek to understand how viewers perceive content when streamed on a laptop, connected TV or smartphone.
VMAF originated at Netflix and is now open source.
“VMAF can capture larger differences between codecs, as well as scaling artifacts, in a way that’s better correlated with perceptual quality,” Netflix explains. “It enables us to compare codecs in the regions which are truly relevant.”
ML techniques, which have been used heavily in image recognition, will be key to meeting the growing demand for video streaming, according to Christian Timmerer, co-founder of streaming technology company Bitmovin and a member of the Athena Christian Doppler Pilot Laboratory research project. The lab is currently preparing for large-scale testing of a convolutional neural network (CNN) integrated into production-style video coding solutions.
In a paper recently presented to the IEEE, Timmerer’s team proposed using CNNs to speed up the encoding of ‘multiple representations’ of video. In layperson’s terms, videos are stored in versions, or ‘representations’, of multiple sizes and qualities. The player, which requests the video content from the server where it is stored, chooses the most suitable representation based on network conditions at the time.
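The player-side choice can be sketched as a simple throughput rule: pick the highest-bitrate representation that fits inside a safety margin of the throughput measured for recent segments. The `Representation` type, the 0.8 safety factor and the fallback to the lowest rung are assumptions for illustration; production players typically also take buffer occupancy into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    height: int        # vertical resolution, e.g. 1080
    bitrate_kbps: int  # average encoded bitrate


def choose_representation(reps, measured_throughput_kbps, safety=0.8):
    """Pick the highest-bitrate representation within safety * throughput.

    If nothing fits the budget, fall back to the lowest-bitrate one so that
    playback can continue."""
    budget = measured_throughput_kbps * safety
    affordable = [r for r in reps if r.bitrate_kbps <= budget]
    if not affordable:
        return min(reps, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)
```

With representations at 800, 3,000 and 6,000 kbps, a measured throughput of 5,000 kbps gives a budget of 4,000 kbps and selects the 720p version, while 500 kbps falls back to 360p.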
In theory, this adds efficiency to the encoding and streaming process. In practice, however, the most common approach for delivering video over the internet, HTTP Adaptive Streaming (HAS), is limited by the need to encode the same content at many different quality levels, which multiplies the encoding workload.
“Fast multirate encoding approaches leveraging CNNs, we found, may offer the ability to speed the process by referencing information from previously encoded representations,” he explains. “Basing performance on the fastest, not the slowest element in the process.”
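One way to reuse information from an earlier encode, sketched below under simplifying assumptions, is to let the highest-quality representation bound the search for the others: if a coding-tree unit (CTU) was split no deeper than depth d there, the lower-quality encode tries only depths 0 to d instead of every depth. The article does not describe the inputs or outputs of Timmerer’s CNN, so this Python sketch swaps it for that plain hand-written rule. The rule is a heuristic, the per-CTU depth model is a simplification of HEVC’s recursive quadtree, and counting depth levels tried is only a rough proxy for encoding time.

```python
MAX_DEPTH = 3  # HEVC quadtree depths: 0 = 64x64 CU ... 3 = 8x8 CU


def depths_to_try(reference_max_depth=None):
    """Coding-tree depths to evaluate for one CTU of a lower-quality encode.

    Without a reference, every depth is tried (exhaustive search). Given the
    deepest split used for the same CTU in the already-encoded highest-quality
    representation, assume the lower-quality encode will not split deeper.
    That assumption is a heuristic, not a guarantee."""
    if reference_max_depth is None:
        upper = MAX_DEPTH
    else:
        upper = min(reference_max_depth, MAX_DEPTH)
    return list(range(upper + 1))


def total_evaluations(reference_max_depths):
    """(exhaustive, guided) number of depth levels tried across a frame's CTUs."""
    exhaustive = len(reference_max_depths) * len(depths_to_try())
    guided = sum(len(depths_to_try(d)) for d in reference_max_depths)
    return exhaustive, guided
```

For six CTUs whose reference depths are 0, 1, 3, 2, 0 and 1, the exhaustive search tries 24 depth levels and the guided search tries 13. The saving grows when the reference encode uses mostly large blocks.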
iSIZE steps up
London-based startup iSIZE Technologies has developed a precoding solution to capitalise on the trend for perceptual quality metrics such as VMAF. Its bitrate savings and quality improvements come from a proprietary deep perceptual optimisation and precoding technology that runs as a preprocessing stage ahead of a standard codec.
This ‘precoder’ stage enhances detail in the areas of each frame that affect the content’s perceptual quality score after encoding, and dials down detail that is less important.
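iSIZE’s precoder is a proprietary learned model, and nothing below reflects how it actually works. As a loose illustration of the principle only, this Python sketch takes a per-block importance map as given (hypothetical here; in practice it would come from a perceptual model), sharpens blocks rated important with an unsharp mask and replaces the rest with a blurred copy that a downstream encoder can code with fewer bits. The function names, 3×3 box filter, block size and thresholds are all assumptions.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping; img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(window) / 9
    return out


def precode(img, importance, block=4, threshold=0.5, amount=0.5):
    """Hypothetical precoding pass driven by a per-block importance map.

    importance[by][bx] scores the block at block-row by, block-column bx.
    Important blocks get an unsharp-mask boost (detail enhanced); the rest are
    replaced by the blurred image (detail dialled down). Output is clamped to
    the 8-bit range."""
    blurred = box_blur(img)
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, v in enumerate(row):
            if importance[y // block][x // block] >= threshold:
                value = v + amount * (v - blurred[y][x])  # sharpen
            else:
                value = blurred[y][x]                     # smooth
            new_row.append(min(255.0, max(0.0, value)))
        out.append(new_row)
    return out
```

On a checkerboard test pattern, the block marked important comes out with more contrast and the unimportant block with less, while a flat image passes through unchanged.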
“Our perceptual optimisation algorithm seeks to understand what part of the picture triggers our eyes and what we don’t notice at all,” explains Sergio Grce, company CEO.
This not only leaves an organisation’s existing codec infrastructure and workflow unchanged but is claimed to save 30 to 50 percent on bitrate at a latency cost of just one frame, making it suitable for live as well as VOD.
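The latency claim is easy to put in concrete terms: one frame of delay is simply the frame period, 1000 / fps milliseconds. The small Python helpers below do that arithmetic alongside the claimed saving; the 6,000 kbps stream used in the example is hypothetical, not a figure from iSIZE.

```python
def frame_latency_ms(fps, frames=1):
    """Added delay, in milliseconds, from buffering `frames` frames at `fps`."""
    return 1000.0 * frames / fps


def bitrate_after_saving(bitrate_kbps, saving):
    """Bitrate left after a fractional saving (0.3 means 30 percent)."""
    return bitrate_kbps * (1.0 - saving)
```

At 25 fps one frame adds 40 ms, at 50 fps 20 ms and at 60 fps about 16.7 ms; a 6,000 kbps stream saving 30 to 50 percent would drop to between 4,200 and 3,000 kbps.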
The company has tested its technology (shown here) against AVC, HEVC and VVC with “substantial savings” in each case.
“Companies with planet-scale streaming services like YouTube and Netflix have started to talk about hitting the tech walls,” says Grce. “Their content is generating millions and millions of views but they cannot adopt a new codec or build new data centres fast enough to cope with such an increase in streaming demand.”
Old problem, new tools
Even MPEG co-founder Leonardo Chiariglione saw the writing on the wall. He left the body in 2019 to found MPAI – Moving pictures, audio and data coding by Artificial Intelligence (AI).
MPAI is an international non-profit organisation whose mission is to develop AI-enabled digital data compression specifications with clear Intellectual Property Rights (IPR) licensing frameworks, unlike MPEG in its latter days.
In 1997 the match between IBM Deep Blue and Garry Kasparov made headlines. Machine beat man.
“As with IBM Deep Blue, old coding tools had a priori statistical knowledge modelled and hardwired in the tools, but in AI, knowledge is acquired by learning the statistics,” Chiariglione says.
“This is the reason why AI tools are more promising than traditional data processing tools. For a new age you need new tools and a new organisation tuned to use those new tools.”