RedShark Replay: With H.265 only just becoming common in devices, we revisit this article from 2013 as Phil Rhodes explores the background to HEVC/H.265, and explains what makes it so good at compressing video. Read this if you want to know how almost all video - including 4K - will be delivered in the near future.
Since almost the first days of digital video, there’s been a need to reduce the otherwise unmanageable amount of data that uncompressed video represents. While it’s now relatively straightforward to handle standard-definition, and even high definition, video as a sequence of unadulterated bitmaps, the demand for ever-higher resolution means that efficiently reducing the bitrate of video is going to be an issue for as long as people need to sell televisions.
Compression has been around since the early 90s
Video compression became a mass-market technology in the early 90s, with the release of QuickTime in December 1991 and Video for Windows a year or so later. At the time, the performance of video codecs, in terms of the image quality achieved for a given bitrate, was limited mainly by the performance of the system that decoded them, which invariably meant the CPU of a desktop PC. Video codecs are generally highly asymmetric: encoding takes more work than decoding and can run at many multiples of realtime, but decoding must usually happen in realtime. At the time, Intel’s 486 processor line was ascendant, but with performance limited to perhaps fifty million instructions per second, use of an encoding scheme such as the now-current h.264/MPEG-4 AVC was impractical. Both Video for Windows and QuickTime were initially most often used with Cinepak, a codec based on wildly different techniques to more modern ones, but with the key feature that it was designed to work in what, by modern standards, seem extremely modest circumstances. Decoding 320 by 240 frames at the 150 kilobytes per second of a single-speed CD-ROM drive is something you can absolutely do with h.264, but you couldn’t have decoded h.264 on the CPU of a Sega Mega Drive (er, Genesis, Americans) games console, circa 1990.
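To put those figures in perspective, here is a rough back-of-envelope calculation of how much squeezing a single-speed CD-ROM demanded. The 24-bit colour depth and the 15 frames per second are assumptions made for illustration; neither figure comes from the article.

```python
def uncompressed_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Raw data rate of video stored as a sequence of plain bitmaps."""
    return width * height * bytes_per_pixel * fps


# Single-speed CD-ROM: 150 kilobytes (of 1024 bytes) per second.
CD_ROM_1X = 150 * 1024

# Assumed for illustration: 15 frames per second, 24-bit colour.
raw = uncompressed_bytes_per_second(320, 240, fps=15)
ratio = raw / CD_ROM_1X

print(f"uncompressed: {raw} bytes/s")
print(f"compression needed to fit the drive: about {ratio:.1f}:1")
```

Under those assumptions the codec has to discard all but roughly one byte in every twenty-odd, and do so on a decoder with very little arithmetic to spare.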
Drive for better quality
The drive for better quality for the bitrate, as well as the need for better absolute quality and higher resolution, is nothing new, and has largely advanced in step with the ability of ever-improving hardware to handle more elaborate codec techniques. From the late 80s, approaches that are recognisably the technological forerunners of current codecs began to emerge, particularly h.261 in 1988, which was designed to stream video over ISDN lines from 64Kbps upward. Through the last decade or so, and through ever-increasing H-numbers (which come from ITU-T Recommendation numbers), the performance of video codecs has improved more or less alongside the ability of affordable electronics to decode them. This is good, given the explosive success of video-on-demand services and the resulting pressures placed on internet and cellular network bandwidth. One would be forgiven for assuming, with maximum cynicism and misanthropy, that the work involved in all this improvement is being done mainly so that people can send us lots more advertising without having to upgrade their technology. Either way, it’s now clear that the internet and various riffs on video over IP technology are what’s going to provide the video-on-demand experience that’s been discussed since the 80s, even if the people who developed the protocols on which the internet runs probably didn’t foresee this use.
The successor to h.264 is, cunningly, h.265, an ITU standard based on the High Efficiency Video Coding (HEVC) system that is just a fraction away from being approved at the time of writing. As we’ve seen, the complexity, and therefore the effectiveness, of video codecs is largely controlled by the performance of the devices on which they will be replayed. HEVC leverages improvements in the performance of consumer electronics, aiming to achieve the same image quality at half the bitrate of h.264 through the application of more advanced image-encoding techniques, while being no more than three times harder to decode than High-profile h.264. On the face of it, this seems like a questionable deal, with a tripling of requirements for only a doubling of performance, but with Arm and Intel currently competing quite effectively to create electronics that do more work for less money and less electricity, that seems like a rather ungenerous criticism.
Recent improvements in those consumer devices are what have made ever more complex and effective codecs feasible. HEVC is recognisably a development of h.264, and as the overview to the specification states, no single change accounts for much more of the improvement in performance than any other.
Variable block size
This is inevitably a partial discussion, as the complete HEVC specification is neither short nor simple. Perhaps most fundamentally, though, HEVC does not start by breaking the image up into squares of equal size, as had been the case in previous standards. Instead, it has the flexibility to select the block size to maximise the effectiveness of its other techniques, depending on the image content. The block size itself is, optionally, larger (up to 64 pixels square, as opposed to 16), which makes for additional efficiencies, especially on the more CPU-intensive profiles of HEVC which require the use of larger blocks. Beyond this, HEVC has the option to break these larger blocks down into smaller ones, treating the smaller blocks individually with regard to the encoding techniques that are used to compress them.
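The idea of starting with a large block and subdividing only where the picture is busy can be sketched in a few lines. This is an illustration, not HEVC's actual decision process: the variance test, the threshold of 100 and the 8-pixel minimum are all assumptions chosen for the example, and a real encoder would weigh bitrate against distortion instead.

```python
def variance(block):
    """Pixel variance of a 2D block (list of rows)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def split_quadtree(image, x, y, size, min_size=8, threshold=100.0):
    """Return the (x, y, size) leaf blocks covering the square at (x, y).

    A block is kept whole if it is already at the minimum size or is
    flat enough; otherwise it is split into four quarters, recursively.
    """
    block = [row[x:x + size] for row in image[y:y + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_quadtree(image, x + dx, y + dy, half,
                                     min_size, threshold)
    return leaves


# A 64x64 test picture: flat grey, with a busy 16x16 checkerboard corner.
image = [[128] * 64 for _ in range(64)]
for yy in range(16):
    for xx in range(16):
        image[yy][xx] = 255 if (xx + yy) % 2 else 0

leaves = split_quadtree(image, 0, 0, 64)
for leaf in leaves:
    print(leaf)
```

The flat areas stay as large blocks, while only the checkerboard corner is broken down to the minimum size, which is the behaviour the variable block size is meant to allow.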
Working within a frame
When working on I (intra) frames, which don’t require reference to any other frame when decoding, HEVC offers a wider variety of tools for predicting the content of one block from neighbouring blocks within the frame. H.264, for instance, allowed prediction in any one of eight directions; HEVC provides 33, with a clever concentration of those angles around the horizontal and vertical where real-world pictures are statistically likely to have a lot of similarity.
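Directional prediction is easier to picture with a toy version. The sketch below offers only three modes (straight down, straight across, and a flat average), which is far fewer than either codec provides, and picks whichever leaves the smallest error. The mode names, the sum-of-absolute-differences cost and the sample data are assumptions made for the example.

```python
def predict(mode, top, left, size):
    """Build a size x size prediction from the row above and the column to the left."""
    if mode == "vertical":      # copy the row above straight down
        return [list(top) for _ in range(size)]
    if mode == "horizontal":    # copy the left-hand column straight across
        return [[left[y]] * size for y in range(size)]
    if mode == "dc":            # flat block at the average of the neighbours
        dc = round((sum(top) + sum(left)) / (2 * size))
        return [[dc] * size for _ in range(size)]
    raise ValueError(f"unknown mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_mode(block, top, left):
    """Try every mode and return the cheapest, plus all the costs."""
    size = len(block)
    costs = {m: sad(block, predict(m, top, left, size))
             for m in ("vertical", "horizontal", "dc")}
    return min(costs, key=costs.get), costs


# A 4x4 block of vertical stripes, with matching pixels in the row above.
top = [10, 50, 90, 130]
left = [10, 10, 10, 10]
block = [[10, 50, 90, 130] for _ in range(4)]

mode, costs = best_mode(block, top, left)
print(mode, costs)
```

For the striped block, copying the row above predicts the content exactly, so only the choice of mode needs to be stored. More available directions mean more kinds of picture detail that can be predicted this cheaply.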
Motion compensation is a common technique in modern codecs, allowing the re-use of image data which may simply have moved around the frame due to camera motion. With respect to B (bidirectional) and P (predictive) frames, which are assembled with reference to picture data from nearby frames, HEVC provides greater precision than h.264, more accurate processing and a larger range (which is useful for higher-resolution video) when describing where a block of picture may have moved to.
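The search for where a block has moved from can be sketched as a brute-force comparison against the previous frame. This version tries whole-pixel offsets only, whereas real codecs also search fractional positions and use much smarter search strategies. The function names, the search range of three pixels and the synthetic frames are assumptions for illustration.

```python
def sad_at(cur, ref, bx, by, dx, dy, size):
    """Error between the block at (bx, by) in cur and the block offset by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def find_motion_vector(cur, ref, bx, by, size=4, search=3):
    """Exhaustively test every offset within +/- search pixels.

    Returns (cost, dx, dy) for the best match found in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad_at(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame: a simple gradient. Current frame: the same picture
# moved two pixels right and one pixel down, as if the camera had panned.
ref = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]

cost, dx, dy = find_motion_vector(cur, ref, bx=4, by=4)
print(f"best match at offset ({dx}, {dy}) with error {cost}")
```

The vector points back to where the block came from, so instead of the pixels themselves the encoder need only store the offset and whatever small error remains.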
This is where it gets a bit complicated
Moving on to the topic of really complicated mathematics, h.264 allowed either of two final, lossless compression techniques based on entropy coding to be applied to the output of the discrete cosine transform used to compress actual image data. Without turning this article into a pure mathematics lecture, these techniques (CAVLC and CABAC, if you want to look them up) offered a choice of efficiency against CPU horsepower, with CABAC being more effective but considerably harder work to decode. This became something of a dividing issue in h.264, with the Baseline and Extended profiles offering only the less effective CAVLC option. Some early video iPod devices supported only these lower profiles, creating an unpalatable choice between performance and bandwidth for distributors. HEVC, on the other hand, requires the more effective CABAC scheme in all cases.
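Neither CAVLC nor CABAC fits in a short example, but the reason entropy coding pays off at all can be shown with a little arithmetic. The sketch below calculates the theoretical minimum average number of bits per symbol (the Shannon entropy) for a made-up run of quantised coefficients that is mostly zeros, and compares it with a naive fixed-length code. The coefficient values are invented for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy: the lower bound on average bits per symbol
    for any lossless code that treats the symbols independently."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical quantised transform output: mostly zeros, a few small values.
coeffs = [0] * 12 + [1] * 2 + [-1] + [3]

ideal = entropy_bits_per_symbol(coeffs)
# A fixed-length code needs enough bits to tell the distinct values apart.
fixed = math.ceil(math.log2(len(set(coeffs))))

print(f"fixed-length code: {fixed} bits per symbol")
print(f"entropy bound:     {ideal:.3f} bits per symbol")
```

The more lopsided the statistics, the bigger the gap between the two figures. Broadly speaking, the extra work CABAC does goes into tracking those statistics as they change, so that it can code closer to the bound.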
Finally, both h.264 and HEVC provide a couple of types of filtering which correct the output of all the previous operations towards a more ideal result. An in-depth discussion of these filters is beyond the scope of this article, although it’s worth making it clear that these are a lot smarter than simply blurring errors and do make use of information about the picture to clean things up in an intelligent and accurate manner.
Optimised for multiple core processing
There are other changes in HEVC, mainly aimed at making it easier to implement in a parallel processing environment (effectively, one where many CPU cores are available to do work simultaneously). These, however, aren’t aimed at improving image quality (in fact, a couple of them may fractionally decrease it) and are mainly intended to make it possible to divide up the work of decoding HEVC on a finer level than was possible with h.264.
So, that’s an overview, albeit a very broad one, of what HEVC is about. The question of whether it achieves its fifty per cent bitrate reduction goal for the same picture quality is a difficult one, because objectively measuring video quality as perceived by humans is, strictly speaking, impossible. The industry has been trying to move away from a simple ratio of signal to noise, as it has been recognised that such a figure can suggest unreasonably good results in some circumstances. Wavelet codecs, such as the JPEG-2000 algorithm used by Red in their cameras, are notorious for producing great signal to noise figures while not necessarily looking as good as the S/N statistic would suggest. Real world tests of image quality tend to involve both widely-agreed mathematical algorithms and the results of subjective analysis by real humans, averaged in an attempt to remove individual bias.
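For reference, the usual form of that signal to noise figure in video work is peak signal-to-noise ratio (PSNR), and it is simple enough to sketch. That PSNR is the measure meant above is an assumption, as are the 8-bit peak value of 255 and the sample pixels.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in decibels between two equal-sized images."""
    pairs = [(a, b) for ro, rd in zip(original, decoded)
             for a, b in zip(ro, rd)]
    # Mean squared error across every pixel.
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 102]]

print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Every pixel error is squared and averaged with no regard to where in the picture it falls or whether it changes from frame to frame. That is why a good PSNR score does not guarantee a picture people will judge to look good.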
Does it succeed?
In reasonable tests, then, HEVC does approach its goal of 50% bitrate reduction. At half the bitrate, individual HEVC frames may look less sharp than h.264 frames, but HEVC moving image sequences tend to do better in human-observer tests because they show far fewer temporal artefacts. Video compressed with HEVC flickers less, suffers less from trembling blocks of image, and generally looks better on the move.
When we’ll start to see it deployed in acquisition devices - cameras - is anyone’s guess, but software products and hardware encoders for the mobile content market are already available. To put this in perspective, digital terrestrial TV in the UK (for example) is still mainly MPEG-2!
Read our article on Video Data Rates if you want to understand why we really do need compression!