
All codecs share the same DNA


Blackmagic Design

It's extremely difficult to develop a new codec that truly transforms the fundamentals of video compression. Here's why.

Another day, another new video codec. Or, if we look just a little below the surface, perhaps a not-so-new video codec.

Less is more. Or is it the opposite?

The development I'm talking about here is called MagicYUV and it's promoted as a way to store 4K video with true lossless compression, to be carefully differentiated from “visually lossless” compression, which is still lossy compression. Benchmarks given on the MagicYUV site suggest that it is significantly faster than either of the two most commonly encountered lossless codecs. Comparisons include UT Video, which I talked about back in 2018, and Lagarith, which is essentially a derivative of the now-venerable HuffYUV.

With flash getting bigger and faster all the time, even lossy codecs are now capable of creating near-lossless results

And yes, all of these things use fairly similar underlying technology, which is what I'm really here to discuss. The reality is that, despite the marketing efforts of various companies, there are really only two common ways to do conventional, lossy compression on video and only a couple of ways to do properly lossless compression on any type of data. The canonical example of lossy compression is the discrete cosine transform, which is used in at least (deep breath) JPEG images and the MJPEG video codec, DV, DVCAM and DVCPRO-HD tape formats, DNxHD, ProRes, and at least some part of every incarnation of MPEG, including H.264 and H.265, though they use many other techniques besides.

DCT coding

DCT coding was introduced in the mid-70s for image compression. It works by treating the brightness values of the pixels in the image as a graph forming a wiggly line, then moving that graph into the frequency domain: approximating it by adding together various proportions of cosine waves at different frequencies. The compression comes from storing how much of each wave to use with less precision. The other common way to do lossy compression on images is related: the discrete wavelet transform (sometimes DWT). It works in broadly the same way, except that instead of using cosine waves, DWT uses other types of mathematically generated curves, and it often works a bit better.
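To make that concrete, here's a minimal Python sketch of the idea, not the code from any real codec. It runs a one-dimensional DCT over a single row of eight brightness values (real codecs work on two-dimensional blocks), then stores each cosine weight with less precision by dividing it by a step size and rounding. The sample values, the step size of 10 and the function names are all assumptions made up for illustration.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: express the samples as weights of cosine waves."""
    n = len(samples)
    weights = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            s * math.cos(math.pi * (i + 0.5) * k / n)
            for i, s in enumerate(samples)
        )
        weights.append(scale * total)
    return weights


def idct(weights):
    """Inverse of dct(): add the cosine waves back together."""
    n = len(weights)
    samples = []
    for i in range(n):
        total = 0.0
        for k, w in enumerate(weights):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * w * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def quantize(weights, step):
    """The lossy part: keep each weight only to the nearest multiple of step."""
    return [round(w / step) for w in weights]


def dequantize(levels, step):
    """Scale the stored integers back up to approximate weights."""
    return [q * step for q in levels]


# Hypothetical row of eight brightness values and an arbitrary step size.
row = [52, 55, 61, 66, 70, 61, 64, 73]
STEP = 10

levels = quantize(dct(row), STEP)
restored = [round(v) for v in idct(dequantize(levels, STEP))]

print("stored levels:", levels)
print("original row: ", row)
print("restored row: ", restored)
```

The rounded weights for the higher frequencies tend to come out small or zero, which is what makes the result cheap to store. A bigger step size throws away more detail and a smaller one keeps more, which is essentially what a quality setting controls.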

And that's really it. If we consider things like the MPEG codecs, then there are a lot more techniques to discuss which deal with the differences between frames and things become a lot more complicated, though many of those techniques are also common to the H-series designs and things like VP8 and VP9. For most acquisition codecs, though, the lion's share of the work is done by one of two bits of mathematics and the differences between them are in how that's implemented. Let’s be clear: DNxHD and ProRes are not the same codec, but they’re as comparable as, say, a Hershey bar and a Cadbury bar, which are not the same thing but are both fundamentally chocolate.


The square edges visible in the blown-up area are the edges of macroblocks, within which the DCT algorithm is run

Similarly, there are only a few ways to compress data without loss. Arithmetic coding is often used essentially as a post-processing step after the application of something like DCT, which may still leave redundancy in the output data. Look for options such as CABAC (context-adaptive binary arithmetic coding) in H.264 encoders, usually offered alongside the simpler CAVLC (context-adaptive variable-length coding), which is not arithmetic coding but does the same job less efficiently. Asymmetric numeral systems (ANS) are another, perhaps better, approach that I discussed about a year ago. By far the best-established approach to losslessly compressing video, however, is Huffman tree encoding, which brings us back to MagicYUV and its various aunts and uncles.

Huffman encoding processes data to represent the most common combinations of bits with a short code and rarer combinations with a longer code, reducing overall size while still being able to precisely reconstruct the original data; the “tree” often referred to describes the way in which that’s worked out. Anyway, the conclusion here is not that there’s any reason to be critical of people for reusing these mathematical concepts, nor that there’s anything wrong with any particular piece of software.
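A minimal Python sketch of that idea follows, working on whole bytes rather than arbitrary bit patterns. It's an illustration of textbook Huffman coding, not how MagicYUV, HuffYUV or any other codec actually implements it; the sample data and the function names are invented for the example.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Build a {byte value: bit string} table; common values get short codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a code at least one bit long.
        return {next(iter(freq)): "0"}
    # Each heap entry is (weight, tiebreak, {symbol: code so far}).
    # The unique tiebreak stops Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two rarest subtrees: this is the "tree" being grown.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(data, codes):
    """Replace every byte with its code (kept as a string of 0s and 1s)."""
    return "".join(codes[b] for b in data)


def decode(bits, codes):
    """Read bits until they match a code; no code is a prefix of another."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return bytes(out)


# Hypothetical run of 20 pixel values dominated by one brightness level.
sample = bytes([16] * 12 + [17] * 5 + [18] * 2 + [235])
codes = huffman_codes(sample)
bits = encode(sample, codes)

print("code table:", codes)
print("bits used: ", len(bits), "instead of", 8 * len(sample))
print("lossless:  ", decode(bits, codes) == sample)
```

In the sample, the most common value ends up with a one-bit code while the rarest values get three bits, and decoding gives back exactly the bytes that went in. A real codec would also have to store the code table alongside the data, which eats into the saving on short runs like this one.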

Might we one day see Huffman-style compression here?

The beauty is in the implementation. If MagicYUV is faster than some competing item, then that’s likely to be down to the way it’s been written rather than any massive innovation in the fundamentals. If you’re the developer and this isn’t true, do let us know; with flash capacity skyrocketing, it’s quite possible that mathematically lossless compression may become more practical for acquisition in the near future. In the end, the truth is that, say, ProRes is useful, but not because there’s anything spectacularly clever about the underlying technology. There isn’t and there never was, but the details of the implementation and the level of compatibility matter hugely.
