The limitations of human vision and how we see colour are actually an advantage when it comes to storing video. Phil Rhodes demystifies the confusing world of colour subsampling.
Most people who work with digital video will have encountered three digits separated by colons. What's being referred to is a technique of image compression, or at least a reduction in the data required to store a frame, though colour subsampling doesn't discard data in the same way as transform-based compression such as JPEG.
This is the image we'll be using as an example. Notice the various skin tones and the background, which contains green and is generally less saturated
The idea behind subsampling as a compression technique comes from the fact that human beings don't see colour as sharply as they see brightness. Put a red dot in the middle of a green display and we don't see the edge of that dot as sharply as we'd see a black dot on white or even a lighter grey dot on a darker grey. The human eye has two kinds of light-sensitive cells, one of which has three sub-types that can see colour. There are, broadly speaking, far more brightness-sensitive cells than there are colour-sensitive cells.
Beyond the biological factors, at least some part of colour subsampling in digital video was provoked by the techniques of analogue video. Colour was always an add-on to electronic moving pictures, designed for backward compatibility with monochrome TVs. The techniques used to shoehorn colour into the monochrome TV system had limited resolution, so colour TV also had less resolution in colour than brightness.
These decisions were also guided by a need for digital video to match or exceed the performance of analogue systems. Standard-definition digital video has 704 horizontal pixels simply because that's what was needed to match the rate at which an analogue signal could change the brightness of a single line of video. Analogue video can typically change colour at about a third of the rate it can change brightness, so the colour resolution works out to roughly 230 pixels horizontally.
Unfortunately, the desire to treat colour and brightness separately doesn't work with red, green and blue colour channels, each of which contains both brightness and colour information. Happily, there's a way we can mathematically separate the two. What we'll end up with is a greyscale brightness channel and two channels which each indicate how much red or blue to add or subtract from that brightness channel to get back to the original colour.
This is the red colour-difference channel. The skin is brighter than the background, and the red lipstick is brightest of all. Notice also the orange artefact in the background, which contains red
Let's do an example. The brightness channel is easy to understand. It's (roughly) a black and white representation of the picture. If the colour we're trying to represent in a pixel is grey, then the red and blue difference channels sit exactly in the middle. If we're trying to represent red, the red difference channel will be at maximum value, while the blue will be in the middle. If we want cyan, we reduce the red channel; subtracting red from grey creates cyan. If we want blue, we push up the blue channel. If we want yellow, we reduce it.
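The worked example above can be sketched in a few lines of code. This uses the Rec. 601 luma weights as an assumption (the exact coefficients and scaling factors vary between standards), and the colour-difference channels are centred on zero here rather than offset to the middle of a numeric range.

```python
# A minimal sketch of the colour-difference maths, assuming Rec. 601
# luma weights. Cb/Cr are zero for grey, positive for more blue/red.

def rgb_to_ycbcr(r, g, b):
    """Convert normalised RGB (0.0-1.0) to luma plus two
    colour-difference channels."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    cb = (b - y) * 0.564                    # blue minus brightness, scaled
    cr = (r - y) * 0.713                    # red minus brightness, scaled
    return y, cb, cr

# Grey: both difference channels sit at their midpoint (zero here).
print(rgb_to_ycbcr(0.5, 0.5, 0.5))

# Pure red: Cr strongly positive, Cb mildly negative.
print(rgb_to_ycbcr(1.0, 0.0, 0.0))

# Cyan, the opposite of red: Cr strongly negative.
print(rgb_to_ycbcr(0.0, 1.0, 1.0))
```

Running this confirms the pattern described above: grey leaves both difference channels at the midpoint, red pushes Cr up, and cyan pulls it down.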
This is the blue colour-difference channel. The skin is darker than the background, pulling it away from blue, toward yellow. The lips have a lot of blue because the lipstick is pink, not red
The mathematics are pretty straightforward, but there are complications: we now have a system that can describe colours which are 100% bright and 100% red simultaneously. That doesn't match the original RGB image very well, which can only show something 100% red when the red channel is at full power and the green and blue channels are at zero – and that won't create a pixel at 100% brightness. The system we're discussing can't encode exactly the same ranges of brightness and colour as conventional RGB and, therefore, the translation between the two is lossy.
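One quick way to see the lossiness in practice is to quantise the converted channels to 8 bits and convert back. This is a sketch, again assuming Rec. 601 coefficients and, for simplicity, full-range 0–255 values with the difference channels offset to 128 (real video standards use narrower ranges):

```python
# Round-trip RGB -> 8-bit YCbCr -> RGB; the quantisation step means
# we don't always get the exact starting values back.

def rgb_to_ycbcr8(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (b - y) * 0.564
    cr = 128 + (r - y) * 0.713
    return round(y), round(cb), round(cr)   # 8-bit storage loses precision

def ycbcr8_to_rgb(y, cb, cr):
    r = y + (cr - 128) / 0.713
    b = y + (cb - 128) / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return round(r), round(g), round(b)

original = (41, 107, 202)
roundtrip = ycbcr8_to_rgb(*rgb_to_ycbcr8(*original))
print(original, roundtrip)   # the values drift slightly after the round trip
```

Neutral greys survive the round trip exactly, but saturated colours pick up small errors – which is one reason high-end pipelines prefer to stay in RGB, or use more than 8 bits, when repeated conversions are likely.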
The brightness channel is essentially a black and white version of the image
This sort of colour encoding is called YUV, component, YCC or YCbCr, among other things, depending on the circumstances. Regardless, we've disentangled the colour from the brightness. In the digital world we still have three values per pixel, however, so we haven't yet saved any data. What we can do is store the colour-difference channels at a reduced resolution – and that's what things like “4:2:2” are talking about.
4:2:2 subsampling at work. This is blown up 10:1 from an area around the model's camera-left eye
In a 4:2:2 image, for every four pixels of brightness, we store only two pixels of each colour-difference channel. To put it another way, the colour-difference channels are stored at half the horizontal resolution of the brightness channel. When we reconstruct the image, we scale the colour-difference channels back up, and fuzzy human colour sight doesn't notice. Read literally, 4:2:0 would suggest that for every four luminance pixels we store two red colour-difference pixels and no blue colour-difference pixels at all – which wouldn't let us recreate anything like the right colours. In practice it's shorthand for halving the colour resolution in both the horizontal and vertical directions; in effect, 4:2:0 on one line and 4:0:2 on the next.
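The 4:2:2 scheme on a single scanline can be sketched like this: keep every luma sample, keep every other chroma sample, then rebuild the line by repeating each stored chroma value across two pixels. Real codecs use better interpolation than this nearest-neighbour repeat; this is just an illustration.

```python
# A sketch of 4:2:2 on one scanline: luma kept whole, chroma halved
# horizontally, then upscaled by simple sample repetition.

def subsample_422(luma, chroma):
    return luma, chroma[::2]          # only every other chroma sample survives

def reconstruct_422(luma, chroma_half):
    # Nearest-neighbour upscale: each stored chroma value covers two pixels.
    chroma = []
    for c in chroma_half:
        chroma.extend([c, c])
    return luma, chroma[:len(luma)]

luma = [16, 32, 64, 128, 200, 235, 180, 90]
chroma = [100, 110, 120, 130, 140, 150, 160, 170]

y, c = subsample_422(luma, chroma)
print(len(y), len(c))                 # 8 4 – half the chroma data is gone
_, c_up = reconstruct_422(y, c)
print(c_up)                           # [100, 100, 120, 120, 140, 140, 160, 160]
```

Note that the reconstructed chroma is blockier than the original – exactly the softness in colour edges that our eyes largely fail to notice.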
The HDCAM tape format was described as 3:1:1: even the luminance signal was reduced in resolution, to 1440 pixels wide, or three-quarters of the 1920-pixel width of the HD frame. 4:1:1, with quarter-resolution colour horizontally, appears in NTSC DV and some other formats. On computers, codecs such as ProRes may support 4:4:4 recording, with no loss of colour resolution at all. That's useful for green-screen work or selective grading, because while our eyes might not see the lack of colour resolution, the software will.
The separation of brightness and colour has been a crucial part of colour video since the development of colour TV, and it's used in the vast majority of video that's shot today. Many codecs support a variety of different options, which should now make a bit more sense than they did before.
Images accompanying this article were prepared using Rarevision's 5DtoRGB, an application designed to recover subsampled colour using better scaling algorithms than many codecs use. It also has the option to output raw YUV data, which was used here.