RedShark's only 10 months old, and our readership is growing all the time. So if you're a new arrival here you'll have missed some great articles from earlier in the year.
These RedShark articles are too good to waste! So we're re-publishing them one per day for the next two weeks, under the banner "RedShark Summer Replay".
Here's today's Replay:
This may be the most important stuff you can know about digital video
This is a topic that nothing else makes much sense without. If you don't understand this stuff, you're likely to make mistakes all the time with video - especially now that we're into an era of complex raw workflows and grading. Luckily, you're in safe hands. Phil Rhodes is an expert: he's not just a cinematographer but an engineer as well.
While we’ve talked about dynamic range and bit depth before, it seems that there’s quite a bit of confusion out there regarding the relationship between the absolute dynamic range of a camera and the bit depth of the recording medium. I’ve even heard manufacturers’ sales reps furthering this association, which just goes to show that you shouldn’t get your information from sales reps. Obviously, you should get your information from Red Shark News.
I think the reason for this confusion comes from the fact that both dynamic range and bit depth increase their range by a factor of two every time an additional index is added. Open up a lens by a stop and you double the amount of light getting in; add one bit to a binary number and you double the range of numbers it can encode.
In binary numbers, this is easy to understand. While practical imaging systems such as the ones we use in cinematography use either eight or ten bits to encode brightness, a two-bit word makes for a simpler example. Two bits, each of which may be either 1 or 0, have a total of four unique combinations: 00, 01, 10 and 11. Add another bit, and we can have all of the existing combinations with the new bit at zero, plus all the existing combinations with the new bit at one, for a total of eight. Keep doing this, and you end up with a total of 1,024 combinations for a 10-bit word. Adding one bit doubles the available range of numbers, and therefore makes for more precise greyscale representations.
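The doubling is easy to check for yourself. Here is a minimal Python sketch; the function name bit_patterns is my own invention for illustration, not anything from a camera maker's toolkit.

```python
from itertools import product

def bit_patterns(bits: int) -> list[str]:
    """List every pattern an n-bit word can hold, in counting order."""
    return ["".join(p) for p in product("01", repeat=bits)]

# A two-bit word has four patterns; each extra bit doubles the count.
print(bit_patterns(2))

for bits in (2, 3, 8, 10):
    print(bits, "bits give", len(bit_patterns(bits)), "levels")
```

The count is simply two raised to the power of the number of bits, so eight bits give 256 levels and ten bits give 1,024.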
F-numbers, in terms of photographic exposure, are similarly straightforward. An F-number is the ratio of the focal length to the diameter of the “entrance pupil” – the hole through which the light must pass. As a practical matter this is easy to picture. Looking down a long tube allows less light to hit your eye than looking down a shorter tube of the same diameter, because the shorter tube allows a wider field of view from which light can enter. The F-stops on actual lenses are not evenly numbered because the amount of light entering the lens depends on the area of the entrance pupil, not its diameter. The counterintuitive, fractional numbers are chosen such that the area of the entrance pupil, and thus the amount of light entering and the exposure, doubles for every stop opened up on the lens.
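To see where the odd-looking numbers come from, here is a rough sketch in Python. It assumes an idealised lens in which the light gathered is proportional to the area of the entrance pupil, and therefore to one over the F-number squared. The function names are mine, for illustration only.

```python
import math

def f_number(stop_index: int) -> float:
    """F-number that lies stop_index whole stops down from f/1."""
    return math.sqrt(2) ** stop_index

def relative_light(n: float) -> float:
    """Light gathered relative to f/1; proportional to pupil area, so 1/N^2."""
    return 1.0 / (n * n)

for k in range(9):
    n = f_number(k)
    print(f"f/{n:.2f}  relative light {relative_light(n):.5f}")
```

Each step multiplies the F-number by the square root of two and halves the light. The values engraved on real lenses (1.4, 2.8, 5.6, 11 and so on) are conventional roundings of this series.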
F-stops, therefore, are a measure of relative brightness, whereas bit depth is a measure of precision. These concepts, while both critical to digital cinematography, are strictly speaking unrelated. It would be entirely feasible to consider a camera system with a dynamic range of 20 stops (something we don’t currently have, but would like) which recorded two-bit pictures (which we wouldn’t like). It wouldn’t be terribly usable, much as people occasionally used to shoot on optical-sound origination stock to probably quite similar results. But if it was configured such that the highest binary count of 11 occurred only when the sensor was at maximum level, and the lowest binary count of 00 occurred just above the noise floor, we could correctly describe that as a two-bit picture with a dynamic range of 20 stops.
Now, we might not consider that terribly useful – and we’d be right. But it does serve to demonstrate that while it’s highly desirable to have a decent amount of precision, and the amount of precision required to do a decent job certainly does increase with dynamic range, there is no absolute relationship between the two.
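For the sceptical, that hypothetical camera can be sketched in a few lines of Python. The mapping is my own assumption: exposure is measured in stops above the noise floor and spread evenly across the available codes, with the top code reserved for the maximum level.

```python
import math

def encode(stops_above_floor: float, range_stops: float = 20.0, bits: int = 2) -> int:
    """Quantize an exposure, given in stops above the noise floor, to a code."""
    top = (1 << bits) - 1                        # highest code, 0b11 for two bits
    clamped = min(max(stops_above_floor, 0.0), range_stops)
    # Rounding down means the top code appears only at the maximum level.
    return math.floor(clamped / range_stops * top)

for s in (0, 5, 10, 15, 19.9, 20):
    print(f"{s:>5} stops -> {encode(s):02b}")
```

All twenty stops are represented, from 00 at the noise floor to 11 at clip, but each code has to stand in for several stops of scene brightness. The range is intact; only the precision has gone.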
Managing bit depth and the resulting precision issues is a complex topic, if only because the practicalities are largely manufacturer-specific and sometimes poorly-documented. While real-world situations are never as extreme as twenty stops into two bits, it’s not uncommon to find a fourteen-stop camera with eight bit output, such as the Sony FS700. While, as we’ve seen, there’s nothing stopping anyone encoding a 14-stop signal directly into eight bits, the results of doing so can be difficult to use. Any display device – such as a TFT monitor – will have much less than fourteen stops of contrast range itself, resulting in a flat, low-contrast, greyish and uninteresting picture. Attempts to correct this with standard colour-correction tools will quickly reveal the lack of precision in the signal, stretching out the midtones of the image until banding – strictly, quantization noise – is clearly visible.
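The banding is simple arithmetic. As a rough sketch, suppose the midtones we care about occupy codes 100 to 140 of an 8-bit signal (figures chosen purely for illustration) and a contrast correction stretches them across the full output range. The function name stretch is hypothetical.

```python
def stretch(code: int, lo: int, hi: int, top: int = 255) -> int:
    """Map the input range lo..hi onto 0..top, clipping everything outside."""
    scaled = (code - lo) / (hi - lo) * top
    return round(min(max(scaled, 0), top))

lo, hi = 100, 140
out = [stretch(c, lo, hi) for c in range(lo, hi + 1)]
distinct = sorted(set(out))
steps = [b - a for a, b in zip(distinct, distinct[1:])]

print(len(distinct), "output levels out of a possible 256")
print("jump between neighbouring levels:", min(steps), "to", max(steps), "codes")
```

No amount of stretching creates new levels. The 41 input codes become 41 output codes spaced six or seven apart, and those jumps are what show up as bands in a smooth gradient.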
The popular log encoding is just one of several ways of making better use of the available bit space, compressing highlights (to which our eyes are less sensitive) and expanding shadow and, more modestly, midtone precision. There is no one single log encoding; in practice, it may have little to do with an actual mathematical logarithm, being defined by the manufacturer of the camera (S-log for Sony, Log C for Arri, etc) in a way that they feel most benefits their camera system. Generally, the signal will be processed within the camera at a very high bit depth in such a way that when the bit depth is reduced for recording, each stop of increased exposure increases the binary count by roughly the same amount.
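As an illustration only, here is a pure base-two logarithmic curve in Python. This is not S-log, Log C or any other manufacturer's curve, which differ in their details; the 14-stop range, 10-bit output and function name are all my assumptions.

```python
import math

def log_encode(linear: float, stops: float = 14.0, bits: int = 10) -> int:
    """Encode a scene level (1.0 = clip) so that each stop gets equal codes."""
    top = (1 << bits) - 1
    floor_level = 2.0 ** -stops                  # darkest level we bother to encode
    linear = min(max(linear, floor_level), 1.0)
    position = 1.0 + math.log2(linear) / stops   # 0.0 at the floor, 1.0 at clip
    return round(position * top)

codes = [log_encode(2.0 ** -k) for k in range(15)]   # clip, then one stop down each time
per_stop = [a - b for a, b in zip(codes, codes[1:])]
print(codes)
print("codes per stop:", per_stop)
```

Every stop lands on roughly the same number of codes, which is the property described above. A straight linear encoding would instead give half of all the codes to the single brightest stop.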
While this helps make best use of the available precision in a digital signal, it does create an even flatter, even less interesting picture which must be processed before it looks anything other than completely grey and foggy. This is usually done on set with monitors that are capable of loading a predefined lookup table (LUT) which will process the log signal into something viewable, and in post using software which loads the same LUT. The LUT will usually be either a mathematically-defined transformation from the manufacturer-specific log encoding to an industry standard representation which will look reasonable in most cases, or, in more advanced circumstances, something similar which also includes an approximation of the cinematographer’s intended final grade.
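In its simplest one-dimensional form, a LUT is nothing more than a list indexed by the incoming code value. The sketch below builds a made-up viewing LUT that adds contrast around mid-grey; the curve is a hypothetical stand-in, not any manufacturer's published transform, and production LUTs are frequently three-dimensional so that they can alter colour as well.

```python
def build_lut(size: int = 1024, out_top: int = 255, contrast: float = 2.0) -> list[int]:
    """Hypothetical viewing LUT: 10-bit log code in, 8-bit display value out."""
    lut = []
    for code in range(size):
        x = code / (size - 1)                    # normalise the code to 0..1
        # Symmetric S-curve: darkens shadows, brightens highlights.
        if x < 0.5:
            y = 0.5 * (2 * x) ** contrast
        else:
            y = 1.0 - 0.5 * (2 * (1 - x)) ** contrast
        lut.append(round(y * out_top))
    return lut

def apply_lut(frame: list[int], lut: list[int]) -> list[int]:
    """Process a frame (here just a flat list of codes) by table lookup."""
    return [lut[code] for code in frame]

lut = build_lut()
print(apply_lut([0, 256, 512, 768, 1023], lut))
```

Because the work is a table lookup and not a calculation, the same table can be loaded into an on-set monitor and into the grading software, and both will show the same picture.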
It does not always make sense to use this sort of approach. The much-vaunted Technicolor Cinestyle preset for Canon DSLRs attempts to create something like a log workflow, and while this makes sense where the footage will be postproduced alongside log footage from other sources, it is not always, or even usually, a sensible way to go. DSLRs typically apply heavy compression to their recordings, and embedding that heavy compression in the low-contrast log image can cause problems that are hard to fix later. Recovering viewable pictures from a log recording involves adding a lot of contrast, which will exacerbate greatly any compression artefacts which are present. Log encoding is usually sensible only for uncompressed or lightly-compressed workflows, where lightly-compressed might mean something like a high-bitrate ProRes recording.
Conventional video signals solve the problem of packing many stops of dynamic range into a limited-precision signal differently, by applying what is in effect a general-purpose grade. This typically rolls highlights off using something not unlike the S-curve of film exposure, which is where the knee setting of many video cameras comes into play – it defines the point above which brightness will gradually be compressed to make more room for the more useful and visible midtone information. Standards for the resulting image are defined in well-known documents such as the ITU-R’s Recommendation BT.709, which controls how most real-world cameras do this. While this makes for viewable pictures straight out of the camera, it can limit the gradeability of the picture, and an innovation of the last ten years or so is the option to perform this contrast-range compression more gently. Options referred to as “cine” or “film” in the colorimetry controls of a camera are often an attempt to split the difference between the two approaches, offering some of the superior highlight handling of an unprocessed signal with the easy handling of a gamma-corrected one. This is particularly important to 8-bit cameras such as the Sony FS700 and Canon C300 which would otherwise struggle to adequately record the wide dynamic range signals they’re capable of producing, but are frequently used as part of a workflow that doesn’t easily accommodate log images.
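The idea behind a knee can be sketched as follows. This is a deliberately crude version with a hard corner and a fixed slope, where real cameras bend the curve gradually; the knee point, slope and function name are illustrative assumptions, not figures from BT.709 or any camera menu.

```python
def apply_knee(level: float, knee_point: float = 0.8, slope: float = 0.25) -> float:
    """Compress levels above knee_point; 1.0 is the top of the recorded range."""
    if level <= knee_point:
        return level                             # midtones pass through untouched
    return knee_point + (level - knee_point) * slope

for level in (0.2, 0.5, 0.8, 1.0, 1.3, 1.6):
    print(f"scene {level:.2f} -> recorded {apply_knee(level):.3f}")
```

With these settings, a highlight at 1.6 times the nominal maximum is squeezed down to just fit within the signal, at the cost of flattening everything above the knee point. Lowering the knee point or the slope buys more highlight range at the expense of highlight contrast.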
No less complicated...?
I’ve had to be reasonably brief, but I hope this will clear up a bit of the misunderstanding that often surrounds the way bit-depth works and how we deal with precision issues on both low- and high-end cameras. The situation is, at least, more under the control of the individual than it used to be when we had to deal with film stock manufacturers, processing laboratories and a lot of men in white coats, although you’d be forgiven for thinking that it’s no less complicated.
Here's my article 8 bit or 10 bit: the truth may surprise you