Streaming delays are a major problem to overcome

Written by Phil Rhodes

Shutterstock - TZIDO SUN

Live streaming has opened up broadcasting opportunities across the board. But latency issues can throw a spanner into the works, and it's not an easy problem to solve.

Anyone who's watched a Twitch stream will be keenly aware of latency, delay and lag. Back in the day, when we executed dissolves by pushing on a T-bar, things happened instantly. They had to, because the technology to buffer large amounts of video didn't really exist. Devices such as the seminal Panasonic WJ-MX50 vision mixer, a favourite of video playback operators on set during the last days of widespread film production, had a single frame's worth of memory on each input so that they could synchronise all of the incoming frames for mixing. The maximum delay was therefore a single frame, and things felt snappy and responsive.
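The one-frame figure is easy to put into milliseconds. The sketch below is a hypothetical, much-simplified model of a mixer input stage with one frame store per input (the class and method names are invented for illustration, not taken from any real product): each input overwrites its store whenever its own clock delivers a frame, and the mixer reads every store once per reference frame. A frame that lands just after a read waits almost one full frame period, so that period is the worst-case added delay.

```python
from fractions import Fraction


def frame_period_ms(fps: Fraction) -> float:
    """Duration of one video frame in milliseconds."""
    return float(Fraction(1000) / fps)


class OneFrameSynchroniser:
    """Hypothetical model: one frame store per mixer input."""

    def __init__(self, num_inputs: int, fps: Fraction):
        self.stores = [None] * num_inputs
        self.fps = fps

    def write(self, input_index: int, frame) -> None:
        # A newer frame simply overwrites the stored one; if the mixer had
        # not read the old one yet, that frame is dropped.
        self.stores[input_index] = frame

    def read_for_mix(self) -> list:
        # Called once per reference frame: one aligned frame per input.
        return list(self.stores)

    def worst_case_delay_ms(self) -> float:
        # A frame written just after a read waits up to one reference period.
        return frame_period_ms(self.fps)


if __name__ == "__main__":
    for rate in (Fraction(25), Fraction(30000, 1001)):
        sync = OneFrameSynchroniser(num_inputs=4, fps=rate)
        print(f"{float(rate):.3f} fps: up to {sync.worst_case_delay_ms():.1f} ms")
```

At 25 frames per second that is 40 ms, and at 29.97 it is about 33 ms: far too short for anyone to notice in conversation.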

Ask a Twitch streamer a question, though, and while they'll see the text message almost immediately, their video response might not reach you for tens of seconds. In traditional terms, that's a lot of video to buffer, and it's only possible because modern technology gives us such huge resources. That delay is the sum of many smaller ones. The webcam itself takes time to process the image and probably compresses it before sending it down the USB bus to the host computer. The USB bus adds some buffering of its own, and the computer will almost certainly decompress the image and do some processing on it, especially if we're generating a Twitch stream with the game full screen and the player chroma-keyed into one corner.

Then it has to be recompressed and sent to (say) Twitch or Livestream, where it will almost certainly be decompressed and recompressed yet again to create streams suited to different devices on internet connections of varying capability. Even then, the lion's share of the delay comes from network buffering: several devices in the chain between one computer and another may each store up a little video data so that, if there's an unexpected interruption, there's enough spare audio-visual data to cover the gap. Sometimes it's possible to tweak these buffer settings, but it's easy to end up on a knife-edge of reliability.
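The knife-edge can be shown with a toy model of a buffering player. This is a minimal sketch under stated assumptions, not how any particular platform or player works: video arrives in fixed-length chunks, each chunk can only be sent once it is complete, the player waits a chosen startup time after the first chunk arrives, and playback freezes whenever the next chunk hasn't turned up. The function name and the network delays are hypothetical.

```python
def simulate_playback(net_delays_s: list[float], chunk_s: float,
                      startup_s: float) -> tuple[int, float, float]:
    """Toy model of a buffering stream player.

    Chunk k holds media from k*chunk_s to (k+1)*chunk_s. It can only be sent
    once it is complete, so it arrives at (k+1)*chunk_s plus its network delay.
    The player starts startup_s after the first chunk arrives, plays chunks in
    real time and freezes whenever the next chunk is not there yet.

    Returns (stall_count, latency_at_start_s, latency_at_end_s), where latency
    is how far playback trails capture.
    """
    arrivals = [(k + 1) * chunk_s + d for k, d in enumerate(net_delays_s)]
    clock = arrivals[0] + startup_s      # media time 0 is shown at this moment
    start_latency = clock
    stalls = 0
    for t in arrivals:
        if t > clock:                    # buffer ran dry: freeze until it arrives
            stalls += 1
            clock = t
        clock += chunk_s                 # play this chunk out in real time
    end_latency = clock - len(arrivals) * chunk_s
    return stalls, start_latency, end_latency


if __name__ == "__main__":
    # Hypothetical delays: a steady 0.5 s, with one 3 s hiccup on chunk 2.
    delays = [0.5, 0.5, 3.0, 0.5]
    for startup in (0.5, 2.5):
        print(startup, simulate_playback(delays, chunk_s=1.0, startup_s=startup))
```

With these made-up numbers, the small buffer starts with 2 seconds of latency but freezes once when the hiccup arrives, after which playback trails capture by 4 seconds anyway. The larger buffer never freezes, but it is 4 seconds behind from the start. Trimming buffers buys lower delay only for as long as the network behaves.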

It's a big problem

Does this matter? Sure. It means that truly interactive productions are difficult or impossible. Nobody is ever going to be able to do a phone-in show on a platform with a 20-second round-trip time. Shorter delays are a problem too: the awkward several-second pause between a news presenter's question and the field reporter's response is a familiar example. Even sub-second delays cause trouble; it's very difficult to speak clearly when one's own voice comes back half a second late, perhaps via a comms headset.
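A round-trip figure like that is just arithmetic on the chain of stages described earlier: the delays in each direction add up, and an interactive exchange needs both directions. The sketch below treats the chain as a simple latency budget. The stage names follow the article, but every number is a placeholder chosen only to illustrate the sums, not a measurement of any real webcam, platform or network.

```python
def one_way_latency_s(stages: dict[str, float]) -> float:
    """Total delay in one direction: the stages simply add up."""
    return sum(stages.values())


def biggest_stage(stages: dict[str, float]) -> str:
    """Name of the stage contributing the most delay."""
    return max(stages, key=stages.get)


def round_trip_s(outbound: dict[str, float], inbound: dict[str, float]) -> float:
    """Question out plus answer back; an interactive show needs both legs."""
    return one_way_latency_s(outbound) + one_way_latency_s(inbound)


# Placeholder figures in seconds, chosen only to illustrate the arithmetic.
# They are NOT measurements of any real device, platform or network.
video_path = {
    "webcam processing and compression": 0.1,
    "USB buffering": 0.02,
    "host decode, chroma key and re-encode": 0.5,
    "platform transcode": 2.0,
    "network and player buffering": 8.0,
}

if __name__ == "__main__":
    print(f"one way: {one_way_latency_s(video_path):.2f} s")
    print(f"largest contributor: {biggest_stage(video_path)}")
    # Two legs through similar chains roughly double the wait.
    print(f"round trip: {round_trip_s(video_path, video_path):.2f} s")
```

With these placeholders the round trip comes to roughly 21 seconds, and shaving the capture-side stages barely moves it while the network buffering term dominates.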

It's also a very difficult problem to solve. Our ability to send video over links of very limited performance, such as a cell phone network, relies entirely on compression. The compression we tend to use for long-distance and live work almost always relies on inter-frame techniques, which take advantage of the similarities between adjacent frames to achieve (much) better efficiency. That's fine, but if the encoder considers the changes over (say) the next six frames, it has to wait for all six to arrive before it can finish encoding the first, so there will always be a minimum of six frames' delay in the encoder, regardless of any other special engineering we might choose to do. Larger frame groupings generally compress better, so quality and delay are opposite sides of the same engineering compromise.
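One concrete source of that wait is frame reordering in the classic MPEG-style group of pictures. In that structure, I and P frames only refer back to earlier frames, but a B frame also refers to the next I or P frame in display order, so the encoder can't code it until that later frame has arrived. The sketch below counts that wait for a GOP pattern written as a string. The function names are hypothetical, trailing B frames are assumed to wait for the first frame of the next GOP, and it ignores other buffering such as rate-control lookahead, which adds further delay in real encoders.

```python
def reorder_wait_frames(gop: str) -> list[int]:
    """For each frame of a display-order GOP pattern such as "IBBPBBP",
    return how many frame periods the encoder must wait for future frames
    before it can code that frame.

    I and P frames only reference earlier frames, so they can be coded on
    arrival. A B frame also references the next I or P frame in display
    order. A trailing run of B frames is assumed to wait for the first
    frame of the next GOP.
    """
    waits = []
    for i, kind in enumerate(gop):
        if kind in "IP":
            waits.append(0)
            continue
        next_anchor = next(
            (j for j in range(i + 1, len(gop)) if gop[j] in "IP"), len(gop)
        )
        waits.append(next_anchor - i)
    return waits


def min_encoder_delay_s(gop: str, fps: float) -> float:
    """Lower bound on encoder delay from frame reordering alone."""
    return max(reorder_wait_frames(gop)) / fps


if __name__ == "__main__":
    for pattern in ("IPPPPPP", "IBBPBBP", "IBBBBBBP"):
        frames = max(reorder_wait_frames(pattern))
        delay = min_encoder_delay_s(pattern, 25)
        print(f"{pattern}: {frames} frames, {delay:.3f} s at 25 fps")
```

A P-only pattern adds no reordering wait, while a run of six B frames forces the six-frame minimum described above (0.24 s at 25 fps). That is one reason low-latency encoder settings tend to avoid B frames, at some cost in efficiency.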

There's no immediate prospect of a fix for most of this. It'll be a chilly day in the devil's hometown before the internet gets fast enough to send uncompressed video, or reliable enough to do it with minimal or no buffering. Still, a bit more engineering attention can help a lot, as we saw when Blackmagic released a firmware update for the URSA Mini that reduced viewfinder lag, where instants matter. Beyond that, half a second or a second in an encoder is one thing; taking 20 seconds to stream images of someone playing a game is quite another, and it's probably possible to do better. Let's hope more work gets done on this.
