Streaming delays are a major problem to overcome

Written by Phil Rhodes

Shutterstock - TZIDO SUN

Live streaming has opened up broadcasting opportunities across the board. But latency issues can throw a spanner in the works, and it's not an easy problem to solve.

Anyone who's watched a Twitch stream will be keenly aware of the problems of latency, delay and lag. Back in the day, when we executed dissolves by pushing on a T-bar, things happened instantly. They had to, because the technology to buffer large amounts of video didn't really exist. The seminal Panasonic WJ-MX50 vision mixer, a favourite of video playback people on sets in the last days when film was still widely used, had a single frame's worth of memory on each input so that it could synchronise all of the incoming frames for mixing. The maximum delay was, therefore, a single frame, and things felt snappy and responsive.

Ask a Twitch streamer a question, though, and while they'll see the text message almost immediately, their video response won't get back to you for perhaps tens of seconds. In the traditional sense, that's a lot of video to buffer, and it's only possible because modern technology gives us such huge resources. The total is a whole pile of smaller delays. The webcam itself takes time to process the image and probably compresses it before sending it down the USB bus to the host computer. The USB bus has a certain amount of buffering in it. Then the computer will almost certainly decompress the image and do some processing on it, especially if we're generating a Twitch stream with the game full screen and the player chromakeyed into one corner.

Then it has to be recompressed and sent to (say) Twitch or Livestream, at which point it will almost certainly be decompressed and recompressed yet again to create streams suitable for different devices on internet connections of different capabilities. Even then, the lion's share of the delay comes from network buffering, where several devices in the chain of electronics between one computer and another might each store up a bit of video data so that, if there's an unexpected interruption, there's enough spare audio-visual data to cover the gap. Sometimes it's possible to tweak the settings to reduce that buffering, but it's easy to end up on a knife-edge of reliability.
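To make the shape of that chain concrete, here is a minimal sketch in Python that adds up a latency budget stage by stage. The stage names follow the description above, but every figure is a made-up placeholder chosen only to illustrate the arithmetic, not a measurement of Twitch or any other platform, and the names `STAGES`, `total_latency` and `largest_stage` are hypothetical.

```python
# Illustrative latency budget for a live stream, in seconds.
# ASSUMPTION: every number here is a placeholder for illustration only;
# none of them is a measurement of any real device or service.
STAGES = [
    ("webcam processing and compression", 0.05),
    ("USB transfer buffering", 0.02),
    ("decode and compositing on the host", 0.05),
    ("re-encode for upload", 0.5),
    ("platform transcode", 2.0),
    ("network and player buffering", 12.0),
]


def total_latency(stages):
    """Sum the per-stage delays to get the end-to-end delay."""
    return sum(seconds for _, seconds in stages)


def largest_stage(stages):
    """Return the (name, seconds) pair that contributes the most delay."""
    return max(stages, key=lambda stage: stage[1])


if __name__ == "__main__":
    total = total_latency(STAGES)
    for name, seconds in STAGES:
        # Show each stage's delay and its share of the total.
        print(f"{name:40s} {seconds:6.2f} s  {seconds / total:6.1%}")
    print(f"{'total':40s} {total:6.2f} s")
    print("largest contributor:", largest_stage(STAGES)[0])
```

With placeholder figures like these, the buffering stage dwarfs everything else, which is why trimming a few milliseconds from the camera or the USB link does little for the viewer unless the buffering is reduced too.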

It's a big problem

Does this matter? Sure. It means that truly interactive productions are difficult or impossible. Nobody is ever going to be able to do a phone-in show on a platform with a 20-second round-trip time. Shorter delays cause trouble too: the awkward several-second pause between a news presenter's question and the field reporter's response is a familiar sight. There are even issues with sub-second delays: it's very difficult to speak clearly when one's own voice is audible with a half-second delay, perhaps via a comms headset.

It's also a very difficult problem to solve. The ability we now have to send video over links of very limited performance, such as a cell phone network, is entirely reliant on compression. The sort of compression we tend to use for long-distance and live work invariably uses inter-frame compression, taking advantage of the similarities between adjacent frames to achieve (much) better performance. That's fine, but it does mean that if the encoder considers the changes over (say) six frames, most techniques ensure that there will always be an absolute minimum of six frames of delay in the encoder, regardless of any other special engineering we might choose to do. Better compression is available by using larger frame groupings, so quality and delay are opposite sides of the same engineering compromise.
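As a rough worked example of that floor, the sketch below converts a number of frames into milliseconds. The six-frame figure comes from the paragraph above; the frame rates are assumptions picked for illustration, and `min_encoder_delay_ms` is a hypothetical helper, not part of any encoder's API.

```python
def min_encoder_delay_ms(frames_considered, frames_per_second):
    """Lower bound on encoder delay, in milliseconds.

    If the encoder has to see `frames_considered` frames before it can
    emit anything, it must wait at least that many frame periods.
    Real encoders add further delay on top of this floor.
    """
    if frames_considered < 0 or frames_per_second <= 0:
        raise ValueError("need frames >= 0 and a positive frame rate")
    return frames_considered * 1000.0 / frames_per_second


if __name__ == "__main__":
    # ASSUMPTION: 25, 30 and 60 fps are example frame rates only.
    for fps in (25, 30, 60):
        delay = min_encoder_delay_ms(6, fps)
        print(f"6 frames at {fps} fps: at least {delay:.0f} ms")
```

At 25 frames per second, six frames is already 240 ms before the data has left the encoder, and doubling the frame grouping doubles that floor.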

There's no immediate prospect of a fix for most of this. It'll be a chilly day in the devil's hometown before the internet gets fast enough to send uncompressed video, or sufficiently reliable to do it with minimal or no buffering. Still, a bit more engineering attention to this sort of thing can help a lot, as we saw when Blackmagic released a firmware update for the Ursa Mini which reduced the lag in the viewfinder, where instants matter. A second or half a second in an encoder is one thing; taking 20 seconds to stream images of someone playing a game is quite another, and it's probably possible to do better. Let's hope more work gets done on this.

