Sometimes I yearn for the old days. Nostalgia can be rose-tinted but I don’t seem to remember a problem with A/V synchronisation and technology from the 1990s or before. Things were much simpler then – no streaming services, no user-generated content and only a handful of manufacturers who made domestic TVs, most of them being CRTs.
Roll on a couple of decades and things have got a lot more complicated: millions of channels on YouTube, thousands of TV channels across the globe and a slew of TV manufacturers, some of them established names and some that have launched within the last five years. All these choices and all these new ways to consume content: TV, laptop, tablet and even VR headset. It’s no wonder that there are sometimes technical gremlins. One that seems to have got worse over time is proper lip-sync, or A/V sync. This is a complex problem: maintaining proper sync from camera to screen is more difficult than it first appears.
Let’s break it down here:
The Camera - Silly as it sounds, the camera needs to record the audio and the video in sync, either from an internal microphone or an external source. This is usually not a problem if the mics are hard-wired in, but wireless ones can have issues. It's not unusual to find latency of more than half a frame with some consumer-grade wireless systems, at which point it becomes quite noticeable. Of course, you can also record separate-system audio, which requires synchronisation when you come to edit.
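To put "half a frame" into numbers, here is a minimal sketch that converts a latency in milliseconds into video frames. The function names and the 20 ms example latency are illustrative assumptions, not measurements of any particular wireless system.

```python
def frame_duration_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Express an audio latency (in milliseconds) as a number of video frames."""
    return latency_ms * fps / 1000.0


if __name__ == "__main__":
    example_latency_ms = 20.0  # assumed figure, purely for illustration
    for fps in (24, 25, 30, 50, 60):
        frames = latency_in_frames(example_latency_ms, fps)
        print(f"{fps} fps: one frame = {frame_duration_ms(fps):.1f} ms, "
              f"{example_latency_ms:.0f} ms latency = {frames:.2f} frames")
```

The same latency is a larger fraction of a frame at higher frame rates, which is one reason a fixed delay may look worse on some material than on other material.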
The Edit - By default, any audio you import which has been recorded in-camera is automatically kept in sync on the timeline. If you've recorded separate-system audio, you need to synchronise the audio and video before doing anything else. This can be done automatically or manually. These days, the automatic method can work quite well as long as you've recorded scratch audio on your camera so it has something to synchronise to. It is, however, very easy to knock clips out of sync if you're not careful when editing. When that happens it should be very obvious: it is usually indicated with red flags on the audio and video clips that help you bring the two back into line.
Mixing frame rates
However, there can be issues when mixing different frame rates on the same timeline, especially if the audio sample rates also differ. You can also encounter problems when exporting audio and video if you choose a high compression ratio with a large GOP (group of pictures) figure.
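As a rough illustration of why mismatched rates matter, the sketch below estimates how far material drifts if it is recorded at one rate but played back at another without being conformed or resampled. This is a simplified model with a hypothetical function name; real editing software normally resamples or conforms clips, so treat the figures as a worst case.

```python
def drift_seconds(elapsed_s: float, recorded_rate: float, playback_rate: float) -> float:
    """Drift after `elapsed_s` seconds of real-world content.

    The content holds elapsed_s * recorded_rate units (frames or samples).
    Played at `playback_rate` units per second, it lasts
    elapsed_s * recorded_rate / playback_rate seconds.
    A negative result means the material finishes early (runs fast).
    """
    return elapsed_s * (recorded_rate / playback_rate - 1.0)


if __name__ == "__main__":
    one_hour = 3600.0
    # Video shot at 24000/1001 fps (about 23.976) but played at exactly 24 fps.
    print(f"video: {drift_seconds(one_hour, 24000 / 1001, 24.0):+.3f} s per hour")
    # Audio recorded at 44.1 kHz but interpreted as 48 kHz.
    print(f"audio: {drift_seconds(one_hour, 44100.0, 48000.0):+.1f} s per hour")
```

Even the small 0.1% difference between 23.976 and 24 fps adds up to several seconds over an hour if the audio is left untouched.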
You are then at the mercy of the delivery and display systems. There are, of course, technical specifications that detail the maximum latency permitted for broadcasting. However, there is no such document for delivery to user-generated content services like YouTube. Usually, the more highly compressed the delivery medium is, the worse the audio and video synchronisation gets. It truly is a minefield. In my experience, the ranking for correct A/V synchronisation, in descending order of accuracy, is: disc-based media, SD TV from an SD original, HD broadcast television, Amazon Video, Netflix, YouTube and then an SD broadcast that has been down-converted from an HD source, although this can vary significantly depending on the exact content.
None of this chain, however, takes into account variables like the software and firmware running on the display devices you are watching on, or even the display type. Of course, if you're using wireless headphones or speakers to monitor, this throws another spanner into the works. If you look at the end-to-end chain it is no wonder that sometimes things go a little awry. Even the introduction of automatic lip-sync into the HDMI specification some years ago does not seem to have done anything to improve matters. This feature is actually optional and lots of devices don’t support it. In any case, it can’t do anything if the error is in the original media in the first place.
It is even more frustrating when sync drift isn’t consistent, either across different TV channels or even within a single one. You may also find that different devices produce different results: on one TV your favourite YouTuber may be in sync, but move to the kitchen TV and they are out. This may be only a few frames, but it’s noticeable, although with smaller displays at certain distances it’s not as easy to spot.
Of course, it's not unusual to find the ability to delay the audio in the menus of the various players and even in some televisions. However, this does nothing if the video is in fact early to start with, because delaying audio that is already late only makes matters worse. In that case, you'll need an external A/V processor between your player and your display device which can correct for these scenarios. These tend to cost more than the players themselves or sometimes more than the television. It shouldn't be beyond the realms of possibility for this functionality to be incorporated into the display or the player. Something that once cost many hundreds of pounds or dollars to implement must surely be available now as a lower-cost option? I think that perhaps the reason this feature is not offered is very simple: viewers are either not noticing the problem, and so not complaining, or they have simply learned to live with it.
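The limitation of an audio-delay-only control can be sketched in a few lines. The sign convention (positive means the audio arrives ahead of the picture), the function name and the 200 ms maximum are all assumptions made for illustration; they do not describe any specific player or television.

```python
from typing import Optional


def audio_delay_needed(audio_lead_ms: float, max_delay_ms: float = 200.0) -> Optional[float]:
    """Return the audio delay (ms) to apply, or None if a delay cannot help.

    audio_lead_ms > 0: audio is ahead of the video, so delaying it helps.
    audio_lead_ms < 0: audio is already behind the video (the video is
    early), so an audio-delay control can only make things worse.
    """
    if audio_lead_ms < 0:
        return None
    # The control has a finite range, so large errors are only partly corrected.
    return min(audio_lead_ms, max_delay_ms)


if __name__ == "__main__":
    for lead in (80.0, -40.0, 500.0):
        print(f"audio lead {lead:+.0f} ms -> delay setting: {audio_delay_needed(lead)}")
```

Correcting the negative case requires delaying the video instead, which is what an external A/V processor does.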