RedShark Summer Replay: Sony's XAVC is used across an increasing number of its professional camcorders. Why did they choose yet another format? There were good reasons, as we explain below.
You could hear the groans from experienced camera operators and post production experts as Sony announced a new video format in October 2012, on the day that the F5 and F55 cameras, which use the new format, were revealed.
It was understandable. Very few professionals wake up in the morning wishing that there was yet another video format to grapple with.
But progress is so rapid in the professional video industry that it’s impossible to stand still for long. If we did, we’d all still be using Digital Betacam. The simple fact is that 4K wasn’t covered well by existing formats. Something new was needed to improve quality and usability while keeping video bitrates (and storage requirements) to a minimum. Here’s just one example: many of the existing H.264-based production codecs weren’t designed to handle 1080 50p/60p encoding - a format that’s increasingly used in modern productions.
A few industry observers commented that the new format might just be an attempt by Sony to make its cameras proprietary. Every company wants to make customers buy its own products, but, having spent time talking in detail to Sony about the XAVC format, it’s clear to me that this is a genuine technical advance that gives real benefits to users, and that it is flexible enough to stay current for some time ahead.
Since the F5 and F55 were launched, most NLEs have provided native support for the new format, and it has also spread to much of Sony’s new camera range.
Just a bit of background to put all of this into context:
XAVC is based on H.264, but the idea that H.264 is a single, standardised codec is not really true at the practical level. There are dozens of tweaks and optimisations - and fundamentally different parameters - that can be set to suit a camera or the way it is used, which is why Sony decided to develop its own set of codecs, called XAVC.
You can think of H.264 as a set of building blocks; a toolkit, if you like, that Sony has used to create its XAVC ecosystem. They’ve tweaked the algorithms, made them more efficient, and added a pre-processor that conditions the video before the encoding process. All of which means that when stacked up against other codecs, it is typically more efficient, and easier to use in post production. What’s more, XAVC is built to Level 5.2 of the H.264 standard, which raises the ceiling on resolution and frame rate far enough to cover 4K at up to 60 frames per second.
1. It covers a very wide range of bitrates
It has to. Over the last few years, the ability to get data off a sensor has mushroomed. 4K video requires a sensor with at least 8 megapixels, and sensors with this resolution have been around for at least ten years, but it is only recently that we’ve been able to get the data off them quickly enough to create video with it. Remember that Full HD is only around 2 megapixels, so 4K calls for a 4x increase in raw (i.e. uncompressed) data rates and storage.
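To put rough numbers on that, here is a minimal sketch of the arithmetic. The frame dimensions are the standard published ones; the 10-bit depth, three full-resolution samples per pixel and 25 fps are our own illustrative assumptions, not figures from Sony.

```python
# Pixel counts and uncompressed data rates for common frame sizes.
# Bit depth, samples per pixel and frame rate are illustrative assumptions.
FORMATS = {
    "Full HD (1920x1080)": (1920, 1080),
    "UHD 4K (3840x2160)": (3840, 2160),
    "DCI 4K (4096x2160)": (4096, 2160),
}


def megapixels(width, height):
    """Number of pixels in one frame, in millions."""
    return width * height / 1_000_000


def raw_mbit_per_s(width, height, fps, bits_per_sample=10, samples_per_pixel=3):
    """Uncompressed video data rate in Mbit/s (no chroma subsampling)."""
    return width * height * samples_per_pixel * bits_per_sample * fps / 1_000_000


full_hd = megapixels(1920, 1080)
for name, (w, h) in FORMATS.items():
    mp = megapixels(w, h)
    print(f"{name}: {mp:.2f} MP, {mp / full_hd:.2f}x Full HD, "
          f"{raw_mbit_per_s(w, h, 25):.0f} Mbit/s raw at 25 fps")
```

Under these assumptions, UHD 4K comes out at exactly four times the pixel count of Full HD, and the raw data rate scales by the same factor.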
XAVC is designed to scale from 15 Mbit/s to 960 Mbit/s. This covers just about every likely frame rate (except ultra slow motion) and includes HD as well as 4K.
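What does that range mean for storage? The sketch below converts a video bitrate into gigabytes per hour. It assumes decimal units (1 GB = 1,000,000,000 bytes) and ignores audio, metadata and container overhead, so treat the results as ballpark figures only.

```python
def gigabytes_per_hour(mbit_per_s):
    """Convert a video bitrate in Mbit/s to decimal gigabytes per hour.

    Video essence only: audio and container overhead are ignored.
    """
    bytes_per_second = mbit_per_s * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000_000


# The two ends of the bitrate range quoted in the text.
for rate in (15, 960):
    print(f"{rate} Mbit/s -> {gigabytes_per_hour(rate):.2f} GB per hour")
```

On these assumptions the bottom of the range needs under 7 GB for an hour of footage, while the top needs several hundred.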
2. It is designed for acquisition as well as post production
Previous H.264-based codecs have been designed primarily for distribution and not for capturing video. This has led to inefficiencies and difficulties with scaling for higher bitrates. For example, the type of H.264 that was designed for Blu-ray and for satellite transmission was never going to be ideal for cameras. By going back to the basic building blocks of the format, Sony has been able to make a codec that is equally happy in a camera and in post production, and which brings tangible benefits to both.
3. It’s available in consumer and professional formats
XAVC is the professional format, and it’s wrapped in an MXF OP1a container - which is standard across broadcast platforms. XAVC-S is the consumer format, and it’s wrapped in an MPEG-4 container. It’s optimised for consumer use. XAVC-S is always 8-bit but, unlike AVCHD, it is designed for 4K. It also differs from XAVC in being better at low bitrates and it is designed for shorter, less complex workflows that are typical of consumer production. In other words, it is not loaded down with robustness that isn’t needed in a consumer product.
4. XAVC is available as Long-GOP and as IntraFrame (I-Frame only)
This really does mean the best of both worlds: a very efficient, low-bitrate codec for when space is at a premium, and a more relaxed, IntraFrame codec for when there’s more space, more bandwidth, or for when editing flexibility is paramount.
Not sure what Long-GOP and IntraFrame mean? Long-GOP (GOP stands for Group Of Pictures) codecs use predictable motion between the frames to recreate the sequence of pictures when decompressed. Doing this means that it isn’t necessary to store material that’s repeated on every frame, even if it is moving. Long-GOP codecs can compress much more than IntraFrame compressors.
IntraFrame is literally “within a frame”. Even a single frame of IntraFrame video can be decompressed accurately, without reference to any other frames. IntraFrame isn’t as efficient as Long-GOP in terms of storage, but with some varieties of codec, it is much better for editing, where you may need to jump backwards and forwards with single-frame accuracy. It also needs less computing power when you jump around, because you don’t need to decode all the surrounding frames to see the frame in the middle! But (see 5. below) with XAVC, the difference between Long-GOP and IntraFrame is much smaller.
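If it helps to see the seeking cost in numbers, here is a deliberately simplified toy model. It assumes each GOP is one I-frame followed by P-frames that each depend on the frame before; real H.264 streams also use B-frames and multiple reference frames, and nothing here reflects XAVC's own optimisations. The function name and the GOP length of 12 are hypothetical, chosen purely for illustration.

```python
def frames_to_decode(target, gop_length):
    """Frames a decoder must process to display frame `target` (0-based).

    Toy model: every GOP starts with an I-frame, and each following P-frame
    depends only on the frame immediately before it. A gop_length of 1
    models IntraFrame video, where every frame stands alone.
    """
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    position_in_gop = target % gop_length
    # Decode from the GOP's I-frame up to and including the target frame.
    return position_in_gop + 1


for gop in (1, 12):
    worst = max(frames_to_decode(n, gop) for n in range(48))
    print(f"GOP length {gop}: up to {worst} frame(s) decoded per random seek")
```

In this model an IntraFrame stream always decodes exactly one frame per seek, while a Long-GOP stream may have to work through most of a GOP to reach the frame you asked for.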
5. It’s as easy to decode long-GOP XAVC as IntraFrame
XAVC needs more computing power than MPEG-2 or, for example, ProRes, but this overhead will soon disappear with the ever-increasing power of computers. The really good news is that there is effectively no difference in the computing effort needed to decode Long-GOP vs IntraFrame XAVC. For editors, this means that they’ll be able to jump around their timelines with virtually no performance penalty.
The extra computing power used to encode XAVC means that there is very little difference between the quality of Long-GOP recordings and IntraFrame ones. This means that you can record at lower bitrates for the same quality, saving the cost of memory, and squeezing more high quality video onto your storage media.
6. Better quality than traditional production codecs (eg ProRes or DNxHD) in less space
ProRes and DNxHD are examples of edit-friendly codecs that are easy to work with because they’re relatively simple, and use a low compression ratio. They give great quality at the expense of taking up a bit more room. The more sophisticated algorithms used in XAVC improve the quality for a given bitrate. Although the Sony format needs more computing power, with XAVC IntraFrame, you will probably get the same quality as ProRes in around half the space, under optimal conditions. And working with smaller files just about compensates for the slightly greater computing load. (These aren’t Sony’s comparisons - they’re based on what we typically find ourselves.)
7. Dynamically optimises the quality frame by frame
An XAVC encoder optimises quality frame by frame, and while it does this, it records metadata to help decoders understand the optimisations used in the encoding process. So, end-to-end, XAVC maximises quality, but it does so dynamically, so that it doesn’t waste space.
The frame-by-frame metadata also helps with non-linear playback, which is one of the factors that makes Long-GOP nearly as good as IntraFrame.
8. XAVC pre-codes media before encoding
Pre-coding, or preparing the media, means that the XAVC codec can do its work more efficiently. The Sony XAVC pre-coder is built into their hardware chips but it is also part of the XAVC software codec - so there is no difference between material that has been encoded in hardware or software. The pre-coding occurs both with 4K and high frame-rate recordings.
9. Multi Codec chipsets
Sony has chipsets that work equally well with MPEG-2 and XAVC. These chipsets are found in most of Sony’s modern cameras, which means those cameras can be upgraded to use XAVC because the capability is already built in. At the same time, Sony has recently stated that it will never drop support for its older codecs in its chipsets.
10. It isn’t H.265!
Isn’t H.265 supposed to be the latest and greatest codec? It certainly has the potential to be more efficient than previous-generation codecs, but it needs vastly more computing power to encode and decode it than H.264-based codecs. Ultimately, someone will probably create implementations of H.265 codecs that will be more efficient than those made with H.264, but we are still very early in the adoption cycle of H.265.
H.264 has been around for longer, and is extremely widely used. It takes time to research and perfect new technology and it is only now in the life-cycle of H.264 that we’re seeing the full benefits at the production and post production stages.
Computer power has increased dramatically since H.264 was first announced, and computers are easily able to cope with encoding and decoding tasks. XAVC has additional technology to improve the usability of the codec, and the specification of the Sony format is wide enough that it will stay current and optimal for a long time to come.
Sony’s decision to create XAVC might have puzzled and infuriated potential users at first, but the truth is that there is probably never going to be a single, universal format to suit all types of cameras, and all types of users. What’s more, Sony knows best how Sony cameras work - after all, it builds them from the ground up, even making its own sensors. So it surely makes sense that it should design its own variant of a well-known codec, one that brings out the best in its cameras, and in the creative efforts of its customers.