It's almost as if camera manufacturers are somehow dedicated to keeping hard disk manufacturers in business. By the early 2000s, it had become quite easy to handle standard-definition digital video. We had almost achieved the nirvana of being able to do advanced editing without having to spend a fortune on hardware. Then along came HD, which looked better, but put the cost of an hour's worth of storage back where it had been in the mid-90s. Given 4K and HDR, and high frame rate, and 4K HDR wide-colour gamut high frame rate stereo 3D uncompressed raw, well... there's no immediate sign of cloud storage or pocket-sized devices taking over from big stacks of hard disks.
What's changed are the ways in which we can configure those stacks of disks. Back in the day, there was no Thunderbolt for desktop connections and no fast networks to share storage around a building, but there were still a lot of choices to make.
It's common to refer to almost any unified pile of hard disks as a RAID but the R in the term – redundancy – only applies to some of the ways in which that pile can be stuck together into one device. Even when a disk array does offer redundancy, some configurations can tolerate more failures than others. The time various types of array take to recover from failures is variable, and some types perform very slowly when they've suffered a failure. That can make the system unusable regardless of the fact that it may eventually recover – and near a deadline, that's still a big problem. More than that, any setup that provides redundancy will sacrifice a bit of disk space to do it since at least some part of the data needs to be stored more than once. Some systems sacrifice more than others. Some are easier to expand with more space without having to delete the contents.
Finally, of course, there's cost – more capable systems achieve a better compromise of space, speed and redundancy, but more capable disk controllers cost more. A lot of modern devices support various RAID techniques, and different techniques suit different applications. Here's a quick guide to what works best.
JBOD – the RAID that's not a RAID
There are various approaches to simply joining hard disks together to add more space, with no attempt at improving reliability or speed. The acronym stands for Just a Bunch of Disks, though other terms, such as SPAN, are used to describe similar approaches. The idea is that when one disk is nearly full, it's possible to add another. If a disk fails, the files stored on it are lost, and performance is exactly the same as a single disk, since each file resides on only one disk. The purpose is simply the convenience of being able to expand the storage without reconfiguring software to look somewhere else for files.
RAID 0 – another RAID that's still not a RAID
A RAID 0 array always needs at least two disks. Data saved to the array is broken up into small pieces and written alternately to each of the disks. Two disks work at roughly double the speed of a single one because each only has to store half of the data. It's usually possible to select the “stripe size” – the size of the chunks into which the data is broken down. Larger stripe sizes can be slightly faster, but if the stripe size multiplied by the number of disks in the array is bigger than the file being stored, some space is wasted. If any disk fails, all of the data is lost, so RAID 0 is less reliable than a single disk – though it is much, much faster. Some early disk recorders used for high-end filmmaking relied on RAID 0 in a desperate attempt to keep up with uncompressed HD pictures – and no, that wasn't a very sensible idea. RAID 0 is sometimes called striping.
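The striping idea can be illustrated with a few lines of code. This is a toy sketch, not how a real controller works; the disk count and stripe size here are arbitrary illustrative values.

```python
# Toy sketch of RAID 0 striping: data is cut into fixed-size "stripe
# units" and dealt out to the member disks in turn, round-robin.

def stripe(data: bytes, disks: int = 2, stripe_size: int = 4) -> list[bytes]:
    """Distribute `data` across `disks` buffers, one stripe unit at a time."""
    buffers = [bytearray() for _ in range(disks)]
    for i in range(0, len(data), stripe_size):
        chunk = data[i:i + stripe_size]
        buffers[(i // stripe_size) % disks].extend(chunk)
    return [bytes(b) for b in buffers]

disk_a, disk_b = stripe(b"ABCDEFGHIJKLMNOP", disks=2, stripe_size=4)
print(disk_a)  # b'ABCDIJKL'
print(disk_b)  # b'EFGHMNOP'
```

Each disk ends up holding half the data, which is why reads and writes can run at roughly twice the speed – and why losing either disk destroys every file.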
RAID 1 – Redundancy
Making a backup copy of a file onto two devices at once is often done manually. RAID 1 automates that process, using more than one disk with the same data written to each of them. Two disks are normal, though more can be used if even higher reliability is required. Because the data is duplicated, the efficiency (on two disks) is 50%, but all of the data is available as long as at least one disk is still working. The performance of a RAID 1 array is dependent on the size of the files and the way they're read. In theory, a RAID 1 can read more than one file at once, making reading faster, though writing is no faster than a single disk. RAID 1 is sometimes called mirroring.
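Mirroring is simple enough to sketch in a few lines. Again, this is a toy illustration of the principle, not real controller behaviour: every write goes to every member disk, and a read can be served by any disk that still works.

```python
# Toy sketch of RAID 1 mirroring: identical data on every member disk.

def mirror_write(disks: list[dict], name: str, data: bytes) -> None:
    for disk in disks:              # the same data lands on every member
        disk[name] = data

def mirror_read(disks: list[dict], name: str) -> bytes:
    for disk in disks:              # any surviving disk can serve the read
        if name in disk:
            return disk[name]
    raise IOError("all disks failed")

array = [{}, {}]                    # two empty "disks"
mirror_write(array, "clip.mov", b"frames")
array[0].clear()                    # one disk fails completely
print(mirror_read(array, "clip.mov"))  # b'frames' - still readable
```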
RAID 10 – Combining techniques
Someone interested in the speed of RAID 0 and the reliability of RAID 1 might combine the two approaches, hoping for an array that sacrifices half of the available disk space to redundancy but is very fast at both reading and writing. This works well if we take several pairs of disks in RAID 1, each of which can suffer a failure and keep working, and combine those pairs in RAID 0 so that data is split up across the RAID 1 pairs. The result can survive several disks failing, so long as no two of the failed disks are in the same RAID 1 pair. If there is a failure, it's quick to rebuild – the failed disk is replaced with a new one and the contents of the other disk in the same RAID 1 pair are copied onto it.
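That survival rule – the array lives as long as every mirror pair has at least one working disk – can be expressed directly. A toy sketch, with disks numbered so that disks 0 and 1 form one pair, 2 and 3 the next, and so on:

```python
# Toy check of RAID 10 failure tolerance: data survives as long as no
# RAID 1 mirror pair has lost both of its disks.

def raid10_survives(failed: set[int], pairs: int = 3) -> bool:
    """Disks 2k and 2k+1 form mirror pair k; True if data is still intact."""
    return all(not ({2 * k, 2 * k + 1} <= failed) for k in range(pairs))

# Three disks down, but spread across different pairs: still fine.
print(raid10_survives({0, 3, 4}))  # True
# Only two disks down, but both halves of the same pair: data gone.
print(raid10_survives({2, 3}))     # False
```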
Despite the large space sacrifice, RAID 10 can often make sense. It's simple, requiring only a basic controller (built into many modern PCs or simulated by the operating system) and the cost of the extra disks can be less than the cost of a third-party controller capable of similar performance with fewer disks.
RAID 2 through RAID 4
Configurations numbered 2 through 4 have existed, but are obsolete – they do nothing that other approaches can't do better.
RAID 5 and RAID 6
RAID 5 and 6 offset the space-inefficiency of RAID 10 by computing parity for the data written to (say) three disks and storing that parity on a fourth. The mathematics are arranged so that the contents of any one of the three data disks can be recalculated from the other two plus the parity, and if the parity itself is lost, it can be recalculated from the three data disks. (In practice, RAID 5 spreads the parity across all the disks rather than dedicating one drive to it – that was RAID 4's approach – but the arithmetic is the same.) As such, it sacrifices less space to redundancy than RAID 10: a four-disk RAID 5 has 75% of the total space of the disks available for data, as opposed to 50%. RAID 6 stores two independent sets of parity information so that it can withstand the failure of any two disks. This is, to some extent, a reaction to modern disks becoming so huge that a RAID 5 rebuild is too likely to hit an unrecoverable failure; the extra parity increases reliability.
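The parity trick is, at its simplest, just XOR. A toy sketch – real controllers work on whole sectors and, in RAID 6, add a second, more elaborate calculation, but the principle of rebuilding a lost disk looks like this:

```python
# Toy XOR parity in the spirit of RAID 5: the parity block is the XOR of
# the data blocks, so any one lost block can be rebuilt from the rest.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # three data disks
parity = xor_blocks(d0, d1, d2)          # stored on the "fourth" disk

# Disk holding d1 fails: rebuild its contents from the survivors plus parity.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == d1)  # True
```

The same property works in every direction: XOR any three of the four blocks together and you get the fourth, which is why the array keeps working whichever single disk dies.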
The downside is that the parity calculations are complex and writing to a RAID 5 or 6 tends to be slower than RAID 10, unless a rather expensive controller with lots of number-crunching power is used. In a desktop workstation, it tends to be cheaper to use RAID 10, spending more money on disks and less on the controller, to achieve similar storage space often with better performance.
Various types of RAID can be used either as direct-attached storage, where the disks are inside the workstation, or as storage made available over a network, where one computer contains the disks and a network interface and does nothing but manage the storage. Depending on the approach, this is called a storage-area network or network-attached storage, and it's usually done so that several workstations can access the same files, as in a large newsroom editing setup. In this situation, it's common for the disks to be controlled by software on the storage machine rather than a dedicated controller card, though both approaches work. Software RAID controllers, such as ZFS, tend to be more configurable than hardware storage controllers.
This sort of setup can be put together by an enthusiast, perhaps using an old machine for the storage server, but unless there's a specific need for shared storage, the complexity of it will tend to outweigh the advantages. Software RAID, in general, can work well, especially with simpler approaches such as RAID 10, which means that RAID 10 is often the best approach when we simply need some fast, reliable storage for a workstation.
The other approach is simply to use flash storage, which is possibly, maybe, arguably somewhat more reliable than hard disks – or then again maybe not, depending on type. Perhaps one day flash will replace spinning metal and RAID entirely, but that day has repeatedly failed to occur, despite being widely predicted. Flash is certainly a lot more expensive, so as cameras keep producing more and more data, it seems likely that at some point, a big production is going to need the proverbial fridge full of hard disks and that requires some way to stick them all together.
It's also worth mentioning that RAID, no matter how reliable, is not backup. Take a RAID 1 and accidentally delete a file, and the RAID 1 controller will dutifully delete that file from both disks. No matter how good a disk array is, we still need something like LTO tape so there's a copy of important data to put safely on a shelf.
Title image: Shutterstock - Pixza Studio