It’s not just a case of backing your data up: you need to back up the right data, in the right place, and do it consistently.
I'll spare the blushes of the production whose misfortune provoked this article, as losing a morning's rushes to a moment of carelessness with a mouse and keyboard is bad enough, even without public disclosure. Nevertheless, our subject today is one that should chill the giblets of even the steeliest-eyed cameraman: that sinking feeling that occurs only when the result of inserting a card into a slot is anything other than a screen full of footage. The phrase “the disk in drive E is not formatted” should, we feel, be metaphysically accompanied by one of those unsettling musical blasts from the brass section of the Inception trailer orchestra.
On a sufficiently large-scale production, dealing with data from cameras is a full-time job, with plenty of time and resources to get things right. But at the nine-figure budgetary level, that's nothing new: there was always a loader to deal with this most responsible, yet paradoxically most junior, of tasks. Where digital acquisition has made the most difference is at the level of production which could never afford film anyway, and so never needed a loader; until the early noughties, such productions mainly shot onto tape, so archive was barely a problem. Proper data handling in these realms, whether you want to call the incumbent a DIT, a digital loader, or a data wrangler, is often terribly resource-limited, but many of the most common mistakes can be mitigated for free.
The top three problems
Perhaps the most common SNAFU is to carefully duplicate, verify, and store material, only to discover it's the wrong material. Computers will duplicate files all day long, and even the sort of extra reliability mechanisms used by the film industry (MD5 checksums, or media hash lists which use the same underlying mechanism) will only verify that files are identical, not that they're the right files. Some of the best-designed production reports include space for a brief, one-sentence description of the image content of the first, and sometimes last, files in an archive, to head off the issue of carefully storing the wrong material. Noting slate details and a timecode range helps too – although timecode can be wrong, and sometimes difficult to actually read from a file without special tools, it's an easily automated check.
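For readers who have never looked under the bonnet of a checksum tool, the following is a minimal sketch in Python of the idea, not a replacement for a proper offload application. The function names (file_md5, build_manifest, verify_copy) are invented for this illustration. It records an MD5 digest for every file on a card, then confirms that a copy contains the same files with the same digests.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_md5(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(manifest, copy_root):
    """Return a list of problems found when checking a copy against a manifest."""
    copy_root = Path(copy_root)
    problems = []
    for relative_path, expected in manifest.items():
        candidate = copy_root / relative_path
        if not candidate.is_file():
            problems.append(f"missing: {relative_path}")
        elif file_md5(candidate) != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

An empty list from verify_copy means only that the copy matches the source byte for byte. It says nothing about whether the source was the right card, which is why the one-sentence description on the production report still matters.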
Running a close second to this is the issue of unchecked backups. In situations where two duplicates of material are being made on the same hardware, this is less of an issue, but often two tape drives will be used to create dual redundant backups, or two sets of hard disks used to the same end. It's therefore possible that the backups may have faults that won't be noticed until it's too late – generally, only the master copy will be read because problems are rare. It's for this reason that high-end workflows often do checksum verification of file copies, but that still doesn't obviate issues with a hard disk that might, for instance, only be readable on one very specific system architecture, or a device that reads its own recordings happily but produces media that nobody else can read.
This sort of programme-interchange problem largely went away with helical-scan tape, but there are still circumstances under which it can occur. Data wranglers interested in really bulletproof workflows will ensure that both master and backup are checked on separate hardware – when writing two copies of something on two hard disks, for instance, swap them over before verification, ideally onto a different machine running a different OS, so that each set of software and electronics checks the other.
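One way to make the swap-over check practical is to write the checksums to a small text file that travels with the media, then re-read every file against that list on the second machine. The sketch below is an illustration under stated assumptions rather than a finished tool: the function names are hypothetical, the one-line-per-file layout (digest, two spaces, relative path) is modelled on the familiar md5sum listing, filenames are assumed not to contain line breaks, and the manifest file is assumed to be kept outside the media folder it describes.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(media_root, manifest_path):
    """Write one '<digest>  <relative path>' line per file under media_root."""
    media_root = Path(media_root)
    lines = [
        f"{md5_of(path)}  {path.relative_to(media_root).as_posix()}"
        for path in sorted(media_root.rglob("*"))
        if path.is_file()
    ]
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def check_manifest(manifest_path, media_root):
    """Re-read every listed file from media_root; return the paths that fail."""
    media_root = Path(media_root)
    failures = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split("  ", 1)
        candidate = media_root / relative_path
        # A file that is missing or unreadable here counts as a failure too.
        if not candidate.is_file() or md5_of(candidate) != expected:
            failures.append(relative_path)
    return failures
```

The intended use is to run write_manifest on the machine that made the copies, then run check_manifest against both master and backup on the other machine. Relative paths with forward slashes are used so that the same list can be read on a different operating system.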
Jostling for third place are insidious problems which are the result of misfortune or malfeasance, as opposed to problems of technology. Really this should have been aired sooner, because real-world events or human factors are a far more common vector for problems than any frailty of the technology, which is itself generally very good. If we make two copies of something and put them both in the same vehicle, and that vehicle is stolen, the backup is meaningless. Not for nothing do film companies use old salt mines (chosen for their stable humidity and temperature in the days of film, but just as useful now) as a fallout-proof shelter for their valuable back-catalogue. The term “offsite backup” is used in fields such as finance to refer to the need to keep a redundant copy of important data at some physical remove from the primary copy, so that if the building burns to the ground, the company within doesn't suffer a similar fate. The stolen-car situation is more common, but the preemptive technique is the same in both cases: give one copy to the producer and send one to post. If the company offices, or the producer's car, disappear into a sinkhole, the production will not go with them.
Belt and braces
And there is one more related issue with dual redundant backups which has caused people a lot of heartburn. If we create two copies each of two camera rolls, we have four pieces of media, two of which will go to offsite storage and two of which will go to post. It's incredibly easy, at the end of a long day, to send both copies of Roll 1 to storage, and both copies of Roll 2 to post. As a result, Roll 1 appears to vanish. A good preventative approach here is to buy sets of tapes (or drives, or whatever) from different manufacturers so they're different colours (or to mark the cases with coloured tape, though that's not quite as good) so that the “master” and “backup” copies are instantly visually identifiable. Elastic-banding a copy of the camera reports, as well as a condensed directory listing, to each piece of media is a good idea. One day, someone will start making tape and disk cases with a document wallet on the side.
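The Roll 1 mix-up is also easy to catch on paper before anything leaves the set. As a rough sketch, assuming someone types in what is physically in each outgoing box, a few lines of Python can confirm that every destination holds exactly one copy of every roll. The function name check_dispatch and the destination labels are made up for this example.

```python
from collections import Counter


def check_dispatch(rolls, shipments):
    """Check that each destination receives exactly one copy of every roll.

    rolls: list of roll names shot today.
    shipments: dict mapping a destination to the list of roll names
               written on the media going there.
    Returns a list of human-readable problems (empty if all is well).
    """
    problems = []
    for destination, contents in shipments.items():
        counts = Counter(contents)
        for roll in rolls:
            if counts[roll] == 0:
                problems.append(f"{destination}: no copy of {roll}")
            elif counts[roll] > 1:
                problems.append(f"{destination}: {counts[roll]} copies of {roll}")
    return problems


# The end-of-a-long-day mistake: both copies of each roll go to one place.
mistake = {
    "storage": ["Roll 1", "Roll 1"],
    "post": ["Roll 2", "Roll 2"],
}
for problem in check_dispatch(["Roll 1", "Roll 2"], mistake):
    print(problem)
```

Run against the mistaken packing list above, this reports a duplicate and a missing roll for each destination. It is no substitute for colour-coded media, but it costs nothing.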
Ultimately, the choice of techniques will be dictated by the requirements of a completion bond company (if any) or insurer. Working practices are fairly well codified on higher-end shows, but the same techniques apply almost universally, and all too often they are not followed, even though most of them cost nothing. Decent hard disks, from companies such as G-Technology, as well as things like the LTO tape format that we've discussed before, are now very reliable, but as I think we've made clear, the problems are not really technological: they are overwhelmingly human factors, and those take nothing but discipline to solve.