RedShark Replay: This is a hot topic, so we're giving you another chance to read this: If we don't do something fast, all of our digital films and videos will disappear in just a few years
You might be forgiven for thinking that digital recordings are robust and can last, effectively, for ever. Storing video and audio as ones and noughts has advantages: you only have to tell the difference between "On" and "Off", rather than measure, accurately, a continuously varying voltage. What could be more robust than a string of pulses that could be represented by acorns or carrots or pits on an optical disk?
Except that it's not like that at all. The "1"s and "0"s are meaningless if we don't understand the way they're arranged. And this is the crux of the problem.
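To make that concrete, here is a minimal Python sketch. The four byte values and the function name `interpretations` are arbitrary choices for illustration and don't come from any real file format. It reads the same four bytes as text, as whole numbers in two different byte orders, as a floating-point number and as greyscale pixel values. The bits never change; only the assumed arrangement does.

```python
import struct


def interpretations(raw: bytes) -> dict:
    """Read the same four bytes under several different assumptions."""
    if len(raw) != 4:
        raise ValueError("this sketch expects exactly four bytes")
    return {
        # Treat each byte as a character code.
        "ascii_text": raw.decode("ascii", errors="replace"),
        # Treat all four bytes as one unsigned integer, most significant byte first.
        "uint32_big_endian": struct.unpack(">I", raw)[0],
        # The same integer layout, but least significant byte first.
        "uint32_little_endian": struct.unpack("<I", raw)[0],
        # Treat the four bytes as a 32-bit floating-point number.
        "float32_big_endian": struct.unpack(">f", raw)[0],
        # Treat each byte as the brightness of one pixel (0 to 255).
        "greyscale_pixels": list(raw),
    }


if __name__ == "__main__":
    sample = bytes([0x42, 0x48, 0x45, 0x4C])  # arbitrary example bytes
    for label, value in interpretations(sample).items():
        print(f"{label:>22}: {value!r}")
```

Every one of those readings is "correct". Without a record of which one was intended, there is no way to tell from the bytes alone.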
Operating systems exist to protect us
In a sense, operating systems exist to protect us from all of this. We can even select and play digital files now by talking to our computers: "Siri, play some Calvin Harris". That's pretty useful but also incredibly abstract, where "abstract" means "there is an awful lot hidden between our command and the digital content of the file".
Oil paintings last pretty well. So do photographs and, as long as it's stored properly, film. Not so digital storage media, especially anything that moves or rotates.
For a long time after I bought my first digital SLR camera (a Canon 300D, in 2004) I made sure that all the pictures I wanted to keep were stored on multiple drives. Around that time external USB hard drives had become cheap and easy to use. So I backed up my rapidly growing collection of JPEGs to two separate external drives, leaving a third copy on whatever was my main work computer. Pretty safe, then.
I did this for several years, copying all the media to newer and newer drives, until, last year, when I was moving house, I tried to find a copy of the folder with all those pictures.
The two external drives just flatly refused to work, for different reasons. One was completely dead, and the other wasn't recognised as a drive by my computer. I tried alternative power supplies and cables. I tried plugging the one that did spin up into another computer. Nothing.
And the third copy?
That was on an old MacBook whose 60GB spinning drive had failed as well.
Luckily, I also had most of the files stored in Dropbox, a paid-for cloud storage and synchronisation service that seems to be doing OK today but which might not be here at some point in the future.
You might rightly say that all of this was down to carelessness, and it was, but this was all about approximately 30GB of pretty much universally-readable JPEG files.
Several feature films' worth of raw video
Now imagine the sort of issues you might get storing several feature films' worth of raw video, or an entire digital film archive. This is not going to be easy.
Let's look at the problem in some more detail. A good way to do this is to consider the question "how do you get a number down a wire?" Seriously, this is a very abstract thing and it doesn't happen easily. Let's say you want to send a 7 down a cable. 7 is a concept. A cable is a physical thing with no visible means of carrying precise numerical values.
It would be a lot easier if you had more than one wire. If you had ten, for example, you could send the numbers one to ten just by designating the wires as representing one to ten, and then applying a voltage to cable number seven.
This would work, but you'd have to have a way of specifying when one number had finished and another had started. And you'd also need an awful lot of wires, although you can reduce that number if you move towards a binary system where cable 1 represents 1s, cable 2 represents 2s, cable 3 represents 4s, and so on.
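The binary version of the many-wires idea can be sketched in a few lines of Python. This is only an illustration: the function names `number_to_wires` and `wires_to_number` are made up for this example, and the "wires" are just a list of ones and noughts, with the first entry standing for the wire that represents 1s, the second for 2s, the third for 4s, and so on.

```python
def number_to_wires(value: int, wire_count: int) -> list[int]:
    """Return the on/off state of each wire, lowest-weight wire first."""
    if not 0 <= value < 2 ** wire_count:
        raise ValueError(f"{value} does not fit on {wire_count} wires")
    # Wire 0 carries the 1s, wire 1 the 2s, wire 2 the 4s, and so on.
    return [(value >> position) & 1 for position in range(wire_count)]


def wires_to_number(wires: list[int]) -> int:
    """Add up the weight (1, 2, 4, ...) of every wire that is switched on."""
    return sum(state << position for position, state in enumerate(wires))


if __name__ == "__main__":
    wires = number_to_wires(7, 3)
    print(wires, "->", wires_to_number(wires))
```

Sending a 7 needs only three wires, all switched on (1 + 2 + 4). With one wire per value, ten wires can carry ten different numbers; used as binary digits, the same ten wires can carry 1,024.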
But digital media needs a lot of big numbers, sent very quickly. On our "parallel" system described above, there's always a chance that the numbers won't arrive exactly together. The digits on some wires could be confused with the previous or next numbers. This would place a limit on the bandwidth of this arrangement. The longer the cables, the bigger the problem.
Perhaps counter-intuitively, you can achieve more with a single wire (a "serial" system, which, in practice, is almost never just a single wire), because numbers would be very precisely clocked, and there would be no room for confusion as to which number belongs to which chunk of data. You'd need to wrap up the numbers so that you would know whether they belonged to the right or left channel, with audio, or to R, G, B etc with video. You'd need all sorts of complicated and precise schemes to keep track of the numbers.
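Here is a rough sketch of what "wrapping up the numbers" might look like for stereo audio. The frame layout is invented for this example and is not taken from any real interface standard: each frame is assumed to be one marker byte (`SYNC`), one byte saying which channel the sample belongs to, and a signed 16-bit sample value.

```python
import struct

SYNC = 0xA5                       # hypothetical "a frame starts here" marker
FRAME = struct.Struct(">BBh")     # marker, channel id, signed 16-bit sample
CHANNELS = {0: "left", 1: "right"}


def serialise(samples: list[tuple[int, int]]) -> bytes:
    """Turn (channel id, sample value) pairs into one continuous byte stream."""
    return b"".join(FRAME.pack(SYNC, channel, value) for channel, value in samples)


def deserialise(stream: bytes) -> list[tuple[str, int]]:
    """Recover (channel name, sample value) pairs from the byte stream."""
    if len(stream) % FRAME.size != 0:
        raise ValueError("stream does not contain a whole number of frames")
    decoded = []
    for offset in range(0, len(stream), FRAME.size):
        marker, channel, value = FRAME.unpack_from(stream, offset)
        if marker != SYNC:
            raise ValueError(f"lost framing at byte {offset}")
        decoded.append((CHANNELS[channel], value))
    return decoded


if __name__ == "__main__":
    stream = serialise([(0, 1200), (1, -340), (0, 1185), (1, -352)])
    print(stream.hex())
    print(deserialise(stream))
```

The stream on its own is just a run of bytes. It only turns back into left and right samples for someone who knows the frame layout, which is exactly the knowledge that tends to get lost.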
So that's how it is with signals. But we're concerned with storage in this article. How do you store these signals? These days, we store them as files.
We store signals as files
Now, as we've already hinted, it's very easy to fall back into thinking that files are straightforward to get at. They're there in the file explorer on our computer: easy to see. You can even see what type they are: .jpg, .png, .mov etc. What's not immediately apparent is that at the level that we see them, they're incredibly abstract. Think about it like this: in my Apple OS X Finder (Apple's file explorer) I can see files on my computer's hard disk (which is actually an SSD), some files on a 4 terabyte RAID on my desk, some files in Dropbox, and some others on an SD card from my camera. These are all wildly different devices, in different places - and in the case of Dropbox I have absolutely no idea either about the nature of the storage device (in the cloud) or where the file is. It might be shredded between a hundred different places for all I know. It really doesn't make any difference.
In the case of my files in Dropbox and on the 4TB RAID, there's a measure of protection, through redundancy, if an individual drive fails. If my computer's hard drive gives up, or my SD card fries itself, that's it. No more files.
Note that these are all devices that play nicely with OS X. It's the job of the operating system to slice through all these layers of abstraction and make my files visible as useful things that I can play and understand. But inside the individual disk drives - well, where do you start? How can you find a file if you don't have a computer to plug it into? If you take it apart all you will see is a bunch of metal plates with some vague pattern of magnetism. You won't even begin to be able to make sense of the magnetic variations on the disk.
Computers are supposed to make sense of this stuff
But why worry? Isn't that what computers are supposed to do? Make sense of this stuff? Yes, of course they do. Until they don't, that is. At some point in the history of any computer platform and operating system, it will stop supporting external and internal devices, either because the OS no longer supports the type of interface used by the external drive, or because there is no longer any software driver for that type of device. Do you remember the early days of SCSI ("Scuzzy") drives, when they connected to your computer tower with a cable about as thick as a hosepipe? What could you do with one of those drives now? Nothing. There's nowhere that you can plug one into a modern computer and no-one makes the interface cards any more. What's more, these drives are so old that they're very likely to have seized on their spindles.
I have lost valuable material because storage devices have become obsolete. You have to be a certain age to remember Zip drives. Made by Iomega, these were great. The disks held about a hundred times as much as a floppy disk, and came in a robust plastic cartridge. You needed a special reader for them, but they worked, and worked well enough to encourage you to put all your stuff on them. Which I did. And I now have a stack of these things and absolutely no way to access any of the information on them.
This lost data includes some MIDI files which were the result of some musical works that I was commissioned to compose. One of them was a ballet score (I know I don't look like I know anything about ballet, but ask me about it one day...) and there were other works that I spent weeks working on as a jobbing composer. At the time I thought I was being clever by copying all my eight inch and three and a half inch floppy disks onto this newer 100 MB format, but all I was doing was throwing them into an obscure dungeon whose keys would dissolve before my eyes.
eBay is your friend
I still have a few of these works remaining, as a hissy audio recording on a Compact Cassette. Ironically, although I no longer have a working cassette recorder (eBay is your friend here), I managed to transfer them to my computer, where they're now stored in Dropbox as high bitrate MP3 files.
Absolutely the most important thing to remember here is that this can happen right under your nose without you realising it. It's like the way you forget things. When you lose something from the memory in your head, you don't get a notification saying "you've forgotten the name of that annoying kid in your primary school". All that happens is that the next time you try to remember it, you can't. That's it. That's the first time you find out.
Now, luckily, we can take control of this. We can have backup strategies. But that's clearly not enough. There's no point at all in backing up all your files so that they're stored on accessible error-free media, only to find that you don't have any applications to play them. Remember that this abstraction business keeps going all the way from analogue voltages (or analogue patches of polarised magnetism) up to the arcane way that highly compressed video is stored in files. And while we're here - compression is another thing. To play a compressed file, you need not only software that can decode the compression (a codec), but software that also understands the way the compressed file is stored. None of this is simple and it certainly isn't obvious.
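As a small illustration of how much knowledge is needed before you even get to the codec, here is a Python sketch that guesses what kind of file it is looking at from its first few bytes. The function names `sniff_format` and `sniff_file` are invented for this example, and the table covers only a handful of well-known signatures; real tools know about thousands.

```python
# (offset, signature bytes, description) for a few well-known file types.
SIGNATURES = [
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"MThd", "Standard MIDI file"),
    (0, b"ID3", "MP3 audio with an ID3v2 tag"),
    (4, b"ftyp", "MP4 or QuickTime (.mov) container"),
]


def sniff_format(header: bytes) -> str:
    """Guess the file type from the leading bytes of a file."""
    for offset, magic, description in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return description
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file on disk to check it against the table."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


if __name__ == "__main__":
    print(sniff_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
    print(sniff_format(b"some bytes nobody documented"))
```

Recognising a file only tells you what it is called. Actually playing it still needs software that understands the storage format and the compression inside it, and anything that isn't in the table simply comes back as "unknown".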
I'm not the only one worried about this. I have been for some time. And everyone in the digital content creation business who has any time at all to reflect has these concerns as well.
The "father of the internet"
What prompted this article was that someone who really, really knows what he's talking about has just drawn attention to this issue, which doesn't just affect digital media files, but everything that exists in a digital form. It's Vint Cerf, a VP of Google and one of the very few people on the planet who can claim to have been a parent of the internet. Cerf was hugely influential in the development of TCP/IP, so he knows a thing or two about this stuff.
And what he's saying is very scary, simply because it's clear what we have to do and even more clear that we don't have the means to do it right now. What has to happen, he's saying, is that we have to preserve not only the files, but also the means to decode them. Unfortunately it's not as simple as storing the codec along with the file - although that will obviously help.
No, we also have to preserve a working copy of the operating system that can play back the media files, and because machines go out of date, we have to preserve a working copy of the machine. (In case you're wondering how you can "copy" a machine, it's fairly established technology. In fact, much of the World Wide Web is run from virtual machines - emulated computers running on real computers - and this allows it to be much more efficient, if somewhat slower, in individual machine performance terms.)
Once we have a virtual machine that is written in a "portable" language, like Java, perhaps, then it can be transported to run on anything that supports that portable language. This gets us a long way towards being able to resurrect "dead" machine/software combinations.
But it takes work. Lots of it. And because it takes work to support the virtual machines and make them able to run on whatever is the current and near-future generation of technology, it will cost a lot of money. This, then, is the point.
The point is that in future, we will no longer be able to store our creations on a shelf at zero cost. It will require real expenditure for our films and videos to be available in the future. This will not be a small cost. Probably the best we can hope for is that a big company, like Google, perhaps, or Dropbox, or very likely some company that doesn't exist yet, will set up an operation to do what is necessary on a very large scale that will reduce the costs to individual users. This will have to happen.
It will have to happen because if it doesn't, our films, videos, music tracks, personal memories, and in fact the whole of our recent (and future) history, will simply disappear.
Please understand that we are not talking here about data safety as it affects us now, today. We're not talking about RAID systems, which can absolutely be configured to make data incredibly safe. We're not talking about backups and archives, the theory and practice of which are very well known, if not always perfectly executed.
No, what we're talking about is that moment when you realise that you were just - only just - too slow in transferring your content to media that can still be read by today's machines. You don't get a warning when something is about to become obsolete or unreadable. You just get an error message bringing you the bad news, or the device doesn't show up in your file system explorer.
Data doesn't fade away gradually. It just becomes inaccessible. But when you step back and look at a mass of data from afar, the effect is that it gradually goes away. Welcome to the future of digital media.