Working digitally with analogue phenomena is something we do all the time. But how does it actually work?
Enormous amounts of the work done by modern electronics involve processing real-world information, from camera images to mp3 players reconstructing sounds from highly compressed data that represents the original audio only via a lot of mathematics. The fact that we can now build devices that will play back high-definition video of good quality for extended periods, are small and affordable, and consume comparatively little power is in large part down to our willingness to build devices that are task-specific, and do one type of task with great efficiency.
Digital Signal Processing
The field now referred to as digital signal processing emerged in the late 1970s, as electronics became more capable of performing the sort of bulk mathematics required to do useful things to real-world measurements. Initially, DSP as a subject included the analogue-to-digital and digital-to-analogue conversions which were required at the time, although these are more or less taken for granted in modern applications. The first applications were in science, medicine and defence: the fields which, at that time, could afford the requisite exotic computer hardware.
Since then, the defining characteristic of DSP as a discipline, rather than a technology, has been that it involves doing work on samples: discrete measurements representing a continuous real-world phenomenon such as audio or full-motion video. Processes such as filtering can take place in the digital world, using mathematics, in just the same way that they can be performed by analogue electronics. A stream of samples is treated as a signal in exactly the same way as a continuously-variable voltage.
The mathematics required to actually do something useful to that sort of signal can be complicated. One simple example, which we'll use in the rest of the article, is the finite impulse response (FIR) filter, which might be used to create a moving average of an audio waveform and so behave as a low-pass filter. It creates a stream of output samples, each calculated as an average of the last few input samples. That might mean that each output sample is equal to the sum of one-third of each of the previous three samples. In that case, put very roughly, frequencies above two-thirds of the Nyquist limit of the sample rate will be reduced.
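The three-tap moving average described above can be sketched in a few lines of Python. This is a minimal illustration, not production filter code; the function name and structure are my own:

```python
def fir_moving_average(samples, taps=3):
    """A simple FIR low-pass filter: each output sample is the sum of
    one-Nth of each of the last N input samples (here N = taps)."""
    out = []
    for i in range(len(samples)):
        # Take up to `taps` of the most recent samples (fewer at the start,
        # where the filter hasn't yet seen a full window).
        window = samples[max(0, i - taps + 1):i + 1]
        out.append(sum(window) / taps)
    return out

# A steady signal passes through once the window fills; a rapidly
# alternating (high-frequency) one is heavily attenuated.
print(fir_moving_average([3, 3, 3, 3]))      # [1.0, 2.0, 3.0, 3.0]
print(fir_moving_average([1, -1, 1, -1]))    # values much closer to zero
```

Note how the alternating input, the highest frequency the sample rate can carry, comes out close to zero: exactly the low-pass behaviour described above.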
The Nyquist limit is the maximum frequency that can be accurately represented at a given sample rate. Conveniently, this works out at exactly half the sample rate - so if your sample rate is 48kHz, the maximum frequency you can represent is 24kHz. It's actually more complicated than this, because you can't just stop everything above the Nyquist limit dead, but if you don't deal with it, you get all sorts of unwanted stuff (generally we lump this together and call it "aliasing").
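Aliasing is easy to demonstrate numerically. In this sketch (the variable names are mine), a 30kHz tone sampled at 48kHz produces exactly the same stream of samples as an 18kHz tone, because 30kHz is above the 24kHz Nyquist limit and folds back to 48 - 30 = 18kHz:

```python
import math

fs = 48_000           # sample rate in Hz
n = range(8)          # a few consecutive sample indices

# Sample a 30 kHz tone and an 18 kHz tone at 48 kHz.
tone_30k = [math.cos(2 * math.pi * 30_000 * i / fs) for i in n]
tone_18k = [math.cos(2 * math.pi * 18_000 * i / fs) for i in n]

# The two sample streams are indistinguishable: the 30 kHz tone has
# aliased down to 18 kHz, and nothing downstream can tell them apart.
print(all(abs(a - b) < 1e-9 for a, b in zip(tone_30k, tone_18k)))  # True
```

This is why real converters put an analogue filter in front of the sampler: once the aliased samples exist, the original frequency is unrecoverable.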
Digital signal processing can be implemented on a wide variety of devices. A modern desktop computer is entirely capable of performing (for instance) a multi-band equalisation on an audio signal on its CPU, without the need for a specific piece of DSP hardware. General-purpose microprocessors, though, aren't ideally suited to this sort of work, being optimised to retrieve limited amounts of information and make decisions about it, as opposed to retrieving large and continuous streams of information and doing mathematics on it. The sheer performance of modern workstations makes it possible to do DSP on a conventional microprocessor, but there are cases where practical concerns may intervene: a low-power netbook, for instance, or a cellphone, might have a CPU that would struggle to do big video tasks in realtime. In this situation, more work can be done for the same amount of power and time by using a piece of hardware that's been specifically designed to do digital signal processing. A DSP device is a microprocessor, but a special case with different design goals to the Intel or ARM CPU that runs your laptop or phone.
For a trivial example of the ways in which digital signal processors are optimised for this sort of work, let's consider that FIR filter again. To calculate each sample, we must retrieve the three most recent samples, divide each by three, and add the results together. That's usually done by keeping a list of some number of recent samples, plus an index number indicating the most recent sample in the list. The index is increased for each sample calculated, a new value is inserted, and when we reach the end of the list we must remember to loop around to the beginning again.
This is referred to as a ring buffer, a common construct in signal processing, and it's pretty easy to implement. However, we do have to check whether the index has reached the end of the list for every single sample, and for a production-quality audio signal we might need to make that check ninety-six thousand times per second. A practical application might have to deal with samples that are 16 bits wide, which is two eight-bit bytes, or 24 bits wide, which is three, so it would need extra logic to increment the index by either two or three bytes per sample. And it might need still more logic to deal with a variable low-pass filter setting, so that there might be more or fewer than three previous values to store. That's potentially a lot of wasted time, every hundred-thousandth of a second, before we've even got to the point of dividing values by three or adding them together.
A ring buffer is a common example of the sort of thing that a DSP might implement in hardware. A programmer working with a ring buffer on a normal CPU might insert a new sample, increment the index by the number of bytes in the sample then test whether the index has reached the end of the list. A programmer working with a DSP simply requests the next value, and everything else is dealt with in specific arrangements of logic gates built into the DSP device. The DSP device can insert or retrieve a value from the ring buffer every clock cycle, whereas the CPU will be busy doing administrative tasks for at least several more.
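Here's a minimal software ring buffer of the kind described above (the class and method names are mine). The point to notice is the wrap-around test inside `insert`: in software it runs on every single sample, whereas a DSP's circular addressing hardware performs the same wrap with no extra instructions:

```python
class RingBuffer:
    """A fixed-size buffer that overwrites its oldest entry when full."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.index = 0          # where the next sample will be written

    def insert(self, sample):
        self.data[self.index] = sample
        self.index += 1
        if self.index == len(self.data):   # the per-sample check a DSP avoids
            self.index = 0

    def last(self, n):
        """Return the n most recent samples, oldest first."""
        return [self.data[(self.index - n + i) % len(self.data)]
                for i in range(n)]

buf = RingBuffer(3)
for s in [10.0, 20.0, 30.0, 40.0]:   # the fourth insert wraps, evicting 10.0
    buf.insert(s)
print(buf.last(3))                   # [20.0, 30.0, 40.0]
```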
And that's just one low-level example. Modern DSPs tend to have entirely different architectures to the sort of processors we think of as CPUs, with separate memory areas for code and data (whereas a current workstation has a single, shared memory). There are often facilities to sum the results of multiplication operations without having to separately perform the addition, enhanced facilities for floating-point mathematics, and for almost GPU-style parallel computing, where a single instruction is rapidly executed on a series of data values. The inclusion of fairly elementary versions of this single instruction, multiple data (SIMD) approach on desktop computer processors is designed to improve their DSP abilities; Intel called their early version MMX, for multimedia extensions. Current, more advanced implementations are called SSE by Intel and 3DNow! by AMD, but they're eclipsed by the sort of heavy-duty hardware available on a dedicated DSP device.
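The multiply-then-sum pattern mentioned above is the multiply-accumulate (MAC) operation, and it's the inner loop of virtually every FIR filter. A rough sketch of the pattern (function name mine):

```python
def mac(coefficients, samples):
    """Multiply-accumulate: the running sum of coefficient * sample
    products. A DSP fuses the multiply and the add into one instruction
    per tap; a general-purpose CPU traditionally does them separately."""
    acc = 0.0
    for c, x in zip(coefficients, samples):
        acc += c * x    # one fused multiply-accumulate per tap on a DSP
    return acc

# Three equal taps of one-third reproduce the moving average from earlier.
print(mac([1/3, 1/3, 1/3], [3.0, 6.0, 9.0]))  # 6.0
```

On a DSP, and with SIMD extensions such as SSE, several of these products can also be computed in parallel in a single instruction, which is exactly where the throughput advantage comes from.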
So far, so low level. But what of modern cellphones and their ability to do truly mindbending amounts of work on full-motion video?
Almost all video codecs that are currently useful for film and TV work, including h.264, MPEG-2 and Redcode, involve (at least in part) breaking the image down into chunks and then representing those chunks as combinations of wave shapes. The idea here is that any continuous signal, such as a series of pixel brightnesses, can be drawn as a graph. That graph, a wavy line, can be closely approximated as a sum of various sine (or other) waves. Those combined waves which contribute only a small amount to the final signal shape can be ignored without objectionable amounts of change to the image, which is how compression is achieved.
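The wave-sum idea above can be sketched in miniature using a one-dimensional discrete cosine transform. This is a toy illustration of the principle, not any real codec's transform; the DCT-II/DCT-III formulas are standard, but the function names, the pixel row and the keep-or-drop threshold are mine:

```python
import math

def dct(block):
    """DCT-II: express a block of values as cosine-wave amplitudes."""
    N = len(block)
    return [sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Inverse transform (scaled DCT-III): rebuild the block from waves."""
    N = len(coeffs)
    return [(coeffs[0] / 2 + sum(c * math.cos(math.pi * (n + 0.5) * k / N)
                                 for k, c in enumerate(coeffs) if k > 0)) * 2 / N
            for n in range(N)]

pixels = [52, 55, 61, 66, 70, 61, 64, 73]    # one row of pixel brightnesses
coeffs = dct(pixels)

# "Compress" by zeroing coefficients that contribute little to the shape,
# then reconstruct: the result is close to, but not exactly, the original.
kept = [c if abs(c) > 10 else 0.0 for c in coeffs]
approx = idct(kept)
print([round(p) for p in approx])
```

Note the sheer number of multiplications and additions even for eight values; a real codec does this (in two dimensions) for every block of every frame, which is why dedicated hardware is so attractive.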
Actually doing this requires a mathematical procedure (the discrete cosine transform or discrete wavelet transform) which is beyond the scope of this article, but which, when broken down into individual steps as computer code, requires an absolutely enormous number of multiply-and-add steps, as does the reverse procedure required to recover a viewable image. The low-power ARM processors used in cellphones, for instance, would not be able to do this in realtime. Instead, a device designed specifically to do bulk DSP work of this type is included. Various implementations of hardware h.264 codecs rely on the host device's CPU to do more or less of the work. For desktop and laptop computers, Nvidia's PureVideo trademark refers to a hardware DSP implementation of h.264 and codecs using similar techniques. On cellphones, the h.264 handling hardware is probably on the same bit of silicon as the (often ARM) CPU, but it's a DSP all the same.
So, all video codecs are DSP, and given the relative bulk of video data, it's no surprise to find hardware digital signal processors implementing them. Other things, like GPUs, are a grey area: if a 3D representation of a scene delivered as a list of points, lighting and some texture maps can be considered a signal, they're DSP, but that's a bit too much of a leap for some people. Certainly their massively parallel architecture means that general-purpose DSP applications can use at least some of their potential. Either way, the ability of modern computers to render enormously complicated scenes in realtime is a great example of specifically-designed hardware raising the capabilities of electronics way beyond what would be possible with a more generic design, and that's the main thing to realise here: flexibility and performance are, to an extent, antagonistic characteristics.
So, next time a piece of equipment that fits in your pocket or travelling bag appears to be doing immensely complex things, think again before handing the glory to ARM or Intel. Progress in general-purpose computing has been immense and that is of course very welcome. But ultimately, don't overlook the idea that any particular bit of audio-visual chicanery is in fact being executed by some third-party, task-specific system, beavering away behind the scenes.