It’d be nice to open this piece with a daring claim that Apple is developing a self-driving car. Let’s look a little closer at that and figure out why it’s been suggested.
Figuring out what a company is doing by looking at its patent filings is an old trick. Applying that to Apple right now reveals the company’s interest in what’s widely called LiDAR, which, while traditionally using lasers, seems recently to have become a shorthand for more or less any technique that allows an electronic device to detect how far away things are using light.
Sometimes that’s a depth image to go along with a conventional photo, but it’s just as valid to think about LiDAR in the same way as we consider its almost-namesake, radar. Both create a map of distances to obstacles that might be useful to a self-driving car (or other application) whether or not that map is associated with a conventional image.
Probably the first technology to be referred to as LiDAR involved lasers, and has sometimes been used to create 3D models of real scenes for tasks including visual effects for feature films and TV, architecture and construction, and even crime scene recording. The approach used there is often to have a laser scan vertically across the scene via a rotating mirror as the sensor itself rotates through 360 degrees, creating an accurate three-dimensional plot of everything visible from the point at which the scanner is located. The results tend to look like a scene illuminated from that point, with things the scanner can’t see vanishing into what looks like shadow.
Multiple scans can be superimposed to improve coverage and range, although a big advantage of this approach is that the range can be quite long, limited only by the range of what’s essentially a laser rangefinder. Spatial resolution degrades at longer range, but distance resolution need not, and large outdoor areas can potentially be scanned. That said, full three-dimensional environment recording at high resolution generally can’t be done at video speeds, with each scan taking minutes.
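The scanning geometry described above can be sketched in a few lines. This is only an illustration, assuming each reading arrives as a hypothetical (azimuth, elevation, range) triple; real scanners use their own formats. It converts readings to 3D points and shows why the gap between neighbouring samples grows with distance even though each range reading stays just as precise.

```python
import math

def scan_to_points(samples):
    """Convert (azimuth_deg, elevation_deg, range_m) readings to (x, y, z) in metres.

    The tuple layout is an assumption made for this sketch, not a scanner format.
    """
    points = []
    for azimuth_deg, elevation_deg, range_m in samples:
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        horizontal = range_m * math.cos(el)  # distance along the ground plane
        points.append((
            horizontal * math.cos(az),
            horizontal * math.sin(az),
            range_m * math.sin(el),
        ))
    return points

def point_spacing(range_m, angular_step_deg):
    """Approximate gap between neighbouring samples at a given range."""
    return range_m * math.radians(angular_step_deg)

if __name__ == "__main__":
    print(scan_to_points([(0, 0, 10), (90, 0, 10), (0, 45, 5)]))
    for r in (10, 100):
        print(r, "m range:", point_spacing(r, 0.1), "m between samples")
```

For a fixed angular step, the spacing is simply proportional to range, so a wall ten times further away is sampled ten times more coarsely.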
A Faro laser scanner (top) and its output (bottom).
The other approach is often stereoscopy, comparing two images taken some distance apart to detect the range of objects. Unlike laser rangefinding, accuracy falls off significantly even at quite short ranges; remember that the human visual system, working with two eyes separated by perhaps 60mm on average, can barely detect distance stereoscopically beyond about thirty feet. Adding more cameras can help average out the noise in the readings, which is essentially what we mean by a lightfield array as promoted by organisations like Fraunhofer Labs.
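The fall-off in stereoscopic accuracy follows from the standard pinhole stereo relationship, in which distance equals focal length times baseline divided by disparity. The sketch below assumes an idealised, rectified camera pair; the 1000-pixel focal length and quarter-pixel matching error are illustrative assumptions, while the 60mm baseline echoes the eye separation mentioned above.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Distance to a feature seen disparity_px apart in two rectified images."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

def depth_uncertainty(focal_px, baseline_m, depth_m, disparity_error_px=0.25):
    """First-order depth error caused by a small error in the measured disparity."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)

if __name__ == "__main__":
    focal_px, baseline_m = 1000.0, 0.06  # assumed values for illustration
    for depth_m in (1.0, 3.0, 10.0):
        error = depth_uncertainty(focal_px, baseline_m, depth_m)
        print(f"{depth_m:5.1f} m away: roughly +/- {error:.3f} m")
```

Because the error grows with the square of the distance, ten times the range means a hundred times the uncertainty, which is why a wider baseline or more cameras helps so much.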
In the short term, cellphone applications will involve the latter approach, because it can be done with cellphone technology, although plenty of automated driving systems have used laser scanners. The rotating mirror assemblies we sometimes see at the corners of self-driving car prototypes are presumably LiDAR devices, although clearly of types intended for lower-resolution scanning sufficient to avoid obstacles, rather than to create a realistic representation of a scene.
Apples and Octrees
Apple’s interest seems to be in how LiDAR data is stored and transmitted. The company’s patents talk about compression in familiar terms, with various clauses talking about overlooking duplicate points and using techniques such as octrees to represent three-dimensional data. Octrees are not a new idea, having appeared at least as early as 1980 in a paper by Donald Meagher at Rensselaer Polytechnic Institute in New York, and being found in a large proportion of 3D graphics software.
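As a rough idea of what overlooking duplicate points might mean in practice, one common approach is to snap points to a grid and keep only one per cell. The sketch below is a generic illustration of that idea, not the method in Apple’s filings; the function name and the cell size are assumptions.

```python
import math

def deduplicate(points, cell_size):
    """Keep the first point that falls in each cubic cell of side cell_size."""
    seen = set()
    kept = []
    for x, y, z in points:
        # Points sharing a cell are treated as duplicates of one another.
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        if key not in seen:
            seen.add(key)
            kept.append((x, y, z))
    return kept

if __name__ == "__main__":
    cloud = [(0.01, 0.01, 0.01), (0.02, 0.02, 0.02), (1.5, 0.0, 0.0)]
    print(deduplicate(cloud, cell_size=0.1))
```

The cell size sets the trade-off: larger cells discard more points and more detail.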
Games have long relied on related space-partitioning structures; the original Doom, for instance, used binary space partitioning trees rather than octrees. Octrees involve considering the world as a cube, and dividing that cube up into eight cubes of half the dimensions, then repeating the process on each smaller cube until some suitable resolution limit is achieved. The resulting data structure is tree-shaped with each node having eight subnodes, organised in ways that simplify the sort of searches and calculations that 3D applications often need. Meagher’s patent on the technique is priority-dated 1984 and talks about “image generation,” that is, 3D rendering. Apple appears to have patented the use of octrees, among other more directly compression-related techniques such as run-length encoding, specifically with LiDAR data in mind.
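The subdivision described above is compact enough to sketch. The code below is a minimal illustration rather than anything taken from the patents: it assumes all points lie inside a known bounding cube, builds the tree to a fixed depth, and then writes it out as one byte per node, with each bit recording whether the corresponding sub-cube contains anything. All function names are hypothetical.

```python
def build_octree(points, origin, size, depth):
    """Build an octree for points inside the cube at origin with side length size.

    A leaf is True (occupied); an internal node maps child index 0-7 to a subtree.
    """
    if depth == 0:
        return True
    half = size / 2.0
    buckets = {}
    for x, y, z in points:
        # One bit per axis: is the point in the upper half of the cube on that axis?
        index = (
            ((x >= origin[0] + half) << 2)
            | ((y >= origin[1] + half) << 1)
            | (z >= origin[2] + half)
        )
        buckets.setdefault(index, []).append((x, y, z))
    node = {}
    for index, bucket in buckets.items():
        child_origin = (
            origin[0] + half * ((index >> 2) & 1),
            origin[1] + half * ((index >> 1) & 1),
            origin[2] + half * (index & 1),
        )
        node[index] = build_octree(bucket, child_origin, half, depth - 1)
    return node

def occupancy_bytes(node):
    """Serialise depth-first: one byte per internal node, bit i set if child i is occupied."""
    if node is True:
        return b""
    mask = 0
    for index in node:
        mask |= 1 << index
    out = bytes([mask])
    for index in sorted(node):
        out += occupancy_bytes(node[index])
    return out

if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.1, 0.1, 0.2), (0.9, 0.9, 0.9)]
    tree = build_octree(cloud, origin=(0.0, 0.0, 0.0), size=1.0, depth=2)
    print(list(occupancy_bytes(tree)))
```

Two things fall out of this structure for free: nearby points that land in the same smallest cube collapse into a single leaf, and empty regions of space cost nothing beyond a zero bit in their parent.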
This doesn’t tell us much about the company’s intended application, although it’s as likely to involve augmented reality as driverless cars; iPhones already have a measurement app. Since Pokemon Go is already well-established, if not genuinely old news, rumour has recently abounded that Apple is interested in AR not only on cellphones but also with head-mounted displays. Memories of Google Glass might provoke frowns here, as might the fact that head-mounted AR displays suffer at least some of the same problems as any other application involving stereo 3D: edge violations, disparity between the focus distance and the apparent stereoscopic distance, and others.
Theoretically, sufficiently advanced optics capable of recollimating on the fly, along with eye tracking, could resolve the focus-stereoscopy offset, although that’s probably some way from being a consumer technology in 2021. One advantage of a head-mounted display is that it solves the issue of a projected stereoscopic image in which the stereo offset is always horizontal; this assumes the audience will hold their heads level, which ceases to be a problem when the display is attached to the viewer. A sufficiently wide field of view on the display might help avoid edge violations, although micro-miniature display technology still seems unlikely to be equal to this task at the time of writing.
And, in the end, companies patent things pre-emptively all the time, and Apple might have no specific interest in actually applying any of its newly-protected ideas, at least not now. The application might be as simple as making it possible to re-focus images long after they were taken, using depth data to simulate depth of field. One of Apple’s new claims even refers to video.
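To give a sense of how depth data could drive after-the-fact refocusing, the sketch below uses the textbook thin-lens circle-of-confusion formula to turn a per-pixel depth map into a per-pixel blur size. It is an assumption-laden illustration, not Apple’s method: the 50mm lens, the f/2 aperture and the function names are all invented for the example.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m, f_number):
    """Blur-spot diameter on the sensor (metres) for an object at depth_m
    when a thin lens is focused at focus_m."""
    aperture_m = focal_length_m / f_number
    return (
        aperture_m * focal_length_m * abs(depth_m - focus_m)
        / (depth_m * (focus_m - focal_length_m))
    )

def blur_map(depth_map, focus_m, focal_length_m=0.05, f_number=2.0):
    """Per-pixel blur diameters for a depth map given as a list of rows of distances."""
    return [
        [circle_of_confusion(d, focus_m, focal_length_m, f_number) for d in row]
        for row in depth_map
    ]

if __name__ == "__main__":
    depths = [[1.0, 2.0, 4.0], [1.5, 2.0, 8.0]]  # metres, illustrative
    for row in blur_map(depths, focus_m=2.0):
        print([round(c * 1000, 3) for c in row])  # millimetres on the sensor
```

A renderer would then blur each pixel by an amount proportional to its value in the map, leaving anything at the chosen focus distance sharp; changing focus_m after the fact is what re-focuses the picture.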
Or, Apple might be developing a driverless car. Got to keep up with the Google, eh?