Minority Report has a lot to answer for, not least the stimulus given to a million articles like this about the future of the human-machine interface. Controlling internet-connected devices with gesture and voice is widely seen as the future but nothing has come close to the slick air interface imagined in Steven Spielberg’s 2002 movie.
Google hasn’t cracked it either – but it’s got something that has potential and it’s already inside an actual product, the Pixel 4 phone.
It’s disarmingly simple too and stems from the idea that the hand is the ultimate input device. The hand, would you believe, is “extremely precise, extremely fast”, says Google. Could this human action be finessed into the virtual world?
Google assigned its crack Advanced Technology and Projects team to the task and they concentrated research on radio frequencies. We track massive objects like planes and satellites using radar, so could it be used to track the micro-motions of the human hand?
Turns out that it can. A radar transmits a radio wave toward a target, and its receiver picks up the signal reflected back. Properties of that reflected signal, including energy, time delay and frequency shift, capture information about the object’s characteristics and dynamics such as size, shape, orientation, material, distance and velocity.
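To make two of those properties concrete, here is a minimal Python sketch of the textbook relations that turn time delay into distance and frequency shift into velocity. It is not Google's code: the function names are invented for illustration, and the 60 GHz carrier used in the example is an assumption rather than a figure quoted here.

```python
# Illustrative only: standard radar relations, not Soli's implementation.
C = 299_792_458.0  # speed of light in metres per second


def range_from_delay(round_trip_s: float) -> float:
    """Distance to the target, given the echo's round-trip time.

    The wave travels out and back, so the one-way distance is half
    the total path covered in that time.
    """
    return C * round_trip_s / 2.0


def velocity_from_doppler(shift_hz: float, carrier_hz: float) -> float:
    """Radial velocity of the target, given the Doppler frequency shift.

    A positive shift means the target is moving toward the radar.
    """
    return shift_hz * C / (2.0 * carrier_hz)


if __name__ == "__main__":
    # Hypothetical inputs: a 1 ns echo delay and a 400 Hz shift
    # on an assumed 60 GHz carrier.
    print(f"range: {range_from_delay(1e-9):.4f} m")
    print(f"velocity: {velocity_from_doppler(400.0, 60e9):.4f} m/s")
```

The division by two in both functions reflects the round trip: the signal covers the distance twice, and the Doppler shift is applied once on the way out and once on the way back.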
The next step is to translate that into interactions with physical devices.
Google did this by conceiving Virtual Tools: a series of gestures that mimic familiar interactions with physical tools. Examples include a virtual dial that you turn as if miming turning a volume control. The virtual tools metaphor, suggests Google, makes it easier to communicate, learn, and remember interactions.
While virtual, the interactions also feel physical and responsive. Imagine a button between thumb and index finger. It’s invisible but pressing it means there is natural haptic feedback as your fingers touch. It's essentially touch but liberated from a 2D surface.
“Without the constraints of physical controls, these virtual tools can take on the fluidity and precision of our natural human hand motion,” Google states.
The good news doesn’t end there. Radar has some properties that set it apart from cameras, for example. It has very high positional accuracy, so it can sense the tiniest motion; it works through most materials; it can be embedded into objects; and it is not affected by light conditions. Google’s design has no moving parts, so it’s extremely reliable and consumes little energy. Most important of all, it can be shrunk onto a tiny chip.
Google started out five years ago with a large bench-top unit including multiple cooling fans but has redesigned and rebuilt the entire system into a single solid-state component of just 8mm x 10mm.
That means the chip can be embedded in wearables, phones, computers, cars and IoT devices and produced at scale.
Google developed two modulation architectures: a Frequency Modulated Continuous Wave (FMCW) radar and a Direct-Sequence Spread Spectrum (DSSS) radar. Both chips integrate the entire radar system into the package, including multiple beam-forming antennas that enable 3D-tracking and imaging.
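For readers unfamiliar with FMCW, the rough idea is that the radar sweeps its frequency over time, so the echo comes back at a slightly different frequency from the one currently being transmitted; that difference (the "beat" frequency) is proportional to distance. The Python sketch below shows the standard relations under stated assumptions. The 7 GHz sweep bandwidth and 100 microsecond chirp are hypothetical values chosen for illustration, not Soli specifications, and the function names are invented.

```python
# Illustrative only: generic FMCW relations with hypothetical parameters.
C = 299_792_458.0  # speed of light in metres per second


def beat_from_range(range_m: float, bandwidth_hz: float, chirp_s: float) -> float:
    """Beat frequency produced by a target at the given distance.

    The sweep rate is bandwidth / chirp duration; the echo lags by the
    round-trip time 2R/c, so the frequency gap is their product.
    """
    return 2.0 * range_m * bandwidth_hz / (C * chirp_s)


def range_from_beat(beat_hz: float, bandwidth_hz: float, chirp_s: float) -> float:
    """Invert the relation above: distance from a measured beat frequency."""
    return C * beat_hz * chirp_s / (2.0 * bandwidth_hz)


def range_resolution(bandwidth_hz: float) -> float:
    """Smallest separation at which two targets appear as distinct ranges."""
    return C / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    bandwidth = 7e9   # assumed sweep bandwidth in Hz (hypothetical)
    chirp = 100e-6    # assumed chirp duration in seconds (hypothetical)
    beat = beat_from_range(0.3, bandwidth, chirp)  # a hand 30 cm away
    print(f"beat frequency: {beat:.1f} Hz")
    print(f"recovered range: {range_from_beat(beat, bandwidth, chirp):.3f} m")
    print(f"range resolution: {range_resolution(bandwidth) * 100:.2f} cm")
```

Range resolution depends only on the sweep bandwidth, which is one reason a wide sweep matters for sensing something as small as a hand.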
Google is making an SDK available to encourage developers to build on its gesture recognition pipeline. The Soli libraries extract real-time signals from the radar hardware and output signal transformations, high-precision position and motion data, and gesture labels and parameters at frame rates from 100 to 10,000 frames per second.
Just imagine the possibilities. In the Pixel 4, Soli is located at the top of the phone and enables hands-free gestures for functions such as silencing alarms, skipping tracks in music and interacting with new Pokémon Pikachu wallpapers. It will also detect presence and is integrated into Google’s Face Unlock 3D facial-recognition technology.
Geoff Blaber, vice president of research for the Americas at analyst CCS Insight, says it’s unlikely to be viewed as game-changing, but that view marginalises the technology and Google’s ambition for it.
In fact, this radar-based system could underpin a framework for a far wider user interface for any or all digital gadgets. It could be the interface which underpins future versions of Android.
Google has hinted as much. In a web post, Pixel product manager Brandon Barbello said Soli “represents the next step in our vision for ambient computing”.
“Pixel 4 will be the first device with Soli, powering our new Motion Sense features to allow you to skip songs, snooze alarms, and silence phone calls, just by waving your hand. These capabilities are just the start and just as Pixels get better over time, Motion Sense will evolve as well.”
Ambient computing is a way of describing the volume of internet-connected devices likely to be pervasive in our environment – particularly the smart home – over the next few years. Everything from voice-activated speakers to heating, light control, CCTV and white goods will be linked to the web.
Google makes a bunch of these (from smoke detectors to speakers under its Nest brand) and wants to link them up under its operating system, which would in turn feed it more data about individuals to refine the user experience. The battle for the smart home will also be fought between Microsoft, Apple, Samsung and Amazon. Soli may be the smart interface that links not just Google products, but perhaps all these systems together.
Of course, it’s early days. The virtual gestures may be intuitive, but we still have to learn to use them; our virtual language needs to be built up. Previous gesture recognition tech, like the IR-driven Kinect and the Wii, has proved to be an interesting novelty but clunky in practice. Gesture will work best when combined fluently with voice interaction and dovetailed with augmented reality so that we can view and manipulate text, graphics, even video, virtually.
Just like Minority Report – except without the gloves which Tom Cruise’s PreCrime detective wore.
It couldn’t get everything right.