Audio has often played second fiddle to the visual, yet, to paraphrase George Lucas, sound is more than half the picture. Certainly, when it comes to the immersive information and entertainment experiences promised by ubiquitous high-capacity broadband, the focus has been on what we might see rather than on what we hear. That could be about to change, and here’s why.
Personal voice assistants such as Alexa, Siri and Google Assistant are emerging as the biggest interface revolution since the iPhone popularised the touchscreen. By 2023, we will speak rather than type more than half of our Google search queries, predicts Comscore.
At the same time, one of the fastest-growing categories of internet-connected, body-worn sensors is wearable audio technology. Known as ‘hearables’, these devices are likely to harness machine learning to learn our preferences, habits and behaviour so that they can engage with us on a more personal, one-to-one level.
So rapid is development in this area that the worldwide market for smart in-ear ‘hearables’ will be worth more than $73 billion by 2023, according to Futuresource Consulting.
“The thirst for technology integration, notably voice assistants, exhibits potential to build a unique class of innovative hearable products,” believes analyst Simon Forrest.
Coupled with location awareness from on-board GPS, spoken direction will become “an essential skill for hearables”, suggests Forrest. That goes well beyond the ‘command and control’ voice interface we have today: hearables will be able to guide users with spoken, step-by-step instructions.
The basic use cases are health monitoring, such as tracking pulse or stress levels, and hearing assistance.
One scenario, envisaged by Poppy Crum, Chief Scientist at Dolby Laboratories and an Adjunct Professor at Stanford University, has you trying to follow a football match on TV while cooking in the kitchen. Your hearables know there’s a problem because they have detected an increase in your mental stress, based on changes in your blood pressure and brain waves, and they automatically increase the volume of sounds coming from the direction of the TV.
The same kind of directional amplification could help you hear your dinner companion in a restaurant, or a friend in a club.
Hearables could even work out exactly whom you are trying to hear by tracking your attention, even if you can’t see the person directly.
“We’ve all been at a party where we heard our names in a conversation across the room and wanted to be able to teleport into the conversation.” Soon we’ll be able to do just that, says Crum.
Adaptive noise cancellation, voice assistant integration and smart user interfaces all stem from developments in wireless technology.
Wireless earbuds such as Apple AirPods and Bose Sleepbuds show how advances in miniaturisation and battery technology have enabled small, lightweight devices that weren’t possible just a decade ago. Bose recently introduced Bose Frames, which integrate directional speakers into a pair of sunglasses.
All of these new features improve the listening experience for consumers and help reduce dependence on the smartphone for simple controls, such as pausing or playing music, asking for weather or navigation information, or adjusting the volume.
How about ditching the maze of menus within menus and buttons on your digital camera and simply asking your personal voice AI to ‘record rapid burst 4K, stop at 3GB, save as JPEG and RAW and give me HDR options’?
That wouldn’t work if you’re taking close-up shots of easily disturbed wildlife, but it’s as hands-free as you’re likely to get. And over time, as the voice AI uses natural language processing to learn more about your photographic preferences, you and your AI will develop a shorthand. You’ll be creating images together.
Voice assistants, evolved
Amazon, Google and others are working on ways to evolve assistants from a voice interface that completes basic tasks into one that can understand complex, conversational requests.
Efforts are also being made to stitch voice assistant applications together under one operating system, so that users need only talk to a single assistant wherever they are.
Skip forward a few years and you can readily imagine a scenario like the one in Spike Jonze’s 2013 film Her, in which the lead character, Theo, falls in love with his voice-driven operating system, Samantha. Samantha would pass the Turing test: in Theo’s mind, his relationship with an artificial intelligence is indistinguishable from the real thing.
A new concept of ‘audible AR’ could evolve, presenting an opportunity for 5G hearables to overlay spoken information on the real-world environment in real time.
Science fiction? Not for Poppy Crum. She is working toward audio technology that is “truly empathetic” and calls the ear the biological equivalent of a USB port.
“It is unparalleled not only as a point for ‘writing’ to the brain, as happens when our earbuds transmit the sounds of our favourite music, but also for ‘reading’ from the brain,” she says.
Today’s virtual assistants rely on the cloud for the powerful processing needed to respond to requests. But artificial neural network chips coming soon from IBM, Mythic and others will often allow such intensive processing to be carried out on the hearable itself, eliminating the need for an internet connection and allowing near-instantaneous reaction times.
“Voice assistants will no longer remain quiescent until summoned by the user,” says Forrest. “Instead they will intelligently interject at optimum moments throughout the day, influencing the user’s thoughts and behaviour.”
He suggests that the race is on to identify and monetise services that do not necessarily rely on screens. “Advertisers will be quick to harness opportunity to speak to wearers, conveying precisely timed and relevant information based upon geolocation,” he says.
A whole new world of audible applications will develop alongside visual ones, digitally enhancing the soundscape.
Rather than overlaying the world with visual information, audible AR offers a ‘layered’ listening experience.
Crum thinks future hearables will use software to interpret fluctuations in the electrical fields recorded in our ears, drawing on decades of research in which changes in electroencephalograms (EEGs) have given scientists insight into a person’s state of mind.
By that stage, we may not need to talk to our AI at all, since it will be reading our minds before we’ve even processed the thought.