Sound is 70% of what we see, or so the mantra goes. Much is made of sound design for traditional films, but how do you create an effective sound mix for VR?
George Lucas once said that “sound and music is half the entertainment in a movie”, but he may want to up that percentage when applied to VR. As anyone who has popped on a headset can testify, the feeling of disorientation is amplified by sensory deprivation. Sound amplifies emotions and adds realistic depth to otherwise hollow visual experiences. If the sound doesn’t faithfully track the picture as you turn your head, or fails to immerse you in it, your ability to navigate and enjoy the experience suffers.
This is even more the case with so-called full VR, otherwise known as 6DOF (six degrees of freedom): the ability to walk around within a 360-degree environment. Without convincing audio, the audience can’t feel fully embedded in a virtual reality story.
Start-up G’Audio Lab thinks it has the answer. It has devised a format that it claims provides a superior sense of localisation and sound quality compared to any other current technology.
The founding members of the LA-based outfit contributed to international audio standardisations, including the binaural rendering aspect of MPEG-H 3D Audio. One of them, Dr Oh Henney, holds more than 1,000 patents.
While MPEG-H 3D Audio was developed to support channel, object and/or ambisonics signals, it is not optimal for VR, they argue. At its inception, of course, VR applications weren’t really on MPEG’s radar. MPEG-H 3D Audio was created as a standard for situations in which there may be many loudspeakers used in the audio presentation. The focus became UHD-TV and then multi-channel configurations like 22.2 surround sound.
As they point out, having audio replayed through headphones or through defined speakers works well if the content is pre-rendered and viewed from a fixed position, as in 3DOF, but far less well in 6DOF, where it cannot accurately reflect the movement of the VR user.
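The geometry makes the problem concrete: in 3DOF only the listener’s head rotation changes, but in 6DOF both the listener’s position and orientation change every frame, so each source’s apparent direction must be recomputed continuously. The following sketch is illustrative only and is not G’Audio Lab’s implementation; the function name and coordinate conventions are assumptions.

```python
import numpy as np

def source_relative_to_listener(source_pos, listener_pos, listener_yaw_deg):
    """Express a world-space source position in the listener's head frame.

    Illustrative sketch: in 3DOF only the rotation term varies, so a
    pre-rendered (baked) mix can survive; in 6DOF both terms change
    every frame, which is why baked audio cannot stay spatially correct.
    """
    # Translate the source into listener-centred coordinates
    rel = np.asarray(source_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    # Undo the listener's head yaw (rotation about the vertical axis)
    yaw = np.radians(listener_yaw_deg)
    rot = np.array([[np.cos(-yaw), -np.sin(-yaw)],
                    [np.sin(-yaw),  np.cos(-yaw)]])
    rel[:2] = rot @ rel[:2]
    return rel

# A source directly to the listener's left; after the listener turns
# 90 degrees towards it, it should sit straight ahead.
r = source_relative_to_listener([0, 1, 0], [0, 0, 0], 90)
```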
Additionally, they point out that MPEG-H 3D Audio is essentially focused on receiving, delivering and playing back a signal, not on recording sound in the first place, and that it neglects post-production.
G’Audio Lab’s proprietary format, GA5, incorporates channel, ambisonics and object-based audio. It supports object tracks for pinpointed sound and ambisonics for ambience, while metadata carries the positional information needed to play and render the respective object, channel and ambisonics signals.
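GA5’s actual schema is proprietary and not public, so as a purely hypothetical illustration, per-scene metadata for a hybrid format of this kind might look something like the following. Every class and field name here is invented for the sketch, not taken from GA5.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: field names are illustrative, not GA5's schema.
@dataclass
class ObjectTrack:
    name: str              # label for the pinpointed source
    position: tuple        # (x, y, z) in the scene's coordinate frame
    gain_db: float = 0.0   # per-object level trim

@dataclass
class SceneMetadata:
    objects: list = field(default_factory=list)   # pinpointed mono sources
    ambisonics_order: int = 1                     # ambience bed
    channel_layout: str = "stereo"                # fixed channel bed

# A scene mixing one tracked object over a first-order ambience bed
scene = SceneMetadata(
    objects=[ObjectTrack("footsteps", (1.0, 0.0, -2.0))],
    ambisonics_order=1,
)
```

The point of the structure is the one the article describes: objects carry positions the renderer can react to, while the ambisonics and channel beds supply diffuse context.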
Its spatial audio post-production solution, Works, can be added to Avid Pro Tools as an AAX plugin and assists in positioning object sounds in a virtual environment. It has also built a renderer SDK, called Sol, to support GA5, which can be integrated into any web player, HMD or standalone app.
Its process combines binaural rendering with a head-related transfer function (HRTF), the response that characterises how an ear receives a sound from a point in space.
“Because spatial audio uses HRTF, sound can be placed anywhere in a 3D space, with elevation and distance also being taken into account,” the firm explains. “Using binaural rendering and HRTF, even if spatial audio is consumed through headphones, it’s possible to hear sounds as if they were coming from external sources.”
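At its core, binaural rendering means convolving each source signal with a left-ear and a right-ear impulse response (the time-domain form of the HRTF) for the source’s direction. A minimal sketch, with toy two-tap filters standing in for measured HRIRs; the function name is an assumption, and a real renderer would interpolate HRIRs from a measured set per direction.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono source for headphones by convolving it with the
    head-related impulse responses for one direction.

    In practice hrir_left/hrir_right come from a measured HRTF database,
    selected or interpolated for the source's azimuth and elevation;
    here any short FIR pair stands in.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: (2, len(mono) + len(hrir) - 1)

# Toy example: an impulse source with a one-sample interaural delay and
# level difference, mimicking a source slightly to the listener's left
# (left ear hears it first and louder).
mono = np.array([1.0, 0.0, 0.0])
out = binaural_render(mono,
                      hrir_left=np.array([1.0, 0.0]),
                      hrir_right=np.array([0.0, 0.8]))
```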
“When content created using Works is played on an HMD, sound objects change according to the users’ interactions,” it states. “What they see is synchronised with what they hear. When each sound source is delivered to the playback side as an individual object signal, it can truly reflect both the environment and the way the user is interacting within the environment.”
And since the format supports the simultaneous use of three different kinds of audio signal (mono object, channel and ambisonics), this combination “enables new levels of freedom and intuition to deliver the most realistic sounds possible.”
Making a VR user feel present in the virtual world is the key to alternate reality experiences. From an audio perspective, that feeling of real presence only happens when users can hear the action the way they see it. Anything that can bring higher fidelity and sophistication to sound production and reception is a step in the right direction.