Using algorithms to generate foley sounds

Can a computer predict what noise this combination will make?

It seems there are few jobs left that computers can’t edge into somehow. A US team has now got a computer to match sounds to silent video with an impressive degree of realism.

Setting up a computer to replicate sound is a complicated process, but it is doable, and once the algorithms have been refined a little it could have some interesting implications for audio further down the line.

As detailed in the paper ‘Visually Indicated Sounds’ by a team of researchers from MIT, UC Berkeley and Google, an algorithm was fed a dataset of 978 videos in which various materials were hit or scratched with a drumstick a total of 46,577 times. The dataset came with plenty of metadata, such as whether each action was a hit or a scratch, which category the material fell into, and what physical reaction resulted (splash, deformation, etc.). These labels weren’t used as input for learning; instead, they let the team keep track of how the algorithm was performing once it was up and running, i.e. where it was pulling each sound from. After that, it was pretty much left to get on with it.
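The idea of the algorithm “pulling” its sound from the training set can be illustrated with a toy example-based lookup. This is not the researchers’ actual model, which the article doesn’t describe. It is a minimal nearest-neighbour sketch that assumes each training clip has already been reduced to a short sound-feature vector, and that some other model has produced a predicted vector from the silent video. The feature values, clip names and the `retrieve_sound` helper are all hypothetical.

```python
import math
from typing import Sequence

# Hypothetical training set: each entry pairs a sound-feature vector
# (e.g. a rough summary of energy in a few frequency bands) with the
# recorded clip it was computed from.
TRAINING_SET = [
    ([0.9, 0.2, 0.1], "wood_hit_001.wav"),
    ([0.3, 0.7, 0.6], "leaves_scratch_014.wav"),
    ([0.5, 0.5, 0.9], "water_splash_007.wav"),
]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_sound(predicted: Sequence[float]) -> str:
    """Return the training clip whose features are closest to the prediction.

    In a real system `predicted` would come from a model watching the
    silent video; here it is supplied by hand for illustration.
    """
    _features, clip = min(
        TRAINING_SET, key=lambda item: euclidean(item[0], predicted)
    )
    return clip


if __name__ == "__main__":
    # A prediction near the 'leaves' features selects the leaves clip.
    print(retrieve_sound([0.35, 0.65, 0.55]))
```

Because the output is always an existing recording, every generated sound can be traced back to a specific labelled example. That is the kind of bookkeeping the metadata makes possible.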

The soundtrack it produced for a silent video is impressive, especially for materials such as leaves and dirt, which could be described as ‘non-solid’. Indeed, when asked which soundtrack was real, human viewers of the resulting videos proved twice as likely to pick the algorithm-generated audio over the genuine audio track.

“Often when a participant was fooled, it was because the sound prediction was simple and prototypical (e.g., a simple thud noise), while the actual sound was complex and atypical,” says the paper. “True leaf sounds, for example, are highly varied and may not be fully predictable from a silent video.”

It was less successful with harder materials, and it was sometimes fooled by a near miss, where the drumstick came close to a surface without actually striking it. Even so, it has interesting potential, especially for real-time effects such as those required by games and 360-degree video. For foley, though, the research seems to confirm what foley artists have known all along: it’s not about what you hear, it’s about what you expect to hear, and, conditioned by decades of film and TV, the two are not always the same.
