How speech recognition software can improve video compression

Written by David Shapton

Speech Recognition by Google

Speech recognition software could ultimately lead us to a new way to work with video

Speech recognition is something that’s been “assumed” for a very long time: decades, in fact (and that is a long time in the field of digital media technology). Go to any science fiction film or TV program from the last forty years and you’ll find computers that you can talk to.

Here’s a short film, made by Google, about the history of Speech Recognition (or “Beach Wreck Ignition” if there’s a bit too much background noise…).


Google Now, Siri and Microsoft’s Cortana are all examples of computer services that you can “talk” to, within very specific limits. Google, particularly, are throwing a huge amount of weight behind their speech recognition efforts, including recruiting Ray Kurzweil, one of the foremost and most accurate predictors of the future on the planet and, importantly for Google, one of the pioneers of computer speech recognition.

Getting a machine to understand speech is a “hard” problem, not least because it’s not clear that we understand what it means for a machine to “understand”. You could say that if the machine or computer responds in the way that we expected and wanted it to, then it has “understood” our “conversation”, but no machine today comes even close to behaving like a sentient being that is able to intelligently converse with us. And to “understand” speech, a computer would have to “understand” the world around it.

That may happen. You can see why this is important to Google, because the more their computers “understand” the world, the better and more useful will be their search results.

This all has implications for the future of video as well. Computers with sight, hearing, and the ability to move around and interact with their environments (let’s call them “Robots”) need to understand the world in order to be able to do useful things within it. So, rather than seeing the world as a series of bitmaps, these robots need to understand the nature of the objects they “see” and the relationships between them.

Once we reach the point where a computer is talking about objects and their interactions, we can make this description more and more granular and perhaps dispense with pixels altogether.

Imagine: resolution and frame rate-independent video!
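To make the idea a little more concrete, here is a minimal sketch in Python of what "video without pixels" might look like. It is only an illustration under stated assumptions, not a description of any real codec: the `MovingDisc` class and the `render` and `render_clip` functions are hypothetical names invented for this example. The scene is stored as a single object with a position that depends on time, and pixels are only generated at the moment of display, at whatever resolution and frame rate the viewer asks for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MovingDisc:
    """Hypothetical scene object: a disc whose centre moves in a straight line.

    Positions and radius are in scene units (0..1), not pixels.
    """
    x0: float      # starting centre, horizontal
    y0: float      # starting centre, vertical
    vx: float      # velocity in scene units per second
    vy: float
    radius: float

    def centre(self, t: float) -> Tuple[float, float]:
        """Centre of the disc at time t (seconds)."""
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)


def render(obj: MovingDisc, t: float, width: int, height: int) -> List[List[int]]:
    """Rasterise the scene at time t onto a width x height grid of 0/1 values."""
    cx, cy = obj.centre(t)
    frame = []
    for row in range(height):
        y = (row + 0.5) / height          # sample at the pixel centre
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= obj.radius ** 2
            line.append(1 if inside else 0)
        frame.append(line)
    return frame


def render_clip(obj: MovingDisc, duration: float, fps: int,
                width: int, height: int) -> List[List[List[int]]]:
    """Render the same scene description at any frame rate and resolution."""
    frame_count = round(duration * fps)
    return [render(obj, i / fps, width, height) for i in range(frame_count)]


# One scene description, two very different "videos".
disc = MovingDisc(x0=0.25, y0=0.5, vx=0.5, vy=0.0, radius=0.2)
low = render_clip(disc, duration=1.0, fps=2, width=8, height=8)
high = render_clip(disc, duration=1.0, fps=50, width=64, height=64)
```

The description of the scene stays the same size whether it is displayed as 2 tiny frames or 50 larger ones. The hard part, of course, is getting a computer to produce such a description from the real world in the first place.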

Speech recognition is a small but significant step towards this.


