How speech recognition software can improve video compression

Written by David Shapton

Speech Recognition by Google

Speech recognition software could ultimately lead us to a new way to work with video

Speech recognition is something that’s been “assumed” for a very long time: decades, in fact (and that is a long time in the field of digital media technology). Go to any science fiction film or TV program from the last forty years and you’ll find computers that you can talk to.

Here’s a short film, made by Google, about the history of Speech Recognition (or “Beach Wreck Ignition” if there’s a bit too much background noise…).


Google Now, Siri and Microsoft’s Cortana are all examples of computer services that you can “talk” to, within very specific limits. Google, particularly, are throwing a huge amount of weight behind their speech recognition efforts, including recruiting Ray Kurzweil, one of the foremost and most accurate predictors of the future on the planet and, importantly for Google, one of the pioneers of computer speech recognition.

Getting a machine to understand speech is a “hard” problem, not least because it’s not clear that we understand what it means for a machine to “understand”. You could say that if the machine or computer responds in the way that we expected and wanted it to, then it has “understood” our “conversation”, but no machine today comes even close to behaving like a sentient being that is able to intelligently converse with us. And to “understand” speech, a computer would have to “understand” the world around it.

That may happen. You can see why this is important to Google, because the more their computers “understand” the world, the better and more useful will be their search results.

This all has implications for the future of video as well. Computers with sight, hearing, and the ability to move around and interact with their environments (let’s call them “Robots”) need to understand the world in order to be able to do useful things within it. So, rather than seeing the world as a series of bitmaps, these robots need to understand the nature of the objects they “see” and the relationships between them.

Once we reach the point where a computer is talking about objects and their interactions, we can make these descriptions more and more granular and perhaps dispense with pixels altogether.

Imagine: resolution and frame rate-independent video!
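To make the idea a little more concrete, here is a minimal sketch in Python of what "objects instead of pixels" might look like. Everything in it is an assumption made for illustration: the `Circle` class, the `render` function and the character-grid output are hypothetical, not any real codec or anything Google has described. The scene is stored as a moving object in unit-less coordinates, and pixels only appear at the moment someone asks for a picture at a particular size and time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Circle:
    """A hypothetical scene object, described in resolution-free units (0..1)."""
    x0: float      # centre x at t = 0
    y0: float      # centre y at t = 0
    vx: float      # horizontal velocity, scene units per second
    vy: float      # vertical velocity, scene units per second
    radius: float

    def centre(self, t: float) -> tuple[float, float]:
        # Position is a function of continuous time, so there is no fixed frame rate.
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)


def render(scene: list[Circle], t: float, width: int, height: int) -> list[str]:
    """Rasterise the scene at time t into a width x height grid of characters."""
    rows = []
    for j in range(height):
        row = []
        for i in range(width):
            # Sample at the pixel centre, converted to scene coordinates.
            x = (i + 0.5) / width
            y = (j + 0.5) / height
            hit = False
            for obj in scene:
                cx, cy = obj.centre(t)
                if (x - cx) ** 2 + (y - cy) ** 2 <= obj.radius ** 2:
                    hit = True
                    break
            row.append("#" if hit else ".")
        rows.append("".join(row))
    return rows


# One object drifting to the right; the "video" is just this description.
scene = [Circle(x0=0.25, y0=0.5, vx=0.5, vy=0.0, radius=0.2)]

low = render(scene, t=0.0, width=8, height=4)     # tiny preview
high = render(scene, t=0.0, width=32, height=16)  # same instant, more pixels
later = render(scene, t=0.5, width=8, height=4)   # any instant, not just frame N
```

The same scene description produces a picture at any grid size and at any moment in time, which is the sense in which such a representation would be resolution and frame rate-independent. The hard part, of course, is getting a computer to produce the description from the real world in the first place.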

Speech recognition is a small but significant step towards this.


