UK company Speech Graphics is gunning for the top spot in lip-sync technology: mark their words. RedShark contributor David Valjalo reports.
How does the adage go? Poorly synced virtual lips sink narrative ships. Well, something like that. The fact is, unforgettable characters in the modern landscape of computer-generated animation need to be believable. One of the key tools in the war on audience disbelief is making sure viewers believe the words being spoken belong to the speaker, whether it's a Lorax or a layman; that is what keeps them invested in your world.
Enter Speech Graphics, which recently appointed one Colin MacDonald as chairman. A game industry veteran and currently the commissioning editor for games at UK broadcasting powerhouse Channel 4, MacDonald is clear about the bold, raw ambition of the studio: "We’re aiming to be the standard facial animation solution for the AAA games [and] essentially anywhere realistic speech-synchronised facial animation is required", he tells me. That is a tall order for a studio set up just three years ago, but with decades of experience in the field across the team, not least MacDonald's time overseeing videogame developer Realtime Worlds, it's certainly not impossible, and the technology is striking (see the video demonstration below along with MacDonald's in-depth explanation of how it all works).
The Science Bit - Colin MacDonald on the tech behind the teeth
Speech Graphics’ core technology involves advances in two main areas: specialised acoustic algorithms for extracting key information from the speech signal, and a revolutionary muscle-dynamic model that predicts how human facial and tongue muscles move to produce speech sounds. The output is high-fidelity lip sync over hours of speech. Moreover, as the muscle-dynamic model is based on universal physical principles, it works across all languages, which is a huge advantage when it comes to localisation. The Speech Graphics solution conveys not just speech but also emotional content, which is captured from the audio through a proprietary analysis. The system matches the dynamics of facial movements to the dynamics of the speaker’s voice, so that the emotional content of the speech is reflected in the face. Going even further, the animation can also be driven by text, using a third-party text-to-speech system to drive the same procedural model and output synchronised facial animation.
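For readers who want a feel for the general idea of audio-driven animation, here is a deliberately tiny sketch in Python. It is not Speech Graphics' method, which is proprietary and far more sophisticated; it simply assumes that loudness can stand in for "key information from the speech signal" and that a single damped spring can stand in for a muscle. The function names `rms_envelope` and `spring_track`, the synthetic test tone and every constant are hypothetical choices made for illustration only.

```python
import math

SAMPLE_RATE = 16000   # assumed audio sample rate in Hz
FRAME_LEN = 160       # 10 ms analysis frames (assumption)


def rms_envelope(samples, frame_len):
    """Root-mean-square loudness of each non-overlapping frame."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return env


def spring_track(targets, dt, stiffness=400.0, damping=40.0):
    """Follow the targets with a damped spring (a toy 'muscle').

    damping = 2 * sqrt(stiffness) makes the spring critically damped,
    so it moves smoothly towards each target instead of snapping to it.
    Integration is semi-implicit Euler.
    """
    pos, vel, out = 0.0, 0.0, []
    for target in targets:
        acc = stiffness * (target - pos) - damping * vel
        vel += acc * dt
        pos += vel * dt
        out.append(pos)
    return out


# Synthetic "speech": half a second of silence, a 220 Hz tone, silence again.
samples = []
for i in range(int(1.5 * SAMPLE_RATE)):
    t = i / SAMPLE_RATE
    voiced = 0.5 <= t < 1.0
    samples.append(0.8 * math.sin(2 * math.pi * 220 * t) if voiced else 0.0)

env = rms_envelope(samples, FRAME_LEN)
peak = max(env)
targets = [e / peak for e in env]                    # normalise to 0..1
jaw = spring_track(targets, FRAME_LEN / SAMPLE_RATE)  # jaw-open curve, 0..1
```

The `jaw` list rises shortly after the tone starts and relaxes after it stops, lagging the audio slightly the way a physical jaw would. A production system would replace the loudness envelope with real phonetic and prosodic analysis and the single spring with a full facial muscle model.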