Back in 2016, we asked ourselves whether artificial intelligence was on the verge of taking over media. Our utopian view was that AI presented the opportunity to take work off our hands, letting humans find more time to be creative. The dystopian? One word: Skynet.
Five years on, and in post-production, at least, our utopian theory seems to be winning. From AI-driven grading, music scoring and video upscaling software through to a new deep learning engine that can take a 2D image and turn it into an animatable 3D model, there are plenty of ways machines are helping to improve post-production pipelines.
But what are some of the best ways AI is helping post teams today, and how might AI tools shape workflows for the future? We’ve explored AI post-production software from Adobe, Avid, Colourlab.ai, NVIDIA, EditShare, CrumplePop, Topaz Labs and more to find out.
From 700 Clicks to 70
“All current AI trends are exciting. Some of them are exciting in thrilling ways – like stand-right-at-the-edge-of-a-cliff sort of exciting,” begins Andrew Page.
As director of advanced technology for media and entertainment at NVIDIA, Page is responsible for developing SDKs that other companies and studios can use to add AI features into their workflows.
To him, one of the big values of AI – particularly in post – is its ability to automate traditionally repetitive tasks like metadata tagging, captioning or even rotoscoping. “These days, AI helps make sure that rather than going through 700 mouse clicks a day, you only need to go through 70,” he continues. “It gives you the time back to do something more meaningful.”
Today, tools like EditShare’s EFS and Avid Media Composer integrate cloud AI services such as AWS and MediaCentral to automatically tag shots with metadata based on the objects and people detected within them. Every clip is automatically organised, so assistants and editors can quickly find the shots they need, giving them more time to focus on telling the story.
And if facial recognition isn’t enough, machine learning also makes it possible to organise and find clips via dialogue.
For Media Composer editors, Avid PhraseFind automatically analyses all project clips, then phonetically indexes audible dialogue. Meanwhile, Avid ScriptSync not only indexes all text and audible dialogue in your project, but also synchronises each source clip to its associated line in a movie or television script. Editors can then locate clips based on scene number, page number, word or phrase search.
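The core idea behind searchable dialogue is an inverted index: every word in a clip’s transcript points back to the clips and timestamps where it was spoken. A minimal sketch, using invented clip names and transcript data (real tools like PhraseFind index phonetically, not just by text):

```python
# Build a word -> [(clip, timestamp), ...] index over clip transcripts,
# so an editor can jump straight to every occurrence of a word.
from collections import defaultdict

def build_index(transcripts):
    """transcripts: {clip_name: [(seconds, word), ...]}"""
    index = defaultdict(list)
    for clip, words in transcripts.items():
        for t, word in words:
            index[word.lower()].append((clip, t))
    return index

# Hypothetical transcripts for two takes of the same scene:
transcripts = {
    "scene12_take3": [(4.2, "action"), (7.9, "cut")],
    "scene12_take4": [(3.1, "action"), (9.5, "cut")],
}
index = build_index(transcripts)
print(index["action"])  # every clip/timestamp where "action" is spoken
```

A phrase search would extend this by checking that consecutive words in the query occupy consecutive positions in the same clip.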
For captioning, Premiere Pro’s recently announced Speech to Text and Auto Captions features, which are powered by its Sensei machine learning technology, will automatically create a video transcript, then generate captions on the timeline that mirror the pacing of spoken dialogue and match it to the video’s timecode.
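Once a transcript carries timestamps, turning it into timeline captions is largely a formatting exercise. A rough sketch of that final step – converting timed transcript segments into the common SRT subtitle format (this is illustrative, not Adobe’s implementation):

```python
# Convert timed transcript segments into an SRT caption file.
def to_timecode(seconds):
    """Format seconds as the SRT timecode HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: [(start_s, end_s, text), ...] -> SRT string."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{to_timecode(start)} --> {to_timecode(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Welcome back."), (2.5, 5.0, "Let's begin.")]))
```

The hard part a machine learning model solves is upstream of this: producing accurate words and word-level timings from audio in the first place.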
Also powered by Sensei is Content Aware Fill, which helps remove boom mics or out-of-place signs in a shot with a few clicks – and the new Roto Brush 2 masking tool, which streamlines the process of manual rotoscoping. For anyone who’s spent hours rotoing hairs or blades of grass, this, like the other time-saving AI solutions we’ve described, can be life-changing.
“With the Roto Brush 2 tool, creators can select actors from a scene and place them in an entirely different environment, essentially unlocking the advantages of a green screen without actually using one,” says Byron Wijayawardena, Strategic Development Manager, Digital Video Audio at Adobe.
“The AI today doesn’t need a green screen anymore, because it didn’t learn the colour green,” NVIDIA’s Page adds. “What it learned was how to follow the contour of your sweater, or how your hair feathers to the background. It’s actually a person, character or object extractor. The more you train it, the more it learns – and that’s the power of these tools: their ability to learn. Imagine what we’ve just done for the VFX artists and rotoscopers of the world.”
Pulling the Bunny out of the Hat
As well as automating repetitive tasks, we’ve also seen AI post-production tools solve problems that were once seemingly impossible without additional hardware or re-shoots.
Problems like removing background wind – something that is currently possible through solutions like CrumplePop’s WindRemover AI.
“AI is currently getting used to create a lot of uncanny stuff, which is fun right now because it’s novel,” explains Gabe Cheifetz, Founder of CrumplePop. “But it’s not very useful, and the novelty is going to wear off quickly. It’s a little bit like when people first got their hands on Photoshop – ‘the photo is black and white, but the rose is in colour! Amazing!’
“For us, it's much more interesting to use AI to remove real obstacles. Since the invention of the microphone, wind noise has been a problem. The plugin we developed, WindRemover AI, is a very practical use of AI that solves that problem, and it's within reach of anyone, from big production companies to individual YouTubers.”
Upscaling footage is another perfect example. Whether you’re refreshing older video or simply intercutting lower resolution footage with more modern shots, upscaling can make a big difference. But it’s also traditionally been extremely pricey to do with hardware.
AI solutions like Topaz’s Video Enhance AI can change this. Video Enhance AI performs an upscale, as well as using AI to rid the footage of artefacts such as moiré, macro blocking, aliasing and other issues that can afflict various lower quality cameras and footage.
“As our processes mature, our AI will soon have the ability to hallucinate details between frames as well, creating frames where previously there were none,” explains Taylor Bishop, Product Developer at Topaz Labs. “This will effectively allow you to double or triple the framerate of your source footage while maintaining the level of quality you expect.”
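To see why “hallucinating” in-between frames matters, it helps to look at the classical baseline it improves on: simply blending each pair of adjacent frames. A toy sketch, treating each frame as a flat list of pixel values (this is the naive non-AI approach, which produces ghosting rather than genuinely new detail):

```python
# Double the framerate by inserting the average of each adjacent
# pair of frames -- the simple blending baseline, not AI interpolation.
def blend_interpolate(frames):
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        # The inserted frame is just the per-pixel midpoint of its
        # neighbours, which blurs any motion between them.
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    out.append(frames[-1])
    return out

frames = [[0, 0], [10, 20], [20, 40]]
print(blend_interpolate(frames))
# -> [[0, 0], [5.0, 10.0], [10, 20], [15.0, 30.0], [20, 40]]
```

An AI interpolator instead estimates how objects move between the two frames and synthesises a plausible intermediate image, which is why it can avoid the ghosting that blending produces.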
And AI is showing no signs of slowing down. We’ve seen brand new AI research from NVIDIA called GANverse3D, which can turn a 2D image into an animatable 3D model – including mesh and textures – in seconds.
And we’ve recently reviewed a new system, Dynascore, that uses artificial intelligence to create musical scores that fit with your edits by breaking down a piece of music and re-assembling it around small blocks of sound.
For grading, we’ve also been impressed by Colourlab.ai – a colour grading tool that uses AI to take looks from a reference shot or image and apply them consistently across the entire edit, even matching across different cameras.
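Shot matching in its simplest classical form means shifting each colour channel’s statistics to match a reference. A minimal sketch of that idea (Colourlab.ai’s actual AI goes well beyond this; the example just shows statistical “look” transfer on a single channel of invented values):

```python
# Match one channel's mean and spread to a reference channel's,
# the classical statistical approach to transferring a "look".
import statistics

def match_channel(source, reference):
    """Scale and shift source values so their mean and standard
    deviation match those of the reference channel."""
    s_mean, s_std = statistics.mean(source), statistics.pstdev(source)
    r_mean, r_std = statistics.mean(reference), statistics.pstdev(reference)
    scale = r_std / s_std if s_std else 1.0
    return [(v - s_mean) * scale + r_mean for v in source]

# A dim, flat channel pushed toward a brighter, higher-contrast reference:
graded = match_channel([10, 20, 30], [100, 150, 200])
print(graded)
```

Applied per channel, this matches overall brightness and contrast; an AI-based matcher can additionally account for scene content, so a skin tone isn’t dragged around by a bright sky in the reference.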
“Ultimately, this system is going to work on your mobile phone,” adds Dado Valentic, Founder of Colourlab.ai. “You’re going to point your camera, it’s going to recognise what’s in the shot, and you won’t have to be an amazing cinematographer or colourist to get a professional-looking graded result, in real-time.”
What Comes Next?
If there’s one thing all the experts agree on, it’s that we’ve probably only just scratched the surface when it comes to how AI can change post for the future.
“There’s so much that’s exciting,” NVIDIA’s Andrew Page adds. “Things like voice understanding, for example. Why should an artist not be able to say ‘take those import clips and process them like this’? That’s the long-term vision.”
How we develop AI for post is probably going to change too. “When I first started getting excited about AI in post, I was frustrated that it was being developed mostly by data scientists rather than creatives from set,” says Valentic.
“In post-production, you can’t just say ‘if the results aren’t good enough, we’ll simply improve the data that feeds into the AI algorithm’ because everybody has a different idea of what is good enough. That’s why in the future, we’ll see more creatives informing how the AI works and what problems it’s meant to solve.”
For EditShare CTO Stephen Tallamy, it’s also crucial to consider how AI systems can introduce bias that can negatively impact diversity for the future.
“If your production process depends on an AI to index video but is unable to understand an individual’s speech, due to dialect, disability or lack of language support, those people can become invisible to the production,” he concludes.
“We are excited by the potential of remote production increasing the diversity within the industry. We need to ensure we are not impacting this opportunity by selecting technologies that have been trained on a restrictive dataset that codifies biases which have been present for thousands of years.”