26 Jul 2017

Automatic (journalism) for the people

We call upon the author to explain... Shutterstock.com

This article was written by a human, but then again, any computer designed to pass Turing's famous test would say that, wouldn't it?

Even if you believe me, it might not be that long before an awful lot of content on the internet is being churned out by automated systems. I use the term “content” quite specifically, here: content is a product, a consumable that the modern internet increasingly subsists on, and can, according to Google and the UK's Press Association, increasingly be generated by algorithms. Conversely, journalism, we might hope, is a minor artform, which cannot be generated by algorithms. Either way, that's a hope we're all going to have to cling to because it's becoming pretty clear that the culture of free and the sheer volume of material required to keep the Twitters and Facebooks of the world current to the demi-nanosecond don't really support having actual people write the words, at least if those people aspire to live indoors and eat food in some of the world's most expensive countries.

There are two interesting things about this. First, and perhaps most positively, consider the technological advances in natural language processing which are likely to make this possible. What's being proposed is that a machine will take a press release and turn it into a piece of pseudo-journalism about that press release. To forestall a storm of protest, let's be clear that we'll be discussing below whether it's actually desirable to do that, let alone whether it's possible, but the ability to do it convincingly at any level would be a staggeringly impressive piece of logical wizardry. It's hard enough to make a computer read a sentence and comprehend it in any useful sense; it's harder still to make it write one. It has been done, of course; sports management computer games have been trying to make up authentic-sounding news stories for decades, and outfits like Automated Insights (www.automatedinsights.com) use that comparison in some of their own promotional editorial about their natural language products.

The thing about all this which will scare actual journalists – most of whom have written cookie-cutter press release coverage at least a few times – is how disturbingly plausible it is. Part of the reason for that is that a lot of stories about press releases already read alarmingly as if they were put together by a robot at the ED-209 level, so the bar for acceptability is fairly low. The sort of writing that Google and the PA seem to be seeking to replace can practically be put together from a fridge-magnet set of editorial tropes at the best of times, and it's hardly a crucible of journalistic practice, so it probably doesn't represent much training value when it's given to the office junior. The structure of these things can practically be templated, even without advanced natural language processing. Describe the product. Describe the company. Quote from the PR guy. Observation, make it slightly wry if you like and you don't think the company will pull its advertising. Link to the website.
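To show just how little machinery that five-part structure needs, here is a minimal sketch in Python. It is an illustration only, not how Google, the PA or Automated Insights actually do it; the `PressRelease` fields, the `render_story` function and the example company are all made up for the purpose, and it assumes someone (or something) has already pulled the facts out of the press release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PressRelease:
    """Hypothetical container for facts already extracted from a release."""
    product: str
    product_blurb: str
    company: str
    company_blurb: str
    spokesperson: str
    quote: str
    url: str
    wry_aside: Optional[str] = None  # leave as None if the advertiser is touchy


def render_story(pr: PressRelease) -> str:
    """Fill the template slots in the order listed above."""
    parts = [
        f"{pr.product} {pr.product_blurb}.",          # describe the product
        f"{pr.company} {pr.company_blurb}.",          # describe the company
        f'"{pr.quote}," said {pr.spokesperson}.',     # quote from the PR guy
    ]
    if pr.wry_aside:
        parts.append(pr.wry_aside)                    # optional observation
    parts.append(f"More information: {pr.url}")       # link to the website
    return "\n\n".join(parts)


# Entirely fictional example data.
example = PressRelease(
    product="The WidgetCam 4K",
    product_blurb="is a compact camera aimed at independent filmmakers",
    company="Example Optics",
    company_blurb="has been making lenses and accessories for a decade",
    spokesperson="a company spokesperson",
    quote="We think this changes everything",
    url="https://example.com/widgetcam",
    wry_aside="Whether everything needed changing is another matter.",
)

print(render_story(example))
```

Everything hard, of course, lives in the step this sketch skips: reading the press release and deciding what goes in each slot.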

So, no matter how hard we try to buff some sort of shine onto this situation by enthusing about the technology, it's difficult to square with the public face of Google's Digital News Initiative. The DNI describes itself as seeking to “support high-quality journalism through technology and innovation,” which is perhaps at least accurate inasmuch as the sort of journalism being targeted here isn't likely to reach much of a bar of quality regardless of whether it's being done by hand or by a computer. Nobody's seriously speaking up for the sort of recycled press-release writing that we're discussing here. Can it be automated? Probably. Will that damage the world? Possibly, but it's not actually that frightening a possibility at first glance. I'll prepare a hat for later eating on this, but an AI good enough to do genuinely good journalism seems some way off.

What's a bit more alarming, at least in terms of the (more enjoyable, more valuable) employment it might destroy, is the potential for this sort of thing to spread to other industries. Summer blockbusters spring readily to mind as a target for this sort of automation. With due deference to the people actually credited for writing them, who generally face a strait-jacket of corporate influence, the scripts are often not very well written, and they are frequently beset with cliches. Is it possible to make – or even to write – Guardians of the Galaxy using current levels of procedural generation? No, probably not, because wit is difficult, but it probably is possible to keep churning out superhero franchises like that.

Which is of course the paradox, because we've enthused before about the potential for procedurally-generated content in places like video games [https://www.redsharknews.com/production/item/1444-are-films-turning-into-video-games?page=1]. There, arguably, it's worthwhile and necessary, because the far greater interactivity of a video game can require more content than can reasonably be created by artists. Films have hit this too (it's presumably being used all the time, and at least some of the cities in the more recent Total Recall had procedurally generated buildings in the far distance). Perhaps this sort of thing is OK when it's used to do the donkey work, the repetitive, unimportant stuff, although that thought immediately smells like a slippery slope which will soon have us watching movies that have been procedurally generated from script to screen.

And then reading automatically generated articles about them…

 


Phil Rhodes

Phil Rhodes is a Cinematographer, Technologist, Writer and above all Communicator. Never afraid to speak his mind, and always worth listening to, he's a frequent contributor to RedShark.
