Of all the new technology companies disrupting the field of journalism, none hits home quite like Narrative Science. As Co.Exist reported back in November, the Chicago-based company, which is up for a Moxie Award this week, has developed an algorithm that can mimic human writing so effectively that… well, just have a look at these two Forbes.com leads and see if you can tell which was written by a robot:
"Take-Two Interactive Software (TTWO) is expected to book a wider loss than a year ago when it reports fourth quarter earnings on Tuesday, May 22, 2012 with analysts expecting a loss of 60 cents per share, down from a loss of 23 cents per share a year ago."
"Take-Two Interactive shares are trading higher after hours Tuesday following the video game publisher’s financial results for the fiscal fourth quarter ended March 31."
If you guessed the first one, you probably just got lucky.
The phrase "convincingly human" has probably never been used by the Pulitzer Prize committee, but it's good enough when it comes to analyzing large data sets, or the earnings reports that Narrative Science files for Forbes.com. These reporter-bots are perfect for the kinds of stories journalists don’t tell. Before the year is out, for example, Narrative Science will write between 1.5 and 2 million Little League recaps, something no publication has the resources or desire to do. "What we’ve been able to do is cover a story for a really large albeit disaggregated audience that would not get coverage otherwise," says CEO Stuart Frankel.
So why does every story written about Narrative Science act like the journalist apocalypse is nigh? Because once Narrative Science can begin collecting enough of the right data, its output will almost surely become competitive with real reporters.
Consider a recent Narrative Science project measuring the support for Republican primary candidates on Twitter. Most of the company’s work up to that point dealt with very structured data: box scores, profit margins, and the like. But here, Narrative Science made sense of vast amounts of unstructured data, in this case tweets. And what are tweets if not quotes, the bread and butter of traditional journalists? "As we develop technology to extract data from unstructured sources, particularly Twitter, we can pull information from Twitter conversations and source data to generate stories," says Frankel. Relying on tweets to write stories about public opinion trends makes sense. After all, it can’t be much worse than basing stories on notoriously inaccurate exit polls.
But what about breaking news coverage? Journalists like NPR's Andy Carvin use Twitter to wrangle real-time reports from everyday citizens on the ground in hostile areas. But Carvin takes painstaking steps to verify information before labeling it "confirmed." Can an algorithm be trained to do that?
Maybe. Last December, the Guardian posted a series of data visualizations that tracked how rumors were spread and later debunked on Twitter during the London riots. What the data junkies at the Guardian found was that the Twitter community itself was quite adept at calling bullshit on false information, usually within hours (though journalists certainly played a role in the debunking).
Even whistleblowers nowadays are as likely to leak sensitive information to the Internet as they are to call up a reporter. Once their testimony becomes data, Narrative Science can work its magic. "If the data is there, and a human can write that story using the data, then we can write that story," says Frankel.
NYU journalism professor Clay Shirky predicted the rise of robot-journalism in 2009, and wrote that its success would depend on whether audiences can trust a robot to be as authoritative a source as, say, Walter Cronkite. In "A Speculative Post on Algorithmic Authority," Shirky writes:
"There’s a spectrum of authority from 'Good enough to settle a bar bet' to 'Evidence to include in a dissertation defense', and most uses of algorithmic authority right now cluster around the inebriated end of that spectrum, but the important thing is that it is a spectrum, that algorithmic authority is on it, and that current forces seem set to push it further up the spectrum to an increasing number and variety of groups that regard these kinds of sources as authoritative."
Good journalism isn’t about writing like a human. It’s about trust. And as trust in conventionally authoritative sources continues to erode, Narrative Science's robots may be waiting to pick up the slack.
[Photo Illustration: Joel Arbaje]