A Year Later, The Tech Team Behind The Panama Papers Continues To Help Break News

It was “the biggest leak in the history of data journalism.” International Consortium of Investigative Journalists head of data & research Mar Cabra speaks about what worked and didn’t work from a tech and data science point of view.

It was an exposé made for the era of social media—and it’s still having an impact.

A year ago this week, reporters at the International Consortium of Investigative Journalists and more than 100 news organizations around the world released their first stories linked to the Panama Papers—a leaked trove of more than 11 million records from Panama-based law firm Mossack Fonseca. The documents and thousands of published stories showed how politicians and other prominent figures around the world were linked to secretive offshore corporations and accounts, spurring investigations, protests, and political resignations from Iceland to India. The law firm’s founders were arrested in February on money laundering charges.

“I think we’re looking at five years of continued prosecutions and political adjustment based on this information,” says Lawrence J. Trautman, a law and business professor at Western Carolina University who has written about the leak.

That impact was the result of work by almost 400 reporters around the world, who analyzed the leaked data for more than a year before the first stories were published—as well as a team of engineers who built the infrastructure that made it possible for them to sift through the almost 3 terabytes of information.

“This really is a testament to collaborative journalism in the truest sense of the word—I don’t think anything has existed like this, remotely like this, the kind of range of partners, BBC and Guardian, the smaller players in Ecuador,” said Kevin G. Hall, chief economics correspondent and senior investigator for the McClatchy newspaper chain, in a recent panel discussion on the Panama Papers at the Brookings Institution.

And to let those people collaborate, the ICIJ team effectively deployed its own internal social network, allowing the journalists to securely discuss their work and findings even though they were seldom, if ever, in the same room.

“Technology helped us bridge the gap that you normally have when people collaborate across borders,” says Mar Cabra, head of ICIJ’s data and research unit.

Even just turning the raw data into something reporters could work with was a major technical undertaking, made easier by ICIJ’s experience working with previous, smaller troves of leaked offshore finance data, she says.

“Those 11.5 million documents were in dozens of different formats,” she says. “There were a lot of PDFs that had to be made machine-readable.”

Dozens of cloud-based servers churned through those PDFs, using software that ICIJ has since made open source to extract text and index it for reporters to search and analyze.
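As a rough illustration of the index-and-search step described above, the sketch below builds a tiny inverted index over text that has already been extracted. It is not ICIJ's software: the document IDs, the sample text, and the function names are all hypothetical, and a real pipeline would add OCR, format conversion, and a full search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Map each word to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Return the documents that contain every word in the query.
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample documents, standing in for extracted PDF text.
docs = {
    "doc-1": "Certificate of incorporation for Example Holdings Ltd.",
    "doc-2": "Email regarding Example Holdings and a nominee director.",
    "doc-3": "Invoice for registered agent services.",
}
index = build_index(docs)
matches = search(index, "example holdings")
```

Here `matches` holds the IDs of the two sample documents that mention both words, which is the kind of lookup a reporter's search box would run.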

“We became, somehow, a software development company, because we were updating the software all the time,” Cabra says. “We had one developer that was working only on the document search platform, and I told him, ‘You’re gonna work for a few months on making this data available.'”

The team also used the graph database Neo4J, designed to store and speedily analyze relationship networks like the connections between the offshore companies and their owners and directors, and the companion visualization tool Linkurious to enable reporters to browse through the links between companies in an intuitive way. The software essentially built the kinds of whiteboard social network diagrams that are a staple of Homeland and Hollywood dramas, but on an unprecedentedly massive scale.
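To make the relationship-network idea concrete, here is a minimal sketch in plain Python rather than Neo4J itself. It assumes a hypothetical list of person-to-company links and uses a breadth-first search to find the shortest chain connecting two names, which is the sort of question a graph database is built to answer quickly. All names and data are invented for illustration.

```python
from collections import defaultdict, deque

def build_graph(edges):
    # Store each link in both directions so it can be followed either way.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    # Breadth-first search: returns the shortest chain of nodes, or None.
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical links between people and offshore companies.
edges = [
    ("Person A", "Company X"),
    ("Person B", "Company X"),
    ("Person B", "Company Y"),
    ("Person C", "Company Y"),
]
graph = build_graph(edges)
path = shortest_path(graph, "Person A", "Person C")
```

In this toy data, `path` runs from Person A through Company X, Person B, and Company Y to Person C, showing how two people with no shared company can still be connected through an intermediary.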

“They were able to find even more names than the ones that they had explored just by sifting through the documents,” Cabra says. “That became more stories—that became new leads to pursue.”

ICIJ ultimately also opened up large portions of the database to the public, letting readers use the same visualization tools to look for figures of interest to them in the records. And millions of people have visited the group’s website in order to do so, Cabra says.

“Human eyes are not made to understand rows and rows and rows of data [and] names, interconnected,” she says. “Visualizations help us make sense of that.”

The group, which recently spun off from the Center for Public Integrity, a D.C.-based investigative journalism nonprofit, and is in the midst of a crowdfunding campaign, is still looking to add features to its software, such as potential integrations with outside databases like Wikipedia.

ICIJ is also working with the makers of Neo4J to further streamline its databases, which could make it easier to link data from the Panama Papers and its other datasets—or to easily integrate information from future leaks. The group may also add something like a Google News-style alert system to highlight news stories involving previously unknown people linked to companies in the database, as Cabra says news organizations continue to discover new stories in the leaked data and collaborate to share what they find.

“I think that technology was key to make this project happen, but humans were key too,” she says. “Technology can be awesome—it can be groundbreaking, but if humans don’t collaborate, none of these would have happened.”

About the author

Steven Melendez is an independent journalist living in New Orleans.
