December 5, 2013

On Google, you can find specialized tabs that search only specific kinds of results, like news articles, images, patents, and scientific studies. The Declassification Engine is an ambitious project that aims to help citizens search for and uncover one very particular sort of result: U.S. government secrets.

The project, based at Columbia University and launched about a year ago, uses advanced computer science methods in big data, machine learning, and natural language processing to scale up what some scholars have been struggling to do by hand for a decade: document the rise of government secrecy, learn more about what the government isn’t releasing, and uncover new patterns and information in the millions of documents that do get declassified but contain heavy redactions.

“The Sphere of Influence” is an interactive visualization of State Department cables, mapping where government secrets travel around the globe and helping to flag anomalies.

Without more accountability, the historians, statisticians, computer scientists, and lawyers involved in the Declassification Engine project fear that our past will be “shredded in secrecy.”

“People have always complained about official secrecy, but over time there’s been measurable growth in the government secrets that are created,” says Columbia University historian Matthew Connelly, the co-leader of the project. “The whole system is breaking down.”

Today, to actually review the classified documents being created every year would demand the full-time work of every single federal and state government employee in the country, the project’s site says. The National Declassification Center has a current and rather hopeless backlog of 370 million pages of documents, and the government spends $10 billion protecting its secrets. It allocates only $50 million to declassification work.

The Redaction Archive is another tool that is turning up matches of redacted and unredacted documents side-by-side to uncover what’s beneath the black marker.
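
To make the idea of side-by-side matching concrete, here is a minimal sketch in Python of how two versions of the same passage could be aligned to show what sits under a redaction. It is not the project's actual method, which the article does not describe. The `[REDACTED]` marker, the `recover_redactions` function, and the sample sentences are all invented for illustration, and the alignment relies on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical placeholder that marks a blacked-out span in the redacted copy.
REDACTION_MARK = "[REDACTED]"


def recover_redactions(redacted: str, unredacted: str) -> list[str]:
    """Return the spans of `unredacted` that line up with redaction marks."""
    red_tokens = redacted.split()
    full_tokens = unredacted.split()

    # Align the two word sequences; autojunk is disabled so that frequent
    # words are never dropped from the alignment.
    matcher = SequenceMatcher(a=red_tokens, b=full_tokens, autojunk=False)

    recovered = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" block that contains the mark is a redacted span;
        # the matching words in the unredacted copy are what was hidden.
        if tag == "replace" and REDACTION_MARK in red_tokens[i1:i2]:
            recovered.append(" ".join(full_tokens[j1:j2]))
    return recovered


if __name__ == "__main__":
    # Invented sample text, not drawn from any real document.
    redacted_copy = "The ambassador met [REDACTED] in Geneva on Tuesday."
    full_copy = "The ambassador met the foreign minister in Geneva on Tuesday."
    print(recover_redactions(redacted_copy, full_copy))
    # prints ['the foreign minister']
```

A real archive would first have to discover which documents are two versions of the same text, and would be working from scanned pages rather than clean strings, so this sketch covers only the final comparison step.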

The tools that the Declassification Engine team has created thus far offer a glimpse into why the overabundance of secrecy hurts American democracy.

Their first mission was to gather as many declassified documents as possible into one database. The National Archives and other government troves are one source of information. But researchers have also drawn on other sources, such as scanned and full-text documents from private database companies like ProQuest and Gale Cengage Learning. Researchers involved in the project are now working with the Internet Archive to analyze the millions of PDFs that group has scraped from government sites since 1996. The project also hopes to incorporate the results of FOIA requests, which are housed in the online reading rooms of government agencies.