How Vocativ Mines The “Deep Web” For Storytelling

The startup, funded by security tech magnate Mati Kochavi, adapts technology used by hedge funds and intelligence agencies to find news all over the world.


Back in 2012, a group of digital journalists went hunting for Ugandan warlord Joseph Kony. They tried to track him using a trove of data–like mercenary chatter found on an obscure corner of the web. In the end, they weren’t exactly able to string together enough information to triangulate his position. But Kony wasn’t the only signal they were tracking.


By setting geographic parameters for a data-analysis operation and opening their ears, the analysts and journalists of a new kind of news organization called Vocativ stumbled upon talk on a message board about something just as curious as the movements of the Lord’s Resistance Army’s leader. They found a thriving Facebook subculture of gun sweepstakes in which gun shops and industry publications would give away weapons, including AR-15 rifles, to random fans who “liked” their pages.

The Facebook guns story wasn’t what reporters and analysts went looking for. And it was only fleshed out by mining the unindexed, un-Google-able Web. Monitoring of deep Internet chatter from Syria has also led Vocativ to stories on sex tapes of prominent Syrians being used as propaganda by both sides in that country’s bloody civil war.

Using agnostic, relatively unbiased search parameters to monitor the web for hidden news is the big idea behind Vocativ, which launches today. (Vocativ has been in not-so-stealth mode, with a different site design, for much of the year.) Employees of the digital news agency come from Vice, Huffington Post, ABC, The New York Daily News, and more, and they speak a wide variety of languages. Vocativ is based in New York with outposts around the world. One of its big goals is to use the deep web as a primary source.

The “deep web” consists of all the things available on the Internet that standard search engines overlook–things like spreadsheets and Word documents, subscription-only journals, and pages with dynamic content. Vocativ’s principals claim they can use the deep web, combined with monitoring of social media in a host of foreign languages, to find news stories other agencies can’t. Their search technology is similar to that used by law enforcement to detect terrorist chatter, by hedge funds to find hidden financial information, and by intelligence agencies to gauge sentiment and collect intelligence.

Vocativ’s CEO Scott Cohen (formerly digital executive editor of The New York Daily News) notes that the Web is full of what he calls “clusters of disparate signals”–and that his organization’s job is to organize them into coherent stories. A big part of this is following horizontal threads of data–the type of leads hedge funds, law enforcement, and intelligence agencies thrive on–with the hope of unearthing unexpected information that’s of interest to the public. As Vocativ’s application trawls the Web for interesting potential news stories, analysts and their journalist cohorts verify sources’ accuracy (often with non-English speakers). Then the journalist-analyst pairs translate the raw materials into stories designed to challenge Vice and the Wall Street Journal, among others. The pairing of often young and uniquely experienced journalists with data analysts to produce actual stories (not just files of raw data leads) is what makes Vocativ’s business unique.


Open Mind And Sorting The Web’s Data

At the core of Vocativ’s journalism-data-mining project is an intelligence and dashboard software product called Open Mind. It was originally marketed as a tool for law enforcement and government agencies, among others; Cohen says it was originally developed for public safety and natural disaster management.

The product is marketed by a company belonging to Vocativ’s financial backer, Mati Kochavi, and is a web intelligence package that harvests “web pages, social networking sites, video-sharing sites, microblogs, forums, blogs, and RSS feeds” into searchable results. Cohen emphasized that additional changes were made to adapt it for journalism purposes. At a product demonstration, I was shown how Open Mind could be used to launch real-time queries in multiple languages on Facebook, Twitter, LinkedIn, and YouTube. According to Vocativ consultants Amit Weiss and Yoni Steinmetz, the dashboard only collects what is called “open-source intelligence”–meaning that it does not collect information from closed Facebook groups, sealed sites like Google Drive or Dropbox, or password-protected sites. Although Open Mind is just one entrant in a crowded market of web intelligence products, Vocativ serves as an impressive application (and advertisement) for the product’s capabilities.

Turning Facebook Chatter Into Journalism

There’s an amazing amount of information that can be gleaned from open-source intelligence. At the 2013 installment of hacker conference Defcon, speaker Jordan Harbinger showed how he gained sensitive information from employees at defense contractors by posing as a job recruiter on LinkedIn; there’s also good evidence to suggest that NSA employees and contractors have posted the names of highly classified projects on LinkedIn. In New York, police officers write anonymized accounts of their jobs and perspective on crime stories via bulletin boards such as Thee Rant. Then there are the technology companies that obsessively monitor Quora and GitHub to see what their competitors are up to. In other words, data-mining and searching social media and bulletin boards can develop a wealth of story leads.

At Vocativ, journalists are partnered with analysts that the company calls “data ninjas.” Cohen says that the six analysts currently at the company work with journalists at a 1.5:1 or 2:1 ratio, and he emphasizes the analysts’ multilingual background. At the moment, Vocativ’s analysts speak Arabic, English, French, Hebrew, Mandarin, Persian, Russian, and “even a bit of Hausa.” One other idea given great play at the startup is that conventional news organizations disproportionately use English-language source material when covering stories–meaning they miss out on potential leads. In Russia, for instance, analyst Tiffany Shi says that they search regional networking site VKontakte, while Chinese coverage relies more on microblogs like Sina Weibo.

This means that the organization comes across unusual news stories. One piece Cohen is proud of was written by ex-NY1 reporter Alessandria Masi using source material from Spotify. During the recent coup in Egypt, Vocativ monitored Twitter for keywords related to the curfew enforced on Egyptian citizens. Masi and the analysts found young Egyptians trading playlists and selfies to pass the time; the resulting story noted how Egypt’s Twitter-using hipsters were off listening to depressing tracks by 30 Seconds To Mars and Lana Del Rey while stuck inside their homes. Using these tweets, Vocativ was able to find Spotify playlists of music from the Egyptian curfews.


He said that the idea was, “Let’s take this tool and we can use it for media. We can give it to journalists who know how to write good stories, and we can put out a great product.” For stories such as those on the Syrian civil war, this approach pays dividends.

Analysts working with the company’s data-searching software can examine keywords, topics, and relationships between individuals; conduct semantic analysis; and examine ontologies from a variety of sources. Open Mind, which is used alongside dozens of rival products by governments, corporations, and NGOs around the world, also ranks data. Weiss emphasized the need to find a “golden needle” in large volumes of data–a small mention on social media or in the deep web that can be transformed into a viable story lead.

Vocativ’s Founder

Vocativ, a site with a large budget and impressive offices in midtown New York that does not currently feature advertising, is bankrolled by Mati Kochavi, an amiable Israeli-born entrepreneur whose current financial interests lie mainly in security companies. Kochavi was first named as Vocativ’s financial backer in a Bloomberg Businessweek article by Brad Stone. His flagship company, AGT International, is a Swiss-based security firm that works on products as disparate as urban monitoring products for Singapore, security systems for electrical grids, and flood management systems in China and the Netherlands.

Apart from AGT, Kochavi has been involved with a variety of security-oriented firms in the United States, Switzerland, Israel, and other countries. One of these firms is 3i:Mind, a Swiss company that markets Open Mind along with a suite of other security products. The company used Open Mind for a variety of purposes; for instance, it formerly owned iJet, a global risk intelligence house that provides services to high-level corporate travelers. It’s an unusual background for the financial backer of a news site, but his patronage has allowed the site to do impressive coverage of world news on a scope similar to that of much more established media brands.

Vocativ also isn’t the first news site to experiment with data-driven dives into the Internet for story development. Several financial news startups have been using these tools for the past two years, but they’ve been developing their leads into expensive custom reports designed for clients who want to know what their competitors’ employees have been chattering about on Twitter. When Kochavi was asked why he founded a general interest news site with viral articles designed to compete with Vice and Buzzfeed, he gave the vague answer that “journalism has a critical role in keeping democracies alive.” Earlier in our interview, he also mentioned that 25-year-olds around the world, whether they be Saudi Arabian, Israeli, Brazilian, American, or Singaporean, share the “same problems, the same skepticism, and the same distrust of organizations, but also the same dreams and hopes.”


It’s also interesting to note that Kochavi is part of a wave of wealthy entrepreneurs with Middle Eastern ties who have started international digital news products over the past two years. Syrian-American oil millionaire Jamal Daniel founded the wonky Al-Monitor news site in January 2012, which provides content geared to Western audiences by some of the Middle East’s top journalists. Patrick Drahi, a French-Israeli communications tycoon, launched i24 News in July, an Israel-based news channel that broadcasts in English, French, and Arabic. The changing nature of the journalism industry, currently in a massive state of disruption, makes it much easier for businesspeople and outsiders to enter the ring . . . just ask Jeff Bezos.

And much like Buzzfeed and Vice, Vocativ appears to rely on multimedia content to bring international reporting to that same demographic of 25-year-olds hit hard by the global economic crunch. The startup’s challenge is to take technology Kochavi’s companies have developed for intelligence gathering and leverage it for journalism.

Data-Mining Is Coming To Journalism

Vocativ’s particular genius is that it is the right startup at the right time to leverage deep-data analysis for journalism. As mentioned earlier, Vocativ isn’t the first organization to use data-mining or firehose data analysis as a journalism tool. Blottr is a YouTube-partnered British organization that verifies crowdsourced multimedia content and licenses it to organizations such as the New York Times and Fox News; its syndication model relies on data-mined content from the Internet. Dataminr is a “real-time information discovery” platform that, much like Vocativ, has been hiring aggressively within New York’s journalism community. But while Vocativ searches social media to find leads to transform into journalism, Dataminr data-mines social media to compile specialized corporate intelligence. The organization, which raised $30 million in June, searches and parses Twitter’s firehose for financial interests willing to pay stiff access fees. It’s an entirely different brand of journalism there, based on the idea of real-time intelligence for traders and analysts. Another company, Selerity, offers media products to high-frequency traders and others looking for information to integrate into their algorithms. But Selerity’s interest is in providing deep web-based proprietary financial data, not general-interest news.

Elena Haliczer, Vocativ’s vice president of product, emphasized the site’s layout. It’s an impressive HTML5 product where every element of a story is designed to be shareable. Every picture, video, embedded Tweet, and node of content is tooled to be individually reshareable on Reddit, Facebook, Tumblr, Twitter, Digg, Google Plus, and other platforms. Users can share on multiple platforms with one click as well. It’s cutting-edge web design and UI tech that Buzzfeed (among others) uses to boost page views and brand awareness–and a very useful trick. What Vocativ is doing is novel: The company is retooling what intelligence agencies call SIGINT (signals intelligence) to source and develop individual content. When Kochavi was asked if it was safe to say Vocativ was an example of adapting the sort of data-searching technology used by law enforcement and intelligence agencies for the journalism industry, he laughed after a pause and said “maybe.”

But in the future, odds are that Vocativ’s data-mining approach will be increasingly monitored and emulated by larger news organizations. While Vocativ is one of the first news organizations to enter the sphere, advanced social media and deep web data-mining products have been around for years. The United States government uses open-source intelligence analysis tools from a wide spectrum of vendors, and the same goes for foreign governments. Fortune 1000 companies routinely search the deep web for corporate intelligence. Law enforcement agencies with money to spare can purchase social media monitoring software, much of it with capabilities similar to Open Mind’s, from the vendor of their choice.


Vocativ’s using data-mining to create compelling journalism content, and, frankly, some amazing videos (check out its recent piece on Venezuela’s Tower of David for one). But the rise of cloud computing and massively expanded processor power means the vast reams of data Open Mind collates will become cheaper and easier to monitor. Institutions with a background in data-mining and data analysis for large-scale projects, such as Stanford University, the University of Rochester, and the Israeli military’s Unit 8200, are turning out a steady stream of educated data geeks who can bridge the gap between analysts and scientists. To give one example of how this deep data analysis is leaking into the business world, the Human Genome Project–which integrates a wide range of deep data analysis techniques–is estimated to have generated $67 billion of U.S. economic output in 2010 alone.

In the future, as costs decline and the pool of talented analysts and data scientists grows, it’s reasonable to expect large news organizations to invest in web monitoring and data analysis. After all, it hasn’t been too long since journalism schools began pushing basic coding as a “must-have” skill and big news houses such as the New York Times fully embraced data-driven journalism. Organizations such as ScraperWiki are already working with journalists and news organizations to teach data-scraping and data-mining skills. Vocativ may be one of the first in the space, but if it succeeds in demonstrating web analysis as a viable model for story development, it won’t be the last.