This May Be The Most Vital Use Of “Big Data” We’ve Ever Seen

Just as after the Vietnam War, there is an epidemic of suicides among American veterans of the Middle East wars. According to the VA, there is one suicide every 65 minutes by veterans or active-duty personnel. We talk a lot about “solving problems” with technology on this site, and we can scarcely think of a more urgent one than this. Screening social network data could identify at-risk vets before it’s too late.


Unlike in the Vietnam days, veterans today search for information about suicide (or vent their troubles) on Google and Facebook. A project newly launched by DARPA and Dartmouth College is trying something new: data-mining social networks to spot patterns indicating suicidal behavior.


Called The Durkheim Project after the Victorian-era sociologist Émile Durkheim, it is asking veterans to offer their Twitter and Facebook authorization keys for an ambitious effort to match social media behavior with indications of suicidal thought. Veterans’ online behavior is then fed into a real-time analytics dashboard that predicts suicide risks and psychological episodes.

However, there’s a caveat: The Durkheim Project, which launched on July 1, is only a study into the effectiveness of predictive analytics for mental health. Veterans who participate will only be monitored, and will have to receive any needed mental health assistance through the VA and other sources.

Who’s Behind The Durkheim Project?

The Durkheim Project is led by New Hampshire-based Patterns and Predictions, a Dartmouth College spin-off with close ties to academics there. Patterns and Predictions received funding from DARPA, and the analytics platform is based on products from big data firms Attivio and Cloudera. Additional assistance was provided by Facebook and the Veterans Education & Research Association of Northern New England. Patterns and Predictions’ main product, Centiment, is a predictive analytics tool that uses linguistic and sentiment analysis for the financial industry.

For Facebook, the project marks one of the first times that it has let non-profits and the medical community conduct extensive data mining of the service for predictive analytics purposes. As the project’s literature puts it, the researchers want to “provide real-time monitoring of text content and behavioral patterns statistically correlated with tendencies for harmful behaviors–such as suicide.”

What Are Researchers Looking For?

Sid Probstein, Attivio’s CTO, told Co.Labs that Patterns and Predictions, the Geisel School of Medicine, and the VA conducted a double-blind study that found linguistic clues associated with suicidal behavior. These keywords, word patterns, and other information were fed into Patterns and Predictions’ predictive analytics (powered by the Attivio AIE search engine and Cloudera’s Hadoop data fabric); once these patterns were baked into the program, the machine learning extrapolated useful patterns and clues.


The VA, Dartmouth, and other medical organizations hope to use these to help veterans in the future.
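To make the idea of "feeding keywords into" a predictive model a little more concrete, here is a minimal sketch of the very first step: turning a post into keyword counts that a model could learn from. It is not the project's actual code. The keyword list uses made-up placeholder tokens (the study's real terms are not given here), and the function name keyword_features is our own invention.

```python
import re
from collections import Counter

# Hypothetical placeholder terms; the study's real keyword list is not published here.
KEYWORDS = ("kw_alpha", "kw_beta", "kw_gamma")


def keyword_features(text, keywords=KEYWORDS):
    """Count how often each tracked keyword appears in a post (case-insensitive)."""
    tokens = Counter(re.findall(r"[a-z0-9_']+", text.lower()))
    # Counter returns 0 for tokens it has never seen, so absent keywords map to 0.
    return {kw: tokens[kw] for kw in keywords}


post = "Kw_alpha again today. kw_alpha and kw_gamma."
features = keyword_features(post)
print(features)
```

A machine learning model would then be trained on many such feature sets, each labeled with a known outcome, rather than on the raw text itself.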

The information used by the Durkheim researchers generated multiple linguistics-driven prediction models that estimate the risk of suicide with 65% accuracy on a small dataset. According to Probstein, one of the project goals is to substantially increase that accuracy rate in future studies.
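For readers wondering what an accuracy figure like that means in practice, it is simply the share of cases where a model's prediction matched the known outcome. The sketch below works through the arithmetic on invented labels (20 cases, 13 of them predicted correctly); the numbers are illustrative only and do not come from the study.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the known outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one labeled case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented example data: 1 = flagged as at risk, 0 = not flagged.
actual = [1] * 10 + [0] * 10
predicted = [1] * 7 + [0] * 3 + [0] * 6 + [1] * 4  # 7 + 6 = 13 correct out of 20

print(f"{accuracy(predicted, actual):.0%}")  # prints 65%
```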

Patterns and Predictions head Chris Poulin added that the dataset used for the machine learning process “uses a training set of certain linguistic clues and keyword features known to be related to people who needed help. These words, and or synonyms of these words, are also expected in the social/mobile data.”

How Veterans Use It

Participating veterans install a unique Facebook app and a mobile app for either Android or iOS, which also allows them to tweak sharing settings on a granular level. Informed consent forms are required for Facebook, Twitter, and Google+ monitoring. On Twitter, the full text of every tweet the user posts is data-mined.

How Researchers Predict Suicidal Behavior

All information collected from the study is stored at Dartmouth’s Geisel School of Medicine. According to Probstein, the information is kept behind a medical firewall and stored in accordance with Human Subject Study/HIPAA privacy rules.


A dashboard used by researchers at Geisel shows profiles of individual study participants, along with information about the doctor treating them, clinical notes, and an overall risk rating generated by the Patterns and Predictions platform. Risk ratings update every 60 seconds, and are generated based on keywords specific to each study participant’s profile. A “mental health ticker” that tracks trends in each participant’s mental health, based on social media use, is then created. Clinicians are also given access to all of the source content used for each profile. Each study participant’s Facebook and Twitter posts are archived, and researchers can correlate them with specific blips in the mental health ticker.
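How such a ticker might work is easier to see in miniature. The sketch below is purely an assumption on our part, not the Patterns and Predictions platform: it keeps a rolling window of per-post scores and, on every refresh, records the window's average as the current rating. The class name RiskTicker, the window size, and the scores themselves are all hypothetical; only the 60-second refresh interval comes from the description above.

```python
from collections import deque

REFRESH_SECONDS = 60  # refresh interval described for the dashboard


class RiskTicker:
    """Toy rolling-average ticker over the most recent per-post scores."""

    def __init__(self, window=5):
        self.scores = deque(maxlen=window)  # oldest scores drop off automatically
        self.history = []                   # one rating per refresh

    def add_post_score(self, score):
        self.scores.append(score)

    def refresh(self):
        """Recompute the rating; a scheduler would call this every REFRESH_SECONDS."""
        rating = sum(self.scores) / len(self.scores) if self.scores else 0.0
        self.history.append(rating)
        return rating


ticker = RiskTicker(window=3)
for score in (0.2, 0.4, 0.9, 0.1):  # invented scores for four successive posts
    ticker.add_post_score(score)
    ticker.refresh()

print([round(r, 2) for r in ticker.history])
```

Because every rating in the history lines up with a refresh, a jump in the series can be traced back to the posts that were in the window at that moment, which is the kind of correlation the archived posts make possible.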

Probstein said that the dataset used for the machine learning “uses a training set of certain clues and features that are known to be data related to people who needed help and weren’t able to get it.”

Why Is DARPA Doing This?

In the long run, social media monitoring could sharply reduce the obscenely high veteran suicide rate. A toxic mix of PTSD caused by military service, the VA’s documented difficulty providing basic mental health treatment, poor job opportunities at home, and a lack of understanding from the outside world all mean that veterans kill themselves at rates far exceeding the rest of America. DARPA’s goal of thinking outside the box also applies to veteran health care; using social media to deliver targeted services to the veterans who need them most could be a boon for the perennially cash-strapped VA.

The Durkheim Project is part of DARPA’s Detection and Computational Analysis of Psychological Signals (DCAPS) project. DCAPS is a larger effort designed to harness predictive analytics for veteran mental health, and not just from social media. According to a program introduction by DARPA’s Russell Shilling, DCAPS is also developing algorithms that can data mine voice communications, daily eating and sleeping patterns, in-person social interactions, facial expressions, and emotional states for signs of suicidal thought. While participants in Durkheim won’t receive mental health assistance directly from the project, their contributions will go a long way toward treating suicidal veterans in the future.

It is important to note that DARPA is not the only organization funding the project. DARPA funded Durkheim’s phase one, Facebook is joining in funding phase two, and an upcoming third phase will be funded independently.


The project launched on July 1. The number of veterans participating is not currently known, but the final number is expected to hover around 100,000.

Update: This article was updated with extra information about funding and granular privacy controls received post-publication. The original version of this article also incorrectly stated Durkheim’s occupation; he was a sociologist, not a psychologist.