Six months after its latest resurgence, the Ebola virus shows no signs of letting up. “We desperately need new strategies adapted to this reality,” said Dr. Joanne Liu, international president of Doctors Without Borders, in a grim statement last week. One hope is that data, which can spread faster than disease, could give humans a technological leg up on the epidemic. The problem is that this data is massive and often unstructured.
Can scientists and medical professionals make sense of the mess in time for it to make a difference?
The answer may lie in data-mining techniques that were previously used by the U.S. military to track terrorists. Modus Operandi is a Florida-based defense contractor that specializes in big data analytics and semantic analysis. The company has long partnered with clients like the U.S. Marine Corps to track people (in this case, terrorism suspects) as they travel around the world.
“This translates well into bioinformatics because instead of terror networks, you’re trying to figure out an infection network,” Eric Little, vice president and chief scientist at Modus Operandi, tells Co.Labs. “You’re using heuristics to look at where it’s popping up and how it’s being passed. Who’s connected to whom? How did the infected person travel? Who did they come in contact with?”
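To make the idea of an "infection network" concrete, here is a minimal sketch of one common contact-tracing step: a breadth-first search outward from a confirmed case over a graph of recorded contacts. Modus Operandi hasn't described its own heuristics, so this is not BioIQ's method; the function name `trace_exposures`, the `max_hops` cutoff, and the sample contact pairs are all hypothetical.

```python
from collections import deque


def trace_exposures(contacts, index_case, max_hops=2):
    """Return everyone reachable from index_case within max_hops,
    mapped to how many contact 'hops' away they are.

    Hypothetical illustration: contacts is a list of (person_a, person_b)
    pairs, treated as undirected links.
    """
    # Build an undirected adjacency list from the contact pairs.
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Standard breadth-first search, recording each person's hop distance.
    distance = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor in sorted(graph.get(person, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)

    del distance[index_case]  # report contacts only, not the case itself
    return distance


contacts = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5")]
print(trace_exposures(contacts, "P1"))
```

With a two-hop limit, P2 and P5 come back as direct contacts and P3 as a contact-of-a-contact, while P4 (three hops away) is left out. Real systems would weight links by exposure type and timing, which this sketch ignores.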
Little won't disclose how his defense clients used the technology. Still, he sees a natural progression from military work toward not just headline-grabbing epidemics like Ebola but also diseases as common as cancer. “Ultimately, we’re in the business of threats,” says Little. “That’s what we do.”
In the case of a disease like Ebola, data used to track the spread of the disease can come from many sources, starting with tissue samples and medical reports taken in the field. Factor in information from medical labs, NGOs, public research, and private institutions, and you have a hefty mess of data that arrives in a wide range of formats, if it’s even structured at all.
“Using semantic technologies and semantic reasoning, we’re able to take a lot of the computation out of the scientists’ heads and put it into the system itself,” says Little. “We literally code in some of their knowledge against the data itself.”
Platforms like BioIQ, the disease-tracking tool currently in development at Modus Operandi, aim to normalize the data, visualize it in charts, and combine it all into a digital model of a real-world problem, or what Little calls an ontology.
“Ontologists look at what things are,” says Little. “How do you describe them? How do you model them? There are spatial parts to it. There are temporal parts to it. The spatial parts are dependent or independent. Things have attributes. There’s all these complex relationships.”
Once the outbreak is modeled and graphed from all these disparate sources of complex data, the software uses its own proprietary algorithms to query the data, create rules, and run computations that can reveal relationships and developments that might otherwise have gone unnoticed.
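BioIQ's algorithms are proprietary, but the general idea of encoding expert knowledge as rules and running them against modeled data can be illustrated with a toy example. The sketch below stores facts as simple (subject, predicate, object) triples and applies three hand-written rules repeatedly until no new facts appear (forward chaining). Every predicate, rule, and entity name here is hypothetical and chosen only for illustration.

```python
def infer(facts):
    """Forward-chain three illustrative rules over (subject, predicate, object)
    triples until no new facts are derived. Hypothetical rules:
      1. A confirmed case who visited a site makes it an exposure site.
      2. A non-confirmed person who visited an exposure site is at risk.
      3. Anyone who contacted a confirmed case is at risk.
    """
    facts = set(facts)
    while True:
        confirmed = {s for s, p, o in facts if p == "status" and o == "confirmed"}
        exposure_sites = {s for s, p, o in facts if p == "exposure_site" and o == "yes"}
        derived = set()
        for s, p, o in facts:
            if p == "visited" and s in confirmed:  # rule 1
                derived.add((o, "exposure_site", "yes"))
            if p == "visited" and o in exposure_sites and s not in confirmed:  # rule 2
                derived.add((s, "risk", "at_risk"))
            if p == "contacted" and o in confirmed:  # rule 3
                derived.add((s, "risk", "at_risk"))
        if derived <= facts:
            return facts  # fixpoint reached: nothing new to add
        facts |= derived


facts = {
    ("patient_a", "status", "confirmed"),
    ("patient_a", "visited", "clinic_1"),
    ("worker_b", "visited", "clinic_1"),
    ("neighbor_c", "contacted", "patient_a"),
    ("trader_d", "visited", "market_2"),
}
result = infer(facts)
print(sorted(s for s, p, o in result if p == "risk"))
```

Note that worker_b is only flagged on the second pass, after the first pass marks clinic_1 as an exposure site; chaining of that kind is what lets rule-based systems surface links no single input record states directly.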
This approach also has the advantage of leveling the information playing field. A medical epidemic involves scientists and researchers who are each well-versed in their own fields and jargon, but details can get lost in translation between disciplines. These silos of expertise have a way of hampering collaboration, which is annoying in any research environment but potentially deadly in this one.
“If you’re a virologist and somebody is running a genomic sequence, as a virologist you probably can’t deal with or read the sequence data,” says Little. “You’re not a genomics expert. The genomics expert is not an expert in virology. None of them are probably actually physicians that are treating the patients.”
The hope is that by merging all the data in one place, analyzing it, and turning it into visually digestible graphs, BioIQ can make the data accessible to everyone who needs to work with it, regardless of their background.
One of the most important things machines look for in a case like the Ebola outbreak is how the disease spreads geographically. Perhaps the most eye-opening illustration of how location-powered data science can be used in a scenario like this is HealthMap, whose algorithm detected the current outbreak before it was publicly announced. While BioIQ isn’t as far along on the health mapping front as HealthMap (nor does it rely on social media data like HealthMap does), geographic intelligence is an integral part of the platform.
It sounds promising, but Modus Operandi is racing against a deadly clock. Scientists have created computer models showing it will take at least 12 to 18 months to get the Ebola epidemic under control. BioIQ, meanwhile, is still about 12 months from being field-tested. Even that estimate “depends on the customer and the amount of testing that has to occur for the system to be deployed in real live use cases,” Little says.