Can Twitter be mined for information on food poisoning outbreaks? One Google data scientist thinks so. Adam Sadilek led a team at the University of Rochester that developed Nemesis (PDF), a machine learning system which asks "which restaurants should you avoid today?"
Using a set of keywords, Nemesis mines Twitter for geolocated posts that could indicate foodborne illness. In tests, tweets from New York were mined and tagged with metadata identifying restaurants within 25 meters that were open at the time the user tweeted. A team of workers recruited via Mechanical Turk then came up with 27 words and phrases indicating food poisoning, such as "My tummy hurts," "stomachache," "throw up," "Mylanta," and "Pepto-Bismol." Nemesis then assigned health scores to the nearby restaurants based on the proportion of tweets that suggested food poisoning.
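To make that pipeline concrete, here is a minimal Python sketch of the steps as this article describes them. It is not the Nemesis code. The function names, the dictionary fields (`lat`, `lon`, `text`, `name`), and the use of the haversine formula for distance are all assumptions made for illustration. The keyword list holds only the five phrases quoted above, not the full set of 27, and the check that a restaurant was open at tweet time is left out.

```python
import math
from collections import defaultdict

# Only the five phrases quoted in the article; the full list of 27 is not given.
KEYWORDS = ["my tummy hurts", "stomachache", "throw up", "mylanta", "pepto-bismol"]
RADIUS_M = 25.0  # distance threshold reported in the article


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def mentions_illness(text):
    """True if the tweet text contains any keyword (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def score_restaurants(tweets, restaurants):
    """Map each restaurant name to the fraction of nearby tweets that mention illness.

    Restaurants with no tweets inside the radius are omitted from the result.
    """
    nearby = defaultdict(int)
    sick = defaultdict(int)
    for tweet in tweets:
        for place in restaurants:
            distance = haversine_m(tweet["lat"], tweet["lon"], place["lat"], place["lon"])
            if distance <= RADIUS_M:
                nearby[place["name"]] += 1
                if mentions_illness(tweet["text"]):
                    sick[place["name"]] += 1
    return {name: sick[name] / count for name, count in nearby.items()}
```

Calling `score_restaurants(tweets, restaurants)` with lists of such dictionaries returns one score per restaurant that had at least one tweet nearby. Plain substring matching is the crudest possible stand-in here; a real system would need something more robust to tell "I could throw up" from "throw up your hands."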
The kicker for Sadilek's experimental project is that the health scores Nemesis inferred closely tracked Health Department data, correlating with the department's NYC letter grades for restaurants. According to Popular Science's Shaunacy Ferro, the paper will be presented at the Conference on Human Computation & Crowdsourcing in November.
[Image: Adam Sadilek]