Google Flu Trends is a “big data” tool that anyone who has ever caught the sniffles can understand. Before doctors and pharmacies, Google’s search engine tends to be the first place we turn when we start feeling symptoms. Every winter, the company performs a public service by mining its search data to track and predict the spread of the flu virus.
But a look back at the project since 2009 has led four researchers to an important conclusion: Google Flu Trends isn’t all that accurate or useful yet on its own. Sometimes its output is no better than slower predictions from the Centers for Disease Control, which gets reports from its network of actual testing labs and clinics. Other times, Google’s output is flat-out wrong: During peak flu season last winter, the Google algorithm estimated that a staggering 11% of people in the U.S. had the flu, almost double the CDC’s 6% estimate. More generally, Google has persistently overestimated the flu since at least 2011, the researchers write in the journal Science this week.
“Even three-week-old CDC data do a better job of projecting current flu prevalence than GFT [Google Flu Trends],” they write.
What’s going on?
The authors say it’s not that Google Flu Trends is useless. In fact, combined with CDC data or other up-to-date health data, it could produce results that beat either source alone (some are already doing this).
Rather, the problem is twofold, they believe. For one, “big data hubris” (the idea that non-traditional data mining methods can produce standalone predictions that beat the traditional models used in scientific analysis) is at play. Google is also undermining its own work as it constantly tweaks its search product, adding more recommended searches and automated answers to questions, to improve its business and provide more relevant results. Google Flu Trends, on the other hand, assumes that only external events, like more flu virus in the world, are affecting user searches.
The authors call on Google to be more transparent with the research community about how it analyzes its data, without violating the privacy of its users. This would allow other experts to build on and learn from its work. Google’s current public discussion of the algorithms and search terms used for Google Flu Trends obscures crucial information that anyone outside the company would need to replicate the work. This call for transparency, of course, extends beyond simple flu reports, and to other companies that hold an increasing amount of the world’s digital data.
“Google is a business, but it also holds in trust data on the desires, thoughts, and the connections of humanity. Making money ‘without doing evil’ (paraphrasing Google’s motto) is not enough when it is feasible to do so much good,” the paper says.