This story reflects the views of this author, but not necessarily the editorial position of Fast Company.
It’s hard to think of a hotter field riven by more fundamental questions with potentially massive consequences than data science. As businesses sharpen their data chops, some worry that subjective judgment calls are becoming obsolete, and even that those who currently make the bulk of them could one day be driven to extinction.
That hasn’t happened yet, but there are already signs of trouble. In the political realm, last year brought two epic polling misses in close succession, with even the most sophisticated data-driven models failing to forecast Brexit in the U.K. and Donald Trump’s clinching the presidency in the U.S. Neither blunder seems likely to dampen demand for data professionals, whose career prospects in the business world show no signs of cooling down.
And why should it? Most data scientists, analysts, and tech workers in quantitative and statistical roles aren’t pollsters or political wonks. Yet the very likelihood that private industry’s data infatuation will only grow suggests that there’s a bubble underway, even if it takes a while to burst. If data can fall dismally short in one arena, the same can happen in another, especially when it’s being asked to do anything and everything.
The time for data professionals to do some soul-searching is right now, while they’re punching above their weight. Here’s why, and where they can start.
In June 2012, Northwestern University political scientist Jacqueline Stevens took to the New York Times to decry the many “lousy forecasters” in her profession.
Her targets were fellow researchers who leaned on “statistical analyses and models” to attract funding, “even though everyone knows the clean equations mask messy realities that contrived data sets and assumptions don’t, and can’t, capture.” In the process, Stevens wrote, her “colleagues have failed spectacularly and wasted colossal amounts of time and money.” Four and a half years later, her call to “stop mistaking probability studies and statistical significance for knowledge” rings just as true.
One thing that’s changed since then is the growing discussion of a lack of diversity in data-based roles–and the consequences of that. Recent research suggests that women are leaving STEM jobs where they’re already under-represented. And the longer gender gaps persist in science and tech fields (not just data science), the more likely we are to be saddled with technologies that replicate the unconscious biases of the people who build them. That’s already occurring; in one particularly dismal example, the algorithm behind an AI-judged beauty contest showed a preference for its white contestants.
But while these diversity issues are rightly gaining attention, they might be a red herring for an even bigger problem. As the most vocal advocates for greater diversity within high-tech and STEM fields point out, there are deeper concerns about how data is being used in them. If data professionals are asked—enough times—to do things with data that they simply can’t, their perceived value will decline. Reflecting recently on her Times op-ed in light of the past year’s events, Stevens worries that “bringing in people from different backgrounds is just contributing to people being differently ignorant.”
In fact, the very demographic shifts that are driving both a greater awareness of diversity issues and a political backlash against them may be partly to blame for forecasters’ recent fumbles. Citing the philosopher Karl Popper, Stevens observes that when societies go through unprecedented changes, statistical models get worse at predicting their repercussions. “What big data is good for,” explains Cathy O’Neil, author of last year’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, “is finding patterns of behavior in the past. It will never help us find something that’s completely new.”
So while the pollsters’ technical solutions may have adjusted for known sample selection bias, she says, they failed to anticipate and address a new kind of bias–“the one where Trump supporters don’t trust pollsters.” In retrospect, it seems obvious that when a candidate talks and tweets continuously about how rigged the polls are, many of his supporters would either refuse to talk to pollsters or outright lie about how they intend to vote.
Yet even the statistical models that tried to account for that fell short. Adds O’Neil, “We’ve been trained to think of these kinds of analytics as being extremely predictive of future events and to focus on the numbers as if they matter. Yet the interesting thing should be why people made a decision to vote one way or another, and what could change their minds.”
When data analyst Katie Seely-Gant of Energetics Technology Center read a 2008 article by Wired’s editor-in-chief Chris Anderson, called “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” she recalls thinking it was “dangerous to the field.”
“I really think big data is the future,” she says, “but people are running fast and loose with what we might consider best practices and methods.” Through her own research, Seely-Gant has set out to track some of the most prevalent and damaging ways this is happening. “Without an underlying theory,” she says, “you can go out and gather all this data, but you don’t know what you’re looking for.” As a result, you wind up placing too much faith in numbers conjured up to match the conjurer’s wishes.
That doesn’t just contravene good science, as Stevens suggests; it’s also undermining trust in a field on which more and more people, governments, and businesses are hinging their hopes–and placing their bets. So far, it’s mainly insiders and practitioners who are voicing concerns. But someday, the bubble could pop.
O’Neil partly attributes 2016’s political surprises to data professionals’ rush to indulge public fascination with polling numbers, leaving no time or inclination for honesty. “What if all the pollsters in the entire country had suddenly said one day, ‘You know what? We just realized that we have no idea what’s going to happen?’” Answering her own question, she says, “It would have been a fucking disaster.”
If there’s one thing O’Neil, Stevens, and Seely-Gant all agree on, it’s that data scientists need to be able to admit when their models don’t say anything–a matter of simple transparency. No one is better qualified to explain the limits to their trade than data professionals themselves. And managing expectations can be a long-term strategy for defending their job security.
This isn’t likely to happen, of course. Data science jobs have never seemed so secure (or highly paid). The preference for neat statistical “proofs” that Stevens took aim at in 2012, and the zeal for sound bites and headlines that Seely-Gant and O’Neil have spent the past few years shining light on, remain in full force. And in the near term, organizations still seem hell-bent on blindly pursuing more “data-driven” capabilities.
So no matter what epic failures those methods beget, Stevens wagers there’ll be little motive for the business world to become “more self-reflective and open to radical changes in how we acquire and analyze knowledge.” Until then, those with the most incentives to do so are the direct beneficiaries of the feeding frenzy: data professionals themselves. If their careers are taking off based partly on misplaced faith, they can be laid low by the same thing. After all, there’s no one more capable of explaining what their expertise can get right, what it can’t, and when it’s impossible to know.