What began as a warning label on financial statements has become useful advice for how to think about almost anything: “Past performance is no guarantee of future results.” So why do so many in the AI field insist on believing the opposite?
Too many researchers and practitioners remain stuck on the idea that a model trained on past data will make flawless predictions about future data: if the past data are good, the future outcomes will be good too.
That line of thinking received a major wake-up call recently when an MIT study found that the 10 most-cited data sets were riddled with label errors (in the training data set, a picture of a dog is labeled as a cat, for example). These data sets form the foundation of how many AI systems are built and tested, so pervasive errors could mean that AI isn’t as advanced as we may think. After all, if AI can’t tell the difference between a mushroom and a spoon, or between the sound of Ariana Grande hitting a high note and a whistle (as the MIT study found and an MIT Technology Review article describes), then why should we trust it to make decisions about our health or to drive our cars?
The knee-jerk response from academia has been to refocus on cleaning up these benchmark data sets. We can continue to obsess over creating clean data for AI to learn from in a sterile environment, or we can put AI in the real world and watch it grow. Currently, AI is like a mouse raised to thrive in a lab: If it’s let loose into a crowded, polluted city, its chances for surviving are pretty slim.
Every AI Will Always Be Wrong
Because AI started in academia, it suffers from a fundamental problem of that environment, which is the drive to control how things are tested. This, of course, becomes a problem when academia meets the real world, where conditions are anything but controlled.
Tellingly, AI’s relative success in an academic setting has begun to work against it as businesses adopt it. A study from MIT Sloan and the Boston Consulting Group found that 90% of organizations were not achieving significant financial benefits with AI. Research from Gartner shows only 53% of AI projects make it from prototypes to production.
The COVID-19 pandemic was a grim reminder that many factors are out of our control, and the future doesn’t look anything like the past. The next year does not look like an aggregation of the past 10 years. So the approach of evaluating AI based on how well the model fits the past data doesn’t make sense.
The ugly truth is that every AI will always be wrong. Any AI will be correct sometimes and wrong sometimes. With that in mind, it’s critical to get AI out of the lab and into a production environment as quickly as possible so you can evaluate it on the actual transactions that come its way. This approach is neither benchmarkable nor replicable, and is thus unsuitable for academic papers, but it is exactly aligned with what businesses need to get value from AI.
We need a pragmatic definition of the quality of an AI. My proposal for the definition of better AI: Compare the decisions made with the help of an AI against similar decisions made without it. An A/B test, if you will. It could be something as simple as, “Can the AI trained on this data create economic value for me compared to what I would do without the AI? Can an AI trained on this data help me do better than I was doing before?”
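As a rough sketch of that A/B test (all names and numbers here are hypothetical, purely for illustration): hold out a control group that works without the model, give a comparable group the model's recommendations, and compare the economic value each group produces.

```python
# Hypothetical A/B comparison: economic value of decisions made with
# vs. without AI assistance. All figures are illustrative, not real data.

def economic_lift(value_with_ai, value_without_ai):
    """Relative improvement of AI-assisted decisions over the baseline."""
    return (value_with_ai - value_without_ai) / value_without_ai

# Suppose each group handled 1,000 comparable sales opportunities.
revenue_with_ai = 1_150_000     # group A: reps guided by the model
revenue_without_ai = 1_000_000  # group B: reps working as before

lift = economic_lift(revenue_with_ai, revenue_without_ai)
print(f"AI-assisted decisions produced {lift:.0%} more value")
```

The point is that the yardstick is the business outcome itself, not how well the model fits a held-out slice of historical data.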
For example, imagine we are training an algorithm to determine the most promising sales opportunities. Which company is the main competitor in the deal would certainly affect our likelihood of success, but most people don’t record—or even know—who they’re going up against. While training an algorithm, businesses could take the time to clean up the data, hiring researchers to trace every lost opportunity and fill in the names of the other companies that had vied for that same deal. But what’s the point? When the AI predicts future sales transactions, the competitor field will be blank in most of the transactions it is asked to score. Future data, then, does not look like clean data. That’s why we should train the AI on messy real-world data that better represents the data it will actually encounter when making predictions.
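One simple way to train on that messy data, sketched below with a toy, entirely hypothetical data set: instead of hiring researchers to backfill the competitor field, treat a missing value as its own category, so the training records are shaped like the records the model will score in production.

```python
# Sketch (hypothetical data): estimate win rates per competitor while
# treating a blank competitor field as the category "unknown" rather
# than dropping those records or imputing the missing names.
from collections import defaultdict

deals = [
    {"competitor": "AcmeCorp", "won": False},
    {"competitor": None,       "won": True},
    {"competitor": None,       "won": True},
    {"competitor": "AcmeCorp", "won": False},
    {"competitor": None,       "won": False},
]

wins, totals = defaultdict(int), defaultdict(int)
for deal in deals:
    key = deal["competitor"] or "unknown"  # None -> "unknown"
    totals[key] += 1
    wins[key] += deal["won"]

win_rate = {k: wins[k] / totals[k] for k in totals}
print(win_rate)
```

At prediction time, a blank competitor field maps to the same "unknown" bucket the model saw during training, so the training distribution matches the production distribution instead of an artificially cleaned one.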
The approach of creating artificially clean data and then training and evaluating AI on that information has become hopelessly impractical. It’s time for AI to get real.
Arijit Sengupta is the founder and CEO of Aible.