It's no secret that Uber is powered by data. The ridesharing service is built on analytics and famously encourages passengers and drivers to rate each other. But data scientists at the company aren't just focused on the service's core functionality. They often delve into some pretty nitty-gritty details of how passengers use the service—and they can dig up some unexpected insights in the process.
Using Bayesian statistics, Uber's data team showed that it can accurately predict a rider's destination three out of four times. To do this, the team built an algorithm designed to pinpoint specific destinations (not just the nearest intersection or another rough approximation, but exact addresses) and then tested it against the actual behavior of more than 3,000 anonymized Uber riders.
In an exceptionally wonky, math-laden post, Uber's Ren Lu breaks down the specifics of how the formula works:
We took the riding patterns of over 3000 unique riders in San Francisco earlier in 2014 (anonymizing the data to protect privacy.) Each of these trips had been "tagged" by the rider: when requesting an Uber, the rider had filled in the destination field. We assumed that this represented the true destination the rider wanted to go, creating a gold standard against which we can compare the predictions of our model.
Uber's formula looks at three key factors: the history of the user, the behavior of other users, and the general popularity of specific places. It mathematically blends all of these "priors" (in statistical parlance) and then weighs them against additional evidence about each trip (the "likelihoods") to determine whether a user is going to a particular nightclub or to the coffee shop down the street. That is no easy feat in densely populated cities like New York City or San Francisco, where many possible destinations sit close to any given drop-off point. That's why the traditional approach of reverse-geocoding the drop-off coordinates or querying a publicly available location database wouldn't cut it.
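Lu's post doesn't reduce to a few lines of code, but the idea of "blending priors" can be illustrated. The following is a minimal sketch, not Uber's implementation: it assumes each prior is a probability distribution over candidate places and that they are combined as a weighted mixture. The function names, place names, and mixture weights are hypothetical choices made for illustration.

```python
from collections import Counter


def normalize(counts):
    """Turn raw counts into a probability distribution over places."""
    total = sum(counts.values())
    return {place: n / total for place, n in counts.items()} if total else {}


def blended_prior(user_history, all_rider_trips, popularity, weights=(0.5, 0.3, 0.2)):
    """Mix three priors into a single distribution over candidate places.

    user_history:    list of places this rider has gone before
    all_rider_trips: list of places other riders have gone
    popularity:      dict of place -> general popularity score (e.g. visit counts)
    weights:         hypothetical mixture weights, NOT values from Uber's post
    """
    sources = [
        normalize(Counter(user_history)),     # where has this person gone?
        normalize(Counter(all_rider_trips)),  # where do other riders go?
        normalize(popularity),                # what is popular generally?
    ]
    places = set().union(*sources)
    mixed = {p: sum(w * src.get(p, 0.0) for w, src in zip(weights, sources)) for p in places}
    # Renormalize in case one of the sources was empty.
    total = sum(mixed.values())
    return {p: v / total for p, v in mixed.items()} if total else {}


prior = blended_prior(
    ["coffee_shop", "coffee_shop", "office"],
    ["coffee_shop", "nightclub", "office"],
    {"coffee_shop": 120, "nightclub": 300, "office": 80},
)
print(prior)
```

A mixture is only one way to combine priors; it keeps the rider's own habits dominant while still giving some probability to places the rider has never visited but that are popular with others.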
Where has this person gone in the past? Do they frequent a certain bar? Where do other Uber users go? What businesses are popular generally? These are the basic questions the algorithm asks. On top of that, it smartly considers factors like time of day (people don't typically go to nightclubs at 11 a.m.), distance (people aren't likely to be dropped off far from their actual destination), and even the ZIP code of each destination (Sketchy neighborhood? Riders probably didn't want to walk far, so the destination is likely near the drop-off point).
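To make the Bayesian step concrete, here is a hedged sketch of how a prior might be updated with the distance and time-of-day evidence described above: posterior ∝ prior × distance likelihood × time-of-day likelihood. Everything specific here is an assumption for illustration, not taken from Uber's post: the Gaussian-shaped walking-distance falloff, the flat x/y coordinates in meters, the 24-hour activity profiles, and `walk_sigma_m`, a hypothetical per-neighborhood parameter that would be set smaller in ZIP codes where riders are unlikely to walk far.

```python
import math


def posterior(prior, candidates, dropoff, hour, walk_sigma_m=100.0):
    """Score candidate destinations for one trip and normalize to probabilities.

    prior:        dict place -> prior probability (e.g. a blended prior)
    candidates:   dict place -> ((x_m, y_m), hourly) where hourly is a 24-item
                  list of relative activity levels (hypothetical profiles)
    dropoff:      (x_m, y_m) drop-off point on a local flat grid, in meters
    hour:         hour of the drop-off, 0-23
    walk_sigma_m: hypothetical typical walking distance; smaller values model
                  neighborhoods where riders want to be dropped right at the door
    """
    scores = {}
    for place, ((x, y), hourly) in candidates.items():
        d = math.hypot(x - dropoff[0], y - dropoff[1])
        distance_lik = math.exp(-d ** 2 / (2 * walk_sigma_m ** 2))  # far away = unlikely
        time_lik = hourly[hour]  # e.g. nightclubs score low at 11 a.m.
        scores[place] = prior.get(place, 0.0) * distance_lik * time_lik
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate is consistent with the evidence")
    return {p: s / total for p, s in scores.items()}


daytime = [1.0 if 6 <= h < 19 else 0.1 for h in range(24)]
nighttime = [0.05 if 6 <= h < 19 else 1.0 for h in range(24)]
candidates = {
    "coffee_shop": ((30.0, 0.0), daytime),
    "nightclub": ((20.0, 0.0), nighttime),
}
prior = {"coffee_shop": 0.5, "nightclub": 0.5}
print(posterior(prior, candidates, dropoff=(0.0, 0.0), hour=11))
print(posterior(prior, candidates, dropoff=(0.0, 0.0), hour=23))
```

With an even prior and the nightclub slightly closer to the drop-off, the time-of-day term still makes the coffee shop the clear favorite at 11 a.m., while the same drop-off at 11 p.m. points to the nightclub.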
Why does this matter? Like any modern company, Uber has a lot to gain by better understanding its users. "Our rider destination model is one way the #UberData Team is working on improving the Uber ride experience," writes Lu. "Extensions of this project involve building more complex priors and likelihoods."
In other words, it's not just your final destination that Uber wants to be able to predict. Thanks to experiments like this and the models behind them, Uber's ability to see into your future will only get more refined with time. Presumably, so will your ride.