For services that deliver products and people across town, efficiency is everything. But the existing GPS data sources are far from perfect. To stay competitive, companies like Lyft, SpoonRocket, and Factual are creating their own algorithms to figure out how to reach you faster. These normally tight-lipped companies gave us a peek into the geo data stacks they’ll be unveiling in the coming months. Here’s how they work.
When ride-sharing company Lyft first started, employees tried to figure out how far the driver was from the passenger by using the Google Maps API. Now, with tens of thousands of users, they’ve had to rack their brains to come up with a better solution. “After a while, it’s like, we could build a better algorithm than Google, based upon our own data,” says Lyft’s VP of Data Science Chris Pouliot.
Using Python, Lyft’s team of 15 data scientists is working on a new algorithm that uses geo data for two functions: dispatching nearest drivers and calculating estimated times of arrival (ETAs).
From the minute drivers open the app, Lyft collects a niche set of data points multiple times a minute. The method tries to best ETA predictors like Google Maps or Waze, Pouliot says.
“The problem with that is people only have Waze open when they don’t know where they’re going,” he says. “I’m traveling from San Francisco to Palo Alto, I already know where I’m going and I wouldn’t have Waze open.”
He says while drivers take passengers from San Francisco to Palo Alto, Lyft is collecting data to better understand similar trips and better predict an ETA. “We’re going to know the speed that they’re traveling, at different times of the day, different times of the week. From that, we think we can build a better model than Google’s ETA estimates.”
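One way to picture what Pouliot describes is a table of observed speeds keyed by day of week and hour of day. The sketch below is a simplified illustration, not Lyft's actual model: the sample observations, the function names `build_speed_table` and `estimate_eta_minutes`, and the 25 mph fallback speed are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def build_speed_table(observations):
    """Average observed speed (mph) per (weekday, hour) bucket.

    `observations` is an iterable of (datetime, speed_mph) pairs. The
    format is hypothetical, chosen only for this illustration.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of speeds, count]
    for when, speed_mph in observations:
        bucket = (when.weekday(), when.hour)  # Monday is weekday 0
        totals[bucket][0] += speed_mph
        totals[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in totals.items()}


def estimate_eta_minutes(distance_miles, when, speed_table, fallback_mph=25.0):
    """ETA in minutes, using the average speed seen at this weekday and hour."""
    speed = speed_table.get((when.weekday(), when.hour), fallback_mph)
    return distance_miles / speed * 60.0


# Made-up speed observations from past trips.
observations = [
    (datetime(2014, 6, 2, 8, 15), 30.0),  # a Monday, 8 a.m. hour
    (datetime(2014, 6, 9, 8, 40), 20.0),  # the following Monday, 8 a.m. hour
    (datetime(2014, 6, 7, 14, 5), 60.0),  # a Saturday, 2 p.m. hour
]
table = build_speed_table(observations)
monday_rush_eta = estimate_eta_minutes(30.0, datetime(2014, 6, 16, 8, 0), table)
saturday_eta = estimate_eta_minutes(30.0, datetime(2014, 6, 14, 14, 0), table)
```

With these made-up numbers, the same 30-mile trip gets a much longer estimate for a Monday morning than for a Saturday afternoon, which is the kind of time-of-day and day-of-week effect Pouliot is talking about.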
Instead of relying on road distance, Lyft time-stamps GPS coordinates to pick up the variables that determine the quickest route between passenger and driver. “From that, we can figure out the speed that they’re traveling, where they’re going, and where they came from,” he says.
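To see how time-stamped coordinates turn into speeds, consider a minimal sketch that measures the great-circle distance between consecutive GPS pings and divides by the elapsed time. The ping format, the function names, and the sample values are assumptions for illustration; the article does not describe Lyft's actual code.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def segment_speeds_mph(pings):
    """Speed in mph between each pair of consecutive pings.

    `pings` is a list of (unix_seconds, lat, lon) tuples sorted by time,
    a hypothetical format chosen for this example.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(pings, pings[1:]):
        elapsed_hours = (t1 - t0) / 3600.0
        if elapsed_hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_miles(lat0, lon0, lat1, lon1) / elapsed_hours)
    return speeds


# Made-up pings taken 30 seconds apart.
pings = [
    (0, 37.7749, -122.4194),
    (30, 37.7700, -122.4150),
    (60, 37.7640, -122.4100),
]
speeds = segment_speeds_mph(pings)
```

Straight-line distance between pings slightly understates the distance actually driven on a curving road, so speeds computed this way are a lower bound that gets tighter as pings get more frequent.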
He likens it to the way Google produces translations: by analyzing the co-occurrence of words and the linguistic rules at work when a word appears in both languages.
“The Google engine would say, ‘Over 90% of the time when it says “Hello,” it says “Hola” in Spanish.’ We’re not trying to beforehand figure out, ‘Well, this GPS coordinate is kind of close to being on 101,’ but we’re taking the opposite approach, saying, ‘Hey, it’s 50 miles an hour, therefore it must be 101.’ That sort of thing.”
Translating that into ride-sharing, Pouliot says Lyft’s data scientists measure travel speeds at any given point on a map to tell different types of roads apart, such as a highway and a side road. “If you overlay those speeds on a map, it would translate to something such as you’re going 60 miles an hour, you’re probably on a highway,” he says. “If you were on a side road, you wouldn’t be going that fast.”
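The speed-implies-road-type idea can be sketched in a few lines. This is an illustration only: the 50 mph cutoff echoes the figure in Pouliot's example, but the threshold, the use of a median, and the function name `guess_road_type` are assumptions, not a description of Lyft's system.

```python
from statistics import median


def guess_road_type(speeds_mph, highway_threshold_mph=50.0):
    """Guess the road type at a map location from speeds observed there.

    The median is used so that one stalled or speeding car does not
    flip the label. The threshold is a hypothetical cutoff.
    """
    if not speeds_mph:
        return "unknown"
    if median(speeds_mph) >= highway_threshold_mph:
        return "highway"
    return "side road"


# Made-up speed samples for two map locations.
stretch_a = guess_road_type([58.0, 62.0, 60.0, 12.0])  # one slow outlier
stretch_b = guess_road_type([20.0, 25.0, 18.0])
```

A production model would presumably use far more signals than a single cutoff, but the direction of inference is the same: the label comes from observed behavior rather than from snapping a coordinate to a known road.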
The wide range of variables in GPS data is what attracted Pouliot, the former Netflix director of Algorithms and Analytics, to go work at Lyft. “There’s the geo-spatial element to the data–it’s a really giant economics problem, trying to balance supply and demand. All problems I never experienced at Netflix.”
Taking a different approach to creating its own ETA engine, rival ride-sharing company Uber took the map-matching route, building real-time mapping into its logistics framework to deliver flowers and mariachi bands.
Late last year, the company also added a feature called “Share my ETA” that lets users tell their friends where and when they’ll arrive at their destination.
In the food industry, where weekly returning customers are the holy grail, SpoonRocket’s model suggests that geo data brings customers back more than once a week.
Though it currently operates in only two cities, SpoonRocket is building its own geo data model, programmed in Python and R, to improve on its average ETA of eight minutes.
“What we do right now would have been a logistics nightmare 10 years ago,” SpoonRocket CTO Anson Tsui says. “In terms of being able to dynamically route drivers all the time, there’s a lot of very advanced routing stuff in the background.”
The small startup has recently brought on a full-time data scientist to figure out localized ETA down to minutes for any given meal in San Francisco and Berkeley.
“Having our own in-house, our model is so specific to our needs that it just makes more sense to build our own,” Tsui says. “I think that’s worth waiting.”
Tsui says the system he uses to fold past data into prediction models is the most accurate and helpful tool for predicting food delivery ETA.
“It computes these data subsets called ‘Trees’ and then out of all the ‘Trees’ that get calculated, we pick the one we want,” he says. “It takes a little bit longer to compute, but it’s the best out there for this.”
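Tsui does not spell out the details, so the following is only one much-simplified reading of that description: fit many small regression trees, each on a random subset of past deliveries, then keep the tree that predicts held-out deliveries best. Everything here is an assumption made for illustration, including the one-split trees, the selection rule, the function names, and the sample data; it is not SpoonRocket's model.

```python
import random


def fit_stump(rows):
    """Fit a one-split regression tree to (distance_miles, eta_minutes) rows.

    Returns (threshold, mean_eta_at_or_below, mean_eta_above).
    """
    best = None
    distances = sorted({d for d, _ in rows})
    for lo, hi in zip(distances, distances[1:]):
        threshold = (lo + hi) / 2.0
        below = [eta for d, eta in rows if d <= threshold]
        above = [eta for d, eta in rows if d > threshold]
        mean_below = sum(below) / len(below)
        mean_above = sum(above) / len(above)
        error = (sum((eta - mean_below) ** 2 for eta in below)
                 + sum((eta - mean_above) ** 2 for eta in above))
        if best is None or error < best[0]:
            best = (error, threshold, mean_below, mean_above)
    if best is None:  # every distance identical, so no split is possible
        mean = sum(eta for _, eta in rows) / len(rows)
        return (float("inf"), mean, mean)
    return best[1:]


def predict(stump, distance_miles):
    threshold, mean_below, mean_above = stump
    return mean_below if distance_miles <= threshold else mean_above


def mean_squared_error(stump, rows):
    return sum((predict(stump, d) - eta) ** 2 for d, eta in rows) / len(rows)


def pick_best_tree(train_rows, holdout_rows, n_trees=25, seed=0):
    """Fit one stump per random subset and keep the best on held-out rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        subset = [rng.choice(train_rows) for _ in train_rows]  # sample with replacement
        trees.append(fit_stump(subset))
    return min(trees, key=lambda tree: mean_squared_error(tree, holdout_rows))


# Made-up past deliveries: (distance in miles, minutes to deliver).
train = [(0.5, 5), (0.7, 6), (0.9, 5), (2.5, 11), (3.0, 12), (3.4, 13)]
holdout = [(0.6, 5), (3.1, 12)]
best_tree = pick_best_tree(train, holdout)
```

Computing many trees is slower than fitting a single model, which fits Tsui's remark that the approach "takes a little bit longer to compute."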
Because of SpoonRocket’s already quick turnaround times, users come back and order more than once a week–beating the average food service benchmark.
But in order for businesses big and small to get the right data, they need the right framework for the right price, which is what big data company Factual hopes to enable with Geopulse Geotag.
This August, Factual will be rolling out a feature called Geopulse Geotag, built with OpenStreetMap data, which gives app developers access to a normalized interface to worldwide geography.
“It’s basically a reverse geocoder, but it’s more of an entity look-up that allows you to label all the digital assets created on your mobile phone,” VP of product Tyler Bell says. “And then once you have that data, you can run that through your machine learning, so it is really a large-scale geographic annotation engine.”
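At its simplest, the reverse-geocoding idea Bell describes means turning a coordinate into a named place. The sketch below does that against a tiny in-memory list. It is not Factual's API: the `PLACES` list, its approximate coordinates, the one-kilometer radius, and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical mini-gazetteer with approximate coordinates.
PLACES = [
    {"name": "Ferry Building", "locality": "San Francisco", "lat": 37.7955, "lon": -122.3937},
    {"name": "Sather Tower", "locality": "Berkeley", "lat": 37.8721, "lon": -122.2578},
    {"name": "Palo Alto Caltrain Station", "locality": "Palo Alto", "lat": 37.4434, "lon": -122.1651},
]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reverse_geocode(lat, lon, places=PLACES, max_km=1.0):
    """Return the nearest known place within `max_km`, or None if none is close."""
    nearest = min(places, key=lambda p: distance_km(lat, lon, p["lat"], p["lon"]))
    if distance_km(lat, lon, nearest["lat"], nearest["lon"]) > max_km:
        return None
    return nearest


# Label a made-up photo with the place where it was taken.
photo = {"file": "IMG_0001.jpg", "lat": 37.7950, "lon": -122.3940}
match = reverse_geocode(photo["lat"], photo["lon"])
photo["place"] = match["name"] if match else None
```

A linear scan like this only works for a handful of places; a dataset with millions of entries would need a spatial index, but the input and output are the same shape: coordinate in, labeled entity out.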
With a data stack baked down to 17 million U.S. places, Factual’s vast geo data has been a center point for back-end map functionality for companies like Microsoft and Yelp.
Enabling open-source mapping like OpenStreetMap, Factual provides a full, core data set on which to hang attributes, so businesses don’t have to create their own databases.
With a database of about 70 million small businesses and landmarks in 50 countries, Factual uses its 1.3 billion location data points for contextual information–a big people and places dataset.
Bell, a former Yahoo geo data expert, says, previously, you had to have a fat wallet to purchase any type of map data. “Over the last 10 years, there’s been a growth in the creation of open source or largely open geographic datasets that don’t cost an arm and leg to create maps.”
Just a year ago, the thought of in-house mapping was overly ambitious, but in the next year I’d expect to see more full-time, specialized analysts, cheaper map data, fewer big-name proprietors, and more accurate ETAs, all quicker than you can say Moore’s Law.