Unless you’re both feeling chatty, New York City taxi drivers’ triumphs and travails are usually hidden from the customer’s view, inscrutable behind panes of smudged plastic. But after deciphering 50 gigabytes of data from New York City’s Taxi and Limousine Commission, data scientist Chris Whong has given us a hypnotizing glimpse into what 24 hours of driving a cab might actually look like.
Understanding how medallion cab drivers make money will be key as sharing-economy startups vie for a slice of the ground transportation market. Last week, the TLC abruptly announced that, because of concerns about safety and driver background checks, the ride-sharing app Lyft was not licensed to start carrying passengers in New York City. The notice came only two days before Lyft's planned 500-driver launch, which would have been the company's largest rollout so far.
Whong found that the TLC wasn't especially forthcoming with its cab data, either. His interest was piqued earlier this year when he saw the TLC tweeting data visualizations about taxi metrics. But when Whong tweeted at the agency's Twitter account to ask where he could get hold of the raw 2013 numbers, the TLC told him he would have to submit a Freedom of Information Law (FOIL) request.
Whong emailed a FOIL request and reports that the TLC was polite and quick to respond, offering to copy the data onto a hard drive. Then, with help from civic hackers on Reddit and at BetaNYC, Whong hosted the data on Google's BigQuery servers and extracted the trips that several individual cabs made over the course of a single day in 2013.
A few initial observations: Whong says he expected to see lulls in activity in the early morning hours, but he didn't anticipate that cabs would sit empty at seemingly random times in the afternoon. He also noticed that when cabs left Manhattan to drop off customers in the outer boroughs, they often rushed straight back to the island without picking up passengers along the way. (The latter is a perennial problem, and one the city has tried to address since the summer of 2013 with its bright green Boro Taxi program.)
Whong hopes to do more with the data soon, perhaps tracking how much time cabs typically spend idling in traffic. But his project also highlights a broader possibility: when city agencies open up their data, the public gets more opportunities to spot and correct inefficiencies. If the TLC is worried about sharing-economy startups encroaching on its turf and exploiting gaps in the traditional model, some open-source know-how could help keep medallion cab drivers competitive with rivals that wield the newest tools.
“I think this is an interesting insight, taking a tiny slice out of big data and making it accessible to a normal user,” Whong says. “Data scientists from all over the world are now crunching the numbers on the data now that it’s become available.”