2013-04-22

We asked Andrew Hogue, Foursquare’s head of data and search, how his team uses millions of data points to decide what to build (or not to build) next. You can see another excerpt from this interview–about the first-time user experience–in this article.

Almost everything that we do with data, we have some English language notion of what we want first. For example, trending places: We want new and notable places that just opened up; or we want places that are popular right now; or we want to do a good job for the query “espresso.” There are all these things that I can convey to you in English, but then there’s a big gap to the way the computer understands the problem. So data science is really just a matter of taking human ideas of what’s a good product and mashing the technology together so that it’s more human.

Somebody comes up with the human terms first, and then it’s a matter of translating it back into the data. Sometimes the data kind of speaks for itself. This great wine bar just opened up–why don’t we see that in the app somewhere? We say: Okay, we seem to have a desire to show new venues in the app. Then, we start digging into the data, saying, we’ll just take everything that’s been created in the last month. Well–that’s a million places, so that approach is not going to work. How do we cut that back to include only the places that are really interesting? It becomes this kind of iterative process, where you’re hacking away at ideas and combining them. In the “showing new venues” case, we end up taking places created in the last four months that have a high average number of check-ins, and whose check-ins are spread out over time, so we aren’t getting venues which are really events. (Oftentimes, someone will create an event like a farmers market for one week–you don’t want to put that in the new venues because it’s not going to be there anymore by the time we show it to you.)
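As a rough illustration of the "new venues" filter described above, here is a minimal Python sketch. It is not Foursquare's actual code: the `Venue` record, the function name `is_new_and_notable`, and every threshold (a 120-day window standing in for "four months", three check-ins per week, four distinct active weeks) are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Venue:
    """Hypothetical venue record; field names are illustrative only."""
    name: str
    created: date
    checkin_dates: List[date]


def is_new_and_notable(
    venue: Venue,
    today: date,
    max_age_days: int = 120,              # assumed stand-in for "four months"
    min_weekly_checkins: float = 3.0,     # assumed "high average" threshold
    min_active_weeks: int = 4,            # assumed "spread out over time" threshold
) -> bool:
    """Return True if the venue is recent, busy, and not a one-off event."""
    age_days = (today - venue.created).days
    # Rule 1: the venue must have been created within the window.
    if age_days < 0 or age_days > max_age_days:
        return False

    # Rule 2: the average number of check-ins per week must be high enough.
    weeks_open = max(age_days / 7, 1.0)
    if len(venue.checkin_dates) / weeks_open < min_weekly_checkins:
        return False

    # Rule 3: check-ins must fall in several different weeks, so a
    # one-week event (such as a farmers market) is filtered out.
    active_weeks = {(d - venue.created).days // 7 for d in venue.checkin_dates}
    return len(active_weeks) >= min_active_weeks
```

With these assumed thresholds, a wine bar that opened two months ago and gets a check-in every other day passes, while a one-week farmers market with a burst of check-ins fails the third rule even though its average is high. In practice the thresholds would be tuned iteratively against real data, which is the "hacking away" process described above.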

Data vs. Datum: An earlier version of this post used “data” as a plural noun in the headline. While grammatically correct, this usage seems to irritate the eyes of our readers. We’ve switched it to the more common singular usage.

Take the perspective of a startup with some traction: Should they think in terms of features as well?

I think so, yes. Otherwise, the data are going to tell you a thousand different things. If you pick one feature at random and build it, there’s no guarantee that it’s actually going to be interesting to any actual person, or maybe it’s interesting to a very small fraction of people. That’s not a good thing to build a business on. That’s definitely a trap that you fall into: You think this is a great data-driven feature and you build the whole thing–then all of a sudden you realize that only 0.01 percent of the population cares about that thing. Whereas, if you start from more human terms you can have a rational discussion among, not just engineers, but also product people and UI designers and researchers. You figure out how this feature idea is actually interesting–do people really want to know what the new places are? Or are they more interested in where to go for ice cream after dinner?