As the lead geo engineer at Foursquare, David Blackman’s work has been the lynchpin of some of Foursquare’s biggest changes in recent months: Explore, Best of, and City Pages. In a larger sense, his project is also instrumental to making Foursquare grow outside the U.S. by providing an accurate user experience even where extant map data isn’t detailed or reliable. When popular startups have a next act, it’s because of the magical stardust forged in the brains of guys like Blackman, who figure out ways to increase the company’s technical mastery in their area of focus; in this case, location.
Blackman isn’t building a “killer feature” for Foursquare–he’s building a framework that is making an entire generation of killer features possible. Frameworks are like platforms within the platform; they’re kits that help developers inside and outside the company build spates of new features or entirely new apps. With a small team of engineers, Blackman is rebuilding the geographic infrastructure that Foursquare uses to figure out, at a fundamental level, where exactly you are.
The larger problem that Blackman chips away at has to do with the mountains of geodata the company collects with its check-ins. Foursquare needs ways to intelligently use its check-ins to help parse the meaning of the user’s lat/long. For machines, Blackman says, the key to understanding human beings is good maps. Accurate, humane, and dynamic maps. We talked to Blackman about what he’s building–and how it’s going to position Foursquare to forge its own future.
Why build your own geographic infrastructure in-house?
We’re building the data because it’s not there. Geonames we didn’t build but [we’re] building a worldwide polygon dataset. If I didn’t have to build it, that would be great. I’d be working on my geocoder. I’d be building better map reduces. I’d love if we didn’t have to, but it just doesn’t exist.
What’s so difficult about locating a city by querying, say, “Austin”?
[Austin] can mean a lot of things. When you say to a computer, “I want pizza in Austin” or “Show me a page about Austin,” there are a couple things you might need to do. One is just center a map. “Austin” doesn’t tell you on a map where Austin is. You need to, effectively, look in the back of an atlas, scan, find Austin, find the latitude and longitude and find the map cell. That’s effectively what the computer is doing as well. You need that index. You need it in a format that a computer can read and that lets you get a rough idea of where Austin is and draw it on a map. I can tell you what Austin means to my computer, or what Austin means to the U.S. government. Understanding what Austin is is a combination of knowing roughly where it is on a map, knowing the administrative boundaries and then trying to gain [Foursquare] data about the human intuition of the Austin metro area.
A big part of how you define a given locale is by using polygon shapes, as opposed to just drawing one big box around New York City, for example. Why not just put the world on one giant grid?
If I gave you the center of Austin, you would have no idea how far out to search to get things that are logically in Austin that people might be willing to go to. So as a start you want bounding boxes, but even better than that, we talked about polygons. Why are polygons useful? If you draw the bounding box for Brooklyn, no matter how I drew it, I’m going to get a ton of lower Manhattan, a ton of Jersey City and a ton of Queens. And there happens to be a river here, which you don’t really want to cross. Once I have the polygon for Brooklyn, I can now start doing searches where–I can still do a bigger search, but I really really try and prefer the results [in Brooklyn]. You run into this issue where as soon as you cover Manhattan, you start bleeding into Jersey City. And the behavior of people in Manhattan and Jersey City is different. So if we just grid the world, no matter how I grid the world I’m not going to be able to catch this difference. So I think about the polygons as being a way of creating humane big data. So, say we want to know what people are likely to search for near them. We might want to pull up a different list in Manhattan vs. in Jersey City. And the only way we’re going to do that is, rather than going back to arbitrary cells, we’re going back to these humane boundaries.
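To make the bounding-box problem concrete, here is a minimal Python sketch. It is an illustration only, not Foursquare's code: the L-shaped `BOROUGH` polygon, the venue coordinates, and the helper names are all hypothetical. A venue "across the river" falls inside the borough's bounding box but outside its polygon, so a search can still cover the wider box while preferring results inside the polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box that contains every vertex of the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounding_box(point: Point, polygon: List[Point]) -> bool:
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: a point is inside if a ray cast to its right
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rank(venues: List[Point], polygon: List[Point]) -> List[Point]:
    """Keep every venue, but list the ones inside the polygon first."""
    return sorted(venues, key=lambda v: not in_polygon(v, polygon))


# Hypothetical L-shaped borough; the missing corner is "across the river".
BOROUGH = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
venue_inside = (1, 3)
venue_across_river = (3, 3)

print(in_bounding_box(venue_across_river, BOROUGH))  # box says yes
print(in_polygon(venue_across_river, BOROUGH))       # polygon says no
print(rank([venue_across_river, venue_inside], BOROUGH))
```

Real boundaries have holes, multiple parts, and spherical geometry to worry about, so a production system would more likely lean on a geometry library than on hand-rolled ray casting.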
How are developers at Foursquare using your work?
With all the data we get internally, [developers] are like, “Can you break it down for me by city?” It’s just how we think. How is Foursquare doing in Russia or the U.S.? Well, I can’t deal on a scale that big. We have an analytics team. When they work with these data sets, even if cities aren’t perfect–even if clamping to Austin is going to cut things out at the edges–it’s going to be good enough to give them a much more intuitive understanding of what the data is doing. Sometimes we want to reach out to users and we want to target a user in Austin and say, “Hey, can you help us with some data?” We need to know where Austin is for that. As I became more and more part of the geo community, everyone was like “Hey, where can I find state polygons? Have you found state polygons yet?” Developers really want this. They really want to give their software an intuitive understanding of the world. Any mobile startup is immediately a geo startup. The phone has GPS. Even if your startup is a drawing app, you still might want to know where your users are drawing in the world. You need a humane understanding of that.
Can you actually build all the geo infrastructure you’ll need in-house?
It would be great if we took it upon ourselves to build our city polygons for like 20 countries. I would like to see that taken up by a larger community, a larger set of people. You think you can model the world, but you can’t. It’s really complicated. It changes all the time. We live in the U.S. where things are relatively stable, but there are other countries where states are changing all the time. Cities didn’t even exist a couple of years ago. Even in the U.S., if you ask the New York City government, they have a sense of where Williamsburg is and they claim that the northern border is Metropolitan Avenue. So you get out at Bedford and New York City still says you’re in the Greenpoint Business Improvement District. That might be changing, but even if we had the governments contributing all this data, it would still not be up to date with the way people are thinking about cities.
Foursquare recently launched a partnership with City Pages, the Twin Cities weekly paper, turning their “Best Of” issue into a sort of location-aware interactive guide. How do you figure out what city boundaries to use for something like that?
We built that based around our understanding of how big the cities were. In that case, we used our reverse geocoding technology, but forward geocoding was important there too. Say we don’t have polygons for Moscow. Instead what we can do is enter addresses for venues. And they might say there’s the English Moscow, there’s Russian, there’s transliteration, there’s a French name for it. They could be entering all these different things and we want to know that they’re all the same when we geocode them. That’s another way that when we don’t have polygon data, owning the geocoder and owning the data means we can forward geocode all these points and cluster them. It doesn’t always work. Some people don’t know what city they’re in. Some people don’t think about a park or a monument and might not even bother putting in an address, so reverse geocoding has been very useful there.
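The idea of forward geocoding many spellings of a city to one place and then clustering the venue points can be sketched roughly as follows. This is a toy under stated assumptions: the `ALIASES` table, the `moscow-ru` identifier, and the venue coordinates are invented for illustration, and a real geocoder would resolve names against a gazetteer rather than a hand-written dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: English, Russian, transliterated and French
# spellings all resolve to one made-up place identifier.
ALIASES: Dict[str, str] = {
    "moscow": "moscow-ru",
    "москва": "moscow-ru",
    "moskva": "moscow-ru",
    "moscou": "moscow-ru",
}

Venue = Tuple[str, float, float]             # (city as typed, lat, lng)
Extent = Tuple[float, float, float, float]   # (min_lat, min_lng, max_lat, max_lng)


def forward_geocode(city_text: str) -> Optional[str]:
    """Map whatever the user typed to a canonical place id, if known."""
    return ALIASES.get(city_text.strip().casefold())


def estimate_extents(venues: List[Venue]) -> Dict[str, Extent]:
    """Group venue points by geocoded city and take each group's extent."""
    points: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for city_text, lat, lng in venues:
        place_id = forward_geocode(city_text)
        if place_id is None:
            continue  # no usable city text; reverse geocoding would be needed
        points[place_id].append((lat, lng))
    return {
        place_id: (
            min(lat for lat, _ in pts),
            min(lng for _, lng in pts),
            max(lat for lat, _ in pts),
            max(lng for _, lng in pts),
        )
        for place_id, pts in points.items()
    }


# Illustrative venue records, not real data.
venues = [
    ("Moscow", 55.75, 37.60),
    ("Москва", 55.80, 37.65),
    ("Moscou", 55.70, 37.55),
    ("", 55.76, 37.61),  # user left the city blank
]
print(estimate_extents(venues))
```

The blank-city record is skipped here, which mirrors the case Blackman describes where forward geocoding fails and reverse geocoding has to fill in.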
How do you define neighborhoods?
Neighborhoods are the most humane things, yet they’re the most inhumane things to work with. The thing about neighborhoods is that they’re only defined at the municipal government level. Maybe. And they don’t have any government of their own. So it’s a narrative that we all agree to. Obviously, real estate does influence it and then there are people like us. I really just want to sit down and draw a set of New York City neighborhood polygons. Like, take a weekend and just draw them. I think I know where they all are. But where am I getting that data from? The world is feeding it to me. I’ll look at the Village Voice’s cool, hip neighborhoods. I’ll look at their map. I’ll look at what other people think. I’ll look at real estate listings. And there are companies who do this professionally, but then there’s a tension between the real estate definition and the on-the-ground definition. Neighborhoods are funny. Everyone has an opinion. It’s one of the trickiest things in geo. It would be great if neighborhoods were a perfect patchwork quilt where they never overlapped. We really need to pick one. Which means we need to pretend that it’s a patchwork quilt, but it’s not. Things overlap. Alphabet City’s a subset of the East Village. NoHo, Nolita, SoHo, they [overlap]. So neighborhoods are tricky. You need to simplify. You need to deal with them changing on the ground. We spend a lot of time thinking about neighborhoods.
How are users experiencing what you’re building in the Foursquare apps?
Let’s say you’re an expert in Brooklyn Heights. Your friends should trust you about Brooklyn Heights. That’s going to be a richer experience once we understand where “Brooklyn Heights” is. And once we understand the things that are happening on Foursquare in Brooklyn Heights.
What’s the long-term benefit of this framework?
I think there’s a huge value in owning our stack. We think about geographic data somewhat differently from a lot of companies. Foursquare is working with so much data that we need to process our data with map reduces. So I need my data in certain formats that are going to be amenable to map reduces. When we’re looking at where users are checking in, we need to be able to do that in nanoseconds, milliseconds, microseconds. Much faster than a lot of companies can tolerate. So building the infrastructure is the right move. We’re going to continue doing that. It’s nice to own your technology, because then you can make changes to it. We can tweak to how we need it to be for Foursquare.
What’s an example of the kind of tweaking you can do when you own your own infrastructure?
We did something where we took every place in the world and we assigned it a short, unique string that you could stuff in the URL. So now when you geocode Austin, you have to get back the string /austin-texas and within our system we know you can always use that to get back there: put in a URL and it’ll be really pretty. If we had been using an off-the-shelf geocoder (a service where the program lives somewhere else and you’re just talking to the API), we couldn’t create this list of slugs.
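A slug scheme of this kind might look something like the Python sketch below. It is a guess at the general shape, not a description of Foursquare's system: `slugify`, `SlugRegistry`, the place ids, and the numeric-suffix rule for collisions are all assumptions made for illustration.

```python
import re
import unicodedata
from typing import Dict, Optional


def slugify(text: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


class SlugRegistry:
    """Hands out one unique, URL-safe slug per place and resolves it back."""

    def __init__(self) -> None:
        self._slug_to_place: Dict[str, str] = {}

    def assign(self, place_id: str, name: str, region: str) -> str:
        base = slugify(f"{name} {region}")
        slug, suffix = base, 2
        # Assumed collision rule: append -2, -3, ... until the slug is free.
        while slug in self._slug_to_place:
            slug = f"{base}-{suffix}"
            suffix += 1
        self._slug_to_place[slug] = place_id
        return slug

    def resolve(self, slug: str) -> Optional[str]:
        return self._slug_to_place.get(slug)


registry = SlugRegistry()
# The place ids are made up for this example.
austin = registry.assign("place-1", "Austin", "Texas")
other_austin = registry.assign("place-2", "Austin", "Texas")
print(austin, other_austin, registry.resolve(austin))
```

The point of owning the registry is the round trip: the geocoder returns the slug, and the same system can always turn that slug back into the place. This sketch ignores names that contain no Latin characters, which would need transliteration before slugging.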
Is your work helping solve any adjacent problems for other features in Foursquare?
Foursquare wants to know what’s new, what’s popular, what’s trending, what’s getting more popular. There’s a problem here: New York has more people. If you train your algorithm to detect trending things in New York, it might think the things in Raleigh are never trending because you’re never seeing that upswing. We need to think about geonormalization. There’s a difference between the interior of Atlanta and the suburbs. If we geonormalize, we suddenly understand that the check-in characteristics of Atlanta are going to be different than the check-in characteristics in New York, which are going to be different from the check-in characteristics of the suburbs of Atlanta. We need to be able to do geonormalization so we have a better sense of what is significant in check-in changes. When do venues close? Same thing. Is this trickling off because there aren’t a lot of people in that area or is it trickling off because we need to have a better sense of people there?
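One simple way to picture geonormalization is to score each venue's check-in jump against the other venues in its own region instead of against the whole world. The sketch below uses a per-region z-score, which is an assumption chosen for illustration; the interview does not say what statistic Foursquare uses, and the venue names and numbers are invented.

```python
from statistics import mean, pstdev
from typing import Dict


def geonormalized_scores(
    jumps_by_region: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Score each venue's check-in jump relative to its own region.

    The score is a z-score: how many standard deviations the venue's
    jump sits above the regional average jump.
    """
    scores: Dict[str, float] = {}
    for region, jumps in jumps_by_region.items():
        values = list(jumps.values())
        mu = mean(values)
        sigma = pstdev(values)
        for venue, jump in jumps.items():
            scores[venue] = (jump - mu) / sigma if sigma else 0.0
    return scores


# Invented day-over-day check-in increases per venue.
jumps = {
    "new-york": {"nyc-a": 100, "nyc-b": 120, "nyc-c": 110, "nyc-d": 90},
    "raleigh": {"ral-w": 5, "ral-x": 4, "ral-y": 6, "ral-z": 30},
}
scores = geonormalized_scores(jumps)
# ral-z gained far fewer raw check-ins than nyc-b, yet it stands out
# more against its own region, so it scores higher.
print(scores["ral-z"] > scores["nyc-b"])
```

Grouping by region only works if the regions themselves are meaningful, which is where the city and neighborhood polygons come back in.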
You’re including foreign countries in your map framework; how are they different to map than domestic regions?
The difference is, the U.S. is easy to find data for. The rest of the world isn’t. We didn’t have a sense of the extent of Paris. The rules governing badges for South By [Southwest] were someone drawing a bounding box for Austin, with someone looking at a map and going “Austin’s about there.” It worked okay for South By. But I’m not going to hand draw every–well, I’m considering hand-drawing some.
You came to Foursquare from Google, where you worked on their maps team. How did you get into map-building in the first place?
I didn’t really specialize in college, I was just kind of general computer hacker guy, I did some systems work and then I’d had a couple of internships in the field of web search and it was somewhat obvious to me that this would be a good platform to get a job at Google. And Google looks deep within your soul and assigns you a completely random team. I wanted to be back in New York. I grew up here. There’s a maps team in New York and they put me on it, and it turned out that maps was an incredibly fascinating place to be. It wasn’t until I got to Foursquare and started working on these problems outside of that that I realized it was something I loved and it was exciting to everyone.
When you first got to Foursquare, what did they have you doing? Now what do you do?
Initially I was working on the quality of our venue database. [Now] I oversee both the geographic and venue efforts at Foursquare. Before we started building out the Explore product, there was less of a need for geographic insights and my responsibility in geo grew with the company’s demands on geo.
What does a typical day look like for you?
I get into work, I code, I go home, I think about coding more. I manage the geographic and venues effort so I actually don’t spend all day coding, which is why sometimes I’ll do it at night and on weekends. I just flew out to San Francisco and I did an hour and was like, this is the geographic data. We started this but let’s keep going. Let’s really have the data out there so that everyone can work with it. I have such a commitment to open geo. It’s such an exciting space to be in. I feel it everywhere. I really want to be spending all my time on it. I wake up on Saturdays and Sundays before my girlfriend and my first two hours are spent working on Twofishes [geocoder]. I have so many ideas percolating and I also am supervising the venues and geos team at Foursquare and trying to balance my two loves: Foursquare and open geo data. I’m thinking about how we build tools that let people contribute to open geographic data.