Visualizing Data And Telling Compelling Stories With OkCupid And FlowingData

Twitter users have shorter relationships. iPhone owners have more sex. And Big Macs cost 50 percent more in Brazil than in the U.S., but much less in India. Got it? Now, let’s talk data.

OkCupid Word Cloud


Having the numbers to back up a sales pitch, promotion proposal, or attention-getting web post isn’t all that matters. Knowing what story you want to tell with those numbers, and the right way to visualize it, is just as important. Take it from two blogs that know how to sell a statistic: OkTrends, the much-linked blogging arm of the OkCupid dating network, and FlowingData, the data visualization blog whose author is just out with a new how-to book, Visualize This.

We interviewed Sam Yagan, cofounder and CEO of OkCupid and a Harvard mathematics graduate, and dug through Visualize This from author Nathan Yau, a PhD candidate in statistics at UCLA and visualization whiz. We came back with this decidedly non-visual list of pointers for anyone who has to dress up their data.

Know your data really, really well before you even think about design

When you get your hands on some really good data, it’s easy to jump ahead and imagine how cool it will look to show increasing sales across years, differences in usage by ZIP code, or whatever acclaimed result you have in mind. But if you don’t know exactly what you’ve got, and haven’t checked any anomalies with the source, you can’t begin to tell an interesting story.
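As a rough illustration of that kind of pre-design check, here is a minimal Python sketch that flags missing entries and values sitting far from the median, so they can be confirmed with the source. The function name `flag_anomalies`, the threshold of 5, and the sales figures are all made up for this example; they come from neither OkCupid nor FlowingData.

```python
from statistics import median

def flag_anomalies(values, threshold=5.0):
    """Return (index, value, reason) for entries worth double-checking."""
    present = [v for v in values if v is not None]
    mid = median(present)
    # Median absolute deviation: a measure of spread that outliers barely move.
    mad = median(abs(v - mid) for v in present)
    flagged = []
    for i, v in enumerate(values):
        if v is None:
            flagged.append((i, v, "missing"))
        elif mad > 0 and abs(v - mid) / mad > threshold:
            flagged.append((i, v, "far from median"))
    return flagged

# Made-up yearly sales figures, for illustration only.
sales = [102, 98, 110, None, 105, 1040, 99]
for row in flag_anomalies(sales):
    print(row)
```

A flagged value is not necessarily wrong. It is simply a number to ask the source about before building a chart around it.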

Yagan said the OkTrends team often comes up with their post topics Jeopardy!-style: “Here’s this answer, this thing happening in our world we didn’t expect. What’s the question you could ask that brings out this answer?” Other times, it’s from questioning just how specific their data set is: “I wonder if there’s a correlation between what cell phone you have and how many sex partners you’ve had in the last X months.”

Sexual Activity by Smart Phone


That all comes from hard work put in with non-sexy database tools and line-by-line analysis. FlowingData’s Yau compares it to the car waxing, floor sanding, and fence refinishing that Mr. Miyagi puts Daniel through in The Karate Kid, which pays off in a natural feel for blocking and punching. “It’s the same thing with data. Learn all you can about the data, and the visual storytelling will come natural,” Yau writes.

You probably have everything you need to tell a great data story

Despite press agents for the leading presentation firms hitting up Yagan and his OkCupid team quite often, offering free lifetime licenses in exchange for a mention of their tools, Yagan simply uses “Excel 101-level tools, and a bunch of manual database querying.”

“If the data is telling a great story, the pictures and graphics don’t have to be flashy,” Yagan said. “In fact, if they’re too flashy, graphics can be distracting.”

“I think a lot of people think that if they have something really funky in the data, they should have a really funky graph to put out. We think the opposite. If you play down the presentation, the core underlying data shines more brightly.”

Never assume your graphics explain everything (or that everyone gets your humor)


After a period in late 2008 when big corporate data leaks and break-ins seemed like a regular occurrence, Yau put together a simple timeline of the top 10 computer data breaches up until that time. A timeline is one of the most basic, readable visualizations, but readers are also used to seeing and skimming them. Had Yau not pointed out that the breaches came closer and closer together as the timeline moved from 2000 to 2008, he doubts the graphic would have made its way across the web, and eventually into Forbes magazine. “I don’t think people would’ve given the graphic much thought had I not provided that simple observation,” Yau wrote.

Data Breaches

But don’t get overly cute in the text, either. In running down a graphic visualizing which areas of the U.S. had more bars than grocery stores, Yau asked those living in the areas with higher bar concentrations if they could confirm, adding, “I expect your comment to be filled with typos and make very little sense. And maybe smell like garbage.” A “good number of insulted Wisconsinites” who hadn’t previously read FlowingData followed up with him.

Be specific on believable topics

Specificity is what makes numbers and differences stick in people’s minds. Yagan suggests that telling an audience about something 49.6% more likely to happen is more intriguing and memorable than a rounded-up 50%. “That’s why Al Gore will say something like, ‘That’s a 7.1 trillion dollar program.’ There’s no real need for that point-one, but it makes your point more accurate, and memorable for that reason.”

Even if you think the story you have to tell has been told a million times, super-specific findings and interesting cross-references can shine through. “Everyone might know that prettier women get more (interest) messages (on OkCupid),” Yagan said. “That, in itself, won’t surprise anyone. But if I tell you that the top decile of women get 25 times the messages of the bottom decile, that is interesting.”
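A decile comparison like the one Yagan describes takes only a few lines to compute. The sketch below is one plausible way to do it, not OkCupid’s actual method: it sorts the values, averages the top and bottom tenth, and divides one by the other. The function name `decile_ratio` and the message counts are invented for illustration.

```python
def decile_ratio(counts):
    """Average of the top 10% of values divided by the average of the bottom 10%."""
    ordered = sorted(counts)
    size = max(1, len(ordered) // 10)  # how many entries make up one decile
    bottom = ordered[:size]
    top = ordered[-size:]
    bottom_mean = sum(bottom) / size
    top_mean = sum(top) / size
    if bottom_mean == 0:
        raise ValueError("bottom decile averages zero; ratio is undefined")
    return top_mean / bottom_mean

# Made-up message counts for 20 hypothetical profiles; not OkCupid data.
messages = [1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 12, 15, 18, 22, 30, 50]
ratio = decile_ratio(messages)
print(f"Top decile receives {ratio:.1f} times the messages of the bottom decile")
```

Reporting the ratio to one decimal place, rather than rounding to the nearest ten, follows the specificity advice above.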


Make it something people can relate to, and don’t pull punches

OkCupid is in a unique position to make its posts appealing, as the site’s data set can speak directly to love, sex, longing, the lies we tell ourselves and each other, and so on. But still, their blog posts have to find an angle that people who don’t necessarily use online dating services (yet) can find intriguing. So the blog reveals, for example, that a person’s answer to “Do you like the taste of beer?” is the single best indicator of whether they’d consider sex on a first date (hint: that matches up just the way you think).

First Date

“A lot of people at bigger companies, they don’t want to be controversial, they don’t want to offend anyone at all,” Yagan said. “So they’ll say, ‘I don’t want to use the word sex,’ even if ‘sex’ in itself isn’t controversial. When there’s a clear aversion to something … something going on in the data that the writer doesn’t want to write about. And it gives a sense of softness, weakness, and lack of credibility to the presentation.”

Yau points out OkCupid’s keyword-cloud-style chart showing the most common phrases and nouns used in Asian men’s dating profiles as an example of a visualization that gets people to slow down and dig into data. “Hey, I’m Asian and a guy. Instant connection.”