Facebook Sues Data Geek, but That Doesn't Solve Its Privacy Problem

A computer programmer culls data from 210 million Facebook profiles—and pisses Facebook off in the process.

Peter Warden

Remember those fascinating graphs by Peter Warden that used Facebook data to illustrate, for example, people's interests and common names across the U.S.? Facebook has totally squashed the project. But the privacy concerns remain.

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."

Still, Facebook was none too pleased: They first requested a thorough scrubbing of the data, to eliminate any personal info that might be used by spammers. Eventually, they simply threatened to sue Warden unless he deleted all the data, alleging a terms of service violation. Warden didn't have any money to fight the suit, so he deleted the data.

Facebook doesn't look totally evil though. According to Warden: "From my conversations with technical folks at Facebook, there seems to be a real commitment to figuring out safeguards around the widespread availability of this data." And yet, the problem will probably remain—if not on Facebook, then somewhere else.

Again, Warden: "To the many researchers I've disappointed, there's a whole world of similar data available from other sources too. By downloading the Google Profile crawling code you can build your own data set, and it's easy enough to build something similar for Twitter. I'm already in the middle of some new research based on public Buzz information, so this won't be stopping my work."

Still, why can't Facebook actually encourage this type of research, while working on its privacy issues in parallel? They're sitting on top of data the likes of which no one has ever seen before; it's naive to even guess at what sorts of fascinating research could result.

6 Comments

  • Hung Truong

    Your headline is inaccurate. As you wrote, Facebook *threatened* to sue. Threatening to do something and doing it are completely different. I think your headline needs to be changed.

  • Albo Fossa

    It's simple: either they crawled FB data or they didn't. If so, the suit has no merit, and the court should send FB squealing "wee, wee, wee" all the way home.

  • Scott Bell

    facebook needs to setup a .org like google. until they do, expect others to cull the database like this guy. Face it, the information stuffed in the databases is way more useful to the world than targeted ad sales.

  • Chris Lo

    @ Kevin Swan - Good grief, really? I'm pretty sure no one here was "confused".

    Here are some other words that may have more than one meaning:

    Elitist, arrogant, snobby, pretentious, patronising...

    Thanks! :)

  • Kevin Swan

    Cliff: Squash: has many, many meanings, among which are "quash: to legally invalidate." Quash, though, has just that one meaning, and is such a perfect word, and in this case probably avoids confusion. Squash more generally means to squeeze or crush. Thanks!