Remember those fascinating graphs by Peter Warden that used Facebook data to illustrate, for example, people’s interests and common names across the U.S.? Facebook has totally squashed the project. But the privacy concerns remain.
Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web. He was planning to release the set to select researchers, who proposed cross-referencing it in all sorts of cool ways, for example by looking for links between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches, and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."
Still, Facebook was none too pleased. They first requested a thorough scrubbing of the data to eliminate any personal info that spammers might use. Eventually, alleging a terms-of-service violation, they simply threatened to sue Warden unless he deleted all the data. Warden didn't have the money to fight the suit, so he deleted the data.
Facebook doesn't come off as totally evil, though. According to Warden: "From my conversations with technical folks at Facebook, there seems to be a real commitment to figuring out safeguards around the widespread availability of this data." And yet the problem will probably remain, if not on Facebook, then somewhere else.
Again, Warden: “To the many researchers I’ve disappointed, there’s a whole world of similar data available from other sources too. By downloading the Google Profile crawling code you can build your own data set, and it’s easy enough to build something similar for Twitter. I’m already in the middle of some new research based on public Buzz information, so this won’t be stopping my work.”
Still, why can't Facebook actually encourage this type of research while working on its privacy issues in parallel? They're sitting on top of data the likes of which no one has ever seen before; it would be naive to even guess at what sorts of fascinating research could result.