UPDATE: On Wednesday, Twitter announced it would roll out the archive feature to all users.
When third-party companies, licensed by Twitter to resyndicate its data, began selling access to historical tweets earlier this year, privacy advocates were quick to point out an inconsistency in the platform’s policies.
Data analytics companies such as DataSift and Gnip were making money from past tweet data, but individual Twitter users still couldn’t easily access the full log of tweets they had created. “By locking users out of their own data, Twitter has managed a rare feat: making Facebook look good,” wrote The Globe and Mail.
But if Twitter moves forward with a personal archive option it began testing this week (as promised), it can end this incongruity. The company already grants users rights to all of their tweets in its terms of service. With the new feature, it will finally grant them…their tweets. Rather than fishing tweets from the archive using an exact URL, users will be able to download a zip file that contains their Twitter history with a few clicks. The feature is similar to Facebook’s “Download Your Info” feature or Google’s “Takeout” products.
Giving users access to their archives provides a good comeback line for Twitter. But does the new feature provide any value to users? Here’s how it might.
Users can already search Twitter history through third-party apps such as Topsy. But, points out Cathy Marshall, a principal researcher in Microsoft’s Silicon Valley Lab who studies personal digital archiving, that doesn’t remove the risk of losing that history. “What guarantee do you have that a small company has any stability itself? You’ve backed up your tweets to another service, and you don’t know what its general outlook is.”
Downloading tweets directly from Twitter to your own computer, where they’ll be safe even if the Internet explodes, reduces the risk of losing content. It provides an easy way for even someone without a lot of computer skills to save what they’ve created.
“If you’re a geek, you care a lot, because this is about data sovereignty,” says Marc A. Smith, the cofounder of the Social Media Research Foundation, an organization that develops free and open tools for all kinds of users who want to understand networks and social media. “If I don’t actually have my own controllable copy that could be redeployed in some other service, then I’m cattle. If I can’t exit and then reengage with some other person in the marketplace, then every time I submit a bite of content to any depository, I have to do so knowing that that is a data roach hotel. Data goes in, it never comes out. I think so far we all live in data roach hotels, and we don’t have sovereignty.”
Twitter’s new tool gets halfway there. It gives some users a controllable copy of their content. But it doesn’t exactly make their data portable. Good luck uploading your Twitter archive to Facebook, for instance. Or vice versa.
“By comparison, email, it usually is possible to download a bunch of email archives and walk over to another server and upload those archives and then, bang, you’re pretty much back in business,” Smith says. “When enterprises need data portability, they typically get it. When consumers want data portability, they don’t get it.”
Creating what looks like a simple download option becomes more difficult when scaled for 140 million users who together create 340 million new tweets every day. Twitter may very well have more in mind for its archive option than it has released in its first test.
Tweet search engine and data analytics company Topsy may come closest to fully understanding the challenge. It has created a searchable three-year archive of Twitter chatter that contains more than 100 billion tweets, and, says VP of Product Jamie de Guerre, it’s been no walk in the park. “Google’s index of the entire Internet ranges from about, in some estimates, 45 billion web pages to 125 billion web pages,” he says. “So the size of Twitter is on the order of the size of the Internet, just in tweets instead of web pages. Having all of that data available, being able to query across and return a large data file to a user is definitely quite a challenge.”
Although for now there’s not much to do with your Twitter archive but reminisce, de Guerre imagines new ways to visualize history, not unlike Facebook Timeline. Smith insists there’s no reason that online services can’t cooperate to provide full portability for their users, citing proposed common languages such as Activity Streams and GraphML.
For now, he says, Twitter’s archiving feature is a step in the right direction. “My goodness!” he exclaims when I comment that I’m not sure why I’d want my tweet archive. “The data has value,” he says. “We know it has value because it’s traded on marketplaces.”
[Image: Flickr user Rachel Kramer]