I just spoke with Yves-Alexandre de Montjoye, a senior PhD student in computational privacy at the MIT Media Lab, and I’ve got some bad news about data and privacy. Then I’ve got worse news. But fear not, because after that I’ve got some good news!
De Montjoye has been part of a team puncturing the presumption that “anonymized” data is actually anonymous. We know there are hella terabytes of metadata about us floating in our mobile, credit card, and browser histories, and we’ve been okay with that in part because we’ve been told there’s no personally identifiable information in the records. But de Montjoye’s research has shown that 95% of individuals can be reliably identified with as few as four data points.
While any single record itself might be anonymous, we leave unique pathways through all this data. On April 6, only one person bought a subway card, then got some dried bison at the farmers’ market, then picked up some dark chocolate. That person was me, and it was a good day.
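For the curious, here is a minimal sketch of how a "unique pathway" check could work. It is not the researchers' code; the toy traces, the user labels, and the function names `matching_people` and `unicity` are all made up for illustration. The idea is simply to count how many people's records contain a given handful of points.

```python
import random

def matching_people(traces, points):
    """Return the people whose trace contains every one of the given points."""
    return [person for person, trace in traces.items() if points <= trace]

def unicity(traces, k, seed=0):
    """Fraction of people singled out by k points drawn at random from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for person, trace in traces.items():
        # Draw up to k points from this person's own trace (sorted for reproducibility).
        sample = set(rng.sample(sorted(trace), min(k, len(trace))))
        if matching_people(traces, sample) == [person]:
            unique += 1
    return unique / len(traces)

# Toy, invented data: (day, purchase) pairs. No names are attached to any single point.
traces = {
    "user_a": {("Apr 6", "subway card"), ("Apr 6", "dried bison"), ("Apr 6", "dark chocolate")},
    "user_b": {("Apr 6", "subway card"), ("Apr 6", "coffee"), ("Apr 6", "dark chocolate")},
    "user_c": {("Apr 6", "subway card"), ("Apr 6", "dried bison"), ("Apr 6", "coffee")},
}

one_point = {("Apr 6", "subway card")}
three_points = {("Apr 6", "subway card"), ("Apr 6", "dried bison"), ("Apr 6", "dark chocolate")}

print(len(matching_people(traces, one_point)))   # everyone bought a subway card
print(matching_people(traces, three_points))     # only one trace has all three
print(unicity(traces, k=3))
```

In this toy set, the subway card alone matches everyone, while the three purchases together match a single trace. That is the whole trick: each point is bland, the combination is a fingerprint.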
De Montjoye and his Media Lab colleagues have pushed their investigation further. In a limited experiment based only on mobile-phone metadata, the researchers were able to predict personality types, like neuroticism and extroversion, with more than 60% accuracy. Maybe you weren’t playing hard to get when you waited to text him back; maybe you were revealing your anxiety. If a small team at an academic institution can match personality to identity, imagine what multinational organizations with billions in resources, motivated by profit and competition, and with little to no oversight, can do. (That’s the very bad news.)
Thankfully, de Montjoye and his peers have a solution: Let’s shift the balance of power and risk in how user data is handled. “Right now, companies are collecting data because they can,” he says. Instead, he believes, they should “code for what they want to know about a user” and request that alone, then discard it. One part of de Montjoye’s proposed system, SafeAnswers, is spiritually akin to a music-recommendation engine like Pandora. It doesn’t actually need to know every song you’ve ever listened to. It merely needs the DNA of what you like in order to select the next song. His other idea is a notion called Open Personal Data Store (OpenPDS). Consumers would control their own data in a secure cloud account or on a disk hidden under their mattresses and be able to decide to whom they grant access. (I would keep mine in a tooth. Hackers better have pliers!)
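To make "code for what they want to know" a bit more concrete, here is a rough sketch of the answer-not-the-data idea. It is an illustration under my own assumptions, not the actual SafeAnswers or OpenPDS interface: the `PersonalDataStore` class, its `approve` and `ask` methods, and the `favorite_genre` question are all hypothetical.

```python
from collections import Counter

class PersonalDataStore:
    """Hypothetical stand-in for a user-controlled store; not the real OpenPDS API."""

    def __init__(self, records):
        self._records = list(records)  # raw data stays inside this object
        self._approved = {}            # question name -> function the owner has allowed

    def approve(self, name, question):
        """The owner grants a service permission to run one specific question."""
        self._approved[name] = question

    def ask(self, name):
        """Run an approved question against the raw records and return only its answer."""
        if name not in self._approved:
            raise PermissionError(f"question {name!r} has not been approved by the owner")
        return self._approved[name](self._records)

def favorite_genre(records):
    """The 'question': boils a full listening history down to one genre label."""
    return Counter(r["genre"] for r in records).most_common(1)[0][0]

# Invented listening history for illustration.
store = PersonalDataStore([
    {"song": "Track 1", "genre": "jazz"},
    {"song": "Track 2", "genre": "jazz"},
    {"song": "Track 3", "genre": "folk"},
])
store.approve("favorite_genre", favorite_genre)

answer = store.ask("favorite_genre")
print(answer)
```

The service walks away with one word about your taste and never sees the playlist. Anything the owner has not approved gets a `PermissionError` instead of an answer.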
As much as de Montjoye’s ideas would benefit individuals, their genius is that they’d help companies too. Do you think Target, Home Depot, and eBay want to be hacker bait because they have all our credit-card info on file? Does Apple need to hold on to the exact time, location, and quantity of pushups I’ve done? (It’s a lot. I’m in a constant state of pushups.) Centralized servers holding intermingled user metadata are all too tempting for would-be cybercriminals. Distributed personal data stores would be a more expensive and difficult target, a creative solution to the current cybersecurity threat.
If we embrace these ideas, hoarding customer data would become a liability rather than an asset. And I look forward to a world where I define which insights to share. I’ll tell you for free that bison and chocolate make me less neurotic.