How Can We Build Ethics Into Big Data?

One expert suggests how to create a new “due process” to bring back individual rights in a world of murky and sometimes unfair data practices.

[Painting by Paul Corio]

In a day and age where purchasing items like unscented hand lotion can alert your family to the fact that you’re pregnant (see Charles Duhigg’s story on predictive analytics at Target), clear ethical guidelines on how companies and governments should use ever-growing streams of data about individuals are more necessary than ever.


But we still don’t have them. We only really have a handful of suggestions on what could be a good idea while our laws try to catch up to the 21st century. Still, one suggestion brought up at a panel this week co-hosted by the White House did seem promising, even though it dates from the year 1215: A “due process” for big data, or, in other words, a path for the people affected by predictive analytics to right wrongs.

In the wake of the NSA scandal, the panel was one of three public events over the course of a 90-day review of big data practices ordered by President Obama. This one, moderated by writer and entrepreneur Anil Dash and held at the New York University Law School, looked at the ethical and social implications of big data specifically.

Over the course of the talk, panelist Kate Crawford, a principal researcher at Microsoft and visiting professor at MIT’s Center for Civic Media, explored the notion of “due process,” a formal way for a state to respect individual rights that was first set down in the Magna Carta, a 13th-century agreement by the king of England to limit his powers by law and respect his subjects. These days, Crawford dedicates her research to finding a new one for big data.

“Under the friendly view of optimization, big data can be used to isolate different groups and then treat them differently,” she said. “The fear isn’t that big data discriminates. We already know that it does. It’s that you don’t know if you’ve been discriminated against.”

As examples, Crawford threw out a few hypotheticals: The job candidate who didn’t know he was rejected because a combination of his Facebook likes predicted something unfavorable, or someone whose search history correlated with a criminal’s.

Due process could apply to these people, too, Crawford said. In a paper published earlier this year in the Boston College Law Review, Crawford and scholar Jason Schultz expanded a bit on what that might look like:


1. Notice

Crawford and Schultz write that people who make predictions based on big data should disclose what’s being predicted and how.

“For example, if a company were to license search query data from Google and Bing in order to predict which job applicants would be best suited for a particular position, it would have to disclose to all applicants that it uses search queries for predictive analytics related to their candidacy,” they wrote.

2. An Opportunity for a Hearing

In January, Ukrainians near a violent protest received text messages on their phones telling them that they had been “registered as a participant in a mass riot.” Was that based on location data? What if you weren’t participating? And was there any way to challenge that assumption?

Those questions remain unanswered, but Crawford and Schultz suggest that in the United States, the Federal Trade Commission might investigate complaints about predictive analytics. Even if it isn’t the FTC, the researchers stress the need for a neutral arbiter to determine whether a given use of big data crossed the line.


3. An Audit

Whoever that arbiter might turn out to be, he or she would need to review how predictive analyses were made. “This would require some sort of audit trail that records the basis of predictive decisions, both in terms of the data used and the algorithm deployed,” the researchers wrote.
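
To make that idea concrete, here is a minimal sketch in Python of what a single entry in such an audit trail might hold. It is not taken from Crawford and Schultz’s paper. The names `AuditRecord`, `record_decision`, and `verify` are hypothetical, as are the sample field values, and the SHA-256 fingerprint is simply one assumed way to let an arbiter notice whether a record was altered after the fact.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One predictive decision: who it was about, what was predicted, and on what basis."""
    subject_id: str          # hypothetical identifier for the person affected
    prediction: str          # what was predicted about them
    data_sources: tuple      # the data used (for example, "search queries")
    algorithm: str           # the algorithm deployed
    algorithm_version: str   # which version of it made the call
    recorded_at: str         # UTC timestamp in ISO 8601 form


def fingerprint(record: AuditRecord) -> str:
    """Return a SHA-256 digest of the record's contents."""
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_decision(log, subject_id, prediction, data_sources,
                    algorithm, algorithm_version):
    """Append a record and its fingerprint to the log, then return the record."""
    record = AuditRecord(
        subject_id=subject_id,
        prediction=prediction,
        data_sources=tuple(data_sources),
        algorithm=algorithm,
        algorithm_version=algorithm_version,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append((record, fingerprint(record)))
    return record


def verify(log) -> bool:
    """Check that every stored record still matches its stored fingerprint."""
    return all(fingerprint(record) == digest for record, digest in log)
```

An arbiter reviewing a complaint could read the `data_sources` and `algorithm` fields to see the basis of a decision, and call `verify` to check that the entries have not been edited since they were written. A real system would also need to protect the fingerprints themselves, which this sketch does not attempt.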

For now, Crawford’s “due process” remains in the realm of ideas, and establishing anything like it would likely be an incredibly slow, drawn-out process. There are also plenty of people who would argue that hindering big data now would slow progress that can solve real societal problems, such as using large-scale analytics and personal medical information to monitor patients at risk of infection in hospitals.

But it’s a misconception that setting ethical standards for the use of big data seeks to undermine progress. Setting ethical standards now could prevent nasty consequences down the road, even though that action ranks pretty low on most organizations’ lists of priorities.

About the author

Sydney Brownstone is a Seattle-based former staff writer at Co.Exist. She lives in a Brooklyn apartment with windows that don’t quite open, and covers environment, health, and data.