Yes, algorithms can be biased. But they have an even bigger danger

Data visualization artist Jer Thorp explores what it means to rely on data that magnifies existing biases.

[Source Images: Iaremenko/iStock, yp/iStock]

BY Jer Thorp | 6 minute read

Algorithms hold a pivotal and particularly mysterious place in public discussions around data. We speak of Google’s and Facebook’s algorithms as wizards’ spells, cryptic things that we couldn’t possibly understand. Algorithmic bias is raised in almost every data discussion, in classrooms and congressional hearings, as if all of us have some kind of shared definition of what an algorithm is and just exactly how it might be biased.

Computers run by executing sets of instructions. An algorithm is such a set of instructions, in which a series of tasks are repeated until some particular condition is matched. There are all kinds of algorithms, written for all kinds of purposes, but they are most commonly used for programming tasks like sorting and classification. These tasks are well suited to the algorithm’s do/until mentality: Sort these numbers until they are in ascending order. Classify these photographs until they fall neatly into categories. Sort these prisoners by risk of re-offense. Classify these job applicants as “hire” or “do not hire.”
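That do/until shape is easy to see in code. Here is a minimal Python sketch (the function and the numbers are illustrative, not drawn from any system discussed here): a simple sorting pass is repeated until the "in ascending order" condition is met.

```python
def bubble_sort(values):
    """Repeat a pass over the list until no swaps are needed,
    i.e. until the 'sorted' condition is finally true."""
    items = list(values)           # work on a copy
    swapped = True
    while swapped:                 # do ... until nothing changes
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items

print(bubble_sort([5, 1, 4, 2]))   # [1, 2, 4, 5]
```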

A neural network is not an algorithm itself, because, when activated, it runs only once. It has the “do” but not the “until.” Neural nets are almost always, though, paired with algorithms that train the network, improving its performance over millions or billions of generations. To do this, the algorithm uses a training set—a group of data for which the programmer knows how the neural network should behave—and at each generation of training the network gets a score for how well it’s doing. The algorithm trains and retrains the network, rolling down a gradient of success, until the network passes a threshold, after which training is finished and the network can be used for whatever classification task it was designed for.
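As a rough sketch of that train-and-retrain loop, and not of any real training framework, the toy Python below stands in a single weight for the whole network and nudges it down the gradient of its error until the error falls below a threshold. Every name and number here is invented for illustration.

```python
# Toy stand-in for the training loop described above: fit one weight w
# so that w * x matches the known answers y (here, y = 2x).
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, known answer)

w = 0.0                 # the "network," starting from scratch
learning_rate = 0.01
threshold = 1e-4        # the score at which training is "good enough"

def error(w):
    return sum((w * x - y) ** 2 for x, y in training_set) / len(training_set)

generation = 0
while error(w) > threshold:          # train and retrain ...
    gradient = sum(2 * (w * x - y) * x for x, y in training_set) / len(training_set)
    w -= learning_rate * gradient    # ... rolling down the gradient ...
    generation += 1                  # ... until the threshold is passed

print(f"finished after {generation} generations, w = {w:.3f}")
```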

Neural networks excel at classifying things that have a lot of data attached to them. What’s more, they’re particularly good at classifying things in which the reasons for classifying correctly are hard to describe. Take, for example, a task in which a neural network is asked to decide whether a set of images contains birds: the images are labeled either “bird” or “no bird.” This is a problem that most humans are quite good at but one that computers have, in the past, had a really hard time with. This is because it’s actually quite tricky to describe what a photograph of a bird looks like. Your brain and mine might be able to look at a photo with a white cockatoo on a perch and another with a flock of starlings against a sunset and think “bird.” But where does the “birdiness” of these photos lie, exactly? It’s both beautiful and a little terrifying that we can avoid the stickiness of this question by training a big enough neural network, for enough generations, with a sufficient number of input images, to define “birdiness” on its own. By later feeding the network some “bird adjacent” images (other, similar animals, patterns that resemble feathers), its programmer might be able to reverse engineer exactly what part of the input signal the network has latched onto, but more often programmers are content with the result, a bird-finding machine built on nodes and weights and chance.

There’s an important difference between the way neural networks work and the way a standard computer program does. With a run-of-the-mill program like a decision tree, we push a set of data and a list of rules into our code-based machine, and out comes an answer. With neural networks, we push in a set of data and answers, and out comes a rule. Where we were once drafting our own rules for what is and what isn’t a bird, or which prisoners may or may not reoffend, the computer now constructs those rules itself, reverse engineering them from whatever training sets it is given to consume.
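A compact way to see that reversal, using invented data in Python: in the first function a human writes the rule and the program applies it to data; in the second, the program is handed examples and answers and derives the rule (here, a simple threshold) on its own.

```python
# Old way: we supply the rule, the program applies it.
def is_bird_by_rule(wingspan_cm):
    return wingspan_cm > 10          # a rule a human wrote down

# New way: we supply data plus answers, and the rule comes out.
examples = [(4, False), (8, False), (12, True), (30, True)]  # (wingspan_cm, is_bird)

def learn_threshold(examples):
    candidates = sorted(w for w, _ in examples)
    # Pick the cutoff that classifies the most labeled examples correctly.
    return max(
        candidates,
        key=lambda t: sum((w > t) == label for w, label in examples),
    )

threshold = learn_threshold(examples)
print(f"learned rule: wingspan > {threshold}")
```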

Training sets, we’ve come to learn, are too often incomplete and ill-fitted to the nuances of the real world. When Matthew Kenney ran his experiments with word2vec, the algorithm didn’t decide to link “black” to “criminal” because it found some pattern in the real world; it did it because its training set of news articles, largely from the United States, commonly placed those words together. Joy Buolamwini’s computer vision program [at MIT’s Media Lab in 2018] didn’t fail to see her face because of some mistake in its code; it failed because the image set it was trained on contained a hugely overweighted majority of white faces.

Sam Sinyangwe described how, after he and his collaborators launched Mapping Police Violence (MPV), The Washington Post released a similar project, collating various citizen-driven collection efforts into a single database. That The Washington Post's database and MPV's are quite similar isn't surprising, given that they started with the same goal. However, the two teams made different decisions about how the real-world stories of police killings would be translated into data. The Post, crucially, decided that it would classify incidents in which kids were brandishing toy guns as cases where the victim was "armed." "So they did not classify Tamir Rice as unarmed," Sinyangwe explains. Mapping Police Violence, on the other hand, does list Rice as unarmed. "That's a choice that needed to be made, and there isn't a clear-cut answer," Sinyangwe says. "But it is a political decision."

Here is a real thing that happened, a real and painful and tragic thing, which became data in two very different ways. Consider a future in which every law enforcement officer wears a body camera (a particular solution much recommended to curtail police violence). To get around the messy judgment of fallible humans, a neural network is used to analyze footage on the fly, to decide whether a situation requires an armed response. To get to the shoot or don’t shoot rule that is at the center of the logic, the system is fed with data—images from crime scenes, video from bystanders, historical footage from body cams. But that’s not enough. The system also needs answers, to be taught in which scenarios officers might be justified in firing and in which scenarios they aren’t. Where do these answers come from?

A body-cam analysis system trained with the Post's data might, thanks to a decision made by the people who made the database, recognize Tamir Rice—and boys with toy guns like him—as armed. Meanwhile, another network, relying on a different data set built on different human decisions, makes the opposite choice. What might have begun as a way to remove certain biases from policing decisions ends up entrenching different ones, often harder to trace back or understand.

Algorithms can, in themselves, be biased. They can be coded to weight certain values over others, to reject conditions their authors have defined, to adhere to specific ideas of failure and success. But more often, and perhaps more dangerously, they act as magnifiers, metastasizing existing schematic biases and further darkening the empty spaces of omission. These effects move forward as the spit-out products of algorithms are passed into visualizations and company reports, or as they’re used as inputs for other computational processes, each with its own particular amplifications and specific harms.

Excerpted from LIVING IN DATA: A Citizen’s Guide to a Better Information Future. Published by MCD, a division of Farrar, Straus and Giroux, on May 4th, 2021. Copyright © 2021 by Jer Thorp. All rights reserved.


