Last year, Amazon was figuring out where it should offer free same-day delivery service to reach the greatest number of potential Prime customers. So the company did what you’d expect: It used software to analyze all sorts of undisclosed metrics about each neighborhood, ultimately selecting the “best” based on its calculations. But soon journalists discovered that, time and time again, Amazon was excluding black neighborhoods.
It wasn’t on purpose, Amazon claims. Its system never even looked at race. Instead, the software had essentially learned this prejudice through all sorts of other data. It’s a disquieting thought, especially since similar AI software is used by the Justice Department to set a criminal’s bond and assess whether they’re at risk of committing another offense. That software has learned racism, too: It was found to wrongly flag black offenders as high risk for re-offense at twice the rate of white offenders.
Amazon’s biased algorithm, like the Justice Department’s, doesn’t just illustrate how pervasive bias is in technology. It also illustrates how difficult it is to study it. We can’t see inside these proprietary “black box” algorithms from the outside. Sometimes even their creators don’t understand the choices they make.
But what if we could test these algorithms from afar, methodically enough to sniff out inherent prejudices? That’s the idea behind Themis, a freely available bit of code that can mimic the process of entering thousands of loan applications or criminal records into a given website or app. By systematically changing specific variables–race, gender, or something far more abstract–it can spot patterns of prejudice in any web form.
Themis has the potential to help companies build less biased software internally–and give citizen journalists a means to test public platforms like Amazon for bias, too.
Built by a team led by University of Massachusetts professors Alexandra Meliou and Yuriy Brun, Themis is like a software data scientist that excels at running its own experiments. “It’s actually using a very fundamental part of the scientific method called ‘causal inference,’” says Brun. “It’s this notion in statistics that if you just observe the system, you can’t say if someone’s race causes the difference of output, or behavior of software.” But if you can test and observe a system, you can.
That’s actually a lot less complicated than it sounds, says Brun, who walks me through how causal inference works when applying for a loan. “If I run my loan application–I happen to be white–and find the software recommends I get a loan, then I can take that same loan application and change just the race,” Brun explains. “Then I ask, what about this application? Can you give this person a loan?” If every other variable on the application stays the same and the second applicant is rejected, the conclusion is obvious: the loan software is discriminating on race.
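Brun’s two-application test can be sketched in a few lines of Python. Everything here is hypothetical: the `loan_decision` function stands in for whatever black-box system is being audited, and none of it is Themis’s actual API.

```python
# Brun's counterfactual test, sketched. `loan_decision` is a hypothetical
# stand-in for the black-box system under audit; in practice a tool like
# Themis drives the real web form or model instead.

def loan_decision(application):
    # Toy model that only looks at income, never race.
    return application["income"] > 40_000

def flips_on(attribute, values, application, decide):
    """True if changing `attribute` alone changes the decision."""
    original = decide(application)
    for value in values:
        if value == application[attribute]:
            continue
        # Same application, with exactly one attribute swapped.
        counterfactual = dict(application, **{attribute: value})
        if decide(counterfactual) != original:
            return True
    return False

applicant = {"race": "white", "income": 55_000}
print(flips_on("race", ["white", "black"], applicant, loan_decision))  # False
```

Because this toy model never consults race, flipping it changes nothing; a model that did consult race would make `flips_on` return True.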
Of course, many forms are a lot more complicated than this theoretical loan application–as are the algorithms behind them. It may be that the loan software doesn’t discriminate on race at all, unless applicants are, say, shopping for homes with an income of $35,000 a year in Orange County, California, and they’re married without kids. Spotting these subtler instances of prejudice may be the hardest part, and it’s exactly where Themis excels, because it’s automated: it can run trial after trial on a piece of software, tweaking variables to find statistically significant results. Vitally, it runs these tests efficiently, using 148 times fewer permutations in its testing process than a human would need to spot the same trends.
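Automating that search might look something like the sketch below: sample many applications, flip the race field on each, and count how often the black box changes its answer. The biased `decision` function, the attribute values, and the naive random sampling are all invented for illustration; Themis’s actual search strategy is what makes it far more efficient than this.

```python
import random

# A sketch of the automated search described above (assumed behavior, not
# Themis's real implementation): sample many applications, flip the race
# field on each, and count how often the black box changes its answer.

def decision(app):
    # Hypothetical black box that is biased only in one narrow slice of inputs.
    if (app["income"] == 35_000 and app["county"] == "Orange"
            and app["married"] and not app["kids"]):
        return app["race"] == "white"
    return app["income"] >= 30_000

RACES = ["white", "black"]

def discrimination_rate(decide, trials=1000, seed=0):
    """Fraction of sampled applications whose outcome flips with race."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        app = {
            "race": rng.choice(RACES),
            "income": rng.choice([25_000, 35_000, 60_000]),
            "county": rng.choice(["Orange", "King"]),
            "married": rng.choice([True, False]),
            "kids": rng.choice([True, False]),
        }
        flipped = dict(app, race=[r for r in RACES if r != app["race"]][0])
        if decide(flipped) != decide(app):
            flips += 1
    return flips / trials

print(discrimination_rate(decision) > 0)  # the narrow bias still shows up
```

Even though the bias hides in one sliver of the input space, enough automated trials will land inside it and expose the disparity.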
To test Themis, the researchers didn’t train their sights on Amazon or any other public-facing company. Instead, they tested code hosted on GitHub. “We found some interesting things,” says Brun. “Sometimes, trying to not discriminate against a gender actually makes your system discriminate more against something like race. Because you constrain your system not to discriminate against a popular attribute, it forces your system to do things against other attributes.”
In one instance, Themis tested a loan application designed specifically to combat gender bias. Themis found that, indeed, 50% of loans went to men and 50% to women, just as the software promised. But when the researchers zoomed out to analyze the geography of those loans, they saw that women still faced bias–just of a different sort: every woman who received a loan came from the same country.
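A toy example shows how a 50/50 gender split can coexist with that kind of hidden skew. The data below is fabricated to mirror the scenario, not drawn from the study:

```python
# Fabricated data mirroring the scenario above: loans are split 50/50 by
# gender, yet every approved woman comes from a single country.
approved = (
    [{"gender": "F", "country": "A"} for _ in range(50)]
    + [{"gender": "M", "country": c} for c in "ABCDE" for _ in range(10)]
)

women = [loan for loan in approved if loan["gender"] == "F"]
men = [loan for loan in approved if loan["gender"] == "M"]

print(len(women), len(men))                      # 50 50 -- parity holds
print(len({loan["country"] for loan in women}))  # 1 -- all from one country
print(len({loan["country"] for loan in men}))    # 5
```

A fairness check that only counts approvals by gender passes here; one that also slices by country does not.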
“Bottom line, software is tricky. These are systems taking big data, and trying to learn from it,” says Brun. “But they’re just trying to solve a mathematical formula. And if the formula says the same number of men and women should get a loan, it will figure out a way to twist things to satisfy that formula.” It’s in this twisting that new biases can crop up, just as the researchers saw with the women who got loans.
The UMass research team believes that Themis is the first system of its kind–a tool that’s actually able to poke and prod at software methodically to spot bias–but it’s far from complete. Brun points out that while Themis is good with simple categories like gender or any numerical input, it doesn’t know how to judge multimedia, like photographs, for bias. So that’s where the team is focusing its work now.
But in the meantime, Themis is available for anyone with some coding talent to use. That includes citizens and journalists who want to keep companies honest, sure, but Brun also hopes that includes the internal teams behind some of the world’s biggest platforms, as they develop software the masses will use.
“I envision system developers who develop the software for, say, Amazon, using Themis to improve their own software,” says Brun. “[But] any random person can go check if Amazon’s code discriminates.”