Last year, Amazon was figuring out where it should offer free same-day delivery service to reach the greatest number of potential Prime customers. So the company did what you’d expect: It used software to analyze all sorts of undisclosed metrics about each neighborhood, ultimately selecting the “best” based on its calculations. But soon journalists discovered that, time and time again, Amazon was excluding black neighborhoods.

It wasn’t on purpose, Amazon claims. Its system never even looked at race. Instead, the software had essentially learned this prejudice through all sorts of other data. It’s a disquieting thought, especially since similar AI software is used by the Justice Department to set a defendant’s bond and assess whether that person is at risk of committing another offense. That software has learned racism, too: It was found to wrongly flag black offenders as high risk for re-offense at twice the rate of white offenders.

Amazon’s biased algorithm, like the Justice Department’s, doesn’t just illustrate how pervasive bias is in technology. It also illustrates how difficult that bias is to study. We can’t see inside these proprietary “black box” algorithms. Sometimes even their creators don’t understand the choices the algorithms make.

But what if we could test these algorithms methodically enough from afar to sniff out inherent prejudices? That’s the idea behind Themis. It’s a freely available bit of code that can mimic the process of entering thousands of loan applications or criminal records into a given website or app. By changing specific variables methodically–whether it be race, gender, or something far more abstract–it can spot patterns of prejudice in any web form.

Themis has the potential to help companies build less biased software internally–and give citizen journalists a means to test public platforms like Amazon for bias, too.

Led by University of Massachusetts professors Alexandra Meliou and Yuriy Brun, Themis is like a software data scientist that excels at running its own experiments. “It’s actually using a very fundamental part of the scientific method called ‘causal inference,’” says Brun. “It’s this notion run in statistics that if you just observe the system, you can’t say if someone’s race causes the difference of output, or behavior of software.” But if you can test and observe a system, you can.

That’s actually a lot less complicated than it sounds, says Brun, who walks me through an example of how causal inference works when applying for a loan. “If I run my loan application–I happen to be white–and find software recommends I get a loan, then I can take that same loan application and change just the race,” Brun explains. “Then I ask, what about this application? Can you give this person a loan?” If every other variable on the application is the same and the second applicant is rejected, then the conclusion is obvious: This loan software is racist.
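For readers who think in code, Brun’s loan example can be sketched in a few lines of Python. This is only an illustration of the idea, not Themis’s actual code or interface: the function names, the applicant fields, and the deliberately biased `toy_loan_decision` “black box” are all hypothetical stand-ins invented for this sketch.

```python
import random
from typing import Callable, Dict, List

Applicant = Dict[str, object]


def toy_loan_decision(applicant: Applicant) -> bool:
    """Hypothetical black box. It is deliberately biased for the demo."""
    score = applicant["income"] / 1000 + applicant["credit_score"] / 10
    if applicant.get("race") == "black":
        score -= 15  # the hidden prejudice we want the test to expose
    return score >= 100


def flips_when_changed(decide: Callable[[Applicant], bool],
                       applicant: Applicant,
                       attribute: str,
                       new_value: object) -> bool:
    """Resubmit the same application with one attribute changed.

    Returns True if the decision differs, meaning that attribute
    alone caused the change in outcome for this applicant.
    """
    altered = dict(applicant, **{attribute: new_value})
    return decide(applicant) != decide(altered)


def flip_rate(decide: Callable[[Applicant], bool],
              attribute: str,
              values: List[object],
              trials: int = 1000,
              seed: int = 0) -> float:
    """Fraction of random applicants whose outcome depends on `attribute`."""
    rng = random.Random(seed)
    flipped = 0
    for _ in range(trials):
        # Assumed input ranges; a real test would use the form's own fields.
        applicant = {
            "income": rng.randint(20_000, 120_000),
            "credit_score": rng.randint(300, 850),
            attribute: values[0],
        }
        if any(flips_when_changed(decide, applicant, attribute, v)
               for v in values[1:]):
            flipped += 1
    return flipped / trials


if __name__ == "__main__":
    brun = {"income": 50_000, "credit_score": 550, "race": "white"}
    print(flips_when_changed(toy_loan_decision, brun, "race", "black"))
    print(flip_rate(toy_loan_decision, "race", ["white", "black"]))
```

The first call is the single-application test Brun describes: same form, different race, compare the answers. The second repeats that test across many randomly generated applicants, which is roughly how a tool could estimate how often race alone changes the outcome. Against a real website, `toy_loan_decision` would be replaced by code that submits the form and reads back the result.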