Experts say that bias is one of the biggest problems facing the development of artificial intelligence. When a data set reflects systemic discrimination and bias in the real world, that bias gets encoded into the automated system built on it, which can have dire consequences when the system determines who gets a job, who gets a loan, and even who goes to prison.
Yet it can be hard to tell when a data set is biased, especially when these systems are built by homogeneous teams consisting mostly of white men. Even the existing tools meant to test algorithms can be biased. Take what’s known as a “benchmark data set,” essentially a collection of data used to assess an AI’s accuracy. Two common benchmark data sets used to test facial recognition systems, known as IJB-A and Adience, are composed of 79.6% and 86.2% light-skinned faces, respectively, which means these benchmarks don’t test an algorithm’s accuracy on all kinds of faces with the same rigor.
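To see why that matters, here is a minimal sketch in Python with pandas, using entirely made-up numbers, of how a single overall accuracy score can look healthy on a benchmark dominated by light-skinned faces while a breakdown by skin type and gender tells a very different story. The column names and error counts are illustrative assumptions, not figures from the benchmarks above.

```python
import pandas as pd

# Hypothetical benchmark results: one row per test image, carrying demographic
# labels plus whether the classifier got that image right. All counts and
# column names here are made up for illustration.
def group(skin, gender, right, wrong):
    return pd.DataFrame({
        "skin_type": skin, "gender": gender,
        "correct": [True] * right + [False] * wrong,
    })

results = pd.concat([
    group("lighter", "male",   40, 1),   # 41 images, 1 error
    group("lighter", "female", 37, 2),   # 39 images, 2 errors
    group("darker",  "male",    9, 1),   # 10 images, 1 error
    group("darker",  "female",  6, 4),   # 10 images, 4 errors
], ignore_index=True)

# The headline number looks respectable because lighter faces dominate the set.
print("Overall accuracy:", round(results["correct"].mean(), 3))

# Disaggregating by skin type and gender exposes the gap that number hides.
print(results.groupby(["skin_type", "gender"])["correct"].mean())
```

In these toy numbers, the overall score comes out to 92%, even though the smallest subgroup is classified correctly only 60% of the time.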
The resulting study shows that all of these real-world algorithms have significantly lower accuracy when evaluating dark-skinned female faces than any other type of face. It’s troubling proof that the AI already at work in our daily lives is deeply biased, and that we need to demand greater diversity in the people who build these algorithms and more transparency about how they work.
Buolamwini says the benchmark data set, composed of 1,270 images of people’s faces labeled by both gender and skin type, is the first of its kind designed to test gender classifiers while also taking skin tone into account. The people in the data set are from the national parliaments of the African countries of Rwanda, Senegal, and South Africa, and the European countries of Iceland, Finland, and Sweden. The researchers chose these countries because they have the greatest gender equity in their parliaments, and because their members of parliament have widely accessible images available for use.
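For a sense of the bookkeeping this kind of labeling enables, here is a rough sketch, again in Python with pandas, of how one might check that a labeled face benchmark is reasonably balanced before using it to audit a classifier. The file name, the column names, and the 15% threshold are hypothetical stand-ins, not details of Buolamwini’s data set.

```python
import pandas as pd

# Load a benchmark manifest with one row per labeled image. The path and the
# "gender"/"skin_type" column names are hypothetical stand-ins for whatever
# labeling scheme a real benchmark uses.
manifest = pd.read_csv("benchmark_labels.csv")

# Cross-tabulate gender against skin type to see how the images are split.
counts = pd.crosstab(manifest["gender"], manifest["skin_type"])
shares = pd.crosstab(manifest["gender"], manifest["skin_type"], normalize="all")

print(counts)
print((shares * 100).round(1))  # each cell's share of the whole benchmark, in percent

# A benchmark meant to test every subgroup with the same rigor shouldn't let
# one cell dwarf the rest; flag any subgroup that falls below a chosen floor.
floor = 0.15  # arbitrary threshold for this sketch
thin = shares.stack()
thin = thin[thin < floor]
if not thin.empty:
    print("Under-represented subgroups:")
    print(thin)
```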
Why does something like this happen? Typically because the data set the AI was trained on contained far more light-skinned male faces, and far more light-skinned faces in general, than anything else. It’s bias at work.
IBM responded to the research by conducting a similar study to replicate the results on a new version of its software. The company reports that it found far smaller differences in accuracy with the new version, which has yet to be released, and says it has several projects underway to address issues of bias in its algorithms. Microsoft says it is working to improve the accuracy of its systems. Face++ did not respond to the research.
Buolamwini has some ideas about what true algorithmic justice looks like. “Facial analysis systems that have not been publicly audited for subgroup accuracy should not be used by law enforcement,” she writes. “Citizens should be given an opportunity to decide if this kind of technology should be used in their municipalities, and, if they are adopted, ongoing reports must be provided about their use and if the use has in fact contributed to specific goals for community safety.”
Only then can we start to move toward a more equitable algorithmic future.