Academics and business leaders agree that bias is one of the most vexing problems facing artificial intelligence. What they don’t agree on is how to address it. The issue is so complex and widespread that it can feel impossible to tackle.
Perhaps the trick is to start small. At the EmTech Digital conference on AI in San Francisco this week, Microsoft Research postdoctoral researcher Timnit Gebru suggested taking cues from the field of hardware design.
“In hardware, you have this concept of a datasheet,” Gebru says, referring to the documents electronics manufacturers publish to describe the characteristics of their components. Circuit designers, as Gebru herself once was, consult them to understand exactly what they’re working with and how different components might fit together. “We have to have a concept of a datasheet for AI,” Gebru says.
To understand what problems this would solve, it helps to have a clear idea of how bias in AI works. A typical AI system is trained on historical data, looks for patterns in that data, and then makes predictions about the future. So if your dataset is racist or sexist (and much data is), the AI will generalize from that bias and exacerbate it. As Oren Etzioni, chief executive officer of the Allen Institute for Artificial Intelligence, put it at the conference: “We start with a racist dataset, and we actually make it more racist.”
So how might an AI datasheet work? “The datasheet would have guidelines for what settings a specific dataset is appropriate for,” Gebru tells Co.Design in an email. “It would also give guidelines for what certain APIs can be used for.”
For instance, an API that classifies faces by gender might be great at identifying adults but lousy at identifying children. The datasheet would specify that. Another datasheet might reveal that a dataset consists mostly of light-skinned men between the ages of 25 and 35, and so would not be appropriate for making predictions about women and people of color. The larger idea is to help organizations and researchers “make an informed decision on how to use [the data] and whether or not it is appropriate for their setting,” Gebru says.
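To make the idea concrete, here is a minimal sketch, in Python, of what a machine-readable datasheet might look like. Nothing in it comes from Gebru’s proposal: the `Datasheet` class, its fields, the `check` method, and the 10 percent threshold are all hypothetical, and the composition figures are made-up numbers chosen only to mirror the face-dataset example above.

```python
from dataclasses import dataclass, field


@dataclass
class Datasheet:
    """Hypothetical AI datasheet; every field name here is illustrative."""
    name: str
    # Settings the dataset's creators consider it suitable for (assumed field).
    appropriate_settings: set[str] = field(default_factory=set)
    # Share of examples belonging to each listed group (assumed field).
    composition: dict[str, float] = field(default_factory=dict)

    def underrepresented(self, groups: list[str], min_share: float = 0.1) -> list[str]:
        """Return the requested groups whose share falls below min_share."""
        return [g for g in groups if self.composition.get(g, 0.0) < min_share]

    def check(self, setting: str, groups: list[str]) -> list[str]:
        """List warnings about using this dataset in a setting that involves these groups."""
        warnings = []
        if setting not in self.appropriate_settings:
            warnings.append(f"'{setting}' is not a documented setting for {self.name}")
        for g in self.underrepresented(groups):
            warnings.append(f"group '{g}' is underrepresented in {self.name}")
        return warnings


# Illustrative numbers only, not real data.
faces = Datasheet(
    name="example-faces",
    appropriate_settings={"adult face classification"},
    composition={"light-skinned men 25-35": 0.8, "women": 0.05, "children": 0.0},
)

for warning in faces.check("child face classification", ["women", "children"]):
    print(warning)
```

Run against the example above, the check flags both the undocumented setting and the two underrepresented groups, which is the kind of informed decision Gebru describes: the datasheet does not remove the bias, but it tells a would-be user where the data is likely to fail.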
Regulation and standardization will, of course, be crucial to preventing bias as AI pervades industries ranging from transportation to criminal justice to home health care. But regulation moves like kabuki, slowly and theatrically, and if history is any guide, many years will pass before business leaders, academics, and politicians even begin to coalesce around a set of best practices. Datasheets are something AI researchers can create now to wring some of the bias out of the datasets that are quietly, and inexorably, shaping our lives.