As speech recognition becomes more prevalent, and is used in settings such as hiring or medical diagnosis, the issue of data- and algorithm-driven bias begins to have broader implications. And, as a new experiment conducted by University of Washington linguistics researcher Rachael Tatman shows, the problem is very real.
Tatman fed a diverse set of recordings called the International Dialects of English Archive into YouTube’s automated caption system, as well as Bing’s speech recognition API. She found that each product was as accurate on recordings of men as on recordings of women. But when it came to race, both YouTube and Bing were more accurate when captioning Caucasian speakers than speakers of any other race. “The fact that they are recognized with more errors is most likely due to bias in the training data,” she wrote.
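For readers curious how a comparison like this can be scored, here is a minimal sketch in Python. It is not Tatman's code, and the article does not say which metric she used; the sketch assumes word error rate, a common measure of transcription accuracy, and the function names `word_error_rate` and `mean_wer_by_group` are made up for illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: word error rate stands in for the article's "accuracy";
    the article does not name the metric that was actually used.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the caption
            insertion = curr[j - 1] + 1  # extra word in the caption
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """Average word error rate per group.

    `samples` is an iterable of (group_label, reference, hypothesis) tuples,
    a hypothetical layout chosen for this sketch.
    """
    rates = defaultdict(list)
    for group, reference, hypothesis in samples:
        rates[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(values) / len(values) for group, values in rates.items()}
```

Each recording would be paired with a human-checked transcript and the caption a system produced, then labeled with the speaker's group. A persistent gap between the group averages is the kind of disparity the experiment reports.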
Tatman and other researchers are part of an effort I wrote about recently to crowdsource more representative voice samples that can be used to train speech recognition systems.