Bias is an inherent part of the human experience. It’s the silent filter created by our lived experiences, a lens through which our everyday decisions pass. It shapes us. And often, we’re not even aware of it.
As a human invention, artificial intelligence (AI) has inherited bias from us, but we have an opportunity to reduce its occurrence. Just as acknowledging unconscious bias is critical to eliminating discriminatory behavior in social settings, it is equally critical to building personalized, human-relevant machine-learning models that function effectively.
“There’s no shortage of headlines highlighting tales of failed machine learning systems that amplify, rather than rectify, sexist hiring practices, racist criminal justice procedures, predatory advertising, and the spread of false information,” writes computer scientist Joy Buolamwini, whose research was a part of the recent documentary Coded Bias. “Though these research findings can be discouraging, at least we’re paying attention now. This gives us the opportunity to highlight issues early and prevent pervasive damage down the line.”
Zero bias is the ultimate goal. But we’re just getting started on the hard work. As my colleague, Siobhan Hanna, put it, “Reaching absolute zero bias in data may be impossible. However, collecting good data today enables the technology of tomorrow to make more accurate decisions.”
As artificial intelligence is increasingly woven into the fabric of daily business and customer-experience practices, we need to understand bias in AI in order to proactively mitigate it.
WHAT IS DATA BIAS?
Annotated data sets are the foundational building blocks of our AI-powered world. Computers require data sets labeled by human data annotators in order to understand what they’re processing for future decision-making. There’s a common saying among computer scientists: “garbage in, garbage out”—meaning faulty or compromised input data will produce faulty output.
Understanding data bias is the first step toward preventing flawed judgments from being amplified into unintended consequences. Good training data—the human-labeled data sets underpinning AI—is vital. But unconscious bias is pervasive. Here are a few ways it manifests:
- Sample bias: Sample bias—or selection bias—occurs when a data set doesn’t reflect the diversity of the environment in which the machine-learning model will be deployed. An example is a facial-recognition data set drawn predominantly from white men. An algorithm trained on this data set will struggle to recognize women and people of different ethnicities.
- Exclusion bias: Exclusion bias often happens in the pre-processing stage when valuable data is deleted because it’s thought to be irrelevant.
- Measurement bias: This bias stems from inconsistency and distortion. For example, the training data for facial recognition may vary from camera to camera. The difference in measuring techniques could skew the results. A measurement bias can also occur when data is inconsistently labeled.
- Recall bias: A subset of measurement bias, recall bias occurs when there is a misunderstanding of labels. Consider a series of objects labeled as damaged, partially damaged, or undamaged. There could be a difference in perspective of what counts as damaged versus partially damaged.
- Observer bias or confirmation bias: This bias is the byproduct of seeing what we expect or want to see in data. In a human-centric process like data annotating, the bias arises when the labelers’ subjective thoughts dictate how they annotate.
- Racial bias: A large part of Buolamwini’s mission has been to tackle racial bias, which occurs when data skews in favor of particular demographics. Speech- and facial-recognition systems have been criticized for their inability to recognize people of color as accurately as they do white people.
- Association bias: A key driver of gender bias, association bias happens when a machine-learning model amplifies a cultural bias embedded in its training data. Consider data that shows only men working in construction and women working as nurses. A job-finding algorithm trained on this data could end up never surfacing construction jobs for women or nursing jobs for men.
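Several of these biases can be surfaced with a simple audit of the training data before a model ever sees it. As a minimal sketch—assuming a hypothetical record schema with a demographic field per annotated example—the check below flags any group whose share of a data set falls under a chosen floor, the pattern behind the sample-bias example above:

```python
from collections import Counter

def check_sample_bias(records, attribute, min_share=0.1):
    """Flag attribute values that are underrepresented in a data set.

    records: list of dicts, one per annotated example (hypothetical schema).
    attribute: demographic field to audit, e.g. "gender" or "skin_tone".
    min_share: minimum acceptable fraction for any single group.
    Returns a dict mapping each underrepresented value to its share.
    """
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {
        value: count / total
        for value, count in counts.items()
        if count / total < min_share
    }

# Toy facial-recognition training set skewed toward one group
# (illustrative data only, not from any real data set).
faces = (
    [{"gender": "male"}] * 90 +
    [{"gender": "female"}] * 10
)

flagged = check_sample_bias(faces, "gender", min_share=0.25)
# "female" makes up only 10% of the sample, below the 25% floor.
```

In practice, a data-collection team would run this kind of distribution check across every sensitive attribute and rebalance or re-source data before training; the threshold itself is a policy decision, not a technical one.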
Bias is constantly evolving and shifting in ways that are hard to keep pace with. So where do we go from here?
MANAGING BIAS IN AI
It’s well-documented that diverse teams perform better, and having a team that comes from a wide range of backgrounds is a vital part of reducing bias in machine-learning algorithms. In our industry, that can only be achieved by engaging annotators around the world with different experiences, racial and ethnic backgrounds, and other characteristics that mirror the real world. This is where TELUS International’s global crowdsourced team of more than 1 million data annotators, supporting more than 500 languages and dialects, can help bring bias closer to zero.
Further, as data becomes increasingly complex and global, good customer experience hinges on brands having a cultural understanding of the markets they serve. Customers want to feel known, which is achievable with a diverse group of annotators who reflect and understand regional and cultural nuances.
Consider text annotation for an editing tool. Using a diverse team of annotators helps promote inclusive and natural language that reflects the user. For a brand looking to collect and annotate data internally, that’s a massive task. It may seem like it will save time or money to use in-house resources, but that neglects the long-term effects of biased data. It’s short-sighted.
Instead, crowdsourcing—a model in which individuals or organizations obtain goods or services from a large, relatively open, and often rapidly evolving group of participants—offers a way forward for mitigating bias that could otherwise have far-reaching ethical implications for society as a whole. It opens brands up to sourcing data in hundreds of languages, dialects, and geographic markets—data that is representative and inclusive.
While achieving zero bias in machine learning and AI may be an aspirational goal, reducing it is within our reach by fostering a more diverse AI community, staying up to date on this fast-moving field of research, and regularly engaging in conversations about potential human biases. Humans have an inherent ability to innovate, adjust, improve, and learn from our mistakes, and in doing so, the algorithms we develop and train will also get better by learning from a broader, more inclusive range of opinions and ideas.
Ed Jay leads TELUS International’s global AI Data Solutions business and is responsible for enabling artificial intelligence innovation for some of the world’s largest technology companies in social media, search, retail and mobile. To learn more about AI Data Solutions and TELUS International’s offering, please click here.