To understand how far we’ve come with AI’s development, think about when cars were first invented, posits Aleksandra (Saška) Mojsilović, an IBM Fellow and codirector of IBM Science for Social Good. “It was the Wild West,” contends Mojsilović, “people driving left and right, no stop signs, or seat belts.” Mojsilović, who also leads Trusted AI for IBM Research, points out that safety measures for cars were built in much later—after accidents happened.
And if there’s anything that keeps Mojsilović and her colleague Francesca Rossi, IBM Research’s global head of AI Ethics, awake at night, it’s that AI guardrails are just being established.
In some cases, “safe” AI is likely not even on most people’s radar. Consider that right now, AI is working its way into all sorts of innocuous daily applications, like digital assistants that can tell you the weather or keep a running grocery list, or a chatbot that takes the place of a customer service rep to deal with commonly asked questions.
However, when it comes to applications meant to boost human decision-making, such as hiring candidates based on merit, training workers to be more ethical, or keeping your older loved ones company by telling them jokes or reminding them to take their medicine, it all starts to get a bit more fraught. And what about the algorithms that help with diagnosing diseases or deciding who gets a loan? “It makes us all very uncomfortable,” Mojsilović says, “when AI starts making decisions on hiring, education, if we are given parole.”
In a perfect world, we wouldn’t have to think about the dark side. Machines would make faster, smarter, less biased decisions than humans, all for the greater good with unprecedented efficiency. As the robot Sophia told CNBC’s Andrew Ross Sorkin, “You’ve been reading too much Elon Musk and watching too many Hollywood movies. If you are nice to me, I’ll be nice to you. Treat me as a smart input-output system.”
But because development starts with humans and the applications are used by humans, the data fed into the machine can be manipulated. Researchers from Harvard and MIT caution that in health-care applications, AI can be fooled into producing a false diagnosis if a doctor or hospital alters the data for their own gain. As such, they write, AI developers and regulators need to build in safety measures.
AI can have bias, too
Or the data going in could be flawed. Most recently, the ACLU conducted a test of Amazon’s facial recognition tool, and the software incorrectly matched 28 members of Congress with people who have been arrested for a crime. The ACLU reported that the false matches had a disproportionate number of people of color, including six members of the Congressional Black Caucus. And in other applications, it’s proven to be unfair to immigrants, women, and transgender individuals.
This type of bias is a thorny problem that is keeping Tracey Robinson on the alert as she and her team continue to develop Amelia, IPsoft’s digital AI “colleague.” Robinson, the director of Cognitive Implementation for Amelia, maintains that the acknowledgment of bias on both the human and tech sides is one of the crucial components of IPsoft’s Amelia teams. As such, Robinson says they work with conversational linguists and designers who oversee the development, testing, and training of Amelia to identify and eliminate perceived biases before ever exposing her to customers.
“Human nature makes the complete elimination of biases impossible, which is why it is an organizational imperative to employ as diverse an AI training group as possible, be it culturally, geographically, gender, experience, and skill set,” says Robinson. If an Australian company was going to use Amelia, she says, “we would make sure she speaks and interacts in a manner specific to Australia, including colloquialisms.”
Robinson believes the most significant evolution in AI over the past few years has been this important recognition of a diverse workforce as an integral part of maintaining ethical training, so the result can be a highly objective digital colleague.
AI and the tricky nature of human language
Over at Jigsaw, which, along with Google, is part of the umbrella entity called Alphabet, software engineer Lucy Vasserman is dealing with equally tricky language bias. She and her team are working on Perspective, a set of machine learning models that predict toxicity in language. It scores text from websites or online forums for how likely it is that a reader will perceive that text as toxic. “Toxic is our flagship model, but we do some other subtypes, like insults, identity attack,” Vasserman says.
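For readers who want to see what asking Perspective for a score looks like, here is a minimal sketch in Python. It assumes the request and response shapes described in the API's public documentation; the helper names `build_request` and `extract_scores` are hypothetical, and the numbers in the sample response are made up for illustration. No network call is made: a real request would POST the body to the URL below with an API key.

```python
import json

# Public endpoint of the Perspective API's comment analyzer.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attributes=("TOXICITY", "INSULT", "IDENTITY_ATTACK")):
    """Build the JSON body asking for one score per requested attribute."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response):
    """Pull the 0-to-1 summary score for each attribute out of a response."""
    return {
        name: data["summaryScore"]["value"]
        for name, data in response["attributeScores"].items()
    }


# Illustrative response with made-up numbers, shaped like the API's reply.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.88, "type": "PROBABILITY"}},
    }
}

body = build_request("You are an idiot.")
scores = extract_scores(sample_response)

print(json.dumps(body, indent=2))
print(scores)
```

Each score is the model's estimate of how likely a reader is to perceive the text as having that attribute, not a verdict on the author's intent.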
Using massive data sets, like the New York Times comments section, Vasserman says human raters (between 3 and 20 per message) are asked how certain messages made them feel. That’s because, says Vasserman, “People are not very good at keeping track of a specific definition [like toxic], but people are really good at knowing how they feel.” Those ratings go into a machine learning model, a neural network, which learns from words, sentences, and paragraphs how to predict what’s toxic, or what people believe is toxic.
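The article does not say how the raters' answers are combined before training. One plausible approach, shown here purely as an assumption, is to use the fraction of raters who found a message toxic as the value the model learns to predict. The messages, labels, and function name are invented for illustration.

```python
def toxicity_target(ratings):
    """Return the fraction of raters who marked a message as toxic."""
    if not ratings:
        raise ValueError("a message needs at least one rating")
    toxic_votes = sum(1 for rating in ratings if rating == "toxic")
    return toxic_votes / len(ratings)


# Hypothetical rater responses, a handful per message.
ratings_by_message = {
    "Thanks for the thoughtful article.": ["ok", "ok", "ok", "ok", "ok"],
    "You are an idiot.": ["toxic", "toxic", "toxic", "ok"],
}

# One training target per message, between 0 (nobody) and 1 (everybody).
training_targets = {
    message: toxicity_target(ratings)
    for message, ratings in ratings_by_message.items()
}

for message, target in training_targets.items():
    print(f"{target:.2f}  {message}")
```

A fractional target keeps disagreement among raters visible to the model instead of forcing every message into a yes-or-no label.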
The toughest part of this is getting nuance, says Vasserman. She explains that the word “gay” for example, isn’t always meant to be an insult. So the machine needed to be trained on how not to misinterpret the data. “We grabbed data from New York Times articles themselves and Wikipedia articles where those authors were using identity terms in healthy, positive ways,” she explains. If you give the model the right data, she says, it’s able to distinguish. “It can understand the difference between ‘I’m a proud gay man,’ and when somebody says, ‘You’re gay,’ meaning to be offensive,” she says.
The API for Perspective is open, and developers are encouraged to run with open-sourced experiments. “I think it’s really important that we are demonstrating to the industry how this can be done and what are the best practices for building fair machine learning products,” she says. “Making it work in our product is not enough.” But the work hasn’t ended. Even though Perspective’s parsed some double-digit billions of messages, Vasserman is still on high alert to combat language bias.
“We need to expand beyond just the domain of the typical comment you see online,” she says. In order to make sure that the model represents diverse conversations, you need to actually feed it diverse conversations and diverse data, explains Vasserman.
Now she’s focused on getting data from conversations where people are using reclaimed words like “dyke” to identify themselves in a positive way. “Words like that are often used offensively, but communities do reclaim them and are offended when a machine learning model just says, ‘Well, this word is here, so it has to be offensive.’ ”
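The failure mode Vasserman describes, "this word is here, so it has to be offensive," is easy to reproduce. The sketch below is not Perspective's method; it is a deliberately naive keyword rule, with a hypothetical blocklist and hand-labeled examples borrowed from the article, used to show how such a rule mislabels benign uses of identity terms.

```python
import re

# Hypothetical one-word blocklist, using the term from the article's example.
FLAGGED_TERMS = {"gay"}


def keyword_flag(text):
    """Flag text as offensive if it contains any blocklisted word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & FLAGGED_TERMS)


def false_positive_rate(classifier, labeled_examples):
    """Share of non-offensive examples the classifier wrongly flags."""
    benign = [text for text, offensive in labeled_examples if not offensive]
    wrongly_flagged = [text for text in benign if classifier(text)]
    return len(wrongly_flagged) / len(benign)


# (text, is_offensive) pairs, labeled by hand for illustration.
examples = [
    ("I'm a proud gay man", False),
    ("You're gay", True),  # meant as an insult in the article's example
    ("Great article, thanks", False),
]

for text, offensive in examples:
    print(f"flagged={keyword_flag(text)!s:5}  offensive={offensive!s:5}  {text}")

print("false positive rate:", false_positive_rate(keyword_flag, examples))
```

The rule cannot tell the first two sentences apart, which is why the team feeds the model examples of identity terms used in healthy, positive ways.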
The pieces of language that are really hard for human beings are also going to be hard for machines to learn, contends Vasserman. “I see the biggest benefit of what we’re doing is to get the machine to do the part that’s easy for humans, so that the human moderators [are free] to focus on the conversation and understand the context of what’s going on,” she says.
Mojsilović quips that while hardware like blenders and other household appliances come with instruction manuals, AI overall does not. And this is needed, especially at this early stage, she says. Rossi maintains that while ethics and trust are very vague, “we try to brainstorm ways to translate them into principles and properties.”
Ethics guides for AI
The result has been a user’s manual of sorts. As part of an expert group with the European Commission, Rossi helped draft the “Ethics Guidelines for Trustworthy AI,” which, among other things, stipulates that AI should be human centric and “developed, deployed and used with an ‘ethical purpose,’ grounded in, and reflective of, fundamental rights, societal values and the ethical principles of Beneficence (do good), Non-Maleficence (do no harm), Autonomy of humans, Justice, and Explicability.” This is all a fancy way of saying that any use of AI for evil would be strictly verboten.
Yet, Mojsilović points out that it’s a fine balance. The Internet, she observes, can be used to share quality information or to disseminate child porn. The overall intent is good, but there are bad actors out there. IBM is trying to combat this by getting in early to install those safety features before AI gets beyond human control with measures such as its Everyday Ethics Guide for Developers and the AI Fairness 360 toolkit.
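To give a sense of what a fairness check measures, here is a small Python sketch of two widely used group fairness metrics, disparate impact and statistical parity difference. The article does not say which checks IBM's toolkit applies, and this sketch does not use the toolkit's API; the function names and the loan decisions are invented for illustration.

```python
def selection_rate(decisions):
    """Share of favorable outcomes (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)


def disparate_impact(unprivileged, privileged):
    """Ratio of selection rates; 1.0 means both groups fare equally."""
    privileged_rate = selection_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return selection_rate(unprivileged) / privileged_rate


def statistical_parity_difference(unprivileged, privileged):
    """Difference of selection rates; 0.0 means both groups fare equally."""
    return selection_rate(unprivileged) - selection_rate(privileged)


# Hypothetical loan decisions for two groups of ten applicants each.
group_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 8 of 10 approved
group_b = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # 4 of 10 approved

ratio = disparate_impact(group_b, group_a)
gap = statistical_parity_difference(group_b, group_a)

print(f"disparate impact: {ratio:.2f}")
print(f"statistical parity difference: {gap:.2f}")
```

In this made-up data, group B is approved half as often as group A, the kind of gap a developer would want to catch before a system reaches production.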
“If someone is going to trust a decision made by a machine,” Mojsilović says, “it needs to meet human benchmarks.” That’s why, she says, developers need to make sure each and every decision made by an algorithm is fair and that people can relate to it. It’s a business imperative, too, Rossi adds. “For us, the purpose is not to create an alternative form of intelligence but to augment [human intelligence],” Rossi says. Very soon, responsible use will be a big differentiator between companies that are developing AI technology.
Ultimately, they all agree it’s a brave new world. “The biggest thing that keeps me up at night is trying to comprehend the endless possibilities and impacts this technology will have on health care, education, and customer experience, as well as employee experience,” says IPsoft’s Robinson. “There is a real societal impact when AI goes beyond promise and into production.”
Each of them believes we are at an inflection point between a human and a digital workforce. “The other half of the equation requires us, as humans, to change in order to realize the full potential of this shift,” says Robinson, “and hopefully reach, if not exceed, our collective ambitions as companies, employees, people, and society at large.”