November 15, 2016

“Algorithm” might be one of the most popular terms that almost no one understands. How could they? Not many people have PhDs in data science, and even those experts don’t always know what’s happening. “It’s not clear even from a technical perspective that every aspect of AI algorithms can be understood by humans,” says Guruduth Banavar, IBM’s chief science officer for cognitive computing, which is what IBM calls AI.

That’s a scary situation. Artificial intelligence is making decisions by reviewing people’s medical tests in hospitals, credit histories in banking, job applications in some HR systems, even criminal risk factors in the justice system. Yet it’s not always clear how the computers are thinking.

“There has been quite a bit of discussion about how these algorithms come to various conclusions and whether a person who is affected by the conclusions has a right and maybe the facility to find out how the algorithm came to those conclusions,” Banavar says. On September 20, Banavar released a paper, “Learning to Trust Artificial Intelligence Systems,” that lays out principles for algorithmic responsibility: ensuring that AI makes understandable decisions based on good data. Also in September, IBM joined Amazon, Facebook, Google DeepMind, and Microsoft to form the Partnership on AI. The organization will fund research and collaboration on ways to make AI more socially and technologically responsible.

AI’s rapid growth makes it hard to spot problems. “People are trying many ideas, and some of them seem to be working pretty well,” says Banavar. “But we can’t exactly explain how the internal system has achieved everything that we are seeing at the outcome.”

He provides a simple example in the mainstay AI method of deep learning. It uses neural networks, reasoning systems modeled on how the brain learns, to ingest and understand huge amounts of information. Take, for instance, a medical imaging system that has scanned a million X-rays in order to recognize and classify signs of blocked arteries. When a new X-ray is added, even the humans who built the neural network can’t necessarily predict how the system will classify it. “The internal workings of the neural networks are so complicated that if you just [examine] the internal state of the algorithm at any point, it would be meaningless to any person,” Banavar says.

There’s another challenge: Machine learning makes sense of the world based on the information fed into it. That adheres to one of the fundamental rules of computing: garbage in, garbage out. X-rays that are of poor quality or labeled incorrectly, for example, won’t teach a medical AI system how to accurately spot cardiovascular disease.
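To make “garbage in, garbage out” concrete, here is a minimal sketch in Python. It is a toy illustration, not a medical system: the readings, the labels, and the helper names `fit_threshold` and `accuracy` are all invented for this example. A one-number “classifier” learns a cutoff from labeled examples, once with correct labels and once with some positive cases wrongly labeled negative.

```python
from statistics import mean

def fit_threshold(values, labels):
    """Learn a cutoff halfway between the mean of each labeled class."""
    mean_neg = mean(v for v, y in zip(values, labels) if y == 0)
    mean_pos = mean(v for v, y in zip(values, labels) if y == 1)
    return (mean_neg + mean_pos) / 2

def accuracy(threshold, values, true_labels):
    """Share of cases where 'value above cutoff' matches the true label."""
    predictions = [1 if v > threshold else 0 for v in values]
    correct = sum(p == y for p, y in zip(predictions, true_labels))
    return correct / len(values)

# Hypothetical data: readings 0-9 are truly negative, 10-19 truly positive.
values = list(range(20))
true_labels = [0] * 10 + [1] * 10

# Hypothetical labeling error: readings 10-15 were recorded as negative.
bad_labels = [0] * 16 + [1] * 4

clean_threshold = fit_threshold(values, true_labels)
noisy_threshold = fit_threshold(values, bad_labels)

# Both models are scored against the true labels.
clean_accuracy = accuracy(clean_threshold, values, true_labels)
noisy_accuracy = accuracy(noisy_threshold, values, true_labels)

print(clean_threshold, clean_accuracy)
print(noisy_threshold, noisy_accuracy)
```

The learning procedure is identical in both runs. Only the labels differ, and the model trained on the mislabeled data draws its line in the wrong place and misses true positives it would otherwise have caught.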

The same weakness applies to systems that make judgment calls about people. One that evaluates the likelihood of recidivism for criminal offenders could make racist sentencing calls if it’s fed information based on racial stereotypes of how people behave. That isn’t hypothetical: In May, ProPublica published an investigation into the algorithmically derived risk scores used to inform criminal sentencing in Broward County, Florida. The scores were not very accurate: They were right only about 60% of the time in predicting who would reoffend. And the errors were not evenly distributed: Black offenders were mislabeled as likely to commit future crimes almost twice as often as white offenders were.
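Overall accuracy and the rate of wrongful “high risk” labels are different measurements, and a short Python sketch shows how they can diverge. The counts below are invented for illustration and are not ProPublica’s data; the groups “A” and “B” and the function names are likewise assumptions. The counts are chosen so that both groups get the same accuracy while one group’s non-reoffenders are flagged twice as often.

```python
from collections import defaultdict

def false_positive_rates(records):
    """Per group: share of people who did NOT reoffend but were flagged high risk."""
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            did_not_reoffend[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {g: flagged[g] / did_not_reoffend[g] for g in did_not_reoffend}

def accuracy_by_group(records):
    """Per group: share of predictions that matched what actually happened."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        total[group] += 1
        if predicted_high_risk == reoffended:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Hypothetical records: (group, predicted_high_risk, actually_reoffended).
records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 6 + [("A", False, True)] * 4
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 4 + [("B", False, True)] * 6
)

fpr = false_positive_rates(records)
acc = accuracy_by_group(records)
print(fpr)
print(acc)
```

In this made-up example the score is right 60% of the time for both groups, yet people in group A who never reoffended are labeled high risk at twice the rate of those in group B. A single accuracy figure would hide that gap entirely.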