How do we make sure we’re designing artificial intelligence that takes human behavior into account? How do you integrate machine learning into a product or service while ensuring that it doesn’t perpetuate bias, oversimplify nuance, or bombard everyone with fake news? To put it bluntly, how do we design algorithms that aren’t evil?
Google’s AI research division, Google Brain, says it’s on a mission to find out. On Monday, the company announced a new research program called the People + AI Research initiative (PAIR for short) that’s all about understanding how humans interact with machine learning. As part of that effort, the company has developed a set of best practices that its teams use to design experiences that include machine learning.
It’s part of a philosophy the Google UX community is calling “human-centered machine learning,” where machine learning algorithms solve problems while keeping human needs and behaviors in mind. Detailed on Medium by Josh Lovejoy and Jess Holbrook, two designers in the Research and Machine Intelligence group at Google, these are Google’s rules for designing with machine learning while still keeping the user–and their humanity–at the center. Here are a few of the basics.
Machine Learning Doesn’t Solve Everything
Lovejoy and Holbrook write that before rushing to include machine learning in your product or service, remember that it’s still your job as the designer to identify the problem and how best to solve it. Do the research that would be part of your conventional design process. Some problems may not need machine learning at all, while others might be perfectly suited to it. The point is, the algorithm doesn’t know if it’s the right tool to solve a problem. Don’t throw machine learning at everything–especially because it can be costlier to build than a simple fix.
For instance, Lovejoy and Holbrook point to the Gmail feature that reminds users to attach a file if they’ve mentioned the word “attachment” or “attached” in the body of the email. There’s no machine learning involved there. While AI might find more missing attachments, it would be much more complicated and time-intensive to build.
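To make the point concrete, a keyword check of this kind could look roughly like the sketch below. It is an illustration only, not Gmail's actual code: the function name, the `attachment_count` parameter, and the exact pattern are assumptions, and only the two trigger words come from the description above.

```python
import re

# Only "attachment" and "attached" are named in the article; the exact
# pattern (whole words, case-insensitive) is an assumption for illustration.
ATTACHMENT_WORDS = re.compile(r"\b(attachment|attached)\b", re.IGNORECASE)


def should_warn_about_attachment(body: str, attachment_count: int) -> bool:
    """Return True when the body mentions an attachment but no file is attached."""
    mentions_attachment = ATTACHMENT_WORDS.search(body) is not None
    return attachment_count == 0 and mentions_attachment


if __name__ == "__main__":
    print(should_warn_about_attachment("The report is attached.", 0))
    print(should_warn_about_attachment("The report is attached.", 1))
    print(should_warn_about_attachment("See you tomorrow.", 0))
```

A rule like this misses phrasings such as “here’s the file,” which is the gap a learned model might close, at a much higher cost to build.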
To make sure machine learning is the right tool for the job, the duo recommend asking questions like these to identify what users expect from an AI-powered product:
Describe the way a theoretical human “expert” might perform the task today.
If your human expert were to perform this task, how would you respond to them so they improved for the next time? Do this for all four phases of the confusion matrix.
If a human were to perform this task, what assumptions would the user want them to make?
Out of a group of ideas for how to solve a problem, plot out which solutions would have the largest impact for users, and which would benefit the most from using machine learning. Ideas that both depend heavily on machine learning and would create the greatest impact for the users are the best ones to tackle.
Prototype With Real People Instead Of A Real Algorithm
One prototyping option the duo suggest is to use participants’ real data (with their permission) in your machine learning system. Another is to not use a machine learning system at all, which is called “Wizard of Oz” testing. In essence, participants believe they’re interacting with an AI system, but it’s actually controlled by a human. It was popular as a testing method 20 years ago, write Lovejoy and Holbrook, but the advent of machine learning has brought it back into the mainstream. “These interactions are essential to guiding the design because when participants can earnestly engage with what they perceive to be an AI, they will naturally tend to form a mental model of the system and adjust their behavior according to those models,” they write.
Understanding how users’ mental models are formed is key to being able to design interactions. You can also learn about this by using participants’ data to simulate a wrong answer. How does the user respond when the machine fails? How does that change their future interactions?
Design With The System’s Failure In Mind
A machine mis-categorizing input data might seem like a small mistake. But when that input data is a real-life human, being mis-categorized by an algorithm can have major consequences. For instance, if a machine learning algorithm is deciding whether a user is a bot or a real person, it matters a lot more to a person who’s wrongly blocked than to a bot that’s wrongly blocked. So you’d better make sure you’re thinking about the false positive.
To do this, Lovejoy and Holbrook recommend using what’s called a “confusion matrix,” which lays out on a grid when the algorithm correctly returns a positive, when it correctly returns a negative, when it returns a false positive, and when it returns a false negative. Ultimately that means deciding what’s more important–the precision (where there are fewer wrong answers, but fewer right ones), or the recall (where all the right answers are included, but there might be more wrong ones as a result). In some cases it’s more important to prioritize precision over recall, and in others the reverse–but making that call means understanding what is more important to your user.
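As a rough sketch of how the four cells of a confusion matrix turn into precision and recall, consider the bot detector from the example above, where “positive” means “flagged as a bot.” The data and function names below are made up for illustration and are not taken from Google’s guidelines.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally the four confusion-matrix cells for boolean labels."""
    counts = Counter()
    for is_positive, flagged in zip(actual, predicted):
        if flagged and is_positive:
            counts["tp"] += 1  # bot correctly flagged
        elif flagged and not is_positive:
            counts["fp"] += 1  # human wrongly flagged (false positive)
        elif not flagged and is_positive:
            counts["fn"] += 1  # bot missed (false negative)
        else:
            counts["tn"] += 1  # human correctly left alone
    return counts


def precision(counts):
    """Share of flagged items that really were positive."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


def recall(counts):
    """Share of real positives that were flagged."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


if __name__ == "__main__":
    # Hypothetical data: True means "is a bot" / "flagged as a bot".
    actual = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, True, True, False, False, False]
    cells = confusion_counts(actual, predicted)
    print(dict(cells))
    print(f"precision={precision(cells):.2f} recall={recall(cells):.2f}")
```

In this toy data, two of the four flagged accounts belong to real people, so half of the people the system blocks are blocked wrongly. That is the cell a designer of this feature would watch most closely.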
Take Google Photos, for instance. Google designers decided that it’s important that if you type in “playground,” every single playground shows up–even if there are some photos that don’t fit in. But for an algorithm that distinguishes online bots from humans, perhaps it’s more important to be precise, so you don’t risk antagonizing users by locking them out of their accounts on the grounds that they’re not people.
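One common way this trade-off shows up in practice is as a confidence threshold. The sketch below assumes the model outputs a score between 0 and 1 for each item, which the article does not say about either product; the scores, labels, and thresholds are invented purely to show the effect.

```python
def precision_recall_at(threshold, scores, labels):
    """Compute (precision, recall) when items scoring >= threshold are flagged."""
    flagged = [score >= threshold for score in scores]
    tp = sum(f and label for f, label in zip(flagged, labels))
    fp = sum(f and not label for f, label in zip(flagged, labels))
    fn = sum(not f and label for f, label in zip(flagged, labels))
    # Precision is undefined when nothing is flagged; it is reported as 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical model scores and ground-truth labels (True = real match).
    scores = [0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.1]
    labels = [True, True, False, True, False, True, False, False]

    for threshold in (0.85, 0.35):
        p, r = precision_recall_at(threshold, scores, labels)
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

With this data the strict threshold flags only sure matches (every flagged item is right, but half the real matches are missed), while the loose one catches every real match at the cost of some wrong ones. A photo search would lean toward the loose setting and a bot blocker toward the strict one.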
Get Feedback, Forever
How are you going to know how well the system is doing if you don’t know when it makes mistakes? Lovejoy and Holbrook write that sometimes machine learning models can be unpredictable, especially if the user’s idea of how a particular feature is supposed to work differs from the designer’s. Plan for long-term feedback from the start: build quantitative feedback mechanisms directly into your platform, and sit down with people as they’re using it to observe how their expectations of the platform change. In a world where designers don’t always understand how their AI-powered services and products work, Google’s solution is simple: get that data.
Of course, while Google has presented a set of rules on how to design for AI, the launch of the PAIR initiative is proof that even one of the pioneers of machine learning doesn’t yet understand how to responsibly design AI where humans are at the center. While it’s a promising step for Google, such initiatives are also a reminder of the challenges inherent in this technology: biased data, fallible assumptions, disregarded privacy, and all their consequences.