If you read about AI at all, most headlines would have you believe that AI is poised to remake our lives in just a few years. But consider something else: The internet was invented in 1983. It’s taken designers and engineers three more decades to figure out how to weave connectivity into our daily rhythms. And we are still working, still figuring out the patterns and expectations that make our gadgets so user-friendly that they’re “obvious.”
The point is, there are too many people asking what AI will transform, and not enough asking how. There may be no greater example than in medicine. AI has remarkable promise for the industry. Done right, even basic machine learning could transform how doctors work, making them smarter, more efficient, and less error-prone. Yet doctors themselves, while eager to try out the newest procedure or medicine, typically remain dead set against a machine telling them what to do. “Almost every decision-support system that has moved from lab to the clinic has failed,” points out John Zimmerman, a professor of human-computer interaction at Carnegie Mellon.
Over the last few years, Zimmerman’s research team—working with James Antaki, a professor of biomedical engineering at CMU—has been trying to build a digital tool with a deceptively simple purpose: to help cardiac surgeons decide whether to implant a mechanical heart, by predicting how well the patient might fare once the surgery is done, based on an AI trained on a database of 5,000 patients who had such surgery. In so doing, Zimmerman’s work presents a case study for making AI into a tool that people actually use.
The Dizzying Hierarchies Of Hospital Life
Digital tools for doctors have typically failed because human-centered design is almost totally foreign to how those tools are developed. Thus, Zimmerman’s team began by watching how doctors actually make decisions. They discovered an almost comically difficult picture for any would-be designer.
The physical and social context was daunting. For starters, the cardiologists they observed have to wash their hands so much that they simply don’t use many computers themselves; technology instead gets handed off to nurses and junior residents. In that deeply hierarchical environment, the ones making decisions—the senior cardiologists—are in fact the ones seeing patients the least, and the least likely to be exposed to any sort of computer tool.
Even if junior residents and nurses had that tool themselves, the hierarchy of hospital culture means they could hardly challenge their attending, lest they be fired. One other social puzzle: Zimmerman’s team found that decision making didn’t happen at any single magical moment—rather, it was the outcome of dozens of interactions that played out over time, as doctors consulted each other and took in tidbits of new information. But if those doctors didn’t make just one decision at some pivotal moment—and they hardly ever used technology—then how could you make a computer tool to help them?
The psychological barriers were even more complex, starting with the doctors themselves. The lead cardiologists at the world-class hospitals that Zimmerman observed saw themselves by definition as exceptional—they’d spent their entire careers delivering better outcomes. Years spent knowing that they were the best had ingrained in them a sense that performance data didn’t actually apply to them. “They’re trained to look at themselves as exceptional. They see that 50% of people die during some procedure and they say, ‘Not my people,'” says Zimmerman. “That’s a requirement of these kinds of decisions but it presents a problem for us.”
The Trojan Horse
So on one hand, you have a social context that makes a computer tool seem superfluous. On the other, you have doctors who themselves aren’t apt to accept the statistical recommendations of a machine. “Is there an opportunity for a machine to change the social dynamic? That’s a messy human problem,” says Zimmerman. “An algorithm can’t fix that unless it’s addressing the complexity of human decision making.” And yet Zimmerman’s team found a solution, in a crucial bit of daylight in how the doctors worked: the weekly consultations they held with their peers to go over tough cases.
They noticed that at that time, it was up to the nurses and support staff to put all the relevant case data on a slide that could be projected up on the wall as the doctors debated each case. “No one gets paid to make those slides, so the non-medical staff do them,” points out Qian Yang, Zimmerman’s graduate assistant and the day-to-day workhorse on the project. Zimmerman’s team had the ingenious idea of inventing a slide-building tool for those weekly consultations that would combine all the relevant data on a patient—with one extra set of data on patient outcomes provided by the AI.
That way, doctors would see that data as a bonus that they could talk about among themselves. Junior staffers would never have to say, “The AI disagrees with you.” Rather, the disagreement was simply a data point on a slide. “It allows a junior staff member to ask, ‘What do you think this data is saying?'” says Yang. “That’s not the same as saying, ‘I don’t think we should do this.'” More specifically, the slide of patient information would include a data point showing how long the patient was expected to live after a mechanical-heart implant—either two months, six months, or two years, in the case of the prototype Zimmerman built. That information could then spark a discussion about whether the surgery was warranted at all.
The broader strategy was to create a tool carefully embedded in existing rituals while at the same time exposing the AI’s recommendations at precisely the point of decision making—the weekly peer consultation. The one-click slide-building tool is akin to a Trojan horse. Because its utility is so irresistible to doctors and staff alike, the recommendations of an AI can quietly come along for the ride.
Lessons For Would-Be AI Designers
So far, Zimmerman says that user testing has been positive for the one-click tool. Doctors have been happy to get a tool that saves the money and time of creating slides. The next step for Zimmerman’s team is to see whether presenting machine-learning recommendations at the exact point of decision making is a strategy that can be generalized to other decision-support systems.
The fourth and final year of the current project will also be devoted to tackling the knotty problem of making high-flying doctors trust a computer system. Here, Zimmerman’s team has come up with another clever psycho-social hack. The core problem of doctors thinking they’re too good to need a machine’s recommendation comes down to the doctors thinking themselves to be exceptions to whatever data is presented to them.
One solution might be to have a system with a baseline set of recommendations, trained using a broader set of data, which the doctor then fine-tunes herself. Over time, the recommendations would thus be a mix of the machine’s intelligence and the doctor’s. The point isn’t to create a smarter-than-thou system, but rather a system that holds the doctor up to a standard that the doctor has set using her own expertise. Adds Zimmerman, “We’re using this mechanism to increase trust and to fight against doctors not using the system.”
It’s that problem of fostering trust that’s likely to have applications far beyond cardiology. These design solutions may one day be relevant in the tiny details of how we interact with machines all around us.