You could be forgiven for thinking that AI will soon replace human physicians based on headlines such as “The AI Doctor Will See You Now,” “Your Future Doctor May Not Be Human,” and “This AI Just Beat Human Doctors on a Clinical Exam.” But experts say the reality is more of a collaboration than an ousting: Patients could soon find their lives partly in the hands of AI services working alongside human clinicians.
There is no shortage of optimism about AI in the medical community. But many also caution that the hype surrounding AI has yet to be borne out in real clinical settings. There are also different visions for how AI services could make the biggest impact. And it’s still unclear whether AI will improve the lives of patients or just the bottom line for Silicon Valley companies, healthcare organizations, and insurers.
“I think that all our patients should actually want AI technologies to be brought to bear on weaknesses in the healthcare system, but we need to do it in a non-Silicon Valley hype way,” says Isaac Kohane, a biomedical informatics researcher at Harvard Medical School.
If AI works as promised, it could democratize healthcare by boosting access for underserved communities and lowering costs. That would be a boon in the United States, which ranks poorly on many health measures despite an average annual healthcare cost of $10,739 per person. AI systems could free overworked doctors and reduce the risk of medical errors that may kill tens of thousands, if not hundreds of thousands, of U.S. patients each year. And in many countries with national physician shortages, such as China, where overcrowded urban hospitals’ outpatient departments may see up to 10,000 people per day, such technologies don’t need perfect accuracy to prove helpful.
But critics point out that all that promise could vanish if the rush to implement AI tramples patient privacy rights, overlooks biases and limitations, or fails to deploy services in a way that improves health outcomes for most people.
“In the same way that technologies can close disparities, they can exacerbate disparities,” says Jayanth Komarneni, founder and chair of the Human Diagnosis Project (Human Dx), a public benefit corporation focused on crowdsourcing medical expertise. “And nothing has that ability to exacerbate disparities like AI.”
Today, the most popular AI techniques are machine learning and its younger cousin, deep learning. Unlike computer programs that rigidly follow rules written by humans, both machine learning and deep learning algorithms can look at a data set, learn from it, and make new predictions. Deep learning in particular can make impressive predictions by discovering data patterns that people might miss.
But to make the most of these predictions in healthcare, AI can’t go it alone. Rather, humans still must help make decisions that can have major health and financial consequences. Because AI systems lack the general intelligence of humans, they can make baffling predictions that could prove harmful if physicians and hospitals unquestioningly follow them.
The classic example comes from Rich Caruana, a senior researcher at Microsoft Research, as he explained in Engineering and Technology magazine last year. In the 1990s, Caruana worked on a project that tried using an earlier form of machine learning to predict whether a patient with pneumonia was a low-risk or a high-risk case. But trouble arose when the machine learning model tried to predict the risk for asthma sufferers, who are high-risk because their preexisting breathing difficulties make them vulnerable to pneumonia. The model pegged these patients as low-risk, requiring minor intervention rather than hospitalization, something a human expert never would have done.
If you follow the model blindly, says Kenneth Jung, a research scientist at the Stanford Center for Biomedical Informatics Research, “then you’re hosed. Because the model is saying: ‘Oh, this kid with asthma came in and they got pneumonia, but we don’t need to worry about them, and we’re sending them home with some antibiotics.’”
Deep-learning predictions can also fail if they encounter unusual data points, such as unique medical cases, for the first time, or when they learn peculiar patterns in specific data sets that do not generalize well to new medical cases.
AI predictions do best when applied to massive data sets. China, for example, has an advantage in training AI systems thanks to access to large populations and patient data. In February, the journal Nature Medicine published a study from researchers based in San Diego and Guangzhou, China, that showed promise in diagnosing many common childhood diseases based on the electronic health records of more than 567,000 children.
But even large data sets can pose problems, particularly when researchers try to apply their algorithm to a new population. In the Nature Medicine study, all of the half million patients came from one medical center in Guangzhou, which means there is no guarantee the diagnostic lessons learned from training on that data set would apply to pediatric cases elsewhere. Each medical center may attract its own unique set of patients—a hospital known for its cardiovascular center, for instance, may attract more critical heart conditions. And findings from a Guangzhou hospital that mostly attracts ethnic Chinese patients may not translate to one in Shanghai with a higher number of foreign-born, non-Chinese patients.
(CAPTION: In this 2017 TEDx Talk, Shinjini Kundu of Johns Hopkins Hospital explains how AI tools have the potential to glean more from medical images than doctors alone can—including predicting diseases before patients show symptoms.)
This extrapolation will prove difficult in other situations as well. Consider the records of 40,000 ICU patients at Beth Israel Deaconess Medical Center, says Marzyeh Ghassemi, a computer scientist and biomedical engineer at the University of Toronto: That’s just one hospital in one city. “And so I have all these papers that have done predictions with this data. Does that work with another hospital in Boston? Maybe. Does it work for a hospital in another state? Would it work in another country? We don’t know.”
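The generalization problem Ghassemi describes can be illustrated with a toy sketch. Everything here is hypothetical: the two "hospitals," the single lab value, and the simple cutoff rule are invented for illustration and stand in for far more complex models and records. The point is only that a rule tuned on one site's patients should be checked separately on another site's patients before anyone trusts it there.

```python
# Hypothetical toy records: (lab_value, had_condition). All values are invented.
HOSPITAL_A = [(1.0, False), (2.0, False), (3.0, False),
              (6.0, True), (7.0, True), (8.0, True)]
HOSPITAL_B = [(3.0, False), (4.0, True), (5.0, True),
              (6.0, False), (7.0, True), (8.0, True)]


def accuracy(records, cutoff):
    """Fraction of records where (value >= cutoff) matches the true label."""
    correct = sum((value >= cutoff) == label for value, label in records)
    return correct / len(records)


def fit_threshold(records):
    """Pick the observed value that, used as a cutoff, maximizes accuracy."""
    candidates = sorted(value for value, _ in records)
    return max(candidates, key=lambda cut: accuracy(records, cut))


# "Train" on hospital A only, then score on both sites.
cutoff = fit_threshold(HOSPITAL_A)
internal = accuracy(HOSPITAL_A, cutoff)   # performance where the rule was tuned
external = accuracy(HOSPITAL_B, cutoff)   # performance on an unseen site

print(f"cutoff={cutoff}, internal={internal:.2f}, external={external:.2f}")
```

In this invented data, the rule looks perfect on the hospital it was tuned on and no better than a coin flip on the second one, which is the kind of gap an external validation step is meant to expose.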
While AI models may not work in every case, Ghassemi thinks the technology is still worth exploring. “I am very much in favor of taking these models from the bench to the bedside,” she says, “but with really aggressive precautionary steps.”
Those steps need to exist throughout AI development and deployment, says I. Glenn Cohen, a law professor at Harvard University and a leader of the Project on Precision Medicine, Artificial Intelligence, and the Law. This may involve verifying the accuracy and transparency of AI predictions. And during data collection, researchers will also need to protect patient privacy and ask for consent to use patient data for training AI.
The consent issue comes up again when the AI model is ready for experimental clinical testing with real patients. “Do patients need to be told you’re using the algorithm on them, and does it matter whether the AI is completely guiding care or partly guiding care?” Cohen asks. “There is really very little thinking on these questions.”
Ghassemi also advocates for frequently auditing AI algorithms to ensure fairness and accuracy across different groups of people based on ethnicity, gender, age, and health insurance. That’s important given how AI applications in other fields have already shown that they can easily pick up biases.
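One simple form such an audit could take is comparing a model's accuracy across patient groups and flagging any group that lags well behind the best one. The sketch below is a minimal illustration under stated assumptions: the record fields (`insurance`, `predicted`, `actual`), the sample data, and the 10-percentage-point gap threshold are all hypothetical, and a real audit would look at more metrics than raw accuracy.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for predictions grouped by one attribute."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        correct[group] += int(record["predicted"] == record["actual"])
    return {group: correct[group] / totals[group] for group in totals}


def flag_gaps(per_group, max_gap=0.1):
    """List groups whose accuracy trails the best group by more than max_gap."""
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > max_gap)


# Hypothetical audit sample: invented predictions for two insurance groups.
SAMPLE = [
    {"insurance": "private", "predicted": 1, "actual": 1},
    {"insurance": "private", "predicted": 0, "actual": 0},
    {"insurance": "private", "predicted": 1, "actual": 1},
    {"insurance": "private", "predicted": 0, "actual": 0},
    {"insurance": "public", "predicted": 1, "actual": 1},
    {"insurance": "public", "predicted": 0, "actual": 1},
    {"insurance": "public", "predicted": 1, "actual": 0},
    {"insurance": "public", "predicted": 0, "actual": 0},
]

per_group = accuracy_by_group(SAMPLE, "insurance")
flagged = flag_gaps(per_group)
print(per_group, flagged)
```

The same function can be rerun with a different `group_key`, such as a hypothetical `age_band` or `ethnicity` field, and repeated whenever the model is retrained, which matches the idea of auditing frequently rather than once.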
After all those steps, the people and companies providing AI services will need to sort out legal liability in the case of inevitable mistakes. And unlike most medical devices, which usually need just one regulatory approval, AI services may require additional review whenever they learn from new data.
Some regulatory agencies are rethinking how to assess healthcare AI. In April, the U.S. Food and Drug Administration (FDA) released a discussion paper to get public feedback about how to update the relevant regulatory review. “What we are continuously trying to do here is get back to our goal of giving people access to technologies, but we’re also realizing that our current methods don’t quite work well,” says Bakul Patel, director for digital health at the FDA. “That’s why we need to look at a holistic approach of the whole product life cycle.”
In addition to issues surrounding access, privacy, and regulations, it isn’t clear just who stands to benefit the most from AI healthcare services. There are already healthcare disparities: According to the World Bank and the World Health Organization, half of the globe’s population lacks access to essential healthcare services, and nearly 100 million people are pushed into extreme poverty by healthcare expenses. Depending on how it is deployed, AI could either reduce these inequalities or make them worse.
“A lot of the AI discussion has been about how to democratize healthcare, and I want to see that happening,” says Effy Vayena, a bioethicist at the Swiss Federal Institute of Technology in Zurich.
“If you just end up with a fancier service provision to those who could afford good healthcare anyway,” she adds, “I’m not sure if that’s the transformation we’re looking for.”
How this all plays out depends on the different visions for implementing AI. Early development has focused on very narrow diagnostic applications, such as scrutinizing images for hints of skin cancer or nail fungus, or reading chest X-rays. But more recent efforts have tried to diagnose multiple health conditions at once.
In August 2018, Moorfields Eye Hospital in the United Kingdom and DeepMind, the London-based AI lab owned by Google’s parent company Alphabet, showed that they had successfully trained an AI system to identify more than 50 eye diseases in scans, matching the performance of leading experts. Similarly broad ambitions drove the San Diego and Guangzhou study that trained AI to diagnose common ailments among children. The latter wasn’t as good at diagnosing pediatric diseases as senior physicians, but it did perform better than some junior physicians.
Such AI systems may not need to outperform the best human experts to help democratize healthcare; they may simply need to expand access to current medical standards. Still, so far, many proposed AI applications are focused on improving the current standard of care rather than spreading affordable healthcare around, Cohen says: “Democratizing what we already have would be a much bigger bang for your buck than improving what we have in many areas.”
Accenture, a consulting firm, predicts that top AI applications could save the U.S. economy $150 billion per year by 2026. But it’s unclear if patients and healthcare systems supplemented by taxpayer dollars would benefit, or if more money would simply flow to the tech companies, healthcare providers, and insurers.
“The question of who is going to drive this and who is going to pay for this is an important question,” says Kohane. “Something a bit hallucinatory about all those business plans is that they think they know how it will work out.”
Even if AI services make cost-saving recommendations, human physicians and healthcare organizations may hesitate to take AI advice if they make less money as a result, Kohane cautions. That speaks to the bigger systemic issue of U.S. health insurers using a fee-for-service model that often rewards physicians and hospitals for adding tests and medical procedures, even when they aren’t needed.
There is another AI opportunity that could improve the quality of care while still leaving most medical diagnoses in the hands of doctors. In his 2019 book “Deep Medicine,” Eric Topol, director and founder of the Scripps Research Translational Institute, talks about creating essentially a supercharged medical Siri—an AI assistant to take notes about the interactions between doctors and their patients, enter those notes in electronic health records, and remind physicians to ask about relevant parts of the patient’s history.
“My aspiration is that we decompress the work of doctors and get rid of their data clerk role, help patients take on more responsibility, and key up the data so it doesn’t take so long to review things,” Topol says.
That “never-forgetful medical assistant or scribe,” says Kohane, would require AI that can automatically track and transcribe multiple voices between physicians and patients. He supports Topol’s idea but adds that most AI applications in development don’t seem to be focused on such assistants. Still, some companies such as Saykara and DeepScribe have developed services along these lines, and even Google teamed up with Stanford University to test a similar “digital scribe” technology.
An AI assistant may sound less exciting than an AI doctor, but it could free up physicians to spend more time with their patients and improve overall quality of care. Family physicians in particular often spend more than half of their working days entering data into electronic health records—a main factor behind physical and emotional burnout, which has dire consequences, including patient deaths.
Ironically, electronic health records were supposed to improve medical care and cut costs by making patient information more accessible. Now Topol and many other experts point to electronic health records as a cautionary tale for the current hype surrounding AI in medicine and healthcare.
The implementation of electronic health records has already created a patchwork system spread among hundreds of private vendors that mainly succeeds in isolating patient data and making it inaccessible to both physicians and patients. If history is any guide, many tech companies and healthcare organizations will feel the pull to follow similar paths by hoarding medical data for their own AI systems.
One way around this may be to use a collective intelligence system that aggregates and ranks medical expertise from different sources, says Komarneni, who is trying this approach with Human Dx. Backed by major medical organizations such as the American Medical Association, Human Dx has built an online platform for crowdsourcing advice from thousands of physicians on specific medical cases. Komarneni hopes that such a platform could, in theory, also someday include diagnostic advice from many different AI services.
“In the same way that multiple human professionals might look at your case in the future, there is no reason why multiple AI couldn’t do it,” Komarneni says.
As doctors wait for their AI helpers, crowdsourcing projects like Human Dx “could definitely lead to improved diagnostics or even improved recommendations for therapy,” says Topol, who coauthored a 2018 study on a similar platform called Medscape Consult. The paper concluded that collective human intelligence could be a “competitive or complementary strategy” to AI in medicine.
But if AI services pass all the tests and real-world checks, they could become significant partners for humans in reshaping modern healthcare.
“There are things that machines will never do well and then others where they’ll be exceeding what any human can do,” Topol says. “So when you put the two together, it’s a very powerful package.”
Jeremy Hsu is a freelance journalist based in New York City. He frequently writes about science and technology for Backchannel, IEEE Spectrum, Popular Science, and Scientific American, among other publications.