Welcome To The Cloud Hospital, Where Big Data Takes On Mysterious Medical Conditions

Medical researchers are pooling their data to find diagnoses and cures for patients with extremely rare and sometimes undiscovered diseases.

You probably haven’t heard of a disease called arterial calcification due to deficiency of the CD73 enzyme, which causes painful calcium buildup in the joints and blood vessels. The disease, also known as ACDC, was discovered in 2011 through the Undiagnosed Disease Program (UDP) at the U.S. National Institutes of Health, and only a handful of individuals are believed to suffer from it.


For people with obscure conditions, sometimes called mystery diseases, UDP has been a last resort that combines weeklong medical examinations, genetic sequencing, and data analysis in an effort to finally find a diagnosis and treatment for patients who are at their wits’ end.

A knee x-ray of a patient with ACDC, a rare calcification disorder, reveals calcification in the main artery supplying blood to the lower leg.

The federal program recently joined with six private institutions, including Harvard Teaching Hospitals and Stanford Medical Center, as part of the Undiagnosed Disease Network (UDN), which links databases and lets would-be patients apply for admission online rather than having to ship paper records. “This allows all of the sites to really work together to diagnose the patients,” says Anastasia Wise, an epidemiologist and co-coordinator of the UDN. Such pooling of resources is part of a bigger trend linking up medical databases around the world to mine them for insights.

Since its founding in 2008, the original federal program has been able to diagnose about 25% of the 800 people, from the U.S. and other countries, who have been accepted for exams. That modest diagnosis rate belies other benefits. “Sometimes [the] information that has come out of the clinical evaluation has led to a treatment option for the condition that has helped the patient, even without a diagnosis,” says Wise.


The exams also provide bigger-picture medical insights. Fibromyalgia and Chronic Fatigue Syndrome (CFS) may be the best-known mystery diseases: conditions with undeniable symptoms that so far lack a definitive cause or causes. But there are many others. About half of the patients (both men and women) have neurological disorder symptoms, such as cognitive issues or problems with movement.

Louise Benge of Brodhead, Kentucky, is monitored while walking. Annette Stine, research coordinator at the National Heart Lung and Blood Institute, NIH, monitors the treadmill test.

Cloud Hospital

On a basic level, the new UDP promises to harness the efficiency of the web that makes everything from finding an obscure product on Amazon to making a restaurant reservation with OpenTable easier. Patients can expand their search for medical experts beyond their hometown doctor. They can also clear the roadblock of doctors who don’t know, and sometimes don’t care, about a non-textbook medical problem.

“We understand that patients can have a difficult time finding someone who can understand their condition,” says Wise. Applying to UDN does require a doctor’s letter describing the patient’s medical history and symptoms, but that’s a standard procedure that any doctor can do. “Typically, patients can find someone who can write that letter for them, even if it’s not their primary care physician,” says Wise.


Those who are accepted into the program undergo a customized week of examinations by anyone who might be able to help. “It’s really unique to get that many specialists to see a patient at once,” says Wise. Most of the patients also receive genetic sequencing.

Collecting and comparing all this patient data is key for understanding new diseases. If two or more patients share outward symptoms, what clinicians call phenotypes, and genetic variations, or genotypes, doctors might be able to make a cause-and-effect link. As the databases grow, UDN will have more opportunities to put the robots to work, using the artificial intelligence of machine learning. Programs sort through massive amounts of data to flag similarities that a human doesn’t have the capability to process. “Machine learning may be used to help determine how patterns of patient symptoms, genetic variants found from the sequence, and phenotypes seen in model organisms relate to help with diagnosing patients,” Wise wrote in an email.
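To make the matching idea concrete, here is a minimal Python sketch of flagging patient pairs who share both a phenotype and a genetic variant. It uses plain set intersection rather than machine learning, and every patient ID, symptom, and variant label in it is hypothetical, invented for illustration rather than drawn from UDN data.

```python
from itertools import combinations

# Hypothetical patient records: every ID, symptom, and variant label below is
# made up for illustration and does not come from UDN data.
patients = {
    "patient_a": {
        "phenotypes": {"joint pain", "arterial calcification"},
        "variants": {"GENE1:variant_x", "GENE2:variant_y"},
    },
    "patient_b": {
        "phenotypes": {"arterial calcification", "leg cramps"},
        "variants": {"GENE1:variant_x"},
    },
    "patient_c": {
        "phenotypes": {"joint pain", "fatigue"},
        "variants": {"GENE3:variant_z"},
    },
}


def find_candidate_links(records):
    """Return patient pairs that share at least one phenotype AND one variant."""
    links = []
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(records.items()), 2):
        shared_phenotypes = rec_a["phenotypes"] & rec_b["phenotypes"]
        shared_variants = rec_a["variants"] & rec_b["variants"]
        if shared_phenotypes and shared_variants:
            links.append((id_a, id_b, shared_phenotypes, shared_variants))
    return links


links = find_candidate_links(patients)
for id_a, id_b, phenos, variants in links:
    print(id_a, id_b, sorted(phenos), sorted(variants))
```

In this toy data only patient_a and patient_b are flagged. Patient_c shares a symptom with patient_a but no variant, so the pair offers no candidate cause-and-effect link.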

All the UDN sites have agreed to share patient data, which includes the names of the patients. “This is really rather unique,” says Wise. It’s not unusual for medical research institutions to share anonymous data to study diseases, but you can’t treat someone if you don’t know who they are. Patients in UDN sign a consent form allowing their personal info to be available to all the researchers and clinicians in the network, combining all the institutions into a giant virtual hospital.

Louise Benge undergoes Doppler arterial examination to evaluate arterial circulation of the lower extremities. Kevin Smith, NIH clinical research nurse and exercise physiologist and research coordinator at the National Heart Lung and Blood Institute, and Catherine Groden, NIH nurse practitioner, observe Doppler arterial waveforms on a monitor.

The End Of Digital Chicken Scratch

The stereotype about doctors’ illegible writing carries over to medical databases, which are full of sloppily typed freeform notes. Take, for example, the various ways that doctors at the Hospital for Sick Children in Ontario recorded the “behavioral problems” phenotype:

behavioral problem
behavioural problems
behaviour problem
behavoural problem
behav. Pro
behav. pro.
behav. prob.
behav. problem.

Those are among many examples dug up in 2013 by University of Toronto researchers who developed an online tool called PhenoTips earlier that year. Instead of freeform fields, PhenoTips uses autofill to provide options from a standard catalog called the Human Phenotype Ontology (HPO). Based on each term, PhenoTips suggests additional terms to fill out a structured tree.
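A rough sketch of how autofill against a fixed catalog can steer those abbreviations toward one standard entry might look like the following. This is not PhenoTips code: the catalog labels are hypothetical stand-ins rather than actual HPO entries, and the prefix-matching rule is an assumption chosen for simplicity.

```python
# Hypothetical mini-catalog standing in for a standard vocabulary such as HPO.
# These labels are illustrative, not actual HPO entries.
CATALOG = [
    "behavioral problems",
    "cognitive impairment",
    "movement disorder",
    "muscle weakness",
]


def suggest(typed, catalog=CATALOG):
    """Return catalog terms in which every typed token is a prefix of some word."""
    # Normalize: lowercase and treat periods as spaces so "behav." becomes "behav".
    tokens = typed.lower().replace(".", " ").split()
    if not tokens:
        return []
    matches = []
    for term in catalog:
        words = term.split()
        if all(any(word.startswith(tok) for word in words) for tok in tokens):
            matches.append(term)
    return matches


print(suggest("behav. Pro"))
print(suggest("behav. prob."))
print(suggest("mo"))
```

Because the clinician picks from the suggested list instead of finishing the phrase by hand, abbreviations such as “behav. Pro” and “behav. prob.” end up stored as the same catalog term. Prefix matching alone would not catch a misspelling like “behavoural”; a real tool would need fuzzier matching for that.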

“So you can have two patients, but you might not have put in the exact same terms to describe their symptoms,” says Wise. “But you can tell that those two symptoms are similar to each other because they’re related to a higher-level term that’s more general.” PhenoTips also has a freeform field; if variations of a new term keep popping up, a standard version can be added to the database. All members of UDN use PhenoTips, as do other organizations that can all mine the PhenomeCentral database for similar patients.
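The higher-level-term idea Wise describes can be illustrated with a small Python sketch. The hierarchy below is hypothetical and far simpler than the real HPO, where a term can have more than one parent; here each term has exactly one, which is an assumption made to keep the example short.

```python
# Hypothetical term hierarchy (child -> parent). Labels are illustrative only
# and are not taken from the Human Phenotype Ontology.
PARENT = {
    "tremor": "movement abnormality",
    "unsteady gait": "movement abnormality",
    "movement abnormality": "neurological abnormality",
    "memory loss": "cognitive abnormality",
    "cognitive abnormality": "neurological abnormality",
    "neurological abnormality": None,  # root of this toy tree
}


def ancestors(term):
    """Return the term followed by each of its ancestors, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def closest_shared_term(term_a, term_b):
    """Return the most specific term that both inputs fall under, or None."""
    seen = set(ancestors(term_a))
    for candidate in ancestors(term_b):
        if candidate in seen:
            return candidate
    return None


print(closest_shared_term("tremor", "unsteady gait"))
print(closest_shared_term("tremor", "memory loss"))
```

Two patients recorded with “tremor” and “unsteady gait” never share an exact term, yet both roll up to “movement abnormality” in this toy tree. The deeper the shared term sits in the hierarchy, the more similar the two symptoms are.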


On the genotype side, UDN is part of a network called Matchmaker Exchange, which also launched in 2013. Rather than providing a central database, Matchmaker Exchange links databases of its member organizations, including UDN. Someone using any of these databases can query all of them at once to find a genetic mutation that is common to people who never saw the same specialists or visited the same center.
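The difference between one central database and a set of linked ones can be sketched in a few lines of Python. This is not the Matchmaker Exchange interface: the real network connects remote services, while the sketch below assumes three in-memory lists, and every database name, case ID, and variant label is hypothetical.

```python
# Hypothetical member databases, each kept separately by its own organization.
# Names, case IDs, and variant labels are made up for illustration.
MEMBER_DATABASES = {
    "database_a": [
        {"case": "A-1", "variant": "GENE1:variant_x"},
        {"case": "A-2", "variant": "GENE2:variant_y"},
    ],
    "database_b": [
        {"case": "B-1", "variant": "GENE1:variant_x"},
    ],
    "database_c": [
        {"case": "C-1", "variant": "GENE3:variant_z"},
    ],
}


def query_all(variant, databases=MEMBER_DATABASES):
    """Ask every member database for cases carrying the variant; merge the hits."""
    hits = []
    for name, records in databases.items():
        for record in records:
            if record["variant"] == variant:
                hits.append((name, record["case"]))
    return hits


matches = query_all("GENE1:variant_x")
print(matches)
```

A single query turns up cases A-1 and B-1, which sit in different databases and might never have been compared otherwise. Each organization keeps its own records, and only the query fans out.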

About 100 people applied to UDN online in its first month, a bit more than the original program averaged per month with mail-in applications, says Wise. She hopes the UDN will have some diagnoses in the next couple of months.

While everyone wants a quick answer, the pooling of ever more medical data and the development of smarter AI raise the hope that answers will eventually come. “Sometimes it can still take years before a diagnosis is found,” Wise says. “Because it might be that data will be reanalyzed later with a new technique that comes along, and that might be the piece that finds a solution.”



About the author

Sean Captain is a business, technology, and science journalist based in North Carolina. Follow him on Twitter @seancaptain.