Obama’s Precision Medicine Initiative Is The Ultimate Big-Data Project

Curing both rare diseases and common cancers doesn’t just require new research, but also linking all the data that researchers already have.

[Photo: Flickr User DoD News Features]

Modern medicine is incredibly data-intensive, especially now that doing a full genetic sequencing of every patient is becoming affordable and more prevalent. New efforts to develop tailored treatments for illnesses–known as precision medicine–will require collecting and sharing data on a hard-to-comprehend scale. That’s the biggest takeaway from the White House Precision Medicine Initiative (PMI) Summit.


It was big data meets genetics: the live-streamed presentations and panel discussions (one featuring President Obama) invoked the data-science abbreviation API (application programming interface) as much as DNA. Aside from the inspirational speeches (and some self-congratulatory exchanges), the summit included a huge list of initiatives by dozens of government bodies, tech and biotech companies, and nonprofits to advance tailored medicine. The projects are dominated by data collection, storage, and, most importantly, sharing. Getting patients access to their own health records, currently a Kafkaesque bureaucratic process, is also one of the goals, as is making it easier for patients to donate their data to research.

The undertaking, which is connected with the government’s Cancer Moonshot, will have to overcome massive logistical barriers, as well as the challenges of big egos and bureaucratic inertia.

The Data Beast

First, there’s the scale: The underlying concept of precision medicine is to refine the understanding of someone’s illness based on their specific genetic makeup and other personalized medical data. One person’s kidney cancer, for instance, may not be like that of another. In fact, researchers have already discovered 16 genetic variants of kidney cancer that, until recently, would have all been treated the same. “I couldn’t practice medicine really without what we now call precision medicine,” said Dr. Marston Linehan, chief of urologic oncology at the National Cancer Institute (NCI). “It helps us decide what operation to do, whether to do an operation or not, what drug to give.”

Knowing the exact mutations that cause a particular cancer, neurodegenerative disease, digestive ailment, or other illness promises to make it easier to pick the best medication or develop new ones targeted to an individual’s condition. But that requires a lot of data. Sequencing the genomes of just the people who will be diagnosed with cancer this year (about 1.65 million) will amount to four exabytes—four billion gigabytes—of data, or 400,000 times all the information in the Library of Congress, said Eric Dishman, general manager of Health and Life Sciences at Intel, and a cancer survivor. “This is one of the biggest of the big-data challenges that we’re ever going to have to solve to be able to share this data,” he said.
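As a rough sanity check on those figures, the short Python sketch below works out what they imply per patient. It assumes decimal units (one exabyte is 10^18 bytes), and the function names are made up for illustration. The inputs are only the numbers quoted above, so the results are back-of-the-envelope estimates, not figures from Intel or the Library of Congress.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12, 1 EB = 10**18).
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18

total_bytes = 4 * EXABYTE   # "four exabytes" of sequencing data
patients = 1_650_000        # people diagnosed with cancer this year
loc_multiple = 400_000      # "400,000 times" the Library of Congress


def per_patient_terabytes(total: int, n: int) -> float:
    """Average data volume per patient, in terabytes."""
    return total / n / TERABYTE


def implied_library_terabytes(total: int, multiple: int) -> float:
    """Library of Congress size implied by the quoted comparison, in terabytes."""
    return total / multiple / TERABYTE


print(total_bytes // GIGABYTE)                                  # 4000000000
print(round(per_patient_terabytes(total_bytes, patients), 2))   # 2.42
print(implied_library_terabytes(total_bytes, loc_multiple))     # 10.0
```

In other words, the quoted total works out to roughly 2.4 terabytes per patient, and the comparison treats the Library of Congress as holding about 10 terabytes.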

The federal government is already gearing up to collect info at that scale. Today the National Institutes of Health (NIH) announced the PMI Cohort Program, which will enroll at least one million people for a longitudinal study—one that tracks people’s health over many years—in order to learn about a variety of diseases. Vanderbilt University and Verily, Google’s big-data health spin-off, are being tapped to pilot the project, which aims to recruit its first 79,000 participants by the end of the year.

This is in addition to the Million Veteran Program run by the Department of Veterans Affairs, with help from the Department of Defense (both big recipients of new proposed cancer research funding, too). The program has already signed up about 455,000 service personnel (vets and active duty) who have volunteered to share their medical data for research, including 400,000 genes.


The challenge is not just in collecting new data, but in accessing what’s already there, most of which is locked away from other researchers. (Only 4% of data on cancer patients, for example, is accessible to researchers, said Dishman.) Privacy is one of the big reasons that data can’t be shared. In theory, it can be anonymized and/or shared with a patient’s consent, but institutions are wary of the liabilities if data gets out. The White House just put out a draft proposal for how to maintain data security in the Precision Medicine Initiative.

Out-of-date and incompatible technologies are other reasons. Even a single hospital may have multiple patient-data systems that can’t talk to each other, let alone share data securely with institutions around the country. The White House and companies announced several projects to help tackle those compatibility nightmares, though there likely won’t be quick fixes (electronic health record modernization is a component of the Affordable Care Act, aka Obamacare, and has been limping along for years). The Advisory Board Company, a research, tech, and consulting firm, will be building APIs for pilot projects in up to five health care organizations to make it easier to share data.

Institutions also have reasons to hoard their data as an asset, since they are competing with other organizations to get credit for their work, and thus funding to do more work. “Research dollars and grants flow in the direction of who gets credit,” said Obama. “Redesigning…grant making to encourage collaboration rather than siloing—that’s going to be important.”

Power To The Patients

The biggest promises for data sharing that came out of the summit were for patients. The current, non-standardized processes of requesting and receiving records sometimes look like a throwback to the 80s, with faxes still playing a major role. “It was like going to the DMV every day,” said Noga Leviner, a Crohn’s disease sufferer and CEO of PicnicHealth, a consumer-focused service that collects patients’ records for them and enters them all in a uniform online portal. Leviner pledged to provide a free consumer guide on how to request medical records, along with detailed instructions, including phone numbers, for the 500 biggest hospitals and health systems in the U.S.

Many of those health providers announced their own patient-data plans, too. Yale New Haven Health, Intermountain Healthcare in Utah, and UCLA Health, for example, announced programs that will let patients easily download their health records starting this year or next. One focus of these data-access programs is letting patients donate their information to research. It might be easier for institutions to share data through the patients instead of between each other—both logistically and in terms of privacy and consent. NIH just launched a pilot program called “Sync for Science” that will develop open data standards so that people can collect their electronic health records and also submit them to research programs like the NIH’s own efforts.

With modern information technology—from Google searches to downloadable genetic sequence reports—patients can know more about their own conditions than any physician or researcher. And who has more incentive to work for a cure? “I always expected that if anything ever happened to me, if I ever got sick, there would be someone in charge,” said Leviner. “As it turns out, that really doesn’t exist, and the only person that can do that is you, frankly, the patient.”

About the author

Sean Captain is a technology journalist and editor. Follow him on Twitter @seancaptain.