Getting a full readout of your entire genetic sequence promises to radically alter how we monitor our health, providing advance warning of cancer and other diseases we may develop, as well as our chances of passing those ailments on to our children. Sequencing-equipment maker Illumina is valued at nearly $23 billion, for example, while direct-to-consumer testing company 23andMe is worth about $1 billion. Meanwhile, the price of so-called whole-genome sequencing has dropped to about $1,000.
But whole-genome sequencing currently over-promises in several ways. One is a false sense of what constitutes a "normal" genome against which to compare someone's results. (The U.S. government's National Institutes of Health provides a narrow, widely used model.) The promise holds up best if you're white and drops off fast for other groups, such as people of African descent. That's because we simply don't have easy access to enough reference genomes, from a wide enough variety of people, to understand the full range of normal. Nor are companies that analyze genomes especially willing to look at all the variation that's out there.
Geneticists aren't blind to this problem. In fact, it's been a big topic of discussion at the Future of Genomics conference, a gathering of top genetics experts and entrepreneurs in San Diego. One solution is to simply share existing databases of sequenced genomes (in anonymized form). "We have an enormous and rapidly increasing quantity of genomic information," said David Haussler, a researcher at the UC Santa Cruz Genomics Institute. "And where is that? It's held in silos all over the world."
Haussler is a leader of a project called the Global Alliance for Genomics and Health. Instead of building a master database of genetic data, the alliance is modeling itself on the way consumer online services and apps work: creating common application programming interfaces (APIs) so that anyone can plug into any database with the same software. (It's the same approach that, for example, lets you open files from a cloud service like Dropbox from within an app like Slack.) The alliance is an open-source project, with all of its code freely available on the programmer community site GitHub.
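For readers curious how "the same software" can plug into many databases, here is a minimal Python sketch of the general idea: one shared interface that each data holder implements in its own way. Every class and method name below (`GenomeDatabase`, `has_variant`, and so on) is hypothetical and invented for illustration; these are not the Global Alliance's actual APIs, and the variant positions are made up.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single DNA change: chromosome, position, reference base, alternate base."""
    chrom: str
    pos: int
    ref: str
    alt: str


class GenomeDatabase(ABC):
    """Hypothetical shared interface that every participating database agrees to."""

    @abstractmethod
    def has_variant(self, variant: Variant) -> bool:
        """Return True if any anonymized genome in this database carries the variant."""


class SetDatabase(GenomeDatabase):
    """One hypothetical silo: stores whole Variant objects in a flat set."""

    def __init__(self, variants):
        self._variants = set(variants)

    def has_variant(self, variant):
        return variant in self._variants


class ChromosomeIndexedDatabase(GenomeDatabase):
    """Another hypothetical silo with a different internal layout: grouped by chromosome."""

    def __init__(self, variants):
        self._by_chrom = {}
        for v in variants:
            self._by_chrom.setdefault(v.chrom, set()).add((v.pos, v.ref, v.alt))

    def has_variant(self, variant):
        entries = self._by_chrom.get(variant.chrom, set())
        return (variant.pos, variant.ref, variant.alt) in entries


def databases_with_variant(databases, variant):
    """The same client code queries every silo, whatever its internals look like."""
    return [name for name, db in databases.items() if db.has_variant(variant)]


# Made-up example data: two labs holding different (fictional) variants.
silos = {
    "lab_a": SetDatabase([Variant("1", 12345, "G", "A")]),
    "lab_b": ChromosomeIndexedDatabase([Variant("2", 67890, "C", "T")]),
}

print(databases_with_variant(silos, Variant("1", 12345, "G", "A")))  # ['lab_a']
```

The shared `GenomeDatabase` class is the part everyone agrees on, while how each silo stores and searches its data is left to it, which mirrors the split between a common interface and competing implementations that Haussler describes.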
The project is not just an academic endeavor. Of its nearly 400 members from about 40 countries, about 120 are companies, ranging from traditional tech firms like Google and Microsoft to genetic testing firms like Color Genomics, which focuses on consumer breast cancer screening. Haussler says companies are willing to share some of their data in order to get information from others that makes their own services better. "We collaborate on the interface, but we compete on the implementation," he said. "And we're getting creative products out of it."
But not everyone buys that argument. Myriad Genetics, for instance, patented genes linked to hereditary breast cancer. However, a 2013 Supreme Court ruling invalidated its patents on those naturally occurring genes, which opened the door for many competing firms.
The Global Alliance for Genomics and Health has several projects, such as a deep dive into how the genes linked to breast cancer vary from patient to patient. (So far, 12,000 versions have been found.) Another effort, the Human Genome Variation Map, focuses on comparing the full genomes of people from around the world. Even the scarce data now commonly available shows just how varied normal DNA is.
As an example, Haussler showed a spaghetti graph of squiggly lines, each representing one version of the stretch of DNA that holds most of the genes encoding the human immune system. These versions are included in the U.S. government's reference genome (called GRCh38). One line is the reference that almost everyone uses, and the other seven are official variations. "Why those seven?" asked Haussler. "Because that's all we had. Is it a fair representation? No. Does anybody use them? No."
One piece of good news is that getting more genetic samples may not require digging deeply into old databases that are hard to make compatible. It's enough to make sure that the new data is easily accessible, according to Haussler. "The amount of genome data we will create next year dwarfs all previous historical genome data," he said.