How To Store Your Data For A Million Years

Science is building storage mediums for the distant future. But what information will we preserve? And will anyone be able to read it?

[Photo: Mads Perch, Getty Images]

“We are interested in now, most of us,” says Robert Grass, a researcher in chemistry at ETH Zurich. “We buy our furniture in Ikea. We don’t care if in 10 years it falls apart. With information it is similar. We don’t think into the future.”


But Grass isn’t like most of us. His team, which is exploring how to use DNA as a data storage mechanism, is one of several academic and commercial entities grappling with the challenge of protecting data against the elements over time spans stretching out to millions of years.

At the moment, when our data isn’t entrusted to cloud servers, it’s left on disks and drives and cards and an array of devices that aren’t designed to last longer than a decade. “If my son shows photos to his grandsons,” says Grass, “he will have the photos of my parents, which are in black and white, and will be stable for a few hundred years. But there will be a hole after that because my photos won’t survive. Statistically they won’t, unless I am really careful about what I do with them.”

That kind of concern, he believes, belongs not only to data mavens but to humanity as a whole. “How we pick what to store will very strongly influence how our future will think of us.”

The idea of storing information on DNA traces back to a Soviet lab in the 1960s, but the first successful implementation wasn’t achieved until 2012, when biologist George Church and his colleagues announced in the journal Science that they had encoded one of Church’s books in DNA. More recently, reports the New Yorker, the artist Joe Davis, now in residence at Church’s lab, has announced plans to encode bits of Wikipedia into a particularly old strain of apple, so that he can create “a living, literal tree of knowledge.”

DNA can store a vast amount of information in a tiny amount of organic material. “You could take all the information of the world and store it in a few grams of DNA–unimaginable with all other techniques we have,” says Grass.

Under the right conditions, DNA can also last a very long time. In 2013, a complete genome was extracted from the fossil of a 700,000-year-old horse found in Canada. Inspired by fossils like these, Grass’s team embedded DNA into a dense, inorganic material–microscopic spheres of silica, with a diameter of roughly 150 nanometers–in order to protect it from humidity, oxygen, and other environmental aggressors. (The researchers encoded Switzerland’s Federal Charter of 1291 and The Method of Mechanical Theorems by Archimedes.)


“We can prove that in these capsules, it’s as stable as in these bones, which have an excellent longevity,” he says. The team also developed a type of sunscreen for the silica capsules to block the effect of light.

The biggest danger to the data, however, is heat. Any chemical bond or structure you build to store information decays over time depending on temperature. Accelerated testing showed that data in glassed DNA could last 2,000 years at a temperature of around 10 degrees Celsius, but storage at -18 degrees Celsius extended its lifetime up to 2 million years.
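The two lifetimes quoted here fit the usual picture of temperature-dependent decay, in which the rate falls exponentially as things get colder (an Arrhenius relationship). As a rough sketch only, assuming a single decay process and using nothing but the two figures in the paragraph above, one can back out the implied activation energy and extrapolate to other temperatures. The function names are invented for illustration, and this is not the ETH team's own analysis.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


def activation_energy(t1_years, temp1_c, t2_years, temp2_c):
    """Implied activation energy (J/mol) from two lifetime/temperature pairs.

    Assumes lifetime is proportional to exp(Ea / (R * T)).
    """
    inv_temp_diff = 1.0 / kelvin(temp2_c) - 1.0 / kelvin(temp1_c)
    return R * math.log(t2_years / t1_years) / inv_temp_diff


def lifetime_at(temp_c, ref_years, ref_temp_c, ea):
    """Extrapolate the lifetime to another temperature from a reference point."""
    exponent = (ea / R) * (1.0 / kelvin(temp_c) - 1.0 / kelvin(ref_temp_c))
    return ref_years * math.exp(exponent)


# Figures from the article: 2,000 years at 10 C, 2 million years at -18 C.
ea = activation_energy(2_000, 10, 2_000_000, -18)
print(f"Implied activation energy: {ea / 1000:.0f} kJ/mol")
print(f"Estimated lifetime at 20 C: {lifetime_at(20, 2_000, 10, ea):.0f} years")
```

Because the fit rests on just two data points, anything it predicts for other temperatures is an illustration of the trend, not a measurement.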

Like any data storage method, DNA is not error free. Reinhard Heckel, also from ETH Zurich, developed an error-correction scheme for the DNA-encoded data based on Reed-Solomon codes, which are widely used in consumer storage formats like DVDs and in satellite communications.

Because it’s still at the research stage, and there are no commercial tools to encode data into DNA or read the stored data, DNA storage is expensive. It cost about $1,500 to encode the 83 kilobytes of documents Grass used in testing.


For now, the number of applications that require information to be stored for a million or even a thousand years may be limited, Grass acknowledges, but practically everyone has data they want to be accessible 10 years from now. Current storage methods like CDs or hard drives simply cannot offer that guarantee. A study from Backblaze showed that only 50% of hard disks will survive until their 6th birthday. A CD might last a decade. Magnetic tape has a lifetime of a few decades when stored in the right conditions. To make it last longer, all that data must be actively maintained by regularly transferring it from one medium to another. Methods like DNA could offer not just longevity but certainty.
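To put the hard-disk figure in yearly terms, here is a back-of-the-envelope sketch. It assumes drives fail at a constant rate every year, which is a simplification (real failure rates tend to vary over a drive's life), and the function names are made up for illustration.

```python
def annual_failure_rate(survival_fraction, years):
    """Constant yearly failure rate that leaves `survival_fraction` alive after `years`."""
    return 1.0 - survival_fraction ** (1.0 / years)


def survival_after(years, yearly_rate):
    """Fraction of drives still working after `years` at a constant yearly failure rate."""
    return (1.0 - yearly_rate) ** years


# Figure from the article: 50% of drives survive to year six.
rate = annual_failure_rate(0.5, 6)
print(f"Implied yearly failure rate: {rate:.1%}")
for horizon in (1, 3, 10, 20):
    print(f"Chance a single drive lasts {horizon} years: {survival_after(horizon, rate):.1%}")
```

Under that assumption, the odds for a single unattended drive shrink quickly with each extra decade, which is why data has to keep being copied to fresh media.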

“I think a lot of people are not enough aware of how fragile the information is that they store,” Grass says.

Who’s Going To Look At This Stuff And What Will They See?

Storing data for long durations is one challenge. Another is ensuring that the data will even be legible to whatever civilizations discover it in the future.

This kind of translation problem isn’t new. In 1799, a group of Napoleon’s soldiers were rebuilding a fort near the Egyptian town of el-Rashid. One of the men noticed something unusual embedded in a wall that the soldiers had been ordered to demolish: a grey stone slab covered in strange markings. The slab, which later became known as the Rosetta stone, repeated the same text in three languages: Greek, Egyptian hieroglyphics and Egyptian demotic (the everyday language of ancient Egypt). Acting as a type of dictionary, the stone allowed scholars to finally decipher hieroglyphics, a language whose meaning had been lost for 2,000 years.


All long-term data storage methods face the same problem 18th-century scholars had with hieroglyphics: how to decipher data from the past. Future readers need not only a device capable of reading the physical storage medium, but also an understanding of the data encoding. In other words, our descendants will need their own Rosetta stone.

Grass’s team used a simple enough code: DNA bases A and C for “0” and G and T for “1”. “For DNA, the theory is if we have a highly developed culture in the future, it will be interested in investigating its personal genome, and there will be tools to do that,” he says. “You can write down your (decoding) instructions on a piece of paper or engrave them into stone or gold.”
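The mapping Grass describes is simple enough to sketch in a few lines. The version below is a minimal illustration, not the team's actual software: it turns each bit into a base, and because each bit value has two possible bases, it assumes (this rule is not from the article) that the writer picks whichever one differs from the previous base, so the same letter never appears twice in a row. It also leaves out the error correction a real system would need.

```python
ZERO_BASES = "AC"  # either base reads back as bit 0
ONE_BASES = "GT"   # either base reads back as bit 1


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, one base per bit, most significant bit first."""
    bases = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            options = ONE_BASES if bit else ZERO_BASES
            # Assumed rule: avoid repeating the previous base.
            base = options[0]
            if bases and bases[-1] == base:
                base = options[1]
            bases.append(base)
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Turn a DNA string back into bytes using the A/C = 0, G/T = 1 mapping."""
    if len(strand) % 8 != 0:
        raise ValueError("strand length must be a multiple of 8 bases")
    out = bytearray()
    for start in range(0, len(strand), 8):
        byte = 0
        for base in strand[start:start + 8]:
            if base in ONE_BASES:
                bit = 1
            elif base in ZERO_BASES:
                bit = 0
            else:
                raise ValueError(f"unexpected base: {base!r}")
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


message = b"Federal Charter of 1291"
strand = encode(message)
print(strand)
print(decode(strand))
```

A future reader who can sequence the strand needs only the two-line key at the top to recover the bits, which is the kind of instruction that could plausibly be engraved alongside the sample.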

“You need a vocabulary of about 1,000 words to understand a dictionary. So there must be a guide to 1,000 keywords by using images,” says Miko Elwenspoek, a professor of engineering at the University of Twente in the Netherlands and one of the founders of the Human Document Project, which aims to preserve information about mankind for a million years.

“One million is just a big round number. What is meant is very long, much longer than our horizon,” he says. Such a project requires us to select what information to preserve and find a suitable location for it. Elwenspoek’s preference is for the Moon.


“On Earth, one has to cope with the very active geology,” he says, not to mention the possibility of a nuclear or environmental disaster, or simple vandalism.

So what might we choose to transmit to the future? “Science and technology, art, music, philosophy, literature, religion,” says Elwenspoek. “Our daily little business. Start with all printed books: they would fit in perhaps 100 hard-disc drives.”

Not Your Typical Memory Sticks

Or in considerably fewer “superman crystals.” Peter G. Kazansky studies optoelectronics, the science of electronic devices that source, detect, and control light. Kazansky and his team at the University of Southampton in the U.K. have developed a technique for etching data in fused quartz crystals, which, under temperatures as high as 200 degrees Celsius, could keep data preserved, says Kazansky, “for the lifetime of our universe.”

Superman crystals disk [Photo: Kazansky Lab]

Storing data for 13.8 billion years bears a passing resemblance to storing data on a regular optical disc. A CD or DVD is coated with a thin layer of organic dye. To burn the CD, a semiconductor laser creates gaps in the dye. In place of the dye, Kazansky’s team uses ultrafast lasers to write on the nanoscale in quartz crystal.

“In normal CDs/DVDs, you create some kind of modification, like a hole, and then you can encode information in 0 and 1,” Kazansky explains. “We are not producing a simple hole. Inside of the focus [of the laser] we create another structure, like a grating, self-assembled inside glass,” by what he calls “some magic process” that he and his colleagues are still trying to understand.

Because the nanostructure created by the laser is more complex than a simple gap, it can contain more information than a 0 or 1. In fact, he says, it can store up to 256 bits. Additional information can be captured in both the orientation of the “grating” and its periodicity (the characteristics of the repeated pattern of atoms within it). This “five-dimensional” approach gives Kazansky’s material a very high information density. A single CD-sized disk could store 360 terabytes of data, roughly equivalent to five Libraries of Congress.


As with DNA storage, the main obstacle to making this a practical storage medium is cost. The ultrafast laser used by the research team costs around $150,000, and the data is currently read back using a microscope. It takes three hours to write 2 MB of data–about the size of a 3.5-inch floppy disk–but the team thinks that write time can easily be reduced to about half an hour.
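Those write speeds are worth setting against the 360-terabyte capacity quoted earlier. The sketch below is simple arithmetic, assuming decimal units (1 MB = 1,000,000 bytes), a single laser writing continuously at a constant rate, and function names invented for illustration.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * SECONDS_PER_HOUR


def write_rate_bytes_per_second(megabytes, hours):
    """Average write rate for `megabytes` written in `hours`."""
    return megabytes * 1_000_000 / (hours * SECONDS_PER_HOUR)


def years_to_fill(capacity_terabytes, rate_bytes_per_second):
    """Years of nonstop writing needed to fill a disk of the given capacity."""
    return capacity_terabytes * 1e12 / rate_bytes_per_second / SECONDS_PER_YEAR


current = write_rate_bytes_per_second(2, 3)    # 2 MB in three hours
target = write_rate_bytes_per_second(2, 0.5)   # 2 MB in half an hour

for label, rate in (("current", current), ("target", target)):
    print(f"{label}: {rate:.0f} bytes/s, "
          f"{years_to_fill(360, rate):,.0f} years to fill a 360 TB disk")
```

At either speed, filling a whole disk with one laser would take many thousands of years, so write speed, along with cost, would have to improve by orders of magnitude before the full capacity is usable.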

Longevity of current storage methods, including Kazansky’s 5D crystal storage method [Image: Kazansky Lab]

Kazansky, who is also a member of the Human Document Project, cites the movie Interstellar when he describes an even bigger vision than a thousand millennia of storage.

“Some future beings, they can even manage to travel in time,” he says. “If they will find our disk, I hope they will be intelligent enough to read it, and they will be able to pass some information from the future to here.”

Doug Hansen, the CTO of M-DISC, has a more modest and practical goal for his optical disk: a millennium. “We are not touting 1,000 years because we think that is what most people want to do with their data,” he says. “What we are trying to get across is that there is enough certainty here that you know that this is going to be good for a century or two.”

CDs and DVDs use organic, optical dyes that are vulnerable to light. Blu-rays often rely on inorganic materials but will fail when exposed to heat and humidity. M-DISC uses oxides, nitrides, and other compounds (the exact materials are a trade secret) that are, says Hansen, “a lot like stone in their characteristics.” An M-DISC can be read by any DVD or Blu-ray drive. Like other optical disks, M-DISCs keep best in a cool environment, but they “will typically last for several centuries or longer in your bedroom closet,” says Hansen.

Optical disks have the advantage of already being widely used as a storage format. Even if that changes in the future, “it’s very easy to make a player,” says Hansen. “It’s very easy to find ways to decode the format, because they are all public standards maintained by things like ISO and things of that sort.” An M-DISC DVD costs around $2 when bought in bulk. A Blu-ray will set you back $4. Existing customers include businesses that are required by U.S. law to keep documents like tax records up to 50 years, professionals like photographers, and individual consumers.


“We are just coming for the first time in our history to where we really have to start dealing with our data,” says Hansen. “This wasn’t a problem 100 years ago.” Even 20 years ago, digital data wasn’t a very large concern. “But as computer technology has freed us to be more creative, to capture more and record more, there’s a lot more to try and save, and that problem is not going to get smaller with time.”

About the author

Lapsed software developer, tech journalist, wannabe data scientist. Ciara has a B.Sc. in Computer Science and an M.Sc. in Artificial Intelligence.