Three image files were synthesized and sequenced for the DNA data storage experiments. (Credit: University of Washington)
The Microsoft and UW researchers said they developed "a novel approach" to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences (adenine, guanine, cytosine and thymine), represented as As, Gs, Cs and Ts.
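To make the idea of turning bits into bases concrete, here is a minimal Python sketch of the simplest possible mapping, two bits per nucleotide. It is an illustrative assumption, not the researchers' encoding, which the article does not detail; the table `BITS_TO_BASE` and the function names are hypothetical.

```python
# Naive illustration only: map every pair of bits to one nucleotide.
# This is NOT the scheme the researchers describe; it just shows the idea.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of As, Cs, Gs and Ts (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(encode(b"Hi"))          # CAGACGGC
print(decode("CAGACGGC"))     # b'Hi'
```

A mapping this simple can produce long runs of the same base and has no protection against errors, which is one reason a smarter encoding matters.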
To access the stored data, the researchers encode the equivalent of zip codes and street addresses into the DNA sequences. Polymerase Chain Reaction (PCR) techniques — commonly used in molecular biology — help them more easily identify the zip codes they are looking for.
Using DNA sequencing techniques, the researchers can then "read" the data and convert it back to a video, image or document file by using the street addresses to reorder the data.
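The zip code and street address analogy can be sketched in a few lines of Python. This is a toy model under stated assumptions: each strand is represented as a hypothetical `(file_key, index, chunk)` record, filtering by `file_key` stands in for PCR selection, and sorting by `index` stands in for reordering by street address. None of these names come from the researchers' work.

```python
import random


def split_into_strands(file_key, payload, chunk_len):
    """Cut a base sequence into (file_key, index, chunk) records.

    file_key plays the role of the "zip code", index the "street address".
    """
    return [
        (file_key, index, payload[offset:offset + chunk_len])
        for index, offset in enumerate(range(0, len(payload), chunk_len))
    ]


def read_file(pool, file_key):
    """Select one file's strands from the mixed pool and put them in order."""
    selected = [record for record in pool if record[0] == file_key]  # PCR stand-in
    selected.sort(key=lambda record: record[1])                      # reorder by address
    return "".join(chunk for _, _, chunk in selected)


# Two "files" stored together in one unordered pool, as strands would be in a tube.
pool = split_into_strands("image-1", "ACGTACGTAC", 4)
pool += split_into_strands("image-2", "TTGGCCAA", 4)
random.shuffle(pool)

recovered = read_file(pool, "image-1")
print(recovered)  # ACGTACGTAC
```

The point of the sketch is that the pool has no physical order, so every strand must carry enough addressing information to be found and put back in place.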
"How you go from ones and zeroes to As, Gs, Cs and Ts really matters because if you use a smart approach, you can make it very dense and you don't get a lot of errors," said co-author Georg Seelig, a UW associate professor of electrical engineering and of computer science and engineering.
The Microsoft and UW researchers announced their breakthrough at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems.
"DNA is an attractive possibility," the researchers said, because it is extremely dense, with a theoretical limit that is eight orders of magnitude denser than tape. Magnetic tape technology can store as much as 185TB on a single cartridge that can fit in the palm of your hand.
The Microsoft and UW researchers also confirmed synthetic DNA's longevity, saying it has a half-life of more than 500 years in harsh environments. Tape cartridges have a lifespan of 10 to 30 years and hard disk drives are rated to last three to five years, the researchers noted.
The U.S. researchers emphasized the need for a denser archival medium, as all the data contained in our computers, historic archives, movies, photos, business systems and mobile devices worldwide is expected to hit 44 trillion gigabytes by 2020, according to The Digital Universe research paper from IDC and EMC.
"That's a 10-fold increase compared to 2013, and will represent enough data to fill more than six stacks of computer tablets stretching to the moon. While not all of that information needs to be saved, the world is producing data faster than the capacity to store it," the researchers said in their paper.
A DNA storage system still has problems that must be overcome before it's ready for commercial use. First, DNA synthesis and sequencing are far from perfect, with error rates on the order of 1% per nucleotide. A key aspect of DNA storage will be to devise appropriate encoding schemes that can tolerate errors by adding redundancy.
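As a rough illustration of how redundancy can absorb errors, the Python sketch below uses the simplest form there is: keep three copies of a strand and take a majority vote at each position. This is an assumption chosen for clarity, not the encoding scheme the researchers propose, and the function names are hypothetical. Assuming independent errors at 1% per nucleotide, a vote can only go wrong when at least two of the three copies are wrong at the same position, which bounds the residual error at about 0.03%.

```python
from collections import Counter


def majority_vote(reads):
    """Return the most common base at each position across equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def residual_error_bound(p):
    """Upper bound on a wrong vote with 3 copies and independent errors.

    A wrong result needs at least 2 of the 3 reads to be wrong at one position.
    """
    return 3 * p**2 * (1 - p) + p**3


# Three noisy reads of the same strand, each with at most one error.
reads = ["ACGTAC", "ACCTAC", "ACGTAG"]
print(majority_vote(reads))                    # ACGTAC
print(round(residual_error_bound(0.01), 6))    # 0.000298
```

Triple storage costs three times the synthesis, so practical schemes aim for the same tolerance with far less overhead.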