
Chemical markers attached to prefabricated units of DNA can easily encode data. Credit: Nobeastsofierce/SPL
DNA has been humanity’s go-to data repository for millennia. Tough and compact, it is so information-dense that just one gram of it can hold enough data for 10 million hours of high-definition video.
But there is always room for improvement.
An innovative method now allows DNA to store information as a binary code — the same strings of 0s and 1s used by standard computers. That could one day be cheaper and faster than encoding information in the sequence of the building blocks that make up DNA, which is the method used by cells and by most efforts to harness DNA for storing artificially generated data.
The method is so straightforward that 60 volunteers from a variety of backgrounds were able to use it to store the text of their choice. Many of them initially didn’t think the technique would work, says Long Qian, a computational synthetic biologist at Peking University in Beijing and an author of the study (ref. 1) describing the technique.
“When they saw the sequence and got back the correct text, that’s when they started to believe that they could actually do it,” she says. The study was published today in Nature.
Short memory
The technique is just one of many efforts to transform DNA into a sustainable replacement for standard, electronic storage options, which are unable to keep up with the world’s mushrooming data production. “We’re reaching physical limits,” says Nicholas Guise, a physicist at the Georgia Tech Research Institute in Atlanta. “And we’re generating more and more data all the time.”
DNA’s enormous storage capacity makes it an appealing alternative. What’s more, if shielded from moisture and ultraviolet light, DNA can last for hundreds of thousands of years. By contrast, electronic hard drives need to be replaced every few years, or data become corrupted.
The most obvious way to store information in DNA is by incorporating the data into the DNA sequence, a process that requires a DNA strand to be synthesized from scratch. This approach is slow and many orders of magnitude more expensive than electronic data storage, says Albert Keung, a synthetic biologist at North Carolina State University in Raleigh.
To develop a cheaper, faster way, Qian and her colleagues looked to the ‘epigenome’, a variety of molecules that cells use to control gene activity without modifying the DNA sequence itself. For example, molecules called methyl groups can be added to or removed from DNA to modify its function.
Qian and her colleagues developed a system in which a series of short, prefabricated DNA “bricks” — with or without methyl groups — could be added to a reaction tube to form a growing DNA strand with the correct binary code. To retrieve the data, the researchers use a DNA-sequencing technique that can detect the methyl groups along the DNA strand. The results can be interpreted as a binary code, with the presence of a methyl group corresponding to a 1, and the absence to a 0.
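The read-out rule described above (methyl group present means 1, absent means 0) can be illustrated with a short Python sketch. It is a toy model rather than the researchers' actual scheme: it assumes one bit per brick, UTF-8 text, and no error correction, and the function names and the "M"/"U" labels for methylated and unmethylated bricks are hypothetical.

```python
# Toy model of methylation-based binary storage.
# Assumptions (not from the study): one bit per brick, UTF-8 text,
# no error correction. "M" = methylated brick, "U" = unmethylated brick.

def text_to_bits(text: str) -> list[int]:
    """Convert text to a list of bits, most significant bit first."""
    bits = []
    for byte in text.encode("utf-8"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_text(bits: list[int]) -> str:
    """Convert a list of bits (length divisible by 8) back to text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")


def write_strand(bits: list[int]) -> list[str]:
    """Pick a methylated brick for each 1 and an unmethylated brick for each 0."""
    return ["M" if bit else "U" for bit in bits]


def read_strand(strand: list[str]) -> list[int]:
    """Interpret a sequenced strand: methyl present -> 1, absent -> 0."""
    return [1 if brick == "M" else 0 for brick in strand]


if __name__ == "__main__":
    message = "DNA"
    strand = write_strand(text_to_bits(message))
    print("".join(strand))
    print(bits_to_text(read_strand(strand)))
```

In this simplified picture, writing a message means choosing the right brick for each position, and reading it means checking each position for a methyl group, so no new DNA sequence has to be synthesized.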
Panda portrait in DNA
Because the technique uses prefabricated fragments of DNA, it could be further optimized for bulk production, says Keung. That would make it much cheaper than synthesizing a bespoke DNA strand for each bit of information to be stored. The next hurdle, he says, will be to see how well the system scales up to accommodate large sets of data.
As a step towards that goal, Qian and her colleagues encoded and then read out the instructions to make an image of a tiger rubbing from the Han dynasty in ancient China and a colour picture of a panda in lush green surroundings. The images were encoded in nearly 270,000 1s and 0s, or ‘bits’.
For now, the field still needs to bring costs down before it can compete with electronic data storage, Guise says. “DNA storage has a long way to go before it could become commercially relevant,” he says. “But there’s a need for disruptive technology.”