The Storage Crisis

The world today is creating a mind-boggling volume of data faster than current storage technologies can add capacity to hold it, even though not all of that information needs to be saved.

According to IBM research, the digital universe, comprising all the world’s digital information, grows by 2.5 quintillion bytes of data each day and is expected to hit 44 trillion gigabytes by 2020. That’s a tenfold increase compared to 2013, and roughly 90% of the world’s data was created in the last two years alone. That volume of data is said to be enough to fill more than six stacks of computer tablets stretching to the moon.

Now science is looking to nature to find the best way to store this overwhelming volume of data.

DNA as next-generation storage

In nature, DNA (deoxyribonucleic acid) stores biological information, acting as the genetic blueprint for all living organisms. Recently, scientists have shown that DNA molecules can also be used to store vast quantities of digital information for extremely long periods.

In 2012, researchers at Harvard Medical School reported that they had stored an entire book in DNA: George Church’s “Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves”. Researchers at the European Molecular Biology Laboratory have also encoded audio, image, and text files into a synthesized DNA molecule about the size of a dust grain, and then retrieved the information from the DNA with 99.99 percent accuracy.

In addition to academic and research institutes, innovative companies like Microsoft are also trying to capitalize on the vast opportunities seen in DNA as a storage system. Microsoft recently purchased ten million DNA sequences from a startup called Twist Bioscience, specifically for research into data storage. Microsoft is known to be working on DNA storage in collaboration with the University of Washington.

DNA molecules can store information many millions of times more densely than existing digital storage technologies like flash drives, hard drives, and other magnetic and optical media. An external hard drive, for instance, is about the size of a paperback book, can store about five terabytes of data, and might last 50 years. In contrast, one gram of DNA could fit on a coin, store 455 exabytes of data (the equivalent of about 91 million such hard drives), and survive for millions of years. A digital archive encoded in this form could be recovered by people for many generations to come.
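As a rough check on these figures, the short Python sketch below works out how many five-terabyte drives one gram of DNA would replace, and how many grams of DNA the projected 44 trillion gigabytes would need. It assumes decimal units (1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes) and takes the article's numbers at face value; the variable names are illustrative only.

```python
# Decimal (SI) unit sizes in bytes; an assumption, the article does not say
# whether it means decimal or binary units.
GB = 10**9
TB = 10**12
EB = 10**18

hard_drive_bytes = 5 * TB              # external hard drive from the text
dna_bytes_per_gram = 455 * EB          # density figure quoted in the text
digital_universe_2020 = 44 * 10**12 * GB  # "44 trillion gigabytes"

# How many 5 TB drives hold as much as one gram of DNA?
drives_per_gram = dna_bytes_per_gram // hard_drive_bytes

# How many grams of DNA would hold the projected 2020 digital universe?
grams_for_universe = digital_universe_2020 / dna_bytes_per_gram

print(f"Drives replaced by one gram of DNA: {drives_per_gram:,}")
print(f"Grams of DNA for 44 trillion GB: {grams_for_universe:.1f}")
```

On these assumptions, one gram of DNA matches about 91 million such drives, and a little under 100 grams would hold the entire projected digital universe.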

To store a binary digital file as DNA, the binary digits (1 and 0) are converted to the letters A, C, G, and T. These letters represent the four bases of DNA: adenine, cytosine, guanine, and thymine. The physical storage medium is a synthesized DNA molecule containing these four bases in a sequence that corresponds to the order of the bits in the digital file. To recover the data, the DNA is sequenced, and the resulting string of A, C, G, and T is decoded back into the original sequence of 1s and 0s.
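The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the scheme used by any of the research groups mentioned: it assumes the simplest possible mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T), and the function names encode and decode are hypothetical. Practical schemes add error correction and avoid long runs of the same base, which this sketch ignores.

```python
# Assumed mapping: two bits per base. The article does not specify a mapping;
# this one is chosen purely for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)           # CACACATGCAAC
print(decode(strand))   # b'DNA'
```

Each byte becomes four bases, so the three-byte message "DNA" turns into a twelve-letter strand, and decoding the strand returns the original bytes exactly.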

High-throughput techniques with automated systems have made it much easier to build a DNA molecule from a digital code, and high-throughput replication techniques can create thousands of copies in just an hour or two.

The downside of DNA

The foremost limitations of DNA storage for practical use today are the exorbitant cost of making DNA for digital storage and its slow speed: encoding is slow, and access times run from many hours to days.

The speed issue largely limits the technology’s promise to archiving for now, although as the technology advances, speeds may improve to the point where DNA storage can function effectively as primary storage for daily applications.

As for the cost, expenses may well come down: advances in technology have already brought the cost of sequencing down from several million dollars to just hundreds of dollars. For now, DNA is best suited to archival purposes rather than to uses where files need to be accessed instantly.