DNA: The most efficient data storage system


Data Storage


Data storage is one of the software industry's most expensive activities, but also one of its most important: companies such as Google and Microsoft spend billions of dollars on it. These companies also build services on the data they hold. Google's primary source of income depends on the web page data it stores and returns through its search engine. So it makes sense for these massive tech companies to spend billions on large storage facilities called data centers. However, their storage systems are not the most efficient in the world. They consume a lot of energy to "stay alive" and require many backup copies in case one copy is destroyed. Here, we will discuss the most efficient method of storing data in the world.

<b><h2><center>Data Centers</b></h2></center>

At its simplest, a data center is a physical facility that an organization uses to house its critical applications and data. Its design is based on a network of computing and storage resources that enable the delivery of shared applications and data. In other words, a data center is a collection of many computers specialized to store and retrieve data as quickly as possible. Data centers can be as large as two- or three-story buildings, with large rooms dedicated to data storage units. A single data center can store multiple petabytes of data, depending on its size. The larger the data center, the more investment its maintenance and backup require.

<br><a href="https://ibb.co/6WBP9Qw" target="_blank" rel="noopener noreferrer"><img src="https://i.ibb.co/VqmSn69/google-datacenter.jpg" alt="google-datacenter" border="0"></a>

<b><h2><center>DNA</b></h2></center>

The human genome contains the complete <b>genetic information of the organism as DNA sequences stored in 23 chromosomes</b> (22 autosomal chromosomes and one X or Y sex chromosome), structures that are made of DNA and protein.
A DNA molecule consists of two strands that form the iconic double-helix “twisted ladder.” Its backbones, made of sugar and phosphate molecules, are connected by rungs of nitrogen-containing bases. DNA is composed of 4 different bases: <b>Adenine (A), Thymine (T), Cytosine (C), and Guanine (G)</b>. These bases always pair in the same way: Adenine with Thymine, and Cytosine with Guanine.

<br><a href="https://ibb.co/34b3Dh3" target="_blank" rel="noopener noreferrer"><img src="https://i.ibb.co/xXbkp5k/DNA.jpg" alt="DNA" border="0"></a>

<b><h2><center>Talking Numbers</b></h2></center>

Humans carry DNA, organized into 23 pairs of chromosomes, in every diploid cell of the body. If the DNA of a single cell is unfolded, it reaches a length of about 2 meters. However, because it is so compactly organized, the nucleus that contains the entire human genome measures no more than 6 µm in diameter. On top of all that, DNA manages to encode the instructions for 20,000 to 25,000 proteins using only 4 letters. If converted to digital media, a diploid genome holds about 1.5 gigabytes of data. Now, consider that the human body consists of tens of trillions of cells! Since DNA can encode 2 bits per nucleotide, one gram of dried DNA can store 455 exabytes (EB) of data. Here is a comparison of DNA storage with existing storage systems.

<br><a href="https://ibb.co/f4xBBBW" target="_blank" rel="noopener noreferrer"><img src="https://i.ibb.co/Gc5BBBD/DNA-comparison-to-current-devices.png" alt="DNA-comparison-to-current-devices" border="0"></a>

For comparison, the world has around 300 exabytes of digital data as of 2023. So it would take less than 1 gram of DNA to store all the world's digital data. Currently, Google has more than 120 data centers in place to store around 15 EB of data. It takes around $2M-$3M to maintain data in these data centers.
However, with DNA, they could keep all of that data in a negligible amount of space in their own offices instead of in separate data centers.

<b><h2><center>How to Store Data in DNA?</b></h2></center>

DNA encodes data in 4 bases (A, T, G, and C). We can map these 4 bases to 1s and 0s to store data and read it back later. Since there are 4 bases, each base can represent 2 binary digits. Let's say A is 00, T is 01, G is 10, and C is 11. Now we can convert characters into sequences of A, T, G, and C. If we use ASCII values, which take 8 binary digits to identify each character uniquely, each character can be stored in 4 bases. For example, the character 'a' is number 97 in ASCII, and 97 in binary is 01100001. Splitting this into the pairs 01, 10, 00, and 01 and following the above mapping, 'a' is written as TGAT. This is what we will store in DNA for the character 'a'.

<br><a href="https://ibb.co/jyq5fNx" target="_blank" rel="noopener noreferrer"><img src="https://i.ibb.co/7nBzCfT/DNA-read-write.png" alt="DNA-read-write" border="0"></a>

<b><h2><center>Advantages of DNA</b></h2></center>

DNA is at least 1000-fold denser than the most compact solid-state hard drive and at least 300-fold more durable than the most stable magnetic tapes. In addition, DNA’s four-letter nucleotide code offers a suitable coding environment that can be used, like the binary code used by computers and other electronic devices, to represent any letter, digit, or other character. <b>DNA storage is potentially less expensive, more energy-efficient, and longer lasting</b>. Studies show that DNA properly encapsulated with salt remains stable for decades at room temperature and should last much longer in the controlled environs of a data center. DNA doesn’t require maintenance, and files stored in DNA are easily copied for negligible cost.

<b><h2><center>Current Projects</b></h2></center>

The idea of DNA as digital storage was introduced in the 1950s.
Research has been going on ever since. One of the earliest uses of DNA storage occurred in a 1988 collaboration between artist Joe Davis and researchers from Harvard University. The image, stored in a DNA sequence in E. coli, was organized in a 5 x 7 matrix of 1s and 0s, with 1 depicting the light pixels and 0 the dark ones. In the last 10 years, there has been a lot of improvement, especially with EBI's introduction of error-correcting codes for DNA storage in a 2013 research paper. In March 2018, the University of Washington and Microsoft published results demonstrating the storage and retrieval of approximately 200 MB of data. The research also proposed and evaluated a method for randomly accessing data items stored in DNA. In March 2019, the same team announced that they had demonstrated a fully automated system to encode and decode data in DNA.

<b><h2><center>Davos Bitcoin Challenge</b></h2></center>

On January 21, 2015, Nick Goldman from the European Bioinformatics Institute (EBI), one of the original authors of the 2013 Nature paper, announced the Davos Bitcoin Challenge at the World Economic Forum annual meeting in Davos. During his presentation, tubes of DNA were handed out to the audience with the message that each tube contained the private key of precisely one bitcoin, encoded in DNA. The first person to sequence and decode the DNA could claim the bitcoin and win the challenge. The challenge was set to run for three years and would close if nobody claimed the prize before January 21, 2018. Almost three years later, on January 19, 2018, the EBI announced that a Belgian PhD student, Sander Wuyts, of the University of Antwerp and Vrije Universiteit Brussel, was the first to complete the challenge.
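
To make the mapping from the "How to Store Data in DNA?" section concrete, here is a minimal Python sketch of it. It assumes the article's illustrative mapping (A is 00, T is 01, G is 10, C is 11) and plain ASCII input. The function names `encode` and `decode` are hypothetical, chosen only for this example, and the sketch leaves out the error-correcting codes that real DNA storage schemes add.

```python
# Illustrative 2-bit mapping taken from the article: A=00, T=01, G=10, C=11.
# Real DNA storage schemes add error correction; this sketch does not.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(text: str) -> str:
    """Convert ASCII text into a string of bases, four bases per character."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    bits = "".join(f"{byte:08b}" for byte in data)  # 8 binary digits per character
    # Read the bit string two digits at a time and look up the matching base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(bases: str) -> str:
    """Convert a string of bases back into the ASCII text it encodes."""
    if len(bases) % 4 != 0:
        raise ValueError("expected four bases per character")
    bits = "".join(BASE_TO_BITS[base] for base in bases)  # KeyError on unknown base
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


if __name__ == "__main__":
    print(encode("a"))             # TGAT
    print(decode(encode("DNA")))   # DNA
```

Reading 01100001 two digits at a time gives 01, 10, 00, and 01, so `encode("a")` yields `TGAT`, and `decode` reverses the process by turning each base back into its two binary digits.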

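The figures in the "Talking Numbers" section can be checked with a short back-of-the-envelope calculation. The sketch below relies on two assumptions that do not come from the article: an average mass of about 330 g/mol per nucleotide, and about 6.2 billion base pairs in a diploid human genome. The helper names `bytes_per_gram` and `genome_bytes` are hypothetical. The 2 bits per nucleotide figure is the article's own.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # assumption: rough average mass of one nucleotide
BITS_PER_NUCLEOTIDE = 2            # from the article: 4 bases give 2 bits each
DIPLOID_BASE_PAIRS = 6.2e9         # assumption: approximate diploid human genome size

EXABYTE = 1e18   # bytes
GIGABYTE = 1e9   # bytes


def bytes_per_gram(mass_g_per_mol: float = NUCLEOTIDE_MASS_G_PER_MOL,
                   bits: int = BITS_PER_NUCLEOTIDE) -> float:
    """Estimate how many bytes one gram of DNA could hold."""
    nucleotides_per_gram = AVOGADRO / mass_g_per_mol
    return nucleotides_per_gram * bits / 8  # 8 bits per byte


def genome_bytes(base_pairs: float = DIPLOID_BASE_PAIRS,
                 bits: int = BITS_PER_NUCLEOTIDE) -> float:
    """Estimate the size of a genome, counting 2 bits per base pair."""
    return base_pairs * bits / 8


if __name__ == "__main__":
    print(f"{bytes_per_gram() / EXABYTE:.0f} EB per gram")
    print(f"{genome_bytes() / GIGABYTE:.2f} GB per diploid genome")
```

Under these assumptions the estimate comes out at roughly 456 EB per gram and about 1.55 GB per diploid genome, in line with the article's 455 EB and 1.5 GB figures.
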
- Ojas Srivastava, 12:27 PM, 20 Apr, 2024
