DNA Data Storage Could Redefine Cold Archives in the AI Data Era

Future Storage Column

DNA Data Storage May Not Replace SSDs, But It Could Redefine Cold Data

The world is generating data faster than conventional storage can comfortably absorb. DNA offers an extreme answer: ultra-dense, long-lived storage that uses almost no power after writing.

[Image: binary code turning into A, C, G, and T DNA letters, with server racks, a glowing DNA helix, and a sealed archive vial.]

Computers speak in binary. Every photo, video, document, model weight, medical record, and software file is ultimately converted into 0s and 1s. Life, however, stores information in a different alphabet: A, C, G, and T.

That is the basic idea behind DNA data storage. Digital data can be converted into DNA sequences, synthesized into artificial DNA strands, stored physically, and later read back through sequencing. The output can then be converted back into ordinary computer data.

The idea sounds strange at first, but biologically it is logical. DNA is nature’s information-storage system. It can preserve genetic instructions in a tiny physical space for very long periods. If biology can store life’s code this way, engineers are now asking whether human civilization can store digital memory the same way.

DNA storage begins by translating binary into four letters

A computer stores information with two symbols: 0 and 1. DNA uses four chemical bases: adenine, cytosine, guanine, and thymine. These are usually written as A, C, G, and T.

A simple encoding system can map binary pairs into DNA bases. For example, 00 could become A, 01 could become C, 10 could become G, and 11 could become T. In reality, practical DNA storage systems use more complex encoding because they must avoid error-prone patterns, manage sequencing mistakes, and include correction codes.
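The simple mapping described above can be sketched in a few lines of Python. This is only an illustration of the 00/01/10/11 example, not how any production system encodes data. The function names (`encode`, `decode`, `longest_run`) are hypothetical, and the run-length check stands in for just one commonly cited error-prone pattern: long repeats of the same base.

```python
# Illustrative 2-bit mapping from the example in the text:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T. Real systems use more complex codes.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a DNA string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(strand: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = current = 0
    previous = None
    for base in strand:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


strand = encode(b"Hi")
print(strand)                            # CAGACGGC
print(decode(strand))                    # b'Hi'
print(longest_run(encode(b"\x00\x00")))  # 8
```

The last line shows why the naive mapping is not enough on its own: two zero bytes become eight identical bases in a row, exactly the kind of pattern a practical encoding would need to avoid.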

But the broad process is easy to understand. A digital file is broken into binary code. That binary code is translated into DNA letters. A synthetic DNA molecule is manufactured with that sequence. Later, a sequencer reads the DNA, software decodes the sequence, and the original file is reconstructed.

In a hard drive or SSD, data is stored as magnetic or electronic states. In DNA storage, data is stored as a chemical sequence.

The first advantage is extreme density

The most powerful argument for DNA storage is density. Researchers often cite the estimate that one gram of DNA can theoretically store around 215 petabytes of data. One petabyte equals 1,000 terabytes, or 1,000,000 gigabytes.

That means DNA can store information at a scale that looks almost absurd compared with conventional devices. A data archive that would require racks of storage hardware could theoretically be compressed into a physical amount of DNA measured in grams.

This matters because data is growing faster than physical infrastructure. AI training data, video archives, scientific instruments, financial records, medical images, government archives, satellite data, and corporate backups all keep expanding. Even if SSDs and HDDs keep improving, physical footprint, cooling, electricity, replacement cycles, and maintenance become serious constraints.

DNA does not solve every storage problem. But for very large archives, its density changes the imagination of what storage infrastructure could look like.

DNA’s value is not that it makes today’s SSD faster. Its value is that it makes massive long-term archives physically tiny.

The Colossus comparison shows why density matters

Elon Musk’s xAI Colossus supercomputer is a useful way to understand the scale of the current AI infrastructure race. xAI says Colossus was built in 122 days and later doubled to 200,000 GPUs in a single interconnected cluster. Reuters has also reported xAI’s plan to expand toward at least one million GPUs.

That kind of AI data center is not just a computing facility. It is an industrial-scale project. It needs buildings, power, cooling, networking, servers, storage, water infrastructure, backup systems, maintenance teams, and constant capital spending.

AI infrastructure therefore creates two different storage problems. The first is active storage, where data must be accessed quickly during training and inference. DNA is not suitable for that. The second is long-term storage, where datasets, model versions, legal records, research archives, and historical data need to be preserved for years or decades. DNA could eventually compete in that second category.

If a future archive required 5,000 petabytes of long-term storage, the theoretical DNA equivalent at 215 petabytes per gram would be roughly 23 grams of DNA. That is not a commercial engineering estimate, because real systems need packaging, redundancy, error correction, indexing, and retrieval hardware. But it shows the direction of the density advantage.
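The arithmetic behind that figure is a single division, sketched here in Python. The 215 petabytes per gram value is the theoretical estimate quoted in this column. The `overhead_factor` parameter is a hypothetical knob for packaging, redundancy, and error correction. The value 10 used below is an arbitrary placeholder, not a measured number.

```python
PB_PER_GRAM = 215  # theoretical density estimate cited in the text


def grams_of_dna(archive_pb: float, overhead_factor: float = 1.0) -> float:
    """Grams of DNA needed for an archive of `archive_pb` petabytes.

    overhead_factor is a hypothetical multiplier for redundancy and
    error correction; 1.0 means the purely theoretical case.
    """
    if archive_pb < 0 or overhead_factor < 1.0:
        raise ValueError("archive_pb must be >= 0 and overhead_factor >= 1")
    return archive_pb * overhead_factor / PB_PER_GRAM


print(round(grams_of_dna(5_000), 1))                      # 23.3
print(round(grams_of_dna(5_000, overhead_factor=10), 1))  # 232.6
```

Even with a tenfold placeholder overhead, the result stays in the hundreds of grams, which is the point of the comparison rather than a prediction about any real product.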

Conventional storage needs space and electricity. DNA storage needs synthesis, sequencing, and careful preservation. The cost profile is completely different.

The second advantage is almost zero idle power

Data centers consume power even when data is not being actively used. Drives must be maintained. Servers must be cooled. Backup systems must remain available. Storage media must be refreshed or migrated before failure.

DNA is different. Once information is written into stable synthetic DNA and stored in a dry, protected capsule, it does not need electricity to keep existing. There is no spinning disk. There is no powered memory cell. There is no cooling requirement comparable to live data-center hardware.

This is why DNA storage is attractive for archival use. Many organizations hold data that must be preserved but rarely accessed. Film archives, government records, scientific datasets, legal documents, historical records, medical archives, and AI training datasets may need to survive for decades. They do not all need millisecond access.

For that kind of data, energy cost matters differently. The question is not how fast the data can be retrieved every second. The question is how safely and cheaply it can be preserved over long periods.

DNA storage is not designed for files you open every day. It is designed for data you cannot afford to lose for a very long time.

The third advantage is longevity

HDDs and SSDs have finite service lives. Hard drives have mechanical parts and magnetic media. SSDs have limited write endurance and electronic failure risk. Enterprise storage systems must be replaced, copied, refreshed, and migrated on regular cycles.

DNA can last much longer if stored properly. Ancient DNA can still be extracted from fossils under favorable conditions. Synthetic DNA stored dry and protected from heat, light, and moisture could remain readable for extremely long periods.

This is not a promise that every DNA archive will survive forever. Storage conditions still matter. DNA can degrade under heat, humidity, ultraviolet light, and chemical exposure. But compared with magnetic or electronic storage, the long-term preservation potential is very different.

For archives that need to survive beyond one hardware generation, this is a major advantage. Digital civilization has a hidden problem: even if the data exists, the device needed to read it may become obsolete. DNA, by contrast, is a universal biological format. As long as sequencing technology exists and the encoding scheme is documented, the storage medium remains interpretable.

Microsoft proved the concept years ago

Microsoft and the University of Washington were among the early leaders in DNA data storage research. In 2016, they announced that they had stored 200 megabytes of data in DNA strands. The stored files included text, images, and other digital content, and the team demonstrated error-free recovery.

Microsoft and University of Washington researchers later demonstrated automated DNA storage and retrieval with a small “hello” message. The important point was not the size of that file. It was the automation. A practical DNA storage system cannot depend on manual laboratory work forever. It must eventually become an integrated write-store-read machine.

CATALOG also drew attention by encoding the English-language text of Wikipedia into synthetic DNA. That demonstration showed that DNA storage could move beyond tiny symbolic files and into larger archive-like examples.

These milestones do not mean DNA storage is commercially ready at scale. But they show that the basic pipeline works: encode, synthesize, store, sequence, decode, and recover.

The question is no longer whether data can be stored in DNA. The question is whether it can be stored cheaply, quickly, and reliably enough for real customers.

The biggest weakness is speed

DNA storage has one obvious weakness: it is slow.

Writing data means synthesizing DNA. Reading data means sequencing DNA and decoding the result. Both processes are much slower than writing to or reading from an SSD, HDD, or tape library.

That makes DNA storage unsuitable for hot data. Hot data is data that must be accessed frequently and quickly. Databases, operating systems, active AI training pipelines, real-time video processing, financial trading systems, and cloud applications need fast access. DNA cannot compete there.

Cold data is different. Cold data is kept mainly for preservation or compliance, or for legal, scientific, cultural, or strategic reasons. It may be retrieved rarely. In that market, speed is less important than density, durability, cost per long-term archive, and energy use.

This is why DNA storage should not be framed as a direct threat to Samsung Electronics or SK hynix in the near term. DRAM, NAND, SSDs, and HBM serve active computing and high-speed storage needs. DNA storage is aimed at a different layer of the data stack.

DNAformer shows how AI may help solve the read problem

One promising area is faster and more accurate reading. In 2025, Technion researchers introduced DNAformer, an AI-based method for DNA data retrieval. The approach uses deep learning to improve decoding speed and accuracy when reading DNA-stored information.

Reports from Technion and related coverage described a major improvement in reading speed, including retrieval of 100 megabytes of data far faster than previous high-accuracy methods. The system also helps address errors such as base deletion, substitution, and ordering problems.

This matters because DNA storage is not only a chemistry problem. It is also an information-theory and software problem. DNA synthesis and sequencing create noise. Error-correction systems must reconstruct the original data even when some strands are damaged, missing, duplicated, or misread.
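One of the simplest ideas behind recovering data from noisy reads is to sequence many copies of the same strand and take a vote at each position. The Python sketch below shows only that idea. It is not DNAformer and not the method of any system named in this column. It assumes every read has the same length, so it handles substitutions but not deletions or insertions, which real decoders must also handle. The `consensus` function and the sample reads are made up for illustration.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote per position across equal-length reads of one strand."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch only handles equal-length reads")
    # zip(*reads) yields one column of bases per position in the strand.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Made-up noisy reads of the same strand, each with at most one substitution.
reads = [
    "CAGACGGC",  # clean copy
    "CAGTCGGC",  # substitution at position 3
    "CAGACGGC",  # clean copy
    "GAGACGGC",  # substitution at position 0
    "CAGACGGA",  # substitution at position 7
]
print(consensus(reads))  # CAGACGGC
```

Voting works here because the errors fall in different positions in different copies. Missing or shifted bases break that alignment, which is part of why decoding is treated as a hard software problem.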

AI can help by recognizing patterns, correcting errors, and improving decoding efficiency. In that sense, AI may not only create more data to store. It may also help make DNA storage practical enough to preserve that data.

AI is increasing the storage problem. But AI may also become part of the storage solution.

Atlas Data Storage shows the field is moving toward commercialization

In 2025, Twist Bioscience spun out its DNA data storage business into a new independent company called Atlas Data Storage. Atlas launched with $155 million in seed financing from investors including ARCH Venture Partners, Deerfield Management, Bezos Expeditions, Tao Capital Partners, Earth Foundry, Rsquared VC, In-Q-Tel, and others.

That investor list is important. Bezos Expeditions shows interest from long-term technology capital. In-Q-Tel, the strategic investor linked to the U.S. intelligence community, suggests that national-security and archival applications may be part of the market.

Atlas says it is building synthetic DNA storage for critical assets that need to remain readable long after hardware becomes obsolete. Later reports described Atlas’s Eon 100 offering as a DNA-based storage service aimed at dense, durable long-term archiving.

This does not mean DNA storage is ready for mass consumer adoption. Ordinary users will not replace external hard drives with DNA capsules soon. But enterprise archives, government agencies, media preservation groups, AI labs, and scientific institutions are different customers. They may pay for long-term preservation if the economics work.

The target market is not personal storage. It is institutional memory.

DNA storage makes the most sense where data has high long-term value and low access frequency.

One example is AI training data. Model developers may need to preserve datasets, model snapshots, training logs, benchmark records, alignment data, and compliance archives. Not all of that must be accessed every day, but losing it could be costly.

Another example is media. Film studios, broadcasters, streaming platforms, and cultural institutions hold enormous archives. Some content may not be commercially active today, but it can still have legal, historical, or future monetization value.

Healthcare is another candidate. Genomic records, medical images, clinical trial data, and regulatory records may need long retention. Financial institutions also face retention requirements for transaction records, contracts, and compliance documents.

In these markets, the storage question is not only capacity. It is durability, auditability, physical footprint, and migration cost. DNA storage could become attractive if it reduces the need to repeatedly copy old archives from one aging storage format to another.

Why this does not kill NAND, HDD, or tape

DNA storage should not be read as a near-term replacement for today’s storage industry. Each storage medium serves a different need.

DRAM is for working memory. NAND and SSDs are for fast storage. HDDs are still useful for large-scale cloud and enterprise capacity. Tape remains important for cold archives because it is relatively cheap and mature. DNA is trying to enter the archive layer with a radically different density and longevity profile.

The biggest barriers are cost, write speed, read speed, automation, standardization, and retrieval workflow. DNA synthesis is still too expensive for broad storage replacement. Sequencing is improving, but retrieval remains slower and more complex than conventional systems.

That means DNA storage is likely to begin as a premium archival product. It may first serve customers with special preservation needs rather than ordinary cloud storage users.

DNA storage is not a faster SSD. It is a possible successor to parts of the deep archive market.

The economics depend on synthesis cost

The hardest commercial problem is writing data. DNA must be synthesized, and synthesis cost remains a major bottleneck.

Reading DNA has benefited from the rapid improvement of sequencing technologies. Writing DNA at low cost and high throughput is harder. Until synthesis becomes much cheaper, DNA storage will remain expensive compared with tape and other archive media.

This is why companies such as Twist and Atlas matter. Twist’s core business is synthetic DNA manufacturing. If DNA storage is going to scale, it needs industrial DNA writing, not handcrafted laboratory synthesis.

In the long run, the market will watch three numbers: cost to write per terabyte, time to retrieve per file, and total cost of preservation over decades. DNA does not need to win on every metric. It only needs to win decisively in the specific archive markets where density and longevity matter most.
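The third number, total cost of preservation over decades, can be framed as a small formula: the one-time write cost, plus yearly upkeep, plus the cost of every migration to new media. The Python sketch below is a toy model under that assumption. The function name, its parameters, and every input value are hypothetical placeholders in arbitrary cost units, not market prices for tape, DNA, or anything else.

```python
from typing import Optional


def preservation_cost(
    write_cost: float,
    yearly_upkeep: float,
    years: int,
    migration_every: Optional[int] = None,
    migration_cost: float = 0.0,
) -> float:
    """Toy total cost per terabyte over `years` (arbitrary cost units).

    migration_every is the hypothetical number of years between media
    migrations; None means the medium is never migrated.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    if migration_every is not None and migration_every < 1:
        raise ValueError("migration_every must be at least 1")
    # Count migrations that happen before the end of the horizon.
    migrations = 0 if migration_every is None else (years - 1) // migration_every
    return write_cost + yearly_upkeep * years + migrations * migration_cost


# Placeholder inputs only: a cheap-to-write medium that is migrated every
# 5 years, versus an expensive-to-write medium that is never migrated.
print(preservation_cost(10, 1, 30, migration_every=5, migration_cost=4))  # 60
print(preservation_cost(40, 0.5, 30))                                     # 55.0
```

With different placeholder inputs the comparison flips, which is the useful part of the exercise: a high write cost can only be offset by low upkeep and avoided migrations over a long enough horizon.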

The strategic importance may be larger than the commercial market at first

DNA storage may become strategically important before it becomes a mass market.

Intelligence agencies, defense organizations, research institutions, national archives, and large AI labs all have reasons to preserve sensitive or high-value data for very long periods. They may also care about physical compactness and offline security.

A DNA archive can be physically tiny and disconnected from networks. That does not automatically make it secure, but it changes the attack surface. A capsule containing synthetic DNA is not the same kind of target as an internet-connected server.

This is why government-linked interest is unsurprising. For strategic archives, the question is not whether the technology is cheap enough for consumers. The question is whether it can preserve critical information better than existing media under extreme long-term conditions.

What investors should watch

The first thing to watch is synthesis cost. If DNA writing becomes dramatically cheaper, the market opportunity expands.

The second is retrieval speed. Technologies such as DNAformer are important because archive customers still need practical access. Cold data does not need instant retrieval, but recovery cannot take too long or cost too much.

The third is commercial packaging. DNA must be stored in formats that enterprises can handle: capsules, cartridges, indexing systems, software interfaces, audit trails, and retrieval workflows.

The fourth is customer adoption. Media archives, AI labs, government records, scientific data centers, financial institutions, and healthcare organizations will be the early test markets.

The fifth is standardization. A storage medium only becomes trusted when future users believe they can still read it decades later. DNA has the advantage of biological universality, but encoding standards, metadata, and error-correction schemes also need durability.

Conclusion: DNA storage is not science fiction anymore, but it is still not ordinary storage

DNA data storage has moved beyond pure imagination. Researchers have encoded and recovered real digital files. Microsoft and the University of Washington stored and recovered 200 megabytes. CATALOG stored the English-language text of Wikipedia. Technion’s DNAformer shows how AI can improve retrieval. Atlas Data Storage is trying to turn synthetic DNA storage into a commercial archive product.

But the technology is still early. It is slow compared with conventional storage. Writing costs must fall. Retrieval workflows must improve. Enterprise systems must become easier to use.

The right way to understand DNA storage is not as a replacement for SSDs, HDDs, or HBM. It is a new category for data that must be preserved for a very long time with extreme density and low idle power.

That market may become more important as AI makes the world’s data problem bigger. The more data humanity creates, the more valuable ultra-dense, low-power, long-term storage becomes.

The simplest way to read DNA data storage is this: it will not run your apps or train AI models in real time, but it may become one of the most powerful ways to preserve the data that future AI, science, finance, medicine, and culture cannot afford to lose.
