Sunday, 5 February 2012

DNA as a digital data storage medium

This was an idea to use DNA to store data, since DNA is now very cheap to sequence. In the same way that people use magnetic tape for long-term storage, you could back up to DNA...

The idea was developed with the help of Utopiah, Kanzure, and others in irc://irc.freenode.net/##hplusroadmap

and was added to the awesome (but sadly now historical) Seedea project, here:
http://fabien.benetou.fr/innovativ.it/www/HistoricalArchives/Seedea/Oimp/MetaDNA

where details were thrashed out by various people.

This idea depends on DNA printing technology.





EDIT: Here is the text from the Seedea page (around 2009):



metaDNA

General discussion

Use the dedicated page.

Goal

To encode data as DNA, allowing vast quantities of data to be stored 'in a cupboard'. Advantage: we get massive data transfer rates by shipping DNA (e.g. using UPS).
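The 'massive data transfer rates' claim is simple arithmetic: data shipped divided by transit time. The sketch below assumes, purely for illustration, 1 gram of DNA at the 10^21 bits per gram quoted in the abstract, delivered in 24 hours; the payload mass and transit time are hypothetical.

```python
# Effective "bandwidth" of shipping DNA by courier.
# Hypothetical assumptions: 1 g of DNA, 10**21 bits per gram
# (the density quoted in the abstract), 24 h door-to-door transit.

BITS_PER_GRAM = 10**21       # density figure quoted in the abstract
GRAMS_SHIPPED = 1.0          # hypothetical payload mass
TRANSIT_SECONDS = 24 * 3600  # hypothetical overnight delivery


def shipping_bandwidth_bps(bits_per_gram: float, grams: float, seconds: float) -> float:
    """Bits delivered per second of transit time."""
    return bits_per_gram * grams / seconds


bps = shipping_bandwidth_bps(BITS_PER_GRAM, GRAMS_SHIPPED, TRANSIT_SECONDS)
print(f"{bps:.3e} bit/s")  # 1.157e+16 bit/s, roughly 11.6 Pbit/s
```

This counts transit time only; the time needed to write (synthesize) and read (sequence) the DNA at either end is not included and would dominate in practice.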

Abstract

New advances in DNA sequencing technology promise to revolutionize the fields of biology and health care. The Human Genome Project, initiated in 1990, took 13 years to complete at a cost of approximately $3 billion [cite HGP]. Today, obtaining the complete sequence of an individual costs 100,000 times less, approximately $30,000, and takes approximately 1 day [cite 1000 GP]. Progress in DNA sequencing technology shows no sign of slowing: it is estimated that capacity increases by a factor of 100 every year [cite Richard Durbin, personal communication].
However, these phenomenal technological advances in molecular biology have created a new bottleneck in the scientific discovery pipeline: the cost of data storage.
DNA can store 10^21 bits per gram (http://www.sciencemag.org/cgi/reprint/296/5567/478.pdf). This compares favorably with conventional storage at around 10^14 bits per gram (Blu-ray: 200 GB per 16 g disc), a roughly ten-million-fold (10^7) improvement. How can we effectively use this awesome storage capacity?
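The density figures can be checked with a few lines of arithmetic. This takes the quoted numbers at face value (10^21 bits per gram for DNA, 10^14 bits per gram for conventional storage, and a 200 GB Blu-ray disc weighing 16 g, with decimal gigabytes assumed).

```python
# Quick check of the storage-density figures quoted in the abstract.
DNA_BITS_PER_GRAM = 10**21           # figure quoted for DNA
CONVENTIONAL_BITS_PER_GRAM = 10**14  # figure quoted for conventional storage

# Blu-ray example from the text: 200 GB on a 16 g disc (decimal GB assumed).
BLURAY_BYTES = 200 * 10**9
BLURAY_GRAMS = 16


def bits_per_gram(n_bytes: int, grams: float) -> float:
    """Storage density in bits per gram (8 bits per byte)."""
    return n_bytes * 8 / grams


bluray = bits_per_gram(BLURAY_BYTES, BLURAY_GRAMS)
print(f"Blu-ray: {bluray:.1e} bits/g")  # 1.0e+11
print(f"DNA / conventional: {DNA_BITS_PER_GRAM / CONVENTIONAL_BITS_PER_GRAM:.0e}")  # 1e+07
print(f"DNA / Blu-ray: {DNA_BITS_PER_GRAM / bluray:.0e}")  # 1e+10
```

On these numbers the Blu-ray example works out to about 10^11 bits per gram, so it is not the same figure as the 10^14 bits per gram quoted for conventional storage; the ratio of the two headline figures is 10^7.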
Here we propose DNA as an alternative storage medium for the long-term archival of data. We present a DNA encoding algorithm that is optimized for data recovery, outline a novel design for a microfluidic DNA sequencing chip, and describe a DNA protectant that will allow DNA to be stored long-term in ambient conditions.

Problem abstraction

The problem with 'next generation' DNA sequencing (nextGen) is that it is too good. The technologies are generating too much data too quickly. Simply put, we don't have enough hard disks to keep pace with the data storage requirements.
How do you cope with this situation?

In situ example

  1. Company A gathers a sample S from a living organism
  2. Company A studies it and produces a result R: a very large amount of data, including specific DNA samples (original and modified)
  3. Company A works for a client K that requires additional work on R, and possibly S, by Company B
  4. R+S must be shipped from A to B as fast as possible
  5. We encode R+S in P using our specific method and ship it to B

Proposed solutions

Design a 'DNA encoding' that maximizes ease of reading:
  1. the DNA encoding: lots of checksums and handling of repeat regions
  2. the 'DNA protectant' molecule that we use to store data at RTP (room temperature and pressure)
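The encoding item above could be sketched as follows. This is a minimal illustration under stated assumptions, not the Seedea design: it swaps in a rotating ternary code (each base is chosen from the three nucleotides that differ from the previous one, so the output never contains a repeat run; Goldman et al. later used a similar trick in 2013) and a single CRC-32 checksum over the whole payload. All function names are hypothetical.

```python
import zlib

BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Each byte becomes 6 base-3 digits (3**6 = 729 >= 256), most significant first."""
    trits = []
    for b in data:
        digits = []
        for _ in range(6):
            digits.append(b % 3)
            b //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)  # raises ValueError if a corrupted group exceeds 255
    return bytes(out)


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Rotating code: each trit picks one of the 3 bases that differ from the previous base."""
    seq = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        seq.append(prev)
    return "".join(seq)


def dna_to_trits(seq: str, prev: str = "A") -> list[int]:
    trits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))  # a repeated base raises ValueError
        prev = base
    return trits


def encode(data: bytes) -> str:
    """Append a CRC-32 checksum, then map the payload to a repeat-free DNA string."""
    payload = data + zlib.crc32(data).to_bytes(4, "big")
    return trits_to_dna(bytes_to_trits(payload))


def decode(seq: str) -> bytes:
    """Invert encode(); raise ValueError if the checksum does not match."""
    payload = trits_to_bytes(dna_to_trits(seq))
    data, crc = payload[:-4], payload[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != crc:
        raise ValueError("checksum mismatch")
    return data


strand = encode(b"hello, DNA")
assert decode(strand) == b"hello, DNA"
```

Avoiding repeats trades some density (log2(3), about 1.58 bits per base instead of 2) for a sequence that is easier to read back; a real scheme would also split data into short overlapping fragments with per-fragment indexes and checksums.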

Complete process

  1. ?
  2. design a microfluidic DNA sequencing chamber
  3. ?

Opportunity

Microfluidics is getting very cheap, so it's easy to design and print a 'chip' that will control the flow of A, T, C and G into a reaction chamber.
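One way a chip that controls the flow of A, T, C and G into a reaction chamber could produce a read is cyclic flow, as in pyrosequencing: one nucleotide type enters at a time, and the signal records how many bases were incorporated. The sketch below is an idealized, noise-free simulation under that assumption; the flow order and function names are hypothetical.

```python
# Hypothetical cyclic flow order; real instruments choose their own.
FLOW_ORDER = "TACG"


def flowgram(read: str, flow_order: str = FLOW_ORDER, n_flows: int = 8) -> list[tuple[str, int]]:
    """Simulate cyclic nucleotide flows over `read`.

    On each flow a single nucleotide type enters the chamber and is
    incorporated once per matching base in the current run, so a
    repeat run of length n gives one signal of height n.
    """
    pos = 0
    signals = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Turn an idealized flowgram back into a sequence."""
    return "".join(nuc * count for nuc, count in signals)


print(flowgram("TTACG", n_flows=4))  # [('T', 2), ('A', 1), ('C', 1), ('G', 1)]
```

In a real chamber the signal height is noisy, so telling a run of 5 from a run of 6 is hard; this is one reason the encoding should avoid repeat regions.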

Business model

  • cost optimization, to be advertised during difficult times of "pipe clogging" and high energy costs (logistics, network congestion)

Market and trends

  • Familybuilder DNA on Sale: Familybuilder Introduces Low Cost Testing
  • AsperBio How to send DNA samples?

Alternatives

  • efficient data compression. One human is much like another at the genetic level. Perhaps we can simply compress the data produced by the personal genomics initiative (for example) against a 'reference' genome.
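Compressing against a reference genome amounts to storing only where a sample differs from the reference. The sketch below is a deliberately simplified illustration that assumes equal-length sequences and substitutions only (no insertions or deletions); the function names are hypothetical. Real reference-based formats such as CRAM also handle indels and quality scores.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) for every position where `sample` differs from `reference`."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences (substitutions only)")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def apply_diff(reference: str, diff: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus its list of substitutions."""
    seq = list(reference)
    for i, base in diff:
        seq[i] = base
    return "".join(seq)


ref = "ACGTACGTAC"
sample = "ACGTTCGTAA"
print(diff_against_reference(ref, sample))  # [(4, 'T'), (9, 'A')]
```

Because two humans share most of their genome, the list of differences is far smaller than the full sequence, which is the point of the alternative above.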

To explore

  • "retarded polymerase"

Important related patents

Contacts who might be interested

  • Paola (positive about the idea but concerned about the ethical or moral slippery slope)
  • Laurent (feedback not yet requested)
  • Kenza (feedback not yet requested)
  • Contacts from KAO (genetic engineering in the Bay Area)

Marketing

Project name

  • metaDNA doesn't really sum it up.
  • EnCodeMe
  • DNAStore
  • DNA bank
  • Molecular Storage
  • DNCode
  • DNAta
  • DNA backup
  • DNA Storage
  • DNA data
  • backDNA
  • DNAstics
  • DNA Logistics

Tag line

DNA is a fantastic storage medium. It has a track record of 4 billion years.
DNA, it'll store your ass off.

1 comment:

Kill Face said...

http://www.nature.com/news/synthetic-double-helix-faithfully-stores-shakespeare-s-sonnets-1.12279