$5000 Human Genome Sequenced, Extinct Retroviruses and All
Sequencing a complete human genome, repeat sequences and all, is quite the task. Of its roughly 3 billion base pairs, only ~3% are ‘genes’ - the rest has long been labeled ‘junk’ DNA, a large share of which is repetitive sequence left behind by mobile elements, including ancient retroviral insertions. Most high-throughput sequencing technologies work by generating short ‘reads’, typically tens to a few hundred base pairs long, and then aligning those reads, usually to a reference genome. When a read falls entirely within a repeat sequence, it matches many locations equally well, so there is no way to place it uniquely in the genome.

Current consumer services such as 23andMe do not sequence the genome at all. They use hybridization arrays to look for single nucleotide polymorphisms (SNPs), interrogating several hundred thousand positions - nowhere near a complete genome. Much of the last 10 years of DNA research has been based on analyzing these SNPs to assign them roles in disease processes, drug metabolism, etc. However, new data and theories are assigning roles in development and disease to this ‘junk’ DNA, so working out where it sits in each individual is now a priority.

Whole-genome sequencing with current technologies costs about $100,000 and takes longer than a year to complete. Complete Genomics says it will sequence everything, junk and all, for $5000. The company plans 1000 genomes in 2009 and 20,000 genomes in 2010. At this cost and scale, nearly everyone in the U.S. may soon have their own genome in hand!
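
To make the read-placement problem described above concrete, here is a minimal Python sketch. It is a toy illustration, not how any real aligner or Complete Genomics' pipeline works: the reference string, the repeat element, and the helper `find_placements` are all made up for this example, and matching is exact with no sequencing errors.

```python
from typing import List


def find_placements(reference: str, read: str) -> List[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flanks around two copies of one repeat.
REPEAT = "ACGTACGTTAGC"
REFERENCE = "GGATCCTA" + REPEAT + "TTGACCAG" + REPEAT + "CCATGGAA"

unique_read = "CTAACGT"   # spans the junction between a unique flank and the repeat
repeat_read = "GTACGTTA"  # lies entirely inside the repeat

unique_hits = find_placements(REFERENCE, unique_read)
repeat_hits = find_placements(REFERENCE, repeat_read)

print("junction read placements:", unique_hits)
print("repeat read placements:  ", repeat_hits)
```

The read that overlaps unique flanking sequence has a single placement, while the read lying wholly inside the repeat matches both copies equally well. In a real genome a repeat can occur in thousands of copies, which is why reads that fall wholly inside one are so hard to place.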