The 1000 Genomes Project, an ambitious international research project to completely sequence the genomes of 1,000 people from all over the world in three years, announced this week that it will be getting help from three companies in the form of new sequencing technology.
“It is a win-win arrangement for all involved,” consortium co-chair Richard Durbin of Wellcome Trust Sanger Institute said in a statement. “The companies will gain an exciting opportunity to test their technologies on hundreds of samples of human DNA, and the project will obtain data and insight to achieve its goals in a more efficient and cost-effective manner than we could without their help.”
A half-dozen people have had their entire genomes sequenced so far, including scientists James Watson, Craig Venter and, most recently, Marjolein Kriek. Within three years, the 1000 Genomes Project alone is expected to raise that number by roughly 17,000 percent, from about six to more than 1,000.
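The percentage can be checked with a quick calculation. The sketch below is only a rough check under simple assumptions taken from the text: a starting count of six (the "half-dozen") with the project's 1,000 genomes added on top. Neither number is an official tally, and the helper name `percent_increase` is our own.

```python
def percent_increase(before: int, after: int) -> float:
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100

# Assumed inputs: about six genomes sequenced so far, plus the project's 1,000.
before = 6
after = before + 1_000

increase = percent_increase(before, after)
print(f"{increase:,.0f} percent")
```

This works out to about 16,700 percent. That rounds to the 17,000 percent figure, and the figure grows if more than 1,000 people end up being sequenced.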
Over the next year, three companies will each sequence about 75 billion DNA bases: Applied Biosystems of Foster City, California (which on Wednesday became part of another California-based biotech company, Invitrogen of Carlsbad), Illumina Inc. of San Diego, and 454 Life Sciences of Branford, Connecticut. Because each human genome contains about 3 billion DNA base pairs, each company's share is roughly equivalent to 25 human genomes.
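The genome-equivalent figure is simple division. The sketch below uses the article's round numbers and treats the output as raw volume only. It ignores the fact that genomes are read many times over, so it does not count distinct people. The three-company combined total is our own derivation from those numbers, not a figure the companies announced.

```python
# Rough conversion from raw sequenced bases to whole-genome equivalents.
BASES_PER_GENOME = 3_000_000_000       # approximate size of one human genome
BASES_PER_COMPANY = 75_000_000_000     # each company's planned output for the year
COMPANIES = ["Applied Biosystems", "Illumina", "454 Life Sciences"]

def genome_equivalents(bases: int, genome_size: int = BASES_PER_GENOME) -> float:
    """How many genomes' worth of sequence a given number of bases represents."""
    return bases / genome_size

per_company = genome_equivalents(BASES_PER_COMPANY)
combined = genome_equivalents(BASES_PER_COMPANY * len(COMPANIES))

print(f"Per company: {per_company:.0f} genomes")
print(f"All three companies: {combined:.0f} genomes")
```

Run as written, this gives 25 genome equivalents per company and 75 for all three together.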
But there’s a difference between thoroughly scanning a genome sequence and merely skimming it. The 1000 Genomes Project, which builds on the International HapMap Project, has outlined three pilot projects for this year that use both kinds of sequencing to work out how best to reach its end goal.
First, researchers want to produce very thorough, high-resolution genome sequences for two families of three, each made up of two parents and their child. By reading each point in the genome an average of 20 times, the researchers will gain practice with the new sequencing technologies, make sure they don’t miss any genetic variation and get a good sense of each individual’s genetic makeup.
The second pilot involves producing low-resolution genome sequences of 180 people, reading each point a minimum of only two times, in order to discover genetic variations that appear in about 1% of the population.
The final pilot will have researchers sequence the protein-coding regions of 1,000 genes in 1,000 people in greater depth. The aim is to identify rarer genetic differences, the kind that show up in roughly one person in 1,000, in these regions of particular research interest.
Since its launch in January, the project has sequenced 240 billion bases of genetic information and deposited them in public research databases in Europe and the United States. The project’s goal is to use this information to give biomedical researchers a broader view of human genetic variation.
How much information would researchers have if each of the 1,000 people whose DNA is part of the project received the same high-resolution, 20-fold sequencing being done on the six people in the first pilot? That data, a whopping 60 terabases’ worth, would fill 375 160GB iPod Classics, 13,000 DVDs or 85,700 CDs.
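The storage figures above appear to follow from a few assumptions. We take these to be one byte (one text character) per base, decimal gigabytes, and typical disc capacities of 4.7GB for a single-layer DVD and 700MB for a CD. None of these assumptions is stated in the original, so treat the sketch below as a plausible reconstruction of the arithmetic, not the method actually used.

```python
# Rough storage estimate for the full-scale, 20-fold scenario.
PEOPLE = 1_000
BASES_PER_GENOME = 3_000_000_000
COVERAGE = 20                     # each point in the genome read about 20 times
BYTES_PER_BASE = 1                # assumption: one byte (one character) per base

total_bases = PEOPLE * BASES_PER_GENOME * COVERAGE
total_gb = total_bases * BYTES_PER_BASE / 1e9   # decimal gigabytes

# Assumed capacities in decimal GB: 160GB iPod Classic, single-layer DVD, standard CD.
MEDIA_GB = {"160GB iPod Classic": 160, "DVD": 4.7, "CD": 0.7}

print(f"{total_bases / 1e12:.0f} terabases = {total_gb:,.0f} GB")
for medium, capacity in MEDIA_GB.items():
    # Number of devices or discs needed to hold the whole data set.
    print(f"{medium}: {total_gb / capacity:,.0f}")
```

Under these assumptions, the sketch gives 60 terabases (60,000GB), 375 iPods, about 12,800 DVDs and about 85,700 CDs. Those round to the article's figures.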