The structure of DNA was first publicly described 55 years ago at Cold Spring Harbor Laboratory (CSHL) on Long Island in New York by James Watson. Thursday night, the now 80-year-old Watson opened up the 2008 Personal Genomes meeting at CSHL by telling the story of the origins of the Human Genome Project, which he headed from 1990 to 1992. In 2003 the Human Genome Project produced a (nearly) complete reference DNA sequence of a human genome that is now essential to basic and applied human genetic research.
These days, Watson pointed out, scientists are able to read the DNA letters of the double helix so quickly and inexpensively that it is becoming practical to sequence the genomes of large numbers of people. With this progress comes a flood of research questions, technological challenges, and hope that these insights from the lab will translate into advances in personalized medicine.
Watson was followed on Thursday night by Francis Collins, who also followed him as director of the Human Genome Project, and later by Mary-Claire King, the renowned breast cancer geneticist from the University of Washington. Collins pointed out that health care costs have risen steadily over the years to the current level of roughly 20% of the US GDP. How much of this is spent on treatments that might have been identified as unnecessary with the availability of genetic information? He suggested that widespread genomic sequencing and analysis could lead to the discovery of the genetic causes of common diseases, such as lung cancer and Type II diabetes, for which some genetic links are now known, but much more remains to be learned.
Mary-Claire King considered breast cancer as a case study for personalized medicine. In the case of breast cancer, she noted, there are more than a thousand known mutations in each of the genes BRCA1 and BRCA2 that can predispose a woman to the disease. Many of these are unique to specific families or specific localities; she gave the example of one BRCA mutation endemic to a Norwegian valley. King illustrated through recent breast cancer studies that linking newly discovered mutations to disease is a formidable technical challenge, but emphasized that the rewards for success would be immense: roughly 5% of new breast cancer cases in the US each year – around 10,000 – are linked to known BRCA1/2 mutations, and thus might have been prevented through such measures as prophylactic mastectomy.
Friday moved into reports from the trenches. The morning session consisted of talks by researchers from major genome sequencing centers and from the companies behind the so-called “next generation” sequencing methods that underlie this conference. The new technologies, namely Illumina’s Solexa, 454’s FLX, and ABI’s SOLiD, follow the same general plan as the venerable Sanger sequencing method: scan short fragments, or ‘reads’, of DNA letters, and then reconstruct the original sequence from the reads. They just do it much faster than before, mainly by scanning many reads in parallel. Much of the concern these days centers on the reliability of these new techniques – considering that a single changed DNA letter can mean the difference, for example, between getting Alzheimer’s or not – and so the presentations tended to focus on technical topics like error rates and comparisons across platforms. Even so, there were suggestions that some exciting new scientific findings might be around the corner; Richard Gibbs of Baylor showed early data from his group’s sequencing of a HapMap trio (a father, mother and child) suggesting that the human mutation rate might be much higher than previously thought. And Elaine Mardis from Washington University showed that her lab had been able to find mutations unique to tumor tissue in a lung cancer patient. Known as somatic mutations, they had arisen during the patient’s lifetime, and were not found in non-tumorous skin tissue from the same patient. Her study did not show that any one of these mutations had actually caused the cancer, but the demonstration that such changes can be found at all is intriguing.
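For readers who like to see the "reads, then reconstruct" idea in code, here is a toy sketch in Python. It is an illustration only, not how any of the platforms or sequencing centers mentioned above actually process their data: real pipelines have to cope with sequencing errors, repeats, and both DNA strands, and often align reads to the reference genome rather than assembling from scratch. The function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example sequence are all made up for this post.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: keep merging the pair of reads that overlap most.

    Assumes error-free reads from a single strand (a big simplification).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left overlaps; return the separate pieces
        # Join the two reads, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # A made-up 15-letter "genome", chopped into overlapping 7-letter reads.
    original = "ATGGCGTACGTTAGC"
    reads = [original[i:i + 7] for i in range(0, len(original) - 6, 2)]
    print(reads)
    print(greedy_assemble(reads))
```

On this small example the five reads stitch back together into the original 15-letter string. Greedy merging is the simplest possible strategy and can be fooled by repeated stretches of sequence, which is one reason the error-rate and platform-comparison talks matter so much.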
The afternoon session moved into the imposing task of storing, processing and interpreting the flood of data these new technologies generate. Paul Flicek from the European Bioinformatics Institute produced that rarest of things, the funny bioinformatics talk, in describing the travails of dealing with the 100 terabytes (that’s 100,000 gigabytes, or 100 million megabytes) generated so far by the pilot phase of the 1000 Genomes Project, and the specter of dealing with a petabyte (1,000 terabytes) of sequence data. Carlos Bustamante of Cornell described some of the insights into human evolutionary history that have been made possible by the DNA deluge, including using sequence data to infer possibly the most detailed models yet of historical human population size and migrations. He also described his lab’s and John Novembre’s recent findings on the relationships between geography and human genetics, a topic we’ve blogged on recently at the Blog here and here.
There’s another big day of talks to come here at CSHL. I’m glad to be here keeping up to date on the latest research, so we can incorporate it into 23andMe, and to show off the site to a bunch of people on the cutting edge of genetics.