We’re a little late to the party, but yesterday the consortium of scientists behind a project called the Encyclopedia of DNA Elements, or ENCODE, released what some are calling a roadmap of the human genome.
More than 400 scientists worked on the project, which the National Institutes of Health funded over the last five years at a cost of about $130 million. The announcement came with a blizzard of 30 papers in the journals Nature and Genome Research, and many more are expected to come.
There has been some great coverage that both explains the background and gives some context, such as a nice piece by Nature writer Brendan Maher. Discover Magazine blogger Ed Yong did a yeoman’s job covering the history of ENCODE’s work and what the future may hold. And Gina Kolata’s article in the New York Times includes our favorite quote. It came from Eric Lander, president of the Broad Institute, who compared the data to a sort of Google Map of the genome and said it was a stunning resource for researchers.
“My head explodes at the amount of data,” Lander said.
Us too, and the challenge is really giving people a sense of what it all means. Much of the work in genetics has been focused on the areas in our DNA that code for proteins, but this is a very small part — just 1 percent — of our genome. The ENCODE project is an attempt to map the other 99 percent, the uncharted “dark matter” of our genome.
It turns out there’s a lot of interest there. The work sheds some light on the importance of what once was labeled “junk DNA.” Unsurprisingly, it’s not junk: the data from ENCODE provide evidence that many of those mysterious regions of the genome play important functional roles.
For us at 23andMe, the ENCODE data shed more light on some of the genetic variations (mainly single nucleotide polymorphisms, or SNPs) that scientists have associated with health conditions and traits. Shirley Wu, who heads up our curation and science content team, took a gander at the data using the SNPs in our coronary heart disease report and came up with a few examples of how this new data set can help inform us and our customers.
In our coronary heart disease report we look at 15 different SNPs that are associated with risk for coronary heart disease in people with European ancestry. One of those SNPs is rs10757278 in the “9p21” region of the genome. Many SNPs in the 9p21 region are associated with coronary heart disease, and based on the ENCODE data, it appears that this SNP and rs1333047 are the primary functional ones. Of the various SNPs in the region, these two are the most likely to affect the binding of transcription factors, which are proteins that bind to DNA and regulate the expression of genes. The two SNPs are perfectly correlated in people with European ancestry, and in that same population the G version of rs10757278 is associated with higher odds of coronary heart disease.
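For readers who like to see what “perfectly correlated” means in practice, here is a rough sketch in Python. The genotypes below are made up purely for illustration (they are not 23andMe or ENCODE data), and the helper `r_squared` is our own hypothetical function. It assumes each person’s genotype is coded as the number of copies (0, 1 or 2) of one version of the SNP, and it uses a plain squared correlation on those counts, which is a simplification of how geneticists usually measure this.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two lists of allele counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov * cov) / (var_x * var_y)


# Made-up allele counts for eight hypothetical people (NOT real data).
snp_a = [0, 1, 2, 1, 0, 2, 1, 1]      # stands in for rs10757278
snp_b = [0, 1, 2, 1, 0, 2, 1, 1]      # stands in for rs1333047: always matches snp_a
snp_other = [0, 1, 1, 2, 0, 2, 0, 1]  # a hypothetical third SNP that only partly tracks snp_a

print(r_squared(snp_a, snp_b))      # perfectly correlated pair
print(r_squared(snp_a, snp_other))  # imperfectly correlated pair
```

When two SNPs always travel together like `snp_a` and `snp_b`, the value is 1, and an association study on its own cannot say which of them is doing the work. That is the gap that functional data such as ENCODE’s can help fill.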
Another example is the SNP rs964184 located near the APOA5 gene, which has also been associated with cholesterol levels. The ENCODE data show that it likely affects binding of transcription factors and results in differences in expression of a “target” gene of those transcription factors. The G version of rs964184 is associated with higher odds of coronary heart disease in Europeans.
These examples highlight how this new data can inform the “why” and “how” of genetic associations. Where before scientists often had very little idea which exact SNP was involved and why it was associated with a disease, now we can narrow down the SNPs likely to be responsible and start looking at their biological impact. The ENCODE data show what Kolata describes as at least four million gene switches in our DNA that collectively control how the cells, organs and other tissues in our bodies develop and behave.
Though there’s still much to learn about this newly revealed and incredibly complex genetic circuitry, it’s clear that the data will be a resource for many scientists working in genetics, including us here at 23andMe.