Labs Remove Genetic Data from Public Databases After Forensic Breakthrough

Yesterday we reported on a new statistical method that can establish the presence of a single individual’s genetic signature in a sample containing DNA from hundreds of different people. The method has enormous potential in forensics, because DNA samples from crime scenes and mass disasters often contain genetic material from more than one person.

But what’s promising in one context turns out to be troublesome in another. As we mentioned in yesterday’s post, the new method could in principle be used to identify individuals whose genetic data is held in several public databases. Those databases aggregate information collected by large genome-wide association studies, which typically involve thousands of subjects.

It was previously assumed that aggregating the data of hundreds or even thousands of people — essentially giving the overall genetic composition of the group as a whole — would make it impossible to identify any one person in it.

But yesterday’s paper proved that assumption wrong. It has since been reported that on Monday, four days before the paper came out, the National Center for Biotechnology Information pulled aggregated data from its Database of Genotypes and Phenotypes (dbGaP). The Broad Institute of MIT and Harvard and the Wellcome Trust in Britain have also removed aggregate data from public view.

Most study protocols don’t allow study subjects to see their own genetic data. So if aggregated data had stayed on those databases, study subjects’ genetic data would have been publicly available, at least in theory, even though they themselves never had access to it!

Hypothetical scenarios aside, it is highly unlikely that any person has ever actually been picked out of an aggregate database, and not just because the mathematics of the new method are so complex. For one thing, there is little incentive to do so. And anyone who did want to would have to obtain and genotype a biological sample from the person they were interested in.

Because 23andMe does not publicly release its customers’ data in any form, the new method does not present an issue for us. Even so, we will continue to monitor this and other developments in genetics and security to ensure our customers’ privacy.
