SNPwatch: Uncertainty Surrounds Longevity GWAS

[Update: This study was formally retracted by Science on July 22, 2011]

A genome-wide association study of extreme longevity published last week in the journal Science has been receiving a lot of press attention. The results are quite extraordinary: the authors identify 70 loci with genome-wide significant evidence for association with living past the age of 100, and they construct a SNP-based model for predicting exceptional longevity (EL) that has 77% accuracy in an independent set of individuals. We were initially very excited by the article and thought it would be of great interest to the 23andMe community. However, after a closer reading of the article and supporting materials, we think this study actually inadvertently points to some of the pitfalls in analyses of genome-wide datasets.

There are several reasons for skepticism about these new results. Another recent genome-wide study has reported no significant associations with longevity. There is suggestive evidence of genotyping quality control problems in the new results, and some routine quality control checks do not appear to have been done. The design of the study is particularly susceptible to introducing biases into the results. And a preliminary analysis of the proposed 150-SNP model for predicting longevity indicates that it is not predictive in the 23andMe community.

We expect that most of the results of this study will not have the same longevity as its participants. In genetics, as with most things in life, if a result seems too good to be true, it probably is. That said, this study does contain some interesting tidbits, such as the association of a SNP near the ApoE gene with longevity. This SNP has previously been shown to be associated with Alzheimer’s Disease. For the time being we won’t be incorporating the data from this new study into our Personal Genome Service or putting information about any of the other SNPs here in the Spittoon.
But we will be on the lookout for other attempts to replicate the study’s findings. If and when such a replication is published, we’ll scrutinize it like all of the papers we cover, and we’ll let you know what we find.

(A more technical discussion of the issues follows)

– A large study combining results of four genome-wide association studies of longevity was published in May in the Journals of Gerontology. That study found no associations meeting its pre-specified criteria for genome-wide significance. While it used a more inclusive phenotype (age 90 or older), it is surprising that there could be so many loci associated with survival to age 100 in the new study, some with very large effect sizes, yet none were found in the larger study from earlier this year.

– An important part of designing an experiment is choosing the criteria that will constitute convincing evidence of a positive finding. We have to make a trade-off between setting the bar so high that we will miss many interesting and true results (false negatives), and setting the bar so low that we will get many spurious findings (false positives). In the new study, the authors use an unusually permissive standard for genome-wide significance that appears to allow a false positive rate of 6 per 100,000 SNPs tested, or 18 false positives across the whole study. A more conventional standard is to control the false positive rate at 0.05 genome-wide, meaning that we expect only a 5% chance of even one spurious finding.

– None of the strong associations appear to be supported by any evidence from nearby SNPs. Each of the reported associations stands alone, but typically we expect that nearby SNPs will show some intermediate evidence for association, because nearby SNPs tend to be correlated. This is a red flag, because genotyping quality issues can produce these kinds of uncorrelated association signals.
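To make the multiple-testing arithmetic above concrete, here is a small illustrative sketch (our own illustration, not code from either study) comparing the expected number of chance findings under the permissive per-SNP threshold to a conventional Bonferroni-style cutoff:

```python
# Illustrative multiple-testing arithmetic (not from either study's code).

def expected_false_positives(n_snps, per_snp_alpha):
    # Under the null hypothesis, every test has probability per_snp_alpha
    # of a spurious hit, so the expected count is n_snps * per_snp_alpha.
    return n_snps * per_snp_alpha

def bonferroni_threshold(n_snps, family_wise_alpha=0.05):
    # Per-SNP p-value cutoff that keeps the chance of even one false
    # positive across all tests at family_wise_alpha.
    return family_wise_alpha / n_snps

n_snps = 300_000
# The permissive standard: roughly 6 false positives per 100,000 SNPs tested
print(expected_false_positives(n_snps, 6 / 100_000))  # -> 18.0
# The conventional standard: control the genome-wide error rate at 0.05
print(bonferroni_threshold(n_snps))                   # -> ~1.7e-07
```

With roughly 300,000 SNPs tested, the permissive threshold is expected to let through about 18 chance findings, while the conventional cutoff demands a per-SNP P value below roughly 1.7 × 10⁻⁷.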
– Many of the associated loci have high rates of missing genotypes in the EL individuals compared to the controls. For the 70 genome-wide-associated SNPs, the median missing data rate in EL samples was 9%, compared to 3% in controls. Some of the SNPs with high EL call rates appear to have large deviations from Hardy-Weinberg equilibrium in the EL group. For instance, the two SNPs with the strongest evidence for association are far out of equilibrium (P < 10⁻²⁰). Both of these are suggestive of data quality problems that can produce false associations, though there can be other valid explanations for the Hardy-Weinberg results.

– While the authors performed a replication of their results in an independent set of EL samples, the replication may share some biases with their initial genome scan. The authors drew most of their controls for their initial scan from a reference dataset, and used this same source of controls in their replication. And both sets of EL samples were genotyped with the same method. They show that substituting an alternate set of controls makes little difference, but this does not rule out a genotyping issue in the EL data.

We took a preliminary look in our customer data to see if the proposed SNP-based model described in Sebastiani et al. is predictive of exceptional longevity. A commonly used measure of test discrimination is to calculate how often, for a randomly selected case and control, a test correctly assigns a higher score to the case. This is known as the “c statistic” or “area under the curve”. The authors of the new study say their model scored a 0.93 on this statistic. But when we compared 134 23andMe customers with age ≥ 95 to more than 50,000 controls, we obtained a test statistic of 0.532, with a 95% confidence interval from 0.485 to 0.579. Using 27 customers with age ≥ 100, we get a value of 0.540, with a 95% confidence interval from 0.434 to 0.645.
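For readers curious how this “c statistic” is computed, here is a minimal sketch using made-up scores (not our customers’ data or the published model’s outputs):

```python
# The c statistic (area under the ROC curve) via the Mann-Whitney
# formulation: the probability that a randomly chosen case receives a
# higher score than a randomly chosen control, counting ties as half.

def c_statistic(case_scores, control_scores):
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# A predictor whose scores barely separate cases from controls sits near 0.5...
print(c_statistic([0.2, 0.5, 0.8], [0.3, 0.5, 0.7]))  # -> 0.5
# ...while a perfectly separating predictor scores 1.0.
print(c_statistic([0.9, 0.95], [0.1, 0.2]))           # -> 1.0
```

This pairwise definition is exactly the comparison described above: pick one case and one control at random and ask how often the case gets the higher score.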
A random predictor of longevity would give a 0.5 on this scale, so based on our data, the performance of this model is not significantly better than random. Even with our small sample size, we can clearly exclude values as high as the published result of 0.93.

Study designs that use independently collected control genotype data require extra attention to quality control to rule out the possibility of systematic differences in genotyping between cases and controls. In any experiment that tests hundreds of thousands of SNPs, a small proportion of SNPs can be expected to have problems with automated genotype assignment. This may not be a problem if cases and controls are affected equally, but if cases and controls are genotyped separately, then errors or missing genotypes may be concentrated in just the cases or just the controls, and these can give the appearance of a relationship between genotypes and phenotypes that is actually an experimental artifact.

The results could be strengthened if the authors could inspect the raw data for the associated SNPs to verify that genotypes are being assigned consistently. Differential missingness in the EL samples is a major issue because if it is the result of poor clustering, it will almost always tilt the apparent genotype frequencies of affected SNPs. If the clustering does suggest that there is a “batch effect”, then there are some strategies that can be used to rescue the analysis. One approach is to aggressively filter the data using strict quality criteria and manual inspection of putative associations. Another approach is to directly model the batch effect and then test for association conditional on the batch structure of the data. This only works if the problem is not perfectly confounded with the phenotype. If that is the case, the only resolution may be to use another technology to verify genotypes of associated SNPs and fill in missing values.
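One of the routine quality control checks mentioned earlier, testing for Hardy-Weinberg equilibrium within a sample, can be sketched as a simple one-degree-of-freedom chi-square test. The genotype counts below are hypothetical, chosen to mimic the excess-homozygote pattern that poor genotype clustering tends to produce:

```python
# Hardy-Weinberg equilibrium check: compare observed genotype counts
# to the counts expected from the sample's allele frequency.

def hwe_chisq(n_aa, n_ab, n_bb):
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    expected = [n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2]
    observed = [n_aa, n_ab, n_bb]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts with far too few heterozygotes:
print(round(hwe_chisq(400, 100, 500), 1))  # -> 636.8
# Counts that fit Hardy-Weinberg proportions exactly:
print(round(hwe_chisq(250, 500, 250), 1))  # -> 0.0
```

A chi-square statistic in the hundreds corresponds to a P value far below 10⁻²⁰ on one degree of freedom, the kind of deviation that warrants going back and inspecting the raw intensity clusters.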
Addendum: We repeated our analysis restricted to individuals with European ancestry.   The results were similar: for 129 customers with age ≥ 95 and more than 43,000 controls, we got a test statistic of 0.534, and for 26 customers with age ≥ 100, we got a value of 0.558.   In both cases, the 95% confidence interval includes 0.5.
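As one final illustration, a differential-missingness comparison like the one described above (a 9% missing data rate in EL samples versus 3% in controls) can be screened with a simple two-proportion z-test. The sample counts here are hypothetical, purely to show the mechanics:

```python
import math

# Two-proportion z-test: is the missing-genotype rate at a SNP
# different between cases and controls?

def two_proportion_z(missing1, n1, missing2, n2):
    p1, p2 = missing1 / n1, missing2 / n2
    pooled = (missing1 + missing2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 90 of 1,000 EL samples missing vs. 30 of 1,000 controls
print(round(two_proportion_z(90, 1000, 30, 1000), 2))  # -> 5.65
```

A z statistic near 5.6 corresponds to a two-sided P value on the order of 10⁻⁸, so a missingness difference of this size at a reported association is itself a red flag.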
  • JustMe

    Although I am only generally familiar with GWAS studies, it would seem that better controls could have been performed… I appreciate the critical analysis of the paper by the 23andme team.

  • JDR

    Thank you for this posting. There are other reports about anomalies in Illumina’s 610-Quad DNA chip used for some of the Science paper’s analysis. 23andMe uses an Illumina chip. Is it the same one? Have you used different chips in providing your service, and if so, what precautions does the company take to avoid chip model anomalies?

  • DavidH

    We do not use the 610-Quad; we use a customized version of the 550-Quad with some 23andMe-selected content. That said, I think every high density array product has its own quirks and probes that do not work as intended. We take special care with SNPs that are included in health and trait reports; this is a small enough number that we can manually verify that they are well behaved. The algorithms that use the rest of the data (e.g. Relative Finder, ancestry determination) are tolerant of a small number of misbehaving SNPs. We also track a “blacklist” of SNPs where we’ve detected issues and exclude these from analyses. We also do additional data quality checks when we are doing analyses for research.

  • LM

    This is an interesting example of how the database can be used for quick turnaround to validate other studies. One question though; how do you know that the 134 customers with ages >95 are really >95? Given that many people are concerned about genetic privacy, and there is no requirement to use real names or identifying info to use the service, is it possible that many of these customers were just using a made up date of birth?

    Is there any other self-report data that might help clarify this? For example, you would expect higher self-reports of conditions like arthritis, which are much more prevalent in an aging population.

    And as an aside, I just find it very interesting that someone who has made it past age 95 would use a genetic prediction service! I wonder what they would be looking for….

  • DavidH

    We cannot directly verify customer-provided information. We know that, generally speaking, the accuracy of our self-reported data is reasonably good, based on our ability to replicate known associations. Birth year (not birth date) is part of the customer profile, so we have a value for most customers, but it is not mandatory to provide a value. We have corroborating evidence in some cases, depending on which surveys a customer has filled out. For instance, most of the individuals with age 95+ who have filled out the alopecia survey report substantial hair loss.

    We have done some targeted recruitment of older adults as part of a “healthy aging” research project. Some of our customers are primarily interested in the ancestry functionality. Also, sometimes one customer will recruit multiple family members to provide samples, though those individuals may never use the site directly. So it is hard to generalize about the motivations of this group.

  • Jen

    I’m wondering what the ethnic background of the people you used was. The NECS subjects were all Caucasian.

  • DavidH

    For the analysis we reported, we did not filter on ancestry. Our long-lived subset is nearly all European ancestry (126 out of 134 classified as European). The remaining customers are predominantly European. We’ll redo the analysis on the European subset but given the numbers I wouldn’t expect a substantive change in the outcome.

  • TR

    Thanks, David, for the critique of Sebastiani et al.’s paper. Your points are well reasoned and do cast doubt on the validity of the paper’s central results. Have the authors of that paper seen your comments and made any responses?

  • DavidH

    We have been in contact with them; however, I think they are understandably busy. While we would welcome their feedback, it may make more sense for them to work on a single discussion of the various criticisms that have appeared, rather than responding specifically to our comments.

  • DavidK

    Thanks a lot for the info, David.

    A question: the same group previously published their GWAS results from a smaller effort (Sebastiani P et al. (2009) RNA Editing Genes Associated with Extreme Old Age in Humans and with Lifespan in C. elegans. PLoS ONE).

    Did you try to replicate whether genes ADARB1 and ADARB2 are associated with extreme old age in your database? More frequent in nonagenarians/centenarians vs. the regular-age folks?

    Regards, DK

  • DavidH

    Our dataset is not powered to address the ADARB1/B2 hypotheses because the reported effect sizes are fairly small (which is not at all unusual for association studies of complex traits). We would need a much larger number of older customers to be able to effectively test whether we could replicate those results. We were well powered to evaluate the predictive accuracy of the model described in the more recent paper only because that was such a strong result.

  • Jackson

    Thanks for putting detailed info like this on the blog.

  • Superedu

    Dear David,

    Will 23andme try to use a similar approach (nested genetic risk models) as Sebastiani et al. to find their own list of predictive SNPs? (you might use 90 years as cutoff to increase group size)

    Also, will 23andme publish the above described analyses, as your data and analyses appear very convincing and strongly impact the conclusion? Doing so would reach a larger scientific community. Or can 23andme never publish anything due to legal issues?

    Finally, compliments for doing and openly posting these rigorous analyses and my compliments on the clear presentation!

  • DavidH

    Our dataset isn’t nearly large enough to do that kind of modeling, even with a more inclusive phenotype.

    We are certainly able to publish, and we have a number of papers in various stages of preparation. In this case, we decided that it didn’t seem worth the substantial time and effort that would be required to turn our findings into a publication. Much of our evidence of problems in the original analysis is circumstantial and Sebastiani et al are the only ones who can really assess what went wrong. They are doing that now, and I think it makes sense to wait and see what they come up with.

  • I wondered, if these SNPs are proved wrong, do they get removed from the GWAS catalog? I was looking through the catalog today and saw that 29 of the SNPs from this study were included there. It seems to me that the catalog is quickly updated with new data, but do you know if studies that are shown to be in error are often removed?

    How about 23andMe’s database? If you find out that a study was shown to be incorrect, do you update the analysis for your customers? Do they get notified? I recently became a customer of your service so I am interested in this as well.

  • Superedu

    Dear David,

    You mention that Sebastiani et al. report no SNPs in LD (“None of the strong associations appear to be supported by any evidence from nearby SNPs.”)

    However, in section 4 of their Supporting material (page 3), the authors mention: “SNPs in strong LD were removed using the program PLINK with a SNP window of 50 and sliding window of 5 SNPs and we removed 1 SNP from each pair of SNPs with r2 > 0.30 leaving 97,508 SNPs for this analysis.”

    I think this might explain why “none of the strong associations appear to be supported by any evidence from nearby SNPs”. It seems this eliminated 2/3 of all SNPs.

  • DavidH

    @Superedu: that section of the Supporting Online Material is describing how they performed their principal components analysis to assess population structure; it does not describe the SNP association analysis.

  • DavidH


    I’m not sure whether the site has a protocol for removing erroneous results. It should be very rare that a study is actually wrong due to an experimental or analytical error; on the other hand, there are lots of valid reasons why a signal that shows up in one study may never be replicated (e.g. subtle differences in phenotype definitions, differences in populations, or simple false positives due to the luck of the draw). The catalog has set a low threshold for inclusion, so they are willing to tolerate some false positives, which is a reasonable standard for their purposes.

    For our health reports, we are much more picky, so it should be much less likely that we would include a false report. However, we do update reports as better information becomes available. For example, our Type I diabetes report was recently updated and a SNP was replaced.

  • Roger2

    Defending the Boston University Longevity Study!

    A number of independent researchers are questioning the validity of this study. I have read all of the negative reports and news stories, as well as the comments from staff. If I hadn’t previously read about the study, I would probably go along with all the negative comments and look somewhere else for answers, but then I would miss the wealth of information and insight found in the Boston University (BU) study.

    The Boston study found an extremely good relationship between longevity, disease, and SNPs/risk alleles. The purpose of this paper is to put the facts about this study in proper perspective, as critics’ comments are never unbiased no matter their source. I will endeavor to address each negative comment/criticism and explain what actually happened so that even a person not versed in this field will understand what, apparently, the experts seem to have missed.

    The chief criticisms about this study are listed below:

    1. The SNP data is flawed.

    2. The statistics are misleading and can’t be reproduced.

    3. The Manhattan Plot is not representative of conventional Manhattan plots where numerous related SNPs are always seen.

    4. Identical analyses were done using other centenarian data with the 150 SNPs by independent researchers, and only random correlations were found

    5. 10% of the centenarians were genotyped with a flawed chip.

    To follow along the reader is urged to obtain a copy of the BU Main Research paper and its Supporting Material paper. The Main Research paper can be found here:
    The Supporting Material paper can be found here:

    Before we start, I would like to briefly explain the way the experiment was run; otherwise some of the results discussed may not make any sense, as already seems to be the case at least in part. The BU study started out by individually phenotyping each of the centenarians used in the study. The major part of the phenotype survey was a list of specific diseases (cancer, cardiovascular, COPD, dementia, diabetes, hypertension, macular degeneration, and stroke), whether the subject had ever been diagnosed as having any of them, and when. There were many other questions as well about their family, health, etc.

    From looking at the historical disease data from the centenarians, the researchers came up with a brilliant idea: to manually add a list of disease-related SNPs (for the diseases in the survey) to the test. Their idea was that if centenarians don’t get these diseases until very late in life, which is a fact their survey turned up, then their disease-related SNPs may be turned off, which may be a good way to look for longevity traits.

    After all of the centenarians were genotyped, their data was individually inspected for the disease-related SNP alleles. Searches and comparisons were done, and finally the researchers identified a list of slightly over 2000 SNPs to be classified and used in their study. From this list, the 150 SNPs that they initially thought had the highest correlation to longevity were eventually selected.

    When they matched these 150 SNPs to three centenarians (see Figure 2A at the top of page 7 in their Main Report), they found there was very little correlation. Eventually they decided they needed to group the subjects according to phenotype and try again. They did this and settled on 19 groups. They initially had over 30 groups, but when they discarded all groups with fewer than 6 candidates they arrived at the 19 groups used in the study. There were over 100 centenarians in these discarded groups.

    The BU researchers found that there was very little correlation with groups C14 to C18 and no correlation at all in group C19. There were 114 centenarians in these last six groups. They found very high correlation in groups C1 to C4, which contained 109 centenarians. The remaining groups contained 487 centenarians, for a total of 710 that remained in the study.

    The brilliant thing about this study was that it defined 19 different groups from just prior historical subject data. Each one of these 19 groups had a longevity and disease risk factor associated with it. If you knew which group you were in, you would know about how long you will live and when you might acquire some of the very bad diseases that generally plague older individuals.

    By the end of the study they had found a way whereby each group could be identified by the percentage of an individual’s SNP matches (to their 150-SNP set). Now there were two ways to determine which group you belong to: 1. by going through their initial survey process, or 2. by determining the percentage of the SNPs your genotype matches. Both of these could only be done on an individual basis.

    Now, on to the criticisms:

    1. The SNP data is flawed.
    In the 23andMe comments on The Spittoon, they stated the number of faulty SNPs could be as high as 18 in the 300,000-SNP analysis. The Boston University (BU) researchers had previously stated they expected 11 SNPs could be flawed. According to a Newsweek article, two of the flawed SNPs had been identified from the history of the new chip used near the end of the study. These two SNPs were rs1036819 and rs1455311.

    The problem arose when the last 10% of the centenarian subjects were run using a different, new chip. This new chip had known problems with these two SNPs. The other 90% were run with a chip that had no problems with these two SNPs. That means 90% of the data for these two SNPs was read fine.

    The BU researchers stated in their papers that 10% of the centenarian data was outside of their cluster groups, so it was not used. It isn’t known whether the 10% discarded was the same 10% run on the new chip or not. A little over 100 centenarians were initially removed from the study, and later another 100 had unusable data. By the end of the study, over 200 of them had been dropped.

    Except for the two SNPs set out above, all of the questionable data had been removed from the study. This left two possibly flawed SNPs out of 150 (and 90% of the data for these two SNPs was fine). As the BU study was grouping by percentage of total matches, not OR values, the maximum error one would expect to see if the two SNPs were not excluded from the study would be 2/150 = 1.3% in determining group membership, which was all they were concerned about.

    2. The statistics are misleading and can’t be reproduced.
    The criticism here is that the 77% predictability claimed in the study is too high and should be only 3%. The 77% number came from the results shown in Figure 2B in the middle of page 7. Here, a separate analysis was performed on a test set of centenarians and a set of controls. In the study, two correlations were run: one with just a select group of centenarians (the makeup of which isn’t fully known) and one with just controls.

    The data showed that the centenarians’ correlation, in this select group, ranged between 0.5 and 0.9, with a median of 0.77. While this is a rather wide range, the researchers chose to publish the median value of 77% rather than the range of 50% to 90%.

    What the 77% showed was that for a small sample, where all the individuals had specific genetic signatures, the researchers were able to predict longevity to a reasonable degree. However, their paper also stated that, for the most part, there was little correlation with many of the other groups.

    The list of 150 test SNPs had been pulled from their initial set of 2000 SNPs and refined for the high value groups C1 to C4. Only 13.6% of the centenarians in this study were found in the high value groups.

    What the critics did was look at the data in a different way, and they came up with numbers that ranged between 3% and 77% (yes, one of the critics actually said the 77% number could be arrived at, but then said it didn’t mean anything). It has been said that “with statistics you can prove or disprove almost anything if you have enough data”; it all depends on where you want the numbers to go and how creative you are with your sample definitions.

    If you want to show a very low number, like 3%, then select a very large data set, not a small select group of subjects. Then you will get small numbers, which is what the critics did. As the actual makeup of the select group was not divulged, it isn’t possible to replicate this test or verify the 77% number. As such, this statistical criticism has no merit.

    3. The Manhattan Plot is not representative of conventional Manhattan plots where numerous related SNPs are always seen.
    This criticism is discussed at length in the link set out above. The problem here is that independent researchers expected to see sharp vertical columns of associated SNPs around each of the high-significance SNPs found in the study. As this didn’t happen, these researchers assumed it had to be a ‘red flag’ indicating bad experimental data.

    What they failed to realize was that in the BU study the researchers had manually selected over half of the SNPs from disease-related SNPs. Most of the SNP selection had been done manually rather than automatically, and when the Manhattan plot was made, only the specific SNPs that had been manually selected from the set of 2000 stood out. There was never any effort made to look for or incorporate any associated SNPs related to the 150 they decided on, and this was mentioned in the paper. From looking at their Manhattan data plot (page 22 of the Supporting Material paper), it isn’t clear whether this plot contains only their 150-SNP subset or their entire data set of 2000 SNPs.

    However, this is not the worst mistake the experts made. After careful reading of the BU papers, it should have been obvious that the Manhattan plot for each of the 19 groups would be very different. The two extremes would be for groups C1 and C19. For C1, the Manhattan plot would show the standard base compaction we are all familiar with, and above that would be a horizontal straight line going from chromosome 1 to chromosome 22, with nothing above or below the horizontal line except for the base compaction. For C19, the plot would resemble a random-number scatter plot with no discernible features (other than its randomness) for all the SNPs in the study.

    The Manhattan plot shown is representative of group C9, but I think it’s really a median plot from the variations in the 150 SNPs. No one knows for sure except the BU researchers, but which one it actually is is irrelevant. The way the BU study found their longevity associations does not lend itself well to a Manhattan plot. This type of plot is more representative of data from whole-population scans, which the BU study did not endeavor to do and which was never their intention.

    Related/associated SNPs were never entered on purpose. In their paper the BU researchers stated they only wanted to use a very small set of SNPs. The intent of the BU researchers was to find a small group of SNPs (the 150 they decided on) that would confer group identification, not to look for definitive SNP proof of longevity. To look for SNP proof of longevity, one would expect to see the associated SNPs in the Manhattan plot.

    4. Identical analyses were done using other centenarian data with the 150 SNPs by independent researchers, and only random correlations were found.
    Several independent research groups tried to replicate the results using their own in-house data. These attempts were all utter failures. This is because the BU study required the subjects to be sorted by phenotype into 19 groups first, and even then only a few of these groups exhibited high correlation with their SNPs.

    Figure 2A at the top of page 7 in their Main Report shows what a failure it is to do this without presorting the data into phenotypes first. Figure 2A shows the initial correlation between the 150 SNPs and three centenarians. For a 106-year-old centenarian the correlation was around 50%; for the 107-year-old the correlation was around 70% (almost the inverse of the control group correlation). The 119-year-old had a 99% correlation with the 150 SNPs. The point here is that if you don’t take the group affiliation into account, you aren’t going to know what the results mean.

    As none of the risk allele data has been publicly released for all of the 150 SNPs, these independent researchers were just matching their centenarian data against all 450 alleles, for each centenarian, without knowing which allele was the correct one to look for. They could have assumed matching against the minor allele, but from the released data (Tables S1 and S2 in the Supporting Material) it can be seen that some risk alleles are the minor component and some are not. Even among the minor alleles, some have OR values less than 1 and others have OR values greater than 1 (normally one would expect a risk allele to have an OR, or odds ratio, value > 1). Without knowing the specific risk alleles used for each of the 150 SNPs in the BU study, any matching study will produce nothing more than random numbers with a mean of about 0.5.

    Even if the independent researchers had been given all of the risk alleles for all 150 SNPs, it would have made no difference. Without knowing the phenotype grouping of their centenarian data, their analysis would still have centered about 0.5 (the mean for random numbers). The BU model works great for a very small number of data sets but breaks down for large data sets (>10).

    Another problem with the way the independent researchers attempted to validate the BU study is that not all of the 150 SNPs and risk alleles were common to all of the phenotype groups. The high-predictability (99%) groups among these 19 groups comprised as few as 8 to as many as 64 subjects out of over 1000 centenarians. This would indicate that the selection criteria for the high-predictability groups covered only a small percentage of the total centenarian population. Without knowing the group selection criteria, there is no way one can say whether the Boston study’s 150 manually selected SNPs were good or not. Unfortunately, the group characteristics are proprietary, as they are selling this information, so this information may not be forthcoming in the foreseeable future.

    Just having a SNP to search for does no good unless you know the risk allele for the trait of interest. For example, rs2075650 has been shown to be a predictor of Alzheimer’s Disease for GG genotypes. But in the longevity study this same SNP was only considered significant, as far as longevity is concerned, if the genotype was AA. This particular SNP was manually selected for inclusion in the study based on initial centenarian surveys.

    One of the technical comments I saw from several independent researchers about this study was “finding an Alzheimer’s SNP related to longevity is very interesting.” The point here is that these independent researchers either didn’t read the BU study’s reports or didn’t comprehend what they read, as the paper clearly pointed out that SNPs related to diseases were manually included in the study set (Alzheimer’s Disease SNPs were specifically set out on page 7 of the Supporting Material paper and several times in Table S3 on page 39).

    The bottom line is that if you search a large data set for matches against a SNP without knowing the risk allele, you will get a match for almost every single data set, and the matches will cluster into three sets depending on the different population types included in the data. Searching for a desired trait without knowing the risk allele for that trait is a worthless waste of time and energy. Apparently the independent researchers thought all centenarians would have one specific set of these 150 SNPs, and when they didn't see any they attributed the error to the BU study instead of their own shortcomings.

    5. 10% of the centenarians were tested with a flawed chip.
    More than 10% of the centenarians were initially discarded from this study because of very bad data fits. Another 10% belonged to groups where there was little correlation. The researchers are aware of the 10% of flawed data and have issued a response to the critical Newsweek article (at the very end of that article).

    No one seems to dispute that 90% of the centenarian data was read with the old/good chip, which no one had any problems with, and that, with the exception of the two aforementioned SNPs, there were no other known problems with the new chip used in the last 10% of the study. With 90% of the data coming from the old/good chip, plus 20% of the data discarded because of poor correlation, as much as 99% of the data used in the BU study could have been good. However, the critics were all extremely careful not to mention that there could be a lot of good data included in the study.

    This criticism is close to the one that said they used flawed SNP data. When the SNP data was looked at more closely, even the critics agreed the flawed data amounted to at most 18 SNPs out of the 300,000 scanned. Of these 18, only two SNPs were found in the high-correlation set.

    While over 20% of the centenarian data was discarded because it didn't fit the model, it isn't known how much of that data was in the 10% read by the new chip. The researchers would know, but so far this information has not been released.

    Keep in mind that we are only looking for percentage matches to whole groups, not individual SNP responses. One would also expect that if the new chip produced 10% flawed data, this data might not match the earlier 90% and therefore would not correlate, and as a result would be removed from the study data. This is what the BU researchers found, and all the non-correlatable data was removed from the study.

    While this study had some shortcomings, thinking outside the box wasn't one of them. While their model didn't work very well across the board, as was pointed out in their Main Research paper, the BU model was extremely predictive of exceptional longevity for the first six phenotype groups (C1 to C6). For the later groups (C7 to C14) the model became less predictive, and for the remaining groups it didn't work within reasonable limits. Even though this study wasn't the end-all solution to longevity, it was an extremely good start.

    Even if some of the BU study results are flawed, that is no reason to also exclude the pre-study data associations derived from their test subjects: i.e., centenarians don't come down with any of the deadly diseases until very late in life (this was historical data collected from the test subjects). This information is extremely important and has not been associated with longevity before. Even though a small portion of the test data may be flawed (18 per 300,000 SNPs), there is no reason to condemn the entire study or question its validity to the degree that is being done.

    If this study were actually as bad as the critics would like everyone to believe, it would have been removed from the Government GWAS database (where it still sits after nearly eight months), as it would be misleading for future researchers. Instead, the Government only removed or flagged the very few SNPs that may have been flawed and nothing more. Apparently the United States Government, including the NIH, thinks the BU study has merit.

  • DavidH


    You’ve clearly spent a lot of time thinking about this paper. However, your defense substantially misrepresents both the original BU work and the criticisms of that work. I’ll try to make just a few points:

    Your description of the overall design of the BU study is incorrect. The BU researchers did not preselect disease-associated SNPs for study. They conducted a genome-wide association study with ~300,000 unselected SNPs, using off-the-shelf SNP arrays manufactured by Illumina for this purpose. The 150 longevity-associated SNPs were selected from these 300,000. Also, the BU researchers did not choose to divide the centenarians into 19 groups based on their phenotype data. The 19 groups were chosen solely based on genetic information from the 150 selected SNPs.

    The SNP data quality problems I described (differential missingness, and deviations from Hardy-Weinberg equilibrium) were pervasive and not limited to a few SNPs. Your characterization of the Manhattan plot is also incorrect. The plot shows genome-wide data for all ~300,000 SNPs that were tested, not 2,000 or 150.
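    For readers unfamiliar with the Hardy-Weinberg check mentioned above, here is a minimal sketch of the routine QC computation (the genotype counts are hypothetical, and real pipelines typically use an exact test rather than this chi-square approximation):

```python
# Minimal sketch of a Hardy-Weinberg equilibrium (HWE) QC check for one SNP.
# Genotype counts are hypothetical; a large statistic flags a SNP whose
# genotype calls deviate from the proportions expected under HWE.
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic (1 d.f.) for deviation from HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# A SNP in good agreement with HWE gives a small statistic;
# a badly miscalled SNP (e.g. heterozygotes lost) gives a large one.
print(hwe_chi_square(250, 500, 250))  # 0.0 for exact HWE proportions
print(hwe_chi_square(400, 200, 400))  # large: far from HWE
```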

    You assert that the NIH has somehow endorsed the study results by leaving them in the NHGRI GWAS catalog. A search of the catalog for longevity associated SNPs shows that the top two SNPs listed for the BU paper are rs1036819 and rs1455311 — the two SNPs widely reported as giving erroneous results. It is somewhat disappointing that the study remains listed, but the catalog is not closely curated and it may be that the maintainers feel they should leave these results in as long as the paper is part of the scientific record (i.e. until it is actually retracted).

  • peter ezzell

    Near the end of April I communicated with one of the paper's authors, Tom Perls, and asked about the status of resolving the criticisms. He wrote that an independent lab at Yale independently produced a clean data set from which the same analyses were then performed. He also noted that the corrected findings have been submitted to Science and they are awaiting word. He did not characterize how the findings changed.

  • DavidH

    I’ll look forward to seeing how this works out.

    • This study has now been formally retracted. See here for discussion.