Researchers at Johns Hopkins Bloomberg School of Public Health, Harvard, the Broad Institute, the National Cancer Institute, and 23andMe have developed a method that significantly improves the performance of polygenic risk models for people of non-European ancestry.
This new method holds promise that future clinical use of polygenic models will be less likely to exacerbate health inequities.
Improving Polygenic Risk Scores
Over the last several years, these kinds of risk models, also known as polygenic risk scores, or PRS, have begun to offer meaningful risk information for many diseases, such as breast cancer, type 2 diabetes, and heart disease. But these models, which rely on very large genome-wide association studies and combine hundreds, thousands, or even many thousands of variants to calculate risk, perform poorly for people of non-European ancestry. That is in part because relatively little genetic research has included data from non-Europeans.
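For readers new to the idea, a polygenic risk score is typically computed as a weighted sum: each variant's allele count (0, 1, or 2 copies of the risk allele) is multiplied by an effect size estimated in a genome-wide association study, and the products are added up. The minimal Python sketch below shows only that arithmetic. The variant IDs, effect sizes, and genotype are made-up, hypothetical values, not data from the study or from 23andMe.

```python
# A polygenic risk score (PRS) as a weighted sum of risk-allele counts.
# All variant IDs, effect sizes, and genotypes below are hypothetical.

# Per-variant effect sizes (for example, log odds ratios from a GWAS).
effect_sizes = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}

# One person's genotype: copies (0, 1, or 2) of each variant's risk allele.
genotype = {"rsA": 2, "rsB": 1, "rsC": 0}


def polygenic_score(weights: dict[str, float], dosages: dict[str, int]) -> float:
    """Sum effect size x allele count over variants present in both inputs."""
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


score = polygenic_score(effect_sizes, genotype)
print(f"PRS = {score:.2f}")  # 0.12*2 + (-0.05)*1 + 0.30*0 = 0.19
```

The arithmetic itself is simple; the hard part is deciding which variants to include and how to estimate their weights for each ancestry group, which is the problem methods like CT-SLEB aim to address.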
Outlined in a paper published in the journal Nature Genetics, the new method addresses that problem, improving both the performance of these risk models and the speed at which they can be trained and built.
The New Model
Dubbed CT-SLEB, the new approach substantially improved the performance of polygenic risk models in diverse populations, especially among people with African ancestry. CT-SLEB models can also be built much faster than models from other polygenic risk modeling methods.
“At 23andMe, we are committed to providing health value to everyone. With this collaborative study, we helped improve a method to make polygenic risk models perform better for underrepresented populations so that everyone can have a better understanding of their future health,” said Jianan Zhan, a senior scientist with 23andMe’s Product R&D. “There’s great potential for using these models in clinical care too, but to make that a reality we need to make sure they work for people of all ancestries.”
How It Works
The scientists' approach has three distinct steps. First, they used a technique known as "clumping and thresholding" to select risk variants relevant to different populations: European, African, East Asian, South Asian, and Latino. Next, they applied a statistical method called "empirical Bayes," which uses estimates from the whole dataset to adjust the estimated risk effects of variants in each specific population. Finally, the team added a machine learning layer known as "super learning," which combines the resulting candidate models to improve the predictive accuracy of the final polygenic risk model.
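To make the three steps more concrete, here is a deliberately simplified Python sketch. It is not the authors' CT-SLEB implementation, and every function name and number in it is hypothetical. It assumes: (1) clumping uses a fixed base-pair window as a stand-in for the linkage-disequilibrium-based clumping used in practice; (2) the empirical Bayes step is a simple normal-normal shrinkage of a small target-population GWAS's effect estimates toward estimates from a larger reference GWAS, with the prior variance estimated from the data; and (3) super learning is approximated by linear stacking, i.e. least-squares weights that combine several candidate scores on a tuning set (full super learning typically uses cross-validated predictions and a chosen meta-learner).

```python
import math

import numpy as np

rng = np.random.default_rng(0)


def clump_and_threshold(pos, pvals, p_cut, window):
    """Keep the most significant variant in each genomic neighborhood.

    Simplification (assumption): variants within `window` base pairs of an
    already-kept variant are dropped, instead of using LD (r^2) between them.
    """
    kept = []
    for i in np.argsort(pvals):  # most significant first
        if pvals[i] >= p_cut:
            break  # all remaining variants fail the p-value threshold
        if all(abs(pos[i] - pos[j]) > window for j in kept):
            kept.append(i)
    return np.array(sorted(kept), dtype=int)


def empirical_bayes_shrink(beta_target, se_target, beta_ref):
    """Shrink target-population effects toward reference-population effects.

    Normal-normal model: true effect ~ N(beta_ref, tau2) and
    estimate ~ N(true effect, se^2). tau2 is estimated from the data
    (method of moments), which is what makes this "empirical" Bayes.
    """
    resid = beta_target - beta_ref
    tau2 = max(np.var(resid) - np.mean(se_target**2), 1e-8)
    weight = tau2 / (tau2 + se_target**2)  # 0 -> trust reference, 1 -> trust target
    return beta_ref + weight * resid


def stack_scores(score_matrix, y):
    """Linear stacking: intercept plus least-squares weights for candidate scores."""
    X = np.column_stack([np.ones(len(y)), score_matrix])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# --- Toy simulation; every number here is arbitrary ---
m, n = 200, 2000  # variants, individuals
pos = np.sort(rng.choice(1_000_000, size=m, replace=False))
true_beta = np.where(rng.random(m) < 0.1, rng.normal(0, 0.2, m), 0.0)
se_ref, se_tgt = np.full(m, 0.01), np.full(m, 0.08)  # large vs. small GWAS
beta_ref = true_beta + rng.normal(0, se_ref)  # e.g. a large European GWAS
beta_tgt = true_beta + rng.normal(0, se_tgt)  # a smaller target-ancestry GWAS
pvals = np.array([math.erfc(abs(z) / math.sqrt(2)) for z in beta_tgt / se_tgt])

G = rng.binomial(2, 0.3, size=(n, m))  # target individuals' genotypes
y = G @ true_beta + rng.normal(0, 1, n)  # simulated trait
tune, test = slice(0, n // 2), slice(n // 2, n)

shrunk = empirical_bayes_shrink(beta_tgt, se_tgt, beta_ref)
scores = []
for p_cut in (1e-4, 1e-2, 0.5):  # one candidate PRS per p-value threshold
    idx = clump_and_threshold(pos, pvals, p_cut, window=5_000)
    scores.append(G[:, idx] @ shrunk[idx])
S = np.column_stack(scores)

coef = stack_scores(S[tune], y[tune])  # fit combination weights on tuning half
pred = coef[0] + S[test] @ coef[1:]  # evaluate on the held-out half
r2 = np.corrcoef(pred, y[test])[0, 1] ** 2
print(f"test-set R^2 of the stacked score: {r2:.3f}")
```

In this toy setup, the small target-ancestry GWAS supplies population-specific signal, the shrinkage step borrows strength from the larger dataset wherever target estimates are too noisy to stand on their own, and the stacking step lets the data decide how much each threshold's score should count.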
They then compared their method with nine other modeling methods across five ancestry groups, testing predictive performance for seven complex traits using data from more than 3.7 million 23andMe customers of diverse ancestries who consented to participate in research. That total includes more than 413,000 Latinos, 117,000 African Americans, 96,000 people of East Asian descent, and 26,000 people of South Asian descent.
CT-SLEB proved to be much faster, highly scalable, and one of the most powerful methods for generating risk predictions in non-European populations, particularly among African Americans.
While the new approach offers promise, the study authors noted some limitations. Although it substantially improved the performance of polygenic risk models in many settings, there is still room to improve results for non-Europeans.
In addition, the authors noted that the best approach to risk prediction might involve using multiple methods and combining the results. Even with the best methods, disparities in polygenic risk model performance may remain unless larger samples from understudied populations become available.
23andMe offers more than 30 reports based on polygenic risk models. Those models are trained with a different method, but they incorporate some of the approaches outlined in this paper, adjusted to account for the unique features of our database. You can learn more about our methods for creating these polygenic scores.
With the size and scale of its research database, 23andMe offers a unique opportunity to develop and improve upon these types of models with unmatched statistical power.