Jun 5, 2019 - Research

A Conversation with Nilanjan Chatterjee


This week, 23andMe is hosting its third annual Genome Research Day, a meeting focused on human genomics research happening in and around the Bay Area. This event will also allow 23andMe scientists to explain some of our latest research.

Among the many presentations, we’ll highlight some of our work around polygenic risk modeling. We asked statistical geneticist Nilanjan Chatterjee, Ph.D., a Bloomberg Distinguished Professor of Biostatistics and Medicine at Johns Hopkins University, to help us put into context the work being done around the world on risk modeling.

Chatterjee is a noted expert in developing risk prediction models, particularly around cancer. As more genetic data becomes available, polygenic risk models are increasingly being used to help calculate the cumulative risk of not just a few genetic variants but the many hundreds or even thousands of variants that convey risk. In addition, these new models can also include other factors, such as lifestyle and environmental influences, to offer a clearer picture of disease risk.
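As a rough illustration of what "cumulative risk" across many variants can mean, a polygenic risk score is commonly computed as a weighted sum: each variant's risk-allele count (0, 1, or 2) is multiplied by an effect size estimated from a GWAS, and the products are added up. The sketch below assumes that simple additive form; the function name, weights, and allele counts are hypothetical and are not taken from any 23andMe report or from Dr. Chatterjee's models.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Return the weighted sum of risk-allele counts.

    allele_counts: number of risk alleles carried at each variant (0, 1, or 2).
    effect_sizes:  per-variant weights, e.g. GWAS log odds ratios (hypothetical here).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("allele_counts and effect_sizes must have the same length")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Made-up example with three variants; real scores may use thousands.
example_weights = [0.10, 0.05, -0.02]
example_counts = [2, 1, 0]
example_score = polygenic_risk_score(example_counts, example_weights)
print(example_score)
```

A raw score like this only becomes a risk estimate after it is compared against a reference population and validated, which is the kind of work discussed in the interview.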

Bigger and Better GWAS

Thank you for chatting with us, Dr. Chatterjee. Can we start by asking why we are hearing so much about polygenic risk scores now, even though they have been around for a while?

Dr. Nilanjan Chatterjee: My guess is that the sample size for genome-wide association studies has increased by leaps and bounds, and this has led to much improvement (in polygenic risk scores). In particular, PRS is now providing meaningful risk stratification for several common diseases like breast cancer, type 2 diabetes, and heart disease. I anticipate the power of (these models) for risk stratification will continue to increase as the sample size for GWAS grows, especially for non-European ethnic groups where the data have been lacking in the past.

There are many different kinds of non-genetic risk models to estimate risk. How are polygenic risk score models different, and is that important?

Dr. Chatterjee: Well, that is an interesting question that does not have a simple answer. At one level, all models are trying to produce an estimate of “risk,” which is a single number. It does not matter whether this is being driven by genetic or non-genetic factors as long as the overall model performs well. But at another level, there are important pros and cons to using genetic and non-genetic factors. It can be simpler for people to feed in information on simple factors like BMI, smoking history, and family history by filling out questionnaires. While genetic information, including PRS, may one day be part of routine clinical care, we are not there yet.

On the other hand, lifestyle factors and other non-genetic factors, including various biomarkers, change over time, and they may have limited value for long-term risk prediction based on a single measurement. Information on inherited DNA needs to be collected only once in a lifetime. We often see that genetic predisposition leads to risks that remain throughout an individual’s lifetime, and non-genetic risk factors act in a multiplicative fashion on the background of that stable genetic risk.

Complicating the future use of polygenic risk models is a huge issue related to our confidence in the systems in place to protect the security of our genetic information. But setting aside the confidentiality issue, I feel future risk models should include both genetic and non-genetic information. It’s important to understand risks associated with lifestyle because that is something we can do something about. At the same time, genetics is an important determinant of our background risk, and even if it cannot or should not be modified, it can help us make decisions regarding strategies for risk reduction through lifestyle changes, medication, and screening.
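The idea that non-genetic risk factors act multiplicatively on top of a stable genetic background can be sketched in a few lines. This is only an illustration under simplifying assumptions: the genetic contribution is expressed as a log relative risk, each non-genetic factor as a relative risk, and the factors are treated as independent. The function name and all numbers are hypothetical and do not come from BOADICEA or any validated model.

```python
import math


def combined_relative_risk(genetic_log_rr, non_genetic_rrs):
    """Multiply a genetic relative risk by each non-genetic relative risk.

    genetic_log_rr:  log relative risk from a polygenic score (hypothetical value).
    non_genetic_rrs: relative risks for lifestyle or other factors (hypothetical values).
    """
    relative_risk = math.exp(genetic_log_rr)  # stable genetic background
    for factor_rr in non_genetic_rrs:
        relative_risk *= factor_rr  # each factor scales the background risk
    return relative_risk


# Made-up example: genetic relative risk of 2.0 and one lifestyle factor of 1.5.
print(combined_relative_risk(math.log(2.0), [1.5]))
```

Under this assumption, the same lifestyle factor adds more absolute risk for someone with a high genetic background than for someone with a low one, which is one reason combining both kinds of information can matter.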

Non-Genetic Factors

We focus a lot on how polygenic risk scores can include many more genetic variants associated with risk. Still, they can be extended to also use non-genetic variables and environmental factors. What is the significance of this, and do you see some good examples of risk modeling that include these different types of data?

Dr. Chatterjee: Coronary heart disease and breast cancer are two fields that have historically been ahead in the development and validation of risk models for clinical application. In both fields, I see considerable effort going toward the integration of polygenic risk scores with other types of information, such as rare, high-penetrant mutations and non-genetic risk factors.

For example, the BOADICEA model, which uses extensive family history information to predict the risk of breast cancer, has recently been extended to include polygenic risk scores and non-genetic risk factors (Lee et al., Genetics in Medicine, 2019). In our own work, we have recently put a lot of effort into integrating polygenic risk scores for breast cancer (Mavaddat et al., AJHG, 2019) with information on a set of classic risk factors for breast cancer. We’ve validated these models across a variety of cohorts encompassing six different countries (ongoing work).

How far are we from incorporating polygenic risk scores into day-to-day health care?

Dr. Chatterjee: We still have a while, perhaps 10 years, to sort a few things out before polygenic risk scores can become part of routine care. There are a lot of questions we need to address. First, how much will it cost, and who will pay for it? Beyond the costs, what kind of data do we need to convince insurance companies and employers of the value of polygenic risk scores in healthcare? Also, how do we reassure the public that the information will not be misused? Finally, how will the information be communicated so that it creates neither a false alarm nor a false sense of reassurance? These are just a few thoughts that come to my mind.


Do you think these risk models offer people actionable information in the current state?

Dr. Chatterjee: Yes, I do think some well-validated models could generate useful information now to guide preventive action. That said, given the current popularity of PRS and risk models in general, I am also wary that many risk models out there are not correctly validated and could generate misinformation.

What are your thoughts about 23andMe’s use of polygenic scores for type 2 diabetes?

Dr. Chatterjee: I am somewhat familiar with 23andMe’s type 2 diabetes report. The underlying methodology is sound, and the data, along with data from other sources, show that a PRS for type 2 diabetes can provide meaningful risk stratification for individuals, especially given that it’s a relatively common outcome. In the future, more effort is needed to improve the performance of PRS using additional GWAS, specifically for non-European populations.

What are some strategies for including more data from non-Europeans to make polygenic risk models more broadly applicable?

Dr. Chatterjee: Well, of course, we need more GWAS for non-European populations. Nevertheless, in most cases, complex diseases share much of their genetic background across different ethnic populations. So, we should do everything possible to exploit available data across multiple ethnic groups to develop polygenic risk scores optimized for other ethnic populations. More research is needed to develop methodologies in this area.

Dr. Nilanjan Chatterjee, Ph.D., is a Bloomberg Distinguished Professor of Biostatistics and Medicine at Johns Hopkins University. Dr. Chatterjee’s opinions expressed here are his own and not those of Johns Hopkins University. In addition, 23andMe has not compensated him for his participation in this interview, and he has no other financial interest in 23andMe.
