A Conversation with Nilanjan Chatterjee

This week 23andMe is hosting our third annual Genome Research Day, a meeting focused on human genomics research happening in and around the Bay Area. This also allows 23andMe scientists to explain some of our latest research.

Among the many presentations, we’ll be highlighting some of our work around polygenic risk modeling. We asked statistical geneticist Nilanjan Chatterjee, Ph.D., a Bloomberg Distinguished Professor of Biostatistics and Medicine at Johns Hopkins University, to help us put into context the work being done around the world on risk modeling.

Chatterjee is a noted expert in developing risk prediction models, particularly around cancer. As more genetic data becomes available, polygenic risk models are increasingly being used to help calculate the cumulative risk of not just a few genetic variants but the many hundreds or even thousands of variants that convey risk. In addition, these new models can also include other factors, such as lifestyle and environmental influences, to offer a clearer picture of disease risk.
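To make the idea of "cumulative risk" across many variants concrete: a polygenic score is commonly computed as a weighted sum of the risk-allele counts a person carries, with each variant's weight drawn from GWAS effect-size estimates. The sketch below is only an illustration under that assumption. The function name, weights and genotype are hypothetical, and it is not a description of 23andMe's or Dr. Chatterjee's actual models.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (log odds ratios), for illustration only.
weights = [0.10, 0.05, 0.20]
# Hypothetical genotype: copies of the risk allele carried at each variant.
dosages = [2, 0, 1]

score = polygenic_score(dosages, weights)
print(f"polygenic score: {score:.2f}")
```

A real score would sum over hundreds or thousands of variants in the same way; only the length of the two lists changes.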

Bigger and Better GWAS

Thank you for chatting with us, Dr. Chatterjee. Can we start by asking why we are hearing so much about polygenic risk scores now when they have been around for a while?

Dr. Nilanjan Chatterjee: My guess is that because sample sizes for genome-wide association studies have increased by leaps and bounds, this has led to much improved (polygenic risk scores). In particular, PRS now provide meaningful risk stratification for a number of common diseases like breast cancer, type 2 diabetes and heart disease. I anticipate the power of (these models) for risk stratification will continue to increase as sample sizes for GWAS grow, especially for non-European ethnic groups where the data have been lacking in the past.

There are many different kinds of non-genetic risk models to estimate risk. How are polygenic risk score models different, and is that important?

Dr. Chatterjee: Well, that is an interesting question that does not have a simple answer. At one level, all models are trying to produce an estimate of “risk,” which is a single number. It does not matter whether this is being driven by genetic or non-genetic factors as long as the overall model does a good job. But at another level there are important pros and cons to using genetic and non-genetic factors. It can be simpler for people to feed in information on simple factors like BMI, smoking history and family history by filling out questionnaires. One day genetic information, including PRS, may be part of routine clinical care, but we are not there yet.

On the other hand, lifestyle factors and other non-genetic factors, including various biomarkers, change over time, and they may have limited value for long-term risk prediction based on a single measurement. Information on inherited DNA needs to be collected only once in a lifetime. We often see that genetic predisposition leads to risks that remain over the course of an individual’s lifetime, and non-genetic risk-factors act in a multiplicative fashion on the background of that stable genetic risk.

Complicating the future use of polygenic risk models are huge issues related to our confidence in the systems in place to protect the security of our genetic information. But setting aside the confidentiality issue, I feel future risk models should include both genetic and non-genetic information. It’s important to understand risks associated with lifestyle because that is something we can do something about. At the same time, genetics is an important determinant of our background risk, and even if it cannot or should not be modified, it can help us make decisions regarding strategies for risk reduction through lifestyle changes, medication and screening.
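The multiplicative picture described in this answer, with non-genetic factors scaling a stable genetic background risk, can be sketched in a few lines. This is a simplified illustration only: the function name and every relative-risk value are hypothetical, and the sketch assumes the factors combine without interaction, which published models may not.

```python
import math


def combined_relative_risk(genetic_rr, lifestyle_rrs):
    """Multiply a fixed genetic relative risk by each non-genetic relative risk."""
    return genetic_rr * math.prod(lifestyle_rrs)


# Hypothetical values, for illustration only.
genetic_rr = 1.5              # background risk from inherited DNA, measured once
current_habits = [2.0, 1.2]   # e.g. two modifiable risk factors as they stand today
improved_habits = [1.0, 1.2]  # the same person after removing the first factor

before = combined_relative_risk(genetic_rr, current_habits)
after = combined_relative_risk(genetic_rr, improved_habits)
print(f"relative risk before: {before:.2f}, after: {after:.2f}")
```

The genetic term stays fixed across both calls, while the lifestyle terms are the ones a person can change.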

We focus a lot on how polygenic risk scores can include many more genetic variants associated with risk, but they can be extended to also use non-genetic variables and environmental factors. What is the significance of this, and do you see some good examples of risk modeling that includes these different types of data?

Dr. Chatterjee: Coronary heart disease and breast cancer are two fields which historically have been ahead in the development and validation for clinical application of risk models. In both fields, I see there is considerable effort going toward the integration of polygenic risk scores with other types of information, such as rare high-penetrant mutations and non-genetic risk factors.

For example, the BOADICEA model that uses extensive family history information to predict risk of breast cancer has recently been extended to include polygenic risk scores and non-genetic risk factors (Lee et al., Genetics in Medicine, 2019). In our own work, we have recently put a lot of effort into integrating a polygenic risk score for breast cancer (Mavaddat et al., AJHG, 2019) with information on a set of classic risk factors for breast cancer. And we’ve validated these models across a variety of cohorts encompassing six different countries (ongoing work).

How far off are we from incorporating polygenic risk scores into day-to-day health care?

Dr. Chatterjee: I think we still have a while, perhaps 10 years, to sort a few things out before polygenic risk scores can become part of routine care. There are a lot of questions we need to address. First, how much will it cost and who will pay for it? Beyond the costs, what kind of data do we need to convince insurance companies and employers of the value of polygenic risk scores in healthcare? Also, how do we reassure the public that the information will not be misused? Finally, how will the information be communicated so that it does not create a sense of either false alarm or false reassurance? These are just a few thoughts that come to my mind.

Actionable?

Do you think in the current state, these risk models offer people actionable information?

Dr. Chatterjee: Yes, I do think some models, which are well validated, could generate useful information now to guide preventive action. That said, given the current popularity of PRS and risk models in general, I am also wary that there could be many risk models out there that are not properly validated and could generate misinformation.

What are your thoughts about 23andMe’s use of a polygenic score for type 2 diabetes?

Dr. Chatterjee: I am somewhat familiar with 23andMe’s type 2 diabetes report. The underlying methodology used is sound, and the data, as well as other data, show that a PRS for type 2 diabetes can provide meaningful risk stratification for individuals, especially given that it’s a relatively common outcome. In the future, more effort is needed to improve the performance of PRS using additional GWAS, specifically for non-European populations.

What are some strategies for including more data from non-Europeans to make polygenic risk models more broadly applicable?

Dr. Chatterjee: Well, of course we need more GWAS for non-European populations. Nevertheless, in most cases complex diseases have a lot of shared genetic background across different ethnic populations. So we should do everything possible to exploit available data across multiple ethnic groups to develop polygenic risk scores that are optimized for different ethnic populations. More research is needed in developing methodologies in this area.

Nilanjan Chatterjee, Ph.D., is a Bloomberg Distinguished Professor of Biostatistics and Medicine at Johns Hopkins University. Dr. Chatterjee’s opinions expressed here are his own, and not Johns Hopkins University’s. In addition, he has not been compensated by 23andMe for his participation in this interview, and he has no other financial interest in 23andMe.