Talking About Rare Disease with Senior Scientist Suyash Shringarpure

Among his many jobs as 23andMe senior scientist in statistical genetics, Suyash Shringarpure, Ph.D., leads some of our rare disease discovery and prediction work.

But since first coming here in 2016, he’s been involved in a wide variety of projects, all leaning on his extensive background in machine learning. Suyash came here after doing postdoctoral work in Carlos Bustamante’s lab at Stanford University, where he worked on genetic data privacy, ancestral inference, and next-generation genetic sequencing. He received his Ph.D. in Machine Learning from Carnegie Mellon University and his undergraduate degree in Computer Science and Engineering from the Indian Institute of Technology in Bombay.

Suyash Shringarpure, Ph.D., a 23andMe senior scientist and statistical geneticist.


There’s a lot we could talk about because his work touches so many parts of the company — from research and development for our health product team, to work in support of Therapeutics, to some of the backend technical efforts related to things like the phasing of genetic data. But we sat down, virtually, to talk to Suyash specifically about his work on rare diseases at 23andMe. This is one in an occasional series we’ve been doing over the last few months looking at rare diseases. Suyash recently spoke about his research at the National Press Foundation as part of a session on rare diseases. (We will post a recording of the session when it becomes available.)

In the spring we published new research led by Suyash looking specifically at the unique way 23andMe can study rare diseases. It offered some unexpected insights into these hard-to-study conditions.

“This approach will enable us to discover new genetic associations across multiple populations much faster than could be done otherwise,” he said at the time.

Read on to find out more about what he had to say:

Even before you came to 23andMe, you were known for using machine learning and statistical methods for work on a broad range of different scientific problems. Why is machine learning such a powerful tool in genetics and biology and for the study of disease?

Genetics has become a data-rich field in the last couple of decades. There are now many large human genetic (and gene expression, etc.) datasets available for researchers. Because the genetic risk for most common diseases is an aggregate of many different genetic variants, sometimes hundreds or even thousands of them, machine learning and statistical methods offer a powerful tool for calculating disease risk. In genetics and biology, these algorithms can infer the small contributions of many genetic variants and combine them appropriately to describe disease risk.
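To make the idea of combining many small contributions concrete, here is a minimal sketch of an additive polygenic score. It is an illustration only, not 23andMe's model: the variant names, effect sizes, and genotype below are all hypothetical, and real scores involve far more variants plus calibration steps not shown here.

```python
# Hypothetical per-variant effect sizes (log odds ratios); illustrative values only.
EFFECT_SIZES = {
    "variant_a": 0.05,
    "variant_b": -0.02,
    "variant_c": 0.10,
}


def polygenic_score(dosages, effect_sizes):
    """Add up each variant's effect size times the number of effect alleles carried (0, 1, or 2)."""
    return sum(effect_sizes[v] * dosages.get(v, 0) for v in effect_sizes)


# Hypothetical genotype: count of the effect allele at each variant.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
score = polygenic_score(person, EFFECT_SIZES)
print(f"Polygenic score: {score:.2f}")
```

Each variant shifts the score only slightly, but the sum over many variants can separate people at higher and lower genetic risk.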

Why are you and other scientists at 23andMe so interested in rare diseases, and why are we studying systemic sclerosis and idiopathic pulmonary fibrosis (IPF) in particular?

Though each rare disease only affects a small number of people, there are nearly 7,000 known rare diseases, and the total number of people affected by rare diseases is quite large. It’s estimated that about 20-25 million Americans and 300 million people worldwide are living with a rare disease. In addition, about 70 percent of all rare diseases are genetic, while only 5 percent of all rare diseases have approved treatments. So there is a large unmet need for therapeutics for rare diseases, which is why we at 23andMe are so interested. Beyond the need, our large database is also well suited to studying rare diseases. It’s often difficult to study rare diseases because it is a challenge to find enough individuals to participate in research. But 23andMe now has enough people who have consented to participate in research that we can more quickly gather enough individuals with a specific rare disease to find statistically robust genetic associations.

Idiopathic pulmonary fibrosis (IPF) is a life-threatening, progressive lung disease with a median survival of three to five years after diagnosis. The genetics of IPF is not well-understood, and at present, there are only two FDA-approved therapies, and unfortunately, neither is always well-tolerated by patients. We believe we can use our research model to find insights into the disease, and that’s why we started our IPF Study.

Similar to IPF, systemic sclerosis is a rare disease for which there is an unmet need for treatments. Systemic sclerosis is an autoimmune disease that affects the skin and internal organs, and we don’t fully understand the genetics of the disease. Much like with our IPF Study, we launched our Systemic Sclerosis Study to better understand the genetics of the condition and to improve our ability to discover new genetic associations for these diseases.

You completed an interesting study using 23andMe’s unique research model to study rare diseases. Were those findings a surprise when you made them?

Some of them were definitely a surprise. We have lots of experience in studying common diseases, where we need to have tens of thousands of affected individuals in our study to find statistically significant genetic associations. It was a surprise for us when we were able to find genetic associations in studies of rare diseases with only tens of affected individuals. Another surprise was finding that common genetic variants show association with rare disease risk. Previously, it has been found that rare genetic variants cause rare diseases, so this was unexpected. Unsurprisingly, we found that similar regions of the genome contribute to risk in different populations and that aggregating data across all populations enables us to make the most discoveries. 

What makes 23andMe’s research model unique for studying rare diseases?

The main bottleneck in studying rare diseases is finding enough people with the condition to participate in the research. 23andMe’s large research-consented cohort means that even for a disease that affects one in 100,000 people, we can find nearly 100 affected individuals for our analysis. Our self-reported data-collection model also allows us to study thousands of diseases at once, rather than studying one disease at a time. So, for example, in the study we just published on the preprint server medRxiv, we did genome-wide association studies (GWAS) on 33 different rare diseases. For three of those rare diseases, our GWAS was the first ever done on those conditions. So the unique features of 23andMe’s research database allowed us to find the results included in our study. Another unique feature is that through additional surveys, we can also find associations of rare diseases with other traits/diseases. For example, early insights from our IPF study show that IPF patients are more likely to report having gastrointestinal reflux, shortness of breath, depression, and anxiety.
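The arithmetic behind the one-in-100,000 example can be sketched as follows. The cohort size here is an assumption chosen only to match the order of magnitude implied by "nearly 100 affected individuals"; it is not a figure stated in the interview, and the calculation ignores differences between cohort and population prevalence.

```python
def expected_cases(prevalence, cohort_size):
    """Expected number of affected people in a cohort, assuming population prevalence applies."""
    return prevalence * cohort_size


PREVALENCE = 1 / 100_000          # one in 100,000 people, as in the example above
ASSUMED_COHORT_SIZE = 10_000_000  # assumption for illustration, not a reported number

cases = expected_cases(PREVALENCE, ASSUMED_COHORT_SIZE)
print(f"Expected affected individuals: about {round(cases)}")
```

The same disease in a hypothetical study of 100,000 participants would be expected to yield only about one affected person, which is why cohort size matters so much for rare disease research.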

What are you currently working on?

On the rare disease front, we are continuing to recruit IPF and systemic sclerosis patients for our ongoing studies. We are also studying genetic associations for more diseases as our cohort keeps growing. Outside of that, my team and I are working to improve disease risk prediction, specifically on how to make genetic risk predictions accurate for all populations and how to combine genetic and non-genetic data for risk prediction.

Have you learned anything interesting from your own 23andMe results?

I’ve never been a regular coffee drinker, even in graduate school, when many people are heavy coffee drinkers. Turns out I’m genetically less likely than average to drink caffeine, according to my 23andMe Caffeine Consumption Wellness report!