Insights from 23andMe’s Study of Idiopathic Pulmonary Fibrosis

In July, 23andMe launched a large-scale study of idiopathic pulmonary fibrosis (IPF), a serious disease that scars the lungs for reasons that scientists have yet to understand.

23andMe has more than 4,000 people with IPF participating in research, making this one of the largest genetic studies of the condition to date. There is currently no cure for IPF. Our researchers are already generating novel insights into this uncommon but debilitating condition, including a possible link to increased risk for COVID-19.[1]

A Rare Condition

Idiopathic pulmonary fibrosis is a rare condition,[2] affecting approximately 100,000 people in the United States, or less than 0.05 percent of the population. This low prevalence makes it hard to recruit enough people with the condition for research, which in turn makes it difficult to study the disease’s causes and prognosis.
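As a rough check on that figure, the arithmetic can be sketched as follows. The U.S. population value of about 330 million is an assumption introduced here for illustration (it is not stated in this article); the case count is the approximate figure cited above.

```python
# Rough prevalence check. US_POPULATION is an assumed round figure,
# not a number taken from the 23andMe study.
IPF_CASES = 100_000          # approximate U.S. cases cited in the text
US_POPULATION = 330_000_000  # assumption: roughly the 2020 U.S. population


def prevalence_percent(cases: int, population: int) -> float:
    """Return prevalence as a percentage of the population."""
    return 100.0 * cases / population


def prevalence_per_10k(cases: int, population: int) -> float:
    """Return prevalence as cases per 10,000 people."""
    return 10_000 * cases / population


pct = prevalence_percent(IPF_CASES, US_POPULATION)
rate = prevalence_per_10k(IPF_CASES, US_POPULATION)
print(f"{pct:.3f} percent of the population, or {rate:.1f} per 10,000")
```

Under that assumption the result is about 0.03 percent, which is consistent with the "less than 0.05 percent" stated above.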

IPF Risk Factors 

The risk of IPF is thought to be partly influenced by genetics. For example, a variant in the promoter of the MUC5B gene may predispose its carriers to chronic inflammation — and eventually to IPF — by reducing the clearance of harmful substances such as cigarette smoke.[3] 

However, known genetic variants are not likely to fully explain who develops IPF, or why. Through our study,[4] we hope to find novel risk factors for manifesting IPF symptoms within and beyond the genome. Previous studies suggested that environmental factors, including infections with certain viruses (e.g., the Epstein-Barr virus, which causes mononucleosis),[5] smoking, and occupational exposures,[6] may also contribute to IPF risk. Other studies have linked IPF to acid reflux disease, postulating that stomach acid may get aspirated into the lungs and cause the characteristic scarring.

IPF and COVID-19 Risk

There is strong evidence that individuals with lung conditions such as IPF are at higher risk for COVID-19 and its complications.[7] Additionally, IPF and severe COVID-19 share a set of risk factors. For example, men are at higher risk for both COVID-19 and IPF, as are older adults. Certain underlying conditions, such as those that affect the cardiovascular system, also put one at higher risk for both IPF and COVID-19.

Because IPF is a rare condition, we did not have enough data to investigate the connection between IPF and COVID-19 in depth. We did, however, inquire whether individuals diagnosed with IPF are more likely to report taking protective measures such as physical distancing during the current pandemic. 

Early in the pandemic, we asked 23andMe research participants whether they practiced social distancing or avoided crowds. We found a strong correlation: individuals with an IPF diagnosis were more likely to avoid public gatherings (odds ratio (OR)=1.87, P-value=1.74e-5). The correlation held when adjusted for age, sex, essential worker status, education, urban vs. rural residence, and timing of the measurement. Associations with the two other measures of physical distancing (keeping distance from others and avoiding public places) were not statistically significant.

IPF Insights 

In addition to COVID-19 insights, our data from more than 4,000 participants in the 23andMe database help to corroborate some of the earlier observations and offer new insights.

For example, we observed a strong correlation between having IPF and gastrointestinal reflux (odds ratio (OR)=2.89, P-value=1.82e-184). We also saw a weaker correlation between IPF and a history of mononucleosis (OR=1.45, P-value=0.0002). Individuals with IPF were also more likely to report more smoking throughout their lifetime (OR=1.26, P-value=1.33e-6). All of our estimates were adjusted for age, sex, and ancestry.
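For readers unfamiliar with the statistic, an odds ratio compares the odds of reporting a factor among people with IPF to the odds among people without it. The sketch below computes an unadjusted odds ratio and an approximate 95 percent confidence interval from a 2x2 table. The counts are hypothetical, chosen only for illustration, and are not 23andMe data. The estimates in this article were also adjusted for covariates, which would typically require a regression model rather than this simple formula; the article does not describe the exact method used.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted odds ratio for a 2x2 table.

    a: people with IPF who report the factor
    b: people with IPF who do not
    c: people without IPF who report the factor
    d: people without IPF who do not
    """
    return (a * d) / (b * c)


def wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only (not study data).
a, b, c, d = 600, 400, 3000, 6000
low, high = wald_ci(a, b, c, d)
print(f"OR = {odds_ratio(a, b, c, d):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An odds ratio above 1 means the factor is reported more often among people with IPF; a value of 1 means no association.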

Although the causes of IPF are not well understood, the impact the condition has on patients’ lives is something we can see clearly in the data. Among those participating in the 23andMe IPF study and in the 23andMe database, having IPF is strongly associated with depression, anxiety, and decreased quality of life (depression: OR=3.34, P-value=1.01e-29; anxiety: OR=1.21, P-value=9.02e-81).

Our participants reported sleep disturbances (OR=2.24, P-value=1.13e-26), shortness of breath (OR=6.09, P-value=5.68e-115), and migraines (OR=3.57, P-value=1.93e-8). Consistent with previous reports, we also saw strong associations between IPF and allergy symptoms, although our data do not allow us to disentangle the cause-and-effect relationship between the two. The top associations with IPF in our database are summarized in Figure 1. 

Figure 1: Environment-wide association study of IPF in the 23andMe customer population. Top IPF-associated factors are listed on the left, with the number of customers who provided information on each factor in parentheses. The estimates of association are presented with error bars on the right side.


Additionally, we looked at gastrointestinal distress symptoms that commonly accompany medications taken by IPF patients, namely pirfenidone, nintedanib, and N-acetylcysteine. We observed a significant association between IPF diagnosis and irritable bowel symptoms, adjusted for age, sex, and race/ethnicity (OR=1.90, P-value<1.00e-16). Upon additional adjustment for medication use, the association lost its statistical significance (OR=1.50 to 1.53 depending on the medication, P-values>0.05), suggesting that the observed correlation between IPF and gastrointestinal symptoms was strongly mediated by medication use.
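One informal way to gauge how much an association shrinks after an extra adjustment is to compare the odds ratios on the log scale. The sketch below applies that heuristic to the odds ratios reported above. It is not the analysis performed in the study and is not a formal mediation analysis: odds ratios can change under adjustment for reasons unrelated to mediation, so the output should be read as a rough guide only. The function name is introduced here for illustration.

```python
import math


def log_odds_attenuation(or_before: float, or_after: float) -> float:
    """Fraction of the log-odds association removed by the extra adjustment."""
    return 1.0 - math.log(or_after) / math.log(or_before)


OR_BASE = 1.90                  # adjusted for age, sex, and race/ethnicity
OR_WITH_MEDS = (1.50, 1.53)     # range reported after adjusting for medication

for or_adj in OR_WITH_MEDS:
    share = log_odds_attenuation(OR_BASE, or_adj)
    print(f"OR {OR_BASE:.2f} -> {or_adj:.2f}: {share:.0%} of the log-odds association removed")
```

On this heuristic, roughly a third of the log-odds association is removed by adjusting for medication. The loss of statistical significance also depends on how precisely the adjusted estimate is measured, which this calculation does not capture.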

IPF Demographics in the 23andMe Database

From a demographic standpoint, our study population is consistent with the current understanding of IPF epidemiology. The prevalence of IPF strongly increases with age (Figure 2), and there is evidence that the aging process itself accelerates the pathogenesis of this disease.[8]

Although past reports indicate that men are far more affected by IPF than women, in our data set the prevalence of IPF among men only slightly exceeded that among women (7.4 cases per 10,000 men vs. 7.0 cases per 10,000 women). However, it is believed that the previously reported increase in risk among men may be attributed to historical smoking patterns rather than underlying biological differences, and is likely to diminish as smoking rates drop.

The diversity of our customer cohort also enabled us to consider IPF patterns by racial/ethnic group. We observed the highest prevalence (7.8 cases per 10,000 individuals) among individuals of European descent, followed by African Americans (6.7 cases per 10,000 individuals). Individuals of East Asian descent had the lowest prevalence of IPF (2.4 cases per 10,000 individuals).
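To put these group differences on a common scale, the reported prevalences can be expressed as ratios. The sketch below uses only the figures quoted above. The ratios are crude: the article does not state whether the prevalences are age-standardized, so differences in the age makeup of each group may contribute to them. The dictionary and function names are introduced here for illustration.

```python
# Prevalence figures quoted in the text, in cases per 10,000 individuals.
PREVALENCE_PER_10K = {
    "men": 7.4,
    "women": 7.0,
    "European descent": 7.8,
    "African American": 6.7,
    "East Asian descent": 2.4,
}


def prevalence_ratio(group: str, reference: str) -> float:
    """Crude prevalence ratio of one group relative to a reference group."""
    return PREVALENCE_PER_10K[group] / PREVALENCE_PER_10K[reference]


print(f"men vs. women: {prevalence_ratio('men', 'women'):.2f}")
print(
    "European vs. East Asian descent: "
    f"{prevalence_ratio('European descent', 'East Asian descent'):.2f}"
)
```

The ratio for men versus women is close to 1, while the ratio for European versus East Asian descent is a little over 3.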

Figure 2. Prevalence of IPF in the 23andMe data set by age group.


Our study included IPF patients from all 50 U.S. states (Figure 3). The observed geographic patterns reflected the effects of age (i.e., states with an older population, like Arizona and Florida, had a higher prevalence of IPF) as well as occupational exposures, e.g., mining in West Virginia and New Mexico.

Figure 3. The geographic distribution of IPF prevalence among 23andMe customers by state.[9]


With our powerful data collection and analysis platform, we are just beginning to understand IPF risk and disease history. We appreciate any new data from our customers as it helps us get closer to finding new effective treatments for this debilitating condition. 

Our efforts to recruit participants with IPF to our study and expand upon these findings are ongoing.