In the hunt for new medications where the odds are stacked against you, where fewer than one in ten drug programs succeed, you need an edge.
A new study by 23andMe’s Therapeutics and Research teams builds upon previous research, indicating that human genetic data offers that kind of advantage and doubles or triples the success rate of drug development.
If you add improved gene mapping to your toolbox, as 23andMe has done for this study, the success rate improves by four or five times. Moreover, the sheer scale of the 23andMe data ensures that genetic insights linking a drug target to disease are available for 60 percent more drug targets than from other resources.
Traditional drug development is risky, time-consuming, and costly. Ninety percent of drugs fail. All those failures add time and money to the process. That means that getting a drug approved costs, on average, about $1 billion and takes more than a decade. Only about 50 new drugs are approved in an average year by the FDA. There were 55 in 2023, and so far this year, there have been 24. This is despite the billions spent each year by pharmaceutical companies on research and development for new medications.
A Massive Opportunity
Previous work has pointed to human genetics as one way to reverse this trend, and several companies are now relying on human genetics as a starting point for drug target discovery. We now show unequivocally that using human genetic data at scale could radically change the hunt for new drugs, helping to get new medicine to people who need it faster and more efficiently.
“There’s still a massive amount of value left utilizing human genetics for drug development,” said Adam Auton, Ph.D., Vice President of Human Genetics at 23andMe and a study co-author. “We’ve just scratched the surface, and the scale of the 23andMe dataset allows us to draw links between genes and disease that aren’t possible to find anywhere else. 23andMe continues to invest in new capabilities and computational approaches such as integrating artificial intelligence and machine learning into our discovery pipeline.”
The new study, released on medRxiv, a preprint server for the health sciences, offers a peek behind the curtain into 23andMe’s massive and powerful therapeutic research database. In this post, we hope to highlight some of its critical findings and illustrate how using human genetic data can support drug target discovery, target validation, and target de-risking in the drug development process.
“Traditional approaches to drug development are often a gamble, where nine out of ten programs fail. But with the scale and precision of 23andMe’s human genetic data, we’re changing the odds,” said Xin Wang, Ph.D., a 23andMe Therapeutics Senior Scientist and the lead author of the study.
“Our study demonstrates that leveraging genetics can double or triple success rates and, when combined with advanced gene mapping, improve success by up to fivefold. Our methods ensure we’re targeting diseases more accurately and efficiently, bringing life-saving treatments to patients faster.”
Size & Scale: Why Big is Better
23andMe has the world’s largest genetic and phenotypic database for therapeutic research. It’s vital to grasp the power of a resource of that scale.
For this study, 23andMe scientists used de-identified data from more than 7.5 million individuals who consented to participate in research. For perspective, that research cohort is almost twice the size of the combined total of all the major public biobanks worldwide. Most of these biobanks number in the tens of thousands, while the largest are between 500,000 and one million. 23andMe has more than 15 million customers with genotype and phenotype information. More than 80 percent of 23andMe customers consent to participate in research and have contributed over 4 billion health and trait data points, including disease diagnosis, medication use, medication side effects, lifestyle, diet, and family health history.
Overall, the combined phenotypic and genetic data available for life sciences research from 23andMe is an order of magnitude larger than any other cohort.
The size and scale of 23andMe’s drug discovery database are critically important because they capture much more diversity. A diverse research database is essential both for equity and for discovery: having enough data from people of different ethnicities helps discern the genetic variants that cause disease across these populations. Sometimes, a variant is more common in one population than in another. Including people of diverse ethnicities offers more potential to identify targets not easily identified in a more homogeneous dataset.
Bigger is better because as the size of 23andMe’s database grows and as we apply new technologies like AI and machine learning to this massive dataset, researchers can find more genetic associations, leading to more potential drug targets.
Gene Mapping and More Targets
For example, the total number of target-indication pairs is 60 percent greater with 23andMe’s dataset than from public biobank data. A target-indication pair is the pairing of a specific condition or disease, known as ‘an indication,’ to a specific protein or biological ‘target.’ Identifying these target-indication pairs is an essential first step in drug discovery and development. As part of this study, 23andMe scientists found about 1,050 target-indication pairs for medically relevant conditions related to more than 41 disease categories, including different cancers, metabolic conditions, neurological conditions, and autoimmune diseases.
Just look at one of these conditions, asthma, to illustrate the power of size and scale. 23andMe has data for more than three million individuals within its asthma cohort — this includes those with asthma and those without the condition who are used as controls. From that cohort, researchers identified 652 significant genetic associations for asthma. For comparison, a 2022 meta-analysis by researchers at the Broad Institute included about 1.6 million individuals and identified 49 novel genetic associations among a total of 179 genetic associations found.
The 652 significant genetic associations found for asthma are just one example among the more than 1,050 target-indication pairs for medically relevant conditions. In all, 23andMe scientists found more than 140,000 significant genetic associations for different common and rare conditions. That amounts to an average of 130 different genetic associations for each condition. In addition, those associations point to 15,007 genes associated with therapeutically relevant conditions, including 1,355 genes associated with conditions that scientists had not previously identified in public genetic studies. That amounts to about 5.9 to 18.8 novel indications per genetic association.
By adding gene mapping, scientists can more easily identify the likely causal gene for a genetic association and determine the mechanism for how a specific gene is related to a disease or condition. 23andMe researchers found that improved gene mapping methods from human genetic associations also improve relative success by four or five times.
This is a potent tool for scientists who want more power in their work and offers a head start for those hunting for higher-confidence potential drug targets.
“The scale of the data enables me to perform studies in an afternoon at 23andMe that would have taken years in my previous academic life,” Auton once said about the research database.
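The "relative success" figures quoted in this section can be read as a ratio of two success rates: the approval rate of drug programs whose target has genetic support, divided by the approval rate of programs without it. The sketch below shows only that arithmetic. The class name, function name, and all counts are hypothetical, chosen for illustration; they are not numbers or methods from the study.

```python
from dataclasses import dataclass


@dataclass
class ProgramCounts:
    """Counts of drug programs (target-indication pairs) in one group."""
    approved: int  # programs that reached approval
    total: int     # all programs that entered development

    @property
    def success_rate(self) -> float:
        return self.approved / self.total


def relative_success(supported: ProgramCounts, unsupported: ProgramCounts) -> float:
    """Success rate with genetic support divided by success rate without it."""
    return supported.success_rate / unsupported.success_rate


# Hypothetical counts, picked only to make the arithmetic easy to follow.
with_genetics = ProgramCounts(approved=30, total=100)       # 30% succeed
without_genetics = ProgramCounts(approved=100, total=1000)  # 10% succeed

rs = relative_success(with_genetics, without_genetics)
print(f"Relative success: {rs:.1f}x")
```

With these made-up counts, programs with genetic support succeed about three times as often as programs without it, which is the kind of comparison a "double or triple" or "four or five times" claim summarizes.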
From Gene to Drug
Traditional target-based drug discovery rarely follows a direct path; it is more like a game of Chutes and Ladders. There are many dead ends and much trial and error, which might force researchers to scrap one target and start over with another.
With a target identified, researchers can search for a compound that will hit that target. They then must test whether it is safe and effective. It must adequately bind to a specific disease-causing target in the body and trigger the correct response without affecting other parts of the body. If the drug has no effect or triggers an adverse reaction, the effort returns to square one. Each step in the process, from identifying the right target, to finding a molecule or drug that will properly bind to that target, to determining whether that drug elicits the right response, is a potential point of failure.
However, using human genetic data offers an edge at each point, improving the potential for success. We’ve shown how human genetics dramatically boosts the identification of potential targets; it can also enhance the chances of success at later steps in the process.
Disease Subtypes
But leveraging human genetic data isn’t just about boosting potential targets; it can also help identify disease subtypes, flag safety signals, and select the individuals best suited for clinical trials. Disease subtypes may differ in symptoms and progression and may respond differently to a therapeutic. Human genetic data can also reveal safety signals, giving researchers more confidence that targeting a gene is safe. For example, with this large dataset, 23andMe researchers can identify individuals who carry a loss-of-function variant, one that naturally turns off a gene. If those carriers show no ill effects, that is evidence that turning off that specific gene is not deleterious, which is incredibly important for drug development. Finally, researchers can use genetic data to help identify individuals for clinical trial recruitment who are most likely to respond to a therapeutic, further improving the chances for clinical trial success.
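One simple way to picture the loss-of-function safety check described above is to compare how often a condition appears among carriers of a variant and among non-carriers. The sketch below computes an odds ratio with an approximate 95 percent confidence interval (the standard log-odds method). The function names and all counts are hypothetical; this is an illustration of the idea, not the analysis used in the study, which would also need to account for factors such as ancestry, age, and sex.

```python
import math


def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Odds of the condition among carriers divided by odds among non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


def confidence_interval_95(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Approximate 95% interval from the standard error of log(odds ratio)."""
    counts = (carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls)
    log_or = math.log(odds_ratio(*counts))
    std_err = math.sqrt(sum(1 / c for c in counts))
    return math.exp(log_or - 1.96 * std_err), math.exp(log_or + 1.96 * std_err)


# Hypothetical counts for one loss-of-function variant and one condition.
counts = dict(
    carrier_cases=20,             # carriers with the condition
    carrier_controls=980,         # carriers without it
    noncarrier_cases=40_000,      # non-carriers with the condition
    noncarrier_controls=960_000,  # non-carriers without it
)

ratio = odds_ratio(**counts)
low, high = confidence_interval_95(**counts)

if high < 1:
    verdict = "carriers have lower odds (possibly protective)"
elif low > 1:
    verdict = "carriers have higher odds (possibly deleterious)"
else:
    verdict = "no clear difference at this sample size"

print(f"odds ratio = {ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f}): {verdict}")
```

An odds ratio below 1 with an interval that excludes 1 suggests carriers are less likely to have the condition; a ratio above 1 suggests the opposite. Because carriers of rare variants are few, very large cohorts are what make the interval narrow enough to be informative.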
A drug target for an indication is not an actual target; it’s a potential target. Researchers must do additional work to determine if it is the right target for a specific condition. Making that determination is another crucial step that can be enhanced using human genetic data in gene mapping. Gene mapping is vital for understanding whether a potential target is relevant to a specific condition.
23andMe’s massive research data allows scientists to create a gene-mapping algorithm that, in turn, can improve the likelihood of clinical success of a drug target by four or five times, according to 23andMe’s new study. As part of this research, 23andMe scientists mapped 416 unique gene-trait associations, genes linked explicitly to about 190 different diseases or conditions.
Mapping a variant or drug target indication to a nearby gene is called “variant-to-gene mapping.” It helps researchers determine the most likely causal gene from a genetic association. Mapping that out involves using functional data (information on protein and molecular structures, data on biochemical interactions, and other data) to understand the link between a potential drug target and a specific health condition.
Researchers use gene mapping to understand how a specific variant affects the function of a gene and its role in a disease or condition. Is it protective against a disease or condition, or is it causal? What happens if that gene expression is changed or turned on or off?
Rare variants offer another way of making these determinations because a rare variant precisely pinpoints causal genes at the point of a genetic association, something common variants don’t. In some cases, a rare variant might turn off gene expression; such a variant is known as a “loss-of-function” variant. From a research standpoint, identifying these rare variants and understanding whether that loss of function is deleterious or protective is incredibly powerful. Knowing that ahead of time gives researchers more confidence that targeting that variant is safe, has the desired effect, or both. One well-known example is a loss-of-function variant in the PCSK9 gene that lowers the level of LDL cholesterol in the blood. That information has been used to develop cholesterol-lowering medications.
So, identifying rare variants offers researchers a shortcut for focusing on targets with the most promise for success. In addition, 23andMe researchers use machine learning to discern patterns in the data, match genes to diseases using genetic associations, and suggest possible useful drug candidates that match the target variants.
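To make the idea of variant-to-gene mapping concrete, here is a minimal sketch of the simplest possible baseline: assign each variant to the closest gene on the same chromosome. This is not 23andMe’s gene-mapping algorithm, which, as described above, draws on functional data and machine learning. The gene names, coordinates, and the 500,000 base-pair search window are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # first base of the gene (inclusive)
    end: int    # last base of the gene (inclusive)


def distance_to_gene(chrom: str, pos: int, gene: Gene) -> Optional[int]:
    """Base-pair distance to a gene: 0 if inside it, None if on another chromosome."""
    if chrom != gene.chrom:
        return None
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def nearest_gene(chrom: str, pos: int, genes: List[Gene],
                 max_distance: int = 500_000) -> Optional[Gene]:
    """Return the closest gene within max_distance, or None if there is none."""
    best, best_dist = None, None
    for gene in genes:
        dist = distance_to_gene(chrom, pos, gene)
        if dist is None or dist > max_distance:
            continue
        if best_dist is None or dist < best_dist:
            best, best_dist = gene, dist
    return best


# Hypothetical gene annotations, for illustration only.
GENES = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 20_000, 30_000),
    Gene("GENE_C", "chr2", 1_000, 2_000),
]

for chrom, pos in [("chr1", 3_000), ("chr1", 18_000), ("chr3", 100)]:
    gene = nearest_gene(chrom, pos, GENES)
    print(chrom, pos, "->", gene.name if gene else "no gene in range")
```

The nearest gene is often, but not always, the causal one, which is why more refined mapping that brings in functional evidence and rare variants can improve the odds of picking the right target.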
The Human Genetic Advantage
For this research, 23andMe scientists analyzed trillions of data points to illustrate the advantage of adding human genetics in efficiently identifying drug targets, de-risking the drug development process, and dramatically boosting the probability of success.
The new study just scratches the surface of what is possible with this data resource’s vast scale and diversity. It also hints at what is possible when artificial intelligence and machine learning techniques are applied to this powerful human genetic database for drug discovery.
“In an industry where just one in ten drugs make it to patients, the potential impact from what we’re doing at 23andMe and what our scientists have demonstrated with this paper could be game-changing,” said Auton. “We believe that 23andMe can enable the broader biotech industry to bring new drugs to patients faster and more cheaply.”