Oct 23, 2024 - Ancestry

Improvements to DNA Relatives: New HybridIBD Algorithm and Relationship Predictions

Recently, 23andMe significantly improved relationship estimates with two updates: a new algorithm that improves relationship predictions for distant relatives, and HybridIBD, a new method for detecting DNA Relatives.

These improvements are particularly beneficial for those who use 23andMe to find connections with their most distant DNA Relatives. Whether you’re a hobbyist dabbling in genealogy or a dedicated researcher tracing distant branches of your family tree, these changes will provide better insights to help you delve more into your genetic connections.

What These Updates Mean for You

With these updates, all customers will receive improved relationship predictions. These predictions will help you uncover older family connections that may date back much further than previously possible.

Results for existing 23andMe+ Premium and Total Health members will automatically be computed using the new HybridIBD algorithm. As a result, many members will see new distant relatives and get more reliable insights for their genealogy research.

Results for all new 23andMe ancestry customers will be computed using the new HybridIBD algorithm.

Existing Ancestry and Health + Ancestry Service customers who want their results to be computed using the HybridIBD algorithm should upgrade to 23andMe+ Premium or Total Health. 

IBD: The Science Behind Your Genetic Matches

Have you ever wondered what the signature of genetic relatedness is? The hallmark of genetic relatedness is long tracts of DNA shared between two people. This sharing is called identity-by-descent (IBD): segments of DNA that two people both inherited from a common ancestor. IBD is the basis for estimating how closely two people are related.

We use IBD extensively at 23andMe, especially in our DNA Relatives feature, which shows customers how closely they are related to their matches. We are launching these two updates to improve the accuracy of our relationship predictions. The first update changes our method of making relationship estimates from IBD data. The second update improves the algorithm we use to detect IBD in the first place.

Improved Relationship Predictions

You may have noticed that your predictions from 23andMe and other companies top out at around 5th or 6th cousins, regardless of how far down the list of relatives you scroll. For the better part of two decades, researchers have thought that the most distant relationship we could detect using IBD was around 15 degrees. This corresponds roughly to a 6th cousin who shares a common ancestor during the American Revolutionary War. It has long been possible to detect small shared segments that arose hundreds of generations in the past. However, it was thought that the long shared DNA segments detected among genotyped individuals in large databases like that of 23andMe represented relatively recent relationships.

Most relatives inferred by 23andMe are probably moderately distant, perhaps from 4th to 15th cousins. However, a new 23andMe paper released earlier this year demonstrated that the most distant relationship we can detect may be closer to a few hundred degrees. This corresponds to a common ancestor who lived thousands of years ago, perhaps before the rise of the Roman Empire.
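To make the "degrees" in this discussion concrete, here is a small illustrative sketch. It assumes the common convention that a degree counts the parent-child steps on the path between two people through their common ancestor, so an n-th cousin is 2 × (n + 1) degrees apart. The 30-years-per-generation figure and the function names are our own round-number assumptions for illustration, not values from the paper.

```python
# Illustrative arithmetic only. Assumed convention: degree = number of
# parent-child steps on the path between two people via their common ancestor.
YEARS_PER_GENERATION = 30  # assumed round figure, not taken from the article


def cousin_degree(n: int, removed: int = 0) -> int:
    """Degree of relationship for an n-th cousin, `removed` times removed."""
    return 2 * (n + 1) + removed


def years_to_ancestor(degree: int) -> float:
    """Rough years back to the common ancestor, assuming a symmetric path."""
    generations = degree / 2
    return generations * YEARS_PER_GENERATION


if __name__ == "__main__":
    for n in (1, 4, 6):
        print(f"cousin level {n}: {cousin_degree(n)} degrees")
    for degree in (15, 300):
        print(f"{degree} degrees: roughly {years_to_ancestor(degree):.0f} years back")
```

Under these assumptions a 6th cousin is 14 degrees, and a relationship of a few hundred degrees points to a common ancestor several thousand years back.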

There are two primary reasons existing relationship estimates do not go back that far. First, existing inference methods underestimate the probability that distant relatives share IBD. These estimators assume that each pair of distant relatives is connected through only one genealogical lineage. This can be a reasonable assumption when two relatives are closely related, but the assumption breaks down when we consider relationships whose common ancestors lived many generations in the past. In the distant past, each person shared many common ancestors with each of their relatives. Each person is connected to their ancestors by many different lineages, a phenomenon known as “pedigree collapse”.

Published Research on Relationship Inference

Our recent paper pointed out that the formulas underlying existing relationship inference methods don’t account for pedigree collapse. The fact that we are connected to each of our relatives many times over means that there are many paths for DNA transmission, not just one. This leads to a higher probability than we thought of sharing DNA with very distant relatives. As a result, there is a reasonable chance of sharing DNA with someone through an ancestor who lived many hundreds or even thousands of years in the past.
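A rough sketch can show why many lineages matter. It uses a simplified, textbook-style approximation for the expected number of shared segments through a single ancestor separated by d meioses, (r·d + c) / 2^(d−1), where r is the genome length in Morgans and c is the number of autosomes. It then treats lineages as independent, which is an idealization. The constants, the function names, and the independence assumption are ours for illustration; this is not the estimator from the paper, and it ignores the minimum segment length a real detector needs.

```python
import math

GENOME_MORGANS = 35.0  # assumed approximate autosomal map length
NUM_AUTOSOMES = 22


def expected_segments(meioses: int) -> float:
    """Approximate expected shared segments through ONE ancestor on one lineage."""
    return (GENOME_MORGANS * meioses + NUM_AUTOSOMES) / 2 ** (meioses - 1)


def prob_share_any(meioses: int, lineages: int = 1) -> float:
    """Chance of sharing at least one segment, treating lineages as independent
    Poisson sources (a simplifying assumption)."""
    rate = lineages * expected_segments(meioses)
    return -math.expm1(-rate)  # 1 - exp(-rate), accurate for tiny rates


if __name__ == "__main__":
    for k in (1, 100, 10_000):
        print(f"30 meioses, {k} lineages: P(share) = {prob_share_any(30, k):.2e}")
```

With a single lineage the chance of sharing DNA across 30 meioses is tiny, but it grows roughly in proportion to the number of connecting lineages, which is the intuition behind accounting for pedigree collapse.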

The second reason existing relationship estimators underestimate relationships is that they are applied almost exclusively to pairs of individuals known to share IBD in the first place. Genetic testing companies use a two-step procedure to identify your relatives. First, they find people with whom you share IBD. Next, they use this IBD to estimate relationships. However, the formulas that underlie existing relationship estimators aren’t designed for this approach. Instead, the formulas assume that two putative relatives were selected regardless of whether or not they shared IBD. Until our recent paper, it wasn’t well understood that this unintended application of estimators profoundly affected the inferences they made.

Below, Figure 1A shows the result of estimating a relationship using an existing relationship estimator, i.e., without accounting for two people sharing IBD.

In comparison, Figure 1B shows estimates made when accounting for the fact that IBD is shared. Figure 1 shows that existing relationship estimates all have a ceiling of around 10 degrees and are profoundly biased for more distant relationships. In contrast, the new estimator, which accounts for observed IBD, can generate very deep relationship estimates. Moreover, the higher uncertainty in the new estimates captures the true range of likely relationships.

Two graphs showing degree of relatedness.

Figure 1: Existing and new relationship estimates. A) A relationship estimator that does not account for the fact that at least one segment of IBD is observed. B) An estimator that accounts for observing at least one segment of IBD. 
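The contrast between the two panels can be reproduced in miniature with a toy model. The sketch below assumes the number of shared segments is Poisson with the simplified mean (r·d + c) / 2^(d−1), looks only at the segment count (not segment lengths), and ignores pedigree collapse. All names and constants are illustrative assumptions, not the estimators used in DNA Relatives.

```python
import math

GENOME_MORGANS = 35.0  # assumed approximate autosomal map length
NUM_AUTOSOMES = 22


def expected_segments(degree: int) -> float:
    """Simplified expected number of shared segments at a given degree."""
    return (GENOME_MORGANS * degree + NUM_AUTOSOMES) / 2 ** (degree - 1)


def likelihood_unconditional(n_segments: int, degree: int) -> float:
    """Poisson likelihood that does NOT account for IBD having been observed."""
    lam = expected_segments(degree)
    return math.exp(-lam) * lam ** n_segments / math.factorial(n_segments)


def likelihood_given_ibd(n_segments: int, degree: int) -> float:
    """Same likelihood, conditioned on at least one segment being observed."""
    lam = expected_segments(degree)
    prob_at_least_one = -math.expm1(-lam)  # 1 - exp(-lam)
    return likelihood_unconditional(n_segments, degree) / prob_at_least_one


degrees = range(2, 41)
# Most likely degree for a pair sharing exactly one segment, under each model.
best_uncond = max(degrees, key=lambda d: likelihood_unconditional(1, d))
best_cond = max(degrees, key=lambda d: likelihood_given_ibd(1, d))

if __name__ == "__main__":
    print("unconditional estimate:", best_uncond)
    print("conditioned on IBD:", best_cond)
```

In this toy model the unconditional likelihood for a single shared segment peaks at a moderate degree, because deeper relationships are penalized for being unlikely to share anything at all. Once we condition on a segment having been observed, that penalty disappears and the likelihood keeps rising out to the deepest degree considered, which is why a realistic estimator must also use segment lengths and report wide uncertainty.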

Introducing HybridIBD: Exceptionally Accurate DNA Matching

The new DNA relationship estimates described above rely on accurate IBD detection. HybridIBD is a new IBD detection algorithm that combines strengths from two separate 23andMe IBD algorithms to dramatically improve IBD detection. Let’s take a look at these two algorithms.

  • phasedIBD compares phased genomes, in which each person’s DNA has been separated into its two parental contributions. It excels at finding the short IBD segments that connect you to distant relatives. With phasedIBD, segments down to 5 cM or shorter can be identified reliably. This is important for accurately identifying distant relatives!
  • IBD64 works by comparing unphased genomes. This gives it an advantage for finding the long IBD segments shared between close relatives, since phasing errors often fragment these long segments considerably.

    For a more detailed assessment of phasedIBD and IBD64, see the paper authored by 23andMe scientists in Molecular Biology and Evolution.

With HybridIBD, every new genome is first analyzed by phasedIBD to find distant relatives. Then, for any pair sharing a significant amount of IBD, the shared segments are refined using IBD64. This dual approach leverages the strengths of each algorithm, improving the overall accuracy of DNA Relatives.
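The two-stage flow described above can be sketched as follows. This is only an outline of the control flow under stated assumptions: the detector functions are stand-ins passed in by the caller, the 100 cM cutoff for "a significant amount of IBD" is a made-up placeholder, and a production pipeline would not loop over pairs one at a time. None of these names come from 23andMe's code.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segment = Tuple[int, float, float]  # (chromosome, start_cM, end_cM)
Detector = Callable[[str, str], List[Segment]]


def total_cm(segments: List[Segment]) -> float:
    """Total shared length in centimorgans."""
    return sum(end - start for _, start, end in segments)


def hybrid_ibd(
    new_id: str,
    database_ids: Iterable[str],
    phased_detector: Detector,       # stand-in for a phasedIBD-like step
    unphased_detector: Detector,     # stand-in for an IBD64-like step
    refine_threshold_cm: float = 100.0,  # hypothetical cutoff
) -> Dict[str, List[Segment]]:
    """Detect with the phased method first, then refine heavy sharers."""
    results: Dict[str, List[Segment]] = {}
    for other_id in database_ids:
        segments = phased_detector(new_id, other_id)
        if not segments:
            continue  # no IBD found, so this pair is not reported
        if total_cm(segments) >= refine_threshold_cm:
            # Close relatives: long segments may be fragmented by phasing
            # errors, so recompute them with the unphased method.
            segments = unphased_detector(new_id, other_id)
        results[other_id] = segments
    return results
```

Passing the detectors in as functions keeps the sketch independent of how either algorithm is actually implemented.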

We’ve developed a robust and scalable pipeline for HybridIBD that processes new customers against our database of over 15 million individuals, discovering approximately 300 million new relationships daily!

Benefits for 23andMe Customers

When the new relationship estimators are implemented, all 23andMe customers can expect improved distant relative predictions for many relatives currently predicted as second cousins or higher.  Customers will also see improvements in some close relationship predictions, especially first cousins, as the new estimator better accounts for age differences between you and your relatives.

Customers with Ashkenazi Jewish ancestry will see particularly large improvements in their distant relative predictions because our new relationship estimator better accounts for pedigree collapse, which was particularly strong in the historical Ashkenazi population. The new, wider confidence intervals also better capture the true range of likely relationships.

Customers whose results are computed with HybridIBD can expect further changes. First, the accuracy of distant DNA Relatives will increase significantly. With phasedIBD now responsible for identifying distant relatives, many previous identifications made by IBD64 will be superseded by more accurate matches from phasedIBD. Two improvements are worth calling out: customers with South Asian, West Asian, or North African ancestries will find fewer DNA relationships with customers with only European ancestry, and customers with East Asian ancestries will see significantly more accurate second-, third-, and fourth-cousin relationships.

Learn More

If you’re curious to learn more about the science behind these updates, we recommend checking out these papers:

Find out more

23andMe customers can see their Ancestry Composition

Are you still waiting to be a customer? Find out more about 23andMe’s  Ancestry Service and other services.
