Feb 22, 2023 - Ancestry Service

New Algorithm Cleans Up 23andMe Family Trees


Some types of close relatives, such as aunts, uncles, half-siblings, nephews, nieces, grandparents, and grandchildren, have very similar patterns of shared DNA, which makes it difficult to predict those relationships from DNA alone. That’s why some customers might see a relative incorrectly labeled as one of these relationship types. To improve detection of these relationships, 23andMe is now using a new technique that analyzes DNA from full siblings.

23andMe, like other companies, uses shared segments of DNA to detect customers’ relationships to each other. But unlike other companies, we also combine information from every pair of relatives using 23andMe’s Bonsai software. In addition to building your 23andMe Family Tree, Bonsai uses segments from multiple pairs of relatives at once, allowing it to make a better prediction than if we only analyzed you and each of your relatives one by one. 

Still, no way of estimating relationships from DNA is perfect, and that’s true even for close relatives. 

Relationship | Amount of DNA shared (95% confidence interval)* | Age distribution 23andMe uses (95% confidence interval)
--- | --- | ---
Half-sibling | ~19.8-30.2% | 17.4 year age difference
Avuncular (aunt/uncle-nephew/niece) | ~20.3-29.7% | 17.5 – 39.3 year age difference
Grandparental | ~18.7-31.3% | 45.3 – 67.7 year age difference

* Calculated using shared segments from the autosomes (i.e., excluding the X and Y chromosomes) from simulated relatives.

Using ages or shared parents to distinguish relative types

Telling these relative types apart is especially difficult because the amount of DNA each type shares is almost identical (see the above table). Because of this similarity, Bonsai relies on customers’ self-reported ages to make a decision about their relationships.  
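To make the overlap concrete, here is a minimal sketch that checks which relationship types are compatible with an observed percentage of shared DNA. It uses only the 95% intervals from the table above; the names `SHARED_DNA_95` and `compatible_relationships` are illustrative and are not part of Bonsai.

```python
# 95% intervals for the percent of autosomal DNA shared, copied from the table above.
SHARED_DNA_95 = {
    "half-sibling": (19.8, 30.2),
    "avuncular": (20.3, 29.7),
    "grandparental": (18.7, 31.3),
}


def compatible_relationships(percent_shared):
    """Return the relationship types whose 95% interval contains the observed share."""
    return [
        name
        for name, (low, high) in SHARED_DNA_95.items()
        if low <= percent_shared <= high
    ]


# A typical ~25% match is compatible with all three types, so the amount
# of shared DNA alone cannot separate them.
print(compatible_relationships(25.0))
```

Only values near the edges of the intervals rule anything out, which is why Bonsai needs additional evidence such as ages or other relatives.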

Even though general age differences exist between these relative types (half-siblings are typically closer in age than aunt-nephew or uncle-niece relatives), there are always families that don’t follow these trends. In such families, Bonsai is prone to making mistakes. For example, as the age ranges in the above table suggest, Bonsai will often label half-siblings who were born 20 years apart as avuncular, or it will mistake a grandmother for an aunt if the grandmother is less than 40 years older than her grandchild.
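The age-based failure mode can be illustrated with a rough sketch. This is not Bonsai's actual age model, which the post does not describe in detail; it simply picks the relationship type whose age interval from the table is closest to the observed age gap. The lower bound of 0 for half-siblings is an assumption, since the table lists only "17.4", and the name `age_based_guess` is hypothetical.

```python
# Age-difference intervals (years) from the table above.
AGE_GAP_95 = {
    "half-sibling": (0.0, 17.4),  # lower bound is an assumption; the table gives only 17.4
    "avuncular": (17.5, 39.3),
    "grandparental": (45.3, 67.7),
}


def age_based_guess(age_gap_years):
    """Pick the relationship type whose age interval is nearest to the age gap."""

    def distance(interval):
        low, high = interval
        if low <= age_gap_years <= high:
            return 0.0
        return min(abs(age_gap_years - low), abs(age_gap_years - high))

    return min(AGE_GAP_95, key=lambda name: distance(AGE_GAP_95[name]))


# Half-siblings born 20 years apart look avuncular by age alone.
print(age_based_guess(20))
# A grandmother 38 years older than her grandchild also looks like an aunt.
print(age_based_guess(38))
```

Both calls land in the avuncular interval, matching the two mistakes described above.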

It’s worth noting that if the parent who links two half-siblings, a grandparent and grandchild, or an aunt/uncle and niece/nephew is a 23andMe customer, Bonsai would detect this and get the relationship right in nearly all cases. 

For example, in the figure below, if we had data for Ava and Beau’s parents, Bonsai would almost always infer their mystery relative correctly as the aunt of the two siblings. Still, not all families have these parents or other informative relatives tested. As a result, Bonsai predicts a significant fraction of half-sibling and avuncular relatives (aunt/uncle or niece/nephew) incorrectly when it relies on age data alone to distinguish between them (see the table near the end of this post).

On average, people share ~25 percent of their DNA with half-siblings, aunts, uncles, nieces, nephews, grandparents, or grandchildren. Based only on a DNA comparison of a pair of relatives, it is difficult to distinguish between these relationships and pinpoint the correct one.  But there’s a simple trick to help narrow down the correct relationship when one person in the pair has a full sibling: If—at the same location in their genomes—a pair of siblings share identical segments of DNA with another relative, and those siblings do not share any DNA at those locations with each other, this often means that the relative is an aunt or an uncle to the siblings. Having more than 90 cM of such locations almost always means the relative is an aunt or uncle.

A new approach for distinguishing aunts and uncles from other relative types

The ambiguities between these relationships would be impossible to fully resolve except for the fact that information from shared DNA segments can be combined and their positions analyzed across multiple relatives. 

To distinguish avuncular from half-sibling or grandparental relationships, we can use information carried by DNA segments that are shared between full siblings. Full siblings are related through both of their parents, and this gives them a distinctive pattern of sharing: an average of 25 percent of their DNA is contained in fully identical segments. This is roughly 880 centiMorgans in 23andMe customers. (CentiMorgans, or “cM,” are a unit for measuring lengths of DNA.) In other words, full siblings have certain sections of their chromosomes where they each inherited a matching segment of DNA from their mom and, at the same location, a matching segment of DNA from their dad. With such high rates of fully identical sharing, genetic relationship prediction methods rarely make mistakes when labeling full siblings.

A Technique for Detecting Aunts and Uncles

Researchers at Cornell University used this signature of fully identical segments from full siblings to build a technique for detecting aunts and uncles of a pair of siblings even if their parent has not had their DNA tested.

The idea is to find places where the segments that siblings Ava and Beau share with a candidate aunt or uncle indicate that the siblings’ parent has a fully identical segment with that relative. If this recovers a lot of fully identical shared segments, we can be quite confident that the relative is an aunt or uncle and not a half-sibling, niece, nephew, grandparent, or grandchild.

[Figure: Full siblings]

Consider the figure below that again shows DNA from the full siblings Ava and Beau, their father Tom, and the father’s full sister, Maria. In the highlighted regions, Ava and Beau inherited DNA from different copies of their father’s chromosomes: Ava inherited DNA from their father’s mom and Beau from their father’s dad. Because of this (and because they inherited DNA from different copies of their mother’s chromosomes), Ava and Beau have no shared segments with each other in the chromosomal region with a box around it.

This fact allows us to reason about whether, in the boxed region, the untested father had a fully identical segment with Maria (the potential aunt). Sure enough, in this location, Maria is fully identical to Tom (has the same DNA as him) and has a shared segment with each of the siblings. All this allows us to conclude that, in this region, the father does have a fully identical segment with Maria. With enough of these regions present, the updated version of Bonsai would conclude that Maria is Ava and Beau’s aunt and not any of the other relationship types with similar amounts of shared DNA.
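The reasoning above can be sketched as interval arithmetic: find regions where both siblings share a segment with the relative, then remove any part where the siblings share DNA with each other. This is a simplified illustration, not the Cornell or Bonsai implementation. It assumes segments are given as hypothetical `(chromosome, start_cM, end_cM)` tuples and ignores details such as phasing and genotyping error.

```python
from collections import defaultdict


def by_chrom(segments):
    """Group (chrom, start_cM, end_cM) tuples into per-chromosome interval lists."""
    grouped = defaultdict(list)
    for chrom, start, end in segments:
        grouped[chrom].append((start, end))
    return grouped


def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersect two sorted, merged interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        low, high = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if low < high:
            out.append((low, high))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Remove every part of merged list a that is covered by merged list b."""
    out = []
    for start, end in a:
        cur = start
        for b_start, b_end in b:
            if b_end <= cur or b_start >= end:
                continue
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
        if cur < end:
            out.append((cur, end))
    return out


def implied_fully_identical_cm(sib1_rel, sib2_rel, sib1_sib2):
    """Total cM where both siblings share with the relative but not with each other."""
    a, b, s = by_chrom(sib1_rel), by_chrom(sib2_rel), by_chrom(sib1_sib2)
    total = 0.0
    for chrom in a.keys() & b.keys():
        both = intersect(merge(a[chrom]), merge(b[chrom]))
        kept = subtract(both, merge(s.get(chrom, [])))
        total += sum(end - start for start, end in kept)
    return total


# Made-up segments for illustration only.
ava_maria = [(1, 10.0, 80.0), (2, 0.0, 50.0)]
beau_maria = [(1, 30.0, 120.0), (2, 40.0, 90.0)]
ava_beau = [(1, 60.0, 100.0)]
print(implied_fully_identical_cm(ava_maria, beau_maria, ava_beau))  # 40.0
```

In this toy example the siblings both match the relative on 30 to 80 cM of chromosome 1, but they share DNA with each other from 60 cM onward, so only 30 to 60 cM counts. Adding 40 to 50 cM on chromosome 2 gives 40 cM in total. According to the figure caption earlier in this post, more than 90 cM of such regions almost always indicates an aunt or uncle.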

Can other close relatives have this pattern of shared DNA?

Importantly, a grandparent, grandchild, half-sibling, niece, or nephew is not expected to have fully identical sharing with Ava and Beau’s parent, or with the parent of any pair of siblings. The key reason is that fully identical regions occur at places where two people inherited the same DNA from both of their parents.

Because full siblings share the same two parents, fully identical regions arise most frequently between them. You can see this above by looking at Ava and Beau’s parent Tom and his full sibling Maria. A half-sister of Ava and Beau through Tom should not share fully identical segments with Tom because she would have inherited only one chromosome from him—and the other from her mother. That mother will typically be unrelated to Tom and so will not share DNA with him. Similarly, Ava’s grandson would inherit Ava’s DNA, but only on one of his chromosomes, making fully identical sharing with Ava’s father Tom impossible in nearly all cases. By the same token, Ava’s grandmother (Tom’s mother) shares DNA with Tom on only one chromosome, and Ava and Beau’s nephew, being Tom’s grandchild, could only inherit DNA from one of Tom’s chromosomes at any location.

[Figure: Fully Identical Segments]

Even though this technique is designed to detect aunts and uncles, it also helps with half-siblings: when a relative lacks these implied fully identical segments, that is evidence against an aunt or uncle relationship and in favor of a half-sibling relationship, and it can override the age-based relationship detection in Bonsai.

For example, if two full siblings had a half-sister who is more than 20 years older than each of them, the old version of Bonsai would predict that the half-sister is the aunt of the two full siblings (assuming other close relatives, such as the shared parent, had not done a 23andMe test). However, the latest Bonsai version would look for implied fully identical segments and, finding few or none, conclude that a half-sibling relationship is more likely than an aunt relationship. As a result, ages have less influence on the Bonsai relationship prediction for these classes of relationships (half-sibling, avuncular, and grandparent-grandchild) than before this new release.
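The override described above might look roughly like the following decision rule. This is a hedged sketch rather than Bonsai's real logic, which weighs evidence probabilistically. The 90 cM threshold comes from the figure caption earlier in this post; the 10 cM cutoff for "few or none" and the function name `final_label` are assumptions made for illustration. The 45.3-year bound is the lower end of the grandparental age interval in the first table.

```python
AUNT_UNCLE_CM = 90.0  # from the post: more than 90 cM almost always means aunt or uncle
FEW_OR_NONE_CM = 10.0  # hypothetical cutoff for "few or none"; not given in the post
GRANDPARENT_MIN_GAP = 45.3  # lower end of the grandparental age interval in the table


def final_label(age_based_label, implied_cm, age_gap_years):
    """Combine an age-based label with the implied fully identical segment total."""
    if implied_cm > AUNT_UNCLE_CM:
        # Strong signal: override whatever the ages suggested.
        return "avuncular"
    if age_based_label == "avuncular" and implied_cm <= FEW_OR_NONE_CM:
        # Ages said aunt or uncle, but the DNA pattern does not support it.
        if age_gap_years >= GRANDPARENT_MIN_GAP:
            return "grandparental"
        return "half-sibling"
    # Otherwise keep the age-based prediction.
    return age_based_label


# A half-sister 22 years older: the age-based label is "avuncular",
# but no implied fully identical segments are found.
print(final_label("avuncular", 0.0, 22))  # half-sibling
```

The design choice here is that DNA evidence takes priority in both directions: a large total forces the avuncular label, and a near-zero total vetoes it.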

Incorporating this technique into Bonsai greatly improves relative prediction accuracy

Relationship to full sibling pair | Percent correct without IBD011 | Percent correct with IBD011
--- | --- | ---
Aunt/uncle | 98.7% | ~100%
Half-sibling | 75.6% | 83.8%
Grandparent | 97.9% | 98.6%

We have incorporated this technique (which researchers called IBD011 for reasons described in the summary figure below) into the latest version of Bonsai. The table above shows the percent of relatives that Bonsai correctly infers both without the IBD011 signal and with it. Accuracy improves for all three relationship types when using IBD011, with the largest gain for half-siblings (75.6% to 83.8%), so Bonsai is now better at telling apart half-sibling, avuncular, and grandparent-grandchild relatives.
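To put the table in perspective, the short calculation below expresses each improvement both in percentage points and as a share of the previous errors that were eliminated. The figures come from the table above; the aunt/uncle "with IBD011" value is listed only as "~100%", so it is treated as 100 here as an approximation. The function names are illustrative.

```python
# (percent correct without IBD011, percent correct with IBD011), from the table above.
ACCURACY = {
    "aunt/uncle": (98.7, 100.0),  # "~100%" in the table; treated as 100 here
    "half-sibling": (75.6, 83.8),
    "grandparent": (97.9, 98.6),
}


def gain_points(before, after):
    """Improvement in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


def error_reduction(before, after):
    """Fraction of the previous errors that the new method removes."""
    return (after - before) / (100.0 - before)


for relationship, (before, after) in ACCURACY.items():
    print(
        f"{relationship}: +{gain_points(before, after)} points, "
        f"{error_reduction(before, after):.0%} of earlier errors removed"
    )
```

For half-siblings, for example, the gain is 8.2 percentage points, which corresponds to removing roughly a third of the earlier misclassifications.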

Here’s how you can apply this updated Bonsai method to your relative predictions:

Rebuild your family tree by requesting a recalculation through the link on the DNA Relatives FAQ page, under the question “How do I recalculate my tree?”

The 23andMe database currently includes over 2 million customers who could be affected by this update, which translates to many thousands of relationships potentially changed. For those of you with a full sibling and a predicted half-sibling, aunt, or uncle, the updated Bonsai may produce a different (and more trustworthy) tree. 

Keep in mind that, unfortunately, your tree annotations don’t carry over when you rerun Bonsai. Still, some of you will see changes, and the predictions Bonsai now makes are more reliable for the close relationship types discussed here.

What is IBD011?

On average, people share ~25% of their DNA with their half-siblings, aunts, uncles, nieces, nephews, grandparents, or grandchildren, among a few other less common relationship types. 

Based only on a DNA comparison of a pair of relatives, it can be very difficult to distinguish between these relationships and pinpoint the correct one. 

But there’s a simple trick to help narrow down the correct relationship when one person in the pair has a full sibling: If — at the same location in their genomes — siblings share identical segments of DNA with a relative, and those siblings do not share any DNA at those locations with each other, this almost always means that the relative is an aunt or an uncle. 
