Now, this model is a simplification because some families have a dozen children and some have none; but this model does help illustrate just how many potential distant cousins are out there. Finding a relative in the 23andMe database, however, rests on two other important conditions. First, the relative has to be a 23andMe customer. Second, you and the relative need to share a long identical segment of DNA. The more distant a pair of cousins, the less identical DNA they share. Very small fragments of DNA can be hard to detect with our algorithm. So although you might have thousands of 6th cousins, we think we can only detect about 4% of them with 23andMe’s current technology. Our detection success is much higher for more closely related cousins, though – we can detect about 46% of 4th cousin pairs and 90% of 3rd cousin pairs.
Our ability to detect a person’s distant cousins is also influenced by the ancestry of the individual. If you are Ashkenazi Jewish, you may have noticed that 23andMe’s Relative Finder feature shows you over a thousand cousins. This is because Ashkenazi Jews are more closely related to each other than a random sample of European-Americans. Over the past several hundred years, a cultural tendency to choose marriage partners of the same ethnicity (also known as endogamy) means that Ashkenazi individuals are more likely to share the same ancestors. In fact, we estimate that any two randomly chosen individuals who identify as Ashkenazi are on average the genomic equivalent of 4th-5th cousins, because they share many recent common ancestors.
This phenomenon doesn’t just occur in the Ashkenazim. We looked at 121 populations, many from the 23andMe customer database. Pairs of individuals in Iceland, Finland or South Africa are more closely related than pairs of individuals from Italy or Japan.
In our analyses, we also looked at DNA data for over 5,000 individuals with just European ancestry in the 23andMe database. The DNA indicated that in this sample there are over 5,000 3rd cousin pairs and 30,000 4th cousin pairs. This result is important beyond thinking about genealogy. It means that large disease association studies that sample thousands of individuals will have many pairs of distant cousins in the dataset. By identifying cryptic (or non-obvious) relatives in databases, researchers can see if certain disease mutations occur more often in different extended families.
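
To put those pair counts in perspective, here is a rough back-of-the-envelope sketch in Python. It treats the "over 5,000" and "30,000" figures as round lower bounds, and the function and variable names are ours for illustration only; this is not the analysis pipeline itself.

```python
from math import comb

def pair_fraction(n_individuals: int, n_cousin_pairs: int) -> float:
    """Share of all possible pairs of individuals that are cousin pairs."""
    return n_cousin_pairs / comb(n_individuals, 2)

def avg_cousins_per_person(n_individuals: int, n_cousin_pairs: int) -> float:
    """Average cousins per individual; each pair counts once for both members."""
    return 2 * n_cousin_pairs / n_individuals

# Rounded lower bounds taken from the text ("over 5,000" is treated as 5,000).
N_INDIVIDUALS = 5_000
THIRD_COUSIN_PAIRS = 5_000
FOURTH_COUSIN_PAIRS = 30_000

total_pairs = comb(N_INDIVIDUALS, 2)  # every way to pick 2 people out of the sample
third_fraction = pair_fraction(N_INDIVIDUALS, THIRD_COUSIN_PAIRS)
fourth_fraction = pair_fraction(N_INDIVIDUALS, FOURTH_COUSIN_PAIRS)
third_avg = avg_cousins_per_person(N_INDIVIDUALS, THIRD_COUSIN_PAIRS)
fourth_avg = avg_cousins_per_person(N_INDIVIDUALS, FOURTH_COUSIN_PAIRS)

print(f"possible pairs: {total_pairs:,}")
print(f"3rd cousin pairs: {third_fraction:.3%} of pairs, about {third_avg:.0f} per person")
print(f"4th cousin pairs: {fourth_fraction:.3%} of pairs, about {fourth_avg:.0f} per person")
```

Under these rounded inputs, cousin pairs are a tiny fraction of the roughly 12.5 million possible pairs, yet the average person in the sample still has at least a couple of 3rd cousins and about a dozen 4th cousins in the same dataset. That is why cryptic relatedness matters for large studies.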