A remarkable aspect of 23andMe’s Ancestry Composition feature is that the innovative machine learning technology under the hood gets better and more precise as we add new customers and refine our technology.
We’ve been revving that machine up lately, so this week some customers who have tested on our latest genotyping chip will start to see more precise detail in their Ancestry Composition results.
What’s happening here reflects both changes to our genotyping chip and our ever-increasing number of new customers. These new customers enabled us to “retrain” the machine learning algorithm and improve the accuracy and specificity of the ancestry assignments we report. For example, a customer with a percentage of their Ancestry Composition previously designated as “Broadly European” may now see some of this ancestry assigned as “French or German.”
These changes improve an already fantastic feature by boosting precision and recall for many ancestral populations. Customers interested in their Chromosome Paintings may also notice improvements due to our use of an updated “phasing” algorithm.
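For readers unfamiliar with the terms, precision asks "of the DNA we assigned to a population, how much truly comes from it?" and recall asks "of the DNA that truly comes from a population, how much did we assign to it?" The sketch below is only an illustration of those two definitions, not 23andMe's actual evaluation code. The function name `precision_recall`, the segment-by-segment framing, and the toy labels are all assumptions made up for this example.

```python
def precision_recall(true_labels, predicted_labels, population):
    """Return (precision, recall) for a single population label.

    Hypothetical helper: each list entry stands for one DNA segment.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Segments correctly assigned to the population.
    tp = sum(1 for t, p in pairs if t == population and p == population)
    # Segments assigned to the population that truly belong elsewhere.
    fp = sum(1 for t, p in pairs if t != population and p == population)
    # Segments from the population that were assigned to something else.
    fn = sum(1 for t, p in pairs if t == population and p != population)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up toy data, for illustration only.
FG = "French or German"
truth = [FG, FG, FG, FG, "Other", "Other"]
predicted = [FG, FG, FG, "Broadly European", FG, "Other"]

precision, recall = precision_recall(truth, predicted, FG)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, one truly "French or German" segment only reaches the broader "Broadly European" label, which lowers recall, and one segment from elsewhere is labeled "French or German," which lowers precision. Improving both numbers means fewer misses and fewer false calls.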
These changes will also help set the table for 23andMe to add new reference populations in the coming months, which in turn will further improve results for customers.
Currently, we use 24 reference populations from around the world to report 31 population labels, including higher-order groupings. Adding new populations will allow 23andMe to offer more details to customers. That was in part what motivated our researchers to launch the African Genetics Project last year. By adding new African reference populations, we will be able to return more precise results to customers with African ancestry. We plan to do similar work with other understudied populations around the world.
Changes to our genotyping chip will also help with this work. 23andMe periodically updates the chip we use for genotyping. We do this to take advantage of improvements in technology, to update the kind of information we can offer customers, and to offer flexibility for future research. This newest chip — our fifth version, hence the name v5 — is an Illumina Infinium® Global Screening Array supplemented with ~50,000 SNPs of custom content. This array was specifically designed to better capture global genetic diversity and to help standardize the platform for genetic research.
Ancestry Composition is a great tool that promises to keep getting better, so if you haven’t had a look lately, check it out. You can read more about the science behind all this in our white paper on Ancestry Composition.