Massive study on the genetics of educational attainment

In one of the largest genetic studies ever done, an international consortium of scientists has found more than 1,200 genetic variants associated with educational attainment, which is defined as the number of years individuals spent in school or university.

While income and environment strongly influence the amount of formal education a person completes, genes associated with such things as brain development, cognitive function, and personality traits also play a significant and measurable role, according to the study.

Indeed, this same group of researchers, the Social Science Genetic Association Consortium, completed a similar but much smaller study two years ago that identified 74 variants associated with educational attainment.

Daniel Benjamin, associate professor at USC’s Center for Economic and Social Research.

But this study, which includes data from more than 1.1 million people, is more than three times as large as the study done two years ago. Researchers used de-identified, aggregated data from 23andMe customers who consented to participate in research, data from the UK Biobank, and data from 69 other, smaller research cohorts. The large size of this study gave scientists an unprecedented ability to identify genetic variants that have even a very small influence on educational attainment.

“Even variants with the largest effects predict, on average, only about three more weeks of schooling in those who have those variants compared to those who don’t,” said Daniel Benjamin, a lead author on the study and an associate professor at the Center for Economic and Social Research at the University of Southern California. “Yet when we analyze the combined effects of many genetic variants, taken together they can predict the length of a person’s formal education as well as demographic factors.”

Benjamin is alluding to an additional step the researchers took for this study. The team created what is called a “polygenic score” to measure the predictive power of the genetic variants associated with educational attainment. A polygenic score is essentially an algorithm that adds up the effects of many variants taken all together. The researchers used more than a million variants across the genome and found that the polygenic score predicted educational attainment just as well as demographic influences considered on their own. The polygenic score predicted 11 to 13 percent of the variation in the number of years of schooling completed.
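To make the idea of "adding up" variant effects concrete, here is a minimal sketch in Python. It assumes the simplest textbook form of a polygenic score: a weighted sum of how many copies of each variant a person carries. The function names, the effect sizes, and the toy genotypes are all hypothetical and chosen only for illustration. This is not the consortium's actual pipeline, which worked with more than a million variants and statistical steps not shown here.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2 copies per variant).

    effect_sizes are hypothetical per-copy effects, here in years of schooling.
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(count * effect for count, effect in zip(allele_counts, effect_sizes))


def variance_explained(scores, outcomes):
    """Squared Pearson correlation between scores and observed outcomes.

    This is the share of variation in the outcome that a straight-line
    fit on the score accounts for.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(outcomes) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, outcomes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in outcomes)
    return (cov * cov) / (var_s * var_y)


# Toy data, invented for illustration: three variants, four people.
weights = [0.05, -0.02, 0.03]          # hypothetical effect per allele copy
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
    [2, 2, 0],
]
years_of_schooling = [12, 16, 14, 13]  # hypothetical outcomes

scores = [polygenic_score(g, weights) for g in genotypes]
r_squared = variance_explained(scores, years_of_schooling)
print(scores, r_squared)
```

In this framing, a result of 11 to 13 percent corresponds to a `variance_explained` value between 0.11 and 0.13. Each individual weight is tiny, so the predictive power comes only from summing across a very large number of variants.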

While this finding is promising, the researchers point out that, even taking together all the genetic variants they found, they are far from being able to determine educational attainment through genetics alone, or even to explain all of the genetic influences. Twin studies suggest that genetics accounts for about 20 percent of the variation in educational attainment, which implies that there are genetic factors yet to be discovered. So the researchers believe there are many, many more variants, perhaps millions, that have yet to be found. Identifying those additional genetic variants will require even larger studies.

The study also illustrates the power of big data and the potential of the research model pioneered by 23andMe, not just for work like this by social scientists but also for studies of other traits, health conditions, and rare diseases. Earlier this year, a study on insomnia that used data from more than 1.3 million people, making it the largest genetic study to date, also depended on 23andMe and UK Biobank data for insight.

“Having so many research participants has been instrumental in making these kinds of genetic studies possible,” said David Hinds, Ph.D., a research fellow and statistical geneticist at 23andMe. “This is part of what makes the 23andMe model unique. Studies of very complex outcomes like educational attainment will require very large sample sizes, and this work demonstrates the value of the 23andMe model of genetics research as really a two-way process. Our customers choose to share all sorts of information about themselves with us, which enables us to make discoveries and share back more information about their genetics.”