A Different Kind of Gene Mapping: Comparing Genetic and Geographic Structure in Europe

By Chris Gignoux and Mike Macpherson

It should be no surprise that, in general, we are more genetically similar to our neighbors than to people living far away. The reason is simple: until recently in human history, it was rare for people from widely separated geographic regions to even meet, much less reproduce.

This pattern, known as isolation-by-distance, has been observed in a number of studies over the past several decades. This week, it has been confirmed in Europe by the largest study of its kind to date.

The researchers produced a two-dimensional map, like the one below, that preserves the genetic similarities between individuals as far as possible; in other words, the closer two dots (people) are on the map, the more closely related they are genetically.

Two-dimensional genetic similarity map of Europeans showing the northern and southern clusters. Each colored symbol in the plot on the left represents a single person’s genotype. Note the similar placement of symbols on the plot to the left and the geographic legend to the right. Adapted from Tian et al., PLoS Genetics (2008).

In the figure above, each individual was labeled with their country of origin after the mapmaking procedure was run. If Europe were genetically homogeneous, you would expect the different nationalities to appear in a jumble. Instead, they separate into clusters that, remarkably, roughly recapitulate the geography of Europe.
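To make the mapmaking idea concrete, here is a minimal sketch using principal component analysis (PCA), one common way to find the main axes of variation in genotype data. It is not the pipeline used in any of the papers discussed here. Everything in it is simulated, and the sample sizes, the allele-frequency model and the variable names are assumptions chosen for illustration. As in the figure, geography is hidden from the analysis and only compared with the result afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 200, 5000

# Hypothetical "home" coordinates in [-1, 1]:
# column 0 = north-south position, column 1 = east-west position.
coords = rng.uniform(-1.0, 1.0, size=(n_people, 2))

# Assumed model: each SNP's allele frequency drifts gently with location,
# more strongly along north-south than along east-west.
base = rng.uniform(0.2, 0.8, size=n_snps)
slopes = rng.normal(0.0, 1.0, size=(2, n_snps)) * np.array([[0.10], [0.05]])
freqs = np.clip(base + coords @ slopes, 0.01, 0.99)

# Diploid genotypes: 0, 1 or 2 copies of an allele at each SNP.
genotypes = rng.binomial(2, freqs).astype(float)

# PCA via SVD of the column-centered genotype matrix.
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]  # each person's position on the 2-D "genetic map"

# Geography was never shown to the PCA; compare it with the result afterwards.
# The sign of a principal component is arbitrary, hence the absolute value.
r_ns = abs(np.corrcoef(pcs[:, 0], coords[:, 0])[0, 1])
r_ew = abs(np.corrcoef(pcs[:, 1], coords[:, 1])[0, 1])
print(f"|corr(PC1, north-south)| = {r_ns:.2f}")
print(f"|corr(PC2, east-west)|  = {r_ew:.2f}")
```

In this toy setup the first two components line up with the simulated north-south and east-west positions only because the simulated allele frequencies vary along those directions; no geographic information is supplied to the PCA itself.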

Northern vs. Southern Europe

Even though Europe has been occupied for only a relatively short time compared to other parts of the world, different populations within the continent have had time to differentiate from one another. Scientists have known for a long time that certain traits, like lactase persistence and light-colored eyes and hair, are more common in northern than in southern Europe. Likewise, there are certain diseases, such as sickle cell anemia, that, although rare across Europe, are found more in the south than in the north. Height and skin color also vary from northern to southern Europe: both vary gradually with latitude rather than in quick jumps. Early genetic studies (such as those in the landmark population genetics text History and Geography of Human Genes) showed that this north-south cline was also a genetic one: even though Europeans of different nationalities did not fit into simple clusters, there was an overarching north-south difference. Newer studies have increased the number of people typed, and the number of markers, to approach the genome-wide level of hundreds of thousands of SNPs we use here at 23andMe – which brings us to this week’s paper.

A summary of genome-wide findings

The Lao et al. study out this week obtained genotypes from more than 2,500 individuals of known European ancestry. Each genotype consists of about half a million SNPs typed on the Affymetrix 500K, a chip similar in size to the Illumina 550K used here at 23andMe. The authors confirm the findings of several recent but smaller European studies (Seldin et al., PLoS Genetics (2006); Bauchet et al., AJHG (2007); Tian et al., PLoS Genetics (2008); Price et al., PLoS Genetics (2008); Paschou et al., PLoS Genetics (2008)), namely:

  • Over all SNPs, Europeans are very genetically similar.
  • There is a small set of SNPs that does allow European populations to be distinguished – at least when used among people whose ancestors are all from the same part of Europe – and they are surprisingly effective.
  • Most of the genetic variation in Europe is found along the north-south axis, which is consistent with archaeological knowledge. The next most prominent axis of genetic variation runs roughly east-west.
  • More isolated populations tend to sit at the extremes of these plots. In the current paper, the Finns are the only nationality completely distinct from the rest of the European samples. The Finns speak a different kind of language from much of the rest of Europe, and are the only Scandinavian population represented.

There’s plenty of action in the blogosphere on this one. For more discussion check out Dienekes’ Anthropology Blog, Anthropology.net, Gene Expression, and Genetic Future.
  • rogers

    I have not read the paper (yet!) but based on the map I find it somewhat misleading as there are instances where genetic similarities between populations exist that are not even remotely close geographically.

    Take for example the Y-chromosome haplogroup I. Its frequencies among the male population in regions of southern Europe and northern Europe are remarkably similar. More specifically, in the northwest Balkans the frequency of the “I” haplogroup matches or exceeds that found in areas of far northern Europe. Perhaps this is why this region of Europe was left out of the study?

  • This paper, and the others like it, use data from across the entire genome to determine genetic distances and associated techniques to best fit the data across multiple dimensions. The dimensions that do come out are not an attempt on the authors’ part to create a map out of genes: these are the first two dimensions of variation in the data. They just happen to line up well to N/S and E/W axes.

    It is sometimes hard to believe how good the concordance is between genes and geography in Europe, but it’s worth noting that this is what you would expect: neighbors are more likely to mate with each other than people on opposite sides of the continent. Thus, over time, neighbors will be more likely to be similar to one another than to people from far away.

    Your example of Y-chromosome haplogroup I is worth bringing up. However, on closer examination, there are different lineages within haplogroup I in different parts of Europe. I1 is found further north, on average, and I2 is found at higher frequencies in the Balkans. Most markers across the genome provide very little information (less than the Y-chromosome haplogroup I SNPs, for example) about ancestry, but combined, hundreds of thousands of markers do tell quite a lot. If you are a customer at 23andMe or have a demo account, you can see it for yourself. The new Advanced Global Similarity feature puts you in this same sort of display: check it out!

  • rogers

    @ chris

    I submitted a sputum sample to 23andMe and was PC plotted in Northern Europe with Germans and Austrians; however, my ancestors are from Croatia (southern Europe). Moreover, my closest genome matches are from a male from Wales, Belgium, Germany and Scandinavia.

    It just goes to show that autosomal DNA is not the be all and end all in determining genetic similarity.

    I still think the PLoS Genetics study is a farce and very biased. If I were a reviewer of the manuscript I would have torn it to shreds and asked for a major review.

  • Thanks for your comments, rogers. I believe you are discussing two related but distinct issues. One is whether genetic distance (derived from autosomal data) correlates with geographic distance at the population level. The other is how reliable a given individual’s position in one of these genetic distance plots is.

    As to the first, there are now quite a number of these studies (several more have emerged since this blog post), conducted by different researchers on independently collected genotypic data, and many find a strong correlation between genetic and geographic distance. This signal is fairly robust. I disagree with your broad claim that there were somehow major methodological problems with any of the several PLoS Genetics studies, but would be interested to learn more about what specifically you found to be farcical or biased about them.

    As to your position in Global Similarity: Advanced, this does depend on the reference genotypes that are available to the analysis. At the moment we have fewer Southern (and Eastern) European reference individuals (the colored squares in the plots) than we would like. This is something that we hope to address as more data become available. Were we to come into a number of Croatian samples it would be informative to see where you fall with respect to them. If you moved towards the Croatians, that would indicate that your present position was inaccurate due to the absence of relevant reference data. If you stayed about where you are, that would suggest that you might have some northern European ancestry of which you were not previously aware.
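The way a position can depend on the available reference genotypes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Global Similarity: Advanced is actually computed: the two populations, their allele frequencies, the panel sizes and the helper names (`sample`, `fit_axes`, `project`) are all hypothetical. A query individual from a simulated "south" population is projected onto axes built from a north-only panel, and then onto axes built from a panel that also contains southern references.

```python
import numpy as np

rng = np.random.default_rng(42)
n_snps = 3000

# Hypothetical allele frequencies for two populations that differ slightly.
p_north = rng.uniform(0.3, 0.7, size=n_snps)
p_south = np.clip(p_north + rng.normal(0.0, 0.1, size=n_snps), 0.05, 0.95)

def sample(freqs, n):
    """Draw n diploid genotypes (0, 1 or 2 copies of an allele per SNP)."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

def fit_axes(reference, k=2):
    """Return the panel mean and the top-k principal axes (SNP loadings)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:k].T

def project(genotypes, mean, axes):
    """Place genotypes on the axes that were learned from a reference panel."""
    return (genotypes - mean) @ axes

def dist(a, b):
    return float(np.linalg.norm(a - b))

north = sample(p_north, 100)
south = sample(p_south, 100)
query = sample(p_south, 1)  # a "southern" individual who is not in any panel

# Panel A: northern references only.
mean_a, axes_a = fit_axes(north)
query_a = project(query, mean_a, axes_a)[0]
north_a = project(north, mean_a, axes_a).mean(axis=0)

# Panel B: northern and southern references.
mean_b, axes_b = fit_axes(np.vstack([north, south]))
query_b = project(query, mean_b, axes_b)[0]
north_b = project(north, mean_b, axes_b).mean(axis=0)
south_b = project(south, mean_b, axes_b).mean(axis=0)

print("Panel A, query to north centroid:", round(dist(query_a, north_a), 2))
print("Panel B, query to north centroid:", round(dist(query_b, north_b), 2))
print("Panel B, query to south centroid:", round(dist(query_b, south_b), 2))
```

With the north-only panel, the axes carry no north-south information, so the query tends to land inside the northern cluster. Once southern references are included, the first axis can separate the two groups and the query moves toward the southern references.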