Having been a genetics educator for over two decades, I was excited to learn in May that Berkeley’s incoming freshman class would have an opportunity to discover whether they have an AA, AG, or GG at a specific site near the LCT gene, along with similar information for the ALDH2 and MTHFR genes. Time and again I have seen students, friends and relatives pale at the mention of genetics concepts, until they get their own data and are instead forthcoming with questions and eager to learn. So the idea of university students learning about genetics through a small set of personal genetic data seemed an excellent one. The data would provide an opportunity for discussion of a broad range of topics, from genetics to concepts of race to the ethical issues surrounding medical genetics. And the organizers had thought long and hard about the project, developing, for instance, an online video consent form.
Last week I was deeply disappointed to learn that the course organizers, in response to a decision by the California Department of Public Health, will not be providing students with their personal genetic data. Instead, they will provide aggregate data — summaries of how many students have each of these “genotypes” (for instance, AA, AG, or GG). I understand the importance of making sure that students’ privacy is protected, and that students are adequately informed before obtaining their data. And I think the topic of whether the CDPH decision makes sense is a question worthy of further discussion. But today I will focus on a different, related topic, because, after a brief period of disappointment over this decision, I began to think more constructively. From an educator’s perspective, there are still valuable genetics lessons to be learned via this initiative, even if students don’t gain access to their personal genetic data.
In fact, were I coordinating the project, even if there were an opportunity to provide students with personal data, I would refrain from giving students immediate access. Indeed, I would wait to provide access to aggregate data as well. I would start by inviting students to make an educated guess regarding their genotype for each of the genetic markers in the LCT, ALDH2, and MTHFR genes. Presumably the CDPH doesn’t have jurisdiction over students guessing their genotype, even though a guessing exercise would lead to far less accurate personal data than genotyping would. So how would students guess their genotype? Let’s take the markers one by one.
Lactose tolerance (LCT gene): A genotype of GG at one particular position near the LCT gene can lead a person to have trouble digesting the lactose in cow’s milk. Given that Berkeley students can drink milk legally, making a guess regarding their genotype (AA, AG, GG) for this marker is more straightforward than for the other two genes. They can base their guess regarding their “lactose tolerance” genotype on their own experience drinking milk, and the experiences of their family members. If, for example, they don’t digest milk well, and neither of their parents drinks milk comfortably, they might guess that their own genotype is GG. If only one parent has enjoyed drinking milk as an adult, their initial guess might be AG. And if everyone in their family drinks gallons of milk, their best guess would be AA. Next, I would provide them with frequencies of each of the LCT variants (A and G) in a number of populations. Such data are readily available. Students would be invited to put these sources of information (personal experience, family history, and population frequencies) together and guess their genotype, recording it for themselves and submitting it (anonymously) to a course instructor.
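For students who want to make "putting the data together" concrete, one possible approach is a simple Bayesian update: treat the population-based genotype probabilities as a starting point, then weight each genotype by how well it fits personal experience. The sketch below is only an illustration. The function name `update_guess` and every number in the usage note are hypothetical placeholders, not published figures, and a real exercise would need likelihoods grounded in actual data.

```python
def update_guess(prior, likelihood):
    """Combine population-based genotype probabilities with personal evidence.

    prior:      dict mapping genotype -> probability from population frequencies
    likelihood: dict mapping genotype -> assumed probability of the student's
                observation (e.g. "I have trouble digesting milk") given
                that genotype. These values are assumptions supplied by the user.
    Returns a dict mapping genotype -> updated (posterior) probability.
    """
    # Weight each genotype's prior by how well it explains the observation.
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("The observation is impossible under every genotype.")
    # Rescale so the updated probabilities sum to 1.
    return {g: value / total for g, value in unnormalized.items()}
```

As a usage example with made-up numbers, `update_guess({"AA": 0.25, "AG": 0.5, "GG": 0.25}, {"AA": 0.0, "AG": 0.5, "GG": 1.0})` shifts all of the probability onto AG and GG, because the assumed likelihood says the observation never occurs with AA.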
Alcohol metabolism (ALDH2 gene): A genotype of AA at one particular position in the ALDH2 gene can lead a person to have trouble tolerating alcohol so that they experience what is called “alcohol flush” and other discomforts. The steps for guessing one’s genotype are similar to those for LCT, except that most Berkeley freshmen cannot drink alcohol legally in California. Therefore their guess regarding this marker will need to be based on information from parents or other older relatives. Students could find out if their relatives ever “flushed” after drinking alcohol. If both parents said yes, then the student might guess that they have the AA genotype for this locus. If the flushing is moderate, the student might guess that they are most likely to have the AG genotype. Next, I would provide them with frequencies of each of the ALDH2 variants (A and G) in a number of populations. As with LCT, students would be invited to put these two sets of data (family history and population frequencies) together and guess their genotype, recording it for themselves and submitting it (anonymously) to a course instructor.
Folic acid requirement (MTHFR gene): A genotype at one particular position in the MTHFR gene can influence a person’s requirement for folic acid (found in, for instance, leafy green vegetables). Here most students would have little personal experience or family history to help them out. So they would need to rely on the population frequencies. They might even use a random number generator to help them make an educated guess. If one parent were affiliated with one population and the other parent with a second population, the student would use the relevant population frequencies of the two populations. As before, they would record their genotype guess for themselves and submit it to their instructor.
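The population-frequency guess described above can be sketched in a few lines of Python. This is a minimal illustration under a stated assumption: each parent passes on the A variant with a probability equal to the frequency of A in that parent's population (the usual random-mating simplification). The function names `genotype_probs` and `random_guess` are hypothetical, and any frequencies a student plugs in would come from the published tables, not from this sketch.

```python
import random

def genotype_probs(freq_a_parent1, freq_a_parent2=None):
    """Return the probability of each genotype for a child.

    freq_a_parent1: frequency of the A variant in the first parent's population
    freq_a_parent2: frequency of A in the second parent's population;
                    defaults to the first when both parents share a population.
    Assumes each parent transmits A with probability equal to that frequency.
    """
    if freq_a_parent2 is None:
        freq_a_parent2 = freq_a_parent1
    p1, p2 = freq_a_parent1, freq_a_parent2
    return {
        "AA": p1 * p2,                        # A from both parents
        "AG": p1 * (1 - p2) + (1 - p1) * p2,  # A from exactly one parent
        "GG": (1 - p1) * (1 - p2),            # G from both parents
    }

def random_guess(probs, rng=random):
    """Draw one genotype at random, weighted by its probability."""
    genotypes = list(probs)
    weights = [probs[g] for g in genotypes]
    return rng.choices(genotypes, weights=weights, k=1)[0]
```

A student whose parents come from two populations would call `genotype_probs` with both frequencies and then pass the result to `random_guess`. Drawing at random, instead of always picking the most likely genotype, means the class's pooled guesses should roughly mirror the expected genotype proportions.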
During this “guessing” process students could discuss how comfortable they are sharing the information with one another. Would they be comfortable making the information public? Do they see any differences between sharing their guesses and sharing their actual genotype data (were they to have access to it)? They might also discuss the implications of the different frequencies in the different populations. Is any variant at 100% in any population and 0% in another so that ethnicity predicts genotype? Are there stereotypes around the phenotypes of alcohol flush and lactose tolerance? What about the health implications of the information?
Once the students have submitted their guesses for all three markers, instructors would tabulate and report back the aggregate data. If students have also submitted information regarding their sex and ethnicity, data could be summarized within those categories. At this point it would be time to reveal the actual aggregate data to the students. The big questions would be: How close are the students’ guesses (in aggregate) to the actual data? Are they close for all three markers? If not close, what are the possible explanations? Lack of true correlation between genotype and the trait? Incorrect published estimates of variant frequencies? Laboratory error? Incorrect calculations by the students when they made their educated guesses? What else might be going on?
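The tabulation and comparison step could be done by hand, but a short script makes the idea concrete. The sketch below is one possible way to do it, not a description of how the course instructors handle their data. The names `proportions` and `compare` are hypothetical, and the lists passed in would be the anonymous guesses and the laboratory genotype calls for a single marker.

```python
from collections import Counter

GENOTYPES = ("AA", "AG", "GG")

def proportions(calls):
    """Return the share of each genotype in a list of calls or guesses."""
    if not calls:
        raise ValueError("Need at least one genotype to summarize.")
    counts = Counter(calls)
    total = len(calls)
    # Counter returns 0 for genotypes that never appear.
    return {g: counts[g] / total for g in GENOTYPES}

def compare(guesses, actual):
    """Return guessed share minus actual share for each genotype.

    A positive value means the class over-guessed that genotype;
    a negative value means the class under-guessed it.
    """
    guessed = proportions(guesses)
    observed = proportions(actual)
    return {g: guessed[g] - observed[g] for g in GENOTYPES}
```

Running `compare` once per marker gives a quick picture of where the class's guesses diverge from the laboratory results, which is the starting point for the questions above. Judging whether a gap is larger than chance alone would produce calls for a formal statistical test, which this sketch does not attempt.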
Under the current plan, this comparison of the summary of educated guesses and the summary of actual data would be the final chapter of the genetics component of the course. But maybe next year would be different. Maybe this year’s freshman class at Berkeley would pave the way for future classes to take one more small but highly significant step: comparing their individual educated guesses with their actual personal genotype data. The personal data would provide opportunities for several powerful lessons. For instance, a number of students would discover that they have variants that are often reported as being found only in ethnic groups other than their own, revealing the fallacy of equating race or ethnicity with biology.
Having been through the preparatory lessons and exercises, interested students would be prepared to gain access to their personal genetic information. And thousands of individuals would be better prepared to consider the genetic information that will be broadly available in the near future.