Lecture 14: Quantitative Genetics: mapping QTL
The ultimate level of causation in biological systems is the genetic control of phenotypic variation. In this section of the course we’ll discuss how one can go about mapping phenotypic variation, down to the level of individual alleles and genotypes.
Classical Mapping in Drosophila
Gene mapping began early in Drosophila, perhaps because the availability of many visible mutations allowed investigators to readily pick out genotypes. Thomas Hunt Morgan’s lab at Columbia was the birthplace of Drosophila genetics and of the science of gene mapping, so our study starts with him. In particular, Morgan noticed that some genes were partially linked and that the probability of obtaining recombinants between genes was variable. Alfred Sturtevant, a 19-year-old undergraduate working in Morgan’s lab, thought about the data and came up with the idea that, because of this variability, one might be able to create a linear map of where genes sit on a chromosome. Sturtevant went home one night with tables of recombination fractions between visible mutants on the X chromosome and came back the next day with the world’s first genetic map. What have you been doing with your evenings? (An amazing book documenting the early history of Drosophila genetics is “Lords of the Fly” by Robert Kohler.)
Sturtevant’s concept of genetic mapping relies on the very same linkage disequilibrium (LD) that we studied earlier. In essence, the ability to map genes using neighboring markers boils down to the fact that LD is predicted to be larger over shorter genetic distances. Sturtevant’s idea also proved to be general: any gene can be mapped by looking at how it co-segregates with linked markers. If a marker segregates nearly identically to a gene, you know that the marker is genetically close to the gene.
This brings up the necessary detour of genetic versus physical distance. Physical distance is the distance between two loci along a chromosome, and it is measured in basepairs, kilobases, megabases, etc. Genetic distance is the distance between two loci with respect to recombination. Genetic distance is measured in centimorgans (cM), where 1 cM represents a recombinational distance of 1 recombinant progeny in 100 offspring (a 1% recombination fraction). On “normal” recombining chromosomes, physical distance and genetic distance are often correlated, although not perfectly. On non-recombining chromosomes (like the Y chromosome), loci can be arbitrarily far apart in basepairs but still have a genetic distance of zero.
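To make the definition of a centimorgan concrete, here is a minimal sketch that converts recombinant counts into map distances and then uses them, as Sturtevant did, to order three loci. The locus names `A`, `B`, `C` and all of the counts are made up for illustration, and the 1% = 1 cM conversion is only a reasonable approximation for closely linked loci.

```python
def map_distance_cM(n_recombinant, n_total):
    """Map distance in cM using the simple rule: 1% recombinant offspring = 1 cM."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_recombinant / n_total

# Hypothetical data: (recombinant offspring, total offspring) for each pair of loci.
counts = {
    ("A", "B"): (50, 1000),
    ("B", "C"): (100, 1000),
    ("A", "C"): (140, 1000),
}

distances = {pair: map_distance_cM(*c) for pair, c in counts.items()}

# The two loci that recombine most often sit at the ends of the map;
# the remaining locus must lie between them.
outer = max(distances, key=distances.get)
middle = ({"A", "B", "C"} - set(outer)).pop()
order = (outer[0], middle, outer[1])

for pair, d in distances.items():
    print(pair, f"{d:.1f} cM")
print("inferred order:", "-".join(order))
```

In this toy data set the A to C distance is slightly less than the sum of the two shorter intervals. That is the expected pattern over longer distances, because double crossovers between the outer loci go undetected.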
Drosophila genetics has advanced over the past 100 years to the point where almost any mutant can now be localized within a few generations of crossing. In class we considered the mapping of a recessive lethal mutation using the rucuca/ruPrica multiply marked chromosomes. The idea behind that cross is that recombination between a chromosome that is wild type except for our recessive lethal and our multiply marked chromosome allows multiple intervals to be tested for our mutation. Just by reading off which visible mutations are present on a chromosome, the interval containing the mutant of interest can be found.
QTL mapping
QTL mapping takes essentially the same approach as the classical genetic model, but now two things have changed: 1) the phenotype is a continuously varying trait, and 2) the markers are molecular markers (e.g. SNPs, microsatellites, VNTRs) rather than visible markers. Because the phenotype is continuous, we are now interested in describing markers that are associated with loci which explain additive genetic variation in some trait. We call these loci Quantitative Trait Loci (QTL). Any individual QTL might only represent a small proportion of the additive variation behind some trait, but this need not be the case. Often for any given trait multiple QTL will be found, each of which might describe a lesser and lesser degree of the variation. Some people draw an arbitrary but useful distinction between minor and major QTL.
One standard way of QTL mapping is to use a crossing design which produces Recombinant Inbred Lines (RILs). Imagine that you wanted to map the allelic variation responsible for the aggressive behavior seen in Japanese fighting fish. You could start by taking fish which had been selected to be aggressive for many generations (assume this population is completely homozygous) and breeding them to fish which are not aggressive (also assume this population is completely homozygous). The first step is to produce \(F_1\) heterozygotes between our populations. These \(F_1\)s are then allowed to mate with each other, recombining along the way to produce \(F_2\)s. Individual recombinant \(F_2\)s are then selected and inbred separately for multiple generations, so that each lineage captures a single recombinant chromosome/genome (each of these lineages is a RIL). RILs are effectively scrambled-together versions of the parental lines, but each RIL is a unique combination that resulted from meiosis.
Each RIL can then be phenotyped and genotyped at markers throughout its genome to see which markers segregated with the aggressive behavior. Those markers that are more strongly associated with aggression should be genetically close to the gene responsible for the behavior. Again, LD is doing all the work for us.
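A single-marker scan of this kind can be sketched in a few lines. This is a toy simulation under stated assumptions, not a real mapping pipeline: each RIL chromosome is modeled as a simple chain of parental blocks (the genotype switches between adjacent markers with a made-up probability of 0.1), one hypothetical causal marker shifts the phenotype by 2 units, and the association at each marker is scored with Welch's t statistic. All names and parameter values are illustrative.

```python
import random
import statistics

def simulate_ril(n_markers, switch_prob, rng):
    """One RIL chromosome: 0 = homozygous for parent A, 1 = homozygous for parent B."""
    geno = [rng.randint(0, 1)]
    for _ in range(n_markers - 1):
        prev = geno[-1]
        # With probability switch_prob a past crossover changed the parental origin.
        geno.append(1 - prev if rng.random() < switch_prob else prev)
    return geno

def welch_t(x, y):
    """Welch's t statistic for the difference in means between two samples."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def scan(genotypes, phenotypes):
    """Absolute t statistic at each marker, comparing the two genotype classes."""
    scores = []
    for m in range(len(genotypes[0])):
        a = [p for g, p in zip(genotypes, phenotypes) if g[m] == 0]
        b = [p for g, p in zip(genotypes, phenotypes) if g[m] == 1]
        # A marker needs at least two lines in each class to be testable.
        scores.append(abs(welch_t(b, a)) if len(a) > 1 and len(b) > 1 else 0.0)
    return scores

rng = random.Random(1)
N_LINES, N_MARKERS, CAUSAL = 200, 20, 12  # hypothetical design
genotypes = [simulate_ril(N_MARKERS, 0.1, rng) for _ in range(N_LINES)]
# Phenotype: effect of the causal locus plus environmental noise.
phenotypes = [2.0 * g[CAUSAL] + rng.gauss(0.0, 1.0) for g in genotypes]

scores = scan(genotypes, phenotypes)
peak = max(range(N_MARKERS), key=scores.__getitem__)
print("causal marker:", CAUSAL, "strongest association at marker:", peak)
```

The scores should rise toward the causal marker and fall away on either side, because markers farther away have had more opportunities to recombine away from the causal allele.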
Generally QTL mapping is done by scanning the entire genome using a relatively low density of markers. This opens up the possibility that the individual QTL recovered will be large and contain many genes. In particular, QTL are often very large if recombination rates are low in a region. Why should this be? From this point traits can be fine-mapped using further QTL mapping techniques, or other techniques such as association studies.
SNP Association Studies
The latest iteration of the mapping technologies is called SNP association mapping. This technique utilizes our modern ability to genotype individuals at hundreds of thousands of loci simultaneously using microarrays or related technologies. SNPs are the markers of choice for this method, as individual genotypes can be called at huge numbers of loci very rapidly.
The idea behind SNP association mapping is very similar to QTL mapping: what one is looking for is an individual marker, or a cluster of markers, that is closely associated with some trait through being in LD with the variants underlying phenotypic variation. There are some major distinctions, however. Generally association mapping provides a level of resolution that QTL mapping cannot, simply due to a higher density of markers. QTL analyses revolve around tightly controlled crosses and pedigree designs, whereas association mapping is more useful in outbred populations, or populations for which the pedigree is unknown. As a result of this distinction, population subdivision and other nonequilibrium population genetic forces can skew the results of association studies.
Whole genome association mapping (WGA; also sometimes called genome-wide association mapping) is now big business in human medicine. WGA proceeds by phenotyping tons of individuals for different traits (e.g. disease status, weight, blood pressure, etc.) and then genotyping those individuals at \(10^6\) SNPs. Individual SNPs are then tested for association with the trait. So far there have been some big successes in the field (good example: age-related macular degeneration) but also some serious problems. A huge problem is multiple testing. If you do 20 statistical tests, you expect one to be significant by chance alone (if the significance cutoff is 0.05). Now what happens if you do \(10^6\) statistical tests? How do we know what a significant SNP is? Often we need more data, or a replicate mapping population.
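The arithmetic behind the multiple testing problem is worth seeing once. The sketch below computes the expected number of chance "hits" for 20 and for \(10^6\) tests, and then simulates 100,000 tests in which no SNP has any real effect (so p-values are uniform between 0 and 1). The Bonferroni correction used here is one common, conservative way to set a stricter cutoff; it is offered as an illustration, not as the method any particular study uses, and it assumes nothing about how correlated the tests are.

```python
import random

def expected_false_positives(alpha, n_tests):
    """Expected number of tests passing the cutoff when every null hypothesis is true."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests

ALPHA = 0.05
for n in (20, 10**6):
    print(n, "tests:",
          expected_false_positives(ALPHA, n), "expected false positives;",
          "Bonferroni cutoff =", bonferroni_threshold(ALPHA, n))

# Simulate a scan with no true associations: p-values are uniform on [0, 1).
rng = random.Random(0)
N_NULL_TESTS = 100_000
null_p_values = [rng.random() for _ in range(N_NULL_TESTS)]

naive_hits = sum(p < ALPHA for p in null_p_values)
corrected_hits = sum(p < bonferroni_threshold(ALPHA, N_NULL_TESTS) for p in null_p_values)
print("null SNPs passing p < 0.05:", naive_hits)
print("null SNPs passing the Bonferroni cutoff:", corrected_hits)
```

With \(10^6\) tests and a cutoff of 0.05 we expect about 50,000 SNPs to look significant even when nothing is going on, which is why a much stricter per-SNP cutoff (here \(5 \times 10^{-8}\)) or independent replication is needed.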