Lecture 18: Phylogenetics I: systematic analysis and inference
This is the first of two lectures on systematics. We are following a natural progression from the variation and dynamics of genes within populations, to the divergence of populations and speciation, to systematics: the scientific study of the kinds and diversity of organisms and of the relationships among them.
Systematic modes of analysis
A systematic or phylogenetic perspective on the diversity of life follows logically from the fact that a phylogenetic tree relates all organisms. From one generation to the next there is a pedigree that relates parents to their offspring. Within a population at any one time there is a complex pedigree, or network, of the ancestry of genes that describes who received which genes from whom (sometimes called the coalescent tree). Among the populations of a species there is a tree indicating which populations diverged from others and the sequence of branching events that separated them. At the species level, there is another tree that describes the sequence of branching events by which descendant species formed from ancestral species. Thus, just as population thinking is required to appreciate the evolutionary significance of variation among individuals, tree thinking is required to appreciate the evolutionary significance of the history of ancestor-descendant relationships that unites all levels of organization: genes, individuals, populations, species, and higher taxa.
What are the goals of modern systematics?
1. Differentiate individual organisms and establish the basic units: species.
2. Arrange these units in a logical hierarchy that permits easy and simple recognition on the basis of similarity = classification.
3. Keep the details of 1 and 2 separate = nomenclature.
4. Determine the evolutionary (ancestor-descendant) relationships between all levels of the hierarchy = phylogeny.
Identification is not classification. Identification places an individual into an already existing classification scheme; classification assembles groups into larger groups. The goals of systematics can conflict: a static classification of organisms into pigeonholes for easy reference, versus the expectation that this classification reflect the dynamic history of common descent = phylogeny. A systematic solution to the problem of diversity incorporates both goals, but this result is not always easily obtained.
Terminology: Taxon (plural taxa) = a group of organisms of any taxonomic rank that is sufficiently distinct to be worthy of being assigned to a definite category. Category = a rank or level in a hierarchical classification. For example, the taxa robin, thrushes, songbirds, birds, vertebrates, and animals belong to the categories species, family, suborder, class, subphylum, and kingdom, respectively.
How do you classify? Historically, by downward classification through logical division, analogous to “20 questions”. In Aristotle’s time things were either animals or plants. One could start by asking: is this an animal or a plant? Does it have feathers or not? And so on down until the organism was properly placed in its category.
Linnaeus believed in the reality of the genus. He used downward classification through the Linnaean hierarchy (kingdom, phylum, class, order, family, genus, species [recall: King Philip Came Over From Germany Speaking]) to reach the genus, and then made the final division into the appropriate species. This approach led to Binomial nomenclature: Genus + species, e.g., Homo sapiens. This methodology gave way to Upward classification by empirical grouping, because it became apparent that the groupings of Linnaeus were not Natural: downward classification often ended in groupings whose members clearly had the wrong affinities.
Darwin’s discovery forced thinking toward Descent from a common ancestor, and it became apparent that groups sharing common descent were the Natural groups that had been sought. What, ultimately, is the basis for upward classification? Characters. Characters can take varied forms: morphology, chemistry, behavior, ecology, and physiology can all provide good characters.
Characters have character states: we all have hair, but our hair differs in color; we all have eyes, but our eyes differ in color. Character states may vary together in Character complexes, or they may vary independently = Mosaic evolution of characters. Skin, eye, and hair color all vary together in humans: is this three characters or one (pigmentation)? Morphological and molecular characters need not evolve together; they may change in a mosaic fashion. Thus different characters may suggest different patterns of relationship.
Homologous characters are similar because of common descent: ancestor and descendant are linked by intermediate forms having the same character. These characters are the basis for determining a true phylogeny. Analogous characters (= homoplasious characters) are two characters that do not share a common genetic and developmental history, usually attained by adaptation to a similar ecological or functional challenge. The forelimbs of bats and birds are homologous as forelimbs, but analogous as wings (a simple but crucial distinction).
Analogous characters can be attained by convergent evolution, in which descendants resemble each other more than they resemble their respective ancestors: ichthyosaurs, fish, and porpoises; desert plants. They can also arise by parallel evolution (e.g., recall the slide example of marsupials and placentals), in which the ancestors are viewed as different but related. The point is that two possible evolutionary trees could be drawn: one grouping taxa by these similar characters, the other by common descent.
If the characters that a set of organisms share could be either analogous or homologous, the systematist faces several problems: 1) identifying which are which, and 2) deciding whether (or how) to weight characters. Excluding a character is an extreme form of weighting (weight = 0). Placentals have a placenta; marsupials have a pouch in which the immature young finish their development. These are major characters: should they carry more weight in our assignment of relationship? If we looked for other characters in these animals, we could probably find many that would link each marsupial with its placental counterpart: dog-dog, squirrel-squirrel, cat-cat, anteater-anteater, etc. Since characters are the data we use to do systematics, other questions arise: 1) should we use single or many characters? 2) what are legitimate characters (morphology, ecology, etc.)? 3) how do we weight those that are chosen?
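To make the weighting question concrete, here is a minimal Python sketch of a weighted character distance. The four taxa and their 0/1 character scores are hypothetical toy data, not measurements, and `weighted_distance` is an illustrative helper rather than a standard function. It adds up the weights of the characters on which two taxa differ, so setting a weight to 0 is exactly the exclusion described above.

```python
# Toy 0/1 scores (hypothetical, for illustration only) for five characters:
# [placenta, dog-like skull, carnivorous teeth, climbing habit, bushy tail]
TAXA = {
    "marsupial_dog":      [0, 1, 1, 0, 0],
    "placental_dog":      [1, 1, 1, 0, 0],
    "marsupial_squirrel": [0, 0, 0, 1, 1],
    "placental_squirrel": [1, 0, 0, 1, 1],
}


def weighted_distance(a, b, weights):
    """Sum the weights of the characters whose states differ between two taxa."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


SCHEMES = {
    "equal": [1, 1, 1, 1, 1],            # every character counts the same
    "placenta_heavy": [10, 1, 1, 1, 1],  # reproductive mode treated as a "major" character
    "no_placenta": [0, 1, 1, 1, 1],      # weight 0 = the character is excluded
}

for name, w in SCHEMES.items():
    to_dog = weighted_distance(TAXA["marsupial_dog"], TAXA["placental_dog"], w)
    to_squirrel = weighted_distance(TAXA["marsupial_dog"], TAXA["marsupial_squirrel"], w)
    print(f"{name}: marsupial dog to placental dog = {to_dog}, "
          f"to marsupial squirrel = {to_squirrel}")
```

With equal weights the marsupial dog is closest to the placental dog (distance 1 versus 4 to the marsupial squirrel); weighting the placenta character at 10 reverses the ranking (10 versus 4). The data do not change, only the weighting decision does.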
The analogous/homologous problem revolves around distinguishing similarity of characters that has an adaptive basis from similarity that has a genetic (common-descent) basis. This leads to the distinction between grade and clade. Grade = level of adaptation; organisms are of similar grade because convergence has produced similar adaptations (e.g., the “dog” grade or the “anteater” grade, which cuts across the marsupial/placental distinction). Clade = a group descended from one common ancestor; a genetic lineage (e.g., the placental clade vs. the marsupial clade).
Schools of Systematics
There are different schools of systematics, which place different emphasis on the goals of systematics. Some emphasize classification over phylogeny (grade over clade); others emphasize phylogeny over classification (clade over grade).
Phenetics (Numerical taxonomy): classification based on the overall similarity of organisms. Treat all characters with equal weight and amass as many characters as you can. Enter the characters into a computer that runs an algorithm that gives you a number reflecting the degree of similarity between different taxa. Assumes that homologous and analogous characters will be in there together, but that the rate of character change is roughly proportional to evolutionary distance, so the homologous characters will carry the day. Results are plotted in a Phenogram, which depicts relationships of overall similarity that phenetics takes as a proxy for evolutionary relationships.
Cladistics (Phylogenetic systematics): Clade is everything. Define a hierarchical series of dichotomous branching events reflecting ancestor-descendant relationships. Cladistics seeks to identify monophyletic groups, which by definition are derived from a single common ancestor, and defines these groups as taxa sharing derived characters (synapomorphies). It assumes that speciation is dichotomous, producing two sister taxa, and that the ancestral taxon disappears at the speciation event.
Evolutionary systematics uses homologous characters but will commonly weight characters differently depending on the “importance” of the character. A good evolutionary systematist is one who “knows” the group and can thus decide which characters to weight more heavily. Criticized as being highly subjective and not scientific because decisions are not testable hypotheses, but statements of faith about the importance of the characters. Acknowledges grade as relevant to the study: crocodiles and birds are different classes to evolutionary systematists, but sister taxa to cladists.
Which approach do we use? Ideally a classification should be objective, in that the criteria used to classify are not subject to the whim of the person doing the classifying. Objectivity is important if classification is to be a scientific endeavor: someone else ought to be able to step in and repeat your “experiment” in classification. Moreover, a classification should be natural rather than artificial: if a set of characters was used to assign relationships, those relationships should also be apparent in other characters not used in the analysis. There are natural groups that have been generated during the history of life, and systematists should attempt to discover them. In recent years cladistics has become the dominant school of systematics because it meets these two criteria well. However, phenetics is still very active and character weighting is still used. Note that natural groups might generate many more hierarchical levels than the classical Linnaean hierarchy.
Phylogenetic Inference
The crucial issue in systematics is that there is a history of the organisms we wish to classify, but we don’t know that history. We must infer the sequence of branches or evolutionary transformations that have taken place. There is a true phylogeny that we may never know; our task is to collect and analyze data to provide the best estimate of it.
We will work some examples that illustrate the difficulty of this task. Phenetics: classification based on overall similarity. See fig. 14.4, pg. 378. Build a matrix of shared character states; the taxa with the largest number of shared character states are deemed the most similar.
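A shared-state matrix of the kind just described can be sketched in a few lines of Python. The four taxa and their letter-coded states are hypothetical; each entry simply counts the characters on which a pair of taxa agree, with every character weighted equally as phenetics prescribes.

```python
from itertools import combinations

# Hypothetical character-state matrix: one string per taxon, one letter per character.
MATRIX = {
    "taxon1": "AABBA",
    "taxon2": "AABBB",
    "taxon3": "ABBAB",
    "taxon4": "BBAAB",
}


def shared_states(x, y):
    """Number of characters for which two taxa have the same state."""
    return sum(a == b for a, b in zip(x, y))


def similarity_matrix(matrix):
    """Shared-state counts for every pair of taxa."""
    return {(p, q): shared_states(matrix[p], matrix[q])
            for p, q in combinations(matrix, 2)}


sim = similarity_matrix(MATRIX)
for pair, score in sorted(sim.items(), key=lambda item: -item[1]):
    print(pair, score)
```

Here taxon1 and taxon2 share 4 of the 5 states and would be deemed the most similar pair, while taxon1 and taxon4 share none.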
A distance (or similarity) matrix can be derived from morphological measurements, genetic distance measures, etc. Each cell in the matrix is a value indicating the degree of difference (or similarity) between two taxa. These can be clustered by UPGMA (unweighted pair group method with arithmetic mean). The two most similar (least distant) taxa are joined to form a group (e.g., taxa 1 & 2); the length of each branch is half the distance between the two taxa. The next most similar taxon (3) is joined to the tree, and its distance to the group is calculated as the average of the distance from taxon 1 to taxon 3 and from taxon 2 to taxon 3. At each such step, the number of taxa in the matrix is reduced by one, and new distance values are calculated as the average distance from each member of the group just formed to each taxon outside that group. This process of adding the most similar new taxon to a group continues until all taxa are joined. Another algorithm that works similarly, but does not assume that lineages evolve at similar rates, is Neighbor Joining (NJ). NJ is often used as a first-pass analysis before more sophisticated methods, such as maximum likelihood, are used for tree construction.
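The UPGMA steps above can be sketched in Python as follows. This is a minimal illustration on a hypothetical four-taxon distance matrix, not a production implementation: ties are broken by input order, and the distance from a newly formed group to another cluster is the average over all original member pairs (the two old distances weighted by cluster size), which is what the “average distance from each member” rule works out to. The tree is returned as a Newick string.

```python
from itertools import combinations


def upgma(dist):
    """Cluster taxa by UPGMA; dist maps frozenset({a, b}) -> distance. Returns Newick."""
    taxa = sorted({t for pair in dist for t in pair})
    # cluster label -> (Newick subtree, number of taxa, height of the cluster's root)
    clusters = {t: (t, 1, 0.0) for t in taxa}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        label = f"{a}+{b}"
        # Distance to each remaining cluster = mean over all original member pairs,
        # i.e. the two old distances weighted by cluster size.
        for k in clusters:
            d[frozenset((label, k))] = (
                n_a * d[frozenset((a, k))] + n_b * d[frozenset((b, k))]
            ) / (n_a + n_b)
        clusters[label] = (
            f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})",
            n_a + n_b,
            height,
        )
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical clock-like distances among four taxa.
PAIRS = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
DIST = {frozenset(p): v for p, v in PAIRS.items()}
print(upgma(DIST))  # (D:5,(C:3,(A:1,B:1):2):2);
```

On this clock-like matrix A and B join first at height 1 (half of 2), C joins them at height 3, and D joins last at height 5.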
The tree produced is a Phenogram, and it is one way to infer relationships. Why might this tree not reflect phylogeny (true ancestor-descendant relationships)? 1) Variable evolutionary rates: a faster-evolving taxon will be more different from all others and will appear as an “outgroup”. 2) Homoplasy (convergence) will tend to make character states similar between unrelated taxa, and the UPGMA approach will join them.
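The first problem, unequal rates, shows up clearly on a small hypothetical matrix. The distances below were computed from an assumed true tree ((A,B),(C,D)) in which A’s branch (9) is much longer than B’s (1). UPGMA joins C and D (distance 2), then B with (C, D) (average distance 3), leaving the fast-evolving A as a spurious outgroup. The sketch below is a minimal neighbor-joining implementation using the standard Q criterion; the function name and Newick output format are choices made for this illustration.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Build an unrooted tree by neighbor joining; dist maps frozenset({a, b}) -> distance."""
    d = dict(dist)
    nodes = sorted({t for pair in d for t in pair})
    trees = {t: t for t in nodes}  # node label -> Newick subtree

    def dij(i, j):
        return d[frozenset((i, j))]

    while len(nodes) > 3:
        n = len(nodes)
        # r[i] = total distance from i to all other current nodes
        r = {i: sum(dij(i, k) for k in nodes if k != i) for i in nodes}
        # Q criterion: join the pair minimising (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dij(*p) - r[p[0]] - r[p[1]])
        li = dij(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to the new node
        lj = dij(i, j) - li
        u = f"{i}+{j}"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (dij(i, k) + dij(j, k) - dij(i, j)) / 2
        trees[u] = f"({trees[i]}:{li:g},{trees[j]}:{lj:g})"
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last three nodes to one central node (three-point formula).
    a, b, c = nodes
    la = (dij(a, b) + dij(a, c) - dij(b, c)) / 2
    return f"({trees[a]}:{la:g},{trees[b]}:{dij(a, b) - la:g},{trees[c]}:{dij(a, c) - la:g});"


# Hypothetical additive distances: true tree ((A,B),(C,D)), but A evolves much faster.
PAIRS = {("A", "B"): 10, ("A", "C"): 11, ("A", "D"): 11,
         ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 2}
DIST = {frozenset(p): v for p, v in PAIRS.items()}
print(neighbor_joining(DIST))  # (C:1,D:1,(A:9,B:1):1);
```

Neighbor joining recovers the split AB | CD and the assumed branch lengths (A:9, B:1). With four taxa the pairs (A, B) and (C, D) tie on the Q criterion, but both choices imply the same unrooted tree.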
Cladistics: classification reflects the sequence of branching events, not the degree of difference or similarity. Classification is based on shared derived characters (synapomorphies). Note that relationships are never based on the absence of characters. “Invertebrates” makes sense to us, but refrigerators and pizzas are also “invertebrates” because they lack backbones, and they clearly are not related to animals; for that matter, plants are invertebrates too! The tree produced is a Cladogram and is a hypothesis of relationship. A taxon can evolve at a different rate, but it will tend to accumulate autapomorphies, which are not shared with any other taxa and thus affect the branching pattern less (though variable rates can still lead to incorrect cladograms). What about homoplasies? They will affect the hypothesis, since characters showing convergences (or parallelisms) will contradict the data from other characters.
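Grouping by synapomorphies can be sketched in Python. The plants and characters are those used in the terminology primer at the end of this lecture; the 0/1 coding assumes polarity has already been settled (e.g., by outgroup comparison), and `sort_characters` is a hypothetical helper. Only possession of a derived state can define a group, so shared absences are never used.

```python
# 1 = derived (apomorphic) state, 0 = ancestral (plesiomorphic) state.
# Polarity is assumed to be known already (e.g., from an outgroup such as algae).
MATRIX = {
    "mosses":      {"vascular_tissue": 0, "seeds": 0, "flowers": 0},
    "ferns":       {"vascular_tissue": 1, "seeds": 0, "flowers": 0},
    "gymnosperms": {"vascular_tissue": 1, "seeds": 1, "flowers": 0},
    "angiosperms": {"vascular_tissue": 1, "seeds": 1, "flowers": 1},
}


def sort_characters(matrix):
    """Split characters into group-defining synapomorphies and autapomorphies."""
    taxa = list(matrix)
    synapomorphies, autapomorphies = {}, {}
    for ch in matrix[taxa[0]]:
        # Only possession of the derived state defines a group; absence never does.
        derived = frozenset(t for t in taxa if matrix[t][ch] == 1)
        if len(derived) == 1:
            autapomorphies[ch] = next(iter(derived))  # unique to one taxon
        elif 1 < len(derived) < len(taxa):
            synapomorphies[ch] = derived  # shared derived state -> candidate clade
        # A derived state found in every taxon cannot separate groups within this set.
    return synapomorphies, autapomorphies


groups, unique = sort_characters(MATRIX)
for ch, members in groups.items():
    print(f"{ch} unites {sorted(members)}")
print("autapomorphies:", unique)
```

Vascular tissue unites ferns, gymnosperms, and angiosperms; seeds unite gymnosperms and angiosperms; flowers are an autapomorphy of angiosperms and say nothing about grouping. The shared absence of seeds in mosses and ferns is never treated as evidence that they belong together.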
This brings us to the topic of Parsimony: in constructing cladograms we seek the branching pattern that requires the fewest evolutionary steps. Consider marine mammals (chosen because we know that they are an example of convergence). It is more parsimonious to evolve fins twice and all the characters that hold mammals together once than to evolve fins once and all the characters that ally whales with other mammals twice. We tolerate fins as Homoplasies (= analogies) because that is much more parsimonious than calling all the mammalian characters homoplasies.
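Counting steps on a fixed tree is usually done with Fitch’s small-parsimony algorithm, which the lecture does not name; the Python sketch below uses it to compare two hypotheses for the marine-mammal example. The taxa and the three presence/absence characters are a deliberately tiny, illustrative selection. `fitch_steps` returns the number of changes a character needs on a binary tree written as nested tuples; the count does not depend on where the tree is rooted.

```python
# Presence (1) / absence (0) scores for a deliberately tiny, illustrative data set.
MATRIX = {
    "fins":            {"shark": 1, "tuna": 1, "lizard": 0, "whale": 1, "cow": 0, "dog": 0},
    "mammary_glands":  {"shark": 0, "tuna": 0, "lizard": 0, "whale": 1, "cow": 1, "dog": 1},
    "three_ear_bones": {"shark": 0, "tuna": 0, "lizard": 0, "whale": 1, "cow": 1, "dog": 1},
}
# Two competing hypotheses as nested tuples: whales with mammals vs. whales with fish.
MAMMAL_TREE = (("shark", "tuna"), ("lizard", ("whale", ("cow", "dog"))))
FIN_TREE = (("shark", ("tuna", "whale")), ("lizard", ("cow", "dog")))


def fitch_steps(tree, states):
    """Fitch's algorithm: return (candidate states at this node, changes needed below it)."""
    if isinstance(tree, str):  # a tip: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    s_left, n_left = fitch_steps(left, states)
    s_right, n_right = fitch_steps(right, states)
    common = s_left & s_right
    if common:  # children can agree: no extra change
        return common, n_left + n_right
    return s_left | s_right, n_left + n_right + 1  # disagreement costs one step


def tree_length(tree, matrix):
    """Total number of steps summed over all characters."""
    return sum(fitch_steps(tree, column)[1] for column in matrix.values())


for name, tree in [("mammal tree", MAMMAL_TREE), ("fin tree", FIN_TREE)]:
    per_char = {ch: fitch_steps(tree, col)[1] for ch, col in MATRIX.items()}
    print(name, per_char, "total =", tree_length(tree, MATRIX))
```

The mammal tree costs 4 steps (fins twice, each mammalian character once) and the fin tree costs 5 (fins once, each mammalian character twice), so parsimony favours treating fins as homoplasies.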
Parsimony is central to the cladistic method and can be used both to study the Polarity (direction of evolution in a transformation series) of characters and to assess confidence in hypotheses of relationship. Example: Drosophila chromosome banding patterns (e.g., chromosomal inversions). Each species has a distinct pattern of bands in its salivary gland chromosomes, and the sequence of bands appears to have been inverted in certain sections of the chromosome during evolution. One can determine a network of likely evolutionary steps from one species to another. The big problem is that the network can be read starting anywhere: we need to establish where it begins, i.e., where to Root the tree.
Choose an Outgroup: a taxon (or taxa) known to lie outside the group in question, whose character states are therefore taken to represent the ancestral condition for the ingroup. Choosing one requires independent information. Once the outgroup is properly selected, the determination of polarity follows logically from parsimony. Identifying an outgroup can also help identify Character reversals = reversals in a trend of character change. An example is winglessness in insects: insects evolved from a wingless myriapod ancestor, but some derived (i.e., more recently evolved) groups of insects, such as fleas, have no wings. Wings have been lost in fleas, which represents a character reversal. The use of an outgroup is extremely important in phylogenetic inference because it allows you to determine the “polarity”, or direction of evolution, as illustrated with insect wings. Once a reliable phylogenetic tree has been produced from a data set of characters and properly rooted with an outgroup, one can use the polarity provided by the outgroup to analyze the patterns of character evolution in general (e.g., how many times does a character originate during evolution?).
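Outgroup rooting and the detection of reversals can be sketched by fixing the state at the root of the ingroup to the outgroup’s state and then reconstructing changes with a Fitch-style two-pass procedure. That procedure is an assumption of this sketch (the lecture describes the logic, not an algorithm), and the simplified insect tree and the scoring of wings are illustrative.

```python
# Wings scored as present (1) / absent (0); the myriapod is the outgroup.
WINGS = {"myriapod": 0, "silverfish": 0, "dragonfly": 1, "beetle": 1, "flea": 0}
INGROUP = ("silverfish", ("dragonfly", ("beetle", "flea")))  # simplified insect tree


def fitch_sets(tree, states, memo):
    """Bottom-up pass: store the candidate state set of every subtree in memo."""
    if isinstance(tree, str):
        memo[tree] = {states[tree]}
    else:
        left, right = (fitch_sets(child, states, memo) for child in tree)
        memo[tree] = (left & right) or (left | right)
    return memo[tree]


def changes(tree, memo, parent_state, found):
    """Top-down pass: keep the parent's state when allowed, else record a change."""
    # For binary characters, a set lacking the parent's state holds exactly one state.
    state = parent_state if parent_state in memo[tree] else min(memo[tree])
    if state != parent_state:
        found.append((tree, parent_state, state))
    if not isinstance(tree, str):
        for child in tree:
            changes(child, memo, state, found)
    return found


memo = {}
fitch_sets(INGROUP, WINGS, memo)
root_state = WINGS["myriapod"]  # outgroup comparison: its state is taken as plesiomorphic
result = changes(INGROUP, memo, root_state, [])
for node, old, new in result:
    kind = "gain (apomorphy)" if new > old else "loss (reversal)"
    print(f"{node}: {old} -> {new}, {kind}")
```

The reconstruction places one gain of wings on the branch leading to dragonflies, beetles, and fleas, and one loss on the flea branch: the reversal. Outgroup comparison alone would label the fleas’ winglessness plesiomorphic, because it matches the outgroup; only their nested position in the tree reveals it as a secondary loss.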
Compatibility methods: go with the tree that is supported by the largest number of characters. Said another way, the preferred tree is the one supported by the greatest number of independent characters that show no homoplasy on it (the largest “clique” of mutually compatible characters).
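For binary characters, compatibility has a simple pairwise test: two characters can both fit one tree without homoplasy unless all four combinations of states (00, 01, 10, 11) occur among the taxa (often called the four-gamete test), and a set of binary characters that is pairwise compatible fits a single tree. The Python sketch below uses this to find the largest clique by brute force, which is only feasible for a handful of characters. The fish, lizard, and mammal scores are illustrative, and the sketch assumes no missing states.

```python
from itertools import combinations

# Presence (1) / absence (0) scores for a tiny, illustrative data set.
MATRIX = {
    "fins":            {"shark": 1, "tuna": 1, "lizard": 0, "whale": 1, "cow": 0, "dog": 0},
    "mammary_glands":  {"shark": 0, "tuna": 0, "lizard": 0, "whale": 1, "cow": 1, "dog": 1},
    "three_ear_bones": {"shark": 0, "tuna": 0, "lizard": 0, "whale": 1, "cow": 1, "dog": 1},
}


def compatible(a, b):
    """Four-gamete test: binary characters a and b (taxon -> 0/1) are compatible
    unless all four state combinations occur among the taxa."""
    return len({(a[t], b[t]) for t in a}) < 4


def largest_clique(matrix):
    """Brute-force search for the largest set of mutually compatible characters."""
    names = list(matrix)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(matrix[x], matrix[y]) for x, y in combinations(subset, 2)):
                return subset
    return ()


print(largest_clique(MATRIX))  # ('mammary_glands', 'three_ear_bones')
```

Fins conflict with both mammalian characters, so the largest clique is the two mammalian characters, and the tree they support groups the whale with the other mammals.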
Primer on Phylogenetic terminology
Terminology (see textbook chapter 2 for more on the systematic lexicon).
Monophyletic - referring to a group of taxa descended from a single common ancestor and including all of its descendants (e.g., angiosperms, or seed plants)
Apomorphic - a derived character (seeds in angiosperms and gymnosperms relative to ferns)
Plesiomorphic - an ancestral character (stomata in angiosperms and gymnosperms)
Note: apomorphic and plesiomorphic are relative terms. Vascular tissue is apomorphic for the monophyletic group above the bryophytes, but plesiomorphic within the angiosperms.
Syn - shared
Aut - “self”, unique to a group
synapomorphies are shared derived characters and are what define monophyletic groups because the members of that group have the character because they are descended from a common ancestor (seeds in angiosperms and gymnosperms)
symplesiomorphies are shared ancestral characters (chlorophyll in the angiosperms and gymnosperms)
autapomorphies are derived characters unique to one group (flowers in angiosperms)
autplesiomorphies can’t exist, by definition
Paraphyletic - a group that includes some, but not all, of the descendants of a common ancestor: an incomplete grouping based on symplesiomorphies (e.g., the non-natural group of ferns and gymnosperms based on the presence of chlorophyll and stomata; angiosperms have these too but are not in the group)
Polarity - distinguishes between the plesiomorphic and apomorphic states of a character by comparison with an outgroup (a taxon known to lie outside the hierarchy of groups being considered; e.g., the outgroup algae determines the polarity of the evolution of seeds and secondary growth with respect to the other taxa)
Sister taxa are the two lineages that descend from a common ancestor following a splitting event; can be considered at any level of hierarchy in a cladogram
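These definitions can be checked mechanically on a tree. In the minimal Python sketch below, a group is monophyletic when it equals the full set of tips descended from its most recent common ancestor; otherwise the function reports which descendants were left out. The plant tree is a simplified, illustrative topology built from the examples above, with algae as the outgroup, and the function names are hypothetical.

```python
TREE = ("algae", ("bryophytes", ("ferns", ("gymnosperms", "angiosperms"))))


def leaves(tree):
    """All tip names below a node of a nested-tuple tree."""
    return {tree} if isinstance(tree, str) else set().union(*(leaves(c) for c in tree))


def clades(tree):
    """Yield the tip set of every subtree, i.e. every clade in the tree."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def check_group(tree, group):
    """Return ('monophyletic', set()) or ('not monophyletic', excluded descendants)."""
    group = frozenset(group)
    # Smallest clade containing the whole group = descendants of its most recent common ancestor.
    mrca_clade = min((c for c in clades(tree) if group <= c), key=len)
    excluded = set(mrca_clade - group)
    return ("monophyletic" if not excluded else "not monophyletic", excluded)


print(check_group(TREE, {"gymnosperms", "angiosperms"}))
print(check_group(TREE, {"ferns", "gymnosperms"}))
```

Seed plants (gymnosperms + angiosperms) come out monophyletic, while ferns + gymnosperms leave out the angiosperms, matching the paraphyletic example above.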