
BCB 444/544


Presentation Transcript


  1. BCB 444/544 Lecture 30 Phylogenetics – Distance-Based Methods #30_Nov02 BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods

  2. Required Reading (before lecture) • Wed Oct 31 - Lecture 29 Phylogenetics Basics • Chp 10 - pp 127 - 141 • Thurs Nov 1 - Lab 9 Gene & Regulatory Element Prediction • Fri Nov 2 - Lecture 30 Phylogenetics – Distance-Based Methods • Chp 11 - pp 142 – 169 • Mon Nov 5 - Lecture 31 Phylogenetics – Parsimony and ML • Chp 11 - pp 142 - 169

  3. Assignments & Announcements • Mon Oct 29 - HW#5 • HW#5 = Hands-on exercises with phylogenetics and tree-building software • Due: Mon Nov 5 (not Fri Nov 2 as previously posted)

  4. BCB 544 "Team" Projects • The last week of classes will be devoted to projects • Written reports due: Mon Dec 3 (no class that day) • Oral presentations (20-30 min) will be: Wed-Fri Dec 5, 6, 7 • 1 or 2 teams will present during each class period • See Guidelines for Projects posted online

  5. BCB 544 Only: New Homework Assignment 544 Extra#2 • Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2 • Part 1 - Brief outline of project, emailed to Drena & Michael • After response/approval, then: • Part 2 - More detailed outline of project • Read a few papers and summarize the status of the problem • Schedule a meeting with Drena & Michael to discuss ideas

  6. Seminars this Week • BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html • Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI • Bob Jernigan, BBMB, ISU • Control of Protein Motions by Structure

  7. Chp 10 - Phylogenetics SECTION IV MOLECULAR PHYLOGENETICS Xiong: Chp 10 Phylogenetics Basics • Evolution and Phylogenetics • Terminology • Gene Phylogeny vs. Species Phylogeny • Forms of Tree Representation • Why Finding a True Tree is Difficult • Procedure of Building a Phylogenetic Tree

  8. Tree Building Procedure • Choose molecular markers • Perform MSA • Choose a model of evolution • Determine tree building method • Assess tree reliability

  9. Choice of Molecular Markers • Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences • Individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate • More distantly related species - use slowly evolving nucleic acid sequences, such as ribosomal RNA, or protein sequences • Very distantly related species - use highly conserved protein sequences

  10. Multiple Sequence Alignment • Most critical step in tree building - you cannot build a correct tree without a correct alignment • Build alignments with multiple programs, then inspect and compare them to identify the most reasonable one • Most alignments need manual editing • Make sure important functional residues align • Align secondary structure elements • Decide whether to use the full alignment or only parts of it

  11. Automatic Editing of Alignments • Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences • Gblocks – detect and eliminate poorly aligned positions and divergent regions
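Rascal, NorMD and Gblocks each use their own scoring criteria, which are not reproduced here. As a much cruder illustration of the idea of eliminating poorly aligned positions, the sketch below drops every alignment column in which more than a chosen fraction of the sequences have a gap. The function name and the 0.5 default threshold are assumptions for illustration only.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose fraction of gap characters is <= max_gap_fraction.

    alignment is a list of equal-length aligned strings using "-" for gaps.
    This is a simple gap filter, NOT the Gblocks algorithm (which also
    considers conservation and block lengths).
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


aln = ["AC-GT", "A--GT", "ACTG-"]
print(drop_gappy_columns(aln))  # ['ACGT', 'A-GT', 'ACG-']
```

The third column is removed because two of the three sequences have a gap there; columns with a single gap survive at this threshold.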

  12. How do we measure divergence between sequences? • Simple measure – just count the number of substitutions observed between the sequences in the MSA • Problem – the observed number of substitutions can underestimate the number of evolutionary events that actually occurred
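A minimal sketch of the simple measure: count the aligned positions where two sequences differ, and divide by the number of positions compared to get the observed proportion of differences (often called the p-distance). Skipping positions where either sequence has a gap is an assumption here; programs differ in how they treat gaps.

```python
def count_differences(seq1, seq2):
    """Number of aligned positions where the two sequences differ (gap sites skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be rows of the same alignment")
    return sum(a != b for a, b in zip(seq1, seq2) if a != "-" and b != "-")


def p_distance(seq1, seq2):
    """Observed proportion of differing sites among gap-free positions."""
    compared = sum(a != "-" and b != "-" for a, b in zip(seq1, seq2))
    return count_differences(seq1, seq2) / compared


print(count_differences("ACGTACGTAC", "ACGTTCGTAA"))  # 2
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))         # 0.2
```

This raw proportion is exactly the quantity that the multiple-substitution problem makes unreliable as a measure of evolutionary distance.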

  13. Multiple Substitutions [Figure: substitution history at one site, nucleotides C, A, A, T, G] Just because we see only one difference does not mean that there was only one evolutionary event

  14. Multiple Substitutions [Figure: substitution history at one site, nucleotides A, A, A, T, G] Just because we see no difference does not mean that there were no evolutionary events

  15. Choosing Substitution Models • Statistical models of evolution are used to correct for the multiple substitution problem • Focus on DNA models

  16. Jukes-Cantor Model • The Jukes-Cantor model assumes that each nucleotide is replaced by any of the other three with equal probability • Can be used to correct for multiple substitutions
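The standard Jukes-Cantor correction converts the observed proportion of differing sites $p$ into an estimated number of substitutions per site, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$. A minimal sketch (the function name is illustrative):

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the observed proportion p of differing sites."""
    # Under JC, p can never exceed 3/4 (random sequences differ at 3/4 of sites),
    # so the logarithm is undefined at or beyond that saturation point.
    if not 0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


print(round(jukes_cantor(0.2), 4))  # 0.2326
```

The corrected value (0.2326) is larger than the observed 0.2, reflecting substitutions hidden by multiple hits at the same site; the gap between the two grows quickly as $p$ approaches 0.75.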

  17. Many Other Models
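The table of models from this slide is not preserved. As one commonly used example beyond Jukes-Cantor, the Kimura two-parameter (K2P) model lets transitions (A<->G, C<->T) and transversions occur at different rates. With $P$ and $Q$ the observed proportions of transition and transversion differences, $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$. The sketch assumes gap or ambiguous sites are simply skipped.

```python
import math

PURINES = {"A", "G"}
NUCLEOTIDES = {"A", "C", "G", "T"}


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    P = proportion of transitions (purine<->purine or pyrimidine<->pyrimidine),
    Q = proportion of transversions (purine<->pyrimidine).
    Sites containing gaps or ambiguity codes are skipped.
    """
    sites = [(a, b) for a, b in zip(seq1, seq2) if a in NUCLEOTIDES and b in NUCLEOTIDES]
    n = len(sites)
    transitions = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in sites)
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in sites)
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# One transition (A->G) and one transversion (A->T) in 10 sites
print(round(kimura_2p("AAAAAAAAAA", "GAAAAAAAAT"), 4))  # 0.2341
```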

  18. Evolutionary Models for Protein Sequences • PAM and JTT substitution matrices already take into account multiple substitutions • There are also models similar to Jukes-Cantor for protein sequences
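One way to see the "Jukes-Cantor-like" protein model is to generalize the JC formula from 4 to $k$ equally exchangeable states: $d = -\tfrac{k-1}{k}\ln\left(1 - \tfrac{k}{k-1}p\right)$, with $k = 20$ for amino acids. This sketch is an illustration under that equal-rates assumption; matrices such as PAM and JTT do not treat all amino acid replacements as equally likely.

```python
import math


def jc_like_distance(p, n_states=20):
    """Jukes-Cantor-style correction for n_states equally exchangeable states.

    n_states=4 reproduces the nucleotide Jukes-Cantor distance;
    n_states=20 applies the same idea to amino acids.
    """
    b = (n_states - 1) / n_states  # expected p between unrelated sequences
    if not 0 <= p < b:
        raise ValueError("distance is undefined once p reaches saturation")
    return -b * math.log(1 - p / b)


print(round(jc_like_distance(0.2), 4))              # 0.2246
print(round(jc_like_distance(0.2, n_states=4), 4))  # 0.2326
```

With 20 states, a single observed difference is less likely to hide a back-substitution, so the correction at the same $p$ is smaller than for DNA.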

  19. What about differences in mutation rates between positions within a sequence? • One of our assumptions was that all positions in a sequence evolve at the same rate • Bad assumption • The third position in a codon changes with higher frequency • In proteins, some residues can change freely while functionally important ones cannot • This variation is called among-site rate heterogeneity • Many tree-building programs have parameters meant to deal with this problem, which adds to the complexity of getting the correct tree
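A common way to model among-site rate heterogeneity is to let site rates follow a gamma distribution with shape parameter $\alpha$ (small $\alpha$ means strong rate variation). Combined with Jukes-Cantor this gives $d = \tfrac{3}{4}\alpha\left[\left(1 - \tfrac{4}{3}p\right)^{-1/\alpha} - 1\right]$. The sketch below assumes $\alpha$ is already known; in practice it must be estimated, which is part of the extra complexity the slide mentions.

```python
def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    alpha is the gamma shape parameter; as alpha grows large the result
    approaches the ordinary Jukes-Cantor distance (equal rates at all sites).
    """
    if not 0 <= p < 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


print(round(jc_gamma_distance(0.2, alpha=1.0), 4))  # 0.2727
```

For the same observed $p = 0.2$, strong rate variation ($\alpha = 1$) yields a larger distance than plain Jukes-Cantor (about 0.2326), because many changes are concentrated at a few fast sites where they overwrite each other.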

  20. Chp 11 – Phylogenetic Tree Construction Methods and Programs SECTION IV MOLECULAR PHYLOGENETICS Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs • Distance-Based Methods • Character-Based Methods • Phylogenetic Tree Evaluation • Phylogenetic Programs

  21. Tree Construction • Two main categories of tree building methods • Distance-based • Overall similarity between sequences • Character-based • Consider the entire MSA

  22. Distance-Based Methods • Given an MSA and an evolutionary model, calculate the distance between all pairs of sequences • Construct a distance matrix • Construct a phylogenetic tree based on the distance matrix
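The first two steps can be sketched in a few lines: correct each pairwise proportion of differences with an evolutionary model, here Jukes-Cantor, and store the results in a symmetric matrix. The sequence names and the dict-of-dicts layout are assumptions for illustration.

```python
import math
from itertools import combinations


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned sequences (gap sites skipped)."""
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    p = sum(a != b for a, b in sites) / len(sites)
    # math.log raises ValueError if p >= 0.75 (a saturated pair)
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(msa):
    """Symmetric matrix (dict of dicts) of pairwise JC distances for an MSA."""
    names = list(msa)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = jc_distance(msa[x], msa[y])
    return matrix


msa = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTTCGTAA",
    "seq3": "ACGTACGTAC",
}
matrix = distance_matrix(msa)
print(round(matrix["seq1"]["seq2"], 4))  # 0.2326
```

Swapping `jc_distance` for another model's distance function changes only the matrix entries; the tree-building step that follows works on the matrix alone.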

  23. Distance Matrices
        a    b    c    d
    a   0
    b   6    0
    c   7    3    0
    d  14   10    9    0
  [Figure: taxa a, b, c, d shown on a distance scale from 0 to 8]

  24. Distance-Based Methods • Two ways to construct a tree based on a distance matrix • Clustering • Optimality

  25. Clustering-Based Methods • E.g., UPGMA and Neighbor-Joining • A cluster is a set of taxa • Interspecies distances translate into intercluster distances • Clusters are repeatedly merged • “Closest” clusters merged first • Distances are recomputed after merging

  26. UPGMA • UPGMA – Unweighted Pair Group Method Using Arithmetic Average • Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree) • This assumption is usually wrong • So why use UPGMA? • Very fast

  27. UPGMA Example

  28. UPGMA Example

  29. UPGMA Example

  30. UPGMA Example
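The UPGMA example slides are images that did not survive in this transcript. As a stand-in, the sketch below runs UPGMA on the four-taxon matrix from slide 23: repeatedly merge the closest pair of clusters, place the new node at half their distance (the molecular clock assumption), and set the distance from the new cluster to every other cluster to the size-weighted average of the old distances. The nested-parentheses cluster labels and the dict keyed by `frozenset` pairs are illustrative choices, not a standard format.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA on a distance dict keyed by frozenset({x, y}).

    Returns the final cluster label (nested parentheses, no branch lengths)
    and the list of merges as (cluster1, cluster2, node_height).
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa in it
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Merge the closest pair of clusters (first pair wins ties)
        x, y = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((x, y))] / 2  # molecular clock: node sits halfway
        merges.append((x, y, height))
        new = f"({x},{y})"
        nx, ny = sizes.pop(x), sizes.pop(y)
        # New distance = average over all taxon pairs between the two clusters
        for k in sizes:
            d[frozenset((new, k))] = (nx * d[frozenset((x, k))] + ny * d[frozenset((y, k))]) / (nx + ny)
        sizes[new] = nx + ny
    return next(iter(sizes)), merges


D = {frozenset(p): v for p, v in [(("a", "b"), 6), (("a", "c"), 7), (("a", "d"), 14),
                                  (("b", "c"), 3), (("b", "d"), 10), (("c", "d"), 9)]}
tree, merges = upgma(["a", "b", "c", "d"], D)
print(tree)    # (d,(a,(b,c)))
print(merges)  # [('b', 'c', 1.5), ('a', '(b,c)', 3.25), ('d', '(a,(b,c))', 5.5)]
```

Note that this matrix is not ultrametric (for the triple a, b, c the two largest distances, 6 and 7, are unequal), so the clock assumption is already violated and the grouping UPGMA returns should be treated with caution.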

  31. Neighbor Joining • Idea: Find a pair of taxa that are close to each other but far from other taxa • Implicitly finds a pair of neighboring taxa • No molecular clock assumption

  32. Neighbor Joining • NJ corrects for unequal evolutionary rates between sequences by using a conversion step • The conversion step requires calculation of “r-values” and “transformed r-values”

  33. Neighbor Joining The r-value for sequence i is the sum of the distances between sequence i and all other sequences: $r_i = \sum_{j \neq i} d_{ij}$

  34. Neighbor Joining The transformed r-value for sequence i is $r'_i = \frac{r_i}{n-2}$, where n is the number of taxa • Transformed r-values are used to determine the distance of a taxon to the nearest node

  35. Neighbor Joining The converted distance between sequences i and j is $d'_{ij} = d_{ij} - r'_i - r'_j$ • These converted distances are used in building the tree

  36. Neighbor Joining The final equation we need gives the distance from each merged taxon to the new cluster. Assume taxa i and j were merged into a cluster u. The distance from taxon i to cluster u is $d_{iu} = \frac{1}{2}\left(d_{ij} + r'_i - r'_j\right)$

  37. Neighbor Joining Example

  38. Neighbor Joining Example • Initialize the tree as a star, with all taxa connected to the center • Step 1: Compute r-values and transformed r-values for all taxa
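The slide's own example matrix is not preserved, so as a stand-in here is Step 1 worked on the four-taxon matrix from slide 23 (d(a,b) = 6, d(a,c) = 7, d(a,d) = 14, d(b,c) = 3, d(b,d) = 10, d(c,d) = 9), using $r_i = \sum_{j \neq i} d_{ij}$ and $r'_i = r_i/(n-2)$ with $n = 4$:

```latex
\begin{aligned}
r_a &= 6 + 7 + 14 = 27,  & r'_a &= 27/2 = 13.5\\
r_b &= 6 + 3 + 10 = 19,  & r'_b &= 19/2 = 9.5\\
r_c &= 7 + 3 + 9 = 19,   & r'_c &= 19/2 = 9.5\\
r_d &= 14 + 10 + 9 = 33, & r'_d &= 33/2 = 16.5
\end{aligned}
```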

  39. Neighbor Joining Example • Step 2: Compute converted distances

  40. Neighbor Joining Example • Step 3: Fill out converted distance matrix
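Continuing the stand-in example on the slide 23 matrix (transformed r-values 13.5, 9.5, 9.5 and 16.5 for a, b, c, d), Steps 2 and 3 apply $d'_{ij} = d_{ij} - r'_i - r'_j$ to every pair; for instance $d'_{ab} = 6 - 13.5 - 9.5 = -17$:

```latex
\begin{array}{c|cccc}
  & a   & b   & c   & d \\ \hline
a & -   &     &     &   \\
b & -17 & -   &     &   \\
c & -16 & -16 & -   &   \\
d & -16 & -16 & -17 & -
\end{array}
```

As in the slide's example, the smallest value is shared by two pairs (a,b and c,d). With exactly four taxa this tie always occurs: both $d'_{ab}$ and $d'_{cd}$ reduce to $-\tfrac{1}{2}(d_{ac} + d_{ad} + d_{bc} + d_{bd})$.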

  41. Neighbor Joining Example • Step 4: Create a node by merging the closest pair of taxa • In this example, the converted distance between A and B is the same as that between C and D • We can pick either pair to start with • Let’s pick A and B and create a node called U [Figure: A and B joined at the new node U; C and D still attached to the center of the star]

  42. Neighbor Joining Example • Step 5: Compute branch lengths • Use the equation for computing the distance from a taxon to a node [Figure: A and B attached to node U by branches labeled 0.15 and 0.25]
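For the stand-in example on the slide 23 matrix, joining a and b into U with $d_{iU} = \tfrac{1}{2}(d_{ij} + r'_i - r'_j)$ and the transformed r-values $r'_a = 13.5$, $r'_b = 9.5$ gives:

```latex
\begin{aligned}
d_{aU} &= \tfrac{1}{2}\,(6 + 13.5 - 9.5) = 5\\
d_{bU} &= \tfrac{1}{2}\,(6 + 9.5 - 13.5) = 1
\end{aligned}
```

The two branches sum to $d_{ab} = 6$, and the much longer branch to a reflects a's larger r-value, i.e. its faster evolution relative to b.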

  43. Neighbor Joining Example • Step 6: Construct the reduced distance matrix by computing the distance from each remaining taxon k to the new node U: $d_{kU} = \frac{1}{2}\left(d_{Ak} + d_{Bk} - d_{AB}\right)$ • In UPGMA, we simply calculated the average

  44. Neighbor Joining Example Our reduced distance matrix:
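For the stand-in example on the slide 23 matrix, after merging a and b into U the standard neighbor-joining reduction $d_{kU} = \tfrac{1}{2}(d_{ak} + d_{bk} - d_{ab})$ gives $d_{cU} = \tfrac{1}{2}(7 + 3 - 6) = 2$ and $d_{dU} = \tfrac{1}{2}(14 + 10 - 6) = 9$, while $d_{cd} = 9$ is unchanged:

```latex
\begin{array}{c|ccc}
  & U & c & d \\ \hline
U & 0 &   &   \\
c & 2 & 0 &   \\
d & 9 & 9 & 0
\end{array}
```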

  45. Neighbor Joining Example • From here, go back to Step 1 with the reduced matrix (n is now one smaller) • Continue until every taxon has been separated out of the star tree
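Steps 1 to 6 can be combined into a compact loop. This is a minimal sketch, assuming the formulas $r'_i = r_i/(n-2)$, $d'_{ij} = d_{ij} - r'_i - r'_j$, $d_{iu} = \tfrac{1}{2}(d_{ij} + r'_i - r'_j)$ and $d_{ku} = \tfrac{1}{2}(d_{ik} + d_{jk} - d_{ij})$; ties are broken by taking the first pair in input order, and once three nodes remain they are joined at one final node. The internal node names `U0`, `U1`, ... and `"center"` are hypothetical labels. It is run on the slide 23 matrix as a stand-in for the slide's lost example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor joining on a distance dict keyed by frozenset({x, y}).

    Requires at least 3 taxa. Returns a list of (node, parent, branch_length) edges.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Step 1: r-values and transformed r-values r'_i = r_i / (n - 2)
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        rt = {i: r[i] / (n - 2) for i in nodes}
        # Steps 2-4: converted distances d'_ij = d_ij - r'_i - r'_j;
        # join the pair with the smallest value (first pair wins ties)
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: d[frozenset(pair)] - rt[pair[0]] - rt[pair[1]])
        dij = d[frozenset((i, j))]
        u = f"U{counter}"
        counter += 1
        # Step 5: branch lengths from i and j to the new node u
        edges.append((i, u, (dij + rt[i] - rt[j]) / 2))
        edges.append((j, u, (dij + rt[j] - rt[i]) / 2))
        # Step 6: reduced matrix, d_ku = (d_ik + d_jk - d_ij) / 2
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((k, u))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = rest + [u]
    # Three nodes left: attach them to one final internal node
    x, y, z = nodes
    dxy, dxz, dyz = d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]
    edges.append((x, "center", (dxy + dxz - dyz) / 2))
    edges.append((y, "center", (dxy + dyz - dxz) / 2))
    edges.append((z, "center", (dxz + dyz - dxy) / 2))
    return edges


D = {frozenset(p): v for p, v in [(("a", "b"), 6), (("a", "c"), 7), (("a", "d"), 14),
                                  (("b", "c"), 3), (("b", "d"), 10), (("c", "d"), 9)]}
for edge in neighbor_joining(["a", "b", "c", "d"], D):
    print(edge)
```

The recovered branch lengths (a 5, b 1, c 1, d 8, and 1 on the internal branch) reproduce every entry of the input matrix exactly; for example, the path from a to d is 5 + 1 + 8 = 14. This is expected when the matrix is additive, even though the rates clearly differ between lineages.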

  46. Optimality-Based Methods • Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies • Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix • Two algorithms: • Fitch-Margoliash • Minimum evolution

  47. Fitch-Margoliash • Selects the best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix • Basically, a least squares method • $D_{ij}$ = distance between i and j in the matrix • $d_{ij}$ = distance between i and j in the tree • Objective: find the tree that minimizes $E = \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}^2}$ (Fitch and Margoliash weight each squared deviation by $1/D_{ij}^2$; ordinary least squares omits this weight)
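A minimal sketch of the Fitch-Margoliash criterion as a scoring function: given the matrix distances $D_{ij}$ and the path lengths $d_{ij}$ measured on a candidate tree, it returns $\sum (D_{ij} - d_{ij})^2 / D_{ij}^2$. It assumes the tree's path lengths are supplied directly; a full implementation would also fit branch lengths and search over topologies. The two candidate trees below are worked out by hand for the slide 23 matrix: one with branches a = 5, b = 1, c = 1, d = 8 and internal branch 1, and the ultrametric tree UPGMA produces for that matrix (b and c joined at height 1.5, a at 3.25, d at 5.5).

```python
def fitch_margoliash_score(matrix_dist, tree_dist):
    """Weighted least-squares deviation: sum of (D_ij - d_ij)^2 / D_ij^2.

    matrix_dist and tree_dist map each taxon pair to D_ij and d_ij.
    Assumes every D_ij > 0.
    """
    return sum((D - tree_dist[pair]) ** 2 / D ** 2 for pair, D in matrix_dist.items())


pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
D = dict(zip(pairs, [6, 7, 14, 3, 10, 9]))
# Path lengths on the tree a=5, b=1, c=1, d=8 with internal branch 1
additive_tree = dict(zip(pairs, [6, 7, 14, 3, 10, 9]))
# Path lengths on the ultrametric (UPGMA) tree for the same matrix
upgma_tree = dict(zip(pairs, [6.5, 6.5, 11, 3, 11, 11]))

print(fitch_margoliash_score(D, additive_tree))         # 0.0
print(round(fitch_margoliash_score(D, upgma_tree), 4))  # 0.1173
```

Under this criterion the first tree fits the matrix perfectly, while the clock-constrained tree is penalized, most heavily for its d distances.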

  48. Minimum Evolution • Similar to Fitch-Margoliash, but uses a different optimality criterion • Searches for a tree with the minimum total branch length • This is an indirect way of achieving the best fit of the branch lengths with the original data
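For four taxa there are only three unrooted topologies, so the minimum evolution search can be written out exhaustively. This sketch assumes branch lengths are fitted by ordinary least squares (one common choice; minimum evolution variants differ). For a quartet with split $\{a,b\} \mid \{c,e\}$ that fit gives a total tree length of $\tfrac{1}{2}(d_{ab} + d_{ce}) + \tfrac{1}{4}(d_{ac} + d_{ae} + d_{bc} + d_{be})$. Larger data sets require fitting each candidate tree and, in practice, heuristic search.

```python
def quartet_tree_length(d, split):
    """Total length of a 4-taxon tree with ordinary least-squares branch lengths.

    d maps frozenset({x, y}) -> distance; split = ((a, b), (c, e)) means the
    internal branch separates {a, b} from {c, e}.
    """
    (a, b), (c, e) = split
    cherries = (d[frozenset((a, b))] + d[frozenset((c, e))]) / 2
    cross = sum(d[frozenset((x, y))] for x in (a, b) for y in (c, e)) / 4
    return cherries + cross


def minimum_evolution_quartet(d, taxa):
    """Score all three unrooted topologies of four taxa; return the shortest."""
    w, x, y, z = taxa
    splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    lengths = {split: quartet_tree_length(d, split) for split in splits}
    best = min(lengths, key=lengths.get)
    return best, lengths


D = {frozenset(p): v for p, v in [(("a", "b"), 6), (("a", "c"), 7), (("a", "d"), 14),
                                  (("b", "c"), 3), (("b", "d"), 10), (("c", "d"), 9)]}
best, lengths = minimum_evolution_quartet(D, ["a", "b", "c", "d"])
print(best)                    # (('a', 'b'), ('c', 'd'))
print(list(lengths.values()))  # [16.0, 16.5, 16.5]
```

On the slide 23 matrix the shortest tree (total length 16) groups a with b and c with d.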

  49. Summary of Distance-Based Methods • Clustering-based methods: • Computationally very fast and can handle large datasets that other methods cannot • Not guaranteed to find the best tree • Optimality-based methods: • Better overall accuracy • Computationally slow • All distance-based methods reduce the alignment to pairwise distances, so site-by-site sequence information is lost and the most likely state at an internal node cannot be inferred
