Inferring phylogenetic trees: Distance and maximum likelihood methods

Inferring phylogenetic trees: Distance and maximum likelihood methods. GENOME 373: Genomic Informatics, Prof. William Stafford Noble. Outline: distance methods (Fitch-Margoliash, neighbor joining, UPGMA) and maximum likelihood.


Presentation Transcript


  1. Inferring phylogenetic trees: Distance and maximum likelihood methods GENOME 373: Genomic Informatics Prof. William Stafford Noble

  2. Outline • Distance methods • Fitch-Margoliash • Neighbor joining • UPGMA • Maximum likelihood

  3. One-minute responses • Is the parsimony model biologically accurate? • No. Parsimony ignores back-mutation, parallel mutation, etc. • The following tree can have a score of 2 or 3, correct? • Correct. However, the idea of parsimony is to select the tree with the smallest number of mutations along the tree. • Is it biologically acceptable to make the assumptions of the JC model? • No. The assumptions are made for statistical reasons – essentially, we often don’t know the proper values for the more parameter-rich models. • What other considerations can be taken to get a better tree? • The most important ones are site-by-site variation in mutation rate, and dependencies between adjacent sites. • Is there any way to check whether the tree obtained is significant? • You can check whether individual branches are significant using something called “bootstrap analysis.” • Still unclear how to use these trees in a biological way. • Primarily, these trees are used to understand evolutionary history. • Will we be using any of the phylogeny software in this class? • No.

  4. One-minute responses • What’s a real event that is your “oracle” that tells you the true evolutionary history of substitutions for Jukes-Cantor? • There is no oracle, and luckily, you don’t need one in order for Jukes-Cantor to work. • It was difficult to understand how you were computing parsimony scores at first.

  5. Distance methods • Fitch-Margoliash • Neighbor-joining • UPGMA • Pipeline: multiple sequence alignment → pairwise distance matrix → phylogenetic tree

  6. Star topology • Sum of all branches is S* = a + b + c + d + e. • Summing all distances in the matrix counts each edge four times (e.g., branch a appears in dAB, dAC, dAD and dAE). • Hence, the sum of all distances in the matrix is 4S*. [Figure: star tree with leaves A, B, C, D, E attached to a central node by branches a, b, c, d, e]
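The counting argument on this slide can be checked numerically. The sketch below assumes made-up branch lengths (the slide gives none) and assumes each leaf-to-leaf distance in a star tree is the sum of the two branches involved.

```python
from itertools import combinations

# Hypothetical branch lengths for the five leaves of the star tree.
branch = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 2.5}

# S*: total length of the star tree.
star_length = sum(branch.values())

# In a star tree, the distance between two leaves is the sum of their branches.
# Summing over all 10 pairs uses each branch once per other leaf, i.e. 4 times.
total_distance = sum(branch[x] + branch[y] for x, y in combinations(branch, 2))

print(total_distance, 4 * star_length)
```

Both printed values agree, which is the "sum of all distances is 4S*" statement for five leaves.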

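  7. Adding one branch • Sum of branches is S = a + b + c + d + e + f = (dAC + dAD + dAE + dBC + dBD + dBE)/6 + dAB/2 + (dCD + dCE + dDE)/3 [Figure: the star tree next to a tree in which leaves A and B hang from a new internal node, joined to the remaining leaves C, D, E by the new branch f]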
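As a sanity check on the formula above, the following sketch builds distances from a tree in which A and B are joined by an extra branch f, then recovers the total branch length from the distances alone. The branch lengths are hypothetical, and the distances are assumed to be exactly additive along the tree, which real data rarely are.

```python
from itertools import combinations

# Hypothetical branch lengths; f is the new internal branch separating {A, B} from {C, D, E}.
branch = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 2.5}
f = 1.5

def dist(x, y):
    """Path length between leaves x and y on the tree with A and B joined."""
    d = branch[x] + branch[y]
    if (x in "AB") != (y in "AB"):  # the path crosses the internal branch
        d += f
    return d

def tree_length_from_distances():
    """S computed from pairwise distances only, using the slide's formula."""
    cross = sum(dist(x, y) for x in "AB" for y in "CDE")
    rest = sum(dist(x, y) for x, y in combinations("CDE", 2))
    return cross / 6 + dist("A", "B") / 2 + rest / 3

true_length = sum(branch.values()) + f
print(tree_length_from_distances(), true_length)
```

With these numbers both values come out to 14.0, matching a + b + c + d + e + f.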

  8. Neighbor joining • Add one branch to the star topology and compute the difference between S* and S. • Repeat for each pair of leaves in the tree. • Choose the pair that yields the largest difference S* − S, i.e., the smallest total branch length S (the closest neighbors). • Join that pair. • Treat the joined pair as a single leaf and repeat until all pairs are joined.
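The pair-selection step can be sketched as follows. This is an illustration rather than a full neighbor-joining implementation: it performs one round of selection only, the function names are made up, the distance matrix is hypothetical, and the formulas generalize the five-leaf case to N leaves (dividing by N − 1 for the star and by 2(N − 2) and N − 2 for the joined tree), which is an assumption beyond what the slides state.

```python
from itertools import combinations

def symmetric(pair_distances):
    """Expand {(x, y): value} into a nested dict d[x][y] == d[y][x]."""
    d = {}
    for (x, y), value in pair_distances.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d

def star_length(names, d):
    """S*: every branch is counted N - 1 times in the sum of all distances."""
    return sum(d[x][y] for x, y in combinations(names, 2)) / (len(names) - 1)

def joined_length(names, d, i, j):
    """S for the tree in which leaves i and j are joined by one extra branch."""
    n = len(names)
    others = [k for k in names if k not in (i, j)]
    cross = sum(d[i][k] + d[j][k] for k in others)
    rest = sum(d[x][y] for x, y in combinations(others, 2))
    return cross / (2 * (n - 2)) + d[i][j] / 2 + rest / (n - 2)

def best_pair(names, d):
    """Pair of leaves with the largest S* - S."""
    s_star = star_length(names, d)
    return max(combinations(names, 2),
               key=lambda pair: s_star - joined_length(names, d, *pair))

# Hypothetical distances, additive on a tree where A and B are neighbors.
names = ["A", "B", "C", "D", "E"]
d = symmetric({
    ("A", "B"): 5.0, ("A", "C"): 4.5, ("A", "D"): 7.5, ("A", "E"): 6.0,
    ("B", "C"): 5.5, ("B", "D"): 8.5, ("B", "E"): 7.0,
    ("C", "D"): 5.0, ("C", "E"): 3.5, ("D", "E"): 6.5,
})
print(best_pair(names, d))
```

Note that A and B are selected even though C and E have the smallest raw distance (3.5): the criterion rewards the pair whose joining shortens the whole tree, not simply the closest pair.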

  9. UPGMA • Unweighted pair group method with arithmetic mean. • A form of agglomerative hierarchical clustering (average linkage). • Basic idea: iteratively connect the two most closely related sequences.

  10. UPGMA

  11. UPGMA • Find the smallest off-diagonal element in the matrix.

  12. UPGMA • Compute the average between the two rows and columns.

  13. UPGMA

  14. UPGMA • Each merger creates a subtree. [Figure: subtree joining Smik and Sbay]
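The UPGMA steps on the preceding slides (find the smallest off-diagonal entry, merge the two rows and columns by averaging, record the subtree) can be sketched as below. The function name, the nested-tuple tree representation, and the example distances are all illustrative assumptions. The average is weighted by cluster size, so that each original sequence counts equally; branch lengths are not computed here.

```python
from itertools import combinations

def upgma(names, pair_distances):
    """Return a nested-tuple tree topology built by UPGMA.

    pair_distances maps (x, y) tuples of leaf names to distances.
    """
    dist = {frozenset(pair): value for pair, value in pair_distances.items()}
    size = {name: 1 for name in names}  # number of leaves in each cluster
    clusters = list(names)
    while len(clusters) > 1:
        # Smallest off-diagonal element of the current matrix.
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = (x, y)  # each merger creates a subtree
        remaining = [k for k in clusters if k != x and k != y]
        for k in remaining:
            # Size-weighted average of the two merged rows.
            dist[frozenset((merged, k))] = (
                size[x] * dist[frozenset((x, k))]
                + size[y] * dist[frozenset((y, k))]
            ) / (size[x] + size[y])
        size[merged] = size[x] + size[y]
        clusters = remaining + [merged]
    return clusters[0]

# Hypothetical distances between four sequences.
example = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree = upgma(["A", "B", "C", "D"], example)
print(tree)
```

Here A and B merge first, then C joins them, and D is attached last.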

  15. Maximum likelihood
      for each possible tree
        for each column of the alignment
          compute the likelihood of the column, given the tree
      return the tree with the highest likelihood
      • Similar to parsimony, but capable of using a model of evolution. • Computationally expensive. • DNAML is the Phylip program for maximum likelihood. FastDNAML is a fast clone (http://geta.life.uiuc.edu/~gary/programs/fastDNAml.html).

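  16. Computing the likelihood • What is the probability of observing this column, given this tree and an assumed model of evolution? [Figure: alignment of four sequences (ACGCGTTGGG, ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA) whose sixth column (T, T, A, G) labels the leaves of a tree; the quantity sought is Pr(column|tree,model)]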

  17. Computing the likelihood • Solution: Enumerate all possible assignments to the internal nodes. Compute the probability of each assigned tree, and sum. [Figure: three copies of the tree with leaves T, T, A, G, each with a different assignment of nucleotides to the internal nodes]

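  18. Computing the likelihood • What is the probability of observing this column, given this assigned tree and an assumed model of evolution? [Figure: the same alignment (ACGCGTTGGG, ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA) and tree with leaves T, T, A, G, now with the internal nodes assigned: root A and internal nodes T and A; the quantity sought is Pr(column|tree,model)]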

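  19. Computing the likelihood • The probability of the ancestral observation being A is just πA, one of the base frequencies πA, πC, πG, πT. • The probability of observing a substitution from A to T on a branch of length m is given by the evolutionary model. [Figure: assigned tree with root A, internal nodes T and A, and leaves T, T, A, G; one branch of length m leading from A to T is highlighted]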
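To make "given by the evolutionary model" concrete, here is a sketch that assumes the Jukes-Cantor model mentioned in the one-minute responses. Under that model the base frequencies are all 1/4, and the branch length m is taken to be the expected number of substitutions per site. The function name is made up for illustration; the slide itself does not say which model is used.

```python
import math

BASES = "ACGT"
PI = {base: 0.25 for base in BASES}  # Jukes-Cantor assumes equal base frequencies

def jc_prob(a, b, m):
    """Probability that base a is observed as base b after a branch of length m.

    Assumes the Jukes-Cantor model, with m in expected substitutions per site.
    """
    decay = math.exp(-4.0 * m / 3.0)
    if a == b:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

# Probability of the ancestral base being A, then of A -> T on a branch of length 0.1.
print(PI["A"], jc_prob("A", "T", 0.1))
```

For a zero-length branch the base cannot change, and for a very long branch every base becomes equally likely (probability 1/4), which is a quick way to check the formula.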

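  20. Computing the likelihood • The desired probability is the product of the probabilities of the branches. • L(tree) = L0 × L1 × L2 × L3 × L4 × L5 × L6 [Figure: assigned tree with root A labeled L0, internal nodes T and A, leaves T, T, A, G, branches labeled L1 through L6, and base frequencies πA, πC, πG, πT]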

  21. Computing the likelihood • The probability of the column given the tree is the sum of the probabilities of the individual assigned trees. • L(tree) = L(tree1) + L(tree2) + L(tree3) + … [Figure: three assigned trees, tree1, tree2 and tree3, each with leaves T, T, A, G and a different assignment of the internal nodes]
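The sum over assigned trees can be sketched for the four-leaf tree in the figures. The code assumes a rooted tree of the form ((leaf1, leaf2), (leaf3, leaf4)) with three internal nodes, the Jukes-Cantor model, and a single hypothetical branch length shared by all six branches; none of these values come from the slides.

```python
import math
from itertools import product

BASES = "ACGT"
PI = {base: 0.25 for base in BASES}  # Jukes-Cantor base frequencies (assumed)

def jc_prob(a, b, m):
    """Jukes-Cantor probability of a -> b on a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def column_likelihood(leaves, m=0.1):
    """Pr(column | tree, model) for the tree ((leaves[0], leaves[1]), (leaves[2], leaves[3])).

    Sums the probability of every assignment to the root and the two other
    internal nodes (4**3 = 64 assigned trees).
    """
    total = 0.0
    for root, left, right in product(BASES, repeat=3):
        assigned = PI[root]                                    # L0: root base
        assigned *= jc_prob(root, left, m) * jc_prob(root, right, m)
        assigned *= jc_prob(left, leaves[0], m) * jc_prob(left, leaves[1], m)
        assigned *= jc_prob(right, leaves[2], m) * jc_prob(right, leaves[3], m)
        total += assigned                                      # add assigned trees
    return total

print(column_likelihood("TTAG"))
```

Because the 256 possible columns are mutually exclusive and exhaustive, their likelihoods on a fixed tree should add up to 1, which gives a simple correctness check.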

  22. Maximum likelihood revisited
      for each possible tree
        for each column of the alignment
          for each assignment of internal nodes
            for each branch
              compute the probability of that branch
            assigned tree probability ← multiply branch probabilities
          column probability ← sum assigned tree probabilities
        tree probability ← multiply column probabilities
      return the tree with the highest probability

  23. Maximum likelihood revisited
      for each possible tree
        for each column of the alignment
          for each assignment of internal nodes
            for each branch
              compute the probability of that branch
            assigned tree probability ← multiply branch probabilities
          column probability ← sum assigned tree probabilities
        tree probability ← multiply column probabilities
      return the tree with the highest probability
      • Multiply probabilities of independent events. • Add probabilities of mutually exclusive events.
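The nested loops above can be turned into a small runnable sketch for the four sequences shown on the earlier slides. Several simplifying assumptions are made that are not in the slides: the model is Jukes-Cantor, every branch has the same hypothetical length of 0.1, and "each possible tree" is restricted to the three ways of pairing four sequences into a tree of the form ((i, j), (k, l)). Real programs such as DNAML also optimize branch lengths.

```python
import math
from itertools import product

BASES = "ACGT"
PRIOR = 0.25          # Jukes-Cantor base frequency (assumed model)
BRANCH_LENGTH = 0.1   # hypothetical length shared by all branches

def branch_prob(a, b, m=BRANCH_LENGTH):
    """Jukes-Cantor probability of a -> b on a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def column_prob(x1, x2, x3, x4):
    """Sum over all assignments of the three internal nodes."""
    total = 0.0
    for root, left, right in product(BASES, repeat=3):
        # Independent events along one assigned tree: multiply.
        total += math.prod([
            PRIOR,
            branch_prob(root, left), branch_prob(root, right),
            branch_prob(left, x1), branch_prob(left, x2),
            branch_prob(right, x3), branch_prob(right, x4),
        ])  # mutually exclusive assignments: add
    return total

def tree_prob(seqs, tree):
    """Multiply column probabilities (columns are treated as independent)."""
    (i, j), (k, l) = tree
    return math.prod(
        column_prob(seqs[i][c], seqs[j][c], seqs[k][c], seqs[l][c])
        for c in range(len(seqs[0]))
    )

seqs = ["ACGCGTTGGG", "ACGCGTTGGG", "ACGCAATGAA", "ACACAGGGAA"]
TREES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
best = max(TREES, key=lambda tree: tree_prob(seqs, tree))
print(best)
```

With these settings the tree that pairs the two identical sequences comes out on top. For longer alignments, summing log probabilities instead of multiplying raw probabilities avoids numerical underflow.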

  24. Overview • Parsimony • Distance methods • Computing distances • Finding the tree • Fitch-Margoliash • Neighbor-joining • UPGMA • Maximum likelihood
