
Tutorial 7

Tutorial 7. Clustering and phylogenetic trees. Agenda: how to construct a tree using the Neighbor Joining algorithm; unsupervised clustering (K-means); the EP-Clust tool. Cool story of the day: horizontal gene transfer. Also covered: Neighbor Joining vs. UPGMA.

bettyeb

Presentation Transcript


  1. Tutorial 7 Clustering and phylogenetic trees

  2. Agenda • How to construct a tree using the Neighbor Joining algorithm • Unsupervised clustering – K-means • EP-Clust tool • Cool story of the day: Horizontal gene transfer

  3. Neighbor Joining vs. UPGMA. UPGMA: • Assumption: divergence of sequences occurs at a constant rate • The distance from every leaf to the root is equal. Neighbor Joining: • Constructs an unrooted guide tree from a distance matrix • Does not assume a constant rate of evolution

  4. Neighbor Joining Algorithm (uses 2 matrices) • Calculate all pairwise distances. • Find 2 nodes i and j such that the relative distance between i and j is minimal. • Remove the rows and columns of i and j. • Add a new row and column k (the parent of i and j), and compute the distance from k to every remaining node. • Continue until two nodes remain, then connect them with an edge.

  5. Step 1. Calculate all pairwise distances • A, B, C, D and E are tree nodes. Each character represents a sequence. • How can we measure distance between sequences?

  6. Step 1. Calculate all pairwise distances. Distance between sequences • Euclidean distance: given a multiple sequence alignment, score every aligned position between two sequences, sum the scores, and take the square root. • The score increases as the dissimilarity between residues increases.

  7. Step 1. Calculate all pairwise distances. The distance between each pair of sequences is based on a multiple sequence alignment. Multiple sequence alignment:
     a: A T G G C
     b: A A G C C
     c: C A G C C
     d: G G G C G
     e: A T G C C
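
A minimal sketch of Step 1 on the five aligned sequences above. The slides do not say which per-position score is used, so a simple mismatch score (0 for identical residues, 1 otherwise) is assumed here; the names `mismatch_score`, `distance` and `dist` are illustrative only.

```python
import math
from itertools import combinations

# The multiple sequence alignment from the slide.
alignment = {
    "a": "ATGGC",
    "b": "AAGCC",
    "c": "CAGCC",
    "d": "GGGCG",
    "e": "ATGCC",
}

def mismatch_score(x, y):
    # Assumed scoring (not from the slides): 0 if residues match, 1 otherwise.
    return 0 if x == y else 1

def distance(s, t):
    # Square root of the summed per-position scores of two aligned sequences.
    return math.sqrt(sum(mismatch_score(x, y) for x, y in zip(s, t)))

# Distance for every unordered pair of sequences.
dist = {
    (p, q): distance(alignment[p], alignment[q])
    for p, q in combinations(alignment, 2)
}

for pair, value in dist.items():
    print(pair, round(value, 3))
```

A real analysis would normally replace the mismatch score with a substitution-based score, so that dissimilar residues contribute more than similar ones.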

  8. Step 2. Two nodes with minimal relative distance. If we assume a constant evolution rate we may construct a wrong tree. The closest leaves aren't necessarily neighbors: i and j are neighbors, but (dij = 13) > (djk = 12).

  9. Step 2. Two nodes with minimal relative distance • Find a pair of leaves that are close to each other, but far from other leaves. • This is called “relative distance”.

  10. Step 2. Two nodes with minimal relative distance. The relative distance between i and j is computed from: • the distance between i and j (from the distance table) • the distance between i and all other nodes • the number of leaves (= sequences) left in the tree
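
The formula on this slide did not survive in the transcript. One commonly used form of the relative distance that matches the labels listed is sketched below; the symbols M_ij, D_ij, r_i and N are assumptions chosen to agree with the Mij and Dkm notation of the later slides, and the original slide may have written it differently.

```latex
M_{ij} = D_{ij} - \frac{r_i + r_j}{N - 2},
\qquad
r_i = \sum_{k \neq i} D_{ik}
```

Here D_ij is the entry of the distance table, r_i is the summed distance from i to all other nodes, and N is the number of leaves left in the tree. The pair with the lowest M_ij is joined.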

  11. Step 2. Two nodes with minimal relative distance. Distances matrix for the leaves A, B, C, D and E.

  12. Step 2. Two nodes with minimal relative distance Distances matrix:

  13. Step 2. Two nodes with minimal relative distance. The relative distance table: A,B is the pair with the minimal Mij value. The Mij table is used only to choose the closest pair (lowest value), not to calculate the distances.
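
A short sketch of how a relative distance table can be built and used to pick the pair to join. The distance values are hypothetical (the slide's table is not in the transcript), and the form M_ij = D_ij - (r_i + r_j)/(N - 2) is an assumed, commonly used definition. The example is chosen so that the smallest raw distance (B,D) is not a neighbor pair.

```python
from itertools import combinations

# Hypothetical distances (NOT the slide's table).
D = {
    ("A", "B"): 13, ("A", "C"): 19, ("A", "D"): 16,
    ("B", "C"): 12, ("B", "D"): 9, ("C", "D"): 11,
}
leaves = ["A", "B", "C", "D"]

def d(x, y):
    # Symmetric lookup in the distance table.
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def relative_distances(leaves):
    n = len(leaves)
    # r[x]: summed distance from x to all other leaves.
    r = {x: sum(d(x, y) for y in leaves if y != x) for x in leaves}
    return {
        (i, j): d(i, j) - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(leaves, 2)
    }

M = relative_distances(leaves)
closest_raw = min(D, key=D.get)   # pair with the smallest raw distance
neighbors = min(M, key=M.get)     # pair with the smallest relative distance

print("smallest raw distance:", closest_raw)
print("pair chosen by M:", neighbors, M[neighbors])
```

With only four leaves the two neighbor pairs, (A,B) and (C,D), tie on M; `min` simply returns the first one it meets.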

  14. Steps 3+4. Remove i, j and add k to the matrix. The distance from k to any other leaf m can be computed as: Dkm = (Dim + Djm - Dij)/2. Compress i and j into k, then iterate the algorithm for the rest of the tree.
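
The update Dkm = (Dim + Djm - Dij)/2 can be sketched as a small function that removes i and j and adds their parent k. The helper name `merge` and the distance values are hypothetical; only the formula comes from the slide.

```python
def merge(D, i, j, k):
    """Return a new distance table in which i and j are replaced by parent k."""
    def d(x, y):
        # Symmetric lookup in the distance table.
        return D[(x, y)] if (x, y) in D else D[(y, x)]

    # Every node except the two being joined.
    others = sorted({x for pair in D for x in pair} - {i, j})
    # Step 3: drop all entries that involve i or j.
    new_D = {pair: v for pair, v in D.items() if i not in pair and j not in pair}
    # Step 4: add k with Dkm = (Dim + Djm - Dij) / 2 for each remaining m.
    for m in others:
        new_D[(k, m)] = (d(i, m) + d(j, m) - d(i, j)) / 2
    return new_D

# Hypothetical distances (NOT the slide's table).
D = {
    ("A", "B"): 13, ("A", "C"): 19, ("A", "D"): 16,
    ("B", "C"): 12, ("B", "D"): 9, ("C", "D"): 11,
}
D2 = merge(D, "A", "B", "K")
print(D2)
```

Calling `merge` repeatedly on the pair chosen from the relative distance table shrinks the matrix by one node per round.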

  15. Steps 3+4. Remove i, j and add k to the matrix. Now we'll calculate the distance from the new node K to all other nodes. Figure: tree with leaves A, B, C, D, E and the new node K.

  16. Step 5. Continue until 2 nodes remain. The final tree (figure): leaves A, B, C, D, E; internal nodes K, Y, Z; branch lengths shown: 12, 10, 20, 6, 5, 9, 4. What is missing?

  17. Phylogeny.fr

  18. Unsupervised Clustering – K-means clustering

  19. Unsupervised Clustering – K-means clustering. An algorithm that classifies the data into K groups. Example shown: K=4

  20. How does it work? 1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color). 2. k clusters are created by associating every observation with the nearest mean. 3. The centroid of each of the k clusters becomes the new mean. 4. Steps 2 and 3 are repeated until convergence has been reached. The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a locally optimal grouping (in terms of distances to the cluster centers) for K clusters, and it can depend on the initial means.
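
A minimal sketch of the four K-means steps in plain Python, assuming Euclidean distance between points. The function name `kmeans`, the toy points and the fixed random seed are illustrative choices, not part of the tutorial or of EPCLUST.

```python
import math
import random

def kmeans(points, initial_means, max_iter=100):
    """Plain K-means on tuples of numbers; returns (means, labels)."""
    means = [tuple(m) for m in initial_means]
    labels = None
    for _ in range(max_iter):
        # Step 2: associate every observation with its nearest mean.
        new_labels = [
            min(range(len(means)), key=lambda c: math.dist(p, means[c]))
            for p in points
        ]
        # Step 4: converged once the assignment stops changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return means, labels

# Toy data: two well-separated groups (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Step 1: k initial means drawn at random from the data (k = 2 here).
start = random.Random(0).sample(points, 2)
means, labels = kmeans(points, start)
print(labels)
print(means)
```

Because the initial means are random, different seeds can give different cluster numbering, and on less clearly separated data they can give different groupings as well.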

  21. How should we determine K? • Trial and error • Take K as square root of gene number

  22. Tool for clustering - EPclust http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

  23. Choose distance metric Choose algorithm

  24. Hierarchical clustering

  25. Zoom in by clicking on the nodes

  26. K-means clustering

  27. Samples found in cluster. Graphical representation of the cluster.

  28. 10 clusters, as requested

  29. Cool Story of the day Horizontal gene transfer

  30. Is horizontal gene transfer possible?

  31. Viruses

  32. Horizontal gene transfer in Bacteria. Horizontal gene transfer is the primary reason for bacterial antibiotic resistance and plays an important role in the evolution of bacteria. Horizontal gene transfer is so abundant in bacteria that it is hard to speak of a single bacterium's genome; it is more accurate to speak of the genome of a "society of bacteria". http://en.wikipedia.org/wiki/Horizontal_gene_transfer

  33. Sea slug. The sea slug Elysia chlorotica incorporates chloroplasts from the algae that it eats into its body. Photosynthesis continues for up to 12 months using genes within the chloroplast, which are directed by algal nuclear genes that were transferred to the nuclei of the slug. http://en.wikipedia.org/wiki/Horizontal_gene_transfer

  34. Until the full speciation… Source: Bioinformatics, David W. Mount, p. 244
