Tutorial 5 Multiple sequence alignment
Multiple Sequence Alignment – When?
• More than two sequences
  • DNA
  • Protein
• Evolutionary relation
  • Homology
  • Phylogenetic tree
• Detect motif

Unaligned sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

Multiple Sequence Alignment – How?
• Dynamic Programming
  • Optimal alignment
  • Exponential in the number of sequences
• Progressive
  • Efficient
  • Heuristic

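To make the "exponential in the number of sequences" point concrete, the sketch below counts the cells of the dynamic-programming table and the predecessor moves per cell for the four example sequences. It assumes the straightforward generalization of pairwise dynamic programming, with one table axis per sequence; the function names `dp_cells` and `dp_moves_per_cell` are made up for this illustration.

```python
from math import prod

# The four unaligned example sequences from the slides.
SEQS = [
    "GTCGTAGTCGGCTCGAC",
    "GTCTAGCGAGCGTGAT",
    "GCGAAGAGGCGAGC",
    "GCCGTCGCGTCGTAAC",
]


def dp_cells(seqs):
    """Number of cells in a full DP table with one axis per sequence."""
    return prod(len(s) + 1 for s in seqs)


def dp_moves_per_cell(k):
    """Predecessor cells per cell: every non-empty subset of the k
    sequences may advance by one residue (the others take a gap)."""
    return 2 ** k - 1


for k in range(2, len(SEQS) + 1):
    print(k, "sequences:", dp_cells(SEQS[:k]), "cells,",
          dp_moves_per_cell(k), "moves per cell")
```

Each added sequence multiplies the table size by roughly its length, which is why progressive heuristics are used in practice.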
Hierarchical Clustering
• A way to represent similarities graphically.
• Summarizes a pairwise distance matrix as a dendrogram.
• Not all matrices can be embedded in a tree without error.

Example alignment:
TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC

ClustalW
“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al.

ClustalW
• Progressive (incremental)
• At each step, aligns two existing alignments or sequences.
• Gaps present in older alignments remain fixed.
• Uses the Neighbor Joining algorithm to build the guide tree that sets the order of alignment.

Neighbor Joining Algorithm
• An agglomerative hierarchical clustering method.
• Constructs an unrooted tree.

Neighbor Joining (Not assuming equal divergence)
Step by step summary:
1. Calculate all pairwise distances.
2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x).
4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.

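The five steps can be sketched in Python as follows. This is a minimal illustration, not ClustalW's actual code. It assumes the standard neighbor-joining formulas: relative distance Mi,j = Di,j - (ri + rj)/(N - 2), where ri is the summed distance from i to all other remaining nodes and N is the number of remaining nodes. The function name, the `X0, X1, ...` labels for new nodes, and the small distance table are hypothetical and are not the table from the slides.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the edges (node, node, length) of an unrooted NJ tree.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (step 1)
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: summed distance from i to every other remaining node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}

        def relative(pair):
            i, j = pair
            return d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)

        # Step 2: the pair with the lowest relative distance (ties: first found).
        i, j = min(combinations(nodes, 2), key=relative)
        dij = d[frozenset((i, j))]
        # Step 3: define a new node x.
        x = f"X{new_id}"
        new_id += 1
        # Step 4: branch lengths i-x and j-x, then distances from x to the rest.
        dix = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        djx = dij - dix
        edges += [(i, x, dix), (j, x, djx)]
        for k in nodes:
            if k not in (i, j):
                d[frozenset((x, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [x]
    # Step 5: two nodes remain; connect them with an edge.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical five-leaf distance table, for illustration only.
EXAMPLE_NAMES = ["a", "b", "c", "d", "e"]
EXAMPLE_DIST = {frozenset(k): v for k, v in {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}.items()}

for edge in neighbor_joining(EXAMPLE_NAMES, EXAMPLE_DIST):
    print(edge)
```

Storing distances under `frozenset` keys keeps the table symmetric without duplicating entries; a real implementation would more likely use a matrix.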
Measuring Distance
• Problem: for unrelated sequences, the fraction of differing positions approaches the fraction expected by chance, so the distance measure converges.
• Jukes-Cantor correction

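Under the Jukes-Cantor model for DNA, the observed fraction of differing sites p is corrected to d = -(3/4) ln(1 - (4/3) p), which grows without bound as p approaches 3/4, the fraction expected by chance with four equally likely bases. A minimal sketch, with hypothetical helper names and the assumption that gapped columns are simply skipped:

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption
    of this sketch; other gap treatments are possible).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed difference fraction p."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined here
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("TGTTAAC", "ATGTGGC")
print(p, jukes_cantor(p))
```

For small p the corrected distance is close to p; the correction matters most for distant sequences.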
Measuring Distance (cont)
• Euclidean Distance: given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences.
• The score increases proportionally to the extent of dissimilarity between residues.

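As a rough sketch of this measure, the code below sums a per-position score over two aligned sequences and takes the square root. The slides do not say which scoring scheme is used, so `toy_score` (0 for identical symbols, 1 otherwise, gaps included) is purely an assumption, as are the function names.

```python
import math


def toy_score(x, y):
    """Hypothetical dissimilarity score: 0 if identical, 1 otherwise."""
    return 0.0 if x == y else 1.0


def alignment_distance(a, b, score=toy_score):
    """Square root of the summed per-position scores of two aligned rows."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return math.sqrt(sum(score(x, y) for x, y in zip(a, b)))


print(alignment_distance("TGTTAAC", "TGT-AAC"))
print(alignment_distance("TGTTAAC", "TGT--AC"))
```

A graded score, for example one derived from a substitution matrix, can be passed through the `score` parameter.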
Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
The relative distance between i and j (Mi,j) is computed from:
• the distance between i and j from the distance table (Di,j)
• the distance of i from all other sequences
• the number of leaves (= sequences) left in the tree

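The formula itself did not survive in the slide text. Assuming the standard neighbor-joining definition, with r_i the summed distance of i from all other remaining nodes and N the number of leaves left in the tree, the quantities on this slide combine as:

```latex
M_{i,j} = D_{i,j} - \frac{r_i + r_j}{N - 2},
\qquad
r_i = \sum_{k \neq i} D_{i,k}
```

Subtracting the averaged totals means a pair is preferred when its members are close to each other and far from everything else.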
Step 2 (cont).
• A,B is the pair with the minimal Mi,j distance.
• The Mi,j table is used only to choose the closest pair (lowest value), not to calculate the distances.

Step 3. Define a new node (x)
(Figure: tree with leaves A, B, C, D, E and the new node X.)

Step 4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
• Now we’ll calculate the distance from x to all other nodes.

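The slide's formulas are not in the extracted text. Assuming the standard neighbor-joining update, with r_i the summed distance of i from all other remaining nodes, N the number of nodes before the merge, and k any node other than i and j, the distances are:

```latex
D_{i,x} = \frac{D_{i,j}}{2} + \frac{r_i - r_j}{2\,(N - 2)},
\qquad
D_{j,x} = D_{i,j} - D_{i,x},
\qquad
D_{x,k} = \frac{D_{i,k} + D_{j,k} - D_{i,j}}{2}
```

The two branch lengths always sum to D_{i,j}, and they are equal only when r_i = r_j, which is how the method avoids assuming equal divergence.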
Step 5. Continue until two nodes remain
• New Mi,j table
(Figure: tree with leaves A, B, C, D, E and new nodes X and Y.)

Step 5 (cont).
• New Di,j table
• Only 2 nodes are left. Let’s calculate all the distances to Z.
(Figure: tree with leaves A, B, C, D, E and new nodes X, Y and Z.)

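The tree
(Figure: unrooted tree with leaves A, B, C, D, E and internal nodes X, Y, Z; the branch lengths shown are 4, 5, 6, 9, 10, 12 and 20.)
And in Newick tree format: ((C,(D,E)),(A,B));
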
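Newick strings nest parentheses for internal nodes, separate siblings with commas, and end with a semicolon. A minimal sketch, assuming the topology is held as nested Python tuples; the helper `to_newick` is hypothetical and ignores branch lengths:

```python
def to_newick(tree):
    """Render a nested-tuple tree (leaves are strings) as a Newick string."""
    def render(node):
        if isinstance(node, str):
            return node
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


# Topology of the tree on this slide, without branch lengths.
tree = (("C", ("D", "E")), ("A", "B"))
print(to_newick(tree))
```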
ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
• Input sequences
• Scoring matrix
• Gap scoring
• Output format
• Email address

ClustalW - Output
• Match strength in decreasing order: * (identical residues), : (conserved substitution), . (semi-conserved substitution)

ClustalW - Output
• Pairwise alignment scores
• Building tree
• Building alignment
• Final score

ClustalW - Output
• Sequence names
• Sequence positions
• Match strength in decreasing order: * : .

ClustalW - Output
• Branch length