
Pairwise alignment

Pairwise alignment. Now that we know how to do it, how do we get a multiple alignment (three or more sequences)? Multiple alignment suffers a much greater combinatorial explosion than pairwise alignment. Multi-dimensional dynamic programming (Murata et al. 1985).


Presentation Transcript


  1. Pairwise alignment
     • Now we know how to do it:
     • How do we get a multiple alignment (three or more sequences)?
     • Multiple alignment: much greater combinatorial explosion than with pairwise alignment.

  2. Multi-dimensional dynamic programming (Murata et al. 1985)
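The idea of extending the pairwise dynamic programming matrix to one dimension per sequence can be sketched for three sequences as below. This is a minimal illustration, not the scheme of Murata et al.: the sum-of-pairs objective, the linear gap cost and the score values (`match`, `mismatch`, `gap`) are all assumptions, and only the optimal score is returned, not the alignment itself.

```python
from itertools import product

def sp_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column (assumed scoring values)."""
    total = 0
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            a, b = column[i], column[j]
            if a == "-" and b == "-":
                continue  # a gap facing a gap costs nothing
            if "-" in (a, b):
                total += gap
            else:
                total += match if a == b else mismatch
    return total

def align3(s1, s2, s3):
    """Optimal sum-of-pairs score of three sequences using a 3-D DP cube."""
    seqs = (s1, s2, s3)
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    D = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0
    # 2**3 - 1 = 7 moves: each sequence either emits a residue (1) or a gap (0)
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    # product() yields cells in lexicographic order, so predecessors come first
    for idx in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if idx == (0, 0, 0):
            continue
        best = neg_inf
        for m in moves:
            prev = tuple(p - d for p, d in zip(idx, m))
            if min(prev) < 0:
                continue
            column = [seqs[t][idx[t] - 1] if m[t] else "-" for t in range(3)]
            best = max(best, D[prev[0]][prev[1]][prev[2]] + sp_score(column))
        D[idx[0]][idx[1]][idx[2]] = best
    return D[n1][n2][n3]

print(align3("ACD", "AD", "ACD"))
```

The cube has (n1+1)(n2+1)(n3+1) cells and every cell looks at up to seven predecessors, which is where the extra cost relative to the pairwise case comes from.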

  3. Simultaneous multiple alignment: multi-dimensional dynamic programming
     MSA (Lipman et al., 1989, PNAS 86, 4412)
     • extremely slow and memory intensive
     • up to 8-9 sequences of ~250 residues
     DCA (Stoye et al., 1997, CABIOS 13, 625)
     • still very slow
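A rough feel for why simultaneous alignment is slow and memory intensive comes from counting the cells of an unpruned dynamic programming hypercube. The sketch below assumes N sequences of equal length L and no search-space pruning; the function name `dp_cost` is made up for this illustration, and real programs such as MSA restrict the search rather than filling the full hypercube.

```python
def dp_cost(n_seqs, length):
    """Cells in the full DP hypercube and predecessor moves per cell."""
    cells = (length + 1) ** n_seqs   # one axis of length L+1 per sequence
    moves = 2 ** n_seqs - 1          # every non-empty residue/gap combination
    return cells, moves

for n in (2, 3, 4, 8):
    cells, moves = dp_cost(n, 250)
    print(f"{n} sequences: {cells:.3e} cells, {moves} moves per cell")
```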

  4. Alternative multiple alignment methods
     • Biopat (first method ever)
     • MULTAL (Taylor 1987)
     • DIALIGN (Morgenstern 1996)
     • PRRP (Gotoh 1996)
     • Clustal (Thompson, Higgins and Gibson 1994)
     • Praline (Heringa 1999)
     • T-Coffee (Notredame 2000)
     • HMMER (Eddy 1998) [Hidden Markov Models]
     • SAGA (Notredame 1996) [Genetic algorithms]

  5. Progressive multiple alignment: general principles
     [Figure: pairwise alignment scores for sequences 1-5 (Score 1-2, Score 1-3, ..., Score 4-5) → 5×5 similarity matrix of scores → scores to distances → guide tree → multiple alignment, with iteration possibilities]
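The first stages of this pipeline (similarity matrix, scores to distances, guide tree) can be sketched as follows. The slide does not say how scores become distances or which clustering method builds the tree, so two assumptions are made here: similarities lie in [0, 1] and distance is taken as 1 - similarity, and the tree is built by average-linkage (UPGMA-style) clustering. The similarity values in the usage example are invented.

```python
def scores_to_distances(sim):
    """Turn a symmetric similarity matrix (values in [0, 1]) into distances."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)]
            for i in range(n)]

def guide_tree(dist, names):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                 # closest pair of clusters
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:                       # distances to the merged cluster
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Invented similarities for five sequences named "1".."5"
sim = [
    [1.0, 0.5, 0.9, 0.2, 0.5],
    [0.5, 1.0, 0.5, 0.2, 0.8],
    [0.9, 0.5, 1.0, 0.2, 0.5],
    [0.2, 0.2, 0.2, 1.0, 0.2],
    [0.5, 0.8, 0.5, 0.2, 1.0],
]
print(guide_tree(scores_to_distances(sim), ["1", "2", "3", "4", "5"]))
```

A progressive aligner would then walk this tree from the leaves towards the root, aligning the two children of each node.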

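  6. General progressive multiple alignment technique (follow generated tree)
     [Figure: guide tree with distance axis d; the aligned blocks grow from sequences 1, 3 to 1, 3, 2, 5 and finally to 1, 3, 2, 5, 4 at the root]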

  7. Progressive multiple alignment
     • Problem: accuracy is very important
     • Errors are propagated into the progressive steps
     • “Once a gap, always a gap” (Feng & Doolittle, 1987)

  8. Multiple alignment profiles (Gribskov et al. 1987)
     Profile column i:
       A    C    D    …    W    Y
       0.3  0.1  0    …    0.3  0.3
     Gap penalties: 1.0, 0.5 (position-dependent gap penalties)
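A profile of the kind shown on this slide can be sketched as a list of per-column residue frequencies. The code below is a simplified illustration that uses plain frequencies only; it does not reproduce the weighting of Gribskov et al., and the gap fraction stored under `"-"` is merely one possible input for position-dependent gap penalties. The toy alignment is invented.

```python
from collections import Counter

def build_profile(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return one dict of residue frequencies per alignment column."""
    n_seqs = len(alignment)
    profile = []
    for column in zip(*alignment):          # iterate over alignment columns
        counts = Counter(column)
        freqs = {aa: counts[aa] / n_seqs for aa in alphabet}
        freqs["-"] = counts["-"] / n_seqs   # fraction of gaps in this column
        profile.append(freqs)
    return profile

alignment = ["AC", "A-", "AD", "WC"]        # four aligned toy sequences
profile = build_profile(alignment)
print(profile[0]["A"], profile[1]["C"], profile[1]["-"])
```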

  9. Profile-sequence alignment
     [Figure: a sequence aligned against a profile with rows A, C, D, …, V, W, Y]

  10. Profile-profile alignment
      [Figure: a profile (rows A, C, D, …, Y) aligned against a second profile (rows A, C, D, …, V, W, Y)]
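Profile-sequence and profile-profile alignment can both be sketched with one global dynamic programming routine in which a column pair is scored by the frequency-weighted average of residue scores. Everything specific here is an assumption made for illustration: the column score formula, the `identity` scoring function, the linear gap penalty and the helper names. A single sequence is treated as a profile whose columns each contain one residue with frequency 1.

```python
def column_score(col_a, col_b, subst):
    """Expected residue score between two frequency columns."""
    return sum(fa * fb * subst(a, b)
               for a, fa in col_a.items() if fa
               for b, fb in col_b.items() if fb)

def identity(a, b):
    """Toy scoring function standing in for a real substitution matrix."""
    return 1.0 if a == b else -1.0

def sequence_as_profile(seq):
    """A sequence is a profile with one residue of frequency 1 per column."""
    return [{aa: 1.0} for aa in seq]

def align_profiles(prof_a, prof_b, subst=identity, gap=-2.0):
    """Global (Needleman-Wunsch) score over profile columns, linear gaps."""
    n, m = len(prof_a), len(prof_b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + column_score(prof_a[i - 1], prof_b[j - 1], subst),
                D[i - 1][j] + gap,      # gap in profile B
                D[i][j - 1] + gap,      # gap in profile A
            )
    return D[n][m]

profile = [{"A": 0.5, "C": 0.5}, {"D": 1.0}]
print(align_profiles(profile, sequence_as_profile("AD")))   # profile-sequence
print(align_profiles(profile, profile))                     # profile-profile
```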

  11. Clustal, ClustalW, ClustalX
      • CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct the guide tree.
      • Sequence blocks are represented by profiles, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
      • Further carefully crafted heuristics include:
        (i) local gap penalties
        (ii) automatic selection of the amino acid substitution matrix
        (iii) automatic gap penalty adjustment
        (iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered
      • CLUSTAL (W/X) does not allow iteration (cf. Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002)
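Weighting sequences by branch lengths can be sketched as follows. The slide only states that weights depend on the branch lengths of the NJ tree; the specific rule used here (a sequence's weight is the sum of the branch lengths on its path from the root, with each branch divided by the number of sequences below it) is an assumption for illustration, as are the tree encoding and the branch lengths in the example.

```python
def count_leaves(node):
    """Number of sequences below a node; a node is (label_or_children, length)."""
    label, _ = node
    if isinstance(label, str):
        return 1
    return sum(count_leaves(child) for child in label)

def sequence_weights(node, inherited=0.0, weights=None):
    """Accumulate branch lengths root-to-leaf, sharing each branch equally."""
    if weights is None:
        weights = {}
    label, length = node
    share = inherited + length / count_leaves(node)
    if isinstance(label, str):
        weights[label] = share          # leaf: record the accumulated weight
    else:
        for child in label:
            sequence_weights(child, share, weights)
    return weights

# Invented tree: A and B are close relatives, C is more distant
tree = ([([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.4)], 0.0)
print(sequence_weights(tree))
```

Under this rule the two near-identical sequences A and B each get a lower weight than the isolated sequence C, so a cluster of close relatives does not dominate the profile.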

  12. Strategies for multiple sequence alignment
      • Profile pre-processing
      • Secondary structure-induced alignment
      • Globalised local alignment
      • Matrix extension
      Objective: try to avoid (early) errors

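  13. Pre-profile generation
      [Figure: pairwise alignment scores for sequences 1-5 (Score 1-2, Score 1-3, ..., Score 4-5) are compared with a cut-off; each sequence gets a pre-alignment with itself on top (1 with 2, 3, 4, 5; 2 with 1, 3, 4, 5; ...; 5 with 1, 2, 3, 4), from which a pre-profile (rows A, C, D, …, Y) is derived]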

  14. Strategies for multiple sequence alignment
      • Profile pre-processing
      • Secondary structure-induced alignment
      • Globalised local alignment
      • Matrix extension
      Objective: try to avoid (early) errors

  15. Protein structure hierarchical levels
      • PRIMARY STRUCTURE (amino acid sequence): VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
      • SECONDARY STRUCTURE (helices, strands)
      • TERTIARY STRUCTURE (fold)
      • QUATERNARY STRUCTURE (oligomers)

  16. Strategies for multiple sequence alignment
      • Profile pre-processing
      • Secondary structure-induced alignment
      • Globalised local alignment
      • Matrix extension
      Objective: try to avoid (early) errors

  17. Globalised local alignment (double dynamic programming)
      1. Local (SW, Smith-Waterman) alignment (M + Po,e)
      2. Global (NW, Needleman-Wunsch) alignment (no M or Po,e)
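One way to read the two steps on this slide is sketched below: a first dynamic programming pass fills a Smith-Waterman matrix using residue scores and gap penalties, and a second, global pass finds the best path through that matrix without consulting a residue scoring matrix or gap penalties again. This is a simplified interpretation and not necessarily the published method: the score values are invented and a single linear gap penalty replaces separate opening and extension penalties.

```python
def sw_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """Step 1: local (Smith-Waterman) DP matrix with assumed scoring values."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    return H

def global_over(H):
    """Step 2: global DP over the cell values of H, with no gap penalties."""
    n, m = len(H) - 1, len(H[0]) - 1
    G = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + H[i][j],  # pair residues i and j
                          G[i - 1][j],                 # skip a residue of a
                          G[i][j - 1])                 # skip a residue of b
    return G[n][m]

H = sw_matrix("HEAGAWGHEE", "PAWHEAE")
print(global_over(H))
```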

  18. Strategies for multiple sequence alignment
      • Profile pre-processing
      • Secondary structure-induced alignment
      • Globalised local alignment
      • Matrix extension
      Objective: try to avoid (early) errors

  19. Matrix extension: T-Coffee
      [Figure: all pairwise alignments among four sequences: 2-1, 3-1, 4-1, 3-2, 4-2, 4-3]
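The matrix extension step can be sketched as follows: the weight of a residue pair from the pairwise alignments is increased by the support it receives through every third residue that is aligned to both, each such triplet contributing the smaller of its two weights. The data layout (residues as `(sequence, position)` tuples, pairs as `frozenset` keys) and the weights in the example are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

def extend_library(primary):
    """primary maps frozenset({x, y}) -> weight; x, y are (sequence, position)."""
    neighbours = defaultdict(dict)
    for pair, weight in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = weight
        neighbours[y][x] = weight
    extended = dict(primary)
    # Each intermediate residue z supports every pair (x, y) it is aligned to
    for z, aligned in neighbours.items():
        for (x, w_xz), (y, w_zy) in combinations(aligned.items(), 2):
            if x[0] == y[0]:
                continue                      # x and y lie in the same sequence
            key = frozenset((x, y))
            extended[key] = extended.get(key, 0) + min(w_xz, w_zy)
    return extended

a, b, c = ("seq1", 0), ("seq2", 0), ("seq3", 0)
primary = {
    frozenset((a, b)): 88,    # invented pairwise weights
    frozenset((a, c)): 77,
    frozenset((c, b)): 100,
}
extended = extend_library(primary)
print(extended[frozenset((a, b))])
```

The extended weights then replace a generic substitution matrix when the sequences are aligned progressively, so every pairwise step already reflects information from the other sequences.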

  20. Summary
      • Weighting schemes simulating simultaneous multiple alignment
        - Profile pre-processing (global/local)
        - Matrix extension (well balanced scheme)
      • Smoothing alignment signals
        - Globalised local alignment
      • Using additional information
        - Secondary structure driven alignment
      • Schemes strike a balance between speed and sensitivity

  21. References
      • Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
      • Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
      • Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26(5), 459-477.

  22. Where to find this: http://www.cs.vu.nl/~ibivu/teaching
