
Class 5: Multiple Sequence Alignment


Presentation Transcript


  1. Class 5: Multiple Sequence Alignment

  2. Multiple sequence alignment

         VTISCTGSSSNIGAG-NHVKWYQQLPG
         VTISCTGTSSNIGS--ITVNWYQQLPG
         LRLSCSSSGFIFSS--YAMYWVRQAPG
         LSLTCTVSGTSFDD--YYSTWVRQPPG
         PEVTCVVVDVSHEDPQVKFNWYVDG--
         ATLVCLISDFYPGA--VTVAWKADS--
         AALGCLVKDYFPEP--VTVSWNSG---
         VSLTCLVKGFYPSD--IAVEWESNG--

     • Homologous residues are aligned together in columns
     • Homologous: in both the structural and the evolutionary sense
     • Ideally, the residues in an aligned column occupy similar 3D structural positions

  3. Multiple alignment – why?
     • Identify sequences that belong to a family
     • Family: a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history
     • Find features that are conserved across the whole family
     • Highly conserved regions, core structural elements

  4. The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]

  5. Scoring a multiple alignment (1) Important features of multiple alignment:
     • Some positions are more conserved than others → position-specific scoring
     • Sequences are not independent (they are related by a phylogenetic tree)
     Ideally, specify a complete model of molecular sequence evolution

  6. Scoring a multiple alignment (2) Unfortunately, not enough data … Assumption (1) Columns of alignment are statistically independent.

  7. Minimum entropy Assumption (2) Symbols within columns are independent Entropy measure
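The entropy measure on this slide did not survive extraction. As a hedged illustration only, a common form of the minimum-entropy column score is the count-weighted negative log of the observed residue frequencies, summed over columns (which is valid under Assumption 1). The function names, the use of base-2 logarithms, and the choice to skip gap symbols are assumptions of this sketch, not taken from the slides.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap symbol, matching the example alignment


def column_entropy(column, ignore_gaps=True):
    """Entropy-style score of one column: -sum_a c_a * log2(c_a / total).

    c_a is the count of residue a in the column. A fully conserved column
    scores 0; more variable columns score higher (worse).
    """
    symbols = [s for s in column if not (ignore_gaps and s == GAP)]
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())


def alignment_entropy(alignment):
    """Sum of column scores, relying on the column-independence assumption."""
    return sum(column_entropy(col) for col in zip(*alignment))
```

Under this measure a good alignment is one that minimizes the total, since conserved columns contribute nothing.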

  8. Sum of pairs (SP) Columns are scored by a “sum of pairs” function, using a substitution scoring matrix Note:
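A minimal sketch of sum-of-pairs column scoring follows. The slide's own substitution matrix and its note were lost, so the toy scores here (+1 match, -1 mismatch, -2 residue against gap, 0 gap against gap) are placeholders chosen for illustration; in practice `pair_score` would look up a real substitution matrix.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def pair_score(a, b):
    """Toy substitution score (hypothetical values, not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def sp_column_score(column, score=pair_score):
    """Sum the pairwise score over every unordered pair of rows in a column."""
    return sum(score(a, b) for a, b in combinations(column, 2))


def sp_score(alignment, score=pair_score):
    """SP score of a whole alignment: the sum of its column scores."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(sp_column_score(col, score) for col in zip(*alignment))
```

For example, in the alignment `["AC", "AC", "A-"]` the first column contributes three matches (+3) and the second contributes one match and two residue-gap pairs (1 - 2 - 2 = -3), so the total is 0.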

  9. Multidimensional DP

  10. Multidimensional DP

  11. Multidimensional DP Complexity Space: Time:
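The recurrences on the multidimensional DP slides were not preserved. The sketch below shows one way the idea can be written for N sequences, assuming a sum-of-pairs column score with a linear gap cost (the toy `pair_score` values are hypothetical). Each cell of the N-dimensional table is indexed by a tuple of prefix lengths, and each cell tries every non-empty subset of sequences that could contribute a residue to the last column.

```python
from itertools import combinations, product

GAP = "-"  # assumed gap symbol


def pair_score(a, b):
    """Toy substitution score (hypothetical values, not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def column_score(column):
    """Sum-of-pairs score of a single column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def msa_score(seqs):
    """Best SP score over all multiple alignments of seqs (exhaustive DP)."""
    n = len(seqs)
    lengths = [len(s) for s in seqs]
    # A move says which sequences emit a residue (1) or a gap (0) in a column.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    best = {tuple([0] * n): 0}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is visited.
    for cell in product(*(range(length + 1) for length in lengths)):
        if not any(cell):
            continue
        candidates = []
        for move in moves:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue
            col = [seqs[k][cell[k] - 1] if move[k] else GAP for k in range(n)]
            candidates.append(best[prev] + column_score(col))
        best[cell] = max(candidates)
    return best[tuple(lengths)]
```

In this sketch the table holds one entry per combination of prefix lengths and each entry examines 2^N - 1 moves, so both memory and running time grow exponentially with the number of sequences; it is only practical for a handful of short sequences.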

  12. Pairwise projections of MA

  13. MSA (i) [Carrillo and Lipman, 1988]

  14. MSA (ii)

  15. MSA (iii) Algorithm sketch

  16. Progressive alignment methods (i) Basic idea: construct a succession of pairwise (PW) alignments. Variations:
     • The order of the PW alignments
     • One growing alignment, or subfamilies that are merged later
     • The alignment and scoring procedure

  17. Progressive alignment methods (ii) Most important heuristic: align the most similar pairs first. Many algorithms build a “guide tree”:
     • Leaves – sequences
     • Interior nodes – alignments
     • Root – complete multiple alignment

  18. Feng-Doolittle (1987) • Calculate all pairwise distances using alignment scores: • Construct a guide tree using hierarchical clustering • Highest scoring pairwise alignment determines sequence to group alignment
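The distance formula on this slide was lost in extraction. As usually presented, Feng-Doolittle converts a pairwise alignment score into a distance by normalizing it between a random-expectation score and a maximum score and taking the negative log. The sketch below assumes that form; the average-linkage clustering used to build the guide tree is a stand-in chosen for brevity, not necessarily the clustering used in the original method, and all function names are hypothetical.

```python
import math
from itertools import combinations


def feng_doolittle_distance(s_obs, s_max, s_rand):
    """D = -log((s_obs - s_rand) / (s_max - s_rand)).

    s_obs:  observed pairwise alignment score
    s_max:  maximum score (e.g. average of the two self-alignment scores)
    s_rand: expected score for shuffled versions of the two sequences
    """
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    return -math.log(s_eff)


def guide_tree(names, dist):
    """Greedy average-linkage clustering; returns a nested tuple of names.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, member names)

    def linkage(x, y):
        pairs = [(a, b) for a in x[1] for b in y[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters, i.e. the most similar groups first.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

The nested tuple gives the merge order: the innermost pair is aligned first, and each enclosing level adds a sequence or group to the growing alignment.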

  19. Profile alignment
     • Use profiles for group-to-sequence and group-to-group alignments
     • CLUSTALW (Thompson et al., 1994): similar to Feng-Doolittle, but uses profile alignment methods and numerous heuristics
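To make "profile" concrete, the sketch below builds a per-column frequency table from a group of aligned sequences and scores one profile column against another as the expected pairwise score. This is one simple choice made for illustration and is not claimed to be CLUSTALW's actual scoring; the function names and the toy `pair_score` values are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def pair_score(a, b):
    """Toy substitution score (hypothetical values, not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def build_profile(alignment):
    """Return one dict per column mapping each symbol to its relative frequency."""
    n_rows = len(alignment)
    return [
        {symbol: count / n_rows for symbol, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


def profile_column_score(col_x, col_y, score=pair_score):
    """Expected pairwise score between two profile columns."""
    return sum(
        px * py * score(a, b)
        for a, px in col_x.items()
        for b, py in col_y.items()
    )
```

A single sequence is just a profile whose columns each put frequency 1.0 on one symbol, so the same column score covers both group-to-sequence and group-to-group alignment inside an ordinary pairwise DP.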

  20. Iterative Refinement
     • Addresses the “frozen” sub-alignment problem
     • Iteratively realign sequences or groups to a profile of the rest
     • Barton and Sternberg (1987):
       • Align the two most similar sequences
       • Align the current profile to the most similar remaining sequence
       • Remove each sequence in turn and realign it to the profile of the rest
