
Pairwise Alignment

Pairwise Alignment. Alexei Drummond. Week 1 Learning Outcomes. Have an appreciation of what Computational Biology is Know what DNA, RNA and Protein sequences are :-)


Presentation Transcript


  1. Pairwise Alignment Alexei Drummond

  2. Week 1 Learning Outcomes • Have an appreciation of what Computational Biology is • Know what DNA, RNA and Protein sequences are :-) • Understand that sequence evolution can be modeled with a stochastic model of evolution, so that the probability of evolving from one character to another in a certain time can be calculated • Know what the Jukes-Cantor and General time-reversible models of molecular evolution imply in terms of rates and base frequencies. CS369 2007

  3. Week 2 Learning Outcomes • Understand the basic principles of dynamic programming • Be familiar with the application of dynamic programming to a variety of simple examples such as • Knapsack problem • RNA secondary structure problem

  4. Dynamic Programming • method for solving combinatorial optimization problems • guaranteed to give an optimal solution • generalization of “divide-and-conquer” • relies on the “Principle of Optimality”, i.e. a sub-optimal solution of a sub-problem cannot be part of an optimal solution of the original problem instance.

  5-6. Principle of Optimality [Figure: Auckland, Te Kuiti, Wellington]

  7. Key to efficiency • computation is carried out bottom-up • store solutions to sub-problems in a table • all possible sub-problems solved once each, beginning with smallest sub-problems • work up to original problem instance • only optimal solutions to sub-problems are used to compute solution to problem at next level • DO NOT carry out computation in recursive, top-down manner • same sub-problems would be solved many times
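
The bottom-up table filling described above can be sketched on the 0/1 knapsack problem named in the Week 2 outcomes. This is a minimal illustration, not code from the slides: the function name `knapsack` and the item data in the usage line are assumptions.

```python
def knapsack(weights, values, capacity):
    """Best total value for a 0/1 knapsack, computed bottom-up."""
    n = len(weights)
    # best[i][c] = best value using only the first i items with capacity c.
    # Row 0 (no items) is the smallest sub-problem and is all zeros.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            # Option 1: leave item i out.
            best[i][c] = best[i - 1][c]
            # Option 2: put item i in, if it fits.
            if weights[i - 1] <= c:
                take = best[i - 1][c - weights[i - 1]] + values[i - 1]
                best[i][c] = max(best[i][c], take)
    # The original problem instance is the last cell of the table.
    return best[n][capacity]


# Hypothetical items: weights 1, 3, 4, 5 with values 1, 4, 5, 7.
print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Each cell is computed once from cells in the previous row, so no sub-problem is solved twice.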

  8. Pairwise alignment • Sequences: x = a c g g t s; y = a w g c c t t • Alignment: x′ = a - c g g - t s; y′ = a w - g c c t t

  9. Scoring • Numeric score associated with each column • Total score = sum of column scores • Column types: (1) Identical (+ve) (2) Conservative (+ve) (3) Non-conservative (-ve) (4) Gap (-ve) • Example: x′ = a - c g g - t s; y′ = a w - g c c t t
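
As a rough sketch of "total score = sum of column scores", the alignment x′, y′ from the slide can be scored column by column. The numeric values (match +2, mismatch -1, gap -2) and the function name `score_alignment` are assumptions for illustration; the slides do not give numbers, and this simplified scheme does not distinguish conservative from non-conservative mismatches.

```python
def score_alignment(x_aligned, y_aligned, match=2, mismatch=-1, gap=-2):
    """Sum per-column scores of an alignment written with '-' for gaps."""
    if len(x_aligned) != len(y_aligned):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(x_aligned, y_aligned):
        if a == "-" or b == "-":
            total += gap        # gap column
        elif a == b:
            total += match      # identical column
        else:
            total += mismatch   # any non-identical column (simplified)
    return total


# The alignment from the slide: x' = a-cgg-ts, y' = aw-gcctt
print(score_alignment("a-cgg-ts", "aw-gcctt"))  # prints -2
```

With a substitution matrix such as BLOSUM, the `match`/`mismatch` branch would be replaced by a table lookup.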

  10. Scoring • Model-based • Log-odds scoring • Empirical • Often used for amino acid alignments • PAM matrices • BLOSUM matrices • JTT • WAG • Different matrices used depending on the level of similarity of the sequences. • How do you know the similarity before doing the alignment?

  11. Log-odds matrices “What we want to know is whether two sequences are homologous (evolutionarily related) or not, so we want an alignment score that reflects that. Theory says that if you want to compare two hypotheses, a good score is the log-odds score: the logarithm of the ratio of the likelihoods of your two hypotheses. If we assume that each aligned residue pair is statistically independent of the others (biologically dubious, but mathematically convenient), the alignment score is the sum of the individual log-odds score for each aligned residue pair.” Sean R Eddy 2004

  12. Log-odds matrices “The numerator (p_ab) is the likelihood of the hypothesis we want to test: that these two residues are correlated because they’re homologous. Thus, p_ab are the target frequencies: the probability that we expect to observe residues a and b aligned in homologous sequence alignments. The denominator is the likelihood of a null hypothesis: that these two residues are uncorrelated and unrelated, occurring independently.” Sean R Eddy, 2004
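
The two quotations describe a ratio whose formula did not survive in the transcript. A standard way to write it, with p_ab taken from the quote and with q_a, q_b introduced here as assumed notation for the independent background frequencies of residues a and b, is:

```latex
s(a,b) = \log \frac{p_{ab}}{q_a \, q_b},
\qquad
S = \sum_{k} s(a_k, b_k)
```

Here S is the alignment score and the sum runs over aligned residue pairs (a_k, b_k), following the independence assumption in the first quotation.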

  13. Evolutionary interpretation of match/mismatch scores • d = average number of changes per site (d = 0.1 is roughly 90% similarity) [Figure: two panels, “a, b homologous” (x and y, branches labelled t/2) and “a, b not homologous” (x and y)]

  14. Jukes-Cantor Model • All mutations are equally likely • x → y at the same rate for all x, y • All nucleotides are equally likely (equal base frequencies): • {0.25, 0.25, 0.25, 0.25} for DNA • {0.05, …, 0.05} for proteins

  15. Evolutionary interpretation of match/mismatch scores (DNA) • d = average number of changes per site (d = 0.1 is roughly 90% similarity) [Figure: x and y]

  16. Log-odds match score • Numerator: probability of ending in the same state after time d • Denominator: probability of ending in the same state after infinite time

  17. Log-odds mismatch score • Numerator: probability of ending in y (different from x) after time d • Denominator: probability of ending in y (different from x) after infinite time
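
The formulas on slides 16 and 17 are missing from the transcript. Under the Jukes-Cantor model they can be sketched as below, using the standard Jukes-Cantor transition probabilities. Assumptions made here, not taken from the slides: d is the expected number of changes per site separating the two sequences, the logarithm is natural, and the function name `jc_log_odds` is illustrative.

```python
import math


def jc_log_odds(d, n_states=4):
    """Return (match, mismatch) log-odds scores under Jukes-Cantor.

    d        -- expected number of changes per site (must be > 0)
    n_states -- 4 for DNA, 20 for proteins
    """
    if d <= 0:
        raise ValueError("d must be positive")
    k = n_states
    decay = math.exp(-k * d / (k - 1))
    # Probability of ending in the same state after distance d.
    p_same = 1 / k + (k - 1) / k * decay
    # Probability of ending in one particular different state.
    p_diff = 1 / k - decay / k
    # After infinite time every state has probability 1/k.
    background = 1 / k
    return math.log(p_same / background), math.log(p_diff / background)


match, mismatch = jc_log_odds(0.1)
print(f"match = {match:.3f}, mismatch = {mismatch:.3f}")
```

For DNA at d = 0.1 the probability of staying in the same state is about 0.906, which agrees with the slides' remark that d = 0.1 is roughly 90% similarity. The match score is positive and the mismatch score negative.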

  18. Evolutionary interpretation of match/mismatch scores (DNA)

  19. Evolutionary interpretation of match/mismatch scores (DNA)

  20. BLOSUM50 matrix

  21. Gap penalties • Linear score: γ(g) = -gd, where d is the gap penalty • Affine score: γ(g) = -d - (g-1)e, where d is the gap-open penalty and e is the gap-extension penalty [Figure: a gap of length g in x′ aligned against y′]
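
The two gap-score formulas can be written directly as functions. The function names and the numeric penalties in the usage lines are illustrative assumptions, not values from the slides.

```python
def linear_gap(g, d):
    """Linear gap score: gamma(g) = -g * d for a gap of length g."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: gamma(g) = -d - (g - 1) * e.

    d is the gap-open penalty, e the gap-extension penalty.
    """
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# A gap of length 3 under hypothetical penalties.
print(linear_gap(3, d=8))        # prints -24
print(affine_gap(3, d=12, e=2))  # prints -16
```

With e smaller than d, the affine score penalizes one long gap less than several short gaps of the same total length.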

  22. Needleman & Wunsch algorithm • Dynamic programming algorithm for global alignment • Needleman & Wunsch (‘70), modified by Gotoh (‘82) • Assumptions: • Linear gap score d • Symmetric scoring matrix S • s(a,b) = s(b,a) score from lining up a and b • s(a,-) = s(-,a) = -d score from lining up a with -

  23. Principle of Optimality • Given sequences: x = x_1 … x_m and y = y_1 … y_n • Define: F(i,j) = score of best alignment between x_1 … x_i and y_1 … y_j

  24-28. Principle of Optimality • Optimal alignment looks like … or … or …, so … [Slides 24-28 build up one case at a time; the equations were lost in extraction]

  29. Principle of Optimality Basis:
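
The equations for these slides are missing from the transcript. A sketch of the standard Needleman & Wunsch recurrence under the stated assumptions (linear gap score d, symmetric s(a,b)) is given below; it is a reconstruction, not the author's own notation. The basis is F(0,0) = 0, F(i,0) = -id, F(0,j) = -jd, and each other cell is the best of three cases: x_i aligned with y_j, x_i aligned with a gap, or y_j aligned with a gap. The names `nw_fill` and `simple_score` and the numeric scores are hypothetical.

```python
def nw_fill(x, y, score, d):
    """Fill the F matrix for global alignment with linear gap penalty d.

    F[i][j] is the score of the best alignment of x[:i] with y[:j].
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: a prefix aligned against nothing is all gaps.
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    # Recurrence: best of the three ways the last column can look.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i with y_j
                F[i - 1][j] - d,                              # x_i with gap
                F[i][j - 1] - d,                              # gap with y_j
            )
    return F


def simple_score(a, b):
    """Hypothetical symmetric scoring: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


F = nw_fill("ac", "a", simple_score, d=2)
print(F[-1][-1])  # prints 0: 'a' matches 'a' (+2), 'c' faces a gap (-2)
```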

  30-43. Filling up table [Figure, animated across slides 30-43: F matrix with rows 0, 1, 2, …, m indexed by X and columns 0, 1, 2, …, n indexed by Y; slide 43 marks the optimal alignment score]

  44. Constructing alignment [Figure: F matrix with rows 0, 1, 2, …, m indexed by X and columns 0, 1, 2, …, n indexed by Y, with the optimal alignment score marked]
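
The figure for this slide is lost, so the following is a hedged sketch of the usual way to construct an alignment from the F matrix: start at F(m,n) and repeatedly step back to whichever neighbouring cell produced the current value. The function name `needleman_wunsch` and the default scores (match +2, mismatch -1, d = 2) are assumptions. Ties are broken in favour of the diagonal step, so only one of possibly several optimal alignments is returned.

```python
def needleman_wunsch(x, y, match=2, mismatch=-1, d=2):
    """Global alignment with linear gap penalty d.

    Returns (optimal score, aligned x, aligned y).
    """
    def s(a, b):
        return match if a == b else mismatch

    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)

    # Traceback from the bottom-right cell to the top-left cell.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])   # x_i with y_j
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")        # x_i with gap
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])        # gap with y_j
            j -= 1
    # The columns were collected last-to-first, so reverse them.
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("acggts", "awgcctt")
print(score)
print(ax)
print(ay)
```

The scores are integers, so the equality checks in the traceback are exact; with floating-point scores it would be safer to store the chosen direction for each cell while filling the table.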

  45-50. Example [Figure, animated across slides 45-50: F matrix with rows 0, 1, 2, …, m indexed by X and columns 0, 1, 2, …, n indexed by Y, with the optimal alignment score marked; from slide 46 on, the alignment of X and Y is shown alongside]
