
Sequence Matching and alignment algorithms in the field of Bioinformatics

Presented by Jennifer Johnstone.


Presentation Transcript


  1. Sequence Matching and alignment algorithms in the field of Bioinformatics Presented by Jennifer Johnstone

  2. Introduction • What is Bioinformatics? • Sequence Matching Problem • The Alignment Problem • Future Research

  3. What is Bioinformatics? Bioinformatics is the application of computers in Biology using algorithms, statistics and other mathematical techniques to decipher the language of DNA.

  4. The Sequence Matching Problem Given a string s of length n and a pattern p of length m, find the set I of all start indices i (counting from 1) at which p exactly matches s, that is, s[i..i+m-1] = p. Example: Let p = ABA and s = AABAAGTABA. Then I = {2, 8}, since s[2..4] = ABA and s[8..10] = ABA.
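The problem statement above can be sketched directly as the naive matcher listed on the next slide. This is a minimal illustration rather than code from the presentation; the function name `naive_match` is made up here, and it reports 1-based start indices to agree with the slide's example.

```python
def naive_match(s, p):
    """Return all 1-based start indices at which pattern p occurs in string s.

    Naive approach: try every possible start position and compare up to
    m characters, giving O(m*n) time in the worst case.
    """
    n, m = len(s), len(p)
    matches = []
    for i in range(n - m + 1):          # 0-based candidate start
        if s[i:i + m] == p:             # compare the m-character window
            matches.append(i + 1)       # convert to the slide's 1-based index
    return matches


if __name__ == "__main__":
    print(naive_match("AABAAGTABA", "ABA"))
```

Overlapping occurrences are reported too, because every start position is tested independently.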

  5. Algorithms (n is the length of s, m the length of p, |Σ| the alphabet size, and k the number of matches) • Naive String Matching Algorithm, O(m*n). • String Matching with Finite Automata, O(m*|Σ| + n). • Boyer-Moore Algorithm, O(m+n) (in practice). • String Matching with Compact Suffix Trees, O(n log(n) + m*|Σ| + k). • String Matching using Suffix Arrays, O(n + m log(n) + k).
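As a rough illustration of the last entry in the list, the sketch below searches a suffix array with two binary searches. It is an assumption-laden simplification: the array is built by plainly sorting all suffixes, which is far slower than the linear-time construction implied by the O(n) term, and the names `build_suffix_array` and `suffix_array_match` are invented for this example.

```python
def build_suffix_array(s):
    """Start positions (0-based) of all suffixes of s in lexicographic order.

    Simplification: sorting whole suffixes is easy to read but is not the
    linear-time construction behind the O(n) bound quoted above.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def suffix_array_match(s, p):
    """Return sorted 1-based start indices of p in s using a suffix array."""
    sa = build_suffix_array(s)
    m = len(p)

    # Lower bound: first suffix whose m-character prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with p.
    return sorted(i + 1 for i in sa[first:lo])


if __name__ == "__main__":
    print(suffix_array_match("AABAAGTABA", "ABA"))
```

The binary searches work because suffixes that share the prefix p sit next to each other in sorted order; each comparison costs up to m characters, which is where the m log(n) term comes from.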

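  6. String Matching with Finite Automata Given a pattern p = aba and a string s = acbababa, we must first define the transition function δ(q, x), which gives the next state when character x is read in state q. Running the automaton over s, the match condition (reaching the accepting state q = 3) is met after reading positions i = 6 and i = 8. The starting indices are then j = i - 3 + 1, so I = {4, 6}.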
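The run described above can be reproduced with a short sketch. It assumes the usual definition of δ(q, x) as the length of the longest prefix of p that is a suffix of p[1..q] followed by x; the slide's own transition table is not reproduced in the transcript, so this is a reconstruction under that assumption. The helper names are hypothetical, and the table is built in a simple, slow way rather than in O(m*|Σ|) time.

```python
def build_transitions(p, alphabet):
    """Build delta[(q, x)] for states q = 0..m over the given alphabet.

    delta[(q, x)] is the length of the longest prefix of p that is a
    suffix of p[:q] + x. Straightforward construction, not the fastest.
    """
    m = len(p)
    delta = {}
    for q in range(m + 1):
        for x in alphabet:
            k = min(m, q + 1)
            while k > 0 and not (p[:q] + x).endswith(p[:k]):
                k -= 1
            delta[(q, x)] = k
    return delta


def automaton_match(s, p):
    """Return 1-based start indices of a non-empty pattern p in s."""
    m = len(p)
    alphabet = set(s) | set(p)      # assumption: alphabet = characters seen
    delta = build_transitions(p, alphabet)
    q = 0                           # start state
    starts = []
    for i, x in enumerate(s, start=1):
        q = delta[(q, x)]
        if q == m:                  # accepting state: a match ends at i
            starts.append(i - m + 1)
    return starts


if __name__ == "__main__":
    print(automaton_match("acbababa", "aba"))
```

Once the table exists, each character of s is handled with a single lookup, which accounts for the n term in the running time on the previous slide.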

  7. The Alignment Problem Given two strings, we want to generate an optimal alignment. The alignment of two strings may involve the insertion of gaps and/or the acceptance of mismatched characters. Example: Consider the following possible alignment of the two strings GACGGATTATG and GATCGGAATAG: GACGGATTATG GATCGGAATAG

  8. Dynamic vs. Heuristic Dynamic Programming Approach • Computes an optimal alignment using a dynamic programming matrix and a scoring function, in O(m*n) time. Heuristic Approach Used in practice to speed up search times on large databases. Consider the human genome, which is over 3 billion characters long and of which you may need to align only a small portion. • FASTP and FASTA Programs • BLAST Algorithm
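The dynamic programming approach can be sketched as a global alignment in the style of Needleman-Wunsch. The slides do not name a specific algorithm or scoring function, so both are assumptions here: the sketch uses +1 for a match, -1 for a mismatch and -1 per gap character, and the function name `global_alignment` is invented for illustration.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment.

    Fills an (m+1) x (n+1) matrix, so time and memory are O(m*n).
    The scoring values are illustrative assumptions, not from the slides.
    """
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap               # a[:i] aligned against gaps only
    for j in range(1, n + 1):
        score[0][j] = j * gap               # b[:j] aligned against gaps only

    def sub(i, j):
        """Score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = global_alignment("GACGGATTATG", "GATCGGAATAG")
    print(s)
    print(x)
    print(y)
```

Several alignments can share the optimal score; the traceback returns just one of them, and a different scoring function would generally change which alignment is optimal.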

  9. Future Research • The heuristic approaches are constantly being researched and improved upon, as the algorithms themselves are only 10-15 years old. • Development of tools that can perform a 10-way comparison of genomes. Bioinformatics as a whole is an active field of research that strongly needs qualified professionals who have an aptitude for computing and/or biology.

  10. References • Bockenhauer, Hans-Joachim and Bongartz, Dirk (2007) Algorithmic Aspects of Bioinformatics. Berlin: Springer, pp. 37-114. • Haubold, Bernhard and Wiehe, Thomas (2006) Introduction to Computational Biology: An Evolutionary Approach. Basel: Birkhauser, pp. 65-85. • Jones, Neil C. and Pevzner, Pavel A. (2004) An Introduction to Bioinformatics Algorithms. Cambridge: The MIT Press, pp. 148-226 and 311-337. • Parida, Laxmi (2008) Pattern Discovery in Bioinformatics: Theory & Algorithms. Boca Raton: Chapman & Hall/CRC, pp. 139-182 and 183-212. • Polanski, Andrzej and Kimmel, Marek (2007) Bioinformatics. Berlin: Springer, pp. 155-183 and 349-354.
