1 / 21

Computational Biology

Computational Biology. Prepared By: Syed Khaleelulla Hussaini. Outline. Proteins DNA RNA Genetics and evolution The Sequence Matching Problem RNA Sequence Matching Complexity of the Algorithms. DEFINITION.

eavan
Download Presentation

Computational Biology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computational Biology Prepared By: SyedKhaleelullaHussaini

  2. Outline • Proteins • DNA • RNA • Genetics and evolution • The Sequence Matching Problem • RNA Sequence Matching • Complexity of the Algorithms

  3. DEFINITION • Computational Biology encompasses all computational methods and theories applicable to molecular biology and areas of computer based techniques for solving biological problems.

  4. Protiens • They are building blocks of living organism • It is a large molecule that is composed of sequences of amino acids • There are 20 amino acids which are divided into classes hydrophobic(h-phob) hydrophillic(h-phil) polar(pos,neg)

  5. Amino acid codes Name, 3-letter & single-letter codes • Aspartic Acid Asp D Glutamic Acid Glu E • PhenylaninePhe F GlycineGly G • Alanine Ala A CystineCys C • Histidine His H Isoleucine Ile I • Lysine Lys K LeucineLeu L • Methionine Met M AsparagineAsn N • Proline Pro P Glutamine Gln Q • ArginineArg R Serine Ser S • ThreonineThr T Valine Val V • Tryptophan Trp W Tyrosine Tyr Y

  6. DNA(Deoxyribonucleic acid) • Blueprint of living organisms • DNA is composed of two strands hold by a weak hydrogen bond • Each strand is a sequence of nucleotides • DNA has four bases which are classified as two chemical types BASE SYMBOL TYPE Adenine A Purine Thymine T Purine Sytosine C Pyrimidine Guanine G Pyrimidine

  7. DNA Double Helix

  8. RNA • RNA is chemically very similar to DNA • There are two important differences • Four bases present in RNA are: Adenine(A) Guanine(G) Cystosine(C) Uracil(U) • RNA nucleotides contain a different sugar molecule(ribose)

  9. Genetics and Evolution • Mutation • The changing of the structure of a gene, resulting in a variant form that may be transmitted to subsequent generations, caused by the alteration of single base units in DNA. • Natural selection • The process whereby organisms better adapted to their environment tend to survive and produce more offspring. • Genetic Drift • Variation in the relative frequency of different genotypes in a small population.

  10. Sequence matching problem • Proteins are longer and DNA strands are even longer • We match them by breaking them in to shorter subsequences • Breaking and matching is done by notion of alignment.

  11. Sequence matching example • Consider two amino acid sequences: ACCTGAGAG ACGTGGCAG sequence alignment A C C T G A G – A C A C G T G – G C A C

  12. Finite state machines in blast • It is used to find out which of the sequences in a database are related to the new given sequence using BLAST • The BLAST system is a three step process: 1. Examine the query string and select set of substrings of length w(between 4 and 20) which are good for producing matches 2. Build a DFSM that uses set of substrings and find the sequences with the highest local matches in the database 3. Examine the matches found in step2 and try to build a longer matching sequences

  13. Regular expressions specify protein motif • Aligning collection of related proteins we can define a motif Example: E S G H D T Y Y NKN R M D T TTTT S W QS R G S D T TT P D MT A G P T TW R NT Once an motif is defined we can search for the occurrences of it in other protein sequence by using regular expressions

  14. HMM for sequence matching • HMM’s are used when sequences become fairly diverse • We can capture the variations among the members of the family and the probabilities associated with them • So by using HMM’s we can find the best alignment between two sequences and from which family does a given new sequence belongs to

  15. HMM profile is given by M = (K,O,π,A,B) • K is a set of n states, one for each position in the sequence • O is the output alphabet • Π contains the initial state probabilities • A contains the transition probabilities • B contains the output probabilities

  16. Example of HMM describing protein sequence family

  17. RNA sequence matching and secondary structure prediction using the tools of context-free languages • In RNA a change to a single nucleotide in a stem region could completely alter the molecules shape and its function • So an change in the stem must be matched by a corresponding change in the paired nucleotide • Context free languages are used describe these nested dependencies and secondary structure

  18. Example:

  19. Complexity of algorithms used in computational biology • Approaches to many of the problems described here are computational like breaking up of large protein and DNA molecules into substrings • NP-hard • Conversion to decision problem SHOERTEST-SUPERSTRING(<S,K> ): S is a set of strings and there exists some superstring T such that every element of S is a substring of T and T has length less than or equal to K) – NP-complete

  20. Reference • http://en.wikipedia.org/wiki/Computational_biology • http://www.google.com • http://www.cs.utexas.edu/~ear/cs341/automatabook/

  21. Thank you…

More Related