
BCB 444/544



Presentation Transcript


  1. BCB 444/544 Lecture 19. A bit of: Protein Structure - Basics; Protein Structure Visualization, Classification & Comparison. #19_Oct05. BCB 444/544 F07 ISU Dobbs #19 - Protein Structure Basics & Classification

  2. Required Reading (before lecture) √ Mon Oct 1 - Lecture 17: Protein Motifs & Domain Prediction • Chp 7 - pp 85-96 √ Wed Oct 3 - Lecture 18: Protein Structure: Basics (note change in Lecture Schedule online) • Chp 12 - pp 173-186 √ Thurs Oct 4 & Fri Oct 5 - Lab 6 & Lecture 19: Protein Structure: Basics, Databases, Visualization, Classification & Comparison • Chp 13 - pp 187-199

  3. BCB 544 - Extra Required Reading (assigned Mon Sept 24). BCB 544 Extra Required Reading Assignment for 544 Extra HW#1 Task 2: • Pollard KS, …., Haussler D. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443: 167-172. • http://www.nature.com/nature/journal/v443/n7108/abs/nature05113.html doi:10.1038/nature05113 • PDF available on class website, under Required Reading Link

  4. BCB 544 Projects (Optional for BCB 444) • For a better idea of what is involved in the Team Projects, please look over last year's expectations for projects: http://www.public.iastate.edu/~f2007.com_s.544/project.htm • Criteria for evaluation of projects (oral presentations) are summarized here: http://www.public.iastate.edu/%7Ef2007.com_s.544/homework/HW7.pdf Please note: a wrong URL (instead of the one shown above) was included in the originally posted 544ExtraHW#1; a corrected version is posted now

  5. Assignments & Announcements - #1 Students registered for BCB 444: Two Grading Options 1) Take the Final Exam per original Grading Policies 2) Instead of taking the Final Exam, you may participate in a Team Research Project. If you choose #2, please do 3 things: • Contact Drena (in person) • Send email to Michael Terribilini (terrible@iastate.edu) • Complete 544 Extra HW#1 - Task 1.1 by noon on Mon Oct 1

  6. Assignments & Announcements - #2
     BCB 444s (Standard):
       200 pts  Midterm Exams (100 points each)
       200 pts  Homework & Laboratory assignments
       100 pts  Final Exam
       500 pts  Total for BCB 444
     BCB 444p (Project):
       200 pts  Midterm Exams (100 points each)
       200 pts  Homework & Laboratory assignments
       190 pts  Team Research Project
       590 pts  Total for BCB 444p
     BCB 544:
       200 pts  Midterm Exams (100 points each)
       200 pts  Homework & Laboratory assignments
       100 pts  Final Exam
       200 pts  Discussion Questions & Team Research Projects
       700 pts  Total for BCB 544

  7. Assignments & Announcements #3 ALL: Homework #3 Due: Mon Oct 8 by 5 PM • HW544: HW544Extra #1 √ Due: Task 1.1 - Mon Oct 1 by noon; Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM (not Monday) • 444 "Project-instead-of-Final" students should also submit: • HW544Extra #1 • Due: Task 1.1 - Mon Oct 8 by noon • Due: Task 1.2 - Fri Oct 12 by 5 PM (not Monday); Task 2 NOT required!

  8. QUESTIONS re: HW#3? Due Mon

  9. HMM example from Eddy HMM paper: Toy HMM for Splice Site Prediction (This is a new slide)

  10. An HMM for Occasionally Dishonest Casino. Transition probabilities: • Prob(Fair → Loaded) = 0.01 • Prob(Loaded → Fair) = 0.2 But, where do you start? "Begin" state not shown

  11. Occasionally Dishonest Casino - HW#3. "Begin" state? 50:50 chance of starting with the F (fair) vs L (loaded) die
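
To make the model concrete, here is a minimal sketch that generates rolls from the casino HMM. The transition and begin probabilities are the ones stated on these slides; the emission values (fair die: 1/6 per face; loaded die: 1/2 for a six, 1/10 for any other face) are taken from the worked example later in this lecture. The function name `simulate_casino` and the `seed` parameter are illustrative choices, not part of the course material.

```python
import random

# Probability of staying in the same state (from the slides):
# Fair -> Loaded = 0.01, so Fair stays with 0.99; Loaded -> Fair = 0.2, so Loaded stays with 0.8.
STAY = {"F": 0.99, "L": 0.80}
FACES = [1, 2, 3, 4, 5, 6]
LOADED_WEIGHTS = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # loaded die favors a six


def simulate_casino(n_rolls, seed=None):
    """Return (hidden_states, rolls) for n_rolls emitted by the casino HMM."""
    rng = random.Random(seed)
    state = rng.choice(["F", "L"])  # "Begin" state: 50:50 start with F or L
    states, rolls = [], []
    for _ in range(n_rolls):
        states.append(state)
        if state == "F":
            rolls.append(rng.choice(FACES))  # fair die: uniform over faces
        else:
            rolls.append(rng.choices(FACES, weights=LOADED_WEIGHTS)[0])
        if rng.random() >= STAY[state]:      # switch dice with prob 1 - STAY
            state = "L" if state == "F" else "F"
    return states, rolls


states, rolls = simulate_casino(20, seed=1)
print("".join(states))
print("".join(str(r) for r in rolls))
```

Only the rolls are visible to an observer; the F/L string is the hidden path that decoding tries to recover.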

  12. Calculating Different Paths to an Observed Sequence (This slide has been changed). Figure labels: transition probability, emission probability. Calculations such as those shown on the slide are used to fill a matrix with probability values for every state at every position

  13. Calculate optimal path? Construct a matrix of probability values for every state at every residue. How: one way = Viterbi Algorithm • Initialization (i = 0) • Recursion (i = 1, . . . , L): for each state k • Termination: to find the most probable path π*, use trace-back, as in dynamic programming
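
The equations for the three steps did not survive in this transcript. As a hedged reconstruction, the standard textbook form of the Viterbi recurrences is given below. The symbols are assumptions, not taken from the slide: $v_k(i)$ is the probability of the best path ending in state $k$ at position $i$, $e_k(x_i)$ is the emission probability, $a_{jk}$ is the transition probability from state $j$ to state $k$, and $B$ is the begin state.

```latex
\begin{align*}
\text{Initialization } (i = 0):\quad & v_B(0) = 1, \qquad v_k(0) = 0 \ \text{ for } k \neq B \\
\text{Recursion } (i = 1, \dots, L):\quad & v_k(i) = e_k(x_i)\, \max_{j} \bigl[ v_j(i-1)\, a_{jk} \bigr] \\
\text{Termination}:\quad & P(x, \pi^{*}) = \max_{k}\, v_k(L)
\end{align*}
```

Storing the maximizing $j$ for each cell gives the pointers used in the trace-back.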

  14. Viterbi for Calculating Most Probable Path*
      Observed sequence x = 6, 2, 6; columns are positions ε (begin), 6, 2, 6
      B: 1 | 0 | 0 | 0
      F: 0 | (1/6)(1/2) = 1/12 | (1/6) max{(1/12)(0.99), (1/4)(0.2)} = 0.01375 | (1/6) max{(0.01375)(0.99), (0.02)(0.2)} = 0.00226875
      L: 0 | (1/2)(1/2) = 1/4 | (1/10) max{(1/12)(0.01), (1/4)(0.8)} = 0.02 | (1/2) max{(0.01375)(0.01), (0.02)(0.8)} = 0.008
      * Path within HMM that matches query sequence with highest probability
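
The matrix on this slide can be checked with a short Viterbi sketch. It assumes the observed rolls are 6, 2, 6 (inferred from the emission factors 1/2, 1/10, 1/2 in the Loaded row) and uses the transition, begin, and emission values from the slides. Function and variable names are illustrative. With these inputs the final Loaded cell evaluates to (1/2)(0.02 × 0.8) = 0.008.

```python
# Occasionally dishonest casino HMM, parameters as given on the slides.
STATES = ("F", "L")                      # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}             # 50:50 transitions out of "Begin"
TRANS = {"F": {"F": 0.99, "L": 0.01},
         "L": {"F": 0.20, "L": 0.80}}


def emission(state, roll):
    """Fair die: 1/6 for every face. Loaded die: 1/2 for a six, 1/10 otherwise."""
    if state == "F":
        return 1 / 6
    return 0.5 if roll == 6 else 0.1


def viterbi(rolls):
    """Return (matrix columns, most probable state path) for a list of rolls."""
    columns = [{s: START[s] * emission(s, rolls[0]) for s in STATES}]
    pointers = []
    for roll in rolls[1:]:
        prev = columns[-1]
        column, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            best_prev = max(STATES, key=lambda k: prev[k] * TRANS[k][s])
            pointer[s] = best_prev
            column[s] = emission(s, roll) * prev[best_prev] * TRANS[best_prev][s]
        columns.append(column)
        pointers.append(pointer)
    # Termination: pick the best final state, then trace back.
    state = max(STATES, key=lambda s: columns[-1][s])
    path = [state]
    for pointer in reversed(pointers):
        state = pointer[state]
        path.append(state)
    return columns, "".join(reversed(path))


columns, path = viterbi([6, 2, 6])
for i, col in enumerate(columns, start=1):
    print(i, {s: round(p, 8) for s, p in col.items()})
print("most probable path:", path)
```

For this short sequence the trace-back stays on the loaded die throughout, because the initial 1/4 for L followed by two 0.8 self-transitions outweighs any switch.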

  15. Total Probability. Several different paths can result in observation x. Probability that our model will emit x is:
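
The formula itself is missing from this transcript. A hedged reconstruction in standard textbook notation follows; the symbols are assumptions: $\pi$ ranges over all state paths, $f_k(i)$ is the forward variable (total probability of emitting $x_1 \dots x_i$ and being in state $k$ at position $i$), $e_k$ is the emission probability, and $a_{jk}$ the transition probability.

```latex
\begin{align*}
P(x) &= \sum_{\pi} P(x, \pi) \\
f_k(i) &= e_k(x_i) \sum_{j} f_j(i-1)\, a_{jk}, \qquad i = 1, \dots, L \\
P(x) &= \sum_{k} f_k(L)
\end{align*}
```

The recursion has the same shape as Viterbi, with the maximum over predecessor states replaced by a sum.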

  16. Calculating the Total Probability (This slide has been changed)
      Observed sequence x = 6, 2, 6; columns are positions ε (begin), 6, 2, 6
      B: 1 | 0 | 0 | 0
      F: 0 | (1/6)(1/2) = 1/12 | (1/6) sum{(1/12)(0.99), (1/4)(0.2)} = 0.022083 | (1/6) sum{(0.022083)(0.99), (0.020083)(0.2)} = 0.004313
      L: 0 | (1/2)(1/2) = 1/4 | (1/10) sum{(1/12)(0.01), (1/4)(0.8)} = 0.020083 | (1/2) sum{(0.022083)(0.01), (0.020083)(0.8)} = 0.008144
      Total probability = 0 + 0.004313 + 0.008144 = 0.012
      Note: This is not the same as the matrix on the previous slide! Here each cell uses a sum instead of a max, and the total probability is the sum of the entries in the last column
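
The same numbers can be reproduced with a minimal forward-algorithm sketch. It assumes the observed rolls are 6, 2, 6 (inferred from the emission factors in the Loaded row) and uses the casino parameters from the slides; the names are illustrative.

```python
# Occasionally dishonest casino HMM, parameters as given on the slides.
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01},
         "L": {"F": 0.20, "L": 0.80}}


def emission(state, roll):
    """Fair die: 1/6 for every face. Loaded die: 1/2 for a six, 1/10 otherwise."""
    if state == "F":
        return 1 / 6
    return 0.5 if roll == 6 else 0.1


def forward(rolls):
    """Return (matrix columns, total probability of the rolls under the model)."""
    columns = [{s: START[s] * emission(s, rolls[0]) for s in STATES}]
    for roll in rolls[1:]:
        prev = columns[-1]
        # Sum over all predecessor states (Viterbi would take the max here).
        columns.append({
            s: emission(s, roll) * sum(prev[k] * TRANS[k][s] for k in STATES)
            for s in STATES
        })
    return columns, sum(columns[-1].values())


columns, total = forward([6, 2, 6])
for i, col in enumerate(columns, start=1):
    print(i, {s: round(p, 6) for s, p in col.items()})
print("total probability:", round(total, 6))
```

The total is larger than the Viterbi probability of the single best path, since it also counts every other path that could have produced the same rolls.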

  17. A few more Details re: Profiles & HMMs • Smoothing or "Regularization" - method used to avoid "over-fitting" • Common problem in machine learning (data-driven) approaches • Limited training sample size causes over-representation of observed characters while "ignoring" unobserved characters • Result? Miss members of family not yet sampled (too many false negative hits) • Pseudocounts - adding artificial values for 'extra' amino acid(s) not observed in the training set • Treated as 'real' values in calculating probabilities • Improve predictive power of profiles & HMMs • Dirichlet mixture - commonly used mathematical model to simulate the aa distribution in a sequence alignment • To "correct" problems in an observed alignment based on a limited number of sequences
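
A small sketch of the pseudocount idea for one alignment column. The slide does not say which pseudocount scheme is meant, so this assumes the simplest one: add the same constant to every amino acid count before normalizing (add-one smoothing when the constant is 1). The function name and the example column are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column, pseudocount=1.0):
    """Estimate amino acid probabilities for one alignment column.

    `column` is a string of residues observed at that position.
    Each of the 20 amino acids receives `pseudocount` extra observations.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


column = "LLLIV"  # five training sequences, only L, I and V observed

raw = column_probabilities(column, pseudocount=0.0)
smoothed = column_probabilities(column, pseudocount=1.0)

print("P(W) without pseudocounts:", raw["W"])
print("P(W) with pseudocounts:   ", smoothed["W"])
print("P(L) with pseudocounts:   ", smoothed["L"])
```

Without smoothing, any family member carrying an unobserved residue at this position gets probability zero for the whole match; with pseudocounts it is only penalized. A Dirichlet mixture replaces the single constant with position-dependent, data-derived priors.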

  18. Chp 7 - Protein Motifs & Domain Prediction. SECTION II SEQUENCE ALIGNMENT. Xiong: Chp 7 Protein Motifs and Domain Prediction • √ Identification of Motifs & Domains in MSAs • √ Motif & Domain Databases Using Regular Expressions • √ Motif & Domain Databases Using Statistical Models • Protein Family Databases • Motif Discovery in Unaligned Sequences • √ Sequence Logos

  19. Motifs & Domains • Motif - short conserved sequence pattern • Associated with distinct function in protein or DNA • Avg = 10 residues (usually 6-20 residues) • e.g., zinc finger motif - in protein • e.g., TATA box - in DNA • Domain - "longer" conserved sequence pattern, defined as an independent functional and/or structural unit • Avg = 100 residues (range from 40-700 in proteins) • e.g., kinase domain or transmembrane domain - in protein • Domains may (or may not) include motifs

  20. 2 Approaches for Representing "Consensus" Information in Motifs & Domains • Regular expression - symbolic representation of information from MSA • e.g., protein phosphorylation site motif: [S,T]-X-[R,K] • Symbols represent specific or unspecified residues, spaces, etc. • 2 mechanisms for matching: • Exact • "Fuzzy" (inexact, approximate) - flexible, more permissive to detect "near matches" • Statistical model - includes probability information derived from MSA • e.g., PSSM, Profile, or HMM
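
As an illustration of exact regular-expression matching, the phosphorylation motif [S,T]-X-[R,K] can be written as the Python pattern `[ST].[RK]`. This is a minimal sketch: the function name and the test sequence are made up, and `.` is used for X, so it accepts any character rather than only the 20 amino acid letters.

```python
import re

# [S,T]-X-[R,K]: Ser or Thr, then any residue, then Arg or Lys.
# The lookahead (?=...) lets overlapping sites be reported as well.
PHOSPHO_SITE = re.compile(r"(?=([ST].[RK]))")


def find_sites(sequence):
    """Return (0-based start position, matched tripeptide) for every motif hit."""
    return [(m.start(), m.group(1)) for m in PHOSPHO_SITE.finditer(sequence)]


print(find_sites("MASPKTTRL"))  # hits: SPK at position 2 and TTR at position 5
```

A pattern like this either matches or it does not; it carries no information about how often each residue occurs at each position, which is the limitation that PSSMs, profiles, and HMMs address.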

  21. Motif & Domain Databases. Based on regular expressions: • Prosite (InterPro includes Prosite, PRINTS, etc.) • Emotif. Limitation: these don't take probability info into account. Based on statistical models: • PRINTS • BLOCKS • ProDom • Pfam • SMART • CDART • Reverse PSI-BLAST • READ your textbook & try some of these at home; there are distinct advantages/disadvantages associated with each • TAKE HOME LESSON: Always try several methods! (not just one!)

  22. Protein Family Databases • In addition to databases of "related" protein sequences, based on shared motifs or domains (Pfam, BLOCKS, CDART), some databases "cluster" sequences into families based on near full-length sequence comparisons • COGs - Clusters of Orthologous Groups (at NCBI) • Mostly prokaryotic sequences • KOG = newer eukaryotic version • COGnitor - software to search database • ProtoNet - also clusters of homologous protein sequences • Advantages: tree-like hierarchical structure • Provides GO (gene ontology) annotations • Provides InterPro keywords

  23. Motif Discovery in Unaligned Sequences • Expectation Maximization - generate "random" alignment of all sequences, derive PSSM, iteratively match individual sequences to PSSM to edit & improve it. Problems? Can hit a local optimum (premature convergence); sensitive to initial alignment • MEME - Multiple EM for Motif Elicitation - modified EM, avoids local optimum issues; two-step procedure • Gibbs Sampling - generate "trial" PSSM from random alignment first, as in EM, but leave one sequence out of initial alignment, then iteratively match PSSM to left-out sequences • Gibbs Sampler - web-based motif search via Gibbs sampling • Not mentioned in textbook: • Stochastic context-free grammars • Other "state of the art" approaches in recent literature, but not available in web-based servers (yet)

  24. Chp 12 - Protein Structure Basics. SECTION V STRUCTURAL BIOINFORMATICS. Xiong: Chp 12 Protein Structure Basics • LAB 6 • Introduction to Protein DataBank - PDB • PyMol • Cn3D?

  25. Chp 12 - Protein Structure Basics. SECTION V STRUCTURAL BIOINFORMATICS. Xiong: Chp 12 Protein Structure Basics • Amino Acids • Peptide Bond Formation • Dihedral Angles • Hierarchy • Secondary Structures • Tertiary Structures • Determination of Protein 3-Dimensional Structure • Protein Structure DataBank (PDB)

  26. Protein Structure & Function • Protein structure - primarily determined by sequence • Protein function - primarily determined by structure • Globular proteins: compact hydrophobic core & hydrophilic surface • Membrane proteins: special hydrophobic surfaces • Folded proteins are only marginally stable • Some proteins do not assume a stable "fold" until they bind to something = intrinsically disordered • Predicting protein structure and function can be very hard -- & fun!

  27. 4 Basic Levels of Protein Structure

  28. Primary & Secondary Structure • Primary • Linear sequence of amino acids • Description of covalent bonds linking aa's • Secondary • Local spatial arrangement of amino acids • Description of short-range non-covalent interactions • Periodic structural patterns: α-helix, β-sheet

  29. Tertiary & Quaternary Structure • Tertiary • Overall 3-D "fold" of a single polypeptide chain • Spatial arrangement of 2° structural elements; packing of these into compact "domains" • Description of long-range non-covalent interactions (plus disulfide bonds) • Quaternary • In proteins with > 1 polypeptide chain, spatial arrangement of subunits

  30. "Additional" Structural Levels • Super-secondary elements • Motifs • Domains • Foldons

  31. Amino Acids • Each of the 20 different amino acids has a different "R-group" or side chain attached to Cα

  32. Peptide Bond is Rigid and Planar

  33. Hydrophobic Amino Acids

  34. Charged Amino Acids

  35. Polar Amino Acids

  36. Certain Side-chain Configurations are Energetically Favored (Rotamers). Ramachandran plot: "Allowable" psi & phi angles
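
The phi and psi angles plotted in a Ramachandran plot are backbone dihedral (torsion) angles: phi is defined by the atoms C(i-1), N, Cα, C and psi by N, Cα, C, N(i+1). The sketch below shows how a dihedral angle can be computed from four atom positions. The function name and the test coordinates are illustrative and do not come from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    b0 = _sub(p0, p1)              # bond pointing from p1 back to p0
    b1 = _sub(p2, p1)              # central bond, p1 -> p2
    b2 = _sub(p3, p2)              # bond pointing from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points with the central bond along the x axis.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # cis arrangement
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))   # quarter turn
```

For phi, the four points would be the coordinates of C(i-1), N(i), Cα(i), C(i); for psi, N(i), Cα(i), C(i), N(i+1). Side-chain rotamers are described the same way using chi angles along the side chain.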

  37. Glycine is the Smallest Amino Acid: R group = H atom • Glycine residues increase backbone flexibility because their side chain is only a hydrogen atom

  38. Proline is Cyclic • Proline residues reduce flexibility of polypeptide chain • Proline cis-trans isomerization is often a rate-limiting step in protein folding • Recent work suggests it may also regulate ligand binding in native proteins - Andreotti (BBMB)

  39. Cysteines can Form Disulfide (S-S) Bonds • Disulfide bonds (covalent) stabilize 3-D structures • In eukaryotes, disulfide bonds are often found in secreted proteins or extracellular domains

  40. Globular Proteins Have a Compact Hydrophobic Core • Packing of hydrophobic side chains into interior is main driving force for folding • Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit (these groups carry partial charges at the neutral pH of about 7 found in biological systems); these polar groups must be "neutralized" • Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds

  41. Exterior Surface of Globular Proteins is Generally Hydrophilic • Hydrophobic core formed by packed secondary structural elements provides compact, stable core • "Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues • Hydrophobic "patches" on protein surface are often involved in protein-protein interactions

  42. Protein Secondary Structures • Helices • Sheets • Loops • Coils

  43. α-Helix: Stabilized by H-bonds between every ~4th residue in Backbone. Atom colors: C = black, O = red, N = blue, H = white. Look! Charges on backbone are "neutralized" by hydrogen bonds (H-bonds), shown as red fuzzy vertical bonds

  44. Certain Amino Acids are "Preferred" & Others are Rare in Helices • Ala, Glu, Leu, Met = good helix formers • Pro, Gly, Tyr, Ser = very poor • Amino acid composition & distribution varies, depending on location of helix in 3-D structure

  45. β-Sheets - also Stabilized by H-bonds Between Backbone Atoms. Figure panels: Anti-parallel, Parallel

  46. Loops • Connect helices and sheets • Vary in length and 3-D configurations • Are located on surface of structure • Are more "tolerant" of mutations • Are more flexible and can adopt multiple conformations • Tend to have charged and polar amino acids • Are frequently components of active sites • Some fall into distinct structural families (e.g., hairpin loops, reverse turns)

  47. Coils • Regions of 2° structure that are not helices, sheets, or recognizable turns • Intrinsically disordered regions appear to play important functional roles

  48. Chp 13 - Protein Structure Basics. SECTION V STRUCTURAL BIOINFORMATICS. Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification • Protein Structural Visualization • Protein Structure Comparison • Protein Structure Classification
