
High Accuracy Scoring Functions for Computational Protein Structure Refinement

Michael Zhou, Departments of Computer Science and Biochemistry


Presentation Transcript


  1. High Accuracy Scoring Functions for Computational Protein Structure Refinement
     Michael Zhou, Departments of Computer Science and Biochemistry

  2. The Big Picture
     [Figure: translation yields an amino acid sequence (Ala, Ser, Glu, Leu, Pro, Ser, ... Stop); folding turns the sequence into a 3D protein structure]

  3. Importance and Difficulties
     • Structure determines function
     • Understanding function is important for:
        • Diseases caused by protein malfunction
        • The function of proteins from disease-causing organisms
        • Engineering proteins

  4. Why Computational Prediction?
     • Fast and cheap compared with experimental structure determination
     • Unfortunately, it's a difficult problem
        • Large search space: many possible conformations
        • Lack of good templates

  5. My Project
     • Focus on model refinement: making better models from existing ones
     • How do you know which models are better?
        • Answer: scoring functions
     [Figure: a protein model is assigned a single numeric score, e.g. -35.0123]

  6. Approach
     • Combination of different types of scoring functions
     • Improve performance through multiple linear regression
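The slides do not give the regression details. As a minimal sketch, assuming each model is described by a vector of component scores (one per scoring function) and that the regression target is the model's CαRMSD (an assumption; the target is not stated), the combination weights could be fit by ordinary least squares with NumPy. `fit_combination_weights` and `combined_score` are hypothetical names, and the toy numbers are made up.

```python
import numpy as np

def fit_combination_weights(component_scores, target):
    """Fit an intercept plus one weight per scoring function by least squares.

    component_scores: shape (n_models, n_scoring_functions)
    target: shape (n_models,); assumed here to be each model's CαRMSD
    """
    X = np.asarray(component_scores, dtype=float)
    y = np.asarray(target, dtype=float)
    # Prepend a column of ones so the regression also learns an intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    # Ordinary least squares: minimize ||X1 @ w - y||^2.
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

def combined_score(component_scores, w):
    """Apply fitted weights: intercept + weighted sum of the component scores."""
    X = np.atleast_2d(np.asarray(component_scores, dtype=float))
    return w[0] + X @ w[1:]

# Toy usage with made-up numbers (not project data).
scores = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
rmsd = 1 + 2 * scores[:, 0] - 0.5 * scores[:, 1]
w = fit_combination_weights(scores, rmsd)
print(np.round(w, 3))
```

Because the fitted combination is only a weighted sum, applying it adds almost no cost on top of computing the component scores themselves.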

  7. [Figure: compactness- and hydrophobicity-based scores and interatomic distance-based scores are combined by multiple linear regression into a single combination score]
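The slides do not specify which compactness, hydrophobicity, or interatomic distance terms were used. For illustration only, here are two hypothetical component scores computed from Cα coordinates: the radius of gyration (a common compactness measure) and a count of Cα-Cα contacts within a distance cutoff. The 8 Å cutoff and the minimum sequence separation of 3 are assumed values, not the project's parameters.

```python
import math
from itertools import combinations

def radius_of_gyration(ca_coords):
    """Compactness feature: RMS distance (Å) of Cα atoms from their centroid."""
    n = len(ca_coords)
    centroid = [sum(p[k] for p in ca_coords) / n for k in range(3)]
    sq_dev = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in ca_coords)
    return math.sqrt(sq_dev / n)

def contact_count(ca_coords, cutoff=8.0, min_separation=3):
    """Distance-based feature: number of Cα pairs closer than `cutoff` Å,
    skipping pairs fewer than `min_separation` residues apart in sequence.
    Both defaults are illustrative assumptions."""
    count = 0
    for (i, a), (j, b) in combinations(enumerate(ca_coords), 2):
        if j - i >= min_separation and math.dist(a, b) < cutoff:
            count += 1
    return count

# A toy four-residue "chain" along the x axis (made-up coordinates).
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(radius_of_gyration(chain), contact_count(chain))
```

A multiple linear regression would then take values like these, one column per scoring function, as its inputs.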

  8. Training the Method
     • Training set: ~40,000 conformations of 127 different proteins from CASP8
     • 10-fold cross-validation
     • Independent training and testing sets
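With many conformations per protein, one way to keep the training and testing sets independent is to assign whole proteins, rather than individual models, to folds. The sketch below assumes that was the intent; `grouped_kfold` is a hypothetical helper, and the shuffling seed is arbitrary.

```python
import random

def grouped_kfold(protein_ids, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    protein_ids[i] names the protein that model i belongs to. Whole proteins
    are assigned to folds, so no protein has conformations in both sets.
    """
    proteins = sorted(set(protein_ids))
    random.Random(seed).shuffle(proteins)
    # Round-robin assignment of shuffled proteins to folds 0..k-1.
    fold_of = {p: i % k for i, p in enumerate(proteins)}
    for fold in range(k):
        train = [i for i, p in enumerate(protein_ids) if fold_of[p] != fold]
        test = [i for i, p in enumerate(protein_ids) if fold_of[p] == fold]
        yield train, test

# Toy usage: 20 made-up proteins with 3 models each.
protein_ids = [f"P{n}" for n in range(20) for _ in range(3)]
for train_idx, test_idx in grouped_kfold(protein_ids, k=10):
    print(len(train_idx), len(test_idx))
```

With 127 proteins and k = 10, each fold would hold out 12 or 13 proteins.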

  9. [Figure: training set and testing set]

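  10. Measuring Performance
     • Measuring model accuracy
        • CαRMSD (root-mean-square deviation between the Cα atoms of a model and those of the experimentally determined structure, after superposition): our “gold standard”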
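As a sketch of how CαRMSD is typically computed, assuming the model and native Cα coordinates are already matched residue by residue, both structures are centered and the model is optimally rotated onto the native structure with the Kabsch algorithm before the deviation is measured. `ca_rmsd` is a hypothetical name, not a function from the project.

```python
import numpy as np

def ca_rmsd(model, native):
    """CαRMSD (Å) between matched Cα coordinates after optimal superposition.

    model, native: (N, 3) arrays; row i of each is the same residue.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(native, dtype=float)
    # Remove translation by centering both coordinate sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy check: a rotated and translated copy of a structure should score ~0.
native = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
rot_z90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
model = native @ rot_z90.T + 5.0
print(ca_rmsd(model, native))
```

The reflection check matters: without it, a mirror-image model could be superimposed perfectly and reported as a CαRMSD of zero.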

  11. Measuring Performance
     • Measuring scoring accuracy
        • Goal: the score should pick out the best models (those closest to the native structure)
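The slides do not name the metrics used for scoring accuracy. Two common choices, shown here as hypothetical helpers, are the Pearson correlation between score and CαRMSD across a protein's models, and the selection loss: how much worse the top-scoring model is than the best model available. Lower scores are assumed to be better here (an assumption; the slides only show an example score of -35.0123).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def selection_loss(scores, rmsds):
    """CαRMSD of the top-scoring model minus the lowest CαRMSD available.

    Assumes lower scores are better; a loss of 0 means the score picked
    the best model in the set.
    """
    picked = min(range(len(scores)), key=lambda i: scores[i])
    return rmsds[picked] - min(rmsds)

# Made-up scores and CαRMSDs (Å) for three models of one protein.
scores = [-10.0, -35.0, -20.0]
rmsds = [2.0, 3.0, 5.0]
print(pearson(scores, rmsds), selection_loss(scores, rmsds))
```

Correlation rewards getting the overall ranking right, while selection loss focuses on the single model a refinement pipeline would actually keep.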

  12. Comparison of Scoring Functions

  13. Score vs. CαRMSD

  14. Score vs. CαRMSD

  15. Future Directions
     • Generalize beyond refinement
        • Training sets with models from many different generation methods
     • More sophisticated machine learning
        • SVMs, HMMs, neural networks, etc.
     • Benchmarking against other methods in the field

  16. Acknowledgements
     • Samudrala Computational Biology Group: Ram Samudrala, Brady Bernard, Jeremy Horst, Gaurav Chopra, Michael Shannon, Ling-Hong Hung, Tianyun Liu, Raymond Zhang, Adrian Laurenzi, Brian Buttrick, Manish Mishra, Stewart Moughon, Thomas Wood
     • Funding: NSF Research Opportunities for Undergraduates Grant; Mary Gates Research Scholarship; 2010 US NIH Director's Pioneer Award (DP1OD006779) and NSF CAREER Award (0448502) to Ram Samudrala
     • Departments: Microbiology, Computer Science, Biochemistry

  17. Questions?

  18. The Big Picture
     [Figure: DNA is transcribed into RNA, which is translated into protein]
