High Accuracy Scoring Functions for Computational Protein Structure Refinement

High Accuracy Scoring Functions for Computational Protein Structure Refinement Michael Zhou Departments of Computer Science and Biochemistry

The Big Picture Translation Ala Ser Glu Folding Leu Pro Ser 3D Protein Structure Stop Amino Acid Sequence

Importance and Difficulties • Structure  Function • Understanding function is important • Protein misfunction diseases • Function of disease organism proteins • Engineering proteins

Why Computational Prediction? • Fast and Cheap • Unfortunately It’s a Difficult Problem • Large search space – many possible conformations • Lack of good templates

My Project • Focus on Model Refinement • Making better models from existing ones • How do you know which models are better? • Answer: Scoring functions -35.0123 Protein Model Score

Approach • Combination of different types of scoring functions • Improve performance through multiple linear regression

Compactness and Hydrophobicity Based Interatomic Distance Based Multiple Linear Regression Combination Score

Training the Method • Training Set • ~40000 conformations across 127 different proteins from CASP8 • 10 fold cross validation • Independent testing and training sets

Training Set Testing Set

Measuring Performance • Measuring Model Accuracy • CαRMSD - Our “gold standard”

Measuring Performance • Measuring Scoring Accuracy • Want to be able to pick out best models

Comparison of Scoring Functions

Score vsCαRMSD

Future Directions • Generalize to beyond refinement • Training sets with models from many different generation methods • More sophisticated machine learning • SVM, HMM, Neural Networks, etc • Benchmarking with other methods in the field

Acknowledgements Samudrala Computational Biology Group • Ram Samudrala • Brady Bernard • Jeremy Horst • Gaurav Chopra • Michael Shannon • Ling-Hong Hung • Tianyun Liu • Raymond Zhang • Adrian Laurenzi • Brian Buttrick • Manish Mishra • Stewart Moughon • Thomas Wood Funding • NSF Research Opportunities for Undergraduates Grant • Mary Gates Research Scholarship • 2010 US NIH Director's Pioneer Award (DP1OD006779) and NSF CAREER Award (0448502) to Ram Samudrala Departments • Microbiology • Computer Science • Biochemistry

Questions?

The Big Picture Transcription Translation DNA RNA Protein

High Accuracy Scoring Functions for Computational Protein Structure Refinement

High Accuracy Scoring Functions for Computational Protein Structure Refinement

Presentation Transcript

Structure and Content Scoring for XML

Protein Structure

High Throughput Computing and Protein Structure

PROTEIN STRUCTURE

Protein structure

Computational Protein Design

Computational protein design

Protein Structure

Protein Structure

Protein Structure

Computational Methods for Protein Structure Prediction

Protein Structure

Protein structure

Protein Structure

Protein Structure

PROTEIN FUNCTIONS

Macromolecular structure refinement

Computational Protein Folding

Protein structure

Protein Structure

Structure and Content Scoring for XML

Protein structure