Protein Structure Prediction

Protein Structure Prediction • What’s the big deal? • Why is it important? • Who is working on it? • Methods: • Comparative modeling • Fold recognition or threading • Ab initio folding • Genetic algorithms

Why do we need structure prediction? • 3D structure gives clues to function: active sites, binding sites, conformational changes... • Structure and function are conserved more than sequence • 3D structure determination is difficult, slow and expensive • Intellectual challenge, Nobel prizes, etc. • Engineering new proteins

IEEE Computer July 2002 page 27 Computational Biology’s Holy Grail “When asked what the Holy Grail of computational biology is, most researchers would answer that it is either sequence-structure-function prediction or Computing the genotype-phenotype map.”

Comparison of Different Methods
Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.
Method | Knowledge | Approach | Difficulty | Usefulness
Comparative modelling (homology modelling) | Proteins of known structure | Identify related structure with sequence methods, copy 3D coords and modify where necessary | Relatively easy | Very, if sequence identity > 40%; drug design
Fold recognition | Proteins of known structure | Same as above, but use more sophisticated methods to find related structure | Medium | Limited due to poor models

Comparison of Different Methods (continued)
Method | Knowledge | Approach | Difficulty | Usefulness
Secondary structure prediction | Sequence-structure statistics | Forget 3D arrangement and predict where the helices/strands are | Medium | Can improve alignments, fold recognition, ab initio
Ab initio tertiary structure prediction | Energy functions, statistics | Simulate folding, or generate lots of structures and try to pick the correct one | Very hard | Not really
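The comparison rates comparative modelling as very useful when sequence identity exceeds 40%. As a rough illustration, the sketch below computes percent identity for two already-aligned sequences. The function names, the "-" gap character, and the choice to count only columns where neither sequence has a gap are assumptions made for this example; other definitions of identity exist and give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def looks_like_good_template(aligned_a: str, aligned_b: str,
                             threshold: float = 40.0) -> bool:
    """Apply the > 40% rule of thumb from the comparison table."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, `percent_identity("ACDE", "ACDF")` compares four columns and finds three matches, which is 75%.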

Protein Structure Prediction Instrumentation methods for determining a protein's structure • X-ray crystallography • NMR spectroscopy

A Guide to Structure Prediction (version 2.1) EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany speedy.embl-heidelberg.de/gtsp/

Experimental Data Much experimental data can aid the structure prediction process. Some of these are: • Disulphide bonds, which provide tight restraints on the location of cysteines in space. • Spectroscopic data, which can give you an idea as to the secondary structure content of your protein. • Site-directed mutagenesis studies, which can give insights as to residues involved in active or binding sites. • Knowledge of proteolytic cleavage sites and post-translational modifications, such as phosphorylation or glycosylation, which can suggest residues that must be accessible. Remember to keep all of the available data in mind when doing predictive work. Always ask yourself whether a prediction agrees with the results of experiments. If not, then it may be necessary to modify what you've done.

The PSA Protein Structure Prediction Server bmerc-www.bu.edu/psa/ The Protein Sequence Analysis (PSA) server predicts probable secondary structures and folding classes for a given amino acid sequence. It is used for proteins of unknown structure for which no homologous sequences are known. Developed at the BioMolecular Engineering Research Center (BMERC) of Boston University in Boston, Massachusetts, and TASC, Inc. in Reading, Massachusetts. • Email or webpage submissions • Returns data in PDF or PS format

NNPREDICT Protein Secondary Structure Prediction www.cmpharm.ucsf.edu/~nomi/nnpredict.html nnpredict is a program that predicts the secondary structure type for each residue in an amino acid sequence. The basis of the prediction is a two-layer, feed-forward neural network. The network weights were determined by a separate program, a modification of the Parallel Distributed Processing suite of McClelland and Rumelhart (1). Input is a sequence of either one-letter amino acid codes (A C D E F G H I K L M N P Q R S T V W Y; note that B and Z are not recognized as valid amino acid codes) or three-letter amino acid codes separated by spaces (ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN PRO GLN ARG SER THR VAL TRP TYR).
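Before submitting a sequence, it can help to check it against the two input alphabets listed above. The following is a minimal sketch of such a check, not part of nnpredict itself: the function name `normalize_sequence` is hypothetical, and upper-casing the input and requiring at least two tokens before treating it as three-letter codes are assumptions of this example.

```python
# The 20 one-letter codes listed on the slide (B and Z are not accepted).
ONE_LETTER = set("ACDEFGHIKLMNPQRSTVWY")

# Three-letter codes from the slide, mapped to their one-letter equivalents.
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def normalize_sequence(text: str) -> str:
    """Return the sequence as upper-case one-letter codes.

    Space-separated three-letter codes are converted; anything else is
    treated as one-letter codes with whitespace ignored. A single token
    such as "ALA" is ambiguous, so this sketch reads it as one-letter codes.
    """
    tokens = text.upper().split()
    if not tokens:
        raise ValueError("empty sequence")
    if len(tokens) >= 2 and all(t in THREE_TO_ONE for t in tokens):
        return "".join(THREE_TO_ONE[t] for t in tokens)
    sequence = "".join(tokens)
    invalid = sorted(set(sequence) - ONE_LETTER)
    if invalid:
        raise ValueError("unrecognized amino acid codes: " + ", ".join(invalid))
    return sequence
```

With this helper, `normalize_sequence("ALA CYS ASP")` yields `"ACD"`, while a sequence containing B or Z raises a `ValueError`.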

Other Sources of Protein Structure Prediction • 123d.ncifcrf.gov/sarf2.html Common SARFs in protein structures. SARF stands for Spatial ARrangement of backbone Fragments. • 123d.ncifcrf.gov/run123D+.html • http://www.sbg.bio.ic.ac.uk/~3dpssm/ A fast, web-based method for protein fold recognition using 1D and 3D sequence profiles coupled with secondary structure and solvation potential information.

UCSC HMM Applications 2GLIA. Chain A, Five-Finger [gi:2392684] TITLE: Crystal structure of a five-finger GLI-DNA complex: new perspectives on zinc fingers SOURCE: Homo sapiens (human) Amino acid sequence: >vyetdcrwdgcsqefdsqeqlvhhinsehihgerkefvchwggcsrelrpfk aqymlvvhmrrhtgekphkctfegcrksysrlenlkthlrshtgekpymcehe gcskafsnasdrakhqnrthsnekpyvcklpgctkrytdpsslrkhvktvhgpda

Testing the UCSC HMM Application Using a known protein: the five-finger GLI bound to DNA

LIBRA I Structure Prediction by Threading: Forward Folding Protocol www.ddbj.nig.ac.jp/E-mail/libra/LIBRA_I.html Structures compatible with a target sequence are sought in a structural library chosen from the Protein Data Bank (PDB). The target sequence and each 3D profile are aligned by simple dynamic programming. According to the alignment, the sequence is re-mounted on the structure and its fitness is evaluated by a pseudo-energy potential. Results are sorted by score, best match first, and shown together with their alignments.
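The alignment step described above can be sketched as a global dynamic-programming alignment of a sequence against a position-specific profile. This is an illustration only and not LIBRA I's actual scoring scheme: the profile representation (one dictionary of residue scores per structural position), the linear gap penalty, and the names `thread_score` and `rank_library` are all assumptions made for this example, and the pseudo-energy re-mounting step is not modelled.

```python
def thread_score(sequence, profile, gap_penalty=-1.0):
    """Best global alignment score of a sequence against a 3D profile.

    `profile` is a list with one dict per structural position, mapping a
    one-letter residue code to a compatibility score (hypothetical format).
    Residues missing from a position's dict score 0.0.
    """
    n, m = len(sequence), len(profile)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty  # sequence residues left unaligned
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty  # structural positions left unfilled
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + profile[j - 1].get(sequence[i - 1], 0.0)
            skip_residue = dp[i - 1][j] + gap_penalty
            skip_position = dp[i][j - 1] + gap_penalty
            dp[i][j] = max(match, skip_residue, skip_position)
    return dp[n][m]


def rank_library(sequence, library, gap_penalty=-1.0):
    """Score the sequence against every profile, best match first."""
    scored = [(thread_score(sequence, prof, gap_penalty), name)
              for name, prof in library.items()]
    return sorted(scored, reverse=True)


# Toy two-environment profiles (made-up scores, for illustration only).
buried = {"L": 2.0, "V": 2.0, "K": -2.0}
exposed = {"K": 2.0, "L": -2.0, "V": -2.0}
toy_library = {
    "fold_a": [buried, exposed, buried],
    "fold_b": [exposed, buried, exposed],
}
toy_ranking = rank_library("LKV", toy_library)
```

In this toy case the sequence LKV places hydrophobic residues in the buried positions of `fold_a`, so `fold_a` ranks first.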

LIBRA I Sequence Homology Search by Threading: Inverse Folding Sequences compatible with a target structure are sought in the sequence database (Swiss-Prot). Results are sorted by score, best match first, and shown together with their alignments. A recent study revealed that, for this search, the 3D-1D alignment score itself is a more suitable compatibility score than the sequence re-mounting score.

The problem of predicting protein structure from sequence remains fundamentally unsolved despite more than three decades of intensive research effort. The search has been driven by the belief that the 3D structure of a protein is determined by its amino acid sequence (Anfinsen, 1973). While it is now known that chaperones often play a role in the folding pathway, and in correcting misfolds (Corrales and Fersht, 1996, Hartl et al., 1994), it is believed that the final structure is at the free-energy minimum. Thus, all information needed to predict the native structure of a protein is contained in the amino acid sequence, plus a knowledge of its native solution environment.

Ab initio prediction of protein structure from sequence: not yet. Given only the amino acid sequence, it should be possible in principle to directly predict protein structure from physico-chemical principles using, for example, molecular dynamics methods (Levitt and Warshel, 1975). In practice, however, such approaches are frustrated by the enormous complexity of the calculation (requiring many orders of magnitude more computing time than is currently feasible) and by inaccuracies in the experimental determination of basic parameters (van Gunsteren, 1993, Shortle et al., 1996). Thus, the most successful structure prediction tools are knowledge-based, using a combination of statistical theory and empirical rules.

Odyssey of evolution teaches us structure prediction. It appears that for most proteins, almost all residues can be changed without affecting the structure (Rost et al., 1996b); however, a single, randomly chosen mutation is more likely to destabilize than to maintain a particular structure. Thus, the precise pattern of amino acid exchanges observed in a multiple sequence alignment of a protein family is highly indicative of the particular structure. These patterns constitute a fossil record of mutations preserving protein structure and function. The importance of such evolutionary information for structure prediction was realized very early and has long been exploited in exceptional cases by experts, as well as in automatic and systematic ways. More recently, the use of evolutionary information has grown in importance. This was made particularly clear when it was shown that the accuracy of secondary structure prediction improved to over 70% through the use of evolutionary information.
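The slide does not say how the 70% figure is measured. A common reading is per-residue accuracy over three states, and the sketch below assumes that definition. The state letters H (helix), E (strand) and C (other) and the function name `three_state_accuracy` are assumptions of this example, not taken from the source.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Both strings hold one state letter per residue (assumed alphabet:
    H = helix, E = strand, C = other) and must have the same length.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, a prediction that gets 7 of 8 residues right scores 87.5 under this definition.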

Genetic programming for protein structure prediction • S. Sun, Reduced representation model of protein structure prediction: statistical potential and genetic algorithms, Protein Science, vol. 2, no. 5, pp. 762-785, 1993. • Gary B. Lamont, Charles Kaiser, George Gates, Laurence Merkle, and Ruth Pachter, Real-Valued Genetic Algorithm Case Studies in Protein Structure Prediction, Proceedings of the SIAM Conference on Parallel Applications, March 1997. • Natalio Krasnogor, Daniel H. Marcos, David Pelta, and Walter A. Risi, Protein Structure Prediction as a Complex Adaptive System, Frontiers in Evolutionary Algorithms (FEA98), 1998. • Natalio Krasnogor, Bill Hart, Jim Smith, and David Pelta, Protein Structure Prediction With Evolutionary Algorithms, Proceedings of the 1999 International Genetic and Evolutionary Computation Conference (GECCO99).

Some source pages http://www.sbc.su.se/~maccallr/ http://scpd.stanford.edu/SOL/courses/proEd/RACMB/vidList.htm
