310 likes | 366 Views
Protein Tertiary Structure. Protein Data Bank (PDB). Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids: ~87,000 structures.
E N D
Protein Data Bank (PDB) • Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids: ~87,000 structures. • The data is typically obtained by X-ray crystallography or NMR (Nuclear magnetic resonance) spectroscopy and submitted by biologists and biochemists from around the world. • Freely accessible.
Accession number PDB file Java based visualization tools 2ndary structure
PDB file example: A PDB file can be viewed by different visualization tools , such as Pymol
Protein, chain, domain • Here is a protein compound by 4 chains. • Which protein is that?
Protein, chain, domain • One chain may have multiple domains. • A protein domain is a conserved part of a given protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. • Each domain has a stable 3D structure.
Protein domain classifications • Scientists have tried to classify proteins by their structural properties into a tree-like hierarchy. • The 2 most used domain classifications are CATH and SCOP.
CATH:Protein Domain Structure ClassificationClass, Architecture, Topology and Homology • Class: The secondary structure composition: mainly-alpha, mainly-beta and alpha-beta. • Architecture: The overall shape of the domain structure. Orientations of the secondary structures : e.g. barrel or 3-layer sandwich. • Topology: Structures are grouped into fold groups at this level depending on both the overall shape and connectivity of the secondary structures. • Homologous Superfamily: Evolutionary conserved structures
CATH:Protein Domain Structure ClassificationClass, Architecture, Topology and Homology
SCOPStructural Classification of Proteins http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html Based on known protein structures • Manually created by visual inspection • Hierarchical database structure: • Class, Fold, Superfamily, Family, Protein and Species
Node Parents of node Children of node
Protein structure alignment • Structural alignment attempts to establish homology between two or more protein structures based on their 3D conformation. • Structural alignmentoften implies evolutionary relationships between proteins with low seq-id.
Sequence – structure relations • Similar sequences Similar structures. • Different sequences ??? • Different sequences that fold into similar structures are most interesting, since they imply a common origin. • This is what we aim to find
Protein structure alignment • Alignment tools try to superimpose the 2 structures, so that the distance between them is minimal. • The distance measure is RMSD - Root Mean Square Deviation. • Given two sets of n points v and w, the RMSD is defined as follows:
Protein structure alignment • The structural alignment servers do LOCAL structural alignment. • They try to align larger stretches of protein backbone with minimal RMSD. • Thus, another parameter to assess the quality of the alignment is the alignment length.
Protein structure alignment similar • Low RMSD _________ structures • Low alignment length _________ structures • SAS score = 100*RMSD/(alignment length) • Low SAS _________ structures dissimilar similar
Structure alignment servers Dalilite: http://www.ebi.ac.uk/Tools/structure/dalilite/ • 1XIS and 1NAR have only 7% sequence identity, but they are structurally similar. • We will download their pdb files from the PDB, and structurally align them using Dalilite.
Food for thought How can structure alignment help us in structure prediction?
Structure prediction • Input: protein sequence; • Output: protein 3D structure. • This is a VERY difficult task. • CASP: Critical Assessment of Techniques for Protein Structure Prediction • Worldwide experiment for protein structure prediction taking place every two years.
Structure prediction Comparative Modeling Ab Initio Modeling • uses previously solved structures as starting points, or templates. build 3D protein models "from scratch", i.e., based on physical principles rather than on previously solved structures. • Homology modeling: searches similarity in sequences with known structures. • Protein threading: • sequence to structure alignment, against a database of ‘templates’ – known structures.
I-TASSER structure prediction server • based on multiple-threading alignments • I-TASSER (as 'Zhang-Server') was ranked as the No 1 server for protein structure prediction in recent CASP7, CASP8, CASP9, and CASP10 experiments.