1 / 49

Methods for comparing protein structures Protein structural classifications

Comparing and Classifying Domain Structures. Methods for comparing protein structures Protein structural classifications How do structures and functions diverge in protein superfamilies What proportion of genome sequences can be predicted to belong to superfamilies of known structure?.

mcannon
Download Presentation

Methods for comparing protein structures Protein structural classifications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Comparing and Classifying Domain Structures • Methods for comparing protein structures • Protein structural classifications • How do structures and functions diverge in protein superfamilies • What proportion of genome sequences can be predicted to belong to superfamilies of known structure?

  2. Protein Domain Family Classifications Known domain structures Alexey Murzin, LMB, Cambridge Predicted domain structures Julian Gough, Bristol University Known domain structures Predicted domain structures Christine Orengo, UCL Domain sequences Alex Bateman, Sanger

  3. domains are important evolutionary units 60-80% of genes in genomes code for multidomain proteins

  4. Evolution gives rise to families of proteins (homologues) Domain Superfamily human yeast human M. tuberculosis Th. thermophilus structure is more highly conserved than sequence during evolution At least 40-50% of the structure is conserved

  5. Evolution gives rise to families of proteins (homologues) orthologues Domain Superfamily human yeast human M. tuberculosis Th. thermophilus structure is more highly conserved than sequence during evolution At least 40-50% of the structure is conserved

  6. Evolution gives rise to families of proteins paralogues Domain Superfamily human yeast human M. tuberculosis Th. thermophilus structure is more highly conserved than sequence during evolution At least 40-50% of the structure is conserved

  7. Structural diversity in the CATH Domain Family P-loop hydrolases Cocaine esterase Acetylcholinesterase Cutinase structure is more highly conserved than sequence during evolution At least 40-50% of the structure is conserved

  8. residue substitutions due to single base mutations insertions or deletions (indels) of residues - usually not in the secondary structures but in the connecting loops Usually the structural cores are highly conserved Although structure is much more conserved than the sequence there can still be considerable structural differences between relatives outside the core Challenges in comparing protein structures

  9. residue insertions usually occur in the loops connecting secondary structures • substitutions can cause shifts in the orientations of secondary structures

  10. Superposition of OB fold Structures

  11. Related structures RMSD usually < 3.5A

  12. ignore the variable loop regions and only compare the secondary structures use algorithms which can explicitly handle insertions/deletions e.g. dynamic programming, simulated annealing Coping with Insertions and Deletions

  13. Fast structure comparison by secondary structures

  14. Graphs can be compared using the Bron Kerbosch algorithm to find the largest common graph In this example the common graph contains 5 nodes. E E E E E E H H E H H H H H H H H Generallly ~1000 times faster than residue based methods

  15. STRUCTAL Score distances between superposed residues in path matrix Use dynamic programming to find best path Align sequences Superpose structures Use equivalences given by the best path to re-superpose the structures

  16. Secondary structure based: SSM HenrickPDB GRATH Harrison & Orengo CATH Residue based: SSAP Taylor and Orengo CATH DALI Holm and Sander SCOP Comparer Sali and Blundell HOMSTRAD FatCat Adam GodzikPDB Structal Levitt PDB Structure Comparison Algorithms Structure classification Structural Bioinformatics, Ed: Phil Bourne, Wiley 2003 Bioinformatics: Genes, Proteins and Computers, Bios, 2003

  17. C lass Domain structure database A Orengo & Thornton 1993 rchitecture T opology or Fold Group H omologous Superfamily ~200,000 domains 2600 domain superfamilies

  18. H C A T ~200,000 domains 3 Class ~40 Architecture Topology or Fold ~1200 domain database

  19. CATH Architectures Orthogonal bundle Up-down bundle • -horseshoe a-solenoid b-ribbon aa-barrel b-sheet b-roll b-barrel

  20. CATH Architectures Clam Trefoil 2-layer b-sandwich Orthogonal b-prism Parallel b-prism 3-layer b-sandwich b-solenoid ab-roll b-propeller

  21. CATH Architectures ab-barrel 2-layer (ab) sandwich 3-layer (aba) sandwich 3-layer (bba) sandwich 3-layer (bab) sandwich 4-layer (abba) sandwich ab-prism ab-box ab-horseshoe

  22. H A T C Topology or Fold Group ~1200 40,000 domain entries ~200,000 domain entries Homologous Superfamily ~2600 Sequence Family (30%)

  23. Divergent Evolution Divergent Evolution ..VILST… ..KLST… ...SLTRF... ..VILST… ..KLST… ...SLTRF... Convergent Evolution Convergent Evolution

  24. Homologous Structures cholera toxin pertussis toxin SSAP score 97 81 Sequence identity 79% 12% Heat labile enterotoxin • high structure similarity score, often < 4A • may have detectable sequence similarity e.g. by HMMs • related functions

  25. structural similarity • no sequence similarity • no functional similarity Evolutionary Ancestry Uncertain

  26. How do proteins evolve new functions?

  27. Evolution of Protein Functions in Domain Superfamilies domain duplication residue mutations and domain structure embellishments domain fusion, change in domain partner oligomerisation

  28. Mutation of ResiduesTIM barrel glycosyl hydrolases acid chitinase A Glu general acid narbonin Glu incorporated in a salt-bridge and this blocks substrate access

  29. Changes in domain function in paralogous relatives EC code: 2.7.7.3 2.7.7.39 binding site binding site Pantetheine-phosphate adenyltransferase Glycerol-3-phosphate cytidylyl transferase changes in the domain structure can modify the binding site or domain surface

  30. binding site Pantetheine-phosphate adenyltransferase Arginyl-tRNA synthetase 1od6A00 1f7uA01

  31. Arginyl-tRNA synthetase

  32. changes in the domain partnerships can modify the binding site binding site Pantetheine-phosphate adenyltransferase Asparagine synthetase B

  33. Change in Oligomerisation Thioredoxin superfamily peroxidase calsequestrin

  34. The Mosaic Theory of Protein Evolution Teichmann et al 2001,2003 Gerstein et al. 2001 60-80% of proteins are multi-domain few thousand domain superfamilies (< 10,000 CATH, SCOP and Pfam) > Two million domain combinations (multi-domain architectures)

  35. Similarity in Chemistry 19% P I conserved P semiconserved I 67% P P P poorly conserved 7% I P P I’ 7% P’ unconserved nearly 90% of families show full or partial conservation of functions

  36. chemistry is conserved or semi-conserved across the family but the substrate can change cytochrome P450s FAD/NAD(P)(H)-dependent disulphide oxidoreductases hexapeptide repeat proteins

  37. blade domain

  38. fulcrum domain

  39. handle domain

  40. How representative are these structural superfamilies (ie in CATH, SCOP) of all proteins in nature?

  41. :Domainstructure predictions in genome sequences protein sequences from UniProt ~ 26 million domain sequences assigned to CATH superfamilies ~6000 annotated genomes scan against library of sequence patterns (HMM models) for CATH

  42. Pfam-A Pfam-B Other • Pfam-A • 10,340 curated families with annotation

  43. NewFam? CATH and Pfam coverage of genomes

  44. Protein Family Databases Each family is represented by a sequence profile or HMM

More Related