Exploiting Structural and Comparative Genomics to Reveal Protein Functions

Predicting domain structure families and their domain contexts. Exploring how structural divergence in domain families correlates with functional change. Predicting domain relatives likely to have significantly different structures and functions.

Presentation Transcript


  1. Exploiting Structural and Comparative Genomics to Reveal Protein Functions • Predicting domain structure families and their domain contexts • Exploring how structural divergence in domain families correlates with functional change • Predicting domain relatives likely to have significantly different structures and functions • CATH: domain families of known structure • Gene3D: protein families and domain annotations for completed genomes

  2. Congratulations Swiss-Prot - 20 Years!! Thanks to Amos, Rolf and the Swiss-Prot Team!!!!

  3. CATH (Orengo and Thornton, 1994) • 86,000 domains • Class (3) • Architecture (36) • Topology or Fold (1100) • Homologous superfamily (2100) • Figure labels: H1, H2, H3

  4. Gene3D: Domain annotations in genome sequences • >2 million protein sequences from 300 completed genomes and UniProt • scan against library of HMM models (~2100 CATH, ~8300 Pfam) • assign domains to CATH and Pfam superfamilies • Benchmarking by structural data shows that 76% of remote homologues can be identified using the HMMs

  5. Gene3D: Domain annotations in genome sequences • DomainFinder: structural domains from CATH take precedence • UniProt sequence (N to C terminus) with matches to NewFam, Pfam-1, CATH-1, Pfam-2 • Assigned domains: CATH-1, Pfam-2, Pfam-1, NewFam
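The slide states only that CATH structural domains take precedence when domains are assigned to a sequence. The sketch below shows one way such a rule could work; it is not the DomainFinder algorithm itself. The greedy overlap resolution, the `Hit` record, the `PRIORITY` table and the `min_newfam_len` cutoff are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    source: str  # "CATH" or "Pfam" (hypothetical labels)
    family: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


# Assumed precedence: lower number wins when two hits overlap.
PRIORITY = {"CATH": 0, "Pfam": 1}


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def assign_domains(hits, seq_len, min_newfam_len=50):
    """Greedy sketch: accept hits in priority order, skip any hit that
    overlaps an accepted one, then label long unassigned stretches
    as NewFam. min_newfam_len is an illustrative value, not a
    published parameter."""
    accepted = []
    for hit in sorted(hits, key=lambda h: (PRIORITY[h.source], h.start)):
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)
    accepted.sort(key=lambda h: h.start)

    result = []
    cursor = 1  # first residue not yet covered
    for hit in accepted:
        if hit.start - cursor >= min_newfam_len:
            result.append(Hit("NewFam", "NewFam", cursor, hit.start - 1))
        result.append(hit)
        cursor = hit.end + 1
    if seq_len - cursor + 1 >= min_newfam_len:
        result.append(Hit("NewFam", "NewFam", cursor, seq_len))
    return result
```

With this rule a Pfam match that overlaps a CATH match is dropped, and any uncovered region of at least `min_newfam_len` residues is reported as a NewFam domain.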

  6. Domain families ranked by size (number of domain sequences) • Plot: percentage of all domain family sequences in UniProt against rank by family size • CATH superfamilies of known structure • Pfam families of unknown structure • NewFam of unknown structure (>50,000 families) • >90% of domain sequences in UniProt can be assigned to ~7000 domain families

  7. Domain families ranked by size (number of domain sequences) • Plot: percentage of all domain family sequences in UniProt against rank by family size • CATH superfamilies of known structure • Pfam families of unknown structure • NewFam of unknown structure (>50,000 families) • 100 largest families of known structure account for 30% of domain sequences in UniProt
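The coverage figures on the two slides above come from ranking families by size and accumulating their share of all domain sequences. A minimal sketch of that calculation follows; the function names are hypothetical and any sizes passed in would be the reader's own data, not the Gene3D counts.

```python
from itertools import accumulate


def cumulative_coverage(family_sizes):
    """Rank families from largest to smallest and return the running
    fraction of all domain sequences covered at each rank."""
    ranked = sorted(family_sizes, reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]


def families_needed(family_sizes, target):
    """Smallest number of top-ranked families whose combined size
    reaches the target fraction (0..1), or None if never reached."""
    for rank, coverage in enumerate(cumulative_coverage(family_sizes), start=1):
        if coverage >= target:
            return rank
    return None
```

Applied to real family sizes, `families_needed(sizes, 0.9)` would give the kind of number quoted on slide 6, and `cumulative_coverage(sizes)[99]` the share held by the 100 largest families.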

  8. Correlation of sequence and structural variability of CATH families with the number of different functional groups • Plot labels: structural diversity, population in genomes

  9. Exploiting Structural and Comparative Genomics to Reveal Protein Functions • Predicting domain structure families and their domain contexts • Exploring how structural divergence in domain families correlates with functional change • Predicting domain relatives likely to have significantly different structures and functions • CATH: domain families of known structure • Gene3D: protein families and domain annotations for completed genomes

  10. Some superfamilies show great structural diversity (Gabrielle Reeves, J. Mol. Biol. (2006)) • Multiple structural alignment by CORA allows identification of consensus secondary structures and secondary structure embellishments (2DSEC algorithm) • In 117 superfamilies, relatives have expanded more than 2-fold

  11. Structural embellishments can modify the active site Galectin binding superfamily

  12. Structural embellishments can modulate domain interactions • side orientation • face orientation • Glucose 6-phosphate dehydrogenase • Dihydrodipicolinate reductase • Additional secondary structures shown at (a) are involved in subunit interactions

  13. Structural embellishments can modify function by modifying active site geometry and mediating new domain and subunit interactions • ATP Grasp superfamily: biotin carboxylase, D-alanine-D-alanine ligase • Dimer of biotin carboxylase

  14. Secondary structure insertions are distributed along the chain but aggregate in 3D

  16. Histogram: frequency (%) of indels against size of indel (number of secondary structures, 1 to 12); bars marked as indel frequency < 1% are labelled 0.85%, 0.38%, 0.23%, 0.11%, 0.06% and 0.02% • 85% of insertions comprise only 1 or 2 secondary structures • For ~70% of domains analysed, 80% of the secondary structure embellishments are co-located in 3D with 3 or more other embellishments • In 80% of domains, 1 or more embellishments contact other domains or subunits
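The co-location statistic above can be read as a neighbour count: for each embellishment in a domain, how many other embellishments lie close to it in 3D. The slides do not say how proximity was measured, so the following is only a sketch. It assumes each embellishment is reduced to a centroid coordinate and that "co-located" means within a distance cutoff; the 10 Å default is an illustrative value.

```python
import math


def colocated_fraction(centroids, cutoff=10.0, min_neighbours=3):
    """Fraction of embellishments that have at least `min_neighbours`
    other embellishments within `cutoff` (same units as the
    coordinates). Both parameters are illustrative assumptions."""
    if not centroids:
        return 0.0
    colocated = 0
    for i, centre in enumerate(centroids):
        neighbours = sum(
            1
            for j, other in enumerate(centroids)
            if j != i and math.dist(centre, other) <= cutoff
        )
        if neighbours >= min_neighbours:
            colocated += 1
    return colocated / len(centroids)
```

A domain would then count towards the "~70% of domains" figure if this fraction is 0.8 or higher for its embellishments.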

  17. 3 Layer Alpha/Beta Sandwich 2 Layer Alpha/Beta Alpha/Beta Barrel 2 Layer Beta Sandwich Many structurally diverse superfamilies adopt folds with these regular layered architectures

  19. Exploiting Structural and Comparative Genomics to Reveal Protein Functions • Predicting domain structure families and their domain contexts • Exploring how structural divergence in domain families correlates with functional change • Predicting domain relatives likely to have significantly different structures and functions • CATH: domain families of known structure • Gene3D: protein families and domain annotations for completed genomes

  20. GEMMA – GEne Model and Model Annotation • Algorithm for Predicting Sequence Homologues with Similar Structures and Functions • structural superfamily • subfamily of close sequence relatives predicted to have similar functions (>=60% sequence identity) • Largest 100 CATH families have more than 20,000 subfamilies
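Subfamilies are defined above by a 60% sequence identity threshold. The slide does not say how identity is computed, so the sketch below makes one common assumption: identity is the share of identical residues among aligned columns where neither sequence has a gap. Function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where both sequences have a
    residue. Inputs are two rows of the same alignment, with '-'
    marking gaps. The choice of denominator is an assumption."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_subfamily(aligned_a, aligned_b, threshold=60.0):
    """Apply the >=60% identity criterion quoted on the slide."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other denominators (shorter sequence length, full alignment length) give different values for the same pair, which matters for pairs close to the threshold.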

  21. GEMMA – Predicting Functional Groups in CATH Superfamilies • structural superfamily • subfamily of close relatives predicted to have similar function (>60% identity) • Build multiple sequence alignments for each subfamily

  22. GEMMA – Predicting Functional Groups in CATH Superfamilies • structural superfamily • subfamily of close relatives predicted to have similar function (>60% identity) • Cluster subfamilies predicted to have similar functions into functional groups

  23. ATP Grasp family: 192 subfamilies • Pyruvate phosphate dikinase (subfamily 1) • Succinyl-CoA synthetase (subfamily 22) • SSAP score = 68.69, PSS score = 0.375 • SSAP score = 93.01, PSS score = 0.827 • Pyruvate phosphate dikinase (subfamily 15) • SSAP score = 68.32, PSS score = 0.333
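Merging subfamilies into functional groups can be pictured as linking every pair whose similarity score passes a threshold and taking the connected groups. The sketch below does this with a small union-find structure. It is not the published GEMMA procedure: the single-linkage rule and the threshold are assumptions, and the example pairing of PSS scores with subfamily pairs is one reading of the flattened slide (the highest score is assumed to belong to the two pyruvate phosphate dikinase subfamilies).

```python
def merge_subfamilies(subfamilies, pair_scores, threshold):
    """Single-linkage grouping: any pair scoring >= threshold ends up
    in the same functional group. `pair_scores` maps (a, b) tuples to
    a similarity score such as PSS."""
    parent = {s: s for s in subfamilies}

    def find(x):
        # Follow parent links to the group representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in subfamilies:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in groups.values())


# Assumed reading of the slide: subfamily ids mapped to PSS scores.
example_scores = {(1, 15): 0.827, (1, 22): 0.375, (15, 22): 0.333}
example_groups = merge_subfamilies([1, 15, 22], example_scores, threshold=0.6)
```

With the hypothetical threshold of 0.6, the two pyruvate phosphate dikinase subfamilies fall into one group and succinyl-CoA synthetase stays on its own.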

  24. Subfamily profiles coloured by residue conservation (red = high, blue = low) • Profiles aligned using profile-profile comparison (MAFFT) • Pyruvate phosphate dikinase vs pyruvate phosphate dikinase • Many fully conserved positions • 6/7 positions are fully conserved • Equivalent functions • Scorecons (Valdar and Thornton, Profunc)

  25. Subfamily profiles coloured by residue conservation (red = high, blue = low) • Profiles aligned using profile-profile comparison (MAFFT) • Succinyl-CoA synthetase vs pyruvate phosphate dikinase • Fully conserved positions • No fully conserved positions • Different functions • Scorecons (Valdar and Thornton, Profunc)
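The comparison on the two slides above asks whether positions that are fully conserved in one subfamily are also fully conserved in the other. The slides use Scorecons conservation scores; the sketch below substitutes a much simpler assumption, treating a column as fully conserved only when every sequence carries the same residue, and assumes the two alignments have already been brought into the same column frame.

```python
def conserved_residue(column):
    """Return the residue if the alignment column is fully conserved
    (one residue type, no gaps); otherwise None."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return next(iter(residues))
    return None


def shared_conserved_positions(columns_a, columns_b):
    """Count columns fully conserved in subfamily A, and how many of
    those are conserved with the same residue in subfamily B.
    Returns (shared, conserved_in_a)."""
    if len(columns_a) != len(columns_b):
        raise ValueError("profiles must be aligned to the same length")
    conserved_in_a = 0
    shared = 0
    for col_a, col_b in zip(columns_a, columns_b):
        res_a = conserved_residue(col_a)
        if res_a is None:
            continue
        conserved_in_a += 1
        if res_a == conserved_residue(col_b):
            shared += 1
    return shared, conserved_in_a
```

A result such as 6 shared out of 7 would point towards equivalent functions, and 0 shared towards different functions, in the spirit of the two slides.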

  26. Performance in Merging Subfamilies into Functional Groups • Plot: error rate against number of functional groups predicted • 10 experimentally identified enzyme functions in this family

  27. GEMMA – Predicting Functional Groups in CATH Superfamilies • structural superfamily • functional group • subfamily of close relatives predicted to have similar function (>60% identity) • Benchmarked on 12 large enzyme families in CATH • 6-10 fold reduction in the number of functional subfamilies

  28. Summary • More than half the domains in UniProt can be assigned to families of known structure • Analysis of some very large structural families revealed how secondary structure insertions can modulate functions • Functional groups can be identified in diverse families by comparing multiple features (e.g. residue conservation, predicted secondary structure)

  29. CATH, Gene3D: Lesley Greene, Stathis Sidderis, Russell Marsden, Ian Sillitoe, Sarah Addou, Juan Ranea, Tony Lewis, Dave Lee, Ollie Redfern, Alison Cuff, Mark Dibley, Ilhem Diboun, Adam Reid, Corin Yeats, Tim Dallman • http://www.biochem.ucl.ac.uk/bsm/cath_new • MRC, Wellcome Trust, NIH, EU (Biosapiens, Embrace, Enfin), BBSRC