IMG terms and pathways
Krishna Palaniappan, Amy Chen, Frank Korzeniewski, Yuri Grechkin, Ernest Szeto, Victor Markowitz
Natalia Ivanova, Iain Anderson, Thanos Lykidis, Nikos Kyrpides
MGM Workshop, May 16, 2012
New: SEED subsystems, Transport DB, Phenotypes
Why so many? What’s the difference? Which one should I use?
Where it all comes from
Experimental data: gene A in genome X
• catalyzes a reaction
• interacts with other protein(s)
• when knocked out, causes a certain phenotype
• …
This information is recorded in a structured way:
• ontologies (e.g. Gene Ontology)
• pathway collections (metabolic and protein-protein interaction)
• other (reasoning rules, like TIGR Genome Properties)
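As a minimal sketch of what "recorded in a structured way" can mean, the Python below stores each statement about a gene as a record carrying a Gene Ontology-style evidence code, so that experimentally supported statements (IDA, IPI, IMP and the other classic GO experimental codes) can be separated from similarity-based ones (ISS). The gene, genome and term labels are hypothetical placeholders, not IMG identifiers.

```python
from dataclasses import dataclass

# Classic GO evidence codes denoting direct experimental support
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

@dataclass(frozen=True)
class Annotation:
    gene: str      # hypothetical gene label, e.g. "geneA"
    genome: str    # hypothetical genome label, e.g. "genomeX"
    term: str      # hypothetical ontology term or pathway step
    evidence: str  # GO-style evidence code

    def is_experimental(self) -> bool:
        """True if the evidence code denotes a direct experiment."""
        return self.evidence in EXPERIMENTAL_CODES

annotations = [
    Annotation("geneA", "genomeX", "catalyzes reaction R", "IDA"),      # direct assay
    Annotation("geneA", "genomeX", "interacts with protein P", "IPI"),  # physical interaction
    Annotation("geneA", "genomeX", "knock-out phenotype Z", "IMP"),     # mutant phenotype
    Annotation("geneB", "genomeY", "catalyzes reaction R", "ISS"),      # sequence similarity only
]

experimental = [a for a in annotations if a.is_experimental()]
print([a.gene for a in experimental])  # ['geneA', 'geneA', 'geneA']
```

Keeping the evidence code on every record is what later allows inferred annotations (gene B) to be told apart from the experiments they were derived from (gene A).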
Modeling the data properly: why nobody does that
• Genes are connected to phenotypes via a multi-step process with many parameters
• For the majority of genes and phenotypes, we have only very vague ideas about these steps and parameters
• If we design a relational database for gene/phenotype connections, most tables will be empty
[Diagram: phenotype, gene, pathway, transcript, reaction, protein, enzyme, compounds, evidence]
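To make the "most tables will be empty" point concrete, here is a hypothetical, fully normalized SQLite schema with one table per entity from the diagram, populated with a single knock-out observation. Table names and columns are assumptions for illustration, not an actual IMG schema.

```python
import sqlite3

# Hypothetical schema: one table per entity between gene and phenotype
ENTITIES = ["gene", "transcript", "protein", "enzyme", "reaction",
            "compound", "pathway", "phenotype", "evidence"]

conn = sqlite3.connect(":memory:")
for name in ENTITIES:
    conn.execute(f"CREATE TABLE {name} (id TEXT PRIMARY KEY, description TEXT)")

# Direct gene -> phenotype link, bypassing all intermediate steps
conn.execute("CREATE TABLE gene_phenotype (gene_id TEXT, phenotype_id TEXT, evidence_id TEXT)")

# A typical knock-out experiment: gene and phenotype are known, little in between
conn.execute("INSERT INTO gene VALUES ('geneA', 'gene A in genome X')")
conn.execute("INSERT INTO phenotype VALUES ('ph1', 'knock-out phenotype')")
conn.execute("INSERT INTO evidence VALUES ('ev1', 'gene knock-out experiment')")
conn.execute("INSERT INTO gene_phenotype VALUES ('geneA', 'ph1', 'ev1')")

# Count rows in every entity table and list the empty ones
row_counts = {name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
              for name in ENTITIES}
empty = [name for name, n in row_counts.items() if n == 0]
print(empty)  # ['transcript', 'protein', 'enzyme', 'reaction', 'compound', 'pathway']
```

Two thirds of the entity tables stay empty for this one observation, and most real observations look like this one.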
What it looks like in real life: KEGG vs MetaCyc
KEGG: http://www.genome.jp/kegg/
MetaCyc: http://metacyc.org/
Ammonia oxidation pathway in KEGG
Plus 4 more entries for EC 1.14.99.39, for each subunit
The same pathway/reaction in MetaCyc
Similar problems to KEGG:
• multifunctional enzymes
• multisubunit enzymes
• differences in how reactions are recorded
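One kind of recording difference can be illustrated by hypothetical records of the same reaction that use different compound names. The sketch below builds an order- and direction-independent key after mapping names through a small, hypothetical synonym table; stoichiometry is ignored. It reconciles naming only: a record that writes the electron donor differently (e.g. "AH2" versus "H+" and "e-") still yields a different key, which is exactly the kind of discrepancy that needs manual curation.

```python
# Hypothetical synonym table: maps compound names used by different
# collections onto one canonical label (lookups are case-insensitive).
SYNONYMS = {
    "ammonia": "NH3", "nh3": "NH3",
    "hydroxylamine": "NH2OH", "nh2oh": "NH2OH",
    "oxygen": "O2", "o2": "O2",
    "water": "H2O", "h2o": "H2O",
}

def canonical(name: str) -> str:
    """Return the canonical label, or the lower-cased name if it is unknown."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)

def reaction_key(substrates, products) -> frozenset:
    """Order- and direction-independent key; stoichiometry is ignored."""
    left = frozenset(canonical(c) for c in substrates)
    right = frozenset(canonical(c) for c in products)
    return frozenset({left, right})

# Two hypothetical records of the same reaction with different naming
record_1 = (["Ammonia", "Oxygen", "AH2"], ["Hydroxylamine", "Water", "A"])
record_2 = (["NH3", "O2", "AH2"], ["NH2OH", "H2O", "A"])
# A third hypothetical record that writes the electron donor differently
record_3 = (["NH3", "O2", "H+", "e-"], ["NH2OH", "H2O"])

print(reaction_key(*record_1) == reaction_key(*record_2))  # True
print(reaction_key(*record_1) == reaction_key(*record_3))  # False
```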
Even the MetaCyc record is still incomplete
• Which subunit has which cofactor? What type of Cu2+ cluster, what type of Fe2+ cluster?
• One of the subunits is a cytochrome c, yet the enzyme is cytosolic?
• Does it require any help with maturation of its metal clusters?
• Pseudomonas sp. PB16 was shown to have only one enzyme from the pathway, hydroxylamine reductase. Does it have the entire pathway?
Even bigger mess: bioinformatics inference
Experimental data: gene A in genome X
• catalyzes a reaction
• interacts with other protein(s)
• when knocked out, causes a certain phenotype
• …
What about gene B in genome Y, which is similar to gene A?
“True or false?” game
• If the GenBank record says nothing about the annotation protocol for gene B, the annotation must be correct
• If the GenBank record says the gene was manually annotated, the annotation must be correct
• If the GenBank record says gene B was manually annotated, and it has a bidirectional best BLAST hit to gene A with an e-value of 1.0e-5, the annotation must be correct
• …
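The bidirectional best hit criterion from the third statement can be sketched as follows. The hit tables are hypothetical (query, subject, e-value) triples, such as might be parsed from tabular BLAST output. With the 1.0e-5 threshold from the slide, gene A and gene B qualify as a bidirectional best hit pair, yet nothing in the calculation says whether the two proteins share a function.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query, subject, e-value)

def best_hits(hits: List[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its lowest-e-value subject, ignoring hits above max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(x_vs_y: List[Hit], y_vs_x: List[Hit],
                            max_evalue: float = 1.0e-5) -> List[Tuple[str, str]]:
    """Pairs (x, y) where y is x's best hit and x is y's best hit."""
    forward = best_hits(x_vs_y, max_evalue)
    reverse = best_hits(y_vs_x, max_evalue)
    return sorted((x, y) for x, y in forward.items() if reverse.get(y) == x)

# Hypothetical hits between genome X (gene A) and genome Y (genes B and C)
x_vs_y = [("geneA", "geneB", 1.0e-5), ("geneA", "geneC", 1.0e-3)]
y_vs_x = [("geneB", "geneA", 1.0e-5), ("geneC", "geneA", 1.0e-3)]

print(bidirectional_best_hits(x_vs_y, y_vs_x))           # [('geneA', 'geneB')]
print(bidirectional_best_hits(x_vs_y, y_vs_x, 1.0e-10))  # []
```

Tightening the threshold only changes which pairs pass; it does not turn a sequence-similarity criterion into evidence of shared function.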
Weaknesses
• Orthology detection fails on many families whose history deviates from vertical transmission (e.g. through horizontal gene transfer)
• BLAST is agnostic of which amino acids are most important for protein function
• Using a consensus sequence model (either a PSSM or an HMM) with family-specific bit score cutoffs would be much better, but this cannot be used in the current implementation of KEGG
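A position-specific scoring matrix (PSSM) with a family-specific cutoff can be sketched in a few lines. This is a deliberately simplified version: ungapped alignments of equal length, a uniform background distribution and a fixed pseudocount, all assumptions made for brevity. Production tools such as PSI-BLAST or HMMER handle gaps, background frequencies and sequence weighting properly. The alignment and the cutoff value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds scores (bits) per column, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score(pssm: List[Dict[str, float]], sequence: str) -> float:
    """Bit score of an ungapped sequence with the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))

def is_member(pssm: List[Dict[str, float]], sequence: str, family_cutoff: float) -> bool:
    """Membership is decided by the family's own cutoff, not a global e-value."""
    return score(pssm, sequence) >= family_cutoff

# Hypothetical family alignment: the second (C) and fourth (H) columns are fully conserved
family = ["ACDHKL", "SCEHRL", "ACDHKI", "GCNHKL"]
pssm = build_pssm(family)
FAMILY_CUTOFF = 6.0  # hypothetical, family-specific bit score cutoff

print(is_member(pssm, "ACDHKL", FAMILY_CUTOFF))  # True
print(is_member(pssm, "WWWWWW", FAMILY_CUTOFF))  # False
```

Unlike a single pairwise alignment, the scores here come from the whole family, and each family carries its own cutoff instead of sharing one e-value threshold.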
Pathway collections: KEGG, MetaCyc and others
Which particular set of interactions is a pathway? (i.e. how do we define pathway boundaries within the network?)
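The boundary problem can be shown on a toy network. The sketch below is a hypothetical helper, not how any collection actually defines its pathways: it walks a simplified, textbook-style nitrification chain from a start compound and stops at whichever compounds are declared as boundaries. The same network yields a different "pathway" depending on that choice.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical, simplified reaction network: reaction id -> (substrates, products)
REACTIONS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "R1": ({"ammonia"}, {"hydroxylamine"}),
    "R2": ({"hydroxylamine"}, {"nitrite"}),
    "R3": ({"nitrite"}, {"nitrate"}),
}

def pathway(start: str, boundary: Iterable[str]) -> List[str]:
    """Reactions reachable from `start` without consuming a boundary compound."""
    stop = set(boundary)
    seen_compounds = {start}
    selected: Set[str] = set()
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        if compound in stop:
            continue  # boundary compound: do not follow reactions that consume it
        for rid, (substrates, products) in REACTIONS.items():
            if compound in substrates and rid not in selected:
                selected.add(rid)
                for product in products - seen_compounds:
                    seen_compounds.add(product)
                    queue.append(product)
    return sorted(selected)

print(pathway("ammonia", {"nitrite"}))  # ['R1', 'R2']
print(pathway("ammonia", {"nitrate"}))  # ['R1', 'R2', 'R3']
```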
Ideal solution: pathway NR (non-redundant)
• All pathway collections share a common skeleton of reactions, which consist of reactants (compounds)
• All reactions share a common base of proteins annotated as catalysts
• Can we merge the information from different collections, using the best features of each?
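A minimal sketch of the merging idea, assuming that compound names have already been mapped to shared identifiers across collections: reaction records are keyed by their (substrates, products) sets, so equivalent records from different sources collapse into one entry that keeps every source reference and every annotated catalyst. Collection names, record ids and catalyst labels are hypothetical, the reactions are simplified (unbalanced), and the key is direction-sensitive and ignores stoichiometry.

```python
from typing import Dict, FrozenSet, Tuple

ReactionKey = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)

def merge_collections(collections: Dict[str, Dict[str, dict]]) -> Dict[ReactionKey, dict]:
    """Collapse equivalent reaction records from several collections into one entry."""
    merged: Dict[ReactionKey, dict] = {}
    for source, records in collections.items():
        for record_id, record in records.items():
            key = (frozenset(record["substrates"]), frozenset(record["products"]))
            entry = merged.setdefault(key, {"sources": set(), "catalysts": set()})
            entry["sources"].add(f"{source}:{record_id}")
            entry["catalysts"].update(record["catalysts"])
    return merged

# Hypothetical records; compound names are assumed to be shared identifiers already
collections = {
    "collection_1": {
        "r1": {"substrates": ["ammonia", "O2"], "products": ["hydroxylamine", "H2O"],
               "catalysts": ["E1"]},
    },
    "collection_2": {
        "x9": {"substrates": ["O2", "ammonia"], "products": ["H2O", "hydroxylamine"],
               "catalysts": ["E1", "E1 subunit B"]},
        "x10": {"substrates": ["hydroxylamine"], "products": ["nitrite"],
                "catalysts": ["E2"]},
    },
}

nr = merge_collections(collections)
print(len(nr))  # 2
```

The merged entry for the first reaction lists both `collection_1:r1` and `collection_2:x9` as sources and the union of their catalysts.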
IMG terms: 3 types
[Diagram: reaction R1 connects A and B and is catalyzed by an enzyme (EC x.x.x.x); further reactions R2 (spontaneous), R3 (spontaneous) and R4 (chaperone); enzyme forms shown: a monomeric precursor, a monomer that needs cofactor C, a heterotrimer of subunits A, B and C that needs cofactor D, and a subunit A precursor. Labels mark examples of IMG terms of the types "Gene product", "Protein complex" and "Modified protein", and one element as "Not an IMG term!"]
IMG terms are of 3 types:
1. gene product
2. multi-subunit protein complex
3. modified protein
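The three IMG term types can be modeled as a small type hierarchy. This is a hypothetical sketch, not IMG's actual schema: a gene product is made by one gene, a protein complex exists only if all of its subunits can be made, and a modified protein exists if its precursor can be made. Maturation steps (spontaneous or chaperone-assisted cofactor insertion) are treated as always available, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Union

@dataclass(frozen=True)
class GeneProduct:
    name: str  # product of a single gene, e.g. one subunit or a precursor

@dataclass(frozen=True)
class ProteinComplex:
    name: str
    subunits: FrozenSet[GeneProduct]  # all subunits must be present

@dataclass(frozen=True)
class ModifiedProtein:
    name: str
    precursor: Union[GeneProduct, ProteinComplex]
    modification: str  # e.g. cofactor insertion

Term = Union[GeneProduct, ProteinComplex, ModifiedProtein]

def can_form(term: Term, genome_products: Set[GeneProduct]) -> bool:
    """Hypothetical check: can a genome with these gene products make this term?"""
    if isinstance(term, GeneProduct):
        return term in genome_products
    if isinstance(term, ProteinComplex):
        return all(can_form(s, genome_products) for s in term.subunits)
    # ModifiedProtein: maturation is assumed to be always possible
    return can_form(term.precursor, genome_products)

subunit_a = GeneProduct("subunit A")
subunit_b = GeneProduct("subunit B")
subunit_c = GeneProduct("subunit C")
trimer = ProteinComplex("heterotrimeric enzyme", frozenset({subunit_a, subunit_b, subunit_c}))
active = ModifiedProtein("heterotrimeric enzyme with cofactor D", trimer, "cofactor D")

print(can_form(active, {subunit_a, subunit_b}))             # False
print(can_form(active, {subunit_a, subunit_b, subunit_c}))  # True
```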