1 / 13

Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

Large-Scale Metabolic Network Alignment: MetaCyc and KEGG. Tomer Altman Bioinformatics Research Group SRI International taltman@ai.sri.com. Problem Motivation. There are an increasing number of ‘encyclopedic’ metabolic networks, or reaction databases

golda
Download Presentation

Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Large-Scale Metabolic Network Alignment: MetaCyc and KEGG Tomer Altman Bioinformatics Research Group SRI International taltman@ai.sri.com

  2. Problem Motivation • There are an increasing number of ‘encyclopedic’ metabolic networks, or reaction databases • KEGG and MetaCyc, plus Rhea, BRENDA, and GO • A natural question to ask is, “what is similar / different between them?” • There has been some linking of MetaCyc compounds to KEGG, but none for reactions

  3. Challenges with Mapping Objects • Multiple aspects to compare (name, chemical structure, reaction substrates, external identifiers) • Inexact naming • Inexact structures (different specificity of stereocenters) • Inexact description of reactions (classes vs. instances, proton-balancing) • How to combine the evidence in a logical fashion

  4. Evidence / Prediction Data Structure • Prediction type • Predictor name • MetaCyc Object Frame ID • KEGG Object identifier • Iteration number • Parameters used

  5. Compound Evidence • Curated MetaCyc links to KEGG • Name matching • PubChem identifier mapping (used for ChEBI as well) • Molecular Fingerprint Tanimoto Similarity Coefficient • InChI string comparison • Exact Sub-Structure Match (no stereochemistry) • ‘All-but-one’ inference

  6. Compound Prediction Detail: ‘All-but-one’ • Most of the compounds between these two reactions are the same • Class vs. instance, and naming issues lead to unknown match between “acceptor” and “oxidized electron acceptor”

  7. Reaction Evidence • EC Numbers • UniProt Accession Numbers • Name matches (gleaned from associated objects) • Exact equation match • Inexact equation match (cosine similarity)

  8. Reaction Prediction Detail: UniProt Mapping • Use UniProt Accession numbers to map the enzymes in MetaCyc and KEGG to one another • Use UniRef 90 or 100 to map “the same protein” when not exact same Accession Number

  9. From Evidence to Prediction • Rule #1: Evidence is partitioned into exact and inexact quality types • Rule #2: Evidence is partitioned (orthogonally) into qualitative and quantitative types • Rule #3: A high-quality prediction will consist of qualitative and quantitative evidence in favor • Rule #4: We try exact quality types first, then inexact • Rule #5: No contradictory predictions • Rule #6: Iterate until no new predictions • This is the general idea, but implementation has some domain-based heuristics thrown in, such as with EC numbers

  10. Statistics • 3269 MetaCyc reactions with links to KEGG (~75%) • 4004 MetaCyc compounds with links to KEGG (>50%)

  11. Future Work • Greatest Common Subgraph Compound Matching • Sensitivity / Specificity characterization of all evidence types • Analysis of unmatched content of KEGG and MetaCyc for algorithm improvement and focused curation (inexact reaction matching and novel reaction import) • Hierarchical clustering of compounds and reactions • A general tool that can be used for any two given metabolic networks • Recasting algorithm in a machine learning framework

  12. Peter Karp Douglas Brutlag Dan Davison Anamika Kothari Joseph Dale Acknowledgements MetaCyc.org

  13. … One more thing! • Pathway Tools API (beta) • http://www.ai.sri.com/~taltman/ptools-api/ptools_api.html

More Related