Bio-ontologies for Annotation and Service Discovery

Bio-ontologies for Annotation and Service Discovery. Chris Wroe (+ material from Carole Goble, Alan Rector, Jeremy Rogers, Ian Horrocks), University of Manchester, UK. Overview: example-driven tour of the why, what and how of ontologies in life sciences.


Presentation Transcript


  1. Bio-ontologies for Annotation and Service Discovery • Chris Wroe (+ material from Carole Goble, Alan Rector, Jeremy Rogers, Ian Horrocks) • University of Manchester, UK

  2. Overview • Example driven tour of the why, what and how of ontologies in life sciences • Cover the key features of an ontology • Vocabulary, definitions, hierarchies, grammar & reasoning • Cover the key targets of ontology use • Biological knowledge, service descriptions, (database schema)

  3. Ontology – the discipline • Semantics – the meaning of meaning. • Philosophical discipline, branch of philosophy that deals with the nature and the organisation of reality. • Science of Being (Aristotle, Metaphysics, IV,1) • What is being? • What are the features common to all beings?

  4. In science…ontology the thing • A resource to aid the precise communication and integration of information • Binds a community to communicate information in some domain of interest in a consistent manner.

  5. Gene Ontology – a community effort • Model organism databases need to be integrated • Not possible if they all use a different vocabulary • Gene Ontology Consortium got together to form • “a dynamic controlled vocabulary that can be applied to all eukaryotes”

  6. Gene Ontology – keeping it simple • Provide three separate vocabularies to describe: • The function a gene product is capable of. • The process a gene product takes part in. • The location at which the gene product has been found.

  7. Annotation • GO annotations: gene detail page in MGD for the vitamin D receptor gene, Vdr

  8. Annotation • Feature 1: Ontologies provide a shared controlled vocabulary of concepts. • GO annotations: gene detail page in MGD for the vitamin D receptor gene, Vdr

  9. Gene ontology - definitions • A diverse community, so explicit definitions are important. • 60% of GO concepts have a textual definition, e.g. • apoptotic nuclear changes (GO:0030262): Changes affecting the nucleus and its contents during apoptosis; includes condensation and fragmentation of nuclear DNA and of the nucleus itself.

  10. Gene ontology - definitions • Feature 2: Ontologies provide an agreed definition for each concept to ensure each concept is used in the same way. • A diverse community, so explicit definitions are important. • 60% of GO concepts have a textual definition, e.g. • apoptotic nuclear changes (GO:0030262): Changes affecting the nucleus and its contents during apoptosis; includes condensation and fragmentation of nuclear DNA and of the nucleus itself.

  11. Gene ontology – organisation • An alphabetical list of 11000 terms is not enough • Hierarchies allow similar terms to be grouped together. • [Figure: example hierarchy with the terms biological process, death, cell death, tissue death, necrosis, histolysis]

  12. Gene ontology – hierarchy use • GO hierarchy is used for • Navigation of concepts by users • Indexing of information in databases • Aggregating information

  13. Taxonomy remark 1 • The world is not a tree, it’s a lattice • [Figure: lattice of animal categories with the nodes animal, wild, vermin, domestic, pet, working, rodent, dog, cow, mouse, cat]

  14. Taxonomy remark 2 • What does the taxonomy mean? • Concept A is a parent of concept B iff every instance of B is also an instance of A • Superset/subset • ICONCLASS example, entries under Door: Closing the Door, Monumental Door, Metalwork of a Door, Door-Knocker, Threshold, Door-keeper • [Figure annotations: action associated with a door, kind of a door, something attached to a door]
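The superset/subset reading of the taxonomy can be sketched in a few lines of Python. This is only an illustration: the individuals (`front_door`, `city_gate_door`, `porter_1`) are invented, and the class names are borrowed from the ICONCLASS entries on the slide.

```python
# Toy extensions: each class maps to its set of instances.
# The individuals are hypothetical, invented purely for illustration.
instances = {
    "Door": {"front_door", "city_gate_door"},
    "Monumental Door": {"city_gate_door"},
    "Door-keeper": {"porter_1"},
}

def is_parent(a: str, b: str) -> bool:
    """A is a parent of B iff every instance of B is also an instance of A."""
    return instances[b] <= instances[a]

print(is_parent("Door", "Monumental Door"))  # True: a kind of door
print(is_parent("Door", "Door-keeper"))      # False: associated with doors, not a door
```

Under this definition, filing Door-keeper beneath Door is an association rather than a true is-a link, which is the distinction the slide draws.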

  15. Classification trickiness • The Celestial Emporium of Benevolent Knowledge, Borges • "On those remote pages it is written that animals are divided into: a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigs e. mermaids f. fabulous ones g. stray dogs h. those that are included in this classification i. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance"

  16. Classification is task and culture specific • Dyirbal classification of objects in the universe: • Bayi: men, kangaroos, possums, bats, most snakes, most fishes, some birds, most insects, the moon, storms, rainbows, boomerangs, some spears, etc. • Balan: women, anything connected with water or fire, bandicoots, dogs, platypus, echidna, some snakes, some fishes, most birds, fireflies, scorpions, crickets, the stars, shields, some spears, some trees, etc. • Balam: all edible fruit and the plants that bear them, tubers, ferns, honey, cigarettes, wine, cake. • Bala: parts of the body, meat, bees, wind, yamsticks, some spears, most trees, grass, mud, stones, noises, language, etc.

  17. Gene ontology – directed acyclic graphs • Each concept is explicitly grouped by either is-a or part-of relationships • Functions are often grouped by type • Cellular components are often grouped by part • Each concept can have multiple parents • A concept's position is represented by a directed acyclic graph • Hierarchies are handcrafted so as to suit the ‘culture’ of biologists
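A minimal sketch of such a graph, assuming a simple edge-list representation (this is not GO's actual file format). The is_a edges follow the heparin biosynthesis hierarchy shown later in this talk; the part_of edge is an illustrative assumption.

```python
from collections import defaultdict

# (child, relation, parent) edges. The is_a edges follow the heparin example
# later in this talk; the part_of edge is an illustrative assumption.
EDGES = [
    ("heparin biosynthesis", "is_a", "glycosaminoglycan biosynthesis"),
    ("heparin biosynthesis", "is_a", "heparin metabolism"),
    ("glycosaminoglycan biosynthesis", "is_a", "aminoglycan biosynthesis"),
    ("aminoglycan biosynthesis", "is_a", "carbohydrate biosynthesis"),
    ("carbohydrate biosynthesis", "is_a", "biosynthesis"),
    ("nucleus", "part_of", "cell"),
]

# Each concept may have several parents, so the structure is a DAG, not a tree.
parents = defaultdict(list)
for child, relation, parent in EDGES:
    parents[child].append((relation, parent))

def ancestors(term):
    """All terms reachable by following parent edges of any relation type."""
    seen = set()
    stack = [term]
    while stack:
        for _relation, parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("heparin biosynthesis")))
```

Keeping the relation label on every edge is one way to make the principle of grouping explicit, so that a tool can later choose to follow only is_a or only part_of links.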

  18. Feature 3: Ontologies organise concepts in multiple ways for multiple uses. Principle of grouping should be explicit.

  19. Taking it further • GO concepts are often phrases • insulin control element activator complex, insulin processing, insulin receptor, insulin receptor complex, insulin receptor ligand, insulin receptor signalling pathway, insulin secretion, insulin activated sodium/amino acid transporter • Components of the phrase are hidden to computer applications

  20. Explicit conceptualisation • Semantic similarity searching • Automated maintenance of hierarchies. • What we need is.. • A formal grammar with which to compose phrases • Software which can interpret phrases and produce sound and complete hierarchies

  21. The exploding bicycle • ICD-9 (E826) 8 • READ-2 (T30..) 81 • READ-3 87 • ICD-10 (V10-19) 587 • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

  22. Defusing the exploding bicycle: 500 codes in pieces • 10 things to hit… • Pedestrian / cycle / motorbike / car / HGV / train / unpowered vehicle / a tree / other • 5 roles for the injured… • Driving / passenger / cyclist / getting in / other • 5 activities when injured… • resting / at work / sporting / at leisure / other • 2 contexts… • In traffic / not in traffic • V12.24 Pedal cyclist injured in collision with two- or three-wheeled motor vehicle, unspecified pedal cyclist, nontraffic accident, while resting, sleeping, eating or engaging in other vital activities
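The arithmetic behind "500 codes in pieces" can be checked with a short sketch. The axis sizes are the ones stated on the slide; the axis names and member names are generated placeholders, not real ICD-10 values.

```python
from itertools import product
from math import prod

# Axis sizes as stated on the slide; axis and member names are placeholders.
axis_sizes = {"thing_hit": 10, "role": 5, "activity": 5, "context": 2}
axes = {name: [f"{name}_{i}" for i in range(n)] for name, n in axis_sizes.items()}

pieces = sum(axis_sizes.values())        # building blocks to maintain
codes = prod(axis_sizes.values())        # fully enumerated codes
combos = list(product(*axes.values()))   # every composable phrase

print(pieces, codes, len(combos))  # 22 500 500
```

Maintaining 22 building blocks and composing them on demand covers the same ground as enumerating 500 codes by hand.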

  23. Coordination: Conceptual Lego • [Figure: building blocks labelled hand, extremity, body, lung, inflammation, infection, abnormal, normal, gene, protein, cell, expression, chronic, acute, bacterial, deletion, polymorphism, ischaemic]

  24. Conceptual Lego • “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…” • “Hand which is anatomically normal”

  25. DAML+OIL • Specifically designed to build phrases in a compositional manner • Becoming a standard ontology interchange language • Adopted by W3C and will soon become the Web Ontology Language (OWL)

  26. Reasoning support • Consistency: check if knowledge is meaningful • Subsumption: structure knowledge, compute taxonomy • Equivalence: check if two classes denote the same set of instances • Instantiation: check if individual i is an instance of class C • Retrieval: retrieve the set of individuals that instantiate C • Problems all reducible to consistency (satisfiability)
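The reductions to satisfiability can be written out as follows. This is the standard description logic formulation and assumes a logic with full negation; the symbols are not from the slides: $\mathcal{K}$ stands for the knowledge base, $C$ and $D$ for classes, and $i$ for an individual.

```latex
\begin{align*}
C \sqsubseteq D &\iff C \sqcap \lnot D \text{ is unsatisfiable w.r.t. } \mathcal{K} \\
C \equiv D &\iff C \sqsubseteq D \text{ and } D \sqsubseteq C \\
\mathcal{K} \models C(i) &\iff \mathcal{K} \cup \{\lnot C(i)\} \text{ is inconsistent} \\
\mathrm{retrieve}(C) &= \{\, i \mid \mathcal{K} \models C(i) \,\}
\end{align*}
```

A reasoner therefore needs only one core procedure, a satisfiability test, to answer all five kinds of query.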

  27. Gene Ontology Next Generation • Early aim • Proof of concept showing DAML+OIL & description logic can practically help in at least one aspect of GO maintenance. • In cooperation with Mike Ashburner and the GO editorial team • Further aims • Prototype an evolutionary environment in which the benefits can be replicated on a larger scale

  28. Preliminary task • Providing an exhaustive is-a taxonomy • GO is-a poly-hierarchy • It becomes increasingly laborious to make sure that all concepts are linked to all possible is-a parents

  29. Metabolism terms: e.g. heparin biosynthesis [i] (GO:0006024) [chemical] biosynthesis (GO:0009058) [i] carbohydrate biosynthesis (GO:0016051) Axis 1: Chemicals [i] aminoglycan biosynthesis (GO:0006023) [i] glycosaminoglycan biosynthesis (GO:0006024) [i] heparin biosynthesis (GO:0030210) Axis 2: Process [i] heparin metabolism (GO:0030202) [i] heparin biosynthesis (GO:0030210)

  30. Is this important? • Complete taxonomy not necessary for browsing by biologist (and may actually get in the way) • BUT… improves fidelity of DB record retrieval. • Asking for records annotated with ‘glycosaminoglycan biosynthesis’ or more specific will lead to an additional result O94923 SPTr ISS - D-glucuronyl C5-epimerase (Fragment)
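The retrieval effect can be sketched as a query for a term "or more specific", run before and after the missing is-a link is added. The assumption that record O94923 is annotated with heparin biosynthesis is ours, made for illustration; the slide only states that the record appears as an additional result.

```python
def term_and_descendants(term, is_a):
    """Return term plus every term below it; is_a maps child -> set of parents."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent_set in is_a.items():
            if child not in result and parent_set & result:
                result.add(child)
                changed = True
    return result

def retrieve(term, is_a, annotations):
    """Records annotated with term or with any more specific term."""
    wanted = term_and_descendants(term, is_a)
    return sorted(rec for rec, annotated in annotations.items() if annotated in wanted)

# Assumed annotation, for illustration only.
annotations = {"O94923": "heparin biosynthesis"}

before = {"heparin biosynthesis": {"heparin metabolism"}}
after = {"heparin biosynthesis": {"heparin metabolism",
                                  "glycosaminoglycan biosynthesis"}}

print(retrieve("glycosaminoglycan biosynthesis", before, annotations))  # []
print(retrieve("glycosaminoglycan biosynthesis", after, annotations))   # ['O94923']
```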

  31. How can we support the task? • Step 0. Translate to DAML+OIL syntax • Provided by OilEd • Provide DAML+OIL based definitions of GO concepts – initially in the metabolism area

  32. DAML+OIL definitions for metabolism concepts • heparin biosynthesis • class heparin biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass heparin (acts_on is unique) • Paraphrase: biosynthesis which acts solely on heparin • glycosaminoglycan biosynthesis • class glycosaminoglycan biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass glycosaminoglycan

  33. DAML+OIL definitions for metabolism concepts • heparin biosynthesis • class heparin biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass heparin (acts_on is unique) • Paraphrase: biosynthesis which acts solely on heparin • glycosaminoglycan biosynthesis • class glycosaminoglycan biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass glycosaminoglycan • Feature 4: Ontologies provide a formal computer interpretable concept definition.

  34. A chemical ontology • Initially used MeSH to create a DAML+OIL ontology from a subset of the chemical taxonomy (using UMLS tools/API) • Provides the following information: carbohydrates [i] polysaccharides [i] glycosaminoglycans [i] heparin

  35. Reason over the combination • Combine GO definitions with chemical ontology using OilEd API • Send to FaCT DL reasoner…

  36. Paraphrased reasoning process • heparin biosynthesis • class heparin biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass heparin • glycosaminoglycan biosynthesis • class glycosaminoglycan biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass glycosaminoglycan • [Figure: is-a arrow from heparin to glycosaminoglycan]

  37. Inferring a new is-a link • heparin biosynthesis • class heparin biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass heparin • glycosaminoglycan biosynthesis • class glycosaminoglycan biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass glycosaminoglycan • [Figure: is-a arrow between the chemicals and a second, inferred is-a arrow between the two biosynthesis classes]
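The inference can be paraphrased in code. This is a hand-rolled structural check written for illustration, not the algorithm the FaCT reasoner uses; the `Definition` type and function names are hypothetical, and the chemical taxonomy is the one listed in this talk (with the class names in the singular).

```python
from typing import NamedTuple

class Definition(NamedTuple):
    base: str      # the process class, e.g. "biosynthesis"
    acts_on: str   # filler of the acts_on restriction

# Chemical is-a links (child -> parent), taken from the chemical ontology slide.
CHEMICAL_IS_A = {
    "heparin": "glycosaminoglycan",
    "glycosaminoglycan": "polysaccharide",
    "polysaccharide": "carbohydrate",
}

def is_a(child, parent, taxonomy):
    """True if child equals parent or reaches it by following is-a links."""
    while child is not None:
        if child == parent:
            return True
        child = taxonomy.get(child)
    return False

def subsumes(general, specific, process_is_a, chemical_is_a):
    """general subsumes specific if both its base class and its filler do."""
    return (is_a(specific.base, general.base, process_is_a)
            and is_a(specific.acts_on, general.acts_on, chemical_is_a))

heparin_bs = Definition("biosynthesis", "heparin")
gag_bs = Definition("biosynthesis", "glycosaminoglycan")

print(subsumes(gag_bs, heparin_bs, {}, CHEMICAL_IS_A))  # True: new is-a link
print(subsumes(heparin_bs, gag_bs, {}, CHEMICAL_IS_A))  # False
```

Because heparin is a glycosaminoglycan, anything that is a biosynthesis acting on heparin is also a biosynthesis acting on a glycosaminoglycan, which yields the new parent for heparin biosynthesis.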

  38. Inferring a new is-a link • heparin biosynthesis • class heparin biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass heparin • glycosaminoglycan biosynthesis • class glycosaminoglycan biosynthesis defined subClassOf biosynthesis restriction onProperty acts_on hasClass glycosaminoglycan • Feature 5: Ontologies can become a dynamic service with reasoning support. • [Figure: is-a arrow between the chemicals and a second, inferred is-a arrow between the two biosynthesis classes]

  39. Output • OilEd API reports additional inferred is-a relationships, e.g. heparin biosynthesis has new is-a parent glycosaminoglycan biosynthesis • Sanitised version sent to GO editorial team for comment. • They (Jane Lomax) make changes to GO if appropriate and send back queries

  40. Results • Carbohydrate metabolism • 22 additional is-a links, 17 of which are now in GO • Amino acid metabolism • Further 17 additional is-a links now in GO • Currently preparing results for metabolism as a whole

  41. Where next with GONG? • Moving from proof of concept requires dedicated software tools to support the process. • Authoring/ Curation of DAML+OIL definitions • Tracking GO as it evolves • Tracking suggested changes and response to changes.

  42. myGrid & high level ontologies • myGrid: Personalised extensible environments for data-intensive in silico experiments in biology • Higher level services: workflow, databases, knowledge management, provenance… • Bioinformatics services are published as Web services (and soon Grid Services) • http://www.ebi.ac.uk/collab/mygrid/service0/axis/index.html

  43. Ontologies for Service Discovery • Find appropriate type of services • sequence alignment • Find appropriate instances of that service • BLAST (an algorithm for sequence alignment), as delivered by NCBI • Assist in forming an appropriate assembly of discovered services. • Find, select and execute instances of services while the workflow is being enacted. Knowledge in the head of expert bioinformatician

  44. An in silico experiment as a workflow • [Figure: workflow diagram with the labels Protein name, Fetch sequences, Similar, Fetch Structure, modelling, RASMOL, View, WF]

  45. Four-tiered service descriptions (ranging from domain “semantic” to business “operational”) • Class of service: • a protein sequence alignment, a protein sequence database. • Specific example of an abstract service: • BLAST, SWISS-PROT. • Instance service description of a specific service: • BLAST, SWISS-PROT as offered by the EBI. • Invoked instance service description: • BLAST as offered by the EBI on a particular date, with particular parameters when a service was actually enacted.

  46. Service description phrases • Build up a phrase describing classes of service functionality. • Building blocks for phrase come from a suite of ontologies • Template for the description based on DAML-S specialised for bioinformatics. • Use reasoning to maintain a classification of services
