
A Process Ontology for Cell Biology

Stuart Aitken, Artificial Intelligence Applications Institute


Presentation Transcript


  1. A Process Ontology for Cell Biology Stuart Aitken Artificial Intelligence Applications Institute

  2. Outline • Rapid Knowledge Formation (RKF) Project • RKF Project goals and domain • The Cyc knowledge-based system • RKF Tools • Process Ontology • General approach • Formalisation • Example

  3. Rapid Knowledge Formation • The RKF project aims to develop tools that will allow domain experts to enter knowledge directly into the KBS (knowledge-based system). • DARPA-funded, with two teams: • Cycorp • SRI • Organised around ‘Challenge Problems’: Cell Biology

  4. RKF Aim: To enable biologists to construct an ontology/KB from a textbook source. [Figure: textbook (Alberts et al., Essential Cell Biology, 1998) → formalise → Ontology]

  5. Rapid Knowledge Formation Key techniques: • The KBS has knowledge of the KA (knowledge acquisition) process: • knowledge of salience • knowledge of the requirements of an adequate formalisation • There is a dialogue between expert and system that clarifies the concept being defined.

  6. Rapid Knowledge Formation Evaluation: • After a period of tool development, trials are organised in which the performance of both domain experts and knowledge engineers (KEs) is measured and assessed independently. • The evaluation is extensive, running over a period of 2 weeks.

  7. The Cyc KBS • Cyc (Doug Lenat) is a knowledge-based system, under development since ~1984, that aims to represent common-sense knowledge. • Cyc uses a large upper-level ontology. • It uses a logical language based on first-order logic.

  8. The Cyc KBS Concepts in the Upper Ontology: • Thing, Agent, Event • TangibleThing, InformationBearingObject • … Dog, Book • subclass (genls), instance-of (isa) • parts, subevent, and role predicates • 1600 concepts in total in the public release (1998), a small percentage of Cyc • Classification: • Stuff-like vs. Object-like • Individual vs. Set
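The two structural relations named on this slide, genls (subclass) and isa (instance-of), can be illustrated with a minimal Python sketch. It assumes a toy in-memory store rather than Cyc's own machinery: the class TinyOntology, its method names, the individual "Fido", and the particular links are all hypothetical and only echo the concept names on the slide. It shows the key behaviour: genls is transitive, and isa is inherited upward along genls.

```python
# Minimal sketch of Cyc-style genls (subclass) and isa (instance-of) links.
# Collection names follow the slide; the particular links are illustrative
# assumptions, not Cyc's actual upper ontology.
from collections import defaultdict


class TinyOntology:
    def __init__(self) -> None:
        self.genls: dict[str, set[str]] = defaultdict(set)  # collection -> direct superclasses
        self.isa: dict[str, set[str]] = defaultdict(set)    # individual -> direct collections

    def add_genls(self, sub: str, sup: str) -> None:
        self.genls[sub].add(sup)

    def add_isa(self, individual: str, collection: str) -> None:
        self.isa[individual].add(collection)

    def all_genls(self, collection: str) -> set[str]:
        """Transitive closure of genls, including the collection itself."""
        seen = {collection}
        stack = [collection]
        while stack:
            for sup in self.genls[stack.pop()]:
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    def is_instance(self, individual: str, collection: str) -> bool:
        """isa is inherited upward along genls."""
        return any(collection in self.all_genls(t) for t in self.isa[individual])


onto = TinyOntology()
for sub, sup in [("TangibleThing", "Thing"), ("Agent", "Thing"), ("Event", "Thing"),
                 ("InformationBearingObject", "TangibleThing"),
                 ("Book", "InformationBearingObject"), ("Dog", "TangibleThing")]:
    onto.add_genls(sub, sup)
onto.add_isa("Fido", "Dog")  # hypothetical individual

print(onto.is_instance("Fido", "Thing"))  # True: Dog -> TangibleThing -> Thing
print(onto.is_instance("Fido", "Book"))   # False
```

Storing only direct links and computing the closure on demand keeps assertions small, which matches the idea of placing a new concept once and letting inference propagate it.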

  9. The Cyc KBS • The upper ontology supports application development. [Figure: ontology layers, from Thing at the top through the upper level and intermediate level down to the application level]

  10. The Cyc KBS Cyc includes: • an inference engine, • a GUI, • tools for ontology development. • Until the RKF project, ontology development was carried out by trained knowledge engineers working with domain experts.

  11. RKF New tools in Cyc: • Define a new concept, and place it correctly in the ontology • Refine a concept definition • Define a new predicate • Assert a new fact • Define a new rule • State an analogy • Construct a new process

  12. RKF User interaction: • Selection of items in the interface • Choices are determined ‘intelligently’: the KBS has knowledge of salience and of the KA process, and this knowledge must itself be authored • Browsing of the ontology • Search • Natural language dialogue

  13. Process Models [Figure: the RNA Transcription process and its subevents, BindsTogether and Move]

  14. Process Descriptor Q: Name the process. A: [ RNA Transcription ] Q: Select the type of Process that best describes the category: • event localised • creation or destruction event … • ‘say this:’ [ _ _ _ _ _ _ ] Q: Define: • affected object: [ _ _ _ _ _ ] • location: [ _ _ _ _ _ ] • actor: [ _ _ _ _ _ ]

  15. Process Models Describing Processes: • Complex expressions are needed at the instance level • It is simpler to describe processes in terms of types [Figure: subevent(Event,Event) and doneBy(Event,Agent) at the upper level, with intermediate-level and application-level layers below and a ‘?’ between them] Instance-level rule: ForAll ?E ?F ?G: (subevent(?E,?G) and isa(?E,BindsTogether) and subevent(?F,?G) and isa(?F,Move)) implies before(startOf(?E),startOf(?F))

  16. Script Vocabulary • The Script theory defines the semantics of type-level assertions, e.g. (typePlaysRoleInScene RNATranscription DNAMolecule BindsTogether objectActedOn) • Requires rules for identity • Can require complex reasoning • Good for user input • Can be extended to cover pre- and postconditions of actions

  17. Scripts [Figure: an RNA Transcription instance t with subevents e (BindsTogether) and f (Move), linked by startsAfterStartingOfInScript] For all subevents f of t of type Move, and all subevents e of t of type BindsTogether, (startsAfterStartingOf f e), where t is of type RNATranscription.
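The reading of the type-level ordering assertion given on this slide (and the instance-level rule on slide 15) can be sketched as a small Python checker. This is an illustration under stated assumptions, not the Script theory's implementation: the Event class, the function name starts_after_starting_of_in_script, and its argument order (script type, later subevent type, earlier subevent type) are hypothetical; each event has a single direct type with no genls inheritance; and start times are plain numbers.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    name: str
    isa: str                 # single direct type; no genls inheritance in this sketch
    start: float             # start time, arbitrary units
    subevents: list["Event"] = field(default_factory=list)


def starts_after_starting_of_in_script(
    script_type: str, later_type: str, earlier_type: str
) -> Callable[[list[Event]], bool]:
    """Compile a type-level ordering assertion into an instance-level check.

    Argument order (script, later subevent type, earlier subevent type) is
    a hypothetical choice for this sketch.
    """
    def holds(events: list[Event]) -> bool:
        for t in events:
            if t.isa != script_type:
                continue
            later = [f for f in t.subevents if f.isa == later_type]
            earlier = [e for e in t.subevents if e.isa == earlier_type]
            # Every f must start strictly after every e, matching
            # before(startOf(?E), startOf(?F)) on slide 15.
            if any(f.start <= e.start for f in later for e in earlier):
                return False
        return True
    return holds


check = starts_after_starting_of_in_script("RNATranscription", "Move", "BindsTogether")
t1 = Event("t1", "RNATranscription", 0.0,
           [Event("e1", "BindsTogether", 1.0), Event("f1", "Move", 2.0)])
t2 = Event("t2", "RNATranscription", 0.0,
           [Event("e2", "BindsTogether", 3.0), Event("f2", "Move", 2.0)])
print(check([t1]))      # True
print(check([t1, t2]))  # False: f2 starts before e2
```

One type-level assertion thus stands for a universally quantified rule over every RNATranscription instance, which is why it is easier for a domain expert to state than the instance-level formula.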

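  18. Scripts: Type playing role [Figure: at the type level, Nucleotide plays the objectActedOn role in BindsTogether; at the instance level, e is a BindsTogether event and N is a set of Nucleotides] For some n in N, (objectActedOn e n).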
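The existential reading on this slide, "for some n in N, (objectActedOn e n)", can be checked over a set of ground facts. The following Python sketch assumes facts are stored as (predicate, arg1, arg2) triples and that each individual has one direct type; the function name type_plays_role_in_scene and the individuals e1, e2, n1, n2 are hypothetical. As on the slide, the enclosing script is left out.

```python
# Facts are (predicate, arg1, arg2) triples; each individual has one direct type.
Fact = tuple[str, str, str]


def type_plays_role_in_scene(
    facts: set[Fact], isa: dict[str, str],
    scene_type: str, player_type: str, role: str,
) -> bool:
    """Every instance e of scene_type needs some n of player_type with (role e n).

    The enclosing script (e.g. RNATranscription) is omitted, as on the slide.
    """
    scenes = [x for x, t in isa.items() if t == scene_type]
    return all(
        any(p == role and a == e and isa.get(b) == player_type for p, a, b in facts)
        for e in scenes
    )


isa = {"e1": "BindsTogether", "n1": "Nucleotide", "n2": "Nucleotide"}
facts = {("objectActedOn", "e1", "n2")}
print(type_plays_role_in_scene(facts, isa, "BindsTogether", "Nucleotide", "objectActedOn"))  # True

isa["e2"] = "BindsTogether"  # a second BindsTogether event with no object acted on
print(type_plays_role_in_scene(facts, isa, "BindsTogether", "Nucleotide", "objectActedOn"))  # False
```

The quantifier is existential over role-players but universal over scene instances: one participant suffices per event, but every BindsTogether event needs one.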

  19. New Script Vocabulary • Pre- and postconditions • (preconditionOfScene-negated BindsTogether touchingDirectly <Ribonucleotide Nucleotide>) • (postconditionOfScene BindsTogether connectedTo <Ribonucleotide Nucleotide>) [Figure: BindsTogether, with Nucleotides N and Ribonucleotides R not touchingDirectly before the event and connectedTo after it]

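  20. New Script Vocabulary [Figure: types BindsTogether, Ribonucleotide and Nucleotide; instance e, whose role-players are drawn from the sets of instances N and R; ‘identity’ links the objects in the precondition to those in the postcondition] Precondition: for some ?n in N and some ?r in R, (not (touchingDirectly ?n ?r)) Postcondition: for some ?n in N and some ?r in R, (connectedTo ?n ?r)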
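The precondition and postcondition on this slide, together with the ‘identity’ link between them, can be sketched in Python as a search for a single pair (?n, ?r) that satisfies both conditions. The assumptions are: world states before and after the event are sets of ground triples; the negated precondition is read under a closed-world assumption (an absent fact counts as false); and the function name binds_together_conditions_hold and the individuals n1, n2, r1 are hypothetical.

```python
State = set[tuple[str, str, str]]


def binds_together_conditions_hold(
    before: State, after: State, nucleotides: set[str], ribonucleotides: set[str]
) -> bool:
    """Some pair (n, r) is not touchingDirectly before the event and connectedTo after it.

    'Identity': the same n and r must satisfy both conditions.
    Negation uses a closed-world assumption: an absent fact is false.
    """
    for n in nucleotides:
        for r in ribonucleotides:
            pre = ("touchingDirectly", n, r) not in before
            post = ("connectedTo", n, r) in after
            if pre and post:
                return True
    return False


N = {"n1", "n2"}
R = {"r1"}
before = {("touchingDirectly", "n1", "r1")}
after = {("connectedTo", "n1", "r1"), ("connectedTo", "n2", "r1")}
print(binds_together_conditions_hold(before, after, N, R))  # True: n2/r1 satisfies both

after_only_n1 = {("connectedTo", "n1", "r1")}
print(binds_together_conditions_hold(before, after_only_n1, N, R))  # False: n1 was already touching
```

Checking the two conditions separately would accept the second case (some pair is not touching, and some pair is connected), which is exactly why the identity of the objects across pre- and postcondition matters.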

  21. Script Vocabulary • The Script vocabulary forms an ‘intermediate level’, which • lies behind the Process Descriptor GUI (i.e. the text boxes) • is not, in itself, a taxonomy of processes, but allows processes to be described in detail. • Defining the subclass relation is just one task.

  22. Vaccinia Virus Life Cycle • The vaccinia virus life cycle was selected as an example of a complex model to formalise as a set of Scripts. • The model includes actions, decomposition, ordering, objects-playing-roles and pre/postconditions • It is a good test for the Script vocabulary

  23. Vaccinia Virus Life Cycle • Temporal: mRNATranscription-Early, then ViralGeneTranslation-Early, then MovementOfProtein • Participants: mRNATranscription-Early (outputs: messengerRNA); ViralGeneTranslation-Early (inputs: messengerRNA); MovementOfProtein • Conditions: mRNATranscription-Early (pre: spatiallySubsumes Cell VirusCore); ViralGeneTranslation-Early (post: spatiallySubsumes CellCytoplasm Vitf2); MovementOfProtein
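The fragment of the vaccinia life-cycle model on this slide combines temporal order, participants, and conditions. A minimal Python sketch can encode it as an ordered list of steps and check one simple consistency property: every input of a step is an output of some earlier step. The Step class, the function unsatisfied_inputs, and the assignment of each pre/postcondition to a particular step (read from the slide's layout) are assumptions for illustration, not the project's Script encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    inputs: set[str] = field(default_factory=set)
    outputs: set[str] = field(default_factory=set)
    pre: set[tuple[str, str, str]] = field(default_factory=set)
    post: set[tuple[str, str, str]] = field(default_factory=set)


def unsatisfied_inputs(steps: list[Step]) -> list[tuple[str, str]]:
    """Return (step, input) pairs whose input is not an output of an earlier step."""
    produced: set[str] = set()
    missing: list[tuple[str, str]] = []
    for step in steps:  # steps are listed in temporal order
        missing += [(step.name, i) for i in sorted(step.inputs - produced)]
        produced |= step.outputs
    return missing


vaccinia_early = [
    Step("mRNATranscription-Early", outputs={"messengerRNA"},
         pre={("spatiallySubsumes", "Cell", "VirusCore")}),
    Step("ViralGeneTranslation-Early", inputs={"messengerRNA"},
         post={("spatiallySubsumes", "CellCytoplasm", "Vitf2")}),
    Step("MovementOfProtein"),
]
print(unsatisfied_inputs(vaccinia_early))                  # []
print(unsatisfied_inputs(list(reversed(vaccinia_early))))  # [('ViralGeneTranslation-Early', 'messengerRNA')]
```

Reversing the temporal order breaks the input/output chain, which illustrates why the model is a useful test: ordering, participants, and conditions constrain one another.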

  24. Evaluation • 8 biologists were selected and trained in the tools, 4 per team • The knowledge to be formalised was selected (Chapter 7 of Alberts) • The knowledge base was allowed to contain ‘pump-priming’ knowledge • The biologists entered knowledge using the tools, then tested it against a set of questions • The ontology/KB was revised

  25. Evaluation Results (outline) • A huge amount of data was collected, but the analysis is complex (IET Inc.) • Domain experts were able to develop ontologies after ‘light’ training • Knowledge engineers outperform domain experts in ontology construction

  26. Summary ‘Power Tools’ for ontology development are being implemented and tested in the RKF project. • A Script/Process vocabulary has been developed and applied to processes in cell biology, covering: • Temporal order • Participants • Pre/postconditions • Repetition
