A Process Ontology for Cell Biology
Stuart Aitken
Artificial Intelligence Applications Institute

Outline
• Rapid Knowledge Formation (RKF) Project
  • RKF Project goals and domain
  • The Cyc knowledge-based system
  • RKF Tools
• Process Ontology
  • General approach
  • Formalisation
  • Example

Rapid Knowledge Formation
• The RKF project aims to develop tools which will allow domain experts to enter knowledge directly into the KBS.
• DARPA-funded, two teams:
  • Cycorp
  • SRI
• Organised around ‘Challenge Problems’ (domain: Cell Biology)

RKF
Aim: to enable biologists to construct an ontology/KB from a textbook source.
[Figure: textbook (Alberts et al., Essential Cell Biology, 1998) → formalise → Ontology]

Rapid Knowledge Formation
Key techniques:
• The KBS has knowledge of the KA process
  • Knowledge of salience
  • Knowledge of the requirements of an adequate formalisation
• There is a dialogue between expert and system, which clarifies the concept being defined.

Rapid Knowledge Formation
Evaluation:
• After a period of tool development, trials are organised.
• Both expert performance and knowledge engineer (KE) performance are measured and assessed independently.
• The evaluation is extensive, running over a period of 2 weeks.

The Cyc KBS
• Cyc (Doug Lenat) is a knowledge-based system, under development since ~1984, aiming to represent common sense knowledge.
• Cyc uses a large upper-level ontology
• Uses a logical language based on first-order logic

The Cyc KBS
Concepts in the Upper Ontology:
• Thing, Agent, Event
• TangibleThing, InformationBearingObject
• … Dog, Book
• subclass (genls), instance-of (isa)
• parts, subevent, role predicates
• 1600 concepts in total in the public release (1998), a small percentage of Cyc
Classification:
• Stuff-like vs Object-like
• Individual vs Set

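The subclass (genls) and instance-of (isa) relations can be illustrated with a minimal Python sketch. The toy links between the concepts, the individual `Fido`, and the helper names `all_genls` and `is_instance_of` are assumptions made for illustration; they are not Cyc's actual ontology content or API.

```python
# Toy taxonomy. The specific links are illustrative assumptions, not Cyc content.
GENLS = {
    "TangibleThing": {"Thing"},
    "InformationBearingObject": {"Thing"},
    "Dog": {"TangibleThing"},
    "Book": {"InformationBearingObject"},
}
ISA = {"Fido": {"Dog"}}  # hypothetical individual


def all_genls(collection):
    """Return the collection plus every collection reachable through genls."""
    seen = set()
    stack = [collection]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, ()))
    return seen


def is_instance_of(individual, collection):
    """isa holds if some asserted type of the individual specialises the collection."""
    return any(collection in all_genls(t) for t in ISA.get(individual, ()))


if __name__ == "__main__":
    print(sorted(all_genls("Dog")))
    print(is_instance_of("Fido", "Thing"))
```

The point of the sketch is only that isa is inherited upward along genls, which is what lets an upper ontology support more specific application-level concepts.
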
The Cyc KBS
• The upper ontology supports application development:
[Figure: ontology layers below Thing: Upper-level, Intermediate-level, Application-level]

The Cyc KBS
Cyc includes:
• An inference engine,
• GUI,
• tools for ontology development.
Until the RKF project, ontology development was carried out by trained knowledge engineers, working with domain experts.

RKF
New tools in Cyc:
• Define a new concept, and place it correctly in the ontology
• Refine a concept definition
• Define a new predicate
• Assert a new fact
• Define a new rule
• State an analogy
• Construct a new process

RKF
User interaction:
• Selection of items in the interface
  • The choices offered are determined ‘intelligently’: the KBS has knowledge of salience and of the KA process, and this knowledge must itself be authored
• Browsing of the ontology
• Search
• Natural language dialogue

Process Models
[Figure: RNA Transcription with subevents BindsTogether and Move]

Process Descriptor
Q: Name the process
A: [ RNA Transcription ]
Q: Select the type of Process that describes the category best
• event localised
• creation or destruction event…
• ‘say this:’ [ _ _ _ _ _ _ ]
Q: Define:
• affected object: [ _ _ _ _ _ ]
• location: [ _ _ _ _ _ ]
• actor: [ _ _ _ _ _ ]

Process Models
Describing Processes:
• Complex expressions at the instance level
• Simpler to describe in terms of types
subevent(Event,Event)
doneBy(Event,Agent)
[Figure: ontology layers: Upper-level, Intermediate-level (?), Application-level]
ForAll ?E ?F ?G
  implies
    (subevent(?E,?G) and isa(?E,BindsTogether) and
     subevent(?F,?G) and isa(?F,Move))
    before(startOf(?E),startOf(?F))

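To make the instance-level rule concrete, here is a minimal Python sketch that checks it over a handful of events: for every BindsTogether event and every Move event sharing the same parent event, the BindsTogether must start first. The `Event` record, its field names, and the function `ordering_violations` are hypothetical and chosen only for this illustration; Cyc itself states the rule declaratively.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str
    type: str                     # e.g. "BindsTogether" or "Move"
    start: float                  # start time, arbitrary units
    parent: Optional[str] = None  # name of the event this is a subevent of


def ordering_violations(events):
    """Return (bind, move) name pairs that break the rule
    before(startOf(?E), startOf(?F)) for sibling subevents."""
    binds = [e for e in events if e.type == "BindsTogether" and e.parent]
    moves = [f for f in events if f.type == "Move" and f.parent]
    return [
        (e.name, f.name)
        for e, f in product(binds, moves)
        if e.parent == f.parent and not e.start < f.start
    ]


if __name__ == "__main__":
    events = [
        Event("t1", "RNATranscription", 0.0),
        Event("bind1", "BindsTogether", 1.0, parent="t1"),
        Event("move1", "Move", 2.0, parent="t1"),
    ]
    print(ordering_violations(events))
```

The check quantifies over every pair of sibling events, which is why writing such constraints at the instance level quickly becomes verbose compared with a single type-level statement.
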
Script Vocabulary
The Script theory defines the semantics of Type-Level assertions:
(typePlaysRoleInScene RNATranscription DNAMolecule BindsTogether objectActedOn)
• Requires rules for identity
• Can require complex reasoning
• Good for user input
• Can be extended to cover pre and postconditions of actions

Scripts
[Figure: RNA Transcription (t) with subevents BindsTogether (e) and Move (f), linked by startsAfterStartingOfInScript]
For all subevents f of t, of type Move, and all subevents e of t, of type BindsTogether:
(startsAfterStartingOf f e)
where t is of type RNATranscription

Scripts
Type playing role
[Figure: types BindsTogether and Nucleotide; instances e (of BindsTogether) and the set N (of Nucleotide), linked by objectActedOn]
For some n in N, (objectActedOn e n)

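The instance-level reading of a type playing a role, "for some n in N, (objectActedOn e n)", can be sketched as an existential check in Python. The fact and type representations and the function name `some_filler_plays_role` are assumptions for illustration, not part of the Script vocabulary itself.

```python
def some_filler_plays_role(event, role, filler_type, role_facts, types):
    """True if at least one individual of filler_type plays `role` in `event`.

    role_facts: iterable of (role, event, individual) triples.
    types: dict mapping an individual's name to its type name.
    """
    return any(
        types.get(individual) == filler_type
        for fact_role, fact_event, individual in role_facts
        if fact_role == role and fact_event == event
    )


if __name__ == "__main__":
    # Hypothetical instances: one BindsTogether event e1 acting on nucleotide n1.
    types = {"n1": "Nucleotide", "n2": "Nucleotide"}
    role_facts = [("objectActedOn", "e1", "n1")]
    print(some_filler_plays_role("e1", "objectActedOn", "Nucleotide", role_facts, types))
```

Only one member of the set needs to fill the role, so the check is existential over N rather than universal.
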
New Script Vocabulary
• Pre and Post conditions
  • (preconditionOfScene-negated BindsTogether touchingDirectly <Ribonucleotide Nucleotide>)
  • (postconditionOfScene BindsTogether connectedTo <Ribonucleotide Nucleotide>)
[Figure: BindsTogether takes N and R from not touchingDirectly (before) to connectedTo (after)]

New Script Vocabulary
[Figure: types BindsTogether, Ribonucleotide and Nucleotide; sets of instances R and N play roles in event e; figure label: identity]
Precondition: Some ?n in N, some ?r in R (not (touchingDirectly ?n ?r))
Postcondition: Some ?n in N, some ?r in R (connectedTo ?n ?r)

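A minimal Python sketch of how the pre and postcondition of BindsTogether might be checked against a before and an after state. It assumes that the "identity" label means the same ?n and ?r are meant in both conditions; that reading, the triple-based state representation, and the function name `binds_together_holds` are assumptions for illustration.

```python
def binds_together_holds(before, after, nucleotides, ribonucleotides):
    """True if some pair (n, r) is not touchingDirectly in `before`
    and is connectedTo in `after`.

    before, after: sets of (predicate, n, r) facts.
    Assumption: the same n and r are used in both conditions.
    """
    for n in nucleotides:
        for r in ribonucleotides:
            pre_ok = ("touchingDirectly", n, r) not in before
            post_ok = ("connectedTo", n, r) in after
            if pre_ok and post_ok:
                return True
    return False


if __name__ == "__main__":
    before = set()                        # n1 and r1 are not yet touching
    after = {("connectedTo", "n1", "r1")}
    print(binds_together_holds(before, after, ["n1"], ["r1"]))
```

Without the identity assumption the two conditions could be satisfied by unrelated pairs, which would not capture the intended change of state.
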
Script Vocabulary
• The Script vocabulary forms an ‘intermediate level’, which lies behind the Process descriptor GUI (i.e. the textboxes)
• It is not, in itself, a taxonomy of processes, but it allows processes to be described in detail.
• Defining the subclass relation is just one task.

Vaccinia Virus Life Cycle
• The vaccinia virus life cycle was selected as an example of a complex model to formalise as a set of Scripts.
• The model includes actions, decomposition, ordering, objects-playing-roles and pre/postconditions
• It is a good test for the Script vocabulary

Vaccinia Virus Life Cycle
[Figure: three views of the sequence mRNATranscription-Early, ViralGeneTranslation-Early, MovementOfProtein]
• Temporal: mRNATranscription-Early, ViralGeneTranslation-Early, MovementOfProtein
• Participants: Outputs: messengerRNA; Inputs: messengerRNA
• Conditions: Pre: spatiallySubsumes Cell VirusCore; Post: spatiallySubsumes CellCytoplasm Vitf2

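The temporal and participant views can be combined in a small Python sketch that checks whether each step's inputs have been produced by an earlier step. It assumes that messengerRNA is the output of mRNATranscription-Early and the input of ViralGeneTranslation-Early; the slide does not make that attachment explicit, and the data layout and the function name `unmet_inputs` are likewise hypothetical.

```python
# Steps in temporal order. The input/output attachment is an assumed reading.
STEPS = [
    {"name": "mRNATranscription-Early", "inputs": set(), "outputs": {"messengerRNA"}},
    {"name": "ViralGeneTranslation-Early", "inputs": {"messengerRNA"}, "outputs": set()},
    {"name": "MovementOfProtein", "inputs": set(), "outputs": set()},
]


def unmet_inputs(steps):
    """Return (step name, input) pairs where the input was not
    produced by any earlier step in the sequence."""
    available = set()
    missing = []
    for step in steps:
        for item in sorted(step["inputs"] - available):
            missing.append((step["name"], item))
        available |= step["outputs"]
    return missing


if __name__ == "__main__":
    print(unmet_inputs(STEPS))
```

A check of this kind is one way a script's ordering and participant descriptions can be tested against each other.
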
Evaluation
• 8 biologists were selected and trained in the tools, 4 per team
• The knowledge to be formalised was selected (chapter 7 in Alberts)
• The knowledge base was allowed to contain ‘pump-priming’ knowledge
• The biologists entered knowledge using the tools, then tested it against a set of questions
• The ontology/KB was then revised

Evaluation
Results (outline)
• A huge amount of data was collected, but analysis is complex (IET Inc)
• Domain experts were able to develop ontologies after ‘light’ training
• Knowledge engineers out-perform domain experts in ontology construction

Summary
• ‘Power Tools’ for ontology development are being implemented and tested in the RKF project.
• A Script/Process vocabulary has been developed and applied to processes in cell biology, covering:
  • Temporal order
  • Participants
  • Pre/postconditions
  • Repetition