Building Ontologies • No field of Ontological Engineering equivalent to Knowledge or Software Engineering; • No standard methodologies for building ontologies; • Such a methodology would include: • a set of stages that occur when building ontologies; • guidelines and principles to assist in the different stages; • an ontology life-cycle which indicates the relationships among stages. • Gruber's guidelines for constructing ontologies are well known.
The Development Lifecycle • Two kinds of complementary methodologies emerged: • Stage-based, e.g. TOVE [Uschold96] • Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]. • Most have TWO stages: • Informal stage • ontology is sketched out using either natural language descriptions or some diagram technique • Formal stage • ontology is encoded in a formal knowledge representation language, that is machine computable • An ontology should ideally be communicated to people and unambiguously interpreted by software • the informal representation helps the former • the formal representation helps the latter.
A Provisional Methodology • A skeletal methodology and life-cycle for building ontologies; • Inspired by the software engineering V-process model; • The overall process moves through a life-cycle: • The left side charts the processes in building an ontology; • The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology.
The V-model Methodology [Figure: V-model diagram. Process stages: Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation. Models: User Model; Conceptualisation Model; Implementation Model. Quality assurance: Evaluation (coverage, verification, granularity); Conceptualisation principles (commitment, conciseness, clarity, extensibility, coherency); Encoding/Representation principles (encoding bias, consistency, house styles and standards, reasoning system exploitation). Outcome: Ontology in Use.]
The ontology building life-cycle [Figure: life-cycle diagram. Labels: Identify purpose and scope; Knowledge acquisition; Building; Language and representation; Conceptualisation; Integrating existing ontologies; Available development tools; Encoding; Evaluation.]
User Model: Identify purpose and scope • Decide what applications the ontology will support • EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source • TAMBIS: retrieval across a broad range of bioinformatics resources • The use to which an ontology is put affects its content and style • Impacts re-usability of the ontology
User Model: Knowledge Acquisition • Specialist biologists; standard text books; research papers and other ontologies and database schema. • Motivating scenarios and informal competency questions – informal questions the ontology must be able to answer • Evaluation: • Fitness for purpose • Coverage and competency
Conceptualisation Model: Conceptualisation • Identify the key concepts, their properties and the relationships that hold between them; • Which ones are essential? • What information will be required by the applications? • Structure domain knowledge into explicit conceptual models. • Identify natural language terms to refer to such concepts, relations and attributes; • Determine naming conventions • Consistent naming for classes and slots • EcoCyc: • Classes are capitalized, hyphenated, plural • Slot names are uppercase A quality ontology captures relevant biological distinctions with high fidelity
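Naming conventions such as the EcoCyc ones on this slide can be checked mechanically. The following is a minimal Python sketch, not EcoCyc's own tooling. It assumes that "capitalized, hyphenated, plural" means hyphen-separated words that each start with a capital letter, with the whole name ending in "s", and that slot names consist of upper-case letters, digits and hyphens. The function names and patterns are hypothetical.

```python
import re

# Assumed reading of the class convention: hyphen-separated words,
# each starting with a capital letter.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*(-[A-Z][A-Za-z0-9]*)*$")

# Assumed reading of the slot convention: upper-case letters, digits, hyphens.
SLOT_NAME = re.compile(r"^[A-Z][A-Z0-9]*(-[A-Z0-9]+)*$")


def is_valid_class_name(name: str) -> bool:
    """True if the name is capitalized, hyphenated and (naively) plural."""
    # Plurality is approximated by a trailing "s"; this is a simplification.
    return bool(CLASS_NAME.match(name)) and name.endswith("s")


def is_valid_slot_name(name: str) -> bool:
    """True if the slot name is entirely upper case."""
    return bool(SLOT_NAME.match(name))
```

For example, `is_valid_class_name("Protein-Complexes")` passes while `is_valid_class_name("protein")` does not. A real checker would need a better test for plurals than a trailing "s".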
Conceptualisation Model: Pitfalls • Pitfall: Missing ontological elements • Missing classes: Swiss-Prot Protein complexes • Missing attributes: Genetic code identifier • Confuse 1:1 with 1:Many, or 1:Many with Many:Many • Cofactor as an attribute of reaction • Important data is stored within text/comment fields • Pitfall: Extra ontological elements • Pitfall: Stop-over – when do I stop elaborating? • Pitfall: Relevance – do I really need all this detail?
Integrating Existing Ontologies • Reuse or adapt existing ontologies when possible • Save time • Correctness • Facilitate interoperation • Integration of ontologies • Ontologies have to be aligned • Hindered by poor documentation and argumentation • Hindered by implicit assumptions • Shared generic upper level ontologies should make integration easier
Encoding: Implementation Toolkit • Construct ontology using an ontology-development system • Does the data model have the right expressivity? • Is it just a taxonomy or are relationships needed? • Is multiple parentage needed? Inverse relationships? • What types of constraints are needed? • Are reasoning services needed? • What are authoring features of the development tool? • Can ontology be exported to a DBMS schema? • Can ontology be exported to an ontology exchange language? • Is simultaneous updating by multiple authors needed? • Size limitations of development tool?
Encoding: Ontology Implementation Pitfalls • Pitfall: Semantic ambiguity • Multiple ways to encode the same information • Meaning of class definitions unclear • Pitfall: Encoding Bias • Encoding the ontology changes the ontology
Encoding: Ontology Implementation Pitfalls • Pitfall: Redundancy (lack of normalization) • Exact same information repeated • Presence of computationally derivable information • Date of birth and age • DNA sequence and reverse complement • More effort required for entry and update • Partial updates lead to inconsistency • OK if redundant information is maintained automatically
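One way to avoid the redundancy pitfall is to store only the primary value and compute the derivable one on demand. The sketch below uses the two examples from this slide. The class and method names are hypothetical, and the sequence is assumed to contain only the upper-case bases A, C, G and T.

```python
from dataclasses import dataclass
from datetime import date

# Complement table for the four DNA bases (assumes upper-case input only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class DnaSequence:
    bases: str  # the only stored value

    @property
    def reverse_complement(self) -> str:
        # Derived on demand, so it can never disagree with `bases`.
        return self.bases.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Person:
    date_of_birth: date  # stored; age is never stored

    def age_on(self, today: date) -> int:
        # Whole years elapsed, minus one if the birthday has not yet occurred.
        years = today.year - self.date_of_birth.year
        birthday = (self.date_of_birth.month, self.date_of_birth.day)
        if (today.month, today.day) < birthday:
            years -= 1
        return years
```

Because neither derived value is stored, a partial update cannot leave the record inconsistent. This corresponds to the last bullet: the redundancy is acceptable because it is maintained automatically.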
Encoding: The Interaction Problem • Task influences what knowledge is represented and how it is represented • Molecular biology: chemical and physical properties of proteins • Bioinformatics: accession number, function gene • Underlying perspectives mean they may not be reconcilable • If an ontology has too many conflicting tasks it can end up compromised – TaO experience
Evaluate it - A guide for reusability • Conciseness • No redundancy • Appropriateness – e.g. avoid modelling protein molecules at atomic resolution when the amino acid level would do • Clarity • Consistency • Satisfiability – it doesn’t contradict itself • Counter-example: an enzyme defined as both a protein which catalyses a reaction and a protein which does not catalyse a reaction • Commitment • Do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
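The satisfiability check on this slide can be illustrated with a toy example. The sketch below flags any subject for which the same statement is asserted both positively and negatively, as in the enzyme example. It is an assumption-laden simplification: real ontologies would normally be checked with a description logic reasoner, and the `Assertion` type and `find_contradictions` function are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Assertion(NamedTuple):
    subject: str    # e.g. "Enzyme"
    predicate: str  # e.g. "catalyses Reaction"
    holds: bool     # True = asserted, False = explicitly denied


def find_contradictions(assertions: Iterable[Assertion]) -> set:
    """Return (subject, predicate) pairs that are both asserted and denied."""
    seen = {}
    for a in assertions:
        # Collect every truth value recorded for this subject/predicate pair.
        seen.setdefault((a.subject, a.predicate), set()).add(a.holds)
    # A pair with both True and False recorded contradicts itself.
    return {key for key, values in seen.items() if len(values) == 2}
```

Applied to the slide's example, asserting and denying "catalyses Reaction" for "Enzyme" returns that pair as a contradiction. This only catches direct clashes, not contradictions that follow from inference.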
Documentation: Make Ontology Understandable! • Produce clear informal and formal documentation • An ontology that cannot be understood will not be reused • Genbank feature table • NCBI ASN.1 definitions • There exists a space of alternative ontology design decisions • Semantics / Granularity • Terminology • Pitfall: Neglecting to record design rationale
Publish the Ontology • Formal and informal specifications • Intended domain of application • Design rationale • Limitations • See EcoCyc paper in ISMB-93/Bioinformatics 00 • See TAMBIS paper in Bioinformatics 99
Macromolecule Reference Ontology [Figure: class diagram. Node labels: MacroMolecule; SequenceComponent; Gene; Motif; Lipid; Phosphorylation site; Nucleic Acid; Protein; RNA; Peptide; Enzyme; Restriction site; mRNA; DNA; cDNA; gDNA; mDNA. Relationship label: componentOf.]
Discussion • What is a macromolecule? • Where does macromolecule fit into an upper level ontology? • Substance? • Structure? • Is lipid a macromolecule? • If we replace macromolecule with biopolymer is the placement of lipid legit? • Is a peptide a protein and therefore a macromolecule? If not, where does it go?
Taxonomy and Roles • Do we want to assert everything in a taxonomy? • Or do we want to define things in terms of their properties? • Enzyme = Protein catalyses Reaction • gDNA = DNA hasLocation Chromosomal • Sufficient as well as necessary conditions • What's the relationship between • cDNA and EST • cDNA and some child of RNA?
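The difference between asserting class membership and defining a class by its properties can be sketched in code. The example below encodes the two definitions from this slide as sufficient conditions and infers membership from an individual's asserted class and property values. It is a single-pass illustration only, with no subsumption reasoning; the `Individual` type, the `DEFINITIONS` table and `inferred_classes` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    asserted: set = field(default_factory=set)       # classes stated directly
    properties: dict = field(default_factory=dict)   # property -> set of fillers


# Defined classes from the slide: name -> (parent class, property, filler).
DEFINITIONS = {
    "Enzyme": ("Protein", "catalyses", "Reaction"),
    "gDNA": ("DNA", "hasLocation", "Chromosomal"),
}


def inferred_classes(ind: Individual) -> set:
    """Asserted classes plus every defined class whose conditions are met."""
    result = set(ind.asserted)
    for name, (parent, prop, filler) in DEFINITIONS.items():
        # Sufficient condition: right parent class and right property filler.
        if parent in ind.asserted and filler in ind.properties.get(prop, set()):
            result.add(name)
    return result
```

A protein recorded as catalysing a reaction is classified as an Enzyme without anyone asserting it, whereas a protein with no such property stays a plain Protein.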
Axioms and constraints • Not all RNA is translated to protein • Do we want to say that DNA is translated to protein? • Do we want to model catalytic RNAs? • Relationships – what other ones do we need? • Genes express proteins • Genes express rRNA, tRNA • Genes are found on gDNA • Genes are found on mDNA • Genes have their own components – recursive relationships with partitive semantics • Reasoning? Instances? • Reusable? Clear? Concise?
Ontological Pitfalls • Stop-over – when do I stop over-elaborating? • Proteins → amino acid residues → side chains → physical chemical properties … • Relevance • Do we need to mention all the types of nucleic acid?
EcoCyc [Figure: class hierarchy. Node labels: Chemicals; MacroMolecule; Compounds-And-Elements; Nucleic-Acids; Compounds; Proteins; Lipids; DNA; RNA; PolyPeptides; Protein-Complexes; Misc-RNA; DNA-Segments; Genes.]
Macromolecule in other Ontologies • Gene Ontology: • Used to add attributes to gene instances in databases • Doesn’t need to talk about molecules or components of molecules • TAMBIS Ontology: • Models it in a similar way to our reference macromolecule ontology • Because it asks questions of bioinformatics sources