Principles for Building Biomedical Ontologies Barry Smith
Computers are tools for scientists • this fact does not mean that the sciences themselves have new kinds of objects (data, information) • bio-ontologies are about genes, cells, organisms • not about terms, symbols, concepts, data
Overview • Following basic rules helps make better ontologies • We will work through the principles-based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful
Why do we need rules for good ontology? • Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking) • Unintuitive rules for classification lead to entry errors (problematic links) • Facilitate training of curators • Overcome obstacles to alignment with other ontology and terminology systems • Enhance harvesting of content through automatic reasoning systems
First Rule: Univocity • Terms (including those describing relations) should have the same meanings on every occasion of use. • In other words, they should refer to the same kinds of entities in reality
MedDRA • a cold • cold (vs. hot) • C.O.L.D. (Chronic-Obstructive-Lung-Disease) code with ‘C.O.L.D.’ or call to check
Second Rule: Positivity • Complements of types are not themselves types. • Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types.
Third Rule: Objectivity • Which types exist is not a function of our biological knowledge. • Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
Fourth Rule: Single Inheritance • No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
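The single-inheritance rule lends itself to an automatic check. The following is a minimal sketch, assuming the hierarchy is available as a list of (child, parent) is_a pairs; the edge list and the function name `multiple_parent_types` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical is_a edges as (child, parent) pairs; illustrative data only.
IS_A_EDGES = [
    ("cat", "mammal"),
    ("mammal", "animal"),
    ("frog", "animal"),
    ("siamese", "cat"),
    ("siamese", "pet"),  # deliberately added second parent
]


def multiple_parent_types(edges):
    """Return {child: sorted parents} for every type with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


violations = multiple_parent_types(IS_A_EDGES)
print(violations)
```

Each reported type is a candidate for review by a curator: either one of its parents was entered in error, or one of the links means something other than is_a.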
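Rule of Single Inheritance • no diamonds • [Diagram: diamond-shaped hierarchy with nodes A, B, C and links labeled is_a1 and is_a2]
Problems with multiple inheritance • [Diagram: A linked to B by is_a1 and to C by is_a2] • ‘is_a’ no longer univocal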
‘is_a’ is pressed into service to mean a variety of different things • shortfalls from single inheritance are often clues to incorrect entry of terms and relations • the resulting ambiguities make the rules for correct entry difficult to communicate to human curators
is_a Overloading • serves as obstacle to integration with neighboring ontologies • The success of ontology alignment depends crucially on the degree to which basic ontological relations such as is_a and part_of can be relied on as having the same meanings in the different ontologies to be aligned.
Use of multiple inheritance • The resultant mélange makes coherent integration across ontologies achievable (at best) only under the guidance of human beings with relevant biological knowledge • How much should reasoning systems be forced to rely on human guidance?
Fifth Rule: Intelligibility of Terms and Definitions • Terms should be intelligible • ‘apoptosis inhibitor activity’ is a function in GO • relations between function and the processes they enable become very difficult to state unless function terms designate functions in an intelligible way • structural constituent of tooth enamel
extracellular matrix structural constituent • puparial glue (sensu Diptera) • structural constituent of bone • structural constituent of chorion (sensu Insecta) • structural constituent of chromatin • structural constituent of cuticle • structural constituent of cytoskeleton • structural constituent of epidermis • structural constituent of eye lens • structural constituent of muscle • structural constituent of myelin sheath • structural constituent of nuclear pore • structural constituent of peritrophic membrane (sensu Insecta) • structural constituent of ribosome – note possibility of confusion with ‘major ribosome unit’ (check) • structural constituent of tooth enamel • structural constituent of vitelline membrane (sensu Insecta)
Fifth Rule: Intelligibility of Terms and Definitions • The terms used in a definition should be simpler (more intelligible) than the term to be defined • otherwise the definition provides no assistance • to human understanding • for machine processing
To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via brute force
Some rules are Rules of Thumb • The world of biomedical research is a world of difficult trade-offs • The benefits of formal (logical and ontological) rigor need to be balanced • Against the constraints of computer tractability, • Against the needs of biomedical practitioners. • BUT alignment and integration of biomedical information resources will be achieved only to the degree that such resources conform to these standard principles of classification and definition
Definitions should be intelligible to both machines and humans • Machines can cope with the full formal representation • Humans need to use modularity • Plasma membrane • is a cell part [immediate parent] • that surrounds the cytoplasm [differentia]
Terms and relations should have clear definitions • These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality: • actual cells, actual portions of cytoplasm, and so on…
Sixth Rule: Basis in Reality • When building or maintaining an ontology, always think carefully about how types (universals, kinds, species) relate to instances in reality
Axioms governing instances • Every type has at least one instance • Every genus (parent type) has an instantiated species (differentia + genus) • Each species (child type) has a smaller class of instances than its genus (parent type)
Axioms governing instances • Distinct types on the same level never share instances • Distinct leaf types within a classification never share instances
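Two of these axioms can be tested mechanically once instance data is available. The sketch below assumes a mapping from leaf type to a set of instance identifiers; the data and the helper names `shared_instances` and `uninstantiated` are hypothetical.

```python
from itertools import combinations

# Hypothetical leaf types mapped to instance identifiers; illustrative data only.
LEAF_INSTANCES = {
    "siamese": {"tom", "felix"},
    "frog": {"kermit"},
    "persian": {"felix"},  # deliberate overlap with "siamese"
    "axolotl": set(),      # deliberately left without instances
}


def shared_instances(leaf_instances):
    """Return (type_a, type_b, shared) for each pair of leaf types sharing instances."""
    clashes = []
    for a, b in combinations(sorted(leaf_instances), 2):
        shared = leaf_instances[a] & leaf_instances[b]
        if shared:
            clashes.append((a, b, sorted(shared)))
    return clashes


def uninstantiated(leaf_instances):
    """Return the types that violate 'every type has at least one instance'."""
    return sorted(t for t, inst in leaf_instances.items() if not inst)


print(shared_instances(LEAF_INSTANCES))
print(uninstantiated(LEAF_INSTANCES))
```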
[Figure: taxonomy tree with nodes substance, organism, animal, mammal, cat, siamese, frog; labels: species, genera, leaf type, instances]
Interoperability • Ontologies should work together • ways should be found to avoid redundancy in ontology building and to support reuse • ontologies should be capable of being used by other ontologies (cumulation)
Main obstacle to integration • Current ontologies do not deal well with • Time and • Space and • Instances (particulars) • Our definitions should link the terms in the ontology to instances in spatio-temporal reality
Benefits of well-defined relationships • If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C). Relations used in ontologies thus far have not been well defined in this sense. • Find all DNA binding proteins should also find all transcription factor proteins because • Transcription factor is_a DNA binding protein
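The cascade described here can be sketched in a few lines. This is an illustration only, assuming a single-parent is_a map; apart from the transcription factor link taken from the slide, the entries and the helper names `ancestors` and `find_all` are hypothetical.

```python
# Child -> parent is_a links. Only the first entry comes from the slide;
# the others are added for illustration.
IS_A = {
    "transcription factor": "DNA binding protein",
    "DNA binding protein": "protein",
    "kinase": "protein",
}


def ancestors(term, is_a):
    """Yield every type reachable from `term` by following is_a links upward."""
    seen = set()
    while term in is_a and is_a[term] not in seen:
        term = is_a[term]
        seen.add(term)
        yield term


def find_all(query, is_a):
    """Return the query type plus every type that is_a (transitively) the query."""
    terms = set(is_a) | set(is_a.values())
    return sorted(t for t in terms if t == query or query in ancestors(t, is_a))


print(find_all("DNA binding protein", IS_A))
```

Because is_a is followed transitively, a query for "protein" in this sketch also returns "transcription factor", even though no direct link between the two was entered.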
How to define A is_a B A is_a B =def. • A and B are names of types (natural kinds, universals) in reality • all instances of A are as a matter of biological science also instances of B
Biomedical ontology integration / interoperability • Will never be achieved through integration of meanings or concepts • The problem is precisely that different user communities use different concepts • What’s really needed is to have well-defined commonly used relationships
Idea: • Move from associative relations between meanings to strictly defined relations between the entities themselves. • The relations can then be used computationally in the way required
Key idea: To define ontological relations • For example: part_of, develops_from • Definitions will enable computation • It is not enough to look just at types. • We need also to take account of instances and time
Kinds of relations • Between types: • is_a, part_of, ... • Between an instance and a type • this explosion instance_of the type explosion • Between instances: • Mary’s heart part_of Mary
Seventh Rule: Distinguish types and Instances • A good ontology must distinguish clearly between • types (universals, kinds, species) and • instances (tokens, individuals, particulars)
Don’t forget instances when defining relations • part_of as a relation between types versus part_of as a relation between instances • nucleus part_of cell • your heart part_of you
Part_of as a relation between types is more problematic than is standardly supposed • testis part_of human being ? • heart part_of human being ? • human being has_part human testis ?
Why distinguish types from instances? • What holds on the level of instances may not hold on the level of types • nucleus adjacent_to cytoplasm • Not: cytoplasm adjacent_to nucleus • seminal vesicle adjacent_to urinary bladder • Not: urinary bladder adjacent_to seminal vesicle
part_of • part_of must be time-indexed for spatial types • A part_of B is defined as: Given any instance a and any time t, If a is an instance of the type A at t, then there is some instance b of the type B such that a is an instance-level part_of b at t
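The type-level definition can be evaluated directly over instance-level facts. The sketch below assumes time-indexed facts stored as tuples and requires b to be an instance of B at the same time t, which is one reading of the definition; all data and the name `type_part_of` are hypothetical.

```python
# instance_of facts as (instance, type, time); illustrative data only.
INSTANCE_OF = {
    ("n1", "nucleus", 1), ("c1", "cell", 1),
    ("n1", "nucleus", 2), ("c1", "cell", 2),
    ("n2", "nucleus", 2), ("c2", "cell", 2),
}

# Instance-level part_of facts as (part, whole, time).
PART_OF = {("n1", "c1", 1), ("n1", "c1", 2), ("n2", "c2", 2)}


def type_part_of(A, B, instance_of, part_of):
    """True if every instance a of A at t is an instance-level part of
    some instance b of B at t (assumed reading: b is an instance of B at t)."""
    for a, typ, t in instance_of:
        if typ != A:
            continue
        has_whole = any(
            (a, b, t) in part_of
            for b, typ_b, t_b in instance_of
            if typ_b == B and t_b == t
        )
        if not has_whole:
            return False
    return True


print(type_part_of("nucleus", "cell", INSTANCE_OF, PART_OF))
print(type_part_of("cell", "nucleus", INSTANCE_OF, PART_OF))
```

The check is all-some in form: it quantifies over all instances of A but asks only for some instance of B, which is why the relation does not automatically hold in the reverse direction.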
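derives_from (ovum, sperm → zygote ...) • [Diagram, instances along a time axis: c at t (instance of C) and c' at t (instance of C'); c1 at t1 (instance of C1)]
transformation_of • same instance • [Diagram along a time axis: c at t (instance of C); c at t1 (instance of C1)] • pre-RNA → mature RNA • child → adult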
transformation_of • C2 transformation_of C1 =def. any instance of C2 was at some earlier time an instance of C1
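The same fact format supports a check of this definition. This is a minimal sketch with hypothetical data; the child and adult types come from the slides, while the individuals and time points are invented for illustration.

```python
# instance_of facts as (instance, type, time); illustrative data only.
FACTS = {
    ("mary", "child", 1), ("mary", "adult", 2),
    ("john", "child", 1), ("john", "adult", 3),
}


def transformation_of(C2, C1, instance_of):
    """True if every instance of C2 at t was an instance of C1 at some earlier time."""
    for x, typ, t in instance_of:
        if typ != C2:
            continue
        was_c1_earlier = any(
            y == x and typ_y == C1 and t_y < t
            for y, typ_y, t_y in instance_of
        )
        if not was_c1_earlier:
            return False
    return True


print(transformation_of("adult", "child", FACTS))
print(transformation_of("child", "adult", FACTS))
```

The test `y == x` is what separates transformation_of from derives_from: the same instance persists and changes type, rather than a new instance arising from earlier ones.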
embryological development • [Diagram: c at t (instance of C); c at t1 (instance of C1)]
tumor development • [Diagram: c at t (instance of C); c at t1 (instance of C1)]
Time • menopause part_of aging • aging part_of death • Therefore: menopause part_of death
The simple, formal details “Relations in Biomedical Ontologies” Genome Biology, 2005, 6 (5)
Principles for Building Biomedical Ontologies: A GO Perspective • David Hill • Mouse Genome Informatics • The Jackson Laboratory
How has GO dealt with some specific aspects of ontology development? • Univocity • Positivity • Objectivity • Single Inheritance • Definitions • Formal definitions • Written definitions • Basis in Reality • Universals & Instances • Ontology Alignment
The Challenge of Univocity: People call the same thing by different names • Taction • Tactition • Tactile sense • ?
Univocity: GO uses 1 term and many characterized synonyms Taction Tactition Tactile sense perception of touch ; GO:0050975
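One term with many synonyms amounts to a lookup from any accepted name to a single identifier. The sketch below uses the term and synonyms from the slide; the dictionary layout and the function name `resolve` are assumptions for illustration and do not reflect how GO stores synonyms.

```python
# Primary term taken from the slide.
TERMS = {"GO:0050975": "perception of touch"}

# Synonyms from the slide, each pointing at the single primary term.
SYNONYMS = {
    "taction": "GO:0050975",
    "tactition": "GO:0050975",
    "tactile sense": "GO:0050975",
}


def resolve(label):
    """Map a primary name or synonym to its term ID, or None if unknown."""
    key = label.strip().lower()
    for term_id, name in TERMS.items():
        if name.lower() == key:
            return term_id
    return SYNONYMS.get(key)


print(resolve("Tactition"))
print(resolve("perception of touch"))
```

Every name resolves to the same identifier, so annotations made under different names still refer to the same type.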
The Challenge of Univocity: People use the same words to describe different things • [Three images, each labeled ‘bud initiation’]