Introduction to anatomy ontology building

David Osumi-Sutherland, FlyBase (www.flybase.org), Virtual Fly Brain (www.virtualflybrain.org)

Take home messages • An ontology is a classification • There are lots of useful ways to classify stuff • Maintaining multiple classification schemes by hand is impractical • So you should automate it. • Everybody makes mistakes • So you should get the computer to find errors for you • Re-use other people’s work where possible • import class hierarchies • use common patterns • Cautionary note – formal languages have limitations. Don’t expect to be able to express everything!

What is an ontology ? • A set of defined, inter-related terms to use in annotation/metadata/knowledge bases. • A classification • A query-able store of (scientific) knowledge that uses logical inference.

What (use) is an ontology? • A set of defined, inter-related terms to use in annotation. • Annotation of • papers; specimens; gene expression; phenotype… • Use of common annotation terms across multiple databases allows easy shared integration. • Relations between terms allow annotations to be grouped in scientifically meaningful ways • requires an ontology to be an accurate and scientifically meaningful classification and store of scientific knowledge.

What is an ontology ? • A classification • There are lots of scientifically useful ways to classify a bit of anatomy. • its parts and their arrangement • its relation to other structures • what is it: part of; connected to; adjacent to, overlapping? • its shape • its function • its developmental origins • its species or clade • its evolutionary history?

What is an ontology ? • The scientific knowledge an ontology contains can make the reasons for classification explicit. • e.g. • Any sense organ that functions in the detection of smell is an olfactory sense organ • All large basiconic sensilla of the antenna function in detection of smell • Therefore all large basiconic sensilla of the antenna are olfactory sense organs

Virtual Fly Brain Demo

Why ontology development is like software or database development • Ideal case – • maintainable • basic maintenance (e.g. correcting simple errors) is easy • scalable • grow your project as large as you need without breaking • extensible • easy to add new functionality without breaking existing • integrate-able • Can integrate easily with work of others – so you don’t have to solve all problems yourself

Why ontology development is like software or database development • Ideal case – Future editors can build on your work • maintainable – By multiple editors • basic maintenance (e.g. correcting simple errors) is easy • scalable – By multiple editors • grow your project as large as you need without breaking • extensible – By multiple editors • easy to add new functionality without breaking existing • integrate-able • Can integrate easily with work of others – so you don’t have to solve all problems yourself

How not to build ontologies - The trap • A small, simple ontology or program with one developer can get away with practices that a large one cannot • given • shallow, single inheritance classification (each class has 0-1 superclasses) • very few relationship types • < 1000 terms. • it is feasible to: • have little annotation/documentation • have no automated error checking • have no automated classification • keep redundancy to a minimum by hand

How not to build ontologies - The trap • Small, simple ontologies and programs have a habit of growing large and complicated. • Users demand lots more terms for annotation • Users demand multiple axes of classification • No scientific reason to favor one over another • Users demand/editors favor multiple relationship types to record information they believe scientifically important. • Editors/coders move on • someone else has to continue their work. Is the documentation mainly in the old developer’s head?

How not to build ontologies- The trap • Worst case scenario – the tangled pit of misery: • Difficult, perhaps impossible to maintain or extend • Tangled, convoluted, redundant structure with little or no documentation or annotation. • Editing tends to inadvertently break previous functionality. • Little or no error checking means you don't even notice when you break stuff. Users find out later. • Even you can't easily edit what you built 6 months ago without getting confused and making a mess.

Avoiding tangled pits of misery • There are no perfect answers, but these might help: • good annotation and documentation; • good, consistent style; • avoidance of redundancy; • let the computer keep track of things for you • modularity; • automate • a consistent set of tests of existing functionality (JUnit-style tests / consistency checks); • constant testing during development; • design patterns.

Good Practice 1: Good annotation and documentation • Clear textual definitions with references • ensure accurate manual annotation • make assertions of scientific fact traceable • serve as documentation for future ontology developers • Also useful to record – for users and future developers: • Experimental evidence for assertions of scientific fact • Notes on confusing or conflicting usage of terms • Reasons for design choices/compromises

Options for formalization • OWL • W3C standard • Decidable • Big open source community of tool developers • multiple fast reasoners – getting better all the time • Easy to read syntax – OWL Manchester syntax (OWL MS) • OBO • Best thought of as a subset of OWL, with which it is increasingly integrated • Limited community of tool developers • Easy(ish) to read syntax • Common Logic • Very powerful, but it is easy to come up with solutions that can’t be usefully reasoned with.

Relationships are the formalized part of a definition. • The criteria for class membership are recorded using textual definitions, at least some elements of which are formalized as relationships. • name: insect wing • def: “A membranous dorsal appendage of the meso- or metathorax that functions in flight.” [Snodgrass, 1935] • is_a: appendage • relationship: part_of thoracic segment • relationship: has_function_in flight

Classification is transitive • If A SubClassOf* B and B SubClassOf C then A SubClassOf C • All members of class A are members of class C. So, the definition of class C must apply to class A. • * OWL (MS) SubClassOf ≅ OBO is_a

Classification is transitive • ‘material anatomical entity’ <- is_a ‘sense organ’ <- is_a sensillum <- is_a ‘olfactory sensillum’ <- is_a ‘antennal basiconic sensillum’ • ‘material anatomical entity’: “… has mass.” • ‘sense organ’: “… functions in the detection of a stimulus involved in sensory perception.” • sensillum: “A sense organ consisting of a small cluster of cells of various types.” • ‘olfactory sensillum’: “… functions in the detection of smell” • * OWL (MS) SubClassOf ≅ OBO is_a
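The transitivity of is_a on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the hierarchy is held as a plain dictionary; the names IS_A and ancestors are invented for the sketch and do not come from any ontology tool.

```python
# Toy is_a hierarchy copied from the slide (child -> list of asserted parents).
IS_A = {
    "antennal basiconic sensillum": ["olfactory sensillum"],
    "olfactory sensillum": ["sensillum"],
    "sensillum": ["sense organ"],
    "sense organ": ["material anatomical entity"],
}


def ancestors(cls, is_a=IS_A):
    """Return every superclass of cls by following is_a links transitively."""
    seen = []
    stack = list(is_a.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(is_a.get(parent, []))
    return seen


if __name__ == "__main__":
    # Every definition attached to one of these classes must also hold
    # for 'antennal basiconic sensillum'.
    for superclass in ancestors("antennal basiconic sensillum"):
        print(superclass)
```

Only four is_a links are asserted, yet the most specific class ends up under all four superclasses, so each of their textual definitions has to be true of it.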

class – class relationships are quantified • Class:Class relationships are many to many • Does the relation apply to all or just some of the class ? • we specify this with quantifiers: • ∀: for all, all, only, every • ∃: there exists, some • Cautionary note – • Modeling knowledge as class hierarchies defined with quantified logic is extremely useful but limited. • Don’t expect to be able to use it for everything you know! • Expressivity of OWL is more limited still.

relationships specify necessary conditions for class membership • Being part of an insect thorax is a necessary condition of being in the class ‘insect leg’. • English: • All insect legs are part of some (type of) insect thorax • OBO (quantifiers hidden) • name: insect leg • relationship: part_of thorax • OWL (MS): • ‘insect leg’ SubClassOf part_of some thorax • PL: • ∀x [leg(x) → ∃y (thorax(y) ∧ part_of(x, y))] * • * ignoring time argument from OBO RO 2005

Classification is transitive • If A SubClassOf* B and B SubClassOf C then A SubClassOf C • All members of class A are members of class C. So, the definition of class C must apply to class A. • (all) leg part_of some thorax • ‘front leg’ SubClassOf leg • therefore (all) ‘front leg’ part_of some thorax • * OWL (MS) SubClassOf ≅ OBO is_a
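The inheritance step on this slide (a relationship asserted on leg also holds for ‘front leg’) could be sketched as follows. This is a toy model, assuming every relationship is of the "all X relation some Y" form; the dictionary layout and the function name inherited_relationships are made up for illustration.

```python
# Asserted is_a links and existential ("some") relationships from the slide.
IS_A = {"front leg": ["leg"]}
RELATIONSHIPS = {"leg": [("part_of", "thorax")]}


def inherited_relationships(cls):
    """Collect relationships asserted on cls or on any of its superclasses."""
    found = []
    visited = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        if current in visited:
            continue
        visited.add(current)
        found.extend(RELATIONSHIPS.get(current, []))
        todo.extend(IS_A.get(current, []))
    return found


if __name__ == "__main__":
    # 'front leg' has no relationship of its own; it picks one up from 'leg'.
    print(inherited_relationships("front leg"))
```

The same mechanism is why a criterion attached to a distant ancestor can introduce an error far down the hierarchy: every subclass inherits it whether or not the editor remembers it is there.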

Directionality and quantifiers • True: all ‘insect wing’ part_of some ‘insect thorax’ • False: all ‘insect thorax’ has_part some ‘insect wing’ • True: all ‘claw’ connected_to some ‘tarsal segment’ • False: all ‘tarsal segment’ connected_to some claw

Manually maintaining an ontology with multiple classification schemes is impractical • It is difficult to keep track of multiple classification chains to: • ensure completeness; • avoid redundancy; • avoid introducing error due to inheritance of classification criteria from a distant ancestor

Automating multiple classification. • The scientific knowledge an ontology contains can make the reasons for classification explicit. • e.g. • Any sense organ that functions in the detection of smell is an olfactory sense organ • All large basiconic sensilla of the antenna function in detection of smell • Therefore all large basiconic sensilla of the antenna are olfactory sense organs

Automating multiple classification. • We can specify that some set of necessary conditions for class membership is sufficient to determine class membership • English • Any sense organ that functions in the detection of smell is an olfactory sense organ • OWL (MS): • ‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • OBO • name: olfactory sense organ • intersection_of: sense organ • intersection_of: has_function_in ‘detection of chemical stimulus involved in sensory perception of smell’

Automating multiple classification. • ‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • ‘large basiconic sensillum of antenna’ SubClassOf: ‘sense organ’; SubClassOf: has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • Reasoner concludes: ‘large basiconic sensillum of antenna’ SubClassOf ‘olfactory sense organ’ • Keene & Waddell, 2007
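What the reasoner does on this slide can be imitated with a deliberately small Python sketch. It assumes a defined class is just a genus plus one relationship, and it matches relationship fillers by exact name only (a real OWL reasoner also accepts subclasses of the filler). All names here (DEFINED, infer_defined_superclasses and so on) are invented for the illustration.

```python
SMELL = "detection of chemical stimulus involved in sensory perception of smell"

# Asserted (necessary) conditions.
IS_A = {"large basiconic sensillum of antenna": {"sense organ"}}
RELATIONSHIPS = {
    "large basiconic sensillum of antenna": {("has_function_in", SMELL)},
}

# Necessary AND sufficient conditions: (genus, differentiating relationship).
DEFINED = {"olfactory sense organ": ("sense organ", ("has_function_in", SMELL))}


def superclasses(cls):
    """All asserted superclasses of cls, followed transitively."""
    result = set()
    todo = list(IS_A.get(cls, ()))
    while todo:
        current = todo.pop()
        if current not in result:
            result.add(current)
            todo.extend(IS_A.get(current, ()))
    return result


def infer_defined_superclasses(cls):
    """Names of defined classes whose sufficient conditions cls satisfies."""
    lineage = superclasses(cls) | {cls}
    relationships = set()
    for member in lineage:
        relationships |= RELATIONSHIPS.get(member, set())
    return sorted(
        name
        for name, (genus, differentia) in DEFINED.items()
        if name != cls and genus in lineage and differentia in relationships
    )


if __name__ == "__main__":
    print(infer_defined_superclasses("large basiconic sensillum of antenna"))
```

The editor never asserts that the sensillum is an olfactory sense organ; the classification falls out of the recorded function, which is what makes multiple classification schemes maintainable.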

Use other people’s work to build your classification • Gene Ontology classification of sensory processes:

Automating multiple classification.

Some extra OWL expressivity • In OWL we can also specify number (cardinality): • (all) insect: SubClassOf has_component exactly 6 leg

Error checking is essential – everybody makes mistakes • Some classes don’t have instances in common. Nothing can be an oak tree and a fruit fly; an anatomical structure and a biological process. • We say that such classes are disjoint • Declaring classes to be disjoint allows reasoners to find contradictions. This is especially powerful when combined with domain and range constraints. • This is your main means of error checking. Use it extensively. It also speeds up some reasoners.

Error checking - domain and range constraints • ‘cortisol secretion’ SubClassOf ‘endocrine hormone secretion’ SubClassOf process • ‘adrenal gland’ SubClassOf ‘endocrine gland’ SubClassOf structure • structure DisjointWith process (nothing can be both a structure, e.g. adrenal gland, and a process, e.g. cortisol secretion) • has_function_in • domain: structure* • range: process* • if x has_function_in y then x must be a structure and y must be a process. • Now if I mistakenly add: ‘cortisol secretion’ has_function_in some ‘adrenal gland’ • Inconsistency: ‘cortisol secretion’ SubClassOf structure and process • * more strictly, domain = continuant; range = occurrent
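The error check on this slide can be approximated in Python. This is a sketch under simplifying assumptions: classes live in a dictionary, disjointness is a list of pairs, and the checker only looks at one new assertion at a time. The function name check and the data layout are hypothetical, not part of any reasoner's API.

```python
IS_A = {
    "cortisol secretion": ["endocrine hormone secretion"],
    "endocrine hormone secretion": ["process"],
    "adrenal gland": ["endocrine gland"],
    "endocrine gland": ["structure"],
}
DISJOINT = [("structure", "process")]
# relation -> (domain, range)
DOMAIN_RANGE = {"has_function_in": ("structure", "process")}


def lineage(cls):
    """cls together with all of its superclasses."""
    result = {cls}
    todo = [cls]
    while todo:
        for parent in IS_A.get(todo.pop(), []):
            if parent not in result:
                result.add(parent)
                todo.append(parent)
    return result


def check(subject, relation, obj):
    """List the contradictions caused by asserting: subject relation some obj."""
    domain, range_ = DOMAIN_RANGE[relation]
    problems = []
    # The domain forces a type onto the subject, the range onto the object.
    for term, forced in ((subject, domain), (obj, range_)):
        types = lineage(term) | {forced}
        for first, second in DISJOINT:
            if first in types and second in types:
                problems.append(
                    f"'{term}' used with {relation} must be both "
                    f"{first} and {second}"
                )
    return problems


if __name__ == "__main__":
    # The mistaken assertion from the slide (subject and object swapped).
    for problem in check("cortisol secretion", "has_function_in", "adrenal gland"):
        print(problem)
```

With the arguments the right way round, check("adrenal gland", "has_function_in", "cortisol secretion") returns an empty list. Without the disjointness declaration the swapped assertion would pass silently, which is why declaring disjointness matters.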

Reasoner assisted error checking by eye • Keep an eye on classification inferred by the reasoner. • Protégé shows inferred classification and inherited relationships – keep an eye on these

Reasoner assisted error checking by eye • Run some test queries – do they give the answers you expect?

Mereology • part_of is transitive: if A part_of B part_of C part_of D, then A part_of D • overlaps is not transitive: if A overlaps B overlaps C, then A may or may not overlap C

Transitivity of part_of • Given • (All) ‘insect coxa’ part_of some ‘insect leg’ • (All) ‘insect leg’ part_of some ‘insect thoracic segment’ • (All) ‘insect thoracic segment’ part_of some ‘insect thorax’ • Then • (All) ‘insect coxa’ part_of some ‘insect thorax’

Automating partonomy • As for classification, maintaining multiple overlapping part hierarchies by hand is hard. • Some scope for auto-populating partonomies, e.g. • English • Any anatomical structure that functions in endocrine hormone secretion is part of some endocrine system • OWL • (‘anatomical structure’ that has_function_in some ‘endocrine hormone secretion’) SubClassOf (part_of some ‘endocrine system’) • OBO • name: endocrine system component • intersection_of: anatomical structure • intersection_of: has_function_in ‘endocrine hormone secretion’ • relationship: part_of endocrine system

Declaring spatial disjointness provides error checking for partonomy • In OWL: (part_of some X) DisjointWith (part_of some Y)

Reasoning with overlap • A overlaps B if and only if there exists some X such that X part_of A and X part_of B • Rules: • If X part_of A then X overlaps A • If A has_part X then A overlaps X • Relation hierarchy: part_of * overlaps; has_part * overlaps • In OWL (MS) * = SubPropertyOf • In OBO * = is_a

Reasoning with overlap • More rules: • If A has_part X and X part_of B then A overlaps B • If C has_part A and A overlaps B then C overlaps B • If B overlaps A and A part_of C then B overlaps C • In OWL (MS): has_part o part_of -> overlaps • In OBO: name: overlaps; holds_over_chain: has_part part_of
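The part_of and overlaps rules from the mereology slides can be tried out with a small forward-chaining sketch in Python. It makes one loud assumption: A, B, C, X and Y are treated as individual structures, so part_of and has_part can be read as inverses (at the class level, with quantifiers, that does not hold). The function name infer is hypothetical.

```python
def infer(facts):
    """Apply the part_of / has_part / overlaps rules until nothing new appears.

    facts is an iterable of (subject, relation, object) triples about
    individual structures (an assumption of this sketch).
    """
    facts = set(facts)
    changed = True
    while changed:
        new = set()
        for a, r1, b in facts:
            if r1 == "part_of":
                new.add((b, "has_part", a))  # inverse, individuals only
            if r1 == "has_part":
                new.add((b, "part_of", a))
            if r1 in ("part_of", "has_part"):
                new.add((a, "overlaps", b))  # both are sub-relations of overlaps
            for c, r2, d in facts:
                if b != c:
                    continue
                if r1 == "part_of" and r2 == "part_of":
                    new.add((a, "part_of", d))  # part_of is transitive
                if r1 == "has_part" and r2 == "part_of":
                    new.add((a, "overlaps", d))  # has_part o part_of -> overlaps
                if r1 == "has_part" and r2 == "overlaps":
                    new.add((a, "overlaps", d))
                if r1 == "overlaps" and r2 == "part_of":
                    new.add((a, "overlaps", d))
        changed = not new.issubset(facts)
        facts |= new
    return facts


if __name__ == "__main__":
    # X is shared by A and B; Y is shared by B and C.
    result = infer([
        ("X", "part_of", "A"), ("X", "part_of", "B"),
        ("Y", "part_of", "B"), ("Y", "part_of", "C"),
    ])
    print(("A", "overlaps", "B") in result)
    print(("B", "overlaps", "C") in result)
    print(("A", "overlaps", "C") in result)
```

In this example A overlaps B and B overlaps C are both derived, but A overlaps C is not, which matches the statement that overlap is not transitive.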

Image: Greg Jefferis; Keene & Waddell, 2007

Shortcut relations • In OWL, we can write compound class expressions: • ‘antennal lobe projection neuron’ has_part some (soma that part_of some ‘antennal lobe cortex’) • But these can quickly get long and verbose • ‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’))

Shortcut relations • Shortcut relations stand in for compound class expressions. • ‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’)) • shortens to • ‘DL1 adPN’ has_postsynaptic_terminal_in some ‘DL1 glomerulus’ • Can be expanded if detail needed. • Provides rigorous documentation of meaning.
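Expanding a shortcut relation back into its compound class expression is essentially template substitution, as the Python sketch below illustrates. It assumes each shortcut maps to a single fixed template with one slot for the filler; the SHORTCUTS table and the expand function are invented for this example and are not how any particular tool stores shortcut definitions.

```python
# Each shortcut relation stands for a compound expression with one open slot.
SHORTCUTS = {
    "has_postsynaptic_terminal_in": (
        "has_part some ('postsynaptic membrane' that part_of some "
        "(synapse that part_of some {filler}))"
    ),
}


def expand(relation, filler):
    """Return the long-form class expression for 'relation some filler'.

    Relations without a registered template are returned unexpanded.
    """
    template = SHORTCUTS.get(relation)
    if template is None:
        return f"{relation} some {filler}"
    return template.format(filler=filler)


if __name__ == "__main__":
    print("'DL1 adPN' SubClassOf " +
          expand("has_postsynaptic_terminal_in", "'DL1 glomerulus'"))
```

Editors and annotators work with the short form, and the long form can be generated whenever a reasoner or a reader needs the full detail.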

Where to start? • Make a flat list of the terms you need and list the types of classification you want to use to link them together. • Has someone already formalized this type of classification? • If so, use their pattern. If not – draft some formalizations yourself: • Are any simplifications justifiable – or likely to be too misleading? • DON’T FORMALIZE FOR THE SAKE OF IT! Some classifications are hard to formalize well – or may be best left to human judgment. • Import upper classifications and relations • Import classifications to root for all foreign terms used. • Work with ontologists to formally define relations where possible • But don’t let this become a road block!

Technical issues • Imports: • Importing whole ontologies is easy in both OBO and OWL • But importing large ontologies is impractical in both • Generating simple slices of OBO ontologies is easy (have Perl scripts, happy to share) • Generating slices of OWL ontologies – some tools exist (OntoFox), but they still need work.

Developing nested ontologies • [Figure: CARO, VAO, TAO; labels “Present” and “Modularized ontology”]

Resources • CARO – upper ontology • new version in preparation, out soon. • Some standard patterns using qualities • FUNCARO • provides standard patterns for representing function using CARO + GO • ro.owl • new home for OBO relations – particularly shortcut relations. Imports fundamental relations from BFO (Basic Formal Ontology)

Multiple classification • There are lots of scientifically useful ways to classify a bit of anatomy: • its parts and their arrangement • its relation to other structures • what is it: part of; connected to; adjacent to, overlapping? • its shape • its function • its developmental origins • its species or clade • its evolutionary history?
