War of Ontological Worlds: Mathematics, Computer Code or Esperanto? By Andrey Rzhetsky and James A. Evans. PLoS Computational Biology 7(9): e1002191 (http://tinyurl.com/6r34vst). Presented by Brian Davis, Ph.D., VCDE WS Teleconference, November 17, 2011.
Why is this Paper Important (for Brian)? • It offers a conceptual framework to understand, and perhaps address, issues I see (in caBIG, 3rd Millennium, politics, …life) • Why do people talk past each other? • Why do some participants in caBIG (among whose aims is, after all, semantic interoperability) have such a hard time understanding one another? • Why do some “technologists” disagree on approaches to solving technical issues (SPARQL vs. DSQL; ontologies vs. metadata, etc.)? • This is not a deep technical paper (4 pages, 1 figure). • It is enjoyable and clever (Mao, Tolkien). • However, it illustrates different points of view (POV) among “ontology” experts… • especially with regard to the value of use cases and users. • It allows a facilitator to perhaps find common ground to build upon.
War of Ontology Worlds: Different Views on Ontologies • “In biomedicine today, the term ontology means different things to different experts” • Abstract: • 3 clusters of experts view… • Ontology as Mathematics: value lies in rigor and logic, symmetry and consistency, representation across scientific subfields, and inclusion of only non-contradictory knowledge • Ontology as Code: value lies in utility and diversity (fitness for purpose and custom design of ontologies) • Ontology as Esperanto: value lies in facilitating cross-disciplinary communication across data sets and diverse communities • The authors argue that these views align with classical divides in science, and suggest how a synthesis of their concerns could strengthen the next generation of biomedical ontologies.
Origins of and developments in ontologies • Definition: philosophical inquiry into the nature and categories of existence • Circa 1900: logicians extended and formalized…as a system for describing entities that exist in the world.” • properties • interrelationships • inferential mechanisms for reasoning • Circa 1990: computer scientists …applying it to…machine-readable knowledge representations. • Circa 2011: with the rise of scientific databases that are increasingly complex, persistent, and in need of interoperability, ontologies have been enlisted in information technologies
Meanings of Ontologies • “In biomedicine today, the term ontology means different things to different experts” • A continuum: • Unordered terminologies • Example: American Medical Association's Current Procedural Terminology (CPT codes) • Taxonomies • Example: International Classification of Diseases (ICD) • Organized by hierarchical “is-a” relationships • Formal ontologies • Example: Gene Ontology (GO), Foundational Model of Anatomy (FMA) • Organized by rich, rigorous relationships • Disagreement over how to categorize GO (its structure is inconsistent)
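To make “organized by hierarchical is-a relationships” concrete, here is a minimal Python sketch of what such links buy you: once each term points to its parents, a query for a general term can also retrieve anything annotated with a more specific one. The terms and the `IS_A`, `ancestors`, and `subsumes` names are hypothetical illustrations, not taken from ICD, GO, or FMA. Note that a term may have several parents, as in GO, so the hierarchy is a graph rather than a strict tree.

```python
from typing import Dict, Set

# Hypothetical toy "is-a" taxonomy: each term maps to its direct parent terms.
# These labels are illustrative only; they are not drawn from ICD, GO or FMA.
IS_A: Dict[str, Set[str]] = {
    "viral pneumonia": {"pneumonia"},
    "pneumonia": {"lung disease", "infectious disease"},  # two parents
    "lung disease": {"disease"},
    "infectious disease": {"disease"},
    "disease": set(),
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following is-a links upward."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, set()))
    return seen


def subsumes(general: str, specific: str, is_a: Dict[str, Set[str]]) -> bool:
    """True if `specific` is-a `general`, directly or transitively (or they are equal)."""
    return general == specific or general in ancestors(specific, is_a)


if __name__ == "__main__":
    print(sorted(ancestors("viral pneumonia", IS_A)))
    print(subsumes("infectious disease", "viral pneumonia", IS_A))
```

An unordered terminology, by contrast, would have no `IS_A` links at all, so a query would only ever match the exact term used.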
Use examples • Unstructured Terminology (CPT) • Billing patients for medical procedures at hospitals • Taxonomies and Ontologies (GO) • Annotation of experimental findings in research • Formal Ontologies (FMA) • Reasoning across annotated findings for novel insight
Ontologists and Ontologies • Ontologies are constructed by heterogeneous groups of: • Computer scientists • Bench biologists • Bedside physicians • Programmers • Philosophers • = “Ontologists” (self-identified) • Conferences of ontologists: • Focus on construction of ontologies • NOT focused on understanding ontologies as knowledge representations • When that is discussed, the result is a “scuffle of emotionally charged opinion”.
This paper • Interviews with 14 leading ontologists • Summarizes the wide range of worldviews • Categorizes them into 3 archetypes, or caricatures, that highlight essential differences: • Mathematics • Code • Esperanto • Intermediate views: weighted mixtures of the 3 archetypes above.
Ontologies as Mathematics • Value = formal consistency (because…) …the ultimate goal is computational reasoning across ontologies • A single, unifying ontology covering the whole of biology and medicine is possible to design and pursue • It need not be complete, and it should contain only established knowledge in order to approximate the underlying reality • Quote: “unless you have a core of terms and relations which is universally valid, however small it might be, then you’re always going to have some kind of slack in your ontology…fall short of rigorous…” • No need to represent uncertainty, hypotheses, or speculations • Introducing probability will lead to “…results of quite low value.” • First-order logic and its computationally tractable subsets are the appropriate tools for …inference across rigorous ontologies.
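The Mathematics view's insistence on non-contradictory knowledge can be illustrated with a deliberately tiny consistency check in Python. This is not first-order logic or a description-logic reasoner; it is a toy that assumes only two kinds of axioms (is-a links and pairwise class disjointness), and all class and individual names are hypothetical. An individual that, after following is-a links, falls into two disjoint classes is flagged as a contradiction, the kind of statement this camp would keep out of an ontology altogether.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy ontology: is-a links plus one disjointness axiom.
IS_A: Dict[str, Set[str]] = {
    "bacterium": {"prokaryote"},
    "yeast": {"eukaryote"},
    "prokaryote": {"organism"},
    "eukaryote": {"organism"},
    "organism": set(),
}
DISJOINT: Set[FrozenSet[str]] = {frozenset({"prokaryote", "eukaryote"})}


def inferred_classes(asserted: Set[str], is_a: Dict[str, Set[str]]) -> Set[str]:
    """Asserted classes plus every class reachable through is-a links."""
    result: Set[str] = set()
    stack = list(asserted)
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(is_a.get(cls, set()))
    return result


def find_contradictions(
    assertions: Dict[str, Set[str]],
    is_a: Dict[str, Set[str]],
    disjoint: Set[FrozenSet[str]],
) -> List[Tuple[str, str, str]]:
    """Report (individual, class_a, class_b) whenever an individual falls into two disjoint classes."""
    problems: List[Tuple[str, str, str]] = []
    for individual, asserted in assertions.items():
        classes = sorted(inferred_classes(asserted, is_a))
        for a, b in combinations(classes, 2):
            if frozenset({a, b}) in disjoint:
                problems.append((individual, a, b))
    return problems


if __name__ == "__main__":
    data = {"isolate_1": {"bacterium"}, "isolate_2": {"bacterium", "yeast"}}
    print(find_contradictions(data, IS_A, DISJOINT))
```

Under this view the fix is to remove the offending assertion; the Code and Esperanto views would be more inclined to keep it and handle the conflict some other way.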
Ontologies as Mathematics (cont.) • “Every ontology ever built should have the same upper [level] ontology, ideally.” • Examples: BFO, SUMO, Cyc • Upper-level ontologies will compete for scientific attention until the best one wins out • Training: computer scientists and philosophers
Ontologies as Code • Value: utility • Practical value should trump mathematical elegance. • Ontologies should be designed specifically for a range of special or general purposes (as languages such as C++ and HTML are) • Quote: “I view ontologies as primarily as software artifacts.” • An ontology should serve its function and its intended user community (even if small) • The number of ontologies should be equal to or greater than the number of projects requiring structured knowledge representations • “Let a thousand flowers bloom” (Mao): let users create their own custom ontologies • Design choices (of the ontology) are secondary to the desired utility • Explicitly OPPOSED to the view of a unified ontology for the whole of biomedicine
Ontologies as Code (cont.) • “overly abstract mathematical ontologies provide a false sense of certainty. They obscure distinctions that might be useful to a particular task, and make unnecessary distinctions.” • Abstract, upper-level ontologies are disconnected from reality and may not have utility. • Ontologies should be evaluated based on usability and efficiency in the context of specific problems. • No unification of all ontologies: all ontologies can coexist in peace • This group: medical researchers, clinical researchers, bioinformaticists, and biologists
Ontologies as Esperanto • Value: facilitation of cross-community communication • Ontologies should cross-link concepts from different domains to allow knowledge transfer and insight between areas, even if imperfectly. • Motivated by the possibility of making data computable across fields, experimental techniques, countries, and time periods • A unified ontology is unrealistic • The practical solution is “a federated interlinkage…a grid or a network of ontologies and vocabularies…” • Systematically borrowing terms between ontologies is essential to create productive overlaps that reduce redundancy and facilitate cross-communication. • Complete cross-mapping is not needed; partial mapping is sufficient to compute over datasets as a whole. • Besides deep domain knowledge and design precision, ontology construction requires diplomatic social activity to coordinate between scientists and fields. • This group: linguists
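The Esperanto claim that partial cross-mapping is enough to compute over datasets as a whole can be sketched in Python as follows. It assumes two hypothetical datasets annotated with different local vocabularies and a hand-built, incomplete mapping to shared concepts; `MAPPING`, `pooled_counts`, and every term shown are illustrative, not from any real resource. Records whose terms are mapped are pooled; unmapped terms are reported rather than silently dropped, so the gap in the mapping stays visible.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (record id, local vocabulary term)

# Two hypothetical datasets, each annotated with its own local vocabulary.
DATASET_A: List[Record] = [("a1", "A:heart_attack"), ("a2", "A:stroke"), ("a3", "A:heart_attack")]
DATASET_B: List[Record] = [("b1", "B:myocardial_infarction"), ("b2", "B:syndrome_x")]

# Partial cross-mapping from local terms to shared concepts.
# "B:syndrome_x" is deliberately left unmapped.
MAPPING: Dict[str, str] = {
    "A:heart_attack": "myocardial infarction",
    "A:stroke": "stroke",
    "B:myocardial_infarction": "myocardial infarction",
}


def pooled_counts(datasets: List[List[Record]], mapping: Dict[str, str]) -> Tuple[Counter, List[str]]:
    """Count records per shared concept across datasets; collect terms with no mapping."""
    counts: Counter = Counter()
    unmapped: List[str] = []
    for records in datasets:
        for _record_id, term in records:
            concept = mapping.get(term)
            if concept is None:
                unmapped.append(term)
            else:
                counts[concept] += 1
    return counts, unmapped


if __name__ == "__main__":
    counts, unmapped = pooled_counts([DATASET_A, DATASET_B], MAPPING)
    print(dict(counts), unmapped)
```

The mapping never has to be complete: adding an entry for a previously unmapped term simply enlarges the pooled result.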
How they view each other • Mathematics vs. Code and Esperanto • Sees the Code and Esperanto approaches as messy and inconsistent, even “silly and childish” • Esperanto and Code ontologies are inefficient to improve • Reasoning over Esperanto or Code ontologies is rarely possible without using probability to allow for contradiction and error • Mathematics vs. Esperanto • Sees efforts to integrate domain-specific ontologies as compromising half-measures that abandon the potential strength of unification
How they view each other (cont.) • Code and Esperanto vs. Mathematics • The Mathematics approach is utopian • Of little practical use • Even potentially sinister (“one mother ontology to serve all purposes and in the darkness bind them,” after The Lord of the Rings) • Code vs. Mathematics • Mathematics ontologies are incomplete • Unrepresentative of relevant knowledge in an area • Hence, unproductive • Mathematics ontologies seem rigid and artificial to domain experts • Esperanto vs. Code • The Code environment is “eclectic chaos” • Multiplying unnecessary redundancy • Failing to exploit natural linking opportunities across knowledge
Ontology Challenges posed by Text Mining • Multiple levels of granularity coexist in the scientific literature • E.g., “protein methylation” • Molecular biology: “PRMT5 methylates histones H3 and H4” • Chemistry: a multistage process • Therefore, if we extract information from (legacy) text and want to retain the fidelity of its source, we cannot commit to a single representation. • Disagreements persist in scientific communities: to retain fidelity (without arbitrary censorship), the disagreements must be retained. • Objects in ontologies change over time, so their mentions in text may also change (e.g., over the childhood life cycle, the outcome of measles exposure depends on the time of exposure) • Uncertainty and ambiguity must be retained: theories and symbols change (e.g., what was early on called “tubulin” later became “alpha-tubulin,” “beta-tubulin,” etc.)
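One way to honor these text-mining requirements (retain disagreement, uncertainty, and change over time) is to store each extracted statement with its provenance instead of collapsing sources into one canonical fact. The Python sketch below assumes a simple subject/predicate/object record with a source identifier and year; the `Claim` type, the helper functions, and the example data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Claim:
    """One statement extracted from text, kept together with its provenance."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical document identifier
    year: int


def claims_about(store: List[Claim], subject: str, predicate: str) -> List[Claim]:
    """All claims for (subject, predicate), oldest first; conflicting claims are kept, not merged."""
    matches = [c for c in store if c.subject == subject and c.predicate == predicate]
    return sorted(matches, key=lambda c: c.year)


def is_contested(store: List[Claim], subject: str, predicate: str) -> bool:
    """True when the sources disagree about the object of (subject, predicate)."""
    return len({c.obj for c in claims_about(store, subject, predicate)}) > 1


if __name__ == "__main__":
    store = [
        Claim("protein_X", "localizes_to", "cytoplasm", "doc_2", 2005),
        Claim("protein_X", "localizes_to", "nucleus", "doc_1", 2001),
        Claim("protein_Y", "localizes_to", "membrane", "doc_3", 2003),
    ]
    for claim in claims_about(store, "protein_X", "localizes_to"):
        print(claim.year, claim.obj, claim.source)
    print(is_contested(store, "protein_X", "localizes_to"))
```

Because nothing is overwritten, a contested fact stays visible as a contested fact, which is the “representativeness” the conclusions call for.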
Conclusions and Next steps • “These challenges suggest a new virtue: Representativeness” (Esperanto view) • If ontologies are employed to index biomedical knowledge and to discover it, they must maintain inconsistent biomedical claims (just as research scientists attempt to do) • Inconsistencies should not be ignored, as they point to theoretical weaknesses and opportunities. • The authors suggest that: • All three ontology perspectives need to be honored • The usability of an ontology for a particular community should NOT be compromised • Additional efforts to maximize an ontology's mathematical rigor will improve its reuse and facilitate integrative analysis and discovery across biomedicine
At the caBIG ICR and TBPT F2F in August 2011: • Misunderstanding among research scientists, bioinformaticists, and computer scientists (Esperanto, Code, and Mathematics) • Semantic infrastructure: discovery across federated services via ontologies (SPARQL) (Mathematics) • Terminologies/ontologies as fit for purpose in specific circumstances (BioPortal, with >400 available) • Tension between software developer teams and users over the terms they want/need (Code) vs. infrastructure teams that desire “higher-level utility” (e.g., discovery across federated data via structured ontologies) • Need for quick changes and local terminologies (“Dynamic Extensions”) (Code) vs. need for consistency and rigor for federated discovery (Mathematics) • SAIF levels of abstraction (Conceptual, Logical, Implementable) and viewpoints (Information, Business, etc.)
Suggestions? • There are three valid points of view regarding the use and value of ontologies • As technologists (possibly leaning toward the Mathematics side), we should not foist our opinions and values on the Code and Esperanto camps.