Ontrez Clement Jonquet – Nigam Shah {jonquet,nigam}@stanford.edu
Speech overview • Ontrez general idea • “provide a service that will enable users to locate biomedical data resources related to their search for particular ontology terms” • Functional specification and conceptual levels • Ontrez example • GEO dataset • Architecture and implementation • UML class diagrams • Ontrez position within NCBO project • Relation with OBD@Berkeley and OBD@Stanford • Next steps & conclusion
Classic Ontrez use case A user searches for information and content related to a specific disease… • Go to BioPortal to search ontologies… • The given disease name matches several ontology terms… • For each of these terms, the user gets a link to all the resource elements (data sets, clinical trials, articles) annotated with this term.
Ontrez challenge • The number of biomedical resource elements (e.g., experiments and data) in the public domain is exploding • Element = a collection of observations resulting from a biomedical experiment (experimental data sets, records of disease associations of gene products in mutation databases, entries of clinical-trial descriptions, etc.) • Resource = a collection of elements (GEO, PubMed, or other public repositories) • Researchers need tools that enable them to find all the resource elements relevant to their area of study • The problem now is locating the ‘elements’ that matter to a user. • The key challenge is to annotate (or tag) the various resource elements to identify the biomedical concepts to which they relate • Annotation = an assertion declaring a relationship between a biomedical resource element and a term in an ontology • Term = a concept found in a specific ontology
Creation of the annotation database: Ontrez proposal • Retrieve the metadata from data resources (A) • Annotate/tag them with ontology terms using the library of ontologies in BioPortal (B) • Store the result in an annotation database
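The annotation step (B) can be illustrated with a minimal sketch of dictionary-based tagging. This is not the Ontrez code, which is written in Java and relies on dedicated concept recognition tools: the dictionary contents, the term IDs, the element ID `GDS1234` and the function name `annotate` are all hypothetical, and plain whole-word matching stands in for a real concept recognizer.

```python
import re

# Hypothetical dictionary: lower-cased term name -> term ID (IDs are made up).
DICTIONARY = {
    "melanoma": "T:0001",
    "breast cancer": "T:0002",
    "cancer": "T:0003",
}


def annotate(element_id, metadata, dictionary):
    """Return sorted (element_id, term_id) pairs for each dictionary term found in the metadata."""
    text = metadata.lower()
    rows = set()
    for name, term_id in dictionary.items():
        # Whole-word match, so "cancer" does not match inside "cancerous".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            rows.add((element_id, term_id))
    return sorted(rows)


# Hypothetical element: an ID plus the text metadata retrieved in step (A).
annotations = annotate(
    "GDS1234", "Expression profiling of breast cancer cell lines", DICTIONARY
)
```

Each returned pair corresponds to one row that would be stored in the annotation database.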
Query of the annotation database: Ontrez proposal • User queries are formulated as a set of terms (1) • Use of the BioPortal index to convert the query to ontology terms • Use the subsumption relations in the ontologies and the mappings in BioPortal to expand the query • Query the annotation tables with the expanded set of terms (2) • The user receives the result (3) in terms of references to the original data sources.
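The expansion and lookup steps above can be sketched as follows. This is an illustration only: the toy subsumption table, the mapping table, the annotation rows and the function names are assumptions, and the conversion of free-text queries to ontology terms through the BioPortal index is left out.

```python
from collections import deque

# Hypothetical toy data (not taken from any real ontology).
CHILDREN = {"cancer": ["melanoma", "carcinoma"], "carcinoma": ["adenocarcinoma"]}
MAPPINGS = {"melanoma": ["ONTO2:melanoma"]}  # term -> equivalent terms elsewhere
ANNOTATIONS = [
    ("E1", "melanoma"),
    ("E2", "adenocarcinoma"),
    ("E3", "ONTO2:melanoma"),
    ("E4", "diabetes"),
]


def expand(terms, children, mappings):
    """Breadth-first expansion of the query terms over subsumption and mappings."""
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for nxt in children.get(term, []) + mappings.get(term, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def query(terms, children, mappings, annotations):
    """Return the sorted IDs of elements annotated with any expanded term."""
    expanded = expand(terms, children, mappings)
    return sorted({element for element, term in annotations if term in expanded})


results = query(["cancer"], CHILDREN, MAPPINGS, ANNOTATIONS)
```

In this sketch a query for "cancer" also reaches elements annotated only with a more specific term or with a mapped term from another ontology.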
Ontrez functional specification • For a given resource, being able to: • access and update automatically resource information • access and keep locally the set of elements of this resource • automatically update the local copy if necessary • extract the structure of elements of this resource • For a given element, being able to: • extract (according to the structure) the metadata • annotate each part of the metadata with a dictionary • For a given set of ontologies, being able to: • construct a dictionary • For a given annotation, being able to: • process the transitive closure for a given set of ontologies • link back to the original resource element • For a given term, being able to: • get the resource elements annotated with this term • semantically expand to a larger relevant set of terms
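One possible reading of "process the transitive closure" is that an annotation with a term also implies annotations with all of that term's ancestors. The sketch below follows that reading, which is an assumption; the `PARENTS` table, the element ID and the function names are hypothetical.

```python
# Hypothetical is_a hierarchy: term -> direct parent terms.
PARENTS = {
    "adenocarcinoma": ["carcinoma"],
    "carcinoma": ["cancer"],
    "melanoma": ["cancer"],
}


def ancestors(term, parents):
    """Collect every ancestor of a term by walking the parent links."""
    result = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(parents.get(parent, []))
    return result


def close_annotations(direct, parents):
    """Add one (element, ancestor) pair for every ancestor of each direct annotation."""
    closed = set(direct)
    for element_id, term in direct:
        for ancestor in ancestors(term, parents):
            closed.add((element_id, ancestor))
    return sorted(closed)


closed = close_annotations([("E2", "adenocarcinoma")], PARENTS)
```

Precomputing the closure trades storage for query speed: a lookup by a general term then needs no hierarchy traversal at query time.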
Ontrez in the new BioPortal prototype (screenshot callouts) • Ontology term searched for by a user • Example of an available resource (name and description) • Number of resource elements annotated with this ontology term • ID of an element • Context in which an element has been annotated • URL link to the original element
Ontrez implementation • Construction of the index: Java • Storage of resource elements, dictionaries, and … annotations: MySQL • Connection to ontologies for dictionary construction and semantic query expansion: • Web services for BioPortal; • JDBC connection to a MySQL DB for UMLS
Concept recognition tools • University of Michigan mgrep tool • National Center for Integrative Biomedical Informatics (NCIBI) • Has a very high degree of accuracy (over 95%) in recognizing disease names • Rong Xu’s • BMIR PhD candidate • MetaMap Transfer (MMTX) • We have not yet conducted an evaluation • Nipun Bathia’s project • BS student at Stanford
First resources processed (Ontrez v0 and v1) • 5 resources for the moment (not complete) • To be completed and done with versioning, automatic updates, etc.
Open Biomedical Data (OBD) NCBO Core 2 • “a database resource (…) • “that will allow expert scientists both to archive experimental data and to use the OBO ontologies and terminologies to create appropriate annotations” • CREATION • “for storing, visualizing, and analyzing the ontology-based annotations that are linked to primary experimental data” • USE
Different annotation sets in OBD • OBD@Stanford: • “disease oriented approach” • automatically generated annotations of text metadata • OBD@Berkeley: • “genotype-phenotype pairs approach” • manually generated and curator-based annotations/assertions
Interaction with and integration into BioPortal? • Neither side really needs to care how the other's DB is produced • Manually, with curators, etc. on the Berkeley side • Using an NLP tool on text metadata on the Stanford side • We have several annotation databases (Ontrez/OBD/external ones) for which we need to specify: • Annotation table structure, i.e., [ elementLocalID | termID ] • Interaction with BioPortal to: • get the term IDs; • query the DB and integrate the results in the UI. • Diagram labels: Web services? API? .jar? • [entryID | elementLocalID | termID | itemKey | dictionaryVer]
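To make the table structure concrete, here is a minimal sketch of an annotation table with the five columns named on the slide. Several things are assumptions: Python's built-in `sqlite3` stands in for MySQL, the column types and the index are guesses, `itemKey` is read as the metadata field in which the term was found, `dictionaryVer` as the dictionary version, and all row values are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE annotation (
        entryID        INTEGER PRIMARY KEY,
        elementLocalID TEXT NOT NULL,
        termID         TEXT NOT NULL,
        itemKey        TEXT,
        dictionaryVer  TEXT
    )
    """
)
# Lookups go from a term to its elements, so index the term column.
conn.execute("CREATE INDEX idx_annotation_term ON annotation (termID)")

# Hypothetical rows: (elementLocalID, termID, itemKey, dictionaryVer).
rows = [
    ("GDS1234", "T:0002", "title", "v1"),
    ("GDS5678", "T:0002", "description", "v1"),
    ("GDS5678", "T:0009", "title", "v1"),
]
conn.executemany(
    "INSERT INTO annotation (elementLocalID, termID, itemKey, dictionaryVer) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def elements_for_term(connection, term_id):
    """Return the distinct element IDs annotated with the given term ID."""
    cursor = connection.execute(
        "SELECT DISTINCT elementLocalID FROM annotation "
        "WHERE termID = ? ORDER BY elementLocalID",
        (term_id,),
    )
    return [row[0] for row in cursor.fetchall()]


hits = elements_for_term(conn, "T:0002")
```

A shared structure of this kind would let BioPortal query any of the annotation databases the same way once it has resolved the term IDs, regardless of how the rows were produced.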
Next steps & conclusion • Finish processing the resources; implement versioning and update mechanisms • First evaluation/comparison • Research and implementation of semantic query expansion • New resources to be added • Formalization of OBD@ncbo with the Berkeley folks • Integration of results within BioPortal • Collaboration on what to do with annotations
Thank you Any questions?
Who am I? Clement, 27, French • Postdoc at NCBO since September 2007 at Stanford University • PhD in Informatics • University Montpellier 2 (FR) • Thesis • Multi-Agent Systems • Service-oriented computing • Grid • Nothing to do with biomedical ontologies but… • www.stanford.edu/~jonquet