
Semantic Web Technologies for Analysis of Transcriptome

Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2). (1) INRIA - Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese. (2) IPMC, Sophia Antipolis, http://www.ipmc.fr.

Presentation Transcript


  1. Semantic Web Technologies for Analysis of Transcriptome • Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2) • (1) INRIA - Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese • (2) IPMC, Sophia Antipolis, http://www.ipmc.fr

  2. Outline • Context: Memory of Biochip Experiments • The MEAT Project • Semi-automatic generation of semantic annotations • Conclusions: Requirements for Semantic Web

  3. Context: Biochip experiments • DNA microarrays (gene chips, biochips) make it possible to measure simultaneously the expression level and transcription rate of various genes in an organism. • Applications in biology, medicine, pharmacology…: gene discovery; disease diagnosis or prognosis; drug discovery (pharmacogenomics); toxicological research (toxicogenomics)

  4. Towards Biochip Experiment Memory • Need for knowledge management for a community of biologists: a biochip experiment memory • Need for support for the validation & interpretation of results of biochip experiments • (Diagram labels: biologist, experiment sheets, experiment DB, documents, domain ontologies)

  5. The MEAT Project (diagram of components) • MEDIANTE • MEAT-Annot&Search • MEAT-Miner • MEAT-Onto • UMLS, Gene Onto…

  6. Phases: before experiment • Ordering of slides in order to launch a new biochip experiment • Submission of journal articles on genes presumed to be of interest • Constitution of an electronic document corpus • Creation of semantic annotations on these articles with MEAT-Annot • The biologist checks & validates the probes available on the biochip & selects a subset

  7. Phases: after experiment • Storage of the experiment description and of its results in MEDIANTE, according to the Array Express format • Statistical analysis of results with MEAT-Miner • Interpretation of results, using more bibliographical searches • Addition of new semantic annotations on the experiment

  8. MEAT-Annot&Search (architecture diagram) • MEAT-Annot, annotation acquisition tool: automatic generation of annotations from a corpus; manual annotation editor • MEAT-Search: CORESE search engine; MEAT-dedicated query interface; result browsing interface • BRIGENE annotation base: article annotation base; result annotation base; general knowledge base • ARRAY-EXPRESS: experiment description; result description

  9. MEAT-Annot: Technical Choices • NLP tools: term extractor + relation extractor • Extraction, from texts, of terms corresponding to UMLS ontology concepts • Extraction, from texts, of relations between these terms • Automatic generation of a semantic annotation and its representation in RDF

  10. Relationship extraction • Syntex (Bourigault D. 2000): corpus syntactic analyser, applied to a test corpus • Used to reveal the verb phrases (« verb syntagms ») commonly used in the biochip domain

  11. Relationship extraction • Choosing potential relationships revealed by Syntex • Writing a relationship extraction grammar using JAPE, for example:
        {Tag.lemme == "play"} {SpaceToken}
        ({Token.string == "a"} | {Token.string == "an"})? ({SpaceToken})?
        ({Token.string == "vital"} | {Token.string == "important"} | {Token.string == "critical"} | {Token.string == "some"} | {Token.string == "unexpected"} | {Token.string == "multifaceted"} | {Token.string == "major"})? ({SpaceToken})?
        {Tag.lemme == "role"}
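The « play … role » pattern above can be approximated outside GATE. The following is only a minimal Python sketch, not the MEAT-Annot grammar itself: it assumes that listing surface forms by hand ("play", "plays", …) is an acceptable stand-in for the lemma test `Tag.lemme`, and the names `PLAY_ROLE` and `find_play_role` are hypothetical.

```python
import re

# Optional adjectives, copied from the JAPE pattern on the slide.
ADJECTIVES = ["vital", "important", "critical", "some",
              "unexpected", "multifaceted", "major"]

# Assumption: JAPE tests the lemmas "play" and "role"; a plain regular
# expression has no lemmatizer, so surface forms are enumerated by hand.
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"          # verb "play"
    r"(?:an?\s+)?"                                 # optional "a" / "an"
    r"(?:(?:" + "|".join(ADJECTIVES) + r")\s+)?"   # optional adjective
    r"roles?\b",                                   # noun "role"
    re.IGNORECASE,
)


def find_play_role(sentence):
    """Return the matched 'play ... role' span, or None if there is none."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_play_role("HGF plays an important role in lung development"))
```

A real pipeline would rely on the part-of-speech and lemma annotations produced upstream, which is what the JAPE rule does; the regular expression only illustrates the shape of the pattern.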

  12. System architecture (diagram) • Biologist • Documents • MeatAnnot • GATE API, applying JAPE relation extraction grammars (the « play … role » pattern and a variant ending in {Tag.lemme == "effects"}) • UMLS Knowledge Server • RDF annotations

  13. Example: « HGF plays an important role in lung development » The information extracted from this sentence is: • HGF: an instance of the concept « Amino Acid, Peptide or Protein » • lung development: an instance of the concept « Organ or Tissue Function » • HGF play role lung development: an instance of the relation « play role » between the two terms

  14. RDF Annotation Generated
        <rdf:RDF
            xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
            xmlns:m='http://www.inria.fr/acacia/meat#'
            xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
          <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF#'>
            <m:play_role>
              <m:Organ_or_Tissue_Function rdf:about='lung_development#'/>
            </m:play_role>
          </m:Amino_Acid_Peptide_or_Protein>
        </rdf:RDF>
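To make the step from an extracted relation to this RDF annotation concrete, here is a hedged Python sketch using only the standard library. It is not the MEAT-Annot generator: the functions `to_identifier` and `build_annotation` are hypothetical, and the rule that turns labels into identifiers (drop commas, replace spaces with underscores, append `#`) is inferred from the single example on the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEAT_NS = "http://www.inria.fr/acacia/meat#"  # namespace shown on the slide

# Use the same prefixes as the slide when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("m", MEAT_NS)


def to_identifier(label):
    """Hypothetical helper: 'Amino Acid, Peptide or Protein' -> 'Amino_Acid_Peptide_or_Protein'."""
    return label.replace(",", "").replace(" ", "_")


def build_annotation(subject, subject_type, relation, obj, obj_type):
    """Serialize one (subject, relation, object) extraction as RDF/XML text."""
    about = f"{{{RDF_NS}}}about"
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    subj = ET.SubElement(root, f"{{{MEAT_NS}}}{to_identifier(subject_type)}",
                         {about: to_identifier(subject) + "#"})
    rel = ET.SubElement(subj, f"{{{MEAT_NS}}}{to_identifier(relation)}")
    ET.SubElement(rel, f"{{{MEAT_NS}}}{to_identifier(obj_type)}",
                  {about: to_identifier(obj) + "#"})
    return ET.tostring(root, encoding="unicode")


xml_text = build_annotation(
    "HGF", "Amino Acid, Peptide or Protein",
    "play role",
    "lung development", "Organ or Tissue Function",
)

if __name__ == "__main__":
    print(xml_text)
```

The serialized text differs from the slide in quoting and whitespace, and omits the unused `rdfs` prefix, but it carries the same elements and `rdf:about` values.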

  15. CORESE semantic search engine (architecture diagram) • Inputs: ontologies, documents, XML, legacy systems, users • XML document example: <accident> <date> 19 Mai 2000 </date> <description> <facteur>le facteur </description> </accident> • Schema in RDFS, for example: <rdfs:Class rdf:ID="thing"/> <rdfs:Class rdf:ID="person"> <rdfs:subClassOf rdf:resource="#thing"/> </rdfs:Class> • Annotations in RDF, formed by instances of the schema in RDFS, for example: <ns:article rdf:about="http://intranet/articles/ecai.doc"> <ns:title>MAS and Corporate Semantic Web</ns:title> <ns:author> <ns:person rdf:about="http://intranet/employee/id109" /> </ns:author> </ns:article> • Semantic Web server: queries, rules, RDF/S; interactions with users: query, answer, push • Semantic Web side: ontology, RDFS, RDF, rules, queries • Conceptual graph (CG) side: CG support, CG base, CG rules, CG query, CG results; projection and inferences • Web stack: RDFS, RDF, XML, namespaces, URI, Unicode

  16. Ontology-based query (diagram) • Biologists formulate queries through an interface • The interface submits the queries to Corese and returns the results • Corese loads the annotation base and UMLS
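One way to read "ontology-based query" is that a query on a general concept also retrieves annotations typed with more specific concepts. The Python sketch below illustrates that idea only; it is not Corese and does not reproduce its query language. The class hierarchy (including the parent `Chemical`), the in-memory annotation base and the functions `is_a` and `query` are all hypothetical, and the single annotation reuses the HGF example from the slides.

```python
# Hypothetical class hierarchy (child -> parent); NOT taken from UMLS.
SUBCLASS_OF = {
    "Amino_Acid_Peptide_or_Protein": "Chemical",
    "Organ_or_Tissue_Function": "Function",
}

# Hypothetical annotation base: (subject, subject type, relation, object).
ANNOTATIONS = [
    ("HGF", "Amino_Acid_Peptide_or_Protein", "play_role", "lung_development"),
]


def is_a(cls, ancestor):
    """True if cls equals ancestor or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # climb one level; None at the top
    return False


def query(subject_type, relation):
    """Return (subject, object) pairs whose subject type is subsumed by subject_type."""
    return [(s, o) for s, t, r, o in ANNOTATIONS
            if r == relation and is_a(t, subject_type)]


if __name__ == "__main__":
    # Asking for any "Chemical" that plays a role also finds the protein HGF.
    print(query("Chemical", "play_role"))
```

The design point is that the hierarchy is consulted at query time, so annotations never need to be duplicated for every ancestor class.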

  17. Semantic Web requirements • Adaptation of the Corese semantic search engine to OWL • Corese query language vs SPARQL • Contextual annotations: need to express multiple contexts / viewpoints • Temporal queries on the base of past biochip experiments, plus temporally evolving ontologies & annotations • Scalability of NLP tools: articles stemming from scientific watch on the open (semantic) Web…

  18. Many thanks to • ACACIA team: in particular Khaled Khelif, Laurent Alamarguy, Olivier Corby, Alain Giboin… • IPMC: Pascal Barbry, Kevin Le Brigand, Hélène, Chimène, Yves • Bayer Crop Science: Rémi Bars • Didier Bourigault (ERSS), developer of Syntex • The developers of GATE (Sheffield Univ.)

  19. Support to health network (diagram) • Documents (patient record, best practices guide…), for example: <dossierPatient> <date> 19 Mai 2000 </date> <donneesAdministratives> <Patient><nom>Dupont</nom> <prenom> Michel </prenom> </Patient> </donneesAdministratives> … • Nautilus DB • Translator • Medical ontology • Semantic annotations • Corese search engine • Life Line • Virtual Staff • Member of the health network

  20. Virtual Staff Architecture
