1 / 67

Semantic Web: Promising technologies, Current Applications & Future Directions

Semantic Web: Promising technologies, Current Applications & Future Directions. Australia, July-August 2008 Amit P. Sheth amit.sheth@wright.edu Thanks Kno.e.sis team and collaborators. Outline. Semantic Web – very brief intro of key capabilities and technlologies

thiery
Download Presentation

Semantic Web: Promising technologies, Current Applications & Future Directions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Semantic Web: Promising technologies, Current Applications & Future Directions Australia, July-August 2008 Amit P. Sheth amit.sheth@wright.edu Thanks Kno.e.sis team and collaborators

  2. Outline • Semantic Web – very brief intro of key capabilities and technlologies • Real-world Applications demonstrating benefit of semantic web technologies • Exciting on-going research

  3. Evolution of the Web Web of people - social networks, user-created content - GeneRIF, Connotea Web of services - data = service = data, mashups - ubiquitous computing Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Web as an oracle / assistant / partner - “ask to the Web” - using semantics to leverage text + data + services + people 2007 1997

  4. What is Semantic Web? • Associating meaning with data: labeling data so it is more meaningful to the system and people. Formal description increases automation. Common interpretation increases interoperability. • TBL – focus on data: Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”) • Others focus on reasoning and intelligent processing

  5. Semantic Web Enablers and Techniques • Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base • Semantic Annotation (metadata Extraction): Manual, Semi-automatic (automatic with human verification), Automatic • Reasoning/computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, hypothesis validation, discovery, visualization

  6. Open Biomedical Ontologies Many ontologies exist Open Biomedical Ontologies, http://obo.sourceforge.net/

  7. Drug Ontology Hierarchy(showing is-a relationships) owl:thing prescription_drug_ brand_name brandname_undeclared brandname_composite prescription_drug monograph_ix_class cpnum_ group prescription_drug_ property indication_ property formulary_ property non_drug_ reactant interaction_property property formulary brandname_individual interaction_with_prescription_drug interaction indication generic_ individual prescription_drug_ generic generic_ composite interaction_with_monograph_ix_class interaction_ with_non_ drug_reactant

  8. N-Glycosylation metabolic pathway N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4 GNT-Vattaches GlcNAc at position 6 N-acetyl-glucosaminyl_transferase_V UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 GNT-Iattaches GlcNAc at position 2

  9. Information Extractionfor Metadata Creation WWW, Enterprise Repositories Nexis UPI AP Digital Videos Data Stores Feeds/ Documents Digital Maps . . . . . . Digital Images Digital Audios . . . Create/extract as much (semantics)metadata automatically as possible; Use ontlogies to improve and enhance extraction EXTRACTORS METADATA

  10. Deep semantics Expressiveness, Reasoning Shallow semantics Metadata and Ontology: Primary Semantic Web enablers

  11. Automatic Semantic Metadata Extraction/Annotation

  12. Characteristics of Semantic Web Self Describing Easy to Understand The Semantic Web: XML, RDF & Ontology Machine & Human Readable Issued by a Trusted Authority Can be Secured Convertible Adapted from William Ruh (CISCO)

  13. Application Example 1: • Status: In use today • Where: Athens Heart Center • What: Use of semantic Web technologies for clinical decision support

  14. Operational since January 2006

  15. Active Semantic Electronic Medical Records (ASEMR) Goals: • Increase efficiency with decision support • formulary, billing, reimbursement • real time chart completion • automated linking with billing • Reduce Errors, Improve Patient Satisfaction & Reporting • drug interactions, allergy, insurance • Improve Profitability Technologies: • Ontologies, semantic annotations & rules • Service Oriented Architecture Thanks -- Dr. Agrawal, Dr. Wingeth, and others. ISWC2006 paper

  16. Demonstration

  17. Opportunity: exploiting clinical and biomedical data text binary Scientific Literature PubMed 300 Documents Published Online each day Health Information Services Elsevier iConsult User-contributed Content (Informal) GeneRifs NCBI Public Datasets Genome, Protein DBs new sequences daily Clinical Data Personal health history Laboratory Data Lab tests, RTPCR, Mass spec Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.

  18. Application Example 2 • Status: Completed research • Where: NIH/NIDA • What:Understanding the genetic basis of nicotine dependence. • How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).

  19. Genome and pathway information integration KEGG Reactome HumanCyc • pathway • protein • pmid Entrez Gene • pathway • protein • pmid • pathway • protein • pmid GeneOntology HomoloGene • GO ID • HomoloGene ID

  20. Entrez Knowledge Model (EKoM) BioPAX ontology

  21. Gene-Pathway Data Integration–Understanding the Genetic-basis of Nicotine DependenceCollaborators: NIDA, NLM Biological Significance: Understand the role of genes in nicotine addiction Treatment of drug addiction based on genetic factors Identify important genes and use for pharmaceutical productions

  22. Scenario 3 • Status: Completed research • Where: NIH • What: queries across integrated data sources • Enriching data with ontologies for integration, querying, and automation • Ontologies beyond vocabularies: the power of relationships

  23. Use data to test hypothesis Gene name Interactions GO gene Sequence PubMed OMIM Link between glycosyltransferase activity and congenital muscular dystrophy? Glycosyltransferase Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07

  24. In a Web pages world… (GeneID: 9215) has_associated_disease Congenital muscular dystrophy,type 1D has_molecular_function Acetylglucosaminyl-transferase activity Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07

  25. With the semantically enhanced data glycosyltransferase GO:0016757 isa GO:0008194 GO:0016758 acetylglucosaminyl-transferase GO:0008375 has_molecular_function acetylglucosaminyl-transferase GO:0008375 EG:9215 LARGE Muscular dystrophy, congenital, type 1D MIM:608840 has_associated_phenotype SELECT DISTINCT ?t ?g ?d { ?t is_a GO:0016757 . ?g has molecular function ?t . ?g has_associated_phenotype ?b2 . ?b2 has_textual_description ?d . FILTER (?d, “muscular distrophy”, “i”) . FILTER (?d, “congenital”, “i”) } From medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07

  26. Scenario 5 • Status: Research prototype and in progress • Workflow withSemantic Annotation of Experimental Data already in use • Where: UGA • What: • Knowledge driven query formulation • Semantic Problem Solving Environment (PSE) for Trypanosoma cruzi (Chagas Disease)

  27. Knowledge driven query formulation • Complex queries can also include: • - on-the-fly Web services execution to retrieve additional data • inference rules to make implicit knowledge explicit

  28. T.Cruzi PSE Query Interface Figure 4: Semantic annotation of ms scientific data

  29. N-GlycosylationProcess (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 Separation technique I n Glycopeptides Fraction PNGase n Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction Data reduction ms peaklist ms/ms peaklist binning Peptide identification Glycopeptide identification and quantification N-dimensional array Peptide list Data correlation Signal integration

  30. Biological Sample Analysis by MS/MS Agent Raw Data to Standard Format Agent Data Pre- process Agent DB Search (Mascot/Sequest) Agent Results Post-process (ProValt) O I O I O I O I O Storage Raw Data Standard Format Data Filtered Data Search Results Final Output Biological Information Semantic Web Process to incorporate provenance Semantic Annotation Applications

  31. ProPreO: Ontology-mediated provenance parent ion charge 830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 parent ion m/z parent ionabundance fragment ion m/z fragment ionabundance ms/ms peaklist data Mass Spectrometry (MS) Data

  32. ProPreO: Ontology-mediated provenance <ms-ms_peak_list> <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” mode=“ms-ms”/> <parent_ion m-z=“830.9570” abundance=“194.9604” z=“2”/> <fragment_ion m-z=“580.2985” abundance=“0.3592”/> <fragment_ion m-z=“688.3214” abundance=“0.2526”/> <fragment_ion m-z=“779.4759” abundance=“38.4939”/> <fragment_ion m-z=“784.3607” abundance=“21.7736”/> <fragment_ion m-z=“1543.7476” abundance=“1.3822”/> <fragment_ion m-z=“1544.7595” abundance=“2.9977”/> <fragment_ion m-z=“1562.8113” abundance=“37.4790”/> <fragment_ion m-z=“1660.7776” abundance=“476.5043”/> </ms-ms_peak_list> OntologicalConcepts Semantically Annotated MS Data

  33. Problem – Extracting relationships between MeSH terms from PubMed 4733 documents 9284 documents 5 documents UMLS Semantic Network complicates Biologically active substance affects causes causes Disease or Syndrome Lipid affects instance_of instance_of ??????? Fish Oils Raynaud’s Disease MeSH PubMed

  34. Background knowledge used • UMLS – A high level schema of the biomedical domain • 136 classes and 49 relationships • Synonyms of all relationship – using variant lookup (tools from NLM) • 49 relationship + their synonyms = ~350 mostly verbs • MeSH • 22,000+ topics organized as a forest of 16 trees • Used to query PubMed • PubMed • Over 16 million abstract • Abstracts annotated with one or more MeSH terms T147—effect T147—induce T147—etiology T147—cause T147—effecting T147—induced

  35. Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) • Entities (MeSH terms) in sentences occur in modified forms • “adenomatous” modifies “hyperplasia” • “An excessive endogenous or exogenous stimulation” modifies “estrogen” • Entities can also occur as composites of 2 or more other entities • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )

  36. Method – Identify entities and Relationships in Parse Tree Modifiers TOP Modified entities Composite Entities S VP UMLS ID T147 NP VBZ induces NP PP NP NN estrogen NP IN by JJ excessive PP DT the ADJP NN stimulation MeSHID D004967 IN of JJ adenomatous NN hyperplasia NP JJ endogenous JJ exogenous CC or MeSHID D006965 NN endometrium DT the MeSHID D004717

  37. What can we do with the extracted knowledge? • Semantic browser demo

  38. Evaluating hypotheses affects Migraine Magnesium Stress inhibit isa Patient Calcium Channel Blockers Complex Query Supporting Document sets retrieved Keyword query: Migraine[MH] + Magnesium[MH] PubMed

  39. Workflow Adaptation: Why and How • Volatile nature of execution environments • May have an impact on multiple activities/ tasks in the workflow • HF Pathway • New information about diseases, drugs becomes available • Affects treatment plans, drug-drug interactions • Need to incorporate the new knowledge into execution • capture the constraints and relationships between different tasks activities

  40. Workflow Adaptation Why? New knowledge about treatment found during the execution of the pathway New knowledge about drugs, drug drug interactions

  41. Workflow Adaptation: How • Decision theoretic approaches • Markov Decision Processes • Given the state S of the workflow when an event E occurs • What is the optimal path to a goal state G • Greedy approaches rely on local optimization • Need to choose actions based on optimality across the entire horizon, not just the current best action • Model the horizon and use MDP to find the best path to a goal state

  42. Conclusion • semantic web technologies can help with: • Fusion of data: semi-structured, structured, experimental, literature, multimedia • Analysis and mining of data, extraction, annotation, capture provenance of data through annotation, workflows with SWS • Querying of data at different levels of granularity, complex queries, knowledge-driven query interface • Perform inference across data sets

  43. Take home points • Shift of paradigm: from browsing to querying • Machine understanding: • extracting knowledge from text • Inference, software interoperation • Semantic-enabled interfaces towards hypothesis validation

  44. References • A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006. • Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype InformationWWW2007 HCLS Workshop, May 2007. • Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, From "Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology, Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4 • Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner , Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, submitted, 2007. • Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596 • Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006. • Demos at: http://knoesis.wright.edu/library/demos/

  45. More about the Relationship Web Relationship Web takes you away from “which document” could have information I need, to “what’s in the resources” that gives me the insight and knowledge I need for decision making. Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails between Web Resources. IEEE Internet Computing July 2007 (to appear) [.pdf]

  46. Events: 3 Dimensions – Spatial, Temporal and Thematic Spatial Temporal Thematic

  47. Events and STT dimensions • Powerful mechanism to integrate content • Describes the Real-World occurrences • Can have video, images, text, audio all of the same event • Search and Index based on events and STT relations • Many relationship types • Spatial: • What events happened near this event? • What entities/organizations are located nearby? • Temporal: • What events happened before/after/during this event? • Thematic: • What is happening? • Who is involved? • Going further • Can we use What? Where? When? Who? to answer Why? / How? • Use integrated STT analysis to explore cause and effect

  48. Example Scenario: Sensor Data Fusion and Analysis High-level Sensor Low-level Sensor • How do we determine if the three images depict … • the same time and same place? • the same entity? • a serious threat? 50

  49. Data Pyramid Sensor Data Pyramid Knowledge Ontology Metadata Expressiveness Entity Metadata Information Feature Metadata Raw Sensor (Phenomenological) Data Data

  50. What is Sensor Web Enablement? http://www.opengeospatial.org/projects/groups/sensorweb 52

More Related