
Semantics-Empowered Text Exploration for Knowledge Discovery

48th ACM Southeast Conference (ACMSE 2010), Oxford, Mississippi, April 15-17, 2010. Semantics-Empowered Text Exploration for Knowledge Discovery. Delroy Cameron, Pablo N. Mendes, Amit P. Sheth, Knowledge Enabled Information and Services Science Center (Kno.e.sis)


Presentation Transcript


  1. 48th ACM Southeast Conference (ACMSE 2010), Oxford, Mississippi, April 15-17, 2010. Semantics-Empowered Text Exploration for Knowledge Discovery. Delroy Cameron, Pablo N. Mendes, Amit P. Sheth: Knowledge Enabled Information and Services Science Center (Kno.e.sis), Department of Computer Science and Engineering, Wright State University, Dayton, OH. Victor Chan: Division of Biosciences and Performance, Human Effectiveness Directorate, Air Force Research Lab (AFRL), Wright-Patterson Air Force Base, Dayton, OH.

  2. OUTLINE • Background • Paradigm Shift • Demo • Architecture • Experimental Results • Future Work • Conclusion

  3. BACKGROUND • IR Systems - Interaction Paradigm • Manually seek information • Hyperlinked Documents • Document-Centric Model • Basis - Interaction Paradigm • Keyword Search • Document Browsing

  4. BACKGROUND • Interaction Sequence • Assemble Keywords and Search • Document Selection • Document Inspection • Aggregation/Organization • Information Need: What is the role of Magnesium in relation to Migraine? • Keyword Search: "Magnesium migraine"

  5. LIMITATIONS • Example: Query: "Father of the Web"; Answer: Sir Tim Berners-Lee • Query Reformulations • Impatient users • Recognition over Recall • Constrained navigation • Hyperlink dependent - a priori • Fuzzy User Interests • Haiti Earthquake – Recovery, Relief, Political Climate, Crime • Ineffective for Exploratory Search • Search-and-Sift

  6. MOTIVATION • Users are information seekers • Information is embedded in documents • A priori hyperlink dependent • Semantic Web Standards • Entity Identification (Semantic Annotations) • Relationship and Triple Identification • Explore documents/information via relationships

  7. PARADIGM SHIFT • Search Hit > Annotated Hit • Bag of annotated words/phrases • Annotated phrase is known entity • Entity is Subject/Object of Triple • Navigation driven by relationships • Entity[Document] → Relationship → Entity[Document] • Contextual Navigation (relationships as context)
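The relationship-driven navigation described on this slide can be illustrated with a minimal Python sketch. It assumes triples are held as plain (subject, predicate, object) tuples in memory; the triples, the predicate labels, and the helper names `build_index` and `navigate` are hypothetical and are not taken from the presented system or its knowledge base.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object). They are made up
# for illustration and are not statements from UMLS, HPCO, or DBpedia.
TRIPLES = [
    ("magnesium", "treats", "migraine"),
    ("magnesium", "inhibits", "calcium channel"),
    ("migraine", "associated_with", "stress"),
]


def build_index(triples):
    """Index triples by subject so an entity's outgoing relationships can be listed."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def navigate(index, entity, predicate):
    """Follow one relationship from an entity and return the entities reached."""
    return [obj for pred, obj in index.get(entity, []) if pred == predicate]


index = build_index(TRIPLES)
# Relationships available from an annotated entity (the navigation context).
print(index["magnesium"])
# One navigation step: Entity -> Relationship -> Entity.
print(navigate(index, "magnesium", "treats"))
```

In the browsing model outlined on the slide, each entity reached this way would in turn lead to the documents that mention it; that lookup is omitted from the sketch.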

  8. CONTRIBUTIONS • Novel Information Exploration Paradigm • Data-Centric Model • Demonstrate use of background knowledge • Named Entities, Relationships • Prototype Implementation • Semantic annotations for navigation • Aggregation Utilities • Saving, bookmarking, publishing, etc.

  9. DEMO

  10. ARCHITECTURE (Figure 1: System Components and Architecture) • Document Corpus: Medline (19 million abstracts; articles saved using Lucene, indexed as of Aug. 2009) and Yahoo (indexed documents accessed as a Web Service using Yahoo Search BOSS) • Background Knowledge: Linked Open Data, UMLS, HPCO Ontology • Controlled Vocabulary: 992,281 DBpedia terms, 15,742 HPCO terms, 5,232 UMLS terms • Components: Search, Semantic Browser, Semantic Trail Log, Workbench (SERP), Spotter Module • Spotter Module: Trie-based Spotter for Named Entity Identification, used ultimately for document annotation • Annotated entities provide anchors that serve as entry points to navigation • Semantic Trail Log: sequential record of each triple navigated by a user • Utilities provided for promoting, bookmarking, and saving search results (Organize, Save, Publish)
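The trail log on this slide is described only as a sequential record of each triple a user navigates. A minimal Python sketch of such a record follows; the class name `TrailLog`, its methods, and the example triples are hypothetical and do not reflect the system's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class TrailLog:
    """Hypothetical sequential record of the triples a user has navigated."""

    steps: List[Triple] = field(default_factory=list)

    def record(self, subject: str, predicate: str, obj: str) -> None:
        """Append one navigated triple, preserving navigation order."""
        self.steps.append((subject, predicate, obj))

    def path(self) -> List[str]:
        """Return the visited entities in order: first subject, then each object."""
        if not self.steps:
            return []
        return [self.steps[0][0]] + [obj for _, _, obj in self.steps]


trail = TrailLog()
# Made-up navigation steps, used only to exercise the log.
trail.record("magnesium", "treats", "migraine")
trail.record("migraine", "associated_with", "stress")
print(trail.path())
```

Keeping whole triples rather than only the visited entities preserves the relationship that motivated each step, which is what the slide's description of the log implies.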

  11. IMPLEMENTATION • Spotter Module • Example input: <abstract> Dietary restriction with hypomagnesia is normally associated with diminished urinary excretion. </abstract> • Example entity: Magnesium, EntityID: C0024467 • This process is called Spotting and uses a Trie data structure.
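The slide states only that spotting uses a Trie. As a rough illustration, the Python sketch below builds a token-level trie over a small vocabulary and scans text for the longest matching phrase at each position. The matching strategy, the class names, the sample sentence, and the second vocabulary entry (with its placeholder identifier) are assumptions; only the pairing of Magnesium with C0024467 comes from the slide.

```python
class TrieNode:
    """One trie node; children are keyed by lowercased token."""

    def __init__(self):
        self.children = {}
        self.entity_id = None  # set when a vocabulary phrase ends at this node


class Spotter:
    """Hypothetical token-level trie spotter using longest-match scanning."""

    def __init__(self, vocabulary):
        self.root = TrieNode()
        for phrase, entity_id in vocabulary.items():
            node = self.root
            for token in phrase.lower().split():
                node = node.children.setdefault(token, TrieNode())
            node.entity_id = entity_id

    def spot(self, text):
        """Return (phrase, entity_id) pairs for the longest matches, left to right."""
        tokens = [t.strip(".,;:()").lower() for t in text.split()]
        matches, i = [], 0
        while i < len(tokens):
            node, j, best = self.root, i, None
            # Walk the trie as far as the tokens allow, remembering the last full phrase.
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.entity_id is not None:
                    best = (j, node.entity_id)
            if best is not None:
                end, entity_id = best
                matches.append((" ".join(tokens[i:end]), entity_id))
                i = end
            else:
                i += 1
        return matches


vocabulary = {
    "magnesium": "C0024467",             # ID as shown on the slide
    "urinary excretion": "EXAMPLE-0001",  # placeholder ID, not a real identifier
}
spotter = Spotter(vocabulary)
# Made-up sentence used only to exercise the spotter.
print(spotter.spot("Magnesium intake is associated with urinary excretion."))
```

A token-level trie keeps multi-word phrases such as "urinary excretion" as a single match; a character-level trie would be an equally plausible reading of the slide.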

  12. ARCHITECTURE • Document Corpus • Medline • Lucene Index - 19 million abstracts (Aug. 2009) • REST Endpoint: http://knoesis1.wright.edu/IndexWrapper • XML Response (or JSON) • Keyword queries, Document IDs • Background Knowledge • UMLS (Unified Medical Language System) • 5,232 entities and 16,540 triples • HPCO (Human Performance & Cognition Ontology) • 15,742 entities and 22,298 triples

  13. EVALUATION • Rank Feature on [1-5] scale • Normalized Relative Aggregated Scores

  14. CONCLUSION • Novel Information Exploration Paradigm • Semantic Browser supports Contextual Navigation • Identify Named Entities and Relationships • Provide Semantic Annotations • Utilities for Aggregation • Semantic Trails to Knowledge Discovery

  15. FUTURE WORK • Formal Model for Paradigm Shift • Improved Spotter • Additional Vocabularies, Context, Rule Based • Relationship Ranking • Document Re-ranking • Trail Logs Analysis

  16. ACKNOWLEDGEMENTS • People: Cartic Ramakrishnan; Bilal Gonen, Aditya Dhoke; Wesley Workman, Rodrigo Gama, Guilherme de Napoli • Air Force Research Lab, Human Effectiveness Directorate, Wright-Patterson Air Force Base • National Science Foundation Award "SemDis: Discovering Complex Relationships in the Semantic Web", No. 071441 to Wright State University and No. IIS-0325464 to University of Georgia

  17. QUESTIONS

  18. TERMINOLOGY • Semantic Web – an extension of the current web in which data is expressed in a common vocabulary such that the data becomes machine processable. • Ontology – a specification of concepts and the relationships between them. • Triple – a ternary relation containing an entity pair and a relationship that expresses the link between them, i.e. subject-predicate-object. • Entity/Concept – an instance of a thing. • URI – a unique identifier for any resource/entity/thing on the web. • LOD (Linked Open Data) – a semantic web initiative to provide a repository of semantically connected datasets.
