
Exploiting Wikipedia Inlinks for Linking Entities in Queries

SIEL@ERD. Exploiting Wikipedia Inlinks for Linking Entities in Queries. Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma. International Institute of Information Technology, Hyderabad, India. Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014.





Presentation Transcript


  1. SIEL@ERD: Exploiting Wikipedia Inlinks for Linking Entities in Queries. Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma. International Institute of Information Technology, Hyderabad, India. Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014. The 37th Annual International ACM SIGIR Conference, July 6-11, 2014. priya.r@research.iiit.ac.in, romil.bansal@research.iiit.ac.in, gmanish@microsoft.com, vv@iiit.ac.in

  2. ERD Challenge. http://www.freebase.com/m/046yc7 SIEL@ERD team from IIIT, Hyderabad. The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection [1] or knowledge base.

  3. SIEL@ERD system
     • TAGME [2] system with time and performance optimizations
     • Mention detection: reduce the number of DB look-ups
     • Disambiguation: use (1-δ) instead of δ; prominent senses restriction
     • Pruning

  4. Data Preprocessing and Measures. Inlinks are links made from an anchor to a Wikipedia article.
     Indexes: the English Wikipedia dump is processed to create three indexes.
     • In-Link Graph Index
     • Anchor Dictionary
     • WikiTitlePageId Index
     Measures:
     • link frequency, link(a)
     • total frequency, freq(a)
     • pages linked by anchor a, Pg(a)
     • prior probability, Pr(p|a)
     • link probability, lp(a)
     • Wikipedia Link-based Measure (δ) [3]
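As a rough illustration of how these measures relate, the sketch below assumes the TAGME-style definitions [2]: lp(a) = link(a) / freq(a), and Pr(p|a) as the share of anchor a's links that point to page p. The distance follows the Milne-Witten link-based measure [3], computed from in-link sets. All function names, argument shapes, and the cap for pages without shared in-links are assumptions made for the example, not the team's code.

```python
import math


def link_probability(link_count: int, total_count: int) -> float:
    """lp(a): how often the text of anchor a appears as a link, link(a) / freq(a)."""
    return link_count / total_count if total_count else 0.0


def prior_probability(anchor_targets: dict[str, int], page: str) -> float:
    """Pr(p|a): share of anchor a's links that point to page p.

    anchor_targets maps each page in Pg(a) to the number of links from a to it.
    """
    total = sum(anchor_targets.values())
    return anchor_targets.get(page, 0) / total if total else 0.0


def mw_distance(in_a: set[int], in_b: set[int], n_pages: int) -> float:
    """Milne-Witten link-based distance (δ) between two pages, from their in-link sets."""
    common = len(in_a & in_b)
    if common == 0:
        return 1.0  # assumption: cap the distance when no in-links are shared
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    denom = math.log(n_pages) - math.log(small)
    if denom <= 0:
        return 0.0  # degenerate case: a page linked from every page in the collection
    d = (math.log(big) - math.log(common)) / denom
    return min(max(d, 0.0), 1.0)  # clamp to [0, 1]


def relatedness(in_a: set[int], in_b: set[int], n_pages: int) -> float:
    """Relatedness as 1 - δ, so identical pages score 1."""
    return 1.0 - mw_distance(in_a, in_b, n_pages)
```

With these definitions, an anchor that is linked 30 times out of 120 occurrences has lp(a) = 0.25, and two pages with identical in-link sets have distance 0 and relatedness 1.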

  5. Optimization - Mention detection
     • A mention is any word or group of words that can potentially identify an entity.
     • Checking every word (and word group) for DB presence increases the number of DB look-ups.
     • Mention filtering methods reduce the number of mention candidates:
       • Stopword filtering
       • Twitter POS filtering
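To see why the number of look-ups grows quickly, the sketch below enumerates every word and word group of a query as a mention candidate. The function name and the `max_len` cap on group length are hypothetical choices for the example; the slides do not say how candidates were enumerated.

```python
def candidate_mentions(query: str, max_len: int = 3) -> list[str]:
    """Return every contiguous word group of up to max_len words."""
    tokens = query.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)        # group length
        for i in range(len(tokens) - n + 1)   # start position
    ]


candidates = candidate_mentions("total recall arnold schwarzenegger")
print(len(candidates))  # 4 single words + 3 pairs + 2 triples = 9 look-ups
```

Each candidate would cost one anchor-dictionary look-up if left unfiltered, which is the cost the filtering methods aim to cut.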

  6. Optimization - Mention detection. Mention filtering methods:
     1. Stopword filtering: if the mention identified in the given query text contains only stopwords, we ignore that mention. We use the standard JMLR stopword list.
     2. Twitter POS filtering: the query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [12]. Mentions that do not contain at least one word with POS tag NN (indicating a noun) are ignored.
     Runs: Run5 and Run7. Stopword filtering gave better results (F1=0.53) than TPOS filtering (F1=0.48).
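The two filters can be sketched as below. The stopword set here is a tiny illustrative stand-in for the JMLR list, and the POS filter takes already-tagged (word, tag) pairs rather than calling a tagger, since the tagger's interface is not described in the slides. All names are hypothetical.

```python
# Tiny illustrative stand-in for the stopword list used by the system.
STOPWORDS = frozenset({"the", "of", "in", "a", "an", "and", "to", "is", "for", "on"})


def is_all_stopwords(mention: str) -> bool:
    """True if every word of the mention is a stopword."""
    words = mention.lower().split()
    return bool(words) and all(w in STOPWORDS for w in words)


def stopword_filter(mentions: list[str]) -> list[str]:
    """Drop mentions that consist only of stopwords."""
    return [m for m in mentions if not is_all_stopwords(m)]


def pos_filter(tagged_mentions: list[list[tuple[str, str]]]) -> list[str]:
    """Keep mentions that contain at least one word tagged NN.

    Each mention is a list of (word, tag) pairs produced by some POS tagger.
    """
    return [
        " ".join(word for word, _ in mention)
        for mention in tagged_mentions
        if any(tag == "NN" for _, tag in mention)
    ]


print(stopword_filter(["the", "of the", "the matrix", "matrix"]))
print(pos_filter([[("the", "DT"), ("matrix", "NN")], [("of", "IN"), ("the", "DT")]]))
```

Only the surviving mentions are then looked up in the anchor dictionary.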

  7. Optimization - Disambiguation. Identify all senses of the mention and choose the right one.
     1. For identical pages, the relatedness score should be 1, so we measured relatedness between pages as (1-δ) instead of δ.
     2. Prominent senses restriction
     3. Disambiguation score for a mention a from candidate sense pa
     Runs: Run3 achieved an F1 of 0.483.
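The formulas for steps 2 and 3 are not part of this transcript. As a hedged illustration only, the sketch below follows the voting scheme published for TAGME [2]: each other anchor b votes for a candidate sense pa with the prior-weighted average relatedness of its own senses, and the sense with the highest total vote is chosen. The prominent senses restriction is modelled as a prior-probability cutoff; the `min_prior` value, the function names, and the data layout are assumptions, and the team's actual scoring may differ.

```python
from typing import Callable, Optional

Senses = dict[str, float]  # page -> prior probability Pr(p|a)


def prominent_senses(senses: Senses, min_prior: float = 0.02) -> Senses:
    """Keep only senses whose prior reaches the (hypothetical) cutoff."""
    return {page: prior for page, prior in senses.items() if prior >= min_prior}


def vote(sense_a: str, senses_b: Senses, rel: Callable[[str, str], float]) -> float:
    """Vote of anchor b for sense_a: prior-weighted average relatedness."""
    if not senses_b:
        return 0.0
    total = sum(rel(page_b, sense_a) * prior for page_b, prior in senses_b.items())
    return total / len(senses_b)


def disambiguate(anchor: str, candidates: dict[str, Senses],
                 rel: Callable[[str, str], float],
                 min_prior: float = 0.02) -> Optional[str]:
    """Pick the sense of `anchor` that collects the highest total vote."""
    pruned = {a: prominent_senses(s, min_prior) for a, s in candidates.items()}
    scores = {
        sense: sum(vote(sense, pruned[b], rel) for b in pruned if b != anchor)
        for sense in pruned[anchor]
    }
    return max(scores, key=scores.get) if scores else None
```

Here `rel` would be the (1-δ) relatedness between two pages; passing it in as a function keeps the sketch independent of how the in-link index is stored.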

  8. Optimization - Pruning. Identify and discard senses that are not semantically coherent.
     • Coherence is defined as the average relatedness between the given sense pa and the senses assigned to all other anchors.
     • The pruning score combines coherence and link probability.
     Runs: Run6.
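A minimal sketch of this pruning step follows. Coherence is computed as defined on the slide; combining it with link probability through a weighted sum is an assumption (the results slide mentions a linear combination, but the exact formula is not given here), and the `weight` and `threshold` values are hypothetical.

```python
from typing import Callable


def coherence(sense: str, other_senses: list[str],
              rel: Callable[[str, str], float]) -> float:
    """Average relatedness between `sense` and the senses of all other anchors."""
    if not other_senses:
        return 0.0
    return sum(rel(sense, other) for other in other_senses) / len(other_senses)


def pruning_score(link_prob: float, coh: float, weight: float = 0.5) -> float:
    """Assumed linear combination of link probability and coherence."""
    return weight * link_prob + (1.0 - weight) * coh


def prune(assignments: dict[str, str], link_probs: dict[str, float],
          rel: Callable[[str, str], float],
          threshold: float = 0.2, weight: float = 0.5) -> dict[str, str]:
    """Keep only anchor -> sense assignments whose score reaches the threshold."""
    kept = {}
    for anchor, sense in assignments.items():
        others = [s for a, s in assignments.items() if a != anchor]
        score = pruning_score(link_probs[anchor], coherence(sense, others, rel), weight)
        if score >= threshold:
            kept[anchor] = sense
    return kept
```

An anchor with low link probability whose sense is unrelated to the rest of the query scores near zero and is discarded, while mutually related senses support each other.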

  9. Results
     * Base system: Linear Combination + TPOS Filtering + Normalized vote + Multi-row anchor index
     ** Evaluated on the 100-query set

  10. Please visit our poster. Source code and datasets: https://github.com/priyaradhakrishnan0/Entity-Recognition-and-Disambiguation-Challenge

  11. References
     • [1] D. Carmel, M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.
     • [2] P. Ferragina, U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM 2010.
     • [3] D. Milne, I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In Proc. of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, 2008.
