SIEL@ERD: Exploiting Wikipedia Inlinks for Linking Entities in Queries. Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma. International Institute of Information Technology, Hyderabad, India. Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014.
SIEL@ERD: Exploiting Wikipedia Inlinks for Linking Entities in Queries
Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad, India
Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014
July 6-11, 2014, The 37th Annual International ACM SIGIR Conference
priya.r@research.iiit.ac.in, romil.bansal@research.iiit.ac.in, gmanish@microsoft.com, vv@iiit.ac.in
ERD Challenge
http://www.freebase.com/m/046yc7
SIEL@ERD team from IIIT, Hyderabad
The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection or knowledge base [1].
SIEL@ERD system
• TAGME [2] system with time and performance optimizations
• Mention detection: reduce the number of DB look-ups
• Disambiguation: use (1-δ) instead of δ; prominent senses restriction
• Pruning
Data Preprocessing and Measures
Inlinks are links made from an anchor to a Wikipedia article.
Indexes: process the English Wikipedia dump to create three indexes
• In-Link Graph Index
• Anchor Dictionary
• WikiTitlePageId Index
Measures
• Link frequency link(a)
• Total frequency freq(a)
• Pages linking to anchor a, Pg(a)
• Prior probability Pr(p|a)
• Link probability lp(a)
• Wikipedia Link-based Measure (δ) [3]
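The two frequency-based measures can be sketched as follows. This is a minimal illustration that assumes the usual TAGME-style definitions: lp(a) is the share of occurrences of the text a that appear as a link, and Pr(p|a) is the share of links with anchor a that point to page p. The function names and all counts are hypothetical and are not taken from the SIEL@ERD code.

```python
from collections import Counter


def link_probability(link_count: int, total_count: int) -> float:
    """lp(a) = link(a) / freq(a); returns 0.0 when the text never occurs."""
    if total_count == 0:
        return 0.0
    return link_count / total_count


def prior_probabilities(target_counts: Counter) -> dict:
    """Pr(p|a) for every page p that anchor a links to.

    target_counts maps a page title to the number of times anchor a
    links to that page.
    """
    total = sum(target_counts.values())
    if total == 0:
        return {}
    return {page: n / total for page, n in target_counts.items()}


# Hypothetical counts for the anchor text "jaguar".
jaguar_targets = Counter(
    {"Jaguar_Cars": 60, "Jaguar": 30, "Jacksonville_Jaguars": 10}
)
lp = link_probability(link_count=100, total_count=400)
priors = prior_probabilities(jaguar_targets)
```

In this toy setting the anchor is a link in a quarter of its occurrences, and the car maker is its most likely sense.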
Optimization - Mention detection
• A mention is any word or group of words that can potentially identify an entity.
• Checking every word (and word group) for presence in the DB increases the number of DB look-ups.
• Mention filtering methods reduce the number of mention candidates:
• Stopword filtering
• Twitter POS filtering
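To see why unfiltered mention detection is costly, the sketch below enumerates every word group of a query up to a maximum length; each resulting string would need one DB look-up. The function name, the whitespace tokenization, and the `max_len` limit are assumptions made for illustration only.

```python
def candidate_mentions(query: str, max_len: int = 3) -> list:
    """Return every contiguous word group of up to max_len words."""
    tokens = query.split()
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            spans.append(" ".join(tokens[start:start + length]))
    return spans


# Each candidate below would trigger one look-up if nothing is filtered.
candidates = candidate_mentions("the music man")
```

Even a three-word query yields six candidates, and the count grows with query length, which is what the filtering methods aim to cut down.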
Optimization - Mention detection
Mention filtering methods
1. Stopword filtering: If the mention identified in the given query text contains only stopwords, we ignore that mention. We use the standard JMLR stopword list.
2. Twitter POS filtering: The query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [12]. Mentions that do not contain at least one word with the POS tag NN (indicating a noun) are ignored.
RUNS: Run5 and Run7. Stopword filtering gave better results (F1=0.53) than TPOS filtering (F1=0.48).
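The two filters above can be sketched as simple predicates over candidate mentions. This is an illustration under stated assumptions: `STOPWORDS` is a tiny stand-in for the stopword list, and the POS tags are assigned by hand rather than produced by a tweet POS tagger. All names are hypothetical.

```python
# Tiny stand-in set; the slides refer to a much larger standard list.
STOPWORDS = {"the", "of", "a", "in", "and", "to"}


def stopword_filter(mentions: list) -> list:
    """Drop mentions made up only of stopwords."""
    return [
        m for m in mentions
        if not all(word.lower() in STOPWORDS for word in m.split())
    ]


def pos_filter(tagged_mentions: list) -> list:
    """Keep mentions with at least one word tagged NN.

    Each mention is a list of (word, tag) pairs; the tags are assumed
    to come from an external tagger.
    """
    return [
        m for m in tagged_mentions
        if any(tag == "NN" for _, tag in m)
    ]


kept_by_stopwords = stopword_filter(["the", "of the", "the music man", "music"])
kept_by_pos = pos_filter([
    [("the", "DT"), ("music", "NN")],  # kept: contains a noun
    [("very", "RB"), ("fast", "JJ")],  # dropped: no noun
])
```

Both filters run before any DB look-up, so every mention they drop saves one query against the anchor dictionary.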
Optimization - Disambiguation
Identify all senses of the mention and choose the right one.
1. For identical pages, the relatedness should be 1, so we measure relatedness between pages as (1-δ) instead of δ.
2. Prominent senses restriction
3. Disambiguation score for a mention a from candidate sense pa
RUNS: Run3 achieved an F1 of 0.483
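The (1-δ) relatedness can be sketched as below, assuming δ is the Wikipedia link-based measure of Milne and Witten [3] computed over the sets of pages that link to each article. The function names are hypothetical, and the handling of disjoint in-link sets and the clamping to [0, 1] are assumptions of this sketch rather than details given in the slides.

```python
import math


def mw_delta(inlinks_a: set, inlinks_b: set, total_pages: int) -> float:
    """Link-based measure over in-link sets, clamped to [0, 1]."""
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 1.0  # assumption: no shared in-links means maximal delta
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(total_pages) - math.log(smaller)
    if denom <= 0:
        return 0.0  # degenerate case: every page links to both articles
    delta = (math.log(larger) - math.log(common)) / denom
    return min(1.0, max(0.0, delta))


def relatedness(inlinks_a: set, inlinks_b: set, total_pages: int) -> float:
    """Relatedness as 1 - delta, so identical pages score 1."""
    return 1.0 - mw_delta(inlinks_a, inlinks_b, total_pages)


# Toy in-link sets of page ids in a hypothetical 1000-page collection.
same = relatedness({1, 2, 3}, {1, 2, 3}, total_pages=1000)
partial = relatedness({1, 2, 3, 4}, {3, 4, 5}, total_pages=1000)
disjoint = relatedness({1, 2}, {3, 4}, total_pages=1000)
```

With identical in-link sets δ is 0, so the relatedness comes out as 1, which is the behaviour the first point asks for.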
Optimization - Pruning
Identify and discard senses that are not semantically coherent.
Coherence is defined as the average relatedness between the given sense pa and the senses assigned to all other anchors.
The pruning score combines coherence and link probability.
RUNS: Run6
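The pruning step can be sketched as follows. Coherence follows the definition above; how the score combines coherence with link probability is not spelled out on the slide, so the equal-weight linear combination and the threshold value used here are assumptions, and all names and numbers are hypothetical.

```python
def coherence(sense: str, other_senses: list, relatedness) -> float:
    """Average relatedness between a sense and the other assigned senses."""
    if not other_senses:
        return 0.0
    total = sum(relatedness(sense, other) for other in other_senses)
    return total / len(other_senses)


def pruning_score(coh: float, lp: float, weight: float = 0.5) -> float:
    """Assumed combination: weighted mean of coherence and lp(a)."""
    return weight * coh + (1.0 - weight) * lp


def prune(annotations: dict, relatedness, threshold: float = 0.2) -> dict:
    """Keep the anchors whose pruning score reaches the threshold.

    annotations maps an anchor to a (sense, link probability) pair.
    """
    kept = {}
    for anchor, (sense, lp) in annotations.items():
        others = [s for a, (s, _) in annotations.items() if a != anchor]
        score = pruning_score(coherence(sense, others, relatedness), lp)
        if score >= threshold:
            kept[anchor] = sense
    return kept


# Toy pairwise relatedness values and link probabilities.
REL = {
    frozenset({"Apple_Inc.", "IPhone"}): 0.9,
    frozenset({"Apple_Inc.", "Pie"}): 0.05,
    frozenset({"IPhone", "Pie"}): 0.05,
}
annotations = {
    "apple": ("Apple_Inc.", 0.4),
    "iphone": ("IPhone", 0.6),
    "pie": ("Pie", 0.02),
}
kept = prune(annotations, lambda a, b: REL[frozenset({a, b})])
```

In this toy query the sense for "pie" is weakly related to the other two senses and has a low link probability, so it falls below the threshold and is discarded.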
Results
*Base system: Linear Combination + TPOS Filtering + Normalized vote + Multi-row anchor index
**Evaluated on a 100-query set
Please visit our poster
Source code and datasets: https://github.com/priyaradhakrishnan0/Entity-Recognition-and-Disambiguation-Challenge
References
• [1] D. Carmel, M.W. Chang, E. Gabrilovich, B.J.P. Hsu, K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.
• [2] P. Ferragina, U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM 2010.
• [3] D. Milne and I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In Proc. of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, 2008.