Linking named entities in Tweets with knowledge base via user interest modeling
Authors: Chen Li, Bin Wang, Xiaochun Yang
Speaker: Annan Wei
Outline
• Introduction
• Tweet Entity Linking
• KAURI Framework
• Experiments
• Conclusion
Introduction
• Twitter: a popular micro-blogging platform and an important information source
• Tweets: users publish and share information on topics ranging from daily life to news events
• Named entity mentions in tweets are ambiguous. For example, "Sun" may refer to:
  • Sun: the star at the center of the Solar System
  • Sun Microsystems: a multinational computer company
  • Sun-Hwa Kwon: a fictional character
  • or many other entities named "Sun"
Tweet Entity Linking
• Tweet entity linking: the task of linking the named entity mentions detected in tweets to the corresponding real-world entities in a knowledge base
• Previous methods:
  • target entity linking in Web documents
  • rely on context similarity
  • rely on topical coherence
• Challenge: tweets are noisy, short, and informal
Tweet Entity Linking
• Intra-tweet local information: prior probability, context similarity, and topical coherence
• Inter-tweet user interest information
Example (input mention → candidate entities):
• t1 → "Bulls": 1. Bulls (rugby) 2. Chicago Bulls 3. Bulls, New Zealand (local information alone does not work well)
• t2 → "Sun": 1. Sun 2. Sun Microsystems 3. Sun-Hwa Kwon (local information works well)
• t3 → "Scott" (local information alone does not work well)
Outline
• Introduction
• Tweet Entity Linking
• KAURI Framework
  • Graph construction
  • Initial interest score estimation
  • User interest propagation algorithm
• Experiments
• Conclusion
KAURI Framework
• Assumption 1. Each Twitter user has an underlying interest distribution over the various topics of named entities.
• Assumption 2. If a user mentions a named entity in a tweet, that user is likely to be interested in this named entity.
• Assumption 3. If a named entity is highly topically related to the entities that a user is interested in, that user is likely to be interested in this named entity as well.
KAURI Framework
• Construct a graph whose structure encodes the interdependence between different named entities
• Estimate the initial interest score of each named entity in the graph from the intra-tweet local information
• Run the User Interest Propagation Algorithm, which propagates user interest scores among named entities across tweets using the interdependence structure of the constructed graph
Graph construction
• G = (V, A, W)
• Weight:
  • indicates the strength of interdependence
  • calculated using the Wikipedia Link-based Measure
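The slide names the Wikipedia Link-based Measure but does not spell it out. The following is a minimal sketch of the commonly cited Milne and Witten form of that measure, assuming each entity is represented by the set of Wikipedia articles that link to it. The function name `wlm_relatedness` and its parameters are illustrative, not taken from the slides, and KAURI may normalize or threshold the value differently.

```python
import math


def wlm_relatedness(inlinks_a, inlinks_b, total_articles):
    """Wikipedia Link-based Measure between two entities (illustrative sketch).

    inlinks_a, inlinks_b: iterables of article ids that link to each entity.
    total_articles: total number of articles in Wikipedia (an assumed input).
    Returns a value in [0, 1]; higher means more topically related.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        # No shared incoming links: treat the two entities as unrelated.
        return 0.0
    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    denom = math.log(total_articles) - math.log(smaller)
    if denom <= 0:
        # Degenerate case: an entity is linked from every article.
        return 0.0
    distance = (math.log(larger) - math.log(len(common))) / denom
    # Clamp, since the raw distance can exceed 1 for weakly related pairs.
    return max(0.0, 1.0 - distance)


# Toy usage with made-up article ids.
w = wlm_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=100)
```

In a graph G = (V, A, W), a value like `w` would be stored as the weight of the arc between the two candidate entities.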
Initial interest score estimation
• The initial interest score combines three intra-tweet features, weighted by α, β, and γ with α + β + γ = 1:
  • prior probability
  • context similarity
  • topical coherence in the tweet
• Tweet t1 lacks the intra-tweet context information needed to link the entity mention "Bulls".
• For tweet t4, the candidate entity Tony Allen (musician) has a higher prior probability than Tony Allen (basketball), but the initial interest score of Tony Allen (basketball) is higher than that of Tony Allen (musician).
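The weighted combination above can be sketched as follows. This is an illustration only: the slide does not state which weight goes with which feature, so the pairing of α with prior probability, β with context similarity, and γ with topical coherence is an assumption, as are the default weights and every feature value in the example.

```python
def initial_interest_score(prior, similarity, coherence,
                           alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted sum of the three intra-tweet features (illustrative sketch).

    The weight-to-feature pairing and the default weights are assumptions.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return alpha * prior + beta * similarity + gamma * coherence


# Hypothetical feature values for the two "Tony Allen" candidates in t4.
musician = initial_interest_score(prior=0.6, similarity=0.1, coherence=0.1)
basketball = initial_interest_score(prior=0.4, similarity=0.7, coherence=0.8)

# With beta = 0 and gamma = 0 only the prior probability remains.
musician_prior_only = initial_interest_score(0.6, 0.1, 0.1, 1.0, 0.0, 0.0)
basketball_prior_only = initial_interest_score(0.4, 0.7, 0.8, 1.0, 0.0, 0.0)
```

With these made-up numbers the basketball player overtakes the musician once context similarity and topical coherence are included, even though the musician has the higher prior probability, which mirrors the t4 observation on the slide.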
User interest propagation algorithm
• The final interest score is computed from:
  • the interest propagation strength matrix
  • the initial interest score
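The slide lists the ingredients of the propagation step but its formula did not survive extraction. The sketch below assumes a PageRank-style update, final = λ · initial + (1 − λ) · B · final, iterated a fixed number of times, where B is built by normalizing the outgoing arc weights of each node. The update rule, the parameter `damping` (λ), the normalization, and all numbers in the example are assumptions for illustration, not the formula from the slides.

```python
def propagate_interest(initial, weights, damping=0.4, iterations=50):
    """Propagate interest scores over a weighted graph (illustrative sketch).

    initial: dict mapping node -> initial interest score.
    weights: dict mapping (source, target) -> arc weight.
    damping: share of the score kept from the initial estimate (assumed).
    """
    nodes = list(initial)
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial scores must have a positive sum")
    # Normalize the initial scores so that they sum to 1.
    p = {n: initial[n] / total for n in nodes}
    # Sum of outgoing weights per node, used to normalize each arc.
    out_sum = {n: 0.0 for n in nodes}
    for (u, _v), w in weights.items():
        out_sum[u] += w
    scores = dict(p)
    for _ in range(iterations):
        nxt = {n: damping * p[n] for n in nodes}
        for (u, v), w in weights.items():
            if out_sum[u] > 0:
                nxt[v] += (1.0 - damping) * scores[u] * w / out_sum[u]
        scores = nxt
    return scores


# Hypothetical graph: two candidates for "Bulls" start with equal scores,
# and only one of them is topically related to another entity the user mentioned.
initial = {"Chicago Bulls": 0.3, "Bulls (rugby)": 0.3,
           "Tony Allen (basketball)": 0.4}
weights = {("Tony Allen (basketball)", "Chicago Bulls"): 0.8,
           ("Chicago Bulls", "Tony Allen (basketball)"): 0.8}
final = propagate_interest(initial, weights)
```

In this toy graph the two "Bulls" candidates tie on their initial scores, and the candidate connected to another entity of interest ends up with the higher final score, which is the effect the inter-tweet user interest information is meant to provide. Nodes without outgoing arcs pass nothing on, so the final scores need not sum to 1 in this simplified version.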
Experiments
• Data set:
• Task: tweet entity linking consists of detecting all the named entity mentions in all tweets and identifying their corresponding entities in YAGO.
Experiments
• LOCALfull and KAURIfull: performance when leveraging all the intra-tweet local features.
• LOCALβ=0,γ=0 and KAURIβ=0,γ=0: performance when β = 0 and γ = 0 are set in Formula 4, which calculates the initial interest score.
Conclusion
• Proposed KAURI, a graph-based framework that combines intra-tweet local information with inter-tweet user interest information.
• KAURI achieves high performance in terms of accuracy and efficiency, and scales well to tweet streams.
Thanks! Questions?