
Neighborhood-based Tag Prediction

Neighborhood-based Tag Prediction. Adriana Budura (adriana.budura@epfl.ch), joint work with Sebastian Michel, Philippe Cudré-Mauroux, and Karl Aberer. Outline: Motivation; Principles of Tag Propagation; Scoring Model; Top-k Tag Inference; Experimental Results; Conclusions.


Presentation Transcript


  1. Neighborhood-based Tag Prediction
     • Adriana Budura (adriana.budura@epfl.ch)
     • joint work with: Sebastian Michel, Philippe Cudré-Mauroux, Karl Aberer
     • Adriana Budura, “Neighborhood-based Tag Prediction”, ESWC’09

  2. Outline: Motivation; Principles of Tag Propagation; Scoring Model; Top-k Tag Inference; Experimental Results; Conclusions

  3. Motivation
     • Tagging portals («Web 2.0»)
       • users attach keywords (tags) to resources: flickr, del.icio.us, citeulike,…
     • Tags:
       • unstructured textual information
       • reflect the meaning of resources for users
       • powerful tool to improve search
     • BUT: we need many tags, and users are lazy
     • Therefore: Automatic Tag Inference

  4. Neighborhood-based Tag Prediction
     • IDEA: copy tags from other resources
       • semantically related resources -> related tags
     • How to discover semantically similar resources?
       • resources are connected via links (e.g., HTML links, citations)
       • the neighborhood of a resource captures its context (e.g., citations in “Related Work”)
       • propagate tags along the edges of the graph
     • How relevant is a tag found in the neighborhood?

  5. Computational Model
     • 3 concepts:
       • Documents
         • the resources for which we infer tags; uniquely identifiable
         • in our scenario: scientific publications, Web pages
       • Tags
         • keywords attached to the resources
       • Document neighborhoods
         • documents connected by users -> graph

  6. How relevant is a tag found in the neighborhood?
     • Neighborhood defines context (far away -> less related) -> Tag Distance
     • Enough support in the neighborhood -> Tag Occurrence
     • Some tags are more likely to occur together -> Tag Co-Occurrence
     • Similar documents are likely to share the same tags -> Document-Document Similarity

  7. Principles of Tag Propagation
     • Figure: example citation graph of publications around d_init; the neighboring documents carry tags such as ranking, IR, TopK, PageRank, P2P, distributed
     • Principles annotated on the graph: Tag Occurrence, Doc-Doc Similarity, Tag Co-Occurrence, Tag Distance

  8. Overview: Motivation; Principles of Tag Propagation; Scoring Model; Top-k Tag Inference; Experimental Results; Conclusions

  9. (1) Tag Co-Occurrence
     • How relevant is a tag t for d_init, based on the tags already assigned to d_init?
     • Expressed as a conditional probability (formula not preserved in this transcript)
     • d_init can have more than one initial tag => we aggregate over the set of tags T(d_init)
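
The slide's formula is missing, so the following is only a minimal sketch of one way a co-occurrence score could be estimated. It assumes P(t | t') is the fraction of documents tagged t' that are also tagged t, and it aggregates over T(d_init) by taking the mean; both the estimator and the mean are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(tagged_docs):
    """Estimate P(t | given) = #docs with both tags / #docs with `given`."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_docs:
        unique = set(tags)
        tag_count.update(unique)
        # Ordered pairs (t, given) of distinct tags on the same document.
        pair_count.update(permutations(unique, 2))
    return {(t, given): n / tag_count[given]
            for (t, given), n in pair_count.items()}


def cooccurrence_score(tag, initial_tags, cond_prob):
    """Aggregate P(tag | t') over the initial tags (mean is an assumption)."""
    if not initial_tags:
        return 0.0
    total = sum(cond_prob.get((tag, t), 0.0) for t in initial_tags)
    return total / len(initial_tags)


# Hypothetical toy corpus: one tag set per document.
docs = [
    {"p2p", "search"},
    {"p2p", "distributed"},
    {"p2p", "search", "ranking"},
    {"ranking", "ir"},
]
cond = cooccurrence_probabilities(docs)
score = cooccurrence_score("search", {"p2p", "ranking"}, cond)
```

In this toy corpus, "search" appears on two of the three documents tagged "p2p" and on one of the two tagged "ranking", so the mean is (2/3 + 1/2) / 2.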

  10. (2) Doc-Doc Similarity
     • How relevant is a tag t (coming from a document d) for d_init, based on the similarity between d and d_init?
     • Computed in the vector space model (formula not preserved in this transcript)
     • For documents that are several hops away, we aggregate
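
As an illustration of a vector-space similarity, the sketch below uses cosine similarity over raw term counts. The slide does not say which weighting is used or how similarities are aggregated over several hops; multiplying the similarities along the path is an assumption made here, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two documents given as lists of terms."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def path_similarity(path_docs):
    """Aggregate over a multi-hop path by multiplying the similarities of
    consecutive documents (an assumed aggregation, not from the slides)."""
    sim = 1.0
    for a, b in zip(path_docs, path_docs[1:]):
        sim *= cosine_similarity(a, b)
    return sim


# Hypothetical documents as bags of terms.
d_init = ["tag", "search"]
d_one_hop = ["tag", "graph"]
d_two_hops = ["tag", "search"]
two_hop_sim = path_similarity([d_init, d_one_hop, d_two_hops])
```

With a product, the aggregated similarity can only shrink with each hop, which fits the intuition that far-away documents are less related.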

  11. (3) Tag Distance / (4) Tag Occurrence
     • Tag Occurrence
       • what is the popularity (support) of a tag in the neighborhood?
       • expressed as a sum over all scores for a tag t
     • Tag Distance
       • the distance between d_init and a document d carrying tag t (smallest path)
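
Tag distance as a smallest path can be sketched with a breadth-first search over the document graph. This assumes unweighted edges and counts hops; the graph layout and function names below are hypothetical.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of hops from `start` to each reachable doc."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def tag_distance(graph, doc_tags, start, tag):
    """Smallest hop count from `start` to any other document carrying `tag`,
    or None if the tag does not occur in the reachable neighborhood."""
    dist = hop_distances(graph, start)
    hops = [d for doc, d in dist.items()
            if doc != start and tag in doc_tags.get(doc, ())]
    return min(hops) if hops else None


# Hypothetical neighborhood of d_init.
graph = {"d_init": ["a", "b"], "a": ["c"], "b": [], "c": []}
doc_tags = {"a": {"p2p"}, "b": {"ranking"}, "c": {"p2p", "ir"}}
```

Here `tag_distance(graph, doc_tags, "d_init", "ir")` follows d_init -> a -> c, two hops.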

  12. Putting it All Together
     • Combined scoring function: a sum of partial scores, one for each occurrence of a tag t in the neighborhood of d_init
     • Components: Tag Occurrence; Doc-Doc Similarity and Tag Distance; Tag Co-Occurrence
     • (the formula itself is not preserved in this transcript)
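
Since the combined formula is not preserved, the sketch below shows only one plausible shape for it, not the authors' function. It assumes each occurrence contributes its doc-doc similarity divided by its hop distance, and that the sum is scaled by the co-occurrence score; every one of these choices is an assumption.

```python
def combined_score(occurrences, cooccurrence):
    """Hypothetical combined score for one tag.

    occurrences:  list of (similarity, hops) pairs, one per neighbor
                  document that carries the tag.
    cooccurrence: co-occurrence score of the tag with the tags of d_init.
    """
    # Summing over occurrences reflects tag occurrence (support);
    # dividing by hops penalizes far-away documents (assumed form).
    partial = sum(sim / hops for sim, hops in occurrences if hops > 0)
    return cooccurrence * partial


# Two occurrences: one neighbor at 1 hop, one at 2 hops (made-up values).
example = combined_score([(0.8, 1), (0.6, 2)], cooccurrence=0.5)
```

The structure mirrors the slide: a sum over occurrences, with similarity and distance inside each partial score and co-occurrence applied per tag.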

  13. Overview: Motivation; Principles of Tag Propagation; Scoring Model; Top-k Tag Inference; Experimental Results; Conclusions

  14. Inferring tags for a document
     • traverse the graph of documents and gather tags for the initial document
     • do not visit the whole neighborhood -> need smart graph traversal
     • the scoring model can compute a score for “every” tag -> top-k tags are enough … when should we stop?

  15. Graph Traversal
     • Precomputed: tags + scores for each document
     • Example tag lists for the neighbors Doc 1 and Doc 2 of d_init:
       • P2P, 0.3; Tag, 0.28; Social, 0.25; Paper, 0.2; 2009, 0.1
       • Social, 0.4; Search, 0.33; Budura, 0.25; Tag, 0.2; Paper, 0.2
     • Select the next document based on the doc-doc similarity

  16. Top-K Graph Traversal
     • List of all neighbors sorted by doc-doc similarity
     • Select the best document (Doc x) and mark it as visited
     • Tag lists of the visited documents:
       • P2P, 0.3; Tag, 0.28; Social, 0.25; Paper, 0.2; 2009, 0.1
       • Social, 0.4; Search, 0.33; Budura, 0.25; Tag, 0.2; Paper, 0.2
     • Accumulated tags for d_init (top-k): Social, 0.65; Paper, 0.4; Tag, 0.48; P2P, 0.3; ....
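
The traversal on slides 15 and 16 can be sketched as a best-first search that always expands the unvisited document most similar to d_init and adds up the precomputed tag scores. The priority queue, the `max_visits` cutoff, and the similarity values are assumptions for illustration; the tag lists are the ones shown on the slides.

```python
import heapq
from collections import defaultdict


def traverse_and_collect(graph, similarity, doc_tags, start, max_visits):
    """Visit up to `max_visits` documents in order of decreasing similarity
    to `start`, summing the precomputed tag scores of visited documents."""
    scores = defaultdict(float)
    visited = {start}
    # heapq is a min-heap, so similarities are negated.
    frontier = [(-similarity.get(n, 0.0), n) for n in graph.get(start, ())]
    heapq.heapify(frontier)
    visits = 0
    while frontier and visits < max_visits:
        _, doc = heapq.heappop(frontier)
        if doc in visited:
            continue
        visited.add(doc)
        visits += 1
        for tag, score in doc_tags.get(doc, {}).items():
            scores[tag] += score
        for neighbor in graph.get(doc, ()):
            if neighbor not in visited:
                heapq.heappush(frontier, (-similarity.get(neighbor, 0.0), neighbor))
    return dict(scores)


graph = {"d_init": ["doc1", "doc2"]}
similarity = {"doc1": 0.9, "doc2": 0.7}  # hypothetical doc-doc similarities
doc_tags = {
    "doc1": {"P2P": 0.3, "Tag": 0.28, "Social": 0.25, "Paper": 0.2, "2009": 0.1},
    "doc2": {"Social": 0.4, "Search": 0.33, "Budura": 0.25, "Tag": 0.2, "Paper": 0.2},
}
scores = traverse_and_collect(graph, similarity, doc_tags, "d_init", max_visits=2)
```

After both documents are visited, the sums agree with the accumulated list on the slide: Social 0.65, Tag 0.48, Paper 0.4, P2P 0.3.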

  17. Top-k Tag Inference
     • Fagin et al.: NRA algorithm
     • For each candidate tag, keep a worst score (w) and a best score (b):
       • worst_score = actual score
       • best_score = worst_score + best_to_come_score
     • Prune a tag when
       • best_score < score of the tag currently at rank k
     • Stop when
       • k tags have been seen && no candidate tags are left
     • The final “score” mass of each tag is unknown
     • Consider ONLY m occurrences for each tag
     • Figure labels: w, b, score, (m-m') *, Top-k pos. k, Candidate, Expelled
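
The pruning and stopping rules on this slide can be sketched as a single bookkeeping step. This is not a full NRA implementation: it only covers tags that have already been seen, and it assumes best_to_come_score is the number of remaining occurrences (m minus the occurrences seen so far) times a per-occurrence upper bound `max_partial`. That bound is a hypothetical parameter, because the corresponding factor is cut off in the transcript.

```python
def prune_candidates(seen, m, max_partial, k):
    """One pruning step over the tags seen so far.

    seen:        tag -> (worst_score, occurrences_seen)
    m:           maximum number of occurrences considered per tag
    max_partial: assumed upper bound on any single partial score
    Returns (top_k, candidates, can_stop).
    """
    bounds = {}
    for tag, (worst, count) in seen.items():
        best_to_come = max(m - count, 0) * max_partial
        bounds[tag] = (worst, worst + best_to_come)

    ranked = sorted(bounds, key=lambda t: bounds[t][0], reverse=True)
    top_k = ranked[:k]
    if len(top_k) < k:
        return top_k, [], False  # fewer than k tags seen: cannot stop yet

    threshold = bounds[top_k[-1]][0]  # worst score of the tag at rank k
    # A tag is pruned when its best score is below the threshold.
    candidates = [t for t in ranked[k:] if bounds[t][1] >= threshold]
    return top_k, candidates, not candidates


seen = {"Social": (0.65, 2), "Tag": (0.48, 2), "Paper": (0.4, 2), "P2P": (0.3, 1)}
top_k, candidates, can_stop = prune_candidates(seen, m=2, max_partial=0.05, k=2)
```

With a small bound, P2P can gain at most 0.05 and cannot reach rank 2, so no candidates remain and the traversal may stop. With a larger bound such as 0.4, P2P stays a candidate and the traversal has to continue.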

  18. Overview: Motivation; Principles of Tag Propagation; Scoring Model; Top-k Tag Inference; Experimental Results; Conclusions

  19. Experimental Setup
     • Datasets
       • del.icio.us (120K bookmarks)
       • CiteULike/CiteSeer (2200 crawled PDFs)
     • Measures of interest:
       • Precision (user study)
       • Relative precision (computed based on already assigned tags)
       • Cost (number of visited neighbors)
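
The slide does not define relative precision beyond "computed based on already assigned tags". One plausible reading, sketched below as an assumption, is the fraction of inferred tags that also appear among the tags users had already assigned to the document.

```python
def relative_precision(predicted, assigned):
    """Fraction of predicted tags found among the already assigned tags
    (an assumed definition; the slides do not spell it out)."""
    if not predicted:
        return 0.0
    assigned = set(assigned)
    hits = sum(1 for tag in predicted if tag in assigned)
    return hits / len(predicted)


# Hypothetical example: three inferred tags, two of them already assigned.
rp = relative_precision(["p2p", "search", "web"], {"p2p", "search", "ranking"})
```

Unlike the user-study precision, this measure needs no human judges, but it counts a correct tag as a miss if no user happened to assign it.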

  20. Experimental Results: CiteULike
     • 30 initial documents
     • manual precision evaluation (user study)

  21. Experimental Results: Del.icio.us
     • 120 initial documents
     • relative precision evaluation

  22. Conclusions
     • Tag inference over edges of resource graphs
     • 4 principles of tag propagation
     • Scoring model
     • Top-k tag inference with modest access to the resource graph
