Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs

Presentation Transcript


  1. Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs • Reporter: Hsan-Yu Lin

  2. Outline • Introduction • Related Work • Reformulation Strategies • Reformulation Effectiveness Metrics • Discussion And Conclusion

  3. Introduction • Query reformulation (refinement) • Users frequently modify a previous search query in the hope of retrieving better results • Goal: • Examine the types of query reformulation users perform • Evaluate them using effectiveness metrics such as click data

  4. Related Work • Computer-Generated Reformulations

  5. Related Work • Query Session Boundary Detection • Automatic new topic identification using multiple linear regression (Information Processing & Management, 2006) • Uses time and common words • Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T '06) • Uses hierarchical clustering to find a better timeout value

  6. Procedure 1. Create a taxonomy of query reformulation strategies, defined in a formal language 2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies 3. Analyze correlations between query reformulation strategies and effectiveness metrics

  7. Reformulation Strategies • Definitions: • _ : space character • P = {', −, .} : punctuation • λ : empty string • Σ = {[a-z], [0-9]} ∪ P : alphabet • c_i ∈ Σ : character • w_i ∈ Σ* : word • z_i ∈ (Σ ∪ {_})* : any string
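
The definitions above can be written down as a small Python sketch. This is an illustration, not the authors' code: the names `ALPHABET`, `is_word`, and `is_query_string` are hypothetical, and the "−" in P is assumed to be an ordinary hyphen.

```python
import string

SPACE = " "                      # the "_" symbol on the slide
PUNCTUATION = set("'-.")         # P, assuming "−" means a plain hyphen
ALPHABET = set(string.ascii_lowercase) | set(string.digits) | PUNCTUATION  # Σ


def is_word(s):
    """True if s is in Σ*: only alphabet characters, no spaces (λ counts)."""
    return all(c in ALPHABET for c in s)


def is_query_string(s):
    """True if s is in (Σ ∪ {_})*: alphabet characters and spaces only."""
    return all(c in ALPHABET or c == SPACE for c in s)
```

Under these definitions a query such as `wal mart` is a valid string z but not a single word w, because it contains a space.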

  8. Reformulation Strategies • REFORM. 1: WORD REORDER • seattle pizza palace → pizza seattle palace • REFORM. 2: WHITESPACE AND PUNCTUATION • wal mart, tomatoprices → walmart tomato prices
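
A minimal sketch of how rules for these two strategies might look, assuming queries are already lowercased. The function names are hypothetical, and `strip_separators` drops every non-alphanumeric character, which is slightly broader than the punctuation set P defined earlier.

```python
from collections import Counter


def is_word_reorder(q1, q2):
    """Same multiset of words, but in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and Counter(w1) == Counter(w2)


def strip_separators(q):
    """Keep only letters and digits (assumption: broader than the set P)."""
    return "".join(c for c in q if c.isalnum())


def is_whitespace_punctuation(q1, q2):
    """Queries differ only in spacing or punctuation."""
    return q1 != q2 and strip_separators(q1) == strip_separators(q2)
```

Both examples from this slide satisfy the corresponding rule, while an unchanged query satisfies neither.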

  9. Reformulation Strategies • REFORM. 3: REMOVE WORDS • yahoo stock price → price yahoo • REFORM. 4: ADD WORDS • eastlake home → eastlake home price index • REFORM. 5: URL STRIPPING • http www.yahoo.com → yahoo
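
These three strategies could be detected with set comparisons and simple token filtering. The sketch below is an assumption about how such rules might be written: the names are hypothetical and the `URL_TOKENS` list is illustrative rather than taken from the slides.

```python
import re

# Illustrative list of URL fragments to discard (an assumption).
URL_TOKENS = {"http", "https", "www", "com", "net", "org"}


def is_remove_words(q1, q2):
    """q2 keeps a non-empty proper subset of q1's words, in any order."""
    s1, s2 = set(q1.split()), set(q2.split())
    return bool(s2) and s2 < s1


def is_add_words(q1, q2):
    """Mirror image of remove words."""
    return is_remove_words(q2, q1)


def strip_url(q):
    """Split on spaces, dots, slashes, and colons, then drop URL fragments."""
    parts = re.split(r"[\s./:]+", q)
    return " ".join(p for p in parts if p and p not in URL_TOKENS)


def is_url_stripping(q1, q2):
    return q1 != q2 and strip_url(q1) == q2
```

Using sets means word order is ignored, which is what lets `yahoo stock price` followed by `price yahoo` count as a removal.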

  10. Reformulation Strategies • REFORM. 6: STEMMING • running over bridges → run over bridge • REFORM. 7: FORM ACRONYM • personal computer → pc • REFORM. 8: EXPAND ACRONYM • pda → personal digital assistant
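
The two acronym rules are easy to sketch by joining first letters. The function names below are hypothetical. Stemming is left out because it would need a real stemmer, which the slides do not specify.

```python
def acronym(q):
    """Concatenate the first letter of each word."""
    return "".join(w[0] for w in q.split())


def is_form_acronym(q1, q2):
    """A multi-word query is replaced by its first-letter acronym."""
    return len(q1.split()) > 1 and acronym(q1) == q2


def is_expand_acronym(q1, q2):
    """Mirror image of form acronym."""
    return is_form_acronym(q2, q1)
```

A rule this simple would also fire on coincidental initial-letter matches, so a real classifier would likely need additional checks.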

  11. Reformulation Strategies • REFORM. 9: SUBSTRING • is there spyware on my computer → is there spywa • REFORM. 10: SUPERSTRING • nevada police rec → nevada police records 2008 • REFORM. 11: ABBREVIATION • shortened dict → short dictionary
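
One possible reading of these three rules, based only on the examples on this slide: substring and superstring are treated as prefix relations between whole queries, and abbreviation as a word-by-word prefix relation. That interpretation and the function names are assumptions.

```python
def is_substring(q1, q2):
    """q2 is a non-empty proper prefix of q1 (assumed reading of the slide)."""
    return q1 != q2 and q2 != "" and q1.startswith(q2)


def is_superstring(q1, q2):
    """Mirror image: q1 is a proper prefix of q2."""
    return is_substring(q2, q1)


def is_abbreviation(q1, q2):
    """Same word count, and each word pair is related by a prefix."""
    w1, w2 = q1.split(), q2.split()
    if len(w1) != len(w2) or w1 == w2:
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))
```

In `shortened dict` followed by `short dictionary`, the first word is shortened and the second lengthened, which is why the check is symmetric per word.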

  12. Reformulation Strategies • REFORM. 12: WORD SUBSTITUTION • Synonym: easter egg search → easter egg hunt • Hyponym: crimson scarf → red scarf • Hypernym: personal computer → laptop • Meronym: finger → hand • Holonym: automobile → wheel • REFORM. 13: SPELLING CORRECTION • reformualtion → reformulation

  13. Undetected Reformulations • Categories of reformulations not included in the taxonomy: • Semantic Rephrasing • how to calculate nutritional values → weight watchers calculator • Multi-Reformulations • lane county gabrage → lane county garbage disposal (add words and spelling correction) • Classifier Rule Limitations • Spelling correction used a Levenshtein edit distance of 2 • WordNet database limitations
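
To make the Levenshtein limitation concrete, here is a standard dynamic-programming edit distance with a threshold check. Treating the threshold as "at most 2" is an assumption, as is the function name `is_spelling_correction`.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_spelling_correction(q1, q2, max_distance=2):
    """Assumed rule: queries differ by at most max_distance edits."""
    return 0 < levenshtein(q1, q2) <= max_distance
```

A swapped pair of letters, as in `reformualtion`, costs 2 edits and stays within the threshold. The multi-reformulation example above does not, because the added word pushes the distance well past 2.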

  14. Undetected Reformulations

  15. The Rule-based Classifier

  16. Measures For Session Boundary Detection • Test data: • 100 users from the AOL query logs, used for evaluation • Identical repeated queries were removed (40.8% of queries) • 9,091 query pairs • 2,483 reformulations and 6,608 new queries (27.3% reformulations)

  17. Measures For Session Boundary Detection • Aim for high precision, but not necessarily high recall • Interested in inter-reformulation rather than intra-reformulation
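
For reference, precision and recall for a reformulation detector can be computed as below. The function name is hypothetical and the labels in the usage example are made up for illustration. They are not results from the study.

```python
def precision_recall(predicted, actual):
    """predicted/actual: parallel lists of booleans (True = reformulation)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged, but a new query
    fn = sum(1 for p, a in pairs if not p and a)    # missed reformulation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (invented): the detector flags one of three true reformulations.
p, r = precision_recall([True, False, False, False],
                        [True, True, True, False])
```

In this toy case precision is perfect while recall is low. That is the trade-off the slide accepts: every pair the classifier labels is trustworthy, even if some reformulations go unlabeled.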

  18. Reformulation Effectiveness Metrics • Data: AOL query logs (released on 08/03/2006) • Queries: 36,389,567 • 16,069,421 new queries • 14,861,326 same queries • 3,411,706 reformulations • Metrics • Click Pattern • Click URL • Rank Change of Clicked Results

  19. Click Pattern

  20. Click Pattern • (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick) • (SkipSkip) vs. (SkipClick)
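
The two comparisons above can be read as click rates on the reformulated query: one over all query pairs and one restricted to pairs whose initial query got no click. The sketch below assumes the pattern name lists the initial query first and the reformulation second. The function names are hypothetical.

```python
from collections import Counter


def click_pattern(first_clicked, second_clicked):
    """Label a query pair, e.g. (False, True) -> 'SkipClick'."""
    first = "Click" if first_clicked else "Skip"
    second = "Click" if second_clicked else "Skip"
    return first + second


def click_rates(pairs):
    """pairs: iterable of (first_clicked, second_clicked) booleans."""
    counts = Counter(click_pattern(a, b) for a, b in pairs)
    total = sum(counts.values())
    clicked = counts["SkipClick"] + counts["ClickClick"]
    after_skip = counts["SkipSkip"] + counts["SkipClick"]
    return {
        # (SkipClick + ClickClick) against all four patterns
        "overall": clicked / total if total else None,
        # SkipClick against SkipSkip + SkipClick
        "after_skip": counts["SkipClick"] / after_skip if after_skip else None,
    }
```

Computing these rates separately for each reformulation strategy would give one way to compare strategies, in the spirit of the later slides.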

  21. Click URL

  22. Rank Change and Median Time between Queries

  23. Discussion • Different reformulation strategies were effective depending on the action taken on the initial query (click or skip) • Word substitution • Skip → Skip • Click → Click • Spelling correction • Skip → Click • Click → Skip

  24. Limitations • Lack of Context • Normalized Query Logs • Ambiguous Queries • 'american airlines', 'delta airlines' • Search Engine Effects

  25. CONCLUSIONS • Describes the human side of query reformulation and contributes to our understanding of users in search interaction • Adding or removing words, word substitution, acronym expansion, and spelling correction seem most effective • Acronym formation and word reordering may be less beneficial to the user
