
Navigation Aided Retrieval


Presentation Transcript


  1. Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo

  2. Search & Navigation Trends • Users often search and then supplement the search by extensively navigating beyond the search results page to locate relevant information. • Why? • Query formulation problems • Open-ended search tasks • Preference for orienteering

  3. Search & Navigation Trends • User behaviour in IR tasks is not often fully exploited by search engines: • Content based – words • PageRank – in- and out-links for popularity • Collaborative – clicks on results • Search engines do not examine these navigation patterns (though the authors fail to mention SearchGuide – Coyle et al. – which does)

  4. NAR – Navigation Aided Retrieval • A new retrieval paradigm that incorporates post-query user navigation as an explicit component – NAR • A query is seen as a means to identify starting points for further navigation by users • The starting points are presented to the user in a result list, and they permit easy navigation to many documents which match the user's query

  5. NAR • Navigation aided retrieval with organic structure • Structure naturally present in pre-existing web documents • Advantages • Human oversight – human-generated categories, etc. • Familiar user interface – a list of documents (i.e. a result list) • Single view of the document collection • Robust implementation – no semantic knowledge required

  6. The model • D – set of documents in the corpus, T – the user's search task • S_T – answer set for search task T, Q_T – the set of valid queries for task T • Query submodel – belief distribution over the answer set given a query: what is the likelihood that document d solves the task, i.e. its relevance • Navigation submodel – likelihood that a user starting at a particular document will be able to navigate (under guidance) to a document that solves the task.
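One plausible way to state the two submodels formally, keeping the slide's notation (the paper's exact formulation may differ):

    \[
    \text{query submodel:}\quad P\big(d \in S_T \,\big|\, q\big), \qquad q \in Q_T
    \]
    \[
    \text{navigation submodel:}\quad P\big(\text{user starting at } d \text{ reaches some } d^{*} \in S_T\big)
    \]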

  7. Conventional probabilistic IR Model • No outward navigation considered • Probability of solving the task depends on whether there is a document in the document collection which solves the task • Probability of the document solving a task is based on its “relevance” to the query

  8. Navigation-Conscious Model • Considers browsing as part of the search task • Query submodel – any probabilistic IR relevance ranking model • Navigation submodel – a stochastic model of user navigation, WUFIS (Chi et al.)

  9. WUFIS • W(N, d1, d2) – probability that a user with need N will navigate from d1 to d2 • Scent is provided by anchor and surrounding text • The probability of a link being followed is related to how well a user's need matches the scent – the similarity between a weighted vector of need terms and a vector of scent terms
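A minimal sketch of a WUFIS-style link-choice probability. The cosine similarity and the per-page normalisation over out-links are illustrative assumptions, not the exact weighting used by Chi et al. or by Volant:

    import math
    from collections import Counter

    def cosine(a, b):
        """Cosine similarity between two sparse term-weight vectors (dict-like)."""
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def link_follow_probs(need, out_link_scents):
        """Approximate W(N, d1, d2) for every out-link of a page d1.

        need            -- Counter of weighted need terms N
        out_link_scents -- {d2: Counter of anchor + surrounding-text terms}
        """
        sims = {d2: cosine(need, scent) for d2, scent in out_link_scents.items()}
        total = sum(sims.values())
        return {d2: (s / total if total else 0.0) for d2, s in sims.items()}

    # Example (hypothetical data):
    # probs = link_follow_probs(Counter({"jaguar": 1.0, "car": 0.8}),
    #                           {"page_a": Counter({"jaguar": 2, "animal": 1}),
    #                            "page_b": Counter({"car": 1, "dealer": 1})})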

  10. Final Model • A document's starting-point score = query submodel × navigation submodel
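One plausible reading of this product, summing the per-target contributions over candidate answer documents d' (the aggregation over d' is an assumption, not stated on the slide):

    \[
    \mathrm{score}(d, q) \;=\; \sum_{d' \in D}
        \underbrace{R(d', q)}_{\text{query submodel}}
        \cdot
        \underbrace{W\big(N(d'),\, d,\, d'\big)}_{\text{navigation submodel}}
    \]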

  11. Volant - Prototype

  12. Volant - Preprocessing • Content Engine • R(d,q) – estimated by the Okapi BM25 scoring function • Connectivity Engine • Estimates the probability of a user with need N(d2) navigating from d1 to d2, starting with dw • Dijkstra's algorithm is used to generate these tuples
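A minimal sketch of Okapi BM25 scoring for R(d, q). The parameters k1 = 1.2 and b = 0.75 are common defaults, not necessarily what Volant's content engine uses, and the Dijkstra-based tuple generation is not shown:

    import math

    def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
                   k1=1.2, b=0.75):
        """doc_tf: term -> frequency in d;  df: term -> document frequency in the corpus."""
        score = 0.0
        for t in query_terms:
            tf = doc_tf.get(t, 0)
            if tf == 0:
                continue                      # a term absent from d contributes nothing
            idf = math.log(1 + (num_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
            score += idf * norm
        return score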

  13. Volant – Starting points • Query entered -> ranked list of starting points • 1. Retrieve from the content engine all documents d' that are relevant to the query • 2. For each document d' retrieved in step 1, retrieve from the connectivity engine all documents d for which W(N(d'), d, d') > 0 • 3. For each unique d, compute the starting-point score • 4. Sort in decreasing order of starting-point score (see the sketch below)
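A sketch of this ranking loop. It assumes the score for d accumulates R(d', q) * W(N(d'), d, d') over relevant targets d'; the content_engine and connectivity_engine interfaces are hypothetical stand-ins for Volant's components:

    def rank_starting_points(query, content_engine, connectivity_engine):
        relevant = content_engine.retrieve(query)          # step 1: [(d', R(d', q)), ...]
        scores = {}
        for d_prime, relevance in relevant:
            # step 2: all d with W(N(d'), d, d') > 0
            for d, w in connectivity_engine.sources(d_prime):
                # step 3: accumulate the starting-point score for each unique d
                scores[d] = scores.get(d, 0.0) + relevance * w
        # step 4: sort in decreasing order of starting-point score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)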

  14. Volant – Navigation Guidance • When a user is navigating, Volant intercepts the document and highlights links that lead to documents relevant to their query q • 1. Retrieve from the content engine all documents d' that are relevant to q • 2. For each d' retrieved, get from the connectivity engine the documents d that can lead to d', i.e. W(N(d'), d, d') > 0 • 3. For each tuple retrieved in step 2, highlight the links that point to dw (sketched below)
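A sketch of the guidance loop, reusing the hypothetical engine interfaces from the previous sketch; first_hops(d, d') is assumed to return the (dw, W) tuples precomputed on slide 12:

    def links_to_highlight(query, current_doc, content_engine, connectivity_engine):
        relevant = content_engine.retrieve(query)          # step 1: [(d', R(d', q)), ...]
        highlight = set()
        for d_prime, _ in relevant:
            # step 2: tuples with W(N(d'), current_doc, d') > 0
            for dw, w in connectivity_engine.first_hops(current_doc, d_prime):
                highlight.add(dw)                          # step 3: highlight links to dw
        return highlight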

  15. Evaluation • Hypotheses • In query-only scenarios Volant does not perform significantly worse than conventional approaches • In combined query/navigation scenarios Volant selects high-quality starting points • In a significant fraction of query/navigation scenarios the best organic starting point is of higher quality than one that can be synthesized using existing techniques

  16. Search Task Test Sets • Navigation-prone scenarios are difficult to predict, so the Simplified Clarity Score was used to determine a set of ambiguous and unambiguous queries • Unambiguous – the 20 search tasks with the highest clarity from TREC 2000 • Ambiguous – 48 randomly selected tasks from TREC 2003
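A hedged sketch of the Simplified Clarity Score (usually attributed to He & Ounis): the KL divergence between the query's maximum-likelihood term distribution and the collection language model, where higher scores indicate less ambiguous queries. Whether the authors used exactly this variant is an assumption:

    import math
    from collections import Counter

    def simplified_clarity_score(query_terms, coll_term_counts, coll_token_count):
        """coll_term_counts: term -> frequency in the corpus;
           coll_token_count: total number of term occurrences in the corpus."""
        qtf = Counter(query_terms)
        scs = 0.0
        for term, count in qtf.items():
            p_q = count / len(query_terms)
            p_coll = coll_term_counts.get(term, 0) / coll_token_count
            if p_coll > 0:
                scs += p_q * math.log2(p_q / p_coll)
        return scs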

  17. Performance on Unambiguous Queries • Mean Average Precision • No significant difference • Why? Relevant documents tended not to be siblings or close cousins, so Volant deemed that the best starting points were the documents themselves.
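For reference, a minimal computation of Mean Average Precision over a set of queries:

    def average_precision(ranked_ids, relevant_ids):
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_ids) if relevant_ids else 0.0

    def mean_average_precision(runs):
        """runs: iterable of (ranked_ids, relevant_ids) pairs, one per query."""
        aps = [average_precision(r, rel) for r, rel in runs]
        return sum(aps) / len(aps) if aps else 0.0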

  18. Performance on Ambiguous Queries • User study – 48 judges rated the suitability of documents as starting points • 30 starting points were generated: • 10 from the TREC 2003 winner (CSIRO) • 10 from Volant with user guidance • 10 from Volant without user guidance (the same documents as the previous 10)

  19. Performance on Ambiguous Queries • Rating criteria • Breadth – caters to a spectrum of people with different interests • Accessibility – how easy it is to navigate and find information • Appeal – presentation of the material • Usefulness – would people be able to complete their task from this point • Each judge spent 5 hours on their task

  20. Results

  21. Summary & Future Work • Effectiveness – responds to users' queries and positions them at a suitable starting point for their task, then guides them to further information in a query-driven fashion • Relationship to conventional IR – generalizes the conventional probabilistic IR model and succeeds in scenarios where conventional IR techniques fail, e.g. ambiguous queries

  22. Discussion • Cold Start Problem • Scalability • Bias in Evaluation
