
Navigation-Aided Retrieval

This presentation discusses the concept of Navigation-Aided Retrieval (NAR), which utilizes organic structure in documents to enhance search tasks. It introduces a formal model of NAR and evaluates its effectiveness through a user study.

lhawthorne

Presentation Transcript


  1. Navigation-Aided Retrieval • Shashank Pandit and Christopher Olston • Presentation by Yang Yu • CSE 450 Web Data Mining

  2. Outline • Introduction • Related Work • System Model • Prototype System • Evaluation • Summary & Future Work

  3. Introduction • Background reasons for this work • Difficulty in formulating appropriate queries • Open-ended search tasks • Preference for orienteering • Navigation-Aided Retrieval

  4. Introduction • Organic versus Synthetic Structure • Synthetic: structure is synthesized automatically and imposed on query results • Organic: structure that naturally exists in documents is used as is • Advantages of organic NAR • Human oversight • Familiar user interface • A single view of the document collection • Robust implementation by a third party • Contributions • Formal model of navigation-aided retrieval • An overview of techniques for a NAR-based retrieval system • Empirical evaluation via a user study

  5. Related Work • Selecting Starting Points • Best Trails system • Uses an ad-hoc scoring function for starting points • Restricts starting points to documents that themselves match the query • Does not take navigability factors into account • User interface departs substantially from the traditional interface • Topic distillation, which mainly uses HITS • Only effective for broad topic areas for which there are many hubs and authorities • Guiding Navigation • WebWatcher highlights hyperlinks along paths taken by previous users who had posed similar queries

  6. System Model • Generic Model • Query submodel: • Navigation submodel: • Generic scoring function • Assumption: every member of relevance set St is a singleton set. • "Flatten" St into {d1, d2, …, dn}.

  7. System Model • Instantiations of Generic Model • Conventional Probabilistic IR Model • Navigation-Conscious Model • The two terms embody the two key factors • the number of documents reachable from d that are relevant to the search task • the ease and accuracy with which the user is able to navigate to those documents.
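The two factors on this slide can be illustrated with a small sketch. The slide's own formula is not in the transcript, so the form used here (a sum over target documents of relevance times a navigability weight) is an assumption made for illustration. The names `navigation_conscious_score`, `no_navigation`, and `toy_navigability`, and all numbers, are hypothetical.

```python
from typing import Callable, Dict


def navigation_conscious_score(
    d: str,
    relevance: Dict[str, float],
    navigability: Callable[[str, str], float],
) -> float:
    """Assumed form: sum over targets d' of relevance(d') * navigability(d, d')."""
    return sum(rel * navigability(d, target) for target, rel in relevance.items())


def no_navigation(d: str, target: str) -> float:
    """Degenerate navigability: the user never leaves the starting document."""
    return 1.0 if d == target else 0.0


# Hypothetical toy data: relevance of each document to one query,
# and the ease of navigating from one document to another.
relevance = {"hub": 0.1, "a": 0.9, "b": 0.8}
links = {("hub", "a"): 0.5, ("hub", "b"): 0.5}


def toy_navigability(d: str, target: str) -> float:
    """A document trivially reaches itself; otherwise look up the link weight."""
    if d == target:
        return 1.0
    return links.get((d, target), 0.0)


# With no navigation, each document is scored by its own relevance only.
conventional = {
    d: navigation_conscious_score(d, relevance, no_navigation) for d in relevance
}
# With navigation, a weakly relevant hub can outrank the documents it links to.
nav_aware = {
    d: navigation_conscious_score(d, relevance, toy_navigability) for d in relevance
}

print(max(conventional, key=conventional.get))
print(max(nav_aware, key=nav_aware.get))
```

Under these assumptions, setting navigability to zero for every document other than d reduces the score to d's own relevance, which is one way to read the conventional model as a special case.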

  8. Prototype System • Preprocessing • Content Engine • Connectivity Engine: <d1, d2, dW, W(N(d2), d1, d2)> • Intermediary

  9. Prototype System

  10. Prototype System • Selecting Starting Points • 1. Retrieve from the content engine all documents d’ relevant to q. • 2. For each relevant document d’ retrieved in Step 1, retrieve from the connectivity engine all documents d from which the user can navigate to d’. • 3. For each unique document d found in Step 2, compute the starting point score. • 4. Sort the documents in decreasing order of this score and truncate after the top k documents.
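The four steps above can be sketched as follows. This is a minimal illustration, not the prototype's code: the two engines are replaced by hypothetical in-memory dictionaries, and the starting point score is assumed to be the sum of relevance times navigation weight, since the slide does not give the formula.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the two engines.
# Content engine: query -> {d': relevance of d' to the query}
ContentIndex = Dict[str, Dict[str, float]]
# Connectivity engine: d' -> [(d, weight of navigating from d to d')]
ConnectivityIndex = Dict[str, List[Tuple[str, float]]]


def select_starting_points(
    query: str,
    content: ContentIndex,
    connectivity: ConnectivityIndex,
    k: int,
) -> List[Tuple[str, float]]:
    # Step 1: all documents d' relevant to the query.
    relevant = content.get(query, {})

    # Steps 2 and 3: for each d', find every d that can navigate to it and
    # accumulate an assumed score of relevance * navigation weight.
    scores: Dict[str, float] = defaultdict(float)
    for target, rel in relevant.items():
        for source, weight in connectivity.get(target, []):
            scores[source] += rel * weight

    # Step 4: sort by decreasing score (ties broken by name), keep the top k.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy data: each document reaches itself with weight 1.0,
# and "hub" links to both relevant documents.
content = {"jazz": {"a": 0.9, "b": 0.8}}
connectivity = {
    "a": [("a", 1.0), ("hub", 0.6)],
    "b": [("b", 1.0), ("hub", 0.6)],
}

top = select_starting_points("jazz", content, connectivity, k=2)
print([name for name, _ in top])
```

In this toy data the hub is returned first even though it does not match the query itself, because it leads to both relevant documents.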

  11. Prototype System • Adding Navigation Guidance • 1. Retrieve from the content engine all documents d’ for which R(d’, q) >= T. • 2. For each document d’ retrieved in Step 1, retrieve from the connectivity engine the tuple corresponding to <d, d’>, if it exists. • 3. For each <d1, d2, dW, W(N(d2), d1, d2)> tuple retrieved in Step 2 (with d1 = d and d2 = d’), highlight links on d that point to dW. • Efficiency and Scalability
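The navigation guidance steps above can be sketched in the same style. The data structures are hypothetical in-memory stand-ins for the content and connectivity engines. Following Step 3, dW is treated as the document that a link on the current page points to; the names `PathTuple` and `links_to_highlight` are introduced here for illustration only.

```python
from typing import Dict, List, NamedTuple, Tuple


class PathTuple(NamedTuple):
    d1: str        # document currently being viewed
    d2: str        # relevant destination document
    d_w: str       # document that a link on d1 points to, on the way to d2
    weight: float  # navigation weight stored with the tuple


def links_to_highlight(
    current: str,
    query: str,
    content: Dict[str, Dict[str, float]],
    connectivity: Dict[Tuple[str, str], PathTuple],
    out_links: Dict[str, List[str]],
    threshold: float,
) -> List[str]:
    # Step 1: documents d' whose relevance to the query meets the threshold T.
    targets = [d for d, rel in content.get(query, {}).items() if rel >= threshold]

    # Step 2: look up the <current, d'> tuple for each target, if it exists.
    tuples = [
        connectivity[(current, target)]
        for target in targets
        if (current, target) in connectivity
    ]

    # Step 3: highlight the links on the current page that point to some dW.
    next_hops = {t.d_w for t in tuples}
    return [link for link in out_links.get(current, []) if link in next_hops]


# Hypothetical toy data: only "a" is relevant enough to trigger highlighting.
content = {"jazz": {"a": 0.9, "b": 0.2}}
connectivity = {
    ("hub", "a"): PathTuple("hub", "a", "mid", 0.5),
    ("hub", "b"): PathTuple("hub", "b", "other", 0.5),
}
out_links = {"hub": ["mid", "other", "about"]}

highlighted = links_to_highlight(
    "hub", "jazz", content, connectivity, out_links, threshold=0.5
)
print(highlighted)
```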

  12. Evaluation • Experimental Hypotheses • In query-only scenarios, Volant does not perform significantly worse • In combined query/navigation scenarios, Volant performs better • The best organic starting point is of higher quality than one that can be synthesized using existing techniques. • Search Task Test Sets • Unambiguous: • Ambiguous: • Performance on Unambiguous Queries

  13. Evaluation • Performance on Ambiguous Queries • 4 Criteria - Breadth; Accessibility; Appeal; Usefulness.

  14. Summary and Future Work • Summary • Effectiveness • Relationship to conventional IR • Relationship to synthetic approaches • Future Work • Add redundancy to corpora • Tune the scoring function so that it also applies to synthetic starting points • A unified method that supports both exploration and directly returning documents

  15. Thank you! Questions or Comments?
