
Retrieval Evaluation






Presentation Transcript


  1. Retrieval Evaluation

  2. Introduction
  • In computer science, implementations are usually evaluated in terms of time and space complexity.
  • With large document collections and large content types, such performance evaluations remain valid.
  • In information retrieval, we also care about retrieval performance evaluation, that is, how well the retrieved documents match the goal of the query.

  3. Retrieval Performance Evaluation
  • We discussed overall system evaluation previously:
    • Traditional vs. berry-picking models of retrieval activity
    • Metrics include time to complete the task, user satisfaction, user errors, and time to learn the system
  • But how can we compare how well different algorithms do at retrieving documents?

  4. Precision and Recall
  • Consider a document collection, a query and its results, and a task and its relevant documents.
  • [Diagram: the document collection contains the relevant documents |R| and the retrieved documents (answer set) |A|; their intersection is the set of relevant documents in the answer set, |Ra|.]

  5. Precision
  • Precision: the percentage of retrieved documents that are relevant.
  • Precision = |Ra| / |A|

  6. Recall
  • Recall: the percentage of relevant documents that are retrieved.
  • Recall = |Ra| / |R|
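A minimal sketch of these two measures in Python, assuming documents are identified by plain strings; the example sets below are illustrative, not taken from the slides.

```python
def precision(relevant: set, retrieved: set) -> float:
    """Fraction of retrieved documents that are relevant: |Ra| / |A|."""
    if not retrieved:
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """Fraction of relevant documents that are retrieved: |Ra| / |R|."""
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


if __name__ == "__main__":
    R = {"d1", "d3", "d4", "d7", "d10"}     # relevant documents (illustrative)
    A = {"d3", "d7", "d10", "d27", "d44"}   # retrieved answer set (illustrative)
    print(precision(R, A))                  # 3 relevant of 5 retrieved -> 0.6
    print(recall(R, A))                     # 3 of 5 relevant retrieved  -> 0.6
```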

  7. Precision/Recall Trade-Off
  • We can guarantee 100% recall by returning all documents in the collection …
  • Obviously, this is a bad idea!
  • We can get high precision by returning only documents that we are sure of.
  • This may also be a bad idea, since recall will suffer.
  • So, retrieval algorithms are characterized by their recall and precision curve (see the sketch below).
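A small illustration of the first point, on a hypothetical 100-document collection: returning everything guarantees recall of 1.0 but drags precision down to the density of relevant documents in the collection. The collection and relevance judgements are made up for the example.

```python
# Hypothetical collection and relevance judgements (illustrative only).
collection = {f"d{i}" for i in range(1, 101)}        # 100 documents
relevant = {"d1", "d3", "d4", "d7", "d10"}           # 5 relevant documents

retrieved_all = collection                           # "return everything"
ra = relevant & retrieved_all                        # relevant retrieved

print(len(ra) / len(relevant))       # recall    = 5/5   = 1.0
print(len(ra) / len(retrieved_all))  # precision = 5/100 = 0.05
```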

  8. Plotting Precision/Recall Curve
  • 11-level precision/recall graph: plot precision at 0%, 10%, 20%, …, 100% recall.
  • Normally, averages over a set of standard queries are used:
  • Pavg(r) = Σi Pi(r) / Nq, where Pi(r) is the precision at recall level r for the i-th query and Nq is the number of queries.
  • Example (using one query), worked through in the sketch below:
    • Relevant documents (Rq) = {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}
    • Ordered ranking by retrieval algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
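A sketch of the precision at each natural recall point for this first example; 5 of the 10 relevant documents appear in the ranking, so recall tops out at 50%.

```python
# First example from the slide: 10 relevant documents, ranked answer list Aq.
Rq = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}
Aq = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82", "d19",
      "d4", "d29", "d33", "d48", "d54", "d1"]

found = 0
for rank, doc in enumerate(Aq, start=1):
    if doc in Rq:
        found += 1
        r = found / len(Rq)        # recall after seeing `rank` documents
        p = found / rank           # precision after seeing `rank` documents
        print(f"recall {r:.0%}  precision {p:.1%}")

# Prints:
#   recall 10%  precision 100.0%   (d10 at rank 1)
#   recall 20%  precision 66.7%    (d7 at rank 3)
#   recall 30%  precision 50.0%    (d3 at rank 6)
#   recall 40%  precision 40.0%    (d4 at rank 10)
#   recall 50%  precision 33.3%    (d1 at rank 15)
```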

  9. Plotting Precision/Recall Curve
  • Example (second query):
    • Relevant documents (Rq) = {d1, d7, d82}
    • Ordered ranking by retrieval algorithm (Aq) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
  • Need to interpolate: the natural recall points here (33%, 67%, 100%) do not fall on the 11 standard levels, so the interpolated precision at level r is taken as the highest precision observed at any recall greater than or equal to r.
  • Now plot the average over a set of queries that matches the expected usage and distribution (see the sketch below).
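A sketch of the full 11-point procedure under that interpolation rule, computing one curve per query and averaging them; both example queries share the same ranked list, as on the slides.

```python
def interpolated_11pt(relevant, ranking):
    """Interpolated precision at recall 0%, 10%, ..., 100% for one query."""
    points = []                       # natural (recall, precision) pairs
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            points.append((found / len(relevant), found / rank))
    # Interpolated precision at level r: best precision at any recall >= r.
    return [max((p for r, p in points if r >= level / 10), default=0.0)
            for level in range(11)]


ranking = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82", "d19",
           "d4", "d29", "d33", "d48", "d54", "d1"]
queries = [
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"},  # query 1
    {"d1", "d7", "d82"},                                            # query 2
]

curves = [interpolated_11pt(Rq, ranking) for Rq in queries]
# Average precision at each of the 11 recall levels over the Nq queries.
avg_curve = [sum(vals) / len(curves) for vals in zip(*curves)]
print(avg_curve)
```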

  10. Evaluating Interactive Systems
  • Empirical data involving human users is time-consuming to gather, and it is difficult to draw universal conclusions from it.
  • Evaluation metrics for user interfaces:
    • Time required to learn the system
    • Time to achieve goals on benchmark tasks
    • Error rates
    • Retention of the use of the interface over time
    • User satisfaction
