Hao-Chin Chang Department of Computer Science & Information Engineering

Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction
Anna Shtok, Oren Kurland, and David Carmel, SIGIR 2010
Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University, 2011/08/01

Outline • Introduction • Relevance Model • Relevance Score • Clarity • WIG • NQC • QF • Similarity between Ranked Lists • Experiment • Conclusion

Introduction • We present a novel framework for query-performance prediction that is based on statistical decision theory and relevance models. • We consider the ranking induced by a retrieval method M in response to a query q as a decision made to satisfy the information need underlying q. • Our goal is to predict the query performance of M with respect to q. • We instantiate various query-performance predictors from the framework by varying: • the relevance-model estimate • the measure of the quality of the relevance-model estimate • the measure of similarity between ranked lists

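Relevance Model • The relevance model R_q represents the information need I_q underlying the query q • Documents are ranked with respect to the relevance model by negative cross entropy: -CE(p(·|R_q) ‖ p(·|d)) = Σ_w p(w|R_q) log p(w|d)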
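The slide does not say how the relevance model is estimated. As a minimal sketch, assume an RM1-style estimate over the top-retrieved documents, p(w|R) = Σ_d p(w|d) p(d|q) with p(d|q) proportional to p(q|d), Dirichlet-smoothed document models (the value of `mu` is an assumption), and the negative cross entropy above for ranking. All function names are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood unigram model p(w|C) over all documents."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_lm(doc, collection_lm, mu):
    """Dirichlet-smoothed document model p(w|d) over the collection vocabulary (assumed smoothing)."""
    counts = Counter(doc)
    n = len(doc)
    return {w: (counts[w] + mu * pc) / (n + mu) for w, pc in collection_lm.items()}


def rm1(query, top_docs, collection_lm, mu=1000.0):
    """RM1-style estimate (assumed): p(w|R) = sum_d p(w|d) p(d|q), p(d|q) proportional to p(q|d).
    Query terms are assumed to occur in the collection vocabulary."""
    lms = [smoothed_lm(d, collection_lm, mu) for d in top_docs]
    log_ql = [sum(math.log(m[w]) for w in query) for m in lms]
    top = max(log_ql)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(x - top) for x in log_ql]
    z = sum(weights)
    rm = {}
    for m, wt in zip(lms, weights):
        for w, pw in m.items():
            rm[w] = rm.get(w, 0.0) + pw * wt / z
    return rm


def neg_cross_entropy(rm, doc_lm):
    """-CE(p(.|R) || p(.|d)) = sum_w p(w|R) log p(w|d); a higher value ranks d higher."""
    return sum(p * math.log(doc_lm[w]) for w, p in rm.items() if p > 0)


# Toy usage: three tokenized documents and the one-term query "a".
docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "d"]]
coll = collection_model(docs)
rm = rm1(["a"], docs, coll, mu=1.0)
scores = [neg_cross_entropy(rm, smoothed_lm(d, coll, 1.0)) for d in docs]
print(scores)
```

Sorting documents by `scores` in descending order gives the relevance-model-based ranking that the framework later compares with the original list.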

Relevance Score (Clarity, WIG) • Clarity scores the relevance model by its KL divergence from the collection language model • WIG is based on estimating the presumed percentage of relevant documents in the set S from which the relevance model is constructed
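A minimal sketch of both measures follows. Clarity is computed as KL(p(·|R) ‖ p(·|C)). The slide gives no WIG formula, so the sketch assumes the form introduced by Zhou and Croft (2007): the mean, over the top-k documents, of (Score(d) - Score(C)) / sqrt(|q|), where scores are log query likelihoods and Score(C) treats the whole corpus as one document. The value of `k` and the function names are hypothetical.

```python
import math


def clarity(rm, collection_lm):
    """KL(p(.|R) || p(.|C)): divergence of the relevance model from the collection model."""
    return sum(p * math.log(p / collection_lm[w]) for w, p in rm.items() if p > 0)


def wig(doc_scores, corpus_score, query_len, k=5):
    """WIG in the Zhou and Croft (2007) form (assumed here):
    mean over the top-k documents of (Score(d) - Score(C)) / sqrt(|q|)."""
    top = sorted(doc_scores, reverse=True)[:k]
    return sum(s - corpus_score for s in top) / (len(top) * math.sqrt(query_len))


print(clarity({"a": 1.0}, {"a": 0.5, "b": 0.5}))  # ln 2, about 0.693
print(wig([-1.0, -2.0, -3.0], corpus_score=-4.0, query_len=4, k=2))  # 1.25
```

In both cases a larger value is read as a more focused, query-specific model or result set.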

Relevance Score (NQC) • NQC (Normalized Query Commitment) is based on the hypothesis that the standard deviation of retrieval scores in the result list is negatively correlated with the potential amount of query drift, i.e., non-query-related information manifested in the list • μ is the mean retrieval score in the result list
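As a sketch, assume NQC is the standard deviation of the top-k retrieval scores, normalized by the absolute corpus score |Score(C)| so that values are comparable across queries; `k` and the function name are hypothetical.

```python
import math


def nqc(doc_scores, corpus_score, k=100):
    """Standard deviation of the top-k retrieval scores divided by |Score(corpus)|.
    A larger value is taken to indicate less query drift, hence better predicted performance."""
    top = sorted(doc_scores, reverse=True)[:k]
    mu = sum(top) / len(top)  # mean retrieval score in the result list
    std = math.sqrt(sum((s - mu) ** 2 for s in top) / len(top))
    return std / abs(corpus_score)


print(nqc([-1.0, -3.0, -9.0], corpus_score=-10.0, k=2))  # 0.1
```

Only the top-k scores enter the computation, so the low-ranked document with score -9.0 does not affect the result here.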

Relevance Score (QF) • The goal of QF (Query Feedback) is to represent the ranked list L by a language model • Terms are ranked by their contribution to the KL (Kullback-Leibler) divergence of this language model from the background collection model • The top-ranked terms are chosen to form a new query Q'

Relevance Score (QF) • P(D|L) is estimated by a linearly decreasing function of the rank of document D in L • Each term w is ranked by its contribution to the KL divergence of P(w|L) from the collection model • The top N ranked terms form a weighted query Q' = {(w_i, t_i)} • w_i denotes the i-th ranked term • the weight t_i is the KL-divergence contribution of w_i
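The construction of Q' can be sketched as follows, assuming P(w|L) = Σ_D P(w|D) P(D|L), a linear rank weighting P(D|L) proportional to n - r + 1 for rank r = 1..n (the exact linear function is an assumption), and a per-term KL contribution of P(w|L) log(P(w|L) / P(w|C)). Document models are assumed to be defined over the collection vocabulary; `qf_query` and `n_terms` are hypothetical names.

```python
import math


def qf_query(ranked_doc_lms, collection_lm, n_terms=3):
    """Build the weighted query Q' = [(w_i, t_i)] from a ranked list of document models."""
    n = len(ranked_doc_lms)
    # P(D|L) proportional to n - r + 1 for rank r = 1..n (assumed linear form).
    raw = [n - r for r in range(n)]  # n, n-1, ..., 1
    z = sum(raw)
    p_d_l = [x / z for x in raw]
    # P(w|L) = sum over D of P(w|D) P(D|L)
    p_w_l = {}
    for doc_lm, pd in zip(ranked_doc_lms, p_d_l):
        for w, pw in doc_lm.items():
            p_w_l[w] = p_w_l.get(w, 0.0) + pw * pd
    # Contribution of each term to KL(P(.|L) || P(.|C))
    contrib = {w: p * math.log(p / collection_lm[w]) for w, p in p_w_l.items() if p > 0}
    # Keep the top N terms; t_i is the term's KL contribution
    return sorted(contrib.items(), key=lambda item: item[1], reverse=True)[:n_terms]


ranked = [{"a": 0.8, "b": 0.2}, {"b": 0.5, "c": 0.5}]
coll = {"a": 0.2, "b": 0.4, "c": 0.4}
print(qf_query(ranked, coll, n_terms=2))
```

Here "a" receives the largest weight because it dominates the top-ranked document while being rare in the collection.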

Similarity between ranked lists • Pearson's correlation coefficient, Spearman's ρ, and Kendall's τ are computed between the original ranking of the list and its relevance-model-based ranking
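A self-contained sketch of the three similarity measures, assuming both inputs are score vectors over the same documents in the same order (for example, the original retrieval scores and the negative cross entropy scores under the relevance model). Ranks are assigned without tie handling, and Kendall's τ is the tau-a variant; both are simplifying assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(scores):
    """Rank 1 = highest score; ties are broken arbitrarily (assumed absent)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)


original = [4.0, 3.0, 2.0, 1.0]
rm_based = [4.0, 3.0, 1.0, 2.0]  # the last two documents swap places
print(spearman(original, rm_based), kendall_tau(original, rm_based))
```

A single swap at the bottom of a four-document list gives ρ = 0.8 and τ = 4/6; higher agreement is read as higher predicted performance.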

Experiment

Conclusion • Improving the sampling technique used for relevance-model construction • Devising and adapting better measures of representativeness for relevance models constructed from clusters
