Evaluating the Robustness of Learning from Implicit Feedback
Filip Radlinski, Thorsten Joachims
Presentation by Dinesh Bhirud (bhiru002@d.umn.edu)
Introduction
• The paper evaluates the robustness of learning to rank documents based on implicit feedback.
• What is implicit feedback?
  • Relevance feedback obtained from search engine log files.
  • Such training data is easier to collect in large amounts than explicit relevance feedback.
Osmot
• Osmot – a search engine developed at Cornell University that learns from implicit feedback.
• The name Osmot comes from the word “osmosis”: learning from the users by osmosis.
• Query chains – sequences of reformulated queries.
• Osmot learns a ranked retrieval function by observing query chains and monitoring user clicks.
Data Generation
• A set of W words is chosen, with word frequencies obeying Zipf’s law.
• T topics are formed by picking N words per topic uniformly from W.
• Each document d is generated as follows:
  • Pick kd binomially from [0, T].
  • Repeat kd times:
    • Pick a topic t.
    • Pick L/kd words from topic t.
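The generation steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's generator: the function name `make_corpus`, the default sizes, the binomial success probability of 0.5, the re-draw when kd = 0, and the 1/rank Zipf weights are all choices made here for illustration.

```python
import random

def make_corpus(num_words=1000, num_topics=5, words_per_topic=50,
                num_docs=20, doc_len=60, seed=0):
    """Generate synthetic topics and documents (illustrative defaults)."""
    rng = random.Random(seed)
    vocab = list(range(num_words))
    # Zipf-like frequencies: the word with index r gets weight 1 / (r + 1).
    zipf = [1.0 / (r + 1) for r in range(num_words)]
    # Each topic is N words picked uniformly from the vocabulary W.
    topics = [rng.sample(vocab, words_per_topic) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # kd ~ Binomial(T, 0.5); p = 0.5 is an assumption.
        # Assumption: re-draw when kd == 0 so that L / kd is defined.
        k = 0
        while k == 0:
            k = sum(rng.random() < 0.5 for _ in range(num_topics))
        # Repeat kd times: pick a topic, then pick L / kd words from it.
        doc_topics = [rng.randrange(num_topics) for _ in range(k)]
        words = []
        for t in doc_topics:
            weights = [zipf[w] for w in topics[t]]
            words += rng.choices(topics[t], weights=weights, k=doc_len // k)
        docs.append((doc_topics, words))
    return topics, docs
```

Because the topics of every document are recorded, relevance with respect to a topic can be read off directly, which is what makes the synthetic collection usable for evaluation.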
Relevance
• 3 kinds of relevance:
  • Relevance with respect to topic
    • Can be measured because the document collection and topics are synthetic.
    • Used for evaluating the ranking function.
  • Relevance with respect to query
    • Actual relevance score of a document with respect to a query.
    • Used to rank documents.
  • Observed relevance
    • Relevance of a document as judged by the user seeing only the abstract.
    • Used to simulate user behavior.
User behavior parameters
• Noise – accuracy of the user’s relevance estimate.
  • Affects observed relevance (obsRel).
  • obsRel is drawn from an incomplete Beta distribution, where α gives the noise level and β is selected so that the mode is at rel(d,q).
• Threshold – user selectivity over results (rT).
• Patience – number of results the user looks at before giving up (rP).
• Reformulation – how likely the user is to reformulate the query (P_reform).
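Choosing β so that the mode sits at rel(d,q) can be illustrated with a standard Beta distribution, whose mode is (α - 1) / (α + β - 2) when α, β > 1. The sketch below solves that expression for β. It assumes an ordinary Beta draw via Python's `random.betavariate` stands in for the distribution used in the paper, and the function names are hypothetical. In this sketch a larger α concentrates draws near the mode; the slides do not say in which direction the paper's noise parameter runs.

```python
import random

def beta_for_mode(alpha, mode):
    """Return beta such that Beta(alpha, beta) has its mode at `mode`."""
    if not (alpha > 1 and 0 < mode < 1):
        raise ValueError("requires alpha > 1 and 0 < mode < 1")
    # Solve mode = (alpha - 1) / (alpha + beta - 2) for beta.
    return (alpha - 1) / mode - alpha + 2

def observed_relevance(rel, alpha, rng):
    """Draw a noisy relevance estimate whose mode is the true relevance."""
    return rng.betavariate(alpha, beta_for_mode(alpha, rel))
```

For example, `beta_for_mode(3.0, 0.5)` gives 3.0, the symmetric case in which the mode is exactly one half.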
User Behavior Model
1. While question T is unanswered:
   1.1 Generate query q (let d1, d2, ..., dn be the results for q).
   1.2 Start with document 1, i.e., i = 1.
   1.3 While patience (rP) > 0:
       1.3.1 If obsRel(di,q) > rT:
             1.3.1.1 If obsRel(di+1,q) > obsRel(di,q) + c, continue looking further down the list.
             1.3.1.2 Else di is a good document: click on it. If rel(di,T) is 1, the user is DONE. Decrease patience rP.
       1.3.2 Else decrease patience: rP = rP - (rT - obsRel(di,q)).
       1.3.3 Set i = i + 1.
   1.4 With probability (1 - P_reform), the user gives up.
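Steps 1.2 and 1.3 of the model, the scan over one result list, can be sketched as follows. This is an illustration rather than the authors' simulator: the function name `scan_results`, the fixed `click_cost` deducted from patience after a click (the slide does not give the amount), and stopping at the end of the list are assumptions.

```python
def scan_results(obs_rel, is_answer, r_t, patience, c=0.0, click_cost=1.0):
    """Simulate one pass over a result list.

    obs_rel[i]   observed relevance of the result at position i
    is_answer[i] True if that result answers the user's question
    Returns (clicked positions, whether the question was answered).
    """
    clicks = []
    i = 0
    while patience > 0 and i < len(obs_rel):
        if obs_rel[i] > r_t:
            # Step 1.3.1.1: skip if the next abstract looks clearly better.
            next_better = (i + 1 < len(obs_rel)
                           and obs_rel[i + 1] > obs_rel[i] + c)
            if not next_better:
                # Step 1.3.1.2: click; stop if the document answers T.
                clicks.append(i)
                if is_answer[i]:
                    return clicks, True
                patience -= click_cost  # assumed cost of a wasted click
        else:
            # Step 1.3.2: poor-looking results drain patience faster.
            patience -= r_t - obs_rel[i]
        i += 1
    return clicks, False
```

A full session would wrap this in the outer loop of step 1: when the scan ends unanswered, the user reformulates with probability P_reform and otherwise gives up.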
User Preference Model
• Users’ preferences among documents for a given query q can be extracted from clickthrough log files.
• The clickthrough logs are generated by simulating users.
• Feature values are calculated from the preferences.
Feedback Strategies: Single Query Strategy
• Click >q Skip Above
  • For query q, if document di is clicked, di is preferred over every skipped document dj ranked above it (j < i).
• Click 1st >q No-Click 2nd
  • For query q, if document 1 is clicked and document 2 is not, document 1 is preferred over the 2nd document in the list.
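The two single-query strategies translate into short preference extractors. The sketch below assumes a ranking is a list of document ids and clicks are a set of ids; the function names are hypothetical. It reads "Skip Above" as excluding documents that were themselves clicked.

```python
def click_skip_above(ranking, clicked):
    """Click >q Skip Above: a clicked document beats skipped ones above it."""
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            prefs += [(doc, above) for above in ranking[:i]
                      if above not in clicked]
    return prefs

def click_first_no_click_second(ranking, clicked):
    """Click 1st >q No-Click 2nd: clicked top result beats unclicked second."""
    if len(ranking) >= 2 and ranking[0] in clicked and ranking[1] not in clicked:
        return [(ranking[0], ranking[1])]
    return []
```

Each returned pair (a, b) means "a is preferred over b for query q"; these pairs are the raw material for the pairwise learning constraints.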
Feedback Strategies: 2-Query Strategy 1
• This strategy uses 2 queries in a query chain, but document rankings only for the later query.
• Given queries q' and q in a query chain (q' earlier, q later):
  • Click >q' Skip Above
    • For query q', if document di is clicked in the results of q, di is preferred over all dj, j < i.
  • Click 1st >q' No-Click 2nd
    • For query q', if document 1 in the results of q is clicked, it is preferred over the 2nd document in the list for q.
Feedback Strategies: 2-Query Strategy 2
• This strategy uses 2 queries in a query chain, and the document rankings for both are used.
• Given queries q' and q in a query chain (q' earlier, q later):
  • Click >q' Skip Earlier Query
    • For query q', if document di is clicked in the results of q, di is preferred over the documents seen in the earlier query.
  • Click >q' Top Two Earlier Query
    • If no document was clicked for query q', a document di clicked in the results of q is preferred over the top two results of the earlier query.
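A sketch of the second two-query strategy is given below. It makes several assumptions that the slide leaves open: "seen" documents are taken to be the top `num_seen` results of the earlier query, clicked documents are excluded from the seen set, and the function name is hypothetical.

```python
def chain_preferences(earlier_ranking, earlier_clicked, num_seen, later_clicked):
    """Preferences for the earlier query q' derived from clicks on the later query q."""
    earlier_clicked = set(earlier_clicked)
    if earlier_clicked:
        # Click >q' Skip Earlier Query: seen-but-skipped earlier results lose.
        losers = [d for d in earlier_ranking[:num_seen]
                  if d not in earlier_clicked]
    else:
        # Click >q' Top Two Earlier Query: no click on q', so top two lose.
        losers = earlier_ranking[:2]
    return [(doc, loser) for doc in later_clicked
            for loser in losers if doc != loser]
```

All pairs are attributed to q', so a document found only after reformulating can be promoted for the original query.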
Features
• Document di is mapped to a feature vector with respect to query q.
• 2 types of features are defined:
  • Rank features
  • Term/document features
Rank Features
• Rank features represent the ranking given by the existing static retrieval function.
• The static function used is a simple TF-IDF weighted cosine similarity metric (rel0).
• 28 rank features are used, for ranks 1, 2, ..., 10, 15, 20, ..., 100.
• Each feature is set to 1 if the document is at or above the specified rank in the rel0 ranking.
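The 28 thresholds and the resulting binary vector can be written down directly. In this sketch the names `RANK_THRESHOLDS` and `rank_features` are hypothetical, and `rank` is assumed to be the 1-based position of the document in the rel0 ranking.

```python
# Ranks 1..10 plus 15, 20, ..., 100 give 10 + 18 = 28 thresholds.
RANK_THRESHOLDS = list(range(1, 11)) + list(range(15, 101, 5))

def rank_features(rank):
    """Binary vector: entry is 1 when the document is at or above that rank."""
    return [1 if rank <= threshold else 0 for threshold in RANK_THRESHOLDS]
```

A document at rank 1 switches on all 28 features, while a document at rank 12 switches on only the 18 features for ranks 15 through 100.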
Term Features
• Term features represent fine-grained relationships between query terms and documents.
• If document d is clicked for query q, then for each word w in q the feature for the pair (w, d) is set to 1.
• This forms a sparse feature vector, as only very few words are included in a query.
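Learning
• The retrieval function is defined as $rel(d_i, q) = \vec{w} \cdot \Phi(d_i, q)$, where $\Phi(d_i, q)$ is the feature vector of document $d_i$ for query $q$ and $\vec{w}$ is the weight vector.
• Intuitively, the weight vector assigns a weight to each identified feature.
• The task of learning a ranking function is thus reduced to the task of learning an optimal weight vector.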
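A linear retrieval function over sparse features is just a weighted sum of the active features. The sketch below assumes features are stored in a dictionary keyed by hypothetical tuples such as `("rank", 5)` and `("term", word, doc_id)`; this layout is chosen for illustration and is not taken from the paper.

```python
RANK_THRESHOLDS = list(range(1, 11)) + list(range(15, 101, 5))

def features(doc_id, rank, query):
    """Sparse feature map: active rank features plus one term feature per query word."""
    phi = {("rank", r): 1.0 for r in RANK_THRESHOLDS if rank <= r}
    for term in set(query.split()):
        phi[("term", term, doc_id)] = 1.0
    return phi

def rel(weights, phi):
    """Dot product of the weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in phi.items())
```

With weights `{("rank", 1): 0.5, ("rank", 2): 0.25, ("term", "svm", "d7"): 2.0}`, document d7 at rank 2 for the query "svm light" scores 0.25 + 2.0 = 2.25.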
How does $\vec{w}$ affect ranking?
• Points are ordered by their projections onto $\vec{w}$.
• For $\vec{w}_1$ the ordering will be 1, 2, 3, 4.
• For $\vec{w}_2$ the ordering will be 2, 3, 1, 4.
• A weight vector needs to be learnt that minimizes the number of discordant pairs.
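The effect of the weight vector can be reproduced with four made-up points. The coordinates and the two weight vectors below are hypothetical, chosen only so that the orderings match the ones stated on the slide; they are not the points from the original figure.

```python
from itertools import combinations

# Hypothetical 2-D feature vectors for documents 1..4.
POINTS = {1: (4, 1), 2: (3, 4), 3: (2, 3), 4: (1, 0)}

def order_by_projection(points, w):
    """Sort document ids by decreasing projection onto w."""
    def score(name):
        return sum(wi * xi for wi, xi in zip(w, points[name]))
    return sorted(points, key=score, reverse=True)

def discordant_pairs(order, target):
    """Count pairs ranked in the opposite order to the target ranking."""
    position = {name: i for i, name in enumerate(order)}
    return sum(1 for a, b in combinations(target, 2)
               if position[a] > position[b])
```

With `w1 = (1, 0)` the order is 1, 2, 3, 4, and with `w2 = (0, 1)` it is 2, 3, 1, 4. Against a target of 1, 2, 3, 4 the second ordering has two discordant pairs, (1, 2) and (1, 3).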
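Learning Problem
The learning problem can be formalized as follows:
• Find a weight vector $\vec{w}$ such that the maximum number of the following inequalities is fulfilled: for all $(d_i, d_j)$ such that $d_i >_q d_j$, $\vec{w} \cdot \Phi(d_i, q) > \vec{w} \cdot \Phi(d_j, q)$, where $\Phi(d, q)$ is the feature vector of document $d$ for query $q$.
• Without using slack variables, this is an NP-hard problem.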
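SVM Learning
• The equivalent optimization problem, with slack variables $\xi_{ij}$, is:
  • Minimize $\frac{1}{2} \vec{w} \cdot \vec{w} + C \sum_{i,j} \xi_{ij}$
  • Subject to $\vec{w} \cdot \Phi(d_i, q) \geq \vec{w} \cdot \Phi(d_j, q) + 1 - \xi_{ij}$ for every preference $d_i >_q d_j$,
  • rearranging which we get the constraint $\vec{w} \cdot (\Phi(d_i, q) - \Phi(d_j, q)) \geq 1 - \xi_{ij}$,
  • and $\xi_{ij} \geq 0$ for all $i, j$,
  • and $w^{k} \geq 0.01$ for each of the 28 rank features, $k \in [1, 28]$.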
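The rearranged constraint shows that each preference pair contributes one difference vector, and the slack of a pair is the amount by which its margin falls short of 1. The sketch below only evaluates the objective for a given weight vector; it does not solve the optimization, and the function name and dense-list representation are assumptions.

```python
def ranking_svm_objective(w, pairs, C=1.0):
    """Evaluate 0.5 * w.w + C * sum(slack) for preference pairs.

    pairs: list of (phi_preferred, phi_other) dense feature vectors.
    Returns (objective value, list of slacks).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    slacks = []
    for phi_i, phi_j in pairs:
        diff = [a - b for a, b in zip(phi_i, phi_j)]
        # Smallest slack satisfying w . diff >= 1 - slack and slack >= 0.
        slacks.append(max(0.0, 1.0 - dot(w, diff)))
    return 0.5 * dot(w, w) + C * sum(slacks), slacks
```

For `w = [1.0, 0.0]`, a pair with margin 2 needs no slack and a pair with margin 0.5 needs slack 0.5, giving an objective of 0.5 + 0.5 = 1.0 when C = 1.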
Re-ranking using the learnt model
• The SVM-Light package is used.
• The learnt model provides the coefficient values for all support vectors.
• User behavior is simulated again, this time using the learnt ranking function.
• How does re-ranking work?
  • First, a ranked list of documents is obtained using the original ranking function.
  • This list is then re-ordered using the weight of each feature obtained from the learnt model.
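The two re-ranking steps can be sketched as a stable sort over the original list. The sketch assumes the learnt weights have already been extracted into a dictionary; reading them out of an SVM-Light model file is not shown. `rerank`, `toy_features`, and the feature keys are hypothetical names.

```python
def rerank(original_ranking, weights, feature_fn):
    """Re-order documents by learnt score; ties keep the original order."""
    def score(item):
        rank, doc = item
        phi = feature_fn(doc, rank)
        return sum(weights.get(name, 0.0) * value for name, value in phi.items())

    ranked = sorted(enumerate(original_ranking, start=1),
                    key=score, reverse=True)
    return [doc for _, doc in ranked]

def toy_features(doc, rank):
    """Illustrative features: one per document plus a top-1 rank feature."""
    phi = {("doc", doc): 1.0}
    if rank <= 1:
        phi[("rank_le", 1)] = 1.0
    return phi
```

With weights `{("doc", "d3"): 1.0, ("rank_le", 1): 0.5}`, the list d1, d2, d3 becomes d3, d1, d2: the rank feature keeps d1 ahead of d2, while the learnt document feature lifts d3 to the top.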
Experiments
• Experiments study the behavior of the search engine while varying parameters such as:
  • Noise in users’ relevance judgements
  • Ambiguity of words in topics and queries
  • Threshold above which the user considers a document good
  • Users’ trust in the ranking
  • Users’ probability of reformulating the query
Noise – My experiment
• Implemented the extraction of preferences and their encoding as features.