Searching with Context
Reiner Kraft, Farzin Maghoul, Chi Chao Chang, Ravi Kumar
Yahoo!, Inc., Sunnyvale, CA 94089, USA
Agenda • Motivation • Contextual Search • Introduction • Case Study: Y!Q • Algorithms • Query Rewriting • Rank-Biasing • Iterative, Filtering Meta-search (IFM) • Evaluation and Results • Conclusion
Motivation • Is traditional keyword-based web search as good as it gets? • Not many qualitative differences between the search results of the major search engines • Introducing anchor text and link analysis to improve search relevance was the last major significant feature (1998) • Search can be vastly improved in the dimension of precision • The more we know about a user’s information need, the more precise our results can be • A lot of evidence (context) exists beyond the terms in the query box from which we can infer better knowledge of the information need • Studies of web query logs show that users already employ a manual form of contextual search, adding terms to refine and reissue queries when the results for the initial query turn out to be unsatisfactory • => How can we automatically use context to augment, refine, and improve a user’s search query and obtain more relevant results?
Contextual Search - General Problems • Gathering evidence (context) • Representing and inferring the user’s information need from that evidence • Using that representation to get more precise results
Contextual Search - Terminology • Context • In general: any additional information associated with a query • More narrowly: a piece of text (e.g., a few words, a sentence, a paragraph, an article) that has been authored by someone • Context Term Vector • Dense representation of a context in the vector space model (see the sketch below) • Obtained using keyword extraction algorithms (e.g., Wen-tau Yih et al., KEA, Y! Content Analysis) • Search Query Types • Simple: few keywords, no special or expensive operators • Complex: keywords/phrases plus special ranking operators; more expensive to evaluate • Contextual: query + context term vector • Search Engine Types • Standard: web search engines (e.g., Yahoo, Google, MSN, …) that support simple queries • Modified: a web search engine that has been modified to support complex search queries
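To make the context term vector concrete, here is a minimal sketch that extracts weighted terms from a context passage. Plain term-frequency weighting stands in for the extraction algorithms named above; the stopword list and function name are assumptions of this sketch, not Y!Q internals.

```python
from collections import Counter

# Illustrative stopword list; real extractors use much richer NLP features.
STOPWORDS = {"the", "a", "an", "of", "to", "on", "by", "and", "was", "as", "in"}

def context_term_vector(context: str, k: int = 5) -> list[tuple[str, float]]:
    """Return the top-k (term, weight) pairs for a context passage."""
    tokens = [t.strip(".,;:!?'\"").lower() for t in context.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    total = sum(counts.values()) or 1
    return [(term, n / total) for term, n in counts.most_common(k)]
```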
Case Study: Y!Q Contextual Search • Acquiring context: • Y!Q provides a simple API that allows publishers to associate visual information widgets (actuators) with parts of page content (http://yq.search.yahoo.com/publisher/embed.html) • Y!Q lets users manually specify or select context (e.g., within Y! Toolbar, Y! Messenger, or an included JavaScript library) • Contextual Search Application • Generates a digest (context term vector) of the associated content piece as additional terms of interest for augmenting queries (content analysis) • Knows how to perform contextual searches against different search back-end providers (query rewriting framework) • Knows how to rank results based on query + context (contextual ranking) • Seamless integration: results are displayed in an overlay or embedded within the page without interrupting the user’s workflow
Example Y!Q Actuator
Example Y!Q Overlay showing contextual search results
Example Y!Q: Searching in Context
Example CSRP (Contextual Search Result Page): Terms extracted from context
Implementing Contextual Search • Assumption: • We have a query plus a context term vector (a contextual search query; see the sketch below) • Design dimensions: • Number of queries to send to a search engine per contextual search query • Types of queries to send • Simple • Complex • Algorithms: • Query Rewriting (QR) • Rank-Biasing (RB) • Iterative, Filtering Meta-search (IFM)
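As a point of reference for the three algorithms that follow, here is one possible Python representation of the shared input, a contextual search query; the class and field names are illustrative, not the Y!Q data model.

```python
from dataclasses import dataclass

@dataclass
class ContextualQuery:
    query: str                              # the user's keyword query, e.g. "review"
    context_terms: list[tuple[str, float]]  # (term, weight), highest weight first
```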
Algorithm 1: Query Rewriting • Combine query + context term vector using AND/OR semantics • Input Parameters: • Query, context term vector • Number of terms to consider from the context term vector • Experimental Setup: • QR1 (takes the top term only) • QR2 (takes the top two terms only) • … up to QR5 • Example (see the sketch below): • QR3: Given query q and context term vector ⟨a, b, c, …⟩ => q AND a AND b AND c • Pros: • Simplicity; supported by all major search engines • Cons: • Possibly low recall for longer queries
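A minimal sketch of QR, assuming the context terms arrive ordered by descending weight:

```python
def query_rewrite(query: str, context_terms: list[tuple[str, float]],
                  k: int) -> str:
    """QR_k: AND the user query with the top-k context terms (QR1 ... QR5)."""
    top = [term for term, _ in context_terms[:k]]
    return " AND ".join([query] + top)

# query_rewrite("q", [("a", 0.5), ("b", 0.3), ("c", 0.2)], 3)
# -> "q AND a AND b AND c", matching the QR3 example above.
```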
Algorithm 2: Rank-Biasing • Requires a modified search engine with support for a RANK operator for rank-biasing • A complex query comprises: • A selection part • Optional ranking terms that only impact the score of selected documents • Input Parameters: • Query, context term vector • Number of selection terms to consider (conjunctive semantics) • Number of RANK operators • Weight multiplier for each RANK operator (used for scaling) • Experimental Setup: • RB2 (uses 1 selection term, 2 RANK operators, weight multiplier = 0.1) • RB6 (uses 2 selection terms, 6 RANK operators, weight multiplier = 0.01) • Example (see the sketch below): • RB2: Given q and context term vector ⟨a, b, c⟩ => q AND a RANK(b, 2.5) RANK(c, 1.2) • Pros: • Ranking terms do not limit recall • Cons: • Requires a modified search engine back-end; more expensive to evaluate
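A minimal sketch of RB query construction; the RANK syntax follows the slide's example:

```python
def rank_biasing(query: str, context_terms: list[tuple[str, float]],
                 n_select: int, n_rank: int, multiplier: float) -> str:
    """Build an RB complex query: a conjunctive selection part, then RANK
    operators over the next context terms. The slides do not spell out how
    per-operator weights are derived, so multiplying each term weight by
    the weight multiplier is an assumption of this sketch."""
    selection = " AND ".join([query] + [t for t, _ in context_terms[:n_select]])
    rank_ops = " ".join(f"RANK({t}, {w * multiplier:g})"
                        for t, w in context_terms[n_select:n_select + n_rank])
    return f"{selection} {rank_ops}".strip()

# rank_biasing("q", [("a", 0.5), ("b", 0.3), ("c", 0.2)], 1, 2, 0.1)
# -> "q AND a RANK(b, 0.03) RANK(c, 0.02)" (same shape as the RB2 example).
```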
Algorithm 3: IFM • IFM is based on the concept of meta-search (e.g., as used in the Buying Guide Finder [Kraft, Stata 2003]) • Sends multiple (simple) queries to possibly multiple search engines • Combines the results using rank aggregation methodologies
IFM Query Generation • Uses a “query templates” approach: • Query templates specify how sub-queries are constructed from the pool of candidate terms • They allow the problem domain to be explored in a systematic way • Implemented primarily as a sliding-window technique using query templates • Example (see the sketch below): Given query q and context term vector ⟨a, b, c, d⟩, a sliding-window query template of size 2 may construct the following queries: • q a b • q b c • q c d • Parameters: • Size of the sliding window • Experimental Setup: • IFM-SW1, IFM-SW2, IFM-SW3, IFM-SW4
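A minimal sketch of the sliding-window template:

```python
def sliding_window_queries(query: str, context_terms: list[str],
                           size: int) -> list[str]:
    """IFM-SW: one simple sub-query per window of `size` consecutive
    context terms, each prefixed with the original query."""
    return [" ".join([query] + context_terms[i:i + size])
            for i in range(len(context_terms) - size + 1)]

# sliding_window_queries("q", ["a", "b", "c", "d"], 2)
# -> ["q a b", "q b c", "q c d"], matching the example above.
```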
IFM uses Rank Aggregation for combining different result sets • Rank aggregation is a robust and principled approach to combining several ranked lists into a single ranked list • Given a universe U and k ranked lists τ₁, …, τₖ on the elements of the universe • Combine the k lists into a single list τ* such that Σᵢ d(τ*, τᵢ) is minimized • For d(·,·) we used various distance functions (e.g., Spearman footrule, Kendall tau) • Parameters: • Style of rank aggregation: • Rank averaging (an adaptation of the Borda voting method; see the sketch below) • MC4 (based on Markov chains, more computationally expensive) • Experimental Setup: • IFM-RA, IFM-MC4
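A minimal sketch of the rank-averaging style (IFM-RA):

```python
def rank_average(ranked_lists: list[list[str]]) -> list[str]:
    """Rank averaging (Borda-style): order the union of all results by
    their average position across lists. Penalizing a document missing
    from a list with position len(list) + 1 is a common convention assumed
    here; MC4 would replace this scoring with a Markov-chain computation."""
    universe = {doc for lst in ranked_lists for doc in lst}

    def avg_position(doc: str) -> float:
        positions = [lst.index(doc) + 1 if doc in lst else len(lst) + 1
                     for lst in ranked_lists]
        return sum(positions) / len(positions)

    return sorted(universe, key=avg_position)
```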
Experimental Setup and Methodology • Benchmark • 200 contexts sampled from Y!Q query logs • Tested 41 configurations • 15 QR (Yahoo, MSN, Google) • 18 RB (1 or 2 selection terms; 2, 4, or 6 RANK operators; 0.01, 0.1, or 0.5 weight multipliers) • 8 IFM (avg and MC4 on Yahoo, SW1 to SW4) • Per-item test • Relevancy to the context; perceived relevancy used • Relevancy Judgments: • Yes • Somewhat • No • Can’t Tell • 28 expert judges looked at the top 3 results, for a total of 24,556 judgments
Example • Context: • “Cowboys Cut Carter; Testaverde to Start OXNARD, Calif. Quincy Carter was cut by the Dallas Cowboys on Wednesday, leaving 40-year-old Vinny Testaverde as the starting quarterback. The team wouldn’t say why it released Carter.” • Judgment Examples: • A result directly relating to the “Dallas Cowboys” (football team) or Quincy Carter => Yes • A result repeating the same or similar information => Somewhat • A result about Jimmy Carter, the former U.S. president => No • If a result doesn’t provide sufficient information => Can’t tell
Metrics • Strong Precision at 1 (SP@1) and 3 (SP@3) • Number of relevant results divided by the number of retrieved results, capped at 1 or 3, and expressed as a ratio • A result is considered relevant if and only if it receives a ‘Y’ relevance judgment • Precision at 1 (P@1) and 3 (P@3) • Number of relevant results divided by the number of retrieved results, capped at 1 or 3, and expressed as a ratio • A result is considered relevant if and only if it receives a ‘Y’ or ‘S’ relevance judgment (see the sketch below)
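A minimal sketch of both metrics, assuming a rank-ordered list of judgments for one result set:

```python
def precision_at_k(judgments: list[str], k: int, strong: bool = False) -> float:
    """P@k / SP@k over rank-ordered per-result judgments 'Y', 'S', 'N', 'C'
    (can't tell). SP@k counts only 'Y' as relevant; P@k also counts 'S'.
    Per the definition above, the denominator is the number of retrieved
    results, capped at k."""
    relevant = {"Y"} if strong else {"Y", "S"}
    denom = min(len(judgments), k)
    if denom == 0:
        return 0.0
    return sum(1 for j in judgments[:k] if j in relevant) / denom
```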
Coverage Results • Highlights • Substantial drop in recall as the number of vector entries in QR increases (expected); comparable between MSN and Yahoo, roughly one order of magnitude lower on Google • For QR4 on MSN and Yahoo, low recall may potentially affect the user experience • The RB configurations tested have the same recall as QR2 • IFM works on a substantially larger set of candidate results
Relevance Results for QR • Highlights • Use the P@1, P@3, SP@1, SP@3 metrics • SP drops sharply for MSN and Yahoo beyond QR4 (recall issues) • Optimal operating point: QR3/QR4 for MSN and Yahoo, QR5 for Google • QR4 issues 7.3 terms on average, QR5 8.51 terms on average
Relevance Results for RB and IFM • Highlights • RB2/RB6 are the best configurations among the RBs; RB2 has the highest SP@1 • IFM-RA-SW3 is the winner (best P@1)
Discussion of Results • Simple QR can attain high relevancy • However, precision decreases as a function of low recall • The optimal setting depends on the web search engine • Human reformulations are unlikely to attain the same level of relevancy as QR (the best QR1 configuration issues 2.25 terms on average and attains a P@3 of 0.504) • RB can perform competitively • Particularly at SP@1 • Additional experiments showed that some good results bubble up from the middle tier of results (ranked between positions 100 and 1000) • Does not do well at SP@3 (a problem if the “right” results are not recalled by the selection part) • Requires substantial modifications to a web search engine • Contextual search is not solely a ranking problem, but also one of recall • IFM • Achieves the highest recall and overall relevancy • Can be competitive with and, on some measures, superior to QR • More costly to execute
Conclusion • Investigated three algorithms for implementing contextual search: • QR • RB • IFM • QR • Can be easily implemented on top of a commodity search engine • Performs surprisingly well • Likely to be superior to manual query reformulation • Has recall problems • RB and IFM break the recall limitations of QR • IFM is very effective • Outperforms both QR and RB in terms of recall and precision • The three algorithms offer a good design spectrum for contextual search implementers
Future Work • Further tuning of contextual search algorithms • Alternative presentations of context • Improve relevancy of context term vectors • Better word sense disambiguation • Investigate the usage of different context types (e.g., time, location, user profiles) • Improve contextual ranking and blending of different source types • How to leverage semantic web technologies • …
Example Search Scenarios • User wants to find the nearest movie theater • Context: location • Query: “movie theater” • User reads a press article about the new Mac OS X Tiger and wants to learn more about it • Context: news article • Query: review • User signs in to Yahoo! and wants to plan a trip to ‘Java’ • Context: search history, user preferences • Query: java • => The query alone is not sufficient! • => Context is critical for returning relevant results • => Users often manually append context in the form of extra query terms