Generating Exact- and Ranked Partially-Matched Answers to Questions in Advertisements

Generating Exact- and Ranked Partially-Matched Answers to Questions in Advertisements R. Qumsiyeh, M.S. Pera, and Y.-K. Ng

Introduction Current Search Tools • Search tools on existing ads websites: form-based interfaces and simple keyword search • Problems encountered: they cannot handle natural language questions with rich syntactic & semantic content, e.g., “Ion cars, cheaper than 6000 dollars, located in New York. If possible I want one with a sunroof” (a Facebook user question)

A Proposed Solution Our QA System • A closed-domain Question Answering (QA) system on ads (CQAds) • Problems, not addressed by existing ads searching tools, handled by CQAds: • Natural language questions • Ambiguous/incomplete ads questions • Shorthand notations/spelling mistakes in questions • Implicit/explicit Boolean operators in questions • Ranking answers that partially match the user’s info. needs in questions

CQAds Classifying Ads Domains • Processes a user’s question Q by identifying Q’s ad domain using a Naïve Bayes classifier • Example. Consider Q: “Metallic blue BMW 721 with alloy wheels and a manual gear” (a Facebook car ad question) • Formula annotations: d is an ad domain taken from the set of ads domains; the probability of d being the domain of Q is based on the Joint Beta-Binomial Sampling Model
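
The domain-classification step can be illustrated with a small Naïve Bayes sketch. This is not the CQAds classifier: it swaps in plain add-one (Laplace) smoothing instead of the Joint Beta-Binomial Sampling Model named on the slide, and the training questions, the domain labels, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training questions with hypothetical domain labels.
EXAMPLES = [
    ("blue bmw with alloy wheels", "cars"),
    ("honda accord manual gear", "cars"),
    ("two bedroom condo downtown", "housing"),
    ("apartment with balcony for rent", "housing"),
]


def train(examples):
    """Count words per domain and questions per domain."""
    word_counts = defaultdict(Counter)
    domain_counts = Counter()
    vocab = set()
    for text, domain in examples:
        words = text.lower().split()
        word_counts[domain].update(words)
        domain_counts[domain] += 1
        vocab.update(words)
    return word_counts, domain_counts, vocab


def classify(text, model):
    """Return the domain d that maximizes log P(d) + sum of log P(word | d)."""
    word_counts, domain_counts, vocab = model
    total = sum(domain_counts.values())
    best, best_score = None, -math.inf
    for d in domain_counts:
        score = math.log(domain_counts[d] / total)
        denom = sum(word_counts[d].values()) + len(vocab)
        for w in text.lower().split():
            # Add-one smoothing (an assumption, not the slide's model).
            score += math.log((word_counts[d][w] + 1) / denom)
        if score > best_score:
            best, best_score = d, score
    return best


model = train(EXAMPLES)
domain = classify("Metallic blue BMW 721 with alloy wheels and a manual gear", model)
```

With this toy training set, the slide's example question lands in the hypothetical `cars` domain because most of its known words occur only in car questions.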

CQAds Spelling Errors/Shorthand Notations • Eliminates spelling mistakes using a trie data structure • Example. “Honda accorr less than $2,000” → “Honda Accord less than $2,000” • Example. “Hondaacord less than $2,000” → “Honda Accord less than $2,000” • Handles shorthand notations using a simple script • Example. “Cheap 4 dr Lexus, not above 75,000 mi” → “Cheap 4 doors Lexus, not above 75,000 miles”
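
A minimal sketch of trie-based spelling correction plus shorthand expansion follows. The slides do not describe how CQAds searches its trie, so this sketch assumes a standard edit-distance walk over the trie; the vocabulary, the `SHORTHAND` table, and all function names are hypothetical. It does not split fused words such as “Hondaacord”.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set when a vocabulary word ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def closest_word(root, token, max_dist=1):
    """Return a vocabulary word within max_dist edits of token, or None."""
    token = token.lower()
    best = (max_dist + 1, None)
    first_row = list(range(len(token) + 1))

    def walk(node, ch, prev_row):
        nonlocal best
        # One Levenshtein row per trie character.
        row = [prev_row[0] + 1]
        for i in range(1, len(token) + 1):
            cost = 0 if token[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if node.word is not None and row[-1] < best[0]:
            best = (row[-1], node.word)
        if min(row) <= max_dist:  # prune branches that can no longer match
            for nxt, child in node.children.items():
                walk(child, nxt, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return best[1]


SHORTHAND = {"dr": "doors", "mi": "miles"}  # hypothetical shorthand table


def normalize(question, root, shorthand):
    out = []
    for raw in question.split():
        core = raw.rstrip(",.")
        tail = raw[len(core):]  # keep trailing punctuation
        if core.lower() in shorthand:
            core = shorthand[core.lower()]
        else:
            core = closest_word(root, core) or core
        out.append(core + tail)
    return " ".join(out)


root = build_trie(["Honda", "Accord", "Lexus"])  # hypothetical vocabulary
fixed = normalize("Honda accorr less than $2,000", root, SHORTHAND)
expanded = normalize("Cheap 4 dr Lexus, not above 75,000 mi", root, SHORTHAND)
```

Here `fixed` and `expanded` reproduce the corrected forms of the first and third slide examples.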

CQAds Identifying Users’ Info. Needs • Interprets a user’s information need specified in Q by • Tagging keywords in Q with their types (using a trie), e.g., “Red Ferrari under $10,000”: “Red” is Type II (descriptive property of the item showcased in an ad A), “Ferrari” is Type I (unique identifier of the item showcased in A), “under $10,000” is Type III (quantitative property of the item showcased in A) • Applying context switching to identify/merge selection criteria using proximity keywords, e.g., “Cheapest Avenger that has less than 25,000 miles”: “Cheapest” is a superlative (max/min values), “less than 25,000 miles” is a boundary (range values)

CQAds Handling Incomplete Questions • Processes any incomplete/ambiguous question Q based on the valid ranges of attributes in Q • Examples. • “H2 Hummer, yellow, 24 inch wheels, 4 wheel drive below 2000”: 2000 is within the pre-defined ranges of Year, Price, and Mileage • “Ford F-150, 4 door less than 7500”: 7500 is within the pre-defined ranges of Price and Mileage
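
The range check behind these examples can be sketched as follows. The range limits in `VALID_RANGES` are made-up values chosen only for illustration (the slides do not list the pre-defined ranges), and `candidate_attributes` is a hypothetical helper name.

```python
# Hypothetical pre-defined ranges for a cars domain.
VALID_RANGES = {
    "Year": (1980, 2012),
    "Price": (100, 200000),
    "Mileage": (0, 300000),
}


def candidate_attributes(value, ranges=VALID_RANGES):
    """List every attribute whose valid range contains the bare number."""
    return [name for name, (low, high) in ranges.items() if low <= value <= high]


hummer = candidate_attributes(2000)  # "... 4 wheel drive below 2000"
ford = candidate_attributes(7500)    # "... 4 door less than 7500"
```

Under these assumed ranges, 2000 could be a year, a price, or a mileage, while 7500 can only be a price or a mileage.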

CQAds Processing Non-Boolean Questions • The evaluation steps • Considers the primary-indexed field in a relation schema • Evaluates secondary-indexed fields • Analyzes boundaries on Type III attribute values • Evaluates superlatives • Example. “Cheapest red Honda less than 2,000 miles”: 1. Type I attribute “Honda” (a primary-indexed field) 2. Type II attribute “red” (applied to ads retrieved in Step 1) 3. Boundary on Type III “less than 2,000 miles” (applied to ads obtained in Step 2) 4. Superlative “Cheapest” (applied to ads identified in Step 3)
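
The four-step evaluation order can be sketched over an in-memory list of ads. This is only an illustration: the ads, the field names, and the `evaluate` function are hypothetical, and a real system would run these steps against indexed database relations rather than Python lists.

```python
import operator

# Hypothetical ads table.
ADS = [
    {"make": "Honda", "color": "red", "miles": 1500, "price": 9000},
    {"make": "Honda", "color": "red", "miles": 1800, "price": 7000},
    {"make": "Honda", "color": "blue", "miles": 1000, "price": 5000},
    {"make": "Honda", "color": "red", "miles": 30000, "price": 3000},
    {"make": "Toyota", "color": "red", "miles": 500, "price": 4000},
]


def evaluate(ads, type1, type2, boundaries, superlative=None):
    # Step 1: Type I attributes (primary-indexed field).
    step1 = [a for a in ads if all(a[k] == v for k, v in type1.items())]
    # Step 2: Type II attributes, applied to the ads from Step 1.
    step2 = [a for a in step1 if all(a[k] == v for k, v in type2.items())]
    # Step 3: boundaries on Type III attributes, applied to Step 2.
    step3 = [a for a in step2 if all(op(a[k], bound) for k, op, bound in boundaries)]
    # Step 4: superlative (min or max), applied to Step 3.
    if superlative and step3:
        field, pick = superlative
        best = pick(a[field] for a in step3)
        step3 = [a for a in step3 if a[field] == best]
    return step3


# "Cheapest red Honda less than 2,000 miles"
answers = evaluate(
    ADS,
    type1={"make": "Honda"},
    type2={"color": "red"},
    boundaries=[("miles", operator.lt, 2000)],
    superlative=("price", min),
)
```

Applying the superlative last matters: the globally cheapest red Honda in this toy table has 30,000 miles and is filtered out in Step 3 before “cheapest” is evaluated.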

CQAds Processing Implicit Boolean Questions • The evaluation process • Complements the negated quantifiers, e.g., “… not less than $1500” → “… more than $1500” • Combines quantifiers on the same attribute, e.g., “…more than $2000, less than $8000” → “… between $2000 and $8000” • Handles mutually exclusive attributes, e.g., “Toyota, Honda 2 door…” → “(Toyota OR Honda) 2 door”
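
The three rewriting rules can be sketched on already-parsed constraints. The tuple layouts and the function names `complement`, `combine`, and `group_exclusive` are assumptions made for this sketch; the slides do not show how CQAds represents parsed criteria.

```python
# Follows the slide's mapping: "not less than" becomes "more than".
COMPLEMENT = {"<": ">", ">": "<"}


def complement(constraint):
    """(attr, op, value, negated) -> (attr, op, value) with negation removed."""
    attr, op, value, negated = constraint
    return (attr, COMPLEMENT[op] if negated else op, value)


def combine(constraints):
    """Merge '>' and '<' bounds on the same attribute into (low, high) ranges."""
    merged = {}
    for attr, op, value in constraints:
        low, high = merged.get(attr, (None, None))
        merged[attr] = (value, high) if op == ">" else (low, value)
    return merged


def group_exclusive(values):
    """Group (attr, value) pairs; values of one attribute are OR-ed together."""
    grouped = {}
    for attr, value in values:
        grouped.setdefault(attr, []).append(value)
    return grouped


rule1 = complement(("price", "<", 1500, True))           # "not less than $1500"
rule2 = combine([("price", ">", 2000), ("price", "<", 8000)])
rule3 = group_exclusive([("make", "Toyota"), ("make", "Honda"), ("doors", "2")])
```

In `rule3`, the two `make` values share one list because an ad cannot be both a Toyota and a Honda, which is why they are read as an OR; distinct attributes are still AND-ed.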

CQAds Processing Explicit Boolean Questions • A question Q is processed as is, if Q consists of • Sequences of attribute values separated only by ANDs (ORs, respectively), e.g., “blue OR red OR green car” • Otherwise • Excludes all the Boolean operators from Q • Evaluates Q using inferred Boolean operators • Example. “Find Toyota and Honda cars with 2 doors” → “(Toyota OR Honda) with 2 doors”

CQAds Performing Partial Matching • Performs exact-match on “N – 1” selection criteria in Q • Calculates the (normalized) degree of similarity of the remaining condition in Q (based on its attribute type) against the ads in the DB • Example (figure). The question “$2000 white Honda Accord” is compared, attribute by attribute, with the ad “$1500 blue Toyota Camry”; similarity values shown: 0.90 (using query logs) and 0.65 (using word correlation factors)
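
A rough sketch of “N – 1” partial matching follows. The similarity function here is a stand-in: the numeric formula and the `CATEGORICAL_SIM` values are invented for illustration and are not the similarity formula, query-log scores, or word-correlation factors used by CQAds. The ads and all names are hypothetical.

```python
# Hypothetical similarity scores between categorical values.
CATEGORICAL_SIM = {("white", "silver"): 0.8, ("white", "black"): 0.2}


def sim(attr, wanted, actual):
    """Stand-in similarity in [0, 1]; not the CQAds formula."""
    if isinstance(wanted, (int, float)):
        return 1 - abs(wanted - actual) / max(wanted, actual)
    return CATEGORICAL_SIM.get((wanted, actual), 0.0)


def partial_matches(question, ads, sim):
    """Rank ads that match exactly N - 1 of the N criteria in the question."""
    results = []
    for ad in ads:
        mismatched = [k for k, v in question.items() if ad[k] != v]
        if len(mismatched) == 1:
            k = mismatched[0]
            results.append((sim(k, question[k], ad[k]), ad))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


QUESTION = {"make": "Honda", "color": "white", "price": 2000}
ADS = [
    {"id": "A", "make": "Honda", "color": "silver", "price": 2000},
    {"id": "B", "make": "Honda", "color": "white", "price": 1500},
    {"id": "C", "make": "Toyota", "color": "blue", "price": 1500},
    {"id": "D", "make": "Honda", "color": "black", "price": 2000},
]

ranked = partial_matches(QUESTION, ADS, sim)
ranked_ids = [ad["id"] for _, ad in ranked]
```

Ad C is dropped because it misses all three criteria; the remaining ads each miss exactly one and are ordered by how close that one attribute is.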

Experimental Results Performance Evaluation • Dataset (figure): benchmark dataset, ads sources, survey • Survey data: 650 ads questions on 8 different ads domains that cover users’ basic needs; users’ assessments gathered through Facebook to evaluate CQAds

Experimental Results Ads Domain Classification • Classification accuracy of CQAds on assigning 650 ads questions to their corresponding domains

Experimental Results Boolean Questions Interpretation • Accuracy of CQAds on interpreting the information needs in explicit/implicit Boolean questions • Based on 10 randomly-chosen sample questions (on diverse domains) and 182 responses offered by Facebook users • Sample questions: “Black Mustang with gps, exclude 2 wheel drive, or a yellow corvette without a gps” and “Show me Black Silver cars”

Experimental Results Exact-Matched Answers • Effectiveness of CQAds on retrieving answers exactly matching users’ specifications in 650 ads questions • Determined by Facebook users

Experimental Results Partial Matching & Ranking • Precision@K & Mean Reciprocal Rank (MRR) scores achieved by CQAds and other ranking approaches • Determined by 866 responses provided by Facebook users on the top-5 partially-matched answers to each of the 40 sample (Non-)Boolean questions
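
For reference, the two metrics named on this slide can be computed as below, using their standard definitions. The relevance judgments in the usage lines are made-up toy data, not results from the study.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked answers judged relevant (1) vs. not (0)."""
    return sum(relevance[:k]) / k


def mean_reciprocal_rank(relevance_lists):
    """Average of 1 / rank of the first relevant answer (0 if none) per question."""
    total = 0.0
    for relevance in relevance_lists:
        rank = next((i for i, r in enumerate(relevance, start=1) if r), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(relevance_lists)


# Toy judgments: one list per question, in ranked order.
p_at_5 = precision_at_k([1, 0, 1, 0, 0], 5)
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
```

In the toy data, two of the five top answers are relevant, and the first relevant answers appear at ranks 2, 1, and never, giving reciprocal ranks of 0.5, 1, and 0.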

Experimental Results Question Processing Time • Efficiency of CQAds and other ranking approaches • Based on the average time required to generate answers to each of the 650 ads questions gathered through Facebook

Conclusions CQAds • Objective: processes natural language questions on ads • Uniqueness: handles incomplete/ambiguous ad questions; corrects spelling mistakes & detects shorthand notations

Conclusions CQAds • Uniqueness: determines the evaluation order of selection criteria in questions using an elegant approach; retrieves exact/partial-matching answers using word-correlation factors, domain-specific matrices, and a novel similarity formula; handles (explicit/implicit) Boolean questions • Validation, Merit: CQAds is highly effective in answering (non-)Boolean ads natural language questions; CQAds is more powerful than the search tools of existing ads websites; CQAds outperforms existing ranking approaches

Questions

Related Work CQAds • Closed-domain QA systems rely on • Ontologies [Chung 04, Wang 10, Vargas-Vera 10] • Pre-defined taxonomies & natural language processing [Terol 07] • Semantically well-formed sentences available online [Wang 09] • Ranking approaches are often based on • Scoring functions [Manning 08] • User-feedback measures [Bilotti 07, Kiebling 02] • Existing Frequently Asked Questions (FAQ) [Burke 97]
