What do you mean? – Determining the Intent of Keyword Queries on Structured Data Wolf Siberski
Overview • Motivation • Approaches in keyword search on structured data • QUICK – Query Intent Construction for Keywords • User interaction • Algorithm • Evaluation • Conclusion
The Information Search Process What is my search objective? What exactly do I want to know? Which result satisfies my information need? How do I express my search request? Sutcliffe/Ennis: Towards a cognitive theory of information retrieval
IMDB Example – Keyword search Have they been working together? Brad Pitt Angelina Jolie In which movies did they both act? Brad Pitt Angelina Jolie IMDb Brad Pitt Angelina Jolie
IMDB Example – Database search Are they working together, too? Brad Pitt Angelina Jolie In which movies did they both act? SELECT M.Title, M.Year FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2 WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie' AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
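The query on this slide can be tried out with a minimal sketch like the following. The table layout (Movie, Actor, ActsIn) is inferred from the column names used in the query; the in-memory SQLite database and the two sample movies are assumptions for illustration only, not the IMDB data set.

```python
import sqlite3

# Hypothetical toy database; only the column names come from the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Actor (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);
INSERT INTO Movie VALUES (1, 'Mr. & Mrs. Smith', 2005), (2, 'Troy', 2004);
INSERT INTO Actor VALUES (1, 'Brad Pitt'), (2, 'Angelina Jolie');
INSERT INTO ActsIn VALUES (1, 1), (2, 1), (1, 2);
""")

# The structured query from the slide: movies in which both actors appear.
rows = conn.execute("""
SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
""").fetchall()

print(rows)
```

Five table references and six join conditions are needed for a two-keyword information need, which illustrates why structured queries are hard for casual users.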
Context • Trend: general information captured as structured data (DBpedia, LinkedData, etc.) • Limited support for complex information needs • Keywords: Limited expressivity, but user-friendly • Structured Queries: High expressivity, but difficult to master • New ways to access this data are required
IR on Structured Data (Incomplete) • Not a new idea (Universal Relation, 1984) • Relevance notion for structured data • Extract data subgraphs (tuple joins) matching the query • Rank results according to relevance score • BANKS, DISCOVER, SPARK, EASE, etc. • Can serve the 'head' of user distribution, but not the long tail • Low quality of relevance judgements [Coffman/Weaver, CIKM10] • Form builder • Enable visual construction of user-defined query forms • Requires exploration of database schema
QUICK – Keyword Search on Databases • User starts with keyword search • QUICK guides user through query construction process • Combines • Ease of use of keyword search • Expressivity of database queries G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009. http://dx.doi.org/10.1016/j.websem.2009.07.005
QUICK Search Process Brad Pitt Angelina Jolie QUICK User Keywords Is “Brad” part of a movie title? Is “Brad” part of an actor name? … Compute possible query intentions Compute selection options Selection options Select intended interpretation “Brad” is part of an actor name Refined Interpretation Find movies where both Brad Pitt and Angelina Jolie are actors M.Title M.Year 101 Biggest Ce… 2004 Mr. & Mrs. Smith 2005 Stars on Trial 2005 Select intended query Query Compute results Results Evaluate results
QUICK – Concepts • RDF Schema • Query Template • Query pattern on the schema • Contains only free variables • Semantic Query • Interpretation of a keyword query • Produced from query template by binding keywords
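The relationship between a query template and a semantic query might be modelled roughly as below. This is a sketch under stated assumptions: the class names `TriplePattern`, `QueryTemplate`, and `SemanticQuery`, the `bind` method, and the representation of a keyword as a quoted string are all hypothetical and are not taken from the QUICK implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    subject: str    # a variable such as "?m"
    predicate: str  # a schema property such as "actor"
    obj: str        # a variable, or a keyword once it has been bound


@dataclass(frozen=True)
class QueryTemplate:
    """A query pattern on the schema that contains only free variables."""
    patterns: tuple

    def bind(self, bindings):
        """Produce a semantic query by replacing object variables with keywords."""
        bound = tuple(
            TriplePattern(p.subject, p.predicate, bindings.get(p.obj, p.obj))
            for p in self.patterns
        )
        return SemanticQuery(self, bound)


@dataclass(frozen=True)
class SemanticQuery:
    """One interpretation of a keyword query: a template plus keyword bindings."""
    template: QueryTemplate
    patterns: tuple


# Template: "a movie ?m has an actor ?a whose name is ?n" (all variables free).
template = QueryTemplate((
    TriplePattern("?m", "actor", "?a"),
    TriplePattern("?a", "name", "?n"),
))

# Interpretation: the keyword "brad" is part of an actor name.
query = template.bind({"?n": '"brad"'})
```

Keeping the template as a field of the semantic query makes it cheap to group all interpretations that share the same schema pattern.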
Query Guide • Query Hierarchy • Semantic queries ordered by sub-query relationship • Query Guide • Graph including paths to all possible queries
Query Guide Construction – Offline Stage • Generate all Query Templates • Start with one-variable queries • Produce all possible combinations • Repeat until max. join path length reached • Build Inverted Index • Terms -> Attributes • Enables fast keyword-query mapping at runtime
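The two offline steps could look roughly like the sketch below. It makes several simplifying assumptions that are not from the slides: the toy schema and data are invented, a template is reduced to a set of schema edges (so one variable per class, which cannot express two actor variables in the same query), and terms are produced by lowercasing and splitting on whitespace.

```python
from collections import defaultdict

# Hypothetical toy schema: (domain class, property, range class).
SCHEMA_EDGES = [("Movie", "actor", "Actor"), ("Movie", "director", "Director")]

# Hypothetical attribute values: (class, attribute) -> literals.
DATA = {
    ("Actor", "name"): ["Brad Pitt", "Angelina Jolie"],
    ("Movie", "title"): ["Mr. & Mrs. Smith"],
}


def generate_templates(edges, max_edges):
    """Grow templates one join at a time, starting from one-variable templates."""
    classes = {c for d, _, r in edges for c in (d, r)}
    # One-variable templates: a single class and no join.
    frontier = {(frozenset([c]), frozenset()) for c in classes}
    templates = set(frontier)
    for _ in range(max_edges):
        next_frontier = set()
        for cls, used in frontier:
            for edge in edges:
                d, _, r = edge
                # Extend only with an unused edge that touches the template.
                if edge not in used and (d in cls or r in cls):
                    next_frontier.add((cls | {d, r}, used | {edge}))
        next_frontier -= templates
        templates |= next_frontier
        frontier = next_frontier
    return templates


def build_inverted_index(data):
    """Map each term to the set of attributes whose values contain it."""
    index = defaultdict(set)
    for attribute, literals in data.items():
        for literal in literals:
            for term in literal.lower().split():
                index[term].add(attribute)
    return index


templates = generate_templates(SCHEMA_EDGES, max_edges=2)
index = build_inverted_index(DATA)
print(len(templates), index["brad"])
```

At runtime, looking up each keyword in `index` yields the attributes it can bind to, which in turn selects the templates worth instantiating.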
Query Guide Construction – Online Stage • Identify possible queries (leaves of query guide) • Extract partial query graph from template graph • Problem: query space can be very large • Goal: find minimal query guide • Cost function: # of steps + # of inspected suggestions • Minimal guide: smallest maximum cost • Depth/width trade-off: too flat vs. too deep • Optimum: ln(n) split
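The depth/width trade-off can be made concrete with a toy cost model. The model below is an assumption for illustration and is not the analysis behind the slide: it supposes a balanced guide in which every step offers the same number of options, the user inspects all of them in the worst case, and each selection shrinks the candidate set by that factor. The function name `worst_case_cost` is hypothetical.

```python
import math


def worst_case_cost(n_queries, branching):
    """Steps plus inspected suggestions for a balanced guide (toy model)."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    steps = 0
    remaining = n_queries
    while remaining > 1:
        # Each selection narrows the candidates by the branching factor.
        remaining = math.ceil(remaining / branching)
        steps += 1
    # Cost function from the slide: # of steps + # of inspected suggestions.
    return steps + steps * branching


flat = worst_case_cost(1000, 1000)  # one step, but 1000 suggestions to scan
deep = worst_case_cost(1000, 2)     # few suggestions per step, but many steps
middle = worst_case_cost(1000, 4)   # an intermediate split
print(flat, deep, middle)
```

In this model both extremes cost more than an intermediate split, which is the qualitative point of the slide; the exact optimum depends on the cost model used.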
Greedy Query Guide Construction • Finding Minimal Guide: NP-Hard • Use approach similar to set cover approximation • Determine nodes (=refinement options) top-down • Greedily select node leading to the lowest cost • Cost estimation: minimally incurred cost • Repeat until all nodes are covered
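For reference, the classic greedy set cover approximation that the slide compares the approach to can be sketched as follows. This is the textbook heuristic, not QUICK's cost-based selection: it picks the option covering the most uncovered queries rather than the one with the lowest estimated cost. The option labels and query identifiers are invented for illustration.

```python
def greedy_cover(queries, options):
    """Pick refinement options until every candidate query is covered.

    `options` maps an option label to the set of queries it subsumes.
    """
    uncovered = set(queries)
    chosen = []
    while uncovered:
        # Greedy step: the option covering the most still-uncovered queries.
        # Sorting the labels makes tie-breaking deterministic.
        best = max(sorted(options), key=lambda name: len(options[name] & uncovered))
        if not options[best] & uncovered:
            raise ValueError("some queries are not covered by any option")
        chosen.append(best)
        uncovered -= options[best]
    return chosen


# Hypothetical candidate queries and the options that subsume them.
queries = {"q1", "q2", "q3", "q4", "q5"}
options = {
    "brad in actor name": {"q1", "q2", "q3"},
    "brad in movie title": {"q4", "q5"},
    "pitt in actor name": {"q1", "q2", "q4"},
}

chosen = greedy_cover(queries, options)
print(chosen)
```

Applying the same selection recursively to the queries under each chosen option would give the top-down construction described on the slide.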
Evaluation – Experiment Settings • IMDB database • Semantic Web representation • Queries from AOL query log • Selection criteria • Movie-related • 2-5 keywords • Refers to at least 2 entities • Manual assessment of query intention • Search process • Manual input of keywords • Selection of correct option according to query intention
Evaluation – Guide Quality • Intended construction option usually among top 3 • Usually 3-5 clicks needed to construct query • Effective also for large query spaces
Conclusion • Query construction with QUICK • Highly effective construction process • All intentions can be constructed • No query language or schema knowledge required • Further directions • Combine with relevance heuristics (IQP) • More flexible user interaction • Use facets for keyword bindings • Better multi term support • Optimized query guide generation • Exploit entity notion (QUnits) • Progressive query guide creation • Connect to QbE/Query Form Creation
Evaluation – Performance • Initialization takes too much time for long queries • RDF store as bottleneck (creation of query hierarchy) • After initialization, response time is ok
Optimizations • Identification of semantic queries • Index template subsets by attribute to enable fast filtering of queries without results • Enable fast disjunction of template subsets (e.g., 'and' on bitsets) • QCG generation • Parallel subquery computation • Caching of frequent subqueries
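The bitset idea might be sketched as below, using Python integers as bitsets. The helper names, the attribute strings, and the three sample templates are assumptions for illustration; the slides do not specify how the index is laid out.

```python
def build_attribute_bitsets(templates):
    """Map each attribute to an int whose bit i is set if template i uses it."""
    bitsets = {}
    for i, attributes in enumerate(templates):
        for attribute in attributes:
            bitsets[attribute] = bitsets.get(attribute, 0) | (1 << i)
    return bitsets


def templates_using_all(bitsets, attributes, n_templates):
    """Return indices of templates that use every given attribute."""
    result = (1 << n_templates) - 1  # start with all templates selected
    for attribute in attributes:
        # Bitwise 'and' keeps only templates that also use this attribute.
        result &= bitsets.get(attribute, 0)
    return [i for i in range(n_templates) if (result >> i) & 1]


# Hypothetical templates, each described by the attributes it touches.
templates = [
    {"Actor.name"},
    {"Actor.name", "Movie.title"},
    {"Movie.title", "Director.name"},
]
bitsets = build_attribute_bitsets(templates)
matches = templates_using_all(bitsets, ["Actor.name", "Movie.title"], len(templates))
print(matches)
```

Templates that cannot match all keyword bindings drop out after a handful of integer operations, before any query is sent to the RDF store.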
Misc Ideas • Use Google's KDD annotated Named Entity Recognition test set (Piggyback, http://sites.google.com/site/massiciara/)
Cross Connections • Thomas Gottron: Traditional features (e.g. TF) not useful for very short text • Hinrich Schütze: entity-related queries often ambiguous • Michael Granitzer: cycle of refinement/exploration • Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster