
Wolf Siberski

What do you mean? – Determining the Intent of Keyword Queries on Structured Data. Wolf Siberski. Overview. Motivation Approaches in keyword search on structured data QUICK – Query Intent Construction for Keywords User interaction Algorithm Evaluation Conclusion.


Presentation Transcript


  1. What do you mean? – Determining the Intent of Keyword Queries on Structured Data Wolf Siberski

  2. Overview • Motivation • Approaches in keyword search on structured data • QUICK – Query Intent Construction for Keywords • User interaction • Algorithm • Evaluation • Conclusion

  3. The Information Search Process • What is my search objective? • What exactly do I want to know? • Which result satisfies my information need? • How do I express my search request? Sutcliffe/Ennis: Towards a cognitive theory of information retrieval

  4. IMDB Example – Keyword search • Have they been working together? Keywords: Brad Pitt Angelina Jolie • In which movies did they both act? Keywords: Brad Pitt Angelina Jolie • IMDb: Brad Pitt Angelina Jolie

  5. IMDB Example – Database search Are they working together, too? Brad Pitt Angelina Jolie In which movies did they both act?
     SELECT M.Title, M.Year
     FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
     WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
       AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
       AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
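
The query on this slide can be tried out on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout is inferred from the names in the query, and the inserted rows are illustrative assumptions, not the IMDB data used in the talk.

```python
import sqlite3

# Assumed toy schema: table and column names follow the query on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Actor (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);
INSERT INTO Movie VALUES (1, 'Mr. & Mrs. Smith', 2005), (2, 'Fight Club', 1999);
INSERT INTO Actor VALUES (1, 'Brad Pitt'), (2, 'Angelina Jolie');
INSERT INTO ActsIn VALUES (1, 1), (2, 1), (1, 2);
""")

# The structured query from the slide, unchanged apart from line breaks.
QUERY = """
SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
"""

def movies_with_both(connection):
    """Return (title, year) rows for movies in which both actors appear."""
    return connection.execute(QUERY).fetchall()

rows = movies_with_both(conn)
print(rows)
```

The two aliases of Actor and ActsIn are what make the query hard to write by hand: each keyword pair needs its own join path to the shared Movie row.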

  6. Context • Trend: general information captured as structured data (DBpedia, LinkedData, etc.) • Limited support for complex information needs • Keywords: Limited expressivity, but user-friendly • Structured Queries: High expressivity, but difficult to master → New ways to access this data required

  7. IR on Structured Data (Incomplete) • Not a new idea (Universal Relation, 1984) • Relevance notion for structured data • Extract data subgraphs (tuple joins) matching the query • Rank results according to relevance score • BANKS, DISCOVER, SPARK, EASE, etc. • Can serve the 'head' of user distribution, but not the long tail • Low quality of relevance judgements [Coffman/Weaver, CIKM10] • Form builder • Enable visual construction of user-defined query forms • Requires exploration of database schema

  8. QUICK – Keyword Search on Databases • User starts with keyword search • QUICK guides user through query construction process • Combines • Ease-of-use of keyword search • Expressivity of database queries G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009. http://dx.doi.org/10.1016/j.websem.2009.07.005

  9. QUICK Search Process • User: enter keywords (Brad Pitt Angelina Jolie) • QUICK: compute possible query intentions • QUICK: compute selection options (Is “Brad” part of a movie title? Is “Brad” part of an actor name? …) • User: select intended interpretation (“Brad” is part of an actor name) • QUICK: refined interpretation (Find movies where both Brad Pitt and Angelina Jolie are actors) • User: select intended query • QUICK: compute results (M.Title, M.Year: 101 BiggestCe…, 2004; Mr. & Mrs. Smith, 2005; Stars on Trial, 2005) • User: evaluate results

  10. QUICK – Concepts • RDF Schema • Query Template • Query pattern on the schema • Contains only free variables • Semantic Query • Interpretation of a keyword query • Produced from query template by binding keywords
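
To make the two concepts on this slide concrete, here is a minimal sketch of a query template and of a semantic query produced by binding a keyword. The class names, the triple notation, and the `bind` helper are assumptions made for illustration; they are not taken from the QUICK implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryTemplate:
    # Triple patterns over the schema; terms starting with "?" are free variables.
    patterns: tuple

@dataclass(frozen=True)
class SemanticQuery:
    template: QueryTemplate
    bindings: tuple  # sorted (variable, keyword) pairs

def bind(template, bindings):
    """Produce a semantic query by attaching keywords to template variables."""
    variables = {t for p in template.patterns for t in p if t.startswith("?")}
    unknown = set(bindings) - variables
    if unknown:
        raise ValueError(f"not in template: {sorted(unknown)}")
    return SemanticQuery(template, tuple(sorted(bindings.items())))

# Hypothetical template: "actors with some name who act in some movie".
template = QueryTemplate((
    ("?m", "type", "Movie"),
    ("?a", "type", "Actor"),
    ("?a", "actsIn", "?m"),
    ("?a", "name", "?n"),
))

# One interpretation of the keyword "Brad": it is part of an actor name.
query = bind(template, {"?n": "Brad"})
print(query.bindings)
```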

  11. Query Guide • Query Hierarchy • Semantic queries ordered by sub-query relationship • Query Guide • Graph including paths to all possible queries
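
The ordering by sub-query relationship can be sketched as follows, under the simplifying assumption that a query is a set of atoms (patterns and keyword bindings) and that one query is a sub-query of another when its atoms are a proper subset. The function names and the example atoms are hypothetical.

```python
def is_subquery(q1, q2):
    """q1 is a sub-query of q2 if its atoms are a proper subset of q2's atoms."""
    return q1 < q2

def hierarchy_edges(queries):
    """Return (parent, child) pairs where parent is a direct sub-query of child."""
    edges = []
    for parent in queries:
        for child in queries:
            if is_subquery(parent, child) and not any(
                is_subquery(parent, mid) and is_subquery(mid, child)
                for mid in queries
            ):
                edges.append((parent, child))
    return edges

# Three increasingly specific interpretations (illustrative atoms only).
q_a = frozenset({"actor name contains 'Brad'"})
q_b = q_a | {"actor acts in movie"}
q_c = q_b | {"second actor name contains 'Jolie'"}

edges = hierarchy_edges([q_a, q_b, q_c])
print(len(edges))
```

Only direct relationships are kept, so the edge from `q_a` to `q_c` is omitted because `q_b` lies between them.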

  12. QUICK Example: Construction Options

  13. QUICK Example: Query List

  14. QUICK Example: Results

  15. Query Guide Construction – Offline Stage • Generate all Query Templates • Start with one-variable queries • Produce all possible combinations • Repeat until max. join path length reached • Build Inverted Index • Terms -> Attributes • Enables fast keyword-query mapping at runtime
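
The inverted index from terms to attributes might look like the sketch below. The attribute names, the whitespace tokenization, and the sample values are assumptions for illustration; the slide does not specify how QUICK tokenizes or stores the index.

```python
from collections import defaultdict

def build_inverted_index(rows):
    """rows: iterable of (attribute, text value) pairs. Maps term -> attributes."""
    index = defaultdict(set)
    for attribute, value in rows:
        for term in value.lower().split():
            index[term].add(attribute)
    return index

def candidate_attributes(index, keywords):
    """For each keyword, list the attributes in which it occurs."""
    return {kw: sorted(index.get(kw.lower(), ())) for kw in keywords}

# Toy data; the movie title "Brad" is made up to create an ambiguity.
toy_rows = [
    ("Actor.Name", "Brad Pitt"),
    ("Actor.Name", "Angelina Jolie"),
    ("Movie.Title", "Brad"),
]
index = build_inverted_index(toy_rows)
candidates = candidate_attributes(index, ["Brad", "Jolie"])
print(candidates)
```

At runtime, each keyword's attribute list gives the possible bindings, which is where questions such as whether “Brad” belongs to a movie title or an actor name come from.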

  16. Query Guide Construction – Online Stage • Identify possible queries (leaves of query guide) • Extract partial query graph from template graph • Problem: query space can be very large → Find minimal query guide • Cost function: # of steps + # of inspected suggestions • Minimal guide: smallest maximum cost • Depth/width tradeoff: a guide can be too flat or too deep; Optimum: ln(n) split
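
The depth/width tradeoff can be illustrated with a small calculation. The sketch below assumes that, in the worst case, the user inspects every suggestion shown at a level before taking one step; it models a guide as nested lists and is not the cost model of the QUICK implementation.

```python
def max_cost(node):
    """Worst-case cost: one step per level plus every suggestion shown there."""
    if not isinstance(node, list):  # a leaf is a complete query
        return 0
    return 1 + len(node) + max(max_cost(child) for child in node)

def balanced_guide(leaves, branching):
    """Recursively split the leaves into at most `branching` groups."""
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) <= branching:
        return list(leaves)
    size = -(-len(leaves) // branching)  # ceiling division
    return [balanced_guide(leaves[i:i + size], branching)
            for i in range(0, len(leaves), size)]

queries = [f"q{i}" for i in range(16)]
costs = {b: max_cost(balanced_guide(queries, b)) for b in (16, 4, 2)}
print(costs)
```

Under this assumption, a flat guide over 16 queries (branching 16) and a deep binary guide both cost more than an intermediate branching factor of 4.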

  17. Greedy Query Guide Construction • Finding Minimal Guide: NP-hard • Use approach similar to set cover approximation • Determine nodes (=refinement options) top-down • Greedily select node leading to the lowest cost • Cost estimation: minimally incurred cost • Repeat until all nodes are covered
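
For reference, the textbook greedy set cover approximation that the slide alludes to can be sketched as follows. This is the generic algorithm, which picks the option covering the most uncovered queries, and not QUICK's cost estimation; the option labels and query names are hypothetical.

```python
def greedy_cover(universe, options):
    """options: dict name -> set of covered elements. Returns chosen names."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the option that covers the most still-uncovered elements.
        best = max(options, key=lambda name: len(options[name] & uncovered))
        gained = options[best] & uncovered
        if not gained:
            raise ValueError("options cannot cover all queries")
        chosen.append(best)
        uncovered -= gained
    return chosen

all_queries = {"q1", "q2", "q3", "q4", "q5"}
refinements = {
    "Brad is part of an actor name": {"q1", "q2", "q3"},
    "Brad is part of a movie title": {"q4", "q5"},
    "Pitt is part of an actor name": {"q1", "q2"},
}
chosen = greedy_cover(all_queries, refinements)
print(chosen)
```

QUICK replaces the "most uncovered elements" criterion with an estimate of the cost incurred by choosing a refinement option, as the slide states.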

  18. Evaluation – Experiment Settings • IMDB database • Semantic Web representation • Queries from AOL query log • Selection criteria • Movie-related • 2-5 keywords • Refers to at least 2 entities • Manual assessment of query intention • Search process • Manual input of keywords • Selection of correct option according to query intention

  19. Evaluation – Guide Quality • Intended construction option usually among top 3 • Usually 3-5 clicks needed to construct query • Effective also for large query spaces

  20. Conclusion • Query construction with QUICK • Highly effective construction process • All intentions can be constructed • No query language or schema knowledge required • Further directions • Combine with relevance heuristics (IQP) • More flexible user interaction • Use facets for keyword bindings • Better multi term support • Optimized query guide generation • Exploit entity notion (QUnits) • Progressive query guide creation • Connect to QbE/Query Form Creation

  21. Evaluation – Performance • Initialization takes too much time for long queries • RDF store as bottleneck (creation of query hierarchy) • After initialization, response time is ok

  22. Optimizations • Identification of semantic queries • Index template subsets by attribute to enable fast filtering of queries without results • Enable fast disjunction of template subsets (e.g., 'and' on bitsets) • QCG generation • Parallel subquery computation • Caching of frequent subqueries
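
The bitset idea can be sketched with Python integers used as bitsets. The template ids and the attribute-to-template assignment below are invented for illustration; the slide does not describe the actual index layout.

```python
def to_bitset(template_ids):
    """Encode a collection of template ids as an integer bitset."""
    bits = 0
    for i in template_ids:
        bits |= 1 << i
    return bits

def from_bitset(bits):
    """Decode an integer bitset back into a sorted list of template ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Assumed index: for each attribute, the templates in which it occurs.
templates_by_attribute = {
    "Actor.Name": to_bitset([0, 1, 3]),
    "Movie.Title": to_bitset([1, 2, 3]),
}

# A single bitwise AND yields the templates that contain both attributes.
both = templates_by_attribute["Actor.Name"] & templates_by_attribute["Movie.Title"]
shared = from_bitset(both)
print(shared)
```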

  23. Misc Ideas • Use Google's KDD annotated Named Entity Recognition test set (Piggyback, http://sites.google.com/site/massiciara/)

  24. Cross Connections • Thomas Gottron: Traditional features (e.g. TF) not useful for very short text • Hinrich Schütze: entity-related queries often ambiguous • Michael Granitzer: cycle of refinement/exploration • Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster
