Effective XML Keyword Search with Relevance Oriented Ranking

Effective XML Keyword Search with Relevance Oriented Ranking
Presentation by Volker Rehberg
Paper by Zhifeng Bao, Tok Wang Ling, Bo Chen, Jiaheng Lu

Agenda
I) Motivation and Background
II) Inferring Keyword Search Intention
III) Relevance Oriented Ranking
IV) Algorithms
V) Experimental Evaluation
VI) Conclusion

Motivation and Background
What is "Effective XML Keyword Search with Relevance Oriented Ranking" all about?
• Keyword search
Issue 1: identify the search for node
Issue 2: identify the search via node
Issue 3: rank each query result

Motivation and Background
Ambiguities in interpreting the search for node and the search via node:
Ambiguity 1: A keyword can appear both as an XML tag name and as a text value of some other nodes.

Motivation and Background
Ambiguities in interpreting the search for node and the search via node:
Ambiguity 2: A keyword can appear as the text value of different types of XML nodes and carry different meanings.

Motivation and Background
Keyword query: Customer interest art
SLCA returns 5 results without any ranking.
Only the customer with ID C4 is desired and should be top ranked.

Motivation and Background
Problems of SLCA:
• does not consider the semantics of the query and the XML data
• keyword ambiguity problem
• no relevance oriented ranking
→ answers irrelevant to the user's search intention
• answers not meaningful and informative enough

Motivation and Background
TF*IDF (Term Frequency * Inverse Document Frequency)
• Rule 1: Inverse Document Frequency
• Rule 2: Term Frequency
• Rule 3: Normalization

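Motivation and Background
Classic TF*IDF similarity between a query q and a flat document d, computed over each keyword k. Quantities used:
• number of documents
• number of occurrences of k in document d
• number of documents containing k
• weights of query q and document d, which normalize the document/term frequency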
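A minimal sketch of how the three rules combine for flat documents. The exact weighting is an assumption here: the query weight of k is taken as ln(N / (f_k + 1)) and the document weight as 1 + ln(f_d,k), with cosine-style normalization by the overall weights of q and d. The function name, the token-list representation, and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_similarity(query, doc, corpus):
    """Score one flat document (a list of tokens) against a keyword query."""
    n_docs = len(corpus)          # number of documents
    tf = Counter(doc)             # occurrences of each keyword in this document

    def w_query(k):
        # Rule 1 (IDF): keywords contained in fewer documents weigh more.
        f_k = sum(1 for d in corpus if k in d)
        return math.log(n_docs / (f_k + 1))

    def w_doc(k):
        # Rule 2 (TF): more occurrences in the document weigh more (damped).
        return 1 + math.log(tf[k])

    # Rule 3 (normalization): overall weights of the query and the document.
    w_q = math.sqrt(sum(w_query(k) ** 2 for k in set(query)))
    w_d = math.sqrt(sum(w_doc(k) ** 2 for k in tf))
    if w_q == 0 or w_d == 0:
        return 0.0

    shared = set(query) & set(tf)
    return sum(w_query(k) * w_doc(k) for k in shared) / (w_q * w_d)

# Hypothetical toy corpus of four flat documents.
corpus = [
    ["art", "art", "street"],
    ["rock", "music"],
    ["street", "map"],
    ["music", "map", "city"],
]
score_with_art = tfidf_similarity(["art"], corpus[0], corpus)
score_without_art = tfidf_similarity(["art"], corpus[1], corpus)
```

With this weighting, a very small corpus can give a zero or negative IDF for a keyword that occurs in most documents, which is why the sketch guards against a zero query weight.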

Inferring Keyword Search Intention
Talking about "art":
• Intuition: element of the "interest" node, because many people are interested in art
• → statistics of the underlying database

Inferring Keyword Search Intention
Node type T is a search for node if:
1: T is intuitively related to every query keyword in q.
2: T is informative enough to contain enough relevant information.
3: T does not contain too much irrelevant information.
Quantities used:
• number of T-typed nodes that contain k as either values or tag names in their subtrees
• k: keyword in query q
• reduction factor (range 0-1), normally chosen to be 0.8

Inferring Keyword Search Intention
Confidence of a node type T to be the desired search for node:
• number of T-typed nodes that contain k as either values or tag names in their subtrees
• k: keyword in query q
• reduction factor (range 0-1), normally chosen to be 0.8
Confidence of a node type T to be the desired search via node:
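A hedged sketch of how the two confidences could be computed from the quantities listed above. The concrete form is an assumption: the search for confidence is taken as a log-damped product of the per-keyword counts, scaled by the reduction factor raised to the depth of T, and the search via confidence as a log-damped sum. The function names and the `freq` mapping (keyword to number of T-typed nodes containing it) are hypothetical.

```python
import math

def confidence_for(freq, query, depth, r=0.8):
    """Confidence that node type T is the search for node (assumed form).

    freq  : keyword -> number of T-typed nodes containing it in their subtrees
    depth : depth of T in the document tree
    r     : reduction factor, 0.8 by default as on the slide
    """
    product = 1
    for k in query:
        # A product, so T scores 0 unless it relates to every query keyword.
        product *= freq.get(k, 0)
    # r ** depth penalizes deep node types, which tend to be less informative.
    return math.log(1 + product) * (r ** depth)

def confidence_via(freq, query):
    """Confidence that node type T is a search via node (assumed form)."""
    # A sum, so T only needs to relate to some of the query keywords.
    return math.log(1 + sum(freq.get(k, 0) for k in query))

# Hypothetical statistics for the query "customer interest art".
query = ["customer", "interest", "art"]
customer_stats = {"customer": 5, "interest": 5, "art": 3}
interest_stats = {"interest": 5, "art": 3}

c_for_customer = confidence_for(customer_stats, query, depth=2)
c_for_interest = confidence_for(interest_stats, query, depth=3)
c_via_interest = confidence_via(interest_stats, query)
```

In this sketch the interest type gets a search for confidence of 0 because it never contains the keyword "customer", while it still gets a positive search via confidence.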

Inferring Keyword Search Intention
Keyword query: Customer name rock interest art
• "art" should be in interest and "rock" should be searched for in name
• → the order of keywords in the query is important

Inferring Keyword Search Intention
Value Typed Distance (Dist): Dist(q, v, kt, k) = Max(Distq(q, v, kt, k), Dists(q, v, kt, k))
In-Query Distance (IQD, Distq): position distance between kt and k in q, if kt appears before k in the query
Structural Distance (Dists): depth distance between v and the nearest kt-typed ancestor node of v
v: node
k: keyword that matches in v
kt: keyword that matches the type of an ancestor node of v
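The distance definitions above can be sketched as follows. Several details are assumptions: a distance is treated as infinite when kt does not appear before k or when v has no kt-typed ancestor, keywords are compared in lower case, and a value node is represented only by the list of tag names from the root down to the element holding it. All function names are hypothetical.

```python
import math

def in_query_distance(query, kt, k):
    """Position distance between kt and k in the query, if kt comes before k."""
    if kt not in query or k not in query:
        return math.inf
    pos_kt, pos_k = query.index(kt), query.index(k)
    return pos_k - pos_kt if pos_kt < pos_k else math.inf

def structural_distance(prefix_path, kt):
    """Depth distance between value node v and its nearest kt-typed ancestor.

    prefix_path lists the tag names from the root down to the element
    that directly holds the value v.
    """
    for steps_up, tag in enumerate(reversed(prefix_path), start=1):
        if tag == kt:
            return steps_up
    return math.inf

def value_typed_distance(query, prefix_path, kt, k):
    """Dist = max(in-query distance, structural distance)."""
    return max(in_query_distance(query, kt, k),
               structural_distance(prefix_path, kt))

query = ["customer", "name", "rock", "interest", "art"]
name_path = ["customers", "customer", "name"]          # holds the value "rock"
interest_path = ["customers", "customer", "interest"]  # holds the value "art"

d_rock_name = value_typed_distance(query, name_path, "name", "rock")
d_art_customer = value_typed_distance(query, interest_path, "customer", "art")
d_rock_interest = value_typed_distance(query, name_path, "interest", "rock")
```

A small distance means the keyword k is likely meant as a value of the kt-typed node, which matches the intuition that "rock" belongs to name and "art" to interest.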

Inferring Keyword Search Intention
Keyword query: Customer name rock interest art

Relevance Oriented Ranking
Ranking Principles
Searching for a customer via the street node with keyword query: Art Street
Principle 1: only search via nodes affect relevance

Relevance Oriented Ranking
Ranking Principles
Searching for customers interested in art using query: "art"
Principle 1: only search via nodes affect relevance
Principle 2: the search via node should contain the keyword

Relevance Oriented Ranking
Ranking Principles
Keyword query: Customer name rock interest art
Principle 1: only search via nodes affect relevance
Principle 2: the search via node should contain the keyword
Principle 3: the order of keywords in the query is important

Relevance Oriented Ranking
Capture XML's hierarchical structure to compute the XML TF*IDF similarity value between query q and node a:
(a) a is a value node (base case): similarity between the leaf node and the query
(b) a is an internal node (recursive case): recursive similarity between the internal node n and the query

Relevance Oriented Ranking
Capture XML's hierarchical structure to compute the XML TF*IDF similarity value between query q and node a:
(a) a is a value node (base case): similar to classic TF*IDF between a query and a flat document, computed over their keywords
(b) a is an internal node (recursive case)

Relevance Oriented Ranking
Capture XML's hierarchical structure to compute the XML TF*IDF similarity value between query q and node a:
(a) a is a value node (base case)
(b) a is an internal node (recursive case), built from:
• c: child node of a
• confidence of Tc to be a search via node
• similarity between c and q (computed recursively)
• overall weight of a for the given query q
Intuition:
• a is relevant if its children have high confidence to be a search via node and are relevant to q
• more relevant children increase the relevance of the node type
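A minimal sketch of the recursive case, assuming the similarity of an internal node is the sum over its children of (child similarity × confidence of the child's type to be a search via node), divided by the overall weight of the node. The base-case similarity of each value node and the overall weights are assumed to be precomputed; the `Node` class and all field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_type: str                  # type Tc of the node
    leaf_similarity: float = 0.0    # base-case similarity (assumed precomputed)
    weight: float = 1.0             # overall weight of the node for query q (assumed given)
    children: list = field(default_factory=list)

def similarity(node, via_confidence):
    """Recursive similarity between the query and a node (assumed form)."""
    if not node.children:
        # (a) value node: base case
        return node.leaf_similarity
    if node.weight == 0:
        return 0.0
    # (b) internal node: children count only as far as their type is
    # likely to be a search via node and they are relevant themselves.
    total = sum(
        similarity(child, via_confidence) * via_confidence.get(child.node_type, 0.0)
        for child in node.children
    )
    return total / node.weight

# Hypothetical confidences for the query "art".
via_confidence = {"name": 0.2, "interest": 0.8}

likes_art = Node("customer", children=[
    Node("name", leaf_similarity=0.0),
    Node("interest", leaf_similarity=0.9),
])
named_art = Node("customer", children=[
    Node("name", leaf_similarity=0.9),
    Node("interest", leaf_similarity=0.0),
])
score_likes_art = similarity(likes_art, via_confidence)
score_named_art = similarity(named_art, via_confidence)
```

Under these made-up numbers the customer whose interest matches "art" outranks the customer whose name matches it, because interest has the higher confidence to be the search via node.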

Algorithms
Parsing the input XML document; for each node n visited:
(1) Assign a Dewey ID to n
(2) Store the prefix path prefixPath of n in a hash table
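The two parsing steps above can be sketched with Python's standard `xml.etree.ElementTree`. The numbering scheme is an assumption (root is "1", children are numbered from 1 in document order), as is the choice to include the node's own tag in its prefix path; the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def assign_dewey_ids(xml_text):
    """Return a hash table mapping each element's Dewey ID to its prefix path."""
    prefix_paths = {}

    def visit(elem, dewey, parent_path):
        # (2) the prefix path is the tag path from the root down to this node
        path = f"{parent_path}/{elem.tag}"
        prefix_paths[dewey] = path
        # (1) each child extends the parent's Dewey ID with its sibling position
        for position, child in enumerate(elem, start=1):
            visit(child, f"{dewey}.{position}", path)

    visit(ET.fromstring(xml_text), "1", "")
    return prefix_paths

sample = (
    "<customers>"
    "<customer><name>Rock</name><interest>art</interest></customer>"
    "<customer><name>Art</name></customer>"
    "</customers>"
)
paths = assign_dewey_ids(sample)
```

One benefit of Dewey IDs is that an ancestor's ID is always a prefix of its descendants' IDs, so ancestor checks need no tree traversal.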

Algorithms
Build 2 indices:
1. Keyword inverted list:
(1) Dup: Dewey ID and XML TF*IDF (fa,k)
(2) DupType: Dup + node type (prefix path)
(3) DupTypeNorm: DupType + normalization factor Wa
"Node" tuple: <DeweyID, prefixPath, fa,k, Wa>
2. Frequency table:
- stores the frequency of k in node type T
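A rough sketch of building both indices in one pass. It is a simplification under stated assumptions: entries are (DeweyID, prefixPath, fa,k) tuples without the normalization factor Wa, keywords are lower-cased whitespace tokens of each element's text, and the frequency table counts, per node type, how many nodes of that type contain a keyword as a value or tag name in their subtree. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def build_indices(xml_text):
    inverted = defaultdict(list)   # keyword -> [(deweyID, prefixPath, f_ak), ...]
    freq = defaultdict(Counter)    # node type T -> keyword -> number of T-typed nodes containing it

    def visit(elem, dewey, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Term frequencies of the text directly held by this element.
        words = Counter((elem.text or "").lower().split())
        for k, f_ak in words.items():
            inverted[k].append((dewey, path, f_ak))
        # Keywords in this subtree: own values, own tag name, and the children's.
        seen = set(words) | {elem.tag.lower()}
        for position, child in enumerate(elem, start=1):
            seen |= visit(child, f"{dewey}.{position}", path)
        for k in seen:
            freq[path][k] += 1
        return seen

    visit(ET.fromstring(xml_text), "1", "")
    return inverted, freq

sample = (
    "<customers>"
    "<customer><name>Rock</name><interest>art</interest></customer>"
    "<customer><name>Art</name></customer>"
    "</customers>"
)
inverted, freq = build_indices(sample)
```

In this sample both customers contain "art", once as an interest and once as a name, which is exactly the kind of ambiguity the frequency table statistics are meant to resolve.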

Algorithms
The Algorithm:
1. Input: keywords of the query, inverted list, frequency table
2. Identify the search intention and the search for node type
3. Rank by computing the XML TF*IDF similarity between n and the given query
4. Return the ranked list

Experimental Evaluation
XReal vs. SLCA vs. XSeek
Aims of testing:
• Search effectiveness
• Ranking effectiveness
Datasets:
• real datasets (Washington XML Data Repository, DBLP)
• synthetic datasets (XMark benchmark)


Conclusion
• Identify the search intention and rank results with statistics
• Confidence level to be the search for/via node with XML TF*IDF
• XML TF*IDF similarity ranking scheme
• The approach tries to solve the ambiguity problem
• Prototype XReal
