A Generic Framework for Querying and Updating Secondary XML Index Structures

Katharina Grün



  1. A Generic Framework for Querying and Updating Secondary XML Index Structures
     Katharina Grün

  2. Research Methodology

  3. Motivation
     • Widespread use of XML
     • XML databases for efficient query and update processing
     • Require index structures on the content and structure of documents
       • primary index structure
         • default index
         • on the whole document
         • not optimized for specific queries
       • secondary index structures
         • created on demand
         • on specific document fragments
         • adapted to the query workload
     • Framework for querying and updating secondary XML index structures (SCIENS)
     (Phase: Become aware of problem)

  4. Running example
     • path: projects/project[1]/@name
     • labelpath: projects/project/@name
     • //resource[@date >= '2005-01-01']
     • //project[@name='sciens']/milestone[@id=2]/resource[@date >= '2007-01-01']
     • //element(resource, Report)[author='Smith']
     (Phase: Become aware of problem)
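The running example distinguishes a node's path, which records sibling positions such as project[1], from its labelpath, which keeps only the element and attribute names. A minimal Python sketch of that mapping, assuming paths are plain strings in the syntax shown above; the helper name `labelpath` is hypothetical and not part of SCIENS:

```python
import re

# A positional predicate such as "[1]" (sibling position) after a step name.
POSITION = re.compile(r"\[\d+\]")


def labelpath(path: str) -> str:
    """Map a path to its labelpath by dropping positional predicates."""
    return POSITION.sub("", path)


print(labelpath("projects/project[1]/@name"))                  # projects/project/@name
print(labelpath("projects/project[2]/milestone[2]/resource"))  # projects/project/milestone/resource
```

Because every project[i] collapses to the same labelpath, an index on labelpaths is coarser than an index on paths: it groups all projects together but cannot answer positional steps such as project[1] on its own.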

  5. Challenges
     • Which secondary index structures are necessary?
       • each kind of query is best supported by a different index structure
       • it is not possible to provide one index structure for every possible query
     • How to integrate them into a common framework?
       • each secondary index can index arbitrary properties of arbitrary fragments
       • query and update processing must not depend on the specific indices defined
     • How to update them when documents change?
       • document updates must be propagated to the affected index structures
       • incremental index maintenance algorithm
     (Phase: Become aware of problem)

  6. Related work (1)
     • XML databases
       • limited support for secondary index structures
     • XML index structures
       • on structure and/or content
       • mostly primary index structures
       • based on different models, proprietary structures
     • Object-oriented index structures
       • proprietary structures to support queries on path navigation and/or inheritance hierarchies
     • Multidimensional index structures
       • support several value dimensions
       • do not consider structure
     (Phase: Become aware of problem)

  7. Related work (2)
     • Extensible indexing
       • object-relational databases
       • adapt index structures to different data types
     • Indexing tasks
       • maintain secondary indices when documents are updated (KeyX¹)
       • select the optimal index for a specific query (XML Access Modules²)
       • suggest a set of indices for a query workload (KeyX¹)
     • currently no integrated approach for processing secondary index structures in an XML database
     1) Hammerschmidt, B. C.: KeyX: Selective Key-Oriented Indexing in Native XML Databases. PhD thesis, University of Lübeck, 2005.
     2) Arion, A., Benzaken, V. and Manolescu, I.: XML Access Modules: Towards Physical Data Independence in XML Databases. XIMEP workshop, 2005.
     (Phase: Become aware of problem)

  8. SCIENS – Ideas
     • Structure and Content Indexing with Extensible, Nestable Structures
     • Which secondary index structures are necessary?
       • select a small set of index structures and adapt them to various properties
       • nest index structures to reflect hierarchical queries
     • How to integrate them into a common framework?
       • provide an index model
       • common index interface to query and update indices
     • How to update them when documents change?
       • index maintenance algorithm that determines updates for arbitrary indices
       • based on update fragments and index definitions
     (Phase: Suggest solution)

  9. Index structures – one dimension (1)
     • Value indexing
       • hash table or B+-tree on value
       • @date >= '2005-01-01'
     • Structure indexing
       • hash table or B+-tree on path/labelpath/type
       • //resource
       • /project[1]//resource
       • /project[2]/milestone[2]/resource
     (Phase: Construct solution)
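The two one-dimensional cases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SCIENS implementation: a sorted list maintained with the standard `bisect` module stands in for a B+-tree, a `dict` serves as the hash table, node IDs are hypothetical Dewey-style strings such as "1.2.1", and dates are ISO strings so that string order equals date order. The class names `ValueIndex` and `StructureIndex` are illustrative.

```python
import bisect
from collections import defaultdict


class ValueIndex:
    """Ordered index on an attribute value; a sorted list stands in for a B+-tree."""

    def __init__(self) -> None:
        self._keys: list[str] = []   # attribute values, kept sorted
        self._nodes: list[str] = []  # node IDs, parallel to _keys

    def insert(self, value: str, node: str) -> None:
        pos = bisect.bisect_right(self._keys, value)
        self._keys.insert(pos, value)
        self._nodes.insert(pos, node)

    def at_least(self, low: str) -> list[str]:
        """Nodes whose value is >= low, in value order."""
        start = bisect.bisect_left(self._keys, low)
        return self._nodes[start:]


class StructureIndex:
    """Hash index from a labelpath to node IDs; a dict plays the hash table."""

    def __init__(self) -> None:
        self._buckets: dict[str, list[str]] = defaultdict(list)

    def insert(self, labelpath: str, node: str) -> None:
        self._buckets[labelpath].append(node)

    def lookup(self, labelpath: str) -> list[str]:
        """Exact-match lookup only; returns a copy of the bucket."""
        return list(self._buckets.get(labelpath, []))


# Value indexing: @date >= '2005-01-01'
dates = ValueIndex()
dates.insert("2004-06-30", "1.1.1")
dates.insert("2006-03-15", "1.2.1")
dates.insert("2005-01-01", "2.1.1")
print(dates.at_least("2005-01-01"))  # ['2.1.1', '1.2.1']

# Structure indexing on labelpaths
paths = StructureIndex()
paths.insert("projects/project/milestone/resource", "1.1.1")
paths.insert("projects/project/milestone/resource", "1.2.1")
print(paths.lookup("projects/project/milestone/resource"))  # ['1.1.1', '1.2.1']
```

A hash table answers only exact-match lookups (one labelpath at a time), whereas the ordered structure also answers range predicates such as @date >= '2005-01-01'; that is the usual reason to prefer a B+-tree for a dimension that is queried by ranges.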

  10. Index structures – one dimension (2)
      (Phase: Construct solution)

  11. Index structures – multiple dimensions (1)
      1) Robinson, J.: The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. SIGMOD, ACM Press, 1981.
      (Phase: Construct solution)

  12. Index structures – multiple dimensions (2)
      (Phase: Construct solution)

  13. Comparison
      • queries and indices on the milestone hierarchy and date
      • e.g. //project[1]//resource[@date > '2005-01-01']
      • define the index that best matches the query workload
      (Phase: Evaluate solution)
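To make the comparison concrete, the sketch below evaluates //project[1]//resource[@date > '2005-01-01'] with two hypothetical plans over invented sample data. Plan A intersects the result of a hash index on the project position with the result of an ordered index on @date; plan B uses a single ordered index on the composite key (project position, date). The composite sorted list is a simplification chosen for brevity, not a kdb-tree, and reading "project[1]//resource" as "resource whose enclosing project is at position 1" is an assumption.

```python
import bisect
from collections import defaultdict

# Invented sample data: (position of enclosing project, @date, resource node ID).
RESOURCES = [
    (1, "2004-11-02", "1.1.1"),
    (1, "2006-05-20", "1.2.1"),
    (1, "2007-02-14", "1.2.2"),
    (2, "2006-01-09", "2.1.1"),
]

# Plan A: two one-dimensional indices, results intersected.
by_project: dict[int, set[str]] = defaultdict(set)   # hash index on project position
for project, _, node in RESOURCES:
    by_project[project].add(node)
by_date = sorted(RESOURCES, key=lambda r: r[1])      # ordered index on @date
date_keys = [date for _, date, _ in by_date]


def plan_a(project: int, after: str) -> set[str]:
    start = bisect.bisect_right(date_keys, after)    # first date strictly after `after`
    newer = {node for _, _, node in by_date[start:]}
    return by_project.get(project, set()) & newer


# Plan B: one ordered index on the composite key (project position, date).
composite = sorted(RESOURCES)
composite_keys = [(project, date) for project, date, _ in composite]


def plan_b(project: int, after: str) -> set[str]:
    start = bisect.bisect_right(composite_keys, (project, after))  # skips dates <= after
    stop = bisect.bisect_left(composite_keys, (project + 1,))      # end of this project
    return {node for _, _, node in composite[start:stop]}


print(sorted(plan_a(1, "2005-01-01")))  # ['1.2.1', '1.2.2']
print(sorted(plan_b(1, "2005-01-01")))  # ['1.2.1', '1.2.2']
```

Both plans return the same nodes, but plan A materializes every resource of project 1 and every resource newer than the date before intersecting, whereas plan B scans only the qualifying entries. Which plan is cheaper depends on the query workload, which is why the index definition should follow it.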

  14. Index framework (1)
      • index
        • search function consisting of a set of index entries
        • provides an interface to update and retrieve index entries
      • index entry
        • maps index keys (value, type, path,…) -> returned nodes
        • e.g. TechnicalReport, Smith -> 3.2.1, 4.3.1,...
      • index definition
        • selects the nodes to be indexed
        • e.g. //element(resource, $V1)[author=$V2]
        • represented as an unordered tree pattern with index variables
      • index structure
        • specific data structure (hash table, prefix B+-tree, kdb-tree)
        • one index can use several index structures (index nesting)
      (Phase: Construct solution)
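A minimal Python model of these notions, assuming an index entry is a tuple of key values (one per index variable) mapped to a set of node IDs. The class `Index` and its methods `insert`, `delete` and `lookup` are hypothetical names for the common interface described above; the `dict` used for storage is only a placeholder, since in the framework the physical index structure is chosen separately.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Index:
    """An index as a set of entries: key tuple -> set of node IDs (hypothetical model)."""

    definition: str              # index definition (tree pattern), kept as text
    variables: tuple[str, ...]   # index variables, in key order
    _entries: dict = field(default_factory=lambda: defaultdict(set), init=False, repr=False)

    def insert(self, key: tuple, node: str) -> None:
        """Add node under key."""
        self._check(key)
        self._entries[key].add(node)

    def delete(self, key: tuple, node: str) -> None:
        """Remove node from key; drop the entry once it is empty."""
        self._check(key)
        nodes = self._entries.get(key)
        if nodes is not None:
            nodes.discard(node)
            if not nodes:
                del self._entries[key]

    def lookup(self, key: tuple) -> set:
        """Return a copy of the nodes stored under key."""
        self._check(key)
        return set(self._entries.get(key, ()))

    def _check(self, key: tuple) -> None:
        # One key value per index variable.
        if len(key) != len(self.variables):
            raise ValueError(f"expected {len(self.variables)} key values, got {len(key)}")


idx = Index("//element(resource, $V1)[author=$V2]", ("$V1", "$V2"))
idx.insert(("TechnicalReport", "Smith"), "3.2.1")
idx.insert(("TechnicalReport", "Smith"), "4.3.1")
print(sorted(idx.lookup(("TechnicalReport", "Smith"))))  # ['3.2.1', '4.3.1']
idx.delete(("TechnicalReport", "Smith"), "3.2.1")
print(sorted(idx.lookup(("TechnicalReport", "Smith"))))  # ['4.3.1']
```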

  15. Index framework (2)
      • index configuration
        • provides a mapping from an index to specific index structures
        • associates each index variable with the index structure to be used
          • $T1, $E2: kdb-tree
          • $E2: hash table, $T1: B+-tree
      • search configuration
        • used to access an index
        • associates each index variable with the index key to be searched
        • generated by an index selection tool
        • $T1 = Report, $E2 = 'Smith'
      (Phase: Construct solution)
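The second configuration, "$E2: hash table, $T1: B+-tree", describes a nested index: a hash table on the author ($E2) whose buckets are ordered structures on the type ($T1). The sketch below works under explicit assumptions: a sorted list again stands in for the B+-tree, and types are encoded as their supertype chain with a trailing "/" (for example "Report/TechnicalReport/"). That encoding is hypothetical, chosen so that a search configuration such as $T1 = Report, $E2 = 'Smith' can return subtypes like TechnicalReport with one range scan. The class `NestedIndex`, the type `Minutes` and the node IDs other than 3.2.1 and 4.3.1 are invented.

```python
import bisect
from collections import defaultdict


class NestedIndex:
    """Nested index for //element(resource, $T1)[author=$E2] (hypothetical sketch).

    Outer level: dict keyed by author ($E2).
    Inner level: sorted list of (type_path, node) pairs, standing in for a B+-tree on $T1.
    Assumed type encoding: supertype chain joined by "/" with a trailing "/",
    so a type and all of its subtypes occupy one contiguous key range.
    """

    def __init__(self) -> None:
        self._by_author: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def insert(self, type_path: str, author: str, node: str) -> None:
        bisect.insort(self._by_author[author], (type_path, node))

    def search(self, type_name: str, author: str) -> list[str]:
        """Search configuration {$T1: type_name, $E2: author}; subtypes included.

        type_name is the encoded type without its trailing "/", e.g. "Report".
        """
        inner = self._by_author.get(author, [])  # hash lookup on $E2
        low = type_name + "/"                    # every key in the range starts with this
        high = type_name + "0"                   # "0" is the character right after "/"
        start = bisect.bisect_left(inner, (low, ""))
        stop = bisect.bisect_left(inner, (high, ""))
        return [node for _, node in inner[start:stop]]


idx = NestedIndex()
idx.insert("Report/TechnicalReport/", "Smith", "3.2.1")
idx.insert("Report/TechnicalReport/", "Smith", "4.3.1")
idx.insert("Report/TechnicalReport/", "Tim", "5.1.1")
idx.insert("Report/", "Smith", "7.1.1")
idx.insert("Minutes/", "Smith", "6.1.1")
print(idx.search("Report", "Smith"))  # ['7.1.1', '3.2.1', '4.3.1']
```

Nesting in this order means the hash lookup on $E2 first narrows the search to one author's bucket, and the ordered inner structure then answers the type condition, including subtypes, as a single range.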

  16. Index maintenance
      • propagate document updates to the affected indices
      • steps
        • find embeddings of index patterns in update fragments
        • execute queries
        • generate index entries
          • [(TechnicalReport, 'Smith') -> resource]
          • [(TechnicalReport, 'Tim') -> resource]
      • up to 9 times faster than the existing approach (KeyX)
      (Phase: Construct / evaluate solution)
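The three steps can be illustrated for the index definition //element(resource, $V1)[author=$V2] when a fragment is inserted. The sketch uses Python's standard `xml.etree.ElementTree`; because ElementTree has no schema types, the type bound to $V1 is read from a hypothetical `type` attribute, and the function name `entries_for_insert` is illustrative. For a resource with two authors it yields the two entries shown on the slide.

```python
import xml.etree.ElementTree as ET

# Inserted fragment (invented); "type" is a hypothetical stand-in for the schema type.
FRAGMENT = """
<milestone id="2">
  <resource type="TechnicalReport" date="2007-03-01">
    <author>Smith</author>
    <author>Tim</author>
  </resource>
</milestone>
"""


def entries_for_insert(fragment: ET.Element) -> list[tuple[tuple[str, str], ET.Element]]:
    """Index entries for //element(resource, $V1)[author=$V2] in an inserted fragment."""
    entries = []
    # Step 1: embeddings of the pattern root; iter() also visits the fragment root itself.
    for resource in fragment.iter("resource"):
        v1 = resource.get("type")
        if v1 is None:  # no type information: $V1 cannot be bound
            continue
        # Step 2: evaluate the rest of the pattern (author children) to bind $V2.
        for author in resource.findall("author"):
            v2 = (author.text or "").strip()
            # Step 3: one index entry per binding of ($V1, $V2).
            entries.append(((v1, v2), resource))
    return entries


root = ET.fromstring(FRAGMENT.strip())
for key, node in entries_for_insert(root):
    print(key, "->", node.tag)
# ('TechnicalReport', 'Smith') -> resource
# ('TechnicalReport', 'Tim') -> resource
```

The sketch only considers embeddings that lie entirely inside the inserted fragment; an update such as adding an author to an existing resource would also require the fragment's context in the document, which this example does not cover.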

  17. Conclusion
      • select secondary index structures for XML
        • extensible: various properties and operations on these properties
        • nestable: adapt indices to hierarchical queries
      • integrate index structures into a framework
        • hides indexing tasks from query and update processing
        • provides an index model (common index interface)
      • index maintenance algorithm
        • propagates updates to index structures
      • flexibility to define indices that match the query workload
