
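XQuery Streaming à la Carte & Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation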

University of Crete, School of Sciences, Department of Computer Science. HY-561: Data Management on the World Wide Web. XQuery Streaming à la Carte & Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation.


Presentation Transcript


  1. University of Crete, School of Sciences, Department of Computer Science. HY-561: Data Management on the World Wide Web. XQuery Streaming à la Carte & Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation

  2. XQuery Streaming à la Carte: Introduction
     • Existing XML query evaluation techniques
       • Algebraic optimization with algorithms for persistent data
       • Streaming algorithms for transient data
     • New idea
       • Physical algebra for XQuery
       • À la carte use of streaming algorithms and optimization techniques
     Konstantinos Galanakis

  3. Introduction: Diverse Data Sources
     [Figure: join of a local repository and a streaming source]

  4. Preliminaries
     • List: immutable ordered sequence of homogeneous values
     • Cursor: mutable ordered sequence of homogeneous values
       • Destructive
       • C(α): cursor containing values of type α
       • Operators: fromList, next, peek
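The cursor described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Cursor`, the one-value lookahead buffer, and the use of `None` as the end-of-cursor signal are choices made here, not details taken from the talk. The method names mirror the slide's fromList, next, and peek.

```python
from typing import Generic, Iterable, Iterator, List, Optional, TypeVar

A = TypeVar("A")


class Cursor(Generic[A]):
    """Destructive, forward-only sequence: a consumed value is gone."""

    def __init__(self, source: Iterable[A]) -> None:
        self._it: Iterator[A] = iter(source)
        self._lookahead: List[A] = []  # holds at most one peeked value

    @classmethod
    def from_list(cls, values: List[A]) -> "Cursor[A]":
        """fromList: turn an immutable list into a cursor over its values."""
        return cls(values)

    def peek(self) -> Optional[A]:
        """Return the next value without consuming it (None at the end)."""
        if not self._lookahead:
            try:
                self._lookahead.append(next(self._it))
            except StopIteration:
                return None
        return self._lookahead[0]

    def next(self) -> Optional[A]:
        """Consume and return the next value (None at the end)."""
        if self._lookahead:
            return self._lookahead.pop()
        return next(self._it, None)
```

Unlike a list, the cursor cannot be rewound: once `next` has returned a value, only the remaining values are reachable, which is what makes it suitable for transient streamed input.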

  5. Physical Data Model 1/2
     • Physical value
       • Physical XML value (Xml)
         • Cursor of XML tokens, C(Tok)
         • List of tree values, L(Tree)
       • Physical table (Table)
         • Cursor of tuples, C(τ)
         • Physical tuple τ: record of fields containing physical XML values
         • List of tuples, L(τ)
     • XML token (Tok): parsing event produced by a SAX parser

  6. Physical Data Model 2/2
     • XML token (Tok): parsing event produced by a SAX parser
       • startElem
       • endElem
       • text
       • atomic
       • hole
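To make the token representation concrete, here is a minimal sketch that turns SAX parsing events into (kind, payload) pairs using Python's standard `xml.sax` module. The tuple layout and the names `TokenCollector` and `tokenize` are assumptions for illustration. Only startElem, endElem, and text are produced; the atomic and hole tokens from the slide are not modelled, and the tokens are collected into a list for readability, whereas a streaming engine would hand them to a cursor one at a time.

```python
import xml.sax
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, payload); kinds follow the slide's names


class TokenCollector(xml.sax.ContentHandler):
    """Records SAX events as XML tokens (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[Token] = []

    def startElement(self, name, attrs):
        self.tokens.append(("startElem", name))

    def endElement(self, name):
        self.tokens.append(("endElem", name))

    def characters(self, content):
        if not content.strip():
            return  # ignore whitespace-only chunks in this sketch
        if self.tokens and self.tokens[-1][0] == "text":
            # The parser may deliver one text node in several chunks.
            self.tokens[-1] = ("text", self.tokens[-1][1] + content)
        else:
            self.tokens.append(("text", content))


def tokenize(xml_text: str) -> List[Token]:
    """Parse a document and return its token sequence."""
    handler = TokenCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens
```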

  7. Physical Representation & Conversion
     [Figure not preserved in the transcript]

  8. Physical Algebra: Overview & Operators
     Physical algebra for the logical algebra proposed in C. Re, J. Simeon and M. Fernandez, “A complete and efficient algebraic compiler for XQuery”, ICDE 2006.

  9. Physical Algebra: Constructors
     [Figure not preserved in the transcript]

  10. Physical Algebra: Navigation Operators 1/3
      • TreeProject
        • Projection of path expressions on a tree
        • Injected after Parse to reduce the plan input size
      • TreeJoin
        • Returns a node sequence in document order with no duplicates
        • Strictly forward path expressions: self, child, descendant, descendant-or-self, and attribute axes

  11. Physical Algebra: Navigation Operators 2/3
      The steps desc-or-self::section and child::title are compiled into a physical plan, and the plan is then applied to an input document.
      [Figures not preserved in the transcript]
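A streamed evaluation of these two steps can be sketched as a single pass over a token sequence. The code below is a hand-written illustration, not the physical plan from the slide: it fuses both steps into one generator, uses an assumed (kind, payload) token layout, and assumes that a matched title never contains another section/title match. The function name `section_titles` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # (kind, payload), e.g. ("startElem", "title")


def section_titles(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield the tokens of every title element whose parent is a section."""
    open_elems: List[str] = []         # names of the currently open elements
    emit_depth: Optional[int] = None   # stack depth where the current match began
    for kind, value in tokens:
        if kind == "startElem":
            is_match = (
                emit_depth is None
                and value == "title"
                and bool(open_elems)
                and open_elems[-1] == "section"
            )
            if is_match:
                emit_depth = len(open_elems)
            open_elems.append(value)
            if emit_depth is not None:
                yield (kind, value)
        elif kind == "endElem":
            open_elems.pop()
            if emit_depth is not None:
                yield (kind, value)
                if len(open_elems) == emit_depth:
                    emit_depth = None  # the matched title is fully emitted
        elif emit_depth is not None:
            yield (kind, value)        # text inside a matched title
```

Because the input is read strictly forward and matches are emitted as they are encountered, the output is in document order and the only state kept is the stack of open element names.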

  12. Physical Algebra: Tuple Operators 1/2
      • All tuple operators are polymorphic except MapFromItem
      • MapFromItem
        • Input: item sequence
        • Output: one tuple for each item
        • Two implementations: one for lists of trees and one for token cursors
        • Relies on map and split

  13. Physical Algebra: Tuple Operators 2/2
      [Figure not preserved in the transcript]

  14. Physical Algebra: Code Selection 1/4
      • Mapping from a logical plan (Op) to a physical plan (POp): CS(Op) → POp
      • Physical plan correctness
      • Stream safety: sufficient to ensure correctness

  15. Physical Algebra: Code Selection 2/4
      Conditions for stream safety of an operator Op:
      • Navigational access on the XML values returned by Op is strictly forward
      • Tuples returned by Op are consumed in the order of creation
      • Tuple fields returned by Op are accessed at most once

  16. Physical Algebra: Code Selection 3/4
      • The code selection heuristic is based on three assumptions
        • Conversion between physical representations is expensive
        • Streaming operators are more efficient on streamed sources
        • Copying whole sub-trees is expensive and should be avoided
      • The following rules are applied bottom-up to each subplan Op of a whole plan Op0
        • If (a) the inputs of Op are streamed, (b) a streaming operator POp exists for Op, and (c) Op is stream-safe, then CS(Op) selects POp
        • If Op is a constructor operator, CS(Op) uses a streaming operator
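The two rules can be read as a small bottom-up recursion over the logical plan. The sketch below is an illustration under assumptions: the classes `Op` and `POp`, their boolean fields, and the fallback to a non-streaming operator when neither rule applies are all hypothetical, and the stream-safety check is reduced to a precomputed flag rather than derived from the plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Logical operator (hypothetical representation)."""
    name: str
    inputs: List["Op"] = field(default_factory=list)
    has_streaming_impl: bool = False  # condition (b)
    stream_safe: bool = False         # condition (c), assumed precomputed
    is_constructor: bool = False


@dataclass
class POp:
    """Physical operator chosen for a logical operator."""
    name: str
    streaming: bool
    inputs: List["POp"]


def select_code(op: Op) -> POp:
    """CS(Op): choose physical operators bottom-up."""
    children = [select_code(child) for child in op.inputs]
    if op.is_constructor:
        # Rule 2: constructors always use a streaming operator.
        streaming = True
    else:
        # Rule 1: streamed inputs, a streaming implementation, stream safety.
        inputs_streamed = all(child.streaming for child in children)
        streaming = inputs_streamed and op.has_streaming_impl and op.stream_safe
    # Assumption: otherwise fall back to a non-streaming implementation.
    return POp(op.name, streaming, children)
```

A leaf operator has no inputs, so condition (a) holds trivially for it; whether it streams then depends only on conditions (b) and (c).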

  17. Experimental Evaluation 1/2
      • Experiments on synthetic data
        • Verify linear scalability of streaming operators w.r.t. query and document sizes
        • Run over MemBeR documents in the XCheck framework
      • XMark benchmarks
        • Q2, Q6, Q15 are fully streamable
        • Q1, Q4, Q5, Q7, Q14, Q16–Q19 are partially streamable
        • Self-join queries Q8–Q12 / Q20 are not streamable

  18. Experimental Evaluation 2/2
      [Figure not preserved in the transcript]

  19. Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation: Introduction
      • The buffer manager of a streaming XQuery engine should
        • Put only data relevant to query evaluation into the buffer
        • Avoid keeping data buffered longer than necessary
        • Avoid keeping multiple copies of the data in buffers
      • Claim: a combination of static analysis and dynamic buffer minimization techniques is needed

  20. Introduction: Previous Work 1/2
      XQuery:
        <q> {
          for $b in /bib/book
          where ($b/author="A. Turing" and fn:exists($b/price))
          return $b/title
        } </q>
      Projection paths:
        { /bib/book,
          /bib/book/author/dos::node(),
          /bib/book/price,
          /bib/book/title/dos::node() }
      XML document:
        [Tree figure: root bib with children book, book, article, …; nodes below them labeled author, price, title, isbn, …]

  21. Introduction: Previous Work 2/2
      XQuery:
        <q> {
          for $x1 in //book return
            for $x2 in //* return
              for $x3 in //article return
                <node/>
        } </q>
      Two approaches:
        (1) Single DOM tree
        (2) Buffers for variables

  22. Active Garbage Collection
      • Buffer management technique for XQuery engines
      • Exploits both static and dynamic analysis
      • Basic idea: determine which data objects will not be accessed in the future
      • A.G.C. strategy
        • Reference counting
      • New approach
        • Roles assigned to nodes
        • Multiple roles per node
        • Multiple nodes per role
        • signOff statement

  23. Main Idea
      [Architecture diagram; component labels: XQuery, XQuery normalizations, Rewritten XQuery (role updates), Roles, Projection tree, Input stream, Buffer (node role annotation), Role removal (A.G.C.), Variable bindings, Evaluator, Output stream]

  24. Query Language
      • XQ is an XQuery fragment with
        • Nested for-expressions
        • Conditions
        • Joins
      • Covers a syntactically simple fragment of XQuery
      • Assumption: syntactically richer fragments can be evaluated after query normalization
        • Remove let-expressions
        • Rewrite where-conditions to if-then-else expressions
        • Replace multi-step for-loops with nested single-step for-loops

  25. Query Language: where-expressions → if-statements
      Original query:
        <r> {
          for $b in /bib
          where (fn:exists($b/book))
          return <books>{ $b/book }</books>
        } </r>
      Rewritten query:
        <r> {
          for $b in /bib return (
            if (fn:exists($b/book)) then <books> else (),
            if (fn:exists($b/book)) then $b/book else (),
            if (fn:exists($b/book)) then </books> else ()
          )
        } </r>

  26. Query Language: where-expressions → if-statements, pushing if-statements
      (Same original and rewritten query pair as on slide 25; the if-statements are pushed down to the individual output expressions.)

  27. Query Language: Role extraction
      Query:
        <r> {
          for $bib in /bib return (
            for $x in $bib/* return
              if (not(fn:exists($x/price))) then $x else (),
            for $b in $bib/book return $b/title
          )
        } </r>
      Projection tree:
        /
          /bib
            /*
              /price[1]
              dos::node()
            /book
              /title/dos::node()

  28. Query Language: Role assignment
      Roles:
        r1  /
        r2  /bib
        r3  /bib/*
        r4  /bib/*/price[1]
        r5  /bib/*/dos::node()
        r6  /bib/book
        r7  /bib/book/title/dos::node()
      XML document with assigned roles:
        bib { r2 }
          book { r3, r5, r6 }
            title { r5, r7 }
            author { r5 }
      • Roles are assigned to a document node when it is projected into the buffer
      • On-the-fly role assignment
      • Nodes without roles and role-carrying ancestors need not be buffered
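The role sets shown for bib, book, title, and author can be reproduced by matching each node's path against the seven role paths. The following sketch is an illustration only: the path encoding (name test plus optional position), the helper names, and the restriction to element nodes are assumptions, and it handles only the step forms that occur on this slide (names, `*`, a trailing `dos::node()`, and a positional predicate such as `price[1]`).

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, Optional[int]]   # (name test, optional position predicate)
NodePath = List[Tuple[str, int]]   # (element name, position among same-named siblings)

DOS = "dos::node()"

# Role paths r1..r7 from the slide, in the assumed encoding.
ROLES: Dict[str, List[Step]] = {
    "r1": [],
    "r2": [("bib", None)],
    "r3": [("bib", None), ("*", None)],
    "r4": [("bib", None), ("*", None), ("price", 1)],
    "r5": [("bib", None), ("*", None), (DOS, None)],
    "r6": [("bib", None), ("book", None)],
    "r7": [("bib", None), ("book", None), ("title", None), (DOS, None)],
}


def step_matches(step: Step, node: Tuple[str, int]) -> bool:
    test, position = step
    name, index = node
    return test in ("*", name) and (position is None or position == index)


def path_matches(steps: List[Step], node_path: NodePath) -> bool:
    if steps and steps[-1][0] == DOS:
        # descendant-or-self: the steps before it must match a leading
        # part of the node's path.
        prefix = steps[:-1]
        if len(node_path) < len(prefix):
            return False
    else:
        prefix = steps
        if len(node_path) != len(prefix):
            return False
    return all(step_matches(s, n) for s, n in zip(prefix, node_path))


def roles_for(node_path: NodePath) -> Set[str]:
    """Roles a node receives when it is projected into the buffer."""
    return {role for role, steps in ROLES.items() if path_matches(steps, node_path)}
```

For example, `roles_for([("bib", 1), ("book", 1)])` yields the three roles shown next to book, and a node whose path matches no role path gets the empty set.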

  29. Query Language: Inserting role updates
      Original query:
        <r> {
          for $bib in /bib return (
            for $x in $bib/* return
              if (not(fn:exists($x/price))) then $x else (),
            for $b in $bib/book return $b/title
          )
        } </r>
      Roles and the variables they are signed off through:
        r1  /
        r2  /bib                         $bib
        r3  /bib/*                       $x
        r4  /bib/*/price[1]              $x/price
        r5  /bib/*/dos::node()           $x
        r6  /bib/book                    $b
        r7  /bib/book/title/dos::node()  $b/title
      Rewritten query with signOff statements:
        <r> {
          for $bib in /bib return (
            for $x in $bib/* return (
              if (not(exists($x/price))) then $x else (),
              signOff($x,r3),
              signOff($x/price[1],r4),
              signOff($x/dos::node(),r5)
            ),
            for $b in $bib/book return (
              $b/title,
              signOff($b,r6),
              signOff($b/title/dos::node(),r7)
            ),
            signOff($bib,r2)
          )
        } </r>

  30. Query Language: Active Garbage Collection
      Query: the rewritten query with signOff statements from slide 29.
      Input stream:
        <bib> <book> <title/> <author/> </book> …
      Buffer (role sets shrink as signOff statements are executed):
        bib     {r2}
        book    {r3, r5, r6} → {r5, r6} → {r6}
        title   {r5, r7} → {r7} → {}
        author  {r5}
      Output stream:
        <r> <book> <title/> <author/> </book>
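The buffer behaviour on this slide can be sketched as a tree whose nodes carry role sets. The code is an illustration under a stated assumption: a node is purged once neither it nor any remaining descendant carries a role. The talk may define the purge condition differently; the `Node` class and the `sign_off` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity comparison: two nodes are equal only if same object
class Node:
    """Buffered document node annotated with roles (hypothetical layout)."""
    name: str
    roles: Set[str] = field(default_factory=set)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str, roles: Set[str]) -> "Node":
        child = Node(name, set(roles), parent=self)
        self.children.append(child)
        return child


def sign_off(node: Node, role: str) -> None:
    """Remove one role from a node, then purge what is no longer needed."""
    node.roles.discard(role)
    current: Optional[Node] = node
    # Assumed purge rule: drop a node with no roles and no buffered children,
    # then re-check its parent, which may have become purgeable as well.
    while current is not None and not current.roles and not current.children:
        parent = current.parent
        if parent is not None:
            parent.children.remove(current)
        current.parent = None
        current = parent
```

Replaying the signOff statements of the rewritten query on the buffered bib/book/title/author fragment gives the role-set sequence shown above: author disappears when r5 is signed off, title and then book disappear when r7 is signed off, and bib is released by the final signOff of r2.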

  31. Optimizations: Path steps → for-expressions
      Original query:
        <r> { for $bib in /bib return $bib/book } </r>
      With signOff statements:
        <r> { for $bib in /bib return (
                $bib/book,
                signOff($bib,r1),
                signOff($bib/book/dos::node(),r2)) } </r>
      Path steps rewritten to for-expressions:
        <r> { for $bib in /bib return
                for $_1 in $bib/book return $_1/book } </r>
      With signOff statements:
        <r> { for $bib in /bib return (
                for $_1 in $bib/book return (
                  $_1/book,
                  signOff($_1/book/dos::node(),r2)),
                signOff($bib,r1)) } </r>
      Further optimizations:
      • Aggregated roles
      • Remove redundant roles

  32. Benchmarking: Benchmark Results 1/5
      • Time and memory consumption
        • Queries and documents from the XMark benchmark
        • Queries and documents modified to match the supported fragment
        • 3 GHz Intel Pentium IV CPU with 2 GB RAM
        • SUSE Linux 10.0, J2RE v1.4.2 for Java-based systems
        • Time limit: 1 hour
      • Benchmarks against the following systems
        • FluX: Java in-memory engine for streaming XQuery evaluation
        • MonetDB v4.12.0/XQuery v0.12.0: a secondary-storage engine written in C++; loading of the document is included in the time measurements
        • QizX/open v1.1: free in-memory XQuery engine written in Java
        • Saxon v8.7.1: free in-memory XQuery engine written in Java

  33. Benchmarking: Benchmark Results 2/5
      XMark Q1: running time (s)
        <query1> {
          for $s in /site return
            for $p in $s/people return
              for $pe in $p/person return
                if ($pe/person_id="person0")
                then <result>{ $pe/name }</result>
                else ()
        } </query1>
      [Results chart not preserved in the transcript]

  34. Benchmarking: Benchmark Results 3/5
      XMark Q1: memory consumption (MB)
        <query1> {
          for $s in /site return
            for $p in $s/people return
              for $pe in $p/person return
                if ($pe/person_id="person0")
                then <result>{ $pe/name }</result>
                else ()
        } </query1>
      [Results chart not preserved in the transcript]

  35. Benchmarking: Benchmark Results 4/5
      XMark Q8:
        <query8> {
          for $root in (/) return
            for $site in $root/site return
              for $people in $site/people return
                for $person in $people/person return
                  <item> { (
                    <person>{ $person/name }</person>,
                    <items_bought> {
                      for $site2 in $root/site return
                        for $cas in $site2/closed_auctions return
                          for $ca in $cas/closed_auction return
                            for $buyer in $ca/buyer return
                              if ($buyer/buyer_person=$person/person_id)
                              then <result> { $ca } </result>
                              else ()
                    } </items_bought>
                  ) } </item>
        } </query8>

  36. Benchmarking: Benchmark Results 5/5
      XMark Q8
      • Failure for 100MB: MonetDB
      • Failure for 200MB: GCX, FluxQuery, MonetDB
      [Results chart not preserved in the transcript]
