
Management of XML and Semistructured Data

Management of XML and Semistructured Data. Lecture 7: XML-QL, Structural Recursion. Monday, April 23, 2001. XML-QL: first declarative language for XML. How to obtain a query language for XML fast? Assume OEM as data model. Use features from UnQL and StruQL: patterns, templates.


Presentation Transcript


  1. Management of XML and Semistructured Data Lecture 7: XML-QL, Structural Recursion Monday, April 23, 2001

  2. XML-QL
     • First declarative language for XML
     • How to obtain a query language for XML fast?
     • Assume OEM as data model
     • Use features from UnQL and StruQL
     • Patterns
     • Templates
     • Skolem functions
     • Design XML-like syntax

  3. Patterns in XML-QL
     Find all authors who published in Morgan Kaufmann:
       WHERE <book language="french">
               <publisher> <name> Morgan Kaufmann </> </>
               <author> $A </>
             </book> in "www.a.b.c/bib.xml"
       CONSTRUCT <author> $A </>
     Abbreviation: </> closes any tag.

  4. Patterns in XML-QL
     Find all languages in which Jones' coauthors have published:
       where <book language=$X> <author> $A </author> </book> in "www.a.b.c/bib.xml",
             <book> <author> $A </author> <author> Jones </author> </book> in "www.a.b.c/bib.xml"
       construct <result> $X </>
     There is a join here: both patterns bind the same variable $A.

  5. Constructors in XML-QL
     Find all authors and the languages in which they published:
       where <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
       construct <result> <author> $A </> <lang> $L </> </>
     Result is:
       <result> <author>Smith</author> <lang>English</lang> </result>
       <result> <author>Smith</author> <lang>Mandarin</lang> </result>
       <result> <author>Doe</author> <lang>English</lang> </result>
       ...

  6. Nested Queries in XML-QL
     Find all authors and the languages in which they published; group by authors:
       WHERE <book.author> $A </> in "www.a.b.c/bib.xml"
       CONSTRUCT <result>
                   <author> $A </>
                   WHERE <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
                   CONSTRUCT <lang> $L </>
                 </>
     Note: book.author is a (regular) path expression.

  7. Result is:
       <result> <author>Smith</author> <lang>English</lang> <lang>Mandarin</lang> <lang>…</lang> … </result>
       <result> <author>Doe</author> <lang>English</lang> … </result>
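The grouping performed by the nested query can be imitated in a general-purpose language. The following is only a sketch in Python, not XML-QL: the document `BIB` is made-up sample data (the real bib.xml is not shown in the slides), and `languages_by_author` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for "www.a.b.c/bib.xml".
BIB = """<bib>
  <book language="English"><author>Smith</author></book>
  <book language="Mandarin"><author>Smith</author></book>
  <book language="English"><author>Doe</author></book>
</bib>"""

def languages_by_author(xml_text):
    """Group book languages by author, like the nested WHERE/CONSTRUCT."""
    root = ET.fromstring(xml_text)
    groups = {}                                   # author -> list of languages
    for book in root.iter("book"):
        lang = book.get("language")               # binds $L (None if absent)
        for author in book.findall("author"):     # binds $A
            langs = groups.setdefault(author.text.strip(), [])
            if lang is not None and lang not in langs:
                langs.append(lang)
    return groups

print(languages_by_author(BIB))
```

Each key of the returned dictionary plays the role of one `<result>` element, and the list under it holds that author's `<lang>` children.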

  8. Skolem Functions in XML-QL
     Same query, with Skolem functions:
       WHERE <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
       CONSTRUCT <result id=F($A)> <author> $A </> <lang> $L </> </>
     Assumptions:
     • the ID attribute is always id
     • default Skolem function for author is G($A), for lang is H($A, $L) (why?)

  9. Skolem Functions in XML-QL
     Object fusion with Skolem functions and block structure: compile a complete list of authors from two sources.
       { WHERE <book> <author> $A </> <title> $T </> </> in "www.a.b.c/bib.xml"
         CONSTRUCT <person id=F($A)>
                     <name id=G($A)> $A </>
                     <booktitle> $T </>      /* implicit Skolem function H($A, $T) */
                   </> }
       { WHERE <paper> <author> $A </> <title> $T </> <journal> $J </> </> in "www.d.e.f/papers.xml"
         CONSTRUCT <person id=F($A)>
                     <name id=G($A)> $A </>
                     <papertitle> $T </>     /* implicit Skolem function J($A, $T) */
                     <journaltitle> $J </>   /* implicit Skolem function K($A, $T) */
                   </> }

  10. Result (some have only books, others only papers, others have both):
       <person> <name>Smith</name> <booktitle>Book1</booktitle> <booktitle>Book2</booktitle> </person>
       <person> <name>Jones</name> <booktitle>Book3</booktitle> <papertitle>paper1</papertitle> <journaltitle>journal1</journaltitle> </person>
       <person> <name>Mark</name> <papertitle>paper2</papertitle> <journaltitle>journal3</journaltitle> </person>
       …
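One way to read object fusion is that a Skolem term such as F($A) acts as a key: two blocks that construct an element with the same key contribute to the same object. The Python sketch below illustrates that reading only. The input lists `books` and `papers` are invented so that the output matches the result shown above, and `fuse` is a hypothetical name.

```python
# Hypothetical inputs standing in for the two XML sources.
books = [("Smith", "Book1"), ("Smith", "Book2"), ("Jones", "Book3")]
papers = [("Jones", "paper1", "journal1"), ("Mark", "paper2", "journal3")]

def fuse(books, papers):
    """Fuse persons from both sources, keyed by the Skolem term F(author)."""
    persons = {}                                  # ("F", author) -> person object

    def person(author):
        # Same Skolem term => same object, whichever block asks for it.
        return persons.setdefault(("F", author), {"name": author, "children": []})

    def add(p, tag, value):
        # Stands in for the implicit Skolem functions: an identical child
        # produced twice is still a single element.
        if (tag, value) not in p["children"]:
            p["children"].append((tag, value))

    for author, title in books:                   # first block
        add(person(author), "booktitle", title)
    for author, title, journal in papers:         # second block
        p = person(author)
        add(p, "papertitle", title)
        add(p, "journaltitle", journal)
    return list(persons.values())

for p in fuse(books, papers):
    print(p["name"], p["children"])
```

Jones appears in both inputs but yields a single person, because both blocks ask for the key F("Jones").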

  11. Skolem Functions in XML-QL
     "Wrong" query number 1:
       WHERE <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
       CONSTRUCT <result id=F($A)>
                   <author id=G($A)> $A </>
                   <lang id=H($A)> $L </>
                 </>
     What is "wrong" here?

  12. Skolem Functions in XML-QL
     "Wrong" query number 2:
       WHERE <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
       CONSTRUCT <result id=F($A,$L)>
                   <author id=G($A)> $A </>
                   <lang id=H($A,$L)> $L </>
                 </>
     What is "wrong" here?

  13. Skolem Functions in XML-QL
     "Wrong" query number 3:
       { WHERE <book language=$L> <author> $A </> </> in "www.a.b.c/bib.xml"
         CONSTRUCT <author id=F($A)> <lang id=H($A,$L)> $L </> </> }
       { WHERE <person> <city> $C </> <fluent-in> $X </> </> in "www.a.b.c/bib.xml"
         CONSTRUCT <location id=G($C)> <lang id=H($C,$L)> $L </> </> }

  14. Three Rules to Construct Only Trees
     Rule 1: nested elements must have Skolem functions that are… [how ??]
       CONSTRUCT <tag1 id=F([args1])> <tag2 id=G([args2])> … </> </>
     Rule 2: an element that has an atomic content must have a Skolem function that is… [how ??]
       CONSTRUCT … <tag id=F([args])> $X </> …
     Rule 3: if a Skolem function occurs in two different places, then the following condition must hold… [which ??]
       { CONSTRUCT <tag1 id=G([args1])> <tag id=F([args])> … </> </> }
       { CONSTRUCT <tag1 id=H([args2])> <tag id=F([args])> … </> </> }

  15. XML-QL vs. XQuery
     • XQuery (= Quilt) vs. XML-QL
       + faithful XML data model
       + XPath sublanguage
       + aggregate functions (like in SQL)
       + some features from XQL
     • Patterns
     • Skolem functions

  16. A Different Paradigm: Structural Recursion
     Data as sets with a union operator:
       {a:3, a:{b:"one", c:5}, b:4} = {a:3} U {a:{b:"one", c:5}} U {b:4}

  17. Structural Recursion
     Example: retrieve all integers in the data
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = f($T)
       f({})        = {}
       f($V)        = if isInt($V) then {result: $V} else {}
     [Figure: an input tree with edges a, b, c over the values 3, "one", 5, 4, and the output tree with three result edges to 3, 5, 4]
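The four equations above translate almost line by line into ordinary recursive code. The sketch below is one possible Python reading, under an assumed encoding that is not from the slides: a set of edges is a list of (label, subtree) pairs, and an atomic value is a plain Python int or str.

```python
def ints(t):
    """Structural recursion: return one `result` edge per integer leaf."""
    if isinstance(t, list):
        # f(T1 U T2) = f(T1) U f(T2); f({L: T}) = f(T); f({}) = {}
        out = []
        for _label, sub in t:
            out.extend(ints(sub))
        return out
    # f(V) = if isInt(V) then {result: V} else {}
    if isinstance(t, int) and not isinstance(t, bool):
        return [("result", t)]
    return []

# The data from the "sets with a union operator" slide.
data = [("a", 3), ("a", [("b", "one"), ("c", 5)]), ("b", 4)]
print(ints(data))
```

Labels are ignored entirely, which is why the function reaches integers at any depth.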

  18. Structural Recursion
     What does this do?
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = if $L = a then {b: f($T)} else {$L: f($T)}
       f({})        = {}
       f($V)        = $V

  19. Structural Recursion
     What does this do?
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = {$L: {$L: f($T)}}
       f({})        = {}
       f($V)        = $V
     Input = tree with n nodes
     Output = ???

  20. Structural Recursion
     Example: increase all engine prices by 10%
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = if $L = engine then {$L: g($T)} else {$L: f($T)}
       f({})        = {}
       f($V)        = $V
       g($T1 U $T2) = g($T1) U g($T2)
       g({$L: $T})  = if $L = price then {$L: 1.1*$T} else {$L: g($T)}
       g({})        = {}
       g($V)        = $V
     [Figure: input and output trees with engine and body subtrees containing part and price edges; prices under engine go from 1000 and 100 to 1100 and 110, prices under body stay at 1000 and 100]
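The two mutually dependent functions can be sketched in Python as follows. This is an illustration under assumptions that are not in the slides: edge sets are lists of (label, subtree) pairs, every price edge leads directly to a number, and the sample tree `car` is invented. Rounding to two decimals is added only to hide floating-point noise.

```python
def g(t):
    """Inside an engine: scale every price by 1.1, recurse everywhere else."""
    if not isinstance(t, list):
        return t                                          # g(V) = V
    return [(l, round(1.1 * s, 2)) if l == "price" else (l, g(s))
            for l, s in t]

def f(t):
    """Outside an engine: copy the tree, switching to g below `engine`."""
    if not isinstance(t, list):
        return t                                          # f(V) = V
    return [(l, g(s)) if l == "engine" else (l, f(s)) for l, s in t]

# Hypothetical input: one engine and one body, each with a priced part.
car = [
    ("engine", [("part", [("price", 1000)]), ("price", 100)]),
    ("body",   [("part", [("price", 1000)]), ("price", 100)]),
]
print(f(car))
```

The function f only decides where the update applies; g performs it. Prices under body are never touched because g is never called there.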

  21. Structural Recursion
     Retrieve all subtrees reachable by (a.b)*.a
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = if $L = a then g($T) U $T else { }
       f({})        = { }
       f($V)        = { }
       g($T1 U $T2) = g($T1) U g($T2)
       g({$L: $T})  = if $L = b then f($T) else { }
       g({})        = { }
       g($V)        = { }
     [Figure: a path labeled a, b, a]

  22. Structural Recursion: General Form
       f1($T1 U $T2) = f1($T1) U f1($T2)
       f1({$L: $T})  = E1($L, f1($T), ..., fk($T), $T)
       f1({})        = { }
       f1($V)        = { }
       . . . .
       fk($T1 U $T2) = fk($T1) U fk($T2)
       fk({$L: $T})  = Ek($L, f1($T), ..., fk($T), $T)
       fk({})        = { }
       fk($V)        = { }
     Each of E1, ..., Ek consists only of {_ : _}, U, if_then_else_

  23. Evaluating Structural Recursion
     Recursive Evaluation:
     • Compute the functions recursively, starting with f1 at the root
     Termination is guaranteed. How efficiently can we evaluate this?

  24. Structural Recursion
     Consider this:
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = {$L: f($T), $L: f($T)}
       f({})        = {}
       f($V)        = $V

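  25. Naive Recursive Evaluation
     [Figure: an input chain with edges a, b, c, d next to its output tree, in which every edge is duplicated at each level (2 a-edges, 4 b-edges, 8 c-edges, ...)]
     Input tree = n nodes
     Output tree = 2^(n+1) - 1 nodes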
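The exponential blow-up is easy to reproduce. The Python sketch below assumes, as the figure suggests, that the input is a chain and that n counts its edges. The encoding (lists of (label, subtree) pairs) and the helper names `chain`, `double` and `size` are illustrative choices, not part of the lecture.

```python
def chain(labels):
    """Build a chain a -> b -> c ... ending in the empty set {}."""
    t = []
    for label in reversed(labels):
        t = [(label, t)]
    return t

def double(t):
    """Naive evaluation of f({L: T}) = {L: f(T), L: f(T)}."""
    if not isinstance(t, list):
        return t
    out = []
    for label, sub in t:
        out.append((label, double(sub)))   # f(T) is computed ...
        out.append((label, double(sub)))   # ... and then recomputed
    return out

def size(t):
    """Number of nodes in a tree (atomic leaves count as one node)."""
    if not isinstance(t, list):
        return 1
    return 1 + sum(size(sub) for _label, sub in t)

for n in range(5):
    print(n, size(double(chain("a" * n))))
```

Every level doubles the work of the level below it, so both the running time and the output size grow as 2^(n+1) - 1 for a chain with n edges.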

  26. Efficient Recursive Evaluation
     Recursive evaluation with function memoization. PTIME complexity.
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = {$L: f($T), $L: f($T)}
       f({})        = {}
       f($V)        = $V
     [Figure: the input chain a, b, c, d and an output graph in which each level has two parallel edges (a a, b b, c c, d d) to a single shared node]
     Alternatively: apply the function in parallel to each input edge → Bulk Evaluation
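Memoization can be sketched by caching the result of f per input node, so that the two copies of f($T) become two edges to one shared output node. The Python below is an illustration under assumed encodings (lists of (label, subtree) pairs, cache keyed by Python object identity); `double_memo`, `distinct_nodes` and `chain` are hypothetical names.

```python
def chain(labels):
    """Build a chain a -> b -> c ... ending in the empty set {}."""
    t = []
    for label in reversed(labels):
        t = [(label, t)]
    return t

def double_memo(t, memo=None):
    """f({L: T}) = {L: f(T), L: f(T)}, computing f once per input node."""
    if memo is None:
        memo = {}
    if not isinstance(t, list):
        return t
    key = id(t)                       # identity of the input node
    if key not in memo:
        out = []
        memo[key] = out               # register before recursing
        for label, sub in t:
            shared = double_memo(sub, memo)
            out.append((label, shared))
            out.append((label, shared))   # second edge to the same node
    return memo[key]

def distinct_nodes(t, seen=None):
    """Count distinct non-atomic nodes in a possibly shared structure."""
    seen = set() if seen is None else seen
    if not isinstance(t, list) or id(t) in seen:
        return 0
    seen.add(id(t))
    return 1 + sum(distinct_nodes(sub, seen) for _label, sub in t)

print(distinct_nodes(double_memo(chain("abcd"))))
```

The output is now a graph with shared nodes rather than a tree. It has one node per input node and two edges per input edge, which is what keeps the evaluation polynomial.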

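  27. Bulk Evaluation
     Sometimes f doesn't return anything: use ε edges
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = if $L = c then $T else f($T)
       f({})        = {}
       f($V)        = $V
     [Figure: an input graph with edges a, b, c, d and the output graph, in which edges that produce nothing are replaced by ε edges]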

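  28. Epsilon Edges
     Meaning of ε edges:
     [Figure: a node with edges a, b and an ε edge to a node with edges c, d is equal to a node that carries the edges a, b, c, d directly]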
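The meaning of ε edges suggests a direct elimination procedure: a node owns every labeled edge that can be reached from it through ε edges alone. The Python sketch below assumes a graph stored as a dictionary from node id to a list of (label, target) pairs, with `None` standing for ε. That representation and the name `eliminate_eps` are assumptions made for the example.

```python
EPS = None  # label used for epsilon edges in this sketch

def eliminate_eps(graph):
    """Return an equivalent graph without epsilon edges."""
    def closure(node):
        # All nodes reachable from `node` via epsilon edges only.
        seen, stack = {node}, [node]
        while stack:
            x = stack.pop()
            for label, y in graph.get(x, []):
                if label is EPS and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    return {
        node: [(label, y)
               for m in sorted(closure(node))
               for label, y in graph.get(m, [])
               if label is not EPS]
        for node in graph
    }

# Node 0 has an a-edge and an epsilon edge to node 2, which has c and d.
g = {0: [("a", 1), (EPS, 2)], 1: [], 2: [("c", 3), ("d", 4)], 3: [], 4: []}
print(eliminate_eps(g)[0])
```

The sketch keeps nodes that become unreachable after elimination (node 2 here). Removing them would be a separate garbage-collection pass.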

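  29. Epsilon Edges
     Note: union becomes easy to draw with ε edges:
     [Figure: T1 U T2 drawn as a new root with one ε edge to T1 and one ε edge to T2]
     Example:
     [Figure: the union of two trees over the labels a, b, c, d, e, drawn first with ε edges and then with the ε edges eliminated]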

  30. Bulk Evaluation
       f1($T1 U $T2) = f1($T1) U f1($T2)
       f1({$L: $T})  = E1($L, f1($T), ..., fk($T), $T)
       f1({})        = { }
       f1($V)        = { }
       . . . .
       fk($T1 U $T2) = fk($T1) U fk($T2)
       fk({$L: $T})  = Ek($L, f1($T), ..., fk($T), $T)
       fk({})        = { }
       fk($V)        = { }
     Idea: "apply" E1, ..., Ek independently on each edge, then connect with ε edges → PTIME

  31. Bulk Evaluation
     Recall (a.b)*.a:
       f($T1 U $T2) = f($T1) U f($T2)
       f({$L: $T})  = if $L = a then g($T) U $T else { }
       f({})        = { }
       f($V)        = { }
       g($T1 U $T2) = g($T1) U g($T2)
       g({$L: $T})  = if $L = b then f($T) else { }
       g({})        = { }
       g($V)        = { }
     [Figure: an input graph with edges labeled a, b, c, d and the graph produced by bulk evaluation of f and g]

  32. Structural Recursion
     • Can evaluate in two ways:
       • Recursively: memoize the functions' results
       • Bulk: apply all functions on all edges in parallel, connect, eliminate what is useless
     • Complexity: PTIME
       • More precisely: NLOGSPACE
     • Works on graphs with cycles too!

  33. XSL
     • two W3C drafts: XSLT and XPath
       • http://www.w3.org/TR/xpath, 11/99
       • http://www.w3.org/TR/WD-xslt, 11/99
     • in commercial products (e.g. IE 5.0)
     • purpose: stylesheet specification language
       • stylesheet: XML -> HTML
       • in general: XML -> XML

  34. XSL Templates and Rules
     • query = collection of template rules
     • template rule = match pattern + template
     Retrieve all book titles:
       <xsl:template> <xsl:apply-templates/> </xsl:template>
       <xsl:template match="/bib/*/title">
         <result> <xsl:value-of/> </result>
       </xsl:template>

  35. Flow Control in XSL
       <xsl:template> <xsl:apply-templates/> </xsl:template>
       <xsl:template match="a"> <A><xsl:apply-templates/></A> </xsl:template>
       <xsl:template match="b"> <B><xsl:apply-templates/></B> </xsl:template>
       <xsl:template match="c"> <C><xsl:value-of/></C> </xsl:template>

  36. Input:
       <a> <e> <b> <c> 1 </c> <c> 2 </c> </b> <a> <c> 3 </c> </a> </e> <c> 4 </c> </a>
     Output:
       <A> <B> <C> 1 </C> <C> 2 </C> </B> <A> <C> 3 </C> </A> <C> 4 </C> </A>

  37. XSL is Structural Recursion
     Equivalent to:
       f(T1 U T2) = f(T1) U f(T2)
       f({L: T})  = if L = c then {C: T}
                    else if L = b then {B: f(T)}
                    else if L = a then {A: f(T)}
                    else f(T)
       f({})      = {}
       f(V)       = V
     XSL query = single function
     XSL query with modes = multiple functions
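The correspondence can be checked by writing the function f over an XML tree. The Python sketch below uses the standard `xml.etree.ElementTree` module and mimics the four template rules from the flow-control slide. It is an illustration of the structural-recursion reading, not an XSLT processor; `transform` and `RENAME` are hypothetical names, and the input is the example document with whitespace removed.

```python
import xml.etree.ElementTree as ET

RENAME = {"a": "A", "b": "B"}     # rules that wrap and then recurse

def transform(elem):
    """f applied to one element; returns the list of output elements."""
    if elem.tag == "c":
        # {C: T}: copy the text value, like <C><xsl:value-of/></C>
        out = ET.Element("C")
        out.text = (elem.text or "").strip()
        return [out]
    # f(T1 U T2) = f(T1) U f(T2): process the children in order
    children = [o for child in elem for o in transform(child)]
    if elem.tag in RENAME:
        out = ET.Element(RENAME[elem.tag])   # {A: f(T)} or {B: f(T)}
        out.extend(children)
        return [out]
    return children                          # default rule: f(T)

SRC = "<a><e><b><c>1</c><c>2</c></b><a><c>3</c></a></e><c>4</c></a>"
result = transform(ET.fromstring(SRC))
print(ET.tostring(result[0], encoding="unicode"))
```

The element e has no rule of its own, so it falls through to the default case and disappears from the output while its children are kept.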

  38. XSL and Structural Recursion
     XSL: trees only; may loop
     Structural Recursion: arbitrary graphs; always terminates
     Add the following rule:
       <xsl:template match="e">
         <xsl:apply-templates select="/"/>
       </xsl:template>
     Result: stack overflow on IE 5.0
