1 / 38

Lecture 13: XQuery XML Publishing, XML Storage

Lecture 13: XQuery XML Publishing, XML Storage. Monday, October 28, 2002. Organization. Project: Next phase, need to form companies Please form group of 3, email  Tessa by Thursday Problems, little extra-credit: One group will have only two people

natala
Download Presentation

Lecture 13: XQuery XML Publishing, XML Storage

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 13: XQueryXML Publishing, XML Storage Monday, October 28, 2002

  2. Organization Project: • Next phase, need to form companies • Please form group of 3, email  Tessa by Thursday • Problems, little extra-credit: • One group will have only two people • One billing, one shipping volunteers to do inventory Homework: • Good practice for the midterm • Try to finish before Monday

  3. Organization Midterm • Next Monday, 11/3 • Missed it ? You will get this score: • MidtermScore = 100 – 1.2(100 – FinalScore) • In other words, you will loose 20% more points than on the final

  4. Overview • DTDs: elements and attributes • XQuery

  5. Very Simple DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

  6. Very Simple DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product> ... </product> ... </company>

  7. Content Model • Element content: what we can put in an element (aka content model) • Content model: • Complex = a regular expression over other elements • Text-only = #PCDATA • Empty = EMPTY • Any = ANY • Mixed content = (#PCDATA | A | B | C)* • (i.e. very restrictied)

  8. Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIST personageCDATA #REQUIRED> <personage=“25”> <name> ....</name> ... </person>

  9. Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIST personageCDATA #REQUIRED idID #REQUIRED managerIDREF #REQUIRED managesIDREFS #REQUIRED > <personage=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ... </person>

  10. Attributes in DTDs Types: • CDATA = string • ID = key • IDREF = foreign key • IDREFS = foreign keys separated by space • (Monday | Wednesday | Friday) = enumeration • NMTOKEN = must be a valid XML name • NMTOKENS = multiple valid XML names • ENTITY = you don’t want to know this

  11. Attributes in DTDs Kind: • #REQUIRED • #IMPLIED = optional • value = default value • value #FIXED = the only value allowed

  12. Using DTDs • Must include in the XML document • Either include the entire DTD: • <!DOCTYPE rootElement [ ....... ]> • Or include a reference to it: • <!DOCTYPE rootElement SYSTEM “http://www.mydtd.org”> • Or mix the two... (e.g. to override the external definition)

  13. FLWR (“Flower”) Expressions FOR ... LET... WHERE... RETURN...

  14. Sample Data for Queries (more or less) <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  15. FOR-WHERE-RETURN Find all book titles published after 1995: FOR$xINdocument("bib.xml")/bib/book WHERE$x/year/text() > 1995 RETURN$x/title Result: <title> abc </title> <title> def </title> <title> ghi </title>

  16. Equivalently (perhaps more geekish) FOR-WHERE-RETURN FOR$xINdocument("bib.xml")/bib/book[year/text() > 1995] /title RETURN$x And even shorter: document("bib.xml")/bib/book[year/text() > 1995] /title

  17. FOR-WHERE-RETURN • Find all book titles and the year when they were published: FOR$xINdocument("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>

  18. FOR-WHERE-RETURN • Notice the use of “{“ and “}” • What is the result without them ? FOR$xINdocument("bib.xml")/ bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

  19. XQuery: Nesting For each author of a book by Morgan Kaufmann, list all books she published: FOR$b IN document(“bib.xml”)/bib,$aIN$b/book[publisher /text()=“Morgan Kaufmann”]/author RETURN <result> { $a, FOR$tIN$b/book[author/text()=$a/text()]/title RETURN$t} </result> In the RETURN clause comma concatenates XML fragments

  20. XQuery Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

  21. Aggregates Find all books with more than 3 authors: FOR$x IN document("bib.xml")/bib/bookWHEREcount($x/author)>3 RETURN$x count = a function that counts avg = computes the averagesum = computes the sumdistinct-values = eliminates duplicates

  22. Aggregates Same thing: FOR$x IN document("bib.xml")/bib/book[count(author)>3] RETURN$x

  23. Aggregates Print all authors who published more than 3 books – be aware of duplicates ! FOR$b IN document("bib.xml")/bib, $aINdistinct-values($b/book/author/text())WHEREcount($b/book[author/text()=$a)>3 RETURN <author> { $a } </author>

  24. XQuery Find books whose price is larger than average: FOR$b in document(“bib.xml”)/bibLET$a:=avg($b/book/price/text()) FOR$x in $b/book WHERE$x/price/text() > $a RETURN$x

  25. FOR-WHERE-RETURN • “Flatten” the authors, i.e. return a list of (author, title) pairs FOR$bINdocument("bib.xml")/bib/book,$xIN$b/title/text(),$yIN$b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer> Result:<answer> <title> abc </title> <author> efg </author></answer><answer> <title> abc </title> <author> hkj </author></answer>

  26. FOR-WHERE-RETURN • For each author, return all titles of her/his books Result:<answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . . </answer> FOR$b IN document("bib.xml")/bib,$xIN$b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer> What aboutduplicateauthors ?

  27. FOR-WHERE-RETURN • Same, but eliminate duplicate authors: FOR$b IN document("bib.xml")/bibLET$a := distinct-values($b/book/author/text())FOR$xIN$aRETURN <answer> <author> $x </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer>

  28. FOR-WHERE-RETURN • Same thing: FOR$b IN document("bib.xml")/bib,$xINdistinct-values($b/book/author/text())RETURN <answer> <author> $x </author> { FOR$yIN$b/book[author/text()=$x]/titleRETURN$y } </answer>

  29. FOR-WHERE-RETURN Find book titles by the coauthors of “Database Theory”: FOR$b IN document("bib.xml")/bib,$xIN$b/book[title/text() = “Database Theory”],$yIN$b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer > Question: Why do we get duplicates ?

  30. Distinct-values Same as before, but eliminate duplicates: FOR$b IN document("bib.xml")/bib,$xIN$b/book[title/text() = “Database Theory”]/author/text(),$yINdistinct-values($b/book[author/text() = $x] /title/text()) RETURN <answer> { $y } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer > distinct-values = a function that eliminates duplicates Need to apply to a collectionof text values, not of elements – note how query has changed

  31. FOR$y in /db/Company/row[city/text()=“Seattle”],$x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name } CoolXQuery SQL and XQuery Side-by-side Product(pid, name, maker)Company(cid, name, city) Find all products made in Seattle FOR$r in document(“db.xml”)/db,$x in $r/Product/row,$y in $r/Company/rowWHERE$x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name } SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle” SQL XQuery

  32. <db> <product> <row> <pid> ??? </pid> <name> ??? </name> <maker> ??? </maker> </row> <row> …. </row> … </product> . . . . </db>

  33. XQuery • FOR$x in expr -- binds $x to each value in the list expr • LET$x := expr -- binds $x to the entire list expr • Useful for common subexpressions and for aggregations

  34. XQuery Find all publishers that published more than 100 books: <big_publishers> { FOR$pINdistinct-values(//publisher/text()) LET$b := /db/book[publisher/text() = $p] WHEREcount($b) > 100 RETURN <publisher> { $p } </publisher> } </big_publishers> $bis a collection of elements, not a single element count = a (aggregate) function that returns the number of elms

  35. XQuery Summary: • FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses List of tuples WHERE Clause List of tuples RETURN Clause Instance of Xquery data model

  36. FOR v.s. LET FOR • Binds node variables iteration LET • Binds collection variables one value

  37. FOR v.s. LET Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ... FOR$xIN /bib/book RETURN <result> { $x } </result> LET$x := /bib/book RETURN <result> { $x } </result> Returns: <result> <book>...</book> <book>...</book> <book>...</book> ... </result>

  38. Collections in XQuery • Ordered and unordered collections • /bib/book/author/text() = an ordered collection: result is in document order • distinct-values(/bib/book/author/text()) = an unordered collection: the output order is implementation dependent • LET$a := /bib/book $a is a collection • $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> <author>...</author> ... </result> RETURN <result> { $b/author } </result>

More Related