1 / 33

II. XML Data Management

II. XML Data Management. A : XML refresher using material from A. Silverschatz and M. Sapossnek B: - XML-Data Management (1) Query languages: XPATH, XQuery, SQLX C: - Mapping XML data to databases - Native XML Data management. What is XML?.

Jims
Download Presentation

II. XML Data Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. II. XML Data Management A : XML refresher using material from A. Silverschatz and M. Sapossnek B: - XML-Data Management (1) Query languages: XPATH, XQuery, SQLX C: - Mapping XML data to databases - Native XML Data management

  2. What is XML? • Acronym for eXtensible Markup Language • Syntax for structuring data and documents in human-readable form • THE "Syntax of the WEB" • Meta language for defining languages • Bases of many extensions • Namespaces • Stylesheets • Hyperlinks • Schemata • Standardized by W3Chttp://www.w3.org/TR/REC-xml HS / DBSII-03-XML-1

  3. What XML is Not.. • No protocol • Language for describing data • Used as data format in protocols • Protocols may be syntactically defined by XML • No programming languagebut • XML documents may contain code fragments • New languages allow for XML – code as part of the language (Xen, a MS extension of C# ) • Some XML extensions with superimposed PL semantics,rule semantics in XSLT • No magic semantics • Interpretation by humans, applications, standards derived from XML HS / DBSII-03-XML-1

  4. Why XML? • … not a question any more, since widely adopted • Simple • Extensible • Easy to process • Easy to generate • Data interchange critical for networked applications "XML will be the ASCII of the Web: basic, essential, unexciting" Tim Bray ... it is already HS / DBSII-03-XML-1

  5. Prologue <?xml version="1.0"?> <PURCHASE_ORDER> <PO_NUM> PO-1234 </PO_NUM> <CUST_ID> CUST001 </CUST_ID> <ITEM ItemNum ="2"> < QUNTY > 2 </ QUNTY > <PRICE> 14.53 </PRICE> </ITEM> </PURCHASE_ORDER> Attribute Elements XML example • Pre-XML representation of data: • XML representation of the same data: “PO-1234”,”CUST001”,”X9876”,”5”,”14.98” HS / DBSII-03-XML-1

  6. {ItemNum=X9876 } ITEM PRICE 2 14.53 XML example • Graphical representation PURCHASE_ORDER PO_NUM Cust:_ID PO-1234 CUST001 QUNTY XML documents - tree structured - Data an metadata in the same document (as opposed to RDBS) HS / DBSII-03-XML-1

  7. XML Usage • Two basic types of XML usage Document centric (document oriented) • structuring a digital document, including logical layout • primary focus of SGML - predecessor of XML • Data centric • Description of data in a self describing form for later processing • Distinction not totally clear • See purchase order example: If typical document characteristic included (company addr.,customer addr, date, …, company logo) it would be a document oriented usage of XML HS / DBSII-03-XML-1

  8. Document centric XML documents: example <Product> <Name>Variabler Maulschlüssel</Name> <Developer> Full Fabrication Labs, Inc. </Developer> <Summary> Großer, verstellbarer Schraubenschlüssel</Summary> <Description> <Para>Der Engländer besteht aus erstklassigem Stahl und besitzt einen gummierten Handgriff. Die Maulgröße liegt zwischen 0 und 32 mm. </Para> <Para>Sie können..... </Para> <List> <Item> <Link URL="Order.html"> Bestellen </Link></Item> <Item> <Link URL="Wrenches.htm"> Andere Werkzeuge ansehen</Link> </Item> <Item> <Link URL="catalog.zip"> Den Katalog herunterladen</Link> </Item> </List> <Para>Der Schraubenschlüssel kostet 15.33 Euro inkl. MWSt. Wenn Sie jetzt bestellen, erhalten Sie zusätzlich unsere wertlose Hobbybastler-Fibel.</Para> </Description> </Product> Typical:Long text elements HS / DBSII-03-XML-1

  9. Data centric XML documents: example <Orders> <SalesOrder SONumber="12345"> <Customer CustNumber="543"> <CustName> ABC Industries</CustName> <Street> 123 Main St.</Street> <City>Chicago</City> .... </Customer> <Line LineNumber="1"> <Part PartNumber="123"> <Description> <p><b> Turkey wrench:</b><br /> Stainless steel, one-piece construction, lifetime guarantee.</p> </Description> <Price>9.95</Price> </Part> <Quantity>10</Quantity> </Line> ....... </SalesOrder> </Orders> HS / DBSII-03-XML-1

  10. XML Syntax • One, and only one, root element • Sub-elements must be properly nested • A tag must end within the tag in which it was started • Attributes are optional • Attribute values must be enclosed in “” or ‘’ • No data type but 'string' • Processing instructions optional • XML is case-sensitive • <tag> and <TAG> are not the same type of element HS / DBSII-03-XML-1

  11. Why hierarchical "data model"? • Hierachies (nesting) in data bases? Why not? • REDUNDANCY! Multiple items, customers, … occur multiple times in different orders Normalization replaces redundancies by foreign keys OO / OR – Data bases?? • Nesting useful in data transfer • External application does not have access to foreign key / to database. HS / DBSII-03-XML-1

  12. XML Attributes vs Elements • Distinction between subelement and attribute • In the context of documents: • attributes are part of markup • subelement contents part of the basic document contents • In the context of data representation: difference not clear, but confusing • Same information can be represented in two ways • <account account-number = “A-101”> …. </account> • <account> <account-number> A-101 </account-number> … </account> • Suggestion: use attributes for identifiers of elements use subelements for contents HS / DBSII-03-XML-1

  13. DBMS DBMS How to use XML data? • Basic Idea Applicationwith XML-Generator DOM SAX Receiving application XML-Parser Standard- Interfaces How does application know about - syntactical correctness - data semantics ? HS / DBSII-03-XML-1

  14. Different encodings • specified by encoding attribute Correct or not correct ? HS / DBSII-03-XML-1

  15. Correctness of XML documents • Syntactic correctness • Conformance to XML syntax • Document structured according to XML syntax is well-formed • Compare Syntax checker for program • Semantic correctness • Given Meta level description of XML documents:Document Type Definition (DTD) or XML Schema • Document is valid with respect to DTD (Schema) if all definitions and restrictions have been fulfilled • No DTD allowed, applications must know, what is meant • What is semantics?? • Interpretation of tags is a matter of humans and/or the application program: <xyz> could mean "book title" or "first name" or… HS / DBSII-03-XML-1

  16. xmlns: bk = “http://www.example.com/bookinfo/” Namespace declaration Prefix URI (URL) XML Namespaces • Part of XML’s extensibility • Allow autonomous users to differentiate between tags of the same name (using a prefix) • Frees author to focus on the data and decide how to best describe it • Allows multiple XML documents from multiple authors to be merged HS / DBSII-03-XML-1

  17. Namespace • Examples • No prefix: all elements belong to same namespace <BOOK xmlns:bk=“http://www.bookstuff.org/bookinfo”> <bk:TITLE>All About XML</bk:TITLE> <bk:AUTHOR>Joe Developer</bk:AUTHOR> <bk:PRICE currency=‘US Dollar’>19.99</bk:PRICE> <BOOK xmlns=“http://www.bookstuff.org/bookinfo”> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR> HS / DBSII-03-XML-1

  18. DTD and XML schema • Type of XML document defined as • DTD - not expressible in XML syntax • XML schema • Document Type Definition (DTD) • Does not constrain types: all values are strings in XML • Syntax <!ELEMENT elem (subelement-spec)> <!ATTLIST elem (attribute-specs) > HS / DBSII-03-XML-1

  19. DTD: elements and attributes • Example (element decl) <!ELEMENT depositor (customer-name account-number)> <!ELEMENT customer-name (#PCDATA) > <!ELEMENT account-number (#PCDATA)> • Subelements • names of elements • #PCDATA (parsed character data), i.e., character strings • EMPTY (no subelements) or ANY (anything can be a subelement) • Subelement specification may have regular expressions <!ELEMENT bank ( ( account | customer | depositor)+)> • Notation: • “|” : alternatives • “+” : 1 or more occurrences  "?" 0 or one • “*” : 0 or more occurrences HS / DBSII-03-XML-1

  20. DTD example <!DOCTYPE bank [ <!ELEMENT bank ( ( account | customer | depositor)+)> <!ELEMENT account (account-number branch-name balance)> <!ELEMENT customer (customer-name customer-street customer-city)> <!ELEMENT depositor (customer-name account-number)> <!ELEMENT account-number (#PCDATA)> <!ELEMENT branch-name (#PCDATA)> <!ELEMENT balance (#PCDATA)> <!ELEMENT customer-name (#PCDATA)> <!ELEMENT customer-street (#PCDATA)> <!ELEMENT customer-city (#PCDATA)> ]> HS / DBSII-03-XML-1

  21. DTD attributes • Attribute specification : for each attribute • Name • Type of attribute • CDATA • ID (identifier) or IDREF (ID reference) or IDREFS • more on this later • Whether • mandatory (#REQUIRED) has a default value (value), • or neither (#IMPLIED) • Examples • <!ATTLIST account acct-type CDATA “checking”> • <!ATTLIST customer customer-id ID # REQUIRED accounts IDREFS # REQUIRED> HS / DBSII-03-XML-1

  22. DTD attribute ID • At most one attribute of type ID per element • ID attribute value of each element in an XML document must be distinct • ID attribute value is object identifier • attribute of type IDREF must contain the ID value of an element in the same document • attribute of type IDREFS contains a set of (0 or more) ID values. ID value must contain the ID value of an element in the same document • ID, IDREF, IDREFS do not designate a particular domain (no type!) HS / DBSII-03-XML-1

  23. DTD declaration External DTD-declaration<?xml version="1.0"><!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd"><bank> ... </bank> Internal DTD-declaration<!DOCTYPE custDesc [ <!ELEMENT custDesc (#PCDATA)> ]><custDesc> consumer rights protagonist </custDesc> Mixed usage<!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [ <!ATTLIST bankDescr CDATA #REQUIRED>]><bank Descr=" mostly private customers and ATM"> ... </bank> HS / DBSII-03-XML-1

  24. DTD limits • No typing of text elements and attributes • All values are strings, no integers, reals, etc. • Difficult to specify unordered sets of subelements • Order is usually irrelevant in databases • (A | B)* allows specification of an unordered set, but • Cannot ensure that each of A and B occurs only once • How to express: a, b and c in arbitrary order? <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c), ...)> • IDs and IDREFs are untyped • The owners attribute of an account may contain a reference to another account, which is meaningless • owners attribute should ideally be constrained to refer to customer elements HS / DBSII-03-XML-1

  25. XML Schema • XML Schema (XSD): much more expressible Schema language compared to DTD schemas • Typing of values • E.g. integer, string, etc • constraints on min/max values • User defined types • specified in XML syntax, unlike DTDs • More standard representation, but verbose • namespace support • Many more features • List types, uniqueness and foreign key constraints, inheritance Ability to map to RDB,… • significantly more complicated than DTD syntax • Use of XSD recommended HS / DBSII-03-XML-1

  26. <xsd:schema xmlns:xsd=http://www.w3.org/2001/XMLSchema> <xsd:element name=“bank” type=“BankType”/> <xsd:element name=“account”><xsd:complexType> <xsd:sequence> <xsd:element name=“account-number” type=“xsd:string”/> <xsd:element name=“branch-name” type=“xsd:string”/> <xsd:element name=“balance” type=“xsd:decimal”/> </xsd:squence></xsd:complexType> </xsd:element> …..definitions of customer and depositor …. <xsd:complexTypename=“BankType”><xsd:squence> <xsd:element ref=“account” minOccurs=“0” maxOccurs=“unbounded”/> <xsd:element ref=“customer” minOccurs=“0” maxOccurs=“unbounded”/> <xsd:element ref=“depositor” minOccurs=“0” maxOccurs=“unbounded”/> </xsd:sequence> </xsd:complexType> </xsd:schema> XSD example (from Silverschatz)

  27. Using XML • Data exchange  • Data management: • Store, retrieve, query large document sets efficiently • Today's solutions: • Mapping to RDB / ORDB / OODB • "Native" XML data management (not necessarily very different from storing in conventional DB) • Standardized data description: different extensions and applications • Bioinformatic Sequence Markup Language (BSML) • MathML • Scalable Vector Graphics (SVG).. And many, many more • Ressource Description in the web (RDF) … HS / DBSII-03-XML-1

  28. fritz@web.de emailOf Encoded in XML: <?xml version="1.0"?> <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://description.org/schema/"> <Description about="http://www.me.de/~fritz"> <s:Creator>Fritz Müller</s:Creator> </Description> <Description about=fritz@web.de> <s:emailOf> Fritz Müller </s:emailOf> </Description> </RDF> Using XML: RDF with XML syntax RDF-Modell www.me.de/~fritz Homepage Fritz Müller Creator Many of these triples form a graph HS / DBSII-03-XML-1

  29. XML-Doc.(Layout-transf.) XML-Doc.(device spec. Layout) Standard Software(HTML-Browser) Standard-Software(XSL-Processor) XML-Doc.(Daten) Using XML • Layout of documents? • XML documents have logical structure • Layout structure needed for output • Use transformation language to describe device specific transformations Transformation into all kinds of languages (HTML, pdf, …) on all kinds of devices HS / DBSII-03-XML-1

  30. XML transformation • XSLT: The language used for converting XML documents into other forms • Describes how the document is transformed • Expressed as an XML document (.xsl) • Template rules • Patterns match nodes in source document • Templates instantiated to form part of result document • XPath for querying, sorting, etc. • XSL-FO language for describing layout XSL = XSLT + XPATH + XSL-FO HS / DBSII-03-XML-1

  31. XML transformation: example (1) • Document <sales> <summary> <heading>Scootney Publishing</heading> <subhead>Regional Sales Report</subhead> <description>Sales Report</description> </summary> <data> <region> <name>West Coast</name> <quarter number="1" books_sold="24000" /> <quarter number="2" books_sold="38600" /> <quarter number="3" books_sold="44030" /> <quarter number="4" books_sold="21000" /> </region> ... </data> </sales> HS / DBSII-03-XML-1

  32. XML transformation: example (2) • XSL style sheet - mapping to HTML <xsl:param name="low_sales" select="21000"/> <BODY> <h1><xsl:value-of select="//summary/heading"/> </h1> ... <table><tr><th>Region\Quarter</th> <xsl:for-each select="//data/region[1]/quarter"> <th>Q<xsl:value-of select="@number"/></th> </xsl:for-each> ... <xsl:for-each select="//data/region"> <tr><xsl:value-of select="name"/></th> <xsl:for-each select="quarter"> <td><xsl:choose> <xsl:when test="number(@books_sold &lt;= $low_sales)"> color:red;</xsl:when> <xsl:otherwise>color:green;</xsl:otherwise></xsl:choose> <xsl:value-of select="format-number (@books_sold,'###,###')" /> </td> ... <td><xsl:value-of select="format-number(sum(quarter/@books_sold), '###,###')"/> XPath expression XPath: query language on doc trees HS / DBSII-03-XML-1

  33. XML transformation: example (2) • The result HS / DBSII-03-XML-1

More Related