Advanced Topics

hlee

Presentation Transcript


  1. Advanced Topics: XML and Databases

  2. XML • Overview • Structure of XML Data • XML Document Type Definition (DTD) • Namespaces • XML Schema • Query and Transformation: XPath, XSLT, XQuery

  3. XML Overview • eXtensible Markup Language (XML). • Related markup languages: Hyper-Text Markup Language (HTML) for document presentation and Standard Generalized Markup Language (SGML) for document management. • XML can handle the structured data typical of a DBMS. • XML is flexible and can also handle semi-structured data that does not fit the rigid schema of a relational DBMS. • XML is the de facto representation for exchanging data between applications on the Web.

  4. XML Overview • Markup language • Separates content from markup. • The markup carries meaning about the content. • E.g., HTML uses markup to describe how a document is presented. • Tags • <title> Database System Concepts </title> • HTML has a fixed set of tags. • XML is extensible: applications can define tags as needed.

  5. XML Overview • Comparison with DBMS • The focus is on the EXCHANGE of data between applications. • Storage and management of XML are more complex than for a relational DBMS because XML is semi-structured. • Because XML data is tagged, a message is self-documenting; the receiver does not need a separate catalog to interpret it. • The format of XML is not rigid, and an application can ignore fields it does not need. • XML is versatile: most browsers are XML enabled and most DBMS vendors support XML data.

  6. Structure of XML Data • An XML document has a single root element, e.g., bank in Figure 10.1. • Element: bank is the root element, and the document also contains customer, account, and depositor elements. • Elements in an XML document must be properly nested, i.e., each start tag must have a matching end tag within the same parent element. • <account> <balance> </balance> </account> is properly nested. • <account> <balance> </account> </balance> is not properly nested. • Figure 10.2: XML can combine unstructured data (text) with semi-structured data. This is one of the strengths of XML for data exchange.
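The nesting rule above can be checked mechanically, since any XML parser rejects a document that is not well formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the third test string are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two nesting examples from the slide.
nested_ok = "<account><balance></balance></account>"
nested_bad = "<account><balance></account></balance>"
# Hypothetical extra case: two top-level elements, i.e., no single root.
two_roots = "<account/><customer/>"

print(is_well_formed(nested_ok))   # True
print(is_well_formed(nested_bad))  # False
print(is_well_formed(two_roots))   # False
```

Note that this checks only well-formedness; it says nothing about whether the document conforms to a DTD or schema.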

  7. Structure of XML Data • Nested data in XML is similar to the output of a join over multiple tables, or to an unnormalized (nested) relational table. • Figure 10.3 shows account elements nested within customer elements. • The advantage is that there is no need to join customer and account. • E.g., the shipping address is stored with each shipment. • The disadvantage is that if the relationship between customer and account is many-to-many, the account information is replicated under every customer that holds the account, with all the usual disadvantages of redundant data.
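As a rough illustration of the point above, the sketch below flattens a nested document into the rows a customer-account join would produce, then lists the accounts that are stored more than once. The document, its root name bank-1, and all values are made up for the example and are not taken from Figure 10.3.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical nested document: account A-102 is shared by two customers,
# so it appears (with its balance) under both of them.
doc = """
<bank-1>
  <customer>
    <customer-name>Joe</customer-name>
    <account><account-number>A-101</account-number><balance>500</balance></account>
    <account><account-number>A-102</account-number><balance>300</balance></account>
  </customer>
  <customer>
    <customer-name>Lisa</customer-name>
    <account><account-number>A-102</account-number><balance>300</balance></account>
  </customer>
</bank-1>
"""

root = ET.fromstring(doc)

# Walking the nesting yields the joined rows directly; no join is needed.
rows = [
    (c.findtext("customer-name"), a.findtext("account-number"), int(a.findtext("balance")))
    for c in root.findall("customer")
    for a in c.findall("account")
]

# Accounts stored more than once are the replicated information.
counts = Counter(number for _, number, _ in rows)
replicated = sorted(number for number, k in counts.items() if k > 1)

print(rows)
print(replicated)
```

If the balance of A-102 changes, every copy has to be updated, which is the usual update anomaly of unnormalized data.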

  8. Structure of XML Data • Element • Subelement • <element> </element> or, for an element with no content, <element/> • Attribute • Figure 10.4 • An attribute is of type string; it cannot be repeated within an element and cannot have subelements. • account is an element; acct-type is an attribute; account-number, branch-name, and balance are subelements of the element account.
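The difference between attributes and subelements shows up directly in how a parser exposes them. This sketch uses Python's xml.etree.ElementTree on an account element shaped like the one described above; the values (A-102, Perryridge, 400) are sample data assumed for illustration.

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking">'
    "<account-number>A-102</account-number>"
    "<branch-name>Perryridge</branch-name>"
    "<balance>400</balance>"
    "</account>"
)

# Attributes are name/value string pairs attached to the element itself.
acct_type = account.get("acct-type")

# Subelements are child nodes, kept in document order.
subelements = [child.tag for child in account]

# Repeating an attribute within one element is not well formed,
# so the parser rejects it.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
    repeated_ok = True
except ET.ParseError:
    repeated_ok = False

print(acct_type, subelements, repeated_ok)
```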

  9. XML Namespace • A namespace lets organizations specify globally unique names for element tags. • Each tag or attribute name is qualified by a URI, and the combination of URI and tag (or attribute) name is unique. • A namespace can be declared in the root element. • <bank xmlns:FB="http://www.FirstBank.com"> …. <FB:branch> <FB:branchname> …. </FB:branchname> <FB:branchaddress> … </FB:branchaddress> </FB:branch> </bank>
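To see that it is the URI, not the prefix, that identifies a name, the sketch below parses a filled-in version of the bank example with Python's xml.etree.ElementTree. The branch name and address values are placeholders assumed for the example, and the query deliberately uses a different prefix (fb) bound to the same URI.

```python
import xml.etree.ElementTree as ET

doc = """
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchaddress>Brooklyn</FB:branchaddress>
  </FB:branch>
</bank>
"""

root = ET.fromstring(doc)
branch = root[0]

# ElementTree expands each prefixed tag to {URI}local-name.
print(branch.tag)  # {http://www.FirstBank.com}branch
# bank itself has no prefix and no default namespace, so it stays unqualified.
print(root.tag)    # bank

# The prefix used in a query is local to the query; only the URI has to match.
ns = {"fb": "http://www.FirstBank.com"}
name = root.find("fb:branch/fb:branchname", ns).text
print(name)        # Downtown
```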

  10. XML DTD • XML documents do not have to conform to any schema or set of pre-defined tags. • However, in most cases, applications require that data conform to a pre-defined set of tags. • An XML DTD • Lists the elements allowed in a document and the subelements allowed within each element. • Does not specify data types or other constraints. • Operators: | (or), + (1 or more), * (0 or more), ? (0 or 1).

  11. XML DTD • Figure 10.6 DTD Example • The bank element consists of one or more account, customer, or depositor elements (in any order). • The account element has subelements account-number, branch-name, and balance. • Elements account-number, branch-name, etc. are of type #PCDATA (parsed character data, i.e., text). • EMPTY – the element has no contents. • ANY – the element can have any subelements. • Attributes must have a type declaration and a default declaration (a default value, #REQUIRED, or #IMPLIED). <!ATTLIST account acct-type CDATA "checking">
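A DTD content model is essentially a regular expression over the sequence of child element names. Python's standard library does not validate against DTDs, so the sketch below hand-translates two content models into regular expressions and checks a parsed document against them. It assumes the declarations are <!ELEMENT bank ((account | customer | depositor)+)> and <!ELEMENT account (account-number, branch-name, balance)>; Figure 10.6 is not reproduced here, so both declarations, the CONTENT_MODELS table, and the conforms helper are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content models, written as regular expressions over the
# space-terminated sequence of child tag names.
CONTENT_MODELS = {
    "bank": r"(account |customer |depositor )+",     # (a | b | c)+
    "account": r"account-number branch-name balance ",  # a, b, c in order
}


def conforms(element: ET.Element) -> bool:
    """Check an element and its descendants against CONTENT_MODELS.

    Elements without an entry in the table are not constrained.
    """
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        child_tags = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, child_tags) is None:
            return False
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<bank><customer/><account><account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name><balance>500</balance></account></bank>"
)
wrong_order = ET.fromstring(
    "<bank><account><account-number>A-101</account-number>"
    "<balance>500</balance><branch-name>Downtown</branch-name></account></bank>"
)
empty_bank = ET.fromstring("<bank/>")

print(conforms(good), conforms(wrong_order), conforms(empty_bank))
```

A real validating parser also checks attribute declarations and #PCDATA content, which this sketch ignores.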

  12. XML DTD • ID, IDREF, and IDREFS (Figure 10.7) • ID • An attribute of type ID gives an element an identifier that is unique across the whole document, i.e., a key for that element. • An element can have at most one attribute of type ID. • <!ATTLIST account account-number ID #REQUIRED> • IDREF • An attribute of type IDREF is a reference to an element; its value must be the ID value of some element in the document. • IDREFS • An attribute of type IDREFS holds a whitespace-separated list of such ID values. • Together, ID, IDREF, and IDREFS capture the primary key and foreign key functionality of the relational data model. • Figure 10.8 shows an example XML document with ID and IDREFS. • An IDREF must point to an ID, but there is no type checking, so it can point to the ID of an account, a customer, or a branch.
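The ID and IDREFS rules amount to two checks: ID values are unique across the document, and every referenced value exists as some ID. The sketch below performs both checks by hand with Python's xml.etree.ElementTree, which does not itself process DTDs. The document, the attribute names owners, customer-id, and accounts, and the two lookup tables that stand in for the ATTLIST declarations are all assumptions for illustration; the dangling reference C999 is planted deliberately.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the spirit of Figure 10.8.
doc = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <account account-number="A-402" owners="C102 C999"/>
  <customer customer-id="C100" accounts="A-401"/>
  <customer customer-id="C102" accounts="A-401 A-402"/>
</bank-2>
"""

# Stand-ins for the DTD's ATTLIST declarations (assumed, not parsed from a DTD).
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

root = ET.fromstring(doc)

# Pass 1: collect ID values; they must be unique across the whole document.
ids = {}
for elem in root.iter():
    id_attr = ID_ATTRS.get(elem.tag)
    if id_attr is None:
        continue
    value = elem.get(id_attr)
    if value in ids:
        raise ValueError(f"duplicate ID {value!r}")
    ids[value] = elem.tag

# Pass 2: every value in an IDREFS attribute must be the ID of some element.
dangling = []
for elem in root.iter():
    refs_attr = IDREFS_ATTRS.get(elem.tag)
    if refs_attr is None:
        continue
    for ref in elem.get(refs_attr, "").split():
        if ref not in ids:
            dangling.append((elem.get(ID_ATTRS[elem.tag]), ref))

print(dangling)  # [('A-402', 'C999')]
```

As with a DTD, the second pass only checks that the target exists. An owners value of A-401 would pass even though it names an account rather than a customer; catching that would require comparing ids[ref] with an expected element type.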

  13. XML Schema – Figure 10.9 • XML Schema is closer in spirit to relational schemas. • It is closely associated with namespaces, e.g., xmlns:xsd="http://www.w3.org/2001/XMLSchema" • Supports uniqueness constraints for primary keys and referential constraints for foreign keys. • Each element has a name and a type. • A complexType (account, customer, or depositor) is a sequence of subelements. • The complexType BankType is a sequence of references to elements of type account, customer, or depositor. • More precisely typed than an XML DTD, where an IDREF can refer to any element regardless of whether it is an account or a customer. • minOccurs and maxOccurs are multiplicity constraints.

  14. Query and Transformation of XML • Three kinds of query languages • XPath provides path expressions and is the building block for the other two languages. • XSLT is a transformation language. • Originally designed to convert XML to HTML. • XSLT can transform one XML document into another, so it also serves as a query language. • Most widely supported. • XQuery is more like an object query language. • Tree model of XML data • The document has a root node. • Nodes are either elements or attributes. • Element nodes can have children, which are subelements or attributes of that element.

  15. Query and Transformation of XML • Path expression • A sequence of steps /xx/yy/zz, where the initial / refers to the root of the document. • The result is a set of values from the XML document. • /bank-2/customer/customer-name on Figure 10.8 returns <customer-name>Joe</customer-name>, <customer-name>Lisa</customer-name>, and <customer-name>Mary</customer-name>. • /bank-2/customer/customer-name/text() returns only the values, not the tagged elements. • /bank-2/account/@account-number returns the set of account numbers. @ reads attribute values only; it does not follow IDREFS references to the elements they name. • Selection • /bank-2/account[balance > 400] • /bank-2/account[balance > 400]/@account-number • Count • /bank-2/account[count(customer) > 2] • Skip intermediate elements • /bank-2//name
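The path expressions above can be tried out with Python's xml.etree.ElementTree, with two caveats. First, ElementTree implements only a subset of XPath: it has no text() step, no trailing @attribute step, and no numeric comparison predicates, so those parts are done in Python here. Second, its paths are evaluated relative to an element, so /bank-2/customer becomes ./customer on the bank-2 root. The customer names come from the slide; the account numbers and balances are made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>900</balance></account>
  <account account-number="A-403"><balance>300</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Lisa</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
</bank-2>
"""

root = ET.fromstring(doc)  # root is the bank-2 element

# /bank-2/customer/customer-name: the matching elements themselves.
name_elements = root.findall("./customer/customer-name")

# .../text(): only the values, read from each element's .text.
names = [e.text for e in name_elements]

# /bank-2/account/@account-number: attribute values, read with .get().
account_numbers = [a.get("account-number") for a in root.findall("./account")]

# /bank-2/account[balance > 400]/@account-number: predicate applied in Python.
over_400 = [
    a.get("account-number")
    for a in root.findall("./account")
    if float(a.findtext("balance")) > 400
]

# // skips intermediate elements: customer-name at any depth below the root.
anywhere = [e.text for e in root.findall(".//customer-name")]

print(names, account_numbers, over_400, anywhere)
```

A full XPath engine would evaluate the original expressions directly; the Python filtering here only imitates their results.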

  16. BMGTG402 Namespace • <402s04grade xmlns:402s04="http://www.rhsmith.umd.edu/is/aqiuol/402s04"> <402s04:grade> <402s04:student> …. </402s04:student> <402s04:team> … </402s04:team> </402s04:grade>
