XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Email: artem@wayne.edu Wayne State University Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi
Introduction • XML has emerged as the standard for representing and exchanging data on the World Wide Web. • The growing number of XML documents creates a need to store and query them efficiently.
Current approaches to storing and querying XML documents • Native XML repositories, e.g., Software AG’s Tamino, eXcelon’s XIS. • XML-enabled commercial database systems such as SQL Server, Oracle, and DB2. • Using an RDBMS/ODBMS to store and query XML documents.
Issues of the relational approach • Schema Mapping • XML data model needs to be mapped into the relational model • Data Mapping • XML documents need to be shredded and composed into tuples to be inserted into the relational database • Query Mapping • XML queries need to be translated into SQL queries • Reverse Data Mapping • Query results need to be tagged to XML format.
Our contributions • We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents. • Improvements over the existing algorithms • Losslessness • Efficient support for XML queries • Completeness (recursion, set-valued attributes, DTD operators)
Outline of the talk • Introduction of XML DTDs • Mapping DTDs to relational schemas • Simplifying DTDs • Creating and inlining DTD graphs • Generating relational schemas • An example • Conclusions and future work
An overview of DTDs • A DTD example
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, date, subject?, body)>
  <!ATTLIST memo security CDATA #IMPLIED>
  <!ATTLIST memo lang CDATA #IMPLIED>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (para+)>
  <!ELEMENT para (#PCDATA)>
]>
DTD: Document Type Definition • <!DOCTYPE root-element [ doctype-declaration... • <!ELEMENT element-name content-model>, content model operators: “|”, “,”, “*”, “+”, “?” • <!ATTLIST element-name attr-name attr-type attr-default ...>
DTD: Document Type Definition (cont’d) • <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements • attribute types: • CDATA: any value is allowed (the default) • (value|...): enumeration of allowed values • ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element) • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated) • attribute defaults: • #REQUIRED: the attribute must be explicitly provided • #IMPLIED: the attribute is optional, no default provided • "value": if not explicitly provided, this value is inserted by default • #FIXED "value": as above, but only this value is allowed
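The declaration syntax above can be illustrated with a small sketch that pulls ELEMENT and ATTLIST declarations out of a DTD fragment. This is not part of ODTDMap; the regular expressions, the names `ELEMENT_RE`, `ATTLIST_RE`, `elements` and `attributes`, and the `#IMPLIED` default in the sample are assumptions made for the illustration. It only handles one attribute per ATTLIST and attribute types without spaces.

```python
import re

# Fragment of the memo DTD; the #IMPLIED default is an assumption of this sketch.
DTD = """
<!ELEMENT memo (to, from, date, subject?, body)>
<!ATTLIST memo security CDATA #IMPLIED>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA)>
"""

# <!ELEMENT element-name content-model>
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.S)
# <!ATTLIST element-name attr-name attr-type attr-default>
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s*>", re.S)

# element name -> content model, exactly as written in the DTD
elements = dict(ELEMENT_RE.findall(DTD))
# list of (element, attribute, type, default) tuples
attributes = ATTLIST_RE.findall(DTD)

for name, model in elements.items():
    print(name, "->", model)
print(attributes)
```

A real DTD parser would also need to handle comments, entities, enumerated attribute types, and multi-attribute ATTLIST declarations, which this sketch deliberately ignores.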
Mapping DTDs to relational schemas • Simplifying DTDs • Creating and inlining DTD graphs • Generating relational schemas
Simplifying DTDs • A DTD might be very complex due to nesting, e.g., <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> • An XML query language is concerned with: • The parent-child relationships between XML elements • The relative order relationships between siblings (add an ordinal attribute to each relation)
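To make the role of the ordinal attribute concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (`para` with columns `id`, `parent_id`, `ordinal`, `value`) is a hypothetical schema chosen for illustration, not the schema ODTDMap generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation for <para> elements; 'ordinal' records sibling order.
conn.execute(
    "CREATE TABLE para ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, ordinal INTEGER, value TEXT)"
)

# Rows are inserted out of document order on purpose.
rows = [
    (1, 10, 2, "second paragraph"),
    (2, 10, 1, "first paragraph"),
    (3, 10, 3, "third paragraph"),
]
conn.executemany("INSERT INTO para VALUES (?, ?, ?, ?)", rows)

# Sorting by the ordinal restores the sibling order under parent 10.
ordered = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM para WHERE parent_id = ? ORDER BY ordinal", (10,)
    )
]
print(ordered)
```

Because relational tables are unordered sets of tuples, some explicit column of this kind is needed before sibling order can be recovered by a query.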
DTD simplification rules
1. e+ → e*
2. e? → e
3. (e1 | … | en) → (e1, …, en)
4. (a) (e1, …, en)* → (e1*, …, en*) (b) e** → e*
5. (a) …, e, …, e, … → …, e*, …, …
   (b) …, e, …, e*, … → …, e*, …, …
   (c) …, e*, …, e, … → …, e*, …, …
   (d) …, e*, …, e*, … → …, e*, …, …
Example of simplifying a DTD <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> simplified to <!ELEMENT a (b*, c*, d, e, f, g*, h*)>
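The simplification rules can be sketched in a few lines of Python. This is an illustrative implementation written for this summary, not the authors' code: the function name `simplify` and the parsing strategy are assumptions. It relies on the observation that, once the rules are applied, an element ends up starred exactly when it sits under a `*` or `+` (directly or through an enclosing group) or occurs more than once. `#PCDATA`, `EMPTY` and `ANY` content are not handled.

```python
import re

def simplify(model):
    """Flatten a DTD content model into a list of 'e' / 'e*' items."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    pos = 0

    def parse_group():
        # Sequence or choice; '|' is treated like ',' (rule 3).
        nonlocal pos
        items = []
        while True:
            items.extend(parse_term())
            if pos < len(tokens) and tokens[pos] in ",|":
                pos += 1
            else:
                return items

    def parse_term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = parse_group()
            pos += 1  # skip the closing ')'
        else:
            items = [(tok, False)]
        # Consume suffix operators: '?' is dropped (rule 2); '+' and '*'
        # star every element inside (rules 1, 4a, 4b).
        while pos < len(tokens) and tokens[pos] in "?*+":
            if tokens[pos] in "*+":
                items = [(name, True) for name, _ in items]
            pos += 1
        return items

    # Rule 5: merge repeated elements into one starred occurrence,
    # kept at the position of the first occurrence.
    order, starred = [], {}
    for name, star in parse_group():
        if name in starred:
            starred[name] = True
        else:
            order.append(name)
            starred[name] = star
    return ", ".join(n + ("*" if starred[n] else "") for n in order)

print(simplify("((b+, c*, d?)?, (e?, f, (g*, h?)+)?)"))
```

Applied to the nested content model from the example slide, the sketch yields the same flat model shown there.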
Creating and inlining DTD graphs • We create a DTD graph based on the simplified DTD. • Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either '*' (star edge) or ',' (normal edge), where the label ',' is not shown for simplicity. • Idea: inline a child c into its parent p if p can contain at most one occurrence of c. • Rationale: inlined elements will produce a relation.
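As a rough illustration of Definition 3.2, the following sketch builds the edge list of a DTD graph from already simplified content models. The function name `build_dtd_graph`, the edge-tuple representation `(parent, child, label)`, and the `element/@attribute` naming for attribute nodes are assumptions of this sketch.

```python
def build_dtd_graph(simplified, attributes=None):
    """Return DTD graph edges as (parent, child, label) tuples.

    simplified: element name -> simplified content model, e.g. "b*, c"
    attributes: element name -> list of attribute names (optional)
    label is '*' for a star edge and ',' for a normal edge.
    """
    edges = []
    for parent, model in simplified.items():
        for item in (part.strip() for part in model.split(",")):
            if not item:
                continue  # leaf element with text-only content
            if item.endswith("*"):
                edges.append((parent, item[:-1], "*"))
            else:
                edges.append((parent, item, ","))
    for element, names in (attributes or {}).items():
        for name in names:
            # An attribute occurs at most once per element: normal edge.
            edges.append((element, f"{element}/@{name}", ","))
    return edges

# Simplified memo DTD from the overview slide.
memo = {
    "memo": "to, from, date, subject, body",
    "body": "para*",
    "to": "", "from": "", "date": "", "subject": "", "para": "",
}
edges = build_dtd_graph(memo, {"memo": ["security", "lang"]})
for edge in edges:
    print(edge)
```

For the memo DTD this produces one star edge (body to para) and normal edges everywhere else.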
Inlinable node and subtree, shared node • Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge. • Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e). • Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
Inlining • Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: b is inlined into a. • Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so it is not inlined. • Case 3: Node a is connected to b by a star edge: b is not inlined.
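The three cases can be sketched as follows. This is a simplified reading of Definitions 3.3 to 3.5, not the authors' algorithm: the names `classify` and `inline`, the label `star_or_root`, and the choice to let every non-inlinable node head its own group are assumptions. The traversal follows normal edges through inlinable nodes only, so a child of a shared node stays with that shared node.

```python
from collections import defaultdict

def classify(edges):
    """Label each node 'inlinable', 'shared', or 'star_or_root'."""
    incoming = defaultdict(list)
    nodes = set()
    for parent, child, label in edges:
        nodes.update((parent, child))
        incoming[child].append(label)
    kinds = {}
    for node in nodes:
        labels = incoming[node]
        if len(labels) > 1:
            kinds[node] = "shared"        # Case 2
        elif labels == [","]:
            kinds[node] = "inlinable"     # Case 1
        else:
            kinds[node] = "star_or_root"  # Case 3, or no incoming edge
    return kinds

def inline(edges):
    """Group each non-inlinable node with the nodes inlined into it."""
    kinds = classify(edges)
    normal_children = defaultdict(list)
    for parent, child, label in edges:
        if label == ",":
            normal_children[parent].append(child)
    groups = {}
    for root, kind in kinds.items():
        if kind == "inlinable":
            continue
        members, stack = [], [root]
        while stack:
            node = stack.pop()
            members.append(node)
            stack.extend(
                c for c in normal_children[node] if kinds[c] == "inlinable"
            )
        groups[root] = sorted(members)
    return groups

# b: Case 1 (inlined into a); c: Case 2 (shared); d, e: Case 3 (star edge).
demo_edges = [
    ("a", "b", ","), ("a", "c", ","), ("d", "c", ","),
    ("a", "d", "*"), ("a", "e", "*"),
]
groups = inline(demo_edges)
print(groups)
```

Each edge is examined a constant number of times, so the sketch runs in time linear in the size of the graph.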
Complexity of inlining • Theorem 3.7 (Time Complexity) The time complexity of our inlining algorithm is O(n) where n is the number of elements in the input DTD.
Generating schema mapping info. • Definition 3.8 (σ-mapping) σ is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) will return the corresponding relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) will return the corresponding relation that is used to store a of e.
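The mapping of Definition 3.8 can be pictured as a plain lookup table. In the sketch below, the grouping of the memo DTD into two relations (`memo` and `para`) is written by hand as an assumption, based on the inlining idea that only the star-edge child `para` needs its own relation; the relation names and the `e.a` key format are likewise illustrative.

```python
# Hand-written grouping for the memo DTD (an assumption of this sketch):
# relation name -> element and attribute types stored in that relation.
groups = {
    "memo": ["memo", "memo.security", "memo.lang",
             "to", "from", "date", "subject", "body"],
    "para": ["para"],
}

# The mapping from Definition 3.8 as a dictionary:
# element or attribute type -> relation that stores it.
mapping = {
    member: relation
    for relation, members in groups.items()
    for member in members
}

print(mapping["subject"])        # relation storing element 'subject'
print(mapping["memo.security"])  # relation storing attribute 'security' of 'memo'
print(mapping["para"])           # relation storing element 'para'
```

Data mapping and query mapping can then consult this table to decide which relation a given element or attribute should be written to or read from.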
Conclusions • We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones. • It is lossless in the sense that one can reconstruct the original XML document in the given document order, based on the target relational schema generated by ODTDMap. • It has efficient support for recursive queries and schemas. • It defines how to map set-valued XML attributes. • Experimental results showed good performance and scalability of the algorithm.
Future work • Extending our work to XML Schema to support data types other than the string type. • Maintaining ID/IDREF/IDREFS in terms of key and foreign key constraints.