Main challenges in XML/Relational mapping

Juha Sallinen, Hannes Tolvanen: Main challenges in XML/Relational mapping

Agenda • Introduction: XML and databases • Objectives of the study • Findings • Conclusions

Introduction: XML and databases

Basic definitions • XML/relational mapping means data transformation between XML and relational data models • Mapping method is the way the mapping is done

Native vs. Relational • Why store XML documents in a relational database and not in a native XML database? • Immaturity of current native XML database technology • Emerging technology - no ”de facto” standard • Well-working relational databases currently in use • Efficient and usable • May have been in use for years

Mapping dilemma • The XML data model supports much more flexible data structures than the relational model • Two fundamental differences: • XML tags • Nested structure of XML elements vs. flat structure of relational tables • If an XML document does not originate from another relational data source, the data may not fit a relational schema very well

Dichotomy of mapping methods • There are two fundamentally different techniques of storing XML documents in a relational database • LOB presentation • Composed presentation

LOB presentation • LOB stands for Large Object • One XML document is put into a single column of a relational table • At least one column for indexing is also needed • Does not take full advantage of a classical relational database (no XML extensions) • Not possible to use SQL to query individual XML elements • Not a very interesting choice!
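LOB presentation can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not part of the original study: the table name xml_docs and the columns doc_id and doc are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one key column for indexing plus one column
# holding the whole XML document as an opaque value.
conn.execute("CREATE TABLE xml_docs (doc_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

doc = "<addressbook><personname>eddy example</personname></addressbook>"
conn.execute("INSERT INTO xml_docs (doc_id, doc) VALUES (?, ?)", (1, doc))

# Retrieving the original document by its key is trivial ...
stored = conn.execute(
    "SELECT doc FROM xml_docs WHERE doc_id = ?", (1,)
).fetchone()[0]

# ... but classical SQL sees only a string, so element-level access falls
# back to substring matching, which knows nothing about XML structure.
hits = conn.execute(
    "SELECT doc_id FROM xml_docs WHERE doc LIKE ?", ("%eddy example%",)
).fetchall()
```

The LIKE query matches the text anywhere in the document, including inside other elements, which is why the slide calls element queries impossible in this presentation.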

Composed presentation • Data structure of an XML document is ”shredded” over one or more tables • Example: Different elements to different columns • Multiple ways to do this • Table-based and object-relational mapping will be introduced later
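A minimal sketch of the "different elements to different columns" idea, using Python's standard library. The table layout (an addressbook table with an id key and one column per child element) is an assumption chosen for illustration; the document is the address book example used later in the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<addressbook>
  <personname>eddy example</personname>
  <address>mannerheimintie 10, 00000 helsinki</address>
</addressbook>"""

conn = sqlite3.connect(":memory:")
# Assumed relational schema: one row per document, one column per child element.
conn.execute(
    "CREATE TABLE addressbook (id INTEGER PRIMARY KEY, personname TEXT, address TEXT)"
)

# "Shred" the document: each child element's text goes into its own column.
root = ET.fromstring(doc)
conn.execute(
    "INSERT INTO addressbook (personname, address) VALUES (?, ?)",
    (root.findtext("personname"), root.findtext("address")),
)

# Element values are now ordinary columns and can be queried with SQL.
rows = conn.execute(
    "SELECT personname FROM addressbook WHERE address LIKE ?", ("%helsinki%",)
).fetchall()
```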

Objectives of the study

Objectives of the study • Find and explain the main issues to be considered when converting XML schema to relational schema • In other words: The main challenges that have to be taken into account by • Designers of XML/relational mapping methods • Users who need to map the data explicitly • Find and describe briefly two general mapping methods based on composed presentation

Findings

Issues to consider in mapping • Some of the most essential data characteristics • Existence of schema definition document • Stability of the schema • Degree of structure • Usage model for data • Queries against the database • Requirement of preserving ”hidden” information • DBMS implementation • not covered by the study, because scope was limited to the classical relational model

Data characteristics: Existence of XML schema definition • A schema definition states how the structure of XML documents conforming to the schema is restricted • XSD (XML Schema Definition) and DTD (Document Type Definition) are currently the dominant standards for defining an XML schema. • If we have the schema definition, the conversion to a relational schema is based on it. • If we do not have the schema definition, we have to guess how the structure of the given XML vocabulary is restricted. • The guesses are based on the data in instances of the vocabulary (XML documents). In other words, we extract the schema from the available data. • This is not unproblematic, as the next example shows

Data characteristics: Existence of XML schema definition 2 - Example • Illustration of the problem of extracting the schema from data: <addressbook> <personname>eddy example</personname> <address>mannerheimintie 10, 00000 helsinki</address> </addressbook> • From this document we might deduce that the schema should be restricted to <!ELEMENT addressbook (personname, address)> <!ELEMENT personname (#PCDATA)> <!ELEMENT address (#PCDATA)>

Data characteristics: Existence of XML schema definition 2 – Example continued • But if the following document is received from the data source, we have to extend our relational schema, discard the data that the relational schema does not support (the summer cottage's address), or combine the two address fields: <addressbook> <personname>person2</personname> <address>jämeräntaival 10, 02150 espoo</address> <summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress> </addressbook> • We can alter the database schema by adding an extra column to the table mapped from the addressbook element to hold the new information • However, this solution cannot be applied unless we know that the relation between person and summer cottage is 1:1. We might receive documents in which a person has several summer cottage addresses, and we would again have to alter the database schema, this time by creating a property table for the addresses.
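The guessing step can be sketched as follows. This is an illustrative heuristic, not the method of the study: the helper max_child_counts is hypothetical, and it records, for each child element, the largest number of occurrences seen in any one document. Children seen at most once are candidates for columns; children seen more than once need a property table. The guess only reflects the documents observed so far.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def max_child_counts(documents):
    """Return, per child element name, the maximum number of occurrences
    seen in any single document (hypothetical helper for illustration)."""
    maxima = {}
    for text in documents:
        counts = Counter(child.tag for child in ET.fromstring(text))
        for tag, n in counts.items():
            maxima[tag] = max(maxima.get(tag, 0), n)
    return maxima

docs = [
    "<addressbook><personname>eddy example</personname>"
    "<address>mannerheimintie 10, 00000 helsinki</address></addressbook>",
    "<addressbook><personname>person2</personname>"
    "<address>jämeräntaival 10, 02150 espoo</address>"
    "<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>"
    "</addressbook>",
]

maxima = max_child_counts(docs)
# Seen at most once per document: can be a column of the addressbook table.
columns = sorted(tag for tag, n in maxima.items() if n == 1)
# Seen more than once in some document: needs a separate property table.
property_tables = sorted(tag for tag, n in maxima.items() if n > 1)
```

With only these two documents, summerCottageAddress looks like a plain column. A later document containing two summer cottage addresses would move it to the property-table list, which is exactly the schema change the slide warns about.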

Evolving schema • If the schema of an XML vocabulary is defined but undergoes changes, corresponding changes must be made to the relational schema • Changes to the relational schema are not always as easy to make as in the previous example (if the composed approach is used) • The likelihood of the schema changing should be evaluated.

Degree of structure of the XML schema • Categorization used in the study: • Structured data • Data is totally independent of the presentation used to describe it. • Document can be navigated without examining it first • Semi-structured data • Some blocks of the document may contain optional parts • Marked-up text • Documents require the preservation of ”hidden” information • E.g. HTML documents • These terms are used with different meanings in the literature. The information on the following slide is based on the definitions given on this slide.

Degree of structure of the XML schema • Structured documents can easily be mapped to a database using composed presentation. Semi-structured documents can also be decomposed if a schema definition is provided. If mixed content is included, it depends on the usage of the data whether LOB presentation is better for the mixed content block than further fragmentation. • The requirement of marked-up text to preserve “hidden” information is discussed later.

Storing mixed content to relations • Mixed content: document elements embedded in character data, e.g. <h1>example</h1><p>here you have a <b>short</b> example</p> • Designing a relational schema to store mixed content • If there are blocks in the content that make sense only as a whole, decomposing those blocks makes no sense. • If we have strong arguments for decomposing a block containing mixed content, one possible decomposition method is to create one table for the root element, one property table for character data, and one property table for every element that appears in the content.

Mixed content mapping example • DTD <!ELEMENT A (#PCDATA | B | C)*> <!ELEMENT B (#PCDATA)> <!ELEMENT C (#PCDATA)> • Example instance: <A>Here we have a <B>nice </B> example <C>!</C></A> • Relational schema: A(a_pk) B(a_fk, b, bOrder) C(a_fk, c, cOrder) PCDATA(a_fk, pcdata, pcdataOrder)
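A minimal sketch of shredding this mixed-content example into the four relations, using Python's standard library. Two assumptions are made that the slide does not state: the instance is wrapped in the root element A with tag names matching the case used in the DTD, and the three order columns share one position counter, so that values from different tables can be interleaved again. The function names shred and text_in_order are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a_pk INTEGER PRIMARY KEY);
CREATE TABLE B (a_fk INTEGER, b TEXT, bOrder INTEGER);
CREATE TABLE C (a_fk INTEGER, c TEXT, cOrder INTEGER);
CREATE TABLE PCDATA (a_fk INTEGER, pcdata TEXT, pcdataOrder INTEGER);
""")

def shred(conn, xml_text):
    """Store one A element; return its primary key."""
    root = ET.fromstring(xml_text)
    a_pk = conn.execute("INSERT INTO A DEFAULT VALUES").lastrowid
    position = 0  # assumption: one counter shared by all three order columns

    def store(table, column, value):
        nonlocal position
        position += 1
        conn.execute(
            f"INSERT INTO {table} (a_fk, {column}, {column}Order) VALUES (?, ?, ?)",
            (a_pk, value, position),
        )

    if root.text:  # character data before the first child element
        store("PCDATA", "pcdata", root.text)
    for child in root:
        if child.tag not in ("B", "C"):  # only elements the DTD allows
            raise ValueError(f"unexpected element {child.tag}")
        store(child.tag, child.tag.lower(), child.text or "")
        if child.tail:  # character data following the child element
            store("PCDATA", "pcdata", child.tail)
    return a_pk

def text_in_order(conn, a_pk):
    """Concatenate all stored pieces in their original document order."""
    rows = conn.execute(
        "SELECT b, bOrder FROM B WHERE a_fk = ? "
        "UNION ALL SELECT c, cOrder FROM C WHERE a_fk = ? "
        "UNION ALL SELECT pcdata, pcdataOrder FROM PCDATA WHERE a_fk = ? "
        "ORDER BY 2",
        (a_pk, a_pk, a_pk),
    ).fetchall()
    return "".join(value for value, _ in rows)

pk = shred(conn, "<A>Here we have a <B>nice </B> example <C>!</C></A>")
```

If each table kept its own independent counter instead, the relative order of a B value and a neighbouring piece of character data could not be recovered, which is why the shared counter is used here.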

Usage models for data: Type of queries executed against the database • The spectrum of queries • Queries that retrieve XML documents • Queries that retrieve fragments of XML documents • Queries that make transformations on XML data • And even more complex queries...

Query examples 1 • Sample documents <addressbook> <personname>person1</personname> <streetaddress>jämeräntaival 10</streetaddress> <city>espoo</city> <summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress> </addressbook> <addressbook> <personname>person2</personname> <streetaddress>smt 10</streetaddress> <city>espoo</city> <summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress> </addressbook> • Query emitting XML fragment: Select the names of persons who live in Espoo <personname>person1</personname> <personname>person2</personname>

Query examples 2 • Query making transformation: “select the number of persons living in Espoo” <numberOfPersonsInEspoo>2</numberOfPersonsInEspoo>
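Assuming the two sample documents have been shredded into a single addressbook table with one column per element (an illustrative layout, not one prescribed by the study), both example queries can be sketched with Python's sqlite3 module. The XML wrapping of the results is done in application code here.

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
# Assumed composed presentation of the sample documents.
conn.execute(
    "CREATE TABLE addressbook "
    "(personname TEXT, streetaddress TEXT, city TEXT, summerCottageAddress TEXT)"
)
conn.executemany(
    "INSERT INTO addressbook VALUES (?, ?, ?, ?)",
    [
        ("person1", "jämeräntaival 10", "espoo", "hiekkatie 7, 99999 oulu"),
        ("person2", "smt 10", "espoo", "hiekkatie 7, 99999 oulu"),
    ],
)

# Query emitting an XML fragment: names of persons who live in Espoo.
names = conn.execute(
    "SELECT personname FROM addressbook WHERE city = ? ORDER BY personname",
    ("espoo",),
).fetchall()
fragment = "".join(f"<personname>{escape(n)}</personname>" for (n,) in names)

# Query making a transformation: number of persons living in Espoo.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM addressbook WHERE city = ?", ("espoo",)
).fetchone()
transformed = f"<numberOfPersonsInEspoo>{count}</numberOfPersonsInEspoo>"
```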

Preservation of “hidden” information • The XML document contains “hidden” information that is related to the presentation of the data, not the data itself. • Order of elements • Comments • Whitespace • It might be required that the original XML documents can be retrieved • Trivial when LOB presentation is used • If composed presentation is used, all “hidden” information needs to be stored in relations

Table-based mapping <Tables> <Table_1> <Row> <Column_1>...</Column_1> ... <Column_n>...</Column_n> </Row> ... </Table_1> ... <Table_n> <Row> <Column_1>...</Column_1> ... <Column_m>...</Column_m> </Row> ... </Table_n> </Tables> • Listing 1. Required structure of XML document in table-based mapping (Bourret, 2001).
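A document in the structure of Listing 1 can be loaded almost mechanically. The sketch below assumes that every child of the root is a table, every Row child is a record, and every child of a Row is a column; the table and column names (person, cottage, name, city, owner, address) are invented for the example, and load_tables is a hypothetical helper.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<Tables>
  <person>
    <Row><name>person1</name><city>espoo</city></Row>
    <Row><name>person2</name><city>espoo</city></Row>
  </person>
  <cottage>
    <Row><owner>person1</owner><address>hiekkatie 7, 99999 oulu</address></Row>
  </cottage>
</Tables>"""

def load_tables(conn, xml_text):
    """Create one table per child of the root and insert one row per <Row>."""
    for table in ET.fromstring(xml_text):
        for row in table.findall("Row"):
            columns = [col.tag for col in row]
            values = [col.text for col in row]
            # Identifiers cannot be bound as parameters, so they are quoted.
            # Only use this with trusted documents.
            column_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table.tag}" ({column_list})')
            conn.execute(
                f'INSERT INTO "{table.tag}" ({column_list}) VALUES ({placeholders})',
                values,
            )

conn = sqlite3.connect(":memory:")
load_tables(conn, doc)
person_rows = conn.execute("SELECT name, city FROM person ORDER BY name").fetchall()
```

The simplicity comes at a price: a document that does not already have this table, row, column shape cannot be mapped this way.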

Object-Relational mapping • A method for mapping any XML document that has a schema definition. • The idea is to convert the schema of the document to an object schema, and then convert the object schema to a relational schema • The object/relational conversion step is predefined, but the XML/object conversion leaves some freedom to define the object view that is mapped from the XML data.
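The two steps can be sketched by hand for the address book example. Everything here is an assumption made for illustration: the Person class is one possible object view of the addressbook element, and the person and summer_cottage tables are one possible relational rendering of that class, with the repeating element mapped to a child table.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Person:
    """Assumed object view of one addressbook element."""
    personname: str
    address: str
    summer_cottages: list = field(default_factory=list)

def to_object(xml_text):
    """Step 1: XML to object (this step leaves room for design choices)."""
    root = ET.fromstring(xml_text)
    return Person(
        personname=root.findtext("personname"),
        address=root.findtext("address"),
        summer_cottages=[e.text for e in root.findall("summerCottageAddress")],
    )

def to_rows(conn, person):
    """Step 2: object to relations (scalars to columns, list to child table)."""
    person_id = conn.execute(
        "INSERT INTO person (personname, address) VALUES (?, ?)",
        (person.personname, person.address),
    ).lastrowid
    conn.executemany(
        "INSERT INTO summer_cottage (person_fk, address) VALUES (?, ?)",
        [(person_id, a) for a in person.summer_cottages],
    )
    return person_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, personname TEXT, address TEXT);
CREATE TABLE summer_cottage (person_fk INTEGER REFERENCES person(person_id), address TEXT);
""")

doc = (
    "<addressbook><personname>person2</personname>"
    "<address>jämeräntaival 10, 02150 espoo</address>"
    "<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>"
    "</addressbook>"
)
person = to_object(doc)
person_id = to_rows(conn, person)
```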

Conclusions

Conclusions • Choosing among the possible relational representations for XML data involves many issues that must be considered. • Some of the issues limit the choice to LOB presentation (no schema, rapidly evolving schema, queries that only retrieve original documents) • LOB presentation can also be used for storing blocks of the document that are not referenced from elsewhere. • The usual reason why decomposition is preferred when possible is the performance gain. The data also becomes more accessible to applications that use the database but do not publish any views of the data in XML.
