Semi-Structured data (XML)

Semi-Structured data (XML). CS561, Spring 2012, WPI, Mohamed Eltabakh. Semi-Structured Data. ER, relational, and ODL data models are all based on a schema. The structure of the data is rigid and known in advance, which allows efficient implementation and various storage and processing optimizations.


Presentation Transcript


  1. Semi-Structured Data (XML). CS561, Spring 2012, WPI. Mohamed Eltabakh

  2. Semi-Structured Data
     • ER, relational, and ODL data models are all based on a schema
       • Structure of data is rigid and known in advance
       • Efficient implementation and various storage and processing optimizations
     • Semistructured data is schemaless
       • Flexible in representing data
       • Different objects may have different structure and properties
       • Self-describing (the data describes itself)
       • Harder to optimize and to implement efficiently

  3. Relational model for Movie DB
     • Collection of records (tuples)
     • [Figure labels: Movie, Star, Stars-in relationship]

  4. Semi-Structured model
     • Collection of nodes
     • Leaf nodes contain data
     • Internal nodes represent either objects or attributes
     • Each link is either an attribute link or a relationship link
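The node view on this slide can be sketched in Python with the standard-library `xml.etree.ElementTree` module. This is only an illustration: the `<star>` document and its values are made up, and `leaves` is a hypothetical helper name, not something from the lecture.

```python
import xml.etree.ElementTree as ET

# Made-up document: <star> is an internal (object) node,
# <name>, <street> and <city> are leaf nodes that carry data.
doc = """
<star>
  <name>Star A</name>
  <address>
    <street>1 Example St.</street>
    <city>Sometown</city>
  </address>
</star>
"""

def leaves(elem, path=""):
    """Yield (path, text) for every leaf node at or below elem."""
    here = f"{path}/{elem.tag}"
    if len(elem) == 0:                      # no children: a leaf node
        yield here, (elem.text or "").strip()
    for child in elem:                      # internal node: follow each link
        yield from leaves(child, here)

root = ET.fromstring(doc)
for path, text in leaves(root):
    print(path, "=", text)
```

Internal nodes only contribute to the path; the data itself sits in the leaves.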

  5. XML
     • XML: Extensible Markup Language
     • XML is a tag-based notation (language) to describe data
     • XML has two modes
       • Well-formed XML: no schema at all
       • Valid XML: governed by a DTD (Document Type Definition)
         • Allows validation, more optimizations, and pre-processing of the XML document
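As a minimal sketch of the "well-formed" mode, the check below simply asks Python's standard-library parser whether a string parses. `is_well_formed` is a hypothetical helper name. Note that `xml.etree.ElementTree` does not validate against a DTD, so this covers well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (one root, tags match and nest)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations of Databases</title></book>"
crossed = "<book><title>Foundations of Databases</book></title>"  # tags overlap
two_roots = "<book/><book/>"                                      # more than one root

for sample in (good, crossed, two_roots):
    print(is_well_formed(sample), sample)
```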

  6. HTML Tags vs. XML tags
     • HTML tags describe structure/presentation
         <h1> Bibliography </h1>
         <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
             <br> Addison Wesley, 1995
         <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
             <br> Morgan Kaufmann, 1999

  7. HTML Tags vs. XML tags (Cont’d)
     • XML tags describe content (have semantics)
         <bibliography>
           <book>
             <title> Foundations… </title>
             <author> Abiteboul </author>
             <author> Hull </author>
             <author> Vianu </author>
             <publisher> Addison Wesley </publisher>
             <year> 1995 </year>
           </book>
           …
         </bibliography>
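Because the tags name the content, a program can pull out fields by name. The sketch below assumes the first book from the bibliography slide, with the full title taken from the HTML version of the same entry. It uses Python's standard-library parser.

```python
import xml.etree.ElementTree as ET

doc = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
books = [
    {
        "title": book.findtext("title"),
        "authors": [a.text for a in book.findall("author")],  # repeated element
        "year": int(book.findtext("year")),                   # text is a string
    }
    for book in root.findall("book")
]
print(books)
```

The same lookup against the HTML version would have to guess which `<i>` or `<br>` fragment holds the title or the year.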

  8. XML Terminology
     • tags: book, title, author, …
     • start tag: <book>, end tag: </book>
     • elements: <book>…</book>, <author>…</author>
     • elements are nested
     • empty element: <red></red>, abbreviated <red/>
     • an XML document has a single root element
     • A document is well-formed XML if it has matching tags
     CS561 - Spring 2007.

  9. XML: Attributes
         <book price="55" currency="USD">
           <title> Foundations of Databases </title>
           <author> Abiteboul </author>
           …
           <year> 1995 </year>
         </book>
     • Attributes appear inside the start tag
     • Attributes are an alternative way to represent data
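A short sketch of how the attributes in this example look to a program, using Python's standard-library parser. The elided part of the slide's document ("…") is simply left out here.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author>"
    "<year>1995</year>"
    "</book>"
)

print(book.attrib)            # all attributes of the start tag, as a dict
price = int(book.get("price"))  # attribute values are always strings
currency = book.get("currency")
print(price, currency)
```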

  10. Semantic tags
     • Instructional tag (declares that the document is XML)
     • Standalone means it does not follow a schema (well-formed)
     • Root element
     • Sub-elements
     • Attributes

  11. Attributes vs. Sub-elements
     • Two alternative ways to describe the properties of an object
     • Attributes are also used to define IDs and references
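To illustrate that the two styles carry the same information, the sketch below reads both into one dictionary shape. `properties` is a hypothetical helper, and the two `<book>` variants are made up for the comparison.

```python
import xml.etree.ElementTree as ET

def properties(elem):
    """Collect an element's attributes and leaf sub-elements into one dict."""
    props = dict(elem.attrib)
    for child in elem:
        props[child.tag] = (child.text or "").strip()
    return props

as_attributes = ET.fromstring('<book price="55" currency="USD"/>')
as_subelements = ET.fromstring(
    "<book><price>55</price><currency>USD</currency></book>"
)

print(properties(as_attributes))
print(properties(as_subelements))
```

One practical difference the dictionary hides: an attribute can appear at most once per element and holds only text, while a sub-element can repeat and can be nested further.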

  12. Attributes vs. Sub-elements

  13. XML: ID and IDREF
     • In an XML document they appear like any other attribute
     • ID and IDREF are formally defined in a DTD or XML Schema
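Since IDs and references look like ordinary attributes, a program without the DTD has to be told which attributes play those roles. The sketch below assumes hypothetical attribute names (`movieID` as the ID, `starredIn` as a whitespace-separated list of references) and made-up data, and resolves the references by hand.

```python
import xml.etree.ElementTree as ET

doc = """
<db>
  <star starID="s1" starredIn="m1 m2"><name>Star A</name></star>
  <movie movieID="m1"><title>Movie One</title></movie>
  <movie movieID="m2"><title>Movie Two</title></movie>
</db>
"""

root = ET.fromstring(doc)

# Index every movie by its ID attribute (assumed to be unique).
movies_by_id = {m.get("movieID"): m for m in root.iter("movie")}

# Follow the references stored on the star element.
star = root.find("star")
titles = [movies_by_id[ref].findtext("title") for ref in star.get("starredIn").split()]
print(titles)
```

A validating parser that reads the DTD or schema would additionally reject duplicate IDs and references that point nowhere.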

  14. XML Namespaces
     • Tags may have namespaces
     • They define where the tag is defined (its format or structure)
     • Namespace format: xmlns:<name>=…
         <book xmlns:isbn="www.isbn-org.org/def">
           <title> … </title>
           <number> 15 </number>
           <isbn:number> …. </isbn:number>
         </book>

  15. XML Namespaces
     • syntactic: <number>, <isbn:number>
     • semantic: provide URL for “shared” schema
         <tag xmlns:mystyle="http://…">    (mystyle is defined here)
           …
           <mystyle:title> … </mystyle:title>
           <mystyle:number> …
         </tag>
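The sketch below shows how the two `number` tags from the book example are told apart by a namespace-aware parser (Python's standard library here). The title text and the ISBN value are placeholders, since the slide elides them.

```python
import xml.etree.ElementTree as ET

doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title>Placeholder title</title>
  <number>15</number>
  <isbn:number>0-000-00000-0</isbn:number>
</book>
"""

root = ET.fromstring(doc)
ns = {"isbn": "www.isbn-org.org/def"}   # prefix-to-namespace map used for lookups

plain = root.find("number")             # the tag with no namespace
qualified = root.find("isbn:number", ns)

print(plain.tag, plain.text)
print(qualified.tag, qualified.text)
```

ElementTree stores the qualified tag as `{namespace}localname`, so the prefix itself is only shorthand. What identifies the tag is the namespace string.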

  16. Covered so far…
     • What are XML documents
     • XML Structure
       • Tags, start and end tags, elements, attributes
     • XML Types
       • Well-formed XML (no schema)
       • Valid XML (has a schema)

  17. XML Schema

  18. XML Schema
     • An XML document is usually (but not always) validated by an XML Schema
     • The XML Schema determines whether the XML document “followed the rules” set up in the schema
     • An XML Schema is an agreement between the sender and the receiver of a document as to the structure of that document
     • Two mechanisms
       • Document Type Definition (DTD)
       • XML Schema

  19. XML Schema
     • A schema can define:
       - Elements
       - Attributes
       - Data types
       - Required or optional
       - Min and max occurrences

  20. Example

  21. Data Types in XML Schema

  22. Simple data types in XML Schema

  23. Example: Simple Types

  24. Complex types in XML Schema

  25. Example: Complex data types

  26. Movies schema

  27. Type inheritance
         <complexType name="Address">
           <sequence>
             <element name="street" type="string"/>
             <element name="city" type="string"/>
           </sequence>
         </complexType>
         <complexType name="USAddress">
           <complexContent>
             <extension base="Address">
               <sequence>
                 <element name="state" type="string"/>
                 <element name="zip" type="positiveInteger"/>
               </sequence>
             </extension>
           </complexContent>
         </complexType>
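One way to read the extension mechanism: the derived type's content is the base type's sequence followed by its own. The sketch below computes that list from the two definitions on this slide. Two assumptions are made for brevity: the definitions are wrapped in a bare `<schema>` root without the XML Schema namespace, and `effective_elements` is a hypothetical helper, not part of any schema library.

```python
import xml.etree.ElementTree as ET

schema = """
<schema>
  <complexType name="Address">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
    </sequence>
  </complexType>
  <complexType name="USAddress">
    <complexContent>
      <extension base="Address">
        <sequence>
          <element name="state" type="string"/>
          <element name="zip" type="positiveInteger"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
"""

types = {t.get("name"): t for t in ET.fromstring(schema).findall("complexType")}

def effective_elements(type_name):
    """List (name, type) pairs of a complex type, inherited elements first."""
    ctype = types[type_name]
    ext = ctype.find("complexContent/extension")
    if ext is None:  # a base type: just its own sequence
        return [(e.get("name"), e.get("type")) for e in ctype.findall("sequence/element")]
    own = [(e.get("name"), e.get("type")) for e in ext.findall("sequence/element")]
    return effective_elements(ext.get("base")) + own

print(effective_elements("USAddress"))
```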

  28. Keys in XML Schema

  29. Keys in XML Schema
     • Elements in XML can have keys (unique identifiers)
     • Keys can be attributes or subelements
     • A key can be a single field or multiple fields
     • Key fields (attributes or subelements) cannot be missing
     • Keys are defined in XML Schema using special syntax
     • Attributes do not have keys

  30. Keys in XML Schema
     • Key: gives a name to the key
     • Selector: following the selector XPath, starting from the root, returns a list of objects
     • Field: within the returned objects, the value at the XPath defined in ‘field’ has to be unique
     • The @ symbol refers to attributes

  31. Keys in XML Schema
     • In general, the key syntax is:
         <key name="someDummyNameHere">
           <selector xpath="p"/>
           <field xpath="p1"/>
           <field xpath="p2"/>
           . . .
           <field xpath="pk"/>
         </key>
     • All these fields together form the key
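The selector/field semantics can be sketched as a small checker. This is not a schema validator: `check_key` and `field_value` are hypothetical helpers, only the simple path forms that `ElementTree.findall` accepts are supported, and the `<person>` data is made up.

```python
import xml.etree.ElementTree as ET

def field_value(elem, field):
    """Evaluate a field path: '@x' reads attribute x, anything else a sub-element's text."""
    if field.startswith("@"):
        return elem.get(field[1:])
    return elem.findtext(field)

def check_key(root, selector, fields):
    """Return the set of key tuples; raise ValueError on a missing or duplicate key."""
    seen = set()
    for elem in root.findall(selector):          # the objects picked by the selector
        key = tuple(field_value(elem, f) for f in fields)
        if None in key:                          # key fields cannot be missing
            raise ValueError(f"missing key field in <{elem.tag}>: {key}")
        if key in seen:                          # all fields together must be unique
            raise ValueError(f"duplicate key: {key}")
        seen.add(key)
    return seen

people = ET.fromstring("""
<people>
  <person first="Ann" last="Lee"/>
  <person first="Ann" last="Kim"/>
</people>
""")

print(check_key(people, ".//person", ["@first", "@last"]))
```

Two people may share a first name here because the key is the pair of fields, not each field on its own.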

  32. Foreign Keys in XML Schema
     • Foreign key syntax:
         <keyref name="personRef" refer="fullName">
           <selector xpath=".//personPointer"/>
           <field xpath="@first"/>
           <field xpath="@last"/>
         </keyref>
     • name: the foreign key name
     • refer: which primary key it refers to
     • selector and field: the location of the foreign key
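A rough sketch of what the `personRef` constraint asks for: every (first, last) pair on a `personPointer` must match an existing key value. The slide does not show the `fullName` key itself, so this example assumes it selects `person` elements with the same two attributes. The data and the `tuples` helper are made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<db>
  <person first="Ann" last="Lee"/>
  <person first="Bo" last="Kim"/>
  <personPointer first="Ann" last="Lee"/>
  <personPointer first="Cy" last="Ray"/>
</db>
""")

def tuples(root, selector, attrs):
    """Collect one tuple of attribute values per element matched by selector."""
    return [tuple(e.get(a) for a in attrs) for e in root.findall(selector)]

# Assumed primary key "fullName": (first, last) of every <person>.
primary = set(tuples(doc, ".//person", ["first", "last"]))

# Foreign key "personRef": (first, last) of every <personPointer>.
references = tuples(doc, ".//personPointer", ["first", "last"])

dangling = [ref for ref in references if ref not in primary]
print(dangling)   # references that violate the foreign key
```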

  33. Example: Movie schema

  34. Example: Stars schema

  35. Using XML Schema

  36. Using XML Schema Putting the data in XML documents following the given schema Parsing the document and validating it against the schema

  37. Reusing XML Schemas

  38. GUI for managing XML Schema

  39. Expanding elements

  40. XML Model vs. Relational Model

  41. Database Architecture

  42. Relational Metadata – the Schema

  43. XML Metadata – the Document

  44. XML Metadata – the Schema

  45. Comparison
     XML
     • Relationships among items are inferred by position
     • Used for data exchange and, with XSLT, for web visualization
     • Good for partitioned data and for retrieving objects with all their sub-components
     • Harder to optimize for storage and querying
     • Usually not straightforward
     RDBMS
     • Relationships among items are explicitly defined
     • General-purpose storage and processing systems
     • Good for general-purpose queries asking for different objects
     • Easy to optimize for storage and querying
     • Straightforward to export to XML
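The last RDBMS point, that exporting to XML is straightforward, can be sketched in a few lines: each row becomes an element and each column a sub-element. The rows below stand in for a query result and are made up. `rows_to_xml` is a hypothetical helper, and the column names are assumed to be valid XML tag names.

```python
import xml.etree.ElementTree as ET

# Stand-in for the result of a relational query: one dict per tuple.
movies = [
    {"title": "Movie One", "year": 2001},
    {"title": "Movie Two", "year": 2002},
]

def rows_to_xml(table_name, row_name, rows):
    """Build <table_name> with one <row_name> child per row, one sub-element per column."""
    root = ET.Element(table_name)
    for row in rows:
        elem = ET.SubElement(root, row_name)
        for column, value in row.items():
            ET.SubElement(elem, column).text = str(value)
    return root

xml_text = ET.tostring(rows_to_xml("movies", "movie", movies), encoding="unicode")
print(xml_text)
```

Going the other way is harder in general, because XML elements may nest, repeat, or omit sub-elements in ways a fixed set of columns does not capture directly.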
