Representing Data with XML

Representing Data with XML. September 27, 2005 Shawn Henry with slides from Neal Arthorne. Data Representation. Design goals for data representation: Portable (platform independent) Easy for machines to process Human legible Flexible and usable over the Internet and other networks

Presentation Transcript


  1. Representing Data with XML September 27, 2005 Shawn Henry with slides from Neal Arthorne

  2. Data Representation • Design goals for data representation: • Portable (platform independent) • Easy for machines to process • Human legible • Flexible and usable over the Internet and other networks • Concisely defined with formal rules

  3. Extensible Markup Language • The World Wide Web Consortium (W3C) defines the Extensible Markup Language (XML) • W3C also standardized HTML, CSS, SVG and other web technologies, and contributed to HTTP • XML Working Group formed in 1996 • XML 1.0 (Third Edition) published 4 February 2004 (original Recommendation in 1998)

  4. XML Example (callouts: Prolog, Attribute, Element)
     <?xml version="1.0" encoding="UTF-8"?>
     <foods>
       <pizza title="Deluxe Pizza">
         <name>The Deluxe</name>
         <toppings>
           <topping>peppers</topping>
           <topping>pepperoni</topping>
           <topping>mushrooms</topping>
           <topping>cheese</topping>
           <topping>tomato sauce</topping>
         </toppings>
         <price>7.99</price>
       </pizza>
     </foods>
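
As a rough illustration of "easy for machines to process", the pizza document from the slide above can be read with an off-the-shelf parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `summarize_pizzas` and the dictionary layout are assumptions made for the example, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The slide's document, written with straight quotes and a space
# between the element name "pizza" and its "title" attribute.
PIZZA_XML = """<?xml version="1.0" encoding="UTF-8"?>
<foods>
  <pizza title="Deluxe Pizza">
    <name>The Deluxe</name>
    <toppings>
      <topping>peppers</topping>
      <topping>pepperoni</topping>
      <topping>mushrooms</topping>
      <topping>cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <price>7.99</price>
  </pizza>
</foods>"""


def summarize_pizzas(xml_text):
    """Return one dict per <pizza> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)          # root is the <foods> element
    pizzas = []
    for pizza in root.findall("pizza"):
        pizzas.append({
            "title": pizza.get("title"),    # attribute value
            "name": pizza.findtext("name"), # text of a child element
            "toppings": [t.text for t in pizza.findall("toppings/topping")],
            "price": float(pizza.findtext("price")),
        })
    return pizzas
```

Calling `summarize_pizzas(PIZZA_XML)` yields a single entry whose `toppings` list holds the five topping strings in document order.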

  5. XML • XML documents must be well-formed (correct syntax, every start tag matched by an end tag, proper nesting, etc.) • XML documents are valid if they also conform to a specified grammar (usually a DTD or an XML Schema) • DTDs (Document Type Definitions) provide a grammar for the XML by defining elements, attributes and entities
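
The difference between well-formed and valid can be made concrete with a small check. The sketch below only tests well-formedness, by asking Python's standard `xml.etree.ElementTree` to parse the text; it does not validate against a DTD or an XML Schema, which would need a validating parser. The helper name `is_well_formed` and the two sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


GOOD = "<pizza><name>The Deluxe</name></pizza>"
BAD = "<pizza><name>The Deluxe</pizza>"  # <name> is never closed
```

A document such as `GOOD` can be well-formed and still be invalid, for example if a schema requires a `<price>` element that is missing.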

  6. XML Advantages • XML provides: • Logical structure for data in a textual representation • Formal rules for validating documents • Flexibility to define your own markup language • Portability across networks and platforms • Becoming a widely accepted data interchange format • Processed with off-the-shelf tools

  7. XML Disadvantages • XML drawbacks: • A text format rather than a binary one, so markup adds a lot of overhead for a small amount of data • Little support for binary or mixed-media data, which must be embedded as hex or base64 text • Describes only data; carries no semantics or reasoning • DTDs do not provide: • Data types for each element or attribute • Complex structural rules for documents
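
The overhead of carrying binary data in XML can be seen in a few lines. This sketch, using only Python's standard `base64` and `xml.etree.ElementTree` modules, embeds raw bytes as base64 text and reads them back; the element name `image`, the `encoding` attribute and both helper functions are hypothetical choices for the example.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(tag, payload):
    """Wrap raw bytes in an element as base64 text (hypothetical helper)."""
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(payload).decode("ascii")
    return elem


def extract_binary(elem):
    """Recover the raw bytes from an element produced by embed_binary."""
    return base64.b64decode(elem.text)


payload = bytes(range(256)) * 3        # 768 arbitrary bytes
elem = embed_binary("image", payload)
serialized = ET.tostring(elem)         # bytes: <image encoding="base64">...</image>
```

Base64 turns every 3 bytes into 4 characters, so the 768-byte payload becomes 1024 characters of text before the surrounding tags are even counted.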

  8. XML Schema • XML Schema defines a new schema language to replace DTD • Standardized by W3C in 2001 • Advantages: • Provides data typing and logical structure • Written in XML (easy to process) • Disadvantage: • Higher complexity than DTD

  9. XML Schema Example (callouts: element name, data type, attribute name, data type)
     <?xml version="1.0" encoding="UTF-8"?>
     <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
       <xsd:element name="pizza">
         <xsd:complexType>
           <xsd:all>
             <xsd:element name="name" type="xsd:string" />
             <xsd:element name="toppings" type="Toppings" />
             <xsd:element name="price" type="xsd:float" />
           </xsd:all>
           <xsd:attribute name="title" type="xsd:string" />
         </xsd:complexType>
       </xsd:element>
       <xsd:complexType name="Toppings">
         <xsd:sequence>
           <xsd:element name="topping" minOccurs="1" maxOccurs="unbounded" type="xsd:string" />
         </xsd:sequence>
       </xsd:complexType>
     </xsd:schema>
     • An XML document is an ‘instance document’ of an XML Schema

  10. Simple Types • Simple Types are of three varieties: • Atomic: built-in or derived, e.g.
     <xsd:simpleType name="myInteger">
       <xsd:restriction base="xsd:integer">
         <xsd:minInclusive value="10000"/>
         <xsd:maxInclusive value="99999"/>
       </xsd:restriction>
     </xsd:simpleType>
     • List: multiple items of the same type, e.g.
     <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
     • Union: union of two or more Simple Types
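
To show what the `myInteger` restriction and the list type mean in practice, the sketch below hand-codes the same two rules in Python. It is not an XML Schema validator and does not reproduce the exact lexical rules of `xsd:integer`; the function names are assumptions, and only the facet values 10000 and 99999 come from the slide.

```python
def parse_my_integer(text):
    """Parse one value and enforce the minInclusive/maxInclusive facets."""
    value = int(text)
    if not 10000 <= value <= 99999:
        raise ValueError(f"{value} is outside the range 10000..99999")
    return value


def parse_list_of_my_int(text):
    """A list type is a whitespace-separated run of items of one simple type."""
    return [parse_my_integer(item) for item in text.split()]
```

The content of the slide's `<listOfMyInt>` element, `"20003 15037 95977 95945"`, passes this check, while a value such as `"9999"` is rejected because it falls below `minInclusive`.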

  11. Built-in Types • XML Schema defines numerous built-in types: • integer, decimal, token, byte, boolean, date, time, short, long, float, anyURI, language • Facets can be used to restrict existing types: • min/maxInclusive, min/maxExclusive, pattern, enumeration, min/maxLength, length, totalDigits, fractionDigits

  12. Complex Types • Complex Types define logical structures with attributes and nested elements • They use a sequence, choice or all containing elements that use Simple Types or other Complex Types • May reference types defined elsewhere in the schema or imported using import statement

  13. In the Schema of Things • XML Schema supersedes DTD • Defines a typed data format with no semantics or relations between data • Next step: higher level of abstraction and the ability to define objects and relations

  14. Resource Description Framework • W3C standard for describing resources on the World Wide Web (1999, revised 2004) • Objects identified by Uniform Resource Identifiers (URIs) • Generalized to identify objects that may not be retrievable on the Web • RDF represented by a directed graph and in XML syntax

  15. RDF Example • In English: http://www.example.com/people/diaz/contact has the full name Federico Diaz and has an employer called Fisher and Sons. • Graph: the subject node http://www.example.com/people/diaz/contact has two outgoing arcs: • http://www.w3.org/2000/10/pim/contact#fullName, pointing to the literal Federico Diaz • http://www.w3.org/2000/10/work#employer, pointing to the resource http://www.fisherandsons.com/contact

  16. RDF Parts • Each RDF statement is a triple containing a subject (identified by a URI), a predicate (e.g. creator, title, full name) and an object • An object can be either a literal value (e.g. Federico Diaz) or another RDF resource • All three parts can be identified with a URI and a fragment identifier (#)
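
The triple structure can be sketched as plain data. The example below stores the two statements about Federico Diaz from the previous slide as Python tuples and looks up objects by subject and predicate. The `Triple` type, the `obj_is_literal` flag and the `objects` helper are assumptions for illustration; real RDF software would use a dedicated library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # URI of the resource being described
    predicate: str        # URI of the property
    obj: str              # a literal value or the URI of another resource
    obj_is_literal: bool  # distinguishes the two kinds of object


DIAZ = "http://www.example.com/people/diaz/contact"
FULL_NAME = "http://www.w3.org/2000/10/pim/contact#fullName"
EMPLOYER = "http://www.w3.org/2000/10/work#employer"

GRAPH = [
    Triple(DIAZ, FULL_NAME, "Federico Diaz", True),
    Triple(DIAZ, EMPLOYER, "http://www.fisherandsons.com/contact", False),
]


def objects(graph, subject, predicate):
    """Return every object attached to subject by predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]
```

Each entry in `GRAPH` corresponds to one arc of the directed graph: the subject and object are nodes and the predicate labels the arc between them.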

  17. RDF Semantics • RDF attaches no specific meaning to RDF statements, just as the name of a database field is meaningless to an SQL engine • RDF provides a way to attach data types to literal values, but does not define data types itself • Generally RDF software uses the XML Schema data types • <size rdf:datatype="http://www.w3.org/2001/XMLSchema#int">10</size> • Arbitrary XML can also be used as a literal • <x:prop rdf:parseType="Literal"><a:size>10</a:size></x:prop>

  18. RDF Schema • RDF Schema is a ‘vocabulary description language’ that relates resources to each other using RDF • RDFS uses ‘classes’ of objects like in Object-Oriented (OO) systems • Class properties relate to other classes using OO concepts such as generalization

  19. RDF Schema Use • Differs from OO in that Properties are defined in terms of the resources to which they apply (their domain); they are not restricted to the scope of a single class • domain: the Classes to which a Property applies • range: the Class of the values a Property can take (i.e. its type) • Allows new Properties to be created that apply to the same domain without redefining the domain

  20. RDFS Classes • Classes introduced by RDFS: • Resource - top level class • Literal - all literal values like text strings • Class - the class of all classes • Datatype - the class of all datatypes • Properties introduced by RDFS: • subClassOf, subPropertyOf - inheritance between classes and between properties • domain - domain of a Property • range - range of a Property • label, comment, seeAlso - human readable labels and documentation

  21. RDFS Example
     <?xml version="1.0"?>
     <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
              xml:base="http://example.org/schemas/food">
       <rdfs:Class rdf:ID="Food"/>
       <rdfs:Class rdf:ID="Pizza">
         <rdfs:subClassOf rdf:resource="#Food"/>
       </rdfs:Class>
       <rdfs:Class rdf:ID="Topping">
         <rdfs:subClassOf rdf:resource="#Food"/>
       </rdfs:Class>
       <rdfs:Datatype rdf:about="&xsd;float"/>
       <rdf:Property rdf:ID="hasTopping">
         <rdfs:domain rdf:resource="#Pizza"/>
         <rdfs:range rdf:resource="#Topping"/>
       </rdf:Property>
       <rdf:Property rdf:ID="price">
         <rdfs:domain rdf:resource="#Pizza"/>
         <rdfs:range rdf:resource="&xsd;float"/>
       </rdf:Property>
     </rdf:RDF>
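
One way to read the schema above is as a set of rules that let software infer class membership. The sketch below copies the `subClassOf`, `domain` and `range` statements into Python dictionaries, using local names instead of full URIs, and derives which classes the subject of a property belongs to. The dictionaries and function names are assumptions for illustration, and the sketch assumes each class has at most one direct superclass.

```python
# Statements copied from the schema, with local names instead of full URIs.
SUBCLASS_OF = {"Pizza": "Food", "Topping": "Food"}
DOMAIN = {"hasTopping": "Pizza", "price": "Pizza"}
RANGE = {"hasTopping": "Topping", "price": "xsd:float"}


def superclasses(cls):
    """Follow subClassOf links upward from cls (single inheritance only)."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def classes_implied_by_domain(prop):
    """Classes that any subject of prop belongs to, according to rdfs:domain."""
    cls = DOMAIN[prop]
    return [cls] + superclasses(cls)
```

For example, any resource that has a `hasTopping` property can be inferred to be a `Pizza`, and therefore also a `Food`.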

  22. RDF Example
     <?xml version="1.0"?>
     <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:ex="http://example.org/schemas/food#"
              xml:base="http://example.org/things">
       <ex:Pizza rdf:ID="ShawnsPizza">
         <ex:price rdf:datatype="&xsd;float">12.99</ex:price>
         <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
         <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
       </ex:Pizza>
     </rdf:RDF>
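
The instance document above encodes four triples: one type statement and three property statements. The sketch below pulls them out with Python's standard `xml.etree.ElementTree`. It is not an RDF parser and handles only this particular shape of RDF/XML (typed nodes with `rdf:ID`, properties with either `rdf:resource` or text). The `&xsd;` entity is written out in full and the base URI is passed in as a parameter, both simplifications made for the example; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/things"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schemas/food#">
  <ex:Pizza rdf:ID="ShawnsPizza">
    <ex:price rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.99</ex:price>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
  </ex:Pizza>
</rdf:RDF>"""


def uri(clark_name):
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    namespace, local = clark_name[1:].split("}")
    return namespace + local


def triples(xml_text, base):
    """Extract (subject, predicate, object) tuples from this shape of RDF/XML."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root:
        subject = base + "#" + node.get(f"{{{RDF}}}ID")
        # The element name of a typed node gives its rdf:type.
        result.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else prop.text
            result.append((subject, uri(prop.tag), value))
    return result
```

The subject of every extracted triple is `http://example.org/things#ShawnsPizza`, formed from the base URI and the `rdf:ID` value.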

  23. RDF/RDFS • Lets authors create vocabularies of Classes and Properties and show how the terms should be used to describe resources, e.g. • Property ‘author’ applies to class ‘Book’ • Class ‘Employee’ is a subclass of ‘Person’ • Does not define descriptive properties such as ‘dateOfIssue’ or ‘title’ but references them using URIs • Much as in XML/XML Schema, RDF instance data can be checked against its RDF Schema, although RDFS describes how terms are meant to be used rather than prescribing how applications enforce them

  24. Machines Understanding the Web • RDF/RDFS along with XML/XML Schema provide a means to describe resources on the web with basic generalization • For a higher conceptual level, applications require semantic information • Ontologies serve as a starting point for understanding

  25. Ontologies on the Web • “Ontologies define the terms used to represent an area of knowledge.” – OWL Use Cases & Requirements, 2004 • Example use cases: • A web portal that needs to classify information • Multimedia archive that requires a taxonomy of media or content-specific properties • Corporate portal website that integrates vocabularies from different departments

  26. Web Ontology Language (OWL) • Supersedes DAML+OIL • DARPA Agent Markup Language (DAML) was based on RDF/RDFS and includes much of what is now OWL • Adds terms used to better describe relations between classes of RDF resources • With OWL, ontologies can be integrated, extended and shared

  27. Web Ontology Language • Individuals • OWL does not honour the Unique Names Assumption (UNA) • Properties • Binary relations between individuals • Functional, transitive or symmetric • Classes • Sets containing individuals • Organized into a taxonomy with subclasses and superclasses

  28. Three Flavours of OWL • OWL Lite • For classification hierarchies with simple constraints • OWL DL • Expressiveness with computational completeness • OWL Full • Maximum expressiveness • No computational guarantees • Extension of RDF

  29. OWL Features • OWL improvements on RDF/RDFS: • Cardinality • min/maxCardinality for Properties with respect to a Class • Equality, disjointness • equivalentClass, equivalentProperty, sameAs, differentFrom, disjointWith • Transitive, Symmetric, Functional Properties • labelling a Property allows for reasoning • A has B and B has C implies A has C (Transitive) • A has B implies B has A (Symmetric)
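
The two inference rules on this slide can be sketched directly. The code below treats a property as a set of (A, B) pairs and computes what a reasoner could derive once the property is declared symmetric or transitive. It is a naive illustration, not how an OWL reasoner is implemented, and the function names are assumptions.

```python
def symmetric_closure(pairs):
    """A has B implies B has A."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """A has B and B has C implies A has C, repeated until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d)
               for a, b in closure
               for c, d in closure
               if b == c} - closure
        if not new:
            return closure
        closure |= new
```

Given only the pairs (A, B) and (B, C), the transitive closure adds (A, C); without the property being labelled transitive, no such conclusion could be drawn.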

  30. OWL Features (cont’d) • Boolean expressions of Class relations • unionOf, complementOf, intersectionOf • Property restrictions • Limits how properties can be used by an instance of a class • Versioning • priorVersion, versionInfo, incompatibleWith, backwardCompatibleWith

  31. Conclusion • Layers of the stack, from top to bottom: • ???: Conceptual level reasoning, ‘smart’ applications • OWL: Knowledge processing and reasoning • RDF, RDF Schema: Resource description and vocabulary • (side labels: Knowledge above, Data below) • XML, XML Schema: Data formatting and data types • Unicode/ISO byte streams: Machine data representation

  32. References • World Wide Web Consortium http://www.w3.org • XML http://www.w3.org/TR/REC-xml • XML Schema Part 0: Primer http://www.w3.org/TR/xmlschema-0/ • RDF Primer http://www.w3.org/TR/rdf-primer/ • RDF Concepts http://www.w3.org/TR/rdf-concepts/ • RDF/XML Syntax http://www.w3.org/TR/rdf-syntax-grammar/ • RDF Schema http://www.w3.org/TR/rdf-schema/ • OWL Use Cases & Requirements http://www.w3.org/TR/webont-req/ • OWL Overview http://www.w3.org/TR/owl-features/
