Representing Data with XML. September 27, 2005. Shawn Henry, with slides from Neal Arthorne.
Data Representation • Design goals for data representation: • Portable (platform independent) • Easy for machines to process • Human legible • Flexible and usable over the Internet and other networks • Concisely defined with formal rules
Extensible Markup Language • The World Wide Web Consortium (W3C) defines the Extensible Markup Language (XML) • W3C also maintains HTML, CSS, SVG and other web standards, and contributed to HTTP together with the IETF • XML Working Group formed in 1996 • XML 1.0 (Third Edition) published 4 February 2004 (original Recommendation in 1998)
XML Example (slide callouts: Prolog, Attribute, Element)
<?xml version="1.0" encoding="UTF-8"?>
<foods>
  <pizza title="Deluxe Pizza">
    <name>The Deluxe</name>
    <toppings>
      <topping>peppers</topping>
      <topping>pepperoni</topping>
      <topping>mushrooms</topping>
      <topping>cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <price>7.99</price>
  </pizza>
</foods>
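As a rough illustration of how such a document is read by a program, the sketch below parses the pizza example with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative only and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The pizza document from the slide, with the attribute syntax corrected.
PIZZA_XML = """<?xml version="1.0" encoding="UTF-8"?>
<foods>
  <pizza title="Deluxe Pizza">
    <name>The Deluxe</name>
    <toppings>
      <topping>peppers</topping>
      <topping>pepperoni</topping>
      <topping>mushrooms</topping>
      <topping>cheese</topping>
      <topping>tomato sauce</topping>
    </toppings>
    <price>7.99</price>
  </pizza>
</foods>"""

# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(PIZZA_XML.encode("utf-8"))

pizza = root.find("pizza")                          # first <pizza> child of <foods>
title = pizza.get("title")                          # attribute value
name = pizza.findtext("name")                       # text of the <name> element
toppings = [t.text for t in pizza.iter("topping")]  # all nested <topping> elements
price = float(pizza.findtext("price"))

print(title, name, toppings, price)
```

Note that the parser returns every value as text; converting the price to a number is left to the application, which is one motivation for the typed schemas discussed later.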
XML • XML documents must be well-formed (correct syntax: a single root element, properly nested and closed tags, quoted attribute values) • XML documents are valid if they also conform to a specified grammar (usually a DTD or an XML Schema) • DTDs (Document Type Definitions) provide a grammar for the XML by defining elements, attributes and entities
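The difference between well-formed and valid can be sketched with a small check. The helper below (a hypothetical name, not from the slides) only tests well-formedness; Python's standard `xml.etree.ElementTree` parser does not validate against a DTD or schema, so validity would need a separate validating tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

GOOD = "<pizza><name>The Deluxe</name></pizza>"
BAD = "<pizza><name>The Deluxe</pizza>"  # <name> is never closed

print(is_well_formed(GOOD), is_well_formed(BAD))
```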
XML Advantages • XML provides: • Logical structure for data in a textual representation • Formal rules for validating documents • Flexibility to define your own markup language • Portability across networks and platforms • Growing acceptance as a data interchange format • Processing with off-the-shelf tools
XML Disadvantages • XML drawbacks: • Not a binary format, so it requires a lot of overhead for a little bit of data • Very little support for binary or mixed media data, which must be embedded as text (hex or base64 encoding) • Describes data only and carries no semantics or reasoning • DTDs do not provide: • Data types for each element or attribute • Complex structural rules for documents
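The overhead of carrying binary data in XML can be sketched as follows. The element name `photo` and its `encoding` attribute are assumptions made for the illustration; the point is only that base64 turns every 3 bytes into 4 characters before any markup is added.

```python
import base64
import xml.etree.ElementTree as ET

raw = bytes(range(256)) * 3  # 768 bytes of arbitrary binary data

# Binary data must be turned into text before it can sit inside an element.
encoded = base64.b64encode(raw).decode("ascii")

elem = ET.Element("photo", {"encoding": "base64"})  # hypothetical element
elem.text = encoded
document = ET.tostring(elem)  # serialized XML as bytes

# The receiver reverses the encoding to recover the original bytes.
decoded = base64.b64decode(ET.fromstring(document).text)

print(len(raw), len(encoded), len(document))
```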
XML Schema • XML Schema defines a new schema language to replace DTD • Standardized by W3C in 2001 • Advantages: • Provides data typing and logical structure • Written in XML (easy to process) • Drawback: higher complexity than DTD
XML Schema Example (slide callouts: Element name, Data type, Attribute name, Data type)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="pizza">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="name" type="xsd:string" />
        <xsd:element name="toppings" type="Toppings" />
        <xsd:element name="price" type="xsd:float" />
      </xsd:all>
      <xsd:attribute name="title" type="xsd:string" />
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="Toppings">
    <xsd:sequence>
      <xsd:element name="topping" minOccurs="1" maxOccurs="unbounded" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
• An XML document is an ‘instance document’ of an XML Schema
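Because a schema is itself XML, ordinary XML tools can read it. The sketch below lists the element and attribute declarations of the pizza schema using only the Python standard library. It inspects the schema; it does not validate instance documents, which would need a schema-aware validator.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="pizza">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="toppings" type="Toppings"/>
        <xsd:element name="price" type="xsd:float"/>
      </xsd:all>
      <xsd:attribute name="title" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="Toppings">
    <xsd:sequence>
      <xsd:element name="topping" minOccurs="1" maxOccurs="unbounded"
                   type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

schema = ET.fromstring(SCHEMA)

# Map each declared element / attribute name to its declared type.
# "pizza" has an anonymous complex type, so its type attribute is None.
declared = {
    el.get("name"): el.get("type")
    for el in schema.iter(f"{{{XSD_NS}}}element")
}
attributes = {
    at.get("name"): at.get("type")
    for at in schema.iter(f"{{{XSD_NS}}}attribute")
}

print(declared)
print(attributes)
```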
Simple Types • Simple Types are of three varieties:
• Atomic: built-in or derived, e.g.
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
• List: multiple whitespace-separated items of the same type, e.g.
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
• Union: a union of two or more Simple Types
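What the `myInteger` restriction and the list type mean for a value can be sketched by hand. The two functions below are hypothetical helpers that mimic the facets from the slide; a real schema validator would enforce these rules from the schema itself.

```python
import re

def is_my_integer(text: str) -> bool:
    """Mimic myInteger: an integer with minInclusive 10000, maxInclusive 99999."""
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False  # not a plain integer literal
    return 10000 <= int(text) <= 99999

def parse_list_of_my_int(text: str) -> list:
    """Mimic a list type: whitespace-separated items, each a myInteger."""
    items = text.split()
    bad = [item for item in items if not is_my_integer(item)]
    if bad:
        raise ValueError(f"not valid myInteger values: {bad}")
    return [int(item) for item in items]

print(parse_list_of_my_int("20003 15037 95977 95945"))
```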
Built-in Types • XML Schema defines numerous built-in types: • integer, decimal, token, byte, boolean, date, time, short, long, float, anyURI, language • Facets can be used to restrict existing types: • min/maxInclusive, min/maxExclusive, pattern, enumeration, min/maxLength, length, totalDigits, fractionDigits
Complex Types • Complex Types define logical structures with attributes and nested elements • They use a sequence, choice or all group containing elements that use Simple Types or other Complex Types • May reference types defined elsewhere in the schema or brought in from another schema with the import element
In the Schema of Things • XML Schema supersedes DTD • Defines a typed data format with no semantics or relations between data • Next step: higher level of abstraction and the ability to define objects and relations
Resource Description Framework • W3C standard for describing resources on the World Wide Web (1999, revised 2004) • Objects identified by Uniform Resource Identifiers (URIs) • Generalized to identify objects that may not be retrievable on the Web • RDF data is modelled as a directed graph and can be written in XML syntax
RDF Example • In English: http://www.example.com/people/diaz/contact has the full name Federico Diaz and has an employer called Fisher and Sons.
• As a graph (subject, predicate, object):
• http://www.example.com/people/diaz/contact, http://www.w3.org/2000/10/pim/contact#fullName, Federico Diaz
• http://www.example.com/people/diaz/contact, http://www.w3.org/2000/10/work#employer, http://www.fisherandsons.com/contact
RDF Parts • Each RDF statement is a triple containing a subject (identified by a URI), a predicate (e.g. creator, title, full name) and an object • An object can be either a literal value (e.g. Federico Diaz) or another RDF resource • The subject, the predicate and any resource used as an object can each be identified with a URI, optionally with a fragment identifier (#)
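A minimal sketch of the triple model, using the Federico Diaz statements from the example slide. The `Triple` type and the `objects` helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str          # always a URI
    predicate: str        # always a URI
    obj: str              # a URI or a literal value
    obj_is_literal: bool  # distinguishes the two kinds of object

DIAZ = "http://www.example.com/people/diaz/contact"
FULL_NAME = "http://www.w3.org/2000/10/pim/contact#fullName"
EMPLOYER = "http://www.w3.org/2000/10/work#employer"

triples = [
    Triple(DIAZ, FULL_NAME, "Federico Diaz", True),
    Triple(DIAZ, EMPLOYER, "http://www.fisherandsons.com/contact", False),
]

def objects(subject: str, predicate: str) -> list:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]

print(objects(DIAZ, FULL_NAME))
```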
RDF Semantics • RDF attaches no specific meaning to RDF statements – just like the name of a database field is meaningless to an SQL engine • RDF provides a way to attach data types to literal values, but does not itself define the data types • Generally RDF software uses the XML Schema data types:
<size rdf:datatype="http://www.w3.org/2001/XMLSchema#int">10</size>
• Arbitrary XML can also be used as a literal:
<x:prop rdf:parseType="Literal"><a:size>10</a:size></x:prop>
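How software might act on a datatype annotation can be sketched as a lookup from datatype URI to a conversion function. The `CONVERTERS` table and `to_python` helper are assumptions for the illustration and cover only three XML Schema types.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from datatype URI to a Python conversion.
CONVERTERS = {
    XSD + "int": int,
    XSD + "float": float,
    XSD + "string": str,
}

def to_python(lexical: str, datatype=None):
    """Convert a literal's text using its datatype URI, if one is known."""
    if datatype is None:
        return lexical  # plain literal: keep the text
    convert = CONVERTERS.get(datatype)
    if convert is None:
        return lexical  # unknown datatype: RDF itself assigns no meaning
    return convert(lexical)

print(to_python("10", XSD + "int"), to_python("10"))
```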
RDF Schema • RDF Schema is a ‘vocabulary description language’ that relates resources to each other using RDF • RDFS uses ‘classes’ of objects like in Object-Oriented (OO) systems • Class properties relate to other classes using OO concepts such as generalization
RDF Schema Use • Differs from OO in that Properties are defined in terms of the resources to which they apply (their domain) – they are not restricted to the scope of a single class • domain: the Classes to which a Property applies • range: the Class or datatype of a Property's values (i.e. its type) • Allows new Properties to be created that apply to the same domain without redefining the domain
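One consequence of defining properties by domain and range is that using a property says something about the resources it connects. The sketch below infers class membership from a single statement; the prefixed names and the `Pepperoni` resource are made up for the illustration, and literal-valued ranges are ignored.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: each property lists its domain and range class.
schema = {
    "ex:hasTopping": {"domain": "ex:Pizza", "range": "ex:Topping"},
}

data = [("ex:ShawnsPizza", "ex:hasTopping", "ex:Pepperoni")]

def infer_types(triples, schema):
    """Derive rdf:type statements from the domain and range of each property."""
    inferred = set()
    for subject, predicate, obj in triples:
        rule = schema.get(predicate, {})
        if "domain" in rule:
            inferred.add((subject, RDF_TYPE, rule["domain"]))
        if "range" in rule:
            inferred.add((obj, RDF_TYPE, rule["range"]))
    return inferred

print(sorted(infer_types(data, schema)))
```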
RDFS Classes • Classes introduced by RDFS: • Resource: top-level class • Literal: all literal values such as text strings • Class: the class of all classes • Datatype: the class of all datatypes • Properties introduced by RDFS: • subClassOf, subPropertyOf: inheritance • domain: domain of a Property • range: range of a Property • label, comment: human-readable names and descriptions • seeAlso: pointer to a related resource
RDFS Example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/schemas/food">
  <rdfs:Class rdf:ID="Food"/>
  <rdfs:Class rdf:ID="Pizza">
    <rdfs:subClassOf rdf:resource="#Food"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Topping">
    <rdfs:subClassOf rdf:resource="#Food"/>
  </rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;float"/>
  <rdf:Property rdf:ID="hasTopping">
    <rdfs:domain rdf:resource="#Pizza"/>
    <rdfs:range rdf:resource="#Topping"/>
  </rdf:Property>
  <rdf:Property rdf:ID="price">
    <rdfs:domain rdf:resource="#Pizza"/>
    <rdfs:range rdf:resource="&xsd;float"/>
  </rdf:Property>
</rdf:RDF>
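The class hierarchy in this schema can be read back with ordinary XML tools. The sketch below extracts the subClassOf links from a shortened copy of the example and answers subclass questions. It handles only the rdf:ID and rdf:resource="#..." forms used here and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Food"/>
  <rdfs:Class rdf:ID="Pizza">
    <rdfs:subClassOf rdf:resource="#Food"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Topping">
    <rdfs:subClassOf rdf:resource="#Food"/>
  </rdfs:Class>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# Map each class name to the names of its direct superclasses.
parents = {}
for cls in root.findall(f"{{{RDFS}}}Class"):
    name = cls.get(f"{{{RDF}}}ID")
    parents[name] = [
        sub.get(f"{{{RDF}}}resource").lstrip("#")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf")
    ]

def is_subclass(child: str, ancestor: str) -> bool:
    """Follow subClassOf links upward; every class counts as its own subclass."""
    pending, seen = [child], set()
    while pending:
        current = pending.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(parents.get(current, []))
    return False

print(parents, is_subclass("Pizza", "Food"))
```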
RDF Example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schemas/food#"
         xml:base="http://example.org/things">
  <ex:Pizza rdf:ID="ShawnsPizza">
    <ex:price rdf:datatype="&xsd;float">12.99</ex:price>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
  </ex:Pizza>
</rdf:RDF>
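To show how this XML maps onto triples, the sketch below walks a simplified copy of the instance document with the Python standard library. It assumes the base URI is known in advance and writes the datatype URI in full instead of using the entity; it covers only the constructs in this example, so a real application would use a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/things"  # the xml:base of the original document

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schemas/food#">
  <ex:Pizza rdf:ID="ShawnsPizza">
    <ex:price rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.99</ex:price>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85740"/>
    <ex:hasTopping rdf:resource="http://www.example.org/food/85729"/>
  </ex:Pizza>
</rdf:RDF>"""

def uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into 'namespacelocal'."""
    return tag[1:].replace("}", "", 1)

triples = []
for node in ET.fromstring(DOC):
    # rdf:ID names the subject relative to the base URI.
    subject = BASE + "#" + node.get(f"{{{RDF}}}ID")
    # A typed node element such as <ex:Pizza> states the subject's class.
    triples.append((subject, RDF + "type", uri(node.tag)))
    for prop in node:
        resource = prop.get(f"{{{RDF}}}resource")
        obj = resource if resource is not None else prop.text
        triples.append((subject, uri(prop.tag), obj))

for triple in triples:
    print(triple)
```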
RDF/RDFS • Lets authors create vocabularies of Classes and Properties and show how the terms should be used to describe resources, e.g. • Property ‘author’ applies to class ‘Book’ • Class ‘Employee’ is a subclass of ‘Person’ • Does not define descriptive properties such as ‘dateOfIssue’ or ‘title’ but references them using URIs • As with XML and XML Schema, RDF data can be checked against its RDF Schema, although RDFS statements describe resources and support inference rather than act as strict validation rules
Machines Understanding the Web • RDF/RDFS along with XML/XML Schema provide a means to describe resources on the web with basic generalization • For a higher conceptual level, applications require semantic information • Ontologies serve as a starting point for understanding
Ontologies on the Web • “Ontologies define the terms used to represent an area of knowledge.” – OWL Use Cases & Requirements, 2004 • Example use cases: • A web portal that needs to classify information • Multimedia archive that requires a taxonomy of media or content-specific properties • Corporate portal website that integrates vocabularies from different departments
Web Ontology Language (OWL) • Supersedes DAML+OIL • DARPA Agent Markup Language (DAML) was based on RDF/RDFS and included much of what is now OWL • Adds terms used to better describe relations between classes of RDF resources • With OWL, ontologies can be integrated, extended and shared
Web Ontology Language • Individuals: OWL does not honour the Unique Names Assumption (UNA), so two different names may refer to the same individual • Properties: binary relations between individuals; may be functional, transitive or symmetric • Classes: sets containing individuals, organized into a taxonomy with subclasses and superclasses
Three Flavours of OWL • OWL Lite: for classification hierarchies with simple constraints • OWL DL: as much expressiveness as possible while retaining computational completeness • OWL Full: maximum expressiveness with no computational guarantees; an extension of RDF
OWL Features • OWL improvements on RDF/RDFS: • Cardinality: minCardinality and maxCardinality for Properties with respect to a Class • Equality and disjointness: equivalentClass, equivalentProperty, sameAs, differentFrom, disjointWith • Transitive, Symmetric and Functional Properties: labelling a Property with one of these characteristics allows for reasoning • Transitive: A has B and B has C implies A has C • Symmetric: A has B implies B has A
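The transitive and symmetric rules can be sketched as closures over a set of (subject, object) pairs for one property. The function names are illustrative; a real OWL reasoner does far more than this.

```python
def symmetric_closure(pairs):
    """A has B implies B has A."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """A has B and B has C implies A has C, repeated until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {
            (a, d)
            for a, b in closure
            for c, d in closure
            if b == c
        } - closure
        if not new:
            return closure
        closure |= new

stated = {("A", "B"), ("B", "C")}
print(sorted(transitive_closure(stated)))
print(sorted(symmetric_closure({("A", "B")})))
```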
OWL Features (cont’d) • Boolean expressions of Class relations: unionOf, complementOf, intersectionOf • Property restrictions: limit how properties can be used by an instance of a class • Versioning: priorVersion, versionInfo, incompatibleWith, backwardCompatibleWith
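Since OWL classes are sets of individuals, the Boolean class expressions behave like set operations. The sketch below uses made-up individuals and treats the listed individuals as the whole universe, which is a simplification: OWL reasons under an open-world assumption, so a real complement is not limited to known individuals.

```python
# Hypothetical individuals and class memberships for illustration only.
individuals = {"margherita", "pepperoni_pizza", "veggie_pizza", "cola"}
pizza = {"margherita", "pepperoni_pizza", "veggie_pizza"}
vegetarian = {"margherita", "veggie_pizza", "cola"}

vegetarian_pizza = pizza & vegetarian     # intersectionOf
pizza_or_vegetarian = pizza | vegetarian  # unionOf
non_pizza = individuals - pizza           # complementOf, within this closed toy universe

print(sorted(vegetarian_pizza), sorted(non_pizza))
```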
Conclusion (layer diagram, highest layer first)
• Knowledge layers:
• ???: Conceptual level reasoning – ‘smart’ applications
• OWL: Knowledge processing and reasoning
• RDF, RDF Schema: Resource description and vocabulary
• Data layers:
• XML, XML Schema: Data formatting and data types
• Unicode/ISO byte streams: Machine data representation
References • World Wide Web Consortium http://www.w3.org • XML http://www.w3.org/TR/REC-xml • XML Schema Part 0: Primer http://www.w3.org/TR/xmlschema-0/ • RDF Primer http://www.w3.org/TR/rdf-primer/ • RDF Concepts http://www.w3.org/TR/rdf-concepts/ • RDF/XML Syntax http://www.w3.org/TR/rdf-syntax-grammar/ • RDF Schema http://www.w3.org/TR/rdf-schema/ • OWL Use Cases & Requirements http://www.w3.org/TR/webont-req/ • OWL Overview http://www.w3.org/TR/owl-features/