
XML DTDs and Schemas

XML DTDs and Schemas. Kevin McManus http://staffweb.cms.gre.ac.uk/~mk05/web/XML/1/. XML Basics. We have already looked at… What is XML and why it is significant Content versus presentation Displaying XML documents What XML is actually used for Well-formed XML documents



  1. XML DTDs and Schemas Kevin McManus http://staffweb.cms.gre.ac.uk/~mk05/web/XML/1/ the University of Greenwich

  2. XML Basics
      • We have already looked at…
         • What XML is and why it is significant
         • Content versus presentation
         • Displaying XML documents
         • What XML is actually used for
         • Well-formed XML documents
      • This lecture aims to cover…
         • Further XML syntax
         • Valid XML documents
         • Introduction to DTDs and Schemas
         • Namespaces

  3. XML Technologies
      • Applications of XML: CML, MathML, WML, VoiceML, XHTML, SMIL, SVG, RDF, SOAP, UDDI, WSDL, ebXML, etc.
      • Supporting Specifications: XPath, XLink, XPointer, XQuery, XSLT, XSL-FO, CSS, DOM, etc.
      • Supporting Tools: Browsers (IE, Mozilla), APIs (DOM, SAX), Parsers (Expat, MSXML, Xerces), IDEs (XMLSpy, Stylus)
      • Core XML: Syntax, DTD, XSD, Namespaces

  4. DTDs and Schemas
      • DTDs and schemas (XSD) are alternative ways of defining an XML language.
      • They contain rules that specify things such as
         • the tags in the vocabulary
         • which tags are allowed to be nested in other tags
         • which tags and attributes are optional / mandatory
         • which values are allowed for attributes
      • XML languages defined by DTDs or schemas are used to create valid XML documents.

  5. DTDs and Schemas
      • For an XML document to be valid it must conform to the rules specified in its DTD or Schema
      [Diagram: a DTD or Schema defines an XML language (an encapsulated definition of the data model); XML documents use the language defined in the DTD or Schema]

  6. Why do we need valid documents?
      [Diagram: an Estate Agent and a Mortgage Broker exchange XML in an agreed format]
      • Application code has to validate all data before processing
         • data i/o is a major source of system error
         • check that required elements are present
         • check that attribute values are correct
      • If a change in the format is agreed between the two companies then the application code at both ends needs changing.

  7. Why do we need valid documents?
      • With an agreed DTD or Schema, standard code can be used at each end to generate and check the data
         • off-the-shelf software
         • validating parsers
      • Changes only need to be made in one place
         • the DTD or Schema
      • A DTD or Schema is a way of representing an agreed data model in a machine readable form that can be processed by standard software

  8. Why do we need valid documents?
      • Because DTDs and Schemas are machine readable they can be used by standard software in a variety of ways
      [Diagram: the Estate Agent application and the Mortgage Broker application exchange a valid document; an XML editor and a validating parser both work from the same DTD / Schema]

  9. DTDs and Schemas
      • DTDs
         • easy for humans to cope with
         • older than schemas
         • supported by a much wider range of XML tools and software
         • have poor support for namespaces
      • Schemas
         • more verbose
         • much more expressive than DTDs
            • data types, constraints on values
         • an XML based vocabulary
            • can be manipulated with general purpose XML tools
         • support namespaces

  10. Defining DTDs
      As an example we shall develop a DTD for an XML document type intended to list books recommended by lecturers for various courses. The first version of such documents will have the following structure:
      • the root element is recommended_books
      • the root element contains zero or more book elements
      • each book element contains the following elements: author, title, year_published, publisher, course and recommended_by
      • the author and recommended_by elements both consist of firstname and surname elements

  11. goodbooks1.xml
         <?xml version="1.0" encoding="UTF-8"?>
         <recommended_books>
           <book>
             <author>
               <firstname>Stephen</firstname>
               <surname>Spainhour</surname>
             </author>
             <title>Webmaster in a Nutshell</title>
             <year_published>1999</year_published>
             <publisher>O'Reilly</publisher>
             <course>WAT</course>
             <recommended_by>
               <firstname>Gill</firstname>
               <surname>Windall</surname>
             </recommended_by>
           </book>
           <book>
             <author>
               <firstname>Benoît</firstname>
               <surname>Marchal</surname>
             </author>
             <title>Applied XML Solutions</title>
             <year_published>2000</year_published>
             <publisher>Sams</publisher>
             <course>WAT</course>
             <recommended_by>
               <firstname>Kevin</firstname>
               <surname>McManus</surname>
             </recommended_by>
           </book>
         </recommended_books>
      • None of the tags in this example contain attributes
      • Note how the firstname and surname elements appear in both author and recommended_by elements

  12. goodbooks1.dtd contains 10 element definitions
         <?xml version="1.0" encoding="UTF-8"?>
         <!ELEMENT recommended_books (book*)>
         <!ELEMENT book (author, title, year_published, publisher, course, recommended_by)>
         <!ELEMENT author (firstname, surname)>
         <!ELEMENT title (#PCDATA)>
         <!ELEMENT year_published (#PCDATA)>
         <!ELEMENT publisher (#PCDATA)>
         <!ELEMENT course (#PCDATA)>
         <!ELEMENT recommended_by (firstname, surname)>
         <!ELEMENT firstname (#PCDATA)>
         <!ELEMENT surname (#PCDATA)>

  13. goodbooks1.dtd
      Each declaration is made up of a type, an element / tag name and the element contents:
         type         element / tag name    element contents
         <!ELEMENT    recommended_books     (book*)>
         <!ELEMENT    book                  (author, title, year_published, publisher, course, recommended_by)>
         <!ELEMENT    author                (firstname, surname)>
         <!ELEMENT    title                 (#PCDATA)>
         <!ELEMENT    year_published        (#PCDATA)>
         <!ELEMENT    publisher             (#PCDATA)>
         <!ELEMENT    course                (#PCDATA)>
         <!ELEMENT    recommended_by        (firstname, surname)>
         <!ELEMENT    firstname             (#PCDATA)>
         <!ELEMENT    surname               (#PCDATA)>
      In the file itself there is no space between <! and ELEMENT; the columns here only separate the three parts.

  14. goodbooks1.dtd
      • The DTD can be read as meaning:
         • recommended_books contains zero or more book elements
         • each book element contains, in order, the elements:
            • author
            • title
            • year_published
            • publisher
            • course
            • recommended_by
         • the author and recommended_by elements both consist of firstname and surname elements
         • the title, year_published, publisher, course, firstname and surname elements consist of text
            • the actual data

  15. DTD syntax
         Expression       Meaning of contents
         eleA?            eleA is optional
         eleA+            eleA occurs one or more times
         eleA*            eleA occurs zero or more times
         eleA | eleB      eleA or eleB occurs but not both
         eleA, eleB       eleA is followed by eleB
         (eleA,eleB)*     parentheses ( ) are used to group elements, so this means zero or more occurrences of eleA followed by eleB
         #PCDATA          parsed character data - a string of text
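
      The operators in the table above behave much like regular-expression operators, which suggests one way to experiment with them. The following is a minimal sketch under stated assumptions, not how a validating parser is actually implemented: the helper names content_model_to_regex and children_match are hypothetical, and only element-only content models are handled (no #PCDATA).

```python
import re

def content_model_to_regex(model):
    """Turn an element-only DTD content model into a regex over child names.

    Hypothetical helper for illustration. Each child name is encoded as
    "name;" so a list of children becomes one string the regex can match.
    """
    compact = re.sub(r"\s+", "", model)  # whitespace is not significant here
    # Wrap every element name in a non-capturing group that matches "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # A DTD comma means "followed by", which in a regex is plain concatenation.
    # The operators ? + * | and parentheses mean the same in both notations.
    return re.compile(pattern.replace(",", ""))

def children_match(model, child_names):
    """Return True if the sequence of child element names fits the model."""
    encoded = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None

BOOK_MODEL = "(author, title, year_published?, publisher, course+, recommended_by)"
with_two_courses = children_match(
    BOOK_MODEL,
    ["author", "title", "publisher", "course", "course", "recommended_by"],
)
without_course = children_match(
    BOOK_MODEL, ["author", "title", "publisher", "recommended_by"]
)
```

      Here with_two_courses is accepted because year_published is optional and course may repeat, while without_course is rejected because course+ needs at least one occurrence.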

  16. Four Element Forms
      • Empty Elements have no content
         • they can still carry information in attributes
      • Element-Only Elements contain only child elements
         • the content model is a list of child elements arranged using the expressions listed in the previous table
      • Text-Only Elements contain only character data (text)
         • the content model is simply #PCDATA
      • Mixed Elements contain both child elements and character data
         • the content model must be
            • a choice list beginning with #PCDATA
            • with the rest of the choice list naming the child elements
            • ending in an asterisk, so the whole choice group may occur zero or more times
         • although this constrains the type of child element it does not constrain the order or quantity
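
      To see why a mixed content model cannot constrain order or quantity, it can help to look at how a parser hands mixed content to an application. A minimal sketch using Python's standard xml.etree.ElementTree; the note, b and i elements and the pieces helper are hypothetical, chosen only for illustration, and would be declared as <!ELEMENT note (#PCDATA | b | i)*>.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed element: text interleaved with b and i child elements.
DOC = "<note>Remember <b>this</b> and <i>that</i>.</note>"

def pieces(elem):
    """List the text runs and child elements of elem in document order."""
    out = [("text", elem.text or "")]
    for child in elem:
        out.append((child.tag, child.text or ""))
        # Text following a child is stored on that child as its "tail".
        out.append(("text", child.tail or ""))
    return out

note_pieces = pieces(ET.fromstring(DOC))
```

      The result is simply an alternating run of text and children, which is all that a (#PCDATA | b | i)* model promises.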

  17. Quick Quiz
      Here's a DTD
         <!ELEMENT transactions (tran*)>
         <!ELEMENT tran (account, (debit|credit)?)>
         <!ELEMENT account (#PCDATA)>
         <!ELEMENT debit (#PCDATA)>
         <!ELEMENT credit (#PCDATA)>
      Why is the following not a valid document according to the DTD?
         <transactions>
           <tran><account>7652</account></tran>
           <tran><account>9856</account><credit>23.56</credit></tran>
           <tran><account>0085<debit>45.50</debit></account></tran>
           <tran>
             <account>1134</account>
             <debit>100</debit><credit>23.56</credit>
           </tran>
         </transactions>

  18. goodbooks2.xml
      • Extending the recommended books example to include attributes
      • The definition of the document type is changed to:
         • make the year_published element optional
         • allow more than one course to be referenced
         • include a rating attribute of the book element which can take the values "ok" or "good" or "excellent" and has a default value of "ok"

  19. goodbooks2.xml
         <?xml version="1.0" encoding="UTF-8"?>
         <recommended_books>
           <book rating="excellent">
             <author>
               <firstname>Stephen</firstname>
               <surname>Spainhour</surname>
             </author>
             <title>Webmaster in a Nutshell</title>
             <year_published>1999</year_published>
             <publisher>O'Reilly</publisher>
             <course>WAT</course>
             <course>Internet Publishing</course>
             <recommended_by>
               <firstname>Gill</firstname>
               <surname>Windall</surname>
             </recommended_by>
           </book>
           <book rating="good">
             <author>
               <firstname>Benoît</firstname>
               <surname>Marchal</surname>
             </author>
             <title>Applied XML Solutions</title>
             <publisher>Sams</publisher>
             <course>WAT</course>
             <recommended_by>
               <firstname>Kevin</firstname>
               <surname>McManus</surname>
             </recommended_by>
           </book>
         </recommended_books>
      • each book element now carries a rating attribute
      • the first book has a repeated course element
      • the second book omits year_published

  20. goodbooks2.dtd
         <?xml version="1.0" encoding="UTF-8"?>
         <!ELEMENT recommended_books (book*)>
         <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
         <!ATTLIST book rating (ok | good | excellent) "ok">
         <!ELEMENT author (firstname, surname)>
         <!ELEMENT title (#PCDATA)>
         <!ELEMENT year_published (#PCDATA)>
         <!ELEMENT publisher (#PCDATA)>
         <!ELEMENT course (#PCDATA)>
         <!ELEMENT recommended_by (firstname, surname)>
         <!ELEMENT firstname (#PCDATA)>
         <!ELEMENT surname (#PCDATA)>
      • year_published? means year_published is now optional
      • course+ means course can occur more than once
      • the <!ATTLIST> line is a new rule defining a rating attribute for the book element

  21. Attribute Rules
      • "ok" is the default value, taken from the enumerated list of rating values
      • Other attribute defaults are possible:
         • #REQUIRED – the attribute is required
         • #IMPLIED – the attribute is optional
         • #FIXED value – the attribute has a fixed value (constant)
      • As well as enumerated attribute types there are:
         • CDATA – unparsed character data
         • NOTATION – notation declared elsewhere in the DTD
         • ENTITY – name of an unparsed external entity declared in the DTD
         • ID – unique identifier
         • IDREF – reference to an ID elsewhere in the document
         • NMTOKEN – name containing only token characters, i.e. no whitespace
      • Attributes can be defined anywhere in the DTD
         • but are usually placed immediately after the corresponding element
      • Multiple attributes for an element are declared in a single attribute list
         <!ATTLIST book
            rating (ok | good | excellent) "ok"
            reviewer CDATA #REQUIRED >
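
      The effect of a default value can be observed with a parser that reads the DTD. A minimal sketch using the Expat bindings in Python's standard library (xml.parsers.expat); the book_ratings helper and the cut-down document are assumptions for illustration. Expat is not a validating parser, so it applies the default but would not reject a rating outside the enumerated list.

```python
import xml.parsers.expat

# Cut-down document with the ATTLIST from goodbooks2.dtd in an inline DTD.
DOC = """<!DOCTYPE recommended_books [
  <!ATTLIST book rating (ok | good | excellent) "ok">
]>
<recommended_books>
  <book rating="excellent"/>
  <book/>
</recommended_books>"""

def book_ratings(xml_text):
    """Collect the rating attribute the parser reports for each book element."""
    ratings = []

    def start_element(name, attrs):
        if name == "book":
            ratings.append(attrs.get("rating"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks the end of the input
    return ratings

ratings = book_ratings(DOC)
```

      The second book has no rating in the document, yet the application still receives one because the parser supplies the declared default.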

  22. Not so Quick Quiz
      • How do you decide if information should be in an element or an attribute?

  23. Linking the DTD to the XML document
      The XML document can refer to an external DTD using <!DOCTYPE >
         <?xml version="1.0" encoding="UTF-8"?>
         <!DOCTYPE recommended_books SYSTEM "goodbooks2.dtd">
         <recommended_books>
           <book rating="excellent">
             <author>
               <firstname>Stephen</firstname>
               ......
      • recommended_books in the <!DOCTYPE > line is the name of the root element
      • "goodbooks2.dtd" is the URL of the document containing the DTD
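
      The two parts of the <!DOCTYPE > line can be read back by a parser. A minimal sketch with Python's standard xml.parsers.expat module; the doctype_info helper is hypothetical. Note that Expat as configured here only reports the system identifier and does not fetch or apply the external DTD.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE recommended_books SYSTEM "goodbooks2.dtd">
<recommended_books/>"""

def doctype_info(xml_text):
    """Return the root element name and system identifier from the DOCTYPE."""
    info = {}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_text, True)
    return info

info = doctype_info(DOC)
```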

  24. Linking the DTD to the XML document
      Alternatively the DTD can be included inline within the XML document
         <?xml version="1.0" encoding="UTF-8"?>
         <!DOCTYPE recommended_books [
           <!ELEMENT recommended_books (book*)>
           <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
           <!ATTLIST book rating (ok | good | excellent) "ok">
           <!ELEMENT author (firstname, surname)>
           <!ELEMENT title (#PCDATA)>
           <!ELEMENT year_published (#PCDATA)>
           <!ELEMENT publisher (#PCDATA)>
           <!ELEMENT course (#PCDATA)>
           <!ELEMENT recommended_by (firstname, surname)>
           <!ELEMENT firstname (#PCDATA)>
           <!ELEMENT surname (#PCDATA)>
         ]>
         <recommended_books>
           <book rating="excellent">
             <author>
               <firstname>Stephen</firstname>

  25. Quick Quiz
      Suppose we want to define an element that can contain a mixture of other elements (i.e. tags) and plain text
         <about>
           This program was brought to you by
           <a href="http://www.webbedwonders.co.uk">Webbed Wonders</a>.
           We can be contacted at
           <address>
             <line>Lettuce Towers</line>
             <line>Braythorpe Street</line>
             <line>Wessex</line>
             <postcode>WA1 7QT</postcode>
           </address>
           Thanks for your interest.</about>
      Which of the following do you think is the correct way of specifying in a DTD the <about> element as used above?
         1. <!ELEMENT about (a, address)>
         2. <!ELEMENT about (#PCDATA | a | address)*>
         3. <!ELEMENT about (#PCDATA, a, address)*>
         4. <!ELEMENT about (#PCDATA, a, # PCDATA, address, #PCDATA)>
         5. It's not possible because the document isn't well-formed.

  26. What else can you do with DTDs?
      • Specify that an attribute value is unique within a document (a bit like a primary key in a database table) e.g.
         <!ATTLIST BankBranch BranchID ID #REQUIRED>
      • Specify that the value of one attribute must match the value of an attribute of type ID, by giving it the attribute type IDREF (like a foreign key) e.g.
         <!ATTLIST account branch IDREF #REQUIRED>
         .......
         <BankBranch BranchID="SC30_00_02">
         .......
         <account branch="SC30_00_02">
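
      The primary key / foreign key analogy can be spelled out as the two checks a validating parser performs for ID and IDREF attributes. A minimal sketch under stated assumptions: check_ids is a hypothetical helper, the bank wrapper element and the second account are invented for illustration, and the ID / IDREF declarations are passed in by hand because Python's standard ElementTree parser does not validate.

```python
import xml.etree.ElementTree as ET

DOC = """<bank>
  <BankBranch BranchID="SC30_00_02"/>
  <account branch="SC30_00_02"/>
  <account branch="SC99_99_99"/>
</bank>"""

def check_ids(root, id_attrs, idref_attrs):
    """Report duplicate IDs and IDREFs that match no ID.

    id_attrs and idref_attrs map an element name to the attribute name
    that the DTD declares as type ID or IDREF for that element.
    """
    seen, problems = set(), []
    for elem in root.iter():
        attr = id_attrs.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.get(attr)
            if value in seen:
                problems.append("duplicate ID " + value)
            seen.add(value)
    for elem in root.iter():
        attr = idref_attrs.get(elem.tag)
        if attr and attr in elem.attrib and elem.get(attr) not in seen:
            problems.append("dangling IDREF " + elem.get(attr))
    return problems

problems = check_ids(
    ET.fromstring(DOC), {"BankBranch": "BranchID"}, {"account": "branch"}
)
```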

  27. What else can you do with DTDs?
      • Define your own entities, often commonly used strings e.g.
         <!ENTITY Disclaimer "Umpire decision is final!">
         ........
         <footer>&Disclaimer;</footer>
      • Define ways of handling non-XML data e.g.
         <!NOTATION png SYSTEM 'png_view.exe'>
         ........
         <diagram type="png" file="graph.png">
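
      Entity references are replaced by the parser before the application sees the text. A minimal sketch using Python's standard xml.etree.ElementTree, with the Disclaimer entity from the slide declared in an inline DTD; wrapping it in a document whose root is footer is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE footer [
  <!ENTITY Disclaimer "Umpire decision is final!">
]>
<footer>&Disclaimer;</footer>"""

footer = ET.fromstring(DOC)
footer_text = footer.text  # the parser has already expanded &Disclaimer;
```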

  28. What can you not do with DTDs?
      • Specify the data type (e.g. integer) of an element or attribute
         • the only data type recognised is string
      • Specify a set of values that an element's content may take
         • you can do this for attributes but not elements
      • Write them using XML tools!
         • the <!ELEMENT> and <!ATTLIST> constructs are SGML-style markup declarations, not XML elements
      • Easily mix vocabularies (i.e. XML vocabularies) from different DTDs
      • Accurately define the structure of a mixed element
         • cf. the preceding quick quiz
      • Because of these and other restrictions there have been a number of initiatives to develop alternatives to DTDs
         • the one that has the backing of the W3C is the XML Schemas specification

  29. goodbooks3.xsd
      Re-writing goodbooks2.dtd as an XML schema results in a significantly longer file. This is listed over the next 4 slides with the corresponding DTD for comparison
         <?xml version="1.0" encoding="UTF-8"?>
         <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
           <xs:element name="recommended_books">
             <xs:complexType>
               <xs:sequence>
                 <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
               </xs:sequence>
             </xs:complexType>
           </xs:element>
      Corresponding DTD:
         <!ELEMENT recommended_books (book*)>

  30. goodbooks3.xsd
           <xs:element name="book">
             <xs:complexType>
               <xs:sequence>
                 <xs:element ref="author"/>
                 <xs:element ref="title"/>
                 <xs:element ref="year_published" minOccurs="0"/>
                 <xs:element ref="publisher"/>
                 <xs:element ref="course" maxOccurs="unbounded"/>
                 <xs:element ref="recommended_by"/>
               </xs:sequence>
               ......
      • unless stated the value of minOccurs and maxOccurs is 1
      Corresponding DTD:
         <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>

  31. goodbooks3.xsd
      Note how the attribute definition is nested within the definition of the book element
               ......
               <xs:attribute name="rating" use="optional" default="ok">
                 <xs:simpleType>
                   <xs:restriction base="xs:string">
                     <xs:enumeration value="excellent"/>
                     <xs:enumeration value="good"/>
                     <xs:enumeration value="ok"/>
                   </xs:restriction>
                 </xs:simpleType>
               </xs:attribute>
             </xs:complexType>
           </xs:element>
      Corresponding DTD:
         <!ATTLIST book rating (ok | good | excellent) "ok">

  32. goodbooks3.xsd
           <xs:element name="author">
             <xs:complexType>
               <xs:sequence>
                 <xs:element ref="firstname"/>
                 <xs:element ref="surname"/>
               </xs:sequence>
             </xs:complexType>
           </xs:element>
      Corresponding DTD:
         <!ELEMENT author (firstname, surname)>
      Note the data types in the following declarations:
           <xs:element name="title" type="xs:string"/>
           <xs:element name="year_published" type="xs:short"/>
           <xs:element name="publisher" type="xs:string"/>
           <xs:element name="course" type="xs:string"/>
      Corresponding DTD:
         <!ELEMENT title (#PCDATA)>
         <!ELEMENT year_published (#PCDATA)>
         <!ELEMENT publisher (#PCDATA)>
         <!ELEMENT course (#PCDATA)>

  33. goodbooks3.xsd
           <xs:element name="recommended_by">
             <xs:complexType>
               <xs:sequence>
                 <xs:element ref="firstname"/>
                 <xs:element ref="surname"/>
               </xs:sequence>
             </xs:complexType>
           </xs:element>
      Corresponding DTD:
         <!ELEMENT recommended_by (firstname, surname)>
      The remaining declarations, followed by the end of the schema:
           <xs:element name="firstname" type="xs:string"/>
           <xs:element name="surname" type="xs:string"/>
         </xs:schema>
      Corresponding DTD:
         <!ELEMENT firstname (#PCDATA)>
         <!ELEMENT surname (#PCDATA)>

  34. Things to notice about goodbooks3.xsd
      • XML schemas are much more verbose than DTDs
      • The XML schema language itself conforms to XML syntax rules and so can be manipulated using standard XML tools (e.g. XMLSpy)
      • More specific restrictions can be made on the occurrence of elements than with DTDs e.g.
         <!ELEMENT recommended_books (book*)>
         <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
         • both of the above mean the same, but in schemas minOccurs and maxOccurs can also be set to specific numbers to restrict the number of allowed occurrences
      • In DTDs the only data type for elements is #PCDATA whereas schemas contain much more support for data types e.g.
         <xs:element name="title" type="xs:string"/>
         <xs:element name="year_published" type="xs:short"/>
      • A full range of data types is supported (e.g. boolean, float, dateTime) and you can define your own
      • XML Schemas make use of namespaces
         <xs:element name="recommended_books">
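
      To make the difference between #PCDATA and a schema data type concrete, here is a rough sketch of the kind of check that xs:short implies for year_published: an optionally signed whole number in the 16-bit range. The is_xs_short helper is hypothetical and only approximates what a schema validator does.

```python
import re

def is_xs_short(text):
    """Approximate the xs:short check: a whole number from -32768 to 32767."""
    text = text.strip(" \t\r\n")  # surrounding XML whitespace is ignored
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    return -32768 <= int(text) <= 32767

year_ok = is_xs_short("1999")
words_ok = is_xs_short("nineteen ninety-nine")
too_big_ok = is_xs_short("40000")
```

      With the DTD all three strings are acceptable content for year_published, because #PCDATA places no constraint on the text; under the schema only the first would pass.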

  35. Linking a Schema to an XML document
      Not totally standard and somewhat tied to W3C but the method below works with at least some tools that support Schemas
         <?xml version="1.0" encoding="UTF-8"?>
         <recommended_books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                            xsi:noNamespaceSchemaLocation="goodbooks3.xsd">
           <book rating="excellent">
             <author>
               <firstname>Stephen</firstname>
               <surname>Spainhour</surname>
               ......
      • the xsi:noNamespaceSchemaLocation attribute associates the schema stored in goodbooks3.xsd, in the same directory, with the XML document
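
      The schema location is an ordinary namespaced attribute on the root element, so a tool can read it like any other attribute. A minimal sketch with Python's standard xml.etree.ElementTree, using a cut-down root element; ElementTree itself does not load or apply the schema, it only exposes the hint.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<recommended_books
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="goodbooks3.xsd"/>"""

root = ET.fromstring(DOC)
# ElementTree spells a namespaced attribute name as "{namespace-URI}local-name".
schema_hint = root.get("{%s}noNamespaceSchemaLocation" % XSI)
```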

  36. Namespaces
      Namespaces are a way of avoiding name conflicts, i.e. where different XML vocabularies use the same names to mean different things.
      In designing an XML based language we may want to include elements from several other XML languages, e.g. when defining a new XML language to describe invoice documents we may want to draw on existing languages for describing products and customers.
      [Diagram: InvoiceML draws on ProductML and CustomerML]

  37. Namespaces
      What to do about name clashes? E.g. it is likely that ProductML and CustomerML both contain <name> elements
         <name>Giant Widget</name>
         <name>George Barford</name>
      We don't want applications that process InvoiceML to confuse the <name> elements.
         Dear Mr Giant Widget,
         Your George Barford has been despatched today ...

  38. Namespaces
      Namespaces give a mechanism for "qualifying" element names with a prefix so that they are all unique, e.g.
         <prod:name>Giant Widget</prod:name>
         <cust:name>George Barford</cust:name>
      Wherever you see element names including a prefix followed by a ":" you can be sure that namespaces are being used e.g.
         <xs:element name="event">

  39. Namespaces
      The prefix needs to be defined in the XML document that is using it by including the xmlns attribute. For example, to define the prod: and cust: prefixes in an invoice document
         <invoices xmlns:prod="http://mycompany.com/products"
                   xmlns:cust="http://mycompany.com/customers"
                   xmlns="http://mycompany.com/invoices">
           <invoice>
             <invoice_id>2314</invoice_id>
             ....
             <prod:name>Giant Widget</prod:name>
             <cust:name>George Barford</cust:name>
             ....
           </invoice>
         </invoices>
      • xmlns:prod declares a namespace associated with the prod prefix
      • xmlns:cust declares a namespace associated with the cust prefix
      • xmlns declares a default namespace that uses no prefix
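
      A namespace-aware parser keeps the two name elements apart by their namespace URI rather than by their prefix. A minimal sketch with Python's standard xml.etree.ElementTree, using the invoice document above with the elided parts left out; the short prefixes in the NS mapping are arbitrary choices made for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<invoices xmlns:prod="http://mycompany.com/products"
          xmlns:cust="http://mycompany.com/customers"
          xmlns="http://mycompany.com/invoices">
  <invoice>
    <invoice_id>2314</invoice_id>
    <prod:name>Giant Widget</prod:name>
    <cust:name>George Barford</cust:name>
  </invoice>
</invoices>"""

# Prefixes used for searching; they need not match the ones in the document.
NS = {
    "inv": "http://mycompany.com/invoices",
    "p": "http://mycompany.com/products",
    "c": "http://mycompany.com/customers",
}

root = ET.fromstring(DOC)
product = root.find("inv:invoice/p:name", NS).text
customer = root.find("inv:invoice/c:name", NS).text
```

      Unprefixed elements such as invoice and invoice_id belong to the default namespace, which is why the search path has to qualify invoice as well.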

  40. Namespaces
      In the previous example it is tempting to guess that this line…
         <invoices xmlns:prod="http://mycompany.com/products"
                   xmlns:cust="http://mycompany.com/customers"
                   xmlns="http://mycompany.com/invoices">
      associates the prod: prefix with an XML Schema located at http://mycompany.com/products and cust: with one at http://mycompany.com/customers
      But these URLs need not be actual locations at all - they are simply unique names used to identify namespaces. URIs (URLs & URNs) are convenient ways of specifying unique values.
      There is a way of tying prefixes to actual XML Schemas (but not DTDs) so that documents can be validated against multiple Schemas. The syntax is both messy and unclear and beyond what we are going to look at here.
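
      That the URI is only a name, and the prefix only a local shorthand for it, can be checked directly. A minimal sketch with Python's standard xml.etree.ElementTree: the two fragments below use different prefixes for the same URI, and parsing them involves no attempt to contact mycompany.com.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, different prefixes.
a = ET.fromstring(
    '<prod:name xmlns:prod="http://mycompany.com/products">Giant Widget</prod:name>'
)
b = ET.fromstring(
    '<p:name xmlns:p="http://mycompany.com/products">Giant Widget</p:name>'
)

same_name = a.tag == b.tag  # compared as {URI}local-name, prefix ignored
```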

  41. References
      • There are masses of XML books and websites.
      • “SAMS Teach Yourself XML in 24 hours” - Morrison
         • Cheap as chips, good scope but little depth
      • W3Schools online tutorial http://www.w3schools.com
         • Try their online XML test
      • World Wide Web consortium at http://www.w3.org
         • The home of the XML specification and so much more.
      • XML in practice from http://www.xml.org
         • Articles, white papers, user groups and more
      • XML resources and information from http://www.xml.org
         • Provided by Tim O’Reilly

  42. Summary
      • DTDs or Schemas are used to define valid XML languages
      • DTDs
         • are widely supported
         • have limited features
      • XSDs
         • are an XML language
         • provide tighter specification than DTDs
         • support namespaces

  43. Questions?
