Introduction to XML: Is a Well-formed XML Document Valid?
Is a Well-formed Document Valid?
• An XML document is well-formed if it follows all of the syntax "rules" of XML, such as proper nesting and correct attribute syntax. By definition, every XML document is well-formed; text that breaks these rules is not XML at all.
• A valid document, on the other hand, is one that is not only well-formed but also follows the restrictions set out in a specific grammar, typically specified in a Document Type Definition (DTD) or some form of XML schema.
Is a Well-formed Document Valid?
• An example of a document that is well-formed but not valid according to the XHTML grammar:
    <body>
      <p>Example of Well-formed HTML</p>
      <head><title>Example</title></head>
      <zorko>What is this?</zorko>
    </body>
• Why?
HTML vs. XML
• In the case of HTML, browsers have been taught to ignore invalid markup such as the <zorko> element, and they generally do their best when dealing with badly placed HTML elements.
• An XML processor, on the other hand, has no built-in knowledge of which elements and attributes are valid. As a result, we need to define the XML markup we are using. To do this, we define the markup language's grammar.
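The difference between well-formed and valid can be checked mechanically. The following is a minimal sketch in Python (a language chosen here for illustration; the slides themselves use none). It assumes only the standard library's xml.etree.ElementTree parser, which checks well-formedness and does not validate against any grammar. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; no grammar (DTD) is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The slide's example: every tag is properly nested, so it is well-formed,
# even though <head> inside <body> and <zorko> are not valid XHTML.
WELL_FORMED_NOT_VALID = (
    "<body><p>Example of Well-formed HTML</p>"
    "<head><title>Example</title></head>"
    "<zorko>What is this?</zorko></body>"
)

# Overlapping tags break the nesting rule, so this is not XML at all.
BADLY_NESTED = "<body><p>Overlapping tags</body></p>"

print(is_well_formed(WELL_FORMED_NOT_VALID))
print(is_well_formed(BADLY_NESTED))
```

Deciding that <zorko> is not allowed requires a grammar and a validating parser, which this sketch deliberately leaves out.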
Introduction to XML: Tools Used to Create XML Languages
Tools of the Trade
• There are numerous “tools” that can be used to build an XML language, some relatively simple, some much more complex.
• They include:
  • DTD (Document Type Definition)
  • RELAX
  • TREX
  • RELAX NG
  • XML Schema
  • Schematron
DTD
• The Document Type Definition is inherited directly from SGML.
• DTDs are used to describe HTML and XHTML as well as other markup languages.
• DTDs have their own special notation, which is where we will start.
RELAX
• Written by Dr. Murata Makoto.
• RELAX stands for Regular Language description for XML.
• It is a simpler notation for describing markup grammar than XML Schema.
• Like XML Schema, it is also XML-based.
• Resources:
  • http://www.xml.gr.jp/relax/
  • http://www.xml.gr.jp/relax/html4/howToRELAX_full_en.html
TREX
• Designed by James Clark.
• TREX stands for Tree Regular Expressions for XML.
• TREX is a simple, concise notation.
• Resources:
  • http://www.thaiopensource.com/trex/
  • http://www.thaiopensource.com/trex/tutorial.html
RELAX NG
• Created by James Clark and Dr. Murata Makoto.
• RELAX NG stands for RELAX Next Generation.
• RELAX NG is a convergence of the RELAX and TREX grammars.
• Resources:
  • http://www.oasis-open.org/committees/relax-ng/spec-20011203.html
  • http://www.oasis-open.org/committees/relax-ng/tutorial.html
XML Schema
• Developed by the W3C.
• It is an XML-based markup language used to describe grammar.
• It is VERY complicated in comparison to other methods of describing grammar.
Schematron
• Developed by Rick Jelliffe.
• Allows you to define patterns rather than a grammar, unlike DTDs, RELAX NG, and XML Schema, among others.
• Resources:
  • http://www.ascc.net/xml/resource/schematron/
DTDs: Associating DTDs with XML Documents
Types of DTDs
• In general, there are two main types of DTDs:
  • Internal
  • External
• Internal DTDs reside within the XML instance file, whereas external DTDs reside in a separate DTD document.
• Later we will find out how to combine the two to form a mixed DTD.
Internal DTDs
• An internal DTD is embedded within the XML instance document using the following syntax:
    <!DOCTYPE root-element [
    <!-- DTD declarations -->
    ]>
An Example
    <?xml version="1.0"?>
    <!DOCTYPE body [
    <!ELEMENT body (#PCDATA)>
    <!ATTLIST body color CDATA #IMPLIED>
    ]>
    <body color="blue">Content Goes Here</body>
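To see how a parser treats an internal DTD, here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree (which is built on the non-validating expat parser). It reads the internal subset without complaint, but it does not enforce it: a document that breaks the declared content model still parses. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The slide's example document, with straight quotes.
INTERNAL_DTD_DOC = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE body [\n"
    "<!ELEMENT body (#PCDATA)>\n"
    "<!ATTLIST body color CDATA #IMPLIED>\n"
    "]>\n"
    '<body color="blue">Content Goes Here</body>'
)

root = ET.fromstring(INTERNAL_DTD_DOC)
print(root.tag, root.attrib, root.text)

# Replace the text with an undeclared child element. The DTD says body holds
# only #PCDATA, so this document is invalid, yet it is still well-formed and
# the non-validating parser accepts it.
violates_dtd = INTERNAL_DTD_DOC.replace("Content Goes Here", "<zorko/>")
invalid_root = ET.fromstring(violates_dtd)
print(invalid_root[0].tag)
```

Enforcing the DTD requires a validating parser, which the Python standard library does not provide.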
External DTDs
• External DTDs come in two forms:
  • LOCAL (referenced with the SYSTEM keyword) and
  • PUBLIC
• Regardless of whether you are using a local or public DTD, to link an external DTD to a document you must include a DOCTYPE declaration within your XML document, just as you should with an HTML or XHTML document.
Specifying a LOCAL DTD
• To specify a DTD on a LOCAL, non-public server, use the following format for the DOCTYPE declaration:
    <!DOCTYPE root-element SYSTEM "uri-of-dtd">
An Example
    <?xml version="1.0"?>
    <!DOCTYPE address-book SYSTEM "/httpd/dtds/catalog.dtd">
    <address-book>
      <contact>
        <last-name>Smith</last-name>
        <first-name>Joe</first-name>
        <phone type="home">408-555-5555</phone>
        <email>Joe.Smith@samplemail.com</email>
      </contact>
    </address-book>
Specifying a PUBLIC DTD
• To specify a DTD on a PUBLIC server that is widely known and advertised, use the following format for the DOCTYPE declaration:
    <!DOCTYPE root-element PUBLIC "public-identifier" "uri-of-dtd">
An Example
    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
      <head>
        <title>Sample Document</title>
      </head>
      <body>Content Goes Here</body>
    </html>
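The three parts of a PUBLIC DOCTYPE declaration (root element, public identifier, URI of the DTD) can be read back from a parsed document. This is a minimal sketch, assuming Python's standard xml.dom.minidom module; it only records the identifiers and does not download or apply the DTD.

```python
from xml.dom import minidom

XHTML_DOC = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    "<html><head><title>Sample Document</title></head>"
    "<body>Content Goes Here</body></html>"
)

# The DOCTYPE declaration is exposed as the document's doctype node.
doctype = minidom.parseString(XHTML_DOC).doctype

print(doctype.name)      # root element named in the DOCTYPE
print(doctype.publicId)  # the public identifier
print(doctype.systemId)  # the URI of the DTD
```

For a SYSTEM declaration the same attributes apply, with publicId left unset.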
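Mixed DTDs
• A mixed DTD is a combination of both an external and an internal DTD.
• When both are present, the internal declarations are read first. For entity and attribute declarations, the first declaration is binding, so an internal declaration takes precedence over an external declaration of the same name. An element type, however, may be declared only once: declaring the same element in both places is a validity error.
    <!DOCTYPE root-element EXTERNAL-DTD-TYPE "public-identifier" "uri-of-dtd" [
    <!-- DTD declarations -->
    ]>
• EXTERNAL-DTD-TYPE is either SYSTEM or PUBLIC; the public identifier is given only with PUBLIC.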
An Example
    <?xml version="1.0"?>
    <!DOCTYPE address-book SYSTEM "/httpd/dtds/catalog.dtd" [
    <!ELEMENT phone (home | work)>
    <!ELEMENT home (#PCDATA)>
    <!ELEMENT work (#PCDATA)>
    ]>
    <address-book>
      <contact>
        <last-name>Smith</last-name>
        <first-name>Joe</first-name>
        <phone><home>408-555-5555</home></phone>
      </contact>
    </address-book>
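An Example
    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
    <!ELEMENT body (#PCDATA)>
    <!ATTLIST body color CDATA #IMPLIED>
    ]>
    <html>
      <head>
        <title>Sample Title</title>
      </head>
      <body color="blue">Content Goes Here</body>
    </html>
• Note: the XHTML DTD already declares body, so a validating parser will report the second <!ELEMENT body> declaration as a validity error. The <!ATTLIST> declaration, which adds the color attribute, is the part that takes effect.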
DTDs: Creating DTD Documents
Getting Started
• Before taking our markup language to the keyboard, we must first decide which information should be described as an element and which as an attribute.
• Elements, in general, hold content that is part of the document (output).
• Attributes, on the other hand, are used to modify the behavior of an element.
Next Steps
• Next, we should write a single record using these elements and attributes in an XML instance document and check that it is well-formed.
• Once we have decided on the elements and attributes for our markup language and created a sample XML instance file, we can begin building the DTD file.
Building the DTD
• When we build the DTD file, we can order it in several different ways, including:
  • Top-Down
  • Alphabetical
  • Random
• A top-down ordered DTD begins with the outermost or root element and progresses downwards until it reaches the innermost element(s). This is the typical format you would see in a public DTD. (See Case Studies 2 and 4.)
• An alphabetical DTD places all elements in alphabetical order as the DTD is built.
A Few Notes About DTDs
• When creating DTD documents, remember that the statements are normally not INDENTED; they are flush left unless a line continuation is involved.
• Attributes associated with an element are normally defined immediately after the element is defined.
• Entities are normally defined at the beginning of the document, prior to their use.
DTD Syntax: Comments
Comments
• A comment within a DTD is written the same way as in HTML:
    <!-- a sample DTD comment -->
• The space after the opening dashes and before the closing dashes is not required by XML, but it is conventional and keeps comments readable.
• Two or more consecutive dashes (--) can not be placed within the DTD comment, except as part of the opening <!-- and the closing -->.
• Entity references are not recognized inside comments, so they are not expanded there.
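The comment rules above can be tried out directly. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper names parses and doc_with_dtd_comment are hypothetical, and each comment is placed inside an internal DTD so that it is read as a DTD comment.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def doc_with_dtd_comment(comment: str) -> str:
    """Wrap a comment in a tiny document with an internal DTD."""
    return (
        "<!DOCTYPE catalog [\n"
        f"{comment}\n"
        "<!ELEMENT catalog EMPTY>\n"
        "]>\n"
        "<catalog/>"
    )


spaced = parses(doc_with_dtd_comment("<!-- a sample DTD comment -->"))
unspaced = parses(doc_with_dtd_comment("<!--no surrounding spaces-->"))
double_dash = parses(doc_with_dtd_comment("<!-- two -- dashes -->"))
entity_text = parses(doc_with_dtd_comment("<!-- &undeclared; stays as text -->"))

print(spaced, unspaced, double_dash, entity_text)
```

Only the comment containing a double dash is rejected; the undeclared entity reference is ignored because comments are not scanned for references.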
DTD Syntax: Elements
Element Basics
• Elements are defined within a DTD using an <!ELEMENT> declaration.
• <!ELEMENT> declarations, like all other declarations within a DTD, have no content: each is a single self-contained declaration with no closing tag.
• An <!ELEMENT> declaration is composed of several parts, including the element name and the type of content the element will contain.
• The resulting element names are case-sensitive.
    <!ELEMENT element_name element_contents>
An Example
DTD File:
    <!ELEMENT catalog element_content>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog>
      <!-- element content goes here -->
    </catalog>
What an <!ELEMENT> Can Contain
• An <!ELEMENT> declaration can specify several different types of content, including:
  • EMPTY
  • PCDATA
  • ANY
  • Children elements
EMPTY
• <!ELEMENT> declarations that include the EMPTY value allow us to create empty elements within our XML.
• The word EMPTY must be entered in uppercase, as it is case-sensitive.
    <!ELEMENT element_name EMPTY>
An Example
DTD File:
    <!ELEMENT catalog EMPTY>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog/>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog></catalog>
• This second form can be considered misleading. Why?
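A parser's view of the two forms can help with the question above. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the describe helper is hypothetical. It shows that <catalog/> and <catalog></catalog> produce the same parsed element, while even a single space between the tags counts as content, which an element declared EMPTY may not have.

```python
import xml.etree.ElementTree as ET


def describe(element):
    """Summarize an element as (tag, text content, number of children)."""
    return (element.tag, element.text, len(element))


short_form = describe(ET.fromstring("<catalog/>"))
long_form = describe(ET.fromstring("<catalog></catalog>"))
with_space = describe(ET.fromstring("<catalog> </catalog>"))

print(short_form)
print(long_form)
print(with_space)
```

The start-tag and end-tag form is equivalent to the parser, but it invites authors to type something between the tags.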
PCDATA
• <!ELEMENT> declarations that include the value PCDATA allow us to include text and other parsable content in our elements within our XML instance file.
• The word PCDATA must be enclosed in parentheses with a preceding '#' and entered in uppercase, as it is case-sensitive.
• PCDATA (parsed character data) is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded. Note that an element declared as (#PCDATA) alone may not contain child elements in a valid document.
    <!ELEMENT element_name (#PCDATA)>
An Example
DTD File:
    <!ELEMENT catalog (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog> Household Plants </catalog>
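What "parsed" means for character data can be seen in a short experiment. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the sample text is made up for illustration. An entity reference in the text is expanded, and a raw less-than sign is read as the start of markup, so it makes the document ill-formed.

```python
import xml.etree.ElementTree as ET

# The predefined entity &amp; is expanded to "&" in the parsed text.
catalog = ET.fromstring("<catalog>Household Plants &amp; Flowers</catalog>")
print(catalog.text)

# A raw "<" inside the text is treated as the start of a tag, not as text.
try:
    ET.fromstring("<catalog>Plants < Flowers</catalog>")
    raw_less_than_accepted = True
except ET.ParseError:
    raw_less_than_accepted = False
print(raw_less_than_accepted)
```

To include a literal less-than sign in PCDATA, write it as the entity reference &lt; instead.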
ANY
• <!ELEMENT> declarations that include the value ANY allow us to include any type of parsable content, including text and other declared elements, in our elements within our XML instance file.
• The word ANY must be entered in uppercase, as it is case-sensitive.
    <!ELEMENT element_name ANY>
An Example
DTD File:
    <!ELEMENT catalog ANY>
    <!ELEMENT xyz (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog>
      <xyz> Catalog Content </xyz>
    </catalog>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog> Household Plants </catalog>
Children Elements
• As mentioned before, <!ELEMENT> declarations can contain children elements, allowing elements to be nested within a document.
• Children elements can be added in a variety of ways, including a single element, multiple elements, and optional elements.
A Single Child Element
• <!ELEMENT> declarations that include other elements as values allow us to specify children elements within our XML.
• Children elements should be listed after the parent <!ELEMENT>.
    <!ELEMENT element_name (child_element)>
An Example
DTD File:
    <!ELEMENT catalog (card)>
    <!ELEMENT card (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog>
      <card> Card Information </card>
    </catalog>
Optional Children Elements
• Zero or one of a single child element can be specified in an <!ELEMENT> declaration by placing a question mark (?) immediately after the child element name.
    <!ELEMENT element_name (child_element?)>
An Example
DTD File:
    <!ELEMENT catalog (card?)>
    <!ELEMENT card (#PCDATA)>
• Note that this example is not quite right. Why?
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog> </catalog>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog>
      <card> Card Information </card>
    </catalog>
An Example (0 or 1 room)
DTD File:
    <!ELEMENT class (room?)>
    <!ELEMENT room (seats)>
    <!ELEMENT seats (seat)>
    <!ELEMENT seat (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE class SYSTEM "class.dtd">
    <class>
      <room>
        <seats>
          <seat>1</seat>
        </seats>
      </room>
    </class>
An Example (0 or 1 room)
DTD File:
    <!ELEMENT class (room?)>
    <!ELEMENT room (seats?)>
    <!ELEMENT seats (seat*)>
    <!ELEMENT seat (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE class SYSTEM "class.dtd">
    <class></class>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE class SYSTEM "class.dtd">
    <class/>
Zero or More Children Elements
• Zero or more of a single child element can be specified in an <!ELEMENT> declaration by placing an asterisk (*), sometimes called a splat, immediately after the child element name.
    <!ELEMENT element_name (child_element*)>
An Example
DTD File:
    <!ELEMENT catalog (card*)>
    <!ELEMENT card (#PCDATA)>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog></catalog>
XML File:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE catalog SYSTEM "catalog.dtd">
    <catalog>
      <card> Card Information 1 </card>
      <card> Card Information 2 </card>
    </catalog>
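The occurrence indicators covered so far (no indicator, ? and *) behave like their regular-expression counterparts. The following is a rough sketch of that idea in Python, not a real DTD validator: the hypothetical helper matches_model handles only a one-name model such as card, card? or card*, and checks an element's direct children against it.

```python
import re
import xml.etree.ElementTree as ET


def matches_model(element, model):
    """Check direct children against a one-name model: 'x', 'x?' or 'x*'."""
    name = model.rstrip("?*")          # child element name
    indicator = model[len(name):]      # "", "?" or "*"
    # Each child is written as "name " so the regex can count occurrences.
    pattern = "(?:" + re.escape(name) + " )" + indicator
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(pattern, child_names) is not None


no_cards = ET.fromstring("<catalog></catalog>")
one_card = ET.fromstring("<catalog><card>1</card></catalog>")
two_cards = ET.fromstring("<catalog><card>1</card><card>2</card></catalog>")

for model in ("card", "card?", "card*"):
    results = [matches_model(doc, model) for doc in (no_cards, one_card, two_cards)]
    print(model, results)
```

With no indicator exactly one card is required, with ? at most one is allowed, and with * any number is accepted, including none.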