XML An introduction
XML
• XML, like HTML, is derived from the Standard Generalized Markup Language (SGML).
A brief introduction to XML: a simple XML document
<?xml version = "1.0"?>
<!-- a simple xml example; this is a comment -->
<mymessage>
   <message>Welcome to XML!</message>
</mymessage>
XML documents and format
• An XML document contains data, not formatting information. As we'll learn, there are ways (XSL and XSL-FO files, for example) to provide formatting for XML, much as CSS provides formatting for HTML.
XML
• XML documents are typically stored in files with the suffix .xml, though this is not required. They can be created with any text editor (save as plain text). Many packages, such as MS Word, can also save files as type .xml.
• An XML document contains a single root element, which contains the other elements. Anything appearing before the root is called the prolog. Elements directly under the root are its children. The structure is recursive.
• In the example, the root's child message contains the text "Welcome to XML!".
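The root/child structure can be inspected with a few lines of code. This is only a sketch: it uses Python's standard xml.etree.ElementTree module as a stand-in for whatever parser the course uses, and the document string is the simple example from the earlier slide.

```python
import xml.etree.ElementTree as ET

# The simple document from the earlier slide, held in a string.
doc = """<?xml version="1.0"?>
<!-- a simple xml example -->
<mymessage>
   <message>Welcome to XML!</message>
</mymessage>"""

root = ET.fromstring(doc)                  # returns the root element
children = [child.tag for child in root]   # elements directly under the root

print(root.tag)                    # mymessage
print(children)                    # ['message']
print(root.find("message").text)   # Welcome to XML!
```

The XML declaration and the comment belong to the prolog, so they do not show up as children of the root.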
The character set
• XML characters are tab, CR, LF and the legal Unicode characters.
• An XML document consists of markup and character data.
• Markup is enclosed in angle brackets (as in HTML): <>
• Character data appears between the start tag and the end tag.
• An XML parser passes whitespace characters to the application. Insignificant whitespace can be collapsed in a process called normalization.
• It is a good idea to add whitespace to an XML document for readability.
• &, <, >, ' and " are reserved characters. An "entity reference" makes it possible to use these as characters in the character data part of an XML document.
• Entity references begin with & and end with ; (for example, &lt; stands for <).
• In this way character data is not confused with markup.
• Single and double quotes are used to delimit attribute values.
More on syntax
• There must be exactly one root.
• Proper nesting of elements is required.
• Start tags require end tags.
• Unlike HTML, the author can define her own tags in XML.
• Tags are case sensitive.
• The parser needs to distinguish markup from character data.
• Typically, whitespace is normalized: runs of whitespace are reduced to a single whitespace character.
• Entity references begin with an ampersand and allow us to use the metacharacters ('<', '>' and so on) that are part of the language syntax.
• Entity references (for example, "&lt;") allow us to represent and distinguish the reserved characters <, > and & in XML.
• In character data, < and & may appear only as entity references.
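The syntax rules above can be checked mechanically. The sketch below assumes Python's xml.etree.ElementTree as the parser; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the slide.
samples = {
    "ok": "<a><b>text</b></a>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "missing end tag": "<a><b></a>",
    "case mismatch": "<a></A>",
    "raw ampersand": "<a>fish & chips</a>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, ok)
```

Only the first sample passes; each of the others breaks exactly one of the rules listed.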
XML intro continued
• A DOM-based parser returns a tree structure. A DOM parser must process the entire document to create a (Java) object, which may be 3 or 4 times the size of the original. This is not advisable if there are storage constraints.
• A SAX (Simple API for XML) based parser returns events. SAX parsers have a smaller footprint.
• Many parsers can be downloaded for free, and several come with Java 1.4+.
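To make the DOM/SAX contrast concrete, here is a small sketch that parses the same document both ways. It uses Python's standard xml.dom.minidom and xml.sax modules rather than the Java parsers mentioned above; the EventLogger class is a hypothetical handler written for this illustration.

```python
import xml.dom.minidom
import xml.sax

doc = "<mymessage><message>Welcome to XML!</message></mymessage>"

# DOM: the whole document is parsed into a tree that stays in memory.
dom = xml.dom.minidom.parseString(doc)
message = dom.getElementsByTagName("message")[0]
dom_text = message.firstChild.data

# SAX: the parser calls the handler as it meets each construct.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in more than one chunk.
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(doc.encode("utf-8"), handler)

print(dom_text)
print(handler.events)
```

The DOM version can navigate the tree freely after parsing, while the SAX version sees each start tag, text chunk and end tag once, in document order, and keeps only what the handler chooses to store.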
A brief introduction to XML
• An XML validator parses an XML document and indicates whether it is correct.
• A number of free validators are available, including one from MS, which I downloaded and used in this presentation.
Validator
Microsoft provides a validating program free for download (with JavaScript and VBScript versions) at
http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp
Or search MSDN+validator.
There are others out there:
http://validator.w3.org/
http://www.stg.brown.edu/service/xmlvalid/
http://www.w3schools.com/XML/xml_validator.asp
Link to validator program on my w drive
• http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
• This is a link to the JavaScript validator.
• http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm
• This is a link to the VBScript validator.
MS Validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
Parser continued
• The parser will indicate whether the document is well-formed.
• In the tree view produced by DOM-based parsing, a '+' in the left margin indicates that a node has children, and a '-' indicates that all of its child nodes have been expanded.
• The MS Validator uses color coding to indicate that child nodes can be expanded.
• An element that stores other elements is called a container element.
• If the document is well-formed, the parser makes its content available for further processing.
Reserved characters
• <message>&lt;&gt;&amp;</message> would enable a character data message to contain the characters: <>&
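A short sketch of the round trip between reserved characters and entity references, assuming Python's standard xml.etree.ElementTree and xml.sax.saxutils modules. The C-like test string passed to escape() is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing turns the entity references back into plain characters.
elem = ET.fromstring("<message>&lt;&gt;&amp;</message>")
print(elem.text)      # <>&

# Going the other way, escape() rewrites &, < and > as entity references.
escaped = escape("if (a < b && b > c)")
print(escaped)        # if (a &lt; b &amp;&amp; b &gt; c)

# A serializer does the same thing when it writes character data.
out = ET.Element("message")
out.text = "<>&"
serialized = ET.tostring(out, encoding="unicode")
print(serialized)     # <message>&lt;&gt;&amp;</message>
```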
DTD: document type definition
• A DTD file may contain the definition of an XML structure.
• XML files may refer back to a DTD.
• If an XML document has a DTD or Schema, a validating parser can determine not merely whether it is well-formed XML, but whether it is valid.
• Valid means conforming to a DTD or schema.
Another example: Unicode
• lang.xml (next slide) uses Unicode entity references to represent Arabic words.
• lang.dtd (also shown in a later slide) defines the entities that supply Unicode (Arabic) text for some entity references in the XML file.
DTD: document type definition: a DTD file may contain the definition of an XML structure.
<?xml version = "1.0"?>
<!-- Fig. 5.4 : lang.xml -->
<!-- Demonstrating Unicode -->
<!DOCTYPE welcome SYSTEM "lang.dtd">
<welcome>
   <from>
      <!-- Deitel and Associates -->
      دايتَل أند
      <!-- entity -->
      &assoc;
   </from>
   <subject>
      <!-- Welcome to the world of Unicode -->
      أهلاً بكم فيِ عالم
      <!-- entity -->
      &text;
   </subject>
</welcome>
lang.dtd
<!-- lang.dtd -->
<!ELEMENT welcome ( from, subject )>
<!ELEMENT from ( #PCDATA )>
<!ELEMENT subject ( #PCDATA )>
<!ENTITY assoc "أسّوشِيَتْس">
<!ENTITY text "اليونيكود">
About the example
• The DTD reference contains: DOCTYPE, the name of the root, the SYSTEM flag indicating that the DTD file is external, and the name of that file.
• Root element welcome contains two elements: from and subject.
• Some lines contain entity references for Unicode.
• The DTD also defines some other entity references.
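Entity substitution can be tried out without a separate DTD file. In this sketch the entity is declared in an internal DTD subset instead of the external lang.dtd, because Python's xml.etree.ElementTree (used here as a stand-in parser) does not load external DTDs by default. The entity value "Unicode" is an ASCII placeholder for the Arabic text, not the value from lang.dtd.

```python
import xml.etree.ElementTree as ET

# Assumption: entity declared inline (internal subset), not in lang.dtd.
doc = """<?xml version="1.0"?>
<!DOCTYPE welcome [
   <!ENTITY text "Unicode">
]>
<welcome>
   <subject>Welcome to the world of &text;</subject>
</welcome>"""

root = ET.fromstring(doc)
subject = root.find("subject").text
print(subject)   # Welcome to the world of Unicode
```

The parser replaces &text; with the declared value before the application sees the character data. Note that this parser is non-validating: it expands entities but does not check the document against the element declarations.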
More about markup
• An empty element may be closed within its own tag using />, as in <emptyelt xxxx />
• Otherwise an element must have a complete end tag, as in: <sometag> xxxxxxxxxxx </sometag>
• Elements may or may not have content (child elements or character data).
• Elements may have 0 or more attributes associated with them. Attributes appear in the element's start tag: <car doors = "4"/>
• Attribute values must appear in single or double quotes.
• Element and attribute names may not contain blanks.
• Here, element car has attribute doors with value 4.
• Attribute names may be of any length but must start with a letter or underscore.
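As a quick check of how an empty element with an attribute looks to a program, the sketch below parses the car example with Python's xml.etree.ElementTree (an assumed stand-in for the course's parser).

```python
import xml.etree.ElementTree as ET

car = ET.fromstring('<car doors="4"/>')

print(car.tag)            # car
print(car.attrib)         # {'doors': '4'}
print(car.get("doors"))   # 4
print(car.text)           # None
print(len(car))           # 0
```

The attribute value comes back as the string '4', not a number, and the empty element has neither character data nor child elements.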
Usage.xml uses a stylesheet
<?xml version = "1.0"?>
<!-- Fig. 5.5 : usage.xml -->
<!-- Usage of elements and attributes -->
<?xml:stylesheet type = "text/xsl" href = "usage.xsl"?>
<book isbn = "999-99999-9-X">
   <title>Deitel's XML Primer</title>
   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>
   <chapters>
      <preface num = "1" pages = "2">Welcome</preface>
      <chapter num = "1" pages = "4">Easy XML</chapter>
      <chapter num = "2" pages = "2">XML Elements?</chapter>
      <appendix num = "1" pages = "9">Entities</appendix>
   </chapters>
   <media type = "CD"/>
</book>
Usage.xsl (in notes)
<? xxxxx ?> in usage.xml represents a PI (that is, a processing instruction). A PI consists of a PI target (xml:stylesheet, in this example) and a PI value. Note the syntax. PIs can be used to help authors embed application-specific data in an XML document. If the application processing the XML doesn't use the PI, then it has no effect on the XML document content.
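The target and value of a processing instruction can be read back from a DOM tree. This sketch uses Python's xml.dom.minidom as a stand-in parser and a shortened version of usage.xml. It also assumes the target name xml-stylesheet, which is the form standardized by the W3C, rather than the xml:stylesheet spelling shown in the slide.

```python
import xml.dom.minidom
from xml.dom import Node

# Shortened usage.xml; the PI target is written as xml-stylesheet here.
doc = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="usage.xsl"?>
<book isbn="999-99999-9-X"><title>Deitel's XML Primer</title></book>"""

dom = xml.dom.minidom.parseString(doc)

# Processing instructions in the prolog are children of the document node.
pis = [n for n in dom.childNodes
       if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    print(pi.target)   # xml-stylesheet
    print(pi.data)     # type="text/xsl" href="usage.xsl"
```

The PI value is delivered as one uninterpreted string. It is up to the application (a browser, for instance) to pick out type and href.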
Usage.XML document loaded into IE: Browser uses stylesheet to generate HTML
CDATA
• The character data appearing in a CDATA section is not treated as markup by the XML parser; it is passed through as plain text.
• CDATA might be used for JavaScript or VBScript.
• A CDATA section starts with <![CDATA[ and ends with ]]>
• A CDATA section may contain reserved characters, but not the text "]]>".
Text example 5.7
<?xml version = "1.0"?>
<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->
<book title = "C++ How to Program" edition = "3">
   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>
   <sample>
      <![CDATA[
         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>
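The two sample elements are meant to carry the same text: one spells the reserved characters as entity references, the other wraps them in a CDATA section. A small sketch, assuming Python's xml.etree.ElementTree as the parser and using just the if line from the example:

```python
import xml.etree.ElementTree as ET

code = "if ( this->getX() < 5 && value[ 0 ] != 3 )"

# Reserved characters written as entity references.
escaped = ET.fromstring(
    "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
)

# The same text inside a CDATA section, with no escaping needed.
cdata = ET.fromstring(
    "<sample><![CDATA[if ( this->getX() < 5 && value[ 0 ] != 3 )]]></sample>"
)

print(escaped.text == code)   # True
print(cdata.text == code)     # True
```

Both forms give the application identical character data, so CDATA is a convenience for the author rather than a different kind of content.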
letter.xml - I removed blank lines to get it to fit here
<?xml version = "1.0"?>
<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "F"/>
   </contact>
   <contact type = "to">
      <name>John Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M"/>
   </contact>
   <salutation>Dear Sir:</salutation>
   <paragraph>It is our privilege to inform you about our new database managed with <bold>XML</bold>. This new system allows you to reduce the load on your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
   <paragraph>The data in an XML element is normalized, so plain-text diagrams such as
      /---\
      |   |
      \---/
   will become gibberish.</paragraph>
   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>
</letter>
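The effect described in the second paragraph of the letter can be reproduced in a few lines. This is a sketch under two assumptions: Python's xml.etree.ElementTree stands in for the parser, and normalization is modeled as "collapse every run of whitespace to one space", which is what a browser-style display does. The parser itself hands the whitespace over untouched.

```python
import xml.etree.ElementTree as ET

# The diagram paragraph, built line by line to keep the backslashes readable.
lines = [
    "<paragraph>plain-text diagrams such as",
    "   /---\\",
    "   |   |",
    "   \\---/",
    "will become gibberish.</paragraph>",
]

raw = ET.fromstring("\n".join(lines)).text   # whitespace preserved by the parser
normalized = " ".join(raw.split())           # application-level normalization

print(raw)
print(normalized)
```

After normalization the three lines of the box are run together on one line, which is why layout-sensitive text needs other means (a CDATA section does not help here, since it only protects against markup interpretation, not against whitespace collapsing by the application).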
Namespaces
• Naming collisions can occur when XML authors use the same tag names.
• Namespaces provide a mechanism for making tag references unambiguous.
• A namespace prefix appears in the start and end tags, followed by a colon and the element name. So,
• <movie:character>Scrooge</movie:character> can be differentiated from <ascii:character>colon</ascii:character>
• Namespace prefixes are tied to a unique URI in the XML document. Almost any name can be used as a namespace prefix.
• In this example ascii and movie are namespace prefixes. Namespace prefixes can precede element and attribute names to avoid collisions.
• A URL may be used for a URI. The only requirement, though, is uniqueness, as the URLs are not visited by the parser.
Namespace example 5.8
<?xml version = "1.0"?>
<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->
<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</text:directory>
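A namespace-aware parser keeps the two file elements apart by their URIs, not by the prefixes. The sketch below assumes Python's xml.etree.ElementTree and a trimmed copy of namespace.xml; the short prefixes t and i in the lookup table are arbitrary names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed version of namespace.xml.
doc = """<text:directory xmlns:text="urn:deitel:textInfo"
                xmlns:image="urn:deitel:imageInfo">
   <text:file filename="book.xml"/>
   <image:file filename="funny.jpg"/>
</text:directory>"""

root = ET.fromstring(doc)
tags = [child.tag for child in root]
print(tags)
# ['{urn:deitel:textInfo}file', '{urn:deitel:imageInfo}file']

# Prefixes used for searching are local to this code; only the URIs matter.
ns = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}
text_file = root.find("t:file", ns).get("filename")
image_file = root.find("i:file", ns).get("filename")
print(text_file)    # book.xml
print(image_file)   # funny.jpg
```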
Namespaces continued
• Providing a prefix can be tedious. A default namespace can be declared, and unprefixed elements within its scope then belong to it without needing a prefix. (Unprefixed attributes are not placed in the default namespace.)
Default namespaces
<?xml version = "1.0"?>
<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->
<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>
Default namespaces • Now, file is in the default namespace. • Compare this example to the earlier namespace example where text and image were distinct namespaces.
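To see what the default namespace does, the sketch below parses a trimmed copy of defaultnamespace.xml with Python's xml.etree.ElementTree (an assumed stand-in parser) and prints the fully qualified names.

```python
import xml.etree.ElementTree as ET

# Trimmed version of defaultnamespace.xml.
doc = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
   <file filename="book.xml"/>
   <image:file filename="funny.jpg"/>
</directory>"""

root = ET.fromstring(doc)
first, second = list(root)

print(root.tag)       # {urn:deitel:textInfo}directory
print(first.tag)      # {urn:deitel:textInfo}file
print(second.tag)     # {urn:deitel:imageInfo}file
print(first.attrib)   # {'filename': 'book.xml'}
```

The unprefixed directory and file elements end up in urn:deitel:textInfo, exactly as if they had carried the text prefix in the earlier example. The unprefixed filename attribute carries no namespace at all.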
Day planner case study…to be continued…
<?xml version = "1.0"?>
<!-- Fig. 5.10 : planner.xml -->
<!-- Day Planner XML document -->
<planner>
   <year value = "2000">
      <date month = "7" day = "15">
         <note time = "1430">Doctor's appointment</note>
         <note time = "1620">Physics class at BH291C</note>
      </date>
      <date month = "7" day = "4">
         <note>Independence Day</note>
      </date>
      <date month = "7" day = "20">
         <note time = "0900">General Meeting in room 32-A</note>
      </date>
      <date month = "7" day = "20">
         <note time = "1900">Party at Joe's</note>
      </date>
      <date month = "7" day = "20">
         <note time = "1300">Financial Meeting in room 14-C</note>
      </date>
   </year>
</planner>
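As a preview of what a planner application has to do, here is a sketch that looks up the notes for one date. It is not the Java program from the text: it assumes Python's xml.etree.ElementTree, works on a shortened copy of planner.xml, and the helper notes_for is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of planner.xml.
doc = """<planner>
   <year value="2000">
      <date month="7" day="15">
         <note time="1430">Doctor's appointment</note>
         <note time="1620">Physics class at BH291C</note>
      </date>
      <date month="7" day="4">
         <note>Independence Day</note>
      </date>
   </year>
</planner>"""

def notes_for(root, year, month, day):
    """Return (time, text) pairs for one date; time is None for all-day notes."""
    path = f"year[@value='{year}']/date[@month='{month}'][@day='{day}']/note"
    return [(note.get("time"), note.text) for note in root.findall(path)]

root = ET.fromstring(doc)
july_15 = notes_for(root, 2000, 7, 15)
july_4 = notes_for(root, 2000, 7, 4)
print(july_15)
print(july_4)
```

The attributes select the date, and the optional time attribute distinguishes timed notes from all-day entries such as Independence Day.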
Day planner using a Java GUI. A SAX parser is used to parse the document (in text chapter 8).
Homework on this section
• Install an XML validator.
• Create your own XML file and validate it.
• Post screenshots of your XML file and the validator you used.