Learn the basics of XML, including its syntax and how to define elements, tags, attributes, and entities. Explore the importance of using a Document Type Definition (DTD) and the differences between PCDATA and CDATA. Examples provided.
Announcements • Final homework assigned Wednesday • Two week deadline • Will cover servlets + JAXP
Tomcat Configuration • Download the software. Go to http://jakarta.apache.org/builds/jakarta-tomcat-4.0/release/ and download and unpack the zip file for the latest version (4.1.12 as of last revision of this page). • Enable the ROOT context. Edit install_dir/conf/server.xml and uncomment this line: <Context path="" docBase="ROOT" debug="0"/>. Not necessary in Tomcat 4.0.3 and earlier.
Tomcat Configuration 3. Enable the invoker servlet. Go to install_dir/conf/web.xml and uncomment the servlet-mapping element that maps the invoker servlet to /servlet/*. Not necessary prior to Tomcat 4.1.12. 4. Change the port to 80. Edit install_dir/conf/server.xml and change the port attribute of the Connector element from 8080 to 80.
Tomcat Configuration 5. Turn on servlet reloading. Edit install_dir/conf/server.xml and add a DefaultContext subelement to the main Service element and supply true for the reloadable attribute. 6. Set the JAVA_HOME variable. Set it to refer to the base JDK directory, not the bin subdirectory.
Servlet example • Goal: Create a StudentSort program with a web-enabled interface • Ideas?
Outline • What is XML? • JAXP • SAX API • DOM API • XSLT API • JAX-RPC • JAXM • JAXR
XML Basics • Think of XML as a language-neutral, text-based way of representing simple objects. • For example, in C: struct Auto { char* make; char* model; int quantity; double price; };
XML Basics, cont • In Java: class Auto { public String make; public String model; public int quantity; public double price; } • Note that C and Java Auto objects are not human-readable and cannot be interchanged between languages or (in the C case) between platforms.
XML version of Auto • XML is a formalism for describing simple structures in a language-neutral way: <!DOCTYPE Auto [ <!ELEMENT Auto (make,model,quantity,price)> <!ELEMENT make (#PCDATA)> <!ELEMENT model (#PCDATA)> <!ELEMENT quantity (#PCDATA)> <!ELEMENT price (#PCDATA)> ]>
XML version of Auto <!DOCTYPE Auto [ <!-- The DOCTYPE name must be the same as the root element of the document --> <!ELEMENT Auto (make,model,quantity,price)> <!-- Auto has four child elements, in this order --> <!ELEMENT make (#PCDATA)> <!-- make is parsed character data --> <!ELEMENT model (#PCDATA)> <!ELEMENT quantity (#PCDATA)> <!ELEMENT price (#PCDATA)> ]>
Making Auto objects • In C: struct Auto a; a.make = "Ford"; a.model = "Pinto"; etc. ... • In Java: Auto a = new Auto(); a.make = "Ford"; etc. ...
Making Auto objects, cont. • XML is not a programming language! • We make a car object in an XML file: <Auto> <make>Ford</make> <model>Pinto</model> <quantity>100</quantity> <price>1200.50</price> </Auto> • Think of this as something like a serialized Java object.
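To make the "serialized Java object" analogy concrete, here is a minimal sketch that writes the fields of the Auto class out as the XML shown on the slide. The class name AutoXml and the method toXml are made up for this illustration, the root element is spelled Auto to match the DTD, and the values are assumed not to contain characters such as < or & that would need escaping.

```java
public class AutoXml {
    public String make;
    public String model;
    public int quantity;
    public double price;

    /** Writes the four fields as child elements of <Auto> (no escaping is done). */
    public String toXml() {
        return "<Auto>"
             + "<make>" + make + "</make>"
             + "<model>" + model + "</model>"
             + "<quantity>" + quantity + "</quantity>"
             + "<price>" + price + "</price>"
             + "</Auto>";
    }

    public static void main(String[] args) {
        AutoXml a = new AutoXml();
        a.make = "Ford";
        a.model = "Pinto";
        a.quantity = 100;
        a.price = 1200.50;
        System.out.println(a.toXml());
    }
}
```

The text that comes out can be read back by a C program, a Java program, or a person, which is the point of the comparison with the struct and class versions.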
XML vs. DTD • Note that there are two parts to what we did • Defining the "structure" layout • Defining an "instance" of the structure • The first is done with a Document Type Definition (DTD) • The second is the XML part • Both can go in the same file, or an XML file can refer to an external DTD file • We will look at the syntax of each
Aspects of XML syntax • It is illegal to omit closing tags • XML tags are case-sensitive • XML elements must be properly nested • XML documents must have exactly one root element • XML preserves whitespace • XML comments: <!-- This is a comment -->
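The syntax rules above can be tried out with the DOM parser that ships with the JDK. The following is a small sketch, not course code: the class name WellFormedCheck and the sample strings are invented for illustration. It reports whether a string parses as well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the parser accepts the text as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);   // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>"));        // properly nested
        System.out.println(isWellFormed("<a><b></a>"));            // closing tag omitted
        System.out.println(isWellFormed("<a><B></b></a>"));        // case mismatch
        System.out.println(isWellFormed("<a><b><c></b></c></a>")); // bad nesting
        System.out.println(isWellFormed("<a/><b/>"));              // two root elements
    }
}
```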
DTD syntax • Can be declared either inline, before the root element of the XML document, or in a separate file referenced as: <!DOCTYPE root-element SYSTEM "filename"> • Note: DTDs are not required, but they are strongly recommended for: • ensuring correctness of XML documents • standardizing industry-wide layouts
Building blocks of XML • From DTD point of view, XML files are composed of: • Elements • Tags • Attributes • Entities • PCDATA • CDATA
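As a rough illustration of what "ensuring correctness" means in practice, the sketch below checks an instance document against the Auto DTD declared inline. It is an assumption-laden example rather than the validating parser from the course page: the class name ValidateAuto and the helper isValid are invented here, and it relies on the JDK's built-in parser reporting DTD violations through ErrorHandler.error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateAuto {

    /** The Auto DTD from the slides, written as an internal subset. */
    public static final String DTD =
          "<!DOCTYPE Auto [\n"
        + "<!ELEMENT Auto (make,model,quantity,price)>\n"
        + "<!ELEMENT make (#PCDATA)>\n"
        + "<!ELEMENT model (#PCDATA)>\n"
        + "<!ELEMENT quantity (#PCDATA)>\n"
        + "<!ELEMENT price (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the document is well-formed and satisfies its DTD. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] ok = { true };
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // validity violations are reported here, not thrown
                public void error(SAXParseException e) { ok[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;   // not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return ok[0];
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<Auto><make>Ford</make><model>Pinto</model>"
                    + "<quantity>100</quantity><price>1200.50</price></Auto>";
        String missingPrice = DTD + "<Auto><make>Ford</make><model>Pinto</model>"
                    + "<quantity>100</quantity></Auto>";
        System.out.println(isValid(good));
        System.out.println(isValid(missingPrice));
    }
}
```

Note that the second document is still well-formed; it only fails because the DTD requires a price element.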
Elements • Elements are the fields of the structure, so to speak: • The elements of auto are make, model, price, quantity • Tags are the things that surround (mark up) elements (<make> ... </make>) • Attributes provide extra information about elements • <model color="red"/> • Entities • Named references that the XML parser expands into characters, such as &lt; for <, &gt; for >, and &amp; for &.
DTD elements, cont. • PCDATA • Parsed character data. • Text found between the start tag and the end tag of an XML element. • Text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded. • CDATA • Character data. • Text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
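The difference between parsed and unparsed text is easiest to see on a document that mixes the two. The sketch below uses a CDATA section inside element content to show the "not parsed" behaviour; the class name PcdataVsCdata and the note element are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PcdataVsCdata {

    /** Parses the text and returns the root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first part is parsed text: &lt; is expanded to <.
        // The CDATA section is taken literally: <b> is not a tag, &amp; stays as is.
        Element note = parseRoot(
            "<note>1 &lt; 2 and <![CDATA[<b>bold</b> &amp;]]></note>");
        System.out.println(note.getTextContent());
        System.out.println(note.getElementsByTagName("b").getLength());
    }
}
```

The text content comes back as `1 < 2 and <b>bold</b> &amp;` and no b element is created, whereas the same markup outside the CDATA section would have produced a child element.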
Examples • Study the three simple course examples: • cars.xml • employees.xml • computers.xml • Run the validating parser listed on the course webpage to verify that each XML document is well-formed and valid against its DTD • See the DTD and XML tutorials listed on the course website for more cool stuff.
What can we do with this? • So, we have an XML document. • We think of this as a text-based, language neutral serialized object. • But what can we do with it? • How does it relate to html? • Why is it useful?
Displaying XML • Notice that XML documents have no display information • Opening with a browser will show the raw source nicely colored • Display information can be associated with an XML file by using one of two technologies: • CSS (Cascading Style Sheets) • Simple, quick • XSL (Extensible Stylesheet Language) • made up of XSLT, XPATH, and XSL Formatting Objects • Far slicker, more general, and complicated
Displaying XML • We will not study how to display XML in this course • Please see CSS and XSL tutorials listed on course website • Main point is that data is decoupled from display. • Display can be plugged in separately, but data can still be accessed. • Compare computers.html and computers.xml • What is the advantage of computers.xml?
Java and XML • To this point the only special thing about XML is that data and presentation are decoupled, unlike in HTML. • The next key feature is language support • Language-neutral standards exist for parsing XML documents! • This means that XML documents are great ways of sharing data between programs, sending messages, etc.
Parsing XML in Java • Two distinct traditional models: • DOM (Document Object Model) • SAX (Simple API for XML) • Both are language-neutral standards with bindings specified for Java • These are defined as part of JAXP (Java API for XML Processing) • JDK contains DOM and SAX parser implementations
DOM • DOM represents an XML document in Java memory as a tree. • DOM is read-write: the tree can be modified after parsing • Getting a DOM parser in Java: import javax.xml.parsers.DocumentBuilder; import javax.xml.parsers.DocumentBuilderFactory; DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); DocumentBuilder builder = factory.newDocumentBuilder();
Parsing XML to DOM • Once an instance of the parser is obtained: org.w3c.dom.Document document = builder.parse(xmlFile); • Once a Document object is obtained: org.w3c.dom.Node root = document.getDocumentElement(); gets the root Node of the document
Manipulating Nodes • Once the root node is obtained, typical tree methods exist to manipulate other elements: boolean node.hasChildNodes() NodeList node.getChildNodes() Node node.getNextSibling() Node node.getParentNode() String node.getNodeValue(); String node.getNodeName(); String node.getTextContent(); void setNodeValue(String nodeValue); Node insertBefore(Node newChild, Node refChild);
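Putting the parser, the root node, and a few of the tree methods together, a minimal sketch might look like the following. The class name AutoDom, the helper child, and the added year element are hypothetical and exist only for this illustration; the document is parsed from a string so the example does not depend on a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AutoDom {

    public static final String XML =
        "<Auto><make>Ford</make><model>Pinto</model>"
      + "<quantity>100</quantity><price>1200.50</price></Auto>";

    /** Parses a string into a DOM Document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the first child element with the given name, or null. */
    public static Node child(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node root = doc.getDocumentElement();
        System.out.println(root.getNodeName() + " has children: " + root.hasChildNodes());

        // The text of an element lives in a text node underneath it
        Node price = child(root, "price");
        System.out.println("price = " + price.getFirstChild().getNodeValue());

        // DOM is read-write: change a value and insert a new element
        price.getFirstChild().setNodeValue("999.99");
        Node year = doc.createElement("year");   // hypothetical extra field
        year.appendChild(doc.createTextNode("1975"));
        root.insertBefore(year, price);
        System.out.println("children now: " + root.getChildNodes().getLength());
    }
}
```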
What are Nodes? • Ok, but what information is placed in Nodes? See http://java.sun.com/j2se/1.4.1/docs/api/org.w3c.dom.Node.html • Note that the DOM parser parses everything: white space, tags, comments, etc. • There are several ways to skip over info that doesn't interest you • use e.g. if (node.getNodeType() == Node.TEXT_NODE) • use DocumentBuilderFactory methods to configure the parser to ignore comments, etc.
Node Example • A good practice exercise is to do a depth-first traversal of a Document, printing only the element information with level-dependent indentation. • I recommend doing this by yourself and then looking at my solution on the web site. • See ScanTree.java
SAX parser • SAX parser scans an XML stream on the fly and responds to certain parsing events as it encounters them. • This is very different from digesting an entire XML document into memory. • Much faster, requires less memory. • However, it cannot be used to change the XML. • You need to reparse if you need to revisit data.
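Both ways of skipping uninteresting nodes can be sketched in a few lines. In the example below, which is only an illustration (the class name NodeTypeCount and the sample document are made up), the children of the root are counted by node type, once with the default factory settings and once with setIgnoringComments(true).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeCount {

    /** Counts the root's direct children that have the given node type. */
    public static int count(String xml, boolean ignoreComments, short nodeType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(ignoreComments);   // parser-level filtering
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList kids = doc.getDocumentElement().getChildNodes();
            int n = 0;
            for (int i = 0; i < kids.getLength(); i++) {
                if (kids.item(i).getNodeType() == nodeType) {   // per-node filtering
                    n++;
                }
            }
            return n;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<auto>\n  <!-- a comment -->\n  <make>Ford</make>\n</auto>";
        System.out.println("elements: " + count(xml, false, Node.ELEMENT_NODE));
        System.out.println("comments: " + count(xml, false, Node.COMMENT_NODE));
        System.out.println("comments when ignored: " + count(xml, true, Node.COMMENT_NODE));
        // The line breaks and indentation show up as text nodes
        System.out.println("text nodes: " + count(xml, false, Node.TEXT_NODE));
    }
}
```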
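For readers who want something to compare their attempt against, here is one possible shape for the traversal. It is not the ScanTree.java solution from the course site, just a sketch under the assumption that two spaces of indentation per level are wanted; the class name ElementOutline is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementOutline {

    /** Returns one line per element, indented two spaces per tree level. */
    public static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Depth-first walk that skips everything except element nodes. */
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;   // text, comments, etc. are ignored
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            walk(kids.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<cars><auto><make>Ford</make><model>Pinto</model></auto></cars>"));
    }
}
```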
Obtaining a SAX parser • Important classes javax.xml.parsers.SAXParserFactory; javax.xml.parsers.SAXParser; javax.xml.parsers.ParserConfigurationException; //get the parser SAXParserFactory factory = SAXParserFactory.newInstance(); SAXParser saxParser = factory.newSAXParser(); //parse the document saxParser.parse( new File(argv[0]), handler);
DefaultHandler • Note that an event handler has to be passed to the SAX parser. • This must implement the interface org.xml.sax.ContentHandler • It is easier to extend the adapter org.xml.sax.helpers.DefaultHandler
Overriding Handler methods • Most typical methods to override void startDocument() void endDocument() void startElement(...) void endElement(...) void characters(...) • See examples on course website
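The pieces from the last few slides (factory, parser, handler, overridden callbacks) fit together roughly as follows. This is a sketch, not one of the course examples: the handler name ElementCounter, its accessor methods, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {

    private final Map<String, Integer> counts = new TreeMap<>();
    private final StringBuilder text = new StringBuilder();

    /** Called once for every start tag the parser encounters. */
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    /** Called for character data; may be called several times per element. */
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public int count(String name) {
        return counts.getOrDefault(name, 0);
    }

    public String text() {
        return text.toString();
    }

    /** Runs a SAX parse over the string and returns the filled-in handler. */
    public static ElementCounter scan(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ElementCounter h = scan("<cars><auto><make>Ford</make></auto>"
                              + "<auto><make>Fiat</make></auto></cars>");
        System.out.println("auto elements: " + h.count("auto"));
        System.out.println("text seen: " + h.text());
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is why the handler has to accumulate any state it needs as the events arrive.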
Writing XML from DOM import javax.xml.transform.Source; import javax.xml.transform.Transformer; import javax.xml.transform.TransformerFactory; import javax.xml.transform.Result; import javax.xml.transform.dom.DOMSource; import javax.xml.transform.stream.StreamResult; import org.w3c.dom.Document; import java.io.File; public class WriteXML { public static void write(Document doc, String filename) throws Exception { /* Prepare the DOM document for writing */ Source source = new DOMSource(doc); Result result = new StreamResult(new File(filename)); /* Write the DOM document to the file */ Transformer xformer = TransformerFactory.newInstance().newTransformer(); xformer.transform(source, result); } }
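The same Transformer approach works for any Result, not only files. As a hedged variation on the slide's code, the sketch below builds a small document in memory and serializes it to a String; the class name DomToString and the helper buildAuto are invented here, and the XML declaration is omitted only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToString {

    /** Builds <Auto><make>..</make><model>..</model></Auto> in memory. */
    public static Document buildAuto(String make, String model) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element auto = doc.createElement("Auto");
            doc.appendChild(auto);
            Element makeEl = doc.createElement("make");
            makeEl.appendChild(doc.createTextNode(make));
            auto.appendChild(makeEl);
            Element modelEl = doc.createElement("model");
            modelEl.appendChild(doc.createTextNode(model));
            auto.appendChild(modelEl);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM document using an identity transform. */
    public static String serialize(Document doc) {
        try {
            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            xformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            xformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildAuto("Ford", "Pinto")));
    }
}
```

Writing to a StringWriter or to a socket's output stream instead of a File is what makes this usable for message exchange as well as for saving files.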
Using JAXP to simplify file input • Goal: read xml-based file into lottery servlet program.
Using JAXP for messaging • Goal: build a socket program that exchanges xml-based messages