Java and XML (DOM and SAX)
Some of the material for these slides came from the following sources:
• “XML: A Manager’s Guide” by Kevin Dick
• “The XML Companion” by Bradley
• Java Documentation from Sun Microsystems
• “XML and Java” by Maruyama, Tamura and Uramoto
• On and Off the internet…
Internet Technologies

Java and XML (DOM and SAX)
• Overview of parser operations with DOM and SAX
• Processing XML with SAX (locally and on the internet)
• Processing XML with DOM (locally and on the internet)

FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Operation of a Tree-based Parser
(Diagram: a valid XML document and its DTD are read by a tree-based parser, which builds a document tree that the application logic then works with.)

Tree Benefits
• Some data preparation tasks need early access to data that appears later in the document (e.g. we wish to extract titles to build a table of contents).
• Constructing a new tree is easier (e.g. XSLT works from a tree to convert FpML to WML).

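A minimal sketch of the table-of-contents case, assuming a hypothetical document whose sections each carry a `<title>` child. It uses the JDK's built-in JAXP DOM parser (`DocumentBuilderFactory`), which builds the whole tree first, so every title is available at once through `getElementsByTagName` no matter where it sits in the document. The class name `TableOfContents` is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableOfContents {

    // Parses the whole document into a DOM tree, then returns the text of
    // every <title> element in document order.
    public static List<String> titles(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: a book whose sections each carry a <title>.
        String xml = "<book><section><title>SAX</title><para>...</para></section>"
                   + "<section><title>DOM</title><para>...</para></section></book>";
        System.out.println(titles(xml)); // prints [SAX, DOM]
    }
}
```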
Operation of an Event-Based Parser
(Diagram: a valid XML document and its DTD are read by an event-based parser, which calls into the application logic as it meets each part of the document; no tree is built.)

Operation of an Event-Based Parser
(Diagram: the event-based parser reads the XML document and its DTD and calls these methods in the application logic.)
public void startDocument()
public void endDocument()
public void startElement(…)
public void endElement(…)
public void characters(…)
Errors are reported through a separate callback, for example:
public void error(SAXParseException e) throws SAXException {
    System.out.println("\n\n--Invalid document ---" + e);
}

Event-Driven Benefits
• We do not need the memory required for a tree.
• Parsing can be faster because no tree is constructed.

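A minimal SAX sketch of the memory argument, using the JDK's JAXP `SAXParserFactory` rather than naming the Xerces class directly. The handler keeps only a counter, so its memory use does not grow with the size of the document the way a tree does. The class name `ElementCounter` is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    private int count = 0;

    // Called once per start tag; no tree is built, only the counter changes.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    public static int countElements(String xml) {
        try {
            ElementCounter handler = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String swap = "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
                    + "<NumYears>3</NumYears><NumPayments>6</NumPayments></FixedFloatSwap>";
        System.out.println(countElements(swap)); // prints 5
    }
}
```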
XML APIs with jaxpack

Important SAX interfaces and classes
• class InputSource -- a single input source for an XML entity
• interface XMLReader -- defines parser behavior (implemented by Xerces’ SAXParser)
• Four core SAX2 handler interfaces, all implemented by class DefaultHandler:
  • EntityResolver
  • DTDHandler
  • ContentHandler
  • ErrorHandler

Processing XML with SAX
• interface XMLReader -- defines parser behavior (implemented by Xerces’ SAXParser)
• XMLReader is the interface that an XML parser's SAX2 driver must implement. This interface allows an application to set and query features and properties in the parser, to register event handlers for document processing, and to initiate a document parse.

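The examples in these slides obtain the reader with `XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser")`, which requires Xerces on the classpath (and `XMLReaderFactory` is deprecated in recent Java versions). A sketch of the portable JAXP route, assuming only the JDK, which also exercises the "set and query features" part of the `XMLReader` contract; the feature URIs are the standard SAX2 ones, while the class name `ReaderSetup` is illustrative.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ReaderSetup {

    // Obtains an XMLReader through JAXP and configures it via SAX2 features.
    public static XMLReader newReader(boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // report localName and namespaceURI
            factory.setValidating(validating); // check the document against its DTD
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Features can also be set directly on the reader.
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes", false);
            return reader;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader(true);
        // Query features back from the reader.
        System.out.println(reader.getFeature("http://xml.org/sax/features/namespaces")); // true
        System.out.println(reader.getFeature("http://xml.org/sax/features/validation")); // true
    }
}
```

Once obtained, the reader is used exactly like the Xerces one: `setContentHandler(...)`, then `parse(inputSource)`.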
Processing XML with SAX
• We will look at the following interfaces and classes and then study an example:
• interface ContentHandler -- reports document events
• interface ErrorHandler -- reports errors and warnings, including validity errors
• class DefaultHandler -- implements both of the above plus two others (EntityResolver and DTDHandler)

public interface ContentHandler
Receive notification of general document events. This is the main interface that most SAX applications implement: if the application needs to be informed of basic parsing events, it implements this interface and registers an instance with the SAX parser using the setContentHandler method. The parser uses the instance to report basic document-related events like the start and end of elements and character data.

Some methods from the ContentHandler interface
• void characters(…) -- Receive notification of character data.
• void endDocument() -- Receive notification of the end of a document.
• void endElement(…) -- Receive notification of the end of an element.
• void startDocument() -- Receive notification of the beginning of a document.
• void startElement(…) -- Receive notification of the beginning of an element.

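One detail the method list does not show: a parser is allowed to split one run of text across several `characters` calls (for example around entity references or at buffer boundaries), so a handler that wants whole element values should accumulate them between `startElement` and `endElement`. A sketch, assuming JAXP's `SAXParserFactory` and leaf-only elements like those in the swap document; the class name `SwapFields` and the `&amp;` in the bank name are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SwapFields extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting fresh text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // skip elements that held only whitespace
        }
        text.setLength(0);
    }

    public static Map<String, String> parse(String xml) {
        try {
            SwapFields handler = new SwapFields();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FixedFloatSwap><Bank>Pittsburgh &amp; Co</Bank>"
                   + "<Notional currency=\"pounds\">100</Notional></FixedFloatSwap>";
        System.out.println(parse(xml)); // prints {Bank=Pittsburgh & Co, Notional=100}
    }
}
```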
public interface ErrorHandler
Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser. The parser will then report all errors and warnings through this interface. For XML processing errors, a SAX driver must use this interface instead of throwing an exception: it is up to the application to decide whether to throw an exception for different types of errors and warnings. Note, however, that there is no requirement that the parser continue to provide useful information after a call to fatalError.

public interface ErrorHandler
Some methods are:
• void error(SAXParseException exception) -- Receive notification of a recoverable error.
• void fatalError(SAXParseException exception) -- Receive notification of a non-recoverable error.
• void warning(SAXParseException exception) -- Receive notification of a warning.

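`DefaultHandler`'s `error` method does nothing, so validity errors pass silently unless validation is switched on and a handler overrides `error`. A sketch, assuming JAXP's `SAXParserFactory` and an internal DTD subset (a trimmed version of `FixedFloatSwap.dtd`) so that no external file is needed; in the sample, `Notional` deliberately omits its `#REQUIRED` `currency` attribute. The class name `ValidityReport` is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidityReport extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable (e.g. validity) error: record it and let parsing continue.
        problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e; // the document is not well-formed: stop parsing
    }

    public static List<String> check(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // without this, error() is not called for validity problems
            ValidityReport handler = new ValidityReport();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.problems;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE FixedFloatSwap [\n"
            + "<!ELEMENT FixedFloatSwap (Notional)>\n"
            + "<!ELEMENT Notional (#PCDATA)>\n"
            + "<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>\n"
            + "]>\n"
            + "<FixedFloatSwap><Notional>100</Notional></FixedFloatSwap>";
        check(xml).forEach(System.out::println);
    }
}
```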
public class DefaultHandler extends java.lang.Object implements EntityResolver, DTDHandler, ContentHandler, ErrorHandler
Default base class for handlers. This class implements the default behaviour for four SAX interfaces: EntityResolver, DTDHandler, ContentHandler, and ErrorHandler.

FixedFloatSwap.dtd
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Bank, Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Input DTD

FixedFloatSwap.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd"
[
  <!ENTITY bankname "Pittsburgh National Corporation">
]
>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency = "pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Input XML

Processing
// NotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class NotifyStr extends DefaultHandler {

    public static void main (String argv []) throws IOException, SAXException {
        if (argv.length != 1) {
            System.err.println ("Usage: java NotifyStr filename.xml");
            System.exit (1);
        }
        XMLReader reader = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
        InputSource inputSource = new InputSource(argv[0]);
        reader.setContentHandler(new NotifyStr());
        reader.parse(inputSource);
        System.exit (0);
    }

    public NotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes aMap) throws SAXException {
        System.out.println("startElement called: element name =" + localName);
        // examine the attributes
        for(int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getLocalName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name = " + attName +
                               " type = " + type + " value = " + value);
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch,start,length);
        System.out.println("characters called:" + dataFound);
    }
}

Output
C:\McCarthy\www\95-733\examples\sax>java NotifyStr FixedFloatSwap.xml
startDocument called:
startElement called: element name =FixedFloatSwap
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
startElement called: element name =Notional
 attribute name = currency type = dollars|pounds value = pounds
characters called:100
startElement called: element name =Fixed_Rate
characters called:5
startElement called: element name =NumYears
characters called:3
startElement called: element name =NumPayments
characters called:6
endDocument called:

Accessing the swap from the internet
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap
[
  <!ENTITY bankname "Pittsburgh National Corporation">
]
>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency = "pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>
Saved under webapps/sax/fpml/FixedFloatSwap.xml

The Deployment Descriptor
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app
  PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
  "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
<web-app>
  <servlet>
    <servlet-name>SaxExample</servlet-name>
    <servlet-class>GetXML</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>SaxExample</servlet-name>
    <url-pattern>/GetXML/*</url-pattern>
  </servlet-mapping>
</web-app>
webapps/sax/WEB-INF/web.xml

// This servlet file is stored under Tomcat in
// webapps/sax/WEB-INF/classes/GetXML.java
// This servlet reads a user-selected XML file from the
// webapps/sax/fpml directory and returns it to the client as a string.
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GetXML extends HttpServlet {   // Servlet

  public void doGet(HttpServletRequest req, HttpServletResponse res)
          throws ServletException, IOException {
    System.out.println("doGet called with " + req.getPathInfo());
    String theData = "";
    String extraPath = req.getPathInfo();
    extraPath = extraPath.substring(1);   // drop the leading "/"
    // read the file
    try {
      // open the file named by the extra path information
      FileInputStream theFile = new FileInputStream(
          "D:\\jakarta-tomcat-4.0.1\\webapps\\sax\\fpml\\" + extraPath);

      InputStreamReader is = new InputStreamReader(theFile);
      BufferedReader br = new BufferedReader(is);
      // read the file into the string theData
      String thisLine;
      while((thisLine = br.readLine()) != null) {
        theData += thisLine + "\n";
      }
      br.close();   // also closes the underlying FileInputStream
    }
    catch(Exception e) {
      System.err.println("Error " + e);
    }

    PrintWriter out = res.getWriter();
    out.write(theData);
    System.out.println("Wrote document to client");
    //System.out.println(theData);
    out.close();
  }
}

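The servlet appends the client-supplied path info directly to a directory name, so a request path containing `..` could in principle reach files outside `fpml`, depending on how the container normalizes the URL. A sketch of a guard using `java.nio.file`, written without the servlet API so it runs on its own; the helper name `resolveUnder` and class `FpmlPaths` are hypothetical, and `doGet` could call it with the `fpml` directory and `req.getPathInfo()` before opening the file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FpmlPaths {

    // Resolves the servlet's extra path info (e.g. "/FixedFloatSwap.xml")
    // against baseDir and refuses anything that normalizes to a location
    // outside baseDir.
    public static Path resolveUnder(Path baseDir, String pathInfo) {
        if (pathInfo == null || pathInfo.length() <= 1) {
            throw new IllegalArgumentException("no file name given");
        }
        Path base = baseDir.toAbsolutePath().normalize();
        Path target = base.resolve(pathInfo.substring(1)).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("path escapes base directory: " + pathInfo);
        }
        return target;
    }

    public static void main(String[] args) {
        Path base = Paths.get("webapps", "sax", "fpml");
        System.out.println(resolveUnder(base, "/FixedFloatSwap.xml"));
        try {
            resolveUnder(base, "/../WEB-INF/web.xml");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```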
// TomcatNotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class TomcatNotifyStr extends DefaultHandler {   // Client

    public static void main (String argv []) throws IOException, SAXException {
        if (argv.length != 1) {
            System.err.println ("Usage: java TomcatNotifyStr filename.xml");
            System.exit (1);
        }

        XMLReader reader = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
        String serverString = "http://localhost:8080/sax/GetXML/";
        String fileName = argv[0];
        InputSource inputSource = new InputSource(serverString + fileName);
        reader.setContentHandler(new TomcatNotifyStr());
        reader.parse(inputSource);
        System.exit (0);
    }

    public TomcatNotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes aMap) throws SAXException {
        System.out.println("startElement called: element name =" + localName);
        // examine the attributes
        for(int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getLocalName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name = " + attName +
                               " type = " + type + " value = " + value);
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch,start,length);
        System.out.println("characters called:" + dataFound);
    }
}

Being served by the servlet
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap
[
  <!ENTITY bankname "Pittsburgh National Corporation">
]
>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency = "pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Output
C:\McCarthy\www\95-733\examples\sax>java TomcatNotifyStr FixedFloatSwap.xml
startDocument called:
startElement called: element name =FixedFloatSwap
characters called:
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
characters called:
startElement called: element name =Notional
 attribute name = currency type = CDATA value = pounds
characters called:100
characters called:
startElement called: element name =Fixed_Rate
characters called:5
characters called:
startElement called: element name =NumYears
characters called:3
characters called:
startElement called: element name =NumPayments
characters called:6
characters called:
characters called:
endDocument called:

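The empty `characters called:` lines are the newlines and indentation between elements. Without a DTD the parser cannot know that `FixedFloatSwap` has element-only content, so it reports that whitespace as character data. When the parser has read the DTD, as in the first run, whitespace in element-only content can instead go to `ignorableWhitespace`, which `NotifyStr` does not override: SAX requires validating parsers to do this and allows non-validating ones to. A sketch that counts both kinds, assuming JAXP with validation switched on and the DTD supplied as an internal subset; the class name `WhitespaceCounter` is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceCounter extends DefaultHandler {
    int whitespaceTextCalls = 0; // characters() calls carrying only whitespace
    int ignorableCalls = 0;      // ignorableWhitespace() calls

    @Override
    public void characters(char[] ch, int start, int length) {
        if (new String(ch, start, length).trim().isEmpty()) whitespaceTextCalls++;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableCalls++;
    }

    // Returns {whitespace-only characters() calls, ignorableWhitespace() calls}.
    public static int[] count(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            WhitespaceCounter h = new WhitespaceCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), h);
            return new int[] { h.whitespaceTextCalls, h.ignorableCalls };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<FixedFloatSwap>\n  <Notional>100</Notional>\n  <NumYears>3</NumYears>\n</FixedFloatSwap>";
        String dtd = "<!DOCTYPE FixedFloatSwap [\n"
                   + "<!ELEMENT FixedFloatSwap (Notional, NumYears)>\n"
                   + "<!ELEMENT Notional (#PCDATA)>\n"
                   + "<!ELEMENT NumYears (#PCDATA)>\n]>\n";
        int[] noDtd = count(body, false);
        int[] withDtd = count(dtd + body, true);
        System.out.println("no DTD:   characters=" + noDtd[0] + " ignorable=" + noDtd[1]);
        System.out.println("with DTD: characters=" + withDtd[0] + " ignorable=" + withDtd[1]);
    }
}
```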
Let’s Add Back the DTD…
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Bank, Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Notional (#PCDATA)>
<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

And reference the DTD in the XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd"
[
  <!ENTITY bankname "Pittsburgh National Corporation">
]
>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional currency = "pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

We get new output
C:\McCarthy\www\95-733\examples\sax>java TomcatNotifyStr FixedFloatSwap.xml
startDocument called:
startElement called: element name =FixedFloatSwap
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
startElement called: element name =Notional
 attribute name = currency type = dollars|pounds value = pounds
characters called:100
startElement called: element name =Fixed_Rate
characters called:5
startElement called: element name =NumYears
characters called:3
startElement called: element name =NumPayments
characters called:6
endDocument called:
How many times did we visit the servlet? Twice: once for the XML document and a second time for the DTD. The DTD's relative system identifier, FixedFloatSwap.dtd, is resolved against the document's URL (http://localhost:8080/sax/GetXML/FixedFloatSwap.xml), so the DTD request is also routed to GetXML.

We don’t have to go through a servlet… Tomcat can send the files directly:
String serverString = "http://localhost:8080/sax/fpml/";
String fileName = argv[0];
InputSource is = new InputSource(serverString + fileName);
But the servlet illustrates that the XML data can be generated dynamically.

The InputSource Class
The SAX and DOM parsers need XML input. The “output” produced by these parsers amounts to a series of method calls (SAX) or an application programming interface to the tree (DOM). An InputSource object can be used to provide input to the parser.
(Diagram: InputSource → SAX or DOM parser → events or tree → application)
So, how do we build an InputSource object?

The InputSource Class
Some InputSource constructors:
InputSource(String systemId);          // a URI, e.g. a file name or an http:// URL
InputSource(InputStream byteStream);
InputSource(Reader characterStream);
For example:
String text = "<a>some xml</a>";
StringReader sr = new StringReader(text);
InputSource is = new InputSource(sr);
:
myParser.parse(is);

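A sketch contrasting two of those constructors with the JDK's DOM parser: with the byte-stream form the parser decodes the bytes itself (here according to the UTF-8 declaration), while the `Reader` form hands it characters that are already decoded, so any declared encoding is not used for decoding. The pound sign (`\u00a3`) is only there so that the encoding matters; the class name `InputSources` is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InputSources {

    // Parses any InputSource into a DOM tree and returns the root element's text.
    public static String rootText(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Notional>\u00a3100</Notional>";
        // Character stream: the String is already decoded text.
        InputSource fromReader = new InputSource(new StringReader(text));
        // Byte stream: the parser decodes the bytes using the declared UTF-8.
        InputSource fromBytes = new InputSource(
            new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(rootText(fromReader).equals(rootText(fromBytes))); // true
    }
}
```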
But what about the DTD?
public interface EntityResolver
Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them.

EntityResolver
public InputSource resolveEntity(String publicId, String systemId) {
    // Add this method to the client above, and register the handler with
    // reader.setEntityResolver(...), since the client only calls setContentHandler.
    // The systemId String holds the DTD's system identifier as given in the
    // XML document (a URL is resolved to a full URI by the parser first).
    // We may now access the DTD from a servlet and return an InputSource,
    // or return null and let the parser resolve the external entity.
    System.out.println("Attempting to resolve " + "Public id: " + publicId +
                       " System id: " + systemId);
    return null;
}

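Instead of returning `null`, `resolveEntity` can hand back the DTD itself, which avoids the second trip to the server. A sketch, assuming the DTD text is already in memory (a trimmed copy of `FixedFloatSwap.dtd`) and using a placeholder `http://example.invalid/` system identifier that is never contacted. Here `SAXParser.parse(InputSource, DefaultHandler)` registers the handler as entity resolver as well as content handler; the class name `LocalDtdResolver` is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolver extends DefaultHandler {
    // Hypothetical in-memory copy of part of FixedFloatSwap.dtd.
    static final String DTD =
          "<!ELEMENT FixedFloatSwap (Notional)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ATTLIST Notional currency (dollars | pounds) #REQUIRED>\n";

    final List<String> resolved = new ArrayList<>(); // system ids we were asked for
    String currency;                                 // value seen on <Notional>

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        resolved.add(systemId);
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            InputSource dtd = new InputSource(new StringReader(DTD));
            dtd.setSystemId(systemId); // keeps a base URI for any relative references
            return dtd;
        }
        return null; // anything else: let the parser fetch it itself
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Notional")) currency = atts.getValue("currency");
    }

    public static LocalDtdResolver parse(String xml) {
        try {
            LocalDtdResolver handler = new LocalDtdResolver();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE FixedFloatSwap SYSTEM \"http://example.invalid/FixedFloatSwap.dtd\">\n"
            + "<FixedFloatSwap><Notional currency=\"pounds\">100</Notional></FixedFloatSwap>";
        LocalDtdResolver h = parse(xml);
        System.out.println("resolved " + h.resolved + ", currency = " + h.currency);
    }
}
```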
Processing XML with DOM
• The following examples were tested using Sun’s JAXP (Java API for XML Processing), available at http://www.javasoft.com/ (click on XML).

XML DOM
• The World Wide Web Consortium’s Document Object Model
• Provides a common vocabulary to use in manipulating XML documents.
• May be used from C, Java, Perl, Python, or VB
• Things may be quite different “under the hood”.
• The interface to the document will be the same.

The XML File “cats.xml”
<?xml version = "1.0" ?>
<!DOCTYPE TopCat SYSTEM "cats.dtd">
<TopCat>
  I am The Cat in The Hat
  <LittleCatA>
    I am Little Cat A
  </LittleCatA>
  <LittleCatB>
    I am Little Cat B
    <LittleCatC>
      I am Little Cat C
    </LittleCatC>
  </LittleCatB>
  <LittleCatD/>
</TopCat>
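A sketch of walking that document through the DOM interfaces with the JDK's JAXP `DocumentBuilder`. To keep it self-contained the document is embedded as a string without the `<!DOCTYPE TopCat SYSTEM "cats.dtd">` line, since `cats.dtd` is not shown here. The class `CatWalker` and its recursive `walk` method are illustrative; `walk` prints each element indented by depth, followed by the element's own non-blank text children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CatWalker {

    // Depth-first walk: appends "name: text" for each element, where text is
    // the concatenation of the element's direct, trimmed text children.
    static void walk(Node node, int depth, StringBuilder out) {
        NodeList kids = node.getChildNodes();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder own = new StringBuilder();
            for (int i = 0; i < kids.getLength(); i++) {
                Node kid = kids.item(i);
                if (kid.getNodeType() == Node.TEXT_NODE) own.append(kid.getNodeValue().trim());
            }
            out.append("  ".repeat(depth)).append(node.getNodeName());
            if (own.length() > 0) out.append(": ").append(own);
            out.append('\n');
        }
        for (int i = 0; i < kids.getLength(); i++) walk(kids.item(i), depth + 1, out);
    }

    public static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String cats = "<?xml version=\"1.0\"?>\n"
            + "<TopCat>\n  I am The Cat in The Hat\n"
            + "  <LittleCatA>\n    I am Little Cat A\n  </LittleCatA>\n"
            + "  <LittleCatB>\n    I am Little Cat B\n"
            + "    <LittleCatC>\n      I am Little Cat C\n    </LittleCatC>\n"
            + "  </LittleCatB>\n  <LittleCatD/>\n</TopCat>";
        System.out.print(outline(cats));
    }
}
```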