XML-based Web Publishing and Content Management at Seattle University School of Law
James Cooper, Director of Technology & Media Services, jcooper@seattleu.edu
Evan Lenz, Content Management Architect, lenze@seattleu.edu
Contents
• Web site requirements and architecture
• Web site management with Cocoon
• URI design discussion
• Redhawk CMS
• An acronym you should know: XSLT
• Q&A
SU Law Web site requirements (summer 2002)
• Must include a Flash-enhanced version
• Must include an HTML-based version that approximates the look-and-feel and navigational structure of the Flash-enhanced version
• Must include a version of the site that is designed for accessibility
• Must separate presentation from content through the use of XML technologies. Multiple published versions of the same content must be generated automatically from the same source.
• The publishing framework must provide a single point of control over navigational structure, e.g. an XML configuration file.
Web site requirements, cont.
• Must allow an average Web developer to easily author new content, edit existing content, etc.
• Must accommodate the continued use of existing tools for authoring content, e.g. Dreamweaver.
• Particular kinds of content that have predictable, repeating structure should be converted into custom XML vocabularies to increase their flexibility and ease of management.
• The Web site must include search functionality integrated into all versions of the site.
Web content strategy today
• Static pages were converted to style-free XHTML and are stored in that form (in VSS, with latest versions shadowed on the staging server).
• Apache Ant is invoked on the staging server to incrementally build all versions (Flash, Standard, Text-only, and crawler) of each static page, using the page source, as well as global navigation and sidebar configuration files, as input.
• Cocoon powers the core functionality of the site, including setting the user’s version preferences and serving dynamic content. All static pages and files are served directly by Apache.
• Dynamic content pieces are identified by URI in the Cocoon sitemap, which is configured to assemble corresponding pages on-the-fly. Dynamic content examples include:
    • Specialized content in our home-grown CMS called “Redhawk”, which provides end-user WYSIWYG editing of certain kinds of content
    • Google search results
    • Legacy ASP pages
• Traditional Web content management, e.g. WYSIWYG editing of all pages, is under consideration but is not sorely missed at this time.
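The static build step above might look roughly like the following Ant fragment. This is a minimal sketch, not the site's actual build file: the directory names, the stylesheet name page2html.xsl, the target name, and the "version" stylesheet parameter are all assumptions made for illustration. Only the flash-html output folder name is taken from the URL rewriting example elsewhere in this deck.

```xml
<!-- Hypothetical build.xml fragment: paths, names, and the "version"
     parameter are illustrative assumptions, not the real site build. -->
<project name="site-build" default="build-flash" basedir=".">

  <!-- Assumed locations of page sources, stylesheets, and build output -->
  <property name="src.dir"   value="pages"/>
  <property name="style.dir" value="stylesheets"/>
  <property name="out.dir"   value="build"/>

  <target name="build-flash">
    <!-- The xslt task transforms every matching file under basedir.
         Unless force="true" is set, it skips files whose output is
         already newer than both the source page and the stylesheet,
         which is what makes the build incremental. -->
    <xslt basedir="${src.dir}"
          destdir="${out.dir}/flash-html"
          includes="**/*.xml"
          extension=".html"
          style="${style.dir}/page2html.xsl">
      <!-- Passed to the stylesheet as a top-level xsl:param -->
      <param name="version" expression="flash"/>
    </xslt>
  </target>

</project>
```

One target per published version (Flash, Standard, Text-only, crawler) could reuse the same stylesheet with a different parameter value. Note that under this sketch a change to a shared configuration file alone would not mark pages as out of date; a real build would need to handle that dependency separately.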
Benefits of using XML
• Separation of presentation from content
    • Ensures consistency of presentation across all pages (eliminates layout errors)
    • Enables publication to multiple channels
    • Content re-use
• Many commercial and open-source tools available for processing/creating XML
• Integration between disparate systems (including legacy ASP pages, Google, Redhawk, etc.)
• Great for configuration files
Primary tools used in our Web site
Run-time:
• Apache Cocoon (Java-based)
• Apache Web server on Linux
• mod_rewrite (for rewriting incoming URLs, e.g. path?mode=flash to /flash-html/path.html)
• Google Appliance (for integrated search inside our site template)
• IIS/ASP (legacy database access scripts, e-mail forms, etc.)
• 4Suite, for exporting content from the Redhawk CMS (based on 4Suite)
Build-time:
• MS Visual SourceSafe (for versioning of static content)
• Samba (for mounting a VSS shadow folder on the Linux staging server)
• Dreamweaver MX (includes XHTML support and VSS integration)
• Apache Ant (for building the bulk of the site statically)
• 4Suite, for end-user content management of specialized document types, aka Redhawk
Introduction to Cocoon
• Cocoon is an open-source, Java-based XML Web publishing framework
• Recently gained status as a top-level Apache project, at http://cocoon.apache.org
• Designed to enable the separation of concerns between content, logic, and style
The Cocoon sitemap
• SAX-based pipeline mechanism allows XML content to go through a series of transformations, configurable by the sitemap, Cocoon's central point of configuration
• Each pipeline consists of:
    • Exactly one generator
        • Produces XML content using any number of mechanisms: reading a file, submitting an HTTP request, calling a database, invoking a server page script, etc.
    • Followed by zero or more transformers
        • Each processes the XML, e.g. with XSLT or XInclude, for subsequent handling by either another transformer or the serializer
    • Followed by exactly one serializer
        • Serializes into a particular format, e.g. well-formed XML, browser-compatible XHTML, SVG, PDF (via XSL-FO and FOP), rasterized images (via SVG and Batik), etc.
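To make the generator / transformers / serializer rule concrete, here is a hedged sketch of two pipelines, one with no transformers and one with two. The URI patterns, file paths, and the stylesheet name page2fo.xsl are hypothetical. The component names "xinclude" and "fo2pdf" are assumed to be registered in the sitemap's components section, as they are in Cocoon's sample sitemap.

```xml
<!-- Hypothetical sitemap fragment: patterns, paths, and stylesheet
     names are assumptions for illustration only. -->
<map:pipeline xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- Zero transformers: the generator's SAX events go straight
       to the serializer, so the source XML is served as-is. -->
  <map:match pattern="raw/*">
    <map:generate src="content/{1}.xml"/>
    <map:serialize type="xml"/>
  </map:match>

  <!-- Two transformers in sequence: XInclude resolves any included
       fragments, then XSLT turns the result into XSL-FO, which the
       fo2pdf serializer renders as PDF. -->
  <map:match pattern="pdf/*">
    <map:generate src="content/{1}.xml"/>
    <map:transform type="xinclude"/>
    <map:transform src="stylesheets/page2fo.xsl"/>
    <map:serialize type="fo2pdf"/>
  </map:match>

</map:pipeline>
```

In both matchers, {1} stands for whatever the * wildcard matched in the request URI.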
Simplified Cocoon sitemap excerpt
<map:match pattern="accesstojustice/hague/cases">
  <map:generate src="http://redhawk/?xslt=getCases.xsl"/>
  <map:transform src="stylesheets/case2html.xsl"/>
  <map:serialize type="xhtml"/>
</map:match>
Another sitemap excerpt
<map:resource name="front-door">
  <map:select type="request-parameter">
    <map:parameter name="parameter-name" value="set-version"/>
    <map:when test="flash">
      <map:call resource="check-flash"/>
    </map:when>
    <map:when test="flash-confirmed">
      <map:call resource="set-preference-to-flash"/>
    </map:when>
    <map:when test="standard">
      <map:call resource="set-preference-to-standard"/>
    </map:when>
    <map:when test="simple">
      <map:call resource="set-preference-to-simple"/>
    </map:when>
    <map:otherwise>
      <!-- more logic -->
    </map:otherwise>
  </map:select>
</map:resource>
URI design considerations
• The URI design of the SU Law Web site was inspired by Tim Berners-Lee's 1998 essay “Cool URIs don't change” – http://www.w3.org/Provider/Style/URI.html
• Aims to follow two of the essay's suggestions:
    • Leave out file extensions
    • Leave out topic/classification by subject
Leave out file extensions
• Cocoon makes it easy to map external URIs to internal filenames or other content generators
• In the SU Law Web site, no HTML page URL includes a file extension
• Other types of content use standard file extensions, e.g. JPG, GIF, Flash, Word, etc.
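One way Cocoon can map an extension-free external URI to an internal file is a wildcard matcher, sketched below. The content/ directory, the .xhtml suffix, the home.xhtml filename, and the stylesheet name are assumptions; the site's actual mapping may differ, particularly since its static pages are served by Apache rather than Cocoon.

```xml
<!-- Hypothetical sitemap fragment: directory layout and file names
     are assumptions for illustration only. -->
<map:pipeline xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- The site root has no path to substitute, so it gets its own rule -->
  <map:match pattern="">
    <map:generate src="content/home.xhtml"/>
    <map:transform src="stylesheets/page2html.xsl"/>
    <map:serialize type="xhtml"/>
  </map:match>

  <!-- ** matches any remaining path, including slashes, so a request
       for academics/calendar reads content/academics/calendar.xhtml.
       The file extension exists only on the server side. -->
  <map:match pattern="**">
    <map:generate src="content/{1}.xhtml"/>
    <map:transform src="stylesheets/page2html.xsl"/>
    <map:serialize type="xhtml"/>
  </map:match>

</map:pipeline>
```

Because the extension never appears in the public URI, the underlying storage format could later change without breaking any link.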
Leave out topic/classification by subject
• Difficult problem
• Design URIs such that they are meaningfully mnemonic and will never change, even though the corresponding pages may be classified into different topics later
• Berners-Lee: "Because the relationships between subjects are web-like rather than tree-like, even...people who agree on a web may pick a different tree representation."
Decouple navigational structure from URI structure
• URI structure is, of necessity, hierarchical
• Site navigation tends to be hierarchical, classifying pages into topics or subjects
• To help in following the original suggestion, we formulated the following mandate:
    • Decouple navigational structure from URI structure.
• We met this goal through the use of a custom XML configuration file (navigation.xml) that maps between the two independent hierarchies (navigation and URI structure)
Excerpt from navigation.xml
<navigation xmlns="http://law.seattleu.edu">
  <menu display="Welcome" sectionId="welcome">
    <link href="/" display="SU Law Home"/>
    <link display="Contact Information" href="/contactus"/>
    <link display="Directions" href="/directions"/>
    <link href="/welcome" display="From the Dean"/>
    <link href="/history" display="History"/>
    <link href="/calendar" display="Master Calendar"/>
    <link href="/mission" display="Mission"/>
    <link href="/search" display="Search"/>
    <link href="/sitemap" display="Site Map"/>
    <link href="http://www.seattleu.edu" display="Seattle University Home"/>
    <hidden href="/news" display="News"/>
    <hidden pattern="/news"/>
    <hidden href="/privacy" display="Privacy Statement"/>
  </menu>
  <menu display="Students" sectionId="students">
    <menu display="Academics">
      <link href="/academics" display="Introduction"/>
      <link href="/academics/calendar" display="Academic Calendar"/>
      <link href="/courses" display="Course Descriptions"/>
      <link href="/classassignments" display="Class Assignments"/>
      <hidden pattern="/classassignments"/>
      <!-- more pages -->
    </menu>
    <!-- more submenus -->
  </menu>
  <!-- more menus -->
</navigation>
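As an illustration of how a stylesheet could use navigation.xml to place a page in the navigation hierarchy regardless of its URI, here is a minimal XSLT 1.0 sketch. The current-uri parameter and the breadcrumb output vocabulary are hypothetical, and this is not the site's actual stylesheet. It only handles entries with an exact href; the meaning of the pattern attribute on hidden elements is not spelled out in the deck, so those entries are ignored here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:nav="http://law.seattleu.edu"
    exclude-result-prefixes="nav">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical parameter: the URI of the page being rendered -->
  <xsl:param name="current-uri" select="'/courses'"/>

  <xsl:template match="/">
    <!-- First link or hidden entry whose href equals the requested URI -->
    <xsl:variable name="entry"
        select="(//nav:link | //nav:hidden)[@href = $current-uri][1]"/>

    <breadcrumb uri="{$current-uri}">
      <!-- for-each visits the enclosing menus in document order,
           so the outermost menu comes first -->
      <xsl:for-each select="$entry/ancestor::nav:menu">
        <menu><xsl:value-of select="@display"/></menu>
      </xsl:for-each>
      <page><xsl:value-of select="$entry/@display"/></page>
    </breadcrumb>
  </xsl:template>

</xsl:stylesheet>
```

Run against the excerpt above with the default parameter, this should produce the trail Students, Academics, Course Descriptions, even though the URI /courses itself says nothing about either menu.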
The benefits of URI-navigation independence
• Pages can be moved from one section of the site to another by simply editing one file (navigation.xml)
• Navigation structure can change without needing to update any links or change any URIs (thereby rendering them uncool)
• Files do not need to be moved around just because corresponding pages “move around” the site
XML-based configuration of the Web site “sidebar”
<sidebar xmlns="http://law.seattleu.edu">
  <allButtons>
    <promotion id="laptop" img="laptoppurchase.gif"
               alt="Student Laptop Purchase Program (Dell)"
               href="/technology/purchase"/>
    <profile id="cmhall" alt="Christian Halliburton Video"
             movie="cmhall.rm"/>
    <quote id="cumbow" img="cumbow.gif" alt="Cumbow Quote"/>
    ...
  </allButtons>
  ...
  <section id="faculty">
    <profile idref="cmhall"/>
    <quote idref="cumbow"/>
    <promotion idref="giving"/>
    <promotion idref="newfaculty"/>
    <promotion idref="laptop"/>
  </section>
  ...
</sidebar>
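The sidebar file defines each button once under allButtons and then refers to it by idref from each section. A stylesheet could resolve those references with an XSLT key, as in this hedged sketch. The section-id parameter and the ul/li output markup are assumptions for illustration, not the site's real rendering.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sb="http://law.seattleu.edu"
    exclude-result-prefixes="sb">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical parameter: which section's sidebar to build -->
  <xsl:param name="section-id" select="'faculty'"/>

  <!-- Index every button definition under allButtons by its id -->
  <xsl:key name="button" match="sb:allButtons/*" use="@id"/>

  <xsl:template match="/">
    <ul class="sidebar">
      <!-- Walk the section's references in their configured order -->
      <xsl:for-each select="/sb:sidebar/sb:section[@id = $section-id]/*">
        <!-- Follow the idref back to the shared definition -->
        <xsl:variable name="def" select="key('button', @idref)"/>
        <li class="{local-name($def)}">
          <xsl:value-of select="$def/@alt"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Because each button is defined once and referenced many times, changing an image or link in allButtons updates every section that uses it. A reference whose definition is missing (such as the ones elided from the excerpt) would simply produce an empty list item in this sketch.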
Redhawk, home-grown CMS
• Redhawk is a specialized XML content management system, based on 4Suite, an open-source platform for XML and RDF processing
• Named after SU mascot
• Basic unit of storage is an XML document
• Supports development of custom Redhawk "document classes", which correspond to XML document types (or schemas)
• Provides basic CRUD (Create, Read, Update, Delete) and role-based workflow functionality
    • Two types of users for each document class: Author and Editor
    • Any Create, Update, or Delete requests by an Author must be approved by an Editor before taking effect
• Pluggable WYSIWYG editing environments; so far we have developed support for Altova's free browser-based XML editor, Authentic 5
• Future plans to support Microsoft InfoPath and Word 2003
Current Redhawk applications
• Announcements and events for the Docket (migration from custom production application in process)
• Access to Justice Institute’s Hague Project for managing Hague Convention-related case information (in production)
The common denominator: XSLT (Extensible Stylesheet Language Transformations)
• Used in Cocoon to assemble all pages (XSLT is the default type of "Transformer")
• Used in our site build process, via Ant's <xslt> task for collectively applying transformations over multiple files
• Built into 4Suite and used throughout Redhawk to assemble pages, create documents, and implement the core CMS logic (with the help of extensions)
• Used in the Google Appliance to style the output of search results
• Used in Redhawk in the browser to apply supplemental "clean-up" transformations to the XML resulting from Authentic editing
• Growing abundance of conformant XSLT processors, including IE6 and Mozilla support, as well as a growing number of powerful tools
• And… XSLT is reaching mainstream technology status: Microsoft Office 2003 will pervasively employ XSLT for the development of custom XML solutions, particularly in Word, Excel, Access, and InfoPath.
References
• http://cocoon.apache.org
• http://4suite.org
• http://ant.apache.org
• “Cool URIs don't change” – http://www.w3.org/Provider/Style/URI.html
• “Cocoon and 4Suite for Content Management: The Best of Both Worlds at Seattle University School of Law” – http://www.xmlportfolio.com/xmleurope2003/