
XML and related technologies tutorial Developed using material at w3schools/xml


elyse

Presentation Transcript


  1. XML and related technologies tutorial Developed using material at http://www.w3schools.com/xml

  2. Topics covered
     • XML (.xml) - describe data
     • XML Schema (.xsd) - validate data
     • XPath - navigate data
     • XSLT (.xsl) - transform data

  3. XML (.xml) - describe data
     • XML stands for Extensible Markup Language.
     • XML is a markup language much like HTML.
     • XML was designed to describe data.
     • XML tags are not predefined. You must define your own tags.
     • XML uses an XML Schema (.xsd) to validate the data (or a Document Type Definition (.dtd), …).
     • XML with an XML Schema is designed to be self-descriptive.
     • XML can be used to create other XML-based languages.

  4. XML pros/cons
     Format
     • Pro - Data and data description tags are written as text, so they are portable and do not depend on proprietary formats or conversion processes.
     • Con - Because data is described verbosely, larger datasets (e.g. model outputs) or binary formats (e.g. images) can be poor candidates for pure XML adoption. The common solution is to leave these types of data in their raw formats and use some XML to describe the useful metadata of the file (observation type, temporal/spatial range, …).
     Structure
     • Pro - XML structure can be easily extended by adding elements/attributes as needed.
     • Con - Deciding on the initial XML structure depends on how applications will use the data, which can vary widely.

  5. XML Syntax
     <note id="100">
       <!-- this is a comment -->
       <to>John</to>
       <from>Jane</from>
       <heading>Reminder</heading>
       <body>Don't forget me this weekend!</body>
     </note>
     • <note> is the 'root' element of the document. <note> is a parent element, <body> is a child element of <note>, and <heading> is a sibling of <body>.
     • id="100" is an attribute of the <note> element. Attribute use should be limited, but is generally considered OK when referring to element metadata.
     • All elements must have a closing tag and be properly nested.
     • Tags are case sensitive.
     • Attribute values must be quoted.
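The syntax rules on this slide can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module (Python is a choice made for this illustration; the slides themselves use Perl). It parses the `<note>` document, reads the root element, its attribute, and its children, and shows that an unquoted attribute value is rejected.

```python
import xml.etree.ElementTree as ET

# The <note> document from the slide, with straight quotes around the attribute.
NOTE_XML = """<note id="100">
  <!-- this is a comment -->
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element and its single attribute.
print(root.tag)
print(root.get("id"))

# Children of <note> in document order; the default parser drops comments.
child_tags = [child.tag for child in root]
print(child_tags)

# Attribute values must be quoted: the parser rejects id=100.
try:
    ET.fromstring("<note id=100></note>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```

All four children share the parent `<note>`, which is what makes them siblings.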

  6. Element Naming
     XML elements must follow these naming rules:
     • Names can contain letters, numbers, and other characters
     • Names must not start with a number or punctuation character (an underscore is allowed)
     • Names must not start with the letters xml (or XML or Xml …)
     • Names cannot contain spaces (substitute an underscore (_) instead) and should not use the colon (:) or dash (-) characters
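The naming rules above can be approximated with a small check. This is a sketch only: the function name `is_recommended_name` is made up for this example, the pattern is restricted to ASCII, and it is deliberately stricter than the XML specification (which also permits characters such as the dash and period inside names) because it follows the slide's recommendations.

```python
import re

# Start with a letter or underscore, then letters, digits, or underscores.
# ASCII-only simplification of the slide's recommendations.
_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_recommended_name(name: str) -> bool:
    """Return True if name follows the slide's element naming advice."""
    # Names must not start with "xml" in any letter case.
    if name.lower().startswith("xml"):
        return False
    return bool(_NAME_RE.match(name))


for candidate in ["last_timestamp", "2nd_value", "xmlData", "air pressure", "air-pressure"]:
    print(candidate, is_recommended_name(candidate))
```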

  7. XML Schema (.xsd) - validate data
     • defines elements that can appear in a document
     • defines attributes that can appear in a document
     • defines which elements are child elements
     • defines the order of child elements
     • defines the number of child elements
     • defines whether an element is empty or can include text
     • defines data types for elements and attributes
     • defines default and fixed values for elements and attributes

  8. <employee>
       <firstname>John</firstname>
       <lastname>Smith</lastname>
     </employee>
     Several elements can refer to the same complex type
     <xs:element name="employee" type="personinfo"/>
     <xs:element name="student" type="personinfo"/>
     <xs:element name="member" type="personinfo"/>
     <xs:complexType name="personinfo">
       <xs:sequence>
         <xs:element name="firstname" type="xs:string"/>
         <xs:element name="lastname" type="xs:string"/>
       </xs:sequence>
     </xs:complexType>
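To make the meaning of the `personinfo` type concrete, here is a hand-written check that mirrors what the schema fragment demands: one of the three declared element names, containing `firstname` followed by `lastname` in that order. This is not XSD validation (the Python standard library has no XSD validator; a real project would use an XSD-aware library), and the names `PERSONINFO_SEQUENCE`, `PERSONINFO_ELEMENTS`, and `matches_personinfo` are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Child order required by <xs:sequence> in the personinfo complex type.
PERSONINFO_SEQUENCE = ["firstname", "lastname"]
# Elements declared on the slide with type="personinfo".
PERSONINFO_ELEMENTS = {"employee", "student", "member"}


def matches_personinfo(xml_text: str) -> bool:
    """Rough stand-in for schema validation of a personinfo element."""
    element = ET.fromstring(xml_text)
    if element.tag not in PERSONINFO_ELEMENTS:
        return False
    # xs:sequence fixes both the set of children and their order.
    return [child.tag for child in element] == PERSONINFO_SEQUENCE


print(matches_personinfo(
    "<employee><firstname>John</firstname><lastname>Smith</lastname></employee>"
))
print(matches_personinfo(
    "<student><lastname>Smith</lastname><firstname>John</firstname></student>"
))
```

The second call fails only because the children are in the wrong order, which is the kind of constraint a schema lets you state once and reuse across `employee`, `student`, and `member`.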

  9. XPath - navigate data
     • XPath is a syntax for defining parts of an XML document
     • XPath uses path expressions to navigate in XML documents
     • XPath contains a library of standard functions
     • XPath is a major element in XSLT

  10. #file obs_system.xml
      <?xml version="1.0"?>
      <system id="carocoops">
        <platform id="CAP2">
          <online>no</online>
          <latitude_dd></latitude_dd>
          <longitude_dd></longitude_dd>
          <observation id="air_pressure">
            <online>yes</online>
            <unit>bar</unit>
            <last_timestamp></last_timestamp>
            <last_measurement></last_measurement>
            <range_fail_high>1050</range_fail_high>
            <range_fail_low>900</range_fail_low>
            <continuity_fail></continuity_fail>
          </observation>
        </platform>
        <platform id="SUN2">
          <online>yes</online>
          <latitude_dd></latitude_dd>
          <longitude_dd></longitude_dd>
          <observation id="air_pressure">
            <online>yes</online>
            <unit>bar</unit>
            <last_timestamp></last_timestamp>
            <last_measurement></last_measurement>
            <range_fail_high>1050</range_fail_high>
            <range_fail_low>900</range_fail_low>
            <continuity_fail></continuity_fail>
          </observation>
        </platform>
      </system>

  11. #!perl
      use strict;
      use warnings;
      use XML::XPath;
      # Load the obs_system.xml document.
      my $xp = XML::XPath->new(filename => 'obs_system.xml');
      # Print the <online> value of the platform whose id is SUN2.
      foreach my $element ($xp->findnodes('/system/platform[@id="SUN2"]/online')) {
          print $element->string_value()."\n";
      }
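For readers without Perl, the same query can be sketched in Python with the standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The document below is a trimmed copy of obs_system.xml that keeps only the elements the query touches; the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of obs_system.xml: only the platform ids and <online> flags.
OBS_SYSTEM_XML = """<?xml version="1.0"?>
<system id="carocoops">
  <platform id="CAP2"><online>no</online></platform>
  <platform id="SUN2"><online>yes</online></platform>
</system>"""

root = ET.fromstring(OBS_SYSTEM_XML)

# ElementTree paths are relative to the root element, so the absolute path
# /system/platform[@id="SUN2"]/online becomes ./platform[@id='SUN2']/online.
online_values = [
    element.text for element in root.findall("./platform[@id='SUN2']/online")
]
print(online_values)
```

The predicate `[@id='SUN2']` selects only the matching platform, so the CAP2 value is not returned.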

  12. XSLT (.xsl) - transform data
      • XSL stands for EXtensible Stylesheet Language.
      • XSLT stands for XSL Transformations
      • XSLT is the most important part of XSL
      • XSLT transforms an XML document into another XML document or into other output (see xsl:output method="text")
      • XSLT uses XPath to navigate in XML documents

  13. <?xml version="1.0" encoding="ISO-8859-1"?>
      <?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
      <catalog>
        <cd>
          <title>Empire Burlesque</title>
          <artist>Bob Dylan</artist>
          <country>USA</country>
          <company>Columbia</company>
          <price>10.90</price>
          <year>1985</year>
        </cd>
        ...
      </catalog>

  14. <?xml version="1.0" encoding="ISO-8859-1"?>
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/">
          <html>
            <body>
              <h2>My CD Collection</h2>
              <table border="1">
                <tr bgcolor="#9acd32">
                  <th align="left">Title</th>
                  <th align="left">Artist</th>
                </tr>
                <xsl:for-each select="catalog/cd">
                  <tr>
                    <td><xsl:value-of select="title"/></td>
                    <td><xsl:value-of select="artist"/></td>
                  </tr>
                </xsl:for-each>
              </table>
            </body>
          </html>
        </xsl:template>
      </xsl:stylesheet>
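To show what the stylesheet produces, the sketch below performs the same transformation by hand in Python. It is not an XSLT processor (the Python standard library does not include one); it simply loops over `catalog/cd` the way `xsl:for-each` does and emits one table row per CD. The function name `catalog_to_html` is invented for this example, and the input is a trimmed copy of the catalog with only the two fields the stylesheet reads.

```python
import xml.etree.ElementTree as ET
from html import escape

# Trimmed copy of the catalog: only <title> and <artist> are used.
CATALOG_XML = """<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
  </cd>
</catalog>"""


def catalog_to_html(xml_text: str) -> str:
    """Hand-written equivalent of the stylesheet's template for "/"."""
    catalog = ET.fromstring(xml_text)
    rows = []
    # Equivalent of <xsl:for-each select="catalog/cd">.
    for cd in catalog.findall("cd"):
        # Equivalent of <xsl:value-of select="title"/> and select="artist".
        title = escape(cd.findtext("title", default=""))
        artist = escape(cd.findtext("artist", default=""))
        rows.append(f"<tr><td>{title}</td><td>{artist}</td></tr>")
    return (
        "<html><body><h2>My CD Collection</h2>"
        '<table border="1">'
        '<tr bgcolor="#9acd32"><th align="left">Title</th>'
        '<th align="left">Artist</th></tr>'
        + "".join(rows)
        + "</table></body></html>"
    )


html_output = catalog_to_html(CATALOG_XML)
print(html_output)
```

The advantage of the XSLT version is that the same mapping is expressed declaratively and can be applied by a browser or any XSLT processor without custom code.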

  15. Salinity Workshop
      • Possible Web services which return data
      • IOOS XML Schema for data returned
      • Salty Slim - more documentation at http://twiki.sura.org/twiki/bin/view/Main/SalinityWorkshop
      • Same set of possible web services could also be used by generalized observing systems (NEON or GEOSS) or as an example for USGS, NDBC, OBIS, etc. web services

  16. Possible web services
      ##tell me the services you offer
      # GetCapabilities
      • returns a list of methods available with their associated input vars and outputs.
      ##give me reference handles to the platforms, their position and what they collect
      # GetPlatformList
      • returns a list of platform_id's and their associated geographic position and the observation types (standard_names) which they collect

  17. #####
      ##give me all the data
      # GetLatest(optional: platform_id/bounding_box)
      • returns a list of platform_id's and their corresponding latest observations (optionally for a specific platform_id/within the selected geographic bounding_box)
      ##give me just the observations requested
      # GetLatestByObservation(observation_standard_name[list?], optional: platform_id/bounding_box)
      • returns a list of platform_id's and only the selected observation[list?] for the latest data (optionally for a specific platform_id/within the selected geographic bounding_box)
      #####
      ##instead of just the latest data, give me data for the specified date range
      ##give me all the data
      # GetByDateRange(optional: platform_id/bounding_box, start_datetime, end_datetime)
      • returns a list of platform_id's and their observations for data within the date range (optionally for a specific platform_id/within the selected geographic bounding_box)
      ##give me just the observations requested
      # GetByDateRangeByObservation(observation_standard_name[list?], optional: platform_id/bounding_box, start_datetime, end_datetime)
      • returns a list of platform_id's and only the selected observation[list?] for data within the date range (optionally for a specific platform_id/within the selected geographic bounding_box)
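The slides only name these services and describe their results, so the following is a speculative sketch of the selection logic behind GetLatest, not an implementation of any existing service. The `Observation` record, the `get_latest` function, the bounding box order (min_lon, min_lat, max_lon, max_lat), and the third sample record are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Observation:
    platform_id: str
    standard_name: str
    longitude: float
    latitude: float
    timestamp: str  # ISO 8601 text, so string order equals time order
    value: float


# Assumed order: min_lon, min_lat, max_lon, max_lat.
BoundingBox = Tuple[float, float, float, float]


def get_latest(
    observations: Iterable[Observation],
    platform_id: Optional[str] = None,
    bounding_box: Optional[BoundingBox] = None,
) -> Dict[Tuple[str, str], Observation]:
    """Latest record per (platform_id, standard_name), with optional filters."""
    latest: Dict[Tuple[str, str], Observation] = {}
    for obs in observations:
        if platform_id is not None and obs.platform_id != platform_id:
            continue
        if bounding_box is not None:
            min_lon, min_lat, max_lon, max_lat = bounding_box
            inside = (min_lon <= obs.longitude <= max_lon
                      and min_lat <= obs.latitude <= max_lat)
            if not inside:
                continue
        key = (obs.platform_id, obs.standard_name)
        if key not in latest or obs.timestamp > latest[key].timestamp:
            latest[key] = obs
    return latest


# Sample records; the last platform is invented to exercise the filter.
SAMPLE = [
    Observation("ndbc_41004", "sea_surface_temperature", -79.09, 32.50,
                "2001-01-01T05:00:00", 20.4),
    Observation("ndbc_41004", "sea_surface_temperature", -79.09, 32.50,
                "2001-01-01T06:00:00", 20.4),
    Observation("hypothetical_far_away", "sea_surface_temperature", 10.0, 50.0,
                "2001-01-01T06:00:00", 9.9),
]

latest = get_latest(SAMPLE, bounding_box=(-91.5, 22.0, -71.5, 36.5))
print(sorted(latest))
```

GetLatestByObservation would add a filter on `standard_name`, and the date-range variants would replace the "keep the newest" step with a check that `timestamp` lies between the start and end values.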

  18. Salty Slim - more documentation at http://twiki.sura.org/twiki/bin/view/Main/SalinityWorkshop
      <?xml version="1.0"?>
      <ioos_data
          xmlns="http://localhost/xml_schema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://10.203.10.47/xml_schema ioos_sst.xsd">
        <organization>
          <name>ndbc</name>
          <url>http://www.ndbc.noaa.gov</url>
          <spatial_reference_system>NameOfReferenceSystem</spatial_reference_system>
          <srs_units>meters</srs_units>
          <vertical_datum>NGVD88</vertical_datum>
          <horizontal_datum>NAD83</horizontal_datum>

  19. <platform>
        <name>41004</name>
        <id>ndbc_41004</id>
        <!-- fixed_point,fixed_profile,fixed_depth,free -->
        <collection_type>fixed_point</collection_type>
        <fixed_latitude>32.50</fixed_latitude>
        <fixed_longitude>-79.09</fixed_longitude>
        <fixed_depth units="meters">2.5</fixed_depth>
        <url>http://www.ndbc.noaa.gov/station_page.php?station=41004</url>
        <fgdc_metadata_url>http://url_to_metadata</fgdc_metadata_url>
        <opendap_url>http://url_to_opendap</opendap_url>
        <qc_documentation_url>http://url_to_qc_documentation</qc_documentation_url>
        <upload_interval units='minutes'>10</upload_interval>

  20. <data>
        <observation>
          <name>sstPrimary</name>
          <id>ndbc_41004_sstPrimary</id>
          <standard_name>sea_surface_temperature</standard_name>
          <units>degree_Celsius</units>
          <measurement_interval units='seconds'>30</measurement_interval>
          <value datetime="2001-01-01T00:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T01:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T02:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T03:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T04:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T05:00:00" qc_stage="raw" qc_flags="none">20.4</value>
          <value datetime="2001-01-01T06:00:00" qc_stage="raw" qc_flags="none">20.4</value>
        </observation>
      </data>
      </platform>
      </organization>
      </ioos_data>
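A consumer of this format has to account for the default namespace declared on `<ioos_data>`. The sketch below reads the `<value>` elements into a time series with Python's standard `xml.etree.ElementTree`. The input is a trimmed copy of the document from the preceding slides (two values only), and the prefix `ioos` and the variable names are arbitrary choices made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Map an arbitrary prefix to the document's default namespace.
NS = {"ioos": "http://localhost/xml_schema"}

# Trimmed copy of the example document.
IOOS_XML = """<ioos_data xmlns="http://localhost/xml_schema">
  <organization>
    <platform>
      <id>ndbc_41004</id>
      <data>
        <observation>
          <standard_name>sea_surface_temperature</standard_name>
          <units>degree_Celsius</units>
          <value datetime="2001-01-01T04:00:00" qc_stage="raw" qc_flags="none">20.6</value>
          <value datetime="2001-01-01T05:00:00" qc_stage="raw" qc_flags="none">20.4</value>
        </observation>
      </data>
    </platform>
  </organization>
</ioos_data>"""

root = ET.fromstring(IOOS_XML)

series = []
for observation in root.iterfind(".//ioos:observation", NS):
    name = observation.findtext("ioos:standard_name", namespaces=NS)
    for value in observation.iterfind("ioos:value", NS):
        # Attributes are not namespaced, so they are read by plain name.
        when = datetime.fromisoformat(value.get("datetime"))
        series.append((name, when, float(value.text)))

for row in series:
    print(row)
```

Without the namespace map, a search for plain `observation` would find nothing, because every element in the document belongs to the default namespace.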

  21. *fixed_profile (wls, adcp, ctd) - <fixed_latitude>, <fixed_longitude> with free_depth represented on a per <value> attribute basis. <fixed_depth> omitted.
        <value datetime="2001-01-01T00:00:00" free_depth="2.3" qc_stage="raw" qc_flags="none">20.6</value>
        <value datetime="2001-01-01T00:00:00" free_depth="2.1" qc_stage="raw" qc_flags="none">21.6</value>
      *fixed_depth (ships, floaters) - <fixed_depth> with free_latitude, free_longitude represented on a per <value> attribute basis. <fixed_latitude>, <fixed_longitude> omitted.
        <value datetime="2001-01-01T00:00:00" free_latitude="32.60" free_longitude="-79.19" qc_stage="raw" qc_flags="none">20.6</value>
        <value datetime="2001-01-01T00:00:00" free_latitude="32.70" free_longitude="-79.29" qc_stage="raw" qc_flags="none">21.6</value>
      *free (subs, tagged species) - free_latitude, free_longitude, free_depth represented on a per <value> attribute basis. <fixed_latitude>, <fixed_longitude>, <fixed_depth> omitted.
        <value datetime="2001-01-01T00:00:00" free_latitude="32.60" free_longitude="-79.19" free_depth="2.3" qc_stage="raw" qc_flags="none">20.6</value>
        <value datetime="2001-01-01T00:00:00" free_latitude="32.70" free_longitude="-79.29" free_depth="2.3" qc_stage="raw" qc_flags="none">21.6</value>
      Technically speaking, everything could be represented in the 'free' type format, but it might be useful from a metadata/processing standpoint to know what the collection type is.
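One way to read the four collection types is as a fallback rule: for each of latitude, longitude, and depth, use the `free_` attribute on the `<value>` if present, otherwise the platform's `fixed_` element. The sketch below encodes that reading. The function `resolve_position` and the use of plain dictionaries are assumptions of this example, not part of the proposed schema.

```python
from typing import Dict, Optional


def resolve_position(
    fixed: Dict[str, str], value_attrs: Dict[str, str]
) -> Dict[str, Optional[float]]:
    """Combine platform-level fixed_* values with per-value free_* attributes.

    fixed:       text of the platform's fixed_* elements, keyed by element name
    value_attrs: attributes of one <value> element
    """
    position: Dict[str, Optional[float]] = {}
    for axis in ("latitude", "longitude", "depth"):
        free_key = f"free_{axis}"
        fixed_key = f"fixed_{axis}"
        if free_key in value_attrs:
            position[axis] = float(value_attrs[free_key])
        elif fixed_key in fixed:
            position[axis] = float(fixed[fixed_key])
        else:
            position[axis] = None  # not stated anywhere
    return position


# fixed_profile: latitude/longitude on the platform, depth on each value.
profile_position = resolve_position(
    {"fixed_latitude": "32.50", "fixed_longitude": "-79.09"},
    {"datetime": "2001-01-01T00:00:00", "free_depth": "2.3"},
)
print(profile_position)

# free: everything travels with the value.
free_position = resolve_position(
    {},
    {"free_latitude": "32.60", "free_longitude": "-79.19", "free_depth": "2.3"},
)
print(free_position)
```

Because the same rule handles every case, a processor does not strictly need `collection_type`, which matches the slide's remark that the type is mainly useful as metadata.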

  22. #the following URL (which is the same as 'GetLatest') supports the Carolinas coast website latest observations
      http://nautilus.baruch.sc.edu/wfs/seacoos_in_situ?SERVICE=WFS&VERSION=1.0.0&REQUEST=GETFEATURE&BBOX=-91.5,22,-71.5,36.5&typename=latest_in_situ_obs
      #returns the following XML document (only one platform listing shown)
      <?xml version="1.0" encoding="ISO-8859-1" ?>
      <wfs:FeatureCollection xmlns="http://www.ttt.org/myns" xmlns:myns="http://www.ttt.org/myns" xmlns:wfs="http://www.opengis.net/wfs" xmlns:gml="http://www.opengis.net/gml" xmlns:ogc="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wfs ../wfs/1.0.0/WFS-basic.xsd http://www.ttt.org/myns http://nautilus.baruch.sc.edu/wfs/seacoos_in_situ?SERVICE=WFS&VERSION=1.0.0&REQUEST=DescribeFeatureType&TYPENAME=latest_in_situ_obs">
        <gml:boundedBy>
          <gml:Box srsName="EPSG:4269">
            <gml:coordinates>-91.320000,22.030000 -72.230000,36.480000</gml:coordinates>
          </gml:Box>
        </gml:boundedBy>
        <gml:featureMember>
          <latest_in_situ_obs>
            <gml:boundedBy>
              <gml:Box srsName="EPSG:4269">
                <gml:coordinates>-79.710000,32.860000 -79.710000,32.860000</gml:coordinates>
              </gml:Box>
            </gml:boundedBy>
            <gml:pointProperty>
              <gml:Point srsName="EPSG:4269">
                <gml:coordinates>-79.710000,32.860000</gml:coordinates>
              </gml:Point>
            </gml:pointProperty>

  23. <station_id>carocoops_CAP1_wls</station_id>
      <report_time_stamp>2005-11-14 08:00:00</report_time_stamp>
      <top_time_stamp>2005-11-14 09:54:00</top_time_stamp>
      <air_pressure_time_stamp>2005-09-16 17:54:00</air_pressure_time_stamp>
      <air_pressure_mb>1015.97 mb @ 3m</air_pressure_mb>
      <air_pressure_mb_graph_and_data>http://nautilus.baruch.sc.edu/portal_rs/query_details_air_pressure_in_situ.phtml?hour_range=24&station_id=carocoops_CAP1_wls&lon=-79.71&lat=32.86&air_pressure_table=air_pressure_prod&archive_flag=&time_stamp=2005_09_16_17_54_00&pressure_units=MB</air_pressure_mb_graph_and_data>
      <air_pressure_inches_mercury>30.00 in Hg (0 deg C) @ 3m</air_pressure_inches_mercury>
      <air_pressure_inches_mercury_graph_and_data>http://nautilus.baruch.sc.edu/portal_rs/query_details_air_pressure_in_situ.phtml?hour_range=24&station_id=carocoops_CAP1_wls&lon=-79.71&lat=32.86&air_pressure_table=air_pressure_prod&archive_flag=&time_stamp=2005_09_16_17_54_00&pressure_units=INCHES_MERCURY</air_pressure_inches_mercury_graph_and_data>
      <air_temperature_time_stamp>2005-09-17 03:54:00</air_temperature_time_stamp>
      <air_temperature_celcius>26.98 deg C @ 3m</air_temperature_celcius>
      <air_temperature_celcius_graph_and_data>http://nautilus.baruch.sc.edu/portal_rs/query_details_air_temperature.phtml?hour_range=24&station_id=carocoops_CAP1_wls&lon=-79.71&lat=32.86&air_temperature_table=air_temperature_prod&archive_flag=&time_stamp=2005_09_17_03_54_00&degree_units=C</air_temperature_celcius_graph_and_data>
      <air_temperature_fahrenheit>80.56 deg F @ 3m</air_temperature_fahrenheit>
      <air_temperature_fahrenheit_graph_and_data>http://nautilus.baruch.sc.edu/portal_rs/query_details_air_temperature.phtml?hour_range=24&station_id=carocoops_CAP1_wls&lon=-79.71&lat=32.86&air_temperature_table=air_temperature_prod&archive_flag=&time_stamp=2005_09_17_03_54_00&degree_units=F</air_temperature_fahrenheit_graph_and_data>
      ...
      </latest_in_situ_obs>
      </gml:featureMember>
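A client reading this WFS response has to deal with several namespaces at once: the `gml:` elements, and the feature elements that fall under the default namespace `http://www.ttt.org/myns`. The sketch below, using Python's standard `xml.etree.ElementTree`, pulls the station id, position, and air pressure from a trimmed copy of the response. The trimmed document and the variable names are assumptions of this example; the numeric pressure is taken from the leading number of the display string.

```python
import xml.etree.ElementTree as ET

NS = {
    "myns": "http://www.ttt.org/myns",  # also the response's default namespace
    "wfs": "http://www.opengis.net/wfs",
    "gml": "http://www.opengis.net/gml",
}

# Trimmed copy of the response with one feature and three of its fields.
WFS_XML = """<wfs:FeatureCollection xmlns="http://www.ttt.org/myns"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <latest_in_situ_obs>
      <gml:pointProperty>
        <gml:Point srsName="EPSG:4269">
          <gml:coordinates>-79.710000,32.860000</gml:coordinates>
        </gml:Point>
      </gml:pointProperty>
      <station_id>carocoops_CAP1_wls</station_id>
      <air_pressure_mb>1015.97 mb @ 3m</air_pressure_mb>
    </latest_in_situ_obs>
  </gml:featureMember>
</wfs:FeatureCollection>"""

root = ET.fromstring(WFS_XML)

stations = []
for feature in root.iterfind("gml:featureMember/myns:latest_in_situ_obs", NS):
    coordinates = feature.findtext(".//gml:Point/gml:coordinates", namespaces=NS)
    lon_text, lat_text = coordinates.split(",")
    pressure_text = feature.findtext("myns:air_pressure_mb", namespaces=NS)
    stations.append({
        "station_id": feature.findtext("myns:station_id", namespaces=NS),
        "lon": float(lon_text),
        "lat": float(lat_text),
        # "1015.97 mb @ 3m" -> 1015.97
        "air_pressure_mb": float(pressure_text.split()[0]),
    })

print(stations)
```

Having to split "1015.97 mb @ 3m" to recover a number illustrates a design trade-off in this response: values formatted for display are convenient for a web page but awkward for machine processing, which the attribute-based `<value>` format shown earlier avoids.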

  24. SEACOOS XML Services
      http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/CodeRepositorySeacoosXMLServices
      • Using an XML descriptor file to describe ASCII column-oriented data for later processing
      • Web forms simplify the process of creating the needed XML
      • Service currently exists for fixed point data which converts ASCII to SEACOOS netCDF
      • Data scout currently converts netCDF to SQL for relational database population, but future efforts may skip the netCDF step entirely

  25. time,wind_speed,wind_from_direction,sea_surface_temperature
      2004-10-22 14:00:00+00_SEP_5.0_SEP_120.0_SEP_12.0
      2004-10-22 15:00:00_SEP_6.0_SEP_125.0_SEP_13.0
      2004-10-22 16:00:00_SEP_7.0_SEP_130.0_SEP_14.0
      2004-10-22 17:00:00_SEP_8.0_SEP_135.0_SEP_15.0

  26. <global_attributes>
        <!-- repeatable element -->
        <conventions>CF-1.0</conventions>
        <conventions>SEACOOS-NETCDF-2.0</conventions>
        <conventions>SEACOOS-XML-1.0</conventions>
        <!-- platform information -->
        <!-- format_category list
          [fixed-point,fixed-profiler,fixed-map,moving-point-2D,moving-point-3D,moving-profiler]
        -->
        <format_category>fixed-point</format_category>
        <contact_info>Jeremy Cothran (jcothran@carocoops.org)</contact_info>
        <institution_desc>Baruch Institute, University of South Carolina at Columbia</institution_desc>
        <institution_url>http://carocoops.org</institution_url>
        <institution_id>carocoops</institution_id>
        <platform_id>CAP2</platform_id>
        <package_id>buoy</package_id>

  27. <!-- file information -->
      <data_url>http://trident.baruch.sc.edu/storm_surge_data/latest</data_url>
      <filename_search></filename_search>
      <file_row_start>2</file_row_start>
      <file_row_comment></file_row_comment>
      <file_field_separator>_SEP_</file_field_separator>
      <file_field_missing_value></file_field_missing_value>
      <column_time></column_time>
      <measurement_time_zone></measurement_time_zone>

  28. <dependent_variables>
        <!-- repeatable element -->
        <variable>
          <column_number>2</column_number>
          <standard_name>wind_speed</standard_name>
          <units>m s-1</units>
          <z>3.0</z>
        </variable>
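The descriptor fields can be read as instructions for parsing the ASCII file: skip to `file_row_start`, split each row on `file_field_separator`, and take `column_number` for each variable. The sketch below applies that reading in Python. It is not the SEACOOS conversion service; the dictionary layout and the function `extract_columns` are invented for this example, and it assumes that row and column numbers are 1-based (which is consistent with row 2 being the first data row and column 2 being wind_speed).

```python
from typing import Dict, List

# Descriptor values copied from the XML above into a plain dictionary.
DESCRIPTOR = {
    "file_row_start": 2,
    "file_field_separator": "_SEP_",
    "variables": [
        {"column_number": 2, "standard_name": "wind_speed", "units": "m s-1"},
    ],
}

# Header row plus two of the data rows from the ASCII example.
ASCII_DATA = """time,wind_speed,wind_from_direction,sea_surface_temperature
2004-10-22 15:00:00_SEP_6.0_SEP_125.0_SEP_13.0
2004-10-22 16:00:00_SEP_7.0_SEP_130.0_SEP_14.0
"""


def extract_columns(text: str, descriptor: dict) -> Dict[str, List[float]]:
    """Pull each described variable out of separator-delimited rows."""
    # Assumption: file_row_start is 1-based, so 2 skips the header row.
    rows = text.splitlines()[descriptor["file_row_start"] - 1:]
    separator = descriptor["file_field_separator"]
    columns: Dict[str, List[float]] = {
        variable["standard_name"]: [] for variable in descriptor["variables"]
    }
    for row in rows:
        if not row.strip():
            continue  # ignore blank lines
        fields = row.split(separator)
        for variable in descriptor["variables"]:
            # Assumption: column_number is 1-based.
            field = fields[variable["column_number"] - 1]
            columns[variable["standard_name"]].append(float(field))
    return columns


columns = extract_columns(ASCII_DATA, DESCRIPTOR)
print(columns)
```

Keeping the layout details in a descriptor means a new data provider only has to supply a new XML file rather than new parsing code.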
