330 likes | 347 Views
Implementing Geographical Information System Services for SERVOGrid. Marlon Pierce Community Grids Lab Indiana University. SERVOGrid Components. Component (“portlet”)-based portals. OGCE mentioned by Chris Hill Web Services for “execution grid” services Ant-based job specification
E N D
Implementing Geographical Information System Services for SERVOGrid Marlon Pierce Community Grids Lab Indiana University
SERVOGrid Components • Component (“portlet”)-based portals. • OGCE mentioned by Chris Hill • Web Services for “execution grid” services • Ant-based job specification • File transfer • Distributed session management (“context”). • Geographic Information System (GIS) services for “data grid” services. • Web Map Service • Web Feature Service • GIS-compatible information services. • Support for streaming, real-time data. • Distributed service management/orchestration • Using events and data streams.
Guiding Principles • Grids are composed of families of services • Data, execution, information, … • Use “WS-I+” approach to building service families. • Build Grids out of Web Service standards conservatively. • WS-Interoperability is the starting point. • See position paper http://grids.ucs.indiana.edu/ptliupages/publications/WebServiceGrids.pdf • SOAP and WSDL provide universal messaging framework and service definition language. • All services should communicate with the same message format. • Message delivery is left as an exercise. • Implementations are interesting.
Pattern Informatics (PI) • PI is a technique developed at University of California, Davis for analyzing earthquake seismic records to forecast regions with high future seismic activity. • They have correctly forecasted the locations of 15 of last 16 earthquakes with magnitude > 5.0 in California. • See Tiampo, K. F., Rundle, J. B., McGinnis, S. A., & Klein, W. Pattern dynamics and forecast methods in seismically active regions. Pure Ap. Geophys. 159, 2429-2467 (2002). • http://citebase.eprints.org/cgi-bin/fulltext?format=application/pdf&identifier=oai%3AarXiv.org%3Acond-mat%2F0102032 • PI is being applied other regions of the world, and John has gotten a lot of press. • Google “John Rundle UC Davis Pattern Informatics”
Pattern Informatics in a Grid Environment • PI in a Grid environment: • Hotspot forecasts are made using publicly available seismic records. • Southern California Earthquake Data Center • Advanced National Seismic System (ANSS) catalogs • Code location is unimportant, can be a service through remote execution • Results need to be stored, shared, modified • Grid/Web Services can provide these capabilities • Problems: • How do we provide programming interfaces (not just user interfaces) to the above catalogs? • How do we connect remote data sources directly to the PI code. • How do we automate this for the entire planet? • Solutions: • Use GIS services to provide the input data, plot the output data • Web Feature Service for data archives • Web Map Service for generating maps • Use HPSearch tool to tie together and manage the distributed data sources and code.
WFS + Seismic Rec. WSDL Aggregating WMS Stubs Stubs HTTP SOAP WSDL WSDL “REST” WFS + Seismic Rec. WFS + State Bounds … WMS + OnEarth
GIS Behind the Scenes • The web features are served up by a Web Feature Service. • Web Map Service aggregates maps • NASA OnEarth + our own renderings. • We re-implement Open Geospatial Consortium standards using Web Service Standards. • SOAP messages, WSDL service definitions. • Will allow us to separate messages from HTTP transport layer in future. • More WMS Info: • http://grids.ucs.indiana.edu/ptliupages/publications/acm-gis-sayar.pdf. • http://grids.ucs.indiana.edu/ptliupages/publications/Geoinformatics05_asayar.pdf. • More WFS Info: • http://grids.ucs.indiana.edu/ptliupages/publications/gwpap243.pdf • More general info, software, demos: http://www.crisisgrid.org
Tying It All Together: HPSearch • HPSearch is an engine for orchestrating distributed Web Service interactions • It uses an event system and supports both file transfers and data streams. • Legacy name • HPSearch flows can be scripted with JavaScript • HPSearch engine binds the flow to a particular set of remote services and executes the script. • HPSearch engines are Web Services, can be distributed interoperate for load balancing. • Boss/Worker model • ProxyWebService: a wrapper class that adds notification and streaming support to a Web Service. • More info: http://www.hpsearch.org
HPSearch (TRex) HPSearch (Danube) Actual Data flow HPSearch controls the Web services Final Output pulled by the WMS HPSearch Engines communicate using NB Messaging infrastructure Data can be stored and retrieved from the 3rd part repository (Context Service) WS Context (Tambora) WFS (Gridfarm001) NaradaBroker network: Used by HPSearch engines as well as for data transfer WMS Data Filter (Danube) Virtual Data flow WMS submits script execution request (URI of script, parameters) HPSearch hosts an AXIS service for remote deployment of scripts • PI Code Runner • (Danube) • Accumulate Data • Run PI Code • Create Graph • Convert RAW -> GML GML (Danube)
RDAHMM: GPS Time Series SegmentationSlide Courtesy of Robert Granat, JPL GPS displacement (3D) length two years.Divided automatically by HMM into 7 classes. • Complex data with subtle signals is difficult for humans to analyze, leading to gaps in analysis • HMM segmentation provides an automatic way to focus attention on the most interesting parts of the time series • Features: • Dip due to aquifer drainage (days 120-250) • Hector Mine earthquake (day 626) • Noisy period at end of time series
Towards Real-Time RDAHMM • A real-time version of RDHAMM could potentially be used to detect state change events in live data from a GPS station. • SCIGN maintains 125+ GPS stations, so trivially parallel RDAHHM clones can monitor state changes in the entire network. • But first we must get the data to RDAHMM.
NaradaBrokering: Message Transport for Distributed Services • NB is a distributed messaging software system. • http://www.naradabrokering.org • NB system virtualizes transport links between components. • Supports TCP/IP, parallel TCP/IP, UDP, SSL. • See e.g. http://grids.ucs.indiana.edu/ptliupages/publications/AllHands2005NB-Paper.pdf for trans-Atlantic parallel tcp/ip timings.
More Information • Contact: mpierce@cs.indiana.edu • GIS Work at CGL: www.crisisgrid.org • Software, demos, publications • Several recent manuscript submissions are/will be posted soon. • HPSearch at CGL: www.hpsearch.org • SERVOGrid Web Sites • Our fine parent project • http://servo.jpl.nasa.gov/ • http://quakesim.jpl.nasa.gov/
Acknowledgements • Geoffrey Fox, Community Grids Lab director. • Shrideep Pallickara: NaradaBrokering design/development lead • Grad Students: Ahmet Sayar, Galip Aydin, Mehmet Aktas, Harshawadhan Gadgil
SERVO Apps and Their Data • GeoFEST: Three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces. • Relies upon fault models with geometric and material properties. • Virtual California: Program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space. • Relies upon fault and fault friction models. • Pattern Informatics: Calculates regions of enhanced probability for future seismic activity based on the seismic record of the region • Uses seismic data archives • RDAHMM: Time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another. • Used to analyze GPS and seismic catalog archives. • Can be adapted to detect state change events in real time. • We will focus on the latter two.
Problems with Conventional Web Services • Transport: HTTP Request/Response is a poor choice for non-trivial data transport. • Much better to stream out data without knowing the content-length. • Representation: ASCII XML is inefficient in obvious and not so obvious ways. • For example, WS security depends upon canonicalization to make reproducible message digests. • Efficiency and performance is not just a high performance computing problem. • Needed to support PDAs and other devices
NaradaBrokering and Web Services • SOAP 1.2 defines a message routing across distributed SOAP Nodes. • Naturally maps to an NB implementation. • This has just been released from www.naradabrokering.org • NB also has support for WS-Eventing and WS-ReliableMessaging. • More generally, we argue for the use of software messaging substrates to provide/implement desirable “quality of service” features • Transport, routing/addressing, reliability, security, discovery, etc. • Specific service capabilities (like “run job”, “move file”, “query data”) are decoupled from the substrate capabilities.
Efficient XML Representation • The XML Infoset provides an abstract data model. • SOAP 1.2 is defined using the Infoset. • This separates XML from “angle bracket notation” restrictions. • Infoset-compliant binary representations are possible. • No loss of data, so you can translate between binary and ascii representations. • Current lab research investigates hand-held applications. • See http://grids.ucs.indiana.edu/ptliupages/publications/OptSOAP_CTS05.pdf • But easily extensible to high performance transport problems.
More Information • Contact: mpierce@cs.indiana.edu • GIS Work at CGL: www.crisisgrid.org • Software, demos, publications • Several recent manuscript submissions are/will be posted soon. • HPSearch at CGL: www.hpsearch.org • SERVOGrid Web Sites • Our fine parent project • http://servo.jpl.nasa.gov/ • http://quakesim.jpl.nasa.gov/
RDAHMM: SCIGN GPS Network AnalysisSlide Courtesy of Robert Granat, JPL Now segment all 127 GPS stations In blue: Number of stations that change state on a given day In red: Seismic activity Days with many state changes often do not correlate with large earthquakes. • Have found a way to detect regional aseismic signals • This software is being integrated with the Quakesim web portal • Scenarios for use with real time streaming data through the web portal are currently being investigated
Support for Streaming Data • We use NaradaBrokering messaging software to manage data streams and filters. • Open source, Java-based software from the Community Grids Lab • Based on topic-based publication/subscription for delivery of messages from/to multiple endpoints. • “Message” can be anything, including SOAP and binary data streams. • We use this for audio/video collaboration. • More recently using it to build Web Service messaging substrates • SOAP 1.2 routing model, WS-Reliability, WS-Eventing • NB ensures reliable delivery of events in the case of broker or client failures and prolonged entity disconnects. • Also supports replay. • Implements high-performance protocols (message transit time of 1 to 2 ms per hop)
GPS Stations • Current implementation provides real-time access to GP messages to following stations in RYO, ASCII and GML formats:
SOPAC GPS Services • As a case study we implemented services to provide real-time access to GPS position messages collected from several SOPAC networks. • Next step is to couple data assimilation tools (such as RDAHMM) to real-time streaming GPS data. • Next steps • Programming APIs: currently we assume the subscriber speaks NaradaBrokering Java APIs (either NB’s native API or Java Messaging Service). • Need to investigate appropriate Web Service standards and C/C++ bindings. • SOAP enveloping of the GML message stream. • A Sensor Collection Service will be implemented to provide metadata about GPS sensors in SensorML.
Position Messages • SOPAC provides 1-2Hz real-time position messages from various GPS networks in a binary format called RYO. • Position messages are broadcasted through RTD server ports. • We have implemented tools to convert RYO messages into ASCII text and another that converts ASCII messages into GML.
Real-Time Access to Position Messages • We have a Forwarder tool that connects to RTD server port to forward RYO messages to a NB topic. • RYO to ASCII converter tool subscribes this topic to collect binary messages and converts them to ASCII. Then it publishes ASCII messages to another NB topic. • ASCII to GML converter subscribes this topic and publishes GML messages to another topic.