
Semantic and Streaming Grids

This article explores the use of semantic and streaming grids in handling the massive amounts of data generated in various scientific disciplines. It discusses the applications of data assimilation, distributed simulations, audio-video conferencing, and handheld grid devices. The article also highlights the importance of metadata and decision support systems in managing and analyzing the data.


Presentation Transcript


  1. Semantic and Streaming Grids. Chinese Academy of Sciences, Dec 6 2005. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401. gcf@indiana.edu http://www.infomall.org

  2. Four Data Streaming Application Areas • Data Assimilation applied to link the data deluge (satellites, sensors, seismometers) in real time to small and large scale parallel simulations • Use in Earthquake Science • Department of Defense (and Homeland Security) have built the Global Information Grid with a target architecture NCOW (Network Centric Operations and Warfare) • They submit no jobs; rather, they stream data to brokers, where it is filtered and distributed • Includes their rather dated distributed simulation HLA • Audio-Video Conferencing implemented with services and Grid messaging • Hand-held Grid linking PDA/cell-phones to Grids
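
The broker pattern on this slide (sources stream data to a broker, which filters and distributes it, instead of submitting jobs) can be sketched in a few lines. This is a hypothetical in-memory illustration only: the `Broker` class, its `subscribe` and `publish` methods, and the seismometer readings are assumptions made for the example and are not the API of NaradaBrokering or of any Global Information Grid component.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Handler = Callable[[Any], None]


class Broker:
    """Hypothetical in-memory broker: sources publish, subscribers get filtered data."""

    def __init__(self) -> None:
        # topic -> list of (filter predicate, delivery callback)
        self._subs: Dict[str, List[Tuple[Predicate, Handler]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler,
                  accept: Predicate = lambda _msg: True) -> None:
        """Register a callback; `accept` is the filter applied by the broker."""
        self._subs[topic].append((accept, handler))

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a message to every subscriber whose filter accepts it."""
        delivered = 0
        for accept, handler in self._subs[topic]:
            if accept(message):
                handler(message)
                delivered += 1
        return delivered


broker = Broker()
large_events: List[dict] = []
# The subscriber only wants strong events; the broker does the filtering.
broker.subscribe("seismometer", large_events.append,
                 accept=lambda m: m["magnitude"] >= 5.0)

for reading in ({"station": "A", "magnitude": 2.1},
                {"station": "B", "magnitude": 6.3}):
    broker.publish("seismometer", reading)
```

The source never names the consumer and the consumer never polls; both only know the topic, which is the decoupling the slide contrasts with job submission.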

  3. Data Deluged Science • In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new science and new ways of computing • Data assimilation was not central to HPCC • DoE ASCI was set up because they didn’t want test data! • Now particle physics will get 100 petabytes from CERN • Nuclear physics (Jefferson Lab) is in the same situation • Use around 30,000 CPUs simultaneously 24x7 • Weather, climate, solid earth (EarthScope) • Bioinformatics curated databases (Biocomplexity has only 1000’s of data points at present) • Virtual Observatory and SkyServer in Astronomy • Environmental Sensor nets

  4. Information/Knowledge Grids • Distributed data sources (10’s to 1000’s): instruments, file systems, curated databases … • Data Deluge: 1 (now) to 100’s petabytes/year (2012) • Moore’s law for Sensors • Possible filters assigned dynamically (on-demand) • Run image processing algorithm on telescope image • Run gene sequencing algorithm on compiled data • Needs decision support front end with “what-if” simulations • Metadata (provenance) critical to annotate data • Integrate across experiments as in multi-wavelength astronomy [Figure caption: Data Deluge comes from pixels/year available]

  5. [Figure: architecture diagram. Raw Data → Data → Information → Knowledge → Wisdom → Decisions, carried as SOAP Messages among a Portal, a Database, Another Grid, and Another Service. Legend: SS = Sensor Service, FS = Filter Service, OS = Other Service, MD = MetaData.]

  6. Semantic Grid and Services • Implications of SOA (Service Oriented Architectures) for SG (Semantic Grid) • Build services to implement SG • Implications of SG for SOA • Build metadata rich systems of services using SG • Services receive data in SOAP messages, manipulate it and produce transformed data as further messages • Meta-data is carried in SOAP messages • Meta-data controls processing and transport of SOAP Messages • Knowledge is created from data by services • The Grid enhances Web services with semantically rich system and application specific management • One must exploit and work around the different approaches to meta-data and their manipulation in Web Services

  7. Structure of SOAP Messages [Figure labels: H1 H2 H3 H4, Body, Service, F1 F2 F3 F4, Container Workflow, Container Handlers] • SOAP Messages have System information in the header, including WS-Policy based meta-data defining processing options • Processed by Handlers • Application data and meta-data are in the body (controversies here!) • Processed by the Service itself • Some meta-data like WS-RF is logically “only in messages” • Other meta-data, like that in WS-Context or the SRB, is stored in the logical equivalent of XML databases • We only need to preserve semantic structure (XML/SOAP Infoset), so we can transport in fast XML and store in efficient relational databases
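
The header/body split described on this slide can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the `Priority`, `ReplyTo`, and `Scale` elements and the `run_handlers` and `scale_service` functions are invented for the example, and real containers do far more (WS-Policy, security, addressing). Only the SOAP 1.1 envelope namespace is a real identifier.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical message: system meta-data in the header, application data in the body.
message = f"""
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Header>
    <Priority>high</Priority>
    <ReplyTo>urn:example:client</ReplyTo>
  </soap:Header>
  <soap:Body>
    <Scale factor="2">21</Scale>
  </soap:Body>
</soap:Envelope>
""".strip()


def run_handlers(header: ET.Element, context: dict) -> None:
    """Handlers see only header blocks and record processing options."""
    for block in header:
        context[block.tag] = (block.text or "").strip()


def scale_service(body: ET.Element) -> int:
    """The service sees only the body and computes the application result."""
    request = body.find("Scale")
    return int(request.text) * int(request.get("factor"))


envelope = ET.fromstring(message)
header = envelope.find(f"{{{SOAP_NS}}}Header")
body = envelope.find(f"{{{SOAP_NS}}}Body")

context: dict = {}
run_handlers(header, context)   # container-level processing
result = scale_service(body)    # application-level processing
```

Keeping the two functions ignorant of each other's part of the message mirrors the slide's point that handlers and the service process different parts of the same Infoset.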

  8. What Types of Services Are There? • There is a horde of support services supplying security, collaboration, database access, user interfaces • The support services are associated with either the system or the application • We will study the WS-* and GS-* specifications, which implicitly or explicitly define many support services • There are generalized filter services, which are applications that accept messages and produce new messages with some data derived from that in the input • Simulations (including PDE’s and reactive systems) • Data-mining • Transformations • Agents • Reasoning are all termed filters here • There are services like “author ontology”, “parse RDF” or “attach provenance” that directly support the Semantic Grid • But all services and their interactions are bathed in a sea of meta-data and so implicitly need and support the Semantic Grid

  9. It’s a Composite Hierarchical World • Filters can be a workflow, which means they are “just collections of other simpler services” • One needs meta-data to control the workflow • Services are programs that accept messages and produce messages • Grids are a distributed collection of services supporting managed shared resources • Management requires meta-data • Grids are distributed systems that accept distributed messages and produce distributed result messages • Can always talk about Grids and view a service or a workflow as a special case of a Grid • It just requires meta-data to send a message to a Grid and have it routed to the “correct computer” holding the “requested service” • Meta-data allows mapping of virtual to real addresses
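
The composite view on this slide (a workflow is itself a service, and meta-data maps a virtual address to whatever actually implements it) can be sketched as follows. Everything here is hypothetical: the `grid://demo/...` addresses, the `registry` dictionary standing in for a meta-data catalog, and the two toy filters are assumptions for illustration.

```python
from typing import Callable, Dict

# A service is anything that accepts a message and produces a message.
Service = Callable[[dict], dict]


def workflow(*stages: Service) -> Service:
    """Compose simpler services; the result has the same shape as a service."""
    def run(message: dict) -> dict:
        for stage in stages:
            message = stage(message)
        return message
    return run


def calibrate(msg: dict) -> dict:
    """Toy filter: subtract a sensor offset."""
    return {**msg, "value": msg["value"] - msg.get("offset", 0.0)}


def to_km(msg: dict) -> dict:
    """Toy filter: convert metres to kilometres."""
    return {**msg, "value": msg["value"] / 1000.0, "unit": "km"}


# Hypothetical meta-data catalog mapping virtual addresses to real callables.
registry: Dict[str, Service] = {
    "grid://demo/calibrate": calibrate,
    "grid://demo/clean-distance": workflow(calibrate, to_km),
}


def send(virtual_address: str, message: dict) -> dict:
    """The caller only knows the virtual address; the registry resolves it."""
    return registry[virtual_address](message)


result = send("grid://demo/clean-distance",
              {"value": 1500.0, "offset": 500.0, "unit": "m"})
```

The caller cannot tell whether `grid://demo/clean-distance` is one service or a workflow of several, which is the sense in which a service is a special case of a Grid.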

  10. Grids of Grids Architecture: Semantically Rich Services with a Semantically Rich Distributed Operating Environment [Figure: SOAP Message Streams carry Raw Data → Data → Information → Knowledge → Wisdom → Decisions among Sensor Services (SS), Filter Services (FS), Other Services (OS), MetaData (MD), a Portal, a Database, Another Grid, and Another Service. Raw Data is the same as an outward facing application service.]

  11. GIS Grids and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors • GML (Geography Markup Language) defines the specification of geo-referenced data • SensorML and O&M (Observation and Measurements) define meta-data and data structures for sensors • Services like Web Map Service, Web Feature Service, and Sensor Collection Service define service interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid

  12. A Screen Shot From the WMS Client

  13. WMS uses WFS that uses data sources <gml:featureMember> <fault> <name> Northridge2 </name> <segment> Northridge2 </segment> <author> Wald D. J.</author> <gml:lineStringProperty> <gml:LineString srsName="null"> <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates> </gml:LineString> </gml:lineStringProperty> </fault> </gml:featureMember>
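
A client consuming this feature might extract the fault name and coordinate pairs as shown below, using the standard library's `xml.etree.ElementTree`. This is a sketch under one labeled assumption: the slide's fragment omits the namespace declaration, so `xmlns:gml="http://www.opengis.net/gml"` has been added here to make the fragment parseable on its own.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumed; the slide omits the declaration

feature = f"""
<gml:featureMember xmlns:gml="{GML_NS}">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
""".strip()

root = ET.fromstring(feature)
fault = root.find("fault")
name = fault.findtext("name").strip()

# gml:coordinates holds whitespace-separated tuples with comma-separated values.
coords_text = fault.findtext(".//gml:coordinates", namespaces={"gml": GML_NS})
points = [tuple(float(v) for v in pair.split(",")) for pair in coords_text.split()]
```

After running, `name` is `"Northridge2"` and `points` holds the two end points of the fault segment as pairs of floats.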

  14. Electric Power and Natural Gas data from LANL Interdependent Critical Infrastructure Simulations [Screenshot controls: Zoom-in, Zoom-out, FeatureInfo mode, Measure distance mode, Clear Distance, Drag and Drop mode, Refresh to initial map]

  15. Typical use of Grid Messaging in NASA [Figure labels: Sensor Grid, GIS Grid, Grid Eventing, Datamining Grid]

  16. Typical use of Grid Messaging [Figure labels: Sensor Grid; Filter or Datamining; Post before Processing; Post after Processing; Web Feature Service; NaradaBrokering; Notify; Subscribe; WFS (GIS data); Grid Database Archives; HPSearch (manages); WS-Context (stores dynamic data); GIS Grid; Geographical Information System]

  17. Real Time GPS and Google Maps Subscribe to live GPS station. Position data from SOPAC is combined with Google map clients. Select and zoom to GPS station location, click icons for more information.

  18. Integrating Archived Web Feature Services and Google Maps. Google Maps can be integrated with Web Feature Service Archives to filter and browse seismic records.

  19. Google Maps as Service accessed from our WMS Client

  20. 3 XML Databases of Importance • WS-Context controlling a workflow • (Extended) UDDI supporting semantic service discovery • WFS or ASFS (see later) provides an application specific data/meta-data repository • These have different performance, scalability and data unit size requirements • In our implementation, each is currently “just an Oracle/MySQL” database front ended by filters that convert between XML (GML for WFS) and object-relational Schema • Example of Semantics (XML) versus representation (SQL) difference • OGSA-DAI offers a Grid interface to databases; we could use it but don’t, as we only need to expose WFS and not MySQL to the Grid
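
The "XML semantics over relational representation" idea can be sketched with a small filter that stores rows in SQL and answers queries in XML. This is an illustration under assumptions, not the authors' implementation: `sqlite3` stands in for Oracle/MySQL, and the `fault` table layout, `row_to_xml`, and `query_faults` are hypothetical names. The output is a simplified element rather than schema-valid GML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Representation: an ordinary relational table (SQLite used as a stand-in).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE fault (name TEXT, author TEXT, x1 REAL, y1 REAL, x2 REAL, y2 REAL)"
)
db.execute(
    "INSERT INTO fault VALUES (?, ?, ?, ?, ?, ?)",
    ("Northridge2", "Wald D. J.", -118.72, 34.243, -118.591, 34.176),
)


def row_to_xml(row: tuple) -> ET.Element:
    """Filter step: convert one relational row into an XML feature."""
    name, author, x1, y1, x2, y2 = row
    fault = ET.Element("fault")
    ET.SubElement(fault, "name").text = name
    ET.SubElement(fault, "author").text = author
    ET.SubElement(fault, "coordinates").text = f"{x1},{y1} {x2},{y2}"
    return fault


def query_faults(author: str) -> list:
    """Semantics: callers see XML only; the SQL stays behind the filter."""
    rows = db.execute(
        "SELECT name, author, x1, y1, x2, y2 FROM fault WHERE author = ?",
        (author,),
    )
    return [ET.tostring(row_to_xml(r), encoding="unicode") for r in rows]


xml_results = query_faults("Wald D. J.")
```

Swapping the storage engine would change only the body of `query_faults`, which is the separation of semantics from representation that the slide describes.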

  21. Information Management/Processing • SOAP messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information, so that the complete system provides: • Semantic Web technologies like RDF and OWL that help us have rich expressivity • Data → Information → Knowledge transformation • We build application specific information management/transformation systems (ASIS) for each application domain • One special domain is the system itself, where the metadata associated with services, sessions, Grids, messages, streams and workflow is itself managed and supported by an SIIS

  22. Generalizing a GIS • Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds • They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features • In SOA, GIS corresponds to a domain specific XML language and a suite of services for the different functions above • However, such a universal information model has not been developed in other areas, even though there are many fields in which it appears possible • BIS Biological Information System • MIS Military Information System • IRIS Information Retrieval Information System • PAIS Physics Analysis Information System • SIIS Service Infrastructure Information System

  23. ASIS Application Specific Information System I • a) Discovery capabilities that are best done using WS-* standards • b) Domain specific metadata and data including search/store/access interface (cf WFS). Let’s call the generalization ASFS (Application Specific Feature Service) • Language to express domain specific features (cf GML). Let’s call this ASL (Application Specific Language) • Tools to manipulate information expressed in the language and key data of the application (cf coordinate transformations). Let’s call this ASTT (Application Specific Tools and Transformations) • ASL must support data sources such as sensors (cf OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support of streams of data • Queries need to support archived (find all relevant data in the past) and streaming (find all data in the future with given properties) modes • Note all AS Services behave like Sensors and all sensors are wrapped as services • Any domain will have “raw data” (binary) and data that has been filtered to ASL. Let’s call the former ASBD (Application Specific Binary Data)
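
The two query modes named on this slide, archived (match data already stored) and streaming (match data that arrives later), can be sketched with one small class. The `FeatureService` name and its methods are hypothetical and chosen only for this illustration; a real ASFS would sit behind Web service interfaces and persistent storage.

```python
from typing import Callable, Dict, List, Tuple

Feature = Dict[str, float]
Predicate = Callable[[Feature], bool]
Callback = Callable[[Feature], None]


class FeatureService:
    """Hypothetical feature service supporting archived and streaming queries."""

    def __init__(self) -> None:
        self._archive: List[Feature] = []
        self._standing: List[Tuple[Predicate, Callback]] = []

    def query_archived(self, predicate: Predicate) -> List[Feature]:
        """Find all matching data in the past."""
        return [f for f in self._archive if predicate(f)]

    def query_streaming(self, predicate: Predicate, on_match: Callback) -> None:
        """Register a standing query for matching data in the future."""
        self._standing.append((predicate, on_match))

    def ingest(self, feature: Feature) -> None:
        """A sensor (or any service acting like one) delivers a new feature."""
        self._archive.append(feature)
        for predicate, on_match in self._standing:
            if predicate(feature):
                on_match(feature)


def strong(feature: Feature) -> bool:
    return feature["magnitude"] >= 4.0


service = FeatureService()
service.ingest({"magnitude": 5.1})

past = service.query_archived(strong)          # answers from stored data
future: List[Feature] = []
service.query_streaming(strong, future.append)  # answers as data arrives

service.ingest({"magnitude": 2.0})
service.ingest({"magnitude": 6.0})
```

The same predicate serves both modes; only the time direction differs, which is why the slide treats all AS services as behaving like sensors.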

  24. ASIS Application Specific Information System II [Figure labels: Filter, Transformation, Reasoning, Data-mining, Analysis; AS Repository; AS Tool (generic); AS Service (user defined); AS Tool (generic); ASVS Display; AS “Sensor”; Messages using ASL] • Let’s call this ASVS (Application Specific Visualization Services), generalizing WMS for GIS • The ASVS should both visualize information and provide a way of navigating (cf GetFeatureInfo) the database (the ASFS) • The ASVS can itself be federated and presents an ASFS output interface • d) There should be an application service interface for ASIS from which all ASIS services inherit • e) There will be other user services interfacing to ASIS • All user and system services will input and output data in ASL, using filters to cope with ASBD

  25. Military Information Management System: Everything Is a Service or a Message/Information Nugget [Figure labels: Directly GS-*, WS-*; Filters/ASTT; ASVS]

  26. MIO or Military Information Object: Unit of Managed Information expressed in ASL [Figure labels: ASFS; OGSA-DAI and Sensor Standards; Info-D; WS-Notification; WS-Eventing]

  27. Two-level Programming I [Figure labels: Service, Data] • The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model • We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies • C++, Java or Fortran Monte Carlo module • Data streaming from a sensor or satellite • Specialized (JDBC) database access • Such services accept and produce data from users, files and databases • The Grid is built by coordinating such services, assuming we have solved the problem of programming the service

  28. Two-level Programming II [Figure labels: Service1, Service2, Service3, Service4] • The Grid is discussing the composition of distributed services with the runtime interfaces to the Grid, as opposed to UNIX pipes/data streams • Familiar from use of UNIX Shell, Perl or Python scripts to produce real applications from core programs • Such interpretative environments are the single processor analog of Grid Programming • Some projects like GrADS from Rice University are looking at integration between service and composition levels, but the dominant effort looks at each level separately

  29. 3 Layer Programming Model [Figure labels: Web Service 1, WS 2, WS N-1, Web Service N] • Level 1: Programming inside services; application expressed in Java, Fortran, C++, MPI etc.; WS-* Infrastructure • Level 2: Programming choosing services by virtualization; Application Semantics (Metadata, Ontology); Semantic Grid • Level 3: Grid Programming composing multiple services; Service Workflow, Transactions, Mediation • Substantial work in UK e-Science program, international semantic web community

  30. Consequences of Rule of the Millisecond [Figure label: Classic Programming] • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency • 2b) 0.001 to 0.01 ms – Overhead of a Method Call • 3) 1 ms – wake up a thread or process (either?) • 4) 10 to 1000 ms – Internet delay: Workflow • So use pointers and the computer memory system when latencies are ≤ 1 millisecond, but use a URI looked up in a context store when longer delays are allowed • Transfer data when read-only and long latency allowed • Always choose the slowest allowed methodology and remember, when in doubt, that Moore’s law favors computer performance and that systems always get more complex and harder to maintain.
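
The rule on this slide amounts to a lookup against a latency budget. The sketch below encodes the slide's time scales and its 1 ms threshold; the function names and the return strings are assumptions made for illustration, and the ranges are the slide's 2005 figures, not measurements.

```python
from typing import List, Tuple

# (mechanism, low ms, high ms), taken from the time scales listed on the slide.
TIME_SCALES_MS: List[Tuple[str, float, float]] = [
    ("CPU calculation", 0.000001, 0.000001),
    ("MPI latency (parallel computing)", 0.001, 0.01),
    ("Method call overhead", 0.001, 0.01),
    ("Wake up a thread or process", 1.0, 1.0),
    ("Internet delay (workflow)", 10.0, 1000.0),
]


def reference_style(allowed_latency_ms: float) -> str:
    """Apply the slide's threshold: pointers at <= 1 ms, URIs beyond that."""
    if allowed_latency_ms <= 1.0:
        return "in-memory pointer"
    return "URI resolved through a context store"


def feasible_mechanisms(allowed_latency_ms: float) -> List[str]:
    """List mechanisms whose worst-case time fits within the latency budget."""
    return [name for name, _low, high in TIME_SCALES_MS
            if high <= allowed_latency_ms]


tight = feasible_mechanisms(0.5)      # excludes thread wake-up and Internet hops
loose = reference_style(50.0)         # a workflow-scale budget
```

Following the slide's advice to choose the slowest allowed methodology, a designer would pick the last entry that `feasible_mechanisms` returns rather than the first.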

  31. GlobalMMCS Web Service Architecture [Figure labels: Session Server; XGSP-based Control; Media Servers; Filters; NaradaBrokering (all messaging); Admire; SIP; H323; Access Grid; Native XGSP; Web Services] • Use multiple Media Servers to scale to many codecs and many versions of audio/video mixing • High Performance (RTP) and XML/SOAP and .. • NB scales as distributed • Gateways convert to uniform XGSP Messaging

  32. GlobalMMCS Architecture [Figure labels: Audio Video Web Service; Instant Messaging Web Service; Shared Display Web Service; Shared …. Web Service; XGSP Conference Control Service; Event Messaging Service (NaradaBrokering)] • Non-WS collaboration control protocols are “gatewayed” to XGSP • NaradaBrokering supports TCP (chat, control, shared display, PowerPoint etc.) and UDP (Audio-Video conferencing)

  33. XGSP Example: New Session <CreateAppSession> <ConferenceID> GameRoom </ConferenceID> <ApplicationID> chess </ApplicationID> <AppSessionID> chess-0 </AppSessionID> <AppSession-Creator> John </AppSession-Creator> <Private> false </Private> </CreateAppSession> <SetAppRole> <AppSessionID> chess-0 </AppSessionID> <UserID> Bob </UserID> <RoleDescription> black </RoleDescription> </SetAppRole> <SetAppRole> <AppSessionID> chess-0 </AppSessionID> <UserID> Jack </UserID> <RoleDescription> white </RoleDescription> </SetAppRole>
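
Messages of the shape shown on this slide can be generated and read back with `xml.etree.ElementTree`. The element names are copied from the slide; the helper functions `create_app_session` and `set_app_role` are hypothetical conveniences for this sketch and are not part of any XGSP library.

```python
import xml.etree.ElementTree as ET


def create_app_session(conference_id: str, application_id: str,
                       session_id: str, creator: str,
                       private: bool = False) -> ET.Element:
    """Build a CreateAppSession message with the fields shown on the slide."""
    root = ET.Element("CreateAppSession")
    fields = (
        ("ConferenceID", conference_id),
        ("ApplicationID", application_id),
        ("AppSessionID", session_id),
        ("AppSession-Creator", creator),
        ("Private", str(private).lower()),
    )
    for tag, value in fields:
        ET.SubElement(root, tag).text = value
    return root


def set_app_role(session_id: str, user_id: str, role: str) -> ET.Element:
    """Build a SetAppRole message assigning a role to a user in a session."""
    root = ET.Element("SetAppRole")
    ET.SubElement(root, "AppSessionID").text = session_id
    ET.SubElement(root, "UserID").text = user_id
    ET.SubElement(root, "RoleDescription").text = role
    return root


messages = [
    create_app_session("GameRoom", "chess", "chess-0", "John"),
    set_app_role("chess-0", "Bob", "black"),
    set_app_role("chess-0", "Jack", "white"),
]

# A receiver can recover the role assignments from the message stream.
roles = {m.findtext("UserID"): m.findtext("RoleDescription")
         for m in messages if m.tag == "SetAppRole"}
wire = ET.tostring(messages[0], encoding="unicode")
```

Here `wire` is the serialized session request and `roles` maps each user to a chess colour, matching the example session on the slide.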

  34. XGSP AV Signaling Protocol with H.323 [Figure: message sequence among H323 Terminal, H323 Gateway, and Session Server. H.225 Call Setup: H225.Setup, JoinAVSession, JoinAVSession OK, H225.Connect with the RTPLinks <IP Addr, Port>. H.245 Terminal Capability Set Exchange: Terminal Capability Set and capability description, ACK. Open Audio and Video Logic Channels: OpenLogicChannel (Video), ACK with video RTPLink <IP Addr, Port>, JoinAVSession (Video); OpenLogicChannel (Audio), ACK with audio RTPLink <IP Addr, Port>, JoinAVSession (Audio).]

  35. NaradaBrokering 2003-2006 • Messaging infrastructure for collaboration, peer-to-peer and Grids • Implements JMS and native high-performance protocols (message transit time of 1 to 2 ms per hop) • Order-preserving message transport with QoS and security profiles • Support for different underlying transports such as TCP, UDP, Multicast, RTP • SOAP message support and WS-Eventing, WS-RM and WS-Reliability • WS-Notification when specification agreed • Active replay support: Pause and Replay live streams • Stream Linkage: can permanently link multiple streams; used in annotation of real-time video streams • Replicated storage support for fault tolerance and resiliency to storage failures • Management: HPSearch Scripting Interface to streams and brokers (uses WS-Management) • Broker Topics and Message Discovery: Locate appropriate • Integration with Axis2 Web Service Container (?) • High Performance Transport supporting SOAP Infoset

  36. Average Video Delays for one broker – Performance scales proportional to number of brokers [Figure: latency (ms) versus number of receivers at 30 frames/sec, with curves for one session and multiple sessions]

  37. Collaboration Grid [Figure labels: Gateways; XGSP Media Service; WS-Context; NaradaBroker; Audio Mixer; Video Mixer; Transcoder; Thumbnail; Replay; Record; Annotate; HPSearch; UDDI; WS-Security; SharedWS; SharedDisplay; WhiteBoard]

  38. GlobalMMCS SWT Client [Screenshot labels: GIS, TV, Chat, Video Mixer, Webcam]

  39. [Figure: screenshot with panels labeled e-Annotation player, Annotation / WB player, whiteboard player, archived stream player, annotated stream player, real time stream player, archived stream list, and real time stream list]

  40. Location of software for Grid Projects in Community Grids Laboratory • http://www.naradabrokering.org provides Web service (and JMS) compliant distributed publish-subscribe messaging (software overlay network) • http://www.globalmmcs.org is a service oriented (Grid) collaboration environment (audio-video conferencing) • http://www.crisisgrid.org is an OGC (Open Geospatial Consortium) compliant Geographical Information System (GIS) and Sensor Grid (with POLIS center) • http://www.opengrids.org has WS-Context, Extended UDDI etc. • The work is still in progress but the core part of NaradaBrokering is quite mature • All software is open source and freely available

  41. Summary • Virtualization everywhere • Focus on semantics, not representation, to get performance combined with expressivity for transport and data access • All this enabled by powerful meta-data services • Grids add management to a rich but potentially chaotic set of Web Services; management and coherence are enabled by meta-data • Can define general information architectures (ASIS, GIS, SIIS) for both applications and system • Knowledge from filters that span simulations, data-mining, reasoning and agents • A service is just a special case of a Grid • Build systems from SubGrids (Gridlets)
