
“Workflow” in Data Access and Integration: An OGSA-DAI/DAIS Perspective

Mario Antonioletti, EPCC, mario@epcc.ed.ac.uk


Presentation Transcript


  1. “Workflow” in Data Access and Integration: An OGSA-DAI/DAIS Perspective
     Mario Antonioletti, EPCC, mario@epcc.ed.ac.uk

  2. Talk Overview
     • Background: OGSA-DAI and DAIS
     • Motivation and Definitions
     • Hierarchies of Service Coordination
     • Conclusions
     e-Science Workflow Services - www.ogsadai.org.uk

  3. OGSA-DAI and DAIS
     • GGF DAIS WG
       • Database Access and Integration Services
       • Attempting to standardise interfaces based on OGSI
     • OGSA-DAI
       • Aims to provide an implementation of DAIS
       • Serves the UK e-Science community
     • OGSA-DAI and DAIS
       • Currently not aligned
       • Data service interface in OGSA-DAI is coarse grained
         • Based on an earlier version of DAIS
       • Data service interface in DAIS is currently fine grained
         • Scope for more coarse-grained interfaces
       • OGSA-DAI will realign with DAIS once the latter stabilises

  4. OGSA-DAI Project Partners
     Powered by …

  5. Simple Data Service Scenario
     [Diagram labels: Client, Data Service, Data Resource (×3)]
     1. Provides access to a data resource.
     2. May provide integration of several data resources.

  6. Some Definitions
     • Data Resource
       • An object that can source/sink data
       • Currently databases are in scope
       • Files and file systems may come into scope
     • Data Services
       • Grid services
       • Provide a common interface to data resources
       • Expose some capabilities of a data resource
         • SQL queries, XPath, BinX, …
       • Can also provide additional capabilities
         • Transformations, third-party data delivery, etc.

  7. Motivation
     • Want common interfaces for:
       • Data access
       • Data integration
     • Requests to a data service may produce lots of data
       • Want to minimise data movement
     • Hence encapsulate interactions with the service
       • Serialise multiple interactions into one interaction
       • Abstract each interaction into an “activity”
       • Data flows between activities
       • Use a document mechanism to describe this
     • DAIS and OGSA-DAI
       • Concerned with data flow
       • Currently do not have control constructs
         • No looping, conditionals, splits, joins, …
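The idea of serialising several interactions into one request can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDataService`, the dictionary-based activity format, and the stream names are all hypothetical and are not the OGSA-DAI API; an in-memory SQLite table stands in for the data resource. The point is that the intermediate result stays inside the service and only the final, compressed output travels back to the client.

```python
import sqlite3
import zlib

# Toy stand-in for a data resource: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(9, "toad"), (10, "frog")])


class ToyDataService:
    """Hypothetical service that runs a whole list of activities per request."""

    def __init__(self, connection):
        self.connection = connection
        self.requests_handled = 0

    def perform(self, activities):
        self.requests_handled += 1
        streams = {}  # named intermediate results; they never leave the service
        for act in activities:
            kind = act["type"]
            if kind == "query":
                rows = self.connection.execute(act["expression"]).fetchall()
                text = "\n".join(",".join(str(v) for v in row) for row in rows)
                streams[act["output"]] = text
            elif kind == "compress":
                data = streams[act["input"]].encode("utf-8")
                streams[act["output"]] = zlib.compress(data)
            elif kind == "deliver":
                # Only the delivered stream is returned to the client.
                return streams[act["input"]]
            else:
                raise ValueError(f"unknown activity type: {kind}")
        return None


service = ToyDataService(conn)
response = service.perform([
    {"type": "query", "expression": "select * from myTable where id=10", "output": "rows"},
    {"type": "compress", "input": "rows", "output": "packed"},
    {"type": "deliver", "input": "packed"},
])
print(service.requests_handled, zlib.decompress(response).decode("utf-8"))
```

Three activities are handled in a single request, and there are no control constructs: the list is executed strictly in sequence, which mirrors the data-flow-only scope described above.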

  8. Service Coordination Patterns
     [Diagram labels: Client, Data Service, Service]
     1. Coordination of activities performed at one Data Service.
     2. Client choreographs a set of services to work together …
        … or a service may orchestrate on behalf of the client.
     3. Orchestration of services using a document directed to one service.
     4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …

  9. Coordination Hierarchies
     • Service coordination may take place:
       • Intra service
         • Document based
       • Inter service – application driven
         • Choreographed/orchestrated by a client or service
       • Inter service – document driven
         • Orchestration
         • Ideally would look the same as the intra-service document-based interface
       • Combined with other workflow languages

  10. Intra Service Processing
     • Service processing described by a document
     • Possible activities (OGSA-DAI perspective):
       • Statement
         • SQL query, XPath query
       • Delivery
         • Input data from a third party
         • Output data to a third party
         • Deliver data in the response
       • Transformations
         • XSL transformations, compression
     • OGSA-DAI has produced a framework for this

  11. Simple Example: no data flow

     sqlQueryStatement:

       <sqlQueryStatement name="statement">
         <expression>
           select * from myTable where id=10
         </expression>
       </sqlQueryStatement>

     DeliverToURL:

       <deliverToURL name="deliverOutput">
         <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
       </deliverToURL>

  12. Simple Example: with data flow

     sqlQueryStatement:

       <sqlQueryStatement name="statement">
         <expression>
           select * from myTable where id=10
         </expression>
         <resultSetStream name="output1"/>
       </sqlQueryStatement>

     DeliverToURL:

       <deliverToURL name="deliverOutput">
         <fromLocal from="output1"/>
         <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
       </deliverToURL>
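The link between the two activities is carried purely by the shared stream name `output1`. A minimal sketch of how a reader might recover that data-flow edge from the fragment is given below. It rests on assumptions that are not taken from the OGSA-DAI schema: the `<perform>` wrapper element is hypothetical (added only so the fragment has a single root), and the rule "a child with a `from` attribute is an input, a child with a `name` attribute is an output" is inferred from this one example.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical <perform> root element.
FRAGMENT = """
<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output1"/>
    <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
  </deliverToURL>
</perform>
"""


def data_flow_edges(xml_text):
    """Return (producer activity, stream name, consumer activity) triples."""
    root = ET.fromstring(xml_text)
    producers = {}   # stream name -> name of the activity that writes it
    consumers = []   # (stream name, name of the activity that reads it)
    for activity in root:
        for child in activity:
            if "from" in child.attrib:        # assumed: marks an input
                consumers.append((child.attrib["from"], activity.attrib["name"]))
            elif "name" in child.attrib:      # assumed: marks an output
                producers[child.attrib["name"]] = activity.attrib["name"]
    return [(producers[s], s, c) for s, c in consumers if s in producers]


edges = data_flow_edges(FRAGMENT)
print(edges)
```

Running the same function on the "no data flow" example would return an empty list, since neither activity names a stream.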

  13. The Perform Document

       <?xml version="1.0" encoding="UTF-8"?>
       <gridDataServicePerform
           xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
                               ../../../../schema/ogsadai/xsd/activities/activities.xsd">
         <documentation>
           This example performs a simple select statement to retrieve one row
           from the test database. The results are delivered to an FTP URL.
         </documentation>
         <sqlQueryStatement name="statement">
           <expression>
             select * from littleblackbook where id=10
           </expression>
           <resultSetStream name="output"/>
         </sqlQueryStatement>
         <deliverToURL name="deliverOutput">
           <fromLocal from="output"/>
           <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
         </deliverToURL>
       </gridDataServicePerform>
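A client would typically generate such a document rather than write it by hand. The sketch below builds an equivalent document with Python's standard `xml.etree.ElementTree`. The element names and the namespace URI are copied from the slide; the helper `build_perform_document` and its parameters are hypothetical, and the `xsi:schemaLocation` attribute and `<documentation>` element are left out for brevity. It is not a client for any real OGSA-DAI release.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the perform document on the slide.
GDS_NS = "http://ogsadai.org.uk/namespaces/2003/07/gds/types"


def build_perform_document(sql, url, stream="output"):
    """Build a query-then-deliver perform document as an XML string."""
    ET.register_namespace("", GDS_NS)  # serialise GDS_NS as the default namespace

    def q(tag):
        return f"{{{GDS_NS}}}{tag}"

    root = ET.Element(q("gridDataServicePerform"))

    statement = ET.SubElement(root, q("sqlQueryStatement"), {"name": "statement"})
    ET.SubElement(statement, q("expression")).text = sql
    ET.SubElement(statement, q("resultSetStream"), {"name": stream})

    deliver = ET.SubElement(root, q("deliverToURL"), {"name": "deliverOutput"})
    # The same stream name ties the delivery activity to the query's output.
    ET.SubElement(deliver, q("fromLocal"), {"from": stream})
    ET.SubElement(deliver, q("toURL")).text = url

    return ET.tostring(root, encoding="unicode")


doc = build_perform_document(
    "select * from littleblackbook where id=10",
    "ftp://anon:frog@ftp.example.com/home",
)
print(doc)
```

Passing the stream name once and using it for both `resultSetStream` and `fromLocal` avoids the mismatch that is easy to introduce when editing the XML manually.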

  14. Predefined Building Blocks
     • DeliverFromGDT, DeliverToGDT
     • DeliverFromGFTP, DeliverToGFTP
     • DeliverFromURL, DeliverToURL
     • DeliverToStream, inputStream, outputStream
     • xslTransform, zipArchive, gzipCompression
     • xmlCollectionManagement, xmlResourceManagement, relationalResourceManager
     • sqlQueryStatement, sqlUpdateStatement, sqlStoredProcedure, sqlBulkLoadRowset
     • xPathStatement, xQueryStatement, xUpdateStatement

  15. Activities: positives
     • Simple sequence pattern
       • Data-flow
     • Avoid multiple message exchanges
     • Minimise data movement
     • Extensible
       • XML Schema excerpt gives syntax
       • Associate an implementation with activity
         • Done at configuration
     • Allows optimisation
       • Enactment engine can optimise interaction

  16. Activities: negatives
     • Incomplete syntax
       • Activity inputs and outputs are not typed
       • No typing of data streams
       • Possible issue in coming up with a sensible document
     • Activity implementation & XML schema loosely coupled
       • Keeping activity and implementation in sync
       • Semantics are not specified
     • Puts workload on the server
       • Workloads on the server may need to be managed
     • Activities not exposed at the interface level
       • This may change in line with DAIS
     • Perform document factored out from DAIS base specs
       • Standardisation to become a DAIS informational document
       • Scope may be bigger than DAIS
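To make the "no typing of data streams" point concrete, the sketch below shows what a typed check could look like if activities declared what they produce and consume. Everything here is hypothetical: the `Activity` record, the type labels `"xml"` and `"bytes"`, and the checker are illustrations of the missing feature, not part of OGSA-DAI or DAIS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Activity:
    name: str
    produces: Optional[str] = None    # hypothetical stream type label
    consumes: Optional[str] = None    # hypothetical stream type label
    input_from: Optional[str] = None  # name of the upstream activity, if any


def check_stream_types(activities: List[Activity]) -> List[str]:
    """Report connections whose producer and consumer types disagree."""
    by_name = {a.name: a for a in activities}
    errors = []
    for act in activities:
        if act.input_from is None:
            continue
        upstream = by_name.get(act.input_from)
        if upstream is None:
            errors.append(f"{act.name}: unknown input '{act.input_from}'")
        elif upstream.produces != act.consumes:
            errors.append(
                f"{act.name}: expects {act.consumes}, "
                f"but {upstream.name} produces {upstream.produces}"
            )
    return errors


# A document that compresses first and then tries to apply an XSL transform.
pipeline = [
    Activity("statement", produces="xml"),
    Activity("compress", produces="bytes", consumes="xml", input_from="statement"),
    Activity("transform", produces="xml", consumes="bytes-free-xml" and "xml",
             input_from="compress"),
]
errors = check_stream_types(pipeline)
print(errors)
```

Without declared types, a document like this one is syntactically acceptable and the mistake only surfaces when the service tries to run it, which is the "sensible document" problem noted above.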

  17. Inter Service Application Defined "Workflow"
     • Services stitched together by an application
       • Could be a client
         • Use the OGSA-DAI GridDataTransport (GDT) portType
       • Could be another service
         • Distributed Query Processing (DQP)
     • Services configured separately
       • Each performs its part in the workflow

  18. Client Driven Scenario (aka poor man's data integration)
     [Diagram labels: Client, Data Service (×2), GDT]
     Client creates Data Services.

       <sqlQueryStatement> … </sqlQueryStatement>
       <deliverToGDT … />

       <inputStream … />
       <sqlUpdateStatement> … </sqlUpdateStatement>

  19. Service Driven Scenario: Distributed Query Processing
     [Diagram labels: Client, GDQS, GQES (×3)]
     • GDQS: query planning, compilation, scheduling, evaluation, partitioning
     • GQES: evaluate sub-queries

  20. More Complex DQP Scenario

  21. Application Driven "Workflow"
     • Labour intensive
     • Client driven (service choreography)
       • Restricted to small numbers of services
       • Need tooling
         • Even then this is best done through other means
     • Service driven (service orchestration)
       • DQP hides details
       • There may be other examples …
         • Need to explore this space further
     • Can probably accommodate these patterns in an existing workflow language
     • For more general data integration need to:
       • Describe more sophisticated behaviour

  22. Inter Service Document Coordination
     • Currently evolving
     • Document describes:
       • Sequence of operations that may span multiple services
     • Single document includes enough information to:
       • Run an expression on a source data service
       • Deliver the results to a target data service
       • Run an expression on the target data service
     • Informational document to be presented at GGF10
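The three steps a single document has to describe can be sketched as follows. This is a toy model only: the `CoordinationDocument` fields, the `enact` function, and the use of a parameterised insert as the "delivery" step are assumptions made for illustration, and two in-memory SQLite databases stand in for the source and target data services. The actual document format was still evolving and is not reproduced here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CoordinationDocument:
    """Hypothetical single document spanning two data services."""
    source: str               # key of the source data service
    source_expression: str    # step 1: run on the source
    target: str               # key of the target data service
    load_expression: str      # step 2: how delivered rows land on the target
    target_expression: str    # step 3: run on the target


def enact(doc, services):
    """Run the three steps in sequence and return the final result."""
    source, target = services[doc.source], services[doc.target]
    rows = source.execute(doc.source_expression).fetchall()
    target.executemany(doc.load_expression, rows)
    return target.execute(doc.target_expression).fetchall()


# Two in-memory databases stand in for two data services.
service_a = sqlite3.connect(":memory:")
service_a.execute("CREATE TABLE people (id INTEGER, name TEXT)")
service_a.executemany(
    "INSERT INTO people VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cat")]
)
service_b = sqlite3.connect(":memory:")
service_b.execute("CREATE TABLE staging (id INTEGER, name TEXT)")

document = CoordinationDocument(
    source="A",
    source_expression="SELECT id, name FROM people WHERE id >= 2",
    target="B",
    load_expression="INSERT INTO staging VALUES (?, ?)",
    target_expression="SELECT COUNT(*) FROM staging",
)
result = enact(document, {"A": service_a, "B": service_b})
print(result)
```

In this sketch the enactment happens in one place; in the document-driven pattern the document would be directed to one service, which would then carry out the delivery to the other.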

  23. A Dataset Example
     [Diagram labels: Client, Data Service (×2)]

     RemoteRequiredTableDataAccessRecipe.xsd:

       <dar>
         <gsh> … </gsh>
         <type> … </type>
         <dataSet> … </dataSet>
       </dar>

     RequestDataRequest.xsd:

       <dataRequest> … </dataRequest>

  24. Document Driven "Workflow"
     • Work in this area is tentative
       • No implementations as yet
       • OGSA-DAI needs to see how it matures
     • Shows versatility
       • Carries over some of the OGSA-DAI activity framework
     • Focused on data
       • Can track provenance in the dataSet
     • Needs to be positioned against general workflow languages

  25. Traditional Workflow
     • OGSA-DAI has not explored this space … yet
       • May need such a framework to facilitate data integration
     • Traditionally, workflow:
       • Revolves around the execution of atomic activities
       • Uses a processing model, e.g. WfMC based
       • Is akin to how people talk about service orchestration
     • Want to use existing frameworks as far as possible
       • OGSA-DAI does not want to define its own workflow
       • DAIS may come up with something
     • Clearly:
       • Activity model can be used to implement a workflow
     • Collecting use cases

  26. Workflow Issues
     • OGSA-DAI needs to play to see what works
     • Standards still evolving
     • IP rights:
       • BPEL4WS
         • Royalty-free … ?
       • WSCI
         • Royalty-free
     • Need workflow engines
     • Tooling to construct workflow
       • Ptolemy II … Triana … ?

  27. Summary & Conclusions
     • Base standards in a state of flux
       • DAIS has not settled down yet
       • If you don't like what you see, get involved and change it
       • Document-based interface needs to be re-worked
     • OGSA-DAI implemented simple "workflow" patterns
       • Successful for data access
     • Shied away from real workflow
       • Should try to use emerging standards if possible
     • Data integration will require workflow patterns
       • Need to examine use cases
     • Positioning of OGSA-DAI
       • Want it to be the leaves of your complex workflow graphs
       • Wrap your data sources and sinks
     • Try OGSA-DAI and give feedback!

  28. Further information
     • The OGSA-DAI Project Site:
       • http://www.ogsadai.org.uk
     • The DAIS-WG site:
       • http://cs.man.ac.uk/grid-db
     • OGSA-DAI Users Mailing list
       • users@ogsadai.org.uk
       • General discussion on grid DAI matters
     • Formal support for OGSA-DAI releases
       • http://www.ogsadai.org.uk/support
       • support@ogsadai.org.uk
     • OGSA-DAI training courses
