Under the Hood of a Workflow Manager

Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July.

Presentation Transcript


  1. Under the Hood of a Workflow Manager • Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July

  2. Outline • What is Workflow management? • Why should I care? • Current State of the Art • Workflow Languages • Other Projects • Triana, Architecture & Services • Extending Triana for BDWorld • Conclusion

  3. What is Workflow Management? • Concept comes from business world • Many years of research and practice • Process capture and reuse • Repeatability, provenance, audit trails & accountability • Domain expert knowledge capture • Analysis and optimization

  4. What Can a Workflow Manager do for Me? • Scientific Workflow different focus to business • Large-scale data collection • Querying • Analysis • Visualization • Similar goals • Component & workflow reuse • Knowledge capture • Additional goals • Simplified application/experiment design • Environment/Complexity abstraction

  5. State of the Art • Schedule workflow tasks (Grid/distributed environment) • Monitor/Control execution • Active visualization and computational steering • User interaction • Pause and restart • Data provenance • Component and sub-workflow reuse • Analysis and optimization

  6. Workflow Languages • No current agreed standard • Most projects use DAG or Petri-Net • Data vs control flow • Dependency vs scripting language • Many XML schema • Business workflow standards - BPEL • Not good enough fit • GGF WFM-RG • Attempting to solicit agreement on standards
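
To make the DAG-based, dependency-style description concrete, here is a minimal Python sketch. It is not taken from any of the projects named in these slides: the task names are hypothetical, and the ordering relies on the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "query": set(),
    "analyse": {"query"},
    "visualise": {"analyse"},
    "archive": {"query"},
}


def execution_order(dag):
    """Return one valid execution order for a dependency graph.

    Raises CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A dependency description like this only says what must finish before what; a scripting-style language would instead spell out the sequence of steps explicitly.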

  7. Workflow Management Projects • myGrid/Taverna - Southampton & others • XML/DAG based workflow language • Initially WS choreography tool - now incorporates local tools/components • Grid integration with databases via OGSA Distributed Query Processor • myGrid Project main users - Bioinformatics • Kepler - SDSC • Based on Ptolemy - modeling, simulation & design of real time & concurrent systems • Concurrent dataflow • Actors (components), Directors (workflow engines) • Local, Web Service & Grid Service actors • Ecology, biology, chemistry, oceanography, and the geosciences

  8. WM Projects 2 • Karajan/Commodity Grid (CoG) Kit, Argonne & Berkeley • Scripting workflow language for Grid tasks • Integration with Globus Toolkit GT3 & GT4 • Pure control flow • Data flow performed by data tasks - GridFTP • And many more…See • http://www.gridworkflow.org/snips/gridworkflow/ • http://www.extreme.indiana.edu/swf-survey/

  9. Triana • Cardiff University! PPARC funded • Java based Scientific Workflow Tool or PSE • Originally designed for Signal Processing • Now domain independent • Bioinformatics - obviously! • Signal Processing - gravitational wave detection & radio astronomy • Design optimisation • Data mining • Medical imaging • Distributed Audio Processing

  10. Triana Components • Local Java components • Service-oriented Components • Web services as components (WSRF coming soon) • Web service workflow • Peer 2 Peer services as components • Distributed service workflow • Grid-oriented Components • Grid file and job primitives as components • Complex Grid workflow • Legacy code components via GridMonSteer • Mix and Match composition

  11. Workflow • Inherently data flow based • control flow through “messages” • XML/DCG workflow format • Internally workflow language independent • Migration to standards based language • Simple Parent/Child relationship between tasks • Context based implied actions • Local file -> local file = file copy • Local file -> remote file = file transfer • Import/Export other workflow formats • Pegasus/EGEE read/write DAGMan format
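
The context-based implied actions can be illustrated with a small Python sketch. This is not Triana code: `FileRef` and `implied_action` are hypothetical names, and only the two cases listed on the slide are handled. Connections from a remote source are not covered by the slide, so the sketch refuses them rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    """Hypothetical description of a file endpoint in a workflow connection."""

    path: str
    host: str = "localhost"

    @property
    def is_local(self) -> bool:
        return self.host == "localhost"


def implied_action(source: FileRef, target: FileRef) -> str:
    """Pick the operation implied by connecting source to target."""
    if not source.is_local:
        # Assumption: the slide only describes connections from a local file.
        raise NotImplementedError("remote sources are not covered here")
    if target.is_local:
        return "file copy"      # local file -> local file
    return "file transfer"      # local file -> remote file


print(implied_action(FileRef("in.dat"), FileRef("out.dat")))
print(implied_action(FileRef("in.dat"), FileRef("out.dat", host="grid.example.org")))
```

The point of the design is that the user only draws the connection; the tool infers the operation from the types of the two endpoints.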

  12. Triana Architecture • A Graphical Grid Computing Environment or Portal • Service Based Computing (GAP Interface): deployment, discovery and communication with distributed services, e.g. P2P and (GSI) Web services • Grid Computing (GAT Interface): Job Submission, File services • [Diagram labels: Grid services; P2PS, P2PS Discovery, P2PS Pipes; JXTA, JXTA Discovery, JXTA Pipes; Web Services, UDDI, SOAP; WSRF, .NET; GridLab, GRMS, Globus, Condor, Unicore, GridFTP, RLS, PBS, SSH, SGE, LDR, Other..]

  13. Triana in a SO World • Service Discovery: Dynamic? Decentralized? • Communication: Message Format (SOAP?), Transport Protocol (TCP? UDP?) • [Diagram: an en_fr BabelFish service (babelfish.altavista.com) reached over the network through the GAP, turning "hello" into "bonjour"]

  14. GAP Interface • A Simple Service based API, for • Service Deployment, • Service Discovery • Pipe Based Communication • Static application interface with multiple middleware bindings • P2PS • JXTA • Web services • [Diagram: GAP Interface over P2PS (P2PS Discovery, P2PS Pipes), JXTA (JXTA Discovery, JXTA Pipes) and Web Services (UDDI, SOAP)]
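
The idea of a static application interface with interchangeable middleware bindings can be sketched as follows. This is an illustration in Python under stated assumptions, not the GAP API itself (which is Java): `ServiceBinding`, `InMemoryBinding` and `translate_hello` are hypothetical names, and the in-memory binding merely stands in for a real P2PS, JXTA or Web services binding.

```python
from abc import ABC, abstractmethod


class ServiceBinding(ABC):
    """Hypothetical middleware binding: deploy, discover, communicate."""

    @abstractmethod
    def deploy(self, name, handler):
        """Make a service available under the given name."""

    @abstractmethod
    def discover(self, name):
        """Return True if a service with this name can be found."""

    @abstractmethod
    def send(self, name, message):
        """Send a message to the named service and return its reply."""


class InMemoryBinding(ServiceBinding):
    """Stand-in for a real binding; services live in a local dict."""

    def __init__(self):
        self._services = {}

    def deploy(self, name, handler):
        self._services[name] = handler

    def discover(self, name):
        return name in self._services

    def send(self, name, message):
        return self._services[name](message)


def translate_hello(binding: ServiceBinding) -> str:
    """Application code that depends only on the interface, not the binding."""
    binding.deploy("en_fr", lambda text: {"hello": "bonjour"}.get(text, text))
    if not binding.discover("en_fr"):
        raise LookupError("en_fr service not found")
    return binding.send("en_fr", "hello")


print(translate_hello(InMemoryBinding()))
```

Swapping the middleware then means passing a different `ServiceBinding` subclass; `translate_hello` itself does not change.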

  15. WSPeer • High Level Interface to Web Services • Discovery • Invocation • Deployment • Hosting • Abstract from usual Web Service Discovery and Communication Mechanisms (i.e. UDDI and HTTP) • P2PS Web Service Discovery? • Uses Apache AXIS as SOAP Engine • Extends Capabilities of Apache AXIS • Stubless Invocation (including complex types) • Non Standard Transports (i.e. P2PS)

  16. [Diagrams: WSPeer - HTTP/UDDI and WSPeer - P2PS. Labels: Application, WSPeer, HTTP Server, UDDI; operations: deploy, publish, locate, invoke, launch server]

  17. Extending Triana for BDWorld • BDWorld proxy components talk to Web Services • Workflow Design Assistant (WfDA) • selection and composition of BDWorld workflows from available services • Uses Meta Data Repository (MDR) & Meta Data Agent (MDA) • MDR contains mapping from proxies to resources • WfDA captures domain knowledge in constraints • Constraints used to limit the possible components at each stage of composition • Simplifies valid workflow creation
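
One way to picture how constraints limit the possible components at each stage is the Python sketch below. It is an assumption-laden illustration, not the WfDA implementation: the single constraint used here (a component's input type must match the previous component's output type) and every component and type name in the catalogue are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """Hypothetical workflow component with typed input and output."""

    name: str
    input_type: Optional[str]  # None marks a component that can start a workflow
    output_type: str


# Invented catalogue; real BDWorld services are not reproduced here.
catalogue = [
    Component("name_lookup", None, "species_name"),
    Component("occurrence_query", "species_name", "occurrences"),
    Component("climate_model", "occurrences", "model"),
    Component("map_viewer", "model", "image"),
]


def candidates(components, previous=None):
    """Components allowed at the next stage of composition.

    Constraint (assumed): the input type must equal the previous output type.
    """
    wanted = None if previous is None else previous.output_type
    return [c for c in components if c.input_type == wanted]


first = candidates(catalogue)
second = candidates(catalogue, first[0])
print([c.name for c in first], [c.name for c in second])
```

Because only valid successors are ever offered, a workflow assembled step by step this way is valid by construction.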

  18. Conclusion • A workflow manager should: • Simplify scientific experimentation • Enable reuse at multiple levels • Component • Sub-workflow/Compound components • Collaboration • Abstract component and environment complexities • Think of all components as a service that performs a known task • Implied/Context based operations - file copy/move • Put the scientist back in control of the science, not the computing
