
Trident

Scientific Workflow Workbench. eScience’08 Tutorial. Trident. Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson, Yogesh Simmhan, Catharine van Ingen, Nitin Gautam, Microsoft Research. Joby Thomas and the development team, Aditi Technologies. MSR (Trident) Summer ‘09 Interns. Eran Chinthaka


Presentation Transcript


  1. Scientific Workflow Workbench. eScience’08 Tutorial. Trident. Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson, Yogesh Simmhan, Catharine van Ingen, Nitin Gautam (Microsoft Research). Joby Thomas and the development team (Aditi Technologies)

  2. MSR (Trident) Summer ‘09 Interns • Eran Chinthaka, Indiana University • David Koop, University of Utah • Satya Sahoo, Wright State University • Matt Valerio, Ohio State University

  3. Overview of our presentation today • Technical Content • Introduction • Feature Overview and Logical Architecture • Deep(er) dive into select features with demos • Roadmap to delivery • Design Philosophy and Exit Strategy • Leverage COTS WFMS, build only what is required • Extensible and open, integrate with community tools • Drive development from actual eScience requirements • Deliver as open source accelerator to the community

  4. Ocean Observatories Initiative (OOI) • Formerly the NEPTUNE project • Workflow for ocean observatories, part of an “oceanographer’s workbench” • Jim Gray • Collaboration with Univ. of Washington & MBARI

  5. Pan-STARRS (Astronomy) • One of the largest visible light telescopes • Four unit telescopes acting as one • One gigapixel per telescope • Survey entire visible universe in 1 week • Catalog solar system, moving objects/asteroids • ps1sc.org: Univ. Hawaii, Johns Hopkins, … • Workflow Requirements • Load/Merge Databases • Execute on Clusters • Monitor workflow execution • Logging, Provenance, Faults

  6. Pan-STARRS Load & Merge Workflows • Load workflow: Start • Determine affine Slice Cold DB for CSV Batch • Sanity Check of Network Files, Manifest, Checksum • Create, Register empty LoadDB from template • For Each CSV File in Batch: Validate CSV File & Table Schema; BULK LOAD CSV File into Table; Perform CSV File/Table Validation • Perform LoadDB/Batch Validation • End • On fault: Detect Load Fault. Launch Recovery Operations. Notify Admin. • Merge workflow: Start • Determine ‘Merge Worthy’ Load DBs & Slice Cold DBs • UNION ALL over Slice & Load DBs into temp. Filter on partition bound. • For Each Partition in Slice Cold DB: Switch OUT Slice partition to temp; Switch IN temp to Slice partition; Post Partition Load Validation • Slice Column Recalculations & Updates • Post Slice Load Validation • End • On fault: Detect Merge Fault. Launch Recovery Operations. Notify Admin.
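
The load half of the slide above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Pan-STARRS implementation: an in-memory SQLite database stands in for the LoadDB, the table schema (`EXPECTED_COLUMNS`, `detections`) is invented, and "notify admin" is reduced to appending to a list. The slice lookup and the manifest/checksum check are left out.

```python
import csv
import io
import sqlite3
from typing import Dict, List

EXPECTED_COLUMNS = ["object_id", "ra", "dec"]   # hypothetical table schema


class LoadFault(Exception):
    """Raised when a validation step fails."""


def load_batch(batch: Dict[str, str], notifications: List[str]) -> int:
    """Load every CSV text in `batch` into a fresh load DB; return rows loaded."""
    db = sqlite3.connect(":memory:")             # stands in for the empty LoadDB
    db.execute("CREATE TABLE detections (object_id TEXT, ra REAL, dec REAL)")
    try:
        expected_total = 0
        for name, text in batch.items():
            rows = list(csv.reader(io.StringIO(text)))
            # Validate the CSV header against the table schema.
            if not rows or rows[0] != EXPECTED_COLUMNS:
                raise LoadFault(f"{name}: unexpected header")
            data = rows[1:]
            # Bulk load the file into the table.
            db.executemany("INSERT INTO detections VALUES (?, ?, ?)", data)
            expected_total += len(data)
            # Per-file validation: the table must hold every row seen so far.
            (count,) = db.execute("SELECT COUNT(*) FROM detections").fetchone()
            if count != expected_total:
                raise LoadFault(f"{name}: row count mismatch")
        db.commit()
        return expected_total
    except (LoadFault, sqlite3.Error) as fault:
        db.rollback()                                   # recovery: discard the partial load
        notifications.append(f"load fault: {fault}")    # stands in for notifying an admin
        return 0
    finally:
        db.close()


alerts: List[str] = []
good = {"a.csv": "object_id,ra,dec\n1,10.5,-3.2\n2,11.0,-3.1\n"}
bad = {"b.csv": "id,ra\n1,10.5\n"}
loaded = load_batch(good, alerts)
failed = load_batch(bad, alerts)
```

The fault path is kept in one `except` clause so that every validation failure takes the same route: recover first, then notify.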

  7. Trident Public Website • Accessible today: http://beta.research.microsoft.com/en-us/collaboration/tools/trident.aspx • From January ‘09: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx

  8. Logical Architecture • Features • Building on Windows Workflow

  9. Trident Logical Architecture Visualization Workflow Packages Community Design Management Studio Workbench Monitor Web Portal (myExperiment) Scientific Workflows Administration Archiving Desktop Registry Management Windows Workflow Foundation Browser Trident Runtime Services Publish-Subscribe Blackboard WF Execution Hosts Fault Tolerance Provenance HPC Scheduling Others Trident Registry Data Model (Data Agnostic Abstraction) Data Access SQL Server SSDS S3 Others

  10. Trident Features • Libraries of activities, services, and workflows • Prepackaged activities and workflows out of the box and custom libraries • Registry with rich sets of workflow meta data • Versions • Workflow packages • Social annotations • (myExperiment)

  11. Trident Features • Two programming interfaces to Trident • Use Visual Studio to develop custom activities and workflows and import them to Trident • Visually compose workflows • No programming or scripting is required • Drag and drop a workflow or an activity • Subsections

  12. Execution Service • Local or distributed execution of workflows • HPCS cluster • Cloud services • Interactive and non-interactive execution service • Publishes events to subscriber services, such as tracking, provenance, and monitoring.

  13. Workflow Monitoring • Remote and local monitoring • Workflow processing status • Input and output parameters • Data products • Performance

  14. Management Studio • Administration of workflows and workflow scheduling • Registry management • Monitoring

  15. What is Windows Workflow? • Part of Microsoft’s .NET Framework 3.0, 3.5, and upcoming 4.0 • Activities • Runtime • Tooling (diagram labels: Host Process (.exe, IIS, …); Workflow; Activity Library; WF Runtime; Extensions: Persistence, Tracking, …; Tooling: VS Debugger, VS Designer, Rehosted Designer)

  16. Windows Workflow Base Activity Library • Composite • Basic

  17. Workflow Authoring

  18. Trident Workflow Composer An End User Application for Editing, Executing, and Monitoring Scientific Workflows

  19. What Differentiates Scientific Workflow? • Composition goes through many iterations • Data flow is a first class citizen • Need an easy way to publish and share • Provenance • Runtime • Evolutionary • Adaptable to different computing environments

  20. Trident Workflow Composer Data Options & Sharing Workflow Library Composition Space Activity Library

  21. Composer Demo

  22. Trident Registry Flexible Data Store And Some More

  23. Trident Registry: Motivation. Why a new registry system? • Single “point of truth” of the system • Facilitates state synchronization actions • Catalog keeps track of computing resources and state • Flexible Storage • What is it? • Flexible store mechanism • Supports Microsoft and non-Microsoft store providers • Supports local, client-server and cloud architectures • Non-goals • Replacement for LINQ or ER Framework • Reference Catalog • Unified view of the resources • Stores references to internal and external resources • Flexible provider mechanism to abstract access to external resources

  24. Trident Registry: Registry Connections

  25. Trident Registry: Registry Management

  26. Trident Registry: Data Providers. Abstracting “What’s out there” • Storage providers • Provide abstraction over data structures stored in the backend • No assumptions on how data was stored and related • Implemented using “verb” and “subject” actions • “Store object user with these properties” • “Relate this user object with this service as its owner” • “Delete namespace object” • Data abstraction layer and code generation • C# generated code provides shield and programming API • C# code generator generates SQL catalog for a perfect data/code match
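
The verb-style actions quoted on this slide can be sketched in Python as follows. This is only an illustration of the idea, not Trident's provider API (which is .NET): the names `StorageProvider`, `InMemoryProvider`, `store`, `relate`, and `delete` are hypothetical, and a dictionary stands in for a real backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple


class StorageProvider(ABC):
    """Hypothetical provider interface built from verb-style actions."""

    @abstractmethod
    def store(self, kind: str, key: str, properties: Dict[str, str]) -> None: ...

    @abstractmethod
    def relate(self, source: str, relation: str, target: str) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryProvider(StorageProvider):
    """Toy backend; a real provider would talk to a database or cloud store."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[str, Dict[str, str]]] = {}
        self.relations: Set[Tuple[str, str, str]] = set()

    def store(self, kind, key, properties):
        self.objects[key] = (kind, dict(properties))

    def relate(self, source, relation, target):
        if source not in self.objects or target not in self.objects:
            raise KeyError("both objects must be stored before relating them")
        self.relations.add((source, relation, target))

    def delete(self, key):
        self.objects.pop(key, None)
        # Drop relations that mention the deleted object.
        self.relations = {r for r in self.relations if key not in (r[0], r[2])}


registry = InMemoryProvider()
# "Store object user with these properties"
registry.store("user", "alice", {"name": "Alice"})
registry.store("service", "tracker", {"url": "http://example.invalid"})
# "Relate this user object with this service as its owner"
registry.relate("alice", "owner-of", "tracker")
# "Delete namespace object"
registry.store("namespace", "scratch", {})
registry.delete("scratch")
```

Because callers only see the three verbs, swapping `InMemoryProvider` for another backend would not change the calling code, which is the point of the abstraction the slide describes.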

  27. Trident Registry: Data Providers. Abstracting “What’s out there” • Creating new providers • Why would I create a new storage provider? • Enable Trident to store / retrieve state from other platforms • Enable Trident to store / retrieve state on other systems • Enhance existing providers with new features and abstractions • What it takes to create a new provider • Create a new assembly (or add to an existing provider assembly) • Create a new class derived from Microsoft.Research.eResearch.Connection • Drop the new DLL into the Trident folder

  28. Creating a new Registry Provider DEMO

  29. Trident Registry: Storage vs References • Use Cases • Object Tracking • Data and Process Discovery • All workflow aspects are exposed in the storage schema • Allows rich query of data, activities, parameters, etc. • Data Providers • Abstraction layer to external references (similar to registry data storage) • Enables user applications to benefit from unified model • Simplifies development • Enables fault tolerance for external resource sources • Not every workflow needs to worry about these details • All data provider knowledge resides in the registry • Pluggable and flexible
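
One way to read the "data providers" bullets above is sketched below in Python. It is an assumption-laden illustration rather than Trident's design: the `ReferenceCatalog` class, the `scheme:location` reference format, and the fallback-to-next-location strategy are all invented here to show how a catalog could hide provider details and tolerate an unavailable source.

```python
from typing import Callable, Dict, List

Fetcher = Callable[[str], bytes]   # takes a location, returns the data


class ReferenceCatalog:
    """Hypothetical catalog: stores references and resolves them via data providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Fetcher] = {}
        self._references: Dict[str, List[str]] = {}   # name -> ["scheme:location", ...]

    def register_provider(self, scheme: str, fetch: Fetcher) -> None:
        self._providers[scheme] = fetch

    def add_reference(self, name: str, *uris: str) -> None:
        self._references[name] = list(uris)

    def resolve(self, name: str) -> bytes:
        """Try each stored location in order; fall back to the next on failure."""
        errors = []
        for uri in self._references[name]:
            scheme, _, location = uri.partition(":")
            try:
                return self._providers[scheme](location)
            except (KeyError, OSError) as exc:
                errors.append(f"{uri}: {exc!r}")
        raise LookupError(f"no provider could resolve {name!r}: {errors}")


def unavailable(location: str) -> bytes:
    raise OSError("primary store is offline")   # simulated outage


catalog = ReferenceCatalog()
catalog.register_provider("primary", unavailable)
catalog.register_provider("mirror", lambda location: f"data for {location}".encode())
catalog.add_reference("sst-grid", "primary:ocean/sst.nc", "mirror:ocean/sst.nc")

payload = catalog.resolve("sst-grid")
```

A workflow asks for `"sst-grid"` by name and never learns which provider served it, which matches the slide's point that not every workflow needs to worry about these details.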

  30. Trident Registry: Provider API • Managed (.NET) API • Library of choice for interacting with Trident Registry • Simplifies lots of data complexity • Abstracts verbs and actions into an object model • Access to all Trident Registry objects and relations • No need for servers and services to operate (accesses the data backend directly) • Faster, no extra hops. Direct data access. • Native API • Useful for non-managed applications and systems integration • Similar to Managed (.NET) API in terms of performance and requirements • But more limited (not a 100% feature match right now) • Web Services API • Recommended for non-Microsoft platform integration, e.g. Linux and Mac OS • Requires an IIS web server and service configured • Greater control over data and process, higher data security • Only core objects and relationships are exposed right now • Extra parsing and processing hop. Need to consider cluster and load-balancing solutions for high-performance scenarios

  31. Trident Blackboard A Distributed Eventing Model For Workflow

  32. The Workflow Runtime and Tracking Services • WF workflows launch in a runtime context • Runtime thread controls WF related threads • Execution thread • Built-in services • Custom services • Built-in services track workflow execution • Workflow events • Individual activity events • Data updates

  33. Trident Blackboard • A distributed Pub/Sub model for workflow eventing • Why? • Tracking information needs to be shared across compute nodes • Workflows are evolutionary and thus messengers require a pluggable interface • Large message volume means that the message broker needs to be light-weight and fast

  34. The Blackboard Message • Titled name/value pair collection • All values are strings • Title and names can resolve against an ontology • Structure: title ‘Collection Title’; names ‘name 1’, ‘name 2’, ‘name 3’; values ‘value 1’, ‘value 2’, ‘value 3’ • Example: title ‘WF Runtime Event’; ‘Type’ = ‘Activity Started’, ‘Job ID’ = ‘{ GUID }’, ‘Activity ID’ = ‘NetCDF Reader’, ‘Event Order’ = ‘5’
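
The message structure on this slide can be sketched in Python as follows. This is a minimal illustration, not Trident's actual (.NET) type: the class name `BlackboardMessage` and its `put` method are hypothetical. It captures only what the slide states, namely a titled collection of name/value pairs in which every value is a string.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlackboardMessage:
    """Hypothetical model of a titled name/value collection (all values are strings)."""
    title: str
    pairs: Dict[str, str] = field(default_factory=dict)

    def put(self, name: str, value: object) -> None:
        # The slide says all values are strings, so coerce on the way in.
        self.pairs[name] = str(value)


# The example message from the slide; the GUID is left as a placeholder.
msg = BlackboardMessage("WF Runtime Event")
msg.put("Type", "Activity Started")
msg.put("Job ID", "{ GUID }")
msg.put("Activity ID", "NetCDF Reader")
msg.put("Event Order", 5)
```

Keeping every value a string means a subscriber never needs type information to parse a message, at the cost of converting numbers such as the event order back on the receiving side.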

  35. The Blackboard Message • Titled name/value pair collection • All values are strings • Title and names can resolve against an ontology • Structure: title ‘Collection Title’; names ‘name 1’, ‘name 2’, ‘name 3’; values ‘value 1’, ‘value 2’, ‘value 3’ • Example: title ‘WF Runtime Event’; ‘Type’ = ‘Activity Started’, ‘Job ID’ = ‘{ GUID }’, ‘Activity ID’ = ‘NetCDF Reader’, ‘Event Order’ = ‘5’ • Publisher: Workflow Tracker • Subscribers: Provenance Store, Database Logging
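
The publisher/subscriber relationship on this slide can be sketched in Python as follows. The real Blackboard is distributed across compute nodes; this sketch is a single-process stand-in, and the `Blackboard` class with its `subscribe` and `publish` methods is hypothetical. Two subscribers, standing in for the provenance store and the logger, receive each message published under a title they subscribed to.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]                   # name -> value; all values are strings
Handler = Callable[[str, Message], None]   # receives (title, message)


class Blackboard:
    """Hypothetical in-process broker: routes messages to subscribers by title."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, title: str, handler: Handler) -> None:
        self._subscribers[title].append(handler)

    def publish(self, title: str, message: Message) -> int:
        """Deliver to every subscriber of `title`; return the delivery count."""
        handlers = self._subscribers.get(title, [])
        for handler in handlers:
            handler(title, dict(message))   # copy so subscribers cannot interfere
        return len(handlers)


provenance_store: List[Message] = []
log_lines: List[str] = []

board = Blackboard()
board.subscribe("WF Runtime Event", lambda t, m: provenance_store.append(m))
board.subscribe("WF Runtime Event", lambda t, m: log_lines.append(f"{t}: {m['Type']}"))

delivered = board.publish(
    "WF Runtime Event",
    {"Type": "Activity Started", "Activity ID": "NetCDF Reader", "Event Order": "5"},
)
```

The publisher knows nothing about who is listening, so a new consumer can be added with one `subscribe` call and no change to the workflow tracker.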

  36. Blackboard Architecture (diagram labels: Publisher Interface; Subscriber Interface; Trident Workflow Executor; WF Runtime Services; Publishers; Blackboard; Subscribers; Message; Subscription Information; Lightweight Message Queue)

  37. Blackboard Architecture: Message Routing • Message Rerouting • Subscription Information Management • Recovery Logic (diagram labels: Publisher Interface; Subscriber Interface; Trident Workflow Executor; WF Runtime Services; Publishers; Blackboard; Subscribers; Messages; Message; Subscription Information; Lightweight Message Queue)

  38. Blackboard Architecture: Subscription Information Routing • Message Rerouting • Subscription Information Management • Recovery Logic (diagram labels: Publisher Interface; Subscriber Interface; Trident Workflow Executor; WF Runtime Services; Publishers; Blackboard; Subscribers; Messages; Message; Subscription Information; Lightweight Message Queue)

  39. Blackboard Architecture: Internal Technologies • Message Rerouting • Subscription Information Management • Recovery Logic • Windows Workflow (WF) • Windows Communication Foundation (WCF) (diagram labels: Publisher Interface; Subscriber Interface; Trident Workflow Executor; WF Runtime Services; Publishers; Blackboard; Subscribers; Messages; Message; Subscription Information; Lightweight Message Queue)

  40. Blackboard Architecture Logging and Monitoring Example Publisher Interface Subscriber Interface • Message Rerouting • Subscription Information Management • Recovery Logic Trident Workflow Executor WF Runtime Services Config File Tracking Blackboard File Writer Messages Composer Registry Resources ‘WF Runtime Event’ ‘Type’ ‘Job ID’ ‘Activity ID’ ‘Event Order’ ‘Activity Started’ ‘{ GUID }’ ‘NetCDF Reader’ ‘5’ Message Subscription Information Lightweight Message Queue

  41. Blackboard Demo

  42. Trident Tips and Tricks

  43. Interoperability Story • Silverlight execution environment • Web frontend for management and execution • Allows non-Microsoft operating systems to use and administer Trident • Interface with other systems • COVE • myExperiment

  44. Interface Trident with Other Systems: Integration with UW COVE system (DEMO)

  45. Trident Tips and Tricks • Productivity Tools • Database ready activities • Simplifies development of database aware workflows • Code generator improves development productivity • Data visualization and charting activities • Web Service ready activities • Simplifies development of web service aware workflows • Code generator improves development productivity

  46. Trident Roadmap to Release

  47. Trident Road Map
