
The Gridbus Toolkit for Building and Deploying eScience Applications on Utility Grids



  1. The Gridbus Toolkit for Building and Deploying eScience Applications on Utility Grids • Rajkumar Buyya, Fellow of Grid Computing • Grid Computing and Distributed Systems (GRIDS) Lab., Dept. of Computer Science and Software Engineering, The University of Melbourne, Australia • www.gridbus.org

  2. Outline • Introduction to eScience and Challenges • Introduction to the Gridbus Project • An Overview of Gridbus Components • Grid Service Broker • Architecture • Design and Implementation • Scheduling Algorithms • BioGrid Demo • OR Performance Evaluation • A Case Study in High Energy Physics • Economy-based Scheduling in Data Grids • Summary

  3. Prominent Grid Drivers: Emerging eScience and eBusiness Apps • Next-generation experiments, simulations, sensors, satellites, even people and businesses are creating a flood of data. They all involve numerous experts/resources from multiple organizations in synthesis, modeling, simulation, analysis, and interpretation. • Example domains: High Energy Physics (~PBytes/sec), Brain Activity Analysis, Newswire & data mining (natural language engineering), Digital Biology, Life Sciences, Astronomy, Quantum Chemistry, Finance (portfolio analysis), Internet & e-commerce

  4. E-Science Elements (figure): distributed instruments, distributed data, distributed computation, remote visualization, and data & compute services, with e-scientists as peers sharing ideas and collaboratively interpreting data/results

  5. Grids Have Emerged as Scalable Cyberinfrastructure for e-Science Applications (figure: applications submit work to Grid resource brokers, which consult the Grid Information Service and its database to select among resources R1 to RN)

  6. Types of Services Modern Grids Offer • Computational Services: CPU cycles • SETI@Home, NASA IPG, TeraGrid, I-Grid,… • Data Services • Data replication, management, secure access: LHC Grid/Napster • Application Services • Access to remote software/libraries and license management: NetSolve • Information Services • Extraction and presentation of data with meaning • Knowledge Services • The way knowledge is acquired and managed: data mining • Utility Computing Services • Towards market-based Grid computing: leasing and delivering Grid services as ICT utilities. (Figure layers: Computational Grid, Data Grid, ASP Grid, Information Grid, Knowledge Grid, Utility Grid.)

  7. Grid Challenges: computational economy, security, data locality, resource allocation & scheduling, uniform access, system management, resource discovery, application construction, network management

  8. Some Grid Initiatives Worldwide (http://www.gridcomputing.com) • Australia: Nimrod-G, Gridbus, DISCWorld, GrangeNet, APACGrid, ARC eResearch? • Brazil: OurGrid, EasyGrid, LNCC-Grid + many others • China: ChinaGrid (education), CNGrid (application) • Europe: UK eScience, EU Grids... and many more • India: I-Grid • Japan: NAREGI • Korea: N*Grid • Singapore: NGP • USA: Globus, NASA IPG, AccessGrid, TeraGrid, Cyberinfrastructure, and many more • Industry initiatives: IBM On Demand Computing, HP Adaptive Computing, Sun N1, Microsoft .NET, Oracle 10g, Infosys Business Grid, StorageTek Grid, and many more • Public forums: Global Grid Forum, Australian Grid Forum; conferences: CCGrid, Grid, P2P, HPDC • Funding levels: 27 million; 1.3 billion (3 yrs); 2? billion; 120 million (5 yrs); 450 million (5 yrs); 486 million (5 yrs); 1.3 billion (Rs); 1 billion (5 yrs)

  9. The Gridbus Project @ Melbourne: Enable Leasing of ICT Services on Demand (figure: Gridbus links distributed data and the World Wide Grid (WWG) for on-demand utility computing)

  10. The Gridbus Project: http://www.gridbus.org • A multi-institutional “Open Source” R&D Project with focus on: • Architecture, Specification, and Open Source Reference Implementation. • Service-Oriented Grid, Utility Computing & Distributed Data and Computation Economy • Scaling from Desktops, Clusters, Cluster Federation, Enterprise Grids to Global Grids. • Grid Market Directory and Web Services • Grid Bank: Accounting and Transaction Management • Visual Tools for Creation of Distributed Applications • Workflow Composition and Deployment Services • Data Grid Brokering and Grid Economy Services • Data Replication Strategies • GridSim Toolkit: Enhanced to support Data Grid, Reservation, etc. • Libra: Economic Cluster Scheduler • Coupling of Clusters and Computational Economy • Alchemi: Harnessing .NET/Windows-based Resources • WWG: Global Data Intensive Grid Testbed • Application Enabler Projects: • High-Energy Physics , Astronomy, Brain Activity Analysis – Osaka U., Natural Language Processing, Portfolio Analysis – Spain, BioGrid - WEHI (via APACGrid), SensorGrid (NICTA), Medical Imaging (HFI) • Supported by:

  11. Grid Economy: Methodology for Sustained Resource Sharing and Managing Supply and Demand for Resources

  12. New challenges of Grid Economy • Grid Service Providers (GSPs) • How do I decide service pricing models ? • How do I specify them ? • How do I translate them into resource allocations ? • How do I enforce them ? • How do I advertise & attract consumers ? • How do I do accounting and handle payments? • ….. • Grid Service Consumers (GSCs) • How do I decide expenses ? • How do I express QoS requirements ? • How do I trade between timeframe & cost ? • How do I map jobs to resources to meet my QoS needs? • ….. • They need mechanisms and technologies for value expression, value translation, and value enforcement.

  13. GRACE: Service Oriented Grid Architecture GRid Architecture for Computational Economy (GRACE)

  14. GRACE: A Reference Service-Oriented Grid Architecture for Computational Economies (figure) • Grid Consumer: applications and secure programming environments on top of a Grid Resource Broker (Job Control Agent, Schedule Advisor, Trade Manager, Deployment Agent, Grid Explorer) • Grid Middleware Services: Information Service, Grid Market Services, Grid Bank, Data Catalogue, Sign-on, Health Monitor • Grid Service Providers: Grid Nodes 1 to N, each with a Trade Server (Pricing Algorithms, Accounting, Resource Reservation, Resource Allocation, Misc. services), JobExec, Storage, and resources R1 to Rm

  15. Gridbus and Complementary Grid Technologies: Realizing GRACE (layered figure, top to bottom) • Grid Applications: Science, Commerce, Engineering, Collaboratories, Portals … • User-Level Middleware (Grid Tools): ExcelGrid, Gridscape, Workflow, X-Parameter Sweep Lang., MPI, and Grid Brokers (Workflow Engine, Gridbus Data Broker, Nimrod-G) • Core Grid Middleware: Grid Market Directory, Grid Exchange & Federation, Globus, Unicore, Grid Storage Economy, GridBank, Alchemi, NorduGrid, XGrid … • Grid Fabric Software: .NET, JVM, Condor, PBS, SGE, Libra, Tomcat • Grid Fabric Hardware: Mac, Windows, Linux, AIX, IRIX, OSF1, Solaris • Worldwide Grid. Also shown: GridSim, Grid Economy, CDB, PDB.

  16. Gridbus Technologies • Application Construction Tools • Visual Parametric Modeller (VPM) • Grid Economy Services • Grid Market Directory • A registry for publication of GSPs and their services (VO/VE) • Grid Bank • A Grid accounting service • Grid Trading Services • Data Grid Service Broker • QoS-based scheduling of distributed data-oriented apps on global Grids • Grid Workflow Management System • Gridscape • Interactive Grid testbed portal generator • G-monitor • Grid application execution management portal • GridSim • A Grid simulation toolkit • Libra • Economy-based cluster scheduling

  17. Alchemi: .NET-based Enterprise Grid Platform & Web Services • SETI@Home-like model • General purpose • Dedicated/non-dedicated workers • Role-based security • .NET and Web Services • C# implementation • GridThread and Job programming models • Easy to set up and use (Figure: Alchemi users reach the Alchemi Manager over the Internet via Web Services; the Manager distributes work to Alchemi Worker Agents.)

  18. On Demand Assembly of Services: Putting Them All Together (figure). Components: application code and Visual Application Composer; Grid Resource Broker; Data Catalogue; Grid Info Service; Data Replicator (GDMP); ASP Catalogue; Grid Market Directory; Grid Services (Globus, Alchemi) with Grid Trade Servers (GTS); cluster schedulers and processing elements (PEs); GridBank accounting service; Grid Service Providers such as IBM, VPAC, UofM, and CERN; data sources (instruments/distributed sources). Numbered interactions (1 to 12) include exploring data, submitting jobs, returning job results plus cost information, and billing.

  19. Creation and Operation of Virtual Enterprises: Grid Market Directory and Grid Bank

  20. A Market-Oriented Grid Environment (figure) • A Grid Service Provider (GSP) asks the Grid Market Directory (GMD): "register me as GSP". • The Resource Broker asks the GMD: "Give me list of GSPs & price?" • The broker selects GSPs and asks each provider's Grid Trade Server (GTS): "service available?" • A GTS can reply with an offer such as "Solve this in 5hrs for $20". • Also shown: Grid Info. Service and Grid Bank.
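The broker side of this exchange can be sketched in a few lines of Python. This is a minimal illustration, not Gridbus code: `GridTradeServer`, `Quote`, and `select_gsp` are hypothetical names, each provider quotes a fixed completion time and price, and the broker keeps the cheapest offer that meets the user's deadline and budget.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quote:
    provider: str
    hours: float  # completion time the provider commits to
    price: float  # asking price in Grid dollars (G$)

class GridTradeServer:
    """Hypothetical stand-in for a provider's Grid Trade Server (GTS)."""

    def __init__(self, provider: str, hours: float, price: float, available: bool = True):
        self.provider = provider
        self.hours = hours
        self.price = price
        self.available = available

    def service_available(self) -> bool:
        return self.available

    def quote(self, job: str) -> Quote:
        # A real GTS would apply its pricing policy to the job; here the offer is fixed.
        return Quote(self.provider, self.hours, self.price)

def select_gsp(trade_servers, job: str, deadline_hours: float, budget: float) -> Optional[Quote]:
    """Broker side: poll each available GTS, keep offers within deadline and budget, pick the cheapest."""
    offers = [gts.quote(job) for gts in trade_servers if gts.service_available()]
    feasible = [q for q in offers if q.hours <= deadline_hours and q.price <= budget]
    return min(feasible, key=lambda q: (q.price, q.hours), default=None)

# GSP list as it might come back from the Grid Market Directory (illustrative values).
servers = [
    GridTradeServer("gsp-a", hours=5, price=20),
    GridTradeServer("gsp-b", hours=3, price=35),
    GridTradeServer("gsp-c", hours=2, price=15, available=False),
]
best = select_gsp(servers, "job-1", deadline_hours=6, budget=30)
print(best)  # Quote(provider='gsp-a', hours=5, price=20)
```

Tightening the deadline to 4 hours with a larger budget would make `gsp-b` the only feasible offer, which is the time/cost trade-off the consumer has to express.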

  21. Grid Market Infrastructure • Grids need to provide an infrastructure that supports: • (a) the creation of one or more GMP registries; • (b) the contributors to register themselves as GSPs along with their resources/application services that they wish to provide; • (c) GSPs to publish themselves in one or more GMPs along with service prices; and • (d) Grid resource brokers to discover resources/services and their attributes (e.g., access price and usage constraints) that meet user QoS requirements.
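Items (a) to (d) can be illustrated with a toy in-memory registry. The sketch assumes hypothetical `MarketRegistry` and `ServiceEntry` types, uses a per-CPU-hour price and a maximum CPU count as the example access price and usage constraint, and de-duplicates a GSP that publishes in more than one registry.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ServiceEntry:
    gsp: str                   # Grid Service Provider publishing the service
    service: str               # application/resource service name
    price_per_cpu_hour: float  # access price in G$ (illustrative attribute)
    max_cpus: int              # usage constraint (illustrative attribute)

@dataclass
class MarketRegistry:
    """Hypothetical Grid Market registry (item a)."""
    name: str
    entries: list = field(default_factory=list)

    def publish(self, entry: ServiceEntry) -> None:
        # Items (b) and (c): a GSP registers and publishes a priced service.
        self.entries.append(entry)

    def discover(self, service: str, max_price: float, min_cpus: int) -> list:
        # Item (d): return services whose price and constraints meet the user's QoS.
        return [e for e in self.entries
                if e.service == service
                and e.price_per_cpu_hour <= max_price
                and e.max_cpus >= min_cpus]

def discover_all(registries, service, max_price, min_cpus):
    """Query several registries, drop duplicate (gsp, service) pairs, cheapest first."""
    seen, found = set(), []
    for registry in registries:
        for e in registry.discover(service, max_price, min_cpus):
            if (e.gsp, e.service) not in seen:
                seen.add((e.gsp, e.service))
                found.append(e)
    return sorted(found, key=lambda e: e.price_per_cpu_hour)

gmd1, gmd2 = MarketRegistry("gmd-1"), MarketRegistry("gmd-2")
docking = ServiceEntry("gsp-x", "docking", 4.0, 64)
gmd1.publish(docking)
gmd2.publish(docking)  # the same GSP listed in a second registry
gmd2.publish(ServiceEntry("gsp-y", "docking", 2.5, 16))
gmd2.publish(ServiceEntry("gsp-z", "docking", 9.0, 128))
matches = discover_all([gmd1, gmd2], "docking", max_price=5.0, min_cpus=32)
print([e.gsp for e in matches])  # ['gsp-x']
```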

  22. Grid Bank: Grid Transactions Authorization, Accounting, & Payment Infrastructure (figure) • Grid Service Consumer (GSC): the user's applications go through the Grid Resource Broker (GRB), whose GridBank Payment Module obtains a GridCheque from the GridBank Server against the GSC account; the broker and the provider's Grid Trade Server establish service costs in a usage agreement; the user deploys agents and submits jobs. • Grid Service Provider (GSP): a Grid Agent runs the jobs on resources R1 to R4 while the Grid Resource Meter records resource usage; the GridBank Charging Module sends the GridCheque plus resource usage to the GridBank Server, which charges the GSC account.
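The cheque-and-meter cycle in the figure can be sketched as follows. It is a simplified model under stated assumptions: `GridBankServer`, `GridCheque`, and the usage/rate dictionaries are hypothetical, the charge is metered usage multiplied by the agreed rates, and funds are checked when the cheque is issued but not held in escrow until redemption.

```python
import itertools
from dataclasses import dataclass

@dataclass
class GridCheque:
    cheque_id: int
    payer: str             # Grid Service Consumer account
    payee: str             # Grid Service Provider account
    limit: float           # maximum amount (G$) the cheque authorises
    redeemed: bool = False

class GridBankServer:
    """Hypothetical GridBank: holds accounts, issues cheques, settles metered usage."""

    def __init__(self):
        self.balances = {}
        self._ids = itertools.count(1)

    def open_account(self, name, balance=0.0):
        self.balances[name] = balance

    def issue_cheque(self, payer, payee, limit):
        # Authorisation (payment module side): the consumer must be able to cover the limit.
        if self.balances[payer] < limit:
            raise ValueError("insufficient funds to authorise cheque")
        return GridCheque(next(self._ids), payer, payee, limit)

    def redeem(self, cheque, usage, rates):
        """Charging module side: price metered usage at the agreed rates and transfer it."""
        if cheque.redeemed:
            raise ValueError("cheque already redeemed")
        charge = sum(amount * rates[item] for item, amount in usage.items())
        if charge > cheque.limit:
            raise ValueError("charge exceeds authorised limit")
        self.balances[cheque.payer] -= charge
        self.balances[cheque.payee] += charge
        cheque.redeemed = True
        return charge

bank = GridBankServer()
bank.open_account("gsc-alice", 100.0)
bank.open_account("gsp-uofm")
cheque = bank.issue_cheque("gsc-alice", "gsp-uofm", limit=50.0)
# Usage as a (hypothetical) resource meter might report it, priced at the agreed rates.
charge = bank.redeem(cheque, usage={"cpu_hours": 4, "gb_transferred": 2},
                     rates={"cpu_hours": 5.0, "gb_transferred": 1.5})
print(charge, bank.balances)  # 23.0 {'gsc-alice': 77.0, 'gsp-uofm': 23.0}
```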

  23. Grid Applications: Composition and Deployment – A Broker Perspective • Nimrod-G Broker: A Grid Broker for Computational Grids • Gridbus Broker: A Grid Service Broker for Data Grids

  24. Grid Applications and Parametric Computing: Bioinformatics (drug design / protein modelling), Natural Language Engineering, Ecological Modelling (control strategies for cattle tick), sensitivity experiments on smog formation, Data Mining, Electronic CAD (field programmable gate arrays), High Energy Physics (searching for rare events), Computer Graphics (ray tracing), Finance (investment risk analysis), VLSI Design (SPICE simulations), Civil Engineering (building design), Network Simulation, Automobile (crash simulation), Aerospace (wing design), Astrophysics

  25. Thesis: build a task-farming application (parameter sweep or bag of tasks) and execute it on the Grid within "T" hours or earlier, at a cost not exceeding $M. Three options/solutions, from manual to automated: • Using pure Globus commands • Build your own distributed app & scheduler • Use the Gridbus Resource Broker to compose and schedule

  26. The Gridbus Grid Service Broker for Data Grid Applications: builds on the Nimrod-G Computational Grid Broker and Computational Economy [Buyya, Abramson, Giddy, Monash University, 1999-2001] and extends its notion to Data and Service Grids

  27. Grid Service Broker (GSB) • A resource broker for scheduling task-farming data Grid applications with static or dynamic parameter sweeps on global Grids. • It uses the computational economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users' QoS requirements (deadline, budget, and time/cost optimisation). • Key Features • A single window to manage & control an experiment • Programmable Task Farming Engine • Resource Discovery and Resource Trading • Optimal Data Source Discovery • Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & sharing of results • Accounting

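  28. Gridbus Broker at a Glance (figure): the Gridbus Broker runs on a home node/portal and obtains credentials from a MyProxy credential repository; it consults a data catalog and reaches data stores through access technologies such as SRB and GridFTP; execution targets include fork() or batch systems (PBS, Condor, Alchemi), Globus job managers (fork() or batch(): PBS, Condor, SGE), Alchemi, and a Unicore gateway, with a Gridbus agent on the remote node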

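  29. Gridbus Broker Architecture (figure) • Gridbus clients (bag-of-tasks applications; data Grid scheduler) pass the application, deadline T, budget $, and optimisation preference (Opt) to the Gridbus Farming Engine • Broker components: Gridbus Farming Engine, Schedule Advisor, Trading Manager, Record Keeper, Grid Dispatcher, Grid Explorer • The Grid Explorer queries the Grid Info Server (GIS, NWS); the Trading Manager negotiates with Trade Servers (TS); the Grid Dispatcher submits through Grid middleware to Globus (G), Unicore (U), and Alchemi (A) enabled nodes, while data nodes are located through the Data Catalog • Legend: RM: Local Resource Manager; TS: Trade Server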

  30. Gridbus Services for eScience applications • Application Development Environment: • XML-based language for composition of task-farming (legacy) applications as parameter sweep applications. • Task-farming APIs for new applications. • Web APIs (e.g., Portlets) for Grid portal development. • Workflow interface and Gridbus-enabled workflow engine. • Resource Allocation and Scheduling • Dynamic discovery of optimal computational and data nodes that meet user QoS requirements. • Hide low-level Grid middleware interfaces • Globus, Alchemi, Unicore, NorduGrid, XGrid, etc.

  31. Gridbus Broker: XML file

```xml
<!-- X sweeps 1 to 10 in steps of 1 -->
<parameter name="X" type="integer">
  <domain>
    <range>
      <value from="1" to="10"/>
      <interval type="step">1</interval>
    </range>
  </domain>
</parameter>
<!-- Y is fixed at 1 -->
<parameter name="Y" type="integer">
  <domain>
    <single>
      <value>1</value>
    </single>
  </domain>
</parameter>
<!-- Copy the platform-specific binary to the node, run it, fetch the output -->
<task>
  <type>main</type>
  <copy>
    <source location="local" file="calc.$OS"/>
    <destination location="node" file="calc"/>
  </copy>
  <execute location="node">
    <command>./calc $X $Y</command>
  </execute>
  <copy>
    <source location="node" file="output"/>
    <destination location="local" file="output.$jobname"/>
  </copy>
</task>
```
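The parameter section above implies a Cartesian product of parameter values, one job per combination. The Python sketch below expands it, assuming that `to` is an inclusive bound and using a hypothetical `j1`, `j2`, ... job-naming scheme; it wraps the parameter elements in a placeholder `<root>` element only so that ElementTree can parse them, and it does not model the `calc.$OS` platform substitution.

```python
import itertools
import xml.etree.ElementTree as ET
from string import Template

# The two <parameter> elements above, wrapped in a placeholder <root> so they parse.
PLAN = """<root>
<parameter name="X" type="integer">
  <domain><range><value from="1" to="10"/><interval type="step">1</interval></range></domain>
</parameter>
<parameter name="Y" type="integer">
  <domain><single><value>1</value></single></domain>
</parameter>
</root>"""

def parameter_values(param):
    """List the values of one integer parameter with a range or single domain."""
    domain = param.find("domain")
    rng = domain.find("range")
    if rng is not None:
        bounds = rng.find("value")
        start, stop = int(bounds.get("from")), int(bounds.get("to"))
        step = int(rng.find("interval").text.strip())
        return list(range(start, stop + 1, step))  # assumes "to" is inclusive
    return [int(domain.find("single/value").text.strip())]

def expand_jobs(plan_xml):
    """One job per combination of parameter values (Cartesian product)."""
    params = ET.fromstring(plan_xml).findall("parameter")
    names = [p.get("name") for p in params]
    jobs = []
    for i, combo in enumerate(itertools.product(*map(parameter_values, params)), start=1):
        bindings = dict(zip(names, combo), jobname=f"j{i}")  # hypothetical job names
        jobs.append({
            "command": Template("./calc $X $Y").substitute(bindings),
            "output": Template("output.$jobname").substitute(bindings),
        })
    return jobs

jobs = expand_jobs(PLAN)
print(len(jobs), jobs[0]["command"], jobs[-1]["output"])  # 10 ./calc 1 1 output.j10
```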

  32. Portal-based Access to Grid Broker for Launching and Steering Applications (figure: a portal front end drives the Grid Broker, which runs applications on the World-Wide Grid)

  33. Drug Design Made Easy! (Figure 3: Logging into the portal.)

  34. Excel Plugin to Access Gridbus Services (figure: ExcelGrid middleware, in which Excel with the ExcelGrid Add-In and the ExcelGrid Runner submits ExcelGridJobs through the Gridbus Broker to an enterprise Grid)

  35. Adaptive Scheduling Steps (flowchart): Discover Resources → Establish Rates → Meet requirements? (remaining jobs, deadline, & budget) → Compose & Schedule → Distribute Jobs → Evaluate & Reschedule, repeating until all jobs are done; if the requirements cannot be met, Discover More Resources
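The loop can be sketched as a small discrete-time simulation. Everything here is an assumption for illustration: prices are per job, each resource has a hidden true throughput that the broker learns after one full step (starting from a guess of one job per step), and resources are chosen cheapest first, which corresponds to a cost-minimising policy. The "discover more resources" branch is only marked by a comment.

```python
def adaptive_schedule(n_jobs, deadline, budget, resources):
    """resources: name -> {"price": G$ per job, "rate": true jobs per step (unknown to planning)}."""
    est = {name: 1.0 for name in resources}  # initial throughput guess
    done, spent, log = 0, 0.0, []
    for t in range(deadline):
        remaining = n_jobs - done
        if remaining == 0:
            break
        time_left = deadline - t
        # Compose & schedule: cheapest first until estimated capacity covers remaining jobs.
        active, capacity = [], 0.0
        for name in sorted(resources, key=lambda r: resources[r]["price"]):
            if capacity >= remaining:
                break
            active.append(name)
            capacity += est[name] * time_left
        # (If capacity < remaining here, a broker would discover more resources.)
        # Distribute jobs for this step without exceeding the budget.
        for name in active:
            price, rate = resources[name]["price"], resources[name]["rate"]
            n = min(rate, n_jobs - done, int((budget - spent) // price))
            done += n
            spent += n * price
            if n == rate:
                est[name] = rate  # evaluate: learn throughput from a full step
        log.append((t, active, done))
    return done, spent, log

resources = {
    "cheap": {"price": 2, "rate": 2},
    "fast": {"price": 5, "rate": 4},
    "spare": {"price": 9, "rate": 3},
}
done, spent, log = adaptive_schedule(20, deadline=5, budget=1000, resources=resources)
for step in log:
    print(step)
```

In this run all three resources are used in the first step because every estimate starts at one job per step; once the observed rates are known, the expensive `spare` resource is dropped, and from the third step on only `cheap` is needed to finish all 20 jobs by the deadline, for G$85 in total.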

  36. Deadline (D) and Budget (B) Constrained Scheduling Algorithms
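Two of these strategies, cost optimisation and time optimisation, can be sketched as static planners. This simplified sketch assumes each resource runs one job at a time with a known per-job time and price; `Resource`, `cost_opt`, and `time_opt` are hypothetical names, and a real broker re-plans adaptively rather than once.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    job_time: float  # estimated hours per job; one job at a time
    price: float     # G$ per job

def cost_opt(n_jobs, deadline, budget, resources):
    """Fill the cheapest resources first with as many jobs as they can finish by the deadline."""
    plan, cost, left = {}, 0.0, n_jobs
    for r in sorted(resources, key=lambda r: r.price):
        capacity = int(deadline // r.job_time)
        affordable = int((budget - cost) // r.price)
        n = min(left, capacity, affordable)
        if n > 0:
            plan[r.name] = n
            cost += n * r.price
            left -= n
        if left == 0:
            break
    return plan, cost, left  # left > 0: not feasible within D and B

def time_opt(n_jobs, deadline, budget, resources):
    """Give each job to the resource that would finish it earliest, if it fits D and B."""
    finish = {r.name: 0.0 for r in resources}
    plan, cost = {}, 0.0
    for _ in range(n_jobs):
        options = [r for r in resources
                   if finish[r.name] + r.job_time <= deadline and cost + r.price <= budget]
        if not options:
            break
        # Earliest completion wins; ties go to the cheaper resource.
        best = min(options, key=lambda r: (finish[r.name] + r.job_time, r.price))
        finish[best.name] += best.job_time
        plan[best.name] = plan.get(best.name, 0) + 1
        cost += best.price
    return plan, cost, max(finish.values())  # (plan, cost, makespan)

resources = [Resource("A", 1.0, 2.0), Resource("B", 0.5, 6.0), Resource("C", 2.0, 1.0)]
print(cost_opt(10, 4, 100, resources))
print(time_opt(10, 4, 100, resources))
```

With these illustrative numbers (10 jobs, D = 4 hours, B = G$100), cost optimisation plans {C: 2, A: 4, B: 4} for G$34 and uses the full 4 hours, while time optimisation plans {B: 6, A: 3, C: 1} for G$43 and finishes in 3 hours.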

  37. Sample Applications of Gridbus Broker • Molecular Docking - WEHI • Drug Discovery • Brain Activity Analysis – Osaka University • Neuroscience studies • Natural Language Engineering – Melbourne NLP • Indexing of newswire data • High Energy Physics – School of Physics/Melbourne • Belle experiment data analysis • Finance - Portfolio Analysis – U. Comp. Madrid/Spain • Investment risk analysis • Astronomy • Australian Virtual Observatory • Spreadsheet Processing • Microsoft Excel

  38. Economy-based Data Grid Scheduling: High Energy Physics as an eScience Application Case Study

  39. Case Study: High Energy Physics • What is High Energy Physics (HEP)? • Study of the fundamental constituents of matter and forces. • High Energy Physics: using high energies enables probing of smaller distances/structures and study of early-universe-like environments. • Particle Physics: quanta of matter/forces and their properties • The Belle Experiment • KEK B-Factory, Japan • Investigating the fundamental violation of symmetry in nature (Charge Parity), which may help explain the universal matter–antimatter imbalance. • Collaboration: 400 people, 50 institutes • Hundreds of TB of data currently

  40. Case Study: Event Simulation and Analysis (decay channel B0 -> D*+ D*- Ks) • Simulation and analysis package: Belle Analysis Software Framework (BASF) • The experiment has two parts: generation of simulated data, and analysis of the distributed data • Only the analysis is discussed here

  41. Australian Belle Data Grid Testbed

  42. Case Study: Input File for Analysis

```
parameter jobf gridfile lfn:/users/winton/fsimddks/fsimdata*.mdst;
task main
    copy runme.grid2 node:runme.grid2
    node:execute ./runme.grid2 $jobf $jobname
endtask
```

• A dynamic parameter (jobf) is defined to describe an input data file. • The logical file name points to the location in the replica catalog that contains a mapping to where the physical files are present. • 100 data files (30MB each) were equally distributed among the five nodes.
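The effect of the dynamic `gridfile` parameter can be sketched as a wildcard match against a replica catalog, yielding one job per matching logical file. The catalog below is hypothetical: the host names and the numbered file names are invented, and files are spread round-robin so that each of the five nodes holds 20 of the 100 files (600 MB at 30 MB each).

```python
import fnmatch
from collections import Counter

HOSTS = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical data hosts
FILE_MB = 30

# Hypothetical replica catalog: logical file name -> host holding the physical file.
# The numbered file names are invented; only the lfn: prefix and wildcard come from the input file.
catalog = {
    f"lfn:/users/winton/fsimddks/fsimdata{i:03d}.mdst": HOSTS[i % len(HOSTS)]
    for i in range(100)
}

def expand_gridfile(pattern, catalog):
    """Each LFN matching the wildcard becomes one value of the dynamic parameter (one job)."""
    return sorted(lfn for lfn in catalog if fnmatch.fnmatchcase(lfn, pattern))

pattern = "lfn:/users/winton/fsimddks/fsimdata*.mdst"
jobs = [
    {"jobname": f"j{n}", "jobf": lfn, "data_host": catalog[lfn]}
    for n, lfn in enumerate(expand_gridfile(pattern, catalog), start=1)
]
per_host = Counter(job["data_host"] for job in jobs)
print(len(jobs), per_host["node1"], per_host["node1"] * FILE_MB)  # 100 20 600
```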

  43. Resources Used and their Service Price

  44. Network Cost (in Grid $/Currency!)

  45. Deploying Application Scenario • A data grid scenario with 100 jobs and each accessing remote data of ~30MB • Deadline: 3hrs. • Budget: G$ 60K • Scheduling Optimisation Scenario: • Minimise Time • Minimise Cost • Results:
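For data-intensive jobs like these, the choice of compute node depends on where each job's input file lives, because moving ~30MB has a network cost. The greedy per-job sketch below assigns each job to the node that minimises compute charge plus transfer charge and stops once the budget would be exceeded. All node names and prices are hypothetical, and the greedy rule illustrates the trade-off rather than the Gridbus data broker's exact algorithm.

```python
from dataclasses import dataclass

@dataclass
class ComputeNode:
    name: str
    cpu_price: float  # G$ per CPU-second (hypothetical)
    job_secs: float   # CPU-seconds per analysis job (hypothetical)

def job_cost(node, data_host, size_mb, net_price):
    """Compute charge plus the charge for moving the input file to the node (free if local)."""
    transfer = 0.0 if data_host == node.name else net_price[(data_host, node.name)] * size_mb
    return node.cpu_price * node.job_secs + transfer

def plan_min_cost(jobs, nodes, net_price, size_mb, budget):
    """Greedy: each (job, data_host) goes to its cheapest node; stop before exceeding budget."""
    plan, total = [], 0.0
    for job, data_host in jobs:
        node = min(nodes, key=lambda n: job_cost(n, data_host, size_mb, net_price))
        cost = job_cost(node, data_host, size_mb, net_price)
        if total + cost > budget:
            break
        plan.append((job, node.name, cost))
        total += cost
    return plan, total

nodes = [ComputeNode("A", 1.0, 100), ComputeNode("B", 0.5, 150), ComputeNode("C", 0.25, 160)]
net_price = {("A", "B"): 2.0, ("B", "A"): 2.0, ("A", "C"): 1.0, ("B", "C"): 3.0}  # G$ per MB
jobs = [("j1", "A"), ("j2", "B"), ("j3", "A")]  # (job, host holding its input file)
plan, total = plan_min_cost(jobs, nodes, net_price, size_mb=30, budget=1000)
print(plan, total)
```

Here node C is cheapest for files held on A despite the transfer (G$70 against G$100 locally), while the job whose data sits on B stays on B (G$75), giving G$215 for the three jobs.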

  46. Time Minimization in Data Grids (figure: number of jobs completed, 0 to 80, vs. time in minutes, 1 to 42, for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org)

  47. Results: Cost Minimization in Data Grids (figure: number of jobs completed, 0 to 100, vs. time in minutes, 1 to 63, for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org)

  48. Observation

  49. Grid Workflow Management System and Broker Services (figure) • Inputs: a scientific portal and application composition tools produce a workflow description & QoS (parameters, tasks, dependencies) • Workflow system: Workflow Submission Handler, Workflow Language Parser, Workflow Planner, Workflow Enactment Engine, Workflow Scheduler, Resource Discovery, Data Movement, Dispatcher • Services used: Info Service (MDS), GMD, Replica Catalog, Gridbus Broker, Globus, Web services, HTTP and GridFTP data transfer, databases
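The planner/enactment split can be illustrated with Python's standard `graphlib` module (Python 3.9+), which releases tasks once their dependencies are done. The workflow and task names are hypothetical, and `run` stands in for dispatching a task through the broker; tasks released in the same wave have no mutual dependencies and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: task -> set of tasks it depends on.
workflow = {
    "stage_in": set(),
    "simulate_a": {"stage_in"},
    "simulate_b": {"stage_in"},
    "analyse": {"simulate_a", "simulate_b"},
    "stage_out": {"analyse"},
}

def enact(workflow, run):
    """Release each task once its dependencies finish; return the waves of released tasks."""
    ts = TopologicalSorter(workflow)
    ts.prepare()  # also rejects cyclic workflows
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for deterministic output
        for task in ready:
            run(task)  # stand-in for dispatching the task via the broker
        ts.done(*ready)
        waves.append(ready)
    return waves

waves = enact(workflow, run=lambda task: None)
print(waves)
# [['stage_in'], ['simulate_a', 'simulate_b'], ['analyse'], ['stage_out']]
```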

  50. The GridSim Toolkit: A Java-based Tool for Grid Scheduling Simulations (layered figure, top to bottom) • Application, user, and Grid scenario inputs and results: application configuration, resource configuration, Visual Modeler, Grid scenario, output • Grid resource brokers or schedulers (simulated) • GridSim Toolkit: application modeling, resource entities, information services, job management, resource allocation, statistics; users can add their own resource allocation policy • Resource modeling and simulation (with time-shared and space-shared schedulers): single CPU, SMPs, clusters, load pattern, network, reservation • Basic discrete event simulation infrastructure: SimJava, Distributed SimJava • Virtual machine (Java, cJVM, RMI) • Distributed resources: PCs, workstations, SMPs, clusters
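GridSim models resources with time-shared or space-shared schedulers. The Python sketch below (not GridSim's Java API) contrasts the two policies for a toy workload: space-shared assigns each job, first come first served, to the earliest-free processing element (PE), while time-shared is modelled as ideal processor sharing on one PE for jobs that all arrive at time 0. Job lengths are in MI and PE ratings in MIPS, as in GridSim, but the functions and numbers are hypothetical.

```python
import heapq

def space_shared(jobs, num_pe, mips):
    """FCFS on num_pe PEs; jobs are (arrival_time, length_MI) in arrival order."""
    free_at = [0.0] * num_pe  # time at which each PE becomes free (min-heap)
    heapq.heapify(free_at)
    finish = []
    for arrival, length in jobs:
        start = max(arrival, heapq.heappop(free_at))
        end = start + length / mips
        heapq.heappush(free_at, end)
        finish.append(end)
    return finish

def time_shared(lengths, mips):
    """Processor sharing on one PE, all jobs arriving at t=0: active jobs split the CPU equally."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    finish = [0.0] * len(lengths)
    t, done_work, active = 0.0, 0.0, len(lengths)
    for i in order:
        # All active jobs advance together until job i's remaining work is finished.
        t += (lengths[i] - done_work) * active / mips
        done_work = lengths[i]
        finish[i] = t
        active -= 1
    return finish

print(space_shared([(0, 100), (0, 200), (0, 300)], num_pe=1, mips=100))
print(time_shared([100, 200, 300], mips=100))
```

For three jobs of 100, 200, and 300 MI on a 100 MIPS PE, space-shared completes them at 1, 3, and 6 seconds, while time-shared completes them at 3, 5, and 6 seconds: the last job finishes at the same time, but short jobs wait longer under sharing.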
