Grid Middleware for High Performance Computing
Sathish Vadhiyar
Grid Applications Research Lab (GARL)
Supercomputer Education and Research Centre (SERC)
Indian Institute of Science (IISc), Bangalore - 560012
ATIP 1st Workshop on HPC in India @ SC-09
Grid Applications Research Lab
• Grid and parallel computing, with a primary focus on:
  • developing grid applications,
  • building strategies for checkpointing, migration, rescheduling, and fault tolerance for parallel applications on grid systems, and
  • performance modeling of parallel applications on grids
Motivation
• Developing solutions for deploying and using large-scale scientific applications on grids
• These solutions enable the exploration of larger problem sizes and of long-running applications
Grid Applications: Climate Modeling (CCSM)
• Enable efficient execution of long-running climate modeling simulations on grid systems, with the objective of solving climate science problems
• Community Climate System Model (CCSM) – a multi-component global general circulation model
• Using a novel execution model, analyzed the benefits of executing different components on different batch systems of a grid, with checkpointing and rescheduling
Grid Applications: Climate Modeling – General Idea (IJHPCA, FGCS)
[Figure: Novel Execution Model]
• Job submission to a batch system incurs queue waiting time
• Waiting time depends on processor requirements
• How about decomposing a job into small subjobs with small processor requirements and submitting the subjobs to multiple batch systems of a grid?
• Efficiency depends on effective system utilization using checkpointing, migration and rescheduling
• Leads to a 55% average increase in throughput
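The decomposition idea on this slide can be illustrated with a toy calculation. The sketch below rests on explicit assumptions: the linear wait-time model, the `BatchSystem` class, and the `single_job_wait` and `place_subjobs` helpers are all hypothetical, the numbers are made up, and the greedy placement ignores the checkpointing, migration and rescheduling that the slide says real efficiency depends on. It only shows why several small requests spread over multiple queues can start sooner than one large request.

```python
from dataclasses import dataclass


@dataclass
class BatchSystem:
    """A hypothetical batch queue whose predicted wait grows with processors requested."""
    name: str
    base_wait: float         # assumed minutes of waiting for even a tiny job
    wait_per_proc: float     # assumed extra minutes per requested processor
    queued_procs: int = 0    # processors already requested by our own earlier subjobs

    def predicted_wait(self, procs: int) -> float:
        # Assumed linear model; our earlier subjobs in this queue also count.
        return self.base_wait + self.wait_per_proc * (self.queued_procs + procs)


def single_job_wait(total_procs, systems):
    """Wait for submitting the whole job to the single best queue."""
    return min(s.predicted_wait(total_procs) for s in systems)


def place_subjobs(total_procs, n_subjobs, systems):
    """Split the job into equal subjobs and greedily send each one to the queue
    with the smallest predicted wait. Returns (assignments, longest_wait)."""
    if total_procs % n_subjobs:
        raise ValueError("this sketch assumes an even split")
    per_subjob = total_procs // n_subjobs
    assignments = []
    for k in range(n_subjobs):
        best = min(systems, key=lambda s: s.predicted_wait(per_subjob))
        wait = best.predicted_wait(per_subjob)
        best.queued_procs += per_subjob  # later subjobs see a longer queue here
        assignments.append((k, best.name, per_subjob, wait))
    return assignments, max(a[3] for a in assignments)


if __name__ == "__main__":
    systems = [BatchSystem("clusterA", 10.0, 0.5), BatchSystem("clusterB", 20.0, 0.25)]
    print("single 64-proc job waits", single_job_wait(64, systems))
    plan, longest = place_subjobs(64, 4, systems)
    for k, name, procs, wait in plan:
        print(f"subjob {k}: {procs} procs on {name}, predicted wait {wait}")
    print("all subjobs started after", longest)
```

With these made-up parameters the 64-processor job would wait 36 minutes on the better queue, while the four 16-processor subjobs alternate between the two queues and all start within 28 minutes. Real queue waits are not linear in processor count, so a practical system would plug in measured or predicted wait times instead.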
Grid Applications: DNA Sequence Evolutions (JPDC, eScience 2009)
[Figure: Master-Worker Architecture for Analyzing Mutations]
• Predicting future sequences in an evolutionary tree is important for drug discovery, pharmaceutical research and disease control
• An ancestor sequence can transform into a progeny sequence in many different ways
• Formulated prediction as a search-space exploration problem and used computational grids to explore the huge space of possible mutations
• Used popular mutations to predict future evolutionary paths
• Performed predictions for HIV sequences and other protein sequences
• Predictions are 40% better than those of random methods
[Figure: 40% Better Predictions]
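The master-worker structure named in the figure caption can be sketched in a few lines of Python. This is a toy illustration, not the authors' search-space exploration: the helper names are hypothetical, workers simply list the point substitutions between aligned ancestor/progeny pairs, threads from `concurrent.futures` stand in for grid workers, and "popular" is assumed to mean "most frequently observed".

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def point_mutations(pair):
    """Worker task: list the substitutions (position, old, new) that turn an
    ancestor sequence into an aligned, equal-length progeny sequence."""
    ancestor, progeny = pair
    if len(ancestor) != len(progeny):
        raise ValueError("sketch handles aligned, equal-length sequences only")
    return [(i, a, p) for i, (a, p) in enumerate(zip(ancestor, progeny)) if a != p]


def popular_mutations(pairs, n_workers=4, top=2):
    """Master: farm pairs out to a worker pool, merge the results, and
    return the `top` most frequently observed mutations with their counts."""
    counts = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for mutations in pool.map(point_mutations, pairs):
            counts.update(mutations)
    return counts.most_common(top)


def predict_next(sequence, ranked_mutations):
    """Apply each popular mutation whose original residue matches the sequence."""
    seq = list(sequence)
    for (pos, old, new), _count in ranked_mutations:
        if pos < len(seq) and seq[pos] == old:
            seq[pos] = new
    return "".join(seq)


if __name__ == "__main__":
    pairs = [("ACGT", "AGGT"), ("ACGA", "AGGA"), ("TCGT", "TCGA"), ("ACCT", "AGCT")]
    ranked = popular_mutations(pairs)
    print(ranked)
    print(predict_next("ACGT", ranked))
```

Here the substitution C→G at position 1 appears in three of the four pairs, so it ranks first, and applying the two most popular mutations to `ACGT` yields `AGGA`.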
Rescheduling
• Application execution must adapt to the dynamics of both the grid resources and the application
• SRS – a checkpointing library for malleable applications
  • Allows processor reconfiguration between migrations (the number of processors can change when the application is migrated)
  • Supports different data distributions, storage infrastructures, active migration and fault tolerance
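The core of processor reconfiguration for a malleable application is redistributing checkpointed data when the number of processors changes. The sketch below assumes a 1-D block distribution and uses in-memory `pickle` bytes in place of checkpoint files on a storage server; `block_ranges`, `checkpoint` and `restart` are hypothetical names and do not reflect the SRS API.

```python
import pickle


def block_ranges(n, nprocs):
    """Block distribution: rank r owns indices [start, stop) of an n-element
    array; the first n % nprocs ranks get one extra element."""
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for r in range(nprocs):
        size = base + (1 if r < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def checkpoint(local_blocks):
    """Serialize each rank's local block (bytes stand in for checkpoint files)."""
    return [pickle.dumps(block) for block in local_blocks]


def restart(saved, new_nprocs):
    """Read a checkpoint written by P ranks and redistribute it over Q ranks."""
    old_blocks = [pickle.loads(blob) for blob in saved]
    old_ranges, start = [], 0
    for blk in old_blocks:  # recover the global index range of each old block
        old_ranges.append((start, start + len(blk)))
        start += len(blk)
    new_blocks = []
    for lo, hi in block_ranges(start, new_nprocs):
        mine = []
        for (olo, ohi), blk in zip(old_ranges, old_blocks):
            a, b = max(lo, olo), min(hi, ohi)  # overlap of old and new ownership
            if a < b:
                mine.extend(blk[a - olo:b - olo])
        new_blocks.append(mine)
    return new_blocks


if __name__ == "__main__":
    data = list(range(10))
    on_four = [data[a:b] for a, b in block_ranges(len(data), 4)]
    print("4 ranks:", on_four)
    print("3 ranks:", restart(checkpoint(on_four), 3))
```

Each new rank copies only the overlapping pieces of the old blocks, so the same overlap computation could drive point-to-point transfers from the ranks or files that hold them, rather than gathering the whole array in one place.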
Rescheduling Strategies
• Given a parallel application consisting of multiple phases and a set of resources, the problem is to derive a rescheduling plan:
  • where to execute the different phases, and when to migrate/reschedule
• Find intervals {I1, I2, …, ILopt} that minimize the sum of the predicted interval execution times ti and the rescheduling costs rcost, where Lopt is the number of intervals, ti is the predicted execution time of interval Ii, and rcost is the rescheduling cost
• Developed 3 novel algorithms for deriving a rescheduling plan:
  • an incremental algorithm, a division heuristic and a genetic algorithm
[Figure: application phases mapped onto Clusters 1, 2 and 3 over Interval 1 (t1), Interval 2 (t2), Interval 3 (t3), …, Interval i (ti); Division heuristic]
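To make the optimization concrete, here is one way to solve a simplified version of the problem exactly. This dynamic program is not one of the three algorithms on the slide (incremental, division heuristic, genetic); it assumes the phases run in a fixed order, that a table of predicted per-phase execution times on each cluster is available, and that a constant rescheduling cost rcost is paid each time consecutive phases run on different clusters. The function name and input format are hypothetical.

```python
def rescheduling_plan(times, rcost):
    """times[p][c]: predicted execution time of phase p on cluster c (assumed known).
    Returns (total_cost, intervals), each interval being (first_phase, last_phase, cluster).
    Minimizes the sum of interval times plus rcost per cluster switch."""
    n_phases, n_clusters = len(times), len(times[0])
    best = list(times[0])              # cheapest cost so far with phase 0 on cluster c
    back = [[None] * n_clusters]       # back[p][c]: cluster of phase p-1 on the best path
    for p in range(1, n_phases):
        new_best, choice = [], []
        for c in range(n_clusters):
            # Either stay on cluster c, or migrate from another cluster and pay rcost.
            options = [(best[c], c)] + [
                (best[d] + rcost, d) for d in range(n_clusters) if d != c
            ]
            cost, prev = min(options)
            new_best.append(cost + times[p][c])
            choice.append(prev)
        best = new_best
        back.append(choice)
    # Walk the back pointers to recover the cluster used by each phase.
    c = min(range(n_clusters), key=lambda k: best[k])
    total, clusters = best[c], [c]
    for p in range(n_phases - 1, 0, -1):
        c = back[p][c]
        clusters.append(c)
    clusters.reverse()
    # Group consecutive phases on the same cluster into intervals I_1 .. I_L.
    intervals, start = [], 0
    for p in range(1, n_phases + 1):
        if p == n_phases or clusters[p] != clusters[start]:
            intervals.append((start, p - 1, clusters[start]))
            start = p
    return total, intervals


if __name__ == "__main__":
    times = [[4, 7], [9, 3], [2, 8]]  # made-up predictions: 3 phases, 2 clusters
    print(rescheduling_plan(times, rcost=2))
    print(rescheduling_plan(times, rcost=100))
```

With rcost = 2, migrating for the middle phase pays off (total 13, versus 15 for staying on cluster 0); with a large rcost the plan collapses to a single interval. The running time is O(phases × clusters²).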
Rescheduling Strategies
• Performed experiments with five large-scale multi-phase parallel applications:
  • molecular dynamics, n-body simulations, astrophysical gas dynamics, crack propagation, electromagnetics
[Figure: Huge Benefits due to Rescheduling]
Performance Modeling (JPDC, CPE)
[Figure: Performance Model Accuracy for Parallel QR]
• It is imperative to automatically derive “knowledge” (performance characteristics) of applications
  • This knowledge can be used for effective mapping of applications to resources
• Built techniques for automatically deriving performance model functions that predict execution costs of parallel applications on grids
• First effort to handle load changes during application execution
• Modeling errors below 30%, the best reported for non-dedicated systems
• Also developed novel scheduling algorithms that use the model functions
  • They generate schedules up to 80% better than existing approaches
[Figure: Scheduling Results by Scheduling Method; Box Elimination (BE) (red bars) is 50-80% more efficient]
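Automatically deriving a performance model function can be illustrated as a least-squares fit of measured run times to a small set of basis terms. In the sketch below the basis (a constant, a computation term n³/p, and an overhead term n·p) is a hypothetical choice, not the model from the cited work, and the fit assumes a dedicated machine, so it does not capture the load changes the slide mentions. The fitted function is then used, as the scheduling bullet suggests, to choose a processor count.

```python
import numpy as np


def features(n, p):
    # Hypothetical basis: constant startup, O(n^3/p) computation, O(n*p) overhead.
    return [1.0, n**3 / p, n * p]


def fit_model(samples):
    """samples: iterable of (n, p, seconds). Returns least-squares coefficients."""
    A = np.array([features(n, p) for n, p, _ in samples], dtype=float)
    y = np.array([t for _, _, t in samples], dtype=float)
    scale = np.linalg.norm(A, axis=0)      # columns differ by orders of magnitude
    coeffs, *_ = np.linalg.lstsq(A / scale, y, rcond=None)
    return coeffs / scale                  # undo the column scaling


def predict(coeffs, n, p):
    """Predicted execution time for problem size n on p processors."""
    return float(np.dot(coeffs, features(n, p)))


def best_processor_count(coeffs, n, candidates):
    """Pick the candidate processor count with the lowest predicted time."""
    return min(candidates, key=lambda p: predict(coeffs, n, p))


if __name__ == "__main__":
    true = (1.0, 1e-6, 1e-3)  # made-up "ground truth" for synthetic timings
    samples = [(n, p, float(np.dot(true, features(n, p))))
               for n in (100, 200, 400) for p in (2, 4, 8)]
    coeffs = fit_model(samples)
    print("fitted coefficients:", coeffs)
    print("best p for n=400:", best_processor_count(coeffs, 400, [1, 2, 4, 8, 16, 32, 64]))
```

Fitting noise-free synthetic timings generated from T = 1 + 10⁻⁶·n³/p + 10⁻³·n·p recovers those coefficients, and for n = 400 the model picks 16 processors from the power-of-two candidates. The columns are scaled before calling `numpy.linalg.lstsq` because the basis terms differ by several orders of magnitude.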
Grid Middleware
• Created a grid middleware for parallel multi-phase applications with rescheduling capabilities
• Successfully ran multi-phase applications on a grid consisting of multiple batch and interactive clusters at two geographically distributed sites
• Also created a grid middleware for multi-component applications that coordinates the execution of the components on the different systems
[Figures: Grid Middleware for Multi-Component Applications; Grid Middleware for Multi-Phase Applications]
Other Research
• Checkpointing interval selection
  • For efficient execution in the presence of failures
  • A Markov model with 3 kinds of states for performance prediction
  • Extensive simulations with 9 years of real supercomputer failure traces on 8 parallel systems, 3 rescheduling policies, and 3 parallel applications
  • The checkpointing intervals chosen by our model let the applications perform a large amount of useful work in the presence of failures
• Compiler-aided checkpointing instrumentation
  • A source-to-source precompiler that automatically inserts checkpointing calls
  • Performs live-variable analysis to determine the data to checkpoint, and uses wrappers to find data sizes
  • Can handle parallel applications with block distribution (e.g., molecular dynamics)
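As a point of comparison for checkpoint interval selection, Young's classic first-order approximation chooses the interval T ≈ √(2·C·M) for checkpoint cost C and mean time between failures M. This is a well-known baseline, not the Markov model described on the slide; it assumes exponentially distributed failures and a fixed checkpoint cost, and the helper names are hypothetical.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval, checkpoint_cost, mtbf):
    """First-order fraction of time lost: checkpoint overhead C/T plus,
    on average, half an interval of recomputation per failure, T/(2M)."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    C, M = 5.0, 1000.0  # minutes; illustrative values only
    T = young_interval(C, M)
    for t in (T / 2, T, 2 * T):
        print(f"interval {t:6.1f} min -> waste {waste_fraction(t, C, M):.3f}")
```

Minimizing C/T + T/(2M) over T gives exactly √(2CM). With a 5-minute checkpoint and a 1000-minute MTBF the interval is 100 minutes and the first-order waste is 10%; halving or doubling the interval raises it to 12.5%.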
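The live-variable analysis mentioned for the precompiler decides which variables must be saved at a checkpoint location: only those that may still be read later. Below is a minimal sketch of the standard backward dataflow computation over a hand-written control-flow graph; the graph, its statements and the `live_variables` helper are hypothetical, whereas a real source-to-source tool would build the graph from parsed source code.

```python
def live_variables(cfg):
    """cfg maps node -> (defs, uses, successors). Returns the set of variables
    live on entry to each node, via backward iterative dataflow:
    live_in[n] = uses[n] | (live_out[n] - defs[n]),
    live_out[n] = union of live_in[s] over successors s."""
    live_in = {node: set() for node in cfg}
    changed = True
    while changed:  # repeat until no set changes (fixed point)
        changed = False
        for node, (defs, uses, succs) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = set(uses) | (live_out - set(defs))
            if new_in != live_in[node]:
                live_in[node] = new_in
                changed = True
    return live_in


# Hypothetical time-stepping loop, one statement per node:
CFG = {
    0: (["steps"], [], [1]),          # steps = 100
    1: (["x"], [], [2]),              # x = init()
    2: (["i"], [], [3]),              # i = 0
    3: ([], ["i", "steps"], [4, 7]),  # if i >= steps: goto 7
    4: (["f"], ["x"], [5]),           # f = force(x)   <- candidate checkpoint site
    5: (["x"], ["x", "f"], [6]),      # x = update(x, f)
    6: (["i"], ["i"], [3]),           # i = i + 1; goto 3
    7: ([], ["x"], []),               # print(x)
}

if __name__ == "__main__":
    live = live_variables(CFG)
    print("save at node 4:", sorted(live[4]))
```

At the top of the loop body (node 4) the live set is {x, i, steps}, so a checkpoint call inserted there would save those three variables and skip the temporary f, which is recomputed before it is read.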
Summary
• Primary endeavor: aiding scientific advancement in different domain areas using grid systems
• Grid research in two different application areas that yielded significant benefits for the applications
• Contributed novel scheduling and rescheduling algorithms, performance modeling strategies and robust grid middleware for use by the scientific community
Areas of Collaboration
• Scalability of large-scale and petascale applications
• Fault tolerance in high performance systems
• Setting up Indo-US grids
• Grid middleware collaborations
Thank You