GRID activities at MTA SZTAKI
Peter Kacsuk
MTA SZTAKI Laboratory of Parallel and Distributed Systems
www.lpds.sztaki.hu
Contents
• SZTAKI participation in EU and Hungarian Grid projects
• P-GRADE (Parallel Grid Run-time and Application Development Environment)
• Integration of P-GRADE and Condor
• TotalGrid
• Meteorology application by TotalGrid
• Future plans
Hungarian and international GRID projects
[Slide diagram] Hungarian projects: VISSZKI (Globus test, Condor test), DemoGrid (file system, monitoring, applications), SuperGrid (P-GRADE, portal, security, accounting); related international efforts: EU DataGrid, CERN LHC Grid, Cactus, Condor, APART-2, EU GridLab, EU COST SIMBEX.
EU Grid projects of SZTAKI
• DataGrid – application performance monitoring and visualization
• GridLab – grid monitoring and information system
• APART-2 – leading the Grid performance analysis WP
• SIMBEX – developing a European metacomputing system for chemists based on P-GRADE
Hungarian Grid projects of SZTAKI
• VISSZKI – explore and adopt Globus and Condor
• DemoGrid – grid and application performance monitoring and visualization
• SuperGrid (Hungarian Supercomputing Grid) – integrating P-GRADE with Condor and Globus in order to provide a high-level program development environment for the Grid
• ChemistryGrid (Hungarian Chemistry Grid) – developing chemistry Grid applications in P-GRADE
• JiniGrid (Hungarian Jini Grid) – combining P-GRADE with Jini and Java; creating the OGSA version of P-GRADE
• Hungarian Cluster Grid Initiative – providing a nation-wide cluster Grid for universities
Structure of the Hungarian Supercomputing Grid
[Slide diagram] Sites connected by the 2.5 Gb/s Internet backbone:
• NIIFI: 2×64-processor Sun E10000
• ELTE: 16-processor Compaq AlphaServer
• BME: 16-processor Compaq AlphaServer
• SZTAKI: 58-processor cluster
• University (ELTE, BME) clusters
The Hungarian Supercomputing GRID project
[Slide diagram] Layered architecture:
• GRID applications
• Web based GRID access: GRID portal
• High-level parallel development layer: P-GRADE
• Low-level parallel development: PVM, MW, MPI
• Grid level job management: Condor-G
• Grid middleware: Globus
• Grid fabric: Condor / SGE on the Compaq AlphaServers, the SUN HPC machine and the clusters
Distributed supercomputing: P-GRADE
• P-GRADE (Parallel Grid Run-time and Application Development Environment): a highly integrated parallel Grid application development system
• Provides:
  • parallel, supercomputing programming for the Grid
  • fast and efficient development of Grid programs
  • observation and visualization of Grid programs
  • fault and performance analysis of Grid programs
• Further development in the Hungarian Supercomputing Grid, Hungarian Chemistry Grid and Hungarian Jini Grid projects
Communication Templates
• Pre-defined regular process topologies: process farm, pipeline, 2D mesh
• The user defines only the representative processes and the actual size
• Automatic scaling
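A minimal sketch of the process-farm template idea, using Python's multiprocessing as a stand-in for the PVM/MPI code P-GRADE actually generates: a single representative worker is written once and replicated to the requested size. The function names and the placeholder computation are illustrative only.

```python
# Minimal process-farm sketch: a "representative" worker is replicated to the
# requested size, mirroring the idea behind P-GRADE's communication templates.
# Python multiprocessing stands in for PVM/MPI message passing.
from multiprocessing import Process, Queue

def worker(task_q: Queue, result_q: Queue) -> None:
    """Representative worker process: pull tasks until a sentinel arrives."""
    while True:
        task = task_q.get()
        if task is None:           # sentinel: no more work
            break
        result_q.put(task * task)  # placeholder computation

def process_farm(tasks, size=4):
    """Scale the representative worker to 'size' instances (automatic scaling)."""
    task_q, result_q = Queue(), Queue()
    workers = [Process(target=worker, args=(task_q, result_q)) for _ in range(size)]
    for w in workers:
        w.start()
    for t in tasks:
        task_q.put(t)
    for _ in workers:              # one sentinel per worker
        task_q.put(None)
    results = [result_q.get() for _ in tasks]
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    print(process_farm(range(10), size=4))
```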
Macrostep Debugging
• Support for systematic debugging to handle the non-deterministic behaviour of parallel applications
• Automatic deadlock detection
• Replay technique with collective breakpoints
• Systematic and automatic generation of execution trees
• Testing parallel programs for every timing condition
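The following is a small, hedged illustration of the replay idea behind macrostep debugging: the nondeterministic order in which messages from concurrent senders are consumed is recorded in one run and then enforced in later runs, so a debugging session can be repeated deterministically. Python threads and queues stand in for GRAPNEL processes and channels; nothing here reflects P-GRADE's actual implementation of collective breakpoints or execution trees.

```python
# Record/replay sketch: record the receive order once, then enforce it so every
# debugging run of the same program behaves identically.
import queue
import threading
from collections import defaultdict, deque

def run(senders, order=None):
    """If order is None, record the receive order; otherwise replay it."""
    q = queue.Queue()
    for sid, msgs in senders.items():
        threading.Thread(target=lambda s=sid, m=msgs: [q.put((s, x)) for x in m]).start()
    total = sum(len(m) for m in senders.values())
    received, buffered = [], defaultdict(deque)
    pending = deque(order) if order is not None else None
    while len(received) < total:
        if pending is not None:
            want = pending[0]
            if buffered[want]:                  # wanted message already arrived
                received.append((want, buffered[want].popleft()))
                pending.popleft()
                continue
            sid, msg = q.get()
            if sid == want:
                received.append((sid, msg))
                pending.popleft()
            else:
                buffered[sid].append(msg)       # hold out-of-order message
        else:
            received.append(q.get())            # record mode: accept any order
    return received

if __name__ == "__main__":
    senders = {"A": [1, 2], "B": [10, 20]}
    first = run(senders)                        # nondeterministic order, recorded
    order = [sid for sid, _ in first]
    assert run(senders, order) == first         # deterministic replay
```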
GRM semi-on-line monitor
• Monitoring and visualising parallel programs at the GRAPNEL level
• Evaluation of long-running programs based on semi-on-line trace collection
• Support for the debugger in P-GRADE through execution visualisation
• Collection of both statistics and event traces
• Application monitoring and visualization in the Grid
• No loss of trace data at program abortion: the execution can be visualised up to the point of abortion
PROVE Statistics Windows
• Profiling based on counters
• Enables the analysis of very long-running programs
PROVE: Visualization of Event Traces
• User controlled focus on processors, processes and messages
• Scrolling visualization windows forward and backwards
Features of P-GRADE
• Designed for non-specialist programmers
• Enables fast reengineering of sequential programs for parallel computers and Grid systems
• Unified graphical support in program design, debugging and performance analysis
• Portability on supercomputers, heterogeneous clusters and components of the Grid
• Two execution modes: interactive mode and job mode
P-GRADE Interactive Mode
[Slide diagram] Typical usage on supercomputers or clusters: the development cycle through the Design/Edit, Compile, Map, Debug, Monitor and Visualize stages.
P-GRADE Job Mode with Condor
[Slide diagram] Typical usage on clusters or in the Grid: the Design/Edit, Compile, Condor Map and Submit job steps, with Attach/Detach between P-GRADE and Condor.
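As an illustration of the job-mode hand-off, the sketch below writes a standard Condor submit description and passes it to condor_submit, after which the submitter can detach and re-attach later to collect results. The file layout, the vanilla universe and the executable name are illustrative assumptions, not P-GRADE's actual generated job description (the demos in this talk use Condor-PVM).

```python
# Hedged sketch of submitting a generated executable as a Condor job.
import subprocess
from pathlib import Path

def submit_to_condor(executable: str, args: str = "", log_dir: str = "logs") -> None:
    Path(log_dir).mkdir(exist_ok=True)
    submit_file = Path(f"{Path(executable).name}.submit")
    submit_file.write_text(
        f"""universe   = vanilla
executable = {executable}
arguments  = {args}
output     = {log_dir}/job.out
error      = {log_dir}/job.err
log        = {log_dir}/job.log
queue
"""
    )
    # condor_submit queues the job; the submitter can now "detach".
    subprocess.run(["condor_submit", str(submit_file)], check=True)

if __name__ == "__main__":
    submit_to_condor("./meander_job", args="input.nc")  # hypothetical executable
```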
Condor/P-GRADE on the whole range of parallel and distributed systems
[Slide diagram] P-GRADE with Condor spans supercomputers and mainframes (GFlops machines), clusters and the Grid, with Condor flocking between the Condor pools.
Berlin CCGrid Grid Demo workshop: flocking of P-GRADE programs by Condor
[Slide diagram] The same P-GRADE program runs at the Budapest, Madison and Westminster clusters (nodes n0/n1, p0/p1, m0/m1), coordinated from P-GRADE in Budapest.
Next step: check-pointing and migration of P-GRADE programs
[Slide diagram] (1) The P-GRADE program is downloaded to London as a Condor job; (2) it runs at the London cluster; (3) the London cluster becomes overloaded, triggering check-pointing; (4) the program migrates to Budapest as a Condor job and continues there. The P-GRADE GUI runs at Wisconsin.
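To make the check-pointing step concrete, here is a simplified, application-level sketch: an iterative computation periodically saves its state to a file, so the job can be killed on an overloaded host and resumed elsewhere from the last checkpoint. Condor's check-pointing of P-GRADE programs is transparent and system-level; the file name and state layout below are purely illustrative.

```python
# Application-level checkpoint/restart illustration (not Condor's mechanism).
import os
import pickle

CHECKPOINT = "state.ckpt"

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)       # resume after migration
    return {"step": 0, "total": 0.0}    # fresh start

def save_state(state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)         # atomic replace to avoid torn checkpoints

def run(steps=1_000_000, ckpt_every=100_000):
    state = load_state()
    for step in range(state["step"], steps):
        state["total"] += step * 1e-6   # placeholder work
        if (step + 1) % ckpt_every == 0:
            state["step"] = step + 1
            save_state(state)           # this file could be shipped to another cluster
    return state["total"]

if __name__ == "__main__":
    print(run())
```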
Further development: TotalGrid
• TotalGrid is a total Grid solution that integrates the different software layers of a Grid (see next slide) and provides companies and universities with:
  • exploitation of the free cycles of desktop machines in a Grid environment outside working hours
  • supercomputer capacity built from the institution's existing desktops, without further investment
  • development and testing of Grid programs
Layers of TotalGrid
[Slide diagram] From top to bottom: P-GRADE, PERL-GRID, Condor or SGE, PVM or MPI, Internet, Ethernet.
PERL-GRID
• A thin layer for:
  • Grid level job management between P-GRADE and various local job managers (Condor, SGE, etc.)
  • file staging
• Applied in the Hungarian Cluster Grid
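A rough sketch of what such a thin layer has to do, written in Python for consistency with the other examples (the name PERL-GRID suggests the real layer is a Perl script): stage the input files into a spool directory, hand the job to the local job manager, and stage the results back. The helper names, directory layout and the use of condor_submit are assumptions; SGE would be driven through qsub instead.

```python
# Thin grid-level layer sketch: stage in, submit locally, stage out.
import shutil
import subprocess
from pathlib import Path

def stage_in(files, spool_dir: Path) -> None:
    spool_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(f, spool_dir)        # a real grid layer would do a remote copy

def submit_local(submit_file: Path) -> None:
    # Hand over to the local job manager; SGE would use qsub instead.
    subprocess.run(["condor_submit", submit_file.name],
                   check=True, cwd=submit_file.parent)

def stage_out(spool_dir: Path, results_dir: Path, pattern="*.out") -> None:
    results_dir.mkdir(parents=True, exist_ok=True)
    for f in spool_dir.glob(pattern):
        shutil.copy(f, results_dir)

if __name__ == "__main__":
    spool = Path("spool/job-0001")
    stage_in(["input.nc", "job.submit"], spool)   # example inputs (must exist)
    submit_local(spool / "job.submit")
    stage_out(spool, Path("results"))
```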
Hungarian Cluster Grid Initiative
• Goal: to connect the 99 new clusters of the Hungarian higher education institutions into a Grid
• Each cluster contains 20 PCs and a network server PC
• Day-time: the components of the clusters are used for education
• At night: all the clusters are connected to the Hungarian Grid by the Hungarian academic network (2.5 Gbit/s)
• Total Grid capacity by the end of 2003: 2079 PCs
• Current status:
  • about 400 PCs already connected at 8 universities
  • Condor-based Grid system
  • VPN (Virtual Private Network)
  • open Grid: other clusters can join at any time
Structure of the Hungarian Cluster Grid
[Slide diagram] In 2003: 99 Linux clusters of 21 PCs each (2079 PCs in total), each running Condor => TotalGrid, connected by the 2.5 Gb/s Internet backbone.
Live demonstration of TotalGrid
• MEANDER Nowcast Program Package:
  • Goal: ultra-short-range forecasting (30 minutes) of dangerous weather situations (storms, fog, etc.)
  • Method: analysis of all available meteorological information to produce parameters on a regular mesh (10 km -> 1 km)
• Collaborating partners: OMSZ (Hungarian Meteorological Service) and MTA SZTAKI
Structure of MEANDER
[Slide diagram] Inputs: ALADIN first-guess data, SYNOP data, satellite, radar and lightning observations. Processing steps on the GRID: CANARI analysis, delta analysis, decoding, radar-to-grid and satellite-to-grid conversion, rainfall state, visibility, overcast and cloud-type calculation for the current time. Basic fields: pressure, temperature, humidity, wind. Derived fields: type of clouds, visibility, etc. Visualization: GIF for users, HAWK for meteorologists.
P-GRADE version of MEANDER
[Slide diagram] The graphical process structure of MEANDER in P-GRADE, with replicated process groups (25x, 10x, 25x, 5x).
Live demo of MEANDER based on TotalGrid
[Slide diagram] Jobs and netCDF input/output files flow between P-GRADE, PERL-GRID, ftp.met.hu (HAWK visualization) and the executing cluster over 11/5 Mbit dedicated, 34 Mbit shared and 512 kbit shared links; at the cluster, PERL-GRID hands the job to Condor-PVM for parallel execution.
Results of the delta method
• Temperature fields at 850 hPa pressure
• Wind speed and direction on the 3D mesh of the MEANDER system
On-line Performance Visualization in TotalGrid
[Slide diagram] The same TotalGrid setup as in the live demo, extended with GRM: GRM trace files are collected during the Condor-PVM parallel execution under PERL-GRID and transferred back over the 11/5 Mbit dedicated, 34 Mbit shared and 512 kbit shared links for on-line visualization.
P-GRADE: Software Development and Execution
[Slide diagram] Editing and debugging, performance analysis, and execution on the Grid within one environment.
Applications in P-GRADE
• Completed applications:
  • Meteorology: Nowcast package (Hungarian Meteorological Service)
  • Urban traffic simulation (Univ. of Westminster)
• Applications under development:
  • chemistry applications
  • smog forecast system
  • analysis of smog alarm strategies
Further extensions of P-GRADE
• Automatic check-pointing of parallel applications inside a cluster (already prototyped), providing:
  • dynamic load balancing
  • a fault-tolerant execution mechanism
• Automatic check-pointing of parallel applications in the Grid (under development), providing:
  • automatic application migration in the Grid
  • a fault-tolerant execution mechanism in the Grid
  • saving of unfinished parallel jobs of the Cluster Grid
• Extensions under design:
  • parameter study support
  • connecting P-GRADE with GAT
  • workflow layer for complex Grid applications
Workflow interpretation of MEANDER
[Slide diagram] The MEANDER computation decomposed into five jobs (1st to 5th) executed as a workflow.
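As a sketch of what a workflow layer over such a decomposition could look like, the snippet below executes five jobs in an order respecting their data dependencies. The job names and the dependency edges are hypothetical (the slide only shows that MEANDER is split into five jobs); a real workflow layer would submit Grid jobs instead of running local placeholder commands.

```python
# Toy workflow executor: run dependent jobs in topological order.
from graphlib import TopologicalSorter   # Python 3.9+
import subprocess

# job -> set of jobs it depends on (hypothetical edges)
WORKFLOW = {
    "decode": set(),
    "delta":  {"decode"},
    "radar":  {"decode"},
    "visib":  {"delta", "radar"},
    "satel":  {"delta"},
}

# Placeholder commands; a real layer would submit each as a Grid job.
COMMANDS = {name: ["echo", f"running {name}"] for name in WORKFLOW}

def run_workflow():
    for job in TopologicalSorter(WORKFLOW).static_order():
        print(f"submitting {job}")
        subprocess.run(COMMANDS[job], check=True)

if __name__ == "__main__":
    run_workflow()
```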
Conclusions
• SZTAKI participates in the largest EU Grid projects and in all the Hungarian Grid projects
• Main results:
  • P-GRADE (SuperGrid project)
  • integration of P-GRADE and Condor (SuperGrid) – demonstrated at Berlin CCGrid
  • TotalGrid (Hungarian Cluster Grid)
  • meteorology application in the Grid based on the P-GRADE and TotalGrid approaches – demonstrated at the 5th EU DataGrid conference
• P-GRADE 8.2.2 is available at www.lpds.sztaki.hu
Thanks for your attention!
Further information: www.lpds.sztaki.hu
GRM semi-on-line monitor
• Semi-on-line:
  • stores trace events in local storage (off-line)
  • makes them available for analysis at any time during execution, on user or system request (on-line pull model)
• Advantages:
  • the state (performance) of the application can be analysed at any time
  • scalability: trace data can be analysed in smaller sections and deleted when no longer needed
  • less overhead/intrusion on the execution system than with on-line collection (cf. NetLogger)
  • less network traffic: pull model instead of push model; collection is initiated only from the top
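A minimal sketch of the semi-on-line, pull-model idea: events are buffered on local storage and only shipped when the collector explicitly asks for them, after which the analysed section can be discarded. A file and a single process stand in for the Local Monitor / Main Monitor pair; the class and field names are illustrative, not GRM's protocol.

```python
# Semi-on-line trace collection sketch: buffer locally, ship on pull request.
import json
import time
from pathlib import Path

class LocalMonitor:
    """Buffers events on local storage instead of streaming them (push model)."""
    def __init__(self, path="lm_trace.log"):
        self.path = Path(path)
        self.path.write_text("")

    def record(self, event: dict) -> None:
        with self.path.open("a") as f:
            f.write(json.dumps({"t": time.time(), **event}) + "\n")

    def pull(self) -> list:
        """Called by the main monitor: return buffered events and truncate."""
        events = [json.loads(line) for line in self.path.read_text().splitlines()]
        self.path.write_text("")            # analysed sections can be deleted
        return events

if __name__ == "__main__":
    lm = LocalMonitor()
    for i in range(3):
        lm.record({"event": "block_begin", "process": i})
    # The main monitor decides when to collect (user or system request):
    print(lm.pull())       # first collection
    lm.record({"event": "block_end", "process": 0})
    print(lm.pull())       # a later collection sees only the new section
```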
GRM/PROVE in the DataGrid project
• Basic tasks:
  • Step 1: create a GRM/PROVE version that is independent of P-GRADE and runnable in the Grid
  • Step 2: connect the GRM monitor to the R-GMA information system
GRM in the grid
[Slide diagram] The submit machine runs the Main Monitor (MM), which writes a trace file visualized by PROVE. On each site, every PC runs a Local Monitor (LM) that collects events from the application processes through shared memory. The pull model yields smaller network traffic than NetLogger, and the local monitors make the system more scalable than NetLogger.
Start-up of Local Monitors
[Slide diagram] The Main Monitor runs on a server host and publishes its host:port; the Grid broker on the submit host passes the job to a site's local job manager, which starts the application processes together with their Local Monitors on the LAN; the Local Monitors then connect back to the Main Monitor over the WAN. This mechanism is used in TotalGrid and in the live demo.
2nd step: Integration with R-GMA
[Slide diagram] The application processes running on the site machines publish monitoring data into R-GMA; the Main Monitor and PROVE on the client machine retrieve it from R-GMA.
Integration with R-GMA
[Slide diagram] The instrumented application code contains a sensor that publishes events through the Producer API to a Producer Servlet (SQL INSERT). The Main Monitor retrieves them through the Consumer API and Consumer Servlet (SQL SELECT). Registry and Schema servlets (the "database of event types") are accessed through the Registry and Schema APIs (SQL CREATE TABLE); data is exchanged as XML.
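To illustrate the relational model that R-GMA exposes to GRM, the sketch below publishes monitoring events with SQL INSERT on the producer side and retrieves them with SQL SELECT on the consumer side. An in-memory SQLite database stands in for the Producer, Consumer, Registry and Schema servlets, and the table name and columns are invented for the example, not the DataGrid schema.

```python
# Relational publish/query sketch; SQLite stands in for the R-GMA servlets.
import sqlite3
import time

conn = sqlite3.connect(":memory:")

# Schema ("database of event types") — normally registered via the Schema API.
conn.execute("""CREATE TABLE grm_event (
    job_id    TEXT,
    process   INTEGER,
    event     TEXT,
    timestamp REAL
)""")

def publish(job_id: str, process: int, event: str) -> None:
    """Producer API stand-in: publish one monitoring event."""
    conn.execute("INSERT INTO grm_event VALUES (?, ?, ?, ?)",
                 (job_id, process, event, time.time()))

def consume(job_id: str):
    """Consumer API stand-in: the Main Monitor pulls events for one job."""
    return conn.execute(
        "SELECT process, event, timestamp FROM grm_event "
        "WHERE job_id = ? ORDER BY timestamp", (job_id,)).fetchall()

if __name__ == "__main__":
    publish("meander-42", 0, "block_begin")
    publish("meander-42", 1, "block_begin")
    publish("meander-42", 0, "block_end")
    for row in consume("meander-42"):
        print(row)
```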