General Grid Monitoring Infrastructure (GGMI)
Peter Kacsuk and Norbert Podhorszki, MTA SZTAKI
General Grid Monitoring Infrastructure (GGMI)
• Grid Status Monitoring Infrastructure, GSMI (R-GMA)
• Grid Application Monitoring Infrastructure, GAMI
• Tools: PULSE, PROVE, GRM, R-GMA Browser
Performance comparison of GSMI and GAMI
• Performance measurement
  • A loop instrumented with GRM
  • A loop of N iterations generates 2N+2 events (loop begin + loop end per iteration, plus start + exit)
• Test configurations
  • 1 machine
  • 3 machines
    • M1: Producer and ProducerServlet
    • M2: all the other servlets, including ConsumerServlet
    • M3: Consumer
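The event count on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `expected_events` is made up for illustration and is not part of GRM.

```python
def expected_events(n_iterations: int) -> int:
    """Events emitted by the instrumented test loop.

    Each iteration emits a loop-begin and a loop-end event; the program
    additionally emits one start and one exit event.
    """
    per_iteration = 2  # loop begin + loop end
    fixed = 2          # start + exit
    return per_iteration * n_iterations + fixed


# The loop sizes mentioned in the slides.
for n in (1000, 10_000, 100_000):
    print(n, expected_events(n))
```

For N=1000 this gives 2002 events, which matches the total quoted in the GSMI results.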
Performance of GSMI
• AppTotal = time between inserting the first and the last event
• Total = time between inserting the first event and receiving the last event in the Consumer
• On 3 machines with N=1000, R-GMA starts losing events: only 1848 and 1780 of the 2002 events were received
• For N=10,000 the test never finished (it ran for at least one night)
Performance of GAMI
• AppTotal = time between inserting the first and the last event
• Total = time between inserting the first event and receiving the last event in the Consumer
• No events lost
• Linear scaling
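The two timing metrics and the event loss reported for GSMI can be expressed as follows. This is a minimal sketch: the `RunLog` structure and the timestamps in the usage lines are hypothetical, and only the received-event counts (1848 and 1780 of 2002) come from the slides.

```python
from dataclasses import dataclass


@dataclass
class RunLog:
    """Hypothetical record of one test run (timestamps in seconds)."""
    insert_times: list[float]   # when the application inserted each event
    receive_times: list[float]  # when the Consumer received each event


def app_total(run: RunLog) -> float:
    """AppTotal: time between inserting the first and the last event."""
    return run.insert_times[-1] - run.insert_times[0]


def total(run: RunLog) -> float:
    """Total: time from inserting the first event to receiving the last one."""
    return run.receive_times[-1] - run.insert_times[0]


def loss_fraction(sent: int, received: int) -> float:
    """Fraction of events that never reached the Consumer."""
    return (sent - received) / sent


# Made-up timestamps, only to show how the metrics are computed.
run = RunLog(insert_times=[0.0, 1.0, 2.5], receive_times=[0.25, 1.5, 3.0])
print(app_total(run), total(run))

# Received counts reported for R-GMA on 3 machines with N=1000.
for received in (1848, 1780):
    print(received, round(100 * loss_fraction(2002, received), 1), "% lost")
```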
[Charts: R-GMA vs GAMI on 3 machines; GAMI on 1 machine vs 3 machines. Highlighted values on the charts: 10,000 and 100,000.]
GAMI structure
• Local host: PROVE
• Site 1: Main Monitor (MM)
  • Host 1: Local Monitor (LM), application processes
  • Host 2: Local Monitor (LM), application processes
• Site 2: Main Monitor (MM)
  • Host 1: Local Monitor (LM), application processes
GAMI
• Goal: deliver trace data from the application to the user efficiently
• Uses TCP socket communication
• Data is in XDR format and could be optimised for TCP transmission
• Two software hops between the application and GRM: the local and the main monitors
• One hardware hop: the host of the main monitor
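To make the XDR point concrete, the sketch below packs a trace event using XDR conventions (big-endian 4-byte integers, 8-byte IEEE doubles, strings padded to a multiple of 4 bytes). The record layout (event type, process id, timestamp, label) is an assumption chosen for illustration; the slides do not describe GRM's actual wire format.

```python
import struct


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, the bytes, then zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_event(event_type: int, pid: int, timestamp: float, label: str) -> bytes:
    """Encode a hypothetical trace event: two unsigned ints, a double, a string."""
    return struct.pack(">IId", event_type, pid, timestamp) + xdr_string(label)


def decode_event(buf: bytes) -> tuple[int, int, float, str]:
    """Inverse of encode_event."""
    event_type, pid, timestamp = struct.unpack_from(">IId", buf, 0)
    (length,) = struct.unpack_from(">I", buf, 16)
    label = buf[20:20 + length].decode("ascii")
    return event_type, pid, timestamp, label


message = encode_event(1, 42, 0.5, "loop_begin")
print(len(message), decode_event(message))
```

A buffer like `message` could then be written to a TCP socket with `socket.sendall`; batching several events per write is one way such a stream might be optimised for TCP.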
Steps of application monitoring
• Step 1: the user submits a job (and gets a GID from the broker)
• Step 2: the user starts PROVE with the GID as parameter
• Step 3: PROVE looks up the execution site (search in R-GMA)
• Step 4: PROVE looks up the address of the GAMI Main Monitor of the execution site (search in R-GMA)
• Step 5: PROVE subscribes to the application trace at the GAMI Main Monitor
• Step 6: the GAMI Main Monitor associates the application job id (GID) with the Unix process ids
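Steps 3 to 6 above can be sketched as a small in-memory model. Everything here is hypothetical: the dictionaries stand in for records published in R-GMA, and the `MainMonitor` class, host name, port, GID and PIDs are invented for illustration rather than taken from GAMI.

```python
from typing import Callable

# Hypothetical stand-ins for information published in R-GMA.
JOB_SITE = {"gid-001": "siteY"}                            # GID -> execution site
MAIN_MONITOR_ADDR = {"siteY": ("mm.siteY.example", 5000)}  # site -> Main Monitor address


class MainMonitor:
    """Toy Main Monitor: tracks subscribers and the GID -> PID association."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[str], None]]] = {}
        self.pids: dict[str, list[int]] = {}

    def subscribe(self, gid: str, on_event: Callable[[str], None]) -> None:
        """Step 5: a tool such as PROVE subscribes to the trace of a job."""
        self.subscribers.setdefault(gid, []).append(on_event)

    def associate(self, gid: str, pids: list[int]) -> None:
        """Step 6: record which Unix processes belong to the job."""
        self.pids[gid] = list(pids)

    def deliver(self, gid: str, event: str) -> None:
        """Forward one trace event to every subscriber of the job."""
        for callback in self.subscribers.get(gid, []):
            callback(event)


def find_main_monitor(gid: str) -> tuple[str, tuple[str, int]]:
    site = JOB_SITE[gid]                   # step 3: where is the job running?
    return site, MAIN_MONITOR_ADDR[site]   # step 4: which monitor serves that site?


received: list[str] = []
site, address = find_main_monitor("gid-001")
monitor = MainMonitor()
monitor.subscribe("gid-001", received.append)
monitor.associate("gid-001", [4711, 4712])
monitor.deliver("gid-001", "loop_begin")
print(site, address, received)
```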
Problems of Application Monitoring
• Problem 1: PROVE has to find the execution site of the application
  • Where is it running? -> machineX.siteY
• Problem 2: PROVE has to find the monitor to connect to
  • What is the address of the GAMI Main Monitor running at siteY?
• Problem 3: the GAMI Main Monitor has to find the application
  • What processes (PIDs) belong to application GID?
• Solution: the information needed to solve these problems should be published in R-GMA => integration of R-GMA and GAMI is needed
Problems
• Problem 1: PROVE has to find the execution site of the application
  • Where is it running? -> machineX.siteY
  • Broker -> R-GMA (discussion with WP1)
• Problem 2: PROVE has to find the monitor to connect to
  • What is the address of the GAMI Main Monitor running at siteY?
  • GAMI Main Monitor -> R-GMA
Problems
• Problem 3: the GAMI Main Monitor has to find the application
  • What processes (PIDs) belong to application GID?
  • Problem to be solved: 5 levels of job/process ids
    • GID (generated by the resource broker)
    • Condor-G ID
    • GRAM ID
    • Local job manager ID
    • Process ID
  • Discussion with WP1
Temporary solution for the 3rd problem
• The user defines a unique id for the application
• The application process publishes this id to the GAMI Local Monitor
• PROVE will use this id for collecting trace data
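The temporary solution can be pictured as a registry kept by the Local Monitor. The sketch below is an illustration under assumed names (`LocalMonitor`, `register`, `record`, `trace_for`, the id "my-app-run-7" and the PIDs); the slides do not specify how GAMI stores this association.

```python
class LocalMonitor:
    """Toy Local Monitor keyed by a user-defined application id."""

    def __init__(self) -> None:
        self._pids_by_app: dict[str, set[int]] = {}
        self._trace_by_pid: dict[int, list[str]] = {}

    def register(self, app_id: str, pid: int) -> None:
        """Called by an application process to publish its user-defined id."""
        self._pids_by_app.setdefault(app_id, set()).add(pid)

    def record(self, pid: int, event: str) -> None:
        """Store one trace event produced by a process."""
        self._trace_by_pid.setdefault(pid, []).append(event)

    def trace_for(self, app_id: str) -> dict[int, list[str]]:
        """What a tool such as PROVE would request: all traces for the user-defined id."""
        pids = sorted(self._pids_by_app.get(app_id, ()))
        return {pid: list(self._trace_by_pid.get(pid, [])) for pid in pids}


local_monitor = LocalMonitor()
local_monitor.register("my-app-run-7", 4711)
local_monitor.register("my-app-run-7", 4712)
local_monitor.record(4711, "start")
local_monitor.record(4712, "start")
local_monitor.record(4711, "exit")
print(local_monitor.trace_for("my-app-run-7"))
```

Because the user chooses the id, the five-level chain from GID down to process id does not have to be resolved for this lookup.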
User support tools
• General Grid Monitoring Infrastructure (GGMI)
  • Grid Status Monitoring Infrastructure, GSMI (R-GMA)
  • Grid Application Monitoring Infrastructure, GAMI
• Tools: PULSE, PROVE, R-GMA Browser, GRM
Tools
• Pulse:
  • Analysis and presentation of Grid performance data
• R-GMA Browser:
  • Web-based browser for the available schemas and producers within R-GMA
• GRM:
  • Instrumentation library for trace collection
  • On-line and off-line monitoring of sequential and MPI applications
• PROVE:
  • On-line and off-line visualization of traces for sequential and MPI applications
Documents and reports
• User's Manual for the stand-alone GRM/PROVE
• GRM/PROVE User's Guide
• Versions of GRM
• Performance Monitoring, Analysis and Presentation for Grid Applications
• Technical report about GRM within the EU-DataGrid project
• http://www.lpds.sztaki.hu/~pnorbert/grm/
Publications
• From Cluster Monitoring to Grid Monitoring Based on GRM
  • EuroPar’2001, Manchester
• Application Monitoring in the Grid with GRM and PROVE
  • Proc. of the International Conference on Computational Science - ICCS 2001, San Francisco
• Presentation and Analysis of Grid Performance Data
  • EuroPar'2003, Klagenfurt
• Pulse: A Tool for Presentation and Analysis of Grid Performance Data
  • MIPRO'2003, Opatija
• http://www.lpds.sztaki.hu/~pnorbert/edg/publications/
Summary
• Advantages of the concept:
  • Gives a full Grid monitoring infrastructure including both
    • Status monitoring
    • Application monitoring
  • Supports on-line and off-line MPI application monitoring and visualization
  • Increases the chance that it will be used by LCG-2
  • No special or extra requirement for R-GMA
    • Integration will be done by SZTAKI
  • Gives the potential of competing with the US solutions
    • Already two prestigious papers, at EuroPar’01 and EuroPar’03
    • Further potential publication in JOGC