
Software Implementation

Junwei Cao (曹军威) and Junwei Li (李俊伟), Tsinghua University. Gravitational Wave Summer School, Kunming, China, July 2009.


Presentation Transcript


  1. Software Implementation
     Junwei Cao (曹军威) and Junwei Li (李俊伟)
     Tsinghua University
     Gravitational Wave Summer School
     Kunming, China, July 2009

  2. Computing Challenges
     • Large amount of data
     • Parallel and distributed processing
     • Clusters and grids
       • Clusters: collection of workstations
       • Grids: collection of clusters/supercomputers
     • Software becomes the key
     • LIGO data analysis software working group (DASWG)
     • Future computing infrastructures

  3. The LSC Data Grid (LDG)
     [Figure: map of LDG sites. Labels recoverable from the slide: Birmingham, Cardiff, AEI/Golm]

  4. The LDG Software Stack
     • End users & applications: LDAS, DMT, LALApps, Matlab
     • Application enabling / LSC job management: LDM, LDGreport, Glue, Onasys
     • LSC data management: LDR, LSCdataFind, LSCsegFind
     • LSC security management: LSCcertUtils, LSC CA
     • The LSC Data Grid Client/Server Environment, Version 4.0 (based on VDT 1.3.9)
     • Middleware / services:
       • Condor-G
       • Workflow management / Condor DAGman
       • VDS
       • VOMS
       • Catalog service / Globus Resource location service / Globus Metadata service
       • Resource management / Globus GRAM
       • Data transfer / GridFTP
       • Job scheduling / Condor
       • Grid security / Globus GSI
     • Operating systems and …: FC4, GCC, Python, Autotools, MySQL
     • Diagram side labels: Applications, Infrastructures

  5. The LDG 4.0 Client/Server
     • Written in Pacman 3
     • Based on VDT 1.3.9
     • Supports LDG:Client and LDG:ClientPro
     • Supports multiple platforms: FC4, Solaris and Darwin on the client, FC4 only on the server
     • Supports both 32-bit and 64-bit machines
     • Server includes client
     • Online documentation for step-by-step installation

     Pacman excerpt:
       ……
       platformGE( 'Linux' );
       package( 'Client-Environment' );
       cd( 'vdt' );
       package( 'VDT_CACHE:Globus-Client' );
       package( 'VDT_CACHE:CA-Certificates' );
       package( 'VDT_CACHE:Condor' );
       package( 'VDT_CACHE:Fault-Tolerant-Shell' );
       package( 'VDT_CACHE:GSIOpenSSH' );
       package( 'VDT_CACHE:KX509' );
       package( 'VDT_CACHE:MyProxy' );
       package( 'VDT_CACHE:PyGlobus' );
       package( 'VDT_CACHE:PyGlobusURLCopy' );
       package( 'VDT_CACHE:UberFTP' );
       package( 'VDT_CACHE:EDG-Make-Gridmap' );
       package( 'VDT_CACHE:Globus-RLS-Client' );
       package( 'VDT_CACHE:VDS' );
       package( 'VDT_CACHE:VOMS-Client' );
       cd();
       package( 'Client-FixSSH' );
       package( 'Client-RLS-Python-Client' );
       package( 'Client-Cert-Util' );
       package( 'Client-LSC-CA' );
       ……
       OR
       platformGE( 'Sun' );
       package( 'SolarisPro' );
       OR
       platformGE( 'MacOS' );
       package( 'Mac' );

  6. Data Discovery (LSCdataFind)
     Metadata (query):
       observatory = L
       dataType = RDS_R_L3
       startGPS = 751658016
       endGPS = 751658095

     LFNs (from the Metadata Service):
       L-RDS_R_L3-751658016-16.gwf
       L-RDS_R_L3-751658032-16.gwf
       L-RDS_R_L3-751658048-16.gwf
       L-RDS_R_L3-751658064-16.gwf
       L-RDS_R_L3-751658080-16.gwf

     PFNs, for both local and remote (from the Resource Location Service @ldas.mit.edu):
       file://localhost/data/node10/frame/S3/L3/LLO/L-RDS_R_L3-751658016-16.gwf
       gsiftp://ldas.mit.edu/data/node10/frame/S3/L3/LLO/L-RDS_R_L3-751658016-16.gwf
       file://localhost/data/node11/frame/S3/L3/LLO/L-RDS_R_L3-751658032-16.gwf
       gsiftp://ldas.mit.edu/data/node11/frame/S3/L3/LLO/L-RDS_R_L3-751658032-16.gwf
       file://localhost/data/node12/frame/S3/L3/LLO/L-RDS_R_L3-751658048-16.gwf
       gsiftp://ldas.mit.edu/data/node12/frame/S3/L3/LLO/L-RDS_R_L3-751658048-16.gwf
       file://localhost/data/node13/frame/S3/L3/LLO/L-RDS_R_L3-751658064-16.gwf
       gsiftp://ldas.mit.edu/data/node13/frame/S3/L3/LLO/L-RDS_R_L3-751658064-16.gwf
       file://localhost/data/node14/frame/S3/L3/LLO/L-RDS_R_L3-751658080-16.gwf
       gsiftp://ldas.mit.edu/data/node14/frame/S3/L3/LLO/L-RDS_R_L3-751658080-16.gwf
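The metadata-to-LFN step on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the LSCdataFind implementation: the helper names `parse_lfn` and `select_lfns` are hypothetical, and the only thing taken from the slide is the LFN naming pattern observatory-dataType-startGPS-duration.gwf.

```python
from typing import List, NamedTuple


class FrameFile(NamedTuple):
    observatory: str
    data_type: str
    start_gps: int
    duration: int


def parse_lfn(lfn: str) -> FrameFile:
    """Split an LFN such as 'L-RDS_R_L3-751658016-16.gwf' into its fields."""
    stem, ext = lfn.rsplit(".", 1)
    if ext != "gwf":
        raise ValueError(f"not a frame file: {lfn}")
    observatory, data_type, start, duration = stem.split("-")
    return FrameFile(observatory, data_type, int(start), int(duration))


def select_lfns(lfns: List[str], observatory: str, data_type: str,
                start_gps: int, end_gps: int) -> List[str]:
    """Return the LFNs whose time span overlaps [start_gps, end_gps]."""
    selected = []
    for lfn in lfns:
        f = parse_lfn(lfn)
        overlaps = (f.start_gps <= end_gps
                    and f.start_gps + f.duration > start_gps)
        if (f.observatory == observatory
                and f.data_type == data_type
                and overlaps):
            selected.append(lfn)
    # Order the result by GPS start time.
    return sorted(selected, key=lambda name: parse_lfn(name).start_gps)
```

With a catalog of 16-second L-RDS_R_L3 frames, a query for startGPS = 751658016 and endGPS = 751658095 selects exactly the five LFNs listed on the slide, because the frame starting at 751658096 begins after the requested end time.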

  7. Data Replication (LDR)
     • 24/7 operation
     • Sites: LHO (Tier 0), LLO (Tier 0), Caltech (Tier 1), MIT (Tier 2)
     • Components at each of Caltech and MIT: LDRMaster, mysql, LDRMetadataService, RLS, LDRSchedule, LDRTransfer (gridftp), Local Storage Module
     • Replication steps:
       1. update metadata lfn
       2. get lfn pfn@caltech(remote)
       3. generate lfn pfn@mit
       4. gridftp pfn@caltech(remote) pfn@mit(local)
       5. rlsadd lfn pfn@mit
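Steps 2 to 5 of the replication flow can be sketched as a single Python function. This is only an illustration of the control flow, not LDR code: the function name `replicate` is hypothetical, plain dictionaries stand in for the RLS catalogs, and the `transfer` callback stands in for the GridFTP copy.

```python
from typing import Callable, Dict


def replicate(lfn: str,
              remote_rls: Dict[str, str],
              local_rls: Dict[str, str],
              local_root: str,
              transfer: Callable[[str, str], None]) -> str:
    """Replicate one file and return its local PFN.

    remote_rls and local_rls map LFN -> PFN (stand-ins for RLS catalogs).
    transfer(source_pfn, dest_pfn) stands in for a GridFTP copy.
    """
    if lfn in local_rls:
        return local_rls[lfn]         # already replicated, nothing to do
    source_pfn = remote_rls[lfn]      # step 2: look up the remote PFN
    dest_pfn = f"{local_root}/{lfn}"  # step 3: generate a local PFN
    transfer(source_pfn, dest_pfn)    # step 4: copy remote -> local
    local_rls[lfn] = dest_pfn         # step 5: register the new replica
    return dest_pfn
```

Registering the replica only after the transfer returns means a failed copy never leaves a catalog entry that points at a missing file.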

  8. Data Monitoring Toolkit (DMT)
     • DMT Online Use Scenario – control-room type: Name Server, SMP, single data stream
     • DMT Offline Use Scenario – standalone or grid enabled: multiple data streams
     • DMT Monitors
     • Outputs: Stdout, Trigger files, Alarm files, Trend files, ……
     • DMT Libraries: client, gui, Server, lsmp, lmsg, MonServer, base, container, xml, sigp, html, ezcalib, xsil, dmtenv, event, trig, frameio, ……

     Example input data streams (LHO):
       /data/node10/frame/S3/L3/LHO/H-RDS_R_L3-751658016-16.gwf
       /data/node11/frame/S3/L3/LHO/H-RDS_R_L3-751658032-16.gwf
       /data/node12/frame/S3/L3/LHO/H-RDS_R_L3-751658048-16.gwf
       /data/node13/frame/S3/L3/LHO/H-RDS_R_L3-751658064-16.gwf
       /data/node14/frame/S3/L3/LHO/H-RDS_R_L3-751658080-16.gwf
       /data/node15/frame/S3/L3/LHO/H-RDS_R_L3-751658096-16.gwf
       /data/node16/frame/S3/L3/LHO/H-RDS_R_L3-751658112-16.gwf

     Example input data streams (LLO):
       /data/node10/frame/S3/L3/LLO/L-RDS_R_L3-751658016-16.gwf
       /data/node11/frame/S3/L3/LLO/L-RDS_R_L3-751658032-16.gwf
       /data/node12/frame/S3/L3/LLO/L-RDS_R_L3-751658048-16.gwf
       /data/node13/frame/S3/L3/LLO/L-RDS_R_L3-751658064-16.gwf
       /data/node14/frame/S3/L3/LLO/L-RDS_R_L3-751658080-16.gwf
       /data/node15/frame/S3/L3/LLO/L-RDS_R_L3-751658096-16.gwf
       /data/node16/frame/S3/L3/LLO/L-RDS_R_L3-751658112-16.gwf

  9. An Example DMT Monitor
     Standalone run of the rmon DMT offline monitor.

     multilist.txt:
       filelist1.txt
       filelist2.txt

     Frame list files filelist1.txt and filelist2.txt, one per data stream:
       /data/node10/frame/S3/L3/LLO/L-RDS_R_L3-751658016-16.gwf
       /data/node11/frame/S3/L3/LLO/L-RDS_R_L3-751658032-16.gwf
       /data/node12/frame/S3/L3/LLO/L-RDS_R_L3-751658048-16.gwf
       /data/node13/frame/S3/L3/LLO/L-RDS_R_L3-751658064-16.gwf
       /data/node14/frame/S3/L3/LLO/L-RDS_R_L3-751658080-16.gwf
       /data/node15/frame/S3/L3/LLO/L-RDS_R_L3-751658096-16.gwf
       /data/node16/frame/S3/L3/LLO/L-RDS_R_L3-751658112-16.gwf

       /data/node10/frame/S3/L3/LHO/H-RDS_R_L3-751658016-16.gwf
       /data/node11/frame/S3/L3/LHO/H-RDS_R_L3-751658032-16.gwf
       /data/node12/frame/S3/L3/LHO/H-RDS_R_L3-751658048-16.gwf
       /data/node13/frame/S3/L3/LHO/H-RDS_R_L3-751658064-16.gwf
       /data/node14/frame/S3/L3/LHO/H-RDS_R_L3-751658080-16.gwf
       /data/node15/frame/S3/L3/LHO/H-RDS_R_L3-751658096-16.gwf
       /data/node16/frame/S3/L3/LHO/H-RDS_R_L3-751658112-16.gwf

     opt:
       stride 16.0
       channel_1 H1:LSC-AS_Q
       channel_2 L1:LSC-AS_Q

     Terminal session:
       [jcao@ldaspc1 rmon]$ export LD_LIBRARY_PATH=/opt/lscsoft/dol/lib
       [jcao@ldaspc1 rmon]$ ./rmon -opt opt -inlists multilist.txt
       Processing multi list file: multilist.txt
       Number of lists added: 2
       Total data streams: 2
       Processing frame list file: /home/jcao/rmon/filelist1.txt
       Number of files added: 1188
       Total frame files: 1188
       Processing frame list file: /home/jcao/rmon/filelist2.txt
       Number of files added: 1188
       Total frame files: 1188
       channel[1]=H1:LSC-AS_Q
       channel[2]=L1:LSC-AS_Q
       startgps=751658000 stride=16 r-statistic=-0.00251782
       startgps=751658016 stride=16 r-statistic=-0.0122699
       startgps=751658032 stride=16 r-statistic=0.0168868
       ……
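The per-stride output of rmon can be illustrated with a short Python sketch. It assumes that the r-statistic printed above is the linear (Pearson) correlation coefficient between the two channels over each stride; the slide does not give the formula, and the function names `r_statistic` and `r_per_stride` are hypothetical rather than part of DMT.

```python
import math
from typing import List, Sequence


def r_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def r_per_stride(x: Sequence[float], y: Sequence[float],
                 samples_per_stride: int) -> List[float]:
    """Compute one r value per complete, non-overlapping stride."""
    n = min(len(x), len(y))
    s = samples_per_stride
    return [r_statistic(x[i:i + s], y[i:i + s])
            for i in range(0, n - s + 1, s)]
```

Under this assumption, values close to zero, like those in the session above, indicate that the two detector channels are essentially uncorrelated in that stride.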

  10. The LDM Modules and Flowchart
      • Client: ldm_submit, ldm_q, ldm_rm
      • LDM server: ldm_agent, ldm_locate_script, ldm_exec_script
      • Other tools: LSCdataFind, LSCdataFind Server, condor_master, condor_submit, Condor, Globus Job Manager
      • Job states: QUEUED, REJECTED, SCHEDULED, LOCATING, LOCATED, RELEASED, RUNNING, FINISHED
      • Legend: modules developed or deployed; modules designed and under development

      LDM_SITES:
        [MIT]
        lscdatafindserver = ldas-gridmon.mit.edu
        globusscheduler = ldas-grid.mit.edu/jobmanager-condor
        environment = LD_LIBRARY_PATH=/dso-test/home/jcao/dol/lib
        [CIT]
        lscdatafindserver = ldas-gridmon.ligo.caltech.edu
        globusscheduler = ldas-grid.ligo.caltech.edu/jobmanager-condor
        environment = LD_LIBRARY_PATH=/dso-test/jcao/dol/lib
        [LHO]
        lscdatafindserver = ldas-gridmon.ligo-wa.caltech.edu
        globusscheduler = ldas-grid.ligo-wa.caltech.edu/jobmanager-condor
        environment = LD_LIBRARY_PATH=/dso-test/jcao/dol/lib
        [LLO]
        lscdatafindserver = ldas-gridmon.ligo-la.caltech.edu
        globusscheduler = ldas-grid.ligo-la.caltech.edu/jobmanager-condor
        environment = LD_LIBRARY_PATH=/data2/jcao/dol/lib

      LDM_CONFIG:
        [AGENT]
        RESOURCES = @MIT@CIT@LHO@LLO
        SITES = /home/jcao/ldm/etc/LDM_SITES
        EXEC = /home/jcao/ldm/bin/ldm_exec_script
        LOCATE = /home/jcao/ldm/bin/ldm_locate_script
        PID = /home/jcao/ldm/var/ldm.pid
        LOG = /home/jcao/ldm/var/ldm.log
        LDG = /home/jcao/ldg-3.0/
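The LDM_SITES and LDM_CONFIG files shown on this slide look like INI files, so they can be read with Python's standard `configparser`. The sketch below assumes that INI reading; how ldm_agent actually parses these files is not stated on the slide, and the helper names `load_sites` and `parse_resources` are hypothetical.

```python
import configparser
from typing import Dict, List

# Two of the four site sections from the slide, copied verbatim.
LDM_SITES_TEXT = """
[MIT]
lscdatafindserver = ldas-gridmon.mit.edu
globusscheduler = ldas-grid.mit.edu/jobmanager-condor
environment = LD_LIBRARY_PATH=/dso-test/home/jcao/dol/lib

[CIT]
lscdatafindserver = ldas-gridmon.ligo.caltech.edu
globusscheduler = ldas-grid.ligo.caltech.edu/jobmanager-condor
environment = LD_LIBRARY_PATH=/dso-test/jcao/dol/lib
"""


def load_sites(text: str) -> Dict[str, Dict[str, str]]:
    """Parse LDM_SITES-style text into {site: {key: value}}."""
    # Interpolation is disabled so values are returned exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def parse_resources(value: str) -> List[str]:
    """Split an '@'-separated list such as '@MIT@CIT@LHO@LLO'."""
    return [item for item in value.split("@") if item]
```

Because `configparser` splits each line at the first delimiter, the `environment` value keeps its embedded `=` sign, so `LD_LIBRARY_PATH=/dso-test/jcao/dol/lib` comes back intact.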

  11. Data Monitoring Environment (LDM)
      Grid-enabled run of the rmon DMT offline monitor using LDM.
      • Users describe jobs in a LIGO-friendly language.
      • Users do not need to deal with the technical details of LSC Data Grid services.
      • Data are located and file lists are generated automatically.

      ldm.sub:
        [job]
        id = test
        monitor = rmon
        args = -opt opt
        input = opt
        [data]
        observatory = @H@L
        type = @RDS_R_L3@RDS_R_L3
        start = 751658000
        end = 751676993

      Terminal session:
        [jcao@ldaspc1 ~]$ cd ldm
        [jcao@ldaspc1 ldm]$ source setup.sh
        [jcao@ldaspc1 ldm]$ cd ../rmon
        [jcao@ldaspc1 rmon]$ ldm_agent
        [jcao@ldaspc1 rmon]$ ldm_submit ldm.sub
        Job test has been submitted.
        [jcao@ldaspc1 rmon]$ more ldm_test_condor.out
        Processing multi list file: ldm_test_CIT_multilist.txt
        Number of lists added: 2
        Total data streams: 2
        ……
        startgps=751658000 stride=16 r-statistic=-0.00251782
        ……

      Automatically generated Condor-G submission file:
        universe = globus
        globusscheduler = ldas-grid.ligo.caltech.edu/jobmanager-condor
        log = ldm_test_condor.log
        output = ldm_test_condor.out
        error = ldm_test_condor.err
        should_transfer_files = YES
        when_to_transfer_output = ON_EXIT
        transfer_input_files = ldm_test_CIT_multilist.txt, ldm_test_CIT_filelist1.txt, ldm_test_CIT_filelist2.txt, /home/jcao/rmon/opt
        arguments = -inlists ldm_test_CIT_multilist.txt -opt opt
        environment = LD_LIBRARY_PATH=/dso-test/jcao/dol/lib
        executable = /home/jcao/rmon/rmon
        Queue
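The mapping from the short job description to the Condor-G submission file can be sketched in Python. This is a hedged reconstruction from the two files shown on the slide, not the LDM source: the function `render_condor_g_submit` and its parameters are hypothetical, and the `ldm_<id>_<site>_...` file naming is inferred from the example output.

```python
from typing import List


def render_condor_g_submit(job_id: str, site: str, scheduler: str,
                           executable: str, args: str,
                           extra_inputs: List[str], environment: str,
                           n_streams: int = 2) -> str:
    """Render a Condor-G submit file like the one shown on the slide."""
    prefix = f"ldm_{job_id}"
    multilist = f"{prefix}_{site}_multilist.txt"
    # One generated frame list per data stream, plus the user's own inputs.
    filelists = [f"{prefix}_{site}_filelist{i}.txt"
                 for i in range(1, n_streams + 1)]
    inputs = [multilist] + filelists + list(extra_inputs)
    lines = [
        "universe = globus",
        f"globusscheduler = {scheduler}",
        f"log = {prefix}_condor.log",
        f"output = {prefix}_condor.out",
        f"error = {prefix}_condor.err",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(inputs)}",
        f"arguments = -inlists {multilist} {args}",
        f"environment = {environment}",
        f"executable = {executable}",
        "Queue",
    ]
    return "\n".join(lines) + "\n"
```

Calling it with the values from the slide (job id `test`, site `CIT`, the CIT scheduler and environment from LDM_SITES, and `/home/jcao/rmon/opt` as the extra input) reproduces the submission file shown above line for line.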

  12. Open Science Grid (OSG)
      • High energy physics
      • Bioinformatics
      • Nanotechnology
      • ……
      • Grid of Grids
      • 20 thousand CPUs
      • Petabytes of data storage

  13. Thank You!
      Junwei Cao
      jcao@tsinghua.edu.cn
      http://ligo.org.cn
