CyberShake Study 2.3 Technical Readiness Review
Study re-versioning
• SCEC software uses year.month versioning
• Suggest renaming this study to 13.4
Study 13.4 Scientific Goals
• Compare Los Angeles-area hazard maps
  • RWG V3.0.3 vs AWP-ODC-SGT (CPU)
  • CVM-S4 vs CVM-H
    • Different version of CVM-H than previous runs
    • Adds San Bernardino, Santa Maria basins
• 286 sites (10 km mesh + points of interest)
Study 13.4 Data Products
• 2 CVM-S4 Los Angeles-area hazard maps
• 2 CVM-H 11.9 Los Angeles-area hazard maps
• Hazard curves for 286 sites at 10 s, 5 s, 3 s
  • Calculated with OpenSHA v13.0.0
• 1144 sets of 2-component SGTs (286 sites x 2 velocity models x 2 SGT codes)
• Seismograms for all ruptures (about 470M)
• Peak amplitudes in DB for 10 s, 5 s, 3 s
• Access via CyberShake Data Product Site (in development)
Study 13.4 Notables
• First AWP-ODC-SGT hazard maps
• First CVM-H 11.9 hazard maps
• First CyberShake use of Blue Waters (SGTs)
• First CyberShake use of Stampede (post-processing)
• Largest CyberShake calculation by 4x
Study 13.4 Parameters
• 0.5 Hz, deterministic post-processing
• 200 m spacing (see the sketch below)
• CVMs
  • Vs min = 500 m/s
  • GTLs for both velocity models
• UCERF 2
• Latest rupture variation generator
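The 0.5 Hz deterministic limit, 200 m spacing, and 500 m/s Vs floor are mutually consistent under the usual finite-difference rule of several grid points per minimum S wavelength. A minimal sketch of that check, assuming roughly 5 points per wavelength (a common rule of thumb, not a value stated in these slides):

```python
# Sketch: check that the Study 13.4 grid spacing resolves the target frequency.
# Assumes ~5 grid points per minimum S wavelength, a common finite-difference
# rule of thumb; that value is an assumption, not taken from the slides.

VS_MIN = 500.0                # m/s, minimum Vs allowed in the velocity models
F_MAX = 0.5                   # Hz, deterministic frequency limit of the study
POINTS_PER_WAVELENGTH = 5.0   # assumed resolution requirement

min_wavelength = VS_MIN / F_MAX                       # 1000 m
max_spacing = min_wavelength / POINTS_PER_WAVELENGTH  # 200 m

print(f"Minimum S wavelength: {min_wavelength:.0f} m")
print(f"Maximum usable grid spacing: {max_spacing:.0f} m")  # matches the 200 m spacing
```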
Verification work
• 4 sites (WNGC, USC, PAS, SBSM), each run with:
  • RWG V3.0.3, CVM-S
  • RWG V3.0.3, CVM-H
  • AWP, CVM-S
  • AWP, CVM-H
• Plotted with previously calculated RWG V3
• Expect RWG V3 slightly higher than the others
[Hazard curve comparison plots for WNGC, USC, PAS, and SBSM, with CVM-S and CVM-H panels for each site: RWG V3.0.3 in green, AWP in purple, previously calculated RWG V3 in orange.]
Study 13.4 SGT Software Stack
• Pre AWP
  • New production code
  • Converts velocity mesh into AWP format (illustrative sketch below)
  • Generates other AWP input files
• SGTs
  • AWP-ODC-SGT CPU v13.4 (from verification work)
  • RWG V3.0.3 (from verification work)
• Post AWP
  • New production code
  • Creates SGT header files for post-processing with AWP
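As an illustration of the kind of work the Pre AWP conversion code does, here is a minimal sketch of reordering a velocity mesh into per-property files. The file names, the interleaved (vp, vs, rho) record layout, and the mesh dimensions are all assumptions made for the example; they are not the actual CyberShake mesh or AWP input formats.

```python
import numpy as np

# Hypothetical input: a single binary file of float32 (vp, vs, rho) records,
# one per grid point. Hypothetical output: one float32 file per property.
# File names, record layout, and dimensions are illustrative assumptions only;
# the real Pre AWP code targets the actual AWP-ODC-SGT input format.

NX, NY, NZ = 100, 100, 50   # placeholder mesh dimensions

mesh = np.fromfile("velocity_mesh.bin", dtype=np.float32)
mesh = mesh.reshape(NZ, NY, NX, 3)   # assumed slow-to-fast ordering: z, y, x

for index, name in enumerate(("vp", "vs", "rho")):
    mesh[..., index].tofile(f"awp_media_{name}.bin")
```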
Study 13.4 PP Software Stack
• SGT Extraction
  • Optimized MPI version with in-memory rupture variation generation
  • Support for separate header files
  • Same code as Study 2.2
• Seismogram Synthesis / PSA calculation
  • Single executable
  • Same code as Study 2.2
• Hazard Curve calculation
  • OpenSHA v13.0
• All codes tagged in SVN before the study begins
Distributed Processing (SGTs)
• Runs placed in pending file on Blue Waters (as scottcal)
• Cron job calls build_workflow.py with run parameters
  • build_workflow.py creates PBS scripts defining jobs with dependencies
• Cron job calls run_workflow.py
  • run_workflow.py submits the PBS scripts using qsub dependencies (see the sketch below)
  • Limited restart capability
• Final workflow jobs ssh to shock and call handoff.py, which:
  • Performs the BW → Stampede SGT file transfer (as scottcal)
    • scottcal BW proxy must be resident on shock
  • Registers SGTs in RLS
  • Adds runs to the pending file on shock for post-processing
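A minimal sketch of the qsub dependency chaining that run_workflow.py performs. The PBS script names, their ordering, and the strictly serial chain are assumptions for illustration; the real build_workflow.py / run_workflow.py generate and submit the actual scripts.

```python
import subprocess

# Illustrative PBS script names; the real scripts come from build_workflow.py.
PBS_SCRIPTS = ["pre_awp.pbs", "awp_sgt_x.pbs", "awp_sgt_y.pbs", "post_awp.pbs"]

def submit_chain(scripts):
    """Submit PBS scripts so each job starts only after the previous one succeeds."""
    previous_job_id = None
    for script in scripts:
        cmd = ["qsub"]
        if previous_job_id is not None:
            # Standard PBS/Torque dependency: run only after the prior job exits OK.
            cmd += ["-W", f"depend=afterok:{previous_job_id}"]
        cmd.append(script)
        previous_job_id = subprocess.check_output(cmd).decode().strip()
        print(f"Submitted {script} as {previous_job_id}")
    return previous_job_id

if __name__ == "__main__":
    submit_chain(PBS_SCRIPTS)
```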
Distributed Processing (PP)
• Cron job on shock submits post-processing runs (see the sketch below)
  • Pegasus 4.3, from Git repository
  • Condor 7.6.6
  • Globus 5.0.3
• Jobs submitted to Stampede (as tera3d)
• 8 sub-workflows
  • Extract SGT jobs as standard jobs (128 cores)
  • seis_psa jobs as PMC jobs (1024 cores)
• Results staged back to shock, DB populated, curves generated
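A minimal sketch of how a cron-driven submitter on shock might hand pending runs to Pegasus. The pending-file path, the DAX naming convention, and the site handle are assumptions, and the pegasus-plan options shown are the standard ones rather than the exact CyberShake invocation.

```python
import subprocess
from pathlib import Path

# Illustrative path and naming only; the real pending file and DAX layout differ.
PENDING_FILE = Path("/home/cybershk/pending_pp_runs.txt")

def submit_pending_runs():
    """Plan and submit one Pegasus workflow per pending post-processing run."""
    for run_id in PENDING_FILE.read_text().split():
        dax = f"CyberShake_{run_id}.dax"   # assumed naming convention
        subprocess.check_call([
            "pegasus-plan",
            "--dax", dax,               # abstract workflow for this run
            "--sites", "stampede",      # execution site handle (assumed name)
            "--output-site", "local",   # stage results back to shock
            "--submit",                 # hand off to Condor DAGMan immediately
        ])

if __name__ == "__main__":
    submit_pending_runs()
```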
SGT Computational Requirements
• SGTs on Blue Waters
  • Computational time: 8.4 M SUs
    • RWG: 16k SUs/site x 286 sites = 4.6 M SUs
    • AWP: 13.5k SUs/site x 286 sites = 3.8 M SUs
    • 22.35 M SU allocation, 22 M SUs remaining
  • Storage: 44.7 TB
    • 160 GB/site x 286 sites = 44.7 TB
PP Computational Requirements
• Post-processing on Stampede (per-site totals worked out in the sketch below)
  • Computational time:
    • 4000 SUs/site x 286 sites = 1.1 M SUs
    • 4.1 M SU allocation, 3.9 M remaining
  • Storage: 44.7 TB input, 13 TB output
    • 44.7 TB of SGT inputs; will need to rotate out
    • Seismograms: 46 GB/site x 286 sites = 12.8 TB
    • PSA files: 0.8 GB/site x 286 sites = 0.2 TB
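The per-site figures on the last two slides multiply out as shown below; the TB values match the slides when the GB/site figures are treated as binary units, which is an assumption about the slides' rounding.

```python
SITES = 286

# SGTs on Blue Waters
rwg_sus = 16_000 * SITES              # 4.576 M SUs (slides: 4.6 M)
awp_sus = 13_500 * SITES              # 3.861 M SUs (slides: 3.8 M)
sgt_storage_tb = 160 * SITES / 1024   # ~44.7 TB at 160 GB/site

# Post-processing on Stampede
pp_sus = 4_000 * SITES                # 1.144 M SUs (slides: 1.1 M)
seis_storage_tb = 46 * SITES / 1024   # ~12.8 TB of seismograms
psa_storage_tb = 0.8 * SITES / 1024   # ~0.2 TB of PSA files

print(f"SGT SUs:     {(rwg_sus + awp_sus) / 1e6:.1f} M")          # 8.4 M
print(f"SGT storage: {sgt_storage_tb:.1f} TB")                    # 44.7 TB
print(f"PP SUs:      {pp_sus / 1e6:.1f} M")                       # 1.1 M
print(f"PP output:   {seis_storage_tb + psa_storage_tb:.1f} TB")  # ~13 TB
```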
Computational Analysis
• Monitord for post-processing performance
  • Will run after workflows have completed
  • May need Python scripts for specific CyberShake metrics
• Scripts for SGT performance
  • Cron job to monitor core usage on BW
  • Does wrapping BW jobs in kickstart help?
• Ideally, same high-level metrics as Studies 1.0 and 2.2
Long-term Storage
• 44.7 TB of SGTs
  • To be archived to tape (NCSA? TACC? Somewhere else?)
• 13 TB of seismograms and PSA data
  • Have been using SCEC storage (scec-04?)
• 5.5 TB of workflow logs
  • Can compress after mining for stats
• CyberShake database
  • 1.4 B entries, 330 GB of data (scaling issues?)
Estimated Duration
• Limiting factors:
  • Blue Waters queue time
    • Uncertain how many sites can run in parallel
  • Blue Waters → Stampede transfer
    • 100 MB/sec seems sustainable from tests, but could get much worse (see the sketch below)
• Best case is about 50 sites/day; unlikely to reach that rate
• Estimated completion by end of June
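A minimal sketch of the transfer-limited throughput behind the 50 sites/day figure, using the 100 MB/s sustained rate above and the 160 GB/site SGT volume from the requirements slide (binary MB/GB assumed):

```python
SITES = 286
GB_PER_SITE = 160       # SGT volume per site, from the SGT requirements slide
RATE_MB_PER_S = 100     # sustained BW -> Stampede rate observed in tests

seconds_per_site = GB_PER_SITE * 1024 / RATE_MB_PER_S   # ~27 minutes per site
sites_per_day = 86_400 / seconds_per_site               # ~52 sites/day at best
total_days = SITES * seconds_per_site / 86_400          # ~5.4 days of pure transfer

print(f"Transfer time per site: {seconds_per_site / 60:.0f} min")
print(f"Transfer-limited throughput: {sites_per_day:.0f} sites/day")
print(f"Total transfer time for {SITES} sites: {total_days:.1f} days")
```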
Personnel Support
• Scientists: Tom Jordan, Kim Olsen, Rob Graves
• Technical Lead: Scott Callaghan
• Job Submission / Run Monitoring: Scott Callaghan, David Gill, Phil Maechling
• Data Management: David Gill
• Data Users: Feng Wang, Maren Boese, Jessica Donovan
Risks
• Stampede becomes busier
  • Post-processing would still probably be shorter than the SGT calculations
• CyberShake database unable to handle the data volume
  • Would need to create additional DBs, move to a distributed DB, or change technologies
• Stampede changes its software stack
  • Last time, this necessitated a change to the MPI library
  • Can use Kraken as a backup PP site while resolving issues
• New workflow system on Blue Waters
  • May have as-yet-undetected bugs