
Training session 2 : Advanced training course on modipsl and libIGCM November 14 th 2013, MdS


Presentation Transcript


  1. Training session 2 : Advanced training course on modipsl and libIGCM, November 14th 2013, MdS

  2. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  3. IPSL climate modelling centre (ICMC)

  4. ICMC organisation • PI: J-L Dufresne; Office: L. Bopp, MA Foujols, J. Mignot • Steering committee • Modeling platform (IPSL-ESM): Arnaud Caubel (LSCE), Marie-Alice Foujols (IPSL) • Atmospheric and surface physics and dynamics (LMDZ): Frédéric Hourdin (LMD), Laurent Fairhead (LMD) • Ocean and sea ice physics and dynamics (NEMO, LIM): C Ethé (IPSL), Claire Lévy, Gurvan Madec (LOCEAN) • Atmosphere and ocean interactions (IPSL-CM, different resolutions): Sébastien Masson (LOCEAN), Olivier Marti (LSCE) • Atmospheric chemistry and aerosols (INCA, INCA_aer, Reprobus): Anne Cozic (LSCE), M. Marchand (LATMOS) • Biogeochemical cycles (PISCES): Laurent Bopp (LSCE), Patricia Cadule (IPSL) • Continental processes (ORCHIDEE): Philippe Peylin (LSCE), Josefine Ghattas (IPSL) • Current and future climate changes: Jean-Louis Dufresne (LMD), Olivier Boucher (LMD) • Paleoclimate and last millennium: Pascale Braconnot, Masa Kageyama (LSCE) • "Near-term" prediction (seasonal to decadal): Eric Guilyardi (LOCEAN), Juliette Mignot (LOCEAN) • Regional climates: Robert Vautard (LSCE), Laurent Li (LMD) • Evaluation of the models, present-day and future climate change analysis: Sandrine Bony (LMD), Patricia Cadule (IPSL), Marion Marchand (LATMOS), Juliette Mignot (LOCEAN), Jérôme Servonnat (LSCE) • Data Archive and Access Requirements: Sébastien Denvil (IPSL), Karim Ramage (IPSL)

  5. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  6. IPSLCM history

  7. IPSLCM history and scientific articles (timeline figure) • IPCC reports: FAR (1990), SAR (1995), TAR (2001), AR4 (2007), AR5 (2013) • CMIP projects: CMIP 1 & 2, CMIP3, CMIP5 • Models: IPSL-CM1 (few articles), IPSL-CM2 (some articles), IPSL-CM4 (10+ articles), IPSL-CM5 (30+ articles), IPSL-CM6

  8. LMDZ : atmospheric component http://lmdz.lmd.jussieu.fr/?set_language=en • Next LMDZ training session : 9-11 December 2013, registration before 15th November http://studs.unistra.fr/studs.php?sondage=1wgk8t9v44nsml27

  9. Introduction to LMDZ

  10. NEMO : oceanic component http://www.nemo-ocean.eu

  11. Short history of the IPSL model http://icmc.ipsl.fr/index.php/icmc-models

  12. 1979 : 1st Linpack performance list 80 Mflops

  13. Supercomputers timeline : top500.org • performance ×10 every 4 years

  14. Complexity and resolution of models (IPCC, AR4, WG1, Chap. 1, Fig. 1.2 and 1.4)

  15. top500.org : number of CPUs/cores (figure; vertical axis from 10 to 100 000, horizontal axis from 1993 to 2013)

  16. Technical challenges : HPC • More parallelism within each component : • MPI : message passing • hybrid, i.e. MPI/OpenMP : directives and shared memory • More parallelism in the coupled model : • at least 3 executables • each with MPI or MPI/OpenMP • more executables with XIOS : IO servers • Huge amount of data produced, to be analysed

  17. On the road to IPSL-CM6 • New physical packages : LMDZ, NEMO, ORCHIDEE • Increased horizontal and vertical resolutions • Ensembles of simulations • Longer simulations : paleo • More complexity : INCA chemistry added • More processors used in parallel • New dynamical core : DYNAMICO • Optimisation of IO • Improvement and reliability of libIGCM

  18. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  19. Retrieve, compile and run a _v5-type configuration • Get MODIPSL : svn co http://forge.ipsl.jussieu.fr/igcmg/svn/modipsl/trunk modipsl • Get IPSLCM5_v5 : cd modipsl/util ; ./model IPSLCM5_v5 • Install the Makefiles : cd modipsl/util ; ./ins_make • Compile : cd modipsl/config/IPSLCM5_v5 ; gmake + chosen resolution • Install the reference experiment (and post-processing) : cp EXPERIMENT/IPSLCM5/piControl/config.card . ; vi config.card ### JobName=MYEXP ; ../../util/ins_job ### copies the piControl directory into MYEXP with COMP, DRIVER, PARAM • Submit the launch job : cd modipsl/config/IPSLCM5_v5/MYEXP ; ccc_msub Job_MYEXP (TGCC) or llsubmit Job_MYEXP (IDRIS)
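The steps on this slide can be gathered into one ordered list. The sketch below is only a dry run in Python: it returns the commands as strings and executes nothing. The function name `launch_commands` and its parameters are hypothetical. The `gmake` target is left as an argument because the slide does not name the available resolutions. Mapping `ccc_msub` to TGCC and `llsubmit` to IDRIS follows the scheduler commands used elsewhere in this course. Each string assumes you start from the directory that contains `modipsl`.

```python
def launch_commands(job_name, gmake_target, centre):
    """Return the shell commands from the slide, in order, without running them.

    gmake_target is a placeholder: the slide only says "gmake + chosen resolution".
    centre is "tgcc" or "idris" and only selects the submission command.
    """
    submit = {"tgcc": "ccc_msub", "idris": "llsubmit"}[centre]
    config_dir = "modipsl/config/IPSLCM5_v5"
    return [
        "svn co http://forge.ipsl.jussieu.fr/igcmg/svn/modipsl/trunk modipsl",
        "cd modipsl/util ; ./model IPSLCM5_v5",
        "cd modipsl/util ; ./ins_make",
        f"cd {config_dir} ; gmake {gmake_target}",
        f"cd {config_dir} ; cp EXPERIMENT/IPSLCM5/piControl/config.card .",
        f"# edit {config_dir}/config.card so that JobName={job_name}",
        f"cd {config_dir} ; ../../util/ins_job",
        f"cd {config_dir}/{job_name} ; {submit} Job_{job_name}",
    ]


if __name__ == "__main__":
    # "TARGET" stands for whichever resolution target you compile.
    for command in launch_commands("MYEXP", "TARGET", "tgcc"):
        print(command)
```

Printing the list before typing anything is a cheap way to check that the job name is consistent between config.card, the experiment directory and the submitted job.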

  20. IPSL sources of components (cvs/svn servers) • Connection • Specific configuration downloading • Modipsl • Compilation • Simulation set up • LibIGCM • Physical package choice and set up • Job set up and submission • LibIGCM • Front End • Computing

  21. Generic job : AA_Job • PeriodLength

  22. libIGCM library : schematic description EXP00/DRIVER EXP00 driver EXP00/COMP card

  23. Job chain (diagram) • Computing job : Job_EXP00 (repeated) • Post-processing jobs : rebuild (RebuildFrequency), then pack_output • pack_debug and pack_restart (PackFrequency) • create_ts (TimeSeriesFrequency), then monitoring • create_se (SeasonalFrequency), then atlas

  24. TGCC computers and file system in a nutshell (October 2013) • Computers : curie front-end (login); curie thin nodes (-q standard), curie large nodes (-q xlarge), curie hybrid nodes (-q hybrid) (compute); airain front-end, airain nodes • File system : $HOME : small precious files, saved space • $CCCWORKDIR : sources, small results; IGCM_OUT : MONITORING/ATLAS, copied to dods/work with dods_cp • $SCRATCHDIR : temporary; REBUILD; IGCM_OUT : files to be packed, outputs of post-proc jobs • $CCCSTOREDIR : IGCM_OUT : packed results (Output, Analyse SE and TS), copied to dods/store with dods_cp; HPSS robotic tapes, ccc_hsm get • cp between spaces • quotas • Legend : temporary space, non saved space, saved space, space on tapes, visible from www

  25. Job chain at TGCC • Compute (curie) : Job_EXP00, advancing by PeriodLength; files to rebuild go to $SCRATCHDIR/IGCM_OUT/.../REBUILD • Post (curie), every RebuildFrequency : rebuild, writing $SCRATCHDIR/IGCM_OUT/XXX/Output, $SCRATCHDIR/IGCM_OUT/XXX/Restart and Debug • Post (curie), every PackFrequency : pack_restart and pack_debug (tar) to $CCCSTOREDIR/IGCM_OUT/.../RESTART and DEBUG; pack_output (ncrcat) to $CCCSTOREDIR/IGCM_OUT/XXX/Output • Post (curie) : create_ts every TimeSeriesFrequency, then monitoring; create_se every SeasonalFrequency, then atlas • TS and SE : $CCCSTOREDIR/IGCM_OUT/… → dods/store • MONITORING and ATLAS : $CCCWORKDIR → dods/work • DodsCopy=TRUE/FALSE

  26. IDRIS computers and file system in a nutshell (October 2013) • Computers : turing front-end, adapp front-end (login); turing compute, adapp compute, ada compute (compute) • File system : $HOME : small precious files, saved space; sources, small results • $WORKDIR : temporary; REBUILD; IGCM_OUT : files to be packed, outputs of post-proc jobs • $TMPDIR • gaya (mfput/mfget, dmput/dmget) : IGCM_OUT : Output, Analyse, MONITORING/ATLAS; copied to dods with dods_cp; robotic tapes • Legend : temporary space, non saved space, saved space, space on tapes, visible from www

  27. Job chain at IDRIS • Compute (ada) : Job_EXP00, advancing by PeriodLength; files to rebuild go to $WORKDIR/IGCM_OUT/.../REBUILD • Post (adapp), every RebuildFrequency : rebuild, writing $WORKDIR/IGCM_OUT/XXX/Output, $WORKDIR/IGCM_OUT/XXX/Restart and Debug • Post (adapp), every PackFrequency : pack_restart and pack_debug (tar) to gaya:IGCM_OUT/.../RESTART and DEBUG; pack_output (ncrcat) to gaya:IGCM_OUT/XXX/Output • Post (adapp) : create_ts every TimeSeriesFrequency, then monitoring; create_se every SeasonalFrequency, then atlas • DodsCopy=TRUE/FALSE : gaya:IGCM_OUT/… → dods.idris.fr

  28. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  29. Time Series : create_ts.job • A Time Series is a file which contains a single variable over the whole simulation period (ChunckJob2D = NONE) or over shorter periods for 2D (ChunckJob2D = 100Y) or 3D (ChunckJob3D = 50Y) variables. • The write frequency is defined in the config.card file: TimeSeriesFrequency=10Y indicates that the time series will be written every 10 years and for 10-year periods. • The Time Series are set in the COMP/*.card files by the TimeSeriesVars2D and TimeSeriesVars3D options. • The Time Series built from monthly (or daily) output files are stored on the file server in the IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/Component/Analyse/TS_MO and TS_DA directories. • Bonus : TS_MO_YE (annual mean time series) are produced for all TS_MO variables. • You can add or remove variables in the TimeSeries lists according to your needs.
      config.card :
        [Post]
        ...
        #D- If you want to produce time series, this flag determines
        #D- frequency of post-processing submission (NONE if you don't want)
        TimeSeriesFrequency=10Y
      COMP/lmdz.card :
        [OutputFiles]
        List= (histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
        ...
        [Post_1M_histmth]
        Patches= ()
        GatherWithInternal = (lon, lat, presnivs, time_counter, time_counter_bnds, aire)
        TimeSeriesVars2D = (bils, cldh, ...
        ...
        ChunckJob2D = NONE
        TimeSeriesVars3D = (upwd, lwcon, ...
        ...
        ChunckJob3D = OFF
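To make the effect of ChunckJob2D / ChunckJob3D concrete, here is a minimal Python sketch that splits a simulation into chunks. It is an illustration, not libIGCM code: the function names are hypothetical, and it assumes chunks are counted from the first simulated year, which the slide does not state. Only the two forms explained on the slide are handled (NONE and a number of years such as 100Y); the OFF value shown in lmdz.card is deliberately rejected because the slide does not explain it.

```python
def parse_years(value):
    """Turn '100Y' into 100; return None for 'NONE' (whole simulation)."""
    if value == "NONE":
        return None
    if not value.endswith("Y") or not value[:-1].isdigit():
        raise ValueError(f"value not handled in this sketch: {value!r}")
    return int(value[:-1])


def chunk_periods(first_year, last_year, chunk):
    """List (start, end) year pairs, one per time series file.

    Assumption: chunks start at first_year; the last one may be shorter.
    """
    years = parse_years(chunk)
    if years is None:
        return [(first_year, last_year)]
    periods = []
    start = first_year
    while start <= last_year:
        end = min(start + years - 1, last_year)
        periods.append((start, end))
        start = end + 1
    return periods


if __name__ == "__main__":
    # Historical simulation 1850-2005 with 2D variables cut into 100-year files.
    print(chunk_periods(1850, 2005, "100Y"))
```

Under these assumptions a 1850-2005 simulation with ChunckJob2D = 100Y gives two files per 2D variable, while NONE gives a single file covering all 156 years.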

  30. MONITORING : dods

  31. Intermonitoring : http://webservices.ipsl.jussieu.fr/monitoring/

  32. How to add a new variable in MONITORING • You can add or change the variables to be monitored by editing the monitoring configuration files. Default files are defined for each component. • The monitoring is defined here: ~compte_commun/atlas (compte_commun is the shared account). For example, for LMDZ on curie : ~p86ipsl/monitoring01_lmdz_LMD9695.cfg. For example, for LMDZ on adapp : ~rpsl035/monitoring01_lmdz_LMD9695.cfg • You can change the monitoring by creating a POST directory as part of your configuration. Copy a .cfg file into it and change it the way you want. • Operations are written in the Ferret language. • You can monitor any variable produced as a time series and stored in TS_MO.
      POST/monitoring01_lmdz_LMD9695.cfg :
        #--------------------------------------------------------------------------------------------------------
        # field | files patterns | files additionnal | operations | title | units | calcul of area
        #--------------------------------------------------------------------------------------------------------
        nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"
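Each line of a monitoring .cfg file is a row of seven pipe-separated columns. The Python sketch below splits the example line from this slide into named fields so that the role of each column is easier to see. The function name and the dictionary keys are hypothetical labels chosen for this illustration, and the naive split on "|" assumes that no operation contains a pipe character. In the example, the `d=1`, `d=2` and `d=3` qualifiers appear to refer to the two files matched by the patterns and then to the additional grid file, in that order.

```python
# Column labels chosen for this sketch, in the order given by the .cfg header comment.
COLUMNS = (
    "field",
    "files_patterns",
    "files_additional",
    "operations",
    "title",
    "units",
    "area",
)


def parse_monitoring_line(line):
    """Split one data line of a monitoring .cfg file into a dictionary."""
    cells = [cell.strip().strip('"') for cell in line.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(cells)}")
    entry = dict(zip(COLUMNS, cells))
    # Several time series names can share one cell, separated by spaces.
    entry["files_patterns"] = entry["files_patterns"].split()
    return entry


EXAMPLE = (
    'nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | '
    '"(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | '
    '"W/m^2" | "aire[d=3]"'
)

if __name__ == "__main__":
    for key, value in parse_monitoring_line(EXAMPLE).items():
        print(f"{key}: {value}")
```

A check like this can catch a missing column or an unbalanced quote before the monitoring job runs.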

  33. Seasonal mean : create_se.job • A seasonal mean file (SE) contains averages for each month of the year (jan, feb, ...) over a period defined in the config.card file. • SeasonalFrequency=10Y : the seasonal means will be computed every 10 years. • SeasonalFrequencyOffset=0 : the number of years to be skipped when calculating seasonal means. • All files with a requested Post (Seasonal=ON in COMP/*.card) are then averaged with the ncra script before being stored in the directory IGCM_OUT/IPSLCM5A/DEVT/pdControl/MyExp/ATM/Analyse/SE. There is one file per SeasonalFrequency=10Y. • ATLAS are launched by create_se. ATLAS sources are : ~rpsl035 ~p86ipsl/atlas
      config.card :
        #========================================================================
        #D-- Post -
        [Post]
        ...
        #D- If you want to produce seasonal average, this flag determines
        #D- the period of this average (NONE if you don't want)
        SeasonalFrequency=10Y
        #D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
        #D- Usefull if you do not want to consider the first X simulation's years
        SeasonalFrequencyOffset=0
      COMP/lmdz.card :
        [OutputFiles]
        List=(histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
        ...
        [Post_1M_histmth]
        ...
        Seasonal=ON
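The averaging behind an SE file can be shown on a plain list of numbers. The Python sketch below takes a monthly series starting in January and returns twelve values, one per calendar month, each averaged over all the years supplied. It only illustrates the arithmetic: libIGCM works on netCDF files with ncra, and the function name here is hypothetical.

```python
def seasonal_means(monthly_values):
    """Average each calendar month over all years.

    monthly_values: one value per month, whole years only, first value = January.
    Returns a list of 12 means (January to December).
    """
    if not monthly_values or len(monthly_values) % 12 != 0:
        raise ValueError("expected a non-empty series covering whole years")
    n_years = len(monthly_values) // 12
    return [
        sum(monthly_values[month + 12 * year] for year in range(n_years)) / n_years
        for month in range(12)
    ]


if __name__ == "__main__":
    # Two synthetic years: the second year is 10 units higher than the first.
    year_one = list(range(12))
    year_two = [value + 10 for value in year_one]
    print(seasonal_means(year_one + year_two))
```

With SeasonalFrequency=10Y the same operation would span ten Januaries, ten Februaries and so on, giving one 12-month climatology per decade.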

  34. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  35. Monitoring the simulation Verification and Correction

  36. Monitoring a simulation • We strongly encourage you to check your simulation frequently during run time. First of all, check the job status with ccc_mstat (TGCC) or llq (IDRIS). • Real time limit exceeded : jobs are killed without any message on ada. • RunChecker.job : this tool, provided with libIGCM, lets you find out the status of your simulations. • One historical simulation of 156 years (1850-2005) consists of 50 computing jobs and 1000 post-processing jobs. • Documentation : http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation

  37. Monitoring a simulation : mail • You receive a message at the end of the simulation. • The simulation may have completed or failed.
      From : rpsl003@idris.fr
      Subject : COURSNIV2 completed
      Date : 22 October 2013 18:29:24 UTC+02:00
      To : rpsl003@idris.fr
      Dear rpsl003,
      Simulation COURSNIV2 completed on supercomputer ada027
      Simulation started : 20000101
      Simulation ended : 20000102
      Output files are available in /u/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
      Files to be rebuild are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2/REBUILD
      Pre-packed files are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
      Script files, Script Outputs and Debug files (if necessary) are available in /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
      Greetings!
      Check this out for more information : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/documentation

      Begin forwarded message :
      From : rpsl003@idris.fr
      Subject : MyJobTest failed
      Date : 22 October 2013 17:17:41 UTC+02:00
      To : rpsl003@idris.fr
      Dear rpsl003,

  38. Monitoring a simulation : run.card • When the simulation has started, the file run.card is created by libIGCM using the template run.card.init. • run.card contains information on the current run period and on the previous periods already finished. • This file is updated at each run period by libIGCM. • It also records the time consumption of each period. • The status of the job is set to OnQueue, Running, Completed or Fatal.
      run.card :
        [Configuration]
        #last PREFIX
        OldPrefix= COURSNIV2_20000103
        #Compute date of loop
        PeriodDateBegin= 2000-01-04
        PeriodDateEnd= 2000-01-04
        CumulPeriod= 4
        # State of Job "Start", "Running", "OnQueue", "Completed"
        PeriodState= Completed
        SubmitPath= /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
        #========================================================================
        [PostProcessing]
        TimeSeriesRunning=n
        TimeSeriesCompleted=
        #========================================================================
        [Log]
        # Executables Size
        LastExeSize= ( 88011086, 0, 0, 19956686, 0, 0, 1523952 )
        #-----------------------------------------------------------------------------------------------------------------------------------
        # CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime |
        #-----------------------------------------------------------------------------------------------------------------------------------
        # 1 | 20000101 | 20000101 | 2013-10-22T17:53:48 | 2013-10-22T17:55:10 | 82.01000 | 4.21000 |
        # 2 | 20000102 | 20000102 | 2013-10-22T18:28:03 | 2013-10-22T18:29:17 | 74.19000 | 4.09000 |
        # 3 | 20000103 | 20000103 | 2013-10-23T17:28:50 | 2013-10-23T17:30:26 | 95.21000 | 4.30000 |
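Because run.card is plain text, its status and timing table are easy to read programmatically. The Python sketch below extracts PeriodState and the per-period log rows from text laid out like the example on this slide. The function name is hypothetical and the parsing rules (a `PeriodState=` line, and commented rows whose first column is a number) are assumptions drawn from that single example, not from the libIGCM sources.

```python
SAMPLE_RUN_CARD = """\
[Configuration]
PeriodDateBegin= 2000-01-04
PeriodDateEnd= 2000-01-04
CumulPeriod= 4
# State of Job "Start", "Running", "OnQueue", "Completed"
PeriodState= Completed
[Log]
# CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime |
# 1 | 20000101 | 20000101 | 2013-10-22T17:53:48 | 2013-10-22T17:55:10 | 82.01000 | 4.21000 |
# 2 | 20000102 | 20000102 | 2013-10-22T18:28:03 | 2013-10-22T18:29:17 | 74.19000 | 4.09000 |
# 3 | 20000103 | 20000103 | 2013-10-23T17:28:50 | 2013-10-23T17:30:26 | 95.21000 | 4.30000 |
"""


def parse_run_card(text):
    """Return (PeriodState, rows) where rows are dicts built from the [Log] table."""
    state = None
    rows = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("PeriodState="):
            state = line.split("=", 1)[1].strip()
        elif line.startswith("#") and "|" in line:
            cells = [cell.strip() for cell in line.lstrip("#").split("|")]
            # Data rows start with the period counter; the header row does not.
            if cells[0].isdigit():
                rows.append(
                    {
                        "CumulPeriod": int(cells[0]),
                        "PeriodDateBegin": cells[1],
                        "PeriodDateEnd": cells[2],
                        "RealCpuTime": float(cells[5]),
                        "UserCpuTime": float(cells[6]),
                    }
                )
    return state, rows


if __name__ == "__main__":
    state, rows = parse_run_card(SAMPLE_RUN_CARD)
    total = sum(row["RealCpuTime"] for row in rows)
    print(f"state={state}, periods={len(rows)}, total RealCpuTime={total:.2f}")
```

In the sample, the RealCpuTime of the first period (82.01) matches the 82 seconds between its RunDateBegin and RunDateEnd, which suggests the column is elapsed time in seconds.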

  39. Verification and correction 1/6 • Where did the problem occur ? • 1 "failed" email : main computation job => gaya stopped at IDRIS, hardware problem ? Check Script_output_xxxx. => Once gaya has restarted, or if there is no clear error message, try relaunching (after a clean_month) : path/to/libIGCM/clean_month.job, then ccc_msub (llsubmit) Job_...

  40. Verification and correction 2/6 • Where did the problem occur ? • 1 "failed" email : main computation job : analyse Script_output_xxxx
      #######################################
      #       ANOTHER GREAT SIMULATION      #
      #######################################
      Part 1 (copying the input files)
      #######################################
      #       DIR BEFORE RUN EXECUTION      #
      #######################################
      Part 2 (running the model)
      #######################################
      #       DIR AFTER RUN EXECUTION       #
      #######################################
      Part 3 (post-processing)
      #######################################
      http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#AnalyzingtheJoboutput:Script_Output

  41. Verification and correction 3/6 --> Analyse Script_output_xxxx : in general, if your simulation stops, look for the keyword "IGCM_debug_CallStack" in this file. This keyword comes after a line explaining the error you are experiencing.
      =====================================================================
      EXECUTION of : mpirun -f ./run_file > out_run_file 2>&1
      Return code of executable : 1
      IGCM_debug_Exit : EXECUTABLE
      !!!!!!!!!!!!!!!!!!!!!!!!!!
      !! IGCM_debug_CallStack !!
      !------------------------!
      !------------------------!
      IGCM_sys_Cp : out_run_file xxxxxxxxxxxx_out_run_file_error
      =====================================================================
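The search described on this slide can be automated. The Python sketch below finds the first occurrence of IGCM_debug_CallStack and returns the closest preceding line that is not pure decoration, which is where the slide says the error explanation sits. The function name is hypothetical, the sample text is a line-by-line reconstruction of the excerpt above, and treating lines made only of `!`, `-`, `=` and spaces as decoration is an assumption.

```python
KEYWORD = "IGCM_debug_CallStack"

SAMPLE_SCRIPT_OUTPUT = """\
=====================================================================
EXECUTION of : mpirun -f ./run_file > out_run_file 2>&1
Return code of executable : 1
IGCM_debug_Exit : EXECUTABLE
!!!!!!!!!!!!!!!!!!!!!!!!!!
!! IGCM_debug_CallStack !!
!------------------------!
!------------------------!
IGCM_sys_Cp : out_run_file xxxxxxxxxxxx_out_run_file_error
=====================================================================
"""


def error_before_callstack(text):
    """Return the last meaningful line before the call stack banner, or None."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if KEYWORD in line:
            for previous in reversed(lines[:index]):
                # Skip banner lines made only of '!', '-', '=' and spaces.
                if previous.strip(" !-="):
                    return previous.strip()
            return None
    return None


if __name__ == "__main__":
    print(error_before_callstack(SAMPLE_SCRIPT_OUTPUT))
```

On a real Script_output file you would read the file contents into a string first; a result of None simply means the keyword was not found.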

  42. Verification and correction 4/6 --> Check the Debug sub-directory closely (if it exists). Check the file xxxxx_error in Debug/ : • it contains the LMDZ standard output. LMDZ often fails in hgardfou (message "Stopping in hgardfou"). • it contains the abends (abnormal terminations / exceptions) of every component. Check the standard outputs of NEMO, ORCHIDEE, INCA and OASIS : • Debug/xxxx_ocean.output • Debug/xxxx_output_orchidee • Debug/xxxx_inca.out • Debug/xxxx_cplout

  43. RunChecker.job • RunChecker.job helps you to monitor all the jobs produced by libIGCM for a simulation

  44. RunChecker.job : usage and options • This script can be launched from anywhere.
      Usage :
        path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] [-s] job_name
        path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -p config.card_path
        path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -r
      Options :
        -h : print this help and exit
        -u user : owner of the job
        -q : quiet
        -j n : print n post-processing jobs (default is 20)
        -s : search for a new job in $WORKDIR and fill in the catalog before printing information
        -p path : give the absolute path to the directory containing the config.card instead of the job name (needed only once)
        -r : check all running simulations
      Examples :
        1) path/to/libIGCM/RunChecker.job -p $CCCWORKDIR/CURIE/CMIP5/R1414/IPSLCM5A_20120731/modipsl/config/IPSLCM5A/v5.rcp45CMR2
        2) path/to/libIGCM/RunChecker.job v5.rcp45CMR2

  45. STOP (Fatal in run.card) : problem !

  46. Verification and correction 5/6 • You have received 2 "failed" emails, or the RunChecker status is abnormal, i.e. red. • Analyse the situation : • Simple case : • Re-submit the rebuild, pack_debug or pack_restart jobs • Re-submit pack_output • Less simple case (holes in the data) : • Use clean_year to go back to a healthy situation : path/to/libIGCM/clean_year.job [SSAA] • All data from the current year back to SSAA (included) will be deleted. • Restart the simulation. https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#Startorrestartpostprocessingjobs1

  47. TimeSeries_Checker.job • Install a dedicated directory • Copy the required files and directories : config.card, run.card, COMP, POST • Copy the script TimeSeries_Checker.job from libIGCM • Modify the job : libIGCM, name of the simulation, ... • Look at the documentation : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#TimeSeries_checker.job-Recommendedmethod
      > mkdir POST_REDO
      > cd POST_REDO
      > cp -pr ../COMP ../POST ../config.card ../run.card .
      > cp ../../../../libIGCM/TimeSeries_Checker.job .
      > vi TimeSeries_Checker.job
        # Check/Modify :
        libIGCM=
        SpaceName=
        ExperimentName=
        JobName=
        CARD_DIR=
        BRIDGE_MSUB_PROJECT=gen2211
      > ./TimeSeries_Checker.job
        Answer y to submit create_ts.job
      ksh > ./TimeSeries_Checker.job 2>&1|tee TSC_OUT_TO_KEEP

  48. Verification and correction 6/6 • Everything went ok : • End of simulation email • No anomaly detected by RunChecker • TimeSeries_Checker (and SE_checker) : checks existing time series and submits create_ts jobs to build the missing ones • Keep in mind : • Rebuild jobs automatically submit pack jobs, as well as the corresponding TS and SE jobs. • Pack, TS and SE jobs may be re-submitted independently of a rebuild job.

  49. The END! (so soon?) champagne-users@ipsl.jussieu.fr platform-users@ipsl.jussieu.fr Mailing list to ask for help and to share information with other users
