
CCSM4 - A Flexible New Infrastructure for Earth System Modeling



  1. CCSM4 - A Flexible New Infrastructure for Earth System Modeling Mariana Vertenstein NCAR CCSM Software Engineering Group

  2. Major Infrastructure Changes since CCSM3 • CCSM4/CPL7 development could not have occurred without the following collaborators • DOE/SciDAC • Oak Ridge National Laboratory (ORNL) • Argonne National Laboratory (ANL) • Los Alamos National Laboratory (LANL) • Lawrence Livermore National Laboratory (LLNL) • NCAR/CISL • ESMF

  3. Outline • What are the software requirements of a community earth system model? • Overview of the current CCSM4 • How does CCSM4 address these requirements? • Flexibility permits greater efficiency, throughput, ease of porting, and ease of model development • How is CCSM4 being used in new ways? • Interactive ensembles - extending the traditional definition of a component • Extending CCSM to ultra-high resolutions • What are CCSM4 scalability and performance? • Upcoming releases and new CCSM4 scripts

  4. CESM General Software Requirements • User-friendly component parameterization: the model system permits each component to be developed and tested independently, even on a single processor (e.g. CAM/SCAM) • Scientific consistency: one code base - the "stand-alone" development component code base is the same as in the fully coupled system • Extensibility: the design provides extensibility to add new components (e.g. land ice) and new coupling strategies (interactive ensembles, data assimilation capability) • Performance/efficiency/porting: the coupling architecture and components can be easily ported and run effectively at low resolution (e.g. paleo) and at ultra-high resolution on thousands of processors

  5. Specific High Resolution Requirements Scalable and flexible coupling infrastructure Parallel I/O throughout model system (for both scalable memory and performance) Scalable memory (minimum global arrays) for each component Capability to use both MPI and OpenMP effectively to address requirements of new multi-core architectures

  6. CCSM4 Overview • Consists of a set of 4 (5 for CESM) geophysical component models on potentially different grids that exchange boundary data with each other only via communication with a coupler (hub-and-spoke architecture) • New science is resulting in a sharply increasing number of fields being communicated between components • Large code base: >1M lines • Fortran 90 (mostly) • Developed over 20+ years • 200-300K lines are performance-critical --> no compact computational kernels, so good compilers are needed • Collaborations are critical • DOE/SciDAC, University Community, NSF (PetaApps), ESMF

  7. What are the CCSM Components? • Atmosphere component: CAM, DATM, (WRF). CAM modes: multiple dycores, multiple chemistry options, WACCM, single column. Data-ATM: multiple forcing/physics modes • Land component: CLM, DLND, (VIC). CLM modes: no BGC, BGC, dynamic vegetation, BGC-DV, prescribed vegetation, urban. Data-LND: multiple forcing/physics modes • Ice component: CICE, DICE. CICE modes: fully prognostic, prescribed. Data-ICE: multiple forcing/physics modes • Ocean component: POP, DOCN (SOM/DOM), (ROMS). POP modes: ecosystem, fully coupled, ocean-only, multiple physics options. Data-OCN: multiple forcing/physics modes (SOM/DOM) • New land-ice component • Coupler: regridding, merging, calculation of ATM/OCN fluxes, conservation diagnostics

  8. CCSM Component Grids • Ocean and sea ice must run on the same grid • displaced pole, tripole • Atmosphere and land can now run on different grids • these are in general different from the ocean/ice grid • lat/lon, but also the new cubed sphere for CAM • Globally, grids span low resolution (3 degree) to ultra-high resolution • 0.25° ATM/LND [1152 x 768] • 0.50° ATM/LND [576 x 384] • 0.1° OCN/ICE [3600 x 2400] • Regridding • Done in parallel at runtime using mapping files that are generated offline using SCRIP • In the past, grids have been global and logically rectangular – but now they can be single point, regional, cubed sphere … • Regridding issues are rapidly becoming a higher priority

  9. CCSM Component Parallelism • MPI/OpenMP • CAM, CLM, CICE, POP have MPI/OpenMP hybrid capability • Coupler only has MPI capability • Data models only have MPI capability • Parallel I/O (use of PIO library) • CAM, CICE, POP, CPL, Data models all have PIO capability
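
The hybrid MPI/OpenMP capability mentioned above follows the usual pattern: MPI decomposes the domain across tasks, and OpenMP threads the task-local loops. The following is a minimal, self-contained Fortran sketch of that pattern, not CCSM code; the array sizes and the reduction are placeholders.

    program hybrid_sketch
      ! Minimal illustration of the hybrid MPI/OpenMP pattern used by the
      ! components: MPI distributes the domain across tasks, OpenMP threads
      ! the local work.  Build with an MPI wrapper and OpenMP enabled,
      ! e.g. "mpif90 -fopenmp hybrid_sketch.f90".
      use mpi
      implicit none
      integer, parameter :: nlocal = 1000     ! local chunk owned by this task
      integer :: ierr, rank, ntasks, i
      double precision :: field(nlocal), local_sum, global_sum

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)

      local_sum = 0.0d0
      ! OpenMP threads the loop over the task-local portion of the domain.
      !$omp parallel do reduction(+:local_sum)
      do i = 1, nlocal
         field(i)  = dble(rank*nlocal + i)    ! stand-in for real physics work
         local_sum = local_sum + field(i)
      end do
      !$omp end parallel do

      ! MPI combines the per-task partial results.
      call MPI_Reduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                      0, MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'global sum = ', global_sum
      call MPI_Finalize(ierr)
    end program hybrid_sketch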

  10. CCSM Architecture: CCSM3 (cpl6) vs. CCSM4 (cpl7) [processor-layout schematic] • Original multiple-executable CCSM3 architecture (cpl6): CAM, CLM, CICE, POP and CPL run as separate executables, each on its own set of processors, concurrently in time • New single-executable CCSM4 architecture (cpl7): a top-level driver controls time evolution and calls the coupler (regridding, merging) and the components • Components can be laid out sequentially on a shared processor set, or in hybrid sequential/concurrent layouts across disjoint processor sets
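
To make the cpl7 layout concrete, here is a toy Fortran sketch of a hub-and-spoke driver time loop. It is conceptual only, not the actual CCSM4 driver, and the routine names are invented stubs; in the real system the same call sequence applies, but components placed on disjoint processor sets execute their run methods concurrently while the driver keeps them synchronized at coupling intervals.

    program toy_driver
      ! Conceptual sketch of the single-executable cpl7 design: one driver
      ! controls time evolution, and every component-to-component exchange
      ! passes through the coupler (hub and spoke).
      implicit none
      integer :: step
      integer, parameter :: nsteps = 4   ! number of coupling intervals to run

      do step = 1, nsteps
         call cpl_prep(step)     ! coupler: regrid/merge fields sent to components
         call atm_run(step)      ! each component advances one coupling interval
         call lnd_run(step)
         call ice_run(step)
         call ocn_run(step)
         call cpl_collect(step)  ! coupler: gather exports, compute fluxes/diagnostics
      end do

    contains

      subroutine cpl_prep(n)
        integer, intent(in) :: n
        print *, 'coupler: prepare forcing for step', n
      end subroutine cpl_prep

      subroutine atm_run(n)
        integer, intent(in) :: n
        print *, 'atm: advance step', n
      end subroutine atm_run

      subroutine lnd_run(n)
        integer, intent(in) :: n
        print *, 'lnd: advance step', n
      end subroutine lnd_run

      subroutine ice_run(n)
        integer, intent(in) :: n
        print *, 'ice: advance step', n
      end subroutine ice_run

      subroutine ocn_run(n)
        integer, intent(in) :: n
        print *, 'ocn: advance step', n
      end subroutine ocn_run

      subroutine cpl_collect(n)
        integer, intent(in) :: n
        print *, 'coupler: collect exports for step', n
      end subroutine cpl_collect

    end program toy_driver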

  11. Advantages of CPL7 Design • New flexible coupling strategy • Design targets a wide range of architectures - massively parallel peta-scale hardware, smaller Linux clusters, and even a single laptop computer • Provides efficient support for varying levels of parallelism via simple run-time configuration of the processor layout • New CCSM4 scripts provide one simple xml file to specify the processor layout of the entire system, plus automated timing information to simplify load balancing (see the sketch below) • Scientific unification • ALL model development is done with one code base - elimination of separate stand-alone component code bases (CAM, CLM) • Code reuse and maintainability • Lowers the cost of support/maintenance
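
As an illustration of the run-time processor-layout configuration mentioned above, a fragment of the per-case layout xml file might look like the following. The file and variable names follow the CCSM4/CESM-era scripts as best recalled here and should be treated as illustrative; the values are placeholders, not a recommended layout.

    <!-- env_mach_pes.xml (illustrative fragment): tasks, threads, root PE per component -->
    <entry id="NTASKS_ATM" value="128" />   <!-- MPI tasks for the atmosphere -->
    <entry id="NTHRDS_ATM" value="2"   />   <!-- OpenMP threads per atmosphere task -->
    <entry id="ROOTPE_ATM" value="0"   />   <!-- first PE assigned to the atmosphere -->
    <entry id="NTASKS_OCN" value="64"  />   <!-- MPI tasks for the ocean -->
    <entry id="ROOTPE_OCN" value="128" />   <!-- ocean on its own PEs, so it runs concurrently -->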

  12. More CPL7 advantages… • Simplicity • Easier to debug - much easier to understand the time flow • Easier to port – ported to • IBM p6 (NCAR) • Cray XT4/XT5 (NICS, ORNL, NERSC) • BGP (Argonne), BGL (LLNL) • Linux clusters (NCAR, NERSC, CCSM4-alpha users) • Easier to run - new xml-based scripts provide a user-friendly capability to create "out-of-the-box" experiments • Performance (throughput and efficiency) • Much greater flexibility to achieve optimal load balance for different choices of • Resolution, component combinations, component physics • Automatically generated timing tables provide users with immediate feedback on both performance and efficiency

  13. CCSM4 Provides a Seamless End-to-End Cycle of Model Development, Integration and Prediction with One Unified Model Code Base

  14. New frontiers for CCSM • Using the coupling infrastructure in novel ways • Implementation of interactive ensembles • Pushing the limits of high resolution • Capability to really exercise the scalability and performance of the system

  15. CCSM4 and PetaApps • CCSM4/CPL7 is an integral piece of an NSF PetaApps award • A funded 3-year effort aimed at advancing climate science capability for petascale systems • NCAR, COLA, NERSC, U. Miami • Interactive ensembles using CCSM4/CPL7 involve both computational and scientific challenges • used to understand how oceanic, sea-ice, and atmospheric noise impacts climate variability • can also scale out to tens of thousands of processors • Also examines the use of PGAS languages in CCSM

  16. Interactive Ensembles and CPL7 [processor-layout schematic: multiple CAM, CLM, CICE, and POP instances under one driver and coupler] • All ensemble members run concurrently on non-overlapping processor sets • Communication with the coupler takes place serially over ensemble members • Setting a new number of ensemble members requires editing 1 line of an xml file • 35M CPU hours on TeraGrid [2nd largest] • Currently being used to perform ocean data assimilation (using DART) for POP2

  17. CCSM4 and Ultra High Resolution • DOE/LLNL Grand Challenge Simulation • .25° atmosphere/land and .1° ocean/ice • Multi-institutional collaboration (ANL, LANL, LLNL, NCAR, ORNL) • First ever U.S. multi-decadal global climate simulation with eddy resolving ocean and high resolution atmosphere • 0.42 sypd on 4048 cpus (Atlas LLNL cluster) • 20 years completed • 100 GB/simulated month

  18. Ultra High Resolution (cont) • NSF/PetaApps Control Simulation (IE baseline) – John Dennis (CISL) has carried this out • .5° atmosphere/land and .1° ocean/ice • Control run in production @ NICS (Teragrid) • 1.9 sypd on 5848 quad-core XT5 cpus (4-5 months continuous simulation) • 155 years completed • 100TB of data generated (generating 0.5-1 TB per wall clock day) • 18M CPU hours used • Transfer output from NICS to NCAR (100 – 180 MB/sec sustained) – archive on HPSS • Data analysis using 55 TB project space at NCAR

  19. Next steps at high resolution • Future work • Use the OpenMP capability in all components effectively to take advantage of multi-core architectures • Cray XT5 hex-core and BG/P • Improve disk I/O performance [currently using 10 - 25% of run time] • Improve memory footprint scalability • Future simulations • .25° atm / .1° ocean • T341 atm / .1° ocean (effect of Eulerian dycore) • 1/8° atm (HOMME) / .25° land / .1° ocean

  20. CCSM4 Scalability and Performance

  21. New Parallel I/O library (PIO) • Interface between the model and the I/O library. Supports • Binary • NetCDF3 (serial netcdf) • Parallel NetCDF (pnetcdf) (MPI/IO) • NetCDF4 • User has enormous flexibility to choose what works best for their needs • Can read one format and write another • Rearranges data from model decomp to I/O friendly decomp (rearranger is framework independent) – model tasks and I/O tasks can be independent
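
As a rough illustration of the pattern PIO supports, the Fortran sketch below writes a distributed 1-d field through a small set of I/O tasks. It is modeled on the PIO1-style Fortran interface; exact argument lists, kinds, and constants vary between PIO versions, so treat the call details as assumptions rather than definitive usage.

    program pio_sketch
      ! Sketch only: every MPI task owns part of a global field (compdof maps
      ! local elements to global indices), but only num_iotasks tasks perform
      ! file I/O; PIO rearranges the data from the model decomposition onto
      ! the I/O tasks before writing.
      use mpi
      use pio
      implicit none
      integer, parameter :: r8 = selected_real_kind(12)
      integer, parameter :: nlocal = 16, num_iotasks = 2, num_agg = 0, stride = 2
      integer :: ierr, rank, npes, i, dimid
      integer :: dims(1)
      type(iosystem_desc_t) :: iosys
      type(file_desc_t)     :: file
      type(io_desc_t)       :: iodesc
      type(var_desc_t)      :: var
      integer(kind=PIO_offset) :: compdof(nlocal)
      real(kind=r8) :: field(nlocal)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr)

      ! This task owns global indices rank*nlocal+1 .. (rank+1)*nlocal.
      do i = 1, nlocal
         compdof(i) = rank*nlocal + i
         field(i)   = real(compdof(i), kind=r8)
      end do
      dims(1) = npes*nlocal

      ! Set up the I/O subsystem, the decomposition, and a netCDF file.
      call PIO_init(rank, MPI_COMM_WORLD, num_iotasks, num_agg, stride, &
                    PIO_rearr_box, iosys)
      call PIO_initdecomp(iosys, PIO_double, dims, compdof, iodesc)
      ierr = PIO_createfile(iosys, file, PIO_iotype_netcdf, 'field.nc', PIO_clobber)
      ierr = PIO_def_dim(file, 'x', dims(1), dimid)
      ierr = PIO_def_var(file, 'field', PIO_double, (/dimid/), var)
      ierr = PIO_enddef(file)

      ! Write the distributed array; PIO rearranges it onto the I/O tasks.
      call PIO_write_darray(file, var, iodesc, field, ierr)

      call PIO_closefile(file)
      call PIO_freedecomp(iosys, iodesc)
      call PIO_finalize(iosys, ierr)
      call MPI_Finalize(ierr)
    end program pio_sketch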

  22. PIO in CCSM • PIO is implemented in CAM, CICE and POP • Usage is critical for high resolution, high processor count simulations • Serial I/O is one of the largest sources of global memory in CCSM - at high enough resolution it will eventually run out of memory • Serial I/O results in a serious performance penalty at higher processor counts • A performance benefit is noticed even with serial netcdf (model output is decomposed across the output I/O tasks)

  23. CPL scalability • Scales much better than the previous version – both in memory and throughput • Inherently involves a lot of communication relative to flops • The new coupler has not been a bottleneck in any configuration we have tested so far – other issues such as load balance and scaling of other processes have dominated • Minor impact at 1800 cores (Kraken PetaApps control)

  24. CCSM4 Cray XT Scalability [processor-layout chart: CAM 1664 cores, POP 4028, CICE 1800, CPL 1800] • 1.9 sypd on 5844 cores with I/O on the Kraken quad-core XT5 (courtesy of John Dennis)

  25. CAM/HOMME Dycore • The cubed-sphere grid overcomes dynamical core scalability problems inherent to the lat/lon grid • Work of Mark Taylor (SciDAC), Jim Edwards (IBM), Brian Eaton (CSEG) • PIO library used for all I/O (this work COULD NOT have been done without PIO) • BGP (4 cores/node): excellent scalability down to 1 element per processor (86,200 processors at 0.25 degree resolution) • JaguarPF (12 cores/node): 2-3x faster per core than BGP, but scaling is not as good - the 1/8 degree run loses scalability at 4 elements per processor

  26. CAM/HOMME Real Planet: 1/8° Simulations • CCSM4 - CAM4 physics configuration with cyclical year 2000 ocean forcing data sets • CAM-HOMME 1/8°, 86400 cores • CLM2 on lat/lon 1/4°, 512 cores • Data ocean/ice, 1°, 512 cores • Coupler, 8640 cores • JaguarPF simulation • Excellent scalability: 1/8 degree running at 3 SYPD on Jaguar • Large-scale features agree well with the Eulerian and FV dycores • Runs confirm that the scalability of the dynamical core is preserved by CAM and the scalability of CAM is preserved by the CCSM real-planet configuration

  27. How will CCSM4 be released? • Leverage Subversion revision control system • Source code and Input Data obtained from Subversion servers (not tar files) • Output data of control runs from ESG • Advantages: • Easier for CSEG to produce frequent updates • Flexible way to have users obtain new updates of source code (and bug fixes) • Users can leverage Subversion to merge new updates into their “sandbox” with their modifications

  28. Obtaining the Code and Updates • Public Subversion source code repository: https://svn-ccsm-release.cgd.ucar.edu • Use svn co to obtain the ccsm4.0 code, then make your own modifications in your sandbox • Use svn merge to obtain new code updates and bug fixes, which Subversion merges with your own changes
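
For example, that workflow might look like this on the command line. The release tag path and the update tag name below are assumptions for illustration; use the paths published on the release page.

    # Check out the ccsm4.0 release into a local sandbox
    svn co https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0 ccsm4_0
    cd ccsm4_0
    # ... make your own modifications in the sandbox ...

    # Later, merge a released update (bug fixes, new code) into the sandbox;
    # Subversion combines the update with your local changes
    svn merge https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0 \
              https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0_a01 .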

  29. Creating an Experimental Case • New CCSM4 scripts simplify: • Porting CCSM4 to your machine • Creating your experiment and obtaining the necessary input data for it • Load balancing your experiment • Debugging your experiment - if something goes wrong during the simulation (which of course never happens), it is simpler to determine what it is

  30. Porting to your machine • CCSM4 scripts contain a set of supported machines – user can run out of the box • CCSM4 scripts also support a set of “generic” machines (e.g. linux clusters with a variety of compilers) • user still needs to determine which generic machine most closely resembles their machine and needs to customize Makefile macros for their machine • user feedback will be leveraged to continuously upgrade the generic machine capability post-release

  31. Obtaining Input Data • Input data is now in a Subversion repository • The entire input data set is about 900 GB and growing • CCSM4 scripts permit the user to automatically obtain only the input data needed for a given experimental configuration

  32. Accessing input data for your experiment • Public Subversion input data repository: https://svn-ccsm-inputdata.cgd.ucar.edu • Set up the experiment with create_newcase (component set, resolution, machine) • Determine the local root directory where all input data will go (DIN_LOC_ROOT) • Use check_input_data to see if the required datasets are present in DIN_LOC_ROOT • Use check_input_data -export to automatically obtain ONLY the datasets required for the experiment into DIN_LOC_ROOT • Load balance your experimental configuration (use the timing files) • Run the experiment (see the sketch below)
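
A hedged sketch of those steps at the command line follows; the case name, component set, resolution, and machine are placeholders, and the exact option names and valid values come from the scripts User's Guide.

    # Create a case from a component set, resolution, and machine
    ./create_newcase -case mytest -compset B -res 1.9x2.5_gx1v6 -mach mymachine
    cd mytest

    # Report which required datasets are already present in DIN_LOC_ROOT ...
    ./check_input_data

    # ... and fetch ONLY the missing ones from the Subversion inputdata repository
    ./check_input_data -export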

  33. Load Balancing Your Experiment • The load balancing exercise must be done before starting an experiment • Repeat short experiments (20 days) without I/O and adjust the processor layout to • optimize throughput • minimize idle time (maximize efficiency) • Detailed timing results are produced with each run • This makes the load balancing exercise much simpler than in CCSM3

  34. Load Balancing CCSM Example [timing diagrams comparing two processor layouts] • With CAM on 1664 cores and POP on 3136 cores, the model runs at 1.53 SYPD with noticeable idle time • Increasing the core count for POP to 4028 reduces idle time and raises throughput to 2.23 SYPD

  35. CCSM4 Releases and Timelines • January 15, 2010: • CCSM4.0 alpha release - to subset of users and vendors with minimal documentation (except for script's User's Guide) • April 1, 2010: • CCSM4.0 release - Full documentation, including User's Guide, Model Reference Documents, and experimental data • June 1, 2010: CESM1.0 release • ocean ecosystem, CAM-AP, interactive chemistry, WACCM • New CCSM output data web design underway (including comprehensive diagnostics)

  36. CCSM4.0 alpha release • An extensive CCSM4 User's Guide is already in place • Apply for alpha user access at www.ccsm.ucar.edu/models/ccsm4.0

  37. Upcoming Challenges • This year • Carry out IPCC simulations • Release CCSM4 and CESM1 and updates • Resolve performance and memory issues with ultra-high resolution configuration on Cray XT5 and BG/P • Create user-friendly validation process for porting to new machines • On the horizon • Support regional grids • Nested regional modeling in CPL7 • Migration to optimization for GPUs

  38. Contributors: D. Bader (ORNL) D. Bailey (NCAR) C. Bitz (U Washington) F. Bryan (NCAR) T. Craig (NCAR) A. St. Cyr (NCAR) J. Dennis (NCAR) B. Eaton (NCAR) J. Edwards (IBM) B. Fox-Kemper (MIT,CU) N. Hearn (NCAR) E. Hunke (LANL) B. Kauffman (NCAR) E. Kluzek (NCAR) B. Kadlec (CU) D. Ivanova (LLNL) E. Jedlicka (ANL) E. Jessup (CU) R. Jacob (ANL) P. Jones (LANL) J. Kinter (COLA) A. Mai (NCAR) Funding: DOE-BER CCPP Program Grant DE-FC03-97ER62402 DE-PS02-07ER07-06 DE-FC02-07ER64340 B&R KP1206000 DOE-ASCR B&R KJ0101030 NSF Cooperative Grant NSF01 NSF PetaApps Award Computer Time: Blue Gene/L time: NSF MRI Grant NCAR University of Colorado IBM (SUR) program BGW Consortium Days IBM research (Watson) LLNL Stony Brook & BNL CRAY XT time: NICS/ORNL NERSC Sandia Big Interdisciplinary Team! • S. Mishra (NCAR) • S. Peacock (NCAR) • K. Lindsay (NCAR) • W. Lipscomb (LANL) • R. Loft (NCAR) • R. Loy (ANL) • J. Michalakes (NCAR) • A. Mirin (LLNL) • M. Maltrud (LANL) • J. McClean (LLNL) • R. Nair (NCAR) • M. Norman (NCSU) • N. Norton (NCAR) • T. Qian (NCAR) • M. Rothstein (NCAR) • C. Stan (COLA) • M. Taylor (SNL) • H. Tufo (NCAR) • M. Vertenstein (NCAR) • J. Wolfe (NCAR) • P. Worley (ORNL) • M. Zhang (SUNYSB)

  39. Thanks! Questions? CCSM4.0 alpha release page at www.ccsm.ucar.edu/models/ccsm4.0
