Porting Atmospheric Forecasting Model to HPC Platforms
John Athanaselis, johnathana@mg.uoa.gr, http://forecast.uoa.gr
National & Kapodistrian University of Athens, School of Physics, Division of Physics of Environment Meteorology, Atmospheric Modeling and Weather Forecasting Group
Topics to be discussed
• Why atmospheric forecasting models are important
• Key characteristics that make forecasting models challenging for HPC
• Current technologies we are using and their limitations
• Porting forecasting models to GPU accelerated HPC platforms
Atmospheric models
Running on an operational basis (http://forecast.uoa.gr):
• SKIRON
• ICLAMS/RAMS
• CAMx
Atmospheric model characteristics
• Weather forecasting has a real deadline: a forecast is only useful if it is ready before the period it covers
• Improving forecasting accuracy requires enormous computational resources
• The cost grows with each refinement: additional physical processes, increased resolution, a reduced time step and a longer total forecast interval
Atmospheric model characteristics
Domain -> Number of grid points -> CFL (Courant-Friedrichs-Lewy) condition -> Time step
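The chain above can be made concrete with a small calculation. The sketch below is only an illustration, not the criterion used by SKIRON, RAMS or CAMx: it assumes a square domain, a uniform grid, and the simple one-dimensional advective form of the CFL condition, dt <= C * dx / u. The function names and the sample values are hypothetical.

```python
def grid_points(domain_km, dx_km, levels):
    """Grid points in a square domain of side domain_km (assumed layout)."""
    nx = int(round(domain_km / dx_km)) + 1  # points along one horizontal side
    return nx * nx * levels


def max_time_step(dx_m, max_speed_ms, courant=1.0):
    """Largest time step (s) allowed by the 1-D advective CFL condition."""
    return courant * dx_m / max_speed_ms


def relative_cost(dx_coarse, dx_fine):
    """Cost ratio when only the horizontal grid spacing is refined."""
    ratio = dx_coarse / dx_fine
    return ratio ** 2 * ratio  # more points in x and y, times more time steps


if __name__ == "__main__":
    # Hypothetical case: 1000 km domain, 30 levels, fastest signal 100 m/s.
    for dx_km in (20, 10):
        n = grid_points(1000, dx_km, 30)
        dt = max_time_step(dx_km * 1000, 100)
        print(f"dx={dx_km} km: {n} points, dt <= {dt:.0f} s")
    print("cost ratio 20 km -> 10 km:", relative_cost(20, 10))
```

Under these assumptions, halving the horizontal grid spacing roughly quadruples the number of grid points and halves the allowed time step, so the same forecast costs about eight times as much.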
High Performance Computing
Parallelism and the technology used for each:
• Cluster: MPI
• Shared memory: OpenMP
• Multicore clusters: MPI+OpenMP
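On clusters, grid-based models are commonly parallelized by domain decomposition: each MPI process owns a contiguous block of the grid. The sketch below shows only the bookkeeping for a one-dimensional split of grid rows across ranks, in plain Python with no MPI calls. The function name `split_rows` is hypothetical, and real models typically decompose in two dimensions and exchange halo rows between neighbours.

```python
def split_rows(n_rows, n_ranks):
    """Return one (start, stop) row range per rank, as evenly as possible."""
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional row each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for rank, (lo, hi) in enumerate(split_rows(10, 3)):
        print(f"rank {rank}: rows {lo}..{hi - 1}")
```

In a hybrid MPI+OpenMP setup, each rank would then share the loops over its own rows among the threads of one multicore node.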
GPU
Advantages:
• Fast and cheap
• Energy efficient
Disadvantages:
• Not every algorithm can reach the theoretical speedup
• Hard to program
• No mature industrial/academic standard programming model
GPU
What kind of algorithms run well on this architecture?
• Massive parallelism: needed to use hundreds of thread processors effectively and to provide enough slack parallelism for the fast multi-threading to tolerate device memory latency and maximize device memory bandwidth utilization.
• Single-precision (32-bit) floating-point numbers: double-precision floats are not universally supported on GPUs. There have been efforts to emulate double-precision floating-point values on GPUs; however, the speed tradeoff negates any benefit of offloading the computation onto the GPU.
• Limited synchronization: thread processors within a multi-processor can synchronize quickly enough to enable coordinated vector operations like reductions, but there is virtually no ability to synchronize across multi-processors.
• Locality: needed to enable use of the hardware or user-managed data caches to minimize accesses to device memory.
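The single-precision point matters when porting code that was written for 64-bit arithmetic. The sketch below illustrates the effect on the CPU. It emulates 32-bit rounding with the standard `struct` module and is not GPU code; the helper names `to_f32` and `sum_f32` are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate values, rounding to 32-bit precision after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


if __name__ == "__main__":
    values = [1e8] + [1.0] * 1000
    print("64-bit sum:", sum(values))
    print("32-bit sum:", sum_f32(values))
```

Near 1e8 the spacing between adjacent 32-bit floats is 8, so each added 1.0 is rounded away in the 32-bit sum while the 64-bit sum keeps all of them. Long accumulations of this kind are where a single-precision port needs checking against the double-precision results.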
Success stories
John Michalakes stated the following: "...the 5× to 20× increase in WSM5 performance translates into 1.25× - 1.3× increase in total application performance (Amdahl’s law limits the total increase to 1.3×). A 1.25× improvement in model performance from a few months effort is rare. Though 1.3× is clearly not enough to support strong scaling, the initial result is still promising. Moving more computation into the GPU will yield equivalent performance from smaller more efficient clusters. Furthermore, planned improvements in GPU speed, host proximity, and programmability will allow WRF and other highly data-parallel weather and climate models to execute almost entirely on the GPU."
Source: "GPU Acceleration of Numerical Weather Prediction"
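The Amdahl's law remark in the quote can be checked with a short calculation. Amdahl's law gives the overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of run time spent in the accelerated part and s is the speedup of that part. The quote does not state p; the sketch below infers it from the quoted 1.3× limit, which is an assumption made here for illustration. The function names are hypothetical.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def fraction_from_limit(limit):
    """Accelerated fraction implied by the speedup limit as kernel_speedup grows without bound."""
    return 1.0 - 1.0 / limit


if __name__ == "__main__":
    p = fraction_from_limit(1.3)  # inferred from the quoted 1.3x limit
    print(f"implied WSM5 share of run time: {p:.1%}")
    for s in (5, 20):
        print(f"kernel {s}x -> application {amdahl_speedup(p, s):.2f}x")
```

With the inferred share of about 23%, a 5× kernel speedup gives roughly 1.23× overall and a 20× kernel speedup roughly 1.28×, in line with the 1.25× - 1.3× range in the quote. It also shows why moving more of the computation onto the GPU, which raises p, matters more than making one kernel faster.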
Some concluding remarks
• Exploiting GPU power requires technical skills
• Automated tools and standards are making progress
• The approach looks promising