
NWP Transition from AIX to Linux Lessons Learned

NWP Transition from AIX to Linux Lessons Learned. Dan Sedlacek AFWA Chief Engineer AFWA A5/8 14 MAR 2011. Overview. Introduction AFWA Architecture Applications run on HPC Original NWP Environment Linux Configuration TCO Comparison Lessons Learned Future Linux Plans Summary.

Presentation Transcript


  1. NWP Transition from AIX to Linux Lessons Learned Dan Sedlacek AFWA Chief Engineer AFWA A5/8 14 MAR 2011

  2. Overview • Introduction • AFWA Architecture • Applications Run on HPC • Original NWP Environment • Linux Configuration • TCO Comparison • Lessons Learned • Future Linux Plans • Summary

  3. Introduction • AFWA has a long history with an AIX HPC environment • Air Force Weather Environment • Worldwide, 24x7x365, systems, weather data and product support • Headquarters, Operational Weather Squadrons (OWS), Combat Weather Teams (CWTs), and Climatological Center (14th WS) • 600+ systems across 4 distinct security enclaves • 16 million+ lines of code • ~1,000 software applications supported • As model resolutions improve and processing requirements soar, AFWA requirements for NWP processing capability have increased dramatically • SEMS (in-house support contractor) performed a study, evaluating IBM, HP, and Cray • Red Hat Linux on HP hardware • Transitioning from IBM/AIX to HP/Linux has resulted in a significant savings in Total Cost of Ownership (TCO)

  4. AFWA Architecture (Unclassified Only)

  5. Applications Run on HPC • Run Regional Models • WRF • WRF Chem • CDFS II (future) • Dust • LIS • Run Global UM • Ensembles • Model post-processing • Misc space products

  6. Original NWP Environment (Unclassified)

  7. “Free” Hardware Adventure • In 2008 AFWA evaluated JVN (available from HPCMO Modernization) • 1024 compute nodes • 36 racks of equipment • 589 kW power requirements • 161 tons of cooling • The “Free” hardware was not cost-effective • SEMS performed a study to evaluate alternatives • New hardware was more cost-effective • Less space • Less power • Less cooling • More Flops • Lower TCO • Decision made to pursue Linux HPC solution

  8. AFWA Unclassified HPC Configuration

  9. Linux Configuration Prod 8/DC3 • OS: Linux RHEL 5.3 • File System: Lustre • Disk: 50 TB • I/O Bandwidth: 900 Mb/s throughput • Chipset: (2) 2.53 GHz Intel Nehalem E5540 quad-core CPUs per node • Compute Blades: 128 • Cores/Memory: 1024 cores, 3GB per core • Processing capacity: 10 TeraFlops (Production) • Test and development system (DC3): 5 TeraFlops

  10. TCO Comparison • Original 10 TeraFlops of IBM/AIX HPC O&M (non-labor): $1.4M • Nominally $133K per TeraFlop for IBM/AIX HPC • Annual projected O&M costs for Linux (now totalling 24 TeraFlops): $1M • Conservatively, $30K per TeraFlop for HP/Linux HPC • Bottom line: Linux HPC solution represented a significant savings

  11. Lessons Learned • Not all “free” hardware is desirable (JVN) • Differences in Linux vs. AIX compilers (minor, but require modifications) • Significant tuning differences between AIX and Linux • File system configurations significantly different (Lustre/IBRIX vs. GPFS) • Job scheduler differences had to be worked through (IBM LoadLeveler vs. Platform LSF) • Full reduction of TCO doesn’t occur until previous OS support is no longer required • So far, Linux has proven to be a reliable and cost-effective OS for NWP

  12. Future Linux Plans • 5000+ core Linux cluster is being planned for delivery in August 2011 • Represents 51 TeraFlops of additional capability • Total HPC capacity by end of year 2011 > 90 TeraFlops • Total phase out of IBM/AIX HPC environment

  13. Summary • Total Cost of Ownership is complex • Initial costs • Transition costs • Facility costs • Support costs • Linux does scale well • Linux is a viable and cost-effective HPC platform • Transitioning from IBM/AIX to HP/Linux has resulted in a significant TCO savings
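
The per-TeraFlop comparison on slide 10 can be reproduced roughly with a short calculation. The sketch below is illustrative only: it divides the O&M totals stated on the slide by the stated capacities, and it assumes both totals are annual, non-labor figures (the slide labels only the Linux figure as annual). The slide does not say how its quoted $133K and $30K per-TeraFlop values were derived, so plain division gives somewhat different numbers. The names `systems` and `cost_per_teraflop` are hypothetical, introduced here for illustration.

```python
# Figures as stated on slide 10. Treating both O&M totals as annual,
# non-labor costs is an assumption; the slide only says "annual" for Linux.
systems = {
    "IBM/AIX": {"om_usd": 1_400_000, "teraflops": 10},
    "HP/Linux": {"om_usd": 1_000_000, "teraflops": 24},
}


def cost_per_teraflop(om_usd, teraflops):
    """Plain ratio of O&M cost to processing capacity."""
    return om_usd / teraflops


# Per-TeraFlop cost for each platform.
per_tf = {name: cost_per_teraflop(**fig) for name, fig in systems.items()}

for name, cost in per_tf.items():
    print(f"{name}: ${cost:,.0f} per TeraFlop")

# How many times more expensive per TeraFlop the AIX system is under
# this simple ratio.
ratio = per_tf["IBM/AIX"] / per_tf["HP/Linux"]
print(f"IBM/AIX costs about {ratio:.1f}x more per TeraFlop")
```

Even under this simple ratio the direction of the slide's bottom line holds: the Linux systems cost substantially less per TeraFlop to operate.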
