Parallel netCDF Study. John Tannahill, Lawrence Livermore National Laboratory (tannahill1@llnl.gov), September 30, 2003
Acknowledgments (1). • This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. • Work funded by the LLNL/CAR Techbase Program. • Many thanks to this program for providing the resources to conduct this study of parallel netCDF, something that probably would not have occurred otherwise. • This is LLNL Report: UCRL-PRES-200247
Acknowledgments (2). Additional thanks to all the people who contributed to this study in one way or another: • Argonne National Laboratory (ANL): • William Gropp, Robert Latham, Rob Ross, & Rajeev Thakur. • Northwestern (NW) University: • Alok Choudhary, Jianwei Li, & Wei-keng Liao. • Lawrence Livermore National Laboratory (LLNL): • Richard Hedges, Bill Loewe, & Tyce McLarty. • Lawrence Berkeley Laboratory (LBL) / NERSC: • Chris Ding & Woo-Sun Yang. • UCAR / NCAR / Unidata: • Russ Rew. • University of Chicago: • Brad Gallagher.
Overview of contents. • Proposal background and goals. • Parallel I/O options initially explored. • A/NW’s parallel netCDF library. (A/NW = Argonne National Laboratory / Northwestern University) • Installation. • Fortran interface. • Serial vs. Parallel netCDF performance. • Test code details. • Timing results. • Parallel HDF5 comparison. • Observations / Conclusions. • Remaining questions / issues.
Why parallel netCDF (1)? • Parallel codes need parallel I/O. • Performance. • Ease of programming and understandability of code. • Serial netCDF is in widespread use. • Currently a de facto standard for much of the climate community. • Easy to learn and use. • Well supported by Unidata. • Huge number of existing netCDF data sets. • Many netCDF post-processing codes and tools.
Why parallel netCDF (2)? • Hopefully a fairly straightforward process to migrate from serial to parallel netCDF. • From material presented at a SuperComputing 2002 tutorial (11/02), it appeared that at least one feasible option for a Fortran parallel netCDF capability would soon be available.
Summary of work performed under proposal (1). • Read material (e.g., Parallel I/O for High Performance Computing by John May), performed web searches, and communicated with a number of people to determine what options were available. • Once the decision was made to go with A/NW’s parallel netCDF, collaborated with them extensively: • First, to get the kinks out of the installation procedure for each of the platforms of interest. • Next, to get the Fortran interface working properly. • C interface was complete, but the Fortran interface needed considerable work. • Wrote Fortran 90 (F90) and C interface test codes.
Summary of work performed under proposal (2). • Also developed F90 test codes for performance testing: • One that emulates the way serial netCDF is currently being used to do I/O in our primary model. • Another that replaces the serial netCDF code with its A/NW parallel netCDF equivalent. • Ran a large number of serial / parallel netCDF timings. • Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent. • Ran a limited number of parallel HDF5 timings for comparison with parallel netCDF. • Created this presentation / report.
Ultimate goals. • Bring a much-needed viable Fortran parallel netCDF capability to the Lab. • Incorporate parallel netCDF capabilities into our primary model, an Atmospheric Chemical Transport Model (ACTM) called “Impact”. • Model uses a logically rectangular, 2D lon/lat domain decomposition, with a processor assigned to each subdomain. • Each subdomain consists of a collection of full vertical columns, spread over a limited range of latitude and longitude. • Employs a Master / Slaves paradigm. • MPI used to communicate between processors as necessary.
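To make the decomposition concrete, here is a minimal F90 sketch of how index ranges might be assigned for a logically rectangular 2D lon/lat decomposition of this kind; all names (numlon, numlat, npx, npy, etc.) are hypothetical and are not taken from the Impact code.

  ! Hypothetical sketch: divide a numlon x numlat grid over an npx x npy
  ! processor mesh; each subdomain keeps full vertical columns.
  subroutine subdomain_bounds (myrank, npx, npy, numlon, numlat, &
                               ilo, ihi, jlo, jhi)
    implicit none
    integer, intent (in)  :: myrank, npx, npy, numlon, numlat
    integer, intent (out) :: ilo, ihi, jlo, jhi
    integer :: ip, jp

    ip = Mod (myrank, npx)      ! processor column (longitude direction)
    jp = myrank / npx           ! processor row    (latitude  direction)

    ilo =  ip * (numlon / npx) + 1
    ihi = (ip + 1) * (numlon / npx)
    if (ip == npx - 1) ihi = numlon   ! last column absorbs any remainder

    jlo =  jp * (numlat / npy) + 1
    jhi = (jp + 1) * (numlat / npy)
    if (jp == npy - 1) jhi = numlat   ! last row absorbs any remainder
  end subroutine subdomain_bounds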
Impact model / Serial netCDF. • Impact currently uses serial netCDF for much of its I/O. • Slaves read their own data. • Writes are done by the Master only. • Data communicated back to Master for output. • Buffering required because of large arrays and limited memory on Master. • Complicates the code considerably. • Increased I/O performance welcomed, but code not necessarily I/O bound.
Parallel I/O options initially explored (1). • Parallel netCDF alternatives: • A/NW (much more later). • LBL / NERSC: • Ziolib + parallel netCDF. • Level of support? Small user base? Recoding effort? • My own limited understanding of it in general? • Unidata / NCSA project: • “Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability.” • PI is Russ Rew, one of the primary developers of serial netCDF. • Multi-year project that had only just begun, so not a viable option.
Abstract of Unidata / NCSA project. Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability. The proposed work will merge Unidata's netCDF and NCSA's HDF5, two widely-used scientific data access libraries. Users of netCDF in numerical models will benefit from support for packed data, large datasets, and parallel I/O, all of which are available with HDF5. Users of HDF5 will benefit from the availability of a simpler high-level interface suitable for array-oriented scientific data, wider use of the HDF5 data format, and the wealth of netCDF software for data management, analysis and visualization that has evolved among the large netCDF user community. The overall goal of this collaborative development project is to create and deploy software that will preserve the desirable common characteristics of netCDF and HDF5 while taking advantage of their separate strengths: the widespread use and simplicity of netCDF and the generality and performance of HDF5. To achieve this goal, Unidata and NCSA will collaborate to create netCDF-4, using HDF5 as its storage layer. Using netCDF-4 in advanced Earth science modeling efforts will demonstrate its effectiveness. The success of this project will facilitate open and free technologies that support scientific data storage, exchange, access, analysis, discovery, and visualization. The technology resulting from the netCDF-4/HDF5 merger will benefit users of Earth science data and promote cross-disciplinary research through the provision of better facilities for combining, synthesizing, aggregating, and analyzing datasets from disparate sources to make them more accessible.
Parallel I/O options initially explored (2). • Parallel HDF5: • Would require a significant learning curve; fairly complex. • Would require significant code changes. • Feedback from others: • Difficult to use. • Limited capability to deal directly with netCDF files. • Performance issues? • Fortran interface?
Why A/NW’s parallel netCDF was chosen (1). • Expertise, experience, and track record of developers. • PVFS, MPICH, ROMIO. • Parallel netCDF library already in place. • Small number of users; C interface only. • Initial work on Fortran interface completed, but untested. • Interest level in their product seems to be growing rapidly. • Parallel syntax much like the serial syntax.
Why A/NW’s parallel netCDF was chosen (2). • Level of support that could be expected over the long term. • Russ Rew’s recommendation, based on my needs and time frame. • A/NW developers’ level of interest and enthusiasm in working with me. • Belief that A/NW may play a role in the Unidata / NCSA project. • Only practical option currently available?
A/NW’s parallel netCDF library. • Based on Unidata’s serial netCDF library. • Syntax and use very much like serial netCDF: • nf_ functions become nfmpi_ functions. • Additional arguments necessary for some calls. • Create / Open require communicator + MPI hint (used MPI_INFO_NULL). • Collective functions are suffixed with _all. • netcdf.inc include file becomes pnetcdf.inc. • -lnetcdf library becomes -lpnetcdf.
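As a brief illustration of these syntax differences, a hedged F90 fragment follows; declarations are omitted, the file and variable names are invented for the example, and note that in the pnetcdf Fortran interface the start / count arrays are of kind MPI_OFFSET_KIND rather than default integers.

  ! Serial netCDF (include netcdf.inc, link with -lnetcdf):
  ierr = nf_open ('hist.nc', NF_NOWRITE, ncid)
  ierr = nf_inq_varid (ncid, 'const', varid)
  ierr = nf_get_vara_real (ncid, varid, start, count, buf)
  ierr = nf_close (ncid)

  ! A/NW parallel netCDF (include pnetcdf.inc, link with -lpnetcdf):
  ! Open takes a communicator and an MPI hint; calls in which all
  ! processes participate use the collective "_all" forms.
  ierr = nfmpi_open (MPI_COMM_WORLD, 'hist.nc', NF_NOWRITE, &
                     MPI_INFO_NULL, ncid)
  ierr = nfmpi_inq_varid (ncid, 'const', varid)
  ierr = nfmpi_get_vara_real_all (ncid, varid, start, count, buf)
  ierr = nfmpi_close (ncid)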
A/NW’s parallel netCDF library: v0.9.0. • First version with a fully functional Fortran interface. • Installation procedure made more user-friendly. • Fortran test routines added. • Interacted extensively with the developers on the above items. • Several LLNL F90 test codes became part of the v0.9.0 release. • The end product seems to meet our needs in terms of functionality, ease of use, and portability. • Have been told that the first non-beta release will come soon.
Parallel netCDF v0.9.0 installation (1). • Web site => http://www-unix.mcs.anl.gov/parallel-netcdf • Subscribe to the mailing list. • Download: • parallel-netcdf-0.9.0.tar.gz • Parallel NetCDF API documentation. • Note that the following paper will also be coming out soon: Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham, “Parallel netCDF: A Scientific High-Performance I/O Interface”, to appear in the Proceedings of the 15th SuperComputing Conference, November, 2003.
Parallel netCDF v0.9.0 installation (2). • Set the necessary environment variables. • Uncompress / untar the tar file. • Move into the top-level directory. • Type: ./configure --prefix=<top-level directory path> • Then type: make • Then type: make install
Performance test codes (1). • Test codes written in Fortran 90. • MPI_Wtime used to do the timings. • One large 4D floating point array read or written. • Set up to emulate the basic kind of netCDF I/O that is currently being done in the Impact model. • Use a Master / Slaves paradigm. • Lon x Lat x Levels x Species I/O array dimensions. • Each Slave only has a portion of the first two dimensions.
Performance test codes (2). • Focused on timing the explicit Read / Write calls, along with any required MPI communication costs. • Typically, Impact files are open for prolonged periods, with large Reads / Writes occurring periodically before the files are eventually closed. • Not overly concerned with file definition costs (opens / closes), but kept an eye on them.
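For reference, a minimal sketch of how MPI_Wtime can bracket the timed region; the barrier placement and rate calculation here are illustrative assumptions, not necessarily what the actual test codes do (file_mb is assumed to hold the file size in MB).

  double precision :: t0, t1, file_mb
  integer          :: ierr, myrank

  call MPI_Comm_rank (MPI_COMM_WORLD, myrank, ierr)

  call MPI_Barrier (MPI_COMM_WORLD, ierr)   ! start all processes together
  t0 = MPI_Wtime ()

  ! ... the timed netCDF Read / Write calls, plus any required
  !     MPI communication back to the Master ...

  call MPI_Barrier (MPI_COMM_WORLD, ierr)
  t1 = MPI_Wtime ()

  if (myrank == 0) then
    write (6, *) 'I/O time (s): ', t1 - t0, &
                 '  rate (MB/s): ', file_mb / (t1 - t0)
  end if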
Serial netCDF performance test code. • Version 3.5 of serial netCDF used. • Slave processors read their own input data. • Slaves use MPI to communicate their output data back to the Master for output. • Communication cost included for Write timings. • Only Master creates / opens output file. • Timed over a single iteration of Read / Write calls in any given run.
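A hedged sketch of the Slave-read pattern just described: each Slave independently opens the shared input file with the serial library and reads only its own lon/lat patch (ilo, jlo, etc. are the hypothetical subdomain bounds from the earlier sketch; the file and variable names are invented).

  integer :: ncid, varid, ierr
  integer :: start(4), count(4)
  real, allocatable :: local_buf(:,:,:,:)

  allocate (local_buf(ihi-ilo+1, jhi-jlo+1, numlev, numspc))

  ! Each Slave opens the same file and reads just its own subdomain.
  ierr = nf_open ('input.nc', NF_NOWRITE, ncid)
  ierr = nf_inq_varid (ncid, 'const', varid)

  start = (/ ilo, jlo, 1, 1 /)                        ! lon, lat, lev, spc
  count = (/ ihi-ilo+1, jhi-jlo+1, numlev, numspc /)

  ierr = nf_get_vara_real (ncid, varid, start, count, local_buf)
  ierr = nf_close (ncid)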
Parallel netCDF performance test code. • Version 0.9.0 of A/NW’s parallel netCDF used. • Slave processors do all netCDF I/O (Master idle). • All Slaves create / open output file. • Translation from serial netCDF test code. • Same number of netCDF calls. • Calls are syntactically very similar. • Explicit MPI communications no longer needed for Writes. • Two additional arguments are required for Create / Open: • Communicator + MPI hint (used MPI_INFO_NULL). • netcdf.inc needs to be changed to pnetcdf.inc. • Timed over 10 iterations of Read / Write calls in any given run.
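And a hedged sketch of the corresponding parallel Write: every Slave creates the output file with the extra communicator / hint arguments, then writes its own patch with a single collective call. Names (comm_slaves, local_buf, etc.) carry over from the sketches above and are assumptions; note the MPI_OFFSET_KIND starts / counts.

  integer                        :: ncid, varid, ierr
  integer (kind=MPI_OFFSET_KIND) :: start(4), count(4)

  ! All Slaves create / open the output file collectively.
  ierr = nfmpi_create (comm_slaves, 'output.nc', NF_CLOBBER, &
                       MPI_INFO_NULL, ncid)

  ! ... nfmpi_def_dim / nfmpi_def_var / nfmpi_enddef here, mirroring the
  !     serial code's nf_def_dim / nf_def_var / nf_enddef calls ...

  start = (/ ilo, jlo, 1, 1 /)
  count = (/ ihi-ilo+1, jhi-jlo+1, numlev, numspc /)

  ! Collective write: each Slave supplies only its own subdomain.
  ierr = nfmpi_put_vara_real_all (ncid, varid, start, count, local_buf)
  ierr = nfmpi_close (ncid)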
Timing issue. • I/O resources are shared, so getting consistent timings can be problematic. • More so for some machines (seaborg) than others. • Made many runs and took the best time.
Serial / Parallel netCDF performance test results for mcr (plots to follow).
Serial / Parallel netCDF performance test results for seaborg (plots to follow).
Serial / Parallel netCDF performance test results for tckk (plots to follow).
Serial / Parallel netCDF Read / Write rates. (Plots: Read and Write panels; 1814 MB file, 64 processors.)
Read / Write netCDF Serial / Parallel rates. (Plots: Serial and Parallel panels; 1814 MB file, 64 processors; note different y-axis scales.)
Read netCDF Serial / Parallel rates for varying numbers of processors. (Plots: mcr, seaborg, and tckk panels; Read; 1814 MB file.)
Write netCDF Serial / Parallel rates for varying numbers of processors. (Plots: mcr, seaborg, and tckk panels; Write; 1814 MB file.)
Read netCDF Serial / Parallel rates for varying file sizes. (Plots: mcr, seaborg, and tckk panels; Read; 64 processors.)
Write netCDF Serial / Parallel rates for varying file sizes. (Plots: mcr, seaborg, and tckk panels; Write; 64 processors.)
Parallel HDF5 performance test code. • Version 1.4.5 of NCSA’s parallel HDF5 used. • Slave processors do all HDF5 I/O (Master idle). • Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent. • Conversion seemed to take a good deal of effort. • Increase in code complexity over parallel netCDF. • Great deal of difficulty in getting test code compiled and linked. • Irresolvable problems with parallel HDF5 library on mcr and tckk. • Finally got things working on seaborg. • Made a limited number of timing runs for a “ballpark” comparison.
Parallel HDF5 / netCDF performance test results for seaborg (plot to follow).
Parallel HDF5 / netCDF Read / Write rates. (Plot: Read and Write panels; 1814 MB file, 64 processors.)
Observations. • Parallel netCDF seems to be a very hot topic right now. • Since A/NW’s parallel netCDF is functionally and syntactically very similar to serial netCDF, code conversion is pretty straightforward. • I/O speeds can vary significantly from machine to machine. • I/O speeds can vary significantly on the same machine, depending on the I/O load at any given time.
Misc. conclusions from plots. • Our current method of doing serial netCDF Slave Reads performed quite poorly in general. • Unexpected. • Can degrade significantly as the number of processors is increased. • Parallel netCDF Reads are faster than Writes. • Magnitude of the difference on a given platform can vary dramatically. • mcr marches to its own netCDF drummer. • Parallel Reads are quite fast; serial Reads are not. • Serial Writes are faster than Reads. • Parallel Writes scale poorly. • Parallel netCDF I/O tends to get somewhat faster as the file size increases. • Different platforms can behave very differently!
Overall Conclusions. • Under the specified test conditions: • A/NW’s parallel netCDF (v0.9.0) performed significantly better than serial netCDF (v3.5). • Under the specified test conditions and limited testing: • Parallel HDF5 (v1.4.5) performed significantly better than A/NW’s parallel netCDF (v0.9.0). • To date, the A/NW focus has been on functionality, not performance; they believe there is substantial room for improvement. • On a different platform and code, the A/NW developers have found that parallel netCDF significantly outperforms parallel HDF5. • It is not a simple matter of one being faster than the other; platform and access patterns may favor one or the other.
Remaining questions / issues. • What about files larger than 2 GB? • It appears that a general netCDF solution may be forthcoming. • How much will A/NW be able to improve performance? • They are committed to working this issue. • When will the first A/NW non-beta release be? • Maybe early next year, after performance issues are addressed. • What will the outcome of the Unidata / NCSA project be? • What role will A/NW play? • Have any potential show stoppers been missed? • Will we incorporate A/NW’s parallel netCDF capability into our Impact model?