Introduction to Scientific Computing on Linux Clusters Doug Sondak Linux Clusters and Tiled Display Walls July 30 – August 1, 2002
Outline • Why Clusters? • Parallelization • example - Game of Life • performance metrics • Ways to Fool the Masses • summary
Why Clusters? • Scientific computing has traditionally been performed on fast, specialized machines • Buzzword - Commodity Computing • clustering cheap, off-the-shelf processors • can achieve good performance at a low cost if the applications scale well
Clusters (2) • 102 clusters in current Top 500 list http://www.top500.org/list/2001/06/ • Reasonable parallel efficiency is the key • generally use message passing, even if there are shared-memory CPUs in each box
Compilers • Linux Fortran compilers (F90/95) • available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford • g77 is free, but is restricted to Fortran 77 and is relatively slow
Compilers (2) • Intel offers a free, unsupported Fortran compiler for non-commercial purposes • full F95 • OpenMP http://www.intel.com/software/products/compilers/f60l/noncom.htm
Compilers (3) http://www.polyhedron.com/
Compilers (4) • Linux C/C++ compilers • gcc/g++ seems to be the standard, usually described as a good compiler • also available from vendors, e.g., Compaq, Intel, Portland Group
Parallelization of Scientific Codes
Domain Decomposition • Typically perform operations on arrays • e.g., setting up and solving a system of equations • domain decomposition • arrays are broken into chunks, and each chunk is handled by a separate processor • processors operate simultaneously on their own chunks of the array
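As a rough illustration of breaking an array into chunks, the sketch below computes which index range each processor would own. The helper name `chunk_bounds` and the 10-element array are hypothetical, and the loop runs the chunks one after another in a single process; on a cluster each rank would handle only its own chunk.

```python
def chunk_bounds(n_items, n_procs):
    """Return (start, stop) index pairs that split n_items as evenly as possible."""
    base, extra = divmod(n_items, n_procs)
    bounds = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks each take one additional element.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


data = list(range(10))
for rank, (start, stop) in enumerate(chunk_bounds(len(data), 3)):
    # On a real cluster each rank would execute this body at the same time
    # as the others, touching only its own slice of the array.
    print(rank, data[start:stop])
```

Giving the leftover elements to the lowest ranks keeps the chunk sizes within one element of each other, which helps avoid load imbalance.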
Other Methods • Parallelization also possible without domain decomposition • less common • e.g., process one set of inputs while reading another set of inputs from a file
Embarrassingly Parallel • if operations are completely independent of one another, this is called embarrassingly parallel • e.g., initializing an array • some Monte Carlo simulations • not usually the case
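A Monte Carlo estimate of pi is one common example of an embarrassingly parallel calculation. The sketch below is only an illustration: the function names, seeds, and sample counts are made up, and the tasks are run in sequence here, although each one could run on its own processor because it needs nothing from the others.

```python
import random


def count_hits(n_samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # each task gets its own independent generator
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, samples_per_task):
    # Each task depends only on its own seed, so no communication is needed
    # until the hit counts are summed at the very end.
    hits = [count_hits(samples_per_task, seed) for seed in range(n_tasks)]
    return 4.0 * sum(hits) / (n_tasks * samples_per_task)


print(estimate_pi(4, 50_000))
```

The only step that would require message passing is the final sum of the hit counts.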
Game of Life • Early, simple cellular automaton • created by John Conway • 2-D grid of cells • each has one of 2 states (“alive” or “dead”) • cells are initialized with some distribution of alive and dead states
Game of Life (2) • at each time step states are modified based on states of adjacent cells (including diagonals) • Rules of the game: • 3 alive neighbors - alive • 2 alive neighbors - no change • other - dead
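The rules above can be sketched as a single serial time step. This is a minimal version that assumes cells beyond the edge of the grid are dead (the slides do not say how the boundary is treated), and the function name `life_step` is hypothetical.

```python
def life_step(grid):
    """Advance a 2-D list of 0/1 cells one time step.

    Assumption: cells outside the grid are treated as dead.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    new = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            alive = 0
            # Count the up-to-8 adjacent cells, diagonals included.
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) == (0, 0):
                        continue
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n_rows and 0 <= nj < n_cols:
                        alive += grid[ni][nj]
            if alive == 3:
                new[i][j] = 1           # 3 alive neighbors: alive
            elif alive == 2:
                new[i][j] = grid[i][j]  # 2 alive neighbors: no change
            # any other count: dead (new[i][j] is already 0)
    return new


# A horizontal row of three live cells flips to a vertical one.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in life_step(blinker):
    print(row)
```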
Game of Life (3)
Game of Life (4) • Parallelize on 2 processors • assign block of columns to each processor • Problem - What happens at split?
Game of Life (5) • Solution - Overlap cells • Each time step, pass overlap data from processor to processor
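The overlap idea can be imitated in a single process. In the sketch below, two column blocks stand in for the two processors, and copying one extra column into each block stands in for the message that would be passed every time step. The function names and the dead-cell boundary are assumptions, and a real code would send the overlap columns with MPI rather than slicing one shared grid.

```python
def life_step(grid):
    """One Game of Life step on a 2-D list of 0/1; cells beyond the edge count as dead."""
    rows, cols = len(grid), len(grid[0])

    def neighbors(i, j):
        return sum(
            grid[a][b]
            for a in range(max(i - 1, 0), min(i + 2, rows))
            for b in range(max(j - 1, 0), min(j + 2, cols))
            if (a, b) != (i, j)
        )

    new = []
    for i in range(rows):
        row = []
        for j in range(cols):
            n = neighbors(i, j)
            row.append(1 if n == 3 else grid[i][j] if n == 2 else 0)
        new.append(row)
    return new


def split_step(grid):
    """Step the grid as two column blocks, each carrying one overlap column.

    Requires at least 2 columns so that each block owns at least one.
    """
    mid = len(grid[0]) // 2
    # "Message passing": each block receives a copy of its neighbor's edge column.
    left = [row[: mid + 1] for row in grid]   # owned columns 0..mid-1 plus overlap column mid
    right = [row[mid - 1:] for row in grid]   # overlap column mid-1 plus owned columns mid..end
    left_new = life_step(left)
    right_new = life_step(right)
    # The overlap columns were updated with incomplete information, so drop
    # them and keep only the columns each block owns.
    return [l[:mid] + r[1:] for l, r in zip(left_new, right_new)]


grid = [[0] * 6 for _ in range(5)]
grid[2][2] = grid[2][3] = grid[2][4] = 1   # pattern straddling the split
print(split_step(grid) == life_step(grid))
```

One overlap column per side is enough because each cell only looks one cell away in any direction.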
Message Passing • Largest bottleneck to good parallel efficiency is usually message passing • much slower than number crunching • set up your algorithm to minimize message passing • minimize surface-to-volume ratio of subdomains
Domain Decomp. • For this domain: [figure] • To run on 2 processors, decompose like this: [figure] • Not like this: [figure]
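To make the surface-to-volume argument concrete, the sketch below compares the two ways of halving a rectangular grid between 2 processors. It assumes a hypothetical 20 x 100 domain, that is, one much longer in one direction than the other; the function names are made up for illustration.

```python
def interface_cells(n_rows, n_cols, split_axis):
    """Cells along the cut when a grid is halved for 2 processors.

    split_axis="cols" cuts between two columns, so the interface is n_rows long;
    split_axis="rows" cuts between two rows, so the interface is n_cols long.
    """
    if split_axis == "cols":
        return n_rows
    if split_axis == "rows":
        return n_cols
    raise ValueError("split_axis must be 'rows' or 'cols'")


def surface_to_volume(n_rows, n_cols, split_axis):
    """Cells exchanged per step divided by cells updated per step, per processor."""
    volume = n_rows * n_cols / 2
    return interface_cells(n_rows, n_cols, split_axis) / volume


# Hypothetical long, thin domain: 20 rows by 100 columns.
print(surface_to_volume(20, 100, "cols"))  # cut across the short dimension
print(surface_to_volume(20, 100, "rows"))  # cut along the long dimension
```

Cutting across the short dimension exchanges 20 cells per step instead of 100 for the same amount of computation, so it is the decomposition with less message passing.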
How to Pass Msgs. • MPI is the recommended method • PVM may also be used • MPICH • most common • free download http://www-unix.mcs.anl.gov/mpi/mpich/ • others also available, e.g., LAM
How to Pass Msgs. • some MPI tutorials • Boston University http://scv.bu.edu/Tutorials/MPI/ • NCSA http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Performance
Code Timing • How well has code been parallelized? • CPU time vs. wallclock time • both are seen in literature • I prefer wallclock • only for dedicated processors • CPU time doesn’t account for load imbalance • unix time command • Fortran system_clock subroutine • MPI_Wtime
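The difference between the two timers can be seen in a small sketch. It uses Python's `time.perf_counter` (wallclock) and `time.process_time` (CPU time) rather than the Unix, Fortran, or MPI timers listed above, and the `time.sleep` call is only a stand-in for a processor sitting idle while it waits on others.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, wallclock seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # wallclock timer
    cpu_start = time.process_time()    # CPU time used by this process only
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy_then_idle(n):
    total = sum(i * i for i in range(n))  # number crunching: consumes CPU time
    time.sleep(0.2)                       # waiting: wallclock advances, CPU time barely moves
    return total


result, wall, cpu = timed(busy_then_idle, 100_000)
print(f"wallclock {wall:.3f} s, CPU {cpu:.3f} s")
```

The idle period shows up in the wallclock figure but hardly at all in the CPU figure, which is why CPU time can hide load imbalance.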
Parallel Speedup • quantify how well we have parallelized our code: $S_n = \frac{T_1}{T_n}$, where $S_n$ = parallel speedup, $n$ = number of processors, $T_1$ = time on 1 processor, $T_n$ = time on $n$ processors
Parallel Efficiency • $\eta_n = \frac{S_n}{n} = \frac{T_1}{n \, T_n}$, where $\eta_n$ = parallel efficiency, $T_1$ = time on 1 processor, $T_n$ = time on $n$ processors, $n$ = number of processors
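As a worked example, the sketch below uses the usual definitions, speedup = T1 / Tn and efficiency = speedup / n. The timings (100 s on 1 processor, 20 s on 8 processors) are invented for illustration and are not measurements from the talk.

```python
def speedup(t1, tn):
    """Parallel speedup: time on 1 processor divided by time on n processors."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t1, tn) / n


# Hypothetical timings: 100 s on 1 processor, 20 s on 8 processors.
t1, t8 = 100.0, 20.0
print(speedup(t1, t8))        # 5.0
print(efficiency(t1, t8, 8))  # 0.625
```

An efficiency of 0.625 means that, by this measure, a bit more than a third of the CPU time on the 8 processors is not contributing to speedup.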
Parallel Efficiency (3) • What is a “reasonable” level of parallel efficiency? • Depends on • how much CPU time you have available • when the paper is due • can think of $(1-\eta)$ as “wasted” CPU time • my personal rule of thumb ~60%
Parallel Efficiency (4) • Superlinear speedup • parallel efficiency > 1.0 • sometimes quoted in the literature • generally attributed to cache issues • subdomains fit entirely in cache, entire domain does not • this is very problem dependent • be suspicious!
Amdahl’s Law • Always some operations which are performed serially • want a large fraction of code to execute in parallel
Amdahl’s Law (2) • Let fraction of code that executes serially be denoted s • Let fraction of code that executes in parallel be denoted p
Amdahl’s Law (3) • Noting that $p = 1 - s$, the parallel speedup is $S_n = \frac{1}{s + \frac{1-s}{n}}$ (Amdahl’s Law)
Amdahl’s Law (4) • The parallel efficiency is $\eta_n = \frac{S_n}{n} = \frac{1}{1 + (n-1)s}$ (alternate version of Amdahl’s Law)
Amdahl’s Law (6) • Should we despair? • No! • bigger machines solve bigger problems, which tend to have a smaller value of s • if you want to run on a large number of processors, try to minimize s
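To see why a small s matters on many processors, the sketch below evaluates Amdahl's Law in its standard form, speedup = 1 / (s + (1 - s) / n), for a few serial fractions. The values of s and n are arbitrary illustrations.

```python
def amdahl_speedup(s, n):
    """Speedup predicted by Amdahl's Law for serial fraction s on n processors."""
    return 1.0 / (s + (1.0 - s) / n)


def amdahl_efficiency(s, n):
    """Parallel efficiency: predicted speedup divided by the number of processors."""
    return amdahl_speedup(s, n) / n


for s in (0.1, 0.01, 0.001):
    for n in (10, 100, 1000):
        print(f"s={s:<6} n={n:<5} speedup={amdahl_speedup(s, n):8.2f} "
              f"efficiency={amdahl_efficiency(s, n):.3f}")
    # As n grows without bound, the speedup can never exceed 1/s.
    print(f"s={s:<6} upper limit on speedup = {1.0 / s:.0f}")
```

With s = 0.1 the speedup can never exceed 10 no matter how many processors are added, whereas s = 0.001 leaves room for a speedup of up to 1000.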
Ways to Fool the Masses • full title: “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers” • Created by David Bailey of NASA Ames in 1991 • following is a selection of “ways,” some paraphrased
Ways to Fool (2) • Scale problem size with number of processors • Project results linearly • 2 proc., 1 hr. -> 1800 proc., 1 sec. • Present performance of kernel, represent as performance of application
Ways to Fool (3) • Compare with old code on obsolete system • Quote MFLOPS based on parallel implementation, not best serial implementation • increase no. operations rather than decreasing time
Ways to Fool (4) • Quote parallel speedup making sure single-processor version is slow • Mutilate the algorithm used in the parallel implementation to match the architecture • explicit vs. implicit PDE solvers • Measure parallel times on dedicated system, serial times in busy environment
Ways to Fool (5) • If all else fails, show pretty pictures and animated videos, and don’t talk about performance.
Summary • Clusters are viable platforms for relatively low-cost scientific computing • parallel considerations similar to other platforms • MPI is a free, effective message passing API • be careful with performance timings