Parallelism combining OpenMP and MPI. Robert Sinkovits, Ph.D. (sinkovit@sdsc.edu). NPACI Parallel Computing Institute, August 28 - September 1, 2000, San Diego Supercomputer Center.


Presentation Transcript


  1. Parallelism combining OpenMP and MPI
     Robert Sinkovits, Ph.D.
     sinkovit@sdsc.edu
     NPACI Parallel Computing Institute
     August 28 - September 1, 2000
     San Diego Supercomputer Center

  2. Outline
     • Purpose
     • Show how we combine MPI and OpenMP
     • Compiling and linking
     • Discuss
     • Available compilers
     • Required options
     • Options not to use
     • Example programs
     • Make files

  3. Hardware description
     • Blue Horizon (horizon.npaci.edu)
     • 144 IBM SP-High Nodes
     • 144 8-way SMP nodes for a total of 1152 processors
     • 6.4 GB/s on-node memory bandwidth
     • 4 GB/node main memory
     • Power3 processors
     • 222 MHz, 4 floating point ops per cycle
     • 888 MFLOPS/processor for a total of 1.023 teraflops
     • Nodes currently connected with IBM switch
     • 115 MB/second
     • Maximum 4 MPI tasks/node in US (user space) mode when using switch
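The peak-performance figures on this slide follow from the clock rate and processor count. A short worked calculation, using only the numbers quoted on the slide (theoretical peak, not measured performance), comes to about 1.023 teraflops:

```latex
\begin{align*}
\text{per processor} &= 222\ \text{MHz} \times 4\ \text{flops/cycle} = 888\ \text{MFLOPS} \\
\text{processors}    &= 144\ \text{nodes} \times 8\ \text{processors/node} = 1152 \\
\text{system peak}   &= 1152 \times 888\ \text{MFLOPS} = 1\,022\,976\ \text{MFLOPS} \approx 1.023\ \text{TFLOPS}
\end{align*}
```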

  4. Why combine MPI and OpenMP
     • We have 8 processors/node but can only use 4 with MPI alone (4 MPI tasks/node over the switch)
     • MPI requires multiple copies of data
     • Program to the hardware
     • Some applications have limited task parallelism
     • Keep data in cache (maybe)

  5. Compilers
     • IBM
     • Fortran: xlf, xlf90
     • Fortran with MPI: mpxlf, mpxlf90
     • For OpenMP/SMP support append _r to the compiler command
     • xlf_r, xlf90_r, mpxlf_r, mpxlf90_r
     • OpenMP support for all flavors of Fortran
     • C/C++: xlc, xlC
     • C/C++ with MPI: mpcc, mpCC
     • For SMP support append _r to the compiler command
     • xlc_r, xlC_r, mpcc_r, mpCC_r
     • For OpenMP support in C, add -qsmp=omp
     • OpenMP not supported directly in C++
     • C++ can call C or Fortran OpenMP subroutines

  6. Compilers
     • KAI
     • guidef90, guidef77
     • guidec, guidec++
     • “MP” scripts derived from IBM versions
     • kai_mpcc_r, kai_mpCC_r, kai_mpxlf90_r, kai_mpxlf_r
     • These are in /usr/local/apps/KAI_mpi
     • The compile line option -qalias=ALLPtrs causes wrong answers

  7. A useful subroutine
     • Routine: thread_bind()
     • Causes threads to be bound to processors
     • Source can be found at:
     • www.npaci.edu/BlueHorizon/source/thread_bind.c
     • Needs to be called in a parallel critical region after MPI_Init
     • Does not have much effect with the KAI compiler

     Fortran:

         !$OMP PARALLEL
         !$OMP CRITICAL
               call thread_bind()
         !$OMP END CRITICAL
         !$OMP END PARALLEL

     C:

         #pragma omp parallel
         #pragma omp critical
             thread_bind();
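The calling pattern on this slide can be sketched in C as follows. This is only an illustration under stated assumptions: the real thread_bind() source is not reproduced in the slides, so `thread_bind_stub()` is a hypothetical stand-in that merely counts its callers, and `bind_all_threads()` is a name invented here. In a hybrid program the call would come after MPI_Init, as the slide says.

```c
#include <assert.h>

/* Hypothetical stand-in for thread_bind(); the real routine binds the
   calling thread to a processor. This stub only counts how often it runs. */
static int bound_threads = 0;

static void thread_bind_stub(void)
{
    /* Unsynchronized update: safe only because every call below
       happens inside an OpenMP critical section. */
    bound_threads++;
}

/* Call the binding routine once from each thread of a parallel region.
   Returns the number of threads that made the call. If the file is built
   without OpenMP the pragmas are ignored and exactly one call is made. */
int bind_all_threads(void)
{
    bound_threads = 0;
#pragma omp parallel
    {
#pragma omp critical
        {
            thread_bind_stub();
        }
    }
    assert(bound_threads >= 1);
    return bound_threads;
}
```

The critical section serializes the calls, so each thread executes the routine by itself; that is what lets the stub update a shared counter without further locking.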

  8. Examples using OpenMP and MPI
     • C and Fortran: bothc.c, bothf.f
     • Creates arrays of random numbers
     • Sums in an OpenMP parallel for/do
     • Sums across processors using MPI_Reduce
     • IBM version calls thread_bind()
     • C++: both_C.C
     • KAI only
     • Same functionality as bothc.c
     • C++ calling C: callc.C
     • IBM only
     • C++ calls a C routine that does OpenMP, returning the number of threads
     • MPI reduce finds the total number of threads
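The structure described for bothc.c (random array, OpenMP sum within a task, MPI_Reduce across tasks) might look roughly like the sketch below. It is not the original bothc.c, which is not included in the transcript: the names `local_sum` and `fill_random` and the `USE_MPI` macro are assumptions made for this example. The MPI driver is compiled only when `USE_MPI` is defined, so the two helper functions can also be built and tried on a machine without MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Sum n values with an OpenMP reduction. Without OpenMP the pragma is
   ignored and the loop simply runs serially. */
double local_sum(const double *x, int n)
{
    double sum = 0.0;
    assert(n == 0 || x != NULL);
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Fill x with pseudo-random values in [0, 1). */
void fill_random(double *x, int n, unsigned seed)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        x[i] = rand() / (RAND_MAX + 1.0);
}

#ifdef USE_MPI
/* Hybrid driver: one array per MPI task, threads sum within the task,
   MPI_Reduce combines the per-task sums on rank 0. */
int main(int argc, char **argv)
{
    enum { N = 1000 };
    double x[N], mine, total = 0.0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    fill_random(x, N, 1234u + (unsigned)rank);  /* different data per task */
    mine = local_sum(x, N);                     /* OpenMP level */
    MPI_Reduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", total);

    MPI_Finalize();
    return 0;
}
#endif
```

All MPI calls here are made outside the parallel loop, so only one thread per task ever talks to MPI. On the IBM toolchain from the slides this would presumably be built with something like `mpcc_r -qsmp=omp -DUSE_MPI`, where `-DUSE_MPI` is this sketch's own switch and not part of the original makefile.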

  9. makefile

         apps: ibm_apps kai_apps

         ibm_apps: bothc.ibm callc.ibm bothf.ibm
         kai_apps: bothc.kai both_C.kai bothf.kai

         OP=-O3 -qarch=auto -qtune=auto
         IBM_SMP=-qsmp=omp
         IBM_C_OP=-qalias=ALLPtrs
         IBM_F_OP=
         KAI_SMP=
         KAI_C_OP=
         KAI_F_OP=

         bothf.ibm: bothf.f bind.o
                 mpxlf90_r $(IBM_SMP) $(OP) $(IBM_F_OP) bothf.f bind.o -o bothf.ibm

         bothc.ibm: bothc.c bind.o
                 mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bothc.c bind.o -o bothc.ibm

         callc.ibm: callc.C bind.o do_OpenMP.o
                 mpCC_r $(IBM_SMP) $(OP) $(IBM_C_OP) callc.C \
                         do_OpenMP.o bind.o -o callc.ibm

         bind.o: bind.c
                 mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bind.c -c -o bind.o

         do_OpenMP.o: do_OpenMP.c
                 mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) do_OpenMP.c -c -o do_OpenMP.o

  10. makefile

          apps: ibm_apps kai_apps

          ibm_apps: bothc.ibm callc.ibm bothf.ibm
          kai_apps: bothc.kai both_C.kai bothf.kai

          OP=-O3 -qarch=auto -qtune=auto
          IBM_SMP=-qsmp=omp
          IBM_C_OP=-qalias=ALLPtrs
          IBM_F_OP=
          KAI_SMP=
          KAI_C_OP=
          KAI_F_OP=

          bothc.kai: bothc.c dummy.o
                  kai_mpcc_r $(KAI_SMP) $(OP) $(KAI_C_OP) bothc.c dummy.o -o bothc.kai

          both_C.kai: both_C.C
                  kai_mpCC_r $(KAI_SMP) $(OP) $(KAI_C_OP) both_C.C -o both_C.kai

          bothf.kai: bothf.f dummy.o
                  kai_mpxlf90_r $(KAI_SMP) $(OP) $(KAI_F_OP) bothf.f dummy.o -o bothf.kai

          dummy.o: dummy.c
                  cc -c dummy.c

  11. Output

          tf173i % make ibm_apps
          mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bind.c -c -o bind.o
          mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bothc.c bind.o -o bothc.ibm
          1500-036: (I) Optimization level 3 has the potential to alter the semantics of a program. Please refer to documentation on -O3 and the STRICT option for more information.
          mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs do_OpenMP.c -c -o do_OpenMP.o
          mpCC_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs callc.C do_OpenMP.o bind.o -o callc.ibm
          1540-5200 (W) The option "threaded" is not supported.
          mpxlf90_r -qsmp=omp -O3 -qarch=auto -qtune=auto bothf.f bind.o -o bothf.ibm
          ** hello === End of Compilation 1 ===
          ** do_seed === End of Compilation 2 ===
          1501-510 Compilation successful for file bothf.f.
          tf173i %

  12. Output

          tf173i % make kai_apps
          cc -c dummy.c
          kai_mpcc_r -O3 -qarch=auto -qtune=auto bothc.c dummy.o -o bothc.kai
          kai_mpCC_r -O3 -qarch=auto -qtune=auto both_C.C -o both_C.kai
          C++ prelinker: warning: library "libmpi_r{.so,.a}" does not exist in the specified library directories
          C++ prelinker: warning: library "libvtd_r{.so,.a}" does not exist in the specified library directories
          kai_mpxlf90_r -O3 -qarch=auto -qtune=auto bothf.f dummy.o -o bothf.kai
          "bothf.f", 1500-036 (I) Optimization level 3 has the potential to alter the semantics of a program. Please refer to documentation on -O3 and the STRICT option for more information.
          ** hello === End of Compilation 1 ===
          ** do_seed === End of Compilation 2 ===
          1501-510 Compilation successful for file bothf.f.
          Target "apps" is up to date.
          tf173i %
