OMPi: A portable C compiler for OpenMP V2.0
Elias Leontiadis, George Tzoumas, Vassilios V. Dimakopoulos
University of Ioannina
Presentation
• Introduction
• OMPi
• OMPi Performance
• Conclusions
EWOMP 2003
The OpenMP specification
• High-level API for parallel programming in a shared memory environment
• Fortran
  • Version 1.0, October 1997
  • Version 1.1, November 1999
  • Version 2.0, November 2000
• C/C++
  • Version 1.0, October 1998
  • Version 2.0, March 2002
• New features such as
  • timing routines
  • copyprivate and num_threads clauses
  • variable reprivatization
  • static threadprivate
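Two of the features listed above, the num_threads and copyprivate clauses, can be seen together in a short sketch. The function name scaled_sum and the values used are illustrative assumptions, not taken from the talk. When the compiler is run without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical example: returns scale * (1 + 2 + ... + n), with scale == 2. */
long scaled_sum(int n)
{
    long total = 0;
    int i;

    assert(n >= 0);

    /* num_threads requests a team of 3 threads for this region only. */
    #pragma omp parallel num_threads(3) reduction(+:total)
    {
        int scale = 0;   /* declared inside the region, so private per thread */

        /* One thread sets scale; copyprivate broadcasts that value
           to the private copies held by the other threads. */
        #pragma omp single copyprivate(scale)
        {
            scale = 2;
        }

        /* Iterations are divided among the team; i is private in the loop. */
        #pragma omp for
        for (i = 1; i <= n; i++)
            total += (long)scale * i;
    }
    return total;
}
```

Calling scaled_sum(10) adds 2*1 + 2*2 + ... + 2*10 regardless of how the iterations are distributed.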
OpenMP compilers
• Commercial compilers for specific machines
  • SUN, SGI, Intel, Fujitsu, etc.
• OpenMP compiler projects (usually portable)
  • Nanos
  • OdinMP/CCp
  • Intone project
  • Omni
OMPi
• Portable C compiler for OpenMP
• Adheres to V.2.0
• Produces ANSI C code with POSIX threads library calls
• Written entirely in C
Compilation process
• C source file → OMPi → generated C file
• generated C file → system C compiler (cc) → object file
• object file + OMPi library object files → system linker → a.out
Code transformations
• parallel construct
  • code is moved into a (thread) function
  • a struct is declared containing pointers to non-global shared variables
  • private variables are redeclared locally in the function body
  • original code is replaced by code that creates a team of threads executing the function
  • master thread executes the function, too
Example

Original OpenMP source:

    int a;    /* global */

    int main()
    {
      int b, c;

      #pragma omp parallel num_threads(3) \
                           private(c)
      {
        c = b + a;
        . . .
      }
    }

Code generated by OMPi:

    int a;

    typedef struct {    /* shared vars structure */
      int (*b);         /* b is shared, non-global */
    } par0_t;

    int main()
    {
      int b, c;

      _omp_initialize();
      {
        /* declare par0_vars, the shared var struct */
        _OMP_PARALLEL_DECL_VARSTRUCT(par0);
        /* par0_vars->b will point to real b */
        _OMP_PARALLEL_INIT_VAR(par0, b);
        /* Run the threads */
        _omp_create_team(3, _OMP_THREAD, par0_thread, (void *) &par0_vars);
        _omp_destroy_team(_OMP_THREAD->parent);
      }
    }

    void *par0_thread(void *_omp_thread_data)
    {
      int _dummy = _omp_assign_key(_omp_thread_data);
      int (*b) = &_OMP_VARREF(par0, b);
      int c;

      c = (*(b)) + a;
      . . .
    }
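The generated code above depends on OMPi's own runtime macros, so it cannot be compiled in isolation. The same outlining idea can be sketched with plain POSIX threads. Everything named here (shared_vars_t, thread_arg_t, region_thread, run_region, the results array, the initial value of a) is a hypothetical stand-in, and the threads are created on the spot rather than taken from a pool, which is a simplification.

```c
#include <assert.h>
#include <pthread.h>

#define TEAM_SIZE 3            /* mirrors num_threads(3) */

int a = 5;                     /* global: reachable from any thread directly */

typedef struct {               /* hypothetical counterpart of par0_t */
    int *b;                    /* pointer to the shared, non-global b */
    int *results;              /* one slot per thread, for inspection only */
} shared_vars_t;

typedef struct {
    shared_vars_t *vars;
    int id;                    /* position of this thread in the team */
} thread_arg_t;

/* Outlined body of the parallel region. */
static void *region_thread(void *p)
{
    thread_arg_t *arg = p;
    int *b = arg->vars->b;     /* shared b is reached through the struct */
    int c;                     /* private(c): redeclared locally */

    c = *b + a;
    arg->vars->results[arg->id] = c;
    return NULL;
}

/* Stands in for the parallel region; returns 0 on success, -1 on failure. */
int run_region(int b_value, int results[TEAM_SIZE])
{
    int b = b_value;
    shared_vars_t vars;
    thread_arg_t args[TEAM_SIZE];
    pthread_t tid[TEAM_SIZE];
    int i, created = 1, status = 0;

    vars.b = &b;
    vars.results = results;

    for (i = 1; i < TEAM_SIZE; i++) {
        args[i].vars = &vars;
        args[i].id = i;
        if (pthread_create(&tid[i], NULL, region_thread, &args[i]) != 0) {
            status = -1;
            break;
        }
        created = i + 1;
    }

    if (status == 0) {
        args[0].vars = &vars;
        args[0].id = 0;
        region_thread(&args[0]);   /* the master executes the function, too */
    }

    for (i = 1; i < created; i++)  /* wait for every thread that was started */
        pthread_join(tid[i], NULL);
    return status;
}
```

With b set to 7 and a equal to 5, each of the three threads computes its own private c as 12.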
Work sharing constructs
• sections construct
  • a switch-case block is created
  • the code of each section is moved into a case of the switch block
  • any thread may execute any section
• for construct
  • each thread computes the bounds of the next chunk to execute
  • then, if a chunk is available, executes the for-loop within the computed bounds
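Both schemes above can be illustrated with one small sketch: a shared, lock-protected counter hands out chunk bounds for a loop, and the same counter with a chunk size of 1 hands out section numbers that a switch dispatches. The names loop_state_t, loop_init, next_chunk, worker_sum and run_sections are hypothetical, and the mutex-based bookkeeping is an assumption about one possible implementation, not a description of OMPi's runtime.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {               /* hypothetical shared loop descriptor */
    pthread_mutex_t lock;
    int next;                  /* first iteration not yet handed out */
    int end;                   /* one past the last iteration */
    int chunk;                 /* iterations per chunk */
} loop_state_t;

void loop_init(loop_state_t *s, int begin, int end, int chunk)
{
    assert(chunk > 0);
    pthread_mutex_init(&s->lock, NULL);
    s->next = begin;
    s->end = end;
    s->chunk = chunk;
}

/* Returns 1 and stores the bounds [*from, *to) if a chunk is available,
   otherwise returns 0. Safe to call from several threads. */
int next_chunk(loop_state_t *s, int *from, int *to)
{
    int got = 0;

    pthread_mutex_lock(&s->lock);
    if (s->next < s->end) {
        *from = s->next;
        *to = (s->end - s->next < s->chunk) ? s->end : s->next + s->chunk;
        s->next = *to;
        got = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* What each thread would run for: for (i = begin; i < end; i++) sum += i; */
long worker_sum(loop_state_t *s)
{
    long sum = 0;
    int from, to, i;

    while (next_chunk(s, &from, &to))
        for (i = from; i < to; i++)
            sum += i;
    return sum;
}

/* sections: each section body becomes a case; section numbers are handed
   out one at a time (chunk size 1). Returns how many sections this caller ran. */
int run_sections(loop_state_t *s, int hits[3])
{
    int from, to, done = 0;

    while (next_chunk(s, &from, &to)) {
        switch (from) {
        case 0: hits[0]++; break;   /* body of the first section */
        case 1: hits[1]++; break;   /* body of the second section */
        case 2: hits[2]++; break;   /* body of the third section */
        }
        done++;
    }
    return done;
}
```

When several threads call worker_sum on the same descriptor, each iteration is handed out exactly once, so the per-thread sums add up to the serial result.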
Threads
• a pool of threads is created when the program starts; all threads are sleeping
• initial pool size is number of CPUs or $OMP_NUM_THREADS
• user can request a specific number of threads by using the num_threads clause or omp_set_num_threads()
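The choice of initial pool size described above might look roughly like the following sketch. The function names pool_size_from and initial_pool_size are hypothetical. Two assumptions are made: a valid $OMP_NUM_THREADS value takes precedence over the CPU count, and the CPU count is obtained with sysconf(_SC_NPROCESSORS_ONLN), which is widely available but not guaranteed on every POSIX system.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Picks the pool size from an environment string and a CPU count.
   Falls back to the CPU count if the string is missing or not a
   positive integer, and to 1 if the CPU count is unknown. */
int pool_size_from(const char *env_value, long ncpus)
{
    if (env_value != NULL) {
        char *end;
        long n = strtol(env_value, &end, 10);

        if (end != env_value && *end == '\0' && n > 0 && n <= INT_MAX)
            return (int)n;
    }
    if (ncpus > 0 && ncpus <= INT_MAX)
        return (int)ncpus;
    return 1;
}

/* Reads the real environment and the number of online CPUs. */
int initial_pool_size(void)
{
    return pool_size_from(getenv("OMP_NUM_THREADS"),
                          sysconf(_SC_NPROCESSORS_ONLN));
}
```

Keeping the parsing separate from getenv and sysconf makes the fallback rules easy to check without changing the environment.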
Benchmarks
• NAS parallel benchmarks
  • OpenMP C version ported by the Omni group (v2.3)
  • Results for Class W
• Edinburgh University microbenchmarks (EPCC)
  • Measure synchronization overheads
Platforms
• SGI Origin 2000 system
  • 48 MIPS R10000 CPUs
  • IRIX 6.5
• Compaq ProLiant ML570
  • 2 Intel Xeon CPUs
  • Red Hat Linux 9.0
• SUN E-1000 Server
  • 4 SPARC CPUs
  • Solaris 5.7
Compilers
• OdinMP/CCp v1.02
• Omni v1.4a
• Intel C/C++ compiler (ICC) v7.1
• MIPSpro v7.3
NAS parallel benchmarks: Compilation Time
[Figure, two panels: compilation times in seconds for the NAS parallel benchmarks bt, lu and sp. Panel 1: 2-CPU Linux system (odin, omni, ompi, icc). Panel 2: SGI Origin 2000 system (odin, omni, ompi, mipspro).]
NAS parallel benchmarks, SGI Origin 2000 (execution time)
[Figure: bt.W, execution time in seconds vs. number of threads (1 to 8) for ompi, omni and mipspro.]
NAS parallel benchmarks, SGI Origin 2000
[Figure: cg.W, execution time in seconds vs. number of threads (1 to 8) for ompi, omni and mipspro.]
NAS parallel benchmarks, SGI Origin 2000
[Figure: ft.W, execution time in seconds vs. number of threads (1 to 8) for ompi, omni and mipspro.]
NAS parallel benchmarks, SGI Origin 2000
[Figure: lu.W, execution time in seconds vs. number of threads (1 to 8) for ompi, omni and mipspro.]
NAS parallel benchmarks, Sun E-1000
[Figure, four panels: bt.W, cg.W, ft.W and lu.W, execution time in seconds vs. number of threads (1 to 4) for ompi and omni.]
EPCC microbenchmarks, SGI (overheads)
[Figure, two panels: odin and ompi. Overhead in microseconds vs. number of threads (1 to 8) for parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic and reduction.]
EPCC microbenchmarks, SUN
[Figure, two panels: omni and ompi. Overhead in microseconds vs. number of threads (1 to 4) for parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic and reduction.]
Conclusions
• C compiler for OpenMP V.2.0
• Written in C, generated code uses pthreads
• Tested on Linux, Solaris, IRIX
• Performance satisfactory, comparable with native compilers
Current status
• Target Solaris threads, sproc
• Improve overheads (e.g. ordered)
• Improve produced code (optimizations)
• Profiling code
Thank you
http://www.cs.uoi.gr/~ompi