SHAASTRA 2011 :: PARALLEL PROGRAMMING WORKSHOP

kesler

Presentation Transcript


  1. SHAASTRA 2011 :: PARALLEL PROGRAMMING WORKSHOP CILK++ WITH HANDS-ON SESSION

  2. Cilk++ Session Outline • Introduction to cilk • Basic cilk programming constructs • Constructing cilk programs for common Algorithms • Cilk++ Tools

  4. Development history of cilk • Developed at MIT • Cilk-1 - work-stealing scheduling • Parallel Programming constructs in cilk-5 • Complexity theory for cilk parallel programs • Intel - optimized cilk++ framework

  5. What is Cilk? A C language for programming dynamic multithreaded applications on shared-memory multiprocessors. • Example applications:

  6. Shared Memory Multiprocessor Systems [Figure: processors (P), each with its own cache ($), connected by a network to memory and I/O] • The machine abstraction that Cilk uses • One global, unified shared memory • Homogeneous processors

  8. Fibonacci Number Computation

     C elision:

         int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = fib(n-1);
             y = fib(n-2);
             return (x+y);
           }
         }

     Cilk code:

         int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = cilk_spawn fib(n-1);
             y = cilk_spawn fib(n-2);
             cilk_sync;
             return (x+y);
           }
         }

     Cilk++ is a faithful extension of C++. A Cilk program’s serial elision is always a legal implementation of Cilk semantics. Cilk provides no new data types.
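The serial-elision claim can be tried out without a Cilk++ compiler. The following is a minimal sketch, assuming the two keywords are simply defined away with preprocessor macros (an assumption of this example, not something the slides prescribe); what the compiler then sees is exactly the C elision.

```cpp
#include <cassert>

// Assumption for this sketch: no Cilk++ compiler is available, so the two
// keywords expand to nothing. What remains is the serial elision.
#define cilk_spawn
#define cilk_sync

int fib(int n) {
    assert(n >= 0);              // the slides only consider non-negative n
    if (n < 2) return n;
    int x, y;
    x = cilk_spawn fib(n - 1);   // becomes: x = fib(n - 1);
    y = cilk_spawn fib(n - 2);   // becomes: y = fib(n - 2);
    cilk_sync;                   // becomes an empty statement
    return x + y;
}
```

With the macros in place, fib(10) evaluates to 55, the same value a parallel run has to produce.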

  9. Basic Cilk Keywords

         int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = cilk_spawn fib(n-1);
             y = cilk_spawn fib(n-2);
             cilk_sync;
             return (x+y);
           }
         }

     cilk_spawn: The named child Cilk procedure can execute in parallel with the parent caller.
     cilk_sync: Control cannot pass this point until all spawned children have returned.

  10. Dynamic Multithreading

         int fib (int n) {
           if (n<2) return (n);
           else {
             int x,y;
             x = cilk_spawn fib(n-1);
             y = cilk_spawn fib(n-2);
             cilk_sync;
             return (x+y);
           }
         }

     Example: fib(4) [Figure: computation DAG of fib(4), with nodes labelled by their argument: 4, 3, 2, 2, 1, 1, 0, 1, 0] “Processor oblivious”
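The size and depth of that DAG can be computed directly. This is a rough sketch under a simplified cost model that is an assumption of this example: every call to fib counts as one unit of work, and the span is the longest chain of nested calls. The names Cost and fib_cost are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Simplified cost model (assumption): one unit per call to fib.
struct Cost {
    long work;  // total number of calls, i.e. nodes in the DAG
    long span;  // longest chain of nested calls
};

Cost fib_cost(int n) {
    assert(n >= 0);
    if (n < 2) return {1, 1};                 // a leaf call
    Cost a = fib_cost(n - 1);
    Cost b = fib_cost(n - 2);
    // The two children can run in parallel: work adds up, span takes the max.
    return {1 + a.work + b.work, 1 + std::max(a.span, b.span)};
}
```

For fib(4) this yields a work of 9, matching the nine nodes in the figure, and a span of 4.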

  11. Parallelizing Vector Addition

     C++:

         void vadd (real *A, real *B, int n){
           int i;
           for (i=0; i<n; i++) A[i]+=B[i];
         }

     1. Recursive Code

         void vadd (real *A, real *B, int n){
           if (n<=BASE) {
             int i;
             for (i=0; i<n; i++) A[i]+=B[i];
           } else {
             vadd (A, B, n/2);
             vadd (A+n/2, B+n/2, n-n/2);
           }
         }

  12. Parallelizing Vector Addition

     C++:

         void vadd (real *A, real *B, int n){
           int i;
           for (i=0; i<n; i++) A[i]+=B[i];
         }

     2. Insert Cilk keywords

         void vadd (real *A, real *B, int n){
           if (n<=BASE) {
             int i;
             for (i=0; i<n; i++) A[i]+=B[i];
           } else {
             cilk_spawn vadd (A, B, n/2);
             cilk_spawn vadd (A+n/2, B+n/2, n-n/2);
             cilk_sync;
           }
         }
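The same divide-and-conquer shape can be tried in standard C++. The sketch below assumes std::async as a stand-in for cilk_spawn and future::get() as a stand-in for cilk_sync; this is only an approximation, since it creates a thread per spawn instead of using Cilk's scheduler. The definitions of real and BASE are assumptions, as the slides leave both open, and some toolchains need thread support enabled at link time.

```cpp
#include <cassert>
#include <future>

typedef double real;     // assumption: the slides do not define `real`
const int BASE = 1024;   // assumption: below this size, just loop serially

void vadd(real *A, const real *B, int n) {
    assert(n >= 0);
    if (n <= BASE) {
        for (int i = 0; i < n; i++) A[i] += B[i];
    } else {
        // "spawn": the left half may run in another thread
        std::future<void> left =
            std::async(std::launch::async, vadd, A, B, n / 2);
        // the right half runs in the current thread in the meantime
        vadd(A + n / 2, B + n / 2, n - n / 2);
        left.get();      // "sync": wait for the spawned half
    }
}
```

The two halves touch disjoint index ranges, so no locking is needed. A larger BASE means fewer, coarser tasks.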

  13. Dynamic Scheduling [Figure: processors (P), each with its own cache ($), connected by a network to memory and I/O] • Cilk allows the programmer to express potential parallelism in an application. • The Cilk scheduler maps Cilk threads onto processors dynamically at runtime. • Since on-line schedulers are complicated, we’ll illustrate the ideas with an off-line scheduler.

  14. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque (double-ended queue) of ready threads, and it manipulates the bottom of the deque like a stack. [Animation: four processors (P), each with its own deque] Spawn!

  15. Cilk’s Work-Stealing Scheduler [Animation: four processors (P), each with its own deque] Spawn! Spawn!

  16. Cilk’s Work-Stealing Scheduler [Animation] Return!

  17. Cilk’s Work-Stealing Scheduler [Animation] Return!

  18. Cilk’s Work-Stealing Scheduler [Animation] Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.

  19. Cilk’s Work-Stealing Scheduler [Animation] Spawn!
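The deque discipline from the animation can be modelled in a few lines. This is a single-threaded toy model, not the Cilk runtime: the Worker type, its method names and the use of integer task ids are all assumptions of this sketch, and random victim selection and synchronization between processors are left out.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of one processor's deque of ready tasks.
// front() is the top (oldest task), back() is the bottom (newest task).
struct Worker {
    std::deque<int> tasks;

    // Spawn: the owner pushes new work at the bottom.
    void spawn(int task) {
        assert(task >= 0);       // -1 is reserved to mean "no task"
        tasks.push_back(task);
    }

    // The owner takes work from the bottom, like a stack. Returns -1 if empty.
    int pop_bottom() {
        if (tasks.empty()) return -1;
        int t = tasks.back();
        tasks.pop_back();
        return t;
    }

    // A thief takes work from the top of the victim's deque. Returns -1 if empty.
    int steal_top() {
        if (tasks.empty()) return -1;
        int t = tasks.front();
        tasks.pop_front();
        return t;
    }
};
```

After spawn(1), spawn(2), spawn(3) on one worker, a thief calling steal_top() gets task 1 (the oldest), while the owner's next pop_bottom() gets task 3 (the newest). Owner and thief work at opposite ends, which keeps them out of each other's way most of the time.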

  21. Square-Matrix Multiplication $C = A \times B$, where $C = (c_{ij})$, $A = (a_{ij})$ and $B = (b_{ij})$ are $n \times n$ matrices and $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. Assume for simplicity that $n = 2^k$.

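  22. Recursive Matrix Multiplication Divide and conquer: $$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \times \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{pmatrix} + \begin{pmatrix} A_{12}B_{21} & A_{12}B_{22} \\ A_{22}B_{21} & A_{22}B_{22} \end{pmatrix}$$ 8 multiplications of (n/2) x (n/2) matrices. 1 addition of n x n matrices.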

  23. Matrix Multiply in Pseudo-Cilk

         void Mult(*C, *A, *B, n) {
           float *T = malloc(n*n*sizeof(float));
           /* base case and partitioning into quadrants are not shown on this slide */
           cilk_spawn Mult(C11,A11,B11,n/2);
           cilk_spawn Mult(C12,A11,B12,n/2);
           cilk_spawn Mult(C22,A21,B12,n/2);
           cilk_spawn Mult(C21,A21,B11,n/2);
           cilk_spawn Mult(T11,A12,B21,n/2);
           cilk_spawn Mult(T12,A12,B22,n/2);
           cilk_spawn Mult(T22,A22,B22,n/2);
           cilk_spawn Mult(T21,A22,B21,n/2);
           cilk_sync;               /* all eight products are done */
           cilk_spawn Add(C,T,n);   /* C = C + T */
           cilk_sync;
           free(T);                 /* release the temporary */
           return;
         }

  24. Matrix Multiply in Pseudo-Cilk

         void Mult(*C, *A, *B, n) {
           float *T = malloc(n*n*sizeof(float));
           /* base case and partitioning into quadrants are not shown on this slide */
           cilk_spawn Mult(C11,A11,B11,n/2);
           cilk_spawn Mult(C12,A11,B12,n/2);
           cilk_spawn Mult(C22,A21,B12,n/2);
           cilk_spawn Mult(C21,A21,B11,n/2);
           cilk_spawn Mult(T11,A12,B21,n/2);
           cilk_spawn Mult(T12,A12,B22,n/2);
           cilk_spawn Mult(T22,A22,B22,n/2);
           cilk_spawn Mult(T21,A22,B21,n/2);
           cilk_sync;               /* all eight products are done */
           cilk_spawn Add(C,T,n);   /* C = C + T */
           cilk_sync;
           free(T);                 /* release the temporary */
           return;
         }

         void Add(*C, *T, n) {
           ⟨ base case & partition matrices ⟩
           cilk_spawn Add(C11,T11,n/2);
           cilk_spawn Add(C12,T12,n/2);
           cilk_spawn Add(C21,T21,n/2);
           cilk_spawn Add(C22,T22,n/2);
           cilk_sync;
           return;
         }
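The pseudo-code leaves the base case and the partitioning open. One way to fill them in is sketched below, under several assumptions that are not from the slides: matrices are stored row-major with an explicit row stride, the base case is n = 1, Add is a plain serial loop instead of the recursive version, and the Cilk keywords are defined away so the code compiles as its serial elision.

```cpp
#include <cassert>
#include <vector>

// Assumption: no Cilk++ compiler at hand, so the keywords expand to nothing.
#define cilk_spawn
#define cilk_sync

// C += T for n x n blocks stored row-major with row strides ldc and ldt.
// (Serial here for brevity; the slides recurse on quadrants instead.)
void Add(float *C, int ldc, const float *T, int ldt, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            C[i * ldc + j] += T[i * ldt + j];
}

// C = A * B for n x n row-major blocks; n must be a power of two.
void Mult(float *C, int ldc, const float *A, int lda,
          const float *B, int ldb, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) { C[0] = A[0] * B[0]; return; }   // base case
    int h = n / 2;
    std::vector<float> T(n * n);                  // temporary, released on return
    // Quadrants: 11 top-left, 12 top-right, 21 bottom-left, 22 bottom-right.
    const float *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A21 + h;
    const float *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B21 + h;
    float *C11 = C, *C12 = C + h, *C21 = C + h * ldc, *C22 = C21 + h;
    float *T11 = T.data(), *T12 = T11 + h, *T21 = T11 + h * n, *T22 = T21 + h;
    cilk_spawn Mult(C11, ldc, A11, lda, B11, ldb, h);
    cilk_spawn Mult(C12, ldc, A11, lda, B12, ldb, h);
    cilk_spawn Mult(C22, ldc, A21, lda, B12, ldb, h);
    cilk_spawn Mult(C21, ldc, A21, lda, B11, ldb, h);
    cilk_spawn Mult(T11, n, A12, lda, B21, ldb, h);
    cilk_spawn Mult(T12, n, A12, lda, B22, ldb, h);
    cilk_spawn Mult(T22, n, A22, lda, B22, ldb, h);
    cilk_spawn Mult(T21, n, A22, lda, B21, ldb, h);
    cilk_sync;                                    // all eight products are done
    Add(C, ldc, T.data(), n, n);                  // C = C + T
}
```

For a full n x n matrix the call is Mult(C, n, A, n, B, n, n). The eight recursive calls write to eight disjoint quadrants (four of C, four of T), which is why they can all be spawned before a single sync.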

  25. Merge Sort

         void MergeSort(int *B, int *A, int n) {
           if (n==1) {
             B[0] = A[0];
           } else {
             int *C;
             C = (int*) malloc(n*sizeof(int));
             cilk_spawn MergeSort(C, A, n/2);
             cilk_spawn MergeSort(C+n/2, A+n/2, n-n/2);
             cilk_sync;
             Merge(B, C, C+n/2, n/2, n-n/2);
             free(C);   /* release the temporary */
           }
         }

     [Figure: merge tree. Input: 19 3 12 46 33 4 21 14. After the first level of merges: 3 19 | 12 46 | 4 33 | 14 21. After the second level: 3 12 19 46 | 4 14 21 33. Final result: 3 4 12 14 19 21 33 46]

  26. Parallel Merge

         void P_Merge(int *C, int *A, int *B, int na, int nb) {
           if (na < nb) {
             cilk_spawn P_Merge(C, B, A, nb, na);
           } else if (na==1) {
             if (nb == 0) {
               C[0] = A[0];
             } else {
               C[0] = (A[0]<B[0]) ? A[0] : B[0]; /* minimum */
               C[1] = (A[0]<B[0]) ? B[0] : A[0]; /* maximum */
             }
           } else {
             int ma = na/2;
             int mb = BinarySearch(A[ma], B, nb);
             cilk_spawn P_Merge(C, A, B, ma, mb);
             cilk_spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb);
             cilk_sync;
           }
         }
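The slide does not show BinarySearch. A minimal runnable sketch follows, assuming that BinarySearch returns the number of elements of B smaller than the key (implemented here with std::lower_bound) and that the Cilk keywords are defined away. The extra guard for two empty inputs is an addition of this sketch, since the slide's code does not cover that case.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: no Cilk++ compiler at hand, so the keywords expand to nothing.
#define cilk_spawn
#define cilk_sync

// Assumed meaning: how many elements of sorted B[0..nb) are smaller than key.
int BinarySearch(int key, const int *B, int nb) {
    return static_cast<int>(std::lower_bound(B, B + nb, key) - B);
}

// Merge sorted A[0..na) and sorted B[0..nb) into C[0..na+nb).
void P_Merge(int *C, const int *A, const int *B, int na, int nb) {
    assert(na >= 0 && nb >= 0);
    if (na < nb) {
        cilk_spawn P_Merge(C, B, A, nb, na);   // make A the longer input
    } else if (na == 0) {
        return;                                // both empty (guard added here)
    } else if (na == 1) {
        if (nb == 0) {
            C[0] = A[0];
        } else {
            C[0] = (A[0] < B[0]) ? A[0] : B[0];   // minimum
            C[1] = (A[0] < B[0]) ? B[0] : A[0];   // maximum
        }
    } else {
        int ma = na / 2;                          // split A at its middle
        int mb = BinarySearch(A[ma], B, nb);      // matching split point in B
        cilk_spawn P_Merge(C, A, B, ma, mb);
        cilk_spawn P_Merge(C + ma + mb, A + ma, B + mb, na - ma, nb - mb);
        cilk_sync;
    }
}
```

Merging the two sorted halves from the merge sort slide, {3, 12, 19, 46} and {4, 14, 21, 33}, gives {3, 4, 12, 14, 19, 21, 33, 46}. Everything in the first sub-merge is at most A[ma] and everything in the second is at least A[ma], so the two recursive calls write to disjoint parts of C.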

  27. Cilk-for Loop • Loop-level parallelism • Loop granularity specification

         void vadd(int *C, int *A, int *B, int n) {
           cilk_for (int i=0; i<n; i++) {
             C[i] = A[i] + B[i];
           }
         }
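Loop granularity can be pictured as the point where a recursive split of the iteration range stops. The sketch below is a conceptual model only: for_range and its grain parameter are hypothetical names, the grain value of 4 is arbitrary, and the Cilk keywords are defined away so the code compiles as plain C++.

```cpp
#include <cassert>
#include <functional>

// Assumption: no Cilk++ compiler at hand, so the keywords expand to nothing.
#define cilk_spawn
#define cilk_sync

// Hypothetical helper: call body(i) for every i in [lo, hi). The range is
// halved until a piece has at most `grain` iterations, which then run serially.
void for_range(int lo, int hi, int grain,
               const std::function<void(int)> &body) {
    assert(grain >= 1);
    if (hi - lo <= grain) {
        for (int i = lo; i < hi; i++) body(i);
    } else {
        int mid = lo + (hi - lo) / 2;
        cilk_spawn for_range(lo, mid, grain, body);   // left half
        for_range(mid, hi, grain, body);              // right half
        cilk_sync;
    }
}

// Vector addition written against the helper, with an arbitrary grain of 4.
void vadd(int *C, const int *A, const int *B, int n) {
    for_range(0, n, 4, [=](int i) { C[i] = A[i] + B[i]; });
}
```

A small grain exposes more parallelism but pays the spawn overhead more often, while a large grain does the opposite. That trade-off is what a granularity setting controls.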

  28. Truly Hands-on Session • Parallel Quick-Sort : +/- Mutex • Parallel Breadth First Search • Parallel Linear Search • Parallel partial sums

  30. Cilkview • Performance Analysis of Parallel Programs • Estimated Speedup • Burdened Parallelism • Granularity Level of cilk_for • Efficiency of Reducer

  31. Cilkview Output

  32. Cilkscreen • Data race detection • Output: • Number of races per statement/variable

  33. Key Ideas • Cilk is simple: cilk, spawn, sync • Recursion, recursion, recursion, … • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span • Work & span
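Work and span can be stated compactly. The notation below is the usual one and is an assumption of this note, since the slides do not define symbols: $T_1$ is the work (running time on one processor), $T_\infty$ is the span (running time with unlimited processors), and $T_P$ is the running time on $P$ processors.

```latex
\begin{align*}
  T_P &\ge \frac{T_1}{P}        && \text{(work law)} \\
  T_P &\ge T_\infty             && \text{(span law)} \\
  \text{parallelism} &= \frac{T_1}{T_\infty} \\
  \text{speedup} = \frac{T_1}{T_P} &\le \min\!\left(P,\ \frac{T_1}{T_\infty}\right)
\end{align*}
```

As a rough illustration, counting each of the nine calls in the fib(4) DAG of slide 10 as one unit gives $T_1 = 9$ and $T_\infty = 4$ (the chain 4, 3, 2, 1), so the parallelism is $9/4 = 2.25$. Under this simplified count, more than two or three processors cannot help on that input.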

  34. Thank you. Next up: OpenMP, MPI
