
Porting NANOS on SDSM



Presentation Transcript


  1. Porting NANOS on SDSM GOAL: Porting a shared-memory environment to distributed memory. What is missing from current SDSMs? Christian Perez

  2. Who am I? • December 1999 : PhD at LIP, ENS Lyon, France • Data parallel languages, distributed memory, load balancing, preemptive thread migration • Winter 1999/2000 : TMR at UPC • OpenMP, Nanos, SDSM • October 2000 : INRIA researcher • Distributed programs, code coupling

  3. Contents • Motivation • Related work • Nanos execution model (NthLib) • Nanos on top of 2 SDSMs (JIAJIA & DSM-PM2) • Missing SDSM functionalities • Conclusion

  4. Motivation • OpenMP : emerging standard • simplicity (no data distribution) • Clusters of machines (mono or multiprocessor) • excellent performance/price ratio • OpenMP on top of a cluster !

  5. OpenMP / Cluster : HOW ? • OpenMP paradigm : shared memory • Cluster paradigm : message passing • Use of a software DSM system ! • Hardware DSM system : SCI (write: 2 µs) • specific hardware • not yet stable

  6. Related work • Several OpenMP/DSM implementations • OpenMP NOW!, Omni • But, • Modification of OpenMP semantics • One level of parallelism • Do not exploit high performance networks

  7. OpenMP on classical DSM • Compiler extracts shared data from the stack • Expensive local variable creation • shared memory allocation • Modification of the OpenMP standard : • the default for variables should be private instead of shared • New synchronization primitives : • condition variables & semaphores

  8. OpenMP on classical DSM • One level of parallelism (SPMD)

     !$omp parallel do
     do i = 1, 4
        x(i) = x(i) + x(i+1)
     end do

     becomes

     call schedule(lb, ub, …)
     do i = lb, ub
        x(i) = x(i) + x(i+1)
     end do
     call dsm_barrier()
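The SPMD rewrite above relies on a block scheduler that maps loop iterations onto workers. A minimal sketch in C, under stated assumptions: the names `schedule`, `lb`, `ub` follow the slide, but the real runtime's scheduler call and signature differ.

```c
#include <assert.h>

/* Hypothetical block scheduler for the SPMD rewrite: split iterations
   1..n evenly among nproc workers; worker `id` (0-based) gets [lb, ub].
   An empty range comes back with lb > ub. */
static void schedule(int n, int nproc, int id, int *lb, int *ub)
{
    int chunk = (n + nproc - 1) / nproc;        /* ceiling division */
    *lb = id * chunk + 1;
    *ub = (*lb + chunk - 1 < n) ? *lb + chunk - 1 : n;
}
```

Each worker then runs its own `do i = lb, ub` range and meets the others at the DSM barrier.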

  9. Omni compilation approach Taken from pdplab.trc.rwcp.or.jp/pdperf/Omni/wgcc2k/

  10. Our goals • Support OpenMP standard • High performance • Allow exploitation of • multithreading (SMP) • high performance networks

  11. Nanos OpenMP compiler • Convert an OpenMP program to a task graph • Communications via shared memory

     !$omp parallel do
     do i = 1, 4
        x(i) = x(i) + x(i+1)
     end do

     (task graph: one task for i=1,2 and one for i=3,4)

  12. NthLib runtime support • Nanos compiler generates intermediate code • Communications still via shared memory

     call nthf_depadd(…)
     do nth_p = 1, proc
        nth = nthf_create_1s(…, f, …)
     end do
     call nth_block()

     subroutine f(…)
        x(i) = x(i) + x(i+1)

  13. NthLib details • Assumes it runs on top of kernel threads • Provides user-level threads (QT) • Stack management (allocation) • Stack initialization (argument) • Explicit context switch
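The three mechanics on this slide (stack allocation, stack initialization with an argument, explicit context switch) can be sketched with POSIX `ucontext`, which is close in spirit to what QT provides; the names here are illustrative, not NthLib's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, thread_ctx;
static int result;

/* Body of a user-level thread; falling off the end resumes uc_link. */
static void worker(int arg)
{
    result = arg * 2;                     /* the "task body" */
}

static int run_user_thread(int arg)
{
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp = malloc(STACK_SIZE);   /* stack allocation */
    thread_ctx.uc_stack.ss_size = STACK_SIZE;
    thread_ctx.uc_link = &main_ctx;                   /* where to resume */
    /* stack initialization: bind the function and its argument */
    makecontext(&thread_ctx, (void (*)(void))worker, 1, arg);
    swapcontext(&main_ctx, &thread_ctx);              /* explicit context switch */
    free(thread_ctx.uc_stack.ss_sp);
    return result;
}
```

The explicit `swapcontext` is what lets the runtime, not the kernel, decide when a nano-thread runs.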

  14. NthLib queues • Global/Local • Thread descriptor • Rich functionalities • Work descriptor • High performance

  15. NthLib : Memory management • Nano-thread descriptor • Successors • Stack • Guard zone • Mutual exclusion • mmap allocation • SLOT_SIZE stack alignment

  16. Porting NthLib to SDSM • Data consistency • Shared memory management • Nanos threads • JIAJIA implementation • DSM-PM2 implementation • Summary of DSM requirements

  17. Data consistency • Mutual exclusion for defined data structures → Acquire/Release • User level shared memory data → Barrier

  18. Data consistency • Mutual exclusion for defined data structures → Acquire/Release • User level shared memory data → Barrier (figure: a barrier after each parallel phase)

  19. Shared memory management • Asynchronous shared memory allocation • Alignment parameter (> PAGE_SIZE) • Global variables / common declaration → Not yet supported

  20. Nano-threads • Run-to-block execution model • Shared stacks (parent/child relationship) • Implicit thread migration (scheduler)

  21. JIAJIA • Developed in China by W. Hu, W. Shi & Z. Tang • Public domain DSM • User level DSM • DSM : lock/unlock, barrier, cond. variables • MP : send/receive, broadcast, reduce • Solaris, AIX, Irix, Linux, NT (not distributed)

  22. JIAJIA : Memory Allocation • No control of memory alignment (x2) • Synchronous memory allocation primitive → Development of an RPC version • Based on the send/receive primitives • Addition of a user-level message handler → Problems • Global lock • Interference with JIAJIA blocking functions

  23. JIAJIA : Discussion • Global barrier for data synchronization → No multiple levels of parallelism • Not thread-aware → No efficient use of SMP nodes

  24. DSM/PM2 • Developed at LIP by G. Antoniu (PhD student) • Public domain • User level, module of PM2 • Generic and multi-protocol DSM • DSM : lock/unlock • MP : LRPC • Linux, Solaris, Irix (32 bits)

  25. PM2 organization (figure: MAD1 over TCP, PVM, MPI, SCI, VIA, SBP; MAD2 over TCP, MPI, SCI, VIA, BIP; MARCEL with MONO, SMP, ACTIVATION; PM2, DSM, TBX, NTBX) http://www.pm2.org

  26. DSM/PM2 : Memory Allocation • Only static memory allocation → Build a dynamic memory allocation primitive • Centralized memory allocation • LRPC to Node 0 → Integration of an alignment parameter • Summer 2000 : dynamic memory allocation ready !
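A sketch of the node-0 side of such a centralized allocator, including the alignment parameter the port needed. The heap base address and the names are illustrative; in the real port, remote nodes reach this code through an LRPC to node 0 and the heap lives in mapped shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t heap_next = 0x10000;   /* base of the shared heap (assumed) */

/* Centralized bump allocator serving dsm_malloc requests.
   `align` must be a power of two. */
static uintptr_t dsm_alloc(size_t size, size_t align)
{
    uintptr_t p = (heap_next + align - 1) & ~(uintptr_t)(align - 1);
    heap_next = p + size;
    return p;
}
```

Centralizing allocation on node 0 keeps all nodes' views of the heap consistent without extra synchronization, at the cost of one round trip per allocation, which is why the asynchronous version mattered.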

  27. DSM/PM2 : marcel descriptor (figure: marcel_t at (sp&MASK)+SLOT_SIZE, between page boundaries) • NthLib requirement : one kernel thread → many nano-threads

  28. DSM/PM2 : marcel descriptor (figure: a marcel_t* stored at a page boundary; the descriptor is found via *((sp&MASK)+SLOT_SIZE))
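The pointer-at-a-known-offset trick on this slide can be sketched in C. SLOT_SIZE and the exact offset of the stored `marcel_t*` are assumptions; the point is that masking the stack pointer recovers the slot base, from which the descriptor pointer is one load away.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE ((uintptr_t)(64 * 1024))   /* assumed power of two */
#define SLOT_MASK (~(SLOT_SIZE - 1))

typedef struct marcel { int id; } marcel_t;

/* Any stack address inside an aligned slot recovers the descriptor:
   mask off the low bits to get the slot base, then read the marcel_t*
   stored just below the top of the slot. */
static marcel_t *self_from_sp(uintptr_t sp)
{
    return *(marcel_t **)((sp & SLOT_MASK) + SLOT_SIZE - sizeof(marcel_t *));
}

/* Demo: fake one slot in static storage, plant a descriptor pointer,
   and look it up from an address in the middle of the "stack". */
static int demo(void)
{
    static uint8_t raw[2 * 64 * 1024];
    uint8_t *slot = (uint8_t *)(((uintptr_t)raw + SLOT_SIZE - 1) & SLOT_MASK);
    static marcel_t me = { 42 };
    *(marcel_t **)(slot + SLOT_SIZE - sizeof(marcel_t *)) = &me;
    return self_from_sp((uintptr_t)(slot + 1024))->id;
}
```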

  29. DSM/PM2 : Discussion • Using page-level sequential consistency • + no need for barriers (multiple levels of parallelism) • – False sharing → Dedicated stack layout (figure: marcel_t* and a pad between page boundaries)

  30. DSM/PM2 : Discussion (cont) • No alternate stack for the signal handler → Prefetch pages before context switch : O(n) → Pad to the next page before opening parallelism (figure: shared data and a pad between page boundaries)

  31. DSM/PM2 improvement • Availability of an asynchronous DSM malloc • Lazy data consistency protocols under evaluation • eager consistency, multiple writers • scope consistency • Support for stacks in shared memory (Linux)

  32. DSM/PM2 shared stack support (figure: marcel_t, SEGV stack, (sp&MASK)+SLOT_SIZE)


  38. DSM requirement • Support of static global shared variables • Efficient code • remove one indirection level • Enable use of a classical compiler • Support for common blocks → « Sharedization » of already-allocated memory : dsm_to_shared(void* p, size_t size);
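The proposed `dsm_to_shared()` can be sketched as a region registry that the DSM's fault handler would consult. This is only the bookkeeping half; a real DSM would also `mprotect()` the registered pages to intercept accesses, and `dsm_is_shared` is a hypothetical helper added here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registry of already-allocated ranges (e.g. Fortran common blocks)
   promoted to DSM-shared status. */
#define MAX_REGIONS 32
static struct { uintptr_t lo, hi; } regions[MAX_REGIONS];
static int nregions = 0;

int dsm_to_shared(void *p, size_t size)
{
    if (nregions == MAX_REGIONS)
        return -1;
    regions[nregions].lo = (uintptr_t)p;
    regions[nregions].hi = (uintptr_t)p + size;
    nregions++;
    return 0;
}

/* Hypothetical helper: would be used by the page-fault handler. */
int dsm_is_shared(void *p)
{
    for (int i = 0; i < nregions; i++)
        if ((uintptr_t)p >= regions[i].lo && (uintptr_t)p < regions[i].hi)
            return 1;
    return 0;
}
```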

  39. DSM requirement • Support for multiple levels of parallelism • Partial barrier • group management • Dependencies support • like acquire/release but without a lock


  42. DSM requirement • Support for multiple levels of parallelism • Partial barrier • group management • Dependencies support • like acquire/release but without a lock (figure: start(1), start(2), stop(1), stop(2), update(1,2))
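One way to realize the partial barrier above is to give each thread group its own barrier object, so an inner parallel region synchronizes only its members instead of every node. A single-process sketch with POSIX threads (the names are ours, not an SDSM API; an SDSM would implement this across nodes):

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int count;      /* members of this group */
    int waiting;    /* members currently blocked */
    int phase;      /* completed barrier episodes */
} group_barrier_t;

void gb_init(group_barrier_t *b, int count)
{
    pthread_mutex_init(&b->mu, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->count = count;
    b->waiting = 0;
    b->phase = 0;
}

void gb_wait(group_barrier_t *b)
{
    pthread_mutex_lock(&b->mu);
    int phase = b->phase;
    if (++b->waiting == b->count) {      /* last member releases the group */
        b->waiting = 0;
        b->phase++;
        pthread_cond_broadcast(&b->cv);
    } else {
        while (b->phase == phase)        /* wait for this episode to end */
            pthread_cond_wait(&b->cv, &b->mu);
    }
    pthread_mutex_unlock(&b->mu);
}
```

Group management then reduces to creating one such object per nested parallel region, which is exactly the functionality the slide asks the SDSM to expose.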

  43. Summary of DSM requirements • Support of static global shared variables → « Sharedization » of already allocated memory • Acquire/release primitive • Partial barrier → group management • Asynchronous shared memory allocation • Alignment parameter to memory allocation • Threads (SMP nodes) • Optimized stack management

  44. Conclusion • Successfully ported Nanos to 2 DSMs → JIAJIA & DSM-PM2 • DSM requirements to obtain performance → Support the MIMD model → Automatic thread migration • Performance ?
