
MuPC Run Time System for UPC

MuPC Run Time System for UPC. Steve Seidel, Phil Merkey, Jeevan Savant, Kian Gap Lee (Department of Computer Science, Michigan Technological University); Brian Wibecan, Program PI; Phil Becker, Program Manager; Kevin Harris, Bruce Trull, and Daniel Christians (Compaq UPC Development).


Presentation Transcript


  1. MuPC Run Time System for UPC • Steve Seidel, Phil Merkey, Jeevan Savant, Kian Gap Lee (Department of Computer Science, Michigan Technological University) • Brian Wibecan, Program PI • Phil Becker, Program Manager • Kevin Harris, Bruce Trull, and Daniel Christians (Compaq UPC Development)

  2. UPC designed by Carlson et al. • A “light weight” extension of C for parallelism • A shared memory, multithreaded model • Arrays and pointers can be shared • Array distribution is semi-automatic • Remote references are automatically resolved • Parallel constructs include • forall • fence and split barrier • Built-ins for • memory allocation/free • locks

  3. Compaq's UPC compiler • UPC object code • front end translates UPC source to EDG IL • lowering phase converts UPC-specifics to standard EDG IL • middle end converts EDG IL to GEM-compatible IL • GEM back end converts GEM IL to Alpha object code • Each of the intermediate phases above has some UPC-specific components. • Alternative: "bail out" after the lowering phase to produce C code that includes calls to a run time system. • Under discussion: EDG front end for UPC

  4. Run Time System Interface • The RTS interface is an evolving set of data objects and methods that captures the semantics of “UPC minus C”. • An RTS "reference implementation" was suggested by Harris. • A publicly available reference implementation will • promote UPC code base, user base and platform base • challenge MPI and OpenMP • foster RTS evolution • promote support for UPC tools • MuPC is MTU's run time system for UPC

  5. Run Time System Structure • Run time structures describing shared objects and globals are maintained. • References to nonlocal shared objects are made through get and put. • UPC barriers and fences are passed directly to the RTS. • The same is true of UPC calls to other built-in functions that provide locks and dynamic memory allocation.

  6. Available compiler technology • Proprietary Compaq compiler supports a proprietary RTS. • Reference compiler is not currently available, but ... • Compaq will provide a compiler that supports the reference RTS.

  7. MuPC Design Goals • Public availability • Wide platform base • Open source maintained by MTU • User-level implementation • Quick delivery • Efficiency is not a primary goal

  8. Available Platforms • MTU (on site): • Beowulf cluster (64 nodes) • Sun Enterprise 4500 (12 processors) • SGI Origin 2000 (4 processors) • Sun workstation networks (various) • Linux workstation networks (various) • AlphaServer and 2 workstations (provided by Compaq) • Remote: • AlphaServer SC (Compaq) • T3E (Cray)

  9. Transport vehicle selection • Candidates (with main shortcoming) • MPI: no one-sided communication • MPI-2: incomplete implementations • Pthreads: no multiprocessor support • OpenMP: expensive, possibly incompatible • shmem: limited platform base • VIA: limited platform base • ARMCI: limited user base • TCP/IP: too low-level • Selection criteria • Portability and availability: MPI, Pthreads, TCP/IP • Technical shortcomings can be overcome

  10. MPI/Pthreads hybrid transport vehicle • MPI provides process control and interprocessor communication. • Pthreads provides multithreading within each process to handle asynchronous remote accesses. • The following are equivalent in MuPC: • one MPI process • one UPC thread (from the user’s point of view) • one user Pthread + one MPI send/recv Pthread • Thread safety is provided by isolating all MPI calls in the send/recv Pthread.

  11. upcrun -np 3 upc-demo • [Diagram: three processes side by side. Each calls MPI_init and then pthread_create, giving one user UPC thread and one send/recv thread per process, and each ends with upc_finalize.]

  12. Example: Nonlocal array reference x=a[k];
      // User shared array
      shared int a[10][THREADS];
      // Frontend-generated temporary pointer
      shared int *UPC_RTS_ptr;
      ...
      // UPC source code:
      //   x=a[k];
      // Front end computes address,
      // phase and thread of remote reference.
      UPC_RTS_ptr = (vaddr,phase,thread);
      // Call is made to get a[k]
      x = MuPC_get_sync_int(UPC_RTS_ptr);
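The address computation that slide 12 attributes to the front end can be illustrated with the standard UPC block-cyclic layout rule. The following is only a sketch under stated assumptions: the `shared_addr` struct and the `locate` function are hypothetical names invented here, not part of MuPC or of Compaq's front end, and `offset` is a local element index rather than a real virtual address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the (vaddr, phase, thread) triple on the slide. */
typedef struct {
    size_t thread;  /* thread with affinity to the element */
    size_t phase;   /* position of the element inside its block */
    size_t offset;  /* element index within that thread's local storage */
} shared_addr;

/* Map the linear element index idx of a shared array with the given block
 * size, distributed over nthreads threads, to its owner and local position.
 * Blocks are dealt to threads round-robin, as in the UPC layout rule. */
static shared_addr locate(size_t idx, size_t block_size, size_t nthreads)
{
    assert(block_size > 0 && nthreads > 0);
    shared_addr s;
    size_t block = idx / block_size;                 /* global block number */
    s.thread = block % nthreads;                     /* owner of that block */
    s.phase  = idx % block_size;                     /* slot inside the block */
    s.offset = (block / nthreads) * block_size + s.phase;
    return s;
}
```

For `shared int a[10][THREADS]` with the default block size of 1 and, say, 4 threads, element `a[2][3]` has linear index 2*4+3 = 11, so `locate(11, 1, 4)` reports thread 3 and local offset 2.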

  13. x = MuPC_get_sync_int(UPC_RTS_ptr p);
      Pthread lock structs: send_lock, recv_lock
      MuPC_get_sync_int (user thread):
          send_lock.type=GET
          send_lock.ptr=p
          wait on recv_lock.done
          x=recv_lock.data
      Send/Recv Thr (requesting process):
          while (threads)
            case ...
              GET:   MPI_Send(p,RECV)
              ...
              REPLY: MPI_Recv(y)
                     recv_lock.data=y
                     recv_lock.done=T
              ...
          end while
      Send/Recv Thr (remote process):
          while (threads)
            case ...
              RECV:  MPI_Recv(p)
                     MPI_Send(*p,REPLY)
              ...
          end while
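The hand-off between the user thread and the send/recv thread on slide 13 can be sketched with a Pthreads mutex and condition variable. This is a minimal single-process illustration, not MuPC code: the `mailbox` struct, `service_loop`, `get_sync_int`, and `demo_get` are hypothetical names, the slide's two lock structs are merged into one for brevity, and the MPI_Send/MPI_Recv exchange with the remote process is replaced by a plain pointer dereference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox combining the slide's send_lock and recv_lock. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool pending;     /* a GET has been posted (send_lock.type=GET) */
    bool done;        /* the reply has arrived (recv_lock.done) */
    bool shutdown;    /* asks the service thread to exit */
    const int *ptr;   /* address to fetch (send_lock.ptr) */
    int data;         /* fetched value (recv_lock.data) */
} mailbox;

/* Stands in for the send/recv Pthread. The real thread would exchange MPI
 * messages here; this sketch simply reads the value through the pointer. */
static void *service_loop(void *arg)
{
    mailbox *mb = arg;
    pthread_mutex_lock(&mb->mu);
    for (;;) {
        while (!mb->pending && !mb->shutdown)
            pthread_cond_wait(&mb->cv, &mb->mu);
        if (!mb->pending)
            break;                      /* shutdown with nothing pending */
        mb->data = *mb->ptr;            /* "GET ... REPLY" collapsed */
        mb->pending = false;
        mb->done = true;
        pthread_cond_broadcast(&mb->cv);
    }
    pthread_mutex_unlock(&mb->mu);
    return NULL;
}

/* User-thread side: post the request, then block until the reply is in. */
static int get_sync_int(mailbox *mb, const int *p)
{
    pthread_mutex_lock(&mb->mu);
    mb->ptr = p;
    mb->done = false;
    mb->pending = true;
    pthread_cond_broadcast(&mb->cv);
    while (!mb->done)
        pthread_cond_wait(&mb->cv, &mb->mu);
    int x = mb->data;
    pthread_mutex_unlock(&mb->mu);
    return x;
}

/* Start the service thread, perform one get, shut down, return the value. */
static int demo_get(int remote_value)
{
    mailbox mb;
    pthread_t tid;
    pthread_mutex_init(&mb.mu, NULL);
    pthread_cond_init(&mb.cv, NULL);
    mb.pending = mb.done = mb.shutdown = false;
    mb.ptr = NULL;
    mb.data = 0;
    int rc = pthread_create(&tid, NULL, service_loop, &mb);
    assert(rc == 0);
    (void)rc;
    int x = get_sync_int(&mb, &remote_value);
    pthread_mutex_lock(&mb.mu);
    mb.shutdown = true;
    pthread_cond_broadcast(&mb.cv);
    pthread_mutex_unlock(&mb.mu);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.cv);
    pthread_mutex_destroy(&mb.mu);
    return x;
}
```

Keeping every communication call inside `service_loop` mirrors the design choice stated on slide 10: the user thread never touches the transport, so the transport library itself does not need to be thread safe.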

  18. Synthetic Testing • Pseudo-code walkthroughs of all MuPC functions • Synthetic test codes are C/MPI programs that call MuPC RTS routines directly. • Shared data is artificially allocated.
      // THREAD 0
      int a[10];
      ...
      // a[12]=42;
      index=12%10; thread=12/10;
      MuPC_put_integer(a,index,thread,42);
      ...
      // THREAD 1
      int a[10];
      ...
      // outcome is
      // a[2]=42;
      ...
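The synthetic test on slide 18 can be mimicked in a single process to check its index arithmetic. The sketch below assumes two simulated threads, each with its own `a[10]`, and follows the slide's blocked decomposition (`index = i % 10`, `thread = i / 10`). The names `fake_mem`, `fake_put_integer`, and `shared_store` are hypothetical; the real `MuPC_put_integer` would ship the value to another MPI process instead of writing into a local 2-D array.

```c
#include <assert.h>

enum { NTHREADS = 2, ELEMS_PER_THREAD = 10 };

/* Artificially allocated "shared" data: one a[10] per simulated thread. */
static int fake_mem[NTHREADS][ELEMS_PER_THREAD];

/* Hypothetical stand-in for MuPC_put_integer: store value into a[index]
 * on the given thread. */
static void fake_put_integer(int index, int thread, int value)
{
    assert(thread >= 0 && thread < NTHREADS);
    assert(index >= 0 && index < ELEMS_PER_THREAD);
    fake_mem[thread][index] = value;
}

/* Emulate the shared write a[global_index] = value the way the slide does. */
static void shared_store(int global_index, int value)
{
    int index  = global_index % ELEMS_PER_THREAD;  /* slot on the owner */
    int thread = global_index / ELEMS_PER_THREAD;  /* owning thread */
    fake_put_integer(index, thread, value);
}
```

Calling `shared_store(12, 42)` resolves to index 2 on thread 1, matching the slide's stated outcome `a[2]=42` on THREAD 1.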

  19. Integration Testing • Wrap gets, puts and notify/wait to conform to the RTS interface. • Integrate MuPC with front end ... • ... data structures and globals • ... initialization and finalization • Rewrite synthetic tests in UPC and compare to previous results. • Add built-in functions for • locks • memory allocation

  20. Full-scale Testing • MTU test kernels • GWU UPC test suite • Contributed UPC codes

  21. Documentation, Delivery, and Distribution • MuPC source • Front-end binaries for targeted platforms • Makefiles, release notes, etc. • Serve these items from MTU MuPC web site • Publish a description of MuPC

  22. Preliminary Work, Summer 2001 • RTS header files provided by Compaq • MPI-2 one-sided communication proposed as primary transport vehicle, but current implementations do not meet the full standard • MPI/Pthreads hybrid selected • Studied intermediate output of Compaq's UPC front end • Compaq hardware and software delivered • Single-threaded working environment verified • Accounts on AlphaServer SC also provided

  23. August 20-21, Nashua • Participants: • Bill Carlson, Brian Wibecan, Kevin Harris, Phil Becker, Daniel Christians, Jim Bovay, Savant, Merkey, Seidel • Discussed RTS definition and UPC features per Wibecan's agenda. • Outcomes: • MPI/Pthreads hybrid design feasible • MuPC will include upccc and upcrun MPI wrappers • Agreed on RTS and UPC feature interpretations • MuPC efficiency and performance not highest priority • Written meeting summary submitted to Compaq (Sept. 23, 2001)

  24. Current Work • Recent improvements: • isolating MPI calls for thread safety • send/recv threads yield control when there are no pending requests • Skeleton implementations of get/put, barrier, fence, and finalize have been scaled to over 30 nodes on MTU’s Beowulf cluster.

  25. Project Work Plan: • Start date June 28, 2001 • This plan is based on the Project Work Items specified in the March 27 RFP from Compaq and on the March 30 MTU Proposal.

  26. Completed Work Items (per MTU proposal) • 1: • (a) Review implementation methodologies • (b) Identify development platforms • (c) Align resources (staff and platforms) • (d) Identify target platforms • (e) Conclusion memo (sent 9/23/01) • 2: Formal Work Plan and Agreement • (Written version of this document) • 4: Initial Design of Run Time System • Design presented in Nashua on August 20, 2001

  27. Remaining Work Items (w/completion dates) • 5: Development of remaining primary components (Jan. 1, 2002) • (d) locks • (e) complete gets and puts • (b) memory allocation • (f) utility functions • 3: Test design and documentation (Feb. 1, 2002) • This testing will be done concurrent with Item 5 above. • (a) Synthetic testing • (b) Integration testing • (c) Full-scale testing

  28. Remaining Work Items (continued) • 6: Public Interface development (April 1, 2002) • (a) Makefiles, release notes, installation notes, etc. • (b) Bundle all necessary software • (c) Provide MTU-authored test codes and results • (d) Release advance copies for review and comment • 7: System Refinement and Delivery (June 1, 2002) • (a) Release MuPC to the UPC Developers' Group • (b) Maintain MuPC website at MTU • (c) Publish description of MuPC • 8: Completion Certification (June 28, 2002) • (a) Final MuPC release by MTU

  29. MuPC Project Staff • Jeevan Savant, M.S. Graduate Student • MuPC design and implementation • (Items 5(b,d,e,f), 6(a,d), and 7(a,c) above) • Support: 9 months, half-time • Kian Gap (Mark) Lee, M.S. Graduate Student • MuPC testing and platform integration • (Items 3(a,b,c), 6(b,c), 7(b,c) above) • Support: 9 months, half-time • Phillip Merkey, Research Assistant Professor • Steven Seidel, Associate Professor

  30. Additional MTU UPC projects • Charles Wallace, Assistant Professor • UPC Memory models • Xiaodi (Lisa) Li, M.S. Graduate Student • Benchmarking MuPC using one or two NAS parallel benchmarks • Yi (Leon) Liang, M.S. Graduate Student • Pthreads-only MuPC RTS • Yongsheng Huang, M.S. Graduate Student • UPC memory models, improving MuPC efficiency • Zhang Zhang, Ph.D. Graduate Student • UPC memory models, improving MuPC efficiency
