Compositional C++ (CC++) Presented by Xiaojin Niu 3/13/2003
Outline • Introduction of CC++ • Description of CC++ • Performance Issues • Summary • Reading Assignment
Introduction • Parallel C++: a number of languages and extensions of C++ have emerged that aim to make C++ capable of parallel computation (http://www.fou.uib.no/fd/1996/h/413002/node12.html) • pC++ : uses classes and templates to achieve parallelization (http://www.extreme.indiana.edu/sage/) • CC++ : extends the syntax of C++ with parallel primitives
Introduction (Cont.) • What is CC++ • developed by K. Mani Chandy & Carl Kesselman at Caltech • designed to alleviate the frustration of parallel programming by extending C++ • CC++ is a strict superset of C++ • the CC++ compiler is a translator: it translates CC++ code into C++ code containing embedded calls to the CC++ runtime library (based on the Nexus runtime library and an operating-system-specific thread library)
Description of CC++ • six basic abstractions implemented by the CC++ extensions: processor object, global pointer, thread, sync variable, atomic function, transfer function • keywords of CC++ • what each keyword is • how to use it (examples)
1.Processor object • mechanism for controlling locality • a collection of data and computation that defines a single address space • virtual address space (processor objects can be located on the same physical address space) • exist independently of threads and more than one thread can be mapped to a processor object (refer to the reading assignment for details)
Processor object declaration • add a “global” keyword to the class or structure declaration • the declaration specifies the interface to objects of that type • normal C++ member function rules apply to processor objects
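Below is a minimal sketch of such a declaration. It is illustrative only: the processor-object type "project" is assumed here so that it matches the allocation example on the next slide, and its members are hypothetical.

// Sketch: prefixing "global" to a class declaration makes it a processor object type
global class project {
public:
  project(int size);        // constructor arguments are supplied at allocation time
  int compute(int input);   // public members form the interface visible to other processor objects
private:
  int data;                 // data resides in the processor object's own address space
};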
(Cont.) • Allocating • using the C++ "new" operator • the implementation-defined type "proc_t" specifies the host on which the processor object is to be run

eg.
{
  proc_t placement("stimpy.cis.udel.edu");
  project *global project_ptr = new(placement) project(constructor-arguments);
}
// creates a new processor object of type "project" on host stimpy.cis.udel.edu

• Deallocating • using the "delete" operator on the global pointer that points to the appropriate processor object
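For completeness, a one-line sketch of the deallocation described above, reusing the hypothetical global pointer project_ptr from the allocation example:

delete project_ptr;   // destroys the processor object that the global pointer refers to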
2.Global pointer • mechanism for linking processor objects together • communicate data between processor objects when computation is distributed to several address spaces • can refer to other processor objects in the computation while local/normal pointers can only reference memory in the processor object they are defined in
Keyword: global • declares a global pointer • can reference basic types and user-defined structures • global pointers to functions are currently not supported
global eg.

int *global gpoint;    // declares gpoint as a global pointer to an integer
int **global gpoint;   // declares gpoint as a global pointer to a local pointer to an integer
C *global gpoint;      // declares gpoint as a global pointer to an object of type C
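A short sketch of using a global pointer once it refers to a processor object. The type "project", its member compute, and the placement variable are the hypothetical names from the earlier slides; the call syntax follows ordinary C++ member access.

proc_t placement("stimpy.cis.udel.edu");
project *global p = new(placement) project(10);   // allocate a remote processor object
int r = p->compute(42);   // a member call through a global pointer executes in the
                          // remote processor object's address space (an RPC)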
3. Parallel threads • Parallel threads : • mechanism for specifying concurrent execution • threads are created independently from processor objects • more than one thread can execute in a processor object • keywords : par parfor spawn
3. Parallel threads (Cont.) • Thread of control • its start is the root from which independent dynamic action within a system occurs • a CC++ program, like a C++ program, initially executes as a single thread of control • CC++ can create parallel threads of control
Keyword: par • par defines a block in which statements are executed in parallel by independent threads of control • a par block can lexically contain any CC++ statement except for variable declarations and statements that result in nonlocal changes in the flow of control, such as return / break / goto (allowed but restricted)
Keyword: par (Cont.) • inside a par block the execution order of statements is not defined: interleaved / concurrent / sequential • CC++ guarantees all threads get a chance to execute • a par block terminates when all its statements terminate
par eg. 1

par {
  {
    ans1 = func(params1);
    result = func2(ans1);
  }
  par {
    ans2 = func(params2);
    ans3 = func(params3);
  }
}
// wait for completion of all statements before continuing
par eg. 2

par {
  { a1(); a2(); a3(); }   // Statement S1
  { b1(); b2(); b3(); }   // Statement S2
}

Possible execution orderings include (but are not limited to):
a1 a2 a3 b1 b2 b3
a1 b1 b2 b3 a2 a3
...
The statements execute in an arbitrary but fair, interleaved manner; the sequential ordering within statements S1 and S2 is maintained.
Keyword: parfor • denotes a loop whose iterations are executed in parallel • the body of each iteration is sequential • the loop control variable must be declared in the parfor statement • a parfor statement completes only when all of its iterations have completed
parfor eg.

int A[N];
int B[N][N];

parfor (int i = 0; i < N; i++) {
  A[i] = i;
  parfor (int j = 0; j < N; j++)
    B[i][j] = j;
}
// each parfor waits for the completion of all its iterations
Keyword: spawn • creates a single, completely independent thread of control for a function that executes in parallel with the spawning thread • the spawn construct offers no mechanism to determine the state of the spawned function (synchronization must be explicitly programmed) • a spawn statement terminates immediately, regardless of the status of the thread that was spawned • a spawned function cannot return a value
spawn eg.

...
spawn function(A, B);   // creates an independent thread that executes function(A, B)
...                     // the spawning thread continues immediately
Pitfall

int i = 0;
par {
  i = 10;
  for (int j = 0; j < 10; j++) i++;
}
// i may be 10, 20 or any other value!!

This is an example of the dangers of variable sharing (it creates unpredictable results). Programmers should make sure that sharing between threads is handled safely; one possible fix using a sync variable is sketched below.
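A minimal sketch of one way to make the sharing above deterministic, assuming the single-assignment sync variable introduced on a later slide (the name "done" is chosen for illustration):

sync int done;                        // single-assignment variable, initially undefined
int i = 0;
par {
  { i = 10; done = 1; }               // first thread: write i, then signal
  { if (done == 1)                    // second thread: reading done blocks until it
      for (int j = 0; j < 10; j++)    // has been assigned, so the increments happen
        i++;                          // after i = 10
  }
}
// i is now 20 in every execution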
4. Sync variable • used to synchronize thread execution • a constant that is initially in an undefined state and can be assigned only once (a single-assignment variable) • like a const variable in C++, except that • initialization can be delayed (it need not happen at creation time) • any attempt to read the value of an uninitialized sync object is delayed until the value is assigned • keyword: sync
sync eg.

void some_function(sync int *b) {
  ...                  // function executes
  *b = 1;              // sync variable is assigned
  ...                  // some_function may continue execution
}

int main() {
  sync int sync_var;
  spawn some_function(&sync_var);
  ...                  // main executes simultaneously with some_function()
  if (sync_var == 1)   // spawning thread waits here until some_function() has assigned some value to sync_var
    {;}
  ...                  // now we know that some_function() has reached the assignment of sync_var
}
5. Atomic function • mechanism for controlling the interleaving of threads executing in the same processor object • keyword : atomic • specify that the actions of such a function will not be interleaved with the actions of any other atomic function of the same object
atomic eg.

class value_store {
private:
  int x;
public:
  atomic void store(int i) { x = i; }
};

void f(void) {
  class value_store vs;
  par {
    vs.store(1);
    vs.store(2);
  }
}

// the two atomic calls may execute in either order, but their actions are never interleaved
6. Transfer function • allows arbitrary data structures to be transferred between processor objects, e.g., as arguments in calls to remote processor objects • keyword: CCVoid
CCVoid eg.

class Vector {
  int length;
  double* elements;
  friend CCVoid& operator<<(CCVoid&, const Vector&);
  friend CCVoid& operator>>(CCVoid&, Vector&);
};

CCVoid& operator<<(CCVoid& v, const Vector& input) {
  v << input.length;
  for (int i = 0; i < input.length; i++)
    v << input.elements[i];
  return v;
}

CCVoid& operator>>(CCVoid& v, Vector& output) {
  v >> output.length;
  output.elements = new double[output.length];
  for (int i = 0; i < output.length; i++)
    v >> output.elements[i];
  return v;
}
Performance issues • CC++ uses RPC to interact across address space boundaries (processor objects) • RPC is slower than MPI • RPC is more complex • but RPC makes the interface simple and flexible • CC++ is currently implemented on shared-memory parallel computers and uniprocessor workstations
Summary • The advantages of CC++ • the advantages of C++ are advantages of CC++ as well (strong typing, data abstraction, etc.) • CC++ is C++ with a small number of extensions (easy to learn) • CC++ provides a mechanism for parallel programming, not a policy (different types of parallel programs can be developed) • CC++ was designed to support formal methods
Reading assignment • Ian Foster's online book, Chapter 5 • A tutorial for CC++: http://caltechcstr.library.caltech.edu/archive/00000132/
Nexus • Nexus is a portable library providing the multithreaded communication facilities required to implement advanced languages, libraries, and applications in heterogeneous parallel and distributed computing environments. Reference: http://www.globus.org/nexus/