Determinate Imperative Programming: The CF Model
Vijay Saraswat, IBM TJ Watson Research Center
Joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun
http://www.saraswat.org/cf.html
Outline
• Problem: Many concurrent imperative programs are determinate. Determinacy is not apparent from the syntax.
• Basic idea: A variable is the stream of values written to it by a thread.
• Many examples
• Semantics
• Implementation
• Future work
Background: X10
Five basic themes:
• Partitioned address space
• Pervasive explicit asynchrony (Cilk-style recursive parallelism)
• Java base
• Guaranteed VM invariants
• Explicit, distributed VM
Few language extensions:
    <s> = async <s>
    <s> = finish <s>
    <s> = foreach (<v>, …, <v> in <e>) <s>
• Multidimensional arrays over distributions
Subsumes MPI, OpenMP, SPMD languages, Cilk, …
X10: clocks, clocked final data structures
• Clocks can be created dynamically.
• Activities are registered with clocks. An activity may register a newly created activity with one of its clocks.
• A clock advances when all activities registered on it resume the clock.
• Operations: c.resume(); next; c.drop();
• "next;" resumes each clock the activity is registered on, then blocks until each clock advances.
• This is sufficient for deadlock-freedom.
• Adequate for parallel operations on arrays, but not for dataflow.
Clocked final datum
• In each phase of the clock the datum is immutable.
• A read gets the current value; a write updates the value in the next phase.
Clocks do not introduce deadlock; clocked finals are determinate.
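The read/write rule for a clocked final datum can be illustrated with a small Python model. The class `ClockedFinal` and its methods are hypothetical names, not X10's API; `advance()` stands in for the clock advancing, and the one-write-per-phase check is an assumption taken from the relaxation example.

```python
class ClockedFinal:
    """Hypothetical model of an X10 clocked final datum (not X10's API).

    Within a phase the visible value is immutable; a write is buffered and
    becomes visible only when the clock advances.
    """

    def __init__(self, value):
        self._current = value    # value visible in the current phase
        self._pending = None     # value written during this phase, if any
        self._written = False

    def read(self):
        # A read always gets the current phase's value.
        return self._current

    def write(self, value):
        # Assumption: at most one write per phase, as in the relaxation example.
        if self._written:
            raise RuntimeError("clocked final written twice in one phase")
        self._pending, self._written = value, True

    def advance(self):
        # Models the clock advancing: the buffered write becomes visible.
        if self._written:
            self._current = self._pending
        self._pending, self._written = None, False


cell = ClockedFinal(0)
cell.write(1)
print(cell.read())   # 0: the write is not visible in this phase
cell.advance()
print(cell.read())   # 1
```

Buffering the write rather than applying it immediately is what makes every read within a phase see the same value, regardless of the order in which activities run.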
Clocked final example: Array relaxation
    int clocked (c) final [0:M-1,0:N-1] G = …;
    finish foreach (int i,j in [1:M-2,1:N-2]) clocked (c) {
      for (int p in [0:TimeStep-1]) {
        // Read current value of each cell; the write is visible (only) when the clock advances.
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1]) + (1-omega)*G[i,j];
        next;  // wait for the clock to advance
      }
    }
• G elements are assigned to at most once in each phase of clock c.
• Each activity is registered on c.
Takeaway: Each cell is assigned a clocked stream of immutable values.
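A minimal Python sketch of the phase semantics in the relaxation, assuming each time step is one clock phase: every activity reads the values of the current phase, and its write lands in a buffer that becomes visible at `next`. The function `relax` and its parameters are hypothetical. It updates interior cells only and shuffles the activity order within each phase to show that the result does not depend on it.

```python
import random


def relax(G, omega, steps, seed=None):
    """Hypothetical model of the clocked-final relaxation on a 2-D list G.

    Each time step is one clock phase: activity (i, j) reads the values of
    the current phase, and its write becomes visible only when the phase
    ends, so the order in which activities run does not matter.
    """
    M, N = len(G), len(G[0])
    rng = random.Random(seed)
    cur = [row[:] for row in G]
    cells = [(i, j) for i in range(1, M - 1) for j in range(1, N - 1)]
    for _ in range(steps):
        nxt = [row[:] for row in cur]        # writes for the next phase
        rng.shuffle(cells)                   # arbitrary activity order
        for i, j in cells:
            nxt[i][j] = (omega / 4 * (cur[i - 1][j] + cur[i + 1][j]
                                      + cur[i][j - 1] + cur[i][j + 1])
                         + (1 - omega) * cur[i][j])
        cur = nxt                            # "next;": the clock advances
    return cur


grid = [[0.0] * 5 for _ in range(5)]
grid[0] = [1.0] * 5                          # fixed boundary row
print(relax(grid, 1.0, 10, seed=1) == relax(grid, 1.0, 10, seed=2))  # True
```

Double buffering (`cur` and `nxt`) is the sequential analogue of "write visible only when the clock advances".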
Imperative Programming Revisited
Variables: Value in a Box
• Read: fetch current value.
• Write: change value.
• Stability condition: value does not change unless a write is performed.
Very powerful
• Permit repeated many-writer, many-reader communication through arbitrary reference graphs.
Asynchrony introduces indeterminacy:
    int x = 0;
    async x=1;
    print(x);
• May write out either 0 or 1.
• Reader-reader, reader-writer, writer-writer conflicts.
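To make the indeterminacy concrete, the following Python sketch enumerates both schedules of `int x = 0; async x=1; print(x);` against a conventional value-in-a-box variable. The function name `outcomes` is illustrative; each activity is modeled as a single atomic step.

```python
from itertools import permutations


def outcomes():
    """Enumerate schedules of `int x = 0; async x=1; print(x);`.

    With a conventional shared box, print sees whatever value was last
    written before it runs.
    """
    results = set()
    steps = [("write", 1), ("read", None)]   # child's write, parent's print
    for order in permutations(steps):
        x, printed = 0, None
        for op, val in order:
            if op == "write":
                x = val
            else:
                printed = x
        results.add(printed)
    return results


print(sorted(outcomes()))  # [0, 1]
```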
Determinate Concurrent Imperative Frameworks
Asynchronous Kahn networks
• Nodes can be thought of as (continuous) functions over streams.
• Pop/peek, push.
• Node-local state may mutate arbitrarily.
Concurrent Constraint Programming
• Tell constraints.
• Ask if a constraint is true.
• Subsumes Kahn networks (dataflow).
• Subsumes (determinate) concurrent logic programming, lazy functional programming.
These frameworks do not support arbitrary mutable variables.
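A tiny Kahn network can be sketched in Python with one thread per node and blocking FIFO channels (`queue.Queue`). The node functions `source` and `adder` and the driver `run` are hypothetical. Because each node performs blocking reads on fixed input channels, its output stream is a function of its input streams, so the result does not depend on thread scheduling.

```python
import queue
import threading


def source(out, values):
    # Producer node: pushes a fixed stream of values.
    for v in values:
        out.put(v)


def adder(in1, in2, out, n):
    # Kahn node: blocking reads on fixed input channels, so its output
    # stream depends only on its input streams.
    for _ in range(n):
        out.put(in1.get() + in2.get())


def run(n=5):
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=source, args=(a, range(n))),
               threading.Thread(target=source, args=(b, range(10, 10 + n))),
               threading.Thread(target=adder, args=(a, b, c, n))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c.get() for _ in range(n)]


print(run())  # [10, 12, 14, 16, 18]
```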
Determinate Concurrent Imperative Frameworks
Safe Asynchrony (Steele 1991)
• Parent may communicate with children.
• Children may communicate with parent.
• Siblings may communicate with each other only through commutative, associative writes ("commuting writes").
Good:
    int x=0;
    finish foreach (int i in 1:N) { x += i; }
    print(x); // N*(N+1)/2
Bad:
    int x=0;
    finish foreach (int i in 1:N) { x += i; async print(x); }
Useful but limited: does not permit dataflow synchronization.
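The "Good" example can be checked with a short Python sketch, assuming each `x += i` is performed atomically. The function `run_siblings` is a hypothetical model that runs the N sibling activities in a random order. Because `+=` is commutative and associative, every order yields N*(N+1)/2.

```python
import random


def run_siblings(N, seed):
    """Model `finish foreach (int i in 1:N) { x += i; }` under Steele's rule.

    Assumption: each `x += i` is atomic. Siblings only perform a commuting
    update, so any execution order gives the same final x.
    """
    order = list(range(1, N + 1))
    random.Random(seed).shuffle(order)
    x = 0
    for i in order:          # one activity's "x += i"
        x += i
    return x


N = 100
print({run_siblings(N, seed) for seed in range(20)})  # {5050}
```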
The CF Basic Model
The clock of clocked final is made implicit.
• A shared variable is a stream of immutable values.
• Each activity maintains an index i and a clean/dirty bit for every shared variable. Initially i=1, and v[0] contains the initial value.
• Read: if clean, block until v[i] is written and return v[i++]; else return v[i-1]. Mark as clean.
• Write: write into v[i++]. Mark as dirty.
• A read stutters (returns the value in the last phase) if no activity can write in this phase, e.g. for local variables.
World map = collection of indices for an activity. Index transmission rules:
• An activity is initialized with the current world map of its parent activity.
• On finish, the world map of an activity is lubbed with the world maps of the finished activities (clean lub dirty = clean).
All programs are determinate and scheduler independent.
May deadlock … nexts are not conjunctive.
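The read/write and world-map rules above can be sketched in Python with one thread per activity. This is a hypothetical model, not the CF implementation: the classes `Shared` and `Activity` and the methods `spawn` (async) and `finish` are illustrative names. Two details the slides leave open are assumptions here. First, the bit starts dirty, so the first read returns v[0]; this matches the "0 1" output of the first simple example. Second, two writes into the same index must carry the same value.

```python
import threading


class Shared:
    """A shared variable: a stream of immutable values v[0], v[1], ..."""

    def __init__(self, init):
        self.values = {0: init}             # phase index -> value
        self.cond = threading.Condition()


class Activity:
    """An activity with its world map: Shared -> [index i, dirty bit]."""

    def __init__(self, world=None):
        # A new activity starts with a copy of its parent's world map.
        self.world = {var: list(e) for var, e in (world or {}).items()}
        self.children = []                  # (Activity, Thread) pairs

    def _entry(self, var):
        # Assumption: initially i = 1 and the bit is dirty.
        return self.world.setdefault(var, [1, True])

    def read(self, var):
        e = self._entry(var)
        with var.cond:
            if e[1]:                        # dirty: return v[i-1]
                value = var.values[e[0] - 1]
            else:                           # clean: block until v[i] exists
                var.cond.wait_for(lambda: e[0] in var.values)
                value = var.values[e[0]]
                e[0] += 1
        e[1] = False                        # mark clean
        return value

    def write(self, var, value):
        e = self._entry(var)
        with var.cond:
            # Assumption: writers of the same index must agree.
            if var.values.setdefault(e[0], value) != value:
                raise RuntimeError("conflicting writes in one phase")
            var.cond.notify_all()
        e[0] += 1
        e[1] = True                         # mark dirty

    def spawn(self, body):                  # async
        child = Activity(self.world)
        t = threading.Thread(target=body, args=(child,))
        self.children.append((child, t))
        t.start()

    def finish(self, body):                 # wait for descendants, then lub
        body(self)
        pending, self.children = self.children, []
        while pending:
            child, t = pending.pop()
            t.join()
            pending.extend(child.children)  # descendants too
            for var, (i, dirty) in child.world.items():
                mine = self._entry(var)
                if i > mine[0]:
                    mine[0], mine[1] = i, dirty
                elif i == mine[0]:
                    mine[1] = mine[1] and dirty   # clean lub dirty = clean


def first_example():
    x, main, out = Shared(0), Activity(), []

    def reader(a):
        r1 = a.read(x)
        r2 = a.read(x)
        out.extend([r1, r2])

    def writer(a):
        a.write(x, 1)
        a.write(x, 2)

    main.finish(lambda m: (m.spawn(reader), m.spawn(writer)))
    return out


print(first_example())  # [0, 1] on every run
```

The reader's second read blocks until the writer has produced v[1], so the output is the same however the threads are scheduled.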
CF example: Array relaxation
    shared int [0:M-1,0:N-1] G = …;
    finish foreach (int i,j in [1:M-2,1:N-2]) {
      for (int p in [0:TimeStep-1]) {
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1]) + (1-omega)*G[i,j];
      }
    }
All clock manipulations are implicit.
Some simple examples
    shared int x=0;
    finish {
      async { int r1 = x; int r2 = x; println(r1); println(r2); }
      async { x=1; x=2; }
    }
Output: 0 1
Only one result, independent of the scheduler!
Some simple examples
    shared int x=0;
    finish {
      async { int r1 = x; int r2 = x; println(r1); println(r2); }
      async { x=1; }
      async { x=1; int r3 = x; async { x=2; } }
    }
    println(x);
Output: 0 1 2
All programs are determinate.
Some StreamIt examples
StreamIt:
    void->void pipeline Minimal {
      add IntSource;
      add IntPrinter;
    }
    void->int filter IntSource {
      int x;
      init { x = 0; }
      work push 1 { push(x++); }
    }
    int->void filter IntPrinter {
      work pop 1 { print(pop()); }
    }
X10/CF:
    shared int x=0;
    async while (true) x++;
    async while (true) println(x);
Output (both programs): 0 1 …
The communication is through assignment to x, so the same result (0 1 …) is obtained with:
    shared int x=0;
    async while (true) ++x;
    async while (true) println(x);
Each shared variable is a multi-reader, multi-writer stream.
Some StreamIt examples: fibonacci
    shared int x=1, y=1;
    async while (true) y=x;    // Activity 1
    async while (true) x+=y;   // Activity 2
Can express any recursive, asynchronous Kahn network.
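Working the CF read/write rules through the two loops by hand suggests that each iteration of Activity 1 reads x[k] and writes y[k+1], while each iteration of Activity 2 reads x[k] and y[k] and writes x[k+1]. The Python sketch below computes the streams from those stream equations. The derivation is our assumption and `fib_streams` is a hypothetical name, but it shows why the network produces the Fibonacci sequence independently of scheduling.

```python
def fib_streams(n):
    """First n values of the x and y streams in the fibonacci example.

    Assumed stream equations (derived by hand from the CF rules):
      y[k+1] = x[k]          (Activity 1: y = x)
      x[k+1] = x[k] + y[k]   (Activity 2: x += y)
    """
    x, y = [1], [1]                  # v[0] of each shared variable
    for k in range(n - 1):
        y.append(x[k])
        x.append(x[k] + y[k])
    return x, y


xs, ys = fib_streams(8)
print(xs)  # [1, 2, 3, 5, 8, 13, 21, 34]
print(ys)  # [1, 1, 2, 3, 5, 8, 13, 21]
```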
StreamIt examples: Moving Average
StreamIt:
    void->void pipeline MovingAverage {
      add IntSource();
      add Averager(10);
      add IntPrinter();
    }
    int->int filter Averager(int n) {
      work pop 1 push 1 peek n {
        int sum=0;
        for (int i=0; i < n; i++) sum += peek(i);
        push(sum/n);
        pop();
      }
    }
X10/CF:
    shared int y=0;
    shared int x=0;
    async while (true) x++;
    async while (true) {
      int sum=x;
      for (int i in 1:N-1) sum += peek(x, i);
      y = sum/N;
    }
• peek(x, i) reads the i'th future value without popping it. Blocks if necessary.
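The pop/peek discipline of the `Averager` filter can be sketched in Python with a lookahead buffer over an iterator. `PeekStream` and `moving_average` are hypothetical names. Here "blocking" in `peek` is modeled by pulling further values from the source until the requested position exists.

```python
from collections import deque
from itertools import count


class PeekStream:
    """Stream reader with StreamIt-style pop/peek (hypothetical sketch).

    peek(i) returns the i'th value ahead without consuming it, pulling from
    the source as needed; pop() consumes the head value.
    """

    def __init__(self, source):
        self._src = iter(source)
        self._buf = deque()

    def peek(self, i):
        while len(self._buf) <= i:          # "block" until enough values exist
            self._buf.append(next(self._src))
        return self._buf[i]

    def pop(self):
        value = self.peek(0)
        self._buf.popleft()
        return value


def moving_average(source, n, k):
    """First k outputs of an Averager(n)-style filter: push(sum/n); pop()."""
    s = PeekStream(source)
    out = []
    for _ in range(k):
        total = sum(s.peek(i) for i in range(n))
        out.append(total // n)   # integer division (same as Java for sums >= 0)
        s.pop()
    return out


print(moving_average(count(0), 3, 4))  # [1, 2, 3, 4]
```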
StreamIt examples: Bandpass filter
StreamIt:
    float->float pipeline BandPassFilter(float rate, float low, float high, int taps) {
      add BPFCore(rate, low, high, taps);
      add Subtracter();
    }
    float->float splitjoin BPFCore(float rate, float low, float high, int taps) {
      split duplicate;
      add LowPass(rate, low, taps, 0);
      add LowPass(rate, high, taps, 0);
      join roundrobin;
    }
    float->float filter Subtracter {
      work pop 2 push 1 {
        push(peek(1)-peek(0));
        pop(); pop();
      }
    }
X10/CF:
    float bandPassFilter(float rate, float low, float high, int taps, int in) {
      int tmp=in;
      shared int in1=tmp, in2=tmp;
      async while (true) in1=in;
      async while (true) in2=in;
      shared int o1 = lowPass(rate, low, taps, 0, in1),
                 o2 = lowPass(rate, high, taps, 0, in2);
      shared int o = o1-o2;
      async while (true) o = o1-o2;
      return o;
    }
Functions return streams.
Cannon matrix multiplication
    <final int N> void canon(double[N,N] c, double[N,N] a, double[N,N] b) {  // parameters whose values are finalized
      finish foreach (int i,j in [0:N-1,0:N-1]) {
        a[i,j] = a[i,(i+j)%N];
        b[i,j] = b[(i+j)%N, j];
      }
      for (int k in [0:N-1])
        finish foreach (int i,j in [0:N-1,0:N-1]) {
          c[i,j] = c[i,j] + a[i,j]*b[i,j];
          a[i,j] = a[i,(j+1)%N];
          b[i,j] = b[(i+1)%N, j];
        }
    }
• Local variables in each activity.
• The natural sequential program works (for finish foreach).
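Under CF, each `finish foreach` behaves like one phase in which every activity reads the previous values of `a` and `b`. The Python sketch below models that by building each new matrix from the old one, and checks the standard Cannon skew-and-shift scheme against a naive product. The functions `cannon` and `matmul` are hypothetical names.

```python
import random


def cannon(a, b):
    """Cannon's algorithm with phase (double-buffered) semantics.

    Each `finish foreach` is modeled as one phase: every (i, j) reads the
    previous phase's a and b, so the in-place-looking updates are safe.
    """
    n = len(a)
    # Initial skew: row i of a shifted left by i, column j of b shifted up by j.
    a = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]  # shift left
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]  # shift up
    return c


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
n = 4
A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
print(cannon(A, B) == matmul(A, B))  # True
```

After t shifts, cell (i, j) holds A[i][k] and B[k][j] with k = (i+j+t) mod n, so the n phases cover every k exactly once.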
Histogram
    <int N> [1:N][] histogram([1:N][] A) {
      final int[] B = new int [1:N];
      finish foreach (int i in A) B[A[i]]++;
      return B;
    }
• Permit "commuting" writes to be performed simultaneously in the same phase.
• A phase is completed when all activities that can write have written.
• B's phase is not yet complete. A subsequent read will complete it.
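The commuting-write rule for the histogram can be modeled in Python. `CommutingCell` and `histogram` are hypothetical names. Increments are accumulated while the phase is open, and a read closes the phase and returns the combined value. Shuffling the activity order shows the counts do not depend on it.

```python
import random


class CommutingCell:
    """A cell accepting commuting (+=) writes within one phase (hypothetical).

    Increments accumulate while the phase is open; a read completes the
    phase and returns the combined value.
    """

    def __init__(self, value=0):
        self.value = value
        self.pending = 0

    def add(self, delta):
        self.pending += delta       # order of adds does not matter

    def read(self):
        self.value += self.pending  # the read completes the phase
        self.pending = 0
        return self.value


def histogram(A, N, seed):
    B = {k: CommutingCell() for k in range(1, N + 1)}
    order = list(range(len(A)))
    random.Random(seed).shuffle(order)   # arbitrary activity order
    for i in order:
        B[A[i]].add(1)                   # B[A[i]]++
    return {k: cell.read() for k, cell in B.items()}


A = [1, 3, 2, 3, 3, 1]
print(histogram(A, 3, seed=0))  # {1: 2, 2: 1, 3: 3}
```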
Cilk programs with races
    int x;
    cilk void foo() { x = x + 1; }
    cilk int main() {
      x = 0;
      spawn foo();
      spawn foo();
      sync;
      printf("x is %d\n", x);
      return 0;
    }
Determinate: will always print 1 in CF.
CF smoothly combines Cilk and StreamIt.
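The contrast between conventional interleaving and the CF reading of this program can be sketched in Python. Both functions are hand-coded models, not Cilk or CF implementations. `conventional_outcomes` enumerates every interleaving of the two children's read and write steps. `cf_outcome` assumes each child starts from the parent's world map, so both read v[0] = 0 and both write 1 into v[1].

```python
from itertools import permutations


def conventional_outcomes():
    """All results of two unsynchronized `x = x + 1` under interleaving."""
    steps = [(0, "r"), (0, "w"), (1, "r"), (1, "w")]
    results = set()
    for order in permutations(steps):
        # Keep only orders that respect each child's program order.
        if order.index((0, "r")) > order.index((0, "w")):
            continue
        if order.index((1, "r")) > order.index((1, "w")):
            continue
        x, local = 0, {}
        for child, op in order:
            if op == "r":
                local[child] = x
            else:
                x = local[child] + 1
        results.add(x)
    return results


def cf_outcome():
    """CF model (assumption): both children inherit index 1 (dirty) from
    the parent, read v[0] = 0, and write 1 into v[1]."""
    v = {0: 0}
    for _child in range(2):
        seen = v[0]                              # dirty read returns v[0]
        if v.setdefault(1, seen + 1) != seen + 1:
            raise RuntimeError("conflicting writes")
    return v[1]                                  # parent reads after sync


print(sorted(conventional_outcomes()))  # [1, 2]
print(cf_outcome())                     # 1
```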
Implementation
• Each activity's world map increases monotonically with time.
• Use garbage collection to erase past, unreachable values.
• Programs with no sibling communication may be executed in buffers with unit windows.
• Considering permitting the user to specify bounds on variables (cf. push/pop specifications in StreamIt). This will force writes to become blocking as well.
• Scheduling strategy affects the size of buffers, not the result.
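Because world maps only grow, a value whose index lies below every live reader's reach can never be read again. The Python sketch below shows one hypothetical way to compact a single variable's buffer. `StreamBuffer` and `collect` are illustrative names, not part of the CF implementation. It keeps v[i-1] for the smallest live index i, since a dirty read may still return it.

```python
class StreamBuffer:
    """Hypothetical buffer for one shared variable's stream of values.

    Each activity's index only grows, so values below (smallest live
    index - 1) can never be read again and may be discarded.
    """

    def __init__(self, init):
        self.base = 0            # stream index of values[0]
        self.values = [init]     # v[base], v[base+1], ...

    def append(self, value):
        self.values.append(value)

    def get(self, index):
        if index < self.base:
            raise IndexError("value already collected")
        return self.values[index - self.base]

    def collect(self, live_indices):
        # A reader at index i may still return v[i-1] (dirty/stuttering read).
        keep_from = min(live_indices) - 1
        drop = keep_from - self.base
        if drop > 0:
            del self.values[:drop]
            self.base = keep_from


buf = StreamBuffer(0)
for v in range(1, 6):
    buf.append(v)        # stream is now 0, 1, 2, 3, 4, 5
buf.collect([4, 6])      # smallest live index is 4, so v[3] onward is kept
print(buf.values)        # [3, 4, 5]
```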
Formalization
• MJ/CF: very straightforward additions to field read/write.
• Surprisingly localized.
• Paper contains details.
Future work
• Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
• Programmability is being investigated.
• Implementation.
• Leverage connection with StreamIt, and static scheduling.
• Coarser granularity for indices.
• Use same clock for many variables.
• Permits "coordinated" changes to multiple variables.