1 / 32

GCD: A simple example to introduce Bluespec Arvind Computer Science & Artificial Intelligence Lab

GCD: A simple example to introduce Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 15, 2008. module. interface. Bluespec: State and Rules organized into modules. All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.

Download Presentation

GCD: A simple example to introduce Bluespec Arvind Computer Science & Artificial Intelligence Lab

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GCD: A simple example to introduce Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 15, 2008 http://csg.csail.mit.edu/6.375

  2. module interface Bluespec: State and Rules organized into modules All state (e.g., Registers, FIFOs, RAMs, ...) is explicit. Behavior is expressed in terms of atomic actions on the state: Rule: guard  action Rules can manipulate state in other modules only via their interfaces. http://csg.csail.mit.edu/6.375

  3. Programming withrules: A simple example Euclid’s algorithm for computing the Greatest Common Divisor (GCD): 15 6 9 6 subtract 3 6 subtract 6 3 swap 3 3 subtract 0 3subtract answer: http://csg.csail.mit.edu/6.375

  4. y x swap sub State Internal behavior If (a==0) then 0 else b External interface GCD in BSV module mkGCD (I_GCD); Reg#(int) x <- mkRegU; Reg#(int) y <- mkReg(0); rule swap ((x > y) && (y != 0)); x <= y; y <= x; endrule rule subtract ((x <= y) && (y != 0)); y <= y – x; endrule method Action start(int a, int b) if (y==0); x <= a; y <= b; endmethod method int result() if (y==0); return x; endmethod endmodule typedef int Int#(32) Assume a/=0 http://csg.csail.mit.edu/6.375

  5. t t t t t t int int start enab y == 0 rdy GCD module #(type t) int y == 0 rdy result GCD Hardware Module In a GCD call t could be Int#(32), UInt#(16), Int#(13), ... • The module can easily be made polymorphic implicit conditions interface I_GCD; method Action start (int a, int b); method int result(); endinterface • Many different implementations can provide the same interface: module mkGCD (I_GCD) http://csg.csail.mit.edu/6.375

  6. Combine swap and subtract rule GCD: Another implementation module mkGCD (I_GCD); Reg#(int) x <- mkRegU; Reg#(int) y <- mkReg(0); rule swapANDsub ((x > y) && (y != 0)); x <= y; y <= x - y; endrule rule subtract ((x<=y) && (y!=0)); y <= y – x; endrule method Action start(int a, int b) if (y==0); x <= a; y <= b; endmethod method int result() if (y==0); return x; endmethod endmodule Does it compute faster ? http://csg.csail.mit.edu/6.375

  7. Blueview C CycleAccurate Bluesim RTL synthesis gates Legend files Bluespec tools 3rd party tools Bluespec Tool flow Bluespec SystemVerilog source Bluespec Compiler Verilog 95 RTL Verilog sim VCD output Debussy Visualization Place & Route Physical Tapeout http://csg.csail.mit.edu/6.375

  8. Generated Verilog RTL: GCD module mkGCD(CLK,RST_N,start_a,start_b,EN_start,RDY_start, result,RDY_result); input CLK; input RST_N; // action method start input [31 : 0] start_a; input [31 : 0] start_b; input EN_start; output RDY_start; // value method result output [31 : 0] result; output RDY_result; // register x and y reg [31 : 0] x; wire [31 : 0] x$D_IN; wire x$EN; reg [31 : 0] y; wire [31 : 0] y$D_IN; wire y$EN; ... // rule RL_subtract assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ; // rule RL_swap assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ; ... http://csg.csail.mit.edu/6.375

  9. x y en rdy start y_en x_en y x next state values sub predicates x rdy result !(=0) > swap? subtract? Generated Hardware swap? x_en = y_en = swap? OR subtract? http://csg.csail.mit.edu/6.375

  10. x y en rdy start start_en y_en start_en x_en x y x rdy result !(=0) > swap? subtract? Generated Hardware Module sub x_en = swap? y_en = swap? OR subtract? OR start_en OR start_en rdy = (y==0) http://csg.csail.mit.edu/6.375

  11. GCD: A Simple Test Bench module mkTest (); Reg#(int) state <- mkReg(0); I_GCD gcd <- mkGCD(); rule go (state == 0); gcd.start (423, 142); state <= 1; endrule rule finish (state == 1); $display (“GCD of 423 & 142 =%d”,gcd.result()); state <= 2; endrule endmodule Why do we need the state variable? http://csg.csail.mit.edu/6.375

  12. GCD: Test Bench Feeds all pairs (c1,c2) 1 < c1 < 7 1 < c2 < 63 to GCD module mkTest (); Reg#(int) state <- mkReg(0); Reg#(Int#(4)) c1 <- mkReg(1); Reg#(Int#(7)) c2 <- mkReg(1); I_GCD gcd <- mkGCD(); rule req (state==0); gcd.start(signExtend(c1), signExtend(c2)); state <= 1; endrule rule resp (state==1); $display (“GCD of %d & %d =%d”, c1, c2, gcd.result()); if (c1==7) begin c1 <= 1; c2 <= c2+1; end else c1 <= c1+1; if (c1==7 && c2==63) state <= 2 else state <= 0; endrule endmodule http://csg.csail.mit.edu/6.375

  13. GCD: Synthesis results • Original (16 bits) • Clock Period: 1.6 ns • Area: 4240 mm2 • Unrolled (16 bits) • Clock Period: 1.65ns • Area: 5944 mm2 • Unrolled takes 31% fewer cycles on the testbench http://csg.csail.mit.edu/6.375

  14. Rule scheduling and the synthesis of a scheduler http://csg.csail.mit.edu/6.375

  15. Highly non-deterministic GAA Execution model Repeatedly: • Select a rule to execute • Compute the state updates • Make the state updates User annotations can help in rule selection Implementation concern: Schedule multiple rules concurrently without violating one-rule-at-a-time semantics http://csg.csail.mit.edu/6.375

  16. Rule: As a State Transformer A rule may be decomposed into two parts p(s) and d(s) such that snext = if p(s) then d(s) elses p(s)is the condition (predicate) of the rule, a.k.a. the “CAN_FIRE” signal of the rule. (conjunction of explicit and implicit conditions) d(s) is the “state transformation” function, i.e., computes the next-state value in terms of the current state values. http://csg.csail.mit.edu/6.375

  17. Compiling a Rule rule r (f.first() > 0) ; x <= x + 1 ; f.deq (); endrule enable p f f x x d current state next state values rdy signals read methods enable signals action parameters p = enabling condition d = action signals & values http://csg.csail.mit.edu/6.375

  18. R d1,R dn,R Combining State Updates: strawman p1 p’s from the rules that update R OR pn latch enable OR d’s from the rules that update R next state value What if more than one rule is enabled? http://csg.csail.mit.edu/6.375

  19. R d1,R dn,R Combining State Updates f1 Scheduler: Priority Encoder p1 OR p’s from all the rules pn fn latch enable OR d’s from the rules that update R next state value Scheduler ensures that at most one fi is true http://csg.csail.mit.edu/6.375

  20. One-rule-at-a-time Scheduler p1 Scheduler: Priority Encoder f1 p2 f2 pn fn 1. fi  pi 2. p1  p2  ....  pn  f1  f2  ....  fn 3. One rewrite at a time i.e. at most one fi is true Very conservative way of guaranteeing correctness http://csg.csail.mit.edu/6.375

  21. Executing Multiple Rules Per Cycle:Conflict-free rules rule ra (z > 10); x <= x + 1; endrule rule rb (z > 20); y <= y + 2; endrule Parallel execution behaves like ra < rb = rb < ra Rulea and Ruleb are conflict-free if s . pa(s)  pb(s)  1. pa(db(s)) pb(da(s)) 2. da(db(s)) ==db(da(s)) Parallel Execution can also be understood in terms of a composite rule rule ra_rb((z>10)&&(z>20)); x <= x+1; y <= y+2; endrule http://csg.csail.mit.edu/6.375

  22. Executing Multiple Rules Per Cycle:Sequentially Composable rules rule ra (z > 10); x <= y + 1; endrule rule rb (z > 20); y <= y + 2; endrule Rulea and Ruleb are sequentially composable if s . pa(s)  pb(s) pb(da(s)) Parallel execution behaves like ra < rb Parallel Execution can also be understood in terms of a composite rule rule ra_rb((z>10)&&(z>20)); x <= y+1; y <= y+2; endrule http://csg.csail.mit.edu/6.375

  23. p1 f1 Scheduler p2 f2 Scheduler Scheduler pn fn Multiple-Rules-per-Cycle Scheduler Divide the rules into smallest conflicting groups; provide a scheduler for each group 1. fi  pi 2. p1  p2  ....  pn  f1  f2  ....  fn 3.Multiple operations such that fi  fj  Ri and Rj are conflict-free or sequentially composable http://csg.csail.mit.edu/6.375

  24. Conflict Free/Mutually Exclusive) d1 p1 d2 d2 p2 p2 Sequentially Composable d1 p1 and ~p2 and and or or and and Muxing structure • Muxing logic requires determining for each register (action method) the rules that update it and under what conditions CF rules either do not update the same element or are ME p1  ~p2 http://csg.csail.mit.edu/6.375

  25. p1 pn d1 dn Scheduling and control logic Modules (Current state) Modules (Next state) “CAN_FIRE” “WILL_FIRE” Rules p1 f1 Scheduler fn pn d1 Muxing cond action dn http://csg.csail.mit.edu/6.375

  26. Extra’s http://csg.csail.mit.edu/6.375

  27. rule ra_rb(z>10 && z>20); x <= 2; endrule Behavior ra < rb rule rb_ra(z>10 && z>20); x <= 1; endrule Behavior rb < ra Sequentially Composable rules ... rule ra (z > 10); x <= 1; endrule rule rb (z > 20); x <= 2; endrule Parallel execution can behave either like ra < rb or rb < ra but the two behaviors are not the same Composite rules http://csg.csail.mit.edu/6.375

  28. Mutually Exclusive Rules • Rulea and Ruleb are mutually exclusive if they can never be enabled simultaneously s . pa(s)  ~ pb(s) Mutually-exclusive rules are Conflict-free even if they write the same state Mutual-exclusive analysis brings down the cost of conflict-free analysis http://csg.csail.mit.edu/6.375

  29. Compiler determines if two rules can be executed in parallel Rulea and Ruleb are conflict-free if s . pa(s)  pb(s)  1. pa(db(s)) pb(da(s)) 2. da(db(s)) ==db(da(s)) Rulea and Ruleb are sequentially composable if s . pa(s)  pb(s) pb(da(s)) D(Ra)  R(Rb) =  D(Rb)  R(Ra) =  R(Ra)  R(Rb) =  D(pb)  R(Ra) =  These properties can be determined by examining the domains and ranges of the rules in a pairwise manner. These conditions are sufficient but not necessary. Parallel execution of CF and SC rules does not increase the critical path delay http://csg.csail.mit.edu/6.375

  30. Homework problem Binary Multiplication http://csg.csail.mit.edu/6.375

  31. // d = 4’d9 // r = 4’d5 // d << 0 (since r[0] == 1) // 0 << 1 (since r[1] == 0) // d << 2 (since r[2] == 1) // 0 << 3 (since r[3] == 0) // product (sum of above) = 45 1001 0101 1001 0000 1001 0000 0101101 x product r d One step of multiplication Exercise: Binary Multiplier Simple binary multiplication: What does it look like in Bluespec? http://csg.csail.mit.edu/6.375

  32. Multiplier in Bluespec module mkMult (I_mult); Reg#(Int#(32)) product <- mkReg(0); Reg#(Int#(32)) d <- mkReg(0); Reg#(Int#(16)) r <- mkReg(0); rule cycle endrule methodAction start endmethod method Int#(32) result () endmethod endmodule rule cycle (r != 0); if (r[0] == 1) product <= product + d; d <= d << 1; r <= r >> 1; endrule methodAction start (Int#(16)x,Int#(16)y) if (r == 0); d <= signExtend(x); r <= y; endmethod method Int#(32) result () if (r == 0); return product; endmethod What is the interface I_mult ? http://csg.csail.mit.edu/6.375

More Related