
Combining Technology Mapping and Retiming




  1. Combining Technology Mapping and Retiming
     EECS 290A: Sequential Logic Synthesis and Verification

  2. Outline
     • Motivation
     • Technology mapping for combinational circuits
     • Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
     • Technology mapping for sequential circuits
     • Computation of cuts
     • Search for the optimum-delay solution
     • Computation of optimum l-values
     • Constructing the solution
     • Retiming for optimum delay

  3. Traditional Tech Mapping Approach
     • Cut the sequential circuit at the latch boundary
     • Optimize and map the combinational part
     • Pros: preserves the latch encoding
     • Cons: potentially suboptimal
     • (Optional) Retime the mapped circuit
     [Figure: combinational Logic with inputs PI and LO (latch outputs) and outputs PO and LI (latch inputs), closed through the Latches]

  4. Motivating Example: LUT Size = 3
     [Figure: a circuit with inputs i1, i2, nodes a, b, c and output f; mapping the circuit as is needs 2 LUTs, while retiming it first and then mapping needs 1 LUT]

  5. Basic Mapping: Overview
     • Pre-compute truth tables of gates (supergates)
     • Represent the netlist as an AND-INV graph (AIG)
     • For each node, compute cuts
     • Map the network for delay
     • Recover area using heuristics
     • Select the final mapping

  6. What is Mapping?
     • Mapping expresses the functions of a network using gates from a library
     [Figure: network with inputs x1, x2, x3, x4, x5 and outputs z1, z2, z3]

  7. Basic Mapping: AND-INV Graphs
     • F(a,b,c,d) = ab + d(ac' + bc): 6 nodes, 4 levels
     • F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d): 7 nodes, 3 levels
     [Figure: the AIGs of both forms over inputs a, b, c, d]
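To check the node and level counts above, here is a minimal Python sketch. It assumes a toy tuple encoding of AIG signals; the helpers `inp`, `NOT`, `AND`, `OR`, `and_nodes`, `level` and `evaluate` are hypothetical, not from any library. Only AND nodes are counted (inverters are free), and shared nodes are identified by structural equality.

```python
from itertools import product

# Hypothetical toy AIG encoding: ("in", name), ("not", s), ("and", s0, s1).
def inp(name):
    return ("in", name)

def NOT(s):
    # Remove double inversions so NOT(NOT(x)) == x.
    return s[1] if s[0] == "not" else ("not", s)

def AND(x, y):
    return ("and", x, y)

def OR(x, y):
    # De Morgan: x + y = (x'y')'
    return NOT(AND(NOT(x), NOT(y)))

def and_nodes(s, seen=None):
    """Collect the distinct AND nodes reachable from s (equal tuples count once)."""
    if seen is None:
        seen = set()
    if s[0] == "not":
        and_nodes(s[1], seen)
    elif s[0] == "and" and s not in seen:
        seen.add(s)
        and_nodes(s[1], seen)
        and_nodes(s[2], seen)
    return seen

def level(s):
    """Logic level: inputs are at level 0, inverters add no level."""
    if s[0] == "in":
        return 0
    if s[0] == "not":
        return level(s[1])
    return 1 + max(level(s[1]), level(s[2]))

def evaluate(s, env):
    if s[0] == "in":
        return env[s[1]]
    if s[0] == "not":
        return not evaluate(s[1], env)
    return evaluate(s[1], env) and evaluate(s[2], env)

a, b, c, d = (inp(x) for x in "abcd")
f1 = OR(AND(a, b), AND(d, OR(AND(a, NOT(c)), AND(b, c))))         # ab + d(ac' + bc)
f2 = OR(AND(AND(a, NOT(c)), OR(b, d)), AND(AND(b, c), OR(a, d)))  # ac'(b+d) + bc(a+d)

# Exhaustive check over all 16 input assignments.
equivalent = all(
    evaluate(f1, dict(zip("abcd", bits))) == evaluate(f2, dict(zip("abcd", bits)))
    for bits in product([False, True], repeat=4)
)
print(len(and_nodes(f1)), level(f1))  # 6 4
print(len(and_nodes(f2)), level(f2))  # 7 3
print(equivalent)                     # True
```

The trade-off is visible directly: the second form spends one extra node to remove one level.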

  8. Basic Mapping: Computing AIG
     • Technology-independent synthesis
     • Any synthesis flow can be used
     • Constructing the AIG from factored forms
     • SOPs are factored using algebraic factoring
     • Balancing the AIG
     • Reduces delay
     [Figure: network with inputs x1, x2, x3, x4, x5, outputs z1, z2, z3, and internal node n with Fn = x2x3' x4]
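Balancing can be pictured on a single multi-input AND: collect its leaves and rebuild the tree so that the earliest-arriving signals are combined first. The Python sketch below is a hypothetical helper (`balance_and`) that greedily pairs the two lowest-level signals, assuming each AND node adds one level; it is an illustration, not the balancing code of any particular tool.

```python
import heapq

def balance_and(leaves):
    """Rebuild a multi-input AND from (name, level) leaves, pairing the two
    lowest-level signals first. Each new AND node adds one level."""
    heap = [(lvl, i, name) for i, (name, lvl) in enumerate(leaves)]
    heapq.heapify(heap)
    next_id = len(heap)  # unique tie-breaker so names are never compared
    while len(heap) > 1:
        l1, _, n1 = heapq.heappop(heap)
        l2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (max(l1, l2) + 1, next_id, f"({n1} & {n2})"))
        next_id += 1
    lvl, _, expr = heap[0]
    return expr, lvl

# A chain ((x1 & x2) & x3) & x4 has depth 3; balanced, it has depth 2.
print(balance_and([("x1", 0), ("x2", 0), ("x3", 0), ("x4", 0)]))
# A late-arriving x1 (level 2) is combined last.
print(balance_and([("x1", 2), ("x2", 0), ("x3", 0), ("x4", 0)]))
```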

  9. Basic Mapping: Cuts
     • Definition. A cut C of a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C
     • The node itself forms the elementary (trivial) cut
     • k-feasible cuts are cuts containing at most k nodes
     • On benchmarks, a node has about 20 5-feasible cuts on average
     [Figure: node n with a cut over inputs x1, x2, x3]

  10. Basic Mapping: Computing Cuts
     • All k-feasible cuts are computed in one pass over the AIG
     • Assign elementary cuts to the primary inputs
     • For each internal node
     • merge the cut sets of its children, removing duplicate cuts and cuts with more than k nodes
     • add the elementary cut composed of the node itself
     Example: compute all 2-feasible cuts of node n (⊗ takes the pairwise union of one cut from each set).
     Cuts of node p = {{p}, {s,x2}, {x1,x2}}
     Cuts of node q = {{q}, {x2,t}, {x2,x3}}
     Cuts of node n = {{p}, {s,x2}, {x1,x2}} ⊗ {{q}, {x2,t}, {x2,x3}} ∪ {{n}} = {{n}, {p,q}, {p,x2,t}, {p,x2,x3}, …}
     2-feasible cuts of node n = {{n}, {p,q}}
     [Figure: AIG with inputs x1, x2, x3, internal nodes s, t, p, q and root n]
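The one-pass enumeration can be sketched in Python as follows. The network is an assumption reconstructed from the listed cut sets (s = x1·x2, p = s·x2, t = x2·x3, q = x2·t, n = p·q), and the function name `enumerate_cuts` is hypothetical. Cuts are frozensets, so duplicate cuts disappear automatically.

```python
def enumerate_cuts(fanins, order, k):
    """fanins: AND node -> (fanin0, fanin1); nodes not in fanins are primary inputs.
    order: all nodes in topological order. Returns node -> set of k-feasible cuts."""
    cuts = {}
    for node in order:
        if node not in fanins:
            cuts[node] = {frozenset([node])}  # elementary cut of a primary input
            continue
        f0, f1 = fanins[node]
        # Merge one cut of each child; keep only unions with at most k nodes.
        merged = {c0 | c1 for c0 in cuts[f0] for c1 in cuts[f1] if len(c0 | c1) <= k}
        merged.add(frozenset([node]))  # the node's own elementary cut
        cuts[node] = merged
    return cuts

# Hypothetical network consistent with the example's cut sets.
fanins = {"s": ("x1", "x2"), "p": ("s", "x2"), "t": ("x2", "x3"),
          "q": ("x2", "t"), "n": ("p", "q")}
order = ["x1", "x2", "x3", "s", "p", "t", "q", "n"]
cuts = enumerate_cuts(fanins, order, k=2)
print(sorted(sorted(c) for c in cuts["p"]))  # [['p'], ['s', 'x2'], ['x1', 'x2']]
print(sorted(sorted(c) for c in cuts["n"]))  # [['n'], ['p', 'q']]
```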

  11. Basic Mapping: Truth Tables
     • A truth table is a bit-string representing the Boolean function of a cut
     • Truth tables are computed for all cuts of all nodes
     • For each cut, assign elementary truth tables (variables) to the cut leaves
     • Compute the truth tables of the internal nodes in topological order
     Example (MSB on the left, LSB on the right):
     x1 = 10101010
     x2 = 11001100
     x3 = 11110000
     t = x2 & x3 = 11000000
     q = x1 & t = 10000000
     [Figure: node q with internal node t over cut leaves x1, x2, x3]
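This bit-string convention maps naturally onto integer bit masks: bit m of a truth table holds the function value at minterm m, and leaf i is 1 exactly at the minterms whose bit i is set. A minimal Python sketch (the helper name `elementary_tables` is hypothetical) that reproduces the tables above:

```python
def elementary_tables(n):
    """Truth tables of n cut leaves as 2**n-bit integers (bit m = value at minterm m)."""
    tables = []
    for i in range(n):
        tt = 0
        for m in range(1 << n):
            if (m >> i) & 1:      # leaf i is 1 in minterm m
                tt |= 1 << m
        tables.append(tt)
    return tables

x1, x2, x3 = elementary_tables(3)
t = x2 & x3          # internal nodes are evaluated in topological order
q = x1 & t
for name, tt in [("x1", x1), ("x2", x2), ("x3", x3), ("t", t), ("q", q)]:
    print(name, format(tt, "08b"))   # MSB first, as in the example
```

Because AND of two functions is a single bitwise `&` on their tables, computing all cut functions stays cheap for small k.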

  12. Basic Mapping: Delay Optimality
     • Assign the arrival times of the primary inputs
     • For each node, in topological order
     • Compare the truth table of each cut with the truth tables of the gates (when they are equal, we have a match)
     • Compute the arrival time of each cut, in both phases
     • Select the best cut for each phase
     • When arrival times are equal, use area as a tie-breaker
     Example: cuts c1, c2, c3, c4 with T(c2) < T(c3) < T(c1) < T(c4), so c2 is the best cut
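A sketch of the per-node selection step in Python. It assumes each cut comes with its truth table over its leaves in a fixed order, that a match is exact truth-table equality with no pin permutations, and it handles only one phase. The `Gate` records, their delays and areas, and the function `best_match` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    name: str
    num_inputs: int
    truth_table: int   # bit m = output at minterm m of the gate inputs
    delay: float
    area: float

def best_match(cuts, library, arrival):
    """cuts: list of (leaves, truth_table). Returns (arrival, area, gate name, leaves)
    of the fastest match, using area to break ties; None if nothing matches."""
    best = None
    for leaves, tt in cuts:
        for g in library:
            if g.num_inputs == len(leaves) and g.truth_table == tt:  # a match
                t = max(arrival[x] for x in leaves) + g.delay
                if best is None or (t, g.area) < (best[0], best[1]):
                    best = (t, g.area, g.name, leaves)
    return best

# Hypothetical library and arrival times.
library = [Gate("AND2", 2, 0b1000, 1.0, 2.0), Gate("AND3", 3, 0b10000000, 1.5, 3.0)]
arrival = {"x1": 0.0, "x2": 0.0, "x3": 0.0, "t": 1.0}   # t = x2 & x3 arrives at 1.0
cuts_of_q = [(("x1", "t"), 0b1000), (("x1", "x2", "x3"), 0b10000000)]
print(best_match(cuts_of_q, library, arrival))  # (1.5, 3.0, 'AND3', ('x1', 'x2', 'x3'))
```

Here the larger cut wins: one AND3 over the primary inputs is faster than an AND2 that waits for t.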

  13. Basic Mapping: Area Recovery
     • Performs three passes
     • Minimize area flow
     • Minimize exact area for the best matches
     • Minimize area by phase assignment
     • In each pass, for all nodes, in topological order
     • Consider matches with ArrivalTime <= RequiredTime
     • Among these matches, pick the one minimizing area (flow)
     • When areas (area flows) are equal, use delay as a tie-breaker
     Example: cuts c1, c2, c3, c4 with A(c2) < A(c3) < A(c1) < A(c4), so c2 is the best cut
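The selection rule used in each area-recovery pass can be sketched in Python as below. The match records (name, arrival time, area flow), the numbers and the function name `recover_area` are hypothetical. Note that a match saving more area is still rejected when it misses the required time.

```python
def recover_area(matches, required_time):
    """matches: list of (name, arrival_time, area_flow).
    Keep matches meeting the required time, then minimize area flow,
    breaking ties by arrival time."""
    feasible = [m for m in matches if m[1] <= required_time]
    if not feasible:
        raise ValueError("no match meets the required time")
    return min(feasible, key=lambda m: (m[2], m[1]))

# Hypothetical matches of one node.
matches = [("c1", 2.0, 3.0), ("c2", 3.0, 1.0), ("c3", 2.5, 2.0), ("c4", 4.0, 0.5)]
print(recover_area(matches, required_time=3.0))  # ('c2', 3.0, 1.0): c4 is smaller but too late
```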

  14. Basic Mapping: Area Flow
     • Definition:
     • The area flow of a primary input is 0
     • The area flow of any other node n in the network is $AF(n) = \frac{Area(n) + \sum_i AF(\mathrm{fanin}_i(n))}{NumFanouts(n)}$
     Example (figure): (1 + 1/3) / 2 = 2/3; the other area flows shown are 1/3, 0, 0, 0
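The recurrence can be evaluated in one topological pass. The Python sketch below uses exact fractions and a hypothetical network chosen to reproduce the example's numbers: gate g1 (area 1) reads two primary inputs and has 3 fanouts, and gate g2 (area 1) reads g1 and a primary input and has 2 fanouts. The function name `area_flow` is also hypothetical.

```python
from fractions import Fraction

def area_flow(order, fanins, area, num_fanouts):
    """AF(PI) = 0; AF(n) = (Area(n) + sum of fanin AFs) / NumFanouts(n).
    order must be topological; nodes without fanins are primary inputs."""
    af = {}
    for n in order:
        if not fanins.get(n):
            af[n] = Fraction(0)
        else:
            af[n] = (area[n] + sum(af[f] for f in fanins[n])) / Fraction(num_fanouts[n])
    return af

# Hypothetical network reproducing 1/3 and 2/3.
fanins = {"g1": ["x1", "x2"], "g2": ["g1", "x3"]}
af = area_flow(["x1", "x2", "x3", "g1", "g2"], fanins,
               area={"g1": 1, "g2": 1}, num_fanouts={"g1": 3, "g2": 2})
print(af["g1"], af["g2"])  # 1/3 2/3
```

Dividing by the fanout count spreads a shared gate's cost over its users, which is why area flow is only an estimate of the area a match actually adds.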

  15. Basic Mapping: Area of a Match
     • Definition. The area of a match is the total area of all gates in the maximum fanout-free cone (MFFC) of its root gate (the root gate plus those fanin gates that are used only inside the cone)
     Example (figure with gates g1, ..., g13 and match M1): A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)
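One common way to find an MFFC is reference counting: starting from the root, decrement the fanout count of each fanin, and pull a gate into the cone once all of its fanouts are inside. A Python sketch on a hypothetical network (not the one in the figure), where g2 also feeds g6 and therefore stays outside the cone of g1; the function name and gate areas are hypothetical.

```python
def mffc(root, fanins, num_fanouts):
    """Return the maximum fanout-free cone of root. Nodes not in fanins are PIs."""
    refs = dict(num_fanouts)  # local copy; the caller's counts are untouched
    cone, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for f in fanins.get(node, []):
            if f not in fanins:           # primary inputs are never part of the cone
                continue
            refs[f] -= 1
            if refs[f] == 0:              # every fanout of f is now inside the cone
                cone.add(f)
                stack.append(f)
    return cone

# Hypothetical network and areas.
fanins = {"g1": ["g2", "g3"], "g2": ["x1", "x2"], "g3": ["g4", "g5"],
          "g4": ["x1", "x2"], "g5": ["x3", "x4"], "g6": ["g2", "x4"]}
num_fanouts = {"g1": 1, "g2": 2, "g3": 1, "g4": 1, "g5": 1, "g6": 1}
area = {"g1": 2, "g2": 1, "g3": 3, "g4": 1, "g5": 1, "g6": 2}
cone = mffc("g1", fanins, num_fanouts)
print(sorted(cone), sum(area[g] for g in cone))  # ['g1', 'g3', 'g4', 'g5'] 7
```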

  16. Basic Mapping: Select Final Mapping
     • Extract the final mapping from the AIG after the best matches have been assigned to each node
     • Select the best match for each primary output node
     • Recursively, for each fanin of a selected match, select its best match
     [Figure: network with inputs x1, x2, x3, x4, x5 and outputs z1, z2, z3]
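The extraction step amounts to a traversal from the outputs through the leaves of the chosen cuts. In this Python sketch, `best_leaves` and `select_final_mapping` are hypothetical names: `best_leaves` maps each node to the leaves of its best match, and nodes never reached from an output (such as m below) are simply not part of the mapped network.

```python
def select_final_mapping(best_leaves, po_nodes, pis):
    """Return {node: leaves of its best match} for every node the mapping uses."""
    selected = {}
    stack = list(po_nodes)
    while stack:                      # iterative form of the recursive selection
        node = stack.pop()
        if node in pis or node in selected:
            continue
        selected[node] = best_leaves[node]
        stack.extend(best_leaves[node])   # visit the fanins of the selected match
    return selected

# Hypothetical best matches.
best_leaves = {"z1": ["n", "x3"], "z2": ["n", "x4"], "n": ["x1", "x2"], "m": ["x2", "x5"]}
mapping = select_final_mapping(best_leaves, ["z1", "z2"], pis={"x1", "x2", "x3", "x4", "x5"})
print(sorted(mapping))  # ['n', 'z1', 'z2']
```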

  17. Mapping for Sequential Circuits
     • Represent the netlist as an AND-INV graph (AIG)
     • For each node, compute cuts (iteration over the circuit)
     • For each node, compute l-values (iteration over the circuit)
     • Map the network for delay (iteration over the clock periods)
     • Recover area using heuristics
     • Select the final mapping
     P. Pan and C.-C. Lin, "A new retiming-based technology mapping algorithm for LUT-based FPGAs", Proc. FPGA '98.

  18. l-Value: A Generalization of Combinational Delay
     • Definition. For each edge e: u → v in the sequential circuit S, we assign the l-weight $w(e) = -\Phi d + \delta_{uv}$, where
     • $\Phi$ is the clock period,
     • $d$ is the number of latches on the edge, and
     • $\delta_{uv}$ is the combinational delay from input pin u of node v to the output of v.
     • Definition. The l-value of a node in S is the maximum weight, under the l-weights, of the paths from the PIs to the node.
     • Theorem: S can be retimed to clock period $\Phi$ iff the l-value of each PO is less than or equal to $\Phi$.
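Since l-values are longest-path weights under the l-weights, a Bellman-Ford style relaxation computes them. A value still growing after |V| passes signals a positive cycle, so no finite l-values exist and the clock period is infeasible. In this Python sketch, the circuit (a three-gate loop a → b → c → a with one latch, plus two latches before the PO) and the function names are hypothetical, and δ_uv is taken as the unit delay of node v (0 for the PO).

```python
import math

def l_values(nodes, edges, pis, phi):
    """edges: (u, v, latches, delta_uv). Returns node -> l-value, or None if a
    positive cycle makes the l-values unbounded."""
    l = {v: (0.0 if v in pis else -math.inf) for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, d, delta in edges:
            cand = l[u] - phi * d + delta      # extend a path by l-weight -phi*d + delta
            if v not in pis and cand > l[v]:
                l[v] = cand
                changed = True
        if not changed:
            return l
    return None

def retimable(nodes, edges, pis, pos, phi):
    """Feasibility test from the theorem: finite l-values and l(PO) <= phi."""
    l = l_values(nodes, edges, pis, phi)
    return l is not None and all(l[o] <= phi for o in pos)

# Hypothetical circuit with unit gate delays.
nodes = ["i", "a", "b", "c", "o"]
edges = [("i", "a", 0, 1), ("c", "a", 1, 1), ("a", "b", 0, 1),
         ("b", "c", 0, 1), ("c", "o", 2, 0)]
print(l_values(nodes, edges, {"i"}, 3))          # {'i': 0.0, 'a': 1.0, 'b': 2.0, 'c': 3.0, 'o': -3.0}
print(retimable(nodes, edges, {"i"}, {"o"}, 3))  # True
print(retimable(nodes, edges, {"i"}, {"o"}, 2))  # False: loop weight 3 - 2 > 0
```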

  19. Example (gate delay D = 1)
     • Φ = 1: infeasible; l(a) = 1, l(c) = 2, etc.
     • Φ = 2: feasible; l(a) = 1, l(c) = 2, l(a) = 1, l(c) = 2, etc.
     • Φ = 3: feasible; l(a) = 1, l(c) = 2, l(a) = 0, l(c) = 1, etc.
     [Figure: circuit with inputs i1, i2, nodes a, b, c and output f]

  20. Computing Cuts
     for each non-PO node v in N
         Lv = {{v^0}};
     done = false;
     while ( done == false ) do
         done = true;
         for each node v (not PI or PO) in N do
             tmp = merge(Lu1, Lu2, …, Lut);      // u1, …, ut are the fanins of v
             if ( tmp ⊄ Lv ) then
                 Lv = tmp ∪ {{v^0}};
                 done = false;
     return success;   // Lv has settled to Cv for each v

     merge(Cu1, Cu2, …, Cut) = { c = c1^d1 ∪ c2^d2 ∪ … ∪ ct^dt | ci ∈ Cui and |c| ≤ k },
     where ci^di = { x^(d+di) | x^d ∈ ci } and di is the number of latches on the edge from ui to v.
     Here x^d denotes node x reached through d latches.
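The fixed-point iteration and the latch-shifted merge can be sketched in Python by representing each cut element x^d as a pair (x, d). The two-node loop below is hypothetical (a = f(i1, c) with one latch on the edge from c, and c = f(a, i2)), as are the function name and the iteration cap, which is only a safety net.

```python
from itertools import product

def sequential_cuts(fanins, pis, k, max_iters=100):
    """fanins: node -> list of (fanin, latches on the edge); PIs have no entry.
    A cut is a frozenset of (node, latch_count) pairs, i.e. x^d."""
    cuts = {v: {frozenset({(v, 0)})} for v in list(pis) + list(fanins)}
    for _ in range(max_iters):
        done = True
        for v, fis in fanins.items():
            tmp = set()
            # Pick one cut per fanin u_i and shift its latch counts by d_i.
            for combo in product(*(cuts[u] for u, _ in fis)):
                c = frozenset((x, dx + di) for cut, (_, di) in zip(combo, fis)
                              for x, dx in cut)
                if len(c) <= k:
                    tmp.add(c)
            if not tmp <= cuts[v]:              # new cuts appeared
                cuts[v] = tmp | {frozenset({(v, 0)})}
                done = False
        if done:
            return cuts
    raise RuntimeError("cut sets did not settle")

# Hypothetical two-node loop with one latch on the edge c -> a.
fanins = {"a": [("i1", 0), ("c", 1)], "c": [("a", 0), ("i2", 0)]}
cuts = sequential_cuts(fanins, pis=["i1", "i2"], k=3)
for v in ("a", "c"):
    print(v, sorted(sorted(c) for c in cuts[v]))
```

With k = 3 the sets settle after three passes; longer expansions such as {i1^0, i1^1, c^2, i2^1} exceed k and are dropped.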

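  21. Example
     [Figure: circuit with inputs i1, i2 and nodes a, b, c]
     Cut sets for i1, i2, a, b, c by iteration:
     0: {i1^0} {i2^0} {a^0} {b^0} {c^0}
     1: {i1^0, c^1} {i2^0, c^0} {a^0, b^1} {a^0, i2^1, c^1} {i1^0, c^1, b^1} {i1^0, c^1, i2^1}
     2: {i1^0, a^1, b^2} {i2^0, a^0, b^1}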

  22. Finding Minimum l-Values
     for each node v in N do
         if ( v is a PI ) l(v) = 0; else l(v) = -∞;
     done = false;
     while ( done == false ) do
         done = true;
         for each non-PI node v in N do
             tmp = min over non-trivial cuts c of v ( max[ l(u) - Φ·d + δuv | u^d ∈ c ] );
             if ( l(v) < tmp ) then
                 l(v) = tmp; done = false;
                 if ( v is a PO and l(v) > Φ ) return failure;
     return success;   // the bounds have settled
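A Python sketch of this label iteration, assuming LUT mapping where every pin of a LUT has the same delay, so δ_uv is simply the delay of v. The cut sets are hypothetical and written by hand (non-trivial cuts only, with x^d as (x, d)); the three-input cut of c that includes c's own latched output lets c absorb the loop into one LUT. The function name and the pass cap are assumptions.

```python
import math

def min_l_values(cuts, delay, pis, pos, phi, max_passes=1000):
    """cuts: node -> non-trivial cuts, each a set of (u, latches) pairs.
    Returns node -> l-value, or None when a PO exceeds phi."""
    l = {v: 0.0 for v in pis}
    l.update({v: -math.inf for v in cuts})
    for _ in range(max_passes):
        done = True
        for v, cut_set in cuts.items():
            # Best cut = the one whose worst leaf gives the smallest bound.
            tmp = min(max(l[u] - phi * d + delay[v] for u, d in c) for c in cut_set)
            if l[v] < tmp:
                l[v] = tmp
                done = False
                if v in pos and l[v] > phi:
                    return None
        if done:
            return l
    return None  # did not settle within the cap; treat phi as infeasible

# Hypothetical non-trivial cuts (k = 3) and unit LUT delays.
cuts = {
    "a": [{("i1", 0), ("c", 1)}, {("i1", 0), ("a", 1), ("i2", 1)}],
    "c": [{("a", 0), ("i2", 0)}, {("i1", 0), ("c", 1), ("i2", 0)}],
    "o": [{("c", 0)}],                      # the PO simply reads c
}
delay = {"a": 1, "c": 1, "o": 0}
print(min_l_values(cuts, delay, ["i1", "i2"], {"o"}, phi=1))    # {'i1': 0.0, 'i2': 0.0, 'a': 1.0, 'c': 1.0, 'o': 1.0}
print(min_l_values(cuts, delay, ["i1", "i2"], {"o"}, phi=0.5))  # None
```

Excluding the trivial cut {v^0} matters: it would give l(v) + δ and keep raising l(v) forever.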

  23. Constructing Mapping Solution
     U = the set of POs;
     S = { v | v is a PI or PO };
     while ( U ≠ ∅ ) do
         v = any node in U; U = U - {v};
         for each non-trivial cut c ∈ Cv do
             if ( lopt(v) == max[ lopt(u) - Φ·d + δuv | u^d ∈ c ] ) then
                 cbest = c;
         for each u^d ∈ cbest do
             if ( u is not in S ) then
                 S = S ∪ {u}; U = U ∪ {u};
             create an edge in S from u to v with d FFs;
     return S;
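A Python sketch of the construction. The cut sets and optimal l-values are hypothetical, written out by hand for Φ = 1 and unit LUT delay; the function name is also hypothetical, and floating-point equality is tested with a tolerance rather than `==`.

```python
import math

def construct_mapping(cuts, delay, l_opt, pis, pos, phi):
    """cuts: node -> non-trivial cuts, each a set of (u, latches) pairs.
    Returns the mapped nodes S and edges (u, v, latches)."""
    S = set(pis) | set(pos)
    U = list(pos)
    edges = []
    while U:
        v = U.pop()
        best = None
        for c in cuts[v]:
            bound = max(l_opt[u] - phi * d + delay[v] for u, d in c)
            if math.isclose(l_opt[v], bound, abs_tol=1e-9):
                best = c                      # a cut that realizes l_opt(v)
        if best is None:
            raise ValueError(f"no cut of {v} realizes its l-value")
        for u, d in best:
            if u not in S:
                S.add(u)
                U.append(u)
            edges.append((u, v, d))          # d flip-flops on the new edge
    return S, edges

# Hypothetical cuts and optimal l-values (phi = 1).
cuts = {
    "a": [{("i1", 0), ("c", 1)}, {("i1", 0), ("a", 1), ("i2", 1)}],
    "c": [{("a", 0), ("i2", 0)}, {("i1", 0), ("c", 1), ("i2", 0)}],
    "o": [{("c", 0)}],
}
delay = {"a": 1, "c": 1, "o": 0}
l_opt = {"i1": 0, "i2": 0, "a": 1, "c": 1, "o": 1}
S, edges = construct_mapping(cuts, delay, l_opt, ["i1", "i2"], ["o"], phi=1)
print(sorted(S))      # ['c', 'i1', 'i2', 'o']
print(sorted(edges))  # [('c', 'c', 1), ('c', 'o', 0), ('i1', 'c', 0), ('i2', 'c', 0)]
```

Node a disappears from the result: it is absorbed into the single LUT at c, which feeds back to itself through one flip-flop.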

  24. Performing Final Retiming
     • Retime each node v with the retiming lag $r(v) = \lceil l_{opt}(v) / \Phi \rceil - 1$, where
     • $l_{opt}(v)$ is the optimal l-value of v, and
     • $\Phi$ is the selected clock period
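A Python sketch of the final retiming step. It assumes the lag r(v) = ⌈l_opt(v)/Φ⌉ - 1 for internal nodes, the usual Leiserson-Saxe rule d' = d + r(v) - r(u) for the new latch count on an edge u → v, and (an assumption of this sketch) PIs and POs pinned at lag 0 so that the I/O latency does not change. The chain i → a → b → o with one latch before the PO and the function name are hypothetical.

```python
import math
from fractions import Fraction

def final_retiming(edges, l_opt, phi, fixed):
    """edges: (u, v, latches). fixed: nodes kept at lag 0 (PIs and POs here).
    Returns the lags and the retimed edges."""
    # Exact arithmetic avoids ceil() errors such as ceil(2.0000000001) == 3.
    lag = {v: 0 if v in fixed else math.ceil(Fraction(lv) / Fraction(phi)) - 1
           for v, lv in l_opt.items()}
    retimed = [(u, v, d + lag[v] - lag[u]) for u, v, d in edges]
    if any(d < 0 for _, _, d in retimed):
        raise ValueError("negative latch count: l-values inconsistent with phi")
    return lag, retimed

# Hypothetical chain with unit gate delays, phi = 1.
edges = [("i", "a", 0), ("a", "b", 0), ("b", "o", 1)]
l_opt = {"i": 0, "a": 1, "b": 2, "o": 1}
lag, retimed = final_retiming(edges, l_opt, phi=1, fixed={"i", "o"})
print(lag)      # {'i': 0, 'a': 0, 'b': 1, 'o': 0}
print(retimed)  # [('i', 'a', 0), ('a', 'b', 1), ('b', 'o', 0)]
```

The latch moves from the output of b to its input, so each stage holds one unit-delay gate and the clock period becomes 1.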
