Advancements in Timing-Driven Design for DSM Circuits

Timing Issues for DSM
R. Brayton, U.C. Berkeley

Caveats
• This talk describes work in progress.
• Much of the work is described only roughly, to communicate the general thrust.
• Many details remain to be decided, and several algorithms are currently being programmed for experimental purposes.
• We are in the middle of many studies; depending on their results, the direction of the project may change.
Tau97

Outline
• Introduction - DSM project at Berkeley
• Our timing abstraction and motivation
• Timing-driven placement (wireplanning)
  • slicing approach
  • programming approach
  • matching approach
• Iterated logic decomposition
• Logic rip-up and re-route
• Technology aspects

Overview
• Two levels of approach
  • electrical and technology level
  • logic level using timing abstraction
• Electrical level used to ensure reality
  • predict technology dimensions
  • place and wire transistors to create leaf cells using Cadence's LAS tool or CADABRA
  • extract parasitics using SPACE or FASTCAP
  • simulate using SPICE with advanced BSIM model

Overview
• Logic level works with a timing abstraction (to be explained)
  • we need to be sure that the abstraction is correct (hence the electrical experiments)
• Cross-talk noise effects on timing are currently ignored
• Immediate goal is to build combinational logic macros that meet timing constraints
  • sequential circuits can be handled similarly

Macro Problem Statement
[Figure: rectangular macro with inputs a, b, c, d marked with arrival times (A) and outputs f, g marked with required times (R)]
• Given:
  • rectangular area, inputs and outputs on perimeter
  • required times on outputs, arrival times on inputs
  • set of logic functions to be synthesized (possibly pin locations can be somewhat flexible)
• Find: logic decomposition of the functions that can be:
  • placed and wired in the given area
  • meeting the timing constraints

Some Facts
• As dimensions shrink, gate delays decrease and wire delays increase
  • in the limit, all delays are in the wires
• On a net, by a combination of buffer insertion and wire sizing:
  • delay of the net from root to any leaf can be made linear in the Manhattan distance from root to leaf

Linear Delay
• By buffer insertion
  • spacing is determined by resistance and capacitance of the line and the buffers
  • optimum number of optimum-sized buffers makes the delay linear
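
The linear behavior can be illustrated with a small numerical sketch. It assumes a simple Elmore delay model and identical, equally spaced buffers; the parameter names (`r`, `c`, `r_buf`, `c_buf`) and the unit-less values in the usage lines are hypothetical, and buffer and wire sizing are not modeled.

```python
import math

def unbuffered_delay(length, r, c):
    """Elmore delay of a distributed RC wire: grows with length squared."""
    return 0.5 * r * c * length ** 2

def buffered_delay(length, r, c, r_buf, c_buf):
    """Delay of the same wire cut into equal segments by identical buffers.

    r, c         : wire resistance and capacitance per unit length (assumed)
    r_buf, c_buf : buffer output resistance and input capacitance (assumed)
    """
    # Per-segment Elmore delay: r_buf*(c*l + c_buf) + r*l*(0.5*c*l + c_buf).
    # Minimizing delay per unit length gives l_opt = sqrt(2*r_buf*c_buf/(r*c)).
    l_opt = math.sqrt(2.0 * r_buf * c_buf / (r * c))
    n = max(1, round(length / l_opt))      # number of buffered segments
    seg = length / n
    per_segment = r_buf * (c * seg + c_buf) + r * seg * (0.5 * c * seg + c_buf)
    return n * per_segment

# Doubling the length doubles the buffered delay but quadruples the unbuffered one.
short = buffered_delay(20.0, r=1.0, c=1.0, r_buf=2.0, c_buf=1.0)
long_ = buffered_delay(40.0, r=1.0, c=1.0, r_buf=2.0, c_buf=1.0)
```

In this model the optimal spacing depends only on the line and buffer parameters, not on the total length, which is why the delay per unit length becomes a constant.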

Linear Wire Delay Model for a Net
[Figure: a net drawn on x and y axes]
Delay is made linear by buffer insertion and by wire and buffer sizing.

Timing Abstraction: Linear Delay Model (LDM)
• Delay is a linear function of the Manhattan distance, independent of the logic it meets along a path.
[Figure: inputs a, b, c and output f]
Since f depends on b, the delay corresponding to the Manhattan distance from b to f is the minimum delay that any path from b to f can have.
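
A minimal sketch of this lower bound, assuming the delay is a pure proportionality `k * distance` with no constant term. The function names and the constant `k` (delay per unit length) are illustrative assumptions, not part of the talk.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def min_delay(src, dst, k):
    """LDM lower bound on the delay of any path from src to dst."""
    return k * manhattan(src, dst)

def pin_pair_feasible(src, arrival, dst, required, k):
    """True if even the shortest possible path meets the required time."""
    return arrival + min_delay(src, dst, k) <= required
```

If `pin_pair_feasible` is false for some input/output pair where the output depends on the input, no decomposition or placement can meet timing under this model.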

Caveat
• So far we are not considering the effect of cross-talk noise on delay
[Figure: victim and aggressor wires]
The victim can be slowed by the aggressor if their transitions are opposing.

Common Divisors May Cause Paths to Stray
[Figure: inputs a, b, c; common divisor h; outputs f, g]
But in this example, the longest path is not increased.

Example Where Longest Path Must Be Increased
[Figure: inputs a, b; divisor h; outputs f, g]
Any divisor h(a,b) common to both f and g cannot be placed without increasing the longest path.
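
The geometry behind these two examples can be sketched as follows. Under Manhattan distance, a path from pin p through h to pin q has no detour exactly when h lies in the bounding box of p and q, so a divisor can avoid lengthening every path through it only if all those boxes intersect. The helper names and coordinates are hypothetical. Whether a detour actually increases the longest path also depends on the available slack, which this sketch does not model.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def bbox(p, q):
    """Bounding box (x0, y0, x1, y1) of two points."""
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[0], q[0]), max(p[1], q[1]))

def intersect(boxes):
    """Intersection of boxes, or None if it is empty."""
    x0 = max(b[0] for b in boxes)
    y0 = max(b[1] for b in boxes)
    x1 = min(b[2] for b in boxes)
    y1 = min(b[3] for b in boxes)
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None

def no_detour_region(inputs, outputs):
    """Region where a divisor adds no length to any input-to-output path."""
    return intersect([bbox(i, o) for i in inputs for o in outputs])

def detour(p, h, q):
    """Extra path length caused by routing p -> q through h."""
    return manhattan(p, h) + manhattan(h, q) - manhattan(p, q)

# a and f on the left edge, b and g on the right edge: no detour-free spot for h.
stuck = no_detour_region([(0, 0), (10, 0)], [(0, 10), (10, 10)])
# Two inputs on the left, one output on the right: a detour-free strip exists.
strip = no_detour_region([(0, 0), (0, 4)], [(10, 2)])
```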

Problem 1: Timing-Driven Point Placement
• Given: area, arrival and required times, pin positions, and a decomposition (netlist)
• Find: point placement that satisfies all timing constraints
• No consideration of areas required to implement logic gates
• Areas of gates can be approximated by count of literals in factored form
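
Checking a candidate point placement amounts to a delay trace through the netlist. The sketch below assumes the netlist is a DAG given as a fan-in dictionary, that all delay is in the wires (`k` per unit Manhattan distance), and that gates themselves add no delay; the data layout and names are assumptions for illustration.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def arrival_times(fanins, pos, input_arrival, k):
    """Latest arrival time at every node of a placed netlist.

    fanins        : node -> list of fan-in nodes (primary inputs have no entry)
    pos           : node -> (x, y) point placement
    input_arrival : primary input -> arrival time
    """
    arr = dict(input_arrival)

    def visit(n):
        if n not in arr:
            arr[n] = max(visit(m) + k * manhattan(pos[m], pos[n])
                         for m in fanins[n])
        return arr[n]

    for n in fanins:
        visit(n)
    return arr

def worst_slack(fanins, pos, input_arrival, required, k):
    """Smallest (required - arrival) over the outputs; >= 0 means timing is met."""
    arr = arrival_times(fanins, pos, input_arrival, k)
    return min(required[o] - arr[o] for o in required)

fanins = {"h": ["a", "b"], "f": ["h", "c"]}
pos = {"a": (0, 0), "b": (0, 2), "c": (4, 4), "h": (1, 1), "f": (4, 1)}
slack = worst_slack(fanins, pos, {"a": 0, "b": 0, "c": 0}, {"f": 6}, k=1)
```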

Pure Point Placement
[Figure: point placement with inputs a, b, c and outputs f, g, showing a congested area]

Problem 2: Placement with Area Constraints
• Areas are flexible. Leaf cell "gates" remain to be built. Gate types remain to be determined (PLAs, domino, PTL, etc.)
• Three experimental "wireplanning" approaches
  • slicing
  • programming
  • matching

Slicing Approach
• Use simulated annealing (SA) to get point placement
  • cost function for SA is derived by doing a delay trace through the placed points
• After SA, derive slicing structure from point placement
• Use flexibility of areas for final placement
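
A minimal sketch of the first step, assuming a cost equal to the negated worst slack from a delay trace, uniform random moves of one internal node at a time, and a linear cooling schedule. The move set, schedule, and all names are illustrative assumptions; the talk does not specify them.

```python
import math
import random

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def worst_slack(fanins, pos, arrival, required, k):
    """Delay trace: smallest (required - arrival) over the outputs."""
    arr = dict(arrival)

    def visit(n):
        if n not in arr:
            arr[n] = max(visit(m) + k * manhattan(pos[m], pos[n])
                         for m in fanins[n])
        return arr[n]

    return min(required[o] - visit(o) for o in required)

def anneal(fanins, pos, arrival, required, k, size, steps=2000, t0=5.0, seed=0):
    """Return (best placement found, its worst slack). Pins stay fixed."""
    rng = random.Random(seed)
    movable = [n for n in fanins if n not in required]   # internal nodes only
    cur = dict(pos)
    cur_cost = -worst_slack(fanins, cur, arrival, required, k)
    best, best_cost = dict(cur), cur_cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9          # linear cooling
        n = rng.choice(movable)
        old = cur[n]
        cur[n] = (rng.uniform(0, size[0]), rng.uniform(0, size[1]))
        new_cost = -worst_slack(fanins, cur, arrival, required, k)
        accept = new_cost <= cur_cost or \
            rng.random() < math.exp((cur_cost - new_cost) / temp)
        if accept:
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = dict(cur), new_cost
        else:
            cur[n] = old                                  # undo the move
    return best, -best_cost

fanins = {"h": ["a", "b"], "f": ["h"], "g": ["h"]}
pos = {"a": (0, 0), "b": (0, 10), "f": (10, 0), "g": (10, 10), "h": (9, 9)}
arrival, required = {"a": 0, "b": 0}, {"f": 25, "g": 25}
start = worst_slack(fanins, pos, arrival, required, 1)
placed, slack = anneal(fanins, pos, arrival, required, 1, size=(10, 10))
```

The derivation of a slicing structure from the resulting points is a separate step and is not shown here.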

Slicing Approach
• Hypothesis: the slicing can be made so that distances are not perturbed too much from the point placement
• Distances are currently estimated as center-to-center Manhattan distance
• Once we have the slicing structure, we need to build the logic in the allocated blocks
• LDM implies that we can build the logic so that delay < distance across the logic sub-block

Programming Approach
• Get initial point placement with a force-directed method (or SA)
  • force points apart to provide space for areas
  • this gives relative point positions
• Distribute slacks using zero-slack distribution
• Formulate and solve LP

LP Formulation
• Distributed slacks give bounds on wire lengths, d_ij
• Assume aspect ratio given for each "gate"
• Point placement gives relative positions
• All areas are scaled by a factor to guarantee feasibility
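
One possible way to write such an LP is sketched below. This is an assumption-laden illustration, not the formulation from the talk: the symbols s (a common linear scale factor, so areas scale by s^2), a_i (gate area), rho_i (aspect ratio, width over height), w_i, h_i, and the region size W by H are all introduced here, as is the choice of maximizing s.

```latex
\begin{aligned}
\text{maximize}\quad & s \\
\text{subject to}\quad
 & |x_i - x_j| + |y_i - y_j| \le d_{ij}
   && \text{for each connected pair } (i,j) \\
 & x_j - x_i \ge \tfrac{s}{2}\,(w_i + w_j)
   && \text{if } i \text{ is to be separated to the left of } j \\
 & y_j - y_i \ge \tfrac{s}{2}\,(h_i + h_j)
   && \text{if } i \text{ is to be separated below } j \\
 & \tfrac{s}{2} w_i \le x_i \le W - \tfrac{s}{2} w_i, \quad
   \tfrac{s}{2} h_i \le y_i \le H - \tfrac{s}{2} h_i \\
 & w_i = \sqrt{a_i \rho_i}, \quad h_i = \sqrt{a_i / \rho_i}
\end{aligned}
```

Because the relative positions are known from the point placement, the sign of each difference is fixed, so every absolute value becomes a plain difference and all constraints are linear in x_i, y_i, and s. Each pair needs only one of the two separation constraints, chosen from the relative positions.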

Matching Approach
• Divide area into minimum-size squares
• Label each square with functions that it can contain without violating timing
[Figure: area divided into squares carrying labels such as fg/abc, gh/bc, fh/ac, with pins a, b, c and f, g, h]

Matching Approach
• Each logic "gate" fans out to a set of primary outputs (fg) and fans in from a set of primary inputs (abc)
  • thus a gate is labeled, say, fg/abc
• Each gate is given an area (number of literals in factored form)
• Want to match gates to squares so that each square's capacity is not violated
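
The labeling and assignment can be sketched as below. A square admits a gate if every path from the gate's primary inputs through the square's center to the gate's primary outputs meets the required time under the LDM. The assignment shown is a simple greedy stand-in, not a true matching algorithm, and it can fail where a valid assignment exists; the data layout, names, and delay constant `k` are assumptions.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def square_allows(center, ins, outs, pins, arrival, required, k):
    """True if a gate with label outs/ins can sit at center without violating timing."""
    return all(
        arrival[i] + k * (manhattan(pins[i], center) + manhattan(center, pins[o]))
        <= required[o]
        for i in ins for o in outs
    )

def greedy_match(gates, squares, capacity, pins, arrival, required, k):
    """Assign gates to squares, largest gate first.

    gates   : name -> (inputs, outputs, area)
    squares : name -> center point
    Returns gate -> square, or None if some gate cannot be placed greedily.
    """
    free = {s: capacity for s in squares}
    placed = {}
    for g, (ins, outs, area) in sorted(gates.items(), key=lambda kv: -kv[1][2]):
        for s, center in squares.items():
            if free[s] >= area and square_allows(center, ins, outs, pins,
                                                 arrival, required, k):
                placed[g] = s
                free[s] -= area
                break
        else:
            return None
    return placed

pins = {"a": (0, 0), "f": (4, 0)}
squares = {"s0": (1, 1), "s1": (3, 1), "s2": (1, 3)}
gates = {"g1": (["a"], ["f"], 3), "g2": (["a"], ["f"], 2)}
result = greedy_match(gates, squares, 4, pins, {"a": 0}, {"f": 6}, 1)
```

Here square `s2` is too far off the direct route from a to f to hold either gate, and `s0` has capacity for only one of them, so the second gate lands in `s1`.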

Iterated Decomposition
• Given: netlist and current placement
• Select a divisor that can be placed while still satisfying timing constraints
[Figure: placement before and after extracting a divisor; afterwards areas are smaller and some paths are longer]

Iterated Decomposition
• Choose divisor that maximally decreases
• Algorithm:
    Get initial decomposition (say, minimum area)
    Selectively duplicate nodes and adjust outputs
    Collapse local trees
    Global timing-driven placement
    Do {
        select "best" divisor
        locally adjust placement (reset global placement after k divisors)
    } Until area constraints are met

Fast Local Adjustment
• With the slicing method, we can insert a new divisor into the slicing structure, get a new placement, and do a delay trace efficiently.
  • so we can accurately reflect area change as it affects delay
• With the LP method, we can also solve fast.
  • just need inequalities where areas may overlap

Comments
• After k divisors are selected and placed, re-do global placement to better reflect all divisors
  • i.e., do total timing-driven placement on the new netlist
• Selective duplication and collapsing can be done to improve timing during the iteration.
  • experimenting with how to choose this selective collapsing

Rewiring
• To relieve timing further, rewiring can be done
• Can use SPFDs, since the exact logic in a "gate" is somewhat irrelevant
• SPFDs allow one wire to replace another
  • gives more flexibility than redundancy addition and removal
  • uses the fact that the logic in the blue box can be changed

Technology Studies
• Guess at process dimensions for DSM
  • "strawman" 0.25 µm process
  • shrink to get 0.18 µm, ... , 0.05 µm processes
• Design and lay out different complex "gates"
  • use Cadence's LAS tool or Cadabra tool
• Extract parasitics using SPACE or FASTCAP
• Simulate with SPICE and Hu's advanced BSIM model
• Verify LDM

Strawman 0.05 µm Process Interconnect
• 9 metal layers
• Copper wires and vias
• Polyimide dielectric (k=2)
• H/W = 2 for all layers except M9
• M9 kept same as 0.25 µm process
• Insulator thickness = 0.7 µm
[Figure, not to scale: interconnect cross-section with H/W labels 2.5/2.0, 2.4/1.2, 1.6/0.8, 0.6/0.3, 0.14/0.07]

First Six Layers of Metal
[Figure: approximately to scale]

Design and Extract Flow
[Flow diagram. Inputs: manual wireplanning, netlist decomposition, technology file, constraint file. Design styles: hand design, standard cell, domino, pass transistor logic. Netlist files: test.blif ("format?"), test.blifmv, test.verilog. Layout with LAS or Cadabra gives test.gds; SPACE (3D) extracts from test.gds; SPICE simulates with interconnect technology parameters and transistor models for the 0.25 µm, 0.18 µm, 0.10 µm, and 0.05 µm processes.]

Acknowledgements
Richard Newton, Alberto Sangiovanni, Ralph Otten, Wilsin Gosti, Amit Narayan, Philip Chong, Mukul Prasad, Amit Mehrotra, Sunil Khatri, Ravi Gunturi, Subarna Sinha, Hiroshi Murata
IBM, Motorola, Intel, Fujitsu, Cadence, SRC
