
ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

ECE260B – CSE241A Winter 2005 Timing Analysis and Correction. Website: http://vlsicad.ucsd.edu/courses/ece260b-w05. Timing Analysis. Testing Simulation Device modeling (BSIM) Transistor-level time domain analysis (SPICE) Frequency domain interconnect analysis (AWE, PRIMA)

cathy

Presentation Transcript


  1. ECE260B – CSE241A Winter 2005 Timing Analysis and Correction. Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

  2. Timing Analysis • Testing • Simulation • Device modeling (BSIM) • Transistor-level time domain analysis (SPICE) • Frequency domain interconnect analysis (AWE, PRIMA) • Static timing analysis • Transistor-level (PathMill) • Gate-level (PrimeTime)

  3. Sequential Machine • State is stored in registers (flip-flops or latches) • Combinational logic computes the next state and the outputs from the present state and the inputs [Figure: combinational logic blocks with registers clocked by clk] Courtesy K. Keutzer et al. UCB

  4. Why Clocks? • Clocks provide the means to synchronize • By allowing events to happen at known timing boundaries, we can sequence these events • Greatly simplifies building of state machines • No need to worry about variable delay through combinational logic (CL) • All signals delayed until clock edge (clock imposes the worst case delay) [Figure: FSM and dataflow structures built from combinational logic and registers] Courtesy K. Yang, UCLA

  5. Clock Cycle Time • Cycle time is determined by the delay through the CL • Signal must arrive before the latching edge • If too late, it waits until the next cycle • Synchronization and sequential order become incorrect • Constraint: Tcycle > Tprop_delay_through_CL + Toverhead • Example: 3.0 GHz Pentium-4 → Tcycle ≈ 333 ps • Can change circuit architecture to obtain smaller Tcycle Courtesy K. Yang, UCLA
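
As a rough numeric illustration of the constraint on slide 5, the Python sketch below converts a clock frequency to a cycle time and checks a path against it. The function names and the 250 ps / 300 ps logic delays and 60 ps overhead are made-up values for illustration, not figures from the slides.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def meets_cycle_time(t_cycle_ps: float, t_prop_ps: float, t_overhead_ps: float) -> bool:
    """Slide 5 constraint: Tcycle > Tprop_delay_through_CL + Toverhead."""
    return t_cycle_ps > t_prop_ps + t_overhead_ps


t_cycle = cycle_time_ps(3.0)
print(round(t_cycle, 1))                        # 333.3
print(meets_cycle_time(t_cycle, 250.0, 60.0))   # True  (310 ps fits in 333.3 ps)
print(meets_cycle_time(t_cycle, 300.0, 60.0))   # False (360 ps does not fit)
```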

  6. Pipelining • For dataflow: • Instead of a long critical path, split the critical path into chunks • Insert registers to store intermediate results • This allows 2 waves of data to coexist within the CL • Can we extend this ad infinitum? • Overhead eventually limits the pipelining • E.g., 1.5 to 2 gate delays for latch or FF • Granularity limits as well • Minimum time quantum: delay of a gate • Unpipelined: Tcycle > Tpd + Toverhead • Pipelined: Tcycle > max(tpd1, tpd2) + Toverhead [Figure: block CL A+B with delay tpd between registers, split into CL A (tpd1) and CL B (tpd2) with an added register] Courtesy K. Yang, UCLA
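
The two bounds on slide 6 can be compared with a small Python sketch. Delays are expressed in gate delays; the 20-gate path and the overhead of 2 gate delays are assumed values chosen only to show the trend, and `min_cycle_bound` is a hypothetical helper name.

```python
def min_cycle_bound(stage_delays, t_overhead):
    """Lower bound on Tcycle: slowest stage plus register overhead."""
    return max(stage_delays) + t_overhead


overhead = 2  # assumed latch/FF overhead, in gate delays

print(min_cycle_bound([20], overhead))      # 22: unpipelined, tpd = 20
print(min_cycle_bound([10, 10], overhead))  # 12: balanced two-stage split
print(min_cycle_bound([12, 8], overhead))   # 14: unbalanced split is set by the slower stage
print(min_cycle_bound([1] * 20, overhead))  # 3: one gate per stage, overhead dominates
```

The last line shows why pipelining cannot be extended indefinitely: once each stage is a single gate, the register overhead is the larger part of the cycle.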

  7. Intel MPU FO4 INV Delays Per Clock Period • FO4 INV = inverter driving 4 identical inverters (no interconnect) • Half of frequency improvement has been from reduced logic stages, i.e., pipelining

  8. Let’s Revisit Cycle Time and Path Delay • Cycle time (T) cannot be smaller than longest path delay (Tmax) • Longest (critical) path delay is a function of: • Total gate, wire delays • Logic levels [Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; clock and data waveforms marking the cycle time] Courtesy K. Keutzer et al. UCB

  9. Cycle Time - Setup Time • For FFs to correctly capture data, data must be stable for: • Setup time (Tsetup) before clock arrives [Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; clock and data waveforms marking the setup time] Courtesy K. Keutzer et al. UCB

  10. Cycle Time – Clock Skew • If clock network has unbalanced delay – clock skew • Cycle time is also a function of clock skew (Tskew) [Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; waveforms of Tclock1, Tclock2, and data marking the clock skew] Courtesy K. Keutzer et al. UCB

  11. Cycle Time – Flip-Flop Delay (Clock to Q) • Cycle time is also a function of propagation delay of FF (Tclk-to-Q or Tc2q) • Tc2q: time from arrival of clock signal until change at FF output [Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a critical path of ~5 logic levels; waveforms of Tclock1, Tclock2, and data marking the clock-to-Q delay] Courtesy K. Keutzer et al. UCB

  12. Min Path Delay - Hold Time • For FFs to correctly latch data, data must be stable during: • Hold time (Thold) after clock arrives • Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew) [Figure: flip-flops Q1 and Q2 clocked by Tclock1 and Tclock2, joined by a short path of ~3 logic levels; clock and data waveforms marking the hold time] Courtesy K. Keutzer et al. UCB

  13. Setup, Hold, Cycle Times [Figure: example of a single phase clock, annotated with the cycle time, the set-up time (D stable before clock), the hold time (D stable after clock), and the interval when the signal may change] Courtesy K. Keutzer et al. UCB

  14. Timing Constraints for Edge-Triggered FFs • Max(Tpd) < Tcycle – Tsetup – Tc2q – Tskew • If violated, the delay is too long for data to be captured • Min(Tpd) > Thold – Tc2q + Tskew • If violated, the delay is too short and data can race through, skipping a state [Figure: flip-flop and combinational logic clocked with period Tcycle] Courtesy K. Yang, UCLA
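
The two inequalities on slide 14 translate directly into a pair of checks. The Python sketch below is only an illustration: the `FlopTiming` container, the function names, and all picosecond values are assumptions, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class FlopTiming:
    """Timing parameters in picoseconds (illustrative values only)."""
    t_cycle: float
    t_setup: float
    t_hold: float
    t_c2q: float
    t_skew: float


def setup_ok(max_tpd: float, t: FlopTiming) -> bool:
    # Max(Tpd) < Tcycle - Tsetup - Tc2q - Tskew
    return max_tpd < t.t_cycle - t.t_setup - t.t_c2q - t.t_skew


def hold_ok(min_tpd: float, t: FlopTiming) -> bool:
    # Min(Tpd) > Thold - Tc2q + Tskew
    return min_tpd > t.t_hold - t.t_c2q + t.t_skew


t = FlopTiming(t_cycle=1000, t_setup=50, t_hold=60, t_c2q=40, t_skew=50)
print(setup_ok(800, t))  # True:  800 < 860
print(setup_ok(900, t))  # False: 900 >= 860, data arrives too late
print(hold_ok(100, t))   # True:  100 > 70
print(hold_ok(50, t))    # False: 50 <= 70, data races through
```

Note that `t_cycle` appears only in the setup check, which is why a hold violation cannot be repaired by slowing the clock.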

  15. Example of Tpdmax Violation • Suppose there is skew between the registers in a dataflow (regA after regB) • “i” gets its input values from regA at transition in Ck’ • CL output “o” arrives after Ck transition due to skew • To correct this problem, can increase cycle time [Figure: regA (clocked by Ck’) driving combinational logic from input “i” to output “o” into regB (clocked by Ck); waveforms of Ck, Ck’, i, and o showing Tskew and Tpdmax, with “o” marked “Too late!”] Courtesy K. Yang, UCLA

  16. Example of Tpdmin Violation: Race Through • Suppose clock skew causes regA to be clocked before regB • “i” passes through the CL with little delay (tpdmin) • “o” arrives before the rising Ck’ causes the data to be latched • Cannot be fixed by changing the frequency: you have a rock instead of a chip [Figure: regA (clocked by Ck) driving combinational logic from input “i” to output “o” into regB (clocked by Ck’); waveforms of Ck, Ck’, i, and o showing Tskew and Tpdmin, with “o” marked “Too early!”] Courtesy K. Yang, UCLA

  17. Summary: Timing Constraints • Synchronous design = combinational logic + sequential elements • For each flip-flop: • Tmax + Tsetup < Tcycle - Tskew • Tmin > Thold + Tskew • Tmax: longest data propagation path delay • Tmin: shortest data propagation path delay [Figure: two flip-flops (D, Q) with combinational logic between them; CLK and DATA waveforms marking Tcycle, Tsetup, and Thold]

  18. Clock Identification • Partition the design • Clock network • Clock definition • Derived clock • Clock groups • Clock delay (skew) calculation • Timing constraints exist between clocks with a common divisor frequency • Data paths with timing constraints [Figure: two flip-flops (D, Q) with combinational logic between them; clocks CLK1 to CLK4, with a /8 divider producing a derived clock]

  19. Timing Graph • Data paths with timing constraints • Starting from primary inputs/FF outputs • Ending at primary outputs/FF inputs • Represented by a labeled directed graph G = <V,E> • Timing node V ~ pin/primary input/output • Timing edge E ~ gate/wire delay • (Timing arc ~ gate delay) [Figure: example circuit and its timing graph with nodes A, B, C, U, V, X, Y, Z, F, labeled with gate delays and wire delays of .15 and .20] Courtesy K. Keutzer et al. UCB

  20. Characterization • Static analysis = vector-less worst case analysis • Graph based path propagation • Logic functions are not evaluated • Pre-characterized look-up tables for gate delays • Min/max/rise/fall • Characterized interconnect delays • On-the-fly delay calculation • SDF (standard delay format) annotation [Figure: gate with inputs X and Y and output Z, with delay labels]

  21. Compute Longest Path (Kirkpatrick 1966, IBM JRD) [Figure: example timing graph with nodes A, B, C, U, V, X, Y, Z, F and labeled delays, fed from the super-source Origin]
      Compute longest path in a DAG G = <V,E,delay,Origin>
      // delay is the set of labels, Origin is the super-source of the DAG
      Forward-prop(W) {
        for each vertex v in W
          for each edge <v,w> from v
            Final-delay(w) = max(Final-delay(w), Final-delay(v) + delay(w) + delay(<v,w>))
            if all incoming edges of w have been traversed, add w to W
      }
      Longest-path(G) {
        Forward-prop(Origin)
      }
      Courtesy K. Keutzer et al. UCB

  22. Compute Longest Path (Kirkpatrick 1966, IBM JRD), continued • Same timing graph and Forward-prop algorithm as on the previous slide • Dynamic programming • How to exclude a set of paths? Courtesy K. Keutzer et al. UCB
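
The forward propagation on slides 21 and 22 can be sketched in Python as follows. This is one possible reading of the pseudocode, accumulating the arrival time of the predecessor along each edge; the dictionary-based graph encoding and the small example graph (delays in integer picoseconds) are assumptions made for illustration. The sketch also assumes every node is reachable from the origin, as the super-source construction implies.

```python
from collections import defaultdict, deque


def longest_path(node_delay, edges, origin):
    """Latest arrival time at every node of a DAG, starting from origin.

    node_delay: dict node -> gate delay (missing nodes count as 0)
    edges:      dict (v, w) -> wire delay of edge v -> w
    """
    succ = defaultdict(list)
    pending = defaultdict(int)  # incoming edges not yet traversed
    for (v, w) in edges:
        succ[v].append(w)
        pending[w] += 1

    final = {origin: node_delay.get(origin, 0)}
    work = deque([origin])
    while work:
        v = work.popleft()
        for w in succ[v]:
            candidate = final[v] + edges[(v, w)] + node_delay.get(w, 0)
            final[w] = max(final.get(w, float("-inf")), candidate)
            pending[w] -= 1
            if pending[w] == 0:  # all fan-in seen: w's value is final
                work.append(w)
    return final


# Hypothetical graph: inputs a and b, gate x (200 ps), gate f (100 ps).
node_delay = {"x": 200, "f": 100}
edges = {
    ("origin", "a"): 0, ("origin", "b"): 0,
    ("a", "x"): 20, ("b", "x"): 15,
    ("x", "f"): 20, ("b", "f"): 20,
}
arrival = longest_path(node_delay, edges, "origin")
print(arrival["x"])  # 220
print(arrival["f"])  # 340
```

Each node enters the work list exactly once, after all of its incoming edges have been relaxed, so the run time is linear in the number of edges. This is the dynamic-programming property noted on slide 22.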

  23. Timing Analysis Terminology • Actual arrival time (AAT): forward propagation • Required arrival time (RAT): backward propagation • Slack = RAT - AAT • A measure of how much timing margin exists at each node • Slack < 0 → timing violation • Can optimize a particular branch • Can trade slack for power, area, robustness • Critical path
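
A minimal Python sketch of the terminology on slide 23: AAT is propagated forward with max, RAT backward with min, and slack is their difference. The graph, the delays, and the required time of 4 at the output are hypothetical, and the sketch assumes the nodes are supplied in topological order and that every node reaches an endpoint.

```python
def arrival_times(order, fanin, delay, start_at):
    """Forward pass: AAT(n) = max over fan-in p of AAT(p) + delay(p, n)."""
    aat = dict(start_at)
    for n in order:
        for p in fanin.get(n, []):
            aat[n] = max(aat.get(n, float("-inf")), aat[p] + delay[(p, n)])
    return aat


def required_times(order, fanin, delay, required_at):
    """Backward pass: RAT(p) = min over fan-out n of RAT(n) - delay(p, n)."""
    rat = dict(required_at)
    for n in reversed(order):
        for p in fanin.get(n, []):
            rat[p] = min(rat.get(p, float("inf")), rat[n] - delay[(p, n)])
    return rat


order = ["a", "b", "x", "y", "out"]  # topological order
fanin = {"x": ["a", "b"], "y": ["b"], "out": ["x", "y"]}
delay = {("a", "x"): 3, ("b", "x"): 1, ("b", "y"): 2,
         ("x", "out"): 2, ("y", "out"): 1}

aat = arrival_times(order, fanin, delay, {"a": 0, "b": 0})
rat = required_times(order, fanin, delay, {"out": 4})
slack = {n: rat[n] - aat[n] for n in order}
print(slack)  # {'a': -1, 'b': 1, 'x': -1, 'y': 1, 'out': -1}
```

The nodes with the most negative slack (a, x, out) trace the critical path, while b and y have positive slack that could be traded for power or area.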

  24. Static Timing Analysis Flow • Construct timing graph • Partition clock domain (form path groups) • Ideal/propagated clock • Case analysis • AAT propagation • Levelization • Timing report • End points with violations • Path enumeration • Read in • design (LEF/DEF) • timing library (.lib) • timing constraints (GCF) • delay annotation (SDF) • Set up constraints • Annotated delays • IO path constraints • Single cycle setup/hold checks • Timing exceptions • False paths • Multi-cycle paths • Max delay constraints • Min delay constraints

  25. Timing Exceptions • False paths: topologically connected but logically impossible to enable • To enable a path • Logically: non-controlling values (e.g., 0 for OR gates, 1 for AND gates) at side inputs • Temporally: earlier signal transitions at side inputs

  26. False Path Representation • Abstracted graph • set_false_path -from {…} -through {…} … -through {…} -to {…} [Figure: abstracted graphs with “from” and “to” endpoints connected via “through” nodes]

  27. False Path Identification • Tagged timing analysis • Arrival times with the same tag are compared to find worst case • False path filtered [Figure: graph with nodes a, b, c, d annotated with arrival times and tags (arr: 1 tag: 0; arr: 2 tag: 2; arr: 3 tag: 3), next to the from/through/to exception from a to d in which b carries tag 2 and c carries tag 3]

  28. Handling Latch-Based Designs • Latch: level enabling sequential element • Transparent signal propagation • Time borrowing • Path delay of previous stage – Tborrow • Path delay of current stage + Tborrow [Figure: combinational logic, latch (D, Q), combinational logic; CLK and DATA waveforms marking the transparent phase and Tborrow]
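
Time borrowing on slide 28 can be illustrated with a simplified model: data that arrives after the latch opens but before it closes borrows the difference from the next stage. The Python sketch below assumes a single transparent window per cycle; `borrowed_time`, the window edges, and the delays (in picoseconds) are hypothetical.

```python
def borrowed_time(arrival, t_open, t_close, t_setup=0):
    """Time borrowed from the next stage by data arriving at a latch.

    The latch is transparent from t_open to t_close. Data arriving before
    t_open borrows nothing; data arriving inside the window borrows
    (arrival - t_open); later arrivals are a setup violation.
    """
    if arrival <= t_open:
        return 0
    if arrival <= t_close - t_setup:
        return arrival - t_open
    raise ValueError("data arrives after the latch closes: setup violation")


t_borrow = borrowed_time(arrival=620, t_open=500, t_close=1000)
print(t_borrow)        # 120
print(620 - t_borrow)  # 500: path delay of previous stage - Tborrow
print(300 + t_borrow)  # 420: path delay of current stage (300) + Tborrow
```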

  29. Counting Process Variation • Off-chip variation: two paths on a chip cannot use two different operating conditions (i.e., corners) at the same time for setup or hold analysis • Launchclock_latepath (max) + data_latepath (max) < captureclock_earlypath (max) + clock_period – setup • Launchclock_earlypath (min) + data_earlypath (min) > captureclock_latepath (min) + hold • On-chip variation: the software calculates the delay for one path based on maximum operating condition while calculating the delay for another path based on minimum operating condition for setup or hold checks • Statistical static timing analysis (SSTA) • Continuous pdf (probability distribution functions) • Or discrete corners pdf
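
The two inequalities on slide 29 can be turned into slack calculations, as in the Python sketch below. All delays (in picoseconds) are invented for illustration; the on-chip-variation numbers assume roughly a 10% spread around the single-corner values.

```python
def setup_slack(launch_clk_late, data_late, capture_clk_early, period, t_setup):
    """Setup check: launch clock + data must beat capture clock + period - setup."""
    return (capture_clk_early + period - t_setup) - (launch_clk_late + data_late)


def hold_slack(launch_clk_early, data_early, capture_clk_late, t_hold):
    """Hold check: launch clock + data must arrive after capture clock + hold."""
    return (launch_clk_early + data_early) - (capture_clk_late + t_hold)


# Single corner: launch and capture clock paths use the same delay.
print(setup_slack(200, 700, 200, period=1000, t_setup=50))  # 250
print(hold_slack(200, 100, 200, t_hold=30))                 # 70

# On-chip variation: late launch path against early capture path, and vice versa.
print(setup_slack(220, 770, 180, period=1000, t_setup=50))  # 140
print(hold_slack(180, 90, 220, t_hold=30))                  # 20
```

Both slacks shrink under on-chip variation because the launch and capture paths are no longer assumed to track each other.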

  30. Clock Re-convergence Pessimism Removal • Common part of two clock propagation paths cannot have two different path delays at the same time • Need to compute clock propagation delay from the branch point [Figure: two flip-flops (D, Q) with combinational logic between them; CLK tree with a common part, the launch branch analyzed at max delay and the capture branch at min delay]
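
A small Python sketch of the idea on slide 30, for a setup check: the shared clock segment is first counted late for the launch path and early for the capture path, and the difference is then credited back. The function name and all delays (in picoseconds) are hypothetical.

```python
def setup_slack_with_crpr(common_late, common_early,
                          launch_branch_late, capture_branch_early,
                          data_late, period, t_setup):
    """Return (pessimistic slack, slack after removing common-path pessimism)."""
    launch_clk = common_late + launch_branch_late
    capture_clk = common_early + capture_branch_early
    raw = (capture_clk + period - t_setup) - (launch_clk + data_late)
    # The common segment cannot be both late and early at the same time,
    # so the difference between its two delays is credited back.
    credit = common_late - common_early
    return raw, raw + credit


raw, corrected = setup_slack_with_crpr(
    common_late=110, common_early=90,
    launch_branch_late=110, capture_branch_early=90,
    data_late=770, period=1000, t_setup=50)
print(raw)        # 140
print(corrected)  # 160
```

Only the delay after the branch point remains subject to the min/max split, which is why the clock propagation delay has to be computed from the branch point.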

  31. Outline • Timing Analysis • Timing Requirements • Static Timing Analysis • Timing Correction

  32. Timing Correction • Driven by STA • “Incremental performance analysis backplane” • Two goals • Fix logic design rule violations • Fix timing problems DAC-2002, Physical Chip Implementation

  33. Logic Design Rules • Constraints of • Fanout • Slew rate • Load cap • Reduce timing look-up table extrapolation error • Control signal integrity • Transition degradation • Crosstalk noise • Supply voltage drop • Device reliability • Approaches • Resizing • Buffering • Cloning (copying cells)

  34. Timing Correction Approaches • Re-synthesis • Local synthesis transforms • Timing-driven placement • Critical net weighting • Timing-driven routing • Net ordering • Buffering • Topology optimization • Post-route optimization (IPO) • Re-routing • Re-timing and useful clock skew • Sizing • Buffering DAC-2002, Physical Chip Implementation

  35. Local Synthesis Transforms • Resize cells • Buffer or clone to reduce load on critical nets • Decompose large cells • Swap connections on commutative pins or among equivalent nets • Move critical signals forward • Pad early paths • Area recovery DAC-2002, Physical Chip Implementation

  36. Transform Example: Double Inverter Removal [Figure: path before the transform with Delay = 4 and after the transform with Delay = 2] DAC-2002, Physical Chip Implementation

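  37. Resizing [Figure: gate with inputs a, b and an undecided size (?) driving loads d, e, f labeled 0.2, 0.2, 0.3; candidate cells A and C labeled 0.035 and 0.026] DAC-2002, Physical Chip Implementation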

  38. Cloning [Figure: gate with inputs a, b driving loads d, e, f, g, h of 0.2 each, replaced by two copies A and B that share the loads] DAC-2002, Physical Chip Implementation

  39. Buffering [Figure: gate with inputs a, b driving loads d, e, f, g, h of 0.2 each, before and after inserting a buffer B with input load 0.1 in front of part of the fan-out] DAC-2002, Physical Chip Implementation

  40. Redesign Fan-in Tree [Figure: inputs with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 combined by gates of delay 1; one tree gives Arr(e)=6, the restructured tree gives Arr(e)=5] DAC-2002, Physical Chip Implementation
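
The arrival times on slide 40 (4, 3, 1, 0 with unit-delay gates) can be reproduced with a short Python sketch. It assumes two-input gates and that the slower tree is the balanced pairing (a,b), (c,d); the restructured tree is built with a greedy, Huffman-style rule that always combines the two earliest signals. Both function names are hypothetical.

```python
import heapq


def balanced_tree_arrival(arrivals, gate_delay=1):
    """Pair neighbours level by level, ignoring arrival times."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired signal passes to the next level without a gate.
            nxt.append(max(pair) + gate_delay if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


def greedy_tree_arrival(arrivals, gate_delay=1):
    """Repeatedly combine the two earliest signals with a two-input gate."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


arr = [4, 3, 1, 0]  # Arr(a), Arr(b), Arr(c), Arr(d)
print(balanced_tree_arrival(arr))  # 6
print(greedy_tree_arrival(arr))    # 5
```

The greedy version places the late signals a and b close to the output, so they pass through fewer gates.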

  41. Redesign Fan-out Tree [Figure: two fan-out trees with labeled delays, one with Longest Path = 5 and one with Longest Path = 4; annotation: slowdown of buffer due to load] DAC-2002, Physical Chip Implementation

  42. Decomposition DAC-2002, Physical Chip Implementation

  43. Swap Commutative Pins • Simple sorting on arrival times and delay works [Figure: gate with inputs a, b, c, labeled with arrival times and pin delays, before and after swapping the connections] DAC-2002, Physical Chip Implementation
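
The sorting rule on slide 43 can be sketched in Python: connect the latest-arriving signal to the fastest pin, the next latest to the next fastest, and so on. The signal names, pin names, and delay values are hypothetical and do not come from the slide figure.

```python
def assign_pins(arrivals, pin_delays):
    """Match signals (latest first) to commutative pins (fastest first).

    Returns the signal -> pin mapping and the resulting output arrival time.
    """
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(pin_delays, key=pin_delays.get)
    mapping = dict(zip(signals, pins))
    output = max(arrivals[s] + pin_delays[p] for s, p in mapping.items())
    return mapping, output


arrivals = {"a": 0, "b": 1, "c": 2}
pin_delays = {"p0": 1, "p1": 1, "p2": 2}

mapping, output = assign_pins(arrivals, pin_delays)
print(mapping)  # {'c': 'p0', 'b': 'p1', 'a': 'p2'}
print(output)   # 3

# For comparison, connecting a, b, c to p0, p1, p2 in order gives max(1, 2, 4) = 4.
```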

  44. Logic Restructuring 1 • Nodes in critical section that fan out outside of critical section are duplicated [Figure: network with nodes a, b, c, d, e, h and output f, with late input signals marked, before and after the critical section is collapsed into a single node] Slides courtesy of Keutzer

  45. Logic Restructuring 2 • Place timing-critical nodes closer to output • Make them pass through fewer gates • After collapse, a divisor is selected such that substituting k into f places critical signals c and d closer to output [Figure: critical section over inputs a, b, c, d, e collapsed into one node f, then factor k re-extracted as a divisor so that c and d sit close to the output] Slides courtesy of Keutzer

  46. Summary of Local Synthesis Transforms • Variety of methods for delay optimization • No single technique dominates • The one with more tricks wins? No! • Technology dependent (for gate delay) • Differ with cell libraries • Methodology dependent (for wire delay) • Need to predict placement and routing result • Uncertainty! • Pros: large potential improvement • Cons: less predictable, more expensive

  47. Summary of Local Synthesis Transforms • Work smoothly in a physical synthesis flow • Tight integration with placement and routing • Need a good framework for evaluating and processing different transforms • Accurate, fast timing engine with incremental analysis capability • don’t want to retime the whole design for each local transform • Simultaneous min and max delay analysis • How does fixing the setup violation affect the existing hold checks?

  48. Timing Correction Approaches • Re-Synthesis • Local Transformation • Timing-Driven Placement • Timing-Driven Routing • Post-Route Optimization (IPO) • Re-Routing • Re-Timing and Useful Clock Skew • Sizing • Buffering

  49. Reducing Crosstalk Effect • Shielding • Effective for short range capacitive coupling • Not for long range inductive coupling • Net ordering (wire swizzling)

  50. Reducing Crosstalk Effect • Shielding • Net ordering • Gate sizing • A strong driver is less sensitive to crosstalk • But more likely to project crosstalk to its neighbors
