270 likes | 450 Views
ISLPED 2004 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanovi ć Computer Architecture Group, MIT CSAIL. Traditional Pipelining. Goal: Maximum performance. Vdd. Setup. Clk-Q. Propagation Delay. Clk. Clk. Clk.
E N D
ISLPED 2004 8/10/2004 Power-Optimal Pipelining in Deep Submicron Technology Seongmoo Heo and Krste Asanović Computer Architecture Group, MIT CSAIL
Traditional Pipelining • Goal: Maximum performance Vdd Setup Clk-Q Propagation Delay Clk Clk Clk
Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Time Slack Clk
Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Traded for Power (supply voltage scaling) Time Slack Clk
Pipelining as a Low-Power Tool Power * Clock frequency fixed Flip-flop Power Overhead Pipelining Time slack Delay
Pipelining as a Low-Power Tool Power * Clock frequency fixed Power Saving Supply voltage scaling Delay
Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining
Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too shallow pipelining Delay
Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Delay
Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Optimal pipelining Optimal Power Saving Delay
Contribution • Pipelining is an old idea. • Research focus has been on performance impact of pipelining. • Idea of using pipelining [Chandrakasan ’92] to lower power has not been fully explored in deep submicron technology. • Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating
Bottom-to-Top Approach • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (clock-gated) active active inactive Time Idle Power Component Leakage Power Component Switching Power Component
Bottom-to-Top Approach • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (not clock-gated) active active inactive Time *Idle power = power consumed when circuit is idle and not clock-gated Idle Power Component Leakage Power Component Switching Power Component
Methodology • Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant • Test bench • BPTM (Berkeley Predictive Technology Model) 70nm process: • LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24) • Hspice simulation at 100°C, Clock = 2 GHz Baseline N FO4 inverters (N = 2 ~ 24) TG flip-flops TG flip-flops One Pipeline Stage
Pipelining and Switching Power:Analytical Trend Optimal Saving O(N2) Flip-flop overhead Switching Power Quadratic reduction of logic switching power O(1/N) Vdd2 N2 Optimal FO4 Number of FO4 per stage, N
Pipelining and Leakage Power:Analytical Trend Optimal Saving O(1/N) O(N ) (1<< 2) Flip-flop overhead Superlinear reduction of logic leakage power Leakage Power Optimal FO4 Vdd * e(Vdd) N DIBL effect Number of FO4 per stage, N
Pipelining and Idle Power:Analytical Trend • Clock-gating is not always possible • Increased control complexity • insufficient setup time of clock enable signal • Leakage Power + Flip-flop Switching Power • Between leakage power scaling and flip-flop switching power scaling depending on leakage level
Pipelining and Idle Power:Analytical Trend Leakage Power Scale Flip-flop Switching Power Scale Optimal Saving Idle Power Optimal Saving O(N) Optimal FO4 Linear reduction of Flip-flop switching power Relative Power O(N ) (1<< 2) O(1/N) 1/N * Vdd2 N Optimal FO4 O(1/N) Number of FO4 per stage, N Number of FO4 per stage, N
Simulation Results:Power Components Fixed Throughput @ 2 GHz N = Number of FO4 inverters per stage (Not including flip-flop delay) N* = Optimal N Saving* = Optimal power saving by pipelining
Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating *2 GHz *Flip-flop delay not included in optimal FO4 relative power relative power activity factor activity factor
Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating Idle Power Leakage Power relative power relative power Switching Power Switching Power activity factor activity factor
Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating relative power relative power LVT activity factor activity factor
Discussion • LVT can be fast and power-efficient • enables lower Vdd • Flip-flop delay more important than flip-flop power for power-optimal pipelining
Conclusion • Pipelining is an effective low-power tool when used to support voltage scaling in digital system implementing highly parallel computation. • Optimal Logic Depth: 6-8 FO4 • ~ 8-10 FO4 including flip-flop delay • Optimal Power Saving: 55 – 80% • It depends on Vth, AF, Clock-Gating • Insights: • Pipelining is more effective with High AF • Pipelining is most effective at saving switching power • Pipelining is more effective with lower Vth • Except for when leakage power is dominant. • Pipelining is more effective with clock-gating • reduced flip-flop overhead.
Acknowledgments • Thanks to SCALE group members and anonymous reviewers • Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.