1 / 27

Power-Optimal Pipelining in Deep Submicron Technology

ISLPED 2004 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanovi ć Computer Architecture Group, MIT CSAIL. Traditional Pipelining. Goal: Maximum performance. Vdd. Setup. Clk-Q. Propagation Delay. Clk. Clk. Clk.

tavia
Download Presentation

Power-Optimal Pipelining in Deep Submicron Technology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ISLPED 2004 8/10/2004 Power-Optimal Pipelining in Deep Submicron Technology Seongmoo Heo and Krste Asanović Computer Architecture Group, MIT CSAIL

  2. Traditional Pipelining • Goal: Maximum performance Vdd Setup Clk-Q Propagation Delay Clk Clk Clk

  3. Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Time Slack Clk

  4. Pipelining as a Low-Power Tool • Goal: Low-Power, Fixed Throughput Vdd Setup Clk-Q Propagation Delay Clk Time Slack Clk Traded for Power (supply voltage scaling) Time Slack Clk

  5. Pipelining as a Low-Power Tool Power * Clock frequency fixed Flip-flop Power Overhead Pipelining Time slack Delay

  6. Pipelining as a Low-Power Tool Power * Clock frequency fixed Power Saving Supply voltage scaling Delay

  7. Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining

  8. Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Too shallow pipelining Delay

  9. Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Delay

  10. Power-Optimal Pipelining • Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Too deep pipelining Too shallow pipelining Optimal pipelining Optimal Power Saving Delay

  11. Contribution • Pipelining is an old idea. • Research focus has been on performance impact of pipelining. • Idea of using pipelining [Chandrakasan ’92] to lower power has not been fully explored in deep submicron technology. • Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating

  12. Bottom-to-Top Approach • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (clock-gated) active active inactive Time Idle Power Component Leakage Power Component Switching Power Component

  13. Bottom-to-Top Approach • Impact of pipelining on power component • Impact of pipelining on total power (with/without clock-gating) Power Total Power (not clock-gated) active active inactive Time *Idle power = power consumed when circuit is idle and not clock-gated Idle Power Component Leakage Power Component Switching Power Component

  14. Methodology • Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant • Test bench • BPTM (Berkeley Predictive Technology Model) 70nm process: • LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24) • Hspice simulation at 100°C, Clock = 2 GHz Baseline N FO4 inverters (N = 2 ~ 24) TG flip-flops TG flip-flops One Pipeline Stage

  15. Pipelining and Switching Power:Analytical Trend Optimal Saving O(N2) Flip-flop overhead Switching Power Quadratic reduction of logic switching power O(1/N)  Vdd2  N2 Optimal FO4 Number of FO4 per stage, N

  16. Pipelining and Leakage Power:Analytical Trend Optimal Saving O(1/N) O(N ) (1<< 2) Flip-flop overhead Superlinear reduction of logic leakage power Leakage Power Optimal FO4  Vdd * e(Vdd) N DIBL effect Number of FO4 per stage, N

  17. Pipelining and Idle Power:Analytical Trend • Clock-gating is not always possible • Increased control complexity • insufficient setup time of clock enable signal • Leakage Power + Flip-flop Switching Power • Between leakage power scaling and flip-flop switching power scaling depending on leakage level

  18. Pipelining and Idle Power:Analytical Trend Leakage Power Scale Flip-flop Switching Power Scale Optimal Saving Idle Power Optimal Saving O(N) Optimal FO4 Linear reduction of Flip-flop switching power Relative Power O(N ) (1<< 2) O(1/N)  1/N * Vdd2  N Optimal FO4 O(1/N) Number of FO4 per stage, N Number of FO4 per stage, N

  19. Simulation Results:Power Components Fixed Throughput @ 2 GHz N = Number of FO4 inverters per stage (Not including flip-flop delay) N* = Optimal N Saving* = Optimal power saving by pipelining

  20. Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating *2 GHz *Flip-flop delay not included in optimal FO4 relative power relative power activity factor activity factor

  21. Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating Idle Power Leakage Power relative power relative power Switching Power Switching Power activity factor activity factor

  22. Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating Clock Gating relative power relative power LVT activity factor activity factor

  23. Discussion • LVT can be fast and power-efficient • enables lower Vdd • Flip-flop delay more important than flip-flop power for power-optimal pipelining

  24. Limitation of This Work

  25. Conclusion • Pipelining is an effective low-power tool when used to support voltage scaling in digital system implementing highly parallel computation. • Optimal Logic Depth: 6-8 FO4 • ~ 8-10 FO4 including flip-flop delay • Optimal Power Saving: 55 – 80% • It depends on Vth, AF, Clock-Gating • Insights: • Pipelining is more effective with High AF • Pipelining is most effective at saving switching power • Pipelining is more effective with lower Vth • Except for when leakage power is dominant. • Pipelining is more effective with clock-gating • reduced flip-flop overhead.

  26. Acknowledgments • Thanks to SCALE group members and anonymous reviewers • Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

  27. BACKUP SLIDES

More Related