1 / 28

Low-power Clock Trees for CPUs

Low-power Clock Trees for CPUs. Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov EECS Department, University of Michigan ICCAD 2010. Outline. Introduction Preliminaries Terminologies Algorithms Initial tree construction and buffer insertion. Robustness improvements. Skew optimizations.

lynna
Download Presentation

Low-power Clock Trees for CPUs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Low-power Clock Trees for CPUs Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov EECS Department, University of Michigan ICCAD 2010

  2. Outline • Introduction • Preliminaries • Terminologies • Algorithms • Initial tree construction and buffer insertion. • Robustness improvements. • Skew optimizations. • Experimental results and Conclusions

  3. Introduction • Advanced technology nodes result in tight skew limitations for its PVT (process, voltage, temperature) variation. • Clock network and sequential element consume up to 70% of CPU power. • Clock tree is more susceptible to variations than mesh while more power efficient.

  4. Introduction (cont.) • Previous clock tree designing method rely on symmetric and regular tree topologies. • These methods are hard to dealing with obstacles, non-uniform sink distributions, and varied sink capacitances. • Commercial tools handle the capacitance well, but the skew doesn’t.

  5. Contributions • A tree solution with better result than those mesh or mixed solutions. • The notion of local-skew slack. • A tabular technique to estimate the impact of variations on skew between two sinks. • A path-based technique to enhance the robustness of a clock tree to PVT variations. • A time-budgeting algorithm and some tuning techniques for tuning the clock tree.

  6. Preliminaries • Clock network in microprocessors

  7. Preliminaries (cont.) • Clock tree construction. • H-tree, spine • DME + (MMM or GMA) • Others • Some terms and definitions.

  8. Definition of global skew • Let Ψ be the clock tree, Si be a sink of tree • Skew between two sinks: • Global skew: • where λ(Si) is the clock latency of Si. • Nominal value: the value neglecting variations.

  9. Definition of local skew • Limit the distance of sink pairs. • Two sinks within a given distance bound • Local skew:

  10. The worst local skew with variation • Let △>0 be the local skew distance bound, νbe the variation model, y be the target yield rate, and f(t) be the CDF of local skew with variations.

  11. Local skew slack • Global skew slack of a sink • The minimum amount of additional delay for a sink to ensure the clock tree satisfies the local skew constraint • Local skew slack of a sink • With additional distance bound • Local skew slack of a tree edge • The minimum one of its downstream sinks

  12. An example

  13. Considering variations • Tabular method considering wirelength, # buffers, largest type of buffer. • Once it is built, it is reusable for another tree • The local skew considering variations

  14. Impact of variations on local skew

  15. Flow

  16. Algorithms • Initial tree construction and buffer insertion. • Robustness improvements. • Skew optimizations. • Wire snaking • Delay buffer insertion

  17. Initial tree construction and buffer insertion • DME + fast variant of van Ginneken’s algorithm. • Since total # buffer is not known, estimate it by the total length of wire, then compute the initial buffer type • Do buffer insertion and get the exact # buffers so as the buffer type is more accurate

  18. Robustness improvements • Perform after initial buffer insertion. • Nominal local skew should be estimated. • From the above theorem, the minimum size of buffer type is selected to reduce capacitance and thus all buffers between these two sinks will be replaced by the selected one.

  19. Wire snaking • Ttarget – the amount of time should be delayed to achieve legal local skew. • Lsn – snaking length determined by snaking algorithm. • Tactual – actual delay resulted from Lsn. • Lideal – the length satisfies Ttarget = Tactual. • Construct a look-up table for Lsn(e) estimation, then takes iterations to make it legal.

  20. Wire snaking (cont.) • Wire snaking at buffer output nodes is much more accurate than at branch. • If the worst slew rate is more than 70% of the given slew limit, the target node is excluded from wore snaking.

  21. Delay buffer insertion • Highly unbalanced sink capacitances or layout obstacles in a sink clusters can result in significant local skew - an alternative technique is needed because wire snaking is inapplicable. • Do delay buffer insertion when

  22. Delay buffer insertion (cont.) • For each path from the buffer to the sinks, inserting at most one buffer is sufficient since the wire snaking algorithm can be invoked again.

  23. Experimental results • Cantango 2.0 in C++. • Slew limitation – 100ps • Target yield – 95% • 12hr runtime limit.

  24. Contest benchmarks

  25. Comparison in results • About 4x less capacitances in average. • Fit the local skew constraint in all cases. • Fit the target yield rate in all cases.

  26. PDF for worst local skew comparison

  27. Trade-off between capacitance and variational skew • Tight local skew leads to usage of large buffer, and hence increasing the capacitance. Loose one leads to capacitance saving.

  28. Conclusions • Proposes a tree solution for CPU clock routing that improves power consumption under tight skew constraint with variations. • Local skew slack. • A model for variational skew, a path-based technique to enhance robustness, a new time-budgeting algorithm for clock-tree tuning and accurate optimizations that satisfy budgets. • Very attractive results.

More Related