
Drowsy Caches: Simple Techniques for Reducing Leakage Power

ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 2002, VOL 29, pages 148-157. Drowsy Caches: Simple Techniques for Reducing Leakage Power. Authors: Krisztián Flautner (ARM Ltd); Nam Sung Kim, Steve Martin, David Blaauw & Trevor Mudge (Advanced Computer Architecture Lab, The University of Michigan)

Presentation Transcript


  1. ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 2002, VOL 29, pages 148-157 Drowsy Caches: Simple Techniques for Reducing Leakage Power Authors: Krisztián Flautner (ARM Ltd); Nam Sung Kim, Steve Martin, David Blaauw & Trevor Mudge (Advanced Computer Architecture Lab, The University of Michigan). In-class presentation on 11/24/2008 by: Harshit Khanna (1200127817), Arizona State University CSE 520 Advanced Computer Architecture

  2. Outline • Summary • Motivation • Circuit Techniques • Traditional Circuit Techniques • Gated-VDD • ABB-MTCMOS • Dynamic VDD Scaling (DVS) • Comparison of various low-leakage circuit techniques • Proposed circuit technique • Policies • Implementation of drowsy cache line • Additions to the traditional cache line • Basic working description • Working set characteristics • Observations • Results • Policy evaluation • Policy evaluation • Test Setup • Energy consumption • Future work

  3. Summary • The simplest policy, in which cache lines are periodically put into a low-power mode without regard to their access histories, can reduce the cache's static power consumption by more than 80%. • Total energy consumed in the cache can be reduced by an average of 54%. • The fraction of leakage energy is reduced from an average of 76% in projected conventional caches to an average of 50% in the drowsy cache. • Performance degradation: 9% for crafty and < 4% for equake.

  4. Motivation • Increasing speed and density lead to increasing leakage (static) power consumption. • Leakage power accounts for 15%-20% of the total power on chips. • As process technology moves below 0.1 micron, static power consumption is expected to increase exponentially, putting it on the path to dominating the total power used by the CPU. • The on-chip caches are one of the main candidates for leakage reduction, since they contain a significant fraction of the processor's transistors.

  5. Circuit Techniques

  6. Traditional Circuit Techniques • Gated-VDD • Working: • Reduces leakage power by using a high-threshold (high-Vt) transistor to turn off the power to the memory cell when the cell is set to low-power mode. • Advantages: • Leakage significantly reduced. • Disadvantages: • Loses any information stored in the cell when switched into low-leakage mode. • Performance penalty. • Requires special high-Vt devices for the control logic.

  7. Traditional Circuit Techniques (contd.) • ABB-MTCMOS • Working: • The threshold voltages of the transistors in the cell are dynamically increased when the cell is set to drowsy mode, by raising the source-to-body voltage of the transistors in the circuit. • Advantages: • Leakage significantly reduced. • Disadvantages: • The supply voltage of the circuit is increased, thereby offsetting some of the gain in total leakage power. • Requires special high-Vt devices for the control logic.

  8. Dynamic VDD Scaling (DVS) • Disadvantages: • Process-variation dependent. • More susceptible to noise. • Advantages: • Retains cell information in low-power mode. • Fast switching between power modes. • Easy to implement. • More power reduction than ABB-MTCMOS.

  9. Comparison of various low-leakage circuit techniques

  10. Proposed circuit technique • Choose between two different supply voltages in each cache line. • The DVS technique has been used in the past to trade off dynamic power consumption and performance. • Here, voltage scaling is exploited to reduce static power consumption instead. • Due to short-channel effects in deep-submicron processes, leakage current decreases significantly with voltage scaling.
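
The last point can be illustrated with the textbook subthreshold-leakage model, in which drain-induced barrier lowering (DIBL), a short-channel effect, makes the off-state current of a transistor grow exponentially with its drain-source voltage. Leakage power is then the supply voltage times that current, so it falls faster than the voltage alone. This is a sketch only: the slope factor, DIBL coefficient and the 1.0 V / 0.3 V supply pair are assumed values, not figures from the slides, and the function names are hypothetical.

```python
import math

# Assumed, illustrative parameters (not taken from the slides or the paper).
THERMAL_VOLTAGE = 0.026     # kT/q near room temperature, in volts
SUBTHRESHOLD_SLOPE_N = 1.5  # assumed subthreshold slope factor n
DIBL_ETA = 0.1              # assumed DIBL coefficient (Vth drop per volt of Vds)


def relative_leakage_current(vdd: float, vdd_ref: float) -> float:
    """Off-state leakage current at vdd relative to vdd_ref.

    For an off transistor (Vgs = 0, Vds = vdd) the subthreshold model gives
    I_leak proportional to exp(eta * vdd / (n * vT)); the (1 - exp(-Vds / vT))
    factor is dropped because it is close to 1 at these voltages.
    """
    exponent = DIBL_ETA * (vdd - vdd_ref) / (SUBTHRESHOLD_SLOPE_N * THERMAL_VOLTAGE)
    return math.exp(exponent)


def relative_leakage_power(vdd: float, vdd_ref: float) -> float:
    """Leakage power P = Vdd * I_leak at vdd, relative to the same at vdd_ref."""
    return (vdd / vdd_ref) * relative_leakage_current(vdd, vdd_ref)


if __name__ == "__main__":
    # Assumed active supply 1.0 V versus assumed drowsy supply 0.3 V.
    ratio = relative_leakage_power(0.3, 1.0)
    print(f"drowsy/active leakage power ratio: {ratio:.3f}")
```

Under these assumed parameters the ratio lands well below the 0.3 voltage ratio, which is the effect the slide relies on; actual savings depend on the process.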

  11. Policies

  12. Implementation of the drowsy cache line • Target: L1 data caches. • All lines in an L2 cache can be kept in drowsy mode without significant impact on performance.

  13. Additions to the cache line • Word-line gating circuit: • Prevents accesses when the line is in drowsy mode, since unchecked accesses to a drowsy line could destroy the memory's contents. • Voltage controller: • Determines the operating voltage of the array of memory cells in the cache line. • Switches the array voltage between the high (active) and low (drowsy) supply voltages depending on the state of the drowsy bit. • Drowsy bit: • Controls the voltage supplied to the memory cells.

  14. Basic working description • If a drowsy cache line is accessed, its drowsy bit is cleared, and consequently its supply voltage is switched to high VDD. • The word-line gating circuit prevents accesses in drowsy mode: the supply voltage of a drowsy line is lower than the bit-line precharge voltage, so unchecked accesses to a drowsy line could destroy the memory's contents. • Whenever a cache line is accessed, the cache controller checks the line's voltage state by reading the drowsy bit. • If the accessed line is in normal mode, its contents are read without any performance loss, because the drowsy bit can be read concurrently with the tag read and comparison. • If the accessed line is in drowsy mode, the discharge of the bit lines of the memory array is prevented (it could read out incorrect data). The line is woken up automatically during the next cycle, and the data can be accessed during the following cycles.
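
The access sequence above, together with the per-line additions from slide 13 (drowsy bit, voltage controller, word-line gating), can be sketched as a behavioural Python model. It is a hypothetical software model, not the circuit: the class and function names are invented, and the one-cycle costs are assumptions chosen so that a drowsy access takes the three cycles mentioned on the Observations slide.

```python
from dataclasses import dataclass

# Assumed cycle costs: an awake hit takes one cycle, and a drowsy line is
# woken during the cycle after the blocked access attempt.
ACCESS_CYCLES = 1
WAKE_CYCLES = 1


@dataclass
class DrowsyLine:
    data: bytes
    drowsy: bool = False  # drowsy bit: True means the line sits at the low supply

    @property
    def supply(self) -> str:
        """Voltage controller output, selected purely by the drowsy bit."""
        return "low" if self.drowsy else "high"


def read(line: DrowsyLine) -> tuple[bytes, int]:
    """Read one line and return (data, cycles spent)."""
    if not line.drowsy:
        # Normal mode: the drowsy bit is read alongside the tag compare,
        # so no extra latency is added.
        return line.data, ACCESS_CYCLES
    # Drowsy mode: word-line gating blocks the first attempt so the bit lines
    # cannot discharge into cells held at the low supply.
    cycles = ACCESS_CYCLES
    # Clearing the drowsy bit makes the voltage controller switch to high VDD.
    line.drowsy = False
    cycles += WAKE_CYCLES
    # The line is awake now, so the retried access succeeds.
    cycles += ACCESS_CYCLES
    return line.data, cycles


def put_to_sleep(line: DrowsyLine) -> None:
    """Set the drowsy bit; the contents are retained at the low supply."""
    line.drowsy = True
```

For example, reading a line created with `DrowsyLine(b"\x2a" * 32, drowsy=True)` costs three cycles the first time and one cycle on the next read, since the first read leaves the line awake until a policy calls `put_to_sleep` again.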

  15. Working set characteristics • ExecFactor: expected worst-case execution time increase for the baseline algorithm • accs: the number of accesses • wakelatency: wakeup latency (1 cycle) • accsperline: the number of accesses per line • memimpact: how much impact a single memory access has on overall performance • Assumption: an increase in cache access latency equals an increase in execution time, so memimpact is set to 1.
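
The ExecFactor equation itself does not survive in this transcript. As a rough, hypothetical reading of the variables only (not the authors' equation), one can assume that every distinct line touched starts drowsy and pays exactly one wakeup, so the number of wakeups is accs / accsperline and each costs wakelatency × memimpact cycles. The function names and the `base_cycles` baseline length are invented for illustration.

```python
def worst_case_extra_cycles(accs: int, accsperline: float,
                            wakelatency: int = 1, memimpact: float = 1.0) -> float:
    """Hypothetical worst-case cycles added by waking drowsy lines.

    Assumes each of the accs / accsperline distinct lines pays one wakeup;
    memimpact is the fraction of that latency visible in execution time.
    """
    if accsperline <= 0:
        raise ValueError("accsperline must be positive")
    wakeups = accs / accsperline
    return wakeups * wakelatency * memimpact


def worst_case_slowdown(accs: int, accsperline: float, base_cycles: int,
                        wakelatency: int = 1, memimpact: float = 1.0) -> float:
    """Extra cycles as a fraction of a hypothetical baseline run length."""
    extra = worst_case_extra_cycles(accs, accsperline, wakelatency, memimpact)
    return extra / base_cycles
```

With the slide's values (wakelatency = 1, memimpact = 1), this reading says the penalty shrinks as accsperline grows: lines reused many times per window amortize their single wakeup.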

  16. Observations • Should tags be put into drowsy mode along with the data? • In both cases, no extra latency is incurred when an awake line is accessed. • A drowsy access takes at least three cycles to complete. • In direct-mapped caches there is no performance advantage to keeping the tags awake: there is only one possible line for each index, so if that line is drowsy, it must be woken up immediately to be accessed.
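
One way to see the direct-mapped argument is to count how many lines must be woken on an access to a drowsy set. The sketch below encodes an assumed reading of the two tag options: with awake tags the tag comparison identifies the single matching way, while with drowsy tags every tag in the set has to be woken before it can be compared. The function is hypothetical and ignores the energy and latency of each wakeup.

```python
def lines_woken(associativity: int, drowsy_tags: bool) -> int:
    """Hypothetical count of lines woken for one access to a drowsy set."""
    if associativity < 1:
        raise ValueError("associativity must be at least 1")
    if drowsy_tags:
        # Tags are drowsy too: all ways must be woken to compare their tags.
        return associativity
    # Tags stay awake: the compare finds the hit way, and only it is woken.
    return 1
```

For associativity 1 both options wake exactly one line, consistent with the slide's conclusion that keeping tags awake brings no advantage in a direct-mapped cache; the options only diverge for set-associative caches.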

  17. Results • The working set, i.e. the fraction of unique cache lines accessed during an update window, is relatively small. • On most benchmarks, more than 90% of the lines can be in drowsy mode at any one time. • Performance degradation: 9% for crafty and < 4% for equake. • Advantages: • Significantly reduces the static power consumption of the cache. • Prediction techniques to control the drowsy cache are not necessary if the cache can transition between drowsy and awake modes relatively quickly.

  18. Policy evaluation

  19. Policy evaluation • The following parameters can be varied: • Update window size: specifies, in cycles, how frequently decisions are made about which lines to put into drowsy mode. • Simple or noaccess policy: the policy that uses no per-line access history is referred to as the simple policy. In this case, all lines in the cache are put into drowsy mode periodically (the period is the window size). The noaccess policy means that only lines that have not been accessed in a window are put into drowsy mode. • Awake or drowsy tag: specifies whether tags in the cache may be drowsy or not. • Transition time: the number of cycles for waking up or putting to sleep cache lines. The authors only consider 1- or 2-cycle transition times, since their circuit simulations indicate that these are reasonable assumptions.
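
A toy trace-driven model makes the difference between the simple and noaccess policies concrete. It is a sketch under stated assumptions, not the authors' simulator: one access per cycle, all lines start awake, transition time is ignored, and the `simulate` function and its parameters are hypothetical.

```python
from typing import Sequence


def simulate(trace: Sequence[int], num_lines: int, window: int, policy: str) -> int:
    """Count drowsy accesses (accesses that find their line drowsy).

    trace[i] is the line index accessed in cycle i. At every window boundary:
      - "simple":   every line is put into drowsy mode
      - "noaccess": only lines not accessed during the last window
    A drowsy line is woken when it is accessed.
    """
    if policy not in ("simple", "noaccess"):
        raise ValueError(f"unknown policy: {policy}")
    if window <= 0:
        raise ValueError("window must be positive")
    drowsy = [False] * num_lines
    accessed_this_window = [False] * num_lines
    drowsy_accesses = 0
    for cycle, line in enumerate(trace):
        if cycle > 0 and cycle % window == 0:
            # Window boundary: apply the policy, then reset access history.
            for i in range(num_lines):
                if policy == "simple" or not accessed_this_window[i]:
                    drowsy[i] = True
                accessed_this_window[i] = False
        if drowsy[line]:
            drowsy_accesses += 1
            drowsy[line] = False  # wake the line on access
        accessed_this_window[line] = True
    return drowsy_accesses
```

For a trace that hammers one line, the simple policy pays one wakeup per window while noaccess pays none; the simple policy trades those extra wakeups for not having to keep any per-line access history.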

  20. Test setup • The authors use various benchmarks from the SPEC2000 suite, simulated on SimpleScalar with the Alpha instruction set. • All simulations were run for 1 billion instructions. • The simulator configuration parameters are: • OO4: 4-wide superscalar pipeline; 32K direct-mapped L1 icache with 32-byte lines and 1-cycle hit latency; 32K 4-way set-associative L1 dcache with 32-byte lines and 1-cycle hit latency; 8-cycle L2 cache latency. • IO2: 2-wide in-order pipeline, with the same cache parameters as OO4.
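
As a quick check on the configuration, the L1 geometry follows from size / line size and lines / associativity. If each line carries its own drowsy bit, as slide 13 describes, each of these L1 caches would need one drowsy bit per line. The helper is a hypothetical illustration, and "32K" is read as 32 KB.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> tuple[int, int]:
    """Return (number of lines, number of sets) for a cache of the given shape."""
    if size_bytes % line_bytes or (size_bytes // line_bytes) % ways:
        raise ValueError("size must divide evenly into lines and sets")
    lines = size_bytes // line_bytes
    sets = lines // ways
    return lines, sets


# L1 caches shared by the OO4 and IO2 configurations (32K read as 32 KB).
icache_lines, icache_sets = cache_geometry(32 * 1024, 32, ways=1)  # direct-mapped
dcache_lines, dcache_sets = cache_geometry(32 * 1024, 32, ways=4)  # 4-way
print(icache_lines, icache_sets)  # 1024 1024
print(dcache_lines, dcache_sets)  # 1024 256
```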

  21. Energy consumption • The authors find that the simple policy with a window size of 4000 cycles reaches a reasonable compromise between simplicity of implementation, power savings, and performance. • The impact of this policy on leakage energy is characterized by: • Normalized total energy: the total energy used in the drowsy cache divided by the total energy consumed in a regular cache. • Normalized leakage energy: the leakage energy in the drowsy cache divided by the leakage energy in a regular cache. • DVS columns: energy savings resulting from the scaled-VDD (DVS) circuit technique. • Theoretical minimum column: assumes that leakage in low-power mode can be reduced to zero (without losing state), i.e. it estimates the energy savings given the best possible hypothetical circuit technique.
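
The two normalized metrics can be related with a simple averaging model: some fraction of lines sits in drowsy mode and leaks at a reduced rate, the rest leak at the full rate, and dynamic energy is left unchanged. Setting the drowsy leakage to zero corresponds to the theoretical-minimum case. This is a sketch under those assumptions; it ignores the energy of mode transitions, extra wakeups and the added control circuitry, and the function names and example inputs are hypothetical.

```python
def normalized_leakage(drowsy_fraction: float, drowsy_leak_ratio: float) -> float:
    """Leakage energy of a drowsy cache divided by that of a conventional cache.

    drowsy_fraction:   average fraction of lines held in drowsy mode (0..1)
    drowsy_leak_ratio: leakage of a drowsy line relative to an awake line;
                       0.0 models the "theoretical minimum" case
    """
    awake_fraction = 1.0 - drowsy_fraction
    return awake_fraction + drowsy_fraction * drowsy_leak_ratio


def normalized_total(leakage_share: float, drowsy_fraction: float,
                     drowsy_leak_ratio: float) -> float:
    """Total energy of a drowsy cache divided by that of a conventional cache.

    leakage_share: fraction of the conventional cache's total energy that is
    leakage; the remaining dynamic energy is assumed to be unchanged.
    """
    dynamic_share = 1.0 - leakage_share
    return dynamic_share + leakage_share * normalized_leakage(drowsy_fraction, drowsy_leak_ratio)


# Illustrative inputs: the 76% average leakage share quoted in the summary
# for projected conventional caches, and 90% of lines drowsy.
print(normalized_total(0.76, 0.90, 0.0))  # theoretical-minimum style estimate
```

Because the model omits wakeup and transition energy, it gives an optimistic bound rather than a prediction of the measured columns.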

  22. Drowsy cache implementation • Reduces the total energy consumed in the data cache by more than 50% without significantly impacting performance. • Total leakage energy is reduced by: • an average of 71% when tags are always awake; • an average of 76% using the drowsy tag scheme.

  23. Future work • The proposed scheme is not a solution for all caches in the processor. • The L1 instruction cache does not do as well with the proposed algorithm. • Investigate the use of instruction prefetch algorithms combined with the drowsy circuit technique. • Extend these techniques to other memory structures, such as branch predictors. • Study the impact of an adaptive window size.

  24. Thank you. Questions?
