
By Hongbin Sun, Nanning Zheng , and Tong Zhang

Leveraging Access Locality for the Efficient Use of Multibit Error-Correcting Codes in L2 Cache. Joseph Schneider, March 23, 2010.


Presentation Transcript


  1. Leveraging Access Locality for the Efficient Use of Multibit Error-Correcting Codes in L2 Cache • Joseph Schneider • March 23, 2010 • By Hongbin Sun, Nanning Zheng, and Tong Zhang

  2. The Problem • As CMOS technology scales down, random defects increase • Traditionally, these defects are handled by replacing defective rows, columns, and words with redundant ones • As random defects increase, this traditional redundancy strategy may no longer be sufficient

  3. The Solution • Extend the role of Error-Correcting Codes (ECC) to compensate for defects • ECC is already used to compensate for transient soft errors • Find a method that allows ECC to be used for both defects and soft errors

  4. Multi-bit ECC • Multi-bit ECC – ECC that can correct multiple errors in one codeword • Suffers larger latency and higher coding redundancy than single error correction • Therefore unusable in L1 cache without suffering major performance issues
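The "higher coding redundancy" point can be made concrete with a little arithmetic. The sketch below counts check bits for a 64-bit data word (the subblock size quoted later in the evaluation) under two textbook constructions: an extended Hamming SEC-DED code, and a shortened binary BCH code with t = 2 plus one overall parity bit for DEC-TED. The function names are hypothetical, and the BCH count uses the usual 2m check-bit bound; the exact code used in the paper is not given in these slides.

```python
def sec_ded_check_bits(k):
    """Check bits for an extended Hamming (SEC-DED) code over k data bits."""
    r = 1
    # Hamming bound for single error correction: 2^r >= k + r + 1.
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit gives double error detection


def bch_dec_ted_check_bits(k):
    """Check bits for a shortened t=2 BCH code plus overall parity (assumed construction)."""
    m = 2
    # The shortened code must fit in length 2^m - 1 with about 2*m check bits.
    while 2 ** m - 1 < k + 2 * m:
        m += 1
    return 2 * m + 1  # one extra overall parity bit gives triple error detection


data_bits = 64
sec_ded = sec_ded_check_bits(data_bits)
dec_ted = bch_dec_ted_check_bits(data_bits)
print(f"SEC-DED: {sec_ded} check bits, DEC-TED: {dec_ted} check bits")
```

Under these assumptions the multi-bit code needs nearly twice the check bits per word, which is one reason to store them only for the blocks that need them.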

  5. Overall Goal • Implement multi-bit ECC in L2 cache design to correct L2 cache defects without causing significant IPC degradation, area use, or energy cost

  6. Steps to Success • 1. Apply multi-bit ECC only to cache blocks that require it • 2. Implement buffers to limit repeated use of multi-bit ECC • 3. Ensure data integrity against soft errors where ECC alone can no longer provide it

  7. Limited multi-bit ECC • Cache blocks with one or more defective cells are identified during memory testing; multi-bit ECC is then applied selectively, only to the blocks that need it • Content-Addressable Memory (CAM) is then used to identify blocks requiring multi-bit ECC (referred to as m-blocks) • ISSUE: CAM lookups consume a large amount of energy
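As a rough illustration of the selection step, the sketch below classifies subblocks from a memory-test defect map. The thresholds follow statements elsewhere in this deck (single defects stay under SEC-DED, multi-bit ECC covers up to two errors, anything beyond that goes to redundancy); the defect map, the function names, and the class labels are all hypothetical.

```python
from collections import Counter


def classify_subblock(defect_count, max_correctable=2):
    """Map a subblock's defect count to an illustrative protection class."""
    if defect_count == 0:
        return "clean"        # no defects: SEC-DED is free to handle soft errors
    if defect_count == 1:
        return "sec-ded"      # single defect absorbed by the SEC-DED code
    if defect_count <= max_correctable:
        return "m-block"      # needs multi-bit ECC check bits
    return "redundancy"       # too many defects: repair with spare rows/columns/words


def build_m_block_set(defect_map):
    """Return the addresses whose tags would be stored in the M-ECC cache."""
    return {addr for addr, n in defect_map.items()
            if classify_subblock(n) == "m-block"}


# Hypothetical test result: subblock address -> number of defective cells.
defect_map = {0x00: 0, 0x08: 1, 0x10: 2, 0x18: 3, 0x20: 2}
m_blocks = build_m_block_set(defect_map)
summary = Counter(classify_subblock(n) for n in defect_map.values())
```

Here `m_blocks` plays the role of the tag store that the CAM searches on each access.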

  8. Proposed Architecture • Standard L2 cache core protecting all subblocks with single error correction, double error detection (SEC-DED) codes • Multi-bit ECC core using fully associative multi-bit ECC cache (M-ECC cache), ECC encoder/decoder, and two buffers. M-ECC cache contains location tags and corresponding check bits • Dirty Replication Cache to ensure soft error tolerance

  9. Proposed Architecture

  10. Multi-bit ECC Core • In case of write, subblock data encoded and check bits stored • In case of read, check bits fetched and decoded • ISSUE: Constant use of multi-bit ECC will increase latency and energy consumption at higher defect densities • Solution: Two additional buffers

  11. Multi-bit ECC Core Buffers • Pre-decoding Buffer: Small cache that keeps copies of the most recently accessed m-blocks; searched before accessing the M-ECC cache • Employs least recently used (LRU) policy for replacement when full; effective because cache accesses exhibit temporal locality • Eliminates a large amount of ECC decoding and some M-ECC cache accesses

  12. Multi-bit ECC Core Buffers • FLU buffer – small CAM that keeps addresses of recently accessed cache blocks that are NOT m-blocks • Also employs LRU policy • Further reduces M-ECC cache accesses

  13. M-ECC core Flow Chart
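Since the flow chart itself did not survive in this transcript, here is a minimal sketch of a read path consistent with the two buffer slides. It assumes the pre-decoding buffer is checked first, then the FLU buffer, and only then the M-ECC cache; the actual lookup order (or parallelism) in the paper may differ. Class names, buffer sizes, and return labels are hypothetical, and writes are left out.

```python
from collections import OrderedDict


class LRUBuffer:
    """Small fully associative buffer with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        return False

    def insert(self, key):
        self.entries[key] = True
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class MEccCore:
    """Toy model of the read path; counts expensive operations only."""

    def __init__(self, m_block_addrs, pbuf_size=4, flu_size=4):
        self.m_ecc_cache = set(m_block_addrs)  # location tags of m-blocks
        self.pbuf = LRUBuffer(pbuf_size)       # pre-decoding buffer
        self.flu = LRUBuffer(flu_size)         # addresses known NOT to be m-blocks
        self.decodes = 0
        self.m_ecc_lookups = 0

    def read(self, addr):
        if self.pbuf.lookup(addr):
            return "pbuf-hit"                  # decoded copy available, no decode
        if self.flu.lookup(addr):
            return "flu-hit"                   # known non-m-block, skip M-ECC cache
        self.m_ecc_lookups += 1
        if addr in self.m_ecc_cache:
            self.decodes += 1                  # multi-bit ECC decode
            self.pbuf.insert(addr)
            return "multi-bit-decode"
        self.flu.insert(addr)
        return "sec-ded-only"


core = MEccCore({0x10})
trace = [core.read(a) for a in (0x10, 0x10, 0x20, 0x20)]
```

In this trace the second access to each address is served by a buffer, so only one decode and two M-ECC cache lookups occur for four reads.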

  14. Soft Error Tolerance • ISSUE: When ECC is devoted to defect tolerance, the defective subblock is vulnerable to soft errors • Extra protection is only necessary for blocks containing defects (including blocks with single defects protected by SEC-DED rather than multi-bit ECC) • Further, it is only necessary when the cache block is dirty; clean blocks can be re-fetched from memory when a soft error is detected

  15. Dirty Replication Cache • Use of Dirty Replication (DR) cache • When a cache block is made dirty, its data is also kept in this cache • When data leaves this cache, a write is performed to main memory • Ensures a backup is always available
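The DR cache behaviour described above can be sketched in a few lines. This is only an illustration: the LRU replacement policy, the dictionary standing in for main memory, and the method names are assumptions, not details taken from the slides.

```python
from collections import OrderedDict


class DirtyReplicationCache:
    """Toy DR cache: holds copies of dirty data, writes back on eviction."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory            # dict standing in for main memory
        self.entries = OrderedDict()

    def on_dirty_write(self, addr, data):
        """Called when a protected cache block is made dirty."""
        self.entries[addr] = data
        self.entries.move_to_end(addr)
        if len(self.entries) > self.capacity:
            # Data leaving the DR cache is written to main memory,
            # so a backup copy always exists somewhere.
            old_addr, old_data = self.entries.popitem(last=False)
            self.memory[old_addr] = old_data

    def recover(self, addr):
        """Return the backup copy after an uncorrectable soft error."""
        if addr in self.entries:
            return self.entries[addr]
        return self.memory[addr]


memory = {}
dr = DirtyReplicationCache(capacity=2, memory=memory)
dr.on_dirty_write(1, "a")
dr.on_dirty_write(2, "b")
dr.on_dirty_write(3, "c")   # evicts address 1 and writes it back
```

After the third write, address 1 is recoverable from `memory` and addresses 2 and 3 from the DR cache itself.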

  16. Evaluation • Cache defect density set at 0.5% • Multi-bit ECC: BCH-based DEC-TED code (double error correction, triple error detection); subblocks with more than two errors are repaired by redundancy • Cache subblocks contain 64 bits • BCH DEC-TED decoder has parallelism of 2 and uses the PGZ decoding algorithm, resulting in a latency of 82 cycles • CACTI 5 used to model caches; through Verilog, the extra logic was determined to be 0.2% of the area of the L2 cache core

  17. Evaluation • Four configurations compared: • Base: Defect-free L2 cache with no defect-tolerance functions • M-ECC: Multi-bit ECC only, no buffers • M-ECC-pbuf: Use of pre-decoding buffer • M-ECC-pfbuf: Use of pre-decoding and FLU buffers • First, determine the best buffer sizes; then compare IPC and power consumption

  18. Size of Pre-decoding Buffer
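To get a feel for how the defect density splits subblocks among the protection schemes, the sketch below evaluates a simple binomial model. It assumes the 0.5% figure is an independent per-cell defect probability and that only the 64 data bits of a subblock are counted; both are simplifying assumptions of this example, not statements from the slides.

```python
from math import comb


def defect_count_probability(n_bits, p_defect, k):
    """Binomial probability that exactly k of n_bits cells are defective."""
    return comb(n_bits, k) * p_defect ** k * (1 - p_defect) ** (n_bits - k)


n_bits = 64        # data bits per subblock (from the slide)
p_defect = 0.005   # assumed per-cell defect probability

p0 = defect_count_probability(n_bits, p_defect, 0)   # defect-free
p1 = defect_count_probability(n_bits, p_defect, 1)   # single defect
p2 = defect_count_probability(n_bits, p_defect, 2)   # two defects
p_more = 1.0 - (p0 + p1 + p2)                        # more than two defects

for label, p in [("0 defects", p0), ("1 defect", p1),
                 ("2 defects", p2), (">2 defects", p_more)]:
    print(f"{label:>10}: {p:.4f}")
```

Under this model most subblocks are defect-free, and each additional defect is markedly less likely than the last, so only a small tail would need multi-bit ECC or redundancy.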

  19. Size of FLU buffer

  20. Normalized IPC Comparison

  21. Normalized Power Consumption

  22. Results • IPC is similar across configurations • M-ECC core power consumption is about 30% of that of the L2 cache core, which itself is about 10% of the entire system cache

  23. DR Write-back hit rates • L2 cache fixed at 1 MB 8-way associative, DR varies

  24. DR Write-back hit rates • DR fully associative with 64 blocks, 1 MB L2 cache varies

  25. Conclusions • Goal was to effectively use multi-bit ECC for L2 cache defect tolerance at minimal performance and implementation cost • Multi-bit ECC implemented only where more than one defect found • Two small buffers included to reduce performance impact of multi-bit ECC • Dirty Replication Cache included to ensure soft error tolerance

  26. Conclusions • IPC performance nearly the same as defect-free cache • M-ECC cache incurs less than 2.5% area overhead and 36% energy consumption overhead • Dirty replication cache has area overhead of only 0.3% while storing 96.4% of write-back data from L1 cache
