Leveraging Access Locality for the Efficient Use of Multibit Error-Correcting Codes in L2 Cache Joseph Schneider March 23, 2010 By Hongbin Sun, Nanning Zheng, and Tong Zhang
The Problem • As CMOS technology shrinks, random defects increase • Traditionally, these defects are handled by replacing defective rows, columns, and words with redundant ones • As random defects increase, this traditional strategy may no longer be sufficient
The Solution • Extend the role of Error-Correcting Codes (ECC) to compensate for defects • ECC is already used to compensate for transient soft errors • Find a method that allows ECC to be used for both defects and soft errors
Multi-bit ECC • Multi-bit ECC: ECC that can correct multiple errors in one codeword • Incurs higher latency and higher coding redundancy than single-error correction • Therefore unusable in L1 cache without major performance penalties
Overall Goal • Implement multi-bit ECC in the L2 cache design to correct L2 cache defects without significant IPC degradation, area overhead, or energy cost
Steps to Success • 1. Apply multi-bit ECC only to cache blocks that require it • 2. Implement buffers to limit repeated use of multi-bit ECC • 3. Ensure data integrity against soft errors where ECC alone can no longer provide it
Limited multi-bit ECC • Cache blocks with one or more defective cells are identified during memory testing; multi-bit ECC is then applied selectively to the blocks that need it • Content-Addressable Memory (CAM) is then used to identify blocks requiring multi-bit ECC (referred to as m-blocks) • ISSUE: CAM lookups consume a large amount of energy
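The selective assignment can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the thresholds follow the figures quoted elsewhere in these slides (single defects stay on SEC-DED, two defects need multi-bit ECC, more than two go to redundancy), while the names `Protection`, `classify_subblock`, `build_m_block_table`, and the `defect_map` layout are hypothetical.

```python
from enum import Enum


class Protection(Enum):
    SEC_DED = "sec-ded"            # baseline code applied to every subblock
    MULTI_BIT_ECC = "m-block"      # tracked in the M-ECC cache
    REDUNDANCY = "redundancy"      # repaired with spare rows/columns/words


def classify_subblock(defect_count, max_correctable=2):
    """Pick a protection scheme from the defect count found by memory test.

    Assumption: max_correctable=2 mirrors a double-error-correcting code.
    """
    if defect_count < 0:
        raise ValueError("defect_count must be non-negative")
    if defect_count <= 1:
        return Protection.SEC_DED
    if defect_count <= max_correctable:
        return Protection.MULTI_BIT_ECC
    return Protection.REDUNDANCY


def build_m_block_table(defect_map):
    """Return the addresses that must be tracked as m-blocks.

    defect_map is a hypothetical {address: defect_count} result of testing.
    """
    return {
        addr
        for addr, count in defect_map.items()
        if classify_subblock(count) is Protection.MULTI_BIT_ECC
    }


if __name__ == "__main__":
    test_result = {0x10: 0, 0x20: 1, 0x30: 2, 0x40: 3}
    print(sorted(hex(a) for a in build_m_block_table(test_result)))
```

Only the addresses returned by `build_m_block_table` would need an entry in the structure that identifies m-blocks, which is why that structure can stay small at low defect densities.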
Proposed Architecture • Standard L2 cache core protecting all subblocks with single error correction, double error detection (SEC-DED) codes • Multi-bit ECC core using fully associative multi-bit ECC cache (M-ECC cache), ECC encoder/decoder, and two buffers. M-ECC cache contains location tags and corresponding check bits • Dirty Replication Cache to ensure soft error tolerance
Multi-bit ECC Core • In case of write, subblock data encoded and check bits stored • In case of read, check bits fetched and decoded • ISSUE: Constant use of multi-bit ECC will increase latency and energy consumption at higher defect densities • Solution: Two additional buffers
Multi-bit ECC Core Buffers • Pre-decoding Buffer: Small cache that keeps copies of the most recently accessed m-blocks; Searched before accessing the M-ECC cache • Employs least recently used (LRU) policy for replacement when full; Successful due to the temporal locality of cache accesses • Eliminates a large share of ECC decoding and some M-ECC cache accesses
Multi-bit ECC Core Buffers • FLU buffer – small CAM that keeps addresses of recently accessed cache blocks that are NOT m-blocks • Also employs LRU policy • Further reduces M-ECC cache access
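How the two buffers filter accesses can be shown with a small Python model. It is a sketch under stated assumptions, not the paper's design: the class names, buffer sizes, and the `stats` counters are hypothetical, the buffers are checked one after the other (the slides do not say whether the hardware searches them in parallel), the multi-bit decode is a placeholder, and writes are not modeled.

```python
from collections import OrderedDict


class LRUBuffer:
    """Tiny fully associative buffer with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class MEccReadPath:
    """Hypothetical read path: pre-decoding buffer, FLU buffer, M-ECC cache."""

    def __init__(self, m_blocks, pbuf_size=4, flu_size=4):
        self.m_blocks = set(m_blocks)  # stands in for the M-ECC cache tags
        self.pbuf = LRUBuffer(pbuf_size)
        self.flu = LRUBuffer(flu_size)
        self.stats = {"pbuf_hit": 0, "flu_hit": 0, "mecc_lookup": 0, "decode": 0}

    def read(self, addr, raw_data):
        decoded = self.pbuf.get(addr)
        if decoded is not None:  # recently decoded m-block: skip the decoder
            self.stats["pbuf_hit"] += 1
            return decoded
        if self.flu.get(addr) is not None:  # known non-m-block: skip M-ECC cache
            self.stats["flu_hit"] += 1
            return raw_data
        self.stats["mecc_lookup"] += 1
        if addr in self.m_blocks:
            self.stats["decode"] += 1
            corrected = raw_data  # placeholder for the multi-bit ECC decode
            self.pbuf.put(addr, corrected)
            return corrected
        self.flu.put(addr, True)
        return raw_data


if __name__ == "__main__":
    path = MEccReadPath(m_blocks={1})
    for addr, data in [(1, "a"), (1, "a"), (2, "b"), (2, "b")]:
        path.read(addr, data)
    print(path.stats)
```

In the example run, the second access to each address is served by a buffer, so the M-ECC cache is searched twice and the decoder runs once for four reads.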
Soft Error Tolerance • ISSUE: When ECC is devoted to defect tolerance, the defective subblock is vulnerable to soft errors • Extra protection is only necessary for blocks containing defects (including blocks with single defects protected by SEC-DED rather than multi-bit ECC) • Further, it is only necessary when the cache block is dirty; Clean blocks can be refetched from memory when a soft error is detected
Dirty Replication Cache • Use of Dirty Replication (DR) cache • When cache block made dirty, data is also kept in this cache • When data leaves this cache, a write is performed to main memory • Ensures a backup is always available
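The backup behavior described above can be modeled roughly as follows. This is a minimal sketch with assumptions labeled: the class and method names are hypothetical, LRU replacement is an assumption (the slides do not name the DR cache's policy), a plain dict stands in for main memory, and the caller is assumed to replicate only dirty blocks that contain defects.

```python
from collections import OrderedDict


class DirtyReplicationCache:
    """Hypothetical model of a DR cache backed by main memory."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # dict standing in for main memory
        self.blocks = OrderedDict()   # addr -> replicated dirty data

    def on_dirty_write(self, addr, data):
        """Keep a replica whenever a protected block is made dirty."""
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        while len(self.blocks) > self.capacity:
            # Assumed LRU eviction: data leaving the DR cache goes to memory.
            victim, victim_data = self.blocks.popitem(last=False)
            self.memory[victim] = victim_data

    def recover(self, addr):
        """Return a clean copy after an uncorrectable soft error in L2."""
        if addr in self.blocks:
            return self.blocks[addr]
        return self.memory[addr]


if __name__ == "__main__":
    main_memory = {}
    dr = DirtyReplicationCache(capacity=2, memory=main_memory)
    for addr, data in [(1, "a"), (2, "b"), (3, "c")]:
        dr.on_dirty_write(addr, data)
    print(main_memory, dr.recover(1), dr.recover(3))
```

Either the DR cache or main memory always holds the latest data for a replicated block, which is the invariant the last bullet relies on.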
Evaluation • Cache defect density set at 0.5% • Multi-bit ECC: BCH-based DEC-TED code (double error correction, triple error detection); Subblocks with more than two errors repaired by redundancy • Cache subblocks contain 64 bits • BCH DEC-TED decoder has parallelism of 2 and uses the PGZ (Peterson-Gorenstein-Zierler) decoding algorithm, resulting in a latency of 82 cycles • Cacti 5 used to model caches; From a Verilog implementation, the extra logic was determined to be 0.2% of the area of the L2 cache core
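To give a feel for the coding redundancy at the 64-bit subblock size, the check-bit counts of the two code families can be estimated with textbook constructions. This is a back-of-the-envelope sketch, not figures from the paper: it assumes an extended Hamming code for SEC-DED and a shortened binary BCH code with one extra overall parity bit for DEC-TED, and the function names are hypothetical.

```python
def hamming_sec_ded_check_bits(k):
    """Check bits of an extended Hamming (SEC-DED) code for k data bits."""
    r = 1
    while 2 ** r < k + r + 1:  # Hamming bound for single-error correction
        r += 1
    return r + 1               # one extra overall parity bit for DED


def bch_dec_ted_check_bits(k):
    """Check bits of a shortened binary BCH DEC code plus overall parity.

    Assumption: a double-error-correcting BCH code over GF(2^m) uses 2*m
    check bits and needs a codeword length of at most 2^m - 1.
    """
    m = 1
    while 2 ** m - 1 < k + 2 * m:
        m += 1
    return 2 * m + 1           # one extra overall parity bit for TED


if __name__ == "__main__":
    k = 64
    for name, bits in [
        ("SEC-DED", hamming_sec_ded_check_bits(k)),
        ("DEC-TED", bch_dec_ted_check_bits(k)),
    ]:
        print(f"{name}: {bits} check bits, {bits / k:.1%} overhead")
```

Under these assumptions a 64-bit subblock needs 8 check bits for SEC-DED and 15 for DEC-TED, so storing the stronger code for every subblock would nearly double the check-bit storage; keeping the extra bits only for m-blocks avoids most of that cost.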
Evaluation • Four configurations compared: • Base: Defect-free L2 cache with no defect-tolerance functions • M-ECC: Multi-bit ECC only; No buffers • M-ECC-pbuf: Use of pre-decoding buffer • M-ECC-pfbuf: Use of pre-decoding and FLU buffers • First, determine the best buffer sizes; Then compare IPC and power consumption
Results • IPC similar to the base configuration; M-ECC core power consumption is about 30% of that of the L2 cache core, which itself is about 10% of the entire system cache
DR Write-back hit rates • L2 cache fixed at 1 MB 8-way associative, DR varies
DR Write-back hit rates • DR fully associative with 64 blocks, 1 MB L2 cache varies
Conclusions • Goal was to effectively use multi-bit ECC for L2 cache defect tolerance at minimal performance and implementation cost • Multi-bit ECC implemented only where more than one defect found • Two small buffers included to reduce performance impact of multi-bit ECC • Dirty Replication Cache included to ensure soft error tolerance
Conclusions • IPC performance nearly the same as defect-free cache • M-ECC cache has less than 2.5% of area overhead and 36% of energy consumption overhead • Dirty replication cache has area overhead of only 0.3%, storing 96.4% of write-back data from L1 cache