
Paper Presentation

Paper Presentation: An Energy-Efficient Adaptive Hybrid Cache. Presented 2013/10/21 by Yun-Chung Yang. Authors: Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou, Computer Science Department, University of California, Los Angeles.



Presentation Transcript


  1. Paper Presentation An Energy-Efficient Adaptive Hybrid Cache 2013/10/21 Yun-Chung Yang Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou Computer Science Department, University of California, Los Angeles 2011 International Symposium on Low Power Electronics and Design (ISLPED), pp. 67–72

  2. Outline • Abstract • Related Work • What’s the Problem • Run-time behavior • Set balancing • Proposed Method – Adaptive Hybrid Cache • Experiment Result • Conclusion

  3. Abstract By reconfiguring part of the cache as software-managed scratchpad memory (SPM), hybrid caches manage to handle both unknown and predictable memory access patterns. However, existing hybrid caches provide a flexible partitioning of cache and SPM without considering adaptation to the run-time cache behavior. Previous cache set balancing techniques are either energy-inefficient or require serial tag and data array access. In this paper an adaptive hybrid cache is proposed to dynamically remap SPM blocks from high-demand cache sets to low-demand cache sets. This achieves 19%, 25%, 18% and 18% energy-runtime-product reductions over four previous representative techniques on a wide range of benchmarks.

  4. Related Work • This paper: SPM with energy-aware set utilization • Software-controlled partitioning of the cache, from ways down to blocks: column caching, FlexCache [8]–[10], reconfigurable cache, Virtual Local Store • Set balancing: Balanced Cache [12], victim cache [11] – require serial tag/data array access • [2], [3], [4], [5]: use CAM memory, so no serial tag/data access is needed

  5. What’s the problem? • Previous hybrid cache designs partition the cache and SPM without adapting to the run-time cache behavior. • Because SPM allocation is uniform while cache set demand is non-uniform, some sets become overloaded • The hot cache set problem.

  6. The Proposed Method • Adaptive Hybrid Cache • (a) Original code • (b) Transformed code for AH-cache • The compiler’s job • (c) Memory space for AH-cache • (d) SPM blocks • Adaptive mapping to the cache • (e) SPM mapping in the cache • (f) SPM mapping look-up table (SMLT)
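The SMLT can be pictured as a small table mapping each SPM block number to the cache set and way that currently hold it. A minimal Python model of the look-up (the names `spm_access` and `smlt` are illustrative, not from the paper):

```python
BLOCK_SIZE = 64  # bytes per SPM/cache block, matching the evaluated configuration

def spm_access(addr, smlt):
    """Translate an SPM-space address into a (set, way, offset) cache location.

    smlt maps SPM block number -> (valid, set, way); an invalid entry means
    the block is not currently mapped into the cache.
    """
    blk, offset = divmod(addr, BLOCK_SIZE)
    valid, cache_set, way = smlt[blk]
    if not valid:
        raise LookupError("SPM block %d is not mapped" % blk)
    return cache_set, way, offset
```

For example, with `smlt = {0: (1, 5, 1)}`, `spm_access(10, smlt)` returns `(5, 1, 10)`: SPM byte 10 lives at offset 10 of way 1 in cache set 5.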

  7. Hardware Configuration • Hardware for AH-cache • The green part is for accessing the SPM. • Cache addressing and the SMLT look-up are performed in parallel with the virtual address calculation in the pipeline.

  8. Adaptive Mapping • Dynamically remap SPM blocks from high-demand cache sets to low-demand cache sets • Migrate SPM blocks from high-demand sets to low-demand sets • The initial mapping of SPM blocks in the cache is random

  9. Adaptive Mapping – Victim Tag Buffer • Goal: when the application requires P SPM blocks while the AH-cache can provide at most Q, there are S = P − Q floating blocks to place adaptively so as to satisfy the high-demand cache sets. • Solution: • Use a victim tag buffer (VTB) to capture the demand of each set. • Floating block holder (FBH) – records which cache sets hold the floating blocks.
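The per-set demand tracking can be sketched as follows. This is a minimal model under the assumption that a hit in a small per-set victim tag buffer bumps a 4-bit saturating counter; the exact update policy is not spelled out in the slides, and the names and the VTB depth are illustrative:

```python
from collections import deque

VTB_DEPTH = 4    # assumed victim-tag-buffer depth per set
MAX_DEMAND = 15  # 4-bit saturating counter, as on the storage slide

def record_miss(set_idx, tag, vtb, demand):
    """On a cache miss, consult the set's victim tag buffer.

    A VTB hit means a recently evicted line was needed again, i.e. the
    set is under pressure: saturate-increment its demand counter.
    Otherwise remember the missing tag for future reference.
    """
    if tag in vtb[set_idx]:
        demand[set_idx] = min(demand[set_idx] + 1, MAX_DEMAND)
    else:
        vtb[set_idx].append(tag)  # deque(maxlen=...) evicts the oldest tag
    return demand[set_idx]
```

Two misses on the same tag in one set thus raise that set's demand counter, flagging it as a candidate to receive a floating block.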

  10. Floating Block Holder • Re-insertion bit = 1 means the set is still highly demanded, so it is re-inserted into the FBH queue.
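The FBH policy above can be sketched as a FIFO walk. The function name is illustrative, and clearing the re-insertion bit when a set is re-queued is an assumption about the exact protocol:

```python
from collections import deque

def next_donor(fbh, reinsert):
    """Pop sets from the head of the FBH queue until a low-demand one is found.

    Sets whose re-insertion bit is 1 are still highly demanded, so they keep
    their floating block and go back to the tail; the first set whose bit is
    0 is chosen to donate its floating block to a high-demand set.
    """
    for _ in range(len(fbh)):
        s = fbh.popleft()
        if reinsert[s]:
            reinsert[s] = 0  # assumed: clear the bit for the next round
            fbh.append(s)
        else:
            return s
    return None  # every set holding a floating block is still high-demand
```

For example, with `fbh = deque([3, 5, 7])` and `reinsert = {3: 1, 5: 0, 7: 1}`, the walk re-queues set 3 and returns set 5 as the donor.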

  11. Improvement of the FBH Queue • Problem: a worst case of S cycles of delay per search, where S is the maximum number of SPM blocks. • Solution: • Store the re-insertion bits in a table called the re-insertion bit table (RIBT) • Search 16 re-insertion bits in parallel.
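Grouping the re-insertion bits into 16-bit words lets the search dismiss an all-zero word in a single comparison. A software model of the idea (hardware would examine the words simultaneously; the name is illustrative):

```python
RIBT_WORD = 16  # re-insertion bits examined per step

def find_demanding_set(ribt):
    """Return the index of the first set whose re-insertion bit is 1.

    ribt is a list of 16-bit words; a word equal to zero is skipped with
    one comparison instead of 16 per-bit checks.
    """
    for w, word in enumerate(ribt):
        if word:  # at least one high-demand set in this group of 16
            for b in range(RIBT_WORD):
                if (word >> b) & 1:
                    return w * RIBT_WORD + b
    return None  # no set is currently marked for re-insertion
```

With 128 sets packed into 8 words, as on the storage slide, the scan touches at most 8 words instead of up to 128 bits one by one.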

  12. What would you like to know • Storage overhead • Critical path of the SMLT in the pipeline stage • Comparison with other designs (performance, miss rate, energy) • Non-adaptive hybrid cache (N) • Non-adaptive hybrid cache + balanced cache (B) • Non-adaptive hybrid cache + victim cache (Vp, Vs) • Phase-reconfigurable hybrid cache (R) • Adaptive hybrid cache (AH) • Static optimized hybrid cache (S)

  13. Storage • 16KB, 2-way set-associative, 128 sets, 64B data blocks, 4B tag entry size. • 128 64B SPM blocks • SMLT: 128 9-bit entries (1 valid bit + 6-bit index + 2-bit way) • Insertion flag + 4-bit counter • FBH queue: 128 7-bit entries • RIBT: 8 16-bit entries • Total: 0.4KB, 3% of the hybrid cache size
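The per-structure sizes on this slide can be added up as a sanity check. This is a sketch: it assumes the insertion flag + 4-bit counter is one 5-bit field per set, and it omits the VTB tag storage itself, which the slide does not size and which likely accounts for the gap up to the quoted 0.4KB:

```python
SETS = 128

smlt_bits = SETS * 9     # 128 entries x (1 valid + 6-bit index + 2-bit way)
counter_bits = SETS * 5  # assumed: per-set insertion flag + 4-bit counter
fbh_bits = SETS * 7      # FBH queue: 128 7-bit entries
ribt_bits = 8 * 16       # RIBT: 8 16-bit entries

total_bytes = (smlt_bits + counter_bits + fbh_bits + ribt_bits) // 8
print(total_bytes)  # 352 bytes, about 0.34KB of the quoted 0.4KB total
print(round(total_bytes / (16 * 1024), 3))  # roughly 2% of the 16KB cache
```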

  14. Critical Path of the SMLT • 32nm technology (cache block size is 64B) • The 0.2ns critical path fits within the 0.25ns cycle time of a 4GHz core.

  15. Experiment Result – Miss Rate • R reduces cache misses by 34% • AH-cache reduces cache misses by 52% • AH-cache outperforms B because B allocates SPM blocks uniformly without considering cache-set demand. • The benefit of the victim cache depends on its size.

  16. Experiment Result – Performance • AH-cache outperforms B, Vp, Vs and R by 3%, 4%, 8% and 12%, respectively

  17. Experiment Result – Energy • Although the proposed method adds hardware (the SMLT, VTB and adaptive mapping unit), AH-cache still achieves energy reductions of 16%, 22%, 10% and 7% compared to designs B, Vp, Vs and R, respectively.

  18. Conclusion & Comment • AH-cache dynamically remaps SPM blocks among cache sets based on run-time behavior. • AH-cache achieves energy-runtime-product reductions of 19%, 25%, 18% and 18% over designs B, Vp, Vs and R. • My comments • The design is explained in detail • The paper should mention how the tag array is used while a block is in SPM mode
