Improving Multi-Core Performance Using Mixed-Cell Cache Architecture

Samira Khan*†, Alaa R. Alameldeen*, Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§ (*Intel Labs, †Carnegie Mellon University, §Texas A&M University)


Presentation Transcript


  1. Improving Multi-Core Performance Using Mixed-Cell Cache Architecture Samira Khan*†, Alaa R. Alameldeen*, Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§ *Intel Labs †Carnegie Mellon University §Texas A&M University

  2. Summary • Problem: • Cache cells become unreliable at low voltage • Mixed-cell cache: Use some larger robust cells [Ghasemi 2011] • Smaller non-robust cells are turned off at low voltage • Capacity loss leads to performance loss • Goal: • No capacity loss at low voltage to gain high performance • Observation: • A clean line has a duplicate copy in the memory hierarchy • A modified line is the only existing copy • Our Approach: • Protect a modified line in larger robust cells • Store a clean line in smaller non-robust cells • Fetch data from the lower level on an error in a clean line • Significantly improves performance and reduces power compared to prior work

  3. Outline • Summary • Background and Motivation • Mixed-Cell Cache Architecture • Methodology and Results • Conclusion

  4. Background and Motivation • Multi-core designs are power-limited • Can activate more cores by lowering the voltage [Figure: voltage scale, with more active cores at low voltage]

  5. Ensuring Resiliency at Lower Voltage • Cache cells begin to fail at lower voltage • Mixed-Cell Cache [Ghasemi 2011] • Some ways built with robust cells + Resilient to errors at low voltage - Area and power overhead • Only robust cells are operational at low voltage • Cache capacity loss at lower voltage can degrade performance significantly [Figure labels: Robust Cache, Mixed-Cell Cache, Non-robust, Error]

  6. Effect of Cache Capacity Reduction in a 4-Core System • In our experiments, 75% reduction in cache capacity leads to 20% performance loss on average

  7. Goal: Improve performance using the whole cache at low voltage

  8. Outline • Summary • Background and Motivation • Mixed-Cell Cache Architecture • Methodology and Results • Conclusion

  9. Our Mixed-Cell Architecture • Observation: • A clean line has a duplicate copy in the memory hierarchy • On an error, the data can be fetched from the duplicate copy • A modified line is the only copy in the system • Critical to keep the data error-free • Idea: • Protect a modified line using larger robust cells • Store a clean line in smaller non-robust cells • Use parity/ECC to detect errors in clean lines • Fetch data from the lower level on an error in clean lines
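
The detect-and-refetch idea on this slide can be illustrated with a minimal sketch. It assumes a single even-parity bit per line and models the lower level as a plain dictionary; the names `parity`, `NonRobustLine`, and `read_clean_line` are hypothetical and do not come from the presentation.

```python
def parity(data: bytes) -> int:
    """Even-parity bit computed over every bit of a line."""
    bit = 0
    for byte in data:
        bit ^= bin(byte).count("1") & 1
    return bit


class NonRobustLine:
    """A clean line held in non-robust cells, guarded by one parity bit (assumption)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.parity = parity(data)  # computed when the line is filled


def read_clean_line(line, addr, lower_level):
    """Return the line's data, refetching from the lower level on a parity mismatch."""
    if parity(bytes(line.data)) != line.parity:
        fresh = lower_level[addr]      # the duplicate copy below is still valid
        line.data = bytearray(fresh)
        line.parity = parity(fresh)
    return bytes(line.data)
```

A single parity bit only detects an odd number of flipped bits, which is one reason a stronger code such as ECC may be preferred.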

  10. Our Mixed-Cell Architecture • Use both robust and non-robust ways at low voltage • A modified line is stored only in a robust way • A clean line is stored only in a non-robust way • Modify cache management techniques to ensure clean and modified lines are stored appropriately [Figure labels: Robust, Non-robust, Mixed-cell (Disable) [Ghasemi 2011], Our Design, Clean, Modified]

  11. Mixed-Cell Architecture: Cache Miss • Write miss: Allocate line in a robust way • Read miss: Allocate line in a non-robust way [Figure: timeline of accesses: write miss X, write miss Y, read miss A, read miss B; ways labeled LRU]
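
The miss policy above can be sketched as follows. This is an illustration only: it assumes each group of ways keeps its own LRU order and evicts its own LRU line, and the class and method names (`MixedCellSet`, `allocate_on_miss`) are hypothetical.

```python
from collections import OrderedDict


class MixedCellSet:
    """One cache set split into robust and non-robust ways, each with its own LRU order."""

    def __init__(self, robust_ways=2, non_robust_ways=6):
        self.capacity = {"robust": robust_ways, "non_robust": non_robust_ways}
        # Each OrderedDict keeps tags ordered from least to most recently used.
        self.ways = {"robust": OrderedDict(), "non_robust": OrderedDict()}

    def allocate_on_miss(self, tag, is_write):
        """Place a missing line by access type; return the evicted tag or None."""
        kind = "robust" if is_write else "non_robust"
        group = self.ways[kind]
        victim = None
        if len(group) >= self.capacity[kind]:
            victim, _ = group.popitem(last=False)  # evict the LRU line of this group
        group[tag] = "modified" if is_write else "clean"
        return victim
```

Replaying the slide's sequence (write miss X, write miss Y, read miss A, read miss B) places X and Y in robust ways and A and B in non-robust ways.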

  12. Mixed-Cell Architecture: Cache Hit • Read hit: No change • Write hit: • Write hit in robust: No change • Write hit in non-robust: We propose three mechanisms • Writeback • Swap • Duplicate [Figure: cache set holding lines J K E L F G H I; accesses: write hit G, read hit J, write hit E]

  13. Write to a Non-Robust Line: Writeback • Write it back to the next level of the memory hierarchy • Make data clean in the non-robust cell + Simple - An extra writeback at each write in a non-robust way [Figure: write hit G in set J K E L F G H I; the dirty block in the non-robust way is vulnerable, so G is written back; the block now contains clean data]
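
A minimal sketch of the Writeback mechanism, assuming a line is modeled as a dictionary with `data` and `state` fields and the next level as a dictionary keyed by address. The function name `write_hit_writeback` is hypothetical.

```python
def write_hit_writeback(cache_line, new_data, addr, next_level):
    """Write hit in a non-robust way: update the line, then write it back at once."""
    cache_line["data"] = new_data
    next_level[addr] = new_data      # the extra writeback paid on every such write
    cache_line["state"] = "clean"    # a duplicate copy now exists in the next level
    return cache_line
```

The line never stays modified in non-robust cells, so in this model the behavior resembles a write-through policy applied only to non-robust ways.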

  14. Write to a Non-Robust Line: Swap • Swap modified line with the LRU robust line • Write back the robust data to the next cache level + Increases write hits in robust cells - Extra latency for swap [Figure: write hit G in set J K E L F G H I; swap E and G; E is now vulnerable, so E is written back; the block now contains clean data]
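
The Swap mechanism can be sketched under similar assumptions: robust lines are held in a list ordered from LRU to MRU, non-robust lines in a dictionary keyed by tag, and the next level in a dictionary. The function name `write_hit_swap` and the data layout are hypothetical.

```python
def write_hit_swap(robust, non_robust, tag, new_data, next_level):
    """Handle a write hit on `tag` in a non-robust way using the swap policy.

    Each line is a dict with "tag", "data", and "state" fields.
    """
    victim = robust.pop(0)                          # LRU robust line
    if victim["state"] == "modified":
        next_level[victim["tag"]] = victim["data"]  # write back before it becomes vulnerable
        victim["state"] = "clean"
    line = non_robust.pop(tag)
    line["data"], line["state"] = new_data, "modified"
    robust.append(line)                             # written line is now MRU in a robust way
    non_robust[victim["tag"]] = victim              # old robust line takes the freed non-robust way
```

With the slide's example, a write hit on G moves G into a robust way, while E is written back and continues as a clean line in G's former non-robust way.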

  15. Write to a Non-Robust Line: Duplicate • Pair two non-robust ways • Static pairing: ways <0,1>, <2,3>… • Duplicate the data in the partner way • On an error in one way, use data from the partner way + Simple, no extra writeback - Capacity loss, extra latency for duplication [Figure: write hit G in set J K E L F G H I; G is duplicated in the partner way]
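
A minimal sketch of the Duplicate mechanism with static pairing. It assumes way indices are numbered within the non-robust ways only, and that error detection is available as a caller-supplied predicate; the names `partner_way`, `write_hit_duplicate`, and `read_duplicated` are hypothetical.

```python
def partner_way(way: int) -> int:
    """Static pairing <0,1>, <2,3>, ...: flip the lowest bit of the way index."""
    return way ^ 1


def write_hit_duplicate(ways, way, new_data):
    """Write the modified data to the hit way and to its partner way."""
    ways[way] = new_data
    ways[partner_way(way)] = new_data  # the partner's previous contents are lost


def read_duplicated(ways, way, has_error):
    """Read a duplicated line, falling back to the partner on a detected error."""
    if has_error(way):
        return ways[partner_way(way)]
    return ways[way]
```

The sketch does not handle a partner way that already holds another duplicated modified line; a real design would need a rule for that case.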

  16. Outline • Summary • Background and Motivation • Mixed-Cell Cache Design • Methodology and Results • Conclusion

  17. Evaluation Methodology • Simulator: CMP$im, a Pin-based x86 simulator [Jaleel 2008] • Benchmarks: 20 4-core multi-programmed mixes from SPEC 2006 • Each cache has 2 robust ways • L1D 32KB, 2 robust, 6 non-robust ways, 3 cycles • L2 256KB, 2 robust, 6 non-robust ways, 10 cycles • L3 shared 4MB, 2 robust, 14 non-robust ways, 25 cycles • Memory latency 80 cycles • Vmin 590 mV, 825 MHz

  18. Comparison Points • Robust: Cache uses only robust cells • Smaller capacity, L1D 20KB, L2 160KB, L3 2.25MB • Disable: Mixed-Cell Cache [Ghasemi 2011] • Only ¼ of the cache works at low voltage, L1D 8KB, L2 64KB, L3 1MB • Ideal: Cache uses only non-robust cells • Larger capacity, L1D 40KB, L2 320KB, L3 4.5MB • Cannot work at low voltage • Can provide higher voltage to cache using a separate Vcc • Increases complexity • Adds latency to signals crossing voltage domains

  19. 4-Core Performance at Low Voltage • Swap provides 17% speedup over Disable • Swap performs within 2.6% of Ideal

  20. Normalized Memory Bandwidth • Duplicate increases memory bandwidth by only 3% compared to Ideal [Chart labels: 6.15, 21%, 28.5%, 3%]

  21. Normalized LLC Static Power at Vmin (590mV) • Swap and Duplicate reduce LLC static power by 10% compared to Ideal [Chart labels: 10%, 2.3X]

  22. Normalized L1D Dynamic Power at Vmin (590mV) • Duplicate reduces dynamic power by 50% compared to Disable • Duplicate is within 30% of the Ideal [Chart labels: 22%, 50%, 30%]

  23. Conclusion • Problem: • Cache cells become unreliable at low voltage • Mixed-cell cache: Use some larger robust cells • Smaller non-robust cells are turned off at low voltage • Capacity loss leads to performance loss • Goal: • No capacity loss at low voltage to gain high performance • Observation: • A clean line has a duplicate copy in the memory hierarchy • A modified line is the only existing copy • Our Approach: • Protect a modified line in larger robust cells • Store a clean line in smaller non-robust cells • Fetch data from the lower level on an error in a clean line • Improves performance by 17% and reduces L1D dynamic power by 50% compared to prior work

  24. Thank you

  25. Improving Multi-Core Performance Using Mixed-Cell Cache Architecture Samira Khan*†, Alaa R. Alameldeen*, Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§ *Intel Labs †Carnegie Mellon University §Texas A&M University
