
StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache



Presentation Transcript


  1. StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache (HIgh PErformance Computing LAB)

  2. Background: Staggering processor chip yield
  • IBM Cell initial yield = 14%
  • Two sources of yield loss: physical defects and process variations
  • As device sizes shrink, the critical defect size shrinks and process variations become more severe
  • As a result, yield is severely limited
  http://hipe.korea.ac.kr

  3. Background: Core disabling to the rescue
  • Recent multi-core processors employ "core disabling": failed cores are disabled to salvage the sound cores in a chip
  • Significant yield improvement; IBM Cell: 14% -> 40% with core disabling of a single faulty core
  • However, this approach also disables many good components, e.g., AMD Phenom X3 disables the L2 cache of faulty cores

  4. Background: Core disabling is uneconomical
  • Many unused L2 caches exist
  • The problem is exacerbated with many cores

  5. Motivation: StimulusCache
  • Basic idea: exploit the "excess cache" (EC) of a failed core
  • Core disabling (yield ↑) + larger cache capacity (performance ↑)

  6. StimulusCache design issues
  • How are ECs assigned to working cores?
  • Where should data be placed, in ECs or the local L2?
  • What hardware support is needed?
  • How can ECs be allocated flexibly and dynamically?
  [Figure: an 8-core CMP in which cores 0–3 contribute excess caches and cores 4–7 are the working cores]

  7. Excess Cache Utilization Policies
  [Figure: the policy design space along temporal and spatial dimensions, trading off simple designs with limited performance against complex designs with maximized performance]

  8. Excess Cache Utilization Policies
  • Temporal: an EC is exclusive to a core; performance isolation; limited capacity usage
  • Spatial: an EC is shared by multiple cores; performance interference; maximum capacity usage

  9. Excess Cache Utilization Policies
  [Figure: the three policies placed in the temporal/spatial design space: static private, static sharing, and dynamic sharing; the remaining combination is BAD and not evaluated]

  10. EC Utilization Policies Static private policy
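The slide conveys this policy graphically, so the text carries no detail. As a minimal sketch of the idea, assuming "static private" means each working core is statically given exclusive use of some excess caches (the function name and round-robin assignment are illustrative, not the paper's exact mechanism):

```python
# Hypothetical sketch: statically partition excess caches (ECs) among
# working cores so that each EC is private to exactly one core.
def static_private(working_cores, excess_caches):
    """Round-robin assignment; names and structure are illustrative."""
    alloc = {core: [] for core in working_cores}
    for i, ec in enumerate(excess_caches):
        owner = working_cores[i % len(working_cores)]
        alloc[owner].append(ec)
    return alloc

# Example: 4 working cores, 4 ECs from disabled cores -> one private EC each.
print(static_private([4, 5, 6, 7], [0, 1, 2, 3]))
# {4: [0], 5: [1], 6: [2], 7: [3]}
```

Exclusive ownership gives the performance isolation noted on slide 8, at the cost of leaving an EC idle when its owner does not need the capacity.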

  11. EC Utilization Policies Static sharing policy
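Again the detail is in the figure. A minimal sketch, assuming "static sharing" means every working core may use every excess cache, so the ECs act like one extra shared capacity pool (the function is illustrative):

```python
# Hypothetical sketch: under static sharing, every working core is allowed
# to search and allocate in every excess cache.
def static_sharing(working_cores, excess_caches):
    """Each core gets its own copy of the full EC list; illustrative only."""
    return {core: list(excess_caches) for core in working_cores}

alloc = static_sharing([4, 5], [0, 1, 2, 3])
# alloc[4] == alloc[5] == [0, 1, 2, 3]
```

This maximizes capacity usage but, as slide 8 notes for spatial sharing, cores can now interfere with each other's cached data.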

  12.–19. EC Utilization Policies Dynamic sharing policy (walked through step by step on slides 12–19)
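The dynamic sharing slides are an animation with no transcript text. As a hedged sketch of the general idea, assuming the allocator periodically observes how much each core benefits from the ECs and regrants capacity accordingly (the epoch-based hit counting below is an invented stand-in for the paper's actual monitoring mechanism):

```python
# Hypothetical sketch of a dynamic sharing allocator: each epoch, grant
# excess caches to the cores that got the most hits from them, so shared
# capacity follows demand instead of being fixed up front.
def dynamic_sharing(ec_hits, num_ecs):
    """ec_hits: {core: hits observed in excess caches this epoch}.
    Returns {core: number of ECs granted for the next epoch}."""
    total = sum(ec_hits.values())
    if total == 0:
        # No observed demand: spread ECs evenly across cores.
        even, rem = divmod(num_ecs, len(ec_hits))
        return {c: even + (1 if i < rem else 0)
                for i, c in enumerate(sorted(ec_hits))}
    # Grant ECs roughly proportionally to observed benefit.
    grant = {c: (h * num_ecs) // total for c, h in ec_hits.items()}
    leftover = num_ecs - sum(grant.values())
    for c in sorted(ec_hits, key=ec_hits.get, reverse=True):
        if leftover == 0:
            break
        grant[c] += 1
        leftover -= 1
    return grant
```

This is one way to get the behavior the evaluation slides report: shared capacity goes where it helps, while cores that would suffer interference keep ECs to themselves.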

  19. Hardware Support H/W architecture: Vector table

  20. Hardware Support ECAV: Excess cache allocation vector • Data search support
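The slide names the ECAV's role (data search) without spelling out the encoding. A minimal sketch, assuming the ECAV is a per-core bit vector where bit i means "this core may search excess cache i on a local L2 miss" (the width and decode function are illustrative):

```python
# Hypothetical sketch: decode a per-core ECAV bit vector into the list of
# excess cache indices this core should probe after a local L2 miss.
def ecs_to_search(ecav, num_ecs):
    return [i for i in range(num_ecs) if (ecav >> i) & 1]

# A core allowed to use ECs 0 and 2 out of 4:
assert ecs_to_search(0b0101, 4) == [0, 2]
```

A bit vector keeps the lookup hardware simple: the same miss request can be fanned out to exactly the ECs whose bits are set.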

  21. Hardware Support SCV: Shared core vector • Cache coherency support
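A sketch of how a shared core vector could support coherence, assuming the SCV is a bit vector recording which cores share a given excess cache, so a coherence action on a block held there knows which cores to notify (the encoding and function are assumptions, not the paper's exact design):

```python
# Hypothetical sketch: an SCV records which cores share an excess cache;
# a coherence action (e.g., invalidation) is sent to every sharer except
# the requesting core itself.
def cores_to_invalidate(scv, requester, num_cores):
    return [c for c in range(num_cores) if (scv >> c) & 1 and c != requester]

# An EC shared by cores 4 and 5 (bits 4 and 5 set); core 4 requests:
assert cores_to_invalidate(0b00110000, 4, 8) == [5]
```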

  22. Hardware Support NECP: Next excess cache pointers • Data promotion / eviction support
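A minimal sketch of the eviction side, assuming the NECPs link a core's local L2 and its excess caches into a chain, so a block evicted from one level is demoted to the next EC in the chain (the list-based cache model and FIFO replacement below are illustrative simplifications):

```python
# Hypothetical sketch: NECPs chain a core's local L2 to its excess caches.
# A block evicted from one level is demoted down the chain; a block
# evicted from the last level leaves the chip-level hierarchy.
def demote(chain, level, block, caches, capacity):
    """chain: list of cache ids, e.g. ['L2', 'EC0', 'EC1'].
    Insert 'block' at chain[level]; ripple any evictions downward."""
    while level < len(chain):
        cache = caches[chain[level]]
        cache.append(block)
        if len(cache) <= capacity:
            return None          # block fits at this level; done
        block = cache.pop(0)     # evict the oldest block (FIFO here)
        level += 1               # ...and demote it to the next level
    return block                 # fell off the end of the chain

caches = {'L2': ['a'], 'EC0': ['b'], 'EC1': []}
assert demote(['L2', 'EC0', 'EC1'], 0, 'x', caches, 1) is None
assert caches == {'L2': ['x'], 'EC0': ['a'], 'EC1': ['b']}
```

Promotion on a hit would walk the same pointers in the opposite direction, moving a re-referenced block back toward the local L2.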

  23. Software Support: OS support
  • The OS enforces an excess cache utilization policy before programming the cache controllers, either by an explicit administrator decision or autonomously based on system monitoring
  • The OS may program the cache controllers at system boot, before a program starts, or during program execution
  • The OS may take workload characteristics into account before programming
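The slides do not say how the OS would pick among the three policies, so the decision rule below is entirely invented; it only illustrates the kind of workload-aware choice the slide describes, using the policy names from slide 9:

```python
# Hypothetical sketch: an OS-side policy selector. The thresholds and
# inputs are invented; the slide only says the OS may consider workload
# characteristics before programming the cache controllers.
def choose_policy(num_threads, capacity_sensitive):
    if num_threads == 1:
        return "static_private"   # isolation costs nothing with one thread
    if capacity_sensitive:
        return "dynamic_sharing"  # follow demand, avoid interference
    return "static_sharing"       # simple, maximum shared capacity
```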

  24. Evaluation: Experimental setup
  • Intel Atom-like cores with a 16-stage pipeline @ 2 GHz
  • Memory hierarchy: L1 32KB I/D, 1 cycle; L2 512KB, 10 cycles; main memory 300 cycles, contention modeled
  • On-chip network: crossbar for an 8-core CMP (4 cores + 4 ECs), contention modeled
  • Workloads: SPEC CPU 2006, SPLASH-2, and SPECjbb2005

  25. Evaluation Static private – single thread

  26. Evaluation Static private – multithreaded
  [Chart: speedups for multi-programmed and multithreaded workloads, grouped into mixes with, without, and consisting only of "H" workloads; the all-"H" mixes improve by more than 40%]

  27. Evaluation Static sharing – multithreaded
  [Chart: multi-programmed mixes suffer capacity interference, while multithreaded workloads see significant improvement]

  28. Evaluation Dynamic sharing – multithreaded
  [Chart: additional improvement for both multi-programmed and multithreaded workloads; capacity interference is avoided]

  29. Evaluation Dynamic sharing – individual workloads
  • Significant additional improvement over static sharing
  • Minimal degradation relative to static sharing

  30. Conclusion
  • Processing logic yield vs. L2 cache yield: a large number of excess L2 caches
  • StimulusCache: core disabling (yield ↑) + larger cache capacity (performance ↑)
  • Simple HW architecture extension + modest OS support
  • Three excess cache utilization policies: static private, static sharing, dynamic sharing
