
Accurate and Complexity-Effective Spatial Pattern Prediction

Accurate and Complexity-Effective Spatial Pattern Prediction. Chi Chen, Se-Hyun Yang, Babak Falsafi, Andreas Moshovos. Motivation: variation in spatial locality. Caches exploit spatial locality via block size: prefetching nearby data improves performance.


Presentation Transcript


  1. Accurate and Complexity-Effective Spatial Pattern Prediction. Chi Chen, Se-Hyun Yang, Babak Falsafi, Andreas Moshovos

  2. Motivation – Variation in Spatial Locality • Caches Exploit Spatial Locality via Block Size • Prefetch Nearby Data → Improve Performance • “One Size Fits All” Solution • Large enough for prefetching • Small enough to avoid memory link saturation • Opportunity: Variation Within and Across Applications • If the “Best Block Size” were known: • Prefetch even further → Higher Performance • “Turn off” unused data in cache → Lower Leakage Power

  3. This Work • Dynamic Spatial Pattern Prediction • Leakage Power Reduction • Sub-blocks of a Block as a Group • Place “unused” block parts in a low-leakage state • Prefetching • Consecutive Memory Blocks as a Group • Selectively Prefetch Blocks upon First Access to a Group • Key Contribution: PC + Offset Within Group • Quick Learning • Compact Representation • High Coverage

  4. How Well It Works • Spatial Pattern Predictor (SPP) • 256-entry Tag-Less Direct-Mapped • ~95% coverage • L1 Data Leakage Energy Reduction • ~40% reduction w/ 70nm CMOS technology • < 1% average performance degradation • Prefetching w/ 1024-byte Groups • Up to 2x speedup and 56% Average • Conventional Cache: 14% Slowdown

  5. Outline • Conventional Cache: Optimization Opportunities • Variation in Spatial Locality • Prediction Framework • Prior Work • Results

  6. Optimization Opportunity #1: Conventional Cache
      struct person {
          char name[20];
          …
          int age;
          int isAdult;
          struct person* next;
      };  // total 64 bytes

      // do something …
      while (people) {
          if (people->age >= 21)
              people->isAdult = TRUE;
          people = people->next;
      }
     [Figure: L1D with 64-byte cache lines. Every list node misses; in each fetched line only age, isAdult, and next are touched, and the rest of the line stays untouched.]
     • Resident untouched data → wasteful leakage
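A rough way to quantify the waste on slide 6 is to compute which 8-byte sub-blocks of the 64-byte node the loop actually references. The sketch below assumes a concrete layout: the slide elides the fields between name and age, so the hypothetical `pad` member stands in for them, sized so one node fills exactly one line. The 8-byte sub-block size and the helper names `touch` and `loop_footprint` are illustrative assumptions, not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

enum { LINE_BYTES = 64, SUBBLOCK_BYTES = 8 };   /* assumed 8-byte sub-blocks */

/* Hypothetical concrete layout: `pad` stands in for the fields the slide elides
 * between name and age, sized so one node fills exactly one 64-byte line. */
struct person {
    char name[20];
    char pad[LINE_BYTES - 20 - 2 * sizeof(int) - sizeof(struct person *)];
    int age;
    int isAdult;
    struct person *next;
};
static_assert(sizeof(struct person) == LINE_BYTES, "node should fill one cache line");

/* Set bit k of mask for every sub-block k overlapped by bytes [off, off + len). */
static unsigned touch(unsigned mask, size_t off, size_t len) {
    for (size_t b = off / SUBBLOCK_BYTES; b <= (off + len - 1) / SUBBLOCK_BYTES; b++)
        mask |= 1u << b;
    return mask;
}

/* Sub-blocks referenced by the slide's while-loop: age, isAdult, and next. */
unsigned loop_footprint(void) {
    unsigned mask = 0;
    mask = touch(mask, offsetof(struct person, age), sizeof(int));
    mask = touch(mask, offsetof(struct person, isAdult), sizeof(int));
    mask = touch(mask, offsetof(struct person, next), sizeof(struct person *));
    return mask;
}
```

Under this assumed layout the loop touches only the last two of the eight sub-blocks (mask 0xC0), so about three quarters of each resident line is never referenced while it leaks.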

  7. Optimization Opportunity #2: Conventional Cache
      struct person {
          char name[20];
          …
          int age;
          int isAdult;
      } people[LARGE];

      // do something …
      for (int i = 0; i < LARGE; i++) {
          if (people[i].age >= 21)
              people[i].isAdult = TRUE;
      }
     [Figure: L1D with 64-byte cache lines. Consecutive lines form Group #1 and Group #2, and every line is touched only at age and isAdult.]
     • Detect Access Patterns at Group Level → Selectively Prefetch Same Block Members → Improve Performance w/o Saturating Memory

  8. Variation in Spatial Locality [Figure: Average line usage for facerec, gcc, mcf, and vortex; each bar is split by the fraction of a 64B cache line touched (1/8 through 8/8, where 8/8 means all of the line was touched), with bar labels reading 40%, 89%, 26%, and 48%.] • Fraction of data used before eviction • Measured on 64KB 2-way L1D w/ 64B cache lines
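The slide 8 metric can be sketched as a per-line bit vector of touched sub-blocks, bucketed when the line is evicted. This is a hedged illustration, not the authors' simulator code: it assumes eight 8-byte sub-blocks per 64-byte line, defines average line usage as referenced sub-blocks divided by fetched sub-blocks, and uses the hypothetical helpers `record_eviction` and `average_line_usage`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: each 64-byte line is tracked as eight 8-byte sub-blocks, and bit k of
 * `used` is set if sub-block k was referenced before the line was evicted. */
enum { SUBBLOCKS_PER_LINE = 8 };
static_assert(SUBBLOCKS_PER_LINE == 8, "used-mask is a uint8_t");

static unsigned popcount8(uint8_t x) {
    unsigned n = 0;
    for (; x; x &= (uint8_t)(x - 1))   /* clear the lowest set bit each step */
        n++;
    return n;
}

/* Bucket one evicted line by how many sub-blocks it used; index k matches the
 * chart's k/8 category (index 0 is kept only for completeness). */
void record_eviction(unsigned histogram[SUBBLOCKS_PER_LINE + 1], uint8_t used) {
    histogram[popcount8(used)]++;
}

/* Average line usage: referenced sub-blocks / fetched sub-blocks. */
double average_line_usage(const unsigned histogram[SUBBLOCKS_PER_LINE + 1]) {
    unsigned long lines = 0, used = 0;
    for (unsigned k = 0; k <= SUBBLOCKS_PER_LINE; k++) {
        lines += histogram[k];
        used += (unsigned long)k * histogram[k];
    }
    return lines ? (double)used / (double)(lines * SUBBLOCKS_PER_LINE) : 0.0;
}
```

For instance, one fully used line plus one line that used only two sub-blocks gives (8 + 2) / 16 = 62.5% average usage.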

  9. Prediction Framework • Minimum Fetch Unit (MFU): replacement unit of the cache, e.g., a cache line or sub-block • Spatial Group: group of adjacent MFUs, indexed by a logical tag • Spatial Pattern: reference pattern of a spatial group • Spatial Group Generation: starts with a new logical tag [Figure: timeline of accesses to spatial groups with logical tags Tag0 and Tag1, each generation recorded as a bit pattern such as 1 0 … 1.]
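A minimal sketch of the framework's terms, assuming 64-byte MFUs and 1024-byte spatial groups (the geometry the slides use for prefetching). For simplicity it tracks a single generation and ends it as soon as a different logical tag appears; the predictor on the slides keeps several active generations in its Current Pattern Table. `generation_t`, `tag_of`, and `record_access` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64-byte MFUs (cache lines) in 1024-byte spatial groups. */
enum { MFU_BYTES = 64, GROUP_BYTES = 1024, MFUS_PER_GROUP = GROUP_BYTES / MFU_BYTES };
static_assert(MFUS_PER_GROUP <= 16, "pattern is stored in a uint16_t");

typedef struct {
    uint64_t logical_tag;  /* which spatial group: addr / GROUP_BYTES */
    uint16_t pattern;      /* bit i set once MFU i of the group is referenced */
    int active;            /* nonzero while a generation is in progress */
} generation_t;

static uint64_t tag_of(uint64_t addr) { return addr / GROUP_BYTES; }

static unsigned group_offset(uint64_t addr) {
    return (unsigned)((addr % GROUP_BYTES) / MFU_BYTES);
}

/* Record one access; returns 1 if it started a new generation. Simplification:
 * a different logical tag ends the current generation (single tracker only). */
int record_access(generation_t *g, uint64_t addr) {
    uint64_t tag = tag_of(addr);
    int started = !g->active || g->logical_tag != tag;
    if (started) {
        g->logical_tag = tag;
        g->pattern = 0;
        g->active = 1;
    }
    g->pattern |= (uint16_t)(1u << group_offset(addr));
    return started;
}
```

For example, accesses to 0x10040 and 0x10080 fall in the same group and leave pattern 0x0006 (MFUs 1 and 2); a following access to 0x10400 has a new logical tag and starts a new generation.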

  10. Spatial Pattern Prediction • Current Pattern Table (CPT) records patterns • Pattern History Table (PHT) stores captured patterns [Figure: Spatial Pattern Predictor beside the data cache. Each CPT entry holds a spatial pattern register (e.g., 0 1 1 0) and a PHT entry pointer (e.g., 001); the PHT holds the spatial pattern history and is looked up with a 32-bit prediction index built from the PC and the SPG offset.]
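The CPT/PHT interaction on slide 10 can be sketched as below. This is not the authors' hardware: the index hash, OR-ing the triggering MFU into the prediction, and "last pattern wins" training are assumptions. Only the 256-entry tag-less direct-mapped PHT and the PC + SPG offset index come from the slides; `spp_index`, `spp_start`, `spp_access`, and `spp_end` are hypothetical names, and group offsets are assumed to be below 16.

```c
#include <assert.h>
#include <stdint.h>

enum { PHT_ENTRIES = 256 };   /* 256-entry, tag-less, direct-mapped */
static_assert(PHT_ENTRIES == 256, "index is stored in a uint8_t");

typedef struct {
    uint16_t pattern[PHT_ENTRIES];  /* no tags stored: aliasing is simply tolerated */
} pht_t;

typedef struct {                    /* one Current Pattern Table entry */
    uint64_t logical_tag;
    uint16_t pattern;               /* spatial pattern register */
    uint8_t pht_index;              /* PHT entry pointer saved at generation start */
    int valid;
} cpt_entry_t;

/* Hypothetical index function: fold the PC with the offset of the triggering
 * access within its spatial group, keeping the low 8 bits. */
static uint8_t spp_index(uint64_t pc, unsigned group_offset) {
    return (uint8_t)(((pc >> 2) ^ ((uint64_t)group_offset << 4)) % PHT_ENTRIES);
}

/* Generation start: open a CPT entry and return the predicted pattern. */
uint16_t spp_start(const pht_t *pht, cpt_entry_t *e, uint64_t pc,
                   uint64_t tag, unsigned offset) {
    e->logical_tag = tag;
    e->pht_index = spp_index(pc, offset);
    e->pattern = (uint16_t)(1u << offset);   /* the triggering MFU is always used */
    e->valid = 1;
    return (uint16_t)(pht->pattern[e->pht_index] | e->pattern);
}

/* Later access in the same generation: record it in the pattern register. */
void spp_access(cpt_entry_t *e, unsigned offset) {
    e->pattern |= (uint16_t)(1u << offset);
}

/* Generation end (e.g., the group's data leaves the cache): train the PHT. */
void spp_end(pht_t *pht, cpt_entry_t *e) {
    if (e->valid)
        pht->pattern[e->pht_index] = e->pattern;
    e->valid = 0;
}
```

Because the index ignores which group is accessed, a pattern trained on one group is reused the first time the same instruction touches another group at the same offset. That is the quick-learning property the talk attributes to PC + group offset.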

  11. Prior Work • Static profiling, V. Vleet, et al. ICCD 1999 • Adjustable block size, Dubnicki & LeBlanc. ISCA 1992 • Fetching adjacent cache lines, Temam & Jegou. ICS 1994 • Dual cache, Gonzalez, Aliagas & Valero. ICS 1995 • Spatial Locality Detection Table, Johnson, Merten & Hwu. MICRO 1998 • Spatial Footprint Predictor (SFP), Kumar & Wilkerson. ISCA 1998 Key Difference is Prediction Handle: PC + Group Offset 1. Compact Representation 2. Quick Learning 3. High Coverage

  12. Results Overview • Predictor Performance Statistics • Leakage Power Reduction • Performance Improvement w/ Prefetching

  13. Methodology • SimpleScalar simulator • 64KB 2-way L1D/L1I cache, 2-cycle latency • 2MB 8-way L2 cache, 12-cycle latency • SPEC CPU2000 • Alpha binaries + reference inputs • Predictor performance evaluation • Simulated to completion • Performance impact evaluation • Skipped 10B instructions and simulated the next 500M • Energy reduction evaluation • SPICE w/ 70nm CMOS technology & 1V supply voltage

  14. Practical Predictor: Performance [Figure: correct predictions, under-predictions, over-predictions, and training, as % of perfect predictions, for a 256-entry predictor that is (A) 16-way, (B) direct-mapped, or (C) fully associative, on gcc, mcf, vortex, and facerec.] • 256-entry tag-less direct-mapped • Average prediction accuracy of 96%

  15. Predictor Applications • Leakage energy reduction • Sub-blocks as minimum fetch units • Cache lines as spatial groups • A cache miss starts a spatial group generation • Assumes the Gated-Ground technique of Agarwal, Li, & Roy • Spatial group prefetcher • Cache lines as minimum fetch units • Adjacent cache lines grouped into spatial groups • A new logical tag starts a spatial group generation
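The two applications on slide 15 differ only in what the pattern bits mean. The sketch below is hedged and uses hypothetical helper names. For leakage reduction, bits name 8-byte sub-blocks of a 64-byte line, and sub-blocks not predicted as used become candidates for the low-leakage (gated) state. For prefetching, bits name 64-byte lines of an assumed 1024-byte group, and every predicted line other than the missing one becomes a prefetch.

```c
#include <assert.h>
#include <stdint.h>

/* Leakage application (assumed 8 sub-blocks per line): sub-blocks not predicted
 * as used are candidates for the low-leakage state. */
uint8_t subblocks_to_gate(uint8_t predicted_pattern) {
    return (uint8_t)~predicted_pattern;
}

/* Prefetch application (assumed 64-byte lines in 1024-byte spatial groups). */
enum { LINE_BYTES = 64, GROUP_BYTES = 1024, LINES_PER_GROUP = GROUP_BYTES / LINE_BYTES };
static_assert(LINES_PER_GROUP <= 16, "pattern is stored in a uint16_t");

/* On the first access to a group, write one prefetch address per predicted line
 * except the line that missed; returns how many addresses were written. */
unsigned prefetch_addresses(uint64_t miss_addr, uint16_t predicted_pattern,
                            uint64_t out[LINES_PER_GROUP]) {
    uint64_t base = miss_addr - miss_addr % GROUP_BYTES;   /* start of the group */
    unsigned miss_line = (unsigned)((miss_addr % GROUP_BYTES) / LINE_BYTES);
    unsigned n = 0;
    for (unsigned i = 0; i < LINES_PER_GROUP; i++)
        if (((predicted_pattern >> i) & 1u) && i != miss_line)
            out[n++] = base + (uint64_t)i * LINE_BYTES;
    return n;
}
```

For example, a miss at 0x2040 with predicted pattern 0x000F yields prefetches for 0x2000, 0x2080, and 0x20C0. Line 0x2040 itself is being fetched by the miss.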

  16. Leakage Energy Reduction [Figure: relative leakage power and execution-time increase for facerec, gcc, mcf, vortex, and the average; lower is better for both.] • Up to 73% leakage energy reduction • ~40% average leakage energy reduction • < 1% average performance degradation

  17. Performance Improvement • Up to 2x speedup with 1024B spatial groups • ~60% average speedup with 1024B spatial groups

  18. Summary • Spatial Pattern Predictor (SPP) • Key Contribution: PC + Group Offset • Small and Effective, High Coverage • 256-entry Tag-Less Direct-Mapped • ~95% coverage • L1 Data Leakage Energy Reduction • ~40% reduction w/ 70nm CMOS technology • < 1% average performance degradation • Prefetching w/ 1024-byte Groups • Up to 2x speedup and 56% Average • Conventional Cache: 14% Slowdown

  19. Accurate and Complexity-Effective Spatial Pattern Prediction. Chi Chen, Se-Hyun Yang, Babak Falsafi, Andreas Moshovos

  20. Prediction Index [Figure: correct predictions, under-predictions, over-predictions, and training for facerec, gcc, mcf, and vortex under four prediction indices: (A) PC, (B) PC + SPG ID, (C) PC + SPG offset, (D) PC + address.] • Infinite tables • PC + SPG offset yields high prediction accuracy • PC + SPG offset has low prediction memory requirements
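One way to read slide 20 is to compare what each candidate index distinguishes. The sketch below assumes the same geometry as elsewhere (64-byte MFUs, 1024-byte groups) and uses hypothetical names. It builds the four keys. With PC + SPG offset, two groups touched by the same instruction at the same offset share a key, so the key space is bounded by the number of PCs times the MFUs per group rather than by the data footprint.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration. */
enum { MFU_BYTES = 64, GROUP_BYTES = 1024 };
static_assert(GROUP_BYTES % MFU_BYTES == 0, "groups hold whole MFUs");

typedef enum { IDX_PC, IDX_PC_SPG_ID, IDX_PC_SPG_OFFSET, IDX_PC_ADDR } index_kind;

typedef struct { uint64_t pc, extra; } index_key;   /* (PC, second component) */

index_key make_key(index_kind kind, uint64_t pc, uint64_t addr) {
    index_key k = { pc, 0 };
    switch (kind) {
    case IDX_PC:            k.extra = 0; break;                                 /* A */
    case IDX_PC_SPG_ID:     k.extra = addr / GROUP_BYTES; break;                /* B: which group */
    case IDX_PC_SPG_OFFSET: k.extra = (addr % GROUP_BYTES) / MFU_BYTES; break;  /* C: MFU within group */
    case IDX_PC_ADDR:       k.extra = addr / MFU_BYTES; break;                  /* D: which MFU */
    }
    return k;
}

int same_key(index_key a, index_key b) { return a.pc == b.pc && a.extra == b.extra; }
```

Keys B and D grow with the number of distinct groups or MFUs touched, which is why they need more predictor memory. Key A cannot tell apart accesses from one instruction at different offsets.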

  21. Contributions • Spatial Pattern Predictor (SPP) • 256-entry Tag-Less Direct-Mapped • ~95% coverage • Leakage Energy Reduction • ~40% reduction w/ 70nm CMOS technology • < 1% average performance degradation • Processor Performance Improvement • Up to 2x speedup

  22. Variations in Spatial Locality • Fraction of data used before eviction • Measured on 64KB 2-way L1D w/ 64B cache lines

  23. Prediction Index • PC + SPG offset yields high prediction accuracy • PC + SPG offset has low prediction memory requirements

  24. Predictor Memory Organization • 256-entry tag-less direct-mapped yields average prediction accuracy of 96%

  25. Spatial Group Size (1/2)

  26. Spatial Group Size (2/2)

  27. Predictor Memory Organization

  28. Leakage Energy Reduction • Up to 73% leakage energy reduction • ~40% average leakage energy reduction • < 1% average performance degradation

  29. Performance Improvement • Up to 2x speedup with 1024B spatial groups • ~60% average speedup with 1024B spatial groups

  30. Predictor Memory Organization [Figure: correct predictions, under-predictions, over-predictions, and training for gcc, mcf, vortex, and facerec under six configurations: (A) 128-entry 16-way, (B) 128-entry DM, (C) 128-entry FA, (D) 256-entry 16-way, (E) 256-entry DM, (F) 256-entry FA.] • 256-entry tag-less direct-mapped • Average prediction accuracy of 96%
