
Link-Time Path-Sensitive Memory Redundancy Elimination

Link-Time Path-Sensitive Memory Redundancy Elimination. Manel Fernández and Roger Espasa, {mfernand,roger}@ac.upc.es, Computer Architecture Department, Universitat Politècnica de Catalunya, Barcelona, Spain.


Presentation Transcript


  1. Link-Time Path-Sensitive Memory Redundancy Elimination Manel Fernández and Roger Espasa {mfernand,roger}@ac.upc.es Computer Architecture Department Universitat Politècnica de Catalunya Barcelona, Spain

  2. Motivation
     • The memory “gap”
         • Processor speed increases faster than memory speed
         • L1-cache latency continues to increase
         • Memory operations remain a significant bottleneck
     • Memory redundancy
         • Instructions that repeatedly access the same location
         • Lots of memory operations are redundant
     • Hardware designers exploit memory redundancy
         • E.g., caches take advantage of temporal reuse
     • The compiler must be very aggressive in memory optimizations

  3. Memory redundancy
     • Memory instructions that repeatedly access the same location
         • Lots of memory operations are redundant
     • Sources of redundancy
         • Source code structure
             • Programmers introduce redundancy
         • Traditional compilation
             • Separate compilation units
             • Limitations in the compilation model
             • Code generation introduces redundancy
     • What percentage of memory operations are redundant at run time?

     Example:
         … = *p;            (redundancy source)
         if ( … ) {
           *q = …           (intervening store)
           … = *p;          (redundant load)
         }
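
One way to make the run-time question on this slide concrete is to count redundant loads in a memory trace. The sketch below is only an illustration: the trace format, the function name `redundant_load_fraction`, and the definition used (a load is redundant when the last access to its address, load or store, involved the same value) are assumptions, not the measurement method used in the talk.

```python
from typing import Dict, List, Tuple

# Assumed trace format: (kind, address, value), with kind "load" or "store".
Event = Tuple[str, int, int]

def redundant_load_fraction(trace: List[Event]) -> float:
    """Fraction of dynamic loads whose value was already seen at that address."""
    last_value: Dict[int, int] = {}  # address -> value of the last access to it
    loads = 0
    redundant = 0
    for kind, addr, value in trace:
        if kind == "load":
            loads += 1
            # Redundant if the previous access to this address saw the same value.
            if last_value.get(addr) == value:
                redundant += 1
        last_value[addr] = value
    return redundant / loads if loads else 0.0

# p and q point to different locations here, so the store through q does not
# change what p points to and the second load of *p is redundant.
P, Q = 0x1000, 0x2000
trace = [("load", P, 7), ("store", Q, 9), ("load", P, 7)]
print(redundant_load_fraction(trace))  # 0.5
```

A static optimizer cannot observe addresses the way this trace does; it has to prove that `p` and `q` do not alias, which is why the later slides focus on memory disambiguation.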

  4. Dynamic memory redundancy
     [Charts: load redundancy; store redundancy]

  5. Talk outline • Motivation • Memory redundancy elimination (MRE) • Evaluation • Summary

  6. Memory redundancy elimination (MRE)
     • Removal of memory instructions that repeatedly access the same location
     • Targeted at redundancy type
         • Load redundancy elimination (LRE) in a path-sensitive fashion
             • Based on path-sensitive memory disambiguation
         • Store redundancy elimination (SRE)
     • Targeted at redundancy distance
         • Eliminating close/distant redundancy
     • In the context of a binary optimizer
         • Overcome limitations of traditional compilers
         • Need to deal with “executable code” problems

  7. Load redundancy elimination (LRE)
     • Fundamental problems
         • Alias analysis for disambiguation
         • Liveness analysis for register bypassing
         • Cost-benefit analysis for applying LRE
             • Profile information is needed
     • Eliminating close redundancy
         • Within extended basic blocks (EBBs)
     • Eliminating distant redundancy
         • Intraprocedural dataflow analysis [HorspoolHo97]
         • For fully/partially-redundant loads
             • Redundancy on all/some paths
         • Partial-LRE requires insertion of speculative loads

     Diagram (hot path):
         ...
         I1  load (p0), r1
             move r1, r0       (inserted)
         ...
         I2  load (p0), r2     (struck out)
             move r0, r2       (replacement)
         ...

     [HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis, CSSE’97
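
The close-redundancy case can be sketched on a single straight-line block. This is a minimal illustration under stated assumptions, not Alto's implementation: the tuple-based instruction format and the name `eliminate_redundant_loads` are hypothetical, every store is conservatively treated as possibly aliasing every address, and the value is forwarded from the register that already holds it instead of searching for a free bypass register with liveness analysis.

```python
def eliminate_redundant_loads(block):
    """Replace loads whose value is already held in a register by moves.

    Assumed instruction forms (hypothetical mini-IR):
      ("load", addr_reg, dst)   ("store", src, addr_reg)
      ("move", src, dst)        ("add", src, imm, dst)
    """
    avail = {}  # addr_reg -> register currently holding the value at (addr_reg)
    out = []

    def kill(reg):
        # Redefining reg invalidates entries keyed by it or held in it.
        for a in [a for a, h in avail.items() if a == reg or h == reg]:
            del avail[a]

    for ins in block:
        op = ins[0]
        if op == "load":
            _, addr, dst = ins
            holder = avail.get(addr)
            if holder is None:
                out.append(ins)
                kill(dst)
                if addr != dst:
                    avail[addr] = dst
            elif holder != dst:
                out.append(("move", holder, dst))  # bypass instead of reloading
                kill(dst)
            # If holder == dst, dst already holds the value: drop the load.
        elif op == "store":
            _, src, addr = ins
            out.append(ins)
            avail.clear()      # conservative: the store may alias anything
            avail[addr] = src  # but the stored value is now known at (addr)
        else:
            out.append(ins)
            kill(ins[-1])      # last field is the destination register
    return out

block = [("load", "p0", "r1"), ("add", "r1", 1, "r3"), ("load", "p0", "r2")]
print(eliminate_redundant_loads(block))
```

On this block the second load becomes `("move", "r1", "r2")`. With an intervening store through an unrelated register the sketch keeps the load, which is exactly the imprecision that alias analysis is meant to remove.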

  8. Memory disambiguation
     • Register use-def chains
         • Symbolic descriptors for every use
         • Disambiguation by instruction inspection
     • Fails on path-sensitive redundancies
         • Need to deal with path-sensitive information
         • Partial-LRE is not sufficient either

     Diagram: I0 def p0 ... I1 load (p0),r1 ... I3 add p0,8,p0 ... IØØ-def p0 ... I2 load (p0),r2 ... (path markers: √ ?)

  9. Path-sensitive redundancy
     • Path-sensitive memory disambiguation
         • Established for only a subset of all the possible paths
         • Subsumes generic disambiguation
     • Path-sensitive LRE
         • Partial-LRE is now adapted for dealing with path-sensitive redundancies
         • Availability on edge (AVEDGij)

     Diagram: I0 def p0 ... I1 load (p0),r1; move r1, r0 ... I3 add p0,8,p0; load (p0),r0 ... IØØ-def p0 ... move r0, r2; I2 load (p0),r2 (struck out) ... (path markers: √ x)
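
The idea of establishing disambiguation per path can be sketched by evaluating a symbolic descriptor for the address register along each incoming path separately. Everything here is an assumption made for illustration: the function name `availability_on_edges`, the descriptor shape `(defining instruction, constant offset)`, the explicit enumeration of paths, and the reading of the slide's diagram as one path that reaches I2 directly after I1 and another that passes through I3. It is not the AVEDG dataflow formulation from the talk.

```python
def availability_on_edges(paths, addr_reg):
    """For each incoming path, report whether a load from (addr_reg) is available.

    Assumed instruction forms (hypothetical): (id, "def", reg),
    (id, "add", src, imm, dst), (id, "load", addr, dst), (id, "store", src, addr).
    Every register is assumed to be defined before it is used on a path.
    """
    result = {}
    for name, instrs in paths.items():
        desc = {}      # register -> (defining instruction id, constant offset)
        avail = set()  # descriptors of addresses already loaded on this path
        for ident, op, *args in instrs:
            if op == "def":
                desc[args[0]] = (ident, 0)
            elif op == "add":
                src, imm, dst = args
                base, off = desc[src]
                desc[dst] = (base, off + imm)
            elif op == "load":
                addr, dst = args
                avail.add(desc[addr])
                desc[dst] = (ident, 0)
            elif op == "store":
                avail.clear()  # conservative: a store may alias any address
        result[name] = desc[addr_reg] in avail
    return result

paths = {
    "fallthrough": [("I0", "def", "p0"), ("I1", "load", "p0", "r1")],
    "via_I3": [("I0", "def", "p0"), ("I1", "load", "p0", "r1"),
               ("I3", "add", "p0", 8, "p0")],
}
print(availability_on_edges(paths, "p0"))
# {'fallthrough': True, 'via_I3': False}
```

Under this reading, the load at I2 can become a move once a compensating load is inserted on the edge where availability is false, which matches the `load (p0),r0` shown after I3 in the diagram.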

  10. Store redundancy elimination (SRE)
     • Similar approach to LRE
         • SRE on EBBs
         • Full- and Partial-SRE
         • New formulation of the analysis
         • No path-sensitive elimination!
     • Elimination of dead stores
         • Other optimizations produce a lot of dead stores
         • Form of dead code elimination
         • Based on heuristics
         • Includes a basic analysis for useless stack locations

     Diagrams (one instruction in each is struck out on the slide):
         ... I1 store r1, (p0) ... I2 store r2, (p0) ...
         ... I1 load (p0), r0 ... I2 store r0, (p0) ...
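
The two instruction pairs on this slide suggest two block-local patterns: a store that is overwritten before anything can read it, and a store that writes back a value just loaded from the same location. The sketch below handles both within one straight-line block. The function names and the tuple instruction format are hypothetical, any load is conservatively assumed to read any stored location, and this is not the heuristic dead-store analysis the slide refers to.

```python
def remove_silent_stores(block):
    """Drop stores that write a value memory is already known to hold."""
    known = {}  # addr_reg -> register known to equal the value at (addr_reg)
    out = []

    def kill(reg):
        for a in [a for a, h in known.items() if a == reg or h == reg]:
            del known[a]

    for ins in block:
        op = ins[0]
        if op == "load":
            _, addr, dst = ins
            out.append(ins)
            kill(dst)
            if addr != dst:
                known[addr] = dst
        elif op == "store":
            _, src, addr = ins
            if known.get(addr) == src:
                continue       # memory already holds this value
            out.append(ins)
            known.clear()      # conservative: may alias other addresses
            known[addr] = src
        else:
            out.append(ins)
            kill(ins[-1])      # last field is the destination register
    return out

def remove_dead_stores(block):
    """Drop stores overwritten later in the block with no intervening load."""
    overwritten = set()  # address registers that a later store will overwrite
    kept = []
    for ins in reversed(block):
        op = ins[0]
        if op == "store":
            if ins[2] in overwritten:
                continue            # a later store makes this one dead
            overwritten.add(ins[2])
        elif op == "load":
            overwritten.clear()     # the load might read any earlier store
        else:
            overwritten.discard(ins[-1])  # address register is redefined
        kept.append(ins)
    return kept[::-1]

print(remove_dead_stores([("store", "r1", "p0"), ("store", "r2", "p0")]))
print(remove_silent_stores([("load", "p0", "r0"), ("store", "r0", "p0")]))
```

Stores that are still pending at the end of the block are kept, since the sketch knows nothing about what follows; a whole-program optimizer would need liveness information for memory to go further.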

  11. Talk outline • Motivation • Memory redundancy elimination (MRE) • Evaluation • Summary

  12. Methodology
     • Benchmark suite
         • SPECint95
         • Compiled on an AlphaServer with full optimizations
         • Instrumented using Pixie to get profiling information
         • Aggressively re-optimized using Alto
     • Experimental framework
         • Alto executable optimizer
     • Evaluation
         • Dynamic number of loads/stores
         • Actual execution time
             • AlphaServer GS-140, Alpha EV6-21264

  13. Dynamic number of loads/stores

  14. Execution time Relative execution time on an AlphaServer GS-140, Alpha EV6-21264 525MHz

  15. Dynamic replay traps Relative number of replay traps on the sim-alpha simulator, modeling an Alpha EV6-21264

  16. Talk outline • Motivation • Memory redundancy elimination (MRE) • Evaluation • Summary

  17. Summary
     • A high percentage of memory operations are redundant
     • Memory redundancy elimination (MRE)
         • Removal of redundant memory operations
         • Load redundancy elimination (LRE) in a path-sensitive fashion
             • Based on path-sensitive memory disambiguation
         • Store redundancy elimination (SRE)
             • Including elimination of dead stores
     • For executable code or link-time
         • Overcome limitations of traditional compilers
         • Valuable results on real execution time
     • Future directions
         • Explore better alias analysis mechanism
         • Additional techniques for MRE

  18. Backup slides

  19. Dynamic memory redundancy

  20. Dynamic load redundancy

  21. Dynamic store redundancy

  22. Load redundancy elimination (LRE)
     • I2 can be removed!
         • I1 loads a value from memory into r1
         • I2 loads from the same location into r2
         • Location (p0) is not modified between I1 and I2
         • r1 can be safely bypassed to r2

     Diagram:
         ...
         I1  load (p0), r1
             move r1, r0       (inserted)
         ...
         I2  load (p0), r2     (struck out)
             move r0, r2       (replacement)
         ...

  23. LRE on executable code
     • Is (p1) at I1 the same memory location as (p2) at I2?
         • Alias analysis!
     • Is there any available register between I1 and I2 that can be used to bypass r1 to r2?
         • Register liveness analysis!

     Diagram:
         ...
         I1  load (p1), r1
             move r1, r0       (inserted)
         ...
         I2  load (p2), r2     (struck out)
             move r0, r2       (replacement)
         ...

  24. LRE: Eliminating close redundancy
     • For extended basic blocks (EBBs)
         • Alias analysis: for disambiguation
         • Register live analysis: for bypassing
     • Profile-guided LRE
         • There is not always a benefit in removing a redundant load
         • Need to evaluate cost-benefit of applying LRE!

     Diagram (hot path):
         ...
         I1  load (p0), r1
             move r1, r0       (inserted)
         ...
         I2  load (p0), r2     (struck out)
             move r0, r2       (replacement)
         ...
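
A profile-guided decision of this kind can be sketched as a comparison between the cycles saved where the load is replaced and the cycles added where a bypass move is inserted. The function `lre_gain`, its parameters, and the default latencies are placeholders chosen for illustration; the slide does not give the actual cost equation.

```python
def lre_gain(freq_redundant: float, freq_source: float,
             load_cost: float = 3.0, move_cost: float = 1.0) -> float:
    """Estimated cycles saved by replacing a redundant load with a move.

    freq_redundant: profiled execution count of the redundant load (I2)
    freq_source:    profiled execution count of the point where the extra
                    bypass move is inserted (after I1)
    load_cost, move_cost: assumed per-instruction costs (placeholders)
    """
    saved = freq_redundant * (load_cost - move_cost)  # load becomes a move
    added = freq_source * move_cost                   # extra move after I1
    return saved - added

# Redundant load on a hot path, source on a colder one: worth applying.
print(lre_gain(freq_redundant=1000, freq_source=100))   # 1900.0
# Source much hotter than the redundant load: the extra moves dominate.
print(lre_gain(freq_redundant=10, freq_source=1000))    # -980.0
```

The transformation would be applied only when the estimate is positive, which is why profile information is listed among the requirements.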

  25. LRE: Eliminating distant redundancy
     • For eliminating fully- and partially-redundant loads
         • Requires insertion of speculative loads
     • Dataflow analysis [HorspoolHo97]
         • Extended cost equation
         • Complex search for available registers

     Diagram: I2 load (p0),r1 ... I1 store r1,(p0) ...; inserted or replacement instructions shown: load (p0), r0; move r0, r1; move r1, r0 (one instruction is struck out on the slide)

     [HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis, CSSE’97

  26. Load redundancy elimination (LRE)
     • Fundamental problems
         • Alias analysis for disambiguation
         • Liveness analysis for register bypassing
         • Cost-benefit analysis for applying LRE
             • Profile information is needed
     • Eliminating close redundancy
         • Within extended basic blocks (EBBs)
     • Eliminating distant redundancy
         • Intraprocedural dataflow analysis [HorspoolHo97]
         • For fully/partially-redundant loads
         • Partial-LRE requires insertion of speculative loads

     Diagram (hot path):
         ...
         I1  load (p0), r1
             move r1, r0       (inserted)
         ...
         I2  load (p0), r2     (struck out)
             move r0, r2       (replacement)
         ...

     [HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis, CSSE’97

  27. Path-sensitive LRE
     • Path-sensitive redundancy
         • Redundancy occurs only on some execution paths
         • Partial-LRE is not sufficient
     • Memory disambiguation
         • Using register use-def chains
         • Symbolic descriptors for every use
     • Path-sensitive memory disambiguation is needed!

     Diagram: I0 def p0 ... I1 load (p0),r1 ... I3 add p0,8,p0 ... IØØ-def p0 ... I2 load (p0),r2 ...

  28. Path-sensitive memory disambiguation
     • Path-sensitive information
         • Disambiguation is established for only a subset of all the possible paths
         • For detecting path-sensitive exact memory dependencies
     • Partial-LRE
         • Algorithm is now adapted for dealing with path-sensitive redundancies
         • Availability on edge (AVEDGij)

     Diagram: I0 def p0 ... I1 load (p0),r1; move r1, r0 ... I3 add p0,8,p0; load (p0),r0 ... IØØ-def p0 ... move r0, r2; I2 load (p0),r2 (struck out) ... (path markers: √ x)

  29. A combined algorithm
     • Short-distance MRE
         • Basic MRE within EBBs
     • Long-distance MRE
         • Full: Full-MRE
         • Partial: Partial-MRE
         • Complete: Path-sensitive LRE, Partial SRE, Dead store elimination

     Diagram boxes (in extraction order): Easy optimizations (including Basic-MRE); Easy optimizations (including Basic-MRE); Easy optimizations (including Basic-MRE); Function inlining; Long-distance MRE (Full/Partial/Complete)

  30. Dynamic number of loads

  31. Dynamic number of stores

  32. Alpha 21264 results
