
Statistical Simulation of Superscalar Architectures using Commercial Workloads

Lieven Eeckhout and Koen De Bosschere, Dept. of Electronics and Information Systems (ELIS), Ghent University, Belgium. CAECW’01, January 21, 2001.


Presentation Transcript


  1. Statistical Simulation of Superscalar Architectures using Commercial Workloads Lieven Eeckhout and Koen De Bosschere Dept. of Electronics and Information Systems (ELIS) Ghent University, Belgium CAECW’01, January 21, 2001

  2. Outline • Introduction • Statistical Simulation • Statistical profiling • Synthetic trace generation • Methodology • Evaluation • Conclusion

  3. Introduction • Architectural simulation • trace-driven or execution-driven • accurate • long simulation times • long traces to be stored • Need for fast simulation techniques • take part of a full trace • analytical modeling • trace sampling • statistical simulation

  4. Goal • Previous work used SPEC benchmarks to evaluate statistical simulation • In this talk we use both commercial and scientific workloads • SPECint, SPECfp, system traces, multimedia, X graphics, database

  5. Statistical Simulation • Three steps: • extract statistical profile from a program execution • generate synthetic trace from it • simulate on a trace-driven simulator • Two major advantages: • statistical profile is more compact than full trace • fast simulation due to statistical nature • design space exploration in limited time

  6. Statistical Simulation [Diagram: real trace (e.g. SPEC benchmark) → branch profiling, cache profiling, instruction profiling → branch statistics, cache statistics, instruction statistics (statistical profile) → synthetic trace generator → synthetic trace → trace-driven simulator]

  7. Statistical Profiling • Microarchitecture-independent statistics • instruction statistics • Microarchitecture-dependent statistics • branch statistics • cache statistics • Result: statistical simulation only to explore design options of processor core (cache and branch predictor are fixed)

  8. Statistical Profiling: Instruction Statistics • Instruction mix (13 classes) • Number of register operands • Age of register operands • probability that a register operand was produced n instructions before it in the trace (only RAW) • Memory dependencies • probability that a load is memory-dependent on the n-th store before it in the trace (only RAW)
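As a rough illustration of how the instruction mix and the register-operand age distribution could be collected, here is a minimal Python sketch. The trace format (tuples of instruction type, destination register, source registers) and the function name `profile_instructions` are assumptions made for this example, not the authors' tooling, and the toy trace uses only three instruction classes rather than the 13 on the slide.

```python
from collections import Counter

def profile_instructions(trace):
    """Collect instruction mix and RAW dependency-age probabilities.

    trace: list of (itype, dest_reg, src_regs) tuples (hypothetical format).
    """
    mix = Counter()       # instruction type -> count
    age = Counter()       # RAW distance in instructions -> count
    last_writer = {}      # register -> index of its most recent producer
    for i, (itype, dest, srcs) in enumerate(trace):
        mix[itype] += 1
        for reg in srcs:
            if reg in last_writer:
                # distance back to the instruction that produced this operand
                age[i - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
    n = len(trace)
    total_deps = sum(age.values())
    mix_p = {t: c / n for t, c in mix.items()}
    age_p = {d: c / total_deps for d, c in age.items()} if total_deps else {}
    return mix_p, age_p

# Toy trace: a load, two adds that consume earlier results, and a store.
trace = [
    ("ld", "r1", []),
    ("add", "r2", ["r1"]),
    ("add", "r3", ["r1", "r2"]),
    ("st", None, ["r3"]),
]
mix_p, age_p = profile_instructions(trace)
```

Memory dependencies between loads and earlier stores could be tallied the same way, keyed by address instead of register name.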

  9. Statistical Profiling: Branch Statistics • Six branch types • conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return • Distinction • branch prediction accuracy: refill pipeline on branch misprediction • branch target prediction accuracy: single-cycle bubble in pipeline on correct branch prediction but target misprediction

  10. Statistical Profiling: Cache Statistics • D-cache statistics • L1 D-cache miss rate • L2 D-cache miss rate • I-cache statistics • L1 I-cache miss rate • L2 I-cache miss rate

  11. Synthetic Trace Generation • Instruction-by-instruction • through random number generation • Determine • instruction type • number of operands • age of register operands • memory dependency • branch behavior • D-cache behavior • I-cache behavior [Figure: example synthetic instructions (st, add, ld, br) annotated with I-cache miss, D-cache miss, and mispredicted]
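The instruction-by-instruction generation described on this slide can be sketched as repeated random draws from the profile's distributions. The profile layout, field names, and all probabilities below are hypothetical values chosen for illustration; the sketch also simplifies by drawing a single operand age per instruction and by omitting memory dependencies and L2 behavior.

```python
import random

def generate_trace(profile, n, seed=0):
    """Draw n synthetic instructions from a statistical profile (a sketch)."""
    rng = random.Random(seed)
    types = list(profile["mix"])
    type_w = [profile["mix"][t] for t in types]
    ages = list(profile["age"])
    age_w = [profile["age"][a] for a in ages]
    trace = []
    for _ in range(n):
        itype = rng.choices(types, weights=type_w)[0]
        instr = {
            "type": itype,
            # distance back to the producer of the register operand
            "dep_age": rng.choices(ages, weights=age_w)[0],
            "icache_miss": rng.random() < profile["l1i_miss"],
        }
        if itype == "ld":
            instr["dcache_miss"] = rng.random() < profile["l1d_miss"]
        if itype == "br":
            instr["mispredicted"] = rng.random() < profile["br_mispredict"]
        trace.append(instr)
    return trace

# Hypothetical profile; none of these numbers come from the talk.
profile = {
    "mix": {"add": 0.5, "ld": 0.25, "st": 0.1, "br": 0.15},
    "age": {1: 0.6, 2: 0.3, 4: 0.1},
    "l1i_miss": 0.02,
    "l1d_miss": 0.05,
    "br_mispredict": 0.08,
}
synthetic = generate_trace(profile, 1000)
```

Fixing the seed makes a run reproducible, which helps when comparing processor-core configurations on the same synthetic trace.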

  12. Methodology: microarchitecture • Out-of-order processor • 8 and 16 issue • windows of 64 and 128 instructions • McFarling branch predictor • ‘small’ cache configuration • 8KB DM L1 I-cache, 8KB DM L1 D-cache, 64KB 2WSA unified L2 cache • ‘large’ cache configuration • 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB 4WSA unified L2 cache • Access time • L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2 cache (10 cycles), main memory (80 cycles)
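To show how cache miss rates from a statistical profile combine with the access times listed above, the following sketch computes an expected load latency. It assumes the listed cycle counts are total latencies for a hit at each level and that the L2 miss rate is local (a fraction of L1 misses); the slide does not say either way, and the miss rates in the usage line are made up.

```python
# Access times from the slide (cycles); interpreting them as total
# latency per level is an assumption of this sketch.
L1D_CYCLES = 2
L2_CYCLES = 10
MEM_CYCLES = 80

def expected_load_latency(l1d_miss, l2_miss):
    """Expected latency of a load given L1 D-cache and local L2 miss rates."""
    l1_hit = 1.0 - l1d_miss
    l2_hit = l1d_miss * (1.0 - l2_miss)   # missed L1, hit L2
    mem = l1d_miss * l2_miss              # missed both levels
    return l1_hit * L1D_CYCLES + l2_hit * L2_CYCLES + mem * MEM_CYCLES

# Hypothetical miss rates: 10% in L1, half of those also miss in L2.
latency = expected_load_latency(0.1, 0.5)
```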

  13. Methodology: benchmarks • 8 SPECint95 benchmarks • 5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5) • 8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff) • 4 MediaBench applications (g721, gs, gsm, mpeg2) • 4 X graphics benchmarks (DooM, POVRay, Xanim, Quake) • 2 TPC-D queries running on Postgres 6.3 • ~ 200 million instructions / trace

  14. Evaluation • IPC prediction error = (IPC real trace - IPC synthetic trace) / IPC real trace • IPC real trace = IPC when running real trace on trace-driven simulator • IPC synthetic trace = IPC when running synthetic trace generated from the statistical profile of the real trace • Simulation speed: sIPC/xIPC less than 1% after simulating 1 million instructions
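The error metric on this slide is a relative difference with the real-trace IPC in the denominator. A minimal sketch follows, using the sign convention as the slide writes it (real minus synthetic); the function name and the IPC values are illustrative only.

```python
def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative IPC prediction error, following the slide's definition."""
    if ipc_real <= 0:
        raise ValueError("ipc_real must be positive")
    return (ipc_real - ipc_synthetic) / ipc_real

# Made-up IPC values, not results from the talk.
error = ipc_prediction_error(2.0, 1.8)
print(f"{error:.1%}")
```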

  15. IPC prediction error (1) [Bar chart: IPC prediction error (y-axis, -30% to 40%) per benchmark, grouped as SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D; two off-scale bars at 157% and 135% are annotated "high D-cache miss rate"; 16-issue, 128-entry window, ‘small’ cache configuration]

  16. IPC prediction error (2) [Bar chart: IPC prediction error (y-axis, -30% to 30%) per benchmark, grouped as SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D; 16-issue, 128-entry window, ‘large’ cache configuration]

  17. IPC prediction error vs. static instruction count [Scatter plot: IPC prediction error (y-axis, -40% to 160%) against static instruction count, i.e. number of instructions executed at least once (x-axis, 0 to 160000); four series: w = 64, i = 8, 'small' cache; w = 128, i = 16, 'small' cache; w = 64, i = 8, 'large' cache; w = 128, i = 16, 'large' cache; labeled points include nroff, jpeg (IBS), verilog, sdet, mpeg (IBS), groff, gcc, DooM, Quake, gs (IBS), gcc (IBS), vortex, go, TPC-D]

  18. Conclusion (1) • Higher IPC prediction errors for applications with smaller static instruction count: • MediaBench applications • SPECfp95 benchmarks • 2 X graphics benchmarks (POVRay and Xanim) • 5 SPECint95 benchmarks

  19. Conclusion (2) • Smaller IPC prediction errors for applications with larger instruction footprint: • IBS system traces • TPC-D traces • 2 X graphics benchmarks (DooM and Quake) • 3 SPECint95 benchmarks (go, gcc, vortex) • IPC prediction error between -1% and 25%

  20. Conclusion (3) • Statistical simulation is a useful fast simulation technique for commercial workloads • due to higher variability in instructions • since commercial workloads have larger instruction footprint • which makes a statistical technique more powerful
