The Return of Synthetic Benchmarks
January 28, 2008
Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin)
Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin
Outline
• The Need for Synthetic Benchmarks
• BenchMaker Framework for Benchmark Synthesis
• Workload Characteristics Used in Synthesis
• Synthetic Benchmark Construction
• Evaluation of BenchMaker
• Applications
• Summary
Benchmark Spectrum
• Complete Application Code
• Application Suites, e.g. SPEC CPU
• Kernel Codes, e.g. Livermore Loops
• Synthetic Benchmarks, e.g. Dhrystone, Whetstone
• Microbenchmarks, e.g. STREAM
• Toy Benchmarks, e.g. Heap sort
Toward toy benchmarks: Less Development Effort, More Scalable, More Maintainable, Less Representative
Toward complete applications: More Development Effort, Less Scalable, Less Maintainable, More Representative
Focus on Simulation Time Reduction
• Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03]
• Representative Sampling [Sherwood et al., ASPLOS’02]
• Reduced Input Set [KleinOsowski, CAN’04]
• Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05]
• Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05]
• Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04]
• Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]
Motivation: Benchmarking Challenges
• Using Real-World Applications as Benchmarks
– Proprietary Nature of Real-World Applications
• Single-Point Performance Characterization
– Application Benchmarks are Rigid
• Applications Evolve Faster than Benchmarks
– Benchmark Suites are Costly to Develop, Maintain, and Upgrade
• Studying Commercial Workload Performance
– Early Design Stage Power/Performance Studies
Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction
Resurgence of Synthetic Benchmarks (IEEE Computer, August 2003)
Workload Synthesis: Central Idea
• Just 40 workload characteristics
Modeling Real-World Applications
• Microarchitecture-Independent Workload Profiling: Real-World Proprietary Workload → Workload Profiler (Binary Instrumentation or Simulation) → Workload Profile
• Workload Profile = Workload Attributes + Distribution of Attribute Values
• Modeling Workload Attributes into Synthetic Workload: Workload Profile → Workload Synthesizer → Synthetic Benchmark Clone
• Experiment Environment: Real Hardware or Execution-Driven Simulator
Capturing the Essence of Workloads
• Attributes to capture inherent workload behavior
– Data Locality: dominant strides of static Load/Store instructions
– Control Flow Predictability: branch transition rate
• Modeling Locality & Control Flow Predictability
– Data locality of integer, scientific, and embedded workloads is effectively modeled using circular streams
– Replicating the transition rate of static branches
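One way to picture a circular stream is as a strided walk over a fixed-size region that wraps back to its start. The sketch below is illustrative only: the names `circular_stream` and `first_n` and the parameters `base`, `stride`, and `length` are assumptions made for this example, not BenchMaker's interface.

```python
from itertools import islice
from typing import Iterator, List


def circular_stream(base: int, stride: int, length: int) -> Iterator[int]:
    """Yield base, base + stride, ... and wrap around after `length` elements.

    All three parameters are hypothetical knobs chosen for illustration.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    i = 0
    while True:
        yield base + (i % length) * stride
        i += 1


def first_n(base: int, stride: int, length: int, n: int) -> List[int]:
    """Collect the first n addresses of a circular stream."""
    return list(islice(circular_stream(base, stride, length), n))
```

For example, `first_n(200, 4, 3, 7)` walks three elements with stride 4 and then repeats from address 200.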
Modeling Data Access Pattern
• Identify streams of data references
• A Stream?
– Sequence of memory addresses in an arithmetic progression
– Elements of arrays A, B, and C form 3 streams:
    for (ii = 0; ii < N; ii++)
        A[ii] = B[ii] + C[ii];
– A: 200, 204, 208, ...  B: 320, 324, 328, ...  C: 404, 408, 412, ...
– Issuing sequence: 320, 404, 200, 324, 408, 204, ...
• Streams are interleaved and may contain noise
– 4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, ...
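The interleaving in the array example can be reproduced with a few lines of code. This is a minimal sketch: the base addresses 200, 320, and 404 and the stride of 4 come from the slide, while the function name `interleaved_addresses` and the assumption that each iteration issues load B, load C, then store A are ours.

```python
from typing import Dict, List


def interleaved_addresses(bases: Dict[str, int], stride: int, n: int) -> List[int]:
    """Addresses issued by n iterations of A[ii] = B[ii] + C[ii].

    Assumes each iteration issues: load B[ii], load C[ii], store A[ii].
    """
    sequence = []
    for ii in range(n):
        offset = ii * stride
        sequence.append(bases["B"] + offset)  # load B[ii]
        sequence.append(bases["C"] + offset)  # load C[ii]
        sequence.append(bases["A"] + offset)  # store A[ii]
    return sequence


# Base addresses taken from the slide's example.
SLIDE_BASES = {"A": 200, "B": 320, "C": 404}
```

Calling `interleaved_addresses(SLIDE_BASES, 4, 2)` yields the issuing sequence shown on the slide: 320, 404, 200, 324, 408, 204.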
Extracting Streams
• Reference pattern of static Load/Store instructions
– PC-correlated spatial locality
  - Dependence on address referenced by a nearby Ld/St
  - Programs with pointer-chasing code
– PC-correlated temporal locality
  - Dependence on previous address generated by the same Ld/St
  - Programs with multidimensional arrays
• Could static Load/Store instructions be natural sources of streams?
• Profile every static Load/Store instruction
– Number of different strides with which it accesses data
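Profiling each static Load/Store could look roughly like the following sketch, which records, per instruction address (PC), the stride between consecutive addresses it generates. The trace format (a list of `(pc, address)` pairs) and the function names are assumptions made for illustration, not the profiler used by the authors.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def stride_profile(trace: Iterable[Tuple[int, int]]) -> Dict[int, Counter]:
    """Map each static load/store PC to a histogram of its observed strides."""
    last_address: Dict[int, int] = {}
    strides: Dict[int, Counter] = defaultdict(Counter)
    for pc, address in trace:
        if pc in last_address:
            # Stride = difference from the previous address of the SAME instruction.
            strides[pc][address - last_address[pc]] += 1
        last_address[pc] = address
    return dict(strides)


def dominant_stride(profile: Dict[int, Counter], pc: int) -> int:
    """Most frequently observed stride for one static instruction."""
    return profile[pc].most_common(1)[0][0]


def distinct_strides(profile: Dict[int, Counter], pc: int) -> int:
    """Number of different strides with which one instruction accesses data."""
    return len(profile[pc])
```

Keying the histogram by PC is what separates interleaved streams: each static instruction sees only its own address sequence, so accesses by other instructions do not disturb its stride.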
Modeling Instruction Level Parallelism
• Dependency Distance
    ADD R1, R3, R4
    MUL R5, R3, R2
    ADD R5, R3, R6
    LD  R4, (R1)
    SUB R8, R2, R1
• Read-After-Write dependency distance = 3 (the first ADD writes R1; the LD, three instructions later, reads it)
• Measure distribution of dependency distances: up to 1, up to 2, up to 4, up to 8, up to 16, up to 32, > 32
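The dependency-distance measurement can be sketched as follows. The instruction encoding (a destination register plus a list of source registers), the names used, and the choice of non-overlapping bins are assumptions for illustration; the slide's "up to" bins may equally be read as cumulative.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Instruction = Tuple[Optional[str], List[str]]  # (destination, sources)

BIN_EDGES = [1, 2, 4, 8, 16, 32]
BIN_LABELS = ["up to 1", "up to 2", "up to 4", "up to 8", "up to 16", "up to 32", "> 32"]


def raw_distances(instructions: List[Instruction]) -> List[int]:
    """Distance (in instructions) from each register read back to its last writer."""
    last_writer: Dict[str, int] = {}
    distances = []
    for index, (dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                distances.append(index - last_writer[reg])
        if dest is not None:
            last_writer[dest] = index
    return distances


def distance_histogram(distances: List[int]) -> Dict[str, int]:
    """Count distances per bin (each distance falls in exactly one bin)."""
    counts = [0] * len(BIN_LABELS)
    for d in distances:
        counts[bisect_left(BIN_EDGES, d)] += 1
    return dict(zip(BIN_LABELS, counts))


# The five-instruction example from the slide.
SLIDE_EXAMPLE: List[Instruction] = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]
```

On the slide's example, the LD reads R1 at distance 3 and the SUB reads R1 at distance 4, so both land in the "up to 4" bin.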
Modeling Control Flow Predictability
• Capture behavior of easy- and difficult-to-predict branches
• Inherent program feature that captures branch behavior
• Transition Rate [Haungs et al., HPCA’00] = # of Taken/Not-Taken transitions / # of times executed
• Branches with low transition rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
• Branches with high transition rate (also easier to predict): TNTNTNTNTN
• Branches with moderate transition rate (tougher to predict)
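The transition-rate definition above translates directly into code. This is a minimal sketch, assuming a branch's history is given as a string of `T` (taken) and `N` (not taken) characters; the function name is hypothetical.

```python
def transition_rate(outcomes: str) -> float:
    """Number of taken/not-taken transitions divided by number of executions."""
    if not outcomes:
        raise ValueError("branch must execute at least once")
    transitions = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return transitions / len(outcomes)
```

For the slide's patterns, `transition_rate("TTTTTTTTTN")` gives 0.1 (one transition in ten executions) and `transition_rate("TNTNTNTNTN")` gives 0.9 (nine transitions in ten executions).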
Workload Synthesis (1)
• Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities
• Synthetic Clone Generation: 1 Big Loop
[Figure: flow graph of basic blocks A, B, C, D ending in branches (BR) with transition probabilities 0.8/0.2, 1.0, 1.0, and 0.1/0.9, from which basic-block instances are laid out as one big loop]
Workload Synthesis (2)
• Adds the Memory Access Model (Strides)
Workload Synthesis (3)
• Adds the Branching Model – Based on Transition Rate
Workload Synthesis (4)
• Adds Register Assignment
• C code with asm & volatile constructs
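The first synthesis step, laying out basic blocks according to branch transition probabilities, might be sketched as a seeded random walk over a flow graph. The edges below are an assumption: the slide figure shows blocks A through D and the probabilities 0.8/0.2, 1.0, 1.0, and 0.1/0.9, but which successor each probability belongs to is not recoverable from the text. The function name and parameters are likewise hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical edge assignment; only the block names and the probability
# values appear on the slide.
FLOW_GRAPH: Dict[str, List[Tuple[str, float]]] = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 0.1), ("B", 0.9)],
}


def generate_block_sequence(
    graph: Dict[str, List[Tuple[str, float]]],
    start: str,
    n_blocks: int,
    seed: int = 0,
) -> List[str]:
    """Walk the graph, picking each successor with its transition probability."""
    rng = random.Random(seed)  # seeded so the generated loop body is reproducible
    sequence = [start]
    while len(sequence) < n_blocks:
        successors, weights = zip(*graph[sequence[-1]])
        sequence.append(rng.choices(successors, weights=weights, k=1)[0])
    return sequence
```

The resulting list of block names stands in for the body of the one big loop; the later steps on the slides (memory access model, branching model, register assignment) would then fill in each block's instructions.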
Evaluation of BenchMaker
• SPEC CPU2000, SPECjbb2005, and DBT2 workloads
• Validated Sim-Alpha performance model of the Alpha 21264
Performance Correlation
• Trade Accuracy for Flexibility – Average Error of 11%
Energy/Power Correlation
• Average Error of 13%
Modeling Impact of Benchmark Drift
• Increase in Code Footprint (hypothetical)
• Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)
Summary
• Synthetic Benchmarks to Address Benchmarking Challenges
• Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
• Applications of Synthetic Benchmarks
– Altering Program Characteristics
– Studying Interaction of Program Characteristics
– Modeling Benchmark Drift
Questions? Ajay’s email: ajoshi@ece.utexas.edu