
Improving Compiler Heuristics with Machine Learning


Presentation Transcript


  1. Improving Compiler Heuristics with Machine Learning Mark Stephenson Una-May O’Reilly Martin C. Martin Saman Amarasinghe Massachusetts Institute of Technology

  2. System Complexities • Compiler complexity • Open Research Compiler • ~3.5 million lines of C/C++ code • Trimaran’s compiler • ~ 800,000 lines of C code • Architecture Complexity

  3. NP-Completeness • Many compiler problems are NP-complete • Thus, implementations can’t be optimal • Compiler writers rely on heuristics • In practice, heuristics perform well • …but, require a lot of tweaking • Heuristics often have a focal point • Rely on a single priority function

  4. Priority Functions • A heuristic’s Achilles heel • A single priority or cost function often dictates the efficacy of a heuristic • Priority functions rank the options available to a compiler heuristic • List scheduling (identifying instructions in worklist to schedule first) • Graph coloring register allocation (selecting nodes to spill) • Hyperblock formation (selecting paths to include) • Any priority function is legal
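As a concrete illustration of what gets ranked, below is a minimal sketch of one classic priority function for list scheduling: rank ready instructions by the longest latency chain beneath them in the dependence DAG, so critical-path instructions are scheduled first. The `instr` object and its `latency`/`successors` attributes are hypothetical; this is not Trimaran or IMPACT code.

```python
# Hedged sketch of a classic list-scheduling priority (hypothetical data structures).
def list_scheduling_priority(instr):
    if not instr.successors:          # leaf of the dependence DAG
        return instr.latency
    # Height of the instruction: its latency plus the longest chain below it.
    return instr.latency + max(list_scheduling_priority(s) for s in instr.successors)
```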

  5. Our Proposal • Use machine-learning techniques to automatically search the priority function space • Increases compiler’s performance • Reduces compiler design complexity

  6. Qualities of Priority Functions • Can focus on a small portion of an optimization algorithm • Don’t need to worry about legality checking • Small change can yield big payoffs • Clear specification in terms of input/output • Prevalent in compiler heuristics

  7. An Example Optimization: Hyperblock Scheduling • Conditional execution is potentially very expensive on a modern architecture • Modern processors try to dynamically predict the outcome of the condition • This works great for predictable branches… • But some conditions can’t be predicted • If the prediction is wrong, a lot of time is wasted

  8. Example Optimization: Hyperblock Scheduling (misprediction) • [figure: execution schedule over time and functional-unit resources for if (a[1] == 0) … else …, assuming a[1] is 0 and the branch is mispredicted]

  9. Example Optimization: Hyperblock Scheduling • [figure: execution schedule over time and functional-unit resources for if (a[1] == 0) … else …, assuming a[1] is 0]

  10. Example Optimization: Hyperblock Scheduling (using predication) • [figure: both arms of if (a[1] == 0) … else … are issued across the execution resources; the processor simply discards the results of instructions that weren’t supposed to run]
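The effect of predication can be sketched in a few lines. This is an illustrative model only (hypothetical code, not IMPACT output): both arms execute unconditionally and a predicate selects which result survives.

```python
# Minimal model of if-conversion/predication (illustrative only).
def branched(a):
    if a[1] == 0:          # the branch the hardware must predict
        return a[1] + 10
    return a[1] * 2

def predicated(a):
    p = (a[1] == 0)        # predicate, computed without branching
    t = a[1] + 10          # executes regardless; kept only if p is true
    f = a[1] * 2           # executes regardless; kept only if p is false
    return t if p else f   # the result of the "wrong" arm is simply discarded
```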

  11. Example Optimization: Hyperblock Scheduling • There are unclear tradeoffs • In some situations, hyperblocks are faster than traditional execution • In others, hyperblocks impair performance • Many factors affect this decision • Accuracy of the branch predictor • Availability of parallel execution resources • Effectiveness of the compiler’s scheduler • Parallelizability and predictability of the program • Hard to model

  12. Example Optimization: Hyperblock Scheduling [Mahlke] • Find regions of control flow that are amenable to predication • Enumerate the paths of control through each region • Exponential in principle, but manageable in practice • Prioritize paths based on four path characteristics • This is the priority function we want to optimize • Add paths to the hyperblock in priority order (see the sketch below)
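A minimal sketch of the selection step, assuming a hypothetical `priority` function and a hypothetical per-hyperblock resource budget; the real IMPACT heuristic applies additional constraints.

```python
# Hedged sketch: add paths to the hyperblock in priority order until a budget is exhausted.
def form_hyperblock(paths, priority, op_budget):
    block, used = [], 0
    for path in sorted(paths, key=priority, reverse=True):
        if used + path.num_ops <= op_budget:   # hypothetical resource constraint
            block.append(path)
            used += path.num_ops
    return block
```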

  13. Trimaran’s Priority Function • Favor frequently executed paths • Favor short paths • Penalize paths with hazards • Favor parallel paths
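A hedged sketch of a priority function combining the four characteristics above; this is in the spirit of Trimaran's function, not its exact formula. The `path` attributes (`exec_ratio`, `num_ops`, `dependence_height`, `has_hazard`) echo the feature names that appear later in the talk.

```python
# Hedged sketch, not the exact Trimaran/IMPACT formula.
def path_priority(path, max_ops, max_dep_height):
    freq      = path.exec_ratio                                   # favor frequently executed paths
    shortness = max_ops / max(path.num_ops, 1)                    # favor short paths
    parallel  = max_dep_height / max(path.dependence_height, 1)   # favor parallel paths
    hazard    = 0.25 if path.has_hazard else 1.0                  # penalize paths with hazards
    return freq * hazard * (shortness + parallel)
```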

  14. Our Approach • Trimaran uses four characteristics • What are the important characteristics of a hyperblock formation priority function? • Our approach: Extract all the characteristics you can think of and let a learning technique find the priority function

  15. Genetic Programming • Searching algorithm analogous to Darwinian evolution • Maintain a population of expressions • [figure: an example expression tree, e.g. (* (/ predictability num_ops) (- 2.3 4.1))]

  16. Genetic Programming • Searching algorithm analogous to Darwinian evolution • Maintain a population of expressions • Selection • The fittest expressions in the population are more likely to reproduce • Reproduction • Crossing over subexpressions of two expressions • Mutation
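A minimal, illustrative sketch (not the authors' GP framework) of the representation and the crossover operator described above, assuming expressions are nested tuples such as ("mul", ("div", "predictability", "num_ops"), ("sub", 2.3, 4.1)).

```python
import random

def all_paths(expr, path=()):
    """Yield every node of an expression tree as a path of child indices."""
    yield path
    if isinstance(expr, tuple):
        for i, child in enumerate(expr[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_at(expr, path):
    for i in path:
        expr = expr[i]
    return expr

def replace_at(expr, path, subtree):
    if not path:
        return subtree
    i, rest = path[0], path[1:]
    return expr[:i] + (replace_at(expr[i], rest, subtree),) + expr[i + 1:]

def crossover(a, b):
    """Graft a random subexpression of b onto a random point of a (reproduction)."""
    donor = get_at(b, random.choice(list(all_paths(b))))
    return replace_at(a, random.choice(list(all_paths(a))), donor)

# Mutation (not shown) would replace a randomly chosen subtree with a freshly generated one.
```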

  17. General Flow • Randomly generated initial population, seeded with the compiler writer’s best guess • [flow diagram: create initial population (initial solutions) → evaluation → done? → selection → create variants → back to evaluation]

  18. General Flow • The compiler is modified to use the given expression as its priority function • Each expression is evaluated by compiling and running the benchmark(s) • Fitness is the relative speedup over Trimaran’s priority function on the benchmark(s) • [flow diagram repeated from slide 17]

  19. General Flow • Just as with natural selection, the fittest individuals are more likely to survive • [flow diagram repeated from slide 17]

  20. General Flow • Use crossover and mutation to generate new expressions • And thus, generate new compilers • [flow diagram repeated from slide 17]
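Putting slides 17–20 together, a hedged sketch of the whole loop. `random_expression`, `mutate`, and `fitness` are hypothetical helpers; in the authors' setup, fitness comes from rebuilding the benchmark(s) with the candidate priority function and measuring speedup over Trimaran's baseline. Population size and generation count here are placeholders, not the paper's settings.

```python
import random

def evolve(seed_expr, pop_size=100, generations=50):
    # Random initial population, seeded with the compiler writer's best guess (slide 17).
    population = [seed_expr] + [random_expression() for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)   # evaluation (slide 18)
        parents = ranked[: pop_size // 4]                        # selection (slide 19)
        population = parents + [
            mutate(crossover(*random.sample(parents, 2)))        # create variants (slide 20)
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=fitness)
```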

  21. Experimental Setup • Collect results using Trimaran • Simulate a VLIW machine • 64 GPRs, 64 FPRs, 128 PRs • 4 fully pipelined integer FUs • 2 fully pipelined floating-point FUs • 2 memory units (latencies: L1: 2, L2: 7, L3: 35 cycles) • Replace priority functions in IMPACT with our GP expression parser and evaluator

  22. Outline of Results • High-water mark • Create compilers specific to a given application and a given data set • Essentially partially evaluating the application • Application-specific compilers • Compiler trained for a given application and data set, but run with an alternate data set • General-purpose compiler • Compiler trained on multiple applications and tested on an unrelated set of applications

  23. Training the Priority Function • [diagram: source files A.c, B.c, C.c, D.c; the compiler; data sets 1 and 2; benchmark runs A, B, C, D]

  24. Training the Priority Function: Application-Specific Compilers • [diagram: same setup as slide 23 — source files A.c–D.c, the compiler, data sets 1 and 2, benchmark runs A–D]

  25. Hyperblock Results: Application-Specific Compilers (High-Water Mark) • [bar chart: speedup on the training input vs. a novel input for 129.compress, g721decode, g721encode, huff_dec, huff_enc, mpeg2dec, rawcaudio, rawdaudio, and toast; average speedup ≈1.54 on the training input and ≈1.23 on the novel input] • Example evolved expressions: (add (sub (cmul (gt (cmul $b0 0.8982 $d17)…$d7)) (cmul $b0 0.6183 $d28))) and (add (div $d20 $d5) (tern $b2 $d0 $d9))

  26. Training the Priority Function: General-Purpose Compilers • [diagram: source files A.c–D.c, a single compiler, data set 1, benchmark runs A–D]

  27. Hyperblock Results: General-Purpose Compiler • [bar chart: speedup on the training data set vs. a novel data set for 124.m88ksim, 129.compress, codrle4, decodrle4, g721decode, g721encode, huff_dec, huff_enc, mpeg2dec, rawcaudio, rawdaudio, and toast; average speedup ≈1.44 on the training data set and ≈1.25 on the novel data set]

  28. Validation of Generality: Testing General-Purpose Applicability • [bar chart: speedup on applications not in the training set — 023.eqntott, 052.alvinn, 085.cc1, 130.li, 132.ijpeg, 147.vortex, art, djpeg, mipmap, osdemo, rasta, and unepic; average speedup ≈1.09]

  29. Running Time • Application specific compilers • ~1 day using 15 processors • General-purpose compilers • Dynamic Subset Selection [Gathercole] • Run on a subset of the training benchmarks at a time • Memoize fitnesses • ~1 week using 15 processors • This is a one time process! • Performed by the compiler vendor

  30. GP Hyperblock Solutions: General Purpose (add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref) (mul 0.6727 num_paths) (mul 1.1609 (add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths))))))) • Highlighted: an intron that doesn’t affect the solution

  31. GP Hyperblock Solutions: General Purpose (add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref) (mul 0.6727 num_paths) (mul 1.1609 (add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths))))))) • Highlighted: favor paths that don’t have pointer dereferences

  32. GP Hyperblock Solutions: General Purpose (add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref) (mul 0.6727 num_paths) (mul 1.1609 (add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths))))))) • Highlighted: favor highly parallel (fat) paths

  33. GP Hyperblock Solutions: General Purpose (add (sub (mul exec_ratio_mean 0.8720) 0.9400) (mul 0.4762 (cmul (not has_pointer_deref) (mul 0.6727 num_paths) (mul 1.1609 (add (sub (mul (div num_ops dependence_height) 10.8240) exec_ratio) (sub (mul (cmul has_unsafe_jsr predict_product_mean 0.9838) (sub 1.1039 num_ops_max)) (sub (mul dependence_height_mean num_branches_max) num_paths))))))) • Highlighted: if a path calls a subroutine that may have side effects, penalize it

  34. Our Proposal • Use machine-learning techniques to automatically search the priority function space • Increases compiler’s performance • Reduces compiler design complexity

  35. Eliminate the Human from the Loop • So far we have tried to improve existing priority functions • Still a lot of person-hours were spent creating the initial priority functions • Observation: the human-created priority functions are often eliminated in the 1st generation • What if we start from a completely random population (no human-generated seed)?

  36. Another Example: Register Allocation • An old, established problem • Hundreds of papers on the subject • Priority-Based Register Allocation [Chow, Hennessy] • Uses a priority function to determine the worth of allocating a register • Let’s throw our GP system at the problem and see what it comes up with
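A hedged sketch in the spirit of Chow and Hennessy's priority-based coloring, not their exact formula: the estimated benefit of keeping a live range in a register, normalized by how many basic blocks the range spans. The `live_range` attributes are hypothetical.

```python
# Hedged sketch of a register-allocation priority function (illustrative only).
def allocation_priority(live_range, load_cost=2.0, store_cost=2.0):
    savings = 0.0
    for block in live_range.blocks:
        # Every use avoids a load and every definition avoids a store if the
        # live range gets a register, weighted by how often the block runs.
        savings += block.freq * (block.uses * load_cost + block.defs * store_cost)
    return savings / max(len(live_range.blocks), 1)   # short, hot ranges win registers first
```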

  37. Register Allocation Results: General-Purpose Compiler • [bar chart: speedup on the training data set vs. a novel data set for 129.compress, g721decode, g721encode, huff_dec, huff_enc, mpeg2dec, rawcaudio, and rawdaudio; average speedup ≈1.03 on both]

  38. Validation of Generality: Testing General-Purpose Applicability • [bar chart: speedup on applications not in the training set — 023.eqntott, 085.cc1, 124.m88ksim, 130.li, 132.ijpeg, 147.vortex, codrle4, decodrle4, djpeg, and unepic; average speedup ≈1.02]

  39. Importance of Priority Functions: Speedup over a Constant Priority Function • [bar chart comparing Trimaran’s hand-tuned function and GP’s evolved function, each measured as speedup over a constant priority function, on 129.compress, g721decode, g721encode, huff_dec, huff_enc, mpeg2dec, rawcaudio, and rawdaudio; average speedups ≈1.32 (Trimaran’s function) and ≈1.36 (GP’s function)]

  40. Advantages our System Provides • Engineers can focus their energy on more important tasks • Can quickly retune for architectural changes • Can quickly retune when the compiler changes • Can provide compilers catered toward specific application suites • e.g., a customer may want a compiler that excels on scientific benchmarks

  41. Related Work • Calder et al. [TOPLAS-19] • Fine tuned static branch prediction heuristics • Requires a priori classification by a supervisor • Monsifrot et al. [AIMSA-02] • Classify loops based on amenability to unrolling • Also used a priori classification • Cooper et al. [Journal of Supercomputing-02] • Use GAs to solve phase ordering problems

  42. Conclusion • This was both a performance talk and a complexity talk • Take a huge compiler, optimize one priority function with GP, and get exciting speedups • Take a well-known heuristic and create priority functions for it from scratch • There’s a lot left to do

  43. Why Genetic Programming? • Many learning techniques rely on having pre-classified data (labels) • e.g., statistical learning, neural networks, decision trees • Priority functions require another approach • Reinforcement learning • Unsupervised learning • Several techniques that might work well • e.g., hill climbing, active learning, simulated annealing

  44. Why Genetic Programming • Benefits of GP • Capable of searching high-dimensional spaces • It is a distributed algorithm • The solutions are human readable • Nevertheless…there are other learning techniques that may also perform well

  45. Genetic Programming: Reproduction • [figure: crossover between two parent expression trees built from the operators *, /, and - and the leaves predictability, num_ops, branches, 1.1, 2.3, 4.1, and 7.5; a randomly chosen subexpression is swapped between the parents to create offspring]
