
Wish Branches Combining Conditional Branching and Predication for Adaptive Predicated Execution

Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt. Electrical and Computer Engineering, The University of Texas at Austin. *Oregon Microarchitecture Lab, Intel Corporation.


Presentation Transcript


  1. Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt. Electrical and Computer Engineering, The University of Texas at Austin. *Oregon Microarchitecture Lab, Intel Corporation.

  2. Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion

  3. Predicated Execution
     Source: if (cond) { b = 0; } else { b = 1; }
     Normal branch code (blocks A to D; the branch at the end of A goes to C if taken, to B if not taken):
       A: p1 = (cond); branch p1, TARGET
       B: mov b, 1; jmp JOIN
       C: TARGET: mov b, 0
       D: add x, b, 1
     Predicated code:
       A: p1 = (cond)
       B: (!p1) mov b, 1
       C: (p1) mov b, 0
       D: add x, b, 1
     Predication converts a control-flow dependency into a data dependency.
     Pro: Eliminates hard-to-predict branches.
     Cons: (1) Blocks B and C are fetched all the time. (2) Dependent instructions wait until p1 is resolved.
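
The trade-off on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' simulator: the function names `run_branch` and `run_predicated` are hypothetical, and "fetched" simply lists the instructions each version brings into the pipeline for one execution of the if/else.

```python
def run_branch(cond):
    """Normal branch code: only one of blocks B and C is fetched."""
    p1 = bool(cond)
    fetched = ["p1 = (cond)", "branch p1, TARGET"]   # block A
    if p1:
        fetched.append("mov b, 0")                   # block C (branch taken)
        b = 0
    else:
        fetched += ["mov b, 1", "jmp JOIN"]          # block B (branch not taken)
        b = 1
    fetched.append("add x, b, 1")                    # block D
    return b + 1, fetched


def run_predicated(cond):
    """Predicated code: blocks B and C are always fetched; p1 selects the result."""
    p1 = bool(cond)
    fetched = ["p1 = (cond)", "(!p1) mov b, 1", "(p1) mov b, 0", "add x, b, 1"]
    b = None
    if not p1:      # (!p1) mov b, 1
        b = 1
    if p1:          # (p1) mov b, 0
        b = 0
    return b + 1, fetched


for cond in (True, False):
    x_branch, _ = run_branch(cond)
    x_pred, _ = run_predicated(cond)
    print(cond, x_branch, x_pred)
```

Both versions compute the same x for either value of cond; the predicated version has no branch to mispredict but always fetches both mov instructions.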

  4. The Overhead of Predicated Execution
     [Chart residue: bar labels -2%, 16%, 13%; baseline labeled non-predicated]
     Predicated code (blocks A to D): p1 = (cond); (!p1) mov b,1; (p1) mov b,0; add x, b, 1
     Same code with the predicate values known: p1 = (cond); (0) mov b,1; (1) mov b,0
     If all overhead is ideally eliminated, predicated execution would provide a 16% improvement in average execution time.

  5. The Problem • Due to the predication overhead, predicated execution sometimes reduces performance • Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior • The compiler cannot accurately estimate the run-time behavior of branches

  6. Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion

  7. Wish Branches • A new type of control-flow instruction, in 3 types: wish jump, wish join, and wish loop • The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code) • The hardware decides at run time whether to execute predicated code or normal branch code, based on the confidence of the branch prediction • Easy to predict: normal branch code • Hard to predict: predicated code

  8. Wish Jump/Join
     Normal branch code:
       A: p1 = (cond); branch p1, TARGET
       B: mov b, 1; jmp JOIN
       C: TARGET: mov b, 0
     Predicated code:
       A: p1 = (cond)
       B: (!p1) mov b, 1
       C: (p1) mov b, 0
     Wish jump/join code:
       A: p1 = (cond); wish.jump p1 TARGET
       B: (!p1) mov b, 1; wish.join !p1 JOIN
       C: TARGET: (p1) mov b, 0
       D: JOIN:
     High confidence: the wish jump is predicted taken or not-taken and the code runs like normal branch code, with the predicates treated as true: (1) mov b,1; wish.join (1) JOIN; TARGET: (1) mov b,0
     Low confidence: the wish jump and wish join act as nops and the code runs as predicated code.
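
A minimal Python sketch of the two execution modes for the wish jump/join code follows. It rests on stated assumptions rather than on the authors' hardware design: the function name and return shape are hypothetical, a high-confidence wish jump is modeled as an ordinary predicted branch that causes a flush when the prediction is wrong, and a low-confidence wish jump is modeled as never taken so that blocks B and C are both fetched under their predicates and no flush is needed.

```python
def run_wish_jump_join(cond, high_confidence, predicted_taken=False):
    """Return (b, fetched_blocks, flush) for one execution of the if/else.

    fetched_blocks is what the front end fetches first; on a flush the
    correct path would be refetched afterwards (not modeled here).
    """
    p1 = bool(cond)
    if high_confidence:
        # Normal-branch mode: follow the branch predictor's direction.
        fetched_blocks = ["A", "C", "D"] if predicted_taken else ["A", "B", "D"]
        flush = predicted_taken != p1        # misprediction costs a pipeline flush
    else:
        # Predicated mode: wish jump/join fall through, B and C both fetched.
        fetched_blocks = ["A", "B", "C", "D"]
        flush = False                        # the predicate picks the right mov
    b = 0 if p1 else 1                       # architectural result is mode-independent
    return b, fetched_blocks, flush


print(run_wish_jump_join(True, high_confidence=True, predicted_taken=True))
print(run_wish_jump_join(True, high_confidence=True, predicted_taken=False))
print(run_wish_jump_join(True, high_confidence=False))
```

The point of the sketch is that the value of b never depends on the mode; only the fetched blocks and the flush behavior change.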

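  9. Wish Loop
     Source: do { a++; i++; } while (i<N);
     Normal backward branch code:
       X: LOOP: add a, a, 1; add i, i, 1; p1 = (i<N); branch p1, LOOP
       Y: EXIT:
     Wish loop code:
       H: mov p1, 1
       X: LOOP: (p1) add a, a, 1; (p1) add i, i, 1; (p1) p1 = (cond); wish.loop p1, LOOP
       Y: EXIT:
     High confidence: the loop body runs like normal backward branch code, with the predicates treated as true: (1) add a, a, 1; (1) add i, i, 1; (1) p1 = (cond)
     Low confidence: the loop body runs as predicated code.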
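
To see why the predicated loop body is safe to over-fetch, here is a small Python sketch. The helper names are hypothetical and `fetched_iterations` stands in for however many copies of block X the front end happens to fetch; once p1 becomes false, every remaining copy of the body has no effect.

```python
def do_while(a, i, n):
    """Reference semantics of: do { a++; i++; } while (i < n);"""
    while True:
        a += 1
        i += 1
        if not (i < n):
            return a, i


def wish_loop_predicated(a, i, n, fetched_iterations):
    """Execute `fetched_iterations` copies of the predicated loop body."""
    p1 = True                      # H: mov p1, 1
    nop_iterations = 0
    for _ in range(fetched_iterations):
        if p1:
            a += 1                 # (p1) add a, a, 1
            i += 1                 # (p1) add i, i, 1
            p1 = i < n             # (p1) p1 = (cond)
        else:
            nop_iterations += 1    # predicate is false: the body acts as nops
    return a, i, nop_iterations


print(do_while(0, 0, 3))
print(wish_loop_predicated(0, 0, 3, fetched_iterations=5))
```

With three real iterations and five fetched ones, the last two copies of the body change nothing, so the result matches the reference loop.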

  10. Mispredicted Case 1: Early-Exit
      Correct execution: H X1 X2 X3 Y (loop branch: T T N)
      Early-exit (low confidence): H X1 X2 Y ... (loop branch predicted: T N); flush pipeline
      Compared to normal branch code: predicate data dependency and one extra instruction (-)

  11. Mispredicted Case 2: Late-Exit
      Correct execution: H X1 X2 X3 Y (loop branch: T T N)
      Late-exit (low confidence): H X1 X2 X3 X4 X5 Y ... (loop branch predicted: T T T T N); X4 and X5 become nops
      Compared to normal branch code: pro: reduces flush penalty (+++); cons: predicate data dependency and one extra instruction (-)

  12. Mispredicted Case 3: No-Exit
      Correct execution: H X1 X2 X3 Y (loop branch: T T N)
      No-exit (low confidence): H X1 X2 X3 X4 X5 X6 ... (loop branch predicted: T T T T T T); flush pipeline
      Compared to normal branch code: predicate data dependency and one extra instruction (-)
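
The three mispredicted cases for a low-confidence wish loop can be summarized as a small classifier. This is a sketch of the slides' taxonomy only: the function name, its parameters, and the return tuple are hypothetical, and `front_end_exited` is an assumed flag meaning the front end had already left the loop when the final loop branch resolved.

```python
def classify_wish_loop_exit(actual_iterations, fetched_iterations, front_end_exited):
    """Return (case, flush_needed, nop_iterations) for a low-confidence wish loop."""
    if not front_end_exited:
        # Still fetching the loop body when the exit is resolved.
        return ("no-exit", True, 0)
    if fetched_iterations < actual_iterations:
        # Left the loop too soon: the missing iterations must be refetched.
        return ("early-exit", True, 0)
    if fetched_iterations > actual_iterations:
        # Extra iterations carry a false predicate and act as nops: no flush.
        return ("late-exit", False, fetched_iterations - actual_iterations)
    return ("correct", False, 0)


# The examples from slides 10 to 12, where the loop really runs 3 iterations:
print(classify_wish_loop_exit(3, 2, True))    # early-exit
print(classify_wish_loop_exit(3, 5, True))    # late-exit, X4 and X5 are nops
print(classify_wish_loop_exit(3, 6, False))   # no-exit
```

Only the late-exit case avoids the flush, which is where the wish loop gains over a normal backward branch.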

  13. Advantages/Disadvantages of Wish Branches • Advantages compared to predicated execution • Reduce the overhead of predication • Increase the benefits of predicated code by allowing the compiler to generate more aggressively predicated code • Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops) • Make predicated code less dependent on machine configuration (e.g., the branch predictor)

  14. Advantages/Disadvantages of Wish Branches • Disadvantages compared to predicated execution • Extra branch instructions use machine resources • Extra branch instructions increase the contention for branch predictor table entries • May constrain the compiler’s scope for code optimizations

  15. Wish Branch Support • ISA Support • Predicated execution, wish branch instructions • Compiler Support • Wish branch generation algorithms: the compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches • Hardware Support • Confidence estimator • Front-end and branch misprediction detection/recovery module

  16. Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion

  17. Experimental Infrastructure • IA-64 provides full support for predication • IA-64 traces are converted to micro-ops to simulate an out-of-order superscalar processor model • Tool flow: Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator

  18. Simulation Methodology • Nine SPEC 2000 integer benchmarks • Baseline Processor Configuration • Front End • Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local) • Minimum 30-cycle branch misprediction penalty • 64KB, 2-cycle latency I-cache • Execution Core • 8-wide out-of-order processor • 512-entry instruction window • Confidence Estimator • 1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)
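
For readers unfamiliar with JRS-style confidence estimation, the following Python sketch shows the general idea: a table of resetting counters that count consecutive correct predictions, indexed by a hash of the branch address and branch history. It is a simplified, untagged illustration; the table size, counter width, threshold, and XOR index function below are assumptions chosen for the example, not the 1KB tagged configuration used in the talk.

```python
class JRSConfidenceEstimator:
    """Simplified resetting-counter confidence estimator (untagged sketch)."""

    def __init__(self, index_bits=10, counter_max=15, threshold=15):
        self.mask = (1 << index_bits) - 1
        self.counter_max = counter_max      # saturation value (assumed)
        self.threshold = threshold          # high-confidence cutoff (assumed)
        self.counters = [0] * (1 << index_bits)

    def _index(self, pc, history):
        # Assumed index function: XOR of branch address and global history.
        return (pc ^ history) & self.mask

    def is_high_confidence(self, pc, history):
        return self.counters[self._index(pc, history)] >= self.threshold

    def update(self, pc, history, prediction_correct):
        idx = self._index(pc, history)
        if prediction_correct:
            self.counters[idx] = min(self.counters[idx] + 1, self.counter_max)
        else:
            self.counters[idx] = 0          # reset on a misprediction


est = JRSConfidenceEstimator(counter_max=3, threshold=3)
for _ in range(3):
    est.update(pc=0x40, history=0b1011, prediction_correct=True)
print(est.is_high_confidence(0x40, 0b1011))
est.update(pc=0x40, history=0b1011, prediction_correct=False)
print(est.is_high_confidence(0x40, 0b1011))
```

In a wish-branch front end, a high-confidence answer would select normal branch mode and a low-confidence answer would select predicated mode.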

  19. Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion

  20. Performance Improvement
      [Chart residue: bar labels -4%, 14%, 2.02, 8%, 24%; baseline labeled non-predicated]
      Improvement figures shown on the slide:
      • 16% over conditional branch prediction, 11% over selective-predication, 7% over aggressive-predication (w/o mcf)
      • 14% over conditional branch prediction, 13% over selective-predication, 16% over aggressive-predication
      • 12% over conditional branch prediction, 11% over selective-predication, 13% over aggressive-predication
      AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated
      SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis

  21. Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion

  22. Conclusion • New control-flow instructions: wish branches (jump/join/loop) • Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture • Compiler: analyzes the control-flow graph and generates code • Microarchitecture: makes the run-time decision to use predication • Wish branches provide significant performance benefits • 16% compared to conditional branch prediction • 13% compared to selectively predicated code • Wish branches can make predicated execution more viable and effective in high-performance processors • By enabling adaptive and aggressive predicated execution
