
HW Speculation


erma

Presentation Transcript


  1. HW Speculation

  2. HW Support for Speculation
     • Ideal view
       • Do conditional things in advance of the branch
       • Undo them if the branch goes the wrong way
       • Also implies undoing things on an exception
     • Limits
       • Speculated values can’t overwrite any real results
       • Exceptions can’t cause any destructive activity
     • Consider storage for destructive operations
       • Registers and data caches
     • Any ideas on what might help here?

  3. HW Speculation Aids
     • Poison bits on registers
       • Fault if a regular instruction tries to use that value
       • HW (and OS) ignores the exception until the instruction commits
     • More general tags
       • Speculative instructions and their results must be tagged until the condition is resolved
     • Boosting
       • Provide separate shadow resources for boosted instructions
       • If the condition resolves in a way that selects the boosted path, the results are committed to the real registers
       • Note that this won’t work for memory…
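
The poison-bit idea on slide 3 can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real machine: the class and method names (`RegisterFile`, `spec_load`, `spec_add`, `add`) are hypothetical, memory is modeled as a dictionary, and the rule that speculative instructions propagate poison rather than fault is an assumption of the sketch.

```python
class SpeculativeFault(Exception):
    """Raised when a regular instruction consumes a poisoned value."""


class RegisterFile:
    def __init__(self, n=8):
        self.value = [0] * n
        self.poison = [False] * n   # one poison bit per register

    def spec_load(self, dst, memory, addr):
        """Speculative load: a bad address poisons dst instead of faulting."""
        if addr in memory:
            self.value[dst] = memory[addr]
            self.poison[dst] = False
        else:
            self.poison[dst] = True   # exception is deferred, not raised

    def spec_add(self, dst, a, b):
        """Speculative add (assumption): poison propagates, still no fault."""
        self.poison[dst] = self.poison[a] or self.poison[b]
        if not self.poison[dst]:
            self.value[dst] = self.value[a] + self.value[b]

    def add(self, dst, a, b):
        """Regular add: faults if either source register is poisoned."""
        if self.poison[a] or self.poison[b]:
            raise SpeculativeFault(f"poisoned source feeding r{dst}")
        self.value[dst] = self.value[a] + self.value[b]
        self.poison[dst] = False


rf = RegisterFile()
rf.spec_load(1, {0x10: 7}, 0x10)   # valid address: r1 = 7
rf.spec_load(2, {0x10: 7}, 0x99)   # invalid address: r2 is poisoned, no fault
rf.spec_add(3, 1, 2)               # poison flows into r3
```

If the speculated path turns out not to be taken, the poisoned registers are simply never read by a regular instruction and the deferred exception disappears.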

  4. Aggression Levels in Speculation
     • Consider an if-then-else block
     • Traditional conservative method
       • Do them in order…
     • Using prediction
       • Start the predicted path while evaluating the condition
       • Either continue or nullify based on the condition result
     • Aggressive
       • Start all three blocks
       • When the condition is known, nullify the unselected path
       • Implies LOTS of resources

  5. Hardware-Based Speculation
     • Combination of three key ideas
       • Dynamic branch prediction
       • Speculation: allow speculated blocks to start before condition resolution
       • Dynamic scheduling (Tomasulo-style)
     • Advantages
       • More instruction order flexibility: things tend to run as soon as they can
       • Dynamic memory disambiguation
       • Dynamic branch prediction works better than static
       • Able to maintain precise exceptions (not easy, but doable)
       • Relieves compiler from difficult machine-specific stuff

  6. HW Speculation Approach
     • Allow out-of-order issue to the functional units (i.e., out-of-order execution)
     • Require in-order commit when an instruction is no longer speculative
     • Prevent speculative changes from changing state
       • e.g. memory write or register write
     • Collect pre-commit instructions in a reorder buffer
       • Holds completed but not yet committed instructions
       • Effectively contains a set of virtual registers
         • Similar to a reservation station
         • Becomes a bypass (forwarding) source
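
The in-order-commit rule on slide 6 can be sketched as a queue whose head is the only entry allowed to update state. This is a minimal sketch under stated assumptions: `RobEntry` and `ReorderBuffer` are hypothetical names, each entry carries just an instruction label, a destination, and a value, and registers are a plain dictionary.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str
    dest: str
    value: Optional[float] = None   # None until execution completes


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()       # program order, oldest at the left

    def allocate(self, instr, dest):
        """Reserve a slot at the tail when the instruction is issued."""
        entry = RobEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def commit(self, regs):
        """Retire completed entries from the head; stop at the first incomplete one."""
        committed = []
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.popleft()
            regs[entry.dest] = entry.value   # only now does real state change
            committed.append(entry.instr)
        return committed


regs = {}
rob = ReorderBuffer()
div = rob.allocate("DIV", "F0")
add = rob.allocate("ADD", "F2")
add.value = 5.0                      # the younger ADD finishes first
early = rob.commit(regs)             # nothing commits: DIV is still at the head
div.value = 2.0
late = rob.commit(regs)              # both retire, in program order
```

Here `early` is an empty list and `late` is `["DIV", "ADD"]`: the ADD result sat in the buffer, available for forwarding but invisible to the register file, until the older DIV completed.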

  7. HW Support for More ILP
     • Need a HW buffer for results of uncommitted instructions: the reorder buffer
       • 3 fields: instr, destination, value
       • Reorder buffer can be an operand source => more registers, like RS
       • Use reorder buffer number instead of reservation station when execution completes
       • Supplies operands between execution complete & commit
       • Once the instruction commits, its result is put into the register
       • Instructions commit
       • As a result, it’s easy to undo speculated instructions on mispredicted branches or on exceptions
     [Figure labels: Reorder Buffer, FP Op Queue, FP Regs, Res Stations, Res Stations, FP Adder, FP Adder]

  8. Four Steps of Speculative Tomasulo Algorithm
     1. Issue: get instruction from FP Op Queue
        If a reservation station and a reorder buffer slot are free, issue instr & send operands & reorder buffer no. for destination (this stage is sometimes called “dispatch”)
     2. Execution: operate on operands (EX)
        When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (sometimes called “issue”)
     3. Write result: finish execution (WB)
        Write on the Common Data Bus to all awaiting FUs & the reorder buffer; mark the reservation station available.
     4. Commit: update register with reorder result
        When instr. is at the head of the reorder buffer & its result is present, update the register with the result (or store to memory) and remove the instr from the reorder buffer. A mispredicted branch flushes the reorder buffer (sometimes called “graduation”)
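
The four steps above can be walked through with a toy cycle-by-cycle model. This is only a sketch under loud assumptions: the latencies in `LATENCY` are made up, reservation stations and reorder-buffer slots are unlimited, one instruction issues and one commits per cycle, any number of results may use the CDB in a cycle, and there are no branches, loads, or stores. All names (`simulate`, `producer`, `stations`) are hypothetical.

```python
LATENCY = {"ADD": 2, "MUL": 4}   # assumed execution latencies in cycles
OPS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}


def simulate(program, regs):
    """program: list of (op, dest, src1, src2). Returns (regs, finish_order, commit_order)."""
    regs = dict(regs)
    producer = {}     # register -> ROB index of its newest in-flight writer
    rob = []          # reorder buffer entries in program order
    stations = []     # busy reservation stations
    head = pc = 0     # next entry to commit, next instruction to issue
    finish_order, commit_order = [], []
    while head < len(program):
        # 4. Commit: only the head of the reorder buffer may update a register.
        if head < len(rob) and rob[head]["done"]:
            entry = rob[head]
            regs[entry["dest"]] = entry["value"]
            if producer.get(entry["dest"]) == head:
                del producer[entry["dest"]]
            commit_order.append(head)
            head += 1
        # 3. Write result: broadcast on the CDB to waiting stations and the ROB.
        for s in [s for s in stations if s["left"] == 0]:
            result = OPS[s["op"]](s["vj"], s["vk"])
            rob[s["rob"]].update(value=result, done=True)
            finish_order.append(s["rob"])
            for w in stations:
                if w["qj"] == s["rob"]:
                    w["vj"], w["qj"] = result, None
                if w["qk"] == s["rob"]:
                    w["vk"], w["qk"] = result, None
            stations.remove(s)   # station becomes available again
        # 2. Execute: a station advances only when both operands are present.
        for s in stations:
            if s["qj"] is None and s["qk"] is None and s["left"] > 0:
                s["left"] -= 1
        # 1. Issue: allocate a station and a ROB slot, read or tag the operands.
        if pc < len(program):
            op, dest, src1, src2 = program[pc]
            s = {"op": op, "rob": len(rob), "left": LATENCY[op]}
            for src, v, q in ((src1, "vj", "qj"), (src2, "vk", "qk")):
                tag = producer.get(src)
                if tag is None:
                    s[v], s[q] = regs[src], None           # committed value
                elif rob[tag]["done"]:
                    s[v], s[q] = rob[tag]["value"], None   # forwarded from the ROB
                else:
                    s[v], s[q] = None, tag                 # wait on the CDB
            rob.append({"dest": dest, "value": None, "done": False})
            producer[dest] = s["rob"]
            stations.append(s)
            pc += 1
    return regs, finish_order, commit_order


program = [("MUL", "F0", "F2", "F4"),
           ("ADD", "F6", "F8", "F2"),
           ("ADD", "F8", "F0", "F6")]
final, finished, committed = simulate(program, {"F0": 0, "F2": 3, "F4": 5, "F6": 0, "F8": 1})
# finished == [1, 0, 2]: the first ADD completes before the longer MUL
# committed == [0, 1, 2]: registers are still updated in program order
```

The stages are evaluated in reverse order inside the loop so that an instruction spends at least one cycle in each stage; that ordering is a modeling convenience, not something the slides prescribe.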

  9. Tomasulo With Reorder Buffer - Cycle 0

  10. Tomasulo With Reorder Buffer - Cycle 1

  11. Tomasulo With Reorder Buffer - Cycle 2

  12. Tomasulo With Reorder Buffer - Cycle 3

  13. Tomasulo With Reorder Buffer - Cycle 4

  14. Tomasulo With Reorder Buffer - Cycle 5

  15. Tomasulo With Reorder Buffer - Cycle 6

  16. Tomasulo With Reorder Buffer - Cycle 7

  17. Tomasulo With Reorder Buffer - Cycle 8

  18. Tomasulo With Reorder Buffer - Cycle 9

  19. Tomasulo With Reorder Buffer - Cycle 10

  20. Tomasulo With Reorder Buffer - Cycle 11

  21. Tomasulo With Reorder Buffer - Cycle 12

  22. Tomasulo With Reorder Buffer - Cycle 13 (Figure 4.35, p. 313)

  23. Tomasulo With Reorder Buffer - Cycle 14

  24. Tomasulo With Reorder Buffer - Cycle 15

  25. Tomasulo With Reorder Buffer - Cycle 16 (need 36 more EX cycles for DIV to finish…)

  26. Executing Across Branches
     • One implication of the reorder buffer is that you can more easily maintain precise interrupts
     • Another is supporting speculation across branches
       • Speculated state is in the reorder buffer
       • State that is written but not yet committed is available for future speculated operations
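
Speculation across a branch, as described on slide 26, can be sketched as a commit loop that discards everything younger than a mispredicted branch. This is a minimal sketch, assuming every entry has already finished executing; the dictionary layout and the `commit_all` helper are hypothetical.

```python
def commit_all(rob, regs):
    """Commit from the head; on a mispredicted branch, flush all younger entries."""
    flushed = []
    while rob:
        entry = rob.pop(0)                       # oldest instruction first
        if entry["kind"] == "branch":
            if entry["predicted"] != entry["actual"]:
                flushed = [e["name"] for e in rob]
                rob.clear()                      # speculative results never reach regs
        else:
            regs[entry["dest"]] = entry["value"]
    return flushed


regs = {"F0": 0.0, "F2": 0.0}
rob = [
    {"name": "I1", "kind": "alu", "dest": "F0", "value": 1.5},
    {"name": "I2", "kind": "branch", "predicted": True, "actual": False},
    {"name": "I3", "kind": "alu", "dest": "F2", "value": 99.0},   # wrong-path work
]
flushed = commit_all(rob, regs)
```

After the call, `F0` holds 1.5, `F2` is unchanged, and `flushed` names the wrong-path instruction `I3`. Undoing speculation costs nothing beyond emptying the buffer, because the wrong-path result never left it.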

  27. Example of Speculative State of Reorder Buffer
     [Figure labels: First loop, Second loop]
     • Multiply has just reached commit, so other instructions can start committing

  28. Renaming Registers
     • Common variation of speculative design
     • Reorder buffer keeps instruction information but not the result
     • Extend the register file with extra renaming registers to hold speculative results
     • Rename register allocated at issue; result goes into the rename register on execution complete; rename register goes into the real register on commit
     • Operands are read either from the register file (real or speculative) or via the Common Data Bus
     • Advantage: operands always come from a single source (the extended register file)
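
The rename-register lifecycle on slide 28 (allocate at issue, fill on completion, copy on commit) can be sketched as follows. This is a minimal sketch: `RenameFile` and its methods are hypothetical names, and stalling when no rename register is free is left out (the sketch just raises `IndexError`).

```python
class RenameFile:
    def __init__(self, arch_regs, n_rename):
        self.arch = dict(arch_regs)          # real (committed) registers
        self.rename = [None] * n_rename      # speculative results
        self.free = list(range(n_rename))    # unallocated rename registers
        self.map = {}                        # arch name -> newest rename index

    def issue(self, dest):
        """Allocate a rename register for dest at issue."""
        idx = self.free.pop(0)               # IndexError if none free (real HW would stall)
        self.rename[idx] = None
        self.map[dest] = idx
        return idx

    def complete(self, idx, value):
        """Execution complete: the result goes into the rename register."""
        self.rename[idx] = value

    def read(self, name):
        """Single operand source: the newest value, speculative or real.

        Returns None if the producing instruction has not completed yet.
        """
        idx = self.map.get(name)
        return self.arch[name] if idx is None else self.rename[idx]

    def commit(self, dest, idx):
        """Commit: copy the rename register into the real register and free it."""
        self.arch[dest] = self.rename[idx]
        if self.map.get(dest) == idx:
            del self.map[dest]
        self.free.append(idx)


rf = RenameFile({"F0": 1.0}, n_rename=2)
tag = rf.issue("F0")
rf.complete(tag, 7.0)
speculative = rf.read("F0")     # 7.0, while rf.arch["F0"] is still 1.0
rf.commit("F0", tag)
```

Until `commit` runs, the architectural value of `F0` is untouched, so discarding the speculative result only requires dropping the mapping and returning the rename register to the free list.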

  29. Dynamic Scheduling in PowerPC 604 and Pentium Pro
     • Both: in-order issue, out-of-order execution, in-order commit
     • Pentium Pro is more like a scoreboard, since it uses central rather than distributed control

  30. Dynamic Scheduling in PowerPC 604 and Pentium Pro

      Parameter                              PPC          PPro
      Max. instructions issued/clock         4            3
      Max. instr. complete exec./clock       6            5
      Max. instr. committed/clock            6            3
      Window (instrs in reorder buffer)      16           40
      Number of reservation stations         12           20
      Number of rename registers             8 int/12 FP  40
      No. integer functional units (FUs)     2            2
      No. floating point FUs                 1            1
      No. branch FUs                         1            1
      No. complex integer FUs                1            0
      No. memory FUs                         1            1 load + 1 store

      Q: How do you pipeline 1- to 17-byte x86 instructions?

  31. Dynamic Scheduling in Pentium Pro
     • PPro doesn’t pipeline 80x86 instructions
     • PPro decode unit translates the Intel instructions into 72-bit micro-operations (~ DLX)
     • Sends micro-operations to the reorder buffer & reservation stations
     • Takes 1 clock cycle to determine the length of 80x86 instructions + 2 more to create the micro-operations
     • 12-14 clocks in total pipeline (~ 3 state machines)
     • Many instructions translate to 1 to 4 micro-operations
     • Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations
