CS 152 Computer Architecture & Engineering

CS 152 Computer Architecture & Engineering. Section 8, Spring 2010. Andrew Waterman. University of California, Berkeley.


Presentation Transcript


  1. CS 152 Computer Architecture & Engineering, Section 8, Spring 2010. Andrew Waterman, University of California, Berkeley

  2. Mystery Die

  3. Mystery Die: DEC Alpha 21264. 15M transistors; 600 MHz in 350 nm; highly speculative OoO superscalar

  4. Mystery Die: DEC Alpha 21264. 15M transistors; 600 MHz in 350 nm; highly speculative OoO superscalar. [Die photo labels: Map/IQ, FUs, Bus, FUs, FP Map/IQ, FPU, Bus, Fetch, I$, D$]

  5. Alpha 21264 Pipeline

  6. Branch Prediction: Two kinds of correlating branch predictors: local and global. [Figure labels: PC, Local History Table, Branch History Table (local); Global History, Branch History Table (global)]

  7. Branch Prediction: The 21264 uses both local and global predictors (tournament predictor). [Figure labels: PC, Local History Table, Branch History Table (local); Global History, Branch History Table (global); Tournament Predictor, Prediction]

  8. 21264 Fetch: Line/way prediction keeps the fetch loop short

  9. Alpha 21264 Pipeline

  10. 21264 Register Renaming: Registers are renamed, then instructions are inserted into the issue queue. The map table is backed up on every in-flight insn.


  14. 21264 Register Renaming
      • What hazards does renaming obviate?
        • WAR, WAW
      • In what situations is renaming useful?
        • Code with ILP and name dependencies: loops
      • If you had to choose between branch prediction and renaming, which would you pick?
        • Not much ILP within a basic block, so renaming isn’t too useful without branch prediction
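To make the renaming slides concrete, here is a minimal sketch of map-table renaming with a per-instruction snapshot. The function name `rename`, the `(dest, src1, src2)` tuple encoding, and the register counts are hypothetical; freeing physical registers at commit is left out, so the free list simply runs dry on long inputs.

```python
def rename(instructions, num_arch_regs=4, num_phys_regs=16):
    """Rename (dest, src1, src2) architectural-register triples.

    Returns the renamed triples plus one map-table snapshot per
    instruction, taken before that instruction is renamed, which is
    what a recovery mechanism would restore on a misspeculation.
    """
    map_table = list(range(num_arch_regs))            # arch reg -> phys reg
    free_list = list(range(num_arch_regs, num_phys_regs))
    renamed, snapshots = [], []
    for dest, src1, src2 in instructions:
        snapshots.append(list(map_table))
        # Read sources through the current mapping: true (RAW) deps survive.
        p1, p2 = map_table[src1], map_table[src2]
        # A fresh physical destination removes WAR and WAW name dependencies.
        pd = free_list.pop(0)
        map_table[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed, snapshots


# r1 = r2 + r3; r2 = r1 + r1 (WAR on r2); r1 = r3 + r3 (WAW on r1)
program = [(1, 2, 3), (2, 1, 1), (1, 3, 3)]
renamed, snapshots = rename(program)
print(renamed)  # [(4, 2, 3), (5, 4, 4), (6, 3, 3)]
```

After renaming, every destination is a distinct physical register, so the third instruction no longer has to wait for the first two. Only the RAW dependence of the second instruction on the first (physical register 4) remains.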

  15. Alpha 21264 Pipeline


  20. 21264 Superscalar Execution
      • The 21264 can decode, rename, issue, execute, and commit 4 insns/cycle
      • How does circuit complexity scale with W in the following operations?
        • Instruction decode: O(W)
        • Register renaming: O(W^2)
        • Result bypassing: O(W^2)
      • What about issue window complexity?
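A rough way to see where the quadratic terms come from is to count comparators and wires. The sketch below is a first-order model under stated assumptions: two source operands per instruction, one functional unit per issue slot, one result per functional unit, and a single level of bypassing. The function names are hypothetical.

```python
def decoders(width):
    """One decoder per instruction slot: linear in W."""
    return width


def rename_comparators(width, srcs_per_insn=2):
    """Comparators for intra-group dependence checks during rename.

    Each source of instruction i is compared against the destination of
    every earlier instruction in the same group:
    srcs * (0 + 1 + ... + (W - 1)).
    """
    return srcs_per_insn * width * (width - 1) // 2


def bypass_paths(width, srcs_per_insn=2):
    """Bypass paths: every result can feed every operand input."""
    return width * width * srcs_per_insn


for w in (2, 4, 8):
    print(w, decoders(w), rename_comparators(w), bypass_paths(w))
```

Under these assumptions, doubling W from 4 to 8 doubles the decoder count but roughly quadruples the rename comparators (12 to 56) and bypass paths (32 to 128).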

  21. 21264 Superscalar Execution
      • The 21264 couldn’t fit full bypassing into one clock cycle
      • Instead, it fully bypasses within each of two clusters; inter-cluster bypass takes another cycle
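The effect of clustering on dependent instructions can be illustrated with a one-function timing model. The function name and the cycle-counting convention are assumptions; only the one-cycle inter-cluster penalty comes from the slide.

```python
def earliest_dependent_issue(producer_issue_cycle, exec_latency,
                             producer_cluster, consumer_cluster,
                             inter_cluster_penalty=1):
    """Earliest cycle a consumer can issue, given where its producer ran.

    Within a cluster the result is bypassed as soon as it is produced;
    crossing clusters adds `inter_cluster_penalty` cycles.
    """
    ready = producer_issue_cycle + exec_latency
    if producer_cluster != consumer_cluster:
        ready += inter_cluster_penalty
    return ready


# A 1-cycle ALU op issued in cycle 0 on cluster 0:
print(earliest_dependent_issue(0, 1, 0, 0))  # 1 (same cluster)
print(earliest_dependent_issue(0, 1, 0, 1))  # 2 (other cluster)
```

In this model a chain of dependent single-cycle operations runs back to back only if it stays within one cluster.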

  22. 21264 Instruction Reordering: As mentioned earlier, the 21264 uses explicit renaming, as opposed to a data-in-ROB design. What does the ROB hold?

  23. Memory Ordering in the 21264: To execute the critical instruction path quickly, we want to execute loads ASAP. Initially, loads speculatively bypass stores. On a misspeculation, set a “wait” bit for that load’s PC, so it will behave conservatively from then on. Clear wait bits periodically.
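The wait-bit policy on slide 23 can be sketched as a small PC-indexed table. The class name `StoreWaitTable`, the table size, the modulo indexing, and the clearing interval are all illustrative assumptions; the slide specifies only the behavior (speculate first, set a bit on misspeculation, clear periodically).

```python
class StoreWaitTable:
    """Toy model of per-load "wait" bits for load/store ordering."""

    def __init__(self, entries=1024, clear_interval=16384):
        self.wait = [False] * entries
        self.clear_interval = clear_interval
        self.cycles_since_clear = 0

    def _index(self, load_pc):
        return load_pc % len(self.wait)

    def may_bypass_stores(self, load_pc):
        """True if this load may issue ahead of older, unresolved stores."""
        return not self.wait[self._index(load_pc)]

    def record_misspeculation(self, load_pc):
        """The load bypassed a store it depended on: be conservative next time."""
        self.wait[self._index(load_pc)] = True

    def tick(self, cycles=1):
        """Advance time; periodically clear all wait bits."""
        self.cycles_since_clear += cycles
        if self.cycles_since_clear >= self.clear_interval:
            self.wait = [False] * len(self.wait)
            self.cycles_since_clear = 0
```

Periodic clearing lets a load that conflicted with a store only once return to aggressive issue, instead of waiting forever on a stale bit.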

  24. Speculation in the 21264
      • What does the 21264 speculate on?
        • Next I$ line/way
        • Branches, indirect jumps
        • Exceptions
        • Load/Store ordering
        • Load hit/miss
          • Shortens hit time by a cycle
        • Anything else?
