Computer Architecture

Presentation Transcript


  1. Computer Architecture • Goal: • Build the best possible “processor” • Here’s a piece of silicon • Here are some of its properties • Tell me what to build • 1. Understand your building blocks • 2. Understand what “best” means • 3. Take into account design/production time ECE1773 - Fall ‘07 ECE Toronto

  2. Track Record

  3. Evolution?

  4. Modern Designs

  5. Understanding your Building Blocks

  6. Moore’s Law

  7. Moore’s Law in Practice

  8. The other Moore’s Law

  9. Technology Scaling

  10. Ideal Shrink vs. New Design

  11. Understanding what is Best

  12. Why Study Computer Architecture

  13. Why Study Computer Architecture

  14. Challenges in Computer Architecture

  15. Review of Modern Processor Architectures

  16. Sequential Execution Semantics • Contract: How the machine appears to behave

  17. Dissecting Instructions • Data Movement • Data Manipulation • Control Flow

  18. An Instruction in a Processor’s Lifetime

  19. Pipelining

  20. Sequential Semantics are Preserved

  21. Superscalar - In-order • Two or more consecutive instructions in the original program order can execute in parallel • This is the dynamic execution order • N-way Superscalar • Can issue up to N instructions per cycle • 2-way, 3-way, … • [Figure: fetch/decode timing diagram for ld, add, sub, bne]

  22. Data Dependences

  23. Superscalar vs. Pipelining • Example loop, computing sum += a[i--]:
      loop: ld  r2, 10(r1)
            add r3, r3, r2
            sub r1, r1, 1
            bne r1, r0, loop
      [Figure: fetch/decode timing of ld, add, sub, bne over time, shown once for Pipelining and once for Superscalar]

  24. Superscalar Performance • Performance Spectrum? • What if all instructions were dependent? • No speedup (speedup factor of 1): superscalar buys us nothing • What if all instructions were independent? • Speedup = N, where N = superscalarity (issue width) • Again, the key is typical program behavior • Some parallelism exists
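
The two ends of this spectrum can be checked with a toy model. The sketch below is not from the slides: it assumes an in-order machine, unit-latency instructions, no structural hazards, and counts only issue cycles. The names `issue_cycles`, `speedup`, and the dependence lists are hypothetical and chosen for illustration; `loop_body` encodes the ld/add/sub/bne loop from slide 23 (add reads the ld result, bne reads the sub result).

```python
def issue_cycles(deps, width):
    """Cycles needed to issue all instructions in order, `width` per cycle.

    deps[i] is the set of earlier instruction indices whose results
    instruction i reads. Assumes every instruction has unit latency.
    """
    issue_cycle = []
    cycle, slots = 0, 0
    for producers in deps:
        # Earliest cycle in which all source operands are available.
        ready = max((issue_cycle[p] + 1 for p in producers), default=0)
        if slots == width:      # current cycle is full, move to the next
            cycle, slots = cycle + 1, 0
        if ready > cycle:       # stall in order until operands are ready
            cycle, slots = ready, 0
        issue_cycle.append(cycle)
        slots += 1
    return issue_cycle[-1] + 1 if issue_cycle else 0


def speedup(deps, width):
    """Speedup of a `width`-way machine over a 1-way (scalar) machine."""
    return issue_cycles(deps, 1) / issue_cycles(deps, width)


# Eight instructions forming one dependence chain.
chain = [set()] + [{i - 1} for i in range(1, 8)]
# Eight fully independent instructions.
independent = [set() for _ in range(8)]
# ld, add (needs ld), sub, bne (needs sub): one iteration of the slide 23 loop.
loop_body = [set(), {0}, set(), {2}]

print(speedup(chain, 4))        # 1.0: fully dependent code gains nothing
print(speedup(independent, 4))  # 4.0: fully independent code gains N
print(issue_cycles(loop_body, 1), issue_cycles(loop_body, 2))  # 4 3
```

Under these assumptions a real loop lands between the two extremes: the 2-way machine issues one iteration of the loop body in 3 cycles instead of 4.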

  25. “Real-Life” Performance • OLTP = Online Transaction Processing • Source: Partha Ranganathan, Kourosh Gharachorloo, Sarita Adve, Luiz André Barroso, “Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors,” ASPLOS 1998

  26. “Real-Life” Performance • SPEC CPU 2000, SimpleScalar simulation: 32K I-cache and D-cache, 8K branch predictor

  27. Issue Mechanism – A Group of Instructions at Decode • Assume at most 2 sources and 1 target per instruction • Comparators for 2-way: • 3 for tgt and 2 for src (tgt: WAW + WAR, src: RAW) • Comparators for 4-way: • 2nd instr: 3 tgt and 2 src • 3rd instr: 6 tgt and 4 src • 4th instr: 9 tgt and 6 src • Simplifications may be possible • Resource checking not shown • [Figure: instructions in program order, each with fields tgt, src1, src2]
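
The comparator counts on this slide follow from comparing each instruction against every earlier instruction in the group. A minimal sketch of that arithmetic, assuming the slide's limit of 2 sources and 1 target per instruction; the function names are hypothetical:

```python
def comparators_for(position, srcs=2, tgts=1):
    """Comparators needed by the instruction at `position` (1-based) in a group.

    Target checks: WAW against earlier targets plus WAR against earlier sources.
    Source checks: RAW against earlier targets.
    """
    earlier = position - 1
    tgt_cmp = earlier * tgts * (tgts + srcs)
    src_cmp = earlier * srcs * tgts
    return tgt_cmp, src_cmp


def total_comparators(width, srcs=2, tgts=1):
    """Total comparators for a `width`-way issue group."""
    return sum(sum(comparators_for(p, srcs, tgts)) for p in range(2, width + 1))


for position in (2, 3, 4):
    print(position, comparators_for(position))  # (3, 2), (6, 4), (9, 6)

print(total_comparators(2))  # 5
print(total_comparators(4))  # 30
print(total_comparators(8))  # 140
```

The total grows as 5 · N(N−1)/2 under these assumptions, which is one reason the dependence check becomes expensive as the issue width increases.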

  28. Preserving Sequential Semantics • In principle not much different from pipelining • Program order is preserved in the pipeline • Some instructions proceed in parallel • But order is clearly defined • Defer interrupts to commit stage (i.e., writeback) • Flush all subsequent instructions • May include instructions committing simultaneously • Allow all preceding instructions to commit • Recall comparisons are done in program order • Must have sufficient time in clock cycle to handle these

  29. Interrupts Example • [Figure: two fetch/decode timing diagrams for the sequence ld, add, div, bne, bne. In the first, one exception is raised and then taken. In the second, two exceptions are raised before one is taken.]
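
The commit rule from slide 28 can be sketched for a single group of instructions that reach the commit stage together. This is an illustration, not the slides' mechanism: `commit_group` is a hypothetical name, and the sketch assumes the faulting instruction itself does not commit (as for a fault that must be re-executed after the handler runs).

```python
def commit_group(group):
    """Commit a group of instructions in program order.

    group: list of (name, raises_exception) pairs in program order.
    Returns (committed, flushed, faulting). Everything before the first
    excepting instruction commits; that instruction and everything after
    it, including instructions in the same group, is flushed.
    """
    committed = []
    for index, (name, raises_exception) in enumerate(group):
        if raises_exception:
            flushed = [n for n, _ in group[index:]]
            return committed, flushed, name
        committed.append(name)
    return committed, [], None


# ld, add, div, bne reach commit together and div raises an exception.
group = [("ld", False), ("add", False), ("div", True), ("bne", False)]
committed, flushed, faulting = commit_group(group)
print(committed)  # ['ld', 'add']
print(flushed)    # ['div', 'bne']
print(faulting)   # div
```

Because the scan runs in program order, the oldest excepting instruction is always the one taken, even if a younger instruction in the group raised its exception earlier in time.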

  30. Case Study: Alpha 21164

  31. 21164: Int. Pipe

  32. 21164: Memory Pipeline

  33. 21164: Floating-Point Pipe

  34. Performance Comparison Source:

  35. CPI Comparison: Ideal 0.25

  36. Compiler Impact • [Figure: Optimized vs. Base performance]

  37. Stall Cycles - 21164 • [Figure labels: Data Dependences/Data Stalls; No instructions]

  38. Issue Cycle Distribution - 21064

  39. Issue Cycle Distribution - 21164

  40. Stall Cycles Distribution • Model: stall cycles are counted when no instruction is committing • Does not capture overlapping factors: • Stall due to dependence while committing • Stall due to cache miss while committing

  41. Replay Traps • Tried to do something and couldn’t • Store and the write buffer is full • Can’t complete the instruction • Load and the miss address file is full • Can’t complete the instruction • Assumed a cache hit and it was a miss • Dependent instructions already executed • Must re-execute dependent instructions • Re-execute the instruction and everything that follows

  42. Replay Traps Explained • Cache hit: • ld r1: F D E M W • add _, r1: F D D E M W • Cache miss: • ld r1: F D E M M W • add _, r1: F D D D E M W
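
The stage timelines on this slide can be regenerated from one rule: the dependent add enters E in the cycle after the load's last M cycle. The sketch below assumes the five-stage F D E M W pipeline drawn on the slide, with the add fetched one cycle after the ld; `load_use_timeline` and `miss_cycles` are hypothetical names.

```python
def load_use_timeline(miss_cycles=0):
    """Stage sequences for `ld r1` and a dependent `add _, r1`.

    miss_cycles: extra cycles the load spends in M on a cache miss
    (0 models a hit). The add's sequence starts one cycle after the ld's.
    """
    ld = ["F", "D", "E"] + ["M"] * (1 + miss_cycles) + ["W"]
    last_m_cycle = 3 + miss_cycles        # ld occupies M in cycles 3..last_m_cycle
    add_execute_cycle = last_m_cycle + 1  # load data is usable one cycle later
    # The add is fetched in cycle 1, so it sits in D from cycle 2 until it executes.
    decode_cycles = add_execute_cycle - 2
    add = ["F"] + ["D"] * decode_cycles + ["E", "M", "W"]
    return " ".join(ld), " ".join(add)


print(load_use_timeline(0))  # ('F D E M W', 'F D D E M W')
print(load_use_timeline(1))  # ('F D E M M W', 'F D D D E M W')
```

If the add had already been issued on the assumption of a hit, its E stage would sit one cycle too early on a miss; that early issue is the work a replay trap has to undo.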

  43. Optimistic Scheduling • Cache hit: • ld r1: F D E M W • add _, r1: F D D E M W • [Figure annotations on the timeline: hit/miss known here; add should start execution here; must decide that add should execute; start making scheduling decisions]

  44. Optimistic Scheduling #2 • Cache hit: • ld r1: F D E M W • add _, r1: F D D E M W • [Figure annotations on the timeline: hit/miss known here; guess hit/miss; add should start execution here; must decide that add should execute; start making scheduling decisions]

  45. Stall Distribution

  46. 21164

  47. Instruction Decode/Issue • Up to four insts/cycle • Naturally aligned groups • Must start at 16 byte boundary (INT16) • Simplifies Fetch path • [Figure: where do instructions come from? I-Cache contents vs. what the CPU needs]

  48. Fetching Four Instructions • [Figure: where do instructions come from? I-Cache contents vs. what the CPU needs] • Software must guarantee alignment at 16 byte boundaries • Lots of NOPs
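
The cost of naturally aligned fetch groups can be made concrete with a little address arithmetic. A minimal sketch, assuming fixed 4-byte instructions (as on Alpha) and 16-byte fetch groups as stated on the slide; the function names and the example addresses are hypothetical.

```python
INSTR_BYTES = 4    # fixed-length instructions
GROUP_BYTES = 16   # one naturally aligned fetch group = 4 instructions


def fetch_group(pc):
    """Return (group base address, useful instructions) when fetch starts at pc.

    The fetch unit always reads the whole aligned group containing pc;
    slots before pc within the group are wasted.
    """
    base = pc & ~(GROUP_BYTES - 1)
    first_slot = (pc - base) // INSTR_BYTES
    return base, GROUP_BYTES // INSTR_BYTES - first_slot


def nops_to_align(pc):
    """NOPs needed before pc so the next instruction starts a new group."""
    return (-pc % GROUP_BYTES) // INSTR_BYTES


print(fetch_group(0x1000))    # (4096, 4): aligned target, full group is useful
print(fetch_group(0x1008))    # (4096, 2): only two of four slots are useful
print(nops_to_align(0x1008))  # 2: padding needed to align the next instruction
```

A branch target in the middle of a group wastes issue slots, which is why compilers pad targets with NOPs to keep them on 16-byte boundaries.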

  49. Instruction Decode/Issue • Up to four insts/cycle • Naturally aligned groups • Must start at 16 byte boundary (INT16) • Simplifies Fetch path (in a second) • All of group must issue before next group gets in • Simplifies Scheduling • No need for reshuffling

  50. Pipeline Processing Front-End
