
Lecture 5. MIPS Processor Design Pipelined MIPS #2

COMP212 Computer Architecture. Lecture 5. MIPS Processor Design Pipelined MIPS #2. Prof. Taeweon Suh, Computer Science Education, Korea University.


Presentation Transcript


  1. COMP212 Computer Architecture Lecture 5. MIPS Processor Design Pipelined MIPS #2 Prof. Taeweon Suh Computer Science Education Korea University

  2. Pipelined Datapath

  3. Pipelining Example: add $14, $5, $6; lw $13, 24($1); add $12, $3, $4; sub $11, $2, $3; lw $10, 20($1) [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

  4. lw: Instruction Fetch (IF). lw $s0, 8($t1) [Figure: pipelined datapath]

  5. lw: Instruction Decode (ID). lw $s0, 8($t1) [Figure: pipelined datapath]

  6. lw: Execution (EX). lw $s0, 8($t1) [Figure: pipelined datapath]

  7. lw: Memory (MEM). lw $s0, 8($t1) [Figure: pipelined datapath]

  8. lw: Writeback (WB). lw $s0, 8($t1) [Figure: pipelined datapath]

  9. Corrected Datapath (for lw). lw $s0, 8($t1) [Figure: pipelined datapath]

  10. sw: Memory (MEM). sw $1, 4($2) [Figure: pipelined datapath]

  11. sw: Writeback (WB): do nothing. sw $1, 4($2) [Figure: pipelined datapath]

  12. Pipeline Control
      Note that in this implementation, the branch is resolved in the MEM stage

  13. Pipeline Control
      • What needs to be controlled in each stage (IF, ID, EX, MEM, WB)?
        • IF: Instruction fetch and PC increment
        • ID: Instruction decode and operand fetch from register file and/or immediate
        • EX: Execution stage
          • RegDst
          • ALUOp[1:0]
          • ALUSrc
        • MEM: Memory stage
          • Branch
          • MemRead
          • MemWrite
        • WB: Writeback
          • MemtoReg
          • RegWrite (note that this signal drives the register file, which sits in the ID stage)

  14. Pipeline Control
      • Extend pipeline registers to include control information created in ID stage
      • Pass control signals along just like the data
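
A minimal Python sketch of this idea: the control word is created once in ID, grouped into EX, M, and WB fields, and carried through the pipeline registers until the stage that uses it. The bit patterns for lw (WB = 11, M = 010, EX = 0001) and for an R-type instruction (WB = 10, M = 000, EX = 1100) are the values that appear on the "Datapath with Control" slides; the field order inside each group, and the names `Control`, `DECODE`, and `advance`, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """Control word created in ID; field order within each group is assumed."""
    reg_dst: int = 0     # EX group
    alu_op: int = 0      # EX group, 2 bits
    alu_src: int = 0     # EX group
    branch: int = 0      # M group
    mem_read: int = 0    # M group
    mem_write: int = 0   # M group
    reg_write: int = 0   # WB group
    mem_to_reg: int = 0  # WB group

    def ex_bits(self) -> str:
        return f"{self.reg_dst}{self.alu_op:02b}{self.alu_src}"

    def m_bits(self) -> str:
        return f"{self.branch}{self.mem_read}{self.mem_write}"

    def wb_bits(self) -> str:
        return f"{self.reg_write}{self.mem_to_reg}"


NOP = Control()  # every signal deasserted: the instruction changes no state

# Only the two instruction classes whose values appear on the slides.
DECODE = {
    "lw": Control(alu_src=1, mem_read=1, reg_write=1, mem_to_reg=1),
    "rtype": Control(reg_dst=1, alu_op=0b10, reg_write=1),
}


def advance(pipe, decoded):
    """One clock edge: each pipeline register takes the control word behind it.

    Simplification: the whole word is carried forward, whereas the hardware
    drops each group once its stage has consumed it.
    """
    return {
        "ID/EX": decoded,
        "EX/MEM": pipe["ID/EX"],
        "MEM/WB": pipe["EX/MEM"],
    }


pipe = {"ID/EX": NOP, "EX/MEM": NOP, "MEM/WB": NOP}
for opcode in ["lw", "rtype", "rtype"]:
    pipe = advance(pipe, DECODE[opcode])

# The lw decoded three edges ago now presents its WB group to the WB stage.
print(pipe["MEM/WB"].wb_bits())
```

Carrying the control bits alongside the data means each stage sees the signals of the instruction it currently holds, not those of the instruction being decoded.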

  15. Datapath with Control

  16. Datapath with Control. IF: lw $10, 9($1) [Figure: pipelined datapath with control]

  17. Datapath with Control. IF: sub $11, $2, $3; ID: lw $10, 9($1) [Figure: pipelined datapath with control; control values generated in ID for “lw”: WB = 11, M = 010, EX = 0001]

  18. Datapath with Control. IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, 9($1) [Figure: pipelined datapath with control; control values generated in ID for “sub”: WB = 10, M = 000, EX = 1100]

  19. Datapath with Control. IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, $2, $3; MEM: lw $10, 9($1) [Figure: pipelined datapath with control; control values generated in ID for “and”: WB = 10, M = 000, EX = 1100]

  20. Datapath with Control. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, $4, $5; MEM: sub $11, ..; WB: lw $10, 9($1) [Figure: pipelined datapath with control; control values generated in ID for “or”: WB = 10, M = 000, EX = 1100]

  21. Datapath with Control. IF: xxxx; ID: add $14, $8, $9; EX: or $13, $6, $7; MEM: and $12…; WB: sub $11, .. [Figure: pipelined datapath with control; control values generated in ID for “add”: WB = 10, M = 000, EX = 1100]

  22. Datapath with Control. IF: xxxx; ID: xxxx; EX: add $14, $8, $9; MEM: or $13, ..; WB: and $12… [Figure: pipelined datapath with control]

  23. Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $14, ..; WB: or $13… [Figure: pipelined datapath with control]

  24. Datapath with Control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $14.. [Figure: pipelined datapath with control]

  25. Dependencies
      • Dependencies between instructions cause data and control hazards

  26. Data Hazard - Software Solution
      • Compiler techniques
      • Insert nop (0x0000_0000) between instructions
      • Where do we insert nops in the following example?
          sub $2, $1, $3
          and $12, $2, $5
          or $13, $6, $2
          add $14, $2, $2
          sw $15, 100($2)
      • However, it really slows us down!
      • Code scheduling reorganizes the code so that it relieves the dependencies between instructions
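
One way to answer the nop question mechanically is sketched below in Python. It assumes a five-stage pipeline without forwarding, where a consumer must be issued at least `min_distance` slots after the instruction that produces its operand. Under that assumption, `min_distance = 3` models a register file that writes in the first half of the cycle and reads in the second half, and `min_distance = 4` models a register file that writes at the rising clock edge. The function name `insert_nops` and the tuple encoding of each instruction are hypothetical, chosen only for this illustration.

```python
# Each instruction: (assembly text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),  # sw reads $15 and $2, writes no register
]


def insert_nops(program, min_distance):
    """Return the assembly with nops added so every read is far enough
    behind the most recent write of the same register."""
    scheduled = []
    last_write = {}  # register number -> slot index of its latest writer
    for asm, dst, srcs in program:
        earliest = len(scheduled)
        for reg in srcs:
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + min_distance)
        while len(scheduled) < earliest:
            scheduled.append("nop")
        if dst is not None and dst != 0:  # $0 is hardwired to zero
            last_write[dst] = len(scheduled)
        scheduled.append(asm)
    return scheduled


for line in insert_nops(PROGRAM, min_distance=3):
    print(line)
```

In this example only the sub-to-and dependence is too close; once the nops push `and` back, the later readers of $2 are already far enough away.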

  27. Data Hazard - Forwarding
      • Don’t wait for results to be written to the register file
      • Use temporary results
      OK, then do we always have to do this forwarding?
      • If the write to the register file occurs in the first half of the clock cycle, and the read occurs in the second half, then?
        • Our textbook follows this
      • If the RF writes at the rising edge of the clock, then?
        • Let’s stick to this for our project

  28. Forwarding [Figure: ID, EX, MEM, and WB stages with the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and MUX]

  29. Forwarding (from EX/MEM) [Figure: same datapath with MUXes added at the ALU inputs]

  30. Forwarding (from MEM/WB) [Figure: same datapath with MUXes added at the ALU inputs]

  31. Forwarding (operand selection) [Figure: same datapath with a forwarding unit controlling the ALU input MUXes]

  32. Forwarding (operand propagation) [Figure: same datapath; the forwarding unit receives Rs and Rt, EX/MEM Rd, and MEM/WB Rd]

  33. Forwarding [Figure: full pipelined datapath with forwarding unit; labeled signals: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
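
The operand selection performed by the forwarding unit can be sketched in Python as follows. The slides show only the unit's inputs (Rs, Rt, EX/MEM Rd, MEM/WB Rd); the conditions below are the usual textbook formulation and are an assumption here, as are the function name `forward_select` and its string return values. The RegWrite inputs and the check against register $0 are part of that assumed formulation.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Choose where one ALU operand of the instruction in EX comes from.

    src_reg is the Rs or Rt number of the instruction in EX.
    Returns "EX/MEM", "MEM/WB", or "REGFILE".
    """
    # The newest producer wins: check EX/MEM before MEM/WB.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX:
and_rs = forward_select(2, ex_mem_rd=2, ex_mem_reg_write=True,
                        mem_wb_rd=0, mem_wb_reg_write=False)
and_rt = forward_select(5, ex_mem_rd=2, ex_mem_reg_write=True,
                        mem_wb_rd=0, mem_wb_reg_write=False)

# One cycle later: and is in MEM, sub is in WB, or $13, $6, $2 is in EX.
or_rt = forward_select(2, ex_mem_rd=12, ex_mem_reg_write=True,
                       mem_wb_rd=2, mem_wb_reg_write=True)

print(and_rs, and_rt, or_rt)
```

The function is called once per ALU operand, which corresponds to the two MUXes in front of the ALU.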

  34. Can't always forward
      • lw (load word) can still cause a hazard
      • The hazard occurs when the instruction right after a load reads the register that the load writes
      • Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction

  35. Stalling
      • We can stall the pipeline by keeping an instruction in the same stage
      [Figure: stall diagram showing the ID and IF stages]

  36. Data Hazard - Load-Use Case [Figure: contents of the IF, ID, EX, MEM, and WB stages at cc3 and at cc4 for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6, with a nop in the pipeline]

  37. Hazard Detection Unit
      • Stall the pipeline if the instruction in ID/EX is a load and its destination matches a source of the instruction in IF/ID (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
      • Stall by letting an instruction that won’t write anything (a nop) go forward
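
The stall condition above translates almost directly into code. The Python sketch below is illustrative only: the function name `load_use_stall` and its parameter names are hypothetical, and "ID/EX is a load" is modeled by the MemRead control bit carried in ID/EX.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID must wait one cycle for a load in EX."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX (rt = 2); and $4, $2, $5 is in ID (rs = 2, rt = 5).
stall_and = load_use_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)

# The same pair with an R-type instruction in EX instead of a load: no stall,
# because forwarding can supply the value in time.
stall_rtype = load_use_stall(id_ex_mem_read=False, id_ex_rt=2, if_id_rs=2, if_id_rt=5)

print(stall_and, stall_rtype)
```

When the function returns true, the instructions in IF and ID are held in place for one cycle while a nop goes forward in their stead.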

  38. Control Hazards - Branch
      • When the branch condition is resolved, other instructions are in the pipeline
      • It works like “not taken” prediction
      • If branch turns out to be taken, flush instructions
      Note that in this implementation, the branch is resolved in the MEM stage (see slide #6)

  39. Alleviate Branch Hazards
      • Reduce penalty to 1 cycle
        • Move the branch compare to the ID stage of pipeline
        • Add an adder to calculate the branch target in ID stage
        • Add the IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
      [Figure: pipeline diagram for beq $1,$2,L1; add $1,$2,$3 (turned into a bubble); …; L1: sub $1,$2, $3, annotated “Taken target address is known here” and “Branch is resolved here”]
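
A small Python sketch of what the ID stage computes for beq after this change: the register comparison, the branch target from the extra adder, and the IF.Flush signal. The target formula (PC + 4 plus the sign-extended 16-bit immediate shifted left by 2) is the standard MIPS definition; the function names and the example addresses are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def resolve_beq_in_id(pc_plus4, rs_val, rt_val, imm16):
    """Return (taken, target, if_flush) for a beq sitting in the ID stage."""
    taken = rs_val == rt_val                                         # comparator in ID
    target = (pc_plus4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF   # adder in ID
    if_flush = taken  # squash the instruction that was fetched behind the branch
    return taken, target, if_flush


# beq at hypothetical address 0x1000 with offset +3 words, registers equal.
print(resolve_beq_in_id(0x1004, rs_val=5, rt_val=5, imm16=3))
```

Only the single instruction fetched behind the branch is squashed, which is where the 1-cycle penalty comes from.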

  40. Flushing Instructions [Figure: pipelined datapath with the IF.Flush signal, hazard detection unit, forwarding unit, and a register comparator (=) and branch-target adder in the ID stage]

  41. Control Hazard Handling Logic

  42. Flushing Instructions (cycle N). Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7) [Figure: datapath with IF.Flush; beq $1, $3, L2 is in ID and and $12, $2, $5 is in IF]

  43. Flushing Instructions (cycle N). Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7) [Figure: same datapath; beq $1, $3, L2 is in ID, and $12, $2, $5 is in IF, and the PC input is L2]

  44. Flushing Instructions (cycle N+1). Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7) [Figure: same datapath one cycle later; lw $4, 40($7) is in IF, a nop is in ID, and beq $1, $3, L2 is in EX]

  45. Improving Performance
      • Try to avoid stalls using hardware/software techniques
      • Software technique
        • Reorder instructions
        • Utilize the delay slot of branch
      • Hardware technique
        • Implement the delayed branch

  46. Performance of Pipelined CPU
      • Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a pipelined MIPS?
      CPU Time = # insts × CPI × clock cycle time (T) = # insts × CPI / f

  47. CPI Example
      • Ideally CPI = 1, but we need to account for stalls (caused by loads and branches)
      • SPECINT2000 benchmark:
        • 25% loads
        • 10% stores
        • 11% branches
        • 2% jumps
        • 52% R-type
      • Suppose
        • 40% of loads are used by next instruction
        • 25% of branches are mispredicted
      • What is the average CPI?

  48. CPI Example
      • SPECINT2000 benchmark:
        • 25% loads
        • 10% stores
        • 11% branches
        • 2% jumps
        • 52% R-type
      • If there is no stall in the pipelined MIPS, how would you calculate CPI?
        • Average CPI = (0.25)(1 CPI) + (0.10)(1 CPI) + (0.11)(1 CPI) + (0.02)(1 CPI) + (0.52)(1 CPI) = 1
      • Suppose
        • 40% of loads are used by next instruction
        • 25% of branches are mispredicted
        • All jumps flush next instruction
      • What is the average CPI?
        • Load/Branch CPI = 1 when no stalling, 2 when stalling. Thus
        • CPI_lw = 1(0.6) + 2(0.4) = 1.4
        • CPI_beq = 1(0.75) + 2(0.25) = 1.25
        • CPI_jump = 2(1) = 2
        • Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
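
The same weighted average can be checked with a few lines of Python. The instruction mix and stall fractions are the ones given on the slide; the dictionary layout and the name `average_cpi` are just one possible way to organize the arithmetic.

```python
# Fraction of executed instructions in each class (SPECINT2000 mix from the slide).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# CPI per class: 1 cycle normally, 2 cycles when the instruction causes a stall or flush.
CPI = {
    "load": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "store": 1.0,
    "branch": 1 * 0.75 + 2 * 0.25,  # 25% of branches are mispredicted
    "jump": 2.0,                    # every jump flushes the next instruction
    "rtype": 1.0,
}


def average_cpi(mix, cpi):
    """Weighted average of per-class CPI, weighted by instruction frequency."""
    return sum(mix[name] * cpi[name] for name in mix)


print(f"{average_cpi(MIX, CPI):.4f}")
```

Changing a single entry, such as the misprediction rate, shows how sensitive the average is to each hazard source.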

  49. Critical Path
      • Critical path of the pipelined MIPS:
        T_c = max {
          t_pcq + t_mem + t_setup,                               // IF stage
          2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup),  // ID stage
          t_pcq + t_mux + t_mux + t_ALU + t_setup,               // EX stage
          t_pcq + t_memread + t_setup,                           // MEM stage
          2(t_pcq + t_mux + t_RFwrite)                           // WB stage
        }
      Where does this “2” come from?
      • If the write to the register file occurs in the first half of the clock cycle, and the read occurs in the second half, then?
        • Our textbook follows this
      • If the RF writes at the rising edge of the clock, then?
        • Let’s stick to this for our project

  50. Example
      T_c = 2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup) = 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps
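
As a rough check, the Python sketch below recomputes the ID-stage term from the delays on this slide and then plugs the result into the CPU time formula from slide 46. Combining the 550 ps cycle time with the 100 billion instructions of slide 46 and the average CPI of 1.15 from slide 48 is an illustration, not a result stated in the lecture; the variable names are hypothetical.

```python
# Element delays in picoseconds, as given on the slide.
T_RF_READ = 150
T_MUX = 25
T_EQ = 40
T_AND = 15
T_SETUP = 20

# ID-stage term of the critical path; the factor 2 reflects the register
# file access taking half of the clock cycle.
t_c_ps = 2 * (T_RF_READ + T_MUX + T_EQ + T_AND + T_MUX + T_SETUP)

# CPU time = # insts x CPI x clock cycle time
instructions = 100e9  # slide 46
cpi = 1.15            # slide 48, rounded
cpu_time_s = instructions * cpi * t_c_ps * 1e-12

print(f"Tc = {t_c_ps} ps, CPU time = {cpu_time_s:.2f} s")
```

Only the ID-stage delays are listed on this slide, so the other four terms of the max are not evaluated here.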
