Processor: Basic steps to process an instruction
• Instruction Fetch (IF)
• Instruction Decode / Operand Fetch (ID/OF)
• Execute (EX)
• Memory Access (MEM)
• Write Back (WB)
EE524/CptS561 Jose G. Delgado-Frias
Datapath
[Figure: datapath with PC, +4 adder, Inst. Mem., IR, Reg, A/B/imm latches, multiplexers (mux), ALU, zero test, and Data Mem.; stages labeled Instruction Fetch, Inst. Dec. / Op. Fetch, Execute, Memory Access, Write Back]
• Instruction Fetch: IR ← Mem[PC]; NPC ← PC + 4
• Inst. Dec. / Op. Fetch: A ← Reg[IR 6..10]; B ← Reg[IR 11..15]; Imm ← ((IR16)^16 ## IR 16..31)
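The field selections in the decode step can be sketched in Python. This is a minimal sketch that assumes a 32-bit instruction word with bit 0 as the most significant bit and a 16-bit immediate in IR 16..31; the helper names `bits`, `sign_extend`, and `decode` are hypothetical.

```python
def bits(word, first, last, width=32):
    """Return bits first..last of word, where bit 0 is the most significant bit."""
    length = last - first + 1
    shift = width - 1 - last
    return (word >> shift) & ((1 << length) - 1)

def sign_extend(value, from_bits=16, to_bits=32):
    """Replicate the top bit of a from_bits-wide value up to to_bits."""
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign          # interpret as a signed integer
    return signed & ((1 << to_bits) - 1)    # back to an unsigned bit pattern

def decode(ir):
    """Extract the fields read during ID/OF (assumed layout)."""
    return {
        "a_index": bits(ir, 6, 10),         # A <- Reg[IR 6..10]
        "b_index": bits(ir, 11, 15),        # B <- Reg[IR 11..15]
        "imm": sign_extend(bits(ir, 16, 31)),
    }

# Example word: register fields 2 and 3, immediate 0xFFFC (that is, -4)
fields = decode((2 << 21) | (3 << 16) | 0xFFFC)
```

Here `fields["imm"]` is the 32-bit pattern `0xFFFFFFFC`, the sign-extended form of -4.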
Datapath (Arith/Logic Inst.)
[Figure: datapath]
• IF: IR ← Mem[PC]; NPC ← PC + 4
• ID/OF: A ← Reg[IR 6..10]; B ← Reg[IR 11..15]; Imm ← ((IR16)^16 ## IR 16..31)
• EX: ALUoutput ← A op B, or ALUoutput ← A op Imm
• WB: Reg[IR 16..20] ← ALUoutput
Datapath (Load Inst.)
[Figure: datapath]
• IF: IR ← Mem[PC]; NPC ← PC + 4
• ID/OF: A ← Reg[IR 6..10]; B ← Reg[IR 11..15]; Imm ← ((IR16)^16 ## IR 16..31)
• EX: ALUoutput ← A op Imm
• MEM: LMD ← Mem[ALUoutput] (LMD: load memory data)
• WB: Reg[IR 11..15] ← LMD
Datapath (Store Inst.)
[Figure: datapath]
• IF: IR ← Mem[PC]; NPC ← PC + 4
• ID/OF: A ← Reg[IR 6..10]; B ← Reg[IR 11..15]; Imm ← ((IR16)^16 ## IR 16..31)
• EX: ALUoutput ← A op Imm
• MEM: Mem[ALUoutput] ← B
Datapath (Branch Inst.)
[Figure: datapath]
• IF: IR ← Mem[PC]; NPC ← PC + 4
• ID/OF: A ← Reg[IR 6..10]; B ← Reg[IR 11..15]; Imm ← ((IR16)^16 ## IR 16..31)
• EX: ALUoutput ← (PC+4) op Imm
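The register transfers for the four instruction classes can be walked through with a small interpreter. This is a sketch under stated assumptions, not the course's simulator: instructions arrive pre-decoded as dictionaries (the keys `kind`, `op`, `rs1`, `rs2`, `rd`, and `imm` are hypothetical), the address "op" is addition, and a branch is taken when A equals zero, as suggested by the zero test in the figure.

```python
import operator

def run_instruction(state, inst):
    """Run one pre-decoded instruction through IF, ID/OF, EX, MEM, WB."""
    regs, mem = state["regs"], state["mem"]
    kind = inst["kind"]

    # IF: NPC <- PC + 4 (IR <- Mem[PC] is modeled by passing `inst` in)
    npc = state["pc"] + 4

    # ID/OF: A <- Reg[IR 6..10]; B <- Reg[IR 11..15]; Imm <- sign-extended field
    a = regs[inst["rs1"]]
    b = regs[inst.get("rs2", 0)]
    imm = inst.get("imm", 0)

    # EX
    if kind == "alu":
        alu_output = inst["op"](a, b)
    elif kind == "alu_imm":
        alu_output = inst["op"](a, imm)
    elif kind in ("load", "store"):
        alu_output = a + imm                 # effective address
    elif kind == "branch":
        alu_output = npc + imm               # branch target
    else:
        raise ValueError("unknown instruction kind: " + kind)

    # MEM
    lmd = None
    if kind == "load":
        lmd = mem.get(alu_output, 0)         # LMD <- Mem[ALUoutput]
    elif kind == "store":
        mem[alu_output] = b                  # Mem[ALUoutput] <- B

    # WB
    if kind == "alu":
        regs[inst["rd"]] = alu_output        # Reg[IR 16..20] <- ALUoutput
    elif kind == "alu_imm":
        regs[inst["rs2"]] = alu_output       # Reg[IR 11..15] <- ALUoutput
    elif kind == "load":
        regs[inst["rs2"]] = lmd              # Reg[IR 11..15] <- LMD

    # Assumed branch condition: taken when A == 0
    state["pc"] = alu_output if (kind == "branch" and a == 0) else npc
    return state

state = {"pc": 0, "regs": [0] * 32, "mem": {}}
state["regs"][2], state["regs"][3] = 5, 7
run_instruction(state, {"kind": "alu", "op": operator.add, "rs1": 2, "rs2": 3, "rd": 1})
run_instruction(state, {"kind": "store", "rs1": 2, "rs2": 1, "imm": 3})
run_instruction(state, {"kind": "load", "rs1": 2, "rs2": 4, "imm": 3})
run_instruction(state, {"kind": "branch", "rs1": 0, "imm": 16})
```

Only the load and store touch memory and only the branch changes the PC away from PC + 4, which is why some instruction classes leave stages idle.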
Instructions of a program
[Figure: timing diagram, Time (clock cycles) on the horizontal axis]
1: IF ID EX MEM WB
2: IF ID EX WB
3: IF ID
Instructions of a program
[Figure: overlapped execution, CLOCK CYCLE on the horizontal axis]
1: IF ID EX MEM WB
2: IF ID EX MEM WB
3: IF ID EX MEM WB
4: IF ID EX MEM WB
5: IF ID EX MEM WB
6: IF ID EX MEM
7: IF ID EX
8: IF ID
Pipelining Lessons
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
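These lessons can be checked with a small timing model. The sketch below assumes an ideal pipeline with no hazards, a clock period equal to the slowest stage, and an unpipelined machine that takes the sum of the stage times per instruction; the function names are hypothetical.

```python
def unpipelined_time(n, stage_times):
    """Each instruction runs all stages before the next one starts."""
    return n * sum(stage_times)

def pipelined_time(n, stage_times):
    """Clock is set by the slowest stage; k - 1 extra cycles fill the pipeline."""
    k = len(stage_times)
    return (k + n - 1) * max(stage_times)

def speedup(n, stage_times):
    return unpipelined_time(n, stage_times) / pipelined_time(n, stage_times)

balanced = [1, 1, 1, 1, 1]
unbalanced = [1, 1, 2, 1, 1]

single_task = speedup(1, balanced)        # no gain for one instruction
long_run = speedup(1000, balanced)        # approaches the number of stages
slow_stage = speedup(1000, unbalanced)    # limited by the slowest stage
```

With five balanced stages the speedup for 1000 instructions is just under 5; doubling the length of one stage drops it to just under 3.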
Datapath w/ pipeline
[Figure: datapath (PC, +4 adder, Inst. Mem., Reg, ALU, zero test, Data Mem.) divided into IF, ID, EX, MEM, and WB stages by pipeline registers driven by a common Clock]
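Pipeline
CLOCK CYCLE | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
INSTRUCTION 1 | IF | ID/OF | EX | MEM | WB | - | - | - | -
INSTRUCTION 2 | - | IF | ID/OF | EX | MEM | WB | - | - | -
INSTRUCTION 3 | - | - | IF | ID/OF | EX | MEM | WB | - | -
INSTRUCTION 4 | - | - | - | IF | ID/OF | EX | MEM | WB | -
INSTRUCTION 5 | - | - | - | - | IF | ID/OF | EX | MEM | WB
INSTRUCTION 6 | - | - | - | - | - | IF | ID/OF | EX | MEM
INSTRUCTION 7 | - | - | - | - | - | - | IF | ID/OF | EX
INSTRUCTION 8 | - | - | - | - | - | - | - | IF | ID/OF
INSTRUCTION 9 | - | - | - | - | - | - | - | - | IF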
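The staircase pattern of the pipeline chart can be generated programmatically. A minimal sketch, assuming one instruction enters IF per clock cycle and no stalls occur; `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["IF", "ID/OF", "EX", "MEM", "WB"]

def pipeline_chart(n_instructions, n_cycles):
    """Return a grid: chart[i][c] is the stage of instruction i in cycle c."""
    chart = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            cycle = i + s                  # instruction i starts in cycle i
            if cycle < n_cycles:
                row[cycle] = name
        chart.append(row)
    return chart

chart = pipeline_chart(9, 9)
for number, row in enumerate(chart, start=1):
    print(number, " ".join(stage.ljust(5) for stage in row))
```

From cycle 5 onward every stage is busy, so one instruction completes per cycle.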
Pipeline Hazards
• Structural Hazards
  • Two or more instructions use the same hardware at the same time.
• Data Hazards
  • Data dependencies
  • Result from inst. j is needed by inst. k
• Control Hazards
  • A branch changes the flow; what happens to the following instruction(s)?
Resources
[Figure: four overlapped instructions, each using Mem (IM), Reg, ALU, Mem (DM), Reg in successive cycles]
Data Hazards
[Figure: overlapped pipeline for the sequence below]
R1 ← R2+R3
R5 ← R1+R3
R8 ← R1-R6
Data Forwarding
[Figure: the same sequence with forwarding paths]
R1 ← R2+R3
R5 ← R1+R3
R8 ← R1-R6
Datapath w/ pipeline
[Figure: pipelined datapath with Forwarding unit]
Example
[Figure: pipelined datapath with Forwarding unit, shown in successive frames as the sequence below advances through the stages]
ADD R1,R2,R3
SUB R4,R3,R1
XOR R7,R8,R1
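The forwarding decision for this example can be sketched as a comparison between the source registers of the instruction in EX and the destination registers held in the later pipeline registers. This is a minimal sketch, assuming the first operand is the destination and that the most recent producer (EX/MEM) takes priority; `forward_select` is a hypothetical name.

```python
def forward_select(src_reg, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU operand comes from for the instruction in EX."""
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return "EX/MEM"     # result produced by the previous instruction
    if mem_wb_dest is not None and src_reg == mem_wb_dest:
        return "MEM/WB"     # result produced two instructions earlier
    return "REG"            # value read from the register file is current

# SUB R4,R3,R1 in EX while ADD R1,R2,R3 is in MEM
sub_sources = [forward_select(r, "R1", None) for r in ("R3", "R1")]

# XOR R7,R8,R1 in EX while SUB (dest R4) is in MEM and ADD (dest R1) is in WB
xor_sources = [forward_select(r, "R4", "R1") for r in ("R8", "R1")]
```

SUB takes R1 from the EX/MEM register and XOR takes it from MEM/WB, so neither instruction has to stall.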
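Data Hazard Classification (inst. j precedes inst. k)
• RAW (Read After Write): j: R1 ← ...; k: RY ← R1
  • w/ forwarding, only a load presents a problem
• WAW (Write After Write): j: R1 ← ...; k: R1 ← ...
• WAR (Write After Read): j: ... ← R1; k: R1 ← ...
• RAR (Read After Read): j: ... ← R1; k: ... ← R1 (no hazard)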
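The classification can be expressed as set comparisons on destination and source registers. A minimal sketch, assuming each instruction is described by a hypothetical `(dest, sources)` pair and that j comes before k in program order.

```python
def classify(j, k):
    """Return the dependence types between earlier inst. j and later inst. k."""
    j_dest, j_sources = j
    k_dest, k_sources = k
    kinds = set()
    if j_dest is not None and j_dest in k_sources:
        kinds.add("RAW")    # k reads what j writes
    if j_dest is not None and j_dest == k_dest:
        kinds.add("WAW")    # both write the same register
    if k_dest is not None and k_dest in j_sources:
        kinds.add("WAR")    # k overwrites what j reads
    if set(j_sources) & set(k_sources):
        kinds.add("RAR")    # both read the same register: not a hazard
    return kinds

add_r1 = ("R1", ("R2", "R3"))       # R1 <- R2+R3
add_r5 = ("R5", ("R1", "R3"))       # R5 <- R1+R3
raw_example = classify(add_r1, add_r5)
war_example = classify(add_r5, add_r1)
```

For the pair R1 ← R2+R3 followed by R5 ← R1+R3 the result contains RAW (on R1) and RAR (on R3); reversing the order turns the RAW into a WAR.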
Data Forwarding (load)
[Figure: overlapped pipeline (Mem (IM), Reg, ALU, Mem (DM), Reg) for the sequence below]
R1 ← LD[Mem]
R5 ← R1+R3
R8 ← R1-R6
Data hazard (load) "R1"
LW R1,0(R1): IF ID EX MEM WB
SUB R4,R1,R5: IF ID stall EX MEM WB
AND R6,R1,R7: IF stall ID EX MEM
OR R8,R1,R9: stall IF ID EX
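The load interlock can be sketched as a check on adjacent instructions. This assumes full forwarding, so the only stall is one cycle when an instruction reads the register loaded by the instruction immediately before it; the tuple layout `(op, dest, sources)` and the function names are hypothetical.

```python
def load_use_stalls(program):
    """Return the stall cycles inserted before each instruction."""
    stalls = []
    for i, (op, dest, sources) in enumerate(program):
        stall = 0
        if i > 0:
            prev_op, prev_dest, _ = program[i - 1]
            if prev_op == "LW" and prev_dest in sources:
                stall = 1       # loaded value is not ready in time for EX
        stalls.append(stall)
    return stalls

def total_cycles(program, n_stages=5):
    """Cycles to finish the program on an otherwise ideal pipeline."""
    return n_stages + len(program) - 1 + sum(load_use_stalls(program))

program = [
    ("LW",  "R1", ("R1",)),         # LW  R1,0(R1)
    ("SUB", "R4", ("R1", "R5")),    # SUB R4,R1,R5
    ("AND", "R6", ("R1", "R7")),    # AND R6,R1,R7
    ("OR",  "R8", ("R1", "R9")),    # OR  R8,R1,R9
]
stalls = load_use_stalls(program)
cycles = total_cycles(program)
```

Only SUB stalls; AND and OR are delayed by the same cycle simply because they follow it in the pipeline.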
Branch
BR R1, LABEL_A
ADD R2,R3,R7
AND R5,R7,R11
:
:
LABEL_A: LD R4,R2,005
Branch
[Figure: overlapped pipeline (Mem (IM), Reg, ALU, Mem (DM), Reg) for BR R1, LABEL_A; ADD R2,R3,R7; AND R5,R7,R11; LD R4,R2,005]
Datapath w/ pipeline
[Figure: pipelined datapath with Forwarding unit]
What to do w/ branch
• Reduce the number of cycles needed to decide on a branch.
• Delayed branch (software solutions)
  • NO-OP
  • Move instructions
    • from before
    • from target
    • from fall through
Branch
[Figure: overlapped pipeline (Mem (IM), Reg, ALU, Mem (DM), Reg) for BR R1, LABEL_A; ADD R2,R3,R7; LD R4,R2,005]
NO-OP
[Figure: Branch followed by a NO-OP]
From Before
[Figure: instruction sequence with the Branch marked]
From Target
[Figure: instruction sequence with the Branch marked]
From Fall Through
[Figure: instruction sequence with the Branch marked]
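The "from before" option can be sketched as a search for an earlier instruction that is independent of everything between it and the branch, falling back to a NO-OP. This is an illustrative sketch only: the instruction tuples `(text, dest, sources)`, the sample instructions, and the assumption that `BR R1` reads R1 as its condition are all hypothetical.

```python
NOP = ("NO-OP", None, ())

def independent(moved, other):
    """True if `moved` can be reordered past `other` without changing results."""
    _, m_dest, m_sources = moved
    _, o_dest, o_sources = other
    if m_dest is not None and (m_dest in o_sources or m_dest == o_dest):
        return False            # RAW or WAW on the moved instruction's result
    if o_dest is not None and o_dest in m_sources:
        return False            # WAR: `other` overwrites an operand of `moved`
    return True

def fill_from_before(before, branch):
    """Pick a delay-slot instruction from the code preceding the branch."""
    for i in range(len(before) - 1, -1, -1):
        candidate = before[i]
        passed = before[i + 1:] + [branch]
        if all(independent(candidate, inst) for inst in passed):
            return before[:i] + before[i + 1:], candidate
    return before, NOP          # nothing safe to move: use a NO-OP

before = [
    ("ADD R6,R7,R8", "R6", ("R7", "R8")),
    ("SUB R1,R4,R5", "R1", ("R4", "R5")),   # produces the branch condition
]
branch = ("BR R1, LABEL_A", None, ("R1",))
remaining, slot = fill_from_before(before, branch)
```

Here SUB cannot move because the branch reads R1, so ADD is placed in the delay slot; it executes whether or not the branch is taken, which is correct because it came from before the branch.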
Multicycle Operations
[Figure: IF and ID feed four parallel execution units (EX inst. unit, FP multiply, FP adder, FP divider), followed by MEM and WB]
FP operations
• FP Add: 4 cycles
• FP Multiply: 7 cycles
• FP Divide: 25 cycles
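Example
• Execution starts in order
• Out of order completion
CYCLE | MULTD | ADDD | LD | SD
1 | IF | - | - | -
2 | ID | IF | - | -
3 | m1 | ID | IF | -
4 | m2 | a1 | ID | IF
5 | m3 | a2 | X | ID
6 | m4 | a3 | M | X
7 | m5 | a4 | W | M
8 | m6 | M | - | W
9 | m7 | W | - | -
10 | M | - | - | -
11 | W | - | - | -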
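The out-of-order completion in this example follows directly from the execution latencies. A minimal sketch, assuming one instruction issues per cycle with no stalls, FP multiply takes 7 EX cycles, FP add takes 4, and LD/SD take 1; `write_back_cycle` is a hypothetical helper.

```python
EX_LATENCY = {"MULTD": 7, "ADDD": 4, "LD": 1, "SD": 1}

def write_back_cycle(issue_cycle, op):
    """IF in issue_cycle, ID next, then EX for the op's latency, MEM, W."""
    id_cycle = issue_cycle + 1
    last_ex_cycle = id_cycle + EX_LATENCY[op]
    return last_ex_cycle + 2            # MEM, then write back

program = ["MULTD", "ADDD", "LD", "SD"]
finish = {op: write_back_cycle(i, op) for i, op in enumerate(program, start=1)}
completion_order = sorted(program, key=finish.get)
```

The instructions start in program order but finish as LD, SD, ADDD, MULTD, matching the W row positions in the table.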
MIPS R4000 (Superpipelining)
Stages: IF IS RF EX DF DS TC WB
[Figure: instruction memory spans IF and IS, Reg is read in RF, ALU in EX, data memory spans DF, DS, and TC, Reg is written in WB]
• IF: Instruction fetch, first half
• IS: Instruction fetch, second half
• RF: Inst. Decode & Register Fetch
• EX: Execution
• DF: Data fetch, first half
• DS: Data fetch, second half
• TC: Tag Check
• WB: Write Back
Load
[Figure: overlapped R4000 pipeline (instruction memory, Reg, ALU, data memory, Reg) over CC1 to CC7 for LW R1; Instruction 1; Instruction 2; ADD R2,R1]
Branch
[Figure: overlapped R4000 pipeline (instruction memory, Reg, ALU, data memory, Reg) for BEQZ and the four instructions that follow it]
Branch (taken)
Branch inst: IF IS RF EX DF DS TC WB
Delay slot: IF IS RF EX DF DS TC WB
stall: S S S S S S S S
stall: S S S S S S S S
Branch target: IF IS RF EX DF DS TC WB
Branch (not taken)
Branch inst: IF IS RF EX DF DS TC WB
Delay slot: IF IS RF EX DF DS TC WB
Branch inst+2: IF IS RF EX DF DS TC WB
Branch inst+3: IF IS RF EX DF DS TC WB
Branch inst+4: IF IS RF EX DF DS TC WB
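The load and branch delays of the eight-stage pipeline can be derived from stage positions. This is a sketch under two assumptions that the slides imply but do not state: load data is available at the end of DS and forwarded to EX, and the branch outcome is known at the end of EX. The function names are hypothetical.

```python
R4000_STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def load_delay(stages=R4000_STAGES):
    """Cycles a dependent instruction must trail a load beyond the usual one."""
    return stages.index("DS") - stages.index("EX")

def branch_delay(stages=R4000_STAGES):
    """Cycles between fetching a taken branch and fetching its target, minus one."""
    return stages.index("EX") - stages.index("IF")

def taken_branch_start_cycles(branch_cycle=1, delay_slots=1):
    """Start cycle (IF) of each row in the taken-branch diagram."""
    target_cycle = branch_cycle + branch_delay() + 1
    stalls = branch_delay() - delay_slots
    return {
        "Branch inst": branch_cycle,
        "Delay slot": branch_cycle + 1,
        "stall cycles": stalls,
        "Branch target": target_cycle,
    }

timeline = taken_branch_start_cycles()
```

Under these assumptions the load delay is 2 cycles, which matches the two instructions between LW R1 and ADD R2,R1, and the branch delay is 3 cycles: one delay slot plus the two stall rows before the branch target.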