b1110 Multi-Cycle CPUs

ENGR xD52, Eric VanWyk, Fall 2012

Acknowledgements
• Mark L. Chang lecture notes for Computer Architecture (Olin ENGR3410)
• Microchip Technology Inc (Datasheet)
• A. Sahu, Indian Institute of Technology

Today
• Recall Single-Cycle CPUs
• Single-Cycle Shortcomings
• Multi-Cycle CPUs


Execution Overview
[Figure: execution loop: Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction]
• Fetch instruction from memory
• Decode instruction into actions/controls
• Fetch/Decode operands
• Compute result value or status
• Push result(s) to storage
• Determine next instruction
• Reference Lecture b1001

Processor Overview
Overall Dataflow
• PC fetches instructions
• Instructions select operand registers, ALU immediate values
• ALU computes values
• Load/Store addresses computed in ALU
• Result goes to register file or Data memory

Information Rippling
• Each Clock Cycle “ripples” left to right
• Elements emit garbage values until they stabilize
• Elements on the Left (leading edge of cycle)
  • Spend most of the time “bored” (stable)
  • Under-utilized
• Elements on the Right (lagging edge of cycle)
  • Spend most of the time twitching as things settle
  • Spend unnecessary dynamic power

Performance of Single-Cycle Machine
• Clock speed is set by the slowest instruction
• Arithmetic & Logic: PC, Instr. Memory, Reg Read, mux, ALU, mux, Reg Setup
• Load: PC, Instr. Memory, Reg Read, mux, ALU, Data Memory, mux, Reg Setup
• Store: PC, Instr. Memory, Reg Read, mux, ALU, Data Memory
• Branch: PC, Instr. Memory, Reg Read, mux, ALU, Adder, mux, PC setup
• Jump: PC, Instr. Memory, mux, PC setup
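To make "clock speed is set by the slowest instruction" concrete, here is a minimal sketch that sums element delays along each instruction's path and takes the maximum. The delay values are made-up placeholders in nanoseconds, and the pairing of each path with an instruction class is a reading of the slide's diagram, not data from any real part.

```python
# Hypothetical element delays in nanoseconds (assumed values, for illustration only).
DELAY_NS = {
    "PC": 1, "Instr. Memory": 10, "Reg Read": 3, "mux": 1, "ALU": 6,
    "Adder": 4, "Data Memory": 10, "Reg Setup": 1, "PC setup": 1,
}

# Elements each instruction class passes through in one single-cycle clock.
PATHS = {
    "Arithmetic & Logic": ["PC", "Instr. Memory", "Reg Read", "mux", "ALU", "mux", "Reg Setup"],
    "Load": ["PC", "Instr. Memory", "Reg Read", "mux", "ALU", "Data Memory", "mux", "Reg Setup"],
    "Store": ["PC", "Instr. Memory", "Reg Read", "mux", "ALU", "Data Memory"],
    "Branch": ["PC", "Instr. Memory", "Reg Read", "mux", "ALU", "Adder", "mux", "PC setup"],
    "Jump": ["PC", "Instr. Memory", "mux", "PC setup"],
}


def path_delay(path):
    """Total delay of one path: the sum of its element delays."""
    return sum(DELAY_NS[element] for element in path)


def single_cycle_period(paths=PATHS):
    """The clock period must cover the slowest instruction's path."""
    return max(path_delay(path) for path in paths.values())


def slowest_instruction(paths=PATHS):
    """Name of the instruction class that sets the clock period."""
    return max(paths, key=lambda name: path_delay(paths[name]))
```

With these placeholder delays Load sets the period, and every Jump pays for the full Load-length cycle even though its own path is much shorter.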

Reducing Cycle Time
[Figure: one block of acyclic combinational logic between two storage elements, then the same logic cut into blocks (A) and (B) with a storage element inserted between them]
• Cut combinational dependency graph and insert register / latch
• Do same work in N fast cycles, rather than one slow one

Goals
• Free up fast instructions from slow clock curse
  • Load Word is Super Slow
• Make better use of available resources
  • Reuse components (2 Memories, 2 or 3 Adders)
• Free up space to speed up components
  • One big fast ALU, not multiple slow ALU/adders

Multi-Cycle CPUs in the Wild
• Common in small embedded spaces
  • PIC16, PIC18
  • 4 bit micros (watches)
• Marketing will try to confuse you
  • Advertise MHz
  • Hide CPI, Instructions per second, etc
http://ww1.microchip.com/downloads/en/DeviceDoc/41213D.pdf
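A quick sketch of why advertised MHz alone can mislead: instruction throughput is clock frequency divided by cycles per instruction (CPI). Both chips below are hypothetical, with numbers chosen only to show the effect.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Throughput = clock frequency / CPI."""
    return clock_hz / cycles_per_instruction


# Hypothetical chip A: higher advertised clock, but four clocks per instruction.
chip_a = instructions_per_second(20_000_000, 4)

# Hypothetical chip B: lower advertised clock, one clock per instruction.
chip_b = instructions_per_second(16_000_000, 1)

print(f"A: {chip_a / 1e6:.1f} MIPS, B: {chip_b / 1e6:.1f} MIPS")
```

Chip A wins on MHz yet executes fewer instructions per second, which is why CPI has to be read alongside the clock rate.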

Preview of White Boards to Come
• We will go to the white boards “later”.
• You will create the schematic necessary to run the RTL design I’m about to give you.
• I’m “cheating” and giving you parts of MY answer, to make your reinvention smoother
  • There is nothing sacred about MY answer
  • You’re usually on the hook for the whole shebang
  • If it fulfils the contract and is small/fast, awesome.

Strategy
• Enumerate all the stuff we have to do
  • Per Instruction
• Highlight Common Features
• Break tasks into N phases
• Balance work done in each phase

“Typical” Phases
• IF: Instruction Fetch
• ID: Instruction Decode (& register fetch)
• EX: Execute
• MEM: Memory access (read for loads, write for stores)
• WB: Write Back to the register file
• Other Architectures make different divisions!

New Registers
• Instruction Register (IR)
  • Instruction fetched from memory
• Data Register (DR)
  • Data fetched from memory
• Operands (A, B)
  • Fetched from Register File
• Result (Res)
  • Result of the ALU calculation

Phases: Load Word
• IF: Instruction Register = Memory[PC]; PC = PC + 4
• ID: A = RegFile[rs]; B = RegFile[rt] (rt = IR[20:16])
• EX: Result = A + sign extended immediate
• MEM: DataReg = Mem[Result]
• WB: RegFile[rt] = DataReg
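The Load Word phases can be traced in a few lines of Python. This is a sketch that assumes the standard MIPS I-type layout (opcode in IR[31:26], rs in IR[25:21], rt in IR[20:16], immediate in IR[15:0]), with rs as the base register and rt as the destination. The dictionary used as memory and the helper names are illustrative, not part of the lecture.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_i_type(word):
    """Split a 32-bit I-type instruction into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # IR[31:26]
        "rs": (word >> 21) & 0x1F,      # IR[25:21], base register for lw
        "rt": (word >> 16) & 0x1F,      # IR[20:16], destination register for lw
        "imm": sign_extend16(word),     # IR[15:0], sign extended
    }


def load_word(word, reg_file, memory):
    """Run the ID, EX, MEM and WB phases of lw (IF is assumed done: word is IR)."""
    fields = decode_i_type(word)
    a = reg_file[fields["rs"]]          # ID: A = RegFile[rs]
    result = a + fields["imm"]          # EX: Result = A + sign extended immediate
    data_reg = memory[result]           # MEM: DataReg = Mem[Result]
    reg_file[fields["rt"]] = data_reg   # WB: RegFile[rt] = DataReg
    return result


# lw $t0, -4($sp) encodes as 0x8FA8FFFC ($sp is register 29, $t0 is register 8).
regs = [0] * 32
regs[29] = 0x100
mem = {0xFC: 1234}                      # hypothetical memory: byte address -> word
load_word(0x8FA8FFFC, regs, mem)
```

After the call, register 8 holds the word read from address 0x100 - 4 = 0xFC.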

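Multi Cycle w/ Controls
[Figure: multi-cycle datapath. Blocks: PC, Memory (Addr, Din, Dout, WrEn), IR (Rs, Rt, Rd, Imm16), Registers (Aa, Ab, Aw, Da, Db, Dw, WrEn), SignExtnd, <<2, Concat, constant 4, MDR, A, B, ALU, RES]
Control signals: PCSrc, MemIn, ALUOp, ALUSrcA, ALUSrcB, PC_WE, IR_WE, Mem_WE, Reg_WE, Dst, RegIn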

Desk Work
• Time your multi-cycle design from Monday
• Do symbolically first, then substitute real numbers
• Remember parallel paths!
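One way to organise this timing exercise, sketched under assumptions: describe each phase as a set of parallel paths of named elements (the symbolic step), then substitute delays. A phase takes as long as its slowest parallel path plus register overhead, and the clock period is the slowest phase. The paths and delays below describe one hypothetical design, not the reference answer.

```python
# Symbolic description: each phase lists its parallel paths (assumed design).
PHASE_PATHS = {
    "IF": [["mem"], ["mux", "alu"]],                    # fetch IR while the ALU computes PC + 4
    "ID": [["regread"], ["signext", "mux", "alu"]],     # read A and B while computing the branch target
    "EX": [["mux", "alu"]],
    "MEM": [["mux", "mem"]],
    "WB": [["mux", "regwrite"]],
}

# Hypothetical delays in nanoseconds; "reg" is the clock-to-output plus setup overhead.
DELAY_NS = {"mem": 10, "alu": 6, "regread": 3, "regwrite": 3, "mux": 1, "signext": 1, "reg": 1}


def phase_delay(paths, delays):
    """Parallel paths overlap, so a phase costs its slowest path plus register overhead."""
    return delays["reg"] + max(sum(delays[e] for e in path) for path in paths)


def cycle_time(phase_paths, delays):
    """Every phase takes one clock, so the clock must fit the slowest phase."""
    return max(phase_delay(paths, delays) for paths in phase_paths.values())


for name, paths in PHASE_PATHS.items():
    print(name, phase_delay(paths, DELAY_NS), "ns")
print("clock period:", cycle_time(PHASE_PATHS, DELAY_NS), "ns")
```

With these numbers MEM is the limiting phase, and the gap between MEM and WB shows how unbalanced phases waste part of each cycle.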

Phases: ADD
• IF: Instruction Register = Memory[PC]; PC = PC + 4
• ID: A = RegFile[rs]; B = RegFile[rt]
• EX: Result = A + B
• MEM: (nothing)
• WB: RegFile[rd] = Result

Phases: Store Word
• IF: Instruction Register = Memory[PC]; PC = PC + 4
• ID: A = RegFile[rs]; B = RegFile[rt]
• EX: Result = A + sign extended immediate
• MEM: Mem[Result] = B
• WB: (nothing)

Phases: Branch if Equal
• IF: Instruction Register = Memory[PC]; PC = PC + 4
• ID: A = RegFile[rs]; B = RegFile[rt]; Res = PC + sign extended immediate
• EX: if (A == B) PC = Res
• MEM: (nothing)
• WB: (nothing)

Phases: Jump
• IF: Instruction Register = Memory[PC]; PC = PC + 4
• ID: PC = PC[31:28], IR[25:0], b00 (concatenated)
• EX: (nothing)
• MEM: (nothing)
• WB: (nothing)
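The per-instruction phase lists can be checked with a small behavioural model. The sketch below makes several assumptions: standard MIPS opcodes and field roles (for lw and sw, rs is the base and rt the data register), a branch offset shifted left by two as in standard MIPS, a dictionary as unified word memory, every R-type treated as ADD, and empty phases skipped so that each instruction uses only the cycles it needs. The function and variable names are illustrative.

```python
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(state):
    """Execute one instruction phase by phase; return the number of cycles used."""
    regs, mem = state["regs"], state["mem"]

    # IF: IR = Memory[PC]; PC = PC + 4
    ir = mem[state["pc"]]
    state["pc"] = (state["pc"] + 4) & MASK32
    cycles = 1

    # ID: decode, read operands, precompute the branch target
    op = ir >> 26
    if op not in (OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW):
        raise ValueError(f"unsupported opcode {op:#x}")
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    imm = sign_extend16(ir)
    cycles += 1
    if op == OP_J:
        state["pc"] = (state["pc"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return cycles
    a, b = regs[rs], regs[rt]
    res = (state["pc"] + (imm << 2)) & MASK32   # assumed <<2, as in standard MIPS

    # EX: compare for beq, otherwise compute a sum or an address
    cycles += 1
    if op == OP_BEQ:
        if a == b:
            state["pc"] = res
        return cycles
    res = (a + b) & MASK32 if op == OP_RTYPE else (a + imm) & MASK32

    # MEM: only loads and stores spend a cycle here
    if op == OP_SW:
        mem[res] = b
        return cycles + 1
    if op == OP_LW:
        data_reg = mem[res]
        cycles += 1

    # WB: write the register file (register 0 stays zero)
    dest, value = (rt, data_reg) if op == OP_LW else (rd, res)
    if dest != 0:
        regs[dest] = value
    return cycles + 1
```

Under these assumptions a jump takes 2 cycles, a branch 3, ADD and Store Word 4, and Load Word 5.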

Example Control Diagram

Let’s Make It
• Create a Multi-Cycle CPU that can do the instructions on prior pages
  • Jump, Branch, R-type, I-type, LW, SW
  • Show everything except the actual decode logic
  • Reference Lecture b1001, but put everything on one “page”
• Sketch a Schematic for your Multi-Cycle CPU
  • ALU, Register File, Unified Instruction/Data Memory
  • Sign Extender, (Optional) Shift by Two
  • IR, DR, A, B, Res Registers
  • OMG MUXES EVERYWHERE
• Create a control chart (to show the actual decode logic)
  • Show each cycle of each instruction
  • Mux selects, ALU Control Lines, Register Enables
  • Use “X” for “Don’t Care”
• We will informally present for the last 15 minutes
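As a starting point for the control chart, here is a partial sketch in Python covering only the write enables and the two register-file muxes. The signal names PC_WE, IR_WE, Mem_WE, Reg_WE, RegIn and Dst come from the datapath slide, but the value encodings ("DR", "Res", "rt", "rd", "A==B") and the decision to skip empty phases are assumptions. ALUOp, ALUSrcA, ALUSrcB, PCSrc and MemIn are left out because they depend on your own schematic.

```python
X = "X"  # don't care

COLUMNS = ("PC_WE", "IR_WE", "Mem_WE", "Reg_WE", "RegIn", "Dst")

FETCH = (1, 1, 0, 0, X, X)       # IF: load IR and write PC + 4
NO_WRITES = (0, 0, 0, 0, X, X)   # phases that only update A, B, Res or DR

# One row per cycle of each instruction (assumed encodings, see the lead-in).
CHART = {
    "LW": [("IF", FETCH), ("ID", NO_WRITES), ("EX", NO_WRITES),
           ("MEM", NO_WRITES), ("WB", (0, 0, 0, 1, "DR", "rt"))],
    "SW": [("IF", FETCH), ("ID", NO_WRITES), ("EX", NO_WRITES),
           ("MEM", (0, 0, 1, 0, X, X))],
    "ADD": [("IF", FETCH), ("ID", NO_WRITES), ("EX", NO_WRITES),
            ("WB", (0, 0, 0, 1, "Res", "rd"))],
    "BEQ": [("IF", FETCH), ("ID", NO_WRITES),
            ("EX", ("A==B", 0, 0, 0, X, X))],   # PC written only when A equals B
    "J": [("IF", FETCH), ("ID", (1, 0, 0, 0, X, X))],
}


def cycles_for(instruction):
    """Number of cycles the chart allots to an instruction."""
    return len(CHART[instruction])


def render(chart=CHART):
    """Return the chart as aligned text lines, one per (instruction, phase)."""
    lines = ["instr phase " + " ".join(f"{c:>6}" for c in COLUMNS)]
    for instruction, rows in chart.items():
        for phase, values in rows:
            cells = " ".join(f"{str(v):>6}" for v in values)
            lines.append(f"{instruction:<5} {phase:<5} {cells}")
    return lines


print("\n".join(render()))
```

The mux selects are "X" whenever Reg_WE is 0, because the value on a mux output does not matter if nothing captures it.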

Summary
• Split Single Cycle into multiple cycles
• Use variable number of cycles per instruction
  • No More Harrison Bergeron-ing
• Most Instructions become Faster
• Longest Instruction gets Longer
  • From unbalanced phases
• Costs: Registers, control logic
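A small numeric sketch of the trade-off in this summary. The cycle counts follow the phase lists with empty phases skipped. The two clock periods and the instruction mix are hypothetical values picked only to illustrate the claims, so the conclusion depends entirely on them.

```python
# Cycles per instruction when empty phases are skipped.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 2}

SINGLE_CYCLE_NS = 40   # hypothetical single-cycle clock period
MULTI_CYCLE_NS = 9     # hypothetical multi-cycle clock period (slowest phase)

# Hypothetical instruction mix, in percent of executed instructions.
MIX_PERCENT = {"lw": 25, "sw": 10, "add": 45, "beq": 15, "j": 5}


def multi_cycle_time_ns(instruction):
    """Time for one instruction on the multi-cycle machine."""
    return CYCLES[instruction] * MULTI_CYCLE_NS


def average_cpi(mix_percent):
    """Average cycles per instruction, weighted by the instruction mix."""
    return sum(pct * CYCLES[name] for name, pct in mix_percent.items()) / 100


for name in CYCLES:
    print(f"{name}: {multi_cycle_time_ns(name)} ns vs {SINGLE_CYCLE_NS} ns single-cycle")
print("average:", average_cpi(MIX_PERCENT) * MULTI_CYCLE_NS, "ns per instruction")
```

With these numbers every instruction except Load Word finishes sooner than the single-cycle period, while Load Word takes longer because five cycles of the slowest phase add up to more than its original path.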

Preview of Things to Come
• Lab 2: The ALU
• How to control a Multi-Cycle CPU
• Timing Concerns and Explicit Balancing
• Modern CPUs: Pipelining
