
Structure of Computer Systems


suchi

Presentation Transcript


  1. Structure of Computer Systems Course 4 The Central Processing Unit - CPU

  2. CPU - Central Processing Unit • “Classic (idyllic) view” • Incorporates 2 of the 5 components of the classical von Neumann model: • ALU • CU – Control Unit • It is the brain (intelligent part) of a computer • Fetches (reads) an instruction, decodes/interprets it, reads the data, executes the instruction and stores the result • Does its job in a synchronized and sequential way – “one thing at a time”

  3. CPU - Central Processing Unit • Today’s view: • Contains all kinds of computer components: • Multiple CPUs: • symmetric, asymmetric, • multiple cores, • multiple ALUs, specialized ALUs (e.g. floating point, multimedia – MMX, SSE2) • Memory – multiple levels of cache memory (L0, L1, L2, Trace cache) • Interfaces and peripheral devices (in the case of microcontrollers and DSPs) • Serial channels • Parallel interfaces • Timers, counters • Converters (ADC, DAC) • Network interfaces • Interrupt system • Bus controller(s) and arbiter(s) • Memory management units • Executes instructions in parallel and in a speculative order • Intelligence may be distributed in memories and interfaces as well • Where is that nice idyllic image?

  4. Starting from the beginning … • A simple computer • Attributes: sequential, one (accumulator) register, one memory for instructions and data • [Figure: block diagram of the simple computer. Blocks: CG, PhG, PC, address MUX, Memory, IR, Dec&CC, ALU, Acc. Signals: Clk, Rst, IR_ld, Sel, wr, Op_sel, Inc, PC_ld, Acc_ld, Acc_shr, Acc_shl, Acc_clr] • Legend: CG – clock generator; PhG – phase generator; PC – program counter; IR – instruction register; Acc – accumulator

  5. A simple computer • How does it work? • 4 phases: • IF – instruction fetch – read the instruction into IR • Dec – decode the instruction – generate control signals • PreEx – prepare execution – e.g. read the data from memory • Exe – execute – e.g. addition, subtraction

  6. A simple computer (block diagram as on slide 4) • Example 1 – ADD Acc, M[100h] • IF: Sel=0 => Address = PC; IR_ld pulse => IR = ADD 100h • Dec: Sel=1 => Address = IR_addr (100h); Inc=1 => increment PC • PreEx: Op_sel = code_add => the ALU performs an addition • Exe: Acc_ld => Acc = Acc + M[100h]

  7. A simple computer (block diagram as on slide 4) • Example 2 – JMP 200h • IF: Sel=0 => Address = PC; IR_ld pulse => IR = JMP 200h • Dec: Inc=1 => increment PC • PreEx: PC_ld=1 => PC = IR_addr = 200h • Exe: (nothing to do) • Example 3 – SHR Acc • IF and Dec: the same • PreEx: (nothing to do) • Exe: Acc_shr=1 => shift the accumulator one position to the right
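
The four phases used in Examples 1 to 3 can be traced with a small simulator. This is only a minimal sketch under stated assumptions, not the course's implementation: the tuple-based instruction encoding, the `HALT` pseudo-instruction and the `run` function are hypothetical, and the phases are simplified to one statement each.

```python
# Hypothetical symbolic opcodes; the slides do not define a binary encoding.
ADD, JMP, SHR, HALT = "ADD", "JMP", "SHR", "HALT"

def run(memory, max_instructions=100):
    """Simulate the one-accumulator machine; every instruction takes 4 phases."""
    pc, acc, trace = 0, 0, []
    for _ in range(max_instructions):
        # IF: the MUX selects PC as the address; IR_ld latches the instruction.
        op, addr = memory[pc]
        if op == HALT:          # HALT is an assumption, used only to stop the loop
            break
        # Dec: generate control signals; Inc=1 increments the PC.
        pc += 1
        # PreEx: read the operand from memory, or load the PC for a jump.
        operand = memory[addr] if op == ADD else None
        if op == JMP:
            pc = addr           # PC_ld=1 => PC = IR_addr
        # Exe: update the accumulator.
        if op == ADD:
            acc = acc + operand # Acc_ld
        elif op == SHR:
            acc >>= 1           # Acc_shr
        trace.append((op, addr, pc, acc))
    return acc, trace

# Instructions are (opcode, address) tuples; data cells are plain integers.
program = {
    0x000: (ADD, 0x100),
    0x001: (JMP, 0x200),
    0x200: (SHR, None),
    0x201: (HALT, None),
    0x100: 6,
}
final_acc, trace = run(program)
print(final_acc)
```

Each entry of `trace` records the opcode, its address field, the PC after the instruction and the accumulator value, which makes it easy to compare against the hand-traced phases above.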

  8. A simple computer • Homework: try to implement: • MOV M[addr], Acc • MOV Acc, M[addr] • Conditional jump (e.g. if Acc=0, >0, <0) • MOV Acc, 0

  9. A simple computer • Issues: • Every instruction is executed in a fixed number (4) of steps • Too many for simple instructions • Too few for complex instructions (e.g. multiply) • Only one internal register – hard to operate with data • No input and output devices • Limited number of possible operations – small instruction set • Possible improvements: • Variable number of phases -> the phase generator should depend on the instruction code • Multiple internal registers -> 2 buses: input data; output data • Front panel with 7-segment LEDs and switches • Increase the number of instructions -> more complex Decoder and Command and Control Unit

  10. A more sophisticated computer, but still simple – the MIPS architecture • Attributes: • Sequential • 32 internal registers of 32 bits • Instructions: fixed length, variable content • Harvard memory architecture: separate instruction and data memory • An instruction is executed in 5 phases: • IF – instruction fetch • ID – decode the instruction and prepare (read) the data • Ex – execute the instruction • M – operation with the memory • Wb – write back – store the result • Instruction types: • “R” Register, e.g. ADD $RD, $RS, $RT • “I” Immediate, e.g. ADDI $RT, $RS, constant; LW $RT, offset($RS) • “J” Jump, e.g. J target

  11. MIPS architecture • Instruction formats: • Fixed length (4 bytes) but variable content • “R” – register type instructions: <instr> rd, rs, rt • Fields: Opcode (6 bits), rs (5 bits), rt (5 bits), rd (5 bits), shift (5 bits), funct (6 bits) • rd – destination register • rs – source register • rt – target register (the second source operand) • Ex: add $s1, $s2, $s3 ; $s1=$s2+$s3

  12. MIPS architecture • Instruction formats • “I” – immediate type instructions, with an immediate value (constant): <instr> rt, rs, IMM • Fields: Opcode (6 bits), rs (5 bits), rt (5 bits), IMM/Addr (16 bits) • rs – source register • rt – target register • Ex: addi $s1, $s2, 55 ; $s1=$s2+55 • “J” – jump type instructions: <instr> LABEL • Fields: Opcode (6 bits), Target address (26 bits) • Ex: j et1 ; jump
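
The three formats can be illustrated by packing the fields into 32-bit words. The sketch below is hedged: the helper names (`encode_r`, `encode_i`, `encode_j`) are hypothetical, and the numeric values (funct 0x20 for add, opcode 0x08 for addi, opcode 0x02 for j, register numbers 17 to 19 for $s1 to $s3) come from the standard MIPS32 encoding rather than from the slides.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """R format: opcode(6) rs(5) rt(5) rd(5) shift(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """I format: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(opcode, target):
    """J format: opcode(6) target(26); target is the raw 26-bit field value."""
    return (opcode << 26) | (target & 0x3FFFFFF)

# Register numbers of $s1..$s3 in the standard MIPS numbering (assumption).
S1, S2, S3 = 17, 18, 19

add_word = encode_r(rs=S2, rt=S3, rd=S1, shamt=0, funct=0x20)  # add  $s1, $s2, $s3
addi_word = encode_i(opcode=0x08, rs=S2, rt=S1, imm=55)        # addi $s1, $s2, 55
j_word = encode_j(opcode=0x02, target=0x100)                   # j with field value 0x100

print(f"{add_word:08x} {addi_word:08x} {j_word:08x}")
```

Note how the destination register sits in the rd field for the R format but in the rt field for the I format, which matches the operand order given on the slides.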

  13. MIPS architecture • Address generation and instruction fetch • [Figure: PC, +4 adder, address adder (Add), two PC multiplexers (PC_MUX_Sel1, PC_MUX_Sel2), Program Memory and IR; signals PC_ld, IR_ld; IR outputs Op_code and op_address; MUX inputs 0, const., Jump address] • PC = PC + 4 – increment the PC • PC = Jump_Address – absolute jump • PC = PC + Jump_Address – relative jump
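
The three PC update rules can be written as one selection function. This is a minimal sketch: the function name `next_pc` and the mode strings are hypothetical stand-ins for the two multiplexer select signals, whose exact encoding the slide does not give.

```python
def next_pc(pc, mode="sequential", jump_address=0):
    """Return the next program counter value for the three cases on the slide."""
    if mode == "sequential":
        return pc + 4                 # PC = PC + 4
    if mode == "absolute":
        return jump_address           # PC = Jump_Address
    if mode == "relative":
        return pc + jump_address      # PC = PC + Jump_Address
    raise ValueError(f"unknown mode: {mode}")

print(hex(next_pc(0x100)))
print(hex(next_pc(0x100, "absolute", 0x400)))
print(hex(next_pc(0x100, "relative", -0x20)))
```

A negative `jump_address` in relative mode models a backward branch, as used at the end of a loop.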

  14. MIPS architecture • Decode and data preparation • [Figure: the instruction register (IR) feeds the decoder (DEC), which turns op_code into Exec, Mem. and WB commands; the operand addresses op1_ad and op2_ad select among reg. 0 … reg. 31 of the Register Block through MUX A (data) and MUX B (data); I is the immediate value]

  15. MIPS architecture • Execute and memory access • [Figure: ALU with inputs A, B and I, controlled by Sel_ALU and ex_op_code (Exe and mem cmds); Data Memory with Address, Din, Dout and Wr_mem; outputs Result and Data out]

  16. MIPS architecture • Write back the result • [Figure: a MUX (Sel_rez) selects between Result and Data out; DEC decodes the destination register from IR into Wr_R0 … Wr_R31, enabled by Wr_reg (WB cmds); Register Block reg. 0 … reg. 31]

  17. MIPS architecture • The whole picture • [Figure: clock generator (Clk), phase generator, PC with +4 adder, instruction memory, IR, instruction decoder, registers, ALU, data memory, registers (write back)]

  18. Pipeline execution • What does it mean? • Work as “an assembly line” • idea – General Motors around 1900 • How to do it? • Specialized components (units) for every phase of instruction execution • Memorize the partial results in temporary buffers • What can we achieve? • Higher execution speed at the same clock frequency • CPI ~ 1

  19. Sequential vs. pipeline execution • Sequential execution, CPI=5: Instr. 1, Instr. 2 and Instr. 3 each pass through IF ID Ex M Wb one after the other • Pipeline execution, CPI=1 (in the ideal case): over T1 … T10, instructions i1 … i5 each pass through IF ID Ex M Wb, every instruction starting one clock period after the previous one
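
The CPI figures follow from counting clock periods. The sketch below assumes an ideal 5-stage pipeline with no hazards; the function names are hypothetical and the diagram printer is only an illustration of the timing chart on the slide.

```python
STAGES = ["IF", "ID", "Ex", "M", "Wb"]

def sequential_cycles(n, stages=len(STAGES)):
    """Each instruction finishes all stages before the next one starts."""
    return n * stages

def pipelined_cycles(n, stages=len(STAGES)):
    """Ideal pipeline: fill time for the first instruction, then one per clock."""
    return stages + n - 1 if n > 0 else 0

def pipeline_diagram(n):
    """Return one text row per instruction, shifted by one clock period each."""
    total = pipelined_cycles(n)
    rows = []
    for i in range(n):
        cells = ["  "] * i + [f"{s:<2}" for s in STAGES]
        cells += ["  "] * (total - len(cells))
        rows.append(f"i{i + 1}: " + " ".join(cells).rstrip())
    return "\n".join(rows)

n = 5
print(pipeline_diagram(n))
print("sequential CPI:", sequential_cycles(n) / n)
print("pipelined CPI: ", pipelined_cycles(n) / n)
```

For a short run of 5 instructions the pipelined CPI is still well above 1 because of the fill time; it approaches 1 only as the number of instructions grows.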

  20. Superscalar and superpipeline architectures • Superscalar • Multiple pipelines • 2 instructions are fetched every clock • CPI = 1/2 • Superpipeline • Phases require only half a clock period • CPI = 1/2 • [Figures: two timing diagrams over T1 … T6, each showing instr. i, i+1, i+2 and i+3 passing through IF ID Ex M Wb]

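  21. Pipelined MIPS architecture • [Figure: PC with +4 adder, Inst. Mem (addr, inst.), IR, Dec, Reg. block (outputs A, B, I), Data Mem (addr, Di, Do), Reg. block (write back); pipeline buffers C1, C2, C3 carry the ex, m and wb commands forward; stages IF, ID, Ex, M, Wb]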

  22. Pipeline architecture • There is no free lunch! • Hazard cases: • Data hazard • Data dependency between consecutive instructions • Control hazard • Jump/branch instructions change the normal (sequential) order of instruction execution • Structural hazard • Instructions in different phases use the same structural component (e.g. ALU, registers, memory, bus, etc.) • Result: hazards reduce the speed and the efficiency of the pipeline architecture

  23. Hazard cases in pipeline architectures • Data hazard • Example: MOV AX, 5 (IF ID Ex M Wb) is followed by ADD BX, AX, which needs AX and is delayed by stall phases; likewise SUB CX, 5 is followed by MOV DX, CX • Data hazard types: • RAW – read after write • Occurs very often; avoided through forwarding (see Common data bus) • WAR – write after read • Rare in a classic pipeline; more frequent in superscalar pipelines • WAW – write after write • RAR – not a hazard
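
The hazard types can be derived from the register sets that two instructions read and write. The following is a minimal sketch: the `instr` and `classify_hazards` helpers are hypothetical, and the model ignores the flags register that real x86 arithmetic instructions also write.

```python
def instr(writes=(), reads=()):
    """Describe an instruction by the registers it writes and reads."""
    return {"writes": set(writes), "reads": set(reads)}

def classify_hazards(first, second):
    """Return the data hazards of `second` with respect to the earlier `first`."""
    hazards = []
    if first["writes"] & second["reads"]:
        hazards.append("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.append("WAR")   # second overwrites what first still reads
    if first["writes"] & second["writes"]:
        hazards.append("WAW")   # both write the same register
    return hazards              # read-after-read never appears: not a hazard

mov_ax = instr(writes=["AX"])                       # MOV AX, 5
add_bx = instr(writes=["BX"], reads=["BX", "AX"])   # ADD BX, AX
sub_cx = instr(writes=["CX"], reads=["CX"])         # SUB CX, 5
mov_dx = instr(writes=["DX"], reads=["CX"])         # MOV DX, CX

print(classify_hazards(mov_ax, add_bx))
print(classify_hazards(add_bx, sub_cx))
print(classify_hazards(sub_cx, mov_dx))
```

Both dependent pairs from the slide's example come out as RAW, while the middle pair (ADD BX, AX followed by SUB CX, 5) shares no registers and can proceed without stalling.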

  24. Hazard cases in pipeline architectures • Data hazard (cont.) • Solutions: • Detection and stall phases • an instruction with an unresolved data dependency waits in the “instruction fetch” stage until the data is available • the following instructions are also stalled • Register renaming • multiple copies of a register (see alias registers for Pentium Pro) • instructions with no logical dependency between them can get different copies of the same register • avoids artificial data dependencies caused by the limited number of internal registers • Forwarding (see Common data bus) • transfer a result in advance, before it is written to its final place (register or memory location) • Out-of-order execution • speculative execution (see Pentium Pro architecture)
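
Register renaming can be sketched as a mapping from architectural registers to fresh copies. This is an illustrative model only, not the Pentium Pro mechanism: it assumes an unlimited supply of physical registers (named `p0`, `p1`, …), and the `rename_registers` function and the tuple-based program format are hypothetical.

```python
from itertools import count

def rename_registers(program):
    """program: list of (dest, [sources]) using architectural register names."""
    fresh = count()     # assumption: unlimited physical registers
    current = {}        # architectural name -> most recent physical copy
    renamed = []
    for dest, sources in program:
        srcs = []
        for s in sources:
            if s not in current:                 # value that was live on entry
                current[s] = f"p{next(fresh)}"
            srcs.append(current[s])
        current[dest] = f"p{next(fresh)}"        # every write gets a new copy
        renamed.append((current[dest], srcs))
    return renamed

program = [
    ("R1", ["R2", "R3"]),   # R1 = R2 op R3
    ("R4", ["R1"]),         # R4 = f(R1)   true dependency (RAW) on R1
    ("R1", ["R5"]),         # R1 = f(R5)   WAR with instr. 2, WAW with instr. 1
]
for dest, srcs in rename_registers(program):
    print(dest, "<-", srcs)
```

After renaming, the third instruction writes a different copy of R1 than the one the second instruction reads, so only the true RAW dependency remains.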

  25. Hazard cases in pipeline architectures • Structural hazard • Example: an instruction with no memory phase (IF ID Ex Wb) runs among instructions with all five phases (IF ID Ex M Wb); two instructions are using the register block in different phases • Solutions: • Detection and stall phases • Redundant functional units – see Pentium processors • Harvard memory organization – separate code and data memory – see microcontrollers • Multiple buses – see DSPs • Out-of-order execution

  26. Hazard cases in pipeline architectures • Control hazard • Example: JE et1 (IF ID Ex) is followed in the pipeline by ADD AX, BX (IF ID Ex M) and SUB CX, DX (IF ID Ex M), while the jump target is et1: MOV SI, 1234h (IF ID Ex M Wb) • Solutions: • Stall phases • Branch prediction • Out-of-order execution
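
The slide names branch prediction without describing a scheme. One widely used textbook scheme, chosen here purely as an illustration and not taken from the slides, is the 2-bit saturating counter; the class and function names below are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes, predictor=None):
    """Fraction of branch outcomes (True = taken) that were predicted correctly."""
    if predictor is None:
        predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)

# A loop branch that is taken 9 times and then falls through once.
loop_branch = [True] * 9 + [False]
print(accuracy(loop_branch))
```

Because the counter needs two consecutive mispredictions to change its prediction, a single loop exit does not disturb the prediction for the next execution of the same loop.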

  27. Pipeline architecture – hazard cases • Solving hazard cases: • Detect hazard cases and introduce “stall” phases • Rearrange instructions: • re-arrange instructions in order to reduce the dependences between consecutive instructions • Methods: • Static scheduling – made before program execution – optimization made by the compiler or user • Dynamic scheduling – made during program execution – optimization made by the processor – out-of-order execution • Branch prediction techniques

  28. Static vs. dynamic scheduling • Static scheduling: • The optimal order of instructions is established by the compiler, based on information about the structure of the pipeline • Advantage: it is done once and benefits every execution of the code • Drawbacks: the compiler must know the structure of the hardware (e.g. pipeline stages, phases of every instruction); the compiler must be changed when the processor version changes • Dynamic scheduling: • The hardware has the capacity to reorder instructions to avoid or reduce the effect of hazard cases • Advantages: the processor knows its own structure best; the optimization can be better matched to the hardware; some dependences are revealed only at run time • Drawbacks: reordering decisions are made every time the code is executed; more complex hardware is needed
