CS136, Advanced Architecture

Instruction Set Architecture

Types of ISAs
• Stack
  • Implicit operands (top of stack)
  • Heavy memory traffic
  • Limited ability to access operands at will
  • Obsolete
• Accumulator
  • Implicit register operand (“accumulator”)
  • One memory operand
  • Insufficient temporaries
  • Obsolete
• General-purpose register
  • Multiple registers
  • Several variations

GPR Architectures
• Memory-memory
  • CISC idea
  • Usually allows any operand to be in register as well
• Register-memory
  • Example: x86
  • Can do one operand in register, one in memory, or 2 in regs
• Register-register
  • Only design used in modern machines
  • Lots of registers ⇒ fast flexible operand access
  • Simplicity of hardware
  • Compiler has full flexibility in register usage

Five Ways to Do C = A + B

STACK      ACCUM      MEM-MEM      REG-MEM        REG-REG
PUSH A     LOAD A     ADD C,A,B    LOAD R1,A      LOAD R1,A
PUSH B     ADD B                   ADD R1,B       LOAD R2,B
ADD        STORE C                 STORE R1,C     ADD R3,R1,R2
POP C                                             STORE R3,C
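The STACK and REG-REG sequences can be sketched as two tiny interpreters. This is only an illustration: the function names, the tuple encoding of instructions, and the dictionary used as memory are assumptions made for the example, not part of any real ISA.

```python
# Hypothetical mini-interpreters for two of the five styles.
# Both compute C = A + B and count data-memory accesses.

def run_stack(program, mem):
    """Stack machine: operands are implicit (top of stack)."""
    stack, accesses = [], 0
    for op, *args in program:
        if op == "PUSH":                 # memory -> stack
            stack.append(mem[args[0]])
            accesses += 1
        elif op == "POP":                # stack -> memory
            mem[args[0]] = stack.pop()
            accesses += 1
        elif op == "ADD":                # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return accesses

def run_regreg(program, mem):
    """Register-register machine: only LOAD/STORE touch memory."""
    regs, accesses = {}, 0
    for op, *args in program:
        if op == "LOAD":                 # LOAD Rd, addr
            regs[args[0]] = mem[args[1]]
            accesses += 1
        elif op == "STORE":              # STORE Rs, addr
            mem[args[1]] = regs[args[0]]
            accesses += 1
        elif op == "ADD":                # ADD Rd, Rs1, Rs2
            dst, s1, s2 = args
            regs[dst] = regs[s1] + regs[s2]
    return accesses

stack_prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
reg_prog = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
            ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]

mem_stack = {"A": 2, "B": 3, "C": 0}
mem_reg = dict(mem_stack)
stack_accesses = run_stack(stack_prog, mem_stack)
reg_accesses = run_regreg(reg_prog, mem_reg)
```

For this single statement both styles touch memory three times. The register machine pulls ahead only when A, B, or C is reused by later statements, because the values can stay in R1..R3 instead of being pushed again.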

Memory Addressing
• Originally just word addressing
• 8-bit bytes and byte addressing introduced on IBM 360 series
• Brief experiments with bit addressing (bad idea)
• Unaligned accesses not worth supporting
• Some machines byte-address but only load/store a word at a time
  • Turned out to be bad design decision
  • Too many programs do string processing 1 character at a time
  • May need to revisit in future (32-bit characters?)
• Modern RISC designs allow short load/store, but not short arithmetic

Endian-ness
• The word is “Endian”, not “Indian”
  • Reference to Gulliver’s Travels
• Little-Endian invented by Digital Equipment on the PDP-11
  • Mathematically more elegant
  • Horrible for humans
  • “It seemed like a good idea at the time”
  • Should be banished from the face of the Earth
• Some machines can switch endianness with a control bit
  • This idea is even stupider than the original
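A small sketch of what the two byte orders look like in memory, using Python's built-in `int.to_bytes` and `int.from_bytes`. The value 0x0A0B0C0D is an arbitrary example.

```python
# The same 32-bit value laid out in memory under each byte order.
value = 0x0A0B0C0D

big = value.to_bytes(4, "big")        # most significant byte at lowest address
little = value.to_bytes(4, "little")  # least significant byte at lowest address

big_hex = big.hex()
little_hex = little.hex()

# Reading the bytes back with the wrong byte order silently yields a different number.
misread = int.from_bytes(little, "big")
```

The "mathematically more elegant" property is that in little-endian order the byte at offset i has weight 256**i. The "horrible for humans" property is that a memory dump shows the digits reversed.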

Addressing Modes
• How can an instruction reference memory?
• Early days: absolute address in instruction
  • Led to instruction modification
• Improvement: “Indirection” picked up absolute location, used it as final address
• Minimum necessary today: follow pointer in register
  • Clumsy if only option
• Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction
  • No machine went quite this far
  • But VAX came close

Addressing Modes (cont’d)
• What’s actually useful?
• Need to follow pointers: can restrict to registers
  • ADD R1,(R2)
  • Better: LOAD R1,(R2) (like MIPS)
• Frequent stack access ⇒ register + constant useful
• Immediates needed for built-in constants
• Access to globals ⇒ absolute memory addresses
  • (We’ll see that that’s painful)
• PC-relative modes
  • Used to be needed for data; not in modern systems
  • Still needed for calls and branches
• Absolute addresses no longer needed for branches
  • Can always emulate with PC-relative, since PC known
  • Still available on some architectures
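Most of the useful modes are special cases of the general form *(R1+S*R2+constant) with some terms left out. The sketch below computes only the effective address. The function name, parameter names, and register values are hypothetical, and autoincrement side effects are deliberately omitted.

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Return base + scale*index + disp, skipping any term that is absent."""
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += scale * regs[index]
    return ea

# Illustrative register contents (assumed values).
regs = {"R1": 0x1000, "R2": 3, "SP": 0x7FF0}

ea_indirect = effective_address(regs, base="R1")            # follow pointer: (R1)
ea_stack = effective_address(regs, base="SP", disp=8)       # register + constant: 8(SP)
ea_absolute = effective_address(regs, disp=0x4000)          # absolute address
ea_scaled = effective_address(regs, base="R1", index="R2",  # 16(R1)[R2*8]
                              scale=8, disp=16)
```

A load/store architecture in the MIPS style keeps only the register + constant case and builds the rest from explicit ALU instructions.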

Operand Types and Sizes
• Type usually implies size
• Integers can safely be widened to word size
  • Shrink again when stored
  • Takes advantage of two’s-complement representation
• Single-precision FP gives different results than double-precision
  • Necessary to support both widths
  • Some FPUs can do two SP operations in parallel
• Older machines allowed “packed” decimal (2 digits per byte)
  • x86 supports with DAA (Decimal Add Adjust) instruction
  • Still useful in business world, though dying
• 32 bits standard these days, 64 bits coming
  • 128 some day?
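Why widening is safe: in two's complement, doing the arithmetic at full word width and truncating on store gives the same low-order bits as doing it at the narrow width. A minimal sketch, with hypothetical helper names:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def truncate(value, bits):
    """Keep only the low `bits` bits, as a narrow store would."""
    return value & ((1 << bits) - 1)

# Two 8-bit operands, widened to a full word for the add.
a = sign_extend(100, 8)
b = sign_extend(100, 8)
wide_sum = a + b                      # 200 at word width, no overflow
stored = truncate(wide_sum, 8)        # shrink again when stored
as_signed_byte = sign_extend(stored, 8)
```

The stored byte is 0xC8, which reads back as -56: exactly what an 8-bit adder would have produced, so no narrow ALU operations are needed.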

Operations Provided
• Only one instruction truly needed: SJ
  • Subtract A from B, giving C; if result is < 0, jump to D
  • It’s Turing-complete!
• Practical machines need a bit more at minimum:
  • Arithmetic and logical (add, multiply, divide?, and, or, …)
  • Data movement (load/store, move between registers)
  • Control (conditional/unconditional branch, procedure call and return, trap to OS)
  • System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices)
• Other builtins can be useful:
  • Basic floating point
    • Bad x86 design idea: sin, sqrt, etc.!
  • Decimal
  • String
  • Vector, graphics
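A minimal sketch of an SJ machine, following the definition given here (subtract A from B, giving C; if the result is < 0, jump to D). Several details are assumptions made for the example: the program is kept separate from data memory, operands are named cells in a dictionary, execution halts when the PC leaves the program, and a step limit guards against infinite loops.

```python
def run_sj(program, mem, max_steps=1000):
    """Execute SJ instructions (a, b, c, d): mem[c] = mem[b] - mem[a];
    jump to d if the result is negative, otherwise fall through."""
    pc, steps = 0, 0
    while 0 <= pc < len(program) and steps < max_steps:
        a, b, c, d = program[pc]
        mem[c] = mem[b] - mem[a]
        pc = d if mem[c] < 0 else pc + 1
        steps += 1
    return mem

# SUM = X + Y, built from subtraction only:
#   T   = Z - Y   (Z holds 0, so T = -Y)
#   SUM = X - T   (X - (-Y) = X + Y)
# Each jump target equals the next instruction, so the branch is a no-op.
add_program = [
    ("Y", "Z", "T", 1),
    ("T", "X", "SUM", 2),
]

memory = run_sj(add_program, {"X": 7, "Y": 5, "Z": 0, "T": 0, "SUM": 0})
```

Even addition takes two instructions and a scratch cell, which is why practical machines "need a bit more".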

Control Flow
• Addressing modes are important
  • PC-relative means code can run at any virtual address
    • Useful for dynamically linked (shared) libraries
  • Pointer-following jump needed for returns
    • Also useful for switch statements, function pointers, virtual functions, and shared libraries
• How to specify condition for conditional branches?
  • Condition code as side effect of every instruction
    • Boils down to extra register
    • Spurious dependencies in pipeline
  • Condition register explicitly set by comparison
  • Compare as part of branch
    • Adds delay slots in pipeline

Encodings
• Variable-length instructions
  • Highly efficient (few wasted bits)
  • Allows complex specifications (e.g., x86 addressing modes)
  • Usually means misaligned instruction fetch
  • Greatly complicates fetch/decode units
• Fixed-length instructions
  • May limit number of registers
  • Usually very few instruction formats
  • Wastes space but gains speed (e.g., only aligned fetches)
  • Limits width of immediate operands

The Fight for Bits
• How wide should instruction be?
  • Wider ⇒ can encode more registers, more options
  • Wider ⇒ bigger programs, more memory bandwidth
  • Bigger programs ⇒ fewer cache hits
• Things you need to encode:
  • Operation code (16 to 1000 instructions)
  • Operands (at least one, normally two or three)
  • Immediate operands
  • Memory offsets
  • Branch targets
  • Branch conditions
  • Conditional operations (e.g., conditional load, add)
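The tradeoff can be made concrete with a back-of-the-envelope bit budget. The helper below is a sketch with a hypothetical name. It assumes a fixed-width instruction with one opcode field and equally sized register fields, and reports how many bits remain for everything else (immediates, offsets, conditions).

```python
def bits_left(instr_bits, n_opcodes, n_regs, n_reg_operands):
    """Bits remaining after the opcode and register fields are encoded."""
    opcode_bits = (n_opcodes - 1).bit_length()   # ceil(log2(n_opcodes))
    reg_bits = (n_regs - 1).bit_length()         # ceil(log2(n_regs))
    return instr_bits - opcode_bits - n_reg_operands * reg_bits

three_op_32 = bits_left(32, 64, 32, 3)   # 32-bit word, 64 opcodes, 32 regs, 3 operands
two_op_32 = bits_left(32, 64, 32, 2)     # same, but only 2 register operands
three_op_16 = bits_left(16, 64, 32, 3)   # try to squeeze it into 16 bits
more_regs = bits_left(32, 64, 128, 3)    # quadruple the register file
```

With these assumed numbers, three operands leave 11 bits and two operands leave 16. A 16-bit instruction comes out negative (it does not fit), and going to 128 registers leaves only 5 spare bits.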

Two or Three Operands?
• In favor of three:
  • Smaller code size
  • No clobbered operands ⇒ fewer copies or reloads
  • Setting R0 to zero allows fewer operations supported in ALU
• In favor of two:
  • Can address more registers

How to Decide All These Questions?
• Slide rules at 50 paces?
• Analysis wars
  • Look at existing designs, existing programs
  • “Recompile” programs for hypothetical architecture
  • Analyze size of resulting program
  • Run through simulator to see how it performs
• Impractical approach
  • Writing compiler back ends is expensive
  • Simulators are slow
• Instead, make projections based on existing object code

Example of Bad Analysis: @-(R2)
• DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement
• What happened to indirect autopredecrement?
  • Analyzed output of BLISS compiler on many programs
  • Language didn’t provide way to express autopredecrement
  • Concluded it wasn’t necessary
• Very different result if they had analyzed C!
    *--p1 = a[--i];

Example of Difficult Analysis: imm16
• How big should an immediate be?
• Easy analysis: examine existing code
  • Calculate frequency of various widths
  • Analyze tradeoff of using those bits for other purposes
• Problem: analyzed architecture affects frequency of different widths
  • E.g., Alpha has only 16 bits, so you’ll never see over 16!
• Alternative: look for multi-instruction sequences that effectively use more than 16 bits
  • Hard to find (compiler pipeline scheduling)
  • Compiler will stand on head, use sneaky tricks to avoid generating extra instructions
• Need for wider constants depends on architecture
  • E.g., MIPS needs them when jumping to shared libraries
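One kind of "multi-instruction sequence that effectively uses more than 16 bits" is the pair that builds a 32-bit constant from two 16-bit halves, as MIPS does with LUI (load upper immediate) followed by ORI. The sketch below models only the arithmetic. The function names are hypothetical and the values are arbitrary examples.

```python
def fits_imm16_signed(value):
    """True if value fits a signed 16-bit immediate field."""
    return -32768 <= value <= 32767

def split_constant(value):
    """Split a 32-bit constant into (upper16, lower16) halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def lui_ori(upper, lower):
    """Model LUI followed by ORI: upper half shifted up, lower half OR-ed in."""
    reg = (upper << 16) & 0xFFFFFFFF   # LUI: load upper 16 bits, low bits zero
    return reg | (lower & 0xFFFF)      # ORI: zero-extended 16-bit immediate

constant = 0x12345678
needs_two = not fits_imm16_signed(constant)
upper, lower = split_constant(constant)
rebuilt = lui_ori(upper, lower)
```

An analysis tool looking for wide constants would have to recognize such pairs in object code, even after the scheduler has moved other instructions between the two halves.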


Interaction with Compilers
• Nearly all modern code generated by compilers
• Architect must make compiler’s job easier
  • Lots of registers
  • Orthogonal instruction set
  • Few side effects
  • Instructions and addressing modes matched to language constructs
    • But NOT attempt to implement them in detail!
    • Primitives are better than “solutions” even when solutions are correct
  • Good support for stack, globals, and pointers
  • Support for both compile-time and run-time binding
• Don’t ask compiler to predict dynamic information (e.g., branch targets)
• Don’t provide features language can’t express
  • Example pro and con: vector architectures

The MIPS64 Architecture
• Extension of MIPS32
• Data path widened to 64 bits
• Still 32-bit instructions
• Still only 32 registers
• Most instructions have “D” as prefix to indicate 64-bit version

MIPS Instruction Formats

I-Type Instruction
  bits:   6       5    5    16
  field:  Opcode  rs   rt   Immediate

R-Type Instruction
  bits:   6       5    5    5    5      6
  field:  Opcode  rs   rt   rd   shamt  funct

J-Type Instruction
  bits:   6       26
  field:  Opcode  Offset inserted into PC

I-Type Instructions
  bits:   6       5    5    16
  field:  Opcode  rs   rt   Immediate
• Encodes loads, stores (all widths), immediate ALU ops
• Also conditional branches (rt unused)

R-Type Instructions
  bits:   6       5    5    5    5      6
  field:  Opcode  rs   rt   rd   shamt  funct
• Register-register ALU operations
  • “funct” encodes the ALU operation: add, sub, etc.
  • Opcode chooses operands, special registers, sizes, etc.
• Conditional moves
• Handles special registers, floating point, …

J-Type Instructions
  bits:   6       26
  field:  Opcode  Offset inserted into PC
• Jump, jump and link
• Trap, return from exception
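The three formats amount to shift-and-mask packing into a 32-bit word. The sketch below uses hypothetical function names. The example opcode and funct values (0 / 0x20 for ADD, 8 for ADDI) are the classic MIPS32 encodings, used here only for illustration.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: 6 | 5 | 5 | 5 | 5 | 6 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """I-type: 6 | 5 | 5 | 16 bits; imm is stored as its low 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(opcode, offset):
    """J-type: 6 | 26 bits."""
    return (opcode << 26) | (offset & 0x3FFFFFF)

def decode_common(word):
    """Fields shared by R- and I-type: opcode, rs, rt."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F

add_word = encode_r(0, 1, 2, 3, 0, 0x20)   # ADD  R3, R1, R2
addi_word = encode_i(8, 1, 2, -1)          # ADDI R2, R1, -1
fields = decode_common(addi_word)
```

Because opcode, rs, and rt sit in the same bit positions in both R-type and I-type, the decoder can read the register file before it knows which format it is looking at.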

MIPS Control Flow
• Unconditional jump substitutes low bits of PC
  • NOT addition!
  • Exceptionally bad on 64-bit architecture, where 36 bits unchanged
• No built-in stack
  • Subroutine call stores return address in register
  • Callee must save on stack if necessary
  • Reduces overall cycle time
  • Ultra-efficient for leaf functions
• Conditional branches only test against zero
  • Complex tests (e.g., <) store Z/NZ result in a register
  • We’ve seen how this improves the pipeline
• Conditional moves can eliminate many branches
  • Feature of many modern architectures
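The difference between "substitute low bits" and "add" can be sketched as follows, assuming the classic MIPS conventions: the 26-bit jump field is shifted left by 2 and replaces the low 28 bits of the address of the following instruction, while a branch adds a sign-extended 16-bit word offset to that address. Function names and example addresses are hypothetical.

```python
def jump_target(pc, offset26, width=64):
    """J-type: replace the low 28 bits of PC+4; upper bits are unchanged."""
    mask_all = (1 << width) - 1
    next_pc = (pc + 4) & mask_all
    low = (offset26 & 0x3FFFFFF) << 2
    return (next_pc & ~0x0FFFFFFF & mask_all) | low

def branch_target(pc, imm16, width=64):
    """Conditional branch: PC+4 plus a sign-extended word offset (true addition)."""
    offset = ((imm16 & 0xFFFF) ^ 0x8000) - 0x8000   # sign-extend 16 bits
    return (pc + 4 + (offset << 2)) & ((1 << width) - 1)

pc = 0x00007FFF12345678
jumped = jump_target(pc, 0x100)
upper_bits_kept = (jumped >> 28) == ((pc + 4) >> 28)   # the 36 unchanged bits
loop_back = branch_target(0x1000, 0xFFFF)              # offset -1 word
reachable_bytes = 1 << 28                              # region a jump can reach
```

A jump can therefore only land inside the current 256 MB-aligned region, no matter how large the 64-bit address space is. Anything farther away needs a pointer-following (register) jump.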

MIPS Floating Point
• Floating point was originally coprocessor
  • Separate FP registers
  • Special instructions to move to/from integer registers
• MIPS64 (but not 32) has paired single operations
  • Two SP numbers pass through DP ALU simultaneously
• MIPS64 also has multiply-add in one instruction
  • Useful in signal processing (multimedia)

Fallacies and Pitfalls
• PITFALL: Instruction designed to support feature in some language
  • Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK
• Why is this bad?
  • Easy to get wrong (PDP-11 MARK instruction)
  • Easy to make inefficient (VAX CALLS)
  • Languages evolve, hardware doesn’t

Fallacies and Pitfalls (2)
• FALLACY: Typical programs exist
  • We wish!
• PITFALL: Ignoring the compiler
  • Designing for better code size, based on a bad compiler
  • Good compiler can blow your idea out of the water
• FALLACY: Flawed architectures can’t succeed
  • Ummm, x86?
  • Every architecture has drawbacks
• FALLACY: You (YOU!) can design a flawless architecture
  • Always tradeoffs
  • Always something new to learn

Summary
• Instruction encoding is important
• Don’t forget to provide what the compiler needs
  • This is NOT what you think the compiler needs!
• Addresses will only get wider
• Data will only get wider
  • Including characters
• Cleverness to improve bandwidth (e.g., MADD)
• RISC is here to stay
