Computer Architecture Lecture Notes, Spring 2005, Dr. Michael P. Frank. Competency Area 4: Computer Arithmetic
Introduction • In previous chapters we’ve discussed: • Performance (execution time, clock cycles, instructions, MIPS, etc.) • Abstractions: instruction set architecture, assembly language, and machine language • In this chapter: • Implementing the architecture: • How does the hardware really add, subtract, multiply, and divide? • Signed and unsigned representations • Constructing an ALU (arithmetic logic unit)
Introduction • Humans naturally represent numbers in base 10; computers, however, work in base 2. • Example: the 8-bit pattern (1111 1111)two represents -1 under a signed (two’s complement) interpretation but 255 under an unsigned interpretation. • Note: signed representations include sign-magnitude, one’s complement, and two’s complement.
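A minimal Python sketch of the example above: the same bit string is read once as unsigned and once as two’s complement. The helper names `as_unsigned` and `as_twos_complement` are illustrative only, not from the notes.

```python
def as_unsigned(bits: str) -> int:
    """Interpret a bit string such as '11111111' as an unsigned integer."""
    return int(bits, 2)


def as_twos_complement(bits: str) -> int:
    """Interpret the same bit string as an n-bit two's complement integer."""
    n = len(bits)
    value = int(bits, 2)
    # A leading 1 (the sign bit) means the pattern stands for value - 2^n.
    return value - (1 << n) if bits[0] == "1" else value


pattern = "11111111"
print(as_unsigned(pattern))         # 255
print(as_twos_complement(pattern))  # -1
```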
Possible Representations
Bits   Sign Magnitude   One’s Complement   Two’s Complement
000    +0               +0                 +0
001    +1               +1                 +1
010    +2               +2                 +2
011    +3               +3                 +3
100    -0               -3                 -4
101    -1               -2                 -3
110    -2               -1                 -2
111    -3               -0                 -1
• Sign Magnitude (first bit is the sign bit, the other bits are the magnitude) • Two’s Complement (negation: invert all bits and add 1) • One’s Complement (first bit is the sign bit; for negative numbers, invert the other bits to get the magnitude) • NOTE: Computers today use two’s complement binary representations for signed numbers.
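The three decoding rules can be checked by regenerating the 3-bit table. This is a sketch; the function names are hypothetical. Note that Python integers have no negative zero, so the two "-0" entries print as 0.

```python
def sign_magnitude(bits: str) -> int:
    """First bit is the sign; the remaining bits are the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def ones_complement(bits: str) -> int:
    """Negative patterns: inverting every bit recovers the magnitude."""
    n = len(bits)
    value = int(bits, 2)
    return -(((1 << n) - 1) ^ value) if bits[0] == "1" else value


def twos_complement(bits: str) -> int:
    """Negative patterns stand for value - 2^n."""
    n = len(bits)
    value = int(bits, 2)
    return value - (1 << n) if bits[0] == "1" else value


# Rebuild the table row by row: bits, sign magnitude, one's, two's complement.
for v in range(8):
    bits = format(v, "03b")
    print(bits, sign_magnitude(bits), ones_complement(bits), twos_complement(bits))
```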
Two’s Complement Representations • 32-bit signed numbers (MIPS):
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
• The hardware need only test the first bit to determine the sign.
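A short sketch of the 32-bit range and of the sign test that looks only at the most significant bit. Words are modelled as non-negative Python integers holding the raw bit pattern; `twos_complement_range` and `is_negative` are illustrative names.

```python
def twos_complement_range(n: int):
    """Return (minint, maxint) for n-bit two's complement integers."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


def is_negative(word: int, n: int = 32) -> bool:
    """Sign test as the hardware does it: look only at the most significant bit."""
    return ((word >> (n - 1)) & 1) == 1


minint, maxint = twos_complement_range(32)
print(minint, maxint)           # -2147483648 2147483647
print(is_negative(0xFFFFFFFD))  # True  (1111 ... 1101 is -3)
print(is_negative(0x7FFFFFFF))  # False (maxint)
```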
Two’s Complement Operations • Negating a two’s complement number: • invert all bits and add 1 • or, preserve the rightmost 1 and the 0’s to its right, and flip all bits to the left of the rightmost 1 • Converting n-bit numbers into m-bit numbers with m > n: • Example: convert 4-bit signed numbers into 8-bit numbers: 0010 → 0000 0010 (+2ten); 1010 → 1111 1010 (-6ten) • "Sign extension" is used: the most significant (sign) bit is copied into the new high-order (left) bits of the wider word. For unsigned numbers, the new leftmost bits are filled with 0’s. • Example MIPS instructions that depend on this distinction: lb/lbu (sign- vs. zero-extended byte load), slt/sltu (signed vs. unsigned comparison).
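Both negation rules and sign extension can be sketched on n-bit patterns stored in Python integers. The function names are hypothetical; `word & -word` relies on Python’s two’s complement semantics for bitwise operations on negative integers to isolate the rightmost 1.

```python
def negate(word: int, n: int) -> int:
    """Invert all n bits, then add 1 (result kept to n bits)."""
    mask = (1 << n) - 1
    return ((word ^ mask) + 1) & mask


def negate_shortcut(word: int, n: int) -> int:
    """Keep the rightmost 1 and the 0s to its right; flip every bit to its left."""
    mask = (1 << n) - 1
    word &= mask
    if word == 0:
        return 0
    rightmost_one = word & -word        # isolates the rightmost 1 bit
    keep = (rightmost_one << 1) - 1     # that bit plus all bits to its right
    return word ^ (mask & ~keep)        # flip only the bits to the left


def sign_extend(word: int, n: int, m: int) -> int:
    """Widen an n-bit two's complement pattern to m bits (m > n) by copying the sign bit."""
    if (word >> (n - 1)) & 1:
        word |= ((1 << m) - 1) ^ ((1 << n) - 1)   # fill the new high-order bits with 1s
    return word


print(format(sign_extend(0b0010, 4, 8), "08b"))  # 00000010  (+2)
print(format(sign_extend(0b1010, 4, 8), "08b"))  # 11111010  (-6)
print(format(0b1010, "08b"))                     # 00001010  zero extension, as for unsigned
```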
Addition and Subtraction • Just like in grade school (carry/borrow 1s): 0111 + 0110, 0111 - 0110, 0110 - 0101 • Two’s complement operations are easy: • subtraction uses addition of the negative number, e.g., 0111 - 0110 = 0111 + 1010 (7 - 6 = 7 + (-6)) • Overflow (result too large for the finite computer word): • e.g., the sum of two n-bit numbers may not fit in n bits: 0111 + 0001 = 1000 (in 4-bit two’s complement, +7 + 1 gives -8) • Note that the term "overflow" is somewhat misleading: it does not mean a carry "overflowed".
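A sketch of an n-bit adder that reports carry-out and overflow separately, which makes the slide’s last point concrete: 7 - 6 produces a carry-out with no overflow, while 0111 + 0001 overflows with no carry-out. The overflow rule used here (carry into the sign bit differs from carry out of it) is a standard hardware formulation, not taken from the notes; function names are hypothetical.

```python
def add_n(a: int, b: int, n: int = 4, carry_in: int = 0):
    """n-bit adder: returns (sum, carry_out, overflow) for two's complement operands."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask) + carry_in
    result = total & mask
    carry_out = total >> n                      # carry out of the most significant bit
    low = (1 << (n - 1)) - 1
    carry_into_msb = ((a & low) + (b & low) + carry_in) >> (n - 1)
    # Overflow means the sign bit is wrong: carry into the MSB differs from carry out of it.
    overflow = carry_into_msb != carry_out
    return result, carry_out, overflow


def sub_n(a: int, b: int, n: int = 4):
    """a - b computed as a + (~b) + 1, i.e. by adding the negation of b."""
    mask = (1 << n) - 1
    return add_n(a, ~b & mask, n, carry_in=1)


print(sub_n(0b0111, 0b0110))   # (1, 1, False): 7 - 6 = 1; the carry out is discarded
print(add_n(0b0111, 0b0001))   # (8, 0, True): 0111 + 0001 = 1000 overflows
```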
32-bit ALU with Zero Detect • Recall that the ALU control lines select these functions: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt • We’ve learned how to build each of these functions in hardware.
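A behavioural (not gate-level) sketch of this ALU, assuming 32-bit operands held as Python integers. The control codes follow the slide; `alu32` and `to_signed32` are hypothetical names, and slt is modelled as a signed comparison rather than by the subtract-and-test-sign circuit.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Read a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu32(control: int, a: int, b: int):
    """Return (result, zero) for the 3-bit ALU control codes listed above."""
    a &= MASK32
    b &= MASK32
    if control == 0b000:      # and
        result = a & b
    elif control == 0b001:    # or
        result = a | b
    elif control == 0b010:    # add
        result = (a + b) & MASK32
    elif control == 0b110:    # subtract, as a + ~b + 1
        result = (a + (~b & MASK32) + 1) & MASK32
    elif control == 0b111:    # slt: 1 if a < b as signed integers
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    else:
        raise ValueError(f"unsupported ALU control {control:03b}")
    zero = result == 0        # Zero output: true when every result bit is 0
    return result, zero
```

The Zero output is what a branch-on-equal would test after a subtract.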
So far… • We’ve studied how to implement a 1-bit ALU in hardware that supports the MIPS instruction set: • key idea: use a multiplexor to select the desired output function • we can efficiently perform subtraction using two’s complement • we can replicate a 1-bit ALU to produce a 32-bit ALU • Important issues about hardware: • all of the gates are always working • the speed of a gate is affected by the number of inputs to the gate • the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic") • Changes in hardware organization can improve performance • we’ll look at examples for addition (carry-lookahead adder), multiplication, and division
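Better adder design • For adder design: • Problem: the ripple-carry adder is slow because the carry-in/carry-out bits are evaluated sequentially, each bit waiting for the carry from the bit below it. • Consider the carry-in inputs, where $c_i$ is the carry into bit $i$: $c_{i+1} = (b_i \cdot c_i) + (a_i \cdot c_i) + (a_i \cdot b_i)$ • Using substitution, we can see the "ripple" effect: $c_2 = (a_1 \cdot b_1) + (a_1 + b_1) \cdot \big((a_0 \cdot b_0) + (a_0 \cdot c_0) + (b_0 \cdot c_0)\big)$, and each higher carry grows in the same way.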
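A bit-level sketch of the ripple-carry adder, using the carry equation above for each full adder. The loop makes the sequential dependency explicit: bit i cannot finish until the carry from bit i-1 is known. Names are illustrative.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def ripple_carry_add(a: int, b: int, n: int = 4, c0: int = 0):
    """Add two n-bit numbers one bit at a time; returns (n-bit sum, carry out)."""
    carry = c0
    total = 0
    for i in range(n):   # each iteration waits for the carry from the previous one
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```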
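Carry-Lookahead Adder • Faster carry schemes exist that improve the speed of adders in hardware; the one studied here is the carry-lookahead adder, which factors the carry equations into simpler terms. • Let $c_i$ represent the $i$th carry-in bit; then $c_{i+1} = (a_i \cdot b_i) + (a_i + b_i) \cdot c_i$ • We can now define the terms generate and propagate: $g_i = a_i \cdot b_i$, $p_i = a_i + b_i$ • Then $c_{i+1} = g_i + p_i \cdot c_i$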
Carry-Lookahead Adder • Suppose $g_i$ is 1. The adder generates a carry-out independent of the value of the carry-in, i.e., $c_{i+1} = 1 + p_i \cdot c_i = 1$ • Now suppose $g_i$ is 0 and $p_i$ is 1: then $c_{i+1} = 0 + 1 \cdot c_i = c_i$, so the adder propagates the carry-in to the carry-out. • In summary, the carry-out is 1 if either $g_i$ is 1 or both $p_i$ and the carry-in are 1. • This new approach creates the first level of abstraction.
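A sketch of a first-level carry-lookahead adder. Each carry is computed from the fully expanded form $c_{i+1} = g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + \dots + p_i \cdots p_0 c_0$, so it depends only on the g, p signals and $c_0$, not on the previous carry. In hardware these sum-of-products terms are evaluated in parallel; the Python loop simply evaluates them one after another. Function names are hypothetical.

```python
def bits_of(x: int, n: int):
    """Little-endian list of the n low bits of x (index 0 = least significant)."""
    return [(x >> i) & 1 for i in range(n)]


def cla_add(a: int, b: int, n: int = 4, c0: int = 0):
    av, bv = bits_of(a, n), bits_of(b, n)
    g = [ai & bi for ai, bi in zip(av, bv)]   # generate:  g_i = a_i . b_i
    p = [ai | bi for ai, bi in zip(av, bv)]   # propagate: p_i = a_i + b_i
    c = [c0]
    for i in range(n):
        # Expanded lookahead term for c_{i+1}: uses only g, p and c0.
        carry, chain = 0, 1
        for j in range(i, -1, -1):
            carry |= chain & g[j]             # p_i ... p_{j+1} g_j
            chain &= p[j]
        carry |= chain & c0                   # p_i ... p_0 c_0
        c.append(carry)
    s = sum((av[i] ^ bv[i] ^ c[i]) << i for i in range(n))
    return s, c[n]
```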
Carry-Lookahead Adder • Sometimes the first level of abstraction produces large equations, so it is beneficial to look at a second level of abstraction. It is produced by treating a 16-bit adder as four 4-bit adders, each of which propagates and generates signals at a higher level. • Each 4-bit group has a "super" propagate signal $P_i$ and a "super" generate signal $G_i$: $P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0$, $P_1 = p_7 \cdot p_6 \cdot p_5 \cdot p_4$, $P_2 = p_{11} \cdot p_{10} \cdot p_9 \cdot p_8$, $P_3 = p_{15} \cdot p_{14} \cdot p_{13} \cdot p_{12}$ • So $P_i$ is true only if each of the bits in the group propagates a carry.
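Carry-Lookahead Adder • For the "super" generate signals, what matters is only whether there is a carry out of the most significant bit of the group: $G_0 = g_3 + (p_3 \cdot g_2) + (p_3 \cdot p_2 \cdot g_1) + (p_3 \cdot p_2 \cdot p_1 \cdot g_0)$, and similarly for $G_1$, $G_2$, $G_3$ using bits 4 to 7, 8 to 11, and 12 to 15. • Now we can represent the carry-out signals for the 16-bit adder with two levels of abstraction as: $C_1 = G_0 + (P_0 \cdot c_0)$, $C_2 = G_1 + (P_1 \cdot G_0) + (P_1 \cdot P_0 \cdot c_0)$, $C_3 = G_2 + (P_2 \cdot G_1) + (P_2 \cdot P_1 \cdot G_0) + (P_2 \cdot P_1 \cdot P_0 \cdot c_0)$, $C_4 = G_3 + (P_3 \cdot G_2) + (P_3 \cdot P_2 \cdot G_1) + (P_3 \cdot P_2 \cdot P_1 \cdot G_0) + (P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot c_0)$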
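A sketch of the two-level 16-bit adder: super signals P and G per 4-bit group, group carries C1 to C4 from the second-level equations, then sums inside each group. For brevity the bits inside a group use the first-level recurrence $c_{i+1} = g_i + p_i c_i$; a full design would expand those as well. All names are hypothetical.

```python
def bits_of(x: int, n: int):
    """Little-endian list of the n low bits of x (index 0 = least significant)."""
    return [(x >> i) & 1 for i in range(n)]


def group_pg(g, p):
    """Super propagate P and super generate G for one 4-bit group (index 0 = LSB)."""
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def add16_two_level(a: int, b: int, c0: int = 0):
    av, bv = bits_of(a, 16), bits_of(b, 16)
    g = [x & y for x, y in zip(av, bv)]
    p = [x | y for x, y in zip(av, bv)]
    P, G = zip(*(group_pg(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]) for k in range(4)))
    # Second level: carry into each 4-bit group, from P, G and c0 only.
    C = [
        c0,
        G[0] | (P[0] & c0),
        G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0),
        G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0),
        G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
        | (P[3] & P[2] & P[1] & P[0] & c0),
    ]
    # First level: bit carries inside each group, starting from the group carry-in.
    s = 0
    for k in range(4):
        c = C[k]
        for i in range(4 * k, 4 * k + 4):
            s |= (av[i] ^ bv[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return s, C[4]
```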
2nd Level of Abstraction: Carry-Lookahead Adder Design
Figure: O(log n)-time carry-skip adder (8-bit segment shown; the 1st through 4th "carry ticks" mark successive carry stages). With this structure, we can do a 2^n-bit add in 2(n+1) logic stages. Hardware overhead is < 2× regular ripple-carry.
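Multiplication Algorithms • Recall that multiplication is accomplished via shifting and addition. • Example: 0010 (multiplicand) × 0110 (multiplier): multiplying by the LSB of the multiplier (0) gives 0000; multiplier bit 1 is 1, so add the multiplicand shifted left 1 bit, giving the intermediate product 00100; multiplier bit 2 is 1, so add the multiplicand shifted left 2 bits; bit 3 is 0. The product is 0001100 (2 × 6 = 12).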
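The same shift-and-add idea as a minimal unsigned sketch (the name `shift_and_add` is illustrative): each 1 bit of the multiplier contributes the multiplicand shifted left by that bit’s position.

```python
def shift_and_add(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned multiply by summing shifted copies of the multiplicand."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            # Multiplier bit i is 1: add the multiplicand shifted left i bits.
            product += multiplicand << i
    return product


print(format(shift_and_add(0b0010, 0b0110), "07b"))  # 0001100 (2 x 6 = 12)
```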
Multiplication Algorithm 1 Hardware implementation of Algorithm 1:
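Multiplication Algorithm 1 For each bit: 1. Test the least significant bit of the multiplier (Multiplier0). 1a. If it is 1, add the multiplicand to the product and place the result in the Product register. 2. Shift the Multiplicand register left 1 bit. 3. Shift the Multiplier register right 1 bit. Repeat until every multiplier bit (32 for MIPS) has been processed.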
Multiplication Algorithm 1 Example: (4-bit)
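A register-level sketch of Algorithm 1 for unsigned n-bit operands, assuming a 2n-bit multiplicand register (left half initially 0), a 2n-bit adder and an n-bit multiplier register. `multiply_v1` and its `trace` flag are hypothetical; the trace prints the registers after each step, in the spirit of a 4-bit worked example.

```python
def multiply_v1(multiplicand: int, multiplier: int, n: int = 4, trace: bool = False) -> int:
    """Unsigned n-bit x n-bit multiply with a 2n-bit multiplicand register and 2n-bit adds."""
    mask_2n = (1 << (2 * n)) - 1
    mcand = multiplicand & ((1 << n) - 1)   # left half of the 2n-bit register starts as 0
    mplier = multiplier & ((1 << n) - 1)
    product = 0
    for step in range(1, n + 1):
        if mplier & 1:                               # 1. test Multiplier0
            product = (product + mcand) & mask_2n    # 1a. 2n-bit add into Product
        mcand = (mcand << 1) & mask_2n               # 2. shift Multiplicand left 1 bit
        mplier >>= 1                                 # 3. shift Multiplier right 1 bit
        if trace:
            print(f"step {step}: multiplier={mplier:0{n}b} "
                  f"multiplicand={mcand:0{2 * n}b} product={product:0{2 * n}b}")
    return product


multiply_v1(0b0010, 0b0011, trace=True)   # 2 x 3 = 6
```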
Multiplication Algorithms • For Algorithm 1 we initialize the left half of the multiplicand to 0 to accommodate its left shifts. All adds are 64 bits wide. This is wasteful and slow. • Algorithm 2: instead of shifting the multiplicand left, shift the product register to the right, which halves the widths of the ALU and the multiplicand register.
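Multiplication Algorithm 2 For each bit: 1. Test Multiplier0. 1a. If it is 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register. 2. Shift the Product register right 1 bit. 3. Shift the Multiplier register right 1 bit. Repeat until every multiplier bit has been processed.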
Multiplication Algorithm 2 Example: (4-bit)
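A sketch of Algorithm 2 for unsigned operands: the multiplicand register stays n bits wide and is never shifted, the n-bit add goes into the left half of the product, and the whole product shifts right. The carry out of the n-bit add is kept and shifted into the product, as the hardware must do; `multiply_v2` is a hypothetical name.

```python
def multiply_v2(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned multiply: n-bit ALU adds into the left half of a right-shifting product."""
    mask_n = (1 << n) - 1
    mcand = multiplicand & mask_n        # n-bit register, never shifted
    mplier = multiplier & mask_n
    product = 0                          # 2n-bit Product register
    for _ in range(n):
        if mplier & 1:                                   # 1. test Multiplier0
            left = (product >> n) + mcand                # 1a. n-bit add (may carry into bit n)
            product = (left << n) | (product & mask_n)   # result back into the left half
        product >>= 1                    # 2. shift Product right (carry moves into the top bit)
        mplier >>= 1                     # 3. shift Multiplier right
    return product
```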
Multiplication Algorithm 3 • The third multiplication algorithm combines the right half of the product with the multiplier. • This reduces the number of steps to implement the multiply, and it also saves space. • Hardware implementation of Algorithm 3:
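Multiplication Algorithm 3 For each bit: 1. Test the least significant bit of the Product register (Product0), which now holds the next multiplier bit. 1a. If it is 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register. 2. Shift the Product register right 1 bit. Repeat until every multiplier bit has been processed.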
Multiplication Algorithm 3 Example: (4-bit)
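A sketch of Algorithm 3 for unsigned operands: the multiplier starts in the right half of the Product register, so each right shift both moves the partial product and exposes the next multiplier bit in Product0. The name `multiply_v3` is hypothetical.

```python
def multiply_v3(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned multiply with the multiplier stored in the right half of Product."""
    mask_n = (1 << n) - 1
    mcand = multiplicand & mask_n
    product = multiplier & mask_n        # multiplier occupies the right half initially
    for _ in range(n):
        if product & 1:                  # 1. test Product0 (the current multiplier bit)
            product += mcand << n        # 1a. add to the left half (carry kept in bit 2n)
        product >>= 1                    # 2. shift Product right, consuming one multiplier bit
    return product
```

Compared with Algorithm 2, the separate Multiplier register and its shift step disappear, which is the saving the previous slide describes.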
Division Algorithms • Example: long division of the dividend by the divisor, producing the quotient (worked figure not reproduced). • Hardware implementations are similar to multiplication algorithms: • Algorithm 1 implements the conventional division method • Algorithm 2 reduces the divisor register and ALU by half • Algorithm 3 eliminates the quotient register completely
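A sketch of the conventional method (Algorithm 1) as restoring division, assuming unsigned operands: subtract the divisor from the remainder; if the result is negative, add the divisor back and record a 0 quotient bit, otherwise record a 1; then shift the divisor right. The divisor starts in the left half of a 2n-bit register, so n + 1 steps are needed. `restoring_divide` is a hypothetical name.

```python
def restoring_divide(dividend: int, divisor: int, n: int = 4):
    """Unsigned n-bit restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    remainder = dividend
    div = divisor << n                  # divisor starts in the left half of a 2n-bit register
    quotient = 0
    for _ in range(n + 1):
        remainder -= div                # 1. trial subtraction
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 2a. it fit: quotient bit 1
        else:
            remainder += div                 # 2b. restore the remainder; quotient bit 0
            quotient <<= 1
        div >>= 1                       # 3. shift the divisor right 1 bit
    return quotient, remainder


print(restoring_divide(0b0111, 0b0010))   # (3, 1): 7 / 2 = 3 remainder 1
```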
Floating Point Numbers • Layout: SIGN | EXPONENT | SIGNIFICAND • We need a way to represent: • numbers with fractions, e.g., 3.1416 • very small numbers, e.g., .000000001 • very large numbers, e.g., 3.15576 × 10^9 • Representation: • sign, exponent, significand: (–1)^sign × significand × 2^exponent • more bits for the significand give more accuracy • more bits for the exponent increase dynamic range • IEEE 754 floating point standard: • For single precision: 8-bit exponent, 23-bit significand, 1-bit sign • For double precision: 11-bit exponent, 52-bit significand, 1-bit sign
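The field widths can be checked by splitting a Python float into its sign, exponent and significand (fraction) fields. This sketch uses the standard `struct` module to obtain the raw IEEE 754 bits; `float_fields` is an illustrative name.

```python
import struct


def float_fields(x: float, double: bool = False):
    """Split x into (sign, exponent field, fraction field) using the IEEE 754 layouts."""
    if double:
        bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # 64-bit pattern
        exp_bits, frac_bits = 11, 52
    else:
        bits = struct.unpack(">I", struct.pack(">f", x))[0]   # 32-bit pattern
        exp_bits, frac_bits = 8, 23
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


print(float_fields(3.15576e9))          # single-precision fields of 3.15576 x 10^9
print(float_fields(3.15576e9, True))    # double-precision fields
```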
IEEE 754 floating-point standard • Leading "1" bit of the significand is implicit • Exponent is "biased" to make sorting easier • all 0s is the smallest exponent, all 1s the largest • bias of 127 for single precision and 1023 for double precision • Summary: (–1)^sign × (1 + fraction) × 2^(exponent – bias) • Example: −0.75ten = −1.1two × 2^−1 • Single precision: (−1)^1 × (1 + .1000…two) × 2^(126−127) • 1|01111110|10000000000000000000000 • Double precision: (−1)^1 × (1 + .1000…two) × 2^(1022−1023) • 1|01111111110|10000000000000000000 followed by 32 more 0s (52 fraction bits in all)
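A sketch that reproduces the −0.75 example: it prints the stored bit patterns and applies the summary formula to a normalized single-precision pattern. It uses the standard `struct` module; the helper names are hypothetical, and `decode_normalized_single` deliberately ignores the reserved exponent fields (all 0s or all 1s).

```python
import struct


def single_bits(x: float) -> str:
    """Return the single-precision pattern of x as 'sign|exponent|fraction'."""
    s = format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")
    return f"{s[0]}|{s[1:9]}|{s[9:]}"


def double_bits(x: float) -> str:
    """Return the double-precision pattern of x as 'sign|exponent|fraction'."""
    s = format(struct.unpack(">Q", struct.pack(">d", x))[0], "064b")
    return f"{s[0]}|{s[1:12]}|{s[12:]}"


def decode_normalized_single(pattern: str) -> float:
    """Apply (-1)^sign * (1 + fraction) * 2^(exponent - 127) to a normalized pattern."""
    sign_s, exp_s, frac_s = pattern.split("|")
    fraction = int(frac_s, 2) / (1 << 23)
    return (-1) ** int(sign_s) * (1 + fraction) * 2.0 ** (int(exp_s, 2) - 127)


print(single_bits(-0.75))                            # 1|01111110|10000000000000000000000
print(double_bits(-0.75))                            # 1|01111111110|1 followed by 51 zeros
print(decode_normalized_single(single_bits(-0.75)))  # -0.75
```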
FP Addition Algorithm • The significand of the number with the smaller exponent must be shifted right before adding, so that the "binary points" align. • After adding, the sum must be normalized. • Then it is rounded, and possibly renormalized. • Possible errors include: • Overflow (exponent too big) • Underflow (exponent too small)
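A sketch of these steps on a hypothetical toy format (not IEEE 754): a 4-bit significand with its leading 1 stored explicitly, and a value of sig × 2^(exp − 3). It handles positive operands only, so of the two error cases only overflow can arise. Instead of modelling guard/sticky bits, it keeps every bit shifted out during alignment and rounds once, to nearest with ties to even. All names and constants are assumptions.

```python
P = 4        # hypothetical toy precision: significand bits, including the leading 1
EMAX = 7     # hypothetical largest exponent in the toy format


def round_half_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits of value, rounding to nearest with ties to even."""
    if drop <= 0:
        return value << -drop
    q = value >> drop
    r = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q


def fp_add(a, b):
    """Add two positive toy floats (exp, sig), value = sig * 2**(exp - (P - 1)),
    where 2**(P - 1) <= sig < 2**P (normalized)."""
    (ea, sa), (eb, sb) = a, b
    if ea < eb:                          # let a be the operand with the larger exponent
        (ea, sa), (eb, sb) = (eb, sb), (ea, sa)
    shift = ea - eb
    # 1. Align: b sits `shift` places to the right of a. Giving a `shift` extra
    #    fraction bits keeps every bit of b (a stand-in for guard/sticky bits).
    total = (sa << shift) + sb           # 2. add; value = total * 2**(eb - (P - 1))
    drop = total.bit_length() - P        # 3. normalize to P significant bits
    exp = eb + drop
    sig = round_half_even(total, drop)   # 4. round
    if sig >> P:                         # 5. rounding carried out: renormalize
        sig >>= 1
        exp += 1
    if exp > EMAX:                       # 6. exponent too big
        raise OverflowError("exponent overflow")
    return exp, sig


def to_float(x) -> float:
    exp, sig = x
    return sig * 2.0 ** (exp - (P - 1))


print(to_float(fp_add((0, 0b1100), (-1, 0b1000))))   # 1.5 + 0.5 = 2.0
```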
Floating-Point Addition Hardware • Implements the algorithm from the previous slide. • Note the high complexity compared with integer addition hardware.
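FP Multiplication Algorithm • Add the exponents, adjusting for the bias (the sum of two biased exponents contains the bias twice, so subtract it once). • Multiply the significands. • Normalize, • check for overflow/underflow, • then round. • Renormalize if rounding requires it. • Compute the sign.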
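A sketch of the multiplication steps on a hypothetical toy format: a 4-bit significand with explicit leading 1, a biased exponent with bias 7, and biased values 0 and 15 treated as reserved in the way IEEE 754 reserves all 0s and all 1s. Rounding is to nearest with ties to even. The format, constants, and names are assumptions for illustration only.

```python
P = 4                      # hypothetical toy precision: significand bits incl. leading 1
BIAS = 7                   # hypothetical bias for a 4-bit exponent field
EXP_MIN, EXP_MAX = 1, 14   # biased exponents 0 and 15 reserved, mirroring IEEE 754


def round_half_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits of value, rounding to nearest with ties to even."""
    q = value >> drop
    r = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q


def check_range(exp: int) -> None:
    if exp > EXP_MAX:
        raise OverflowError("exponent overflow")
    if exp < EXP_MIN:
        raise ArithmeticError("exponent underflow")


def fp_mul(a, b):
    """Multiply toy floats (sign, biased_exp, sig) with 2**(P-1) <= sig < 2**P."""
    (s1, e1, m1), (s2, e2, m2) = a, b
    exp = e1 + e2 - BIAS                  # 1. add exponents; remove the extra bias
    prod = m1 * m2                        # 2. multiply significands: a value in [1, 4)
    extra = prod.bit_length() - (2 * P - 1)   # 1 if the product is >= 2.0, else 0
    exp += extra                          # 3. normalize
    check_range(exp)                      # 4. check for overflow/underflow
    sig = round_half_even(prod, P - 1 + extra)   # 5. round to P bits
    if sig >> P:                          # rounding carried out: renormalize, re-check
        sig >>= 1
        exp += 1
        check_range(exp)
    sign = s1 ^ s2                        # 6. compute the sign
    return sign, exp, sig


def to_float(x) -> float:
    s, e, m = x
    return (-1) ** s * m / 2 ** (P - 1) * 2.0 ** (e - BIAS)


print(to_float(fp_mul((0, 7, 0b1100), (0, 7, 0b1100))))   # 1.5 x 1.5 = 2.25
```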
Ethics Addendum: Intel Pentium FP bug • In July 1994, Intel discovered there was a bug in the Pentium’s FP division hardware… • But it decided not to announce the bug, and to go ahead and ship chips with the flaw anyway, to save time and money. • Based on their analysis, they thought errors could arise only rarely. • Even after the bug was discovered by users, Intel initially refused to replace the bad chips on request! • They got a lot of bad PR from this… • Lesson: Good, ethical engineers fix problems when they first find them, and don’t cover them up!
Summary • Computer arithmetic is constrained by limited precision. • Bit patterns have no inherent meaning, but standards do exist: • two’s complement • IEEE 754 floating point • Computer instructions determine the "meaning" of the bit patterns. • Performance and accuracy are important, so there are many complexities in real machines (in both algorithms and implementation). • Please read the remainder of the chapter on your own. For exams, however, you will be responsible only for the material covered in the lectures.