
Chapter 6: Computer Arithmetic and the ALU



  1. Chapter 6: Computer Arithmetic and the ALU Topics 6.1 Number Systems and Radix Conversion 6.2 Fixed Point Arithmetic 6.3 Seminumeric Aspects of ALU Design 6.4 Floating-Point Arithmetic

  2. Number Systems • Number systems consist of a base or radix and a set of symbols: • decimal notation (0d1234): base 10, symbols 0-9, values obvious • binary notation (0b1010): base 2, symbols 0, 1, values obvious • hexadecimal notation (0x10EF): base 16, symbols 0-9, A, B, C, D, E, F, values 0-9 = 0-9, A = 10, B = 11, C = 12, D = 13, E = 14, F = 15 • octal notation (0c077): base 8, symbols 0-7, values obvious

  3. Number Representation • An m digit, base b number is written as a string of m digits: $x = x_{m-1} x_{m-2} \ldots x_1 x_0$, where each digit $x_i$ is in the range $0 \le x_i \le b-1$ • So: base 2: $0 \le x_i \le 1$, $x_i$ = 0, 1; base 8: $0 \le x_i \le 7$, $x_i$ = 0-7; base 10: $0 \le x_i \le 9$, $x_i$ = 0-9; base 16: $0 \le x_i \le 15$, $x_i$ = 0-9, A-F • The value of the ith digit is: $\text{value}(x_i) = x_i \times b^i$ • The value of x is: $x = \sum_{i=0}^{m-1} x_i \times b^i$

  4. Number Representation Examples • $327_{10} = 3 \times 10^2 + 2 \times 10^1 + 7 \times 10^0$ • $1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 11_{10}$ • $\text{ED7}_{16} = 14(\text{E}) \times 16^2 + 13(\text{D}) \times 16^1 + 7 \times 16^0 = 3584 + 208 + 7 = 3799_{10}$ • $3755_8 = 3 \times 8^3 + 7 \times 8^2 + 5 \times 8^1 + 5 \times 8^0 = 1536 + 448 + 40 + 5 = 2029_{10}$
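The positional formula $x = \sum x_i b^i$ can be checked mechanically. Below is a minimal Python sketch. It assumes the digits are given as a string of the symbols 0-9 and A-F, and the name `positional_value` is ours, not from the slides.

```python
def positional_value(digits: str, base: int) -> int:
    """Value of a base-b digit string: sum of x_i * b**i, with i = 0 at the rightmost digit."""
    symbols = "0123456789ABCDEF"
    total = 0
    for i, ch in enumerate(reversed(digits.upper())):
        x_i = symbols.index(ch)             # symbol -> digit value (A = 10, ..., F = 15)
        if x_i >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += x_i * base ** i            # value(x_i) = x_i * b^i
    return total


print(positional_value("327", 10))   # 327
print(positional_value("1011", 2))   # 11
print(positional_value("ED7", 16))   # 3799
print(positional_value("3755", 8))   # 2029
```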

  5. Number Representation • For a base b number of m digits, the maximum number that can be represented is: $x_{max} = b^m - 1$ • Examples: 4 digit base 10: $x_{max} = 10^4 - 1 = 9999_{10}$; 4 digit base 2: $x_{max} = 2^4 - 1 = 1111_2 = 15_{10}$; 3 digit base 8: $x_{max} = 8^3 - 1 = 777_8 = 511_{10}$

  6. Fractions • Fractions are numbers with a radix point: $x_{n-1} x_{n-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-m}$ • A number in a fixed-length computer register with its radix point assumed to be in a fixed position (even all the way to the right) is called a fixed point number • Each digit to the right of the radix point has a value of: $\text{value}(x_i) = x_i \times b^i$, but now i goes from -1 to -m • Examples: $43.5_{10} = 4 \times 10^1 + 3 \times 10^0 + 5 \times 10^{-1}$; $1101.1010_2 = 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-3} = 13.625_{10}$
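The same digit-value rule extends to negative powers for fixed point numbers. This sketch uses Python's `fractions.Fraction` so that values such as $13.625_{10}$ come out exactly. The helper name `fixed_point_value` is hypothetical.

```python
from fractions import Fraction


def fixed_point_value(text: str, base: int) -> Fraction:
    """Exact value of a base-b numeral with a radix point, e.g. '1101.1010' in base 2."""
    symbols = "0123456789ABCDEF"
    int_part, _, frac_part = text.upper().partition(".")
    value = Fraction(0)
    # Digits left of the radix point: x_i * b**i for i = n-1 ... 0
    for i, ch in enumerate(reversed(int_part)):
        value += symbols.index(ch) * Fraction(base) ** i
    # Digits right of the radix point: x_i * b**i for i = -1 ... -m
    for k, ch in enumerate(frac_part, start=1):
        value += symbols.index(ch) * Fraction(base) ** -k
    return value


print(float(fixed_point_value("1101.1010", 2)))  # 13.625
print(float(fixed_point_value("43.5", 10)))      # 43.5
```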

  7. Radix Conversion • Converting from base b to the calculator’s base c, e.g., converting numbers into base 10: 1) Start with the base b number $x = x_{m-1} x_{m-2} \ldots x_1 x_0$ 2) Initialize the base c value y = 0 3) Left to right, get the next digit (symbol) $x_i$ 4) Convert the base b symbol $x_i$ to a base c number $y_i$ (by means of a table) 5) Update the base c value by: $y = y \times b + y_i$ 6) If there are more digits, repeat from step 3 • Example: convert $\text{AC2}_{16}$ to base 10: y = 0; y = 0 + A(=10) = 10; y = 10 × 16 + C(=12) = 172; y = 172 × 16 + 2 = $2754_{10}$ • Example: convert $753_8$ to base 10: y = 0; y = 0 + 7 = 7; y = 7 × 8 + 5 = 61; y = 61 × 8 + 3 = $491_{10}$
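Steps 1-6 are Horner's rule. This Python sketch assumes the digits arrive as a string and lets Python's integers play the role of the calculator's base c. The name `to_int_horner` is ours.

```python
def to_int_horner(digits: str, base: int) -> int:
    """Convert a base-b digit string to an integer with y = y*b + y_i, scanning left to right."""
    table = {ch: v for v, ch in enumerate("0123456789ABCDEF")}  # step 4: symbol -> number
    y = 0                                    # step 2: y = 0
    for ch in digits.upper():                # step 3: next digit, left to right
        y = y * base + table[ch]             # step 5: y = y*b + y_i
    return y


print(to_int_horner("AC2", 16))  # 2754
print(to_int_horner("753", 8))   # 491
```

Python's built-in `int("AC2", 16)` gives the same result. The loop just makes the slide's update rule explicit.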

  8. Radix Conversion • Converting from the calculator’s base c to base b, e.g., converting numbers from base 10 to another base: 1) Start with the base c integer x to be converted 2) Initialize i = 0 and v = x; digits are produced right to left 3) Calculate $D_i = v \bmod b$ and $v = \lfloor v / b \rfloor$; convert $D_i$ to base b to get $y_i$ 4) Set i = i + 1; if $v \ne 0$, repeat from step 3 • Example: convert $3661_{10}$ to base 16: 3661 ÷ 16 = 228 (rem = 13) → $y_0$ = D(=13); 228 ÷ 16 = 14 (rem = 4) → $y_1$ = 4; 14 ÷ 16 = 0 (rem = 14) → $y_2$ = E. Thus $3661_{10} = \text{E4D}_{16}$

  9. Radix Conversion Examples • Example: convert $235_{10}$ to base 2: 235 ÷ 2 = 117 (rem = 1) → $y_0$ = 1; 117 ÷ 2 = 58 (rem = 1) → $y_1$ = 1; 58 ÷ 2 = 29 (rem = 0) → $y_2$ = 0; 29 ÷ 2 = 14 (rem = 1) → $y_3$ = 1; 14 ÷ 2 = 7 (rem = 0) → $y_4$ = 0; 7 ÷ 2 = 3 (rem = 1) → $y_5$ = 1; 3 ÷ 2 = 1 (rem = 1) → $y_6$ = 1; 1 ÷ 2 = 0 (rem = 1) → $y_7$ = 1. Thus $235_{10} = 11101011_2$

  10. More Radix Conversion Examples • Example: convert $1257_{10}$ to base 8: 1257 ÷ 8 = 157 (rem = 1) → $y_0$ = 1; 157 ÷ 8 = 19 (rem = 5) → $y_1$ = 5; 19 ÷ 8 = 2 (rem = 3) → $y_2$ = 3; 2 ÷ 8 = 0 (rem = 2) → $y_3$ = 2. Thus $1257_{10} = 2351_8$
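The repeated-division procedure from slides 8-10 can be sketched in Python as follows. `divmod` gives $D_i$ and $\lfloor v/b \rfloor$ in one step, and `from_int` is a hypothetical name.

```python
def from_int(x: int, base: int) -> str:
    """Convert a non-negative integer to base b by repeated division; digits come out right to left."""
    symbols = "0123456789ABCDEF"
    if x == 0:
        return "0"
    digits = []
    v = x                              # step 2: v = x
    while v != 0:                      # step 4: repeat while v != 0
        v, d = divmod(v, base)         # step 3: D_i = v mod b, v = floor(v / b)
        digits.append(symbols[d])      # y_i: D_i written as a base b symbol
    return "".join(reversed(digits))   # y_0 was produced first, so reverse


print(from_int(3661, 16))  # E4D
print(from_int(235, 2))    # 11101011
print(from_int(1257, 8))   # 2351
```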

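  11. Radix Conversion - a more ad-hoc approach • Example: convert $1135_{10}$ to base 2 by subtracting the largest power of 2 that fits, then repeating on the remainder: 1135 - 1024 = 111; 111 - 64 = 47; 47 - 32 = 15; 15 - 8 = 7; 7 - 4 = 3; 3 - 2 = 1; 1 - 1 = 0
      Power:  1024   512   256   128    64    32    16     8     4     2     1
      Bit:       1     0     0     0     1     1     0     1     1     1     1
    Thus $1135_{10} = 10001101111_2$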
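The ad-hoc method writes a 1 for each power of two that is subtracted and a 0 for each power that is skipped. A short sketch, using the hypothetical name `to_binary_by_powers`:

```python
def to_binary_by_powers(x: int) -> str:
    """Subtract the largest power of two that fits, record a 1 for it, and repeat on the remainder."""
    if x == 0:
        return "0"
    k = x.bit_length() - 1          # exponent of the largest power of two <= x
    bits = []
    for i in range(k, -1, -1):
        if x >= 2 ** i:
            bits.append("1")
            x -= 2 ** i             # the remainder moves on to the next, smaller power
        else:
            bits.append("0")
    return "".join(bits)


print(to_binary_by_powers(1135))  # 10001101111
```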

  12. Radix Conversion - Digit Grouping
      Hex-binary: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000, 9 = 1001, A = 1010, B = 1011, C = 1100, D = 1101, E = 1110, F = 1111
      Octal-binary: 0 = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, 7 = 111
    • Example: convert $10001101111_2$ to base 16 (group 4 bits from the right, padding with leading zeros): 0100 0110 1111 → 4 6 F. Thus $10001101111_2 = \text{46F}_{16}$ • Example: convert $10001101111_2$ to base 8 (groups of 3 bits): 010 001 101 111 → 2 1 5 7. Thus $10001101111_2 = 2157_8$
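Digit grouping works because 16 = 2^4 and 8 = 2^3, so each hex or octal digit corresponds to exactly one fixed-size group of bits. A sketch, where `regroup_binary` and its `group` parameter are illustrative names:

```python
def regroup_binary(bits: str, group: int) -> str:
    """Binary -> base 2**group by splitting into groups of `group` bits from the right (4 = hex, 3 = octal)."""
    symbols = "0123456789ABCDEF"
    pad = (-len(bits)) % group                  # left-pad so the length is a multiple of the group size
    bits = "0" * pad + bits
    out = []
    for i in range(0, len(bits), group):
        out.append(symbols[int(bits[i:i + group], 2)])   # each group becomes one digit
    return "".join(out)


print(regroup_binary("10001101111", 4))  # 46F
print(regroup_binary("10001101111", 3))  # 2157
```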

  13. Representing Negative Integer Numbers • There are four methods that can be used to represent negative numbers: • sign-magnitude: decimal: +435, -3102, etc.; binary: use a sign bit, e.g. $+3_{10} = 0011_2$, $-3_{10} = 1011_2$ • radix complement (e.g. 2’s complement) • diminished radix complement (e.g. 1’s complement) • bias or excess: used in floating point numbers

  14. Complement Operations for m-Digit Base b Numbers • The radix complement of an m-digit base b number x is: $x^c = (b^m - x) \bmod b^m$ • The diminished radix complement of x is: $\hat{x}^c = b^m - 1 - x$ • The complement operation is used to define the relationship between the unsigned number representing a negative number and its absolute value • Complement number systems use unsigned base b numbers to represent both positive and negative numbers • The complement of a number in the range $0 \le x \le b^m - 1$ is in the same range
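Both complement operations translate directly into code. This sketch uses illustrative function names. In base 2 the diminished radix complement is simply bitwise inversion, and the radix complement is that value plus 1, taken mod $b^m$.

```python
def radix_complement(x: int, b: int, m: int) -> int:
    """x^c = (b**m - x) mod b**m for an m-digit base-b number x."""
    return (b ** m - x) % (b ** m)


def diminished_radix_complement(x: int, b: int, m: int) -> int:
    """b**m - 1 - x: each digit d becomes (b - 1 - d), so no borrows are needed."""
    return b ** m - 1 - x


print(radix_complement(7, 2, 4), diminished_radix_complement(7, 2, 4))    # 9 8
print(radix_complement(3, 10, 4), diminished_radix_complement(3, 10, 4))  # 9997 9996
print(radix_complement(0, 2, 4))                                          # 0
```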

  15. Tbl 6.1 Complement Representations of Negative Numbers
      Radix complement: Number | Representation
        0 | 0
        $0 < x < b^m/2$ | $x$
        $-b^m/2 \le x < 0$ | $|x|^c = b^m - |x|$
      Diminished radix complement: Number | Representation
        0 | 0 or $b^m - 1$
        $0 < x < b^m/2$ | $x$
        $-b^m/2 < x < 0$ | $\widehat{|x|}^c = b^m - 1 - |x|$
    • For even b, the radix complement system represents one more negative than positive value • The diminished radix complement system has 2 zeros but represents the same number of positive and negative values

  16. Tbl 6.2 Base 2 Complement Representations
      8 bit 2’s complement: Number | Representation
        0 | 0
        0 < x < 128 | x
        -128 ≤ x < 0 | 256 - |x|
      8 bit 1’s complement: Number | Representation
        0 | 0 or 255
        0 < x < 128 | x
        -127 ≤ x < 0 | 255 - |x|
    • 2’s complement numbers go from -128 to 127; 1’s complement numbers go from -127 to 127 • In 1’s complement, 255 = $11111111_2$ is often called -0 • In 2’s complement, -128 = $10000000_2$ is a legal value, but trying to negate it gives overflow

  17. Example: base 2, 4 bits
      2’s complement: calculation of the negative value: $x^c = (b^m - x) \bmod b^m$. Ex. -7 = 10000 - 0111 = 01001, i.e. 1001 in 4 bits; or simply invert and add 1: 0111 inverted is 1000, and 1000 + 1 = 1001
        Number | Representation | Negative
        0      | 0000           | 0000
        1      | 0001           | 1111
        2      | 0010           | 1110
        3      | 0011           | 1101
        4      | 0100           | 1100
        5      | 0101           | 1011
        6      | 0110           | 1010
        7      | 0111           | 1001
        8      | ----           | 1000
      1’s complement: calculation of the negative value: $\hat{x}^c = b^m - 1 - x$. Ex. -7 = 10000 - 1 - 0111 = 01001 - 1 = 1000; or simply invert: 0111 inverted is 1000
        Number | Representation | Negative
        0      | 0000           | 1111
        1      | 0001           | 1110
        2      | 0010           | 1101
        3      | 0011           | 1100
        4      | 0100           | 1011
        5      | 0101           | 1010
        6      | 0110           | 1001
        7      | 0111           | 1000
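Tables 6.2 and the 4-bit example can be regenerated, and checked, with a few encode/decode helpers. The names below are illustrative, and Python's `%` on a negative int conveniently gives $2^m - |x|$. The last assertion in the tests mirrors the slide's remark that negating -128 in 8 bits overflows.

```python
def encode_twos(x: int, m: int) -> int:
    """m-bit 2's complement pattern of x: negatives become 2**m - |x|."""
    return x % (2 ** m)


def encode_ones(x: int, m: int) -> int:
    """m-bit 1's complement pattern of x: negatives become 2**m - 1 - |x| (every bit inverted)."""
    return x if x >= 0 else 2 ** m - 1 + x


def decode_twos(u: int, m: int) -> int:
    """Signed value of an m-bit 2's complement pattern u."""
    return u - 2 ** m if u >= 2 ** (m - 1) else u


def decode_ones(u: int, m: int) -> int:
    """Signed value of an m-bit 1's complement pattern u (all ones decodes to 0, the '-0')."""
    return u - (2 ** m - 1) if u >= 2 ** (m - 1) else u


# Rows 1-7 of the 4-bit tables: number, representation, 2's complement negative, 1's complement negative
for x in range(1, 8):
    print(x, format(x, "04b"), format(encode_twos(-x, 4), "04b"), format(encode_ones(-x, 4), "04b"))
```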

  18. Shifting • Shifting a number left one digit is equivalent to multiplication by the base b (insert a 0 on the right) • Examples: $327_{10}$ left shift ⇒ $3270_{10}$; $0101_2 = 5_{10}$ left shift ⇒ $1010_2 = 10_{10}$ (unsigned); $1101_2 = -3_{10}$ left shift ⇒ $1010_2 = -6_{10}$ (2’s complement) • In a left shift, overflow occurs if the sign changes • Example: $1010_2 = -6_{10}$ left shift ⇒ $0100_2$: overflow!

  19. Shifting (cont.) • Shifting a number right one digit is equivalent to division by the base b. For signed numbers, the sign bit stays in the MSB as well as being shifted into bit MSB-1; this is called arithmetic right shifting • No overflow can occur, but any fractional part of the division is lost • Examples: $0111_2 = 7_{10}$ right shift ⇒ $0011_2 = 3_{10}$ (3.5 becomes 3; the ½ is lost); $1011_2 = -5_{10}$ right shift ⇒ $1101_2 = -3_{10}$ (-2.5 becomes -3; the result is rounded toward $-\infty$)
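Both shifts from slides 18 and 19 are sketched below on m-bit patterns stored in Python ints. The function names are ours. The left shift reports overflow when the sign bit changes, and the right shift keeps the sign bit, so the result equals the signed value divided by 2 and rounded down.

```python
def shift_left(u: int, m: int) -> tuple[int, bool]:
    """Shift an m-bit 2's complement pattern left one bit (a 0 enters on the right).
    Returns (result, overflow); overflow means the sign bit changed."""
    mask = (1 << m) - 1
    result = (u << 1) & mask
    sign_before = (u >> (m - 1)) & 1
    sign_after = (result >> (m - 1)) & 1
    return result, sign_after != sign_before


def arithmetic_shift_right(u: int, m: int) -> int:
    """Shift an m-bit pattern right one bit, keeping the sign bit in the MSB.
    The low bit (the fractional part of the division by 2) is lost."""
    sign_bit = u & (1 << (m - 1))
    return (u >> 1) | sign_bit


print(shift_left(0b1101, 4))                   # (10, False): 1010 = -6, no overflow
print(shift_left(0b1010, 4))                   # (4, True): 0100, overflow
print(bin(arithmetic_shift_right(0b0111, 4)))  # 0b11 (3)
print(bin(arithmetic_shift_right(0b1011, 4)))  # 0b1101 (-3)
```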

  20. Fixed Point Arithmetic - Addition • Complement number systems allow the addition of signed numbers with an adder for unsigned numbers • Overflow only occurs when the operands have the same sign, and it can be detected when the result has the opposite sign from the operands • Unsigned addition of m digit base b numbers x and y: 1) Initialize the digit counter j = 0 and the carry in $c_0 = 0$ 2) Produce digit j of the sum, $s_j = (x_j + y_j + c_j) \bmod b$, and the carry $c_{j+1} = \lfloor (x_j + y_j + c_j)/b \rfloor$ 3) Increment j = j + 1 and repeat from 2 if j < m
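The three-step digit algorithm can be run for any base b. In this sketch (hypothetical name `add_unsigned`), the operands are digit lists ordered least significant digit first. The second call shows the first bullet in action: the same unsigned adder computes -3 + 5 in 4-bit 2's complement if the carry out is ignored.

```python
def add_unsigned(x: list[int], y: list[int], b: int) -> tuple[list[int], int]:
    """Add two m-digit base-b numbers given as digit lists, least significant digit first.
    Returns (sum digits s_0..s_{m-1}, carry out c_m)."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    s = []
    c = 0                                  # step 1: j = 0, c_0 = 0
    for j in range(len(x)):                # step 3: repeat while j < m
        t = x[j] + y[j] + c
        s.append(t % b)                    # step 2: s_j = (x_j + y_j + c_j) mod b
        c = t // b                         #         c_{j+1} = floor((x_j + y_j + c_j) / b)
    return s, c


# 101101 + 011001 (bits listed least significant first) -> 000110 with carry out 1, i.e. 1000110
print(add_unsigned([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 0], 2))  # ([0, 1, 1, 0, 0, 0], 1)
# 1101 (-3) + 0101 (5) in 4-bit 2's complement: 0010 (2), carry out ignored
print(add_unsigned([1, 0, 1, 1], [1, 0, 1, 0], 2))              # ([0, 1, 0, 0], 1)
```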

  21. Fixed Point Arithmetic - Addition • Example: $101101_2 + 011001_2 = 1000110_2$ (45 + 25 = 70) • Hardware implementation, the ripple carry adder: a chain of m full adders (FA). Stage j takes $x_j$, $y_j$ and carry $c_j$, and produces sum $s_j$ and carry $c_{j+1}$ • The carry and sum of stage j+1 depend on the carry from stage j, so the correct values ripple from right to left • $s_j = x_j \oplus y_j \oplus c_j$, $c_{j+1} = x_j y_j + x_j c_j + y_j c_j$
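A bit-level model of the ripple carry adder uses the two full adder equations on the slide. This is a sketch with illustrative names, not a gate-accurate simulation, because it models no delays.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One full adder (FA) stage on single bits: returns (s_j, c_{j+1})."""
    s = x ^ y ^ c                          # s_j = x_j XOR y_j XOR c_j
    c_out = (x & y) | (x & c) | (y & c)    # c_{j+1} = x_j y_j + x_j c_j + y_j c_j
    return s, c_out


def ripple_carry_add(a: int, b: int, m: int, c0: int = 0) -> tuple[int, int]:
    """m-bit ripple carry adder: stage j cannot finish until c_j arrives from stage j - 1."""
    total, c = 0, c0
    for j in range(m):
        s, c = full_adder((a >> j) & 1, (b >> j) & 1, c)
        total |= s << j
    return total, c                        # (m-bit sum, carry out c_m)


print(ripple_carry_add(0b101101, 0b011001, 6))  # (6, 1): sum 000110, carry out 1
```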

  22. Fig 6.2 Base b Radix Complement Subtracter: y passes through a (b-1)’s complement unit into a base b adder together with x; the adder’s carry-in supplies the +1, and the output is x - y • To do subtraction in the radix complement system, it is only necessary to negate (radix complement) the 2nd operand • The diminished radix complement is easy to take, and the adder has a carry-in for the +1

  23. Fig 6.3 2’s Complement Adder/Subtracter: each $y_j$ is combined with the subtract control r to give $q_j$, which feeds full adder j along with $x_j$; the carries ripple from $c_0$ to $c_m$ through the full adders (FA), r also supplies the carry-in $c_0$, and the outputs are $s_0 \ldots s_{m-1}$ • A multiplexer to select y or its complement becomes an exclusive OR gate: $q_j = y_j \bar{r} + \bar{y}_j r = y_j \oplus r$
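The adder/subtracter can be modelled by XOR-ing each $y_j$ with r and using r as the carry-in, so r = 1 produces x plus the bitwise complement of y plus 1. This sketch assumes that wiring and uses the illustrative name `add_sub`. The final carry out is returned but is meant to be ignored in 2's complement arithmetic.

```python
def add_sub(x: int, y: int, m: int, r: int) -> tuple[int, int]:
    """m-bit 2's complement adder/subtracter: r = 0 computes x + y, r = 1 computes x - y.
    Each y_j goes through an XOR with r (q_j = y_j XOR r) and r is also the carry-in c_0."""
    total, c = 0, r                        # c_0 = r supplies the +1 when subtracting
    for j in range(m):
        xj = (x >> j) & 1
        qj = ((y >> j) & 1) ^ r            # q_j = y_j XOR r
        s = xj ^ qj ^ c                    # full adder sum
        c = (xj & qj) | (xj & c) | (qj & c)  # full adder carry
        total |= s << j
    return total, c


print(add_sub(0b0101, 0b0011, 4, 1))  # (2, 1): 0010, carry out ignored
print(add_sub(0b0011, 0b0101, 4, 1))  # (14, 0): 1110 = -2
```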

  24. Problem with Ripple Carry Addition/Subtraction • $t_c$: full adder carry propagation time; $t_s$: full adder sum propagation time • In the ripple carry adder, carry $c_j$ is ready after $j t_c$ and sum $s_j$ after $j t_c + t_s$, so $s_{m-1}$ is ready after $(m-1) t_c + t_s$ and the final carry $c_m$ after $m t_c$ • The total propagation time of the adder is proportional to the number of bits • A 64 bit adder with $t_c$ = 200 ps would take 64 × 200 ps = 12.8 ns to compute the final carry out, which corresponds to less than an 80 MHz clock speed (1/12.8 ns ≈ 78 MHz). SLOW!

  25. Solution - Carry Look Ahead • Generate a carry across groups of bits using only the bits to be summed and the carry into that group • Consider the jth bit (stage) of an addition operation - there will be a carry out of this stage if: • the bits in this stage generate a carry, or • there is a carry into this stage and the bits in this stage propagate that carry to the next stage • What conditions generate and propagate carry bits?

  26. Binary Propagate and Generate Signals • Generate for digit j is $G_j = x_j y_j$ • Propagate for digit j is $P_j = x_j + y_j$ • Of course $x_j + y_j$ covers $x_j y_j$, but it still corresponds to a carry out for a carry in • Carries can then be written: $c_{j+1} = G_j + P_j c_j$ • $c_1 = G_0 + P_0 c_0$ • $c_2 = G_1 + P_1 G_0 + P_1 P_0 c_0$ • $c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0$ • $c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$ • E.g., in words, the $c_2$ logic is: $c_2$ is one if digit 1 generates a carry, or if digit 0 generates one and digit 1 propagates it, or if digits 0 and 1 both propagate a carry-in
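The flattened carry equations can be evaluated directly. In this sketch (hypothetical name `lookahead_carries`), every carry is formed from G, P and $c_0$ alone, which is what lets hardware compute them in two gate levels. In software this is only a logical model, not a timing model.

```python
def lookahead_carries(x: int, y: int, m: int, c0: int = 0) -> list[int]:
    """Return [c_1, ..., c_m] from the flattened lookahead equations:
    c_{j+1} = G_j + P_j G_{j-1} + ... + P_j ... P_1 G_0 + P_j ... P_0 c_0."""
    G = [(x >> j) & (y >> j) & 1 for j in range(m)]      # G_j = x_j y_j
    P = [((x >> j) | (y >> j)) & 1 for j in range(m)]    # P_j = x_j + y_j
    carries = []
    for j in range(m):                                   # build c_{j+1}
        c = c0
        for k in range(j + 1):
            c &= P[k]                                    # term P_j ... P_0 c_0
        for i in range(j + 1):
            term = G[i]
            for k in range(i + 1, j + 1):
                term &= P[k]                             # term P_j ... P_{i+1} G_i
            c |= term
        carries.append(c)
    return carries


print(lookahead_carries(0b1011, 0b0110, 4))  # [0, 1, 1, 1]
```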

  27. Speed Gains with Carry Lookahead • It takes one gate level to produce a G or P, two levels of gates for any carry, and 2 more for the full adders • The number of OR gate inputs (terms) and AND gate inputs (literals in a term) grows with the number of carries generated by lookahead • The real power of this technique comes from applying it recursively • For a group of 4 digits, an overall generate is: $G_0^1 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ • An overall propagate is: $P_0^1 = P_3 P_2 P_1 P_0$

  28. Recursive Carry Lookahead Scheme • It is not practical to apply this directly to wide (e.g. 64 bit) words; however, the idea extends to a multilevel scheme: • Rewrite the equation for $c_4$: $c_4 = G_0^1 + P_0^1 c_0$, where $G_0^1 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ and $P_0^1 = P_3 P_2 P_1 P_0$ • For the next level up (level 2, assuming 4 “bits” per group), for group 0: $G_0^2 = G_3^1 + P_3^1 G_2^1 + P_3^1 P_2^1 G_1^1 + P_3^1 P_2^1 P_1^1 G_0^1$ and $P_0^2 = P_3^1 P_2^1 P_1^1 P_0^1$ • Keep building up in groups of k bits (tree structure) • Now the delay is proportional to $\log_k m$ because there are $\log_k m$ levels in the carry lookahead tree
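The recursion can be sketched by folding each group of k (G, P) pairs into one group pair and repeating until a single pair remains. The names `bit_gp`, `group_gp`, `lookahead_levels` and `final_carry` are illustrative. This sketch only goes up the tree and reads off the final carry $c_m$. It does not model the downward pass that produces the intermediate carries.

```python
def bit_gp(x: int, y: int, m: int) -> tuple[list[int], list[int]]:
    """Level 0: per-bit generate G_j = x_j y_j and propagate P_j = x_j + y_j (bit 0 first)."""
    G = [(x >> j) & (y >> j) & 1 for j in range(m)]
    P = [((x >> j) | (y >> j)) & 1 for j in range(m)]
    return G, P


def group_gp(G: list[int], P: list[int]) -> tuple[int, int]:
    """Combine one group (least significant first) into an overall generate and propagate:
    G_grp = G_{k-1} + P_{k-1} G_{k-2} + ... + P_{k-1}...P_1 G_0,  P_grp = P_{k-1}...P_0."""
    g, p = 0, 1
    for Gj, Pj in zip(G, P):
        g = Gj | (Pj & g)     # carry leaves the group if this digit generates, or propagates one from below
        p = Pj & p            # the group propagates only if every digit propagates
    return g, p


def lookahead_levels(G: list[int], P: list[int], k: int) -> list[tuple[list[int], list[int]]]:
    """Apply group_gp recursively in groups of k until a single (G, P) pair is left."""
    levels = [(G, P)]
    while len(G) > 1:
        pairs = [group_gp(G[i:i + k], P[i:i + k]) for i in range(0, len(G), k)]
        G, P = [g for g, _ in pairs], [p for _, p in pairs]
        levels.append((G, P))
    return levels


def final_carry(x: int, y: int, m: int, k: int, c0: int = 0) -> int:
    """c_m = G_0 + P_0 c_0 using the (G, P) pair at the top of the lookahead tree."""
    G, P = bit_gp(x, y, m)
    top_G, top_P = lookahead_levels(G, P, k)[-1]
    return top_G[0] | (top_P[0] & c0)


# 64 bits in groups of 4: 64 -> 16 -> 4 -> 1, i.e. log_4 64 = 3 lookahead levels
print(len(lookahead_levels(*bit_gp(0, 0, 64), 4)) - 1)  # 3
```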

  29. Fig 6.4 Carry Lookahead Adder for Group Size k = 2: an 8-bit adder in which the bottom row computes $G_j$, $P_j$ from $x_j$, $y_j$ (j = 0 ... 7); level 1 lookahead units produce $c_1$, $c_3$, $c_5$, $c_7$; level 2 produces $c_2$, $c_6$; level 3 produces $c_4$ from $c_0$; eight full adders (FA) produce $s_0 \ldots s_7$

  30. 64 Bit Adder using Carry Look Ahead with Group Size k = 4 (figure: a tree of G/P lookahead units) • Delays: Level 0 propagate & generate: 1 delay; Level 1 propagate & generate: 2 delays; Level 2 propagate & generate: 2 delays; Level 3 intermediate carry: 2 delays; Level 2 intermediate carry: 2 delays; Level 1 intermediate carry: 2 delays; MSB sum & carry out from carry in: 1 delay • Total gate delays: 12

  31. Binary Multiplication (unsigned) • Multiplication can be done using successive addition (as you have seen in the lab assignment) • A faster method is to multiply Y by X, k bits at a time, and add the resulting terms (k usually equals 1), as is done by hand • Example:
          1010     Multiplicand Y
        x 1101     Multiplier X = X3 X2 X1 X0
          1010     X0 Y 2^0
         0000      X1 Y 2^1
        1010       X2 Y 2^2
       1010        X3 Y 2^3
      10000010     Product (10 × 13 = 130)
    • There are many ways to implement this in hardware, but perhaps the simplest is the shift-add mechanism

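  32. Unsigned Binary Multiplication Datapath (figure): the Multiplicand register M ($M_{n-1} \ldots M_0$) and register A feed an n-Bit Adder; a Control Unit sequences the adds and shifts; the carry bit C, A ($A_{n-1} \ldots A_0$) and the Multiplier register Q ($Q_{n-1} \ldots Q_0$) form a 2n+1 bit shift register that holds the Product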

  33. Unsigned Binary Multiplication Algorithm (flowchart) 1) START: C, A ← 0; M ← Multiplicand; Q ← Multiplier; Count ← 0 2) If $Q_0$ = 1, then C, A ← A + M 3) Shift C, A, Q right one bit; Count ← Count + 1 4) If Count = n, END; otherwise repeat from step 2

  34. Unsigned Binary Multiplication Example: M = 1010 (Y = 10), initial Q = 1101 (X = 13)
        C  A     Q
        0  0000  1101   initialize registers; count = 0; test: Q0 = 1
        0  1010  1101   A ← A + M
        0  0101  0110   shift C,A,Q; count = 1; test: Q0 = 0
        0  0010  1011   shift C,A,Q; count = 2; test: Q0 = 1
        0  1100  1011   A ← A + M
        0  0110  0101   shift C,A,Q; count = 3; test: Q0 = 1
        1  0000  0101   A ← A + M
        0  1000  0010   shift C,A,Q; count = 4; end
      Product (A,Q) = 1000 0010 = 130
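The flowchart of slide 33 can be traced in Python with the registers C, A, Q and M held as small ints. The function name is ours. The shift treats C, A, Q as one (2n+1)-bit register with a 0 entering C, as in the trace above.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-add multiplication with registers C (1 bit), A, Q, M (n bits each)."""
    mask = (1 << n) - 1
    C, A, Q, M = 0, 0, multiplier & mask, multiplicand & mask
    for _ in range(n):                    # Count = 0 .. n-1
        if Q & 1:                         # Q0 = 1?
            total = A + M                 # C, A <- A + M
            C, A = total >> n, total & mask
        # shift C, A, Q right one bit as a single register; 0 enters C
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A = (C << (n - 1)) | (A >> 1)
        C = 0
    return (A << n) | Q                   # the product is in A, Q


print(bin(multiply_unsigned(0b1010, 0b1101, 4)))  # 0b10000010 (130)
```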

  35. Signed Binary Multiplication • Many algorithms for signed multiplication exist, but the simplest one is similar to the one for unsigned multiplication except: • shifts are arithmetic (the sign bit is copied) • the adder is 2’s complement, and the carry out is ignored • if the sign bit of the multiplier is ‘1’ (it is negative), then the last arithmetic operation before the final shift is a subtraction instead of an addition • A datapath very similar to the one for unsigned multiplication can be used: the C register is not needed, the shift register must be arithmetic, and the adder must be modified to do 2’s complement addition and subtraction • The algorithm is also very similar, with the exception of the final arithmetic operation

  36. Signed Binary Multiplication Example: M = 00000011 (Y = 3), initial Q = 11111011 (X = -5)
        A         Q
        00000000  11111011   initialize registers; count = 0; test: Q0 = 1
        00000011  11111011   A ← A + M
        00000001  11111101   arithmetic shift A,Q; count = 1; test: Q0 = 1
        00000100  11111101   A ← A + M
        00000010  01111110   arithmetic shift A,Q; count = 2; test: Q0 = 0
        00000001  00111111   arithmetic shift A,Q; count = 3; test: Q0 = 1
        00000100  00111111   A ← A + M
        00000010  00011111   arithmetic shift A,Q; count = 4; test: Q0 = 1
        00000101  00011111   A ← A + M
        00000010  10001111   arithmetic shift A,Q; count = 5; test: Q0 = 1
        00000101  10001111   A ← A + M
        00000010  11000111   arithmetic shift A,Q; count = 6; test: Q0 = 1
        00000101  11000111   A ← A + M
        00000010  11100011   arithmetic shift A,Q; count = 7; test: Q0 = 1
        11111111  11100011   A ← A - M (subtract because X is negative)
        11111111  11110001   arithmetic shift A,Q; count = 8; end
      Product = 11111111 11110001 = -15
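A sketch of the signed shift-add method follows. One assumption differs from the slide's n-bit datapath: A is kept as an unbounded Python int, which acts like an extra guard bit. With a strictly n-bit A, the sum A + M can overflow for some operands (e.g. 7 × 7 with n = 4), and the arithmetic shift would then copy the wrong sign. Python's `>>` on a negative int is already an arithmetic (floor) shift. The function name is hypothetical.

```python
def multiply_signed_simple(multiplicand: int, multiplier: int, n: int) -> int:
    """Signed shift-add multiplication of two n-bit 2's complement numbers (2n-bit product).
    Sketch: A is an unbounded Python int, acting like a guard bit against overflow."""
    mask = (1 << n) - 1
    M = multiplicand                      # signed value of the multiplicand
    A, Q = 0, multiplier & mask           # Q holds the multiplier's n-bit pattern
    for count in range(n):
        if Q & 1:
            # on the last step Q0 is the multiplier's sign bit: subtract instead of add
            A = A - M if count == n - 1 else A + M
        # arithmetic shift right of A,Q: A's low bit moves into Q's MSB, and
        # Python's >> on a negative int copies the sign (floor division by 2)
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A >>= 1
    return A * (1 << n) + Q               # signed high half A, unsigned low half Q


print(multiply_signed_simple(3, -5, 8))   # -15
print(multiply_signed_simple(7, 7, 4))    # 49
```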

  37. Booth’s Algorithm for Signed Multiplication • A more efficient algorithm was developed by Booth in 1951 • It uses a single bit register to the right of the least significant bit of Q, called $Q_{-1}$ • The action taken at each step depends on the values of $Q_0$ and $Q_{-1}$: • if $Q_0 Q_{-1}$ = 01, then A ← A + M; arithmetic shift A, Q, $Q_{-1}$ • if $Q_0 Q_{-1}$ = 10, then A ← A - M; arithmetic shift A, Q, $Q_{-1}$ • if $Q_0 Q_{-1}$ = 00 or 11, then only arithmetic shift A, Q, $Q_{-1}$ • This algorithm is more efficient than the previous one in that it “skips over” groups of all ‘1’s or all ‘0’s

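  38. Booth’s Algorithm Datapath (figure): Multiplicand register M ($M_{n-1} \ldots M_0$); n-Bit 2’s Complement Adder/Subtractor; Control Unit; A ($A_{n-1} \ldots A_0$), the Multiplier register Q ($Q_{n-1} \ldots Q_0$) and $Q_{-1}$ form a 2n+1 bit arithmetic shift register that holds the Product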

  39. Booth’s Algorithm

  40. Booth’s Algorithm Example: M = 00000011 (Y = 3), initial Q = 11111011 (X = -5)
        A         Q         Q-1
        00000000  11111011  0    initialize registers; count = 0; test: Q0 Q-1 = 10
        11111101  11111011  0    A ← A - M
        11111110  11111101  1    arithmetic shift A,Q,Q-1; count = 1; test: Q0 Q-1 = 11
        11111111  01111110  1    arithmetic shift A,Q,Q-1; count = 2; test: Q0 Q-1 = 01
        00000010  01111110  1    A ← A + M
        00000001  00111111  0    arithmetic shift A,Q,Q-1; count = 3; test: Q0 Q-1 = 10
        11111110  00111111  0    A ← A - M
        11111111  00011111  1    arithmetic shift A,Q,Q-1; count = 4; test: Q0 Q-1 = 11
        11111111  10001111  1    arithmetic shift A,Q,Q-1; count = 5; test: Q0 Q-1 = 11
        11111111  11000111  1    arithmetic shift A,Q,Q-1; count = 6; test: Q0 Q-1 = 11
        11111111  11100011  1    arithmetic shift A,Q,Q-1; count = 7; test: Q0 Q-1 = 11
        11111111  11110001  1    arithmetic shift A,Q,Q-1; count = 8; end
      Product = 11111111 11110001 = -15
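Booth's decision table can be sketched the same way, with `Q_1` standing in for $Q_{-1}$. As in the previous sketch, A is an unbounded Python int, which is an assumption beyond the slide's n-bit datapath. A strictly n-bit A can overflow when the multiplicand is the most negative value, e.g. -8 with n = 4. The function name is hypothetical.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth's algorithm for two n-bit 2's complement numbers (2n-bit product).
    Sketch: A is an unbounded Python int (acts like a guard bit); Q_1 stands for Q-1."""
    mask = (1 << n) - 1
    M = multiplicand
    A, Q, Q_1 = 0, multiplier & mask, 0
    for _ in range(n):
        pair = ((Q & 1) << 1) | Q_1       # the two bits Q0 Q-1
        if pair == 0b10:
            A -= M                        # 10: A <- A - M
        elif pair == 0b01:
            A += M                        # 01: A <- A + M
        # 00 or 11: no arithmetic, only the shift below
        Q_1 = Q & 1                       # arithmetic shift A, Q, Q-1 right by one
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A >>= 1
    return A * (1 << n) + Q


print(booth_multiply(3, -5, 8))   # -15
print(booth_multiply(-8, 1, 4))   # -8
```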

  41. Floating Point Numbers • Fixed point numbers with a limited number of bits suffer from two problems: • limited range • limited precision • The solution is to use a representation equivalent to decimal scientific notation, of the form: $-2.72 \times 10^{-2}$ • This format has four parts: • sign (+, -): - • significand (also sometimes called the mantissa): 2.72 • exponent: -2 • base of the exponent: 10

  42. Fig 6.14 Floating-Point Number Format: | Sign s (1 bit) | Exponent e ($m_e$ bits) | Fraction f ($m_f$ bits) |, with $1 + m_e + m_f = m$ bits and $\text{Value}(s, e, f) = (-1)^s \times f \times 2^e$ • s is the sign, e is the exponent, and f is the significand • The base of the exponent is implied by the specific format • The radix point in the fraction is also implied (i.e., the fraction is really a fixed point number)

  43. Signs in Floating-Point Numbers • Both the significand and the exponent have signs • A complement representation could be used for f, but sign-magnitude is most common now • The sign is placed at the left instead of with f, so a test for negative always looks at the left bit • The exponent could be 2’s complement, but it is better to use a biased exponent • If $-e_{min} \le e \le e_{max}$, where $e_{min}, e_{max} > 0$, then $\hat{e} = e_{min} + e$ is never negative, so e is replaced by $\hat{e}$; $e_{min}$ is called the bias

  44. Normalized Floating-Point Numbers • There are multiple representations for a floating-point number • If $f_1$ and $f_2 = 2^d f_1$ are both fractions and $e_2 = e_1 - d$, then $(s, f_1, e_1)$ and $(s, f_2, e_2)$ have the same value • Scientific notation example: $0.819 \times 10^3 = 0.0819 \times 10^4$ • A normalized floating-point number has a nonzero leftmost digit (exponent as small as possible) • Zero cannot fit this rule; it is usually written as all 0s • In base 2, when the number is normalized, the bit immediately to the right of the radix point is always 1, so it can be left out • This is the so-called hidden or implied bit

  45. Comparison of Normalized Floating Point Numbers • If normalized numbers are viewed as integers, a biased exponent field to the left means an exponent unit is worth more than a significand unit • The largest magnitude number with a given exponent is followed by the smallest one with the next higher exponent • Thus normalized FP numbers can be compared for <, ≤, >, ≥, =, ≠ as if they were integers • This is the reason for the s, e, f ordering of the fields and the use of a biased exponent, and one reason for normalized numbers

  46. Example 32 bit Floating Point Format: | Sign s (bit 31) | Exponent e (bits 30-23) | Fraction f (bits 22-0) | • MSB is the sign bit • 8 bit exponent to the right of the sign bit • exponent is biased by 127 • exponent base is 2 • 23 bit significand • normalized: the radix point is to the left of the significand • an implied ‘1’ to the left of the radix point yields a 24 bit effective significand • The largest positive number that can be represented is: $1.11111111111111111111111_2 \times 2^{128} = (2 - 2^{-23}) \times 2^{128} \approx 6.8056 \times 10^{38}$

  47. Numbers in the Example Format • Example: 3.0 • Sign: $0_2$; Exponent: $00000000_2$; Significand: $11.00000000000000000000000_2$ • Normalization requires moving the radix point one place to the left: new exponent $00000001_2$, new significand $1.10000000000000000000000_2$ • Biasing the exponent requires adding $127_{10}$ ($1111111_2$): new exponent $10000000_2$ • Now construct the final result (sometimes called packing): Result: 0 10000000 10000000000000000000000 (sign bit | exponent | significand $1.1_2$, with the leftmost ‘1’ and the radix point implied)

  48. Numbers in the Example Format (cont.) • Example: $4.6283 \times 10^{15} = 1.027689044975 \times 2^{52}$ • Significand: $1.00000111000101101010000_2$ • Exponent: $0110100_2$ (= 52) • Biasing the exponent requires adding $127_{10}$ ($1111111_2$): new exponent $10110011_2$ • The significand is already normalized • Now pack the final result: Result: 0 10110011 00000111000101101010000 (sign bit | exponent | significand, with the leftmost ‘1’ and the radix point implied; the 23 fraction bits are truncated)
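The normalize, bias and pack steps can be sketched with `math.frexp`, which returns $|x| = m \times 2^e$ with $0.5 \le m < 1$. This sketch has three simplifications, each an assumption: it truncates the fraction to 23 bits as the slide does, it skips exponent range checks, and it returns all 0s for zero. `pack_example` and `ieee_single_bits` are illustrative names.

```python
import math
import struct

BIAS = 127  # exponent bias of the example format


def pack_example(x: float) -> int:
    """Pack x as sign (bit 31) | exponent + 127 (bits 30-23) | fraction (bits 22-0).
    Sketch only: the fraction is truncated to 23 bits and exponent range checks are omitted."""
    if x == 0:
        return 0                                   # zero is written as all 0s
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                      # |x| = m * 2**e with 0.5 <= m < 1
    significand, exponent = 2 * m, e - 1           # normalize to 1.f * 2**exponent
    fraction = int((significand - 1) * (1 << 23))  # drop the hidden 1, keep 23 bits
    return (s << 31) | ((exponent + BIAS) << 23) | fraction


def ieee_single_bits(x: float) -> int:
    """Bit pattern of x as an IEEE 754 single, via struct (rounds to nearest)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(f"{pack_example(3.0):032b}")        # 01000000010000000000000000000000
print(f"{pack_example(4.6283e15):032b}")  # 01011001100000111000101101010000
```

For ordinary normalized values, the example format has the same layout as an IEEE 754 single. Exactly representable values such as 3.0 or 14.0 should therefore give the same bits as `struct`. For $4.6283 \times 10^{15}$, truncation reproduces the slide's bits, while `struct` rounds to nearest and may set the last fraction bit.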

  49. Comparison of Numbers in the Example Format
      Example: $4.6283 \times 10^{15}$ vs. 3.0
        4.6283 × 10^15:  01011001100000111000101101010000
        3.0:             01000000010000000000000000000000
      Example: 256.25 vs. 375.75
        256.25:          01000011100000000010000000000000
        375.75:          01000011101110111110000000000000
      Example: 14.0 vs. 10.0
        14.0:            01000001011000000000000000000000
        10.0:            01000001001000000000000000000000
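The comparisons can be checked by reading each 32-bit pattern as an unsigned integer. This sketch uses Python's `struct` module to produce IEEE 754 single patterns, assuming as above that they share the example format's layout for normalized numbers. The check is only meant for non-negative numbers. With sign-magnitude, two negative numbers compare in the reverse order as integers.

```python
import struct


def bits(x: float) -> int:
    """32-bit pattern of x (IEEE 754 single via struct)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def same_order_as_integers(a: float, b: float) -> bool:
    """True if comparing the bit patterns as unsigned integers agrees with comparing the values
    (intended for non-negative normalized numbers)."""
    return (bits(a) < bits(b)) == (a < b) and (bits(a) == bits(b)) == (a == b)


for a, b in [(4.6283e15, 3.0), (256.25, 375.75), (14.0, 10.0)]:
    print(f"{a}: {bits(a):032b}  {b}: {bits(b):032b}  order preserved: {same_order_as_integers(a, b)}")
```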

  50. Floating Add, FA, and Floating Subtract, FS, Procedure • Add or subtract $(s_1, e_1, f_1)$ and $(s_2, e_2, f_2)$: 1) Unpack (s, e, f); handle special operands 2) Shift the fraction of the number with the smaller exponent right by $|e_1 - e_2|$ bits 3) Set the result exponent $e_r = \max(e_1, e_2)$ 4) For FA with $s_1 = s_2$, or FS with $s_1 \ne s_2$, add the significands; otherwise subtract them 5) Count leading zeros, z; a carry can make z = -1; shift left z bits, or right 1 bit if z = -1 6) Round the result, shift right, and adjust z if rounding overflow occurs 7) $e_r \leftarrow e_r - z$; check for overflow or underflow; bias and pack
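Steps 2-5 and 7 can be sketched on unpacked triples (s, e, f), where f is a 24-bit significand that includes the hidden 1, so value = $(-1)^s \times f \times 2^{e-23}$. This is a simplified sketch under several assumptions:
- Operands are nonzero and normalized, so special operands (step 1) are not handled.
- Shifted-out bits are truncated instead of rounded (step 6 is omitted).
- Step 7 only adjusts the exponent and does no range check, biasing or packing.
- FS is done by flipping the second sign, which is equivalent to the slide's step 4 condition.
- When signs differ, the smaller magnitude is subtracted from the larger.

All helper names are hypothetical.

```python
import math


def to_sef(x: float) -> tuple[int, int, int]:
    """Unpack x into (s, e, f): sign, unbiased exponent, 24-bit significand with the hidden 1."""
    if x == 0:
        return (0, 0, 0)
    m, e = math.frexp(abs(x))                # |x| = m * 2**e, 0.5 <= m < 1
    return (1 if x < 0 else 0, e - 1, int(m * (1 << 24)))


def from_sef(t: tuple[int, int, int]) -> float:
    s, e, f = t
    return (-1) ** s * f * 2.0 ** (e - 23)


def fp_add(a: tuple[int, int, int], b: tuple[int, int, int], subtract: bool = False) -> tuple[int, int, int]:
    s1, e1, f1 = a
    s2, e2, f2 = b
    if subtract:
        s2 ^= 1                              # FS is FA with the second sign flipped
    # 2) align: the operand with the smaller exponent is shifted right by |e1 - e2|
    if e1 < e2:
        (s1, e1, f1), (s2, e2, f2) = (s2, e2, f2), (s1, e1, f1)
    f2 >>= e1 - e2                           # shifted-out bits are simply dropped here
    er = e1                                  # 3) er = max(e1, e2)
    # 4) same signs: add significands; different signs: subtract the smaller from the larger
    if s1 == s2:
        fr, sr = f1 + f2, s1
    elif f1 >= f2:
        fr, sr = f1 - f2, s1
    else:
        fr, sr = f2 - f1, s2
    if fr == 0:
        return (0, 0, 0)
    # 5) z = leading zeros above a 24-bit significand; a carry out gives z = -1
    z = 24 - fr.bit_length()
    fr = fr >> 1 if z == -1 else fr << z
    # 6) rounding omitted in this sketch
    er -= z                                  # 7) er <- er - z
    return (sr, er, fr)


print(from_sef(fp_add(to_sef(3.0), to_sef(10.0))))                  # 13.0
print(from_sef(fp_add(to_sef(14.0), to_sef(10.0), subtract=True)))  # 4.0
```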
