Floating the point

Floating the point: Real number representation in a computer

The point: character data cannot be mixed with numeric data
• The binary representation of 5.25 is 101.01, but we cannot put the ASCII code for the point (00101110) inside the number: the result, 1010010111001, would look like 5305 and, in general, recovering the correct fractional number would be subject to unpredictable errors.
• The ASCII code for - is 00101101, and mixing this character data with numeric data would also cause confusion.
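The splice described above can be checked directly. This is only an illustrative sketch; the variable names are not from the slides.

```python
# 8-bit ASCII codes for the point and the minus sign
point_bits = format(ord("."), "08b")   # '00101110'
minus_bits = format(ord("-"), "08b")   # '00101101'

# Splice the code for the point between "101" and "01" (binary 101.01 = 5.25)
spliced = "101" + point_bits + "01"

# Read as a plain binary integer, the spliced string no longer means 5.25
print(spliced, int(spliced, 2))        # 1010010111001 5305
```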

Fractional arithmetic is exact and does not need a decimal point
The use of the decimal or binary point can be avoided if all computations are done using fractions. Using fractions means we must have two numbers stored for each fraction, but that is not a problem. The problem is with the size of these numbers.

Fractional arithmetic and decimal expansion
• Fractional arithmetic remains exact as long as all the digits of both numerator and denominator can be stored. Unfortunately, they become very large in any computation with more than a few steps.
• It is difficult to compare the relative sizes of fractions such as 2189543/3239442 (= 0.6759013…) and 7643392/11232219 (= 0.6804882…).
• Many functions in scientific work (trigonometric, exponential, etc.) generate irrational numbers, which do not have fractional representations.
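A small experiment with Python's standard `fractions` module shows both points: exact fractions stay exact, but their digits grow quickly. The harmonic sum used here is just a convenient example, not something taken from the slides.

```python
from fractions import Fraction

def harmonic(n):
    """Exact sum 1/1 + 1/2 + ... + 1/n as a fraction in lowest terms."""
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(1, k)
    return total

# The number of digits needed grows with the number of steps
for n in (5, 10, 20, 40):
    h = harmonic(n)
    print(n, len(str(h.numerator)), len(str(h.denominator)))

# Comparing the two fractions from the slide needs exact cross-multiplication
# (done internally by Fraction) or a decimal expansion
a = Fraction(2189543, 3239442)
b = Fraction(7643392, 11232219)
print(a < b, float(a), float(b))
```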

Locating the binary point
• Numbers will have to be stored in their binary form somehow, but what do we do about the binary point and possible minus signs?
• When storing fractional numbers in a computer, the position of the binary point will have to be standardized somehow, and negative numbers will also have to be managed without introducing non-numeric characters like -.

Floating Point
• To store fractional numbers in a computer, it is agreed that the binary point should be floated to the top so that every number looks like:
• $0.1\ldots_2 \times 2^{e}$ [binary to the left of ×, decimal to the right]
• Thus 5.25, which in binary is 101.01, is written as $0.10101_2 \times 2^{3}$. Multiplying by $2^{3}$ ensures that the correct place values can be restored from a knowledge of the bit string 10101 (called the mantissa, or the significand) and the number 3 (called the exponent, e), which is stored in binary form (11).
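Python's standard `math.frexp` happens to use the same convention (a mantissa between 0.5 and 1 multiplied by a power of 2), so it can be used to check the example. This is a quick sketch, not part of the original slides.

```python
import math

# frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1
m, e = math.frexp(5.25)
print(m, e)              # 0.65625 3

# 0.10101 in binary is 1/2 + 1/8 + 1/32
print(1/2 + 1/8 + 1/32)  # 0.65625
```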

Normalised Binary Exponential Form
• General form:
• $\pm m \times 2^{n}$ where $0.1_2 \le m < 1$, $n \in \mathbb{Z}$
• m is the mantissa expressed in base 2. In a computer the binary digits (bits) of the mantissa (usually infinitely many) will be truncated to fit the space available for storage.
• n is the exponent and is an integer.

Normalised Decimal Exponential Form might remind you of scientific notation
• General form:
• $\pm m \times 10^{n}$ where $0.1 \le m < 1$, $n \in \mathbb{Z}$
• m is the mantissa. In scientific measurements the number of digits in the mantissa indicates the accuracy or significance of the measurement.
• n is the exponent and is an integer. The exponent indicates the appropriate scale or range for the measurement.

Negative exponents
• A binary number like 0.0000010010110101010110101 will be represented as
• $0.10010110101010110101_2 \times 2^{-5}$
• (this is called the normalized exponential form)
• The exponent is negative.

Conversion to Normalised Form
• $110.011_2 = 0.110011_2 \times 2^{3}$
• $0.001111_2 = 0.1111_2 \times 2^{-2}$
• $5439.27_{10}$: $5439 = 4096 + 1343 = 2^{12} + 1024 + 319 = 2^{12} + 2^{10} + 256 + 63 = 2^{12} + 2^{10} + 2^{8} + 111111_2 = 1010100111111_2$, so $5439.27_{10} = 1010100111111.010\ldots_2 = 0.1010100111111010\ldots_2 \times 2^{13}$
• $0.00231006_{10} = 0.0000000010010111\ldots_2 = 0.100101110110010001\ldots_2 \times 2^{-8}$
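The conversions above can be reproduced with a short routine. This is a minimal sketch that assumes a positive input and truncates (rather than rounds) the mantissa; the function name `normalise` is hypothetical.

```python
from fractions import Fraction

def normalise(x, bits):
    """Return (mantissa_bits, n) so that x is about 0.b1b2... (base 2) times 2**n.

    The mantissa is truncated to `bits` binary digits. Only positive x is handled.
    """
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    n = 0
    while x >= 1:                # shift the point left
        x /= 2
        n += 1
    while x < Fraction(1, 2):    # shift the point right
        x *= 2
        n -= 1
    digits = []
    for _ in range(bits):        # peel off one binary digit at a time
        x *= 2
        bit = int(x)             # 0 or 1
        digits.append(str(bit))
        x -= bit
    return "".join(digits), n

print(normalise(Fraction("6.375"), 6))       # 110.011 in binary
print(normalise(Fraction(15, 64), 4))        # 0.001111 in binary
print(normalise(Fraction("5439.27"), 16))
print(normalise(Fraction("0.00231006"), 18))
```

Exact fractions are used instead of floats so that the digits are not affected by the very approximation the slides are describing.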

Truncating the mantissa
• The normal form of the binary number 110111010.0011001001101001011… is
• $0.110111010\,0011001001101001011\ldots_2 \times 2^{9}$
• If there are only 18 bits available for storing the mantissa, then the number stored will be $0.110111010\,001100100_2 \times 2^{9}$

There are few finite expansions
• More fractions in a base 60 system have finite expansions than in a decimal system, for example: 1/2 = 0.(30), 1/3 = 0.(20)
• In a base 2 system, the only fractions with a finite expansion are of the form $n/2^{k}$. All other fractions have repeating expansions.
• Irrational numbers (like π) have infinite expansions in any base system. In a computer they must necessarily be approximated.
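The $n/2^{k}$ rule is easy to test: reduce the fraction to lowest terms and check whether the denominator is a power of 2. The helper name below is hypothetical, and the check is a sketch rather than a library routine.

```python
from fractions import Fraction

def has_finite_binary_expansion(q):
    """True if q, in lowest terms, has a power of 2 as its denominator."""
    d = Fraction(q).denominator      # Fraction always stores lowest terms
    return (d & (d - 1)) == 0        # a power of 2 has exactly one bit set

print(has_finite_binary_expansion(Fraction(21, 4)))   # 5.25 -> True
print(has_finite_binary_expansion(Fraction(1, 10)))   # 0.1  -> False
print(has_finite_binary_expansion(Fraction(1, 3)))    # False

# The float 0.1 is therefore only an approximation of one tenth
print(Fraction(0.1) == Fraction(1, 10))               # False
```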

Negative numbers • A single bit could be used to indicate whether a number is positive or negative, but exponents are always integers and they must not be truncated. It is important that the indices of these powers be exact and that when they are added or multiplied the result should be exact. • Consequently the computer representation of integers is different to that of floating point representations of real numbers. • [We will return to the representation of integers later]

Storing the exponent
• To avoid negative exponents, we add a fixed number to all exponents (we call it the exponent bias) so that the results are all positive.
• The resulting number is called the characteristic and, if we have k bits available for storing the characteristic, the exponent bias is usually $2^{k-1} - 1$.
• In binary form it is the k-bit numeral 01111...1111, for example: $2^{4-1} - 1 = 7_{10} = 0111_2$ (in 4 bits) and $2^{5-1} - 1 = 15_{10} = 01111_2$ (in 5 bits)

The characteristic
• The characteristic is
• $n + 2^{k-1} - 1$
• where n is the exponent and k is the number of bits available for the characteristic.
• The range of exponents is limited by the number of bits available for storing the characteristic, but, since we know the exponent bias, we can recover the value of the exponent from any characteristic (and no negative numbers are involved in storage).
• In particular, we know that all positive exponents will have a characteristic with 1 in the first bit, and all negative exponents (as well as the exponent 0) will have 0 in the first bit of the characteristic.
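The bias and characteristic can be sketched in a few lines. The function names are illustrative assumptions; only the formula $n + 2^{k-1} - 1$ comes from the slides.

```python
def bias(k):
    """Exponent bias for a k-bit characteristic: 2**(k-1) - 1."""
    return 2 ** (k - 1) - 1

def characteristic(n, k):
    """k-bit string for exponent n; fails if n does not fit in k bits."""
    c = n + bias(k)
    if not 0 <= c < 2 ** k:
        raise OverflowError("exponent out of range for this many bits")
    return format(c, f"0{k}b")

def exponent(bits):
    """Recover the exponent from a characteristic bit string."""
    return int(bits, 2) - bias(len(bits))

print(bias(4), bias(5))              # 7 15
print(characteristic(-2, 6))         # 011101
print(characteristic(6, 6))          # 100101
print(exponent("000000"), exponent("111111"))   # -31 32
```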

Representation of Real Numbers
• Layout: Sign bit | Characteristic [k bits] | Mantissa [n-k-1 bits]
• Using a 16-bit length with a characteristic of 6 bits and an exponent bias of $2^{5} - 1$, find the representation of $0.001001_2$
• sign bit: 0
• normalized form: $0.1001_2 \times 2^{-2}$
• characteristic [6 bits]: $2^{5} - 1 + (-2) = 11111_2 - 10_2 = 11101_2$, stored as 011101
• mantissa [9 bits]: 100100000
• 16-bit representation: 0 | 011101 | 100100000 = 0011 1011 0010 0000

Representation of Real Numbers
• Layout: Sign bit | Characteristic [k bits] | Mantissa [n-k-1 bits]
• Using a 16-bit length with a characteristic of 6 bits and an exponent bias of $2^{5} - 1$, give the representation of -44.36
• $-44.36_{10} = -101100.0101110\ldots_2 = -0.1011000101110\ldots_2 \times 2^{6}$
• sign bit: 1
• characteristic: $2^{5} - 1 + 6 = 2^{5} + 5 = 100000_2 + 101_2 = 100101_2$
• mantissa [9 bits]: 101100010
• 16-bit representation: 1 | 100101 | 101100010 = 1100 1011 0110 0010
• The number actually stored is -44.25
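Both worked examples can be reproduced with a small encoder and decoder for this 16-bit teaching format. This is a sketch under the slides' assumptions (1 sign bit, 6-bit characteristic, 9-bit truncated mantissa); the names `encode` and `decode` are hypothetical, and zero is not representable in this normalised scheme.

```python
from fractions import Fraction

K = 6                      # bits for the characteristic
M = 9                      # bits for the mantissa (16 - 6 - 1)
BIAS = 2 ** (K - 1) - 1    # exponent bias, 31

def encode(x):
    """Return the 16-bit string for x, truncating the mantissa."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero has no normalised form")
    sign = "1" if x < 0 else "0"
    x = abs(x)
    n = 0
    while x >= 1:              # normalise so that 1/2 <= x < 1
        x /= 2
        n += 1
    while x < Fraction(1, 2):
        x *= 2
        n -= 1
    c = n + BIAS
    if c < 0:
        raise OverflowError("underflow")
    if c >= 2 ** K:
        raise OverflowError("overflow")
    mantissa = int(x * 2 ** M)   # keep the first M bits, drop the rest
    return sign + format(c, f"0{K}b") + format(mantissa, f"0{M}b")

def decode(bits):
    """Return the exact value stored in a 16-bit string."""
    sign = -1 if bits[0] == "1" else 1
    n = int(bits[1:1 + K], 2) - BIAS
    m = Fraction(int(bits[1 + K:], 2), 2 ** M)
    return sign * m * Fraction(2) ** n

print(encode(Fraction(9, 64)))            # 0.001001 in binary
word = encode(Fraction("-44.36"))
print(word, float(decode(word)))
```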

The smallest floating point number
• The smallest positive number in normal form is $0.1_2 \times 2^{n}$, where n is the exponent that makes the characteristic 0.
• In 16-bit numerals with 6 bits for the characteristic, the smallest entry is: 0 000000 100000000
• If the characteristic is 0 then the exponent is $n = 0 - (2^{5} - 1) = -2^{5} + 1 = -32 + 1 = -31$
• The mantissa is $0.1_2 = 2^{-1}$, so the smallest number is $0.1_2 \times 2^{-31} = 2^{-1} \times 2^{-31} = 2^{-32} = 2.328306437 \times 10^{-10}$
• The smallest positive number is approximately $0.2328 \times 10^{-9}$

The largest floating point number
• The largest entry possible is (sign | characteristic | mantissa): 0 111111 111111111
• The characteristic is $2^{6} - 1$, so the exponent is $(2^{6} - 1) - (2^{5} - 1) = 2^{6} - 2^{5} = 2^{5}(2 - 1) = 2^{5} = 32$
• The mantissa represents $0.111111111_2 = 111111111_2 \times 2^{-9} = (2^{9} - 1) \times 2^{-9} = 1 - 2^{-9} = 0.998046875 \approx 1$
• The largest number is therefore $(1 - 2^{-9}) \times 2^{32} = 2^{23}(2^{9} - 1) = 0.4286578688 \times 10^{10} = 4{,}286{,}578{,}688$
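The two extremes of this 16-bit format can be computed exactly with fractions. The constants below restate the slides' assumptions (6-bit characteristic, 9-bit mantissa); the variable names are illustrative.

```python
from fractions import Fraction

K, M = 6, 9
BIAS = 2 ** (K - 1) - 1                      # 31

# Characteristic 000000, mantissa 100000000
smallest = Fraction(1, 2) * Fraction(2) ** (0 - BIAS)

# Characteristic 111111, mantissa 111111111
largest = Fraction(2 ** M - 1, 2 ** M) * Fraction(2) ** ((2 ** K - 1) - BIAS)

print(smallest == Fraction(1, 2 ** 32), float(smallest))   # about 2.328e-10
print(int(largest))                                        # 4286578688
```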

Converting $2^{n}$ to $10^{x}$
• If $2^{32} = 10^{x}$ then $x = \log_{10} 2^{32} = 32 \times \log_{10} 2 = 32 \times 0.301029995\ldots = 9.632959861$
• $10^{x} = 10^{9.632959861} = 10^{0.632959861} \times 10^{9} = 4.294967296 \times 10^{9} \approx 0.4294 \times 10^{10}$
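The same conversion, done with the standard `math` module as a quick check of the arithmetic (a sketch; floating point results are themselves approximate):

```python
import math

x = 32 * math.log10(2)       # exponent x with 2**32 == 10**x
whole = math.floor(x)        # power of ten, 9
frac = x - whole             # remaining fractional exponent

print(x)                     # about 9.632959861
print(10 ** frac, whole)     # about 4.294967296, and 9
print(2 ** 32)               # 4294967296
```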

Range of Real Numbers
• Similarly, negative numbers range from $-0.4294 \times 10^{10}$ to $-0.2328 \times 10^{-9}$
• Number line: overflow | $-0.4294 \times 10^{10}$ … $-0.2328 \times 10^{-9}$ | underflow | $0.2328 \times 10^{-9}$ … $0.4294 \times 10^{10}$ | overflow
• Overflow will occur if $x < -0.4294 \times 10^{10}$ or $x > 0.4294 \times 10^{10}$
• Underflow will occur if $-0.2328 \times 10^{-9} < x < 0.2328 \times 10^{-9}$

Real Number Addition (using 4 digit mantissa)
• Addition must be of terms of the same scale:
• $0.2361 \times 10^{2} + 0.1455 \times 10^{4}$
• $= 0.002361 \times 10^{4} + 0.1455 \times 10^{4}$ {both $10^{4}$}
• $= (0.002361 + 0.1455) \times 10^{4}$
• $= 0.147861 \times 10^{4}$
• $\approx 0.1478 \times 10^{4}$ {4 digit mantissa}

Real Number Multiplication (using 4 digit mantissa)
• In multiplication the problem is in the mantissa:
• $(0.2361 \times 10^{2}) \times (0.1455 \times 10^{4})$
• $= 0.2361 \times 0.1455 \times 10^{2+4}$ {add indices}
• $= 0.03435255 \times 10^{6} = 0.3435255 \times 10^{5}$
• $\approx 0.3435 \times 10^{5}$ {4 digit mantissa}
• Notice that multiplication must work from the largest digit downwards, since at some point the number is going to have to be truncated.
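The addition and multiplication rules for a 4 digit decimal mantissa can be sketched with Python's `decimal` module. Numbers are held as hypothetical (mantissa, exponent) pairs, and results are truncated rather than rounded, as in the worked examples.

```python
from decimal import Decimal, ROUND_DOWN

TEN = Decimal(10)

def normalise(m, e):
    """Bring m into 0.1 <= m < 1, then truncate to 4 digits."""
    while m >= 1:
        m /= TEN
        e += 1
    while m != 0 and m < Decimal("0.1"):
        m *= TEN
        e -= 1
    return m.quantize(Decimal("0.0001"), rounding=ROUND_DOWN), e

def add(a, b):
    """Add two (mantissa, exponent) pairs after aligning to the larger scale."""
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    m = ma * TEN ** (ea - e) + mb * TEN ** (eb - e)
    return normalise(m, e)

def multiply(a, b):
    """Multiply the mantissas and add the indices."""
    (ma, ea), (mb, eb) = a, b
    return normalise(ma * mb, ea + eb)

p = (Decimal("0.2361"), 2)
q = (Decimal("0.1455"), 4)
print(add(p, q))        # mantissa 0.1478, exponent 4
print(multiply(p, q))   # mantissa 0.3435, exponent 5
```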

What’s the big idea?
• The normalized binary exponential form floats the binary point to the top and removes the need for storing the marker as such.
• The exponent bias and the characteristic ensure that no negatives appear.
• Rounding off the mantissa and the limited range for the exponent mean that the range of representable numbers is limited: there is a largest and a smallest modulus number that can be represented.
