5. REAL NUMBER TO FLOATING POINT IN IEEE 754 FORMAT
Types of real numbers

Fixed point:
(5,31)_10
(11,01)_2
(3B,7C)_16

Floating point:
(5,31)_10 = (53,1 * 10^-1)_10
          = (531 * 10^-2)_10
          = (0,531 * 10^1)_10
(11,01)_2 = (1,101 * 2^1)_2
          = (0,1101 * 2^2)_2
          = (110,1 * 2^-1)_2
IEEE 754 Format (32 bits)
SIGN BIT    CHARACTERISTIC    MANTISSA
1 bit       8 bits            23 bits
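The 1 / 8 / 23 split above can be sketched with shifts and masks. This is only an illustration: the helper name `split_fields` and the sample word `0x40F40000` are assumptions chosen for the example.

```python
def split_fields(word: int) -> tuple[int, int, int]:
    """Split a 32-bit word into (sign, characteristic, mantissa)."""
    sign = (word >> 31) & 0x1             # top bit
    characteristic = (word >> 23) & 0xFF  # next 8 bits
    mantissa = word & 0x7FFFFF            # low 23 bits
    return sign, characteristic, mantissa


s, c, m = split_fields(0x40F40000)
print(s, format(c, "08b"), format(m, "023b"))
# 0 10000001 11101000000000000000000
```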
Representation of X in FP
Let X = (7,625)_10 be a real number. To represent it in
floating point (FP) format, you need to:
1. Convert X to binary
X = (7,625)_10 = (111,101)_2
2. Shift the binary point so that only a single 1 remains in
the integer part:
X = ± 1,M * 2^e
X = (111,101)_2
X = 1,11101 * 2^2
3. Calculate the sign, characteristic and mantissa
Sign (S): 1 bit
X = 7,625, X > 0, so S = 0
Characteristic (C): 8 bits
C = e + 127
X = 1,11101 * 2^2
e = 2, so C = 2 + 127 = (129)_10 = (10000001)_2
Mantissa (M): X = ± 1,M * 2^e
M = 11101 (5 bits)
M = 11101000000000000000000 (23 bits, padded with zeros on the right)
4. The floating point representation of X:
0 10000001 11101000000000000000000
S C        M
5. Condensed form in hexadecimal
0100 0000 1111 0100 0000 0000 0000 0000
4    0    F    4    0    0    0    0
X = 40F40000
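The five steps above can be sketched in Python as follows. This is a minimal sketch, not a full encoder: the function name `encode_ieee754` is an assumption, and the code assumes a nonzero value that fits exactly in 23 mantissa bits (no rounding, zero, subnormal or infinity handling). Python writes the decimal separator as a dot. The standard `struct` module is used only as a cross-check.

```python
import struct


def encode_ieee754(x: float) -> int:
    """Encode x as a 32-bit word by following the steps above (sketch)."""
    sign = 0 if x > 0 else 1
    m = abs(x)
    e = 0
    # Step 2: shift the point until the integer part is exactly 1.
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    # Step 3: characteristic and the 23 bits after the point.
    characteristic = e + 127
    mantissa = int((m - 1) * 2**23)
    # Step 4: assemble S | C | M.
    return (sign << 31) | (characteristic << 23) | mantissa


word = encode_ieee754(7.625)
print(format(word, "08X"))  # 40F40000

# Cross-check against the machine's own single-precision encoding.
assert word == struct.unpack(">I", struct.pack(">f", 7.625))[0]
```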
CONVERSION OF A NUMBER X REPRESENTED
IN FLOATING POINT (FP) TO DECIMAL
Let X = C1E90000 be a number represented in FP. To
calculate its decimal value, we need to:
1. Convert X to binary
X = 1100 0001 1110 1001 0000 0000 0000 0000
2. Give the FP representation
1 10000011 11010010000000000000000
S C        M
3. Give X in the form X = ± 1,M * 2^e
S = 1 => X < 0
C = 10000011 = (131)_10
C = e + 127, so e = C - 127 => e = 131 - 127
e = 4
M = 11010010000000000000000
X = - 1,1101001 * 2^4
4. Give X as a decimal
X = - (11101,001)_2
X = - (29,125)_10
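The same four steps can be sketched in Python by working on the bit string directly. The function name `decode_ieee754` is an assumption, and the sketch only covers normalised numbers (C different from 00000000 and 11111111).

```python
def decode_ieee754(hex_word: str) -> float:
    """Decode an 8-digit hexadecimal FP word by following the steps above."""
    # Step 1: hexadecimal to a 32-character binary string.
    bits = format(int(hex_word, 16), "032b")
    # Step 2: split into S (1 bit), C (8 bits), M (23 bits).
    s, c, m = bits[0], bits[1:9], bits[9:]
    # Step 3: e = C - 127, significand = 1,M.
    e = int(c, 2) - 127
    significand = 1 + int(m, 2) / 2**23
    # Step 4: apply the exponent and the sign.
    value = significand * 2**e
    return -value if s == "1" else value


print(decode_ieee754("C1E90000"))  # -29.125
```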
ADDITION OF 2 NUMBERS IN FP
To add 2 floating-point numbers, they must first be written
with the same exponent.
Example:
A = 40D90000 and B = 3E9A0000 in FP
Perform A + B
Decode A (form ± 1,M * 2^e)
A = 40D90000
0100 0000 1101 1001 0000 0000 0000 0000
0 10000001 10110010000000000000000
S C        M
S = 0, so A > 0
C = 10000001 = (129)_10 => e = 129 - 127 => e = 2
M = 10110010000000000000000
A = 1,1011001 * 2^2
Decode B (form ± 1,M * 2^e)
B = 3E9A0000
0011 1110 1001 1010 0000 0000 0000 0000
0 01111101 00110100000000000000000
S C        M
S = 0, so B > 0
C = 01111101 = (125)_10 => e = 125 - 127 => e = -2
M = 00110100000000000000000
B = 1,001101 * 2^-2
Aligning the exponents
A = 1,1011001 * 2^2
B = 1,001101 * 2^-2
We bring B to the same exponent as A by moving the
point 4 positions to the left.
B = 0,0001001101 * 2^2
Addition of A and B
  1,1011001000 * 2^2
+ 0,0001001101 * 2^2
= 1,1100010101 * 2^2
Put the result A + B = 1,1100010101 * 2^2 back in IEEE form
0 10000001 11000101010000000000000
S C        M
C = 2 + 127 = (129)_10 = (10000001)_2
Give the condensed form in hexadecimal
0100 0000 1110 0010 1010 0000 0000 0000
A + B = 40E2A000
Note
If the integer part of the result is greater than 1, shift the point one
place to the left and increase the exponent by 1.
Example: A + B = 10,11001101 * 2^2 => A + B = 1,011001101 * 2^3
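The addition procedure, including the renormalisation described in the note, can be sketched on the raw 32-bit words. This is a simplified illustration under stated assumptions: the function name `add_fp` is hypothetical, both operands are positive normalised numbers, and bits shifted out during alignment are simply dropped (real hardware rounds them).

```python
def add_fp(a: int, b: int) -> int:
    """Add two positive, normalised 32-bit FP words (sketch, no rounding)."""
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    # Restore the implicit leading 1 to obtain 24-bit significands 1,M.
    ma = (a & 0x7FFFFF) | 0x800000
    mb = (b & 0x7FFFFF) | 0x800000
    # Make (ea, ma) the operand with the larger exponent.
    if ea < eb:
        ea, eb, ma, mb = eb, ea, mb, ma
    # Align: move the point of the smaller operand to the left.
    mb >>= ea - eb
    total = ma + mb
    # Note above: integer part greater than 1, so shift and increment e.
    if total >= 1 << 24:
        total >>= 1
        ea += 1
    # Drop the leading 1 again and reassemble (sign bit is 0).
    return (ea << 23) | (total & 0x7FFFFF)


print(format(add_fp(0x40D90000, 0x3E9A0000), "08X"))  # 40E2A000
```

Adding A to itself exercises the carry case: `add_fp(0x40D90000, 0x40D90000)` gives 0x41590000, which is the same mantissa with the exponent increased by 1.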