Floating Point Tutorial | IEEE 754 Floating Point basics | Tutorials
RF Wireless World
Floating Point Tutorial
This floating point tutorial covers IEEE 754 standard
floating point numbers, floating point conversions (decimal
to IEEE 754 standard floating point, and IEEE 754 standard
floating point to decimal), and floating point arithmetic:
the IEEE 754 standard floating point multiplication algorithm,
the floating point addition algorithm with example, the
floating point division algorithm with example, and more.
IEEE 754 Standard Floating
Point Numbers
This tutorial attempts to provide a brief overview of the IEEE
floating point number format with the help of simple
examples, without going too much into mathematical detail
and notation. At the end of this tutorial we should know
what floating point numbers are and how their basic
arithmetic operations, such as addition, multiplication &
division, work. An IEEE 754 standard floating point binary word
consists of a sign bit, an exponent, and a mantissa, as shown in
the figure below. An IEEE 754 single precision floating point
number consists of 32 bits, of which
1 bit = sign bit (s)
8 bits = biased exponent (e)
23 bits = mantissa (m)
Sign bit | Exponent  | Mantissa
(1) bit  | (8) bits  | (23) bits
0        | 1000 0001 | 0101 0000 0000 0000 0000 000
Fig 1: IEEE 754 Floating point standard floating point word
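As a minimal sketch of the field layout in Fig 1, the following Python snippet splits a 32 bit word into its three fields with shifts and masks. The helper names `split_fields` and `float_to_word` are illustrative and not part of the tutorial; the standard `struct` module is used only to obtain the bit pattern that Python stores for a single precision value.

```python
import struct

def split_fields(word: int):
    """Split a 32-bit single precision word into (sign, exponent, mantissa)."""
    sign = (word >> 31) & 0x1        # 1 sign bit
    exponent = (word >> 23) & 0xFF   # 8 biased exponent bits
    mantissa = word & 0x7FFFFF       # 23 mantissa bits (hidden "1" is not stored)
    return sign, exponent, mantissa

def float_to_word(x: float) -> int:
    """Return the 32-bit pattern of x stored as IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# The word from Fig 1: 0 | 1000 0001 | 0101 0000 0000 0000 0000 000
fig1_word = int("0" + "10000001" + "0101" + "0" * 19, 2)
sign, exponent, mantissa = split_fields(fig1_word)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
```

The same split applied to `float_to_word(5.25)` gives the same three fields, since 5.25 is the value the Fig 1 word encodes.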
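The decimal value of a normalized floating point number in
IEEE 754 standard is represented as
$$\text{Decimal value} = (-1)^{s} \times 1.m \times 2^{(e - \text{bias})}$$
$$\text{bias} = 2^{(e-1)} - 1$$
(in the bias formula, e is the number of exponent bits)
Fig 2: Equation-1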
Sign bit | Exponent  | Mantissa
(1) bit  | (8) bits  | (23) bits
0        | 1000 0001 | 0101 0000 0000 0000 0000 000
"1" is not represented in floating point
representation, it is hidden
Fig 3
Note: "1" is hidden in the representation of an IEEE 754
floating point word, since storing it would take up an extra
bit location that can be avoided. It is understood that we
need to prepend the "1" to the mantissa of a floating point
word for conversions and calculations.
For example, in Fig 1 above the mantissa represented is
0101_0000_0000_0000_0000_000; the actual value is
(1.mantissa) = 1.0101_0000_0000_0000_0000_000.
To make Equation-1 clearer, let's take the example in
Fig 1, write the floating point binary word in the form of
the equation, and convert it to its equivalent decimal
value.
Floating point binary word X1 =
     S1 | E1        | M1
X1 = 0  | 1000 0001 | 0101 0000 0000 0000 0000 000
Fig 4
Sign bit (S1) = 0. Biased exponent (E1) = 1000_0001 (2) =
129 (10). Mantissa (M1)
= 0101_0000_0000_0000_0000_000
Decimal value = (-1)^0 x 1.0101 0000 0000 0000 0000 000 x 2^(129-127)
= 1.0101 0000 0000 0000 0000 000 x 2^2
= 101.01 0000 0000 0000 0000 000 (binary)
= (1x2^2) + (0x2^1) + (1x2^0) . (0x2^-1) + (1x2^-2) +
(0x2^-3) + (0x2^-4) ...
= (4 + 0 + 1 + 0 + 0.25)
= 5.25
Fig 5
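The evaluation of Equation-1 can be sketched in a few lines of Python. This is an illustration only: the function name `decode_normalized` is hypothetical, and it assumes a normalized single precision number (biased exponent between 1 and 254), so zero, denormals, infinity and NaN are not handled.

```python
BIAS = 127  # single precision bias

def decode_normalized(sign: int, exponent: int, mantissa: int) -> float:
    """Evaluate (-1)^s x 1.m x 2^(e - bias) for a normalized single precision word."""
    significand = 1 + mantissa / 2**23   # put the hidden "1" in front of the 23 stored bits
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)

# S1 = 0, E1 = 1000_0001, M1 = 0101 followed by zeros
print(decode_normalized(0, 0b10000001, 0b0101 << 19))  # 5.25
```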
IEEE 754 standard floating
point conversions
Let's look at an example of decimal to IEEE 754 floating
point conversion and of IEEE 754 floating point to
decimal conversion. This will make the concepts
and notation of floating point numbers much clearer.
Decimal to IEEE 754 standard
floating point
Let's take a decimal number, say 286.75, and represent it in
IEEE floating point format (single precision, 32 bit). We
need to find the sign, exponent and mantissa bits.
1) Represent the decimal number 286.75 (10) in binary
format:
286.75 (10) = 100011110.11 (2)
2) The binary number is not normalized. Normalize it by
shifting the binary point so that a single 1 is left in
front of it (i.e. 1.m form).
100011110.11 x 2^0
10001111.011 x 2^1
1000111.1011 x 2^2
100011.11011 x 2^3
10001.111011 x 2^4
1000.1111011 x 2^5
100.01111011 x 2^6
10.001111011 x 2^7
1.0001111011 x 2^8
Hidden one = 1, mantissa = 0001111011 (1.M format)
Fig 6
We had to shift the binary point left 8 times to normalize the
number, so the exponent value (8) must be added to the bias. The
bits to the right of the binary point give the value of the
mantissa.
Note: In floating point numbers the mantissa is treated as a
fractional fixed point binary number. Normalization is the
process in which the mantissa bits are shifted either right or
left (incrementing or decrementing the exponent accordingly)
such that the most significant bit is "1".
3) $$\text{Bias} = 2^{(e-1)} - 1$$
In our case e = 8 (exponent bits in IEEE 754 single precision format)
$$\text{Bias} = 2^{(8-1)} - 1 = 127$$
(This is the bias value for single precision IEEE floating
point format).
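The bias formula is easy to check numerically. In the sketch below the function name `bias` is illustrative; the 11 bit exponent width used for double precision comes from the IEEE 754 standard rather than from this tutorial.

```python
def bias(exponent_bits: int) -> int:
    """Exponent bias for a format with the given number of exponent bits."""
    return 2 ** (exponent_bits - 1) - 1

print(bias(8))   # single precision: 127
print(bias(11))  # double precision (11 exponent bits): 1023
```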
4) The biased exponent E is the exponent value obtained
after normalization in step 2 plus the bias: E = 8 +
127 = 135 (10). Converting this to binary gives our
exponent value E = 10000111 (2).
5) We have our floating point number equivalent to 286.75
Sign bit | Exponent  | Mantissa
(1) bit  | (8) bits  | (23) bits
0        | 1000 0111 | 0001 1110 1100 0000 0000 000
Fig 7
With the above example of decimal to floating point
conversion, it should be clear what the mantissa, the
exponent & the bias are.
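The conversion steps above can be sketched as a short Python function. This is an illustration under stated assumptions, not a full encoder: the name `encode_single` is hypothetical, the input must be finite and non-zero, the result must be a normalized number, and extra mantissa bits are truncated rather than rounded. `fractions.Fraction` is used so that the repeated shifts are exact.

```python
import struct
from fractions import Fraction

BIAS = 127

def encode_single(x: float) -> int:
    """Encode a finite, non-zero value as a 32-bit word by explicit normalization."""
    sign = 1 if x < 0 else 0
    value = Fraction(abs(x))
    exponent = 0
    while value >= 2:          # shift the binary point left, increment the exponent
        value /= 2
        exponent += 1
    while value < 1:           # shift the binary point right, decrement the exponent
        value *= 2
        exponent -= 1
    mantissa = int((value - 1) * 2**23)   # drop the hidden one, keep 23 bits (truncated)
    biased = exponent + BIAS
    return (sign << 31) | (biased << 23) | mantissa

word = encode_single(286.75)
print(format(word, "032b"))
# Cross-check against the pattern Python's struct module produces
print(word == struct.unpack(">I", struct.pack(">f", 286.75))[0])
```

For 286.75 the loop shifts 8 times, so the biased exponent is 135 and the stored mantissa starts with 0001111011, matching Fig 7.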
IEEE 754 standard floating
point to Decimal
conversion
Let's reverse the above process and convert the
floating point word obtained above back to decimal. We have
already done this in section 1, but for a different value.
Decimal num = 1.0001_1110_1100_0000_0000_000 x 2^(e - bias)
= 1.0001_1110_1100_0000_0000_000 x 2^(135-127)
= 1.0001_1110_1100_0000_0000_000 x 2^(8)
= 100011110.110000000000000 (binary fraction)
= 286.75 (10)
Fig 8
IEEE 754 standard floating
point Arithmetic
Let us look at the multiplication, addition, subtraction &
division algorithms performed on IEEE 754 floating point
standard numbers. We consider the IEEE 754 floating point
format numbers X1 & X2 for our calculations.
     S1 (Sign bit) | E1 (Exponent) | M1 (Mantissa)
X1 = (1) bit       | (8) bits      | (23) bits
     S2 (Sign bit) | E2 (Exponent) | M2 (Mantissa)
X2 = (1) bit       | (8) bits      | (23) bits
Fig 9
IEEE 754 standard Floating
point multiplication Algorithm
A brief overview of the floating point multiplication algorithm
for the numbers X1 and X2 is given below.
Result X3 = X1 * X2
$$X3 = (-1)^{S1} (M1 \times 2^{E1}) \times (-1)^{S2} (M2 \times 2^{E2})$$
S1, S2 => Sign bits of numbers X1 & X2.
E1, E2 => Exponent bits of numbers X1 & X2.
M1, M2 => Mantissa bits of numbers X1 & X2.
1) Check if one/both operands = 0 or infinity. Set the result
to 0 or inf, i.e. exponents = all "0" or all "1".
2) S1, the sign bit of the multiplicand, is XOR'd with
S2, the sign bit of the multiplier. The result is put into the
resultant sign bit.
3) The mantissas of the two operands (M1 and M2)
are multiplied and the result is placed in the mantissa field
of the result (truncate/round the result to 24 bits).
= M1 * M2
4) The exponents of the two operands (E1 and E2)
are added and the bias value is subtracted from
the sum. The result is put in the
exponent field of the result block.
= E1 + E2 - bias
5) Normalize the product, either shifting right and incrementing
the exponent or shifting left and decrementing the
exponent.
6) Check for underflow/overflow. On overflow set the output
to infinity; on underflow set it to zero.
7) If (E1 + E2 - bias) is greater than or equal to Emax then
set the product to infinity.
8) If (E1 + E2 - bias) is less than or equal to Emin then set
the product to zero.
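The steps above can be sketched in Python on (sign, biased exponent, mantissa) triples. This is a simplified illustration, not a complete IEEE 754 multiplier: the name `fp_multiply` is hypothetical, the mantissa is truncated rather than rounded, denormals are treated as zero, NaN is not handled (including the zero times infinity case), and Emin = 0 and Emax = 255 are assumed to be the reserved exponent field values.

```python
BIAS = 127
E_MIN, E_MAX = 0, 255   # assumed: reserved exponent field values for zero and infinity

def fp_multiply(s1, e1, m1, s2, e2, m2):
    """Multiply two single precision numbers given as (sign, exponent, mantissa) fields."""
    s3 = s1 ^ s2                              # step 2: XOR the sign bits
    if e1 == E_MIN or e2 == E_MIN:            # step 1: a zero operand gives zero
        return (s3, 0, 0)
    if e1 == E_MAX or e2 == E_MAX:            # step 1: an infinite operand gives infinity
        return (s3, E_MAX, 0)
    sig1 = (1 << 23) | m1                     # restore the hidden one (24 bits)
    sig2 = (1 << 23) | m2
    product = sig1 * sig2                     # step 3: 48-bit product, 2 integer bits
    e3 = e1 + e2 - BIAS                       # step 4: add exponents, subtract bias
    if product & (1 << 47):                   # step 5: product >= 2, shift right once
        product >>= 1
        e3 += 1
    m3 = (product >> 23) & 0x7FFFFF           # truncate to the 23 stored bits
    if e3 >= E_MAX:                           # steps 6-7: overflow
        return (s3, E_MAX, 0)
    if e3 <= E_MIN:                           # steps 6, 8: underflow
        return (s3, 0, 0)
    return (s3, e3, m3)

# 125.125 x 12.0625, with the operands given as bit fields
s3, e3, m3 = fp_multiply(0, 0b10000101, 0b111101001 << 14,
                         0, 0b10000010, 0b1000001 << 16)
print(s3, format(e3, "08b"), format(m3, "023b"))
```

Working on integer significands keeps the 48 bit product exact, so the only precision loss is the explicit truncation to 23 stored bits.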
Example:
Floating point multiplication is simpler than
floating point addition. Let's try to understand the
multiplication algorithm with the help of an example.
Let's consider two decimal numbers
X1 = 125.125 (base 10)
X2 = 12.0625 (base 10)
X3 = X1 * X2 = 1509.3203125
Equivalent floating point binary words are
     S1 | E1       | M1
X1 = 0  | 10000101 | 11110100100000000000000
     S2 | E2       | M2
X2 = 0  | 10000010 | 10000010000000000000000
Fig 10
1) Find the sign bit by XOR-ing the sign bits of X1 and X2,
i.e. sign bit => (0 xor 0) => 0
2) Multiply the mantissa values including the "hidden one".
The product of the 24 bit mantissas (1.M1 and 1.M2)
is 48 bits wide (2 bits are to the left of the binary point).
M3 = 1.M1 x 1.M2 = 10.1111001010101001000000000000000000000000000000
M3 = 1.01111001010101001000000000000000000000000000000 x 2^1
(Normalized binary; the leading 1 is the hidden "1")
Fig 11
If bit 48 of M3 (the most significant bit) is "1", shift the
binary point left by one place and add "1" to the exponent;
otherwise add nothing. This normalizes the mantissa. Truncate
the result to 24 bits. Here the normalization exponent "1" is
added to the final exponent value.
3) Find the exponent of the result = E1 + E2 - bias +
(normalization exponent from step 2) = (10000101)2 +
(10000010)2 - bias + 1 = 133 + 130 - 127 + 1 = 137.
That is, E1 + E2 - bias = 136, and adding the exponent value
from the normalization in step 2 gives 136 + 1 = 137 =>
exponent value.
Note: The normalization of the product is simple because
M1 and M2 (with the hidden one) each lie in the range 1 to
1.9999999, so the product lies in the range 1 to 3.9999999.
Therefore at most a 1 bit shift is required, with the matching
adjustment of the exponent. So we have found the mantissa,
sign, and exponent bits.
4) We have our final result, i.e.
     S3 | E3       | M3
X3 = 0  | 10001001 | 01111001010101001000000
Fig 12
If we convert this to decimal we get X3 = 1509.3203125
Floating point multiplication is simpler than floating point
addition. The simplified floating point
multiplication chart is given in Figure 4.
IEEE 754 standard floating
point Addition Algorithm
Floating point addition is more complex than multiplication.
A brief overview of the floating point addition algorithm is
given below.
X3 = X1 + X2
$$X3 = (M1 \times 2^{E1}) \pm (M2 \times 2^{E2})$$
1) X1 and X2 can only be added if the exponents are the
same, i.e. E1 = E2.
2) We assume that X1 has the larger absolute value of the 2
numbers. If the absolute value of X1 is not greater than the
absolute value of X2, swap the values such that
Abs(X1) > Abs(X2).
3) The initial value of the exponent should be the larger of
the 2 exponents. Since we know the exponent of X1 is the
bigger one, the initial exponent result is E3 = E1.
4) Calculate the exponent difference, i.e. Exp_diff = (E1 -
E2).
5) Shift the binary point of the mantissa (M2) left by the
exponent difference (equivalently, shift the bits of M2 right).
Now the exponents of both X1 and X2 are the same.
6) Compute the sum/difference of the mantissas depending
on the sign bits S1 and S2.
If the signs of X1 and X2 are equal (S1 == S2) then add the
mantissas.
If the signs of X1 and X2 are not equal (S1 != S2) then subtract
the mantissas.
7) Normalize the resultant mantissa (M3) if needed (1.M3
format); the initial exponent result E3 = E1 needs to be
adjusted according to the normalization of the mantissa.
8) If any of the operands is infinity or if (E3 > Emax),
overflow has occurred and the output should be set to infinity.
If (E3 < Emin) then it's an underflow and the output should be
set to zero.
9) NaNs are not supported.
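The addition steps above can be sketched in Python on (sign, biased exponent, mantissa) triples. This is an illustration under simplifying assumptions: the name `fp_add` is hypothetical, both inputs are assumed to be normalized numbers (no zero, denormal, infinity or NaN inputs), bits shifted out during alignment are truncated, and exponent field values 255 and 0 are assumed to stand for overflow and underflow.

```python
def fp_add(s1, e1, m1, s2, e2, m2):
    """Add two normalized single precision numbers given as (sign, exponent, mantissa)."""
    sig1 = (1 << 23) | m1                  # restore the hidden one
    sig2 = (1 << 23) | m2
    if (e1, sig1) < (e2, sig2):            # step 2: make X1 the larger magnitude
        s1, e1, sig1, s2, e2, sig2 = s2, e2, sig2, s1, e1, sig1
    e3 = e1                                # step 3: initial exponent result
    exp_diff = e1 - e2                     # step 4: exponent difference
    sig2 >>= exp_diff                      # step 5: align M2 (shifted-out bits are dropped)
    if s1 == s2:                           # step 6: add or subtract the mantissas
        sig3 = sig1 + sig2
    else:
        sig3 = sig1 - sig2
    if sig3 == 0:                          # exact cancellation gives zero
        return (0, 0, 0)
    while sig3 >= (1 << 24):               # step 7: normalize to 1.M3 format
        sig3 >>= 1
        e3 += 1
    while sig3 < (1 << 23):
        sig3 <<= 1
        e3 -= 1
    if e3 >= 255:                          # step 8: overflow
        return (s1, 255, 0)
    if e3 <= 0:                            # step 8: underflow
        return (s1, 0, 0)
    return (s1, e3, sig3 & 0x7FFFFF)       # drop the hidden one again

# 9.75 + 0.5625 as bit fields
s3, e3, m3 = fp_add(0, 0b10000010, 0b00111 << 18, 0, 0b01111110, 0b001 << 20)
print(s3, format(e3, "08b"), format(m3, "023b"))
```

The result takes the sign of the operand with the larger magnitude, which is why the swap in step 2 comes before the mantissas are combined.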
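IEEE 754 standard floating
point Addition Example:
X1 = 9.75
X2 = 0.5625
(decimal values decoded from the binary words below)
Equivalent floating point binary words are
     S1 | E1       | M1
X1 = 0  | 10000010 | 00111000000000000000000
     S2 | E2       | M2
X2 = 0  | 01111110 | 00100000000000000000000
Fig 13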
1) Abs(X1) > Abs(X2)? Yes.
2) Initial exponent result E3 = E1 = 10000010 = 130 (10)
3) E1 - E2 = (10000010 - 01111110) => (130 - 126) = 4
4) Shift the mantissa M2 by (E1 - E2) so that the exponents
are the same for both numbers.
  1.00100000000000000000000 (1.M2)
  shifted right by 4 places (E1 - E2)
= 0.00010010000000000000000
(Aligned mantissa)
Fig 14
5) Sign bits of both are equal? Yes. Add the mantissas.
  1.00111000000000000000000 (1.M1)
+ 0.00010010000000000000000 (aligned M2)
= 1.01001010000000000000000 (1.M3)
Fig 15
6) Normalization needed? No. (If normalization had been
required for M3, the initial exponent result E3 = E1
would have to be adjusted accordingly.)
7) Result
     S3 | E3       | M3
X3 = 0  | 10000010 | 01001010000000000000000
Fig 16
X3 in decimal = 10.3125.
8) If we had to perform subtraction, we would just change the
sign bit of X2 to "1". Since the sign bits are then not equal,
the mantissas are subtracted:
  1.00111000000000000000000 (1.M1)
- 0.00010010000000000000000 (aligned M2)
= 1.00100110000000000000000 (1.M3)
     S3 | E3       | M3
X3 = 0  | 10000010 | 00100110000000000000000
X3 in decimal = 9.1875
Fig 17
NOTE: For floating point subtraction, invert the sign bit of
the number to be subtracted and apply it to the floating point
adder.
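The note can be illustrated on real single precision words. In this sketch the helper names are hypothetical; Python's built-in float addition stands in for the floating point adder, and the standard `struct` module converts between floats and their 32 bit patterns.

```python
import struct

def float_to_word(x: float) -> int:
    """32-bit single precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def word_to_float(word: int) -> float:
    """Value of a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def invert_sign(word: int) -> int:
    """Flip only the sign bit (bit 31) of the word."""
    return word ^ 0x80000000

x1, x2 = 9.75, 0.5625
negated_x2 = word_to_float(invert_sign(float_to_word(x2)))
print(x1 + negated_x2)  # 9.1875, the same as x1 - x2
```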
IEEE 754 standard floating
point Division Algorithm
Division of IEEE 754 floating point numbers (X1 & X2) is
done by dividing the mantissas and subtracting the
exponents.
X3 = (X1 / X2)
$$X3 = \frac{(-1)^{S1} (M1 \times 2^{E1})}{(-1)^{S2} (M2 \times 2^{E2})} = (-1)^{S3} \left(\frac{M1}{M2}\right) \times 2^{(E1 - E2)}$$
1) If the divisor X2 = zero then set the result to "infinity";
if both X1 and X2 are zero set it to "NaN".
2) Sign bit S3 = (S1 xor S2)
3) Find the mantissa by dividing M1/M2
4) Exponent E3 = (E1 - E2) + bias
5) Normalize if required, i.e. by left shifting the mantissa and
decrementing the resultant exponent
6) Check for overflow/underflow
If E3 > Emax return overflow, i.e. "infinity"
If E3 < Emin return underflow, i.e. zero
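The division steps above can be sketched in Python on (sign, biased exponent, mantissa) triples. This is a simplified illustration: the name `fp_divide` is hypothetical, the quotient is truncated rather than rounded, an all-zero exponent field is treated as zero, infinity and NaN inputs are not handled, and NaN is returned as an all-ones exponent with a non-zero mantissa.

```python
BIAS = 127

def fp_divide(s1, e1, m1, s2, e2, m2):
    """Divide X1 by X2, both given as (sign, exponent, mantissa) fields."""
    s3 = s1 ^ s2                               # step 2: XOR the sign bits
    if e2 == 0:                                # step 1: division by zero
        return (s3, 255, 1) if e1 == 0 else (s3, 255, 0)   # NaN or infinity
    if e1 == 0:                                # zero dividend gives zero
        return (s3, 0, 0)
    sig1 = (1 << 23) | m1                      # restore the hidden one
    sig2 = (1 << 23) | m2
    e3 = e1 - e2 + BIAS                        # step 4: subtract exponents, add bias
    quotient = (sig1 << 23) // sig2            # step 3: quotient with 23 fraction bits
    if quotient < (1 << 23):                   # step 5: quotient < 1, shift left once
        quotient = (sig1 << 24) // sig2
        e3 -= 1
    if e3 >= 255:                              # step 6: overflow
        return (s3, 255, 0)
    if e3 <= 0:                                # step 6: underflow
        return (s3, 0, 0)
    return (s3, e3, quotient & 0x7FFFFF)       # drop the hidden one

# 127.03125 / 16.9375 as bit fields
s3, e3, m3 = fp_divide(0, 0b10000101, 0b11111100001 << 12,
                       0, 0b10000011, 0b00001111 << 15)
print(s3, format(e3, "08b"), format(m3, "023b"))
```

Because both significands lie between 1 and 2, the quotient lies between 0.5 and 2, so a single left shift is always enough to normalize it.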
IEEE 754 standard floating
point Division Example:
X1 = 127.03125
X2 = 16.9375
Equivalent floating point binary words are
     S1 | E1       | M1
X1 = 0  | 10000101 | 11111100001000000000000
     S2 | E2       | M2
X2 = 0  | 10000011 | 00001111000000000000000
Fig 18
1) S3 = S1 xor S2 = 0
2) E3 = (E1 - E2) + bias = (10000101) - (10000011) +
(01111111)
= 133 - 131 + 127 => 129 => (10000001)
3) Divide the mantissas M1/M2
  1.11111100001000000000000 (1.M1)
/ 1.00001111000000000000000 (1.M2)
= 1.11100000000000000000000 (1.M3)
Fig 19
4) Result
     S3 | E3       | M3
X3 = 0  | 10000001 | 11100000000000000000000
Fig 20
X3 in decimal = 7.5
                   | Single precision  | Double precision
Word length (bits) | 32                | 64
Mantissa (bits)    | 23                | 52
Bias               | 127               | 1023
Range              | about 3.4 x 10^38 | about 1.8 x 10^308
Table 1: IEEE 754 Floating point Standard
Sign   | Exponent       | Mantissa (significand)  | Number represented          | Comments
0 or 1 | 00000000       | 00000000000000000000000 | +/- 0                       |
0 or 1 | 00000000       | Nonzero                 | Denormalized number         | If the exponent is all zeros and the mantissa is non zero, then the numbers represented are known as denormalized numbers. Other term: denormals.
0      | 1 to 254       | Any number              | +ve floating point numbers  |
1      | 1 to 254       | Any number              | -ve floating point numbers  |
0 or 1 | 11111111 (255) | 0                       | +/- infinity                |
0 or 1 | 11111111 (255) | Nonzero                 | NaN                         |
Table 2: IEEE 754 Number Representation Table