Week 7 B

The document discusses floating-point representation, conversion, and arithmetic in computer organization and assembly language. It explains the components of floating-point numbers, the IEEE 754 standard, and the conversion processes between binary and float values. Additionally, it covers the complexities of floating-point arithmetic operations, particularly addition and subtraction.

Uploaded by

Rayyan Ahmad

COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE (CS-215T)

Week 7
Floating-Point Representation, Conversion & Floating-Point Arithmetic

Lecture by:
Dr. Abdul Hameed
Assistant Professor
CSD, SSUET
Batch 2020F

Computer Science & Information Technology Department


Sir Syed University of Engg. & Tech.
FLOATING-POINT REPRESENTATION
Why the name “Floating-Point”?
Consider the number 32
It is called a fixed-point number
That is, the decimal point is fixed at the end
32 => 32.0
Remember Scientific Notation
32 can be written as 3.2 x 10^1
Now it is written as a real number (with a fraction part)
And the decimal point floats w.r.t. the power of 10
32.0 = 3.2 x 10^1 = 0.32 x 10^2 = 0.032 x 10^3
Or 32.0 = 32.0 x 10^0 = 320.0 x 10^-1 = 3200.0 x 10^-2
FLOATING-POINT REPRESENTATION
Why the name “Floating-Point”?

3.2 x 10^1 = 32.0 x 10^(1-1)

Moving the decimal point to the right results in
subtracting 1 from the power of 10

3.2 x 10^1 = 0.32 x 10^(1+1)

Moving the decimal point to the left results in
adding 1 to the power of 10
FLOATING-POINT REPRESENTATION
So far we know that
in floating-point there are three elements
a number (Significand)
the base value (Base)
and a power to the base (Exponent)
±S x B^E
The number is stored in memory with three fields:
Sign: ± (plus or minus)
Significand S
Exponent E
The base B is implicit and need not be stored
FLOATING-POINT REPRESENTATION
Typical 32-Bit Floating-Point Format
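
The figure for this slide did not survive extraction. Assuming the usual 32-bit layout (1 sign bit, then an 8-bit biased exponent, then a 23-bit significand field), the following is a minimal Python sketch of splitting a 32-bit word into those three fields. The function name split_fields is illustrative and not part of the lecture.

```python
def split_fields(word):
    """Split a 32-bit word into (sign, biased exponent, significand field).

    Assumed layout: bit 31 = sign, bits 30..23 = exponent, bits 22..0 = significand.
    """
    sign = (word >> 31) & 0x1        # top bit
    exponent = (word >> 23) & 0xFF   # next 8 bits, still biased
    fraction = word & 0x7FFFFF       # low 23 bits
    return sign, exponent, fraction


sign, exponent, fraction = split_fields(0x43FC0000)
print(sign, exponent, format(fraction, "023b"))
```

The exponent returned here is the stored (biased) field; the bias still has to be subtracted to obtain the true exponent.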

FLOATING-POINT REPRESENTATION
Biased-Exponent?
Bias is a fixed value
Bias is used to get the true value of the exponent
For this, the bias is subtracted from the stored exponent field
Typical value is 2^(k-1) - 1
Here, k = number of bits in the binary exponent
If k = 8, then the bias is 127
True exponent values are then in the range
-127 to +128
(IEEE 754 reserves the all-zeros and all-ones exponent fields, so normal numbers use -126 to +127)
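
The bias formula can be checked with a few lines of Python. This is a small sketch only; the helper names bias and true_exponent_range are assumptions made for illustration.

```python
def bias(k):
    """Bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1


def true_exponent_range(k):
    """Smallest and largest true exponent when every field value is usable."""
    b = bias(k)
    smallest_field, largest_field = 0, 2 ** k - 1
    return smallest_field - b, largest_field - b


print(bias(8))                 # 127
print(true_exponent_range(8))  # (-127, 128)
```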

FLOATING-POINT’S BENEFITS
Fixed-Point has limitations
Cannot represent very large numbers
Cannot represent very small fractions

Recall Scientific Notation:
we can get around this limitation

A very large number can be represented as
976,000,000,000,000 = 9.76 x 10^14

A very small fraction can be represented as
0.0000000000000967 = 9.67 x 10^-14
IEEE STANDARD FOR FLOATING-POINT
Floating-point representation is defined in
IEEE Standard 754,
adopted in 1985,
revised in 2008
IEEE 754-2008 covers both decimal and binary
floating-point representations.
The three basic binary formats have bit lengths of
32 bits with an exponent of 8 bits
64 bits with an exponent of 11 bits
128 bits with an exponent of 15 bits
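
From the total width and the exponent width, the remaining field sizes follow directly. The sketch below assumes one sign bit, with all leftover bits going to the significand field, and reuses the bias formula 2^(k-1) - 1. The names FORMATS and describe are illustrative.

```python
# total bits -> exponent bits, as listed on the slide
FORMATS = {32: 8, 64: 11, 128: 15}


def describe(total_bits):
    """Derive the field widths and bias of a basic binary format."""
    exp_bits = FORMATS[total_bits]
    return {
        "sign_bits": 1,
        "exponent_bits": exp_bits,
        "significand_bits": total_bits - 1 - exp_bits,
        "bias": 2 ** (exp_bits - 1) - 1,
    }


for total in FORMATS:
    print(total, describe(total))
```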

IEEE STANDARD FOR
BINARY FLOATING-POINT
The three basic binary formats have bit lengths of
32 bits with an exponent of 8 bits (1 sign bit, 8-bit biased exponent, 23-bit significand field)

IEEE STANDARD FOR
BINARY FLOATING-POINT
The three basic binary formats have bit lengths of
64 bits with an exponent of 11 bits (1 sign bit, 11-bit biased exponent, 52-bit significand field)

IEEE STANDARD FOR
BINARY FLOATING-POINT
The three basic binary formats have bit lengths of
128 bits with an exponent of 15 bits (1 sign bit, 15-bit biased exponent, 112-bit significand field)

FLOATING-POINT CONVERSION
PRINCIPLE
Conversion #1
IEEE 754 Conversion (32-Bits to Float value)

Divide the 32 bits into three fields (sign, exponent, significand)

Convert the exponent field
from unsigned binary to unsigned decimal
subtract 127, call it ‘E’
Convert the significand field to a floating-point number
between 1 and 1.999..., call it ‘S’
(an implicit leading 1 plus the fraction bits)
Float value = ±S x 2^E
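
The steps above can be sketched in Python as follows. This is a minimal illustration for normal numbers only; the function name decode_float32 is an assumption, and the special exponent fields 0 and 255 (zero, subnormals, infinity, NaN) are deliberately rejected rather than handled.

```python
def decode_float32(word):
    """Decode a 32-bit pattern into a float, following the slide's steps."""
    # Step 1: divide the 32 bits into three fields
    sign = (word >> 31) & 0x1
    exp_field = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF

    # Not covered by the slide: reserved exponent fields
    if exp_field in (0, 255):
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")

    # Step 2: unsigned exponent minus the bias gives E
    e = exp_field - 127

    # Step 3: implicit leading 1 plus the 23 fraction bits gives S in [1, 2)
    s = 1 + fraction / 2 ** 23

    # Step 4: apply the sign
    value = s * 2.0 ** e
    return -value if sign else value


print(decode_float32(0x43FC0000))  # 504.0
```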

FLOATING-POINT CONVERSION
PRINCIPLE
Conversion #1: Example
IEEE 754 Conversion (32-Bits to Float value)
Bits = 43FC0000 (hexadecimal)
Binary = 0100 0011 1111 1100 0000 0000 0000 0000

Sign = 0 (+ve)
E = 10000111 = 135
= 135 - 127 = 8
S = 11111000000000000000000
= 1 + .5 + .25 + .125 + .0625 + .03125 = 1.96875
Float value = +1.96875 x 2^8 = +1.96875 x 256
= 504.0
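
As a cross-check of this hand calculation, Python's standard struct module can reinterpret the same four bytes as a single-precision float. This is only a verification sketch; it relies on struct's ">f" format code (big-endian IEEE 754 binary32).

```python
import struct

raw = bytes.fromhex("43FC0000")       # the four bytes of the example
(value,) = struct.unpack(">f", raw)   # interpret them as a big-endian float32
print(value)  # 504.0
```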

FLOATING-POINT CONVERSION
PRINCIPLE
Conversion #2
IEEE 754 Conversion (Float value to 32-Bits)

Let f be the float value

Determine the largest power of 2 not greater than |f|
call its exponent ‘p’, such that |f| = (|f| / 2^p) x 2^p
Therefore S = |f| / 2^p; subtract 1
convert the remaining value to binary
with each bit position a negative power of 2
Also, E = p; add 127 and convert to binary
If f is negative, sign-bit = 1, else sign-bit = 0
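
A hedged Python sketch of this procedure is shown below. The function name encode_float32 is illustrative; it covers normal values only and truncates any significand bits beyond 23 instead of rounding, which is a simplification not stated on the slide. math.frexp is used to find p without floating-point logarithm errors.

```python
import math


def encode_float32(f):
    """Encode a Python float as a 32-bit pattern (normal values, truncating)."""
    if f == 0 or math.isinf(f) or math.isnan(f):
        raise ValueError("zero, infinity and NaN are not handled in this sketch")

    sign = 1 if f < 0 else 0
    magnitude = abs(f)

    # frexp gives magnitude = m * 2**e with 0.5 <= m < 1, so p = e - 1
    _, e = math.frexp(magnitude)
    p = e - 1

    exp_field = p + 127                     # E = p, then add the bias
    if not 1 <= exp_field <= 254:
        raise ValueError("exponent outside the normal range")

    s = magnitude / 2.0 ** p                # 1 <= s < 2
    fraction = int((s - 1) * 2 ** 23)       # drop the leading 1, keep 23 bits

    return (sign << 31) | (exp_field << 23) | fraction


print(format(encode_float32(1208.0), "08X"))  # 44970000
```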
FLOATING-POINT CONVERSION
PRINCIPLE
Conversion #2: Example
IEEE 754 Conversion (Float value to 32-Bits)
f = 1208.0 = (1208 / 1024) x 1024
= 1.1796875 x 2^10
S = 1.1796875; subtracting 1 gives 0.1796875
Converting 0.1796875 to binary
by subtracting negative powers of 2
S = 0.1796875 - 0.125 = 0.0546875 (2^-3)
= 0.0546875 - 0.03125 = 0.0234375 (2^-5)
= 0.0234375 - 0.015625 = 0.0078125 (2^-6)
= 0.0078125 - 0.0078125 = 0 (2^-7)
S = 00101110000000000000000 (23 bits)
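
The repeated subtraction of negative powers of 2 can be written as a short loop. The sketch below is illustrative (the name fraction_to_bits is an assumption); it emits a 1 whenever the current power of 2 fits into the remainder and a 0 otherwise.

```python
def fraction_to_bits(fraction, width=23):
    """Convert a fraction in [0, 1) to a bit string by subtracting 2^-i."""
    bits = []
    remainder = fraction
    for i in range(1, width + 1):
        weight = 2.0 ** -i          # 0.5, 0.25, 0.125, ...
        if remainder >= weight:
            bits.append("1")
            remainder -= weight
        else:
            bits.append("0")
    return "".join(bits)


print(fraction_to_bits(0.1796875))  # 00101110000000000000000
```

The 1 bits land at positions 3, 5, 6 and 7, matching the four subtractions in the example.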

FLOATING-POINT CONVERSION
PRINCIPLE
Conversion #2: Example (continued)
IEEE 754 Conversion (Float value to 32-Bits)
f = 1208.0 = (1208 / 1024) x 1024
= 1.1796875 x 2^10
E = 10, add 127,
E = 137
Converting E to unsigned binary
E = 10001001
Complete 32-bit value
= 0 10001001 00101110000000000000000
= 0100 0100 1001 0111 0000 0000 0000 0000
= 44970000 (hexadecimal)
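
Assembling the three fields into one word is a matter of shifting and OR-ing. The sketch below assumes the 23-bit significand field 00101110000000000000000 and cross-checks the result against Python's struct module; the variable names are illustrative.

```python
import struct

sign = 0
exp_field = 137                              # 10 + 127
fraction_bits = "00101110000000000000000"    # 23-bit significand field

# sign in bit 31, exponent in bits 30..23, significand in bits 22..0
word = (sign << 31) | (exp_field << 23) | int(fraction_bits, 2)

print(format(word, "032b"))  # 01000100100101110000000000000000
print(format(word, "08X"))   # 44970000

# Cross-check against the library's own float32 encoding of 1208.0
assert struct.pack(">f", 1208.0) == word.to_bytes(4, "big")
```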

Sir Syed University of Engg. & Tech. Lecture by: Mr. Shakir Karim
BINARY EXPONENT TO DECIMAL
CONVERSION TABLE

FLOATING-POINT ARITHMETIC
A floating-point operation may produce
Exponent Overflow
A positive exponent exceeds
the maximum possible exponent value.
may be designated as +∞ or -∞
Exponent Underflow
A negative exponent is less than
the minimum possible exponent value.
the number is too small to be represented
may be reported as 0
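
Both conditions can be observed with ordinary Python floats, assuming (as on mainstream platforms) that they are IEEE 754 64-bit values. This is a demonstration sketch only; the variable names are illustrative.

```python
import math
import sys

largest = sys.float_info.max        # largest finite 64-bit float
overflowed = largest * 2            # exponent too large: becomes +infinity
neg_overflowed = -largest * 2       # same with a negative sign: -infinity

smallest = 5e-324                   # smallest positive (subnormal) 64-bit float
underflowed = smallest / 2          # too small to represent: becomes 0.0

print(overflowed, neg_overflowed, underflowed)
```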

FLOATING-POINT ARITHMETIC
OPERATIONS

FLOATING-POINT ARITHMETIC:
EXAMPLES

FLOATING-POINT ARITHMETIC
Addition and Subtraction:
More complex than Multiplication and Division
Due to the need for alignment

Four basic phases of the addition/subtraction algorithm

1. Check for zeros
2. Align the significands
3. Add or subtract the significands
4. Normalize the result
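
The four phases can be sketched in Python under some simplifying assumptions that are not from the lecture: each operand is a pair (significand, exponent) meaning significand x 2^exponent, significands are signed integers normalized to 24 bits, bits shifted out during alignment are truncated, and exponent overflow/underflow is not checked. The names fp_add and shift_right are illustrative. Subtraction is addition with the second significand negated.

```python
def shift_right(sig, n):
    """Shift a signed significand right by n bits, truncating toward zero."""
    magnitude = abs(sig) >> n
    return magnitude if sig >= 0 else -magnitude


def fp_add(x, y, width=24):
    """Add two (significand, exponent) pairs using the four phases."""
    xs, xe = x
    ys, ye = y

    # Phase 1: check for zeros
    if xs == 0:
        return (ys, ye)
    if ys == 0:
        return (xs, xe)

    # Phase 2: align the significands (shift the smaller-exponent operand right)
    if xe < ye:
        xs, xe, ys, ye = ys, ye, xs, xe
    ys = shift_right(ys, xe - ye)

    # Phase 3: add (or subtract, if one significand is negative)
    total = xs + ys
    exponent = xe
    if total == 0:
        return (0, 0)

    # Phase 4: normalize so that 2^(width-1) <= |total| < 2^width
    while abs(total) >= 2 ** width:
        total = shift_right(total, 1)
        exponent += 1
    while abs(total) < 2 ** (width - 1):
        total *= 2
        exponent -= 1
    return (total, exponent)


# 1.5 + 0.25: 1.5 = 0xC00000 x 2^-23, 0.25 = 0x800000 x 2^-25
sig, exp = fp_add((0xC00000, -23), (0x800000, -25))
print(sig * 2.0 ** exp)  # 1.75
```

Multiplication and division need no alignment phase, which is why the slide calls addition and subtraction the more complex case.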
