FLOATING POINT NUMBERS
Englander Ch. 5
Exponential Notation
The following are equivalent representations of 1,234
  123,400.0 x 10^-2
  12,340.0 x 10^-1
  1,234.0 x 10^0
  123.4 x 10^1
  12.34 x 10^2
  1.234 x 10^3
  0.1234 x 10^4
The representations differ in that the decimal point floats to the left or right (with the appropriate adjustment in the exponent).
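The equivalence of these representations can be checked with a short Python sketch. It uses the standard `decimal` module so the comparison is exact; the list and variable names are illustrative only.

```python
from decimal import Decimal

# Each pair is (mantissa, exponent) taken from the list of representations.
representations = [
    ("123400.0", -2),
    ("12340.0", -1),
    ("1234.0", 0),
    ("123.4", 1),
    ("12.34", 2),
    ("1.234", 3),
    ("0.1234", 4),
]

# scaleb(e) multiplies by 10**e exactly, so no binary rounding is involved.
values = [Decimal(m).scaleb(e) for m, e in representations]

# Every representation denotes the same number, 1234.
all_equal = all(v == Decimal(1234) for v in values)
print(all_equal)  # True
```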
ITEC 1011 Introduction to Information Technologies
Parts of a Floating Point Number
-0.9876 x 10^-3
  Sign of mantissa: -
  Location of decimal point: immediately left of the mantissa digits
  Mantissa: 9876
  Base: 10
  Sign of exponent: -
  Exponent: 3
Exponent Excess 50 Representation
With 2 digits for the exponent and 5 for the mantissa: from .00001 x 10^-50 to .99999 x 10^49
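Overflows / Underflows
Representable range: from .00001 x 10^-50 to .99999 x 10^49
That is, from 1 x 10^-55 to .99999 x 10^49
Overflow: the magnitude is larger than .99999 x 10^49
Underflow: the magnitude is nonzero but smaller than 1 x 10^-55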
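A minimal sketch of how the two-digit excess-50 exponent field produces overflow and underflow. The function name `store_exponent` and the string labels are assumptions made for illustration, not part of the slides.

```python
EXCESS = 50  # stored exponent = actual exponent + 50

def store_exponent(e):
    """Return the two-digit stored exponent, or a label if e does not fit."""
    stored = e + EXCESS
    if stored > 99:          # more than two decimal digits
        return "overflow"
    if stored < 0:           # would need a negative stored value
        return "underflow"
    return f"{stored:02d}"

print(store_exponent(49))    # 99  (largest exponent that fits)
print(store_exponent(-50))   # 00  (smallest exponent that fits)
print(store_exponent(50))    # overflow
print(store_exponent(-51))   # underflow
```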
Typical Floating Point Format
IEEE 754 Standard
Most common standard for representing floating point numbers
Single precision: 32 bits, consisting of
  Sign bit (1 bit)
  Exponent (8 bits)
  Mantissa (23 bits)
Double precision: 64 bits, consisting of
  Sign bit (1 bit)
  Exponent (11 bits)
  Mantissa (52 bits)
Single Precision Format
32 bits: Sign of mantissa (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
Double Precision Format
64 bits: Sign of mantissa (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
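The field widths can be seen directly by splitting the stored bit pattern of a number. The sketch below uses Python's standard `struct` module; the helper names `fields_single` and `fields_double` are hypothetical.

```python
import struct

def fields_single(x):
    """Split x, stored as 32-bit single precision, into (sign, exponent, mantissa)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]      # 1 + 8 + 23 bits

def fields_double(x):
    """Split x, stored as 64-bit double precision, into (sign, exponent, mantissa)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{raw:064b}"
    return bits[0], bits[1:12], bits[12:]    # 1 + 11 + 52 bits

sign, exponent, mantissa = fields_single(14.0)
print(sign, exponent, mantissa)   # 0 10000010 11000000000000000000000
print([len(f) for f in fields_double(14.0)])   # [1, 11, 52]
```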
Normalization
The mantissa is normalized
  Has an implied binary point on the left
  Has an implied 1 on the left of the binary point
E.g.,
  Mantissa: 10100000000000000000000
  Representation: 1.101_2 = 1.625_10
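A small sketch of the implied leading 1: each stored mantissa bit contributes a negative power of two, and 1 is added for the bit that is not stored. The name `mantissa_value` is an assumption for illustration.

```python
def mantissa_value(fraction_bits):
    """Value of a normalized mantissa: implied 1 plus the stored fraction bits."""
    value = 1.0                              # the implied 1 left of the point
    for i, bit in enumerate(fraction_bits, start=1):
        if bit == "1":
            value += 2.0 ** -i               # bit i is worth 2^-i
    return value

print(mantissa_value("10100000000000000000000"))  # 1.625
```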
Excess Notation
To include both positive and negative exponents, excess-n notation is used
  Single precision: excess 127
  Double precision: excess 1023
The value of the exponent stored is n larger than the actual exponent
E.g., excess 127,
  Exponent: 10000111
  Representation: 135 - 127 = 8 (value)
Excess Notation - Sample
Represent exponent of 14_10 in excess 127 form:
    127_10         =   01111111_2
  +  14_10         = + 00001110_2
  Representation   =   10001101_2
Excess Notation - Sample
Represent exponent of -8_10 in excess 127 form:
    127_10         =   01111111_2
  -   8_10         = - 00001000_2
  Representation   =   01110111_2
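The excess-127 samples can be reproduced with a short sketch. The function names `to_excess` and `from_excess` are hypothetical, and the range check only tests that the value fits in the field; it does not model the stored values that IEEE 754 reserves for special cases.

```python
def to_excess(exponent, bias=127, width=8):
    """Encode an actual exponent as a stored excess-bias bit string."""
    stored = exponent + bias
    if not 0 <= stored < 2 ** width:
        raise ValueError("exponent does not fit in the field")
    return format(stored, f"0{width}b")

def from_excess(bits, bias=127):
    """Decode a stored bit string back to the actual exponent."""
    return int(bits, 2) - bias

print(to_excess(14))            # 10001101
print(to_excess(-8))            # 01110111
print(from_excess("10000111"))  # 8
```

Passing `bias=1023, width=11` gives the double precision encoding under the same assumptions.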
Example
Single precision
0 10000010 11000000000000000000000
  Sign: 0 = positive mantissa
  Exponent: 130 - 127 = 3
  Mantissa: 1.11_2 = 1.75_10
+1.75 x 2^3 = 14.0
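The same decoding can be sketched in Python and cross-checked against the machine's own interpretation of the bit pattern via the standard `struct` module. Both function names are illustrative, and `decode_single` assumes a normalized number.

```python
import struct

def decode_single(sign, exponent, fraction):
    """Decode a normalized single precision number from its three bit fields."""
    mantissa = 1 + int(fraction, 2) / 2 ** 23      # restore the implied 1
    value = mantissa * 2.0 ** (int(exponent, 2) - 127)
    return -value if sign == "1" else value

def decode_with_struct(sign, exponent, fraction):
    """Reinterpret the 32 bits as a float, for comparison."""
    raw = int(sign + exponent + fraction, 2)
    return struct.unpack(">f", struct.pack(">I", raw))[0]

fields = ("0", "10000010", "11000000000000000000000")
print(decode_single(*fields))        # 14.0
print(decode_with_struct(*fields))   # 14.0
```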
Exercise Floating Point Conversion (1)
What decimal value is represented by the following 32-bit floating point number?
1 10000010 11110110000000000000000
Answer: -15.6875
Step by Step Solution
1 10000010 11110110000000000000000
  Exponent: 130 - 127 = 3
  Mantissa: 1.11110110000000000000000
  To decimal form: 1 + .5 + .25 + .125 + .0625 + 0 + .015625 + .0078125 = 1.9609375
  2^3 * 1.9609375 = 15.6875
  Sign bit is 1 ( negative ): - 15.6875
Step by Step Solution : Alternative Method
1 10000010 11110110000000000000000
  Exponent: 130 - 127 = 3
  Shift point: 1.11110110000000000000000 becomes 1111.10110000000000000000
  To decimal form: 1111.1011_2 = 15.6875
  Sign bit is 1 ( negative ): - 15.6875
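Both solution methods can be sketched side by side. The function names are hypothetical, and `by_shifting` assumes an actual exponent between 0 and 22 so that the point lands inside the stored digits.

```python
def by_power(sign, exponent, fraction):
    """Method 1: convert the mantissa to decimal, then multiply by 2^e."""
    e = int(exponent, 2) - 127
    mantissa = 1 + sum(2.0 ** -i
                       for i, bit in enumerate(fraction, start=1) if bit == "1")
    magnitude = 2 ** e * mantissa
    return -magnitude if sign == "1" else magnitude

def by_shifting(sign, exponent, fraction):
    """Method 2: shift the binary point e places right, then convert."""
    e = int(exponent, 2) - 127               # assumed 0 <= e <= 22
    digits = "1" + fraction                  # implied 1 restored
    whole, frac = digits[:1 + e], digits[1 + e:]
    magnitude = int(whole, 2) + int(frac, 2) / 2 ** len(frac)
    return -magnitude if sign == "1" else magnitude

fields = ("1", "10000010", "11110110000000000000000")
print(by_power(*fields))      # -15.6875
print(by_shifting(*fields))   # -15.6875
```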
Exercise Floating Point Conversion (2)
Express 3.14 as a 32-bit floating point number
Answer: (Note: only use 10 significant bits for the mantissa)
0 10000000 10010001111010111000010
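As a check on this answer, the sketch below encodes 3.14 two ways. `encode_truncated` (a hypothetical helper, positive inputs only) normalizes the exact decimal value and keeps 23 fraction bits without rounding, which reproduces the bit pattern above. `single_bits` asks the standard `struct` module for the stored pattern; since conversion there rounds to nearest, its last mantissa bit comes out as 1 rather than 0.

```python
import struct
from fractions import Fraction

def encode_truncated(text, nbits=23):
    """Encode a positive decimal string by hand, truncating the fraction."""
    x = Fraction(text)            # exact decimal value
    e = 0
    while x >= 2:                 # normalize to 1.f x 2^e
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    frac = x - 1                  # drop the implied left-most 1
    bits = []
    for _ in range(nbits):        # repeated doubling yields one bit per step
        frac *= 2
        if frac >= 1:
            bits.append("1")
            frac -= 1
        else:
            bits.append("0")
    return f"0 {e + 127:08b} {''.join(bits)}"

def single_bits(x):
    """Bit pattern actually stored for x in single precision."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(encode_truncated("3.14"))  # 0 10000000 10010001111010111000010
print(single_bits(3.14))         # 0 10000000 10010001111010111000011
```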
Detail Solution : 3.14 to IEEE Single Precision
  3.14 to binary: 11.0010001111010111000010
  Delete implied left-most 1 and normalize: 10010001111010111000010
  Exponent = 127 + 1 position point moved when normalized: 10000000
  Value is positive: Sign bit = 0
  Result: 0 10000000 10010001111010111000010
IEEE Single-Precision Floating Point Format
Bit layout: sign s (bit 0) | exponent ê (bits 1-8) | fraction f1f2...f23 (bits 9-31)

  ê     e      Value                           Type
  255   none   none                            Infinity or NaN
  254   127    (-1)^s x (1.f1f2...) x 2^127    Normalized
  ...   ...    ...                             ...
  2     -125   (-1)^s x (1.f1f2...) x 2^-125   Normalized
  1     -126   (-1)^s x (1.f1f2...) x 2^-126   Normalized
  0     -126   (-1)^s x (0.f1f2...) x 2^-126   Denormalized

Exponent bias is 127 for normalized numbers
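The rows of the table can be expressed as a small classifier on the stored exponent. This is a sketch with hypothetical function names; splitting the 255 row into Infinity (fraction all zeros) and NaN (any other fraction) follows IEEE 754 and goes slightly beyond what the table itself states.

```python
def classify(exponent_bits, fraction_bits):
    """Return the type of a single precision number from its stored fields."""
    stored = int(exponent_bits, 2)
    if stored == 255:
        return "Infinity" if int(fraction_bits, 2) == 0 else "NaN"
    if stored == 0:
        return "Denormalized"     # includes zero, whose fraction is all zeros
    return "Normalized"

def actual_exponent(exponent_bits):
    """Actual exponent e for a stored exponent, or None for stored 255."""
    stored = int(exponent_bits, 2)
    if stored == 255:
        return None
    return -126 if stored == 0 else stored - 127

print(classify("11111110", "0" * 23), actual_exponent("11111110"))  # Normalized 127
print(classify("00000001", "0" * 23), actual_exponent("00000001"))  # Normalized -126
print(classify("00000000", "1" + "0" * 22), actual_exponent("00000000"))  # Denormalized -126
```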
Decimal Floating-Point Add and Subtract Examples
Addition
  Operands:             6.144    x 10^2
                      + 9.975    x 10^4
  Alignment:            0.06144  x 10^4
                      + 9.975    x 10^4
                      = 10.03644 x 10^4
  Normalize & round:    1.003644 x 10^5
                      + .0005    x 10^5
                      = 1.004    x 10^5
Subtraction
  Operands:             1.076    x 10^-7
                      - 9.987    x 10^-8
  Alignment:            1.076    x 10^-7
                      - 0.9987   x 10^-7
                      = 0.0773   x 10^-7
  Normalize & round:    7.7300   x 10^-9
                      + .0005    x 10^-9
                      = 7.730    x 10^-9
Floating Point Calculations: Addition
Numbers must be aligned: have the same exponent (the larger one, to protect precision)
Add mantissas. If overflow, adjust the exponent
Ex. 0 51 99718 (e = 1) and 0 49 67000 (e = -1)
Align numbers:
  0 51 99718
  0 51 00670
Add them:
    99718
  + 00670
  1 00388   Overflow
Round the number and adjust exponent: 0 52 10039
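The addition steps can be sketched for the five-digit, excess-50 format used in the example. The representation as `(stored_exponent, mantissa)` tuples and the function name `add` are assumptions for illustration; the sketch handles positive operands only and drops digits shifted out during alignment.

```python
def add(a, b):
    """Add two positive numbers given as (stored_exponent, five_digit_mantissa)."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:                       # make a the operand with the larger exponent
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb_aligned = mb // 10 ** (ea - eb)   # align: shift the smaller operand right
    total = ma + mb_aligned              # add mantissas
    exponent = ea
    if total >= 10 ** 5:                 # overflow: a sixth digit appeared
        total = (total + 5) // 10        # round to five digits
        exponent += 1                    # and adjust the exponent
    return exponent, total

print(add((51, 99718), (49, 67000)))   # (52, 10039)
```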
Floating Point Calculations: Multiplication
(a * 10^e) * (b * 10^f) = a * b * 10^(e+f)
Rule: multiply mantissas; add exponents
But: (n + e) + (n + f) = 2 * n + e + f
Must subtract excess n from result
Ex. 0 51 99718 (e = 1) and 0 49 67000 (e = -1)
  Mantissas: .99718 * .67000 = 0.6681106
  Exponents: 51 + 49 = 100 and 100 - 50 = 50
  Normalize: .6681106 becomes .66811
  Final result: .66811 * 10^0 (since 50 means e = 0)
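A minimal sketch of the multiplication rule in the same five-digit, excess-50 format. The tuple representation and the name `multiply` are assumptions; signs are ignored and the operands are assumed to be normalized, with a zero mantissa handled only as a guard.

```python
EXCESS = 50

def multiply(a, b):
    """Multiply two numbers given as (stored_exponent, five_digit_mantissa)."""
    (ea, ma), (eb, mb) = a, b
    if ma == 0 or mb == 0:
        return 0, 0
    exponent = ea + eb - EXCESS        # (n + e) + (n + f) - n = n + e + f
    product = ma * mb                  # ten digits, read as .dddddddddd
    while product < 10 ** 9:           # leading zero: normalize
        product *= 10
        exponent -= 1
    mantissa = (product + 50_000) // 100_000   # round to five digits
    if mantissa == 100_000:            # rounding carried into a sixth digit
        mantissa = 10_000
        exponent += 1
    return exponent, mantissa

print(multiply((51, 99718), (49, 67000)))   # (50, 66811)
```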