Floating Point Representation
Oyewo Maroof Olayinka
Department of Electrical/Electronic Engineering,
Federal University of Petroleum Resources, Effurun
8th August, 2019
[email protected]
Abstract
The term floating point refers to the fact that a number's radix point (decimal point, or, more
commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the
significant digits of the number. This position is indicated as the exponent component, and thus the
floating-point representation can be thought of as a kind of scientific notation. (Wikipedia, 2019)
In computing, floating-point arithmetic (FP) is arithmetic using formulaic representation of real
numbers as an approximation so as to support a trade-off between range and precision. For this
reason, floating-point computation is often found in systems which include very small and very large
real numbers, which require fast processing times. (Wikipedia, 2019)
In this paper, we examine the floating point representation of binary numbers and its applications in the field of computing.
Overview
Decimal Floating Point Representation
A floating-point number (or real number) can represent a very large (1.23×10^88) or a very small (1.23×10^-88) value. It could also represent a very large negative number (-1.23×10^88) and a very small negative number (-1.23×10^-88), as well as zero, as illustrated below. (C. Hock-Chuan, 2014. [Online]: https://www.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html)
[Figure: the ranges of values representable in floating point, from ntu.edu.sg]
A floating-point number usually consists of a fractional representation of the actual number (the fraction, F), which is derived by shifting the radix point from the end (after the LSB) to just right of the MSB, together with an exponent, e, of a certain radix, r (usually 10 for the decimal system). The exponent, or index of the radix, indicates the number of times the radix point has to be shifted to the right (for a number > 1) or to the left (for a number < 1) to obtain the number in its real form. This can be represented as: F × r^e
The representation of a floating-point number is not unique. For example, the number 55.66 can be represented as 5.566×10^1, 0.5566×10^2, 0.05566×10^3, and so on. The fractional part can be normalized. In the normalized form, there is only a single non-zero digit before the radix point. For example, the decimal number 123.4567 can be normalized as 1.234567×10^2; the binary number 1010.1011B can be normalized as 1.0101011B×2^3. (C. Hock-Chuan, 2014. [Online]: https://www.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html)
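To make the normalization step concrete, here is a minimal Python sketch; normalize_decimal is a hypothetical helper name, not from any of the cited sources:

import math

def normalize_decimal(x):
    # Return (fraction, exponent) such that x == fraction * 10**exponent,
    # with a single non-zero digit before the radix point.
    if x == 0:
        return 0.0, 0
    e = math.floor(math.log10(abs(x)))  # how far the radix point must shift
    return x / 10**e, e

print(normalize_decimal(123.4567))  # approximately (1.234567, 2), i.e. 1.234567 × 10^2
print(normalize_decimal(55.66))     # approximately (5.566, 1), i.e. 5.566 × 10^1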
IEEE 754: floating point in modern computers
We may do the same in binary, and this forms the foundation of our floating point numbers. Here it is not a decimal point we are moving but a binary point, and because it moves it is referred to as floating. What we will look at below is what is referred to as the IEEE 754 Standard for representing floating point numbers. The standard specifies the number of bits used for each section (exponent, mantissa and sign) and the order in which they are represented.
This first standard is followed by almost all modern machines. It was revised in 2008. IBM
mainframes support IBM's own hexadecimal floating point format and IEEE 754-2008 decimal
floating point in addition to the IEEE 754 binary format.
The standard specifies the following formats for floating point numbers:
Single precision, which uses 32 bits and has the following layout:
1 bit for the sign of the number. 0 means positive and 1 means negative.
8 bits for the exponent.
23 bits for the mantissa.
e.g. 1 10110110 11010011001010011101011
(sign | exponent | mantissa)
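To see this layout in practice, the following Python sketch extracts the three fields from a 32-bit float using the standard library's struct module (float32_fields is a hypothetical helper name):

import struct

def float32_fields(x):
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = bits >> 31            # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

print(float32_fields(-26.0))  # (1, 131, 5242880); exponent 131 = 127 + 4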
Double precision, which uses 64 bits and has the following layout:
1 bit for the sign of the number. 0 means positive and 1 means negative.
11 bits for the exponent.
52 bits for the mantissa.
e.g. 0 00011100010 0100001000000000000001110100000110000000000000000000
Double precision has more bits, allowing much larger and much smaller numbers to be represented. As the mantissa is also larger, the degree of accuracy is increased (remember that many fractions cannot be accurately represented in binary). While double precision floating point numbers have these advantages, they also require more processing power. With increases in CPU processing power and the move to 64-bit computing, many programming languages and software simply default to double precision. (R. Chadwick, 2019. [Online]: https://ryanstutorials.net/binary-tutorial/binary-floating-point.php)
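A small Python sketch of this precision difference, rounding a value through 32 bits and comparing it with the 64-bit default (round_to_float32 is a hypothetical helper name):

import struct

def round_to_float32(x):
    # Pack as a 32-bit float and unpack again, discarding extra precision.
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 0.1 has no exact binary representation; single precision keeps
# far fewer correct digits than double precision.
print(f"{round_to_float32(0.1):.20f}")  # 0.10000000149011611938
print(f"{0.1:.20f}")                    # 0.10000000000000000555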
The sign bit is the first bit of the binary representation: '1' implies a negative number and '0' implies a positive number. Example: 11000001110100000000000000000000 is a negative number.
The exponent is determined by the next 8 bits of the binary representation. For 32-bit floating point representation the offset is 127; it is known as the bias, and is given by 2^(k-1) - 1, where k is the number of bits in the exponent field.
There are 3 exponent bits in the 8-bit representation and 8 exponent bits in the 32-bit representation.
Thus
bias = 3 for 8-bit conversion (2^(3-1) - 1 = 4 - 1 = 3)
bias = 127 for 32-bit conversion (2^(8-1) - 1 = 128 - 1 = 127)
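The bias formula can be checked with a few lines of Python:

def bias(k):
    # Exponent bias for a k-bit exponent field: 2^(k-1) - 1.
    return 2 ** (k - 1) - 1

print(bias(3))   # 3    (8-bit mini-float with a 3-bit exponent)
print(bias(8))   # 127  (IEEE 754 single precision)
print(bias(11))  # 1023 (IEEE 754 double precision)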
Example: 01000001110100000000000000000000
10000011 = (131)10
131 - 127 = 4
Hence the exponent of 2 will be 4, i.e. 2^4 = 16.
The mantissa is calculated from the remaining 23 bits of the binary representation. It consists of an implicit leading '1' and a fractional part, which is determined by weighting each mantissa bit by successive negative powers of 2:
fraction = b1×(1/2) + b2×(1/4) + b3×(1/8) + …
Example:
01000001110100000000000000000000
The fractional part of the mantissa is given by:
1×(1/2) + 0×(1/4) + 1×(1/8) + 0×(1/16) + … = 0.625
Thus the mantissa will be 1 + 0.625 = 1.625
The decimal number is hence given as: sign × exponent × mantissa = (+1)×(16)×(1.625) = 26. (For the negative pattern 11000001110100000000000000000000 in the sign-bit example above, the result would be (-1)×(16)×(1.625) = -26.)
(GeeksforGeeks, 2019. [Online]: https://www.geeksforgeeks.org/floating-point-representation-digital-logic/)
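Putting the three fields together, here is a short Python sketch that decodes the worked example above by hand:

bits = 0b01000001110100000000000000000000

sign     = bits >> 31                   # 0, so the number is positive
exponent = ((bits >> 23) & 0xFF) - 127  # 131 - 127 = 4
fraction = (bits & 0x7FFFFF) / 2**23    # 0.625
mantissa = 1 + fraction                 # implicit leading 1 gives 1.625

value = (-1) ** sign * 2**exponent * mantissa
print(value)  # 26.0 (flipping the sign bit would give -26.0)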
Decimal to Floating Point Conversion
To convert a decimal number to binary floating point representation:
1. Convert the absolute value of the decimal number to a binary integer plus a binary fraction.
2. Normalize the number in binary scientific notation to obtain m and e.
3. Set s=0 for a positive number and s=1 for a negative number.
To convert 22.625 to binary floating point:
1. Convert decimal 22 to binary 10110. Convert decimal 0.625 to binary 0.101. Combine integer
and fraction to obtain binary 10110.101.
2. Normalize binary 10110.101 to obtain 1.0110101 × 2^4. Thus, m = (1.0110101)2 and e = 4 = (100)2.
3. The number is positive, so s = 0. (M. Roth, 1996. [Online]: https://www.cs.uaf.edu/2004/fall/cs301/notes/node49.html)
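This conversion can be checked against the bit pattern the Python standard library produces for 22.625:

import struct

(bits,) = struct.unpack(">I", struct.pack(">f", 22.625))
print(f"{bits:032b}")
# 01000001101101010000000000000000
# i.e. s = 0, exponent = 10000011 (131 - 127 = 4),
# mantissa = 01101010000000000000000 -> 1.0110101 × 2^4 = 10110.101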
Conclusion
While floating-point representation can't represent all numbers precisely, it does give us a guaranteed number of significant digits. For an 8-bit representation, we get a single bit of precision, which is pretty limited. Using a 16-bit representation, we get more mantissa and exponent bits, which increase precision and range of numbers respectively; even so, this can only represent 2^16 distinct real numbers.
Nearly all computers today follow the IEEE 754 standard (32-bit and 64-bit formats) for representing
floating-point numbers. This standard is similar to the 8-bit and 16-bit formats we've explored
already, but the standard deals with longer bit lengths to gain more precision and range; and it
incorporates two special cases to deal with very small and very large numbers. (C. Burch, 2011)
The IEEE 754 standard is, for now, the de facto standard for the floating point representation of binary numbers in modern computer systems, until an even more efficient standard is developed in the future. That is something exciting to look forward to.