Significant Figures & Error Analysis

Uploaded by

subhrojit.nandy.27105

As per new syllabus of University of Calcutta (CCF-2022) CHEM-H-SEC3-3-TH

Chapter 1: Numbers and Precision

Significant figures
Determining the number of significant figures in measured quantities is
essential when reporting the precision of measured values and the
precision that can be reported when measured values are used in
calculations. The rules for determining the number of significant figures
are as follows:

1. All nonzero digits are significant.

o For example, the value 211.8 has four significant figures.

2. All zeros that are found between nonzero digits are significant.

o Thus, the number 20,007, with three 0s between the 2 and 7, has a
total of five significant figures.

3. Leading zeros (to the left of the first nonzero digit) are not significant.

o A value such as 0.0085, for example, has two significant figures
because the 0s before the 8 are placeholders and are not significant.

4. Trailing zeros for a whole number that ends with a decimal point are
significant.

o For example, a value written as 320. shows the decimal point,
which indicates that the 0 to the right of the 2 was measured;
therefore, the value has a total of three significant figures. If the
decimal point were not written, then 320 would have only two
significant figures. In general, any confusion this may cause can be
avoided by writing values such as these in scientific notation.

5. Trailing zeros to the right of the decimal place are significant.

o This means a value such as 12.000 has a total of five significant

figures, since the 0s after the decimal place have been measured to
be zeros, indicating they are as significant as any other nonzero
digit.

6. Exact numbers, and mathematical constants such as Euler’s
number (e) and pi (π), have an infinite number of significant figures.

o In a defining expression like 1 meter = 100 centimeters, these

values are considered exact and thus have an infinite number of
significant figures. While π is usually written as 3.14 for ease of
calculation, the π button on the calculator would be used in any
calculations, and thus it is considered to be a value with infinite
significant figures.

For any value written in scientific notation as $A \times 10^{x}$, the number of
significant figures is determined by applying the above rules only to the
value of A; the x is considered an exact number and thus has an infinite
number of significant figures.

o For example, the value 4,500 can be written in scientific notation to
reflect two, three, or four significant figures:
o $4.5 \times 10^{3}$ has two significant figures
o $4.50 \times 10^{3}$ has three significant figures
o $4.500 \times 10^{3}$ has four significant figures
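
The counting rules above can be sketched in Python. This is only an illustrative sketch: the function name count_sig_figs is hypothetical, and it assumes the value is supplied as text, because trailing zeros and a trailing decimal point are lost once a number is stored as a float.

```python
from decimal import Decimal

def count_sig_figs(text):
    """Count significant figures in a number written as a string."""
    # Decimal never stores leading zeros, but it does keep trailing zeros.
    digits = list(Decimal(text).as_tuple().digits)
    coefficient = text.lower().split("e")[0]
    if "." not in coefficient:
        # Whole number written without a decimal point: trailing zeros
        # are treated as placeholders (rule 4).
        while len(digits) > 1 and digits[-1] == 0:
            digits.pop()
    return len(digits)

for value in ["211.8", "20007", "0.0085", "320.", "320", "12.000", "4.50e3"]:
    print(value, count_sig_figs(value))
```

Exact numbers (rule 6) are outside the scope of this sketch, since nothing in the written digits marks a value as exact.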

Calculations with significant figures

For calculations involving measured quantities, the first step in

determining the precision of the answer is to determine the number of
significant figures in each of the measured quantities. Once done, the
number of significant figures in a calculated value involving
measurements is determined based on the mathematical operation being
performed.

When two or more measured quantities are added or subtracted, the

resulting value will have the same number of decimal places as the value
with the fewest number of decimal places (the limiting value). So if the
measured values of 22.35 and 47.773 are added, the limiting value of

22.35 has two decimal places, which means that the result of the
addition will have only two decimal places.

When two or more measured quantities are being multiplied or divided,

the answer will have the same number of total significant figures as the
value with the fewest number of significant figures. So if the measured
values of 2.445 and 31.7 are being multiplied, the resulting value will
have three significant figures, since 2.445 has four significant figures,
but 31.7 has only three significant figures.
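
As a rough illustration of the two rules, the sketch below applies them to the values used above. The helper names are hypothetical, and the code assumes every measured value is written with a decimal point, so that all stored digits are significant.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def places(d):
    """Number of digits after the decimal point."""
    return max(0, -d.as_tuple().exponent)

def sig_figs(d):
    """Significant figures, assuming the value was written with a decimal point."""
    return len(d.as_tuple().digits)

def add_measured(a, b):
    # Addition/subtraction: keep the fewest decimal places.
    keep = min(places(a), places(b))
    return (a + b).quantize(Decimal(1).scaleb(-keep), rounding=ROUND_HALF_EVEN)

def multiply_measured(a, b):
    # Multiplication/division: keep the fewest significant figures.
    keep = min(sig_figs(a), sig_figs(b))
    product = a * b
    step = Decimal(1).scaleb(product.adjusted() - keep + 1)
    return product.quantize(step, rounding=ROUND_HALF_EVEN)

print(add_measured(Decimal("22.35"), Decimal("47.773")))     # 70.12
print(multiply_measured(Decimal("2.445"), Decimal("31.7")))  # 77.5
```

The unrounded results are 70.123 and 77.5065; the limiting values 22.35 (two decimal places) and 31.7 (three significant figures) decide how much of each is reported.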

When a value is to be rounded off, the rules for rounding are:

1. When the digit to the right of the one being rounded to is less than 5,
the digit being rounded to stays the same (the value rounds down).
o For example, 33.742 is to be rounded to one decimal place. Here,
the 7 in the first decimal place is followed by a 4, which is less than
5, which means that 33.742 rounded to one decimal place is 33.7.
Note that only the 4 that is to the right of the 7 is looked at here;
the 2 in the third decimal place is insignificant when rounding to
one decimal place.
2. When the digit to the right of the one being rounded to is greater than 5,
the value rounds up.
o For example, 2.8763 is to be rounded to two decimal places. In this
case, the 6 in the third decimal place is greater than 5, so the 7 in
the second decimal place is rounded up to 8. This means that when
rounded to two decimal places, 2.8763 rounds to 2.88. Again, the 3
in the fourth decimal place is insignificant when rounding to two
decimal places.
3. When the digit to the right of the one being rounded to is exactly a 5
(which means no nonzero digit follows it), the value is rounded so that
the final digit is an even number. This rule is designed to avoid always
rounding up or always rounding down; it creates more balance when
rounding.
o Thus, 21.45 rounds to one decimal place to 21.4, while 36.75 would
round to 36.8.
o However, if a value such as 38.25003 is to be rounded to one
decimal place, it rounds to 38.3. This is the only type of rounding
where a digit farther than immediately to the right of the one being
rounded to is ever considered. In this example, the digit looked at
when rounding off to one decimal place is a 5. However, farther
along the decimal portion of the value there is a nonzero digit. The
number being rounded is therefore rounded up, as the 0.00003
indicates that the value of 0.05003 is larger than just 0.05. For
this reason, the value rounded to one decimal place is 38.3, not
38.2.
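
The three rounding rules correspond to what Python's decimal module calls ROUND_HALF_EVEN. The sketch below (the function name round_half_even is hypothetical) reproduces the examples above; values are passed as strings because binary floats cannot store numbers such as 21.45 exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(text, places):
    """Round a decimal string to the given number of decimal places."""
    step = Decimal(1).scaleb(-places)   # e.g. places=1 gives 0.1
    return Decimal(text).quantize(step, rounding=ROUND_HALF_EVEN)

print(round_half_even("33.742", 1))    # 33.7  (next digit below 5)
print(round_half_even("2.8763", 2))    # 2.88  (next digit above 5)
print(round_half_even("21.45", 1))     # 21.4  (exactly 5: round to even)
print(round_half_even("36.75", 1))     # 36.8  (exactly 5: round to even)
print(round_half_even("38.25003", 1))  # 38.3  (5 followed by a nonzero digit)
```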

Errors in Numerical Analysis

Numerical analysis involves developing algorithms to solve
mathematical problems approximately rather than exactly. When
dealing with real-world problems, exact solutions are often impossible
due to the complexity of the equations involved, the limitations of
computational resources, or inherent approximations in the model.
These approximations lead to errors.

Absolute Error
Absolute Error is used to measure the accuracy of a measurement by
comparing it to the true or exact value. It shows how far off a
measurement is from the actual value, without considering whether the
measured value is greater or less than the true value. It is always
non-negative. The absolute error has the same units as the measured and
true values. On its own, absolute error does not tell us how significant
the error is relative to the true value.

Definition: Absolute error is the absolute difference between the
measured value and the true value.

Formula: $E_a = |X_{true} - X_{approx}|$

In this formula:

• $X_{true}$ is the true or exact value.
• $X_{approx}$ is the approximate or measured value.
• The vertical bars “| |” denote the absolute value, ensuring the error is
always non-negative.

Calculation of Absolute Error

1. Identify the True Value: Determine the exact value of the quantity.
This might be a known constant, a value from a theoretical model, or
the most accurate measurement available.
2. Identify the Approximate Value: Determine the approximate or
measured value. This could be a value obtained through
experimentation, estimation, or numerical approximation.
3. Subtract the Approximate Value from the True Value: Find the
difference between the true value and the approximate value.
4. Take the Absolute Value: Ensure the error is expressed as a non-
negative quantity by taking the absolute value of the difference.

Relative Error
Relative error is a measure of the accuracy of an approximation in
relation to the true value. It expresses the absolute error as a fraction of
the true value, providing the error’s significance compared to the
magnitude of the quantity being measured. Relative error is particularly
useful when comparing errors across different units because it is a
dimensionless quantity.

Definition: Relative Error is the ratio of the Absolute Error to the true or
exact value.

Formula: $E_r = |X_{true} - X_{approx}| / |X_{true}|$

Calculation of Relative Error

1. Determine the Absolute Error: Calculate the absolute error using
the formula.
2. Divide by the True Value: Divide the absolute error by the true
value to obtain the relative error.
3. Express as a Fraction or Percentage: The result can be left as a
fraction or multiplied by 100 to express it as a percentage.

Percentage Error
Percentage error quantifies the accuracy of a measured or estimated
value by expressing the error as a percentage of the true or exact value.
It provides a way to compare the error relative to the magnitude of the
true value, making it easier to understand the significance of the error
in context. A small percentage error means the measurement is close to
the true value while a large percentage error indicates that the
measurement is far from the true value.

Definition: Percentage Error is the ratio of the Absolute Error to the true
value, multiplied by 100. Equivalently, it is the Relative Error
multiplied by 100.

Formula: $E_p = E_r \times 100\% = (|X_{true} - X_{approx}| / |X_{true}|) \times 100\%$

Calculation of Percentage Error
1. Determine the Absolute Error: Calculate the absolute error using
the formula.
2. Divide by the True Value: Divide the absolute error by the true
value to obtain the relative error.

3. Multiply by 100: Convert the relative error to a percentage by
multiplying the result by 100.
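
The three error measures can be computed with a few lines of Python. The function names and the sample measurement (a true length of 50.0 cm measured as 49.5 cm) are assumptions made purely for illustration.

```python
def absolute_error(true_value, approx_value):
    """E_a = |X_true - X_approx|"""
    return abs(true_value - approx_value)

def relative_error(true_value, approx_value):
    """E_r = E_a / |X_true| (undefined when the true value is 0)."""
    if true_value == 0:
        raise ValueError("relative error is undefined when the true value is 0")
    return absolute_error(true_value, approx_value) / abs(true_value)

def percentage_error(true_value, approx_value):
    """E_p = E_r x 100"""
    return relative_error(true_value, approx_value) * 100

# Hypothetical measurement: true length 50.0 cm, measured 49.5 cm.
print(absolute_error(50.0, 49.5))    # error in cm
print(relative_error(50.0, 49.5))    # dimensionless
print(percentage_error(50.0, 49.5))  # percent
```

Here the absolute error is 0.5 cm, the relative error is 0.01 and the percentage error is 1%.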

Binary Number Representation

Binary is a base-2 number system that uses two mutually exclusive
states to represent information. A binary number is made up of elements
called bits where each bit can be in one of the two possible states.
Generally, we represent them with the numerals 1 and 0. We also talk
about them being true and false. Electrically, the two states might be
represented by high and low voltages or some form of switch turned on or
off.

We build binary numbers the same way we build numbers in our
traditional base 10 system. However, instead of a one's column, a 10's
column, a 100's column (and so on) we have a one's column, a two's
column, a four's column, an eight's column, and so on, as illustrated
below.

Binary  ...  $2^6$  $2^5$  $2^4$  $2^3$  $2^2$  $2^1$  $2^0$
Value   ...   64     32     16      8      4      2      1

For example, to represent the number 203 in base 10, we know we place
a 3 in the 1's column, a 0 in the 10's column and a 2 in
the 100's column. This is expressed with exponents in the table below.

203 in base 10
$10^2$  $10^1$  $10^0$
  2       0       3

Or, in other words, $2 \times 10^2 + 0 \times 10^1 + 3 \times 10^0 = 200 + 0 + 3 = 203$. To represent the
same thing in binary, we would have the following table.

203 in base 2
$2^7$  $2^6$  $2^5$  $2^4$  $2^3$  $2^2$  $2^1$  $2^0$
  1      1      0      0      1      0      1      1

That equates to $2^7 + 2^6 + 2^3 + 2^1 + 2^0 = 128 + 64 + 8 + 2 + 1 = 203$.
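
The same column-by-column expansion can be written as a short Python sketch; the function name from_binary is hypothetical, and Python's built-in int(text, 2) is used only as a cross-check.

```python
def from_binary(bits):
    """Sum digit x 2**position over the bit string (rightmost position is 0)."""
    return sum(int(bit) * 2 ** position
               for position, bit in enumerate(reversed(bits)))

print(from_binary("11001011"))  # 203
print(int("11001011", 2))       # 203, using the built-in conversion
```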

Base 2 and base 10 factors related to bytes

Name         Base 2 factor  Bytes                          Close base 10 factor  Bytes
1 Kilobyte   $2^{10}$       1,024                          $10^{3}$              1,000
1 Megabyte   $2^{20}$       1,048,576                      $10^{6}$              1,000,000
1 Gigabyte   $2^{30}$       1,073,741,824                  $10^{9}$              1,000,000,000
1 Terabyte   $2^{40}$       1,099,511,627,776              $10^{12}$             1,000,000,000,000
1 Petabyte   $2^{50}$       1,125,899,906,842,624          $10^{15}$             1,000,000,000,000,000
1 Exabyte    $2^{60}$       1,152,921,504,606,846,976      $10^{18}$             1,000,000,000,000,000,000

Conversion

The easiest way to convert between bases is to use a computer; after all,
that's what they're good at! However, it is often useful to know how to do
conversions by hand.

The easiest method to convert between bases by hand is repeated division. To
convert, repeatedly divide the quotient by the base until the quotient is
zero, making note of the remainder at each step. Then write the
remainders in reverse order, starting at the bottom and appending to the right
each time. An example should illustrate; since we are converting to
binary we use a base of 2.

Convert 203 to binary

Division   Quotient   Remainder
203 ÷ 2    101        1
101 ÷ 2    50         1
50 ÷ 2     25         0
25 ÷ 2     12         1
12 ÷ 2     6          0
6 ÷ 2      3          0
3 ÷ 2      1          1
1 ÷ 2      0          1

Reading from the bottom and appending to the right each time gives 11001011
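
A minimal Python sketch of the repeated-division method follows; the function name to_binary is hypothetical, and the code assumes a non-negative integer.

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder in one step
        remainders.append(str(remainder))
    # The first remainder is the least significant bit, so read in reverse.
    return "".join(reversed(remainders))

print(to_binary(203))  # 11001011
```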

Convert 193.379 to binary

First split the number into its integer part (193) and fractional part (.379), then convert
each part to binary separately, and finally join the two results.

Division   Quotient   Remainder
193 ÷ 2    96         1
96 ÷ 2     48         0
48 ÷ 2     24         0
24 ÷ 2     12         0
12 ÷ 2     6          0
6 ÷ 2      3          0
3 ÷ 2      1          1
1 ÷ 2      0          1

So, in binary form 193 becomes 11000001.

For the fractional part, repeatedly multiply by 2 and note the integer digit of each
product; the digits are read from top to bottom.

Multiplication   Product   Integer digit
0.379 × 2        0.758     0
0.758 × 2        1.516     1
0.516 × 2        1.032     1
0.032 × 2        0.064     0
0.064 × 2        0.128     0
0.128 × 2        0.256     0
0.256 × 2        0.512     0
0.512 × 2        1.024     1

So, in binary form 0.379 becomes approximately .01100001 (stopping after eight bits).

Thus, the decimal number $(193.379)_{10}$ can be written in binary form
as approximately $(11000001.01100001)_2$.
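
The repeated-multiplication step for the fractional part can be sketched as below. The function name fraction_to_binary is hypothetical; the fraction is passed as a string and held as an exact Fraction so that the intermediate products match the hand calculation.

```python
from fractions import Fraction

def fraction_to_binary(text, bits):
    """Binary digits of a decimal fraction (0 <= value < 1), truncated to `bits` digits."""
    frac = Fraction(text)
    digits = []
    for _ in range(bits):
        frac *= 2
        digit = int(frac)        # integer digit of the product: 0 or 1
        digits.append(str(digit))
        frac -= digit            # keep only the fractional part
    return "".join(digits)

print(fraction_to_binary("0.379", 8))  # 01100001
```

Because 0.379 has no finite binary expansion, the result is a truncation; asking for more bits gives a closer approximation.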

Binary Addition
Binary addition is similar to the ordinary addition of decimal
numbers, except that a carry is generated when a column sum reaches 2
instead of 10.

For example, when we compute 7 + 9 by hand, the answer is 16, and the
result has to be written with two digits, 1 and 6. The reason is that
the sum 7 + 9 is greater than the largest single digit, 9, so the result
cannot be denoted by a single digit.

Similarly, when we add two binary numbers, we have a carry whenever
the sum is bigger than 1 because, in binary numbers, 1 is the highest
digit. The binary addition rules are given in the following truth table
of addition.

A   B   A+B   Carry
0   0   0     0
0   1   1     0
1   0   1     0
1   1   0     1

In the above table, the first three rows give the same results as ordinary
decimal addition; only the last row, 1 + 1, produces a carry. The addition
of binary numbers is explained step by step below, using 11011 and 10101
as an example.

      1 1 1 1      (Carry)
      1 1 0 1 1    (27)
(+)   1 0 1 0 1    (21)
    ____________
    1 1 0 0 0 0    (48)

The step-by-step working, from the rightmost column to the left, is as follows:

1 + 1 => 10, write 0 with carry 1
1 + 0 + 1 (carry) => 10, write 0 with carry 1
0 + 1 + 1 (carry) => 10, write 0 with carry 1
1 + 0 + 1 (carry) => 10, write 0 with carry 1
1 + 1 + 1 (carry) => 11, write 1 with carry 1

Carefully note that 10 + 1 => 11, which is equal to 2 + 1 = 3. The final
carry becomes the leading digit, so the required result is 110000.
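
The column-by-column procedure with carries can be sketched in Python as follows; add_binary is a hypothetical helper that works directly on bit strings rather than converting to integers.

```python
def add_binary(a, b):
    """Add two binary strings column by column, propagating the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    bits = []
    for x, y in zip(reversed(a), reversed(b)):   # rightmost column first
        total = int(x) + int(y) + carry
        bits.append(str(total % 2))              # digit written in this column
        carry = total // 2                       # carry into the next column
    if carry:
        bits.append("1")
    return "".join(reversed(bits))

print(add_binary("11011", "10101"))  # 110000, i.e. 27 + 21 = 48
```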

Examples

The binary addition examples are shown in the following figure.

Binary Subtraction
The primary technique for subtraction is column-by-column subtraction with
borrowing. In this method, the smaller number must be subtracted from the
larger one, or else the technique will not work properly.

If the minuend is smaller than the subtrahend, swap their positions,
subtract, and remember that the result is a negative number. The binary
subtraction rules are given in the following table of subtraction.

A B A-B Borrow

0 0 0 0

1 0 1 0

1 1 0 0

0 1 1 1

For example, to subtract the subtrahend 11011 from the minuend 1101101,
arrange the two numbers so that the subtrahend is below the minuend, as
shown below.

  1101101
–   11011

To give the subtrahend the same number of digits as the minuend, add
leading zeros where required.

1101101
– 0011011
________
1010010

In the above binary subtraction example, the subtraction was carried out
from the right side to the left side with the help of the table shown
above. The step-by-step working is explained below.

Starting from the right side, we see:

1 – 1 = 0
0 – 1 = 1 (borrow 1)
1(0) – 0 = 0
1 – 1 = 0
0 – 1 = 1 (borrow 1)
1(0) – 0 = 0
1 – 0 = 1

Here 1(0) denotes a 1 that has become 0 after lending to the column on its
right. So the final result is 1010010.
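
The borrow procedure can be sketched in the same style; subtract_binary is a hypothetical helper and, like the hand method, it assumes the minuend is not smaller than the subtrahend.

```python
def subtract_binary(minuend, subtrahend):
    """Subtract binary strings column by column with borrowing."""
    width = max(len(minuend), len(subtrahend))
    m, s = minuend.zfill(width), subtrahend.zfill(width)
    borrow = 0
    bits = []
    for x, y in zip(reversed(m), reversed(s)):   # rightmost column first
        diff = int(x) - int(y) - borrow
        if diff < 0:
            diff += 2        # borrow 1 (worth 2) from the next column
            borrow = 1
        else:
            borrow = 0
        bits.append(str(diff))
    if borrow:
        raise ValueError("minuend is smaller than subtrahend; swap and negate")
    return "".join(reversed(bits))

print(subtract_binary("1101101", "11011"))  # 1010010, i.e. 109 - 27 = 82
```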

Binary Multiplication
Let us consider two binary numbers, 101101 and 101; to multiply them we
write as follows:

    101101
  ×    101
  ________
    101101
   000000×
+ 101101××
  ________
  11100001

Check once you are done:
101101 = 45
101 = 5
11100001 = 225
and 45 × 5 = 225

Let us consider binary numbers with a radix point in them: 101011.01 and
0101.10. To multiply them, first ignore the radix points, remove the
unneeded 0s, rewrite the numbers accordingly, and carry out the
multiplication.

       10101101
     ×     1011
    ___________
       10101101
      10101101×
     00000000××
  + 10101101×××
    ___________
    11101101111

Place the radix point in the answer by adding the numbers of bits after the
radix point in the two factors, i.e. 2 bits + 1 bit = 3 bits, which gives
11101101.111.

Check once you are done:
101011.01 = 43.25
101.1 = 5.5
11101101.111 = 237.875
and 43.25 × 5.5 = 237.875
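
A shift-and-add sketch of the same procedure is given below. The function name multiply_binary is hypothetical; as in the hand method, the radix points are dropped first and restored at the end by counting fractional bits.

```python
def multiply_binary(a, b):
    """Multiply two binary strings (optionally with a radix point) by shift-and-add."""
    frac_bits = 0
    operands = []
    for text in (a, b):
        whole, _, frac = text.partition(".")
        frac_bits += len(frac)                 # total bits after the radix points
        operands.append(int(whole + frac, 2))  # the digits with the point removed
    x, y = operands
    product, shift = 0, 0
    while y:
        if y & 1:                  # a 1 bit in the multiplier adds a shifted copy
            product += x << shift
        y >>= 1
        shift += 1
    bits = bin(product)[2:].zfill(frac_bits + 1)
    if frac_bits:
        return bits[:-frac_bits] + "." + bits[-frac_bits:]
    return bits

print(multiply_binary("101101", "101"))       # 11100001
print(multiply_binary("101011.01", "101.1"))  # 11101101.111
```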

Binary Division

Classification of binary representation

In general, the binary number can be represented in two ways.

1. Unsigned Binary Numbers

2. Signed Binary Numbers

Unsigned Binary Numbers

Using unsigned binary number representation, only positive binary
numbers can be represented. For n-bit unsigned binary numbers, all n-
bits are used to represent the magnitude of the number.

For example, if we represent decimal 12 in 5-bit unsigned number form
then $(12)_{10} = (01100)_2$. Here all 5 bits are used to represent the magnitude
of the number.

In unsigned binary number representation, using n bits, we can
represent the numbers from 0 to $2^n - 1$. For example, using 4 bits we
can represent the numbers from 0 to 15 in unsigned binary number
representation.

Signed Binary Numbers

Using signed binary number representation both positive and negative
numbers can be represented.

In signed binary number representation the most significant bit (MSB) of

the number is a sign bit. For positive numbers, the sign bit is 0 and for
negative number, the sign bit is 1.

There are three different ways the signed binary numbers can be
represented.

1. Signed Magnitude Form: range from $-(2^{k-1}-1)$ to $(2^{k-1}-1)$, for k bits.

2. 1’s complement representation: range from $-(2^{k-1}-1)$ to $(2^{k-1}-1)$, for
k bits.
3. 2’s complement representation: range from $-2^{k-1}$ to $(2^{k-1}-1)$,
for k bits.
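
The ranges can be tabulated for any word length with a small sketch; the function name representable_ranges and the dictionary keys are hypothetical labels chosen for this illustration.

```python
def representable_ranges(k):
    """Smallest and largest integers for each k-bit representation."""
    half = 2 ** (k - 1)
    return {
        "unsigned": (0, 2 ** k - 1),
        "sign-magnitude": (-(half - 1), half - 1),
        "ones-complement": (-(half - 1), half - 1),
        "twos-complement": (-half, half - 1),
    }

for name, (low, high) in representable_ranges(4).items():
    print(name, low, high)
```

With k = 4 this gives 0 to 15 for unsigned numbers, -7 to +7 for sign-magnitude and 1's complement, and -8 to +7 for 2's complement.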

Sign Magnitude Representation

In sign-magnitude representation, the most significant bit of the number
is a sign bit and the remaining bits represent the magnitude of the
number in true binary form. For example, if a signed number is
represented in 8-bit sign-magnitude form, then the MSB is the sign bit and
the remaining 7 bits represent the magnitude of the number in true
binary form.

In 8-bit sign-magnitude form, +34 is represented as 00100010 and -34 as
10100010.

Since the magnitude of both numbers is the same, the lower 7 bits in the
representation are the same for both numbers. For +34, the MSB is 0,
and for -34, the MSB or sign bit is 1.

In sign-magnitude representation, there are two different
representations for 0: in 8 bits, 00000000 (+0) and 10000000 (-0).
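
A minimal sketch of sign-magnitude encoding, assuming an 8-bit word by default; the function name sign_magnitude is hypothetical.

```python
def sign_magnitude(value, bits=8):
    """Encode an integer as a sign bit followed by the magnitude in binary."""
    limit = 2 ** (bits - 1) - 1
    if abs(value) > limit:
        raise ValueError(f"magnitude must not exceed {limit} for {bits} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")

print(sign_magnitude(34))   # 00100010
print(sign_magnitude(-34))  # 10100010
```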

Using n bits, the range of numbers that can be represented in sign-magnitude
representation is from $-(2^{n-1} - 1)$ to $(2^{n-1} - 1)$.

1’s Complement Representation

In 1’s complement representation, the representation of a positive
number is the same as in sign-magnitude form. But the representation of a
negative number is different.

For example, if we want to represent -34 in 8-bit 1’s complement form,
first write the positive number (+34) in binary, 00100010. Then invert every
bit, replacing 1s with 0s and 0s with 1s. The inverted number, 11011101,
represents -34 in 1’s complement form. It is also called the 1’s
complement of the number +34.

As another example, to represent -60 in 8-bit 1’s complement form, write
+60 as 00111100 and invert every bit to obtain 11000011.

Using n bits, the range of numbers that can be represented in 1’s
complement form is from $-(2^{n-1} - 1)$ to $(2^{n-1} - 1)$. For example, using 4
bits, it is possible to represent integers from -7 to +7 in 1’s
complement form.
Similar to sign-magnitude form, there are two different representations of
0 in 1’s complement form.
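
The bit inversion can be sketched as follows, assuming an 8-bit word by default; the function name ones_complement is hypothetical.

```python
def ones_complement(value, bits=8):
    """Encode an integer in 1's complement form."""
    limit = 2 ** (bits - 1) - 1
    if abs(value) > limit:
        raise ValueError(f"magnitude must not exceed {limit} for {bits} bits")
    positive = format(abs(value), f"0{bits}b")
    if value >= 0:
        return positive
    # Negative number: invert every bit of the positive representation.
    return "".join("1" if bit == "0" else "0" for bit in positive)

print(ones_complement(-34))  # 11011101
print(ones_complement(-60))  # 11000011
```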

2’s Complement Representation

In 2’s complement representation also, the representation of a positive
number is the same as in 1’s complement and sign-magnitude form.

But the representation of a negative number is different. For example,
if we want to represent -34 in 2’s complement form then

1. Write the number corresponding to +34.

2. Starting from the Least Significant Bit (LSB), copy all the bits up to
and including the first 1 encountered in the number.
3. After the first ‘1’ has been copied, invert all the remaining bits,
replacing 1s with 0s and 0s with 1s (including the sign bit).
4. The resultant number is the 2’s complement representation of the
number -34.

The second way of representing -34 in 2’s complement form is

1. Write the number corresponding to +34.

2. Find 1’s complement of +34
3. Add ‘1’ to the 1’s complement number
4. The resultant is 2’s complement representation of -34

For an n-bit number N, its 2’s complement is $2^n - N$. For example, the 2’s
complement of +34 in 8-bit form is $2^8 - 34$. In binary, it is 100000000 –
00100010 = 11011110. That is a third way of finding the 2’s
complement.
In 8 bits, -60 is represented as 10111100 in sign-magnitude form, 11000011
in 1’s complement form, and 11000100 in 2’s complement form.

Using n bits, the range of numbers which can be represented in 2’s
complement form is from $-2^{n-1}$ to $2^{n-1} - 1$. For example, using 4 bits, it
is possible to represent numbers from -8 to +7. Unlike 1’s complement
and sign-magnitude form, there is a unique way of representing 0 in
2’s complement form.
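
The three ways of finding the 2's complement described in this section can be compared with a short sketch. The function names are hypothetical, an 8-bit word is assumed by default, and each function takes the positive magnitude N (assumed to be at least 1) and returns the representation of -N.

```python
def invert(bits):
    return "".join("1" if bit == "0" else "0" for bit in bits)

def twos_by_copying(n, bits=8):
    """Copy bits from the LSB up to and including the first 1, invert the rest."""
    positive = format(n, f"0{bits}b")
    first_one = positive.rindex("1")   # first 1 met when scanning from the LSB
    return invert(positive[:first_one]) + positive[first_one:]

def twos_by_increment(n, bits=8):
    """Take the 1's complement, then add 1."""
    ones = int(invert(format(n, f"0{bits}b")), 2)
    return format((ones + 1) % 2 ** bits, f"0{bits}b")

def twos_by_subtraction(n, bits=8):
    """Compute 2**bits - n."""
    return format((2 ** bits - n) % 2 ** bits, f"0{bits}b")

for n in (34, 60):
    print(n, twos_by_copying(n), twos_by_increment(n), twos_by_subtraction(n))
```

All three methods give 11011110 for -34 and 11000100 for -60.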

Fixed and Floating point Representation

Digital computers use the binary number system to represent all types of
information inside the computer. Alphanumeric characters are also
represented using binary bits (i.e., 0 and 1). Digital representations are
easier to design, storage is easy, and accuracy and precision are greater.

There are various number systems for representing numbers, for example
the binary, octal, decimal, and hexadecimal number systems. The binary
number system is the most relevant and popular for representing numbers
in digital computer systems.

Storing Real Numbers

There are two major approaches to store real numbers (i.e., numbers
with fractional component) in modern computing. These are (i) Fixed
Point Notation and (ii) Floating Point Notation. In fixed point notation,
there are a fixed number of digits after the decimal point, whereas
floating point number allows for a varying number of digits after the
decimal point.

Fixed-Point Representation −

This representation has a fixed number of bits for the integer part and for
the fractional part. For example, if the given fixed-point format is
IIII.FFFF, then the smallest nonzero value you can store is 0000.0001 and
the largest value is 9999.9999. There are three parts of a fixed-point
number representation: the sign field, integer field, and fractional field.

We can represent these numbers using:

• Signed representation: range from $-(2^{k-1}-1)$ to $(2^{k-1}-1)$, for k bits.
• 1’s complement representation: range from $-(2^{k-1}-1)$ to $(2^{k-1}-1)$, for
k bits.
• 2’s complement representation: range from $-2^{k-1}$ to $(2^{k-1}-1)$,
for k bits.

2’s complement representation is preferred in computer systems because it
is unambiguous (zero has a single representation) and makes arithmetic
operations easier.

Example − Assume a number uses a 32-bit format which reserves 1 bit for
the sign, 15 bits for the integer part and 16 bits for the fractional part.

Then, -43.625 is represented as follows:

1 | 000000000101011 | 1010000000000000

where 0 is used to represent + and 1 is used to represent -.
000000000101011 is the 15-bit binary value for decimal 43 and
1010000000000000 is the 16-bit binary value for the fraction 0.625.
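
The 1 + 15 + 16 bit layout of this example can be reproduced with a short sketch. The function name to_fixed_point is hypothetical; the value is passed as a string and held as an exact Fraction, and any bits beyond the 16-bit fraction field are simply chopped.

```python
from fractions import Fraction

def to_fixed_point(text, int_bits=15, frac_bits=16):
    """Return (sign, integer field, fraction field) as bit strings."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    scaled = int(abs(value) * 2 ** frac_bits)   # chop bits beyond the fraction field
    if scaled >= 2 ** (int_bits + frac_bits):
        raise ValueError("value does not fit in the integer field")
    word = format(scaled, f"0{int_bits + frac_bits}b")
    return sign, word[:int_bits], word[int_bits:]

print(to_fixed_point("-43.625"))
# ('1', '000000000101011', '1010000000000000')
```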

The advantage of using a fixed-point representation is performance; the
disadvantage is the relatively limited range of values that it can represent.
So, it is usually inadequate for numerical analysis as it does not allow
enough range and accuracy. A number whose representation exceeds
32 bits would have to be stored inexactly.

In the 32-bit format given above, the smallest positive number that can be
stored is $2^{-16} \approx 0.000015$ and the largest positive number is
$(2^{15}-1) + (1-2^{-16}) = 2^{15} - 2^{-16} \approx 32768$; the gap between
consecutive numbers is $2^{-16}$.

We can move the radix point either left or right with the help of only
integer field is 1.

Floating-Point Representation −

This representation does not reserve a specific number of bits for the
integer part or the fractional part. Instead it reserves a certain number of
bits for the number (called the mantissa or significand) and a certain
number of bits to say where within that number the decimal place sits
(called the exponent).

The floating-point representation of a number has two parts: the first
part represents a signed fixed-point number called the mantissa. The second
part designates the position of the decimal (or binary) point and is
called the exponent. The fixed-point mantissa may be a fraction or an
integer. A floating-point number is always interpreted to represent a
number in the following form: $m \times r^{e}$.

Only the mantissa m and the exponent e are physically represented in
the register (including their signs). A floating-point binary number is
represented in a similar manner except that it uses base 2 for the
exponent. A floating-point number is said to be normalized if the most
significant digit of the mantissa is 1.

So, the actual number is $(-1)^{s} (1+m) \times 2^{(e-Bias)}$, where s is the sign bit, m is the
mantissa, e is the exponent value, and Bias is the bias number.

Note that the signed exponent is not stored in sign-magnitude, one’s
complement, or two’s complement form; it is stored in biased form instead.
In sign-magnitude, 1’s complement and 2’s complement representations the
sign bit plays a vital role and makes the values harder to handle, and
there is a discontinuity in these systems, whereas the biased system has no
such discontinuity; hence it is preferred.

The floating-point representation is more flexible. Any non-zero number x
can be represented in the normalized form $\pm (1.b_1 b_2 b_3 \ldots)_2 \times 2^{n}$.

Example − Suppose a number uses a 32-bit format: 1 sign bit, 8
bits for the signed exponent, and 23 bits for the fractional part. The leading
bit 1 is not stored (as it is always 1 for a normalized number) and is
referred to as a “hidden bit”.

Then −53.5 is normalized as $-53.5 = (-110101.1)_2 = (-1.101011)_2 \times 2^{5}$, which
is represented as follows:

1 | 00000101 | 10101100000000000000000

where 00000101 is the 8-bit binary value of the exponent +5, the mantissa
is only 101011 with the other 17 bits filled with 0s, and we omit the
integer part of the binary number.

Note that the 8-bit exponent field is used to store integer exponents
$-126 \le n \le 127$ (bias system).

The smallest normalized positive number that fits into 32 bits is
$(1.00000000000000000000000)_2 \times 2^{-126} = 2^{-126} \approx 1.18 \times 10^{-38}$, and the largest
normalized positive number that fits into 32 bits is
$(1.11111111111111111111111)_2 \times 2^{127} = (2^{24}-1) \times 2^{104} \approx 3.40 \times 10^{38}$.
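
The two limits can be checked in Python by building the corresponding bit patterns and letting the struct module interpret them. This sketch assumes the IEEE 754 single-precision layout, in which the exponent field stores n + 127; the helper name float32_from_bits is hypothetical.

```python
import struct

def float32_from_bits(pattern):
    """Interpret a 32-character bit string as an IEEE 754 single-precision value."""
    return struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]

# Exponent field 00000001 means n = -126; field 11111110 means n = 127.
smallest = float32_from_bits("0" + "00000001" + "0" * 23)
largest = float32_from_bits("0" + "11111110" + "1" * 23)

print(smallest, smallest == 2.0 ** -126)
print(largest, largest == (2 ** 24 - 1) * 2.0 ** 104)
```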

The precision of a floating-point format is the number of positions
reserved for binary digits plus one (for the hidden bit). In the examples
considered here the precision is 23 + 1 = 24.

The gap between 1 and the next normalized floating-point number is
known as machine epsilon. The gap is $(1+2^{-23}) - 1 = 2^{-23}$ for the above example,
but this is not the same as the smallest positive floating-point number because
of non-uniform spacing, unlike in the fixed-point scenario.

Note that non-terminating binary numbers cannot be represented exactly in
floating-point representation, e.g., $1/3 = (0.010101\ldots)_2$ cannot be a floating-point
number as its binary representation is non-terminating.
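
Both remarks can be checked numerically. The sketch below assumes IEEE 754 single precision, accessed through Python's struct module; the helper names to_float32 and next_float32 are hypothetical, and next_float32 is only meant for positive finite values.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def next_float32(x):
    """Next larger single-precision number after a positive finite x."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", raw + 1))[0]

epsilon = next_float32(1.0) - 1.0
print(epsilon == 2.0 ** -23)                 # True: machine epsilon

third = to_float32(1 / 3)
print(Fraction(third) == Fraction(1, 3))     # False: 1/3 is stored only approximately
```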

IEEE Floating point Number Representation −

IEEE (Institute of Electrical and Electronics Engineers) has standardized
floating-point representation in the IEEE 754 standard.

So, the actual number is $(-1)^{s}(1+m) \times 2^{(e-Bias)}$, where s is the sign bit, m is the
mantissa, e is the exponent value, and Bias is the bias number. The sign
bit is 0 for a positive number and 1 for a negative number. Exponents are
stored in biased representation.

According to the IEEE 754 standard, floating-point numbers are
represented in the following formats:

• Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit
mantissa
• Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit
mantissa
• Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit
mantissa
• Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112
bit mantissa
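
As an illustration of the single-precision layout, the sketch below splits a value into its three fields using Python's struct module; the function name float32_fields is hypothetical. For -53.5 the stored exponent field is 5 + 127 = 132, because the IEEE 754 single-precision bias is 127.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of a single-precision value."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = float32_fields(-53.5)
print(sign, exponent, mantissa)   # 1 10000100 10101100000000000000000
print(int(exponent, 2) - 127)     # 5, the true exponent
```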

Special Value Representation −

There are some special values, depending upon the values of the
exponent and mantissa, in the IEEE 754 standard.

• All exponent bits 0 with all mantissa bits 0 represents 0. If the sign
bit is 0, then +0, else -0.
• All exponent bits 1 with all mantissa bits 0 represents infinity. If the
sign bit is 0, then +∞, else -∞.
• All exponent bits 0 with non-zero mantissa bits represents a
de-normalized number.
• All exponent bits 1 with non-zero mantissa bits represents an error,
i.e. NaN (not a number).
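
These special bit patterns can be constructed and inspected in Python. The sketch assumes single precision; the helper name float32_from_fields and the particular mantissa values chosen for the de-normalized number and the NaN are illustrative only.

```python
import math
import struct

def float32_from_fields(sign, exponent, mantissa):
    """Build a single-precision value from its sign, exponent and mantissa fields."""
    raw = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", raw))[0]

neg_zero = float32_from_fields(1, 0, 0)                 # exponent 0, mantissa 0
pos_inf = float32_from_fields(0, 0xFF, 0)               # exponent all 1s, mantissa 0
denormal = float32_from_fields(0, 0, 1)                 # exponent 0, mantissa non-zero
not_a_number = float32_from_fields(0, 0xFF, 1 << 22)    # exponent all 1s, mantissa non-zero

print(neg_zero, pos_inf, denormal, not_a_number)
```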

Floating Point Arithmetic

Let two decimal numbers x and y be chosen for an arithmetic operation,
and let z be the result. Let fx, fy and fz be the respective fractional
parts and Ex, Ey and Ez the respective exponents of x, y and z, so that
the normalized form is x = fx × 10^Ex (and similarly for y and z), with
0.1 <= |fx| < 1.

1. For Addition and Subtraction:

a. Set Ez to the larger of Ex and Ey (if Ex >= Ey, Ez = Ex).
b. Shift the decimal point of the fraction with the smaller exponent
(fx or fy) so that both numbers have exponent Ez.
c. Perform fx ± fy to get fz.


d. The normalized value of fz cannot reach or exceed 1; if it does,
shift the decimal point one place to the left to bring 0 before the
decimal point and increase Ez by 1. E.g.: 1.987 => 0.1987
For Example:
i) Add 0.7642E4 and 0.4253E6.
Solution: The exponent of the number with the smaller exponent is
increased by 2, so that 0.7642E4 becomes 0.0076E6 (the mantissa is
chopped to four digits). Then
0.7642E4 + 0.4253E6 = 0.0076E6 + 0.4253E6 = 0.4329E6.
ii) Subtract 0.8542E-5 from 0.4673E-4.
Solution: The smaller exponent is E-5, so we increase the exponent of
0.8542E-5 by 1 and it becomes 0.0854E-4; therefore
0.4673E-4 - 0.0854E-4 = 0.3819E-4.
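2. For Multiplication:
a. Multiply the fractional parts: fz = fx × fy
b. Add the exponents: Ez = Ex + Ey
c. Then z = fz × 10^Ez
d. The normalized value of fz must satisfy 0.1 <= fz < 1. The product
of two normalized fractions is always below 1 but may fall below 0.1;
if it does, shift the decimal point one place to the right and
decrease Ez by 1. E.g.: 0.08631288E-3 => 0.8631288E-4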

For Example:
Multiply 0.5634E11 × 0.1532E-14.
Solution: 0.5634 × 0.1532 = 0.08631288 and the exponent is
11 + (-14) = -3. Therefore, 0.5634E11 × 0.1532E-14 = 0.08631288E-3. The
leading digit of the mantissa must be non-zero, so 0.08631288E-3
becomes 0.8631288E-4, which is chopped to four digits: 0.8631E-4.
3. For Division:
a. Divide the fractional parts: fz = fx / fy
b. Subtract the exponents: Ez = Ex - Ey
c. Then z = fz × 10^Ez
d. The normalized value of fz cannot reach or exceed 1; if it does,
shift the decimal point one place to the left to bring 0 before the
decimal point and increase Ez by 1. E.g.: 1.987 => 0.1987
For Example:
Divide 0.2000E5 by 0.8883E3.
Solution: 0.2000 / 0.8883 = 0.2251 (chopped to four digits) and the
exponent is 5 - 3 = 2. Therefore, 0.2000E5 / 0.8883E3 = 0.2251E2.
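
The rules for the four operations can be sketched in a few lines of code. This is an illustrative sketch, not a definitive implementation: it assumes a four-digit decimal mantissa with chopping (truncation), as in the worked examples, and stores each number as a pair (f, E) where f is the mantissa as an integer (0.7642 is stored as 7642). All function names are hypothetical.

```python
DIGITS = 4                 # mantissa length used in the worked examples
SCALE = 10 ** DIGITS       # mantissa 0.7642 is stored as the integer 7642

def normalize(f: int, scale: int, e: int):
    """Bring f/scale * 10**e to normalized form, chopped to DIGITS digits."""
    if f == 0:
        return 0, 0
    sign = -1 if f < 0 else 1
    f = abs(f)
    while f >= scale:          # fraction >= 1: move point left, raise exponent
        scale *= 10
        e += 1
    while f * 10 < scale:      # fraction < 0.1: move point right, lower exponent
        f *= 10
        e -= 1
    return sign * (f * SCALE // scale), e   # chop the extra digits

def shift_right(f: int, k: int) -> int:
    """Move the decimal point of the mantissa k places, chopping lost digits."""
    sign = -1 if f < 0 else 1
    return sign * (abs(f) // 10 ** k)

def add(x, y):
    (fx, ex), (fy, ey) = x, y
    if ex < ey:                               # make x the larger-exponent operand
        (fx, ex), (fy, ey) = (fy, ey), (fx, ex)
    fy = shift_right(fy, ex - ey)             # align to the larger exponent
    return normalize(fx + fy, SCALE, ex)

def sub(x, y):
    return add(x, (-y[0], y[1]))

def mul(x, y):
    return normalize(x[0] * y[0], SCALE * SCALE, x[1] + y[1])

def div(x, y):
    (fx, ex), (fy, ey) = x, y
    sign = -1 if (fx < 0) != (fy < 0) else 1
    guard = SCALE * SCALE                     # extra digits before chopping
    return normalize(sign * (abs(fx) * guard // abs(fy)), guard, ex - ey)

def show(z) -> str:
    f, e = z
    return f"{'-' if f < 0 else ''}0.{abs(f):0{DIGITS}d}E{e}"

print(show(add((7642, 4), (4253, 6))))      # 0.4329E6
print(show(sub((4673, -4), (8542, -5))))    # 0.3819E-4
print(show(mul((5634, 11), (1532, -14))))   # 0.8631E-4
print(show(div((2000, 5), (8883, 3))))      # 0.2251E2
```

Integer arithmetic is used for the mantissas so that the chopping behaves exactly as in the hand calculations and is not affected by binary rounding.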
