Probability and Statistics

This document covers measures of central tendency, focusing on the arithmetic mean and median. It explains how to calculate the mean for sets of values, simple frequency distributions, and grouped frequency distributions, along with examples. The median is introduced as an alternative average, detailing its calculation for both discrete and grouped data, emphasizing its usefulness in cases with extreme values.

Uploaded by

bxlxjxkili003

Statistics

1.1. Measures of Central Tendency


1.1.1. The Arithmetic Mean
Introduction:
This chapter describes the most commonly used average, the arithmetic mean. It is
initially defined in words with an accompanying simple example, then some important
notation for describing sets of data is given. Techniques for calculating the mean for
(discrete) sets of data and frequency distributions are then demonstrated and the place of
so-called “weighted” means is shown.

Definition of the Arithmetic Mean:

The ARITHMETIC MEAN of a set of values is defined as “the sum of the values”
divided by “the number of values”. The arithmetic mean is normally abbreviated to just
the “mean”.

Example 1: (ARITHMETIC MEAN for a SET)


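(a) If a firm received orders worth £151, £52 and £280 for three consecutive months,
their mean value of orders per month would be calculated as:
$\frac{151 + 52 + 280}{3} = \frac{483}{3} = 161$, that is, £161 per month.

(b) The mean of the values 12, 8, 25, 26 and 10 is calculated as:
$\frac{12 + 8 + 25 + 26 + 10}{5} = \frac{81}{5} = 16.2$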

Formula for the mean of a set of values:


The Arithmetic Mean is the most commonly used average and is defined (for a set of
values) as follows:

$\text{mean} = \frac{\text{the sum of the values}}{\text{the number of values}}$

It is commonly known as the “mean”, which it will generally be called from here on in the
manual. Using the notation of the previous section, the mean of a set of values
$x_1, x_2, \ldots, x_n$ is calculated as follows:

Mean for a set: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$

Example 2: (MEAN for a SET):


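To calculate the mean for the set: 43, 75, 50, 51, 51, 47, 50, 47, 40, 48
Here, n = 10 and $\sum x = 43 + 75 + \cdots + 48 = 502$.
Therefore: $\bar{x} = \frac{\sum x}{n} = \frac{502}{10} = 50.2$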
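
The calculation in Example 2 can be checked with a few lines of Python. This is only an illustrative sketch; the function name `arithmetic_mean` is an assumption made here and does not come from the text.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


# Data from Example 2
data = [43, 75, 50, 51, 51, 47, 50, 47, 40, 48]
print(arithmetic_mean(data))  # 50.2
```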

The Mean of a Simple Frequency Distribution:


Large sets of data will normally be arranged into a frequency distribution, and
thus the formula for the mean given in the last section is not quite appropriate, since no
account is taken of frequencies. In the case of a simple (discrete) frequency distribution
such as:
x    10   12   13   14   16   19
f     2    8   17    5    1    1
each value x has to be counted f times, so the mean is calculated as $\bar{x} = \frac{\sum fx}{\sum f}$.

Example 3: (MEAN for a SIMPLE FREQUENCY DISTRIBUTION)


Calculate the mean of the following distribution:
No. of vehicles serviceable (x)    0    1    2    3    4    5
Number of days (f)                 2    5   11    4    4    1

Solution:
x         f     fx
0         2      0
1         5      5
2        11     22
3         4     12
4         4     16
5         1      5
Totals   27     60

Thus $\bar{x} = \frac{\sum fx}{\sum f} = \frac{60}{27} = 2.2$ (to 1 decimal place).

Hence, the mean number of vehicles serviceable is 2.2.
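
As a rough check of Example 3, the sketch below weights each value by its frequency. The function name `frequency_mean` is hypothetical and is used only for illustration.

```python
def frequency_mean(values, freqs):
    """Mean of a simple frequency distribution: sum(f * x) / sum(f)."""
    total = sum(x * f for x, f in zip(values, freqs))
    return total / sum(freqs)


# Data from Example 3
x = [0, 1, 2, 3, 4, 5]
f = [2, 5, 11, 4, 4, 1]
print(round(frequency_mean(x, f), 2))  # 2.22
```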

The Mean of a Grouped Frequency Distribution:


One of the disadvantages of arranging discrete data into the form of a grouped
frequency distribution is the fact that individual values of items are lost. This is
particularly inconvenient when a mean needs to be calculated since, clearly, it is
impossible to find the total of the values of the items, which means, in effect, that it is
impossible to calculate the mean exactly. However, it is possible to estimate it. This is
done by:
(a) using the group (or class) mid-points as representative x-values,
(b) estimating the total of the values in each group using x times f (the group frequency),
(c) adding these totals together to form an estimate of the total of all values (i.e. $\sum fx$),
(d) dividing by the total number of items, $\sum f$.

Notice that this gives an estimate of the mean as $\frac{\sum fx}{\sum f}$, which is exactly the same
formula as for a simple frequency distribution.

Formula for the Mean of a frequency distribution:
The mean for a frequency distribution is calculated using the following formula.
Mean for a frequency distribution

Mean, $\bar{x} = \frac{\sum fx}{\sum f}$

Note: For a grouped (discrete or continuous) frequency distribution, x is the class mid-point.

Example 4: (MEAN for a GROUPED DISCRETE FREQUENCY DISTRIBUTION)


The following data relate to the number of successful sales made by the salesmen
employed by a large microcomputer firm in a particular quarter.
Number of Sales       0 – 4   5 – 9   10 – 14   15 – 19   20 – 24   25 – 29
Number of Salesmen    1       14      23        21        15        6

Calculate the mean number of sales.

Solution: The standard layout and calculations are as follows:

Number of Sales   Number of Salesmen (f)   Class midpoint (x)   (fx)
0 to 4             1                        2                      2
5 to 9            14                        7                     98
10 to 14          23                       12                    276
15 to 19          21                       17                    357
20 to 24          15                       22                    330
25 to 29           6                       27                    162
Totals            80                                            1225

Here, $\sum f = 80$ and $\sum fx = 1225$.

Mean number of sales, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1225}{80} = 15.3$ (to 1 decimal place).
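
The grouped calculation in Example 4 can be sketched as follows. The function name `grouped_mean` and the representation of each class as a (lower limit, upper limit) pair are assumptions made for this illustration.

```python
def grouped_mean(classes, freqs):
    """Estimate the mean of a grouped distribution from class mid-points."""
    # The mid-point of each class stands in for every value in that class.
    midpoints = [(low + high) / 2 for low, high in classes]
    total = sum(m * f for m, f in zip(midpoints, freqs))
    return total / sum(freqs)


# Data from Example 4
classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
freqs = [1, 14, 23, 21, 15, 6]
print(grouped_mean(classes, freqs))  # 15.3125
```

The result is an estimate only, since the individual values inside each class are not known.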

Example 5: (MEAN of a GROUPED CONTINUOUS FREQUENCY DISTRIBUTION)


A machine produces circular bolts and, for a quality control test, 250 were selected
randomly and the diameter of their heads measured. Find the mean of the following
resulting diameters.

Diameter of head (cm)    Number of components
0.9747 – 0.9749            2
0.9750 – 0.9752            6
0.9753 – 0.9755            8
0.9756 – 0.9758           15
0.9759 – 0.9761           42
0.9762 – 0.9764           68
0.9765 – 0.9767           49
0.9768 – 0.9770           25
0.9771 – 0.9773           18
0.9774 – 0.9776           12
0.9777 – 0.9779            4
0.9780 – 0.9782            1

Solution: Since there are many classes and the data is fairly unwieldy, the given classes
have not been repeated in the following table of calculation. This would be perfectly
acceptable in an examination.

Mid Points x f fx
0.9748 2 1.9496
0.9751 6 5.8506
0.9754 8 7.8032
0.9757 15 14.6355
0.9760 42 40.9920
0.9763 68 66.3884
0.9766 49 47.8534
0.9769 25 24.4225
0.9772 18 17.5896
0.9775 12 11.7300
0.9778 4 3.9112
0.9781 1 0.9781
Totals 250 244.1041

Here Σfx = 244.1041 and Σf = 250

Therefore, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{244.1041}{250} = 0.97642$ (to 5 decimal places).

That is, the mean diameter of head = 0.97642 cm.

Note:
(a) The arithmetic mean is the best-known example of a measure of location, or
average, which aims to represent a set of items numerically.
(b) The special notation, x1, x2, x3, ……etc is used as a method of describing the
individual values of the items in a group in general terms, without specifying their
actual values.
(c) The summation operator, ∑, is used to represent the addition of a set of values in
general terms.
(d) The mean for a set of values is found by dividing the sum of the values by their
number.
(e) The mean is the most popular average, being well understood and taking all items
into account. Its main disadvantage is the fact that it takes extreme values too
much into account and can be considered unrepresentative where such values
occur.

1.1.2. The Median


Introduction
The median is generally considered as an alternative average to the mean. This section
defines the median and shows how to find its value for a set and for simple and grouped
frequency distributions. In the case of a grouped frequency distribution, two equivalent
methods are demonstrated, one using a formula, the other a graphical method. Also
described are the standard situations where the median is most effectively used.

Definition of the Median


Suppose a machine produces 5, 3, 5, 21 and 2 defective items each day over a five-day
period. The mean number of defectives per day would be calculated as:
Mean = $\frac{5 + 3 + 5 + 21 + 2}{5} = \frac{36}{5} = 7.2$
An objection to using 7.2 as an average here is that it is unrepresentative, both of the four
lower values (5, 3, 5 and 2) and the largest value (21). The mean takes extreme items into
account and thus is sometimes not very useful as a practical average. In cases such as
these, the median is used. This is found by placing the values in size order and picking
the middle value as the average. The above five values, in order of size, can be written as:
2, 3, 5, 5, 21
and the median (the middle value, here the third item) is seen to be 5, which is more useful as a
working average.

The MEDIAN of a set of data is the value of that item which lies exactly half-way along
the set (which must be arranged into size order).

Note 1: When a set of data contains an even number of items, there is no unique middle
or central value. The convention used in this situation is to use the mean of the middle
two items to give a (practical) median.
Note 2: For a set with an odd number (n) of items, the median can be precisely identified
as the value of the $\frac{n+1}{2}$th item. Thus in a size-ordered set of 15 items, the median
would be the $\frac{15+1}{2}$th = the 8th item along.

Example 6: (MEDIAN of a set of values)


(a) The median of 43, 75, 48, 51, 51, 47, 50 is determined by size-ordering the
set as: 43, 47, 48, 50, 51, 51, 75 and then: median = middle item = 50.
(b) The median of 2, 4, 6, 1, 2, 3, 3, 2 is found by size-ordering the set as:
1,2,2,2,3,3,4,6 (noticing that there is an even number of items) which
gives median = mean of middle two = (2+3)/2 = 2.5
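
Both parts of Example 6 can be reproduced with the short sketch below, which follows the definition and Note 1 directly. The function name `median_of_set` is an assumption made for illustration.

```python
def median_of_set(values):
    """Median of a set: middle item, or mean of the middle two items."""
    ordered = sorted(values)          # arrange into size order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the ((n + 1) / 2)th item
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of middle two


print(median_of_set([43, 75, 48, 51, 51, 47, 50]))  # 50
print(median_of_set([2, 4, 6, 1, 2, 3, 3, 2]))      # 2.5
```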

Median for a Simple Frequency Distribution

Where there are a large number of discrete items in a data set, but the range of values is
limited, a simple frequency distribution will probably have been compiled. For example,
if records had been kept of the number of vehicles not available for hire on each of 80
consecutive days for a large taxi fleet, the results might appear as follows.

Number of vehicles unavailable    Number of days
0                                 15
1                                 24
2                                 18
3                                 12
4                                  8
5                                  2
6                                  1

Procedure for calculating the Median


To calculate the median for a simple (discrete) frequency distribution, the following
procedure should be followed.
Step 1: Calculate the value of $\frac{\sum f + 1}{2}$ (identifying the central item)
Step 2: Form an F (cumulative frequency) column
Step 3: Find that F value which first exceeds $\frac{\sum f + 1}{2}$
x     f     F
__________________
0    15    15
1    24    39
2    18    57    ← median = 2
3    12    69
4     8    77
5     2    79
6     1    80

Here $\frac{\sum f + 1}{2} = \frac{80 + 1}{2} = 40.5$, and the first F value to exceed 40.5 is F = 57.

Step 4: The median is the x-value corresponding to the F value identified in Step 3
Note: Sometimes $\sum f$ is replaced by N for convenience.

Example 7: Calculate the median for the following distribution of delivery times of
orders sent out from a firm
Delivery time(days) 0 1 2 3 4 5 6 7 8 9 10 11
No. of orders 4 8 11 12 21 15 10 4 2 2 1 1

Solution:

The median is the $\frac{91 + 1}{2}$th = 46th item.
The F column is shown in the following table.
The first F value to exceed 46 is F = 56.

Delivery time (days) x   No. of orders f   Cum. Freq. (F)


0 4 4
1 8 12
2 11 23
3 12 35
4 21 56
5 15 71
6 10 81
7 4 85
8 2 87
9 2 89
10 1 90
11 1 91

The median is thus 4 (days)
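
The four-step procedure can be sketched in Python as below. The function name `frequency_median` is hypothetical. One assumption is made beyond the text: the comparison uses "equals or exceeds" rather than "exceeds", so that a central item falling exactly on a cumulative frequency is still assigned to that class.

```python
from itertools import accumulate


def frequency_median(values, freqs):
    """Median of a simple discrete frequency distribution."""
    cumulative = list(accumulate(freqs))     # Step 2: the F column
    position = (cumulative[-1] + 1) / 2      # Step 1: (sum of f + 1) / 2
    for x, big_f in zip(values, cumulative):
        if big_f >= position:                # Step 3: first F reaching the position
            return x                         # Step 4: the matching x-value


# Data from Example 7
days = list(range(12))
orders = [4, 8, 11, 12, 21, 15, 10, 4, 2, 2, 1, 1]
print(frequency_median(days, orders))  # 4
```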

Median for a Grouped frequency distribution:


As mentioned earlier, the penalty paid for grouping values is the loss of their
individual identities and thus there is no way that a median can be calculated exactly in
this situation. However, there is a method commonly employed for estimating the
median: using an interpolation formula. Interpolation in this context is a simple mathematical
technique which estimates an unknown value by utilizing immediately surrounding
known values.

Steps for estimating the median by formula:


Step 1: Form a cumulative frequency (F) column
Step 2: Find the value of N/2 (where N = Σf)
Step 3: Find that F value that first exceeds N/2, which identifies the median class M.
Step 4: Calculate the median using the following interpolation formula:

Median = $L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) c_M$

where $L_M$ is the lower bound of the median class,
$F_{M-1}$ is the cumulative frequency of the class immediately prior to the median class,
$f_M$ is the actual frequency of the median class, and
$c_M$ is the width of the median class.

Example 8: Estimate the median for the following data, which represents the ages of a
set of 130 representatives who took part in a statistical survey.
Age in years         No. of representatives
20 and under 25       2
25 and under 30      14
30 and under 35      29
35 and under 40      43
40 and under 45      33
45 and under 50       9

Solution:
Age (in years) Number of Representatives f F
20 and under 25 2 2
25 and under 30 14 16
30 and under 35 29 45
35 and under 40 43 88
40 and under 45 33 121
45 and under 50 9 130

N/2 = 130/2 = 65

The median class is the class that has the first F greater than 65, i.e. 35 to 40

The median can now be estimated using the interpolation formula

$L_M = 35$, $F_{M-1} = 45$, $f_M = 43$, $c_M = 5$

Median = $35 + \left(\frac{65 - 45}{43}\right) \times 5 = 35 + 2.33 = 37.33$

Therefore the median is 37.33 years.
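
The interpolation steps can be sketched as follows. The function name `grouped_median` and the argument layout (parallel lists of lower bounds, class widths and frequencies) are assumptions made for this illustration.

```python
def grouped_median(lower_bounds, widths, freqs):
    """Estimate the median of a grouped distribution by interpolation."""
    half = sum(freqs) / 2                 # Step 2: N / 2
    before = 0                            # cumulative frequency before this class
    for low, width, f in zip(lower_bounds, widths, freqs):
        if before + f > half:             # Step 3: first F to exceed N / 2
            # Step 4: L_M + ((N/2 - F_{M-1}) / f_M) * c_M
            return low + (half - before) / f * width
        before += f
    raise ValueError("distribution has no items")


# Data from Example 8
lower_bounds = [20, 25, 30, 35, 40, 45]
widths = [5, 5, 5, 5, 5, 5]
freqs = [2, 14, 29, 43, 33, 9]
print(round(grouped_median(lower_bounds, widths, freqs), 2))  # 37.33
```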

Median for a simple continuous frequency distribution:

Occasionally, continuous data will be measured to a particular value rather than
naturally allocated to true continuous groups. For example, during a work study exercise,
the times taken by 46 workers to complete a particular job were measured to give the
following:

No. of minutes   11   12   13   14   15   16   17   18   19
No. of workers    2    6   18   12    5    0    1    1    1
Notice that, although at first sight the data might appear discrete, it is strictly continuous.
In order to calculate the median, the values given for number of minutes must be translated
as true continuous groups rather than discrete values.

Characteristics of the Median:


(i) It is an appropriate alternative to the mean when extreme values are present at one or
both ends of a set or distribution.
(ii) It can be used when certain end values of a set or distribution are difficult,
expensive or impossible to obtain; this is particularly appropriate to “life” data.
(iii) It can be used with non-numeric data if desired, providing the measurements
can be naturally ordered.
(iv) It will often assume a value equal to one of the original items, which is
considered as an advantage over the mean.
(v) The main disadvantage of the median is that it is difficult to handle
theoretically in more advanced statistical work.

1.1.3. The Mode


Introduction:
Although the mean and median will be the averages used in most circumstances,
there are situations in which other averages are particularly appropriate. Whereas the
mean can be said to find the centre of gravity and the median the middle of a set of
items, the mode identifies the most popular item and is described in the following
sections.

Definition of Mode:
The mode of a set of data is the value that occurs most often, or equivalently has the largest
frequency.

The mode of the set 2, 1, 4, 3, 3, 1, 1, 2, 1 is 1, since this value occurs most often.

Example 9: The mode of the following simple discrete frequency distribution

x    4    5    6    7    8    9   10
f    2    5   21   18    9    2    1
is 6, since this value has the largest frequency of 21.
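
A minimal sketch of both cases is given below, one for a raw set and one for a simple frequency distribution. The function names are assumptions made here; where two values tie for the largest frequency, this sketch simply returns the first one it meets, which the text does not address.

```python
from collections import Counter


def mode_of_set(values):
    """Return the value that occurs most often in a set."""
    return Counter(values).most_common(1)[0][0]


def frequency_mode(values, freqs):
    """Return the x-value with the largest frequency."""
    return max(zip(values, freqs), key=lambda pair: pair[1])[0]


print(mode_of_set([2, 1, 4, 3, 3, 1, 1, 2, 1]))                              # 1
print(frequency_mode([4, 5, 6, 7, 8, 9, 10], [2, 5, 21, 18, 9, 2, 1]))      # 6
```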

The mode for grouped data


For a grouped frequency distribution, the mode (in line with the mean and
median) cannot be determined exactly and so must be estimated. The technique used is
one of interpolation, similar to that used to estimate the median of a frequency
distribution.

Steps for estimating the mode by an interpolation formula


Step 1: Determine the modal class(that class which has the largest frequency)
Step 2: Calculate D1 = difference between the largest frequency and the frequency
immediately preceding it.
Step 3: Calculate D2 = difference between the largest frequency and the frequency
immediately following it.
Step 4: Use the following interpolation formula

Mode = $L + \left(\frac{D_1}{D_1 + D_2}\right) C$

where L is the lower bound of the modal class and
C is the modal class width.

Example 10: Estimate the mode of the following distribution of ages.
Age in years        20-25   25-30   30-35   35-40   40-45   45-50
No. of employees    2       14      29      43      33      9

Solution:
Age (Years) No. of Employees
20-25 2
25-30 14
30-35 29
→ 35-40 43
40-45 33
45-50 9

The modal class is 35-40

D1 = 43 – 29 = 14
D2 = 43 – 33 = 10

The lower class bound of the modal class is L = 35.


The class width of the modal class, C = 5 (from 35 – 40)

Thus Mode = $35 + \left(\frac{14}{14 + 10}\right) \times 5 = 35 + 2.92 = 37.92$

Therefore the mode is 37.92 years.
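
The estimate in Example 10 can be sketched as follows. The function name `grouped_mode` is hypothetical. Two assumptions go beyond the text: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and the modal class is assumed to be unique (the first class with the largest frequency is used).

```python
def grouped_mode(lower_bounds, widths, freqs):
    """Estimate the mode of a grouped distribution by interpolation."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # Step 1: modal class
    before = freqs[i - 1] if i > 0 else 0                # assumed 0 at the edges
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before                               # Step 2
    d2 = freqs[i] - after                                # Step 3
    return lower_bounds[i] + d1 / (d1 + d2) * widths[i]  # Step 4


# Data from Example 10
lower_bounds = [20, 25, 30, 35, 40, 45]
widths = [5, 5, 5, 5, 5, 5]
freqs = [2, 14, 29, 43, 33, 9]
print(round(grouped_mode(lower_bounds, widths, freqs), 2))  # 37.92
```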

Characteristics of the Mode:


(i) Occasionally used as an alternative to the mean or median when the situation
calls for the most popular value to represent some data
(ii) Easy to understand, not difficult to calculate and can be used when a
distribution has open ended classes
(iii) Although the mode usefully ignores isolated extreme values, it is thought to
be too much affected by the most popular class when a distribution is
significantly skewed.
(iv) Like the median, the mode is not used in advanced statistical work.

Empirical Relationship between Mean, Median and Mode.

Mode = 3 Median – 2 Mean.

Exercise 1.1:

1. Find the arithmetic mean of the following sets:


(a) 84, 92, 73, 67, 88, 74, 91, 74

(b) 0.53, 0.46, 0.50, 0.49, 0.52, 0.53, 0.44, 0.55, 0.54

2. Find the mean of the following frequency distributions:


(a)
x f
18.5 5
19.5 12
20.5 20
(b)

x f
1 2
2 8
3 24
4 52
5 31
6 11

3. A firm recorded the number of orders received for each of 58 successive weeks to give
the following distribution:

Number of Orders Received    Number of Weeks
10 – 14 3
15 – 19 7
20 – 24 15
25 – 29 20
30 – 34 9
35 – 39 4
Calculate the mean weekly number of orders received.

4. The ages of a company’s employees are tabulated below:

Age in years         Number of Employees
20 and under 25       2
25 and under 30      14
30 and under 35      29
35 and under 40      43
40 and under 45      33
45 and under 50       9

Calculate the mean employee age in years.

5. A quality control section of a cannery inspected the contents of 130 randomly selected
tins of cooked spaghetti from output. As part of their measurements, the following net
weights (in grams) were tabulated:

Weight (in grams)      Number of Tins
under 424.9             1
424.900 – 424.925       1
424.925 – 424.950       6
424.950 – 424.975      18
424.975 – 425.000      33
425.000 – 425.025      46
425.025 – 425.050      14
425.050 – 425.075       5
425.075 – 425.100       5
425.1 and over          1

Calculate the mean net weight of the contents of the tins and say whether you think that
the consumer is getting reasonable value if the label on the tin advertises the contents as
425gms.

6. The following is an extract from a business report.


“….. Over the past 15 months, the number of orders received has averaged 24 per month
with the best three months averaging 35. The lowest months saw only 14, 14, 16 and 22
orders respectively……”.
(a) Find the average number of orders that were received in the middle 8 months.
(b) If the target over 16 months is an average of 25, how many orders must be
received in month 16 to achieve this?

7. During the 1984 – 85 session, a college ran 70 different classes of which 44 were
“science”, with a mean class size of 15.2, and 26 were “arts”, with a mean class size of
19.2. The frequency distribution of class sizes is given:

Size of Class (Number of students)   Number of Science Classes   Number of Arts Classes
1–6 4 0
7 – 12 15 3
13 – 18 11 10
19 – 24 8 8
25 – 30 5 4
31 – 36 1 1
No student belonged to more than one class.
(a) Calculate the mean class size of the college.

(b) Suppose now that no class of 12 students or less had been allowed to run.
Calculate what the mean class size for the college would have been if the students
in such classes:
(i) had been transferred to the other classes;
(ii) had not been admitted to the college.

(c) The number of students enrolling in 1986-87 on science and arts courses is expected
to rise by 20% and to fall by 10% respectively, compared with 1984-85. Calculate
the maximum number of classes the college should run if the mean class size is to
be not less than 20.

8. Find the median of the following sets of data:


(a) 2.52, 3.96, 3.28, 9.20, 3.75
(b) 84, 91, 72, 68, 87, 78, 78, 82, 79

9. The following figures were obtained by sampling the output of bags of walnuts which
were ready to be distributed to a national chain of supermarkets.

No. of walnuts 19 20 21 22 23

No. of Bags 2 11 29 36 10
Find the median number of walnuts per bag.

10. Use the interpolation formula to estimate the median of the following data, which
relate to the IQ of a special group of an organisation’s employees.
IQ 98-106 107-115 116-124 125-133 134-142 143-151 152-160
No. of employees 3 5 9 12 5 4 2

11. The following figures relate to the length of time spent by cars in a particular car park
during one day.
Time Parked Up to 1 1-2 2-3 3-4 4-5 5-6 6-9 9-12
No. of Cars 450 730 640 120 40 30 20 20
Estimate the median parking time.

12. Determine the value of the mode for the following sets of data:
(a) 10, 11, 10, 12, 11, 10, 11, 11, 11, 12, 13, 11, 12
(b) 2, 1, 1, 2, 3, 2, 3, 4, 6, 4, 1, 2, 3
(c)
x 14 15 16 17 18 19 20
f 14 26 18 9 2 1 1

13. Calculate a modal value for the following data of age at commitment of crime of 500
male criminals.
Age (years)      Under 16   16-17   18   19-20   21-27   28-36
Number of men    8          70      95   133     161     33

14. Find the mode of the following distribution.
No. of children   0    1    2    3    4    5    6 or more
No. of families   11   47   28   9    4    1    1

1.2 Measures of Dispersion


The absolute measures of dispersion can be divided into the following four measures:
1. Range
2. Quartile Deviation or Semi-interquartile Range
3. Mean Deviation
4. Standard Deviation
The relative measures in each of the above four cases are called the coefficients of
the respective measures, such as the coefficient of standard deviation. The relative
measures are used only for the purpose of comparison between two or more series with
varying size or number of items, varying central values or varying units of calculation.

1.2.1 Range
It is the simplest measure of dispersion. It is the difference between the maximum
and minimum items of the series. For example, in the series 20, 21, 22, 25, 30, 32, 47,
37, 65 the range is 65 – 20 = 45.
Absolute range or range = $x_{max} - x_{min}$.
Or range = L – S, where L is the largest value and S is the smallest value.

Coefficient of range or relative range = $\frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{L - S}{L + S}$
Merits, Demerits and Uses of Range
Merits: 1. It can be easily understood.
2. It is easy to calculate and it is the simplest method of measuring dispersion
3. It lends itself to algebraic treatments.
4. It is an absolute measure of dispersion.
Demerits: 1. It is too indefinite to be used as a practical measure of dispersion because it
depends entirely upon the extreme values.
2. It is not based on all the observations.
3. It is affected by sampling fluctuations.

Uses: It is used in quality control.


Example 11: Find the range and coefficient of range of the weights of 10 students from
the following data
41 20 15 65 73 84 53 35 71 55.
Solution: Arranging the data in the ascending order, we get
15 20 35 41 53 55 65 71 73 84
Here largest value L = 84, Smallest value S = 15
Range = L – S = 84 – 15 = 69.
Coefficient of Range = $\frac{L - S}{L + S} = \frac{84 - 15}{84 + 15} = \frac{69}{99} = 0.697$ (approximately).
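
Example 11 can be checked with the sketch below. The function name `range_and_coefficient` is an assumption made for illustration; note that sorting the data is not actually required, since only the largest and smallest values are used.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a set of values."""
    largest, smallest = max(values), min(values)
    value_range = largest - smallest                   # L - S
    coefficient = value_range / (largest + smallest)   # (L - S) / (L + S)
    return value_range, coefficient


# Data from Example 11
weights = [41, 20, 15, 65, 73, 84, 53, 35, 71, 55]
r, c = range_and_coefficient(weights)
print(r, round(c, 3))  # 69 0.697
```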

Example 12: Find the range and the coefficient of range of the marks obtained by 100
students given below:
Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
No. of Students   13       8         7         10        11        23        18        10

Solution: Here L = 80, S = 0

Range = L – S = 80 – 0 = 80
Coefficient of range = $\frac{L - S}{L + S} = \frac{80 - 0}{80 + 0} = 1$

1.2.2 Quartile deviation or Semi-inter quartile range

Quartile Deviation is a measure of dispersion based on the Upper Quartile (Q3) and
Lower Quartile (Q1) of a series. It is half of the difference between the upper quartile and
the lower quartile. This difference is the range between these two quartiles and is called the
inter-quartile range. Half of this range is the semi-interquartile range, which is another
name for the quartile deviation.
Quartile Deviation = $\frac{Q_3 - Q_1}{2}$, where
Q3 = third or upper quartile; Q1 = first or lower quartile.
In most cases the central 50% of the observations of a series tend to be fairly
typical. Thus quartile deviation can be used as a suitable measure of dispersion in such
cases. It can also be computed from a frequency distribution with open-end classes at
both ends. It is important to note that range cannot be computed from a frequency
distribution with open-end classes at both ends, so quartile deviation is a more suitable
measure of dispersion than range in that situation.

Merits, Demerits and Uses of Quartile Deviation

Merits. 1. It is easy to calculate.


2. It can be easily understood.
3. It is not affected by the extreme values.
4. It has a special utility in measuring variation in case of frequency distribution
with open end classes at both ends.

Demerits: 1. It is not based on all the observations.
2. It is not capable of algebraic treatments.
3. It is not a representative value of the data.
4. It is affected by sampling fluctuations.
5. It cannot be regarded as a measure of dispersion as it really does not show the
scatter around an average but rather a distance on a scale.

Uses: It can be used only for descriptive statistics.

Example 13: If the first quartile is 104 and quartile deviation is 18, find the third
quartile.

Solution: Here Q1 = 104, Q3 = ?
Quartile deviation = $\frac{Q_3 - Q_1}{2}$ = 18 (given)
∴ Q3 = 2 × 18 + Q1 = 36 + 104 = 140
Third Quartile Q3 = 140.

Example 14: Calculate the Quartile deviation and coefficient of quartile deviation from
the following data:
Age in years 20 30 40 50 60 70 80
No. of Members 3 61 132 153 140 51 3
Solution:
We have the following table
Age in years (x) No. of members (f) Cumulative frequency (c.f)
20 3 3
30 61 64
40 132 196
50 153 349
60 140 489
70 51 540
80 3 543

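Here N = 543

Q1 = value of the $\left(\frac{N+1}{4}\right)$th item = value of the 136th item = 40 years

Q3 = value of the $3\left(\frac{N+1}{4}\right)$th item = 3 × 136 = 408th item = 60 years.

Quartile Deviation = $\frac{Q_3 - Q_1}{2} = \frac{60 - 40}{2}$ = 10 years.

Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{60 - 40}{60 + 40}$ = 0.2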
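
The working of Example 14 can be sketched as below. The function names are hypothetical, and the quartile positions (N + 1)/4 and 3(N + 1)/4 follow the example; the sketch assumes these positions are looked up in the cumulative frequency column in the same way as the median.

```python
from itertools import accumulate


def item_value(values, cumulative, position):
    """Return the x-value whose cumulative frequency first reaches position."""
    for x, big_f in zip(values, cumulative):
        if big_f >= position:
            return x


def quartile_deviation(values, freqs):
    """Return (quartile deviation, coefficient of quartile deviation)."""
    cumulative = list(accumulate(freqs))
    n = cumulative[-1]
    q1 = item_value(values, cumulative, (n + 1) / 4)       # lower quartile
    q3 = item_value(values, cumulative, 3 * (n + 1) / 4)   # upper quartile
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


# Data from Example 14
ages = [20, 30, 40, 50, 60, 70, 80]
members = [3, 61, 132, 153, 140, 51, 3]
print(quartile_deviation(ages, members))  # (10.0, 0.2)
```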

1.2.3 Mean Deviation or Average Deviation

Mean deviation of a set of observations of a series is the arithmetic mean of all the
deviations, without their algebraic signs, taken from its central value (mean, median or
mode). In other words, it is the average of the modulus of the deviations of the
observations in a series taken from the mean, median or mode. Mean deviation is one of
the calculated measures in which all the values are employed in the calculation. It has a
precise significance as it is an arithmetic average of the variations of the values of the
individual items in the series from their central tendency. While calculating it, we will
come across the following two problems:

1. What average should be taken as the central value? The solution is that the
central value may be any one of the averages – mean, median or mode. But,
generally, the arithmetic mean is taken as the central value.
2. What should be the algebraic signs of the deviations? While calculating the mean
deviation, the algebraic sign of each deviation is always taken as positive, because
the sum of deviations with their algebraic signs, + and –, from the arithmetic
mean is always zero.

Example 15: Let us have a sample of six observations 3, 5, 13, 14, 15, 16. The mean of
these observations is $\bar{x} = \frac{66}{6} = 11$.
The deviations of the items from the mean 11 are (3 – 11), (5 – 11), (13 – 11), (14 – 11),
(15 – 11), (16 – 11), i.e. –8, –6, +2, +3, +4, +5. The sum of all these deviations is
–8 – 6 + 2 + 3 + 4 + 5 = 0. Thus the summation of deviations from the mean in the given
series is zero and this will be so in all other series. To avoid such a situation, we have
the following rule.

Signs (plus and minus) of deviations are disregarded and the absolute values of the deviations
are summed up. Symbolically, we use $|x_i - M|$, which means the deviation of the ith
observation of x from the central value M (which may be the mean, median or mode) taken with
positive sign. Here the vertical lines stand for the positive (absolute) value. Now add up all n
absolute deviations to get $\sum |x_i - M|$. Then Mean Deviation MD = $\frac{\sum |x_i - \bar{x}|}{n}$, where $\bar{x}$ is the
arithmetic mean.

Example 16: Calculate the mean deviation about the mean for the following series:
15, 20, 17, 19, 21, 13, 12, 10, 17, 9, 12.

Solution: Here n = 11, and therefore, $\bar{x} = \frac{165}{11} = 15$.

Now,
x     d = x – 15    |d|
15     0            0
20     5            5
17     2            2
19     4            4
21     6            6
13    –2            2
12    –3            3
10    –5            5
17     2            2
9     –6            6
12    –3            3
                    Σ|d| = 38

Mean deviation = $\frac{\sum |d|}{n} = \frac{38}{11}$ = 3.45
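
Example 16 can be verified with the following sketch, which takes deviations about the arithmetic mean. The function name `mean_deviation` is an assumption made for illustration.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Data from Example 16
series = [15, 20, 17, 19, 21, 13, 12, 10, 17, 9, 12]
print(round(mean_deviation(series), 2))  # 3.45
```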

Mean Deviation for Grouped data: Let x1, x2, … , xn occur with frequencies f1, f2, …,
fn respectively, let $\sum f = n$ and let M be either the Mean, Median or Mode; then the
mean deviation is given by the formula,

Mean Deviation = $\frac{\sum f |d|}{\sum f}$, where $|d| = |x - M|$ and $\sum f = n$

Example 17: Find the mean deviation from the mean for the following data:
Marks obtained 20 18 16 14 12 10 8 6
No. of Students 2 4 9 18 27 25 14 1

Solution: Let us calculate the mean of the given data by forming the following table:

Marks (x)   No. of Students (f)   fx     |d| = |x – 12|   f|d|
6            1                      6    6                  6
8           14                    112    4                 56
10          25                    250    2                 50
12          27                    324    0                  0
14          18                    252    2                 36
16           9                    144    4                 36
18           4                     72    6                 24
20           2                     40    8                 16
            Σf = 100       Σfx = 1200               Σf|d| = 224

Arithmetic mean = $\frac{\sum fx}{\sum f} = \frac{1200}{100} = 12$

Mean Deviation about Mean = $\frac{\sum f|d|}{\sum f} = \frac{224}{100} = 2.24$

Example 18: Calculate the mean deviation from the mean for the following data:
Class interval 0 – 4 4 – 8 8 – 12 12 – 16 16 – 20
Frequency 4 6 8 5 2
Solution:
Let us prepare the following table by assuming that the frequencies in each class
are centered at its mid-value.
Class      Mid value (x)   f    fx    |d| = |x – 9.2|   f|d|
0 – 4       2              4     8    7.2               28.8
4 – 8       6              6    36    3.2               19.2
8 – 12     10              8    80    0.8                6.4
12 – 16    14              5    70    4.8               24.0
16 – 20    18              2    36    8.8               17.6
                     Σf = 25   Σfx = 230             Σf|d| = 96.0

Arithmetic mean = $\frac{\sum fx}{\sum f} = \frac{230}{25} = 9.2$

Mean Deviation about Mean = $\frac{\sum f|d|}{\sum f} = \frac{96}{25} = 3.84$
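
Examples 17 and 18 both follow the same pattern, which can be sketched as below. The function name `frequency_mean_deviation` is hypothetical; for grouped data the x-values passed in are the class mid-values, as assumed in Example 18.

```python
def frequency_mean_deviation(values, freqs):
    """Mean deviation about the mean for a frequency distribution."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    # Each absolute deviation |x - mean| is weighted by its frequency f.
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# Example 17: marks and number of students
marks = [6, 8, 10, 12, 14, 16, 18, 20]
students = [1, 14, 25, 27, 18, 9, 4, 2]
print(round(frequency_mean_deviation(marks, students), 2))    # 2.24

# Example 18: class mid-values and frequencies
mid_values = [2, 6, 10, 14, 18]
frequencies = [4, 6, 8, 5, 2]
print(round(frequency_mean_deviation(mid_values, frequencies), 2))  # 3.84
```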

Merits, Demerits and Uses of Mean Deviation


Merits:
1. It is easy to understand and compute
2. Mean deviation is less affected by the extreme values as compared to standard
deviation
3. Mean deviation about an arbitrary point is least when the point is median
Demerits:
1. In mean deviation the signs of all deviations are taken as positive and
therefore, it is not suitable for further algebraic treatments.
2. It is rarely used in social sciences
3. It does not give accurate results because the mean deviation from the median
is least but median itself is not considered a satisfactory average when the
variation in the series is large.
4. It is often not useful for statistical inferences.
Uses:
Mean deviation and its coefficient are used in studying economic problems such
as distribution of income and wealth in a society.

1.2.4 Standard Deviation

Standard Deviation is the most important and commonly used measure of
dispersion. It measures the absolute dispersion or variability of a distribution. A small
standard deviation means a high degree of uniformity of the observations as well as
homogeneity of a series. It is extremely useful in judging the representativeness of the
mean.
Standard deviation is the positive square root of the average of squared deviations taken
from the arithmetic mean. It is generally denoted by the Greek letter σ or by S.D. or s.d.
Let x be a random variate which takes on n values, viz., x1, x2, … , xn; then the standard
deviation of these n observations is given by,

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

where $\bar{x}$ is the mean of the observations.

When the items are very small, the following formula is used

$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$

Example 19: Find the standard deviation of 3, 4, 5, 6

Solution: Here n = 4, $\sum x$ = 3 + 4 + 5 + 6 = 18
$\sum x^2 = 3^2 + 4^2 + 5^2 + 6^2$ = 9 + 16 + 25 + 36 = 86
$\sigma = \sqrt{\frac{86}{4} - \left(\frac{18}{4}\right)^2} = \sqrt{21.5 - 20.25} = \sqrt{1.25} \approx 1.118$
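
The two formulas above are algebraically equivalent, and the sketch below computes both for Example 19 so that they can be compared. The function names are assumptions made for illustration.

```python
import math


def std_dev_from_deviations(values):
    """sqrt of the mean squared deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_dev_from_sums(values):
    """sqrt(sum(x^2)/n - (sum(x)/n)^2), using only the sums of x and x^2."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


# Data from Example 19
data = [3, 4, 5, 6]
print(round(std_dev_from_deviations(data), 3))  # 1.118
print(round(std_dev_from_sums(data), 3))        # 1.118
```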

Merits, Demerits and Uses of Standard Deviation:


Merits:
1. It is based on all the observation
2. It is rigidly defined
3. It has a greater mathematical significance and is capable of further
mathematical treatments
4. It represents the true measurement of dispersion of a series.
5. It is least affected by fluctuation of sampling
6. It is a reliable and dependable measure of dispersion
7. It is extremely useful in correlation etc.
Demerits:
1. It is difficult to compute unlike other measures of dispersion
2. It is not easy to understand
3. It gives more weightage to extreme values
4. It consumes much time and labour while computing it.
Uses:
1. It is widely used in biological studies.
2. It is used in fitting a normal curve to a frequency distribution
3. It is most widely used measure of dispersion.

1.2.5 Variance and Coefficient of Variation:

Variance: The variance is the square of the standard deviation and is denoted by σ². The
methods for calculating variance are the same as for the standard deviation.

Coefficient of variation: It is a relative measure of dispersion. It is generally denoted
by C.V. and is given by the formula
Coefficient of variation or C.V. = $\frac{\sigma}{\bar{x}} \times 100$,
where σ is the standard deviation and $\bar{x}$ is the mean of the given series. It is
important to note that the coefficient of variation is always expressed as a percentage.

The coefficient of variation is of great practical significance and is the best
measure for comparing the variability of two series. The series or group for which the
coefficient of variation is greater is said to be more variable (less consistent). On the
other hand, the series for which the coefficient of variation is less is said to be less variable (more
consistent). The coefficient of variation can be employed for comparing the relative
consistency of the prices of shares of two or more companies. It helps a genuine
investor (in shares) to select a share whose price is relatively more stable. Thus
the shares whose prices fluctuate more consistently will be preferred by him.

Calculation of Standard Deviation – Individual observations:


When the data under consideration consists of individual observations, the standard
deviation may be computed by any of the following two methods.
(a) By taking deviations of the items from the actual mean.
(b) By taking deviations of the items from an assumed mean.

Direct Method:
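In the case of a simple series, the standard deviation can be obtained by the formula
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\frac{\sum d^2}{n}}$, where d = $x_i - \bar{x}$,
$x_i$ is the value of the variable or observation,
$\bar{x}$ is the arithmetic mean, and
n is the total number of observations.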

Steps of calculation
Step 1: Calculate the arithmetic mean $\bar{x}$.
Step 2: Take the deviations of the items from the mean,
i.e. calculate d = $x_i - \bar{x}$.
Step 3: Take the sum of the squares of all these deviations,
i.e. $\sum d^2 = \sum (x_i - \bar{x})^2$.
Step 4: Find the mean of the squared deviations obtained in step 3,
i.e. $\frac{\sum d^2}{n}$, where n is the total number of observations. It is known as the
variance.
Step 5: Take the square root of the variance to get the desired standard deviation.

Example 20: Find the standard deviation of 16, 13, 17, 22.
Solution: Here A.M. = (16 + 13 + 17 + 22)/4 = 68/4 = 17.
Let us prepare the following table in order to calculate the standard deviation.

x      d = x – 17    d²
16     –1            1
13     –4            16
17     0             0
22     5             25
                     Σd² = 42

Now, $\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{42}{4}} = \sqrt{10.5} \approx 3.2$
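The five steps of the direct method can be sketched in Python as follows. This is only an illustrative sketch: the function name `std_direct` is an assumption made for this example and is not part of the original text. It divides by n, as the formula above does.

```python
import math

def std_direct(values):
    """Population standard deviation by the direct (actual mean) method."""
    n = len(values)
    mean = sum(values) / n                     # Step 1: arithmetic mean
    deviations = [x - mean for x in values]    # Step 2: d = x - mean
    sum_sq = sum(d * d for d in deviations)    # Step 3: sum of squared deviations
    variance = sum_sq / n                      # Step 4: variance
    return math.sqrt(variance)                 # Step 5: standard deviation

# Data of Example 20
sigma = std_direct([16, 13, 17, 22])
print(round(sigma, 2))  # prints 3.24
```

To two decimal places the result is 3.24, which agrees with the value 3.2 obtained by hand.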

Short cut method:


This method is applied to calculate the standard deviation when the mean of the data
comes out to be a fraction. In that case it is tedious to find the
deviations of all observations from the mean by the above method. The formula used is
$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$, where d = x – A and A is the assumed mean.

Steps in calculation:
Step 1: Take any arbitrary number as the assumed mean A.
Step 2: Take the deviations from the assumed mean and denote them by d,
i.e. d = x – A. Take the total of these deviations, i.e. obtain $\sum d$.
Step 3: Square these deviations and obtain $\sum d^2$.
Step 4: Calculate $\frac{\sum d}{n}$ and $\frac{\sum d^2}{n}$, where n is the total number of observations.
Step 5: Find $\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2$. Take its square root to get the standard deviation of the
given data.
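As a rough illustration of the short cut steps, the sketch below uses the ten values of Example 21 with the assumed mean A = 50. The function name `std_short_cut` is hypothetical and chosen only for this example.

```python
import math

def std_short_cut(values, assumed_mean):
    """Standard deviation from deviations about an assumed mean A."""
    n = len(values)
    d = [x - assumed_mean for x in values]       # Step 2: d = x - A
    sum_d = sum(d)                               # total of the deviations
    sum_d2 = sum(v * v for v in d)               # Step 3: total of squared deviations
    variance = sum_d2 / n - (sum_d / n) ** 2     # Steps 4 and 5
    return sum_d, sum_d2, math.sqrt(variance)

data = [48, 43, 65, 57, 31, 60, 37, 48, 59, 78]
sum_d, sum_d2, sigma = std_short_cut(data, assumed_mean=50)
print(sum_d, sum_d2, round(sigma, 2))  # prints 26 1826 13.26
```

The choice of A affects only the intermediate totals. Any other assumed mean gives the same standard deviation, which is why an arbitrary convenient number can be used in Step 1.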

Example 21: Find the standard deviation of the following data:


48, 43, 65, 57, 31, 60, 37, 48, 59, 78.
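Solution: Let us prepare the following table in order to calculate the value of S.D. by
assuming the value of A as 50.
Value x    d = x – A    d²
48         –2           4
43         –7           49
65         15           225
57         7            49
31         –19          361
60         10           100
37         –13          169
48         –2           4
59         9            81
78         28           784
n = 10     Σd = 26      Σd² = 1826

Here, $\bar{x} = A + \frac{\sum d}{n} = 50 + \frac{26}{10} = 52.6$, which is a fraction.

Let us apply the short cut formula in order to calculate S.D.

$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{1826}{10} - \left(\frac{26}{10}\right)^2} = \sqrt{182.6 - 6.76} = \sqrt{175.84} \approx 13.26$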

Standard Deviation for Discrete Series or Grouped Data:


The standard deviation of a discrete series or grouped data can be calculated by
any one of the following methods.
(a) Actual Mean Method or Direct Method
(b) Assumed Mean Method or Short cut method
(a) Direct Method: The standard deviation for the discrete series is given by the formula
$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{n}}$, where $\bar{x}$ is the arithmetic mean, x is the size of the item, f is
the corresponding frequency and n = $\sum f$.

However, in practice, this method is rarely used because if the arithmetic mean is
a fraction, the calculations take a lot of time and are cumbersome.

(b) Short cut Method: In this method we use the following formula to calculate the
standard deviation:
$\sigma = \sqrt{\frac{\sum f d^2}{n} - \left(\frac{\sum f d}{n}\right)^2}$, where d = x – A, A is the assumed mean and n = $\sum f$.

Steps in calculation
Step 1: Take any item of the given series as the assumed mean A.
Step 2: Take the deviations of the items from the assumed mean A and denote them by d.
Step 3: Multiply the deviations by the respective frequencies and denote the products by fd. Obtain
the total $\sum fd$.
Step 4: Calculate d², where the d’s are obtained in step 2.
Step 5: Multiply the squared deviations by the respective frequencies to get $\sum fd^2$.
Step 6: Find the value of $\sigma^2 = \frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2$.
Step 7: Take the square root of σ² obtained in step 6 to get the value of the standard deviation.

Example 22: Find the standard deviation from the following data:
Size of the item: 10 11 12 13 14 15 16
Frequency: 2 7 11 15 10 4 1

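Solution: Take the assumed mean A = 13.
Size of the item x    Frequency f    d = x – A    fd     d²    fd²
10                    2              –3           –6     9     18
11                    7              –2           –14    4     28
12                    11             –1           –11    1     11
13                    15             0            0      0     0
14                    10             1            10     1     10
15                    4              2            8      4     16
16                    1              3            3      9     9
                      n = Σf = 50                 Σfd = –10    Σfd² = 92
Now A.M. = $A + \frac{\sum fd}{n} = 13 - \frac{10}{50} = 12.8$, a fraction.

S.D. = $\sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = \sqrt{\frac{92}{50} - \left(\frac{-10}{50}\right)^2} = \sqrt{1.84 - 0.04} = \sqrt{1.80} \approx 1.34$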

Calculation of Standard Deviation for a continuous series:


The standard deviation of a continuous series can be calculated by any one of the
methods discussed for the discrete frequency distribution. However, in practice the Step
Deviation Method is mostly used. In this method the formula used is
$\sigma = i \times \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2}$, where $d = \frac{m - A}{i}$, i is the class interval, m is the mid value of
the interval and A is the assumed mean.

Steps in calculation;
Step 1: Find the mid values or mid points of the various classes and denote them by m.
Step 2: Take any one of the values of m as the assumed mean A.
Step 3: Take the deviations of the mid points from the assumed mean A and divide them by the
class interval or common factor i. Denote the result by d.
Step 4: Multiply the respective frequencies f by the corresponding deviations d and
obtain $\sum fd$.
Step 5: Square the deviations d and multiply them by their respective frequencies. Obtain
$\sum fd^2$.
Step 6: Substitute the values of $\sum fd$, $\sum fd^2$ and i in the formula
$\sigma = i \times \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2}$, where n = $\sum f$.

Example 23: Find the standard deviation of the following distribution:


Marks 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
No. of Students 5 12 15 20 10 4 2

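Solution: Assume A = 45. Here i = 10.
Class interval    No. of students f    Mid value x    d = (x – 45)/10    fd     fd²
10 – 20           5                    15             –3                 –15    45
20 – 30           12                   25             –2                 –24    48
30 – 40           15                   35             –1                 –15    15
40 – 50           20                   45             0                  0      0
50 – 60           10                   55             1                  10     10
60 – 70           4                    65             2                  8      16
70 – 80           2                    75             3                  6      18
                  Σf = n = 68                                            Σfd = –30    Σfd² = 152

$\sigma = i \times \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = 10 \times \sqrt{\frac{152}{68} - \left(\frac{-30}{68}\right)^2}$ = 14.3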
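The step deviation computation of Example 23 can be checked with a short Python sketch. The function name `std_step_deviation` and its argument names are assumptions made for illustration only.

```python
import math

def std_step_deviation(midpoints, freqs, assumed_mean, width):
    """Standard deviation of grouped data by the step deviation method."""
    n = sum(freqs)
    # d = (m - A) / i for every class mid value m
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    sigma = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return n, sum_fd, sum_fd2, sigma

midpoints = [15, 25, 35, 45, 55, 65, 75]   # mid values of the classes 10-20, ..., 70-80
freqs = [5, 12, 15, 20, 10, 4, 2]          # number of students in each class
n, sum_fd, sum_fd2, sigma = std_step_deviation(midpoints, freqs, assumed_mean=45, width=10)
print(n, sum_fd, sum_fd2, round(sigma, 1))  # prints 68 -30.0 152.0 14.3
```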

Example 24: Calculate the mean, median and variance of the following data:

Height in cm 95 – 105 105 – 115 115 – 125 125 – 135 135 – 145
No of Children 19 23 36 70 52

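Solution: Here i = 10. Let A = 120.

Class interval    Mid value x    f     d = (x – 120)/10    fd     fd²    Cum. freq.
95 – 105          100            19    –2                  –38    76     19
105 – 115         110            23    –1                  –23    23     42
115 – 125         120            36    0                   0      0      78
125 – 135         130            70    1                   70     70     148
135 – 145         140            52    2                   104    208    200
                                 n = 200                   Σfd = 113    Σfd² = 377

Mean = $A + i \times \frac{\sum fd}{n} = 120 + 10 \times \frac{113}{200} = 125.65$
Median: The median class is 125 – 135 as n/2 = 100 lies in it.

Therefore, Median = $l + \frac{n/2 - c.f.}{f} \times i = 125 + \frac{100 - 78}{70} \times 10 \approx 128.14$

Variance:

$\sigma = i \times \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = 10 \times \sqrt{\frac{377}{200} - \left(\frac{113}{200}\right)^2}$ = 12.51

σ² = 156.5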

Example 25: The following are the runs scored by two batsmen A and B in ten innings.

A 101 27 0 36 82 45 7 13 65 14
B 97 12 40 96 13 8 85 8 56 15
Who is more consistent?

Solution:
Here the mean of A is $\bar{x} = \frac{390}{10} = 39$

and the mean of B is $\bar{y} = \frac{430}{10} = 43$.

Let us now calculate the coefficient of variation of A and B.

Batsman A                               Batsman B
Runs scored x   dx = x – 39   dx²       Runs scored y   dy = y – 43   dy²
101             62            3844      97              54            2916
27              –12           144       12              –31           961
0               –39           1521      40              –3            9
36              –3            9         96              53            2809
82              43            1849      13              –30           900
45              6             36        8               –35           1225
7               –32           1024      85              42            1764
13              –26           676       8               –35           1225
65              26            676       56              13            169
14              –25           625       15              –28           784
390             0             10404     430             0             12762

The standard deviations are $\sigma_A = \sqrt{10404/10} \approx 32.26$ and $\sigma_B = \sqrt{12762/10} \approx 35.72$.
Also C.V. of Batsman A = $\frac{32.26}{39} \times 100 \approx 82.7\%$

C.V. of Batsman B = $\frac{35.72}{43} \times 100 \approx 83.1\%$
Now, C.V. of A < C.V. of B, which implies Batsman A is more consistent than
Batsman B.
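The comparison in Example 25 can be reproduced with a small Python sketch. The helper name `coefficient_of_variation` is an assumption for this illustration; it uses the population standard deviation (division by n), as in the rest of this section.

```python
import math

def coefficient_of_variation(values):
    """C.V. = (standard deviation / mean) * 100, with division by n."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return 100 * sigma / mean

runs_a = [101, 27, 0, 36, 82, 45, 7, 13, 65, 14]
runs_b = [97, 12, 40, 96, 13, 8, 85, 8, 56, 15]

cv_a = coefficient_of_variation(runs_a)
cv_b = coefficient_of_variation(runs_b)
print(round(cv_a, 1), round(cv_b, 1))  # prints 82.7 83.1
print("A" if cv_a < cv_b else "B", "is more consistent")
```

The two coefficients are close, so the conclusion that A is more consistent rests on a small margin.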

1.2.6 Skewness

A frequency distribution is said to be symmetrical when the values of the variable
equidistant from their mean have equal frequencies.

If a frequency distribution is not symmetrical, it is said to be asymmetrical or skewed. Any
deviation from symmetry is called skewness.

Skewness is the lack of symmetry. When a skewed frequency distribution is plotted on a chart,
the items tend to be dispersed more on one side of the mean than on
the other.
Skewness may be positive or negative. A distribution is said to be positively skewed if
the frequency curve has a longer tail towards the higher values of x, i.e., if the frequency
curve gradually slopes down towards the high values of x. For a positively skewed
distribution,
Mean (M) > Median (Me) > Mode (Mo)
A distribution is said to be negatively skewed if the frequency curve has a longer tail
towards the lower values of x. For a negatively skewed distribution,
Mean < Median < Mode.
For a symmetrical distribution,
Mean = Median = Mode.

Measures of Skewness
The degree of skewness is measured by its coefficient. The common measures of
skewness are:
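1. Pearson’s first measure: Skewness = $\frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$

2. Pearson’s second measure: Skewness = $\frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$

3. Bowley’s measure: Skewness = $\frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$

where Q1, Q2, Q3 are the first, second and third quartiles respectively.

4. Moment measure: Skewness = $\frac{m_3}{\sigma^3} = \frac{m_3}{m_2^{3/2}}$

where m2 and m3 are the second and the third central moments and σ is the S.D.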
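The first three measures can be written as one-line Python functions. This is a minimal sketch that assumes the usual textbook definitions (the formulas are not legible in the extracted text): (Mean − Mode)/S.D., 3(Mean − Median)/S.D. and (Q3 + Q1 − 2Q2)/(Q3 − Q1). The function names are hypothetical.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first measure: (Mean - Mode) / S.D."""
    return (mean - mode) / sd

def pearson_second(mean, median, sd):
    """Pearson's second measure: 3 (Mean - Median) / S.D."""
    return 3 * (mean - median) / sd

def bowley(q1, q2, q3):
    """Bowley's quartile measure: (Q3 + Q1 - 2 Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Values taken from Example 27 of this section
print(round(pearson_first(29.6, 27.52, 6.5), 2))   # prints 0.32
print(pearson_second(65, 70, 25))                  # prints -0.6
```

A positive value indicates a longer tail towards the higher values, and a negative value a longer tail towards the lower values.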

Example 26: Calculate the Pearson’s measure of skewness on the basis of Mean, Mode
and Standard Deviation.

x: 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5


f: 35 40 48 100 125 87 43 22

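Solution:
Pearson’s first measure of skewness is Skewness = (Mean – Mode)/S.D.
Assuming a continuous series, we construct the following table:

Class-intervals    x       f      d = x – 18.5    fd      fd²
14 – 15            14.5    35     –4              –140    560
15 – 16            15.5    40     –3              –120    360
16 – 17            16.5    48     –2              –96     192
17 – 18            17.5    100    –1              –100    100
18 – 19            18.5    125    0               0       0
19 – 20            19.5    87     1               87      87
20 – 21            20.5    43     2               86      172
21 – 22            21.5    22     3               66      198
Total                      N = 500                Σfd = –217    Σfd² = 1669

Mean = $A + \frac{\sum fd}{N} = 18.5 - \frac{217}{500}$ = 18.5 – 0.43 = 18.07.

S.D. = $\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{1669}{500} - \left(\frac{217}{500}\right)^2} = \sqrt{3.338 - 0.188} \approx 1.77$

The modal class is 18 – 19, with frequency 125 and neighbouring class frequencies 100 and 87, so
Mode = $l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 18 + \frac{125 - 100}{250 - 100 - 87} \times 1 \approx 18.40$

Skewness = $\frac{18.07 - 18.40}{1.77} \approx -0.19$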

Example 27: (i) The Karl Pearson’s coefficient of skewness of a distribution is 0.32. Its
S.D. is 6.5 and the mean is 29.6. Find the Mode.
(ii) In a distribution Mean = 65; Median = 70 and Coefficient of skewness is
–0.6. Find (a) Mode (b) Coefficient of variation.

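Solution:

(i) Karl Pearson’s first measure of skewness is
Skewness = (Mean – Mode)/S.D., i.e. 0.32 = (29.6 – Mode)/6.5.
Solving the above, we get Mode = 29.6 – 0.32 × 6.5 = 27.52.

(ii) Karl Pearson’s second measure of skewness is
Skewness = 3(Mean – Median)/S.D., i.e. –0.6 = 3(65 – 70)/S.D.
Solving the above, we get S.D. = 25.

Again, Coefficient of skewness = (Mean – Mode)/S.D., i.e. –0.6 = (65 – Mode)/25.
Solving the above, we get Mode = 80.

Coefficient of Variation = (S.D./Mean) × 100 = (25/65) × 100 = 38.46%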

Example 28: Find the appropriate measure of skewness from the following distribution:
Age (Years) Below 20 20-25 25-30 30-35 35-40 40-45 45-55 55 and above
No. of Employees 13 29 46 60 112 94 45 21

Solution: Since the frequency distribution has open-end classes, skewness based on
Quartiles, i.e. Bowley’s measure is the appropriate measure of skewness.
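Age (Years)     Cumulative frequency (less than)
20              13
25              42
30              88
Q1 →            105 = N/4
35              148
Q2 →            210 = N/2
40              260
Q3 →            315 = 3N/4
45              354
55              399
55 and above    420 = N

Here, interpolating within the quartile classes, Q1 = 30 + (105 – 88)/60 × 5 ≈ 31.42,
Q2 = 35 + (210 – 148)/112 × 5 ≈ 37.77 and Q3 = 40 + (315 – 260)/94 × 5 ≈ 42.93.
Bowley’s measure = (Q3 + Q1 – 2Q2)/(Q3 – Q1) = (42.93 + 31.42 – 75.54)/(42.93 – 31.42) ≈ –0.10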


1.2.7 Kurtosis:

Kurtosis is the peakedness of the frequency curve. Of two or more distributions
having the same average, dispersion and skewness, one may have a higher concentration of
values near the mode; in this case its frequency curve will show a sharper peak than the
others. This characteristic of a frequency distribution is known as kurtosis.
Kurtosis is measured by the coefficient β₂.
A distribution is said to be Platy-Kurtic, Meso-Kurtic or Lepto-Kurtic according as
β₂ < 3, β₂ = 3 or β₂ > 3.
Exercise 1.2

1. Calculate the range from the following data:


12, 20, 15, 22, 16, 14, 21, 17

2. Calculate the range from the following data:


10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
5 8 12 7 4

3. The first quartile derived from a set of observations is 25 and its quartile deviation is
15, find the third quartile.

4. Calculate quartile deviation and its coefficient from the following data:
Marks obtained 10 20 30 40 50 60
No. of Students 4 7 15 8 7 2

5. Calculate the semi-interquartile range for the following data:


Class 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
Frequency 4 15 28 16 7

6. A student obtained the mean and standard deviation of 100 observations as 40 and 5.1
respectively. It was later discovered that he had wrongly copied down an observation 50
instead of 40. Calculate the correct mean and standard deviation.

7. The coefficients of variation of two series are 58% and 69%. Their standard deviations
are 21.2 and 15.6. What are their arithmetic means?

8. The arithmetic mean and the standard deviation of a set of 9 items are 43 and 5
respectively. If an item of value 63 is added to the set, find the mean and standard
deviation of all the 10 items.

9. Find the mean deviation from mean and standard deviation for the following data:

100 150 200 250 360 490 500 600 671

10. Calculate the standard deviation and C.V. for the following data:
Size of the item 10 11 12 13 14 15 16
Frequency 2 7 11 15 10 4 1

11. The coefficients of variation of two series are 123.6% and 10.9%. Their standard
deviations are 1.31 and 1.30. What are their arithmetic means?

3. Theory of Probability

3.1.1. Introduction

If an experiment is repeated under essentially homogeneous and similar conditions,
we generally come across two types of situations:
(i) The result or what is usually known as the ‘outcome’ is unique or certain.
(ii) The result is not unique but may be one of the several possible outcomes.
The phenomena covered by (i) are known as deterministic. For example, for a perfect
gas, PV = constant.
The phenomena covered by (ii) are known as probabilistic. For example, in tossing a
coin we are not sure if a head or tail will be obtained.

In the study of statistics we are concerned basically with the presentation and
interpretation of chance outcomes that occur in a planned study or scientific
investigation.

3.1.2. Definition of various terms

Trial and event: Consider an experiment which, though repeated under essentially
identical conditions, does not give unique results but may result in any one of the several
possible outcomes. The experiment is known as a trial and outcomes are known as
events or cases. For example, throwing of a die is a trial and getting 1(or 2 or … 6) is an
event.

Exhaustive events: The total number of possible outcomes in any trial is known as
exhaustive events or exhaustive cases. For example, in tossing of a coin there are two
exhaustive cases, viz. Head and Tail (the possibility of the coin standing on an edge being
ignored).

Favourable events or cases: The number of cases favourable to an event in a trial is the
number of outcomes which entail the happening of the event. For example, in throwing
of two dice, the number of cases favourable to getting the sum 3 is: (1,2) and (2,1)

Mutually exclusive events: Events are said to be mutually exclusive or incompatible if
the happening of any one of them precludes the happening of all the others, that is if no
two or more of them can happen simultaneously in the same trial. For example, in
tossing a coin the events head and tail are mutually exclusive.

Equally likely events: Outcomes of a trial are said to be equally likely if, taking into
consideration all the relevant evidence, there is no reason to expect one in preference to
the others. For example, in throwing an unbiased die, all the six faces are equally likely
to come up.

Sample Space: Consider an experiment whose outcome is not predictable with certainty.
However, although the outcome of the experiment will not be known in advance, let us
suppose that the set of all possible outcomes is known. This set of all possible outcomes
of an experiment is known as the sample space of the experiment and is denoted by S.

Some examples follow.


1. If the outcome of an experiment consists in the determination of the sex of a newborn
child, then
S = { g,b}
where the outcome g means that the child is a girl and b that it is a boy.

2. If the experiment consists of flipping two coins, then the sample space consists of the
following four points:
S = {(H,H), (H,T), (T,H), (T,T)}
The outcome will be (H,H) if both coins are heads, (H,T) if the first coin is heads
and the second tails, (T,H) if the first is tails and the second heads, and (T,T) if both
coins are tails.

3. If the experiment consists of tossing two dice, then the sample space consists of the 36
points
S = { (i, j) : i, j = 1, 2, 3, 4, 5, 6 }
= { (1,1), …, (1,6), …, (6,1), …, (6,6) }
where the outcome (i, j) is said to occur if i appears on the leftmost die and j on the
other die.

3.2. Definitions of Probability

1. Mathematical or Classical or a priori probability:


If a trial results in n exhaustive, mutually exclusive and equally likely cases and m
of them are favourable to the happening of an event E, then the probability ‘p’ of
happening of E is given by
p = P(E) = (number of favourable cases)/(total number of exhaustive cases) = m/n

2. Statistical or empirical probability:


If a trial is repeated a number of times under essentially homogeneous and
identical conditions, then the limiting value of the ratio of the number of times the event happens to
the number of trials, as the number of trials becomes indefinitely large, is called the
probability of happening of the event. Symbolically, if in n trials an event E happens m
times, then the probability ‘p’ of the happening of E is given by
p = P(E) = $\lim_{n \to \infty} \frac{m}{n}$

3. Axiomatic Definition:
Consider an experiment whose sample space is S. For each event E of the sample
space S, we assume that a number P(E) is defined and satisfies the following three
axioms.

Axiom 1: 0 ≤ P(E) ≤ 1
Axiom 2: P(S) = 1
Axiom 3: For any sequence of mutually exclusive events E1, E2, … (that is, events for
which Ei ∩ Ej = Φ when i ≠ j),
$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$

3.2.1. Some Important Formulas

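1. If A and B are any two events, then
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
This rule is known as the additive rule of probability.

For three events A, B and C, we have
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(B ∩ C) – P(A ∩ C) + P(A ∩ B ∩ C)

2. If A and B are mutually exclusive events, then
P(A ∪ B) = P(A) + P(B)
In general, if A1, A2, … , An are mutually exclusive, then
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)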

3. If A and Ac are complementary events, then


P(A) + P(Ac) = 1

4. P(S) = 1

5. P(Φ) = 0

6. If A and B are any two events, then
P(A ∩ B) = P(A) · P(B | A) = P(B) · P(A | B)

7. If A and B are independent events, then
P(A ∩ B) = P(A) · P(B)

Glossary of Probability terms:

Statement                                        Meaning in terms of set theory
1. At least one of the events A or B occurs      A ∪ B
2. Both the events A and B occur                 A ∩ B
3. Neither A nor B occurs                        Ac ∩ Bc
4. Event A occurs and B does not occur           A ∩ Bc
5. Exactly one of the events A or B occurs       (A ∩ Bc) ∪ (Ac ∩ B)
6. If event A occurs, so does B                  A ⊂ B
7. Events A and B are mutually exclusive         A ∩ B = Φ
8. Complementary event of A                      Ac
9. Sample space                                  Universal set S

3.2.2. Solved Examples

Example 1: Find the probability of getting a head in tossing a coin.


Solution: When a coin is tossed, the sample space is {Head, Tail}.
Therefore, the total number of possible outcomes is 2.
The favourable number of outcomes is 1, that is, the head.
The required probability is ½.

Example 2: Find the probability of getting two tails in two tosses of a coin.
Solution: When two coins are tossed, we have the sample space HH, HT, TH, TT
Where H represents the outcome Head and T represents the outcome Tail.
The total number of possible outcomes is 4.
The favourable number of outcomes is 1, that is TT
The required probability is ¼.

Example 3: Find the probability of getting an even number when a die is thrown
Solution: When a die is thrown, the sample space is {1, 2, 3, 4, 5, 6}.
The total number of possible outcomes is 6.
The favourable number of outcomes is 3, that is 2, 4 and 6.
The required probability is 3/6 = ½.

Example 4: What is the chance that a leap year selected at random will contain 53
Sundays?
Solution: In a leap year(which consists of 366 days) there are 52 complete weeks and 2
days over. The following are the possible combinations for these two over days:
(i) Sunday and Monday (ii)Monday and Tuesday (iii)Tuesday and Wednesday
(iv)Wednesday and Thursday (v)Thursday and Friday (vi)Friday and Saturday
(vii)Saturday and Sunday.
In order that a leap year selected at random should contain 53 Sundays, one of the
two over days must be Sunday. Since out of the above 7 possibilities, 2, viz. (i) and
(vii), are favourable to this event,
Required probability = 2/7
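The counting argument of Example 4 can be checked by listing the seven possible pairs of extra days. The sketch below is only an illustration; the variable names are chosen for this example.

```python
from fractions import Fraction

days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# The two extra days of a leap year are consecutive, so there are 7 possible pairs.
pairs = [(days[i], days[(i + 1) % 7]) for i in range(7)]

# A 53rd Sunday occurs only when one of the two extra days is a Sunday.
favourable = [pair for pair in pairs if "Sunday" in pair]

probability = Fraction(len(favourable), len(pairs))
print(favourable)    # prints [('Sunday', 'Monday'), ('Saturday', 'Sunday')]
print(probability)   # prints 2/7
```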

Example 5: If two dice are rolled, what is the probability that the sum of the upturned
faces will equal 7?
Solution: We shall solve this problem under the assumption that all of the 36 possible
outcomes are equally likely. Since there are 6 possible outcomes – namely (1,6), (2,5),
(3,4), (4,3), (5,2,), (6,1) – that result in the sum of the dice being equal to 7, the desired
probability is 6/36 = 1/6.
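Under the same assumption of 36 equally likely outcomes, the sample space of Example 5 is small enough to enumerate directly. The following sketch is illustrative only and uses the classical definition m/n.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (i, j) with i, j = 1, ..., 6
sample_space = list(product(range(1, 7), repeat=2))

# Outcomes whose faces add up to 7
favourable = [outcome for outcome in sample_space if sum(outcome) == 7]

probability = Fraction(len(favourable), len(sample_space))
print(len(sample_space), len(favourable), probability)  # prints 36 6 1/6
```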

Example 6: A bag contains 3 Red, 6 White and 7 Blue balls. What is the probability that
two balls drawn are white and blue?
Solution: Total number of balls = 3 + 6 + 7 = 16.
Out of 16 balls, 2 can be drawn in ${}^{16}C_2$ = 120 ways.
Therefore the exhaustive number of cases is 120.
Out of 6 white balls, 1 ball can be drawn in ${}^{6}C_1$ ways, and out of 7 blue balls, 1 ball
can be drawn in ${}^{7}C_1$ ways. Since each of the former cases can be associated with each
of the latter cases, the total number of favourable cases is ${}^{6}C_1 \times {}^{7}C_1$ = 6 × 7 = 42.
The required probability is 42/120 = 7/20.

Example 7: A lot consists of 10 good articles, 4 with minor defects and 2 with major
defects. Two articles are chosen from the lot at random (without replacement). Find the
probability that (i) both are good, (ii) both have major defects, (iii) at least 1 is good, (iv)
at most 1 is good, (v)exactly 1 is good, (vi) neither has major defects and (vii) neither is
good.
Solution: Although the articles may be drawn one after the other, we can consider that
both articles are drawn simultaneously, as they are drawn without replacement.
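(i) P(both are good) = ${}^{10}C_2 / {}^{16}C_2$ = 45/120 = 3/8

(ii) P(both have major defects) = ${}^{2}C_2 / {}^{16}C_2$ = 1/120

(iii) P(at least 1 is good) = P(exactly 1 is good or both are good)
= P(exactly 1 is good and 1 is bad or both are good)
= $({}^{10}C_1 \times {}^{6}C_1 + {}^{10}C_2) / {}^{16}C_2$ = 105/120 = 7/8

(iv) P(at most 1 is good) = P(none is good or 1 is good and 1 is bad)
= $({}^{6}C_2 + {}^{10}C_1 \times {}^{6}C_1) / {}^{16}C_2$ = 75/120 = 5/8

(v) P(exactly 1 is good) = P(1 is good and 1 is bad)
= $({}^{10}C_1 \times {}^{6}C_1) / {}^{16}C_2$ = 60/120 = 1/2

(vi) P(neither has major defects) = P(both are non-major defective articles)
= ${}^{14}C_2 / {}^{16}C_2$ = 91/120

(vii) P(neither is good) = P(both are defective)
= ${}^{6}C_2 / {}^{16}C_2$ = 15/120 = 1/8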

Example 8: From 6 positive and 8 negative numbers, 4 numbers are chosen at random
(without replacement) and multiplied. What is the probability that the product is
positive?
Solution: If the product is to be positive, all the 4 numbers must be positive or all the 4
must be negative or 2 of them must be positive and the other 2 must be negative.
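No. of ways of choosing 4 positive numbers = ${}^{6}C_4$ = 15.
No. of ways of choosing 4 negative numbers = ${}^{8}C_4$ = 70.
No. of ways of choosing 2 positive and 2 negative numbers
= ${}^{6}C_2 \times {}^{8}C_2$ = 15 × 28 = 420.
Total no. of ways of choosing 4 numbers from all the 14 numbers
= ${}^{14}C_4$ = 1001.
P(the product is positive)
= (15 + 70 + 420)/1001 = 505/1001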
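The counts in Example 8 can be verified with Python's standard `math.comb` function. This is a small sketch under the same assumption that all selections of 4 numbers are equally likely; the variable names are chosen for illustration.

```python
from fractions import Fraction
from math import comb

POSITIVE, NEGATIVE, CHOSEN = 6, 8, 4

all_positive = comb(POSITIVE, 4)                     # 4 positive numbers
all_negative = comb(NEGATIVE, 4)                     # 4 negative numbers
two_and_two = comb(POSITIVE, 2) * comb(NEGATIVE, 2)  # 2 positive and 2 negative
total = comb(POSITIVE + NEGATIVE, CHOSEN)            # any 4 of the 14 numbers

probability = Fraction(all_positive + all_negative + two_and_two, total)
print(all_positive, all_negative, two_and_two, total)  # prints 15 70 420 1001
print(probability)                                     # prints 505/1001
```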

Example 9: If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black
balls, what is the probability that one of the drawn balls is white and the other two black?
Solution: If we regard the order in which the balls are selected as being relevant, then
the sample space consists of 11∙ 10 ∙ 9 = 990 outcomes. Furthermore, there are 6∙ 5∙ 4 =
120 outcomes in which the first ball selected is white and the other two black; 5 ∙ 6∙ 4 =
120 outcomes in which the first is black, the second white and the third black; and 5∙ 4 ∙
6 = 120 in which the first two are black and the third white. Hence, assuming that
“randomly drawn” means that each outcome in the sample space is equally likely to
occur, we see that the desired probability is (120 + 120 + 120)/990 = 360/990 = 4/11.

Example 10: In a large genetics study utilizing guinea pigs, Cavia sp., 30% of the
offspring produced had white fur and 40% had pink eyes. Two-thirds of the guinea pigs
with white fur had pink eyes. What is the probability of a randomly selected offspring
having both white fur and pink eyes?
Solution: P(W) = 0.30, P(Pi) = 0.40, and P(Pi | W) = 0.67. Utilizing Formula 2.9,
P(Pi ∩ W) = P(Pi | W) · P(W) = 0.67 × 0.30 = 0.20.
Twenty percent of all offspring are expected to have both white fur and pink eyes.

Example 11: Consider three gene loci in tomato. The first locus affects fruit shape, with
the oo genotype causing oblate or flattened fruit and OO or Oo normal round fruit. The
second locus affects fruit color with yy having yellow fruit and YY or Yy red fruit. The
final locus affects leaf shape with pp having potato or smooth leaves and PP or Pp having
the more typical cut leaves. Each of these loci is located on a different pair of
chromosomes and, therefore, acts independently of the other loci. In the following cross
OoYyPp × OoYypp, what is the probability that an offspring will have the dominant
phenotype for each trait? What is the probability that it will be heterozygous for all three
genes? What is the probability that it will have round, yellow fruit and potato leaves?

Solution: Genotypic array:
(¼ OO + ½ Oo + ¼ oo) (¼ YY + ½ Yy + ¼ yy) (½ Pp + ½ pp)

Phenotypic array:
(¾ O- + ¼ oo) (¾ Y- + ¼ yy) (½ P- + ½ pp)
The probability of the dominant phenotype for each trait from the phenotypic array
above is
P(O-Y-P-) = P(O-) × P(Y-) × P(P-) = ¾ × ¾ × ½ = 9/32.
The probability of being heterozygous for all three genes from the genotypic array
above is
P(OoYyPp) = P(Oo) × P(Yy) × P(Pp) = ½ × ½ × ½ = 4/32 = 1/8.
The probability of a round, yellow-fruited plant with potato leaves from the
phenotypic array above is
P(O-yypp) = P(O-) × P(yy) × P(pp) = ¾ × ¼ × ½ = 3/32.
Each answer applies the probability rules for independent events to the separate gene loci.

Example 12: (a) Two cards are drawn at random from a well shuffled pack of 52 playing
cards. Find the chance of drawing two aces.
(b) From a pack of 52 cards, three are drawn at random. Find the chance that they
are a king, a queen and a knave.
(c) Four cards are drawn from a pack of cards. Find the probability that (i) all are
diamond (ii) there is one card of each suit (iii) there are two spades and two hearts.
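Solution: (a) From a pack of 52 cards, 2 can be drawn in ${}^{52}C_2$ ways, all being equally
likely. The exhaustive number of cases is ${}^{52}C_2$ = 1326.
In a pack there are 4 aces and therefore 2 aces can be drawn in ${}^{4}C_2$ = 6 ways.

∴ Required probability = ${}^{4}C_2 / {}^{52}C_2$ = 6/1326 = 1/221

(b) Exhaustive number of cases = ${}^{52}C_3$ = 22100.

A pack of cards contains 4 kings, 4 queens and 4 knaves. A king, a queen and a
knave can each be drawn in ${}^{4}C_1$ = 4 ways, and since each way of drawing a king can be
associated with each of the ways of drawing a queen and a knave, the total number of
favourable cases = ${}^{4}C_1 \times {}^{4}C_1 \times {}^{4}C_1$ = 64.

∴ Required probability = 64/22100 = 16/5525

(c) Exhaustive number of cases = ${}^{52}C_4$ = 270725.

(i) Required probability = ${}^{13}C_4 / {}^{52}C_4$ = 715/270725 = 11/4165

(ii) Required probability = $({}^{13}C_1)^4 / {}^{52}C_4$ = 28561/270725 = 2197/20825

(iii) Required probability = $({}^{13}C_2 \times {}^{13}C_2) / {}^{52}C_4$ = 6084/270725 = 468/20825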

Example 13: What is the probability of getting 9 cards of the same suit in one hand at a
game of bridge?
Solution: One hand in a game of bridge consists of 13 cards.
∴ Exhaustive number of cases = ${}^{52}C_{13}$.
The number of ways in which, in one hand, a particular player gets 9 cards of one suit is
${}^{13}C_9$, and the number of ways in which the remaining 4 cards are of some other suit is
${}^{39}C_4$. Since there are 4 suits in a pack of cards, the total number of favourable cases is
$4 \times {}^{13}C_9 \times {}^{39}C_4$.

∴ Required probability = $\frac{4 \times {}^{13}C_9 \times {}^{39}C_4}{{}^{52}C_{13}}$

Example 14: A committee of 4 people is to be appointed from 3 officers of the


production department, 4 officers of the purchase department, two officers of the sales
department and 1 chartered accountant. Find the probability of forming the committee in
the following manner:
(i) There must be one from each category
(ii) It should have at least one from the purchase department
(iii) The chartered accountant must be in the committee.
Solution: There are 3 + 4 + 2 + 1 = 10 persons in all and a committee of 4 people can be
formed out of them in ${}^{10}C_4$ ways. Hence the exhaustive number of cases is ${}^{10}C_4$ = 210.
(i) The favourable number of cases for the committee to consist of 4 members, one from each
category, is ${}^{3}C_1 \times {}^{4}C_1 \times {}^{2}C_1 \times 1$ = 24.
Required probability = 24/210 = 4/35
(ii) P(Committee has at least one purchase officer) = 1 – P(Committee has no purchase
officer)
In order that the committee has no purchase officer, all the four members are to be
selected from amongst the officers of the production department, the sales department and the chartered
accountant, that is out of 3 + 2 + 1 = 6 members, and this can be done in ${}^{6}C_4$ = 15
ways. Hence,
P(Committee has no purchase officer) = 15/210 = 1/14

P(Committee has at least one purchase officer) = 1 – 1/14 = 13/14

(iii) The favourable number of cases in which the committee consists of the chartered accountant as
a member and three others is
$1 \times {}^{9}C_3$ = 84 ways,
since a chartered accountant can be selected out of one chartered accountant in only
1 way and the remaining 3 members can be selected out of the remaining 10 – 1 = 9
persons in ${}^{9}C_3$ ways. Hence the required probability = 84/210 = 2/5.

Example 15: A box contains 6 red, 4 white and 5 black balls. A person draws 4 balls
from the box at random. Find the probability that among the balls drawn there is at least
one ball of each colour.
Solution: The required event E that in a draw of 4 balls from the box at random there is
at least one ball of each colour can materialize in the following mutually disjoint ways:
(i) 1 Red, 1 White and 2 Black balls
(ii) 2 Red, 1 White and 1 Black balls
(iii) 1 Red, 2 White and 1 Black balls
Hence by addition rule of probability, the required probability is given by,
P(E) = P(i) + P(ii) + P(iii)
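= $\frac{{}^{6}C_1 \cdot {}^{4}C_1 \cdot {}^{5}C_2 + {}^{6}C_2 \cdot {}^{4}C_1 \cdot {}^{5}C_1 + {}^{6}C_1 \cdot {}^{4}C_2 \cdot {}^{5}C_1}{{}^{15}C_4}$ = (240 + 300 + 180)/1365 = 720/1365
= 0.5275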
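The three mutually disjoint cases of Example 15 can also be found by a short loop instead of being listed by hand. The sketch below is illustrative only; it assumes every set of 4 balls is equally likely and uses `math.comb` for the combinations.

```python
from math import comb

RED, WHITE, BLACK = 6, 4, 5
DRAWN = 4

total = comb(RED + WHITE + BLACK, DRAWN)   # all ways of drawing 4 balls from 15

favourable = 0
for r in range(1, DRAWN + 1):              # number of red balls drawn
    for w in range(1, DRAWN + 1):          # number of white balls drawn
        b = DRAWN - r - w                  # the rest are black
        if b >= 1:                         # at least one ball of each colour
            favourable += comb(RED, r) * comb(WHITE, w) * comb(BLACK, b)

probability = favourable / total
print(favourable, total, round(probability, 4))  # prints 720 1365 0.5275
```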

Example 16: A problem in Statistics is given to the three students A, B and C whose
chances of solving it are 1/2, 3/4 and 1/4 respectively. What is the probability that the
problem will be solved if all of them try independently?
Solution: Let A, B and C denote the events that the problem is solved by the students A,
B and C respectively. Then
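P(A) = 1/2, P(B) = 3/4, P(C) = 1/4
P($\bar{A}$) = 1 – 1/2 = 1/2, P($\bar{B}$) = 1 – 3/4 = 1/4, P($\bar{C}$) = 1 – 1/4 = 3/4

P(Problem solved) = P(At least one of them solves the problem)
= 1 – P(None of them solves the problem)
= 1 – P($\overline{A \cup B \cup C}$)
= 1 – P($\bar{A} \cap \bar{B} \cap \bar{C}$)
= 1 – P($\bar{A}$) P($\bar{B}$) P($\bar{C}$)
= 1 – (1/2)(1/4)(3/4) = 1 – 3/32 = 29/32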
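The complement argument of Example 16 can be sketched with exact fractions. The dictionary `p_solve` is a name chosen for this illustration; the students are assumed to work independently, as stated in the example.

```python
from fractions import Fraction

# Individual chances of solving the problem
p_solve = {"A": Fraction(1, 2), "B": Fraction(3, 4), "C": Fraction(1, 4)}

# By independence, P(nobody solves it) is the product of the individual failure chances.
p_none = Fraction(1)
for p in p_solve.values():
    p_none *= 1 - p

p_solved = 1 - p_none
print(p_none, p_solved)  # prints 3/32 29/32
```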

Example 17: Three groups of children contain respectively 3 girls and 1 boy, 2 girls and
2 boys and 1 girl and 3 boys. One child is selected at random from each group. Find the
probability that the three selected consist of 1 girl and 2 boys.
Solution: The required event of getting 1 girl and 2 boys among the three selected
children can materialize in the following three mutually exclusive cases:

Group No. →    I       II      III
(i)            Girl    Boy     Boy
(ii)           Boy     Girl    Boy
(iii)          Boy     Boy     Girl

By the addition rule of probability,
Required probability = P(i) + P(ii) + P(iii)

Since the probability of selecting a girl from the first group is 3/4, of selecting a
boy from the second is 2/4, and of selecting a boy from the third group is 3/4, and since
these three events of selecting children from the three groups are independent of each
other, we have
P(i) = 3/4 × 2/4 × 3/4 = 9/32
Similarly,
P(ii) = 1/4 × 2/4 × 3/4 = 3/32

P(iii) = 1/4 × 2/4 × 1/4 = 1/32

Hence the required probability = 9/32 + 3/32 + 1/32 = 13/32

Exercise 3.1.

1. From a bag containing 3 red and 2 black balls, 2 balls are drawn at random. Find the
probability that they are of the same colour.

2. A card is drawn from a well-shuffled pack of playing cards. What is the probability
that it is either a spade or an ace?

3. The probability that a contractor will get a plumbing contract is 2/3 and the probability
that he will get an electric contract is 4/9. If the probability of getting at least one
contract is 4/5, what is the probability that he will get both?

4. What is the probability of getting at least 1 head when 2 coins are tossed?

5. If the probability that A solves a problem is ½ and that for B is ¾ and if they aim at
solving a problem independently, what is the probability that the problem is solved?
6. An urn contains 3 white balls, 4 red balls, and 5 black balls. Two balls are drawn from
the urn at random. Find the probability that (i) both of them are of the same colour
and (ii) they are of different colours.

7. Ten chips numbered 1 through 10 are mixed in a bowl. Two chips are drawn from the
bowl successively and without replacement. What is the probability that their sum is
10?

8. A box contains 4 white, 5 red and 6 black balls. Four balls are drawn at random from
the box. Find the probability that among the balls drawn, there is at least 1 ball of
each colour.

9. (i) Four persons are chosen at random from a group consisting of 4 men, 3 women and
2 children. Find the chance that the selected group contains at least 1 child.
(ii) A committee of 6 is to be formed from 5 lecturers and 3 professors. If the members
of the committee are chosen at random, what is the probability that there will be a
majority of lecturers in the committee?

10. Suppose that A and B are mutually exclusive events for which P(A) = 0.3 and
P(B) = 0.5. What is the probability that
(a) either A or B occurs;
(b) A occurs but B does not;
(c) both A and B occur?

11. Sixty percent of the students at a certain school wear neither a ring nor a necklace.
Twenty percent wear a ring and 30 percent wear a necklace. If one of the students is
chosen randomly, what is the probability that this student is wearing
(a) a ring or a necklace;
(b) a ring and a necklace? Ans: 0.4; 0.1

12. An urn contains 5 red, 6 blue, and 8 green balls. If a set of 3 balls is randomly
selected, what is the probability that the balls will be (a) all of the same color;
(b) all of different colors? Ans: (a) 0.0888

13. There are 30 psychiatrists and 24 psychologists attending a certain conference. Three
of these 54 people are randomly chosen to take part in a panel discussion. What is
the probability that at least one psychologist is chosen? Ans: 0.8363

14. Two cards are chosen at random from a deck of 52 playing cards. What is the
probability that they
(a) are both aces;
(b) have the same value? Ans: 0.0045; 0.0588

15. An instructor gives her class a set of 10 problems with the information that the final
exam will consist of a random selection of 5 of them. If a student has figured out
how to do 7 of the problems, what is the probability that he or she will answer
correctly
(a) all 5 problems;
(b) at least 4 of the problems? Ans: 0.0833; 0.5

16. If there are 12 strangers in a room, what is the probability that no two of them
celebrate their birthday in the same month?

17. A group of 6 men and 6 women is randomly divided into 2 groups of size 6 each.
What is the probability that both groups will have the same number of men?
Ans: 0.4329

18. If a zoologist has 6 male guinea pigs and 9 female guinea pigs, and randomly selects
2 of them for an experiment, what are the probabilities that
(a) both will be males?
(b) both will be females?
(c) there will be one of each sex? Ans: 0.143; 0.343; 0.514

19. Suppose you are planning to study a species of crayfish in the ponds at a wildlife
preserve. Unknown to you, 15 of the 40 ponds available lack this species. Because
of time constraints, you feel you can survey only 12 ponds. What is the probability
that you choose 8 ponds with crayfish and 4 ponds without crayfish?
Ans: 0.264

20. In a study of the effects of acid rain on fish populations in Adirondack mountain
lakes, samples of yellow perch, Perca flavescens, were collected. Forty percent of
the fish had gill filament deformities and 70% were stunted. Twenty percent
exhibited both abnormalities.
(a) Find the probability that a randomly sampled fish will be free of both symptoms.
(b) If a fish has a gill filament deformity, what is the probability it will be stunted?
(c) Are the two symptoms independent of each other? Explain.

3.3. Conditional Probability and Bayes' Theorem

3.3.1. Conditional Probability and Multiplication Law


For two events A and B,
P(A∩B) = P(A) × P(B/A), P(A) > 0
= P(B) × P(A/B), P(B) > 0
where P(B/A) represents the conditional probability of occurrence of B when the event A
has already happened and P(A/B) is the conditional probability of occurrence of A when
the event B has already happened.
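
The multiplication law can be checked by direct enumeration. The sketch below is only an illustration under assumed data: a hypothetical urn with 3 red and 2 black balls, from which two balls are drawn in order without replacement, with A = "first ball is red" and B = "second ball is red". The helper names `prob` and `cond_prob` are made up for this example. The conditional probability is computed by restricting the sample space to the outcomes in which the given event has happened.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn (an assumption for illustration): 3 red and 2 black balls.
balls = ["R1", "R2", "R3", "B1", "B2"]
outcomes = list(permutations(balls, 2))  # 20 equally likely ordered draws


def prob(event):
    """P(event) = favourable outcomes / all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(event, given):
    """P(event/given), counted inside the reduced sample space where 'given' holds."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))


def first_red(o):   # event A
    return o[0].startswith("R")


def second_red(o):  # event B
    return o[1].startswith("R")


def both_red(o):    # event A∩B
    return first_red(o) and second_red(o)


p_ab = prob(both_red)
via_a = prob(first_red) * cond_prob(second_red, first_red)   # P(A) × P(B/A)
via_b = prob(second_red) * cond_prob(first_red, second_red)  # P(B) × P(A/B)
print(p_ab, via_a, via_b)  # 3/10 3/10 3/10
```

Both products agree with P(A∩B) counted directly, which is what the law asserts.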

3.3.2. Theorem of Total Probability:


If B1, B2, …, Bn are a set of exhaustive and mutually exclusive events, and A is another
event associated with (or caused by) the Bi, then
P(A) = P(B1) × P(A/B1) + P(B2) × P(A/B2) + … + P(Bn) × P(A/Bn)
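
As a minimal sketch of how the theorem is applied, the function below sums P(Bi) × P(A/Bi) over a partition. The function name `total_probability` and the numbers used are assumptions chosen for illustration; they do not come from the text. Exact fractions are used so that the check "the P(Bi) add up to 1" is not disturbed by rounding.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return P(A) = sum of P(Bi) * P(A/Bi) for a partition B1, ..., Bn."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one P(A/Bi) for each P(Bi)")
    if sum(priors) != 1:
        # Exhaustive and mutually exclusive events must have total probability 1.
        raise ValueError("the P(Bi) must add up to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


# Hypothetical values, for illustration only.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]        # P(B1), P(B2), P(B3)
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]  # P(A/B1), P(A/B2), P(A/B3)
p_a = total_probability(priors, likelihoods)
print(p_a)  # 1/5
```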

3.3.3. Solved Examples

Example 18: A box contains 4 bad and 6 good tubes. Two tubes are drawn from the box
together. One of them is tested and found to be good. What is the probability that the other
one is also good?
Solution: Let A = one of the tubes drawn is good and B = the other tube is good.
P(A∩B) = P(both tubes drawn are good)
= 6C2 / 10C2 = 15/45 = 1/3
Knowing that one tube is good, the conditional probability that the other tube
is also good is required, i.e., P(B/A) is required.
Here P(A) = 6/10, since the tested tube is equally likely to be any of the 10 tubes.
By definition,

P(B/A) = P(A∩B) / P(A) = (1/3) / (6/10) = 5/9

Example 19: A bolt is manufactured by 3 machines A, B and C. A turns out twice as
many items as B, and machines B and C produce equal numbers of items. 2% of bolts
produced by A and B are defective and 4% of bolts produced by C are defective. All bolts
are put into one stockpile and one is chosen from this pile. What is the probability that it is
defective?
Solution: Let A = the event in which the item has been produced by machine A, and so
on.
Let D = the event of the item being defective.
P(A) = 1/2, P(B) = P(C) = 1/4
P(D/A) = P(an item is defective, given that A has produced it)
= 2/100 = P(D/B)

P(D/C) = 4/100
By theorem of total probability,
P(D) = P(A) × P(D/A) + P(B) × P(D/B) + P(C) × P(D/C)
= 1/2 × 2/100 + 1/4 × 2/100 + 1/4 × 4/100 = 1/40
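
One way to see the result of Example 19 is to count bolts in an imagined batch. The sketch below assumes a hypothetical batch of 400 bolts split 2 : 1 : 1 between the machines, as the problem states; the batch size itself is an assumption chosen so that every count is a whole number.

```python
from fractions import Fraction

# Hypothetical batch of 400 bolts: A makes twice as many as B, and B and C make equal numbers.
produced = {"A": 200, "B": 100, "C": 100}
# Defect rates as stated in the example.
defect_rate = {"A": Fraction(2, 100), "B": Fraction(2, 100), "C": Fraction(4, 100)}

# Expected number of defective bolts from each machine.
defective = {m: produced[m] * defect_rate[m] for m in produced}

# P(D) = defective bolts in the pile / all bolts in the pile.
p_defective = sum(defective.values()) / sum(produced.values())
print(p_defective)  # 1/40
```

Counting this way gives 4 + 2 + 4 = 10 defective bolts out of 400, the same value as the total probability formula.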

Example 20: In a coin tossing experiment, if the coin shows head, one die is thrown and
the result is recorded. But if the coin shows tail, 2 dice are thrown and their sum is
recorded. What is the probability that the recorded number will be 2?
Solution: When a single die is thrown, P(2) = 1/6
When 2 dice are thrown, the sum will be 2 only if each die shows 1.
P(getting 2 as sum with 2 dice) = 1/6 × 1/6 = 1/36 (by independence)
By theorem of total probability,
P(2) = P(H) × P(2/H) + P(T) × P(2/T)
= 1/2 × 1/6 + 1/2 × 1/36 = 7/72
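
The experiment of Example 20 is small enough to enumerate completely. The following sketch, with the illustrative function name `p_recorded`, lists the die outcomes for each face of the coin and combines them by the theorem of total probability. It assumes a fair coin and fair dice, as the example does.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
pairs = list(product(faces, repeat=2))  # 36 equally likely results for two dice


def p_recorded(n):
    """P(recorded number = n) = P(H) × P(n/H) + P(T) × P(n/T)."""
    p_given_head = Fraction(sum(1 for d in faces if d == n), 6)
    p_given_tail = Fraction(sum(1 for a, b in pairs if a + b == n), len(pairs))
    return Fraction(1, 2) * p_given_head + Fraction(1, 2) * p_given_tail


print(p_recorded(2))  # 7/72
# The recorded number lies between 1 and 12, so these probabilities add up to 1.
print(sum(p_recorded(n) for n in range(1, 13)))  # 1
```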

Example 21: An urn contains 10 white and 3 black balls. Another urn contains 3 white
and 5 black balls. Two balls are drawn at random from the first urn and placed in the
second urn, and then one ball is taken at random from the latter. What is the probability
that it is a white ball?
Solution: The two balls transferred may be both white or both black or one white and one
black.
Let B1 = event of drawing 2 white balls from the first urn, B2 = event of
drawing 2 black balls from it and B3 = event of drawing one white and one black ball
from it.
Clearly B1, B2 and B3 are exhaustive and mutually exclusive events.
Let A = event of drawing a white ball from the second urn after transfer.
P(B1) = 10C2 / 13C2 = 45/78 = 15/26

P(B2) = 3C2 / 13C2 = 3/78 = 1/26

P(B3) = (10C1 × 3C1) / 13C2 = 30/78 = 10/26
P(A/B1) = P(drawing a white ball / 2 white balls have been transferred)
= P(drawing a white ball / urn II contains 5 white and 5 black balls)
= 5/10

Similarly, P(A/B2) = 3/10 and P(A/B3) = 4/10

By theorem of total probability,
P(A) = P(B1) × P(A/B1) + P(B2) × P(A/B2) + P(B3) × P(A/B3)
= 15/26 × 5/10 + 1/26 × 3/10 + 10/26 × 4/10 = 59/130
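
The calculation in Example 21 can be organised as a loop over the number of white balls transferred. This is a sketch only; the variable names are illustrative, and `math.comb` (available from Python 3.8) supplies the binomial coefficients nCr.

```python
from fractions import Fraction
from math import comb

white1, black1 = 10, 3  # first urn
white2, black2 = 3, 5   # second urn before the transfer

p_white = Fraction(0)
for w in range(3):       # w = number of white balls among the 2 transferred
    b = 2 - w            # the remaining transferred balls are black
    # P(Bi): chance that the transfer consists of w white and b black balls.
    p_transfer = Fraction(comb(white1, w) * comb(black1, b), comb(white1 + black1, 2))
    # P(A/Bi): chance of a white ball from the second urn, which now holds 10 balls.
    p_white_after = Fraction(white2 + w, white2 + black2 + 2)
    p_white += p_transfer * p_white_after

print(p_white)  # 59/130
```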

Example 22: In 1989 there were three candidates for the position of principal,
Mr. Chatterji, Mr. Ayangar and Mr. Singh, whose chances of getting the appointment are
in the proportion 4 : 2 : 3 respectively. The probability that Mr. Chatterji, if selected, would
introduce co-education in the college is 0.3. The probabilities of Mr. Ayangar and
Mr. Singh doing the same are respectively 0.5 and 0.8. What is the probability that there
will be co-education in the college?
Solution: Let the events and probabilities be defined as follows:
A: Introduction of co-education
E1: Mr. Chatterji is selected as principal
E2: Mr. Ayangar is selected as principal
E3: Mr. Singh is selected as principal

Then,
P(E1) = 4/9, P(E2) = 2/9, P(E3) = 3/9
P(A/E1) = 0.3, P(A/E2) = 0.5, P(A/E3) = 0.8

P(A) = P[(A∩E1) ∪ (A∩E2) ∪ (A∩E3)]
= P(A∩E1) + P(A∩E2) + P(A∩E3)
= P(E1) P(A/E1) + P(E2) P(A/E2) + P(E3) P(A/E3)
= 4/9 × 0.3 + 2/9 × 0.5 + 3/9 × 0.8 = 23/45

3.3.4. Bayes' Theorem


If E1, E2, …, En are mutually disjoint events with P(Ei) ≠ 0 (i = 1, 2, …, n), then
for any arbitrary event A which is a subset of E1 ∪ E2 ∪ … ∪ En such that P(A) > 0, we have,

P(Ei/A) = P(Ei) P(A/Ei) / [P(E1) P(A/E1) + P(E2) P(A/E2) + … + P(En) P(A/En)], i = 1, 2, …, n
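
The theorem translates almost line by line into a short routine: form each product P(Ei) × P(A/Ei), add them to get P(A), and divide. The sketch below is one possible way to write this; the function name `posteriors` and the sample numbers are assumptions made for illustration, not values from the text.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return [P(E1/A), ..., P(En/A)] from the P(Ei) and the P(A/Ei)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Ei) × P(A/Ei)
    p_a = sum(joint)                                      # denominator: P(A)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]


# Hypothetical values, for illustration only.
priors = [Fraction(1, 2), Fraction(1, 2)]       # P(E1), P(E2)
likelihoods = [Fraction(1, 4), Fraction(3, 4)]  # P(A/E1), P(A/E2)
post = posteriors(priors, likelihoods)
print(post)  # [Fraction(1, 4), Fraction(3, 4)]
```

Since the denominator is the same for every i, the values P(Ei/A) always add up to 1, which gives a quick check on hand calculations.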

3.3.5. Solved Examples

Example 23: A bag contains 5 balls and it is not known how many of them are white.
Two balls are drawn at random from the bag and they are noted to be white. What is the
chance that all the balls in the bag are white?
Solution: Since 2 white balls have been drawn out, the bag must have contained 2, 3, 4
or 5 white balls.
Let B1 = event of the bag containing 2 white balls, B2 = event of the bag
containing 3 white balls, B3 = event of the bag containing 4 white balls and B4 = event of
the bag containing 5 white balls.

Let A = event of drawing 2 white balls.

P(A/B1) = 2C2 / 5C2 = 1/10, P(A/B2) = 3C2 / 5C2 = 3/10

P(A/B3) = 4C2 / 5C2 = 3/5, P(A/B4) = 5C2 / 5C2 = 1
Since the number of white balls in the bag is not known, the Bi's are equally likely.
P(B1) = P(B2) = P(B3) = P(B4) = 1/4
By Bayes' theorem,

P(B4/A) = P(B4) × P(A/B4) / [P(B1) × P(A/B1) + P(B2) × P(A/B2) + P(B3) × P(A/B3) + P(B4) × P(A/B4)]
= (1/4 × 1) / [1/4 × (1/10 + 3/10 + 3/5 + 1)] = 1/2
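
For Example 23, the whole set of revised probabilities can be computed at once, not only the one for 5 white balls. The sketch below keeps the example's assumption that the four possible compositions are equally likely; the dictionary names are illustrative.

```python
from fractions import Fraction
from math import comb

total_balls, drawn = 5, 2
candidates = [2, 3, 4, 5]             # possible numbers of white balls in the bag
prior = Fraction(1, len(candidates))  # equally likely, as the example assumes

# P(A/Bi): both drawn balls are white when the bag holds w white balls.
likelihood = {w: Fraction(comb(w, drawn), comb(total_balls, drawn)) for w in candidates}

p_a = sum(prior * likelihood[w] for w in candidates)  # theorem of total probability
posterior = {w: prior * likelihood[w] / p_a for w in candidates}

print(posterior[5])  # 1/2
```

The other entries of `posterior` are 1/20, 3/20 and 3/10 for 2, 3 and 4 white balls, so the four values add up to 1.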

Example 24: There are 3 true coins and 1 false coin with ‘head’ on both sides. A coin is
chosen at random and tossed 4 times. If ‘head’ occurs all the 4 times, what is the
probability that the false coin has been chosen and used?
Solution:
P(T) = P(the coin is a true coin) = 3/4

P(F) = P(the coin is a false coin) = 1/4

Let A = event of getting all heads in 4 tosses.

Then P(A/T) = 1/2 × 1/2 × 1/2 × 1/2 = 1/16 and P(A/F) = 1
By Bayes' theorem,

P(F/A) = P(F) × P(A/F) / [P(F) × P(A/F) + P(T) × P(A/T)]
= (1/4 × 1) / (1/4 × 1 + 3/4 × 1/16) = 16/19
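
It is instructive to see how the answer to Example 24 changes with the number of heads observed. The sketch below generalises the calculation to n tosses; the function name and its default arguments are illustrative, and the true coins are assumed to be fair.

```python
from fractions import Fraction


def p_false_given_heads(n_heads, true_coins=3, false_coins=1):
    """P(false coin / n_heads heads in n_heads tosses), by Bayes' theorem."""
    total = true_coins + false_coins
    p_true = Fraction(true_coins, total)    # P(T)
    p_false = Fraction(false_coins, total)  # P(F)
    like_true = Fraction(1, 2) ** n_heads   # P(A/T) for a fair coin
    like_false = Fraction(1)                # P(A/F): a two-headed coin always shows head
    return p_false * like_false / (p_true * like_true + p_false * like_false)


print(p_false_given_heads(4))  # 16/19
print(p_false_given_heads(0))  # 1/4
```

With no tosses the result is just P(F) = 1/4, and each further head moves the probability closer to 1.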

Example 25: The contents of urns I, II and III are as follows:

Urn I: 1 white, 2 black and 3 red balls
Urn II: 2 white, 1 black and 1 red balls
Urn III: 4 white, 5 black and 3 red balls

One urn is chosen at random and two balls are drawn. They happen to be white and
red. What is the probability that they come from urns I, II or III?

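Solution: Let E1, E2 and E3 denote the events that urn I, II and III is chosen,
respectively, and let A be the event that the two balls taken from the selected urn are
white and red. Then
P(E1) = P(E2) = P(E3) = 1/3

P(A/E1) = (1C1 × 3C1) / 6C2 = 3/15 = 1/5

P(A/E2) = (2C1 × 1C1) / 4C2 = 2/6 = 1/3

P(A/E3) = (4C1 × 3C1) / 12C2 = 12/66 = 2/11

Hence P(E2/A) = P(E2) × P(A/E2) / [P(E1) × P(A/E1) + P(E2) × P(A/E2) + P(E3) × P(A/E3)]
= (1/3 × 1/3) / (1/3 × 1/5 + 1/3 × 1/3 + 1/3 × 2/11) = 55/118

Similarly, P(E3/A) = (1/3 × 2/11) / (1/3 × 1/5 + 1/3 × 1/3 + 1/3 × 2/11) = 30/118 = 15/59

Therefore P(E1/A) = 1 − 55/118 − 30/118 = 33/118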
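
The three answers of Example 25 can be checked by listing every pair of balls in each urn. The sketch below counts the pairs made up of one white and one red ball and then applies Bayes' theorem with equal chances of 1/3 for each urn; the helper name `p_white_and_red` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

urns = {
    "I": {"white": 1, "black": 2, "red": 3},
    "II": {"white": 2, "black": 1, "red": 1},
    "III": {"white": 4, "black": 5, "red": 3},
}


def p_white_and_red(contents):
    """P(A/Ei): the two balls drawn from this urn are one white and one red."""
    balls = [colour for colour, n in contents.items() for _ in range(n)]
    pairs = list(combinations(range(len(balls)), 2))  # all equally likely pairs
    hits = sum(1 for i, j in pairs if {balls[i], balls[j]} == {"white", "red"})
    return Fraction(hits, len(pairs))


prior = Fraction(1, 3)  # each urn is equally likely to be chosen
likelihood = {name: p_white_and_red(c) for name, c in urns.items()}
p_a = sum(prior * l for l in likelihood.values())
posterior = {name: prior * likelihood[name] / p_a for name in urns}

for name in urns:
    print(name, posterior[name])  # I 33/118, II 55/118, III 15/59
```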

Exercise 3.2
1. Bag I contains 2 white and 3 black balls and bag II contains 4 white and 1 black balls.
A ball chosen at random from one of the bags is white. What is the probability that it
has come from bag I?

2. Five men out of 100 and 25 women out of 1000 are colour-blind. A colour-blind
person is chosen at random. What is the probability that the person is a male?
(Assume males and females are in equal numbers).

3. There are 2 bags, one of which contains 5 red and 8 black balls and the other 7 red and
10 black balls. A ball is drawn from one or the other of the 2 bags. Find the chance of
drawing a red ball.

4. In a bolt factory, machines A, B and C produce 25%, 35% and 40% of the total output,
respectively. Of their outputs, 5%, 4% and 2%, respectively, are defective bolts. If a bolt
is chosen at random from the combined output, what is the probability that it is
defective? If a bolt chosen at random is found to be defective, what is the probability
that it was produced by B or C?

5. There are 4 candidates for the office of the highway commissioner; the respective
probabilities that they will be selected are 0.3, 0.2, 0.4 and 0.1, and the probabilities
for a project’s approval are 0.35, 0.85, 0.45 and 0.15, depending on which of the 4
candidates is selected. What is the probability of the project getting approved?

6. Urn I has 2 white and 3 black balls, urn II has 4 white and 1 black balls and urn III has
3 white and 4 black balls. An urn is selected at random and a ball drawn at random is
found to be white. Find the probability that urn I was selected.

7. Three urns contain 3 white, 1 red and 1 black balls; 2 white, 3 red and 4 black balls; 1
white, 3 red and 2 black balls respectively. One urn is chosen at random and from it 2
balls are drawn at random. If they are found to be 1 red and 1 black ball, what is the
probability that the first urn was chosen?

8. Find the probability of drawing a queen and a king from a pack of cards in two
consecutive draws, the cards drawn not being replaced.

9. Police plan to enforce speed limits by using radar traps at 4 different locations within
the city limits. The radar traps at the locations L1, L2, L3 and L4 are operated
40%, 30%, 20% and 30% of the time, respectively. If a person who is speeding on his way
to work has probabilities of 0.2, 0.1, 0.5 and 0.2, respectively, of passing through
these locations, what is the probability that he will receive a speeding ticket?

10. The blood type distribution in the United States at the time of World War II was
thought to be type A, 41%; type B, 9%; type AB, 4% and type O, 46%. It is
estimated that during World War II, 4% of inductees with type O blood were typed as
having type A, 88% of those with type A blood were correctly typed; 4% with type B
blood were typed as A and 10% with type AB were typed as A. A soldier was
wounded and brought to surgery. He was tested and typed as having type A blood.
What is the probability that this was his correct blood type?
