
Part II - Data Analysis

Uploaded by

petersmog286


Department of Library and Information Science
Federal University Lokoja
LIS 313: Research and Statistical Method in Library & Information Science

Part II

Introduction to Statistics
When a person is ill or has had an accident and received an injury there are many variables
associated with these situations which could be measured. An infection may cause the person's
temperature to rise, a broken bone will cause pain. There may also be psychological
consequences which a researcher may wish to measure, e.g. anxiety or health beliefs. The
methods used to measure these variables will very often be of a quantitative nature. The
researcher will use techniques which allow some form of number to be used to assess or quantify
the condition under investigation. They will seek to investigate the relationships between
variables using systematic controlled observations. These observations of a carefully chosen
sample of the population of interest, and the associated statistical procedures, will enable
researchers to test their hypotheses and verify or refute the theories which attempt to explain
the observations. The techniques used by researchers to test their hypotheses are many and
varied and in quantitative studies will often involve some form of experimental study. Other
types of study may be better approached using survey techniques which may employ a variety
of questionnaires or attitude measures.

A great deal of research related to libraries and information centres is concerned with
measuring associations between variables, e.g. reading habit and performance. These types of
studies may look at relatively simple linear relationships, i.e. correlations, as in the cited example,
but it is possible, using multiple regression techniques, to examine the complex interrelationships
between several factors which may have a bearing on the topic of interest. For example, the
likelihood of a person failing his exams is related to many factors including poor preparation,
information seeking behavior, use of relevant and current material, and other lifestyle measures.
The relative contributions of these factors can be teased out using these sorts of techniques.

Regardless of the research techniques used, in quantitative research the aim of the research
activity may be summarised as Understanding, Prediction, and Control. The researcher is
attempting to gain an understanding of the phenomena under study so that they may use this
understanding in order to make predictions about the real world, and thus develop technologies
or procedures which allow a degree of control to be exerted over those phenomena. Thus in
Library and Information Science research we may be seeking to understand the transmission of
information to meet human needs in order to make predictions about how to help patrons find
information that will fill an information gap.

Nature of Data

Numerical data is the essence of quantitative methods. In order to try to understand the
phenomena under study, the researcher will first have to find a means of expressing the
variables to be measured using some form of numerical technique. For most practical purposes,
data can be measured at four different levels; each level has a specific purpose and also has
important implications for the type of analysis to be undertaken. These four levels of
measurement are known as nominal, ordinal, interval and ratio.

Consider a study in which a patron registers at a library in order to borrow books and use the
library facilities. During registration, the patron's name is taken and a patron identification
number is issued. The number is a unique identifier, since there may well be several patrons with the
same name. But the number that is issued is rather like a name in that it identifies the patron
and probably will not be used in a numerical sense; i.e. this patron's identifier has no numerical
significance relative to other patrons' identification numbers. Data of this type, in which
numbers are used as identifiers, are called nominal data and researchers speak of nominal levels
of measurement. Another example might be where a number is used to identify gender, e.g. 1
= male and 2 = female.

Continuing with this example, suppose that after registration the patron wants to see a reference
librarian. He is asked to wait and is told he is number five
on a waiting list. Now although this indicates to the patron that four other people will have to
be seen before his turn, it doesn't indicate very much about how long it will take to see the
reference librarian, since there is no way of knowing how long each of the other four patrons will need
with the reference librarian. This is an example of ordinal data, where the numerical value
indicates something about relative rather than absolute position in a series. Other common
examples of ordinal data include ranks, where absolute numerical values are turned into a
numerical series because we are more interested in relative values than absolute. Many
statistical tests make use of this type of ranked data.

For someone who weighs 150 lb not only do we know they are heavier than someone with a
weight of 140 lb, but we also know by how much. This is known as interval level
measurement because this numerical system also tells us about the intervals between the units
of measurement. Weight is a special type of interval measurement because it has
an absolute zero point, i.e. you can't have less than zero weight. This special type of scale is
known as a ratio scale.

For most purposes the distinction between interval and ratio scales is not that important, but
knowing the difference between nominal, ordinal, and interval/ratio is of importance because it
helps us to choose the appropriate statistical tool for the analysis of the data.

Variables

Put simply, a variable is something that can have more than one value! In research, particularly
quantitative research where we are using experiments to try to establish cause and effect, certain
variables are especially important:

• Independent Variable (IV)

• Dependent Variable (DV)

• Extraneous Variable (EV)

Consider an experimental study aimed at establishing the efficacy of electronic scholarly methods on
scholarly communication. We hypothesise that electronic scholarly methods will have beneficial
effects on scholarly communication. In this situation the new model of electronic scholarly methods is the
Independent Variable and scholarly communication is the Dependent Variable. However, it is
very likely that the Dependent Variable will be influenced by other factors as well as the
Independent Variable.

Scholarly communication may be affected by poor quality of content, an outdated research focus
or other problems that affect scholarly communication. All these other potential sources of
influence are known as extraneous variables. The purpose of experimental design is, as far as
possible, to control these extraneous factors.

Target Populations

The target population in a research study comprises all those potential participants that could
make up the study group. Thus in a study of the provision of online information services, the
target population would be all libraries with a website. Of course a researcher might
want to narrow the target populations and might choose (for example) all University libraries
as the target population. It is important to realise that the ability to generalise the findings of a
study will be restricted by the chosen target population. Thus in the above study the researcher
could only generalise the findings back to a population of university libraries with a website.
Though it should be added that the findings may well be suggestive of similar results in
populations that do not differ too greatly from the target population.

MEP Pupil Text 9

9 Data Analysis
9.1 Mean, Median, Mode and Range
In Unit 8, you were looking at ways of collecting and representing data. In this unit, you
will go one step further and find out how to calculate statistical quantities which summarise
the important characteristics of the data.

The mean, median and mode are three different ways of describing the average.

• To find the mean, add up all the numbers and divide by the number of numbers.
• To find the median, place all the numbers in order and select the middle number.
• The mode is the number which appears most often.
• The range gives an idea of how the data are spread out and is the difference between
the smallest and largest values.

Worked Example 1
Find
(a) the mean (b) the median (c) the mode (d) the range
of this set of data.
5, 6, 2, 4, 7, 8, 3, 5, 6, 6

Solution
(a) The mean is
\[ \frac{5 + 6 + 2 + 4 + 7 + 8 + 3 + 5 + 6 + 6}{10} = \frac{52}{10} = 5.2 \]

(b) To find the median, place all the numbers in order.

2, 3, 4, 5, 5, 6, 6, 6, 7, 8
As there are two middle numbers in this example, 5 and 6,
\[ \text{median} = \frac{5 + 6}{2} = \frac{11}{2} = 5.5 \]

(c) The mode is 6, as this number occurs most often (three times).

(d) The range is the difference between the smallest and largest numbers, in this case
2 and 8. So the range is 8 − 2 = 6 .
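
The four quantities in Worked Example 1 can be checked with a short Python sketch. It relies on the standard library `statistics` module; the variable names are illustrative only and are not part of the original text.

```python
import statistics

# Data from Worked Example 1
data = [5, 6, 2, 4, 7, 8, 3, 5, 6, 6]

mean_value = statistics.mean(data)      # sum of the values divided by how many there are
median_value = statistics.median(data)  # middle value (average of the two middle values here)
mode_value = statistics.mode(data)      # most frequently occurring value
range_value = max(data) - min(data)     # largest value minus smallest value

print(mean_value, median_value, mode_value, range_value)
```

For this data set the results should agree with the hand calculation: mean 5.2, median 5.5, mode 6 and range 6.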

Worked Example 2
Five people play golf and at one hole their scores are
3, 4, 4, 5, 7.
For these scores, find
(a) the mean (b) the median (c) the mode (d) the range .

Solution
(a) The mean is
\[ \frac{3 + 4 + 4 + 5 + 7}{5} = \frac{23}{5} = 4.6 \]

(b) The numbers are already in order and the middle number is 4. So
median = 4 .

(c) The score 4 occurs most often, so,

mode = 4 .

(d) The range is the difference between the smallest and largest numbers, in this case
3 and 7, so
range = 7 − 3
= 4.

Exercises
1. Find the mean, median, mode and range of each set of numbers below.
(a) 3, 4, 7, 3, 5, 2, 6, 10
(b) 8, 10, 12, 14, 7, 16, 5, 7, 9, 11
(c) 17, 18, 16, 17, 17, 14, 22, 15, 16, 17, 14, 12
(d) 108, 99, 112, 111, 108
(e) 64, 66, 65, 61, 67, 61, 57
(f) 21, 30, 22, 16, 24, 28, 16, 17

2. Twenty children were asked their shoe sizes. The results are given below.

8, 6, 7, 6, 5, 4½, 7½, 6½, 8½, 10
7, 5, 5½, 8, 9, 7, 5, 6, 8½, 6

For this data, find

(a) the mean (b) the median (c) the mode (d) the range.


17. Eight judges each give a mark out of 6 in an ice-skating competition.

Oksana is given the following marks.
5.3, 5.7, 5.9, 5.4, 4.5, 5.7, 5.8, 5.7
The mean of these marks is 5.5, and the range is 1.4.
The rules say that the highest mark and the lowest mark are to be deleted.
The six remaining marks are 5.3, 5.7, 5.4, 5.7, 5.8, 5.7.
(a) (i) Find the mean of the six remaining marks.
(ii) Find the range of the six remaining marks.
(b) Do you think it is better to count all eight marks, or to count only the six
remaining marks? Use the means and the ranges to explain your answer.
(c) The eight marks obtained by Tonya in the same competition have a mean
of 5.2 and a range of 0.6. Explain why none of her marks could be as high
as 5.9. (MEG)

9.2 Finding the Mean from Tables and Tally Charts
Often data are collected into tables or tally charts. This section considers how to find the
mean in such cases.

Worked Example 1
A football team keeps records of the number of goals it scores per match during a season.

No. of Goals Frequency

0 8
1 10
2 12
3 3
4 5
5 2

Find the mean number of goals per match.

Solution
The table above can be used, with a third column added. The mean can now be calculated.

No. of Goals Frequency No. of Goals × Frequency

0 8 0 × 8 = 0
1 10 1 × 10 = 10
2 12 2 × 12 = 24
3 3 3 × 3 = 9
4 5 4 × 5 = 20
5 2 5 × 2 = 10

TOTALS 40 73

(Total matches) (Total goals)

\[ \text{Mean} = \frac{73}{40} = 1.825 \]
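
The third-column method can be sketched in Python as follows. The function name `mean_from_frequency_table` and the dictionary layout are assumptions made for illustration; they do not come from the original text.

```python
def mean_from_frequency_table(table):
    """Return the mean of data summarised as {value: frequency}."""
    total_count = sum(table.values())  # total of the Frequency column
    total_sum = sum(value * freq for value, freq in table.items())  # total of the third column
    return total_sum / total_count


# Goals per match from Worked Example 1: value -> frequency
goals = {0: 8, 1: 10, 2: 12, 3: 3, 4: 5, 5: 2}
mean_goals = mean_from_frequency_table(goals)
print(mean_goals)
```

Each value is multiplied by its frequency, which is the same as adding up every individual match score without having to list all 40 of them.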


Worked Example 2
The bar chart shows how many cars were sold by a salesman over a period of time.

[Bar chart: Frequency (vertical axis, 0 to 6) against Cars sold per day (horizontal axis, 0 to 5).]

Find the mean number of cars sold per day.

Solution
The data can be transferred to a table and a third column included as shown.

Cars sold daily Frequency Cars sold × Frequency

0 2 0 × 2 = 0
1 4 1 × 4 = 4
2 3 2 × 3 = 6
3 6 3 × 6 = 18
4 3 4 × 3 = 12
5 2 5 × 2 = 10

TOTALS 20 50

(Total days) (Total number of cars sold)

\[ \text{Mean} = \frac{50}{20} = 2.5 \]

Worked Example 3
A police station kept records of the number of road traffic accidents in their area each day
for 100 days. The figures below give the number of accidents per day.

1 4 3 5 5 2 5 4 3 2 0 3 1 2 2 3 0 5 2 1
3 3 2 6 2 1 6 1 2 2 3 2 2 2 2 5 4 4 2 3
3 1 4 1 7 3 3 0 2 5 4 3 3 4 3 4 5 3 5 2
4 4 6 5 2 4 5 5 3 2 0 3 3 4 5 2 3 3 4 4
1 3 5 1 1 2 2 5 6 6 4 6 5 8 2 5 3 3 5 4

Find the mean number of accidents per day.


Solution
The first step is to draw out and complete a tally chart. The final column shown below
can then be added and completed.

Number of Accidents Tally Frequency No. of Accidents × Frequency

0 |||| 4 0 × 4 = 0
1 |||| |||| 10 1 × 10 = 10
2 |||| |||| |||| |||| || 22 2 × 22 = 44
3 |||| |||| |||| |||| ||| 23 3 × 23 = 69
4 |||| |||| |||| | 16 4 × 16 = 64
5 |||| |||| |||| || 17 5 × 17 = 85
6 |||| | 6 6 × 6 = 36
7 | 1 7 × 1 = 7
8 | 1 8 × 1 = 8
TOTALS 100 323

\[ \text{Mean number of accidents per day} = \frac{323}{100} = 3.23 \]
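
A tally chart of this kind can also be built by machine. The sketch below uses `collections.Counter` from the Python standard library to count the 100 daily figures from Worked Example 3; the variable names are illustrative assumptions.

```python
from collections import Counter

# The 100 daily accident counts from Worked Example 3, row by row
accidents = [
    1, 4, 3, 5, 5, 2, 5, 4, 3, 2, 0, 3, 1, 2, 2, 3, 0, 5, 2, 1,
    3, 3, 2, 6, 2, 1, 6, 1, 2, 2, 3, 2, 2, 2, 2, 5, 4, 4, 2, 3,
    3, 1, 4, 1, 7, 3, 3, 0, 2, 5, 4, 3, 3, 4, 3, 4, 5, 3, 5, 2,
    4, 4, 6, 5, 2, 4, 5, 5, 3, 2, 0, 3, 3, 4, 5, 2, 3, 3, 4, 4,
    1, 3, 5, 1, 1, 2, 2, 5, 6, 6, 4, 6, 5, 8, 2, 5, 3, 3, 5, 4,
]

tally = Counter(accidents)  # maps each value to its frequency
total_days = sum(tally.values())
total_accidents = sum(value * freq for value, freq in tally.items())
mean_accidents = total_accidents / total_days

# Print the frequency table with the "value x frequency" column
for value in sorted(tally):
    print(value, tally[value], value * tally[value])
print("Mean:", mean_accidents)
```

The counter plays the role of the Tally and Frequency columns, and the final division reproduces the mean found from the totals row.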

Exercises
1. A survey of 100 households asked how many cars there were in each household.
The results are given below.

No. of Cars Frequency

0 5
1 70
2 21
3 3
4 1

Calculate the mean number of cars per household.

2. The survey of question 1 also asked how many TV sets there were in each household.
The results are given below.

No. of TV Sets Frequency

0 2
1 30
2 52
3 8
4 5
5 3

Calculate the mean number of TV sets per household.


8. The mean of 6 numbers is 12.3. When an extra number is added, the mean changes
to 11.9. What is the extra number?

9. When 5 is added to a set of 3 numbers the mean increases to 4.6. What was the
mean of the original 3 numbers?

10. Three numbers have a mean of 64. When a fourth number is included the mean is
doubled. What is the fourth number?

9.4 Mean, Median and Mode for Grouped Data

The mean and median can be estimated from tables of grouped data.
The class interval which contains the most values is known as the modal class.

Worked Example 1
The table below gives data on the heights, in cm, of 51 children.

Class Interval 140 ≤ h < 150 150 ≤ h < 160 160 ≤ h < 170 170 ≤ h < 180
Frequency 6 16 21 8

(a) Estimate the mean height. (b) Estimate the median height.
(c) Find the modal class.

Solution
(a) To estimate the mean, the mid-point of each interval should be used.

Class Interval Mid-point Frequency Mid-point × Frequency

140 ≤ h < 150 145 6 145 × 6 = 870

150 ≤ h < 160 155 16 155 × 16 = 2480
160 ≤ h < 170 165 21 165 × 21 = 3465
170 ≤ h < 180 175 8 175 × 8 = 1400

Totals 51 8215

\[ \text{Mean} = \frac{8215}{51} = 161 \text{ (to the nearest cm)} \]

(b) The median is the 26th value. In this case it lies in the 160 ≤ h < 170 class interval.
The 4th value in the interval is needed. It is estimated as
\[ 160 + \frac{4}{21} \times 10 = 162 \text{ (to the nearest cm)} \]
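
The estimates in parts (a) to (c) can be sketched in Python. The function names and the `(lower, upper, frequency)` layout are assumptions for illustration. The median position is taken as (n + 1)/2, which matches the "26th value" used above for n = 51.

```python
def grouped_mean(classes):
    """Estimate the mean, treating every value as its class mid-point."""
    n = sum(freq for _, _, freq in classes)
    return sum((lower + upper) / 2 * freq for lower, upper, freq in classes) / n


def grouped_median(classes):
    """Estimate the median by linear interpolation inside the median class."""
    n = sum(freq for _, _, freq in classes)
    position = (n + 1) / 2  # e.g. the 26th value when n = 51
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            # fraction of the way along this class interval
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("frequency table is empty")


def modal_class(classes):
    """Return the (lower, upper) bounds of the class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return lower, upper


# Heights from Worked Example 1: (lower bound, upper bound, frequency)
heights = [(140, 150, 6), (150, 160, 16), (160, 170, 21), (170, 180, 8)]
mean_height = grouped_mean(heights)
median_height = grouped_median(heights)
modal_height_class = modal_class(heights)
print(round(mean_height), round(median_height), modal_height_class)
```

Both results are estimates only, because the individual heights inside each class interval are not known.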


Also note that when we speak of someone by age, say 8, then the person could be any age
from 8 years 0 days up to 8 years 364 days (365 in a leap year!). You will see how this is
tackled in the following example.

Worked Example 2
The ages of children in a primary school were recorded in the table below.

Age 5–6 7–8 9 – 10

Frequency 29 40 38

(a) Estimate the mean. (b) Estimate the median. (c) Find the modal age.

Solution
(a) To estimate the mean, we must use the mid-point of each interval; so, for example
for '5 – 6', which really means
5 ≤ age < 7 ,
the mid-point is taken as 6.

Class Interval Mid-point Frequency Mid-point × Frequency

5–6 6 29 6 × 29 = 174
7–8 8 40 8 × 40 = 320
9 – 10 10 38 10 × 38 = 380
Totals 107 874

\[ \text{Mean} = \frac{874}{107} = 8.2 \text{ (to 1 decimal place)} \]

(b) The median is given by the 54th value, which we have to estimate. There are 29
values in the first interval, so we need to estimate the 25th value in the second
interval. As there are 40 values in the second interval, the median is estimated as
being \( \frac{25}{40} \) of the way along the second interval. This has width 9 − 7 = 2 years, so the
median is estimated by
\[ \frac{25}{40} \times 2 = 1.25 \]
from the start of the interval. Therefore the median is estimated as
7 + 1.25 = 8.25 years.

(c) The modal age is the 7 – 8 age group.


Worked Example 1 uses what are called continuous data, since height can be of any value.
(Other examples of continuous data are weight, temperature, area, volume and time.)

The next example uses discrete data, that is, data which can take only a particular value,
such as the integers 1, 2, 3, 4, . . . in this case.

The calculations for mean and mode are not affected but estimation of the median
requires replacing the discrete grouped data with an approximate continuous interval.

Worked Example 3
The number of days that children were missing from school due to sickness in one year
was recorded.

Number of days off sick 1–5 6 – 10 11 – 15 16 – 20 21 – 25

Frequency 12 11 10 4 3

(a) Estimate the mean (b) Estimate the median. (c) Find the modal class.

Solution
(a) The estimate is made by assuming that all the values in a class interval are equal to
the midpoint of the class interval.

Class Interval Mid-point Frequency Mid-point × Frequency

1–5 3 12 3 × 12 = 36
6–10 8 11 8 × 11 = 88
11–15 13 10 13 × 10 = 130
16–20 18 4 18 × 4 = 72
21–25 23 3 23 × 3 = 69

Totals 40 395

\[ \text{Mean} = \frac{395}{40} = 9.875 \text{ days} \]

(b) As there are 40 pupils, we need to consider the mean of the 20th and 21st values.
These both lie in the 6–10 class interval, which is really the 5.5–10.5 class interval,
so this interval contains the median.
As there are 12 values in the first class interval, the median is found by considering
the 8th and 9th values of the second interval.
As there are 11 values in the second interval, the median is estimated as being
\( \frac{8.5}{11} \) of the way along the second interval.

But the length of the second interval is 10.5 − 5.5 = 5, so the median is estimated by
\[ \frac{8.5}{11} \times 5 = 3.86 \]
from the start of this interval. Therefore the median is estimated as
5.5 + 3.86 = 9.36.

(c) The modal class is 1–5, as it contains the most values (12).
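
For discrete grouped data the same interpolation can be sketched in Python once each class is widened by half a unit at both ends, so that 6–10 becomes 5.5–10.5. The function name and data layout below are illustrative assumptions, not part of the original text.

```python
def discrete_grouped_median(classes):
    """Estimate the median of whole-number data grouped as (first, last, frequency)."""
    n = sum(freq for _, _, freq in classes)
    position = (n + 1) / 2  # 20.5 for n = 40, i.e. midway between the 20th and 21st values
    cumulative = 0
    for first, last, freq in classes:
        lower, upper = first - 0.5, last + 0.5  # continuous boundaries of the class
        if cumulative + freq >= position:
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("frequency table is empty")


# Days off sick from Worked Example 3: (first value, last value, frequency)
days_off = [(1, 5, 12), (6, 10, 11), (11, 15, 10), (16, 20, 4), (21, 25, 3)]

n_pupils = sum(freq for _, _, freq in days_off)
mean_days = sum((first + last) / 2 * freq for first, last, freq in days_off) / n_pupils
median_days = discrete_grouped_median(days_off)
print(mean_days, round(median_days, 2))
```

The mid-points are unchanged by the half-unit adjustment, which is why only the median calculation needs the widened boundaries.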

Exercises
1. A door to door salesman keeps a record of the number of homes he visits each day.

Homes visited 0–9 10 – 19 20 – 29 30 – 39 40 – 49

Frequency 3 8 24 60 21

(a) Estimate the mean number of homes visited.

(b) Estimate the median.
(c) What is the modal class?

2. The weights of a number of students were recorded in kg.

Weight (kg) 30 ≤ w < 35 35 ≤ w < 40 40 ≤ w < 45 45 ≤ w < 50 50 ≤ w < 55

Frequency 10 11 15 7 4

(a) Estimate the mean weight. (b) Estimate the median.

3. A stopwatch was used to find the time that it took a group of children to run 100 m.

Time (seconds) 10 ≤ t < 15 15 ≤ t < 20 20 ≤ t < 25 25 ≤ t < 30

Frequency 6 16 21 8

(a) Is the median in the modal class? (b) Estimate the mean.
(c) Estimate the median.
(d) Is the median greater or less than the mean?

4. The distances that children in a year group travelled to school are recorded.

Distance (km) 0 ≤ d < 0.5 0.5 ≤ d < 1.0 1.0 ≤ d < 1.5 1.5 ≤ d < 2.0
Frequency 30 22 19 8

(a) Does the modal class contain the median?

(b) Estimate the median and the mean.
(c) Which is the larger, the median or the mean?


9.5 Cumulative Frequency

Cumulative frequencies are useful if more detailed information is required about a set of
data. In particular, they can be used to find the median and inter-quartile range.

The inter-quartile range contains the middle 50% of the sample and describes how
spread out the data are. This is illustrated in Example 2.

Worked Example 1
For the data given in the table, draw up a cumulative frequency table and then draw
a cumulative frequency graph.

Height (cm) Frequency

90 < h ≤ 100 5
100 < h ≤ 110 22
110 < h ≤ 120 30
120 < h ≤ 130 31
130 < h ≤ 140 18
140 < h ≤ 150 6

Solution
The table below shows how to calculate the cumulative frequencies.

Height (cm) Frequency Cumulative Frequency

90 < h ≤ 100 5 5
100 < h ≤ 110 22 5 + 22 = 27
110 < h ≤ 120 30 27 + 30 = 57
120 < h ≤ 130 31 57 + 31 = 88
130 < h ≤ 140 18 88 + 18 = 106
140 < h ≤ 150 6 106 + 6 = 112
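
The running totals in the third column can be produced with `itertools.accumulate` from the Python standard library. This is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Worked Example 1
upper_bounds = [100, 110, 120, 130, 140, 150]
frequencies = [5, 22, 30, 31, 18, 6]

# Each cumulative frequency is the previous total plus the next frequency
cumulative = list(accumulate(frequencies))

# Points for the graph: the lowest boundary (90) has cumulative frequency 0
points = [(90, 0)] + list(zip(upper_bounds, cumulative))

print(cumulative)
print(points)
```

Each cumulative frequency is paired with the upper boundary of its class, because that is the height by which all of those values have been counted.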

A graph can then be plotted using points as shown below.

[Graph: Cumulative Frequency (vertical axis, 0 to 120) against Height (cm) (horizontal axis, 90 to 150), with straight line segments joining the points (90, 0), (100, 5), (110, 27), (120, 57), (130, 88), (140, 106) and (150, 112).]


Note
A more accurate graph is found by drawing a smooth curve through the points, rather
than using straight line segments.

[Graph: Cumulative Frequency (vertical axis, 0 to 120) against Height (cm) (horizontal axis, 90 to 150), with a smooth curve through the points (90, 0), (100, 5), (110, 27), (120, 57), (130, 88), (140, 106) and (150, 112).]

Worked Example 2
The cumulative frequency graph below gives the results of 120 students on a test.

[Graph: cumulative frequency curve of Cumulative Frequency (vertical axis, 0 to 120) against Test Score (horizontal axis, 0 to 100).]


Use the graph to find:

(a) the median score, (b) the inter-quartile range,
(c) the mark which was attained by only 10% of the students,
(d) the number of students who scored more than 75 on the test.

Solution
(a) Since 1/2 of 120 is 60, the median can be found by starting at 60 on the vertical scale,
moving horizontally to the graph line and then moving vertically down to meet the
horizontal scale.
In this case the median is 53.

[Graph: the cumulative frequency curve with a line from 60 on the Cumulative Frequency axis across to the curve and down to the Score axis, giving Median = 53.]

(b) To find out the inter-quartile range, we must consider the middle 50% of the
students.

To find the lower quartile, start at 1/4 of 120, which is 30. This gives
Lower Quartile = 43.

To find the upper quartile, start at 3/4 of 120, which is 90. This gives
Upper Quartile = 67.

[Graph: the cumulative frequency curve with lines from 30 and 90 on the Cumulative Frequency axis across to the curve and down to the Test Score axis, giving Lower quartile = 43 and Upper quartile = 67.]

The inter-quartile range is then

Inter-quartile Range = Upper Quartile − Lower Quartile

= 67 − 43
= 24.


(c) Here the mark which was attained by the top 10% is required.
10% of 120 = 12
so start at 108 on the cumulative frequency scale.
This gives a mark of 79.

[Graph: the cumulative frequency curve with a line from 108 on the Cumulative Frequency axis across to the curve and down to 79 on the Test Score axis.]

(d) To find the number of students who scored more than 75, start at 75 on
the horizontal axis.
This gives a cumulative frequency of 103.
So the number of students with a score greater than 75 is
120 − 103 = 17.

[Graph: the cumulative frequency curve with a line from 75 on the Test Score axis up to the curve and across to 103 on the Cumulative Frequency axis.]

As in Worked Example 1, a more accurate estimate for the median and inter-quartile
range is obtained if you draw a smooth curve through the data points.
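
Reading a value off a cumulative frequency graph can be imitated in Python by interpolating along the straight line segments between plotted points. The graph in Worked Example 2 is not tabulated in the text, so this sketch uses the points from Worked Example 1 instead; the function name `value_at` is an assumption for illustration, and a smooth curve would give slightly different answers.

```python
def value_at(points, target):
    """Return the horizontal value at which the cumulative frequency reaches target.

    points is a list of (value, cumulative frequency) pairs in increasing order,
    joined by straight line segments.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the graph")


# Points from the cumulative frequency table in Worked Example 1
points = [(90, 0), (100, 5), (110, 27), (120, 57), (130, 88), (140, 106), (150, 112)]
n = points[-1][1]  # total frequency, 112

median = value_at(points, n / 2)              # start at 1/2 of the total
lower_quartile = value_at(points, n / 4)      # start at 1/4 of the total
upper_quartile = value_at(points, 3 * n / 4)  # start at 3/4 of the total
inter_quartile_range = upper_quartile - lower_quartile

print(round(median, 1), round(lower_quartile, 1), round(upper_quartile, 1))
print(round(inter_quartile_range, 1))
```

The three starting values follow the method above: a half, a quarter and three quarters of the total frequency are located on the vertical scale and traced across to the graph line.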

Exercises
1. Make a cumulative frequency table for each set of data given below. Then draw a
cumulative frequency graph and use it to find the median and inter-quartile range.
(a) John weighed each apple in a large box. His results are given in this table.

Weight of
apple (g) 60 < w ≤ 80 80 < w ≤ 100 100 < w ≤ 120 120 < w ≤ 140 140 < w ≤ 160
Frequency 4 28 33 27 8

(b) Pasi asked the students in his class how far they travelled to school each day.
His results are given below.

Distance (km) 0 < d ≤1 1< d ≤ 2 2<d≤3 3<d≤4 4<d≤5 5<d≤6

Frequency 5 12 5 6 5 3


[Graph: cumulative frequency curve of Cumulative Frequency (vertical axis, 0 to 120) against Distance travelled (miles) (horizontal axis, 0 to 140).]

(c) (i) Use the cumulative frequency curve to estimate the median distance
travelled by the guests.
(ii) Give a reason for the large difference between the mean distance and
the median distance.
(MEG)

9.6 Standard Deviation

The two frequency polygons drawn on the graph below show samples which have the
same mean, but the data in one are much more spread out than in the other.

[Graph: two frequency polygons of Frequency (vertical axis) against Length (horizontal axis, 0 to 12).]

The range (highest value – lowest value) gives a simple measure of how much the data
are spread out.


Standard deviation (s.d.) is a much more useful measure and is given by the formula:

\[ \text{s.d.} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \]

where \( x_i \) represents each datapoint \( (x_1, x_2, \ldots, x_n) \),
\( \bar{x} \) is the mean,
\( n \) is the number of values.

Then \( (x_i - \bar{x})^2 \) gives the square of the difference between each value and the mean
(squaring exaggerates the effect of data points far from the mean and gets rid of negative
values), and
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
sums up all these squared differences.

The expression
\[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
gives an average value to these differences. If all the data were the same, then each \( x_i \)
would equal \( \bar{x} \) and the expression would be zero.

Finally we take the square root of the expression so that the dimensions of the standard
deviation are the same as those of the data.

So standard deviation is a measure of the spread of the data. The greater its value, the
more spread out the data are. This is illustrated by the two frequency polygons shown
above. Although both sets of data have the same mean, the data represented by the
'dotted' frequency polygon will have a greater standard deviation than the other.

Worked Example 1
Find the mean and standard deviation of the numbers,
6, 7, 8, 5, 9.

Solution
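The mean, \( \bar{x} \), is given by
\[ \bar{x} = \frac{6 + 7 + 8 + 5 + 9}{5} = \frac{35}{5} = 7 \]

Now the standard deviation can be calculated.

\[
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{(6 - 7)^2 + (7 - 7)^2 + (8 - 7)^2 + (5 - 7)^2 + (9 - 7)^2}{5}} \\
&= \sqrt{\frac{1 + 0 + 1 + 4 + 4}{5}} \\
&= \sqrt{\frac{10}{5}} \\
&= \sqrt{2} \\
&= 1.414 \text{ (to 3 decimal places)}
\end{aligned}
\]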
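
The steps of the formula can be followed in a short Python sketch. The function name `population_sd` is an assumption for illustration. The result is compared with `statistics.pstdev` from the standard library, which also divides by n; note that `statistics.stdev` divides by n − 1 and so would give a different answer.

```python
import math
import statistics


def population_sd(values):
    """Standard deviation using the divide-by-n formula from the text."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = [(x - mean) ** 2 for x in values]  # (x_i - mean)^2 for each value
    return math.sqrt(sum(squared_differences) / n)


data = [6, 7, 8, 5, 9]
sd = population_sd(data)
sd_check = statistics.pstdev(data)  # library version of the same formula
print(round(sd, 3), round(sd_check, 3))
```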

An alternative formula for standard deviation is

\[ \text{s.d.} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2} \]

This expression is much more convenient for calculations done without a calculator. The
proof of the equivalence of this formula is given below although it is beyond the scope of
the GCSE syllabus.

Proof
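You can see the proof of the equivalence of the two formulae by noting that

\[
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\
&= \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} (2 x_i \bar{x}) + \sum_{i=1}^{n} \bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \bar{x}^2 \sum_{i=1}^{n} 1
\end{aligned}
\]

(since the expressions \( 2\bar{x} \) and \( \bar{x}^2 \) are common for each term in the summation).

But \( \sum_{i=1}^{n} 1 = n \), since you are summing \( 1 + 1 + \ldots + 1 = n \) (n terms), and
\( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \), by definition, thus

\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 &= \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \bar{x}^2 n \right) && \text{(substituting } \textstyle\sum_{i=1}^{n} 1 = n \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - 2\bar{x} \left( \frac{\sum_{i=1}^{n} x_i}{n} \right) + \bar{x}^2 && \text{(dividing by } n \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 && \text{(substituting } \bar{x} \text{ for } \textstyle\frac{1}{n}\sum_{i=1}^{n} x_i \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2
\end{aligned}
\]

and the result follows.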

Worked Example 2
Find the mean and standard deviation of each of the following sets of numbers.
(a) 10, 11, 12, 13, 14 (b) 5, 6, 12, 18, 19

Solution
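(a) The mean, \( \bar{x} \), is given by
\[ \bar{x} = \frac{10 + 11 + 12 + 13 + 14}{5} = \frac{60}{5} = 12 \]

The standard deviation can now be calculated using the alternative formula.

\[
\begin{aligned}
\text{s.d.} &= \sqrt{\left( \frac{10^2 + 11^2 + 12^2 + 13^2 + 14^2}{5} \right) - 12^2} \\
&= \sqrt{146 - 144} \\
&= \sqrt{2} \\
&= 1.414 \text{ (to 3 decimal places)}
\end{aligned}
\]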

(b) The mean, \( \bar{x} \), is given by

\[ \bar{x} = \frac{5 + 6 + 12 + 18 + 19}{5} = 12 \text{ (as in part (a))} \]

The standard deviation is given by

\[
\begin{aligned}
\text{s.d.} &= \sqrt{\left( \frac{5^2 + 6^2 + 12^2 + 18^2 + 19^2}{5} \right) - 12^2} \\
&= \sqrt{178 - 144} \\
&= 5.831 \text{ (to 3 decimal places)}
\end{aligned}
\]

Note that both sets of numbers have the same mean value, but that set (b) has a much
larger standard deviation. This is expected, as the spread in set (b) is clearly far more
than in set (a).
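
The two formulae can be compared numerically on sets (a) and (b). The following Python sketch uses illustrative function names that are not part of the original text.

```python
import math


def sd_definition(values):
    """Square root of the mean squared difference from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_alternative(values):
    """Square root of (mean of the squares minus the square of the mean)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean ** 2)


set_a = [10, 11, 12, 13, 14]
set_b = [5, 6, 12, 18, 19]

for name, values in (("a", set_a), ("b", set_b)):
    print(name, round(sd_definition(values), 3), round(sd_alternative(values), 3))
```

Both functions should agree for each set, with set (b) giving the larger value because its numbers lie further from the shared mean of 12. With floating-point arithmetic the alternative formula can lose accuracy when the mean is very large compared with the spread, so the first form is usually the safer one in software.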

Worked Example 3
The table below gives the number of road traffic accidents per day in a small town.

Accidents per day 0 1 2 3 4 5 6

Frequency 5 8 6 3 2 0 1

Find the mean and standard deviation of this data.

Solution
The necessary calculations for each datapoint, xi , are set out below.

Accidents per day (x_i) Frequency (f_i) x_i^2 x_i f_i x_i^2 f_i

0 5 0 0 0
1 8 1 8 8
2 6 4 12 24
3 3 9 9 27
4 2 16 8 32
5 0 25 0 0
6 1 36 6 36
TOTALS 25 43 127

From the totals,

\[ n = 25, \quad \sum_{i=1}^{n} x_i f_i = 43, \quad \sum_{i=1}^{n} x_i^2 f_i = 127 \]

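The mean, \( \bar{x} \), is now given by

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{n} = \frac{43}{25} = 1.72 \]

The standard deviation is now given by

\[
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{\sum_{i=1}^{n} x_i^2 f_i}{n} - \bar{x}^2} \\
&= \sqrt{\frac{127}{25} - 1.72^2} \\
&= 1.457
\end{aligned}
\]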
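
The calculation from a frequency table can be sketched in Python as follows. It uses the frequencies from the solution table above (including the zero frequency for 5 accidents); the function name and dictionary layout are assumptions for illustration.

```python
import math


def mean_and_sd_from_frequencies(table):
    """Return (mean, standard deviation) for data summarised as {value: frequency}."""
    n = sum(table.values())                              # total frequency
    sum_xf = sum(x * f for x, f in table.items())        # total of the x_i f_i column
    sum_x2f = sum(x * x * f for x, f in table.items())   # total of the x_i^2 f_i column
    mean = sum_xf / n
    sd = math.sqrt(sum_x2f / n - mean ** 2)
    return mean, sd


# Accidents per day -> frequency, as in the solution table
accidents = {0: 5, 1: 8, 2: 6, 3: 3, 4: 2, 5: 0, 6: 1}
mean_accidents, sd_accidents = mean_and_sd_from_frequencies(accidents)
print(round(mean_accidents, 2), round(sd_accidents, 3))
```

The three sums correspond to the three totals in the table, so the function can be checked line by line against the hand calculation.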

Most scientific calculators have statistical functions which will calculate the mean and
standard deviation of a set of data.

Exercises
1. (a) Find the mean and standard deviation of each set of data given below.

A 51 56 51 49 53 62
B 71 76 71 69 73 82
C 102 112 102 98 106 124

(b) Describe the relationship between each set of numbers and also the relationship
between their means and standard deviations.

2. Two machines, A and B, fill empty packets with soap powder. A sample of boxes
was taken from each machine and the weight of powder (in kg) was recorded.

A 2.27 2.31 2.18 2.2 2.26 2.24

B 2.78 2.62 2.61 2.51 2.59 2.67 2.62 2.68 2.70

(a) Find the mean and standard deviation for each machine.
(b) Which machine is most consistent?

3. Two groups of students were trying to find the acceleration due to gravity.
Each group conducted 5 experiments.

Group A 9.4 9.6 10.2 10.8 10.1

Group B 9.5 9.7 9.6 9.4 9.8

Find the mean and standard deviation for each group, and comment on their results.

4. The number of matches per box was counted for 100 boxes of matches.
The results are given in the table below.

Research Article: Tesfaye Mekonnen Sifan
12 pages
Assignment EMRM5103 Risk Management (Full)
100% (1)
Assignment EMRM5103 Risk Management (Full)
68 pages
Week 2 - Demographics and Introduction To Statistics
50% (2)
Week 2 - Demographics and Introduction To Statistics
53 pages
STPDF1 - Recalling Basic Concepts
No ratings yet
STPDF1 - Recalling Basic Concepts
31 pages
1) Unit 1. Introduction PDF
No ratings yet
1) Unit 1. Introduction PDF
7 pages
Numeracy & Quantitative Methods: Numeracy For Professional Purposes
No ratings yet
Numeracy & Quantitative Methods: Numeracy For Professional Purposes
18 pages
Introduction to Statistics: Key Concepts
No ratings yet
Introduction to Statistics: Key Concepts
15 pages
Continuous Space PBIL Extensions
No ratings yet
Continuous Space PBIL Extensions
8 pages
Statistics
No ratings yet
Statistics
101 pages
Module 1
No ratings yet
Module 1
10 pages
SMA 160 Exam July 2019 Suppl
No ratings yet
SMA 160 Exam July 2019 Suppl
4 pages
Numeracy & Quantitative Methods: Numeracy For Professional Purposes
No ratings yet
Numeracy & Quantitative Methods: Numeracy For Professional Purposes
18 pages
1 Nature of Statistics
No ratings yet
1 Nature of Statistics
7 pages
Basic Statistical Tools in Research and Data Analy PDF
No ratings yet
Basic Statistical Tools in Research and Data Analy PDF
8 pages
Types of Data Qualitative Data
No ratings yet
Types of Data Qualitative Data
5 pages
Gender Age and Inequality in The Professions 1st Edition Marta Choroszewicz Instant Download
No ratings yet
Gender Age and Inequality in The Professions 1st Edition Marta Choroszewicz Instant Download
76 pages
Chapter 2. Random Variables: Niprl
No ratings yet
Chapter 2. Random Variables: Niprl
59 pages
The Project of Dividend Policy Commercial Bank Limited
No ratings yet
The Project of Dividend Policy Commercial Bank Limited
45 pages
AP Statistics Curriculum Guide
No ratings yet
AP Statistics Curriculum Guide
3 pages
Xii Com Mark Weightage Pre-Board 2024
No ratings yet
Xii Com Mark Weightage Pre-Board 2024
2 pages
MiniTest 720-730-740 en
No ratings yet
MiniTest 720-730-740 en
8 pages
Midterm - APS1070 - 2020 - 05 Summer
No ratings yet
Midterm - APS1070 - 2020 - 05 Summer
2 pages
The Kolmogorov-Smirnov Test: Vasileios Hatzivassiloglou University of Texas at Dallas
No ratings yet
The Kolmogorov-Smirnov Test: Vasileios Hatzivassiloglou University of Texas at Dallas
11 pages
Jari Oksanen Vegan - Ecological Diversity PDF
No ratings yet
Jari Oksanen Vegan - Ecological Diversity PDF
17 pages
Netherlands Vs Argentina
No ratings yet
Netherlands Vs Argentina
13 pages
QBM 101 Business Statistics: Department of Business Studies Faculty of Business, Economics & Accounting HE LP University
No ratings yet
QBM 101 Business Statistics: Department of Business Studies Faculty of Business, Economics & Accounting HE LP University
65 pages
Lecture6 Clustering
No ratings yet
Lecture6 Clustering
47 pages
Introduction To Statistics
No ratings yet
Introduction To Statistics
34 pages
18MAB303T - Testing Hypothesis - Basics 2023
No ratings yet
18MAB303T - Testing Hypothesis - Basics 2023
25 pages
07 Box Plots, Variance and Standard Deviation
No ratings yet
07 Box Plots, Variance and Standard Deviation
5 pages
Bcoc - 134
No ratings yet
Bcoc - 134
4 pages
P-Value 0.2, 0.05 Data Is Not Normal Reject H0: Tests of Normality
No ratings yet
P-Value 0.2, 0.05 Data Is Not Normal Reject H0: Tests of Normality
2 pages


• Independent Variable (IV)

• Dependent Variable (DV)

• Extraneous Variable (EV)

Consider an experimental study aimed at establishing the efficacy of electronic scholarly on

(b) To find the median, place all the numbers in order.

(c) The score 4 occurs most often, so,
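As a minimal sketch of the two steps just described (ordering the values to find the median, then counting which score occurs most often), the following uses Python's standard `statistics` module. The list `scores` is a hypothetical data set chosen for illustration; it is not the data from the exercise.

```python
from statistics import median, mode

# Hypothetical scores (an assumption, not the exercise's data).
scores = [4, 2, 7, 4, 5, 4, 1, 6]

# Step 1: place all the numbers in order.
ordered = sorted(scores)

# Step 2: the median is the middle value; with an even count it is
# the mean of the two middle values.
middle = median(ordered)

# Step 3: the mode is the value that occurs most often.
most_common = mode(ordered)

print("Ordered:", ordered)
print("Median:", middle)
print("Mode:", most_common)
```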

For this data, find

17. Eight judges each give a mark out of 6 in an ice-skating competition.

9.2 Finding the Mean from Tables and

No. of Goals Frequency

Find the mean number of goals per match.

Solution No. of Goals Frequency No. of Goals × Frequency

Mean = (Total goals) ÷ (Total matches) = 1.825.
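The calculation of a mean from a frequency table can be sketched as follows. The table `goal_frequencies` and the helper `mean_from_frequencies` are hypothetical names with made-up values; they do not reproduce the table behind the 1.825 result.

```python
# Hypothetical frequency table: number of goals -> number of matches.
goal_frequencies = {0: 5, 1: 10, 2: 8, 3: 2}


def mean_from_frequencies(table):
    """Return (sum of value x frequency) divided by (sum of frequencies)."""
    total_count = sum(table.values())                        # total matches
    total_value = sum(v * f for v, f in table.items())       # total goals
    return total_value / total_count


mean_goals = mean_from_frequencies(goal_frequencies)
print("Mean goals per match:", mean_goals)
```

The extra "No. of Goals × Frequency" column in the worked solution corresponds to the products `v * f` summed inside the function.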

Find the mean number of cars sold per day.

Cars sold daily Frequency Cars sold × Frequency

(Total days) (Total number of cars sold)

Find the mean number of accidents per day.

Number of Accidents Tally Frequency No. of Accidents × Frequency

No. of Cars Frequency

Calculate the mean number of cars per household.

No. of TV Sets Frequency

Calculate the mean number of TV sets per household.

9.4 Mean, Median and Mode for Grouped Data

Class Interval Mid-point Frequency Mid-point × Frequency

140 ≤ h < 150 145 6 145 × 6 = 870
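For grouped data the mean can only be estimated, using each class mid-point in place of the unknown individual values. The sketch below takes the first class (140 ≤ h < 150, frequency 6) from the row above; the other two classes and all function names are assumptions added for illustration.

```python
# Each entry is ((lower bound, upper bound), frequency).
# Only the first class comes from the text; the others are hypothetical.
grouped = [((140, 150), 6), ((150, 160), 10), ((160, 170), 4)]


def estimate_grouped_mean(classes):
    """Estimate the mean as sum(mid-point x frequency) / sum(frequency)."""
    total_freq = sum(freq for _, freq in classes)
    weighted_sum = sum((low + high) / 2 * freq for (low, high), freq in classes)
    return weighted_sum / total_freq


def modal_class(classes):
    """Return the class interval with the largest frequency."""
    return max(classes, key=lambda item: item[1])[0]


estimated_mean = estimate_grouped_mean(grouped)
print("Estimated mean:", estimated_mean)
print("Modal class:", modal_class(grouped))
```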

Age 5–6 7–8 9 – 10

Class Interval Mid-point Frequency Mid-point × Frequency

(c) The modal age is the 7 – 8 age group.

Number of days off sick 1–5 6 – 10 11 – 15 16 – 20 21 – 25

Class Interval Mid-point Frequency Mid-point × Frequency

Homes visited 0–9 10 – 19 20 – 29 30 – 39 40 – 49

(a) Estimate the mean number of homes visited.

2. The weights of a number of students were recorded in kg.

Weight (kg) 30 ≤ w < 35 35 ≤ w < 40 40 ≤ w < 45 45 ≤ w < 50 50 ≤ w < 55

(a) Estimate the mean weight. (b) Estimate the median.

Time (seconds) 10 ≤ t < 15 15 ≤ t < 20 20 ≤ t < 25 25 ≤ t < 30

4. The distances that children in a year group travelled to school are recorded.

(a) Does the modal class contain the median?

9.5 Cumulative Frequency

Worked Example 1

For the data given in the table, draw up a

Height (cm) Frequency
90 < h ≤ 100 5

Height (cm) Frequency Cumulative Frequency

A graph can then be plotted using points as shown below.

Use the graph to find:

can be found by starting at 60

To find the lower quartile,

To find the upper quartile,

Upper quartile = 67.

Interquartile range = Upper quartile − Lower quartile
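The text reads the median and quartiles off a plotted cumulative frequency graph. As a rough numerical stand-in, the sketch below builds the cumulative frequencies and then interpolates linearly within each class, which corresponds to joining the plotted points with straight lines. This is a substituted technique, not the graphical method itself, and the class boundaries, frequencies and function name are all hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: class boundaries and one frequency per class.
boundaries = [0, 10, 20, 30, 40]
frequencies = [4, 8, 6, 2]

# Running totals of the frequencies.
cumulative = list(accumulate(frequencies))


def value_at(position):
    """Value reached at a given cumulative frequency, by linear interpolation."""
    previous_total = 0
    for i, running_total in enumerate(cumulative):
        if position <= running_total:
            low, high = boundaries[i], boundaries[i + 1]
            fraction = (position - previous_total) / frequencies[i]
            return low + fraction * (high - low)
        previous_total = running_total
    raise ValueError("position exceeds the total frequency")


total = cumulative[-1]
lower_quartile = value_at(total / 4)       # one quarter of the way up
median_value = value_at(total / 2)         # half way up
upper_quartile = value_at(3 * total / 4)   # three quarters of the way up
interquartile_range = upper_quartile - lower_quartile

print(cumulative, lower_quartile, median_value, upper_quartile, interquartile_range)
```

Values read from a hand-drawn smooth curve will usually differ slightly from these straight-line estimates.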

Here the mark which was attained

This gives a mark of 79.

(d) To find the number of students who

This gives a cumulative frequency

So the number of students with a score greater than 75 is

Distance (km) 0 < d ≤1 1< d ≤ 2 2<d≤3 3<d≤4 4<d≤5 5<d≤6


9.6 Standard Deviation

where $x_i$ represents each data point ($x_1, x_2, \ldots, x_n$)

sums up all these squared differences.

Now the standard deviation can be calculated.

$(6 - 7)^2 + (7 - 7)^2 + (8 - 7)^2 + (5 - 7)^2 + (9 - 7)^2$
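The sum of squared differences above can be checked with a short script. The data values 6, 7, 8, 5, 9 and the mean of 7 come from the expression itself; dividing by n (the population form) before taking the square root is an assumption, since the formula is not reproduced here.

```python
from math import sqrt
from statistics import pstdev

data = [6, 7, 8, 5, 9]

# Mean of the data (7 for these values).
mean = sum(data) / len(data)

# Squared difference of each data point from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Assumption: divide by n (population form), then take the square root.
sd = sqrt(sum(squared_diffs) / len(data))

print("Sum of squared differences:", sum(squared_diffs))
print("Standard deviation:", round(sd, 3))
print("statistics.pstdev agrees:", abs(sd - pstdev(data)) < 1e-12)
```

If the sample form (dividing by n − 1) were intended instead, `statistics.stdev` would be the matching library call.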

An alternative formula for standard deviation is

(b) The mean, $\bar{x}$, is given by

The standard deviation is given by

= 5.831 (to 3 decimal places).
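The alternative formula is not reproduced in this text. Assuming it is the common "mean of the squares minus the square of the mean" form, $\sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$, the sketch below checks on a hypothetical data set that it gives the same value as the definition. The data are not those behind the 5.831 result.

```python
from math import isclose, sqrt

# Hypothetical data set (an assumption, chosen so the answer is a whole number).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Definition: root of the mean squared difference from the mean.
sd_definition = sqrt(sum((x - mean) ** 2 for x in data) / n)

# Assumed alternative form: root of (mean of squares - square of mean).
sd_shortcut = sqrt(sum(x * x for x in data) / n - mean ** 2)

print(sd_definition, sd_shortcut)
print("Formulas agree:", isclose(sd_definition, sd_shortcut))
```

The second form needs only the running totals of x and x², which is why it is convenient for calculation by hand or from a table.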

Accidents per day 0 1 2 3 4 5 6

Find the mean and standard deviation of this data.

Accidents per day Frequency

From the totals,

The mean, $\bar{x}$, is now given by

The standard deviation is now given by
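When the data are given as a frequency table, the totals needed are Σf, Σfx and Σfx². The sketch below uses the values 0 to 6 from the accidents-per-day table, but the frequencies are hypothetical, and the divisor n (population form) is an assumption.

```python
from math import isclose, sqrt
from statistics import pstdev

# Values 0..6 as in the table; the frequencies are hypothetical.
values = [0, 1, 2, 3, 4, 5, 6]
freqs = [5, 6, 4, 3, 1, 0, 1]

# Column totals, as in a hand-drawn table.
total_f = sum(freqs)                                        # sum of f
total_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f*x
total_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f*x^2

mean = total_fx / total_f
sd = sqrt(total_fx2 / total_f - mean ** 2)

# Cross-check by expanding the table into the raw list of observations.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

print("Mean:", mean)
print("Standard deviation:", round(sd, 3))
print("Matches pstdev on raw data:", isclose(sd, pstdev(expanded)))
```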

A 2.27 2.31 2.18 2.2 2.26 2.24

Group A 9.4 9.6 10.2 10.8 10.1
