Measures of Spread
Week 2
Teacher: MSc. Diana Barrezueta F.
Learning Objectives
• Calculate the range and interquartile range.
• Calculate the standard deviation for a population and a sample, and
understand its meaning.
• Distinguish between the variance and the standard deviation.
• Calculate and apply Chebyshev’s Theorem to any set of data.
Measures of Spread
• Another important feature that can help us understand more about a
data set is the manner in which the data are distributed, or spread.
• Variation and dispersion are words that are also commonly used to
describe this feature.
• There are several commonly used statistical measures of spread that
we will investigate in this lesson.
Range
One measure of spread is the range.
The range is simply the difference between the largest value (maximum)
and the smallest value (minimum) in the data.
Example: Return to the data set used in the previous lesson, which is
shown below:
75, 80, 90, 94, 96
The range of this data set is
96−75=21.
This is telling us the distance between the maximum and minimum values in
the data set.
The range is useful because it requires very little calculation and therefore
gives a quick and easy snapshot of how the data are spread.
However, it is limited, because it only involves two values in the data set,
and it is not resistant to outliers.
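Here is a minimal Python sketch of the range for the test-score data above. The helper name `data_range` is hypothetical (it avoids shadowing Python's built-in `range`). The extra value 10 is made up purely to show how a single outlier changes the range.

```python
def data_range(values):
    """Return the range of a data set: maximum minus minimum."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

scores = [75, 80, 90, 94, 96]
print(data_range(scores))         # 21

# One hypothetical outlier is enough to change the range dramatically
print(data_range(scores + [10]))  # 86
```

Because only the maximum and minimum enter the calculation, every other value could change without affecting the result.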
Interquartile Range
The interquartile range, abbreviated IQR, is the difference between the third
quartile (Q3) and the first quartile (Q1). Thus, IQR = Q3 − Q1.
The IQR gives information about how the middle 50% of the data are spread.
Approximately fifty percent of the data values lie between Q1 and Q3.
Example: A recent study proclaimed Mobile, Alabama the wettest city in
America.
Source: http://www.livescience.com/environment/070518_rainy_cities.html.
The following table lists measurements of the approximate annual rainfall in
Mobile over a 10-year period. Find the range and IQR for these data.
Find the Range
• First, place the data in order from smallest to largest. The range is the
difference between the minimum and maximum rainfall amounts. In this example,
the range tells us that there is a difference of 44 inches of rainfall between
the wettest and driest years in Mobile.
Find the IQR
• To find the IQR, first identify the
quartiles, and then compute Q3−Q1.
• In this example, the IQR shows that
there is a difference of 22 inches of
rainfall, even in the middle 50% of the
data.
• It appears that Mobile experiences wide
fluctuations in yearly rainfall totals,
which might be explained by its position
near the Gulf of Mexico and its
exposure to tropical storms and
hurricanes.
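The rainfall table is not reproduced in this text, so the Python sketch below applies the IQR steps to the test-score data from the Range section instead. It assumes the "median of halves" convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves, excluding the middle value when n is odd). This is an assumption: software packages use several quartile conventions, so their results can differ slightly. The function name `iqr_median_of_halves` is hypothetical.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR), where Q1 and Q3 are the medians of the lower
    and upper halves of the sorted data (middle value excluded if n is odd)."""
    data = sorted(values)  # step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    q1, q3 = median(lower), median(upper)  # step 2: identify the quartiles
    return q1, q3, q3 - q1                 # step 3: IQR = Q3 - Q1

print(iqr_median_of_halves([96, 75, 94, 80, 90]))  # (77.5, 95.0, 17.5)
```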
Standard Deviation, Step 1: Calculate the Deviations
• The standard deviation is an extremely important measure of spread that is based on the mean.
• Recall that the mean is the numerical balancing point of the data. One way to measure how the data are spread is to
look at how far away each of the values is from the mean.
• The difference between a data value and the mean is called the deviation.
• Written symbolically, the deviation of a data value $x$ from the mean $\bar{x}$ is $x - \bar{x}$.
Example: Computing the deviations
• Let’s take the simple data set of
three randomly selected
individuals’ shoe sizes shown
below: 9.5, 11.5, 12
• The mean of this data set is 11.
• Notice that if a data value is less
than the mean, the deviation of
that value is negative. Points that
are above the mean have positive
deviations.
• The deviations are as follows: 9.5 − 11 = −1.5, 11.5 − 11 = 0.5, and 12 − 11 = 1.
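A short Python sketch of the deviation calculation for the three shoe sizes. It also confirms numerically the property discussed next: because the mean is the balancing point, the deviations sum to zero.

```python
shoe_sizes = [9.5, 11.5, 12]

x_bar = sum(shoe_sizes) / len(shoe_sizes)  # the mean, 11.0
deviations = [x - x_bar for x in shoe_sizes]

print(deviations)       # [-1.5, 0.5, 1.0]  (negative below the mean)
print(sum(deviations))  # 0.0
```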
What is the Standard Deviation?
• The standard deviation is a measure of the typical, or average,
deviation for all of the data points from the mean.
• However, because the mean is the balancing point of the data, when
you add the deviations, they always sum to 0.
• Therefore, we need to make all the deviations nonnegative before we add
them up.
• For the standard deviation, we square all the deviations. The square
of any real number is never negative.
Example: Computing the squared deviations
Squaring each deviation gives (−1.5)² = 2.25, (0.5)² = 0.25, and
(1)² = 1, which add up to 3.5.
We want to find the average of the squared
deviations.
Usually, to find an average, you divide by the
number of terms in your sum. In finding the sample
standard deviation, however, we divide by n − 1.
In this example, since n = 3, we divide by 2:
3.5 / 2 = 1.75. This result is called the variance.
The variance of a sample is denoted by s² and is
a measure of how closely the data are clustered
around the mean.
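The variance step can be sketched in Python as follows: square each deviation, add them, and divide by n − 1. As a cross-check, the standard library's `statistics.variance` also computes the sample variance with the n − 1 divisor.

```python
from statistics import variance

shoe_sizes = [9.5, 11.5, 12]
n = len(shoe_sizes)
x_bar = sum(shoe_sizes) / n

squared_devs = [(x - x_bar) ** 2 for x in shoe_sizes]  # [2.25, 0.25, 1.0]
s2 = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1, not n

print(s2)                    # 1.75
print(variance(shoe_sizes))  # 1.75
```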
Computing the Standard Deviation
• Because we squared the deviations before we added them, the units we were
working in were also squared.
• To return to the original units, we must take the square root of our result:
√1.75≈1.32.
• This quantity is the sample standard deviation and is denoted by s.
• The number indicates that in our sample, the typical data value is
approximately 1.32 units away from the mean.
• It is a measure of how closely the data are clustered around the mean.
• A small standard deviation means that the data points are clustered close to
the mean, while a large standard deviation means that the data points are
spread out from the mean.
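A minimal Python sketch of the sample standard deviation, checked against the standard library's `statistics.stdev`. The helper `sample_std` is a hypothetical name, and the `tight` and `wide` lists are made-up data sets with the same mean (11) chosen only to illustrate the small-versus-large contrast described above.

```python
import math
from statistics import stdev

def sample_std(values):
    """Sample standard deviation: square root of the sum of squared
    deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

shoe_sizes = [9.5, 11.5, 12]
print(round(sample_std(shoe_sizes), 2))  # 1.32
print(round(stdev(shoe_sizes), 2))       # 1.32

# Hypothetical data sets: same mean, different spread
tight = [10.5, 11, 11.5]
wide = [8, 11, 14]
print(sample_std(tight), sample_std(wide))  # 0.5 3.0
```

Taking the square root returns the result to the original units (shoe sizes), which is why s is easier to interpret than s².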
Why n−1?
• Dividing by n−1 is only necessary for the calculation of the standard
deviation of a sample.
• When you are calculating the standard deviation of a population, you
divide by N, the number of data points in your population.
• When you have a sample, you are not getting data for the entire
population, and there is bound to be random variation due to
sampling (remember that this is called sampling error).
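• So an estimate based on a sample might be off by a little. A sample tends to
underestimate the spread of the population, so dividing by n − 1 instead of n
makes the sample variance s² slightly larger, which corrects for this on average.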
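To see why n − 1 helps, the sketch below treats the three shoe sizes as a tiny population and lists every ordered sample of size 2 drawn with replacement (an assumption: each such sample is equally likely). It averages the variance computed both ways over all samples. Exact fractions avoid rounding. The helper `var` and its parameter `ddof` (the amount subtracted from n in the divisor) are hypothetical names.

```python
from fractions import Fraction
from itertools import product

def var(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - ddof)

# The shoe sizes 9.5, 11.5, 12 as exact fractions, treated as a population
population = [Fraction(19, 2), Fraction(23, 2), Fraction(12)]
sigma2 = var(population, ddof=0)  # population variance: divide by N

# Every ordered sample of size 2, drawn with replacement
samples = list(product(population, repeat=2))
avg_div_n = sum(var(sample, ddof=0) for sample in samples) / len(samples)
avg_div_n_minus_1 = sum(var(sample, ddof=1) for sample in samples) / len(samples)

print(sigma2)             # 7/6
print(avg_div_n)          # 7/12, too small on average
print(avg_div_n_minus_1)  # 7/6, matches the population variance
```

Dividing by n systematically underestimates the population variance, while dividing by n − 1 matches it on average.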
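Formulas
• Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
• Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$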
Chebyshev’s Theorem
• This theorem gives us information about how many elements of a
data set are within a certain number of standard deviations of the
mean.
• The formal statement for Chebyshev’s Theorem is as follows: for any
k > 1, the proportion of data points that lie within k standard deviations
of the mean is at least $1 - \dfrac{1}{k^2}$.
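A minimal Python sketch of the Chebyshev bound, plus a check against the actual proportion in the test-score data from the Range section. The function names are hypothetical, and the check uses the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev

def chebyshev_min_proportion(k):
    """Chebyshev lower bound on the proportion of values within k standard
    deviations of the mean (informative only for k > 1)."""
    if k <= 1:
        return 0.0  # 1 - 1/k**2 is zero or negative here: no information
    return 1 - 1 / k**2

def proportion_within(values, k):
    """Actual proportion of values within k standard deviations of the mean."""
    m, sd = mean(values), pstdev(values)
    return sum(abs(x - m) <= k * sd for x in values) / len(values)

scores = [75, 80, 90, 94, 96]
for k in (2, 3):
    print(k, round(chebyshev_min_proportion(k), 4), proportion_within(scores, k))
# 2 0.75 1.0
# 3 0.8889 1.0
```

The theorem only guarantees a minimum, so a real data set can (and often does) have a larger proportion within k standard deviations.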
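Example: Suppose a data set has a mean of 60 and a standard deviation of 15.
• 60 − 1 standard deviation = 45; 60 + 1 standard deviation = 75
(Chebyshev’s Theorem gives no information for k = 1.)
• 60 − 2 standard deviations = 30; 60 + 2 standard deviations = 90
(At least 1 − 1/4 = 75% of the data lie between 30 and 90.)
• 60 − 3 standard deviations = 15; 60 + 3 standard deviations = 105
(At least 1 − 1/9 ≈ 88.9% of the data lie between 15 and 105.)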
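The intervals in this example can be reproduced with a short Python sketch. The helper `chebyshev_intervals` is a hypothetical name. It returns, for each k, the interval mean ± k standard deviations and the minimum proportion guaranteed by Chebyshev's Theorem (clamped at 0 for k = 1).

```python
def chebyshev_intervals(center, sd, ks=(1, 2, 3)):
    """For each k, return (k, low, high, minimum proportion within)."""
    rows = []
    for k in ks:
        low, high = center - k * sd, center + k * sd
        at_least = max(0.0, 1 - 1 / k**2)
        rows.append((k, low, high, at_least))
    return rows

for k, low, high, at_least in chebyshev_intervals(60, 15):
    print(f"k={k}: {low} to {high}, at least {at_least:.1%} of the data")
# k=1: 45 to 75, at least 0.0% of the data
# k=2: 30 to 90, at least 75.0% of the data
# k=3: 15 to 105, at least 88.9% of the data
```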
Review Question