Unit 1: Exploring Data
Lesson 3: Measures of Variation
Variation
● The three most commonly used measures of variation are:
○ Range
○ Interquartile range
○ Standard deviation
Range
● The range of data is the difference between the largest and the smallest
measurement in a data set
● R = range = largest measurement - smallest measurement
● Range is the simplest measure of spread
● Range is not a reliable measure because it depends only on the two extreme
measurements, so a single outlier can change it drastically
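As a minimal sketch of the definition above, the range can be computed directly from the largest and smallest values. The function name `data_range` and the scores are hypothetical, chosen only to show how one extreme value changes the result.

```python
def data_range(values):
    """Return the largest measurement minus the smallest measurement."""
    return max(values) - min(values)

scores = [4, 7, 8, 8, 9]        # hypothetical data
with_outlier = scores + [40]    # same data plus one extreme value

print(data_range(scores))        # 5
print(data_range(with_outlier))  # 36
```

Adding a single extreme measurement moves the range from 5 to 36, even though the other five values did not change.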
Interquartile range
● The interquartile range (IQR) is the range of the middle 50% of the data:
the difference between the third quartile (Q3) and the first quartile (Q1)
● IQR = Q3 - Q1
● IQR is resistant to outliers
● If you choose to measure the center by the median, you should use IQR to
measure spread
● We will take a deeper dive into IQR when we cover positions
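The sketch below illustrates how the IQR can be computed and why it holds steady when an extreme value appears. It assumes Python's `statistics.quantiles` with the "inclusive" method. Textbooks and calculators use slightly different quartile conventions, so other methods may give different quartiles for the same data. The data values are hypothetical.

```python
import statistics

def iqr(values):
    """Return Q3 - Q1 using the 'inclusive' quartile method (one of several conventions)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [2, 4, 4, 5, 6, 7, 8, 9, 10]           # hypothetical data
with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 100]  # largest value replaced by an outlier

# Print the IQR next to the range for each data set
print(iqr(data), max(data) - min(data))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

In this example the IQR is the same for both lists, while the range grows from 8 to 98.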
MC1
If quartiles Q1 = 20 and Q3 = 30, which of the following must be true?
I. The median is 25.
II. The mean is between 20 and 30.
III. The standard deviation is at most 10.
A. I only
B. II only
C. III only
D. All are true.
E. None are true.
Standard deviation
● Standard deviation is a more useful measure of variation than range
● Standard deviation describes how far the values of a data set are spread out
around the mean
● Standard deviation takes into account all measurements, not just the extreme
measurements
● Like range, standard deviation is affected by outliers
● When there are outliers, use IQR
● The square of the standard deviation is called the variance
Standard deviation, continued
● A lowercase Greek letter sigma σ is used to denote a population standard
deviation
● The square of the population standard deviation is called the population
variance
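For reference, the standard textbook formulas for the population standard deviation and variance are written out below. The symbols x_i (the i-th measurement) and μ (the population mean) are notation assumed here. N is the population size.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}},
\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
```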
Standard deviation, continued
● The letter s is used to denote a sample standard deviation
● The square of the sample standard deviation is called the sample variance
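For reference, the standard textbook formulas for the sample standard deviation and variance are written out below. The symbols x_i (the i-th measurement) and x̄ (the sample mean) are notation assumed here. n is the sample size.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}},
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```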
Why do we divide by n - 1?
N = population size
n = sample size
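One common explanation is that dividing by n - 1 makes the sample variance an unbiased estimator of the population variance, while dividing by n tends to underestimate it. The sketch below checks this on a toy population by listing every possible sample of size 2 drawn with replacement. The population values and the helper name `variance` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by the given divisor."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values) / divisor

population = [1, 2, 3]  # toy population, N = 3
pop_var = variance(population, len(population))

# Every ordered sample of size n = 2, drawn with replacement
samples = list(product(population, repeat=2))

avg_div_n_minus_1 = sum(variance(s, len(s) - 1) for s in samples) / len(samples)
avg_div_n = sum(variance(s, len(s)) for s in samples) / len(samples)

print(pop_var)            # 2/3
print(avg_div_n_minus_1)  # 2/3
print(avg_div_n)          # 1/3
```

Averaged over all samples, the n - 1 version matches the population variance, while the n version comes out too small.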
Standard deviation, continued
● Note that standard deviation is measured in the same units as the data values
while variance is measured in squared units of the data values
○ For example, if the standard deviation of household income is measured in dollars, the variance is measured in dollars^2
● Standard deviation (SD) can be used as a unit for measuring the distance
between any measurement and the mean of the data set
○ We will cover more on this idea later
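As a small preview of that idea, the sketch below expresses the distance between one measurement and the mean as a number of standard deviations. The data and the helper name `distance_in_sd` are hypothetical, and the population standard deviation is used purely for illustration.

```python
import statistics

def distance_in_sd(value, data):
    """Return how many standard deviations `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD (divides by the number of values)
    return (value - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2

print(distance_in_sd(9, data))   # 2.0
print(distance_in_sd(4, data))   # -0.5
```

A positive result means the measurement is above the mean and a negative result means it is below.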
Standard deviation, continued
Let’s check your understanding
● What does a standard deviation of 0 mean?
● Why is SD >= 0? Can the variance be negative?
● What does a larger standard deviation mean?
Example
Can you calculate the standard deviation for each student?
Student A:   Student B:
8            7
9            8
4            6
8            7
6            6
8            6
7            7
9            5
10           5
5            7
Example Answer:
Student A: Student B:
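The calculation for the two students' scores can be checked with a short script. This is a sketch using Python's `statistics` module. Because it is not stated whether the ten scores should be treated as a sample or as a whole population, both versions are computed.

```python
import statistics

student_a = [8, 9, 4, 8, 6, 8, 7, 9, 10, 5]
student_b = [7, 8, 6, 7, 6, 6, 7, 5, 5, 7]

for name, scores in (("Student A", student_a), ("Student B", student_b)):
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)       # sample SD (divides by n - 1)
    sigma = statistics.pstdev(scores)  # population SD (divides by n)
    print(f"{name}: mean = {mean}, s = {s:.3f}, sigma = {sigma:.3f}")
```

Student A has a mean of 7.4 with s of about 1.897 (σ = 1.8), and Student B has a mean of 6.4 with s of about 0.966 (σ of about 0.917). Student B's scores are more consistent.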
MC2
Suppose the average score on a national test is 500 with a standard deviation of 100. If each score is
increased by 25%, what are the new mean and standard deviation?
A. 500, 100
B. 525, 100
C. 625, 100
D. 625, 105
E. 625, 125
MC3
Below is a boxplot of yearly tuition and fees of all four-year colleges and universities in a Western state. The low outlier is from a private university that gives full
scholarships to all accepted students, while the high outlier is from a private college catering to the very rich.
Removing both outliers will effect what changes, if any, on the mean and median costs for this state's four-year institutions of higher learning?
A. Both the mean and the median will be unchanged.
B. The median will be unchanged, but the mean will increase.
C. The median will be unchanged, but the mean will decrease.
D. The mean will be unchanged, but the median will increase.
E. Both the mean and median will change.
Position
● Measures of position are used to describe the position of a value with respect
to the rest of the values in the data set
● Commonly used measures of position are:
○ Quartiles
○ Percentiles
○ Standardized scores (z-score)