Measures of Variability
Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range is a simple measure that tells you the spread of values in a data set. It has a simple definition:
• Found by subtracting the smallest value from the largest value in a data set
Range = maximum value – minimum value
So if you have a set of data such as 4, 2, 5, 8, 12, 15, the range is the highest number (15) minus
the lowest number (2). In this case:
Range = 15-2 = 13
• Illustration: Consider the data on home sales in a Cincinnati, Ohio, suburb
Home Sale    Selling Price ($)
 1           138,000
 2           254,000
 3           186,000
 4           257,500
 5           108,000
 6           254,000
 7           138,000
 8           298,000
 9           199,500
10           208,000
11           142,000
12           456,250
• Largest home sales price: $456,250
• Smallest home sales price: $108,000
• Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
• Drawback: Range is based on only two of the observations and thus is highly influenced by
extreme values
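The range calculation and its drawback can be checked with a short Python sketch. The function name `data_range` is a hypothetical helper chosen for illustration; the prices are the 12 selling prices listed above.

```python
# Selling prices ($) of the 12 home sales listed above
prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]


def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


full_range = data_range(prices)                    # 456250 - 108000
# Drop the single largest sale to see how much one extreme value matters
trimmed_range = data_range(sorted(prices)[:-1])    # 298000 - 108000

print(full_range)     # 348250
print(trimmed_range)  # 190000
```

Removing one observation out of twelve cuts the range from $348,250 to $190,000, which is the sensitivity to extreme values described above.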
Variance
• Measure of variability that utilizes all the data
• It is based on the deviation about the mean, which is the difference between the value of each
observation (xi) and the mean
• The deviations about the mean are squared while computing the variance
• Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
• Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Table 2.12: Computation of Deviations and Squared Deviations about the Mean for the Class Size Data
Computation of Sample Variance:
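As a sketch of this computation, the Python below works through the deviations, squared deviations, and sample variance for the five class sizes used in this section's illustrations (46, 54, 42, 46, 32). The variable names are illustrative choices, and the population variance is included only for comparison, on the assumption that the five classes were the whole population.

```python
class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n                       # 220 / 5 = 44.0

# Deviation about the mean for each observation: x_i - mean
deviations = [x - mean for x in class_sizes]      # 2, 10, -2, 2, -12
squared_deviations = [d ** 2 for d in deviations] # 4, 100, 4, 4, 144

sum_of_squares = sum(squared_deviations)          # 256.0

# Sample variance divides by n - 1
sample_variance = sum_of_squares / (n - 1)        # 256 / 4 = 64.0

# Population variance divides by N (shown only for comparison)
population_variance = sum_of_squares / n          # 256 / 5 = 51.2

print(sample_variance, population_variance)
```

The deviations always sum to zero, which is why they are squared before being averaged.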
Standard Deviation
• Positive square root of the variance
• Measured in the same units as the original data
• For a sample, $s = \sqrt{s^2}$
• For a population, $\sigma = \sqrt{\sigma^2}$
Coefficient of Variation
• $\left(\frac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\%$
• Measures the standard deviation relative to the mean
• Expressed as a percentage
Illustration:
• Consider the class size data:
46 54 42 46 32
• Mean, 𝑥̅ = 44
• Standard deviation, s = 8
• Coefficient of variation = $\left(\frac{8}{44} \times 100\right)\% = 18.2\%$
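The illustration can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `statistics.stdev` computes the sample standard deviation (the n - 1 divisor), and the variable names are illustrative.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)   # 44
s = statistics.stdev(class_sizes)     # sample standard deviation, 8.0

# Coefficient of variation: standard deviation relative to the mean, in percent
cv = s / mean * 100

print(f"mean = {mean}, s = {s}, CV = {cv:.1f}%")
```

Because the coefficient of variation is a ratio, it has no units, so it can be used to compare variability between data sets measured on different scales.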
Analyzing Distributions
Percentiles
Quartiles
Z-Scores
Empirical Rule
Identifying Outliers
Box Plots
Percentiles
• Value of a variable at which a specified (approximate) percentage of observations are below
that value
• The pth percentile tells us the point in the data where:
• Approximately p percent of the observations have values less than the pth percentile
• Approximately (100 – p) percent of the observations have values greater than the pth
percentile
• Steps to calculate the pth percentile:
• Arrange the data in ascending order (smallest to largest value)
• Compute k = (n + 1) × p, with p written as a decimal (for example, 0.85 for the 85th percentile)
• Divide k into its integer component, i, and its decimal component, d
• If d = 0, the value in position k of the sorted data is the pth percentile
• If d > 0, the percentile is between the values in positions i and i + 1 in the sorted
data; to find this percentile, we must interpolate between these two values:
• Calculate the difference between the values in positions i and i + 1 in
the sorted data set; we define this difference between the two values as
m
• Multiply this difference by d: t = m × d
• To find the pth percentile, add t to the value in position i of the sorted
data
• Illustration
• To determine the 85th percentile for the home sales data in Table 2.9.
1. Arrange the data in ascending order
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
3. Dividing 11.05 into the integer and decimal components gives us i = 11 and d = 0.05
4. Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted data
• The value in the 11th position is 298,000
• The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7,912.5
85th percentile = 298,000 + 7,912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
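The percentile steps can be sketched in Python as follows. The function name `percentile` is a hypothetical helper, p is passed as a decimal, and returning the smallest or largest value when k falls outside positions 1 to n is an assumption added here, since the steps above do not cover that case.

```python
def percentile(values, p):
    """Percentile of a non-empty data set using k = (n + 1) * p, p as a decimal."""
    data = sorted(values)          # ascending order
    n = len(data)
    k = (n + 1) * p
    i = int(k)                     # integer component
    d = k - i                      # decimal component

    # Assumption (not in the steps above): clamp when k is outside 1..n
    if i < 1:
        return data[0]
    if i >= n:
        return data[-1]

    lower = data[i - 1]            # value in position i (positions start at 1)
    upper = data[i]                # value in position i + 1
    m = upper - lower              # difference between the two values
    t = m * d                      # share of the difference to add
    return lower + t               # equals the value in position i when d = 0


prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

p85 = percentile(prices, 0.85)
print(round(p85, 2))               # about 305912.5
```

Floating-point arithmetic can leave k a hair away from its exact value (11.05 here), so the result is rounded before printing.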
Quartiles
• When the data is divided into four equal parts:
• Each part contains approximately 25% of the observations
• Division points are referred to as quartiles
• 𝑄1 = first quartile, or 25th percentile
• 𝑄2 = second quartile, or 50th percentile (also the median)
• 𝑄3 = third quartile, or 75th percentile
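As a sketch, the quartiles of the home sales data can be computed with `statistics.quantiles` from Python's standard library. Its `method="exclusive"` option places the cut points using n + 1 positions, which corresponds to the k = (n + 1) × p percentile procedure; using that library call here is a choice made for this example, not something the slides prescribe.

```python
import statistics

prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(prices, n=4, method="exclusive")

print(q1)  # 25th percentile
print(q2)  # 50th percentile, the median
print(q3)  # 75th percentile
```

For these data the cut points come out at 139,000, 203,750, and 256,625, and the second one agrees with `statistics.median(prices)`.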
z-score
• Measures the relative location of a value in the data set
• Helps to determine how far a particular value is from the mean relative to the data set’s
standard deviation
• Standardized value
• If $x_1, x_2, \ldots, x_n$ is a sample of n observations:
$z_i = \frac{x_i - \bar{x}}{s}$
• $z_i$ = z-score for $x_i$
• $\bar{x}$ = sample mean
• $s$ = sample standard deviation
• For class size data, 𝑥̅ = 44 and s = 8
• For observations with a value > mean, z-score > 0
• For observations with a value < mean, z-score < 0
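A short Python sketch of the z-score calculation for the class size data (46, 54, 42, 46, 32) follows. The variable names are illustrative, and `statistics.stdev` is used because it returns the sample standard deviation.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

x_bar = statistics.mean(class_sizes)   # 44
s = statistics.stdev(class_sizes)      # 8.0

# z-score: distance from the mean measured in standard deviations
z_scores = [(x - x_bar) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    print(f"x = {x}, z = {z:+.2f}")
```

The class of 54 lies 1.25 standard deviations above the mean and the class of 32 lies 1.5 standard deviations below it, matching the sign rule in the last two bullets.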
Empirical Rule
• For data having a bell-shaped distribution:
• Within 1 standard deviation—approximately 68% of the data values
• Within 2 standard deviations—approximately 95% of the data values
• Within 3 standard deviations—almost all the data values
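These percentages can be checked against the normal distribution, the idealized bell-shaped curve. The sketch below uses `statistics.NormalDist` from Python's standard library; `share_within` is a hypothetical helper name, and real data that is only roughly bell-shaped will match these figures only approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.1f}%")
```

The three shares are roughly 68.3%, 95.4%, and 99.7%.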
Identifying Outliers
• Outliers: Extreme values in a data set
• They can be identified using standardized values (z-scores)
• Any data value with a z-score less than –3 or greater than +3 is an outlier
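A minimal Python sketch of this rule, applied to the home sales data, is shown below. The function name `z_score_outliers` and its `threshold` parameter are hypothetical choices for illustration.

```python
import statistics

prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)       # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]


outliers = z_score_outliers(prices)
print(outliers)                        # [] : no sale is more than 3 standard deviations from the mean
```

Even the largest sale, $456,250, has a z-score of about 2.49, so this rule flags no outliers in the home sales data.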
Box Plots
• Graphical summary of the distribution of data
• Developed from the quartiles for a data set
Figure 2.22: Box Plot for the Home Sales Data
Figure 2.23: Box Plots Comparing Home Sale Prices in Different Communities