Chapter 3
Describing Data: Numerical
Measures
Nguyen Thi Lien
Faculty of Mathematical Economics, NEU
Email: [email protected]
04/11/2021 STATISTICS FOR BUSINESS AND ECONOMICS
What are we going to learn?
Compute and interpret the mean, median, and mode for a
set of data
Find the range, variance, standard deviation, and
coefficient of variation and know what these values mean
Apply the empirical rule to describe the variation of
population values around the mean
After this chapter, you should be familiar with:
Measures of central tendency, variation, and shape
◦ Mean, median, mode, geometric mean, Quartiles
◦ Range, interquartile range, variance and standard deviation, coefficient
of variation
◦ Symmetric and skewed distributions
Population summary measures
◦ Mean, variance, and standard deviation
◦ The empirical rule and Bienaymé-Chebyshev rule
Five number summary and box-and-whisker plots
Covariance and coefficient of correlation
Pitfalls in numerical descriptive measures and ethical considerations
Describing Data Numerically
◦ Central Tendency: Arithmetic Mean, Median, Mode
◦ Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
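Mean
Arithmetic Mean
◦ Raw data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
◦ Weighted data: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
◦ Grouped data: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$, where $f_i$ is the frequency and $x_i$ the value (or class midpoint) of group i
Geometric Mean
◦ $\bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$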
Exercise 3.1: Calculate the Arithmetic Mean
Course                  Grade Points
Algebra                 3.63
Introduction to Logic   4.20
Microeconomics          3.46
Statistics              4.00
Total                   15.29
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
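As a minimal sketch, the arithmetic mean of the four grade points can be checked in Python. The dictionary and function names are illustrative and not part of the original slides; the result should be about 3.82.

```python
# Grade points from the Exercise 3.1 table.
grade_points = {
    "Algebra": 3.63,
    "Introduction to Logic": 4.20,
    "Microeconomics": 3.46,
    "Statistics": 4.00,
}


def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    values = list(values)
    return sum(values) / len(values)


total = sum(grade_points.values())
mean = arithmetic_mean(grade_points.values())
print(f"Total = {total:.2f}, mean = {mean:.4f}")
```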
Exercise 3.2: Calculate the Weighted Mean
If we also know the number of credits for each course:
Course                  Number of Credits (weight w_i)   Grade Points (value x_i)
Algebra                 3                                3.63
Introduction to Logic   2                                4.20
Microeconomics          3                                3.46
Statistics              3                                4.00
$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
Exercise 3.2: Calculate the Weighted Mean
Course                  Number of Credits   Grade Points   Grade Points × Credits
Algebra                 3                   3.63           10.89
Introduction to Logic   2                   4.20           8.40
Microeconomics          3                   3.46           10.38
Statistics              3                   4.00           12.00
Total                   11                                 41.67
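The totals in the table give the weighted mean as 41.67 / 11. A short Python sketch of the same calculation follows, with credits as weights and grade points as values; the variable and function names are illustrative.

```python
# (course, credits, grade points) from the Exercise 3.2 table.
courses = [
    ("Algebra", 3, 3.63),
    ("Introduction to Logic", 2, 4.20),
    ("Microeconomics", 3, 3.46),
    ("Statistics", 3, 4.00),
]


def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    values, weights = list(values), list(weights)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


credits = [c for _, c, _ in courses]
grades = [g for _, _, g in courses]
gpa = weighted_mean(grades, credits)
print(f"Weighted mean = {gpa:.3f}")
```

Because Introduction to Logic carries only 2 credits, its high grade counts for less, and the weighted mean (about 3.79) is slightly below the unweighted mean.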
Exercise 3.3: Calculate the Weighted Mean
A sample of 33 students were asked to rate themselves on whether they
were outgoing or not using this five point scale: 1 = extremely
extroverted, 2 = extroverted, 3 = neither extroverted nor introverted, 4
= introverted, or 5 = extremely introverted. The results are shown in
the table below:
Rating 1 2 3 4 5
Frequency 1 7 20 5 0
Calculate the sample mean.
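One way to approach this exercise is to treat each frequency as the weight of its rating. The following is a sketch only; the variable names are illustrative.

```python
# Ratings and their frequencies from the Exercise 3.3 table.
ratings = [1, 2, 3, 4, 5]
frequencies = [1, 7, 20, 5, 0]

n = sum(frequencies)  # total number of students
weighted_sum = sum(f * x for f, x in zip(frequencies, ratings))
sample_mean = weighted_sum / n
print(f"n = {n}, sum of f*x = {weighted_sum}, mean = {sample_mean:.2f}")
```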
Exercise 3.4: Calculate the Geometric Mean
Year Rate of GDP change (%)
2012 105.25
2013 105.42
2014 105.98
2015 106.68
2016 106.21
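A possible sketch of the calculation, assuming each figure is a growth index relative to the previous year (so 105.25 means GDP was 1.0525 times the previous year's level). That reading of the table, and all names below, are assumptions rather than something the slide states.

```python
import math

# Annual GDP index (previous year = 100), as read from the Exercise 3.4 table.
gdp_index = {2012: 105.25, 2013: 105.42, 2014: 105.98, 2015: 106.68, 2016: 106.21}


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    values = list(values)
    return math.prod(values) ** (1 / len(values))


growth_factors = [v / 100 for v in gdp_index.values()]
average_index = geometric_mean(growth_factors) * 100
print(f"Average annual index = {average_index:.2f}%")
```

The geometric mean is used here because growth factors compound multiplicatively from year to year.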
Mean
Compare the means of the following data sets:
◦ Data 1: {10, 10, 11, 12, 12}
◦ Data 2: {10, 10, 11, 12, 120}
The mean is easily affected by extreme values (outliers), which can lead to biased comparisons. In such cases, use another measure.
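A small sketch makes the effect concrete using Python's standard `statistics` module. The median is shown alongside the mean as one example of a measure that is less sensitive to the single extreme value.

```python
from statistics import mean, median

data_1 = [10, 10, 11, 12, 12]
data_2 = [10, 10, 11, 12, 120]  # same as data_1 except for one extreme value

mean_1, mean_2 = mean(data_1), mean(data_2)
median_1, median_2 = median(data_1), median(data_2)

print(f"Data 1: mean = {mean_1}, median = {median_1}")
print(f"Data 2: mean = {mean_2}, median = {median_2}")
```

Replacing 12 with 120 roughly triples the mean, while the middle value of the ordered data does not change.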
Median
Put all the observations in numerical order.
Data without class intervals:
◦ An odd number of observations ($\sum f_i = 2m + 1$): Median = $x_{m+1}$
◦ An even number of observations ($\sum f_i = 2m$): Median = $\frac{x_m + x_{m+1}}{2}$
Data with class intervals:
◦ Find the class containing the median
◦ Calculate the median: Median = $x_{\text{Median(min)}} + h_{\text{Median}} \times \frac{\sum f_i / 2 - S_{\text{Median}-1}}{f_{\text{Median}}}$
where
$x_{\text{Median(min)}}$ : Lower boundary of the class containing the median
$h_{\text{Median}}$ : Width of the class containing the median
$\sum f_i$ : Number of observations
$S_{\text{Median}-1}$ : Cumulative frequency of the class preceding the median class
$f_{\text{Median}}$ : Frequency of the class containing the median
Exercise 3.5: Calculate the Median
• Data: {5, 6, 9, 5, 6}
=> Ordered data: {5, 5, 6, 6, 9}: Median = 6
• Data: {5, 7, 9, 8, 6, 11}
=> Ordered data: {5, 6, 7, 8, 9, 11}: Median = (7 + 8)/2 = 7.5
Exercise 3.6: Calculate the Median
Number of tracks on CD   Number of CDs   Cumulative Frequency
8                        1               1
9                        4               5
10                       1               6
11                       3               9
13                       2               11
Total                    11
Median = $x_{m+1}$
Exercise 3.7: Calculate the Median
Number of tracks on CD   Number of CDs   Cumulative Frequency
8                        1               1
9                        4               5
10                       1               6
11                       3               9
12                       1               10
13                       2               12
Total                    12
Median = $\frac{x_m + x_{m+1}}{2}$
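For frequency tables such as those in Exercises 3.6 and 3.7, one simple approach is to expand the table back into individual observations and take the median of the ordered list. This is a sketch; the helper name `expand` is illustrative.

```python
from statistics import median


def expand(freq_table):
    """Repeat each value as many times as its frequency."""
    return [value for value, count in freq_table for _ in range(count)]


# (number of tracks, number of CDs)
cds_3_6 = [(8, 1), (9, 4), (10, 1), (11, 3), (13, 2)]           # 11 CDs (odd)
cds_3_7 = [(8, 1), (9, 4), (10, 1), (11, 3), (12, 1), (13, 2)]  # 12 CDs (even)

median_3_6 = median(expand(cds_3_6))  # 6th ordered value
median_3_7 = median(expand(cds_3_7))  # average of 6th and 7th ordered values
print(median_3_6, median_3_7)
```

The cumulative frequency column gives the same answer without expanding the table: find the first row whose cumulative frequency reaches the required position.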
Exercise 3.8: Calculate the Median
Age Number of users Cumulative Frequency
10 – 20 3 3
20 – 30 7 10
30 – 40 18 28
40 – 50 20 48
50 – 60 12 60
Total 60
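A worked sketch of this exercise, using the class-interval median formula. With 60 observations, half the total is 30, which first falls inside the 40 – 50 class (cumulative frequency 28 before it, 48 at its end), so that class contains the median:

```latex
\text{Median}
  = x_{\text{Median(min)}} + h_{\text{Median}} \times
    \frac{\sum f_i / 2 - S_{\text{Median}-1}}{f_{\text{Median}}}
  = 40 + 10 \times \frac{30 - 28}{20}
  = 41
```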
Median
Compare the mean and median of the following data sets:
◦ Data 1: {10, 10, 11, 12, 12}
◦ Data 2: {2, 3, 4, 6, 40}
The median is not affected by outliers
It depends only on the position of the values in the ordered data
It applies to quantitative variables only
Mode
• Can be applied to both quantitative and qualitative variables
• The mode is the value that occurs most often, i.e. the one with the largest frequency
• Finding the mode:
Qualitative data: the mode is the category with the largest frequency
Quantitative data: the mode is the value with the largest frequency
Exercise 3.9: Calculate the Mode
• Qualitative Data
Data: { Yellow, Yellow, Red, Blue, Green}
Mode is “Yellow”
• Quantitative Data
Data 1: { 5, 6, 6, 7, 7, 7, 9 } -> Mode = 7
Data 2: { 5, 6, 7, 8, 9 } -> No mode
Data 3: { 5, 6, 9, 5, 6 } -> Mode = 5 and 6
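The answers above can be reproduced with a short sketch. The function name is illustrative, and the rule "no value repeats means no mode" is an assumption chosen to match the answer given for Data 2.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means 'no mode'.

    Assumption: when no value occurs more than once, the data set is
    treated as having no mode, matching the answer for Data 2.
    """
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes(["Yellow", "Yellow", "Red", "Blue", "Green"]))
print(modes([5, 6, 6, 7, 7, 7, 9]))
print(modes([5, 6, 7, 8, 9]))
print(modes([5, 6, 9, 5, 6]))
```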
Exercise 3.10: Calculate the Mode
Age Number of users
10 – 20 3
20 – 30 7
30 – 40 18
40 – 50 20
50 – 60 12
Total 60
Quartile
The data are divided into 4 equal parts by 3 cut-off points: the 3 quartiles Q1, Q2, and Q3
The 2nd quartile (Q2) is the median
25% | 25% | 25% | 25%
    Q1    Q2    Q3
Quartile
To find a quartile, first find its position in the ordered data:
◦ First quartile position: Q1 = 0.25(n + 1)
◦ Second quartile position: Q2 = 0.50(n + 1)
◦ Third quartile position: Q3 = 0.75(n + 1)
where n is the number of observed values
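The position rule can be sketched in Python as follows. The slide does not say what to do when the position is not a whole number; linear interpolation between the two neighbouring ordered values is assumed here, and the function name is illustrative. The data are taken from Exercise 3.5.

```python
def quartile(data, p):
    """Value at position p * (n + 1) in the ordered data (1-based).

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1)
    lower = int(position)        # 1-based index of the value just below
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [5, 7, 9, 8, 6, 11]
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)
```

With n = 6 the positions are 1.75, 3.5 and 5.25, so each quartile lies between two ordered values.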
Quartile
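For data grouped into classes:
◦ Lower Quartile (Q1): $Q_1 = LCB + h \times \frac{n/4 - S}{f}$
◦ Median (Q2): $Q_2 = LCB + h \times \frac{n/2 - S}{f}$
◦ Upper Quartile (Q3): $Q_3 = LCB + h \times \frac{3n/4 - S}{f}$
where
LCB = lower boundary of the class containing the item,
n = number of observations,
S = cumulative frequency of the classes before the class containing the item,
f = frequency of the class containing the item,
h = width of the class containing the item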
Exercise 3.11: Calculate the Quartile
Age Frequency Cumulative Frequency
10 – 20 3 3
20 – 30 7 10
30 – 40 18 28
40 – 50 20 48
50 – 60 12 60
Total 60
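A sketch of the grouped-data quartile calculation for this table. It interpolates within the class that contains the n/4, n/2 or 3n/4 position; the function name and the tuple layout are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each age class.
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 18), (40, 50, 20), (50, 60, 12)]


def grouped_quantile(classes, p):
    """Interpolate the value below which a proportion p of the data falls."""
    n = sum(f for _, _, f in classes)
    target = p * n      # n/4, n/2 or 3n/4 for the three quartiles
    cumulative = 0      # S: cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must be between 0 and 1")


q1, q2, q3 = (grouped_quantile(classes, p) for p in (0.25, 0.50, 0.75))
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```

Q1 falls in the 30 – 40 class, while Q2 and Q3 both fall in the 40 – 50 class.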
Mean, Mode, Median
◦ Left skewed: Mean < Median < Mode
◦ Symmetric: Mean = Median = Mode
◦ Right skewed: Mode < Median < Mean
Measures of Dispersion
Measures of variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Range
• The difference between the largest and the smallest value
in a data set.
Range = $x_{max} - x_{min}$
• Pros: simple
• Cons: affected by outliers
Exercise 3.12: Calculate the Range
            Firm A   Firm B
Worker 1    400      1480
Worker 2    400      1485
Worker 3    600      1486
Worker 4    600      1488
Worker 5    700      1490
Worker 6    800      1503
Worker 7    900      1505
Worker 8    2000     1520
Worker 9    2600     1521
Worker 10   6000     1522
• Range (A) = 6000 – 400 = 5600
• Range (B) = 1522 – 1480 = 42
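The two ranges can be verified with a few lines of Python; the list and function names are illustrative.

```python
firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]
firm_b = [1480, 1485, 1486, 1488, 1490, 1503, 1505, 1520, 1521, 1522]


def value_range(data):
    """Return the largest value minus the smallest value."""
    return max(data) - min(data)


range_a = value_range(firm_a)
range_b = value_range(firm_b)
print(f"Range (A) = {range_a}, Range (B) = {range_b}")
```

Firm A's range is driven almost entirely by Worker 10, which illustrates why the range is sensitive to outliers.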
Interquartile Range
The Interquartile Range (IQR) is the difference between the third and first quartiles
IQR = Q3 – Q1
The IQR is the width of the interval that contains the middle 50% of the data
Exercise 3.13: Calculate the IQR
Age Frequency Cumulative Frequency
10 – 20 3 3
20 – 30 7 10
30 – 40 18 28
40 – 50 20 48
50 – 60 12 60
Total 60
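A worked sketch for this table, using the grouped-data quartile formulas with n = 60. The position n/4 = 15 falls in the 30 – 40 class (cumulative frequency 10 before it), and 3n/4 = 45 falls in the 40 – 50 class (cumulative frequency 28 before it):

```latex
Q_1 = 30 + 10 \times \frac{15 - 10}{18} \approx 32.78 \\
Q_3 = 40 + 10 \times \frac{45 - 28}{20} = 48.5 \\
IQR = Q_3 - Q_1 \approx 48.5 - 32.78 = 15.72
```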
Variance
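Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
µ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x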
Standard Deviation
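Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
µ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x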
Exercise 3.14: Calculate the Variance
Compare the variability of coffee sales in two branches of Starbucks:
Branch A: 20, 40, 50, 60, 80
Branch B: 20, 49, 50, 51, 80
[Figure: two bar charts, "Coffee sales in A" and "Coffee sales in B", showing the five observations of each branch on a vertical axis from 0 to 90]
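A sketch of the comparison using the sample variance (divisor n − 1), computed with Python's standard `statistics` module. Treating the five values per branch as a sample is an assumption; the slide does not say whether they are a sample or a population.

```python
from statistics import mean, stdev, variance

sales_a = [20, 40, 50, 60, 80]
sales_b = [20, 49, 50, 51, 80]

for name, sales in (("A", sales_a), ("B", sales_b)):
    print(
        f"Branch {name}: mean = {mean(sales)}, "
        f"variance = {variance(sales)}, sd = {stdev(sales):.2f}"
    )

var_a, var_b = variance(sales_a), variance(sales_b)
```

Both branches have the same mean and the same range, but Branch A's middle values are more spread out, so its variance is larger.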
Exercise 3.15: Calculate the Variance
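Age       Frequency (f_i)   Midpoint (x_i)   $f_i (x_i - \bar{x})^2$
10 – 20   3                 15               1900.08
20 – 30   7                 25               1610.19
30 – 40   18                35               480.50
40 – 50   20                45               467.22
50 – 60   12                55               2640.33
Total     60                                 7098.33
$\bar{x} = \frac{2410}{60} \approx 40.17$
$s^2 = \frac{7098.33}{60 - 1} = 120.31$
$s = \sqrt{120.31} \approx 10.97$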
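The grouped-data variance can be sketched as follows, using class midpoints as the values and the sample divisor n − 1 (which reproduces the 120.31 in the table). Names and the tuple layout are illustrative.

```python
import math

# (lower boundary, upper boundary, frequency) for each age class.
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 18), (40, 50, 20), (50, 60, 12)]

freqs = [f for _, _, f in classes]
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, midpoints))
grouped_variance = sum_sq / (n - 1)
grouped_sd = math.sqrt(grouped_variance)

print(f"mean = {grouped_mean:.2f}, sum of squares = {sum_sq:.2f}")
print(f"variance = {grouped_variance:.2f}, sd = {grouped_sd:.2f}")
```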
Coefficient of Variation
• Used to compare variability between:
- Different variables
- The same variable when the means are different
• It is the ratio of the standard deviation to the mean, expressed as a percentage:
$CV = \frac{SD}{mean} \times 100\%$
Exercise 3.16: Calculate the CV
An investor is considering the relative risks associated with two
projects:
- The first project has a mean expected profit of £5000 with a
standard deviation of £707.11
- The second project has a mean expected profit of £500 with a
standard deviation of £112.13
Use a measure of dispersion to establish which project has
the lower degree of risk.
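Because the two projects have very different mean profits, a relative measure such as the coefficient of variation is a natural choice here. A minimal sketch, with illustrative names:

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_1 = coefficient_of_variation(707.11, 5000)  # first project
cv_2 = coefficient_of_variation(112.13, 500)   # second project
lower_risk = "first project" if cv_1 < cv_2 else "second project"

print(f"CV 1 = {cv_1:.2f}%, CV 2 = {cv_2:.2f}%, lower relative risk: {lower_risk}")
```

Although the first project has the larger standard deviation in pounds, its variation relative to its mean is smaller.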
Chebyshev's Theorem
For any population with mean μ and standard deviation σ,
and k > 1, the percentage of observations that fall within the
interval [μ − kσ, μ + kσ] is at least $100\left[1 - \frac{1}{k^2}\right]\%$
Examples:
At least                 Within
(1 − 1/1.5²) = 55.6%     k = 1.5 (μ ± 1.5σ)
(1 − 1/2²) = 75%         k = 2 (μ ± 2σ)
(1 − 1/3²) = 89%         k = 3 (μ ± 3σ)
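The three examples can be reproduced with a short sketch of the bound 1 − 1/k²; the function name is illustrative.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} within mu +/- {k} sigma")
```

The bound holds for any distribution shape, which is why it is weaker than the percentages for bell-shaped data.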
The Empirical Rule
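If the data distribution is bell-shaped, then the interval:
◦ μ ± 1σ contains about 68% of the values
◦ μ ± 2σ contains about 95% of the values
◦ μ ± 3σ contains almost all (about 99.7%) of the values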
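The empirical rule's familiar percentages (roughly 68%, 95% and 99.7% within 1, 2 and 3 standard deviations) can be checked with a sketch that assumes "bell-shaped" means exactly normal. It uses `NormalDist` from Python's standard `statistics` module; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {share_within(k):.1%}")
```

Real data are only approximately bell-shaped, so observed proportions will usually differ somewhat from these values.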
Box and Whisker plot
[Figure: two box plots. Boxplot 1 marks the lower limit and upper limit; Boxplot 2 marks Q1 – 1.5IQR and Q3 + 1.5IQR]
◦ Lower limit: the maximum of (min, Q1 – 1.5*IQR)
◦ Upper limit: the minimum of (max, Q3 + 1.5*IQR)
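A sketch of how the two limits might be computed, using the Firm A data from Exercise 3.12. The quartiles use the 0.25(n + 1) and 0.75(n + 1) position rule with linear interpolation for fractional positions; the interpolation step and all names are assumptions.

```python
def quartile(data, p):
    """Value at position p * (n + 1) in the ordered data, interpolating linearly."""
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1)
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]

q1, q3 = quartile(firm_a, 0.25), quartile(firm_a, 0.75)
iqr = q3 - q1

# Limits as defined on the slide: whiskers never extend past the data.
lower_limit = max(min(firm_a), q1 - 1.5 * iqr)
upper_limit = min(max(firm_a), q3 + 1.5 * iqr)
outliers = [x for x in firm_a if x < lower_limit or x > upper_limit]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"limits = [{lower_limit}, {upper_limit}], outliers = {outliers}")
```

Values outside the limits are the ones a box plot would typically draw as individual points.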
Skewness (Sk)
◦ Sk = –1.3: long left tail
◦ Sk = –0.3: short left tail
◦ Sk = 0: symmetric (two equal tails)
◦ Sk = 0.3: short right tail
◦ Sk = 1.3: long right tail
Covariance
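Measures the joint variability of X and Y
Sample covariance: $Cov(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
[Figure: two scatter plots with the mean of X and the mean of Y marked, one showing positive covariance and one showing negative covariance]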
Exercise 3.17: Calculate the Covariance
Calculate the covariance of the following sample data of four
(X, Y) pairs: (1, 5), (2, 10), (4, 7), and (5, 9)
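A sketch of the sample covariance for these four pairs, dividing the sum of cross-products by n − 1. The function name is illustrative.

```python
pairs = [(1, 5), (2, 10), (4, 7), (5, 9)]


def sample_covariance(pairs):
    """Return sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return cross_products / (n - 1)


cov_xy = sample_covariance(pairs)
print(f"Cov(X, Y) = {cov_xy:.4f}")
```

The positive result indicates that X and Y tend to move in the same direction, but its size depends on the units of X and Y.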
Correlation Coefficient
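$r = \frac{Cov(X, Y)}{s_X s_Y}$, with $-1 \le r \le 1$, has no unit and measures the linear relationship between X and Y
◦ r = –1: perfect negative linear relationship
◦ –1 < r < 0: negatively correlated
◦ r = 0: not correlated
◦ 0 < r < 1: positively correlated
◦ r = 1: perfect positive linear relationship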
Correlation Coefficient
Size of correlation Interpretation
0.9-1.0 (-0.9 to -1.00) Very high positive (negative) correlation
0.7-0.9 (-0.7 to -0.9) High positive (negative) correlation
0.5-0.7 (-0.5 to -0.7) Moderate positive (negative) correlation
0.3-0.5 (-0.3 to -0.5) Low positive (negative) correlation
0.0-0.3 (-0.0 to -0.3) Negligible correlation
Correlation
Graph and Correlation Coefficient
[Figure: scatter plots illustrating positive and negative correlation of high, moderate, and negligible strength]
Exercise 3.18: Calculate the correlation coefficient
No         X        Y        X–MeanX   Y–MeanY   (X–MeanX)(Y–MeanY)   (X–MeanX)²   (Y–MeanY)²
1          1        5        -2        -2.75     5.50                 4            7.5625
2          2        10       -1        2.25      -2.25                1            5.0625
3          4        7        1         -0.75     -0.75                1            0.5625
4          5        9        2         1.25      2.50                 4            1.5625
Total      12       31                           5                    10           14.75
Mean       3        7.75
Variance   3.333    4.917
SD         1.8257   2.2174
$r = \frac{5 / (4 - 1)}{1.8257 \times 2.2174} \approx 0.412$
There is a low positive linear relationship between price and quantity supplied
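The table's calculation can be sketched in Python. The divisor n − 1 appears in the covariance and in both standard deviations, so it cancels and the sums of squares can be used directly. Names are illustrative.

```python
import math

x = [1, 2, 4, 5]
y = [5, 10, 7, 9]


def correlation(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(x, y)
print(f"r = {r:.4f}")
```

The value lies in the 0.3 to 0.5 band, which the interpretation table classes as a low positive correlation.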
Summary Statistics in SPSS
Summary Statistics in Excel
Data -> Data Analysis
Summary Statistics in Excel
Net Sales
Mean 77.6005
Standard Error 5.566494
Median 59.705
Mode 31.6
Standard Deviation 55.66494
Sample Variance 3098.585
Kurtosis 3.149955
Skewness 1.714996
Range 274.36
Minimum 13.23
Maximum 287.59
Sum 7760.05
Count 100