DOTE 2011 | Fall 2024
@ CUHK Business School
Statistical Analysis for Business Decisions
Descriptive Statistics and Excel Intro
Yunduan Lin
Assistant Professor
Department of Decisions, Operations and Technology
CUHK Business School
Agenda
01 Relationship Between Data
o Covariance
o Positive / negative correlation
02 Excel for Basic Statistics
o Basic functions
o Tool for calculation
o Generate different kinds of charts
Supplement Logistics
Grading
Main message:
You can secure a reasonable score as long as you put in some effort (e.g., participation, homework, etc.).
Exam
o Open everything except Internet.
o I can arrange one make-up exam if there is a time conflict. However, the time may not suit all of you, because it depends on the availability of the TA and me.
o You can choose to skip one of the exams and move all of its weight to the other. BUT, the 'skip' decision must be made before the midterm and cannot be changed later.
Quiz
o Quizzes are mainly for 'participation', not an 'exam'. I will give points mainly based on whether you submit, rather than on correctness. Some questions may even be open-ended.
Project
o Small project, in groups of 1-3 students. Recall the 'mini-lecture' I mentioned in the survey.
o However, I will expect more from larger groups.
Results for Pre-Course Survey
69 responses
What do you expect to gain from this course?
o Data processing
o Math knowledge
o How to view and understand data in an easier way and how to generate idea from it
o Know about all the statistics terminology
o Quickly transfer the data into a clearer picture
o Knowing how to classify and group the data that is applicable in workplace
o Excel
o The mechanism and application of ANOVA
o Convince people with statistics
o Something actually useful
o learn something better for my career
o A good grade
o No
o Exam skills(?)
Topics to cover for mini-lecture:
o Statistics analysis on gaming
o Statistics base on the financial crisis like what could be the major cause of financial crisis no matter their seriousness
o How to not get fooled by biased statistics
o Python
o Vba
o Use statistics to analyze a stock
o Fun fact about stat in ancient time
o I would like to cover the costs of a firm fulfilling the responsibility to the environment or sustainability
o Investments
o News
o Fashion or music
A Feedback Form for the Entire Term
https://docs.google.com/forms/d/e/1FAIpQLSfsEgnMFLypI_KW6GF7j_FXtVY5E4Jrmf2P_BDwaG8GXWDc0A/viewform?usp=sf_link
Three Numerical Representations
Central tendency: one number to represent the whole dataset
o Mean: middle in terms of distance; easily affected by outliers
o Median: middle in terms of position; not affected by outliers
o Mode: the value with the highest frequency
Dispersion: how data points are spread out
o Range: difference between max and min; easily affected by outliers
o Interquartile Range (IQR): difference between Q3 and Q1; not affected by outliers
o Variance (VAR): average squared distance to the mean
o Standard Deviation (SD): square root of VAR
o Coefficient of Variation (CV): SD rescaled by the mean (SD / mean)
Shape: symmetric? Thin or fat tail?
o Skewness: symmetry; based on cubed deviations; sensitive to extreme values
o Kurtosis: shape of the tail; based on fourth-power deviations; sensitive to extreme values
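As a rough illustration of the central-tendency and dispersion measures listed above, the sketch below computes each one with Python's standard `statistics` module (Python 3.8+ for `quantiles`). It assumes the population versions (divide by N) for variance and SD, and the 'inclusive' quartile method, which we believe matches Excel's QUARTILE.INC; the helper `summarize` and the sample data are hypothetical.

```python
import statistics as st

def summarize(data):
    """Central tendency and dispersion measures (population versions) for a list of numbers."""
    # Quartiles via the 'inclusive' method (linear interpolation over n - 1 gaps)
    q1, _, q3 = st.quantiles(data, n=4, method="inclusive")
    mean = st.mean(data)
    sd = st.pstdev(data)  # population SD: square root of the average squared distance to the mean
    return {
        "mean": mean,
        "median": st.median(data),
        "mode": st.mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": st.pvariance(data),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation: SD rescaled by the mean
    }

# Hypothetical data with one large value (50): compare mean vs median, range vs IQR
for name, value in summarize([1, 2, 2, 3, 4, 50]).items():
    print(f"{name:>8}: {value:.3f}")
```

With the single large value, the mean and range are pulled far away from the bulk of the data, while the median and IQR barely move.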
Revisit - IQR
Quartiles
Quartiles divide the data into four (quart) parts (tiles):
o First quartile (Q1): 25th percentile
o Second quartile (Q2): 50th percentile, i.e., the median
o Third quartile (Q3): 75th percentile
Interquartile range (IQR)
Difference between the third and first quartile: $IQR = Q_3 - Q_1$
Less affected by outliers than the range
Revisit – IQR - Extension
Rule of thumb
Outlier Detection
IQR can be considered the range of the middle 50% of the data, so it is not sensitive to outliers. In other words, we can use IQR to roughly define outliers:
o Mild / Suspected Outliers: any data points larger than Q3+1.5IQR or smaller than Q1-1.5IQR.
o Serious / Extreme Outliers: any data points larger than Q3+3IQR or smaller than Q1-3IQR.
Example: Outlier
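Dataset 1: Data: 1, 2, 3, 4, 5
Q1=2, Q3=4, IQR=2
Q3+1.5IQR=7, Q1-1.5IQR=-1
Q3+3IQR=10, Q1-3IQR=-4
No data point falls outside the fences, so there are no outliers.
Dataset 2: Data: 1, 2, 3, 4, 50
Q1=2, Q3=4, IQR=2
Q3+1.5IQR=7, Q1-1.5IQR=-1
Q3+3IQR=10, Q1-3IQR=-4
50 > Q3+3IQR = 10, so 50 is a serious / extreme outlier.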
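The rule of thumb above can be sketched in Python as follows. This assumes the 'inclusive' quartile method from the standard `statistics` module, which reproduces Q1 = 2 and Q3 = 4 for both datasets in the example; other quartile conventions can give slightly different fences. The helper names are hypothetical.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q1, q3, q3 - q1

def classify_outliers(data):
    """Flag points beyond Q1/Q3 -/+ 3*IQR as 'extreme' and beyond -/+ 1.5*IQR as 'mild'."""
    q1, q3, iqr = outlier_fences(data)
    flagged = []
    for x in data:
        if x > q3 + 3 * iqr or x < q1 - 3 * iqr:
            flagged.append((x, "extreme"))
        elif x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr:
            flagged.append((x, "mild"))
    return flagged

print(outlier_fences([1, 2, 3, 4, 5]))        # (2.0, 4.0, 2.0)
print(classify_outliers([1, 2, 3, 4, 5]))     # []
print(classify_outliers([1, 2, 3, 4, 50]))    # [(50, 'extreme')]
```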
Revisit - Skewness
Skewness
Average cubed distance from the mean, scaled by the cubed SD:
$\text{skew} = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^3}{\sigma^3}$
skew = 0: symmetric; skew > 0: right-skewed; skew < 0: left-skewed
Revisit - Skewness
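Example:
Dataset 1: Data: 1, 2, 3, 4, 5 (mean $\mu = 3$, $\sigma^2 = 10/5 = 2$)
x | x−μ | (x−μ)² | (x−μ)³
1 | -2 | 4 | -8
2 | -1 | 1 | -1
3 | 0 | 0 | 0
4 | 1 | 1 | 1
5 | 2 | 4 | 8
Population skewness: $\frac{(1/5)\cdot 0}{2^{3/2}} = 0$ (symmetric)
Dataset 2: Data: 1, 2, 3, 4, 10 (mean $\mu = 4$, $\sigma^2 = 50/5 = 10$)
x | x−μ | (x−μ)² | (x−μ)³
1 | -3 | 9 | -27
2 | -2 | 4 | -8
3 | -1 | 1 | -1
4 | 0 | 0 | 0
10 | 6 | 36 | 216
Population skewness: $\frac{(1/5)\cdot 180}{10^{3/2}} = \frac{36}{31.62} \approx 1.14$ (right-skewed)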
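A minimal sketch of the population skewness formula, assuming plain Python lists; the helper name `population_skewness` is hypothetical. It divides by N (not n − 1) everywhere, so it should match the hand calculation for both datasets. As far as we know, Excel's SKEW.P computes this population version, while SKEW applies a sample adjustment and will give a different number on small data.

```python
from math import sqrt

def population_skewness(data):
    """Average cubed deviation from the mean, divided by the cubed population SD."""
    n = len(data)
    mean = sum(data) / n
    sd = sqrt(sum((x - mean) ** 2 for x in data) / n)  # population SD (divide by N)
    return sum((x - mean) ** 3 for x in data) / n / sd ** 3

print(round(population_skewness([1, 2, 3, 4, 5]), 2))   # 0.0
print(round(population_skewness([1, 2, 3, 4, 10]), 2))  # 1.14
```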
Revisit - Kurtosis
Kurtosis
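Thickness of the tail: average fourth-power distance from the mean, scaled by the SD to the fourth power:
$\text{kurt} = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^4}{\sigma^4}, \qquad \text{excess kurtosis} = \text{kurt} - 3$
excess kurtosis = 0: mesokurtic, like the normal distribution
excess kurtosis > 0: leptokurtic, fat tail (slender peak)
excess kurtosis < 0: platykurtic, thin tail (broad peak)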
Revisit - Kurtosis
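Example:
Dataset 1: Data: 1, 2, 3, 4, 5 (mean $\mu = 3$, $\sigma^2 = 10/5 = 2$, $\sigma^4 = 4$)
x | x−μ | (x−μ)² | (x−μ)⁴
1 | -2 | 4 | 16
2 | -1 | 1 | 1
3 | 0 | 0 | 0
4 | 1 | 1 | 1
5 | 2 | 4 | 16
Population kurtosis: $\frac{(1/5)\cdot 34}{2^2} = 1.7$
Population excess kurtosis: 1.7 - 3 = -1.3, so the data are platykurtic.
Dataset 2: Data: 1, 2, 3, 4, 15 (mean $\mu = 5$, $\sigma^2 = 130/5 = 26$, $\sigma^4 = 676$)
x | x−μ | (x−μ)² | (x−μ)⁴
1 | -4 | 16 | 256
2 | -3 | 9 | 81
3 | -2 | 4 | 16
4 | -1 | 1 | 1
15 | 10 | 100 | 10000
Population kurtosis: $\frac{(1/5)\cdot 10354}{26^2} = \frac{2070.8}{676} \approx 3.063$
Population excess kurtosis: 3.063 - 3 ≈ 0.063, so the data are leptokurtic.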
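The same pattern extends to kurtosis. This is a hedged sketch under the population convention (divide by N), with a hypothetical helper `population_kurtosis`; run on the two datasets it should give kurtosis 1.7 (excess -1.3) and about 3.063 (excess about 0.063). Note that, to our understanding, Excel's KURT returns a sample-adjusted excess kurtosis, so it will not match these population values on five data points.

```python
def population_kurtosis(data):
    """Return (kurtosis, excess kurtosis) using population moments."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n             # population variance
    kurt = sum((x - mean) ** 4 for x in data) / n / var ** 2  # fourth moment / SD^4
    return kurt, kurt - 3  # excess kurtosis compares against the normal distribution's 3

for data in ([1, 2, 3, 4, 5], [1, 2, 3, 4, 15]):
    kurt, excess = population_kurtosis(data)
    print(data, round(kurt, 3), round(excess, 3))
```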
Relationship Between Paired Observations
Previously, we had only a single measure on each individual: $x_1, x_2, \ldots, x_n$.
Now suppose we have two measures on each individual: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
For example, x can be height and y can be weight.
We then consider the relationship between x and y.
o Covariance
o Correlation coefficient
Covariance - Definition
Covariance
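Measures whether the two sequences tend to move away from their means together (in the same direction) or not.
Population covariance: $\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$
Sample covariance: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$
Cov > 0: positive correlation, the two move together;
Cov < 0: negative correlation, the two move in opposite directions.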
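Both covariance formulas can be sketched with one hypothetical helper, `covariance`, whose only difference between the two versions is the divisor (N for the population, n − 1 for the sample). The data are the x = 1..5, y = 4, 7, 6, 4, 9 pairs from the worked example. We believe Excel's COVARIANCE.P and COVARIANCE.S correspond to the two versions.

```python
def covariance(xs, ys, sample=False):
    """Population covariance (divide by N) or sample covariance (divide by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1) if sample else cross / n

x = [1, 2, 3, 4, 5]
y = [4, 7, 6, 4, 9]
print(covariance(x, y))               # population: 7 / 5 = 1.4
print(covariance(x, y, sample=True))  # sample: 7 / 4 = 1.75
```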
Correlation Coefficient - Definition
Correlation coefficient
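Rescale covariance by the standard deviations:
Population correlation coefficient: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Sample correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
The two coincide: the $1/N$ (or $1/(n-1)$) factors cancel between the numerator and the denominator.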
Correlation coefficient is always between –1 and 1.
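To check the claim that the population and sample versions coincide, the sketch below (hypothetical helper `correlation`) uses the same divisor for the covariance and for both SDs, so switching between N and n − 1 changes nothing. It uses the data from the worked example; Excel's CORREL should give the same single value.

```python
from math import sqrt

def correlation(xs, ys, sample=False):
    """Correlation coefficient; the divisor (N or n - 1) cancels out."""
    n = len(xs)
    d = n - 1 if sample else n  # same divisor for the covariance and both SDs
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / d
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / d)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / d)
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [4, 7, 6, 4, 9]
print(round(correlation(x, y), 4))               # 0.5217
print(round(correlation(x, y, sample=True), 4))  # 0.5217
```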
Covariance - Illustration
[Figure: scatterplots illustrating perfect positive, positive, zero, negative, and perfect negative correlation]
Covariance & Correlation - Example
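Example:
Data: x = 1, 2, 3, 4, 5 (mean 3); y = 4, 7, 6, 4, 9 (mean 6)
x | x−x̄ | (x−x̄)² | y | y−ȳ | (y−ȳ)² | (x−x̄)(y−ȳ)
1 | -2 | 4 | 4 | -2 | 4 | 4
2 | -1 | 1 | 7 | 1 | 1 | -1
3 | 0 | 0 | 6 | 0 | 0 | 0
4 | 1 | 1 | 4 | -2 | 4 | -2
5 | 2 | 4 | 9 | 3 | 9 | 6
Sum | | 10 | | | 18 | 7
Covariance: population $\sigma_{xy} = 7/5 = 1.4$; sample $s_{xy} = 7/4 = 1.75$
Correlation: $\rho_{xy} = r_{xy} = \frac{7}{\sqrt{10 \times 18}} \approx 0.52$ (positive correlation)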
Covariance & Correlation - Importance
Why do we care whether they move together?
o For finance, negatively correlated assets reduce total risk exposure (diversification).
o For marketing, a manager may want to know whether consumers who buy one good would also be interested in buying another. If so, the manager may offer a discount on bundled goods to boost sales.
Application
Summary statistics
Characteristics of stock returns
o High mean / median: high return
o Low variance / standard deviation: less risky
o Positive skewness: likely to have a positive surprise
o Low kurtosis: less tail risk (fewer extreme returns)
Characteristics of a good portfolio
o Zero or negative correlation: lower total risk
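Why lower correlation lowers total risk can be sketched with the standard two-asset result Var(wX + (1 − w)Y) = w²σx² + (1 − w)²σy² + 2w(1 − w)Cov(X, Y), where Cov(X, Y) = ρσxσy. This formula is not on the slide, and the helper `portfolio_sd`, the 20% SDs, and the 50/50 weights below are hypothetical numbers chosen for illustration.

```python
from math import sqrt

def portfolio_sd(w, sd_x, sd_y, rho):
    """SD of a portfolio with weight w in asset X and 1 - w in asset Y."""
    cov = rho * sd_x * sd_y  # covariance recovered from the correlation coefficient
    var = w ** 2 * sd_x ** 2 + (1 - w) ** 2 * sd_y ** 2 + 2 * w * (1 - w) * cov
    return sqrt(max(var, 0.0))  # guard against a tiny negative value from rounding

# Hypothetical assets: each has SD 20%, half of the money in each
for rho in (1.0, 0.0, -1.0):
    print(rho, round(portfolio_sd(0.5, 0.2, 0.2, rho), 4))
```

With perfect positive correlation the portfolio is as risky as each asset (SD 0.2); with zero correlation the SD drops to about 0.1414; with perfect negative correlation the risk cancels completely.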
Excel – Basic Functions
Use Excel as a Calculator
Official Documentation: https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188
Uploaded as 1 – Descriptive Statistics.xlsx
Excel – Cell Reference
Use Excel to Store Data
Instead of typing numbers directly into a formula, store the data in the spreadsheet and refer to the cells:
Single Cell Reference
o Absolute address: $A$1 – unchanged after copy and paste
o Relative address: A1 – shifts along with copy and paste
Range of Cells
o Absolute address: $A$1:$B$2
o Relative address: A1:B2
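The copy-and-paste rule can be modeled with a toy Python sketch. The helpers `shift_reference` and `shift_range` are hypothetical illustrations, not how Excel is implemented: pasting a formula `d_rows` down and `d_cols` right shifts every part of a reference that lacks a `$`, and leaves the `$`-marked parts alone. The same rule also covers mixed references such as $A1 or A$1. It assumes upper-case A1-style references and does no bounds checking.

```python
import re

def col_to_num(col):
    """Column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Inverse of col_to_num: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters = chr(ord("A") + r) + letters
    return letters

def shift_reference(ref, d_rows, d_cols):
    """Shift a single-cell reference; a '$' keeps the column or row that follows it fixed."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell reference: {ref!r}")
    col_fixed, col, row_fixed, row = m.groups()
    if not col_fixed:
        col = num_to_col(col_to_num(col) + d_cols)
    if not row_fixed:
        row = str(int(row) + d_rows)
    return f"{col_fixed}{col}{row_fixed}{row}"

def shift_range(rng, d_rows, d_cols):
    """Apply shift_reference to each end of a range such as A1:B2 (or to a single cell)."""
    return ":".join(shift_reference(part, d_rows, d_cols) for part in rng.split(":"))

# Paste a formula 2 rows down and 1 column to the right
for ref in ["A1", "$A$1", "A1:B2", "$A$1:$B$2"]:
    print(ref, "->", shift_range(ref, 2, 1))
```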
Excel – Bar Chart
Use Excel to Generate a Bar Chart
Computer-Related Jobs
Job Names | 2010 | 2020 Est. | Median Pay
Systems Analysts | 544,400 | 664,800 | $77,740
Software App Developers | 520,800 | 664,500 | $90,530
Programmers | 363,100 | 406,800 | $71,380
Network / System Admins | 347,200 | 443,800 | $69,160
CIS Managers | 307,900 | 363,700 | $115,780
Infor Security Analysts | 302,300 | 367,900 | $75,660
Database Administrators | 110,800 | 144,800 | $73,490
[Chart: bar chart "Computer-Related Jobs"; vertical axis title "Number of Workers" (0 to 700,000); horizontal axis: job names; legend: 2010, 2020 Est.]
o Select the data (e.g., first three columns)
o Click Insert > Insert Column or Bar Chart icon
Excel – Scatterplot
Use Excel to Generate a Scatterplot
Ice Cream Revenue
No. | Temperature ℃ | Ice Cream Revenue
1 | 14.2 | $215
2 | 16.4 | $325
3 | 11.9 | $185
4 | 15.2 | $332
5 | 18.5 | $506
6 | 22.1 | $522
7 | 19.4 | $412
8 | 25.1 | $614
9 | 23.4 | $544
10 | 18.1 | $421
11 | 22.6 | $445
12 | 17.2 | $408
[Chart: scatterplot "Ice Cream Revenue VS Temperature" (horizontal axis 0 to 30, vertical axis $0 to $700) with a linear trendline and its fitted equation y = 29.989x - 149.29]
o Select the data (e.g., last two columns)
o Click Insert > Insert Scatter chart icon
o Click on the scatters
o Right click > Add Trendline
o Click on the trendline
o Right click > Format Trendline > Display Equation on chart
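To our understanding, Excel's linear trendline is an ordinary least-squares fit. The sketch below (hypothetical helper `least_squares_line`) computes slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², i.e. the covariance of x and y divided by the variance of x, and intercept = ȳ − slope · x̄. On the 12 temperature/revenue pairs it should reproduce the fitted equation y = 29.989x - 149.29 shown on the chart.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # same as Cov(x, y) / Var(x): the divisors cancel
    intercept = mean_y - slope * mean_x
    return slope, intercept

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
revenue = [215, 325, 185, 332, 506, 522, 412, 614, 544, 421, 445, 408]
slope, intercept = least_squares_line(temperature, revenue)
sign = "-" if intercept < 0 else "+"
print(f"y = {slope:.3f}x {sign} {abs(intercept):.2f}")  # y = 29.989x - 149.29
```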
Excel – Scatterplot VS Bar Chart
Which one to choose?
o Continuous variable (e.g., Temperature ℃ in the ice cream table above): use a scatterplot.
o Discrete / categorical variable (e.g., Ice Cream Brand below): use a bar chart.
Ice Cream Revenue
Ice Cream Brand | Ice Cream Revenue
Pixabay | $185
Ben&Jerry's | $215
Haagen-Dazs | $332
Dreyer's | $325
Blue Bell | $408
Skinny Cow | $421
Halo Top | $406
Excel – Bar Chart VS Histogram
Which one to choose?
o Bar chart: illustrates ice cream revenue by brand (one bar per brand, using the brand table above).
[Chart: bar chart "Ice Cream Revenue" by brand, vertical axis $0 to $500]
o Histogram: shows the distribution of ice cream revenue (how many observations fall into each revenue range).
[Chart: histogram "Distribution of Ice Cream Revenue"]
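The key difference is what each bar represents: in a bar chart, one bar per category shows that category's value; in a histogram, each bar counts how many observations fall into a numeric bin. A minimal sketch with a hypothetical helper `histogram_counts`, binning the 12 revenue observations from the scatterplot slide into $100-wide bins starting at $100 (the bin width and start are our choice; Excel may pick different bins automatically):

```python
def histogram_counts(values, start, width):
    """Count values in consecutive bins [start, start + width), [start + width, start + 2*width), ..."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the bin that v falls into
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

revenue = [215, 325, 185, 332, 506, 522, 412, 614, 544, 421, 445, 408]
for low, high, count in histogram_counts(revenue, start=100, width=100):
    print(f"${low}-${high}: {'#' * count} ({count})")
```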
Excel – Scatterplot VS Line Chart
Which one to choose?
o Scatterplot: shows the correlation between two variables (revenue vs. temperature, using the ice cream table above).
[Chart: scatterplot "Ice Cream Revenue VS Temperature" with a linear trendline, y = 29.989x - 149.29]
o Line chart: shows the trend of revenue across observations 1 to 12.
[Chart: line chart "Ice Cream Revenue" over observations 1 to 12, vertical axis $0 to $800, default legend "Series1"]
o Select the data (e.g., the last column)
o Click Insert > Insert Line Chart icon
Excel – Wrong Examples
Charts can become meaningless
[Chart: "A Wrong Line Chart" plotting Temperature ℃ and Ice Cream Revenue (ice cream table above) as two lines over observations 1 to 12]
o There is no need to plot these two lines in one chart. Temperature (℃) and revenue ($) are measured in different units, so comparing the two columns point by point has no meaning.
Excel – Wrong Examples
Charts can become meaningless
[Chart: "Another Wrong Line Chart" connecting the revenue of each ice cream brand (brand table above) with a line, vertical axis $0 to $450]
o Brand is neither an ordinal categorical variable nor a numerical variable, so the order of the brands is arbitrary and the "trend" is meaningless.