CORRELATION ANALYSIS
Topics
1. Correlation Analysis
▪ Sample case on Correlation Analysis
▪ Example of Correlation using Jamovi
▪ Example of Correlation using MS Excel
Topics
1. Introduction
2. Correlation
• What It Means
• What It Does Not Mean
Case: Housing Prices
Your uncle is planning to sell his house in the
USA. To get an initial feel for the types of houses
in his area, he has collected data in a file
named HOUSES. This file contains information
such as the selling price of each house, its square
footage, the number of bedrooms & bathrooms, and
the presence of an attic for a sample of 108 homes
sold in his neighborhood. He needs help with the data
analysis.
Variables for the Housing Prices Case
SQ_FT: the total floor area of a house in square feet
BEDS & BATHS: the # of bedrooms & bathrooms
HEAT & STYLE: categorical variables
HEAT takes on the value of 0 for gas forced air heating &
1 for electric heat.
STYLE: architectural style of a house: 0 indicates a
trilevel, 1 indicates a two-story house & 2 indicates that
a house is a bungalow
Variables for the Housing Prices Case
GARAGE: the # of cars that can fit into the garage
BASEMENT: the presence (1) or absence (0) of a basement
AGE: the age of a house in years
FIRE: the presence (1) or absence (0) of a fireplace
PRICE: the selling price of a house in thousands of dollars
SCHOOL: the presence (1) or absence (0) of a school in the
area
Case: Housing Prices
❖ RESEARCH OBJECTIVE:
To identify the
DETERMINANTS or PREDICTORS
of housing prices
Case: Housing Prices
❖ Determinants or predictors are
known as INDEPENDENT VARIABLES
❖ The outcome the predictors are used to
explain (here, PRICE) is known as the
DEPENDENT VARIABLE
Introduction
➢ Motivation for Conducting Correlation &
Linear Regression Analysis:
▪ The aim is to analyze multiple variables
simultaneously
o Consider a database of various variables across
clients (e.g. educational attainment, sex, income
& household assets)
o We may be interested in determining the
RELATIONSHIP among these variables
Introduction
❖ GUIDE to examining relationships:
Sir Francis Galton: founder of
CORRELATION & linear regression
Introduction
• Consider Galton’s data on the heights of
fathers & their first-born sons
• Tall fathers tend to have tall sons; short
fathers tend to have short sons.
[Figure: scatter plot of the father and son heights, with X and Y axes]
Introduction
❖ Scatter Plot (scatter diagram)=
Can be used to show the relationship
between 2 numerical variables
▪ Aside from graphical devices, there are other ways
of assessing relationships:
▪ Correlation Analysis
▪ Simple Linear Regression
Introduction
Purpose of Correlation & Regression
❖ Correlation Analysis:
– Uses the correlation coefficient to detect whether 2
variables are “linearly” related (or associated)
– i.e. Does one variable increase when the other
variable increases?
– Does one variable decrease when the other
variable increases?
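To make the word “linearly” concrete, here is a minimal Python sketch (standard library only) that computes the sample correlation coefficient r from its usual definition and applies it to two small hypothetical data sets: one where Y is an exact straight-line function of X, and one where Y = X² over values of X that are symmetric around zero. The function name pearson_r and the data are illustrative assumptions, not part of the housing case.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient r between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * xi + 1 for xi in x]   # Y rises by 2 for every 1-unit rise in X
curved = [xi ** 2 for xi in x]      # Y depends on X, but not along a straight line

print(round(pearson_r(x, linear), 3))  # 1.0
print(round(pearson_r(x, curved), 3))  # 0.0
```

The second result shows why correlation is described as a measure of linear association: Y is completely determined by X, yet r is 0. Python 3.10 and later also provide statistics.correlation, which returns the same Pearson coefficient.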
Introduction
Purpose of Correlation & Regression
❖ Simple Linear Regression (SLR):
– Used to predict the value of 1 dependent
(response) variable based on the value of 1
independent (explanatory) variable
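As a rough illustration of the SLR idea, this Python sketch fits a straight line Y = b0 + b1·X to one independent variable by ordinary least squares (an assumed fitting method; the slide does not name one) and uses the line to predict the dependent variable. The data are made-up placeholder values, not the HOUSES file, and the function names are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope: change in Y per 1-unit change in X
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

def predict(b0, b1, x_new):
    """Predicted value of the dependent variable for a new X value."""
    return b0 + b1 * x_new

# Hypothetical data: X = size in hundreds of square feet, Y = price in thousands of dollars
size = [10, 15, 20, 25, 30]
price = [110, 140, 165, 200, 225]

b0, b1 = fit_simple_linear_regression(size, price)
print(f"intercept = {b0:.1f}, slope = {b1:.1f}")
print(f"predicted price for 2,200 sq ft: {predict(b0, b1, 22):.1f}")
```

Python 3.10 and later also offer statistics.linear_regression, which returns the same least-squares slope and intercept.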
Correlation
Correlation & Scatter Plot
Correlation & Scatter Plot Diagram
❖ We examine whether one independent
(explanatory) variable is related to or
associated with one dependent
(response or outcome) variable
Correlation
❖ Direction of the Relationship between 2 quantitative
variables
1) Positive relationship=
- As the independent variable increases, the dependent
variable increases as well
2) Negative relationship=
- As the independent variable increases, the
dependent variable decreases OR
- As the independent variable decreases, the
dependent variable increases
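The direction described above is captured by the sign of the sample covariance (and therefore of r): positive when the two variables tend to move together, negative when they move in opposite directions. A minimal Python sketch, using a hypothetical helper named direction and made-up data:

```python
def direction(x, y):
    """Classify the direction of a linear relationship from the sign of the sample covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance: sum of cross-products of deviations, divided by n - 1
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"

# Hypothetical examples
sq_ft = [1200, 1500, 1800, 2100, 2400]
price_by_size = [150, 170, 200, 215, 240]  # bigger houses sell for more
age = [5, 10, 20, 30, 40]
price_by_age = [250, 240, 210, 190, 170]   # older houses sell for less

print(direction(sq_ft, price_by_size))  # positive
print(direction(age, price_by_age))     # negative
```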
Scatter Plot
Example of a Positive Relationship
Example of a Negative Relationship
Correlation
❖ Population Correlation Coefficient ρ (Rho)=
- Used to measure the strength of association (linear
relationship) between 2 numerical variables
– Concerned with strength of relationship
– No causal (cause-&-effect) relationship is implied
• Sample Correlation Coefficient r is a point estimate of
ρ
- What we normally use, since we usually have samples rather than
entire populations
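For reference, the standard definitions behind ρ and r (Pearson's product-moment correlation) can be written as follows, where Cov is the covariance, σ a standard deviation, and x̄, ȳ the sample means of n paired observations:

```latex
\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
-1 \le r \le +1
```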
Correlation
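Scale of the correlation coefficient, from -1.0 to +1.0:
• -1.0 = perfect negative correlation
• 0 = zero correlation
• +1.0 = perfect positive correlation
• From 0 toward -1.0 (e.g. -0.5): increasing degree of negative correlation
• From 0 toward +1.0 (e.g. +0.5): increasing degree of positive correlation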
Degree of Strength of Correlation
• Perfect: If the value is ± 1, then it is said to be
a perfect correlation: the points fall exactly on a
straight line, and as one variable increases, the
other variable also increases (if positive) or
decreases (if negative).
• High degree: If the coefficient value lies between
± 0.50 and ± 1, then it is said to be a strong
correlation.
• Moderate degree: If the value lies between ±
0.30 and ± 0.49, then it is said to be a medium
correlation.
• Low degree: When the value lies between ± 0.01
and ± 0.29, then it is said to be a small correlation.
• No correlation: When the value is zero.
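The rule of thumb above can be written as a small lookup. This Python sketch encodes the slide's cutoffs (0.30 and 0.50), applied to the absolute value of r so that the sign, which only gives the direction, is ignored. It treats only |r| = 1 as perfect and any nonzero |r| below 0.30 as low; other textbooks use somewhat different cutoffs, and the function name is an illustrative assumption.

```python
def strength_of_correlation(r):
    """Label the strength of a correlation coefficient using the slide's cutoffs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength depends on magnitude, not on direction
    if size == 1.0:
        return "perfect"
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    if size > 0.0:
        return "low"
    return "none"

for r in (0.828, -0.42, 0.15, 0.0, -1.0):
    print(r, strength_of_correlation(r))
```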
Sample Narrative when Describing the Results
of Correlation Analysis
❖ Square feet and house price have a high degree of
positive correlation, r = .828, and the correlation is
significant (p < .001)
Null & Alternative Hypothesis Statements for
the Test of Correlation
❖ Null Hypothesis (Ho)=
The X & Y variables are NOT linearly related (ρ = 0).
❖ Alternative Hypothesis (Ha)=
The X & Y variables are linearly related (ρ ≠ 0).
Ho & Ha Statements for Housing Price Case
❖ Null Hypothesis (Ho)=
The size of a house & its price are
NOT linearly related (ρ = 0).
❖ Alternative Hypothesis (Ha)=
The size of a house & its price are
linearly related (ρ ≠ 0).
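Under the usual assumptions (paired observations, roughly bivariate-normal data), Ho: ρ = 0 is commonly tested with t = r·√(n − 2) / √(1 − r²), which follows a t distribution with n − 2 degrees of freedom when Ho is true. The sketch below applies this to the housing case (r = .828, n = 108 homes). It uses only the Python standard library, so instead of an exact p-value it compares |t| with an assumed two-tailed 5% critical value of about 1.98 for roughly 106 degrees of freedom; for exact p-values, use a statistics package, Jamovi, or Excel.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Housing case: r = .828 between square feet and price, n = 108 sample homes
r, n = 0.828, 108
t = correlation_t_statistic(r, n)
T_CRITICAL_APPROX = 1.98  # assumed two-tailed 5% cutoff for about 106 degrees of freedom

print(f"t = {t:.2f} with {n - 2} df")
print("Reject Ho" if abs(t) > T_CRITICAL_APPROX else "Do not reject Ho")
```

A t value this far beyond the cutoff is consistent with the p < .001 reported in the sample narrative.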