Correlation Analysis
By Jerome L. Buhay
What is correlation?
– Correlation is a measure of the direction and strength of the
linear relationship between two variables.
Direction means positive or negative.
Strength can be perfect, strong (high), moderate, weak (low), or
zero (no correlation).
– Correlation between two variables does not prove that X
causes Y or that Y causes X.
Characteristics of a Relationship:
Direction
Positive: As X goes up, Y goes up; the variables “move” in the
same direction.
Negative: As X goes up, Y goes down; the variables “move” in
opposite directions.
– Degree/Strength and Direction of Relationship
How well do the data fit a specific form?
Typically, we look for how well the data fit a straight line.
A scatter diagram is a visual way to assess the strength
and direction of a relationship.
The Pearson correlation coefficient is a numerical measure that can
also be used to determine the strength and direction of a relationship.
Scatter diagram
[Figures: example scatter diagrams; images not recovered]
Pearson correlation coefficient r
Symbol: r
r can range from -1.0 to +1.0
Sign (+/-) indicates “direction”
Magnitude (absolute value) indicates “strength”
Measures a “linear” relationship only
Direction of relationship between X, Y
Positive (+r) = As X goes up, Y goes up
Negative (-r) = As X goes up, Y goes down
Strength of a relationship between X, Y
Closer to ±1.0, stronger
Closer to 0, weaker
When r = 0, the X, Y relationship is not described by a straight line
(no linear correlation)
Illustration:
       -1                    0                   +1
Perfect Negative          No/Zero          Perfect Positive
  Correlation           Correlation          Correlation
|r| closer to 0 = weaker
|r| closer to 1.0 = stronger
|r| = 1.0 = perfect
r ≈ 0 could mean many things:
No relationship at all between X & Y
Non-linear relationship between X & Y
Restricted range on X and/or Y
Outlier may be causing problems
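The "non-linear relationship" case can be checked numerically. The sketch below is only an illustration with made-up data: the values of x and y and the helper pearson_r are assumptions, not taken from the slides. Y depends perfectly on X through Y = X², yet r comes out as 0 because the relationship is not a straight line.

```python
import math

def pearson_r(x, y):
    """Pearson r via the usual sums-of-products formula (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data: Y is completely determined by X, but along a curve (Y = X^2).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(r_curve)  # 0.0
```

An r of 0 here says only that no straight line describes the data, which is why a scatter diagram is worth checking alongside the coefficient.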
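Computational Formula:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where n is the number of (X, Y) pairs.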
Correlation vs. Causality:
1. Correlation tells you two variables are related
2. Does NOT tell you why!!
3. Do not draw causal inferences from a correlation
X → Y or Y → X? The correlation alone cannot say.
Examples:
r = -0.60 between number of friends and depression
Does being depressed cause you to not have friends? Or does not having
friends cause you to be depressed?
r = +0.55 between hours spent studying and grades
Do people who get good grades study more? Or does studying more lead to
good grades?
Example: Computing r
STUDENT   HOURS (X)   SCORE (Y)   X²     Y²     XY
A             1           1        1      1      1
B             1           3        1      9      3
C             3           2        9      4      6
D             4           5       16     25     20
E             6           4       36     16     24
F             7           5       49     25     35
G             8           7       64     49     56
H             8           8       64     64     64
ΣX = 38   ΣY = 35   ΣX² = 240   ΣY² = 193   ΣXY = 209
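The column sums above are exactly what the standard computational formula for Pearson's r needs: r = [nΣXY − ΣXΣY] / √([nΣX² − (ΣX)²][nΣY² − (ΣY)²]). A minimal sketch of that calculation for the eight students follows. The function name pearson_r and the variable names are illustrative choices, and it assumes this standard formula is the one the slides intend.

```python
import math

# Hours studied (X) and score (Y) for students A through H, from the table.
hours = [1, 1, 3, 4, 6, 7, 8, 8]
scores = [1, 3, 2, 5, 4, 5, 7, 8]

def pearson_r(x, y):
    """Pearson r from raw sums (standard computational formula; assumed here)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)                # 38 and 35 for the table data
    sum_x2 = sum(a * a for a in x)               # 240
    sum_y2 = sum(b * b for b in y)               # 193
    sum_xy = sum(a * b for a, b in zip(x, y))    # 209
    numerator = n * sum_xy - sum_x * sum_y       # 8*209 - 38*35 = 342
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # 476 * 319
    )
    return numerator / denominator

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.878
```

With r ≈ 0.88, hours studied and score show a positive linear relationship in this small sample. As noted earlier, that alone does not establish that studying causes higher scores.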
Interpreting Correlation (Evans, 1996)
r                  Verbal Interpretation
-1                 Perfect Negative Correlation
-0.8 to -0.99      Very Strong Negative Correlation
-0.6 to -0.79      Strong Negative Correlation
-0.4 to -0.59      Moderate Negative Correlation
-0.2 to -0.39      Weak Negative Correlation
-0.01 to -0.19     Very Weak Negative Correlation
0                  No Correlation
0.01 to 0.19       Very Weak Positive Correlation
0.2 to 0.39        Weak Positive Correlation
0.4 to 0.59        Moderate Positive Correlation
0.6 to 0.79        Strong Positive Correlation
0.8 to 0.99        Very Strong Positive Correlation
1                  Perfect Positive Correlation
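The table can be applied mechanically, as in the rough sketch below. The function name interpret_r is hypothetical. Because the table's bands are given to two decimals, the sketch assumes each band runs up to the start of the next one (for example, 0.195 is treated as "Very Weak"); that boundary rule is an assumption, not something the table states.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels in the table.

    Assumption: each band extends up to the start of the next band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    if size == 1:
        strength = "Perfect"
    elif size >= 0.8:
        strength = "Very Strong"
    elif size >= 0.6:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    elif size >= 0.2:
        strength = "Weak"
    else:
        strength = "Very Weak"
    return f"{strength} {direction} Correlation"

# The two r values used in the causality examples earlier in the slides.
print(interpret_r(-0.60))  # Strong Negative Correlation
print(interpret_r(0.55))   # Moderate Positive Correlation
```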