LESSON 4.3: SIMPLE CORRELATION ANALYSIS
OBJECTIVES: At the end of the lesson, the students must be able to:
1. compute and interpret the degree of correlation of two variables;
2. conduct hypothesis testing for the significance of the correlation coefficient.
LESSON PROPER
Simple correlation analysis is used to determine the degree of linear relationship between
two variables.
A. Pearson Product-Moment Correlation Coefficient
It measures the strength and direction of the linear relationship between two variables measured
on at least an interval scale. The usual first step in analyzing the relationship between two such
variables is to construct and examine a scatter plot. Scatter plots are graphic displays that let the
researcher quickly perceive several important features of the relationship (Healey, 1999). At the
very least, they provide impressionistic information about the existence, strength, and direction of
the relationship, and they can also be used to check the relationship for linearity, i.e., how well
the pattern of dots can be approximated by a straight line (Albacea, 2011).
The correlation coefficient takes on values from −1 to +1. A value very near −1 indicates an
almost perfect inverse (negative) linear relationship between X and Y, while a value very near +1
indicates an almost perfect direct (positive) linear relationship between X and Y. A value very
near zero indicates little or no linear relationship. The sign of r gives the direction of the
relationship and its absolute value gives the strength; the qualitative interpretation of the
strength is given in the table below.
ABSOLUTE VALUE OF r    QUALITATIVE INTERPRETATION
0                      No correlation (association)
0.01 – 0.20            Very weak/very low correlation
0.21 – 0.40            Weak/low correlation
0.41 – 0.60            Moderate correlation
0.61 – 0.80            Strong/high correlation
0.81 – 0.99            Very strong/very high correlation
1                      Perfect correlation
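The interpretation table can be expressed as a small lookup. This is a minimal Python sketch; the function name `interpret_r` is hypothetical, and it assumes that each range's upper bound acts as an inclusive cutoff, so a value falling between two of the table's ranges (e.g. 0.205) goes to the next category and 0 < |r| < 0.01 counts as very weak.

```python
def interpret_r(r: float) -> str:
    """Map r to the qualitative label in the table (hypothetical helper).

    Only |r| matters for the label; the sign gives the direction.
    Assumption: upper bounds are inclusive cutoffs, so values between
    the table's ranges fall into the next category.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    a = abs(r)
    if a == 0:
        return "No correlation (association)"
    if a <= 0.2:
        return "Very weak/very low correlation"
    if a <= 0.4:
        return "Weak/low correlation"
    if a <= 0.6:
        return "Moderate correlation"
    if a <= 0.8:
        return "Strong/high correlation"
    if a < 1:
        return "Very strong/very high correlation"
    return "Perfect correlation"


print(interpret_r(0.55))   # Moderate correlation
print(interpret_r(-0.9))   # Very strong/very high correlation
```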
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
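The computational formula translates directly into code, since it needs only five running sums. The following is a minimal Python sketch of that formula (the name `pearson_r` is hypothetical, not a library function); it raises `ZeroDivisionError` if either variable is constant, because the denominator is then zero.

```python
import math


def pearson_r(x, y):
    """Pearson r from the raw-sums (computational) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with n >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    # Product of the two bracketed terms under the square root
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = [43, 21, 25, 42, 57, 59, 35]
glucose = [99, 65, 79, 75, 87, 81, 75]
print(round(pearson_r(age, glucose), 2))   # 0.55
```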
Example: Find the value of the correlation coefficient and interpret the result.
Age (X)              43   21   25   42   57   59   35
Glucose Level (Y)    99   65   79   75   87   81   75
[Figure: Scatter plot of glucose level (Y) against age (X)]
$$n = 7$$
$$\sum x = 43 + 21 + \cdots + 35 = 282$$
$$\sum y = 99 + 65 + \cdots + 75 = 561$$
$$\sum x^2 = 43^2 + 21^2 + \cdots + 35^2 = 12{,}634$$
$$\sum y^2 = 99^2 + 65^2 + \cdots + 75^2 = 45{,}647$$
$$\sum xy = 43(99) + 21(65) + \cdots + 35(75) = 23{,}110$$
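The five sums can be checked mechanically. A short Python sketch using only the example data:

```python
age = [43, 21, 25, 42, 57, 59, 35]       # X
glucose = [99, 65, 79, 75, 87, 81, 75]   # Y

n = len(age)
sum_x = sum(age)
sum_y = sum(glucose)
sum_x2 = sum(x ** 2 for x in age)
sum_y2 = sum(y ** 2 for y in glucose)
sum_xy = sum(x * y for x, y in zip(age, glucose))

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
# 7 282 561 12634 45647 23110
```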
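$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{7(23{,}110) - 282(561)}{\sqrt{\left[7(12{,}634) - (282)^2\right]\left[7(45{,}647) - (561)^2\right]}}$$
$$r = \frac{3{,}568}{\sqrt{\left[88{,}438 - 79{,}524\right]\left[319{,}529 - 314{,}721\right]}}$$
$$r = \frac{3{,}568}{\sqrt{\left[8{,}914\right]\left[4{,}808\right]}} \approx \mathbf{0.55}$$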
There is a moderate positive linear relationship between age and glucose level.
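Each intermediate quantity in the substitution can be reproduced from the sums. In this Python sketch, `bracket_x` and `bracket_y` are illustrative names for the two bracketed terms under the square root; the unrounded value of r is about 0.545, which rounds to the 0.55 used above.

```python
import math

# Column sums from the worked example
n, sum_x, sum_y = 7, 282, 561
sum_x2, sum_y2, sum_xy = 12_634, 45_647, 23_110

numerator = n * sum_xy - sum_x * sum_y    # 161,770 - 158,202
bracket_x = n * sum_x2 - sum_x ** 2       # 88,438 - 79,524
bracket_y = n * sum_y2 - sum_y ** 2       # 319,529 - 314,721
r = numerator / math.sqrt(bracket_x * bracket_y)

print(numerator, bracket_x, bracket_y)   # 3568 8914 4808
print(round(r, 4), round(r, 2))          # 0.545 0.55
```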
B. Spearman Rank-Order Correlation Coefficient (Spearman's Rho)
Spearman's rank-order correlation is the nonparametric version of the Pearson product-moment
correlation. Spearman's correlation coefficient (denoted $\rho$ or $r_s$) measures the strength and
direction of association between two ranked variables. It requires two variables measured on an
ordinal, interval, or ratio scale. Although the Pearson product-moment correlation is normally
preferred for interval or ratio data, the Spearman correlation can be used when the assumptions of
the Pearson correlation are markedly violated. Note, however, that Spearman's correlation measures
the strength and direction of the monotonic relationship between the two variables, whereas
Pearson's correlation measures the strength and direction of their linear relationship. A monotonic
relationship is one in which either (1) as the value of one variable increases, the value of the
other variable also increases; or (2) as the value of one variable increases, the value of the other
variable decreases. In either case, the change need not occur at a constant rate.
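The difference between a monotonic and a linear relationship shows up clearly on curved data. The Python sketch below uses made-up illustrative data (y = x³, not from the lesson) and relies on the fact that, when there are no ties, Spearman's ρ equals the Pearson r computed on the ranks. The helper names `pearson_r` and `ranks` are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


def ranks(values):
    # Rank 1 = largest value; assumes no tied values
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]   # always increases with x, but along a curve

print(round(pearson_r(x, y), 3))         # 0.938: below 1, pattern is not linear
print(pearson_r(ranks(x), ranks(y)))     # 1.0: perfectly monotonic
```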
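$$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
Where: D – difference between the two ranks of a pair ($D = R_x - R_y$)
       n – number of pairs of values
This formula gives the exact value only when there are no tied ranks.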
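The rank-difference formula can be sketched in Python as follows. The helper names `rank_desc` and `spearman_rho` are hypothetical; the sketch assumes no tied values (it raises an error rather than assigning average ranks) and ranks the largest value as 1, which gives the same ρ as ranking from the smallest as long as both variables are ranked the same way.

```python
def rank_desc(values):
    """Rank values so the largest gets rank 1 (assumes no ties)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled here")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """rho = 1 - 6*sum(D^2) / (n(n^2 - 1)), valid when there are no ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with n >= 2")
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_desc(x), rank_desc(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_rho([1, 2, 3], [30, 20, 10]))   # -1.0
```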
Example: Compute Spearman's rho for the scores of nine students in Physics and Math. Rank each set of scores separately, assigning rank 1 to the highest score.
Physics (X)   Math (Y)   Rx   Ry    D   D²
     35          30       3    5   -2    4
     23          33       5    3    2    4
     47          45       1    2   -1    1
     17          23       6    6    0    0
     10           8       7    8   -1    1
     43          49       2    1    1    1
      9          12       8    7    1    1
      6           4       9    9    0    0
     28          31       4    4    0    0
                                 ΣD² = 12
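The rank, D, and D² columns can be regenerated from the raw scores. A minimal Python sketch, assuming (as holds for these data) that there are no tied scores; `math_scores` is named so as not to shadow Python's `math` module.

```python
physics = [35, 23, 47, 17, 10, 43, 9, 6, 28]       # X
math_scores = [30, 33, 45, 23, 8, 49, 12, 4, 31]   # Y


def rank_desc(values):
    # Highest score gets rank 1, as in the table; assumes no ties
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


rx = rank_desc(physics)
ry = rank_desc(math_scores)
d = [a - b for a, b in zip(rx, ry)]
d2 = [v * v for v in d]

print(rx)       # [3, 5, 1, 6, 7, 2, 8, 9, 4]
print(ry)       # [5, 3, 2, 6, 8, 1, 7, 9, 4]
print(d)        # [-2, 2, -1, 0, -1, 1, 1, 0, 0]
print(sum(d2))  # 12
```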
$$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
$$\rho = 1 - \frac{6(12)}{9(9^2 - 1)}$$
$$\rho = 1 - \frac{72}{9(81 - 1)} = 1 - \frac{72}{720}$$
$$\rho = 1 - 0.1 = \mathbf{0.9}$$
There is a very high association between the scores of the students in Physics and Math.
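The arithmetic can be confirmed with exact fractions, which avoids any rounding in the intermediate step 72/720. A short Python sketch using the standard `fractions` module:

```python
from fractions import Fraction

n = 9
sum_d2 = 12   # total of the D² column

rho = 1 - Fraction(6 * sum_d2, n * (n ** 2 - 1))   # 1 - 72/720
print(rho, float(rho))   # 9/10 0.9
```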
TEST OF HYPOTHESIS FOR THE SIGNIFICANCE OF THE CORRELATION COEFFICIENT
HYPOTHESES AND DECISION RULES

Ho: $\rho = 0$ (There is no linear relationship between X and Y)

Alternative hypothesis, with the rule "Reject Ho if":
Ha: $\rho \neq 0$ (There is a linear relationship between X and Y)
    n < 30: $|t_c| > t_{\alpha/2,\,(n-2)}$        n ≥ 30: $|t_c| > Z_{\alpha/2}$
Ha: $\rho > 0$ (There is a positive linear relationship between X and Y)
    n < 30: $t_c > t_{\alpha,\,(n-2)}$            n ≥ 30: $t_c > Z_{\alpha}$
Ha: $\rho < 0$ (There is a negative linear relationship between X and Y)
    n < 30: $t_c < -t_{\alpha,\,(n-2)}$           n ≥ 30: $t_c < -Z_{\alpha}$
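The three decision rules differ only in which tail(s) are checked. In the Python sketch below, the function `reject_h0` and the labels "two-sided", "greater", and "less" are hypothetical; the critical value (from a t table for n < 30 or a Z table for n ≥ 30) is assumed to be looked up separately and passed in as a positive number.

```python
def reject_h0(t_c, critical, alternative="two-sided"):
    """Apply the decision rules from the table (hypothetical helper).

    critical: positive tabled value, t(alpha/2, n-2) or Z(alpha/2) for a
    two-sided test, t(alpha, n-2) or Z(alpha) for a one-sided test.
    """
    if critical <= 0:
        raise ValueError("critical value must be positive")
    if alternative == "two-sided":   # Ha: rho != 0
        return abs(t_c) > critical
    if alternative == "greater":     # Ha: rho > 0
        return t_c > critical
    if alternative == "less":        # Ha: rho < 0
        return t_c < -critical
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(reject_h0(1.47, 2.571))   # False: fail to reject Ho
```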
Test Statistic
$$t_c = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
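The test statistic is a one-line computation. This Python sketch (the name `t_statistic` is hypothetical) rejects |r| = 1, where the denominator would be zero, and n < 3, where there are no degrees of freedom left.

```python
import math


def t_statistic(r, n):
    """t_c = r * sqrt(n - 2) / sqrt(1 - r^2), compared with t on n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < r < 1:
        raise ValueError("|r| must be below 1 (t_c is undefined at |r| = 1)")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


print(round(t_statistic(0.55, 7), 2))   # 1.47
```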
Example: Determine whether there is a significant linear relationship between age and
glucose level at the 5% level of significance.
Age (X)              43   21   25   42   57   59   35
Glucose Level (Y)    99   65   79   75   87   81   75
i. Ho: $\rho = 0$ (There is no linear relationship between age and glucose level)
   Ha: $\rho \neq 0$ (There is a linear relationship between age and glucose level)
ii. Use a two-tailed t-test at $\alpha = 0.05$ (the t-test applies because n = 7 < 30).
iii. Decision Rule: Reject Ho if $|t_c| > t_{\alpha/2,\,(n-2)}$, that is, if
$$|t_c| > t_{0.05/2,\,(7-2)} = t_{0.025,\,5} = 2.571$$
Otherwise, fail to reject Ho.
iv. Computation
$$t_c = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
$$t_c = \frac{0.55\sqrt{7 - 2}}{\sqrt{1 - (0.55)^2}} = \mathbf{1.47}$$
v. Decision: Since $|t_c| = 1.47 < 2.571$, we fail to reject Ho.
vi. Conclusion: At $\alpha = 0.05$, there is no significant linear relationship between age and
glucose level; that is, the data do not provide sufficient evidence of a linear relationship.
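The whole test can be run end to end from the raw data. This Python sketch takes the critical value 2.571 from step iii as a given constant (no t-table lookup is performed) and compares the result obtained with r rounded to 0.55, as in the lesson, with the result from the unrounded r of about 0.545. Both lead to the same decision.

```python
import math

age = [43, 21, 25, 42, 57, 59, 35]
glucose = [99, 65, 79, 75, 87, 81, 75]
T_CRITICAL = 2.571   # t(0.025, 5), taken from step iii

n = len(age)
sx, sy = sum(age), sum(glucose)
sxy = sum(a * b for a, b in zip(age, glucose))
sx2 = sum(a * a for a in age)
sy2 = sum(b * b for b in glucose)
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

results = []
for label, r_used in (("r rounded to 0.55", 0.55), ("unrounded r", r)):
    t_c = r_used * math.sqrt(n - 2) / math.sqrt(1 - r_used ** 2)
    decision = "reject Ho" if abs(t_c) > T_CRITICAL else "fail to reject Ho"
    results.append((label, round(t_c, 2), decision))
    print(f"{label}: t_c = {t_c:.2f} -> {decision}")
# r rounded to 0.55: t_c = 1.47 -> fail to reject Ho
# unrounded r: t_c = 1.45 -> fail to reject Ho
```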