0% found this document useful (0 votes)

5 views21 pages

01 Correlation (Revised)

Uploaded by

shishirsms769

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views21 pages

01 Correlation (Revised)

Uploaded by

shishirsms769

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 21

Correlation Analysis

Noor Md. Rahmatullah

Professor

Bivariate distribution:
There are many situations arise in which we are interested to study the
relationship between two variables such as:
1. The amount of rainfall and yield of a certain crop
2. The amount of FGR and fish weight
3. The height and weight of a group of children
4. Income and expenditure of several families
5. The heart girth and body weight of animal etc.

The distribution in which we consider a pairs of observations simultaneously is

known as bivariate distribution.

What is correlation?
When there is a relationship between quantitative measures between two sets
of phenomena, the appropriate statistical tool for discovering and measuring the
relationship and expressing it on a precise way is known as correlation.

Correlation may be linear or non-linear. If the amount of change in one

variable tends to bear a constant ratio to the amount of change in the other variable,
the correlation is said to be linear; because the scatter diagram would show a linear
path. Here we shall be concerned with linear correlation or simple correlation only.

Karl Pearson’s coefficient of correlation:

The Pearson‟s product-moment correlation coefficient (PMCC) is a statistic
that is used to estimate the intensity or degree of linear relationship between two
variables. It is a numerical estimate of both the strength of the linear relationship
and the direction of the relationship. It is calculated when the scale of measurement is
interval or ratio. This statistic is typically referred to as the correlation coefficient and
which is given by:
Σ(x i  x ) (yi  y)
Cov(x, y) n
rxy  
V(x) V(y) Σ(x i  x )2 Σ(yi  y) 2
n n
1
Σ(x i  x ) (yi  y)
 n
1
Σ(x i  x ) 2 Σ(yi  y)2
n
Σx Σy
Σx i yi  i i SP(x, y)
 n 
 2 (x i ) 2   2 (yi ) 2  SS(x) SS(y)
 Σx i    Σyi  
 n   n 
 The correlation coefficient computed from the sample data measures the strength
and direction of a linear relationship between two variables.
 The symbol for the population correlation coefficient  is the correlation
computed by using all the possible pairs of data values (x,y) taken from a
population.
 The symbol for the sample correlation coefficient is r is the correlation computed
from data obtained from the samples.

Assumptions underlying Karl Pearson’s correlation coefficient:

Pearson‟s correlation coefficient r is based on the following assumptions:
1. The variables under the study are measured on an interval or ratio scale.
2. The two variables follow bivariate normal distribution.
3. The relationship between the variables is linear.
4. The sample is adequate size to assume normality.

Geometrical interpretation of product moment correlation coefficient:

We can see from a scatter diagram whether two variables are correlated, but
we have no measure of how strong this relationship is. The diagram below shows a
typical scatter diagram for a bivariate distribution with the n pairs of observations of
the two variables x and y plotted.

y 2nd Quadrant 1st Quadrant

(x i  x, y i  y)
xi  x 
 
  yi  y
 
y
  ( x , y) 

 

3rd Quadrant 4th Quadrant
x
x

To show up any relationship more clearly, we can move the origin of the diagram to
the point ( x, y) . The coordinate of a typical point ( xi , y i ) will now be written
as ( xi  x, y i  y ) . The ith point on the diagram has been labeled to show this. If we
look the scatter diagram we can see the signs (+ or -) taken by all the ( xi  x) and
( y i  y ) in the four new quadrants. By multiplying the signs, we can find the sign
taken by their product ( xi  x) ( y i  y ) .

2 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Quadrant (x i  x) (y i  y) (x i  x) (y i  y)
First + + +
Second - + -
Third - - +
Fourth + - -

Now think about the three types of correlation:

 If there is a positive correlation, most points lie in the first and third quadrants so
 (x i  x) (y i  y) would be positive.
 If there is a negative correlation, most points lie in the second and fourth
quadrants so  (x i  x) (y i  y) would be negative.
 If there if no linear relationship, the points lie in all four quadrants and
 (x i  x) (y i  y) is zero or close to zero, since the positive and negative values
tend to cancel out.

Example: Judging correlation from scatter plots

Scatterplot

The most useful graph for displaying the relationship between two quantitative
variables is a scatterplot.
A scatterplot shows the relationship between two quantitative variables measured for
the same individuals. The values of one variable appear on the horizontal axis, and the
values of the other variable appear on the vertical axis. Each individual in the data
appears as a point on the graph.

3 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Properties of correlation coefficient:
1. The correlation coefficient is independent of the choice of origin and scale of
measurement of the variables.
2. Correlation coefficient measures the linear relationship.
3. Correlation coefficient is a symmetric measure with respect to x and y,
symbolically rxy  ryx . The r is not affected by the choice of x and y.
Interchange x and y and the value of r will not change.
4. Correlation coefficient lies between +1 and –1. Symbolically  1  r  1.
5. The correlation coefficient will be positive or negative depending on whether
the sign of Cov(x, y).
6. The correlation coefficient is a pure number without units i.e. dimensionless.

r is not affected by:

-- interchanging the two variables (it makes no difference which variable is
called x and which is called y)
-- adding the same number to all the values of one variable
-- multiplying all the values of one variable by the same positive number

Because r uses the standardized values of the observations, r does not change
when we change units of measurement (inches vs. centimeters, pounds vs.
kilograms and miles vs. meters etc.). So, r is “scale invariant”.
7. Correlation requires that both variables be quantitative (numerical).
You can‟t calculate a correlation between “income” and “city of residence”
because “city of residence” is a qualitative (non-numerical) variable.
8. The correlation can be misleading in the presence of outliers or nonlinear
association.
Correlation coefficient r does not describe curved relationships. r is affected
by outliers. When possible, check the scatter plot.
9. Correlation measures association. But association does not necessarily show
causation.
Both variables may be influenced simultaneously by some third variable.
10. The ratio of the values of rxy does not show the closeness of correlations.
If rxy  0.6 , then it does not mean that this correlation has double strength of
that whose value is rxy  0.3
11. The value rxy  0.5 shows as much closeness in the positive direction, as
rxy  0.5 shows in the negative direction.
12. Random variables which are at the interval or ratio level of measurement.
13. If X and Y are independent, then rxy  0 ; though, the converse is not true.

4 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

 Prove that correlation coefficient r lies between –1 and +1
symbolically  1  r  1

Let x and y be the variables and (x1,y1), (x2,y2),… …,(xn,yn) denotes n pairs of
observations with means ( x, y) and standard deviation sx and sy respectively. If we
write the standard normal variate as follows:
2 2
x x  x  x  x  x
ui  i  u i   i   u i   i 
2 2

sx s  s 
 x   x 
( xi  x ) 2
 u i   u i  n
2 2
2
sx
yi  y
and vi 
sy
Similarly vi  n
2

Again
( xi  x ) ( y i  y )
u i vi  
sx sy
( xi  x)( y i  y )

sx s y
nCov( x, y )

sx s y
 nr
Where r denote the correlation coefficient between x and y.
Now, (ui  vi ) 2 can never be negative, because it is a perfect square. Hence the sum
of all such squares for i=1, 2, 3, … …, n can not be negative; i.e.

(ui  vi ) 2  0
 ui  vi  2ui vi  0
2 2

 n  n  2nr  0
 2n(1  r )  0
 1  r  0 (Since n  0)
1  r  0

1  r  0
 1  r

r  1
 1  r  1.

This proves that the correlation coefficient lies between –1 and +1.

5 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

 Correlation coefficient is independent of change of origin and scale.

Proof :
We know that correlation coefficient between x and y is given by,
SP(x, y) Σx i  x yi  y 
rxy  
SS(x)  SS(y) Σx i  x  Σyi  y 
2 2

xi  a y b
Suppose the transformations u i  and vi  i for defining change of origin
c d
and scale (where c, d  0 ).
xi  a x a
Now, u i  u 
c c
x  a x  a xi  x
 ui  u  i  
c c c
 xi  x  cui  u 
Similarly, yi  y  dvi  v
Σx i  x yi  y  Σcu i  u  dvi  v 
rxy  
Σx i  x  Σyi  y  Σcu i  u  Σdvi  v 
2 2 2 2

cdΣu i  u vi  v  cdΣu i  u vi  v 

   ruv
c Σu i  u  d Σvi  v  cd Σu i  u  Σvi  v 
2 2 2 2 2 2

rxy  ruv
This proves that the correlation coefficient is not affected by change of origin and
scale.

Why correlation is important?

Correlation is widely used statistic in different field of research. At some point
in your career as a researcher you may be asked to describe the association (strength
and direction) between variables. A statistical technique called correlation which
allows us to do this. Most of the variables show some kind of relationship. For
instance, there is relationship between nitrogen rate and yield, price and supply,
income and expenditure etc. With the help of correlation analysis we can measure in
one figure the degree of relationship.

Degrees of Correlation

Through the coefficient of correlation, we can measure the degree or extent of the
correlation between two variables. On the basis of the coefficient of correlation we
can also determine whether the correlation is positive or negative and also its degree
or extent.

1. Perfect correlation: If two variables changes in the same direction and in the
same proportion, the correlation between the two is perfect positive.

6 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

According to Karl Pearson the coefficient of correlation in this case is +1. On
the other hand if the variables change in the opposite direction and in the same
proportion, the correlation is perfect negative. Its coefficient of correlation is
-1. In practice we rarely come across these types of correlations.
2. Absence of correlation: If two series of two variables exhibit no relations
between them or change in variable does not lead to a change in the other
variable, then we can firmly say that there is no correlation or absurd
correlation between the two variables. In such a case the coefficient of
correlation is 0.
3. Limited degrees of correlation: If two variables are not perfectly correlated
or is there a perfect absence of correlation, then we term the correlation as
Limited correlation. It may be positive, negative or zero but lies with the limits
 1.

High degree, moderate degree or low degree are the three categories of this kind of
correlation. The following table reveals the effect (or degree) of coefficient or
correlation.

Degrees Positive Negative

Absence of correlation 0 0
Perfect correlation +1 -1
High degree + 0.75 to + 1 - 0.75 to -1
Moderate degree + 0.25 to + 0.75 - 0.25 to - 0.75
Low degree 0 to 0.25 0 to - 0.25

Interpretation of r:
The correlation coefficient always lies between –1 and +1. To interpret the
correlation coefficient, we must consider both its sign (positive or negative) which
indicates the direction of relationship and its absolute value which indicates the
strength of linear relationship. A perfect positive correlation has a coefficient of 1.0; a
perfect negative correlation has a coefficient of -1.0. When there is no association
between two variables, the correlation coefficient has a value of 0. A value of zero
indicates that the variables are not linearly related or perhaps have more complex or
non linear relationship.
Review each correlation coefficient presented below and determine it‟s direction and
strength
1. -0.38
The negative sign tells us that this is a negative correlation. A high score on
the X variable would predict a low score on the Y variable. The absolute value of
the correlation is 0.38 would be considered moderate in size in agricultural
research. It is not a terrible strong relationship but there is definitely a linear
relationship between the two variables.

2. 0.23
This is a small, positive correlation. A high score on the X variable would
predict a high score on the Y variable but not with a great deal of accuracy. There
would be a fair amount of scatter on the bivariate plot but a definite linear
relationship could be seen.

7 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

3. –0.50
This is a negative correlation. The negative sign indicates that a high score on
the X variable would predict a low score on the Y variable. The absolute value of
0.50 suggests a fairly predictable relationship between X and Y. A correlation of –
0.50 would be considered a moderate correlation in agricultural research even
though it is only half-way between no relationship and a perfect relationship.
There would be a modest amount of scatter on the bivariate plot.

4. 0.84
This is a strong positive correlation. A high score on the X variable would
predict a high score on the Y variable. The absolute value of the correlation is
0.84 which is close size to 1.0. There would not be a lot of scatter on a bivariate
plot. This would be considered a high degree of correlation.

5. –1.0
This is a perfect negative correlation. The data would follow a perfectly
straight line on a scatter plot beginning in the upper left corner of the plot and
progressing downward to the lower right corner of the plot. A high score on the X
variable would predict a low score on the Y variable. There would be no scatter at
all on the plot.

6. 0.11
This is a small, positive correlation. The positive sign indicates that a high
score on the X variable would predict a high score on the Y variable. But, an
absolute value of 0.11 suggests a very small linear relationship. There would
be a large amount of scatter on the bivariate plot.

7. –0.06
This correlation coefficient is close to zero. Even though the sign of the
correlation coefficient is negative, the fact that its absolute value is so close to
zero would lead to an interpretation of no relationship. There would a large
amount of scatter on the bivariate plot that would appear to be random.

8. 0.62
This is a positive correlation. The positive sign indicates that a high score on the X
variable would predict a high score on the Y variable. The absolute value of 0.62
suggests a fairly predictable relationship between X and Y; There would be only a
modest amount of scatter on the bivariate plot.

9. -0.75
This is a high degree of negative correlation. The negative sign indicates that a
high score on the X variable would predict a low score on the Y variable. The
absolute value of 0.75 indicates a strong relationship. There would be only a
modest amount of scatter on the bivariate plot.

8 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Scatter diagram:
A graph that shows the relationship between two variables is called a scatter
diagram or scatter plot. A bivariate plot graphs the relationship between two variables
that have been measured on a single sample of subjects. Such a plot permits you to
see at a glance the degree and pattern of relationship between the two variables.

Nuance: Why is the maximum r = 1?

Consider a correlation coefficient correlated between your height in inches and
your height in feet. This is incredibly stupid to correlate (see our graph below).
6.0
5.8
Height in Feets
5.6
5.4
5.2
5.0
4.8

58 60 62 64 66 68 70 72
Height in Inches

Obviously the relationship will be perfect. All the points are on the line. There is no
x x y y
spread to the scatter plot. Thus your Z x  i will be equal to Z y  i . That's
x y
because you stand in the same relative location in the height distribution no matter
whether it is in feet or inches. Such a relationship gives you an r = 1.0 in the simple
derivation below.

9 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Cov( x, y )
rxy 
 x y
1 nCov( x, y )

n  x y
1 SP( x, y )

n  x y
1 ( xi  x)( y i  y )

n  x y
1 ( xi  x ) ( y i  y )
 
n x y
1 ( xi  x ) 2
 
n  x2
( xi  x ) 2 1

n  x2
 x2
 2
x
 1.

Correlation Does Not Imply Causation

Just because one variable relates to another variable does not mean that
changes in one causes changes in the other. Other variables may be acting on one or
both of the related variables and affect them in the same direction. Cause-and-effect
may be present, but correlation does not prove cause. For example, the length of a
person‟s pants and the length of their legs are positively correlated - people with
longer legs have longer pants; but increasing one’s pant length will not lengthen one’s
legs!

Property of Linearity
The conclusion of no significant linear correlation does not mean that X and Y
are not related in any way. The data depicted in Figure 2 result in r = 0, indicating no
linear correlation between the two variables. However, close examination show a
definite pattern in the data reflecting a very strong “nonlinear” relationship. Pearson’s
correlation apply only to linear data.

10 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

120

100

0
-15 -10 -5 0 5 10 15

Fig. 2: Nonlinear relationship between x and y rxy  0 where y  x 2

If two variables are independent, their correlation coefficient is zero. Is the

converse true? Explain by means of an example.
If two variables are independent, their correlation coefficient is zero. But the
converse is not true. A zero correlation coefficient does not necessarily signify that
the variables are independent. It only implies that there is no linear relationship
between the variables. However, the possible existence of a non-linear relationship
can not be ruled out altogether. For example, let us consider the following data:

x: -3 -2 -1 0 1 2 3
y: 9 4 1 0 1 4 9

Here  x = 0,  y = 28,  xy = 0, n = 7
ΣxΣy
 SP(x, y)  Σxy  0
n
SP(x, y)
Therefore rxy   0 i.e. the correlation coefficient between x and y is
SS(x).SS(y)
zero. But it may be noticed that x and y are bound by the relation y  x 2 . So, x and y
are not independent. Thus the correlation may be zero, even when the variables are
not independent.

Coefficient of Determination (r2)

The relationship between two variables can be represented by the overlap of

two circles representing each variable (Figure 3). If the circles do not overlap, no
relationship exists; if they overlap completely, the correlation equals r = 1.0. If the
circles overlap somewhat, as in Figure 3, the area of overlap represents the amount of

11 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

variance in the dependent (Y-variable) than can be explained by the independent (X-
variable). The area of overlap, called the percent common variance, calculates as:

r 2  100
For example, if two variables are correlated r = 0.71 they have 50% common variance
(0.712 x 100 = 50%) indicating that 50% of the variability in the Y-variable can be
explained by variance in the X-variable. The remaining 50% of the variance in Y
remains unexplained. This unexplained variance indicates the error when predicting Y
from X. For example, strength and speed are related about r = 0.80, (r2 = 64%
common variance) indicating 64% of both strength and speed come from common
factors and the remaining 36% remains unexplained by the correlation.

Fig. 3:Example of the coefficient of determination (percent common variance r2x100).

y variable (predicted)
x variable (predictor)

Area of overlap; r2x100 (percent common variance)

Probable Error of correlation coefficient:

If r is the correlation coefficient in sample of n pairs of observations, then its standard

error is given by

1  r2
SE(r) 
n

Probable error of correlation coefficient is given by

1  r 2 
P.E  0.6745  S.E(r)  0.6745  
 n 

Probable error is a measure for testing the reliability of an observed correlation

coefficient. The reason for taking the factor 0.6745 is that in a normal
distribution, the range   0.6745 covers 50% of the total area.

12 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

i. If the value of r is less than P. E., then there is no evidence of correlation i.e. r
is not significant.
ii. If r is more than 6 times the P. E. „r‟ is practically certain .i.e. significant.
iii. By adding or subtracting P. E. to „r‟, we get the upper and Lower limits within
which „r’ of the population can be expected to lie.

Symbolically  = r  P. E.

 = Population correlation coefficient.

State in each case whether you would expect to obtain a positive, negative or no
correlation between:
 Age and blood pressure.
 Air temperature and metabolic rate.
 Amount of rainfall and yield of a certain crop.
 Dose of nitrogen and yield of a certain crop.
 Drug dose and blood pressure.
 Food intake and weight.
 Idle time of machine and volume of production.
 Income and expenditure of several families.
 Increase in rainfall up to a point and production of rice.
 Investment and profit
 Number of goals conceded by a team and their position in the league.
 Number of hours studied and grade obtained.
 Number of tiller and yield of wheat.
 Numbers of errors and typing speed.
 Panicle length and yield of rice.
 Price and demand of commodities.
 Production and price per unit.
 Sale of cold-drinks and day temperature.
 Sale of woolen garments and day temperature.
 Shoe size and intelligence.
 Supply and Price of commodities.
 Temperature and percentage breakage of unhusked rice in milling.
 The age of husbands and wives.
 The height and weight of a group of children.
 Weight and blood pressure.
 Years of education and income.

13 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Example: If r = 0.6 and n = 64 find out the probable error of the coefficient of
correlation also find the expected population correlation coefficient.

Solution:

1  r 2 
P.E.  0.6745  
 n 
1  (0.6) 2 
 0.6745  
 64 
 0.57

Expected population correlation coefficient   r  P.E.  0.6  0.057 .

Example: Find the coefficient of correlation r, given that

Cov(x,y) = −16·5, Var(x) = 2·85 and Var(y) = 100

Solution:- Putting the given values in the formula,

Cov(x, y)
r
Var(x)  Var(y)
- 1.65

2.85  100
 - 0.97

Problem:

A group of n  15 Stray berry plants was grown in plots in a green house and
the measurement were taken on crop yield (y) and the corresponding level of nitrogen
present in the leaf at the time of picking:

x 2.50 2.55 2.54 2.65 2.68 2.55 2.62 2.57 2.63 2.59 2.69 2.61 2.67 2.57 2.53
y 247 245 266 277 284 251 275 272 241 265 281 292 285 274 282

Find the association between level of nitrogen and crop yield. Test the association
between level of nitrogen and crop yield.

14 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Objective Type Questions:

Comment on the following:

a) rxy  0  x and y are independent.

b) If rxy  0 then rx,  y  0, r x, y  0, r x,  y  0 .
c) Pearson correlation coefficient is independent of origin but not of scale.
d) If the correlation coefficient between the variables x and y is zero then the
correlation coefficient between the variables x2 and y2 is zero.
e) The numerical value of product moment correlation coefficient „r‟ between
two variables x and y can not exceed unity.
f) r measures every type of relationship between two variables.
g) If r  0 , then as x increases y also increases.
h) rxy  0.9 , then for large values of x, what sort of values do we expect for y.
i) if rxy  0 , what is the value of Cov(x,y) and how are x and y related.

 Indicate the correct answer using tick (√) mark:

a). The coefficient of correlation will have positive sign when

1. x is increasing, y is decreasing 2. both x and y are increasing

3. x is decreasing, y increasing 4. there is no change in x and y.

b). The coefficient of correlation

1. can not be positive 2. cannot be negative

3. is always positive 4. can be both positive as well as negative.

c). The correlation coefficient

1. can take any value between –1 and +1 2. is always less than –1

3. is always more than +1 4. can not be zero.

d). Probable error of r is

1  r 2  1  r 2 
1. P.E.  0.6475   2. P.E.  0.6475  
 n   n 

1  r 2  1  r 2 
3. P.E.  0.6475   4. P.E.  0.6745  
 n   n 

15 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

(e). The following table gives the relationship between pairs of data values (xi,yi) for
i = 1, 2, …, 5.
x 1 2 3 4 5
y 2 4 6 8 10

All the points lie on the line y  bx with regression coefficient b and correlation
coefficient r . Find b and r .

1) b  1 and r  2
2) b  2 and r  2
3) b  2 and r  1
1
4) b  and r  2 .
2
(f). The correlation coefficient r satisfies 0  r 2  1. Which of the following
statements is true?

1) 0  r  1. 2) r  1 3) r  1 4) r  1 or r  -1

(g). Find the correlation coefficient for 6 pairs of observations if the LSR line is
y  0.5  0.05 x and if 81% of the variation in y is explained by regression on x.
1) 0.9 2) 0.81 3) -0.05 4) None of these.

(h). For the bivariate data (x1,y1) (x2,y2) (xn,yn), the least squares regression line is
fitted. The line is y  2.51  4.1x . You know that the first data point is
( x1, y1 )  (0.1, 2.0) , so the residual at this point is:

1) 2.1 2) -0.1 3) 0.1 4) 2.0

(i). The correlation coefficient for a set of bivariate data (xi,yi) is r = 0.87, where the xi
are measured in inches and the yi are measured in lbs. A second analyst records the xi
values in cm. (1 inch ≈ 2.5 cm). What is the second analyst‟s value of the correlation
coefficient (to 2dp)?

1) 0.35
2) 0.87
3) 2.18
4) Unable to determine without knowing the yi units.

16 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

 Justification for r Formula

(x, y) centroid of sample points

x=3
y x - x = 7- 3 = 4
(7, 23)
24
•
20

y  y  23 11  12
Quadrant 2 Quadrant 1
16
•
12
y = 11
(x, y)
8
Quadrant 3 • Quadrant 4

4
••
0 x
0 1 2 3 4 5 6 7

17 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Methods of studying correlation
The following are the important methods of ascertaining whether two variables are
correlated or not:
1. Scatter Diagram Method
2. Karl Pearson‟s Coefficient of Correlation
3. Spearman‟s Rank Correlation Coefficient

 Scatterplot (or scatter diagram) is a graphical representation in which the

paired (x,y) sample data are plotted with a horizontal x axis and a vertical y
axis. Each individual (x,y) pair is plotted as a single point.

Causation

If there is a significant linear correlation between two variables, then one of five
situations can be true.

1. There is a direct cause and effect relationship

2. There is a reverse cause and effect relationship
3. The relationship may be caused by a third variable
4. The relationship may be caused by complex interactions of several variables
5. The relationship may be coincidental

Common Errors in Correlation

There are some common errors that are made when looking at correlation.

 Avoid concluding causation. We just got through talking about causation. Just
because there is a linear relationship doesn't mean that one thing caused the other.
It could be any of the five situations above.
 Avoid data based on rates or averages. Variation is suppressed when using a rate
or an average. The variance of the sample means was the variance of the
population divided by the sample size. So, if you work with averages, the
variances are smaller and you might be able to find linear relationships that are
significant when they would not be if the original data was used.
 Watch out for linearity. All that we're testing here is the strength of a linear
relationship. There are other kinds of relationships. In algebra, we talk about
linear, quadratic, cubic, exponential, logarithmic, Gaussian (bell shaped),
logistics, and power models. A scatter plot is a good way to look for patterns.

18 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Quiz on MCQ

Correlation is a __________ type of statistical analysis.

(a) univariate
(b) bivariate
(c) multivariate
(d) none of these

Correlation is:
(a) the covariance of standardized scores
(b) the mean of the population standard deviations
(c) a way of testing cause and effect
(d) for comparing mean differences
(e) none of the above

What would you expect the correlation between daily calorie consumption and body
weight to be?
(a) moderate to large positive
(b) small positive
(c) zero or near zero
(d) small negative
(e) moderate to large negative

The square of the correlation coefficient or r2 is called the

(a) coefficient of determination
(b) variance
(c) covariance
(d) cross-product
(e) none of the above

The measure of how well the regression line fits the data is the:
1. Coefficient of determination
2. Slope of the regression line
3. Mean square error
4. Standard error of the regression coefficient

The assumptions of the simple linear regression model include:

1. The errors are normally distributed
2. The error terms have a constant variance
3. The errors have a mean of zero
4. A and B
5. A, B, and C

As the relationship deteriorates from a perfect correlation, what happens to the points
on a scatter diagram?
1. They become more scattered
2. The slope changes
3. The y-intercept changes
4. Both B and C, above
5. None of the above

19 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

If two variables have a correlation coefficient of .30, what percentage of one variable
is accounted for by the other variable?
1. 30%
2. 70%
3. 10%
4. 9%

Observed errors, which represent information from the data which is not explained by
the model, are called?
1. Marginal values
2. Residuals
3. Mean square errors
4. Standard errors
5. None of the above

In an experiment an analyst has observed that SP(x,y) equals -212.35, SS(x) equals
237.16 and SS(y) = 858.49. The sample average for x was 193.1 and the sample
average for y was 15.2. Assuming that a linear regression model is appropriate, the
least squares estimate for 0 is ____________.
1. -0.859
2. -0.895
3. 188.099
4. 206.710
5. 218.719

In an experiment an analyst has observed that SP(xy) equals -212.35, SS(x) equals
237.16 and SS(y) = 858.49. The sample average for x was 193.1 and the sample
average for y was 15.2. Assuming that a linear regression model is appropriate, the
least squares estimate for 1 is ____________.
1. -0.859
2. -0.895
3. 188.099
4. 206.710
5. 218.719

In an experiment an analyst has observed that SPxy equals -212.35, SSx equals
237.16 and SSy = 858.49. The sample average for x was 193.1 and the sample
average for y was 15.2.Assume that a linear regression model is appropriate. These
results imply that, if X equals 200, the expected value for Y would be ____________.
1. 37,618.95
2. 26,824.83
3. 12.5
4. 11.2
5. 9.02

In an experiment an analyst has observed that SPxy equals -212.35, SSx equals
237.16 and SSy = 858.49. The sample average for x was 193.1 and the sample
average for y was 15.2. Assuming that a linear regression model is appropriate,
approximately ____________ of variation in Y could be attributed to variation in X.
1. 13%

20 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

2. 22%
3. 44%
4. 47%
5. None of the above

In a regression experiment involving 102 observations an analyst has estimated 0 as

81.41 and 1 as 1.925. In this analysis the sample average for x was 62.5, SSx was
325.64 and the standard error of the regression was 28.5. If x equals 71, a 95%
confidence interval for Y would be _____________.
1. 158.52; 254.55
2. 140.72; 272.35
3. 149.18; 263.89
4. 204.72; 208.35
5. 205.63; 207.44

In a study of 42 observations, the sample covariance between two variables, X1 and

X2, is -188.37. If SSx1 equals 202.25 and SSx2 equals 305.12. At  = 0.05, the test
statistic for H0:  =0 would equal _____________; we would therefore infer
_____________ between X1 and X2.
1. -3.06; a significant negative relationship
2. 2.14; a significant positive relationship
3. -1.99; no significant relationship
4. 5.24; a significant negative relationship
5. 3.06; a significant positive relationship

Which of the following values is minimized through least squares estimation of 0

and 1?
1. ( y  yˆ )2
2. ( yi  y ) 2
3. ( yi  yˆ )2

21 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

J. Klup Applications Year 12 Workbook Unit 3 (J.klup)
100% (5)
J. Klup Applications Year 12 Workbook Unit 3 (J.klup)
274 pages
Understanding Correlation Measures
100% (1)
Understanding Correlation Measures
6 pages
Unit II Notes Correlation and Regression
No ratings yet
Unit II Notes Correlation and Regression
19 pages
Chapter 8. Correlation and Regression Analyses
No ratings yet
Chapter 8. Correlation and Regression Analyses
36 pages
Correlation Coefficient
No ratings yet
Correlation Coefficient
22 pages
Tugas 6 Analisis Multivariat Data Panel
No ratings yet
Tugas 6 Analisis Multivariat Data Panel
11 pages
Ch.-1 Correlation, Regression and Curve Fitting
No ratings yet
Ch.-1 Correlation, Regression and Curve Fitting
22 pages
Correlation and Regression...
No ratings yet
Correlation and Regression...
39 pages
Correlation
0% (1)
Correlation
22 pages
Linear Correlation Analysis Guide
No ratings yet
Linear Correlation Analysis Guide
11 pages
Bivariate Analysis
No ratings yet
Bivariate Analysis
10 pages
CH 6 10math
No ratings yet
CH 6 10math
27 pages
Week 1 - Introduction To Descriptive Statistics
No ratings yet
Week 1 - Introduction To Descriptive Statistics
58 pages
Modelling and Forecast
No ratings yet
Modelling and Forecast
19 pages
Scatter Plots and Correlation Analysis
No ratings yet
Scatter Plots and Correlation Analysis
12 pages
Correlation Analysis-Students NotesMAR 2023
No ratings yet
Correlation Analysis-Students NotesMAR 2023
24 pages
Homework 6 Solution
No ratings yet
Homework 6 Solution
12 pages
Corelation and Reg.-12-27
No ratings yet
Corelation and Reg.-12-27
16 pages
Correlation
No ratings yet
Correlation
12 pages
CORRELATION
No ratings yet
CORRELATION
9 pages
Chapter 4
No ratings yet
Chapter 4
27 pages
QMM 1
No ratings yet
QMM 1
18 pages
Math T STPM Sem 3 2022
No ratings yet
Math T STPM Sem 3 2022
2 pages
Understanding Correlation Basics
No ratings yet
Understanding Correlation Basics
40 pages
PPP Correlation BIOSTATISTICS
No ratings yet
PPP Correlation BIOSTATISTICS
14 pages
009 D 1 Correlation
No ratings yet
009 D 1 Correlation
29 pages
Correlation
No ratings yet
Correlation
17 pages
Correlation Analysis Lecture 7
No ratings yet
Correlation Analysis Lecture 7
7 pages
Module-4 (Correlation & Regression)
No ratings yet
Module-4 (Correlation & Regression)
30 pages
Correlation
No ratings yet
Correlation
31 pages
Covariance and Correlation: Parthiban Rajendran
No ratings yet
Covariance and Correlation: Parthiban Rajendran
31 pages
Corr. & Reg
No ratings yet
Corr. & Reg
8 pages
Correlation of Experimental Data
No ratings yet
Correlation of Experimental Data
8 pages
Correlation and Regression
No ratings yet
Correlation and Regression
31 pages
Statistics and Probability: Quarter 4 - (Week 6)
No ratings yet
Statistics and Probability: Quarter 4 - (Week 6)
8 pages
CH-5 Correlation and Regression
No ratings yet
CH-5 Correlation and Regression
20 pages
Correlation Ansd Simple Regression
No ratings yet
Correlation Ansd Simple Regression
27 pages
Stats Notes Part 2
No ratings yet
Stats Notes Part 2
13 pages
2023 11 26 155650mba BSM - U Ii - Dec2023
No ratings yet
2023 11 26 155650mba BSM - U Ii - Dec2023
50 pages
Correlation & Regression
No ratings yet
Correlation & Regression
12 pages
Covariance and Correlation: Parthiban Rajendran
No ratings yet
Covariance and Correlation: Parthiban Rajendran
17 pages
Lecture 5
No ratings yet
Lecture 5
30 pages
Correlation and Regression - Intro
No ratings yet
Correlation and Regression - Intro
24 pages
Correlation Regression
No ratings yet
Correlation Regression
20 pages
L 10 Correlation
No ratings yet
L 10 Correlation
57 pages
STA 201 Statistics Common Practical Manual 2019
No ratings yet
STA 201 Statistics Common Practical Manual 2019
144 pages
Lesson 18 - Correlation
No ratings yet
Lesson 18 - Correlation
3 pages
Senior High Stats: Pearson's r Guide
No ratings yet
Senior High Stats: Pearson's r Guide
9 pages
Chapter 4 (Correlation Part)
No ratings yet
Chapter 4 (Correlation Part)
16 pages
Chapter - 10.QM Sir Pac
No ratings yet
Chapter - 10.QM Sir Pac
8 pages
Chapter Four Correlation Analysis: Positive or Negative
No ratings yet
Chapter Four Correlation Analysis: Positive or Negative
15 pages
Correlation and Regression
No ratings yet
Correlation and Regression
23 pages
Oe Statistics Notes
No ratings yet
Oe Statistics Notes
32 pages
BONGGA Statistics-and-Probability 4Q SLM8
No ratings yet
BONGGA Statistics-and-Probability 4Q SLM8
10 pages
Unit 4
No ratings yet
Unit 4
23 pages
Session 11
No ratings yet
Session 11
23 pages
Unit IV Theory
No ratings yet
Unit IV Theory
23 pages
Chap 7 - Two Sample Test
No ratings yet
Chap 7 - Two Sample Test
59 pages
Session 7
No ratings yet
Session 7
18 pages
Session 9
No ratings yet
Session 9
19 pages
Exam 1 GNUR 405
No ratings yet
Exam 1 GNUR 405
16 pages
Lecture Notes #4 Correlation
No ratings yet
Lecture Notes #4 Correlation
8 pages
Correlation & Regression (Complete) .PDF Theory Module-6-B
100% (1)
Correlation & Regression (Complete) .PDF Theory Module-6-B
9 pages
9 Correlation
No ratings yet
9 Correlation
123 pages
Env. Pollution, DM, ME 202
No ratings yet
Env. Pollution, DM, ME 202
20 pages
Dispersion
No ratings yet
Dispersion
23 pages
Edexcel Gcse Statistics Coursework Example
100% (2)
Edexcel Gcse Statistics Coursework Example
6 pages
Module 4 - Statistical Methods
No ratings yet
Module 4 - Statistical Methods
28 pages
IMS 504-Week 4&5 New
No ratings yet
IMS 504-Week 4&5 New
40 pages
Chapter Five1
No ratings yet
Chapter Five1
8 pages
Descriptive Statistics & Probability Analysis
100% (1)
Descriptive Statistics & Probability Analysis
5 pages
Approach To Comparative Politics
No ratings yet
Approach To Comparative Politics
8 pages
Workbook Part 1 Revised 2023 Header
No ratings yet
Workbook Part 1 Revised 2023 Header
19 pages
4L - 01 Feb2020 1630 Hrs
No ratings yet
4L - 01 Feb2020 1630 Hrs
19 pages
Correlation
No ratings yet
Correlation
5 pages
Population Genetics
No ratings yet
Population Genetics
6 pages
Short-Term Interest Rate Forecasting
No ratings yet
Short-Term Interest Rate Forecasting
11 pages
Heritability
No ratings yet
Heritability
5 pages
Grasshopper
No ratings yet
Grasshopper
5 pages
Introduction to Statistics Basics
No ratings yet
Introduction to Statistics Basics
7 pages
Economics Analysis Tools Guide
No ratings yet
Economics Analysis Tools Guide
6 pages
01 - Common Pitfalls in Statistical Analysis Measures o
No ratings yet
01 - Common Pitfalls in Statistical Analysis Measures o
5 pages
TUGAS1 - ADS - RA - 121450027 - Dimas Rizky Ramadhani
No ratings yet
TUGAS1 - ADS - RA - 121450027 - Dimas Rizky Ramadhani
6 pages
Biostatistics Final Project
No ratings yet
Biostatistics Final Project
9 pages
Determination of Isoelectric Point
No ratings yet
Determination of Isoelectric Point
3 pages
STA1505 Assignment 2 - 2025
No ratings yet
STA1505 Assignment 2 - 2025
3 pages
Statistic Projects
No ratings yet
Statistic Projects
4 pages
Vce Physics Ga23
No ratings yet
Vce Physics Ga23
3 pages
Skewness and Kurtosis
No ratings yet
Skewness and Kurtosis
3 pages
Understanding Skewness in Statistics
No ratings yet
Understanding Skewness in Statistics
13 pages
ML - Lab-3.ipynb - Colab
No ratings yet
ML - Lab-3.ipynb - Colab
2 pages
SM Cie 3
No ratings yet
SM Cie 3
3 pages
Assignment 3 Research Methodlogy 20040621068 PDF
No ratings yet
Assignment 3 Research Methodlogy 20040621068 PDF
2 pages
CCGPS Math 6 Grade Unit 6 Study Guide - Statistics: Name: Period: Date
No ratings yet
CCGPS Math 6 Grade Unit 6 Study Guide - Statistics: Name: Period: Date
5 pages

01 Correlation (Revised)

Uploaded by

01 Correlation (Revised)

Uploaded by

Correlation Analysis

Noor Md. Rahmatullah

The distribution in which we consider a pairs of observations simultaneously is

Correlation may be linear or non-linear. If the amount of change in one

Karl Pearson’s coefficient of correlation:

Assumptions underlying Karl Pearson’s correlation coefficient:

Geometrical interpretation of product moment correlation coefficient:

y 2nd Quadrant 1st Quadrant

2 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Now think about the three types of correlation:

Example: Judging correlation from scatter plots

3 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

r is not affected by:

4 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

5 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

cdΣu i  u vi  v  cdΣu i  u vi  v 

Why correlation is important?

6 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Degrees Positive Negative

7 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

8 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Nuance: Why is the maximum r = 1?

9 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Correlation Does Not Imply Causation

10 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Fig. 2: Nonlinear relationship between x and y rxy  0 where y  x 2

If two variables are independent, their correlation coefficient is zero. Is the

Coefficient of Determination (r2)

The relationship between two variables can be represented by the overlap of

11 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Fig. 3:Example of the coefficient of determination (percent common variance r2x100).

Area of overlap; r2x100 (percent common variance)

Probable Error of correlation coefficient:

If r is the correlation coefficient in sample of n pairs of observations, then its standard

Probable error of correlation coefficient is given by

Probable error is a measure for testing the reliability of an observed correlation

12 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

 = Population correlation coefficient.

13 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Expected population correlation coefficient   r  P.E.  0.6  0.057 .

Example: Find the coefficient of correlation r, given that

Cov(x,y) = −16·5, Var(x) = 2·85 and Var(y) = 100

Solution:- Putting the given values in the formula,

14 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Comment on the following:

a) rxy  0  x and y are independent.

 Indicate the correct answer using tick (√) mark:

a). The coefficient of correlation will have positive sign when

1. x is increasing, y is decreasing 2. both x and y are increasing

3. x is decreasing, y increasing 4. there is no change in x and y.

b). The coefficient of correlation

1. can not be positive 2. cannot be negative

3. is always positive 4. can be both positive as well as negative.

c). The correlation coefficient

1. can take any value between –1 and +1 2. is always less than –1

3. is always more than +1 4. can not be zero.

d). Probable error of r is

15 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

1) 2.1 2) -0.1 3) 0.1 4) 2.0

16 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

(x, y) centroid of sample points

17 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

 Scatterplot (or scatter diagram) is a graphical representation in which the

1. There is a direct cause and effect relationship

Common Errors in Correlation

18 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

Correlation is a __________ type of statistical analysis.

The square of the correlation coefficient or r2 is called the

The assumptions of the simple linear regression model include:

19 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

20 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

In a regression experiment involving 102 observations an analyst has estimated 0 as

In a study of 42 observations, the sample covariance between two variables, X1 and

Which of the following values is minimized through least squares estimation of 0

21 D:\Class Notes\Correlation (Rahmat)\01 Correlation (Revised).doc

You might also like