CHAPTER 4
CORRELATION & REGRESSION ANALYSIS
Correlation Analysis
The study of the relationship between variables
or
A group of techniques to measure the strength of the
association between two variables
Note that here we consider two variables:
One is the independent variable.
The second is the dependent variable.
Variables
The Independent Variable provides the basis for
estimation. It is the predictor variable. It is plotted
on the X-axis.
The Dependent Variable is the variable being
predicted or estimated. It is plotted on the Y-axis.
Scatter Diagram
A chart that portrays the relationship between the
two variables.
Dependent variable – vertical (or Y) axis
Independent variable – horizontal (or X) axis
EXAMPLE 1
Mattamagadi, the president of India, is concerned
about the cost to students of textbooks. He believes
there is a relationship between the number of pages
in the text and the selling price of the book. To
provide insight into the problem he selects a sample
of eight textbooks currently on sale in the bookstore.
Draw a scatter diagram.
EXAMPLE 1
Book                      Pages   Price ($)
Operation Research          500          84
Basic Algebra               700          75
Economics                   800          99
Management Science          600          72
Business Management         400          69
Industrial Law              500          81
Human Resource              600          63
Information Technology      800          93
Example 1
Scatter diagram showing number of pages and price
[Figure: scatter diagram with number of pages on the X-axis (0 to 1000) and price in $ on the Y-axis (0 to 120)]
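As a rough illustration of how the scatter diagram is built, the sketch below places each (pages, price) pair from the Example 1 table on a coarse text grid, with price on the vertical axis and pages on the horizontal axis. The function name text_scatter and the bin widths x_step and y_step are assumptions made for this sketch only; a plotting library would normally be used instead.

```python
# Example 1 data: (pages, price in $) for the eight sampled textbooks.
books = [
    (500, 84), (700, 75), (800, 99), (600, 72),
    (400, 69), (500, 81), (600, 63), (800, 93),
]

def text_scatter(points, x_step=100, y_step=3):
    """Return text rows (highest price first) with '*' marking each data point.

    x_step and y_step are hypothetical bin widths chosen to suit this data set.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min = min(xs)
    n_cols = (max(xs) - x_min) // x_step + 1
    y_low = (min(ys) // y_step) * y_step
    y_high = (max(ys) // y_step) * y_step
    # Each point becomes a (column index, price bin) pair.
    marks = {((x - x_min) // x_step, (y // y_step) * y_step) for x, y in points}
    rows = []
    for y in range(y_high, y_low - 1, -y_step):
        cells = "".join("*" if (c, y) in marks else "." for c in range(n_cols))
        rows.append(f"{y:>4} | {cells}")
    return rows

# Columns run from 400 to 800 pages in steps of 100.
for row in text_scatter(books):
    print(row)
```

Reading the grid from left to right, the marks tend to sit higher as the page count grows, which is the upward pattern the scatter diagram is meant to reveal.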
Correlation Analysis
Correlation Analysis is a group of statistical
techniques used to measure the strength of the
association between two variables.
The Coefficient of Correlation (r) is a measure of
the strength of the relationship between two
variables.
The Coefficient of Correlation (r)
A measure of the strength of the linear relationship
between two variables.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect correlation.
Negative values indicate an inverse relationship.
Positive values indicate a direct relationship.
Values close to 0.0 indicate weak correlation.
The strength and direction of the correlation
r = -1.00: perfect negative correlation
r between -1.00 and -0.50: strong negative correlation
r = -0.50: moderate negative correlation
r between -0.50 and 0: weak negative correlation
r = 0: no correlation
r between 0 and 0.50: weak positive correlation
r = 0.50: moderate positive correlation
r between 0.50 and 1.00: strong positive correlation
r = 1.00: perfect positive correlation
Perfect Negative Correlation
[Figure: scatter plot of Y against X, both axes 0 to 10]
•Correlation Coefficient (r): -1
•Direction of Relationship:
As one variable increases, the other decreases.
•Graph:
A straight line with a downward slope.
Perfect Positive Correlation
[Figure: scatter plot of Y against X, both axes 0 to 10]
•Correlation Coefficient (r): +1
•Direction of Relationship:
As one variable increases, the other increases at a constant rate.
•Graph:
A straight line with an upward slope. The slope need not be 45 degrees; any rising straight line gives r = +1.
Zero Correlation
[Figure: scatter plot of Y against X, both axes 0 to 10]
•Correlation Coefficient (r): 0
•Direction of Relationship:
No linear relationship between the variables.
•Graph:
Data points are spread out randomly, with no
apparent trend or pattern.
EXAMPLE 1
Compute the correlation coefficient, interpret the strength.
Determine the coefficient of determination and interpret.
Coefficient of correlation (r)
Formula for the coefficient of correlation (or correlation coefficient):
\[
r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left[n(\sum X^2) - (\sum X)^2\right]\left[n(\sum Y^2) - (\sum Y)^2\right]}}
\]
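The formula can be checked on small data sets whose correlation is known in advance. The sketch below is one possible translation of the formula into Python; the function name pearson_r and the three tiny data sets are made up for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation computed from the raw sums in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustrative data (assumptions, not from the slides).
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # points on a falling line
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # points on a rising line
r_zero = pearson_r([1, 1, 3, 3], [1, 3, 1, 3])             # four corners of a square

print(r_negative, r_positive, r_zero)
```

The three results correspond to the perfect negative, perfect positive, and zero correlation cases. Note that the function fails with a division by zero if either variable is constant, because the denominator is then 0.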
Coefficient of Determination (r²)
The proportion of the total variation in the dependent
variable (Y) that is explained or accounted for, by the
variation in the independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of
the relationship between the variables.
Example 1 continued
Book                    Pages (X)  Price ($) (Y)        X²      Y²       XY
Operation Research            500             84    250000    7056    42000
Basic Algebra                 700             75    490000    5625    52500
Economics                     800             99    640000    9801    79200
Management Science            600             72    360000    5184    43200
Business Management           400             69    160000    4761    27600
Industrial Law                500             81    250000    6561    40500
Human Resource                600             63    360000    3969    37800
Information Technology        800             93    640000    8649    74400
Total                        4900            636   3150000   51606   397200
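\[
\begin{aligned}
r &= \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left[n(\sum X^2) - (\sum X)^2\right]\left[n(\sum Y^2) - (\sum Y)^2\right]}} \\
  &= \frac{8(397{,}200) - (4{,}900)(636)}{\sqrt{\left[8(3{,}150{,}000) - (4{,}900)^2\right]\left[8(51{,}606) - (636)^2\right]}} \\
  &= 0.614
\end{aligned}
\]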
EXAMPLE 1 continued
The correlation between the number of pages and the
selling price of the book is r = 0.614.
This indicates a moderate positive association between the variables.
Coefficient of determination: r² = 0.377
About 37.7% of the variation in the price of the book is accounted
for by variation in the number of pages.
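The arithmetic for Example 1 can be reproduced from the column totals of the table. The sketch below uses only those totals; the variable names are chosen for this illustration. Squaring the unrounded r gives a value near 0.3768, which is 0.377 when rounded to three places and 0.376 when truncated.

```python
import math

# Column totals from the Example 1 table.
n = 8
sum_x, sum_y = 4_900, 636
sum_x2, sum_y2, sum_xy = 3_150_000, 51_606, 397_200

numerator = n * sum_xy - sum_x * sum_y   # 61,200
x_part = n * sum_x2 - sum_x ** 2         # 1,190,000
y_part = n * sum_y2 - sum_y ** 2         # 8,352

r = numerator / math.sqrt(x_part * y_part)
r_squared = r ** 2  # share of the variation in price explained by pages

print(f"r = {r:.3f}, r^2 = {r_squared:.4f}")
```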
Regression Analysis
In regression analysis we use the independent
variable (X) to estimate the dependent variable (Y).
The relationship between the variables is assumed to be linear.
Both the independent and the dependent variable must be
measured on an interval or ratio scale.
The least squares criterion is used to determine the
regression equation.
Regression Equation
Y’ = a + bX,
where:
Y’ is the predicted (estimated) average value of Y for a
selected value of X.
a is the constant or Y-intercept.
It is the estimated value of Y’ when X = 0.
b is the slope of the line.
It shows the amount of change in Y’ for a change of one unit in X.
A positive value of b indicates a direct relationship between the two variables.
A negative value of b indicates an inverse relationship.
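Stated as a formula, the least squares criterion mentioned above selects the values of a and b that make the sum of the squared vertical distances between each observed Y and its predicted value Y’ as small as possible. A sketch in LaTeX notation, using the symbols of the regression equation:

```latex
\[
\min_{a,\,b} \; \sum (Y - Y')^2 \;=\; \min_{a,\,b} \; \sum \bigl(Y - (a + bX)\bigr)^2
\]
```

The distances are squared so that points above and below the line cannot cancel each other out.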
Regression Equation
a is computed using:
\[
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}
\]
b is computed using:
\[
b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}
\]
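Because the formula for a contains b, the slope has to be computed first. The sketch below is one way to write the two formulas as a Python function; the name least_squares_line and the four test points are assumptions made for illustration, not part of the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y' = a + bX fitted by least squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope first, because the intercept formula needs it.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of Y minus slope times mean of X.
    a = sum_y / n - b * sum_x / n
    return a, b

# Made-up points lying exactly on the line Y = 1 + 2X.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

For points that lie exactly on a line, the function recovers that line's intercept and slope, which is a quick sanity check on the formulas.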
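Develop a regression equation for the information given in
EXAMPLE 1 that can be used to estimate the selling price based
on the number of pages.
\[
b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}
  = \frac{8(397{,}200) - (4{,}900)(636)}{8(3{,}150{,}000) - (4{,}900)^2}
  = 0.05143
\]
\[
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}
  = \frac{636}{8} - 0.05143\left(\frac{4{,}900}{8}\right)
  = 48.0
\]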
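The regression equation is:
Y’ = 48.0 + 0.05143X
The line crosses the Y-axis at $48.
Taken literally, a book with no pages would cost $48. X = 0 lies outside the sampled range of 400 to 800 pages, so this value is an extrapolation.
The slope of the line is 0.05143: each additional page adds about $0.05 to the estimated price.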
The sign of the b value and the sign of r will always
be the same.
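Once the equation is known, estimating a price is a single substitution. The sketch below applies the Example 1 equation to a few page counts inside the sampled range of 400 to 800 pages; the function name predict_price and the chosen page counts are assumptions made for illustration.

```python
# Coefficients of the Example 1 regression equation, rounded as in the slides.
A = 48.0      # Y-intercept, in dollars
B = 0.05143   # slope, in dollars per page

def predict_price(pages):
    """Estimated selling price Y' for a book with the given number of pages."""
    return A + B * pages

# Page counts chosen inside the range covered by the sample.
for pages in (400, 600, 800):
    print(pages, round(predict_price(pages), 2))
```

These are estimates of the average price for books of a given length, so individual books in the sample can sit above or below them.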
Chapter 5: Scientific paper writing
Reading assignment