Course no: EMBA 502: Business Mathematics and
Statistics
Topic: Correlation and Regression
M. Amir Hossain PhD
Correlation and Regression
The most commonly used forms of bi-variate
statistical analysis
Useful in making business and economic decisions
Helpful in identifying the nature of relationship
among many business and economic variables
Recognize that there is a quantifiable relationship
between two or more variables
One variable depends on another and can be
determined by it
Correlation and Regression
The variables :
Students' GPAs and the amount of time they
spend studying
A firm's sales and its expenditure on advertisement
Dependent variable and Independent variable
Determination of dependent and independent
variable is crucial
Usually
X : Independent variable
Y : Dependent variable
Scatter Diagram
A plot of the paired observations of X and Y on a graph
Graphically shows the relationship between two
variables
Common practice is to place the dependent variable on
Y–axis and independent variable on X–axis
Ex. Sales and advertisement expenditures (in million
Taka) of a firm in different months are:
Sales 3 6 4 6 3 5 4
Advertisement 2 4 2 3 1 3 2.5
[Figure: Scatter Diagram]
Correlation Analysis
• Correlation Analysis: A group of statistical
techniques used to measure the strength of the
relationship (correlation) between two variables.
• Scatter Diagram: A chart that portrays the
relationship between the two variables of interest.
• Dependent Variable: The variable that is being
predicted or estimated.
• Independent Variable: The variable that provides
the basis for estimation. It is the predictor
variable.
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure
of the strength of the relationship between
two variables.
It requires interval or ratio-scaled data (variables).
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect (negative or
positive) linear correlation.
Values close to 0.0 indicate little or no linear correlation.
Negative values indicate an inverse relationship and
positive values indicate a direct relationship.
The Coefficient of Correlation, r
[Figure: four scatter plots of Y against X, both axes running from 0 to 10, titled Perfect Negative Correlation, Perfect Positive Correlation, Zero Correlation, and Strong Positive Correlation]
Formula for r
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
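The formula for r uses only five sums (∑X, ∑Y, ∑XY, ∑X², ∑Y²) and the sample size n. A minimal Python sketch is given here as an illustration; the function name `pearson_r` and the two small data sets are hypothetical, chosen so that the points lie exactly on a straight line.

```python
import math

def pearson_r(x, y):
    """Coefficient of correlation r from the raw-sums (computational) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two observations")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # r = [n*SXY - SX*SY] / sqrt([n*SX2 - SX^2] * [n*SY2 - SY^2])
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / denominator

# Hypothetical data lying exactly on a straight line.
x = [1, 2, 3, 4, 5]
r_direct = pearson_r(x, [2, 4, 6, 8, 10])   # Y rises with X: perfect positive
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls as X rises: perfect negative
print(r_direct, r_inverse)
```

The two calls should return values at the two ends of the range of r: 1 for the direct relationship and -1 for the inverse one.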
Coefficient of Determination
The Coefficient of Determination, r²: the proportion
of the total variation in the dependent variable Y
that is explained or accounted for by the variation
in the independent variable X.
The coefficient of determination is the square of the
coefficient of correlation, and ranges from 0 to 1.
Example: Sales and advertisement expense data,
r ≈ 0.889 and r² ≈ 0.791
About 79% of the variation in sales can be explained by the
variation in advertisement expenses
Example: The following sample observations were
randomly selected:
X:4 5 3 6 7
Y:4 6 5 7 8
Determine the coefficient of correlation and coefficient
of determination and interpret.
Ans: Here, ∑X = 25, ∑Y = 30, ∑X² = 135, ∑Y² = 190,
∑XY = 159, r = 0.9, r² = 0.81.
Comment: 81% of the variation in Y can be explained by
X
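As a check on the value of r, the sums listed in the answer can be substituted into the formula for r with n = 5. This is only a worked restatement of the same arithmetic:

```latex
r = \frac{5(159) - (25)(30)}{\sqrt{\left[5(135) - 25^2\right]\left[5(190) - 30^2\right]}}
  = \frac{795 - 750}{\sqrt{(675 - 625)(950 - 900)}}
  = \frac{45}{\sqrt{50 \times 50}}
  = 0.9
```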
Example: Dan Ireland, the student body president at Toledo State
University, is concerned about the cost of textbooks. To provide
insight into the problem he selects a sample of eight textbooks
currently on sale in the bookstores. He decides to study the
relationship between the number of pages in the text and the cost.
Compute the correlation coefficient.
No. of Pages Price (in $)
500 28
700 25
800 33
600 24
400 23
500 27
600 21
800 31
Calculate r and r², and comment on the relationship between the variables.
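One possible way to work this exercise is sketched below in Python. It uses the deviation form of r (cross-products and squared deviations about the means), which is algebraically equivalent to the raw-sums formula; the variable names are illustrative only.

```python
import math

# Paired sample from the exercise: number of pages (X) and price in $ (Y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [28, 25, 33, 24, 23, 27, 21, 31]

n = len(pages)
mean_x = sum(pages) / n
mean_y = sum(price) / n

# Sums of squared deviations and of cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in pages)
s_yy = sum((y - mean_y) ** 2 for y in price)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, price))

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.614, r^2 = 0.377
```

A value of r near 0.61 suggests a moderate direct relationship: longer books tend to cost more, but the number of pages accounts for only about 38% of the variation in price.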
Regression Analysis
In regression analysis an equation is developed to
express the relationship between dependent and
independent variables
The equation must be linear
Purpose: to determine the regression equation; it is used to
predict the value of the dependent variable (Y) based on
the independent variable (X).
Procedure: select a sample from the population and list the
paired data for each observation; draw a scatter diagram
to give a visual portrayal of the relationship; determine
the regression equation.
Regression Analysis
General form of linear regression model
Y = a + bX + e
Where,
Y : dependent variable
a : intercept term
b : slope of the line
X : independent variable
e : error term
Want to estimate a and b such that ∑e² is minimum
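A brief sketch of where the minimizing values of a and b come from, using the model Y = a + bX + e. The symbol S for the error sum of squares is introduced here only for illustration. Setting both partial derivatives of S to zero gives the two normal equations, which are then solved for a and b:

```latex
S(a, b) = \sum e^2 = \sum (Y - a - bX)^2

\frac{\partial S}{\partial a} = -2 \sum (Y - a - bX) = 0
\quad\Rightarrow\quad \sum Y = na + b \sum X

\frac{\partial S}{\partial b} = -2 \sum X (Y - a - bX) = 0
\quad\Rightarrow\quad \sum XY = a \sum X + b \sum X^2

b = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum X^2 - (\sum X)^2},
\qquad a = \bar{Y} - b \bar{X}
```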
Regression Analysis
Linear Regression Model
The relationship between X and Y is described by a
linear function
Changes in Y are assumed to be caused by changes in X
Linear regression population equation model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
Where $\beta_0$ and $\beta_1$ are the population model coefficients
and $\varepsilon_i$ is a random error term.
Simple Linear Regression Model
The population regression model:
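$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$ : dependent variable
$\beta_0$ : population Y intercept
$\beta_1$ : population slope coefficient
$X_i$ : independent variable
$\varepsilon_i$ : random error term
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component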
Simple Linear Regression Model
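[Figure: the population regression line with intercept $\beta_0$ and slope $\beta_1$, plotted as Y against X; at a given $X_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of Y and the predicted value of Y on the line, so that $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$]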
Regression Analysis
We estimate β0 and β1 such that ∑e² is minimum
The error sum of squares ∑e² will be minimum if
$$\hat{\beta}_0 = b_0 = \bar{y} - b_1 \bar{x}$$
$$\hat{\beta}_1 = b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
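The least squares estimates can be computed directly from the sample means and deviations. The sketch below is one minimal way to do so in Python; the function name `least_squares_fit` is hypothetical, and the data are the five sample observations from the earlier correlation example (X: 4 5 3 6 7, Y: 4 6 5 7 8).

```python
def least_squares_fit(x, y):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared errors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b1 = s_xy / s_xx
    # b0 = mean_y - b1 * mean_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = least_squares_fit([4, 5, 3, 6, 7], [4, 6, 5, 7, 8])
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 1.50, b1 = 0.90
```

The estimated slope is positive, in line with the positive r = 0.9 found for the same observations.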
Simple Linear Regression Equation
The simple linear regression equation provides an
estimate of the population regression line
$$\hat{y}_i = b_0 + b_1 x_i$$
$\hat{y}_i$ : estimated (or predicted) y value for observation i
$b_0$ : estimate of the regression intercept
$b_1$ : estimate of the regression slope
$x_i$ : value of x for observation i
Interpretation of the Slope and the Intercept
• b0 is the estimated average value of y when
the value of x is zero (if x = 0 is in the range
of observed x values)
• b1 is the estimated change in the average value
of y as a result of a one-unit change in x
Prediction
• The regression equation can be used to predict a
value for y, given a particular x
• For a specified value, $x_{n+1}$, the predicted value is
$$\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$$
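To illustrate prediction, the sketch below fits the least squares line to the sales and advertisement data from the scatter diagram example and evaluates it at a new x value. The value 3.5 million Taka and the helper name `predict` are hypothetical choices for the illustration.

```python
# Sales (Y) and advertisement expenditure (X), in million Taka, from the earlier example.
advertisement = [2, 4, 2, 3, 1, 3, 2.5]
sales = [3, 6, 4, 6, 3, 5, 4]

n = len(sales)
mean_x = sum(advertisement) / n
mean_y = sum(sales) / n

# Least squares estimates of the slope (b1) and the intercept (b0).
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(advertisement, sales))
      / sum((x - mean_x) ** 2 for x in advertisement))
b0 = mean_y - b1 * mean_x

def predict(x_new):
    """Predicted sales for a given advertisement expenditure."""
    return b0 + b1 * x_new

x_new = 3.5  # hypothetical expenditure, inside the observed range of 1 to 4
print(f"predicted sales = {predict(x_new):.2f} million Taka")
```

Keeping the new x inside the range of the observed values is a deliberate choice: the fitted line describes the sample, and there is no evidence that the same relationship holds outside that range.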
Regression Analysis
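The error sum of squares ∑e² will be minimum if
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
These estimates are known as least squares estimates
The sign of $b_1$ is the same as that of the correlation
coefficient r
Estimated value of dependent variable: $\hat{y} = b_0 + b_1 x$
Estimated error is: $e = y - \hat{y}$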
Regression Analysis
Y’ = a + bX is the average predicted value of Y for
any X.
a is the Y-intercept, or the estimated Y
value when X=0
b is the slope of the line, or the average
change in Y’ for each change of one unit in X
Regression Analysis (Coefficient of determination)
r² = Proportion of total variation in the dependent
variable explained by the independent variable.
From a linear regression model one can write
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}}$$
Regression Analysis (Coefficient of determination)
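Total Variation (TSS) = $\sum (y - \bar{y})^2$
Unexplained variation (ESS, error sum of squares) = $\sum (y - \hat{y})^2$
Explained variation (RSS, regression sum of squares) = $\sum (\hat{y} - \bar{y})^2$
Coefficient of determination (r²) = RSS / TSS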
Regression Analysis (Coefficient of determination)
$$r^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$
Standard error of estimate:
$$S_{Y \cdot X} = \sqrt{\frac{ESS}{n - 2}}$$
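The decomposition of variation can be checked numerically. The sketch below applies it to the textbook pages and price sample from the earlier exercise, using this section's naming: TSS for total variation, ESS for unexplained (error) variation, and RSS for explained (regression) variation. The variable names are illustrative, not part of the original material.

```python
import math

# Textbook sample: number of pages (X) and price in $ (Y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [28, 25, 33, 24, 23, 27, 21, 31]

n = len(pages)
mean_x = sum(pages) / n
mean_y = sum(price) / n

# Least squares line fitted to the sample.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, price))
      / sum((x - mean_x) ** 2 for x in pages))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * x for x in pages]

# TSS = total, ESS = unexplained (error), RSS = explained (regression).
tss = sum((y - mean_y) ** 2 for y in price)
ess = sum((y - y_hat) ** 2 for y, y_hat in zip(price, fitted))
rss = sum((y_hat - mean_y) ** 2 for y_hat in fitted)

r_squared = rss / tss
std_error = math.sqrt(ess / (n - 2))  # standard error of estimate

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"r^2 = {r_squared:.3f}, standard error of estimate = {std_error:.2f}")
```

For a least squares line, RSS + ESS should equal TSS up to rounding, so RSS/TSS and 1 - ESS/TSS give the same r².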