Statistical Decision Making
AQ054-3-M and Version 3
Regression Analysis
Topic & Structure of the Lesson
Correlation Analysis
Linear Regression analysis
Multiple Regression analysis
Inferences for Linear Regression
SPSS Application
AQ054-3-M-Statistical Decision Making Regression Analysis Slide <2> of 9
Learning Outcomes
• At the end of this topic, you should be able to:
apply correlation to determine the relationship between different variables
use SPSS to display results and relationships
Key Terms You Must Be Able To Use
• If you have mastered this topic, you should be able to use the following
terms correctly in your assignments and exams:
Correlation Analysis
Introduction
Correlation & Regression are concerned with measuring the linear relationship between two variables.
A scattergram (scatter diagram) is used to illustrate any relationship between two variables.
Correlation Analysis
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
It is concerned only with the strength of the relationship
No causal effect is implied
Scatter Diagrams
Types of Regression Models
Linear Regression Analysis
Three common methods are used to determine a regression line:
inspection method
semi-average method
least squares method
Linear Regression Analysis
Population Linear Regression
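The equation on this slide did not survive extraction. One standard way to write the population simple linear regression model uses the same notation as the multiple regression model later in this topic. Here β0 is the population intercept, β1 is the population slope and ε is a random error term:

$$y = \beta_0 + \beta_1 x + \varepsilon$$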
Linear Regression Analysis
Population Linear Regression (continued)
Linear Regression Analysis
Estimated Regression Model
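The slide's formula is missing from the extraction. A standard form of the estimated model replaces the unknown β0 and β1 with the sample estimates b0 and b1, the notation used in the interpretation slides later in this topic. In it, ŷ is the estimated (predicted) value of y for a given x:

$$\hat{y} = b_0 + b_1 x$$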
Linear Regression Analysis
Least squares method
the standard method of obtaining a regression line.
For any set of bivariate data, there are two regression lines that can be obtained:
x on y regression line: used for estimating x given a value of y
y on x regression line: used for estimating y given a value of x
Note that in this syllabus, only the y on x regression line is dealt with.
Linear Regression Analysis
The Least Squares Equation
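The slide's formulas are not recoverable from the extraction. A standard form of the least squares estimates for the y on x line is shown below. It is written with the same sums (Σx, Σy, Σxy, Σx²) as the product moment correlation formula in this topic. The estimates choose b0 and b1 to minimise the sum of squared vertical deviations Σ(y − ŷ)²:

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$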
Quick Review Question
Example:
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (y) = house price in $1000s
Independent variable (x) = square feet
Sample Data
Regression Using SPSS
$$\text{house price} = 98.248 + 0.11\,(\text{square feet})$$
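As a cross-check on the SPSS output, you can compute the least squares line by hand. The input is the ten-house sample listed in the Quick Review Question slides of this topic. The following is a minimal Python sketch that applies the standard computational formulas for the y on x line. The function name `least_squares` is our own and is not part of SPSS or any library.

```python
# House price sample from this topic: price in $1000s (y), size in square feet (x).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) for the y on x least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from the raw-sum (computational) formula.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The fitted line passes through the point of means (x-bar, y-bar).
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(sqft, prices)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

Up to rounding, the result should match the coefficients reported by SPSS (98.248 and 0.10977).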
Graphical Presentation
House price model: scatter plot and regression line (House Price in $1000s against Square Feet; intercept = 98.248, slope = 0.10977)
$$\text{house price} = 98.24833 + 0.10977\,(\text{square feet})$$
Correlation Analysis
It is a technique used to measure the strength of the relationship between two variables by measuring the degree of ‘scatter’ of the data values.
The less scattered the data values are, the stronger the correlation.
Two types of correlation:
Positive (direct)
Negative (inverse)
Measures of correlation
Product moment correlation coefficient
Coefficient of determination
Spearman rank correlation coefficient
Product moment correlation coefficient, r
It measures the extent to which two variables
move in sympathy with or in opposition to one
another.
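$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$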
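The formula above translates directly into code. The following is a minimal Python sketch, applied to the house price sample used in this topic. The helper name `pearson_r` is our own.

```python
import math

# House price sample from this topic: price in $1000s (y), size in square feet (x).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def pearson_r(x, y):
    """Product moment correlation coefficient from the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(sqft, prices)
print(f"r = {r:.4f}")
```

For this sample, r comes out at about 0.762, which indicates a fairly strong positive correlation between size and price.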
The correlation coefficient, r, lies between -1 and +1.
When r = 0, it signifies that no linear correlation is present
When r = +1, it signifies perfect positive correlation
When r = -1, it signifies perfect negative correlation
The further r is from 0, the stronger the correlation.
Examples of approximate r values
Coefficient of determination, r²
It indicates the proportion of variance in the dependent variable that is explained statistically by knowledge of the independent variable, and vice versa.
Notice that, since $-1 \le r \le +1$, it follows that $0 \le r^2 \le +1$.
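As a worked illustration, take the ten-house sample used in this topic. Our own arithmetic, not shown on the slides, gives r ≈ 0.7621, so:

$$r^2 \approx (0.7621)^2 \approx 0.5808$$

That is, roughly 58% of the variation in house price is explained statistically by variation in square footage.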
Coefficient of Determination, R²
The coefficient of determination is the
portion of the total variation in the
dependent variable that is explained by
variation in the independent variable
The coefficient of determination is also
called R-squared and is denoted as R²
where $0 \le R^2 \le 1$
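The "portion of total variation explained" can be written as a ratio of sums of squares: R² = SSR / SST. Here SSR is the variation of the fitted values about the mean of y, and SST is the total variation of y about its mean. The labels SSR and SST are standard notation but do not appear on these slides. The following is a minimal Python sketch for the house price sample:

```python
# House price sample from this topic: price in $1000s (y), size in square feet (x).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Least squares slope and intercept from deviations about the means.
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

# Total variation (SST) and the part explained by the regression line (SSR).
sst = sum((y - y_bar) ** 2 for y in prices)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
r_squared = ssr / sst
print(f"R-squared = {r_squared:.4f}")
```

For simple (one-variable) regression, this ratio equals the square of the product moment correlation coefficient r.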
Examples of Approximate R² Values
0 < R² < 1 (figure: scatter plots of y against x)
Weaker linear relationship between x and y:
Some but not all of the variation in y is explained by variation in x
R² = 0 (figure: scatter plot of y against x)
No linear relationship between x and y:
The value of y does not depend on x. (None of the variation in y is explained by variation in x.)
Spearman rank correlation coefficient, rs
It can be used:
as an approximation to the product moment
coefficient
with non-numeric data that can be ranked
Procedure for obtaining rs
Rank the x values, rx
Rank the y values, ry
For each pair of ranks, calculate $d^2 = (r_x - r_y)^2$
Calculate $\sum d^2$
The value of the rank correlation coefficient can then be calculated as below:
$$r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
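The procedure above can be sketched in Python as follows. Two assumptions apply. First, values are ranked in ascending order (1 = smallest); the slides do not fix a direction, and any direction gives the same r_s as long as it is used for both variables. Second, tied values share the mean of their ranks; when ties are present, the formula is only an approximation. The helper names `average_ranks` and `spearman_rs` are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over a run of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical example with no ties: the y ranks differ from the x ranks by two swaps.
rs = spearman_rs([10, 20, 30, 40, 50], [2, 1, 4, 3, 5])
print(rs)
```

In this example, Σd² = 4 and n = 5, so r_s = 1 − 24/120 = 0.8.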
Practical difficulties in drawing conclusions from the correlation coefficient
A high correlation coefficient does not necessarily imply that the variables are related to one another (spurious correlation).
A low correlation coefficient between two variables does not necessarily mean that there is little relationship between them; there may also be additional factors exerting an influence.
The product moment correlation coefficient is also a measure of linear relationship only, and will give a low value if the relationship, although close, is curvilinear.
When the sample size is small, results need to be treated with caution.
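The curvilinear point can be shown with a small constructed example (hypothetical data, not from the slides). Here y is determined exactly by x through y = x². Even so, the product moment correlation is zero because the relationship is U-shaped rather than linear.

```python
import math


def pearson_r(x, y):
    """Product moment correlation coefficient from the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: y depends exactly on x, but along a U-shaped curve.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)
```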
Interpretation of outcomes
Interpretation of the Slope & the Intercept
b0 is the estimated average value of y when
the value of x is zero
b1 is the estimated change in the average
value of y as a result of a one-unit change in x
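Applying these interpretations to the house price model (our own arithmetic; prices are in $1000s):

$$b_1 = 0.10977 \;\Rightarrow\; 0.10977 \times \$1000 = \$109.77 \text{ increase in estimated average price per additional square foot}$$

The intercept b0 = 98.248 is the estimated average price (in $1000s) when x = 0. However, the sampled houses range from 1100 to 2450 square feet, so x = 0 lies far outside the observed data and the intercept should not be read literally here.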
Interpretation of the Intercept, b0
Interpretation of the Slope Coefficient, b1
Inferences for Linear Regression
Inference about the slope: t test for a population slope
Is there a linear relationship between x and y?
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)
Test statistic
$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
b1 = sample regression slope coefficient
β1 = hypothesized slope
sb1 = estimator of the standard error of the slope
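You can reproduce the t test for the house price sample with the sketch below. The sketch relies on two standard formulas that the slides do not spell out. The first is s_e = √(SSE / (n − 2)), the standard error of the estimate. The second is s_b1 = s_e / √Σ(x − x̄)², the standard error of the slope. The critical value 2.3060 for d.f. = 8 is the one used in the worked example; it is hard-coded so that only the standard library is needed.

```python
import math

# House price sample from this topic: price in $1000s (y), size in square feet (x).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
n = len(sqft)

x_bar = sum(sqft) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
b1 = s_xy / s_xx             # sample slope
b0 = y_bar - b1 * x_bar      # sample intercept

# Standard error of the estimate, using n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, prices))
s_e = math.sqrt(sse / (n - 2))
# Estimated standard error of the slope.
s_b1 = s_e / math.sqrt(s_xx)

beta1_h0 = 0.0               # hypothesized slope under H0
t_stat = (b1 - beta1_h0) / s_b1
t_crit = 2.3060              # two-tailed 5% critical value for d.f. = 8 (from t tables)
reject_h0 = abs(t_stat) > t_crit
print(f"s_b1 = {s_b1:.5f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```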
Quick Review Question
House Price in $1000s (y) | Square Feet (x)
245 | 1400
312 | 1600
279 | 1700
308 | 1875
199 | 1100
219 | 1550
405 | 2350
324 | 2450
319 | 1425
255 | 1700
Estimated regression equation:
$$\text{house price} = 98.25 + 0.1098\,(\text{sq. ft.})$$
The slope of this model is 0.1098.
Does the square footage of the house affect its sales price?
H0: β1 = 0
HA: β1 ≠ 0
d.f. = 10 - 2 = 8
α/2 = 0.025 in each tail; critical values ±tα/2 = ±2.3060
From Excel output:
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 98.24833 | 58.03348 | 1.69296 | 0.12892
Square Feet | 0.10977 | 0.03297 | 3.32938 | 0.01039
Test Statistic (from table): t = 3.329
Decision: since t = 3.329 > 2.3060, the test statistic falls in the upper rejection region. Reject H0.
Conclusion: There is sufficient evidence that square footage affects house price.
Regression Analysis for Description
Confidence Interval Estimate of the Slope:
$$b_1 \pm t_{\alpha/2}\, s_{b_1}, \qquad \text{d.f.} = n - 2$$
Excel Printout for House Prices:
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 98.24833 | 58.03348 | 1.69296 | 0.12892 | -35.57720 | 232.07386
Square Feet | 0.10977 | 0.03297 | 3.32938 | 0.01039 | 0.03374 | 0.18580
At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
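The interval can be checked by hand from the Square Feet row of the printout. The following minimal Python sketch plugs the printed slope and standard error into b1 ± tα/2 · sb1, using tα/2 = 2.3060 for d.f. = 8 (the value from the worked t test).

```python
# Figures taken from the Excel printout for the Square Feet row.
b1 = 0.10977           # sample slope
s_b1 = 0.03297         # standard error of the slope
t_half_alpha = 2.3060  # t value for alpha/2 = 0.025 with d.f. = 10 - 2 = 8

margin = t_half_alpha * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")

# An interval that excludes 0 is consistent with rejecting H0: beta1 = 0 at the 5% level.
contains_zero = lower <= 0 <= upper
```

Because the interval lies entirely above zero, it agrees with the t test conclusion that square footage affects house price.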
P-value
An alternative to comparing t-values when determining whether or not to reject H0.
For a two-tailed hypothesis, we can reject H0 if the reported two-tailed significance level in our output is less than 0.05
For a one-tailed hypothesis, we can reject H0 if one-half of the reported two-tailed significance level is less than 0.05 (and the sample estimate lies in the direction stated by H1)
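The decision rule above is sketched below as a small helper. The function `reject_h0` is hypothetical and is not an SPSS or Excel feature. For a one-tailed test it halves the reported two-tailed p-value, which assumes the sample estimate already points in the direction stated by H1. The p-values in the usage lines come from the house price printout.

```python
def reject_h0(p_two_tailed, alpha=0.05, one_tailed=False):
    """Hypothetical helper: decide whether to reject H0 from a reported two-tailed p-value.

    For a one-tailed test the two-tailed p-value is halved; this assumes the
    sample estimate lies in the direction stated by the alternative hypothesis.
    """
    p = p_two_tailed / 2 if one_tailed else p_two_tailed
    return p < alpha


print(reject_h0(0.01039))  # Square Feet slope: reject H0
print(reject_h0(0.12892))  # Intercept: do not reject H0
```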
Quick Review Question
Example:
House Price in $1000s (y) | Square Feet (x)
245 | 1400
312 | 1600
279 | 1700
308 | 1875
199 | 1100
219 | 1550
405 | 2350
324 | 2450
319 | 1425
255 | 1700
Estimated regression equation:
$$\text{house price} = 98.25 + 0.1098\,(\text{sq. ft.})$$
Predict the price for a house with 2000 square feet
$$\begin{aligned}
\text{house price} &= 98.25 + 0.1098\,(\text{sq. ft.})\\
&= 98.25 + 0.1098\,(2000)\\
&= 317.85
\end{aligned}$$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
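The same prediction as a minimal Python sketch; the function name `predict_price` is hypothetical. The coefficients are the rounded values of the estimated regression equation. Note that 2000 square feet lies inside the sampled range of 1100 to 2450 square feet, so this prediction is an interpolation rather than an extrapolation.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s from the estimated regression equation."""
    return b0 + b1 * square_feet


price_thousands = predict_price(2000)
print(f"{price_thousands:.2f} ($1000s) = ${price_thousands * 1000:,.0f}")
```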
Multiple Regression Analysis
is a practical extension of the simple regression model.
allows us to build a model with several independent variables instead of just one.
is very complex and normally tackled by computer.
is widely used for intermediate-term forecasting.
Multiple Regression Analysis
We may write the population regression model as
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
with k regressor variables.
The parameters βj, j = 0, 1, …, k, are called the regression coefficients. The parameter βj represents the expected change in the response Y per unit change in xj when all the remaining regressors are held constant.
Example of an estimated multiple regression equation:
$$\hat{Y} = b_0 + b_1 x_1 + b_2 x_2$$
Ŷ = estimated (predicted) value of the dependent variable
b0 = a constant (the intercept)
x1 and x2 = values of the two independent variables
b1 and b2 = coefficients for the two independent variables
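Multiple regression is normally done by computer. The following is a minimal Python sketch that fits Ŷ = b0 + b1x1 + b2x2 by least squares with NumPy's `np.linalg.lstsq`. The data are hypothetical, constructed so that y = 1 + 2x1 + 3x2 exactly, so the fitted coefficients should come back as 1, 2 and 3. In this course, SPSS would be used instead.

```python
import numpy as np

# Hypothetical data, constructed so that y = 1 + 2*x1 + 3*x2 exactly.
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones for b0, then one column per independent variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates (b0, b1, b2).
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
y_hat = X @ coef  # fitted values Y-hat
print(f"Y-hat = {b0:.3f} + {b1:.3f} x1 + {b2:.3f} x2")
```

With real data, the fit will not be exact. Each bj is then read as the estimated change in Y per unit change in xj, with the other regressor held constant.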
Quick Review Question
Summary of Main Teaching Points
Question and Answer Session
Q&A
What we will cover next
The End