ECON2026:
STATISTICAL METHODS II
The University of the West Indies
Department of Economics
Slide 1
Topic 6B
Multiple Regression
Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Slide 2
Multiple Regression
We continue our study of regression analysis by
considering situations involving two or more
independent variables.
This subject area, called multiple regression
analysis, enables us to consider more factors and
thus obtain better estimates than are possible with
simple linear regression.
Slide 3
Multiple Regression Model
The equation that describes how the dependent
variable y is related to the independent variables
x1, x2, . . . xp and an error term is:
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term
Slide 4
Multiple Regression Equation
The equation that describes how the mean
value of y is related to x1, x2, . . . xp is:
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Slide 5
Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
A simple random sample is used to compute sample
statistics b0, b1, b2, . . . , bp that are used as the point
estimators of the parameters β0, β1, β2, . . . , βp.
Slide 6
Estimation Process
Multiple Regression Model
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
Multiple Regression Equation
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
The unknown parameters are β0, β1, β2, . . . , βp.
Sample data on x1, x2, . . . , xp and y are used to compute the
sample statistics b0, b1, b2, . . . , bp.
These sample statistics give the Estimated Multiple Regression
Equation and provide estimates of β0, β1, β2, . . . , βp.
Slide 7
Least Squares Method
Least Squares Criterion: choose b0, b1, b2, . . . , bp to
minimize the sum of squared residuals, Σ(yi − ŷi)².
Slide 8
Least Squares Method
Computation of Coefficient Values
The formulas for the regression coefficients
b0, b1, b2, . . . bp involve the use of matrix algebra.
We will rely on computer software packages to
perform the calculations.
The emphasis will be on how to interpret the
computer output rather than on how to make the
multiple regression computations.
Slide 9
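For reference, the matrix-algebra solution can be written as
b = (X'X)^-1 X'y, where X is the n × (p + 1) matrix whose first
column is all 1s and whose remaining columns hold the sample values
of x1, x2, . . . , xp, and y is the n × 1 vector of observed values
of the dependent variable.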
Multiple Regression Model
Example: Programmer Salary Survey
A software firm collected data for a sample of 20
computer programmers. A suggestion was made that
regression analysis could be used to determine if
salary was related to the years of experience and the
score on the firm’s programmer aptitude test.
The years of experience, score on the aptitude test,
and corresponding annual salary ($1000s) for the
sample of 20 programmers are shown on the next slide.
Slide 10
Multiple Regression Model
Exper. Test Salary Exper. Test Salary
(Yrs.) Score ($000s) (Yrs.) Score ($000s)
4 78 24.0 9 88 38.0
7 100 43.0 2 73 26.6
1 86 23.7 10 75 36.2
5 82 34.3 5 81 31.6
8 86 35.8 6 74 29.0
10 84 38.0 8 87 34.0
0 75 22.2 4 79 30.1
1 80 23.1 6 94 33.9
6 83 30.0 3 70 28.2
6 91 33.0 3 89 30.0
Slide 11
Multiple Regression Model
Suppose we believe that salary (y) is related to
the years of experience (x1) and the score on the
programmer aptitude test (x2) by the following
regression model:
y = β0 + β1x1 + β2x2 + ε
where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test
Slide 12
Solving for the Estimates of β0, β1, β2
Least Squares
Input Data: the 20 observations on x1, x2, and y
(4, 78, 24; 7, 100, 43; . . . ; 3, 89, 30) are entered into a
computer package for solving multiple regression problems.
Output: the package returns the least squares estimates
b0, b1, b2, together with R2 and other results.
Slide 13
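A minimal sketch of the kind of calculation such a package performs,
here in Python with numpy (used purely for illustration):

import numpy as np

# Programmer salary data from the slides: years of experience (x1),
# aptitude test score (x2), and annual salary in $1000s (y).
x1 = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
x2 = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
      88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
y = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
     38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

# Design matrix X = [1, x1, x2]; least squares solves min ||y - Xb||^2,
# equivalently b = (X'X)^-1 X'y.
X = np.column_stack([np.ones(len(y)), x1, x2])
b, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
print(b)  # roughly [3.174, 1.404, 0.251], i.e. b0, b1, b2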
Solving for the Estimates of β0, β1, β2
Regression Equation Output
Predictor Coef SE Coef t stat p
Constant 3.17394 6.15607 0.5156 0.61279
Experience 1.4039 0.19857 7.0702 1.9E-06
Test Score 0.25089 0.07735 3.2433 0.00478
Slide 14
Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of dollars.
Slide 15
Interpreting the Coefficients
In multiple regression analysis, we interpret each
regression coefficient as follows:
bi represents an estimate of the change in y
corresponding to a 1-unit increase in xi when all
other independent variables are held constant.
Slide 16
Interpreting the Coefficients
b1 = 1.404
Salary is expected to increase by $1,404 for
each additional year of experience (when the variable
score on programmer aptitude test is held constant).
Slide 17
Interpreting the Coefficients
b2 = 0.251
Salary is expected to increase by $251 for each
additional point scored on the programmer aptitude
test (when the variable years of experience is held
constant).
Slide 18
Multiple Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide 19
Multiple Coefficient of Determination
ANOVA Output
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 500.3285 250.164 42.76 0.000
Residual Error 17 99.45697 5.850
Total 19 599.7855
The SS column gives SSR = 500.3285 and SST = 599.7855.
Slide 20
Multiple Coefficient of Determination
R2 = SSR/SST
R2 = 500.3285/599.7855 = .83418
Slide 21
Adjusted Multiple Coefficient
of Determination
Adding independent variables, even ones that are
not statistically significant, causes the prediction
errors to become smaller, thus reducing the sum of
squares due to error, SSE.
Because SSR = SST – SSE, when SSE becomes smaller,
SSR becomes larger, causing R2 = SSR/SST to
increase.
The adjusted multiple coefficient of determination
compensates for the number of independent
variables in the model.
Slide 22
Adjusted Multiple Coefficient
of Determination
Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1)
Adjusted R2 = 1 − (1 − .834179)(20 − 1)/(20 − 2 − 1) = .8147
Slide 23
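A short Python sketch (illustrative only) of the R2 and adjusted R2
calculations, using the ANOVA quantities reported above:

# SSR, SST, n, and p are taken from the ANOVA output and the model.
SSR, SST = 500.3285, 599.7855
n, p = 20, 2

r2 = SSR / SST                                 # about .8342
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # about .8147

print(round(r2, 4), round(adj_r2, 4))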
Assumptions About the Error Term ε
The error ε is a random variable with mean of zero.
The variance of ε, denoted by σ2, is the same for all
values of the independent variables.
The values of ε are independent.
The error ε is a normally distributed random variable
reflecting the deviation between the y value and the
expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.
Slide 24
Testing for Significance
In simple linear regression, the F and t tests provide
the same conclusion.
In multiple regression, the F and t tests have different
purposes.
Slide 25
Testing for Significance: F Test
The F test is used to determine whether a significant
relationship exists between the dependent variable
and the set of all the independent variables.
The F test is referred to as the test for overall
significance.
Slide 26
Testing for Significance: t Test
If the F test shows an overall significance, the t test is
used to determine whether each of the individual
independent variables is significant.
A separate t test is conducted for each of the
independent variables in the model.
We refer to each of these t tests as a test for individual
significance.
Slide 27
Testing for Significance: F Test
Hypotheses H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters
is not equal to zero.
Test Statistic F = MSR/MSE
Rejection Rule Reject H0 if p-value < α or if F > Fα ,
where Fα is based on an F distribution
with p d.f. in the numerator and
n - p - 1 d.f. in the denominator.
Slide 28
F Test for Overall Significance
Hypotheses H0: β1 = β2 = 0
Ha: One or both of the parameters
is not equal to zero.
Rejection Rule For α = .05 and d.f. = 2, 17; F.05 = 3.59
Reject H0 if p-value < .05 or F > 3.59
Slide 29
F Test for Overall Significance
ANOVA Output
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 500.3285 250.164 42.76 0.000
Residual Error 17 99.45697 5.850
Total 19 599.7855
p-value used to test for
overall significance
Slide 30
F Test for Overall Significance
Test Statistic F = MSR/MSE
= 250.16/5.85 = 42.76
Conclusion p-value < .05, so we can reject H0.
(Also, F = 42.76 > 3.59)
Slide 31
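A short Python sketch of the F test computations (illustrative only;
it assumes scipy is available), using the ANOVA values above:

from scipy import stats

# MSR, MSE, p, and n come from the ANOVA output and the model.
MSR, MSE = 250.164, 5.850
p, n = 2, 20

F = MSR / MSE                               # about 42.76
F_crit = stats.f.ppf(0.95, p, n - p - 1)    # about 3.59
p_value = stats.f.sf(F, p, n - p - 1)       # well below .05

print(round(F, 2), round(F_crit, 2), p_value)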
Testing for Significance: t Test
Hypotheses H0: βi = 0
Ha: βi ≠ 0
Test Statistic t = bi/sbi, where sbi is the
estimated standard deviation of bi.
Rejection Rule Reject H0 if p-value < α or
if t < -tα/2 or t > tα/2 where tα/2
is based on a t distribution
with n - p - 1 degrees of freedom.
Slide 32
t Test for Significance
of Individual Parameters
Hypotheses H0: βi = 0
Ha: βi ≠ 0
Rejection Rule For α = .05 and d.f. = 17, t.025 = 2.11
Reject H0 if p-value < .05, or
if t < -2.11 or t > 2.11
Slide 33
t Test for Significance
of Individual Parameters
Regression Equation Output
Predictor Coef SE Coef t stat p
Constant 3.17394 6.15607 0.5156 0.61279
Experience 1.4039 0.19857 7.0702 1.9E-06
Test Score 0.25089 0.07735 3.2433 0.00478
t statistic and p-value used to test for the
individual significance of “Experience”
Slide 34
t Test for Significance
of Individual Parameters
Test Statistics t = b1/sb1 = 1.4039/.19857 = 7.07
t = b2/sb2 = .25089/.07735 = 3.24
Conclusions Reject both H0: β1 = 0 and H0: β2 = 0.
Both independent variables are
significant.
Slide 35
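A short Python sketch of the individual t tests (illustrative only;
it assumes scipy is available), using the coefficient estimates and
standard errors from the regression output:

from scipy import stats

b1, se_b1 = 1.4039, 0.19857
b2, se_b2 = 0.25089, 0.07735
df = 17  # n - p - 1 = 20 - 2 - 1

t1 = b1 / se_b1                   # about 7.07
t2 = b2 / se_b2                   # about 3.24
t_crit = stats.t.ppf(0.975, df)   # about 2.11

# Two-tailed p-values; both are well below alpha = .05.
p1 = 2 * stats.t.sf(abs(t1), df)
p2 = 2 * stats.t.sf(abs(t2), df)
print(round(t1, 2), round(t2, 2), round(t_crit, 2), p1, p2)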
End of Topic 6, Part B
Slide 36