Chapter 14
Inference on the Least-Squares Regression Model and Multiple Regression
© 2010 Pearson Prentice Hall. All rights reserved
Section 14.1
Testing the Significance of the Least-Squares Regression Model
Objectives
1. State the requirements of the least-squares regression model
2. Compute the standard error of the estimate
3. Verify that the residuals are normally distributed
4. Conduct inference on the slope
5. Construct a confidence interval about the slope of the least-squares regression model
Objective 1
• State the Requirements of the Least-Squares
Regression Model
Requirement 1 for Inference on the
Least-Squares Regression Model
For any particular value of the explanatory variable x,
the mean of the corresponding responses in the
population depends linearly on x. That is,
\mu_{y|x} = \beta_1 x + \beta_0
for some numbers β₀ and β₁, where μ_{y|x} represents the
population mean response when the value of the
explanatory variable is x.
Requirement 2 for Inference on the
Least-Squares Regression Model
The response variables are normally distributed with
mean μ_{y|x} = β₁x + β₀ and standard deviation σ.
“In Other Words”
When doing inference on the least-squares regression
model, we require (1) that for any value of the explanatory
variable, x, the mean of the response variable, y, depends on
x through a linear equation, and (2) that the response
variable, y, is normally distributed with a constant standard
deviation, σ. The mean increases or decreases at a constant
rate determined by the slope, while the variance remains
constant.
The least-squares regression model is given by
y_i = β₁x_i + β₀ + ε_i
where
• y_i is the value of the response variable for the ith individual
• β₀ and β₁ are the parameters to be estimated based on sample data
• x_i is the value of the explanatory variable for the ith individual
• ε_i is a random error term with mean 0 and variance σ²_{ε_i} = σ²; the error terms are independent
• i = 1, …, n, where n is the sample size (number of ordered pairs in the data set)
Objective 2
• Compute the Standard Error of the Estimate
The standard error of the estimate, s_e, is found using the formula

s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}}
Parallel Example 2: Compute the Standard Error
Compute the standard error of the estimate for the
drilling data, which are presented on the next slide.
Depth at Which Drilling Begins, x (in feet) | Time to Drill 5 Feet, y (in minutes)
35  | 5.88
50  | 5.99
75  | 6.74
95  | 6.10
120 | 7.47
130 | 6.93
145 | 6.42
155 | 7.97
160 | 7.92
175 | 7.62
185 | 6.89
190 | 7.90
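To make the computation concrete, here is a minimal NumPy sketch (ours, not the text's) that fits the least-squares line to the drilling data above and computes the standard error of the estimate:

```python
import numpy as np

# Drilling data from the table above
x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

b1, b0 = np.polyfit(x, y, 1)                   # least-squares slope and intercept
residuals = y - (b0 + b1 * x)                  # y_i - y-hat_i
n = len(x)
se = np.sqrt(np.sum(residuals**2) / (n - 2))   # note: divide by n - 2, not n
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, se = {se:.4f}")
```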
CAUTION!
Be sure to divide by n-2 when computing the
standard error of the estimate.
Objective 3
• Verify That the Residuals Are Normally
Distributed
Parallel Example 4: Verifying That the Residuals Are Normally Distributed
Verify that the residuals from the drilling
example are normally distributed.
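A quick way to check this requirement is a normal probability plot of the residuals; here is a sketch using SciPy's probplot (our choice of tool, not the text's):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Drilling data and residuals, as in the earlier sketch
x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Points that track the reference line support the normality requirement
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal probability plot of drilling residuals")
plt.show()
```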
Objective 4
• Conduct Inference on the Slope
Hypothesis Test Regarding the Slope
Coefficient, β₁
To test whether two quantitative variables are linearly
related, we use the following steps, provided that
1. the sample is obtained using random sampling, and
2. the residuals are normally distributed with
constant error variance.
Step 1: Determine the null and alternative
hypotheses. The hypotheses can be
structured in one of three ways:
Two-Tailed: H₀: β₁ = 0, H₁: β₁ ≠ 0
Left-Tailed: H₀: β₁ = 0, H₁: β₁ < 0
Right-Tailed: H₀: β₁ = 0, H₁: β₁ > 0
Step 2: Select a level of significance, α, depending
on the seriousness of making a Type I
error.
Step 3: Compute the test statistic
t_0 = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{b_1}{s_{b_1}}

which follows Student's t-distribution with n − 2 degrees of
freedom. Remember, when computing the test statistic, we
assume the null hypothesis to be true. So, we assume that
β₁ = 0.
Classical Approach
Step 4: Use Table VI to determine the critical
value using n-2 degrees of freedom.
Classical Approach
[Figures: rejection regions and critical values for the two-tailed, left-tailed, and right-tailed tests]
Classical Approach
Step 5: Compare the critical value with the test
statistic.
P-Value Approach
Step 4: Use Table VI to estimate the P-value using
n-2 degrees of freedom.
P-Value Approach
[Figures: P-value computations for the two-tailed, left-tailed, and right-tailed tests]
P-Value Approach
Step 5: If the P-value < α, reject the null hypothesis.
Step 6: State the conclusion.
CAUTION!
Before testing H₀: β₁ = 0, be sure to draw a residual
plot to verify that a linear model is appropriate.
Parallel Example 5: Testing for a Linear Relation
Test the claim that there is a linear relation between drill
depth and drill time at the α = 0.05 level of significance
using the drilling data.
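As a sketch of this test in code (our own illustration, not the text's worked solution), the test statistic and its two-tailed P-value can be computed directly:

```python
import numpy as np
from scipy import stats

x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
se = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))
sb1 = se / np.sqrt(np.sum((x - x.mean())**2))    # standard error of the slope

t0 = b1 / sb1                                     # under H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)       # two-tailed P-value
print(f"t0 = {t0:.3f}, P-value = {p_value:.4f}")
```

For a one-tailed test, use stats.t.sf(t0, df=n - 2) or stats.t.cdf(t0, df=n - 2) instead of doubling.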
Objective 5
• Construct a Confidence Interval about the
Slope of the Least-Squares Regression Model
Confidence Intervals for the Slope of the
Regression Line
A (1 − α)·100% confidence interval for the slope of the true
regression line, β₁, is given by the following formulas:

Lower bound: b_1 - t_{\alpha/2} \cdot \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = b_1 - t_{\alpha/2} \cdot s_{b_1}

Upper bound: b_1 + t_{\alpha/2} \cdot \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = b_1 + t_{\alpha/2} \cdot s_{b_1}

Here, t_{α/2} is computed using n − 2 degrees of freedom.
Note: The confidence interval for β₁ can be constructed
only if the data are randomly obtained, the residuals are
normally distributed, and the error variance is constant.
Parallel Example 7: Constructing a Confidence Interval for
the Slope of the True Regression Line
Construct a 95% confidence interval for the slope of the
least-squares regression line for the drilling example.
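A sketch of this interval on the drilling data (a 95% interval, so α = 0.05; the variable names are ours):

```python
import numpy as np
from scipy import stats

x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
se = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))
sb1 = se / np.sqrt(np.sum((x - x.mean())**2))

t_crit = stats.t.ppf(0.975, df=n - 2)             # t_{alpha/2} with n - 2 df
print(f"95% CI for beta1: ({b1 - t_crit * sb1:.4f}, {b1 + t_crit * sb1:.4f})")
```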
Section 14.2
Confidence and Prediction Intervals
Objectives
1. Construct confidence intervals for a mean response
2. Construct prediction intervals for an individual response
Confidence intervals for a mean response are
intervals constructed about the predicted value of y, at a
given level of x, that are used to measure the accuracy
of the mean response of all the individuals in the
population.
Prediction intervals for an individual response are
intervals constructed about the predicted value of y that
are used to measure the accuracy of a single individual’s
predicted value.
Objective 1
• Construct Confidence Intervals for a Mean
Response
Confidence Interval for the Mean Response of y, ŷ

A (1 − α)·100% confidence interval for ŷ, the mean response of
y for a specified value of x, is given by

Lower bound: \hat{y} - t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

Upper bound: \hat{y} + t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

where x* is the given value of the explanatory variable, n is
the number of observations, and t_{α/2} is the critical value
with n − 2 degrees of freedom.
Parallel Example 1: Constructing a Confidence Interval for a
Mean Response
Construct a 95% confidence interval about the predicted
mean time to drill 5 feet for all drillings started at a
depth of 110 feet.
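A sketch of this computation at x* = 110 feet, following the formulas above (our own illustration):

```python
import numpy as np
from scipy import stats

x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
se = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

x_star = 110
y_hat = b0 + b1 * x_star
t_crit = stats.t.ppf(0.975, df=n - 2)
margin = t_crit * se * np.sqrt(1/n + (x_star - x.mean())**2
                               / np.sum((x - x.mean())**2))
print(f"95% CI for the mean response: ({y_hat - margin:.3f}, {y_hat + margin:.3f})")
```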
Objective 2
• Construct a Prediction Interval for an Individual Response about ŷ
Prediction Interval for an Individual Response about ŷ

A (1 − α)·100% prediction interval for ŷ, the individual
response of y, is given by

Lower bound: \hat{y} - t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

Upper bound: \hat{y} + t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

where x* is the given value of the explanatory variable, n is
the number of observations, and t_{α/2} is the critical value
with n − 2 degrees of freedom.
Parallel Example 2: Constructing a Prediction Interval for an
Individual Response
Construct a 95% prediction interval about the predicted
time to drill 5 feet for a single drilling started at a depth
of 110 feet.
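The sketch is identical to the mean-response one except for the extra "1 +" term inside the square root, which widens the interval for a single drilling:

```python
import numpy as np
from scipy import stats

x = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
y = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])
n = len(x)
b1, b0 = np.polyfit(x, y, 1)
se = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))

x_star = 110
y_hat = b0 + b1 * x_star
t_crit = stats.t.ppf(0.975, df=n - 2)
margin = t_crit * se * np.sqrt(1 + 1/n + (x_star - x.mean())**2
                               / np.sum((x - x.mean())**2))
print(f"95% PI for an individual response: ({y_hat - margin:.3f}, {y_hat + margin:.3f})")
```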
Section 14.3
Multiple Regression
Objectives
1. Obtain the correlation matrix
2. Use technology to find a multiple regression equation
3. Interpret the coefficients of a multiple regression equation
4. Determine R² and adjusted R²
Objectives (continued)
5. Perform an F-test for lack of fit
6. Test individual regression coefficients for significance
7. Construct confidence and prediction intervals
8. Build a regression model
Objective 1
• Obtain the Correlation Matrix
A multiple regression model is given by
y_i = β₀ + β₁x_{1i} + β₂x_{2i} + ⋯ + β_k x_{ki} + ε_i
where
• y_i is the value of the response variable for the ith individual
• β₀, β₁, …, β_k are the parameters to be estimated based on sample data
• x_{1i} is the ith observation for the first explanatory variable, x_{2i} is the ith observation for the second explanatory variable, and so on
• ε_i is a random error term that is normally distributed with mean 0 and variance σ²_{ε_i} = σ²
• the error terms are independent, and i = 1, …, n, where n is the sample size
A correlation matrix shows the linear
correlation between each pair of variables under
consideration in a multiple regression model.
Multicollinearity exists between two
explanatory variables if they have a high linear
correlation.
CAUTION!
If two explanatory variables in the regression
model are highly correlated with each other,
watch out for strange results in the regression
output.
Parallel Example 1: Constructing a Correlation Matrix
As cheese ages, various chemical processes take place
that determine the taste of the final product. The next
two slides give concentrations of various chemicals in
30 samples of mature cheddar cheese and a subjective
measure of taste for each sample.
Source: Moore, David S., and George P. McCabe (1989)
Obs Taste ln(Acetic) ln(H2S) Lactic
1 12.3 4.543 3.135 0.86
2 20.9 5.159 5.043 1.53
3 39 5.366 5.438 1.57
4 47.9 5.759 7.496 1.81
5 5.6 4.663 3.807 0.99
6 25.9 5.697 7.601 1.09
7 37.3 5.892 8.726 1.29
8 21.9 6.078 7.966 1.78
9 18.1 4.898 3.85 1.29
10 21 5.242 4.174 1.58
11 34.9 5.74 6.142 1.68
12 57.2 6.446 7.908 1.9
13 0.7 4.477 2.996 1.06
14 25.9 5.236 4.942 1.3
15 54.9 6.151 6.752 1.52
Obs Taste ln(Acetic) ln(H2S) Lactic
16 40.9 6.365 9.588 1.74
17 15.9 4.787 3.912 1.16
18 6.4 5.412 4.7 1.49
19 18 5.247 6.174 1.63
20 38.9 5.438 9.064 1.99
21 14 4.564 4.949 1.15
22 15.2 5.298 5.22 1.33
23 32 5.455 9.242 1.44
24 56.7 5.855 10.199 2.01
25 16.8 5.366 3.664 1.31
26 11.6 6.043 3.219 1.46
27 26.5 6.458 6.962 1.72
28 0.7 5.328 3.912 1.25
29 13.4 5.802 6.685 1.08
30 5.5 6.176 4.787 1.25
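A sketch of the correlation matrix using pandas (our tooling choice); only the first five observations are typed in here for brevity, but in practice the full 30-row table above would be used:

```python
import pandas as pd

# First five observations from the table above (full data set: 30 rows)
cheese = pd.DataFrame({
    "Taste":     [12.3, 20.9, 39.0, 47.9, 5.6],
    "ln_Acetic": [4.543, 5.159, 5.366, 5.759, 4.663],
    "ln_H2S":    [3.135, 5.043, 5.438, 7.496, 3.807],
    "Lactic":    [0.86, 1.53, 1.57, 1.81, 0.99],
})
print(cheese.corr())   # pairwise linear correlations
```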
Objective 2
• Use Technology to Find a Multiple Regression
Equation
Parallel Example 2: Multiple Regression
1. Use technology to obtain the least-squares regression equation \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3, where x₁ represents the natural logarithm of the cheese's acetic acid concentration, x₂ the natural logarithm of the cheese's hydrogen sulfide concentration, x₃ the cheese's lactic acid concentration, and y the subjective taste score (combined score of several tasters).
2. Draw residual plots and a boxplot of the residuals to assess the adequacy of the model.
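One way to carry this out is with statsmodels (a sketch, assuming the full 30 observations are in the `cheese` DataFrame from the previous sketch):

```python
import statsmodels.formula.api as smf

# Fit Taste = b0 + b1*ln_Acetic + b2*ln_H2S + b3*Lactic + e
model = smf.ols("Taste ~ ln_Acetic + ln_H2S + Lactic", data=cheese).fit()
print(model.summary())        # coefficients, R^2, adjusted R^2, F- and t-tests

residuals = model.resid       # for residual plots and a boxplot
```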
Objective 3
• Interpret the Coefficients of a Multiple
Regression Equation
Parallel Example 3: Interpreting Regression Coefficients
Interpret the regression coefficients for the least-
squares regression equation found in Parallel
Example 2.
Objective 4
• Determine R2 and Adjusted R2
The adjusted R², denoted R²_adj, is the adjusted coefficient of
determination. It modifies the value of R² based on the sample
size, n, and the number of explanatory variables, k. The
formula for the adjusted R² is

R^2_{\text{adj}} = 1 - \left(\frac{n - 1}{n - k - 1}\right)(1 - R^2)
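The formula translates directly into code; the numbers passed in below are placeholders, not results from the cheese data:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R^2 adjusted for sample size n and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Placeholder example: R^2 = 0.75 with n = 30 observations, k = 3 predictors
print(adjusted_r2(0.75, n=30, k=3))   # about 0.7212
```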
“In Other Words”
Adding another explanatory variable to the model always
increases the value of R².
CAUTION!
Never use R2 to compare regression models
with a different number of explanatory
variables. Rather, use the adjusted R2.
Parallel Example 4: Coefficient of Determination
For the regression model obtained in Parallel
Example 2, determine the coefficient of
determination and the adjusted R2.
Objective 5
• Perform an F-Test for Lack of Fit
Test Statistic for Multiple Regression
F_0 = \frac{\text{mean square due to regression}}{\text{mean square error}} = \frac{\text{MSR}}{\text{MSE}}

with k degrees of freedom in the numerator and n − k − 1
degrees of freedom in the denominator, where k is the number
of explanatory variables and n is the sample size.
F-Test Statistic for Multiple Regression Using R²

F_0 = \frac{R^2}{1 - R^2} \cdot \frac{n - (k + 1)}{k}

where R² is the coefficient of determination, k is the number
of explanatory variables, and n is the sample size.
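In code (a sketch; the R² value below is a placeholder, not the cheese result):

```python
from scipy import stats

def f_from_r2(r2: float, n: int, k: int) -> float:
    """F0 with k numerator and n - k - 1 denominator degrees of freedom."""
    return (r2 / (1 - r2)) * ((n - (k + 1)) / k)

f0 = f_from_r2(0.75, n=30, k=3)            # placeholder values
p_value = stats.f.sf(f0, dfn=3, dfd=26)    # P-value for the overall F-test
print(f"F0 = {f0:.2f}, P-value = {p_value:.4g}")
```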
Decision Rule for Testing
H₀: β₁ = β₂ = ⋯ = β_k = 0
If the P-value is less than the level of significance, α,
reject the null hypothesis. Otherwise, do not reject
the null hypothesis.
“In Other Words”
The null hypothesis states that there is no linear
relation between the explanatory variables and the
response variable. The alternative hypothesis states
that there is a linear relation between at least one
explanatory variable and the response variable.
Parallel Example 5: Inference on the Regression Model
Test H₀: β₁ = β₂ = β₃ = 0 versus H₁: at least one βᵢ ≠ 0 for
the multiple regression model for the cheese taste data.
CAUTION!
If we reject the null hypothesis that the slope
coefficients are zero, then we are saying that at
least one of the slopes is different from zero, not
that they all are different from zero.
Objective 6
• Test Individual Regression Coefficients for
Significance
Parallel Example 6: Testing the Significance of Individual
Predictor Variables
Test the following hypotheses for the cheese taste data:
a) H₀: β₁ = 0 versus H₁: β₁ ≠ 0
b) H₀: β₂ = 0 versus H₁: β₂ ≠ 0
c) H₀: β₃ = 0 versus H₁: β₃ ≠ 0
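With statsmodels, these t-tests come straight out of the fitted results object (a sketch, reusing `model` from the Objective 2 sketch):

```python
# t0 = bi / s_bi and the two-tailed P-value for each coefficient
print(model.tvalues)
print(model.pvalues)   # large P-values flag candidates for removal
```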
Objective 7
• Construct Confidence and Prediction Intervals
Parallel Example 7: Constructing Confidence and
Prediction Intervals
a) Construct a 95% confidence interval for the mean
taste score of all cheddar cheeses whose natural
logarithm of hydrogen sulfide concentration is 5.5
and whose lactic acid concentration is 1.75.
b) Construct a 95% prediction interval for the taste
score of an individual cheddar cheese whose natural
logarithm of hydrogen sulfide concentration is 5.5
and whose lactic acid concentration is 1.75.
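A sketch with statsmodels' get_prediction. The example supplies no acetic value, which suggests a reduced model in ln_H2S and Lactic alone; we assume that here:

```python
import pandas as pd
import statsmodels.formula.api as smf

reduced = smf.ols("Taste ~ ln_H2S + Lactic", data=cheese).fit()
new = pd.DataFrame({"ln_H2S": [5.5], "Lactic": [1.75]})

pred = reduced.get_prediction(new)
print(pred.conf_int(alpha=0.05))             # (a) 95% CI for the mean taste score
print(pred.conf_int(obs=True, alpha=0.05))   # (b) 95% prediction interval
```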
Objective 8
• Build a Regression Model
Guidelines in Developing a
Multiple Regression Model
1. Construct a correlation matrix to help identify the
explanatory variables that have a high correlation with
the response variable. In addition, look at the correlation
matrix for any indication that the explanatory variables
are correlated with each other. Remember, just because
two explanatory variables are highly correlated does not
mean that multicollinearity is a problem, but it is a
tip-off to watch for strange results in the regression
output.
Guidelines in Developing a
Multiple Regression Model
2. Determine the multiple regression model using all the
explanatory variables that have been identified by the
researcher.
3. If the null hypothesis that all the slope coefficients are
zero has been rejected, we proceed to look at the
individual slope coefficients. Identify those slope
coefficients that have small t-test statistics (and therefore
large P-values). These are candidates for explanatory
variables that may be removed from the model.
We should remove only one explanatory variable at a
time from the model before recomputing the
regression model.
Guidelines in Developing a
Multiple Regression Model
4. Repeat Step 3 until all slope coefficients are significantly
different from zero.
5. Be sure that the model is appropriate by drawing
residual plots.
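These guidelines amount to backward elimination, which can be sketched as a loop (our own illustration, assuming the `cheese` data and a 0.05 significance level):

```python
import statsmodels.formula.api as smf

predictors = ["ln_Acetic", "ln_H2S", "Lactic"]
while predictors:
    fit = smf.ols("Taste ~ " + " + ".join(predictors), data=cheese).fit()
    pvals = fit.pvalues.drop("Intercept")      # slope P-values only
    worst = pvals.idxmax()
    if pvals[worst] <= 0.05:                   # all slopes significant: stop
        break
    predictors.remove(worst)                   # drop one variable at a time
print(fit.summary())                           # then check residual plots
```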