
Chapter 14

Inference on the
Least-Squares
Regression Model
and Multiple
Regression

© 2010 Pearson Prentice Hall. All rights reserved


Section 14.1
Testing the
Significance of the
Least-Squares
Regression Model



Objectives
1. State the requirements of the least-squares
regression model
2. Compute the standard error of the estimate
3. Verify that the residuals are normally
distributed
4. Conduct inference on the slope
5. Construct a confidence interval about the
slope of the least-squares regression model



Objective 1
• State the Requirements of the Least-Squares
Regression Model



Requirement 1 for Inference on the
Least-Squares Regression Model

For any particular value of the explanatory variable x,
the mean of the corresponding responses in the
population depends linearly on x. That is,

    μy|x = β1x + β0

for some numbers β0 and β1, where μy|x represents the
population mean response when the value of the
explanatory variable is x.



Requirement 2 for Inference on the
Least-Squares Regression Model

The response variables are normally distributed with
mean μy|x = β1x + β0 and standard deviation σ.



“In Other Words”

When doing inference on the least-squares regression
model, we require (1) that for any value of the
explanatory variable, x, the mean of the response
variable, y, depends on the value of x through a linear
equation, and (2) that the response variable, y, is
normally distributed with a constant standard
deviation, σ. The mean increases or decreases at a
constant rate determined by the slope, while the
variance remains constant.



The least-squares regression model is given by

    yi = β1xi + β0 + εi

where
• yi is the value of the response variable for the
ith individual
• β0 and β1 are the parameters to be estimated
based on sample data
• xi is the value of the explanatory variable for the
ith individual
• εi is a random error term with mean 0 and
variance σ²; the error terms are independent
• i = 1, …, n, where n is the sample size (number of
ordered pairs in the data set)



Objective 2
• Compute the Standard Error of the Estimate



The standard error of the estimate, se, is found
using the formula

    se = √( Σ(yi − ŷi)² / (n − 2) ) = √( Σ residuals² / (n − 2) )



Parallel Example 2: Compute the Standard Error

Compute the standard error of the estimate for the


drilling data which is presented on the next slide.



Depth at Which Drilling    Time to Drill 5 Feet,
Begins, x (in feet)        y (in minutes)
 35                        5.88
 50                        5.99
 75                        6.74
 95                        6.10
120                        7.47
130                        6.93
145                        6.42
155                        7.97
160                        7.92
175                        7.62
185                        6.89
190                        7.90
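As a sketch of how this computation can be carried out, the slope, intercept, and standard error of the estimate for the drilling data can be found with NumPy (the "technology" here is my choice, not the slides'):

```python
import numpy as np

# Drilling data from the slides: depth (feet) and time to drill 5 feet (minutes)
depth = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
time = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

# Least-squares fit: time = b1 * depth + b0
b1, b0 = np.polyfit(depth, time, 1)

# Standard error of the estimate: divide the sum of squared residuals by n - 2
residuals = time - (b1 * depth + b0)
n = len(depth)
se = np.sqrt(np.sum(residuals**2) / (n - 2))

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, se = {se:.4f}")
```

Note the division by n − 2, not n, exactly as the caution on the next slide emphasizes.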
CAUTION!

Be sure to divide by n-2 when computing the


standard error of the estimate.



Objective 3
• Verify That the Residuals Are Normally
Distributed



Parallel Example 4: Verify That the Residuals Are Normally Distributed

Verify that the residuals from the drilling


example are normally distributed.

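One way to carry out this check is with a normal probability plot. The sketch below uses SciPy's probplot (an assumed tool choice; the slides draw the plot with a calculator or StatCrunch) and reports the correlation between the ordered residuals and the normal quantiles — values near 1 support normality:

```python
import numpy as np
from scipy import stats

# Drilling data from the slides
depth = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
time = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

b1, b0 = np.polyfit(depth, time, 1)
residuals = time - (b1 * depth + b0)

# probplot returns the plotting points plus the correlation r of the
# normal probability plot; r close to 1 means the plot is roughly linear
(osm, osr), (pp_slope, pp_intercept, r) = stats.probplot(residuals, dist="norm")
print(f"normal probability plot correlation r = {r:.3f}")
```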


Objective 4
• Conduct Inference on the Slope



Hypothesis Test Regarding the Slope
Coefficient, β1

To test whether two quantitative variables are linearly


related, we use the following steps provided that

1. the sample is obtained using random sampling.


2. the residuals are normally distributed with
constant error variance.



Step 1: Determine the null and alternative
hypotheses. The hypotheses can be
structured in one of three ways:

Two-Tailed        Left-Tailed       Right-Tailed

H0: β1 = 0        H0: β1 = 0        H0: β1 = 0
H1: β1 ≠ 0        H1: β1 < 0        H1: β1 > 0

Step 2: Select a level of significance, α, depending
on the seriousness of making a Type I
error.



Step 3: Compute the test statistic

    t0 = (b1 − β1) / sb1 = b1 / sb1

which follows Student's t-distribution with
n − 2 degrees of freedom. Remember, when
computing the test statistic, we assume the
null hypothesis to be true. So, we assume
that β1 = 0.



Classical Approach

Step 4: Use Table VI to determine the critical


value using n-2 degrees of freedom.



Classical Approach

[Figures omitted: critical regions for the two-tailed,
left-tailed, and right-tailed tests]


Classical Approach

Step 5: Compare the critical value with the test


statistic.



P-Value Approach

Step 4: Use Table VI to estimate the P-value using


n-2 degrees of freedom.



P-Value Approach

[Figures omitted: P-value regions for the two-tailed,
left-tailed, and right-tailed tests]


P-Value Approach

Step 5: If the P-value < α, reject the null hypothesis.



Step 6: State the conclusion.



CAUTION!

Before testing H0: β1 = 0, be sure to draw a residual
plot to verify that a linear model is appropriate.



Parallel Example 5: Testing for a Linear Relation

Test the claim that there is a linear relation between drill
depth and drill time at the α = 0.05 level of significance
using the drilling data.

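A sketch of this test with SciPy's linregress, which reports the slope, its standard error, and the two-sided P-value for H0: β1 = 0 (an illustrative computation, not the textbook's worked solution):

```python
import numpy as np
from scipy import stats

# Drilling data from the slides
depth = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
time = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

# linregress tests H0: beta1 = 0 against the two-sided H1: beta1 != 0
res = stats.linregress(depth, time)
t0 = res.slope / res.stderr   # test statistic with n - 2 = 10 degrees of freedom

print(f"t0 = {t0:.2f}, P-value = {res.pvalue:.4f}")
```

If the reported P-value is less than α = 0.05, we reject H0 and conclude that a linear relation exists between drill depth and drill time.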


Objective 5
• Construct a Confidence Interval about the
Slope of the Least-Squares Regression Model



Confidence Intervals for the Slope of the
Regression Line

A (1 − α)·100% confidence interval for the slope of the
true regression line, β1, is given by the following
formulas:

    Lower bound: b1 − tα/2 · se / √( Σ(xi − x̄)² ) = b1 − tα/2 · sb1

    Upper bound: b1 + tα/2 · se / √( Σ(xi − x̄)² ) = b1 + tα/2 · sb1

Here, tα/2 is computed using n − 2 degrees of freedom.


Note: The confidence interval for β1 can be
computed only if the data are randomly obtained, the
residuals are normally distributed, and there is
constant error variance.



Parallel Example 7: Constructing a Confidence Interval for
the Slope of the True Regression Line

Construct a 95% confidence interval for the slope of the


least-squares regression line for the drilling example.

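A sketch of this interval for the drilling data, using SciPy's t critical value in place of Table VI:

```python
import numpy as np
from scipy import stats

# Drilling data from the slides
depth = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
time = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

res = stats.linregress(depth, time)   # res.stderr is sb1
n = len(depth)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # t_{alpha/2} with n - 2 df

lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
```

Because the slope test is significant at the 0.05 level, this 95% interval should not contain 0.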


Section 14.2
Confidence and
Prediction Intervals



Objectives
1. Construct confidence intervals for a mean
response
2. Construct prediction intervals for an
individual response



Confidence intervals for a mean response are
intervals constructed about the predicted value of y, at a
given level of x, that are used to measure the accuracy
of the mean response of all the individuals in the
population.

Prediction intervals for an individual response are


intervals constructed about the predicted value of y that
are used to measure the accuracy of a single individual’s
predicted value.



Objective 1
• Construct Confidence Intervals for a Mean
Response



Confidence Interval for the Mean Response of y, ŷ

A (1 − α)·100% confidence interval for ŷ, the mean response
of y for a specified value of x, is given by

    Lower bound: ŷ − tα/2 · se · √( 1/n + (x* − x̄)² / Σ(xi − x̄)² )

    Upper bound: ŷ + tα/2 · se · √( 1/n + (x* − x̄)² / Σ(xi − x̄)² )

where x* is the given value of the explanatory variable, n is
the number of observations, and tα/2 is the critical value
with n − 2 degrees of freedom.
Parallel Example 1: Constructing a Confidence Interval for a
Mean Response

Construct a 95% confidence interval about the predicted


mean time to drill 5 feet for all drillings started at a
depth of 110 feet.



Objective 2
• Prediction Interval for an Individual Response
about ŷ



Prediction Interval for an Individual
Response about ŷ

A (1 − α)·100% prediction interval for ŷ, the individual
response of y, is given by

    Lower bound: ŷ − tα/2 · se · √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² )

    Upper bound: ŷ + tα/2 · se · √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² )

where x* is the given value of the explanatory variable, n is
the number of observations, and tα/2 is the critical value
with n − 2 degrees of freedom.
Parallel Example 2: Constructing a Prediction Interval for an
Individual Response

Construct a 95% prediction interval about the predicted


time to drill 5 feet for a single drilling started at a depth
of 110 feet.

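Both intervals for the drilling data at x* = 110 can be sketched as follows; the only difference is the extra "1 +" under the square root in the prediction interval, which makes it wider (an illustration, not the textbook's worked answer):

```python
import numpy as np
from scipy import stats

# Drilling data from the slides
depth = np.array([35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190])
time = np.array([5.88, 5.99, 6.74, 6.10, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.90])

n = len(depth)
b1, b0 = np.polyfit(depth, time, 1)
resid = time - (b1 * depth + b0)
se = np.sqrt(np.sum(resid**2) / (n - 2))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)

x_star = 110.0
y_hat = b1 * x_star + b0
sxx = np.sum((depth - depth.mean())**2)

# Margin for the mean response (confidence interval)
m_ci = t_crit * se * np.sqrt(1 / n + (x_star - depth.mean())**2 / sxx)
# Margin for an individual response (prediction interval): extra "1 +"
m_pi = t_crit * se * np.sqrt(1 + 1 / n + (x_star - depth.mean())**2 / sxx)

print(f"95% CI for mean response:      ({y_hat - m_ci:.3f}, {y_hat + m_ci:.3f})")
print(f"95% PI for individual response: ({y_hat - m_pi:.3f}, {y_hat + m_pi:.3f})")
```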


Section 14.3
Multiple
Regression



Objectives
1. Obtain the correlation matrix
2. Use technology to find a multiple regression
equation
3. Interpret the coefficients of a multiple
regression equation
4. Determine R2 and adjusted R2



Objectives (continued)
5. Perform an F-test for lack of fit
6. Test individual regression coefficients for
significance
7. Construct confidence and prediction intervals
8. Build a regression model



Objective 1
• Obtain the Correlation Matrix



A multiple regression model is given by

    yi = β0 + β1x1i + β2x2i + ⋯ + βkxki + εi

where
• yi is the value of the response variable for the
ith individual
• β0, β1, …, βk are the parameters to be estimated
based on sample data
• x1i is the ith observation for the first explanatory
variable, x2i is the ith observation for the second
explanatory variable, and so on
• εi is a random error term that is normally
distributed with mean 0 and variance σ²
• the error terms are independent, and i = 1, …, n,
where n is the sample size
A correlation matrix shows the linear
correlation between each pair of variables under
consideration in a multiple regression model.



Multicollinearity exists between two
explanatory variables if they have a high linear
correlation.



CAUTION!

If two explanatory variables in the regression


model are highly correlated with each other,
watch out for strange results in the regression
output.



Parallel Example 1: Constructing a Correlation Matrix

As cheese ages, various chemical processes take place


that determine the taste of the final product. The next
two slides give concentrations of various chemicals in
30 samples of mature cheddar cheese and a subjective
measure of taste for each sample.

Source: Moore, David S., and George P. McCabe (1989)



Obs Taste ln(Acetic) ln(H2S) Lactic
1 12.3 4.543 3.135 0.86
2 20.9 5.159 5.043 1.53
3 39 5.366 5.438 1.57
4 47.9 5.759 7.496 1.81
5 5.6 4.663 3.807 0.99
6 25.9 5.697 7.601 1.09
7 37.3 5.892 8.726 1.29
8 21.9 6.078 7.966 1.78
9 18.1 4.898 3.85 1.29
10 21 5.242 4.174 1.58
11 34.9 5.74 6.142 1.68
12 57.2 6.446 7.908 1.9
13 0.7 4.477 2.996 1.06
14 25.9 5.236 4.942 1.3
15 54.9 6.151 6.752 1.52
Obs Taste ln(Acetic) ln(H2S) Lactic
16 40.9 6.365 9.588 1.74
17 15.9 4.787 3.912 1.16
18 6.4 5.412 4.7 1.49
19 18 5.247 6.174 1.63
20 38.9 5.438 9.064 1.99
21 14 4.564 4.949 1.15
22 15.2 5.298 5.22 1.33
23 32 5.455 9.242 1.44
24 56.7 5.855 10.199 2.01
25 16.8 5.366 3.664 1.31
26 11.6 6.043 3.219 1.46
27 26.5 6.458 6.962 1.72
28 0.7 5.328 3.912 1.25
29 13.4 5.802 6.685 1.08
30 5.5 6.176 4.787 1.25
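A sketch of the correlation matrix for these 30 observations using NumPy's corrcoef (pandas' DataFrame.corr() would give the same numbers with row and column labels):

```python
import numpy as np

# Cheese data from the slides: columns are taste, ln(Acetic), ln(H2S), Lactic
data = np.array([
    [12.3, 4.543, 3.135, 0.86], [20.9, 5.159, 5.043, 1.53],
    [39.0, 5.366, 5.438, 1.57], [47.9, 5.759, 7.496, 1.81],
    [5.6, 4.663, 3.807, 0.99], [25.9, 5.697, 7.601, 1.09],
    [37.3, 5.892, 8.726, 1.29], [21.9, 6.078, 7.966, 1.78],
    [18.1, 4.898, 3.850, 1.29], [21.0, 5.242, 4.174, 1.58],
    [34.9, 5.740, 6.142, 1.68], [57.2, 6.446, 7.908, 1.90],
    [0.7, 4.477, 2.996, 1.06], [25.9, 5.236, 4.942, 1.30],
    [54.9, 6.151, 6.752, 1.52], [40.9, 6.365, 9.588, 1.74],
    [15.9, 4.787, 3.912, 1.16], [6.4, 5.412, 4.700, 1.49],
    [18.0, 5.247, 6.174, 1.63], [38.9, 5.438, 9.064, 1.99],
    [14.0, 4.564, 4.949, 1.15], [15.2, 5.298, 5.220, 1.33],
    [32.0, 5.455, 9.242, 1.44], [56.7, 5.855, 10.199, 2.01],
    [16.8, 5.366, 3.664, 1.31], [11.6, 6.043, 3.219, 1.46],
    [26.5, 6.458, 6.962, 1.72], [0.7, 5.328, 3.912, 1.25],
    [13.4, 5.802, 6.685, 1.08], [5.5, 6.176, 4.787, 1.25],
])

# np.corrcoef treats each ROW as a variable, so transpose first
corr = np.corrcoef(data.T)
print(np.round(corr, 3))
```

Row/column order is taste, ln(Acetic), ln(H2S), Lactic; the off-diagonal entries in the first row show how strongly each explanatory variable correlates with taste, and the remaining off-diagonal entries flag possible multicollinearity.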
Objective 2
• Use Technology to Find a Multiple Regression
Equation



Parallel Example 2: Multiple Regression

1. Use technology to obtain the least-squares regression
equation ŷ = b0 + b1x1 + b2x2 + b3x3, where x1
represents the natural logarithm of the cheese's acetic
acid concentration, x2 represents the natural logarithm
of the cheese's hydrogen sulfide concentration, x3
represents the cheese's lactic acid concentration, and y
represents the subjective taste score (combined score
of several tasters).
2. Draw residual plots and a boxplot of the residuals to
assess the adequacy of the model.



Objective 3
• Interpret the Coefficients of a Multiple
Regression Equation



Parallel Example 3: Interpreting Regression Coefficients

Interpret the regression coefficients for the least-


squares regression equation found in Parallel
Example 2.



Objective 4
• Determine R2 and Adjusted R2



The adjusted R², denoted R²adj, is the adjusted
coefficient of determination. It modifies the value of
R² based on the sample size, n, and the number of
explanatory variables, k. The formula for the
adjusted R² is

    R²adj = 1 − (1 − R²) · (n − 1) / (n − k − 1)

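A quick numeric check of the formula (the R² value here is illustrative, not the textbook's computed answer):

```python
# Illustrative values: R^2 = 0.652 with n = 30 observations and k = 3 predictors
r2, n, k = 0.652, 30, 3

# Adjusted R^2 penalizes R^2 for the number of explanatory variables
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 3))  # always at most R^2
```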


“In Other Words”

The value of R² always increases (or at least never
decreases) when one more explanatory variable is
added to the model.



CAUTION!

Never use R² to compare regression models
with different numbers of explanatory
variables. Rather, use the adjusted R².

Note (translated): never use R² to decide which model
fits best when combining the x variables; use the
adjusted R² instead.



Parallel Example 4: Coefficient of Determination

For the regression model obtained in Parallel


Example 2, determine the coefficient of
determination and the adjusted R2.



Objective 5
• Perform an F-Test for Lack of Fit



Test Statistic for Multiple Regression

    F0 = (mean square due to regression) / (mean square error) = MSR / MSE

with k degrees of freedom in the numerator and
n − k − 1 degrees of freedom in the denominator, where k
is the number of explanatory variables and n is the
sample size.



F-Test Statistic for
Multiple Regression Using R²

    F0 = [ R² / (1 − R²) ] · [ (n − (k + 1)) / k ]

where
R² is the coefficient of determination,
k is the number of explanatory variables, and
n is the sample size.

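A numeric sketch of this statistic (the R² value is illustrative, not the textbook's computed answer):

```python
# Illustrative values: R^2 = 0.652, n = 30 observations, k = 3 predictors
r2, n, k = 0.652, 30, 3

# F-test statistic for H0: beta1 = beta2 = ... = betak = 0
f0 = (r2 / (1 - r2)) * ((n - (k + 1)) / k)
print(round(f0, 2))
```

The resulting F0 would be compared against an F critical value with k and n − k − 1 degrees of freedom, or converted to a P-value.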


Decision Rule for Testing
H0: β1 = β2 = ⋯ = βk = 0

If the P-value is less than the level of significance, α,
reject the null hypothesis. Otherwise, do not reject
the null hypothesis.



“In Other Words”

The null hypothesis states that there is no linear


relation between the explanatory variables and the
response variable. The alternative hypothesis states
that there is a linear relation between at least one
explanatory variable and the response variable.



Parallel Example 5: Inference on the Regression Model

Test H0: β1 = β2 = β3 = 0 versus H1: at least one βi ≠ 0 for
the multiple regression model for the cheese taste data.



CAUTION!

If we reject the null hypothesis that the slope
coefficients are zero, then we are saying that at
least one of the slopes is different from zero, not
that they all are different from zero.

Note (translated): if we reject H0 (which states that the
slope coefficients all equal 0), we must say that at least
one of the slopes differs from 0, not that all of them do.



Objective 6
• Test Individual Regression Coefficients for
Significance

Note (translated): test each coefficient individually.



Parallel Example 6: Testing the Significance of Individual
Predictor Variables

Test the following hypotheses for the cheese taste data:

a) H0: β1 = 0 versus H1: β1 ≠ 0
b) H0: β2 = 0 versus H1: β2 ≠ 0
c) H0: β3 = 0 versus H1: β3 ≠ 0



Objective 7
• Construct Confidence and Prediction Intervals



Parallel Example 7: Testing the Significance of Individual
Predictor Variables

a) Construct a 95% confidence interval for the mean


taste score of all cheddar cheeses whose natural
logarithm of hydrogen sulfide concentration is 5.5
and whose lactic acid concentration is 1.75.
b) Construct a 95% prediction interval for the taste
score of an individual cheddar cheese whose natural
logarithm of hydrogen sulfide concentration is 5.5
and whose lactic acid concentration is 1.75.



Objective 8
• Build a Regression Model



Guidelines in Developing a
Multiple Regression Model

1. Construct a correlation matrix to help identify the
explanatory variables that have a high correlation with
the response variable. In addition, look at the correlation
matrix for any indication that the explanatory variables
are correlated with each other. Remember, just because
two explanatory variables have a high correlation, this
does not mean that multicollinearity will be a problem,
but it is a tip-off to watch out for strange results from
the regression model.



Guidelines in Developing a
Multiple Regression Model
2. Determine the multiple regression model using all the
explanatory variables that have been identified by the
researcher.
3. If the null hypothesis that all the slope coefficients are
zero has been rejected, we proceed to look at the
individual slope coefficients. Identify those slope
coefficients that have small t-test statistics (and therefore
large P-values). These are candidates for explanatory
variables that may be removed from the model.
We should remove only one explanatory variable at a
time from the model before recomputing the
regression model.
Guidelines in Developing a
Multiple Regression Model

4. Repeat Step 3 until all slope coefficients are significantly


different from zero.
5. Be sure that the model is appropriate by drawing
residual plots.

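Steps 2 through 4 above can be sketched as a backward-elimination loop. This illustration runs on synthetic data, and the helper slope_p_values and the 0.05 cutoff are my assumptions, not the textbook's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: y depends on x1 and x2 but not on x3
n = 60
X = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def slope_p_values(X, y):
    """OLS fit with an intercept; return two-sided P-values for each slope."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    df = len(y) - A.shape[1]
    s2 = resid @ resid / df                      # estimate of sigma^2
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    t = coef / se
    p = 2 * stats.t.sf(np.abs(t), df)
    return p[1:]                                 # drop the intercept's P-value

cols = [0, 1, 2]                                 # start with all predictors
while True:
    p = slope_p_values(X[:, cols], y)
    worst = int(np.argmax(p))
    if p[worst] < 0.05 or len(cols) == 1:        # stop when all remaining are significant
        break
    cols.pop(worst)                              # drop one variable at a time, then refit

print("remaining explanatory variables:", cols)
```

With these synthetic data, the informative predictors x1 and x2 are retained, and x3 will typically be dropped.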
