06-05-10
Multiple Regression
Part 1. Basic Multiple Regression
The Linear Regression Model
The Least Squares Point Estimates
The Mean Squared Error and the Standard Error
Model Utility: R², Adjusted R², and the F Test
Testing Significance of an Independent Variable
Confidence Intervals and Prediction Intervals
Part 2. Using Squared and Interaction Terms
The Quadratic Regression Model
Interaction
Part 3. Dummy Variables and Advanced Statistical Inferences
Dummy Variables to Model Qualitative Variables
The Partial F Test: Testing a Portion of a Model
Part 4. Model Building and Model Diagnostics
Model Building and the Effects of Multicollinearity
Diagnostics for Detecting Outlying and Influential Observations
The Linear Regression Model
The linear regression model relating y to x1, x2, …, xk is
y = µ(y|x1, x2, …, xk) + ε = β0 + β1x1 + β2x2 + … + βkxk + ε
where
µ(y|x1, x2, …, xk) = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk;
β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2, …, xk;
ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk.
Example: The Linear Regression Model
Example: The Fuel Consumption Case
Week | Average Hourly Temperature, x1 (°F) | Chill Index, x2 | Fuel Consumption, y (MMcf)
1 28.0 18 12.4
2 28.0 14 11.7
3 32.5 24 12.4
4 39.0 22 10.8
5 45.9 8 9.4
6 57.8 16 9.5
7 58.1 1 8.0
8 62.5 0 7.5
y = β0 + β1 x1 + β2 x2 + ε
The Linear Regression Model Illustrated
Example: The Fuel Consumption Case (figure not shown)
The Regression Model Assumptions
Model: y = µ(y|x1, x2, …, xk) + ε = β0 + β1x1 + β2x2 + … + βkxk + ε
Assumptions about the model error terms, the ε’s:
Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms, σ², is the same for every combination of values of x1, x2, …, xk.
Normality: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk.
Independence: The values of the error terms are statistically independent of each other.
Least Squares Estimates and Prediction
Estimation/Prediction Equation:
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the dependent variable when
the values of the independent variables are x01, x02, …, x0k. It is also the
point prediction of an individual value of the dependent variable
when the values of the independent variables are x01, x02, …, x0k.
b1, b2, …, bk are the least squares point estimates of the parameters
β1, β 2, …, β k.
x01, x02, …, x0k are specified values of the independent predictor
variables x1, x2, …, xk.
Example: Least Squares Estimation
Example: The Fuel Consumption Case
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill
Predictor Coef StDev T P
Constant 13.1087 0.8557 15.32 0.000
Temp -0.09001 0.01408 -6.39 0.001
Chill 0.08249 0.02200 3.75 0.013
S = 0.3671 R-Sq = 97.4% R-Sq(adj) = 96.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549
Predicted Values (Temp = 40, Chill = 10)
Fit StDev Fit 95.0% CI 95.0% PI
10.333 0.170 ( 9.895, 10.771) ( 9.293, 11.374)
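The least squares point estimates in the Minitab output above can be reproduced directly from the data table; the sketch below uses numpy (variable names are my own):

```python
import numpy as np

# Fuel Consumption Case data from the table above
x1 = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])  # avg hourly temp (F)
x2 = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)        # chill index
y  = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])      # fuel consumption (MMcf)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares point estimates

print(np.round(b, 4))   # b0, b1, b2: approximately 13.1087, -0.0900, 0.0825
```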
Example: Point Predictions and Residuals
Example: The Fuel Consumption Case
Week | Average Hourly Temperature, x1 (°F) | Chill Index, x2 | Observed Fuel Consumption, y (MMcf) | Predicted Fuel Consumption, ŷ = 13.1087 − .0900x1 + .0825x2 | Residual, e = y − ŷ
1 28.0 18 12.4 12.0733 0.3267
2 28.0 14 11.7 11.7433 -0.0433
3 32.5 24 12.4 12.1631 0.2369
4 39.0 22 10.8 11.4131 -0.6131
5 45.9 8 9.4 9.6372 -0.2372
6 57.8 16 9.5 9.2260 0.2740
7 58.1 1 8.0 7.9616 0.0384
8 62.5 0 7.5 7.4831 0.0169
Mean Square Error and Standard Error
SSE = Σ eᵢ² = Σ (yᵢ − ŷᵢ)²  (Sum of Squared Errors)
s² = MSE = SSE / (n − (k + 1))  (Mean Square Error, the point estimate of the residual variance σ²)
s = √MSE = √(SSE / (n − (k + 1)))  (Standard Error, the point estimate of the residual standard deviation σ)
Example: The Fuel Consumption Case
Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549
s² = MSE = SSE / (n − (k + 1)) = 0.674 / (8 − 3) = 0.1348
s = √s² = √0.1348 = 0.3671
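As a check, SSE, MSE, and s can be recomputed from the residuals listed in the point-prediction table above (a numpy sketch; values match up to rounding in the tabled residuals):

```python
import numpy as np

# residuals from the Fuel Consumption Case table above
e = np.array([0.3267, -0.0433, 0.2369, -0.6131,
              -0.2372, 0.2740, 0.0384, 0.0169])
n, k = 8, 2

sse = np.sum(e**2)          # sum of squared errors
mse = sse / (n - (k + 1))   # s^2
s = np.sqrt(mse)            # standard error

print(round(sse, 3), round(s, 4))   # approximately 0.674 and 0.3671
```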
The Multiple Coefficient of Determination
The multiple coefficient of determination R² is
R² = (Explained variation) / (Total variation)
R² is the proportion of the total variation in y explained by the linear regression model.
Total variation = Explained variation + Unexplained variation
Total variation = Σ (yᵢ − ȳ)²  (Total Sum of Squares, SSTO)
Explained variation = Σ (ŷᵢ − ȳ)²  (Regression Sum of Squares, SSR)
Unexplained variation = Σ (yᵢ − ŷᵢ)²  (Error Sum of Squares, SSE)
Multiple correlation coefficient: R = √R²
The Adjusted R2
The adjusted multiple coefficient of determination is
R̄² = (R² − k/(n − 1)) · (n − 1)/(n − (k + 1))
Example: The Fuel Consumption Case
S = 0.3671 R-Sq = 97.4% R-Sq(adj) = 96.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549
R² = 24.875 / 25.549 = 0.974
R̄² = (0.974 − 2/(8 − 1)) · (8 − 1)/(8 − (2 + 1)) = 0.963
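The same arithmetic in code, using the sums of squares from the ANOVA table above:

```python
n, k = 8, 2
r2 = 24.875 / 25.549                                   # explained / total variation
adj_r2 = (r2 - k / (n - 1)) * (n - 1) / (n - (k + 1))  # adjusted R^2

print(round(r2, 3), round(adj_r2, 3))   # 0.974 0.963
```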
F Test for Linear Regression Model
To test H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2, …, βk is not equal to 0
Test Statistic:
F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n − (k + 1))]
Reject H0 in favor of Ha if:
F(model) > Fα or p-value < α
Fα is based on k numerator and n − (k + 1) denominator degrees of freedom.
Example: F Test for Linear Regression
Example: The Fuel Consumption Case
Analysis of Variance
Source DF SS MS F P
Regression 2 24.875 12.438 92.30 0.000
Residual Error 5 0.674 0.135
Total 7 25.549
Test Statistic:
F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n − (k + 1))] = (24.875/2) / (0.674/(8 − 3)) = 92.30
Reject H0 at the α = 0.05 level of significance, since
F(model) = 92.30 > 5.79 = F.05 and
p-value ≈ 0.000 < 0.05 = α
Fα is based on 2 numerator and 5 denominator degrees of freedom.
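The F statistic, critical value, and p-value can be checked with scipy, plugging in the sums of squares from the ANOVA table above (a sketch):

```python
from scipy import stats

n, k = 8, 2
explained, unexplained = 24.875, 0.674

F = (explained / k) / (unexplained / (n - (k + 1)))
F_crit = stats.f.ppf(0.95, k, n - (k + 1))   # F_.05 with 2 and 5 d.f.
p_value = stats.f.sf(F, k, n - (k + 1))

print(round(F, 1), round(F_crit, 2), p_value < 0.001)
```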
Testing Significance of an Independent Variable
If the regression assumptions hold, we can reject H0: β j = 0 at the α
level of significance (probability of Type I error equal to α) if and only if
the appropriate rejection point condition holds or, equivalently, if the
corresponding p-value is less than α.
Alternative | Reject H0 if: | p-Value
Ha: βj > 0 | t > tα | Area under the t distribution to the right of t
Ha: βj < 0 | t < −tα | Area under the t distribution to the left of t
Ha: βj ≠ 0 | |t| > tα/2, that is, t > tα/2 or t < −tα/2 | Twice the area under the t distribution to the right of |t|
Test Statistic: t = bj / s_bj
100(1 − α)% Confidence Interval for βj: [bj ± tα/2 s_bj]
tα, tα/2 and p-values are based on n – (k+1) degrees of freedom.
Example: Testing and Estimation for βs
Example: The Fuel Consumption Case
Predictor Coef StDev T P
Constant 13.1087 0.8557 15.32 0.000
Temp -0.09001 0.01408 -6.39 0.001
Chill 0.08249 0.02200 3.75 0.013
Test:
t = b2 / s_b2 = 0.08249 / 0.02200 = 3.75 > 2.571 = t.025
p-value = 2 × P(t > 3.75) = 0.013
Interval:
[b2 ± tα/2 s_b2] = [0.08249 ± (2.571)(0.02200)] = [0.08249 ± 0.05656] = [0.02593, 0.13905]
Chill is significant at the α = 0.05 level, but not at the α = 0.01 level.
tα, tα/2 and p-values are based on 5 degrees of freedom.
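The t statistic, p-value, and confidence interval for the Chill coefficient can be reproduced with scipy (a sketch using the estimate and standard error from the output above):

```python
from scipy import stats

b2, sb2, df = 0.08249, 0.02200, 5

t = b2 / sb2
t_crit = stats.t.ppf(0.975, df)              # t_.025 with 5 d.f.
p_value = 2 * stats.t.sf(abs(t), df)
ci = (b2 - t_crit * sb2, b2 + t_crit * sb2)  # 95% CI for beta_2

print(round(t, 2), round(p_value, 3))        # 3.75 0.013
```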
Confidence and Prediction Intervals
Prediction: ŷ = b0 + b1x01 + b2x02 + … + bkx0k
If the regression assumptions hold:
100(1 − α)% confidence interval for the mean value of y: [ŷ ± tα/2 · s · √(Distance value)]
100(1 − α)% prediction interval for an individual value of y: [ŷ ± tα/2 · s · √(1 + Distance value)]
The distance value requires matrix algebra; see Appendix G on the CD-ROM.
tα/2 is based on n − (k + 1) degrees of freedom.
Example: Confidence and Prediction Intervals
Example: The Fuel Consumption Case
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill
Predicted Values (Temp = 40, Chill = 10)
Fit StDev Fit 95.0% CI 95.0% PI
10.333 0.170 (9.895, 10.771) (9.293,11.374)
95% Confidence Interval: [ŷ ± t.025 · s · √(Distance value)] = [10.333 ± (2.571)(0.3671)√0.2144515] = [10.333 ± 0.438] = [9.895, 10.771]
95% Prediction Interval: [ŷ ± t.025 · s · √(1 + Distance value)] = [10.333 ± (2.571)(0.3671)√(1 + 0.2144515)] = [10.333 ± 1.041] = [9.292, 11.374]
The Quadratic Regression Model
Model: y = β0 + β1x + β2x² + ε
Example: Quadratic Regression Model
Example: The Gasoline Additive Case
Units of Additive, x | Mileage, y (MPG)
0 25.8
0 26.1
0 25.4
1 29.6
1 29.2
1 29.8
2 32.0
2 31.4
2 31.7
3 31.7
3 31.5
3 31.2
4 29.4
4 29.0
4 29.5
Example: Quadratic Regression Model
Example: The Gasoline Additive Case
Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq
Predictor Coef StDev T P
Constant 25.7152 0.1554 165.43 0.000
Units 4.9762 0.1841 27.02 0.000
UnitsSq -1.01905 0.04414 -23.09 0.000
S = 0.2861 R-Sq = 98.6% R-Sq(adj) = 98.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 67.915 33.958 414.92 0.000
Residual Error 12 0.982 0.082
Total 14 68.897
Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit StDev Fit 95.0% CI 95.0% PI
31.7901 0.1111 ( 31.5481, 32.0322) ( 31.1215, 32.4588)
ŷ = 25.7152 + 4.9762(2.44) − 1.01905(2.44)² = 31.7901 mpg
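The quadratic fit and the prediction at 2.44 units can be reproduced from the mileage data table with numpy (a sketch):

```python
import numpy as np

# Gasoline Additive Case data from the table above
x = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 3)
y = np.array([25.8, 26.1, 25.4, 29.6, 29.2, 29.8, 32.0, 31.4, 31.7,
              31.7, 31.5, 31.2, 29.4, 29.0, 29.5])

c2, c1, c0 = np.polyfit(x, y, 2)        # quadratic least squares fit
yhat = c0 + c1 * 2.44 + c2 * 2.44**2    # predicted mileage at 2.44 units

print(round(c0, 4), round(c1, 4), round(c2, 5), round(yhat, 4))
```

A side note on why 2.44 units is the point of interest: the fitted parabola opens downward, so predicted mileage is maximized at the vertex −c1/(2·c2), which works out to about 2.44 units of additive.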
Interaction
Example: The Bonner Frozen Foods Case
Sales Region | Radio and TV Expenditures, x1 | Print Expenditures, x2 | Sales Volume, y
1 1 1 3.27
2 1 2 8.38
3 1 3 11.28
4 1 4 14.50
5 1 5 19.63
6 2 1 5.84
7 2 2 10.01
8 2 3 12.46
9 2 4 16.67
10 2 5 19.83
11 3 1 8.51
12 3 2 10.14
13 3 3 14.75
14 3 4 17.99
15 3 5 19.85
16 4 1 9.46
17 4 2 12.61
18 4 3 15.50
19 4 4 17.68
20 4 5 21.02
21 5 1 12.23
22 5 2 13.58
23 5 3 16.77
24 5 4 20.56
25 5 5 21.05
Modeling Interaction
Model: y = β0 + β1x1 + β2x2 + β3x1x2 + ε, where x1x2 is a cross-product or interaction term
Example: The Bonner Frozen Foods Case (Minitab Output)
Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact
Predictor Coef StDev T P
Constant -2.3497 0.6883 -3.41 0.003
RadioTV 2.3611 0.2075 11.38 0.000
Print 4.1831 0.2075 20.16 0.000
Interact -0.34890 0.06257 -5.58 0.000
S = 0.6257 R-Sq = 98.6% R-Sq(adj) = 98.4%
Analysis of Variance
Source DF SS MS F P
Regression 3 590.41 196.80 502.67 0.000
Residual Error 21 8.22 0.39
Total 24 598.63
Predicted Values (RadioTV = 2, Print = 5, Interact=(2)(5) = 10)
Fit StDev Fit 95.0% CI 95.0% PI
19.799 0.265 ( 19.247, 20.351) ( 18.385, 21.213)
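The interaction model and the prediction at (RadioTV = 2, Print = 5) can be reproduced from the Bonner data table with numpy (a sketch; the data are laid out as the 5 × 5 factorial above):

```python
import numpy as np

# Bonner Frozen Foods data: x1 = radio/TV, x2 = print, y = sales volume
x1 = np.repeat(np.arange(1.0, 6.0), 5)   # regions 1-25 in table order
x2 = np.tile(np.arange(1.0, 6.0), 5)
y  = np.array([3.27, 8.38, 11.28, 14.50, 19.63,
               5.84, 10.01, 12.46, 16.67, 19.83,
               8.51, 10.14, 14.75, 17.99, 19.85,
               9.46, 12.61, 15.50, 17.68, 21.02,
               12.23, 13.58, 16.77, 20.56, 21.05])

X = np.column_stack([np.ones(25), x1, x2, x1 * x2])   # cross-product (interaction) term
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = b @ np.array([1.0, 2.0, 5.0, 10.0])             # RadioTV = 2, Print = 5

print(np.round(b, 4), round(fit, 3))
```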
Using Dummy Variables to Model a Qualitative Independent Variable
Example: The Electronics World Case
Store | Number of Households, x | Location | Location Dummy, DM | Sales Volume, y
1 161 Street 0 157.27
2 99 Street 0 93.28
3 135 Street 0 136.81
4 120 Street 0 123.79
5 164 Street 0 153.51
6 221 Mall 1 241.74
7 179 Mall 1 201.54
8 204 Mall 1 206.71
9 214 Mall 1 229.78
10 101 Mall 1 135.22
Location Dummy Variable:
DM = 1 if a store is in a mall location; 0 otherwise
Example: Regression with a Dummy Variable
Example: The Electronics World Case
Sales = 17.4 + 0.851 Households + 29.2 DM
Predictor Coef StDev T P
Constant 17.360 9.447 1.84 0.109
Househol 0.85105 0.06524 13.04 0.000
DM 29.216 5.594 5.22 0.001
S = 7.329 R-Sq = 98.3% R-Sq(adj) = 97.8%
Analysis of Variance
Source DF SS MS F P
Regression 2 21412 10706 199.32 0.000
Residual Error 7 376 54
Total 9 21788
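The dummy-variable fit above can be reproduced from the Electronics World data table with numpy (a sketch):

```python
import numpy as np

# Electronics World data: households, mall dummy, sales volume
x  = np.array([161, 99, 135, 120, 164, 221, 179, 204, 214, 101], dtype=float)
dm = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)   # 1 = mall location
y  = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
               241.74, 201.54, 206.71, 229.78, 135.22])

X = np.column_stack([np.ones(10), x, dm])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(b, 3))   # intercept, households slope, mall effect
```

The coefficient on DM (about 29.2) estimates how much mean sales volume shifts for a mall store relative to a street store with the same number of households.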
The Partial F Test: Testing the Significance of a Portion of a Regression Model
Complete model: y = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk + ε
Reduced model: y = β0 + β1x1 + … + βgxg + ε
To test H0: βg+1 = βg+2 = … = βk = 0 versus
Ha: At least one of βg+1, βg+2, …, βk is not equal to 0
Partial F Statistic: F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))]
Reject H0 in favor of Ha if:
F > Fα or
p-value < α
Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom.
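The partial F statistic above translates directly into code; the function name and the SSE values in the demonstration are hypothetical (a sketch):

```python
from scipy import stats

def partial_f_test(sse_reduced, sse_complete, n, k, g):
    """Partial F test of H0: beta_{g+1} = ... = beta_k = 0.
    k = number of predictors in the complete model, g = number retained."""
    df1, df2 = k - g, n - (k + 1)
    F = ((sse_reduced - sse_complete) / df1) / (sse_complete / df2)
    p = stats.f.sf(F, df1, df2)
    return F, p

# hypothetical SSEs: dropping 2 of 3 predictors doubles the SSE
F, p = partial_f_test(100.0, 50.0, n=20, k=3, g=1)
print(round(F, 1), p < 0.05)   # 8.0 True
```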
Model Building and the Effects of Multicollinearity
Example: The Sales Territory Performance Case
Sales Time MktPoten Adver MktShare Change Accts WkLoad Rating
3669.88 43.10 74065.11 4582.88 2.51 0.34 74.86 15.05 4.9
3473.95 108.13 58117.30 5539.78 5.51 0.15 107.32 19.97 5.1
2295.10 13.82 21118.49 2950.38 10.91 -0.72 96.75 17.34 2.9
4675.56 186.18 68521.27 2243.07 8.27 0.17 195.12 13.40 3.4
6125.96 161.79 57805.11 7747.08 9.15 0.50 180.44 17.64 4.6
2134.94 8.94 37806.94 402.44 5.51 0.15 104.88 16.22 4.5
5031.66 365.04 50935.26 3140.62 8.54 0.55 256.10 18.80 4.6
3367.45 220.32 35602.08 2086.16 7.07 -0.49 126.83 19.86 2.3
6519.45 127.64 46176.77 8846.25 12.54 1.24 203.25 17.42 4.9
4876.37 105.69 42053.24 5673.11 8.85 0.31 119.51 21.41 2.8
2468.27 57.72 36829.71 2761.76 5.38 0.37 116.26 16.32 3.1
2533.31 23.58 33612.67 1991.85 5.43 -0.65 142.28 14.51 4.2
2408.11 13.82 21412.79 1971.52 8.48 0.64 89.43 19.35 4.3
2337.38 13.82 20416.87 1737.38 7.80 1.01 84.55 20.02 4.2
4586.95 86.99 36272.00 10694.20 10.34 0.11 119.51 15.26 5.5
2729.24 165.85 23093.26 8618.61 5.15 0.04 80.49 15.87 3.6
3289.40 116.26 26879.59 7747.89 6.64 0.68 136.58 7.81 3.4
2800.78 42.28 39571.96 4565.81 5.45 0.66 78.86 16.00 4.2
3264.20 52.84 51866.15 6022.70 6.31 -0.10 136.58 17.44 3.6
3453.62 165.04 58749.82 3721.10 6.35 -0.03 138.21 17.98 3.1
1741.45 10.57 23990.82 860.97 7.37 -1.63 75.61 20.99 1.6
2035.75 13.82 25694.86 3571.51 8.39 -0.43 102.44 21.66 3.4
1578.00 8.13 23736.35 2845.50 5.15 0.04 76.42 21.46 2.7
4167.44 58.54 34314.29 5060.11 12.88 0.22 136.58 24.78 2.8
2799.97 21.14 22809.53 3552.00 9.14 -0.74 88.62 24.96 3.9
Correlation Matrix
Example: The Sales Territory Performance Case (correlation matrix not shown)
Multicollinearity
Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.
Effects
Hinders the ability to use the bj’s, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.
Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)
Variance Inflation Factors (VIF)
The variance inflation factor for the jth independent (or predictor)
variable xj is
VIFj = 1 / (1 − Rj²)
where Rj² is the multiple coefficient of determination for the regression model relating xj to the other predictors x1, …, xj−1, xj+1, …, xk:
xj = β0 + β1x1 + β2x2 + … + βj−1xj−1 + βj+1xj+1 + … + βkxk + ε
Notes:
VIFj = 1 implies xj is not related to the other predictors
max(VIFj) > 10 suggests severe multicollinearity
mean(VIFj) substantially greater than 1 suggests severe multicollinearity
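The definition above can be coded directly: regress each predictor on the rest, take Rj², and invert 1 − Rj². A numpy sketch (the function name is my own; the demonstration uses two orthogonal predictors, for which both VIFs equal 1):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X
    (columns are the predictors x1, ..., xk, without an intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        # regress x_j on the remaining predictors (plus an intercept)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        total = X[:, j] - X[:, j].mean()
        r2j = 1.0 - (resid @ resid) / (total @ total)
        out.append(1.0 / (1.0 - r2j))
    return np.array(out)

# two orthogonal predictors: both VIFs are 1
print(vif(np.column_stack([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]])))
```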
Example: Variance Inflation Factors (VIF)
Example: The Sales Territory Performance Case (MegaStat Output)
Regression output confidence interval
variables coefficients std. error t (df=16) p-value 95% lower 95% upper VIF
Intercept -1,507.8137 778.6349 -1.936 .0707 -3,158.4457 142.8182
Time 2.0096 1.9307 1.041 .3134 -2.0832 6.1024 3.343
MktPoten 0.0372 0.0082 4.536 .0003 0.0198 0.0546 1.978
Adver 0.1510 0.0471 3.205 .0055 0.0511 0.2509 1.910
MktShare 199.0235 67.0279 2.969 .0090 56.9307 341.1164 3.236
Change 290.8551 186.7820 1.557 .1390 -105.1049 686.8152 1.602
Accts 5.5510 4.7755 1.162 .2621 -4.5728 15.6747 5.639
WkLoad 19.7939 33.6767 0.588 .5649 -51.5975 91.1853 1.818
Rating 8.1893 128.5056 0.064 .9500 -264.2304 280.6090 1.809
mean VIF = 2.667
max(VIFj) = 5.639 and mean(VIFj) = 2.667, so multicollinearity is probably not severe
Residual Analysis in Multiple Regression
For an observed value of yi, the residual is
ei = yi − ŷi = yi − (b0 + b1xi1 + … + bkxik)
If the regression assumptions hold, the residuals should
look like a random sample from a normal distribution
with mean 0 and variance σ2.
Residual Plots
Residuals versus each independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals
Nonconstant Variance: Remedial Measures
Example: The QHIC Case
Transformed model: y/x = β0(1/x) + β1 + β2x + η
Upkeep/V = - 53.5 1/V + 3.41 One + 0.0112 Value
Predictor Coef SE Coef T P
Noconstant
1/V -53.50 83.20 -0.64 0.524
One 3.409 1.321 2.58 0.014
Value 0.011224 0.004627 2.43 0.020
Predicted Values (1/V = 0.004545, One = 1, Value = 220)
Fit SE Fit 95.0% CI 95.0% PI
5.635 0.162 ( 5.306, 5.964) ( 3.994, 7.276)
Plots: residuals versus x and versus predicted responses (not shown)
Diagnostics for Detecting Outlying and Influential Observations
Observation 1: Outlying with respect to y value
Observation 2: Outlying with respect to x value
Observation 3: Outlying with respect to x value and y value not
consistent with regression relationship (Influential)
Example: Influence Diagnostics
Hospital Labor Needs Case, Model:
y = monthly labor hours required
x1 = monthly X-ray exposures
x2 = monthly occupied bed days
x3 = average length of patient stay (days)
Observation | Hours | Predicted | Residual | Leverage | Studentized Residual | Studentized Deleted Residual | Cook's D
1 566.520 688.409 -121.889 0.121 -0.211 -0.203 0.002
2 696.820 721.848 -25.028 0.226 -0.046 -0.044 0.000
3 1,033.150 965.393 67.757 0.130 0.118 0.114 0.001
4 1,603.620 1,172.464 431.156 0.159 0.765 0.752 0.028
5 1,611.370 1,526.780 84.590 0.085 0.144 0.138 0.000
6 1,613.270 1,993.869 -380.599 0.112 -0.657 -0.642 0.014
7 1,854.170 1,676.558 177.612 0.084 0.302 0.291 0.002
8 2,160.550 1,791.405 369.145 0.083 0.627 0.612 0.009
9 2,305.580 2,798.761 -493.181 0.085 -0.838 -0.828 0.016
10 3,503.930 4,191.333 -687.403 0.120 -1.192 -1.214 0.049
11 3,571.890 3,190.957 380.933 0.077 0.645 0.630 0.009
12 3,741.400 4,364.502 -623.102 0.177 -1.117 -1.129 0.067
13 4,026.520 4,364.229 -337.709 0.064 -0.568 -0.553 0.006
14 10,343.810 8,713.307 1,630.503 0.146 2.871 4.558 0.353
15 11,732.170 12,080.864 -348.694 0.682 -1.005 -1.006 0.541
16 15,414.940 15,133.026 281.914 0.785 0.990 0.989 0.897
17 18,854.450 19,260.453 -406.003 0.863 -1.786 -1.975 5.033
Leverage Values
Leverage = distance value (hi )
An observation is outlying with respect to x if it has a large leverage,
greater than 2(k+1)/n
Hospital Labor Needs Case: n = 17, k = 3, 2(3+1)/17 = 0.4706
(influence diagnostics table as shown above)
Residuals and Studentized Residuals
Studentized Residual = eᵢ′ = eᵢ / (s √(1 − hᵢ))  (residual divided by its standard error)
An observation is outlying with respect to y if it has a large studentized (or standardized) residual, |StRes| greater than 2.
(influence diagnostics table as shown above)
Studentized Deleted Residuals
Studentized Deleted Residual = dᵢ / s_dᵢ = eᵢ √[(n − k − 2) / (SSE(1 − hᵢ) − eᵢ²)]  (deleted residual divided by its standard error)
An observation is outlying with respect to y if it has a large studentized deleted residual, |tRes| greater than tα/2 (with n − k − 2 degrees of freedom).
Hospital Labor Needs Case: n − k − 2 = 17 − 3 − 2 = 12, so |tRes| > t.025 = 2.179
(influence diagnostics table as shown above)
Cook’s Distance
Cook's Distance = Dᵢ = eᵢ² hᵢ / [(k + 1) s² (1 − hᵢ)²]
An observation is influential with respect to the estimated regression parameters b0, b1, …, bk if it has a large Cook's distance, Dᵢ greater than F.50 (with k + 1 and n − (k + 1) degrees of freedom).
Hospital Labor Needs Case: k + 1 = 4, 17 − (3 + 1) = 13, so Dᵢ > F.50 = 0.8845
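All four diagnostics (leverage, studentized residuals, studentized deleted residuals, Cook's distances) can be computed together from the design matrix and response using the formulas above. A numpy sketch; the function name and the small demonstration dataset are my own, since the hospital predictor values are not reproduced here:

```python
import numpy as np

def influence_diagnostics(X, y):
    """Leverage, studentized residuals, studentized deleted residuals,
    and Cook's distances; X must include the intercept column."""
    n, p = X.shape                              # p = k + 1
    H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
    h = np.diag(H)                              # leverage values h_i
    e = y - H @ y                               # residuals
    s2 = e @ e / (n - p)                        # mean square error
    sse = e @ e
    stud = e / np.sqrt(s2 * (1 - h))            # studentized residuals
    stud_del = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    cooks = e**2 * h / (p * s2 * (1 - h)**2)    # Cook's D
    return h, stud, stud_del, cooks

# small made-up example: x = 10 is a high-leverage point
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 10.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 8.0])
h, stud, stud_del, cooks = influence_diagnostics(X, y)
print(np.round(h, 3))
```

A useful sanity check on any leverage computation: the hᵢ always sum to k + 1, the number of estimated parameters.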
(influence diagnostics table as shown above)