THIRD EDITION
Applied Regression Analysis and
Other Multivariable Methods
David G. Kleinbaum
Emory University
Lawrence L. Kupper
University of North Carolina, Chapel Hill
Keith E. Muller
University of North Carolina, Chapel Hill
Azhar Nizam
Emory University
An Alexander Kugushev Book
Duxbury Press
An Imprint of Brooks/Cole Publishing Company
An International Thomson Publishing Company
Pacific Grove Albany Belmont Bonn Boston Cincinnati Detroit Johannesburg London
Madrid Melbourne Mexico City New York Paris Singapore Tokyo Toronto Washington
Contents
1 CONCEPTS AND EXAMPLES OF RESEARCH
1-1 Concepts
1-2 Examples
1-3 Concluding Remarks
References
2 CLASSIFICATION OF VARIABLES AND THE CHOICE OF ANALYSIS
2-1 Classification of Variables
2-2 Overlapping of Classification Schemes
2-3 Choice of Analysis
References
3 BASIC STATISTICS: A REVIEW
3-1 Preview
3-2 Descriptive Statistics
3-3 Random Variables and Distributions
3-4 Sampling Distributions of t, $\chi^2$, and F
3-5 Statistical Inference: Estimation
3-6 Statistical Inference: Hypothesis Testing
3-7 Error Rates, Power, and Sample Size
Problems
References
4 INTRODUCTION TO REGRESSION ANALYSIS
4-1 Preview
4-2 Association versus Causality
4-3 Statistical versus Deterministic Models
4-4 Concluding Remarks
References
5 STRAIGHT-LINE REGRESSION ANALYSIS
5-1 Preview
5-2 Regression with a Single Independent Variable
5-3 Mathematical Properties of a Straight Line
5-4 Statistical Assumptions for a Straight-line Model
5-5 Determining the Best-fitting Straight Line
5-6 Measure of the Quality of the Straight-line Fit and Estimate of $\sigma^2$
5-7 Inferences About the Slope and Intercept
5-8 Interpretations of Tests for Slope and Intercept
5-9 Inferences About the Regression Line $\mu_{Y|X} = \beta_0 + \beta_1 X$
5-10 Prediction of a New Value of Y at $X_0$
5-11 Assessing the Appropriateness of the Straight-line Model
Problems
References
6 THE CORRELATION COEFFICIENT AND STRAIGHT-LINE REGRESSION ANALYSIS
6-1 Definition of r
6-2 r as a Measure of Association
6-3 The Bivariate Normal Distribution
6-4 r and the Strength of the Straight-line Relationship
6-5 What r Does Not Measure
6-6 Tests of Hypotheses and Confidence Intervals for the Correlation Coefficient
6-7 Testing for the Equality of Two Correlations
Problems
References
7 THE ANALYSIS-OF-VARIANCE TABLE
7-1 Preview
7-2 The ANOVA Table for Straight-line Regression
Problems
8 MULTIPLE REGRESSION ANALYSIS: GENERAL CONSIDERATIONS
8-1 Preview
8-2 Multiple Regression Models
8-3 Graphical Look at the Problem
8-4 Assumptions of Multiple Regression
8-5 Determining the Best Estimate of the Multiple Regression Equation
8-6 The ANOVA Table for Multiple Regression
8-7 Numerical Examples
Problems
References
9 TESTING HYPOTHESES IN MULTIPLE REGRESSION
9-1 Preview
9-2 Test for Significant Overall Regression
9-3 Partial F Test
9-4 Multiple Partial F Test
9-5 Strategies for Using Partial F Tests
9-6 Tests Involving the Intercept
Problems
References
10 CORRELATIONS: MULTIPLE, PARTIAL, AND MULTIPLE PARTIAL
10-1 Preview
10-2 Correlation Matrix
10-3 Multiple Correlation Coefficient
10-4 Relationship of $R_{Y|X_1, X_2, \ldots, X_k}$ to the Multivariate Normal Distribution
10-5 Partial Correlation Coefficient
10-6 Alternative Representation of the Regression Model
10-7 Multiple Partial Correlation
10-8 Concluding Remarks
Problems
Reference
11 CONFOUNDING AND INTERACTION IN REGRESSION
11-1 Preview
11-2 Overview
11-3 Interaction in Regression
11-4 Confounding in Regression
11-5 Summary and Conclusions
Problems
Reference
12 REGRESSION DIAGNOSTICS
12-1 Preview
12-2 Simple Approaches to Diagnosing Problems in Data
12-3 Residual Analysis
12-4 Treating Outliers
12-5 Collinearity
12-6 Scaling Problems
12-7 Treating Collinearity and Scaling Problems
12-8 Alternate Strategies of Analysis
12-9 An Important Caution
Problems
References
13 POLYNOMIAL REGRESSION
13-1 Preview
13-2 Polynomial Models
13-3 Least-squares Procedure for Fitting a Parabola
13-4 ANOVA Table for Second-order Polynomial Regression
13-5 Inferences Associated with Second-order Polynomial Regression
13-6 Example Requiring a Second-order Model
13-7 Fitting and Testing Higher-order Models
13-8 Lack-of-fit Tests
13-9 Orthogonal Polynomials
13-10 Strategies for Choosing a Polynomial Model
Problems
14 DUMMY VARIABLES IN REGRESSION
14-1 Preview
14-2 Definitions
14-3 Rule for Defining Dummy Variables
14-4 Comparing Two Straight-line Regression Equations: An Example
14-5 Questions for Comparing Two Straight Lines
14-6 Methods of Comparing Two Straight Lines
14-7 Method I: Using Separate Regression Fits to Compare Two Straight Lines
14-8 Method II: Using a Single Regression Equation to Compare Two Straight Lines
14-9 Comparison of Methods I and II
14-10 Testing Strategies and Interpretation: Comparing Two Straight Lines
14-11 Other Dummy Variable Models
14-12 Comparing Four Regression Equations
14-13 Comparing Several Regression Equations Involving Two Nominal Variables
Problems
References
15 ANALYSIS OF COVARIANCE AND OTHER METHODS FOR ADJUSTING CONTINUOUS DATA
15-1 Preview
15-2 Adjustment Problem
15-3 Analysis of Covariance
15-4 Assumption of Parallelism: A Potential Drawback
15-5 Analysis of Covariance: Several Groups and Several Covariates
15-6 Comments and Cautions
15-7 Summary
Problems
Reference
16 SELECTING THE BEST REGRESSION EQUATION
16-1 Preview
16-2 Steps in Selecting the Best Regression Equation
16-3 Step 1: Specifying the Maximum Model
16-4 Step 2: Specifying a Criterion for Selecting a Model
16-5 Step 3: Specifying a Strategy for Selecting Variables
16-6 Step 4: Conducting the Analysis
16-7 Step 5: Evaluating Reliability with Split Samples
16-8 Example Analysis of Actual Data
16-9 Issues in Selecting the Most Valid Model
Problems
References
17 ONE-WAY ANALYSIS OF VARIANCE
17-1 Preview
17-2 One-way ANOVA: The Problem, Assumptions, and Data Configuration
17-3 Methodology for One-way Fixed-effects ANOVA
17-4 Regression Model for Fixed-effects One-way ANOVA
17-5 Fixed-effects Model for One-way ANOVA
17-6 Random-effects Model for One-way ANOVA
17-7 Multiple-comparison Procedures for Fixed-effects One-way ANOVA
17-8 Choosing a Multiple-comparison Technique
17-9 Orthogonal Contrasts and Partitioning an ANOVA Sum of Squares
Problems
References
18 RANDOMIZED BLOCKS: SPECIAL CASE OF TWO-WAY ANOVA
18-1 Preview
18-2 Equivalent Analysis of a Matched Pairs Experiment
18-3 Principle of Blocking
18-4 Analysis of a Randomized-blocks Experiment
18-5 ANOVA Table for a Randomized-blocks Experiment
18-6 Regression Models for a Randomized-blocks Experiment
18-7 Fixed-effects ANOVA Model for a Randomized-blocks Experiment
Problems
References
19 TWO-WAY ANOVA WITH EQUAL CELL NUMBERS
19-1 Preview
19-2 Using a Table of Cell Means
19-3 General Methodology
19-4 F Tests for Two-way ANOVA
19-5 Regression Model for Fixed-effects Two-way ANOVA
19-6 Interactions in Two-way ANOVA
19-7 Random- and Mixed-effects Two-way ANOVA Models
Problems
References
20 TWO-WAY ANOVA WITH UNEQUAL CELL NUMBERS
20-1 Preview
20-2 Problems with Unequal Cell Numbers: Nonorthogonality
20-3 Regression Approach for Unequal Cell Sample Sizes
20-4 Higher-way ANOVA
Problems
References
21 ANALYSIS OF REPEATED MEASURES DATA
21-1 Preview
21-2 Examples
21-3 General Approach for Repeated Measures ANOVA
21-4 Overview of Selected Repeated Measures Designs and ANOVA-based Analyses
21-5 Repeated Measures ANOVA for Unbalanced Data
21-6 Other Approaches to Analyzing Repeated Measures Data
Appendix 21-A Examples of SAS's GLM and MIXED Procedures
Problems
References
22 THE METHOD OF MAXIMUM LIKELIHOOD
22-1 Preview
22-2 The Principle of Maximum Likelihood
22-3 Statistical Inference via Maximum Likelihood
22-4 Summary
Problems
References
23 LOGISTIC REGRESSION ANALYSIS
23-1 Preview
23-2 The Logistic Model
23-3 Estimating the Odds Ratio Using Logistic Regression
23-4 A Numerical Example of Logistic Regression
23-5 Theoretical Considerations
23-6 An Example of Conditional ML Estimation Involving Pair-matched Data with Unmatched Covariates
23-7 Summary
Problems
References
24 POISSON REGRESSION ANALYSIS
24-1 Preview
24-2 The Poisson Distribution
24-3 An Example of Poisson Regression
24-4 Poisson Regression: General Considerations
24-5 Measures of Goodness of Fit
24-6 Continuation of Skin Cancer Data Example
24-7 A Second Illustration of Poisson Regression Analysis
24-8 Summary
Problems
References
APPENDIX A TABLES
A-1 Standard Normal Cumulative Probabilities
A-2 Percentiles of the t Distribution
A-3 Percentiles of the Chi-square Distribution
A-4 Percentiles of the F Distribution
A-5 Values of $\frac{1}{2} \ln \frac{1 + r}{1 - r}$
A-6 Upper $\alpha$ Point of Studentized Range
A-7 Orthogonal Polynomial Coefficients
A-8 Bonferroni Corrected Jackknife and Studentized Residual Critical Values
A-9 Critical Values for Leverages
A-10 Critical Values for the Maximum of N Values of Cook's d(i) times (n - k - 1)
APPENDIX B MATRICES AND THEIR RELATIONSHIP TO REGRESSION ANALYSIS
APPENDIX C ANOVA INFORMATION FOR FOUR COMMON BALANCED REPEATED MEASURES DESIGNS
C-1 Balanced Repeated Measures Design with One Crossover Factor (Treatments)
C-2 Balanced Repeated Measures Design with Two Crossover Factors
C-3 Balanced Repeated Measures Design with One Nest Factor (Treatments)
C-4 Balanced Repeated Measures Design with One Crossover Factor and One Nest Factor
C-5 Balanced Two-group Pre/Posttest Design
References
SOLUTIONS TO EXERCISES
INDEX