Categorical Data Analysis - 2002 - Agresti - Frontmatter

The document is a comprehensive overview of categorical data analysis, detailing methods and techniques developed since the 1960s, with a focus on generalized linear modeling. It serves as a textbook for courses in statistics and biostatistics, covering essential topics such as distributions, contingency tables, and logistic regression. The content is structured into chapters that progressively build on foundational concepts and applications in categorical data analysis.

Categorical Data Analysis

10.1002/0471249688.fmatter, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/0471249688.fmatter by National Institutes Of Health Malaysia, Wiley Online Library on [18/03/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Categorical Data Analysis
Second Edition

ALAN AGRESTI
University of Florida
Gainesville, Florida

This book is printed on acid-free paper.
Copyright © 2002 John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New
York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: [email protected].
For ordering and customer service, call 1-800-CALL-WILEY.

Library of Congress Cataloging-in-Publication Data Is Available

ISBN 0-471-36093-7

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
To Jacki
Contents

Preface xiii

1. Introduction: Distributions and Inference for Categorical Data 1


1.1 Categorical Response Data, 1
1.2 Distributions for Categorical Data, 5
1.3 Statistical Inference for Categorical Data, 9
1.4 Statistical Inference for Binomial Parameters, 14
1.5 Statistical Inference for Multinomial Parameters, 21
Notes, 26
Problems, 28

2. Describing Contingency Tables 36


2.1 Probability Structure for Contingency Tables, 36
2.2 Comparing Two Proportions, 43
2.3 Partial Association in Stratified 2 × 2 Tables, 47
2.4 Extensions for I × J Tables, 54
Notes, 59
Problems, 60

3. Inference for Contingency Tables 70


3.1 Confidence Intervals for Association Parameters, 70
3.2 Testing Independence in Two-Way Contingency
Tables, 78
3.3 Following-Up Chi-Squared Tests, 80
3.4 Two-Way Tables with Ordered Classifications, 86
3.5 Small-Sample Tests of Independence, 91

3.6 Small-Sample Confidence Intervals for 2 × 2 Tables,* 98


3.7 Extensions for Multiway Tables and Nontabulated
Responses, 101
Notes, 102
Problems, 104

4. Introduction to Generalized Linear Models 115


4.1 Generalized Linear Model, 116
4.2 Generalized Linear Models for Binary Data, 120
4.3 Generalized Linear Models for Counts, 125
4.4 Moments and Likelihood for Generalized Linear
Models,* 132
4.5 Inference for Generalized Linear Models, 139
4.6 Fitting Generalized Linear Models, 143
4.7 Quasi-likelihood and Generalized Linear Models,* 149
4.8 Generalized Additive Models,* 153
Notes, 155
Problems, 156

5. Logistic Regression 165


5.1 Interpreting Parameters in Logistic Regression, 166
5.2 Inference for Logistic Regression, 172
5.3 Logit Models with Categorical Predictors, 177
5.4 Multiple Logistic Regression, 182
5.5 Fitting Logistic Regression Models, 192
Notes, 196
Problems, 197

6. Building and Applying Logistic Regression Models 211


6.1 Strategies in Model Selection, 211
6.2 Logistic Regression Diagnostics, 219
6.3 Inference About Conditional Associations in 2 × 2 × K
Tables, 230
6.4 Using Models to Improve Inferential Power, 236
6.5 Sample Size and Power Considerations,* 240
6.6 Probit and Complementary Log-Log Models,* 245

*Sections marked with an asterisk are less important for an overview.


6.7 Conditional Logistic Regression and Exact Distributions,* 250
Notes, 257
Problems, 259

7. Logit Models for Multinomial Responses 267


7.1 Nominal Responses: Baseline-Category Logit Models, 267
7.2 Ordinal Responses: Cumulative Logit Models, 274
7.3 Ordinal Responses: Cumulative Link Models, 282
7.4 Alternative Models for Ordinal Responses,* 286
7.5 Testing Conditional Independence in I × J × K
Tables,* 293
7.6 Discrete-Choice Multinomial Logit Models,* 298
Notes, 302
Problems, 302

8. Loglinear Models for Contingency Tables 314


8.1 Loglinear Models for Two-Way Tables, 314
8.2 Loglinear Models for Independence and Interaction in
Three-Way Tables, 318
8.3 Inference for Loglinear Models, 324
8.4 Loglinear Models for Higher Dimensions, 326
8.5 The Loglinear–Logit Model Connection, 330
8.6 Loglinear Model Fitting: Likelihood Equations and
Asymptotic Distributions,* 333
8.7 Loglinear Model Fitting: Iterative Methods and their
Application,* 342
Notes, 346
Problems, 347

9. Building and Extending Loglinear/Logit Models 357

9.1 Association Graphs and Collapsibility, 357
9.2 Model Selection and Comparison, 360
9.3 Diagnostics for Checking Models, 366
9.4 Modeling Ordinal Associations, 367
9.5 Association Models,* 373
9.6 Association Models, Correlation Models, and
Correspondence Analysis,* 379
9.7 Poisson Regression for Rates, 385


9.8 Empty Cells and Sparseness in Modeling Contingency
Tables, 391
Notes, 398
Problems, 400

10. Models for Matched Pairs 409


10.1 Comparing Dependent Proportions, 410
10.2 Conditional Logistic Regression for Binary Matched
Pairs, 414
10.3 Marginal Models for Square Contingency Tables, 420
10.4 Symmetry, Quasi-symmetry, and Quasi-
independence, 423
10.5 Measuring Agreement Between Observers, 431
10.6 Bradley–Terry Model for Paired Preferences, 436
10.7 Marginal Models and Quasi-symmetry Models for
Matched Sets,* 439
Notes, 442
Problems, 444

11. Analyzing Repeated Categorical Response Data 455


11.1 Comparing Marginal Distributions: Multiple
Responses, 456
11.2 Marginal Modeling: Maximum Likelihood Approach, 459
11.3 Marginal Modeling: Generalized Estimating Equations
Approach, 466
11.4 Quasi-likelihood and Its GEE Multivariate Extension:
Details,* 470
11.5 Markov Chains: Transitional Modeling, 476
Notes, 481
Problems, 482

12. Random Effects: Generalized Linear Mixed Models for Categorical Responses 491

12.1 Random Effects Modeling of Clustered Categorical
Data, 492
12.2 Binary Responses: Logistic-Normal Model, 496
12.3 Examples of Random Effects Models for Binary
Data, 502
12.4 Random Effects Models for Multinomial Data, 513
12.5 Multivariate Random Effects Models for Binary Data, 516
12.6 GLMM Fitting, Inference, and Prediction, 520
Notes, 526
Problems, 527

13. Other Mixture Models for Categorical Data* 538


13.1 Latent Class Models, 538
13.2 Nonparametric Random Effects Models, 545
13.3 Beta-Binomial Models, 553
13.4 Negative Binomial Regression, 559
13.5 Poisson Regression with Random Effects, 563
Notes, 565
Problems, 566

14. Asymptotic Theory for Parametric Models 576


14.1 Delta Method, 577
14.2 Asymptotic Distributions of Estimators of Model
Parameters and Cell Probabilities, 582
14.3 Asymptotic Distributions of Residuals and Goodness-
of-Fit Statistics, 587
14.4 Asymptotic Distributions for Logit/Loglinear
Models, 592
Notes, 594
Problems, 595

15. Alternative Estimation Theory for Parametric Models 600


15.1 Weighted Least Squares for Categorical Data, 600
15.2 Bayesian Inference for Categorical Data, 604
15.3 Other Methods of Estimation, 611
Notes, 615
Problems, 616

16. Historical Tour of Categorical Data Analysis* 619


16.1 Pearson–Yule Association Controversy, 619
16.2 R. A. Fisher’s Contributions, 622
16.3 Logistic Regression, 624


16.4 Multiway Contingency Tables and Loglinear Models, 625
16.5 Recent (and Future?) Developments, 629

Appendix A. Using Computer Software to Analyze Categorical Data 632


A.1 Software for Categorical Data Analysis, 632
A.2 Examples of SAS Code by Chapter, 634

Appendix B. Chi-Squared Distribution Values 654

References 655

Examples Index 689

Author Index 693

Subject Index 701


Preface

The explosion in the development of methods for analyzing categorical data
that began in the 1960s has continued apace in recent years. This book
provides an overview of these methods, as well as older, now standard,
methods. It gives special emphasis to generalized linear modeling techniques,
which extend linear model methods for continuous variables, and their
extensions for multivariate responses.
Today, because of this development and the ubiquity of categorical data in
applications, most statistics and biostatistics departments offer courses on
categorical data analysis. This book can be used as a text for such courses.
The material in Chapters 1–7 forms the heart of most courses. Chapters 1–3
cover distributions for categorical responses and traditional methods for
two-way contingency tables. Chapters 4–7 introduce logistic regression and
related logit models for binary and multicategory response variables.
Chapters 8 and 9 cover loglinear models for contingency tables. Over time, this
model class seems to have lost importance, and this edition reduces somewhat
its discussion of them and expands its focus on logistic regression.
In the past decade, the major area of new research has been the development
of methods for repeated measurement and other forms of clustered
categorical data. Chapters 10–13 present these methods, including marginal
models and generalized linear mixed models with random effects. Chapters
14 and 15 present theoretical foundations as well as alternatives to the
maximum likelihood paradigm that this text adopts. Chapter 16 is devoted to
a historical overview of the development of the methods. It examines
contributions of noted statisticians, such as Pearson and Fisher, whose pioneering
efforts, and sometimes vocal debates, broke the ground for this evolution.
Every chapter of the first edition has been extensively rewritten, and some
substantial additions and changes have occurred. The major differences are:


A new Chapter 1 that introduces distributions and methods of inference
for categorical data.

A unified presentation of models as special cases of generalized linear
models, starting in Chapter 4 and then throughout the text.

Greater emphasis on logistic regression for binary response variables
and extensions for multicategory responses, with Chapters 4–7 introducing
models and Chapters 10–13 extending them for clustered data.

Three new chapters on methods for clustered, correlated categorical
data, increasingly important in applications.

A new chapter on the historical development of the methods.

More discussion of “exact” small-sample procedures and of conditional
logistic regression.

In this text, I interpret categorical data analysis to refer to methods for
categorical response variables. For most methods, explanatory variables can
be qualitative or quantitative, as in ordinary regression. Thus, the focus is
intended to be more general than contingency table analysis, although for
simplicity of data presentation, most examples use contingency tables. These
examples are often simplistic, but should help readers focus on understanding
the methods themselves and make it easier for them to replicate results
with their favorite software.
Special features of the text include:


More than 100 analyses of “real” data sets.

More than 600 exercises at the end of the chapters, some directed
towards theory and methods and some towards applications and data
analysis.

An appendix that shows, by chapter, the use of SAS for performing
analyses presented in this book.

Notes at the end of each chapter that provide references for recent
research and many topics not covered in the text.

Appendix A summarizes statistical software needed to use the methods
described in this text. It shows how to use SAS for analyses included in the
text and refers to a web site (www.stat.ufl.edu/~aa/cda/cda.html) that
contains (1) information on the use of other software (such as R, S-plus,
SPSS, and Stata), (2) data sets for examples in the form of complete SAS
programs for conducting the analyses, (3) short answers for many of the
odd-numbered exercises, (4) corrections of errors in early printings of the
book, and (5) extra exercises. I recommend that readers refer to this
appendix or specialized manuals while reading the text, as an aid to
implementing the methods.
I intend this book to be accessible to the diverse mix of students who take
graduate-level courses in categorical data analysis. But I have also written it
with practicing statisticians and biostatisticians in mind. I hope it enables
them to catch up with recent advances and learn about methods that
sometimes receive inadequate attention in the traditional statistics curriculum.
The development of new methods has influenced, and been influenced
by, the increasing availability of data sets with categorical responses in the
social, behavioral, and biomedical sciences, as well as in public health, human
genetics, ecology, education, marketing, and industrial quality control. And
so, although this book is directed mainly to statisticians and biostatisticians, I
also aim for it to be helpful to methodologists in these fields.
Readers should possess a background that includes regression and analysis
of variance models, as well as maximum likelihood methods of statistical
theory. Those not having much theory background should be able to follow
most methodological discussions. Sections and subsections marked with an
asterisk are less important for an overview. Readers with mainly applied
interests can skip most of Chapter 4 on the theory of generalized linear
models and proceed to other chapters. However, the book has a distinctly
higher technical level and is more thorough and complete than my lower-level
text, An Introduction to Categorical Data Analysis (Wiley, 1996).
I thank those who commented on parts of the manuscript or provided help
of some type. Special thanks to Bernhard Klingenberg, who read several
chapters carefully and made many helpful suggestions, Yongyi Min, who
constructed many of the figures and helped with some software, and Brian
Caffo, who helped with some examples. Many thanks to Rosyln Stone and
Brian Marx for each reviewing half the manuscript and Brian Caffo, I-Ming
Liu, and Yongyi Min for giving insightful comments on several chapters.
Thanks to Constantine Gatsonis and his students for using a draft in a course
at Brown University and providing suggestions. Others who provided comments
on chapters or help of some type include Patricia Altham, Wicher
Bergsma, Jane Brockmann, Brent Coull, Al DeMaris, Regina Dittrich,
Jianping Dong, Herwig Friedl, Ralitza Gueorguieva, James Hobert, Walter
Katzenbeisser, Harry Khamis, Svend Kreiner, Joseph Lang, Jason Liao,
Mojtaba Ganjali, Jane Pendergast, Michael Radelet, Kenneth Small, Maura
Stokes, Tom Ten Have, and Rongling Wu. I thank my co-authors on various
projects, especially Brent Coull, Joseph Lang, James Booth, James Hobert,
Brian Caffo, and Ranjini Natarajan, for permission to use material from
those articles. Thanks to the many who reviewed material or suggested
examples for the first edition, mentioned in the Preface of that edition.
Thanks also to Wiley Executive Editor Steve Quigley for his steadfast
encouragement and facilitation of this project. Finally, thanks to my wife
Jacki Levine for continuing support of all kinds, despite the many days this
work has taken from our time together.

ALAN AGRESTI

Gainesville, Florida
November 2001
