STT553: Categorical Data Analysis (CDA)
Lecture # 3: Contingency Tables
Lectured by Md. Kaderi Kibria, STT-HSTU
Objectives of this lecture:
After reading this unit, you should be able to
• understand the basic concepts of the two-way contingency table
• understand the odds ratio
• understand the test of independence
Contingency Table
A rectangular table having I rows for the categories of X and J columns for the categories of Y displays the IJ possible combinations of outcomes. The cells of the table represent the IJ possible outcomes. When the cells contain frequency counts of outcomes for a sample, the table is called a contingency table, a term introduced by Karl Pearson (1904). Another name is cross-classification table. A contingency table with I rows and J columns is called an I × J table.
In the abstract, a contingency table looks like:
n_ij     Y=1    Y=2    ...   Y=J    Total
X=1      n_11   n_12   ...   n_1J   n_1+
X=2      n_21   n_22   ...   n_2J   n_2+
...      ...    ...    ...   ...    ...
X=I      n_I1   n_I2   ...   n_IJ   n_I+
Total    n_+1   n_+2   ...   n_+J   n_++
If subjects are randomly sampled from the population and cross-classified, both X and Y are random and (X, Y) has a bivariate discrete joint distribution. Let π_ij = P(X = i, Y = j), the probability that an observation falls in the cell in row i and column j of the table.
Example: The Physicians’ Health Study was a 5-year randomized study of whether regular aspirin intake reduces mortality from cardiovascular disease. Every other day, physicians participating in the study took either one aspirin tablet or a placebo. The study was blind: those in the study did not know whether they were taking aspirin or a placebo.
Table 1: Cross-classification of aspirin use and myocardial infarction

                    Myocardial Infarction
          Fatal Attack   Non-fatal Attack   No Attack
Placebo        18              171           10,845
Aspirin         5               99           10,933
Joint and Marginal Distributions for a Contingency Table
Let π_ij denote the probability that (X, Y) falls in the cell in row i and column j. The probability distribution {π_ij} is the joint distribution of X and Y. The marginal distributions are the row and column totals obtained by summing the joint probabilities. We denote these by {π_i+} for the row variable and {π_+j} for the column variable, where the subscript "+" denotes the sum over that index; that is,

P(X = i) = ∑_{j=1}^{J} P(X = i, Y = j) = ∑_{j=1}^{J} π_ij = π_i+   and
P(Y = j) = ∑_{i=1}^{I} P(X = i, Y = j) = ∑_{i=1}^{I} π_ij = π_+j

These satisfy ∑_i π_i+ = ∑_j π_+j = ∑_i ∑_j π_ij = π_++ = 1. The marginal distributions provide single-variable information.
When X is fixed rather than random, the notion of a joint distribution for X and Y is no longer meaningful. However, for a fixed category of X, Y has a probability distribution.
Example: The following 2 × 3 contingency table is from a report by the Physicians’ Health Study Research Group on n = 22,071 physicians who took either a placebo or aspirin every other day.

                    Myocardial Infarction
          Fatal Attack   Non-fatal Attack   No Attack
Placebo        18              171           10,845
Aspirin         5               99           10,933
Here we have placed the sample joint probabilities (cell counts divided by n) in each cell, with the marginal distributions in the margins:

                    Myocardial Infarction
          Fatal Attack   Non-fatal Attack   No Attack   Row marginal
Placebo      0.00081          0.0077         0.4913       0.49981
Aspirin      0.00022          0.0044         0.4953       0.49992
Column       0.00103          0.0121         0.9866       1.000
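These joint proportions are simply the cell counts divided by n = 22,071. A minimal R sketch that reproduces the table above (the matrix name and dimension labels are ours):

# Cell counts for the 2 x 3 aspirin table
counts <- matrix(c(18, 171, 10845,
                   5,  99, 10933),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group = c("Placebo", "Aspirin"),
                                 MI = c("Fatal", "Nonfatal", "None")))
joint <- prop.table(counts)   # sample joint distribution: n_ij / n
addmargins(joint)             # appends the row and column marginal totals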
Conditional Distributions for a Contingency Table
Given that a subject is classified in row i of X, π_j|i denotes the probability of classification in column j of Y, j = 1, 2, ..., J. Note that ∑_{j=1}^{J} π_j|i = 1. The probabilities {π_1|i, ..., π_J|i} form the conditional distribution of Y at category i of X.
The conditional distribution of Y given X relates to the joint distribution by

π_j|i = P(Y = j | X = i) = π_ij / π_i+   for i = 1, 2, ..., I
                  Column
Row          1            2          Total
1          π_11         π_12         π_1+
          (π_1|1)      (π_2|1)       (1.0)
2          π_21         π_22         π_2+
          (π_1|2)      (π_2|2)       (1.0)
Total      π_+1         π_+2         1.0
Sample distributions use similar notation, with p or π̂ in place of π. For instance, {π̂_ij} denotes the sample joint distribution. The cell frequencies are denoted by {n_ij}, and n = ∑_i ∑_j n_ij is the total sample size. Thus π̂_ij = n_ij / n.
Example: Consider again the 2 × 3 table from the Physicians’ Health Study Research Group on n = 22,071 physicians who took either a placebo or aspirin every other day.

                    Myocardial Infarction
          Fatal Attack   Non-fatal Attack   No Attack
Placebo        18              171           10,845
Aspirin         5               99           10,933
Here each cell shows the conditional probability of that classification, given the treatment group:

                    Myocardial Infarction
          Fatal Attack   Non-fatal Attack   No Attack
Placebo     π_1|1           π_2|1            π_3|1
Aspirin     π_1|2           π_2|2            π_3|2
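Since row i sums to n_i+, these conditional probabilities are estimated by n_ij / n_i+. A minimal R sketch, reusing the counts matrix defined earlier:

# Conditional sample distributions of MI status given treatment group:
# prop.table with margin = 1 divides each row by its row total n_i+
prop.table(counts, margin = 1)
# Placebo row: 18/11034, 171/11034, 10845/11034 ≈ 0.0016, 0.0155, 0.9829
# Aspirin row:  5/11037,  99/11037, 10933/11037 ≈ 0.0005, 0.0090, 0.9906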
Independence of Categorical Variables
Two categorical response variables are defined to be independent if all joint probabilities equal the product of their marginal probabilities:

P(X = i, Y = j) = P(X = i) P(Y = j), or
π_ij = π_i+ π_+j   for i = 1, 2, ..., I and j = 1, 2, ..., J
When X and Y are independent,

π_j|i = P(Y = j | X = i) = π_ij / π_i+ = π_i+ π_+j / π_i+ = π_+j   for i = 1, 2, ..., I
π_i|j = P(X = i | Y = j) = π_ij / π_+j = π_i+ π_+j / π_+j = π_i+   for j = 1, 2, ..., J

Each conditional distribution of Y is then identical to the marginal distribution of Y.
Thus, two variables are independent when {π_j|1 = ⋯ = π_j|I, for j = 1, 2, ..., J}; that is, the probability of any given column response is the same in each row. When Y is a response and X is an explanatory variable, this is a more natural way to define independence. Independence is then often referred to as homogeneity of the conditional distributions.
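A toy numerical illustration of this homogeneity (the probabilities here are hypothetical, chosen only to satisfy independence exactly):

# Build a joint distribution that satisfies independence: pi_ij = pi_i+ * pi_+j
pi_row <- c(0.3, 0.7)               # marginal distribution of X
pi_col <- c(0.2, 0.5, 0.3)          # marginal distribution of Y
pi_joint <- outer(pi_row, pi_col)   # 2 x 3 matrix of products
prop.table(pi_joint, margin = 1)    # every row equals pi_col: homogeneity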
Comparing Two Proportions
Difference of Proportions
We use the generic terms success and failure for the outcome categories. Let X and Y be dichotomous. Let π_1 = P(Y = 1 | X = 1) and π_2 = P(Y = 1 | X = 2). The difference in the probability that Y = 1 when X = 1 versus X = 2 is π_1 − π_2.
The difference π_1 − π_2 compares the success probabilities for the two groups. It falls between −1 and +1, equaling zero when π_1 = π_2, that is, when the response variable is independent of the group classification. Let π̂_1 and π̂_2 denote the sample proportions of successes. The sample difference of proportions π̂_1 − π̂_2 estimates π_1 − π_2.
For sample sizes n_1 and n_2 for the two groups, when we treat the two samples as independent binomial samples, the estimated expectation and standard error of π̂_1 − π̂_2 are

E(π̂_1 − π̂_2) = π_1 − π_2   and
SE = √( π̂_1(1 − π̂_1)/n_1 + π̂_2(1 − π̂_2)/n_2 )

As the sample sizes increase, the standard error decreases and the estimate of π_1 − π_2 tends to improve. A large-sample 100(1 − α)% Wald confidence interval for π_1 − π_2 is
(π̂_1 − π̂_2) ± z_{α/2} (SE)

For a significance test of H_0: π_1 = π_2, the standard z-test statistic divides (π̂_1 − π̂_2) by a pooled SE that applies under H_0, namely SE_0 = √( π̂(1 − π̂)(1/n_1 + 1/n_2) ), where π̂ is the overall proportion of successes when the two samples are combined.
Example: The study population consisted of over 22,000 male physicians who were randomly assigned to either low-dose aspirin or a placebo (an identical-looking pill that was inert). These physicians were followed for about five years. Some of the data are summarized in the 2 × 2 table below.

          Myocardial Infarction (MI)
            Yes        No
Placebo     189      10,845
Aspirin     104      10,933
The sample conditional proportions of MI status given treatment group are:

          Myocardial Infarction (MI)
              Yes           No
Placebo   0.01712887    0.9828711
Aspirin   0.00942285    0.9905771
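The output below appears to come from R's prop.test applied to these counts; a sketch of a call that reproduces it (the object name MI is taken from the output, and prop.test's default continuity correction is assumed):

MI <- matrix(c(189, 10845,
               104, 10933),
             nrow = 2, byrow = TRUE,
             dimnames = list(Group = c("Placebo", "Aspirin"),
                             MI = c("Yes", "No")))
prop.test(MI)   # two-sample test for equality of proportions

This produces the output: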
data: MI
X-squared = 24.429, df = 1, p-value = 7.71e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.004597134 0.010814914
sample estimates:
prop 1 prop 2
0.01712887 0.00942285
Ratio of Proportions (Relative Risk)
A difference between two proportions of a certain fixed size usually is more important when both proportions are near 0 or 1 than when they are near the middle of the range. Consider a comparison of two drugs on the proportion of subjects who had adverse reactions when using them. The difference between 0.010 and 0.001 is the same as the difference between 0.410 and 0.401, namely 0.009; yet the first case is a tenfold difference in risk (0.010/0.001 = 10), whereas the second is a ratio of only 0.410/0.401 ≈ 1.02. In such cases the ratio of proportions is more informative.
Relative risk is the ratio of the risk of an event for the exposed group to the risk for the unexposed group.
For 2 × 2 tables, the ratio of probabilities is often called the relative risk:
n_ij     Y=1    Y=2    Total
X=1      n_11   n_12   n_1+
X=2      n_21   n_22   n_2+
Total    n_+1   n_+2   n_++

Relative Risk = (n_11 / n_1+) / (n_21 / n_2+)
A relative risk of 1.00 occurs when π_1 = π_2, that is, when the response variable is independent of the group.
Interpretation:
Relative Risk < 1: The event is less likely to occur in the treatment group
Relative Risk = 1: The event is equally likely to occur in each group
Relative Risk > 1: The event is more likely to occur in the treatment group
Example: data from a flu vaccination study (Beran et al., 2009). This study was a randomized controlled trial, the gold standard for identifying causal relationships.

Treatment   Flu infections   Non-infections
Vaccine          49               5054
Placebo          74               2475

Now, let’s plug these numbers into the relative risk formula:

RR = [49 / (49 + 5054)] / [74 / (74 + 2475)] = 0.3310

The risk ratio is 0.3310, indicating the vaccine is a protective factor. The vaccinated are about 1/3 as likely to catch the flu as the unvaccinated. In other words, the probability of getting the flu is lower for vaccinated individuals than for the unvaccinated.
Another example: the study population consisted of over 22,000 male physicians who were randomly assigned to either low-dose aspirin or a placebo (an identical-looking pill that was inert). These physicians were followed for about five years. Some of the data are summarized in the 2 × 2 table below.

          Myocardial Infarction (MI)
            Yes        No       Total     Cumulative Incidence
Placebo     189      10,845    11,034     189/11,034 = 0.0171
Aspirin     104      10,933    11,037     104/11,037 = 0.0094
Relative Risk = (104/11,037) / (189/11,034) = 0.55

Those who take low-dose aspirin regularly have 0.55 times the risk of myocardial infarction compared to those who do not take aspirin. Equivalently, subjects taking aspirin had 45% less risk of having a myocardial infarction compared to subjects taking the placebo.
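A quick R check of this relative risk, reusing the MI matrix defined earlier:

ci <- MI[, "Yes"] / rowSums(MI)   # cumulative incidence in each group
ci["Aspirin"] / ci["Placebo"]     # relative risk ≈ 0.55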
Odds Ratio (OR)
The odds ratio is another measure of association for 2 × 2 contingency tables. It also occurs as a parameter in the most important model for categorical responses, logistic regression.
For a probability of success π, the odds of success are defined to be
Odds = π / (1 − π)
For instance, when π = 0.75, the odds of success equal 0.75/0.25 = 3.0. The odds are non-negative,
with value greater than 1.0 when a success is more likely than a failure. When odds = 3.0, we
expect to observe three successes for every one failure. When odds = 1/3, a failure is three times as
likely as a success. We then expect to observe one success for every three failures.
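A tiny R sketch of the probability-odds conversions described above (the helper names are ours):

odds <- function(p) p / (1 - p)   # odds from probability
prob <- function(o) o / (1 + o)   # probability from odds
odds(0.75)                        # 3: three successes per failure
prob(1/3)                         # 0.25: one success per three failures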
In 2 × 2 tables, within row 1 the odds of success are odds_1 = π_1/(1 − π_1), and within row 2 the odds of success are odds_2 = π_2/(1 − π_2).
n_ij     Y=1    Y=2    Total
X=1      n_11   n_12   n_1+
X=2      n_21   n_22   n_2+
Total    n_+1   n_+2   n_++

The ratio of the odds from the two rows,

θ = odds_1 / odds_2 = [π_1/(1 − π_1)] / [π_2/(1 − π_2)],

is the odds ratio; its sample value is θ̂ = (n_11/n_12) / (n_21/n_22) = n_11 n_22 / (n_12 n_21). Whereas the relative risk is a ratio π_1/π_2 of two probabilities, the odds ratio θ is a ratio of two odds.
Properties of Odds Ratio
1. The odds ratio can equal any non-negative number.
2. It doesn’t depend on the marginal distribution of either variable.
3. If the categories of both variables are interchanged, the value doesn’t change.
4. If the two categories of one variable are switched, the odds ratio in the rearranged table equals the reciprocal 1/θ of the original value.
Example: data from a flu vaccination study (Beran et al., 2009). This study was a randomized controlled trial, the gold standard for identifying causal relationships.

Treatment   Flu infections   Non-infections
Vaccine          49               5054
Placebo          74               2475

Now, let’s plug these numbers into the odds ratio formula:

OR = (49 × 2475) / (74 × 5054) = 0.324

So the vaccinated individuals were less likely to be infected with flu than the unvaccinated: their odds of infection were about one-third the odds for the unvaccinated.
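A minimal R sketch for the flu data that reproduces this odds ratio, the relative risk from the previous section, and a numerical check of properties 3 and 4 (names are ours):

flu <- matrix(c(49, 5054,
                74, 2475),
              nrow = 2, byrow = TRUE,
              dimnames = list(Treatment = c("Vaccine", "Placebo"),
                              Flu = c("Yes", "No")))
or <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])
or(flu)                           # odds ratio ≈ 0.324
p <- flu[, "Yes"] / rowSums(flu)  # infection probabilities by group
p["Vaccine"] / p["Placebo"]       # relative risk ≈ 0.331
or(t(flu))                        # property 3: transposing leaves theta unchanged
or(flu[2:1, ])                    # property 4: swapping the rows gives 1/theta ≈ 3.08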
Inference for Odds Ratios (OR) and Log OR
The sampling distribution of the odds ratio is highly skewed unless the sample size is extremely
large. Because of this skewness, statistical inference uses its natural logarithm, log(θ).
Independence corresponds to log(θ) = 0.
The sample log odds ratio, log θ̂, has a less skewed, bell-shaped sampling distribution. Its approximating normal distribution has mean log θ and standard error

SE(log θ̂) = √( 1/n_11 + 1/n_12 + 1/n_21 + 1/n_22 )

The SE decreases as the cell counts increase. Inference is based on log θ̂ rather than θ̂ because its sampling distribution is closer to normality.
A large-sample Wald confidence interval for log θ is

log θ̂ ± z_{α/2} SE(log θ̂)

Exponentiating its endpoints gives a confidence interval for θ.
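A minimal R sketch applying this to the 2 × 2 aspirin/MI counts (values in comments are approximate):

n <- c(n11 = 189, n12 = 10845, n21 = 104, n22 = 10933)
theta_hat <- (n["n11"] * n["n22"]) / (n["n12"] * n["n21"])  # ≈ 1.83
se_log <- sqrt(sum(1 / n))                                  # ≈ 0.123
ci_log <- log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log
exp(ci_log)                                                 # ≈ (1.44, 2.33)

Here θ̂ ≈ 1.83: the estimated odds of MI for the placebo group are about 1.8 times the odds for the aspirin group, and the interval excludes 1.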
Relative Risk and Odds Ratios are often confused despite being distinct concepts. Why?
Well, both measure association between a binary outcome variable and a continuous or binary predictor variable.
The basic difference is that the odds ratio is a ratio of two odds (yep, it’s that obvious) whereas the relative risk is a ratio of two probabilities. (The relative risk is also called the risk ratio.) The two are related by

θ = [π_1/(1 − π_1)] / [π_2/(1 − π_2)] = Relative Risk × (1 − π_2)/(1 − π_1)
When π_1 and π_2 are both close to zero, the fraction (1 − π_2)/(1 − π_1) in the last term of this expression is approximately 1.0, so the odds ratio and relative risk then take similar values. For the flu data above, θ = 0.331 × (1 − 0.0290)/(1 − 0.0096) ≈ 0.324, nearly the same as the relative risk.
Testing Independence in Two-Way Contingency Tables
Pearson Chi-squared Statistic: In two-way contingency tables with joint probabilities {π_ij} for two response variables, the null hypothesis of statistical independence is

H_0: π_ij = π_i+ π_+j   for all i and j
To test H_0, we identify μ_ij = n π_ij = n π_i+ π_+j as the expected frequency of n_ij, assuming independence. Usually {π_i+} and {π_+j} are unknown, as is this expected value. To obtain estimated expected frequencies, we substitute sample proportions for the unknown marginal probabilities, giving

μ̂_ij = n π̂_i+ π̂_+j = n (n_i+/n)(n_+j/n) = n_i+ n_+j / n

based on the row marginal totals {n_i+} and the column marginal totals {n_+j}. The {μ̂_ij} have the same row and column totals as the cell counts {n_ij}, but they display the pattern of independence.
The Pearson chi-squared statistic then equals

χ² = ∑_i ∑_j (n_ij − μ̂_ij)² / μ̂_ij   with df = (I − 1)(J − 1)
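A minimal R sketch applying the test to the 2 × 3 aspirin table, reusing the counts matrix defined earlier (values in comments are approximate):

chisq.test(counts)            # Pearson chi-squared ≈ 26.9, df = (2-1)(3-1) = 2
chisq.test(counts)$expected   # estimated expected frequencies mu_hat_ij = n_i+ n_+j / n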
Likelihood-Ratio Statistic: For multinomial sampling, the kernel of the likelihood is

∏_i ∏_j π_ij^(n_ij),   where all π_ij ≥ 0 and ∑_i ∑_j π_ij = 1

Let us consider the null hypothesis

H_0: π_ij = π_i+ π_+j   for all i and j

Under H_0, π̂_ij = π̂_i+ π̂_+j = n_i+ n_+j / n². In the general case, π̂_ij = n_ij / n. The ratio of the maximized likelihoods equals

Λ = ∏_i ∏_j (n_i+ n_+j)^(n_ij) / ( n^n ∏_i ∏_j n_ij^(n_ij) )
The likelihood-ratio chi-squared statistic is denoted by G²; it equals

G² = −2 log Λ = 2 ∑_i ∑_j n_ij log(n_ij / μ̂_ij),   where μ̂_ij = n_i+ n_+j / n

The larger the value of G², the more evidence exists against independence. Like χ², under H_0 it has an approximate chi-squared distribution with df = (I − 1)(J − 1).
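G² for the same 2 × 3 aspirin table can be computed directly in R, reusing the counts matrix (a sketch; the value in the comment is approximate):

mu_hat <- outer(rowSums(counts), colSums(counts)) / sum(counts)  # n_i+ n_+j / n
G2 <- 2 * sum(counts * log(counts / mu_hat))                     # ≈ 27.6
pchisq(G2, df = (2 - 1) * (3 - 1), lower.tail = FALSE)           # p-value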
Reference Book:
i. Agresti, A. (2019). An Introduction to Categorical Data Analysis, 3rd edition. John Wiley & Sons, Inc.
<><><><><><><><><> End <><><><><><><><><>