SAHADEB - Categorical - Data - LECTURES 1 - Part 2
Sahadeb Sarkar
IIM Calcutta
Terminology
Discrete data: data with discrete outcomes, modelled by discrete distributions
• Categorical Data: discrete data with finitely many possible values on a nominal scale (e.g., the state a person lives in, the political party one might vote for, the blood type of a patient; Multinomial, Bernoulli distributions). Its central tendency is given by the mode.
• Count data (non-negative integer valued): record the frequency of an event and may have no upper bound (e.g., Poisson, Binomial, Negative Binomial distributions). Count data arise from counting, not ranking.
Discrete Data Types
• Dichotomous data: can take only two values
such as “Yes” and “No”
• Nonordered polytomous data: e.g., five different detergents
• Ordered polytomous data: e.g., grades A, B, C, D; “old”, “middle-aged”, “young” employees
Derivation Tools in CDA, Text p.18
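Slutsky’s Theorem:
Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$, a constant. Then
1. $X_n + Y_n \xrightarrow{d} X + c$
2. $Y_n X_n \xrightarrow{d} cX$
3. If $c \neq 0$, $X_n / Y_n \xrightarrow{d} X / c$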
Inference for One-way Frequency Table
Binomial Distribution
(leading to One-Way Frequency Table)
Suppose Y is a random variable with 2 possible outcome categories c1, c2 with probabilities π1, π2 = (1 − π1).
Suppose there are n observations on Y; we can summarize the responses through the vector of observed frequencies (random variables), (X1, X2 = n − X1).
Example 1.1, p. 6, Text
Metabolic syndrome
(https://en.wikipedia.org/wiki/Metabolic_syndrome)
Example 1.1 (Binary Case), p. 37, Text
• Test whether the prevalence of Metabolic Syndrome is 40% in this study population (H0: π = π0 = 0.4):
$$Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{48/93 - 0.4}{\sqrt{0.4 \times 0.6/93}} = 2.286;$$
P-value = $2\{1 - \Phi(2.286)\} = 0.0223$
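The arithmetic above can be reproduced with a short sketch. It uses only the counts on the slide (48 of 93 with MS, π0 = 0.4) and writes the two-sided P-value 2{1 − Φ(|Z|)} through the complementary error function, so no statistics library is needed. The function name `one_sample_prop_z` is illustrative, not from the text.

```python
import math

def one_sample_prop_z(successes: int, n: int, pi0: float) -> tuple[float, float]:
    """Large-sample z test of H0: pi = pi0 (two-sided); the s.e. uses pi0."""
    pi_hat = successes / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    z = (pi_hat - pi0) / se0
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = one_sample_prop_z(48, 93, 0.4)
print(f"Z = {z:.3f}, two-sided P-value = {p:.4f}")
```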
• $P(X = k) = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}\, p^r (1-p)^k$. Put $\alpha = 1/r$ and $\mu = r(1-p)/p$ for reparameterization.
• Note: $\binom{k+r-1}{k} = (-1)^k \binom{-r}{k}$
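Negative Binomial Distribution (p. 41)
• $P(X = k) = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}\, p^r (1-p)^k$ ……… (1a)
• $E(X) = r(1-p)/p$, $V(X) = r(1-p)/p^2 > E(X)$ ……… (1b)
• Extension through reparameterization: put $\alpha = 1/r\ (> 0)$ and $\mu = r(1-p)/p$ in (1a).
• Then
$$P(X = k) = \frac{\Gamma(1/\alpha + k)}{\Gamma(1/\alpha)\,k!}\left(\frac{1/\alpha}{1/\alpha + \mu}\right)^{1/\alpha}\left(\frac{\mu}{1/\alpha + \mu}\right)^{k} \quad \text{(2a)}$$
• $E(X) = \mu$; $V(X) = \mu + \alpha\mu^2$ ……… (2b)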
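As a numerical check of the reparameterization, the sketch below evaluates (1a) and (2a) on a truncated support and compares the moments with (1b) and (2b). With (1a) written as p^r (1 − p)^k, the mean is r(1 − p)/p, so that is the μ used here. The values r = 2.5 and p = 0.4 are arbitrary illustrations (r need not be an integer once the Gamma function is used), and truncating at k = 400 is an assumption that is harmless because the tail decays geometrically.

```python
import math

def nb_pmf_rp(k: int, r: float, p: float) -> float:
    """Form (1a): Gamma(r + k) / (Gamma(r) k!) * p**r * (1 - p)**k."""
    log_coef = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

def nb_pmf_mu_alpha(k: int, mu: float, alpha: float) -> float:
    """Form (2a), with alpha = 1/r > 0 and mean mu."""
    s = 1.0 / alpha                     # s = 1/alpha = r
    log_coef = math.lgamma(s + k) - math.lgamma(s) - math.lgamma(k + 1)
    return math.exp(log_coef + s * math.log(s / (s + mu)) + k * math.log(mu / (s + mu)))

r, p = 2.5, 0.4                         # illustrative values only
mu, alpha = r * (1 - p) / p, 1 / r      # mu = r(1-p)/p, alpha = 1/r

ks = range(400)                         # truncated support
probs = [nb_pmf_mu_alpha(k, mu, alpha) for k in ks]
mean = sum(k * q for k, q in zip(ks, probs))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))

print(max(abs(nb_pmf_rp(k, r, p) - q) for k, q in zip(ks, probs)))  # (1a) vs (2a)
print(mean, mu)                         # E(X) = mu
print(var, mu + alpha * mu ** 2)        # V(X) = mu + alpha * mu^2
```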
Hypergeometric Distribution
• Randomly sample n elements, without replacement, from a finite (dichotomous) population of size N having K “success”-type and (N − K) “failure”-type elements (e.g., Pass/Fail or Employed/Unemployed).
• The probability of a success changes on each draw, because each draw removes an element from the population.
• X = number of successes in the sample. Then X has the hypergeometric distribution:
$$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}, \quad \max(0,\, n-N+K) \le x \le \min(n, K)$$
Multivariate Hypergeometric Distribution
• Randomly sample n elements, without replacement, from a finite (polytomous) population of size N having K1, K2, ..., Kc elements of types 1, 2, …, c.
• Xi = number of i-th type elements in the sample, i = 1, …, c. Then X = (X1, …, Xc) has the multivariate hypergeometric distribution:
$$P(X_i = x_i,\ i = 1, \dots, c) = \frac{\prod_{i=1}^{c}\binom{K_i}{x_i}}{\binom{N}{n}}$$
• E(Xi) = n(Ki/N),
• V(Xi) = {n(Ki/N)(1 − Ki/N)} × [(N − n)/(N − 1)]
• Cov(Xi, Xj) = −{n(Ki/N)(Kj/N)} × [(N − n)/(N − 1)], i ≠ j
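The moment formulas can be checked by brute-force enumeration for a small, hypothetical population (K = (5, 3, 4), n = 6). The covariance comes out negative, as it must: drawing more elements of one type leaves less room for the others.

```python
import itertools
import math

def mvhyper_pmf(x: tuple[int, ...], K: tuple[int, ...]) -> float:
    """prod_i C(K_i, x_i) / C(N, n), with N = sum(K) and n = sum(x)."""
    N, n = sum(K), sum(x)
    return math.prod(math.comb(Ki, xi) for Ki, xi in zip(K, x)) / math.comb(N, n)

K = (5, 3, 4)          # hypothetical K1, K2, K3
N, n = sum(K), 6       # draw n = 6 of N = 12 without replacement

# every feasible sample composition (x1, x2, x3) with x1 + x2 + x3 = n
support = [x for x in itertools.product(*(range(Ki + 1) for Ki in K)) if sum(x) == n]
probs = {x: mvhyper_pmf(x, K) for x in support}

def expect(f) -> float:
    """E[f(X)] by summing over the support."""
    return sum(f(x) * q for x, q in probs.items())

fpc = (N - n) / (N - 1)                     # finite-population factor (N - n)/(N - 1)
E1, E2 = expect(lambda x: x[0]), expect(lambda x: x[1])
V1 = expect(lambda x: x[0] ** 2) - E1 ** 2
C12 = expect(lambda x: x[0] * x[1]) - E1 * E2

print(E1, n * K[0] / N)
print(V1, n * (K[0] / N) * (1 - K[0] / N) * fpc)
print(C12, -n * (K[0] / N) * (K[1] / N) * fpc)
```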
Inference for Multinomial Case
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
Suppose Y is a random variable with k possible outcome categories c1, c2, …, ck with probabilities π1, π2, …, πk = (1 − π1 − … − πk−1).
Suppose there are n observations on Y; we can summarize the responses through the vector of observed frequencies (random variables), X = (X1, X2, …, Xk), where Xk = n − X1 − … − Xk−1.
Multinomial Distribution (contd.)
X = (X1, X2, …, Xk) has a multinomial distribution with parameters n and (π1, π2, …, πk).
$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,\pi_1^{x_1}\cdots\pi_k^{x_k}$$
$E(X_i) = n\pi_i$, $V(X_i) = n\pi_i(1-\pi_i)$, $\mathrm{Cov}(X_i, X_j) = -n\pi_i\pi_j$ ($i \neq j$)
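The pmf and its first two moments can be checked the same way, by enumerating every outcome vector for a small case. The cell probabilities (0.2, 0.3, 0.5) and n = 5 are hypothetical.

```python
import itertools
import math

def multinomial_pmf(x: tuple[int, ...], pi: tuple[float, ...]) -> float:
    """n! / (x1! ... xk!) * pi1**x1 * ... * pik**xk, with n = sum(x)."""
    coef = math.factorial(sum(x))
    for xi in x:
        coef //= math.factorial(xi)          # exact: partial quotients stay integers
    return coef * math.prod(p ** xi for p, xi in zip(pi, x))

pi, n = (0.2, 0.3, 0.5), 5                    # hypothetical probabilities and sample size
support = [x for x in itertools.product(range(n + 1), repeat=len(pi)) if sum(x) == n]
probs = {x: multinomial_pmf(x, pi) for x in support}

def expect(f) -> float:
    """E[f(X)] by summing over all outcome vectors."""
    return sum(f(x) * q for x, q in probs.items())

E1, E2 = expect(lambda x: x[0]), expect(lambda x: x[1])
V1 = expect(lambda x: x[0] ** 2) - E1 ** 2
C12 = expect(lambda x: x[0] * x[1]) - E1 * E2
print(E1, n * pi[0])                          # E(X1) = n*pi1
print(V1, n * pi[0] * (1 - pi[0]))            # V(X1) = n*pi1*(1 - pi1)
print(C12, -n * pi[0] * pi[1])                # Cov(X1, X2) = -n*pi1*pi2
```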
Example 1.1, p. 6, Text
One-Way Frequency Table for Metabolic Syndrome Study
MS Present | MS Absent | Total
48 | 45 | 93
Example: Pearson’s χ² Test
Testing whether a die is fair is a test of a simple hypothesis. Suppose we roll it 120 times and the summarized data are as follows:
In this case, k = 6 and n = 120. H0: πi = 1/6 (= π0i), i = 1, 2, …, 6
Pearson’s Chi-Square (contd.)
The hypothesis presented in Equation (1) is an
example of a simple hypothesis. (Simple in the sense
that the hypothesis completely specifies the true
distribution).
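The die counts are not reproduced in these notes, so the sketch below applies Pearson's statistic X² = Σ (Oi − Ei)²/Ei, with Ei = nπ0i, to the metabolic syndrome table instead, under the simple hypothesis π0 = (0.4, 0.6). With k = 2 categories the statistic has k − 1 = 1 degree of freedom and equals the square of the Z statistic of Example 1.1. For the die (k = 6) one would need the χ² distribution with 5 degrees of freedom (e.g. `scipy.stats.chi2.sf`), which this stdlib-only sketch does not use.

```python
import math

def pearson_chisq(observed: list[int], pi0: list[float]) -> float:
    """X^2 = sum (O_i - E_i)^2 / E_i with E_i = n * pi0_i under a simple H0."""
    n = sum(observed)
    expected = [n * p for p in pi0]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chisq1_sf(x: float) -> float:
    """Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

x2 = pearson_chisq([48, 45], [0.4, 0.6])      # MS present, MS absent; H0: pi = (0.4, 0.6)
print(f"X^2 = {x2:.3f}, P-value = {chisq1_sf(x2):.4f}")
```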
Multinomial Example, p. 38, Text
Multinomial Case: Depression Diagnosis in the DOS Study
Major Dep | Minor Dep | No Dep | Total
128 | 136 | 481 | 745
DOS = Depression Of Seniors
Poisson Distribution Case
Suppose Y is a random variable taking integer values y = 0, 1, 2, …, with probability $P(Y = y) = e^{-\lambda}\lambda^{y}/y!$.
Suppose there are n observations on Y; we can summarize the observations through the vector of observed frequencies for the value-categories 0, 1, 2, …
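A minimal sketch of this summary step, using a hypothetical sample of n = 10 counts (not data from the text): the observations are tallied into value-categories, and fitted expected frequencies n·P(Y = y) are computed with λ̂ = ȳ, the maximum likelihood estimate of λ.

```python
import math
from collections import Counter

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) = exp(-lam) * lam**y / y!"""
    return math.exp(-lam) * lam ** y / math.factorial(y)

ys = [0, 2, 1, 0, 3, 1, 1, 0, 2, 1]   # hypothetical observations
n = len(ys)
freq = Counter(ys)                     # observed frequency of each value-category
lam_hat = sum(ys) / n                  # MLE of lambda is the sample mean

for y in range(max(ys) + 1):
    expected = n * poisson_pmf(y, lam_hat)
    print(f"y = {y}: observed {freq[y]}, expected {expected:.2f}")
```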
Sampling Schemes Leading to (2×2) Contingency Tables
Layout of the 2×2 table
Row factor (‘Explanatory’) | Column factor (‘Response’): Level 1 | Level 2 | Row total
Level 1 | n11 | n12 | R1 = n1+
Level 2 | n21 | n22 | R2 = n2+
Column marginal total | n+1 | n+2 | Grand total n
Poisson Sampling
• Poisson Sampling (after the French mathematician Siméon Denis Poisson): a fixed amount of time (or space, volume, money, etc.) is spent collecting a random sample from a single population, and each sampled member falls into one of the four cells of the 2×2 table.
• In the CVD Death Example 1 below, researchers spent a certain amount of time sampling the health records of 3112 women, who were classified as obese or non-obese and as having died of CVD or not. In this case, neither the marginal totals nor the sample size was known in advance.
Example-1: Cardio-Vascular Deaths and Obesity among
women in American Samoa
Prospective Product Binomial Sampling
• Prospective Product Binomial Sampling (“cohort” study): first identify the explanatory variable(s) thought to explain “causation”. The population is categorized according to the levels of the explanatory variable, and random samples are then selected from each explanatory group.
If separate lists of obese and non-obese American Samoan women were available in Example 1, a random sample of 2500 could have been selected from each. The term Binomial refers to the dichotomy of the explanatory variable. The term Product refers to the fact that sampling is done independently from more than one population.
Example-2: Vitamin-C versus Common Cold
Retrospective Product Binomial Sampling
Example 3: Smoking versus Lung Cancer (columns: Outcome)
 | CANCER | CONTROL | TOTAL
SMOKER | 83 | 72 | 155
NON-SMOKER | 3 | 14 | 17
TOTAL | 86 | 86 | 172
Retrospective Product Binomial Sampling
• We cannot test for the equality of proportions across the levels of the explanatory variable if the sampling scheme is retrospective.
• From a case-control study we can estimate only the odds ratio, which is an inferior measure of the strength of association compared with the relative risk.
• Why do retrospective sampling at all, then? Compared with prospective cohort studies, case-control studies tend to be less costly and shorter in duration. They are often used to study rare diseases, or as a preliminary study when little is known about the association between a possible risk factor and the disease of interest.
Retrospective Product Binomial Sampling (Continued)
• If the probabilities of the “Yes” response are very small, prospective sampling may need a huge sample size to obtain any “Yes” responses at all.
• Retrospective sampling guarantees a reasonable number of “Yes” responses (cases) in the sample, because subjects are selected according to their response level.
• In the smoking versus lung cancer study (Example 3), retrospective sampling can be carried out without following the subjects throughout their lifetime.
Prospective: subjects selected according to the levels of the Explanatory Variable; the Response Variable is then observed.
Retrospective: subjects selected according to the levels of the Response Variable; the Explanatory Variable is then observed.
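Estimated Proportions
• Proportion of “Yes” (Level 1) responses in the first level of the explanatory variable: $\hat{\pi}_1 = n_{11}/R_1$
• Proportion of “Yes” (Level 1) responses in the second level of the explanatory variable: $\hat{\pi}_2 = n_{21}/R_2$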
Assumption
• We will assume that the frequencies of all the entries in the 2×2 table are greater than 5.
• This ensures that the “asymptotic tests” performed on 2×2 tables are reasonably accurate (“asymptotic” means ‘appropriate in large samples’).
• If some entries in the 2×2 table are 5 or less, one may use Fisher’s exact test instead.
Example-1: Cardio-Vascular Deaths and Obesity among
women in American Samoa
Calculations
Exact Test: Independence of Two Attributes
• Example: data collected on a random sample of people attending a preview of a movie.
• Question: did the movie have equal appeal to the young and the old, or was it liked more by the young?
• Test H0: the two attributes are independent, against Ha: they are positively associated.
Exact Test: Independence of Two Attributes (contd.)
• To test whether two qualitative characters (attributes) A and B are independent. Let P(A = Ai, B = Bj) = pij, i = 1, …, k, j = 1, …, l.
• Let $P(A = A_i) = \sum_{j=1}^{l} p_{ij} = p_{i0}$ and $P(B = B_j) = \sum_{i=1}^{k} p_{ij} = p_{0j}$.
• To test H0: $p_{ij} = p_{i0}\,p_{0j}$ for all i, j.
• nij = observed frequency for cell AiBj. The marginal frequencies of Ai and Bj are $n_{i0} = \sum_{j=1}^{l} n_{ij}$ and $n_{0j} = \sum_{i=1}^{k} n_{ij}$.
Exact (Conditional) Test: Independence of Two Attributes
• Under H0, the conditional distribution of {nij, all i, j} given the observed sample marginals {ni0, n0j, all i, j} has the (multivariate hypergeometric) pmf
$$P\big(\{n_{ij}\} \mid \{n_{i0}\}, \{n_{0j}\}\big) = \frac{\prod_{i=1}^{k} n_{i0}!\ \prod_{j=1}^{l} n_{0j}!}{n!\ \prod_{i=1}^{k}\prod_{j=1}^{l} n_{ij}!}, \quad n = \sum_{i,j} n_{ij}$$
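The conditional pmf can be coded directly from factorials, keeping exact integers until the final division. The helper name is hypothetical. As a check, the sketch enumerates every 2×2 table with row totals (15, 14) and column totals (17, 12), the margins of the allergy example in these notes, and confirms that their conditional probabilities sum to 1.

```python
import math

def table_prob_given_margins(table: list[list[int]]) -> float:
    """Probability of a k x l table given its margins, under independence:
    prod_i n_i0! * prod_j n_0j! / (n! * prod_ij n_ij!)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    num = math.prod(math.factorial(m) for m in row_tot + col_tot)
    den = math.factorial(n) * math.prod(math.factorial(x) for row in table for x in row)
    return num / den   # exact integers, converted to float only here

# All 2 x 2 tables with these margins are indexed by a = n11 (3 <= a <= 15)
tables = [[[a, 15 - a], [17 - a, a - 3]] for a in range(3, 16)]
print(sum(table_prob_given_margins(t) for t in tables))
```

For a 2×2 table the formula reduces to the ordinary hypergeometric probability of n11.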
Exact (Conditional) Test: Independence of Two Attributes (contd.)
• Add up the probabilities, under H0, of the given table and of all tables with the same marginals that indicate a more extreme positive association. These tables and the corresponding probabilities are:
Homogeneity versus Independence Hypotheses
• Hypothesis of homogeneity
H0: π1 = π2
(Not tested under Retrospective Product Binomial Sampling)
• Hypothesis of independence
(At this stage, expressed qualitatively)
Tested only under Poisson or Multinomial Sampling
Homogeneity versus Independence Hypotheses (contd.)
• The hypothesis of independence is used to investigate an association between the row and column factors without designating either of them as the response. Although the hypothesis may be expressed in terms of parameters, it is more convenient to use the qualitative wording:
• H0: The row categorization is independent of the column categorization
Sampling scheme versus Hypotheses
Sampling scheme | Marginal total fixed in advance | Usual hypothesis: Independence | Usual hypothesis: Homogeneity
Poisson | None | YES | YES
Multinomial | Grand total (sample size) | YES | YES
Prospective | Row (explanatory) total | | YES
Retrospective | Column (response) total | | YES
Inference for 2×2 Table
(Sec 2.2, Text)
Measures of Association:
• (i) Relative Risk (or Incidence Rate Ratio or ‘Probability Ratio’)
• (ii) Difference Between Proportions
• (iii) Odds Ratio
Is “Tutoring” Helpful in a Business Stat Course?
$$\text{Estimated Odds Ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}$$
Relative Risk vs Odds Ratio
(i) Relative Risk (RR) or Incidence Rate Ratio (IRR)
(Text, p.53)
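Confidence Intervals for Relative Risk (RR)
(Text, p.54)
• Estimate of RR (= π1/π2): $\widehat{RR} = \dfrac{\hat{\pi}_1}{\hat{\pi}_2} = \dfrac{n_{11}/n_{1+}}{n_{21}/n_{2+}}$
• Estimate of the “asymptotic” variance of $\log_e(\widehat{RR})$:
$$\widehat{Var}\big(\log_e \widehat{RR}\big) = \frac{1-\hat{\pi}_1}{n_{11}} + \frac{1-\hat{\pi}_2}{n_{21}}$$
• 100(1−α)% CI for RR: $\exp\Big\{\log_e \widehat{RR} \mp z_{\alpha/2}\sqrt{\widehat{Var}(\log_e \widehat{RR})}\Big\}$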
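A sketch of the RR estimate and its log-scale interval, using Var(log RR) ≈ (1 − π̂1)/n11 + (1 − π̂2)/n21. For illustration it uses the vitamin C trial: the placebo row (335 colds, 76 no colds) appears in these notes, while the vitamin C row (302 colds, 105 no colds) is an assumption; it is not printed here, but it is the row that reproduces the odds ratio 1.53 and the interval exp(0.093) to exp(0.761) quoted for this trial.

```python
import math

def relative_risk_ci(n11: int, n12: int, n21: int, n22: int, z: float = 1.96):
    """RR = (n11/n1+)/(n21/n2+) with CI exp{log RR -/+ z * s.e.}."""
    n1p, n2p = n11 + n12, n21 + n22
    p1, p2 = n11 / n1p, n21 / n2p
    rr = p1 / p2
    se_log = math.sqrt((1 - p1) / n11 + (1 - p2) / n21)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)

# Rows: placebo, vitamin C (assumed counts); columns: cold, no cold
rr, lo, hi = relative_risk_ci(335, 76, 302, 105)
print(f"RR = {rr:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```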
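Confidence Interval for π1 − π2
• Estimate of π1 − π2: $\hat{\pi}_1 - \hat{\pi}_2 = \dfrac{n_{11}}{n_{1+}} - \dfrac{n_{21}}{n_{2+}}$
• $\widehat{Var}(\hat{\pi}_1 - \hat{\pi}_2) = \dfrac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_{1+}} + \dfrac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_{2+}}$
• $\text{s.e.}(\hat{\pi}_1 - \hat{\pi}_2) = \sqrt{\dfrac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_{1+}} + \dfrac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_{2+}}}$
• 100(1−α)% CI for π1 − π2: $(\hat{\pi}_1 - \hat{\pi}_2) - z_{\alpha/2}\,\text{s.e.}(\hat{\pi}_1 - \hat{\pi}_2)$ to $(\hat{\pi}_1 - \hat{\pi}_2) + z_{\alpha/2}\,\text{s.e.}(\hat{\pi}_1 - \hat{\pi}_2)$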
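The interval above, with π̂1 = n11/n1+ and π̂2 = n21/n2+ each estimated separately, can be sketched as follows. The counts are the vitamin C trial with an assumed vitamin C row of (302, 105); only the placebo row (335, 76) appears in these notes.

```python
import math

def diff_prop_ci(n11: int, n1p: int, n21: int, n2p: int, z: float = 1.96):
    """Wald CI for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = n11 / n1p, n21 / n2p
    se = math.sqrt(p1 * (1 - p1) / n1p + p2 * (1 - p2) / n2p)
    d = p1 - p2
    return d, d - z * se, d + z * se

# Colds out of row totals: placebo 335/411, vitamin C 302/407 (assumed)
d, lo, hi = diff_prop_ci(335, 411, 302, 407)
print(f"difference = {d:.4f}, 95% CI {lo:.4f} to {hi:.4f}")
```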
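Testing H0: π1 − π2 = 0
• Estimate of π1 − π2: $\hat{\pi}_1 - \hat{\pi}_2 = \dfrac{n_{11}}{n_{1+}} - \dfrac{n_{21}}{n_{2+}}$
• Pooled estimate of the common proportion under H0: $\hat{\pi} = \dfrac{n_{11}+n_{21}}{n_{1+}+n_{2+}}$
• $\widehat{Var}_0(\hat{\pi}_1 - \hat{\pi}_2) = \hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_{1+}} + \dfrac{1}{n_{2+}}\right)$
• Test statistic $Z = \dfrac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1-\hat{\pi})\,(1/n_{1+} + 1/n_{2+})}}$ is asymptotically N(0, 1) under H0 if n1+ and n2+ are ‘large’.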
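A sketch of the pooled Z test, again on the vitamin C counts with the vitamin C row (302, 105) assumed. The only difference from the interval calculation is that the standard error uses the pooled π̂, as required under H0.

```python
import math

def two_prop_z(n11: int, n12: int, n21: int, n22: int) -> tuple[float, float]:
    """Z test of H0: pi1 = pi2 with pooled s.e.; returns (Z, two-sided P-value)."""
    n1p, n2p = n11 + n12, n21 + n22
    p1, p2 = n11 / n1p, n21 / n2p
    pooled = (n11 + n21) / (n1p + n2p)          # common proportion under H0
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1p + 1 / n2p))
    z = (p1 - p2) / se0
    return z, math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))

z, p = two_prop_z(335, 76, 302, 105)
print(f"Z = {z:.3f}, two-sided P-value = {p:.4f}")
```

Z² computed this way equals Pearson’s X² for the same 2×2 table (without continuity correction).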
Exact Test of Two Proportions
• Example. Compare two methods of treating an allergy. Method 1 (A) uses 15 patients and Method 2 (B) uses 14. Is method 2 better than method 1?
• Here n1+ = 15, n2+ = 14, n11 = 6, n21 = 11, and Ha: p1 < p2. The sample sizes are not large, so asymptotic tests are not applicable; exact tests are needed.
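A sketch of the conditional (Fisher-type) exact test for this example. Conditioning on the total number of successes t = n11 + n21 (the outcome counted by n11 and n21), the count X of successes among the n1+ method-1 patients is hypergeometric under H0. Small values of X favour Ha: p1 < p2, so the P-value is P(X ≤ 6). The function name is hypothetical.

```python
import math

def exact_one_sided_less(n11: int, n1: int, n21: int, n2: int) -> float:
    """Conditional test of H0: p1 = p2 against Ha: p1 < p2.

    Given t = n11 + n21 successes in N = n1 + n2 subjects, under H0
        P(X = x) = C(t, x) * C(N - t, n1 - x) / C(N, n1),
    and the P-value is P(X <= n11).
    """
    t, N = n11 + n21, n1 + n2
    x_min = max(0, n1 - (N - t))   # group 1 cannot hold more failures than exist
    hits = sum(math.comb(t, x) * math.comb(N - t, n1 - x) for x in range(x_min, n11 + 1))
    return hits / math.comb(N, n1)

p_value = exact_one_sided_less(n11=6, n1=15, n21=11, n2=14)
print(f"one-sided P-value = {p_value:.4f}")
```

This gives 3160368/77558760, about 0.041. If SciPy is available, `scipy.stats.fisher_exact([[6, 9], [11, 3]], alternative='less')` should return the same one-sided value.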
Exact (Conditional) Test of Two Proportions
(GGD, Fundamentals, Vol 1)
(iii) Odds, and Odds Ratio
Odds of an outcome: let π be the population proportion of “YES” outcomes. Then the corresponding odds is given by
$\omega = \pi/(1-\pi)$.
The sample odds is given by
$\hat{\omega} = \hat{\pi}/(1-\hat{\pi})$.
(iii) Odds, and Odds Ratio (contd.)
Let πi = population proportion of “YES” responses in Group X = i. Then the odds of “YES” is given by $\omega_i = \dfrac{\pi_i}{1-\pi_i}$, $0 \le \omega_i < \infty$.
The sample odds of “YES” in Group i gives the estimate $\hat{\omega}_i = \dfrac{\hat{\pi}_i}{1-\hat{\pi}_i}$.
Odds ratio of a “YES” response in Group 1 to that in Group 2:
$$\varphi = \frac{\omega_1}{\omega_2} = \frac{\pi_1}{1-\pi_1} \times \frac{1-\pi_2}{\pi_2}$$
Odds versus Probabilities
Given the probability π of a “YES” outcome, the corresponding odds is given by
$\omega = \pi/(1-\pi)$.
Similarly, given the odds ω of a “YES” response, the corresponding probability is given by
$\pi = \omega/(1+\omega)$.
Odds versus Probabilities (contd.)
Interpretation: an event with probability of occurrence 0.95 has odds of 19 to 1 in favour of its occurrence, while an event with probability 0.05 has the same odds, 19 to 1, against it.
Relation between Probability, Odds & Logit
Probability | Odds | Log(Odds) = Logit
0 | 0 | NC
0.1 | 0.11 | −2.20
0.2 | 0.25 | −1.39
0.3 | 0.43 | −0.85
0.4 | 0.67 | −0.41
0.5 | 1.00 | 0.00
0.6 | 1.50 | 0.41
0.7 | 2.33 | 0.85
0.8 | 4.00 | 1.39
0.9 | 9.00 | 2.20
1 | NC | NC
NC = not computable. The odds map probability from [0, 1) to [0, ∞) asymmetrically, while the logit maps (0, 1) to (−∞, ∞) symmetrically.
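The table can be regenerated from the two maps ω = π/(1 − π) and logit(π) = log ω, together with the inverse π = ω/(1 + ω). The function names are illustrative; the guards mark the cells shown as NC.

```python
import math

def odds(p: float) -> float:
    """omega = p / (1 - p); defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("odds undefined for p outside [0, 1)")
    return p / (1 - p)

def logit(p: float) -> float:
    """log-odds; defined for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("logit undefined for p outside (0, 1)")
    return math.log(odds(p))

def prob_from_odds(w: float) -> float:
    """Inverse map: pi = omega / (1 + omega)."""
    return w / (1 + w)

for i in range(11):
    p = i / 10
    o = f"{odds(p):.2f}" if p < 1 else "NC"
    lg = f"{logit(p):.2f}" if 0 < p < 1 else "NC"
    print(f"{p:.1f}  {o:>5}  {lg:>6}")
```

The symmetry of the logit shows up as logit(1 − π) = −logit(π), for example ±2.20 at π = 0.9 and 0.1.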
Example: NFL Football
Team | Odds against | (Prob. of win)
San Francisco 49ers | Even | (1/2)
Denver Broncos | 5 to 2 | (2/7)
New York Giants | 3 to 1 | (1/4)
Cleveland Browns | 9 to 2 | (2/11)
Los Angeles Rams | 5 to 1 | (1/6)
Minnesota Vikings | 6 to 1 | (1/7)
Buffalo Bills | 8 to 1 | (1/9)
Pittsburgh Steelers | 10 to 1 | (1/11)
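The probability column follows from reading “a to b against” as P(lose) : P(win) = a : b, so P(win) = b/(a + b). A small sketch with exact fractions (the helper name is illustrative):

```python
from fractions import Fraction

def win_prob_from_odds_against(a: int, b: int) -> Fraction:
    """'a to b against' means P(lose) : P(win) = a : b, so P(win) = b / (a + b)."""
    return Fraction(b, a + b)

quoted = {  # odds against as (a, b); "Even" is 1 to 1
    "San Francisco 49ers": (1, 1),
    "Denver Broncos": (5, 2),
    "New York Giants": (3, 1),
    "Cleveland Browns": (9, 2),
    "Los Angeles Rams": (5, 1),
    "Minnesota Vikings": (6, 1),
    "Buffalo Bills": (8, 1),
    "Pittsburgh Steelers": (10, 1),
}
for team, (a, b) in quoted.items():
    print(f"{team:<22} {a} to {b}  ->  {win_prob_from_odds_against(a, b)}")
```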
The Following are Equivalent
• The proportions π1, π2 are equal.
• The odds ω1, ω2 are equal.
• The odds ratio φ = ω1/ω2 equals 1.
• log(φ) = 0.
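Confidence Intervals for Odds Ratio (OR)
(Text, p.52)
• Estimate of OR: $\widehat{OR} = \dfrac{n_{11}\,n_{22}}{n_{21}\,n_{12}}$
• Estimate of the “asymptotic” variance of $\log_e(\widehat{OR})$:
$$\widehat{Var}\big(\log_e \widehat{OR}\big) = \frac{1}{n_{11}} + \frac{1}{n_{22}} + \frac{1}{n_{21}} + \frac{1}{n_{12}}$$
• 100(1−α)% CI for OR: $\exp\Big\{\log_e \widehat{OR} - z_{\alpha/2}\sqrt{\widehat{Var}(\log_e \widehat{OR})}\Big\}$ to $\exp\Big\{\log_e \widehat{OR} + z_{\alpha/2}\sqrt{\widehat{Var}(\log_e \widehat{OR})}\Big\}$
• Alternatively, test H0: ω1 = ω2, or equivalently H0: φ = 1, or H0: log(φ) = 0.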
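A sketch of the OR estimate with this log-scale interval (often called Woolf’s interval). It is applied to Example 3 (smoking vs lung cancer) and to the vitamin C trial. For the latter, the vitamin C row (302 colds, 105 no colds) is an assumption, since only the placebo row (335, 76) appears in these notes. With that row, the code reproduces the log-scale limits 0.093 and 0.761 quoted in the worked example.

```python
import math

def odds_ratio_ci(n11: int, n12: int, n21: int, n22: int, z: float = 1.96):
    """OR = n11*n22 / (n12*n21) with CI exp{log OR -/+ z * sqrt(sum of reciprocals)}."""
    log_or = math.log(n11 * n22 / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Example 3: rows smoker / non-smoker, columns cancer / control
or_s, lo_s, hi_s = odds_ratio_ci(83, 72, 3, 14)
# Vitamin C trial: rows placebo / vitamin C (assumed), columns cold / no cold
or_v, lo_v, hi_v = odds_ratio_ci(335, 76, 302, 105)
print(f"smoking:   OR = {or_s:.2f}, 95% CI {lo_s:.2f} to {hi_s:.2f}")
print(f"vitamin C: OR = {or_v:.2f}, 95% CI {lo_v:.2f} to {hi_v:.2f}")
```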
Odds Ratio (Contd.)
Interpretation:
If the odds ratio φ = ω1/ω2 equals 4, then ω1 = 4ω2. That is, the odds of a “yes” outcome in the first group are four times the odds of a “yes” outcome in the second group.
Advantages of Odds Ratio over Risk Ratio or Difference of Proportions
1. The estimate of the odds ratio (OR) remains invariant over the sampling design (i.e., it works even under retrospective sampling), and it is given by $\widehat{OR} = (n_{11}n_{22})/(n_{12}n_{21})$, since
$$\begin{aligned}
OR &= \frac{P(Y=1\mid X=1)/P(Y=0\mid X=1)}{P(Y=1\mid X=0)/P(Y=0\mid X=0)} = \frac{P(Y=1,X=1)/P(Y=0,X=1)}{P(Y=1,X=0)/P(Y=0,X=0)} = \frac{P(Y=1,X=1)\,P(Y=0,X=0)}{P(Y=0,X=1)\,P(Y=1,X=0)} \\
&= \frac{P(X=1\mid Y=1)P(Y=1)\;P(X=0\mid Y=0)P(Y=0)}{P(X=1\mid Y=0)P(Y=0)\;P(X=0\mid Y=1)P(Y=1)} = \frac{P(X=1\mid Y=1)/P(X=0\mid Y=1)}{P(X=1\mid Y=0)/P(X=0\mid Y=0)}
\end{aligned}$$
The last expression involves only P(X | Y), which a retrospective (case-control) sample does estimate.
2. Comparison of odds extends nicely to regression analysis when the response (Y) is a categorical variable.
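The invariance argument can be checked with exact arithmetic on a hypothetical joint distribution of (Y, X). The odds ratio computed prospectively from P(Y | X), retrospectively from P(X | Y), and directly from the joint cell probabilities all coincide.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(Y = y, X = x), keyed by (y, x)
joint = {(1, 1): F(1, 10), (0, 1): F(2, 10), (1, 0): F(3, 10), (0, 0): F(4, 10)}

def p_y_given_x(y: int, x: int) -> F:
    """P(Y = y | X = x): what a prospective design estimates."""
    return joint[(y, x)] / (joint[(0, x)] + joint[(1, x)])

def p_x_given_y(x: int, y: int) -> F:
    """P(X = x | Y = y): what a retrospective design estimates."""
    return joint[(y, x)] / (joint[(y, 0)] + joint[(y, 1)])

or_prospective = (p_y_given_x(1, 1) / p_y_given_x(0, 1)) / (p_y_given_x(1, 0) / p_y_given_x(0, 0))
or_retrospective = (p_x_given_y(1, 1) / p_x_given_y(0, 1)) / (p_x_given_y(1, 0) / p_x_given_y(0, 0))
or_joint = joint[(1, 1)] * joint[(0, 0)] / (joint[(0, 1)] * joint[(1, 0)])
print(or_prospective, or_retrospective, or_joint)   # 2/3 2/3 2/3
```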
Computation of odds ratio in a 2×2 table
 | Cold | No Cold
Placebo | 335 | 76
The odds ratio is calculated by dividing the product of the diagonal elements of the table by the product of the off-diagonal elements.
The result (1.53) indicates that the odds of getting a cold under placebo treatment are 1.53 times the odds of getting a cold under vitamin C treatment.
Example: Computation of odds ratio
 | Cancer | Control
Smoker | 83 | 72
Non-Smoker | 3 | 14
The odds ratio is calculated by dividing the product of the diagonal elements of the table by the product of the off-diagonal elements: (83 × 14)/(72 × 3) = 1162/216 ≈ 5.38.
This indicates that the odds of getting cancer for a smoker are 5.38 times the odds of getting cancer for a non-smoker.
Sampling Distribution of the Loge of Estimated Odds Ratio
Two Formulae of Standard Errors for the Loge of Odds Ratio
• The estimated variance is obtained by substituting sample quantities for the unknowns in the variance formula of the estimator. Which sample quantities replace the unknowns depends on the use:
– For a confidence interval, π1 and π2 are replaced by their individual sample estimates.
– For a test of hypothesis, they are replaced by their pooled sample estimate from the combined sample.
• Testing: if the odds are equal, the odds ratio is 1, i.e., ln(odds ratio) = 0.
– If the sample sizes are large, the resulting P-value for testing ln(ω1/ω2) = 0 is nearly identical to that obtained with the Z-test for equal proportions (π1 = π2).
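Testing Equality of proportions π1 and π2, i.e., log(OR) = 0:
• To test the equality of the odds of “YES”, ω1 and ω2, in two groups (H0: ω1/ω2 = 1), one estimates the common proportion from the combined sample and computes the standard error based on it.
• Estimated standard error for constructing the test statistic:
$$\text{s.e.}\big(\ln(\hat{\omega}_1/\hat{\omega}_2)\big) = \sqrt{\frac{1}{n_{1+}\hat{\pi}_c(1-\hat{\pi}_c)} + \frac{1}{n_{2+}\hat{\pi}_c(1-\hat{\pi}_c)}}, \quad \text{where } \hat{\pi}_c = \frac{n_{11}+n_{21}}{n_{1+}+n_{2+}}$$
• Test statistic $= \dfrac{\ln(\hat{\omega}_1/\hat{\omega}_2)}{\text{s.e.}\big(\ln(\hat{\omega}_1/\hat{\omega}_2)\big)} \sim N(0,1)$ approximately under H0.
Reject H0 if |test statistic value| > $z_{\alpha/2}$.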
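A sketch of this test. The numerator is the sample log odds ratio; the standard error uses the pooled proportion π̂c rather than the individual estimates used for the confidence interval. The counts are the vitamin C trial with an assumed vitamin C row (302, 105); only the placebo row (335, 76) appears in these notes.

```python
import math

def log_or_test_pooled(n11: int, n12: int, n21: int, n22: int) -> tuple[float, float]:
    """z = ln(w1/w2) / s.e. with pooled pi_c; returns (z, two-sided P-value)."""
    n1p, n2p = n11 + n12, n21 + n22
    w1 = (n11 / n1p) / (1 - n11 / n1p)          # sample odds of "YES" in group 1
    w2 = (n21 / n2p) / (1 - n21 / n2p)          # sample odds of "YES" in group 2
    pc = (n11 + n21) / (n1p + n2p)              # pooled proportion under H0
    se0 = math.sqrt(1 / (n1p * pc * (1 - pc)) + 1 / (n2p * pc * (1 - pc)))
    z = math.log(w1 / w2) / se0
    return z, math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))

z, p = log_or_test_pooled(335, 76, 302, 105)
print(f"z = {z:.3f}, two-sided P-value = {p:.4f}")
```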
Example: Cardio-Vascular Deaths and Obesity among women in American Samoa
$\widehat{Var}\big(\log_e \widehat{OR}\big) = \dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}$ (short-cut formula, p.52, text)
4. 95% interval for the odds ratio: exp(0.093) to exp(0.761), i.e., 1.10 to 2.14
Conclusion: The odds of a cold for the placebo group are estimated to be 1.53 times the odds of a cold for the vitamin C group (approximate 95% CI: 1.10 to 2.14).
Test for Marginal Homogeneity
(McNemar’s Test, Text, p.55-56)
Cochran-Mantel-Haenszel Test for no row by column association in any of the 2×2 Tables
(pp. 94-101)
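Cochran-Mantel-Haenszel Test (pp. 94-101)
$$Q_{CMH} = \frac{\Big[\sum_{h=1}^{q}\big(n_{11}^{(h)} - m_{11}^{(h)}\big)\Big]^2}{\sum_{h=1}^{q} v_{11}^{(h)}}$$
where $m_{11}^{(h)} = \dfrac{n_{1+}^{(h)}\,n_{+1}^{(h)}}{n^{(h)}}$ and $v_{11}^{(h)} = \dfrac{n_{1+}^{(h)}\,n_{2+}^{(h)}\,n_{+1}^{(h)}\,n_{+2}^{(h)}}{\big(n^{(h)}\big)^2\big(n^{(h)}-1\big)}$.
Text, p. 100 (here q = 2, h = 1, 2): $Q_{CMH} = \dfrac{[(18 - 16.4) + (32 - 28.8)]^2}{2.3855 + 3.7236} = 3.7714$; P-value = 0.052 with the $\chi^2_1$ distribution.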
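A sketch of the statistic. `cmh_statistic` builds m11 and v11 from each stratum’s 2×2 table, without a continuity correction. The slide quotes only the per-stratum summaries (n11, m11, v11) for the Text p. 100 example, so those are combined directly to reproduce Q = 3.7714 and P = 0.052. The helper names are illustrative.

```python
import math

def cmh_statistic(tables: list[tuple[tuple[int, int], tuple[int, int]]]) -> float:
    """Q_CMH for q strata, each a 2x2 table ((n11, n12), (n21, n22))."""
    diff_sum, var_sum = 0.0, 0.0
    for (n11, n12), (n21, n22) in tables:
        n1p, n2p = n11 + n12, n21 + n22                   # row totals
        np1, np2 = n11 + n21, n12 + n22                   # column totals
        n = n1p + n2p
        m11 = n1p * np1 / n                               # E(n11) under H0, given margins
        v11 = n1p * n2p * np1 * np2 / (n ** 2 * (n - 1))  # Var(n11) under H0
        diff_sum += n11 - m11
        var_sum += v11
    return diff_sum ** 2 / var_sum

def chisq1_sf(x: float) -> float:
    """Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

# Text, p. 100: per-stratum (n11, m11, v11) as quoted on the slide
summaries = [(18, 16.4, 2.3855), (32, 28.8, 3.7236)]
q = sum(n11 - m11 for n11, m11, _ in summaries) ** 2 / sum(v for _, _, v in summaries)
print(round(q, 4), round(chisq1_sf(q), 3))   # 3.7714 0.052
```

For a single stratum, Q_CMH equals (n − 1)/n times Pearson’s X² for that table.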
Cochran-Armitage Trend Test
(See Text, p.60-61)
Binary categorical (row) variable X, ordered (column)
response variable Y.