ST3241 Categorical Data Analysis I


Two-way Contingency Tables

Odds Ratio and Tests of Independence


Inference For Odds Ratio (p. 24)


• For small to moderate sample sizes, the distribution of the sample
odds ratio θ̂ is highly skewed.
• When θ = 1, θ̂ cannot be much smaller than θ, but it can be much
larger than θ with non-negligible probability.
• Consider instead the log odds ratio, log θ.
• Independence of X and Y implies log θ = 0.


Log Odds Ratio


• The log odds ratio is symmetric about zero: reversing the rows or
reversing the columns only changes its sign.
• The sample log odds ratio log θ̂ has a less skewed distribution
and is well approximated by the normal distribution.
• The asymptotic standard error of log θ̂ is given by

ASE(log θ̂) = √( 1/n11 + 1/n12 + 1/n21 + 1/n22 )


Confidence Intervals
• A large sample confidence interval for log θ is given by

log(θ̂) ± zα/2 ASE(log θ̂)

• A large sample confidence interval for θ is given by

exp[log(θ̂) ± zα/2 ASE(log θ̂)]

Example: Aspirin Usage
• Sample Odds Ratio = 1.832
• Sample log odds ratio: log θ̂ = log(1.832) = 0.605
• ASE of log θ̂:

√( 1/189 + 1/10933 + 1/10845 + 1/104 ) = 0.123

• 95% confidence interval for log θ:

0.605 ± 1.96 × 0.123 = (0.365, 0.846)

• The corresponding confidence interval for θ is
(e^0.365, e^0.846) = (1.44, 2.33).

Recall SAS Output


Estimates of the Relative Risk (Row1/Row2)
Type of Study               Value    95% Confidence Limits

Case-Control (Odds Ratio)   1.8321   1.4400   2.3308
Cohort (Col1 Risk)          1.8178   1.4330   2.3059
Cohort (Col2 Risk)          0.9922   0.9892   0.9953

Sample Size = 22071


A Simple R Function For Odds Ratio


> odds.ratio <- function(x, pad.zeros=FALSE, conf.level=0.95) {
    # optionally add 0.5 to every cell when any count is zero
    if(pad.zeros) {
      if(any(x==0)) x <- x + 0.5
    }
    theta <- x[1,1]*x[2,2]/(x[2,1]*x[1,2])  # sample odds ratio
    ASE <- sqrt(sum(1/x))                   # ASE of log odds ratio
    CI <- exp(log(theta) +
      c(-1,1)*qnorm(0.5*(1+conf.level))*ASE)
    list(estimator=theta, ASE=ASE,
      conf.interval=CI, conf.level=conf.level) }
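
As a quick check, applying this function to the aspirin table from the
earlier example (rows: placebo, aspirin; columns: MI yes, MI no)
reproduces the hand computation:

> x <- matrix(c(189, 10845, 104, 10933), byrow=TRUE, ncol=2)
> odds.ratio(x)
# estimator = 1.832, ASE = 0.123, conf.interval = (1.44, 2.33)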


Notes (p. 25)


• Recall the formula for the sample odds ratio:

θ̂ = (n11 n22) / (n12 n21)

• The sample odds ratio is 0 or ∞ if any nij = 0, and it is
undefined if both entries in a row or column are zero.
• Consider the slightly modified formula

θ̃ = [(n11 + 0.5)(n22 + 0.5)] / [(n12 + 0.5)(n21 + 0.5)]

• In the ASE formula, too, each nij is replaced by nij + 0.5.
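
For instance, with a hypothetical 2 × 2 table containing a zero cell,
the pad.zeros option of the odds.ratio function above applies exactly
this adjustment:

> x <- matrix(c(12, 0, 8, 5), byrow=TRUE, ncol=2)
> odds.ratio(x)                  # estimator = Inf, interval undefined
> odds.ratio(x, pad.zeros=TRUE)  # theta-tilde = 16.18, finite ASE and CI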


Observations
• A sample odds ratio of 1.832 does not mean that p1 is 1.832 times p2.
• A simple relation:

Odds Ratio = [p1/(1 − p1)] / [p2/(1 − p2)] = Relative Risk × (1 − p2)/(1 − p1)

• If p1 and p2 are close to 0, the odds ratio and the relative risk
take similar values.
• This relationship between the odds ratio and the relative risk is
useful, as the next example shows.


Example: Smoking Status and Myocardial Infarction

Ever          Myocardial
Smoker        Infarction   Controls
Yes              172          173
No                90          346

• Odds Ratio = ? (3.8222)
• How do we get the relative risk? (2.4152; see the sketch below)
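
A quick numerical check of the relation in R, treating the rows as the
groups being compared:

> p1 <- 172/(172+173)        # proportion of cases among smokers
> p2 <- 90/(90+346)          # proportion of cases among non-smokers
> RR <- p1/p2                # 2.4152
> OR <- RR * (1-p2)/(1-p1)   # 3.8222, matching the odds ratio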


Chi-Squared Tests (p.27)


• To test H0 that the cell probabilities equal certain fixed values
{πij }.
• Let {nij } be the cell frequencies and n be the total sample size.
• Then µij = nπij are the expected cell frequencies under H0 .
• Pearson's (1900) chi-squared test statistic, summing over all cells:

χ² = Σ (nij − µij)² / µij
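
For instance, a minimal sketch in R with hypothetical counts, testing
that all four cell probabilities of a 2 × 2 table equal 0.25 (the table
is flattened to a vector):

> n <- c(30, 20, 25, 25)          # observed cell counts
> chisq.test(n, p=rep(0.25, 4))   # X-squared = 2, df = 3, p-value = 0.57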


Some Properties
• This statistic takes its minimum value of zero when all
nij = µij .
• For a fixed sample size, greater differences between {nij } and
{µij } produce larger χ2 values and stronger evidence against
H0 .
• The χ2 statistic has approximately a chi-squared distribution
with appropriate degrees of freedom for large sample sizes.


Likelihood-Ratio Test (p. 28)


• The likelihood ratio:

Λ = (maximum likelihood when H0 is true) /
    (maximum likelihood when parameters are unrestricted)

• In mathematical notation, if L(θ) denotes the likelihood function
with θ as the set of parameters, and the null hypothesis is
H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1, the likelihood ratio is given by

Λ = sup{θ∈Θ0} L(θ) / sup{θ∈Θ0∪Θ1} L(θ)


Properties
• The likelihood ratio cannot exceed 1.
• A small likelihood ratio indicates deviation from H0.
• The likelihood-ratio test statistic is −2 log Λ, which has a
chi-squared distribution with appropriate degrees of freedom
for large samples.
• For a two-way contingency table, this statistic reduces to

G² = 2 Σ nij log(nij / µij)

• The test statistics χ² and G² have the same large-sample
distribution under the null hypothesis.


Tests of Independence (p.30)


• To test: H0 : πij = πi+ π+j for all i and j.
• Equivalently, H0 : µij = nπij = nπi+ π+j .
• Usually, {πi+ } and {π+j } are unknown.
• We estimate them using the sample proportions:

µ̂ij = n pi+ p+j = n (ni+/n)(n+j/n) = ni+ n+j / n

• These {µ̂ij} are called the estimated expected cell frequencies.


Test Statistics
• Pearson's chi-squared test statistic:

χ² = Σi Σj (nij − µ̂ij)² / µ̂ij

• Likelihood-ratio test statistic:

G² = 2 Σi Σj nij log(nij / µ̂ij)

where the sums run over i = 1, …, I and j = 1, …, J.

• Both have a large-sample chi-squared distribution with
(I − 1)(J − 1) degrees of freedom.


Party Identification By Gender (p.31)

                     Party Identification
Gender     Democrat   Independent   Republican   Total
Females    279        73            225          577
           (261.4)    (70.7)        (244.9)
Males      165        47            191          403
           (182.6)    (49.3)        (171.1)
Total      444        120           416          980

(Estimated expected frequencies under independence in parentheses.)


Example: Continued
• The test statistics are: χ2 = 7.01 and G2 = 7.00
• Degrees of freedom = (I − 1)(J − 1) = (2 − 1)(3 − 1) = 2.
• p-value = 0.03.
• Thus, the above test statistics suggest that party identification
and gender are associated.
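
These values can be reproduced directly from the definitions; a
minimal sketch in R:

> n <- matrix(c(279, 73, 225, 165, 47, 191), byrow=TRUE, ncol=3)
> mu <- outer(rowSums(n), colSums(n)) / sum(n)  # estimated expected counts
> sum((n - mu)^2 / mu)                    # Pearson chi-squared = 7.0095
> 2 * sum(n * log(n / mu))                # likelihood ratio G^2 = 7.0026
> pchisq(7.0095, df=2, lower.tail=FALSE)  # p-value = 0.0301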


SAS Codes: Read The Data


data Survey;
length Party $ 12;
input Gender $ Party $ count;
datalines;
Female Democrat 279
Female Independent 73
Female Republican 225
Male Democrat 165
Male Independent 47
Male Republican 191
;
run;


SAS Codes: Use Proc Freq


proc freq data=survey order=data;
weight count;
tables gender*party / chisq expected
nopercent norow nocol;
run;


Output
The FREQ Procedure
Table of Gender by Party

Gender      Party
Frequency
Expected    Democrat   Independent   Republican   Total
Female      279        73            225          577
            261.42     70.653        244.93
Male        165        47            191          403
            182.58     49.347        171.07
Total       444        120           416          980


Output
Statistics for Table of Gender by Party
Statistic DF Value Prob
-------------------------------------------------
Chi-Square 2 7.0095 0.0301
Likelihood Ratio Chi-Square 2 7.0026 0.0302
Mantel-Haenszel Chi-Square 1 6.7581 0.0093
Phi Coefficient 0.0846
Contingency Coefficient 0.0843
Cramer’s V 0.0846
Sample Size = 980


R Codes
>gendergap<-matrix(c(279,73,225,165,47,191),
byrow=T,ncol=3)
>dimnames(gendergap)<-
list(Gender=c("Female","Male"),
PartyID=c("Democrat","Independent",
"Republican"))
>gendergap
PartyID
Gender Democrat Independent Republican
Female 279 73 225
Male 165 47 191


R Codes
>chisq.test(gendergap)

Pearson’s Chi-squared test


data: gendergap
X-squared = 7.0095, df = 2, p-value = 0.03005


An Alternative Way
>Gender<-c("Female","Female","Female","Male",
"Male","Male")
>Party<-c("Democrat","Independent", "Republican",
"Democrat","Independent", "Republican")
>count<-c(279,73,225,165,47,191)
>gender1<-data.frame(Gender,Party,count)
>gender<-xtabs(count~Gender+Party, data=gender1)
>gender
>summary(gender)


Output

Party
Gender Democrat Independent Republican
Female 279 73 225
Male 165 47 191
Call: xtabs(formula = count ~ Gender + Party,
data = gender1)
Number of cases in table: 980
Number of factors: 2
Test for independence of all factors:
Chisq = 7.01, df = 2, p-value = 0.03005


Table of Expected Cell Counts


> rowsum<-apply(gender,1,sum)  # row totals ni+
> colsum<-apply(gender,2,sum)  # column totals n+j
> n<-sum(gender)
> gd<-outer(rowsum,colsum/n)   # expected counts ni+ n+j / n
> # outer() keeps the names, so gd inherits the table's dimnames


Table of Expected Cell Counts


> gd
Democrat Independent Republican
Female 261.4163 70.65306 244.9306
Male 182.5837 49.34694 171.0694


Residuals (p.31)
• To better understand the nature of the evidence against H0, a
cell-by-cell comparison of observed and estimated expected
frequencies is necessary.
• Define the adjusted residuals

rij = (nij − µ̂ij) / √( µ̂ij (1 − pi+)(1 − p+j) )

• If H0 is true, each rij has a large-sample standard normal
distribution.
• If |rij| exceeds 2 in a cell, it indicates lack of fit of H0 in
that cell.
• The sign of the residual also describes the nature of the
association.

Computing Residuals in R
> rowp<-rowsum/n   # row marginal probabilities pi+
> colp<-colsum/n   # column marginal probabilities p+j
> pd<-outer(1-rowp,1-colp)   # (1 - pi+)(1 - p+j) for each cell
> resid<-(gender-gd)/sqrt(gd*pd)   # adjusted residuals
> resid


Residuals Output

Party
Gender Democrat Independent Republican
Female 2.2931603 0.4647941 -2.6177798
Male -2.2931603 -0.4647941 2.6177798


Some Comments (p.33)


• Pearson's χ² tests only indicate the degree of evidence for an
association; they cannot answer other questions, such as the
nature of the association.
• These χ² tests are not always applicable. Large samples are
needed; the approximation is often poor when n/(IJ) < 5.
• The values of χ² and G² do not depend on the ordering of the
rows. Thus they ignore some information when the data are
ordinal.


Testing Independence For Ordinal Data (p.34)


• For ordinal data, it is important to look for the type of
association when there is dependence.
• It is quite common to assume a monotone trend: as the level of X
increases, responses on Y tend to increase toward higher levels,
or tend to decrease.
• The simplest and most common analysis assigns scores to the
categories and measures the degree of linear trend or correlation.
• The method used is known as the “Mantel-Haenszel chi-square”
test (Mantel and Haenszel 1959).


Linear Trend Alternative to Independence


• Let u1 ≤ u2 ≤ · · · ≤ uI denote scores for the rows.
• Let v1 ≤ v2 ≤ · · · ≤ vJ denote scores for the columns.
• The scores have the same ordering as the category levels.
• Define the correlation between X and Y as

r = [ Σi Σj ui vj nij − (Σi ui ni+)(Σj vj n+j)/n ]
    / √( [Σi ui² ni+ − (Σi ui ni+)²/n] [Σj vj² n+j − (Σj vj n+j)²/n] )


Test For Linear Trend Alternative


• Independence between the variables implies that the true value of
this correlation equals zero.
• The larger the correlation is in absolute value, the farther the
data fall from independence in this linear dimension.
• A test statistic is given by M² = (n − 1)r².
• For large samples, it has approximately a chi-squared
distribution with 1 degree of freedom.


Infant Malformation and Mothers' Alcohol Consumption

Alcohol                Malformation
Consumption   Absent(0)   Present(1)   Total
0             17,066      48           17,114
<1            14,464      38           14,502
1-2           788         5            793
3-5           126         1            127
≥6            37          1            38


Infant Malformation and Mothers' Alcohol Consumption

Alcohol                Malformation               Percent   Adjusted
Consumption   Absent(0)   Present(1)   Total     Present   Residual
0             17,066      48           17,114    0.28      -0.18
<1            14,464      38           14,502    0.26      -0.71
1-2           788         5            793       0.63       1.84
3-5           126         1            127       0.79       1.06
≥6            37          1            38        2.63       2.71


Example: Tests For Independence


• Pearson's χ² = 12.1, d.f. = 4, p-value = 0.02.
• Likelihood ratio test: G² = 6.2, d.f. = 4, p-value = 0.19.
• The two tests give inconsistent signals.
• The percent present and adjusted residuals suggest that there
may be a linear trend.


Test For Linear Trend


• Assign scores v1 = 0, v2 = 1 and
u1 = 0, u2 = 0.5, u3 = 1.5, u4 = 4.0, u5 = 7.0.
• We have r = 0.014, n = 32,574 and M² = 6.6 with p-value = 0.01.
• This suggests strong evidence of a positive linear trend between
mothers' alcohol consumption and infant malformation.
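
A minimal sketch reproducing these numbers in R from the correlation
formula above:

> counts <- matrix(c(17066, 48, 14464, 38, 788, 5, 126, 1, 37, 1),
                   byrow=TRUE, ncol=2)
> u <- c(0, 0.5, 1.5, 4, 7); v <- c(0, 1)   # row and column scores
> n <- sum(counts); ni <- rowSums(counts); nj <- colSums(counts)
> num <- sum(outer(u, v) * counts) - sum(u*ni) * sum(v*nj) / n
> den <- sqrt((sum(u^2*ni) - sum(u*ni)^2/n) *
              (sum(v^2*nj) - sum(v*nj)^2/n))
> r <- num/den                         # 0.0142
> M2 <- (n - 1) * r^2                  # 6.57
> pchisq(M2, df=1, lower.tail=FALSE)   # 0.0104

Replacing u with 1:5 reproduces the weaker M² = 1.83 discussed on a
later slide.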


SAS Codes
data infants;
input malform alcohol count @@;
datalines;
1 0 17066 2 0 48
1 0.5 14464 2 0.5 38
1 1.5 788 2 1.5 5
1 4.0 126 2 4.0 1
1 7.0 37 2 7.0 1
;
run;
proc format;
value malform 2=’Present’ 1=’Absent’;
value Alcohol 0=’0’ 0.5=’<1’ 1.5=’1-2’ 4.0=’3-5’
7.0=’>=6’;
run;


SAS Codes
proc freq data = infants;
format malform malform. alcohol alcohol.;
weight count;
tables alcohol*malform / chisq cmh1 norow
nocol nopercent;
run;


Partial Output

Statistic                     DF     Value     Prob
---------------------------------------------------------
Chi-Square                     4   12.0821   0.0168
Likelihood Ratio Chi-Square    4    6.2020   0.1846
Mantel-Haenszel Chi-Square     1    6.5699   0.0104
Phi Coefficient                     0.0193
Contingency Coefficient             0.0193
Cramer's V                          0.0193


Partial Output
Cochran-Mantel-Haenszel Statistics (Based on Table
Scores)
Statistic Alternative Hypothesis DF Value Prob
-------------------------------------------------------
1 Nonzero Correlation 1 6.5699 0.0104

Total Sample Size = 32574


Notes
• The correlation r has limited use as a descriptive measure of
tables.
• Different choices of monotone scores usually give similar results.
• However, it may not happen when the data are very
unbalanced, i.e. when some categories have many more
observations than other categories.
• If we had taken (1, 2, 3, 4, 5) as the row scores in our example,
then M² = 1.83 with p-value = 0.18, a much weaker conclusion.
• It is usually better to use one’s own judgment by selecting
scores that reflect distances between categories.


SAS Codes
data infantsx;
input malform alcoholx count @@;
datalines;
1 0 17066 2 0 48
1 1 14464 2 1 38
1 2 788 2 2 5
1 3 126 2 3 1
1 4 37 2 4 1
;
run;
proc freq data = infantsx;
weight count;
tables alcoholx*malform / cmh1 norow nocol nopercent;
run;


Partial Output
Cochran-Mantel-Haenszel Statistics (Based on Table
Scores)
Statistic Alternative Hypothesis DF Value Prob
--------------------------------------------------------
1 Nonzero Correlation 1 1.8278 0.1764

Total Sample Size = 32574


Fisher’s Tea Tasting Experiment (p.39)

Guess Poured First

Poured First Milk Tea Total

Milk 3 1 4
Tea 1 3 4
Total 4 4 8


Example: Tea Tasting


• Goal: to test whether the lady could accurately tell which was
poured first, milk or tea.
• Test H0 : θ = 1 against H1 : θ > 1.
• We cannot use the previously discussed tests because the sample
size is very small.


Fisher’s Exact Test


• For a 2 × 2 table, under the assumption of independence, i.e. θ
= 1, the conditional distribution of n11 given the row and
column totals is hypergeometric.
• For given row and column marginal totals, the value for n11
determines the other three cell counts. Thus, the
hypergeometric formula expresses probabilities for the four cell
counts in terms of n11 alone.


Fisher’s Exact Test


• When θ = 1, the probability of a particular value n11 for that
count equals

p(n11) = C(n1+, n11) C(n2+, n+1 − n11) / C(n, n+1)

where C(a, b) denotes the binomial coefficient “a choose b”.

• To test independence, the p-value is the sum of hypergeometric
probabilities for outcomes at least as favorable to the
alternative hypothesis as the observed outcome.


Example: Tea Tasting


• The outcomes at least as favorable as the observed data, given
the row and column totals, are n11 = 3 and n11 = 4.
• Hence,

p(3) = C(4,3) C(4,1) / C(8,4) = 16/70 = 0.2286,

p(4) = C(4,4) C(4,0) / C(8,4) = 1/70 = 0.0143.
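
The same probabilities can be obtained in R from the hypergeometric
density:

> dhyper(3, 4, 4, 4)         # 16/70 = 0.2286
> dhyper(4, 4, 4, 4)         # 1/70  = 0.0143
> sum(dhyper(3:4, 4, 4, 4))  # one-sided p-value = 0.2429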


Example: Tea Tasting


• Therefore, the p-value = p(3) + p(4) = 0.243.
• There is not much evidence against the null hypothesis of
independence.
• The experiment did not establish an association between the
actual order of pouring and the guess.


SAS Codes: Exact Test


data tea;
input poured $ guess $ count @@;
datalines;
Milk Milk 3 Milk Tea 1
Tea Milk 1 Tea Tea 3
;
proc freq data=tea order=data;
weight count;
tables poured*guess / exact;
run;


Partial Output

Fisher’s Exact Test


----------------------------------
Cell (1,1) Frequency (F) 3
Left-sided Pr <= F 0.9857
Right-sided Pr >= F 0.2429
Table Probability (P) 0.2286
Two-sided Pr <= P 0.4857
Sample Size = 8


R Codes
> Poured<-c("Milk","Milk","Tea","Tea")
> Guess<-c("Milk","Tea","Milk","Tea")
> count<-c(3,1,1,3)
> teadata<-data.frame(Poured,Guess,count)
> tea<-xtabs(count~Poured+Guess,data=teadata)
> fisher.test(tea,alternative="greater")


Output
Fisher’s Exact Test for Count Data
data: tea
p-value = 0.2429
alternative hypothesis: true odds ratio is
greater than 1
95 percent confidence interval:
0.3135693 Inf
sample estimates:
odds ratio
6.408309
