
21 - Contingency Tables

The document provides an overview of contingency tables, including their definitions, applications, and statistical analysis using SPSS. It covers concepts such as crosstabulation, measures of association, chi-squared tests, and examples of data interpretation. Additionally, it discusses the relationship between observed and expected counts, as well as graphical representations of data.

Uploaded by

Moiz Alam Vlogs

Contingency Tables

• Chapters Seven, Sixteen, and Eighteen


• Chapter Seven
– Definition of Contingency Tables
– Basic Statistics
– SPSS program (Crosstabulation)
• Chapter Sixteen
– Basic Probability Theory Concepts
– Test of Hypothesis of Independence
Contingency Tables (continued)
• Chapter Eighteen
– Measures of Association
– For nominal variables
– For ordinal variables
Basic Empirical Situation
• Unit of data.
• Two nominal scales measured for each unit.
– Example: in an interview study, record the sex of
the respondent and a variable such as whether or
not the subject has a cellular telephone.
– Objective is to compare males and females with
respect to what fraction have cellular
telephones.
Crosstabulation of Data
• Prepare a data file for study.
– One record per subject.
– Three variables per record: subject ID, sex of
subject, and indicator variable of whether
subject has cellular telephone.
• SPSS analysis
– Statistics, summarize, crosstabs
• Basic information is the contingency table.
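As a rough illustration of what the crosstabulation step does, the sketch below tallies one record per subject into a contingency table using only the Python standard library. The five records and the names `records`, `rows`, and `cols` are invented for illustration; this is not SPSS output.

```python
from collections import Counter

# Hypothetical data file: one record per subject,
# (subject ID, sex of subject, owns a cellular telephone).
records = [
    (1, "Male", "Yes"),
    (2, "Female", "No"),
    (3, "Female", "Yes"),
    (4, "Male", "No"),
    (5, "Female", "Yes"),
]

# Count how many subjects fall in each (row category, column category) pair.
counts = Counter((has_phone, sex) for _, sex, has_phone in records)

rows = ["Yes", "No"]        # dependent variable: owns a cellular telephone
cols = ["Male", "Female"]   # independent variable: sex of respondent

# Observed counts, plus the marginal totals.
table = [[counts[(r, c)] for c in cols] for r in rows]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

for label, row, total in zip(rows, table, row_totals):
    print(label, row, total)
print("Total", col_totals, sum(row_totals))
```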
Two Common Situations
• Hypothesized causal relation between
variables.

• No hypothesized causal relation.


Hypothesized Causal Relation
• Classification of variables
– Independent variable is one hypothesized to be
cause. Example: sex of respondent.
– Dependent variable is hypothesized to be the effect.
Example: whether or not subject has cellular
telephone.
• Format convention
– Columns correspond to categories of the independent variable
– Rows correspond to categories of the dependent variable
Association Study
• No hypothesized causal mechanism.
– Whether or not subject above median on verbal
SAT and whether or not above median on
quantitative SAT.
• No convention about assigning variables to
rows and columns.
Contingency Table
• One column for each value of the column
variable; C is the number of columns.
• One row for each value of the row variable;
R is the number of rows.
• R x C contingency table.
Contingency Table
• Each entry is the OBSERVED COUNT
O(i,j) of the number of units having the (i,j)
contingency.
• Column of marginal totals.
• Row of marginal totals.
Example Contingency Table
(Hypothetical)
Own Cell Telephone    Male    Female    Total
Yes                     60        80      140
No                     140       120      260
Total                  200       200      400


Example Contingency Table
(Hypothetical)
• Entry 60 in the upper left hand corner means
that there were 60 male respondents who
owned a cellular telephone.
• ASSUME marginal totals are known:
• THEN, knowing entry of 60 means that you
can deduce all other entries.
• This 2 x 2 table has one degree of freedom.
• R x C table has (R-1)(C-1) degrees of freedom.
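The degrees-of-freedom argument can be checked directly: with the marginal totals fixed, the single entry 60 determines the other three cells. The sketch below is illustrative only, and the function names `fill_2x2` and `degrees_of_freedom` are made up for this example.

```python
def fill_2x2(top_left, row_totals, col_totals):
    """Recover a full 2 x 2 table from one entry and the marginal totals."""
    a = top_left
    b = row_totals[0] - a   # rest of the first row
    c = col_totals[0] - a   # rest of the first column
    d = row_totals[1] - c   # the last cell follows from the second row total
    return [[a, b], [c, d]]


def degrees_of_freedom(n_rows, n_cols):
    """Number of cells that can vary freely once the margins are fixed."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical cellular telephone example: rows Yes/No, columns Male/Female.
table = fill_2x2(60, row_totals=(140, 260), col_totals=(200, 200))
print(table)                     # [[60, 80], [140, 120]]
print(degrees_of_freedom(2, 2))  # 1
```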
Row and Column Percentages
• Natural to use percentages rather than raw counts.
– Remember that you want to use these numbers for
comparison purposes.
– The term “rate” is often used to refer to a percentage
or probability.
• Can ask for column percentages, row
percentages, or both.
– Percentage in the direction of the independent variable
(usually the column).
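A minimal sketch of column percentages for the hypothetical cellular telephone table, assuming the column variable (sex) is the independent variable. Each cell is divided by its column total, so each column sums to 100 percent.

```python
# Observed counts: rows are Yes/No (owns a cellular telephone),
# columns are Male/Female.
table = [[60, 80], [140, 120]]

col_totals = [sum(col) for col in zip(*table)]

# Percentage of each column falling in each row.
col_pct = [
    [100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
    for row in table
]

for label, row in zip(["Yes", "No"], col_pct):
    print(label, [f"{p:.1f}%" for p in row])
```

In this example 30 percent of males and 40 percent of females own a cellular telephone, which is the comparison the raw counts were collected to support.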
Relation of Percentages to
Probabilities
• ASSUME that the column variable is the
independent variable.
• THEN the column percentages are estimates
of the conditional probabilities given the
setting of the independent variable.
• The basic questions revolve around whether or
not the conditional distributions are the same
for all settings of the independent variable.
Bar Charts
• Graphical means of presenting data.
• SPSS analysis
– Graphs, bar chart.
• Can use either count scale or percentage
scale (prefer percentage scale).
• Can have bars side by side or stacked.
Generalization of the R x C
contingency table
• Can have three or more variables to classify
each subject. These are called “layers”.
– In example, can add whether respondent is
student in college or student in high school.
Chapter Sixteen: Comparing
Observed and Expected Counts
• Basic hypothesis
• Definitions of expected counts.
• Chi-squared test of independence.
Basic Hypothesis
• ASSUME column variable is the
independent variable.
• Hypothesis is independence.
• That is, the conditional distribution in any
column is the same as the conditional
distribution in any other column.
Expected Count
• Basic idea is proportional allocation of
observations in a column based on column
total.
• Expected count in (i, j) contingency =
E(i,j) = (total number in row i * total number
in column j) / total number in table.
• Expected count need not be an integer; one
expected count for each contingency.
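The expected-count definition can be sketched in a few lines. The function name `expected_counts` is an assumption for this example; the data are the hypothetical cellular telephone counts.

```python
def expected_counts(table):
    """E(i,j) = (row i total * column j total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[60, 80], [140, 120]]   # rows Yes/No, columns Male/Female
expected = expected_counts(observed)
print(expected)  # [[70.0, 70.0], [130.0, 130.0]]
```

Because both columns have the same total here, independence would put the same expected count (70 owners, 130 non-owners) in each column.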
Residual
• Residual in (i,j) contingency = observed
count in (i,j) contingency - expected count
in (i,j) contingency.
• That is, R(i,j)= O(i,j)-E(i,j)
• One residual for each contingency.
Pearson Chi-squared Component
• Chi-squared component for (i, j)
contingency = C(i,j) = (residual in (i, j)
contingency)^2 / expected count in (i, j)
contingency.
• C(i,j) = (R(i,j))^2 / E(i,j)
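As a sketch of the residual and component definitions, the code below computes R(i,j) and C(i,j) cell by cell for the hypothetical cellular telephone table. The variable names are chosen for this example only.

```python
observed = [[60, 80], [140, 120]]   # rows Yes/No, columns Male/Female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E(i,j): proportional allocation based on the marginal totals.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# R(i,j) = O(i,j) - E(i,j)
residuals = [
    [o - e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# C(i,j) = R(i,j)^2 / E(i,j)
components = [
    [r * r / e for r, e in zip(r_row, e_row)]
    for r_row, e_row in zip(residuals, expected)
]

print(residuals)   # [[-10.0, 10.0], [10.0, -10.0]]
print(components)
```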
Assessing Pearson Component
• Rough guides on whether the (i, j)
contingency has an excessively large chi-squared
component C(i,j), comparing it with a chi-squared
distribution with 1 degree of freedom:
– the observed significance level of 3.84 is about
0.05.
– Of 6.63 is about 0.01.
– Of 10.83 is about 0.001.
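The three rough guides match the upper-tail probabilities of a chi-squared distribution with 1 degree of freedom. A chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail can be written with the complementary error function from the standard library. The helper name `p_value_1df` is an assumption for this sketch.

```python
import math


def p_value_1df(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 degree of freedom.

    Uses P(Z^2 > x) = 2 * P(Z > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


for cutoff in (3.84, 6.63, 10.83):
    print(cutoff, round(p_value_1df(cutoff), 4))
```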
Pearson Chi-Squared Test
• Sum C(i,j) over all contingencies.
• Pearson chi-squared test has (R-1)(C-1)
degrees of freedom.
• Under null hypothesis
– Expected value of chi-square equals its degrees
of freedom.
– Variance is twice its degrees of freedom
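A minimal sketch of the full Pearson statistic: sum the components C(i,j) over all cells and report (R-1)(C-1) degrees of freedom. The function name `pearson_chi_squared` is an assumption, and the data are the hypothetical cellular telephone counts.

```python
def pearson_chi_squared(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = pearson_chi_squared([[60, 80], [140, 120]])
print(round(statistic, 3), df)
```

For this table the statistic is roughly 4.4 on 1 degree of freedom, which is above the 3.84 guide and below the 6.63 guide.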
Special Case of 2 x 2
Contingency Table
Status of       Column    Column    Total
Row Variable    On        Off
On              A         B         A+B
Off             C         D         C+D
Total           A+C       B+D       N


Chi-squared test for a 2x2 table
• 1 degree of freedom [(R-1)(C-1)=1]
• Value of chi-squared test is given by
N(AD-BC)^2 / [(A+B)(C+D)(A+C)(B+D)]
• There is a correction for continuity
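The 2 x 2 shortcut can be sketched as below. The slides do not state the continuity correction; the corrected version here assumes the usual Yates form, which replaces |AD-BC| with |AD-BC| - N/2 (floored at zero). Function names are made up for this example.

```python
def chi_squared_2x2(a, b, c, d):
    """Shortcut formula N(AD-BC)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_squared_2x2_corrected(a, b, c, d):
    """Assumed Yates continuity correction: shrink |AD-BC| by N/2 before squaring."""
    n = a + b + c + d
    shrunk = max(0.0, abs(a * d - b * c) - n / 2.0)
    return n * shrunk ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical cellular telephone table: A=60, B=80, C=140, D=120.
plain = chi_squared_2x2(60, 80, 140, 120)
corrected = chi_squared_2x2_corrected(60, 80, 140, 120)
print(round(plain, 3), round(corrected, 3))
```

The uncorrected value agrees with summing the four components C(i,j), and the corrected value is always somewhat smaller.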
Computer Output for Chi-
Squared Tests
• Output gives value of test.
• Asymptotic significance level (p-value)
• Four types of test
– Pearson chi-squared
– Pearson chi-squared with continuity correction
– Likelihood ratio test (theoretically strong test)
– Fisher’s exact test (most accepted, if given).
Example Problem Set
• The independent variable is whether or not the
subject reported using marijuana at time 3 in a
study (time 3 is roughly in later high school).
The dependent variable is whether or not the
subject reported using marijuana at time 4 in a
study (time 4 is roughly in middle college or
beginning independent living). The
contingency table is on the next slide.
Marijuana Use at Time 4 by
Marijuana Use at Time 3
Use at time 4       No use at time 3    Used at time 3    Total
No use at time 4    120                 9                 129
Used at time 4      95                  142               237
Total               215                 151               366
Example Question 1
• Which of the following conclusions is
correct about the test of the null hypothesis
that the distribution of whether or not a
subject uses marijuana at time 3 is
independent of whether the subject uses
marijuana at time 4?

• Usual options.
Solution to question 1
• Find the significance level in the chi-square
test output. Pearson chi-square (without and
with continuity correction), likelihood ratio,
and Fisher’s exact all had reported significance
levels of 0.000, that is, p-values that round to
zero at three decimal places.
• Option A (reject at the 0.001 level of
significance) is the correct choice.
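As a cross-check on this solution (not a reproduction of the SPSS output), the sketch below applies the uncorrected 2 x 2 shortcut formula to the marijuana table and converts the result to a p-value for 1 degree of freedom. The function names are assumptions for this example.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Shortcut formula N(AD-BC)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(x):
    """Upper-tail probability for chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Rows: no use / used at time 4. Columns: no use / used at time 3.
chi2 = chi_squared_2x2(120, 9, 95, 142)
p = p_value_1df(chi2)

print(round(chi2, 2))
print(p < 0.001)  # True
```

The statistic is far above the 10.83 guide for the 0.001 level, which is consistent with rejecting independence at that level.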
Example Question 2
• How many degrees of freedom does the
contingency table describing this output
have?
• Solution: (R-1)(C-1)=(2-1)(2-1)=1.
Example Question 3
• Specify how the expected count of 97.8 for
subjects who did use marijuana at time 3 and
time 4 was calculated.
• Solution:
• Total number using at time 3 was 151.
• Total number using at time 4 was 237.
• Total N was 366.
• Expected Count = 151*237/366 = 97.8.
Example Question 4
• Compute the contribution to Pearson’s chi-
square statistic from the cell for subjects who
used marijuana at time 3 and at time 4.
• Solution:
• Observed count was 142
• Expected count was 97.8
• Component = (142-97.8)^2 / 97.8 = 19.98
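The arithmetic in Questions 3 and 4 can be reproduced as a short sketch. Variable names are chosen for this example; the counts come from the marijuana table above.

```python
observed = 142        # used at time 3 and used at time 4
col_total = 151       # total number using at time 3
row_total = 237       # total number using at time 4
n = 366               # total N

# Question 3: expected count under independence.
expected = col_total * row_total / n
expected_rounded = round(expected, 1)

# Question 4: chi-squared component, with the rounded and the unrounded expected count.
component_rounded = (observed - expected_rounded) ** 2 / expected_rounded
component_exact = (observed - expected) ** 2 / expected

print(expected_rounded)
print(round(component_rounded, 2), round(component_exact, 2))
```

With the expected count rounded to 97.8 the component is about 19.98; carrying the unrounded expected count gives about 20.00, so the rounding only affects the second decimal place.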
Example Question 5
• Describe the pattern of association between
these two variables.
• Solution. There was a strong dependence
between the two variables. About 44
percent of nonusers at time 3 used at time 4
(95 of 215), compared to 94 percent of users
at time 3 (142 of 151). That is, marijuana usage
increases very consistently over time: few users
at time 3 stopped, and many nonusers started.
