Chi square tests
Prajkta Bhide, PhD
Chi square (χ2) distribution
• Used for analysis of count or frequency data
• E.g. For a sample of hospitalized patients, we have data on how many are
male/female
• E.g. Socioeconomic status of patients admitted in psychiatry vs oncology units in a
hospital - investigate whether the distribution of the patients by social class
differed in these two units
• E.g. Investigate whether there is a relationship between area of residence and
diagnosis in the population from which the sample was drawn
• Most appropriate for use with categorical variables
Observed vs expected frequencies
• The quantitative data used in the computation of the test statistic are the frequencies
associated with each category of one or more variables under study.
• The observed frequencies are the number of subjects or objects in our sample that fall into
the various categories of the variable of interest.
• E.g. In a sample of 100 patients, we observe that 50 are married, 30 are single, 15 are
widowed and 5 are divorced.
• Expected frequencies are the number of subjects or objects in our sample that we would
expect to observe if some null hypothesis about the variable is true.
• E.g. Suppose our null hypothesis was that the four categories of marital status are equally
represented in the population from which we drew our sample. In that case we would expect
our sample to contain 25 married, 25 single, 25 widowed and 25 divorced patients.
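The marital-status example can be worked through numerically. The snippet below is a minimal sketch in plain Python (the variable names are illustrative, not from the slides): it builds the expected counts under the equal-representation null hypothesis and sums (O − E)²/E over the four categories.

```python
# Observed counts from the marital-status example (n = 100 patients).
observed = {"married": 50, "single": 30, "widowed": 15, "divorced": 5}

n = sum(observed.values())
# Null hypothesis: the four categories are equally represented.
expected = {category: n / len(observed) for category in observed}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)
print(expected["married"], chi_square)  # 25.0 46.0
```

A value as large as 46 signals poor agreement between the observed counts and the counts expected under this null hypothesis.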
Chi square (χ2) statistic
• The chi-square (χ²) statistic is a single number that measures how far your observed
counts are from the counts you would expect if the null hypothesis (e.g. no relationship
at all in the population) were true.
• It compares the expected values with the values you actually collected.
• The chi-square statistic takes values between 0 and infinity. It cannot take on
negative values.
• The chi-square statistic must be computed from counts (frequencies). It cannot be
computed from percentages, proportions, means or similar summary values.
• When the null hypothesis is true, χ2 is distributed approximately as chi square with
k - r degrees of freedom.
• k = number of groups for which observed and expected frequencies are available.
• r = number of restrictions or constraints imposed on the given comparison.
• The quantity χ2 is a measure of the extent to which in a given situation pairs of
observed and expected frequencies agree.
• When there is close agreement between observed and expected frequencies the
value of χ2 is small and when the agreement is poor the value of χ2 is large.
Decision rule
• The quantity $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$ will be small if the observed and
expected frequencies are close together and it will be large if the differences are large.
• The computed value of χ2 is compared with the tabulated value of χ2 with k - r
degrees of freedom.
• The decision rule is: reject H0 if computed χ2 is greater than or equal to the
tabulated χ2 for the chosen value of α.
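The decision rule amounts to a table lookup followed by a comparison. The sketch below assumes α = 0.05 and hard-codes a few tabulated critical values: 5.991, 12.592 and 15.507 are the values quoted in the worked examples of these slides, while 3.841 and 7.815 are the standard table entries for 1 and 3 degrees of freedom. The function name `reject_h0` is hypothetical.

```python
# Tabulated chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 6: 12.592, 8: 15.507}

def reject_h0(chi_square, df, table=CRITICAL_05):
    """Return True when the computed statistic is >= the tabulated value."""
    return chi_square >= table[df]

print(reject_h0(20.60081, df=2))  # True
print(reject_h0(4.444, df=6))     # False
```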
Type of chi-square tests
• A chi-square goodness of fit test
• A chi-square test for independence
• A chi-square test for homogeneity
Chi-square goodness of fit test
• A chi-square goodness of fit test determines whether sample data are consistent with a
hypothesized population distribution (it fits one categorical variable to a distribution).
• It can be used as a test for normality (normal distribution),
• or for a binomial distribution,
• or for a Poisson distribution.
Testing for normality
Suppose you have data for 250 hospitals on their inpatient occupancy ratio. We wish to
know if the data provide sufficient evidence that the sample did not come from a
normally distributed population.
1. Data
As given in the table.
Inpatient occupancy ratio (%)    Number of hospitals
0.0 – 39.9                       16
40.0 – 49.9                      18
50.0 – 59.9                      22
60.0 – 69.9                      51
70.0 – 79.9                      62
80.0 – 89.9                      55
90.0 – 99.9                      22
100.0 – 109.9                    4
2. Assumption
We assume that the sample is a simple random sample.
3. Hypotheses
• H0: In the population from which the sample was drawn, inpatient occupancy
ratios are normally distributed.
• HA: The sampled population is not normally distributed.
4. Test statistic
The test statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$.
5. Distribution of test statistic
If H0 is true, χ2 is distributed approximately as chi square with k - r degrees of
freedom.
6. Decision rule
Reject H0 if computed value of χ2 is equal to or greater than the critical value of χ2.
7. Calculation of test statistic
Since we have only the sample, we use it to estimate the mean and standard
deviation: mean = 69.91 and standard deviation = 19.02.
The expected relative frequency of a class interval is the probability that a value falls
in that interval. For a normal distribution we obtain it by converting the class limits
to standard normal (z) values.
Class interval    z = (xi – x̅)/s at lower    Expected relative    Expected frequency
                  limit of interval           frequency
< 40                                          0.0582               14.55 (i.e. 250 x 0.0582)
40.0 – 49.9       -1.57                       0.0887               22.18
50.0 – 59.9       -1.05                       0.1546               38.65
60.0 – 69.9       -0.52                       0.1985               49.62
70.0 – 79.9        0.00                       0.2019               50.48
80.0 – 89.9        0.53                       0.1535               38.38
90.0 – 99.9        1.06                       0.0875               21.88
100.0 – 109.9      1.58                       0.0397                9.92
≥ 110              2.11                       0.0174                4.35
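The expected frequencies in this table can be reproduced, at least approximately, with the standard normal CDF. The sketch below uses `math.erf` from the Python standard library and rounds each z value to two decimals, on the assumption that the slide values were read from a printed z table. The helper name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, n = 69.91, 19.02, 250
lower_limits = [40, 50, 60, 70, 80, 90, 100, 110]

# z at each class boundary, rounded to 2 decimals as when reading a z table.
z_values = [round((x - mean) / sd, 2) for x in lower_limits]
cumulative = [0.0] + [normal_cdf(z) for z in z_values] + [1.0]

# Probability of each class = difference of successive cumulative probabilities.
expected_rel = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
expected_freq = [n * p for p in expected_rel]

for p, e in zip(expected_rel, expected_freq):
    print(f"{p:.4f}  {e:.2f}")
```

Small differences in the last decimal place relative to the slide are expected, because printed tables round the cumulative probabilities before subtracting.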
Class interval    Observed frequency (Oi)    Expected frequency (Ei)    (Oi – Ei)²/Ei
< 40              16                         14.55                      0.145
40.0 – 49.9       18                         22.18                      0.788
50.0 – 59.9       22                         38.65                      7.173
60.0 – 69.9       51                         49.62                      0.038
70.0 – 79.9       62                         50.48                      2.629
80.0 – 89.9       55                         38.38                      7.197
90.0 – 99.9       22                         21.88                      0.001
100.0 – 109.9     4                          9.92                       3.533
≥ 110             0                          4.35                       4.350
Total             250                        250                        25.854
Computed χ² = 25.854
k = number of groups/class intervals = 9
r = number of constraints = 3 (for making ΣEi = ΣOi, and estimating µ and σ)
Therefore degrees of freedom = k – r = 9 – 3 = 6
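As a check on the arithmetic, the statistic and degrees of freedom can be recomputed from the observed and expected columns of the table. This is a minimal sketch using the tabulated (rounded) expected frequencies; the variable names are illustrative.

```python
observed = [16, 18, 22, 51, 62, 55, 22, 4, 0]
expected = [14.55, 22.18, 38.65, 49.62, 50.48, 38.38, 21.88, 9.92, 4.35]

# One (O - E)^2 / E term per class interval.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

k = len(observed)   # 9 class intervals
r = 3               # constraints: matching totals, estimated mean, estimated sd
df = k - r

print(round(chi_square, 2), df)  # 25.85 6
```

The unrounded sum differs from the slide's 25.854 only in the third decimal, because the slide adds terms that were each rounded to three decimals.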
8. Statistical decision
We compare χ² = 25.854 with the tabulated values of χ² for 6 degrees of freedom.
The computed value exceeds χ²0.995 = 18.548.
So we reject the null hypothesis that the sample came from a normally distributed
population at the 0.005 level of significance.
9. Conclusion
We conclude that, in the sampled population, inpatient occupancy ratios are not
normally distributed.
10. p value
Since 25.854 > 18.548, p value < 0.005
The probability of obtaining a value of χ2 as large as 25.854, when the null
hypothesis is true, is less than 5 in 1000.
We say that such a rare event did not occur due to chance alone (when H0 is true)
so we look for another explanation.
The other explanation is that the null hypothesis is false.
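The table lookup above only bounds the p value. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a simple closed form, which the sketch below uses to compute the p value directly with the standard library alone. The function name `chi2_sf_even_df` is hypothetical, and the formula does not apply to odd degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square >= x) for an even number of df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which holds only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )

p_value = chi2_sf_even_df(25.854, df=6)
print(p_value < 0.005)  # True
```

Evaluating the same function at the tabulated value 18.548 with 6 df returns approximately 0.005, which is consistent with the table entry used in the statistical decision.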
• Sometimes the parameters are specified in the null hypothesis.
• Had the mean and variance of the population been specified as part of the null
hypothesis, we would not have had to estimate them from the sample, and our degrees of
freedom would have been 9 – 1 = 8.
• Chi-square can be used to test for normality; however, it is not the most
appropriate test to use when the hypothesized distribution is continuous.
• The Kolmogorov-Smirnov test is designed for goodness of fit tests involving
continuous distributions.
When you have small expected frequencies
• The expected frequencies for one or more categories may be small (< 1).
• In that case adjacent categories may be combined to achieve the suggested minimum
expected frequency (at least 1).
• Combining reduces the number of categories and therefore the degrees of
freedom.
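One way to apply this rule mechanically is to pool adjacent categories from left to right until every pooled expected frequency reaches the minimum. The sketch below is an assumption about how the pooling might be automated, not a procedure given in the slides; the function name `combine_small` is hypothetical, and the observed and expected counts are taken from the pain-reliever (binomial) example in this deck.

```python
def combine_small(observed, expected, minimum=1.0):
    """Pool adjacent categories, left to right, until each expected count >= minimum."""
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        run_obs, run_exp, pending = run_obs + o, run_exp + e, True
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp, pending = 0, 0.0, False
    if pending:
        if pooled_exp:
            # A too-small tail is folded into the last complete group.
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp

observed = [5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0]
expected = [0.38, 2.36, 7.08, 13.58, 18.67, 19.60, 16.33, 11.09, 6.23, 2.95, 1.73]
obs_c, exp_c = combine_small(observed, expected)
print(len(exp_c), obs_c[0], round(exp_c[0], 2))  # 10 11 2.74
```

Here the first two categories are pooled, leaving 10 groups. Whether a given pooling is sensible still has to be judged from the subject matter.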
Suppose you have data on the acceptance of a new pain reliever. 100 physicians each
selected 25 patients to participate in the study. Each patient, after trying the new
pain reliever for a specified amount of time, was asked whether it was preferable to the
pain reliever used regularly in the past.
We are interested in determining whether or not these data are compatible with the null
hypothesis that they were drawn from a population that follows a binomial distribution.
Our computed χ² = 47.624
Degrees of freedom = 10 (number of groups) – 2 (constraints) = 8
Constraints: we force the total of the expected frequencies to equal the total of the
observed frequencies, and we needed to estimate p from the sample data.
From the table, we find that 47.624 > 21.955 (= χ²0.995 for 8 df), so p < 0.005.
So we reject the null hypothesis that the sample came from a binomially distributed
population at the 0.005 level of significance.
Number of patients out of    Number of doctors    Total number of patients
25 preferring new pain       reporting this       preferring new pain reliever,
reliever                     number               by doctor
0                            5                    0
1                            6                    6
2                            8                    16
3                            10                   30
4                            10                   40
5                            15                   75
6                            17                   102
7                            10                   70
8                            10                   80
9                            9                    81
10 or more                   0                    0
Total                        100                  500
Since p is not known, we estimate it based on our sample i.e. 500/2500 = 0.2 (this
becomes one constraint)
Number of patients      Number of doctors      Expected relative    Expected freq (Ei)
out of 25 preferring    reporting this number  freq
new pain reliever       (observed freq, Oi)
0                       5                      0.0038               0.38  (0 and 1 combined: Ei = 2.74)
1                       6                      0.0236               2.36
2                       8                      0.0708               7.08
3                       10                     0.1358               13.58
4                       10                     0.1867               18.67
5                       15                     0.1960               19.60
6                       17                     0.1633               16.33
7                       10                     0.1109               11.09
8                       10                     0.0623               6.23
9                       9                      0.0295               2.95
10 or more              0                      0.0173               1.73
Total                   100                    1.0000               100
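The whole binomial example can be reproduced in a few lines. The sketch below estimates p from the sample, computes binomial probabilities with `math.comb`, pools categories 0 and 1 as in the table, and sums the chi-square terms. It uses unrounded probabilities, so the result is close to, but not exactly, the 47.624 obtained from the rounded table values. The helper name `binom_pmf` is illustrative.

```python
import math

n_trials, n_doctors = 25, 100
# Observed number of doctors for 0, 1, ..., 9 and "10 or more" preferring patients.
doctors = [5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0]

# Estimate p from the sample: total "prefer new" answers / total patients.
total_prefer = sum(k * count for k, count in enumerate(doctors))
p_hat = total_prefer / (n_trials * n_doctors)

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

probs = [binom_pmf(k, n_trials, p_hat) for k in range(10)]
probs.append(1.0 - sum(probs))          # "10 or more"
expected = [n_doctors * q for q in probs]

# Pool categories 0 and 1, because the expected count for 0 is below 1.
obs_c = [doctors[0] + doctors[1]] + doctors[2:]
exp_c = [expected[0] + expected[1]] + expected[2:]

chi_square = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 2   # constraints: matching totals and estimating p
print(round(p_hat, 2), df)  # 0.2 8
```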
A hospital administrator wishes to test the null hypothesis that emergency admissions
follow a Poisson distribution with λ = 3. Suppose he/she collects data over a period of
90 days.
Number of emergency      Number of days this number of
admissions in a day      emergency admissions occurred
0                        5
1                        14
2                        15
3                        23
4                        16
5                        9
6                        3
7                        3
8                        1
9                        1
10 or more               0
Total                    90
Our computed χ² = 3.664
Degrees of freedom = 9 (number of groups) – 1 (constraints) = 8
Suppose α = 0.05, then table value = 15.507
We cannot reject the null hypothesis at α = 0.05.
We conclude that the emergency admissions at this hospital may follow a Poisson
distribution with λ = 3.
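The Poisson example can be sketched the same way. The slide reports 9 groups but does not show which categories were pooled; the code below assumes that 8, 9 and "10 or more" were combined into a single "8 or more" group, which yields 9 groups. It uses unrounded Poisson probabilities, so the statistic comes out near, but not exactly at, the 3.664 reported on the slide. The helper name `poisson_pmf` is illustrative.

```python
import math

lam, n_days = 3.0, 90
# Observed number of days with 0, 1, ..., 9 and "10 or more" emergency admissions.
days = [5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0]

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumption: categories 0..7 are kept, and 8, 9, "10 or more" are pooled.
probs = [poisson_pmf(k, lam) for k in range(8)]
probs.append(1.0 - sum(probs))           # "8 or more"
expected = [n_days * q for q in probs]
observed = days[:8] + [sum(days[8:])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # only constraint: totals match (lambda was specified in H0)
print(df, chi_square < 15.507)  # 8 True
```

Because λ is specified in the null hypothesis rather than estimated from the data, only one constraint is subtracted.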
Chi-square test for independence
• A chi-square test for independence compares two categorical variables in a contingency
table to see if they are related, i.e. whether the distribution of one variable differs
across the levels of the other.
• Here the chi-square distribution is used to test the null hypothesis that two
criteria of classification, when applied to the same set of entities, are independent.
• We say the two criteria of classification are independent if the distribution of one
criterion is the same no matter what the level of the other criterion.
• E.g. if socioeconomic status and area of residence of the inhabitants of a certain
city are independent, we would expect to find the same proportion of families in
the low, middle and high socio-economic groups in all areas of the city.
Use of contingency tables
• Contingency table is constructed using ‘r’ rows representing the various levels of
one criterion of classification and ‘c’ columns representing various levels of the
second criterion.
Second criterion of      First criterion of classification level
classification level     1      2      3      …      c      Total
1                        n11    n12    n13    …      n1c    n1.
2                        n21    n22    n23    …      n2c    n2.
3                        n31    n32    n33    …      n3c    n3.
…
r                        nr1    nr2    nr3    …      nrc    nr.
Total                    n.1    n.2    n.3    …      n.c    n
• We are interested in testing the null hypothesis that in the population the two
criteria of classification are independent – if the null hypothesis is rejected we will
conclude that the two criteria of classification are not independent.
• We use the property that if two events are independent, the probability of their
joint occurrence is equal to the product of their individual probabilities.
• In general, to obtain the expected frequency of a given cell, we multiply the total of
the row in which the cell is located by the total of the column in which the cell is
located and divide the product by the grand total.
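The rule follows directly from the independence property. As a short derivation, write $n_{i\cdot}$ for the total of row $i$, $n_{\cdot j}$ for the total of column $j$, and $n$ for the grand total (the dot-subscript notation and the symbol $E_{ij}$ are introduced here for illustration). Under independence the estimated joint probability of a cell is the product of the estimated marginal probabilities, and the expected frequency is $n$ times that probability:

```latex
\[
\hat{P}(\text{row } i \text{ and column } j)
  = \frac{n_{i\cdot}}{n}\cdot\frac{n_{\cdot j}}{n},
\qquad
E_{ij} = n \cdot \frac{n_{i\cdot}}{n}\cdot\frac{n_{\cdot j}}{n}
       = \frac{n_{i\cdot}\, n_{\cdot j}}{n}.
\]
```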
Decision rule
• The expected frequencies and observed frequencies are compared.
• If the discrepancy is sufficiently small the null hypothesis is tenable.
• If the discrepancy is sufficiently large the null hypothesis is rejected and we
conclude that the two criteria of classification are not independent.
• Chi-square defined in this manner is distributed approximately as chi square with
(r - 1)(c – 1) degrees of freedom when the null hypothesis is true.
• If the computed value of χ2 is equal to or larger than the tabulated value of χ2 for
some α the null hypothesis is rejected at that α level of significance.
The researchers want to investigate whether women
infected with HIV who are also infected with HPV present
with more cytological abnormalities than women with
only one or neither virus. Can they conclude that there is
a relationship between HPV status and stage of HIV
infection?
HPV         HIV seropositive,    HIV seropositive,    HIV seronegative    Total
            symptomatic          asymptomatic
Positive    23                   4                    10                  37
Negative    10                   14                   35                  59
Total       33                   18                   45                  96
1. Data
As given in table
2. Assumption
We assume that the sample available for analysis is simple random sample drawn
from the population of interest.
3. Hypotheses
• H0: HPV status and stage of HIV infection are independent (i.e. no relationship).
• HA: The two variables are not independent.
4. Test statistic
The test statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, summed over all cells of the table.
5. Distribution of test statistic
If H0 is true, χ2 is distributed approximately as chi square with (r-1)(c-1) degrees of
freedom i.e. (2-1)(3-1) = 2 df.
6. Decision rule
Let α = 0.05.
Reject H0 if the computed value of χ² is equal to or greater than the critical value of
χ² = 5.991 (from table).
7. Calculation of test statistic
We calculate the expected frequencies (in brackets)
HPV         HIV seropositive,    HIV seropositive,    HIV seronegative    Total
            symptomatic          asymptomatic
Positive    23 (12.72)           4 (6.94)             10 (17.34)          37
Negative    10 (20.28)           14 (11.06)           35 (27.66)          59
Total       33                   18                   45                  96
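Expected frequency for row i, column j: Eij = column total × row total / grand total
E11 = 33*37/96 = 12.72
E21 = 33*59/96 = 20.28
E12 = 18*37/96 = 6.94
E22 = 18*59/96 = 11.06
E13 = 45*37/96 = 17.34
E23 = 45*59/96 = 27.66
$\chi^2 = \frac{(23-12.72)^2}{12.72} + \frac{(4-6.94)^2}{6.94} + \dots + \frac{(35-27.66)^2}{27.66} = 20.60081$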
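The expected frequencies and the test statistic for this table can be recomputed with a short, general-purpose sketch. The function names `expected_counts` and `chi_square_stat` are illustrative. Because the code keeps the expected frequencies unrounded, the statistic agrees with the slide value to about two decimals rather than exactly.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_stat(table):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: HPV positive, HPV negative. Columns: the three HIV categories.
hpv_by_hiv = [[23, 4, 10], [10, 14, 35]]
expected = expected_counts(hpv_by_hiv)
chi_square = chi_square_stat(hpv_by_hiv)
df = (len(hpv_by_hiv) - 1) * (len(hpv_by_hiv[0]) - 1)
print(round(expected[0][0], 2), round(chi_square, 1), df)  # 12.72 20.6 2
```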
8. Statistical decision
We compare χ2 = 20.60081 with the values of χ2 in table (5.991).
So we reject the null hypothesis since 20.60081 > 5.991.
9. Conclusion
We conclude that there is a relationship between HPV status and HIV infection
stage.
10. p value
Since 20.60081 > 10.597, p value < 0.005
Small expected frequencies for tests of independence
• For contingency tables with more than one degree of freedom (i.e. larger than a 2 x 2
contingency table), a minimum expectation of 1 is allowable if no more than 20%
of the cells have expected frequencies of less than 5.
• To meet this rule, adjacent rows or adjacent columns may be combined if it is logical
to do so.
• When χ² is based on fewer than 30 degrees of freedom, expected frequencies as
small as 2 can be tolerated.
2 x 2 contingency table
Second criterion of      First criterion of classification level
classification level     1        2        Total
1                        a        b        a+b
2                        c        d        c+d
Total                    a+c      b+d      n
• $\chi^2 = \dfrac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$
where a, b, c and d are the observed frequencies and
degrees of freedom = (r-1)(c-1) = 1 df
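The shortcut formula gives the same value as the general sum of (O − E)²/E over the four cells. The sketch below checks this on made-up counts (a = 10, b = 20, c = 30, d = 40 are hypothetical and not from the slides); the function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut formula for a 2 x 2 table of observed counts."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def chi_square_general(a, b, c, d):
    """General formula: sum of (O - E)^2 / E with E = row total * column total / n."""
    table = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (table[i][j] - e) ** 2 / e
    return total

# Hypothetical counts, used only to compare the two formulas.
shortcut = chi_square_2x2(10, 20, 30, 40)
general = chi_square_general(10, 20, 30, 40)
print(abs(shortcut - general) < 1e-9)  # True
```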
Small expected frequencies for 2x2 contingency table
• For 2x2 contingency tables both small expected frequencies and small total sample
sizes may pose a problem.
• Do not use the χ² test if n < 20.
• Do not use the χ² test if 20 < n < 40 and any expected frequency is less than 5.
• When n ≥ 40, an expected cell frequency as small as 1 can be tolerated.
Characteristics of tests of independence
1. A single sample is selected from a population of interest and the subjects or
objects are cross classified on the basis of the two variables of interest.
2. The rationale for calculating expected cell frequencies is based on the probability
law which states that if two events (here, the two criteria of classification) are
independent, the probability of their joint occurrence is equal to the product of
their individual probabilities.
3. The hypotheses and conclusions are stated in terms of the independence (or lack
of independence) of the two variables.
Chi-square test for homogeneity
• A chi-square test for homogeneity is needed when we do not have a single sample drawn
from a single population, i.e. when the investigator specifies that independent samples
are drawn from each of several populations.
• The homogeneity test is concerned with the question: are the samples drawn
from populations that are homogenous with respect to some criterion of
classification?
• So the null hypothesis states that, with respect to this criterion, the samples are
drawn from the same population.
In order to study the relationship between age and
several prognostic factors in squamous cell carcinoma,
researchers collected data on frequencies of histological
types in four age groups.
We wish to know whether we may conclude that the populations represented by the four
age-group samples are not homogenous with respect to cell type.
                                    Cell type
Age group    Number of    Large cell           Keratinizing    Small cell
(years)      patients     nonkeratinizing      cell type       nonkeratinizing
                          cell type                            cell type
30-39        34           18                   7               9
40-49        97           56                   29              12
50-59        144          83                   38              23
60-69        105          62                   25              18
Total        380          219                  99              62
1. Data
As given in table
2. Assumption
We assume that the samples available for analysis are simple random samples
drawn from each one of the four populations of interest.
3. Hypotheses
• H0: The four populations are homogenous with respect to cell type.
• HA: The four populations are not homogenous with respect to cell type.
4. Test statistic
The test statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, summed over all cells of the table.
5. Distribution of test statistic
If H0 is true, χ2 is distributed approximately as chi square with (r-1)(c-1) degrees of
freedom i.e. (4-1)(3-1) = 6 df.
6. Decision rule
Let α = 0.05.
Reject H0 if the computed value of χ² is equal to or greater than the critical value of
χ² = 12.592 (from table).
7. Calculation of test statistic
We calculate the expected frequencies and then calculate the Χ2 statistic
χ2 = 4.444
8. Statistical decision
We compare χ2 = 4.444 with the values of χ2 in table (12.592).
Since 4.444 < 12.592, we are unable to reject H0.
9. Conclusion
We conclude that the four populations may be homogenous with respect to cell
type.
10. p value
Since 4.444 < 10.645, p value > 0.1
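The slide gives only the final value of the statistic, so the sketch below fills in the calculation from the cell-type table: expected counts come from pooling the four samples (row total × column total / grand total), and the p value uses the closed-form upper tail of the chi-square distribution, which is valid here only because the degrees of freedom (6) are even. Variable names are illustrative.

```python
import math

cell_type_by_age = [
    [18, 7, 9],     # 30-39
    [56, 29, 12],   # 40-49
    [83, 38, 23],   # 50-59
    [62, 25, 18],   # 60-69
]

row_totals = [sum(row) for row in cell_type_by_age]
col_totals = [sum(col) for col in zip(*cell_type_by_age)]
n = sum(row_totals)

# Chi-square statistic with expected counts from the pooled data.
chi_square = 0.0
for i, row in enumerate(cell_type_by_age):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
half = chi_square / 2.0
p_value = math.exp(-half) * sum(
    half ** j / math.factorial(j) for j in range(df // 2)
)
print(round(chi_square, 2), df, p_value > 0.1)  # 4.44 6 True
```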
Characteristics of tests of homogeneity
1. Two or more populations are identified in advance and an independent sample is
drawn from each.
2. Sample subjects or objects are placed in appropriate categories of the variable of
interest.
3. The calculation of expected cell frequencies is based on the rationale that if the
populations are homogenous as stated in the null hypothesis the best estimate
of the probability that a subject/object will fall into a particular category of the
variable of interest can be obtained by pooling the sample data.
4. The hypotheses and conclusions are stated in terms of homogeneity (with
respect to the variable of interest) of populations.
If in a chi-square test of independence, your computed value of
the test statistic is 5.991 and the critical value from the table is
20.6008, then your decision should be to ...
A) not reject the null hypothesis
B) reject the null hypothesis
C) label the test as inconclusive
D) label the test as wrong
Suppose you have data in the form of frequency distribution of
children 2-9 years of age categorized by sex (X) and spleen
enlargement (Y). After hypothesis testing using the appropriate test,
you get a computed test statistic of 2.353, while the critical value of
test statistic is 3.84 at 5% level of significance. What can you
conclude from the data?
A) Sex and spleen enlargement are independent of each other.
B) Sex and spleen enlargement are not independent of each other.
C) Sex is dependent on spleen enlargement.
D) Spleen enlargement is dependent on sex.
Thank you!