Chapter 4
CHI-SQUARE TESTS FOR INDEPENDENCE & GOODNESS-OF-FIT
CONTENTS
I. Introduction
II. Test of Independence
III. Goodness-of-Fit Tests
Uniform Distribution
Binomial Distribution
Normal Distribution
Poisson Distribution
INTRODUCTION
Tests of independence examine the relationship between row and column categories, i.e., whether the two classifications are independent of or dependent on each other.
E.g., customer preference rating vs. geographical location of customers.
Goodness-of-fit tests are applied to check assumptions made about a population's distribution (e.g., whether it is normal, binomial, Poisson, etc.).
THE CHI-SQUARE DISTRIBUTION
The chi-square distribution is denoted by $\chi^2$ ($\chi$ is the Greek letter chi).
The mathematical expression for the chi-square distribution contains only one parameter: the degrees of freedom $v$.
The variable $\chi^2$ cannot be negative; thus, the $\chi^2$ curve does not extend to the left of zero.
In this chapter, the significance level $\alpha$ will be the right-tail area of the $\chi^2$ distribution.
The symbol $\chi^2_{\alpha,v}$ denotes the value of $\chi^2$ such that the distribution with $v$ degrees of freedom has a right-tail area of $\alpha$. For a test of independence on a table with $r$ rows and $c$ columns, $v = (r-1)(c-1)$.
[Figure: $\chi^2$ curve divided into an "Accept H0" region and a "Reject H0" region (right-tailed area at $\alpha = 0.05$)]
Example: if $\alpha = 0.05$ and $v = 6$, then $\chi^2_{\alpha,v} = \chi^2_{0.05,6} = 12.592$ (from the $\chi^2$ table).
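The table look-up can also be reproduced numerically. The following is a minimal pure-Python sketch, not a method from this chapter: it computes the right-tail area with the series expansion of the regularized incomplete gamma function and then finds $\chi^2_{\alpha,v}$ by bisection. The function names chi2_sf and chi2_critical are hypothetical; in practice a library routine such as SciPy's scipy.stats.chi2.isf would normally be used instead.

```python
import math


def chi2_sf(x: float, v: int) -> float:
    """Right-tail area P(X > x) of a chi-square distribution with v degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = v/2 and z = x/2; the right-tail area is 1 - P(a, z).
    Adequate for the moderate x values found in chi-square tables.
    """
    if x <= 0:
        return 1.0
    a, z = v / 2.0, x / 2.0
    # First series term: z^a * e^(-z) / Gamma(a + 1), computed in log space.
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    # Each further term multiplies the previous one by z / (a + k).
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)


def chi2_critical(alpha: float, v: int) -> float:
    """Value chi2_{alpha,v} whose right-tail area equals alpha (found by bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, v) > alpha:  # widen the bracket until the tail area drops below alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, v) > alpha:
            lo = mid  # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 6), 3))  # 12.592, matching the table value
```

For $v = 4$ the same function gives about 9.488, which the chapter rounds to 9.49.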
TEST OF INDEPENDENCE BETWEEN TWO VARIABLES
A $\chi^2$ test of independence is used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent. That is, the chi-square test uses sample data to test for the independence of two variables. The sample data are arranged in a two-way table called a contingency table. Examples:
Whether employee absenteeism is
independent of job classification
Whether beer preference is independent
of sex (gender)
Whether favorite sport is independent of
The steps and procedures are the same as in general hypothesis testing.
Example: A company planning a TV advertising campaign wants to determine which TV shows its target audience watches, and thereby to know whether the choice of TV program an individual watches is independent of the individual's income. The supporting data are shown below. Use a 5% level of significance to test the null hypothesis of independence.
                      Types of show
Income      Basketball   Movie   News   Total
Low            143         70     37     250
Medium          90         67     43     200
High            17         13     20      50
Total          250        150    100     500
Solution
1. H0: The choice of TV program an individual watches is independent of the individual's income.
   Ha: The choice of TV program an individual watches is not independent of the individual's income.
2. Decision rule: Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$
   $\alpha = 0.05$, $v = (r-1)(c-1) = (3-1)(3-1) = 4$
   $\chi^2_{\alpha,v} = \chi^2_{0.05,4} = 9.49$
   Reject H0 if sample $\chi^2$ is greater than 9.49.
3. Compute the test statistic. Our first task is to estimate the expected frequency $f_e$ of each cell, assuming independence:
   $f_{e,ij} = \dfrac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$
   Therefore, the expected values $f_{e,ij}$ are:
   $f_{e,11} = 250 \times 250/500 = 125$,  $f_{e,12} = 250 \times 150/500 = 75$,  $f_{e,13} = 250 \times 100/500 = 50$
   $f_{e,21} = 200 \times 250/500 = 100$,  $f_{e,22} = 200 \times 150/500 = 60$,  $f_{e,23} = 200 \times 100/500 = 40$
   $f_{e,31} = 50 \times 250/500 = 25$,  $f_{e,32} = 50 \times 150/500 = 15$,  $f_{e,33} = 50 \times 100/500 = 10$
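The expected-frequency rule can be written as a short function. This is a minimal Python sketch; the name expected_frequencies is hypothetical, and the table is entered as a list of rows (Low, Medium, High income) with columns Basketball, Movie, News.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the TV-program example: rows Low, Medium, High income.
tv_table = [
    [143, 70, 37],
    [90, 67, 43],
    [17, 13, 20],
]

for row in expected_frequencies(tv_table):
    print(row)
# [125.0, 75.0, 50.0]
# [100.0, 60.0, 40.0]
# [25.0, 15.0, 10.0]
```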
Calculate the sample $\chi^2 = \sum (f_0 - f_e)^2 / f_e$:
f0     fe     f0 - fe   (f0 - fe)²   (f0 - fe)²/fe
143    125      18         324          2.592
 70     75      -5          25          0.33333
 37     50     -13         169          3.38
 90    100     -10         100          1
 67     60       7          49          0.81667
 43     40       3           9          0.225
 17     25      -8          64          2.56
 13     15      -2           4          0.26667
 20     10      10         100         10
Sample $\chi^2 = \sum (f_0 - f_e)^2/f_e$ = 21.1737
Step 4: Since 21.1737 > 9.49, reject H0; thus, the choice of TV program is not independent of income level.
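Steps 3 and 4 can be checked with a short Python sketch that combines the expected-frequency rule with the statistic and the degrees of freedom $v = (r-1)(c-1)$. The function name chi_square_independence is hypothetical, and the critical value 9.49 is taken from the $\chi^2$ table as in step 2 rather than computed.

```python
def chi_square_independence(table):
    """Return (sample chi-square, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f0 in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total  # expected count under H0
            statistic += (f0 - fe) ** 2 / fe
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


tv_table = [[143, 70, 37], [90, 67, 43], [17, 13, 20]]
stat, v = chi_square_independence(tv_table)
critical = 9.49  # chi-square table value for alpha = 0.05, v = 4
print(round(stat, 4), v, "reject H0" if stat > critical else "accept H0")
# 21.1737 4 reject H0
```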
TESTING FOR THE EQUALITY OF SEVERAL PROPORTIONS
The test for equal proportions is performed in exactly the same way as the test for independence.
Under H0, the cell proportions Pr in each row are equal, and this equality of proportions gives rise to the expected frequencies $f_e$.
Pr means the proportion of a cell's count to its column total.
The hypothesis of the test for equal proportions can be stated either in terms of equal row proportions or in terms of equal column proportions.
We will use row proportions Pr, that is, the cell proportions to column totals within each row.
Example: The table below contains counts for a random sample of n = 200 workers. It shows, for example, that 12 workers who had not gone to high school were rated as satisfactory by a supervisor. We want to test (at the $\alpha = 0.05$ level) the hypothesis that the population proportions of satisfactory workers in education levels 1, 2, and 3 are equal.
Supervisor performance rating of workers by educational level
                                Educational level
Performance rating   No HS   HS, not completed   Completed HS   Total
Satisfactory           12           63                65          140
Not satisfactory        8           17                35           60
Total                  20           80               100          200
Solution
1. H0: the cell proportions Pr in any row are equal
   Ha: the cell proportions Pr in at least one row are not all equal
2. Decision rule: Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$
   $\alpha = 0.05$, $v = (r-1)(c-1) = (2-1)(3-1) = 2$
   $\chi^2_{\alpha,v} = \chi^2_{0.05,2} = 5.991$
   Reject H0 if sample $\chi^2$ is greater than 5.991
3. Compute the test statistic: sample $\chi^2 = \sum (f_0 - f_e)^2 / f_e$
f0    fe    f0 - fe   (f0 - fe)²   (f0 - fe)²/fe
12    14      -2          4           0.2857
63    56       7         49           0.8750
65    70      -5         25           0.3571
 8     6       2          4           0.6667
17    24      -7         49           2.0417
35    30       5         25           0.8333
Sample $\chi^2 = \sum (f_0 - f_e)^2/f_e$ = 5.0595
Step 4: Accept H0, because 5.0595 is less than 5.991. This implies that the proportion of satisfactorily rated workers is the same for all three educational levels.
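Because the test for equal proportions is computed exactly like the test for independence, the same kind of sketch reproduces this example. The function below is a compact, hypothetical variant of the contingency-table calculation; the critical value 5.991 is copied from the decision rule.

```python
def chi_square_contingency(table):
    """Return (sample chi-square, degrees of freedom) using fe = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = sum(
        (f0 - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for f0, c in zip(row, col_totals)
    )
    return statistic, (len(table) - 1) * (len(col_totals) - 1)


# Rows: satisfactory, not satisfactory; columns: No HS, HS not completed, completed HS.
workers = [[12, 63, 65], [8, 17, 35]]
stat, v = chi_square_contingency(workers)
print(round(stat, 4), v, "reject H0" if stat > 5.991 else "accept H0")
# 5.0595 2 accept H0
```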
GOODNESS–OF-FIT TESTS
Goodness-of-fit tests use sample data as a basis for accepting or rejecting assumptions about a population's distribution. The assumptions are stated as the null hypothesis.
A goodness-of-fit test is performed by first computing the frequencies $f_e$ that would be expected if H0 is true. The differences between the frequencies observed in the sample, $f_0$, and the expected frequencies $f_e$ are then used to calculate
Sample $\chi^2 = \sum (f_0 - f_e)^2 / f_e$
Finally, the sample $\chi^2$ is compared with the appropriate $\chi^2_{\alpha,v}$ to decide whether H0 should be accepted or rejected.
GOODNESS–OF-FIT: Uniform Distribution
A random variable that can take one of N values has a uniform distribution if every value has the same probability 1/N of occurring. Therefore, for a sample of size n,
$f_e = \frac{1}{N} \times n = \frac{n}{N}$
E.g., if a uniformly distributed random variable can take one of N = 5 values and we have a sample of n = 400, then the expected frequency for each value is:
$f_e = n/N = 400/5 = 80$
Example: A winning number in a Massachusetts lottery is a four-digit number. Digits in the winning number are assumed to be drawn at random. It is assumed that the winning digit population has a uniform distribution, i.e., each of the N = 10 integers 0, 1, 2, …, 9 has the same probability (1/N = 1/10) of being selected for each place in a winning number. Meaza plays the lottery regularly. She keeps a running tally of digits in past winning numbers, and plays (bets on) a four-digit number made up of the digits that have occurred most frequently in the past. Meaza's system is based on the supposition that some digits (the ones that have occurred most frequently in the past) have higher probabilities of occurring than others, i.e., the winning digit population does not have a uniform distribution.
We will use a random sample of 400 winning digits to test, at the 5% level, the hypothesis that the population distribution is uniform.
Solution:
Step 1: State the hypotheses
H0: the distribution is uniform
Ha: the distribution is not uniform
Step 2: Decision rule:
Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$
$\alpha = 0.05$, $v = n_e - 1 - g = 10 - 1 - 0 = 9$
where $n_e$ = number of expected frequencies (categories) and $g$ = number of population parameters estimated from the sample data
$\chi^2_{\alpha,v} = \chi^2_{0.05,9} = 16.919$
Reject H0 if sample $\chi^2$ is greater than 16.919
Step 3: Compute the test statistic:
Digit   f0   fe   f0 - fe   (f0 - fe)²   (f0 - fe)²/fe
0       41   40      1          1           1/40
1       54   40     14        196         196/40
2       31   40     -9         81          81/40
3       39   40     -1          1           1/40
4       35   40     -5         25          25/40
5       36   40     -4         16          16/40
6       56   40     16        256         256/40
7       38   40     -2          4           4/40
8       31   40     -9         81          81/40
9       39   40     -1          1           1/40
Sample $\chi^2 = \sum (f_0 - f_e)^2/f_e$ = 16.55
Note: For the $\chi^2$ approximation, each $f_e$ must be at least 5.
Step 4: Accept H0 (16.55 < 16.919); thus, the winning digit population is uniform. So the result doesn't support Meaza's supposition.
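The uniform test reduces to a few lines, since every expected frequency is n/N. In this minimal Python sketch the counts are the 400 winning digits from the table, g = 0 because no parameter is estimated, and 16.919 is the table value from step 2.

```python
# Observed counts of the digits 0-9 among 400 winning digits.
observed = [41, 54, 31, 39, 35, 36, 56, 38, 31, 39]

N = len(observed)     # number of possible values (categories)
n = sum(observed)     # sample size
fe = n / N            # expected frequency for each digit under a uniform H0

stat = sum((f0 - fe) ** 2 / fe for f0 in observed)
v = N - 1 - 0         # ne - 1 - g, with g = 0 parameters estimated
critical = 16.919     # chi-square table value for alpha = 0.05, v = 9

print(fe, round(stat, 2), v, "reject H0" if stat > critical else "accept H0")
# 40.0 16.55 9 accept H0
```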
GOODNESS–OF-FIT: Binomial Distribution
A particular binomial distribution is specified by the values of two parameters, n and p, where
n = sample size or number of trials
p = probability of success in a trial
From the binomial probability table, we can find the probabilities of 0, 1, 2, … successes in n trials. Each of these probabilities, when multiplied by the number of observations in the sample (e.g., the number of boxes inspected), gives the expected frequency for that number of successes in n trials.
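Instead of reading a binomial table, the probabilities can be computed from the binomial formula $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$. The Python sketch below is illustrative only (the function name binomial_expected is hypothetical); it uses the Fizz Co. values from the example that follows, n = 20 trials, p = 0.05 and 100 boxes, and lumps the upper tail into a final "or more" class.

```python
from math import comb


def binomial_expected(n_trials, p, n_observations, last_class):
    """Expected frequencies for 0, 1, ..., last_class - 1 successes plus a 'last_class or more' class."""
    probs = [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in range(last_class)]
    probs.append(1.0 - sum(probs))  # probability of last_class or more successes
    return [n_observations * prob for prob in probs]


# Fizz Co.: boxes of 20 cylinders, p estimated as 0.05, 100 boxes inspected.
expected = binomial_expected(20, 0.05, 100, 6)
print([round(fe, 2) for fe in expected])
```

The unrounded values agree with the four-decimal binomial-table figures used in the example to within about 0.01.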
Example: Fizz Co. makes compressed gas cylinders used for carbonating water and inflating life vests. Cylinders are sold in boxes of 20. Occasionally, a cylinder will be defective (too low in pressure). Customers have been complaining about the defectives. Many have returned all cylinders left in a box when one defective is found, stating that if one is defective, the chances are that others also are defective. Joseph Popp, the quality control manager of Fizz, states that the overall proportion of defective cylinders is at an acceptably low level, and that the number of defective cylinders in a box of 20 has a binomial distribution. Joe also points out that if he is correct, it would be unlikely that any box would contain several defective cylinders. Joe
Thus, the sample proportion defective is $\bar{p} = 100/2000 = 0.05$. Joe also recorded the number of boxes containing 0, 1, 2, … defectives, as shown in the following table.
Number of defectives in box   Number of boxes observed (f0)
0                              39
1                              34
2                              20
3                               4
4                               1
5                               2
6 or more                       0
Total                         100
Required: Perform the test for Joe.
Note: We will use $\bar{p}$ to estimate p.
Solution:
We first need to find the expected numbers of boxes with 0, 1, 2, … defectives for a binomial distribution with n = 20 and p = 0.05, using the binomial table (100 is the number of boxes in the sample):
$f_e(0) = P(0) \times 100 = 0.3585 \times 100 = 35.85$
$f_e(1) = P(1) \times 100 = 0.3773 \times 100 = 37.73$
$f_e(2) = P(2) \times 100 = 0.1887 \times 100 = 18.87$
Number of defectives in box   Number of boxes observed (f0)   Number of boxes expected (fe)
0                              39                              35.85
1                              34                              37.73
2                              20                              18.87
3                               4                               5.96
4                               1                               1.33
5                               2                               0.23
6 or more                       0                               0.03
Note that the expected number $f_e$ in each row should be at least 5.
No. of defectives   f0    fe       f0 - fe   (f0 - fe)²/fe
0                   39    35.85     3.15      0.2768
1                   34    37.73    -3.73      0.3687
2                   20    18.87     1.13      0.0677
3 or more            7     7.55    -0.55      0.0401
Sample $\chi^2 = \sum (f_0 - f_e)^2/f_e$ = 0.7533
Note: For the $\chi^2$ approximation, each $f_e$ must be at least 5, so the classes 3, 4, 5 and 6 or more are combined. The decision rule is not stated at the beginning because $v$ depends on how many classes remain after combining.
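Combining classes so that every $f_e$ is at least 5 can be automated. The Python sketch below uses one simple rule, which is an assumption rather than the chapter's prescribed procedure: it merges classes from the upper tail downward until the running expected count reaches 5, and folds any small leftover at the lower end into its neighbor. The function name pool_small_classes is hypothetical; the counts are those in the tables above.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes (scanning from the upper tail) until each expected count >= minimum."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e = 0.0, 0.0
    for f0, fe in zip(reversed(observed), reversed(expected)):
        acc_o += f0
        acc_e += fe
        if acc_e >= minimum:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if pooled_e:
        # Any small leftover at the lower end is folded into the adjacent pooled class.
        pooled_o[-1] += acc_o
        pooled_e[-1] += acc_e
    else:
        pooled_o, pooled_e = [acc_o], [acc_e]
    return pooled_o[::-1], pooled_e[::-1]


observed = [39, 34, 20, 4, 1, 2, 0]                       # 0, 1, ..., 5, 6 or more defectives
expected = [35.85, 37.73, 18.87, 5.96, 1.33, 0.23, 0.03]  # binomial, n = 20, p = 0.05

f0s, fes = pool_small_classes(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(f0s, fes))
v = len(fes) - 1 - 1  # ne - 1 - g, with g = 1 because p is estimated by p-bar
print(f0s, [round(e, 2) for e in fes], round(stat, 4), v)
```

It yields the four pooled classes (0, 1, 2, 3 or more) and a statistic of about 0.7533, the sum of the four listed contributions, with v = 2.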
Solution:
Step 1: State the hypotheses
H0: the distribution is binomial with n = 20
Ha: the distribution is not binomial with n = 20
Step 2: Decision rule:
Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$
$\alpha = 0.05$, $v = n_e - 1 - g = 4 - 1 - 1 = 2$
where $n_e$ = 4 (after combining classes) and $g$ = 1 (p is estimated by $\bar{p}$)
$\chi^2_{\alpha,v} = \chi^2_{0.05,2} = 5.991$
Reject H0 if sample $\chi^2$ is greater than 5.991
Step 3: Sample $\chi^2$ = 0.7533
Step 4: Accept H0 (0.7533 < 5.991); thus, the number of defective cylinders in boxes of 20 is binomial.
GOODNESS–OF-FIT: Normal Distribution
Example: For inventory planning and control purposes, Krupp Chemical Co. wants to know if its sales of a liquid chemical are normally distributed. Sales for a random sample of 200 days are given in the table below. At the 5% level, perform a test of the hypothesis that sales are normally distributed.
Sales (000s of gallons)   No. of days f0
Less than 34.0              0
34.0 & under 35.5          13
35.5 & under 37.0          20
37.0 & under 38.5          35
38.5 & under 40.0          43
40.0 & under 41.5          51
41.5 & under 43.0          27
43.0 & under 44.5          10
44.5 & under …              1
The sample mean and standard deviation calculated from the 200 sample daily sales are $\bar{x}$ = 40,000 gallons and s = 2,500 gallons.
Solution:
Note: $\mu$ and $\sigma$ are estimated by the sample estimators $\bar{x}$ and s.
$f_e = n \times P(\text{class})$
$f_e$ for the "less than 34.0" class is $P(\text{less than 34 thousand}) \times n$:
$z = (x - \mu)/\sigma = (34 - 40)/2.5 = -2.4$
$P(\text{less than 34 thousand}) = 0.5 - 0.4918 = 0.0082$
$f_e = 0.0082 \times 200 = 1.64$
By using the same procedure for the other classes, we obtain the following table.
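The normal-table look-ups can be replaced by the normal CDF, written here with math.erf. This is a minimal Python sketch assuming $\mu$ = 40 and $\sigma$ = 2.5 (thousand gallons), as estimated in the text; the last class is treated as open-ended (44.5 and above), which is an assumption because the upper limit of that class is not given. Results agree with the four-decimal table values to within rounding.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma, n = 40.0, 2.5, 200  # sample estimates (thousand gallons) and number of days
boundaries = [34.0, 35.5, 37.0, 38.5, 40.0, 41.5, 43.0, 44.5]

# Cumulative probabilities at each class boundary, padded with 0 and 1 for the open tails.
cdf = [0.0] + [normal_cdf(b, mu, sigma) for b in boundaries] + [1.0]
probs = [upper - lower for lower, upper in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

for p, fe in zip(probs, expected):
    print(f"{p:.4f}  {fe:.2f}")
```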
Sales class           f0   Probability    fe      f0 - fe   (f0 - fe)²/fe
Less than 34.0         0     0.0082        1.64
34.0 & under 35.5     13     0.0277        5.54
  (first two classes combined: f0 = 13, fe = 7.18)  5.82      4.7176
35.5 & under 37.0     20     0.0792       15.84      4.16      1.0925
37.0 & under 38.5     35     0.1592       31.84      3.16      0.3136
38.5 & under 40.0     43     0.2257       45.14     -2.14      0.1015
40.0 & under 41.5     51     0.2257       45.14      5.86      0.7607
41.5 & under 43.0     27     0.1592       31.84     -4.84      0.7357
43.0 & under 44.5     10     0.0792       15.84     -5.84      2.1531
Solution:
Step 1: State the hypotheses
H0: daily sales are normally distributed
Ha: daily sales are not normally distributed
Step 2: Decision rule:
Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$
$\alpha = 0.05$, $v = n_e - 1 - g = 8 - 1 - 2 = 5$
where $n_e$ = 8 (after combining classes) and $g$ = 2 ($\mu$ and $\sigma$ are estimated)
$\chi^2_{\alpha,v} = \chi^2_{0.05,5} = 11.070$
Reject H0 if sample $\chi^2$ is greater than 11.070
Step 3: Sample $\chi^2$ = 15.1940
Step 4: Reject H0 (15.1940 > 11.070); thus, the daily sales are not normally distributed.
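The sample $\chi^2$ of 15.1940 can be checked in Python from the expected frequencies in the table. The table in this section stops at the 43.0 & under 44.5 class, so the final class is an assumption here: f0 = 1 (the remaining day, since the counts must total 200) and fe = 7.18 (by symmetry, P(sales ≥ 44.5) = 0.0082 + 0.0277). With that assumption, the sketch reproduces the text's total to within rounding.

```python
# Pooled classes: (<35.5), 35.5-37.0, 37.0-38.5, 38.5-40.0, 40.0-41.5, 41.5-43.0, 43.0-44.5, (>=44.5)
observed = [13, 20, 35, 43, 51, 27, 10, 1]  # last count assumed: 200 minus the listed days
expected = [1.64 + 5.54, 15.84, 31.84, 45.14, 45.14, 31.84, 15.84, 7.18]  # last value assumed by symmetry

stat = sum((f0 - fe) ** 2 / fe for f0, fe in zip(observed, expected))
v = len(observed) - 1 - 2  # ne - 1 - g, with g = 2 because mu and sigma are estimated
critical = 11.070          # chi-square table value for alpha = 0.05, v = 5

print(round(stat, 4), v, "reject H0" if stat > critical else "accept H0")
```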
GOODNESS–OF-FIT: Poisson Distribution
The Poisson process formula gives the probability of a given number of arrivals in an interval of time:
$P(x) = \dfrac{e^{-\lambda t} (\lambda t)^x}{x!}$
where: x = number of arrivals in t units of time
       $\lambda$ = average arrival rate per unit of time
       t = number of units of time
Example: When a beer bottle filling machine breaks a bottle, the machine must be shut down while the broken glass is removed. The production manager at Moor's Brewery has been using a Poisson distribution with $\lambda$ = 3 shutdowns per day, on average, to determine the probabilities of 0, 1, 2, 3, … shutdowns in a day. The manager has
Example (continued):
No. of shutdowns in a day (x)   No. of days f0
0                                 3
1                                20
2                                29
3                                22
4                                23
5                                10
6 or more                        13
Total                           120
Solution:
P(x shutdowns in a day) is given by
$P(x) = \dfrac{e^{-\lambda t} (\lambda t)^x}{x!} = \dfrac{e^{-3}\, 3^x}{x!}$  (with $\lambda$ = 3 per day and t = 1 day)
$P(0) = e^{-3}\, 3^0/0! = 0.0498$
$f_e(0) = P(0) \times$ number of days in the sample (i.e., 120) $= 0.0498 \times 120 = 5.976$
$P(1) = e^{-3}\, 3^1/1! = 0.1494$
$f_e(1) = P(1) \times 120 = 0.1494 \times 120 = 17.928$
$P(2) = e^{-3}\, 3^2/2! = 0.2240$
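The Poisson probabilities and expected frequencies can be generated for every class at once. A minimal Python sketch, assuming $\lambda$ = 3 per day, t = 1 day and the 120 sampled days from the example; the function name poisson_pmf is hypothetical, and the last class collects 6 or more shutdowns as 1 minus the sum of the other probabilities. The computed probabilities are unrounded, so they differ slightly from the four-decimal values used in the text.

```python
import math


def poisson_pmf(x, rate, t=1.0):
    """P(x arrivals in t units of time) = e^(-rate*t) * (rate*t)^x / x!"""
    mean = rate * t
    return math.exp(-mean) * mean**x / math.factorial(x)


lam, days = 3.0, 120
probs = [poisson_pmf(x, lam) for x in range(6)]  # x = 0, 1, ..., 5
probs.append(1.0 - sum(probs))                   # 6 or more shutdowns
expected = [days * p for p in probs]

for x, (p, fe) in enumerate(zip(probs, expected)):
    label = f"{x}" if x < 6 else "6+"
    print(f"{label:>2}  P = {p:.4f}  fe = {fe:.3f}")
```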
x    f0    P(x)      fe       f0 - fe   (f0 - fe)²/fe
0     3    0.0498     5.976   -2.976     1.482
1    20    0.1494    17.928    2.072     0.239
2    29    0.2240    26.880    2.120     0.167
3    22    0.2240    26.880   -4.880     0.886
4    23    0.1680    20.160    2.840     0.400
5    10    0.1008    12.096   -2.096     0.363
Solution:
Step 1: State the hypotheses
H0: the number of shutdowns per day has a Poisson distribution with $\lambda$ = 3 per day
Ha: the number of shutdowns per day does not have a Poisson distribution with $\lambda$ = 3 per day
Step 2: Decision rule:
Reject H0 if sample $\chi^2 > \chi^2_{\alpha,v}$; $\alpha = 0.05$, $v = n_e - 1 - g = 7 - 1 - 0 = 6$
where $n_e$ = 7 and $g$ = 0 ($\lambda$ is specified by H0, not estimated); $\chi^2_{\alpha,v} = \chi^2_{0.05,6} = 12.592$
Reject H0 if sample $\chi^2$ is greater than 12.592
Step 3: Sample $\chi^2$ = 4.383
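Step 3 can be reproduced with the observed counts from the example. In this minimal Python sketch the expected frequencies come from unrounded Poisson probabilities ($\lambda$ = 3 is specified by H0, so g = 0), which is why the result, about 4.39, differs slightly from the 4.383 obtained with four-decimal probabilities; the critical value 12.592 is the table value from step 2.

```python
import math

observed = [3, 20, 29, 22, 23, 10, 13]  # days with 0, 1, ..., 5, 6 or more shutdowns
lam, days = 3.0, sum(observed)          # lambda specified by H0; 120 days in the sample

probs = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(6)]
probs.append(1.0 - sum(probs))          # 6 or more shutdowns
expected = [days * p for p in probs]

assert all(fe >= 5 for fe in expected)  # no classes need combining, so ne = 7
stat = sum((f0 - fe) ** 2 / fe for f0, fe in zip(observed, expected))
v = len(observed) - 1 - 0               # ne - 1 - g, with g = 0
critical = 12.592                       # chi-square table value for alpha = 0.05, v = 6

print(round(stat, 3), v, "reject H0" if stat > critical else "accept H0")
```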