Chi-Square Test
Chi-square, denoted by the Greek letter χ², was developed by Karl Pearson in 1900. It describes the magnitude of the discrepancy between observed and theoretical (expected) frequency distributions. Since the chi-square test does not make any assumption about population parameters, it is also called a distribution-free test.
Properties of the chi-square test
1. The chi-square distribution is a continuous probability distribution whose values lie between 0 and ∞.
2. Since chi-square is a sum of squared terms, its value cannot be negative.
3. The value of chi-square is zero if the difference between each pair of observed and expected frequencies is zero.
Applications of chi-square
1. Chi-square test for goodness of fit.
2. Chi-square test for independence of attributes.
3. Chi-square test for population variance.
Chi-square test for goodness of fit
The goodness of fit describes the differences between the observed and expected frequency distributions. If the observed values are close to the expected values under a hypothesis, the fit is said to be good. The goodness-of-fit test is also used to test the significance of the difference between observed and expected frequency distributions such as the binomial, Poisson and normal distributions.
Steps
1. Setting of hypothesis
Null hypothesis (H0): There is no significant difference between the observed (experimental) and the expected (theoretical) frequencies.
Alternative hypothesis (H1): There is a significant difference between the observed (experimental) and the expected (theoretical) frequencies.
2. Compute the test statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed frequency and E is the expected frequency. When all classes are expected to be equal, $E = \frac{\sum O}{n}$, where n is the number of classes.
3. Choose the level of significance α (usually 5%).
4. Find the tabulated (critical) value of χ² for (n − 1) degrees of freedom at level α.
5. Decision: If the calculated value is less than the tabulated value, H0 is accepted. If the calculated value is greater than the tabulated value, H0 is rejected and the alternative hypothesis is accepted.
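The five steps above can be sketched in Python. This is a minimal illustration and not part of the original notes. The function names `chi_square_statistic` and `decide` are hypothetical. Equal (uniform) expected frequencies are assumed when none are passed. The tabulated value still has to be read from a χ² table; 12.592 is the 5% value for 6 degrees of freedom. The usage lines use the data of practice problem 1 at the end of this section.

```python
def chi_square_statistic(observed, expected=None):
    """Return chi-square = sum((O - E)^2 / E).

    If `expected` is omitted, equal (uniform) expected frequencies are
    assumed: E = sum(O) / n for each of the n classes.
    """
    n = len(observed)
    if expected is None:
        expected = [sum(observed) / n] * n
    if len(expected) != n:
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(chi2_calculated, chi2_tabulated):
    """Step 5: compare the calculated value with the tabulated value.

    A calculated value exactly equal to the tabulated value is treated
    as a rejection here (an assumption; the notes do not cover that case).
    """
    if chi2_calculated < chi2_tabulated:
        return "H0 accepted"
    return "H0 rejected, H1 accepted"


# Usage with the data of practice problem 1 (7 days, so d.f. = 6).
accidents = [14, 16, 8, 12, 11, 9, 14]
chi2 = chi_square_statistic(accidents)
print(round(chi2, 4))        # 4.1667
print(decide(chi2, 12.592))  # 12.592: tabulated value, 5% level, 6 d.f.
```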
Conditions (assumptions) for the validity of the chi-square test
1. The total number of observed frequencies, N, should be greater than 50.
2. The sample observations should be independent, i.e. no individual item should be included twice or more in the sample.
3. The totals of observed and expected frequencies should be equal, i.e. ΣO = ΣE = N.
4. Each expected frequency should preferably be greater than 10. In practice, no expected frequency should be less than 5. If any expected frequency is less than 5, it should be pooled with an adjacent class, and the corresponding observed frequencies should be pooled in the same way.
Then, d.f. = n − 1 − k1 − k2, where k1 is the number of classes lost through pooling and k2 is the number of parameters estimated from the sample.
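Pooling can be sketched as a single left-to-right pass. The merging rule is an illustrative assumption: a small class is carried into the next class, and any small remainder at the right-hand end is added to the last kept class. The helper name `pool_small_classes` is hypothetical. The example uses the Poisson fit later in this section: O = 120, 60, 15, 4, 1, with expected frequencies 117.72, 62.39, 16.53, 2.921 and f(4) = 117.72 × 0.53⁴/4! ≈ 0.387. It also assumes that k1 counts the classes lost through pooling and k2 the parameters estimated.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until every expected frequency is >= minimum.

    Classes are scanned left to right; a class whose expected frequency is
    below `minimum` is carried forward and added to the next class. A small
    group left over at the right-hand end is added to the last kept class.
    Observed and expected frequencies are always pooled together.
    """
    pooled_o, pooled_e = [], []
    carry_o, carry_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        carry_o += o
        carry_e += e
        if carry_e >= minimum:
            pooled_o.append(carry_o)
            pooled_e.append(carry_e)
            carry_o, carry_e = 0.0, 0.0
    if carry_o or carry_e:            # leftover small tail
        if pooled_e:
            pooled_o[-1] += carry_o
            pooled_e[-1] += carry_e
        else:
            pooled_o.append(carry_o)
            pooled_e.append(carry_e)
    return pooled_o, pooled_e


observed = [120, 60, 15, 4, 1]
expected = [117.72, 62.39, 16.53, 2.921, 0.387]
o, e = pool_small_classes(observed, expected)

k1 = len(observed) - len(o)   # classes lost through pooling: 2
k2 = 1                        # parameters estimated: the Poisson mean m
df = len(observed) - 1 - k1 - k2
print(o, [round(x, 3) for x in e], df)
```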
Example:
The following table gives the number of accidents on the road during a week. Test whether the accidents occur uniformly over the week.
Day Sun Mon Tue Wed Thu Fri Sat
No. of accidents 16 12 14 18 9 11 8
Solution:
Step 1: Null hypothesis H0: The accidents occur uniformly over the week.
Alternative hypothesis H1: The accidents do not occur uniformly over the week.
Step 2: Test statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where E = expected frequency = $\frac{\sum O}{n}$.
Calculation of χ²
O E O−E (O−E)² (O−E)²/E
16 13 3 9 0.6923
12 13 −1 1 0.0769
14 13 1 1 0.0769
17 13 4 16 1.2308
9 13 −4 16 1.2308
15 13 2 4 0.3077
8 13 −5 25 1.9231
ΣO = 91 ΣE = 91 Σ(O−E)²/E = 5.5385
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.5385$
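The columns of this table can be reproduced with a short Python sketch. It assumes the observed frequencies in the O column of the calculation table, which total 91, so that E = 91/7 = 13.

```python
observed = [16, 12, 14, 17, 9, 15, 8]       # O column of the table
expected = sum(observed) / len(observed)    # uniform: E = 91 / 7 = 13

total = 0.0
print("O   E   O-E  (O-E)^2  (O-E)^2/E")
for o in observed:
    d = o - expected
    term = d ** 2 / expected                # one (O-E)^2/E entry
    total += term
    print(f"{o:<3} {expected:<3.0f} {d:<4.0f} {d ** 2:<8.0f} {term:.4f}")
print(f"chi-square = {total:.4f}")          # chi-square = 5.5385
```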
Calculation of χ²
O E O−E (O−E)² (O−E)²/E
43 36 7 49 1.36
32 36 −4 16 0.44
38 36 2 4 0.11
27 36 −9 81 2.25
38 36 2 4 0.11
52 36 16 256 7.11
36 36 0 0 0
31 36 −5 25 0.69
39 36 3 9 0.25
24 36 −12 144 4
ΣO = 360 ΣE = 360 Σ(O−E)²/E = 16.32
$\chi^2 = \sum \frac{(O - E)^2}{E} = 16.32$
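The total 16.32 is obtained by adding terms that were each rounded to two decimals. A sketch using Python's standard `fractions` module shows that the unrounded value is 588/36 ≈ 16.33. It takes the O values from the table above, with E = 360/10 = 36.

```python
from fractions import Fraction

observed = [43, 32, 38, 27, 38, 52, 36, 31, 39, 24]
expected = Fraction(sum(observed), len(observed))    # 360 / 10 = 36

terms = [(o - expected) ** 2 / expected for o in observed]
exact = sum(terms)                                   # 588/36 = 49/3
rounded_sum = sum(round(float(t), 2) for t in terms) # as in the table

print(float(exact), round(rounded_sum, 2))
```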
Calculation of χ²
O E O−E (O−E)² (O−E)²/E
34 36 −2 4 0.1111
10 12 −2 4 0.3333
20 16 4 16 1
ΣO = 64 ΣE = 64 Σ(O−E)²/E = 1.44
$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.44$
Example
Fit a binomial distribution to the following data, assuming that the coin is unbiased.
No. of heads 0 1 2 3 4 5
Frequency 90 560 1000 900 600 50
Test the goodness of fit for the above data. (TU 2053 MBA)
Solution:
Let r be the number of heads obtained. Then the probability of getting r heads when n coins are tossed is given by
$P(X = r) = \binom{n}{r} p^r q^{n-r}$, where r = 0, 1, 2, …, n
Here, n = 5, p = 0.50, q = 1 − 0.50 = 0.50
$\therefore P(X = r) = \binom{5}{r} (0.5)^r (0.5)^{5-r} = \binom{5}{r} (0.5)^5$, where r = 0, 1, 2, 3, 4, 5
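The expected frequency of getting r heads is
$E = f(r) = N \cdot P(X = r) = N \times 0.03125 \times \binom{5}{r} = 3200 \times 0.03125 \times \binom{5}{r} = 100 \times \binom{5}{r}$
where N = ΣO = 3200.
Calculation of expected frequencies (E):
$f(0) = 100 \times \binom{5}{0} = 100$
$f(1) = 100 \times \binom{5}{1} = 500$
$f(2) = 100 \times \binom{5}{2} = 1000$
$f(3) = 100 \times \binom{5}{3} = 1000$
$f(4) = 100 \times \binom{5}{4} = 500$
$f(5) = 100 \times \binom{5}{5} = 100$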
Hence, the expected frequencies are
No. of heads 0 1 2 3 4 5
Expected frequency 100 500 1000 1000 500 100
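To test the goodness of fit for the above data, the steps of the χ²-test are as follows:
Null hypothesis H0: The binomial distribution with p = q = 0.50 is a good fit to the given data.
Alternative hypothesis H1: The binomial distribution with p = q = 0.50 is not a good fit to the given data.
Test statistic: Under H0, the test statistic is
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Calculation of χ²
O E O−E (O−E)² (O−E)²/E
90 100 −10 100 1.00
560 500 60 3600 7.20
1000 1000 0 0 0.00
900 1000 −100 10000 10.00
600 500 100 10000 20.00
50 100 −50 2500 25.00
ΣO = 3200 ΣE = 3200 Σ(O−E)²/E = 63.2
$\chi^2 = \sum \frac{(O - E)^2}{E} = 63.2$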
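The binomial fit can be reproduced with Python's standard `math.comb`. This sketch uses the observed frequencies and p = 0.5 from the example. Because p is assumed rather than estimated, d.f. = 6 − 1 = 5. The 5% tabulated value for 5 d.f. is 11.070, so a χ² of 63.2 leads to rejecting H0.

```python
from math import comb

heads = [0, 1, 2, 3, 4, 5]
observed = [90, 560, 1000, 900, 600, 50]
n, p = 5, 0.5                 # 5 coins, unbiased
N = sum(observed)             # 3200

# E = N * 5Cr * p^r * q^(n - r)
expected = [N * comb(n, r) * p**r * (1 - p) ** (n - r) for r in heads]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)        # [100.0, 500.0, 1000.0, 1000.0, 500.0, 100.0]
print(round(chi2, 2))  # 63.2
```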
Solution:
O = observed frequency
E = expected frequency = $N \times P(X = r) = N \cdot \frac{e^{-m} m^r}{r!}$, where r = 0, 1, 2, 3, 4
With N = 200 and m = 0.53,
$E = f(r) = 200 \times \frac{e^{-0.53} (0.53)^r}{r!} = 200 \times \frac{2.7183^{-0.53} (0.53)^r}{r!} = 117.72 \times \frac{(0.53)^r}{r!}$, where r = 0, 1, 2, 3, 4
Calculation of expected frequencies (E)
$f(0) = 117.72 \times \frac{(0.53)^0}{0!} = 117.72$
$f(1) = 117.72 \times \frac{(0.53)^1}{1!} = 62.39$
$f(2) = 117.72 \times \frac{(0.53)^2}{2!} = 16.53$
$f(3) = 117.72 \times \frac{(0.53)^3}{3!} = 2.921$
$f(4) = 117.72 \times \frac{(0.53)^4}{4!} = 0.387$
Calculation of χ² (the last three classes are pooled because their expected frequencies are small)
O E O−E (O−E)² (O−E)²/E
120 117.72 2.28 5.1984 0.0442
60 62.39 −2.39 5.7121 0.0916
(15 + 4 + 1) = 20 (16.53 + 2.921 + 0.387) = 19.84 0.16 0.0256 0.0013
ΣO = 200 ΣE ≈ 200 Σ(O−E)²/E = 0.1371
$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.1371$
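The Poisson fit can be checked in Python. This sketch assumes the observed frequencies 120, 60, 15, 4, 1 for r = 0, …, 4 from the calculation table; their mean is 106/200 = 0.53. It pools the last three classes as the table does. Carried at full precision, χ² comes out at about 0.137. With 3 classes after pooling and one estimated parameter, d.f. = 3 − 1 − 1 = 1, and the 5% tabulated value for 1 d.f. is 3.841.

```python
from math import exp, factorial

# Observed frequencies for r = 0, 1, 2, 3, 4 (O column of the table).
observed = [120, 60, 15, 4, 1]
N = sum(observed)                                      # 200
m = sum(r * f for r, f in enumerate(observed)) / N     # 106 / 200 = 0.53

# E = N * e^(-m) * m^r / r!
expected = [N * exp(-m) * m**r / factorial(r) for r in range(len(observed))]

# Pool r = 2, 3, 4 because the expected frequencies for r = 3, 4 are below 5.
obs_pooled = observed[:2] + [sum(observed[2:])]
exp_pooled = expected[:2] + [sum(expected[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1   # one parameter (m) estimated from the data
print([round(e, 3) for e in expected])
print(round(chi2, 4), df)
```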
Or, $\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$
Solution:
Null hypothesis H0: Inoculation and attack by T.B. are independent. In other words, inoculation is not effective in preventing tuberculosis.
Alternative hypothesis H1: Inoculation and attack by T.B. are dependent. In other words, inoculation is effective in preventing tuberculosis.
Test statistic: Under H0, the test statistic is
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where E = expected frequency in a cell = $\frac{RT \times CT}{N}$, RT and CT being the row total and the column total for that cell.
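The rule E = RT × CT / N and the resulting χ² can be computed for any r × c contingency table. In this sketch the function names are hypothetical. The degrees of freedom (r − 1)(c − 1), the usual rule for a contingency table, are returned for convenience. The usage lines apply the sketch to the T.B. inoculation table (a = 20, b = 300, c = 80, d = 600).

```python
def expected_table(table):
    """Expected frequency of each cell: E = (row total x column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table."""
    expected = expected_table(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


tb = [[20, 300], [80, 600]]
print(expected_table(tb))          # [[32.0, 288.0], [68.0, 612.0]]
chi2, df = chi_square_independence(tb)
print(round(chi2, 2), df)          # 7.35 1
```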
Alternative method:
The χ² for a 2 × 2 contingency table can also be computed directly using the following formula.
a = 20 b = 300 a + b = 320
c = 80 d = 600 c + d = 680
a + c = 100 b + d = 900 N = 1000
$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{1000(20 \times 600 - 80 \times 300)^2}{320 \times 680 \times 100 \times 900} = 7.35$
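The shortcut formula is a one-line function. For a 2 × 2 table it gives the same value as the cell-by-cell sum Σ(O − E)²/E. The name `chi_square_2x2` is hypothetical. The second usage line applies it to practice problem 3 (a = 200, b = 100, c = 50, d = 150).

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 table: cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(round(chi_square_2x2(20, 300, 80, 600), 2))   # 7.35 (T.B. example)
print(round(chi_square_2x2(200, 100, 50, 150), 2))  # 83.33 (practice problem 3)
```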
Total
a = 4.5 b = 9.5 14
c = 6.5 d = 7.5 14
Total 11 17 28
Null hypothesis H0: The vaccine is not effective in preventing death from anthrax.
Alternative hypothesis H1: The vaccine is effective in preventing death from anthrax.
Test statistic: Under H0, the test statistic is
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where E = expected frequency in a cell = $\frac{RT \times CT}{N}$.
Calculation of χ²
O E O−E (O−E)² (O−E)²/E
4.5 5.5 −1 1 0.1818
9.5 8.5 1 1 0.1176
6.5 5.5 1 1 0.1818
7.5 8.5 −1 1 0.1176
ΣO = 28 ΣE = 28 Σ(O−E)²/E = 0.5988
$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.5988$
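A short Python check of the anthrax table uses the cell values as given (a = 4.5, b = 9.5, c = 6.5, d = 7.5). The expected values come from E = RT × CT / N. At full precision the sum is about 0.5989; the 0.5988 in the table comes from rounding each term to four decimals.

```python
cells = [[4.5, 9.5], [6.5, 7.5]]              # a, b / c, d as given
row_totals = [sum(r) for r in cells]          # 14, 14
col_totals = [sum(c) for c in zip(*cells)]    # 11, 17
n = sum(row_totals)                           # 28

chi2 = 0.0
for i, row in enumerate(cells):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # E = RT x CT / N
        chi2 += (o - e) ** 2 / e
print(round(chi2, 4))                         # 0.5989
```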
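δ² = population variance
s² = an unbiased estimate of the population variance δ²
$s^2 = \frac{1}{n-1} \sum (X - \bar{X})^2$
$s^2 = \frac{1}{n-1} \left[ \sum X^2 - \frac{(\sum X)^2}{n} \right]$
or, using deviations from an assumed mean A,
$s^2 = \frac{1}{n-1} \left[ \sum d^2 - \frac{(\sum d)^2}{n} \right]$, where d = X − A
If S² denotes the biased estimate of the population variance δ², then
$S^2 = \frac{1}{n} \sum (X - \bar{X})^2$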
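The three ways of computing the sum of squares, and the two variance estimates, can be compared in Python. The sketch uses the heights of practice problem 11. The assumed mean A = 68 is a hypothetical choice; any A gives the same result. Python's standard `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor n) are used as a cross-check.

```python
import statistics

heights = [72, 74, 75, 76, 62, 64, 68, 62]   # practice problem 11
n = len(heights)
mean = sum(heights) / n                      # 69.125

# Sum of squared deviations, computed three equivalent ways
ss_deviation = sum((x - mean) ** 2 for x in heights)
ss_raw = sum(x * x for x in heights) - sum(heights) ** 2 / n
A = 68                                       # hypothetical assumed mean
d = [x - A for x in heights]
ss_assumed = sum(di * di for di in d) - sum(d) ** 2 / n

s2_unbiased = ss_deviation / (n - 1)         # s^2, divisor n - 1
s2_biased = ss_deviation / n                 # S^2, divisor n

# The standard library's sample and population variances agree
assert abs(s2_unbiased - statistics.variance(heights)) < 1e-9
assert abs(s2_biased - statistics.pvariance(heights)) < 1e-9
print(ss_deviation, ss_raw, ss_assumed)      # 242.875 242.875 242.875
print(round(s2_unbiased, 4), round(s2_biased, 4))
```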
If the sample variance is given,
$\chi^2 = \frac{nS^2}{\delta^2}$
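The variance test statistic can be sketched as a small function. The name `chi_square_variance` and the `biased` flag are hypothetical. The second form, (n − 1)s²/δ² for the unbiased s², is added for comparison; both forms equal Σ(X − X̄)²/δ². The usual degrees of freedom for this test are n − 1. The usage line follows practice problem 10, reading the given s as S (divisor n) as its stated answer does.

```python
def chi_square_variance(n, sample_var, pop_var, biased=True):
    """Chi-square statistic for H0: population variance = pop_var.

    biased=True : sample_var is S^2 (divisor n), chi2 = n * S^2 / delta^2
    biased=False: sample_var is s^2 (divisor n - 1), chi2 = (n - 1) * s^2 / delta^2
    Either way the statistic equals sum((X - mean)^2) / delta^2.
    """
    factor = n if biased else n - 1
    return factor * sample_var / pop_var


# Practice problem 10: delta = 18, s = 21, n = 29
chi2 = chi_square_variance(29, 21 ** 2, 18 ** 2)
print(round(chi2, 2))   # 39.47
```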
Note:
i) For the small-sample case (i.e. n < 30), $\chi^2 = \frac{nS^2}{\delta^2}$
Practical problems
1. The following table gives the number of aircraft accidents that occurred on the various days of the week. Test whether the accidents are uniformly distributed over the week. (T.U. 2015 M, 2016 R, 2017 M)
Day Sun Mon Tue Wed Thu Fri Sat
No. of accidents 14 16 8 12 11 9 14
(Ans: χ² = 4.1667)
2. The numbers of automobile accidents per week in a certain community were as follows:
12 8 20 2 14 10 15 6 9 4. Are these frequencies in agreement with the belief that accident conditions were the same during the 10-week period under consideration? (T.U. 2041 MBA)
(Ans: χ² = 26.6)
3. A sample of 500 workers of a factory, classified according to sex and nature of work, is as follows.
Nature of work Male Female Total
Technical 200 100 300
Non-technical 50 150 200
Total 250 250 500
Test at 5% level of significance whether there exists any relationship between sex and nature of
work. (T.U. 2057, 2014 R MBS) (Ans: χ2= 83.33)
4. Four hundred employees of a factory are classified according to their level and their decisions. Do you agree with the statement that decisions vary according to the level of the employee? (T.U. 2057 MBS)
Decisions Senior officer Officer Junior officer Total
Quick 60 55 70 210
Slow 40 45 90 190
Total 100 100 160 400
(Ans: χ2=8.377)
5. The marketing manager of a company is concerned that the brand's share is unevenly distributed throughout the country. In a survey of 100 consumers in each geographic region of the country, the following result was obtained. (T.U. 2056 MBA)
Region North east North west South east South west Total
Purchase the brand 40 55 45 50 190
Do not purchase the brand 60 45 55 50 210
Total 100 100 100 100 400
6. Is there evidence from the above data of a relation between ownership of TV sets and level of income? (T.U. 2055 MBA) (Ans: χ² = 243.59)
7. The department of Business Administration at a university would like to determine whether
there is a relationship between student’s interest in business administration and ability in
mathematics. A random sample of 200 students is selected and they are asked whether their
ability in mathematics and interest in business administration are low, average or high. The
results were as follows:
Interest in Business Administration / Ability in mathematics: Low Average High Total
Low 60 15 15 90
Average 15 45 10 70
High 5 10 25 40
Total 80 70 50 200
Test whether there is any relationship between students' interest in Business Administration and ability in mathematics. (T.U. 2057 MBA, 2015 M MBS) (Ans: χ² = 84.75)
8. The following table shows the reactions of 1000 audience members to a new movie after leaving the theatre in different locations of the country.
Audience reaction / Location: Kathmandu Pokhara Bharatpur Total
Excellent 250 120 80 450
Good 200 50 50 300
Poor 150 50 50 250
Total 600 220 180 1000
Test, at the 5% level of significance, the hypothesis that the audience reaction is independent of the location. (T.U. 2018 R, MBS) (Ans: χ² = 13.36; tabulated χ² at 5% for 4 d.f. = 9.488)
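A Python check of practice problem 8 uses the table as printed. The expected values come from E = RT × CT / N, and the degrees of freedom are (r − 1)(c − 1) = 4, the usual contingency-table rule. The computed χ² is about 13.36, which can then be compared with the 5% tabulated value 9.488.

```python
# Practice problem 8: audience reaction (rows) by location (columns).
observed = [
    [250, 120, 80],   # Excellent: Kathmandu, Pokhara, Bharatpur
    [200, 50, 50],    # Good
    [150, 50, 50],    # Poor
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # E = RT x CT / N
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)   # 13.36 4
```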
9. The following table provides data on the stature of fathers and their sons at the age of 20 years.
Stature of sons / Stature of fathers: Tall Short Total
Tall 8 3 11
Short 6 7 13
Total 14 10 24
Test whether the stature of sons is independent of the stature of fathers. (Ans: χ² = 0.8104)
10. Test the hypothesis that δ = 18, given that s = 21 for a random sample of size 29 from a normal population. (Ans: χ² = 39.47)
11. The heights in inches of 8 students are given below:
72 74 75 76 62 64 68 62
Can we say that the variance of the distribution of heights of all students, from which the above sample of 8 students was drawn, is equal to 25 inches? (Ans: χ² = 0.3886)
12. Test the hypothesis that δ = 12, given that s = 18 for a random sample of size 40.
(Ans: χ² = 4.5846)