Basic Concepts of Hypothesis Testing
Statistical Test
Statistical hypothesis testing is a procedure based on sample evidence and probability
theory to determine whether a hypothesis is a reasonable statement.
Statistical Hypothesis
A statistical hypothesis is a statement about a population parameter developed for the
purpose of testing.
Null Hypothesis (H0)
The null hypothesis is a maintained hypothesis that is held to be true unless sufficient
evidence to the contrary is obtained.
Alternative Hypothesis (H1)
The alternative hypothesis is the opposite of the null hypothesis and represents the
conclusion supported if the null hypothesis is rejected.
Level of Significance (α)
The level of significance is the fixed maximum allowable probability of rejecting the
null hypothesis when it is in fact true.
P-Value
The P-value or observed level of significance of a test is the minimum level of
significance for which the null hypothesis can be rejected, given the observed sample
statistic.
If the P-value of the test statistic is less than the level of significance (α), the null
hypothesis is rejected; otherwise it cannot be rejected.
The Steps of Hypothesis Testing
1. State the null (H0) and alternative (H1) hypotheses.
2. Choose the level of significance (α) and the sample size (n).
3. Determine the appropriate test statistic to test the hypothesis.
4. Collect the data and compute the sample value of the appropriate test statistic.
5. Determine whether the test statistic has fallen into the rejection or the non-
rejection region (by using P-value / Critical Value).
6. Make the statistical decision. If the test statistic falls into the non-rejection region,
the null hypothesis cannot be rejected. If the test statistic falls into the rejection
region, the null hypothesis is rejected.
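Steps 5 and 6 can be sketched as two small decision rules, one using the P-value and one using a critical value. This is a minimal illustration, assuming an upper-tailed test for the critical-value rule; the function names and the numbers passed to them are hypothetical and do not come from the text.

```python
def decide_by_p_value(p_value: float, alpha: float) -> str:
    """Reject H0 when the P-value is below the level of significance."""
    return "reject H0" if p_value < alpha else "do not reject H0"


def decide_by_critical_value(statistic: float, critical_value: float) -> str:
    """Upper-tailed rule (an assumption): the rejection region is
    statistic > critical_value."""
    return "reject H0" if statistic > critical_value else "do not reject H0"


# Hypothetical, unrelated numbers chosen only to exercise the two rules.
print(decide_by_p_value(p_value=0.03, alpha=0.05))
print(decide_by_critical_value(statistic=1.20, critical_value=1.645))
```

Both rules return "do not reject H0" rather than "accept H0", in line with step 6.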
Paired t test
Test of Independence of Attributes
Qualitative variables cannot be measured, but individuals may be classified according to qualitative
characteristics. For example, a group of students may be classified according to their merit, or according
to their father's profession, social status, etc. Such qualitative characteristics are termed attributes. A
group of individuals may be classified according to two attributes in a two-way table. A two-way table
arranged on the basis of two attributes is called a contingency table. An m x n contingency table is
constituted with m classes of one attribute and n classes of another attribute. Such a table consists of m
columns (the number of classes of one attribute) and n rows (the number of classes of the other attribute),
as shown below:
                              Class of attribute A
Class of
attribute B     a1      a2      ...     ai      ...     am      Total
b1              O11     O21     ...     Oi1     ...     Om1     B1
b2              O12     O22     ...     Oi2     ...     Om2     B2
:               :       :               :               :       :
bj              O1j     O2j     ...     Oij     ...     Omj     Bj
:               :       :               :               :       :
bn              O1n     O2n     ...     Oin     ...     Omn     Bn
Total           A1      A2      ...     Ai      ...     Am      N = ΣAi = ΣBj
The above table is an m x n contingency table and contains m x n cells. The number of
observations corresponding to the ith class of attribute A and the jth class of attribute B is Oij,
the observed frequency of the ij-th cell. At first glance the table looks like a correlation table.
However, in a correlation table the variables are quantitative, whereas in a contingency table the
variables are qualitative or categorical.
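As an illustration of how such a table arises from classified individuals, the sketch below counts the observed frequencies Oij and the marginal totals Ai, Bj and N. The function names and the raw data are hypothetical; rows follow the classes of B and columns the classes of A, matching the layout above.

```python
from collections import Counter


def contingency_table(pairs, a_classes, b_classes):
    """Count observations per cell; rows follow b_classes, columns a_classes."""
    counts = Counter(pairs)
    return [[counts[(a, b)] for a in a_classes] for b in b_classes]


def marginal_totals(table):
    """Return (row totals Bj, column totals Ai, grand total N)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


# Hypothetical raw data: one (class of A, class of B) pair per individual.
pairs = [
    ("a1", "b1"), ("a1", "b1"), ("a2", "b1"),
    ("a1", "b2"), ("a2", "b2"), ("a2", "b2"),
]
table = contingency_table(pairs, a_classes=["a1", "a2"], b_classes=["b1", "b2"])
row_totals, col_totals, n = marginal_totals(table)
print(table, row_totals, col_totals, n)
```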
The relationship between the attributes in a contingency table is termed association. If the
association between the attributes is insignificant, the attributes are said to be independent. Independence
of attributes in a contingency table is tested using the χ² (chi-square) statistic, and the test is known as
the χ² test of independence. The null hypothesis may be stated as
H0 : There is no association between the attributes
or
H0 : The attributes are independent
The test statistic is
\[ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{O_{ij}^2}{E_{ij}} - N \]
which has a χ² distribution with (m-1)(n-1) d.f.
Here Eij is the expected frequency of the ij-th cell, i.e. for the jth category of B in the ith category of
A. Eij is obtained as
\[ E_{ij} = \frac{A_i \times B_j}{N} \]
where Ai is the total frequency for the ith category of A and Bj is the total frequency for the jth
category of B. The computed test statistic is compared with the tabulated value of χ² with (m-1)(n-1)
degrees of freedom to decide whether the null hypothesis can be rejected.
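The two formulas above translate directly into code. The following is a minimal sketch in plain Python; the function names and the 2 x 2 table are hypothetical, and the table is stored as a list of rows.

```python
def expected_frequencies(observed):
    """E = (row total x column total) / N for every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical 2 x 2 table of observed frequencies.
observed = [[20, 30], [30, 20]]
expected = expected_frequencies(observed)
n = sum(sum(row) for row in observed)

statistic = chi_square(observed, expected)
# Shortcut form from the text: sum of O^2 / E, minus N.
shortcut = sum(
    o * o / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
) - n
print(expected, statistic, shortcut, degrees_of_freedom(observed))
```

The two forms of the statistic agree because the observed and the expected frequencies both sum to N.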
Note:
1. If more than 20% of the expected cell frequencies in a contingency table are less than 5, the χ² test
may give misleading results.
2. A significant χ² leads to rejection of the null hypothesis but does not measure the degree of
association or dependence of the attributes. A measure of dependence is given by the coefficient of
contingency.
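Both notes can be checked numerically. The sketch below assumes that the 20% rule refers to expected frequencies and that the coefficient of contingency meant here is Pearson's C = sqrt(χ² / (χ² + N)); the text does not give a formula, so both are assumptions. The function names and the input numbers are hypothetical.

```python
import math


def share_of_small_expected(expected, threshold=5.0):
    """Fraction of cells whose expected frequency is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def contingency_coefficient(chi2, n):
    """Pearson's coefficient of contingency, C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical expected frequencies: one of the four cells is below 5.
expected = [[12.0, 4.0], [18.0, 6.0]]
share = share_of_small_expected(expected)
if share > 0.20:
    print("more than 20% of expected frequencies are below 5")

# Hypothetical statistic and sample size.
c = contingency_coefficient(chi2=4.0, n=100)
print(share, c)
```

C is 0 when χ² is 0 and increases with the strength of association, but it always stays below 1.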
Example:
A sample survey of COU graduates employed in various organizations gave the following results:
                        Utilization of educational qualification in present job
Type of                 Fully       Partially    Very little
organization            utilized    utilized     or not at all    Total
Education & research    112         19           6                137
Industry                88          27           1                116
Others                  27          40           26               93
Total                   227         86           33               346
We wish to examine the pattern of utilization of educational qualification in various types of
organizations. The null hypothesis is
H0 : There is no association between the utilization of education and the type of employer
organization.
Here N = 346 and there are 3 classes of each attribute. Using the formula
\[ E_{ij} = \frac{A_i \times B_j}{N} \]
we obtain the expected cell frequencies as tabulated below:
Expected frequencies (rounded to whole numbers)
                        Utilization pattern
Type of                 Fully       Partially    Very little
employer                utilized    utilized     or not at all    Total
Education & research    90          34           13               137
Industry                76          29           11               116
Others                  61          23           9                93
Total                   227         86           33               346
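Test statistic
\[ \chi^2 = \sum_i \sum_j \frac{O_{ij}^2}{E_{ij}} - N \]
\[ = \left\{ \frac{112^2}{90} + \frac{19^2}{34} + \frac{6^2}{13} + \frac{88^2}{76} + \frac{27^2}{29} + \frac{1^2}{11} + \frac{27^2}{61} + \frac{40^2}{23} + \frac{26^2}{9} \right\} - 346 \]
\[ = 90.52 \]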
The degrees of freedom are (3-1) x (3-1) = 4.
The critical value of χ² with 4 d.f. at the 1% level of significance is 13.28. The computed value
of χ² (90.52) exceeds this critical value and therefore leads to rejection of the null hypothesis.
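The arithmetic of this example can be reproduced with a short script. It is a sketch that uses the rounded expected frequencies from the table, as the text does, and adds a P-value that is not part of the original example. The P-value uses the closed form exp(-x/2)(1 + x/2) for the upper tail of the χ² distribution, which holds only for 4 degrees of freedom; the helper name is hypothetical.

```python
import math

observed = [
    [112, 19, 6],   # Education & research
    [88, 27, 1],    # Industry
    [27, 40, 26],   # Others
]
# Expected frequencies rounded to whole numbers, as tabulated in the text.
expected_rounded = [
    [90, 34, 13],
    [76, 29, 11],
    [61, 23, 9],
]
n = sum(sum(row) for row in observed)

# Shortcut form of the statistic: sum of O^2 / E, minus N.
chi2 = sum(
    o * o / e
    for obs_row, exp_row in zip(observed, expected_rounded)
    for o, e in zip(obs_row, exp_row)
) - n


def chi_square_upper_tail_df4(x):
    """Upper-tail probability for 4 d.f. only: exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


critical_value = 13.28  # 1% point for 4 d.f., as quoted in the text
p_value = chi_square_upper_tail_df4(chi2)

print(round(chi2, 2))
print(chi2 > critical_value, p_value < 0.01)
```

Using unrounded expected frequencies gives a slightly larger statistic and the same conclusion. Either way the P-value is far below 0.01, so the critical-value rule and the P-value rule agree.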