Chapter Two (Estimation and Hypothesis Testing)
decisions and draw conclusions about the population from which the sample is drawn. In
statistics there are two ways through which inference can be made.
Statistical estimation
Statistical hypothesis testing
Both involve using sample statistics to make inferences about the population parameter.
ii. Interval estimation: It is the procedure that results in an interval of values as an
estimate for a parameter, that is, an interval that contains the likely values of the
parameter. It deals with identifying the upper and lower limits of a parameter.
Estimator and Estimate
An estimator is a rule, or random variable, that is used to approximate a population
parameter. An estimate is a particular value that the estimator takes for a given sample.
For example: The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator for the population mean and
that the parameter's value is within a certain range of the statistic's value? This range is
the confidence interval. A confidence interval is a specific interval estimate of a
parameter determined by using data obtained from a sample and the specific confidence
level of the estimate.
The confidence level is the probability that the interval estimate, constructed around
the statistic, will contain the value of the parameter. There are different
conditions to be considered to construct confidence intervals of the population mean, $\mu$.
Condition-1: If the population variance $\sigma^2$ is known and the population is normal,
whatever the value of the sample size:
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean
of a sample. Consider samples of size $n$ drawn from a population, whose mean is $\mu$ and
standard deviation is $\sigma$, with replacement and order important. The population can have
any frequency distribution. The sampling distribution of $\bar{X}$ will have a mean $\mu_{\bar{X}} = \mu$
and a standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and approaches a normal distribution as $n$ gets large.
This allows us to use the normal distribution curve for computing confidence intervals.
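The statement above can be checked numerically. The following is a minimal simulation sketch and not part of the original notes: the exponential population (mean 1, standard deviation 1), the sample size of 40, the number of repetitions and the function names are all illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(population_draw, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(repetitions)
    ]


def draw_exponential(rng):
    """Hypothetical, deliberately non-normal population: mu = 1, sigma = 1."""
    return rng.expovariate(1.0)


n = 40
means = simulate_sample_means(draw_exponential, n=n, repetitions=5000, seed=1)

# The mean of the sample means should be close to mu = 1,
# and their standard deviation close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

Even though the population here is skewed, the simulated sample means cluster around $\mu$ with a spread near $\sigma/\sqrt{n}$, which is what justifies the normal-curve intervals that follow.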
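$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
so that the interval estimate of $\mu$ has the form
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Where: $\alpha$ is the probability that the parameter lies outside the interval
$Z_{\alpha/2}$ is the value of the standard normal variable corresponding to an area of $\alpha/2$ to its right
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Hence the $(1-\alpha)100\%$ confidence interval for $\mu$ is
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$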
Note: When (as is often the case) we don't know the population standard deviation $\sigma$
and $n$ is large ($n \ge 30$), we can approximate it by the sample standard deviation $S$, and
obtain the following (good) approximation of the $(1-\alpha)100\%$ confidence interval for $\mu$:
$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
$Z_{\alpha/2}$ = the Z-value with an area of $\alpha/2$ to its right (obtained from a table).
Condition-2: If the population variance $\sigma^2$ is not known, $n$ is small ($n < 30$) and the
population is normal:
In most practical research, the standard deviation for the population of interest is not
known. In this case, the standard deviation $\sigma$ is replaced by the sample standard
deviation $S$, and $S/\sqrt{n}$ is known as the (estimated) standard error of the mean. Since $S$ is
only an estimate of the true value of the standard deviation, the standardized sample mean
no longer has the normal distribution it has when $\bar{X}$ is standardized with the true
standard deviation $\sigma/\sqrt{n}$. Instead, it follows a t distribution, which
becomes closer to the normal distribution as $n$ increases, since the sample standard
deviation approaches the true standard deviation for large $n$.
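$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with $n-1$ degrees of freedom.
- The value of $t_{\alpha/2}$ can be obtained from a table with an area of $\alpha/2$ to the right with
$n-1$ degrees of freedom.
Therefore, the $(1-\alpha)100\%$ confidence interval for $\mu$ when the population is normally
distributed and $\sigma$ is not known is given by:
$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$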
Example 2.1: A random sample of 900 workers showed an average height of 67 inches
with a standard deviation of 5 inches.
a) Find a 95% confidence interval of the mean height of all workers
b) Find a 99% confidence interval of the mean height of all workers
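Solution:
a) $\bar{X} = 67$, $S = 5$, $n = 900$
$(1-\alpha)100\% = 95\% \Rightarrow 1-\alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$Z_{\alpha/2} = Z_{0.025} = 1.96$, from the table.
The required interval is $\left(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right)$
$= (67 - 1.96 \times 5/30,\ 67 + 1.96 \times 5/30) = (66.67,\ 67.33)$
b) For 99% confidence, $\alpha = 0.01$, $\alpha/2 = 0.005$ and $Z_{0.005} = 2.58$ from the table, so the interval is
$(67 - 2.58 \times 5/30,\ 67 + 2.58 \times 5/30) = (66.57,\ 67.43)$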
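The arithmetic of Example 2.1 can be reproduced with a short script. This is a sketch under stated assumptions: the function name `z_confidence_interval` is hypothetical, and the Z-value is taken from Python's standard-library `statistics.NormalDist` rather than from a printed table, so it uses 2.5758 where the table gives 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sd, n, confidence):
    """Large-sample (or known-sigma) confidence interval for a population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z-value with area alpha/2 to its right
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Data of Example 2.1: mean 67 inches, S = 5 inches, n = 900 workers.
for level in (0.95, 0.99):
    low, high = z_confidence_interval(67, 5, 900, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

To two decimal places the script gives (66.67, 67.33) for 95% and (66.57, 67.43) for 99%. The 99% interval is wider because a higher confidence level requires a larger Z-value.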
Example 2.2: A Drug Company is testing a new drug which is supposed to reduce blood
pressure. From the six people who are used as subjects, it is found that the average drop
in blood pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95%
confidence interval for the mean change in pressure?
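Solution:
$\bar{X} = 2.28$, $S = 0.95$, $n = 6$
$(1-\alpha)100\% = 95\% \Rightarrow 1-\alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$t_{\alpha/2} = t_{0.025} = 2.571$, from the table, with $df = 5$.
The required interval is $\bar{X} \pm t_{\alpha/2}\,S/\sqrt{n} = 2.28 \pm 2.571\,(0.95/\sqrt{6}) = (1.28,\ 3.28)$.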
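The small-sample interval of Example 2.2 can be sketched in the same way. The standard library has no t distribution, so this sketch simply takes the table value 2.571 quoted in the solution as an input; the function name `t_confidence_interval` is an assumption made for illustration.

```python
from math import sqrt


def t_confidence_interval(sample_mean, s, n, t_critical):
    """Confidence interval for a mean when sigma is unknown and n is small.

    t_critical is the table value t_{alpha/2} with n - 1 degrees of freedom.
    """
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Data of Example 2.2: mean drop 2.28, S = 0.95, n = 6, t_{0.025} = 2.571 (df = 5).
low, high = t_confidence_interval(2.28, 0.95, 6, t_critical=2.571)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these inputs the interval is roughly 1.28 to 3.28 points.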
Example 2.3: Suppose we want to estimate a 95% confidence interval for the average
quarterly returns of all fixed-income funds in Ethiopia. We draw a sample of 100
observations and calculate the sample mean to be 0.05 and the standard deviation 0.03.
We assume that those returns are normally distributed with known variance.
Solution:
$\bar{X} = 0.05$, $\sigma = 0.03$, $n = 100$
$(1-\alpha)100\% = 95\% \Rightarrow 1-\alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$Z_{\alpha/2} = Z_{0.025} = 1.96$, from the table
$\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
$= 0.05 \pm 1.96\,(0.03/10)$
$= (0.04412,\ 0.05588)$
2.1.2. Point and Interval Estimation of the Population proportion:
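If $P$ represents the population proportion, then the sample proportion $\hat{P} = X/n$ provides
a good estimate of $P$. Therefore, the sample proportion $\hat{P}$ is the point estimator of the
population proportion. To construct the confidence interval for the proportion we use
the following conditions:
Conditions: If the population proportion is not too close to zero or one, and
the sample size is large (at least 30):
Under these conditions, the sampling distribution of $\hat{P} = X/n$ can be approximated by
a normal distribution with mean $P$ and standard deviation $\sqrt{P(1-P)/n}$.
To construct a confidence interval for $P$, we can now adopt the same argument
that was used in finding a confidence interval for $\mu$ and write:
$$P\left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}} < P < \hat{P} + Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}\right) = 1 - \alpha$$
Since $P$ is unknown, it is replaced by $\hat{P}$ under the square root, which gives the approximate $(1-\alpha)100\%$ confidence interval:
$$\left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)$$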
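Example 2.4: In a sample of 400 people who were questioned regarding their
participation in sports, 160 said that they did participate. Construct a 98% confidence
interval for $P$, the proportion of people in the population who participate in sports.
Solution:
Let $X$ be the number of people in the sample who participate in sports.
$\hat{P} = X/n = 160/400 = 0.4$
$\sqrt{\hat{P}(1-\hat{P})/n} = \sqrt{0.4(0.6)/400} = 0.0245$
For 98% confidence, $\alpha = 0.02$ and $Z_{\alpha/2} = Z_{0.01} = 2.33$ from the table.
As a result, an approximate 98% confidence interval for $P$ is given by:
$$\left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)$$
$= (0.4 - 2.33 \times 0.0245,\ 0.4 + 2.33 \times 0.0245) = (0.343,\ 0.457)$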
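The interval for a proportion can be computed with a few lines of code. The sketch below is an illustration rather than the notes' own method: the function name `proportion_confidence_interval` is hypothetical and the Z-value comes from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion P."""
    p_hat = successes / n                              # point estimate of P
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)         # Z times the standard error
    return p_hat - margin, p_hat + margin


# Data of Example 2.4: 160 of 400 people participate in sports, 98% confidence.
low, high = proportion_confidence_interval(160, 400, 0.98)
print(f"98% CI for P: ({low:.3f}, {high:.3f})")
```

For the data of Example 2.4 this gives an interval of about 0.343 to 0.457.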
b. Alternative hypothesis: Is a claim or statement about a population parameter that
will be true if the null hypothesis is false. It is a statistical hypothesis that states a
difference between a parameter and a specific value. It is usually
denoted by $H_1$ or $H_A$.
Decision               H0 is true          H0 is false
Do not reject H0       Correct decision    Type II error
Reject H0              Type I error        Correct decision
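To make the meaning of a Type I error concrete, the sketch below simulates many samples from a population in which H0 is true and counts how often a two-sided Z test rejects it anyway. Everything numeric here is a hypothetical choice for illustration (population mean 50, standard deviation 10, samples of size 25, significance level 0.05), as is the function name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of two-sided Z tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Sample from a normal population whose true mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1          # H0 is true but was rejected: Type I error
    return rejections / trials


rate = type_one_error_rate(mu0=50, sigma=10, n=25, alpha=0.05, trials=4000, seed=42)
print("observed Type I error rate:", rate)
```

The observed rate should come out close to the chosen significance level, which is why that level is read as the probability of rejecting a true null hypothesis.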
General steps in hypothesis testing:
1. State the appropriate hypotheses.
2. Select the level of significance, $\alpha$.
3. Select an appropriate test statistic.
4. Identify the critical region.
5. Compute the test value.
6. Make the decision.
7. Summarize the results.
2.2.1 Hypothesis tests about a population mean:
Suppose the assumed or hypothesized value of $\mu$ is denoted by $\mu_0$; then one can
formulate two sided (1) and one sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ VS $H_1: \mu \ne \mu_0$
2. $H_0: \mu = \mu_0$ VS $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ VS $H_1: \mu < \mu_0$
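Condition-1: If the population standard deviation, $\sigma$, is known, whatever the value of the
sample size is, and when sampling is from a normal distribution, the test statistic is:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
After specifying $\alpha$ we have the following test criteria corresponding to the above three
hypotheses.
Null hypothesis             Alternative hypothesis     Reject H0 if
$H_0: \mu = \mu_0$   VS     $H_1: \mu \ne \mu_0$       $|Z_{cal}| > Z_{\alpha/2}$
                            $H_1: \mu > \mu_0$         $Z_{cal} > Z_{\alpha}$
                            $H_1: \mu < \mu_0$         $Z_{cal} < -Z_{\alpha}$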
Note: When we don't know the population standard deviation $\sigma$ and $n$ is large ($n \ge 30$),
we can approximate it by the sample standard deviation $S$, and obtain the following test
statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0,1)$$
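The Z test for a mean (test statistic plus decision rule) can be sketched as one small function. The function name `z_test` and the labels `"two-sided"`, `"greater"` and `"less"` are assumptions introduced for this illustration; critical values come from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sd, n, alpha, alternative):
    """One-sample Z test; returns (Z_cal, True if H0 is rejected).

    sd is sigma if known, or S when n is large.
    alternative is 'two-sided', 'greater' or 'less'.
    """
    z_cal = (sample_mean - mu0) / (sd / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = z_cal > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z_cal < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_cal, reject


# Data of Example 2.8 (upper-tailed) and Example 2.7 (lower-tailed).
print(z_test(105, 100, 40, 400, alpha=0.05, alternative="greater"))
print(z_test(27, 29.4, 2, 25, alpha=0.01, alternative="less"))
```

With the figures of Example 2.8 the statistic is 2.5, and with those of Example 2.7 it is about -6; in both cases the statistic falls in the critical region, so H0 is rejected.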
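Null hypothesis             Alternative hypothesis     Reject H0 if
$H_0: \mu = \mu_0$   VS     $H_1: \mu \ne \mu_0$       $|t_{cal}| > t_{\alpha/2}$
                            $H_1: \mu > \mu_0$         $t_{cal} > t_{\alpha}$
                            $H_1: \mu < \mu_0$         $t_{cal} < -t_{\alpha}$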
Example 2.5: The Tele Co. provides telephone service in an area. According to the
company’s records, the average length of all calls placed was 12.5 minutes. A sample of
150 such calls placed through this Co. produced a mean length of 13 minutes with a
standard deviation of 2.6 minutes. Can you conclude that the mean length of all current
calls is different from 12.5 minutes? Use the 0.05 level of significance and assume that
the distribution of all calls is normal.
Solution:
Let $\mu$ = the population mean length of all current calls.
1. State the null and alternative hypothesis:
$H_0: \mu = 12.5$ (The mean length of all current calls is 12.5 minutes)
$H_1: \mu \ne 12.5$ (The mean length of all current calls is different from 12.5 minutes).
2. Select the level of significance, $\alpha = 0.05$ (given)
3. Select an appropriate test statistic:
Z-statistic is appropriate because the sample size is large
4. Identify the critical region:
Here we have two critical regions since we have a two tailed hypothesis. The
critical region is $|Z_{cal}| > Z_{0.025} = 1.96$; $(-1.96, 1.96)$ is the acceptance region.
5. Compute the test value:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{13 - 12.5}{2.6/\sqrt{150}} = \frac{0.5}{0.2123} \approx 2.36$$
6. Decision: Reject H0, since $Z_{cal}$ lies in the critical region.
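$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = 67.8$, $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = 3.01$, $n = 10$
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{67.8 - 66}{3.01/\sqrt{10}} = 1.891$$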
6. Decision: Reject H0, since $t_{cal}$ is not in the acceptance region.
7. Conclusion: At 5% level of significance, we have evidence to say that the average
height of an individual is more than 66 inches.
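The t statistic in this solution can be recomputed from the summary figures. The sketch below rests on assumptions that the surviving text does not state outright: that the test is upper-tailed, and that the table value is $t_{0.05} = 1.833$ for 9 degrees of freedom. The function name is hypothetical.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """Test statistic for a mean when sigma is unknown and n is small."""
    return (sample_mean - mu0) / (s / sqrt(n))


t_cal = t_statistic(67.8, 66, 3.01, 10)

# Assumed table value: area 0.05 to the right, df = n - 1 = 9.
t_table = 1.833

print(f"t_cal = {t_cal:.3f}")
print("reject H0 (upper-tailed):", t_cal > t_table)
```

Under these assumptions the statistic of about 1.891 exceeds the table value, which agrees with the decision to reject H0.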
Example 2.7: A national magazine claims that the average college student watches less
television than the national average. The national average is 29.4 hours per week with a
standard deviation of 2 hours. A sample of 25 college students has a mean of 27 hours.
Test the claim at $\alpha = 0.01$ and assume normality of the population.
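Solution:
1. State the null and alternative hypothesis:
$H_0: \mu = 29.4$ VS $H_1: \mu < 29.4$
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 29.4}{2/\sqrt{25}} = -6$$
6. Decision:
Reject H0, since $Z_{cal} = -6$ is less than $-Z_{0.01} = -2.33$ and so is not in the acceptance region
7. Conclusion: At 1% level of significance, there is evidence that the average college
student watches less television than the national average.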
Example 2.8: An authority from a district power station of the town told reporters
recently that the average monthly electric bill of households in AA is not more than Birr
100. A random sample of 400 households from the city produces a mean bill of Birr 105
with a standard deviation of Birr 40. Test the claim of the authority at 5% level of
significance.
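Solution:
1. State the null and alternative hypothesis:
$H_0: \mu \le 100$ (claim) VS $H_1: \mu > 100$
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{105 - 100}{40/\sqrt{400}} = 2.5$$
5. Decision:
Reject H0, since $Z_{cal} = 2.5$ exceeds $Z_{0.05} = 1.645$ and so is not in the acceptance region
6. Conclusion: At 5% level of significance there is evidence that the claim of the authority is not correct.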
2.2.2 Tests about a population proportion: P
The procedure to make tests of hypothesis about the population proportion $P$ for large
samples is similar in many respects to that for the population mean. The procedure includes the
same seven steps. Similarly, the test can be two-tailed or one-tailed. When the sample
size is large, the sample proportion $\hat{P}$ is approximately normally distributed with its
mean equal to $P$ and standard deviation equal to $\sqrt{P(1-P)/n}$. Hence, we use the normal
distribution to perform a test of hypothesis about the population proportion $P$ for a large
sample. The sample size is considered to be large when $n\hat{P}$ and $n(1-\hat{P})$ are both greater
than 5.
Suppose the assumed or hypothesized value of $P$ (parameter of the binomial
distribution) is denoted by $P_0$; then one can formulate two sided (1) and one sided (2 and
3) hypotheses as follows:
1. $H_0: P = P_0$ VS $H_1: P \ne P_0$
2. $H_0: P = P_0$ VS $H_1: P > P_0$
3. $H_0: P = P_0$ VS $H_1: P < P_0$
Null hypothesis          Alternative hypothesis     Reject H0 if
$H_0: P = P_0$   VS      $H_1: P \ne P_0$           $|Z_{cal}| > Z_{\alpha/2}$
                         $H_1: P > P_0$             $Z_{cal} > Z_{\alpha}$
                         $H_1: P < P_0$             $Z_{cal} < -Z_{\alpha}$
The test statistic is
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\dfrac{P_0(1-P_0)}{n}}} \sim N(0,1)$$
Example 2.9: A manufacturing company has submitted a claim that 90% of items
produced by a certain process are non defective. An improvement in the process is being
considered that they feel will lower the proportion of defectives below the current 10%. In
an experiment 100 items are produced with the new process and 5 are defective. Is this
evidence sufficient to conclude that the method has been improved? Use a 0.05 level of
significance.
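Solution:
Let $P$ be the proportion of non defective items.
1. $H_0: P = 0.90$ VS $H_1: P > 0.90$
2. $\alpha = 0.05$
3. Critical Region: $Z_{cal} > Z_{0.05} = 1.645$
4. Computation
$\hat{P} = X/n = 95/100 = 0.95$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1-P_0)/n}} = \frac{0.95 - 0.90}{\sqrt{0.9 \times 0.1/100}} = 1.67$$
5. Decision: Reject H0
6. Conclusion: At $\alpha = 0.05$ we have evidence to say that the improvement has
reduced the proportion of defectives.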
Example 2.10: The unemployment rate in a given country at a given period is believed to
be 10%. The government embarked on a series of projects to reduce unemployment. It
was of interest to determine whether unemployment decreased as a result of the projects.
A random sample of 500 people was chosen, and 48 of them were found to be
unemployed. Test at 1% level of significance whether the government projects reduced the
unemployment rate.
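1. $H_0: P = 0.1$ VS $H_1: P < 0.1$
2. $\alpha = 0.01$
3. Critical Region: $Z_{cal} < -Z_{\alpha}$
4. Critical value: $-Z_{\alpha} = -Z_{0.01} = -2.33$
5. Computation
$\hat{P} = X/n = 48/500 = 0.096$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1-P_0)/n}} = \frac{0.096 - 0.1}{\sqrt{0.1 \times 0.9/500}} = -0.3$$
$Z_{tab} = -Z_{\alpha} = -Z_{0.01} = -2.33$
6. Decision: Do not reject H0 since $Z_{cal} > Z_{tab}$
7. Conclusion: At 1% level of significance, there is no sufficient evidence that the
government projects reduced unemployment.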
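The one-tailed tests about a proportion in Examples 2.9 and 2.10 follow the same recipe, which can be sketched as below. The function name `proportion_z_test` and the labels `"less"` and `"greater"` are assumptions for illustration; the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha, alternative):
    """One-tailed large-sample Z test for a proportion.

    Returns (Z_cal, True if H0: P = p0 is rejected).
    alternative is 'less' (H1: P < p0) or 'greater' (H1: P > p0).
    """
    p_hat = successes / n
    z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p_hat
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    if alternative == "less":
        reject = z_cal < -z_alpha
    elif alternative == "greater":
        reject = z_cal > z_alpha
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return z_cal, reject


# Example 2.10: 48 unemployed out of 500, H1: P < 0.1, alpha = 0.01.
print(proportion_z_test(48, 500, 0.10, alpha=0.01, alternative="less"))
# Example 2.9: 95 non defective out of 100, H1: P > 0.9, alpha = 0.05.
print(proportion_z_test(95, 100, 0.90, alpha=0.05, alternative="greater"))
```

For Example 2.10 the statistic is about -0.30, which is not below the critical value, so H0 is not rejected. For Example 2.9 it is about 1.67, which exceeds the critical value, so H0 is rejected.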
Example 2.11: A large sample of 200 students from the students of a certain high school
is interviewed and 85 of them are found to use city bus. Can you conclude that at least
40% of the students use city bus? Use a 0.05 level of significance (Exercise)
$P_B$ is the probability that a member has attribute B.
         B1     B2     ...   Bj     ...   Bc     Total
A1       O11    O12    ...   O1j    ...   O1c    R1
A2       O21    O22    ...   O2j    ...   O2c    R2
.
.
Ai       Oi1    Oi2    ...   Oij    ...   Oic    Ri
.
.
Ar       Or1    Or2    ...   Orj    ...   Orc    Rr
Total    C1     C2     ...   Cj     ...   Cc     n
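- The chi-square procedure is used to test the hypothesis of independence of two
attributes.
- The test statistic is
$$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - e_{ij})^2}{e_{ij}}$$
where $O_{ij}$ is the observed frequency in cell $(i, j)$ and $e_{ij}$ is the expected frequency
given by
$$e_{ij} = \frac{R_i \times C_j}{n}$$
Where $R_i$ = the $i$th row total
$C_j$ = the $j$th column total.
$n$ = total number of observations.
Remarks:
$$\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij} = \sum_{i=1}^{r}\sum_{j=1}^{c} e_{ij}$$
The null hypothesis of independence is rejected if the calculated value of $\chi^2$
exceeds the tabulated value with degrees of freedom equal to $(c-1)(r-1)$.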
Example 8.12: A researcher is interested to assess the effect of literacy on family planning
use. Accordingly he collected data and tabulated the findings in the following manner.
Can we say there is association between educational status and family planning use?
FP Use     Educational Status           Total
           Illiterate     Literate
Yes        a = 63         b = 49        112
No         c = 15         d = 33        48
Total      78             82            160
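For a 2×2 table with cells labelled a, b, c and d as above, the chi-square statistic can also be written in a shortcut form that is algebraically equivalent to summing $(O - e)^2/e$ over the four cells. The worked equation below applies it to this table. It is a sketch added for illustration, and the comparison with the 5% table value 3.841 (1 degree of freedom) is an assumption, since the example does not state a significance level.

```latex
\chi^2_{cal} = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
             = \frac{160\,(63 \times 33 - 49 \times 15)^2}{112 \times 48 \times 78 \times 82}
             = \frac{160 \times 1344^2}{34\,384\,896}
             \approx 8.41
```

Since 8.41 exceeds 3.841, the hypothesis of independence would be rejected at that assumed level, suggesting an association between educational status and family planning use.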
Example 8.13: A geneticist took a random sample of 300 men to study whether there is
association between father and son regarding boldness. He obtained the following
results.
              Son
Father     Bold     Not
Bold       85       59
Not        65       91
Using α = 5%, test whether there is association between father and son regarding boldness.
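The chi-square test of independence described in this section can be sketched for a table of any size. The function name `chi_square_independence` is hypothetical, and the table value 3.841 (5% level, 1 degree of freedom) is typed in by hand because the standard library does not provide chi-square quantiles.

```python
def chi_square_independence(observed):
    """Chi-square statistic for an r x c table of observed counts.

    Returns (chi2, degrees_of_freedom, expected_table).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # e_ij = (i-th row total * j-th column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Table of Example 8.13 (rows: father, columns: son).
chi2, df, expected = chi_square_independence([[85, 59], [65, 91]])
critical_value = 3.841  # table value for alpha = 0.05 and df = 1
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical_value}")
```

For this table the expected frequencies are 72, 72, 78 and 78 and the statistic is about 9.03, which exceeds 3.841, so the hypothesis of independence is rejected at the 5% level.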
Example 8.14: A random sample of 200 men, all retired, was classified according to
education and number of children, as shown below.
                          Number of children
Education level           0-1     2-3     Over 3
Elementary                14      37      32
Secondary and above       31      59      27