Simple Random Sampling of Size $n$ out of a Finite Population of Size $N$ (SRSWR and SRSWOR)
A procedure for selecting a sample of size $n$ out of a finite population of size $N$, in which each of the possible distinct samples has an equal chance of being selected, is called random sampling or simple random sampling.
There are two distinct types of simple random sampling:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).
Under srswor, the number of possible samples is ${}^N C_n = {}^5 C_3 = 10$. They are as follows:
(1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 8), (1, 6, 9), (1, 8, 9), (3, 6, 8), (3, 6, 9), (3, 8, 9), (6, 8, 9).
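The list of srswor samples above can be generated mechanically. Here is a minimal Python sketch. It assumes the population values are 1, 3, 6, 8 and 9; these values are read off the listed samples and are not stated in the text.

```python
from itertools import combinations
from math import comb

# Assumption: population values inferred from the samples listed above.
population = [1, 3, 6, 8, 9]
n = 3

# Under srswor every unordered set of n distinct units is one possible sample.
samples = list(combinations(population, n))

print(len(samples), comb(len(population), n))  # 10 10
for sample in samples:
    print(sample)
```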
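$\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i$, population mean.
$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2$, population variance.
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar Y^2\right)$, population mean square.
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right)$, sample mean square.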
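The definitions above map directly onto Python's standard `statistics` module:
- `pvariance` uses divisor $N$, so it gives $\sigma^2$.
- `variance` uses divisor $N-1$, so it gives $S^2$ (or $s^2$ when applied to a sample).

This sketch uses the population 8, 3, 11, 4, 7 from the srswr example later in this section. It also checks the "computing" forms with $\sum Y_i^2$.

```python
from statistics import fmean, pvariance, variance

Y = [8, 3, 11, 4, 7]      # population used in the srswr example of this section
N = len(Y)

Y_bar = fmean(Y)          # population mean
sigma2 = pvariance(Y)     # population variance, divisor N
S2 = variance(Y)          # population mean square, divisor N - 1

# The computing forms given above should agree with the library values.
sum_sq = sum(y * y for y in Y)
sigma2_alt = sum_sq / N - Y_bar ** 2
S2_alt = (sum_sq - N * Y_bar ** 2) / (N - 1)

print(Y_bar, sigma2, S2)  # approximately 6.6, 8.24, 10.3
```

Applied to a sample instead of the whole population, `fmean` and `variance` return $\bar y$ and $s^2$.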
Theorem: In srswr, the sample mean $\bar y$ is an unbiased estimate of the population mean $\bar Y$, i.e. $E(\bar y) = \bar Y$, and its variance is $V(\bar y) = \frac{N-1}{nN}S^2 = \frac{\sigma^2}{n}$.
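Proof: It is immediately seen that
$E(\bar y) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i)$. By definition,
$E(y_i) = \sum_{i=1}^{N} Y_i \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar Y$, since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$.
Therefore,
$E(\bar y) = \frac{1}{n}\sum_{i=1}^{n}\bar Y = \bar Y$.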
Simple random sampling 9
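To obtain the variance, write $V(\bar y) = E(\bar y - \bar Y)^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{n}(y_i - \bar Y)\right]^2$.
The expansion of the squared sum can be justified by a particular case. Let $a_i = y_i - E(y_i)$, so that
$\left[\sum_{i=1}^{n}\{y_i - E(y_i)\}\right]^2 = \left(\sum_{i=1}^{n} a_i\right)^2 = (a_1 + a_2 + \cdots + a_n)^2$. Put $n = 3$; then
$(a_1 + a_2 + a_3)^2 = a_1^2 + a_2^2 + a_3^2 + a_1a_2 + a_1a_3 + a_2a_1 + a_2a_3 + a_3a_1 + a_3a_2 = \sum_{i=1}^{3} a_i^2 + \sum_{i \ne j} a_i a_j$.
Hence
$V(\bar y) = \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar Y)^2 + \frac{1}{n^2}\sum_{i \ne j} E[(y_i - \bar Y)(y_j - \bar Y)]$
$\quad = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j} Cov(y_i, y_j)$. (2.1)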
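Consider
$V(y_i) = E(y_i - \bar Y)^2 = \sum_{i=1}^{N}(Y_i - \bar Y)^2 \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \sigma^2$, since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$,
$\quad = \frac{N-1}{N}S^2$, since $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2$. (2.2)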
and
$Cov(y_i, y_j) = E[(y_i - \bar Y)(y_j - \bar Y)] = \sum_{i,j}^{N}(Y_i - \bar Y)(Y_j - \bar Y)\Pr(y_i = Y_i, y_j = Y_j)$.
In this case $y_j$ can take any one of the values $Y_1, \ldots, Y_N$ with probability $1/N$, irrespective of the value taken by $y_i$. This is because, with replacement, the composition of the population remains the same throughout the sampling process. In other words, for $i \ne j$, $y_i$ and $y_j$ are independent, so that
10 RU Khan
$\Pr(y_i = Y_i, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j) = \frac{1}{N}\cdot\frac{1}{N} = \frac{1}{N^2}$.
Hence,
$Cov(y_i, y_j) = \frac{1}{N^2}\sum_{i,j}^{N}(Y_i - \bar Y)(Y_j - \bar Y) = \frac{1}{N^2}\sum_{i=1}^{N}(Y_i - \bar Y)\sum_{j=1}^{N}(Y_j - \bar Y) = 0$. (2.3)
Substituting (2.2) and (2.3) in (2.1), we get
$V(\bar y) = \frac{1}{n^2}\sum_{i=1}^{n}\frac{N-1}{N}S^2 = \frac{N-1}{nN}S^2 = \frac{\sigma^2}{n}$.
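For the estimate $\hat Y = N\bar y$ of the population total $Y$,
$E(\hat Y) = E(N\bar y) = N E(\bar y) = N\bar Y = N\cdot\frac{1}{N}\sum_{i=1}^{N} Y_i = Y$
and $V(\hat Y) = V(N\bar y) = N^2 V(\bar y) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}S^2$.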
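Remarks:
i) The standard error (SE) of $\bar y$ is $SE(\bar y) = \sqrt{V(\bar y)} = \frac{\sigma}{\sqrt n} = S\sqrt{\frac{N-1}{nN}}$.
ii) The standard error of $\hat Y$ is $SE(\hat Y) = \sqrt{V(\hat Y)} = \frac{N\sigma}{\sqrt n} = S\sqrt{\frac{N(N-1)}{n}}$.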
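To find $E(s^2)$ under srswr, write $E(s^2) = \frac{1}{n-1}E\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right) = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - nE(\bar y^2)\right]$. Now
$V(y_i) = E(y_i^2) - \bar Y^2$, so that
$E(y_i^2) = \sigma^2 + \bar Y^2$, since $V(y_i) = \sigma^2 = (N-1)S^2/N$,
and
$V(\bar y) = E(\bar y^2) - \bar Y^2$, so that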
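$E(\bar y^2) = \frac{\sigma^2}{n} + \bar Y^2$, since $V(\bar y) = \frac{N-1}{nN}S^2 = \frac{\sigma^2}{n}$ for srswr.
Therefore,
$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}(\sigma^2 + \bar Y^2) - n\left(\frac{\sigma^2}{n} + \bar Y^2\right)\right] = \sigma^2 = \frac{N-1}{N}S^2$.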
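Example: In a population with $N = 5$, the values of $Y_i$ are 8, 3, 11, 4 and 7.
a) Calculate the population mean $\bar Y$, variance $\sigma^2$ and mean square $S^2$.
b) Enumerate all possible samples of size $n = 2$ by the with replacement method and verify that
i) the sample mean $\bar y$ is an unbiased estimate of $\bar Y$, i.e. $E(\bar y) = \bar Y$;
ii) $N\bar y$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar y) = Y$;
iii) $V(\bar y) = \frac{(N-1)S^2}{nN} = \frac{\sigma^2}{n}$, and
iv) $E(s^2) = \frac{N-1}{N}S^2 = \sigma^2$.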
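Solution:
a) We know that
$\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i = 6.6$, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2 = 8.24$ and $S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar Y^2\right) = 10.3$.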
b) Form a table for calculation as below:
Sample     $\bar y_i$   $\bar y_i^2$   $N\bar y_i$   $s_i^2$
(8, 8)        8.0        64.00        40.0        0.0
(8, 3)        5.5        30.25        27.5       12.5
(8, 11)       9.5        90.25        47.5        4.5
(8, 4)        6.0        36.00        30.0        8.0
(8, 7)        7.5        56.25        37.5        0.5
(3, 8)        5.5        30.25        27.5       12.5
(3, 3)        3.0         9.00        15.0        0.0
(3, 11)       7.0        49.00        35.0       32.0
(3, 4)        3.5        12.25        17.5        0.5
(3, 7)        5.0        25.00        25.0        8.0
(11, 8)       9.5        90.25        47.5        4.5
(11, 3)       7.0        49.00        35.0       32.0
(11, 11)     11.0       121.00        55.0        0.0
(11, 4)       7.5        56.25        37.5       24.5
(11, 7)       9.0        81.00        45.0        8.0
(4, 8)        6.0        36.00        30.0        8.0
(4, 3)        3.5        12.25        17.5        0.5
(4, 11)       7.5        56.25        37.5       24.5
(4, 4)        4.0        16.00        20.0        0.0
(4, 7)        5.5        30.25        27.5        4.5
(7, 8)        7.5        56.25        37.5        0.5
(7, 3)        5.0        25.00        25.0        8.0
(7, 11)       9.0        81.00        45.0        8.0
(7, 4)        5.5        30.25        27.5        4.5
(7, 7)        7.0        49.00        35.0        0.0
i) $E(\bar y) = \frac{1}{25}\sum \bar y_i = \frac{165}{25} = 6.6 = \bar Y$, where the sum runs over all $N^n = 25$ equally likely samples.
ii) $E(N\bar y) = \frac{1}{25}\sum N\bar y_i = 33$, or $E(N\bar y) = N E(\bar y) = 33 = Y$.
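iii) $V(\bar y) = \frac{1}{25}\sum \bar y_i^2 - \bar Y^2 = 4.12$.
Now,
$\frac{(N-1)S^2}{nN} = \frac{4 \times 10.3}{10} = 4.12$ and $\frac{\sigma^2}{n} = \frac{8.24}{2} = 4.12$; therefore,
$V(\bar y) = \frac{(N-1)S^2}{nN} = \frac{\sigma^2}{n} = 4.12$.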
iv) $E(s^2) = \frac{1}{25}\sum s_i^2 = \frac{206}{25} = 8.24$ (1a)
and $\frac{(N-1)S^2}{N} = \frac{4 \times 10.3}{5} = 8.24 = \sigma^2$. (2a)
In view of equations (1a) and (2a), we get
$E(s^2) = \frac{(N-1)S^2}{N} = \sigma^2 = 8.24$.
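The table can be checked by brute force. This sketch enumerates all $N^n = 25$ ordered srswr samples with `itertools.product` and averages over them. It makes no assumptions beyond the example's own data.

```python
from itertools import product
from statistics import fmean, pvariance, variance

Y = [8, 3, 11, 4, 7]
N, n = len(Y), 2

# srswr: all N**n ordered samples are equally likely.
samples = list(product(Y, repeat=n))
means = [fmean(s) for s in samples]            # sample means ybar_i
mean_squares = [variance(s) for s in samples]  # sample mean squares s_i^2 (divisor n - 1)

Y_bar, sigma2, S2 = fmean(Y), pvariance(Y), variance(Y)

E_ybar = fmean(means)
E_total = fmean([N * m for m in means])
V_ybar = fmean([(m - Y_bar) ** 2 for m in means])
E_s2 = fmean(mean_squares)

print(len(samples), E_ybar, E_total)               # 25 samples, about 6.6 and 33
print(V_ybar, (N - 1) * S2 / (n * N), sigma2 / n)  # all about 4.12
print(E_s2, (N - 1) * S2 / N, sigma2)              # all about 8.24
```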
Under srswor, proceeding as in srswr,
$E(\bar y) = \bar Y$, and $V(\bar y) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j} Cov(y_i, y_j)$, (2.4)
where $V(y_i) = \frac{N-1}{N}S^2$ for each $i$. (2.5)
Consider
$Cov(y_i, y_j) = E[(y_i - \bar Y)(y_j - \bar Y)] = \sum_{i,j}^{N}(Y_i - \bar Y)(Y_j - \bar Y)\Pr(y_i = Y_i, y_j = Y_j)$.
In this case $y_j$ can take any one of the values except $Y_i$ (the value already taken by $y_i$), each with equal probability $\frac{1}{N-1}$, so that for $i \ne j$,
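$\Pr(y_i = Y_i, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j \mid y_i = Y_i) = \frac{1}{N}\cdot\frac{1}{N-1}$.
Hence,
$Cov(y_i, y_j) = \frac{1}{N(N-1)}\sum_{i \ne j}^{N}(Y_i - \bar Y)(Y_j - \bar Y)$
$\quad = \frac{1}{N(N-1)}\sum_{i=1}^{N}(Y_i - \bar Y)\left[\sum_{j=1}^{N}(Y_j - \bar Y) - (Y_i - \bar Y)\right]$
$\quad = \frac{1}{N(N-1)}\left[\sum_{i=1}^{N}(Y_i - \bar Y)\sum_{j=1}^{N}(Y_j - \bar Y) - \sum_{i=1}^{N}(Y_i - \bar Y)^2\right]$
$\quad = -\frac{1}{N(N-1)}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = -\frac{S^2}{N}$. (2.6)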
Substituting (2.5) and (2.6) in (2.4), we get
$V(\bar y) = \frac{1}{n^2}\cdot n\frac{(N-1)S^2}{N} - \frac{1}{n^2}\cdot n(n-1)\frac{S^2}{N} = \left[\frac{N-1}{nN} - \frac{n-1}{nN}\right]S^2$
$\quad = \frac{N-n}{nN}S^2 = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = (1-f)\frac{S^2}{n}$,
where $f = n/N$ is called the sampling fraction and the factor $(1-f)$ is called the finite population correction (fpc). If the population size $N$ is very large, or if $n$ is small compared with $N$, then $f = n/N \to 0$ and consequently fpc $\to 1$.
Alternative expression:
$V(\bar y) = \frac{N-n}{nN}S^2 = \left(\frac{1}{n} - \frac{1}{N}\right)S^2$.
Corollary: $\hat Y = N\bar y$ is an unbiased estimate of the population total $Y$, with variance $V(\hat Y) = N^2(1-f)S^2/n$.
Proof:
By definition,
$E(\hat Y) = E(N\bar y) = N E(\bar y) = N\bar Y = N\cdot\frac{1}{N}\sum_{i=1}^{N} Y_i = Y$
and
$V(\hat Y) = V(N\bar y) = N^2\frac{N-n}{nN}S^2 = N^2(1-f)\frac{S^2}{n}$.
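Remarks
i) The standard error of $\bar y$ is $SE(\bar y) = S\sqrt{\frac{N-n}{nN}} = \frac{S}{\sqrt n}\sqrt{1-f} = S\sqrt{\frac{1}{n} - \frac{1}{N}}$.
ii) The standard error of $\hat Y$ is $SE(\hat Y) = NS\sqrt{\frac{N-n}{nN}} = \frac{NS}{\sqrt n}\sqrt{1-f} = NS\sqrt{\frac{1}{n} - \frac{1}{N}}$.
For a large population, fpc $= (1-f) \to 1$; then
i) $V(\bar y) = \frac{S^2}{n}$ and $SE(\bar y) = \frac{S}{\sqrt n}$;
ii) $V(\hat Y) = \frac{N^2S^2}{n}$ and $SE(\hat Y) = \frac{NS}{\sqrt n}$.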
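To find $E(s^2)$ under srswor, note that
$V(y_i) = E(y_i^2) - \bar Y^2$, so that
$E(y_i^2) = \frac{N-1}{N}S^2 + \bar Y^2$, since $V(y_i) = (N-1)S^2/N$,
and $V(\bar y) = E(\bar y^2) - \bar Y^2$, so that
$E(\bar y^2) = \frac{N-n}{nN}S^2 + \bar Y^2$, since $V(\bar y) = \frac{N-n}{nN}S^2$ for srswor.
Therefore,
$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{N-1}{N}S^2 + \bar Y^2\right) - n\left(\frac{N-n}{nN}S^2 + \bar Y^2\right)\right]$
$\quad = \frac{1}{n-1}\cdot\frac{S^2}{N}[n(N-1) - (N-n)] = \frac{1}{n-1}\cdot\frac{S^2}{N}(n-1)N = S^2$.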
Example: A random sample of $n = 2$ households was drawn from a small colony of $N = 5$ households having the following monthly incomes:
Households:                      1     2     3     4     5
Income (in thousand rupees):     8     6.5   7.5   7     6
a) Calculate the population mean $\bar Y$, variance $\sigma^2$ and mean square $S^2$.
b) Enumerate all possible samples of size $n = 2$ by the without replacement method and verify that
i) the sample mean $\bar y$ is an unbiased estimate of the population mean $\bar Y$, i.e. $E(\bar y) = \bar Y$;
ii) $N\bar y$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar y) = Y$;
iii) $V(\bar y) = \frac{(N-n)S^2}{nN}$, and
iv) $E(s^2) = S^2$.
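Solution:
a) We know that
$\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i = 7$, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2 = 0.5$, and $S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar Y^2\right) = 0.625$.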
b) Form a table for calculation as below:
Sample       $\bar y_i$   $\bar y_i^2$   $N\bar y_i$   $s_i^2$
(8, 6.5)        7.25       52.563       36.25       1.125
(8, 7.5)        7.75       60.063       38.75       0.125
(8, 7)          7.50       56.250       37.50       0.500
(8, 6)          7.00       49.000       35.00       2.000
(6.5, 7.5)      7.00       49.000       35.00       0.500
(6.5, 7)        6.75       45.563       33.75       0.125
(6.5, 6)        6.25       39.063       31.25       0.125
(7.5, 7)        7.25       52.563       36.25       0.125
(7.5, 6)        6.75       45.563       33.75       1.125
(7, 6)          6.50       42.250       32.50       0.500
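i) $E(\bar y) = \frac{1}{10}\sum \bar y_i = 7 = \bar Y$, where the sum runs over all ${}^N C_n = 10$ possible samples.
ii) $E(N\bar y) = \frac{1}{10}\sum N\bar y_i = 35$, or
$E(N\bar y) = N E(\bar y) = 35 = Y$.
iii) $V(\bar y) = \frac{1}{10}\sum(\bar y_i - \bar Y)^2 = \frac{1}{10}\sum \bar y_i^2 - \bar Y^2 = 0.1875$, and $\frac{(N-n)S^2}{nN} = \frac{3 \times 0.625}{10} = 0.1875$.
Therefore,
$V(\bar y) = \frac{(N-n)S^2}{nN} = 0.1875$.
iv) $E(s^2) = \frac{1}{10}\sum s_i^2 = 0.625 = S^2$.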
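The same brute-force check works for srswor, using `itertools.combinations` to list the ${}^5C_2 = 10$ equally likely unordered samples. This is a sketch on the example's own incomes (in thousand rupees).

```python
from itertools import combinations
from statistics import fmean, variance

Y = [8, 6.5, 7.5, 7, 6]   # monthly incomes, thousand rupees
N, n = len(Y), 2

# srswor: all NCn unordered samples are equally likely.
samples = list(combinations(Y, n))
means = [fmean(s) for s in samples]
mean_squares = [variance(s) for s in samples]

Y_bar, S2 = fmean(Y), variance(Y)

E_ybar = fmean(means)
V_ybar = fmean([(m - Y_bar) ** 2 for m in means])
E_s2 = fmean(mean_squares)

print(len(samples), E_ybar, N * E_ybar)   # 10 samples, about 7 and 35
print(V_ybar, (N - n) * S2 / (n * N))     # both about 0.1875
print(E_s2, S2)                           # both about 0.625
```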
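Under srswor, $V(\bar y) = \frac{N-n}{nN}S^2$, (2.7)
and under srswr, $V(\bar y) = \frac{\sigma^2}{n} = \frac{N-1}{nN}S^2$. (2.8)
Comparing (2.7) and (2.8), we note that $(N-1) \ge (N-n)$, with equality only when $n = 1$, so that
$\frac{N-n}{nN}S^2 \le \frac{N-1}{nN}S^2$,
i.e. srswor is always at least as precise as srswr.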
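Example: In a population with $N = 5$, the values are 2, 4, 6, 8 and 10. For an srs of size $n = 3$, show that $V(\bar y)_{srswor} < V(\bar y)_{srswr}$.
Solution: We know that
$V(\bar y)_{srswor} = \frac{N-n}{nN}S^2$, and $V(\bar y)_{srswr} = \frac{N-1}{nN}S^2$,
where $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = 10$ and $\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i = 6$.
Thus,
$V(\bar y)_{srswor} = \frac{2 \times 10}{15} = \frac{4}{3}$, $V(\bar y)_{srswr} = \frac{4 \times 10}{15} = \frac{8}{3}$, and therefore
$V(\bar y)_{srswor} < V(\bar y)_{srswr}$.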
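Both variances can be confirmed exactly by enumerating every equally likely sample under each scheme. The sketch uses `fractions.Fraction` so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations, product

Y = [2, 4, 6, 8, 10]
n = 3
Y_bar = Fraction(sum(Y), len(Y))

def exact_variance_of_mean(samples):
    """Average of (ybar - Ybar)^2 over equally likely samples, in exact arithmetic."""
    devs = [(Fraction(sum(s), n) - Y_bar) ** 2 for s in samples]
    return sum(devs) / len(devs)

v_wor = exact_variance_of_mean(list(combinations(Y, n)))   # 10 unordered samples
v_wr = exact_variance_of_mean(list(product(Y, repeat=n)))  # 125 ordered samples
print(v_wor, v_wr)  # 4/3 8/3
```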
Theorem: Let an srswor sample of size $n$ be drawn from a population of size $N$. Let $T = \sum_{i=1}^{n}\lambda_i y_i$ be a class of linear estimators of $\bar Y$, where the $\lambda_i$'s are coefficients attached to the sample values. Then:
i) $T$ is a linear unbiased estimator of $\bar Y$ if and only if $\sum_{i=1}^{n}\lambda_i = 1$.
ii) The sample mean $\bar y$ is the best linear unbiased estimate.
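Proof:
i) $E(T) = E\left(\sum_{i=1}^{n}\lambda_i y_i\right) = \sum_{i=1}^{n}\lambda_i E(y_i) = \bar Y\sum_{i=1}^{n}\lambda_i = \bar Y$ iff $\sum_{i=1}^{n}\lambda_i = 1$.
ii) Under $\sum_{i=1}^{n}\lambda_i = 1$,
$V(T) = E\left(\sum_{i=1}^{n}\lambda_i y_i - \bar Y\right)^2 = E\left(\sum_{i=1}^{n}\lambda_i y_i\right)^2 - 2\bar Y E\left(\sum_{i=1}^{n}\lambda_i y_i\right) + \bar Y^2 = E\left(\sum_{i=1}^{n}\lambda_i y_i\right)^2 - \bar Y^2$.
Consider,
$E\left(\sum_{i=1}^{n}\lambda_i y_i\right)^2 = E\left[\sum_{i=1}^{n}\lambda_i^2 y_i^2 + \sum_{i \ne j}\lambda_i\lambda_j y_i y_j\right] = \sum_{i=1}^{n}\lambda_i^2 E(y_i^2) + \sum_{i \ne j}\lambda_i\lambda_j E(y_i y_j)$.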
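Now
$E(y_i^2) = \frac{1}{N}\sum_{i=1}^{N} Y_i^2$; note that
$(N-1)S^2 = \sum_{i=1}^{N}(Y_i - \bar Y)^2 = \sum_{i=1}^{N} Y_i^2 - N\bar Y^2$, or $\sum_{i=1}^{N} Y_i^2 = (N-1)S^2 + N\bar Y^2$.
Thus,
$E(y_i^2) = \frac{N-1}{N}S^2 + \bar Y^2$
and
$E(y_i y_j) = \sum_{i \ne j}^{N} Y_i \Pr(i)\, Y_j \Pr(j \mid i) = \frac{1}{N}\cdot\frac{1}{N-1}\sum_{i \ne j}^{N} Y_i Y_j$.
Note that
$\left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j = (N-1)S^2 + N\bar Y^2 + \sum_{i \ne j}^{N} Y_i Y_j$, so that
$\sum_{i \ne j}^{N} Y_i Y_j = N^2\bar Y^2 - (N-1)S^2 - N\bar Y^2$.
Hence,
$E(y_i y_j) = \frac{1}{N(N-1)}[N^2\bar Y^2 - (N-1)S^2 - N\bar Y^2] = \bar Y^2 - \frac{S^2}{N}$.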
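and
$E\left(\sum_{i=1}^{n}\lambda_i y_i\right)^2 = \sum_{i=1}^{n}\lambda_i^2\left[\frac{N-1}{N}S^2 + \bar Y^2\right] + \sum_{i \ne j}\lambda_i\lambda_j\left[\bar Y^2 - \frac{S^2}{N}\right]$
$\quad = S^2\sum_{i=1}^{n}\lambda_i^2 - \frac{S^2}{N}\left[\sum_{i=1}^{n}\lambda_i^2 + \sum_{i \ne j}\lambda_i\lambda_j\right] + \bar Y^2\left[\sum_{i=1}^{n}\lambda_i^2 + \sum_{i \ne j}\lambda_i\lambda_j\right]$
$\quad = S^2\sum_{i=1}^{n}\lambda_i^2 - \frac{S^2}{N} + \bar Y^2$, since $\sum_{i=1}^{n}\lambda_i^2 + \sum_{i \ne j}\lambda_i\lambda_j = \left(\sum_{i=1}^{n}\lambda_i\right)^2 = 1$.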
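Thus,
$V(T) = S^2\sum_{i=1}^{n}\lambda_i^2 - \frac{S^2}{N}$. Since $\sum_{i=1}^{n}\lambda_i^2 = \sum_{i=1}^{n}\left(\lambda_i - \frac{1}{n}\right)^2 + \frac{1}{n}$ under the condition $\sum_{i=1}^{n}\lambda_i = 1$,
$V(T) = S^2\left[\sum_{i=1}^{n}\left(\lambda_i - \frac{1}{n}\right)^2 + \frac{1}{n} - \frac{1}{N}\right]$.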
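Therefore $V(T)$ will be minimum if $\sum_{i=1}^{n}\left(\lambda_i - \frac{1}{n}\right)^2 = 0$, i.e. $\lambda_i = \frac{1}{n}$ for all $i = 1, 2, \ldots, n$. This gives $T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$.
OR
Differentiating the variance function with respect to $\lambda_i$ and equating to zero, we get
$\frac{\partial V(T)}{\partial \lambda_i} = 2S^2\left(\lambda_i - \frac{1}{n}\right) = 0 \Rightarrow \lambda_i = \frac{1}{n}$ for all $i = 1, 2, \ldots, n$, and $T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$.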
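The formula $V(T) = S^2\left[\sum(\lambda_i - 1/n)^2 + 1/n - 1/N\right]$ can be checked exactly. The sketch enumerates every ordered srswor draw, because $y_i$ here denotes the value obtained on the $i$th draw. It reuses the population 2, 4, 6, 8, 10 from the previous example. The unequal weights are a hypothetical choice that still sums to 1.

```python
from fractions import Fraction
from itertools import permutations

Y = [2, 4, 6, 8, 10]   # population from the previous example
N, n = len(Y), 3
Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)   # population mean square

def exact_var_T(weights):
    """V(T) for T = sum(w_i * y_i), where y_i is the value on the i-th draw.
    Under srswor every ordered sequence of n distinct units is equally likely."""
    draws = list(permutations(Y, n))
    devs = [(sum(w * y for w, y in zip(weights, d)) - Y_bar) ** 2 for d in draws]
    return sum(devs) / len(devs)

def formula_var_T(weights):
    """S^2 [ sum (w_i - 1/n)^2 + 1/n - 1/N ], valid when the weights sum to 1."""
    spread = sum((w - Fraction(1, n)) ** 2 for w in weights)
    return S2 * (spread + Fraction(1, n) - Fraction(1, N))

equal = [Fraction(1, n)] * n                                # gives T = ybar
unequal = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical weights

for w in (equal, unequal):
    print(exact_var_T(w), formula_var_T(w))
```

The equal weights reproduce the srswor variance $\frac{N-n}{nN}S^2$. Any departure from $1/n$ adds $S^2\sum(\lambda_i - 1/n)^2$.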
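For estimating a proportion, let $Y_i = 1$ if the $i$th population unit possesses the character C and $Y_i = 0$ otherwise. Then $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{N} Y_i^2 = NP = A$, the number of units possessing C, and $Q = 1 - P$.
Population mean $\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{NP}{N} = P$.
Population variance $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - P)^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - P^2 = \frac{NP}{N} - P^2 = PQ$.
Mean square of population $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - P)^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - NP^2\right) = \frac{NP - NP^2}{N-1} = \frac{NPQ}{N-1}$.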
Similarly, assign to the $i$th member of the sample the value $y_i$, which is equal to 1 if this member possesses the character C and 0 otherwise; then
Sample total $\sum_{i=1}^{n} y_i = np = a$, and sample mean $\frac{1}{n}\sum_{i=1}^{n} y_i = \frac{a}{n} = p$.
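Mean square for sample $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - p)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - np^2\right) = \frac{1}{n-1}(np - np^2) = \frac{npq}{n-1}$, where $q = 1 - p$.
Under srswr, $V(\hat A) = V(\hat Y) = N^2 V(\bar y) = \frac{N^2\sigma^2}{n} = \frac{N^2 PQ}{n}$.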
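Theorem: $\hat V(p) = v(p) = \frac{pq}{n-1}$ is an unbiased estimate of $V(p) = \frac{PQ}{n}$.
Proof: $E[\hat V(p)] = E\left(\frac{pq}{n-1}\right) = \frac{1}{n}E\left(\frac{npq}{n-1}\right) = \frac{1}{n}E(s^2) = \frac{PQ}{n}$, since in srswr $E(s^2) = \sigma^2 = PQ$ and $s^2 = \frac{npq}{n-1}$.
Corollary: $\hat V(\hat A) = \hat V(Np) = N^2\hat V(p) = N^2\frac{pq}{n-1}$ is an unbiased estimate of $V(\hat A) = N^2\frac{PQ}{n}$.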
Remarks
i) Under srswr, the standard error (SE) of $p$ is $SE(p) = \sqrt{PQ/n}$.
Under srswor, the results are:
i) $E(p) = E(\bar y) = \bar Y = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and $V(p) = V(\bar y) = \frac{N-n}{nN}S^2 = \frac{N-n}{nN}\cdot\frac{NPQ}{N-1} = \frac{N-n}{N-1}\cdot\frac{PQ}{n}$.
ii) $E(\hat A) = E(Np) = N E(p) = NP = A$, so that $Np$ is an unbiased estimate of $NP$, and
$V(\hat A) = V(\hat Y) = N^2 V(\bar y) = N^2\frac{N-n}{nN}S^2 = N^2\frac{N-n}{nN}\cdot\frac{NPQ}{N-1} = N^2\frac{N-n}{N-1}\cdot\frac{PQ}{n}$.
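Theorem: $\hat V(p) = v(p) = \frac{N-n}{N}\cdot\frac{pq}{n-1}$ is an unbiased estimate of $V(p) = \frac{N-n}{N-1}\cdot\frac{PQ}{n}$.
Proof: $E[\hat V(p)] = E\left(\frac{N-n}{N}\cdot\frac{pq}{n-1}\right) = \frac{N-n}{nN}E\left(\frac{npq}{n-1}\right) = \frac{N-n}{nN}E(s^2) = \frac{N-n}{nN}\cdot\frac{NPQ}{N-1} = \frac{N-n}{N-1}\cdot\frac{PQ}{n}$, since in srswor $E(s^2) = S^2 = \frac{NPQ}{N-1}$ and $s^2 = \frac{npq}{n-1}$.
Corollary: $\hat V(\hat A) = \hat V(Np) = N^2\hat V(p) = N(N-n)\frac{pq}{n-1}$ is an unbiased estimate of
$V(\hat A) = N^2\frac{N-n}{N-1}\cdot\frac{PQ}{n}$.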
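Unbiasedness of $v(p)$ under srswor can be confirmed by exact enumeration on a small 0/1 population. The population below (2 of 6 units possess C, samples of size 3) is a hypothetical example, not taken from the notes.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 0/1 population: A = 2 of N = 6 units possess the character C.
Y = [1, 1, 0, 0, 0, 0]
N, n = len(Y), 3
P = Fraction(sum(Y), N)
Q = 1 - P

sample_props, v_estimates = [], []
for s in combinations(range(N), n):        # every srswor sample of unit labels
    p = Fraction(sum(Y[i] for i in s), n)
    q = 1 - p
    sample_props.append(p)
    v_estimates.append(Fraction(N - n, N) * p * q / (n - 1))

E_p = sum(sample_props) / len(sample_props)
V_p = sum((p - P) ** 2 for p in sample_props) / len(sample_props)
E_v = sum(v_estimates) / len(v_estimates)
V_formula = Fraction(N - n, N - 1) * P * Q / n

print(E_p, V_p, E_v, V_formula)
```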
Remarks
The standard error (SE) of $p$ is $SE(p) = \sqrt{\frac{N-n}{N-1}\cdot\frac{PQ}{n}}$, and the standard error of $\hat A$ is $SE(\hat A) = N\sqrt{\frac{N-n}{N-1}\cdot\frac{PQ}{n}}$.
Example: A list of 3000 voters of a ward in a city was examined to measure the accuracy of the ages of individuals. A random sample of 300 names was taken, which revealed that 51 citizens were shown with wrong ages. Estimate the total number of voters having a wrong description of age in the list, and estimate its standard error.
Solution: Given $N = 3000$, $n = 300$, $a = 51$, and $p = a/n = 0.17$; then $\hat A = Np = 510$.
i) If srswr is considered, the estimate of the standard error is
$Est[SE(\hat A)] = N\sqrt{\frac{pq}{n-1}} \approx 65.17 \approx 65$.
ii) If srswor is considered, the estimate of the standard error is
$Est[SE(\hat A)] = \sqrt{N(N-n)\frac{pq}{n-1}} \approx 61.83 \approx 62$.
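The two standard-error estimates differ only by the finite population correction. The sketch below reproduces them from the srswr and srswor corollaries above using only the example's data.

```python
from math import sqrt

N, n, a = 3000, 300, 51
p = a / n
q = 1 - p

A_hat = N * p                                  # estimated number of voters with a wrong age
se_wr = N * sqrt(p * q / (n - 1))              # srswr: N sqrt(pq / (n - 1))
se_wor = sqrt(N * (N - n) * p * q / (n - 1))   # srswor: sqrt(N (N - n) pq / (n - 1))

print(round(A_hat), round(se_wr, 2), round(se_wor, 2))  # 510 65.17 61.83
```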
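confidence limits. For instance, when $\alpha = 0.05$ the degree of confidence is 0.95 and we get a 95% confidence interval.
$\Pr\left[-Z_{\alpha/2} \le \frac{\bar y - \bar Y}{SE(\bar y)} \le Z_{\alpha/2}\right] = 1 - \alpha$
or $\Pr[-Z_{\alpha/2}SE(\bar y) \le \bar y - \bar Y \le Z_{\alpha/2}SE(\bar y)] = 1 - \alpha$
or $\Pr[\bar y - Z_{\alpha/2}SE(\bar y) \le \bar Y \le \bar y + Z_{\alpha/2}SE(\bar y)] = 1 - \alpha$.
Thus, with probability $(1 - \alpha)$, the interval $[\bar y - Z_{\alpha/2}SE(\bar y),\ \bar y + Z_{\alpha/2}SE(\bar y)]$ will include $\bar Y$. For example, under srswr the interval $\bar y \pm Z_{\alpha/2}\,\sigma/\sqrt n$ will include $\bar Y$.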
2. Confidence limits for population total: Along the same lines, we see that
$\Pr[N\bar y - Z_{\alpha/2}SE(\hat Y) \le Y \le N\bar y + Z_{\alpha/2}SE(\hat Y)] = 1 - \alpha$.
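Since the sample size is small and the population variance is unknown, the interval is defined as
$\bar y - t_{\alpha/2,\,n-1}\sqrt{\frac{N-n}{nN}}\,S \le \bar Y \le \bar y + t_{\alpha/2,\,n-1}\sqrt{\frac{N-n}{nN}}\,S$. This reduces to $\bar y \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt n}$ when the population size is very large.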
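Since $S^2$ is unknown, it can be replaced by its estimator $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right) = 44.25$.
Therefore,
Upper confidence limit $= 7.125 + 2.131 \times \frac{6.652}{\sqrt{16}} = 10.668853 \approx 11$, and
Lower confidence limit $= 7.125 - 2.131 \times \frac{6.652}{\sqrt{16}} = 3.58 \approx 4$.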
Example: In a mess, it was observed that leftovers cost a lot. A survey was conducted to find the optimum quantity for each item. A random sample of 10 inmates showed that they took 4, 5, 2, 3, 1, 7, 2, 3, 4, 4 slices of bread at breakfast. If 120 breakfasts are to be served every day, estimate the number of slices required every day. Also obtain a 95% confidence interval for it.
Solution: Given $N = 120$, $n = 10$, and $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i = 3.5$, the estimated number of slices required is $\hat Y = N\bar y = 420$.
Since the sample size is small and the population variance is unknown, the confidence limits are
$N\bar y \pm t_{\alpha/2,\,n-1}\,NS\sqrt{(1-f)/n}$,
with $S^2$ replaced by $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right) = 2.94444$, i.e. $s \approx 1.716$.
Hence,
Upper confidence limit $= 420 + 2.262 \times 120 \times 1.716 \times \sqrt{\left(1 - \frac{10}{120}\right)/10} = 561.02517 \approx 561$
and
Lower confidence limit $= 420 - 141.02517 = 278.97483 \approx 279$.
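The interval can be recomputed directly from the ten observations. In this sketch, $t_{0.025,9} = 2.262$ is hard-coded from the t table, as in the text. Because $s$ is not rounded to 1.716, the limits differ from the text only in the second decimal.

```python
from math import sqrt
from statistics import fmean, stdev

slices = [4, 5, 2, 3, 1, 7, 2, 3, 4, 4]
N, n = 120, len(slices)
t_crit = 2.262                 # t_{0.025, 9}, taken from t tables as in the text

y_bar = fmean(slices)
s = stdev(slices)              # sqrt of the sample mean square (divisor n - 1)
f = n / N                      # sampling fraction

total_hat = N * y_bar
margin = t_crit * N * s * sqrt((1 - f) / n)
lower, upper = total_hat - margin, total_hat + margin
print(total_hat, round(lower, 2), round(upper, 2))
```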
Example: 100 villages were selected under srswor from a list of 1521 villages. It was found that 19 of the selected villages were illegally occupied by some landlords. Estimate the number of such villages occupied by landlords out of the total of 1521 villages, and obtain a 95% confidence interval.
Solution: Given $N = 1521$, $n = 100$, and $a = 19$; then $p = 0.19$.
The estimate of the number of villages illegally occupied by landlords in the population is
$\hat A = Np = 288.99 \approx 289$.
Since the sample size is large ($n > 30$) and the variance of the population proportion is unknown, the confidence limits are
$Np \pm Z_{\alpha/2}SE(\hat A)$. Here $SE(\hat A)$ is unknown, so it is replaced by its estimate
$\sqrt{N(N-n)\frac{pq}{n-1}} = 57.964667$.
Thus,
Upper confidence limit $= 289 + 1.96 \times 57.964667 = 402.6107 \approx 403$, and
Lower confidence limit $= 289 - 1.96 \times 57.964667 = 175.3893 \approx 175$.
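This sketch recomputes the limits with $Z_{0.025} = 1.96$. It keeps $\hat A = 288.99$ unrounded, so the limits can differ from the values above in the second decimal.

```python
from math import sqrt

N, n, a = 1521, 100, 19
z = 1.96
p = a / n
q = 1 - p

A_hat = N * p
# Square root of the unbiased srswor variance estimate N (N - n) pq / (n - 1)
se_hat = sqrt(N * (N - n) * p * q / (n - 1))
lower, upper = A_hat - z * se_hat, A_hat + z * se_hat
print(round(A_hat, 2), round(se_hat, 4), round(lower, 2), round(upper, 2))
```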
Example: A simple random sample of 30 households was drawn without replacement from a city area containing 14848 households. The numbers of persons per household in the sample were as follows: 5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3 and 4. Estimate the average and total number of people in the area, and compute the probability that these estimates are within 10% of the true values.
Solution: Given $N = 14848$ and $n = 30$, we have $\bar y = \frac{105}{30} = 3.5$, and the
estimate of the population total is $\hat Y = N\bar y = 14848 \times \frac{105}{30} = 51968$. Assuming that the population values are normally distributed, $N\bar y \sim N\left(Y,\ \frac{N^2(1-f)S^2}{n}\right)$.
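The rest of the solution is not shown here. The sketch below carries out the stated normal-approximation approach under two assumptions, labelled in the code:
- The unknown $S^2$ is replaced by the sample mean square $s^2$.
- The 10% margin is computed from $\bar y$ in place of the unknown $\bar Y$.

Because $N\bar y$ has the same relative error as $\bar y$, one probability serves for both the average and the total.

```python
from math import erf, sqrt
from statistics import fmean, variance

persons = [5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3,
           5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3, 4]
N, n = 14848, len(persons)
f = n / N                               # sampling fraction

y_bar = fmean(persons)                  # estimate of the average household size
total_hat = N * y_bar                   # estimate of the total number of people

# Assumption: the unknown S^2 is replaced by the sample mean square s^2.
se_ybar = sqrt(variance(persons) * (1 - f) / n)

# Assumption: the 10% margin uses y_bar in place of the unknown true mean.
z = 0.10 * y_bar / se_ybar
prob_within_10pct = erf(z / sqrt(2))    # P(|Z| <= z) for a standard normal Z
print(y_bar, total_hat, round(prob_within_10pct, 3))
```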
$\bar y - \bar Y \le d$ or $\bar Y - \bar y \le d$ or $|\bar y - \bar Y| \le d$.
Since $|\bar y - \bar Y|$ differs from sample to sample, this margin of error can be specified in the form of a probability statement:
$\Pr[|\bar y - \bar Y| \ge d] = \alpha$ or $\Pr[|\bar y - \bar Y| \le d] = 1 - \alpha$, (2.9)
where $\alpha$ is small: it is the risk we are willing to bear that the actual difference is greater than $d$. $\alpha$ is called the level of significance, and $(1 - \alpha)$ is called the level of confidence or confidence coefficient.
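As the population is normally distributed, the sample mean also follows the normal distribution, i.e. $\bar y \sim N[\bar Y, V(\bar y)]$; then $Z = \frac{\bar y - \bar Y}{\sqrt{V(\bar y)}} \sim N(0,1)$.
For the given value of $\alpha$, we can find the value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table by the following equation:
$\Pr\left[\frac{|\bar y - \bar Y|}{\sqrt{V(\bar y)}} \ge Z_{\alpha/2}\right] = \alpha$ or $\Pr[|\bar y - \bar Y| \ge \sqrt{V(\bar y)}\,Z_{\alpha/2}] = \alpha$. (2.10)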
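Comparing (2.9) and (2.10), we get
$d = Z_{\alpha/2}\sqrt{V(\bar y)}$, so that $d^2 = Z^2_{\alpha/2}V(\bar y) = Z^2_{\alpha/2}\left(\frac{1}{n} - \frac{1}{N}\right)S^2$.
Hence
$\frac{Z^2_{\alpha/2}S^2}{d^2}\left(\frac{1}{n} - \frac{1}{N}\right) = 1$, or $n_0\left(\frac{1}{n} - \frac{1}{N}\right) = 1$, where $n_0 = \frac{Z^2_{\alpha/2}S^2}{d^2}$, (2.11)
or $\frac{n_0}{n} - \frac{n_0}{N} = 1$, or $\frac{n_0}{n} = 1 + \frac{n_0}{N}$, or $n = \frac{n_0}{1 + \frac{n_0}{N}}$. (2.12)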
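If the precision is specified in terms of $V(\bar y)$: since $d = Z_{\alpha/2}\sqrt{V(\bar y)}$, we have $V(\bar y) = d^2/Z^2_{\alpha/2}$ and $n_0 = \frac{Z^2_{\alpha/2}S^2}{d^2} = \frac{S^2}{V(\bar y)}$.
If the precision is given in terms of the coefficient of variation of $\bar y$, let
$CV(\bar y) = \frac{\sqrt{V(\bar y)}}{\bar Y} = e$, so that $e^2 = \frac{V(\bar y)}{\bar Y^2}$ or $V(\bar y) = e^2\bar Y^2$. (2.13)
Substituting (2.13) in (2.11), we get
$n_0 = \frac{S^2}{e^2\bar Y^2}$, and hence $n$ from (2.12).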
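Remark
i) To get $n$ such that the margin of error in the estimate $\hat Y = N\bar y$ of the population total $Y$ is $d$: we need $|\hat Y - Y| \le d$, or $|N\bar y - N\bar Y| \le d$, or $N|\bar y - \bar Y| \le d$. That is, the margin of error for $\bar y$ is $d' = d/N$, so $d'^2 = \frac{d^2}{N^2}$.
Therefore,
$n_0 = \left(\frac{N Z_{\alpha/2} S}{d}\right)^2$, and $n$ can be obtained from relation (2.12).
ii) If the precision is specified as $V(\hat Y) = V$, then $V(\bar y) = \frac{V}{N^2}$ and $n_0 = \frac{N^2S^2}{V}$; then $n$ follows from (2.12).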
Example: For a population of size $N = 430$, it is roughly known that $\bar Y = 19$ and $S^2 = 85.6$. With srs, what should be the size of the sample to estimate $\bar Y$ with a margin of error of 10% of $\bar Y$, apart from a chance of 1 in 20?
Solution: The margin of error in the estimate $\bar y$ of $\bar Y$ is given, i.e.
$|\bar y - \bar Y| \le 10\%$ of $\bar Y = \frac{19}{10} = 1.9$, so that
$\Pr[|\bar y - \bar Y| \ge 1.9] = \frac{1}{20} = 0.05$, and $n_0 = \frac{Z^2_{\alpha/2}S^2}{d^2} = \frac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.091678$.
Therefore,
$n = \frac{n_0}{1 + \frac{n_0}{N}} = 75.168 \approx 75$.
Example: A population consists of 676 petition sheets. How large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume that the population mean square is 229.
Solution: Let $Y$ be the number of signatures on all the sheets, and let $\hat Y$ be the estimate of $Y$. The margin of error is specified in the estimate $\hat Y$ of $Y$ as
$|\hat Y - Y| \le 1000$, so that $\Pr[|\hat Y - Y| \ge 1000] = \frac{1}{20} = 0.05$.
We know that
$n = \frac{n_0}{1 + \frac{n_0}{N}}$, where $n_0 = \left(\frac{N Z_{\alpha/2}}{d}\right)^2 S^2 = \left(\frac{676 \times 1.96}{1000}\right)^2 \times 229 = 402.01385$, and hence
$n = 252.09 \approx 252$.
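Both sample-size examples above follow the same two steps:
1. Compute $n_0$ from (2.11), or from its version for a total, $n_0 = (NZ_{\alpha/2}S/d)^2$.
2. Apply the finite-population adjustment (2.12).

A minimal sketch, where `srs_sample_size` is a hypothetical helper name:

```python
def srs_sample_size(n0, N):
    """Finite-population adjustment (2.12): n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / N)

z = 1.96

# Mean: N = 430, S^2 = 85.6, margin d = 10% of Ybar = 1.9
n0_mean = z ** 2 * 85.6 / 1.9 ** 2
n_mean = srs_sample_size(n0_mean, 430)

# Total: N = 676, S^2 = 229, margin d = 1000, so n0 = (N z / d)^2 S^2
N = 676
n0_total = (N * z / 1000) ** 2 * 229
n_total = srs_sample_size(n0_total, N)

print(round(n0_mean, 4), round(n_mean, 3), round(n0_total, 4), round(n_total, 2))
```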
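$d = Z_{\alpha/2}\sqrt{V(p)}$ or $d^2 = Z^2_{\alpha/2}V(p) = Z^2_{\alpha/2}\frac{N-n}{N-1}\cdot\frac{PQ}{n}$, as sampling is srswor.
Hence
$\frac{Z^2_{\alpha/2}PQ}{d^2}\cdot\frac{N-n}{n(N-1)} = 1$, or $n_0\frac{N-n}{n(N-1)} = 1$, where $n_0 = \frac{Z^2_{\alpha/2}PQ}{d^2}$, (2.16)
or $\frac{N-1}{n_0} = \frac{N}{n} - 1$, or $\frac{N}{n} = 1 + \frac{N-1}{n_0}$,
or $n = \frac{N n_0}{N - 1 + n_0} = \frac{n_0}{1 + \frac{n_0 - 1}{N}} \approx \frac{n_0}{1 + \frac{n_0}{N}}$. (2.17)
If $N$ is sufficiently large, then $n \approx n_0$.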
b) If precision is specified in terms of $V(p)$, i.e. $V(p) = V$ (given): since $d^2/Z^2_{\alpha/2} = V(p)$, substituting $V(p) = V$ in relation (2.16) gives $n_0 = \frac{PQ}{V}$, and hence $n$ can be obtained from relation (2.17).
c) When precision is given in terms of the coefficient of variation of $p$, let
$CV(p) = \frac{\sqrt{V(p)}}{P} = e$, so that $e^2 = \frac{V(p)}{P^2}$, or $V(p) = e^2P^2$. (2.18)
Substituting (2.18) in relation (2.16), we get
$n_0 = \frac{PQ}{e^2P^2} = \frac{Q}{e^2P} = \frac{1}{e^2}\left(\frac{1}{P} - 1\right)$, and hence $n$ is given by relation (2.17).
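The proportion formulas (2.16) to (2.18) turn into a few small helper functions. The function names and the planning values ($N = 1000$, a guessed $P = 0.5$, $d = 0.05$, $e = 0.1$) are hypothetical illustrations, not taken from the notes.

```python
def n0_from_margin(P, d, z=1.96):
    """(2.16): n0 = z^2 P Q / d^2 for a margin of error d on p."""
    return z ** 2 * P * (1 - P) / d ** 2

def n0_from_cv(P, e):
    """From (2.18): n0 = Q / (e^2 P) when the CV of p is fixed at e."""
    return (1 - P) / (e ** 2 * P)

def n_from_n0(n0, N):
    """(2.17): n = N n0 / (N - 1 + n0)."""
    return N * n0 / (N - 1 + n0)

# Hypothetical planning values, not taken from the notes
N, P_guess = 1000, 0.5
n0 = n0_from_margin(P_guess, d=0.05)
n = n_from_n0(n0, N)
print(round(n0, 2), round(n, 2), round(n0_from_cv(P_guess, 0.1), 2))
```

Taking $P = 0.5$ maximises $PQ$, so it is the conservative choice when nothing is known about $P$.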
Remarks
i) To get $n$ when the margin of error in the estimate $\hat A = Np$ of the population total $A = NP$ is $d$: we need $|\hat A - A| \le d$, or $|Np - NP| \le d$, or $N|p - P| \le d$. That is, the margin of error for $p$ is $d' = d/N$, so $d'^2 = \frac{d^2}{N^2}$.
Thus,
$n_0 = \left(\frac{N Z_{\alpha/2}}{d}\right)^2 PQ$, and $n$ can be obtained from relation (2.17).
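$S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar Y^2\right) = \frac{1}{36-1}\left[131682 - 36\left(\frac{2138}{36}\right)^2\right] = 134.5$, and
$|\hat Y - Y| \le 200$; then $\Pr[|\hat Y - Y| \ge 200] = \frac{1}{20} = 0.05$.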
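We know that
$n = \frac{n_0}{1 + \frac{n_0}{N}}$, where $n_0 = \left(\frac{N Z_{\alpha/2}}{d}\right)^2 S^2 = \left(\frac{36 \times 1.96}{200}\right)^2 \times 134.5 = 16.7409$, and therefore
$n = 11.42765 \approx 12$.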
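This sketch repeats the computation from the summary totals $\sum Y_i = 2138$ and $\sum Y_i^2 = 131682$ for $N = 36$, as read from the calculation above. It keeps $S^2$ unrounded, so $n_0$ differs slightly from 16.7409. The required sample size, rounded up, is still 12.

```python
from math import ceil

N, d, z = 36, 200, 1.96
sum_y, sum_y2 = 2138, 131682       # totals as read from the computation above

S2 = (sum_y2 - sum_y ** 2 / N) / (N - 1)   # population mean square
n0 = (N * z / d) ** 2 * S2                 # n0 for a margin d on the total
n = n0 / (1 + n0 / N)                      # finite-population adjustment (2.12)
n_required = ceil(n)                       # round up to a whole number of units
print(round(S2, 2), round(n0, 2), round(n, 2), n_required)
```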
If $C(n)$ is the cost of a sample of size $n$ and $L(n)$ is the expected loss, then the most economic sample size is the one that minimizes the sum of cost and expected loss. Thus the problem of determining the sample size can be stated as:
find $n$ such that $\phi(n) = C(n) + L(n)$ is minimum.
Exercise: If the loss due to an error in $\bar y$ is proportional to $|\bar y - \bar Y|$, and the total cost of the survey is $C = c_0 + c_1 n$, show that with simple random sampling, ignoring the fpc, the most economical value of $n$ is $\left(\frac{\lambda S}{c_1\sqrt{2\pi}}\right)^{2/3}$, where $\lambda$ is a constant.
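Solution: Given $l(z) \propto |\bar y - \bar Y|$, we have $l(z) = \lambda|\bar y - \bar Y|$. When the fpc is ignored, $V(\bar y) = S^2/n$ and $\bar y \sim N\left(\bar Y, \frac{S^2}{n}\right)$, so $z = (\bar y - \bar Y) \sim N\left(0, \frac{S^2}{n}\right)$, so that
$f(z) = \frac{1}{(S/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{z^2}{S^2/n}\right) = \frac{1}{(S/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2S^2}\right)$.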
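Now
$|z| = |\bar y - \bar Y| = \bar y - \bar Y = z$, if $\bar y \ge \bar Y$;
$|z| = |\bar y - \bar Y| = -(\bar y - \bar Y) = -z$, if $\bar y < \bar Y$.
Thus, the expected loss is
$L(n) = \lambda\int_{-\infty}^{\infty}|z| f(z)\,dz = \lambda\left[\int_{-\infty}^{0}(-z) f(z)\,dz + \int_{0}^{\infty} z f(z)\,dz\right] = 2\lambda\int_{0}^{\infty} z f(z)\,dz$
$\quad = 2\lambda\int_{0}^{\infty} z\,\frac{1}{(S/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2S^2}\right)dz$.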
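Put $t = \frac{n z^2}{2S^2}$; then $dt = \frac{2nz}{2S^2}dz$, or $z\,dz = \frac{S^2}{n}dt$.
Therefore,
$L(n) = 2\lambda\int_{0}^{\infty}\frac{S^2}{n}\cdot\frac{1}{(S/\sqrt n)\sqrt{2\pi}}e^{-t}\,dt = \frac{2\lambda S}{\sqrt{2\pi n}}\int_{0}^{\infty}e^{-t}\,dt = \frac{2\lambda S}{\sqrt{2\pi n}}$, as $\int_{0}^{\infty}e^{-t}\,dt = 1$.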
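With $L(n) = 2\lambda S/\sqrt{2\pi n}$, the total $\phi(n) = c_0 + c_1 n + L(n)$ can be minimized numerically. The result can then be compared with the claimed optimum $(\lambda S/(c_1\sqrt{2\pi}))^{2/3}$. The values of $\lambda$, $S$, $c_0$ and $c_1$ below are hypothetical and chosen only for illustration. The grid search is a brute-force check, not a method from the notes.

```python
from math import pi, sqrt

def expected_abs_loss(n, lam, S):
    """L(n) = lam * E|ybar - Ybar| = 2 lam S / sqrt(2 pi n), fpc ignored."""
    return 2 * lam * S / sqrt(2 * pi * n)

def total_cost(n, lam, S, c0, c1):
    """phi(n) = C(n) + L(n) with C(n) = c0 + c1 n."""
    return c0 + c1 * n + expected_abs_loss(n, lam, S)

def closed_form_n(lam, S, c1):
    return (lam * S / (c1 * sqrt(2 * pi))) ** (2 / 3)

# Hypothetical values, chosen only for illustration
lam, S, c0, c1 = 500.0, 12.0, 100.0, 1.5
grid = [k / 100 for k in range(100, 100_001)]   # n from 1 to 1000 in steps of 0.01
n_numeric = min(grid, key=lambda n: total_cost(n, lam, S, c0, c1))
n_formula = closed_form_n(lam, S, c1)
print(n_numeric, round(n_formula, 2))
```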
Exercise: With a loss function $l(z) = \lambda z^2$ and a cost function $C = c_0 + c_1 n$, show that, using srs, the most economic value of the sample size $n$ to estimate the population mean $\bar Y$ is $\left(\frac{\lambda S^2}{c_1}\right)^{1/2}$, where $z = \bar y - \bar Y$ and $\bar y$ is the sample mean used to estimate $\bar Y$.
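Solution: $L(n) = E[\lambda z^2] = \lambda E(z^2)$. Consider
$V(z) = E[z - E(z)]^2 = E(z^2)$, since $E(z) = E(\bar y - \bar Y) = 0$.
Also
$V(z) = V(\bar y - \bar Y) = V(\bar y) = \left(\frac{1}{n} - \frac{1}{N}\right)S^2$.
Therefore,
$E(z^2) = \frac{S^2}{n} - \frac{S^2}{N}$, and the expected loss is $L(n) = \lambda\left(\frac{S^2}{n} - \frac{S^2}{N}\right)$.
To determine the value of $n$, consider the function
$\phi(n) = L(n) + C(n) = \frac{\lambda S^2}{n} - \frac{\lambda S^2}{N} + c_0 + c_1 n$.
Differentiating this function with respect to $n$ and equating to zero, we get
$-\frac{\lambda S^2}{n^2} + c_1 = 0$, or $\frac{\lambda S^2}{n^2} = c_1$, or $n = \left(\frac{\lambda S^2}{c_1}\right)^{1/2}$.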
Exercise: The selling price of a lot of standing timber is UW , where U is the price per
unit volume and W is the volume of timber on the lot. The number N of logs on the lot is
counted, and the average volume per log is estimated from a simple random sample of n
logs. The estimate is made and paid for by the seller and is provisionally accepted by the
buyer. Later, the buyer finds out the exact volume purchased, and the seller reimburses him if
he has paid for more than was delivered. If he has paid for less than was delivered, the buyer
does not mention the fact.
Construct the seller's loss function. Assuming that the cost of measuring n logs is cn , find
the optimum value of n . The standard deviation of the volume per log may be denoted by S
and the fpc ignored.
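Solution: Let $\hat W$ be the estimated total volume of the timber. The error in the estimate is $z = \hat W - W$.
If $\hat W - W = z \le 0$, the seller's loss is zero, i.e. $l(z) = 0$. If $z > 0$, the seller reimburses the buyer for the excess, i.e. $l(z) = Uz$.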
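With the fpc ignored,
$\hat W \sim N\left(W, \frac{N^2S^2}{n}\right)$, or $z = (\hat W - W) \sim N\left(0, \frac{N^2S^2}{n}\right)$, so that
$f(z) = \frac{1}{(NS/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{z^2}{N^2S^2/n}\right) = \frac{1}{(NS/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)$.
The expected loss is
$L(n) = \int_{-\infty}^{\infty} l(z) f(z)\,dz = \int_{-\infty}^{0} 0\cdot f(z)\,dz + \int_{0}^{\infty}(Uz)\,\frac{1}{(NS/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz$
$\quad = \int_{0}^{\infty} Uz\,\frac{1}{(NS/\sqrt n)\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz$.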
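Put $t = \frac{n z^2}{2N^2S^2}$; then $dt = \frac{2nz}{2N^2S^2}dz$, or $z\,dz = \frac{N^2S^2}{n}dt$.
Therefore,
$L(n) = U\frac{N^2S^2}{n}\cdot\frac{1}{(NS/\sqrt n)\sqrt{2\pi}}\int_{0}^{\infty}e^{-t}\,dt = \frac{UNS}{\sqrt{2\pi n}}$, as $\int_{0}^{\infty}e^{-t}\,dt = 1$.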
To determine the value of $n$, consider the function
$\phi(n) = L(n) + C(n) = cn + \frac{UNS}{\sqrt{2\pi}}\,n^{-1/2}$.
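The minimization step can be sketched in the same way as in the previous exercises. This is a sketch of the remaining calculation, not text from the notes:

$$\frac{d\phi(n)}{dn} = c - \frac{UNS}{2\sqrt{2\pi}}\,n^{-3/2} = 0 \quad\Longrightarrow\quad n = \left(\frac{UNS}{2c\sqrt{2\pi}}\right)^{2/3},$$

and since $\frac{d^2\phi}{dn^2} = \frac{3UNS}{4\sqrt{2\pi}}\,n^{-5/2} > 0$, this stationary point is a minimum.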
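Suppose $NP$ of the $N$ units have non-zero values, with mean $\bar Y_{NP}$ and variance $\sigma_0^2$, and the remaining $NQ$ units are zero. Then $\bar Y_{NQ} = \frac{1}{NQ}\sum_{i=1}^{NQ} Y_i = 0$. Also $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i$ and $\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2$, so that $N\bar Y = NP\,\bar Y_{NP}$,
or $\bar Y_{NP} = \frac{1}{P}\bar Y$. By definition,
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2$, or $N\sigma^2 = \sum_{i=1}^{N} Y_i^2 - N\bar Y^2$.
Similarly, $NP\sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP\,\bar Y_{NP}^2$.
Thus,
$N(\sigma^2 - P\sigma_0^2) = NP\,\bar Y_{NP}^2 - N\bar Y^2 = \frac{NP}{P^2}\bar Y^2 - N\bar Y^2 = N\left(\frac{1}{P} - 1\right)\bar Y^2 = N\frac{Q}{P}\bar Y^2$.
Therefore,
$P\sigma_0^2 = \sigma^2 - \frac{Q}{P}\bar Y^2$, or $\sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\bar Y^2$.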
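The identity $\sigma^2 = P\sigma_0^2 + \frac{Q}{P}\bar Y^2$ can be checked numerically. The population below (six non-zero values followed by four zeros) is hypothetical.

```python
from statistics import fmean, pvariance

# Hypothetical population: NP non-zero units followed by NQ zeros.
nonzero = [3.0, 7.0, 4.0, 10.0, 6.0, 2.0]
Y = nonzero + [0.0] * 4
N = len(Y)
P = len(nonzero) / N
Q = 1 - P

sigma2 = pvariance(Y)            # variance of all N units (divisor N)
sigma0_2 = pvariance(nonzero)    # variance of the NP non-zero units (divisor NP)
Y_bar = fmean(Y)

rhs = P * sigma0_2 + (Q / P) * Y_bar ** 2
print(sigma2, rhs)               # the two should agree up to rounding error
```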
Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn without replacement and added to the original sample. Show that the mean based on $(n + n_1)$ units is an unbiased estimator of the population mean. Also show that the ratio of its variance to that of the mean of the original $n$ units is approximately $\frac{1 + 3n_1/n}{(1 + n_1/n)^2}$, assuming that the population size is large.
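Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ units be denoted by $\bar y_n$, $\bar y_{n_1}$, and $\bar y_{n+n_1}$ respectively. They are defined as $\bar y_n = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\bar y_{n_1} = \frac{1}{n_1}\sum_{i=1}^{n_1} y_i$, and
$\bar y_{n+n_1} = \frac{n\bar y_n + n_1\bar y_{n_1}}{n + n_1}$. We have to show that $E(\bar y_{n+n_1}) = \bar Y$. Here the expectation is taken in two stages:
i) with the first sample of $n$ units fixed, and
ii) over all samples.
$E(\bar y_{n+n_1}) = \frac{1}{n+n_1}E(n\bar y_n + n_1\bar y_{n_1}) = \frac{1}{n+n_1}E[n\bar y_n + n_1 E(\bar y_{n_1} \mid n)]$
$\quad = \frac{1}{n+n_1}E(n\bar y_n + n_1\bar y_n)$, since the $n_1$ units form a sub-sample of the sample of size $n$,
$\quad = \frac{1}{n+n_1}(n\bar Y + n_1\bar Y) = \bar Y$.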
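To obtain the variance,
$V(\bar y_{n+n_1}) = E(\bar y_{n+n_1} - \bar Y)^2 = E\left(\frac{n\bar y_n + n_1\bar y_{n_1}}{n+n_1} - \bar Y\right)^2$
$\quad = \frac{1}{(n+n_1)^2}E[n\bar y_n + n_1\bar y_{n_1} - (n+n_1)\bar Y]^2$
$\quad = \frac{1}{(n+n_1)^2}E[n\bar y_n - n\bar Y + n_1\bar y_{n_1} - n_1\bar Y]^2$
$\quad = \frac{1}{(n+n_1)^2}E[n(\bar y_n - \bar Y) + n_1\bar y_{n_1} - n_1\bar y_n + n_1\bar y_n - n_1\bar Y]^2$
$\quad = \frac{1}{(n+n_1)^2}E[(n+n_1)(\bar y_n - \bar Y) + n_1(\bar y_{n_1} - \bar y_n)]^2$
$\quad = \frac{1}{(n+n_1)^2}[(n+n_1)^2E(\bar y_n - \bar Y)^2 + n_1^2E(\bar y_{n_1} - \bar y_n)^2]$, as the cross-product term vanishes because $E(\bar y_{n_1} - \bar y_n \mid n) = 0$,
$\quad = \frac{1}{(n+n_1)^2}[(n+n_1)^2V(\bar y_n) + n_1^2E\{E((\bar y_{n_1} - \bar y_n)^2 \mid n)\}]$
$\quad = \frac{1}{(n+n_1)^2}\left[(n+n_1)^2V(\bar y_n) + n_1^2\left(\frac{1}{n_1} - \frac{1}{n}\right)E(s_n^2)\right]$
$\quad = \frac{1}{(n+n_1)^2}\left[(n+n_1)^2V(\bar y_n) + n_1^2\frac{n-n_1}{n_1 n}S^2\right]$, since $E(s_n^2) = S^2$.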
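$\quad = \frac{1}{(n+n_1)^2}\left[(n+n_1)^2V(\bar y_n) + \frac{n_1(n-n_1)}{n}S^2\right] = V(\bar y_n) + \frac{n_1(n-n_1)}{n(n+n_1)^2}S^2$.
Therefore, since $V(\bar y_n) \approx S^2/n$ for a large population,
$\frac{V(\bar y_{n+n_1})}{V(\bar y_n)} = 1 + \frac{n_1(n-n_1)}{n(n+n_1)^2}\cdot\frac{S^2}{V(\bar y_n)} \approx 1 + \frac{n_1(n-n_1)}{n(n+n_1)^2}\cdot\frac{S^2}{S^2/n}$
$\quad = \frac{n^2 + 3n_1 n}{(n+n_1)^2} = \frac{1 + 3n_1/n}{(1 + n_1/n)^2}$.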
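A quick Monte Carlo check of the variance ratio can be built as follows. It assumes a "large population" can be mimicked by i.i.d. standard normal draws, so that $S^2 = 1$ and $V(\bar y_n) = 1/n$ exactly. The values $n = 20$, $n_1 = 10$, the seed and the number of replications are arbitrary choices. The sub-sample is drawn with `random.sample`, which samples without replacement.

```python
import random
from statistics import pvariance

random.seed(12345)
n, n1, reps = 20, 10, 20_000

combined_means = []
for _ in range(reps):
    # Assumption: a very large population is mimicked by i.i.d. N(0, 1) draws,
    # so S^2 = 1 and V(ybar_n) = 1 / n.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    sub = random.sample(sample, n1)          # sub-sample drawn without replacement
    combined_means.append((sum(sample) + sum(sub)) / (n + n1))

ratio_sim = pvariance(combined_means) / (1 / n)   # V(ybar_{n+n1}) / V(ybar_n)
ratio_theory = (1 + 3 * n1 / n) / (1 + n1 / n) ** 2
print(round(ratio_sim, 3), round(ratio_theory, 3))
```

With $n_1 = n/2$ the theoretical ratio is $2.5/2.25 \approx 1.11$. Adding duplicated units increases the variance rather than reducing it.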
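Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar y$ is drawn from a finite population, and a simple random sub-sample of size $n_1$ is drawn from it with mean $\bar y_1$. Show that $V(\bar y_1 - \bar y) = \left(\frac{1}{n_1} - \frac{1}{n}\right)S^2$ and $Cov(\bar y, \bar y_1 - \bar y) = 0$.
Let $\bar y_2$ denote the mean of the remaining $n_2$ units, so that $n\bar y = n_1\bar y_1 + n_2\bar y_2$ and $\bar y_1 - \bar y = \frac{n_2(\bar y_1 - \bar y_2)}{n}$. Then
$V(\bar y_1 - \bar y) = V\left[\frac{n_2(\bar y_1 - \bar y_2)}{n}\right] = \frac{n_2^2}{n^2}V(\bar y_1 - \bar y_2) = \frac{n_2^2}{n^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S^2$
$\quad = \frac{n_2^2}{n^2}\cdot\frac{n}{n_1 n_2}S^2 = \frac{n_2}{n n_1}S^2 = \frac{n - n_1}{n n_1}S^2 = \left(\frac{1}{n_1} - \frac{1}{n}\right)S^2$.
Next,
$Cov(\bar y, \bar y_1 - \bar y) = E[\bar y(\bar y_1 - \bar y)] - E(\bar y)E(\bar y_1 - \bar y)$
$\quad = E(\bar y\bar y_1 - \bar y^2) - \bar Y \cdot 0 = E(\bar y\bar y_1) - E(\bar y^2)$. (1)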
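Consider, ignoring the fpc (so that $\bar y_1$ and $\bar y_2$ may be treated as independent and $V(\bar y_1) = S^2/n_1$),
$E(\bar y\bar y_1) = E\left[\frac{n_1\bar y_1 + n_2\bar y_2}{n}\bar y_1\right] = E\left[\frac{n_1}{n}\bar y_1^2 + \frac{n_2}{n}\bar y_1\bar y_2\right]$
$\quad = \frac{n_1}{n}E(\bar y_1^2) + \frac{n_2}{n}E(\bar y_1)E(\bar y_2)$
$\quad = \frac{n_1}{n}[V(\bar y_1) + \bar Y^2] + \frac{n_2}{n}\bar Y^2 = \frac{n_1}{n}\left[\frac{S^2}{n_1} + \bar Y^2\right] + \frac{n_2}{n}\bar Y^2$
$\quad = \frac{S^2}{n} + \frac{n_1}{n}\bar Y^2 + \frac{n_2}{n}\bar Y^2 = \frac{S^2}{n} + \bar Y^2$. (2)
Now
$V(\bar y) = E(\bar y^2) - \bar Y^2$, or $E(\bar y^2) = V(\bar y) + \bar Y^2 = \frac{S^2}{n} + \bar Y^2$. (3)
In view of equations (1), (2), and (3), we get
$Cov(\bar y, \bar y_1 - \bar y) = \frac{S^2}{n} + \bar Y^2 - \frac{S^2}{n} - \bar Y^2 = 0$.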
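Both results can be confirmed exactly, with the fpc kept, by enumerating every ordered srswor draw from a small hypothetical population. A uniformly random ordered draw of $n$ distinct units is an srswor of size $n$, and its first $n_1$ units form a simple random sub-sample. Unit labels are enumerated rather than values, so the repeated value 4 is handled as two distinct units.

```python
from fractions import Fraction
from itertools import permutations

Y = [1, 4, 4, 6, 9, 12]      # hypothetical population values
N, n, n1 = len(Y), 4, 2
Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)   # population mean square

# An ordered draw of n distinct unit labels, all orders equally likely, is an
# srswor of size n; its first n1 units are a simple random sub-sample of size n1.
draws = list(permutations(range(N), n))

def mean_of(units):
    return Fraction(sum(Y[i] for i in units), len(units))

def expectation(values):
    return sum(values) / len(values)

ybar = [mean_of(d) for d in draws]
diff = [mean_of(d[:n1]) - yb for d, yb in zip(draws, ybar)]   # ybar_1 - ybar

var_diff = expectation([x * x for x in diff]) - expectation(diff) ** 2
cov = expectation([a * b for a, b in zip(ybar, diff)]) - expectation(ybar) * expectation(diff)
print(var_diff, (Fraction(1, n1) - Fraction(1, n)) * S2, cov)
```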
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$ respectively. It is required to estimate the population total $Y$ by selecting a sample of two units. Let the sampling and estimation procedures be as follows:
Sample (s)      P(s)    Estimator t        Estimator t'
$(U_1, U_2)$    1/2     $Y_1 + 2Y_2$       $Y_1 + 2Y_2 + Y_1^2$
$(U_1, U_3)$    1/2     $Y_1 + 2Y_3$       $Y_1 + 2Y_3 - Y_1^2$
Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the estimators.
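Solution: By definition,
$E(t) = \sum_i t_i\,p(t_i) = \frac{1}{2}(Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y$.
Therefore, $t$ is unbiased for $Y$.
Similarly,
$E(t') = \sum_i t'_i\,p(t'_i) = \frac{1}{2}(Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y$; hence, $t'$ is unbiased for $Y$.
For the variances, $E(t^2) = \frac{1}{2}[(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2]$, so that
$V(t) = E(t^2) - [E(t)]^2 = (Y_2 - Y_3)^2$.
Also
$E(t'^2) = \frac{1}{2}[(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2]$
$\quad = \frac{1}{2}(Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2Y_2 + 4Y_1Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 + Y_1^2 - 4Y_1^2Y_3 + 4Y_1Y_3 + 4Y_3^2)$.
Therefore,
$V(t') = E(t'^2) - [E(t')]^2 = (Y_2 - Y_3 + Y_1^2)^2$.
We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased. Which of them has the smaller variance depends on the variate values.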
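The closed forms $V(t) = (Y_2 - Y_3)^2$ and $V(t') = (Y_2 - Y_3 + Y_1^2)^2$ can be spot-checked by computing the exact two-point distribution of each estimator. The variate triples below are hypothetical. They show that either estimator can win.

```python
from fractions import Fraction

def moments(values):
    """Mean and variance of an estimator taking each listed value with probability 1/2."""
    mean = Fraction(sum(values), len(values))
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def compare(Y1, Y2, Y3):
    t_values = [Y1 + 2 * Y2, Y1 + 2 * Y3]                       # samples (U1,U2), (U1,U3)
    t_prime_values = [Y1 + 2 * Y2 + Y1 ** 2, Y1 + 2 * Y3 - Y1 ** 2]
    return moments(t_values), moments(t_prime_values)

for Y1, Y2, Y3 in [(2, 5, 3), (1, 2, 5)]:   # hypothetical variate values
    (m_t, v_t), (m_tp, v_tp) = compare(Y1, Y2, Y3)
    print((Y1, Y2, Y3), m_t, v_t, m_tp, v_tp)
```

For the first triple $t$ has the smaller variance (4 against 36). For the second, $t'$ does (4 against 9).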