SIMPLE RANDOM SAMPLING

A procedure for selecting a sample of size n out of a finite population of size N in which
each of the possible distinct samples has an equal chance of being selected is called random
sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).

Simple random sampling with replacement (srswr)


In sampling with replacement, a unit is selected from the population of $N$ units, its value is
noted, and it is returned to the population before the next draw is made. The process is
repeated $n$ times to give a sample of $n$ units. In this method, at each draw, each of the $N$
units of the population has the same probability $1/N$ of being selected. The same unit of the
population may therefore occur more than once in the sample (the order in which the sample
units are obtained is taken into account). There are $N^n$ samples, and each has an equal
probability $1/N^n$ of being selected.
Note: If the order in which the sample units are obtained is ignored (unordered samples), the
number of possible samples is ${}^{N+n-1}C_n$. For $n = 2$ and $n = 3$ this equals
$$ {}^{N}C_n + N\,(1 + {}^{N-1}C_1 + {}^{N-1}C_2 + \cdots + {}^{N-1}C_{n-2}), $$
but this expansion undercounts for $n \ge 4$ (for example, $N = 5$, $n = 4$ gives 60 instead of
${}^{8}C_4 = 70$).

Simple random sampling without replacement (srswor)


Suppose the population consists of $N$ units. In simple random sampling without replacement,
a unit is selected, its value is noted, and the unit is not returned to the population before
the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this
method, at the $r$-th draw, each of the $N - r + 1$ units remaining in the population has the
same probability $1/(N - r + 1)$ of being included in the sample. No unit of the population can
occur more than once in the sample (order is ignored). There are ${}^{N}C_n$ possible samples,
and each such sample has an equal probability $1/{}^{N}C_n$ of being selected.
Example: For a population of size $N = 5$ with values 1, 3, 6, 8 and 9, list all possible
samples of size $n = 3$ by both methods [srswr (unordered) and srswor].
Solution: Under sampling wr, the number of possible samples is
$$ {}^{N}C_n + N\,(1 + {}^{N-1}C_1 + \cdots + {}^{N-1}C_{n-2}) = {}^{5}C_3 + 5\,(1 + {}^{4}C_1) = 35, $$
which are as follows:
(1, 1, 1), (1, 1, 3), (1, 1, 6), (1, 1, 8), (1, 1, 9), (1, 3, 3), (1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 6),
(1, 6, 8), (1, 6, 9), (1, 8, 8), (1, 8, 9), (1, 9, 9), (3, 3, 3), (3, 3, 6), (3, 3, 8), (3, 3, 9), (3, 6, 6),
(3, 6, 8), (3, 6, 9), (3, 8, 8), (3, 8, 9), (3, 9, 9), (6, 6, 6), (6, 6, 8), (6, 6, 9), (6, 8, 8), (6, 8, 9),
(6, 9, 9), (8, 8, 8), (8, 8, 9), (8, 9, 9), (9, 9, 9).

Under sampling wor, the number of possible samples is ${}^{N}C_n = {}^{5}C_3 = 10$, which are as
follows:
(1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 8), (1, 6, 9), (1, 8, 9), (3, 6, 8), (3, 6, 9), (3, 8, 9), (6, 8, 9).
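
The two listings in this example can be reproduced mechanically. The following is a minimal sketch, assuming Python's standard `itertools` module; the variable names are illustrative only and are not part of the original notes.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

population = [1, 3, 6, 8, 9]   # values from the example
n = 3
N = len(population)

# Unordered samples drawn with replacement (repeated units allowed).
srswr_unordered = list(combinations_with_replacement(population, n))

# Samples drawn without replacement (no repeats, order ignored).
srswor = list(combinations(population, n))

# Counts from enumeration next to the closed-form counts.
print(len(srswr_unordered), comb(N + n - 1, n))
print(len(srswor), comb(N, n))

# Number of ordered samples under srswr, N^n.
print(N ** n)
```

Each tuple in `srswr_unordered` and `srswor` corresponds to one entry in the lists above.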

Theory of simple random sampling with replacement


$N$, population size.
$n$, sample size.
$Y_i$, value of the $i$-th unit of the population.
$y_i$, value of the $i$-th unit of the sample.
$Y = \sum_{i=1}^{N} Y_i$, population total.
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$, population mean.
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, population variance.
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right]$, population mean square.
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]$, sample mean square.


Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean
$\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is
$$ V(\bar{y}) = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n}. $$
Proof: It is immediately seen that
$$ E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(y_i). $$
By definition,
$$ E(y_i) = \sum_{i=1}^{N} Y_i \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}, $$
since $y_i$ can take any one of the values $Y_1, \dots, Y_N$, each with probability $1/N$.
Therefore,
$$ E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \bar{Y}. $$

To obtain the variance, we have
$$ V(\bar{y}) = E[\bar{y} - E(\bar{y})]^2 = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i - \bar{Y}\right]^2 = \frac{1}{n^2} E\left[\sum_{i=1}^{n} y_i - n\bar{Y}\right]^2 = \frac{1}{n^2} E\left[\sum_{i=1}^{n} (y_i - \bar{Y})\right]^2 $$
$$ = \frac{1}{n^2} E\left[\sum_{i=1}^{n} (y_i - \bar{Y})^2\right] + \frac{1}{n^2} E\left[\sum_{i \ne j}^{n} (y_i - \bar{Y})(y_j - \bar{Y})\right]. $$
The expansion can be justified by a particular case. Write $a_i = y_i - E(y_i)$, so that
$\left[\sum_{i=1}^{n} a_i\right]^2 = (a_1 + a_2 + \cdots + a_n)^2$. Put $n = 3$; then
$$ (a_1 + a_2 + a_3)^2 = a_1^2 + a_2^2 + a_3^2 + a_1a_2 + a_1a_3 + a_2a_1 + a_2a_3 + a_3a_1 + a_3a_2 = \sum_{i=1}^{3} a_i^2 + \sum_{i \ne j}^{3} a_i a_j. $$
Hence
$$ V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n} E[(y_i - \bar{Y})(y_j - \bar{Y})] $$
$$ = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j}^{n} \operatorname{Cov}(y_i, y_j). \tag{2.1} $$

Consider
$$ V(y_i) = E(y_i - \bar{Y})^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2, $$
since $y_i$ can take any one of the values $Y_1, \dots, Y_N$, each with probability $1/N$. Hence
$$ V(y_i) = \sigma^2 = \frac{N-1}{N}\,S^2, \quad \text{since } S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2, \tag{2.2} $$

and
$$ \operatorname{Cov}(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i, j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i, y_j = Y_j). $$
In this case $y_j$ can take any one of the values $Y_1, \dots, Y_N$ with probability $1/N$
irrespective of the value taken by $y_i$, because the composition of the population remains the
same throughout the sampling process when sampling is with replacement. In other words, for
$i \ne j$, $y_i$ and $y_j$ are independent, so that
$$ \Pr(y_i = Y_i, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j) = \frac{1}{N}\cdot\frac{1}{N} = \frac{1}{N^2}. $$
Hence,
$$ \operatorname{Cov}(y_i, y_j) = \frac{1}{N^2}\sum_{i, j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) = \frac{1}{N^2}\sum_{i=1}^{N} (Y_i - \bar{Y})\sum_{j=1}^{N} (Y_j - \bar{Y}) = 0. \tag{2.3} $$

Substituting equations (2.2) and (2.3) in equation (2.1), we get
$$ V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} \frac{N-1}{N}\,S^2 = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n}. $$

Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance
$$ V(\hat{Y}) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}\,S^2. $$
Proof: By definition,
$$ E(\hat{Y}) = E(N\bar{y}) = N E(\bar{y}) = N\bar{Y} = N\cdot\frac{1}{N}\sum_{i=1}^{N} Y_i = Y $$
and
$$ V(\hat{Y}) = V(N\bar{y}) = N^2 V(\bar{y}) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}\,S^2. $$

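Remarks:
i) The standard error (SE) of $\bar{y}$ is $SE(\bar{y}) = \sqrt{V(\bar{y})} = \dfrac{\sigma}{\sqrt{n}} = S\sqrt{\dfrac{N-1}{nN}}$.
ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = \sqrt{V(\hat{Y})} = \dfrac{N\sigma}{\sqrt{n}} = S\sqrt{\dfrac{N(N-1)}{n}}$.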

Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population
variance $\sigma^2$, i.e. $E(s^2) = \sigma^2$.
Proof: By definition,
$$ E(s^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2)\right]. $$
To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that $V(y_i) = E(y_i^2) - \bar{Y}^2$, so that
$$ E(y_i^2) = \sigma^2 + \bar{Y}^2, \quad \text{since } V(y_i) = (N-1)S^2/N = \sigma^2, $$
and $V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that
$$ E(\bar{y}^2) = \frac{\sigma^2}{n} + \bar{Y}^2, \quad \text{since } V(\bar{y}) = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n} \text{ for srswr}. $$
Therefore,
$$ E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n} (\sigma^2 + \bar{Y}^2) - n\left(\frac{\sigma^2}{n} + \bar{Y}^2\right)\right] = \sigma^2 = \frac{N-1}{N}\,S^2. $$
  
Example: In a population with $N = 5$, the values of $Y_i$ are 8, 3, 11, 4 and 7.
a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.
b) Enumerate all possible samples of size 2 by the replacement method and verify that
i) The sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$.
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$.
iii) $V(\bar{y}) = \dfrac{(N-1)S^2}{nN} = \dfrac{\sigma^2}{n}$, and
iv) $E(s^2) = \dfrac{N-1}{N}\,S^2 = \sigma^2$.
Solution:
a) We know that
$$ \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = 6.6, \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 8.24, \quad S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = 10.3. $$


b) Form a table for calculation as below, where $\bar{y}_i$ and $s_i^2$ denote the mean and the mean
square of the $i$-th sample:
Sample      $\bar{y}_i$   $\bar{y}_i^2$   $N\bar{y}_i$   $s_i^2$
(8, 8)       8.0     64.00    40.0     0.0
(8, 3)       5.5     30.25    27.5    12.5
(8, 11)      9.5     90.25    47.5     4.5
(8, 4)       6.0     36.00    30.0     8.0
(8, 7)       7.5     56.25    37.5     0.5
(3, 8)       5.5     30.25    27.5    12.5
(3, 3)       3.0      9.00    15.0     0.0
(3, 11)      7.0     49.00    35.0    32.0
(3, 4)       3.5     12.25    17.5     0.5
(3, 7)       5.0     25.00    25.0     8.0
(11, 8)      9.5     90.25    47.5     4.5
(11, 3)      7.0     49.00    35.0    32.0
(11, 11)    11.0    121.00    55.0     0.0
(11, 4)      7.5     56.25    37.5    24.5
(11, 7)      9.0     81.00    45.0     8.0
(4, 8)       6.0     36.00    30.0     8.0
(4, 3)       3.5     12.25    17.5     0.5
(4, 11)      7.5     56.25    37.5    24.5
(4, 4)       4.0     16.00    20.0     0.0
(4, 7)       5.5     30.25    27.5     4.5
(7, 8)       7.5     56.25    37.5     0.5
(7, 3)       5.0     25.00    25.0     8.0
(7, 11)      9.0     81.00    45.0     8.0
(7, 4)       5.5     30.25    27.5     4.5
(7, 7)       7.0     49.00    35.0     0.0

1 n 1
i) E( y)  
n  i 1
yi 
25
 165  6.6  Y , where n  is the number of sample.

1 n
ii) E ( N y )   N yi  33 or E( N y )  N E( y )  33 .
n  i 1

1 n 2
iii) V ( y ) 
 
n i 1
yi  Y 2  4.12 .

Now,

( N  1) S 2 2
 4.12 , and  4.12 , therefore,
nN n
(n  1) S 2  2
V ( y)    4.12 .
nN n
1 n 2 1
iv) E ( s 2 )   si  25  206  8.24
n  i 1
(1a)

( N  1) S 2
and  8.24 (2a)
N
In view of equation (1a) and (2a), we get

( N  1) S 2
E (s 2 )    2  8.24 .
N
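
The verification in this example can also be carried out by brute-force enumeration. The sketch below is illustrative only: it assumes Python's standard `itertools` and `statistics` modules, and all variable names are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance, variance

Y = [8.0, 3.0, 11.0, 4.0, 7.0]   # population values from the example
n = 2
N = len(Y)

# All N^n = 25 ordered samples drawn with replacement.
samples = list(product(Y, repeat=n))

sample_means = [mean(s) for s in samples]
sample_mean_squares = [variance(s) for s in samples]   # divisor n - 1

pop_mean = mean(Y)
sigma2 = pvariance(Y)   # population variance, divisor N
S2 = variance(Y)        # population mean square, divisor N - 1

# Expectations taken over the 25 equally likely samples.
E_ybar = mean(sample_means)
V_ybar = pvariance(sample_means)
E_s2 = mean(sample_mean_squares)

print(E_ybar, pop_mean)                           # E(ybar) against Ybar
print(V_ybar, sigma2 / n, (N - 1) * S2 / (n * N)) # three forms of V(ybar)
print(E_s2, sigma2)                               # E(s^2) against sigma^2
```

Because every ordered sample has probability $1/N^n$, plain averages over `samples` are exact expectations.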

Theory of simple random sampling without replacement


Theorem: In srswor, the sample mean $\bar{y}$ is an unbiased estimate of the population mean
$\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \dfrac{N-n}{nN}\,S^2$.
Proof: As in srswr,
$$ E(\bar{y}) = \bar{Y}, \quad\text{and}\quad V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j}^{n} \operatorname{Cov}(y_i, y_j), \tag{2.4} $$
where
$$ V(y_i) = \frac{N-1}{N}\,S^2, \text{ for each } i. \tag{2.5} $$
Consider
$$ \operatorname{Cov}(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i \ne j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i, y_j = Y_j). $$
In this case $y_j$ can take any one of the values except $Y_i$, the value which is known to have
already been assumed by $y_i$, with equal probability $\dfrac{1}{N-1}$, so that for $i \ne j$,
$$ \Pr(y_i = Y_i, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j \mid y_i = Y_i) = \frac{1}{N}\cdot\frac{1}{N-1}. $$
Hence,
$$ \operatorname{Cov}(y_i, y_j) = \frac{1}{N(N-1)}\sum_{i \ne j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) $$
$$ = \frac{1}{N(N-1)}\sum_{i=1}^{N} (Y_i - \bar{Y})\left[\sum_{j=1}^{N} (Y_j - \bar{Y}) - (Y_i - \bar{Y})\right] $$
$$ = \frac{1}{N(N-1)}\left[\sum_{i=1}^{N} (Y_i - \bar{Y})\sum_{j=1}^{N} (Y_j - \bar{Y}) - \sum_{i=1}^{N} (Y_i - \bar{Y})^2\right] $$
$$ = -\frac{1}{N(N-1)}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = -\frac{S^2}{N}. \tag{2.6} $$

Substituting equations (2.5) and (2.6) in equation (2.4), we get
$$ V(\bar{y}) = \frac{1}{n^2}\left[n\,\frac{(N-1)S^2}{N}\right] + \frac{1}{n^2}\left[n(n-1)\left(-\frac{S^2}{N}\right)\right] = \frac{N-1}{nN}\,S^2 - \frac{n-1}{nN}\,S^2 $$
$$ = \frac{N-n}{nN}\,S^2 = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = (1 - f)\frac{S^2}{n}, $$
where $f = \dfrac{n}{N}$ is called the sampling fraction and the factor $(1 - f)$ is called the finite
population correction (fpc). If the population size $N$ is very large, or if $n$ is small compared
with $N$, then $f = \dfrac{n}{N} \to 0$ and consequently fpc $\to 1$.
Alternative expression
$$ V(\bar{y}) = \frac{N-n}{nN}\,S^2 = \left(\frac{1}{n} - \frac{1}{N}\right) S^2. $$
Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance
$V(\hat{Y}) = N^2 (1 - f) S^2 / n$.
Proof:
By definition,
$$ E(\hat{Y}) = E(N\bar{y}) = N E(\bar{y}) = N\bar{Y} = N\cdot\frac{1}{N}\sum_{i=1}^{N} Y_i = Y $$
and
$$ V(\hat{Y}) = V(N\bar{y}) = N^2 V(\bar{y}) = N^2\,\frac{N-n}{nN}\,S^2 = N^2 (1 - f)\frac{S^2}{n}. $$

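Remarks

i) The standard error of $\bar{y}$ is $SE(\bar{y}) = S\sqrt{\dfrac{N-n}{nN}} = S\sqrt{\dfrac{1-f}{n}} = S\sqrt{\dfrac{1}{n} - \dfrac{1}{N}}$.

ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = NS\sqrt{\dfrac{N-n}{nN}} = NS\sqrt{\dfrac{1-f}{n}} = NS\sqrt{\dfrac{1}{n} - \dfrac{1}{N}}$.

For a large population, fpc $= (1 - f) \approx 1$; then

i) $V(\bar{y}) \approx \dfrac{S^2}{n}$, and $SE(\bar{y}) \approx \dfrac{S}{\sqrt{n}}$.

ii) $V(\hat{Y}) \approx \dfrac{N^2 S^2}{n}$, and $SE(\hat{Y}) \approx \dfrac{NS}{\sqrt{n}}$.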

Theorem: In srswor, the sample mean square $s^2$ is an unbiased estimate of the population
mean square $S^2$, i.e. $E(s^2) = S^2$.
Proof: By definition,
$$ E(s^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2)\right]. $$
To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that $V(y_i) = E(y_i^2) - \bar{Y}^2$, so that
$$ E(y_i^2) = \frac{N-1}{N}\,S^2 + \bar{Y}^2, \quad \text{since } V(y_i) = (N-1)S^2/N, $$
and $V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that
$$ E(\bar{y}^2) = \frac{N-n}{nN}\,S^2 + \bar{Y}^2, \quad \text{since } V(\bar{y}) = \frac{N-n}{nN}\,S^2 \text{ for srswor}. $$
Therefore,
$$ E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{N-1}{N}\,S^2 + \bar{Y}^2\right) - n\left(\frac{N-n}{nN}\,S^2 + \bar{Y}^2\right)\right] $$
$$ = \frac{1}{n-1}\cdot\frac{S^2}{N}\,[\,n(N-1) - (N-n)\,] = \frac{1}{n-1}\cdot\frac{S^2}{N}\,(n-1)N = S^2. $$
Example: A random sample of $n = 2$ households was drawn from a small colony of $N = 5$
households having monthly incomes (in thousand rupees) as follows:

Households:                     1    2     3     4    5
Income (in thousand rupees):    8    6.5   7.5   7    6

a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.
b) Enumerate all possible samples of size $n = 2$ by the without replacement method and
verify that
i) The sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$.
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$.
iii) $V(\bar{y}) = \dfrac{(N-n)S^2}{nN}$, and
iv) $E(s^2) = S^2$.
Solution:
a) We know that
$$ \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = 7, \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 0.5, \quad\text{and}\quad S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = 0.625. $$


b) Form a table for calculation as below, where $\bar{y}_i$ and $s_i^2$ denote the mean and the mean
square of the $i$-th sample:
Sample        $\bar{y}_i$   $\bar{y}_i^2$   $N\bar{y}_i$   $s_i^2$
(8, 6.5)      7.25    52.563    36.25    1.125
(8, 7.5)      7.75    60.063    38.75    0.125
(8, 7)        7.50    56.250    37.50    0.500
(8, 6)        7.00    49.000    35.00    2.000
(6.5, 7.5)    7.00    49.000    35.00    0.500
(6.5, 7)      6.75    45.563    33.75    0.125
(6.5, 6)      6.25    39.063    31.25    0.125
(7.5, 7)      7.25    52.563    36.25    0.125
(7.5, 6)      6.75    45.563    33.75    1.125
(7, 6)        6.50    42.250    32.50    0.500

i) $E(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i = 7 = \bar{Y}$, where $n' = {}^{5}C_2 = 10$ is the number of samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} N\bar{y}_i = 35$, or $E(N\bar{y}) = N E(\bar{y}) = 35 = Y$.

iii) $V(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} (\bar{y}_i - \bar{Y})^2 = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 0.1875$, and $\dfrac{(N-n)S^2}{nN} = 0.1875$.

Therefore,
$$ V(\bar{y}) = \frac{(N-n)S^2}{nN} = 0.1875. $$
iv) $E(s^2) = \dfrac{1}{n'}\sum_{i=1}^{n'} s_i^2 = 0.625 = S^2$.
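
As a cross-check on this example, the ten samples can be enumerated directly. This is a minimal sketch under the assumption that Python's standard `itertools` and `statistics` modules are available; the names used are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

incomes = [8.0, 6.5, 7.5, 7.0, 6.0]   # thousand rupees, from the example
n = 2
N = len(incomes)

# All C(N, n) = 10 equally likely samples drawn without replacement.
samples = list(combinations(incomes, n))

sample_means = [mean(s) for s in samples]
sample_mean_squares = [variance(s) for s in samples]   # divisor n - 1

S2 = variance(incomes)                 # population mean square, divisor N - 1

E_ybar = mean(sample_means)            # should equal the population mean
V_ybar = pvariance(sample_means)       # exact V(ybar) over the 10 samples
V_formula = (N - n) * S2 / (n * N)     # (N - n) S^2 / (nN)
E_s2 = mean(sample_mean_squares)       # should equal S^2

print(E_ybar, mean(incomes))
print(V_ybar, V_formula)
print(E_s2, S2)
```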

Property: $V(\bar{y})$ under srswor is less than $V(\bar{y})$ under srswr (for $n > 1$).

Proof:
Under srswor, $V(\bar{y}) = \dfrac{N-n}{nN}\,S^2$ (2.7)

and under srswr, $V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{N-1}{nN}\,S^2$. (2.8)

Comparing (2.7) and (2.8), we note that $(N - 1) > (N - n)$ whenever $n > 1$, so that
$$ \frac{N-1}{nN}\,S^2 > \frac{N-n}{nN}\,S^2. $$
nN nN
Example: In a population of size $N = 5$, the values are 2, 4, 6, 8 and 10. For a srs of size
$n = 3$, show that $V(\bar{y})_{srswor} < V(\bar{y})_{srswr}$.
Solution: We know that
$$ V(\bar{y})_{srswor} = \frac{N-n}{nN}\,S^2, \quad\text{and}\quad V(\bar{y})_{srswr} = \frac{N-1}{nN}\,S^2, $$
where $S^2 = \dfrac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = 10$ and $\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i = 6$.

Thus,
$$ V(\bar{y})_{srswor} = \frac{4}{3}, \quad V(\bar{y})_{srswr} = \frac{8}{3}, $$
and therefore $V(\bar{y})_{srswor} < V(\bar{y})_{srswr}$.
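
The two variance formulas (2.7) and (2.8) are easy to evaluate exactly. The sketch below uses Python's `fractions.Fraction` so the results come out as exact fractions; the function names are hypothetical helpers, not part of the original notes.

```python
from fractions import Fraction

def var_mean_srswor(S2, N, n):
    """V(ybar) under srswor: (N - n) S^2 / (nN)."""
    return Fraction(N - n, n * N) * S2

def var_mean_srswr(S2, N, n):
    """V(ybar) under srswr: (N - 1) S^2 / (nN)."""
    return Fraction(N - 1, n * N) * S2

values = [2, 4, 6, 8, 10]          # population from the example
N, n = len(values), 3

Ybar = Fraction(sum(values), N)
S2 = sum((v - Ybar) ** 2 for v in values) / (N - 1)   # population mean square

v_wor = var_mean_srswor(S2, N, n)
v_wr = var_mean_srswr(S2, N, n)
print(S2, v_wor, v_wr)
```

For any $n > 1$ the srswor value is the smaller of the two, since the two formulas differ only in the numerators $N - n$ and $N - 1$.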
Theorem: Let a srswor sample of size $n$ be drawn from a population of size $N$. Let
$T = \sum_{i=1}^{n} \alpha_i y_i$ be a class of linear estimators of $\bar{Y}$, where the $\alpha_i$'s are coefficients
attached to the sample values. Then,
i) The class $T$ is a linear unbiased class of estimators if $\sum_{i=1}^{n} \alpha_i = 1$.
ii) The sample mean $\bar{y}$ is the best linear unbiased estimate.
Proof:
i) $E(T) = E\left[\sum_{i=1}^{n} \alpha_i y_i\right] = \sum_{i=1}^{n} \alpha_i E(y_i) = \sum_{i=1}^{n} \alpha_i \bar{Y} = \bar{Y}$, iff $\sum_{i=1}^{n} \alpha_i = 1$.

ii) Under $\sum_{i=1}^{n} \alpha_i = 1$,
$$ V(T) = E\left[\sum_{i=1}^{n} \alpha_i y_i - \bar{Y}\right]^2 = E\left[\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 - 2\bar{Y}\left(\sum_{i=1}^{n} \alpha_i y_i\right) + \bar{Y}^2\right] = E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 - \bar{Y}^2. $$
Consider
$$ E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 = E\left[\sum_{i=1}^{n} \alpha_i^2 y_i^2 + \sum_{i \ne j}^{n} \alpha_i \alpha_j y_i y_j\right] = \sum_{i=1}^{n} \alpha_i^2 E(y_i^2) + \sum_{i \ne j}^{n} \alpha_i \alpha_j E(y_i y_j). $$
Now
$$ E(y_i^2) = \frac{1}{N}\sum_{i=1}^{N} Y_i^2. $$
Note that
$$ (N-1) S^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 \quad\text{or}\quad \sum_{i=1}^{N} Y_i^2 = (N-1) S^2 + N\bar{Y}^2. $$
Thus,
$$ E(y_i^2) = \frac{1}{N}(N-1) S^2 + \bar{Y}^2 $$
and
$$ E(y_i y_j) = \sum_{i \ne j}^{N} Y_i \Pr(i)\, Y_j \Pr(j \mid i) = \frac{1}{N}\cdot\frac{1}{N-1}\sum_{i \ne j}^{N} Y_i Y_j. $$
Note that
$$ \left[\sum_{i=1}^{N} Y_i\right]^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j = (N-1) S^2 + N\bar{Y}^2 + \sum_{i \ne j}^{N} Y_i Y_j $$
$$ \Rightarrow \sum_{i \ne j}^{N} Y_i Y_j = N^2\bar{Y}^2 - (N-1) S^2 - N\bar{Y}^2. $$
Hence,
$$ E(y_i y_j) = \frac{1}{N}\cdot\frac{1}{N-1}\,[\,N^2\bar{Y}^2 - (N-1) S^2 - N\bar{Y}^2\,] = \bar{Y}^2 - \frac{S^2}{N}, $$
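and
$$ E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 = \sum_{i=1}^{n} \alpha_i^2\left[\frac{1}{N}(N-1) S^2 + \bar{Y}^2\right] + \sum_{i \ne j}^{n} \alpha_i \alpha_j\left[\bar{Y}^2 - \frac{S^2}{N}\right] $$
$$ = S^2\sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N}\sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2\sum_{i=1}^{n} \alpha_i^2 + \left[1 - \sum_{i=1}^{n} \alpha_i^2\right]\left[\bar{Y}^2 - \frac{S^2}{N}\right] $$
$$ = S^2\sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 - \frac{S^2}{N}, $$
where $\sum_{i \ne j}^{n} \alpha_i \alpha_j = \left(\sum_{i=1}^{n} \alpha_i\right)^2 - \sum_{i=1}^{n} \alpha_i^2 = 1 - \sum_{i=1}^{n} \alpha_i^2$ has been used.

Thus,
$$ V(T) = S^2\sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N}. $$
Since $\sum_{i=1}^{n} \alpha_i^2 = \sum_{i=1}^{n}\left(\alpha_i - \dfrac{1}{n}\right)^2 + \dfrac{1}{n}$ under the condition $\sum_{i=1}^{n} \alpha_i = 1$, we have
$$ V(T) = S^2\left[\sum_{i=1}^{n}\left(\alpha_i - \frac{1}{n}\right)^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\right]. $$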
Therefore, we note that $V(T)$ will be minimum if $\sum_{i=1}^{n}\left(\alpha_i - \dfrac{1}{n}\right)^2 = 0$, i.e. $\alpha_i = \dfrac{1}{n}$ for all
$i = 1, 2, \dots, n$, and then $T = \dfrac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.

OR
Differentiating the variance function with respect to $\alpha_i$ and equating to zero, we get
$$ \frac{\partial}{\partial \alpha_i} V(T) = 2 S^2\left(\alpha_i - \frac{1}{n}\right) = 0 \;\Rightarrow\; \alpha_i = \frac{1}{n}, \text{ for all } i = 1, 2, \dots, n, \text{ and } T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}. $$
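
The closed form for $V(T)$ can be checked numerically on a small population. The following sketch is an illustration only: the population (2, 4, 6, 8, 10) and the weight vectors are arbitrary choices, and the helper names are hypothetical. It enumerates every ordered srswor sample, each of which is equally likely, and compares the exact variance of $T$ with $S^2\left[\sum(\alpha_i - 1/n)^2 + 1/n - 1/N\right]$.

```python
from fractions import Fraction
from itertools import permutations

values = [Fraction(v) for v in (2, 4, 6, 8, 10)]   # illustrative population
N, n = len(values), 3

Ybar = sum(values) / N
S2 = sum((v - Ybar) ** 2 for v in values) / (N - 1)

def variance_of_T(alphas):
    """Exact V(T) over all equally likely ordered srswor samples."""
    outcomes = [sum(a * y for a, y in zip(alphas, sample))
                for sample in permutations(values, len(alphas))]
    centre = sum(outcomes) / len(outcomes)          # E(T)
    return sum((t - centre) ** 2 for t in outcomes) / len(outcomes)

def formula(alphas):
    """S^2 [ sum (alpha_i - 1/n)^2 + 1/n - 1/N ], valid when sum(alphas) == 1."""
    k = len(alphas)
    spread = sum((a - Fraction(1, k)) ** 2 for a in alphas)
    return S2 * (spread + Fraction(1, k) - Fraction(1, N))

equal = [Fraction(1, 3)] * 3                                   # T = ybar
unequal = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]     # weights sum to 1

print(variance_of_T(equal), formula(equal))
print(variance_of_T(unequal), formula(unequal))
```

With equal weights the variance reduces to $(1/n - 1/N) S^2$; any other weights summing to one add the non-negative term $S^2\sum(\alpha_i - 1/n)^2$.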

Simple random sampling applied to qualitative characteristics


Suppose a random sample of size $n$ is drawn from a population of size $N$, for which the
proportion of individuals having a character $C$ (attribute) is $P$. Thus, in the population, $NP$
members have the character $C$ and $NQ$ members have the character not-$C$, where $Q = 1 - P$
(e.g. in sampling from a population of persons, we may have persons who are smokers and
non-smokers, honest and dishonest, below poverty line and above poverty line, etc.). Let $a$ be
the number of members in the sample having the character $C$; then the sample proportion is
$p = \dfrac{a}{n}$.
To obtain the expectation and variance of the sample proportion, we first change the attribute
to a variable by adopting the following procedure.
We assign to the $i$-th member of the population the value $Y_i$, which is equal to 1 if this
member possesses the character $C$ and is equal to 0 otherwise. In this way, we get a variable
$y$, which has

Population total $= \sum_{i=1}^{N} Y_i = NP = A$.

Population mean $= \dfrac{1}{N}\sum_{i=1}^{N} Y_i = \dfrac{NP}{N} = P$.

Population variance $= \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - P^2 = \dfrac{NP}{N} - P^2 = PQ$.

Mean square of population $= \dfrac{1}{N-1}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - NP^2\right] = \dfrac{NP - NP^2}{N-1} = \dfrac{NPQ}{N-1}$.

Similarly, assign to the $i$-th member of the sample the value $y_i$, which is equal to 1 if this
member possesses the character $C$ and is equal to 0 otherwise; then

Sample total $= \sum_{i=1}^{n} y_i = np = a$, and Sample mean $= \dfrac{1}{n}\sum_{i=1}^{n} y_i = \dfrac{a}{n} = p$.

Mean square for sample $= \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - p)^2 = \dfrac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - np^2\right] = \dfrac{1}{n-1}(np - np^2) = \dfrac{npq}{n-1}$, where $q = 1 - p$.

Case I) Random sampling with replacement


On replacing $\bar{Y}$ by $P$, $Y$ by $NP$, $\bar{y}$ by $p = \dfrac{a}{n}$, $S^2$ by $\dfrac{NPQ}{N-1}$ and $\sigma^2$ by $PQ$ in the
expressions obtained for the expectation and variance of the estimates of the population mean
and population total, we find
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of
the population proportion $P$, and $V(p) = V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{PQ}{n}$.
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np = \hat{A}$ is an unbiased estimate of
$NP = A$, and
$$ V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = \frac{N^2\sigma^2}{n} = \frac{N^2 PQ}{n}. $$
Theorem: $\hat{V}(p) = v(p) = \dfrac{pq}{n-1}$ is an unbiased estimate of $V(p) = \dfrac{PQ}{n}$.
Proof:
$$ E[\hat{V}(p)] = E\left[\frac{pq}{n-1}\right] = E\left[\frac{n}{n}\cdot\frac{pq}{n-1}\right] = \frac{1}{n} E\left[\frac{npq}{n-1}\right] = \frac{PQ}{n}, $$
since in srswr $E(s^2) = \sigma^2 = PQ$ and $s^2 = \dfrac{npq}{n-1}$.
Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2\hat{V}(p) = N^2\dfrac{pq}{n-1}$ is an unbiased estimate of $V(\hat{A}) = N^2\dfrac{PQ}{n}$.
Remarks
i) The standard error (SE) of $p$ is $SE(p) = \sqrt{PQ/n}$.

ii) The standard error of $\hat{A}$ is $SE(\hat{A}) = N\sqrt{PQ/n}$.

Case II) Random sampling without replacement

Results are:
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of
the population proportion $P$, and
$$ V(p) = V(\bar{y}) = \frac{N-n}{nN}\,S^2 = \frac{N-n}{nN}\cdot\frac{NPQ}{N-1} = \left(\frac{N-n}{N-1}\right)\frac{PQ}{n}. $$
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np$ is an unbiased estimate of $NP$, and
$$ V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = N^2\,\frac{N-n}{nN}\,S^2 = N^2\,\frac{N-n}{nN}\cdot\frac{NPQ}{N-1} = N^2\left(\frac{N-n}{N-1}\right)\frac{PQ}{n}. $$
Theorem: $\hat{V}(p) = v(p) = \left(\dfrac{N-n}{n-1}\right)\dfrac{pq}{N}$ is an unbiased estimate of $V(p) = \left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}$.
Proof:
$$ E[\hat{V}(p)] = E\left[\left(\frac{N-n}{n-1}\right)\frac{pq}{N}\right] = E\left[\left(\frac{N-n}{nN}\right)\frac{npq}{n-1}\right] = \left(\frac{N-n}{nN}\right) E\left[\frac{npq}{n-1}\right] = \left(\frac{N-n}{N-1}\right)\frac{PQ}{n}, $$
since in srswor $E(s^2) = S^2 = \dfrac{NPQ}{N-1}$ and $s^2 = \dfrac{npq}{n-1}$.
Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2\hat{V}(p) = N\left(\dfrac{N-n}{n-1}\right) pq$ is an unbiased estimate of
$V(\hat{A}) = N^2\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}$.
Remarks

The standard error (SE) of $p$ is $SE(p) = \sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$ and the standard error of $\hat{A}$
is $SE(\hat{A}) = N\sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$.
Example: A list of 3000 voters of a ward in a city was examined for measuring the
accuracy of the recorded ages. A random sample of 300 names was taken, which revealed
that 51 citizens were shown with wrong ages. Estimate the total number of voters having a
wrong description of age in the list and estimate the standard error.
Solution: Given $N = 3000$, $n = 300$, $a = 51$, and $p = \dfrac{a}{n} = 0.17$; then $\hat{A} = Np = 510$.
i) If srswr is considered, the estimate of the standard error is given by
$$ \text{Est}\,[SE(\hat{A})] = N\sqrt{\frac{pq}{n-1}} = 65.1696 \approx 65. $$
ii) If srswor is considered, the estimate of the standard error is given by
$$ \text{Est}\,[SE(\hat{A})] = \sqrt{N\left(\frac{N-n}{n-1}\right) pq} = 61.8246 \approx 62. $$

Confidence interval (Interval estimations)


After obtaining the estimate of an unknown parameter (which is rarely equal to the parameter),
it becomes necessary to measure the reliability of the estimate and to construct confidence
limits with a given degree of confidence. An estimate of a population parameter given by two
numbers between which the parameter may be considered to lie is called an interval estimate,
i.e. an interval estimate of a parameter $\theta$ is an interval of the form $L \le \theta \le U$, where $L$ and
$U$ depend on the sampling distribution of $\hat{\theta}$.
$L$ and $U$ are chosen, for any specified probability $1 - \alpha$, such that
$\Pr(L \le \theta \le U) = 1 - \alpha$. An interval $L \le \theta \le U$, computed for a particular sample, is called a
$(1 - \alpha)100\%$ confidence interval, the quantity $(1 - \alpha)$ is called the confidence coefficient or
the degree of confidence, and the end points $L$ and $U$ are called the lower and upper
confidence limits. For instance, when $\alpha = 0.05$ the degree of confidence is 0.95 and we get a
95% confidence interval.

Limits in case of simple random sampling with replacement


1. Confidence limit for population mean: It is usually assumed that the estimator $\bar{y}$ is
normally distributed about the corresponding population value, i.e. $\bar{y} \sim N(\bar{Y}, \sigma^2/n)$.
Since tables are available for the standard normal variable, we transform to the standard
normal as
$$ Z = \frac{\bar{y} - \bar{Y}}{\sigma/\sqrt{n}} \sim N(0, 1). $$
By definition,
$$ \Pr(|Z| \le Z_{\alpha/2}) = 1 - \alpha \quad\text{or}\quad \Pr(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1 - \alpha $$
$$ \text{or}\quad \Pr\left[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{SE(\bar{y})} \le Z_{\alpha/2}\right] = 1 - \alpha $$
$$ \text{or}\quad \Pr[-Z_{\alpha/2}\,SE(\bar{y}) \le \bar{y} - \bar{Y} \le Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha $$
$$ \text{or}\quad \Pr[\bar{y} - Z_{\alpha/2}\,SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $\bar{y} \pm Z_{\alpha/2}\,SE(\bar{y})$, i.e. $\bar{y} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$, will include $\bar{Y}$.
2. Confidence limit for population total: On the same lines, we see that
$$ \Pr[N\bar{y} - Z_{\alpha/2}\,SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\,SE(\hat{Y})] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $N\bar{y} \pm Z_{\alpha/2}\,N\sigma/\sqrt{n}$ will include $Y$.

Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is
used instead of the standard normal.
3. Confidence limit for population proportion: As above, we see that
$$ \Pr[p - Z_{\alpha/2}\,SE(p) \le P \le p + Z_{\alpha/2}\,SE(p)] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $p \pm Z_{\alpha/2}\sqrt{PQ/n}$ will include $P$.

Limits in case of simple random sampling without replacement


1. Confidence limit for population mean: Here also the estimate based on the sample is taken
to be normally distributed, i.e. $\bar{y} \sim N(\bar{Y}, (1 - f) S^2/n)$; then
$$ Z = \frac{\bar{y} - \bar{Y}}{S\sqrt{(1 - f)/n}} \sim N(0, 1). $$
By definition,
$$ \Pr[\bar{y} - Z_{\alpha/2}\,SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $\bar{y} \pm Z_{\alpha/2}\,SE(\bar{y})$, i.e. $\bar{y} \pm Z_{\alpha/2}\,S\sqrt{(1 - f)/n}$, will
include $\bar{Y}$.
2. Confidence limit for population total: As in srswr, we see that
$$ \Pr[N\bar{y} - Z_{\alpha/2}\,SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\,SE(\hat{Y})] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $N\bar{y} \pm Z_{\alpha/2}\,NS\sqrt{(1 - f)/n}$ will include $Y$.
Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is
used instead of the standard normal.
3. Confidence limit for population proportion: As in srswr, we see that
$$ \Pr[p - Z_{\alpha/2}\,SE(p) \le P \le p + Z_{\alpha/2}\,SE(p)] = 1 - \alpha. $$
With probability $(1 - \alpha)$, the interval $p \pm Z_{\alpha/2}\sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$ will include $P$.
Example: In a library, there are 4500 members who can borrow books. A random
sample of 16 persons was taken and the number of books borrowed by them during a month was
recorded as follows:
2, 3, 10, 0, 5, 7, 13, 1, 6, 23, 18, 12, 6, 0, 1 and 7. Estimate the average number of books
borrowed by each member during a month and obtain a 95% confidence interval.
Solution: Given $N = 4500$, $n = 16$.
Estimate of the population mean: $\hat{\bar{Y}}$ = sample mean $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i = 7.125$.

Since the sample size is small and the variance of the population is unknown, the interval is
defined as
$$ \bar{y} \pm t_{\alpha/2,\,n-1}\,S\sqrt{\frac{N-n}{nN}} \approx \bar{y} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}, \text{ as the population size is very large.} $$
$S^2$ is unknown; it can be replaced by its estimator
$$ s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = 44.25. $$
Therefore,
Upper confidence limit $= 7.125 + 2.131 \times \dfrac{6.652}{\sqrt{16}} = 10.668853 \approx 11$, and
Lower confidence limit $= 7.125 - 2.131 \times \dfrac{6.652}{\sqrt{16}} = 3.58 \approx 4$.
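
The limits in this example can be recomputed from the raw data. The sketch below assumes Python's standard `statistics` module and takes the critical value $t_{0.025,\,15} = 2.131$ from the example rather than computing it; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

books = [2, 3, 10, 0, 5, 7, 13, 1, 6, 23, 18, 12, 6, 0, 1, 7]
t_crit = 2.131            # t value for 15 degrees of freedom, as quoted in the example

n = len(books)
ybar = mean(books)        # estimate of the population mean
s = stdev(books)          # square root of the sample mean square (divisor n - 1)

# The fpc is ignored here because n is tiny relative to N = 4500.
half_width = t_crit * s / sqrt(n)
lower, upper = ybar - half_width, ybar + half_width

print(ybar, round(s ** 2, 2))
print(round(lower, 2), round(upper, 2))
```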
Example: In a mess, it was observed that leftovers cost a lot. A survey was conducted to find
out the optimum quantity for each item. A random sample of 10 inmates showed that they
had taken 4, 5, 2, 3, 1, 7, 2, 3, 4, 4 slices of bread at breakfast. If 120 breakfasts are
to be served every day, estimate the number of slices required every day. Also obtain a 95%
confidence interval for it.

Solution: Given $N = 120$, $n = 10$, and $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i = 3.5$, the

estimate of the population total is $\hat{Y} = N\bar{y} = 420$.

Since the sample size is small and the variance of the population is unknown, the confidence
limits are
$$ N\bar{y} \pm t_{\alpha/2,\,n-1}\,NS\sqrt{(1 - f)/n}. $$
Since $S^2$ is unknown, it can be replaced by its unbiased estimator
$$ s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = 2.94444. $$
Hence,
Upper confidence limit $= 420 + 2.262 \times 120 \times 1.716 \times \sqrt{\left(1 - \dfrac{10}{120}\right)/10} = 561.02517 \approx 561$
and
Lower confidence limit $= 420 - 141.02517 = 278.97483 \approx 279$.
Example: 100 villages were selected under srswor from a list of 1521 villages. It was
found that 19 of the selected villages were illegally occupied by some landlords. Estimate the
number of such villages occupied by the landlords out of the total 1521 villages and obtain a
95% confidence interval.
Solution: Given $N = 1521$, $n = 100$, and $a = 19$; then $p = 0.19$.
Estimate of the number of villages illegally occupied by landlords in the population of villages:
$\hat{A} = Np = 288.99 \approx 289$.
Since the sample size is $\ge 30$ and the variance of the population proportion is unknown, the
confidence limits will be
$Np \pm Z_{\alpha/2}\,SE(\hat{A})$, where $SE(\hat{A})$ is unknown, so it can be replaced by its estimator
$$ \sqrt{N\left(\frac{N-n}{n-1}\right) pq} = 57.964667. $$
Thus,
Upper confidence limit $= 289 + 1.96 \times 57.964667 = 402.6107 \approx 403$, and
Lower confidence limit $= 289 - 1.96 \times 57.964667 = 175.3893 \approx 175$.
Example: A simple random sample of 30 households was drawn without replacement from
a city area containing 14848 households. The numbers of persons per household in the sample
were as follows: 5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3 and
4. Estimate the average and total number of people in the area and compute the probability
that these estimates are within $\pm 10\%$ of the true value.
Solution: Given $N = 14848$ and $n = 30$, the estimate of the population mean is
$\bar{y} = 105/30 = 3.5$ and the estimate of the population total is
$$ \hat{Y} = N\bar{y} = 14848 \times \frac{105}{30} = 51968. $$
Assuming that the population values are normally distributed, $N\bar{y}$ is normally distributed
with mean $Y$ and standard error $NS\sqrt{(1 - f)/n}$, so that
$$ Z = \frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \sim N(0, 1). $$
Thus,
$$ \Pr(\text{estimate lies within 10\% of the true value}) = \Pr(Y - 10\% \text{ of } Y \le N\bar{y} \le Y + 10\% \text{ of } Y) $$
$$ = \Pr(0.9\,Y \le N\bar{y} \le 1.1\,Y) = \Pr(N\bar{y} \le 1.1\,Y) - \Pr(N\bar{y} \le 0.9\,Y). $$
Now,
$$ \Pr(N\bar{y} \le 1.1\,Y) = \Pr\left(\frac{1}{1.1}\,N\bar{y} \le Y\right) = \Pr\left(\frac{10}{11}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} \le Y + \frac{1}{11}\,N\bar{y}\right) = \Pr\left(N\bar{y} - Y \le \frac{1}{11}\,N\bar{y}\right) $$
$$ = \Pr\left(\frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \le \frac{N\bar{y}}{11\,NS\sqrt{(1 - f)/n}}\right) = \Pr(Z \le 1.457) = 0.9279, $$
where the unknown $S$ is replaced by its sample estimate $s$ ($s^2 = 41.5/29 \approx 1.431$).
Similarly,
$$ \Pr(N\bar{y} \le 0.9\,Y) = \Pr\left(\frac{10}{9}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} \le Y - \frac{1}{9}\,N\bar{y}\right) = \Pr\left(N\bar{y} - Y \le -\frac{1}{9}\,N\bar{y}\right) $$
$$ = \Pr\left(\frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \le -\frac{N\bar{y}}{9\,NS\sqrt{(1 - f)/n}}\right) = \Pr(Z \le -1.78) = 0.0375. $$
Therefore, the required probability is $0.9279 - 0.0375 = 0.8904$.
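
The probability in this example can be recomputed without normal tables. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; it follows the same approach as the solution (the unknown $S$ is replaced by the sample value $s$, and $N\bar{y}$ is used in place of $Y$ when forming the bounds). Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

persons = [5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3,
           3, 4, 3, 3, 1, 2, 4, 3, 4, 3, 4]
N = 14848
n = len(persons)

ybar = mean(persons)
s = stdev(persons)                           # sample estimate used in place of S
total_hat = N * ybar                         # estimate of the population total
se_total = N * s * sqrt((1 - n / N) / n)     # N s sqrt((1 - f)/n)

# Standardised bounds from the derivation in the example.
z_upper = total_hat / (11 * se_total)
z_lower = -total_hat / (9 * se_total)

phi = NormalDist().cdf                       # standard normal distribution function
prob = phi(z_upper) - phi(z_lower)

print(total_hat, round(z_upper, 3), round(z_lower, 3), round(prob, 3))
```

The result agrees with the table-based value in the example to about three decimal places; small differences come from rounding $Z$ before the table lookup.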

Estimation of sample size


In planning a sample survey for estimating population parameters, the first question is
how to determine the size of the sample to be drawn. This can be done in the following ways:
a) Specify the precision in terms of margin of error: The margin of error which is
permissible in the estimate is known as the permissible error. It is taken as the maximum
difference between the estimate and the parametric value that can be tolerated. Suppose an
error $d$ on either side of the parameter value $\bar{Y}$ can be tolerated in the estimate $\bar{y}$ based
on the sample values. Thus the permissible error in the estimate $\bar{y}$ is specified by
$$ \bar{y} - \bar{Y} \le d \quad\text{or}\quad \bar{y} - \bar{Y} \ge -d \quad\text{or}\quad |\bar{y} - \bar{Y}| \le d. $$
Since $|\bar{y} - \bar{Y}|$ differs from sample to sample, this margin of error can be specified
in the form of a probability statement as:
$$ \Pr[\,|\bar{y} - \bar{Y}| \ge d\,] = \alpha \quad\text{or}\quad \Pr[\,|\bar{y} - \bar{Y}| \le d\,] = 1 - \alpha, \tag{2.9} $$
where $\alpha$ is small and is the risk that we are willing to bear that the actual difference is
greater than $d$. This $\alpha$ is called the level of significance and $(1 - \alpha)$ is called the level of
confidence or confidence coefficient.
As the population is normally distributed, the sample mean will also follow the normal
distribution, i.e. $\bar{y} \sim N[\bar{Y}, V(\bar{y})]$; then $Z = \dfrac{\bar{y} - \bar{Y}}{\sqrt{V(\bar{y})}} \sim N(0, 1)$.

For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the
standard normal table by the following equation:
$$ \Pr\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{V(\bar{y})}} \ge Z_{\alpha/2}\right] = \alpha \quad\text{or}\quad \Pr[\,|\bar{y} - \bar{Y}| \ge \sqrt{V(\bar{y})}\,Z_{\alpha/2}\,] = \alpha. \tag{2.10} $$
Comparing equations (2.9) and (2.10), we get
$$ d = Z_{\alpha/2}\sqrt{V(\bar{y})}, \text{ so that } d^2 = Z_{\alpha/2}^2\,V(\bar{y}) = Z_{\alpha/2}^2\left(\frac{1}{n} - \frac{1}{N}\right) S^2 $$
$$ \Rightarrow 1 = \frac{Z_{\alpha/2}^2 S^2}{d^2}\left(\frac{1}{n} - \frac{1}{N}\right) = n_0\left(\frac{1}{n} - \frac{1}{N}\right), \text{ where } n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} \tag{2.11} $$
$$ \text{or } 1 = \frac{n_0}{n} - \frac{n_0}{N} \;\Rightarrow\; \frac{n_0}{n} = 1 + \frac{n_0}{N} \text{ or } n = \frac{n_0}{1 + \dfrac{n_0}{N}}. \tag{2.12} $$
If $N$ is sufficiently large, then $n \approx n_0$, and for unknown $S^2$, some rough estimate of $S^2$
can be used in relations (2.11) and (2.12).
b) Specify the precision in terms of $V(\bar{y})$, i.e. we have to find the sample size $n$
such that $V(\bar{y}) = V$ (given). As in the case of margin of error,
$$ d = Z_{\alpha/2}\sqrt{V(\bar{y})} \;\Rightarrow\; V(\bar{y}) = \frac{d^2}{Z_{\alpha/2}^2}, \text{ and } n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} = \frac{S^2}{V(\bar{y})}. $$
Therefore, $n_0 = S^2/V$, and hence $n$ can be obtained by relation (2.12).

c) Specify the precision in terms of the coefficient of variation of $\bar{y}$:
$$ \text{Let } CV(\bar{y}) = e = \frac{\sqrt{V(\bar{y})}}{\bar{Y}} \;\Rightarrow\; \frac{V(\bar{y})}{\bar{Y}^2} = e^2 \text{ or } V(\bar{y}) = e^2\,\bar{Y}^2. \tag{2.13} $$
Substituting equation (2.13) in relation (2.11), we get
$$ n_0 = \frac{S^2}{e^2\,\bar{Y}^2}, $$
and hence $n$ from (2.12).

Remark
i) To get $n$ such that the margin of error in the estimate $\hat{Y} = N\bar{y}$ of the population total $Y$
is $d'$: $|\hat{Y} - Y| \le d'$ or $|N\bar{y} - N\bar{Y}| \le d'$, so that $N d = d'$ or $N^2 d^2 = d'^2$, or
$d^2 = \dfrac{d'^2}{N^2}$.
Therefore,
$$ n_0 = \left[\frac{N Z_{\alpha/2}\,S}{d'}\right]^2, $$
and $n$ can be obtained by relation (2.12).

ii) To find $n$ for $\hat{Y} = N\bar{y}$ with precision specified as $V(\hat{Y}) = V'$, i.e. $V(\hat{Y}) = N^2 V(\bar{y}) = V'$
$$ \Rightarrow V(\bar{y}) = \frac{V'}{N^2}, \text{ and } n_0 = \frac{N^2 S^2}{V'}, $$
then $n$ follows from (2.12).

Example: For a population of size $N = 430$ we know roughly that $\bar{Y} = 19$ and $S^2 = 85.6$. With srs, what should the sample size be to estimate $\bar{Y}$ with a margin of error of 10% of $\bar{Y}$, apart from a chance of 1 in 20?
Solution: The margin of error in the estimate $\bar{y}$ of $\bar{Y}$ is given, i.e.
\[ |\bar{y} - \bar{Y}| = 10\% \text{ of } \bar{Y} = \frac{19}{10} = 1.9, \]
so that
\[ \Pr[\,|\bar{y} - \bar{Y}| \ge 1.9\,] = \frac{1}{20} = 0.05, \quad\text{and}\quad n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} = \frac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.091678. \]
Therefore,
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} = 75.168 \approx 75. \]
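
The arithmetic of this example can be checked with a short Python sketch of relations (2.11) and (2.12). The function names are hypothetical, and $Z_{\alpha/2} = 1.96$ is passed in as the table value used in the solution.

```python
def n0_for_mean(S2: float, d: float, z: float = 1.96) -> float:
    """Relation (2.11): first approximation n0 = z^2 * S^2 / d^2."""
    return z ** 2 * S2 / d ** 2


def adjust_for_population(n0: float, N: int) -> float:
    """Relation (2.12): n = n0 / (1 + n0/N)."""
    return n0 / (1.0 + n0 / N)


# Values from the example: N = 430, Ybar = 19, S^2 = 85.6, margin = 10% of Ybar.
N, Ybar, S2 = 430, 19.0, 85.6
d = 0.10 * Ybar

n0 = n0_for_mean(S2, d)
n = adjust_for_population(n0, N)
print(f"n0 = {n0:.6f}, n = {n:.3f}")
```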
Example: In a population of 676 petition sheets, how large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume the population mean square to be 229.
Solution: Let $Y$ be the number of signatures on all the sheets and let $\hat{Y}$ be the estimate of $Y$. The margin of error is specified in the estimate $\hat{Y}$ of $Y$ as
\[ |\hat{Y} - Y| = 1000, \quad\text{so that}\quad \Pr[\,|\hat{Y} - Y| \ge 1000\,] = \frac{1}{20} = 0.05. \]
We know that
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{here}\quad n_0 = \left( \frac{N Z_{\alpha/2}\, S}{d'} \right)^2 = \left( \frac{676 \times 1.96}{1000} \right)^2 \times 229 = 402.01385, \]
and hence
\[ n = 252.09 \approx 252. \]
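
The following Python sketch reproduces these figures for the population total and also illustrates that the variance form $n_0 = N^2 S^2 / V'$ gives the same answer when $V' = (d'/Z_{\alpha/2})^2$. The function names are hypothetical.

```python
def n0_total_from_margin(N: int, S2: float, d_total: float, z: float = 1.96) -> float:
    """n0 = (N * z * S / d')^2 for a margin of error d' on the population total."""
    return (N * z / d_total) ** 2 * S2


def n0_total_from_variance(N: int, S2: float, V_total: float) -> float:
    """n0 = N^2 * S^2 / V' for a target variance V' of the estimated total."""
    return N ** 2 * S2 / V_total


def adjust_for_population(n0: float, N: int) -> float:
    """Relation (2.12): n = n0 / (1 + n0/N)."""
    return n0 / (1.0 + n0 / N)


# Values from the example: 676 sheets, S^2 = 229, margin of error 1000 on the total.
N, S2, d_total, z = 676, 229.0, 1000.0, 1.96

n0 = n0_total_from_margin(N, S2, d_total, z)
n0_alt = n0_total_from_variance(N, S2, (d_total / z) ** 2)
n = adjust_for_population(n0, N)
print(f"n0 = {n0:.5f}, n0 (variance form) = {n0_alt:.5f}, n = {n:.2f}")
```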

Estimation of sample size for proportion


a) When precision is specified in terms of margin of error: Suppose the size of the population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$. The margin of error can be specified in the form of a probability statement as
\[ \Pr[\,|p - P| \ge d\,] = \alpha \quad\text{or}\quad \Pr[\,|p - P| < d\,] = 1 - \alpha. \tag{2.14} \]
Taking $p$ to be approximately normally distributed, $p \sim N[P, V(p)]$, so that
\[ Z = \frac{p - P}{\sqrt{V(p)}} \sim N(0, 1). \]
For the given value of $\alpha$ we can find the value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table by the following relation:
\[ \Pr\left[ \frac{|p - P|}{\sqrt{V(p)}} \ge Z_{\alpha/2} \right] = \alpha \quad\text{or}\quad \Pr\left[\,|p - P| \ge \sqrt{V(p)}\; Z_{\alpha/2}\,\right] = \alpha. \tag{2.15} \]
Comparing equations (2.14) and (2.15), the relation which gives the value of $n$ with the required precision of the estimate $p$ of $P$ is
\[ d = Z_{\alpha/2} \sqrt{V(p)} \quad\text{or}\quad d^2 = Z_{\alpha/2}^2\, V(p) = Z_{\alpha/2}^2 \left( \frac{N - n}{N - 1} \right) \frac{PQ}{n}, \]
as sampling is srswor.
\[ \Rightarrow\; 1 = \frac{Z_{\alpha/2}^2\, PQ}{d^2} \left[ \frac{N - n}{n (N - 1)} \right] = n_0\, \frac{N - n}{n (N - 1)}, \quad\text{where}\quad n_0 = \frac{Z_{\alpha/2}^2\, PQ}{d^2} = \frac{PQ}{V(p)} \tag{2.16} \]
\[ \text{or}\quad \frac{N - 1}{n_0} = \frac{N - n}{n} = \frac{N}{n} - 1 \;\Rightarrow\; \frac{N}{n} = 1 + \frac{N - 1}{n_0} \]
\[ \text{or}\quad n = \frac{N}{1 + \dfrac{N - 1}{n_0}} = \frac{N n_0}{n_0 + (N - 1)} = \frac{n_0}{\dfrac{n_0}{N} + \dfrac{N - 1}{N}} \cong \frac{n_0}{1 + \dfrac{n_0}{N}}. \tag{2.17} \]
If $N$ is sufficiently large, then $n \approx n_0$.
b) If precision is specified in terms of $V(p)$, i.e. $V(p) = V$ (given):
Substituting $V(p) = V$ in relation (2.16) we get $n_0 = PQ / V$, and hence $n$ can be obtained by relation (2.17).
c) When precision is given in terms of the coefficient of variation of $p$:
Let
\[ CV(p) = e = \frac{\sqrt{V(p)}}{P} \;\Rightarrow\; \frac{V(p)}{P^2} = e^2, \quad\text{or}\quad V(p) = e^2 P^2. \tag{2.18} \]
Substituting equation (2.18) in relation (2.16), we get
\[ n_0 = \frac{PQ}{e^2 P^2} = \frac{Q}{e^2 P} = \frac{1}{e^2} \left( \frac{1}{P} - 1 \right), \]
and hence $n$ is given by the relation (2.17).

Remarks
i) To get $n$ when the margin of error in the estimate $\hat{A} = Np$ of the population total $A = NP$ is $d'$: we have
\[ |\hat{A} - A| = d' \;\text{ or }\; |Np - NP| = d', \;\text{ or }\; N\,|d| = d', \;\text{ or }\; N^2 d^2 = (d')^2, \;\text{ or }\; d^2 = \frac{(d')^2}{N^2}. \]
Thus,
\[ n_0 = \left( \frac{N Z_{\alpha/2} \sqrt{PQ}}{d'} \right)^2, \]
and $n$ can be obtained by the relation (2.17).

ii) To find $n$ for $\hat{A} = Np$ with precision specified as $V(\hat{A}) = V'$, i.e. $V(\hat{A}) = N^2 V(p) = V'$, so that $V(p) = V' / N^2$; substituting this value in equation (2.16), we get
\[ n_0 = \frac{N^2 PQ}{V'}, \]
and $n$ is given by relation (2.17).
Example: In a population of 4000 people who were called for casting their votes, 50% returned to the poll. Find the sample size needed to estimate this proportion so that the margin of error is 5% with 95% confidence coefficient.
Solution: The margin of error in the estimate $p$ of $P$ is given by
\[ |p - P| = 0.05, \quad\text{then}\quad \Pr[\,|p - P| \ge 0.05\,] = 0.05. \]
We know that
\[ n_0 = \frac{Z_{\alpha/2}^2\, PQ}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{0.0025} = 384.16 \approx 384, \]
and hence,
\[ n = \frac{n_0}{1 + (n_0 / N)} = 350.498 \approx 351. \]
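
As a check on this example, the sketch below evaluates relation (2.16) and both forms of relation (2.17): the exact form $n = N n_0 / (n_0 + N - 1)$ and the approximation $n_0 / (1 + n_0/N)$ used in the solution. The function names are hypothetical; for these inputs the two forms differ by less than one unit and round up to the same sample size.

```python
import math


def n0_for_proportion(P: float, d: float, z: float = 1.96) -> float:
    """Relation (2.16): n0 = z^2 * P * Q / d^2 with Q = 1 - P."""
    return z ** 2 * P * (1.0 - P) / d ** 2


def n_exact(n0: float, N: int) -> float:
    """Relation (2.17), exact form under srswor: n = N * n0 / (n0 + N - 1)."""
    return N * n0 / (n0 + N - 1)


def n_approx(n0: float, N: int) -> float:
    """Relation (2.17), approximate form: n = n0 / (1 + n0/N)."""
    return n0 / (1.0 + n0 / N)


# Values from the example: N = 4000, P = 0.5, d = 0.05, 95% confidence.
N, P, d = 4000, 0.5, 0.05

n0 = n0_for_proportion(P, d)
exact, approx = n_exact(n0, N), n_approx(n0, N)
print(f"n0 = {n0:.2f}, exact n = {exact:.3f}, approximate n = {approx:.3f}")
print("sample size to use:", math.ceil(approx))
```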
Exercise: In a study of the possible use of sampling to cut down the work in taking
inventory in a stock room, a count is made of the value of the articles on each of 36 shelves
in the room. The values to the nearest dollar are as follows.
29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60, 60, 60, 61, 61, 61, 62, 64,
65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.
The estimate of total value made from a sample is to be correct within $200, apart from a 1 in
20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the
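requirements. Do you agree? Here $\sum_i Y_i = 2138$ and $\sum_i Y_i^2 = 131\,682$.
Solution: It is given that $\sum_i Y_i = 2138$, $\sum_i Y_i^2 = 131\,682$, and $N = 36$, then
\[ S^2 = \frac{1}{N - 1} \left[ \sum_i Y_i^2 - N \bar{Y}^2 \right] = \frac{1}{36 - 1} \left[ 131\,682 - 36 \left( \frac{2138}{36} \right)^2 \right] = 134.5, \]
and
\[ |\hat{Y} - Y| = 200, \quad\text{then}\quad \Pr[\,|\hat{Y} - Y| \ge 200\,] = \frac{1}{20} = 0.05. \]

We know that
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{here}\quad n_0 = \left( \frac{N Z_{\alpha/2}}{d'} \right)^2 S^2 = \left( \frac{36 \times 1.96}{200} \right)^2 \times 134.5 = 16.7409, \]
and therefore,
\[ n = 11.42765 \approx 12. \]
A simple random sample of 12 shelves therefore meets the requirement, so we agree with the advisor.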
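
The same calculation can be run directly on the 36 shelf values listed in the exercise. The sketch below is only a check of the arithmetic; it uses the unrounded $S^2$ from `statistics.variance` (divisor $N - 1$), so the intermediate figures differ slightly from the rounded ones above, but the required sample size is unchanged.

```python
import math
import statistics

# Shelf values (to the nearest dollar) as listed in the exercise.
shelf_values = [
    29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60,
    60, 60, 61, 61, 61, 62, 64, 65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85,
]

N = len(shelf_values)
S2 = statistics.variance(shelf_values)  # population mean square, divisor N - 1
z, d_total = 1.96, 200.0                # 1-in-20 chance, margin of $200 on the total

n0 = (N * z / d_total) ** 2 * S2        # first approximation for a total
n = n0 / (1.0 + n0 / N)                 # relation (2.12)
n_required = math.ceil(n)               # round up to a whole number of shelves

print(f"S^2 = {S2:.2f}, n0 = {n0:.4f}, n = {n:.4f}, shelves needed = {n_required}")
```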

Determination of sample size in decision problems (Another approach)
Let $l(z)$ denote the amount of loss (in monetary terms) that will be incurred in a decision through an error of amount $z$ in the estimate. Let $f(z)$ denote the probability density function of $z$. Then the expected loss for a given sample size $n$ will be
\[ L(n) = E[l(z)] = \int l(z) f(z)\, dz. \]
If $C(n)$ is the cost of a sample of size $n$, then the most economical sample size is the one that minimizes the sum of cost and expected loss. Thus the problem of determining the sample size can be stated as:
Find $n$ such that $\phi(n) = C(n) + L(n)$ is minimum.

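Exercise: If the loss function due to an error in $\bar{y}$ is proportional to $|\bar{y} - \bar{Y}|$ and if the total cost of the survey is $C = c_0 + c_1 n$, show that with simple random sampling, ignoring the fpc, the most economical value of $n$ is
\[ \left( \frac{\lambda S}{c_1 \sqrt{2\pi}} \right)^{2/3}, \]
where $\lambda$ is a constant.
Solution: Given $l(z) \propto |\bar{y} - \bar{Y}|$, then $l(z) = \lambda\, |\bar{y} - \bar{Y}|$, and
\[ \bar{y} \sim N\!\left( \bar{Y}, \frac{S^2}{n} \right), \text{ since } V(\bar{y}) = \frac{S^2}{n} \text{ when the fpc is ignored}, \;\Rightarrow\; z = (\bar{y} - \bar{Y}) \sim N\!\left( 0, \frac{S^2}{n} \right), \]
so that
\[ f(z) = \frac{1}{(S/\sqrt{n}) \sqrt{2\pi}} \exp\!\left[ -\frac{1}{2} \left( \frac{z}{S/\sqrt{n}} \right)^2 \right] = \frac{1}{(S/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 S^2} \right). \]
Now
$|z| = |\bar{y} - \bar{Y}| = \bar{y} - \bar{Y} = z$, if $\bar{y} > \bar{Y}$.
$|z| = |\bar{y} - \bar{Y}| = -(\bar{y} - \bar{Y}) = -z$, if $\bar{y} < \bar{Y}$.
Thus, the expected loss
\[ \begin{aligned}
L(n) &= \lambda \int_{-\infty}^{\infty} |z|\, f(z)\, dz = \lambda \int_{-\infty}^{0} |z|\, f(z)\, dz + \lambda \int_{0}^{\infty} |z|\, f(z)\, dz \\
&= -\lambda \int_{-\infty}^{0} z\, f(z)\, dz + \lambda \int_{0}^{\infty} z\, f(z)\, dz = 2\lambda \int_{0}^{\infty} z\, f(z)\, dz \\
&= 2\lambda \int_{0}^{\infty} z\, \frac{1}{(S/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 S^2} \right) dz.
\end{aligned} \]
Put $\dfrac{n z^2}{2 S^2} = t$, then $\dfrac{2 n z}{2 S^2}\, dz = dt$ or $z\, dz = \dfrac{S^2}{n}\, dt$.

Therefore,
\[ L(n) = 2\lambda \int_{0}^{\infty} \frac{S^2}{n}\, \frac{1}{(S/\sqrt{n}) \sqrt{2\pi}}\, e^{-t}\, dt = \frac{2 \lambda S}{\sqrt{2\pi}\, \sqrt{n}} \int_{0}^{\infty} e^{-t}\, dt = \frac{2 \lambda S}{\sqrt{2\pi}\, \sqrt{n}}, \quad\text{as}\quad \int_{0}^{\infty} e^{-t}\, dt = 1. \]

To determine the value of $n$, consider the function
\[ \phi(n) = L(n) + C(n) = c_0 + c_1 n + \frac{2 \lambda S}{\sqrt{2\pi}}\, n^{-1/2}. \]
Differentiating this function with respect to $n$, we get
\[ \frac{\partial \phi}{\partial n} = 0 = c_1 - \frac{1}{2} \left( \frac{2 \lambda S}{\sqrt{2\pi}} \right) n^{-3/2} \quad\text{or}\quad n^{-3/2}\, \frac{\lambda S}{\sqrt{2\pi}} = c_1 \]
\[ \text{or}\quad n^{-3/2} = \frac{c_1 \sqrt{2\pi}}{\lambda S} \quad\text{or}\quad n = \left( \frac{\lambda S}{c_1 \sqrt{2\pi}} \right)^{2/3}. \]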
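
The closed-form optimum can be compared with a brute-force search over integer sample sizes. The sketch below assumes hypothetical values for $\lambda$, $S$, $c_0$ and $c_1$ (they are not from the text) and uses the expected loss $L(n) = 2\lambda S / \sqrt{2\pi n}$ derived above.

```python
import math


def expected_abs_loss(n: float, lam: float, S: float) -> float:
    """L(n) = 2 * lam * S / sqrt(2 * pi * n), fpc ignored."""
    return 2.0 * lam * S / math.sqrt(2.0 * math.pi * n)


def total_cost(n: float, lam: float, S: float, c0: float, c1: float) -> float:
    """phi(n) = C(n) + L(n) with C(n) = c0 + c1 * n."""
    return c0 + c1 * n + expected_abs_loss(n, lam, S)


def optimal_n_closed_form(lam: float, S: float, c1: float) -> float:
    """n = (lam * S / (c1 * sqrt(2 * pi)))^(2/3)."""
    return (lam * S / (c1 * math.sqrt(2.0 * math.pi))) ** (2.0 / 3.0)


# Hypothetical inputs, chosen only for illustration.
lam, S, c0, c1 = 100.0, 10.0, 50.0, 1.0

n_star = optimal_n_closed_form(lam, S, c1)
n_best = min(range(1, 2001), key=lambda n: total_cost(n, lam, S, c0, c1))
print(f"closed form n* = {n_star:.2f}, best integer n = {n_best}")
```

Because $\phi(n)$ is convex for $n > 0$, the best integer sample size is one of the two integers adjacent to the closed-form value. The fixed cost $c_0$ does not affect the optimum.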

Exercise: With a loss function $l(z) = \lambda z^2$ and a cost function $C = c_0 + c_1 n$, show that using srs the most economical value of the sample size $n$ to estimate the population mean $\bar{Y}$ is
\[ \left( \frac{\lambda S^2}{c_1} \right)^{1/2}, \]
where $z = \bar{y} - \bar{Y}$ and $\bar{y}$ is the sample mean used to estimate $\bar{Y}$.

Solution: Given the quadratic loss function $l(z) = \lambda z^2$. By definition
\[ L(n) = E[\lambda z^2] = \lambda E(z^2). \]
Consider
\[ V(z) = E[z - E(z)]^2 = E(z^2), \quad\text{since}\quad E(z) = E(\bar{y} - \bar{Y}) = 0. \]
Also
\[ V(z) = V(\bar{y} - \bar{Y}) = V(\bar{y}) = \left( \frac{1}{n} - \frac{1}{N} \right) S^2. \]
Therefore,
\[ E(z^2) = \frac{S^2}{n} - \frac{S^2}{N}, \quad\text{and the expected loss}\quad L(n) = \frac{\lambda S^2}{n} - \frac{\lambda S^2}{N}. \]
To determine the value of $n$, consider the function
\[ \phi(n) = L(n) + C(n) = \frac{\lambda S^2}{n} - \frac{\lambda S^2}{N} + c_0 + c_1 n. \]
Differentiating this function with respect to $n$, we get
\[ \frac{\partial \phi}{\partial n} = 0 = -\frac{\lambda S^2}{n^2} + c_1, \quad\text{or}\quad \frac{\lambda S^2}{n^2} = c_1, \quad\text{or}\quad n = \left( \frac{\lambda S^2}{c_1} \right)^{1/2}. \]
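
For the quadratic loss, a small sketch can confirm that the term $-\lambda S^2 / N$ (which does not depend on $n$) leaves the optimum unchanged. All numerical inputs below are hypothetical and chosen so that the closed form gives a whole number.

```python
import math


def total_cost_quadratic(n: int, lam: float, S2: float, N: int, c0: float, c1: float) -> float:
    """phi(n) = lam * S^2 * (1/n - 1/N) + c0 + c1 * n, fpc term retained."""
    return lam * S2 * (1.0 / n - 1.0 / N) + c0 + c1 * n


# Hypothetical inputs: lam * S^2 / c1 = 3600, so the closed form gives n = 60.
lam, S2, N, c0, c1 = 4.0, 225.0, 5000, 100.0, 0.25

n_star = math.sqrt(lam * S2 / c1)
n_best = min(range(1, N + 1), key=lambda n: total_cost_quadratic(n, lam, S2, N, c0, c1))
print(f"closed form n* = {n_star:.1f}, best integer n = {n_best}")
```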

Exercise: The selling price of a lot of standing timber is UW , where U is the price per
unit volume and W is the volume of timber on the lot. The number N of logs on the lot is
counted, and the average volume per log is estimated from a simple random sample of n
logs. The estimate is made and paid for by the seller and is provisionally accepted by the
buyer. Later, the buyer finds out the exact volume purchased, and the seller reimburses him if
he has paid for more than was delivered. If he has paid for less than was delivered, the buyer
does not mention the fact.
Construct the seller's loss function. Assuming that the cost of measuring n logs is cn , find
the optimum value of n . The standard deviation of the volume per log may be denoted by S
and the fpc ignored.

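Solution: Let $\hat{W}$ be the estimated total volume of the timber. The error in the estimate is $z = \hat{W} - W$.
If $\hat{W} - W = z \ge 0$, the seller's loss is zero, i.e. $l(z) = 0$.

If $\hat{W} - W = z < 0$, the seller's loss is $-Uz$, i.e. $l(z) = -Uz$.

When the fpc is ignored, $V(\hat{W}) = N^2 S^2 / n$, then
\[ \hat{W} \sim N\!\left( W, \frac{N^2 S^2}{n} \right), \quad\text{or}\quad z = (\hat{W} - W) \sim N\!\left( 0, \frac{N^2 S^2}{n} \right), \]
so that
\[ f(z) = \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\!\left[ -\frac{1}{2} \left( \frac{z}{NS/\sqrt{n}} \right)^2 \right] = \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 N^2 S^2} \right). \]

Thus, the expected loss
\[ \begin{aligned}
L(n) &= \int_{-\infty}^{\infty} l(z)\, f(z)\, dz = \int_{-\infty}^{0} (-Uz)\, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 N^2 S^2} \right) dz \\
&= -\int_{-\infty}^{0} Uz\, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 N^2 S^2} \right) dz \\
&= \int_{0}^{\infty} Uz\, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\!\left( -\frac{n z^2}{2 N^2 S^2} \right) dz.
\end{aligned} \]

Put $\dfrac{n z^2}{2 N^2 S^2} = t$, then $\dfrac{2 n z}{2 N^2 S^2}\, dz = dt$ or $z\, dz = \dfrac{N^2 S^2}{n}\, dt$.

Therefore,
\[ L(n) = \int_{0}^{\infty} \frac{U N^2 S^2}{n}\, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}}\, e^{-t}\, dt = \frac{UNS}{\sqrt{2\pi}\, \sqrt{n}} \int_{0}^{\infty} e^{-t}\, dt = \frac{UNS}{\sqrt{2\pi}\, \sqrt{n}}, \quad\text{as}\quad \int_{0}^{\infty} e^{-t}\, dt = 1. \]
To determine the value of $n$, consider the function
\[ \phi(n) = L(n) + C(n) = c\,n + \frac{UNS}{\sqrt{2\pi}}\, n^{-1/2}. \]
Differentiating this function with respect to $n$, we get
\[ \frac{\partial \phi}{\partial n} = 0 = c - \frac{1}{2} \left( \frac{UNS}{\sqrt{2\pi}} \right) n^{-3/2} \quad\text{or}\quad n^{-3/2}\, \frac{UNS}{2 \sqrt{2\pi}} = c \]
\[ \text{or}\quad n^{-3/2} = \frac{2 c \sqrt{2\pi}}{UNS} \quad\text{or}\quad n = \left( \frac{UNS}{2 c \sqrt{2\pi}} \right)^{2/3}. \]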
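
The one-sided expected loss $UNS / \sqrt{2\pi n}$ can be checked by simulation. The sketch below is a rough Monte Carlo check under the same normality assumption; the values of $U$, $N$, $S$, $c$ and $n$ are hypothetical, and the function names are illustrative only.

```python
import math
import random


def simulated_seller_loss(U, N, S, n, reps=200_000, seed=0):
    """Average of l(z) = -U*z for z < 0 (else 0), with z ~ N(0, N^2 S^2 / n)."""
    rng = random.Random(seed)
    sd = N * S / math.sqrt(n)  # standard deviation of z when the fpc is ignored
    total = 0.0
    for _ in range(reps):
        z = rng.gauss(0.0, sd)
        if z < 0:              # under-estimate: the seller is not compensated
            total += -U * z
    return total / reps


def closed_form_loss(U, N, S, n):
    """L(n) = U * N * S / sqrt(2 * pi * n)."""
    return U * N * S / math.sqrt(2.0 * math.pi * n)


def optimal_n(U, N, S, c):
    """n = (U * N * S / (2 * c * sqrt(2 * pi)))^(2/3)."""
    return (U * N * S / (2.0 * c * math.sqrt(2.0 * math.pi))) ** (2.0 / 3.0)


# Hypothetical inputs, chosen only for illustration.
U, N, S, c, n = 2.0, 500, 1.5, 3.0, 40

sim = simulated_seller_loss(U, N, S, n)
exact = closed_form_loss(U, N, S, n)
n_opt = optimal_n(U, N, S, c)
print(f"simulated loss = {sim:.2f}, closed form = {exact:.2f}, optimum n = {n_opt:.2f}")
```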
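Exercise: With certain populations, it is known that the observations $Y_i$ are all zero on a portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with varying expenditure of effort, these units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in the original population and $\sigma_0^2$ is the variance when all zeros are excluded, then show that
\[ \sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\, \bar{Y}^2, \]
where $P = 1 - Q$, and $\bar{Y}$ is the mean value of $Y_i$ for the whole population.
Solution: Let the units be ordered as $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units are not zero, and the remaining $NQ$ units are all zero). Thus,
\[ \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i \text{ (population mean)}, \quad \bar{Y}_{NP} = \frac{1}{NP} \sum_{i=1}^{NP} Y_i, \quad \bar{Y}_{NQ} = \frac{1}{NQ} \sum_{i=NP+1}^{N} Y_i = 0, \]
also
\[ \sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i, \quad\text{and}\quad \sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2, \]
so that $N \bar{Y} = NP\, \bar{Y}_{NP}$, or $\bar{Y}_{NP} = \dfrac{1}{P}\, \bar{Y}$. By definition,
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2, \quad\text{or}\quad N \sigma^2 = \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2. \]
Similarly,
\[ NP\, \sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP\, \bar{Y}_{NP}^2. \]
Thus,
\[ N (\sigma^2 - P \sigma_0^2) = NP\, \bar{Y}_{NP}^2 - N \bar{Y}^2 = NP\, \frac{1}{P^2}\, \bar{Y}^2 - N \bar{Y}^2 = N \left( \frac{1}{P} - 1 \right) \bar{Y}^2 = N \left( \frac{Q}{P} \right) \bar{Y}^2. \]
Therefore,
\[ P \sigma_0^2 = \sigma^2 - \frac{Q}{P}\, \bar{Y}^2 \quad\text{or}\quad \sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\, \bar{Y}^2. \]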
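
The identity can be verified numerically on a toy population. The eight values below are made up for illustration (half of them zero); `statistics.pvariance` uses the divisor $N$, which matches the definition of $\sigma^2$ in the solution.

```python
import statistics


def variance_without_zeros(sigma2: float, P: float, Ybar: float) -> float:
    """sigma_0^2 = sigma^2 / P - (Q / P^2) * Ybar^2 with Q = 1 - P."""
    Q = 1.0 - P
    return sigma2 / P - (Q / P ** 2) * Ybar ** 2


# Hypothetical population: four zero units and four non-zero units.
population = [0, 0, 0, 0, 2, 4, 6, 8]
nonzero = [y for y in population if y != 0]

P = len(nonzero) / len(population)
sigma2 = statistics.pvariance(population)  # variance of the whole population
sigma0_2 = statistics.pvariance(nonzero)   # variance with the zeros excluded
predicted = variance_without_zeros(sigma2, P, statistics.mean(population))

print(f"direct sigma_0^2 = {sigma0_2}, from the identity = {predicted}")
```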
Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn without replacement and added to the original sample. Show that the mean based on $(n + n_1)$ units is an unbiased estimator of the population mean, and that the ratio of its variance to that of the mean of the original $n$ units is approximately
\[ \frac{1 + 3 n_1 / n}{(1 + n_1 / n)^2}, \]
assuming that the population size is large.

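Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ elements be denoted by $\bar{y}_n$, $\bar{y}_{n_1}$, and $\bar{y}_{n+n_1}$ respectively, defined as
\[ \bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \bar{y}_{n_1} = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i, \quad\text{and}\quad \bar{y}_{n+n_1} = \frac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n + n_1}. \]
We have to show $E(\bar{y}_{n+n_1}) = \bar{Y}$. In this case the expectation is taken in two stages:
i) when the sample of size $n$ is fixed,
ii) overall expectation.
\[ \begin{aligned}
E(\bar{y}_{n+n_1}) &= \frac{1}{n + n_1}\, E(n \bar{y}_n + n_1 \bar{y}_{n_1}) = \frac{1}{n + n_1}\, E[\,n \bar{y}_n + n_1 E(\bar{y}_{n_1} \mid n)\,] \\
&= \frac{1}{n + n_1}\, E(n \bar{y}_n + n_1 \bar{y}_n), \text{ since } n_1 \text{ is a sub-sample of the sample of size } n, \\
&= \frac{1}{n + n_1}\, (n \bar{Y} + n_1 \bar{Y}) = \bar{Y}.
\end{aligned} \]
To obtain the variance,
\[ \begin{aligned}
V(\bar{y}_{n+n_1}) &= E(\bar{y}_{n+n_1} - \bar{Y})^2 = E\left[ \frac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n + n_1} - \bar{Y} \right]^2 \\
&= \frac{1}{(n + n_1)^2}\, E[\,n \bar{y}_n + n_1 \bar{y}_{n_1} - (n + n_1) \bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\, E[\,n \bar{y}_n - n \bar{Y} + n_1 \bar{y}_{n_1} - n_1 \bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\, E[\,n (\bar{y}_n - \bar{Y}) + n_1 \bar{y}_{n_1} - n_1 \bar{y}_n + n_1 \bar{y}_n - n_1 \bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\, E[\,(n + n_1)(\bar{y}_n - \bar{Y}) + n_1 (\bar{y}_{n_1} - \bar{y}_n)\,]^2 \\
&= \frac{1}{(n + n_1)^2}\, [\,(n + n_1)^2 E(\bar{y}_n - \bar{Y})^2 + n_1^2 E(\bar{y}_{n_1} - \bar{y}_n)^2\,],
\end{aligned} \]
since the cross-product term vanishes, because $E(\bar{y}_{n_1} - \bar{y}_n \mid n) = 0$. Hence
\[ \begin{aligned}
V(\bar{y}_{n+n_1}) &= \frac{1}{(n + n_1)^2}\, [\,(n + n_1)^2 V(\bar{y}_n) + n_1^2 E\{ E[(\bar{y}_{n_1} - \bar{y}_n)^2 \mid n] \}\,] \\
&= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + n_1^2 E\left\{ \left( \frac{1}{n_1} - \frac{1}{n} \right) s_n^2 \right\} \right] \\
&= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + n_1^2 \left( \frac{n - n_1}{n_1 n} \right) S^2 \right] \\
&= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + \frac{n_1 (n - n_1)}{n}\, S^2 \right] = V(\bar{y}_n) + \frac{n_1 (n - n_1)}{n (n + n_1)^2}\, S^2.
\end{aligned} \]
Therefore, taking $V(\bar{y}_n) = S^2 / n$ for a large population,
\[ \begin{aligned}
\frac{V(\bar{y}_{n+n_1})}{V(\bar{y}_n)} &= 1 + \frac{n_1 (n - n_1)}{n (n + n_1)^2\, V(\bar{y}_n)}\, S^2 = 1 + \frac{n_1 (n - n_1)}{n (n + n_1)^2\, S^2 / n}\, S^2 \\
&= \frac{(n + n_1)^2 + n_1 (n - n_1)}{(n + n_1)^2} = \frac{n^2 + n_1^2 + 2 n_1 n + n_1 n - n_1^2}{(n + n_1)^2} \\
&= \frac{n^2 + 3 n_1 n}{(n + n_1)^2} = \frac{1 + (3 n_1 / n)}{(1 + n_1 / n)^2}.
\end{aligned} \]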
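
The variance ratio can be illustrated by simulation. The sketch below stands in for a "large population" by drawing the original sample from a standard normal distribution (an assumption made here, not part of the exercise); since the population mean is zero, each variance is estimated by the average squared mean. The sizes $n = 10$, $n_1 = 5$ are arbitrary.

```python
import random


def variance_ratio_simulation(n: int, n1: int, reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of V(combined mean) / V(original mean)."""
    rng = random.Random(seed)
    sum_sq_original = 0.0
    sum_sq_combined = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sub = rng.sample(sample, n1)  # sub-sample drawn without replacement
        mean_original = sum(sample) / n
        mean_combined = (sum(sample) + sum(sub)) / (n + n1)
        sum_sq_original += mean_original ** 2
        sum_sq_combined += mean_combined ** 2
    return sum_sq_combined / sum_sq_original


def variance_ratio_formula(n: int, n1: int) -> float:
    """(1 + 3*n1/n) / (1 + n1/n)^2."""
    r = n1 / n
    return (1.0 + 3.0 * r) / (1.0 + r) ** 2


n, n1 = 10, 5
simulated = variance_ratio_simulation(n, n1)
formula = variance_ratio_formula(n, n1)
print(f"simulated ratio = {simulated:.3f}, formula = {formula:.3f}")
```

The ratio exceeds 1: adding a sub-sample of units already observed contributes no new information and gives those units extra weight, so the combined mean is less precise than the original one.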
Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar{y}$ is drawn from a finite population, and a simple random subsample of size $n_1$ is drawn from it with mean $\bar{y}_1$. Show that

i) $V(\bar{y}_1 - \bar{y}_2) = S^2 [(1/n_1) + (1/n_2)]$, where $\bar{y}_2$ is the mean of the remaining $n_2$ units in the sample,

ii) $V(\bar{y}_1 - \bar{y}) = S^2 [(1/n_1) - (1/n)]$,

iii) $\mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = 0$.
Repeated sampling implies repetition of the drawing of both the sample and subsample.
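Solution:
i) In repeated sampling the given procedure is equivalent to drawing subsamples of sizes $n_1$ and $n_2$ independently, thus
\[ V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2), \text{ since } \mathrm{Cov}(\bar{y}_1, \bar{y}_2) = 0, \]
\[ = S^2 [(1/n_1) + (1/n_2)], \text{ ignoring the fpc}. \]

ii) We have
\[ \bar{y} = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2} \;\Rightarrow\; \bar{y}_1 - \bar{y} = \bar{y}_1 - \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2} \]
\[ \text{or}\quad \bar{y}_1 - \bar{y} = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_1 - n_1 \bar{y}_1 - n_2 \bar{y}_2}{n_1 + n_2} = \frac{n_2 (\bar{y}_1 - \bar{y}_2)}{n}. \]
Therefore,
\[ \begin{aligned}
V(\bar{y}_1 - \bar{y}) &= V\left[ \frac{n_2 (\bar{y}_1 - \bar{y}_2)}{n} \right] = \frac{n_2^2}{n^2}\, V(\bar{y}_1 - \bar{y}_2) = \frac{n_2^2}{n^2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S^2 \\
&= \frac{n_2^2}{n^2} \left( \frac{n_1 + n_2}{n_1 n_2} \right) S^2 = \frac{n_2}{n_1 n}\, S^2 = \frac{n - n_1}{n_1 n}\, S^2 = \left( \frac{1}{n_1} - \frac{1}{n} \right) S^2.
\end{aligned} \]
iii) We have
\[ \mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = E[\bar{y} (\bar{y}_1 - \bar{y})] - E(\bar{y})\, E(\bar{y}_1 - \bar{y}) = E(\bar{y} \bar{y}_1 - \bar{y}^2) - \bar{Y} \times 0 = E(\bar{y} \bar{y}_1) - E(\bar{y}^2). \tag{1} \]
Consider
\[ \begin{aligned}
E(\bar{y} \bar{y}_1) &= E\left[ \left( \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n} \right) \bar{y}_1 \right] = E\left[ \frac{n_1}{n}\, \bar{y}_1^2 + \frac{n_2}{n}\, \bar{y}_1 \bar{y}_2 \right] \\
&= \frac{n_1}{n}\, E(\bar{y}_1^2) + \frac{n_2}{n}\, E(\bar{y}_1)\, E(\bar{y}_2) \\
&= \frac{n_1}{n}\, [\,V(\bar{y}_1) + \bar{Y}^2\,] + \frac{n_2}{n}\, \bar{Y}^2 = \frac{n_1}{n} \left( \frac{S^2}{n_1} + \bar{Y}^2 \right) + \frac{n_2}{n}\, \bar{Y}^2 \\
&= \frac{S^2}{n} + \frac{n_1}{n}\, \bar{Y}^2 + \frac{n_2}{n}\, \bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2.
\end{aligned} \tag{2} \]
Now
\[ V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2 \quad\text{or}\quad E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2. \tag{3} \]
In view of equations (1), (2), and (3), we get
\[ \mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = \left( \frac{S^2}{n} + \bar{Y}^2 \right) - \left( \frac{S^2}{n} + \bar{Y}^2 \right) = 0. \]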
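
The three results can be checked by complete enumeration on a small finite population, using exact rational arithmetic. The population values and the sizes $N = 5$, $n = 4$, $n_1 = 2$ below are hypothetical. Every sample of size $n$ and every subsample of size $n_1$ within it is listed once, so all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import combinations


def split_sample_moments(population, n, n1):
    """Return V(y1 - y2), V(y1 - ybar), Cov(ybar, y1 - ybar) by full enumeration."""
    N, n2 = len(population), n - n1
    rows = []
    for sample in combinations(range(N), n):
        ybar = Fraction(sum(population[i] for i in sample), n)
        for sub in combinations(sample, n1):
            y1 = Fraction(sum(population[i] for i in sub), n1)
            y2 = Fraction(sum(population[i] for i in sample if i not in sub), n2)
            rows.append((ybar, y1, y2))

    def mean(values):
        return sum(values, Fraction(0)) / len(values)

    def cov(a, b):
        return mean([x * y for x, y in zip(a, b)]) - mean(a) * mean(b)

    ybars, y1s, y2s = zip(*rows)
    d12 = [a - b for a, b in zip(y1s, y2s)]   # y1 - y2
    d1 = [a - b for a, b in zip(y1s, ybars)]  # y1 - ybar
    return cov(d12, d12), cov(d1, d1), cov(ybars, d1)


# Hypothetical population and sample sizes.
population = [1, 2, 4, 7, 11]
n, n1 = 4, 2
n2 = n - n1

Ybar = Fraction(sum(population), len(population))
S2 = sum((y - Ybar) ** 2 for y in population) / (len(population) - 1)

var_12, var_1, cov_term = split_sample_moments(population, n, n1)
print("V(y1 - y2)        =", var_12, " vs ", S2 * (Fraction(1, n1) + Fraction(1, n2)))
print("V(y1 - ybar)      =", var_1, " vs ", S2 * (Fraction(1, n1) - Fraction(1, n)))
print("Cov(ybar, y1-ybar) =", cov_term)
```

For this small population the three relations hold exactly, without any large-population approximation.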
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$ respectively. It is required to estimate the population total $Y$ by selecting a sample of two units. Let the sampling and estimation procedures be as follows:

Sample (s)        P(s)     Estimator t        Estimator t'
$(U_1, U_2)$      1/2      $Y_1 + 2Y_2$       $Y_1 + 2Y_2 + Y_1^2$
$(U_1, U_3)$      1/2      $Y_1 + 2Y_3$       $Y_1 + 2Y_3 - Y_1^2$

Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the estimators.
Solution: By definition
\[ E(t) = \sum_i t_i\, p(t_i) = \frac{1}{2}\, (Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y_1 + Y_2 + Y_3 = Y. \]
This shows that the estimator $t$ is unbiased for $Y$.
\[ \begin{aligned}
E(t^2) &= \frac{1}{2}\, [\,(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2\,] = \frac{1}{2}\, (Y_1^2 + 4Y_2^2 + 4Y_1 Y_2 + Y_1^2 + 4Y_3^2 + 4Y_1 Y_3) \\
&= Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1 Y_2 + 2Y_1 Y_3.
\end{aligned} \]
Therefore,
\[ \begin{aligned}
V(t) &= E(t^2) - [E(t)]^2 = Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1 Y_2 + 2Y_1 Y_3 - (Y_1 + Y_2 + Y_3)^2 \\
&= Y_2^2 + Y_3^2 - 2Y_2 Y_3 = (Y_2 - Y_3)^2.
\end{aligned} \]
Similarly,
\[ E(t') = \sum_i t'_i\, p(t'_i) = \frac{1}{2}\, (Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y, \]
hence $t'$ is unbiased for $Y$.
\[ \begin{aligned}
E(t'^2) &= \frac{1}{2}\, [\,(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2\,] \\
&= \frac{1}{2}\, (Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2 Y_2 + 4Y_1 Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 \\
&\qquad + Y_1^2 - 4Y_1^2 Y_3 + 4Y_1 Y_3 + 4Y_3^2) \\
&= Y_1^4 + Y_1^2 + 2Y_1^2 Y_2 + 2Y_1 Y_2 + 2Y_2^2 - 2Y_1^2 Y_3 + 2Y_1 Y_3 + 2Y_3^2.
\end{aligned} \]
Therefore,
\[ \begin{aligned}
V(t') &= E(t'^2) - [E(t')]^2 \\
&= Y_1^4 + Y_1^2 + 2Y_1^2 Y_2 + 2Y_1 Y_2 + 2Y_2^2 - 2Y_1^2 Y_3 + 2Y_1 Y_3 + 2Y_3^2 - (Y_1 + Y_2 + Y_3)^2 \\
&= (Y_2 - Y_3)^2 + Y_1^2 (Y_1^2 + 2Y_2 - 2Y_3) \\
&= V(t) + Y_1^2 (Y_1^2 + 2Y_2 - 2Y_3).
\end{aligned} \]
We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased; which of the two has the smaller variance depends on the variate values.