SIMPLE RANDOM SAMPLING

A procedure for selecting a sample of size $n$ out of a finite population of size $N$, in which each of the possible distinct samples has an equal chance of being selected, is called random sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).

Simple random sampling with replacement (srswr)

In sampling with replacement, a unit is selected from the population of $N$ units, its content is noted, and it is returned to the population before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this method, at each draw, each of the $N$ units of the population has the same probability $1/N$ of being selected. Here the same unit of the population may occur more than once in the sample (the order in which the sample units are obtained is regarded). There are $N^n$ samples, and each has an equal probability $1/N^n$ of being selected.
Note: If the order in which the sample units are obtained is ignored (unordered), then the number of possible samples is
$${}^{N+n-1}C_{n},$$
which for $n = 2$ and $n = 3$ equals ${}^{N}C_{n} + N\,(1 + {}^{N-1}C_{1} + \cdots + {}^{N-1}C_{n-2})$.

Simple random sampling without replacement (srswor)

Suppose the population consists of $N$ units. In simple random sampling without replacement, a unit is selected, its content is noted, and the unit is not returned to the population before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this method, at the $r$-th draw, each of the $N - r + 1$ remaining units of the population has the same probability $1/(N - r + 1)$ of being included in the sample. Here no unit of the population can occur more than once in the sample (order is ignored). There are ${}^{N}C_{n}$ possible samples, and each such sample has an equal probability $1/{}^{N}C_{n}$ of being selected.
Example: For a population of size $N = 5$ with values 1, 3, 6, 8 and 9, list all possible samples of size $n = 3$ by both methods [srswr (unordered) and srswor].
Solution: Under sampling wr, the number of possible samples is
$${}^{N}C_{n} + N\,(1 + {}^{N-1}C_{1} + \cdots + {}^{N-1}C_{n-2}) = {}^{5}C_{3} + 5\,(1 + {}^{4}C_{1}) = 35,$$
which are as follows:
(1, 1, 1), (1, 1, 3), (1, 1, 6), (1, 1, 8), (1, 1, 9), (1, 3, 3), (1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 6),
(1, 6, 8), (1, 6, 9), (1, 8, 8), (1, 8, 9), (1, 9, 9), (3, 3, 3), (3, 3, 6), (3, 3, 8), (3, 3, 9), (3, 6, 6),
(3, 6, 8), (3, 6, 9), (3, 8, 8), (3, 8, 9), (3, 9, 9), (6, 6, 6), (6, 6, 8), (6, 6, 9), (6, 8, 8), (6, 8, 9),
(6, 9, 9), (8, 8, 8), (8, 8, 9), (8, 9, 9), (9, 9, 9).
8 RU Khan

Under sampling wor, the number of possible samples is ${}^{N}C_{n} = {}^{5}C_{3} = 10$, which are as follows:
(1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 8), (1, 6, 9), (1, 8, 9), (3, 6, 8), (3, 6, 9), (3, 8, 9), (6, 8, 9).
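
The two listings above can be reproduced mechanically. The following is a minimal sketch using Python's standard `itertools` module; the variable names are illustrative and not part of the original notes.

```python
from itertools import combinations, combinations_with_replacement, product

population = [1, 3, 6, 8, 9]
n = 3

# Ordered samples with replacement: N**n of them.
ordered_wr = list(product(population, repeat=n))
# Unordered samples with replacement (order of draws ignored).
unordered_wr = list(combinations_with_replacement(population, n))
# Samples without replacement, order ignored.
wor = list(combinations(population, n))

print(len(ordered_wr), len(unordered_wr), len(wor))
```

The three counts correspond to $N^n = 125$, the 35 unordered srswr samples, and the ${}^{5}C_{3} = 10$ srswor samples.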

Theory of simple random sampling with replacement

$N$, population size.
$n$, sample size.
$Y_i$, value of the $i$-th unit of the population.
$y_i$, value of the $i$-th unit of the sample.
$Y = \sum_{i=1}^{N} Y_i$, population total.
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$, population mean.
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, population variance.
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$, population mean square.
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)$, sample mean square.



Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is
$$V(\bar{y}) = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n}.$$
Proof: It is immediately seen that
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i).$$
By definition,
$$E(y_i) = \sum_{i=1}^{N} Y_i \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y},$$
since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$.
Therefore,
$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \bar{Y}.$$

To obtain the variance, we have
$$V(\bar{y}) = E[\bar{y} - E(\bar{y})]^2 = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i - \bar{Y}\right]^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{n} y_i - n\bar{Y}\right]^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{n} (y_i - \bar{Y})\right]^2$$
$$= \frac{1}{n^2}E\left[\sum_{i=1}^{n} (y_i - \bar{Y})^2\right] + \frac{1}{n^2}E\left[\sum_{i \ne j}^{n} (y_i - \bar{Y})(y_j - \bar{Y})\right].$$

This expansion can be justified by taking a particular case. Writing $a_i = y_i - E(y_i)$, we have
$$\left[\sum_{i=1}^{n} \{y_i - E(y_i)\}\right]^2 = \left(\sum_{i=1}^{n} a_i\right)^2 = (a_1 + a_2 + \cdots + a_n)^2.$$
Put $n = 3$; then
$$(a_1 + a_2 + a_3)^2 = a_1^2 + a_2^2 + a_3^2 + a_1a_2 + a_1a_3 + a_2a_1 + a_2a_3 + a_3a_1 + a_3a_2 = \sum_{i=1}^{3} a_i^2 + \sum_{i \ne j}^{3} a_i a_j.$$

Hence
$$V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n} E[(y_i - \bar{Y})(y_j - \bar{Y})]$$
$$= \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j}^{n} \mathrm{Cov}(y_i, y_j). \qquad (2.1)$$

Consider
$$V(y_i) = E(y_i - \bar{Y})^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \Pr(y_i = Y_i) = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2,$$
since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$. Thus
$$V(y_i) = \frac{N-1}{N}\,S^2 = \sigma^2, \quad \text{since } S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2, \qquad (2.2)$$
and
$$\mathrm{Cov}(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i, j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i,\, y_j = Y_j).$$

In this case $y_j$ can take any one of the values $Y_1, \ldots, Y_N$ with probability $1/N$ irrespective of the value taken by $y_i$, because the composition of the population remains the same throughout the sampling process when sampling with replacement. In other words, for $i \ne j$, $y_i$ and $y_j$ are independent, so that
$$\Pr(y_i = Y_i,\, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j) = \frac{1}{N} \cdot \frac{1}{N} = \frac{1}{N^2}.$$
Hence,
$$\mathrm{Cov}(y_i, y_j) = \frac{1}{N^2}\sum_{i, j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) = \frac{1}{N^2}\sum_{i=1}^{N} (Y_i - \bar{Y}) \sum_{j=1}^{N} (Y_j - \bar{Y}) = 0. \qquad (2.3)$$

Substituting the values from equations (2.2) and (2.3) in equation (2.1), we get
$$V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} \frac{N-1}{N}\,S^2 = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n}.$$

Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance
$$V(\hat{Y}) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}\,S^2.$$
Proof: By definition,
$$E(\hat{Y}) = E(N\bar{y}) = N E(\bar{y}) = N\bar{Y} = N \cdot \frac{1}{N}\sum_{i=1}^{N} Y_i = Y$$
and
$$V(\hat{Y}) = V(N\bar{y}) = N^2 V(\bar{y}) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}\,S^2.$$

Remarks:

i) The standard error (SE) of $\bar{y}$ is $SE(\bar{y}) = \sqrt{V(\bar{y})} = \dfrac{\sigma}{\sqrt{n}} = S\sqrt{\dfrac{N-1}{nN}}$.

ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = \sqrt{V(\hat{Y})} = \dfrac{N\sigma}{\sqrt{n}} = S\sqrt{\dfrac{N(N-1)}{n}}$.

Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population variance $\sigma^2$, i.e. $E(s^2) = \sigma^2$.
Proof: By definition,
$$E(s^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2)\right].$$

To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that

$V(y_i) = E(y_i^2) - \bar{Y}^2$, so that

$E(y_i^2) = \sigma^2 + \bar{Y}^2$, since $V(y_i) = (N-1)S^2/N = \sigma^2$,

and

$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that

$E(\bar{y}^2) = \dfrac{\sigma^2}{n} + \bar{Y}^2$, since $V(\bar{y}) = \left(\dfrac{N-1}{nN}\right)S^2 = \dfrac{\sigma^2}{n}$ for srswr.
Therefore,
$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n} (\sigma^2 + \bar{Y}^2) - n\left(\frac{\sigma^2}{n} + \bar{Y}^2\right)\right] = \sigma^2 = \left(\frac{N-1}{N}\right)S^2.$$
Example: In a population with $N = 5$, the values of $Y_i$ are 8, 3, 11, 4 and 7.

a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.

b) Enumerate all possible samples of size 2 by the replacement method and verify that
i) the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$;
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$;
iii) $V(\bar{y}) = \dfrac{(N-1)S^2}{nN} = \dfrac{\sigma^2}{n}$; and
iv) $E(s^2) = \left(\dfrac{N-1}{N}\right)S^2 = \sigma^2$.
Solution:
a) We know that
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = 6.6, \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 8.24, \quad S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) = 10.3.$$


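b) Form a table for calculation as below, where $\bar{y}_i$ and $s_i^2$ are the mean and mean square of the $i$-th sample (the columns are repeated side by side):
Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$    Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$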
(8, 8) 8.0 64.00 40.0 0.0 (11, 4) 7.5 56.25 37.5 24.5
(8, 3) 5.5 30.25 27.5 12.5 (11, 7) 9.0 81.00 45.0 8.0
(8, 11) 9.5 90.25 47.5 4.5 (4, 8) 6.0 36.00 30.0 8.0
(8, 4) 6.0 36.00 30.0 8.0 (4, 3) 3.5 12.25 17.5 0.5
(8, 7) 7.5 56.25 37.5 0.5 (4, 11) 7.5 56.25 37.5 24.5
(3, 8) 5.5 30.25 27.5 12.5 (4, 4) 4.0 16.00 20.0 0.0
(3, 3) 3.0 9.00 15.0 0.0 (4, 7) 5.5 30.25 27.5 4.5
(3, 11) 7.0 49.00 35.0 32.0 (7, 8) 7.5 56.25 37.5 0.5
(3, 4) 3.5 12.25 17.5 0.5 (7, 3) 5.0 25.00 25.0 8.0
(3, 7) 5.0 25.00 25.0 8.0 (7, 11) 9.0 81.00 45.0 8.0
(11, 8) 9.5 90.25 47.5 4.5 (7, 4) 5.5 30.25 27.5 4.5
(11, 3) 7.0 49.00 35.0 32.0 (7, 7) 7.0 49.00 35.0 0.0
(11, 11) 11.0 121.00 55.0 0.0

i) $E(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i = \dfrac{1}{25} \times 165 = 6.6 = \bar{Y}$, where $n' = 25$ is the number of samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} N\bar{y}_i = 33 = Y$, or $E(N\bar{y}) = N E(\bar{y}) = 33$.

iii) $V(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 4.12$.

Now, $\dfrac{(N-1)S^2}{nN} = 4.12$ and $\dfrac{\sigma^2}{n} = 4.12$; therefore,
$$V(\bar{y}) = \frac{(N-1)S^2}{nN} = \frac{\sigma^2}{n} = 4.12.$$

iv) $E(s^2) = \dfrac{1}{n'}\sum_{i=1}^{n'} s_i^2 = \dfrac{1}{25} \times 206 = 8.24 \qquad (1a)$

and $\dfrac{(N-1)S^2}{N} = 8.24. \qquad (2a)$

In view of equations (1a) and (2a), we get
$$E(s^2) = \frac{(N-1)S^2}{N} = \sigma^2 = 8.24.$$

Theory of simple random sampling without replacement

Theorem: In srswor, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is
$$V(\bar{y}) = \left(\frac{N-n}{nN}\right)S^2.$$
Proof: As in srswr,
$$E(\bar{y}) = \bar{Y}, \quad \text{and} \quad V(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) + \frac{1}{n^2}\sum_{i \ne j}^{n} \mathrm{Cov}(y_i, y_j), \qquad (2.4)$$
where
$$V(y_i) = \frac{N-1}{N}\,S^2 \quad \text{for each } i. \qquad (2.5)$$
Consider
$$\mathrm{Cov}(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i \ne j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i,\, y_j = Y_j).$$

In this case $y_j$ can take any one of the values except $Y_i$, the value already assumed by $y_i$, each with equal probability $\dfrac{1}{N-1}$, so that for $i \ne j$,
$$\Pr(y_i = Y_i,\, y_j = Y_j) = \Pr(y_i = Y_i)\Pr(y_j = Y_j \mid y_i = Y_i) = \frac{1}{N} \cdot \frac{1}{N-1}.$$
Hence,
$$\mathrm{Cov}(y_i, y_j) = \frac{1}{N(N-1)}\sum_{i \ne j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y})$$
$$= \frac{1}{N(N-1)}\sum_{i=1}^{N} (Y_i - \bar{Y})\left[\sum_{j=1}^{N} (Y_j - \bar{Y}) - (Y_i - \bar{Y})\right]$$
$$= \frac{1}{N(N-1)}\left[\sum_{i=1}^{N} (Y_i - \bar{Y})\sum_{j=1}^{N} (Y_j - \bar{Y}) - \sum_{i=1}^{N} (Y_i - \bar{Y})^2\right]$$
$$= -\frac{1}{N(N-1)}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = -\frac{S^2}{N}. \qquad (2.6)$$

Substituting the values from equations (2.5) and (2.6) in equation (2.4), we get
$$V(\bar{y}) = \frac{1}{n^2}\left[n\,\frac{(N-1)S^2}{N}\right] - \frac{1}{n^2}\left[n(n-1)\,\frac{S^2}{N}\right] = \frac{N-1}{nN}\,S^2 - \frac{n-1}{nN}\,S^2$$
$$= \left(\frac{N-n}{nN}\right)S^2 = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = (1 - f)\,\frac{S^2}{n},$$
where $f = \dfrac{n}{N}$ is called the sampling fraction and the factor $(1 - f)$ is called the finite population correction (fpc). If the population size $N$ is very large, or if $n$ is small relative to $N$, then $f = \dfrac{n}{N} \to 0$ and consequently fpc $\to 1$.
Alternative expression:
$$V(\bar{y}) = \left(\frac{N-n}{nN}\right)S^2 = \left(\frac{1}{n} - \frac{1}{N}\right)S^2.$$
Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance $V(\hat{Y}) = N^2 (1 - f)\,S^2 / n$.
Proof:
By definition,
$$E(\hat{Y}) = E(N\bar{y}) = N E(\bar{y}) = N\bar{Y} = N \cdot \frac{1}{N}\sum_{i=1}^{N} Y_i = Y$$
and
$$V(\hat{Y}) = V(N\bar{y}) = N^2\left(\frac{N-n}{nN}\right)S^2 = N^2 (1 - f)\,\frac{S^2}{n}.$$

Remarks

i) The standard error of $\bar{y}$ is $SE(\bar{y}) = S\sqrt{\dfrac{N-n}{nN}} = S\sqrt{\dfrac{1-f}{n}} = S\sqrt{\dfrac{1}{n} - \dfrac{1}{N}}$.

ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = NS\sqrt{\dfrac{N-n}{nN}} = NS\sqrt{\dfrac{1-f}{n}} = NS\sqrt{\dfrac{1}{n} - \dfrac{1}{N}}$.

For a large population, fpc $= (1 - f) \to 1$; then

i) $V(\bar{y}) = \dfrac{S^2}{n}$, and $SE(\bar{y}) = \dfrac{S}{\sqrt{n}}$.

ii) $V(\hat{Y}) = \dfrac{N^2 S^2}{n}$, and $SE(\hat{Y}) = \dfrac{NS}{\sqrt{n}}$.

Theorem: In srswor, the sample mean square $s^2$ is an unbiased estimate of the population mean square $S^2$, i.e. $E(s^2) = S^2$.
Proof: By definition,
$$E(s^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2)\right].$$

To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that

$V(y_i) = E(y_i^2) - \bar{Y}^2$, so that
$E(y_i^2) = \dfrac{N-1}{N}\,S^2 + \bar{Y}^2$, since $V(y_i) = (N-1)S^2/N$,

and $V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that

$E(\bar{y}^2) = \left(\dfrac{N-n}{nN}\right)S^2 + \bar{Y}^2$, since $V(\bar{y}) = \left(\dfrac{N-n}{nN}\right)S^2$ for srswor.
Therefore,
$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{N-1}{N}\,S^2 + \bar{Y}^2\right) - n\left(\frac{N-n}{nN}\,S^2 + \bar{Y}^2\right)\right]$$
$$= \frac{1}{n-1}\,\frac{S^2}{N}\,[n(N-1) - (N-n)] = \frac{1}{n-1}\,\frac{S^2}{N}\,(n-1)N = S^2.$$
Example: A random sample of $n = 2$ households was drawn from a small colony of $N = 5$ households having monthly income (in thousand rupees) as follows:

Households: 1 2 3 4 5
Income (in thousand rupees): 8 6.5 7.5 7 6

a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.

b) Enumerate all possible samples of size $n = 2$ by the without replacement method and verify that
i) the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$;
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$;
iii) $V(\bar{y}) = \dfrac{(N-n)S^2}{nN}$; and
iv) $E(s^2) = S^2$.
Solution:
a) We know that
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = 7, \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 0.5, \quad S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) = 0.625.$$


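b) Form a table for calculation as below, where $\bar{y}_i$ and $s_i^2$ are the mean and mean square of the $i$-th sample (the columns are repeated side by side):
Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$    Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$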
(8, 6.5) 7.25 52.563 36.25 1.125 (8, 7.5) 7.75 60.063 38.75 0.125
(8, 7) 7.50 56.250 37.50 0.500 (8, 6) 7.00 49.000 35.00 2.000
(6.5, 7.5) 7.00 49.000 35.00 0.500 (6.5, 7) 6.75 45.563 33.75 0.125
(6.5, 6) 6.25 39.063 31.25 0.125 (7.5, 7) 7.25 52.563 36.25 0.125
(7.5, 6) 6.75 45.563 33.75 1.125 (7, 6) 6.50 42.250 32.50 0.500

i) $E(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i = 7 = \bar{Y}$, where $n' = 10$ is the number of samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} N\bar{y}_i = 35 = Y$, or $E(N\bar{y}) = N E(\bar{y}) = 35$.

iii) $V(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} (\bar{y}_i - \bar{Y})^2 = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 0.1875$, and $\dfrac{(N-n)S^2}{nN} = 0.1875$.

Therefore,
$$V(\bar{y}) = \frac{(N-n)S^2}{nN} = 0.1875.$$

iv) $E(s^2) = \dfrac{1}{n'}\sum_{i=1}^{n'} s_i^2 = 0.625 = S^2$.
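
As a cross-check on this example, the 10 srswor samples can be enumerated directly. This is a minimal sketch with exact fractions; the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Household incomes in thousand rupees, kept as exact fractions.
incomes = [Fraction(8), Fraction(13, 2), Fraction(15, 2), Fraction(7), Fraction(6)]
N, n = len(incomes), 2

def mean(values):
    return sum(values) / len(values)

def mean_square(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

pop_mean = mean(incomes)
S2 = mean_square(incomes)

samples = list(combinations(incomes, n))    # the 10 equally likely srswor samples
sample_means = [mean(s) for s in samples]

exp_mean = mean(sample_means)
var_mean = sum((m - pop_mean) ** 2 for m in sample_means) / len(samples)
exp_s2 = mean([mean_square(s) for s in samples])

print(float(exp_mean), float(var_mean), float(exp_s2))
```

Here `var_mean` can be compared with $(N-n)S^2/(nN)$ and `exp_s2` with $S^2$.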

Property: $V(\bar{y})$ under srswor is less than $V(\bar{y})$ under srswr (for $n > 1$).

Proof:
Under srswor, $V(\bar{y}) = \dfrac{N-n}{nN}\,S^2 \qquad (2.7)$

and under srswr, $V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{N-1}{nN}\,S^2. \qquad (2.8)$
Comparing (2.7) and (2.8), we note that $(N - 1) > (N - n)$ whenever $n > 1$, so that
$$\frac{N-1}{nN}\,S^2 > \frac{N-n}{nN}\,S^2.$$
Example: In a population of size $N = 5$, the values are 2, 4, 6, 8 and 10. For a srs of size $n = 3$, show that $V(\bar{y})_{srswor} < V(\bar{y})_{srswr}$.
Solution: We know that
$$V(\bar{y})_{srswor} = \frac{N-n}{nN}\,S^2, \quad \text{and} \quad V(\bar{y})_{srswr} = \frac{N-1}{nN}\,S^2,$$
where $S^2 = \dfrac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = 10$ and $\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i = 6$.

Thus,
$V(\bar{y})_{srswor} = \dfrac{4}{3}$, $V(\bar{y})_{srswr} = \dfrac{8}{3}$, and therefore
$V(\bar{y})_{srswor} < V(\bar{y})_{srswr}$.
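
The comparison can be expressed as two small functions implementing formulas (2.7) and (2.8). This is a sketch only; the function names are hypothetical.

```python
from fractions import Fraction

def population_mean_square(values):
    # S^2 = sum (Y_i - Y-bar)^2 / (N - 1)
    N = len(values)
    y_bar = Fraction(sum(values), N)
    return sum((y - y_bar) ** 2 for y in values) / (N - 1)

def var_mean_srswor(values, n):
    # Formula (2.7): (N - n) S^2 / (n N)
    N = len(values)
    return Fraction(N - n, n * N) * population_mean_square(values)

def var_mean_srswr(values, n):
    # Formula (2.8): (N - 1) S^2 / (n N)
    N = len(values)
    return Fraction(N - 1, n * N) * population_mean_square(values)

values = [2, 4, 6, 8, 10]
print(var_mean_srswor(values, 3), var_mean_srswr(values, 3))
```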
Theorem: Let a srswor sample of size $n$ be drawn from a population of size $N$, and let $T = \sum_{i=1}^{n} \alpha_i y_i$ be a class of linear estimators of $\bar{Y}$, where the $\alpha_i$'s are coefficients attached to the sample values. Then,
i) the class $T$ is a class of linear unbiased estimators if $\sum_{i=1}^{n} \alpha_i = 1$;
ii) the sample mean $\bar{y}$ is the best linear unbiased estimator.
Proof:
i) $E(T) = E\left(\sum_{i=1}^{n} \alpha_i y_i\right) = \sum_{i=1}^{n} \alpha_i E(y_i) = \bar{Y}\sum_{i=1}^{n} \alpha_i = \bar{Y}$, iff $\sum_{i=1}^{n} \alpha_i = 1$.
ii) Under $\sum_{i=1}^{n} \alpha_i = 1$,
$$V(T) = E\left[\sum_{i=1}^{n} \alpha_i y_i - \bar{Y}\right]^2 = E\left[\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 - 2\bar{Y}\sum_{i=1}^{n} \alpha_i y_i + \bar{Y}^2\right] = E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 - \bar{Y}^2.$$

Consider
$$E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 = E\left[\sum_{i=1}^{n} \alpha_i^2 y_i^2 + \sum_{i \ne j}^{n} \alpha_i \alpha_j y_i y_j\right] = \sum_{i=1}^{n} \alpha_i^2 E(y_i^2) + \sum_{i \ne j}^{n} \alpha_i \alpha_j E(y_i y_j).$$

Now
$$E(y_i^2) = \frac{1}{N}\sum_{i=1}^{N} Y_i^2.$$
Note that
$$(N-1)S^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 \quad \text{or} \quad \sum_{i=1}^{N} Y_i^2 = (N-1)S^2 + N\bar{Y}^2.$$
Thus,
$$E(y_i^2) = \frac{1}{N}(N-1)S^2 + \bar{Y}^2$$
and
$$E(y_i y_j) = \sum_{i \ne j}^{N} Y_i \Pr(i)\, Y_j \Pr(j \mid i) = \frac{1}{N} \cdot \frac{1}{N-1}\sum_{i \ne j}^{N} Y_i Y_j.$$

Note that
$$\left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j = (N-1)S^2 + N\bar{Y}^2 + \sum_{i \ne j}^{N} Y_i Y_j$$
$$\Rightarrow \sum_{i \ne j}^{N} Y_i Y_j = N^2\bar{Y}^2 - (N-1)S^2 - N\bar{Y}^2.$$

Hence,
$$E(y_i y_j) = \frac{1}{N} \cdot \frac{1}{N-1}\left[N^2\bar{Y}^2 - (N-1)S^2 - N\bar{Y}^2\right] = \bar{Y}^2 - \frac{S^2}{N}.$$
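and, since $\sum_{i \ne j}^{n} \alpha_i \alpha_j = \left(\sum_{i=1}^{n} \alpha_i\right)^2 - \sum_{i=1}^{n} \alpha_i^2 = 1 - \sum_{i=1}^{n} \alpha_i^2$,
$$E\left[\sum_{i=1}^{n} \alpha_i y_i\right]^2 = \sum_{i=1}^{n} \alpha_i^2\left[\frac{1}{N}(N-1)S^2 + \bar{Y}^2\right] + \sum_{i \ne j}^{n} \alpha_i \alpha_j\left[\bar{Y}^2 - \frac{S^2}{N}\right]$$
$$= S^2\sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N}\sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2\sum_{i=1}^{n} \alpha_i^2 + \left(1 - \sum_{i=1}^{n} \alpha_i^2\right)\left(\bar{Y}^2 - \frac{S^2}{N}\right)$$
$$= S^2\sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 - \frac{S^2}{N}.$$

Thus,
$$V(T) = S^2\sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N}.$$
Since $\sum_{i=1}^{n} \alpha_i^2 = \sum_{i=1}^{n}\left(\alpha_i - \dfrac{1}{n}\right)^2 + \dfrac{1}{n}$ under the condition $\sum_{i=1}^{n} \alpha_i = 1$, we have
$$V(T) = S^2\left[\sum_{i=1}^{n}\left(\alpha_i - \frac{1}{n}\right)^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\right].$$

Therefore, $V(T)$ is minimum when $\sum_{i=1}^{n}\left(\alpha_i - \dfrac{1}{n}\right)^2 = 0$, i.e. when $\alpha_i = \dfrac{1}{n}$ for all $i = 1, 2, \ldots, n$, and then $T = \dfrac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.

OR
Differentiating the variance function with respect to $\alpha_i$ and equating to zero, we get
$$\frac{\partial}{\partial \alpha_i} V(T) = 2S^2\left(\alpha_i - \frac{1}{n}\right) = 0 \;\Rightarrow\; \alpha_i = \frac{1}{n}, \text{ for all } i = 1, 2, \ldots, n, \text{ and } T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$$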
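
The closed form for $V(T)$ can be checked numerically on a small population. The sketch below reuses the population 2, 4, 6, 8, 10 with $n = 3$ and a hypothetical set of unequal weights summing to 1 (the weights are an assumption chosen for illustration). It enumerates every ordered srswor sample, since the weight attached to each draw depends on the order.

```python
from fractions import Fraction
from itertools import permutations

population = [2, 4, 6, 8, 10]
N, n = len(population), 3
# Hypothetical unequal weights (alpha_1, alpha_2, alpha_3) that sum to 1.
weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
equal = [Fraction(1, n)] * n

pop_mean = Fraction(sum(population), N)
S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)

def variance_of_T(alphas):
    # Exact V(T) over all equally likely ordered srswor samples.
    draws = list(permutations(population, n))
    estimates = [sum(a * y for a, y in zip(alphas, d)) for d in draws]
    centre = sum(estimates) / len(estimates)
    return sum((t - centre) ** 2 for t in estimates) / len(estimates)

def formula(alphas):
    # S^2 [ sum (alpha_i - 1/n)^2 + (1/n - 1/N) ]
    spread = sum((a - Fraction(1, n)) ** 2 for a in alphas)
    return S2 * (spread + Fraction(1, n) - Fraction(1, N))

print(variance_of_T(weights), formula(weights))
print(variance_of_T(equal), formula(equal))
```

With equal weights the estimator reduces to $\bar{y}$, so the second line should reproduce the srswor variance $(N-n)S^2/(nN)$.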

Simple random sampling applied to qualitative characteristics

Suppose a random sample of size $n$ is drawn from a population of size $N$, in which the proportion of individuals having a character $C$ (attribute) is $P$. Thus, in the population, $NP$ members have the character $C$ and $NQ$ members, where $Q = 1 - P$, do not (e.g. in sampling from a population of persons, we may have persons who are smokers and non-smokers, honest and dishonest, below poverty line and above poverty line etc.). Let $a$ be the number of members in the sample having the character $C$; then the sample proportion is $p = \dfrac{a}{n}$.
To obtain the expectation and variance of the sample proportion, we first change the attribute to a variable by the following procedure.
We assign to the $i$-th member of the population the value $Y_i$, which is equal to 1 if this member possesses the character $C$ and is equal to 0 otherwise. In this way, we get a variable $y$, which has

Population total $= \sum_{i=1}^{N} Y_i = NP = A$.

Population mean $= \dfrac{1}{N}\sum_{i=1}^{N} Y_i = \dfrac{NP}{N} = P$.

Population variance $= \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - P^2 = \dfrac{NP}{N} - P^2 = PQ$.

Mean square of population $= \dfrac{1}{N-1}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - NP^2\right) = \dfrac{NP - NP^2}{N-1} = \dfrac{NPQ}{N-1}$.

Similarly, assign to the $i$-th member of the sample the value $y_i$, which is equal to 1 if this member possesses the character $C$ and is equal to 0 otherwise; then

Sample total $= \sum_{i=1}^{n} y_i = np = a$, and Sample mean $= \dfrac{1}{n}\sum_{i=1}^{n} y_i = \dfrac{a}{n} = p$.

Mean square for sample $= \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - p)^2 = \dfrac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - np^2\right) = \dfrac{1}{n-1}(np - np^2) = \dfrac{npq}{n-1}$, where $q = 1 - p$.

Case I) Random sampling with replacement

On replacing $\bar{Y}$ by $P$, $Y$ by $NP$, $\bar{y}$ by $p = \dfrac{a}{n}$, $S^2$ by $\dfrac{NPQ}{N-1}$ and $\sigma^2$ by $PQ$ in the expressions obtained for the expectation and variance of the estimates of the population mean and population total, we find
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and $V(p) = V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{PQ}{n}$.
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $\hat{A} = Np$ is an unbiased estimate of $NP = A$, and
$$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = \frac{N^2\sigma^2}{n} = \frac{N^2 PQ}{n}.$$
Theorem: $\hat{V}(p) = v(p) = \dfrac{pq}{n-1}$ is an unbiased estimate of $V(p) = \dfrac{PQ}{n}$.
Proof:
$$E[\hat{V}(p)] = E\left[\frac{pq}{n-1}\right] = E\left[\frac{1}{n} \cdot \frac{npq}{n-1}\right] = \frac{1}{n}\,E\left[\frac{npq}{n-1}\right] = \frac{PQ}{n},$$
since in srswr $E(s^2) = \sigma^2 = PQ$ and $s^2 = \dfrac{npq}{n-1}$.
Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2\,\hat{V}(p) = N^2\,\dfrac{pq}{n-1}$ is an unbiased estimate of $V(\hat{A}) = N^2\,\dfrac{PQ}{n}$.
Remarks
i) The standard error (SE) of $p$ is $SE(p) = \sqrt{PQ/n}$.

ii) The standard error of $\hat{A}$ is $SE(\hat{A}) = N\sqrt{PQ/n}$.

Case II) Random sampling without replacement

Results are:
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and
$$V(p) = V(\bar{y}) = \frac{N-n}{nN}\,S^2 = \left(\frac{N-n}{nN}\right)\frac{NPQ}{N-1} = \left(\frac{N-n}{N-1}\right)\frac{PQ}{n}.$$
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np$ is an unbiased estimate of $NP$, and
$$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = N^2\left(\frac{N-n}{nN}\right)S^2 = N^2\left(\frac{N-n}{nN}\right)\frac{NPQ}{N-1} = N^2\left(\frac{N-n}{N-1}\right)\frac{PQ}{n}.$$

Theorem: $\hat{V}(p) = v(p) = \left(\dfrac{N-n}{n-1}\right)\dfrac{pq}{N}$ is an unbiased estimate of $V(p) = \left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}$.
Proof:
$$E[\hat{V}(p)] = E\left[\left(\frac{N-n}{n-1}\right)\frac{pq}{N}\right] = E\left[\left(\frac{N-n}{nN}\right)\frac{npq}{n-1}\right] = \left(\frac{N-n}{nN}\right)E\left[\frac{npq}{n-1}\right] = \left(\frac{N-n}{N-1}\right)\frac{PQ}{n},$$
since in srswor $E(s^2) = S^2 = \dfrac{NPQ}{N-1}$ and $s^2 = \dfrac{npq}{n-1}$.
Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2\,\hat{V}(p) = N\left(\dfrac{N-n}{n-1}\right)pq$ is an unbiased estimate of $V(\hat{A}) = N^2\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}$.
Remarks

The standard error (SE) of $p$ is $SE(p) = \sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$ and the standard error of $\hat{A}$ is $SE(\hat{A}) = N\sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$.
Example: A list of 3000 voters of a ward in a city was examined to measure the accuracy of the recorded ages of individuals. A random sample of 300 names was taken, which revealed that 51 citizens were shown with wrong ages. Estimate the total number of voters having a wrong description of age in the list and estimate the standard error.
Solution: Given $N = 3000$, $n = 300$, $a = 51$, and $p = \dfrac{a}{n} = 0.17$; then $\hat{A} = Np = 510$.
i) If srswr is considered, the estimate of the standard error is given by
$$\mathrm{Est}[SE(\hat{A})] = N\sqrt{\frac{pq}{n-1}} = 65.1696 \approx 65.$$
ii) If srswor is considered, the estimate of the standard error is given by
$$\mathrm{Est}[SE(\hat{A})] = \sqrt{N\left(\frac{N-n}{n-1}\right)pq} = 61.8246 \approx 62.$$
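
The two standard-error estimates in this example follow directly from the formulas for $\hat{V}(\hat{A})$. A minimal sketch (variable names are illustrative) is:

```python
import math

N, n, a = 3000, 300, 51
p = a / n            # sample proportion
q = 1 - p

A_hat = N * p        # estimated number of voters with a wrong age

# srswr: N * sqrt(pq / (n - 1))
se_wr = N * math.sqrt(p * q / (n - 1))
# srswor: sqrt(N * (N - n) * pq / (n - 1))
se_wor = math.sqrt(N * (N - n) * p * q / (n - 1))

print(round(A_hat), round(se_wr, 2), round(se_wor, 2))
```

Evaluated in floating point, the results agree with the figures quoted in the solution to about two decimal places; small differences in the later digits are rounding effects.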

Confidence interval (Interval estimations)

After obtaining the estimate of an unknown parameter (which is rarely equal to the parameter), it becomes necessary to measure the reliability of the estimate and to construct confidence limits with a given degree of confidence. An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate, i.e. an interval estimate of a parameter $\theta$ is an interval of the form $L \le \theta \le U$, where $L$ and $U$ depend on the sampling distribution of $\hat{\theta}$.
For any specified probability $1 - \alpha$, $L$ and $U$ are chosen such that $\Pr(L \le \theta \le U) = 1 - \alpha$. An interval $L \le \theta \le U$, computed for a particular sample, is called a $(1 - \alpha)100\%$ confidence interval, the quantity $(1 - \alpha)$ is called the confidence coefficient or the degree of confidence, and the end points $L$ and $U$ are called the lower and upper confidence limits. For instance, when $\alpha = 0.05$ the degree of confidence is 0.95 and we get a 95% confidence interval.

Limits in case of simple random sampling with replacement

1. Confidence limit for population mean: It is usually assumed that the estimator $\bar{y}$ is normally distributed about the corresponding population value, i.e. $\bar{y} \sim N(\bar{Y}, \sigma^2/n)$. Since tables are available for the standard normal variable, we transform to the standard normal as
$$Z = \frac{\bar{y} - \bar{Y}}{\sigma/\sqrt{n}} \sim N(0, 1).$$
By definition,
$$\Pr(|Z| \le Z_{\alpha/2}) = 1 - \alpha \quad \text{or} \quad \Pr(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1 - \alpha$$
$$\text{or} \quad \Pr\left[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{SE(\bar{y})} \le Z_{\alpha/2}\right] = 1 - \alpha$$
$$\text{or} \quad \Pr[-Z_{\alpha/2}\,SE(\bar{y}) \le \bar{y} - \bar{Y} \le Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha$$
$$\text{or} \quad \Pr[\bar{y} - Z_{\alpha/2}\,SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $\bar{y} \pm Z_{\alpha/2}\,SE(\bar{y})$, i.e. $\bar{y} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$, will include $\bar{Y}$.
2. Confidence limit for population total: On the same lines as above, we see that
$$\Pr[N\bar{y} - Z_{\alpha/2}\,SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\,SE(\hat{Y})] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $N\bar{y} \pm Z_{\alpha/2}\,N\sigma/\sqrt{n}$ will include $Y$.

Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is used instead of the standard normal.
3. Confidence limit for population proportion: As above, we see that
$$\Pr[p - Z_{\alpha/2}\,SE(p) \le P \le p + Z_{\alpha/2}\,SE(p)] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $p \pm Z_{\alpha/2}\sqrt{PQ/n}$ will include $P$.

Limits in case of simple random sampling without replacement

1. Confidence limit for population mean: Here also the estimate based on the sample is assumed to be normally distributed, i.e. $\bar{y} \sim N(\bar{Y}, (1 - f)S^2/n)$; then
$$Z = \frac{\bar{y} - \bar{Y}}{S\sqrt{(1 - f)/n}} \sim N(0, 1).$$
By definition,
$$\Pr[\bar{y} - Z_{\alpha/2}\,SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\,SE(\bar{y})] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $\bar{y} \pm Z_{\alpha/2}\,SE(\bar{y})$, i.e. $\bar{y} \pm Z_{\alpha/2}\,S\sqrt{(1 - f)/n}$, will include $\bar{Y}$.

2. Confidence limit for population total: As in srswr, we see that
$$\Pr[N\bar{y} - Z_{\alpha/2}\,SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\,SE(\hat{Y})] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $N\bar{y} \pm Z_{\alpha/2}\,NS\sqrt{(1 - f)/n}$ will include $Y$.
Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is used instead of the standard normal.
3. Confidence limit for population proportion: As in srswr, we see that
$$\Pr[p - Z_{\alpha/2}\,SE(p) \le P \le p + Z_{\alpha/2}\,SE(p)] = 1 - \alpha.$$
Thus, with probability $(1 - \alpha)$, the interval $p \pm Z_{\alpha/2}\sqrt{\left(\dfrac{N-n}{N-1}\right)\dfrac{PQ}{n}}$ will include $P$.
Example: In a library, there are 4500 members who can borrow books. A random sample of 16 persons was taken and the number of books borrowed by them during a month was recorded as follows:
2, 3, 10, 0, 5, 7, 13, 1, 6, 23, 18, 12, 6, 0, 1 and 7. Estimate the average number of books borrowed by each member during a month and obtain a 95% confidence interval.
Solution: Given $N = 4500$, $n = 16$.
Estimate of the population mean $=$ sample mean $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i = 7.125$.

Since the sample size is small and the variance of the population is unknown, the interval is defined as
$$\bar{y} \pm t_{\alpha/2,\,n-1}\,S\sqrt{\frac{N-n}{nN}} \approx \bar{y} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}},$$
as the population size is very large.

$S^2$ is unknown; it can be replaced by its estimator $s^2 = \dfrac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) = 44.25$, so that $s = 6.652$.

Therefore,
Upper confidence limit $= 7.125 + 2.131 \times \dfrac{6.652}{\sqrt{16}} = 10.668853 \approx 11$, and
Lower confidence limit $= 7.125 - 2.131 \times \dfrac{6.652}{\sqrt{16}} = 3.58 \approx 4$.
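
The arithmetic of this example can be retraced in a few lines. The sketch below takes the tabulated value $t_{0.025,\,15} = 2.131$ from the solution as a given constant (Python's standard library has no Student's $t$ quantile function) and ignores the fpc, as the solution does.

```python
import math
from statistics import mean, stdev

books = [2, 3, 10, 0, 5, 7, 13, 1, 6, 23, 18, 12, 6, 0, 1, 7]
n = len(books)
t_crit = 2.131          # tabulated t value for 15 d.f., taken from the example

y_bar = mean(books)     # sample mean
s = stdev(books)        # square root of the sample mean square s^2

half_width = t_crit * s / math.sqrt(n)
lower, upper = y_bar - half_width, y_bar + half_width

print(y_bar, round(s ** 2, 2), round(lower, 2), round(upper, 2))
```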
Example: In a mess, it was observed that leftovers cost a lot. A survey was conducted to find the optimum quantity for each item. A random sample of 10 inmates showed that they took 4, 5, 2, 3, 1, 7, 2, 3, 4, 4 slices of bread at breakfast. If 120 breakfasts are to be served every day, estimate the number of slices required every day. Also obtain a 95% confidence interval for it.

Solution: Given $N = 120$, $n = 10$, and $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i = 3.5$; then

Estimate of population total $\hat{Y} = N\bar{y} = 420$.

Since the sample size is small and the variance of the population is unknown, the confidence limits are
$$N\bar{y} \pm t_{\alpha/2,\,n-1}\,NS\sqrt{(1 - f)/n}.$$

Since $S^2$ is unknown, it can be replaced by its unbiased estimator
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) = 2.94444.$$

Hence,
$$\text{Upper confidence limit} = 420 + 2.262 \times 120 \times 1.716\sqrt{\left(1 - \frac{10}{120}\right)/10} = 561.02517 \approx 561$$
and
Lower confidence limit $= 420 - 141.02517 = 278.97483 \approx 279$.
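
The same calculation for a population total, including the finite population correction, can be wrapped in a small helper. This is a sketch; `total_interval` is a hypothetical name, and the critical value 2.262 is the tabulated $t_{0.025,\,9}$ used in the solution.

```python
import math
from statistics import mean, stdev

def total_interval(sample, N, t_crit):
    # Returns (estimate, lower, upper) for the population total under srswor.
    n = len(sample)
    f = n / N                                   # sampling fraction
    total_hat = N * mean(sample)                # N * y-bar
    se_total = N * stdev(sample) * math.sqrt((1 - f) / n)
    return total_hat, total_hat - t_crit * se_total, total_hat + t_crit * se_total

slices = [4, 5, 2, 3, 1, 7, 2, 3, 4, 4]
estimate, lower, upper = total_interval(slices, N=120, t_crit=2.262)
print(estimate, round(lower), round(upper))
```

Because the code carries $s$ at full precision rather than rounding it to 1.716, the limits can differ from the hand calculation in the second decimal place.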
Example: 100 villages were selected under srswor from a list of 1521 villages. It was found that 19 of the selected villages were illegally occupied by some landlords. Estimate the number of such villages among the total of 1521 villages and obtain a 95% confidence interval.
Solution: Given $N = 1521$, $n = 100$, and $a = 19$; then $p = 0.19$.
Estimate of the number of villages illegally occupied by landlords in the population of villages:
$\hat{A} = Np = 288.99 \approx 289$.
Since the sample size is large ($n \ge 30$) and the variance of the population proportion is unknown, the confidence limits are
$Np \pm Z_{\alpha/2}\,SE(\hat{A})$, where $SE(\hat{A})$ is unknown, so it can be replaced by its estimator
$$\sqrt{N\left(\frac{N-n}{n-1}\right)pq} = 57.964667.$$
Thus,
Upper confidence limit $= 289 + 1.96 \times 57.964667 = 402.6107 \approx 403$, and
Lower confidence limit $= 289 - 1.96 \times 57.964667 = 175.3893 \approx 175$.
Example: A simple random sample of 30 households was drawn without replacement from a city area containing 14848 households. The numbers of persons per household in the sample were as follows: 5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3 and 4. Estimate the average and total number of people in the area and compute the probability that these estimates are within $\pm 10\%$ of the true value.
Solution: Given $N = 14848$ and $n = 30$; then the estimate of the population mean is $\bar{y} = \dfrac{105}{30} = 3.5$, and the
estimate of the population total is $\hat{Y} = N\bar{y} = 14848 \times \dfrac{105}{30} = 51968$. Assuming that the population values are normally distributed, $N\bar{y}$ is normally distributed with mean $Y$ and standard error $NS\sqrt{\dfrac{1 - f}{n}}$; thus,

Pr (estimate lies within 10% of the true value)
$$= \Pr(Y - 10\% \text{ of } Y \le N\bar{y} \le Y + 10\% \text{ of } Y)$$
$$= \Pr(0.9\,Y \le N\bar{y} \le 1.1\,Y) = \Pr(N\bar{y} \le 1.1\,Y) - \Pr(N\bar{y} \le 0.9\,Y).$$
We shall use the result that
$$Z = \frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \sim N(0, 1),$$
taking $S \approx s = 1.196$, the value computed from the sample. Then
$$\Pr(N\bar{y} \le 1.1\,Y) = \Pr\left(\frac{1}{1.1}\,N\bar{y} \le Y\right) = \Pr\left(\frac{10}{11}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} - \frac{1}{11}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} - Y \le \frac{1}{11}\,N\bar{y}\right)$$
$$= \Pr\left(\frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \le \frac{N\bar{y}}{11\,NS\sqrt{(1 - f)/n}}\right) = \Pr(Z \le 1.457) = 0.9279.$$
Similarly,
$$\Pr(N\bar{y} \le 0.9\,Y) = \Pr\left(\frac{10}{9}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} + \frac{1}{9}\,N\bar{y} \le Y\right) = \Pr\left(N\bar{y} - Y \le -\frac{1}{9}\,N\bar{y}\right)$$
$$= \Pr\left(\frac{N\bar{y} - Y}{NS\sqrt{(1 - f)/n}} \le -\frac{N\bar{y}}{9\,NS\sqrt{(1 - f)/n}}\right) = \Pr(Z \le -1.78) = 0.0375.$$
Therefore, the required probability is $0.9279 - 0.0375 = 0.8904$.
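
The probability above can be approximated numerically. The sketch below follows the argument of the solution, with two assumptions stated plainly: the unknown $S$ is replaced by the sample value $s$, and the unknown total in the bounds is replaced by its estimate $N\bar{y}$. It uses `statistics.NormalDist` from the Python standard library for the normal distribution function.

```python
import math
from statistics import NormalDist, mean, stdev

persons = [5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3,
           5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3, 4]
N, n = 14848, len(persons)

y_bar = mean(persons)
s = stdev(persons)                              # assumption: S is estimated by s
se_mean = s * math.sqrt((1 - n / N) / n)        # SE of y-bar; N cancels in the z-values

z_upper = (y_bar / 11) / se_mean                # bound for Pr(N y-bar <= 1.1 Y)
z_lower = -(y_bar / 9) / se_mean                # bound for Pr(N y-bar <= 0.9 Y)

std_normal = NormalDist()
prob = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(N * y_bar, round(z_upper, 3), round(z_lower, 3), round(prob, 4))
```

The result agrees with the tabulated calculation to about three decimal places; the small difference comes from rounding the $z$-values before table lookup.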

Estimation of sample size

In planning a sample survey for estimating population parameters, the first question is how to determine the size of the sample to be drawn. It can be done in the following ways:
a) Specify the precision in terms of margin of error: The margin of error that is permissible in the estimate is known as the permissible error. It is taken as the maximum difference between the estimate and the parametric value that can be tolerated. Suppose an error $d$ on either side of the parameter value $\bar{Y}$ can be tolerated in the estimate $\bar{y}$ based on the sample values. Thus the permissible error in the estimate $\bar{y}$ is specified by

$\bar{y} = \bar{Y} \pm d$ or $\bar{y} - \bar{Y} = \pm d$ or $|\bar{y} - \bar{Y}| \le d$.

Since $|\bar{y} - \bar{Y}|$ differs from sample to sample, this margin of error can be specified in the form of a probability statement as
$$\Pr[\,|\bar{y} - \bar{Y}| \ge d\,] = \alpha \quad \text{or} \quad \Pr[\,|\bar{y} - \bar{Y}| \le d\,] = 1 - \alpha, \qquad (2.9)$$
where $\alpha$ is small and is the risk that we are willing to bear that the actual difference is greater than $d$. This $\alpha$ is called the level of significance and $(1 - \alpha)$ is called the level of confidence or confidence coefficient.
Assuming the population is normally distributed, the sample mean also follows the normal distribution, i.e. $\bar{y} \sim N[\bar{Y}, V(\bar{y})]$; then $Z = \dfrac{\bar{y} - \bar{Y}}{\sqrt{V(\bar{y})}} \sim N(0, 1)$.

For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table by the following equation:
$$\Pr\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{V(\bar{y})}} \ge Z_{\alpha/2}\right] = \alpha \quad \text{or} \quad \Pr\left[\,|\bar{y} - \bar{Y}| \ge \sqrt{V(\bar{y})}\,Z_{\alpha/2}\right] = \alpha. \qquad (2.10)$$
Comparing equations (2.9) and (2.10), we get
$$d = Z_{\alpha/2}\sqrt{V(\bar{y})}, \quad \text{so that} \quad d^2 = Z_{\alpha/2}^2\,V(\bar{y}) = Z_{\alpha/2}^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2$$
$$\Rightarrow 1 = \frac{Z_{\alpha/2}^2 S^2}{d^2}\left(\frac{1}{n} - \frac{1}{N}\right) = n_0\left(\frac{1}{n} - \frac{1}{N}\right), \quad \text{where } n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2}, \qquad (2.11)$$
$$\text{or} \quad 1 = \frac{n_0}{n} - \frac{n_0}{N} \;\Rightarrow\; \frac{n_0}{n} = 1 + \frac{n_0}{N} \quad \text{or} \quad n = \frac{n_0}{1 + \dfrac{n_0}{N}}. \qquad (2.12)$$

If $N$ is sufficiently large, then $n \approx n_0$, and for unknown $S^2$, some rough estimate of $S^2$ can be used in relations (2.11) and (2.12).
b) Specify the precision in terms of $V(\bar{y})$, i.e. we have to find the sample size $n$ such that $V(\bar{y}) = V$ (given). As in the case of margin of error,
$$d = Z_{\alpha/2}\sqrt{V(\bar{y})} \;\Rightarrow\; V(\bar{y}) = \frac{d^2}{Z_{\alpha/2}^2}, \quad \text{and} \quad n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} = \frac{S^2}{V(\bar{y})}.$$

Therefore, $n_0 = S^2 / V$, and hence $n$ can be obtained by relation (2.12).

c) Specify the precision in terms of the coefficient of variation of $\bar{y}$:

Let $CV(\bar{y}) = e = \dfrac{\sqrt{V(\bar{y})}}{\bar{Y}}$, so that $\dfrac{V(\bar{y})}{\bar{Y}^2} = e^2$ or $V(\bar{y}) = e^2\,\bar{Y}^2. \qquad (2.13)$
Substituting equation (2.13) in relation (2.11), we get

$n_0 = \dfrac{S^2}{e^2\,\bar{Y}^2}$, and hence $n$ from (2.12).

Remark
i) To get $n$ such that the margin of error in the estimate $\hat{Y} = N\bar{y}$ of the population total $Y$ is $d'$: $|\hat{Y} - Y| \le d'$ or $|N\bar{y} - N\bar{Y}| \le d'$, so that $N d = d'$ or $N^2 d^2 = d'^2$, or $d^2 = \dfrac{d'^2}{N^2}$.
Therefore,
$$n_0 = \left(\frac{N Z_{\alpha/2}\,S}{d'}\right)^2,$$
and $n$ can be obtained by relation (2.12).

ii) To find $n$ for $\hat{A} = N\bar{y}$ with precision specified as $V(\hat{A}) = V'$, i.e. $V(\hat{A}) = N^2\,V(\bar{y}) = V'$
$\Rightarrow V(\bar{y}) = \dfrac{V'}{N^2}$, and $n_0 = \dfrac{N^2 S^2}{V'}$; then $n$ follows from (2.12).

Example: For a population of size $N = 430$ we know roughly that $\bar{Y} = 19$ and $S^2 = 85.6$. With srs, what should be the size of the sample to estimate $\bar{Y}$ with a margin of error of 10% of $\bar{Y}$, apart from a 1 in 20 chance?
Solution: The margin of error in the estimate $\bar{y}$ of $\bar{Y}$ is given, i.e.
$$|\bar{y} - \bar{Y}| \le 10\%\ \text{of}\ \bar{Y} = \frac{19}{10} = 1.9,$$
so that
$$\Pr[\,|\bar{y} - \bar{Y}| \ge 1.9\,] = \frac{1}{20} = 0.05, \quad\text{and}\quad n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} = \frac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.091678.$$
Therefore,
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}} = 75.168 \cong 75.$$
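The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch of relations (2.11) and (2.12); the function names `n0_mean` and `fpc_adjust` are illustrative and do not come from the notes, and $Z_{\alpha/2} = 1.96$ is taken as in the example.

```python
def n0_mean(z, s2, d):
    """First approximation n0 = Z^2 * S^2 / d^2, relation (2.11)."""
    return z ** 2 * s2 / d ** 2


def fpc_adjust(n0, N):
    """Sample size n = n0 / (1 + n0/N), relation (2.12)."""
    return n0 / (1 + n0 / N)


# Example values: N = 430, Ybar = 19, S^2 = 85.6, d = 10% of Ybar, Z = 1.96
n0 = n0_mean(z=1.96, s2=85.6, d=0.10 * 19)
n = fpc_adjust(n0, N=430)
print(round(n0, 4), round(n, 3))
```

Keeping the two steps as separate functions mirrors the notes, where every precision specification first yields an $n_0$ and then reuses (2.12).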
Example: In a population of 676 petition sheets, how large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume the population mean square to be 229.
Solution: Let $Y$ be the number of signatures on all the sheets and let $\hat{Y}$ be the estimate of $Y$. The margin of error is specified in the estimate $\hat{Y}$ of $Y$ as
$$|\hat{Y} - Y| \le 1000, \quad\text{so that}\quad \Pr[\,|\hat{Y} - Y| \ge 1000\,] = \frac{1}{20} = 0.05.$$
We know that
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{where here}\quad n_0 = \left(\frac{N Z_{\alpha/2}\, S}{d'}\right)^2 = \left(\frac{676 \times 1.96}{1000}\right)^2 \times 229 = 402.01385,$$
and hence
$$n = 252.09 \cong 252.$$
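For a population total the same calculation can be wrapped in one helper. The sketch below assumes the formula $n_0 = (N Z_{\alpha/2} S / d')^2$ from the Remark together with relation (2.12); the name `sample_size_total` is hypothetical.

```python
def sample_size_total(N, z, s2, d_total):
    """Return (n0, n) for estimating the total N*ybar within d_total.

    n0 = (N * z / d_total)^2 * S^2, then n = n0 / (1 + n0/N) as in (2.12).
    """
    n0 = (N * z / d_total) ** 2 * s2
    return n0, n0 / (1 + n0 / N)


# Petition sheets: N = 676, S^2 = 229, margin 1000 on the total, Z = 1.96
n0, n = sample_size_total(N=676, z=1.96, s2=229, d_total=1000)
print(round(n0, 5), round(n, 2))
```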

Estimation of sample size for proportion

a) When precision is specified in terms of margin of error: Suppose the size of the population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$. The margin of error can be specified in the form of a probability statement as
$$\Pr[\,|p - P| \ge d\,] = \alpha \quad\text{or}\quad \Pr[\,|p - P| \le d\,] = 1 - \alpha. \qquad (2.14)$$
As the population is normally distributed, $p \sim N[P, V(p)]$, so that
$$Z = \frac{p - P}{\sqrt{V(p)}} \sim N(0,1).$$
For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table by the following relation:
$$\Pr\left[\frac{|p - P|}{\sqrt{V(p)}} \ge Z_{\alpha/2}\right] = \alpha \quad\text{or}\quad \Pr\left[\,|p - P| \ge \sqrt{V(p)}\,Z_{\alpha/2}\,\right] = \alpha. \qquad (2.15)$$
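Comparing equations (2.14) and (2.15), the relation which gives the value of $n$ with the required precision of the estimate $p$ of $P$ is
$$d = Z_{\alpha/2}\sqrt{V(p)}, \quad\text{or}\quad d^2 = Z_{\alpha/2}^2\,V(p) = Z_{\alpha/2}^2\left(\frac{N - n}{N - 1}\right)\frac{PQ}{n},$$
as sampling is srswor. Hence
$$1 = \frac{Z_{\alpha/2}^2\,PQ}{d^2}\cdot\frac{N - n}{n\,(N - 1)} = n_0\,\frac{N - n}{n\,(N - 1)}, \quad\text{where}\quad n_0 = \frac{Z_{\alpha/2}^2\,PQ}{d^2} = \frac{PQ}{V(p)}, \qquad (2.16)$$
or
$$\frac{N - 1}{n_0} = \frac{N - n}{n} = \frac{N}{n} - 1 \;\Rightarrow\; \frac{N}{n} = 1 + \frac{N - 1}{n_0},$$
or
$$n = \frac{N}{1 + \dfrac{N - 1}{n_0}} = \frac{N n_0}{n_0 + (N - 1)} = \frac{n_0}{\dfrac{n_0}{N} + \dfrac{N - 1}{N}} \cong \frac{n_0}{1 + \dfrac{n_0}{N}}. \qquad (2.17)$$
If $N$ is sufficiently large, then $n \cong n_0$.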
b) If precision is specified in terms of $V(p)$, i.e. $V(p) = V$ (given): substituting $V(p) = V$ in relation (2.16) we get $n_0 = PQ / V$, and hence $n$ can be obtained by relation (2.17).
c) When precision is given in terms of the coefficient of variation of $p$:
Let
$$CV(p) = e = \frac{\sqrt{V(p)}}{P} \;\Rightarrow\; \frac{V(p)}{P^2} = e^2, \quad\text{or}\quad V(p) = e^2 P^2. \qquad (2.18)$$
Substituting equation (2.18) in relation (2.16), we get
$$n_0 = \frac{PQ}{e^2 P^2} = \frac{Q}{e^2 P} = \frac{1}{e^2}\left(\frac{1}{P} - 1\right),$$
and hence $n$ is given by the relation (2.17).

Remarks
i) To get $n$ when the margin of error in the estimate $\hat{A} = Np$ of the population total $A = NP$ is $d'$: we need
$$|\hat{A} - A| \le d', \quad\text{or}\quad |Np - NP| \le d', \quad\text{so that}\quad N d = d', \quad\text{or}\quad N^2 d^2 = d'^2, \quad\text{or}\quad d^2 = \frac{d'^2}{N^2}.$$
Thus,
$$n_0 = \left(\frac{N Z_{\alpha/2}\sqrt{PQ}}{d'}\right)^2,$$
and $n$ can be obtained by the relation (2.17).

ii) To find $n$ for $\hat{A} = Np$ with precision specified as $V(\hat{A}) = V'$, i.e. $V(\hat{A}) = N^2\,V(p) = V'$, so that
$$V(p) = \frac{V'}{N^2}.$$
Substituting this value in equation (2.16), we get
$$n_0 = \frac{N^2\,PQ}{V'},$$
and $n$ is given by relation (2.17).
Example: In a population of 4000 people who were called for casting their votes, 50% returned to the poll. Find the sample size needed to estimate this proportion so that the margin of error is 5% with a 95% confidence coefficient.
Solution: The margin of error in the estimate $p$ of $P$ is given by
$$|p - P| \le 0.05, \quad\text{then}\quad \Pr[\,|p - P| \ge 0.05\,] = 0.05.$$
We know that
$$n_0 = \frac{Z_{\alpha/2}^2\,PQ}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{0.0025} = 384.16 \cong 384,$$
and hence,
$$n = \frac{n_0}{1 + (n_0 / N)} = 350.498 \cong 351.$$
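Relation (2.17) has an exact form and a large-$N$ approximation, and it can be instructive to compare them on this example. The following sketch assumes srswor and $Z_{\alpha/2} = 1.96$; the function name `sample_size_proportion` is illustrative only.

```python
import math


def sample_size_proportion(N, z, P, d):
    """Return (n0, n_exact, n_approx) for estimating a proportion."""
    Q = 1 - P
    n0 = z ** 2 * P * Q / d ** 2        # relation (2.16)
    n_exact = N * n0 / (n0 + N - 1)     # exact form in (2.17)
    n_approx = n0 / (1 + n0 / N)        # large-N approximation in (2.17)
    return n0, n_exact, n_approx


n0, n_exact, n_approx = sample_size_proportion(N=4000, z=1.96, P=0.5, d=0.05)
# Both forms are rounded up to the next whole unit here
print(round(n0, 2), math.ceil(n_exact), math.ceil(n_approx))
```

For a population of this size the two forms differ by a fraction of a unit, so they lead to the same rounded sample size.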
Exercise: In a study of the possible use of sampling to cut down the work in taking
inventory in a stock room, a count is made of the value of the articles on each of 36 shelves
in the room. The values to the nearest dollar are as follows.
29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60, 60, 60, 61, 61, 61, 62, 64,
65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.
The estimate of total value made from a sample is to be correct within $200, apart from a 1 in
20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the
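requirements. Do you agree? Here $\sum_i Y_i = 2138$ and $\sum_i Y_i^2 = 131\,682$.
Solution: It is given that $\sum_i Y_i = 2138$, $\sum_i Y_i^2 = 131\,682$, and $N = 36$, then
$$S^2 = \frac{1}{N - 1}\left[\sum_i Y_i^2 - N\bar{Y}^2\right] = \frac{1}{36 - 1}\left[131\,682 - 36\left(\frac{2138}{36}\right)^2\right] = 134.5,$$
and
$$|\hat{Y} - Y| \le 200, \quad\text{then}\quad \Pr[\,|\hat{Y} - Y| \ge 200\,] = \frac{1}{20} = 0.05.$$
We know that
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{where here}\quad n_0 = \left(\frac{N Z_{\alpha/2}}{d'}\right)^2 S^2 = \left(\frac{36 \times 1.96}{200}\right)^2 \times 134.5 = 16.7409,$$
and therefore,
$$n = 11.42765 \cong 12.$$
A simple random sample of 12 shelves therefore meets the requirement, so we agree with the advisor.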
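The same result can be reproduced directly from the 36 shelf values. The sketch below recomputes $\sum Y_i$, $\sum Y_i^2$ and $S^2$ from the data and then applies the formula for a total; because it keeps $S^2$ unrounded (about 134.53 rather than 134.5), its $n_0$ differs slightly from 16.7409, while the rounded-up sample size is unchanged. The variable names are illustrative.

```python
import math

values = [29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59,
          60, 60, 60, 60, 61, 61, 61, 62, 64, 65, 65, 67, 67, 68, 69, 71,
          74, 77, 82, 85]

N = len(values)
total = sum(values)
sum_sq = sum(v * v for v in values)

# Population mean square S^2 = (sum Y^2 - N*Ybar^2) / (N - 1)
S2 = (sum_sq - total ** 2 / N) / (N - 1)

z, d_total = 1.96, 200                   # 1-in-20 chance, margin of $200
n0 = (N * z / d_total) ** 2 * S2
n = n0 / (1 + n0 / N)                    # relation (2.12)
print(N, total, sum_sq, round(S2, 2), math.ceil(n))
```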

Determination of sample size in decision problems (Another approach)
Let $l(z)$ denote the amount of loss (in monetary terms) that will be incurred in a decision through an error of amount $z$ in the estimate. Let $f(z)$ denote the probability density function of $z$. Then the expected loss for a given sample size $n$ will be
$$L(n) = E[l(z)] = \int l(z)\,f(z)\,dz.$$
If $C(n)$ is the cost of a sample of size $n$, then the most economic sample size is the one that minimizes the sum of cost and expected loss. Thus the problem of determining the sample size can be stated as:
Find $n$ such that $\phi(n) = C(n) + L(n)$ is minimum.
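When no closed form for $L(n)$ is at hand, $\phi(n)$ can be minimized numerically. The sketch below assumes that the error $z$ is normal with mean zero and variance $S^2/n$ (fpc ignored), approximates $E[l(z)]$ with a midpoint rule, and searches over integer $n$. The function names, the loss and cost functions, and all numeric inputs are hypothetical choices for illustration.

```python
import math


def expected_loss(loss, sigma, steps=4000):
    """Midpoint-rule approximation of E[loss(z)] for z ~ N(0, sigma^2)."""
    lo, hi = -8.0 * sigma, 8.0 * sigma   # tails beyond 8 sigma are negligible
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += loss(z) * density * h
    return total


def most_economic_n(loss, cost, S, n_max):
    """Integer n in 1..n_max minimising phi(n) = C(n) + L(n)."""
    def phi(n):
        return cost(n) + expected_loss(loss, S / math.sqrt(n))
    return min(range(1, n_max + 1), key=phi)


# Hypothetical inputs: absolute-error loss and a linear cost function
best = most_economic_n(loss=lambda z: 500.0 * abs(z),
                       cost=lambda n: 100.0 + 2.0 * n,
                       S=12.0, n_max=400)
print(best)
```

A brute-force search is used because $\phi(n)$ is cheap to evaluate for moderate $n$; a closed-form optimum, where one exists, is of course preferable.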

Exercise: If the loss function due to an error in $\bar{y}$ is proportional to $|\bar{y} - \bar{Y}|$ and if the total cost of the survey is $C = c_0 + c_1 n$, show that with simple random sampling, ignoring the fpc, the most economical value of $n$ is
$$\left(\frac{\lambda S}{c_1\sqrt{2\pi}}\right)^{2/3},$$
where $\lambda$ is a constant.
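Solution: Given $l(z) \propto |\bar{y} - \bar{Y}|$, we have $l(z) = \lambda\,|\bar{y} - \bar{Y}|$. When the fpc is ignored, $V(\bar{y}) = S^2/n$ and
$$\bar{y} \sim N\left(\bar{Y}, \frac{S^2}{n}\right) \;\Rightarrow\; z = (\bar{y} - \bar{Y}) \sim N\left(0, \frac{S^2}{n}\right),$$
so that
$$f(z) = \frac{1}{(S/\sqrt{n})\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{z}{S/\sqrt{n}}\right)^2\right] = \frac{1}{(S/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2S^2}\right).$$
Now
$|z| = |\bar{y} - \bar{Y}| = \bar{y} - \bar{Y} = z$, if $\bar{y} > \bar{Y}$,
$|z| = |\bar{y} - \bar{Y}| = \bar{Y} - \bar{y} = -z$, if $\bar{y} < \bar{Y}$.
Thus, the expected loss is
$$\begin{aligned}
L(n) &= \int_{-\infty}^{\infty}\lambda\,|z|\,f(z)\,dz = \lambda\int_{-\infty}^{0}|z|\,f(z)\,dz + \lambda\int_{0}^{\infty}|z|\,f(z)\,dz \\
&= -\lambda\int_{-\infty}^{0} z\,f(z)\,dz + \lambda\int_{0}^{\infty} z\,f(z)\,dz = 2\lambda\int_{0}^{\infty} z\,f(z)\,dz \\
&= 2\lambda\int_{0}^{\infty} z\,\frac{1}{(S/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2S^2}\right)dz.
\end{aligned}$$
Put $\dfrac{n z^2}{2S^2} = t$, then $\dfrac{2 n z}{2S^2}\,dz = dt$, or $z\,dz = \dfrac{S^2}{n}\,dt$.

Therefore,
$$L(n) = 2\lambda\int_{0}^{\infty}\frac{S^2}{n}\,\frac{1}{(S/\sqrt{n})\sqrt{2\pi}}\,e^{-t}\,dt = \frac{2\lambda S}{\sqrt{2\pi}\,\sqrt{n}}\int_{0}^{\infty}e^{-t}\,dt = \frac{2\lambda S}{\sqrt{2\pi}\,\sqrt{n}}, \quad\text{as}\quad \int_{0}^{\infty}e^{-t}\,dt = 1.$$
To determine the value of $n$, consider the function
$$\phi(n) = L(n) + C(n) = c_0 + c_1 n + \frac{2\lambda S}{\sqrt{2\pi}}\,n^{-1/2}.$$
Differentiating this function with respect to $n$, we get
$$\frac{\partial\phi}{\partial n} = 0 = c_1 - \frac{1}{2}\left(\frac{2\lambda S}{\sqrt{2\pi}}\right)n^{-3/2}, \quad\text{or}\quad n^{-3/2}\,\frac{\lambda S}{\sqrt{2\pi}} = c_1,$$
or
$$n^{-3/2} = \frac{c_1\sqrt{2\pi}}{\lambda S}, \quad\text{or}\quad n = \left(\frac{\lambda S}{c_1\sqrt{2\pi}}\right)^{2/3}.$$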

Exercise: With a loss function $l(z) = \lambda z^2$ and a cost function $C = c_0 + c_1 n$, show that, using srs, the most economic value of the sample size $n$ to estimate the population mean $\bar{Y}$ is
$$\left(\frac{\lambda S^2}{c_1}\right)^{1/2},$$
where $z = \bar{y} - \bar{Y}$ and $\bar{y}$ is the sample mean used to estimate $\bar{Y}$.

Solution: Given the quadratic loss function $l(z) = \lambda z^2$. By definition
$$L(n) = E[\lambda z^2] = \lambda\,E(z^2).$$
Consider
$$V(z) = E[z - E(z)]^2 = E(z^2), \quad\text{since}\quad E(z) = E(\bar{y} - \bar{Y}) = 0.$$
Also
$$V(z) = V(\bar{y} - \bar{Y}) = V(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right)S^2.$$
Therefore,
$$E(z^2) = \frac{S^2}{n} - \frac{S^2}{N}, \quad\text{and the expected loss is}\quad L(n) = \frac{\lambda S^2}{n} - \frac{\lambda S^2}{N}.$$
To determine the value of $n$, consider the function
$$\phi(n) = L(n) + C(n) = \frac{\lambda S^2}{n} - \frac{\lambda S^2}{N} + c_0 + c_1 n.$$
Differentiating this function with respect to $n$, we get
$$\frac{\partial\phi}{\partial n} = 0 = -\frac{\lambda S^2}{n^2} + c_1, \quad\text{or}\quad \frac{\lambda S^2}{n^2} = c_1, \quad\text{or}\quad n = \left(\frac{\lambda S^2}{c_1}\right)^{1/2}.$$
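As a quick sanity check of the quadratic-loss result, the slope of $\phi(n)$ can be evaluated numerically at $n = (\lambda S^2 / c_1)^{1/2}$. The values of $\lambda$, $S^2$, $N$, $c_0$ and $c_1$ below are hypothetical and chosen only so that the optimum is a round number; the function names are illustrative.

```python
import math


def phi_quadratic(n, lam, S2, N, c0, c1):
    """phi(n) = lam*S^2*(1/n - 1/N) + c0 + c1*n for quadratic loss."""
    return lam * S2 * (1.0 / n - 1.0 / N) + c0 + c1 * n


def optimum_n_quadratic(lam, S2, c1):
    """Closed-form optimum n = sqrt(lam * S^2 / c1)."""
    return math.sqrt(lam * S2 / c1)


# Hypothetical inputs, not taken from the notes
lam, S2, N, c0, c1 = 50.0, 144.0, 1000, 100.0, 2.0
n_star = optimum_n_quadratic(lam, S2, c1)

# Central-difference estimate of d(phi)/dn at n_star; it should be near zero
h = 1e-3
slope = (phi_quadratic(n_star + h, lam, S2, N, c0, c1)
         - phi_quadratic(n_star - h, lam, S2, N, c0, c1)) / (2 * h)
print(n_star, slope)
```

Note that $N$ and $c_0$ enter $\phi(n)$ only through constants, which is why they do not appear in the optimum.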

Exercise: The selling price of a lot of standing timber is UW , where U is the price per
unit volume and W is the volume of timber on the lot. The number N of logs on the lot is
counted, and the average volume per log is estimated from a simple random sample of n
logs. The estimate is made and paid for by the seller and is provisionally accepted by the
buyer. Later, the buyer finds out the exact volume purchased, and the seller reimburses him if
he has paid for more than was delivered. If he has paid for less than was delivered, the buyer
does not mention the fact.
Construct the seller's loss function. Assuming that the cost of measuring n logs is cn , find
the optimum value of n . The standard deviation of the volume per log may be denoted by S
and the fpc ignored.

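Solution: Let $\hat{W}$ be the estimated total volume of the timber. The error in the estimate is $z = \hat{W} - W$.
If $\hat{W} - W = z > 0$, the seller's loss is zero, i.e. $l(z) = 0$.
If $\hat{W} - W = z < 0$, the seller's loss is $-Uz$, i.e. $l(z) = -Uz$.

When the fpc is ignored, $V(\hat{W}) = N^2 S^2 / n$, then
$$\hat{W} \sim N\left(W, \frac{N^2 S^2}{n}\right), \quad\text{or}\quad z = (\hat{W} - W) \sim N\left(0, \frac{N^2 S^2}{n}\right),$$
so that
$$f(z) = \frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{z}{NS/\sqrt{n}}\right)^2\right] = \frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right).$$
Thus, the expected loss is
$$\begin{aligned}
L(n) &= \int_{-\infty}^{\infty} l(z)\,f(z)\,dz = \int_{-\infty}^{0}(-Uz)\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz \\
&= -\int_{-\infty}^{0} Uz\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz \\
&= \int_{0}^{\infty} Uz\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz.
\end{aligned}$$
Put $\dfrac{n z^2}{2N^2S^2} = t$, then $\dfrac{2 n z}{2N^2S^2}\,dz = dt$, or $z\,dz = \dfrac{N^2S^2}{n}\,dt$.

Therefore,
$$L(n) = \int_{0}^{\infty}\frac{U N^2 S^2}{n}\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\,e^{-t}\,dt = \frac{UNS}{\sqrt{2\pi}\,\sqrt{n}}\int_{0}^{\infty}e^{-t}\,dt = \frac{UNS}{\sqrt{2\pi}\,\sqrt{n}}, \quad\text{as}\quad \int_{0}^{\infty}e^{-t}\,dt = 1.$$
To determine the value of $n$, consider the function
$$\phi(n) = L(n) + C(n) = c\,n + \frac{UNS}{\sqrt{2\pi}}\,n^{-1/2}.$$
Differentiating this function with respect to $n$, we get
$$\frac{\partial\phi}{\partial n} = 0 = c - \frac{1}{2}\left(\frac{UNS}{\sqrt{2\pi}}\right)n^{-3/2}, \quad\text{or}\quad n^{-3/2}\,\frac{UNS}{2\sqrt{2\pi}} = c,$$
or
$$n^{-3/2} = \frac{2c\sqrt{2\pi}}{UNS}, \quad\text{or}\quad n = \left(\frac{UNS}{2c\sqrt{2\pi}}\right)^{2/3}.$$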
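The optimum for the timber problem can be compared with a direct search over integer sample sizes. The sketch below uses the closed-form expected loss $UNS/\sqrt{2\pi n}$ derived above; the price, lot size, standard deviation and unit cost are hypothetical numbers, and the function names are illustrative.

```python
import math


def phi_timber(n, U, N, S, c):
    """Seller's total of measuring cost and expected loss for n logs."""
    return c * n + U * N * S / math.sqrt(2 * math.pi * n)


def optimum_n_timber(U, N, S, c):
    """Closed-form optimum (U*N*S / (2*c*sqrt(2*pi)))**(2/3)."""
    return (U * N * S / (2 * c * math.sqrt(2 * math.pi))) ** (2 / 3)


# Hypothetical inputs: price per unit volume, number of logs,
# standard deviation of volume per log, cost of measuring one log
U, N, S, c = 40.0, 500, 0.3, 1.5
n_star = optimum_n_timber(U, N, S, c)

# Independent check: best integer n found by brute force
n_grid = min(range(1, N + 1), key=lambda n: phi_timber(n, U, N, S, c))
print(round(n_star, 2), n_grid)
```

Since $\phi(n)$ is convex in $n$, the best integer sample size should be one of the two integers adjacent to the closed-form value.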
Exercise: With certain populations, it is known that the observations $Y_i$ are all zero on a portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with varying expenditure of effort, these units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in the original population and $\sigma_0^2$ is the variance when all zeros are excluded, then show that
$$\sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\,\bar{Y}^2,$$
where $P = 1 - Q$, and $\bar{Y}$ is the mean value of $Y_i$ for the whole population.
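Solution: Given $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units are not zero, and the remaining $NQ$ units are all zero). Thus,
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N}Y_i \ \text{(population mean)}, \quad \bar{Y}_{NP} = \frac{1}{NP}\sum_{i=1}^{NP}Y_i, \quad \bar{Y}_{NQ} = \frac{1}{NQ}\sum_{i=NP+1}^{N}Y_i = 0;$$
also,
$$\sum_{i=1}^{N}Y_i = \sum_{i=1}^{NP}Y_i, \quad\text{and}\quad \sum_{i=1}^{N}Y_i^2 = \sum_{i=1}^{NP}Y_i^2,$$
so that $N\bar{Y} = NP\,\bar{Y}_{NP}$, or $\bar{Y}_{NP} = \dfrac{1}{P}\,\bar{Y}$. By definition,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\sum_{i=1}^{N}Y_i^2 - \bar{Y}^2, \quad\text{or}\quad N\sigma^2 = \sum_{i=1}^{N}Y_i^2 - N\bar{Y}^2.$$
Similarly,
$$NP\,\sigma_0^2 = \sum_{i=1}^{NP}Y_i^2 - NP\,\bar{Y}_{NP}^2.$$
Thus,
$$N(\sigma^2 - P\sigma_0^2) = NP\,\bar{Y}_{NP}^2 - N\bar{Y}^2 = NP\,\frac{1}{P^2}\,\bar{Y}^2 - N\bar{Y}^2 = N\left(\frac{1}{P} - 1\right)\bar{Y}^2 = N\left(\frac{Q}{P}\right)\bar{Y}^2.$$
Therefore,
$$P\sigma_0^2 = \sigma^2 - \left(\frac{Q}{P}\right)\bar{Y}^2, \quad\text{or}\quad \sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\,\bar{Y}^2.$$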
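The identity can be verified on a small made-up population using exact rational arithmetic. The population values below are hypothetical (half of them are zero), and `pop_variance` is an illustrative helper that uses the divisor $N$, matching the definition of $\sigma^2$ in the solution.

```python
from fractions import Fraction


def pop_variance(values):
    """Variance with divisor equal to the number of units."""
    m = Fraction(sum(values), len(values))
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)


population = [3, 7, 0, 0, 5, 0, 9, 0, 1, 0]   # hypothetical values
nonzero = [v for v in population if v != 0]

N = len(population)
P = Fraction(len(nonzero), N)
Q = 1 - P
Ybar = Fraction(sum(population), N)

sigma2 = pop_variance(population)     # variance in the whole population
sigma0_2 = pop_variance(nonzero)      # variance with the zeros excluded
rhs = sigma2 / P - Q / P ** 2 * Ybar ** 2
print(sigma0_2, rhs)
```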
Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn without replacement and added to the original sample. Show that the mean based on $(n + n_1)$ units is an unbiased estimator of the population mean, and that the ratio of its variance to that of the mean of the original $n$ units is approximately
$$\frac{1 + 3n_1/n}{(1 + n_1/n)^2},$$
assuming that the population size is large.

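Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ elements be denoted by $\bar{y}_n$, $\bar{y}_{n_1}$, and $\bar{y}_{n+n_1}$ respectively, defined as
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n}y_i, \quad \bar{y}_{n_1} = \frac{1}{n_1}\sum_{i=1}^{n_1}y_i, \quad\text{and}\quad \bar{y}_{n+n_1} = \frac{n\,\bar{y}_n + n_1\,\bar{y}_{n_1}}{n + n_1}.$$
We have to show $E(\bar{y}_{n+n_1}) = \bar{Y}$. In this case the expectation is taken in two stages:
i) when $n$ is fixed, i.e. conditional on the original sample,
ii) over all samples.
$$\begin{aligned}
E(\bar{y}_{n+n_1}) &= \frac{1}{n + n_1}\,E(n\,\bar{y}_n + n_1\,\bar{y}_{n_1}) = \frac{1}{n + n_1}\,E[\,n\,\bar{y}_n + n_1\,E(\bar{y}_{n_1} \mid n)\,] \\
&= \frac{1}{n + n_1}\,E(n\,\bar{y}_n + n_1\,\bar{y}_n), \quad\text{since the } n_1 \text{ units are a sub-sample of the sample of size } n, \\
&= \frac{1}{n + n_1}\,(n\,\bar{Y} + n_1\,\bar{Y}) = \bar{Y}.
\end{aligned}$$
To obtain the variance,
$$\begin{aligned}
V(\bar{y}_{n+n_1}) &= E(\bar{y}_{n+n_1} - \bar{Y})^2 = E\left[\frac{n\,\bar{y}_n + n_1\,\bar{y}_{n_1}}{n + n_1} - \bar{Y}\right]^2 \\
&= \frac{1}{(n + n_1)^2}\,E[\,n\,\bar{y}_n + n_1\,\bar{y}_{n_1} - (n + n_1)\,\bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\,E[\,n\,\bar{y}_n - n\,\bar{Y} + n_1\,\bar{y}_{n_1} - n_1\,\bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\,E[\,n(\bar{y}_n - \bar{Y}) + n_1\,\bar{y}_{n_1} - n_1\,\bar{y}_n + n_1\,\bar{y}_n - n_1\,\bar{Y}\,]^2 \\
&= \frac{1}{(n + n_1)^2}\,E[\,(n + n_1)(\bar{y}_n - \bar{Y}) + n_1(\bar{y}_{n_1} - \bar{y}_n)\,]^2 \\
&= \frac{1}{(n + n_1)^2}\,[\,(n + n_1)^2\,E(\bar{y}_n - \bar{Y})^2 + n_1^2\,E(\bar{y}_{n_1} - \bar{y}_n)^2\,],
\end{aligned}$$
where the cross-product term vanishes because $E(\bar{y}_{n_1} - \bar{y}_n \mid n) = 0$. Continuing,
$$\begin{aligned}
V(\bar{y}_{n+n_1}) &= \frac{1}{(n + n_1)^2}\,[\,(n + n_1)^2\,V(\bar{y}_n) + n_1^2\,E\{E[(\bar{y}_{n_1} - \bar{y}_n)^2 \mid n]\}\,] \\
&= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2\,V(\bar{y}_n) + n_1^2\,E\left\{\left(\frac{1}{n_1} - \frac{1}{n}\right)s_n^2\right\}\right] \\
&= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2\,V(\bar{y}_n) + n_1^2\left(\frac{n - n_1}{n_1 n}\right)S^2\right] \\
&= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2\,V(\bar{y}_n) + \frac{n_1(n - n_1)}{n}\,S^2\right] = V(\bar{y}_n) + \frac{n_1(n - n_1)}{n\,(n + n_1)^2}\,S^2.
\end{aligned}$$
Therefore, taking $V(\bar{y}_n) = S^2/n$ for a large population,
$$\begin{aligned}
\frac{V(\bar{y}_{n+n_1})}{V(\bar{y}_n)} &= 1 + \frac{n_1(n - n_1)}{n\,(n + n_1)^2}\,\frac{S^2}{V(\bar{y}_n)} = 1 + \frac{n_1(n - n_1)}{n\,(n + n_1)^2}\,\frac{S^2}{S^2/n} \\
&= \frac{(n + n_1)^2 + n_1(n - n_1)}{(n + n_1)^2} = \frac{n^2 + n_1^2 + 2n_1 n + n_1 n - n_1^2}{(n + n_1)^2} \\
&= \frac{n^2 + 3n_1 n}{(n + n_1)^2} = \frac{1 + (3n_1/n)}{(1 + n_1/n)^2}.
\end{aligned}$$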
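Both the unbiasedness and the variance expression $V(\bar{y}_n) + n_1(n - n_1)S^2 / \{n(n + n_1)^2\}$ can be checked by complete enumeration on a tiny population. The sketch below assumes srswor at both stages, so that $V(\bar{y}_n) = (1/n - 1/N)S^2$; the population values and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 7, 11]            # hypothetical values
N, n, n1 = len(population), 3, 2

Ybar = Fraction(sum(population), N)
S2 = sum((Fraction(v) - Ybar) ** 2 for v in population) / (N - 1)

# Every (sample, sub-sample) pair is equally likely
means = []
for sample in combinations(population, n):
    for sub in combinations(sample, n1):
        means.append(Fraction(sum(sample) + sum(sub), n + n1))

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

var_ybar_n = (Fraction(1, n) - Fraction(1, N)) * S2
extra = Fraction(n1 * (n - n1), n * (n + n1) ** 2) * S2

# Large-population ratio, using V(ybar_n) = S^2 / n
ratio_large_N = 1 + extra / (S2 / n)
print(mean_of_means == Ybar, var_of_means == var_ybar_n + extra, ratio_large_N)
```

Exact fractions are used instead of floats so that the comparisons are equalities rather than approximate matches.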
Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar{y}$ is drawn from a finite population, and a simple random subsample of size $n_1$ is drawn from it with mean $\bar{y}_1$. Show that

i) $V(\bar{y}_1 - \bar{y}_2) = S^2\,[(1/n_1) + (1/n_2)]$, where $\bar{y}_2$ is the mean of the remaining $n_2$ units in the sample,

ii) $V(\bar{y}_1 - \bar{y}) = S^2\,[(1/n_1) - (1/n)]$,

iii) $\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = 0$.
Repeated sampling implies repetition of the drawing of both the sample and subsample.
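Solution:
i) In repeated sampling the given procedure is equivalent to drawing subsamples of sizes $n_1$ and $n_2$ independently, thus
$$V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2), \quad\text{since}\quad \operatorname{Cov}(\bar{y}_1, \bar{y}_2) = 0,$$
$$= S^2\,[(1/n_1) + (1/n_2)], \quad\text{ignoring fpc}.$$

ii)
$$\bar{y} = \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1 + n_2} \;\Rightarrow\; \bar{y}_1 - \bar{y} = \bar{y}_1 - \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1 + n_2},$$
or
$$\bar{y}_1 - \bar{y} = \frac{n_1\bar{y}_1 + n_2\bar{y}_1 - n_1\bar{y}_1 - n_2\bar{y}_2}{n_1 + n_2} = \frac{n_2(\bar{y}_1 - \bar{y}_2)}{n}.$$
Therefore,
$$\begin{aligned}
V(\bar{y}_1 - \bar{y}) &= V\left[\frac{n_2(\bar{y}_1 - \bar{y}_2)}{n}\right] = \frac{n_2^2}{n^2}\,V(\bar{y}_1 - \bar{y}_2) = \frac{n_2^2}{n^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S^2 \\
&= \frac{n_2^2}{n^2}\left(\frac{n_1 + n_2}{n_1 n_2}\right)S^2 = \frac{n_2}{n_1 n}\,S^2 = \frac{n - n_1}{n_1 n}\,S^2 = \left(\frac{1}{n_1} - \frac{1}{n}\right)S^2.
\end{aligned}$$
iii)
$$\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = E[\bar{y}(\bar{y}_1 - \bar{y})] - E(\bar{y})\,E(\bar{y}_1 - \bar{y}) = E(\bar{y}\,\bar{y}_1 - \bar{y}^2) - \bar{Y} \times 0 = E(\bar{y}\,\bar{y}_1) - E(\bar{y}^2). \qquad (1)$$
Consider
$$\begin{aligned}
E(\bar{y}\,\bar{y}_1) &= E\left[\frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n}\,\bar{y}_1\right] = E\left[\frac{n_1}{n}\,\bar{y}_1^2 + \frac{n_2}{n}\,\bar{y}_1\bar{y}_2\right] \\
&= \frac{n_1}{n}\,E(\bar{y}_1^2) + \frac{n_2}{n}\,E(\bar{y}_1)\,E(\bar{y}_2) \\
&= \frac{n_1}{n}\,[\,V(\bar{y}_1) + \bar{Y}^2\,] + \frac{n_2}{n}\,\bar{Y}^2 = \frac{n_1}{n}\left(\frac{S^2}{n_1} + \bar{Y}^2\right) + \frac{n_2}{n}\,\bar{Y}^2 \\
&= \frac{S^2}{n} + \frac{n_1}{n}\,\bar{Y}^2 + \frac{n_2}{n}\,\bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2. \qquad (2)
\end{aligned}$$
Now
$$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2, \quad\text{or}\quad E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2. \qquad (3)$$
In view of equations (1), (2), and (3), we get
$$\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = \left(\frac{S^2}{n} + \bar{Y}^2\right) - \left(\frac{S^2}{n} + \bar{Y}^2\right) = 0.$$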
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$ respectively. It is required to estimate the population total $Y$ by selecting a sample of two units. Let the sampling and estimation procedures be as follows:

Sample (s)      P(s)    Estimator t        Estimator t'
$(U_1, U_2)$    1/2     $Y_1 + 2Y_2$       $Y_1 + 2Y_2 + Y_1^2$
$(U_1, U_3)$    1/2     $Y_1 + 2Y_3$       $Y_1 + 2Y_3 - Y_1^2$

Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the estimators.
Solution: By definition
$$E(t) = \sum_i t_i\,p(t_i) = \frac{1}{2}\,(Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y.$$
This shows that the estimator $t$ is unbiased for $Y$.
$$\begin{aligned}
E(t^2) &= \frac{1}{2}\,[(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2] = \frac{1}{2}\,(Y_1^2 + 4Y_2^2 + 4Y_1Y_2 + Y_1^2 + 4Y_3^2 + 4Y_1Y_3) \\
&= Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1Y_2 + 2Y_1Y_3.
\end{aligned}$$
Therefore,
$$\begin{aligned}
V(t) &= E(t^2) - [E(t)]^2 = Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1Y_2 + 2Y_1Y_3 - (Y_1 + Y_2 + Y_3)^2 \\
&= Y_2^2 + Y_3^2 - 2Y_2Y_3 = (Y_2 - Y_3)^2.
\end{aligned}$$
Similarly,
$$E(t') = \sum_i t'_i\,p(t'_i) = \frac{1}{2}\,(Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y,$$
hence $t'$ is unbiased for $Y$.
$$\begin{aligned}
E(t'^2) &= \frac{1}{2}\,[(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2] \\
&= \frac{1}{2}\,(Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2Y_2 + 4Y_1Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 + Y_1^2 - 4Y_1^2Y_3 + 4Y_1Y_3 + 4Y_3^2) \\
&= Y_1^4 + Y_1^2 + 2Y_1^2Y_2 + 2Y_1Y_2 + 2Y_2^2 - 2Y_1^2Y_3 + 2Y_1Y_3 + 2Y_3^2.
\end{aligned}$$
Therefore,
$$\begin{aligned}
V(t') &= E(t'^2) - [E(t')]^2 \\
&= Y_1^4 + Y_1^2 + 2Y_1^2Y_2 + 2Y_1Y_2 + 2Y_2^2 - 2Y_1^2Y_3 + 2Y_1Y_3 + 2Y_3^2 - (Y_1 + Y_2 + Y_3)^2 \\
&= (Y_2 - Y_3)^2 + Y_1^2\,(Y_1^2 + 2Y_2 - 2Y_3) \\
&= V(t) + Y_1^2\,(Y_1^2 + 2Y_2 - 2Y_3).
\end{aligned}$$
We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased; which of the two has the smaller variance depends on the variate values.
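The dependence on the variate values can be seen by evaluating both estimators for specific numbers. The sketch below assumes that $t'$ adds $Y_1^2$ for the sample $(U_1, U_2)$ and subtracts it for $(U_1, U_3)$, which is one sign assignment that makes $t'$ unbiased; the numeric values and the function names `moments` and `compare` are hypothetical.

```python
from fractions import Fraction


def moments(estimates, probs):
    """Mean and variance of an estimator over the possible samples."""
    mean = sum(p * t for t, p in zip(estimates, probs))
    var = sum(p * (t - mean) ** 2 for t, p in zip(estimates, probs))
    return mean, var


def compare(Y1, Y2, Y3):
    """Return ((E t, V t), (E t', V t')) for the two-sample design."""
    probs = [Fraction(1, 2), Fraction(1, 2)]
    t = [Y1 + 2 * Y2, Y1 + 2 * Y3]
    # Assumed signs: +Y1^2 for (U1, U2) and -Y1^2 for (U1, U3)
    t_prime = [Y1 + 2 * Y2 + Y1 ** 2, Y1 + 2 * Y3 - Y1 ** 2]
    return moments(t, probs), moments(t_prime, probs)


print(compare(1, 2, 3))   # t' has the smaller variance here
print(compare(1, 3, 2))   # t has the smaller variance here
```

Swapping the values of $Y_2$ and $Y_3$ is enough to reverse the ranking, which illustrates why neither estimator is uniformly better.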
