CHAPTER 2
SIMPLE RANDOM SAMPLING
2.1 Introduction
This is the simplest case of random sampling, in which each unit of the population is given an equal
chance of being in the sample. If the population is of size N, then each unit has a probability of
selection of $1/N$. The selection of units is done with replacement (SRSWR) or without replacement
(SRSWOR). Both require that the population units are numbered from 1 to N. At any draw, the
(unconditional) probability of selecting any given population unit remains 1/N.
For SRSWR: Since there are N units to select from at every draw, the probability of selecting any
unit from the population at any given draw is 1/N.
For SRSWOR: Although the number of units left in the population changes with each draw, the
probability of selecting any unit at any given draw remains 1/N. The proof requires some algebra.
Consider the probability of selecting the ith unit, Ui, at the rth draw. This is the product of the
conditional probability of selecting Ui on the rth draw, given that it was missed on the first (r-1)
draws, and the probability of missing Ui on the first (r-1) draws.
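$$P(\text{miss } U_i \text{ on the 1st draw}) = 1 - \frac{1}{N} = \frac{N-1}{N}$$
$$P(\text{miss } U_i \text{ on the 1st and 2nd draws}) = \left(\frac{N-1}{N}\right)\left(1 - \frac{1}{N-1}\right) = \left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right) = \frac{N-2}{N}$$
$$\vdots$$
$$P(\text{miss } U_i \text{ on all of the first } (r-1) \text{ draws}) = \left(\frac{N-1}{N}\right)\left(\frac{N-2}{N-1}\right)\cdots\left(\frac{N-r+2}{N-r+3}\right)\left(\frac{N-r+1}{N-r+2}\right) = \frac{N-r+1}{N}$$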
After (r-1) draws, there are only N-(r-1) units left, so the conditional probability of drawing Ui
on the rth draw (i.e. having missed it on all (r-1) previous draws) is given by:
$$\frac{1}{N-(r-1)} = \frac{1}{N-r+1}$$
The unconditional probability of drawing Ui on the rth draw is then given as a product:
$$\frac{1}{N-r+1} \times \frac{N-r+1}{N} = \frac{1}{N}$$
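Element selection probability into a sample of size n:
SRSWR: $\frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} = \sum_{r=1}^{n} \frac{1}{N} = \frac{n}{N}$
(with replacement a unit can be drawn more than once, so strictly this sum is the expected number of times the unit appears in the sample)
SRSWOR: $\frac{1}{N} + \frac{N-1}{N}\cdot\frac{1}{N-1} + \cdots + \frac{N-n+1}{N}\cdot\frac{1}{N-n+1} = \frac{n}{N}$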
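The SRSWOR argument above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `prob_selected_at_draw` and the values N = 10, n = 4 are illustrative assumptions. It multiplies the "miss" probabilities exactly, using fractions, and then adds the per-draw probabilities over the n draws.

```python
from fractions import Fraction

def prob_selected_at_draw(N: int, r: int) -> Fraction:
    """P(a given unit is selected at draw r) under SRSWOR, for r = 1..N."""
    # Probability of missing the unit on each of the first r-1 draws:
    # (N-1)/N * (N-2)/(N-1) * ... * (N-r+1)/(N-r+2)
    p_miss = Fraction(1)
    for k in range(1, r):
        p_miss *= Fraction(N - k, N - k + 1)
    # Conditional probability of picking it among the N-r+1 units left
    p_hit = Fraction(1, N - r + 1)
    return p_miss * p_hit

N, n = 10, 4  # illustrative population and sample sizes
per_draw = [prob_selected_at_draw(N, r) for r in range(1, n + 1)]
inclusion = sum(per_draw)  # probability of entering the sample of size n

print(per_draw)   # every entry equals 1/N
print(inclusion)  # equals n/N
```

Every per-draw probability comes out as 1/N, and their sum over the n draws is n/N, in line with the derivation.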
2.2. Procedure (SRSWOR)
Aim: Take a sample of size n from a population of size N.
Assume a homogeneous population, i.e. members of the population do not vary much from one
another.
(a) Number the population units 1 to N.
(b) Select one unit at a time without replacement, using a mechanism that ensures that at
each selection all the remaining population members have an equal probability of selection
(unconditionally 1/N, as shown in Section 2.1).
(c) A random mechanism such as the lottery method, random number tables or a computer
random number generator does this.
(d) For a sample of size n, there are $\binom{N}{n}$ (i.e. $^{N}C_{n}$) possible samples $S_i$. All have equal
probability of selection: $P(S_i) = \dfrac{1}{\binom{N}{n}}$
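Steps (a) to (d) can be sketched with a computer random number generator. This is only an illustration, assuming Python's standard library: the helper name `srswor` and the seed are hypothetical choices, and the sizes N = 83, n = 14 are borrowed from the supermarket example later in this chapter.

```python
import math
import random

def srswor(N, n, seed=None):
    """Draw a simple random sample of n unit labels from 1..N without replacement."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # random.sample picks n distinct labels, each subset being equally likely
    return sorted(rng.sample(range(1, N + 1), n))

N, n = 83, 14
sample = srswor(N, n, seed=2024)  # the seed value is arbitrary
n_samples = math.comb(N, n)       # number of possible samples, N choose n

print(sample)
print(n_samples, 1 / n_samples)   # count of samples and P(S_i)
```

Fixing the seed is a design choice that makes the selection auditable: anyone rerunning the code obtains the same list of units before the field visit.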
Notation:
It should be understood that when we sample units U1, U2, ⋯ , Un we record values/responses for
many variables of interest such as xi, yi, zi, etc., but in our theoretical development we shall use
general cases. Also note that in sampling, units are selected before the actual field visit.
Capital letters will represent population parameters, e.g. mean → $\bar{X}$
Small letters will represent sample characteristics, e.g. mean → $\bar{x}$
Sampling fraction, $f = \dfrac{n}{N}$, and the sampling weight is given as $\dfrac{N}{n}$.
The probability of including a specified unit in a sample of size n, $\dfrac{n}{N}$, is also
called the inclusion probability. It is commonly written as
$\pi_i = \dfrac{n}{N}$, and $\pi_{ij} = \dfrac{n(n-1)}{N(N-1)}$ is the probability that both Ui and Uj are selected into the sample. It is called
the joint probability of selection of Ui and Uj.
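For a small population the inclusion probabilities can be verified by listing every possible sample. The sketch below is illustrative only; N = 6 and n = 3 are assumed values chosen so that the enumeration is tiny.

```python
from fractions import Fraction
from itertools import combinations

N, n = 6, 3  # illustrative sizes
samples = list(combinations(range(1, N + 1), n))  # all possible SRSWOR samples
p = Fraction(1, len(samples))                     # each sample is equally likely

# First-order inclusion probability of unit 1
pi_1 = sum(p for s in samples if 1 in s)
# Joint inclusion probability of units 1 and 2
pi_12 = sum(p for s in samples if 1 in s and 2 in s)

print(pi_1, Fraction(n, N))
print(pi_12, Fraction(n * (n - 1), N * (N - 1)))
```

Both enumerated values agree with the closed forms n/N and n(n-1)/(N(N-1)).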
2.3. Estimation of Population Parameters
As was stated in the previous chapter, estimation of the parameters requires that we:
(a) Choose an estimator and construct an estimate
(b) State the properties of the estimator: unbiasedness, MSE or sampling variance,
consistency
(c) State the sampling distribution of the estimates so as to construct the 100(1-α)% CI
In the subsequent sections, different estimators of the parameters are given, together with their
properties.
2.3.1. Estimating Population Mean
Depending on the information available in the data, there are about four ways of estimating the
population mean. These include the sample mean estimator, ratio-mean estimator, regression-mean
estimator and difference-mean estimator. In this section we consider the most common, the
sample mean $\bar{x}$, as our estimate of the population mean $\bar{X}$:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Properties of $\bar{x}$
(a) $E(\bar{x}) = \bar{X}$
(b) Sampling variance: $\operatorname{Var}(\bar{x}) = \dfrac{1-f}{n} S^2$ (see proof in Appendix 1b) (2.2)
where (1-f) is the finite population correction (fpc). This is a correction for the fact
that the population is finite.
(c) $\bar{x}$ is a consistent estimator of the population mean (i.e. as n → N, $\bar{x} \to \bar{X}$)
(d) For reasonably large samples, we usually assume $\bar{x}$ is normally distributed (recall the Central
Limit Theorem).
(e) $\bar{x}$ is BLUE (Best Linear Unbiased Estimator): it is a linear function of the observations with minimum variance
Note:
(i) When n << N then (1-f) ≈ 1, i.e. the fpc is negligible and $\operatorname{Var}(\bar{x}) \approx \dfrac{S^2}{n}$
(ii) When n is close to N: (1-f) ≈ 0
(iii) When n = N (i.e. a census), $\operatorname{Var}(\bar{x}) = 0$
(iv) The fpc can be ignored whenever f does not exceed 5% (see Cochran, 1977), and at times
even when f is as high as 10%
(v) $SE(\bar{x}) = \sqrt{\operatorname{Var}(\bar{x})} = \sqrt{\dfrac{(1-f)}{n} S^2} = \sqrt{\dfrac{(N-n)}{Nn} S^2}$ (2.3)
(vi) A randomly chosen observation is an unbiased estimator of the population mean, i.e. $E(x_i) = \bar{X}$
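Properties (a) and (b) can be confirmed exactly on a toy population by enumerating all possible samples. The five population values below are hypothetical, chosen only to keep the arithmetic small; this is a sketch, not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values (assumption for illustration only)
population = [Fraction(v) for v in (3, 7, 8, 12, 20)]
N, n = len(population), 2

pop_mean = sum(population) / N
S2 = sum((x - pop_mean) ** 2 for x in population) / (N - 1)  # divisor N-1

# Sample mean of every one of the C(N, n) equally likely samples
means = [sum(s) / n for s in combinations(population, n)]
expected_mean = sum(means) / len(means)
var_of_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

f = Fraction(n, N)
formula = (1 - f) * S2 / n  # equation (2.2)

print(expected_mean, pop_mean)  # unbiasedness: the two agree
print(var_of_mean, formula)     # sampling variance matches (2.2)
```

Using exact fractions avoids rounding, so the equality between the enumerated variance and (1-f)S²/n is exact rather than approximate.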
2.3.2 Estimating Population Variance ($S^2$) – Continuous variables
There are two possible estimators of the population variance, the only difference being that
one uses n and the other uses (n-1) as the denominator. The commonly used estimator is:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \qquad (2.4)$$
$E(s^2) = S^2$, i.e. $s^2$ is unbiased (see proof in Appendix 1c)
Suitably scaled, $(n-1)s^2/S^2$ has an approximate Chi-squared distribution with (n-1) degrees of freedom
The standard deviation (SD) is given by the square root of $s^2$
The sampling variance of this estimator has a long form that we do not go into here.
Notes:
(i) Population variance is given by $S^2 = \dfrac{\sum_{i=1}^{N}(X_i - \bar{X})^2}{N-1}$
(ii) $\operatorname{Var}(x_i) = \dfrac{(N-1)}{N} S^2$
(iii) $\operatorname{Cov}(x_i, x_j) = -\dfrac{S^2}{N}$ for i ≠ j (because the population is finite and sampling is
without replacement, we get a negative correlation between units of the same sample)
(iv) The standard deviation is a measure of how "spread out" the measurements or
observations from individual population members are around the population mean. In
finance and other fields, the standard deviation provides a measure of risk, e.g. in
investing in stocks.
(v) The standard deviation and the standard error are different concepts: the standard
deviation describes the spread of individual observations, while the standard error
describes the spread of an estimator (such as the sample mean) over repeated samples.
They are so closely tied together that to the uninitiated they are confusing!
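The unbiasedness of the (n-1)-divisor estimator and notes (ii) and (iii) can be checked by enumeration. The sketch below assumes a hypothetical five-unit population; `sample_variance` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Hypothetical population values (assumption for illustration only)
population = [Fraction(v) for v in (3, 7, 8, 12, 20)]
N, n = len(population), 3
mu = sum(population) / N
S2 = sum((x - mu) ** 2 for x in population) / (N - 1)

def sample_variance(s):
    """Sample variance with divisor (len(s) - 1)."""
    m = sum(s) / len(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

# Average of s^2 over all equally likely samples of size n
all_samples = list(combinations(population, n))
mean_s2 = sum(sample_variance(s) for s in all_samples) / len(all_samples)

# Ordered (first draw, second draw) pairs without replacement, equally likely
pairs = list(permutations(population, 2))
var_single = sum((a - mu) ** 2 for a, _ in pairs) / len(pairs)
cov_pair = sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)

print(mean_s2, S2)                         # E(s^2) = S^2
print(var_single, Fraction(N - 1, N) * S2)  # note (ii)
print(cov_pair, -S2 / N)                   # note (iii): negative covariance
```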
2.3.3 Sampling distribution and confidence interval for sample mean estimate
The samples we took in Section 1.7 were simple random samples, and we noted that when the sample
means were plotted, the plot approximated the normal density curve. Indeed, by the Central Limit
Theorem, we have approximately that
$$\bar{x} \sim N\left(\bar{X},\ \frac{(1-f)}{n} S^2\right)$$
from which an approximate 100(1-α)% CI for $\bar{X}$ is given by
$$\bar{x} \pm Z_{\alpha/2}\sqrt{\frac{(1-f)}{n} S^2} \qquad (2.5)$$
- In practice, $S^2$ is unknown and must be replaced by $s^2$
- If we do so, in samples where n < 40 we use a t-distribution rather than Z, thus we have
$$\bar{x} \pm t_{\alpha/2,\,(n-1)}\sqrt{\frac{(1-f)}{n} s^2} \qquad (2.6)$$
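Equations (2.5) and (2.6) translate directly into code. The following is a hedged sketch: the function name `mean_ci`, its `crit` argument and the eight data values are all assumptions made for illustration. Python's standard library supplies normal quantiles but not t quantiles, so a tabulated t value has to be passed in by hand.

```python
import math
from statistics import NormalDist, mean, variance

def mean_ci(sample, N, alpha=0.05, crit=None):
    """Sample mean, its SE with fpc, and a 100(1-alpha)% CI under SRSWOR.

    crit: critical value to use; if None, the normal quantile Z_{alpha/2}.
    """
    n = len(sample)
    f = n / N                                        # sampling fraction
    xbar = mean(sample)
    se = math.sqrt((1 - f) * variance(sample) / n)   # variance() divides by n-1
    if crit is None:
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    return xbar, se, (xbar - crit * se, xbar + crit * se)

sample = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0, 13.0, 17.0]  # hypothetical data
N = 100                                                     # hypothetical population size

xbar, se, (lo, hi) = mean_ci(sample, N)                # Z-based interval (2.5)
_, _, (t_lo, t_hi) = mean_ci(sample, N, crit=2.365)    # t table value for 7 d.f. (2.6)
print(xbar, se, (lo, hi), (t_lo, t_hi))
```

With only n = 8 observations the t-based interval is the appropriate one, and it is wider than the Z-based interval.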
2.3.4. Estimating Population Total ($X_T$) – Continuous variables
The population total is estimated by an 'expansion' estimator:
$$\hat{X}_T = \frac{N}{n}\sum_{i=1}^{n} x_i = N\bar{x} \qquad (2.7)$$
where $\sum_{i=1}^{n} x_i$ is the sample total.
Properties of $\hat{X}_T$
(i) Unbiased, since $E(\hat{X}_T) = N E(\bar{x}) = N\bar{X} = X_T$
(ii) $\operatorname{Var}(\hat{X}_T) = N^2 \operatorname{Var}(\bar{x}) = N^2 \dfrac{(1-f)}{n} S^2$ (2.8)
(iii) $\hat{X}_T$ is BLUE
(iv) Under the usual conditions, assume $\hat{X}_T \sim N\left(X_T,\ N^2 \dfrac{(1-f)}{n} S^2\right)$
(v) If we assume (iv), our 100(1-α)% CIs are
$N\bar{x} \pm Z_{\alpha/2}\sqrt{N^2 \dfrac{(1-f)}{n} S^2}$, or $N\bar{x} \pm t_{\alpha/2,\,(n-1)}\sqrt{N^2 \dfrac{(1-f)}{n} s^2}$ for small n.
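The expansion estimator can also be read as a weighted sum in which every sampled unit carries the sampling weight N/n. The sketch below illustrates this under assumed data: the function name `total_estimate`, the five values and N = 50 are hypothetical.

```python
import math
from statistics import NormalDist, variance

def total_estimate(sample, N, alpha=0.05):
    """Expansion estimate of the population total, its SE and a Z-based CI."""
    n = len(sample)
    f = n / N
    weight = N / n                                   # sampling weight of each unit
    total_hat = sum(weight * x for x in sample)      # equals N times the sample mean
    se_total = N * math.sqrt((1 - f) * variance(sample) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return total_hat, se_total, (total_hat - z * se_total, total_hat + z * se_total)

sample = [4.0, 6.0, 5.0, 9.0, 6.0]  # hypothetical data
N = 50                               # hypothetical population size
total_hat, se_total, ci = total_estimate(sample, N)
print(total_hat, se_total, ci)
```

For a sample this small the Z quantile would normally be replaced by the tabulated t value with (n-1) degrees of freedom, as in property (v).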
Example 1: Supermarket sales (Class work)
Warning: Most of the examples we shall discuss are miniatures (smaller than normal) of true
sample surveys or simple artificial constructs. This is to help you understand the concepts. Real
surveys are large and would be painful to work through by hand. In the computer practicals we
shall work with real datasets as well.
Suppose that out of the 83 medium-sized supermarkets in Kampala we took a simple random sample
of 14 supermarkets and asked the managers about the previous month's sales (x) and the expired
products thrown away (y), both in hundreds of dollars, and the profit margin (z). The variable Loc
indicates location (1 = town centre; 2 = outskirts). Data are given in Table 2.1.
Questions
Compute the following:
(a) The average monthly sales per supermarket in Kampala and its standard error
(b) The 95% CI for the total losses in the form of expired products in all supermarkets in Kampala
(c) The population standard deviation of monthly profit margins of all supermarkets in
Kampala
Table 2.1 Supermarket sales data
Supermarket Loc x y z Wt
1 1 2094 1.5 10 5.9
2 1 4867 2.4 14 5.9
3 2 2991 1.8 13 5.9
4 2 1994 1.5 11 5.9
5 2 1217 1.3 10 5.9
6 2 1574 1.1 9 5.9
7 1 1138 1.2 10 5.9
8 1 3816 2.0 17 5.9
9 2 1308 1.0 7 5.9
10 1 3460 1.9 9 5.9
11 2 4490 2.2 10 5.9
12 1 4006 2.1 15 5.9
13 2 4567 2.3 8 5.9
14 2 2383 1.6 15 5.9
Totals 39905 24 158
Solution
To be handed out/discussed in the next lecture
2.3.5 Estimating Population Ratio, R
Often the quantity to be estimated from a simple random sample is a ratio of two variables,
both of which vary from unit to unit. Examples include the ratio of loans for building purposes to
total loans given out by a bank, the ratio of total living expenses per individual in differently sized
households, the average number of hours per week spent watching TV per child, etc.
Denote R as the population ratio, such that $R = \dfrac{X_T}{Y_T} = \dfrac{\bar{X}}{\bar{Y}}$
The corresponding sample estimate is given as
$$r = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i} = \frac{\bar{x}}{\bar{y}}$$
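Properties of r
(i) Its exact distribution is complicated because it is a ratio of two random variables
(ii) It is biased, with bias given by $\text{Bias}(r) = -\operatorname{Cov}(r, \bar{y}) / \bar{Y}$
The full proof is long, but an alternative is to start from
$$\operatorname{Cov}(r, \bar{y}) = E[r\,\bar{y}] - E(\bar{y})E(r) = E\left[\frac{\bar{x}}{\bar{y}}\cdot\bar{y}\right] - E(\bar{y})E(r) = \bar{X} - \bar{Y}E(r)$$
from which we have
$$E(r) = \frac{1}{E(\bar{y})}\left\{E\left[\frac{\bar{x}}{\bar{y}}\cdot\bar{y}\right] - \operatorname{Cov}(r, \bar{y})\right\} = \frac{\bar{X}}{\bar{Y}} - \frac{\operatorname{Cov}(r, \bar{y})}{\bar{Y}} = R - \frac{\operatorname{Cov}(r, \bar{y})}{\bar{Y}}$$
so that
$$\text{Bias}(r) = E(r) - R = -\frac{\operatorname{Cov}(r, \bar{y})}{\bar{Y}}$$
This bias becomes large as the coefficient of variation of $\bar{y}$ becomes bigger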
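(iii) The (large-sample) variance of r can be written in various forms. These include, among others:
$$\operatorname{Var}(r) = \frac{1-f}{n\bar{Y}^2}\left(S_X^2 - 2RS_{XY} + R^2S_Y^2\right)$$
or
$$\operatorname{Var}(r) = \frac{1-f}{n\bar{Y}^2}\left(S_X^2 - 2R\rho_{XY}S_XS_Y + R^2S_Y^2\right) \quad \text{since } S_{XY} = \rho_{XY}S_XS_Y$$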
(iv) One sample estimator of the sampling variance (or, more appropriately, the
MSE) is given by:
$$\widehat{\operatorname{Var}}(r) = \frac{1-f}{n\bar{y}^2}\cdot\frac{\sum_{i=1}^{n}(x_i - r y_i)^2}{n-1} = \frac{1-f}{n\bar{y}^2}\left[\frac{\sum_{i=1}^{n} x_i^2 - 2r\sum_{i=1}^{n} x_i y_i + r^2\sum_{i=1}^{n} y_i^2}{n-1}\right] \qquad (2.11)$$
This estimator is based on the delta method (Taylor linearization).
(v) It is approximately normally distributed in large samples. The 100(1-α)% CI is
given by $r \pm Z_{\alpha/2}\,SE(r)$ [do not use a t-distribution!]
Note: (1) If you reverse the letters for the denominator and the numerator, i.e. R = Y/X, be
consistent and change the variance formulas as well. (2) Var(r) is more appropriately termed
MSE(r), since the estimator is biased (see Section 1.7).
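Properties (iv) and (v) can be sketched in code as follows. This is an illustration under assumptions: the function name `ratio_estimate`, the paired data and N = 40 are hypothetical. The sketch computes the linearization variance in both of its algebraic forms so that they can be compared.

```python
import math
from statistics import NormalDist

def ratio_estimate(x, y, N, alpha=0.05):
    """Ratio estimate r = sum(x)/sum(y), linearization SE and Z-based CI."""
    n = len(x)
    f = n / N
    r = sum(x) / sum(y)
    ybar = sum(y) / n
    # Residual form: sum of (x_i - r*y_i)^2
    ss_resid = sum((xi - r * yi) ** 2 for xi, yi in zip(x, y))
    # Expanded form: sum x^2 - 2r sum xy + r^2 sum y^2
    ss_expanded = (sum(xi ** 2 for xi in x)
                   - 2 * r * sum(xi * yi for xi, yi in zip(x, y))
                   + r ** 2 * sum(yi ** 2 for yi in y))
    var_r = (1 - f) / (n * ybar ** 2) * ss_resid / (n - 1)
    se = math.sqrt(var_r)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal quantile, not t
    return r, se, (r - z * se, r + z * se), ss_resid, ss_expanded

x = [10.0, 14.0, 9.0, 20.0, 12.0]   # hypothetical numerator variable
y = [4.0, 6.0, 5.0, 8.0, 5.0]       # hypothetical denominator variable
r, se, ci, ss_resid, ss_expanded = ratio_estimate(x, y, N=40)
print(r, se, ci)
print(ss_resid, ss_expanded)        # the two forms agree up to rounding
```

The residual form is usually preferable numerically, because the expanded form subtracts large sums of squares from one another.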
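The bias relation in property (ii) can be confirmed exactly by enumerating all samples from a small population. The five (x, y) pairs below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of (x, y) pairs (assumption for illustration only)
population = [(2, 1), (5, 3), (4, 2), (9, 4), (7, 5)]
N, n = len(population), 2

X_bar = Fraction(sum(x for x, _ in population), N)
Y_bar = Fraction(sum(y for _, y in population), N)
R = X_bar / Y_bar                                   # population ratio

samples = list(combinations(population, n))         # all equally likely samples
r_vals = [Fraction(sum(x for x, _ in s), sum(y for _, y in s)) for s in samples]
ybar_vals = [Fraction(sum(y for _, y in s), n) for s in samples]

M = len(samples)
E_r = sum(r_vals) / M
E_ybar = sum(ybar_vals) / M
cov_r_ybar = sum(rv * yb for rv, yb in zip(r_vals, ybar_vals)) / M - E_r * E_ybar

bias = E_r - R
print(bias, -cov_r_ybar / Y_bar)   # the two expressions coincide
```

Because the arithmetic is done in exact fractions, the agreement between E(r) - R and -Cov(r, ȳ)/Ȳ holds exactly, not just to rounding.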
Example 2: Student Homework (To be shared with students)