Module I - Sampling Methodology

Uploaded by

samson.gangarapu

MODULE I.

SAMPLING METHODOLOGY
Chapter 1. GENERAL INTRODUCTION

Decision makers make better decisions when they use all available information in an effective
and meaningful way. The primary role of statistics is to provide decision makers with
methods for obtaining and analyzing information to help make these decisions. Statistics is
used to answer long-range planning questions, such as when and where to locate facilities to
handle future sales.

Definition

Statistics is defined as the science of collecting, organizing, presenting, analyzing and
interpreting numerical data for the purpose of assisting in making a more effective decision.

Types of Statistics

There are two types of statistics.

1. Descriptive Statistics is concerned with summary calculations, graphs, charts and tables.

2. Inferential Statistics is a method used to generalize from a sample to a population. For
example, the average income of all families (the population) in the US can be estimated from
figures obtained from a few hundred families (the sample).

Statistical Population

A statistical population is the collection of all possible observations of a specified
characteristic of interest. An example is all of the REMA staff. Note that a sample is a
subset of the population.

Variable

A variable is an item of interest that can take on many different values.

Types of Variables or Data

1. Qualitative Variables are nonnumeric variables and can't be measured. Examples include
gender, religious affiliation, state of birth.

2. Quantitative Variables are numerical variables and can be measured. Examples include
balance in your checking account, number of children in your family. Note that quantitative
variables are either discrete (which can assume only certain values, and there are usually
"gaps" between the values, such as the number of bedrooms in your house) or continuous
(which can assume any value within a specific range, such as the air pressure in a tire.)

Levels of Data Measurement:

There are four (4) levels of data measurement:

1. Nominal Data: The weakest data measurement. Numbers are used to represent an item or
characteristic. Example: designating male=1 and female=2

2. Ordinal or Rank Data: Numbers are used to rank. Example is excellent, good, fair and
poor. The main difference between ordinal data and nominal data is that ordinal data contain
both an equality (=) and a greater-than (>) relationship, whereas the nominal data contain only
an equality (=) relationship.

3. Interval Data: If we have data with ordinal properties (> & =) and can also measure the
distance between two data items, we have an interval measurement.
Interval data are preferred over ordinal data because, with them, decision makers can
precisely determine the difference between two observations, i.e., distances between numbers
can be measured. For example, frozen-food packagers have daily contact with a common
interval measurement--temperature.

4. Ratio Data: The highest level of measurement, which allows for all basic arithmetic
operations, including division and multiplication. Data measured on a ratio scale have a
fixed, non-arbitrary zero point. Examples include business data, such as cost, revenue and profit.

Sources of Data:

1. Secondary Data: Data which are already available. An example: statistical abstract of USA.
Advantage: less expensive. Disadvantage: may not satisfy your needs.

2. Primary Data: Data which must be collected.

Methods of Collecting Primary Data:

1. Focus Group; 2. Telephone Interview; 3. Mail Questionnaires; 4. Door-to-Door Survey; 5.
Mall Intercept; 6. New Product Registration; 7. Personal Interview; and 8. Experiments are
some of the methods for collecting primary data.

1.1 Sampling Methods

There are many ways to collect a sample. The most commonly used methods are:

A. Statistical Sampling:

1. Simple Random Sampling: Is a method of selecting items from a population such that every
possible sample of specific size has an equal chance of being selected. In this case, sampling
may be with or without replacement.

2. Stratified Random Sampling: Is obtained by selecting simple random samples from strata
(or mutually exclusive sets). Some of the criteria for dividing a population into strata are: Sex
(male, female); Age (under 18, 18 to 28, 29 to 39); Occupation (blue-collar, professional,
other).

3. Cluster Sampling: Is a simple random sample of groups or clusters of elements. Cluster
sampling is useful when it is difficult or costly to generate a simple random sample. For
example, to estimate the average annual household income in a large city we use cluster
sampling, because to use simple random sampling we need a complete list of households in
the city from which to sample. To use stratified random sampling, we would again need the
list of households. A less expensive way is to let each block within the city represent a cluster.

A sample of clusters could then be randomly selected, and every household within these
clusters could be interviewed to find the average annual household income.
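
The three statistical sampling methods described above can be sketched in Python with the standard library only. This is a minimal sketch: the population of twelve households, the "block" and "sex" fields and the function names are made up for illustration and do not come from the module.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum units inside each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Select whole clusters at random and keep every unit in the chosen clusters."""
    clusters = {}
    for u in units:
        clusters.setdefault(cluster_of(u), []).append(u)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]

# Hypothetical population: 12 households in 3 city blocks (clusters),
# with the sex of the household head used as the stratum.
households = [{"id": i, "block": i // 4, "sex": "F" if i % 2 else "M"} for i in range(12)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

srs = simple_random_sample(households, 4, rng)
strat = stratified_sample(households, lambda h: h["sex"], 2, rng)
clus = cluster_sample(households, lambda h: h["block"], 1, rng)
```

Note that the cluster sample needs only a list of blocks to draw from, whereas the other two functions need the full list of households, which is the cost argument made in the text.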

B. Nonstatistical Sampling:

1. Judgment Sampling: In this case, the person taking the sample has direct or indirect control
over which items are selected for the sample.

2. Convenience Sampling: In this method, the decision maker selects a sample from the
population in a manner that is relatively easy and convenient.

3. Quota Sampling: In this method, the decision maker requires the sample to contain a
certain number of items with a given characteristic. Many political polls are, in part, quota
sampling.

SAMPLING

Sampling is that part of statistical practice concerned with the selection of a subset of
individual observations within a population of individuals intended to yield some knowledge
about the population of concern, especially for the purposes of making predictions based on
statistical inference. Sampling is an important aspect of data collection.

Researchers rarely survey the entire population for two reasons: the cost is too high, and the
population is dynamic in that the individuals making up the population may change over time.
The three main advantages of sampling are that the cost is lower, data collection is faster, and
since the data set is smaller it is possible to ensure homogeneity and to improve the accuracy
and quality of the data.

Each observation measures one or more properties (such as weight, location, color) of
observable bodies distinguished as independent objects or individuals. In survey sampling,
survey weights can be applied to the data to adjust for the sample design. Results from
probability theory and statistical theory are employed to guide practice. In business and
medical research, sampling is widely used for gathering information about a population.

SAMPLING PROCESS

The sampling process comprises several stages:

1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting

POPULATION DEFINITION

Successful statistical practice is based on focused problem definition. In sampling, this
includes defining the population from which our sample is drawn. A population can be
defined as including all people or items with the characteristic one wishes to understand.
Because there is very rarely enough time or money to gather information from everyone or
everything in a population, the goal becomes finding a representative sample (or subset) of
that population.

Sometimes that which defines a population is obvious. For example, a manufacturer needs to
decide whether a batch of material from production is of high enough quality to be released to
the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the
batch is the population.

Although the population of interest often consists of physical objects, sometimes we need to
sample over time, space, or some combination of these dimensions. For instance, an
investigation of supermarket staffing could examine checkout line length at various times, or
a study on endangered penguins might aim to understand their usage of various hunting
grounds over time. For the time dimension, the focus may be on periods or discrete occasions.

SAMPLING FRAME

In the most straightforward case, such as the sentencing of a batch of material from
production (acceptance sampling by lots), it is possible to identify and measure every single
item in the population and to include any one of them in our sample. However, in the more
general case this is not possible. There is no way to identify all rats in the set of all rats.

Where voting is not compulsory, there is no way to identify which people will actually vote at
a forthcoming election (in advance of the election). These imprecise populations are not
amenable to sampling in any of the ways below and to which we could apply statistical
theory.

As a remedy, we seek a sampling frame which has the property that we can identify every
single element and include any in our sample. The most straightforward type of frame is a list
of elements of the population (preferably the entire population) with appropriate contact
information. For example, in an opinion poll, possible sampling frames include:
• Electoral register
• Telephone directory

Not all frames explicitly list population elements. For example, a street map can be used as a
frame for a door-to-door survey; although it doesn't show individual houses, we can select
streets from the map and then visit all houses on those streets. (One advantage of such a frame
is that it would include people who have recently moved and are not yet on the list frames
discussed above.)

The sampling frame must be representative of the population and this is a question outside the
scope of statistical theory demanding the judgment of experts in the particular subject matter
being studied. All the above frames omit some people who will vote at the next election and
contain some people who will not; some frames will contain multiple records for the same
person. People not in the frame have no prospect of being sampled. Statistical theory tells us
about the uncertainties in extrapolating from a sample to the frame. In extrapolating from
frame to population, its role is motivational and suggestive.
To the scientist, however, representative sampling is the only justified procedure for choosing
individual objects for use as the basis of generalization, and is therefore usually the only
acceptable basis for ascertaining truth.

BASIC PROBLEM OF SAMPLING FRAMES:

Missing elements: Some members of the population are not included in the frame.
Foreign elements: The non-members of the population are included in the frame.
Duplicate entries: A member of the population is listed more than once in the frame.
Groups or clusters: The frame lists clusters instead of individuals.

A frame may also provide additional 'auxiliary information' about its elements; when this
information is related to variables or groups of interest, it may be used to improve survey
design. For instance, an electoral register might include name and sex; this information can be
used to ensure that a sample taken from that frame covers all demographic categories of
interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone
number may provide some information about location.)
Having established the frame, there are a number of ways for organizing it to improve
efficiency and effectiveness.
It's at this stage that the researcher should decide whether the sample is in fact to be the whole
population and would therefore be a census.

PROBABILITY AND NON PROBABILITY SAMPLING

A probability sampling scheme is one in which every unit in the population has a chance
(greater than zero) of being selected in the sample, and this probability can be accurately
determined. The combination of these traits makes it possible to produce unbiased estimates
of population totals, by weighting sampled units according to their probability of selection.

Example: We want to estimate the total income of adults living in a given street. We visit
each household in that street, identify all adults living there, and randomly select one adult
from each household. (For example, we can allocate each person a random number, generated
from a uniform distribution between 0 and 1, and select the person with the highest number in
each household). We then interview the selected person and find their income. People living
on their own are certain to be selected, so we simply add their income to our estimate of the
total. But a person living in a household of two adults has only a one-in-two chance of
selection. To reflect this, when we come to such a household, we would count the selected
person's income twice towards the total. (In effect, the person who is selected from that
household is taken as representing the person who isn't selected.)
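
The weighting idea in this example can be sketched in a few lines of Python. The incomes below are invented for illustration, and the function name is hypothetical; the point is only that weighting each selected person by the inverse of their selection probability (the number of adults in the household) gives an estimate that is correct on average.

```python
import random

def estimate_total_income(households, rng):
    """Select one adult per household at random and weight by 1 / P(selection)."""
    total = 0.0
    for incomes in households:
        selected = rng.choice(incomes)   # each adult has probability 1 / len(incomes)
        weight = len(incomes)            # inverse of the selection probability
        total += weight * selected
    return total

# Hypothetical street: each inner list holds the incomes of the adults in one household
street = [[300], [200, 400], [100, 100, 400]]
true_total = sum(sum(h) for h in street)

# Repeat the survey many times to see that the estimates centre on the true total
rng = random.Random(1)
estimates = [estimate_total_income(street, rng) for _ in range(20000)]
average_estimate = sum(estimates) / len(estimates)
```

A single survey can be well off the true total (for the third household the weighted income is either 300 or 1200), but the average over repeated surveys is close to it, which is what unbiasedness means here.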

In the above example, not everybody has the same probability of selection; what makes it a
probability sample is the fact that each person's probability is known. When every element in
the population does have the same probability of selection, this is known as an 'equal
probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting'
because all sampled units are given the same weight.

PROBABILITY SAMPLING

Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified
Sampling, Probability Proportional to Size Sampling, and Cluster or Multistage Sampling.
These various ways of probability sampling have two things in common:
• Every element has a known nonzero probability of being sampled, and
• Random selection is involved at some point.

NON PROBABILITY SAMPLING

Nonprobability sampling is any sampling method where some elements of the population
have no chance of selection (these are sometimes referred to as 'out of
coverage'/'undercovered'), or where the probability of selection can't be accurately
determined. It involves the selection of elements based on assumptions regarding the
population of interest, which forms the criteria for selection. Hence, because the selection of
elements is nonrandom, nonprobability sampling does not allow the estimation of sampling
errors.

These conditions give rise to exclusion bias, placing limits on how much information a
sample can provide about the population. Information about the relationship between sample
and population is limited, making it difficult to extrapolate from the sample to the population.

Example: We visit every household in a given street, and interview the first person to answer
the door. In any household with more than one occupant, this is a nonprobability sample,
because some people are more likely to answer the door (e.g. an unemployed person who
spends most of their time at home is more likely to answer than an employed housemate who
might be at work when the interviewer calls) and it's not practical to calculate these
probabilities.

Nonprobability Sampling includes: Quota Sampling and Purposive Sampling. In addition,
nonresponse effects may turn any probability design into a nonprobability design if the
characteristics of nonresponse are not well understood, since nonresponse effectively modifies
each element's probability of being sampled.

SAMPLING METHODS

Within any of the types of frame identified above, a variety of sampling methods can be
employed, individually or in combination. Factors commonly influencing the choice between
these designs include:
• Nature and quality of the frame
• Availability of auxiliary information about units on the frame
• Accuracy requirements, and the need to measure accuracy
• Whether detailed analysis of the sample is expected
• Cost/operational concerns

Chapter 2. SIMPLE RANDOM SAMPLING

In a simple random sample ('SRS') of a given size, all such subsets of the frame are given an
equal probability. Each element of the frame thus has an equal probability of selection: the
frame is not subdivided or partitioned. Furthermore, any given pair of elements has the same
chance of selection as any other such pair (and similarly for triples, and so on). This
minimizes bias and simplifies analysis of results. In particular, the variance between
individual results within the sample is a good indicator of variance in the overall population,
which makes it relatively easy to estimate the accuracy of results.

However, SRS can be vulnerable to sampling error because the randomness of the selection
may result in a sample that doesn't reflect the makeup of the population. For instance, a
simple random sample of ten people from a given country will on average produce five men
and five women, but any given trial is likely to overrepresent one sex and underrepresent the
other. Systematic and stratified techniques, discussed below, attempt to overcome this
problem by using information about the population to choose a more representative sample.
SRS may also be cumbersome and tedious when sampling from an unusually large target
population. In some cases, investigators are interested in research questions specific to
subgroups of the population. For example, researchers might be interested in examining
whether cognitive ability as a predictor of job performance is equally applicable across racial
groups. SRS cannot accommodate the needs of researchers in this situation because it does not
provide subsamples of the population. Stratified sampling, which is discussed below,
addresses this weakness of SRS.

II.1. Sampling distribution

The sampling distribution of a statistic is a fundamental concept of statistical inference. In this
chapter we will focus on the sample mean and its sampling distribution. We first present some
definitions of terms and important relations used in determining the sampling distribution.

• Expected Value

The expected value is the average value of a feature from all possible samples of the same
size. Mathematically, we define the expected value or average of a random variable Y as
follows:

\[ E(y) = \sum_{y} y \, p(y), \qquad \text{where } \sum_{y} p(y) = 1 \]

The expected value is the sum of products of all possible values of variable Y and their
associated probabilities p(y).

For example, consider the following table data from a census where the variable Y is the size
of a randomly selected household:

Size                  Number of households   Proportion p(y)
1 person                    17 816               0.225
2 persons                   24 734               0.313
3 persons                   13 845               0.175
4 persons                   12 470               0.157
5 persons                    5 996               0.076
6 persons                    2 499               0.032
7 persons or more            1 748               0.022
Total                       79 108               1.000

\[ E(y) = 1(0.225) + 2(0.313) + 3(0.175) + 4(0.157) + 5(0.076) + 6(0.032) + 7.7(0.022) \approx 2.75 \]

This means that the average household size is 2.75 persons. Note that the class "7 persons or
more" aggregates the data for households where Y = 7, 8, 9, ... The value 7.7 used in the
calculation is the average size of all households with 7 or more persons.
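
The calculation can be reproduced with a short Python sketch. The function name `expected_value` is chosen here for illustration; the sizes and proportions are those of the census table, with 7.7 standing for the open-ended class.

```python
# Household size distribution from the census table above.
# The open-ended class "7 persons or more" is represented by its average size, 7.7.
sizes = [1, 2, 3, 4, 5, 6, 7.7]
proportions = [0.225, 0.313, 0.175, 0.157, 0.076, 0.032, 0.022]

def expected_value(values, probabilities):
    """Return the sum of value * probability; the probabilities must sum to 1."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probabilities))

mean_household_size = expected_value(sizes, proportions)
print(round(mean_household_size, 2))  # prints 2.75
```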

• Unbiased estimator

An estimator is unbiased when its expected value is equal to the parameter that it is
intended to estimate. In other words, the bias, which is the difference between the expected
value of the estimator and the true population value (or parameter), is zero.

• Consistent estimator

An estimator is consistent if its values tend to concentrate around the true value as
the sample size increases.

Let us return to the sampling distribution. This is the probability distribution of all the
values that an estimate can take over all possible samples.

Let us use the following table, which gives the household size for each housing unit in the
population. We can construct the sampling distribution of the mean by listing all possible
random samples of size n = 2 that can be drawn from a population of N = 5 units.

We will then estimate the average household size from each of these samples.

Housing unit (Ui)    Household size (Yi)
U1                    3
U2                    5
U3                    7
U4                    9
U5                   11

The number of people in the population is:

\[ Y = \sum_{i=1}^{N} Y_i = 35 \]

The average number of persons per household (average household size) is:

\[ \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i = \frac{35}{5} = 7 \]

If one draws a sample of size 2 from this population, there are \(\binom{5}{2} = 10\) possible
samples, as follows:

3 and 5     5 and 7     7 and 9     9 and 11
3 and 7     5 and 9     7 and 11
3 and 9     5 and 11
3 and 11

The averages of these samples are respectively 4, 5, 6, 7, 6, 7, 8, 8, 9 and 10. If the
sampling is random, so that each sample has a probability of 1/10, we obtain all possible
samples of size 2 (housing units) from a population of 5 housing units, as shown in the
following table.

Sample (n = 2)    Sample mean    Probability p(y)
3; 5               4             1/10
3; 7               5             1/10
3; 9               6             1/10
3; 11              7             1/10
5; 7               6             1/10
5; 9               7             1/10
5; 11              8             1/10
7; 9               8             1/10
7; 11              9             1/10
9; 11             10             1/10

The table below shows the sampling distribution of the mean:

Sample mean    Probability p(y)
 4             1/10
 5             1/10
 6             2/10
 7             2/10
 8             2/10
 9             1/10
10             1/10

The expected value of this distribution of the sample mean is:

\[ E(\bar{y}) = 4 \cdot \tfrac{1}{10} + 5 \cdot \tfrac{1}{10} + 6 \cdot \tfrac{2}{10} + 7 \cdot \tfrac{2}{10} + 8 \cdot \tfrac{2}{10} + 9 \cdot \tfrac{1}{10} + 10 \cdot \tfrac{1}{10} = 7 \]

From the table of samples of size n = 2, one can also calculate the expected value of
\(\bar{y}\) directly:

\[ E(\bar{y}) = \frac{1}{10} \sum_{i=1}^{10} \bar{y}_i = \frac{70}{10} = 7 = \bar{Y} \]

Thus, in simple random sampling, the sample average is an unbiased estimate of the
population mean.

It should be noted that we would get the same result for samples of any size.
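
The enumeration above is small enough to check by machine. The following sketch, which uses only the Python standard library and variable names chosen for illustration, lists every sample of size 2 from the five housing units, builds the sampling distribution of the mean with exact fractions, and compares its expected value with the population mean.

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

household_sizes = [3, 5, 7, 9, 11]   # units U1..U5 from the table above
n = 2

samples = list(combinations(household_sizes, n))        # all possible samples of size n
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: under SRS every sample has the same probability
prob = Fraction(1, len(samples))
distribution = {m: c * prob for m, c in sorted(Counter(sample_means).items())}

expected_mean = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(household_sizes), len(household_sizes))
```

Changing `n` to 3 or 4 and rerunning gives the same expected value, in line with the remark that the result holds for samples of any size.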

• Standard deviation

It is a measure of variability in the population that can be estimated from the observations
of a single sample, and from which it is possible to estimate the expected error of the
sample average.

The variability in the population is measured by the standard deviation (σ), whose square is
the variance (σ²). The population variance is defined as the average of the squared deviations
of all individual observations from their average value, that is:

Population variance with bias (divisor N):

\[ \sigma^2 = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_N - \bar{Y})^2}{N} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \]

Population variance unbiased (corrected, divisor N − 1):

\[ S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \]

Variance in sample:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

Note that \(s^2\) is an unbiased estimator of \(S^2\) (it is not an unbiased estimator of \(\sigma^2\)).

Note also that

\[ \sigma^2 = \frac{N-1}{N} S^2 \quad \text{and} \quad S^2 = \frac{N}{N-1} \sigma^2 \]
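
As a worked check of these definitions, they can be applied to the five housing units used earlier (household sizes 3, 5, 7, 9 and 11, with mean 7). The numbers below follow directly from that table:

```latex
\sum_{i=1}^{5} (Y_i - \bar{Y})^2 = (-4)^2 + (-2)^2 + 0^2 + 2^2 + 4^2 = 40
\qquad
\sigma^2 = \frac{40}{5} = 8
\qquad
S^2 = \frac{40}{4} = 10
\qquad
\frac{N-1}{N} S^2 = \frac{4}{5} \times 10 = 8 = \sigma^2
```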
• Standard error of the sample average

The variance of the sample average is the average of the squared deviations of the sample
average from the true value over all possible samples of size n. The true variance is denoted
\(S^2(\bar{y})\) and is calculated as follows:

\[ S^2(\bar{y}) = \frac{S^2}{n} \left( \frac{N-n}{N} \right) \]

The square root of the variance of \(\bar{y}\) is the standard error of averages from samples
of size n. Its formula is:

\[ S(\bar{y}) = \frac{S}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \]

It is important to note that the error varies with the size of the sample: the error decreases
as the sample size increases.

In the formula for the variance of \(\bar{y}\), the factor \(\left( \frac{N-n}{N} \right)\) is
known as the finite population correction (fpc) factor.

If \(n \le 0.05\,N\), the factor \(\left( \frac{N-n}{N} \right)\) can be ignored because its
value is almost equal to 1.
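
The variance formula with the finite population correction can be checked against the housing-unit example, where all ten samples of size 2 are known. This is a small sketch with variable names chosen for illustration; it computes the variance of the sample mean once from the formula and once directly from the enumerated samples.

```python
from itertools import combinations
from math import sqrt

household_sizes = [3, 5, 7, 9, 11]
N, n = len(household_sizes), 2
pop_mean = sum(household_sizes) / N

# Corrected population variance S^2 (divisor N - 1)
S2 = sum((y - pop_mean) ** 2 for y in household_sizes) / (N - 1)

# Variance of the sample mean from the formula, including the fpc (N - n) / N
var_formula = (S2 / n) * (N - n) / N
std_error = sqrt(var_formula)

# The same variance obtained directly from all possible samples of size n
means = [sum(s) / n for s in combinations(household_sizes, n)]
var_enumerated = sum((m - pop_mean) ** 2 for m in means) / len(means)
```

Both routes give a variance of 3 for this population, so the standard error of the mean is the square root of 3. Here n/N = 0.4, far above 0.05, so the correction factor cannot be ignored.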

Illustration

• Interval estimation (confidence interval)

We know that the probability that an estimate is exactly equal to the true value (parameter)
is zero for continuous variables. So it is more useful to state how likely it is that an
interval based on our estimate contains the value of the parameter that we wish to estimate.

An interval estimator is a formula that uses the observations of a sample to calculate two
numbers that define an interval containing the parameter with some probability. The interval
obtained is called a confidence interval and the probability that it contains the parameter
value is called the confidence coefficient. When a confidence interval has a confidence
coefficient of 0.95, it is said to have a confidence level of 95%.

In general, the confidence interval for a parameter is given by \([\bar{y} \pm t\,S(\bar{y})]\).
The symbol t is the value of the standard normal variable that corresponds to the desired
confidence probability.

In practice, \(S^2\) is not known, and the sample variance \(s^2\) is used as an estimate of
its value.

Thus, the confidence interval is \([\bar{y} \pm t\,s(\bar{y})]\).

It should be noted that if the sample size n is large, s provides a good estimate of S, while
for small samples the estimate is not very good.

For the parameter \(\bar{Y}\), the confidence interval is:

\[ \bar{y} \pm t\,s(\bar{y}) = \bar{y} \pm t\,\frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \qquad (\text{ignore the fpc if } n \le 0.05\,N) \]

The value of t depends on the level of confidence. For large samples, the values used are:

t = 1.28 for a confidence level of 80%
t = 1.64 for a confidence level of 90%
t = 1.96 for a confidence level of 95%
t = 2.58 for a confidence level of 99%

An estimation by confidence interval has two parts: a point estimate and a ± value which
describes the precision of the estimate. We call the ± value the margin of error.

If the sample size is below 30, the values of t must be obtained from the table of the
t distribution with (n − 1) degrees of freedom.
Example

A simple random sample of 100 households was selected from a village. In the sample, the
average monthly expenditure for electricity is $75 with a standard deviation of $15. Find the
95% confidence interval for the average expenditure of the entire population.

\[ 75 \pm 1.96 \cdot \frac{15}{\sqrt{100}} = 75 \pm 2.94 \]

The confidence interval is [72.06, 77.94].
We may be 95% confident that the average monthly expenditure of the population is between
72.06 and 77.94.
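
The same interval can be computed with a small helper. This is a sketch under the assumptions of the example: the function name and its parameters are chosen for illustration, t defaults to the large-sample 95% value, and the fpc is applied only when a population size N is supplied (the example gives none).

```python
from math import sqrt

def mean_confidence_interval(y_bar, s, n, t=1.96, N=None):
    """Return (lower, upper) for the population mean under simple random sampling.

    N is the population size; when it is None the fpc is ignored
    (reasonable when n <= 0.05 * N).
    """
    std_error = s / sqrt(n)
    if N is not None:
        std_error *= sqrt((N - n) / N)
    margin = t * std_error
    return y_bar - margin, y_bar + margin

lower, upper = mean_confidence_interval(y_bar=75, s=15, n=100, t=1.96)
print(round(lower, 2), round(upper, 2))  # prints 72.06 77.94
```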

Note

The difference between \(S^2\) and \(\sigma^2\) is of no importance for large populations,
where \(\sigma^2\) is most often used.
Note also that in simple random sampling, \(s^2\) is an unbiased estimator of \(S^2\).

\[ \sigma^2 = \frac{N-1}{N} S^2 \quad \text{and} \quad S^2 = \frac{N}{N-1} \sigma^2 \]
• Population-value: estimates and precision measures

The estimate of the population total (Y), denoted by \(\hat{Y}\), is given by the following
expression:

\[ \hat{Y} = N \bar{y} = \frac{N}{n} \sum_{i=1}^{n} y_i \]

The standard error of \(\hat{Y}\) is given by \(N\,S(\bar{y})\).
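
A minimal sketch of this estimate in Python follows. The function name is hypothetical, and one assumption is made beyond the formula above: since S is unknown in practice, the sketch replaces it with the sample standard deviation s, as the module does for confidence intervals. The sample used is one possible draw (units of size 3 and 9) from the five housing units.

```python
from math import sqrt

def estimate_total(sample, N):
    """Return (Y_hat, estimated standard error of Y_hat) for a simple random sample."""
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)   # sample variance s^2
    se_mean = sqrt((s2 / n) * (N - n) / N)                  # s(y_bar), with the fpc
    return N * y_bar, N * se_mean

# Hypothetical draw: the units of size 3 and 9 out of the N = 5 housing units
y_hat, se_y_hat = estimate_total([3, 9], N=5)
```

For this draw the estimated total is 30 against a true total of 35; the difference is an instance of sampling error, whose typical size the standard error is meant to describe.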

Illustration

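• Relative error

We can express the error as a proportion (or percentage) of the value of the estimate. This
proportion is called the relative standard error or coefficient of variation (CV).
The CV allows two distributions of values to be compared.
For the estimate of the total, the true coefficient of variation is

\[ CV(\hat{Y}) = \frac{S(\hat{Y})}{\hat{Y}} = \frac{N \dfrac{S}{\sqrt{n}} \sqrt{\dfrac{N-n}{N}}}{N \bar{y}} = \frac{1}{\sqrt{n}} \, \frac{S}{\bar{y}} \sqrt{\frac{N-n}{N}} = \frac{CV}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \]

where \(CV = S / \bar{y}\), and with \(\sqrt{\frac{N-n}{N}} \approx 1\),

\[ CV(\hat{Y}) = \frac{CV}{\sqrt{n}} \]

Similarly, the true coefficient of variation for the sample average is given as follows:

\[ CV(\bar{y}) = \frac{S(\bar{y})}{\bar{y}} = \frac{\dfrac{S}{\sqrt{n}} \sqrt{\dfrac{N-n}{N}}}{\bar{y}} = \frac{CV}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \]

and with \(\sqrt{\frac{N-n}{N}} \approx 1\),

\[ CV(\bar{y}) = \frac{CV}{\sqrt{n}} \]

We observe that \(CV(\hat{Y}) = CV(\bar{y})\).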

It may be noted that the standard error of the estimated total is N times that of the average,
while the coefficients of variation are the same for both. This result is not surprising:
because the estimate of the total is obtained by multiplying the sample mean (an estimate) by
the number of elements of the population (a known number), the only source of error is the
sample average.

So, when the error of the estimated total is expressed as a proportion or percentage, it is
equal to that of the average. However, when the error is expressed in absolute terms, it is
equal to N times that of the average, because N acts as a multiplication factor.
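
This relationship can be verified numerically on the five housing units. The sketch below uses the true population values (the population mean and total in the denominators), which is an assumption made so that the "true" coefficients of variation can be computed exactly; the variable names are chosen for illustration.

```python
from math import sqrt

household_sizes = [3, 5, 7, 9, 11]
N, n = len(household_sizes), 2
Y_bar = sum(household_sizes) / N      # population mean
Y_total = sum(household_sizes)        # population total
S = sqrt(sum((y - Y_bar) ** 2 for y in household_sizes) / (N - 1))

se_mean = (S / sqrt(n)) * sqrt((N - n) / N)   # standard error of the sample mean
se_total = N * se_mean                         # standard error of the estimated total

cv_mean = se_mean / Y_bar
cv_total = se_total / Y_total
```

The absolute errors differ by the factor N, while the two relative errors coincide because N cancels between numerator and denominator.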
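The variance of the variable Yi is given by:

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \]

and the relative variance (the square of the coefficient of variation) of the distribution is
given by:

\[ (CV)^2 = \frac{\dfrac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2}{\bar{Y}^2} = \frac{\sigma^2}{\bar{Y}^2} \]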
Illustration

• Sampling for proportion

In many situations, we use the sample proportion p to perform statistical inference on the
proportion of the population P.
In statistical analysis, the proportions appear in two different ways:
a) We can look at proportions rather than totals or averages. For example, the proportion
of the population that is unemployed, the percentage of households whose income falls below
the poverty line, or the proportion of business firms interested in a product.

b) It may be desirable to classify a population into groups and determine the percentage of the
population falling into each group. For example, the distribution of the population by
five-year age group, or the classification of firms by revenue.

• Notations and formulas

Assuming a total population and a sample and considering a particular class of units, we can
use the following notations:

A = Number of units of the class of the population

a = Number of this class in the sample

P = Proportion of units of this class in the population

p = Proportion of units of this class in the sample

Q = population proportion outside of class (Q = 1-P)

q = sample proportion out of the class (q = 1-p)

We have P = A / N where N is the size of the population;

p = a / n where n is the sample size

We can apply all the formulas discussed previously to this particular case by considering each
unit in the population as having a characteristic that may take two values: either the value 0
(if the unit is not in the class of interest) or the value 1 (if the unit belongs to it). If we
add up the values over the population, we obtain the total A.

In other words, A can be considered the equivalent of \(Y = \sum_{i=1}^{N} Y_i\), which we
have already discussed.

Similarly, P = A / N can be considered as the equivalent of \(\bar{Y} = Y / N\). Thus, one can
use the previous formulas.

To estimate proportions in the case of simple random sampling, we use the following
formulas:

\[ \hat{P} = p = \frac{a}{n} \quad \text{and} \quad S^2 = \frac{NPQ}{N-1} \]

The population variance is \(\sigma^2 = PQ\). This is the variance of the distribution obtained
by assigning the value 1 or 0 to each element according to whether or not it is in the class of
interest.

The true variance of a proportion from a sample of size n is given by:

\[ \sigma^2(\hat{P}) = \frac{N-n}{N-1} \cdot \frac{PQ}{n} \]

The estimate of this variance derived from a single sample of size n is given by:

\[ s^2(\hat{P}) = \frac{pq}{n-1} \left( \frac{N-n}{N} \right) \]

In this case, the relative standard error (coefficient of variation) of a proportion and the
standard error of an estimated population total are given by the following formulas:

\[ CV(\hat{P}) = \sqrt{\frac{Q}{Pn} \left( \frac{N-n}{N-1} \right)} \qquad cv(p) = \sqrt{\frac{q}{p(n-1)} \left( \frac{N-n}{N} \right)} \]

and

\[ S(\hat{A}) = N \sqrt{\frac{PQ}{n} \left( \frac{N-n}{N-1} \right)} \qquad s(\hat{A}) = N \sqrt{\frac{pq}{n-1} \left( \frac{N-n}{N} \right)} \]

Note that the relative error of the estimated total is equal to that of the proportion.
The confidence interval for the proportion, assuming that the sample proportion follows a
normal distribution, is equal to:

\[ p \pm t \sqrt{\frac{pq}{n} \left( \frac{N-n}{N} \right)} \]

where the value of t depends on the desired confidence level.
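
The estimation of a proportion can be sketched as follows. The survey figures (60 units in the class out of a sample of 200, drawn from 10 000) are invented for illustration, and the function name is hypothetical. The standard error uses the divisor n − 1 from the variance estimate, while the interval uses the divisor n as in the confidence interval formula above.

```python
from math import sqrt

def proportion_estimate(a, n, N, t=1.96):
    """Estimate a proportion from a simple random sample of n units out of N.

    a is the number of sampled units in the class of interest.
    Returns (p, standard error, (lower, upper)).
    """
    p = a / n
    q = 1 - p
    fpc = (N - n) / N
    std_error = sqrt(p * q / (n - 1) * fpc)      # square root of s^2(p)
    margin = t * sqrt(p * q / n * fpc)           # interval as written above (divisor n)
    return p, std_error, (p - margin, p + margin)

# Hypothetical survey: 60 of 200 sampled households fall in the class, N = 10 000
p, se, (lower, upper) = proportion_estimate(a=60, n=200, N=10_000)
```

Multiplying `p` and `se` by N gives the estimated class total and its standard error, mirroring the relation between the mean and the total seen earlier.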
Illustration

II.2. Determining the sample size

One of the first questions put to the statistician when designing a survey concerns the sample
size necessary for estimating the population parameter with a desired accuracy. The decision on
sample size is very important because too large a sample leads to a waste of resources, while
too small a sample reduces the reliability of the results.

It is important first to recall the two types of estimation error:

E = Absolute error of the estimate. E is measured in the same unit as the variable.

For example, E = 10 000 RWF, E = 15 individuals, E = 5 acres.

RE = Relative error of the estimate. RE is expressed as a proportion (or percentage) of the
true value of the parameter that is estimated. For example, if E = 5 hectares and the true
parameter value is 100 hectares, then RE = 5 / 100 = 0.05 or 5%.

Note:

The confidence level indicates the probability of obtaining the expected degree of accuracy
with the calculated sample size n. For example, a confidence level of 95% means that there is
a 95% probability that the expected accuracy will be reached with the calculated value of n.
In other words, the acceptable risk that the true parameter value lies outside the specified
confidence limits is 5%.

a) Required sample size for interval estimation of the population mean

In the case of large samples with known σ, the following result holds for the sampling error when the sample mean is used to estimate the population mean:

• There is a probability of (1−α) that the sample mean produces a sampling error less than or equal to $z_{\alpha/2}\,\sigma_{\bar{x}}$.

• Since $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, this result can be rephrased as follows: there is a probability of (1−α) that the sample mean produces a sampling error less than or equal to $z_{\alpha/2}(\sigma/\sqrt{n})$.

This quantity zα/2 (σ/√n) is the margin of error. The margin of error is thus determined by the value of zα/2, by σ and by the sample size n.

Once the confidence coefficient 1−α is selected, the value of zα/2 can be obtained from statistical tables and, σ being known, one can determine the sample size n required to obtain the predefined margin of error E.

The larger the sample, the smaller the margin of error and the better the accuracy of the estimate.

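Given E, the desired margin of error,

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}, \quad\text{thus}\quad n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$$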

In this formula, n is the minimum size that satisfies the condition imposed on the margin of error, E is the margin of error that the user is willing to accept for a given confidence level, and zα/2 corresponds to the confidence level used for the interval estimation. Although the user has a choice, the 95% confidence level is the most widely used, with the corresponding value zα/2 equal to 1.96.

Applying the formula above assumes that the population standard deviation σ is known. Where σ is not known, the following three methods provide an initial value of σ:
1) Use the sample standard deviation obtained from a previous sample with similar characteristics.
2) Use a pilot study to select a preliminary sample and use the standard deviation of this sample as the initial value of σ.
3) Use judgment to evaluate σ by estimating the range of the population (maximum value minus minimum value) and dividing it by four. The range divided by four is often considered a good estimate of the standard deviation σ.
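
As a rough sketch of how the formula n = z²σ²/E² might be applied, the Python snippet below computes the minimum sample size and the range-based initial value of σ. The function names and the numbers in the usage lines are illustrative assumptions, not values from the text; the result is rounded up because n must be a whole number of units.

```python
import math

def sample_size_for_mean(sigma, E, z=1.96):
    """Minimum n such that z * sigma / sqrt(n) <= E (z = 1.96 assumes a 95% level)."""
    return math.ceil((z * sigma / E) ** 2)

def sigma_from_range(minimum, maximum):
    """Initial value of sigma taken as the range divided by four (method 3 above)."""
    return (maximum - minimum) / 4

# Hypothetical example: values thought to lie between 10 and 90, margin of error 2
sigma0 = sigma_from_range(10, 90)
print(sigma0, sample_size_for_mean(sigma0, E=2))
```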

Illustration

b) Sample size required for interval estimation of a population proportion with an absolute error E

The margin of error associated with the estimate of the population proportion is $z_{\alpha/2}\,\sigma_{\bar{p}}$, with

$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$$

It is based on the value of zα/2, the proportion p and the sample size n.

Given E, the desired margin of error,

$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \quad\text{so}\quad n = \frac{(z_{\alpha/2})^{2}\,p(1-p)}{E^{2}}$$
In this formula, the margin of error E must be specified by the user; in most cases it is less than or equal to 0.10. The user also specifies the confidence level and therefore the corresponding value zα/2. Finally, applying the formula to determine n requires an initial value of the population proportion p. It can be obtained in one of the following four ways:

1. Use the sample proportion obtained from a previous sample with similar characteristics.

2. Use a pilot study to select a preliminary sample and use the proportion of that sample as the initial value of p.

3. Use judgment to assess the value of p.

4. If no information is available, use p = 0.50. Indeed, the sample size is proportional to the quantity p(1−p), and this quantity reaches its greatest value when p = 0.50, which ensures that every estimated proportion satisfies the condition imposed on the margin of error.
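
The sample size formula for a proportion can be sketched as follows. The function name and the default arguments (p = 0.50 as the conservative choice, z = 1.96 for a 95% level) are assumptions made for the example; the result is rounded up to the next whole unit.

```python
import math

def sample_size_for_proportion(E, p=0.5, z=1.96):
    """Minimum n such that z * sqrt(p(1-p)/n) <= E."""
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

# With no prior information, p = 0.50 gives the largest (safest) sample size
print(sample_size_for_proportion(E=0.05))          # conservative choice
print(sample_size_for_proportion(E=0.05, p=0.3))   # with an assumed initial value of p
```

Any initial value of p other than 0.50 gives a smaller n for the same E, which is why 0.50 is used when nothing is known about p.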

Illustration

II.3. Selecting a sample

1. Sampling frame

For determining a sample, we need a sampling frame. The sampling frame is a list of all
individuals in the population, without omission or duplication, so that the probability of
selection of each individual is known in advance. Before drawing a sample, we need to make
sure that the frame is the best possible.

2. Selecting a simple random sample from a population

a) Using a table of random numbers

The practical procedure for selecting a simple random sample is to choose the units one at a time, so that at each draw all units remaining in the population have the same probability of being selected. This selection is done using a random number table, which presents sets of random digits arranged in groups by row and by column.

To select a set of random numbers, one can start from any point in the table. After choosing the first digit, one can proceed in any direction: down or up the column, or to the right or left along the row.
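
Where a printed table is not at hand, a pseudo-random number generator can play the same role. The sketch below is an assumption-laden illustration, not the procedure described in the text: it swaps the table for Python's `random` module, numbers the frame from 1 to N, and draws n distinct serial numbers without replacement. The function name and the `seed` argument are hypothetical.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct serial numbers from a frame numbered 1..N.

    A pseudo-random generator stands in for the random number table;
    passing a seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

print(simple_random_sample(N=100, n=10, seed=1))
```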

Illustration

b) Algorithm of systematic sampling for equal probability

Selecting a simple random sample can be tedious when the number of units to choose is large. To get a 5% sample of 10 000 individuals (N), it would be necessary to choose 500 random numbers (n) from a table. In practice, we use a different method: compute a sampling interval I = N / n (20 in this case), choose a random number between 1 and 20 to serve as a random starting point, and then select every 20th individual in the frame from that starting point onward.

If the elements of the population are listed in an almost random order, i.e. with little correlation between successive elements, the results of systematic sampling will agree with those of simple random sampling. Experience shows that, generally, the two methods give results with approximately the same precision. The systematic sample often has a slightly lower sampling error, since it ensures that the sample is dispersed across the population. Simple random sampling formulas can be used to evaluate the reliability of estimates from a systematic sample.

General procedure for selecting a systematic sample

The procedure of drawing a systematic sample includes the following steps:

1. Assign serial numbers 1 through N to the population units (N is the total number of
individuals in the population).

2. Calculate I = N / n, the sampling interval (n being the size of the sample)

3. Choose a random number between 0 and I from the table of random numbers. This number is called the random starting point, denoted R. If using a calculator that gives random numbers between zero and one, multiply the random number by I to obtain a random number between zero and I. Do not round off.

4. Begin the series of cumulative numbers with R. Add I to this figure to obtain the second number in the series, then add I to the second number to get the third, and so on.

5. Stop when the last cumulative number exceeds N. This must happen after accumulating n numbers.

6. Go back over all the cumulative numbers obtained and round each one up to the next integer.

7. On the list of population units, circle the item numbers that correspond to the integers obtained. These are the units that constitute the survey sample.

Generalization:

The jth sample unit (Sj) in the population can be selected as follows:

$$S_j = R + (j-1)\,I, \quad\text{with } j = 1, 2, \ldots, n$$

where: Sj = serial number of the jth selected individual in the population
R = random starting point
I = sampling interval
n = sample size
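
The steps and the general formula Sj = R + (j − 1)I can be sketched in Python as below. This is a minimal illustration under assumptions: the function name is hypothetical, the random start is drawn from a pseudo-random generator rather than a table, and it is taken in the interval (0, I] so that rounding up never yields unit 0.

```python
import math
import random

def systematic_sample(N, n, R=None, seed=None):
    """Return the n serial numbers S_j = ceil(R + (j - 1) * I), j = 1..n."""
    I = N / n                              # sampling interval, not rounded
    if R is None:
        rng = random.Random(seed)
        R = I * (1 - rng.random())         # random start in (0, I]
    # Cumulate R, R + I, R + 2I, ... and round each value up (step 6);
    # min() guards against floating-point overshoot past N.
    return [min(N, math.ceil(R + (j - 1) * I)) for j in range(1, n + 1)]

# The example from the text: a 5% sample of 10 000 individuals, so I = 20.
# The starting point 7.3 is a made-up value for the demonstration.
sample = systematic_sample(N=10000, n=500, R=7.3)
print(sample[:3], len(sample))
```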

Illustration (Karongi case study)

CHAP. 3. STRATIFIED SAMPLING

1. Principle

Where the population comprises a number of distinct categories, the frame can be organized by these categories into separate "strata". Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.

Stratified random sampling is a method in which the elements of the population are divided into homogeneous groups, and a simple random sample is then selected within each group. This grouping process is called stratification and the groups are called strata. Strata may reflect a country's regions, densely or sparsely populated areas, various ethnic groups, religious groups or various other groups.

In stratification, we group similar elements together so that the variance within each stratum is low; at the same time, it is important that the means of the different strata differ as much as possible.

The initial population of size N is divided into L groups G1, G2, ..., Gh, ..., GL of respective sizes N1, N2, ..., Nh, ..., NL such that:

$$N = \sum_{h=1}^{L} N_h$$

Note that the letter h is used to identify the strata so that, if L strata are determined, h ranges
from 1 to L.

In stratified sampling, the probabilities of selection can be either equal in all strata or different. Within a particular stratum, however, all items have the same probability of being selected. The drawing of sampling units, the counting of selected units, the distribution and supervision of fieldwork and, in general, the entire administration of the survey are greatly simplified. However, the procedure requires prior knowledge of the sizes of the strata, that is, the number of sampling units in each stratum, and the existence of a sampling frame that can be used to draw the sample within each stratum.

2. Notation

We use the same notation as for simple random sampling, except that a subscript is added to indicate a particular stratum when referring to information relating to that stratum. The subscript h denotes the stratum and the subscript i the unit within the stratum. As in simple random sampling, capital letters refer to population values, while lowercase letters refer to sample values.

Illustration

3. Estimation

• Mean estimation

The population mean can be expressed in terms of the stratum totals as follows:

$$\bar{Y} = \frac{1}{N}\sum_{h=1}^{L} Y_h = \frac{Y}{N}, \quad\text{where the population total is}\quad Y = \sum_{h=1}^{L} Y_h$$

Since each $Y_h$ can be expressed as $N_h\bar{Y}_h$, we can write

$$\bar{Y} = \frac{1}{N}\sum_{h=1}^{L} N_h\bar{Y}_h, \quad\text{where}\quad \bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}$$
Within each stratum, we use simple random sampling. As we know, for simple random sampling, $\bar{y}$ is an estimator of $\bar{Y}$. This suggests that for stratified sampling, one can obtain an estimate of the population mean by replacing the mean of each stratum by the corresponding sample estimate. That is to say, the mean of the sample elements from the first stratum provides an unbiased estimate of the true mean of the first stratum, the mean of the sample elements from the second stratum provides an unbiased estimate of the mean of that stratum, and so on.

The estimate of the population mean from a stratified sample is denoted $\bar{y}_{st}$ (st stands for stratified).

Mathematically, this estimate is given by the following formula:

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{L} N_h\bar{y}_h$$
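
The stratified estimator of the mean weights each stratum sample mean by the stratum size. The sketch below illustrates this in Python; the function name, the data layout (a list of pairs of stratum size and sample values) and the figures are all made up for the example.

```python
def stratified_mean(strata):
    """Estimate the population mean from a stratified sample.

    strata: list of (N_h, sample_values) pairs, one per stratum,
    where N_h is the known stratum size.
    """
    N = sum(N_h for N_h, _ in strata)
    weighted_sum = sum(N_h * (sum(ys) / len(ys)) for N_h, ys in strata)
    return weighted_sum / N

# Two hypothetical strata: sizes 600 and 400, with small samples from each
strata = [(600, [10, 12, 14]), (400, [20, 22])]
print(stratified_mean(strata))
```

Here the stratum sample means are 12 and 21, so the estimate is (600 × 12 + 400 × 21) / 1000 = 15.6, which differs from the unweighted mean of the five sampled values.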

Illustration

• Estimation of population total

As in simple random sampling, we obtain the estimate of the population total by multiplying the estimate of the mean by the number of elements in the population:

$$\hat{Y}_{st} = N\bar{y}_{st} = \sum_{h=1}^{L} N_h\bar{y}_h = \sum_{h=1}^{L}\frac{N_h}{n_h}\,y_h$$

where $y_h$ denotes the sample total in stratum h.

• Proportion estimate

The procedure for estimating the population proportion is similar to that for the mean, except that the possible values of $Y_{hi}$ are 0 and 1. In this case,

$$\bar{Y}_h = P_h = \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}, \quad\text{where } Y_{hi} = 0 \text{ or } 1$$

For stratified random sampling, the proportion $P_{st}$ is

$$P_{st} = \frac{1}{N}\sum_{h=1}^{L} N_h P_h, \quad\text{where}\quad P_h = \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}$$

CHAPTER IV. CLUSTER SAMPLING

This sampling method consists of first dividing the population into separate groups of elements, called clusters, so that each element of the population belongs to one and only one cluster. One of the main applications of cluster sampling is the sampling of geographical areas, where the clusters are, for example, the neighbourhoods of a city or other well-defined areas. In Rwanda, the clusters are typically the Villages/Imidugudu or the Cells.

After this grouping, a simple random sample of clusters is selected. Cluster sampling tends to give better results when the elements within the clusters are heterogeneous; in the ideal case, each cluster is a small-scale representation of the entire population. The value of cluster sampling depends on how well each cluster represents the entire population. If all clusters are similar in this sense, sampling a small number of clusters will provide good estimates of the population parameters.

Once the sample clusters have been selected, they in turn determine the units to be included. This can be done in one of two ways:

a) the sample may include all the units within the selected clusters. This procedure is generally called "one-stage cluster sampling".

b) a subsample of units may be drawn from the sampled clusters. This is called "multi-stage cluster sampling" or simply "multi-stage sampling".

There are two reasons for using cluster sampling. Frequently, there is no adequate sampling frame from which to draw the elements of the population, and the cost of building such a frame would be very high.

On the other hand, such a frame may exist, but the savings in fieldwork costs obtained with cluster sampling can make this method a more efficient alternative than a simple random sample.

In most practical cases, a sample of a given number of randomly chosen units will have a lower variance than a sample of the same size chosen in clusters. However, when cost must be weighed against reliability, the cluster sample may be more efficient.

Although the units of analysis are not selected directly, the probability of selection of a cluster and of each unit within it is known in advance; cluster sampling therefore satisfies the criterion of probability sampling.

IV.1. Area sampling

Area sampling is a multi-stage sampling in which the penultimate stage relies on geographical areas serving as clusters. It is very useful when one or both of the following conditions hold:

a) complete inventories of housing units or other observation units are not available, but maps of the geographical entities containing a sufficient amount of information are. The list of these maps then constitutes a sampling frame for the area considered.

b) travel costs are high, that is, there is a high cost associated with interviewers travelling from one randomly chosen sample unit to another.

To illustrate the drawing of an area sample, we use city blocks (in rural areas, administrative segments with identifiable boundaries can be used). One may proceed as follows:
1) obtain a reasonably accurate map of the city that gives as much detail as possible on the blocks. If the map is not recent, arrangements must be made to update it using local information;

2) number all the blocks consecutively, writing the numbers directly on the map; a serpentine numbering system is recommended so that no block is omitted;

3) select a simple random or systematic sample of blocks;

4) interview all households in the sample blocks, or select some households within the sample blocks, either with a table of random numbers or by systematic sampling using the numbers assigned to the housing units.

IV.2. Estimation

There is a close analogy between cluster sampling and stratified sampling. In both cases, the individual elements are grouped together before the sample is selected. The difference is that in stratified sampling a sample must be drawn within every stratum, whereas in cluster sampling a sample of clusters is drawn before the individual elements are selected within the sample clusters. Note, however, that the two types of sampling are often used in combination.

• Notation

Consider a two-stage sampling design in which Secondary Sampling Units (SSUs) are selected at random within selected clusters (Primary Sampling Units or PSUs). We use the following notation:

N = number of PSUs in the population.

n = number of PSUs selected in the sample.

$M = \sum_{i=1}^{N} M_i$ = number of elementary units in the population, and $M_i$ = number of SSUs in the ith PSU (cluster), where i = 1, …, N.

$\bar{M} = \frac{1}{N}\sum_{i=1}^{N} M_i$ = average number of SSUs per PSU in the population, or average cluster size.

$m = \sum_{i=1}^{n} m_i$ = number of second-stage units (SSUs) in the sample, and $m_i$ = number of SSUs selected for the sample in the ith PSU, with i = 1, …, n.

$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average number of SSUs per sample PSU.
$Y_{ij}$ = value of a characteristic for the jth SSU (secondary sampling unit) in the ith PSU (primary sampling unit) of the population.

$Y_i = \sum_{j=1}^{M_i} Y_{ij}$ = total value of the characteristic Y in the ith PSU of the population.

$Y = \sum_{i=1}^{N} Y_i$ = total value of the characteristic Y in the population.

$\bar{Y}_i = \frac{1}{M_i}\sum_{j=1}^{M_i} Y_{ij}$ = mean value of the characteristic in the ith PSU.

$\bar{Y}_c = \frac{1}{N}\sum_{i=1}^{N} Y_i$ = mean value of the characteristic per PSU.

$\mu = \bar{Y} = \frac{Y}{M}$ = population mean.

$y_i = \sum_{j=1}^{m_i} y_{ij}$ = total value of the characteristic Y in the ith sample PSU.

$y_{ij}$ = value of the characteristic for the jth sample SSU in the ith sample PSU.

$\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ = sample mean of the characteristic in the ith sample PSU.

• Estimation of means and totals

The formulas presented in the previous chapter (Stratification) for estimating population means are valid when the sampling unit is identical to the unit of analysis. A particular feature of cluster sampling, however, is that the (first-stage) sampling unit is not the unit of analysis.

Consider a two-stage sampling design in which the second-stage units are the units of analysis: n clusters are selected from N by simple random sampling, then $m_i$ units are selected in the ith PSU (primary sampling unit) by simple random sampling, with i = 1, …, n.

Within the ith cluster, the population mean is given by:

$$\bar{Y}_i = \frac{1}{M_i}\sum_{j=1}^{M_i} Y_{ij}$$

Since the units within the cluster are selected by simple random sampling, this mean can be estimated without bias using the following formula:

$$\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$$

These cluster-level mean estimates for the n sample clusters must be combined to estimate the overall population total (Y) and the population mean

$$\mu = \bar{Y} = \frac{Y}{M}$$
For the population total of the characteristic Y, we first construct an unbiased estimator of $Y_i$, the total of the characteristic Y in the ith PSU (primary sampling unit), which is given by:

$$\hat{Y}_i = \frac{M_i}{m_i}\,y_i$$

Thus, an unbiased estimator of the population total is:

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\,y_i = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}$$

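In the same way, we can estimate the population mean

$$\mu = \bar{Y} = \frac{Y}{M} = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} M_i}$$

An estimator of $\bar{Y}$ is:

$$\hat{\mu} = \bar{y} = \frac{\dfrac{N}{n}\sum_{i=1}^{n}\dfrac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}}{\hat{M}}, \quad\text{where}\quad \hat{M} = \frac{N}{n}\sum_{i=1}^{n} M_i$$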

Note that for estimating a proportion the formula is the same, except that only the $y_{ij} = 1$ are counted, that is, the units possessing the attribute.
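
The two-stage estimators of the total and of the mean can be sketched in Python as follows. The function name, the data layout (one pair of cluster size and sampled values per selected cluster) and the figures are illustrative assumptions; the calculation itself follows the formulas above, with N/n times the sum of the cluster sizes used as the estimated population size.

```python
def two_stage_estimates(N, clusters):
    """Estimate the population total and mean from a two-stage cluster sample.

    N: number of clusters (first-stage units) in the population.
    clusters: list of (M_i, sample_values) for the n selected clusters,
              where M_i is the number of second-stage units in cluster i.
    """
    n = len(clusters)
    # Each cluster total is estimated by M_i times the cluster sample mean
    total_hat = N / n * sum(M_i * (sum(ys) / len(ys)) for M_i, ys in clusters)
    M_hat = N / n * sum(M_i for M_i, _ in clusters)   # estimated population size
    return total_hat, total_hat / M_hat

# Hypothetical data: 2 clusters selected out of 10
total_hat, mean_hat = two_stage_estimates(10, [(50, [2, 4]), (30, [1, 1, 4])])
print(total_hat, mean_hat)
```

For a proportion, the same function applies when the sampled values are coded 1 for units possessing the attribute and 0 otherwise.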

Illustration

IV.3. Multi-stage sampling

Suppose, for example, that a multi-stage cluster survey is to be carried out in a city. The city is made up of:

A neighbourhood = PSU (Primary Sampling Unit);

A block = SSU (Secondary Sampling Unit);
A building = TSU (Tertiary Sampling Unit);
Households = QSU (Quaternary Sampling Unit).

Alongside equal-probability sampling, there are sampling designs in which individuals have unequal inclusion probabilities. In general, stratified and multi-stage designs are designs in which the individuals in the sampling frame do not all have the same probability of being selected. Selection with probability proportional to size means, for example, that a cluster 5 times larger than another has 5 times the probability of falling into the sample.

As a result of multi-stage sampling, the probability that any last-stage unit is included in the sample is the product of the selection probabilities at all the stages carried out.

The illustration of multi-stage sampling leads back to systematic sampling, already discussed with equal probabilities. In the present case, the systematic sampling algorithm is applied with unequal probabilities.

Calculation of probabilities and expansion factors

1. Probability of selecting a PSU (cluster) at the first stage

$$P_{1h} = a_h\,\frac{M_{hi}}{M_h}$$

where $M_h$ = number of households in stratum h

$a_h$ = number of PSUs (clusters) selected in stratum h

$M_{hi}$ = number of households in PSU (cluster) i of stratum h

2. Probability of selecting a household at the second stage

$$P_{2h} = \frac{m_{hij}}{M_{hi}}$$

where $m_{hij}$ = number of households selected in PSU (cluster) i of stratum h

3. Overall probability of selecting a household in stratum h

$$P_h = P_{1h}\cdot P_{2h}$$

4. Expansion factor within a stratum

$$W_h = \frac{1}{P_h} = \frac{1}{P_{1h}\cdot P_{2h}}$$

5. Probability of selecting a household across all strata (national level, for example)

$$P = \prod_h P_h$$

6. Expansion factor across all strata (national level, for example)

$$W = \frac{1}{P} = \frac{1}{\prod_h P_h}$$
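
Items 1 to 4 above can be sketched in Python for a single stratum. The function and parameter names are hypothetical (the parameter `m_drawn` stands for the number of households selected in the cluster), and the figures in the usage lines are made up for the illustration.

```python
def household_weight(a_h, M_hi, M_h, m_drawn):
    """Selection probabilities and expansion factor for a household in stratum h.

    a_h: number of clusters selected in the stratum
    M_hi: number of households in cluster i of the stratum
    M_h: number of households in the stratum
    m_drawn: number of households selected in cluster i
    """
    P1 = a_h * M_hi / M_h      # first stage: probability proportional to size
    P2 = m_drawn / M_hi        # second stage: households within the cluster
    Ph = P1 * P2               # overall selection probability
    return P1, P2, Ph, 1 / Ph  # the last value is the expansion factor W_h

# Hypothetical stratum: 10 000 households, 5 clusters selected,
# a selected cluster of 200 households, 20 of which are interviewed
print(household_weight(a_h=5, M_hi=200, M_h=10000, m_drawn=20))
```

Because the cluster size cancels in the product, the overall probability reduces to a_h × m_drawn / M_h, so drawing the same number of households in every selected cluster gives every household in the stratum the same weight.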

EXERCISE: The Karongi survey case

CHAPTER VI. SAMPLING IN TIME (Panel)

Taking time into account adds a dimension to the survey problem. The variation in the values of a variable Y across the individuals of a given population is likely to be not only spatial but also temporal. Indeed, time enters into the definition of two fundamental concepts: population and variables. Populations evolve over time, just as the variables are affected by time.

Consider a complex parameter ∆ that compares two similar parameters defined at two different times t and t+1. If the two parameters are denoted θt and θt+1, ∆ can be the difference θt+1 − θt or the ratio θt+1/θt.

This raises the question of the reference population: is it the population Ωt at time t alone, or both the population Ωt (on which θt is defined) and the population Ωt+1 (on which θt+1 is defined)?
This duality of approaches affects the choice of sampling method, although the two can be reconciled by opting for a rotating sample design.

If we study a population Ωt defined at time t and measure the changes over time in this population, we have a longitudinal survey. When instead the study aims to describe the status of a population at one point in time, we speak of a cross-sectional survey.

Whether the approach is longitudinal or cross-sectional (transverse), three types of surveys can be used: samples drawn independently at each date, panels and rotating samples. In practice, panels or rotating samples are used most often.

VI.1. Panels in longitudinal approach

When attempting to measure the evolution of a parameter over time, the population being that of the original date, one often chooses to draw a sample at the original date and follow the individuals in the sample for as long as is relevant to the purposes of the study. Such a sample, in which individuals are interviewed at least twice, is called a panel.

Panels can be used because the parameter to be estimated explicitly involves the individual changes in the variable of interest. For example, to calculate monthly price indexes, it is necessary to establish a range of products on which the index is estimated.
A panel can also be set up to minimize the observation errors that necessarily affect survey results.

VI.2. Panels in cross-sectional approach

Using a panel seems inconsistent with estimating a parameter at the current date. Indeed, a panel is a sample that represents the population as it was when the panel was drawn. To move from the initial (old) population to the current (contemporary) one, we need to take the new individuals into account.

To move from the panel to the cross-sectional sample, we must ensure that each individual in the current population can be sampled. The principle is to avoid zero selection probabilities. There are three methods for this:

• Form a stratum (or strata) of new individuals, defined as individuals who belong to the current population without having been in the initial population, and then draw from this stratum to form the necessary complement. This method has the major drawback of requiring knowledge of all new individuals.

• Draw, independently of the panel, a sample from the cross-sectional population. Each individual then has a non-zero probability of inclusion in the final sample.

• Use a "natural" grouping of individuals and survey the entire group if and only if it contains at least one individual of the panel. This methodology, known as "indirect sampling", is not infallible but significantly reduces the problem. For example, in a survey of individuals, since individuals are grouped into dwellings or households, one may decide to survey at the current date all the people living in dwellings or households that contain at least one panel individual.

VI.3. Rotational Sampling

a) Principle

Rotational sampling consists of drawing samples at regular intervals (quarterly or annually), each of which is kept as a panel for a given period that is the same for all panels. Thus, at each survey date (called a "wave"), one panel enters the system while another leaves it.

The main advantages of rotational sampling are those of the panel, namely the gain in accuracy in estimating changes and the reduction of observation error. However, this sampling has some drawbacks, including the difficulty of following up panel individuals in the field when they change address (this is sometimes called "tracing" or "tracking" of individuals).

b) Longitudinal exploitation

To estimate changes between dates t and t+1, assume that each panel must be interviewed four times, e.g. every quarter for one year. The sample drawn at time α will be surveyed at waves α, α+1, α+2 and α+3.
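
The rotation pattern can be made concrete with a short Python sketch. It assumes, as in the example above, that each panel is interviewed at four consecutive waves and that one new panel enters at every wave; the function names and the wave numbers in the usage lines are hypothetical.

```python
def panels_at_wave(t, waves_per_panel=4):
    """Entry waves of the panels interviewed at wave t.

    A panel that entered at wave alpha is interviewed at waves
    alpha, alpha + 1, ..., alpha + waves_per_panel - 1.
    """
    return list(range(t - waves_per_panel + 1, t + 1))

def common_panels(t, waves_per_panel=4):
    """Panels interviewed at both wave t and wave t + 1."""
    current = set(panels_at_wave(t, waves_per_panel))
    following = set(panels_at_wave(t + 1, waves_per_panel))
    return sorted(current & following)

print(panels_at_wave(10))   # panels in the field at wave 10
print(common_panels(10))    # panels shared by waves 10 and 11
```

Under these assumptions, three of the four panels are common to two consecutive waves, and it is on this overlap that the estimation of change between t and t+1 relies.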

c) Transverse exploitation

In this approach we estimate, for example, the total Tt, where t is the current date. At each date t, inference is made on the entire population Ωt. As for the data collection protocol, all the individuals of the panel are followed over time, and all individuals living in the same dwelling or household as the sampled person are interviewed.

STS Reviewer
No ratings yet
STS Reviewer
8 pages
SASA Reviewer
No ratings yet
SASA Reviewer
4 pages
Tests For The Difference of Two Hazard Rates Assuming An Exponential Model
No ratings yet
Tests For The Difference of Two Hazard Rates Assuming An Exponential Model
17 pages
Sigal As 2015
No ratings yet
Sigal As 2015
15 pages
FINAL Research
No ratings yet
FINAL Research
35 pages
Biostat Exam
No ratings yet
Biostat Exam
5 pages
Introduction To Statistics 1662031282
100% (1)
Introduction To Statistics 1662031282
936 pages
What Is Scientific Research and How Can It Be Done
No ratings yet
What Is Scientific Research and How Can It Be Done
10 pages
An Empirical Investigation Into Why Startups Resist Use of Digital Marketing
No ratings yet
An Empirical Investigation Into Why Startups Resist Use of Digital Marketing
15 pages