Chapter 1
WRU
CHAPTER ONE
SAMPLING AND SAMPLING DISTRIBUTION
1.1. Sampling Theory: The Concept of Sampling
If one studies performance of freshman students in some college, the student is the sampling
unit.
Sampling frame: the list of all elements in a population.
Examples: List of households.
List of students in the registrar office.
Sampling error – the difference between a population parameter and the corresponding statistic
observed in a probability sample.
Non-sampling error – it is an error that occurs in the collection, recording and computation of data.
Sampling with replacement – a sampling procedure in which sample items are returned to the
population; as a result, there is a possibility of their being chosen again in the sample.
Sampling without replacement – a sampling procedure in which sample items are not returned to the
population; as a result, none of these can be selected in the sample again.
Table 1.1 Summary of the differences between populations and samples
                  Population                                   Sample
Definition        Collection of all items being dealt with     A subset of the subjects of the population
Characteristics   "Parameters"                                 "Statistics"
Symbols           Population size = N                          Sample size = n
When studying characteristics of a population, there are many practical reasons why we prefer to select
portions or samples of a population to observe and measure.
Some of the reasons for sampling are:
1. The time to contact the whole population may be prohibitive.
2. The cost of studying all the items in a population may be prohibitive.
3. The physical impossibility of checking all items in the population. The populations of fish, birds,
snakes, mosquitoes, and the like are large and are constantly moving, being born, and dying.
4. The destructive nature of some tests. In the area of industrial production, steel plates, wires, and
similar products must have a certain minimum tensile strength. Each piece is stretched until it
breaks, and the breaking point recorded.
5. The sample results are adequate. Even if funds are available, it is doubtful the additional accuracy
of a 100 percent sample - that is, studying the entire population - is essential in most problems.
When selecting a sample, researchers or analysts must be very careful that the sample is a fair
representation of the population.
1.1.2 Types of sampling
Statistical sampling theory provides the methods used to solve applied problems. There are two types of
sampling based on how the sample is selected. These are: random sampling and non-random sampling.
Example: Assume Werabe University has 845 instructors. A sample of 52 instructors is to be selected from
that population. One way of ensuring that every instructor in the population has the same chance of being
chosen is to first write the name of each instructor on a small slip of paper and deposit all of the slips in a
box. After the slips have been thoroughly mixed, the first selection is made by drawing a slip out of the box
without looking at it. This process is repeated until the sample of 52 has been chosen.
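The slip-drawing procedure above can be sketched in Python. This is only an illustration: the instructor IDs 1 to 845 stand in for the names on the slips, and the function name and seed are assumptions made for the example.

```python
import random

# Hypothetical sampling frame: IDs 1..845 stand in for the instructors' names.
POPULATION_SIZE = 845
SAMPLE_SIZE = 52

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

frame = list(range(1, POPULATION_SIZE + 1))
sample = simple_random_sample(frame, SAMPLE_SIZE, seed=1)
print(len(sample), sorted(sample)[:5])
```

Fixing the seed only makes the draw reproducible; in practice the seed would be left unset.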
2. Systematic Random Sampling
The simple random sampling procedure may be awkward in some research situations. In systematic
random sampling, a random starting point is selected, and then every kth member of the population is selected.
First, k is calculated as the population size divided by the sample size. For example, if we need to select a
sample of 100 from a population of 2,000, we would select every 20th (2,000/100) individual from the sampling
frame; in so doing the numbering process is avoided. If k is not a whole number, then round down.
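A minimal sketch of systematic selection, assuming the sampling frame is an ordered list; the function name is hypothetical. It computes k by rounding down, picks a random start among the first k units, and then takes every kth unit.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit of the frame after a random start."""
    k = len(frame) // n              # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)         # random position among the first k units
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 2001))         # population of 2,000 numbered units
sample = systematic_sample(frame, 100, seed=1)
print(sample[:5])
```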
3. Stratified Random Sampling
When a population can be clearly divided into groups based on some characteristic, stratified
random sampling can be used to guarantee that each group is represented in the sample. The groups
are also called strata.
For example, college students can be grouped as full time or part time, male or female, etc. Once
the strata are defined, we can apply simple random sampling within each stratum to collect
the sample.
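The idea can be sketched as follows. The strata sizes (600 full-time and 400 part-time students) and the 5% sampling fraction are hypothetical, and drawing the same fraction from every stratum (proportional allocation) is one common choice rather than something the text prescribes.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Apply simple random sampling separately within each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_h = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical strata for illustration only.
strata = {
    "full_time": [f"FT{i}" for i in range(600)],
    "part_time": [f"PT{i}" for i in range(400)],
}
sample = stratified_sample(strata, 0.05, seed=1)
print({name: len(units) for name, units in sample.items()})
```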
4. Cluster Sampling
A population is divided into clusters using naturally occurring geographic or other boundaries. Then,
clusters are randomly selected and a sample is collected by randomly selecting from each cluster.
Suppose you want to determine the views of residents in a particular state about state and federal
environmental protection policies. Selecting a random sample of residents in the state and personally
contacting each one would be time consuming and very expensive. Instead, you could employ cluster
sampling by subdividing the state into small units, either woredas or regions. These are often called
primary units. You could then randomly select some of these units, take a random sample of the
residents in each selected unit, and interview them.
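A minimal two-stage sketch of cluster sampling. The 12 woredas with 50 residents each, and the choice of 3 clusters and 10 residents per cluster, are hypothetical numbers used only to make the example run.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample units inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical primary units: 12 woredas with 50 residents each.
clusters = {
    f"woreda_{w}": [f"w{w}_resident_{i}" for i in range(50)] for w in range(12)
}
sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=10, seed=1)
print({name: len(units) for name, units in sample.items()})
```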
II. Non-Random/Non-Probability Sampling
Non-probability sampling is also known as non-parametric sampling. It is characterized as follows:
1. There is no clearly defined idea of the population in non-probability sampling.
2. The probability of selecting any individual is not known.
Each person or unit is connected with another through a direct or indirect linkage. This does not mean
that each person directly knows, interacts with, or is influenced by every other person in the network.
Errors in data acquisition - This type of error arises from the recording of incorrect responses. Incorrect
responses may be the result of incorrect measurements being taken because of faulty equipment, mistakes
made during transcription from primary sources, inaccurate recording of data because terms were
misinterpreted, or inaccurate responses were given to questions concerning sensitive issues such as sexual
activity or possible tax evasion.
Non-response error - Non-response error refers to error (or bias) introduced when responses are not
obtained from some members of the sample. When this happens, the sample observations that are collected
may not be representative of the target population, resulting in biased results.
Non-response can occur for a number of reasons. An interviewer may be unable to contact a person listed
in the sample, or the sampled person may refuse to respond for some reason. In either case, responses are
not obtained from a sampled person, and bias is introduced. The problem of non-response is even greater
when self-administered questionnaires are used rather than an interviewer, who can attempt to reduce the
non-response rate by means of callbacks.
Selection bias - Selection bias occurs when the sampling plan is such that some members of the target
population cannot possibly be selected for inclusion in the sample.
Counting Rules
The probability of occurrence of an outcome was defined as the number of ways the outcome occurs,
divided by the total number of possible outcomes. Often, there are a large number of possible outcomes,
and determining the exact number can be difficult.
In such circumstances, rules have been developed for counting the number of possible outcomes. This
section presents five different counting rules.
Permutation: In many instances you need to know the number of ways in which a subset of an
entire group of items can be arranged in order. The number of ways of arranging r objects selected
from n objects in order is:
$$ {}_{n}P_{r} = \frac{n!}{(n-r)!} $$
Where
n = total number of objects
r = number of objects to be arranged
n! = n factorial = n(n – 1)…(1)
P = symbol for permutations
Example: If you have six books, but there is room for only four books on the shelf, in how many
ways can you arrange these books on the shelf?
$$ {}_{n}P_{r} = \frac{n!}{(n-r)!} = \frac{6!}{(6-4)!} = \frac{6 \times 5 \times 4 \times 3 \times 2!}{2!} = 360 $$
Combination: In many situations, you are not interested in the order of the outcomes but only in the
number of ways that r items can be selected from n items, irrespective of order. The number of ways of
selecting r objects from n objects, irrespective of order, is equal to:
$$ {}_{n}C_{r} = \frac{n!}{r!\,(n-r)!} $$
Where
n = total number of objects
r = number of objects to be selected
n! = n factorial = n(n – 1)…(1)
Dept. of Management Page 7
Statistics for Management II Ch-1
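For n = 6 and r = 4:
$$ {}_{n}C_{r} = \frac{n!}{r!\,(n-r)!} = \frac{6!}{4!\,(6-4)!} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{(4 \times 3 \times 2 \times 1)(2 \times 1)} = 15 $$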
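Both counting rules can be checked with Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). The sketch uses n = 6 and r = 4, as in the worked examples.

```python
import math

n, r = 6, 4

permutations = math.perm(n, r)   # n! / (n - r)!   (order matters)
combinations = math.comb(n, r)   # n! / (r! (n - r)!)   (order ignored)

print(permutations, combinations)  # 360 15

# Each unordered selection of r items can be arranged in r! different orders.
assert permutations == combinations * math.factorial(r)
```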
Survey Error
Even when surveys use random probability sampling methods, they are subject to potential errors.
There are four types of survey errors:
Coverage error
Non-response error
Sampling error
Measurement error
A. Coverage Error
The key to proper sample selection is having an adequate frame. Remember that a frame is an up-to-date
list of all the items from which you will select the sample.
Coverage error occurs if certain groups of items are excluded from the frame so that they have no
chance of being selected in the sample.
Coverage error results in a selection bias. If the frame is inadequate because certain groups of items in
the population were not properly included, any random probability sample selected will provide only an
estimate of the characteristics of the frame, not the actual population.
B. Non-response Error
Not everyone is willing to respond to a survey. In fact, research has shown that individuals in the upper and
lower economic classes tend to respond less frequently to surveys than do people in the middle class.
Non-response error arises from failure to collect data on all items in the sample and results in a non-
response bias. Because you cannot always assume that persons who do not respond to surveys are similar
to those who do, you need to follow up on the non-responses after a specified period of time.
C. Sampling Error
As discussed earlier, a sample is selected because it is simpler, less costly, and more efficient to examine
than an entire population. However, chance dictates which individuals or items will or will not be included
in the sample.
Sampling error reflects the variation, or “chance differences,” from sample to sample, based on the
probability of particular individuals or items being selected in the particular samples.
When you read about the results of surveys or polls in newspapers or magazines, there is often a
statement regarding a margin of error, such as “the results of this poll are expected to be within ±4
percentage points of the actual value.”
This margin of error is the sampling error. You can reduce sampling error by using larger sample
sizes, although doing so increases the cost of conducting the survey.
D. Measurement Error
In the practice of good survey research, you design a questionnaire with the intention of gathering
meaningful information. But you have a dilemma here: Getting meaningful measurements is often easier
said than done.
Consider the following proverb:
A person with one watch always knows what time it is;
A person with two watches always searches to identify the correct one;
A person with ten watches is always reminded of the difficulty in measuring time.
Unfortunately, the process of measurement is often governed by what is convenient, not what is needed.
The measurements you get are often only a proxy for the ones you really desire.
Much attention has been given to measurement error that occurs because of a weakness in question
wording. A question should be clear, not ambiguous. Furthermore, in order to avoid leading questions,
you need to present questions in a neutral manner.
There is usually a difference between a sample statistic and its corresponding population parameter.
This difference is called sampling error.
So far we have examined how samples can be taken from a population. If we take several samples from a
population using one of the sampling techniques already discussed, the statistics we would compute for
each sample need not be the same and most likely would vary from sample to sample.
Sampling distribution of the sample mean: a probability distribution of all possible sample
means of a given sample size. Sample means vary from sample to sample. How samples can be
taken from a population is addressed by the different sampling techniques.
Illustration 1
Suppose that a population has five elements (N = 5): 3, 6, 9, 12 and 15. If we draw samples of size 3 (n = 3)
without replacement, there are 10 possible samples. Their elements are:
Samples 3, 6, 9    3, 6, 12    3, 6, 15    3, 9, 12    3, 9, 15
        3, 12, 15  6, 9, 12    6, 9, 15    6, 12, 15   9, 12, 15
For each sample we can compute the mean value (i.e., the sample statistic). The following table shows
the mean value for each sample.
Sample       Mean ($\bar{x}$)
3, 6, 9      6
3, 6, 12     7
3, 6, 15     8
3, 9, 12     8
3, 9, 15     9
3, 12, 15    10
6, 9, 12     9
6, 9, 15     10
6, 12, 15    11
9, 12, 15    12
             $\sum \bar{x} = 90$
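The table of sample means can be reproduced by enumerating every possible sample. A minimal sketch using only the standard library:

```python
from itertools import combinations
from statistics import mean

population = [3, 6, 9, 12, 15]

# All possible samples of size 3 drawn without replacement.
samples = list(combinations(population, 3))
sample_means = [mean(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, m)

print("sum of sample means:", sum(sample_means))    # 90
print("mean of sample means:", mean(sample_means))  # 9
print("population mean:", mean(population))         # 9
```

The mean of the ten sample means (90/10 = 9) equals the population mean, which is the first property of the sampling distribution of the mean.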
Sampling Distribution: a probability distribution of all the values of a sample statistic. There are
sampling distributions of the mean, of the proportion, etc.
Sampling Distribution of the mean: the probability distribution of the sample mean. To illustrate, when we
take every possible sample from a population and compute the mean of each, the resulting set of mean values
is referred to as the sampling distribution of the mean.
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} $$
-------- For finite population.
A population is said to be infinite when it is not possible to list or count all the elements included in the
population (i.e., when the elements are unlimited). Even when the elements in the population
are limited, the population may be treated as infinite when the sample size is small: as a rule of thumb,
statisticians treat the population as infinite when n ≤ 5% of N. A population is said to be finite when
n > 0.05 N. The value $\sqrt{\frac{N-n}{N-1}}$ is referred to as the finite population correction factor.
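As a check on the finite population formula, the sketch below applies it to the five-element population of Illustration 1 (N = 5, n = 3, so n > 0.05 N) and compares the result with the standard deviation of the ten enumerated sample means. Variable names are chosen for the example.

```python
from itertools import combinations
from math import isclose, sqrt
from statistics import mean, pstdev

population = [3, 6, 9, 12, 15]
N, n = len(population), 3

sigma = pstdev(population)               # population standard deviation
fpc = sqrt((N - n) / (N - 1))            # finite population correction factor
se_formula = sigma / sqrt(n) * fpc       # standard error from the formula

# Standard deviation of the means of all possible samples of size n.
all_means = [mean(s) for s in combinations(population, n)]
se_enumerated = pstdev(all_means)

print(round(se_formula, 4), round(se_enumerated, 4))
```

Both values come out to the square root of 3, about 1.732, so the formula with the correction factor agrees with direct enumeration for this small population.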
3. For a sufficiently large sample size, the sampling distribution of the mean is approximately normal
regardless of the shape of the population from which the sample is drawn.
Illustration 2:
The average lifetime of a light bulb is 3000 hours with a standard deviation of 696 hours. A simple random
sample of 36 bulbs is taken.
(a) What are the expected value and the standard deviation (standard error) of the sample mean?
(b) What is the probability that the average life time in the sample will be between 2670.56 and 2809.76
hours?
(c) What is the probability that the average life time in the sample will be equal to or greater than 3219.24
hours?
(d) What is the probability that the average life time in the sample will be equal to or less than 3180.96
hours?
(e) How large of a sample needs to be taken to provide a 0.01 probability that the average life time in the
sample will be equal to or greater than 3219.24 hours
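[Solution:] (a)
$$ E(\bar{X}) = \mu = 3000, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{696}{\sqrt{36}} = 116 $$
(b)
$$ P(2670.56 \le \bar{X} \le 2809.76) = P\left(\frac{2670.56 - 3000}{116} \le \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \frac{2809.76 - 3000}{116}\right) $$
$$ = P(-2.84 \le Z \le -1.64) = 0.0482 $$
(c)
$$ P(\bar{X} \ge 3219.24) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \ge \frac{3219.24 - 3000}{116}\right) = P(Z \ge 1.89) = 0.0294 $$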
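(d)
$$ P(\bar{X} \le 3180.96) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \frac{3180.96 - 3000}{116}\right) = P(Z \le 1.56) = 0.9406 $$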
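(e)
$$ 0.01 = P(\bar{X} \ge 3219.24) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \ge \frac{3219.24 - 3000}{\sigma_{\bar{X}}}\right) = P\left(Z \ge \frac{219.24}{\sigma_{\bar{X}}}\right) $$
$$ \frac{219.24}{\sigma_{\bar{X}}} = z_{0.01} = 2.33 \quad \Rightarrow \quad \sigma_{\bar{X}} = \frac{696}{\sqrt{n}} = \frac{219.24}{2.33} $$
$$ n = \left(\frac{696 \times 2.33}{219.24}\right)^2 = 54.71 $$
$$ n \approx 55 $$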
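The answers to Illustration 2 can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of the printed normal table; the variable names are assumptions for the example, and part (e) reuses the table value z = 2.33.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma, n = 3000, 696, 36

se = sigma / sqrt(n)                 # (a) standard error of the mean
xbar = NormalDist(mu, se)            # sampling distribution of the sample mean

p_b = xbar.cdf(2809.76) - xbar.cdf(2670.56)   # (b) between the two values
p_c = 1 - xbar.cdf(3219.24)                   # (c) at least 3219.24
p_d = xbar.cdf(3180.96)                       # (d) at most 3180.96

z_table = 2.33                                # table value for an upper tail of 0.01
n_e = ceil((sigma * z_table / (3219.24 - mu)) ** 2)   # (e) round up to a whole bulb

print(se, round(p_b, 4), round(p_c, 4), round(p_d, 4), n_e)
```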
Finding Areas under the Standard Normal Distribution Curve
For the solution of problems using the normal distribution a four step procedure is recommended.
i. Draw a picture
ii. Shade the area desired
iii. Find the correct figure in the table (the figure that is similar to the one you’ve drawn).
iv. Follow the directions given in the appropriate block of the procedure table to get the desired area.
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
$$ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
Due to the Central Limit Theorem, the normal distribution has found a central place in the theory of
statistical inference. In many situations the sample is large enough for the sampling distribution to
be approximately normal, so we can use the mathematical properties of the normal distribution to
draw inferences about the variable of interest.
The rule of thumb in this regard is that if the sample size, n, is greater than or equal to 30, then we can
assume that the sampling distribution of $\bar{X}$ is approximately normally distributed.
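A small simulation can make the rule of thumb concrete. The sketch below is an assumption-laden illustration: it draws 2,000 samples of size 30 from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for the example) and compares the centre and spread of the sample means with what the theory predicts.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)               # fixed seed so the run is reproducible
n, replicates = 30, 2000

# Each replicate is the mean of one sample of size n from the skewed population.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(replicates)
]

print("mean of sample means:", round(mean(sample_means), 3))        # theory: 1
print("std. dev. of sample means:", round(pstdev(sample_means), 3))
print("sigma / sqrt(n):", round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.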
On the other hand, if the population sampled is normally distributed, then the sampling distribution of
$\bar{X}$ will also be normal regardless of sample size. In other words, $\bar{X}$ will be normally distributed with
mean $\mu$ and variance $\sigma^2/n$.
Another sampling distribution that you will soon encounter is that of the difference between two sample
means. The sampling plan calls for independent random samples drawn from each of two normal
populations.
Suppose two populations of size N1 and N2 are given. For each sample of size n1 from the first population,
compute the sample mean $\bar{x}_1$ and standard deviation $\sigma_{\bar{x}_1}$. Similarly, for each sample of size n2 from the second
population, compute the sample mean $\bar{x}_2$ and standard deviation $\sigma_{\bar{x}_2}$.
For all combinations of these samples from these populations, we can obtain the sampling distribution of
the difference of two sample means ($\bar{x}_1 - \bar{x}_2$). The mean of this distribution is given by:
$$ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 $$
Since the standard error of a sampling distribution is the standard deviation of the sampling distribution, the
standard error of the difference between means is:
$$ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
Just to review the notation, the symbol on the left contains a sigma (σ), which means it is a standard
deviation. The subscript in $\sigma_{\bar{x}_1 - \bar{x}_2}$ indicates that it is the standard deviation of the sampling distribution of
$\bar{x}_1 - \bar{x}_2$.
We find the Z score by assuming that there is no difference between the population means.
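The standard error and Z score for a difference between two means can be sketched as below. The function names are hypothetical. The two variances (2800 and 3250) are the ones quoted in Illustration 3, but the sample sizes and the observed difference used in the example call are made-up values, since they are not stated in the text.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_means(var1, n1, var2, n2):
    """Standard error of the difference between two independent sample means."""
    return sqrt(var1 / n1 + var2 / n2)

def upper_tail_probability(observed_diff, se, hypothesized_diff=0.0):
    """Z score and P(Z >= z), assuming no difference between the population means."""
    z = (observed_diff - hypothesized_diff) / se
    return z, 1 - NormalDist().cdf(z)

# Hypothetical sample sizes (40 and 35) and observed difference (30.0).
se = se_diff_means(2800, 40, 3250, 35)
z, p = upper_tail_probability(30.0, se)
print(round(se, 2), round(z, 2), round(p, 4))
```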
Illustration 3:
In a study of annual family expenditures for general health care, two populations were surveyed with the
following results:
If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability of obtaining
sample results ($\bar{x}_1 - \bar{x}_2$) as large as those shown if there is no difference in the means of the two
populations?
Solution
A value of z = 3.6 gives an area of .9998. This is subtracted from 1 to give the probability
P (z > 3.6) = .0002
The probability that $\bar{x}_1 - \bar{x}_2$ is as large as given is .0002.
The sample proportion ($\bar{p}$) is the point estimator of the population proportion p. The formula for
computing the sample proportion is:
$$ \bar{p} = \frac{x}{n} $$
Where:
x = the number of elements in the sample that possess the characteristic of interest
n = sample size
The sample proportion ($\bar{p}$) is a random variable and its probability distribution is called the sampling
distribution of $\bar{p}$. The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the
sample proportion $\bar{p}$.
Illustration 4:
Consider a population of N = 5 numbers: 3, 6, 9, 12 and 15. Taking even numbers as the characteristic of
interest, the proportion of even numbers is 2/5 = 0.4. Consider samples of size 3 (n = 3) drawn from the
population; the samples and their sample proportions are given in the table below.
Given table 1.3, the probability distribution of the sample proportion ($\bar{p}$) is the sampling distribution of the
proportion, that is, the probability distribution of all possible values of
the sample proportion ($\bar{p}$).
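The sampling distribution of the proportion in Illustration 4 can be built by enumeration. A minimal sketch, using exact fractions so that no rounding is involved; the helper name is an assumption for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]

def proportion_even(values):
    """Share of even numbers among the given values, as an exact fraction."""
    return Fraction(sum(v % 2 == 0 for v in values), len(values))

p = proportion_even(population)                       # population proportion
proportions = [proportion_even(s) for s in combinations(population, 3)]

# Relative frequency of each possible sample proportion.
distribution = {
    value: Fraction(count, len(proportions))
    for value, count in sorted(Counter(proportions).items())
}
expected = sum(proportions) / len(proportions)

print(p, distribution, expected)
```

The expected value of the sample proportion comes out to 2/5, the same as the population proportion.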
1. The expected value of the sample proportion ($\bar{p}$) is equal to the population proportion.
Symbolically: $E(\bar{p}) = p$
2. Just as with the standard deviation of the sample means ($\bar{x}$), the standard deviation of the sample
proportion ($\bar{p}$) also depends on whether the population is finite or infinite. It follows that the
standard deviation of the sample proportion is:
$$ \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}} $$ --- for finite population (i.e., n > 0.05 N)
$$ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} $$ --- for infinite population (i.e., n ≤ 0.05 N)
Illustration 5:
A new soft drink is being market tested. It is estimated that 60% of consumers will like the new drink. A
sample of 96 consumers taste-tested the new drink.
(a) What is the standard error of the sample proportion?
(b) What is the probability that equal to or more than 70.4% of consumers will indicate they like the drink?
(c) What is the probability that equal to or more than 30% of consumers will indicate they do not like the
drink?
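[Solution:] (a)
$$ \sigma_{\bar{p}} = \sqrt{\frac{0.6 \times 0.4}{96}} = 0.05 $$
(b)
$$ P(\bar{p} \ge 0.704) = P\left(\frac{\bar{p} - p}{\sigma_{\bar{p}}} \ge \frac{0.704 - 0.6}{0.05}\right) = P(Z \ge 2.08) = 0.0188 $$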
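(c) If 30% or more of consumers do not like the drink, then 70% or fewer like it. We therefore need the
probability that no more than 70% of consumers will indicate they like the drink:
$$ P(\bar{p} \le 0.70) = P\left(Z \le \frac{0.70 - 0.60}{0.05}\right) = P(Z \le 2.00) = 0.9772 $$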
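The figures in Illustration 5 can be checked with a short script. This is a sketch using `statistics.NormalDist` (Python 3.8+) instead of the normal table, with variable names chosen for the example; part (c) is computed as the probability that at most 70% like the drink.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 96

se = sqrt(p * (1 - p) / n)           # (a) standard error of the sample proportion
p_bar = NormalDist(p, se)            # normal approximation for the sample proportion

p_b = 1 - p_bar.cdf(0.704)           # (b) at least 70.4% like the drink
p_c = p_bar.cdf(0.70)                # (c) at most 70% like it (at least 30% do not)

print(round(se, 4), round(p_b, 4), round(p_c, 4))
```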
Suppose two populations of size N1 and N2 are given. For each sample of size n1 from the first population,
compute the sample proportion $\bar{p}_1$ and standard deviation $\sigma_{\bar{p}_1}$. Similarly, for each sample of size n2 from the
second population, compute the sample proportion $\bar{p}_2$ and standard deviation $\sigma_{\bar{p}_2}$.
For all combinations of these samples from these populations, we can obtain the sampling distribution of
the difference of two sample proportions ($\bar{p}_1 - \bar{p}_2$). The mean and the standard deviation of this
distribution are given by:
$$ \mu_{\bar{p}_1 - \bar{p}_2} = \mu_{\bar{p}_1} - \mu_{\bar{p}_2} = p_1 - p_2 $$
$$ \sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} $$
If the sample sizes n1 and n2 are large, that is, n1 ≥ 30 and n2 ≥ 30, the sampling distribution of the difference of
two sample proportions is closely approximated by the normal distribution.
Illustration 6:
10% of machines produced by Company A are defective and 5% of those produced by Company B are
defective. A random sample of 250 machines is taken from Company A and a random sample of 300
machines is taken from Company B. what is the probability that the difference in sample proportion is less
than or equal to 0.02?
Solution
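We are given the following information: $p_1 = 0.10$, $n_1 = 250$, $p_2 = 0.05$ and $n_2 = 300$, so that
$$ \mu_{\bar{p}_1 - \bar{p}_2} = p_1 - p_2 = 0.10 - 0.05 = 0.05 $$
$$ \sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = \sqrt{\frac{0.1(0.90)}{250} + \frac{0.05(0.95)}{300}} = 0.0228 $$
The desired probability of the difference in sample proportions is given by:
$$ P(\bar{p}_1 - \bar{p}_2 \le 0.02) = P\left(Z \le \frac{0.02 - 0.05}{0.0228}\right) = P(Z \le -1.32) = 0.0934 $$
Hence, the probability that the difference in sample proportions $\bar{p}_1 - \bar{p}_2$ is less than or equal to 0.02 is 0.0934.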
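Illustration 6 can be reproduced numerically as a check. The sketch below uses `statistics.NormalDist` (Python 3.8+) rather than the normal table, so the probability is computed from the unrounded standard error and Z value and may differ from the table-based 0.0934 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.10, 250    # Company A: defect rate and sample size
p2, n2 = 0.05, 300    # Company B: defect rate and sample size

mean_diff = p1 - p2
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = (0.02 - mean_diff) / se_diff
prob = NormalDist(mean_diff, se_diff).cdf(0.02)   # P(difference <= 0.02)

print(round(se_diff, 4), round(z, 2), round(prob, 3))
```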