Chapter Three
Sampling and Sampling Distribution
Chapter Outline
Sampling theory
• Basic definitions
• The need for sampling
• Types of samples
Sampling Distributions
• Sampling distribution of the mean.
• Sampling distribution of the proportion.
• Sampling distribution of the difference between two means.
• Sampling distribution of the difference between two proportions.
1.1 Sampling Theory
1.1.1 Definition of terms
1. Population: The entire set of observations, individuals, objects, items, etc. under study.
It is the totality of the elements being studied.
2. Sample: A part (subset) of the population.
It is a portion of the population that is believed to be representative of the population.
3. Element: Each individual unit of the population.
Elements are also called the building blocks of the population.
4. Census: Gathering information from all elements of the population.
It is the enumeration or measurement of every individual or item in the population.
5. Sampling: Gathering information from only some of the elements of the population.
6. Population parameter (parameter): Any measurement or characteristic of the population, e.g. µ, σ, σ².
7. Sample statistic: Any measurement or characteristic computed from the sample, e.g. x̄, s, s².
1.1.2 Why we need Sampling
Contacting the whole population would be time consuming.
Studying all the items of the population may be too costly.
It may be physically impossible to check all the items of the population, e.g. fish, birds, etc.
Some tests are destructive, so testing every item would destroy the population.
Sample results are usually adequate for the purpose at hand.
1.1.3 Types of Sampling/ Sampling Techniques
There are two types of sampling techniques;
1. Random sampling (probability sampling)
2. Non random sampling (non-probability sampling)
Whether a sample is random or non-random depends on how it is selected.
1. Random Sampling (Probability Sampling): a technique in which every unit in the population has a known chance (greater than zero) of being included in the sample.
In its simplest form, every individual has an equal opportunity of selection; this is achieved when the researcher selects units by randomization.
The advantage of a random sample is that it avoids systematic selection bias. If random selection is done properly, the sample tends to be representative of the entire population, although it remains subject to chance sampling error.
Random samples are obtained from a finite or infinite population.
If we have a finite population of n elements and are asked to find the number of different samples of size r that can be taken from it (without replacement and ignoring order), we use the following formula.
No. of samples of size r = $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
Where n = number of elements in the population
r = the proposed sample size
Example 1
Suppose there are 8 persons from whom we have to select samples of size 3. How many samples can be
selected? Ans: 56
Example 2
A group of 10 students are to be divided in to two groups of 5 each and seated at two tables. How many
different ways are there of dividing the 10 students? Ans. 252
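The two answers above can be checked with a short Python sketch. It assumes that samples are unordered and drawn without replacement, and the function name `number_of_samples` is only illustrative.

```python
from math import comb


def number_of_samples(population_size: int, sample_size: int) -> int:
    """Count the distinct unordered samples drawn without replacement."""
    return comb(population_size, sample_size)


# Example 1: samples of size 3 from 8 persons
print(number_of_samples(8, 3))

# Example 2: choosing which 5 of the 10 students sit at the first table
print(number_of_samples(10, 5))
```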
There are four types of random sampling techniques. They are:
A. Simple random sampling
B. Systematic random sampling
C. Stratified random sampling
D. Cluster random sampling
A. Simple Random Sampling
A sample is a simple random sample if it is selected in such a way that every unit or element in the population has an equal chance (probability) of being selected.
It is the simplest of the random sampling techniques.
Each and every element has an equal chance of being selected.
There is no personal bias.
All possible samples of the same size are equally likely to occur.
Example: Consider four people A, B, C, and D asking for vacation, but only two of them can leave at the same time.
Possible combinations of two people:
AB, AC, AD, BC, BD, CD
A appears three times => P(A) = 3/6 = 1/2
B appears three times => P(B) = 3/6 = 1/2
C appears three times => P(C) = 3/6 = 1/2
D appears three times => P(D) = 3/6 = 1/2
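The counting in the vacation example can be reproduced with a minimal Python sketch that lists every possible pair and counts how often each person appears. The helper name `inclusion_probability` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import combinations

people = ["A", "B", "C", "D"]

# Every unordered pair that could go on vacation together
samples = list(combinations(people, 2))


def inclusion_probability(person: str) -> Fraction:
    """Share of all possible samples that contain the given person."""
    hits = sum(1 for sample in samples if person in sample)
    return Fraction(hits, len(samples))


for person in people:
    print(person, inclusion_probability(person))
```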
B. Systematic Random Sampling
It is a method in which a random starting point is selected and then every kth member of the population is selected.
Steps:
Arrange (order) the elements of the population.
Determine the sampling interval k = N/n,
Where N = the population size
n = desired sample size
Select the starting point among the first k elements by simple random sampling.
Select every kth element from the starting point onward.
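The steps of systematic sampling can be sketched in Python as follows. This is only an illustration: the function name `systematic_sample` is hypothetical, the population is assumed to be already ordered, and the interval is taken as the integer part of N/n.

```python
import random


def systematic_sample(population, n, rng=None):
    """Pick a random start within the first k items, then every k-th item."""
    rng = rng or random.Random()
    N = len(population)
    k = N // n  # sampling interval (integer part of N/n)
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = rng.randrange(k)  # random starting position among the first k
    return [population[start + i * k] for i in range(n)]


# Usage: a sample of 10 from a population numbered 1 to 100 (interval k = 10)
sample = systematic_sample(list(range(1, 101)), 10, random.Random(0))
print(sample)
```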
C. Stratified Random Sampling
The population is divided into subgroups called strata, and a sample is randomly selected from each stratum.
It is used when the population consists of a number of heterogeneous sub-population groups (strata).
D. Cluster Random Sampling
It is used to reduce the cost of sampling from a population scattered over a large geographic area.
The population is divided into clusters using naturally occurring, geographic, or other boundaries.
Clusters are then randomly selected, and a sample is collected by randomly selecting elements from each chosen cluster.
2. Non-Probability (Non-random) Sampling
This technique does not use a random selection procedure. Here, the elements do not have a known chance of being selected. A non-random sample is selected by a procedure based on something other than probability considerations.
E.g. a sample may be chosen at the convenience of the person selecting it, or it may be selected by an expert on the basis of his or her judgment.
There are three types of non-random sampling.
They are:
A. Quota sampling
B. Judgment sampling
C. Convenience sampling
A. Quota Sampling
Quota sampling involves fixing certain quotas. Suppose that we want to conduct a survey of households in a certain territory that has 200,000 households in total, and that a sample of 1 percent, i.e. 2,000 households, is to be covered. We may fix certain controls. For instance, the sample may be selected subject to the condition that 1,200 households should be from rural areas and 800 from the urban areas of the territory. Likewise, another quota can be fixed: of the 2,000 sample households, 150 should be rich households, 650 middle class, and the remaining 1,200 poor. A sample selected by fixing quotas in this way is called a quota sample.
B. Judgmental Sampling
The main characteristic of judgmental sampling is that units or elements in the population are purposively selected. For this reason, judgment samples are also called purposive samples. When a small sample of a few units is to be selected, a judgment sample may be more suitable, as the errors of judgment are likely to be smaller than the random errors of a probability sample. However, when a large sample is to be selected, the bias in the selection could be quite large in the case of a judgmental sample.
C. Convenience Sampling
This method is based on the convenience of the statistician who is going to select the sample. This type of
sampling is also called accidental sampling as the respondents in the sample are included in it merely on
account of their being available on the spot where the survey is in progress.
1.1.4. Sampling and Non-Sampling Errors
A number of samples of the same size can be selected from a population. Different samples selected from the same population will give different results because the elements included in each sample are different. This gives rise to sampling error. The difference between a sample statistic, say the sample mean, and the corresponding population parameter, say the population mean, is called error.
Sampling error is the difference between a sample statistic and its corresponding population parameter.
Since the sample is only a part of the population, it is unlikely that the sample mean would be exactly equal to the population mean.
Example: Consider the output of a population of five employees: 97, 96, 103, 99, and 105, and take a sample of the output of two employees, say 97 and 105.
µ = (97 + 96 + 103 + 99 + 105)/5 = 100
x̄ = (97 + 105)/2 = 101
Error = µ − x̄ = 100 − 101 = −1
There are two types of errors. They are:
1. Sampling errors
2. Non-sampling errors
Sampling Errors: Errors that arise by chance because only a part of the population is observed. They are present even when an appropriate sampling technique is applied correctly.
Non-Sampling Errors: Errors that result from imperfections in collecting, recording, and processing the data. They include all sources of error other than those introduced by the random sampling procedure.
There are two types of non-sampling errors:
1. Administrative errors
- Data processing errors
- Sample selection errors
- Interview errors
2. Respondent errors
- Non response
- Response bias
1.2 Sampling Distribution
It is the probability distribution of all possible values of a sample statistic for a given sample size.
There are as many sampling distributions as there are sample statistics. Examples:
• Sampling distribution of the mean.
• Sampling distribution of the standard deviation
• Sampling distribution of the range.
• Sampling distribution of the proportion
1.2.1 Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is the probability distribution of all possible sample means for a given sample size.
Example 1
Suppose that each of the four typists making up the population of the secretarial support service in a company is asked to type the same page of a manuscript. The number of errors made by each typist is presented as follows.
Typist    Number of errors made
Ann       3
Bob       2
Carla     1
Dave      4
Required:
A. Calculate the mean and standard deviation of the population.
B. How many different samples of size 2 can be selected from the population?
C. Construct the sampling distribution of the sample mean for a sample size of two.
D. Compute the mean and standard deviation of the sampling distribution.
E. Compare the values in A and D. What conclusions can be made?
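One way to work through parts A to E is to enumerate every possible sample, as in the Python sketch below. It assumes sampling without replacement and unordered samples, and it compares the result with the finite population correction formula σ/√n · √((N − n)/(N − 1)), which is a standard result that these notes do not state for the single-mean case. All variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

errors = {"Ann": 3, "Bob": 2, "Carla": 1, "Dave": 4}
values = list(errors.values())
N, n = len(values), 2

# A. Population mean and (population) standard deviation
mu = mean(values)
sigma = pstdev(values)

# B and C. All samples of size 2 and their sample means
sample_means = [mean(s) for s in combinations(values, n)]

# D. Mean and standard deviation (standard error) of the sampling distribution
mean_of_means = mean(sample_means)
std_error = pstdev(sample_means)

# E. Standard error predicted by the finite population correction formula
theoretical_se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(mu, sigma)
print(sorted(sample_means))
print(mean_of_means, std_error, theoretical_se)
```

The mean of the sample means equals the population mean, while the standard error is smaller than the population standard deviation.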
Example 2: The scores of five students A, B, C, D, and E, out of six marks, are as follows:
Student   A   B   C   D   E
Score     0   3   6   3   5
Required:
A. What are the population mean and population standard deviation?
B. How many different samples of size two can be selected?
C. Construct the sampling distribution of the sample mean for a sample size of two.
D. What are the mean and standard deviation of the sampling distribution?
E. What conclusions can be made about the population and the sampling distribution?
Exercise: Given five elements of a population: 3, 6, 12, 15, 18
Required:
A. Compute the population mean and standard deviation.
B. How many different samples of size 3 are possible?
C. Construct the sampling distribution of the sample mean (x̄) for a sample size of three.
D. Compute the mean and standard deviation (standard error) of the sampling distribution of x̄.
E. Compare
i. The population mean and the mean of the sampling distribution of the sample mean.
ii. The population standard deviation and the standard deviation of the sampling distribution.
F. Does the sampling distribution of the sample mean show some tendency towards being normally distributed?
Central Limit Theorem
It states that:
1. If the population is normally distributed, then the sampling distribution of the sample mean is also normal for all sample sizes.
2. The sampling distribution of the sample mean is practically normal if the sample size is n ≥ 30.
A normal sampling distribution of the sample mean can be converted into the standard normal distribution by:
Z = (Value of RV − Mean of RV) / (Standard deviation of the RV)
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
Where:
$\bar{x}$ = sample mean
$\mu_{\bar{x}} = \mu$ = mean of the sample means = mean of the population
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$ = standard error of the mean
According to the central limit theorem, for a large sample size, the sampling distribution of the sample mean is approximately normal regardless of the shape of the population distribution. The sample size is considered large if n ≥ 30.
As the sample size increases, the shape of the sampling distribution of the sample mean comes closer and closer to a normal distribution.
Example 1: Hourly wages of workers in an industry have a mean of Br 5 per hour and a standard deviation of Br 0.60. What is the probability that the mean wage of a random sample of 50 workers will be
A. Greater than Br 5.10?
B. Less than Br 4.80?
C. Between Br 4.80 and Br 5.10?
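A minimal Python sketch of this example follows. Since n = 50 ≥ 30, it treats the sample mean as normal with mean µ and standard error σ/√n, and it uses `statistics.NormalDist` from the standard library instead of a printed z table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.0, 0.60, 50

# Sampling distribution of the sample mean under the central limit theorem
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, standard_error)

p_greater = 1 - sampling_dist.cdf(5.10)                         # A
p_less = sampling_dist.cdf(4.80)                                # B
p_between = sampling_dist.cdf(5.10) - sampling_dist.cdf(4.80)   # C

print(round(p_greater, 4), round(p_less, 4), round(p_between, 4))
```

The three probabilities sum to one because the three events cover every possible value of the sample mean.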
Exercise 1: Consider a firm manufacturing electronic watches whose useful life is approximately normally distributed with a mean life of 5 years and a standard deviation of 1.5 years. What is the probability that a random sample of 25 watches has an average life of
A. Between 5 and 6 years?
B. Between 4 and 5 years?
C. Less than 6 years?
D. Less than 4 years?
Exercise 2: The National Examination Bureau (NEB) of Ethiopia has found that the mean score of the 2000's examinees is 250 and the standard deviation of the scores is 48. The scores are normally distributed. If a random sample of 16 examinees is taken,
A. What is the probability that the sample mean score will be
i. Greater than 250?
ii. Greater than 270?
iii. Less than 230?
iv. Between 230 and 270?
B. What is the minimum average score of the highest 5 percent of the sample mean scores?
C. Between what two values will
i. 68% of the mean scores fall?
ii. 95% of the mean scores fall?
iii. 99.7% of the mean scores fall?
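1.2.2. Sampling Distribution of the Difference between Two Sample Means
If independent random samples of size n1 and n2 are drawn (without replacement) from two different finite populations:
                     Population 1    Population 2
Population size      N1              N2
Mean                 µ1              µ2
Standard deviation   σ1              σ2
Sample size          n1              n2
The mean and standard deviation of the sampling distribution of the difference between the two sample means are given by:
Mean:
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard deviation of the distribution (standard error):
Sampling with replacement, or n ≤ 0.05N:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Sampling without replacement:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}}$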
Example: A teacher has been giving tutorial classes to four accounting and four business administration female students for 8 weeks. Their test scores out of 10 are as follows.
            Accounting        Business Administration
Student     A   B   C   D     F   G   H   I
Score       6   5   7   9     7   8   4   10
The teacher wants to take a random sample of two students from each department.
Required:
A. Calculate the population means, the population standard deviations, and the difference between the population means.
B. Construct the sampling distribution of the difference between the two sample means.
C. Compute the mean and standard deviation of the sampling distribution.
D. Compare the values in A and C.
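The example can be explored by brute force, as in the Python sketch below. It assumes samples of two are drawn without replacement and independently from each department, pairs every accounting sample with every business administration sample, and compares the result with the without-replacement standard error formula. The function names `sample_means` and `corrected_variance` are illustrative.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, pstdev, pvariance

accounting = [6, 5, 7, 9]
business_admin = [7, 8, 4, 10]
n1 = n2 = 2


def sample_means(population, n):
    """Means of all unordered samples of size n drawn without replacement."""
    return [mean(s) for s in combinations(population, n)]


# B. Every pairing of an accounting sample with a business administration sample
diffs = [m1 - m2 for m1, m2 in product(sample_means(accounting, n1),
                                       sample_means(business_admin, n2))]

# A and C. Population difference versus mean and standard error of the differences
pop_diff = mean(accounting) - mean(business_admin)
mean_diff = mean(diffs)
se_diff = pstdev(diffs)


def corrected_variance(population, n):
    """Variance of the sample mean with the finite population correction."""
    N = len(population)
    return pvariance(population) / n * (N - n) / (N - 1)


theoretical_se = sqrt(corrected_variance(accounting, n1)
                      + corrected_variance(business_admin, n2))

print(pop_diff, mean_diff, se_diff, theoretical_se)
```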
Central Limit Theorem
1. If we have two normal populations, then the sampling distribution of the difference between two sample means is normal for all sample sizes.
2. If each sample size is at least 30, i.e. n1 ≥ 30 and n2 ≥ 30, then the sampling distribution of the difference between two sample means is approximately normal.
A normal sampling distribution of the difference between two sample means is standardized by using the following formula.
$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Example: Car stereos of manufacturer A have a mean lifetime of 1400 hours with a standard deviation of 200 hours, while those of manufacturer B have a mean lifetime of 1200 hours with a standard deviation of 100 hours. If a random sample of 125 stereos from each manufacturer is tested, what is the probability that manufacturer A's stereos will have a mean lifetime which is at least
1. 160 hours more than manufacturer B's stereos, and
2. 250 hours more than manufacturer B's stereos?
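A short Python sketch of the stereo example follows. It assumes that the two samples are independent and that, with 125 stereos each, the difference between the sample means is approximately normal. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sigma_a, n_a = 1400, 200, 125
mu_b, sigma_b, n_b = 1200, 100, 125

# Mean and standard error of the difference between the two sample means
mean_diff = mu_a - mu_b
se_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
diff_dist = NormalDist(mean_diff, se_diff)

p_at_least_160 = 1 - diff_dist.cdf(160)
p_at_least_250 = 1 - diff_dist.cdf(250)

print(se_diff, round(p_at_least_160, 4), round(p_at_least_250, 4))
```

Here the standard error works out to exactly 20 hours, so 160 and 250 hours correspond to z values of −2 and 2.5.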
1.2.3. Sampling Distribution of the Sample Proportion (Percentage)
It is the probability distribution of all possible sample proportions for a given (fixed) sample size.
This distribution is applicable in the analysis of qualitative (categorical) data, as opposed to quantitative data.
Example 1
Suppose that we have a population of five students who were asked if they wanted to become
managers. The answers were noted as follows
Student response
A YES
B NO
C YES
D YES
E NO
Required:
A. Find the population proportion.
B. Construct the sampling distribution of the sample proportion for a sample size of 4.
C. Compute the mean and standard deviation of the sampling distribution of the sample proportion.
D. What relation do you observe between the population proportion and the mean of the sampling distribution of the sample proportion?
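The sampling distribution in this example is small enough to enumerate directly. The Python sketch below codes YES as 1 and NO as 0 (an encoding chosen here for illustration), assumes sampling without replacement, and compares the enumerated standard error with the formula √(p(1 − p)/n · (N − n)/(N − 1)), a standard finite population result not stated in these notes.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# 1 = YES, 0 = NO
responses = {"A": 1, "B": 0, "C": 1, "D": 1, "E": 0}
values = list(responses.values())
N, n = len(values), 4

# A. Population proportion of YES answers
p = mean(values)

# B. Sample proportion for every possible sample of size 4
sample_props = [sum(s) / n for s in combinations(values, n)]

# C. Mean and standard deviation (standard error) of the sample proportions
mean_prop = mean(sample_props)
se_prop = pstdev(sample_props)

# Standard error predicted with the finite population correction
theoretical_se = sqrt(p * (1 - p) / n * (N - n) / (N - 1))

print(p, sorted(sample_props), mean_prop, se_prop, theoretical_se)
```

For part D, the mean of the sample proportions equals the population proportion.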
Example 2
Six applicants have applied for a recently advertised vacancy for Assistant Professor in the Business Management department. The department requires only two professors. Their marital status, as stated in their CVs, is as follows:
Applicant   Marital Status
Abebe       Single
Belete      Married
Chala       Single
Debele      Married
Elias       Married
Feyisa      Married
Nowadays, psychologists and job analysts recommend married employees. So, we are interested in the number of married applicants.
Required
a) Compute the population proportion.
b) Construct the sampling distribution of the sample proportion.
c) Find the mean and standard error of the sampling distribution of the sample proportion.
Central Limit Theorem
The central limit theorem is also applicable to the sampling distribution of the sample proportion.
1. A sample proportion is the sample mean of 0/1 (no/yes) values, so the result for the sample mean carries over to it.
2. The sampling distribution of the sample proportion is practically normal if the sample size is n ≥ 30.
A normal sampling distribution of the sample proportion can be converted into the standard normal distribution by:
Z = (Value of RV − Mean of RV) / (Standard deviation of the RV)
$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}}$, where $\bar{p}$ is the sample proportion, $p$ is the population proportion, and $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$ is the standard error of the proportion.
Exercise
A study conducted among FBE students revealed that 45% of them believed that the "Introduction to Business Statistics" course should be included in the curriculum. A random sample of 80 such students was taken.
Required
a) What is the probability that
i. More than half of them share this belief?
ii. More than 35% of them share this belief?
iii. Between 55% and 60% of them share this belief?
b) Between what two proportions will
i. 68% of the sample proportions fall?
ii. Virtually all of the sample proportions fall?
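Part (a) of this exercise can be approached with the normal approximation, as in the hedged Python sketch below. It assumes the sample proportion is approximately normal with mean p and standard error √(p(1 − p)/n) and ignores any finite population or continuity correction. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.45, 80

# Normal approximation to the sampling distribution of the sample proportion
standard_error = sqrt(p * (1 - p) / n)
prop_dist = NormalDist(p, standard_error)

p_more_than_half = 1 - prop_dist.cdf(0.50)
p_more_than_35 = 1 - prop_dist.cdf(0.35)
p_55_to_60 = prop_dist.cdf(0.60) - prop_dist.cdf(0.55)

print(round(standard_error, 4))
print(round(p_more_than_half, 4), round(p_more_than_35, 4), round(p_55_to_60, 4))
```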
1.2.4 Sampling Distribution of the Difference between Two Sample Proportions
Example
An instructor giving a tutorial class for female students wants to examine the difference in analytical and interpretation skill for statistical figures between business administration (BA) and accounting students. After a careful analysis, the teacher found the following information concerning their skill.
Business Administration              Accounting
Name    A    B    C    D    E        Name    F    G    H    I    J
Skill   YES  NO   YES  NO   YES      Skill   NO   YES  NO   YES  NO
Sampling is to be done without replacement, and a sample of four students is selected from each department.
Required:
A. Calculate the difference between the population proportions.
B. Construct the sampling distribution of the difference between the sample proportions for a sample size of four.
C. Calculate the mean and standard error of the difference between the sample proportions.
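The three requirements can be checked by enumeration, as in the Python sketch below. It codes YES as 1 and NO as 0 (an illustrative encoding), assumes the two samples are drawn independently and without replacement, and pairs every BA sample with every accounting sample. The function name `sample_proportions` is hypothetical.

```python
from itertools import combinations, product
from statistics import mean, pstdev

# 1 = YES (has the skill), 0 = NO
business_admin = [1, 0, 1, 0, 1]   # students A to E
accounting = [0, 1, 0, 1, 0]       # students F to J
n = 4


def sample_proportions(population, n):
    """Proportion of YES in every unordered sample of size n."""
    return [sum(s) / n for s in combinations(population, n)]


# A. Difference between the population proportions
pop_diff = mean(business_admin) - mean(accounting)

# B. Sampling distribution of the difference between the sample proportions
diffs = [p1 - p2 for p1, p2 in product(sample_proportions(business_admin, n),
                                       sample_proportions(accounting, n))]

# C. Mean and standard error of the differences
mean_diff = mean(diffs)
se_diff = pstdev(diffs)

print(pop_diff, sorted(set(diffs)), mean_diff, se_diff)
```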
Chapter Four
Statistical Estimations
Introduction
In business, there are several situations where managers have to make quick estimates. Since their estimates have an impact on the success or failure of their enterprises, they have to take sufficient care to ensure that their estimates are not far from the actual outcome. One thing we should note here is that estimates are made without complete information (from samples) and with a great deal of uncertainty about the eventual outcome. Therefore, in this chapter we are going to see different methods of statistical estimation.
Basic definitions
Statistical inference (inferential statistics): a method used to determine something about a population on the basis of the results of a sample drawn from that population. In statistical inference, statements about an unknown population parameter are derived from information obtained in a random sample selected from the population.
There are two types of inferential methods
1. Estimation
2. Hypothesis testing
Estimation: It is the process of approximating an unknown population parameter on the basis of known, calculated sample statistics.
There are two types of estimation: Point estimation and interval estimation.
Point estimation
Exercise 1: The elements in a random sample are 4, 10, 11, 13, 16, 18. Compute the following point estimates.
a. The estimate of the population mean
b. The estimate of the population standard deviation
c. The estimate of the standard error of the mean
Exercise 2: An attempt to investigate how many of the managers in both private and government institutions in Addis Ababa are management graduates revealed that 10% of a sample of 20 managers were management graduates. Compute the point estimates of
a. Population proportion
b. Standard error of the proportion
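Both exercises can be worked with a few lines of Python. The sketch below assumes the usual point estimators: the sample mean for µ, the sample standard deviation with divisor n − 1 for σ, s/√n for the standard error of the mean, and √(p̄(1 − p̄)/n) for the standard error of the proportion. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Exercise 1
sample = [4, 10, 11, 13, 16, 18]
n = len(sample)
x_bar = mean(sample)        # point estimate of the population mean
s = stdev(sample)           # point estimate of the population standard deviation
se_mean = s / sqrt(n)       # estimated standard error of the mean

# Exercise 2
p_bar, m = 0.10, 20         # sample proportion and sample size
se_prop = sqrt(p_bar * (1 - p_bar) / m)  # estimated standard error of the proportion

print(x_bar, round(s, 4), round(se_mean, 4))
print(p_bar, round(se_prop, 4))
```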
Interval estimates
An interval estimate consists of two values between which the true population value is expected to lie with some stated level of confidence.
Interval estimation of the population mean
An interval estimate of the population mean (µ) is an interval of values a to b within which the unknown population mean is expected to lie.
The interval is an inference based on
i. The value of the sample mean (x̄) of a simple random sample selected from the population, and
ii. Known facts about the sampling distribution of the sample mean.
We cannot be 100% certain that such an interval is correct, because the interval is estimated from sample results and a sample is only a part of the population. A confidence interval estimate of the population mean is an interval estimate together with a statement of how confident (e.g. 90%, 95%) we are that the interval is correct.
The choice of method for constructing a confidence interval for the population mean (µ) depends upon
Whether or not the population is normal, and
Whether the population standard deviation (σ) is known or unknown.
Case 1: Large sample case (n ≥ 30), with population standard deviation (σ) known
$CI:\ \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Where:
$\bar{x}$ = the sample mean
c = 1 − α = the confidence coefficient
$z_{\alpha/2}$ = the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ = the population standard deviation
n = the sample size
Example 1
The Ethiopian Management Institute (EMI) wishes to have information on the mean income of managers in small scale industries. A random sample of 256 managers revealed a sample mean of Birr 45,420. The population standard deviation is Birr 2,050.
Required: develop a confidence interval estimate of the mean income of managers in small scale industries with
i. 95% confidence coefficient
ii. 90% confidence coefficient
iii. α = 0.05
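The Case 1 interval for this example can be sketched in Python as follows. The helper `z_confidence_interval` is a hypothetical name, and the z value is taken from `statistics.NormalDist().inv_cdf` rather than a printed table, so results may differ slightly from hand calculations that use rounded z values.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Interval x_bar ± z_(alpha/2) * sigma / sqrt(n) for a known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail area of alpha/2
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# i. and iii. are the same interval, since alpha = 0.05 means c = 0.95
low_95, high_95 = z_confidence_interval(45420, 2050, 256, 0.95)
# ii.
low_90, high_90 = z_confidence_interval(45420, 2050, 256, 0.90)

print(round(low_95, 2), round(high_95, 2))
print(round(low_90, 2), round(high_90, 2))
```

The 90% interval is narrower than the 95% interval because a lower confidence coefficient requires a smaller z value.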
Exercise: A time study has been carried out at a manufacturing plant to determine how long it takes on average to assemble a tractor. The assembly times (in hours) in a random sample were
3.6 4.2 4 3.5 3.8 3.1
Suppose the population of assembly times has a normal distribution whose standard deviation is 0.30 hour.
Required: Construct a confidence interval estimate of the mean assembly time for
a. c = 0.98
b. α = 0.1096
c. 68% confidence
Case 2: Large sample case (n ≥ 30), with population standard deviation (σ) unknown
In most applications the value of the population standard deviation is unknown. We simply use the value of the sample standard deviation, s, as the point estimate of the population standard deviation.
$CI:\ \bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Example: National Discount has 260 retail outlets throughout the United States. National evaluates each
potential location for a new retail outlet in part on the mean annual income of the individuals in the
marketing area of the new location. Sampling can be used to develop an interval estimate of the mean
annual income for individuals in a potential marketing area for National Discount. A sample of size n = 36
was taken. The sample mean is $21,100 and the sample standard deviation(s) is $4,500. Develop a 95%
confidence interval estimate for the mean annual income of individuals.
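As a worked sketch of the Case 2 formula for this example, taking the rounded table value $z_{0.025} \approx 1.96$ (an assumption of this sketch):

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}
  = 21{,}100 \pm 1.96 \times \frac{4{,}500}{\sqrt{36}}
  = 21{,}100 \pm 1.96 \times 750
  = 21{,}100 \pm 1{,}470
```

Under that assumption, the 95% confidence interval estimate of the mean annual income runs from about $19,630 to $22,570.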