Amity Business School
Sampling Distributions
RSR
Sampling Distributions…
A sampling distribution is created by, as the name suggests,
sampling. There are two ways to create a sampling distribution.
• The first is to actually draw samples of the same size from a
population, calculate the statistic of interest, and then use
descriptive techniques to learn more about the sampling
distribution.
• The second method, which we will employ, relies on the rules of probability and the
laws of expected value and variance to derive the sampling
distribution.
For example, consider the roll of one and two dice…
Sampling Distribution of the Mean…
A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
The probability distribution of X is:
x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6
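…and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 1(1/6) + 2(1/6) + \cdots + 6(1/6) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = (1 - 3.5)^2(1/6) + \cdots + (6 - 3.5)^2(1/6) = 2.92$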
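The mean and variance of one die can be checked exactly. A minimal Python sketch, using the standard-library `fractions.Fraction` type so that 1/6 is never rounded:

```python
from fractions import Fraction

# Probability distribution of X = number of spots on one throw of a fair die
faces = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in faces)                  # E(X) = sum of x * P(x)
var = sum((x - mu) ** 2 * p for x in faces)     # V(X) = sum of (x - mu)^2 * P(x)

print(mu, float(mu))                # 7/2 3.5
print(var, round(float(var), 2))    # 35/12 2.92
```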
A sampling distribution is created by looking at all samples of
size n=2 (i.e. two dice) and their means…
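The 36 equally likely samples of size 2 can be enumerated directly. The following Python sketch (standard library only; the variable names are illustrative) counts how often each sample mean occurs, reproduces the probabilities 1/36, 2/36, …, 6/36, …, 1/36, and computes the mean and variance of X̄:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered samples of size n = 2 from the die population: 6 * 6 = 36 samples
samples = list(product(range(1, 7), repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Every sample has probability 1/36, so P(x̄) = (number of samples with that mean) / 36
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}
for m in dist:
    print(float(m), f"{counts[m]}/36")

mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())
print(mean_xbar, var_xbar)   # 7/2 35/24
```

The variance 35/24 is about 1.46, half of the population variance 35/12.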
All Samples of Size 2 from a Population
Sampling distribution of X̄ (n = 2):
x̄ 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
P(x̄) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
[Figure: bar chart of P(x̄) against x̄ = 1.0, 1.5, …, 6.0]
Compare…
Compare the distribution of X…
[Figure: P(x) for x = 1, …, 6 beside P(x̄) for x̄ = 1.0, 1.5, …, 6.0]
…with the sampling distribution of X̄.
As well, note that:
$\mu_{\bar{x}} = 3.5 = \mu_x$
$\sigma^2_{\bar{x}} = 1.46 = \sigma^2_x / 2$
Generalize…
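We can generalize the mean and variance of the sampling distribution for two dice:
$\mu_{\bar{x}} = \mu_x, \quad \sigma^2_{\bar{x}} = \sigma^2_x / 2$
…to n dice:
$\mu_{\bar{x}} = \mu_x, \quad \sigma^2_{\bar{x}} = \sigma^2_x / n$
The standard deviation of the sampling distribution is called the standard error:
$\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$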
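The generalization $\sigma^2_{\bar{x}} = \sigma^2_x / n$ can be checked exactly for small n by enumerating every possible sample. A Python sketch; the helper name `exact_mean_and_variance_of_xbar` is hypothetical, and n is kept at 4 or below because the number of samples grows as 6^n:

```python
from fractions import Fraction
from itertools import product
from math import sqrt

FACES = range(1, 7)
sigma2 = Fraction(35, 12)   # variance of a single die, from P(x) = 1/6

def exact_mean_and_variance_of_xbar(n):
    """Enumerate all 6**n equally likely samples of size n; return (E(X̄), V(X̄))."""
    means = [Fraction(sum(s), n) for s in product(FACES, repeat=n)]
    k = len(means)
    m = sum(means) / k
    v = sum((x - m) ** 2 for x in means) / k
    return m, v

for n in range(1, 5):
    m, v = exact_mean_and_variance_of_xbar(n)
    # Columns: n, E(X̄), V(X̄), whether V(X̄) equals sigma^2 / n, standard error
    print(n, m, v, v == sigma2 / n, round(sqrt(float(v)), 3))
```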
It is important to recognize that the distribution of X̄ is
different from the distribution of X. However, the two
random variables are related.
Their means are the same ($\mu_{\bar{x}} = \mu_x = 3.5$) and their variances
are related ($\sigma^2_{\bar{x}} = \sigma^2_x / 2$).
If we now repeat the sampling process with the same
population but with other values of n (for example,
n = 5, 10 and 25), we produce somewhat different
sampling distributions of X̄.
The variance of the sampling distribution of X̄ is less than the
variance of the population we’re sampling from for every sample
size n > 1.
Thus, a randomly selected value of X̄ (the mean of the number
of spots in, say, five throws of the die) is likely to be closer to
the mean value of 3.5 than is a randomly selected value of X
(the number of spots observed in one throw). Indeed, this is
what one would expect, because in five throws of the die one
is likely to get some 5s and 6s and some 1s and 2s, which will
tend to offset one another in the averaging process and produce a
sample mean reasonably close to 3.5. As the number of throws
of the die increases, the probability that the sample mean
will be close to 3.5 also increases. Thus we observe that the
sampling distribution of X̄ becomes narrower as n increases.
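The narrowing can also be seen by simulation, which is the first (empirical) method described at the start. A sketch assuming 20,000 simulated samples per sample size and a fixed random seed; exact figures vary with the seed, but the standard deviation of the sample means should track σ/√n and the share of means within 0.5 of 3.5 should rise with n:

```python
import random
import statistics

random.seed(1)                 # fixed seed so the run is reproducible
SIGMA = (35 / 12) ** 0.5       # population sd of one die, about 1.708
REPS = 20_000                  # simulated samples for each sample size

results = {}
for n in (1, 5, 10, 25):
    # Throw the die n times, average the spots, and repeat REPS times
    means = [statistics.fmean(random.choices(range(1, 7), k=n)) for _ in range(REPS)]
    spread = statistics.pstdev(means)                       # empirical standard error
    near = sum(abs(m - 3.5) <= 0.5 for m in means) / REPS   # share of means within 3.5 ± 0.5
    results[n] = (spread, near)
    print(f"n={n:2d}  sd={spread:.3f}  sigma/sqrt(n)={SIGMA / n ** 0.5:.3f}  near 3.5: {near:.3f}")
```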
Central Limit Theorem…
The sampling distribution of the mean of a random sample drawn from
any population is approximately normal for a sufficiently large
sample size.
The larger the sample size, the more closely the sampling distribution
of X̄ will resemble a normal distribution.
Central Limit Theorem…
If the population is normal, then X̄ is normally distributed for all
values of n.
If the population is non-normal, then X̄ is approximately normal only
for larger values of n.
In most practical situations, a sample size of 30 may be sufficiently
large to allow us to use the normal distribution as an approximation
for the sampling distribution of X̄.
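The central limit theorem can be illustrated the same way with a clearly non-normal population. This sketch assumes an exponential population with mean 1 and standard deviation 1 (strongly right-skewed; a choice made for illustration, not taken from the slides) and checks how often X̄ lands beyond µ ± 1.96σ/√n in each direction. For an exactly normal X̄ each tail would hold .025; for n = 1 the tails are lopsided, and they move toward .025 as n grows:

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)                     # fixed seed so the run is reproducible
REPS = 20_000                      # simulated samples for each sample size
MU, SIGMA = 1.0, 1.0               # exponential population (rate 1): mean 1, sd 1, right-skewed
Z = NormalDist().inv_cdf(0.975)    # about 1.96

def tail_shares(n):
    """Share of sample means below MU - Z*se and above MU + Z*se.
    For an exactly normal X̄, each share would be .025."""
    se = SIGMA / n ** 0.5
    low = high = 0
    for _ in range(REPS):
        xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
        low += xbar < MU - Z * se
        high += xbar > MU + Z * se
    return low / REPS, high / REPS

tails = {n: tail_shares(n) for n in (1, 5, 30)}
for n, (low, high) in tails.items():
    print(f"n={n:2d}  below={low:.3f}  above={high:.3f}  (normal: .025 each)")
```

Even at n = 30 the two tails are not perfectly balanced for this skewed population, which is consistent with the point that "sufficiently large" depends on how nonnormal the population is.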
Sampling Distribution of the Sample Mean
1. $\mu_{\bar{x}} = \mu_x$
2. $\sigma^2_{\bar{x}} = \sigma^2_x / n$ and $\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
3. If X is normal, X̄ is normal. If X is nonnormal, X̄ is approximately
normal for sufficiently large sample sizes.
Note: the definition of “sufficiently large” depends on the extent of
nonnormality of X (e.g. heavily skewed; multimodal).
Sampling Distribution of the Sample Mean
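We can express the sampling distribution of the mean
simply as
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
which is standard normal when X̄ is normal and approximately standard normal when X̄ is approximately normal.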
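Because Z is just X̄ standardized, computing a probability through Z or directly from a normal distribution with standard deviation σ/√n gives the same answer. A sketch using Python's standard-library `statistics.NormalDist`; the values µ = 50, σ = 10, n = 25 and x̄ = 53 are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(xbar, mu, sigma, n):
    """Z = (X̄ - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

mu, sigma, n, xbar = 50, 10, 25, 53          # hypothetical values
z = z_for_mean(xbar, mu, sigma, n)           # (53 - 50) / 2 = 1.5
p_via_z = NormalDist().cdf(z)                           # P(Z <= 1.5)
p_direct = NormalDist(mu, sigma / sqrt(n)).cdf(xbar)    # P(X̄ <= 53), sd = sigma/sqrt(n)
print(z, round(p_via_z, 4), round(p_direct, 4))         # 1.5 0.9332 0.9332
```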
Sampling Distribution of the Sample Mean
The summaries above assume that the population is infinitely large.
However, if the population is finite, the standard error is
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$
where N is the population size and
$\sqrt{\dfrac{N - n}{N - 1}}$
is the finite population correction factor.
Sampling Distribution of the Sample Mean
If the population size is large relative to the sample size, the
finite population correction factor is close to 1 and can be
ignored.
We will treat any population that is at least 20 times larger
than the sample size as large.
In practice, most applications involve populations that qualify
as large.
As a consequence, the finite population correction factor is
usually omitted.
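A minimal sketch of the standard error with an optional finite population correction, plus the size of the correction factor at a few population-to-sample ratios (the function names are illustrative):

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of X̄; applies the correction only when a population size N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= fpc(n, N)
    return se

print(round(fpc(30, 600), 4))      # N = 20n:   0.9755
print(round(fpc(30, 30_000), 4))   # N = 1000n: 0.9995
print(round(fpc(200, 2000), 4))    # N = 10n:   0.9489
```

At N = 20n the factor is already about .98, which is why such populations are treated as large.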
Contents of a “32-Ounce” Bottle
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of
32.2 ounces and a standard deviation of .3 ounce.
a. If a customer buys one bottle, what is the
probability that the bottle will contain more than
32 ounces?
b. If a customer buys a carton of four bottles, what is
the probability that the mean amount of the four
bottles will be greater than 32 ounces?
Distribution of X and Sampling Distribution of X̄
Solution
a. Because the random variable is the amount of soda in
one bottle, we want to find
P(X > 32), where X is normally distributed, µ = 32.2,
and σ = .3. Hence,
$P(X > 32) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{32 - 32.2}{.3}\right) = P(Z > -.67) = .5 + .2486 = .7486$
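Part a can be reproduced with Python's standard-library `statistics.NormalDist` in place of the z table. Because it does not round z to two decimals, the result (about .7475) differs slightly from the table-based .7486:

```python
from statistics import NormalDist

mu, sigma = 32.2, 0.3              # fill amount per bottle, in ounces
bottle = NormalDist(mu, sigma)

z = (32 - mu) / sigma              # about -0.667 (the table solution rounds to -.67)
p_a = 1 - bottle.cdf(32)           # P(X > 32)
print(round(z, 3), round(p_a, 4))
```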
Solution
b. Now we want to find the probability that the mean
amount of four filled bottles exceeds 32 ounces. That is,
we want P(X̄ > 32). From our previous analysis and
from the central limit theorem, we know the following:
1. X̄ is normally distributed.
2. $\mu_{\bar{x}} = \mu = 32.2$
3. $\sigma_{\bar{x}} = \sigma / \sqrt{n} = .3 / \sqrt{4} = .15$
$P(\bar{X} > 32) = P\left(\dfrac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \dfrac{32 - 32.2}{.15}\right) = P(Z > -1.33) = .5 + .4082 = .9082$
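Part b in the same style; the only change is that the standard deviation becomes the standard error σ/√n = .15. Without rounding z to -1.33, the result is about .9088 rather than .9082:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4
se = sigma / sqrt(n)               # standard error: .3 / 2 = .15
xbar = NormalDist(mu, se)          # X̄ is normal because the population is normal
p_b = 1 - xbar.cdf(32)             # P(X̄ > 32)
print(round(se, 2), round(p_b, 4))
```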
In the example, we began with the assumption that both µ
and σ were known.
Then, using the sampling distribution, we made a
probability statement about the sample mean.
Unfortunately, the values of µ and σ are not usually known,
so an analysis such as that in this example cannot usually
be conducted.
However, we can use the sampling distribution to infer
something about an unknown value of µ on the basis of a
sample mean.
EXERCISE
The number of pizzas consumed per month by university
students is normally distributed with a mean of 10 and a
standard deviation of 3.
a. What proportion of students consume more than 12
pizzas per month?
b. What is the probability that, in a random sample of
25 students, more than 275 pizzas are consumed?
Solution
$P(X > 12) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{12 - 10}{3}\right)$
$= P(Z > .67) = .5 - P(0 < Z < .67)$
$= .5 - .2486$
$= .2514$
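A sketch that evaluates both parts of the exercise with `statistics.NormalDist`. For part b it uses the same approach as the bottle and supermarket examples: more than 275 pizzas for 25 students is the same event as X̄ > 275/25 = 11, with standard error 3/√25 = .6. Unrounded z values give about .2525 for part a (the table gives .2514) and about .0478 for part b:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 10, 3                               # pizzas per student per month

# a. Proportion of students who eat more than 12 pizzas
p_a = 1 - NormalDist(mu, sigma).cdf(12)

# b. Total > 275 for n = 25 students is the same event as X̄ > 275 / 25 = 11
n, total = 25, 275
se = sigma / sqrt(n)                            # 3 / 5 = .6
p_b = 1 - NormalDist(mu, se).cdf(total / n)
print(round(p_a, 4), round(p_b, 4))
```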
EXERCISE
The number of customers who enter a supermarket each
hour is normally distributed with a mean of 600 and a
standard deviation of 200. The supermarket is open 16
hours per day. What is the probability that the total
number of customers who enter the supermarket in one
day is greater than 10,000?
Solution
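$P(\text{daily total} > 10{,}000) = P(\bar{X} > 10{,}000 / 16)$
$= P(\bar{X} > 625)$
$= P\left(\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} > \dfrac{625 - 600}{200 / \sqrt{16}}\right)$
$= P(Z > .50)$
$= .5 - P(0 < Z < .50)$
$= .5 - .1915$
$= .3085$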
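The same calculation as a short Python sketch; here z = .50 exactly, so the result matches the table value .3085:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 600, 200      # customers per hour
hours = 16                # the 16 opening hours play the role of the sample
se = sigma / sqrt(hours)  # standard error of the hourly mean: 200 / 4 = 50

# Total > 10,000 is the same event as the hourly mean X̄ > 10,000 / 16 = 625
p = 1 - NormalDist(mu, se).cdf(10_000 / hours)
print(round(p, 4))        # 0.3085
```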
Salaries of a Business School’s Graduates
In the advertisements for a large university, the dean of
the School of Business claims that the average salary of the
school’s graduates one year after graduation is $800 per
week with a standard deviation of $100. A second-year
student in the business school who has just completed his
course would like to check whether the claim about the
mean is correct. He surveys 25 people who
graduated one year ago and determines their weekly
salaries. He discovers the sample mean to be $750. To
interpret his findings, he needs to calculate the probability
that a sample of 25 graduates would have a mean of $750
or less when the population mean is $800 and the
standard deviation is $100.
Solution
We want to find the probability that the sample mean is less than $750.
Thus we seek P(X̄ < 750).
The distribution of X, the weekly income, is likely to be positively skewed,
but not sufficiently so to make the distribution of X̄ nonnormal.
As a result, we may assume that X̄ is normal with mean $\mu_{\bar{x}} = \mu = 800$
and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 100 / \sqrt{25} = 20$. Thus
$P(\bar{X} < 750) = P\left(\dfrac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \dfrac{750 - 800}{20}\right) = P(Z < -2.5) = .5 - .4938 = .0062$
The probability of observing a sample mean as low as $750 when
the population mean is $800 is extremely small. Because this event is quite
unlikely, we would have to conclude that the claim is not justified.
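A sketch of the salary calculation with `statistics.NormalDist`; z = -2.5 exactly, giving about .0062:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 100, 25      # claimed mean and sd of weekly salary; survey size
se = sigma / sqrt(n)             # 100 / 5 = 20
p = NormalDist(mu, se).cdf(750)  # P(X̄ < 750), i.e. P(Z < -2.5)
print(round(p, 4))               # 0.0062
```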
Using the Sampling Distribution for Inference
• We know that $z_A$ is the value of z such that the area
to the right of $z_A$ under the standard normal curve is
equal to A.
• From the standard normal table we can show that $z_{.025} = 1.96$.
• Because the standard normal distribution is
symmetric about 0, the area to the left of -1.96 is
also .025.
• The area between -1.96 and 1.96 is .95.
• We can express this algebraically as:
P(-1.96 < Z < 1.96) = .95
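Instead of reading $z_A$ from a table, it can be obtained from the inverse of the standard normal CDF. A sketch using `statistics.NormalDist.inv_cdf`; the helper name `z_upper` is hypothetical:

```python
from statistics import NormalDist

Z = NormalDist()                   # standard normal: mean 0, sd 1

def z_upper(A):
    """z_A: the z value with area A to its right under the standard normal curve."""
    return Z.inv_cdf(1 - A)

z025 = z_upper(0.025)
print(round(z025, 2))                           # 1.96
print(round(Z.cdf(z025) - Z.cdf(-z025), 4))     # 0.95
```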
Using the Sampling Distribution for Inference
Here’s another way of expressing the probability calculated
from a sampling distribution.
P(-1.96 < Z < 1.96) = .95
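Substituting the formula for the sampling distribution
$P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = .95$
With a little algebra
$P\left(\mu - 1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = .95$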
Using the Sampling Distribution for Inference
We can also produce a general form of this statement:
$P\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
In this formula, α (Greek letter alpha) is the probability that X̄
does not fall into the interval.
To apply this formula, all we need to do is substitute the values for
µ, σ, n, and α.
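The general interval translates directly into a small function. A sketch, assuming X̄ is normal or approximately normal; the function name and the illustrative values µ = 50, σ = 10, n = 25 are hypothetical. A smaller α gives a larger $z_{\alpha/2}$ and therefore a wider interval:

```python
from math import sqrt
from statistics import NormalDist

def xbar_interval(mu, sigma, n, alpha):
    """Interval that contains X̄ with probability 1 - alpha:
    mu - z_{alpha/2} * sigma/sqrt(n)  to  mu + z_{alpha/2} * sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: area alpha/2 to its right
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

# Hypothetical population values, used only to show the effect of alpha
for alpha in (0.10, 0.05, 0.01):
    lo, hi = xbar_interval(50, 10, 25, alpha)
    print(f"alpha={alpha:.2f}: {lo:.2f} to {hi:.2f}")
```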
Salaries of a Business School’s Graduates
$P\left(800 - 1.96\dfrac{100}{\sqrt{25}} < \bar{X} < 800 + 1.96\dfrac{100}{\sqrt{25}}\right) = .95$
$P(760.8 < \bar{X} < 839.2) = .95$
This tells us that there is a 95% probability that a sample
mean will fall between 760.8 and 839.2. Because the sample mean
was computed to be $750, we would have to conclude that
the dean’s claim is not supported by the statistic.
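The salary interval computed the same way, with $z_{.025}$ taken from `NormalDist.inv_cdf` rather than rounded to 1.96 (the endpoints still round to 760.8 and 839.2):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, alpha = 800, 100, 25, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{.025}, about 1.96
lo = mu - z * sigma / sqrt(n)
hi = mu + z * sigma / sqrt(n)
observed = 750                             # the student's sample mean
print(round(lo, 1), round(hi, 1), lo <= observed <= hi)   # 760.8 839.2 False
```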
Exercise
Sampling Distribution of a Proportion…
The estimator of a population proportion of successes is the sample
proportion. That is, we count the number of successes in a sample and
compute:
$\hat{P} = \dfrac{X}{n}$
(read this as “p-hat”),
where X is the number of successes and n is the sample size.
Sampling Distribution of Proportion
For example, suppose that we have a binomial experiment with n
= 10 and p = .4. To find the probability that the sample proportion
P̂ is less than or equal to .50, we find the probability that X is
less than or equal to 5 (because X/n = 5/10 = .50).
After calculations,
$P(\hat{P} \le .50) = P(X \le 5) = .834$
We can calculate the probability associated with other values of
P̂ similarly.
Discrete distributions such as the binomial do not lend themselves
easily to the kinds of calculations needed for inference, and
inference is the reason we need sampling distributions.
Fortunately, we can approximate the binomial distribution by a
normal distribution.
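The binomial probability .834 quoted above can be computed by summing the binomial probabilities for X = 0, 1, …, 5. A sketch using `math.comb` (Python 3.8+); the helper name is illustrative:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

n, p = 10, 0.4
# P̂ = X / n, so P(P̂ <= .50) = P(X <= 5)
print(round(binom_cdf(5, n, p), 4))   # 0.8338
```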
Normal Approximation to Binomial Distribution
[Figure: Binomial Distribution with n = 20 and p = .5]
• Let X be a binomial random variable with n = 20 and p = .5.
• We can easily determine the probability of each value of X, where X = 0, 1, 2, …, 19, 20.
• A rectangle representing a value of x is drawn so that its area equals the probability.
Normal Approximation to Binomial Distribution
• If we now smooth the ends of the rectangles, we produce a bell-shaped curve.
• The base of each rectangle for x is the interval x - .5 to x + .5.
• As you can see, the rectangle representing x = 10 is the rectangle whose base is the interval 9.5 to 10.5 and whose height is P(X = 10) = .176.
[Figure: Binomial Distribution with n = 20 and p = .5 and Normal Approximation]
Normal Approximation to Binomial Distribution
• We make the area of each rectangle equal to the probability by letting the height of the rectangle equal the probability and the base of the rectangle equal 1.
• Thus, to use the normal approximation for P(X = 10), all we need to do is find the area under the normal curve between 9.5 and 10.5.
• Finding normal probabilities requires us to first standardize X by subtracting the mean and dividing by the standard deviation.
Normal Approximation to Binomial Distribution
• The values for µ and σ are derived from the binomial distribution being approximated as:
$\mu = np$ and $\sigma = \sqrt{npq}$
• For n = 20 and p = .50 we have µ = 10 and σ = 2.24.
Normal Approximation to Binomial Distribution
• To calculate the probability that X = 10 using the normal distribution requires that we find the area under the normal curve between 9.5 and 10.5, i.e.,
$P(X = 10) \approx P(9.5 < Y < 10.5) = P\left(\dfrac{9.5 - 10}{2.24} < \dfrac{Y - \mu}{\sigma} < \dfrac{10.5 - 10}{2.24}\right) = P(-.22 < Z < .22) = 2(.0871) = .1742$
where Y is a normal random variable with µ = 10 and σ = 2.24.
• The actual probability that X equals 10 is P(X = 10) = .176. As we can see, the approximation is quite good.
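A sketch comparing the exact binomial probability P(X = 10) with the normal approximation over 9.5 to 10.5, using unrounded z values. The approximation comes out near .1769, a little closer to the exact .1762 than the table-based .1742:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
mu = n * p                              # np = 10
sigma = sqrt(n * p * (1 - p))           # sqrt(npq), about 2.236
Y = NormalDist(mu, sigma)               # approximating normal distribution

exact = comb(n, 10) * p**10 * (1 - p) ** (n - 10)   # binomial P(X = 10)
approx = Y.cdf(10.5) - Y.cdf(9.5)                    # normal area from 9.5 to 10.5
print(round(exact, 4), round(approx, 4))
```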
Approximate Sampling Distribution of a Sample Proportion
Using the laws of expected value and variance, we can determine the mean,
variance, and standard deviation of P̂. (The standard deviation of P̂ is known
as the standard error of the proportion.) That is, with q = 1 - p,
$E(\hat{P}) = p$
$V(\hat{P}) = \sigma^2_{\hat{p}} = \dfrac{pq}{n}$
$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$
Thus the variable
$Z = \dfrac{\hat{P} - p}{\sqrt{pq/n}}$
is approximately standard normally distributed provided that the sample size is
large. The theoretical sample size requirements are that np and nq are both greater
than or equal to 5. This requirement is referred to as theoretical because in practice
much larger sample sizes are needed for the inference to be useful.
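The formulas above as small Python helpers (the names are illustrative). The requirement check encodes only the theoretical np ≥ 5 and nq ≥ 5 rule; as noted, practical work usually needs larger samples:

```python
from math import sqrt

def proportion_se(p, n):
    """Standard error of the sample proportion: sqrt(p*q/n) with q = 1 - p."""
    return sqrt(p * (1 - p) / n)

def meets_theoretical_requirement(p, n):
    """Theoretical minimum for the normal approximation: np >= 5 and nq >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def z_for_proportion(p_hat, p, n):
    """Standardize an observed sample proportion."""
    return (p_hat - p) / proportion_se(p, n)

print(meets_theoretical_requirement(0.4, 10))    # False: np = 4
print(meets_theoretical_requirement(0.52, 300))  # True
print(round(proportion_se(0.52, 300), 4))        # 0.0288
```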
Political Survey
In the last election a Member of Parliament received 52% of the
votes cast. One year after the election the representative
organized a survey that asked a random sample of 300 people
whether they would vote for him in the next election. If we
assume that his popularity has not changed, what is the
probability that more than half of the sample would vote for him?
Solution
• The number of respondents who would vote for the representative is
a binomial random variable with n = 300 and p = .52. We want to
determine the probability that the sample proportion is greater than
50%. That is, we want to find P(P̂ > .50).
• We now know that the sample proportion is approximately normally
distributed with mean p = .52 and standard deviation
$\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(.52)(.48)/300} = .0288$
Thus we calculate
$P(\hat{P} > .50) = P\left(\dfrac{\hat{P} - p}{\sqrt{pq/n}} > \dfrac{.50 - .52}{.0288}\right) = P(Z > -.69) = .5 + .2549 = .7549$
If we assume the level of support remains at 52%, the probability that
more than half the sample of 300 people would vote for the
representative is 75.49%.
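The survey calculation as a sketch with `statistics.NormalDist`; without rounding z to -.69, the probability is about .756:

```python
from math import sqrt
from statistics import NormalDist

n, p = 300, 0.52                          # sample size; support at the last election
se = sqrt(p * (1 - p) / n)                # standard error of P̂, about .0288
prob = 1 - NormalDist(p, se).cdf(0.50)    # P(P̂ > .50)
print(round(se, 4), round(prob, 4))
```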
Housing Board Colony
A housing board colony of Gwalior consists of
2000 houses. A researcher wants to know the
average income of the households in this
housing board colony. The mean income per
household is Rs 150,000 with a standard
deviation of Rs 15,000. A random sample of 200
households is selected by a researcher and
analysed. What is the probability that the
sample average is greater than Rs 160,000?
Solution
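One way to work this problem, sketched in Python. It assumes X̄ is approximately normal (n = 200 is large) and applies the finite population correction, because the population of 2000 households is only 10 times the sample size, below the 20-times threshold given earlier. The upper tail is computed with `math.erfc` so that a very small probability does not round to zero:

```python
from math import erfc, sqrt

N, n = 2000, 200                 # population (houses) and sample size
mu, sigma = 150_000, 15_000      # population mean and sd of income, Rs

fpc = sqrt((N - n) / (N - 1))    # N is only 10 times n, so keep the correction
se = sigma / sqrt(n) * fpc       # corrected standard error, about Rs 1,006
z = (160_000 - mu) / se          # about 9.94

# Upper-tail normal probability P(Z > z) = erfc(z / sqrt(2)) / 2;
# erfc keeps precision where 1 - cdf(z) would round to 0.0
prob = 0.5 * erfc(z / sqrt(2))
print(round(se, 1), round(z, 2), prob)
```

With a standard error of about Rs 1,006, Rs 160,000 lies nearly 10 standard errors above the mean, so the probability is effectively zero.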
Exercise
Summary
• The sampling distribution of a statistic is
created by repeated sampling from one
population. Here, we introduced the sampling
distributions of the mean, the proportion and the
difference between two means. We described how
these distributions are created theoretically and
empirically.
From Here to Inference
Earlier, we introduced probability distributions, which allowed us to
make probability statements about values of the random
variable.
A prerequisite of this calculation is knowledge of the
distribution and the relevant parameters.
From Here to Inference
The figure below symbolically represents the use of probability
distributions.
Simply put, knowledge of the population and its parameter(s) allows us
to use the probability distribution to make probability statements about
individual members of the population.
From Here to Inference
In this section, we developed the sampling distribution, wherein knowledge of the
parameter(s) and some information about the distribution allow us to
make probability statements about a sample statistic.
From Here to Inference
Statistical inference works by reversing the direction of the flow of knowledge in the previous
figure. The next figure displays the character of statistical inference.
From now on, we will assume that most population parameters are unknown. The
statistics practitioner will sample from the population and compute the required
statistic. The sampling distribution of that statistic will enable us to draw inferences
about the parameter.