Sampling and
Sampling Distributions
Tools of Business Statistics
Descriptive statistics
Collecting, presenting, and describing data
Inferential statistics
Drawing conclusions and/or making decisions
concerning a population based only on
sample data
Populations and Samples
A Population is the set of all items or individuals
of interest
Examples: All likely voters in the next election
All parts produced today
All sales receipts for November
A Sample is a subset of the population
Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Random receipts selected for audit
Population vs. Sample
[Figure: a population of 26 items labeled a through z, and a sample of 9 items drawn from it: b, c, g, i, n, o, r, u, y]
Why Sample?
Less time consuming than a census
Less costly to administer than a census
It is possible to obtain statistical results of a
sufficiently high precision based on samples.
Simple Random Samples
Every object in the population has an equal chance of
being selected
Objects are selected independently
Samples can be obtained from a table of random
numbers or computer random number generators
A simple random sample is the ideal against which
other sample methods are compared
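As a rough illustration of drawing a simple random sample with a computer random number generator, the sketch below uses Python's standard `random` module. The population of receipt IDs 1 to 500, the sample size of 10, and the seed are all made-up values for illustration.

```python
import random

# Hypothetical population: receipt IDs 1..500 (invented for this example).
population = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Without replacement: every subset of 10 receipts is equally likely.
sample = rng.sample(population, k=10)

# With replacement: each draw is independent of the others,
# so the same receipt may appear more than once.
sample_with_replacement = rng.choices(population, k=10)

print(sample)
print(sample_with_replacement)
```

Drawing with replacement matches the "selected independently" condition exactly; drawing without replacement is the usual practice when the population is large relative to the sample.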
Inferential Statistics
Making statements about a population by
examining sample results
Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)
[Figure: a sample drawn from a population]
Inferential Statistics
Drawing conclusions and/or making decisions
concerning a population based on sample results.
Estimation
e.g., Estimate the population mean
weight using the sample mean
weight
Hypothesis Testing
e.g., Use sample evidence to test
the claim that the population mean
weight is 120 kg
Sampling Distributions
A sampling distribution is a distribution of
all of the possible values of a statistic for
a given size sample selected from a
population
Sampling Distributions
Sampling Distribution of Sample Mean
Sampling Distribution of Sample Proportion
Sampling Distribution of Sample Variance
Sampling Distributions of
Sample Means
Developing a
Sampling Distribution
Assume there is a population …
Population size N = 4 (individuals A, B, C, D)
Random variable, X, is age of individuals
Values of X: 18, 20, 22, 24 (years)
Developing a
Sampling Distribution
(continued)
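Summary Measures for the Population Distribution:
\[ \mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21 \]
\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236 \]
Uniform Distribution: P(x) = .25 for each of x = 18, 20, 22, 24 (individuals A, B, C, D)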
Developing a
Sampling Distribution
(continued)
Now consider all possible samples of size n = 2
16 possible samples (sampling with replacement):
1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24
16 Sample Means:
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
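The enumeration of all 16 samples can be reproduced in a few lines. This is a minimal sketch using `itertools.product` to list every ordered pair drawn with replacement from the four ages; the variable names are illustrative.

```python
from itertools import product

ages = [18, 20, 22, 24]   # population values from the slide
n = 2                     # sample size

# All ordered samples of size n drawn with replacement: 4**2 = 16 of them.
samples = list(product(ages, repeat=n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))
for s, m in zip(samples, sample_means):
    print(s, m)
```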
Developing a
Sampling Distribution
(continued)
Sampling Distribution of All Sample Means
Tallying the 16 sample means gives the distribution of $\bar{X}$ (no longer uniform):
$\bar{X}$      18     19     20     21     22     23     24
$P(\bar{X})$   1/16   2/16   3/16   4/16   3/16   2/16   1/16
Developing a
Sampling Distribution
(continued)
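Summary Measures of this Sampling Distribution (here N = 16 is the number of possible samples):
\[ E(\bar{X}) = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21 = \mu \]
\[ \sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu)^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58 \]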
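These summary measures can be checked numerically. The sketch below recomputes the population mean and standard deviation, then the mean and standard deviation of the 16 sample means, and compares the latter with $\sigma / \sqrt{n}$. Variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]
n = 2

# Population parameters (divide by N, since this is the whole population).
mu = sum(ages) / len(ages)
sigma = sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))

# Means of all 16 samples of size 2 drawn with replacement.
means = [sum(s) / n for s in product(ages, repeat=n)]
mean_of_means = sum(means) / len(means)
se = sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

# Probability of each distinct sample mean.
counts = Counter(means)
dist = {m: counts[m] / len(means) for m in sorted(counts)}

print(mu, round(sigma, 3))            # population: 21.0 2.236
print(mean_of_means, round(se, 3))    # sampling distribution: 21.0 1.581
print(round(sigma / sqrt(n), 3))      # sigma / sqrt(n): 1.581
print(dist)
```

The standard deviation of the sample means equals $\sigma / \sqrt{n}$ here because the samples were drawn with replacement.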
Comparing the Population with its
Sampling Distribution
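Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: the uniform population distribution P(X) over X = 18, 20, 22, 24 (A, B, C, D), beside the distribution $P(\bar{X})$ of sample means over 18 to 24]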
Expected Value of Sample Mean
Let X1, X2, . . . Xn represent a random sample from a
population
The sample mean value of these observations is defined as
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
Standard Error of the Mean
Different samples of the same size from the same
population will yield different sample means
A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean:
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Note that the standard error of the mean decreases as
the sample size increases
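To make the effect of sample size concrete, here is a small sketch that evaluates $\sigma / \sqrt{n}$ for several sample sizes. The value σ = 3 and the list of sample sizes are assumptions chosen for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 3.0  # assumed population standard deviation
for n in (4, 16, 36, 100):
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size halves the standard error, since n enters through its square root.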
If the Population is Normal
If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with
\[ \mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Z-value for Sampling Distribution
of the Mean
Z-value for the sampling distribution of $\bar{X}$:
\[ Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Finite Population Correction
Apply the Finite Population Correction if:
a population member cannot be included more
than once in a sample (sampling is without
replacement), and
the sample is large relative to the population
(n is greater than about 5% of N)
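Then
\[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]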
Finite Population Correction
If the sample size n is not small compared to the
population size N , then use
\[ Z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}} \]
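The finite population correction can be checked by brute force on the four-person age population used earlier (values 18, 20, 22, 24). The sketch below compares the corrected formula with the standard deviation of the sample means over every sample of size 2 drawn without replacement. The function names are illustrative, not from any library.

```python
from itertools import combinations
from math import sqrt

def standard_error_fpc(sigma, n, N):
    """Standard error of the mean with the finite population correction."""
    return (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))

def z_value_fpc(x_bar, mu, sigma, n, N):
    """Z-value for a sample mean when sampling without replacement."""
    return (x_bar - mu) / standard_error_fpc(sigma, n, N)

ages = [18, 20, 22, 24]
N, n = len(ages), 2
mu = sum(ages) / N
sigma = sqrt(sum((x - mu) ** 2 for x in ages) / N)

# Every sample of size 2 drawn without replacement (6 unordered pairs).
means = [sum(pair) / n for pair in combinations(ages, n)]
brute_force_se = sqrt(sum((m - mu) ** 2 for m in means) / len(means))

print(standard_error_fpc(sigma, n, N))  # from the formula
print(brute_force_se)                   # from enumeration
print(z_value_fpc(23, mu, sigma, n, N)) # z-value for a sample mean of 23
```

Both routes give about 1.291, smaller than the 1.58 obtained when sampling with replacement, because here the sample is half of the population.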
Sampling Distribution Properties
$\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: a normal population distribution centered at μ, above a normal sampling distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (has the same mean)]
Sampling Distribution Properties
(continued)
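For sampling with replacement:
As n increases, $\sigma_{\bar{x}}$ decreases
[Figure: two sampling distributions of $\bar{x}$ centered at μ, one for a larger sample size and one for a smaller sample size]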
If the Population is not Normal
We can apply the Central Limit Theorem:
Even if the population is not normal,
…sample means from the population will be
approximately normal as long as the sample size is
large enough.
Properties of the sampling distribution:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of the shape of the population
[Figure: sampling distribution of $\bar{x}$]
If the Population is not Normal
(continued)
Sampling distribution properties:
Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
[Figure: a non-normal population distribution with mean μ, above sampling distributions of $\bar{x}$ for a smaller and a larger sample size (becomes normal as n increases)]
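A short simulation can illustrate the Central Limit Theorem. The sketch below assumes an exponential population with rate 1 (strongly skewed, with μ = 1 and σ = 1), draws many samples of size 30, and checks how the sample means behave. The population, sample size, number of trials, and seed are all choices made for this example.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)   # fixed seed for reproducibility

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 30                   # sample size
trials = 20_000          # number of simulated samples

means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

se = sigma / sqrt(n)
# Share of sample means within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / trials

print(fmean(means))    # should be close to mu
print(pstdev(means))   # should be close to sigma / sqrt(n)
print(within)          # close to 0.95 if the distribution is near normal
```

Rerunning with a smaller n (say 2 or 5) shows a visibly skewed distribution of sample means, which is the situation the "large enough" condition is meant to exclude.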
How Large is Large Enough?
For most distributions, n > 25 will give a
sampling distribution that is nearly normal
For normal population distributions, the
sampling distribution of the mean is always
normally distributed
Example
Suppose a population has mean μ = 8 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.
What is the probability that the sample mean is
between 7.8 and 8.2?
Example
(continued)
Solution:
Even if the population is not normally
distributed, the central limit theorem can be
used (n > 25)
… so the sampling distribution of $\bar{x}$ is approximately normal
… with mean $\mu_{\bar{x}} = 8$
… and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example
(continued)
Solution (continued):
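\[ P(7.8 < \bar{X} < 8.2) = P\left( \frac{7.8 - 8}{3 / \sqrt{36}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{8.2 - 8}{3 / \sqrt{36}} \right) = P(-0.4 < Z < 0.4) = 0.3108 \]
[Figure: the population distribution (shape unknown, μ = 8) is sampled to give the sampling distribution of $\bar{X}$ ($\mu_{\bar{X}} = 8$, interval 7.8 to 8.2), which is standardized to the standard normal distribution ($\mu_z = 0$, interval −0.4 to 0.4, area .1554 + .1554)]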
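The example can be verified with Python's standard library. This sketch uses `statistics.NormalDist` to model the sampling distribution of the mean with μ = 8 and standard error 3/√36 = 0.5, so the bounds 7.8 and 8.2 standardize to (7.8 − 8)/0.5 = −0.4 and (8.2 − 8)/0.5 = 0.4.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36
se = sigma / sqrt(n)                      # standard error of the mean

# Standardized bounds for the interval 7.8 to 8.2.
z_low = (7.8 - mu) / se
z_high = (8.2 - mu) / se

# Probability that the sample mean falls between 7.8 and 8.2.
sampling_dist = NormalDist(mu, se)
prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(se)
print(round(z_low, 2), round(z_high, 2))
print(round(prob, 4))
```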
Acceptance Intervals
Determine a range within which sample means are likely
to occur, given a population mean and variance.
By the Central Limit Theorem, we know that the distribution of $\bar{X}$ is approximately normal if n is large enough, with mean μ and standard deviation $\sigma_{\bar{X}}$
Let $z_{\alpha/2}$ be the z-value that leaves area α/2 in the upper tail of the normal distribution (i.e., the interval $-z_{\alpha/2}$ to $z_{\alpha/2}$ encloses probability 1 – α)
Then
\[ \mu \pm z_{\alpha/2} \, \sigma_{\bar{X}} \]
is the interval that includes $\bar{X}$ with probability 1 – α
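An acceptance interval can be computed directly from this definition. The sketch below is one possible implementation using `statistics.NormalDist.inv_cdf` to find the z-value; the function name and the inputs (μ = 8, σ = 3, n = 36, α = 0.05) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_interval(mu, sigma, n, alpha):
    """Interval mu +/- z_{alpha/2} * sigma / sqrt(n) for the sample mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # leaves alpha/2 in the upper tail
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

low, high = acceptance_interval(mu=8.0, sigma=3.0, n=36, alpha=0.05)
print(round(low, 2), round(high, 2))
```

With these inputs the z-value is about 1.96, so roughly 95% of sample means of size 36 would be expected to fall between about 7.02 and 8.98.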
Sampling Distributions of
Sample Proportions
Population Proportions, P
P = the proportion of the population having
some characteristic
Sample proportion ($\hat{P}$) provides an estimate of P:
\[ \hat{P} = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}} \]
$0 \le \hat{P} \le 1$
The count X has a binomial distribution, so the distribution of $\hat{P}$ can be approximated by a normal distribution when nP(1 – P) > 9
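Sampling Distribution of $\hat{P}$
Normal approximation:
[Figure: sampling distribution $P(\hat{P})$ plotted against $\hat{P}$ from 0 to 1]
Properties:
\[ E(\hat{P}) = P \quad \text{and} \quad \sigma_{\hat{P}}^2 = \mathrm{Var}\left( \frac{X}{n} \right) = \frac{P(1 - P)}{n} \]
(where P = population proportion)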
Z-Value for Proportions
Standardize $\hat{P}$ to a Z value with the formula:
\[ Z = \frac{\hat{P} - P}{\sigma_{\hat{P}}} = \frac{\hat{P} - P}{\sqrt{\dfrac{P(1 - P)}{n}}} \]
Example
If the true proportion of voters who support
Proposition A is P = .4, what is the probability
that a sample of size 200 yields a sample
proportion between .40 and .45?
i.e.: if P = .4 and n = 200, what is
$P(.40 \le \hat{P} \le .45)$ ?
Example
(continued)
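if P = .4 and n = 200, what is $P(.40 \le \hat{P} \le .45)$ ?
Find $\sigma_{\hat{P}}$:
\[ \sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464 \]
Convert to standard normal:
\[ P(.40 \le \hat{P} \le .45) = P\left( \frac{.40 - .40}{.03464} \le Z \le \frac{.45 - .40}{.03464} \right) = P(0 \le Z \le 1.44) \]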
Example
(continued)
if P = .4 and n = 200, what is $P(.40 \le \hat{P} \le .45)$ ?
Use standard normal table: P(0 ≤ Z ≤ 1.44) = .4251
[Figure: the sampling distribution of $\hat{P}$ over the interval .40 to .45, standardized to the standard normal distribution with area .4251 between Z = 0 and Z = 1.44]
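The proportion example can be reproduced numerically. This sketch computes the standard error, checks the nP(1 − P) > 9 rule, and evaluates the probability twice: once with the exact z-value and once with z rounded to two decimals, as a printed table lookup would require. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.4, 200

# Rule of thumb for the normal approximation: n * P * (1 - P) > 9.
rule_value = n * P * (1 - P)

se = sqrt(P * (1 - P) / n)        # standard error of the sample proportion
z_high = (0.45 - P) / se          # upper bound, standardized
std_normal = NormalDist()

prob_exact = std_normal.cdf(z_high) - std_normal.cdf(0.0)
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(0.0)

print(round(rule_value, 1))
print(round(se, 5), round(z_high, 4))
print(round(prob_exact, 4), round(prob_table, 4))
```

The small gap between the two probabilities comes only from rounding z to 1.44 before the lookup.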
Sampling Distributions of
Sample Variances
Sample Variance
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The sample variance is
\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
the square root of the sample variance is called
the sample standard deviation
the sample variance is different for different
random samples from the same population
Sampling Distribution of
Sample Variances
The sampling distribution of $s^2$ has mean $\sigma^2$
\[ E(s^2) = \sigma^2 \]
If the population distribution is normal, then
\[ \mathrm{Var}(s^2) = \frac{2\sigma^4}{n - 1} \]
If the population distribution is normal then
\[ \frac{(n - 1)s^2}{\sigma^2} \]
has a $\chi^2$ distribution with n – 1 degrees of freedom
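These properties of the sample variance can be checked by simulation. The sketch below assumes a normal population with μ = 0 and σ = 2, draws many samples of size 10, and compares the simulated mean and variance of $s^2$ with $\sigma^2$ and $2\sigma^4 / (n - 1)$. All parameter values and the seed are choices made for this example.

```python
import random
from statistics import fmean, pvariance

def sample_variance(xs):
    """s^2 with the n - 1 divisor."""
    x_bar = fmean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(1)           # fixed seed for reproducibility
mu, sigma, n = 0.0, 2.0, 10      # assumed normal population and sample size
trials = 20_000

s2 = [sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
      for _ in range(trials)]

# (n - 1) s^2 / sigma^2 should behave like chi-square with n - 1 d.f.,
# whose mean is n - 1.
scaled = [(n - 1) * v / sigma ** 2 for v in s2]

print(fmean(s2))       # theory: sigma^2 = 4
print(pvariance(s2))   # theory: 2 * sigma^4 / (n - 1) = 32 / 9
print(fmean(scaled))   # theory: n - 1 = 9
```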
The Chi-square Distribution
The chi-square distribution is a family of distributions,
depending on degrees of freedom:
d.f. = n – 1
[Figure: chi-square density curves plotted against $\chi^2$ from 0 to 28, for d.f. = 1, d.f. = 5, and d.f. = 15]
Degrees of Freedom (df)
Idea: Number of observations that are free to vary
after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary
for a given mean)
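The same constraint can be written as a one-line calculation: once the mean and n − 1 of the values are fixed, the remaining value is determined. The helper name below is hypothetical, used only for this illustration.

```python
def last_value(mean, known_values):
    """Value forced by a fixed mean once all other values are chosen."""
    n = len(known_values) + 1
    return n * mean - sum(known_values)

print(last_value(8.0, [7, 8]))    # the slide's example
print(last_value(8.0, [1, 20]))   # any other two values still fix the third
```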