7 Sampling and Sampling Distribution (Class Version)
Slide
1
Introduction
An element is the entity on which data are collected.
A population is a collection of all the elements of interest.
A sample is a subset of the population.
The sampled population is the population from which the sample is drawn.
A frame is a list of the elements that the sample will be selected from.
Slide
2
Introduction
The reason we select a sample is to collect data to answer a research question about a population.
The sample results provide only estimates of the values of the population characteristics.
The reason is simply that the sample contains only a portion of the population.
With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
Slide
3
Selecting a Sample
Sampling from a Finite Population
Sampling from an Infinite Population
Slide
4
Sampling from a Finite Population
Finite populations are often defined by lists
such as:
• Organization membership roster
• Credit card account numbers
• Inventory product numbers
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
Slide
5
Sampling from a Finite Population
Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
Sampling without replacement is the procedure
used most often.
In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
Slide
6
Sampling from a Finite Population
Example: St. Stephen’s College
St. Stephen’s College received 900 applications for admission in the upcoming year from prospective students. The applicants were numbered, from 1 to 900, as their applications arrived. The Director of Admissions would like to select a simple random sample of 30 applicants.
Slide
7
Sampling from a Finite Population
Example: St. Stephen’s College
Step 1: Assign a random number to each of the 900
applicants.
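The selection procedure can be sketched in Python. This is only an illustration: the slide shows Step 1 but not the step that follows, so the code assumes the sample consists of the 30 applicants holding the smallest random numbers. The function name `select_applicants` and the fixed seed are hypothetical.

```python
import random

def select_applicants(population_size=900, sample_size=30, seed=None):
    """Assign a random number to each applicant, then keep the applicants
    holding the smallest random numbers (assumed second step)."""
    rng = random.Random(seed)
    # Step 1: one random number per applicant ID (1..population_size).
    numbers = {applicant: rng.random()
               for applicant in range(1, population_size + 1)}
    # Assumed Step 2: the sample is the applicants with the smallest numbers.
    ranked = sorted(numbers, key=numbers.get)
    return sorted(ranked[:sample_size])

sample = select_applicants(seed=42)
print(len(sample), "applicants selected")
```

Because every applicant's number is drawn independently from the same distribution, each possible set of 30 applicants is equally likely, which matches the definition of a simple random sample. `random.sample(range(1, 901), 30)` would draw a sample without replacement in a single call.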
Slide
8
Sampling from an Infinite Population
Sometimes we want to select a sample, but find it is
not possible to obtain a list of all elements in the
population.
As a result, we cannot construct a frame for the
population.
Hence, we cannot use the random number selection
procedure.
Most often this situation occurs in infinite population
cases.
Slide
9
Sampling from an Infinite Population
Populations are often generated by an ongoing
process where there is no upper limit on the
number of units that can be generated.
Some examples of on-going processes, with infinite
populations, are:
• parts being manufactured on a production line
• transactions occurring at a bank
• telephone calls arriving at a technical help desk
• customers entering a store
Slide
10
Sampling from an Infinite Population
In the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken.
A random sample from an infinite population is a sample selected such that the following conditions are satisfied.
• Each element selected comes from the population of interest.
• Each element is selected independently.
Slide
11
Practical Advice
The target population is the population we want to make inferences about.
The sampled population is the population from which the sample is actually taken.
Whenever a sample is used to make inferences about a population, we should make sure that the targeted population and the sampled population are in close agreement.
Slide
12
Point Estimation
Point estimation is a form of statistical inference.
In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
$s$ is the point estimator of the population standard deviation $\sigma$.
$\bar{p}$ is the point estimator of the population proportion $p$.
Slide
13
Point Estimation
Example: St. Stephen’s College
Recall that St. Stephen’s College received 900 applications from prospective students. The application form contains a variety of information including the individual’s Scholastic Aptitude Test (SAT) score and whether or not the individual desires on-campus housing.
At a meeting in a few hours, the Director of Admissions would like to announce the average SAT score and the proportion of applicants that want to live on campus, for the population of 900 applicants.
Slide
14
Point Estimation
Example: St. Stephen’s College
However, the necessary data on the applicants have not yet been entered in the college’s computerized database. So, the Director decides to estimate the values of the population parameters of interest based on sample statistics. The sample of 30 applicants is selected using computer-generated random numbers.
Slide
15
Point Estimation
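$\bar{x}$ as Point Estimator of $\mu$
$$\bar{x} = \frac{\sum x_i}{30} = \frac{32{,}910}{30} = 1097$$
$s$ as Point Estimator of $\sigma$
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$$
$\bar{p}$ as Point Estimator of $p$
$$\bar{p} = 20/30 = 0.67$$
Note: Different random numbers would have identified a different sample which would have resulted in different point estimates.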
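The arithmetic behind these three point estimates can be checked with a few lines of Python. The sums (32,910 and 163,996) and the count of 20 come from the slide; the variable names are illustrative only.

```python
import math

n = 30                  # sample size
sum_x = 32_910          # sum of the 30 sampled SAT scores
sum_sq_dev = 163_996    # sum of squared deviations from the sample mean
housing_yes = 20        # sampled applicants wanting on-campus housing

x_bar = sum_x / n                        # point estimate of the population mean
s = math.sqrt(sum_sq_dev / (n - 1))      # point estimate of the population std. dev.
p_bar = housing_yes / n                  # point estimate of the population proportion

print(round(x_bar), round(s, 1), round(p_bar, 2))
```

Note the divisor n - 1 = 29 in the sample standard deviation, in contrast to the divisor of 900 used for the population standard deviation.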
Slide
16
Point Estimation
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$$
Population Proportion Wanting On-Campus Housing
$$p = \frac{648}{900} = .72$$
Slide
17
Summary of Point Estimates
Obtained from a Simple Random Sample
Slide
18
Sampling Distribution of $\bar{x}$
Process of Statistical Inference
Slide
19
Sampling Distribution – an exercise
Slide
20
Sampling Distribution of $\bar{x}$
$E(\bar{x}) = \mu$
Slide
21
Sampling Distribution of $\bar{x}$
• Standard Deviation of $\bar{x}$
Slide
22
Sampling Distribution of $\bar{x}$
• Standard Deviation of $\bar{x}$
Finite Population: $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right)$
Infinite Population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• A finite population is treated as being infinite if n/N < .05.
• $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor (without replacement).
• $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.
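A minimal sketch of how the two formulas and the n/N < .05 rule fit together, assuming a helper named `standard_error_mean` (a hypothetical name, not from the slides):

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size, or None for an infinite population
    """
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when the sample is
    # at least 5% of the population (otherwise treat it as infinite).
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Stephen's College: sigma = 80, n = 30, N = 900, so n/N is about .033
se_30 = standard_error_mean(80, 30, N=900)
print(round(se_30, 1))
```

With n = 30 and N = 900 the correction is skipped, and the result agrees with the value of 14.6 used in the example that follows.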
Slide
23
Sampling Distribution of $\bar{x}$
In most applications, the sampling distribution of $\bar{x}$ can be approximated by a normal distribution whenever the sample size is 30 or more.
Slide
24
Sampling Distribution of $\bar{x}$
Slide
25
Central Limit Theorem
Slide
26
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{x}$ for SAT scores, with $E(\bar{x}) = 1090$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 80/\sqrt{30} = 14.6$]
Slide
27
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/-10 of the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will be between 1080 and 1100?
Slide
28
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (1100 - 1090)/14.6 = .68
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .68) = .7517
Slide
29
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
Slide
30
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{x}$ for SAT scores with $\sigma_{\bar{x}} = 14.6$; area to the left of 1100 = .7517]
Slide
31
Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (1080 - 1090)/14.6 = -.68
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.68) = .2483
Slide
32
Sampling Distribution of $\bar{x}$ for SAT Scores
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{x}$ for SAT scores with $\sigma_{\bar{x}} = 14.6$; area to the left of 1080 = .2483]
Slide
33
Sampling Distribution of $\bar{x}$ for SAT Scores
Example: St. Stephen’s College
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(-.68 < z < .68) = P(z < .68) - P(z < -.68)
= .7517 - .2483
= .5034
The probability that the sample mean SAT score will be between 1080 and 1100 is:
P(1080 < $\bar{x}$ < 1100) = .5034
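The five steps can be reproduced with Python's standard library as a rough check. This sketch uses `statistics.NormalDist` in place of the printed normal table and takes the mean (1090) and standard error (14.6) from the example. Because z is not rounded to two decimals here, the result comes out slightly above the table-based .5034.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean for n = 30
sampling_dist = NormalDist(mu=1090, sigma=14.6)

upper_area = sampling_dist.cdf(1100)   # Steps 1-2: area left of the upper endpoint
lower_area = sampling_dist.cdf(1080)   # Steps 3-4: area left of the lower endpoint
prob = upper_area - lower_area         # Step 5: area between the endpoints

print(round(prob, 3))
```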
Slide
34
Sampling Distribution of $\bar{x}$ for SAT Scores
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{x}$ for SAT scores with $\sigma_{\bar{x}} = 14.6$; area between 1080 and 1100 = .5034]
Slide
35
Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
• Suppose we select a simple random sample of 100 applicants with replacement instead of the 30 originally considered.
• $E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 1090.
• Whenever the sample size is increased, the standard error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased from 14.6 to:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$$
Slide
36
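Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
[Figure: sampling distributions of $\bar{x}$ for SAT scores, both centered at $E(\bar{x}) = 1090$: with n = 100, $\sigma_{\bar{x}} = 8$; with n = 30, $\sigma_{\bar{x}} = 14.6$]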
Slide
37
Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
• Recall that when n = 30, P(1080 < $\bar{x}$ < 1100) = .5034.
• We follow the same steps to solve for P(1080 < $\bar{x}$ < 1100) when n = 100 as we showed earlier when n = 30.
• Now, with n = 100, P(1080 < $\bar{x}$ < 1100) = .7888.
• Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.
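To see the effect of sample size directly, the same probability can be computed for several values of n. This is a sketch under the example's assumptions (σ = 80, margin of 10 points, infinite-population standard error); the function name `prob_within` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_within(sigma, n, margin):
    """P(sample mean falls within +/- margin of the population mean),
    using the normal approximation with standard error sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    z = margin / se
    std_normal = NormalDist()            # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

p30 = prob_within(80, 30, 10)
p100 = prob_within(80, 100, 10)
print(round(p30, 3), round(p100, 3))
```

With n = 100 the standard error is 8, so z = 1.25 and the probability rounds to about .789, in line with the .7888 quoted above.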
Slide
38
Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{x}$ for SAT scores with $\sigma_{\bar{x}} = 8$; area between 1080 and 1100 = .7888]
Slide
39
Sampling Distribution of $\bar{p}$
Making Inferences about a Population Proportion
[Figure: a population with proportion p = ?; a simple random sample of n elements is selected from the population.]
$E(\bar{p}) = p$
where:
p = the population proportion
Slide
41
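Sampling Distribution of $\bar{p}$
• Standard Deviation of $\bar{p}$
Finite Population: $\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}$
Infinite Population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$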
Slide
42
Form of the Sampling Distribution of $\bar{p}$
Slide
43
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
Recall that 72% of the prospective students applying to St. Stephen’s College desire on-campus housing.
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
Slide
44
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because:
np = 30(.72) = 21.6 > 5
and
n(1 - p) = 30(.28) = 8.4 > 5
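The check above, together with the infinite-population standard error of the sample proportion, can be sketched as follows. The helper name `proportion_sampling_summary` and the dictionary keys are hypothetical; the threshold of 5 is the one used on the slide.

```python
import math

def proportion_sampling_summary(p, n, threshold=5):
    """Check the normal-approximation conditions for the sample proportion
    and compute its standard error sqrt(p(1 - p) / n)."""
    successes = n * p            # expected count with the attribute
    failures = n * (1 - p)       # expected count without the attribute
    return {
        "np": successes,
        "n(1-p)": failures,
        "normal_ok": successes >= threshold and failures >= threshold,
        "std_error": math.sqrt(p * (1 - p) / n),
    }

summary = proportion_sampling_summary(p=0.72, n=30)
print(summary["normal_ok"], round(summary["std_error"], 3))
```

The standard error rounds to .082, the value used in the z calculations for this example.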
Slide
45
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{p}$, centered at $E(\bar{p}) = .72$]
Slide
46
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (.77 - .72)/.082 = .61
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .61) = .7291
Slide
47
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
Cumulative Probabilities for the Standard Normal Distribution
 z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .     .      .      .      .      .      .      .      .      .      .
.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 .     .      .      .      .      .      .      .      .      .      .
Slide
48
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area to the left of .77 = .7291]
Slide
49
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (.67 - .72)/.082 = -.61
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.61) = .2709
Slide
50
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area to the left of .67 = .2709]
Slide
51
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(-.61 < z < .61) = P(z < .61) - P(z < -.61)
= .7291 - .2709
= .4582
The probability that the sample proportion of applicants wanting on-campus housing will be within +/-.05 of the actual population proportion is:
P(.67 < $\bar{p}$ < .77) = .4582
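As with the sample mean, the table lookup can be cross-checked in Python. This sketch assumes the normal approximation with mean .72 and standard error .082 from the example, and uses `statistics.NormalDist` instead of the printed table, so the last digits may differ slightly from .4582.

```python
from statistics import NormalDist

# Sampling distribution of the sample proportion for n = 30
p_dist = NormalDist(mu=0.72, sigma=0.082)

upper_area = p_dist.cdf(0.77)        # area to the left of the upper endpoint
lower_area = p_dist.cdf(0.67)        # area to the left of the lower endpoint
prob_p = upper_area - lower_area     # area between .67 and .77

print(round(prob_p, 3))
```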
Slide
52
Sampling Distribution of $\bar{p}$
Example: St. Stephen’s College
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area between .67 and .77 = .4582]
Slide
53
Other Sampling Methods
Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling
Slide
54
Stratified Random Sampling
The population is first divided into groups of elements called strata.
Each element in the population belongs to one and only one stratum.
Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a homogeneous group).
Slide
55
Stratified Random Sampling
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum sample results into one population parameter estimate.
Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.
Example: The basis for forming the strata might be department, location, age, industry type, and so on.
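A minimal sketch of the selection step in stratified random sampling, assuming the strata are already formed and an equal number of elements is drawn from each. The department names, element IDs, and the function `stratified_sample` are made up for illustration, and the formulas for combining stratum results are not covered here.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample (without replacement) from every stratum.

    strata: dict mapping stratum name -> list of elements
    """
    rng = random.Random(seed)
    sample = {}
    for name, elements in strata.items():
        k = min(per_stratum, len(elements))   # cannot draw more than the stratum holds
        sample[name] = rng.sample(elements, k)
    return sample

# Hypothetical strata formed by department
strata = {
    "accounting": list(range(0, 40)),
    "marketing": list(range(40, 100)),
    "operations": list(range(100, 150)),
}
picked = stratified_sample(strata, per_stratum=5, seed=1)
print({name: len(ids) for name, ids in picked.items()})
```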
Slide
56
Cluster Sampling
The population is first divided into separate groups of elements called clusters.
Ideally, each cluster is a representative small-scale version of the population (i.e. heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster form the sample.
Slide
57
Cluster Sampling
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
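Cluster sampling can be sketched in the same style: whole clusters are drawn at random, and every element of a chosen cluster enters the sample. The block names, household labels, and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Draw a simple random sample of clusters and keep all their elements.

    clusters: dict mapping cluster name -> list of elements
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical city blocks, each containing four households
blocks = {f"block_{i}": [f"household_{i}_{j}" for j in range(4)]
          for i in range(10)}
sampled_blocks = cluster_sample(blocks, n_clusters=3, seed=7)
print(sorted(sampled_blocks))
```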
Slide
58
Systematic Sampling
If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
We randomly select one of the first N/n elements from the population list.
We then select every N/nth element that follows in the population list.
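The procedure can be sketched as follows, assuming N is an exact multiple of n so that the interval N/n is a whole number. The function name `systematic_sample` and the numbered population list are illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first N/n elements, then take every
    (N/n)th element after it. Assumes len(population) is a multiple of n."""
    rng = random.Random(seed)
    step = len(population) // n      # sampling interval N/n
    start = rng.randrange(step)      # index of the randomly chosen first element
    return population[start::step][:n]

applicants = list(range(1, 901))     # a population list numbered 1 to 900
chosen = systematic_sample(applicants, n=30, seed=3)
print(chosen[:3])
```

Only one random number is needed for the whole sample, which is why the sample is usually easier to identify than under simple random sampling.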
Slide
59
Systematic Sampling
This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
Slide
60
Convenience Sampling
It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might use student volunteers to constitute a sample.
Slide
61
Convenience Sampling
Advantage: Sample selection and data collection are relatively easy.
Disadvantage: It is impossible to determine how representative of the population the sample is.
Slide
62
Judgment Sampling
The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
It is a nonprobability sampling technique.
Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
Slide
63
Judgment Sampling
Advantage: It is a relatively easy way of selecting a sample.
Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.
Slide
64
Recommendation
It is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used.
For these methods, formulas are available for evaluating the “goodness” of the sample results in terms of the closeness of the results to the population parameters being estimated.
An evaluation of the goodness cannot be made with nonprobability (convenience or judgment) sampling methods.
Slide
65
How Reliable is Your Probability Sample?
a) 0.091
b) 0.146
c) 0.246
d) 0.854
e) 0.909
Slide
67
Properties of the
Uniform Distribution
$$\mu = \frac{a + b}{2}$$
$$\sigma^2 = \frac{(b - a)^2}{12}$$
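As a quick numerical illustration of these properties, the sketch below computes the mean and variance of a uniform distribution on the interval from a to b. It reads the quantity (b - a)²/12 as the variance, so the standard deviation is its square root; the function name `uniform_mean_variance` and the interval 0 to 12 are hypothetical.

```python
import math

def uniform_mean_variance(a, b):
    """Mean and variance of a continuous uniform distribution on [a, b]."""
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    return mean, variance

mean, variance = uniform_mean_variance(0, 12)
std_dev = math.sqrt(variance)        # standard deviation
print(mean, variance, round(std_dev, 3))
```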
Slide
68