Chapter 5
Sampling and Estimation
Sampling
• In this section, we present the various methods for obtaining information
on a population through samples.
• The information on a population that we try to obtain usually concerns
the value of a parameter, a quantity computed from or used to describe
a population of data.
• When we use a sample to estimate a parameter, we make use of sample
statistics. A statistic is a quantity computed from or used to describe a
sample of data.
• Two methods of random sampling: simple random sampling and
stratified random sampling.
Suppose a telecommunications equipment analyst wants to know how
much major customers will spend on average for equipment during the
coming year. How does the analyst find this information?
• One strategy is to survey the population of telecom equipment
customers and inquire what their purchase plans are. Surveying all
companies, however, would be very costly in terms of time and
money.
• Alternatively, the analyst can collect a representative sample of
upcoming telecom equipment expenditures. In this case, the analyst
will compute the sample mean expenditure, a statistic. This strategy
has a substantial advantage over polling the whole population
because it can be accomplished more quickly and at lower cost.
Simple Random Sampling
Definition: A simple random sample is a subset of a larger population created
in such a way that each element of the population has an equal probability of
being selected for the subset. The procedure of drawing a sample that satisfies
the definition of a simple random sample is called simple random sampling.
Definition: Sampling error is the difference between the observed value of a
statistic and the quantity it is intended to estimate.
Definition: The sampling distribution of a statistic is the distribution of all the
distinct possible values that the statistic can assume when computed from
samples of the same size randomly drawn from the same population.
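The definitions of a simple random sample and of sampling error can be illustrated with a minimal sketch. The population here is simulated (hypothetical spending figures, not real data), and the names `population`, `sample`, and `sampling_error` are chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: planned equipment spending (in $ millions)
# for 1,000 customers. The figures are simulated, not real data.
rng = random.Random(42)
population = [rng.uniform(1.0, 20.0) for _ in range(1000)]

# Simple random sample without replacement: every element of the
# population has the same probability of being selected.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)   # the parameter
sample_mean = statistics.mean(sample)           # the statistic
sampling_error = sample_mean - population_mean  # statistic minus parameter

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
print(f"sampling error:  {sampling_error:.3f}")
```

In practice the population mean is unknown, so the sampling error cannot be observed directly; the simulation only makes the definition concrete.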
Stratified Random Sampling
Definition: In stratified random sampling, the population is divided into
subpopulations (strata) based on one or more classification criteria. Simple random
samples are then drawn from each stratum in sizes proportional to the relative size
of each stratum in the population. These samples are then pooled to form a
stratified random sample.
Advantages:
• Compared to simple random sampling, stratified random sampling guarantees
that population subdivisions of interest are represented in the sample.
• Another advantage is that estimates of parameters produced from stratified
sampling have greater precision (that is, smaller variance or dispersion) than
estimates obtained from simple random sampling.
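A minimal sketch of proportional stratified random sampling, following the definition above: draw a simple random sample from each stratum in proportion to its size, then pool the results. The helper `stratified_sample` and the bond identifiers are hypothetical, and taking at least one element per stratum is an assumption added so that every stratum is represented after rounding.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Draw a proportional stratified random sample.

    strata: dict mapping stratum name -> list of elements.
    total_size: target overall sample size (approximate after rounding).
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocate in proportion to the stratum's share of the population,
        # with at least one element so that every stratum is represented.
        k = max(1, round(total_size * len(members) / population_size))
        k = min(k, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata of bond identifiers (illustrative only).
strata = {
    "treasury": [f"T{i}" for i in range(500)],
    "agency": [f"A{i}" for i in range(300)],
    "corporate": [f"C{i}" for i in range(200)],
}
picked = stratified_sample(strata, total_size=50, rng=random.Random(0))

# Pool the per-stratum samples into one stratified random sample.
pooled = [item for members in picked.values() for item in members]

for name, members in picked.items():
    print(name, len(members))
print("pooled sample size:", len(pooled))
```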
Example 1: Bond Indexes and Stratified Sampling
Suppose you are the manager of a mutual fund indexed to the Bloomberg Barclays
US Government/Credit Index. You are exploring several approaches to indexing,
including a stratified sampling approach. You first distinguish among agency bonds,
US Treasury bonds, and investment grade corporate bonds. For each of these three
groups, you define 10 maturity intervals—1 to 2 years, 2 to 3 years, 3 to 4 years, 4
to 6 years, 6 to 8 years, 8 to 10 years, 10 to 12 years, 12 to 15 years, 15 to 20 years,
and 20 to 30 years—and also separate the bonds with coupons (annual interest
rates) of 6 percent or less from the bonds with coupons of more than 6 percent.
1. How many cells or strata does this sampling plan entail?
2. If you use this sampling plan, what is the minimum number of issues the
indexed portfolio can have?
3. Suppose that in selecting among the securities that qualify for selection within
each cell, you apply a criterion concerning the liquidity of the security’s
market. Is the sample obtained random?
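One way to approach question 1 is to enumerate the cells directly: three issuer types, ten maturity intervals, and two coupon groups. The sketch below only counts the combinations; the label strings are shorthand chosen for illustration.

```python
from itertools import product

issuer_types = ["agency", "US Treasury", "investment-grade corporate"]
maturity_intervals = [
    "1-2", "2-3", "3-4", "4-6", "6-8",
    "8-10", "10-12", "12-15", "15-20", "20-30",
]
coupon_groups = ["6% or less", "more than 6%"]

# Each cell (stratum) is one combination of the three criteria.
cells = list(product(issuer_types, maturity_intervals, coupon_groups))
print(len(cells))  # 3 * 10 * 2
```

The count is 3 × 10 × 2 = 60 cells. For question 2, representing every cell requires at least one issue per cell, so the portfolio needs at least 60 issues. For question 3, screening on liquidity means the issues within a cell no longer have an equal chance of selection, so the resulting sample is not random in the sense defined earlier.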
The Central Limit Theorem (CLT)
Theorem: Given a population described by any probability distribution
having mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the
sample mean $\bar{X}$ computed from samples of size $n$ from this population will
be approximately normal with mean $\mu$ and variance $\sigma^2/n$ when the sample
size $n$ is large (a common rule of thumb is $n \geq 30$).
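The theorem can be checked by simulation. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1), chosen only because it is strongly skewed; the sample size and number of samples are arbitrary illustrative values.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)
n = 50              # size of each sample
num_samples = 5000  # number of samples drawn

# Exponential population with rate 1: mean 1, variance 1, strongly skewed.
mu, sigma2 = 1.0, 1.0

sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.variance(sample_means)

# If the sample mean is approximately normal, about 95% of the sample
# means should lie within 1.96 standard errors of the population mean.
standard_error = sqrt(sigma2 / n)
share_within = sum(
    abs(m - mu) <= 1.96 * standard_error for m in sample_means
) / num_samples

print(f"mean of sample means:     {mean_of_means:.4f} (theory {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory {sigma2 / n})")
print(f"share within 1.96 SE:     {share_within:.3f} (theory about 0.95)")
```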
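Definition: For sample mean $\bar{X}$ computed from samples generated by a
population with standard deviation $\sigma$, the standard error of the sample
mean is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ when we know $\sigma$, or
$s_{\bar{X}} = s/\sqrt{n}$ when we do not know the population standard deviation
and must use the sample standard deviation $s$ instead.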
Point Estimator
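• Estimator vs. estimate: an estimator is the formula or rule used to estimate a
parameter; an estimate is the particular value it produces for a given sample.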
• To take the example of the mean, the calculated value of the sample mean in a
given sample, used as an estimate of the population mean, is called a point
estimate of the population mean.
• Definition: An unbiased estimator is one whose expected value equals the
parameter it is intended to estimate. (For example, the sample mean is an
unbiased estimator of the population mean.)
• Definition: An unbiased estimator is efficient if no other unbiased estimator of
the same parameter has a sampling distribution with smaller variance.
• Definition: A consistent estimator is one for which the probability of estimates
close to the value of the population parameter increases as sample size increases.
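Consistency can be illustrated by simulation: as the sample size grows, a larger share of sample means falls close to the population mean. The sketch below assumes a Uniform(0, 1) population (mean 0.5) and a tolerance of 0.05; both are arbitrary illustrative choices, and `share_close` is a hypothetical helper.

```python
import random

rng = random.Random(7)
mu = 0.5        # population mean of Uniform(0, 1)
epsilon = 0.05  # what counts as "close" to the parameter
trials = 2000   # samples drawn per sample size

def share_close(n):
    """Fraction of samples of size n whose mean is within epsilon of mu."""
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - mu) < epsilon:
            hits += 1
    return hits / trials

shares = {n: share_close(n) for n in (10, 40, 160)}
for n, share in shares.items():
    print(f"n = {n:3d}: share of estimates within {epsilon} of mu = {share:.3f}")
```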
Confidence Interval
• Definition: A confidence interval is a range for which one can assert
with a given probability $1 - \alpha$, called the degree of confidence, that it will
contain the parameter it is intended to estimate. This interval is often
referred to as the $100(1 - \alpha)\%$ confidence interval for the parameter.
• Construction: A $100(1 - \alpha)\%$ confidence interval for a parameter has the
following structure:
Point estimate ± Reliability factor × Standard error
Confidence Interval for the Population Mean
(Normally Distributed Population with Known
Variance)
• Confidence Intervals for the Population Mean: A $100(1 - \alpha)\%$ confidence
interval for population mean $\mu$ when we are sampling from a normal
distribution with known variance $\sigma^2$ is given by
$\bar{X} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
• Reliability factors for Confidence Intervals Based on the Standard
Normal Distribution:
90% confidence intervals: $z_{0.05} \approx 1.645$
95% confidence intervals: $z_{0.025} \approx 1.96$
99% confidence intervals: $z_{0.005} \approx 2.576$
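The reliability factor for a two-sided interval with degree of confidence $1 - \alpha$ is the standard normal quantile $z_{\alpha/2}$, the point with probability $\alpha/2$ in the upper tail. A minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `z_reliability_factor` is hypothetical.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """Return z_{alpha/2} for a two-sided interval, e.g. confidence=0.95."""
    alpha = 1.0 - confidence
    # The quantile that leaves alpha/2 probability in the upper tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for confidence in (0.90, 0.95, 0.99):
    z = z_reliability_factor(confidence)
    print(f"{confidence:.0%} confidence: z = {z:.3f}")
```

The computed factors are about 1.645, 1.960, and 2.576 for 90%, 95%, and 99% confidence.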
Confidence Interval for the Population Mean – the z-
Alternative (Large Sample, Population Variance
Unknown)
• Confidence Intervals for the Population Mean: A $100(1 - \alpha)\%$ confidence
interval for population mean $\mu$ when we are sampling from any distribution
with unknown variance and when the sample size is large is given by
$\bar{X} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$
Example 2: Suppose an investment analyst takes a random sample of US
equity mutual funds and calculates the average Sharpe ratio. The sample size
is 100, and the average Sharpe ratio is 0.45. The sample has a standard
deviation of 0.30. Calculate and interpret the 90 percent confidence interval
for the population mean of all US equity mutual funds by using a reliability
factor based on the standard normal distribution.
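One way to work Example 2 is sketched below, applying point estimate ± reliability factor × standard error with the sample standard deviation in place of the unknown population value. The variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

n = 100             # sample size
sample_mean = 0.45  # average Sharpe ratio in the sample
sample_sd = 0.30    # sample standard deviation
confidence = 0.90

standard_error = sample_sd / sqrt(n)                 # 0.30 / 10 = 0.03
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"90% confidence interval: [{lower:.4f}, {upper:.4f}]")
```

The interval runs from roughly 0.40 to 0.50. Interpretation: the analyst can be 90 percent confident that this interval contains the population mean Sharpe ratio of US equity mutual funds.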
Confidence Interval for the Population Mean – t
Distribution (Population Variance Unknown)
• If we are sampling from a population with unknown variance and the sample is
small, but the population is normally distributed or approximately normally
distributed, a $100(1 - \alpha)\%$ confidence interval for population mean $\mu$ is given by
$\bar{X} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}}$
where the number of degrees of freedom of $t_{\alpha/2}$ is $n - 1$ and $n$ is the sample size.
Example 3: Suppose an investment analyst takes a random sample of US equity
mutual funds and calculates the average Sharpe ratio. The sample size is 25,
the average Sharpe ratio is 0.45, and the sample has a standard deviation of 0.30.
Assume that the Sharpe ratio of US equity mutual funds is normally distributed.
Calculate and interpret the 90 percent confidence interval for the population
mean of all US equity mutual funds by using a reliability factor based on the t
distribution.
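One way to work Example 3 is sketched below. The Python standard library has no quantile function for the t distribution, so the reliability factor is hard-coded as an assumption: the standard t-table value for 24 degrees of freedom with 0.05 in the upper tail, about 1.711.

```python
from math import sqrt

n = 25              # sample size
sample_mean = 0.45  # average Sharpe ratio in the sample
sample_sd = 0.30    # sample standard deviation

# Assumption: t_{0.05} with n - 1 = 24 degrees of freedom, taken from a
# standard t table rather than computed here.
t_factor = 1.711

standard_error = sample_sd / sqrt(n)   # 0.30 / 5 = 0.06
lower = sample_mean - t_factor * standard_error
upper = sample_mean + t_factor * standard_error
print(f"90% confidence interval: [{lower:.4f}, {upper:.4f}]")
```

The interval runs from roughly 0.35 to 0.55. Interpretation: the analyst can be 90 percent confident that this interval contains the population mean Sharpe ratio. It is wider than the interval in Example 2 because the smaller sample gives a larger standard error and the t reliability factor exceeds the corresponding z value.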