Continuous Distribution & Sampling
STANDARD NORMAL DISTRIBUTION
• The special case of the normal distribution in which the mean is zero and
the standard deviation is one.
• This distribution is also called the Z-distribution.
• The standard normal distribution is a useful way to understand where a specific observation
falls relative to the entire distribution.
• Standardizing takes observations drawn from normally distributed populations that have different
means and standard deviations and places them on a common standard scale:
$$Z = \frac{X - \mu}{\sigma}$$
• Most graduate schools of business require applicants for admission to take the
Graduate Management Admission Council’s GMAT examination. Scores on the
GMAT are roughly normally distributed with a mean of 527 and a standard deviation
of 112. What is the probability of an individual scoring above 500 on the GMAT?
• How high must an individual score on the GMAT in order to score in the highest 5%?
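A minimal sketch of how the two GMAT questions above could be checked numerically, assuming Python's standard-library `statistics.NormalDist` (not part of the slides). The variable names are illustrative only.

```python
from statistics import NormalDist

# GMAT scores: roughly normal with mean 527 and standard deviation 112 (from the question).
gmat = NormalDist(mu=527, sigma=112)

# P(X > 500) is the complement of the cumulative probability at 500.
p_above_500 = 1 - gmat.cdf(500)

# The score that separates the highest 5% is the 95th percentile.
top5_cutoff = gmat.inv_cdf(0.95)

print(f"P(X > 500) = {p_above_500:.4f}")
print(f"Score needed for the top 5% = {top5_cutoff:.1f}")
```

The same answers follow by hand from Z = (X − μ)/σ and a standard normal table: z = (500 − 527)/112 ≈ −0.24 for the first part, and X = 527 + 1.645 × 112 for the second.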
• The length of human pregnancies from conception to birth approximates a normal
distribution with a mean of 266 days and a standard deviation of 16 days. What
proportion of all pregnancies will last between 240 and 270 days (roughly
between 8 and 9 months)?
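One way to work this out, standardizing both endpoints with Z = (X − μ)/σ and reading Φ from a standard normal table (values rounded to four decimals):

$$P(240 < X < 270) = P\left(\frac{240 - 266}{16} < Z < \frac{270 - 266}{16}\right) = \Phi(0.25) - \Phi(-1.625) \approx 0.5987 - 0.0521 = 0.5466$$

So roughly 55% of pregnancies fall in this range under the stated normal model.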
• The Edwards’s Theater chain has studied its movie customers to determine how
much money they spend on concessions. The study revealed that the
spending distribution is approximately normally distributed with a mean of
$4.11 and a standard deviation of $1.37. What percentage of customers will
spend less than $3.00 on concessions?
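A short sketch of the concession-spending calculation, again assuming the standard-library `statistics.NormalDist`; the names `spending`, `z` and `p_less_than_3` are illustrative.

```python
from statistics import NormalDist

# Concession spending: approximately normal, mean $4.11, standard deviation $1.37.
spending = NormalDist(mu=4.11, sigma=1.37)

# z-score of $3.00, useful when looking the answer up in a Z table instead.
z = (3.00 - 4.11) / 1.37

# P(X < 3.00): share of customers spending less than $3.00.
p_less_than_3 = spending.cdf(3.00)

print(f"z = {z:.2f}, P(X < 3.00) = {p_less_than_3:.4f}")
```

Multiplying the probability by 100 gives the percentage the question asks for, about 21%.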
SAMPLING
• A population consists of all items of interest in a statistical problem.
• A sample is a subset of the population.
• It is usually not feasible to gather data on an entire population. Hence we use a subset of the population to
make statistical inferences.
• The credibility of a statistical inference depends on the quality of the sample on which it is based.
• A “good” sample is representative of the population.
• If the sample is not “good”, bias occurs. (Bias is the tendency of a sample statistic to systematically
overestimate or underestimate a population parameter.)
SAMPLING METHODS
• Probability Sampling: uses randomisation to select the sample.
• Non-Probability Sampling: uses non-random techniques.
Simple Random Sampling
A sampling technique in which
each member of the population has
an equal probability of being
selected.
Systematic Sampling
If a sample of size n is required
from a population of size N,
sample one element for every N / n
elements in the population.
When the population list has no periodic pattern,
this method behaves much like a simple random sample.
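The two probability methods described so far can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the population of 100 numbered units and the helper names are hypothetical, and the systematic version uses a random start followed by every k-th unit with k = N // n.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct units; every unit is equally likely to be selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start among the first k units, then take every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

rng = random.Random(42)            # fixed seed so the sketch is reproducible
population = list(range(1, 101))   # hypothetical population of N = 100 numbered units

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

print("Simple random:", sorted(srs))
print("Systematic:   ", sys_sample)
```

The systematic sample is evenly spaced through the list, which is why a periodic pattern in the list ordering can distort it.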
Stratified Sampling
The population is first divided into
mutually exclusive and collectively
exhaustive groups called ‘strata’.
The sample includes randomly selected
observations from each stratum.
The number of observations per
stratum is proportional to the stratum’s size in the population.
Cluster Sampling
The population is first divided
into mutually exclusive and
collectively exhaustive groups
called ‘clusters’.
The sample includes observations
from randomly selected clusters.
Stratified vs. Cluster Sampling
Stratified: the sample consists of observations from each group; preferred when the objective is to increase precision.
Cluster: the sample consists of observations from the selected groups; preferred when the objective is to reduce costs.
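The contrast between the two designs can be sketched in a few lines. This is an illustrative example only: the four regional groups and the function names are hypothetical, and the same grouping is reused as strata and as clusters purely to keep the sketch short.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
groups = {              # hypothetical groups of sizes 40, 30, 20 and 10
    "north": [f"n{i}" for i in range(40)],
    "south": [f"s{i}" for i in range(30)],
    "east": [f"e{i}" for i in range(20)],
    "west": [f"w{i}" for i in range(10)],
}

strat = stratified_sample(groups, 0.10, rng)  # some units from every group
clust = cluster_sample(groups, 2, rng)        # every unit from two groups only

print(len(strat), "units drawn across all strata")
print(len(clust), "units drawn from two whole clusters")
```

Every group is represented in the stratified sample, while the cluster sample only visits the selected groups, which is where the cost saving comes from.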
Example Question
In 2010, Apple introduced the iPad, a tablet-style computer that its former CEO
Steve Jobs called a ‘truly magical and revolutionary product’. Suppose you are put
in charge of determining the age profile of people who purchased the iPad in the
United States. Explain in detail how you could use each of the following sampling
strategies to select a representative sample.
Simple random sampling
Stratified random sampling
Cluster sampling
Non-Probability Sampling
Convenience Sampling
Judgement Sampling
Sampling Distribution
❖ Many possible samples of a given size can be drawn from the population.
❖ Our interest is to determine the population parameter, which is a constant.
❖ Given a sample, we can calculate an estimate (a point estimate) of the population parameter of interest.
❖ Since different samples can be drawn, each sample gives a different estimate, so the statistic is a random variable.
❖ Example: sample mean vs. population mean.
❖ The sample mean is a statistic; across all possible samples it produces the possible estimates of the population mean.
❖ Thus, the sample mean is a random variable.
The expected value of the sample mean $\bar{X}$ equals the population mean μ.
$$E(\bar{X}) = \mu$$
The standard deviation of the sample mean $\bar{X}$ is called the standard error of the sample
mean; it equals the population standard deviation σ divided by the square root of n.
$$se(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
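A small simulation can make these two results concrete. The sketch below draws many samples from a hypothetical normal population (μ = 100, σ = 20, n = 25 are assumed values chosen for illustration) and compares the mean and standard deviation of the resulting sample means with μ and σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(1)           # fixed seed so the sketch is reproducible
mu, sigma, n = 100.0, 20.0, 25   # hypothetical population parameters and sample size
num_samples = 20_000             # number of repeated samples

# Each entry is the mean of one sample of size n.
sample_means = [
    fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)   # should be close to mu
sd_of_means = stdev(sample_means)     # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.2f} (mu = {mu})")
print(f"sd of sample means   = {sd_of_means:.2f} (sigma/sqrt(n) = {theoretical_se:.2f})")
```

The simulated values only approximate the theoretical ones; the agreement tightens as the number of repeated samples grows.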
Central Limit Theorem
❖ For any population X with expected value μ and standard deviation σ, the
sampling distribution of $\bar{X}$ will be approximately normal if the sample size
n is sufficiently large. As a general guideline, the normal approximation is
justified when n ≥ 30.
Example Question
❖ According to a survey, high school girls send an average of 100 text messages
daily (The Boston Globe, April 21, 2010). Assume the population standard
deviation is 20 text messages.
❖ What is the probability that the sample mean is more than 105?
❖ What is the probability that the sample mean is less than 95?
❖ What is the probability that the sample mean is between 95 and 105?
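A sketch of how these three probabilities could be computed with the standard error σ/√n. The slide does not state the sample size, so n = 50 below is purely an illustrative assumption; the answers change with n.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 20   # population mean and standard deviation from the question
n = 50                # ASSUMPTION: the sample size is not given on the slide

# Sampling distribution of the sample mean: normal with standard error sigma / sqrt(n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_more_105 = 1 - xbar_dist.cdf(105)
p_less_95 = xbar_dist.cdf(95)
p_between = xbar_dist.cdf(105) - xbar_dist.cdf(95)

print(f"P(mean > 105)      = {p_more_105:.4f}")
print(f"P(mean < 95)       = {p_less_95:.4f}")
print(f"P(95 < mean < 105) = {p_between:.4f}")
```

Because 95 and 105 are equally far from the mean, the first two probabilities are equal and the third is one minus their sum.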
Sampling Distribution of the Sample Proportion
❖ The expected value of the sample proportion $\bar{P}$ equals the population proportion p.
❖ $E(\bar{P}) = p$
❖ The standard deviation of the sample proportion is called the standard error of the
sample proportion.
❖ $se(\bar{P}) = \sqrt{\frac{p(1 - p)}{n}}$
Central Limit Theorem for Sample Proportion
❖ For any population proportion p, the sampling distribution of $\bar{P}$ is
approximately normal if the sample size n is sufficiently large. As a general
guideline, the normal approximation is justified when np ≥ 5 and
n(1 − p) ≥ 5.
Anne Jones wants to determine if a marketing campaign has had a lingering effect on the
proportion of customers who are women and teenage girls. Prior to the
campaign, 43% of the customers were women and 21% were teenage girls. Based on
a random sample of 50 customers after the campaign, these proportions increased to
46% for women and 34% for teenage girls. Anne has the following questions:
If Starbucks chose not to pursue the marketing campaign, how likely is it that
46% or more of the sampled customers would be women?
If Starbucks chose not to pursue the marketing campaign, how likely is it that
34% or more of the sampled customers would be teenage girls?
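A minimal sketch of both calculations under the normal approximation for a sample proportion. The helper name `prob_sample_proportion_at_least` is hypothetical; it uses the pre-campaign proportions as p and checks the np ≥ 5 and n(1 − p) ≥ 5 guideline before applying the approximation.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_at_least(p, n, observed):
    """P(sample proportion >= observed) using the normal approximation."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not justified for this p and n")
    se = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    return 1 - NormalDist(mu=p, sigma=se).cdf(observed)

# Pre-campaign proportions are treated as the population proportions.
p_women = prob_sample_proportion_at_least(p=0.43, n=50, observed=0.46)
p_teens = prob_sample_proportion_at_least(p=0.21, n=50, observed=0.34)

print(f"P(sample proportion of women >= 0.46)        = {p_women:.4f}")
print(f"P(sample proportion of teenage girls >= 0.34) = {p_teens:.4f}")
```

A large probability for women suggests 46% is unremarkable without the campaign, while a small probability for teenage girls suggests 34% would be unusual without it.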
CONFIDENCE INTERVAL
Earlier we discussed point estimators.
Sometimes it is more informative to provide a range of values (an interval) rather
than a single point. Such a range is called a confidence interval.
To construct a confidence interval, we use the sampling distribution. General format:
point estimate ± margin of error.
A confidence interval, also called an interval estimate, provides a range of values that,
with a certain level of confidence, contains the population parameter of interest.
A sample of 25 cereal boxes of Granola, a generic brand of
cereal, yields a mean weight of 1.02 pounds of cereal per box.
Construct the 95% confidence interval for the mean weight for
all cereal boxes. Assume that the weight is normally distributed
with a population standard deviation of 0.03 pound.
Also compute the 99% confidence interval for the mean.
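A sketch of the cereal-box intervals, applying point estimate ± margin of error with a known population standard deviation. The helper `z_interval` is a hypothetical name; the critical value comes from the standard-library `NormalDist().inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

ci95 = z_interval(xbar=1.02, sigma=0.03, n=25, confidence=0.95)
ci99 = z_interval(xbar=1.02, sigma=0.03, n=25, confidence=0.99)

print(f"95% CI: [{ci95[0]:.4f}, {ci95[1]:.4f}]")
print(f"99% CI: [{ci99[0]:.4f}, {ci99[1]:.4f}]")
```

The 99% interval is wider than the 95% interval: more confidence requires a larger margin of error for the same sample.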
A random sample of 30 households was selected as
part of a study on electricity usage, and the number
of kilowatt-hours (kWh) was recorded for each household
in the sample for the March quarter of 2006. The
average usage was found to be 375 kWh. In a very large
study in the March quarter of the previous year it was
found that the standard deviation of the usage was 81 kWh.
Assuming the standard deviation is unchanged and
that the usage is normally distributed, provide an
expression for calculating a 99% confidence interval
for the mean usage in the March quarter of 2006.
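Since σ is treated as known here, one possible form of the requested expression uses the standard normal critical value (z ≈ 2.576 for 99% confidence, a table value):

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 375 \pm z_{0.005}\frac{81}{\sqrt{30}} \approx 375 \pm 2.576 \times 14.79 \approx 375 \pm 38.1$$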
A random sample of 30 households was selected as part of a
study on electricity usage, and the number of kilowatt-hours
(kWh) was recorded for each household in the
sample for the March quarter of 2006. The average usage
was found to be 375 kWh. In a very large study in the
March quarter of the previous year it was found that the
standard deviation of the usage was 81 kWh. It is believed
that the standard deviation may have changed from the
previous year. From the small data set in 2006, the sample
standard deviation is 91.5 kWh. Assuming that the usage is
normally distributed, provide an expression for calculating
a 99% confidence interval for the mean usage in the March
quarter of 2006.
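Because the population standard deviation is now estimated by the sample standard deviation s, one possible form of the expression replaces z with a t critical value on n − 1 = 29 degrees of freedom (t ≈ 2.756 from a t table):

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 375 \pm t_{0.005,\,29}\frac{91.5}{\sqrt{30}} \approx 375 \pm 2.756 \times 16.71$$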
An industrial designer wants to determine the
average amount of time it takes an adult to
assemble an “easy to assemble” toy. A sample of 16
times yielded an average
time of 19.92 minutes, with a sample standard
deviation of 5.73 minutes. Assuming normality of
assembly times, provide a 95% confidence interval for the
mean assembly time.
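A worked version of this interval, using the t distribution with 16 − 1 = 15 degrees of freedom since only the sample standard deviation is available (t ≈ 2.131 is a table value):

$$19.92 \pm t_{0.025,\,15}\frac{5.73}{\sqrt{16}} = 19.92 \pm 2.131 \times 1.4325 \approx 19.92 \pm 3.05$$

This gives an interval of roughly 16.87 to 22.97 minutes.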
What is the smallest sample size required to provide a
95% confidence interval for a mean, if it is important
that the interval be no longer than 1 cm? You may
assume that the population is normal with
variance 9 cm².
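One way to set this up: the length of the interval is twice the margin of error, with σ = √9 = 3 cm and z ≈ 1.96 for 95% confidence. Requiring the length to be at most 1 cm and solving for n gives:

$$2 z_{0.025}\frac{\sigma}{\sqrt{n}} \le 1 \;\Rightarrow\; n \ge \left(2 \times 1.96 \times 3\right)^2 = 11.76^2 \approx 138.3$$

Rounding up to the next whole number, the smallest sample size is n = 139.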
To obtain an estimate of the proportion of ‘full time’
university students who have a part time job in excess of 20
hours per week, the student union decides to
interview a random sample of full time students. They
want the length of their 95% confidence interval to be
no greater than 0.1. What sample size, n, should be taken?
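A sketch of the sample-size calculation for a proportion. No prior estimate of the proportion is given, so the sketch assumes the conservative value p = 0.5, which maximises p(1 − p); the function name and the `p_guess` parameter are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(max_length, confidence, p_guess=0.5):
    """Smallest n whose interval for a proportion is no longer than max_length."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = max_length / 2   # interval length is twice the margin of error
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

# ASSUMPTION: p_guess = 0.5 is the worst case, used because no prior estimate is given.
n_required = sample_size_for_proportion(max_length=0.1, confidence=0.95)

print("Required sample size:", n_required)
```

If a credible prior estimate of the proportion were available, passing it as `p_guess` would give a smaller required sample.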
Suppose that a market research firm is hired to
estimate the percent of adults living in a large city
who have cell phones. Five hundred randomly selected
adult residents in this city are surveyed to determine whether
they have cell phones. Of the 500 people sampled,
421 responded yes: they own cell phones. Using
a 95% confidence level, compute a confidence interval
estimate for the true proportion of adult residents of
this city who have cell phones.
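A minimal sketch of the interval for the cell phone survey, using sample proportion ± z × standard error with the sample proportion plugged into the standard error. The helper name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 421 of the 500 sampled adults said they own a cell phone.
low, high = proportion_interval(successes=421, n=500, confidence=0.95)

print(f"95% CI for the proportion: [{low:.3f}, {high:.3f}]")
```

The interval comes out at roughly 0.81 to 0.87, centred on the sample proportion 421/500 = 0.842.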
The Dundee Dog Training School has a larger than
average proportion of clients who compete in competitive
professional events. A confidence interval for
the population proportion of dogs that compete in
professional events from 150 different training
schools is constructed. The lower limit is determined to
be 0.08 and the upper limit is determined to be 0.16.
Determine the level of confidence used to construct the interval
of the population proportion of dogs that compete in