Normal Distribution
Sampling Distribution
RISHI SINHA
Normal Distribution
•Definition and Characteristics of Normal Distribution
•Properties and Assumptions of Normal Distribution
•Importance in Statistical Analysis
What is normal distribution?
A normal distribution is a type of continuous probability distribution in which most data points
cluster toward the middle of the range, while the rest taper off symmetrically toward either
extreme. The middle of the range is also known as the mean of the distribution.
The normal distribution is also known as a Gaussian distribution or probability bell curve. It is
symmetric about the mean and indicates that values near the mean occur more frequently
than the values that are farther away from the mean.
Assumptions of Normal Distribution
Assumptions of Normal Distribution (Contd…)
Empirical rule
• In normally distributed data, there is a constant proportion of data points lying under the
curve between the mean and a specific number of standard deviations from the mean.
• Thus, for a normal distribution, almost all values lie within
3 standard deviations of the mean.
• These checkpoints at 1, 2, and 3 standard deviations help you
recall the approximate percentages of the area under the curve:
about 68%, 95%, and 99.7%.
• Remember that the empirical rule applies to all normal
distributions, and that these percentages hold only for
normal distributions.
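The percentages behind the empirical rule can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the helper name `area_within` is introduced here for illustration and is not part of the slides.

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2)).
    # The result is the same for every normal distribution after standardizing.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas are roughly 0.68, 0.95, and 0.997, which match the rounded figures usually quoted for the empirical rule.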
Empirical Rule
Source: Introductory Statistics by Neil A. Weiss
Assumptions of Normal Distribution (revisited…)
•Independence of Observations
•Homogeneity of Variance
•Linearity
•Residuals Normally Distributed
Applications of Normal Distribution
•Central Limit Theorem
•Z-Score and Standard Normal Distribution
•Hypothesis Testing and Confidence Intervals
Classification - Accuracy
Sampling Distribution
The sampling distribution of the sample mean is the distribution of all possible
sample means for samples of a given size.
In statistics, the following terms and phrases are synonymous.
> Sampling distribution of the sample mean
> Distribution of the variable $\bar{x}$ (the sample mean)
> Distribution of all possible sample means of a given sample size
Sampling Size & Error
•Sampling error is the error resulting
from using a sample to estimate a
population characteristic.
•Sampling error tends to be smaller
for large samples than for small
samples.
•Let's do some examples
Sampling Size & Error (Contd…)
Heights of Starting Players: Consider the heights, in inches,
of the five starting players on a men's basketball
team. Here the population is the five
players and the variable is height.
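For a population this small, every possible sample can be listed and its mean computed. The sketch below does that for each sample size. The five heights are illustrative values assumed for this example, since the slide's table is not reproduced in the text, and the helper `sample_means` is a name introduced here.

```python
from itertools import combinations
from statistics import mean

# Illustrative heights in inches (an assumption; the slide's table is not shown).
heights = [76, 78, 79, 81, 86]
mu = mean(heights)  # population mean

def sample_means(population, n):
    """Means of all possible samples of size n drawn without replacement."""
    return [mean(sample) for sample in combinations(population, n)]

for n in range(1, len(heights) + 1):
    means = sample_means(heights, n)
    # Share of possible sample means that lie within 1 inch of the population mean.
    close = sum(abs(m - mu) <= 1 for m in means) / len(means)
    print(f"n={n}: {len(means)} possible samples, share within 1 inch of mu = {close:.2f}")
```

With a population this tiny the share need not rise at every single step, but it reaches 1 when the sample is the whole population, which illustrates that sampling error tends to shrink as the sample size grows.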
Standard Deviation of the Sample Mean
When sampling is done without replacement from a finite population, the standard deviation
of the sample mean is
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}},$$
where N is the population size and n is the sample size.
When sampling is done with replacement from a finite population or when it is
done from an infinite population, the appropriate formula is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
When the sample size is small relative to the population size, there is little difference between
sampling with and without replacement. So, in such cases, the two formulas for $\sigma_{\bar{x}}$ yield almost
the same numbers.
In most practical applications, the sample size is small relative to the population size, so
we use only the second formula.
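To see how close the two formulas are in practice, the sketch below evaluates both for a fixed sample size and several population sizes. The values of sigma, n, and N are hypothetical and chosen only for illustration; the function names are introduced here.

```python
import math

def sd_without_replacement(sigma: float, n: int, N: int) -> float:
    """sigma_xbar for sampling without replacement from a finite population of size N."""
    # Finite-population correction factor sqrt((N - n) / (N - 1)) times sigma / sqrt(n).
    return math.sqrt((N - n) / (N - 1)) * sigma / math.sqrt(n)

def sd_with_replacement(sigma: float, n: int) -> float:
    """sigma_xbar for sampling with replacement or from an infinite population."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
n = 25        # hypothetical sample size
for N in (50, 500, 50_000):  # hypothetical population sizes
    a = sd_without_replacement(sigma, n, N)
    b = sd_with_replacement(sigma, n)
    print(f"N={N}: without replacement = {a:.4f}, with replacement = {b:.4f}")
```

As N grows relative to n, the correction factor approaches 1 and the two results become nearly identical.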
Sampling Size & Error (Contd…)
Possible sample means cluster more closely around the population mean as the
sample size increases, and therefore the larger the sample size, the smaller the
sampling error tends to be in estimating a population mean by a sample mean. The
reason is that the standard deviation of the sample mean, σ/√n, decreases as the
sample size n increases.
Exercise: work through Example 7.10, p. 345, Weiss
Central Limit Theorem
For a large sample size, the possible sample means are approximately normally
distributed, regardless of the distribution of the variable under consideration.
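A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a clearly non-normal choice made only for this example) and checks how often the sample mean falls within 1 and 2 standard deviations of the population mean. The function names, the seed, and the number of trials are all choices made here.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_sample_means(n, trials=5000):
    """Sample means of `trials` samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

def share_within(means, k, n):
    """Share of sample means within k standard deviations of the population mean."""
    sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1 for this population
    return sum(abs(m - 1.0) <= k * sd for m in means) / len(means)

for n in (2, 10, 50):
    means = simulate_sample_means(n)
    print(f"n={n}: within 1 SD = {share_within(means, 1, n):.3f}, "
          f"within 2 SD = {share_within(means, 2, n):.3f}")
```

For the larger sample sizes the shares should come close to the 68% and 95% figures of the empirical rule, even though the underlying variable is strongly skewed.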
Summary – Sampling Distribution
Estimating Population Mean
A common problem in statistics is to obtain information about the mean, μ, of a
population. For example, we might want to know:
• the mean age of people in the civilian labor force,
• the mean cost of a wedding,
• the mean gas mileage of a new-model car, or
• the mean starting salary of liberal-arts graduates.
If the population is small, we can ordinarily determine μ exactly by first taking a census
and then computing μ from the population data.
If the population is large, however, as it often is in practice, taking a census is generally
impractical, extremely expensive, or impossible. Nonetheless, we can usually obtain
sufficiently accurate information about μ by taking a sample from the population.
Point Estimate
Prices of New Mobile Homes: The U.S. Census Bureau publishes annual price figures
for new mobile homes in Manufactured Housing Statistics. The figures are obtained
from sampling, not from a census. A simple random sample of 36 new mobile homes
yielded the prices, in thousands of dollars, shown in the table below. Use the data
to estimate the population mean price, μ, of all new mobile homes.
A point estimate of a parameter is the value of a statistic used to estimate the
parameter.
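In code, the point estimate of μ is simply the sample mean. The sketch below uses a short list of hypothetical prices, because the table of 36 prices is not reproduced in the text; the variable names are introduced here.

```python
from statistics import fmean

# Hypothetical prices in thousands of dollars (NOT the 36 values from the slide's table).
sample_prices = [61.5, 58.0, 66.2, 70.1, 63.4, 59.8]

# The sample mean is the statistic; its value is the point estimate of mu.
point_estimate = fmean(sample_prices)
print(f"point estimate of mu: {point_estimate:.2f} thousand dollars")
```

With the real data, the same single call applied to all 36 prices gives the point estimate for the mean price of all new mobile homes.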
Confidence Interval Estimate
A sample mean is usually not equal to the population mean; generally, there is
sampling error. Therefore, we should accompany any point estimate of μ with
information that indicates the accuracy of that estimate. This information is called
a confidence-interval estimate for μ.
A confidence-interval estimate for a parameter provides a range of numbers along
with a percentage confidence that the parameter lies in that range.
Confidence Interval Estimate (Contd…)
Consider again the problem of estimating the (population) mean price, μ, of all
new mobile homes by using the sample data from the earlier table. Let's
assume that the population standard deviation of all such prices is $7.2 thousand,
that is, $7200.
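With σ known, a confidence interval for μ can be sketched as the sample mean plus or minus a multiple of σ/√n. The example below uses σ = 7.2 and n = 36 from the text; the sample mean `x_bar` is a hypothetical placeholder because the sample data are not reproduced here, and the function name `z_interval` is introduced for illustration. It relies on the sample mean being approximately normally distributed.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu when the population standard deviation is known."""
    # z cuts off an area of (1 - confidence) / 2 in each tail of the standard normal curve.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

sigma = 7.2   # population standard deviation, in thousands of dollars (from the text)
n = 36        # sample size (from the text)
x_bar = 63.0  # hypothetical sample mean, in thousands of dollars

low, high = z_interval(x_bar, sigma, n, 0.95)
print(f"95% confidence interval for mu: {low:.2f} to {high:.2f} thousand dollars")
```

Here σ/√n = 7.2/6 = 1.2, so the interval extends about 2 × 1.2 on either side of the sample mean, in line with the empirical rule's figure of roughly 95% within 2 standard deviations.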
Confidence Interval Estimate (Contd…)
Check Empirical Rule