Faculty of Information Science & Technology
(FIST)
PSM 0325
Introduction to Probability and Statistics
Foundation in Life Science
Foundation in Information Technology
ONLINE NOTES
Topic 5
Sampling and Estimation
FIST, MULTIMEDIA UNIVERSITY (436821-T)
MELAKA CAMPUS, JALAN AYER KEROH LAMA, 75450 MELAKA, MALAYSIA.
URL: http://fist2.mmu.edu.my
PSM0025 Introduction to Probability and Statistics Topic 5
TOPIC 5
SAMPLING AND ESTIMATION
Reference:
Assliza Salim et al., Introduction to Probability and Statistics, Pearson, 2011.
Objectives:
1. To understand the concept of a population distribution and a sampling distribution.
2. To calculate probabilities involving the sample mean.
3. To calculate a point estimate and a confidence interval.
Contents:
1. Population distribution
2. Sampling distribution
3. Mean and standard deviation of $\bar{x}$
4. Shape of sampling distribution
5. Estimation Theory
5.1 POPULATION DISTRIBUTION
The population distribution is the probability distribution of the population data.
It is denoted by $P(x)$.
5.2 SAMPLING DISTRIBUTION
The sampling distribution is the probability distribution of a sample statistic, i.e. the
probability distribution of $\bar{x}$. It is denoted by $P(\bar{x})$.
5.3 MEAN AND STANDARD DEVIATION OF $\bar{x}$
The mean and standard deviation of the sampling distribution of $\bar{x}$ are denoted by
$\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ respectively.
Mean of the sampling distribution of $\bar{x}$
The mean of the sampling distribution of $\bar{x}$ is always equal to the mean of the
population, i.e. $\mu_{\bar{x}} = \mu$. The sample mean $\bar{x}$ is therefore said to be an
unbiased estimator of the population mean, $\mu$.
Standard Deviation of the Sampling Distribution of $\bar{x}$
The standard deviation of the sampling distribution of $\bar{x}$ is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the standard deviation of the population and $n$ is the sample size.
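The two results above (mean of the sample means equal to the population mean, standard deviation equal to $\sigma/\sqrt{n}$) can be checked with a small simulation. The sketch below is only an illustration: the function names, the seed, the number of trials and the values $\mu = 32$, $\sigma = 0.3$, $n = 20$ are assumptions chosen for the demonstration, and the population is assumed to be normal.

```python
import math
import random
import statistics

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from a normal population; return their means."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

# Illustrative values only (assumed for this sketch)
means = simulate_sample_means(mu=32.0, sigma=0.3, n=20, trials=5000)

print("mean of sample means:", round(statistics.fmean(means), 4))  # close to mu
print("sd of sample means:  ", round(statistics.stdev(means), 4))  # close to sigma/sqrt(n)
print("sigma / sqrt(n):     ", round(standard_error(0.3, 20), 4))
```

The simulated values will not match the formulas exactly, but they should agree to about two decimal places with this many trials.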
5.4 SHAPE OF THE SAMPLING DISTRIBUTION OF $\bar{x}$
Sampling From a Normally Distributed Population
When the population from which samples are drawn is normally distributed with its mean
equal to $\mu$ and standard deviation equal to $\sigma$, then
a. The mean of $\bar{x}$, $\mu_{\bar{x}}$, is equal to the mean of the population, $\mu$.
b. The standard deviation of $\bar{x}$, $\sigma_{\bar{x}}$, is equal to $\sigma/\sqrt{n}$.
c. The shape of the sampling distribution of $\bar{x}$ is normal, whatever the value of $n$.
Sampling From a Population That is Not Normally Distributed
Central Limit Theorem
For a large sample size, the sampling distribution of $\bar{x}$ is approximately normal,
irrespective of the shape of the population distribution. The mean and standard deviation
of the sampling distribution of $\bar{x}$ are
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The sample size is usually considered to be large if $n \ge 30$.
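A short simulation can make the Central Limit Theorem concrete. The sketch below draws samples of size 30 from an exponential population, which is strongly skewed, and checks that about 95% of the sample means fall within 1.96 standard deviations $\sigma/\sqrt{n}$ of $\mu$, as a normal distribution would predict. The choice of an exponential population with rate 1 (mean 1, standard deviation 1), the seed, the number of trials and the helper name `sample_means` are all assumptions made for illustration.

```python
import math
import random
import statistics

def sample_means(draw, n, trials):
    """Return `trials` sample means, each computed from n independent draws."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(trials)]

rng = random.Random(1)        # fixed seed, an arbitrary choice
mu, sigma, n = 1.0, 1.0, 30   # exponential population with rate 1: mean 1, sd 1

means = sample_means(lambda: rng.expovariate(1.0), n, trials=4000)
se = sigma / math.sqrt(n)     # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of mu
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3), "vs", round(se, 3))
print("share within 1.96 se:", round(inside, 3))  # expected to be near 0.95
```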
Applications of the Sampling Distribution of $\bar{x}$
The z value for a value of $\bar{x}$ is calculated as
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
Example :
Assume that the weights of all packages of a certain brand of cookies are normally
distributed with a mean of 32 ounces and a standard deviation of 0.3 ounces. Find the
probability that the mean weight, $\bar{x}$, of a random sample of 20 packages of this brand of
cookies will be between 31.8 and 31.9 ounces.
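Solution :
$$\mu_{\bar{x}} = \mu = 32 \text{ ounces} \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{20}} = 0.06708204$$
For $\bar{x} = 31.8$: $z = \dfrac{31.8 - 32}{0.06708204} = -2.98$
For $\bar{x} = 31.9$: $z = \dfrac{31.9 - 32}{0.06708204} = -1.49$
$$P(31.8 < \bar{x} < 31.9) = P(-2.98 < Z < -1.49)$$
$$= P(-2.98 < Z < 0) - P(-1.49 < Z < 0)$$
$$= 0.4986 - 0.4319 = 0.0667$$
Therefore, the probability that the mean weight of a sample of 20 packages will be
between 31.8 and 31.9 ounces is 0.0667.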
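The same calculation can be reproduced without a table. The sketch below uses `statistics.NormalDist` from the Python standard library for the standard normal cumulative distribution; the variable names are chosen for illustration only. Because the z values are not rounded to two decimals before the lookup, the result may differ from the table-based answer in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20          # values from the cookie example
se = sigma / math.sqrt(n)             # standard deviation of the sample mean

z_low = (31.8 - mu) / se              # z value for x-bar = 31.8
z_high = (31.9 - mu) / se             # z value for x-bar = 31.9

std_normal = NormalDist()             # mean 0, standard deviation 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print("z values:", round(z_low, 2), round(z_high, 2))
print("P(31.8 < x-bar < 31.9) =", round(prob, 4))
```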
5.5 ESTIMATION THEORY
Introduction
Purpose: to build a foundation that allows statisticians to draw conclusions about the
population parameters from experimental data.
Statistical Inference
Statistical inference consists of those methods by which one makes inferences or
generalizations about a population. It may be divided into two major areas: estimation
and tests of hypotheses.
1. Estimation
- is a procedure by which numeric value(s) are assigned to a population parameter based
on the information collected from a sample.
2. Estimate
- is the value(s) assigned to a population parameter based on the value of a sample statistic.
- E.g. $\bar{x} = 5.5$. Therefore, the estimate of $\mu$ is 5.5.
3. Estimator
- is the sample statistic used to estimate a population parameter.
- E.g. $\bar{x}$ is an estimator for $\mu$.
4. Point Estimate
- is the value of a sample statistic that is used to estimate a population parameter.
- A point estimate of a population parameter $\mu$ is a single value of a statistic $\bar{x}$.
- E.g. if $\bar{x} = 80$, then using $\bar{x}$ as a point estimate of $\mu$ gives 80 as the
estimate of $\mu$.
There are many situations in which it is preferable to determine an interval within which we
would expect to find the value of the parameter. Such an interval is called an interval
estimate.
5. Interval Estimate
In interval estimation, an interval is constructed around the point estimate, and it is stated
that this interval is likely to contain the corresponding population parameter.
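An interval estimate of a population parameter $\theta$ is an interval of the form
$\hat{\theta}_L < \theta < \hat{\theta}_U$, where $\hat{\theta}_L$ and $\hat{\theta}_U$ depend on
the value of the statistic $\hat{\Theta}$ for a particular sample and also on the sampling
distribution of $\hat{\Theta}$.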
Confidence Level & Confidence Interval
- Each interval is constructed with regard to a given
confidence level and is called a confidence interval.
- The confidence level associated with a confidence
interval states how much confidence we have that this
interval contains the true population parameter.
- The confidence level is denoted by $(1 - \alpha)100\%$.
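One Population : Confidence interval of $\mu$ ; $\sigma$ known
If $\bar{x}$ is the mean of a random sample of size $n$ from a population with known variance
$\sigma^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
(or $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$)
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.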
Steps To Follow:
1. First we obtain the value of $1 - \dfrac{\alpha}{2}$.
2. Then we locate this value in the body of the normal distribution table and record the
corresponding value of z.
3. Calculate the confidence interval for $\mu$.
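Steps 1 and 2 amount to an inverse lookup in the standard normal distribution. As a sketch, the helper below (the name `z_critical` is hypothetical) uses `NormalDist().inv_cdf` from the Python standard library in place of the printed table. The values it returns carry more decimals than a table lookup, so they can differ slightly from table values in the third decimal place.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """z value leaving an area of alpha/2 to the right, for a given confidence level."""
    alpha = 1.0 - confidence
    # Step 1: compute 1 - alpha/2; step 2: find the z value with that area to its left
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```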
Example
The average zinc concentration recovered from a sample of zinc measurements in 36
different locations is found to be 2.6 grams per milliliter. Find the 95% and 99%
confidence intervals for the mean zinc concentration in the river. Assume that the
population standard deviation is 0.3.
Solution
The point estimate of $\mu$ is $\bar{x} = 2.6$.
$(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$.
$n = 36$, $\bar{x} = 2.6$, $\sigma = 0.3$
The z-value leaving an area of 0.025 to the right and an area of 0.975 to the left, found by
locating 0.975 in the body of the normal distribution table, is $z_{0.025} = 1.96$.
Hence the 95% confidence interval is
$$2.6 - (1.96)\left(\frac{0.3}{\sqrt{36}}\right) < \mu < 2.6 + (1.96)\left(\frac{0.3}{\sqrt{36}}\right),$$
$$2.50 < \mu < 2.70.$$
To find a 99% confidence interval, $\alpha = 0.01$ and $\alpha/2 = 0.005$, so we find the z-value
leaving an area of 0.005 to the right and 0.995 to the left by locating 0.995 in the body of the
normal distribution table, $z_{0.005} = 2.575$, and the 99% confidence interval is
$$2.6 - (2.575)\left(\frac{0.3}{\sqrt{36}}\right) < \mu < 2.6 + (2.575)\left(\frac{0.3}{\sqrt{36}}\right),$$
$$2.47 < \mu < 2.73.$$
We now see that a longer interval is required to estimate $\mu$ with a higher degree of
confidence.
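Both intervals in the zinc example can be checked with a few lines of code. This is a minimal sketch assuming a known population standard deviation; the function name `mean_confidence_interval` is hypothetical, and the z value comes from `NormalDist().inv_cdf` rather than from the table.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z value with alpha/2 to the right
    margin = z * sigma / math.sqrt(n)             # z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Values from the zinc concentration example
for level in (0.95, 0.99):
    low, high = mean_confidence_interval(xbar=2.6, sigma=0.3, n=36, confidence=level)
    print(f"{level:.0%}: {low:.2f} < mu < {high:.2f}")
# prints:
# 95%: 2.50 < mu < 2.70
# 99%: 2.47 < mu < 2.73
```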
Example: [IS, Pg. 379, Eg. 8-1]
A publishing company has just published a new college textbook. Before the company
decides the price at which to sell this textbook, it wants to know the average price of all
such textbooks in the market. The research department at the company took a sample of
36 such textbooks and collected information on their prices. This information produced a
mean price of $54.40 for this sample. It is known that the standard deviation of the prices
of all such textbooks is $4.50.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such textbooks.
Solution:
$n = 36$, $\bar{x} = \$54.40$, $\sigma = \$4.50$
(a) The point estimate of $\mu$ is $\bar{x} = \$54.40$.
(b) First we divide $\alpha = 0.1$ by 2 to get 0.05.
Then locate 1 - 0.05 = 0.95 in the body of the normal
distribution table. So, $z = 1.65$.
Therefore, the 90% confidence interval for $\mu$ is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 54.40 \pm 1.65\left(\frac{4.50}{\sqrt{36}}\right) = 54.40 \pm 1.24$$
= $53.16 to $55.64
We are 90% confident that the mean price of all such textbooks is between $53.16 and
$55.64.
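The table value z = 1.65 is a rounded lookup, since 0.95 falls between the table entries for 1.64 and 1.65. The sketch below compares the margin of error obtained from the table value with the one obtained from a more precise z value; the variable names are illustrative, and the more precise value is computed with `NormalDist().inv_cdf` from the Python standard library. The two intervals should differ only by about a cent at each end.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 54.40, 4.50, 36        # values from the textbook price example
se = sigma / math.sqrt(n)               # 4.50 / 6 = 0.75

z_table = 1.65                          # value read from the table in the solution
z_exact = NormalDist().inv_cdf(0.95)    # more precise value, about 1.645

for label, z in (("table", z_table), ("inv_cdf", z_exact)):
    margin = z * se                     # margin of error z * sigma / sqrt(n)
    print(f"{label}: z = {z:.3f}, margin = {margin:.2f}, "
          f"interval = {xbar - margin:.2f} to {xbar + margin:.2f}")
```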
Theorem
If $\bar{x}$ is used as an estimate of $\mu$, we can then be $(1 - \alpha)100\%$ confident that
the error will not exceed $z_{\alpha/2}\,\sigma/\sqrt{n}$.
Theorem
If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1 - \alpha)100\%$ confident that the
error will not exceed a specified amount $e$ when the sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2.$$
Example
How large a sample is required in Example 9.2 (the zinc concentration example above) if we
want to be 95% confident that our estimate of $\mu$ is off by less than 0.05?
Solution
The population standard deviation is $\sigma = 0.3$. Then, by the Theorem,
$$n = \left[\frac{(1.96)(0.3)}{0.05}\right]^2 = 138.3.$$
Rounding up to the next whole number, we can be 95% confident that a random sample of size
139 will provide an estimate $\bar{x}$ differing from $\mu$ by an amount less than 0.05.
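The sample size formula is easy to wrap in a small function. In the sketch below the name `required_sample_size` is hypothetical, the z value is computed with `NormalDist().inv_cdf` instead of being read from the table, and the result is always rounded up so that the error bound is not exceeded.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma: float, max_error: float, confidence: float) -> int:
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = (z * sigma / max_error) ** 2
    return math.ceil(n)                 # round up to a whole number of observations

# Values from the example: sigma = 0.3, error below 0.05, 95% confidence
print(required_sample_size(sigma=0.3, max_error=0.05, confidence=0.95))  # 139
```

Halving the allowed error roughly quadruples the required sample size, because $e$ enters the formula squared.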
-----------------------------------------End of Topic 5------------------------------------------------