Population                          Sample = Observations
(Some Unknown Parameters)           (We Calculate Some Statistics)
Example: 6 October U Students       Example: 20 Students from 6 October U
(Height Mean)                       (Sample Mean)
N = Population Size                 n = Sample Size
· Let X1,X2,…,XN be the population values (in general, they
are unknown)
· Let x1,x2,…,xn be the sample values (these values are
known)
· Statistics obtained from the sample are used to estimate
(approximate) the parameters of the population.
* Statistical Inference
(1) Estimation:
→ Point Estimation
→ Interval Estimation (Confidence Interval)
(2) Hypothesis Testing
Some Important Statistics:
Definition:
Any function of the random sample X1, X2, …, Xn is called a
statistic.
Central Tendency in the Sample:
Definition:
If X1, X2, …, Xn represents a random sample of size n, then the
sample mean is defined to be the statistic:
$$
\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{(unit)}
$$
Variability in the Sample:
Definition:
If X1, X2, …, Xn represents a random sample of size n, then the
sample variance is defined to be the statistic:
$$
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
    = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1} \quad \text{(unit)}^2
$$
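Theorem: (Computational Formulas for $S^2$)
$$
S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}
    = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}
$$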
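The computational form can be checked by expanding the square in the definition of the sample variance. The short derivation below uses only the fact that $\sum_{i=1}^{n} X_i = n\bar{X}$:

$$
\sum_{i=1}^{n} (X_i - \bar{X})^2
  = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2
  = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2
  = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
$$

Dividing both sides by n − 1 gives the sample variance in a form that needs only the sum and the sum of squares of the observations.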
Note:
· $S^2$ is a statistic because it is a function of the random
sample X1, X2, …, Xn.
· $S^2$ measures the variability in the sample.
The standard deviation:
$$
S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \quad \text{(unit)}
$$
Example:
Compute the sample variance and standard deviation of the
following observations (ages in years): 10, 21, 33, 53, 54.
Solution:
n=5
$$
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{5} x_i}{5}
        = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \ \text{(year)}
$$
$$
S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}
$$
x_i      10    21    33     53     54     ∑ x_i   = 171
x_i^2    100   441   1089   2809   2916   ∑ x_i^2 = 7355
$$
S^2 = \frac{7355 - (5)(34.2)^2}{5-1} = \frac{1506.8}{4} = 376.7 \ \text{(year)}^2
$$
The sample standard deviation is:
$$
S = \sqrt{S^2} = \sqrt{376.7} = 19.41 \ \text{(year)}
$$
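The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative and not part of the notes.

```python
import math
import statistics

ages = [10, 21, 33, 53, 54]   # observations from the example (years)
n = len(ages)

mean = sum(ages) / n                                            # sample mean
s2_def = sum((x - mean) ** 2 for x in ages) / (n - 1)           # definition of S^2
s2_comp = (sum(x * x for x in ages) - n * mean ** 2) / (n - 1)  # computational form
s = math.sqrt(s2_def)                                           # sample standard deviation

print(mean, round(s2_def, 1), round(s2_comp, 1), round(s, 2))

# statistics.variance and statistics.stdev also divide by n - 1
print(round(statistics.variance(ages), 1), round(statistics.stdev(ages), 2))
```

Both forms of the variance give the same value, which is the point of the computational formula.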
Random Sampling:
• Each observation in a population is a value of a random
variable X having some probability distribution f(x).
• To eliminate bias in the sampling procedure, we select a
random sample in the sense that the observations are made
independently and at random.
• The random sample of size n is: X1, X2, …, Xn
It consists of n observations selected independently and
randomly from the population.
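As a rough illustration of drawing a random sample, the sketch below generates n independent observations from an assumed population distribution and computes two statistics from them. The normal population with mean 170 and standard deviation 8 (heights in cm), the seed, and the variable names are all hypothetical choices made for this example.

```python
import random
import statistics

rng = random.Random(42)        # fixed seed so the run is repeatable
mu, sigma = 170.0, 8.0         # hypothetical population parameters (unknown in practice)
n = 20                         # sample size

# n observations made independently and at random from the population
sample = [rng.gauss(mu, sigma) for _ in range(n)]

x_bar = statistics.fmean(sample)   # statistic that estimates mu
s = statistics.stdev(sample)       # statistic that estimates sigma

print(round(x_bar, 2), round(s, 2))
```

In practice only the sample values are observed; the parameters appear in the code solely to generate the data.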
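If X1, X2, …, Xn is a random sample of size n from a population
with mean µ and variance $\sigma^2$, then the sample mean
$\bar{X}$ has mean
$$
E(\bar{X}) = \mu_{\bar{X}} = \mu
$$
and variance
$$
Var(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}
$$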
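These two results follow from the linearity of expectation and from the independence of the observations in a random sample. A short derivation, assuming each $X_i$ has mean µ and variance $\sigma^2$:

$$
E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}(n\mu) = \mu
$$

$$
Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
             = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i)
             = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}
$$

The second line uses independence: the variance of a sum of independent random variables is the sum of their variances.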
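· If X1, X2, …, Xn is a random sample of size n from N(µ,σ),
then $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}})$ or
$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$.
· $\bar{X} \sim N(\mu, \sigma/\sqrt{n}) \iff Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$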
Theorem: (Central Limit Theorem)
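If X1, X2, …, Xn is a random sample of size n from any distribution
(population) with mean µ and finite variance $\sigma^2$, then, if the
sample size n is large, the random variable
$$
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
$$
is approximately a standard normal random variable, i.e.,
$$
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{approximately.}
$$
$$
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \iff \bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)
$$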
We consider n large when n ≥ 30.
For large sample size n, $\bar{X}$ has approximately a normal
distribution with mean µ and variance $\sigma^2/n$, i.e.,
$$
\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \quad \text{approximately.}
$$
The sampling distribution of $\bar{X}$ is used for inferences about the
population mean µ.
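A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration only: it assumes an Exponential(1) population (which is clearly not normal, with µ = 1 and σ = 1), uses n = 30, and repeats the sampling 5000 times. The population, the number of repetitions, and the seed are arbitrary choices for this example.

```python
import math
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is repeatable
n = 30                     # sample size (the notes treat n >= 30 as large)
reps = 5000                # number of simulated samples (arbitrary)
mu, sigma = 1.0, 1.0       # mean and sd of the assumed Exponential(1) population

# Sample mean of each simulated random sample of size n
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)

# Standardize each sample mean and see how often |Z| < 1.96 (about 0.95 under N(0,1))
se = sigma / math.sqrt(n)
inside = sum(1 for m in sample_means if abs((m - mu) / se) < 1.96) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(se, 3), round(inside, 3))
```

The simulated mean and standard deviation of the sample means can be compared with µ and σ/√n, and the proportion of standardized values inside ±1.96 with the normal value of about 0.95.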
Example:
An electric firm manufactures light bulbs that have a length of life
that is approximately normally distributed with mean equal to
800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an
average life of less than 775 hours.
Solution:
X= the length of life
µ=800 , σ=40
X~N(800, 40)
n=16
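$$
\mu_{\bar{X}} = \mu = 800
$$
$$
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{16}} = 10
$$
$$
\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) = N(800, 10)
\iff Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 800}{10} \sim N(0,1)
$$
$$
P(\bar{X} < 775) = P\left(\frac{\bar{X} - 800}{10} < \frac{775 - 800}{10}\right)
                 = P\left(Z < \frac{775 - 800}{10}\right)
                 = P(Z < -2.50)
                 = 0.0062
$$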
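The same probability can be computed without a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 16          # population mean, population sd, sample size
se = sigma / math.sqrt(n)           # standard deviation of the sample mean

# Direct route: distribution of the sample mean
p_direct = NormalDist(mu, se).cdf(775)

# Standardized route: convert to Z and use the standard normal
z = (775 - mu) / se
p_standard = NormalDist().cdf(z)

print(se, z, round(p_direct, 4), round(p_standard, 4))
```

Both routes give the same probability, since standardizing only rescales the variable.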
t-Distribution:
Recall that, if X1, X2, …, Xn is a random sample of size n
from a normal distribution with mean µ and variance $\sigma^2$, i.e.
N(µ,σ), then
$$
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)
$$
We can apply this result only when $\sigma^2$ is known.
If $\sigma^2$ is unknown, we replace the population variance $\sigma^2$ with
the sample variance
$$
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
$$
to have the following statistic:
$$
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}
$$
Result:
If X1, X2, …, Xn is a random sample of size n from a normal
distribution with mean µ and unknown variance $\sigma^2$, i.e. N(µ,σ),
then the statistic
$$
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}
$$
has a t-distribution with ν = n − 1 degrees of freedom (df), and we
write T ~ t(ν).
Note:
· The t-distribution is a continuous distribution.
· The shape of the t-distribution is similar to the shape of the
standard normal distribution.
Notation:
$t_{\alpha}$ = the t-value above which we find an area equal to α, that
is, $P(T > t_{\alpha}) = \alpha$.
Since the curve of the pdf of T ~ t(ν) is symmetric about 0, we
have
$$
t_{1-\alpha} = -t_{\alpha}
$$
Values of tα are tabulated in Table A-4 (p.683).
Critical Values of the t-distribution (tα )
Example:
Find the t-value with ν=14 (df) that leaves an area
of:
(a) 0.95 to the left.
(b) 0.95 to the right.
Solution:
ν = 14 (df); T~ t(14)
(a) The t-value that leaves an area of 0.95 to the left is
$$
t_{0.05} = 1.761
$$
(b) The t-value that leaves an area of 0.95 to the right is
$$
t_{0.95} = -t_{1-0.95} = -t_{0.05} = -1.761
$$
Example:
For ν = 10 degrees of freedom (df), find $t_{0.10}$ and $t_{0.85}$.
Solution:
$$
t_{0.10} = 1.372
$$
$$
t_{0.85} = -t_{1-0.85} = -t_{0.15} = -1.093 \quad (t_{0.15} = 1.093)
$$
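The tabulated t-values in these examples can also be approximated numerically. The sketch below is one possible approach using only the standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names `t_cdf` and `t_alpha` are made up for this illustration, and the step count and search interval are arbitrary choices.

```python
import math

def t_cdf(t, nu, steps=1000):
    """P(T <= t) for T ~ t(nu): Simpson's rule on [0, |t|] plus symmetry about 0."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def pdf(u):
        return coef * (1 + u * u / nu) ** (-(nu + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3               # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_alpha(alpha, nu):
    """t-value with area alpha to its right, i.e. P(T > t_alpha) = alpha."""
    lo, hi = -50.0, 50.0
    for _ in range(60):                # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, nu) > alpha:
            lo = mid                   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_alpha(0.05, 14), 3), round(t_alpha(0.95, 14), 3))
print(round(t_alpha(0.10, 10), 3), round(t_alpha(0.85, 10), 3))
```

The values obtained this way should agree with the table to three decimals, and the negative results illustrate the symmetry relation between the upper and lower critical values.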
Sampling Distribution of the Sample Proportion:
Suppose that the size of a population is N. Each element of the
population can be classified as type A or non-type A. Let p be
the proportion of elements of type A in the population. A random
sample of size n is drawn from this population. Let p̂ be the
proportion of elements of type A in the sample.
Let X = no. of elements of type A in the sample
p = Population Proportion
$$
p = \frac{\text{no. of elements of type A in the population}}{N}
$$
p̂ = Sample Proportion
$$
\hat{p} = \frac{\text{no. of elements of type A in the sample}}{n} = \frac{X}{n}
$$
Result:
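(1) X ~ Binomial (n, p)
(2) $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = p$
(3) $Var(\hat{p}) = Var\left(\dfrac{X}{n}\right) = \dfrac{pq}{n}$ ; q = 1 − p
(4) For large n, we have
$$
\hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right) \quad \text{(Approximately)}
$$
$$
Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \sim N(0,1) \quad \text{(Approximately)}
$$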
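The large-sample result for the sample proportion can be sketched in code as follows. The helper name `phat_distribution` and the numbers p = 0.5, n = 100 are hypothetical, chosen only to illustrate the formula; `statistics.NormalDist` takes the standard deviation as its second argument, matching the N(mean, standard deviation) notation used in these notes.

```python
import math
from statistics import NormalDist

def phat_distribution(p, n):
    """Approximate normal distribution of the sample proportion for large n."""
    q = 1 - p
    return NormalDist(p, math.sqrt(p * q / n))   # mean p, sd sqrt(pq/n)

# Hypothetical illustration: p = 0.5, n = 100
dist = phat_distribution(0.5, 100)
z = (0.6 - dist.mean) / dist.stdev               # standardized value of 0.6
prob = dist.cdf(0.6)                             # approximate P(p_hat < 0.6)

print(round(dist.stdev, 4), round(z, 2), round(prob, 4))
```

The approximation is only appropriate when n is large, as stated in result (4).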