
MDPN460 – Industrial Engineering Lab
Lecture 4

Analysis of Variance (ANOVA)
Today’s Lecture

Analysis of Variance (ANOVA)
– What is ANOVA?
– Calculations involved
– The ANOVA table
– Demonstrative examples
ANalysis Of VAriance (ANOVA)

ANOVA (ANalysis Of VAriance) is a statistical method for determining the existence of differences among several population means.

[Figure: three normal population curves with means μ1, μ2, and μ3 (Population 1, Population 2, Population 3)]
ANOVA

We have r independent random samples, each one corresponding to a population subject to a different treatment.

We have:
– n = k1 + k2 + k3 + ... + kr total observations.
– r sample means: x̄1, x̄2, x̄3, ..., x̄r
– These r sample means can be used to calculate an estimator of the population variance. If the population means are equal, we expect the variance among the sample means to be small.
– r sample variances: s1², s2², s3², ..., sr²
– These sample variances can be used to find a pooled estimator of the population variance.
ANOVA

We assume independent random sampling from each of the r populations.

We assume that the r populations under study:
– are normally distributed,
– with means μi that may or may not be equal,
– but with equal variances, σi² = σ².

[Figure: three normal population curves with means μ1, μ2, and μ3 (Population 1, Population 2, Population 3)]
The Hypothesis Test of ANOVA

The hypothesis test of analysis of variance:

H0: μ1 = μ2 = μ3 = μ4 = ... = μr
H1: Not all μi (i = 1, ..., r) are equal

The test statistic of analysis of variance:

F(r-1, n-r) = (Estimate of variance based on means from r samples) / (Estimate of variance based on all sample observations)

That is, the test statistic in an analysis of variance is based on the ratio of two estimators of a population variance, and is therefore based on the F distribution, with (r-1) degrees of freedom in the numerator, denoted n1, and (n-r) degrees of freedom in the denominator, denoted n2.
F-Distribution

Let s1² and s2² represent the sample variances of two different populations. If both populations are normal and the population variances σ1² and σ2² are equal, then the sampling distribution of

F = s1² / s2²

is called an F-distribution. There are several properties of this distribution.

1. The F-distribution is a family of curves, each of which is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator, denoted n1, and the degrees of freedom corresponding to the variance in the denominator, denoted n2.
F-Distribution

Properties of the F-distribution continued:
2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.

[Figure: F-distribution density curves for (n1 = 1, n2 = 8), (n1 = 8, n2 = 26), (n1 = 16, n2 = 7), and (n1 = 3, n2 = 11), plotted over F = 1 to 4]
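These properties can be checked numerically with SciPy's F-distribution object; a minimal sketch, assuming SciPy is available in the lab environment:

```python
from scipy.stats import f

# Property 5: the mean of an F(n1, n2) distribution is n2/(n2 - 2),
# which approaches 1 as the denominator df grows.
for n1, n2 in [(1, 8), (8, 26), (16, 7), (3, 11)]:
    print(f"n1={n1:2d}, n2={n2:2d}: mean of F = {f.mean(n1, n2):.3f}")

# Property 2: positive skewness (defined for n2 > 6).
print("skewness of F(8, 26):", f.stats(8, 26, moments="s"))

# Property 3: the total area under the density is 1 (CDF tends to 1).
print("P(F < 1000) for F(8, 26):", f.cdf(1000, 8, 26))
```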
Critical Values for the F-Distribution

Finding Critical Values for the F-Distribution

1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, n1.
3. Determine the degrees of freedom for the denominator, n2.
4. Use the F-distribution table to find the critical value. If the hypothesis test is
   (a) one-tailed, use the α F-table;
   (b) two-tailed, use the ½α F-table.
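In place of the printed tables, the same critical values can be read from SciPy's inverse CDF (percent-point function); a sketch, assuming SciPy is installed (the helper `f_critical` is not a library function, just an illustration of the steps above):

```python
from scipy.stats import f

def f_critical(alpha, n1, n2, two_tailed=False):
    """Upper-tail critical value for the F-distribution.

    For a two-tailed test the upper critical value uses alpha/2,
    matching step 4(b) above.
    """
    if two_tailed:
        alpha = alpha / 2
    return f.ppf(1 - alpha, n1, n2)

# Right-tailed test, alpha = 0.05, n1 = 5, n2 = 28
print(round(f_critical(0.05, 5, 28), 2))               # 2.56
# Two-tailed test, alpha = 0.10, n1 = 4, n2 = 6
print(round(f_critical(0.10, 4, 6, two_tailed=True), 2))  # 4.53
```

These match the table lookups worked in the next two examples.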
Critical Values for the F-Distribution

Example:
Find the critical F-value for a right-tailed test when α = 0.05, n1 = 5 and n2 = 28.

F-Distribution, α = 0.05
Denominator df (n2)   Numerator degrees of freedom (n1)
                      1       2       3       4       5       6
 1                    161.4   199.5   215.7   224.6   230.2   234.0
 2                    18.51   19.00   19.16   19.25   19.30   19.33
 ...
27                    4.21    3.35    2.96    2.73    2.57    2.46
28                    4.20    3.34    2.95    2.71    2.56    2.45
29                    4.18    3.33    2.93    2.70    2.55    2.43

The critical value is F0 = 2.56.

Critical Values for the F-Distribution

Example:
Find the critical F-value for a two-tailed test when α = 0.10, n1 = 4 and n2 = 6. Use ½α = ½(0.10) = 0.05.

F-Distribution, ½α = 0.05
Denominator df (n2)   Numerator degrees of freedom (n1)
                      1       2       3       4       5       6
1                     161.4   199.5   215.7   224.6   230.2   234.0
2                     18.51   19.00   19.16   19.25   19.30   19.33
3                     10.13   9.55    9.28    9.12    9.01    8.94
4                     7.71    6.94    6.59    6.39    6.26    6.16
5                     6.61    5.79    5.41    5.19    5.05    4.95
6                     5.99    5.14    4.76    4.53    4.39    4.28
7                     5.59    4.74    4.35    4.12    3.97    3.87

The critical value is F0 = 4.53.

When the Null Hypothesis Is True

When the null hypothesis is true:
H0: μ1 = μ2 = μ3

We would expect the sample means to be nearly equal, as in this illustration. And we would expect the variation among the sample means (between sample) to be small, relative to the variation found around the individual sample means (within sample).

[Figure: three sample distributions with nearly equal means x̄1, x̄2, x̄3]

If the null hypothesis is true, the numerator in the test statistic is expected to be small, relative to the denominator:

F(r-1, n-r) = (Estimate of variance based on means from r samples) / (Estimate of variance based on all sample observations)
When the Null Hypothesis Is False

When the null hypothesis is false:
– μ1 is equal to μ2 but not to μ3,
– μ1 is equal to μ3 but not to μ2,
– μ2 is equal to μ3 but not to μ1, or
– μ1, μ2, and μ3 are all unequal.

[Figure: three sample distributions with clearly different means x̄1, x̄2, x̄3]

In any of these situations, we would not expect the sample means to all be nearly equal. We would expect the variation among the sample means (between sample) to be large, relative to the variation around the individual sample means (within sample).

If the null hypothesis is false, the numerator in the test statistic is expected to be large, relative to the denominator:

F(r-1, n-r) = (Estimate of variance based on means from r samples) / (Estimate of variance based on all sample observations)
Example 1

A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used. The data in the table on the next slide represent the drying times (in minutes) for 3 different applicators when the paint was applied to standard wallboard. Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?

Note:
1. The type of applicator is a treatment.
2. The data values from repeated samplings are called replicates.
Sample Results:

(Treatment) Applicator (Level)

        Brush       Roller      Pad
        (i = 1)     (i = 2)     (i = 3)
        39.1        31.6        32.7
        39.4        33.4        33.2
        31.1        30.2        28.7
        33.7        41.8        29.2
        30.5        33.9        25.8
        34.6                    31.4
                                26.7
                                29.5
Sum     C1 = 208.4  C2 = 170.9  C3 = 237.2
Mean    x̄1 = 34.73  x̄2 = 34.18  x̄3 = 29.65

Grand mean: x̄ = 32.45
Note:
1. The drying time is measured by the mean value; x̄i is the mean drying time for treatment i, i = 1, 2, 3.
2. There is a certain amount of variation among the means.
3. Some variation can be expected, even if all three population means are equal.
4. Consider the question: “Is the variation among the sample means due to chance, or is it due to the effect of the applicator on drying time?”
Solution:
1. The Set-up:
a. Population parameter of concern: The mean at each
treatment of the test factor. Here, the mean drying time for
each applicator.
b. The null and the alternative hypothesis:
H0: μ1 = μ2 = μ3
The mean drying time is the same for each applicator.
Ha: μi ≠ μj for some i ≠ j
Not all drying time means are equal.
2. The Test Criteria:
a. Assumptions: The data was randomly collected and all
observations are independent. The effects due to chance and
untested factors are assumed to be normally distributed.
b. Test statistic: F test statistic (see below).
c. Level of significance: α = 0.05
3. The Sample Evidence:
a. Sample information: Data listed in the given table.
b. Calculate the value of the test statistic:
The F statistic is a ratio of two variances.
Separate the variance in the entire data set into two parts.
Partition the Total Sum of Squares:
Consider the numerator of the fraction used to define the sample variance:

s² = Σ(x - x̄)² / (n - 1)

The numerator of this fraction is called the sum of squares, or total sum of squares, SS(total).

Notation:
Ci = total for column i
ki = number of observations for treatment i
n = Σki = total number of observations

SS(total) = Σ(x - x̄)²
          = Σ[x² - 2x(Σx)/n + (Σx)²/n²]
          = Σx² - 2(Σx)²/n + n(Σx)²/n²
          = Σx² - (Σx)²/n

SS(total) = Σx² - (Σx)²/n
          = total variation in data

SS(factor) = (C1²/k1 + C2²/k2 + C3²/k3 + ...) - (Σx)²/n
           = variation between treatments

SS(error) = Σx² - (C1²/k1 + C2²/k2 + C3²/k3 + ...)
          = SS(total) - SS(factor)
          = variation within the treatments (columns)
Calculations:

SS(total) = Σx² - (Σx)²/n = 20316.69 - (616.5)²/19
          = 20316.69 - 20003.80 = 312.89

SS(factor) = (C1²/k1 + C2²/k2 + C3²/k3) - (Σx)²/n
           = (208.4²/6 + 170.9²/5 + 237.2²/8) - (616.5)²/19
           = 20112.77 - 20003.80 = 108.97

SS(error) = SS(total) - SS(factor) = 312.89 - 108.97 = 203.92
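These sums of squares can be reproduced directly from the data table; a short NumPy sketch (the drying times are copied from the sample-results table above, and NumPy is assumed to be available):

```python
import numpy as np

# Drying times (minutes) from the sample-results table
brush  = [39.1, 39.4, 31.1, 33.7, 30.5, 34.6]
roller = [31.6, 33.4, 30.2, 41.8, 33.9]
pad    = [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5]
groups = [np.array(g) for g in (brush, roller, pad)]

all_x = np.concatenate(groups)
n = all_x.size                                   # 19 observations

# SS(total) = sum(x^2) - (sum(x))^2 / n
ss_total  = np.sum(all_x**2) - all_x.sum()**2 / n
# SS(factor) = sum(Ci^2 / ki) - (sum(x))^2 / n
ss_factor = sum(g.sum()**2 / g.size for g in groups) - all_x.sum()**2 / n
# SS(error) = SS(total) - SS(factor)
ss_error  = ss_total - ss_factor

print(round(ss_total, 2), round(ss_factor, 2), round(ss_error, 2))
# 312.89 108.97 203.92
```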
An ANOVA table is often used to record the sums of squares and to
organize the rest of the calculations.

Format for the ANOVA Table:

Source   df   SS       MS
Factor        108.97
Error         203.92
Total         312.89
Degrees of freedom, df, associated with each of the three sources
of variation:
1. df(factor): one less than the number of treatments (columns), c,
for which the factor is tested.
df(factor) = c - 1
2. df(total): one less than the total number of observations, n.
df(total) = n - 1
n = k1 + k2 + k3 + ...
3. df(error): sum of the degrees of freedom for all levels
tested. Each column has ki - 1 degrees of freedom.
df(error) = (k1 - 1) + (k2 - 1) + (k3 - 1) + ...
=n-c
Calculations:
df(factor) = df(applicator) = c - 1 = 3 - 1 = 2
df(total) = n - 1 = 19 - 1 = 18
df(error) = n - c = 19 - 3 = 16

Note:
The sums of squares and the degrees of freedom must check.

SS(factor) + SS(error) = SS(total)

df(factor) + df(error) = df(total)


Mean Square:
The mean square for the factor being tested and for the error is
obtained by dividing the sum-of-square value by the corresponding
number of degrees of freedom.

MS(factor) = SS(factor) / df(factor)
MS(error) = SS(error) / df(error)
Calculations:

MS(factor) = SS(factor) / df(factor) = 108.97 / 2 = 54.49

MS(error) = SS(error) / df(error) = 203.92 / 16 = 12.75
The Complete ANOVA Table:
Source   df   SS       MS
Factor    2   108.97   54.49
Error    16   203.92   12.75
Total    18   312.89

The Test Statistic:

F* = MS(factor) / MS(error) = 54.49 / 12.75 = 4.27

Numerator degrees of freedom = df(factor)
Denominator degrees of freedom = df(error)
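The entire calculation above is what `scipy.stats.f_oneway` performs in one call; a sketch on the paint-drying data, assuming SciPy is available:

```python
from scipy.stats import f_oneway

brush  = [39.1, 39.4, 31.1, 33.7, 30.5, 34.6]
roller = [31.6, 33.4, 30.2, 41.8, 33.9]
pad    = [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5]

# One-way ANOVA: returns F* and its p-value for df = (2, 16)
stat, pvalue = f_oneway(brush, roller, pad)
print(f"F* = {stat:.2f}, p = {pvalue:.3f}")   # F* = 4.27, p = 0.033
```

This matches the hand calculation and the computer p-value quoted in the next step.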
4. The Probability Distribution (Classical Approach):
a. Critical value: F(n1 = 2, n2 = 16, α = 0.05) = 3.63
b. F* is in the rejection region.

5. The Probability Distribution (p-Value Approach):
a. The p-value:
By computer: P = 0.033
b. The p-value is smaller than the level of significance, α.
6. The Results:
a. Decision: Reject H0.
b. Conclusion: There is evidence to suggest the three
population means are not all the same. The type of
applicator has a significant effect on the paint drying
time.
The Logic Behind ANOVA
• Many experiments are conducted to determine
the effect that different levels of some test factor
have on a response variable.
• Single-factor ANOVA: obtain independent
random samples at each of several levels of the
factor being tested.
• Draw a conclusion concerning the effect that the
levels of the test factors have on the response
variable.
The Logic of the Analysis of Variance Technique:
1. In order to compare the means of the levels of the test
factor, a measure of the variation between the treatments
(columns), the MS(factor), is compared to a measure of the
variation within the levels, MS(error).
2. If the MS(factor) is significantly larger than the MS(error),
then the means for each of the factor levels are not all the
same.
This implies the factor being tested has a significant effect
on the response variable.
3. If the MS(factor) is not significantly larger than the
MS(error), we cannot reject the null hypothesis that all
means are equal.
Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the three population means?

[Figure: box-and-whisker plots of Time (20 to 40) for Levels 1, 2, and 3]
Solution:
1. The box-and-whisker plots show the relationship among
the three samples.
2. The plots suggest the three sample means are different
from each other.
3. This suggests the population means are different.
4. There is relatively little within-sample variation, but a
relatively large amount of between-sample variation.
Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the four population means?

[Figure: box-and-whisker plots of Speed (60 to 140) for Levels 1, 2, 3, and 4]
Solution:
1. The box-and-whisker plots show the relationship among
the four samples.
2. The plots suggest the four sample means are not different
from each other.
3. There is relatively little between-sample variation, but a
relatively large amount of within-sample variation.

The data values within each sample cover a relatively wide range of values.
Applications of Single-Factor
(One-Way) ANOVA
• Consider the notation used in ANOVA.
• Each observation has two subscripts: first
indicates the column number (test factor level);
second identifies the replicate (row) number.
• The column totals: Ci
• The grand total (sum of all x’s): T
Notation used in ANOVA:

Factor Levels
Replication     Level 1   Level 2   Level 3   ...   Level c
k = 1           x1,1      x2,1      x3,1            xc,1
k = 2           x1,2      x2,2      x3,2            xc,2
k = 3           x1,3      x2,3      x3,3            xc,3
...
Column totals   C1        C2        C3              Cc

T = grand total = sum of all x's = Σx = ΣCi

Mathematical Model for Single-Factor ANOVA:

xc,k = μ + Fc + εk(c)

1. m: mean value for all the data without respect to the test
factor.
2. Fc: effect of factor (level) c on the response variable.
3. ek(c): experiment error that occurs among the k replicates in
each of the c columns.
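The model can be used to simulate data and sanity-check the F test; a sketch with hypothetical values for μ, the factor effects Fc, and the error standard deviation (none of these numbers come from the lecture):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(seed=1)

mu = 30.0                      # overall mean (hypothetical)
effects = [0.0, 4.0, -4.0]     # factor effects F_c (hypothetical)
sigma = 1.0                    # sd of the error term e_k(c)
k = 25                         # replicates per level

# x_{c,k} = mu + F_c + e_{k(c)} for each level c
samples = [mu + Fc + sigma * rng.standard_normal(k) for Fc in effects]

stat, p = f_oneway(*samples)
print(f"F* = {stat:.1f}, p = {p:.2g}")
```

With effects this large relative to sigma, F* is huge and the p-value tiny, illustrating the "numerator large relative to denominator" logic from earlier slides.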
Example: A study was conducted to determine the effectiveness of
various drugs on post-operative pain. The purpose of the
experiment was to decide if there is any difference in length of pain
relief due to drug. Eighty patients with similar operations were
selected at random and split into four groups. Each patient was
given one of four drugs and checked regularly. The length of pain
relief (in hours) was recorded for each patient. At the 0.05 level of
significance, is there any evidence to reject the claim that the four
drugs are equally effective?

Note:
1. The data is omitted here.
2. The ANOVA table is given in a later slide.
Solution:
1. The Set-up:
a. Population parameter of interest: The mean time of pain
relief for each factor (drug).
b. The null and alternative hypothesis:
H0: μ1 = μ2 = μ3 = μ4
Ha: the means are not all equal.
2. The Hypothesis Test Criteria:
a. Assumptions: The patients were randomly assigned to
drug and their times are independent of each other. The
effects due to chance and untested factors are assumed
to be normally distributed.
b. Test statistic: F* with df(numerator) = df(factor) = 3 and
df(denominator) = df(error) = 80 - 4 = 76
c. Level of significance: α = 0.05
3. The Sample Evidence:
a. Sample information: The ANOVA table:
Source   df   SS       MS
Factor    3    70.84   23.61
Error    76   226.05    2.97
Total    79   296.89

b. Calculate the value of the test statistic:

F* = MS(factor) / MS(error) = 23.61 / 2.97 = 7.95
4. The Probability Distribution (Classical Approach):
a. Critical value: F(3, 76, 0.05) ≈ 2.72
b. F* is in the rejection region.
5. The Probability Distribution (p-Value Approach):
a. The p-value:
P = P(F* > 7.95, with dfn = 3, dfd = 76) < 0.01
By computer: P ≈ 0.0001
b. The p-value is smaller than the level of significance, α.
6. The Results:
a. Decision: Reject H0.
b. Conclusion: There is evidence to suggest that not all
drugs have the same effect on length of pain relief.
