BADM 221 Statistics for Business
Week 10
ANOVA (Analysis of Variance)
ANOVA
Test of several means – a k-sample hypothesis test
Many statistical applications in psychology, social science,
business administration, and the natural sciences involve several
groups.
Example:
• An experiment to study the effects of five different brands of
gasoline on car engine efficiency.
• A consumer looking for a new car might compare the average
gas mileage of seven car models.
• A professor wishes to study the effect of four different teaching
techniques on mathematics proficiency.
The characteristic that differentiates the treatments from one
another is called the factor of the study. The different
treatments are called the levels of the factor. Here, we only
consider one factor.
Example:
• An experiment to study the effects of five different brands of
gasoline on car engine efficiency.
  Factor: Gasoline brand.  Treatments: the 5 different brands.
• A consumer looking for a new car might compare the average gas
mileage of seven car models.
  Factor: Car model.  Treatments: the 7 car models.
• A professor wishes to study the effect of four different teaching
techniques on mathematics proficiency.
  Factor: Teaching technique.  Treatments: the 4 different techniques.
For hypothesis tests comparing averages among more than two
groups, statisticians have developed a method called
“Analysis of Variance” (abbreviated ANOVA).
One-way ANOVA
(Single-factor ANOVA)
The purpose of an ANOVA test is to determine whether
there is any significant difference among several group
means. The test uses variances to help determine if the
means are equal or not.
Two kinds of variance (sources of variation):
• Variance between treatments:
  Variation due to the different levels of the factor,
  termed the Sum of Squares of the treatment/factor:
  SS(Treatment) or SS(Factor)
• Variance within treatments:
  Variation due to error,
  termed the Sum of Squares of Error:
  SS(Error)
Null and Alternative Hypothesis
H0: All the population means are the same.
Ha: At least one of the means is different.
Suppose we want to compare k groups.
H0: The population means of all k groups are the same.
Ha: At least one group has a different mean.
H0: μ1 = μ2 = … = μk
Ha: At least one μi is different from the others.
Data are typically put into a table for easy referencing by
computer software. The table is called the ANOVA table.

Number of treatments: k    Total number of data values: n

Source of Variation       | Sum of Squares (SS)         | Degrees of Freedom (df) | Mean Square (MS)                | F
Between Treatments        | SS(Factor) or SS(Treatment) | k – 1                   | MS(Factor) = SS(Factor)/(k – 1) | F = MS(Factor)/MS(Error)
Error (Within Treatments) | SS(Error)                   | n – k                   | MS(Error) = SS(Error)/(n – k)   |
Total                     | SS(Total)                   | n – 1                   |                                 |
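As a sketch of how these table entries are computed (a minimal illustration with made-up data, not taken from the slides), the sums of squares and the F statistic can be built directly from raw group data:

```python
import numpy as np

def one_way_anova(groups):
    """Compute SS(Factor), SS(Error), and F for a one-way ANOVA."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n, k = data.size, len(groups)
    grand_mean = data.mean()
    # Between-treatments variation: group means around the grand mean
    ss_factor = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatments variation: observations around their own group mean
    ss_error = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
                   for g in groups)
    ms_factor = ss_factor / (k - 1)   # MS(Factor) = SS(Factor)/(k - 1)
    ms_error = ss_error / (n - k)     # MS(Error)  = SS(Error)/(n - k)
    return ss_factor, ss_error, ms_factor / ms_error

# Hypothetical data: three treatments with three observations each
ss_f, ss_e, f_stat = one_way_anova([[5, 6, 7], [8, 9, 10], [4, 5, 6]])
```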
Example:
Three different diet plans are to be tested for mean weight loss. The
entries in the table are the weight losses for the different plans.
Plan 1 | Plan 2 | Plan 3
5      | 3.5    | 8
4.5    | 7      | 4
4      | 4.5    | 3.5
3      |        |
The resulting ANOVA table is shown below:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         |                    |             |
Error (Within Treatments) | 20.8542        |                    |             |
Total                     |                |                    |             |
Number of treatments: k = 3    Total number of data values: n = 10

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | k – 1 = 2          |             |
Error (Within Treatments) | 20.8542        | n – k = 7          |             |
Total                     |                | n – 1 = 9          |             |
Example (continued):
Three different diet plans are to be tested for mean weight loss. The
entries in the table are the weight losses for the different plans.
Plan 1 | Plan 2 | Plan 3
5      | 3.5    | 8
4.5    | 7      | 4
4      | 4.5    | 3.5
3      |        |

Test the hypothesis that the mean weight losses of the 3 diet plans are
the same, at the 5% level of significance.
Hypothesis Testing:
H0: The population mean weight losses of the three diet
plans are ALL the same.
Ha: At least one of the diet plans has a different mean
weight loss.

ANOVA table:

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2                  | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7                  | 2.9792      |
Total                     | 23.1           | 9                  |             |
Hypothesis Testing:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2                  | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7                  | 2.9792      |
Total                     | 23.1           | 9                  |             |

The test statistic is compared against an F-distribution, whose shape
depends on the two degrees of freedom.
[Figure: F-distribution density curves for F(3,5), F(10,90), F(50,50), and F(90,10)]
Hypothesis Testing:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2 (df1)            | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7 (df2)            | 2.9792      |
Total                     | 23.1           | 9                  |             |

Critical value: F(df1, df2) = F(2, 7) = 4.7375
Test statistic: Fc = 0.3769
Hypothesis Testing:
Reject H0 if (Test Statistic > Critical value).
Do not reject H0 if (Test Statistic ≤ Critical value).

Critical value: F(2, 7) = 4.7375
Test statistic: Fc = 0.3769

Since Fc < F(2, 7), do not reject H0.
There is insufficient evidence that at least one of the
diet plans has a different mean weight loss.
1. H0: The population mean weight losses of the three diet
   plans are ALL the same.
   Ha: At least one of the diet plans has a different mean
   weight loss.
2. Test statistic: Fc = 0.3769
3. Critical value: at the 5% level of significance, F(2, 7) = 4.7375
4. Fc < F(2, 7) ⇒ Do not reject H0.
5. Conclusion: Do not reject H0 at a 5% level of significance.
   There is insufficient evidence that at least one of the diet
   plans has a different mean weight loss.
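The same five-step test can be reproduced in software, assuming SciPy is available; `f_oneway` returns the F statistic and p-value directly:

```python
from scipy import stats

plan1 = [5, 4.5, 4, 3]   # weight losses under Plan 1
plan2 = [3.5, 7, 4.5]    # Plan 2
plan3 = [8, 4, 3.5]      # Plan 3

# One-way ANOVA: F statistic and p-value
f_stat, p_value = stats.f_oneway(plan1, plan2, plan3)
# f_stat ≈ 0.3769; since p_value > 0.05, do not reject H0
```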
Example:
As part of an experiment to see how different types of soil cover
would affect slicing tomato production, Douglas College students
grew tomato plants under different soil cover conditions. Groups of
three plants each had one of the 5 treatments (i.e. a total of 15 plants).
All plants grew under the same conditions and were the same variety.
Students recorded the weight (in grams) of tomatoes produced by
each of the plants and the results are summarized in an ANOVA table:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 36,648,561     |                    |             |
Error (Within Treatments) |                |                    |             |
Total                     | 57,095,287     |                    |             |
At the 0.05 level of significance, conduct a hypothesis test to
determine if all treatment means are the same.
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square  | F
Between Treatments        | 36,648,561     | 4                  | 9,162,140.25 | 4.481
Error (Within Treatments) | 20,446,726     | 10                 | 2,044,672.6  |
Total                     | 57,095,287     | 14                 |              |

1. H0: The population means of all 5 treatments are the same.
   Ha: At least one treatment has a different mean.
2. Test statistic: Fc = 4.481
3. Critical value: at the 5% level of significance, F(4, 10) = 3.478
4. Fc > F(4, 10) ⇒ Reject H0.
5. Conclusion: Reject H0 at a 5% level of significance.
   There is sufficient evidence that at least one treatment
   has a different mean.
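The critical value F(4, 10) = 3.478 can be looked up in software rather than a table; a sketch assuming SciPy:

```python
from scipy import stats

alpha = 0.05
df1, df2 = 4, 10                              # k - 1 = 4, n - k = 10
f_crit = stats.f.ppf(1 - alpha, df1, df2)     # critical value, about 3.478
f_stat = 9_162_140.25 / 2_044_672.6           # MS(Factor)/MS(Error), about 4.481
# f_stat > f_crit, so reject H0
```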
Example:
In a completely randomized experimental design, 7 experimental
units were used for each of the 4 levels of the factor:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        |                |                    |             |
Error (Within Treatments) | 24,000         |                    |             |
Total                     | 38,301         |                    |             |

Complete the ANOVA table and test the hypothesis that the
population treatment means are all the same, at α = 0.05.
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 14,301         | 3                  | 4,767       | 4.767
Error (Within Treatments) | 24,000         | 24                 | 1,000       |
Total                     | 38,301         | 27                 |             |

1. H0: The population means of all 4 treatments are the same.
   Ha: At least one treatment has a different mean.
2. Test statistic: Fc = 4.767
3. Critical value: at α = 0.05, F(3, 24) ≈ F(3, 20) = 3.0983 (nearest table value)
4. Fc > F(3, 24) ⇒ Reject H0.
5. Conclusion: Reject H0 at a 5% level of significance.
   There is sufficient evidence that at least one treatment has a
   different mean.
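The missing table entries follow from the identities SS(Total) = SS(Treatment) + SS(Error), MS = SS/df, and F = MS(Treatment)/MS(Error); a sketch in Python (SciPy assumed for the p-value):

```python
from scipy import stats

k = 4                                  # levels of the factor
n = 4 * 7                              # 7 experimental units per level
ss_total, ss_error = 38301, 24000

ss_treat = ss_total - ss_error         # SS(Treatment) = 14,301
ms_treat = ss_treat / (k - 1)          # 4,767
ms_error = ss_error / (n - k)          # 1,000
f_stat = ms_treat / ms_error           # 4.767
p_value = stats.f.sf(f_stat, k - 1, n - k)
# p_value < 0.05, so reject H0
```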
BADM 221 Statistics for Business
Unit 11
Linear Regression
Linear Regression
Regression is a statistical technique that uses the idea
that one variable may be related to one or more variables
through an equation.
Here we consider only two variables with a straight-line
relationship, which is called simple linear regression.
Simple linear regression uses the relationship between the
two variables to obtain information about one variable by
knowing the values of the other.
The equation showing this type of relationship is called the
linear regression equation.
Linear equation: y = mx + b, where m is the slope and b is the y-intercept.

Example: y = 2x – 1 has slope m = 2 and y-intercept b = –1.
We want to use X to predict (or estimate) the value of Y that
might be obtained without actually measuring it, provided
the relationship between the two can be expressed by a line.
“ X ” is usually called the independent variable and “ Y ” is
called the dependent variable.
[Scatterplot: Statistics Score (Y) against Mathematics Score (X)]
Example: The exam scores of a class of 9 students in
Mathematics ( X ) and in Statistics ( Y ) are shown
below:
Math Score (X) 80 58 92 60 75 63 93 76 78
Stat Score (Y) 78 64 96 62 78 65 90 61 82
[Scatterplot of the nine (Math Score, Stat Score) pairs]
We want to determine the equation of the regression line
that best-fits the data.
[Figure: four candidate straight lines drawn through the scatterplot of
Statistics Score against Mathematics Score]
Equation of the regression line:
           | df | SS
Regression | 1  | 1004.483
Residual   | 7  | 301.517
Total      | 8  | 1306

           | Coefficients | Standard Error | t Stat | p-value
Intercept  | 9.450        | 13.74          | 0.687  | 0.513
Math Score | 0.872        | 0.1807         | 4.829  | 0.001
Equation of the regression line (from the regression output above):

Y = 9.450 + 0.872 X
Stat Score = 9.450 + 0.872 × (Math Score)
We can then make predictions using the regression equation:

Stat Score = 9.450 + 0.872 × (Math Score)

For example:

Score in Math | Estimated score in Stat
61            | 9.450 + 0.872 × 61 = 62.64
73            | 9.450 + 0.872 × 73 = 73.11
91            | 9.450 + 0.872 × 91 = 88.80
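The predictions above can be scripted; a minimal sketch using the fitted coefficients from the slides (the function name is just for illustration):

```python
def predict_stat_score(math_score):
    """Predicted Statistics score from the fitted line
    Stat Score = 9.450 + 0.872 * (Math Score)."""
    return 9.450 + 0.872 * math_score

for x in (61, 73, 91):
    print(x, round(predict_stat_score(x), 2))
```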
Is the regression relationship significant?
Null and Alternative Hypothesis
H0: There is no linear relationship between X and Y.
    (The regression relationship is NOT significant.)
Ha: There is a linear relationship between X and Y.
    (The regression relationship is significant.)
Is the regression relationship significant?
Use the p-value approach
Reject H0 if (p-value ≤ level of significance)
  ⇒ The regression relationship is significant.
Do not reject H0 if (p-value > level of significance)
  ⇒ The regression relationship is NOT significant.
Is the regression relationship significant?
           | Coefficients | Standard Error | t Stat | p-value
Intercept  | 9.450        | 13.74          | 0.687  | 0.513
Math Score | 0.872        | 0.1807         | 4.829  | 0.001

Which p-value? Use the p-value of the slope coefficient (Math Score),
not the intercept.
Is the regression relationship significant?
As an illustration, take level of significance = 5%.
The p-value for Math Score is 0.001 < 0.05, the level of significance.
⇒ Reject H0. The regression relationship is significant.
How good is the regression equation?
Coefficient of Determination, R²

R² = SS(Regression) / SS(Total)    (a decimal; often expressed as a percentage)
Interpreted as the percentage of the observed variation in Y
that can be explained by the variation in X.
           | df | SS
Regression | 1  | 1004.483
Residual   | 7  | 301.517
Total      | 8  | 1306

R² = 1004.483 / 1306 = 0.7691 = 76.91%
76.91% of the variability of the Statistics score can be explained
by the linear relationship with the Mathematics score.
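The R² arithmetic can be checked directly from the SS column (a minimal sketch):

```python
ss_regression = 1004.483
ss_total = 1306

# Coefficient of determination: share of the variation in Y
# explained by the linear relationship with X
r_squared = ss_regression / ss_total
print(f"{r_squared:.4f}")   # 0.7691
```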
Example:
A teacher wishes to investigate if there is any relationship
between a student’s exam score in Mathematics (X) and the
exam score in Accounting (Y). A sample of 11 students is
randomly selected and the results are summarized in the
ANOVA table below:
           | df | SS
Regression | 1  | 1305.68
Residual   | 9  | 81.96
Total      | 10 | 1387.64

          | Coefficients | Standard Error | t Stat | p-value
Intercept | 24.13        | 4.657          | 5.182  | 0.005
MathScore | 0.759        | 0.063          | 11.974 | 0.001
What is the estimated regression equation that relates the exam score in
accounting (Y) to the score in mathematics (X)?
What is the estimated exam score in accounting if a student got a score of
80 in mathematics?
What is the estimated regression equation that relates the exam score in
accounting (Y) to the score in mathematics (X)?

Y = 24.13 + 0.759 X
Acc. Score = 24.13 + 0.759 × (MathScore)

What is the estimated exam score in accounting if a student got a score of
80 in mathematics?

24.13 + 0.759 × 80 = 84.85
Is the regression relationship significant? Use the p-value approach and 2%
level of significance.
Is the regression relationship significant? Use the p-value approach and 2%
level of significance.

The p-value for MathScore is 0.001 < 0.02, the level of significance.
⇒ Reject H0. The regression relationship is significant.
Compute the coefficient of determination between the exam score in
accounting and the exam score in mathematics. Interpret the result in the
context of the problem.
Compute the coefficient of determination between the exam score in
accounting and the exam score in mathematics. Interpret the result in the
context of the problem.

R² = 1305.68 / 1387.64 = 0.9409 = 94.09%
94.09% of the variability of the exam score in
accounting can be explained by the linear
relationship with the exam score in mathematics.
Summary of the example:

Estimated regression equation: Acc. Score = 24.13 + 0.759 × (MathScore)
Coefficient of determination: R² = 1305.68 / 1387.64 = 0.9409 = 94.09%
Significance of the regression relationship:
  p-value ≤ the level of significance
  ⇒ The regression relationship is significant.
  p-value > the level of significance
  ⇒ The regression relationship is NOT significant.
Example:
The accountant at Walmart wants to determine the
relationship between customer purchases at the store, Y ($),
and the customer monthly salary, X ($). A sample of 15
customers is randomly selected and the results are
summarized in the ANOVA table below:
           | df | SS
Regression | 1  | 186952
Residual   | 13 | 99236
Total      | 14 | 286188

          | Coefficients | Standard Error | t Stat | p-value
Intercept | 78.58        | 7.540          | 1.202  | 0.035
Salary    | 0.066        | 0.013          | 4.948  | 0.003
What is the estimated regression equation that relates the amount of
customer’s purchase (Y) to the customer’s monthly salary (X)?
What is the estimated regression equation that relates the amount of
customer’s purchase (Y) to the customer’s monthly salary (X)?

Y = 78.58 + 0.066 X
Amt. Purchase = 78.58 + 0.066 × (Salary)
Is the regression relationship significant? Use the p-value approach and 1%
level of significance.
Is the regression relationship significant? Use the p-value approach and 1%
level of significance.

The p-value for Salary is 0.003 < 0.01, the level of significance.
⇒ Reject H0. The regression relationship is significant.
Compute the coefficient of determination between the amount purchased and
the customer’s monthly salary. Interpret the result in the context of the
problem.
Compute the coefficient of determination between the amount purchased and
the customer’s monthly salary. Interpret the result in the context of the
problem.

R² = 186952 / 286188 = 0.6532 = 65.32%
65.32% of the variability of the amount
purchased can be explained by the linear
relationship with the customer’s monthly salary.
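Putting this example together in Python (coefficients and sums of squares copied from the regression output above; the helper name is illustrative):

```python
intercept, slope = 78.58, 0.066           # from the Coefficients column
p_value_salary = 0.003                    # p-value for the Salary slope
ss_regression, ss_total = 186952, 286188  # from the SS column

def predict_purchase(salary):
    """Estimated purchase amount ($) for a given monthly salary ($)."""
    return intercept + slope * salary

r_squared = ss_regression / ss_total          # about 0.6532 (65.32%)
significant_at_1pct = p_value_salary < 0.01   # True: reject H0 at the 1% level
```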