Statistics
Using Microsoft® Excel
Introduction to Multiple Regression
Learning Objectives
In this chapter, you learn:
How to develop a multiple regression model
How to interpret the regression coefficients
How to determine which independent variables to include in the regression model
How to determine which independent variables are most important in predicting a dependent variable
How to use categorical variables in a regression model
Multiple Regression Equation
The coefficients of the multiple regression model
are estimated using sample data
Multiple regression equation with k independent variables:
Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, b2, …, bk are the estimated slope coefficients.
In this chapter we will always use Excel to obtain the regression
slope coefficients and other regression summary measures.
Multiple Regression Model
Two variable model
Ŷ = b0 + b1X1 + b2X2

[Figure: with two independent variables, the fitted equation defines a plane over the (X1, X2) axes in (X1, X2, Y) space.]
Multiple Regression Equation
2 Variable Example
A distributor of frozen dessert pies wants to evaluate
factors thought to influence demand
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising (in $100s)
Data are collected for 15 weeks
Multiple Regression Equation
2 Variable Example
Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50            3.3
  2       460        7.50            3.3
  3       350        8.00            3.0
  4       430        8.00            4.5
  5       350        6.80            3.0
  6       380        7.50            4.0
  7       430        4.50            3.0
  8       470        6.40            3.7
  9       450        7.00            3.5
 10       490        5.00            4.0
 11       340        7.20            3.5
 12       300        7.90            3.2
 13       440        5.90            4.0
 14       450        5.00            3.5
 15       300        7.00            2.7

Multiple regression equation:
Sales = b0 + b1(Price) + b2(Advertising) = b0 + b1X1 + b2X2
where X1 = Price and X2 = Advertising
Estimating a Multiple Linear
Regression Equation
Excel will be used to generate the coefficients and
measures of goodness of fit for multiple regression
Excel:
Tools / Data Analysis... / Regression
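As a cross-check outside Excel, the same least-squares coefficients can be reproduced with a short script. This is a minimal sketch assuming Python with NumPy, re-entering the 15 weeks of pie-sales data from the example:

```python
import numpy as np

# Pie sales example data (15 weeks): sales (units), price ($), advertising ($100s)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix: a column of ones (for the intercept b0) plus the two X variables
X = np.column_stack([np.ones(len(sales)), price, advertising])

# Solve the least-squares problem; b holds b0, b1, b2
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # ≈ [306.526, -24.975, 74.131], matching the Excel output
```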
Multiple Regression Equation
2 Variable Example, Excel
Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

Sales = 306.526 - 24.975(X1) + 74.131(X2)
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
Multiple Regression Equation
2 Variable Example
Sales = 306.526 - 24.975(X1) + 74.131(X2)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Multiple Regression Equation
2 Variable Example
Predict sales for a week in which the selling price is
$5.50 and advertising is $350:
Sales = 306.526 - 24.975(X1) + 74.131(X2)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
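The arithmetic of this prediction is easy to verify in a few lines; a minimal sketch in Python, using the coefficients from the Excel output:

```python
# Coefficients from the fitted model (Excel output)
b0, b1, b2 = 306.526, -24.975, 74.131

price = 5.50       # selling price in $
advertising = 3.5  # $350 of advertising, expressed in $100s

predicted_sales = b0 + b1 * price + b2 * advertising
print(round(predicted_sales, 2))  # 428.62 pies
```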
Coefficient of
Multiple Determination
Reports the proportion of total variation in Y
explained by all X variables taken together
r² = SSR / SST = (regression sum of squares) / (total sum of squares)
Coefficient of
Multiple Determination (Excel)
Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

r² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Adjusted r2
r2 never decreases when a new X variable is added
to the model
This can be a disadvantage when comparing
models
What is the net effect of adding a new variable?
We lose a degree of freedom when a new X
variable is added
Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted r2
Shows the proportion of variation in Y explained by all X
variables adjusted for the number of X variables used
r²adj = 1 - [(1 - r²Y.12..k) × (n - 1) / (n - k - 1)]
(where n = sample size, k = number of independent variables)
Penalizes excessive use of unimportant independent
variables
Smaller than r2
Useful in comparing models
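Plugging the chapter's values (r² = 0.52148, n = 15, k = 2) into this formula reproduces the Adjusted R Square reported by Excel; a quick check in Python:

```python
# Values from the pie sales example
r2, n, k = 0.52148, 15, 2

# Adjusted r^2 formula from the slide
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(r2_adj)  # ≈ 0.44172, matching Excel's Adjusted R Square
```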
Adjusted r2
Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

r²adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
F-Test for Overall Significance
F-Test for Overall Significance of the Model
Shows whether there is a linear relationship between at least one of the X variables and Y
Use F test statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable
affects Y)
F-Test for Overall Significance
Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

p-value for the F-test: Significance F = 0.01201
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
F-Test for Overall Significance
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 12

Critical value: F0.05 = 3.885

Test statistic: F = MSR / MSE = 6.5386
p-value = .01201

Decision: Since F = 6.5386 > 3.885 (and p-value = .01201 < α = .05), reject H0 at α = 0.05.

Conclusion: There is evidence that at least one independent variable affects Y.

[Figure: F distribution with the rejection region to the right of the critical value F0.05 = 3.885.]
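The F statistic, its critical value, and the p-value can all be reproduced from the ANOVA table; this sketch assumes Python with SciPy available:

```python
from scipy import stats

# Mean squares from the ANOVA table
MSR, MSE = 14730.013, 2252.776

F = MSR / MSE                      # test statistic, ≈ 6.5386
F_crit = stats.f.ppf(0.95, 2, 12)  # critical value at alpha = .05, ≈ 3.885
p_value = stats.f.sf(F, 2, 12)     # upper-tail p-value (Significance F), ≈ 0.0120

print(F, F_crit, p_value)
```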
Individual Variables
Tests of Hypothesis
Use t-tests of individual variable slopes
Shows if there is a linear relationship
between the variable Xi and Y
Hypotheses:
H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (linear relationship does exist
between Xi and Y)
Individual Variables
Tests of Hypothesis
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Test statistic (df = n - k - 1):

t = (bj - 0) / Sbj
Individual Variables
Tests of Hypothesis
Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

t-value for Price: t = -2.306, with p-value .0398
t-value for Advertising: t = 2.855, with p-value .0145
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
Inferences about the Slope:
t Test Example
From Excel output:

             Coefficients  Standard Error    t Stat   P-value
Price          -24.97509        10.83213  -2.30565   0.03979
Advertising     74.13096        25.96732   2.85478   0.01449

H0: β1 = 0    H1: β1 ≠ 0
H0: β2 = 0    H1: β2 ≠ 0

d.f. = 15 - 2 - 1 = 12, α = .05

The test statistic for each variable falls in the rejection region (p-values < .05).

Decision: Reject H0 for each variable.
Conclusion: There is sufficient evidence that both Price and Advertising affect pie sales at α = .05.
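Each t statistic is just the coefficient divided by its standard error, and the two-tailed p-value follows from the t distribution with 12 d.f.; a sketch assuming Python with SciPy:

```python
from scipy import stats

df = 12  # n - k - 1 = 15 - 2 - 1

# Coefficients and standard errors from the Excel output
t_price = -24.97509 / 10.83213        # ≈ -2.3057
t_advertising = 74.13096 / 25.96732   # ≈ 2.8548

# Two-tailed p-values
p_price = 2 * stats.t.sf(abs(t_price), df)                # ≈ 0.0398
p_advertising = 2 * stats.t.sf(abs(t_advertising), df)    # ≈ 0.0145

print(t_price, p_price)
print(t_advertising, p_advertising)
```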
Confidence Interval Estimate
for the Slope
Confidence interval for the population slope βi
bi ± t(α/2, n-k-1) × Sbi, where t has (n - k - 1) degrees of freedom

Here (15 - 2 - 1) = 12 d.f.
Coefficients Standard Error … Lower 95% Upper 95%
Intercept 306.52619 114.25389 … 57.58835 555.46404
Price -24.97509 10.83213 … -48.57626 -1.37392
Advertising 74.13096 25.96732 … 17.55303 130.70888
Example: Excel output also reports these interval endpoints:
Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price.
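The interval endpoints Excel reports can be recomputed from the coefficient, its standard error, and the critical t value; a sketch assuming Python with SciPy:

```python
from scipy import stats

# Price slope and its standard error from the Excel output
b1, se_b1 = -24.97509, 10.83213
t_crit = stats.t.ppf(0.975, 12)  # two-tailed critical value, 12 d.f. (≈ 2.1788)

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(lower, upper)  # ≈ -48.576, -1.374, matching Excel's Lower/Upper 95%
```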
Chapter Summary
In this chapter, we have
Developed the multiple regression model
Tested the significance of the multiple
regression model
Discussed adjusted r2
Tested individual regression coefficients
Tested portions of the regression model