FEM 2063 - Data Analytics
Chapter 3
Multiple Linear Regression
Overview
➢ 3.1 Background
➢ 3.2 Multiple Linear Regression (MLR)
➢ 3.3 Software Output
➢ 3.4 ANOVA
➢ 3.5 Model Evaluation
➢ 3.6 Application/Examples
3.1 Background
Simple regression considers the relation between a single independent variable and a dependent variable.
Multiple regression simultaneously considers the influence of multiple independent variables on a dependent variable Y.
3.1 Background
◼ A simple regression model
fits a regression line in 2-
dimensional space
◼ A multiple regression model
with two independent
variables fits a regression
plane in 3-dimensional space
3.2 Multiple Linear Regression
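Regression coefficients are estimated by minimizing SSE to derive the simple regression model:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Again, estimates for the multiple slope coefficients are derived by minimizing SSE to derive the multiple regression model:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k$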
3.2 Multiple Linear Regression
❑ An extension of a simple linear regression model.
❑ Allows the dependent variable y to be modeled as a linear
function of more than one independent variable xi
❑ Consider the following data consisting of n sets of values
$(y_1, x_{11}, x_{21}, \dots, x_{k1})$
$(y_2, x_{12}, x_{22}, \dots, x_{k2})$
$\vdots$
$(y_n, x_{1n}, x_{2n}, \dots, x_{kn})$
3.2 Multiple Linear Regression
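❑ The value of the dependent variable $y_i$ is modeled as
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i, \quad i = 1, \dots, n$
where $\varepsilon_i$ is the random error term.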
❑ The dependent variable is related to k independent
variables.
❑ As in simple linear regression (SLR), the parameters of MLR (𝛽0 , 𝛽1 , . . . , 𝛽𝑘 ) are also
estimated using the method of least squares.
❑ However, it would be tedious to find these values by
hand, so we use a computer to handle the
computations.
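To make "the computer handles the computations" concrete, here is a minimal sketch of least-squares estimation by solving the normal equations (X'X)b = X'y. The function name `fit_mlr` and the small data set are hypothetical and chosen only for illustration; statistical software typically uses more numerically robust methods.

```python
def fit_mlr(xs, ys):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    xs: list of n rows, each holding k predictor values; ys: list of n responses.
    """
    # Design matrix with a leading column of 1s for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in xs]
    p = len(X[0])
    # Augmented matrix [X'X | X'y] of the normal equations.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise),
# so the fit should recover coefficients close to 1, 2 and 3.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = fit_mlr(xs, ys)
print([round(b, 6) for b in coefficients])
```

Because the illustrative data contain no noise, the recovered coefficients match the generating equation up to floating-point error; with real data the estimates minimize SSE but do not reproduce y exactly.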
3.3 Software Output
The software (Excel) output has three parts:
Part 1. Regression analysis
Part 2. ANOVA
Part 3. Regression statistics
3.4 ANOVA
Source of variation   Sum of squares   Degrees of freedom (df)   Mean square (sum of squares / df)   Computed F
Regression            SSR              k                         MSR = SSR / k                       F = MSR / MSE
Error                 SSE              n - (k + 1)               MSE = SSE / (n - (k + 1))
Total                 SST              n - 1
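The ANOVA quantities can be checked with a few lines of arithmetic. The sketch below uses the sums of squares reported for the wire bond data in Section 3.6 (SSR = 5990.771221, SSE = 115.1734828, n = 25, k = 2); the function name `anova_table` is hypothetical.

```python
def anova_table(ssr, sse, n, k):
    """ANOVA quantities for an MLR with k predictors fitted to n observations."""
    df_regression = k
    df_error = n - (k + 1)
    msr = ssr / df_regression   # mean square for regression
    mse = sse / df_error        # mean square error
    return {
        "SST": ssr + sse,       # total sum of squares
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": n - 1,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }


# Values taken from the Excel output for the wire bond example.
table = anova_table(ssr=5990.771221, sse=115.1734828, n=25, k=2)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The printed MSR, MSE and F should agree with the MS and F columns of the Excel ANOVA output to within rounding.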
3.5 Model Evaluation - (i) Standard error
of estimate (s)
➢ Compute the standard error of estimate $s = \sqrt{\hat{\sigma}^2}$, where $\hat{\sigma}^2 = \dfrac{SSE}{n-k-1}$
➢ $\hat{\sigma}^2$ is an unbiased estimator of the population variance $\sigma^2$
➢ The smaller the SSE, the more successful the Multiple Linear
Regression Model is in explaining y.
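As a worked check of this formula, the following uses the values reported for the wire bond data in Section 3.6 (SSE = 115.1735, n = 25, k = 2); the result agrees with the "Standard Error" line of the Excel output.

```latex
\hat{\sigma}^2 = \frac{SSE}{n-k-1} = \frac{115.1735}{25-2-1} \approx 5.2352,
\qquad
s = \sqrt{\hat{\sigma}^2} \approx 2.288
```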
3.5 Model Evaluation – (ii) Coefficient of
Determination
❑ Coefficient of determination: $R^2 = \dfrac{SST - SSE}{SST} = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
❑ $R^2$ is the proportion of variability in the observed dependent
variable that is explained by the MLR model.
❑ It measures the strength of the linear relationship between
the dependent variable and the independent variables.
❑ The greater $R^2$, the more successful the MLR model is.
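The sketch below evaluates R² from the sums of squares of the wire bond example in Section 3.6. It also computes the adjusted R² that appears in the Excel output; the slides do not state that formula, so the standard definition 1 - [SSE/(n-k-1)] / [SST/(n-1)] is assumed here. Function names are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 (standard definition, assumed; not stated in the slides)."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Sums of squares from the Excel output for the wire bond data.
sse, sst, n, k = 115.1734828, 6105.944704, 25, 2
r2 = r_squared(sse, sst)
adj_r2 = adjusted_r_squared(sse, sst, n, k)
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}")
```

Both values should agree with the "R Square" and "Adjusted R Square" lines of the regression statistics to within rounding.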
3.5 Model Evaluation – (iii) The
hypothesis test of the slope (t-test)
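▪ The t-test addresses whether a significant linear relationship exists
between $x_i$ and y.
▪ Test the hypotheses
H0: $\beta_i = 0$ (no relationship between $x_i$ and y)
H1: $\beta_i \neq 0$ (there is a relationship between $x_i$ and y)
▪ Test statistic (t-distribution with n - k - 1 degrees of freedom): $T = \dfrac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)}$, with $\beta_i = 0$ under H0.
In simple regression, $se(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 / SS_{xx}}$; in MLR, $se(\hat{\beta}_i)$ is read from the software output.
▪ Critical region: $|T| > t_{\alpha/2,\, n-k-1}$.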
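A minimal sketch of the t-test decision rule, using the coefficients and standard errors from the Excel output in Section 3.6. The critical value 2.074 is the table value quoted in that section rather than something computed here, and the function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """T = (estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error


def reject_h0(t_stat, critical_value):
    """Two-sided test: reject H0 when |T| exceeds the critical value."""
    return abs(t_stat) > critical_value


T_CRIT = 2.074  # t_{0.025, 22}, taken from the statistical table quoted in Section 3.6

# (coefficient, standard error) pairs from the Excel output.
slopes = {
    "x1": (2.744269643, 0.093523844),
    "x2": (0.012527811, 0.002798419),
}
t_stats = {name: t_statistic(b, se) for name, (b, se) in slopes.items()}
decisions = {name: reject_h0(t, T_CRIT) for name, t in t_stats.items()}
for name in slopes:
    print(f"{name}: T = {t_stats[name]:.2f}, reject H0: {decisions[name]}")
```

The computed statistics should match the "t Stat" column of the output, and both slopes fall in the critical region at the 5% level.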
3.5 Model Evaluation – (iii) The
hypothesis test of the slope (t-test)
The t-test is used for inference on an
individual regression coefficient.
3.5 Model Evaluation – (iv) Testing the
significance of regression (F-test)
Hypotheses:
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
$H_1$: at least one $\beta_j \neq 0$
Test statistic: $F_0 = \dfrac{MSR}{MSE}$
where: $MSR = \dfrac{SSR}{k}$, $MSE = \dfrac{SSE}{n-k-1}$
Rejection criterion: $F_0 = \dfrac{MSR}{MSE} > f_{\alpha,\,k,\,n-k-1}$
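Dividing the numerator and denominator of the test statistic by SST shows how it relates to the coefficient of determination from part (ii). The numerical check uses R² = 0.981137, k = 2 and n = 25 from the wire bond output in Section 3.6, and agrees with the reported F of 572.17 up to rounding of the inputs.

```latex
F_0 = \frac{SSR/k}{SSE/(n-k-1)}
    = \frac{R^2/k}{(1-R^2)/(n-k-1)}
    = \frac{0.981137/2}{0.018863/22}
    \approx 572.2
```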
3.5 Model Evaluation – (iv) Testing the
significance of regression (F-test)
❑ The F-test is used for inference on the multiple linear
regression model as a whole.
3.6 Application/Examples
Wire Bond Pull Strength Data
Wire Bond Pull Strength Data
I. Estimate the multiple linear regression (MLR) equation.
II. Find the standard error of estimate of this MLR.
III. Determine the coefficient of determination of this MLR.
IV. Test the significance of the slopes at the 5% significance level.
V. Test the significance of the MLR at the 5% significance level.
Wire Bond Pull Strength Data
Regression Statistics
Multiple R          0.990523843
R Square            0.981137483
Adjusted R Square   0.979422709
Standard Error      2.288046833
Observations        25

ANOVA
             df   SS            MS            F          Significance F
Regression   2    5990.771221   2995.385611   572.1672   1.07546E-19
Residual     22   115.1734828   5.235158308
Total        24   6105.944704

               Coefficients   Standard Error   t Stat        P-value    Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      2.263791434    1.060066238      2.135518851   0.044099   0.065348613   4.462234256   0.06534861    4.462234
X Variable 1   2.744269643    0.093523844      29.34299438   3.91E-19   2.550313061   2.938226226   2.55031306    2.938226
X Variable 2   0.012527811    0.002798419      4.476746229   0.000188   0.006724246   0.018331377   0.00672425    0.018331
Wire Bond Pull Strength Data
The estimated multiple linear regression equation is
Strength = 2.26 + 2.74 * Length + 0.0125 * Height
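Once estimated, the equation can be used for prediction. The sketch below plugs in the unrounded coefficients from the Excel output; the function name `predict_strength` and the input values (length 8, height 275) are hypothetical and chosen only for illustration.

```python
def predict_strength(length, height):
    """Predicted pull strength from the fitted MLR equation (unrounded coefficients)."""
    return 2.263791434 + 2.744269643 * length + 0.012527811 * height


# Hypothetical inputs, not taken from the slides.
predicted = predict_strength(length=8, height=275)
print(f"Predicted strength: {predicted:.2f}")
```

Using the unrounded coefficients avoids the small discrepancies that the rounded equation above would introduce.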
Wire Bond Pull Strength Data
◼ Standard error of estimate (s) = 2.288
◼ Coefficient of determination ($R^2$) = 0.981 (98.1%)
Wire Bond Pull Strength Data
H0: $\beta_i = 0$ (no relationship between $x_i$ and y)
H1: $\beta_i \neq 0$ (there is a relationship between $x_i$ and y)
Test statistics: $T_1 = 29.34$ and $T_2 = 4.48$ (from the "t Stat" column of the software output)
Wire Bond Pull Strength Data
Critical value: $t_{\alpha/2,\, n-k-1} = t_{0.025,\, 22} = 2.074$ (from the statistical table)
Conclusion
Since $T_1 = 29.34$ and $T_2 = 4.48$ are both greater than 2.074, we reject H0 for each slope and conclude
that pull strength is linearly related to wire length and to die height.
Wire Bond Pull Strength Data
Hypotheses:
$H_0: \beta_1 = \beta_2 = 0$
$H_1$: at least one $\beta_j \neq 0$
Test statistic: $F_0 = \dfrac{MSR}{MSE} = \dfrac{2995.3856}{5.23516} \approx 572.17$
Wire Bond Pull Strength Data
Rejection criterion: $F_0 = \dfrac{MSR}{MSE} > f_{\alpha,\,k,\,n-k-1}$
Let $\alpha = 0.05$. Since k = 2 and n - k - 1 = 22, we need $f_{0.05,\,2,\,22}$.
From the table, $f_{0.05,\,2,\,22} = 3.44$.
Conclusion
Since 572.17 > 3.44, we reject H0 and conclude that pull strength is
linearly related to wire length, die height, or both.
Example 2
A set of experimental runs was made to determine a way of
predicting cooking time y at various levels of oven width x1 and
temperature x2. The data were recorded as follows:
i. Estimate the multiple linear regression (MLR) equation.
ii. Find the standard error of estimate of this MLR.
iii. Determine the coefficient of determination of this MLR.
iv. Test the significance of the slopes at the 1% significance level.
v. Test the significance of the MLR at the 1% significance level.
Cooking time, oven width and
temperature
Cooking time, oven width and
temperature
i. MLR equation
ii. Find the standard error of estimate of this MLR.
iii. Determine the coefficient of determination of this MLR.
Cooking time, oven width and
temperature
iv. Test for significance of Slopes at 1% significance level.
v. Test for significance of MLR at 1% significance level.