FORECASTING ENGINEERING
Chapter 7:
Multiple Regression
INSTRUCTOR:
HCMUT-Vietnam
Department of Industrial Systems Engineering
Simple linear regression vs. multiple linear regression
• Simple linear regression: models the relationship between a single independent variable (the predictor) and a dependent variable (the response).
• Multiple linear regression: models the relationship between more than one independent variable and a dependent variable in order to predict its future values.
– It predicts the dependent variable more accurately.
– It better expresses real-life forecasting situations, in which one factor may be influenced by multiple factors.
Multiple Regression Analysis
Statistical model between the response Y and the independent variables X1, X2, ..., Xk:

Yi = β0 + β1Xi1 + β2Xi2 + ... + βkXik + εi

For the ith observation we have the set of values Xi1, Xi2, ..., Xik and Yi.
ε: the deviations of the observed responses from the fitted regression relationship (~ the residuals).
Multiple Regression Analysis
• The least squares criterion is used to develop this
equation.
• Because determining b1, b2, etc. is very tedious, a
software package such as Excel or MINITAB is
recommended.
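A minimal sketch of such a least-squares fit in Python with statsmodels (the data are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: response y and two predictors x1, x2
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 30)
x2 = rng.uniform(0, 5, 30)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1, 30)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
model = sm.OLS(y, X).fit()                      # least-squares estimates b0, b1, b2
print(model.params)     # the fitted coefficients
print(model.summary())  # full regression table (R2, F, t-tests, ...)
```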
Regression Plane for a 2-Independent
Variable Linear Regression Equation
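The plane itself is not reproducible here; as a standard result (not spelled out on the slide), the fitted equation for two independent variables describes a plane:

\hat{Y} = b_0 + b_1 X_1 + b_2 X_2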
R²

R² = (Explained Variation in Y) / (Total Variation in Y)
Coefficient of Determination
For multiple regression:
• R² = 1: all of the variability in Y is explained when the Xs are known; the sample data points all lie on the fitted regression surface.
• R² = 0: none of the variability in Y is explained by the Xs.
Independent Variables
• Qualitative variables
– Categorical: categorical variables are also called qualitative variables, attribute variables, or nominal-scale variables, such as:
• gender (male and female).
– Ordinal: an ordinal variable is similar to a categorical variable, but its levels have a natural order, such as:
• economic status (low, medium, and high).
• educational experience (1, 2, 3, and 4 ~ elementary school, high school, some college, and graduate).
• Likert scale ("strongly agree", "agree", "neutral", "disagree", and "strongly disagree"). Ref: https://www.extension.iastate.edu/Documents/ANR/LikertScaleExamplesforSurveys.pdf
– To use a qualitative variable in regression analysis, we use a scheme of dummy variables, in which one of the two possible conditions is coded 0 and the other 1, as sketched below.
• Quantitative variables:
– The values of a quantitative variable are numbers that usually represent a count or a measurement.
• Both categorical data and quantitative data can be used for exploring a single subject. Categorical variables are often used to group or subset the data in graphs or analyses.
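A minimal sketch of dummy coding in Python (the data frame and column names are illustrative, not from the lecture):

```python
import pandas as pd

# Illustrative data: a qualitative predictor "gender" and a response "salary"
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "salary": [15, 12, 13, 16],
})

# drop_first=True keeps a single 0/1 dummy per qualitative variable:
# one condition is coded 0 (the reference) and the other 1
dummies = pd.get_dummies(df["gender"], drop_first=True, dtype=int)
df = pd.concat([df, dummies], axis=1)
print(df)  # the new "male" column can now enter the regression as a predictor
```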
Examples of categorical variables:
• Numeric: Gender (1=Female, 2=Male); Survey results (1=Agree, 2=Neutral, 3=Disagree)
• Text: Payment method (Cash or Credit); Machine settings (Low, Medium, High); Product types (Wood, Plastic, Metal)
• Date/time: Days of the week (Monday, Tuesday, Wednesday); Months of the year (January, February, March)

Examples of quantitative variables:
• Numeric: Number of customer complaints; Proportion of customers eligible for a rebate; Fill weight of a cereal box
• Date/time: Date and time payment is received; Date and time of technical support incident
Coding for categorical variables
• To recode the categorical predictors, the following methods compare the levels of the predictor to the overall mean or to the mean of a reference level (see the sketch below):
– (−1, 0, +1): choose this to estimate the difference between each level mean and the overall mean.
– (1, 0): choose this to estimate the difference between each level mean and the reference level's mean. If you choose the (1, 0) coding scheme, the reference level table becomes active in the dialog box.
• The coding scheme does not change the test of the overall effect of the predictor.
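A minimal sketch of the two coding schemes in Python with statsmodels formulas (the data are made up; patsy's Treatment contrast corresponds to the (1, 0) scheme and its Sum contrast to the (−1, 0, +1) scheme):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: a three-level categorical predictor and a numeric response
df = pd.DataFrame({
    "y":     [10, 12, 9, 14, 11, 13, 8, 15],
    "level": ["A", "B", "A", "C", "B", "C", "A", "C"],
})

# (1, 0) coding: each level mean compared to the reference level's mean ("A")
m_ref = smf.ols("y ~ C(level, Treatment(reference='A'))", data=df).fit()

# (-1, 0, +1) coding: each level mean compared to the overall mean
m_sum = smf.ols("y ~ C(level, Sum)", data=df).fit()

print(m_ref.params)
print(m_sum.params)
```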
[Minitab dialog: the first column of the coding table shows the names of all the categorical predictors in the model; this column does not take any input.]
• For predictors with (1, 0) coding, by default, Minitab sets
the following reference levels based on the data type:
– For numeric categorical predictors, the reference level is the level with
the least numeric value.
– For date/time categorical predictors, the reference level is the level with
the earliest date/time.
– For text categorical predictors, the reference level is the level that is first
in value order, which is alphabetical order, by default.
• For predictors with (−1, 0, +1) coding, by default, Minitab
sets the following reference levels based on the data
type:
– For numeric categorical predictors, the reference level is the level with the largest
numeric value.
– For date/time categorical predictors, the reference level is the level with the latest
date/time.
– For text categorical predictors, the reference level is the level that is last in
alphabetical order.
Multicollinearity
• The relation between X and Z (or c) in the previous example is called multicollinearity.
• Multicollinearity is the situation in which independent
variables in a multiple regression equation are highly
intercorrelated. That is, a linear relation exists between two or
more independent variables.
• Correlated independent variables make it difficult to make
inferences about the individual regression coefficients (slopes)
and their individual effects on the dependent variable (Y).
• However, correlated independent variables do not affect a
multiple regression equation’s ability to predict the dependent
variable (Y).
Multicollinearity
• "How much multicollinearity is in a regression analysis" is measured by the Variance Inflation Factor (VIF):

VIFj = 1 / (1 − R²j)

where R²j is the coefficient of determination obtained by regressing the jth IV on the remaining (k − 1) IVs.
• VIFj near 1 ~ R²j = 0:
– the jth IV is NOT related to the remaining IV(s);
– the coefficient of the jth IV does not change when other IV(s) are added to or removed from the model.
• VIFj > 1 ~ 0 < R²j < 1:
– the jth IV IS related to the remaining IV(s);
– a large VIF indicates redundant information among the predictor variables;
– it is difficult to interpret the effects of the IV(s) on the response;
– solutions? (See Page 287.) A sketch of computing VIFs follows below.
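A minimal sketch of computing VIFs in Python with statsmodels (the data are made up; x2 is deliberately constructed to be nearly collinear with x1):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 50)
x2 = 2 * x1 + rng.normal(0, 0.5, 50)   # nearly collinear with x1 -> large VIF
x3 = rng.uniform(0, 10, 50)            # unrelated predictor -> VIF near 1

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# VIF for each predictor (index 0 is the intercept, so start at 1)
for j in range(1, X.shape[1]):
    print(f"VIF for x{j}: {variance_inflation_factor(X, j):.2f}")
```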
• A VIF value greater than 5 suggests that the
regression coefficient is poorly estimated due to
severe multicollinearity.
• VIF and the status of the predictor:
– VIF = 1: not correlated
– 1 < VIF < 5: moderately correlated
– VIF > 5: highly correlated
Collinearity vs. Interaction
• Interaction terms can be added to the model to investigate whether two (or more) independent variables have a combined effect on the response Y.
– Interactions are commonly used when categorical factors are present.
• Example: the rate of response of Y (income) to a second factor X2 (years of education) according to the categorical X1 (gender, 0 = male, 1 = female).
• With Y = β0 + β1X1 + β2X2, the model only accounts for females earning a fixed amount more or less than males, with a separate term accounting for educational differences regardless of gender.
• If we add a third interaction variable X1X2, so that Y = β0 + β1X1 + β2X2 + β3(X1X2), this third term will be zero for males but non-zero for females, thus representing the specific variable "female years of education" and allowing the model to account separately for its effect on income (β2 becomes the slope of income with respect to education for males, and β3 is an adjustment to that slope for females).
• This is an interaction, but it is not collinear.
Collinearity vs. Interaction
• Example: the rate of response of Y (income) to a second factor X2 (years of education) according to the categorical X1 (gender, 0 = male, 1 = female).

Y (millions)  X1  X2
15            0   4
12            1   2

• Without interaction, Y = β0 + β1X1 + β2X2:
– Male 1: Y1 = β0 + β1(0) + β2(4) = β0 + β2(4)
– Female 1: Y2 = β0 + β1(1) + β2(2)
• With interaction, Y = β0 + β1X1 + β2X2 + β3(X1X2):
– Male 1: Y1 = β0 + β2(4)
– Female 1: Y2 = β0 + β1(1) + β2(2) + β3(2)
• This is an interaction, but it is not collinear. A fitted sketch follows below.
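A minimal sketch of fitting the interaction model in Python (the income data are made up; in the statsmodels formula, gender * education expands to gender + education + gender:education):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: income (Y), gender (X1: 0=male, 1=female), years of education (X2)
df = pd.DataFrame({
    "income":    [15, 12, 18, 14, 20, 16],
    "gender":    [0, 1, 0, 1, 0, 1],
    "education": [4, 2, 6, 4, 8, 6],
})

# Fits Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2)
model = smf.ols("income ~ gender * education", data=df).fit()
print(model.params)  # b3 adjusts the education slope for females
```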
Variance remedial measures
• If variances are not equal, there can be serious consequences for the results of the ANOVA.
• If the violation is mild, it will not be that important.
• If it is significant, it can cause the ANOVA results to be wrong or uninterpretable.
• If variances are not equal:
– Check for outliers.
– Transform the data (y), as sketched below:
• When the variance is proportional to the mean, use sqrt(y).
• When the standard deviation is proportional to the mean, use log(y).
• When the standard deviation is proportional to mean², use 1/y.
• Read https://onlinecourses.science.psu.edu/stat501/node/318/ for more details.
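A minimal sketch of these variance-stabilizing transformations in Python (y is made-up positive data; which transform applies depends on the variance pattern listed above):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.gamma(shape=2.0, scale=3.0, size=100)  # illustrative positive response

y_sqrt  = np.sqrt(y)   # variance proportional to the mean
y_log   = np.log(y)    # standard deviation proportional to the mean
y_recip = 1.0 / y      # standard deviation proportional to mean^2
```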
• See Example 7.1
• See Example Salary vs (X1, X2, X3, X4, X5, X6)
What does this mean?
R² = SSR / SST = 1 − SSE / SST

R²(adj) = 1 − (SSE / df_e) / (SST / df_t) = 1 − (SSE · df_t) / (SST · df_e) = 1 − MSE / MST
What does this mean?
The annotated ANOVA table of the regression output decomposes SST = SSR + SSE, with:
• Regression DF = number of predictors (k)
• Error DF = the difference, n − 1 − k
• Total DF = number of observations in the sample − 1
• MSR = SSR / Regression DF
• MSE = SSE / Error DF
• F = MSR / MSE
• s = √MSE
Systematic methods
• These p-values suggest that the corresponding coefficients in this model might be zero, which makes one question whether those predictors should be in the model or not.
• DO NOT discard predictors based on their p-values alone. The models are very sensitive to the inclusion or exclusion of predictors. Use a systematic method to determine which predictors to use.
• Best subsets: compares all combinations of the n predictors and outputs the best.
• Stepwise: takes the best predictor first, then adds (and subtracts) others (see Excel file).
Systematic methods
• Forward selection: takes the best predictor, then adds more one at a time (see Excel file); a minimal sketch follows below.
• Backward elimination: starts with all predictors, then subtracts them one at a time.
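A minimal sketch of forward selection in Python, using adjusted R² as the entry criterion (the data and column names are made up; real implementations often use p-values or AIC instead):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(60, 4)), columns=["x1", "x2", "x3", "x4"])
y = 1.0 + 2.0 * df["x1"] - 1.5 * df["x3"] + rng.normal(0, 1, 60)

selected, remaining = [], list(df.columns)
best_adj_r2 = -np.inf
while remaining:
    # Try adding each remaining predictor; keep the one that helps most
    scores = {c: sm.OLS(y, sm.add_constant(df[selected + [c]])).fit().rsquared_adj
              for c in remaining}
    best = max(scores, key=scores.get)
    if scores[best] <= best_adj_r2:
        break  # no candidate improves adjusted R2 -> stop
    best_adj_r2 = scores[best]
    selected.append(best)
    remaining.remove(best)

print("Selected predictors:", selected)
```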
• What is the best model?
• It depends!
– You typically have a purpose in creating a regression model. Usually the regression is meant to capture some cause-and-effect relationship. Because of that, you want the model that accurately reflects which x variables cause a change in the y variable, even if that means larger residuals.
– Sometimes you want only the best predictors, sometimes you want all the good predictors, sometimes you want a certain number of predictors ... and so on.
Systematic methods
• Stepwise regression procedures are controversial.
• Most of the time, they will give you similar answers.
• Forward selection was once preferred because it was the simplest to perform, which was a consideration in the 1960s and 70s due to a lack of computing power.
• Best subsets can be computationally intensive for large
models (ones with lots of predictors), but generally gives
the most defensible answer.
• When in doubt, use best subsets. If it takes too long to
run, use stepwise.
Best subsets regression procedure
• Step #1. Identify all of the possible regression models.
• Step #2. Determine the k-predictor models that do the "best" at meeting some well-defined criteria (k = 1, 2, 3, ...):
– Mallows' Cp statistic (a measure of model bias): identify subsets of predictors for which the Cp value is near p (if possible) or smallest (see the formula below).
– If more than one model has a small Cp value near p, in general choose the simpler model.
– Large R²(adj), or large predicted R², and small s.
• Step #3. Further evaluate and refine the handful of candidate models.
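The slides do not show the statistic itself; the standard definition of Mallows' Cp for a subset model with p parameters (including the intercept), where SSE_p is that model's error sum of squares, MSE_full is the mean squared error of the model with all candidate predictors, and n is the number of observations, is:

C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)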
The Assumptions of Multiple Regression
1. There is a linear relationship. That is, there is a straight-line
relationship between the dependent variable and the set of
independent variables.
2. The variation in the residuals is the same for both large and small values of the estimated Y. To put it another way, the size of the residual is unrelated to whether the estimated Y is large or small.
3. The residuals follow the normal probability distribution.
4. The independent variables should not be correlated. That is,
we would like to select a set of independent variables that are
not themselves correlated.
5. The residuals are independent. This means that successive
observations of the dependent variable are not correlated. This
assumption is often violated when time is involved with the
sampled observations.
Four Quick Checks (Multiple Regression)
1) Does the model make sense (i.e., check slope terms, F test)?
2) Is there a statistically significant relationship between the dependent and
independent variables (t-test)?
3) What percentage of the variation in the dependent variable does the
regression model explain (R-Square)?
4) Do the residuals violate the assumptions (Analysis of Residuals)? A sketch of these checks follows below.
1. zero mean
2. normally distributed
3. homoscedastic (constant variance)
4. mutually independent (non-autocorrelated): is there a problem of serial correlation among the error terms in the model (Durbin-Watson)?
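A minimal sketch of the four residual checks in Python (the fitted model here is a made-up stand-in for any earlier statsmodels OLS fit):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Illustrative fit (stands in for whatever model is being diagnosed)
rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(50, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, 50)
model = sm.OLS(y, X).fit()
resid = model.resid

print("1) mean of residuals:", resid.mean())                        # should be ~0
print("2) Shapiro-Wilk normality p-value:", stats.shapiro(resid).pvalue)
print("3) Breusch-Pagan p-value:", het_breuschpagan(resid, model.model.exog)[1])
print("4) Durbin-Watson statistic:", durbin_watson(resid))          # ~2 = no autocorrelation
```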
[Figure: Minitab four-in-one residual plots for salary (×1,000,000) — Normal Probability Plot, Residuals Versus Fits, Histogram of the Residuals, and Residuals Versus Observation Order.]
• The correlation between two variables can be visualized by creating a scatterplot of the data: Y against X.
– If a non-linear pattern is detected, a transformation X′ (of the variable X), Y′ (of the variable Y), or both can often significantly improve the correlation.
• A residual plot can reveal whether a data set follows a random pattern.
– If a non-random pattern is detected, transforming the raw data to make it more linear can significantly improve the fit between X and Y.
[Figure: residual plot — the individual errors plotted against the predicted values.]
If the scatterplot of the raw data (X, Y) looks like the one shown in Figure (a), transform (X, Y) to (X′, Y′) so that the scatterplot looks more like the one displayed in Figure (b).
Figure (a) Raw Data, (b) Transformed Data
• Exponential model: apply a logarithmic transformation to the dependent variable y, as shown in Figure (b).
• Quadratic model: if the trend in the data follows the pattern shown in Figure (a), we can take the square root of y to get y′ = √y.
• Reciprocal model: a trend in the raw data as shown in Figure (a) suggests a reciprocal transformation, i.e. y′ = 1/y.
• Logarithmic model: if the raw data follow a trend as shown in Figure (a), a logarithmic transformation can be applied to the independent variable x: x′ = log(x); alternatively, a logarithmic transformation can be applied to the dependent variable y: y′ = log(y), when the roles of x and y in Figure (a) are reversed. A sketch of these transformations follows below.
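A minimal sketch of applying a linearizing transformation and refitting in Python (x and y are made-up data with a roughly exponential trend; the other transformations from the preceding slides are shown alongside):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 40)
y = 2.0 * np.exp(0.3 * x) * rng.lognormal(0, 0.1, 40)  # exponential-looking data

# Exponential model: regress y' = log(y) on x
fit_exp = sm.OLS(np.log(y), sm.add_constant(x)).fit()
print("R2 of the log(y) ~ x fit:", fit_exp.rsquared)

# The other transformations, applied the same way before refitting:
y_sqrt  = np.sqrt(y)   # quadratic model
y_recip = 1.0 / y      # reciprocal model
x_log   = np.log(x)    # logarithmic model
```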
• The Box-Cox transformation constitutes another particularly useful family of transformations, which is applied to the independent variable in most cases:
– T(X) = (X^λ − 1) / λ, if λ ≠ 0
– T(X) = log(X), if λ = 0
– where X is the variable being transformed and λ is referred to as the transformation parameter.
• The optimal value of λ is the value of λ corresponding to the maximum correlation.
• The Box-Cox transformation can also be applied to the Y variable. A sketch follows below.
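A minimal sketch with scipy, which picks the λ that maximizes the log-likelihood (the data are made up; the variable being transformed must be positive):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.lognormal(mean=1.0, sigma=0.6, size=200)  # skewed, positive data

# boxcox returns the transformed values and the optimal lambda
x_transformed, lam = stats.boxcox(x)
print("optimal lambda:", lam)
```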
Letting Minitab calculate the optimal lambda should produce the best-fitting results.
Durbin-Watson tests
• The Durbin-Watson test checks for autocorrelation in the residuals from a regression analysis.
• Method 1:
– The test statistic ranges from 0 to 4.
– A value of 2 indicates that there is no autocorrelation. A value nearing 0 (i.e., below 2) indicates positive autocorrelation, and a value towards 4 (i.e., over 2) indicates negative autocorrelation.
– The inconclusive region of the test can be narrowed by increasing the sample size.
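The slides do not show the statistic itself; the standard Durbin-Watson statistic, computed from the residuals e_t, is:

d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}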
• The Durbin-Watson test is based on the assumption that the errors in the regression model are generated by a first-order autoregressive process observed at equally spaced time periods, that is,
εt = ρεt−1 + at
where εt is the error term in the model at time period t, at is an NID(0, σ²a) random variable, and ρ (|ρ| < 1) is the autocorrelation parameter.
• A simple linear regression model with first-order autoregressive errors:
yt = β0 + β1xt + εt
εt = ρεt−1 + at
• Most regression problems involving time series data exhibit positive autocorrelation, so the hypotheses usually considered in the Durbin-Watson test are
H0: ρ = 0
H1: ρ > 0
– If d < dL, reject H0: ρ = 0.
– If d > dU, do not reject H0: ρ = 0.
– If dL < d < dU, the test is inconclusive.
A sketch of applying this decision rule follows below.
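A minimal sketch of computing d and applying the decision rule in Python (the residuals are made up, and the critical values dL and dU are hypothetical placeholders; in practice they are looked up in a Durbin-Watson table for the given n, number of predictors, and significance level):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
resid = rng.normal(size=30)   # illustrative residuals from some fitted model

d = durbin_watson(resid)

# Hypothetical table values (look up dL, dU for your n and number of predictors)
dL, dU = 1.35, 1.49

if d < dL:
    print(f"d = {d:.2f} < dL: reject H0 -> positive autocorrelation")
elif d > dU:
    print(f"d = {d:.2f} > dU: do not reject H0")
else:
    print(f"dL <= d = {d:.2f} <= dU: the test is inconclusive")
```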
Assignment