Chapter 8 Multiple Regression - Oct21

The document discusses multiple linear regression, which examines the relationship between more than one independent variable and a dependent variable. Multiple regression allows for more accurate prediction of the dependent variable compared to simple linear regression. It expresses relationships where a dependent variable may be influenced by multiple factors. The coefficient of determination, R2, indicates how well the regression model fits the observed data, with higher R2 indicating less unexplained variability. Both categorical and quantitative variables can be used as independent variables in a regression model.


10/26/2021

FORECASTING ENGINEERING

Chapter 7:

Multiple Regression

INSTRUCTOR:

• NGUYỄN VẠNG PHÚC NGUYÊN ([email protected])

HCMUT-Vietnam

Department of Industrial Systems Engineering

Simple linear regression vs. multiple linear regression


• Simple linear regression: the relationship between a
single independent variable (the predictor) and a
dependent variable (the response).
• Multiple linear regression: the relationship between
two or more independent variables and a dependent
variable, used to predict the dependent variable's future values.
• Predicts the dependent variable more accurately
• Expresses real-life forecasting situations, where one quantity may be
influenced by multiple factors.
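A least-squares fit of such a model can be sketched in a few lines. The data and variable names below are hypothetical, chosen only so the true coefficients are known and recoverable:

```python
import numpy as np

# Hypothetical data: two predictors x1, x2 and a response y
# generated exactly as y = 2 + 2*x1 + 2*x2 (illustrative values only).
x1 = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
x2 = np.array([2.0, 2.0, 4.0, 4.0, 6.0])
y = 2 + 2 * x1 + 2 * x2

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers the intercept and both slopes
```

Because the hypothetical data are generated without noise, the estimates match the true coefficients exactly.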


Multiple Regression Analysis


Statistical model between the response Y and the independent
variables X1, X2, ..., Xk:

Yi = β0 + β1Xi1 + β2Xi2 + ... + βkXik + εi

For the ith observation of the set X1, X2, ..., Xk we have the
values Xi1, ..., Xik and Yi.

ε: deviations of the estimated response from the true observed
data (~ residuals)


Multiple Regression Analysis

• The least squares criterion is used to develop this
equation.
• Because determining b1, b2, etc. is very tedious, a
software package such as Excel or MINITAB is
recommended.


Regression Plane for a 2-Independent


Variable Linear Regression Equation


R2

R2 = Explained Variation in Y / Total Variation in Y


Coefficient of Determination

For multiple regression:

• R2 = 1: all of the variability in Y is explained when the Xs are
known; the sample data points all lie on the fitted
regression surface.
• R2 = 0: none of the variability in Y is explained by the Xs.
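As a sketch with made-up numbers, R2 can be computed directly from the unexplained and total variation; the observed and fitted values below are hypothetical:

```python
# Hypothetical observed values and fitted values from some regression.
y = [3.0, 5.0, 7.0, 9.0, 11.0]
y_hat = [3.2, 4.8, 7.1, 8.9, 11.0]

mean_y = sum(y) / len(y)
sst = sum((v - mean_y) ** 2 for v in y)            # total variation in Y
sse = sum((v - f) ** 2 for v, f in zip(y, y_hat))  # unexplained variation
r2 = 1 - sse / sst                                 # = explained / total
print(r2)
```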


Independent Variables
• Qualitative variables
– Categorical: Categorical variables are also called qualitative,
attribute, or nominal-scale variables, such as
• gender (male and female).
– Ordinal: An ordinal variable is similar to a categorical
variable, but its levels have a natural order.
• economic status (low, medium, and high).
• educational experience (1, 2, 3, and 4 ~ elementary school, high school, some
college, and graduate)
• Likert scale ("strongly agree", "agree", "neutral", "disagree", and "strongly
disagree"). Ref:
https://www.extension.iastate.edu/Documents/ANR/LikertScaleExamplesforSurv
eys.pdf


– To use a qualitative variable in regression analysis, we use a
scheme of dummy variables in which one of the two possible
conditions is coded 0 and the other 1.
• Quantitative variables:
– The values of a quantitative variable are numbers that usually
represent a count or a measurement.
• Both categorical and quantitative data can be used when
exploring a single subject. Categorical variables are
often used to group or subset the data in graphs or
analyses.
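A minimal sketch of the 0/1 dummy-coding scheme described above; the variable and its values are hypothetical:

```python
# Code the two-level qualitative variable "gender" as 0/1 so it can
# enter a regression model as a numeric predictor.
genders = ["male", "female", "female", "male"]
gender_dummy = [1 if g == "female" else 0 for g in genders]
print(gender_dummy)
```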


Examples of categorical variables

Data type   Examples
Numeric     • Gender (1=Female, 2=Male)
            • Survey results (1=Agree, 2=Neutral, 3=Disagree)
Text        • Payment method (Cash or Credit)
            • Machine settings (Low, Medium, High)
            • Product types (Wood, Plastic, Metal)
Date/time   • Days of the week (Monday, Tuesday, Wednesday)
            • Months of the year (January, February, March)

Examples of quantitative variables

Data type   Examples
Numeric     • Number of customer complaints
            • Proportion of customers eligible for a rebate
            • Fill weight of a cereal box
Date/time   • Date and time payment is received
            • Date and time of technical support incident


Coding for categorical variables

• To recode categorical predictors, the following
methods compare the levels of the predictor to
the overall mean or to the mean of a reference level.
(-1, 0, +1): Estimates the difference between each
level's mean and the overall mean.
(1, 0): Estimates the difference between each level's
mean and the reference level's mean. If you choose the (1, 0)
coding scheme, the reference level table becomes active in the
dialog box.
• The coding scheme does not change the test of the
overall effect of the predictor.
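The two coding schemes can be sketched side by side for a hypothetical three-level factor (the level names and reference choice below are assumptions for illustration):

```python
# Hypothetical three-level factor with "low" as the reference level.
LEVELS = ["low", "medium", "high"]

def dummy_code(value, reference="low"):
    # (1, 0) scheme: one indicator column per non-reference level.
    return [1 if value == lvl else 0 for lvl in LEVELS if lvl != reference]

def effect_code(value, reference="low"):
    # (-1, 0, +1) scheme: the reference level gets -1 in every column.
    if value == reference:
        return [-1] * (len(LEVELS) - 1)
    return [1 if value == lvl else 0 for lvl in LEVELS if lvl != reference]

print(dummy_code("medium"), effect_code("low"))
```

Either scheme produces the same overall test of the predictor; only the interpretation of the individual coefficients changes.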


This column of the table shows all the names of the categorical
predictors in your model. This column does not take any input.

• For predictors with 1, 0 coding, by default, Minitab sets
the following reference levels based on the data type:
– For numeric categorical predictors, the reference level is the level with
the least numeric value.
– For date/time categorical predictors, the reference level is the level with
the earliest date/time.
– For text categorical predictors, the reference level is the level that is first
in value order, which is alphabetical order, by default.
• For predictors with -1, 0, 1 coding, by default, Minitab
sets the following reference levels based on the data
type:
– For numeric categorical predictors, the reference level is the level with the largest
numeric value.
– For date/time categorical predictors, the reference level is the level with the latest
date/time.
– For text categorical predictors, the reference level is the level that is last in
alphabetical order.

Multicollinearity
• The relation between X and Z in the previous example is
called multicollinearity.
• Multicollinearity is the situation in which independent
variables in a multiple regression equation are highly
intercorrelated. That is, a linear relation exists between two or
more independent variables.
• Correlated independent variables make it difficult to make
inferences about the individual regression coefficients (slopes)
and their individual effects on the dependent variable (Y).
• However, correlated independent variables do not affect a
multiple regression equation’s ability to predict the dependent
variable (Y).


Multicollinearity
• "How much multicollinearity is in a regression analysis" is measured by
the Variance Inflation Factor:

VIFj = 1 / (1 − R2j)

R2j: the coefficient of determination obtained by regressing
the jth IV on the remaining (k−1) IVs.

• (VIFj near 1) ~ (R2j = 0)
 the jth IV is NOT related to the remaining IV(s)
 the coefficient of the jth IV does not change when other IV(s) are added to or
removed from the model
• (VIFj > 1) ~ (1 > R2j > 0)
 the jth IV IS related to the remaining IV(s)
 a large VIF indicates redundant information among predictor variables
 difficult to interpret the effects of the IV(s) on the response
 solutions? (See Page 287).
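The VIF definition above can be sketched directly: regress each predictor on the others and invert 1 − R2j. The predictors below are simulated, with x2 built to be nearly collinear with x1:

```python
import numpy as np

def vif(X, j):
    # VIF_j = 1 / (1 - R2_j): regress column j on the remaining columns.
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    r2j = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2j)

# Hypothetical predictors: x2 is almost an exact multiple of x1,
# while x3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2 * x1 + 0.01 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 2))  # the first is huge, the last is near 1
```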


• A VIF value greater than 5 suggests that the
regression coefficient is poorly estimated due to
severe multicollinearity.

VIF          Status of predictor
VIF = 1      Not correlated
1 < VIF < 5  Moderately correlated
VIF > 5      Highly correlated


Collinearity vs. Interaction


• Interaction terms can be added to the model to investigate whether
two (or more) combined independent variables have effects on
the response Y.
 Interactions are commonly used when categorical factors are present
• Example: The rate of response of Y (income) to a second factor X2 (years
of education) according to the categorical X1 (gender, 0=male, 1=female).
• Y=β0+β1X1+β2X2: the model only accounts for females earning a fixed
amount more or less than males, with a separate term accounting for
educational difference regardless of gender.
• If we add a third interaction variable X1X2, so
Y=β0+β1X1+β2X2+β3(X1X2), this third term will be zero for males
but non-zero for females, thus representing the specific variable "female
years of education" and allowing the model to separately account for its
effect on income (β2 becomes the gradient for the rate of change of male
income, and β3 is an adjustment to the slope of income change for
females).
This is an interaction, but it is not collinear.


Collinearity vs. Interaction


• Example: The rate of response of Y (income) to a second factor X2
(years of education) according to the categorical X1 (gender, 0=male,
1=female).
Y (million VND)  X1  X2
15               0   4
12               1   2
 Y=β0+β1X1+β2X2
• Male 1:   Y1 = β0+β1X1+β2X2 = β0+β2(4)
• Female 1: Y2 = β0+β1X1+β2X2 = β0+β1(1)+β2(2)
 Y=β0+β1X1+β2X2+β3(X1X2)
• Male 1:   Y1 = β0+β2(4)
• Female 1: Y2 = β0+β1(1)+β2(2)+β3(2)
 This is an interaction, but it is not collinear.
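The interaction model can be fitted as a sketch by adding an X1·X2 column to the design matrix. The data below are hypothetical, generated so males gain 2 per year of education and females gain 3 (β3 = 1):

```python
import numpy as np

# Hypothetical data: X1 = gender (0 = male, 1 = female),
# X2 = years of education, Y = income (noise-free, for illustration).
X1 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X2 = np.array([2.0, 4.0, 6.0, 2.0, 4.0, 6.0])
Y = 5 + 2 * X2 + 1 * X1 * X2

# Design matrix including the interaction column X1*X2.
X = np.column_stack([np.ones(6), X1, X2, X1 * X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b)  # b[2] = male education slope, b[3] = female slope adjustment
```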


Variance remedial measures

• If variances are not equal, it can have serious consequences for the
results of the ANOVA.
• If the violation is mild, it will not be that important.
• If significant, the violation can cause the ANOVA to be wrong or
uninterpretable.
• If variances are not equal:
– Check for outliers
– Transform the data (y)
• When the variance is proportional to the mean, use sqrt(y)
• When the standard deviation is proportional to the mean, use log(y)
• When the standard deviation is proportional to mean^2, use 1/y
• Read https://onlinecourses.science.psu.edu/stat501/node/318/ for more
details.
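The sqrt and log transforms above are one-liners; the response values below are hypothetical, chosen to span several orders of magnitude:

```python
import math

# Hypothetical skewed responses: the spread grows with the level,
# so a log transform compresses it, as suggested above.
y = [1.0, 10.0, 100.0, 1000.0]
y_log = [math.log10(v) for v in y]
y_sqrt = [math.sqrt(v) for v in y]
print(y_log)
```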


• See Example 7.1


• See Example Salary vs (X1, X2, X3, X4, X5, X6)


What does this mean?

R2 = SSR/SST = 1 − SSE/SST

R2(adj) = 1 − (SSE/SST)(dft/dfe) = 1 − MSE/MST
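With hypothetical ANOVA quantities, the two forms of adjusted R2 can be checked numerically (n, k, SST, and SSE below are made-up values):

```python
# Hypothetical quantities: n = 20 observations, k = 3 predictors.
n, k = 20, 3
sst, sse = 100.0, 25.0
df_t, df_e = n - 1, n - k - 1            # 19 and 16

r2 = 1 - sse / sst                        # = SSR/SST
mse, mst = sse / df_e, sst / df_t
r2_adj = 1 - (sse / sst) * (df_t / df_e)
print(r2, r2_adj, 1 - mse / mst)          # the two adjusted forms agree
```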

What does this mean?

In the regression ANOVA table:
• Regression DF = number of predictors (k)
• Error DF = the difference (n − k − 1)
• Total DF = number of observations in the sample − 1
• MSR = SSR/DF, F = MSR/MSE, and s = √MSE
• SST = SSR + SSE


Systematic methods
• These p-values suggest that these coefficients in this
model might be zero, which makes one question whether
they should be in the model or not.
• DO NOT discard predictors based on their p-value. The
models are very sensitive to the inclusion or exclusion of
predictors. Use a systematic method to determine which
predictors to use.
• Best subsets: Compares all combinations of n predictors
and outputs the best.
• Stepwise: Takes the best predictor first, then adds (and
subtracts) others (See Excel file)


Systematic methods
• Forward selection: Takes the best predictor, then adds more
predictors one at a time (See Excel File)
• Backward elimination: Starts with all predictors, then subtracts
them one at a time.
• What is the best model?
• It depends!
– You typically have a purpose in creating a regression model. Usually
the regression is meant to capture some cause-and-effect
relationship. Because of that, you want the model that accurately
reflects which x variables cause a change in the y variable, even if
that means larger residuals
– Sometimes you want only the best predictors, sometimes you want
all the good predictors, sometimes you want a certain number of
predictors … and so on.

Systematic methods
• Stepwise regression procedures are controversial.
• Most of the time, they will give you similar answers.
• Forward selection was often used because it was the
simplest to perform, which was a consideration in the
1960s and '70s due to a lack of computing power.
• Best subsets can be computationally intensive for large
models (ones with lots of predictors), but generally gives
the most defensible answer.
• When in doubt, use best subsets. If it takes too long to
run, use stepwise.


Best subsets regression procedure


• Step #1. Identify all of the possible regression models
• Step #2. Determine the k-predictor models that do the
"best" at meeting some well-defined criteria (k = 1, 2, 3, ...)
– Mallows' Cp statistic (a measure of model bias): Identify
subsets of predictors for which the Cp value is near p (if
possible) or smallest.
– If more than one model has a Cp value near p, in
general choose the simpler model
– Large R2(adj), or large predicted R2, and small s
• Step #3. Further evaluate and refine the handful of
models
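Step #1 and Step #2 can be sketched as a brute-force enumeration over predictor subsets, scoring each by SSE. The data are simulated so that y truly depends only on two of the three hypothetical predictors:

```python
import itertools
import numpy as np

def sse_of(cols, y):
    # Fit y on the given predictor columns (plus intercept); return SSE.
    A = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return float(r @ r)

# Hypothetical data: y depends on x1 and x2 only, x3 is noise.
rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=40), rng.normal(size=40), rng.normal(size=40)
y = 1 + 2 * x1 - 3 * x2 + 0.1 * rng.normal(size=40)

preds = {"x1": x1, "x2": x2, "x3": x3}
best = {}
for k in (1, 2, 3):
    best[k] = min(itertools.combinations(preds, k),
                  key=lambda s: sse_of([preds[name] for name in s], y))
print(best[2])  # the best two-predictor model
```

In practice, criteria such as Cp or adjusted R2 replace raw SSE, since SSE always decreases as predictors are added.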


The Assumptions of Multiple Regression


1. There is a linear relationship. That is, there is a straight-line
relationship between the dependent variable and the set of
independent variables.
2. The variation in the residuals is the same for both large and
small values of the estimated Y. To put it another way, the
residual is unrelated to whether the estimated Y is large or small.
3. The residuals follow the normal probability distribution.
4. The independent variables should not be correlated. That is,
we would like to select a set of independent variables that are
not themselves correlated.
5. The residuals are independent. This means that successive
observations of the dependent variable are not correlated. This
assumption is often violated when time is involved with the
sampled observations.


Four Quick Checks (Multiple Regression)


1) Does the model make sense (i.e., check slope terms, F test)?
2) Is there a statistically significant relationship between the dependent and
independent variables (t-test)?
3) What percentage of the variation in the dependent variable does the
regression model explain (R-Square)?
4) Do the residuals violate assumptions (Analysis of Residuals)?
1. zero mean
2. normally distributed
3. homoscedastic (constant variance)
4. mutually independent (non-autocorrelated): Is there a problem of serial
correlation among the error terms in the model (Durbin-Watson)?
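Two of the residual checks above can be sketched numerically; the residual series below is hypothetical:

```python
# Hypothetical residuals from a fitted model.
e = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]

# Check: the residual mean should be (close to) zero.
mean_e = sum(e) / len(e)

# Check: a crude lag-1 autocorrelation of the residuals; values far
# from 0 hint at serial correlation (the Durbin-Watson test below
# formalizes this).
num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
den = sum(v * v for v in e)
lag1 = num / den
print(mean_e, lag1)
```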


[Figure: Residual Plots for salary (x1,000,000) — four panels: Normal
Probability Plot (percent vs. residual), Versus Fits (residual vs. fitted
value), Histogram (frequency vs. residual), and Versus Order (residual
vs. observation order).]


• The correlation between two variables can be visualized
by creating a scatterplot of the data: Y against X
– If a non-linear pattern is detected, a transformation X' (of the
variable X), Y' (of the variable Y), or both can often significantly
improve the correlation
• A residual plot can reveal whether a data set follows a
random pattern
– If a non-random pattern is detected, transforming the raw data to make
it more linear can significantly improve the fit between X and Y


[Figure: a residual plot graphs the individual error against the
predicted value.]


If the scatterplot of the raw data (X, Y) looks like that shown in
Figure (a),
 transform (X, Y) to (X', Y') so that the scatterplot looks more
like that displayed in Figure (b).


Figure (a) Raw Data; (b) Transformed Data

Exponential model: apply a logarithmic transformation to the
dependent variable y, as shown in Figure (b).


Quadratic model: If the trend in the data follows the
pattern shown in Figure (a), we could take the square root of
y to get y' = √y.


Reciprocal model: A trend in the raw data as
shown in Figure (a) would suggest a reciprocal
transformation, i.e. y' = 1/y.


Logarithmic model: If the raw data follow a trend as
shown in Figure (a), a logarithmic transformation can be
applied to the independent variable x: x' = log(x); a
logarithmic transformation can be applied to the
dependent variable y: y' = log(y) when Figure (a) has the
opposite orientation of x vs. y.


• The Box-Cox transformation constitutes another
particularly useful family of transformations, which is
applied to the independent variable in most cases:
– T(X) = (X^λ − 1)/λ, if λ ≠ 0
– T(X) = log(X), if λ = 0
– where X is the variable being transformed and λ is referred to as
the transformation parameter.
• The optimal value of λ is then the value of λ
corresponding to the maximum correlation
• The Box-Cox transformation can also be applied to the Y
variable
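Choosing λ by the maximum-correlation criterion can be sketched as a grid search. The data are hypothetical, built so that y is linear in √x and the search should land near λ = 0.5:

```python
import numpy as np

def box_cox(x, lam):
    # T(x) = (x**lam - 1)/lam for lam != 0, log(x) for lam == 0.
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

# Hypothetical example: y is linear in sqrt(x).
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0])
y = 2 + 3 * np.sqrt(x)

grid = [round(l, 1) for l in np.arange(-2.0, 2.01, 0.1)]
best = max(grid, key=lambda l: abs(np.corrcoef(box_cox(x, l), y)[0, 1]))
print(best)
```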


Letting Minitab
calculate the optimal
lambda should
produce the best-
fitting results.


Durbin-Watson tests
• The Durbin-Watson test checks for autocorrelation in the
residuals from a regression analysis.
• Method 1:
– The test statistic ranges between 0 and 4.
– A value of 2 indicates that there is no autocorrelation.
A value near 0 (i.e., below 2) indicates positive
autocorrelation, and a value toward 4 (i.e., over 2)
indicates negative autocorrelation.
– We could reduce the DW value by increasing the
sample size


• The Durbin-Watson test is based on the assumption that the
errors in the regression model are generated by a first-
order autoregressive process observed at equally
spaced time periods, that is,

εt = ρεt−1 + at

where εt is the error term in the model at time period t, at is an
NID(0, σ²a) random variable, and ρ (|ρ| < 1) is the autocorrelation
parameter.
• A simple linear regression model with first-order
autoregressive errors:
yt = β0 + β1xt + εt
εt = ρεt−1 + at

• Because most regression problems involving time series
data exhibit positive autocorrelation, the
hypotheses usually considered in the Durbin-
Watson test are
H0 : ρ = 0
H1 : ρ > 0
– If d < dL, reject H0 : ρ = 0
– If d > dU, do not reject H0 : ρ = 0
– If dL < d < dU, the test is inconclusive.
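The decision rules above use the statistic d, which can be computed directly from the residuals; the two series below are hypothetical, chosen to show each side of 2:

```python
# The Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

# Hypothetical residual series.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # negative autocorrelation
smooth = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9]          # positive autocorrelation
print(durbin_watson(alternating), durbin_watson(smooth))
```

The alternating series yields d well above 2, the smooth series d near 0, matching the interpretation on the previous slide.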

Assignment

