
Chapter 3 Analysis of Regression - Part 1

The document discusses regression analysis and models. It defines regression as determining the relationship between dependent and independent variables using a function. Linear regression involves either a single independent variable (simple linear regression) or multiple independent variables (multiple linear regression). Nonlinear models can sometimes be transformed into linear models using logarithms or other transformations. The document provides examples of linear and nonlinear regression models.

Uploaded by

Thy Trương Phương


University of Information Technology – Vietnam National University, Ho Chi Minh City
Faculty of Information Systems
Dr. Tran Van Hai Trieu

Chapter 3

Analysis of Regression

Learning objectives
• Models of Regression.
• Simple Linear Regression.
• Multiple Linear Regression.

1. Models of Regression

Correlation Relationship

“Correlation is the interconnected relationship among the indicators or criteria of a phenomenon, in which the fluctuation of one indicator (the result indicator) is affected by the others (the cause criteria).”

Method of Correlation Analysis

• The correlation analysis process includes the following specific tasks:
1. Analyze the nature of the relationship qualitatively.
2. Use clustering or graphing to determine the nature and trend of the relationship.
3. Express the correlation relationship with a linear or nonlinear regression equation and compute the parameters of the equation.
4. Evaluate the tightness (strength) of the correlation relationship.

Linear Correlation Coefficient

• The correlation coefficient (r) is a statistical quantity used to measure the linear relationship between two variables. Its value ranges from -1 to 1.
• Formula:
$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x\,\sigma_y} \quad \text{or} \quad r = b\,\frac{\sigma_x}{\sigma_y}$$
Where: $\bar{x}$, $\bar{y}$, and $\overline{xy}$ are the means of x, y, and the products xy; $\sigma_x = \sqrt{\overline{x^2} - \bar{x}^2}$ and $\sigma_y = \sqrt{\overline{y^2} - \bar{y}^2}$ are the standard deviations of x and y; b is the slope of the regression line of y on x.
➢ When r is closer to 0, the linear relationship is weaker. In particular, if r = 0, there is no linear relationship.
➢ Conversely, when r is closer to 1 or -1, the relationship is stronger (r > 0 indicates a positive relationship, and r < 0 a negative relationship).

Linear Correlation Coefficient (Cont.)

• Example of computing the linear correlation coefficient
➢ Suppose that we have Table 1 below, which relates workers' years of experience to their labor productivity.

Table 1
Worker | Years of experience x (years) | Labor productivity y (millions VNĐ) | xy | x² | y²
A | 1 | 3 | 3 | 1 | 9
B | 3 | 12 | 36 | 9 | 144
C | 4 | 9 | 36 | 16 | 81
D | 5 | 16 | 80 | 25 | 256
E | 7 | 12 | 84 | 49 | 144
F | 8 | 21 | 168 | 64 | 441
G | 9 | 21 | 189 | 81 | 441
H | 10 | 24 | 240 | 100 | 576
I | 11 | 19 | 209 | 121 | 361
K | 12 | 27 | 324 | 144 | 729
Sum | 70 | 164 | 1369 | 610 | 3182
Mean | 7 | 16.4 | 136.9 | - | -

Linear Correlation Coefficient (Cont.)

• Based on the data in Table 1, we can compute the correlation coefficient as follows:
$$\sigma_x = \sqrt{\frac{610}{10} - \left(\frac{70}{10}\right)^2} = 3.464$$
$$\sigma_y = \sqrt{\frac{3182}{10} - \left(\frac{164}{10}\right)^2} = 7.017$$
$$r = \frac{136.9 - (7 \times 16.4)}{3.464 \times 7.017} = 0.909$$
➢ Because r is positive and close to 1, we can conclude that there is a strong positive linear relationship between years of experience and labor productivity.
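The hand calculation can be checked with a few lines of Python. This is only a sketch: it uses the x and y columns of Table 1 and the same population-style standard deviation as the slide; the variable names are illustrative and not part of the original material.

```python
from math import sqrt

# Table 1: years of experience (x) and labor productivity (y, millions VND)
x = [1, 3, 4, 5, 7, 8, 9, 10, 11, 12]
y = [3, 12, 9, 16, 12, 21, 21, 24, 19, 27]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
mean_xy = sum(a * b for a, b in zip(x, y)) / n

# Standard deviations as on the slide: sqrt(mean of squares - squared mean)
sigma_x = sqrt(sum(a * a for a in x) / n - mean_x ** 2)
sigma_y = sqrt(sum(b * b for b in y) / n - mean_y ** 2)

r = (mean_xy - mean_x * mean_y) / (sigma_x * sigma_y)
print(round(sigma_x, 3), round(sigma_y, 3), round(r, 3))
```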

What is a Regression Model?

• A regression model describes the relationship between an independent variable and a dependent variable by means of a function.
• A regression analysis helps you predict the effect of the independent variable on the dependent one.
• For example
➢ Age and height can be described with a linear regression model over an age range in which height increases at a roughly constant rate as age increases, such as childhood.
Source: https://www.voxco.com

Linear Models
• Regression analysis is a tool for building statistical models that characterize relationships among a dependent variable and one or more independent variables, all of which are numerical.
• Simple linear regression involves a single independent variable.
Y = b0 + b1X
• Multiple linear regression involves two or more independent variables.
Y = b0 + b1X1 + b2X2 + … + bkXk

Nonlinear Models
• Nonlinear models can be transformed into linear models as follows:
1. Logarithm – Logarithm model
❖ Consider the power (multiplicative) regression model
Y = b0 · X^b1 · e^u
❖ Convert the above equation to a linear model by taking the logarithm of both sides:
Ln(Y) = Ln(b0) + b1.Ln(X) + u; set Ln(b0) = α.
Ln(Y) = α + b1.Ln(X) + u
❖ This model is linear in the parameters α and b1, and linear in the variables Ln(X) and Ln(Y).
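As a minimal sketch of this transformation, the code below fits a straight line to Ln(X) and Ln(Y) and then recovers b0 from α. The data are hypothetical and noise-free (generated from Y = 2 · X^1.5 with u = 0), and `fit_line` is an illustrative helper, not something defined in the slides.

```python
from math import exp, log

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical noise-free data generated from Y = 2 * X**1.5 (u = 0)
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [2.0 * v ** 1.5 for v in X]

# Regress Ln(Y) on Ln(X): the intercept is alpha, the slope is b1
alpha, b1 = fit_line([log(v) for v in X], [log(v) for v in Y])
b0 = exp(alpha)  # undo the substitution alpha = Ln(b0)
print(round(b0, 6), round(b1, 6))
```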

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
1. Logarithm – Logarithm model
❖ Consider marginal effects:
$$b_1 = \frac{d\,\mathrm{Ln}(Y)}{d\,\mathrm{Ln}(X)} = \frac{dY / Y}{dX / X}$$
❖ Meaning: (% change of Y) = b1*(% change of X)
➔ When X changes by 1%, Y changes by approximately b1%.
❖ Generalized logarithm - logarithm model
Ln(Yi) = Ln(b0) + b1.Ln(X1i) + b2.Ln(X2i) + … + bn.Ln(Xni) + Ui

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
1. Logarithm – Logarithm model
❖ Application of the Cobb-Douglas production function
Y = b0 · (X1)^b1 · (X2)^b2 · e^U (1)
Where: Y: Output; X1: Labor; X2: Capital.
❖ Taking the logarithm of both sides of formula (1) gives formula (2):
Ln(Y) = Ln(b0) + b1.Ln(X1) + b2.Ln(X2) + U (2)
❖ Meaning of b1 and b2:
✓ When X1 increases or decreases by 1% and X2 does not change, Y changes by approximately b1%.
✓ When X2 increases or decreases by 1% and X1 does not change, Y changes by approximately b2%.

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
1. Logarithm – Logarithm model
❖ An example of the linear equation is as follows:
Ln(Y) = -3.3386 + 1.4988 Ln(X1) + 0.4899 Ln(X2)
Where: Y: Output of agriculture (millions $); X1: Days of labor (millions of days); X2: Total capital (millions $).
❖ The meaning of the regression coefficients:
✓ If total capital is held constant and days of labor increase by 1%, the average output increases by about 1.5%.
✓ If days of labor are held constant and total capital increases by 1%, the average output increases by about 0.5%.
✓ b1 + b2 > 1: increasing returns to scale (output grows proportionally faster than the inputs). Here b1 + b2 = 1.9887, so increasing the scale is effective.
✓ b1 + b2 = 1: constant returns to scale. b1 + b2 < 1: decreasing returns to scale, so increasing the scale is ineffective.
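The percentage readings are approximations of the underlying multiplicative model. The sketch below uses the two slope coefficients from the fitted equation to compute the exact change in output for a 1% change in one input, and for a doubling of both inputs; `output_ratio` is a hypothetical helper introduced only for this illustration.

```python
b1, b2 = 1.4988, 0.4899  # slope coefficients from the fitted equation above

def output_ratio(labor_ratio, capital_ratio):
    """Factor by which Y changes when X1 and X2 are scaled by the given ratios."""
    return labor_ratio ** b1 * capital_ratio ** b2

pct_labor = (output_ratio(1.01, 1.0) - 1) * 100    # +1% labor, capital fixed
pct_capital = (output_ratio(1.0, 1.01) - 1) * 100  # +1% capital, labor fixed
doubling = output_ratio(2.0, 2.0)                  # both inputs doubled

print(round(pct_labor, 2), round(pct_capital, 2), round(doubling, 2))
```

The intercept cancels out of every ratio, which is why only b1 and b2 are needed here.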

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
2. Semi-logarithm model
2.1. Logarithm – Linear model
✓ The compound growth formula for gross profit: Yt = Y0 · (1 + r)^t
✓ Convert the above equation to a linear model by taking the logarithm of both sides:
❖ Ln(Yt) = Ln(Y0) + t*Ln(1+r) (1)
✓ Set b0 = Ln(Y0), b1 = Ln(1+r)
❖ Ln(Yt) = b0 + b1t (2)
✓ Add a random error (ut) to formula (2) to obtain formula (3):
❖ Ln(Yt) = b0 + b1t + ut (3)

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
2. Semi-logarithm model
2.1. Logarithm – Linear model
✓ Consider marginal effects:
$$b_1 = \frac{d\,\mathrm{Ln}(Y)}{dt} = \frac{dY / Y}{dt}$$
✓ Meaning: (relative change of Y) = b1*(change of t)

➔ When t increases by one year, Y changes by approximately (b1*100)%.
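A short sketch of the logarithm – linear model on a hypothetical series that grows at exactly 5% per year from Y0 = 100 (both values are assumptions made for illustration). It compares the approximate reading b1*100% with the exact compound rate implied by b1 = Ln(1 + r).

```python
from math import exp, log

# Hypothetical series: Y0 = 100 growing at exactly r = 5% per year
r_true, Y0 = 0.05, 100.0
t = list(range(6))
Y = [Y0 * (1 + r_true) ** k for k in t]

# Least-squares fit of Ln(Yt) = b0 + b1 * t
lnY = [log(v) for v in Y]
n = len(t)
mean_t, mean_ln = sum(t) / n, sum(lnY) / n
b1 = sum((a - mean_t) * (b - mean_ln) for a, b in zip(t, lnY)) / sum(
    (a - mean_t) ** 2 for a in t
)
b0 = mean_ln - b1 * mean_t

approx_growth = b1 * 100            # approximate reading: about b1*100 % per year
exact_growth = (exp(b1) - 1) * 100  # exact rate, because b1 = Ln(1 + r)
print(round(approx_growth, 3), round(exact_growth, 3))
```

The two readings are close for small growth rates and drift apart as b1 grows.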

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
2. Semi-logarithm model
2.1. Logarithm – Linear model
✓ Example of a regression of wage on years of education
➔ Please consider the meaning of the regression coefficient.

Nonlinear Models (Cont.)

• Nonlinear models can be transformed into linear
models as follows:
2. Semi-logarithm model
2.2. Linear - Logarithm model
Y = b0 + b1.Ln(X) + U (1)
✓ Consider marginal effects:
$$b_1 = \frac{dY}{d\,\mathrm{Ln}(X)} = \frac{dY}{dX / X}$$
✓ Meaning: (change of Y) = b1*(relative change of X)
➔ When X changes by 1%, Y changes by approximately b1 / 100 units.

Nonlinear Models (Cont.)

• Table of summarized models
Model | Dependent variable | Independent variable | Slope b1 | Explanation of meaning
Normal Linear | Y | X | dY / dX | X changes 1 unit, then Y changes b1 units.
Linear - Logarithm | Y | Ln(X) | dY / d(LnX) | X changes 1%, then Y changes (b1 / 100) units.
Logarithm - Linear | Ln(Y) | X | d(LnY) / dX | X changes 1 unit, then Y changes (b1 * 100)%.
Logarithm – Logarithm | Ln(Y) | Ln(X) | d(LnY) / d(LnX) | X changes 1%, then Y changes b1%.

Regression Equation Testing for Simple Linear Regression
• The R-Square coefficient (R²) measures how well the linear model fits the data.
• Adjusted R² reflects the fit of the overall model after accounting for the number of independent variables.
• Let the observed points used to fit the empirical regression line be Ai(xi, yi), i = 1, …, n.
• Suppose that we find the regression equation:
ỹ = a + bx
• Set
yi = a + b·xi + ei
• ei represents the portion of the variation in Y that cannot be explained by the linear relationship between X and Y:
yi = ỹi + ei

Regression Equation Testing for Simple Linear Regression (Cont.)
• SSR: Sum of Squares for Regression
• SSE: Sum of Squares for Error (Residual)
• SST: Total Sum of Squares
$$SSR = \sum_{i=1}^{n} (\tilde{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (\tilde{y}_i - y_i)^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$$\text{Adjusted } R^2 = 1 - \frac{SSE / (n - (k + 1))}{SST / (n - 1)}$$
where k is the number of independent variables.
• SST = SSR + SSE
• Meaning: the total variation of Y = the variation of Y explained by the Xi + the part of the variation of Y due to other factors.
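To make the decomposition concrete, the sketch below fits ỹ = a + bx by least squares to the Table 1 data from the correlation example and evaluates SSR, SSE, SST, R², and adjusted R². The variable names are illustrative only.

```python
# Table 1: years of experience (x) and labor productivity (y)
x = [1, 3, 4, 5, 7, 8, 9, 10, 11, 12]
y = [3, 12, 9, 16, 12, 21, 21, 24, 19, 27]
n, k = len(x), 1  # k = number of independent variables

mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope b and intercept a of y~ = a + b*x
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
y_fit = [a + b * xi for xi in x]

ssr = sum((f - mean_y) ** 2 for f in y_fit)         # explained variation
sse = sum((yi - f) ** 2 for yi, f in zip(y, y_fit))  # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)            # total variation

r2 = ssr / sst
adj_r2 = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
print(round(ssr, 3), round(sse, 3), round(sst, 3), round(r2, 4), round(adj_r2, 4))
```

For simple linear regression R² equals the square of the correlation coefficient r, which gives a quick cross-check against the value of r computed for Table 1.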

Regression Equation Testing for Simple Linear Regression (Cont.)
• F-Test
• We state the hypotheses:
➢ H0: The regression equation is not appropriate.
➢ Ha: The regression equation is appropriate.
• With the number of independent variables k = 1:
$$MSR = \frac{SSR}{k} = \frac{SSR}{1}$$
$$MSE = \frac{SSE}{n - (k + 1)} = \frac{SSE}{n - 2}$$
$$F = \frac{MSR}{MSE} \sim \mathrm{Fisher}(1, n - 2)$$
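A sketch of the F-test on the Table 1 data (n = 10, k = 1). The standard library has no F distribution, so the code compares the statistic with the tabulated 5% critical value of F(1, 8), which is about 5.32; that constant and the 5% level are assumptions of this example rather than values given in the slides.

```python
# Table 1: years of experience (x) and labor productivity (y)
x = [1, 3, 4, 5, 7, 8, 9, 10, 11, 12]
y = [3, 12, 9, 16, 12, 21, 21, 24, 19, 27]
n, k = len(x), 1

mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

ssr = sum((a + b * xi - mean_y) ** 2 for xi in x)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

msr = ssr / k
mse = sse / (n - (k + 1))
f_stat = msr / mse

F_CRIT_5PCT = 5.32  # tabulated F(1, 8) critical value at the 5% level (assumed level)
reject_h0 = f_stat > F_CRIT_5PCT
print(round(f_stat, 2), reject_h0)
```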

Regression Equation Testing for Multiple Linear Regression
• Suppose that we have a multiple linear regression as follows:
Y = b0 + b1X1 + b2X2 + … + bkXk
• We continue to apply the formulas (SSR, SSE, SST), with the same meaning as in the case of simple linear regression.
• We state the hypotheses:
➢ H0: The regression equation is not appropriate.
➢ Ha: The regression equation is appropriate.
$$MSR = \frac{SSR}{k}$$
$$MSE = \frac{SSE}{n - (k + 1)}$$
$$F = \frac{MSR}{MSE} \sim \mathrm{Fisher}(k, n - (k + 1))$$

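Regression Coefficient Testing for Simple Linear Regression
• State hypotheses:
H0: b = 0
Ha: b ≠ 0
$$S_b^2 = \frac{S_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{MSE}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$
$$S_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2} = MSE$$
• Test statistic: t = b / Sb, compared with the Student t distribution with n − 2 degrees of freedom.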
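The slope test can be sketched on the Table 1 data as follows. The critical value 2.306 is the tabulated two-sided 5% point of the Student t distribution with 8 degrees of freedom; the 5% level is an assumption of this example.

```python
from math import sqrt

# Table 1: years of experience (x) and labor productivity (y)
x = [1, 3, 4, 5, 7, 8, 9, 10, 11, 12]
y = [3, 12, 9, 16, 12, 21, 21, 24, 19, 27]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
a = mean_y - b * mean_x

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)      # S_e^2
s_b = sqrt(mse / sxx)    # standard error of the slope
t_stat = b / s_b

T_CRIT = 2.306  # tabulated t(0.025, 8 df); assumed 5% two-sided level
ci_low, ci_high = b - T_CRIT * s_b, b + T_CRIT * s_b
print(round(s_b, 4), round(t_stat, 3), round(ci_low, 3), round(ci_high, 3))
```

With a single independent variable, the square of this t statistic equals the F statistic of the regression, so both tests lead to the same decision.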

Regression Coefficient Testing for Multiple Linear Regression
• State hypotheses:
H0: bj = 0
Ha: bj ≠ 0

Gj: Set of variables except for Xj.
$$S_{b_j}^2 = \frac{S_e^2}{(1 - R^2_{X_j G_j}) \cdot S^2_{X_j} \cdot (n - 1)}$$
Where:
Se²: MSE
S²Xj: Sample variance of variable Xj.
R²XjGj: Squared correlation between Xj and the variables in Gj. For a single pair of variables X and Y, the correlation is
$$R_{XY} = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{S_X S_Y}$$
and R²XY is its square.
Confidence interval for bj: bj ± t(α/2) * Sbj, with n − (k + 1) degrees of freedom.

In conclusion
• Correlation Relationship
• Method of Correlation Analysis
• Linear Correlation Coefficient
• Linear Models
• Nonlinear Models
• Regression Equation Testing for
Simple and Multiple Linear Regression
• Regression Coefficient Testing
for Simple and Multiple Linear Regression

2. Simple Linear Regression


Simple Linear Regression

• Finds a linear relationship between:
➢ one independent variable X and
➢ one dependent variable Y
• First, prepare a scatter plot to verify that the data has a linear trend.
• Use alternative approaches if the data is not linear.

Simple Linear Regression (Cont.)

• Example of Home Market Value Data
➢ The size of a house is typically related to its market value.
✓ X = square footage
✓ Y = market value ($)
➢ The scatter plot of the full data set (42 homes) indicates a linear trend.

Simple Linear Regression (Cont.)

• Finding the Best-Fitting Regression Line
➢ Two possible lines are shown below.
➢ Line A is clearly a better fit to the data.
• We want to determine the best regression line.
Ŷ = b0 + b1X
where:
b0 is the intercept
b1 is the slope

Simple Linear Regression (Cont.)

• Using Excel to Find the Best Regression Line
• Market value = 32673 + 35.036(square feet)
➢ The regression model explains variation in market value due to the size of the home.
➢ It provides better estimates of market value than simply using the average.

Simple Linear Regression (Cont.)

• Least-Squares Regression
• Regression analysis finds the equation of the best-fitting line, that is, the line that minimizes the sum of the squares of the observed errors (residuals).
• Using calculus, we can solve for the slope and intercept of the least-squares regression line.

Simple Linear Regression (Cont.)

• Least-Squares Regression Equations
• Slope
➢ b1 = SLOPE(known y’s, known x’s)
• Intercept
➢ b0 = INTERCEPT(known y’s, known x’s)
• Predict Y for specified X values: Ŷ = b0 + b1X
➢ Ŷ = TREND(known y’s, known x’s, new x’s)

Simple Linear Regression (Cont.)

• Using Excel Functions to Find Least-Squares Coefficients
➢ Slope = b1 = 35.036
=SLOPE(C4:C45, B4:B45)
➢ Intercept = b0 = 32,673
=INTERCEPT(C4:C45, B4:B45)
➢ Estimate Y when X = 1800 square feet
Ŷ = 32,673 + 35.036(1800) = $95,737.80
=TREND(C4:C45, B4:B45, 1800)
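For readers working outside Excel, the three functions can be approximated in Python. This is a sketch under stated assumptions: `slope`, `intercept`, and `trend` are hypothetical helpers that mirror the argument order of the Excel functions, the 42-home data set is not reproduced here, and the small data set at the end is made up so that the result is easy to verify.

```python
def slope(known_ys, known_xs):
    """Least-squares slope b1, with the same argument order as Excel's SLOPE."""
    n = len(known_xs)
    mx, my = sum(known_xs) / n, sum(known_ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mx) ** 2 for x in known_xs)
    return sxy / sxx

def intercept(known_ys, known_xs):
    """Least-squares intercept b0 = mean(y) - b1 * mean(x)."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n

def trend(known_ys, known_xs, new_x):
    """Predicted Y at new_x on the fitted line (single-value version of TREND)."""
    return intercept(known_ys, known_xs) + slope(known_ys, known_xs) * new_x

# Check the arithmetic of the estimate using the reported coefficients
b0, b1 = 32673, 35.036
estimate = b0 + b1 * 1800

# Hypothetical data lying exactly on y = 10 + 2x, to exercise the helpers
xs = [1, 2, 3, 4]
ys = [12, 14, 16, 18]
print(round(estimate, 2), slope(ys, xs), intercept(ys, xs), trend(ys, xs, 5))
```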

Simple Linear Regression (Cont.)

• Excel Regression tool
➢ Data > Data Analysis > Regression
❖ Input Y Range
❖ Input X Range
❖ Labels
• Excel outputs a table with many useful regression statistics.

Simple Linear Regression (Cont.)

• Regression Statistics in Excel’s Output

➢ Multiple R
❖ | r | where r is the sample correlation
coefficient.
❖ r varies from -1 to +1 (r is negative if slope
is negative).
➢ R Square
❖ Coefficient of determination, R2 varies from
0 (no fit) to 1 (perfect fit).

Simple Linear Regression (Cont.)

• Regression Statistics in Excel’s Output
➢ Adjusted R Square
❖ Adjusts R2 for sample size and number of X variables.
➢ Standard Error
❖ Variability between observed and predicted Y values.

Simple Linear Regression (Cont.)

• Example of Interpreting Regression Statistics for Simple Linear Regression (Home Market Value)
➢ 53% of the variation in home market values can be explained by home size.
➢ The standard error of $7,287 is less than the standard deviation (not shown) of $10,553.

Simple Linear Regression (Cont.)

• Regression Analysis of Variance
➢ ANOVA conducts an F-test to determine whether
variation in Y is due to varying levels of X.
➢ ANOVA is used to test for significance of regression:
❖ H0: population slope coefficient = 0
❖ H1: population slope coefficient ≠ 0
➢ Excel reports the p-value (Significance F).
➢ Rejecting H0 indicates that X explains variation in Y.

Simple Linear Regression (Cont.)

• Example of Interpreting Significance of Regression
➢ H0: Home size is not a significant variable.
➢ H1: Home size is a significant variable.
➢ p-value = 3.798 × 10^-8
❖ Reject H0.
❖ The slope is not equal to zero.
• Using a linear relationship, home size is a significant variable in explaining variation in market value.

Simple Linear Regression (Cont.)

• Testing Hypotheses for Regression Coefficients
➢ An alternate method for testing whether the population slope coefficient is zero is to use a t-test, which divides the estimated coefficient by its standard error.
➢ Excel provides the p-values for tests on the slope and intercept.

Simple Linear Regression (Cont.)

• Example of Interpreting Hypothesis Tests for Regression Coefficients (Home Market Value)
➢ p-value for test on the intercept = 0.000649
➢ p-value for test on the slope = 3.798 × 10^-8
➢ Both tests reject their null hypotheses.
➢ Both the intercept and slope coefficients are significantly different from zero.

Simple Linear Regression (Cont.)

• Example of Interpreting Hypothesis Tests for Regression Coefficients (Home Market Value)
➢ 95% confidence interval estimates
➢ Intercept is between $14,823 and $50,523
➢ Slope is between $24.59 and $45.48 per sq. ft.
➢ Lower extreme: Ŷ = 14,823 + 24.59X
➢ Upper extreme: Ŷ = 50,523 + 45.48X

In conclusion

• Simple Linear Regression.


3. Multiple Linear Regression


Multiple Linear Regression

• Multiple regression has more than one independent variable.
• The multiple linear regression equation is:
Y = b0 + b1X1 + b2X2 + … + bkXk
• The ANOVA test for significance of the entire model is:
H0: b1 = b2 = … = bk = 0
Ha: at least one bj ≠ 0
• One can also test for significance of individual regression coefficients.

Multiple Linear Regression (Cont.)

• Example of Interpreting Regression Results for the Colleges and Universities Data
➢ Colleges try to predict student graduation rates (Y) using a variety of characteristics, such as:
1. Median SAT
2. Expenditures/student
3. Acceptance rate
4. Top 10% of HS class

Multiple Linear Regression (Cont.)

• Example of Interpreting Regression Results for the Colleges and Universities Data
➢ All of the slope coefficient p-values are < 0.05.
➢ The residual plots (only one shown here) show random patterns about 0.
➢ Normal probability plots (not shown) also validate assumptions.

Building Good Regression Models

• Not all of the independent variables in a linear regression model are necessarily significant.
• We will learn how to build good regression models that include the “best” set of variables.
• Banking Data includes demographic information on customers in the bank’s current market.

Building Good Regression Models (Cont.)

• Predicting Average Bank Balance using Regression
➢ Home Value and Education are not significant.

Building Good Regression Models (Cont.)

• Systematic Approach to Building Good Multiple
Regression Models
1. Construct a model with all available independent
variables and check for significance of each.
2. Identify the largest p-value that is greater than α.
3. Remove that variable and evaluate adjusted R2.
4. Continue until all variables are significant.
➔ Find the model with the highest adjusted R2.
(Do not use unadjusted R2 since it always increases
when variables are added).
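The reason for preferring adjusted R² can be illustrated with a small sketch. The function below rewrites the adjusted R² formula in terms of R² (using SSE/SST = 1 − R²); the sample size and the two R² values are hypothetical numbers chosen only to show the effect.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Hypothetical comparison with n = 30 observations:
# adding a fifth variable raises R^2 only from 0.900 to 0.901
adj_4_vars = adjusted_r2(0.900, n=30, k=4)
adj_5_vars = adjusted_r2(0.901, n=30, k=5)
print(round(adj_4_vars, 4), round(adj_5_vars, 4))
```

In this made-up case R² rises while adjusted R² falls, so the extra variable does not earn its place in the model.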

Building Good Regression Models (Cont.)

• Example of Identifying the Best Regression Model
➢ Bank regression after removing Home Value

Adjusted R2 improves slightly.

All X variables are significant.

Building Good Regression Models (Cont.)

• Multicollinearity
➢ It occurs when there are strong correlations among the independent variables.
➢ It makes it difficult to isolate the effects of the individual independent variables.
➢ Signs of slope coefficients may be the opposite of their true values, and p-values can be inflated.
• Correlations with an absolute value above 0.7 are an indication that multicollinearity might exist.
• Variance Inflation Factors are a better indicator.
• Parsimony is an age-old principle that applies here.
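A Variance Inflation Factor for variable Xj is 1 / (1 − R²), where R² comes from regressing Xj on the other independent variables. The sketch below evaluates it for the special case of two independent variables, where that R² is simply the squared pairwise correlation; the function name and the sample correlations are assumptions made for illustration.

```python
def vif_from_r2(r2_j):
    """Variance Inflation Factor given the R^2 of X_j regressed on the other X's."""
    return 1.0 / (1.0 - r2_j)

# With only two independent variables, R^2 is the squared pairwise correlation
for corr in (0.3, 0.7, 0.9, 0.95):
    print(corr, round(vif_from_r2(corr ** 2), 2))

vif_at_threshold = vif_from_r2(0.7 ** 2)  # VIF at the 0.7 correlation threshold
```

Commonly quoted rules of thumb treat a VIF above roughly 5 or 10 as a warning sign, although these cutoffs are conventions rather than formal tests.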

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Colleges and Universities (full model)

Full model
Adjusted R2 = 0.4921

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Correlation Matrix (Colleges and Universities data)

➢ All of the correlations are within ±0.7.
➢ Signs of the coefficients are questionable for Expenditures and Top 10%.

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Colleges and Universities (reduced model)

Dropping Top 10%

Adjusted R2 drops to 0.4559

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Colleges and Universities (reduced model)

Dropping Expenditures
Adjusted R2 drops to 0.4556

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Colleges and Universities (reduced model)

Dropping Expenditures and Top 10%

Adjusted R2 drops to 0.3613

Which of the 4 models would you choose?


Building Good Regression Models (Cont.)

• Example of Identifying the Best Regression Model
➢ Banking Data (full model)

Full Model
Adjusted R2 = 0.9441
Education and Home Value
are not significant.

Building Good Regression Models (Cont.)

• Example of Identifying Potential Multicollinearity
➢ Correlation matrix for the Banking data

➢ Some of the correlations exceed 0.7 for Home Value and Wealth.
➢ Signs of the coefficients for predicting bank balance are as expected (positive).

Building Good Regression Models (Cont.)

• Example of Identifying the Best Regression Model
➢ Banking Data (reduced model)

Dropping Wealth and Home Value

Adjusted R2 drops to 0.9201
Education is not significant.

Building Good Regression Models (Cont.)

• Example of Identifying the Best Regression Model
➢ Re-ordered Correlation matrix for Banking data

➢ By re-ordering the variables, we can see that the correlations for Age, Education, and Wealth are all within ±0.7.
➢ Let’s try a reduced model with the Age, Education, and Wealth variables.

Building Good Regression Models (Cont.)

• Example of Identifying the Best Regression Model
➢ Banking Data (reduced model) ** best model

Dropping Income and Home Value.

Adjusted R2 = 0.9345.
All variables are significant.
Multicollinearity is not a problem.

Regression with Categorical Variables

• Dealing with Categorical Variables
➢ Categorical variables must be coded numerically using dummy variables.
➢ For variables with 2 categories, code as 0 and 1.
➢ For variables with k ≥ 3 categories, create k−1
binary (0,1) variables.
• Interaction Terms
➢ A dependence between two variables is called
interaction.
➢ Test for interaction by adding a new term to the
model, such as X3 = X1X2.
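Both ideas can be sketched in a few lines. `dummy_code` is a hypothetical helper that creates k − 1 binary columns for a variable with k categories, leaving the chosen baseline category as the all-zero case; the sample values are made up.

```python
def dummy_code(values, baseline):
    """Return k-1 binary (0/1) columns for a categorical variable with k levels."""
    levels = sorted(set(values) - {baseline})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

# Hypothetical categorical variable with 4 categories -> 3 dummy variables
tool = ["A", "B", "C", "D", "B"]
dummies = dummy_code(tool, baseline="A")
print(dummies)

# Interaction term X3 = X1 * X2 for a numeric variable and a 0/1 variable
age = [25, 40, 31]
mba = [0, 1, 1]
interaction = [a * m for a, m in zip(age, mba)]
print(interaction)
```

Using k − 1 columns instead of k avoids perfect collinearity with the intercept, since the baseline category is already represented by all dummies being zero.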

Regression with Categorical Variables (Cont.)

• Example of A Model with Categorical Variables

➢ Employee Salaries provides data for 35
employees.
➢ Predict Salary using Age and MBA (yes=1, no=0).

Regression with Categorical Variables (Cont.)

• Example of A Model with Categorical Variables
➢ Salary = 893.59 + 1044(Age) for those without an MBA
➢ Salary = 15,660.82 + 1044(Age) for those with an MBA

Adjusted R2 = 0.949858

• Example of Incorporating Interaction Terms in a Regression Model
➢ Define an interaction between Age and MBA and include it in the regression model.
➢ Interaction = (Age)(MBA)

• Example of Incorporating Interaction Terms in a Regression Model
➢ MBA is now insignificant, so we will drop it from the model.
➢ Adjusted R2 = 0.976701

• Example of Incorporating Interaction Terms in a Regression Model
➢ Salary = 3,323 + 984(Age) for those without an MBA
➢ Salary = 3,323 + 1410(Age) for those with an MBA
➢ Adjusted R2 = 0.976727 (a slight improvement)
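The two salary lines are one model with an Age × MBA interaction term: the intercept is shared and the MBA indicator changes only the slope. A sketch, with `predict_salary` as a hypothetical helper and the interaction coefficient derived as 1410 − 984 = 426 from the two reported slopes:

```python
def predict_salary(age, mba):
    """Salary from the final model: common intercept, MBA-specific slope.

    mba is the dummy variable (1 = has an MBA, 0 = does not).
    """
    base_intercept = 3323
    age_slope = 984
    interaction_slope = 1410 - 984  # extra salary per year of age for MBA holders
    return base_intercept + age_slope * age + interaction_slope * age * mba

print(predict_salary(30, 0), predict_salary(30, 1))
```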

• Example of A Regression Model with Multiple Levels of Categorical Variables
➢ Surface Finish data provides measurements for 35
parts produced on a lathe.

• Example of A Regression Model with Multiple Levels of Categorical Variables
➢ Tool Type (A, B, C, D) is now coded as 3 dummy variables.

• Example of A Regression Model with Multiple Levels of Categorical Variables
➢ Tool A: Surf. Finish = 24.5 + 0.098 RPM
➢ Tool B: Surf. Finish = 11.2 + 0.098 RPM
➢ Tool C: Surf. Finish = 4.0 + 0.098 RPM
➢ Tool D: Surf. Finish = -1.6 + 0.098 RPM
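The four tool equations can be read as a single regression with three dummy variables. The sketch below assumes Tool A is the baseline level (the slides do not say which level was omitted) and derives each dummy coefficient as the difference between that tool's intercept and Tool A's; the function name and the RPM value are hypothetical.

```python
BASE_INTERCEPT = 24.5     # intercept for the assumed baseline, Tool A
RPM_COEF = 0.098          # common slope for RPM across all tools

# Shift in intercept relative to Tool A, i.e. the implied dummy coefficients.
TOOL_SHIFT = {
    "A": 0.0,
    "B": 11.2 - 24.5,
    "C": 4.0 - 24.5,
    "D": -1.6 - 24.5,
}


def predict_finish(rpm, tool):
    """Surface finish = b0 + b1*RPM + (dummy coefficient for the given tool)."""
    return BASE_INTERCEPT + TOOL_SHIFT[tool] + RPM_COEF * rpm


# Hypothetical spindle speed, used only to compare the four tools.
for tool in "ABCD":
    print(tool, round(predict_finish(200, tool), 2))
```

Choosing a different baseline changes the individual coefficients but not the predictions.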

Regression Models with Nonlinear Terms
• Curvilinear Regression
➢ Curvilinear models may be appropriate when
scatter charts or residual plots show nonlinear
relationships.
➢ A second order polynomial might be used:
Y = β0 + β1X + β2X² + ε
➢ Here β1 represents the linear effect of X on Y
and β2 represents the curvilinear effect.
➢ This model is linear in the β parameters so we
can use linear regression methods.
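Because the model is linear in the β parameters, X and X² can be treated as two ordinary predictors and fitted by least squares. A minimal sketch in plain Python that solves the normal equations (XᵀX)β = XᵀY; the helper names and the synthetic data (generated from known coefficients so the fit can be checked) are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares estimates (b0, b1, b2) for Y = b0 + b1*X + b2*X^2."""
    design = [[1.0, x, x * x] for x in xs]           # columns: 1, X, X^2
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic, noise-free data from Y = 2 + 3X + 0.5X^2 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x + 0.5 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

With noise-free data the estimates recover the generating coefficients up to rounding error; with real data the same code returns the best-fitting parabola.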

Regression Models with Nonlinear Terms (Cont.)
• Example of Modeling Beverage Sales Using Curvilinear Regression
➢ Sales of cold beverages increase when it is
hotter outside.

Regression Models with Nonlinear Terms (Cont.)
• Example of Modeling Beverage Sales Using Curvilinear Regression
➢ [Figure: U-shape residual plot]


Regression Models with Nonlinear Terms (Cont.)
• Example of Modeling Beverage Sales Using Curvilinear Regression
➢ Residual pattern is more random.
➢ Sales = 142,850 − 3643(temperature) + 23.3(temperature)²
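The fitted second-order equation can be evaluated directly. The sketch below uses the coefficients as printed (they are rounded, so results are approximate), and the function name and the temperatures tried are hypothetical; the slides do not state the temperature unit. It also locates the turning point of the parabola, −β1 / (2β2), above which predicted sales rise with temperature.

```python
B0 = 142850.0
B1 = -3643.0     # linear effect of temperature
B2 = 23.3        # curvilinear (squared) effect of temperature


def predict_sales(temperature):
    """Sales = b0 + b1*temperature + b2*temperature^2."""
    return B0 + B1 * temperature + B2 * temperature ** 2


# Vertex of the parabola: predicted sales increase for temperatures above it.
turning_point = -B1 / (2 * B2)
print(round(turning_point, 1))

for t in (80, 90, 100):
    print(t, round(predict_sales(t), 1))
```

Because the squared term is positive, each additional degree adds more to predicted sales than the previous one, which is the curvature a straight-line model misses.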

Regression Models with Nonlinear Terms (Cont.)
• Example of Modeling Beverage Sales Using Curvilinear Regression
➢ [Figure: Second Order Polynomial Trendline]

In conclusion

• Multiple Linear Regression.
• Building Good Regression Models.
• Regression with Categorical Variables.
• Regression Models with Nonlinear Terms.

THANK YOU
FOR YOUR ATTENTION

Q&A
