Regression Analysis

Quantitative Methods for Managers

Introduction to Linear Regression


Linear Regression

• Linear Regression determines the “best-fitting” line that explains the
relationship between a response variable and one or more predictor
variables.
‣ The results of a linear regression analysis include an equation that relates the
response variable to the predictor variables.
• A regression model also allows us to make predictions regarding the
response variable based on the known values of the predictor
variables.


Simple Linear Regression

• The Simple Linear Regression model uses one predictor variable,
denoted by x, to explain the variation in the response variable,
denoted by y:
y = ϐ0 + ϐ1 x + ϵ
‣ ϵ is the random error term,
‣ coefficients ϐ0 and ϐ1 are the unknown parameters to be estimated.
• The slope parameter ϐ1 determines whether the linear relationship is
positive (ϐ1 > 0) or negative (ϐ1 < 0).


Sample Regression Equation: Simple Regression

• The Sample Regression Equation:
$\hat{y} = b_0 + b_1 x$
‣ where b0 and b1 represent the estimates of ϐ0 and ϐ1, respectively.
• Interpretation: The slope estimate b1 represents the change in the
predicted value of y when x increases by one unit.


Sample Regression Equation: Simple Regression

• Fitted/Predicted values: $\hat{y}_i = b_0 + b_1 x_i$ for the i-th observation.
• Residuals: $e_i = y_i - \hat{y}_i$ for the i-th observation.
• The estimates b0 and b1 are chosen to minimize
$\text{Sum of Squared Residuals (SSE)} = \sum_{i=1}^{n} e_i^2$
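
For simple regression, the estimates that minimize SSE have a well-known closed form: b1 is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and b0 = ȳ − b1·x̄. The sketch below illustrates this in Python. The data values and function names are hypothetical and are not taken from the slides.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """SSE: the sum over observations of (y_i - fitted value)^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(x, y)
sse = sum_squared_residuals(x, y, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.4f}")
```

Any other choice of intercept or slope gives a larger SSE on the same data, which is what "chosen to minimize" means here.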


Sample Regression Equation: Example

• The linear relationship between the sales price of a house (Price in
$1000) and its square footage (Sqft):
Predicted Price = 150 + 0.2 × Sqft
‣ There is a positive relationship between the size of a house and its price.
‣ If the square footage increases by 1 sq ft, we predict the price of a house to increase
by $200.
• The predicted sales price of a 2000 sq ft house:
150 + 0.2 × 2000 = 550 (that is, $550,000)

Multiple Linear Regression

• The Multiple Linear Regression model uses multiple predictor
variables, denoted by x1, x2, …, xk, to explain the variation in the
response variable, denoted by y:
y = ϐ0 + ϐ1 x1 + ϐ2 x2 + … + ϐk xk + ϵ
‣ ϵ is the random error term,
‣ coefficients ϐ0, ϐ1, ϐ2, …, ϐk are the unknown parameters to be estimated.
• The slope parameter ϐj determines whether the linear relationship is
positive (ϐj > 0) or negative (ϐj < 0).


Sample Regression Equation: Multiple Regression

• The Sample Regression Equation:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
‣ where b0, b1, b2, …, bk represent the estimates of ϐ0, ϐ1, ϐ2, …, ϐk, respectively.
• Interpretation: The estimate bj represents the change in $\hat{y}$ when xj
increases by one unit, holding all other predictor variables constant.


Sample Regression Equation: Multiple Regression

• Fitted/Predicted values: $\hat{y}_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i} + \dots + b_k x_{k,i}$ for the i-th
observation.
• Residuals: $e_i = y_i - \hat{y}_i$ for the i-th observation.
• The estimates b0, b1, b2, …, bk are chosen to minimize
$\text{Sum of Squared Residuals (SSE)} = \sum_{i=1}^{n} e_i^2$
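
With several predictors, one standard way to find the SSE-minimizing estimates is to solve the normal equations (XᵀX) b = Xᵀy, where X is the data matrix with a leading column of ones. The pure-Python sketch below does this with Gaussian elimination. The data and function names are hypothetical, and the code assumes the predictors are not perfectly collinear (otherwise a pivot would be zero).

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_multiple_regression(rows, y):
    """rows[i] = [x1, ..., xk] for observation i; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefs])
```

Because the illustrative data contain no error term, the fitted coefficients recover the generating values up to floating-point error; with real data the residuals would be non-zero.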

Estimating Linear Regression Models


Activating Analysis ToolPak on Windows

• Go to the File ribbon and choose Options.
• Then, go to Add-ins and click Go.
• Finally, select Analysis ToolPak and click OK.


Activating Analysis ToolPak on Macs

• Go to the Data ribbon and click Analysis Tools

• Then, select Analysis ToolPak and click OK.


Simple Regression in Excel


• Under the Data Ribbon, go to Data Analysis tool:
‣ Then, select Regression.
• In the Regression dialog, specify:
‣ the range of the values of the predictor variables,
‣ the range of the values of the response variable.
‣ Check the labels box if the data range contains a descriptive label.


Growing Tomatoes
• Study the effects of Fertilizer usage on the Height of the tomato
plants:
‣ Data File: Growing Tomatoes.xlsx

Sample Regression Equation:
Predicted Height = 18.012 + 5.039 × Fertilizer

Growing Tomatoes: Interpretations

• There is a positive relationship between the amount of fertilizer used
and the height of a tomato plant.
• If the fertilizer use increases by 1 oz, we predict the height of a
tomato plant to increase by 5.039 inches.
• The predicted height of a tomato plant receiving 2 oz of fertilizer:
18.012 + 5.039 × 2 = 28.09 inches


AirBnB Rent
• Study the effects of the square footage (Sqft) on the monthly Rent of
a house on AirBnB:
‣ Data File: AirBnB Rent.xlsx

Sample Regression Equation:
Predicted Rent = −51.434 + 1.00871 × Sqft

AirBnB Rent: Interpretations

• There is a positive relationship between the square footage and the
rent of a house.
• If the square footage increases by 100 sq ft, we predict the rent of an
AirBnB house to increase by $100.9.
• The predicted rent of a 1000 sq ft house:
−51.434 + 1.00871 × 1000 = $957.28


AirBnB Rent- Model 2

• Study the effects of the Distance to Transit (DtoT) and the square footage
(Sqft) on the monthly Rent of a house on AirBnB:
‣ Data File: AirBnB Rent.xlsx

Sample Regression Equation:
Predicted Rent = 301.142 − 252.789 × DtoT + 1.013 × Sqft

AirBnB Rent- Model 2: Interpretations

• The rent of a house has a negative relationship with the distance to
transit and a positive relationship with the square footage.
‣ If the square footage increases by 100 sq ft, we predict the rent of
an AirBnB house to increase by $101.3, keeping Distance to Transit
constant.
‣ If the distance to transit increases by 1 mile, we predict the rent of an
AirBnB house to decrease by $252.8, keeping Sqft constant.


AirBnB Rent- Model 2: Interpretations

• The predicted rent of a 1000 sq ft house which is 0.5 miles away from
transit:
301.142 − 252.789 × 0.5 + 1.013 × 1000 = $1187.75
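
A prediction from a sample regression equation is the intercept plus each coefficient times its predictor value. The short Python sketch below reproduces the Model 2 calculation using the coefficients on the slide. The function name `predict` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def predict(intercept, coefficients, values):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one observation."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


# Coefficients of AirBnB Rent Model 2, as reported on the slide.
model_2 = {"DtoT": -252.789, "Sqft": 1.013}

# A 1000 sq ft house located 0.5 miles from transit.
rent = predict(301.142, model_2, {"DtoT": 0.5, "Sqft": 1000})
print(f"Predicted rent: ${rent:.2f}")
```

The result matches the slide's figure of roughly $1187.75. The same function works for the simple regression models by passing a single coefficient.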


Healthy Living

• Study the effects of Exercise on being Healthy:
‣ Data File: Healthy Living.xlsx

Sample Regression Equation:
Predicted Healthy = 61.323 + 0.471 × Exercise


Healthy Living: Interpretations

• There is a positive relationship between exercise and being
healthy.
• If the % of people doing regular exercise in a state increases by 1 %-point,
we predict the % of people being healthy in the state to increase
by 0.471 %-point.
• The predicted % of healthy people in a state where half of the people
do regular exercise:
61.323 + 0.471 × 50 = 84.87

Healthy Living- Alternative Models

• Study the effects of Exercise and Smoking on being Healthy:
Predicted Healthy = 73.959 + 0.337 × Exercise − 0.443 × Smoke
‣ There is a negative relationship between smoking and being healthy.
• Study the effects of Exercise and eating fruits and vegetables (FV)
on being Healthy:
Predicted Healthy = 61.051 + 0.439 × Exercise + 0.081 × FV
‣ There is a positive relationship between eating fruits and vegetables and being
healthy.

Model Evaluation and Selection


Goodness-of-Fit Measures

• Goodness-of-Fit Measures help us assess how well the sample
regression equation fits the data.
• Three goodness-of-fit measures:
‣ the standard error of the estimate (se),
‣ the coefficient of determination (R2), and
‣ the adjusted coefficient of determination (Adjusted R2).


The Standard Error of the Estimate

• The standard error of the estimate se is calculated as
$s_e = \sqrt{\dfrac{SSE}{n-k-1}}$
‣ where SSE is the sum of squared residuals, n is the sample size, and k is the number
of predictor variables.
‣ n−k−1 is referred to as the degrees of freedom of the residuals.
• The model with the smaller se is preferred.


The Standard Error of the Estimate: Excel Output

• The Excel regression output reports the standard error of the
estimate.
• Recall the Growing Tomatoes example, where SSE = 1477.0423 and n−k−1 = 98:
$s_e = \sqrt{\dfrac{1477.0423}{98}} = 3.882$
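
The calculation can be checked outside Excel. The Python sketch below plugs the Growing Tomatoes values from the slide into the formula. The sample size n = 100 is inferred from n−k−1 = 98 with k = 1 predictor, and the function name is a hypothetical label.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_e = sqrt(SSE / (n - k - 1)), with k predictor variables."""
    return math.sqrt(sse / (n - k - 1))


# Growing Tomatoes: SSE = 1477.0423, one predictor, 98 residual degrees of freedom.
se = standard_error_of_estimate(1477.0423, n=100, k=1)
print(f"se = {se:.3f}")
```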

se Comparison: AirBnB Rent

• Recall that we consider TWO different models while studying the
AirBnB Rent example:
‣ Model 1: Rent vs. Sqft
‣ Model 2: Rent vs. Distance to Transit & Sqft
• The se of these models:
‣ Model 1: 322.89
‣ Model 2: 288.69
• Model 2 fits the data better according to se.


se Comparison: Healthy Living

• Recall that we consider FOUR different models while studying the
Healthy Living example:
‣ Model 1: Healthy vs Exercise
‣ Model 2: Healthy vs Exercise & Smoke
‣ Model 3: Healthy vs Exercise & FV
‣ Model 4: Healthy vs Exercise & Smoke & FV
• Model 2 fits the data the best according to se.


The Coefficient of Determination (R2)

• R2 quantifies the proportion of the sample variation in the response variable y that is
explained by the sample regression equation:
$R^2 = 1 - \dfrac{SSE}{SST}$
‣ where SSE is the sum of squared residuals, and
‣ SST is the sum of squared mean deviations, $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

• The model with the larger R2 is preferred.


R2: Excel Output
• The Excel regression output reports the coefficient of determination (R2).
• Recall the Growing Tomatoes example, where SSE = 1477.0423 and SST = 9316.72:
$R^2 = 1 - \dfrac{1477.0423}{9316.72} = 0.8415$

R2 Comparison: AirBnB Rent

• Recall that we consider TWO different models while studying the
AirBnB Rent example:
‣ Model 1: Rent vs. Sqft
‣ Model 2: Rent vs. Distance to Transit & Sqft
• The R2 of these models:
‣ Model 1: 0.695
‣ Model 2: 0.759
• Model 2 fits the data better according to R2.


R2 Comparison: Healthy Living

• Recall that we consider FOUR different models while studying the
Healthy Living example:
‣ Model 1: Healthy vs Exercise
‣ Model 2: Healthy vs Exercise & Smoke
‣ Model 3: Healthy vs Exercise & FV
‣ Model 4: Healthy vs Exercise & Smoke & FV
• Model 4 fits the data the best according to R2.


Adjusted R2

• R2 tends to favor models with more predictor variables.
‣ CONCERN: R2 never decreases when more predictors are added, even irrelevant ones.
• Adjusted R2 addresses this problem by accounting for the number of
predictors k:
$\text{Adjusted } R^2 = 1 - (1 - R^2) \times \left(\dfrac{n-1}{n-k-1}\right)$

• The model with the larger Adjusted R2 is preferred.


Adjusted R2: Excel Output
• The Excel regression output reports the Adjusted R2.
• Recall the Growing Tomatoes example:
$\text{Adjusted } R^2 = 1 - (1 - 0.8415) \times \left(\dfrac{99}{98}\right) = 0.8398$
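
Both R2 and Adjusted R2 follow directly from SSE, SST, n, and k. The Python sketch below recomputes them for the Growing Tomatoes example, using the SSE and SST values reported earlier in the deck. The sample size n = 100 is inferred from the 99/98 ratio with k = 1, and the function names are hypothetical.

```python
def r_squared(sse, sst):
    """R2 = 1 - SSE / SST."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Growing Tomatoes: SSE = 1477.0423, SST = 9316.72, n = 100, k = 1.
r2 = r_squared(1477.0423, 9316.72)
adj_r2 = adjusted_r_squared(r2, n=100, k=1)
print(f"R2 = {r2:.4f}, Adjusted R2 = {adj_r2:.4f}")
```

Carrying the unrounded R2 into the second formula gives 0.8398, in line with the Excel output; starting from the rounded 0.8415 instead can shift the last digit.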


Adjusted R2 Comparison: AirBnB Rent

• Recall that we consider TWO different models while studying the
AirBnB Rent example:
‣ Model 1: Rent vs. Sqft
‣ Model 2: Rent vs. Distance to Transit & Sqft
• The Adjusted R2 of these models:
‣ Model 1: 0.691
‣ Model 2: 0.753
• Model 2 fits the data better according to Adjusted R2.


Adjusted R2 Comparison: Healthy Living

• Recall that we consider FOUR different models while studying the
Healthy Living example:
‣ Model 1: Healthy vs Exercise
‣ Model 2: Healthy vs Exercise & Smoke
‣ Model 3: Healthy vs Exercise & FV
‣ Model 4: Healthy vs Exercise & Smoke & FV
• Model 2 fits the data the best according to Adjusted R2.


Test of Joint Significance

• Test of Joint Significance aims at testing if at least one predictor has a
linear relationship with the response variable.
• Consider the following multiple linear regression model
y = ϐ0 + ϐ1 x1 + ϐ2 x2 + … + ϐk xk + ϵ
• The hypothesis test for the joint significance:
H0: ϐ1 = ϐ2 = … = ϐk = 0
Ha: At least one ϐj ≠ 0 for j ∈ {1, 2, …, k}
‣ Excel reports the p-value as part of the regression outcome.


Test of Individual Significance

• Test of Individual Significance aims at testing if a predictor has a
linear relationship with the response variable.
• Consider the following multiple linear regression model
y = ϐ0 + ϐ1 x1 + ϐ2 x2 + … + ϐk xk + ϵ
• The hypothesis test for the individual significance of predictor j:
H0: ϐj = 0
Ha: ϐj ≠ 0


Test of Individual Significance

• The hypothesis test for the individual significance of predictor j:
H0: ϐj = 0
Ha: ϐj ≠ 0
• Test statistic: $t = \dfrac{b_j}{\text{Std Err of } b_j}$
• p-value: 2 × (1 − T.DIST(|t|, n−k−1, 1))
‣ Reject NULL if p-value < α


Test of Significance: AirBnB Rent

• Joint Significance: p-value ≈ 0
• Test stat for the Intercept: $t = \dfrac{301.142}{127.372} = 2.364$
• p-value for the Intercept: 2 × (1 − T.DIST(|2.364|, 77, 1)) = 0.021
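
The two-sided p-value can also be approximated without Excel. The Python sketch below uses only the standard library: it evaluates the Student t density and integrates it numerically with Simpson's rule, which is an illustrative substitute for Excel's T.DIST rather than the same algorithm. The function names and the number of integration steps are assumptions made for this example.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate 2 * P(T > |t|); steps must be even for Simpson's rule."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area


# Intercept of AirBnB Rent Model 2: estimate and standard error from the slide.
t_stat = 301.142 / 127.372
p_value = two_sided_p_value(t_stat, df=77)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```

With α = 0.05, a p-value of about 0.02 leads to rejecting the null hypothesis that the intercept is zero.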

Linear Regression Assumptions and Common
Violations


Major Linear Regression Assumptions

• Correct Model Specification:
‣ The data supports the linear relationship between the response and the predictor
variables.
• No Multicollinearity:
‣ The predictor variables should not be highly correlated.
• No Changing Variability (Heteroskedasticity):
‣ The variability of the residuals does not depend on the values of the predictor
variables.


Correct Model Specification

• We need to verify the following:
‣ The scatter plot of the response and the predictor variables does not show any signs
of non-linearity.
‣ The residuals are randomly dispersed across the observations of an explanatory
variable.
• Remedy:
‣ Consider adding simple transformations of the response variable and/or the
predictor variables.


Correct Model: Growing Tomatoes


• The scatter plot of the Height and Fertilizer variables supports a linear
relationship.
• The residuals are randomly dispersed.


Incorrect Model: Energy Cost


• Study the effects of Temperature on Energy Cost:
‣ Data File: Energy Cost.xlsx
• Model: EnergyCost = ϐ0 + ϐ1 × Temp
- Scatter plot and residuals show signs of non-linearity (convexity)!!!
[Figure: scatter and residual plots, with regions labeled "Underestimated" and "Overestimated"]


Incorrect Model: Wage Age Education

• Study the effects of Age and years of Education on Wage:
‣ Data File: Wage Age Education.xlsx
• Model: Wage = ϐ0 + ϐ1 × Age + ϐ2 × Education
- Scatter plot and residuals show signs of non-linearity (concavity)!!!
[Figure: scatter and residual plots, with regions labeled "Underestimated" and "Overestimated"]


No Multicollinearity

• We need to verify that the predictor variables are not highly
correlated with each other.
‣ High correlation between predictor variables is assumed when the correlation
coefficient is more than 0.75 or less than -0.75.
‣ Correlation between a predictor and the response variable is not a problem.
‣ AirBnB Rent: The correlation between Sqft and Distance to Transit is
almost zero.
• Remedy: Remove one of the highly correlated predictor variables.
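
The ±0.75 rule of thumb can be applied mechanically to every pair of predictors. The Python sketch below computes pairwise correlation coefficients and flags pairs beyond the threshold. The predictor names and data values are hypothetical and serve only to illustrate the check.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(predictors, threshold=0.75):
    """Return (name_a, name_b, r) for every pair of predictors with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = correlation(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictor data, for illustration only.
predictors = {
    "x1": [40, 50, 60, 70, 80],
    "x2": [21, 24, 31, 34, 41],
    "x3": [65, 70, 62, 68, 66],
}

flagged = highly_correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

Only predictor-predictor pairs are checked; the response variable is deliberately left out, since its correlation with a predictor is not a problem.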


Median Home Value


• Study the effects of Household Income, Per-Capita Income, and
Owner Occ % on Home Value in a state:
‣ Data File: Median Home Value.xlsx
• Regression outcome using all predictors:


Correlation Matrix for Predictors

• How about the correlation between the predictor variables?
• Excel’s Analysis ToolPak includes a Correlation analysis tool that
creates a correlation matrix.
‣ Make sure the Analysis ToolPak add-in is active (see the Activating Analysis ToolPak slides).
• Under the Data Ribbon, go to Data Analysis tool. Then, select
Correlation.


Median Home Value: Correlation

• The correlation between the predictor variables:

                  HH Income    Per-cap Income
Per-cap Income      0.858
Owner Occ %        -0.339        -0.532

‣ Household Income and Per-Capita Income are highly correlated.
• Two alternative models to consider:
‣ Model 1: Home Value = ϐ0 + ϐ1 × (HH Income) + ϐ2 × (Owner Occ %)
‣ Model 2: Home Value = ϐ0 + ϐ1 × (Per-cap Income) + ϐ2 × (Owner Occ %)


Multicollinearity: Median Home Value

• Comparison of the alternative models:

                   Model 1       Model 2
Intercept          348187.14*    285604.08
HH Income          7.74*         NA
Per Cap Income     NA            13.21*
Owner Occ %        -8027.90*     -6454.08*
Adjusted R2        0.8069        0.6621
* represents significance at the 5% level. NA denotes that the variable is not included.

• Model 1 is the better alternative.


No Changing Variability (Heteroskedasticity)

• We need to verify that
‣ The variability of the residuals is not increasing or decreasing over the values of
predictor variables.
• Remedy:
‣ Calculate and use robust standard errors. Not possible in Excel!!!
‣ However, the coefficient estimates are still unbiased.
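
Outside Excel, robust standard errors are straightforward to compute. The Python sketch below compares the classical standard error of the slope in a simple regression with White's heteroskedasticity-consistent (HC0) version. HC0 is one common choice of robust estimator; whether it is the variant the slides have in mind is an assumption, and the data and function name are hypothetical.

```python
import math


def slope_standard_errors(x, y):
    """Return (b1, classical_se, robust_se) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Classical: assumes the same residual variance at every x.
    sse = sum(e * e for e in residuals)
    classical_se = math.sqrt(sse / (n - 2) / sxx)

    # HC0: weights each squared residual by its own squared x-deviation.
    robust_se = math.sqrt(
        sum(((xi - x_bar) * e) ** 2 for xi, e in zip(x, residuals))
    ) / sxx
    return b1, classical_se, robust_se


# Hypothetical data whose scatter widens as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.8, 6.3, 7.6, 10.5, 11.4, 14.7, 15.2]

b1, classical_se, robust_se = slope_standard_errors(x, y)
print(f"b1 = {b1:.3f}, classical se = {classical_se:.4f}, robust se = {robust_se:.4f}")
```

The slope estimate itself is the same in both cases, which reflects the point that the coefficient estimates remain unbiased; only the standard error, and therefore the t-test, changes.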


Changing Variability: Store Sales


• Study the effects of retail store size (Sqft) on Sales volume:
‣ Data File: Store Sales.xlsx
• Model: Sales = ϐ0 + ϐ1 × Sqft
- Residuals show signs of changing variability!!!
