Regression Analysis
Quant Methods 2
Introduction to Linear Regression
Linear Regression
• Linear Regression determines the "best-fitting" line that explains the
relationship between a response variable and one or more predictor
variables.
‣ The results of a linear regression analysis include an equation that relates the
response variable to the predictor variables.
• A regression model also allows us to make predictions regarding the
response variable based on the known values of the predictor
variables.
• Fitted/Predicted values: $\hat{y}_i = b_0 + b_1 x_i$ for the i-th observation.
• Residuals: $e_i = y_i - \hat{y}_i$ for the i-th observation.
• The estimates $b_0$ and $b_1$ are chosen to minimize
Sum of Squared Residuals (SSE) = $\sum_{i=1}^{n} e_i^2$
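The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the closed-form expressions for the slope and intercept are the standard textbook ones (they are not shown on the slide), and the data values and function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Standard closed-form least-squares solution for one predictor.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
best_sse = sse(x, y, b0, b1)
```

Any other line, for example one with the intercept shifted by 0.1, gives a larger SSE than `best_sse` on the same data. On Python 3.10 or later, `statistics.linear_regression(x, y)` should return the same slope and intercept.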
• The linear relationship between the sales price of a house (Price in
$1000) and its square footage (Sqft):
Predicted Price = 150 + 0.2 × Sqft
‣ There is a positive relationship between the size of a house and its price.
‣ If the square footage increases by 1 sq ft, we predict the price of a house to increase
by $200.
• The predicted sales price of a 2000 sq ft house:
150 + 0.2 × 2000 = 550 (that is, $550,000)
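The prediction and the slope interpretation above can be checked with a short sketch. The coefficients 150 and 0.2 come from the slide; the function name `predict_price` is hypothetical.

```python
def predict_price(sqft, intercept=150.0, slope=0.2):
    """Predicted sales price in $1000s for a house of the given square footage."""
    return intercept + slope * sqft


# Predicted price of a 2000 sq ft house, in $1000s.
price_2000 = predict_price(2000)

# Effect of one additional square foot, converted from $1000s to dollars.
increase_dollars = (predict_price(2001) - predict_price(2000)) * 1000
```

Because Price is measured in $1000s, the slope of 0.2 corresponds to $200 per additional square foot.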
• The Multiple Linear Regression model uses multiple predictor
variables, denoted by $x_1, x_2, \dots, x_k$, to explain the variation in the
response variable, denoted by $y$:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$
‣ $\epsilon$ is the random error term,
‣ coefficients $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the unknown parameters to be estimated.
• The sign of the slope parameter $\beta_j$ determines whether the linear relationship
between $x_j$ and $y$ is positive ($\beta_j > 0$) or negative ($\beta_j < 0$).
• Fitted/Predicted values: $\hat{y}_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i} + \dots + b_k x_{k,i}$ for the i-th
observation.
• Residuals: $e_i = y_i - \hat{y}_i$ for the i-th observation.
• The estimates $b_0, b_1, b_2, \dots, b_k$ are chosen to minimize
Sum of Squared Residuals (SSE) = $\sum_{i=1}^{n} e_i^2$
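As a rough sketch of how the estimates could be computed with several predictors, the code below solves the least-squares normal equations by Gauss-Jordan elimination using only the standard library. The normal-equations approach is a standard technique that the slides do not show, and the data and the name `fit_ols` are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares estimates [b0, b1, ..., bk] via the normal equations.

    X is a list of rows, one per observation, each holding k predictor values.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system (A'A | A'y), where A is the matrix of rows.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(M[idx][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [-3, 2, -5, 0, -4]

b = fit_ols(X, y)
fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse_value = sum(e ** 2 for e in residuals)
```

If one predictor is an exact linear combination of the others, the system is singular and this sketch fails with a division by zero.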
Estimating Linear Regression Models
Growing Tomatoes
• Study the effects of Fertilizer usage on the Height of the tomato
plants:
‣ Data File: Growing Tomatoes.xlsx
• There is a positive relationship between the amount of fertilizer used
and the height of a tomato plant.
• If the fertilizer use increases by 1 oz, we predict the height of a
tomato plant to increase by 5.039 inches.
• The predicted height of a tomato plant receiving 2 oz of fertilizer:
18.012 + 5.039 × 2 = 28.09 inches
AirBnB Rent
• Study the effects of the square footage (Sqft) on the monthly Rent of
a house on AirBnB:
‣ Data File: AirBnB Rent.xlsx
• There is a positive relationship between the square footage and the
rent of a house.
• If the square footage increases by 100 sq ft, we predict the rent of an
AirBnB house to increase by $100.9.
• The predicted rent of a 1000 sq ft house:
−51.434 + 1.00871 × 1000 = $957.28
• The rent of a house has a negative relationship with the distance to
transit and a positive relationship with the square footage.
‣ If the square footage increases by 100 sq ft, we predict the rent of
an AirBnB house to increase by $101.3, keeping Distance to Transit
constant.
‣ If the distance to transit increases by 1 mile, we predict the rent of an
AirBnB house to decrease by $252.8, keeping Sqft constant.
• The predicted rent of a 1000 sq ft house which is 0.5 miles away from
transit:
301.142 − 252.789 × 0.5 + 1.013 × 1000 = $1187.75
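The two-predictor prediction can be reproduced with a small sketch. The coefficients are the ones quoted on the slide; the function and argument names are hypothetical.

```python
def predict_rent(sqft, distance_miles,
                 b0=301.142, b_distance=-252.789, b_sqft=1.013):
    """Predicted monthly rent in dollars from the two-predictor model."""
    return b0 + b_distance * distance_miles + b_sqft * sqft


# 1000 sq ft house located 0.5 miles from transit.
rent = predict_rent(1000, 0.5)

# Partial effects: change one predictor while holding the other constant.
effect_100_sqft = predict_rent(1100, 0.5) - predict_rent(1000, 0.5)
effect_1_mile = predict_rent(1000, 1.5) - predict_rent(1000, 0.5)
```

The two differences mirror the "keeping the other predictor constant" interpretation: about +$101.3 per 100 sq ft and about −$252.8 per additional mile.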
Healthy Living
Model Evaluation and Selection
Goodness-of-Fit Measures
SSE = 1477.0423
n−k−1 = 98
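These two quantities are the usual inputs to the standard error of the estimate. The slide's own formula did not survive extraction, so the sketch below assumes the standard definition $s_e = \sqrt{SSE / (n - k - 1)}$.

```python
import math

sse = 1477.0423  # sum of squared residuals, from the slide
df = 98          # degrees of freedom, n - k - 1, from the slide

mse = sse / df                   # mean squared error
standard_error = math.sqrt(mse)  # standard error of the estimate (assumed formula)
```

Under that assumption the standard error comes out to roughly 3.88, in the units of the response variable.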
R²: Excel Output
• The Excel regression output reports R² (labeled "R Square").
• Recall the Growing Tomatoes example:
SSE = 1477.0423
SST = 9316.72
$R^2 = 1 - \frac{1477.0423}{9316.72} = 0.8415$
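The R² arithmetic can be verified directly from the two sums of squares quoted above. The variable names are hypothetical.

```python
sse = 1477.0423  # sum of squared residuals
sst = 9316.72    # total sum of squares

# Share of the variation in the response explained by the model.
r_squared = 1 - sse / sst
r_squared_rounded = round(r_squared, 4)
```

Rounded to four decimals this matches the 0.8415 on the slide, meaning about 84% of the variation in plant height is explained by fertilizer use.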
Adjusted R²
$\text{Adjusted } R^2 = 1 - (1 - R^2) \times \frac{n-1}{n-k-1}$
Adjusted R²: Excel Output
• The Excel regression output reports the adjusted R² (labeled "Adjusted R Square").
• Recall the Growing Tomatoes example:
$\text{Adjusted } R^2 = 1 - (1 - 0.8415) \times \frac{99}{98} = 0.8398$
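A short sketch of the adjusted R² calculation for the Growing Tomatoes example. The values n = 100 and k = 1 are inferred from n − 1 = 99 and n − k − 1 = 98, and the function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Use the unrounded R^2 computed from SSE and SST.
r2 = 1 - 1477.0423 / 9316.72
adj_r2 = adjusted_r_squared(r2, n=100, k=1)
adj_r2_rounded = round(adj_r2, 4)
```

The fourth decimal is sensitive to rounding: starting from the unrounded R² gives 0.8398, while plugging in the already rounded 0.8415 gives 0.8399.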
• The Test of Joint Significance tests whether at least one predictor has a
linear relationship with the response variable.
• Consider the following multiple linear regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$
• The hypothesis test for joint significance:
$H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_a$: At least one $\beta_j \neq 0$ for $j \in \{1, 2, \dots, k\}$
‣ Excel reports the p-value as part of the regression output.
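The test statistic behind this p-value is the F statistic. The slides do not show its formula, so the sketch below assumes the standard definition, F = (SSR / k) / (SSE / (n − k − 1)) with SSR = SST − SSE. It reuses the Growing Tomatoes sums of squares quoted elsewhere in these slides, with k = 1 and n = 100 inferred from the degrees of freedom.

```python
def f_statistic(sst, sse, n, k):
    """F statistic for the joint significance of k predictors (assumed formula)."""
    ssr = sst - sse            # variation explained by the regression
    msr = ssr / k              # mean square due to regression
    mse = sse / (n - k - 1)    # mean squared error
    return msr / mse


f_value = f_statistic(sst=9316.72, sse=1477.0423, n=100, k=1)
```

Turning `f_value` into a p-value needs the F distribution with (k, n − k − 1) degrees of freedom, which the Python standard library does not provide. A value this large (about 520) is consistent with rejecting $H_0$ at any conventional level.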
Linear Regression Assumptions and Common Violations
No Multicollinearity
                  Model 1        Model 2
Intercept         348187.14*     285604.08
HH Income         7.74*          NA
Per Cap Income    NA             13.21*
Owner Occ %       -8027.90*      -6454.08*
Adjusted R²       0.8069         0.6621
* represents significance at the 5% level. NA denotes that the variable is not included.
Model 1 is the better alternative: it has the higher adjusted R² (0.8069 vs. 0.6621).