2018, Study Session # 3, Reading # 10
“MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS”
Notation:
MSR = Mean Regression Sum of Squares
MSE = Mean Squared Error
RSS = Regression Sum of Squares
SSE = Sum of Squared Errors/Residuals
α = Level of Significance
Fc = Critical F (taken from F-Distribution Table)
H0 = Null Hypothesis
Ha = Alternative Hypothesis
X = Independent Variable
Y = Dependent Variable
F = F-Statistic (calculated)

1. INTRODUCTION

Multiple linear regression models are more sophisticated: they incorporate more than one independent variable.
2. MULTIPLE LINEAR REGRESSION

Allows determining the effects of more than one independent variable on a particular dependent variable.

Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i

b_1 tells the impact on Y of changing X1 by 1 unit, keeping the other independent variables the same.
Individual slope coefficients (e.g. b1) in multiple regression are known as partial regression/slope coefficients.
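As a sketch (not part of these notes), such a model can be estimated with any OLS routine; the snippet below uses Python's statsmodels on simulated data, with all variable names and coefficient values illustrative:

```python
# Minimal sketch: estimating Y = b0 + b1*X1 + b2*X2 + e on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))            # two independent variables X1, X2
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_const = sm.add_constant(X)           # adds the intercept column for b0
results = sm.OLS(y, X_const).fit()
print(results.params)                  # [b0_hat, b1_hat, b2_hat] -- the partial slope coefficients
```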
2.1 Assumption of the Multiple Linear Regression Model
Relationship b/w Y and X_1, X_2, ..., X_k is linear.
Independent variables are not random and no exact linear relationship exists
b/w 2 or more independent variables.
Expected value of error terms is 0.
Variance of error term is same for all observations.
Error term is uncorrelated across observations.
Error term is normally distributed.
2.2 Predicting the Dependent Variable in a Multiple Regression Model
Obtain estimates \hat{b}_0, \hat{b}_1, \hat{b}_2, ..., \hat{b}_k of the regression parameters b_0, b_1, b_2, ..., b_k.
Determine the assumed values of the independent variables \hat{X}_{1i}, \hat{X}_{2i}, ..., \hat{X}_{ki}.
Compute the predicted value of \hat{Y}_i using:

\hat{Y}_i = \hat{b}_0 + \hat{b}_1 \hat{X}_{1i} + \hat{b}_2 \hat{X}_{2i} + \dots + \hat{b}_k \hat{X}_{ki}
To predict dependent variable:
Be confident that assumptions of the regression are met.
Predictions regarding X must be within reliable range of data used to estimate the model.
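A minimal sketch of these prediction steps on simulated data (the assumed X values are illustrative and chosen inside the sample range):

```python
# Sketch: fit a multiple regression, then predict Y for assumed X values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))     # [const, X1, X2]
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=100)
results = sm.OLS(y, X).fit()

# Assumed values for X1 and X2, kept within the range of the estimation data
x_new = np.array([[1.0, 0.2, -0.1]])               # [const, X1, X2]
print(results.predict(x_new))                      # Y_hat = b0 + b1*0.2 + b2*(-0.1)
```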
2.3 Testing Whether All Population Regression Coefficients Equals Zero
H0 ⇒ all slope coefficients are simultaneously = 0, i.e. none of the X variables helps explain Y.
To test this, the F-test is used.
t-tests on individual coefficients cannot be used.
F = \frac{RSS/k}{SSE/(n-(k+1))} = \frac{MSR}{MSE}
where:
MSR = RSS/k
MSE = SSE/(n-(k+1))
n = no. of observations
k = no. of slope coefficients
Decision rule ⇒ reject H0 if F > Fc (for given α).
It is a one-tailed test.
df numerator = k
df denominator = n-(k+1)
For given k and n, the test statistic for H0 (all slope coefficients equal to 0) is F_{k, n-(k+1)}.
In the F-distribution table for F_{k, n-(k+1)}, k represents the column and n-(k+1) represents the row.
“Significance of F” in the ANOVA table represents the p-value.
↑ F-statistic ⇒ ↓ chances of Type I error.
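A sketch of the joint F-test on simulated data; note that statsmodels names the regression sum of squares `ess` and the sum of squared residuals `ssr`, the reverse of the RSS/SSE labels used in these notes:

```python
# Sketch: joint F-test of H0: b1 = ... = bk = 0.
import numpy as np
import statsmodels.api as sm
from scipy.stats import f

rng = np.random.default_rng(2)
n, k = 100, 2
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=n)
results = sm.OLS(y, X).fit()

# F = (RSS/k) / (SSE/(n-(k+1))) = MSR/MSE; statsmodels also reports it directly
F_manual = (results.ess / k) / (results.ssr / (n - (k + 1)))
F_crit = f.ppf(0.95, dfn=k, dfd=n - (k + 1))     # one-tailed critical F at alpha = 5%
print(F_manual, results.fvalue, results.f_pvalue)
print("reject H0" if results.fvalue > F_crit else "do not reject H0")
```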
2.4 Adjusted R²

R² ↑ with the addition of independent variables (X) in the regression.

\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)(1 - R^2)

When k ≥ 1 ⇒ R² > \bar{R}^2.
\bar{R}^2 can be –ve but R² is always +ve.
If \bar{R}^2 is used for comparing regression models:
Sample size must be the same.
Dependent variable must be defined in the same way.
A high \bar{R}^2 does not necessarily indicate the regression is well specified.
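A quick check of the adjusted R² formula against the value statsmodels reports, on simulated data:

```python
# Sketch: adjusted R^2 by hand vs. statsmodels' rsquared_adj.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 60, 3
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([1.0, 0.4, 0.0, 0.0]) + rng.normal(size=n)
results = sm.OLS(y, X).fit()

r2 = results.rsquared
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)   # formula above
print(r2, adj_r2, results.rsquared_adj)           # adj_r2 matches rsquared_adj
```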
3. USING DUMMY VARIABLES IN REGRESSION
Dummy variable ⇒ takes 1 if particular condition is
true & 0 when it is false.
Diligence is required in choosing no. of dummy
variables.
Usually n-1 dummy variables are used
where n= no. of categories.
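A minimal sketch using pandas, assuming an illustrative 4-category variable (calendar quarters), so n-1 = 3 dummies are created:

```python
# Sketch: building n-1 dummy variables for n categories with pandas.
import pandas as pd

quarters = pd.Series(["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"], name="quarter")
dummies = pd.get_dummies(quarters, drop_first=True)
print(dummies)   # columns Q2, Q3, Q4 only; Q1 is the omitted base category
```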
4. VIOLATIONS OF REGRESSION ASSUMPTIONS
4.1 Heteroskedasticity
Variance of errors differs across observations ⇒ heteroskedastic
Variance of errors is similar across observations ⇒ homoskedastic
Usually no systematic relationship exists b/w X & regression residuals.
If systematic relationship is present ⇒ heteroskedasticity can exist.
4.1.1 The Consequence of Heteroskedasticity
It can lead to mistakes in inference.
Does not affect consistency.
F-test becomes unreliable.
Due to biased estimators of standard errors, the t-test also becomes unreliable.
The most likely result of heteroskedasticity is that the:
estimated standard errors will be underestimated.
t-statistics will be inflated.
Ignoring heteroskedasticity leads to finding significant relationships that do not actually exist.
The problem becomes more serious when developing an investment strategy using regression analysis.
Unconditional heteroskedasticity ⇒ when the heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression.
Creates no major problems for statistical inference.
Conditional heteroskedasticity ⇒ when the heteroskedasticity of the error variance is correlated with the independent variables.
It causes the most problems.
Can be tested & corrected easily through many statistical software packages.
4.1.2 Testing for Heteroskedasticity
Breusch-Pagan test is widely used.
Regress the squared residuals of the regression on the independent variables.
If the independent variables explain much of the variation of the squared residuals ⇒ conditional heteroskedasticity exists.
H0 = no conditional heteroskedasticity exists.
Ha = conditional heteroskedasticity exists.
Breusch-Pagan test statistic = nR²
R²: from the regression of squared residuals on X
Critical value ⇒ from the χ² distribution.
df = no. of independent variables
Reject H0 if test statistic > critical value.
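A sketch of the Breusch-Pagan test via statsmodels, on data deliberately simulated so the error variance depends on the independent variable:

```python
# Sketch: Breusch-Pagan test on conditionally heteroskedastic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 200
X = sm.add_constant(rng.uniform(1, 5, size=(n, 1)))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * X[:, 1]  # error variance grows with X1
results = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, _, _ = het_breuschpagan(results.resid, X)  # LM statistic ~ n * R^2
print(lm_stat, lm_pvalue)   # small p-value -> reject H0 of no conditional heteroskedasticity
```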
4.1.3 Correcting for Heteroskedasticity
Robust Standard Errors:
Corrects the standard errors of estimated coefficients.
Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
Generalized Least Squares:
Modifies the original equation.
Requires economic expertise to implement correctly on financial data.
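A sketch of White-corrected standard errors; statsmodels exposes them through the `cov_type` argument ('HC0' is White's original estimator, and the simulated data are illustrative):

```python
# Sketch: same coefficients, heteroskedasticity-consistent standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = sm.add_constant(rng.uniform(1, 5, size=(200, 1)))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=200) * X[:, 1]

plain = sm.OLS(y, X).fit()
robust = sm.OLS(y, X).fit(cov_type="HC0")   # White-corrected standard errors
print(plain.bse, robust.bse)                # coefficients identical, std. errors differ
```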
4.2 Serial Correlation
Regression errors correlated across observations.
Usually arises in time-series regression.
4.2.1 The Consequences of Serial Correlation
Incorrect estimates of regression coefficient standard errors.
Parameter estimates become inconsistent & invalid when a lagged value of Y is included among the independent variables under serial correlation.
Positive serial correlation ⇒ positive (negative) errors ↑ the chance of subsequent positive (negative) errors.
Negative serial correlation ⇒ positive (negative) errors ↑ the chance of subsequent negative (positive) errors.
It leads to wrong inferences.
If positive serial correlation:
Standard errors underestimated.
t-statistics & F-statistics inflated.
↑ Type-I error.
If negative serial correlation:
Standard errors overestimated.
t-statistics & F-statistics understated.
↑ Type-II error.
4.2.2 Testing for Serial Correlation
Variety of tests, most common → Durbin-Watson test
DW = \frac{\sum_{t=2}^{T}(\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{T}\hat{\varepsilon}_t^2}

where \hat{\varepsilon}_t = regression residual for period t.
For large sample sizes, the Durbin-Watson statistic is approximately DW ≈ 2(1-r),
where r = sample correlation b/w the regression residuals of periods t and t-1.
Values of DW can range from 0 to 4.
DW = 2 ⇒ r=0 ⇒ no serial correlation.
DW = 0 ⇒ r=1 ⇒ perfectly positively serially correlated.
DW = 4 ⇒ r = -1 ⇒ perfectly negatively serially correlated.
For positive serial correlation:
H0 ⇒ no positive serial correlation.
Ha ⇒ positive serial correlation.
DW < d_l ⇒ reject H0.
DW > d_u ⇒ do not reject H0.
d_l ≤ DW ≤ d_u ⇒ inconclusive.
For negative serial correlation:
H0 ⇒ no negative serial correlation.
Ha ⇒ negative serial correlation.
DW > 4 - d_l ⇒ reject H0.
DW < 4 - d_u ⇒ do not reject H0.
4 - d_u ≤ DW ≤ 4 - d_l ⇒ inconclusive.
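A sketch of the Durbin-Watson statistic on residuals from a regression simulated with AR(1) errors, so positive serial correlation is built in:

```python
# Sketch: Durbin-Watson statistic on serially correlated residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                       # e_t = 0.7 * e_(t-1) + noise
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

results = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(results.resid))         # well below 2, consistent with DW ~ 2(1-r), r > 0
```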
4.2.3 Correcting for Serial Correlation
Adjust the coefficient standard errors:
Recommended method.
Hansen’s method ⇒ the most prevalent one.
Modify the regression equation:
Extreme care is required.
May lead to inconsistent parameter estimates.
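statsmodels does not expose an estimator under the name “Hansen’s method”; the sketch below uses the closely related Newey-West HAC (heteroskedasticity-and-autocorrelation-consistent) standard errors instead, with an illustrative lag choice:

```python
# Sketch: serial-correlation-consistent (HAC / Newey-West) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                       # AR(1) errors as before
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

hac = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HAC", cov_kwds={"maxlags": 5})
print(hac.bse)   # coefficients unchanged; standard errors adjusted for serial correlation
```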
4.3 Multicollinearity
Occurs when two or more independent variables (X) are highly
correlated with each other.
Regression can be estimated but the results become problematic.
Serious practical concern due to commonly found approximate linear
relation among financial variables.
4.3.1 The Consequences of Multicollinearity
Difficulty in detecting significant relationships.
Estimates become extremely imprecise & unreliable though consistency is unaffected.
F-statistic is unaffected.
Standard errors of the regression coefficients can ↑, causing:
insignificant t-tests.
wide confidence intervals.
↑ Type II error.
4.3.2 Detecting Multicollinearity
Multicollinearity is a matter of degree rather than presence/absence.
Pairwise correlation does not necessarily indicate the presence of multicollinearity.
Pairwise correlation does not necessarily indicate the absence of multicollinearity.
With only 2 independent variables ⇒ pairwise correlation is a useful indicator.
Significant R², significant F-statistic, but insignificant t-statistics on the slope coefficients ⇒ the classic symptom of multicollinearity (see the sketch below).
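A sketch reproducing the classic symptom on simulated data in which X2 is nearly a copy of X1:

```python
# Sketch: high R^2 / significant F, yet insignificant individual t-tests.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)    # almost perfectly correlated with x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

results = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(np.corrcoef(x1, x2)[0, 1])            # pairwise correlation near 1
print(results.rsquared, results.f_pvalue)   # high R^2, highly significant F
print(results.pvalues[1:])                  # individual slope t-tests insignificant
```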
4.3.3 Correcting Multicollinearity
Exclude one or more regression variables.
In many cases, experimentation is needed to determine the variable(s) causing multicollinearity.
5. MODEL SPECIFICATION AND ERRORS IN SPECIFICATION
Model specification ⇒ the set of variables included in the regression.
Incorrect specification leads to biased & inconsistent parameter estimates.
5.1 Principles of Model Specification
Model grounded on economic reasoning.
Functional form of variables compatible with nature of variables
Parsimonious ⇒ each included variable should play an essential role
Model is examined for the violation of regression assumptions.
Model is tested for validity & usefulness on out-of-sample data.
5.2 Misspecified Functional Form
One or more variables are omitted. If an omitted variable is correlated with the remaining variables, the error term will also be correlated with them, and the:
regression coefficient estimates can be biased & inconsistent.
estimated standard errors of the coefficients will be inconsistent.
One or more variables may require transformation.
Pooling of data from different samples that should not be pooled.
Can lead to spurious results.
5.3 Time-Series Misspecification (Independent Variables Correlated with Errors)
Including lagged dependent variables as independent variables when the errors are serially correlated.
Including a function of the dependent variable as an
independent variable.
Independent variables measured with error
5.4 Other Types of Time-Series Misspecification
Nonstationarity: variable properties, e.g. mean, are
not constant through time.
In practice nonstationarity is a serious problem.
6. MODELS WITH QUALITATIVE DEPENDENT VARIABLES
Qualitative dependent variables ⇒ dummy variables used as dependent instead of
independent.
Probit model ⇒ based on the normal distribution; estimates the probability:
of a discrete outcome, given the values of the independent variables used to explain that outcome.
that Y = 1, implying a condition is met.
Logit model:
Identical to the probit model except that it is based on the logistic distribution.
Both Logit and Probit models must be estimated using maximum likelihood methods.
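A sketch of both models in statsmodels on a simulated binary outcome; both `Probit` and `Logit` are fit by maximum likelihood:

```python
# Sketch: probit and logit on a simulated 0/1 outcome (Y = 1 when a condition is met).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
latent = X @ np.array([0.2, 1.0, -0.8]) + rng.normal(size=n)
y = (latent > 0).astype(float)              # discrete 0/1 dependent variable

probit = sm.Probit(y, X).fit(disp=0)        # normal-distribution link
logit = sm.Logit(y, X).fit(disp=0)          # logistic-distribution link
print(probit.predict(X[:3]))                # estimated P(Y = 1 | X)
print(logit.params)
```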
Discriminant analysis ⇒ can be used to create an overall score that is used for classification.
Qualitative dependent variable models can be used for portfolio management and business
management.
Copyright © FinQuiz.com. All rights reserved.