MFIN 305: Quantitative Methods of Finance
Classical Linear Regression Model Assumptions and Diagnostic Tests
1. Diagnostic Tests
• Recall that a multiple (or simple) linear regression, y_t = β1 + β2 x_2t + ⋯ + βk x_kt + u_t, with the following four assumptions is referred to as the Classical Linear Regression Model (CLRM):
(1) E(u_t) = 0
(2) var(u_t) = σ² < ∞
(3) cov(u_i, u_j) = 0 for i ≠ j
(4) cov(u_t, x_t) = 0
2. Violations of the Assumptions of the CLRM
• How to detect violations?
• What are the most common violations?
• What are the consequences?
• What are the solutions?
• These questions will be addressed.
3. Statistical Distributions for Diagnostic Tests
Diagnostic or misspecification tests come in two forms:
• Lagrange Multiplier (LM) test ~ χ²(m), where m is the number of restrictions
• Wald test ~ F(m, T − k), i.e. with (m, T − k) degrees of freedom
• Asymptotically, the two tests are equivalent: χ²(m)/m → F(m, T − k) as T → ∞
• For small samples, the F-version is preferable
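As a quick numerical check of this convergence, the sketch below compares 5% critical values of χ²(m)/m and F(m, T − k) as T grows; it assumes Python with scipy is available, and the choices m = 5 and k = 3 are purely illustrative.

```python
# A quick numerical check (illustrative, assumed values of m and k) of the
# asymptotic equivalence chi2(m)/m -> F(m, T - k), using 5% critical values.
from scipy.stats import chi2, f

m, k = 5, 3  # hypothetical number of restrictions and regressors
for T in (20, 50, 200, 5000):
    chi2_crit = chi2.ppf(0.95, m) / m      # chi-square critical value scaled by m
    f_crit = f.ppf(0.95, m, T - k)         # F critical value with (m, T - k) df
    print(f"T={T:5d}  chi2(m)/m={chi2_crit:.3f}  F(m, T-k)={f_crit:.3f}")
```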
4. Assumption (1): E(u_t) = 0
• If a constant is included in the regression equation, this assumption will never be violated
• If an intercept is not included in the regression, the average of the residuals will not be zero and:
(1) R² = ESS/TSS can be negative
(2) The slope coefficient estimates can be biased
• No intercept implies that the regression line goes through the origin ⇒ R² and R̄² will be meaningless
[Figure: a regression line forced through the origin in a plot of y_t against x_t]
5. Assumption (2): var(u_t) = σ² < ∞
• We have so far assumed that the variance of the errors is constant
• That is, σ² is constant (i.e., it does not vary with time or with the regressors)
• This is known as homoskedasticity
• If the errors do not have a constant variance, we say that they are heteroskedastic
[Figure: residual variance var(û_t) plotted against x_2t, illustrating heteroskedastic errors]
6. Detection of Heteroscedasticity: Goldfeld-Quandt test
1) Split the total sample of length T into two sub-samples of lengths T1 and T2, and compute the residual variance of each:
s1² = û1′û1 / (T1 − k) and s2² = û2′û2 / (T2 − k)
2) The null hypothesis is that the variances of the disturbances are equal:
H0: σ1² = σ2² (homoskedasticity)
H1: σ1² ≠ σ2² (heteroskedasticity)
3) The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator:
GQ = s1²/s2² ~ F(T1 − k, T2 − k)
4) Decision rule: if GQ > critical value ⇒ Reject H0
Note: The higher of the two standard errors of the regression goes in the numerator
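A minimal sketch of how the GQ statistic could be computed, assuming Python with numpy, scipy and statsmodels; the simulated data, the ordering variable and the even split are illustrative assumptions, not part of the lecture.

```python
# Minimal sketch (simulated data, illustrative only) of the Goldfeld-Quandt test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import f

rng = np.random.default_rng(0)
T, k = 200, 2
x2 = rng.normal(size=T)
u = rng.normal(scale=1 + 0.8 * np.abs(x2), size=T)   # error variance rises with |x2|
y = 1.0 + 0.5 * x2 + u

# Order by the suspected variance-driving regressor, then split the sample in two
order = np.argsort(x2)
X = sm.add_constant(x2)[order]
y_ord = y[order]
T1 = T // 2                                          # even split, so both halves have the same df

rss1 = sm.OLS(y_ord[:T1], X[:T1]).fit().ssr
rss2 = sm.OLS(y_ord[T1:], X[T1:]).fit().ssr
s1, s2 = rss1 / (T1 - k), rss2 / (T - T1 - k)

gq = max(s1, s2) / min(s1, s2)                       # larger variance in the numerator
crit = f.ppf(0.95, T1 - k, T - T1 - k)               # 5% critical value
print(f"GQ = {gq:.2f}, 5% critical value = {crit:.2f}, reject H0: {gq > crit}")
```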
7. Detection of Heteroscedasticity: White’s test
1) Assume that the regression model is
y_t = β1 + β2 x_2t + β3 x_3t + u_t
Obtain the residuals û_t
2) Run the auxiliary regression
û_t² = α1 + α2 x_2t + α3 x_3t + α4 x_2t² + α5 x_3t² + α6 x_2t x_3t + v_t
• The goal is to investigate whether the variance of the residuals varies systematically with any of the variables
• v_t is an error term
Null and alternative hypotheses:
H0: α2 = 0 and α3 = 0 and α4 = 0 and α5 = 0 and α6 = 0 (no heteroskedasticity)
H1: α2 ≠ 0 or α3 ≠ 0 or α4 ≠ 0 or α5 ≠ 0 or α6 ≠ 0 (heteroskedasticity)
3) Lagrange Multiplier (LM) test statistic:
TR² ~ χ²(m)
where m is the number of restrictions
4) Decision rule: if test statistic > critical value ⇒ Reject H0
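A minimal sketch of running White’s test with statsmodels’ het_white helper, which carries out the auxiliary regression above and reports the LM statistic TR²; the simulated data and variable names are illustrative assumptions.

```python
# Minimal sketch (simulated data, illustrative only) of White's test via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(1)
T = 120
x2, x3 = rng.normal(size=T), rng.normal(size=T)
u = rng.normal(scale=np.sqrt(0.5 + x2**2), size=T)   # heteroskedastic errors
y = 1.0 + 0.3 * x2 - 0.2 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"LM = T*R^2 = {lm_stat:.2f}, p-value = {lm_pval:.4f}")   # reject H0 if p < 0.05
```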
Example
Assume that the model y_t = β1 + β2 x_2t + β3 x_3t + u_t has been estimated using 120 observations, and the R² from the auxiliary regression is 0.234
• Auxiliary regression:
û_t² = α1 + α2 x_2t + α3 x_3t + α4 x_2t² + α5 x_3t² + α6 x_2t x_3t + v_t
• Null and alternative hypotheses:
H0: α2 = 0, α3 = 0, α4 = 0, α5 = 0, α6 = 0 (no heteroskedasticity)
H1: α2 ≠ 0 or α3 ≠ 0 or α4 ≠ 0 or α5 ≠ 0 or α6 ≠ 0 (heteroskedasticity)
• Test statistic: TR² = 120 × 0.234 = 28.08 ~ χ²(5)
• The 5% critical value from the χ² table is 11.07
• Since the test statistic is larger than the critical value ⇒ Reject H0 (there is heteroscedasticity: the variance of the errors is not constant)
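A quick check of the example’s numbers, assuming scipy is available:

```python
# Verify the LM statistic and the 5% chi-square critical value with 5 df.
from scipy.stats import chi2

lm = 120 * 0.234           # T * R^2 from the auxiliary regression
crit = chi2.ppf(0.95, 5)   # 5% critical value, 5 restrictions
print(f"LM = {lm:.2f}, critical value = {crit:.2f}, reject H0: {lm > crit}")
```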
8. Consequences of Using OLS in the Presence of Heteroscedasticity
• The OLS estimator will still be consistent and unbiased, but it is not efficient (i.e., it does not have the minimum variance among the class of unbiased estimators)
• OLS is no longer BLUE
⇒ the standard errors are no longer correct
⇒ inferences could be misleading
• OLS standard errors will be too large for the intercept when the errors are heteroscedastic
• The OLS slope standard errors will be too big when the variance of the
errors is inversely related to the explanatory variable
• If the variance of the errors is positively related to the square of an
explanatory variable, the OLS standard error for the slope will be too
low
9. How Do We Deal with Heteroscedasticity?
• If the form (i.e. the cause) of the heteroskedasticity is known, then we can
use an estimation method which takes this into account (called generalized
least squares, GLS)
• A simple illustration of GLS is as follows: suppose that the error variance is
related to another variable 𝑧𝑡 by
v𝑎𝑟 𝑢𝑡 = 𝜎 2 𝑧𝑡2
• To remove the heteroskedasticity, divide the regression equation by z_t:
y_t/z_t = β1(1/z_t) + β2(x_2t/z_t) + β3(x_3t/z_t) + v_t
where v_t = u_t/z_t is now a homoskedastic error term
• To see this, note that var(v_t) = var(u_t/z_t) = var(u_t)/z_t² = σ²z_t²/z_t² = σ²
• So, the disturbances from the new regression will be homoskedastic
• Other solutions include:
1. Transforming the variables into logs or reducing them by some other measure of “size”
2. Using White’s heteroscedasticity-consistent standard error estimates. The effect is that the standard errors for the slope coefficients are typically increased relative to the usual OLS standard errors. This makes hypothesis testing more conservative, so that we would need more evidence against the null hypothesis before we would reject it
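A minimal sketch contrasting the two remedies above, assuming statsmodels is available: weighted least squares when var(u_t) = σ²z_t² is taken as known, and White-type (HC1) robust standard errors when it is not. The simulated data and the HC1 variant are illustrative assumptions.

```python
# Minimal sketch (simulated data, illustrative only) of GLS/WLS and robust standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 300
x2 = rng.normal(size=T)
z = np.exp(rng.normal(size=T))                 # hypothetical variance-driving variable
u = rng.normal(scale=z, size=T)                # var(u_t) = sigma^2 * z_t^2 with sigma = 1
y = 1.0 + 0.5 * x2 + u
X = sm.add_constant(x2)

ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / z**2).fit()   # weights proportional to 1/var(u_t)
robust = sm.OLS(y, X).fit(cov_type="HC1")      # White-type robust standard errors

print("OLS s.e.:   ", ols.bse)
print("WLS s.e.:   ", wls.bse)
print("Robust s.e.:", robust.bse)
```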
10. Assumption (3): cov(u_i, u_j) = 0 for i ≠ j
• The covariance between the error terms over time (or cross-sectionally) is zero
• If the errors are correlated with one another, we say that they are “autocorrelated” or “serially correlated”
11. The concept of the Lagged Value
• The lagged value of a variable is the value that the variable took
during a previous period
• ∆𝑦𝑡 : Difference between the values in this period and the previous
period
∆𝑦𝑡 = 𝑦𝑡 − 𝑦𝑡−1
t         y_t    y_{t−1}   Δy_t
2006M09    0.8      –        –
2006M10    1.3     0.8      1.3 − 0.8 = 0.5
2006M11   −0.9     1.3      −0.9 − 1.3 = −2.2
2006M12    0.2    −0.9      0.2 − (−0.9) = 1.1
2007M01   −1.7     0.2      −1.7 − 0.2 = −1.9
2007M02    2.3    −1.7      2.3 − (−1.7) = 4.0
2007M03    0.1     2.3      0.1 − 2.3 = −2.2
2007M04    0.0     0.1      0.0 − 0.1 = −0.1
…          …       …         …
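A small pandas illustration of constructing the lag y_{t−1} and the difference Δy_t; the series reproduces the (hypothetical) values in the table above.

```python
# Lagging and differencing a monthly series in pandas (values from the table above).
import pandas as pd

y = pd.Series([0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0],
              index=pd.period_range("2006-09", periods=8, freq="M"), name="y")
df = pd.DataFrame({"y": y, "y_lag1": y.shift(1), "dy": y.diff()})
print(df)   # dy reproduces the table: 0.5, -2.2, 1.1, -1.9, 4.0, -2.2, -0.1
```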
12. Testing for Autocorrelation
• We assumed that the CLRM’s errors satisfy cov(u_i, u_j) = 0 for i ≠ j, which is essentially the same as saying that there is no pattern in the errors
• Obviously, we never observe the actual u_t’s, so we use their sample counterpart, the residuals û_t
• If there are patterns in the residuals from a model, we say that they are autocorrelated
[Figure: plots of û_t against û_{t−1} and of û_t over time. Positive autocorrelation is indicated by a cyclical residual plot over time.]
[Figure: plots of û_t against û_{t−1} and of û_t over time. Negative autocorrelation is indicated by an alternating pattern, where the residuals cross the time axis more frequently than if they were distributed randomly.]
[Figure: plots of û_t against û_{t−1} and of û_t over time showing no pattern in the residuals at all: this is what we would like to see.]
13. Detecting Autocorrelation: The Durbin-Watson Test
• The Durbin-Watson (DW) test is a test for first order autocorrelation
• It assumes that the relationship is between an error and the previous one:
u_t = ρ u_{t−1} + v_t, where v_t ~ N(0, σ_v²)
• The DW test has as its null and alternative hypotheses:
H0: ρ = 0 (no autocorrelation)
H1: ρ ≠ 0 (first order autocorrelation)
• However, it is not necessary to run this regression in order to compute the test statistic
• The test statistic is calculated as
DW = Σ_{t=2}^{T} (û_t − û_{t−1})² / Σ_{t=2}^{T} û_t²
• Based on the residuals, ρ is estimated from û_t = ρ̂ û_{t−1} + v̂_t
• It is also possible to express DW as an approximate function of the estimated value ρ̂:
DW ≈ 2(1 − ρ̂)
• Since ρ̂ is a correlation, it implies that −1 ≤ ρ̂ ≤ 1
• ρ̂ = 0 ⇒ DW = 2 (no autocorrelation in the residuals)
• ρ̂ = 1 ⇒ DW = 0 (perfect positive autocorrelation in the residuals)
• ρ̂ = −1 ⇒ DW = 4 (perfect negative autocorrelation in the residuals)
• DW has 2 critical values, an upper critical value (d_u) and a lower critical value (d_L), and there is also an intermediate region where the test is inconclusive (we can neither reject nor fail to reject H0)

• If DW < d_L ⇒ Reject H0 (positive autocorrelation)
• If DW > 4 − d_L ⇒ Reject H0 (negative autocorrelation)
• If d_u < DW < 4 − d_u ⇒ Do not reject H0 (no significant residual autocorrelation)
• If d_L ≤ DW ≤ d_u or 4 − d_u ≤ DW ≤ 4 − d_L ⇒ the test is inconclusive
Conditions which must be fulfilled for DW to be a valid test:
1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of the dependent variable in the regression
Example:
• A researcher wishes to test for first order serial autocorrelation in the
residuals from a linear regression
• The DW statistic value is 0.86. There are 80 quarterly observations in
the regression, which is of the form
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝛽4 𝑥4𝑡 + 𝑢𝑡
• Critical values: 𝑑𝐿 = 1.42, 𝑑𝑢 = 1.57
• Thus, 4 − 𝑑𝑢 = 2.43 and 4 − 𝑑𝐿 = 2.58
• Since DW < 𝑑𝐿 ⇒ the null hypothesis of no autocorrelation is rejected
and we can conclude that the residuals are positively correlated
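A minimal sketch, on simulated AR(1) errors, of computing the DW statistic with statsmodels and checking the approximation DW ≈ 2(1 − ρ̂); the data-generating process is an illustrative assumption.

```python
# Minimal sketch (simulated data, illustrative only) of the Durbin-Watson statistic.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                     # AR(1) errors with rho = 0.7
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(res.resid)
rho_hat = np.corrcoef(res.resid[1:], res.resid[:-1])[0, 1]
print(f"DW = {dw:.2f}, 2*(1 - rho_hat) = {2 * (1 - rho_hat):.2f}")
```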
14. Another Test for Autocorrelation: The Breusch-Godfrey Test
• It is a more general test for rth-order autocorrelation (more general than first-order autocorrelation):
u_t = ρ1 u_{t−1} + ρ2 u_{t−2} + ρ3 u_{t−3} + ⋯ + ρr u_{t−r} + v_t, with v_t ~ N(0, σ_v²)
• r has to be picked, and the choice is somewhat arbitrary
• The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and ⋯ and ρr = 0 (no autocorrelation)
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ⋯ or ρr ≠ 0 (autocorrelation)
• Under the null hypothesis, the error is not related to any of its r previous values
• The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals û_t
2. Regress û_t on all the regressors from stage 1 (all the x’s) plus û_{t−1}, û_{t−2}, ⋯, û_{t−r}:
û_t = γ1 + γ2 x_2t + γ3 x_3t + γ4 x_4t + ρ1 û_{t−1} + ρ2 û_{t−2} + ρ3 û_{t−3} + ⋯ + ρr û_{t−r} + v_t, with v_t ~ N(0, σ_v²)
Obtain R² from this auxiliary regression
3. Test statistic: (T − r)R² ~ χ²(r)
4. Decision rule: if test statistic > critical value ⇒ Reject H0
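A minimal sketch of the Breusch-Godfrey test using statsmodels’ acorr_breusch_godfrey on a fitted OLS model; the simulated data and the choice r = 4 are illustrative assumptions.

```python
# Minimal sketch (simulated data, illustrative only) of the Breusch-Godfrey test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(4)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()      # AR(1) errors induce autocorrelation
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
r = 4                                          # chosen lag order (somewhat arbitrary)
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=r)
print(f"BG LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")   # reject H0 if p-value < 0.05
```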
15. Consequences of Ignoring Autocorrelation if it is Present
• The consequences are similar to those of ignoring heteroscedasticity
• The coefficient estimates derived using OLS are still unbiased, but
they are inefficient, i.e. they are not BLUE
• Even for large sample sizes (T), standard errors could be wrong. Thus,
there exists the possibility that we could make the wrong inferences
• In the case of positive serial correlation in the residuals:
1. The OLS standard error estimates will be biased downwards relative to
the true standard errors
2. Therefore, the probability of type I error (the tendency to reject the
null hypothesis when it is correct) would increase
• 𝑅2 is likely to be inflated relative to its “correct” value for positively
correlated residuals
16. Dealing with Autocorrelation
• If the form of the autocorrelation is known, we could use a GLS procedure – i.e. an approach that allows for autocorrelated residuals (e.g., Cochrane-Orcutt)
• But such procedures that correct for autocorrelation require assumptions
about the form of the autocorrelation
• It is unlikely to be the case that the form of the autocorrelation is known
• Solutions?
• HAC standard errors
• Dynamic models
• Regression in differences
∆𝑦𝑡 = 𝛽1 + 𝛽2 ∆𝑥2𝑡 + 𝛽3 ∆𝑥3𝑡 + 𝑢𝑡
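A minimal sketch of the first remedy listed above, HAC (Newey-West) standard errors, assuming statsmodels is available; the simulated data and the choice of 4 lags are illustrative assumptions.

```python
# Minimal sketch (simulated data, illustrative only): OLS with HAC standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()      # autocorrelated errors
y = 1.0 + 0.5 * x + u
X = sm.add_constant(x)

plain = sm.OLS(y, X).fit()
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West, 4 lags
print("OLS s.e.:", plain.bse)
print("HAC s.e.:", hac.bse)
```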
17. Why Might Lags be Required in a Regression?
• Lagged values of the explanatory variables or of the dependent variable may capture important dynamic structure in the dependent variable that may be caused by:
1. Inertia of the dependent variable
2. Overreactions
• Lagged variables will not solve the problem of the omission of relevant variables which are themselves autocorrelated, e.g. in
r_t = α0 + α1 Ω_{t−1} + u_t
• Ω_{t−1} could include: unexpected changes in inflation, unexpected changes in industrial production, or unexpected changes in the default or term spread
• But the previous equation definitely omits some variables; if the omitted variables are autocorrelated, then so will be the u_t
• Autocorrelation can also arise owing to unparameterised seasonality, or if a misspecification error has been committed by using an inappropriate functional form
18. Problems with Adding Lagged Regressors to “Cure” Autocorrelation
• A move from a static model to a dynamic one may cure autocorrelation
• An example of a dynamic model is the autoregressive distributed lag (ARDL) model:
y_t = β1 + β2 x_2t + β3 x_3t + β4 x_4t + β5 x_5t + γ1 y_{t−1} + γ2 y_{t−2} + ⋯ + γk y_{t−k} + u_t
• However,
1. Inclusion of lagged values of the dependent variable violates the assumption that the explanatory variables are non-stochastic (assumption 4)
2. What does an equation with a large number of lags actually mean?
19. Assumption (4): The x_t are Non-Stochastic
• The OLS estimator is consistent and unbiased in the presence of stochastic regressors, provided that the regressors are not correlated with the error term of the estimated equation
• β̂ = (X′X)⁻¹X′y and y = Xβ + u
• Thus,
β̂ = (X′X)⁻¹X′(Xβ + u) = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u = β + (X′X)⁻¹X′u
• Taking expectations, and provided that X and u are independent:
E(β̂) = β + E[(X′X)⁻¹X′u] = β + E[(X′X)⁻¹X′] E(u) = β
20. Assumption (5): The Disturbances are Normally Distributed
• u_t ~ N(0, σ²) is required in order to conduct single or joint hypothesis tests about the model parameters
21. Testing for Departures from Normality
• The most commonly applied test for normality is the Bera-Jarque (BJ) test
• A normal distribution is fully characterized by its first two moments:
the mean and the variance
• The third moment: skewness, measures symmetry around the mean
• The fourth moment: kurtosis, measures how fat the tails of the
distribution are
• A normal distribution is not skewed (skewness = 0) and is defined to
have a coefficient of kurtosis of 3
• Excess kurtosis = kurtosis – 3
• A normal distribution will have a coefficient of excess kurtosis = 0
• A normal distribution is symmetric and said to be mesokurtic
• Leptokurtic distribution: is a distribution which has fatter tails and is
more peaked at the mean than a normally distributed random variable
• Platykurtic distribution: a distribution that is less peaked at the mean and has thinner tails and more of the distribution in the shoulders than a normal distribution
• Financial time series are far more likely to be characterized by leptokurtic distributions
• Bera and Jarque (1981) formalize these ideas by testing whether the coefficients of excess kurtosis and skewness are jointly zero
• Recall that u_t is the error term, with variance σ²
b1 = E(u³) / (σ²)^(3/2) and b2 = E(u⁴) / (σ²)²
• The kurtosis of the normal distribution is 3 ⇒ its excess kurtosis is b2 − 3 = 0
• The Bera-Jarque statistic is given by
W = T [ b1²/6 + (b2 − 3)²/24 ] ~ χ²(2)
• The test statistic asymptotically follows a χ²(2) under the null hypothesis that the distribution of the series is symmetric and mesokurtic
H0: b1 = 0 and b2 = 3 (normality)
H1: b1 ≠ 0 or b2 ≠ 3 (non-normality: the residuals are significantly skewed, leptokurtic/platykurtic, or both)
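A minimal sketch of the Bera-Jarque test using the jarque_bera helper in statsmodels; the fat-tailed simulated “residuals” are an illustrative assumption.

```python
# Minimal sketch (simulated residuals, illustrative only) of the Bera-Jarque test.
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(6)
resid = rng.standard_t(df=4, size=500)            # fat-tailed (leptokurtic) "residuals"

jb_stat, jb_pval, skew, kurtosis = jarque_bera(resid)
print(f"W = {jb_stat:.2f}, p-value = {jb_pval:.4f}, "
      f"skewness = {skew:.2f}, kurtosis = {kurtosis:.2f}")   # reject normality if p < 0.05
```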
22. What Should be Done if Evidence of Non-Normality is Found?
• For sufficiently large sample sizes (asymptotically), violations of the normality assumption are inconsequential
• Appealing to the central limit theorem (CLT), the test statistics will asymptotically follow the appropriate distributions even in the absence of error normality
• It is quite often the case that one or two very extreme residuals cause a rejection of the normality assumption, since such residuals lead û_t⁴ (and hence the measured kurtosis) to be very large
• Such observations that do not fit with the pattern of the remainder of the data are known as outliers
23. Multicollinearity
• An implicit assumption that is made when using the OLS estimation
method is that the explanatory variables are not correlated with one
another
• If there is no relationship between the explanatory variables, they
would be said to be orthogonal to one another
• A small degree of association between the explanatory variables is to
be expected (benign)
• A problem occurs when the explanatory variables are very highly
correlated with each other, and this problem is known as
multicollinearity
It is possible to distinguish between two cases of multicollinearity:
1. Perfect Multicollinearity:
• It occurs when there is an exact relationship between two or more
variables
• e.g., 𝑥3 = 2𝑥2 → the model parameters cannot be estimated
• Identical information → sufficient to estimate only one parameter,
not two
• The (X′X) matrix is not of full rank (two of the columns would be linearly dependent on one another) ⇒ (X′X)⁻¹ cannot be calculated, so the OLS parameters cannot be estimated
2. Near multicollinearity: much more likely to occur in practice; it arises when there is a non-negligible, but not perfect, relationship between two of the explanatory variables
24. Detecting Multicollinearity
• One can examine the matrix of correlations between the individual explanatory variables
• Suppose that a regression equation has three explanatory variables (plus a constant term), and that the pairwise correlations between these explanatory variables are:

corr   x2    x3    x4
x2      –   0.2   0.8
x3     0.2   –    0.3
x4     0.8  0.3    –

• Here x2 and x4 are highly correlated (0.8), which suggests that near multicollinearity may be a problem
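A minimal sketch of inspecting such a correlation matrix with pandas; the data frame and column names are hypothetical, with x4 constructed to be highly correlated with x2.

```python
# Pairwise correlations between explanatory variables (hypothetical data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
x2 = rng.normal(size=200)
x4 = 0.8 * x2 + 0.6 * rng.normal(size=200)     # built to be highly correlated with x2
df = pd.DataFrame({"x2": x2, "x3": rng.normal(size=200), "x4": x4})

print(df.corr().round(2))                      # look for off-diagonal entries near +/-1
```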
25. Problems if Near Multicollinearity is Present but Ignored
• 𝑅2 will be high but the individual coefficients will have high standard
errors, so that the regression looks good as a whole, but the individual
variables are not significant
• Adding or removing a variable would change the coefficient value
significantly
• Confidence intervals are very wide
26. Solutions to the Problem of Multicollinearity
• Ignore it: it does not violate assumptions (1)-(4) of the CLRM
• Drop one of the collinear variables
• Transform the highly correlated variables into a ratio and include the
ratio in the regression
• Collect more data or switch to a higher frequency
27. Adopting the Wrong Functional Form
• We have previously assumed that the appropriate functional form is
linear
• This may not always be true
• We can formally test this using Ramsey’s RESET test, which is a
general test for misspecification of functional form
• The method works by using higher-order terms of the fitted values (e.g., ŷ_t², ŷ_t³) in an auxiliary regression
• The auxiliary regression is one where û_t, the residuals from the original model, are regressed on powers of the fitted values together with the original explanatory variables:
û_t = α1 + α2 ŷ_t² + α3 ŷ_t³ + ⋯ + αp ŷ_t^p + Σ_{i=1}^{k} βi x_it + v_t
• Higher-order powers of the fitted values of y can capture a variety of non-linear relationships, since
ŷ_t² = (β̂1 + β̂2 x_2t + β̂3 x_3t + ⋯ + β̂k x_kt)²
• We are interested in testing:
H0: α2 = 0 and α3 = 0 and ⋯ and αp = 0 (correct functional form)
H1: α2 ≠ 0 or α3 ≠ 0 or ⋯ or αp ≠ 0 (incorrect functional form)
• Obtain R² from the auxiliary regression
• Test statistic: TR² ~ χ²(p − 1)
• Decision rule: if the value of the test statistic is greater than the χ² critical value, reject the null hypothesis that the functional form was correct
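A minimal sketch of the RESET test in the TR² ~ χ²(p − 1) form described above, built by hand with statsmodels on simulated data (recent statsmodels versions also ship a packaged RESET helper, linear_reset); the data-generating process and p = 3 are illustrative assumptions.

```python
# Minimal sketch (simulated data, illustrative only) of a hand-rolled RESET test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(8)
T = 200
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + 0.4 * x**2 + rng.normal(size=T)   # the true relationship is non-linear

X = sm.add_constant(x)
base = sm.OLS(y, X).fit()                              # (mis)specified linear model

p = 3                                                  # highest power of fitted values used
powers = [base.fittedvalues**j for j in range(2, p + 1)]
aux_X = np.column_stack([X] + powers)
aux_r2 = sm.OLS(base.resid, aux_X).fit().rsquared      # auxiliary regression on residuals

stat = T * aux_r2
crit = chi2.ppf(0.95, p - 1)
print(f"TR^2 = {stat:.1f}, 5% chi2({p - 1}) critical value = {crit:.2f}, reject H0: {stat > crit}")
```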
28. Omission of an Important Variable
• Suppose the true regression is
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝛽4 𝑥4𝑡 + 𝛽5 𝑥5𝑡 + 𝑢𝑡
• Yet the researcher estimates
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝛽4 𝑥4𝑡 + 𝑢𝑡
• Consequences:
1. The estimated coefficients will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables
2. The standard errors will also be biased upwards
29. Inclusion of an Irrelevant Variable
• True data generating process (DGP)
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝛽4 𝑥4𝑡 + 𝑢𝑡
• The researcher estimates a model of the form
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝛽4 𝑥4𝑡 + 𝛽5 𝑥5𝑡 + 𝑢𝑡
• A superfluous or irrelevant variable 𝑥5𝑡 is included
• 𝛽5 should be zero. But, in almost all cases, it will not be
• Consequence: the coefficient estimators would still be consistent and
unbiased, but the estimators would be inefficient (i.e. standard errors of the
coefficients will be inflated) ⇒ there is a trade-off between consistency and
efficiency
30. Parameter Stability Tests
• So far, we have estimated regressions such as
𝑦𝑡 = 𝛽1 + 𝛽2 𝑥2𝑡 + 𝛽3 𝑥3𝑡 + 𝑢𝑡
• We have implicitly assumed that the parameters (𝛽1 , 𝛽2 and 𝛽3 ) are
constant for the entire sample period
• We can test this implicit assumption using parameter stability tests
𝐻0 :Parameters are stable over time
𝐻1 : Parameters are not stable over time
31. Parameter Stability Tests: The Chow Test
1. Split the data into two sub-periods, of lengths T1 and T2. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS from each
2. The restricted regression is now the regression for the whole period, while the unrestricted regression comes in two parts: one for each sub-sample
Perform an F-test based on the difference between the two RSSs:
test statistic = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (T − 2k) / k
where RSS: residual sum of squares for the whole sample
RSS1: RSS for sub-sample 1
RSS2: RSS for sub-sample 2
T: number of observations
2k: number of regressors in the unrestricted regression (including the constants)
k: number of regressors in each “unrestricted” regression
• The unrestricted regression is the one where the restriction has not been imposed. Since the restriction is that the coefficients are equal across the two sub-samples, the restricted regression is the single regression for the whole sample, while the unrestricted regression comes in two parts, one for each sub-sample
• Intuition: if the coefficients do not change between the samples, the RSS will not rise much upon imposing the restriction
• Restricted RSS = RSS
• Unrestricted residual sum of squares: URSS = RSS1 + RSS2
• The number of restrictions is equal to the number of coefficients that are estimated for each of the regressions, i.e. k
• The number of regressors in the unrestricted regression (including the
constants) is 2𝑘, since the unrestricted regression comes in two parts,
each with 𝑘 regressors
3. Perform the test: if the value of the 𝐹-statistic > 𝐹(𝑘, 𝑇 − 2𝑘)
⇒ Reject the null hypothesis that the parameters are stable over
time
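A minimal sketch of the Chow statistic computed from the whole-sample and sub-sample regressions, assuming statsmodels is available; the simulated data, with a deliberate break at observation 82, is an illustrative assumption.

```python
# Minimal sketch (simulated data with a built-in break, illustrative only) of the Chow test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import f

rng = np.random.default_rng(9)
T, k = 144, 2
x = rng.normal(size=T)
beta = np.where(np.arange(T) < 82, 1.2, 1.5)          # slope shifts after observation 82
y = 0.3 + beta * x + rng.normal(scale=0.1, size=T)
X = sm.add_constant(x)

rss = sm.OLS(y, X).fit().ssr                          # restricted: whole sample
rss1 = sm.OLS(y[:82], X[:82]).fit().ssr               # unrestricted part 1
rss2 = sm.OLS(y[82:], X[82:]).fit().ssr               # unrestricted part 2

chow = (rss - (rss1 + rss2)) / (rss1 + rss2) * (T - 2 * k) / k
crit = f.ppf(0.95, k, T - 2 * k)
print(f"Chow F = {chow:.2f}, 5% F({k},{T - 2 * k}) critical value = {crit:.2f}, "
      f"reject H0: {chow > crit}")
```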
Example: Suppose that it is now January 1993, and consider the following regression for the standard CAPM β for the returns on a stock:
r_gt = α + β r_Mt + u_t
• where r_gt is the excess return on Glaxo and r_Mt is the excess return on a market portfolio
• Suppose that you are interested in estimating beta using monthly data from 1981 to 1992
• Another researcher expresses concern that the October 1987 stock market crash fundamentally altered the risk-return relationship; test this conjecture using a Chow test
The estimated model for each sub-period and for the whole period is:
• 1981M1−1987M10: r̂_gt = 0.24 + 1.2 r_Mt, T = 82, RSS1 = 0.03555
• 1987M11−1992M12: r̂_gt = 0.68 + 1.53 r_Mt, T = 62, RSS2 = 0.00336
• 1981M1−1992M12: r̂_gt = 0.39 + 1.37 r_Mt, T = 144, RSS = 0.0434
• The hypotheses being tested are:
H0: α1 = α2 and β1 = β2
H1: α1 ≠ α2 or β1 ≠ β2
• test statistic = [0.0434 − (0.0355 + 0.00336)] / (0.0355 + 0.00336) × (144 − 4) / 2 = 7.698
• Since the test statistic > F(2, 140) = 3.06 ⇒ Reject the null hypothesis: the restriction that the coefficients are the same in the two periods cannot be imposed, and we conclude that the parameters are not stable over time
How to decide on the subsamples to use?
• Plot the variables to see any structural changes
• Split the data according to any known independent historical events
