15-Econometrics-Linear Regression

The document discusses heteroskedasticity in econometrics, specifically focusing on a dataset from California's SMSAs in 1972. It covers various tests for heteroskedasticity, including the Breusch-Pagan and White tests, and discusses the implications of heteroskedasticity on OLS estimators. Additionally, it presents methods for addressing heteroskedasticity, such as using loglinear models and robust standard errors.


Econometrics

University of Milan-Bicocca

Course lecturer:
Maryam Ahmadi
[email protected]

Heteroskedasticity

Problem 14 & Answer.

The data set AIRQ contains observations for 30 standard metropolitan statistical areas (SMSAs) in California for 1972 on the following variables:

airq: indicator for air quality (the lower the better);
vala: value added of companies (in 1000 US$);
rain: amount of rain (in inches);
coas: dummy variable, 1 for SMSAs at the coast, 0 for others;
dens: population density (per square mile);
medi: average income per head (in US$).

a. Estimate a linear regression model that explains airq from the other variables using ordinary least
squares. Interpret the coefficient estimates.

Coastal regions, ceteris paribus, have a better air quality.

Keeping other factors fixed, population density does not significantly affect air quality (the effect is negative but
insignificant).
A higher value added for a region, more rain, or a higher household income do not significantly affect
air quality (the effect of each one, ceteris paribus, is positive but insignificant).
The F-statistic, 2.98, with a p-value of 0.031, indicates a marginal rejection (of the joint effect being zero) at the
5% significance level.
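A minimal Stata sketch of this estimation (assuming the AIRQ data are in memory with the variable names listed above):

reg airq vala rain coas dens medi

The coefficient on coas then gives the ceteris paribus difference in air quality between coastal and other SMSAs.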
b. Perform a Breusch–Pagan test for heteroskedasticity related to all five
explanatory variables.

The test statistic of the Breusch–Pagan test is N*R2 = 3.141. Given that the 10% critical value of the
Chi-squared distribution with 5 degrees of freedom is 9.24, we cannot reject the null of homoskedasticity.
Note, however, that the non-rejection may be due to a lack of power, caused by the small
number of observations and the general nature of the alternative hypothesis.
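A sketch of how this test can be computed manually in Stata, following the same auxiliary-regression approach used later in these slides (the names e and e2 are chosen here for illustration):

reg airq vala rain coas dens medi
predict e, resid
gen e2 = e^2
reg e2 vala rain coas dens medi
display e(N)*e(r2)

The last line displays the N*R2 test statistic, to be compared with the Chi-squared critical value with 5 degrees of freedom.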
c. Perform a White test for heteroskedasticity. Comment upon the appropriateness of the White test
in light of the number of observations and the degrees of freedom of the test.

The White test is based on including the 5 explanatory variables, their squares (4, because the square of
the dummy coas is identical to coas itself) and their cross-products (10), which leads to a large number of
regressors in the test regression: 19 regressors in addition to the intercept.

• The test statistic is N*R2 (with a small N),
• and it follows a Chi-squared distribution with 19 degrees of freedom (large relative to N).
• Critical values of a Chi-squared distribution with such a high number of degrees of freedom are likely to
be larger than N*R2.

As a result, it is very unlikely to find a rejection, indicating that the use of the White test is inappropriate
in this case, as the sample size is too small (given the number of regressors).
We cannot reject the null of homoskedasticity.
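A minimal sketch of the White test in Stata, assuming a recent Stata version where it is available as a postestimation command:

reg airq vala rain coas dens medi
estat imtest, white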
d. Assuming that we have multiplicative heteroskedasticity related to coas and medi, estimate the
coefficients by running a regression of log e2 upon these two variables. Test the null hypothesis of
homoskedasticity on the basis of this auxiliary regression.

The F-statistic of this regression is 5.07, with a p-value of 0.013. So we reject the null hypothesis of
homoskedasticity at the 5% significance level. As a consequence, we can say that we have multiplicative
heteroskedasticity related to coas and medi.
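A sketch of this auxiliary regression in Stata, reusing the OLS residuals e generated above (loge2 is a name chosen here):

gen loge2 = log(e^2)
reg loge2 coas medi

The F-statistic reported in the header of this regression is the test of joint significance of coas and medi.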

If A3, the assumption of homoskedasticity, is violated, heteroskedasticity arises.

Consequences of heteroskedasticity:
• OLS is still unbiased under heteroskedasticity.

• Also, the interpretation of R-squared is not changed.

• Heteroskedasticity invalidates the variance formulas for the OLS estimators; therefore, routinely computed
standard errors are incorrect.

• The usual F and t tests are no longer valid under heteroskedasticity.

• Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more
efficient linear estimators.
Detection of Heteroskedasticity

The Breusch-Pagan test
It is based on the auxiliary regression $\hat{u}^2 = \delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k + \nu$.
The null hypothesis of homoskedasticity is $H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$,
against the alternative $H_1: H_0$ is not true.
Test statistic: N*R2, which has a Chi-squared distribution (df = number of variables in the auxiliary regression).

The White test
It is based on regressing the squared OLS residuals upon all regressors, their squares and their (unique) cross-products.
Test statistic: N*R2, which has a Chi-squared distribution (df = number of variables in the auxiliary regression).

Multiplicative heteroskedasticity test
It is based on the auxiliary regression $\log e_i^2 = \log \sigma^2 + z_i' \alpha + u_i$.
The simplest test is the standard F-test in this auxiliary regression for the hypothesis that all slope coefficients
are equal to zero.
Solutions of Heteroskedasticity
✓ consider a loglinear model

It is quite common to find heteroskedasticity in situations in which the size of the observational units
differs substantially, for example in a sample containing firms with one employee and firms with over
1000 employees. We can expect that large firms have larger absolute values of all variables in the model,
including the unobservables collected in the error term.

A common approach to alleviate this problem is to use logarithms of all variables rather than their levels.
Consequently, our first step in handling the heteroskedasticity problem is to consider a loglinear model.
✓ robust standard errors
It is possible to estimate valid standard errors for OLS without specifying $\sigma_i^2$:

$\widehat{\mathrm{var}}(\hat{\beta}) = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \left(\sum_{i=1}^{n} \hat{u}_i^2 x_i x_i'\right) \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}$

If we use this formula to compute standard errors rather than the routine one, we can continue as before with our t- and F-tests.
These are standard errors that are robust to heteroskedasticity; that is, they are correct even if the errors are
heteroskedastic. The square root of this variance is called the "heteroskedasticity-robust standard error".

Heteroskedasticity-robust standard errors can be used for any inferences (t and F tests).

Note: parameter estimates and goodness-of-fit measures do not change; standard errors, t-statistics and F-tests are
adjusted.
✓ robust standard errors

If heteroskedasticity is detected and the sample size is large, we can estimate the model by OLS and obtain
valid inferences (t and F tests) using heteroskedasticity-robust standard errors.

In Stata, you can get heteroskedasticity-robust standard errors by adding the option robust, for example:
reg y x1 x2, robust

However, OLS is no longer the best estimator, and if we want a more efficient estimator, we need to use an
alternative estimator.
✓ Deriving an alternative estimator

Trick: we know that OLS is BLUE under the Gauss-Markov conditions.

1. Transform the model such that it satisfies the Gauss-Markov assumptions again.

If $V\{\varepsilon_i\} = \sigma_i^2 = \sigma^2 h_i^2$, the transformed model $\frac{y_i}{h_i} = \frac{x_i'}{h_i}\beta + \frac{u_i}{h_i}$ has a homoskedastic error term.

2. Apply OLS to the transformed model.

This leads to the generalized least squares (GLS) estimator, $\hat{\beta}_{GLS} = \left(\sum_{i=1}^{n} h_i^{-2} x_i x_i'\right)^{-1} \sum_{i=1}^{n} h_i^{-2} x_i y_i$, which is BLUE.

• However, it can only be applied if we know $h_i$, or if we can estimate it by making additional restrictive
assumptions on the form of $h_i$.

• This leads to a feasible GLS (FGLS, EGLS) estimator for heteroskedasticity, which is also called weighted
least squares (WLS).
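As a sketch of how such a weighted regression can be run in Stata when an estimate of $h_i^2$ is available (here stored in a hypothetical variable h2), analytic weights proportional to $1/h_i^2$ reproduce the weighted least squares computation used later in these slides:

reg y x1 x2 [aweight = 1/h2]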
Illustration: explaining labor demand
We estimate a simple labor demand function for a sample of 569 Belgian
firms (from 1996).
We explain labor from output, wage costs and capital stock.

Note that the variables are scaled (to obtain coefficients in the same
order of magnitude).

A Linear Model

Breusch-Pagan test

We see (very) high t-ratios and a high R2 in the auxiliary regression.
This indicates that the squared errors are strongly related to the regressors $x_i$.

Test statistic: N*R2 gives 331.0, which provides a very strong rejection!

In Stata:
reg labor wage output capital
predict e, resid
gen e2 = e^2
reg e2 wage output capital
display 569*0.5818
1st solution: Loglinear Model

Recall that in the loglinear model the coefficients have the interpretation of elasticities.

We can perform the Breusch–Pagan test in a similar way as before: the auxiliary regression of squared OLS
residuals on the three explanatory variables (in logs) leads to an R2 of 0.0136. The resulting test statistic is
569 x 0.0136 = 7.74, which is on the margin of being significant at the 5% level.

A more general test is the White test (next slide).

In Stata:
gen llabor = log(labor) …
reg llabor lwage loutput lcapital
predict e, resid
gen e2 = e^2
reg e2 lwage loutput lcapital
display 569*0.0136
The White Test

With an R2 of 0.1029, this leads to a value of the White test statistic of 569 x 0.1029 = 58.5, which is highly
significant for a Chi-squared distribution with 9 degrees of freedom.

Given the strong rejection (of homoskedasticity), we next estimate the loglinear model using White standard
errors (next slide).

In Stata:
reg llabor lwage loutput lcapital
estat imtest, white
2nd solution: Robust (White) s.e.’s

• In many cases, using White (heteroskedasticity-consistent) standard errors is appropriate and a good
solution to the problem of heteroskedasticity.
• Standard errors, t-statistics and the F-statistic are adjusted.
• Qualitatively, the conclusions are not changed: wages and output are significant in explaining labour
demand, capital is not.

Sometimes we would like to have a more efficient estimator, by making some assumption about the form of
heteroskedasticity (next slide).
Multiplicative heteroskedasticity

• If we are willing to make assumptions about the form of heteroskedasticity, the use of the more efficient
EGLS estimator is an option.
• We consider the multiplicative form, and choose $z_i = x_i$.
• The variables log(capital) and log(output) appear to be important in explaining the variance of the error
term. Also note that the F-value of this auxiliary regression leads to rejection of the null hypothesis of
homoskedasticity.

The exponential of the predicted values of this regression can be used to transform the original data.
Transforming all variables and applying OLS to the transformed equation yields the EGLS estimates
presented in Table 4.7.
3rd solution: EGLS loglinear model

To obtain the EGLS estimator, compute $\hat{h}_i^2 = \exp(z_i'\hat{\alpha})$ from the estimated auxiliary
regression, and transform all observations to obtain

$\frac{y_i}{\hat{h}_i} = \frac{x_i'}{\hat{h}_i}\beta + \frac{u_i}{\hat{h}_i}$

The error term in this model is (approximately) homoskedastic. Applying OLS to the transformed model gives
the EGLS estimator for $\beta$.

Note: the transformed regression is for computational purposes only. All economic interpretations refer to
the original model!
In Stata:

. reg llabor lwage loutput lcapital
. predict u, resid
. gen u2 = u^2
. gen lu2 = log(u2)
. reg lu2 lwage loutput lcapital
. predict yhat
. gen weight = 1/exp(yhat)
. reg llabor lwage loutput lcapital [aweight=weight]
• Comparing Tables 4.7 and 4.5, we see that the efficiency gain is substantial.

• The standard errors for the EGLS approach are smaller.

• Comparison with Table 4.3 is not appropriate: the routine OLS standard errors reported there are invalid
under heteroskedasticity, so that table is misleading.

• The coefficient estimates are fairly close to the OLS ones. Note that the effect of capital is now statistically
significant.

• The fact that the R2 in Table 4.7 is larger than in the OLS case is misleading, because this R2 is computed
for the transformed model with a transformed endogenous variable.
The R2 in Table 4.7 expresses the amount of variation in llabor/h that is explained by the model, not the
variation in llabor itself. Because the observations with large values of hi, which are the ones least accurately
described by the model, receive a small weight in the transformed regression, the reported R2 tends to be higher.
Problem 15 (Problem 14, continued)

Consider the same data as in Problem 14.

e. Using the results from d, compute an EGLS estimator for the linear model. Compare
your results with those obtained under a. Redo the tests from b.

f. Comment upon the appropriateness of the R2 in the regression of part e.
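A hedged Stata sketch of the EGLS computation in part e, mirroring the weighting approach used in the labor-demand illustration above (the names e, loge2, lh2hat and w are chosen here for illustration):

reg airq vala rain coas dens medi
predict e, resid
gen loge2 = log(e^2)
reg loge2 coas medi
predict lh2hat
gen w = 1/exp(lh2hat)
reg airq vala rain coas dens medi [aweight=w]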
