Econometrics
University of Milan-Bicocca
Course lecturer:
Maryam Ahmadi
[email protected]
Heteroskedasticity
Take-Home Exam & Answers.
Question 1.
a) Estimate the following model and report the estimation result.
lsales = β0 + β1 lprice + β2 adexp + u
where lsales is the logarithm of sales and lprice is the logarithm of price.
reg lsales lprice adexp …
b) Write the estimated equation (regression line) along with standard errors, number of observations, and R-squared.
lsales_hat = 5.3054 − 0.57968 lprice + 0.02437 adexp
             (0.1415)  (0.0811)        (0.0089)
N = 75, R² = 0.4451
c) Interpret the coefficient for lprice.
The coefficient on lprice is statistically significant at the 1% level. Holding other factors fixed, if price increases by 1%, sales revenue is predicted to decrease by 0.579%.
d) What is the percentage effect of two additional units of advertisement expenditure ($2000) on sales revenue? Interpret your finding.
0.0243682 × 100 × 2 = 2.4368 × 2 = 4.87%. Ceteris paribus, if advertising expenditure increases by $2,000, sales revenue is predicted to increase by about 4.87%.
e) Is the model overall statistically significant? Justify your answer.
H0: β_lprice = β_adexp = 0, H1: H0 is not true. Under assumptions A1-A5 the F-statistic follows an F distribution with 2 and 72 degrees of freedom.
F(2, 72) = 28.87 with p-value = 0.0000 (< 0.01). We reject the null hypothesis at the 1% significance level, so the model is overall statistically significant at the 1% level.
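The joint test can also be reproduced explicitly in Stata after the regression (a minimal sketch; the same F(2, 72) statistic is already reported in the header of the reg output):
reg lsales lprice adexp
test lprice adexp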
Question 2.
a) Estimate the following model and report the estimation result.
lsales = β0 + β1 lprice + β2 adexp + β3 adexp² + β4 college + u
gen adexp2 = adexp^2
reg lsales lprice adexp adexp2 college
b) Write the fitted value equation (regression line) along with standard errors, number of observations, and R-squared.
lsales_hat = 5.6504 − 0.8512 lprice + 0.1660 adexp − 0.0385 adexp² + 0.0694 college
             (0.2183)  (0.1273)        (0.0442)       (0.0117)         (0.0248)
n = 75, R² = 0.5569
c) Based on your estimation result, do the restaurant branches that are located in a college town have on average a higher sales revenue
than the other ones? Explain your answer.
Yes. The college dummy is statistically significant at the 1% level; therefore, ceteris paribus, the sales revenue of branches in college towns is on average about 6.94% higher than in other towns.
d) What is the percentage effect of an additional unit of advertisement expenditure on sales revenue, when adexp=0 and when adexp=1.5
($1000s) per month? What do you conclude?
∂lsales/∂adexp = 0.16601 − (2 × 0.0385 × adexp).
If adexp = 0 this equals 0.16601, and if adexp = 1.5 it equals 0.05051. This means that, ceteris paribus, if there is no advertising expenditure at all, spending $1,000 on advertising is expected to increase sales revenue by about 16.6%; however, if there is already an advertising expenditure of 1.5 ($1,000s), an additional $1,000 is expected to increase sales revenue by only about 5.05%. Therefore, we find a positive but diminishing effect of advertising expenditure on sales revenue.
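These two marginal effects can be checked from the stored coefficients right after estimating Model 2 (a minimal sketch; lincom reports the linear combination together with a standard error):
reg lsales lprice adexp adexp2 college
lincom _b[adexp]                  // marginal effect at adexp = 0
lincom _b[adexp] + 3*_b[adexp2]   // marginal effect at adexp = 1.5, i.e. β2 + 2(1.5)β3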
e) Suppose you are working in this restaurant chain as a consultant. What would you suggest to the managers about the expected effect of
advertising expenditures on sales revenue? At which point should they stop increasing their advertising expenditure? Why?
We expect a positive but diminishing effect of advertising expenditure on the sales revenue of this restaurant chain. This means that at first each additional unit of advertising expenditure increases sales revenue, but at a decreasing rate as advertising expenditure grows; after a certain point, the effect on sales revenue turns negative.
∂lsales/∂adexp = 0.1660104 − 2 × 0.0385643 × adexp > 0  →  adexp < 0.1660104/0.0771286 ≈ 2.152
Therefore the chain can increase advertising expenditure up to about 2.152 ($1,000s), i.e. roughly $2,152 per month. Beyond this point I would suggest that they not increase advertising expenditure any further, because, holding other factors fixed, it would reduce sales revenue.
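The turning point −β2/(2β3) can also be computed directly after estimating Model 2, together with a standard error, using Stata's nlcom for a nonlinear combination of coefficients (a minimal sketch):
reg lsales lprice adexp adexp2 college
nlcom -_b[adexp]/(2*_b[adexp2])   // turning point of the quadratic in adexp (≈ 2.15, in $1,000s)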
Question 3.
a) Among the models in Question 1 and Question 2, which model would you choose to estimate the impact of advertising expenditure on sales revenue? Explain your answer in terms of the goodness of fit of the models and the statistical significance of the variables.
I choose MODEL 2, as it has a higher goodness of fit (R-squared) than MODEL 1. Moreover, both additional explanatory variables, adexp2 and college, significantly affect sales revenue; they are relevant to the model, and excluding them causes omitted variable bias.
b) In this case, what is an important consequence of choosing the wrong model? Be detailed in your answer. (Hint: discuss the potential
bias in the coefficient of adexp).
If we choose MODEL 1, we face potential omitted variable bias in the coefficient of adexp. Since adexp2 is negatively correlated with lsales and positively correlated with adexp, dropping it from the model produces a downward (negative) bias, i.e. we would underestimate the effect of adexp.
If A3, the assumption of Homoskedasticity, is violated, Heteroskedasticity arises.
Consequences of heteroskedasticity:
• OLS is still unbiased under heteroskedasticity!
• The interpretation of R-squared is also unchanged.
• Heteroskedasticity invalidates the usual variance formulas for the OLS estimators; therefore, routinely computed standard errors are incorrect.
• The usual F and t tests are no longer valid under heteroskedasticity.
• Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators.
Detection of Heteroskedasticity
In order to judge whether in a given model the OLS results are misleading
because of inappropriate standard errors due to heteroskedasticity, a
number of alternative tests are available.
If these tests do not reject the null, there is no need to suspect our least
squares results.
If rejections are found, we may consider:
revising the specification of our model,
heteroskedasticity-consistent (robust) standard errors for the OLS estimator (see the sketch after this list), or
the use of an EGLS estimator.
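As an illustration of the second option, heteroskedasticity-robust standard errors are obtained in Stata simply by adding the vce(robust) option to the regression. A minimal sketch, using the labor demand variables from the illustration later in these slides:
reg labor wage output capital, vce(robust)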
• The Breusch-Pagan test
The Breusch-Pagan test tests whether the error variance is a function of zi, where zi is a
function (subset) of xi. It assumes σi² = σ² h(zi'α) for some function h with h(0) = 1; note that
this functional form is such that the variances are never negative. The null is H0: α = 0
versus H1: α ≠ 0. The test is based on regressing the squared OLS residuals upon zi and a constant.
We can choose zi equal to the original regressors:
û² = δ0 + δ1 x1 + ... + δk xk + ν
The null hypothesis of homoskedasticity is then H0: δ1 = δ2 = ... = δk = 0
against the alternative H1: H0 is not true.
Test statistic: N multiplied by the R² of the auxiliary regression. It has a Chi-squared distribution
(df = number of variables in the auxiliary regression).
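In Stata, the statistic can be computed by hand (regress the squared OLS residuals on the regressors and multiply the resulting R² by N, as in the labor demand illustration later in these slides) or via the built-in post-estimation test. A minimal sketch using the Question 1 model; the rhs and iid options are assumed to request all right-hand-side regressors and the N·R² form of the statistic:
reg lsales lprice adexp
estat hettest, rhs iid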
• The White test
The White test tests whether the error variance is a function of the explanatory variables, with a
more general alternative than Breusch-Pagan.
It is based on regressing the squared OLS residuals upon all regressors, their squares and their
(unique) cross-products.
For example, for k = 3: y = β0 + β1 x1 + β2 x2 + β3 x3 + u, the White test is based on estimating
û² = δ0 + δ1 x1 + δ2 x2 + δ3 x3 + δ4 x1² + δ5 x2² + δ6 x3² + δ7 x1x2 + δ8 x1x3 + δ9 x2x3 + v
Test statistic: N multiplied by R2 of the auxiliary regression. Has Chi-squared distribution
(DF = # variables in auxiliary regression).
• Advantage: general.
• Disadvantage: general (low power in small samples).
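In Stata the White test is available as a post-estimation command that builds the auxiliary regression (squares and cross-products) automatically. A minimal sketch using the Question 1 model; the later slides use the shorter imtest, white syntax:
reg lsales lprice adexp
estat imtest, white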
• Multiplicative heteroskedasticity test
Assume σi² = σ² exp(zi'α)
where zi is a function (subset) of xi. Note that the functional form is such that
the variances are never negative.
To estimate α we run the auxiliary regression log ûi² = log σ² + zi'α + νi.
The simplest test is based on the standard F-test in this auxiliary regression for the
hypothesis that all coefficients in α are equal to zero.
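A minimal Stata sketch of this auxiliary regression, taking zi equal to the regressors of the labor demand model used in the illustration later in these slides (the names e and loge2 are illustrative):
reg labor wage output capital
predict e, resid
gen loge2 = log(e^2)
reg loge2 wage output capital   // the overall F-test of this regression tests H0: α = 0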
Solutions of Heteroskedasticity
1. Consider a loglinear model
It is quite common to find heteroskedasticity in the situations in which the size
of the observational units differs substantially. For example, in a sample
containing firms with one employee and firms with over 1000 employees. We
can expect that large firms have larger absolute values of all variables in the
model, including the unobservables collected in the error term.
A common approach to alleviate this problem is to use logarithms of all
variables rather than their levels. Consequently, our first step in handling the
heteroskedasticity problem is to consider a loglinear model.
Illustration: explaining labor demand
We estimate a simple labor demand function for a sample of 569 Belgian
firms (from 1996).
We explain labor from output, wage costs and capital stock.
Note that the variables are scaled (to obtain coefficients in the same
order of magnitude).
A Linear Model
Breusch-Pagan test
We see (very) high t-ratios and a high R². This indicates that the squared errors are strongly related to xi.
Test statistic: N × R² gives 331.0, which provides a very strong rejection!
In Stata:
reg labor wage output capital
predict e, resid
gen e2 = e^2
reg e2 wage output capital
display 569*0.5818
1st solution: Loglinear Model
Recall that in the loglinear model the coefficients have the interpretation of elasticities.
We can perform the Breusch-Pagan test in a similar way as before: the auxiliary regression of squared OLS residuals on the three explanatory variables (in logs) leads to an R² of 0.0136. The resulting test statistic is 569 × 0.0136 = 7.74, which is on the margin of being significant at the 5% level.
In Stata:
gen llabor = log(labor) …
reg llabor lwage loutput lcapital
predict e, resid
gen e2 = e^2
reg e2 lwage loutput lcapital
display 569*0.0136
A more general test is the White test (next slide).
The White Test
With an R² of 0.1029, this leads to a value for the White test statistic of 58.5, which is highly significant for a Chi-squared with 9 degrees of freedom:
Strong rejection of homoskedasticity.
reg llabor lwage loutput lcapital
imtest, white
Multiplicative heteroskedasticity test
• If we are willing to make assumptions about the form of heteroskedasticity, the use of the more efficient EGLS estimator is an option.
• We consider the multiplicative form, and choose zi = xi.
• The variables log(capital) and log(output) appear to be important in explaining the variance of the error term. Also note that the F-value of this auxiliary regression leads to rejection of the null hypothesis of homoskedasticity.
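A minimal Stata sketch of the corresponding EGLS (weighted least squares) step, assuming the log variables llabor, lwage, loutput and lcapital have been created; the names e, loge2, lh and wt are illustrative:
reg llabor lwage loutput lcapital
predict e, resid
gen loge2 = log(e^2)
reg loge2 lwage loutput lcapital        // auxiliary regression for the multiplicative form
predict lh, xb                          // fitted log-variance (up to a constant)
gen wt = 1/exp(lh)                      // weights proportional to 1/h(zi'α)
reg llabor lwage loutput lcapital [aweight=wt]   // EGLS / weighted least squares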
Problem 14.