Econometrics
University of Milan-Bicocca
Course lecturer:
Maryam Ahmadi
[email protected]
1
Interpreting and
Comparing Regression
Models
2
Problem & Answer 11.
Assumptions A1 and A2.
We also need A3 and A4 to make sure that $\mathrm{se}(\hat\beta)$ is the correct standard error.
In small samples, the confidence interval is valid if A5 is imposed too.
A1: Zero average assumption: E(u) = 0
A2: Conditional mean independence: E(u|x) = 0
A3: Homoskedasticity
A4: No autocorrelation
A5: Normality of the error terms
The hypothesis that $\beta_3 = 1$ can be tested by means of a t-test. The test statistic is $t = (\hat\beta_3 - 1)/\mathrm{se}(\hat\beta_3)$, which under the null hypothesis and assumptions A1-A5 has a t distribution with n-3 degrees of freedom. At the 5% significance level (95% confidence), we reject the null if |t| > c, where c is the two-sided critical value for $\alpha = 0.05$ and df = n-3.
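The t-test on a single coefficient can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated (not from the course), the true value of the third coefficient is set to 1 so that the null holds, and the helper name `ols` is hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, their standard errors, and residual df."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)          # unbiased error-variance estimate
    se = np.sqrt(s2 * np.diag(XtX_inv))   # standard errors, valid under A3 and A4
    return b, se, n - k

# Simulated data (hypothetical): the true beta3 equals 1, so H0 holds.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 0.5 + 2.0 * x2 + 1.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
b, se, df = ols(X, y)
t_stat = (b[2] - 1.0) / se[2]   # t = (beta3_hat - 1) / se(beta3_hat)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting |t| would then be compared with the critical value of the t distribution with n-3 degrees of freedom (roughly 1.97 for df = 197 at the 5% level).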
3
The joint hypothesis that $\beta_2 = \beta_3 = 0$ can be tested by means of an F-test. The test statistic is
$f = \dfrac{(R_1^2 - R_0^2)/j}{(1 - R_1^2)/(n-k-1)}$
with j = 2 restrictions and n-k-1 = n-3.
We compare the test statistic with the critical values from an F distribution with 2 (the number of restrictions) and n-3 degrees of freedom.
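A short worked step may help here. Under the null only the intercept remains in the restricted model, so its R² is zero and the statistic simplifies. The version below uses $R_1^2$ for the full model and $R_0^2$ for the restricted model, with j = 2 and n-k-1 = n-3 taken from the degrees of freedom stated above:

```latex
R_0^2 = 0 \;\Rightarrow\; f = \frac{R_1^2 / 2}{(1 - R_1^2)/(n - 3)}
```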
This is an example of exact multicollinearity. Estimation will break down. Regression software will either give an error
message ("singular matrix") or automatically drop either $x_{i2}$ or $x_{i3}$ from the model.
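Why estimation breaks down can be seen from the rank of the regressor matrix. The sketch below assumes a hypothetical exact relation x2 = 2 + 3*x3 (chosen only for illustration) and simulated data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x3 = rng.normal(size=n)
x2 = 2.0 + 3.0 * x3          # hypothetical exact linear relation
X = np.column_stack([np.ones(n), x2, x3])

# With three columns but rank two, X'X is singular and cannot be inverted.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)

# Dropping one of the collinear regressors restores full column rank.
X_reduced = X[:, [0, 2]]
rank_reduced = np.linalg.matrix_rank(X_reduced)
print("columns:", X_reduced.shape[1], "rank:", rank_reduced)
```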
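Model with $x_{i2}^* = 2x_{i2} - 2$ in place of $x_{i2}$:
$y_i = \beta_1^* + \beta_2^* x_{i2}^* + \beta_3^* x_{i3} + u_i$
$\quad = \beta_1^* + \beta_2^* (2x_{i2} - 2) + \beta_3^* x_{i3} + u_i$
$\quad = \beta_1^* + 2\beta_2^* x_{i2} - 2\beta_2^* + \beta_3^* x_{i3} + u_i$
$\quad = (\beta_1^* - 2\beta_2^*) + 2\beta_2^* x_{i2} + \beta_3^* x_{i3} + u_i$
Comparing with the original model $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + u_i$:
$\beta_3^* = \beta_3$
$2\beta_2^* = \beta_2 \rightarrow \beta_2^* = \beta_2/2$
$\beta_1^* - 2\beta_2^* = \beta_1 \rightarrow \beta_1^* - \beta_2 = \beta_1 \rightarrow \beta_1^* = \beta_1 + \beta_2$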
The R2s of the two models are identical. These two models are statistically equivalent.
4
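Model with $\varepsilon_i^* = x_{i2} - x_{i3}$ in place of $x_{i2}$:
$y_i = \beta_1^* + \beta_2^* \varepsilon_i^* + \beta_3^* x_{i3} + u_i$
$\quad = \beta_1^* + \beta_2^* (x_{i2} - x_{i3}) + \beta_3^* x_{i3} + u_i$
$\quad = \beta_1^* + \beta_2^* x_{i2} - \beta_2^* x_{i3} + \beta_3^* x_{i3} + u_i$
$\quad = \beta_1^* + \beta_2^* x_{i2} + (\beta_3^* - \beta_2^*) x_{i3} + u_i$
Comparing with the original model $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + u_i$:
$\beta_1^* = \beta_1$
$\beta_2^* = \beta_2$
$\beta_3^* - \beta_2^* = \beta_3 \rightarrow \beta_3^* - \beta_2 = \beta_3 \rightarrow \beta_3^* = \beta_2 + \beta_3$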
The R2s are not affected. This model is also statistically equivalent to the original model.
The coefficient for 𝒙𝐢𝟑 now has a different interpretation, because the ceteris paribus condition has changed.
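The coefficient relations derived for the two reparametrized models (2*x2 - 2 in place of x2, and x2 - x3 in place of x2) can be checked numerically. The sketch below uses simulated data with arbitrary, hypothetical coefficient values; the helper name `fit` is also an assumption.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients and R^2 for a design matrix that includes a constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return b, r2

rng = np.random.default_rng(2)
n = 300
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)
const = np.ones(n)

b, r2 = fit(np.column_stack([const, x2, x3]), y)               # original model
b_a, r2_a = fit(np.column_stack([const, 2 * x2 - 2, x3]), y)   # 2*x2 - 2 replaces x2
b_b, r2_b = fit(np.column_stack([const, x2 - x3, x3]), y)      # x2 - x3 replaces x2

# First model: beta1* = beta1 + beta2, beta2* = beta2 / 2, beta3* = beta3.
print(np.allclose(b_a, [b[0] + b[1], b[1] / 2, b[2]]))
# Second model: beta1* = beta1, beta2* = beta2, beta3* = beta2 + beta3.
print(np.allclose(b_b, [b[0], b[1], b[1] + b[2]]))
# R^2 is unchanged in both cases.
print(np.isclose(r2, r2_a), np.isclose(r2, r2_b))
```

All three checks are expected to hold up to floating-point error, because each transformed regressor set spans the same column space as the original one.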
5
The RESET test
A simple test on the functional form of the model, E{𝑦𝑖 | x𝑖 } = x𝑖 β. It is based on an auxiliary regression to test
whether additional nonlinear terms in x𝑖 are significant.
RESET = regression equation specification error test (Ramsey, 1969).
1- Construct the fitted value from the model: $\hat{y}_i = x_i \hat\beta$.
2- Test whether nonlinear functions of it help explain $y_i$.
Auxiliary regression: $y_i = x_i \beta + \alpha_2 \hat{y}_i^2 + \alpha_3 \hat{y}_i^3 + \dots + \alpha_Q \hat{y}_i^Q + \nu_i$
• RESET test = F-test on Q-1 restrictions ($\alpha_2 = \alpha_3 = \dots = \alpha_Q = 0$).
If you do not reject the null: the test result suggests that powers of the regressors do not jointly add to the explanatory power of the model, so there is no need to consider adding them to the model.
If you reject the null: the test result gives you strong evidence that you might want to consider adding powers of the independent variables to the model.
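The two RESET steps can be sketched as follows. This is a minimal illustration, not the implementation of any particular package: the function names `fit_rss` and `reset_f` are hypothetical, the data are simulated, and the F statistic is computed from residual sums of squares, which is equivalent to the R² form.

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares and fitted values from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    resid = y - fitted
    return resid @ resid, fitted

def reset_f(X, y, Q=3):
    """RESET F statistic using powers 2..Q of the fitted values."""
    n, k = X.shape
    rss0, yhat = fit_rss(X, y)                           # step 1: fitted values
    powers = np.column_stack([yhat ** q for q in range(2, Q + 1)])
    rss1, _ = fit_rss(np.column_stack([X, powers]), y)   # step 2: auxiliary regression
    j = Q - 1                                            # number of restrictions
    return ((rss0 - rss1) / j) / (rss1 / (n - k - j))

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 2.0 * x + rng.normal(size=n)                  # linear form is correct
y_curved = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(size=n)   # linear form is wrong

f_lin = reset_f(X, y_linear)
f_cur = reset_f(X, y_curved)
print(f"F (linear DGP) = {f_lin:.2f}, F (quadratic DGP) = {f_cur:.2f}")
```

Each statistic would be compared with critical values of the F distribution with Q-1 and n-k-(Q-1) degrees of freedom, where k counts all columns of X including the constant.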
6
Choosing between a linear and loglinear model
On the basis of economic interpretation: do we think effects are in level or by percentage?
Loglinear models are easily interpretable in terms of elasticities.
Note: Because the dependent variable is different (𝑦𝑖 and log(𝑦𝑖 )) a comparison on the basis of goodness-of-fit measures is inappropriate.
• Estimate the linear model $y_i = x_i \beta + u_i$ and construct the fitted value $\hat{y}_i$.
• Estimate the loglinear model and construct the fitted value $\widetilde{\log(y_i)}$.
• Test the linear model by testing $\delta_{LIN} = 0$ in
$y_i = x_i \beta + \delta_{LIN} (\log(\hat{y}_i) - \widetilde{\log(y_i)}) + u_i$
• Test the loglinear model by testing $\delta_{LOG} = 0$ in
$\log(y_i) = (\log x_i) \gamma + \delta_{LOG} (\hat{y}_i - \exp\{\widetilde{\log(y_i)}\}) + u_i$
• In both cases, use a standard t-test.
7
• Both are auxiliary regressions and are only estimated for testing purposes.
• If the linear model is not rejected, you’d prefer the linear model.
• If the loglinear model is not rejected, you’d prefer the loglinear model.
• If both are rejected, neither model is appropriate and a more general one should be
considered.
• It may be noted that with sufficiently general functional forms it is possible
to obtain models for 𝑦𝑖 and log 𝑦𝑖 that are both correct in the sense that
they represent E{𝑦𝑖 | x𝑖 } and E{log 𝑦𝑖 |x𝑖 }, respectively. It is not possible,
however, that both specifications have a homoskedastic error term
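
The two auxiliary regressions for comparing the linear and loglinear models can be sketched as below. This is an illustration under stated assumptions: the data are simulated from a loglinear relation with hypothetical parameter values, the regressor is kept positive so that its log exists, and the helper name `ols_t` is not from the course material.

```python
import numpy as np

def ols_t(X, y):
    """OLS coefficients, fitted values, and t-ratios."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - k) * np.diag(XtX_inv))
    return b, X @ b, b / se

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1.0, 10.0, size=n)                             # positive regressor
y = np.exp(1.0 + 0.5 * np.log(x) + 0.2 * rng.normal(size=n))   # loglinear DGP

X_lin = np.column_stack([np.ones(n), x])
X_log = np.column_stack([np.ones(n), np.log(x)])

_, yhat, _ = ols_t(X_lin, y)                  # fitted values from the linear model
_, logy_tilde, _ = ols_t(X_log, np.log(y))    # fitted values from the loglinear model

# Test the linear model: add log(yhat) - logy_tilde and inspect its t-ratio.
_, _, t_lin = ols_t(np.column_stack([X_lin, np.log(yhat) - logy_tilde]), y)
# Test the loglinear model: add yhat - exp(logy_tilde) and inspect its t-ratio.
_, _, t_log = ols_t(np.column_stack([X_log, yhat - np.exp(logy_tilde)]), np.log(y))

pe_lin, pe_log = t_lin[-1], t_log[-1]
print(f"t-ratio (linear model) = {pe_lin:.2f}, t-ratio (loglinear model) = {pe_log:.2f}")
```

Note that the first auxiliary regression needs strictly positive fitted values from the linear model, since their logarithm is taken; with real data this should be checked before running the test.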
8
Illustration: explaining house prices
An example of estimating a hedonic price function:
the price of a product (a house) is determined by the combination of its
characteristics.
Data: housing.dta
prices of 546 houses sold in Canada (1987). We observe characteristics like:
lot size, number of bedrooms, bathrooms, garage places, stories, and
dummy variables for presence of driveway, recreational room, full basement,
airco, gas hot water heating, and being located in a preferred area.
9
A simple model for log prices
𝑙𝑜𝑔(price) = 𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒) + 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠 + 𝛽3 𝑏𝑎𝑡ℎ𝑟𝑜𝑜𝑚𝑠 + 𝛽4 𝑎𝑖𝑟𝑐𝑜 + 𝑢
10
R2 = 0.5674: model explains 57% of variation in log prices.
Estimated elasticity of lot size is 0.4. That is, a 10% larger lot implies a 4%
higher expected price (ceteris paribus).
An additional bedroom increases the expected price by almost 8% (ceteris
paribus).
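The percentage interpretations above are approximations; the exact implied changes can be worked out as in the sketch below. The elasticity 0.4 is the value quoted above, while 0.08 is a rounded, hypothetical coefficient used only to illustrate a regressor measured in levels. Both function names are assumptions.

```python
import math

def pct_change_from_elasticity(elasticity, pct_increase):
    """Exact % change in y when x rises by pct_increase % in a log-log relation."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** elasticity - 1.0)

def pct_change_from_level_coef(coef, delta=1.0):
    """Exact % change in y when a level regressor rises by delta in a log-y model."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

lot_effect = pct_change_from_elasticity(0.4, 10.0)   # elasticity 0.4, 10% larger lot
level_effect = pct_change_from_level_coef(0.08)      # 0.08 is a hypothetical value
print(f"{lot_effect:.2f}% and {level_effect:.2f}%")
```

For the lot size this gives about 3.9%, close to the 4% obtained from the usual approximation; for small coefficients the approximation and the exact value differ only slightly.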
The functional form / model specification can be tested by
- means of a RESET test,
- including functions of explanatory variables,
- including other explanatory variables.
11
The RESET test:
A test on the functional form of the model, E{𝑦𝑖 | x𝑖 } = x𝑖 β.
$y_i = x_i \beta + \alpha_2 \hat{y}_i^2 + \alpha_3 \hat{y}_i^3 + \dots + \alpha_Q \hat{y}_i^Q + \nu_i$
RESET test = F-test on Q-1 restrictions (𝛼2 = 𝛼3 = ⋯ = 𝛼𝑄 = 0).
For the above model:
Q=2 : t = 0.514 (p = 0.61)
Q=3 : F = 0.56 (p = 0.57).
Thus, we cannot reject the null → we cannot reject the current specification.
However, this does not necessarily mean that other variables are irrelevant (i.e. have no
impact on the house prices).
In fact, we may want to include other characteristics too.
12
Include other characteristics:
𝑙𝑜𝑔(price) = 𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒) + 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠 + 𝛽3 𝑏𝑎𝑡ℎ𝑟𝑜𝑜𝑚𝑠 + 𝛽4 𝑎𝑖𝑟𝑐𝑜 + 𝛽5 𝑑𝑟𝑖𝑣𝑒𝑤𝑎𝑦 + 𝛽6 𝑟𝑒𝑐𝑟𝑜𝑜𝑚
+𝛽7 𝑓𝑢𝑙𝑙𝑏𝑎𝑠𝑒 + 𝛽8 𝑔𝑎𝑠ℎ𝑤 + 𝛽9 𝑔𝑎𝑟𝑎𝑔𝑒𝑝𝑙 + 𝛽10 𝑝𝑟𝑒𝑓𝑎𝑟𝑒𝑎 + 𝛽11 𝑠𝑡𝑜𝑟𝑖𝑒𝑠 + 𝑢
Table 3.2
13
• R2 increases to 0.6865 (69%).
• Coefficients on previous variables typically reduce (lot size, bedrooms,
bathrooms, airco). Why? Note that the ceteris paribus condition has
changed.
• All characteristics have a significant (and positive) impact upon the house
prices.
• F-test on the seven additional variables can be computed as
$f = \dfrac{(0.6865 - 0.5674)/7}{(1 - 0.6865)/(546 - 12)} \approx 29.0$,
corresponding to a p-value of 0.000. So?
14
Linear or loglinear model?
price = 𝛽0 + 𝛽1 𝑙𝑜𝑡𝑠𝑖𝑧𝑒 + 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠 + 𝛽3 𝑏𝑎𝑡ℎ𝑟𝑜𝑜𝑚𝑠 + 𝛽4 𝑎𝑖𝑟𝑐𝑜 + 𝛽5 𝑑𝑟𝑖𝑣𝑒𝑤𝑎𝑦 + 𝛽6 𝑟𝑒𝑐𝑟𝑜𝑜𝑚
+𝛽7 𝑓𝑢𝑙𝑙𝑏𝑎𝑠𝑒 + 𝛽8 𝑔𝑎𝑠ℎ𝑤 + 𝛽9 𝑔𝑎𝑟𝑎𝑔𝑒𝑝𝑙 + 𝛽10 𝑝𝑟𝑒𝑓𝑎𝑟𝑒𝑎 + 𝛽11 𝑠𝑡𝑜𝑟𝑖𝑒𝑠 + 𝑢
15
• R2 is now 0.6731 (67%).
• Coefficients now have interpretation in terms of $’s rather than %. That is,
this model assumes that the price effects are additive rather than
multiplicative.
For example, the presence of a driveway is estimated to increase the expected price
by $6688 (rather than 11% according to model for log prices).
• As before, all characteristics have a significant (and positive) impact upon
the house prices.
16
The PE-test:
• Test the linear model by testing $\delta_{LIN} = 0$ in
$y_i = x_i \beta + \delta_{LIN} (\log(\hat{y}_i) - \widetilde{\log(y_i)}) + u_i$
• Test the loglinear model by testing $\delta_{LOG} = 0$ in
$\log(y_i) = (\log x_i) \gamma + \delta_{LOG} (\hat{y}_i - \exp\{\widetilde{\log(y_i)}\}) + u_i$
In both cases, use standard t-test.
linear model: PE = -6.196 (strong rejection);
loglinear model: PE = -0.569 (no rejection).
(If the linear model is not rejected, you’d prefer the linear model.
If the loglinear model is not rejected, you’d prefer the loglinear model.
If both are rejected, neither model is appropriate and a more general one should be considered.)
17
Problem 12.
In the housing price example:
1- Interpret the coefficients of “log(lotsize)”, “bedrooms” and “bathrooms” in Table 3.2 and compare them with
the ones in Table 3.1. How do you justify the differences?
2- Use the RESET test, in Stata, to test the functional form of the specification in slide 13. (test for Q=1 and Q=2)
3- Generate four dummy variables relating to the number of bedrooms, corresponding to 2 or less, 3, 4, and 5 or
more. Estimate a model for log prices that includes log lot size, the number of bathrooms, the air conditioning
dummy and three of these dummies. Interpret the results.
4- Why is the model in question 3 not nested in the specification that is reported in Table 3.1?
(Two models are said to be 'non-nested' if neither can be obtained from the other by the imposition of
appropriate parametric restrictions or as a limit of a suitable approximation.)
18
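
For question 3, the construction of the bedroom dummies can be sketched as follows. The problem asks for Stata, so this Python version is only an illustration of the logic: the bedroom counts are made-up values, and the function name `bedroom_dummies` is hypothetical.

```python
import numpy as np

def bedroom_dummies(bedrooms):
    """Four 0/1 columns: 2 or fewer, exactly 3, exactly 4, 5 or more bedrooms."""
    b = np.asarray(bedrooms)
    return np.column_stack([b <= 2, b == 3, b == 4, b >= 5]).astype(float)

bedrooms = np.array([1, 2, 3, 3, 4, 5, 6])   # made-up values for illustration
D = bedroom_dummies(bedrooms)

# The four dummies sum to one in every row, so together with a constant they
# are exactly collinear; keep three and treat the omitted group as reference.
D_used = D[:, 1:]                             # reference category: 2 or fewer
print(D.sum(axis=1))
print(D_used.shape)
```

Each included dummy coefficient is then interpreted relative to the omitted category, here houses with two or fewer bedrooms.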