NOVA IMS Information Management School
Pooled OLS. Fixed and Random Effects.
Econometrics II
Bruno Damásio
[email protected]
@bmpdamasio
Carolina Vasconcelos
[email protected]
@vasconceloscm
2022/2023
Nova Information Management School
NOVA University of Lisbon
Instituto Superior de Estatística e Gestão da Informação
Universidade Nova de Lisboa
Table of contents
1. Motivation
2. Pooled OLS
3. Random Effects Estimation
4. Fixed Effects Estimation
5. Hausman Test
Motivation
• So far, we have covered cross-sectional and time series data.
• A main motivation for using panel data is to address the omitted variables problem.
• Let $y$ and $x = (x_1, x_2, \dots, x_k)$ be observable random variables, and let $c$ be an unobservable random variable.
• As is often the case, we are interested in the partial effects of the observable explanatory variables $x_j$ in the population regression function
$$E[y \mid x_1, x_2, \dots, x_k, c]$$
• In other words, we would like to hold $c$ constant when obtaining the partial effects of the observable explanatory variables.
• Assuming a linear model, with $c$ entering additively along with the $x_j$, we have:
$$E[y \mid x, c] = \beta_0 + x\beta + c$$
• If $c$ is uncorrelated with each $x_j$, then $c$ is just another unobserved factor affecting $y$ that is not systematically related to the observable explanatory variables whose effects are of interest.
• On the other hand, if $\mathrm{Cov}(x_j, c) \neq 0$ for some $j$, putting $c$ into the error term makes the error correlated with $x_j$, so OLS of $y$ on $x$ is biased and inconsistent for $\beta$.
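For a concrete view of the problem, take the one-regressor case $y = \beta_0 + \beta_1 x + c + u$ with $E[u \mid x, c] = 0$ (so $\mathrm{Cov}(x, u) = 0$), and suppose $c$ is left in the error while $y$ is regressed on $x$ alone. The standard omitted-variable calculation gives the probability limit of the OLS slope:

```latex
\operatorname{plim}\,\hat\beta_1
  = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}
  = \frac{\mathrm{Cov}(x,\ \beta_0 + \beta_1 x + c + u)}{\mathrm{Var}(x)}
  = \beta_1 + \frac{\mathrm{Cov}(x, c)}{\mathrm{Var}(x)}
```

The second term vanishes only when $\mathrm{Cov}(x, c) = 0$; for example, with $\mathrm{Cov}(x, c) = 0.5$ and $\mathrm{Var}(x) = 1.25$ the slope is off by 0.4 no matter how large the sample.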
• In addition to "unobserved effect", there are many other names given to $c_i$ in applications (the subscript $i$ indexes cross-sectional units): unobserved component, latent variable, unobserved heterogeneity.
• $c_i$ can be treated as a random effect or a fixed effect.
Example
Suppose we want to study the factors that influence the wages of working adults. Our regression is:
$$\log(wage_{it}) = \beta_0 + \beta_1 educ_{it} + \beta_2 exper_{it} + \beta_3 exper_{it}^2 + \beta_4 married_{it} + \beta_5 union_{it} + c_i + u_{it}$$
where educ is the number of years of schooling, exper is the number of years of work experience, married is a dummy variable equal to 1 if the individual is married, and union is a dummy variable equal to 1 if the individual is unionized.
Here, the unobserved heterogeneity $c_i$ could be the individual's ability. Since ability is plausibly correlated with regressors such as educ, ignoring this term could lead to biased and inconsistent estimators.
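library(plm)         # panel data estimators
library(wooldridge)  # provides the wagepan data set
# Declare the panel structure: individual identifier nr, time identifier year
pdata <- pdata.frame(wagepan, index=c('nr','year'))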
Pooled OLS
• Pooled OLS simply applies OLS to the sample of $N \times T$ observations, ignoring the panel structure of the data.
• Under certain assumptions, pooled OLS gives a consistent estimator of $\beta$ in the model below:
$$y_{it} = x_{it}\beta + v_{it}, \quad t = 1, 2, \dots, T$$
where $v_{it} = c_i + u_{it}$ are the composite errors.
• For each $t$, $v_{it}$ is the sum of the unobserved effect $c_i$ and an idiosyncratic error $u_{it}$.
Suppose we have a random sample of $N$ cross-sectional units, each observed over $T$ time periods, and that $c_i$ is independent of $u_{it}$. Consider the following assumptions:
POLS.1: Contemporaneous exogeneity
$E[v_{it} \mid x_{it}] = 0$, which implies $E[x_{it}' v_{it}] = 0$. The latter holds if $E[x_{it}' c_i] = 0$ and $E[x_{it}' u_{it}] = 0$.
POLS.2: No perfect collinearity
$\operatorname{rank} E[X_i' X_i] = k$
Under POLS.1 and POLS.2, pooled OLS is consistent for $\beta$ (as $N \to \infty$ with $T$ fixed). However, if $c_i$ is correlated with any element of $x_{it}$, POLS.1 fails and pooled OLS is biased and inconsistent.
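A small simulation can illustrate the last point. The sketch below uses a hypothetical data-generating process (not the wagepan data), $y_{it} = 1 + 2x_{it} + c_i + u_{it}$, and runs pooled OLS with `lm()` once with $x_{it}$ unrelated to $c_i$ and once with $x_{it} = 0.5c_i + e_{it}$. The helper `simulate_pols()` and all parameter values are illustrative assumptions.

```r
# Hypothetical balanced panel: y_it = 1 + 2 * x_it + c_i + u_it
set.seed(123)
n_ind <- 2000                          # N individuals
n_per <- 5                             # T periods
id <- rep(seq_len(n_ind), each = n_per)

simulate_pols <- function(rho) {
  c_i <- rnorm(n_ind)[id]                        # unobserved effect, repeated over t
  x <- rho * c_i + rnorm(n_ind * n_per)          # rho != 0 correlates x_it with c_i
  y <- 1 + 2 * x + c_i + rnorm(n_ind * n_per)    # composite error v_it = c_i + u_it
  coef(lm(y ~ x))[["x"]]                         # pooled OLS slope, panel structure ignored
}

b_uncorrelated <- simulate_pols(rho = 0)    # POLS.1 holds
b_correlated   <- simulate_pols(rho = 0.5)  # POLS.1 fails
c(uncorrelated = b_uncorrelated, correlated = b_correlated)
```

With $\mathrm{Var}(c_i) = \mathrm{Var}(e_{it}) = 1$, the omitted-variable term is $\mathrm{Cov}(x_{it}, c_i)/\mathrm{Var}(x_{it}) = 0.5/1.25 = 0.4$, so the second estimate should settle near 2.4 rather than the true value 2.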
Pooled OLS: Assumptions
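• In this setting, the composite errors $v_{it}$ are serially correlated because $c_i$ appears in every time period.
• Therefore, inference using pooled OLS requires a variance matrix estimator that is robust to this serial correlation (and to heteroskedasticity), i.e. standard errors clustered by individual.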
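Assuming $c_i$ and $u_{it}$ are independent, with $u_{it}$ homoskedastic and serially uncorrelated, the composite errors satisfy $\mathrm{Corr}(v_{it}, v_{is}) = \sigma_c^2/(\sigma_c^2 + \sigma_u^2)$ for $t \neq s$. A quick base-R simulation sketch, with hypothetical values $\sigma_c = 1$ and $\sigma_u = 0.5$, checks this:

```r
set.seed(42)
n_ind <- 5000; n_per <- 4
sigma_c <- 1; sigma_u <- 0.5
c_i <- rnorm(n_ind, sd = sigma_c)
# v[i, t] = c_i + u_it: one row per individual, one column per period
v <- matrix(c_i, n_ind, n_per) +
  matrix(rnorm(n_ind * n_per, sd = sigma_u), n_ind, n_per)
rho_theory <- sigma_c^2 / (sigma_c^2 + sigma_u^2)   # 1 / 1.25 = 0.8
rho_sample <- cor(v[, 1], v[, 2])                   # periods 1 and 2
c(theory = rho_theory, sample = rho_sample)
```

The larger the share of $\sigma_c^2$ in the total variance, the stronger the serial correlation and the more misleading the conventional OLS standard errors become.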
Pooled OLS - Example
# Pooled OLS: OLS on the stacked N x T observations
pool <- plm(lwage ~ educ + exper + expersq
+ married + union, data=pdata, model='pooling')
summary(pool)
## Pooling Model
##
## Call:
## plm(formula = lwage ~ educ + exper + expersq + married + union,
## data = pdata, model = "pooling")
##
## Balanced Panel: n = 545, T = 8, N = 4360
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -5.250263 -0.251143 0.032958 0.296583 2.578665
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -0.03430571 0.06325594 -0.5423 0.5876177
## educ 0.09899449 0.00462272 21.4148 < 2.2e-16 ***
## exper 0.08616963 0.01014151 8.4967 < 2.2e-16 ***
## expersq -0.00273490 0.00070989 -3.8526 0.0001186 ***
## married 0.12301124 0.01557145 7.8998 3.511e-15 ***
## union 0.16852431 0.01706519 9.8753 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 1236.5
## Residual Sum of Squares: 1015.2
## R-Squared: 0.179
## Adj. R-Squared: 0.17806
## F-statistic: 189.857 on 5 and 4354 DF, p-value: < 2.22e-16
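To obtain standard errors that are robust to heteroskedasticity and to serial correlation within individuals (clustered by individual):
library(lmtest)  # provides coeftest()
# Cluster-robust covariance from plm's vcovHC(), clustered by individual ("group")
coeftest(pool,vcov=vcovHC(pool,type="HC0",cluster="group"))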
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03430571 0.11468931 -0.2991 0.764864
## educ 0.09899449 0.00891189 11.1081 < 2.2e-16 ***
## exper 0.08616963 0.01267493 6.7984 1.201e-11 ***
## expersq -0.00273490 0.00089009 -3.0726 0.002135 **
## married 0.12301124 0.02566248 4.7934 1.694e-06 ***
## union 0.16852431 0.02784722 6.0517 1.553e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
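The cluster-robust variance has a sandwich form, $(X'X)^{-1}\left(\sum_i X_i'\hat v_i \hat v_i' X_i\right)(X'X)^{-1}$, where $X$ stacks all $N \times T$ rows and $\hat v_i$ holds individual $i$'s pooled OLS residuals. The base-R sketch below computes it by hand on simulated data (a hypothetical DGP in which both $x_{it}$ and the composite error are persistent within individuals) and compares it with the conventional OLS standard errors. We believe this corresponds to `vcovHC(pool, type = "HC0", cluster = "group")` for a pooling model, but treat that correspondence as an assumption.

```r
set.seed(7)
n_ind <- 500; n_per <- 6
id <- rep(seq_len(n_ind), each = n_per)
a_i <- rnorm(n_ind)[id]                   # makes x_it persistent within individual
c_i <- rnorm(n_ind)[id]                   # unobserved effect in the composite error
x <- a_i + rnorm(n_ind * n_per)
y <- 1 + 2 * x + c_i + rnorm(n_ind * n_per)

X <- cbind("(Intercept)" = 1, x = x)
v_hat <- lm.fit(X, y)$residuals           # pooled OLS residuals
XtX_inv <- solve(crossprod(X))

# "Meat": sum over individuals of (X_i' v_i)(X_i' v_i)'
meat <- matrix(0, ncol(X), ncol(X))
for (g in split(seq_along(id), id)) {
  s <- crossprod(X[g, , drop = FALSE], v_hat[g])   # X_i' v_i (k x 1)
  meat <- meat + s %*% t(s)
}
V_cluster <- XtX_inv %*% meat %*% XtX_inv          # no small-sample correction
V_naive <- sum(v_hat^2) / (length(y) - ncol(X)) * XtX_inv

se <- cbind(naive = sqrt(diag(V_naive)), cluster = sqrt(diag(V_cluster)))
se
```

Because both the regressor and the error are correlated within individuals, the clustered standard error on x should be noticeably larger than the naive one. This mirrors the pattern in the output above, where, for example, the standard error on educ rises from 0.0046 to 0.0089.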
Random Effects Estimation
As with pooled OLS, a random effects analysis puts $c_i$ into the error term.
In fact, random effects analysis imposes more assumptions than those needed for pooled OLS: strict exogeneity, in addition to orthogonality between $c_i$ and $x_{it}$.
RE.1: Strict exogeneity
(a) $E[u_{it} \mid x_i, c_i] = 0$, $t = 1, \dots, T$
(b) $E[c_i \mid x_i] = E[c_i] = 0$
where $x_i = (x_{i1}, x_{i2}, \dots, x_{iT})$.
• The random effects approach exploits the serial correlation in the composite error, $v_{it} = c_i + u_{it}$, in a generalized least squares (GLS) framework.
• We can write the model for all $T$ time periods as
$$y_i = X_i\beta + v_i$$
where $v_i = c_i j_T + u_i$ and $j_T$ is the $T \times 1$ vector of ones.
• We define the (unconditional) variance matrix of $v_i$ as
$$\Omega = E[v_i v_i']$$
a $T \times T$ matrix that we assume to be positive definite.
For consistency of GLS, we need the usual rank condition for GLS:
RE.2: No perfect collinearity
$\operatorname{rank} E[X_i' \Omega^{-1} X_i] = k$
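RE.3: Homoskedasticity
(a) $E[u_i u_i' \mid x_i, c_i] = \sigma_u^2 I_T$
(b) $E[c_i^2 \mid x_i] = \sigma_c^2$
• It can be shown that $E[v_{it}^2] = E[c_i^2] + 2E[c_i u_{it}] + E[u_{it}^2] = \sigma_c^2 + \sigma_u^2$ and, for $t \neq s$, $E[v_{it} v_{is}] = E[(c_i + u_{it})(c_i + u_{is})] = E[c_i^2] = \sigma_c^2$, because the cross terms $E[c_i u_{it}]$, $E[c_i u_{is}]$ and $E[u_{it} u_{is}]$ are zero under RE.1 and RE.3.
• Hence, $\Omega$ is given by
$$\Omega = \begin{pmatrix}
\sigma_c^2 + \sigma_u^2 & \sigma_c^2 & \cdots & \sigma_c^2 \\
\sigma_c^2 & \sigma_c^2 + \sigma_u^2 & \cdots & \vdots \\
\vdots & \vdots & \ddots & \sigma_c^2 \\
\sigma_c^2 & \cdots & \sigma_c^2 & \sigma_c^2 + \sigma_u^2
\end{pmatrix} = \sigma_u^2 I_T + \sigma_c^2 j_T j_T'$$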
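The form $\Omega = \sigma_u^2 I_T + \sigma_c^2 j_T j_T'$ has a convenient inverse square root, $\Omega^{-1/2} = \sigma_u^{-1}(I_T - \theta\, j_T j_T'/T)$ with $\theta = 1 - \sqrt{\sigma_u^2/(\sigma_u^2 + T\sigma_c^2)}$, which is why GLS in this model amounts to subtracting a fraction $\theta$ of each individual mean. The sketch below checks the closed form numerically against an eigendecomposition, for illustrative values $\sigma_c^2 = 1$, $\sigma_u^2 = 0.25$ and $T = 4$.

```r
sigma_c2 <- 1; sigma_u2 <- 0.25; n_per <- 4
j <- rep(1, n_per)
# Omega = sigma_u^2 I_T + sigma_c^2 j_T j_T'
Omega <- sigma_u2 * diag(n_per) + sigma_c2 * tcrossprod(j)

# Inverse square root via the eigendecomposition (Omega is symmetric positive definite)
e <- eigen(Omega, symmetric = TRUE)
Omega_inv_half <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# Closed form: (1 / sigma_u) * (I_T - theta * j_T j_T' / T)
theta <- 1 - sqrt(sigma_u2 / (sigma_u2 + n_per * sigma_c2))
closed_form <- (diag(n_per) - theta * tcrossprod(j) / n_per) / sqrt(sigma_u2)

max(abs(Omega_inv_half - closed_form))   # numerically zero
```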
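Let $\hat\beta_{RE}$ be the random effects estimator, i.e. the feasible GLS estimator that replaces $\Omega$ with an estimate $\hat\Omega$ built from estimates of $\sigma_c^2$ and $\sigma_u^2$:
Under assumptions RE.1 and RE.2, $\hat\beta_{RE}$ is consistent.
Under assumptions RE.1, RE.2 and RE.3, $\hat\beta_{RE}$ is asymptotically efficient.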
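To see what the GLS step does, the sketch below assumes the variance components are known and uses a simulated panel (hypothetical DGP). It computes GLS directly from $\sum_i X_i'\Omega^{-1}X_i$ and $\sum_i X_i'\Omega^{-1}y_i$, and separately runs OLS on the quasi-demeaned data $y_{it} - \theta\bar y_i$ and $x_{it} - \theta\bar x_i$ (the constant becomes $1 - \theta$). The two should agree.

```r
set.seed(1)
n_ind <- 200; n_per <- 5
sigma_c <- 1; sigma_u <- 0.5
id <- rep(seq_len(n_ind), each = n_per)
x <- rnorm(n_ind * n_per)
y <- 1 + 2 * x + rnorm(n_ind, sd = sigma_c)[id] + rnorm(n_ind * n_per, sd = sigma_u)
X <- cbind(1, x)

# GLS: (sum_i X_i' Omega^{-1} X_i)^{-1} sum_i X_i' Omega^{-1} y_i, with Omega known
Omega <- sigma_u^2 * diag(n_per) + sigma_c^2 * matrix(1, n_per, n_per)
Omega_inv <- solve(Omega)
A <- matrix(0, 2, 2)
b <- matrix(0, 2, 1)
for (i in seq_len(n_ind)) {
  rows <- which(id == i)
  Xi <- X[rows, , drop = FALSE]
  A <- A + t(Xi) %*% Omega_inv %*% Xi
  b <- b + t(Xi) %*% Omega_inv %*% y[rows]
}
beta_gls <- as.vector(solve(A, b))

# Quasi-demeaning: subtract theta times the individual mean (constant included)
theta <- 1 - sqrt(sigma_u^2 / (sigma_u^2 + n_per * sigma_c^2))
y_star <- y - theta * ave(y, id)
X_star <- cbind(1 - theta, x - theta * ave(x, id))
beta_qd <- as.vector(solve(crossprod(X_star), crossprod(X_star, y_star)))

rbind(gls = beta_gls, quasi_demeaned = beta_qd)
```

In practice $\sigma_c^2$ and $\sigma_u^2$ are estimated first (plm's default for the random effects model is the Swamy-Arora method) and the same transformation is applied with the estimated $\hat\theta$.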
library(plm)
# Random effects (feasible GLS) estimator
re <- plm(lwage ~ educ + exper + expersq
+ married + union, model='random', data=pdata)
summary(re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = lwage ~ educ + exper + expersq + married + union,
## data = pdata, model = "random")
##
## Balanced Panel: n = 545, T = 8, N = 4360
##
## Effects:
## var std.dev share
## idiosyncratic 0.1234 0.3513 0.536
## individual 0.1068 0.3268 0.464
## theta: 0.6448
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -4.571195 -0.144684 0.022637 0.185205 1.540599
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -0.11868027 0.10716727 -1.1074 0.2681
## educ 0.10120098 0.00877628 11.5312 < 2.2e-16 ***
## exper 0.11147576 0.00826109 13.4941 < 2.2e-16 ***
## expersq -0.00404533 0.00059199 -6.8334 8.292e-12 ***
## married 0.06683014 0.01673672 3.9930 6.524e-05 ***
## union 0.10415011 0.01781439 5.8464 5.023e-09 ***
## ---
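As a consistency check on this output, the reported theta can be recomputed from the printed variance components with the balanced-panel formula $\theta = 1 - \sqrt{\sigma_u^2/(\sigma_u^2 + T\sigma_c^2)}$. We assume this is the formula behind plm's `theta` line; the inputs are the rounded values printed above, so tiny rounding differences are possible.

```r
# Variance components as printed by summary(re) (rounded to 4 decimals)
sigma_u2 <- 0.1234   # idiosyncratic
sigma_c2 <- 0.1068   # individual
n_per <- 8           # T = 8 in the balanced wagepan panel

theta <- 1 - sqrt(sigma_u2 / (sigma_u2 + n_per * sigma_c2))
round(theta, 4)
```

A theta close to 1 means the RE estimator is close to FE (full demeaning); a theta close to 0 means it is close to pooled OLS.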
Fixed Effects Estimation
• Let's consider again the linear unobserved effects model for $T$ time periods:
$$y_{it} = x_{it}\beta + c_i + u_{it}, \quad t = 1, \dots, T$$
• Stacking the $T$ equations for individual $i$:
$$y_i = X_i\beta + c_i j_T + u_i$$
where $j_T$ is the $T \times 1$ vector of ones. Fixed effects analysis allows $c_i$ to be arbitrarily correlated with $x_{it}$.
• The first fixed effects (FE) assumption is strict exogeneity of the explanatory variables conditional on $c_i$:
FE.1: Strict exogeneity
$E[u_{it} \mid x_i, c_i] = 0$, $t = 1, 2, \dots, T$
• In fixed effects analysis, $E[c_i \mid x_i]$ is allowed to be any function of $x_i$.
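• Essentially, we relax assumption RE.1(b). This allows us to consistently estimate partial effects in the presence of time-constant omitted variables that can be arbitrarily related to the observable $x_{it}$.
• Therefore, fixed effects analysis is more robust than random effects analysis. The price is that coefficients on time-constant regressors cannot be estimated, because such regressors cannot be distinguished from $c_i$.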
• The idea for estimating $\beta$ under Assumption FE.1 is to transform the equations to eliminate the unobserved effect $c_i$. This transformation is called the fixed effects transformation or within transformation.
• Averaging the equation over $t$ gives the cross-section equation
$$\bar y_i = \bar x_i\beta + c_i + \bar u_i$$
where $\bar y_i = T^{-1}\sum_{t=1}^{T} y_{it}$, $\bar x_i = T^{-1}\sum_{t=1}^{T} x_{it}$ and $\bar u_i = T^{-1}\sum_{t=1}^{T} u_{it}$. Subtracting this equation from the equation for each $t$ gives the FE transformed equation:
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)\beta + u_{it} - \bar u_i$$
or
$$\ddot y_{it} = \ddot x_{it}\beta + \ddot u_{it}, \quad t = 1, 2, \dots, T$$
• Since $E[\ddot x_{it}' \ddot u_{it}] = 0$ holds under Assumption FE.1, we can apply pooled OLS to the transformed equation.
• The fixed effects (FE) estimator, denoted $\hat\beta_{FE}$, is the pooled OLS estimator from the regression of $\ddot y_{it}$ on $\ddot x_{it}$.
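The within transformation is easy to reproduce by hand. The sketch below simulates a hypothetical panel with a time-varying regressor $x_{it}$ correlated with $c_i$ and a time-constant regressor $z_i$ (playing the role of educ), demeans by individual with `ave()`, and compares the FE slope with a regression on individual dummies (LSDV) and with pooled OLS. Variable names and parameter values are illustrative assumptions.

```r
set.seed(2023)
n_ind <- 300; n_per <- 5
id <- rep(seq_len(n_ind), each = n_per)
c_i <- rnorm(n_ind)[id]
x <- 0.8 * c_i + rnorm(n_ind * n_per)     # x_it correlated with c_i
z <- rnorm(n_ind)[id]                     # time-constant regressor (like educ)
y <- 1 + 2 * x + 0.5 * z + c_i + rnorm(n_ind * n_per)

# Within (FE) transformation: subtract individual means
demean <- function(v) v - ave(v, id)
y_dd <- demean(y); x_dd <- demean(x); z_dd <- demean(z)

max(abs(z_dd))                                  # z is wiped out (numerically zero)

b_fe   <- sum(x_dd * y_dd) / sum(x_dd^2)        # pooled OLS of y_dd on x_dd, no intercept
b_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]   # individual dummies give the same slope
b_pols <- coef(lm(y ~ x + z))[["x"]]            # pooled OLS is inconsistent here
c(fe = b_fe, lsdv = b_lsdv, pooled = b_pols)
```

The demeaned $z$ is identically zero, which is why educ, being constant over time for each individual, does not appear in the fixed effects output later in this section. The FE and dummy-variable slopes coincide, as the Frisch-Waugh-Lovell theorem implies.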
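FE.2: No perfect collinearity
$\operatorname{rank}\left(\sum_{t=1}^{T} E[\ddot x_{it}' \ddot x_{it}]\right) = k$
FE.3: Homoskedasticity
$E[u_i u_i' \mid x_i, c_i] = \sigma_u^2 I_T$
It can be shown that the transformed errors $\ddot u_{it}$ are negatively serially correlated: under FE.3, $\mathrm{Corr}(\ddot u_{it}, \ddot u_{is}) = -1/(T-1)$ for $t \neq s$, which tends to zero as $T$ grows.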
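The $-1/(T-1)$ correlation can be checked by simulation. The sketch draws iid homoskedastic $u_{it}$ (so FE.3 holds), applies the within transformation, and compares the sample correlation of $\ddot u_{i1}$ and $\ddot u_{i2}$ with the theoretical value for $T = 4$; the sample sizes are arbitrary choices.

```r
set.seed(99)
n_ind <- 20000; n_per <- 4
u <- matrix(rnorm(n_ind * n_per), n_ind, n_per)   # row i = (u_i1, ..., u_iT)
u_dd <- u - rowMeans(u)                           # within transformation: u_it - ubar_i
sample_corr <- cor(u_dd[, 1], u_dd[, 2])
c(theory = -1 / (n_per - 1), sample = sample_corr)

# The theoretical correlation shrinks toward zero as T grows
sapply(c(2, 5, 10, 50), function(t_len) -1 / (t_len - 1))
```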
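library(plm)
# Fixed effects (within) estimator. With a plain data frame, plm takes the first two
# columns of wagepan (nr, year) as the panel index, so this matches using data = pdata.
# educ is constant over time for each individual, so it drops out of the within model.
fe <- plm(lwage ~ educ + exper + expersq
+ married + union, model='within', data=wagepan)
summary(fe)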
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = lwage ~ educ + exper + expersq + married + union,
## data = wagepan, model = "within")
##
## Balanced Panel: n = 545, T = 8, N = 4360
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -4.1726214 -0.1257010 0.0092527 0.1595770 1.4701690
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## exper 0.11684669 0.00841968 13.8778 < 2.2e-16 ***
## expersq -0.00430089 0.00060527 -7.1057 1.422e-12 ***
## married 0.04530332 0.01830968 2.4743 0.01339 *
## union 0.08208713 0.01929073 4.2553 2.138e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 572.05
## Residual Sum of Squares: 470.2
## R-Squared: 0.17804
## Adj. R-Squared: 0.059852
## F-statistic: 206.375 on 4 and 3811 DF, p-value: < 2.22e-16
Hausman Test
• Since the key consideration in choosing between a random effects and a fixed effects approach is whether $c_i$ and $x_{it}$ are correlated, it is important to have a method for testing this assumption.
• Hausman (1978) proposed a test based on the difference between the random effects and fixed effects estimates. Since FE is consistent when $c_i$ and $x_{it}$ are correlated, but RE is inconsistent, a statistically significant difference is interpreted as evidence against the random effects assumption RE.1(b).
• The hypotheses are $H_0: E[x_{it}' c_i] = 0$ (both FE and RE are consistent, and RE is the efficient estimator) vs $H_1: E[x_{it}' c_i] \neq 0$ (only FE is consistent).
• The test applies only to the coefficients on time-varying regressors, since FE cannot estimate coefficients on time-constant ones.
• If assumption RE.3 fails, the classical statistic is not valid and one must use a robust Hausman test.
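The classical statistic compares the estimates of the $M$ coefficients that both models identify: $H = (\hat\beta_{FE} - \hat\beta_{RE})'[\widehat{\mathrm{Avar}}(\hat\beta_{FE}) - \widehat{\mathrm{Avar}}(\hat\beta_{RE})]^{-1}(\hat\beta_{FE} - \hat\beta_{RE})$, which is asymptotically $\chi^2_M$ under $H_0$ and RE.3. A minimal base-R sketch of that formula follows; the function name `hausman_stat()` and the numbers in the usage line are hypothetical, not taken from the wagepan results.

```r
# Classical Hausman statistic for the coefficients common to FE and RE
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re
  H <- drop(t(d) %*% solve(V_fe - V_re, d))   # quadratic form in the difference
  df <- length(d)                             # number of compared coefficients
  c(stat = H, df = df, p.value = pchisq(H, df = df, lower.tail = FALSE))
}

# Hypothetical one-coefficient example (made-up numbers)
res <- hausman_stat(b_fe = 0.5, b_re = 0.3, V_fe = matrix(0.02), V_re = matrix(0.01))
res
```

In finite samples the variance difference need not be positive definite, which is one reason to rely on a packaged implementation rather than this sketch.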
# Classical Hausman test comparing the FE and RE estimates
phtest(fe, re)
##
## Hausman Test
##
## data: lwage ~ educ + exper + expersq + married + union
## chisq = 33.712, df = 4, p-value = 8.537e-07
## alternative hypothesis: one model is inconsistent
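The reported p-value can be reproduced from the $\chi^2_4$ distribution. The four degrees of freedom correspond to exper, expersq, married and union, the coefficients shared by the two models (educ is absent from the fixed effects model).

```r
# Upper-tail probability of a chi-squared(4) at the reported statistic
p_hausman <- pchisq(33.712, df = 4, lower.tail = FALSE)
p_hausman
```

A p-value this small rejects $H_0$, so the evidence favours the fixed effects estimates over the random effects ones.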
# Breusch-Pagan test for heteroskedasticity (checks assumption RE.3)
library(lmtest)
bptest(lwage ~ educ + exper + expersq
+ married + union + factor(nr), data=pdata)
##
## studentized Breusch-Pagan test
##
## data: lwage ~ educ + exper + expersq + married + union + factor(nr)
## BP = 761.47, df = 548, p-value = 3.716e-09
Robust Hausman test
# Called with two fitted models, phtest() ignores a vcov argument and returns the classical test.
# The robust version is the regression-based test (method = "aux") via the formula interface:
phtest(lwage ~ educ + exper + expersq + married + union, data = pdata,
       method = "aux", vcov = function(x) vcovHC(x, method = "white2", type = "HC3"))