Topic 7 Panel Data Analysis II
http://rizaudinsahlan.blogspot.my
RIZAUDIN SAHLAN 1
4.1 FIXED EFFECTS ESTIMATION
• First differencing is just one of the many ways to eliminate the
fixed effect, 𝑢𝑖.
• An alternative method is called the fixed effects
transformation.
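• Consider a model with a single explanatory variable: for each
𝑖,
$y_{it} = \beta_1 x_{it} + u_i + e_{it}, \quad t = 1, 2, \dots, T.$
• Averaging this equation over time for each 𝑖 gives
$\bar{y}_i = \beta_1 \bar{x}_i + u_i + \bar{e}_i$ (4.2)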
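• Subtracting the time-averaged equation, Eq(4.2), from the model
𝑦𝑖𝑡 = 𝛽1 𝑥𝑖𝑡 + 𝑢𝑖 + 𝑒𝑖𝑡 for each 𝑡 removes 𝑢𝑖:
$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + e_{it} - \bar{e}_i$
where $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$, and so on,
or
$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{e}_{it}, \quad t = 1, 2, \dots, T,$
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on 𝑦, and similarly for 𝑥 and 𝑒.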
• The between estimator is obtained by applying OLS to
Eq(4.2).
• We use the time averages of both 𝑦 and 𝑥 and then run a
cross-sectional regression.
• We do not study the between estimator in detail because it is
biased when 𝑢𝑖 is correlated with 𝑥.
• If we think 𝑢𝑖 is uncorrelated with 𝑥, it is better to use the
random effects estimator.
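• Adding more explanatory variables to the equation, the
unobserved effects model becomes
$y_{it} = \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + u_i + e_{it}, \quad t = 1, 2, \dots, T.$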
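The difference between the within (time-demeaning) and between estimators can be sketched in a few lines of Python. The panel below is made up for illustration (it is not JTRAIN), and the helper names `unit_means`, `within_slope` and `between_slope` are hypothetical. Because 𝑦 is built as 2𝑥 plus a unit-specific constant that is correlated with 𝑥, the within estimator should recover a slope of 2 (up to rounding), while the between estimator should not.

```python
from collections import defaultdict

# Made-up panel rows: (unit, x, y). By construction y = 2*x + u_i with
# u_a = 10, u_b = 0, u_c = -5 and no idiosyncratic noise.
panel = [
    ("a", 1.0, 12.0), ("a", 2.0, 14.0), ("a", 4.0, 18.0),
    ("b", 0.0, 0.0), ("b", 1.0, 2.0), ("b", 5.0, 10.0),
    ("c", 3.0, 1.0), ("c", 3.5, 2.0), ("c", 5.5, 6.0),
]

def unit_means(rows):
    """Return {unit: (mean of x, mean of y)} over each unit's periods."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for unit, x, y in rows:
        totals[unit][0] += x
        totals[unit][1] += y
        totals[unit][2] += 1
    return {u: (sx / n, sy / n) for u, (sx, sy, n) in totals.items()}

def within_slope(rows):
    """OLS slope on time-demeaned data (demeaning removes the intercept)."""
    means = unit_means(rows)
    num = sum((x - means[u][0]) * (y - means[u][1]) for u, x, y in rows)
    den = sum((x - means[u][0]) ** 2 for u, x, y in rows)
    return num / den

def between_slope(rows):
    """OLS slope (with intercept) of unit means of y on unit means of x."""
    means = list(unit_means(rows).values())
    mx = sum(m[0] for m in means) / len(means)
    my = sum(m[1] for m in means) / len(means)
    num = sum((m[0] - mx) * (m[1] - my) for m in means)
    den = sum((m[0] - mx) ** 2 for m in means)
    return num / den

beta_within = within_slope(panel)
beta_between = between_slope(panel)
print(beta_within, beta_between)
```

In this toy panel the unit effects are correlated with the unit averages of 𝑥, which is exactly the situation in which the between estimator is biased.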
• For each cross-sectional observation 𝑖, we lose one 𝑑𝑓
because of the time-demeaning.
• Therefore, the appropriate 𝑑𝑓 is 𝑑𝑓 = 𝑁𝑇 − 𝑁 − 𝑘, or equivalently
𝑑𝑓 = 𝑁(𝑇 − 1) − 𝑘.
• Fortunately, modern regression packages that have a fixed
effects estimation feature compute the 𝑑𝑓 properly.
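As a quick arithmetic check of the 𝑑𝑓 formula, the sketch below plugs in the dimensions of the JTRAIN example used in this topic: 54 firms, three years, and four regressors (d88, d89, grant, grant_1). The function name `fe_degrees_of_freedom` is a hypothetical helper, and the formula assumes a balanced panel.

```python
def fe_degrees_of_freedom(n_units, n_periods, n_regressors):
    """Residual df of the within estimator on a balanced panel: NT - N - k."""
    return n_units * n_periods - n_units - n_regressors

# JTRAIN example: N = 54 firms, T = 3 years, k = 4 regressors.
df_fe = fe_degrees_of_freedom(54, 3, 4)

# What plain OLS on the demeaned data would use if the N lost df were ignored.
df_naive = 54 * 3 - 4

print(df_fe, df_naive)  # 104 158
```

The value 104 agrees with the denominator df in the F(4,104) statistic of the fixed-effects output for this example.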
Example: Effect of Job Training on Firm Scrap Rates
• We use the data JTRAIN.dta to look at the effect of job training
on firm scrap rates.
• We use data for three years, 1987, 1988 and 1989, on the 54
firms that reported scrap rates in each year.
• No firm received a grant in 1987, 19 firms received grants in 1988 and
10 different firms received grants in 1989.
• Therefore, we allow for the possibility that the additional job
training in 1988 made workers more productive in 1989.
• This can be done by including a lagged value of the grant indicator (grant_1).
• We also include year dummies for 1988 and 1989.
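A minimal sketch of how a lagged grant indicator and the two year dummies could be built is shown below. The rows are hypothetical (three made-up firms, not the JTRAIN data), and the function name `add_lag_and_year_dummies` is an illustration only. The lag is set to 0 in the first year, which is an assumption consistent with all 162 firm-year observations being used in the regression for this example.

```python
# Hypothetical rows: (firm, year, grant). Firm 1 gets a grant in 1988,
# firm 2 in 1989, firm 3 never.
rows = [
    (1, 1987, 0), (1, 1988, 1), (1, 1989, 0),
    (2, 1987, 0), (2, 1988, 0), (2, 1989, 1),
    (3, 1987, 0), (3, 1988, 0), (3, 1989, 0),
]

def add_lag_and_year_dummies(rows, first_year_lag=0):
    """Attach grant_1 (last year's grant for the same firm), d88 and d89."""
    grant = {(firm, year): g for firm, year, g in rows}
    out = []
    for firm, year, g in sorted(rows):
        out.append({
            "firm": firm,
            "year": year,
            "grant": g,
            # Look up the same firm one year earlier; assumed 0 if absent.
            "grant_1": grant.get((firm, year - 1), first_year_lag),
            "d88": int(year == 1988),
            "d89": int(year == 1989),
        })
    return out

data = add_lag_and_year_dummies(rows)
for record in data:
    print(record)
```

The lag is taken within each firm, so a grant received by one firm in 1988 never leaks into another firm's 1989 row.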
Fixed-effects (within) regression Number of obs = 162
Group variable: fcode Number of groups = 54
F(4,104) = 6.54
corr(u_i, Xb) = -0.0714 Prob > F = 0.0001
sigma_u 1.438982
sigma_e .4977442
rho .89313867 (fraction of variance due to u_i)
F test that all u_i=0: F(53, 104) = 24.66 Prob > F = 0.0000
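The rho line in the output can be reproduced by hand. The sketch below assumes, as the label "fraction of variance due to u_i" suggests, that rho is the share of the composite error variance attributable to the unobserved effect; the two standard deviations are copied from the output above.

```python
sigma_u = 1.438982   # reported sigma_u
sigma_e = 0.4977442  # reported sigma_e

# Share of the total error variance that comes from the unobserved effect.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(round(rho, 4))  # 0.8931
```

Roughly 89% of the error variance is due to firm-level effects, in line with the strong rejection in the F test that all u_i = 0.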
4.2 THE LEAST SQUARES DUMMY VARIABLE (LSDV)
• A traditional view of the FE approach is to assume that the
unobserved effect, 𝑢𝑖, is the intercept for person 𝑖 (or firm, city,
and so on).
• Clearly, we cannot do this with a single cross-section.
• We need at least two periods.
• The way we estimate an intercept for each 𝑖 is to put in a dummy
variable (DV) for each cross-sectional observation, along with the
explanatory variables (and probably a DV for each time period).
• This method is usually called the dummy variable regression.
• The DV method is not very practical for panel data sets with
many cross-sectional observations.
• The LSDV regression gives us exactly the same estimates of the
𝛽𝑗 that we would obtain from the regression on time-demeaned
data, and the SE and other major statistics are
identical.
• Therefore, the FE estimator can be obtained by the dummy
variable regression.
• The 𝑅2 from the LSDV regression is usually rather high.
• This occurs because we are including a DV for each cross-sectional
unit, which explains much of the variation in the
data.
• We use the data JTRAIN.dta again to look at the effect of job
training on firm scrap rates.
• Now, we create a DV for each cross-sectional unit (𝑖), identified by
𝑓𝑐𝑜𝑑𝑒.
. xtset fcode year
panel variable: fcode (strongly balanced)
time variable: year, 1987 to 1989
delta: 1 unit
. tabulate fcode,generate(dum)
firm code
number Freq. Percent Cum.
. list fcode dum1-dum10 in 1/30
fcode dum1 dum2 dum3 dum4 dum5 dum6 dum7 dum8 dum9 dum10
1. 410032 1 0 0 0 0 0 0 0 0 0
2. 410032 1 0 0 0 0 0 0 0 0 0
3. 410032 1 0 0 0 0 0 0 0 0 0
4. 410440 0 1 0 0 0 0 0 0 0 0
5. 410440 0 1 0 0 0 0 0 0 0 0
6. 410440 0 1 0 0 0 0 0 0 0 0
7. 410495 0 0 1 0 0 0 0 0 0 0
8. 410495 0 0 1 0 0 0 0 0 0 0
9. 410495 0 0 1 0 0 0 0 0 0 0
10. 410500 0 0 0 1 0 0 0 0 0 0
11. 410500 0 0 0 1 0 0 0 0 0 0
12. 410500 0 0 0 1 0 0 0 0 0 0
13. 410501 0 0 0 0 1 0 0 0 0 0
14. 410501 0 0 0 0 1 0 0 0 0 0
15. 410501 0 0 0 0 1 0 0 0 0 0
16. 410509 0 0 0 0 0 1 0 0 0 0
17. 410509 0 0 0 0 0 1 0 0 0 0
18. 410509 0 0 0 0 0 1 0 0 0 0
19. 410513 0 0 0 0 0 0 1 0 0 0
20. 410513 0 0 0 0 0 0 1 0 0 0
21. 410513 0 0 0 0 0 0 1 0 0 0
22. 410517 0 0 0 0 0 0 0 1 0 0
23. 410517 0 0 0 0 0 0 0 1 0 0
24. 410517 0 0 0 0 0 0 0 1 0 0
25. 410518 0 0 0 0 0 0 0 0 1 0
26. 410518 0 0 0 0 0 0 0 0 1 0
27. 410518 0 0 0 0 0 0 0 0 1 0
28. 410521 0 0 0 0 0 0 0 0 0 1
29. 410521 0 0 0 0 0 0 0 0 0 1
30. 410521 0 0 0 0 0 0 0 0 0 1
. reg lscrap d88 d89 grant grant_1 dum2-dum157
dum135 0 (omitted)
dum136 .4492324 .4124022 1.09 0.279 -.3685766 1.267041
dum137 0 (omitted)
dum138 0 (omitted)
dum139 0 (omitted)
dum140 0 (omitted)
dum141 -.833677 .4124022 -2.02 0.046 -1.651486 -.0158681
dum142 0 (omitted)
dum143 0 (omitted)
dum144 -.1156672 .4201882 -0.28 0.784 -.9489162 .7175818
dum145 0 (omitted)
dum146 0 (omitted)
dum147 0 (omitted)
dum148 0 (omitted)
dum149 -1.210424 .4201882 -2.88 0.005 -2.043673 -.3771747
dum150 0 (omitted)
dum151 0 (omitted)
dum152 0 (omitted)
dum153 0 (omitted)
dum154 0 (omitted)
dum155 -.7330617 .4201882 -1.74 0.084 -1.566311 .1001873
dum156 1.480697 .4201882 3.52 0.001 .6474482 2.313946
dum157 0 (omitted)
_cons 1.833711 .2990539 6.13 0.000 1.240676 2.426746
. xtreg lscrap d88 d89 grant grant_1,fe
F(4,104) = 6.54
corr(u_i, Xb) = -0.0714 Prob > F = 0.0001
sigma_u 1.438982
sigma_e .49774421
rho .89313867 (fraction of variance due to u_i)
F test that all u_i=0: F(53, 104) = 24.66 Prob > F = 0.0000
LSDV xtreg
b/se b/se
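The equality of the LSDV and within slope estimates can also be checked numerically on a small made-up panel. The sketch below is an illustration, not the JTRAIN regression: the data are invented, `solve` and `ols` are hypothetical helpers that solve the normal equations by Gaussian elimination, and the LSDV regression here uses one dummy per unit with no common constant (Stata instead keeps a constant and drops one dummy, which gives the same slope).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up balanced panel: 3 units observed for 3 periods each.
units = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 0.0, 1.0, 5.0, 3.0, 3.5, 5.5]
y = [12.3, 13.8, 18.1, 0.2, 1.9, 9.7, 1.1, 2.2, 5.8]
n_units = 3

# LSDV: regress y on x plus one dummy per unit.
X_lsdv = [[xi] + [1.0 if u == j else 0.0 for j in range(n_units)]
          for xi, u in zip(x, units)]
coef_lsdv = ols(X_lsdv, y)
beta_lsdv = coef_lsdv[0]

def unit_mean(values, unit):
    vals = [v for v, u in zip(values, units) if u == unit]
    return sum(vals) / len(vals)

# Within: time-demean x and y, then regress without an intercept.
xd = [xi - unit_mean(x, u) for xi, u in zip(x, units)]
yd = [yi - unit_mean(y, u) for yi, u in zip(y, units)]
beta_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(beta_lsdv, beta_within)
```

The dummy coefficients in `coef_lsdv[1:]` are the estimated unit intercepts, each equal to the unit mean of 𝑦 minus the slope times the unit mean of 𝑥.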
4.3 FIXED EFFECTS OR FIRST DIFFERENCING?
• So far we have discussed two competing methods for estimating
unobserved effects models.
• First: differencing the data (FD).
• Second: time-demeaning the data (FE).
• How do we know which one to use?
• When 𝑇 = 2, the FE and FD estimates are identical, so it
does not matter which we use.
• The FE estimation must include a DV for the second time
period in order to be identical to the FD estimates that include
an intercept.
• FD has the advantage of being straightforward to implement
in any econometrics package that supports basic data
manipulation, and it is easy to compute heteroskedasticity-robust
statistics after FD estimation.
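The 𝑇 = 2 equivalence can be verified on a tiny made-up panel. In the sketch below the data are invented and `ols2` is a hypothetical helper that solves the two-regressor normal equations directly; FD is run with an intercept and FE with a time-demeaned dummy for the second period, as the bullet above requires.

```python
def ols2(x1, x2, y):
    """OLS of y on two regressors, no constant, via the 2x2 normal equations."""
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    p = sum(u * v for u, v in zip(x1, y))
    q = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return ((d * p - b * q) / det, (a * q - b * p) / det)

# Made-up two-period panel: (x in period 1, y in period 1, x in period 2, y in period 2).
panel = [
    (1.0, 3.0, 2.5, 7.1),
    (0.5, 1.0, 0.0, 0.4),
    (2.0, 8.0, 4.0, 12.5),
    (3.0, 2.0, 3.5, 4.2),
]

# First differencing: regress the change in y on the change in x and a constant.
dx = [x2 - x1 for x1, _, x2, _ in panel]
dy = [y2 - y1 for _, y1, _, y2 in panel]
fd_slope, fd_intercept = ols2(dx, [1.0] * len(panel), dy)

# Fixed effects: time-demean x, y and the period-2 dummy, then pool both periods.
xdd, ydd, d2dd = [], [], []
for x1, y1, x2, y2 in panel:
    xbar, ybar = (x1 + x2) / 2, (y1 + y2) / 2
    xdd += [x1 - xbar, x2 - xbar]
    ydd += [y1 - ybar, y2 - ybar]
    d2dd += [0.0 - 0.5, 1.0 - 0.5]  # dummy minus its time average of 0.5
fe_slope, fe_d2 = ols2(xdd, d2dd, ydd)

print(fd_slope, fe_slope)
```

The coefficient on the second-period dummy in the FE regression also matches the FD intercept, since both measure the common change between the two periods.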
• When 𝑇 ≥ 3, the FE and FD estimators are not the same, even though
both are consistent (with 𝑇 fixed as 𝑁 → ∞).
• For large 𝑁 and small 𝑇, the choice between FE and FD hinges
on the relative efficiency of the estimators, which is
determined by the serial correlation in the idiosyncratic errors, 𝑒𝑖𝑡.
• When the 𝑒𝑖𝑡 are serially uncorrelated, FE is more efficient than
FD (and the SE reported from FE are valid).
• Since the unobserved effects model is typically stated with
serially uncorrelated idiosyncratic errors, the FE estimator is
used more often than the FD estimator.
4.4 RANDOM EFFECTS MODELS
• We begin with the same unobserved effects model:
$y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + u_i + e_{it}$ (4.7)
• Eq(4.7) becomes a random effects model when we assume
that the unobserved effect 𝑢𝑖 is uncorrelated with each
explanatory variable:
$\mathrm{Cov}(x_{itj}, u_i) = 0, \quad t = 1, 2, \dots, T; \; j = 1, 2, \dots, k.$
• Under the RE assumptions,
$\mathrm{Corr}(v_{it}, v_{is}) = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2), \quad t \neq s,$
where 𝑣𝑖𝑡 = 𝑢𝑖 + 𝑒𝑖𝑡 is the composite error.
• Because of this serial correlation in the error term, the usual pooled
OLS SE, which ignore the correlation, are incorrect, and so are
the usual test statistics.
• In earlier topics we showed how GLS can be used to
estimate models with AR serial correlation.
• We can also use GLS to solve the serial correlation problem in
panel data here.
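The correlation formula above follows in three short steps. The derivation below is a sketch that assumes the composite error is 𝑣𝑖𝑡 = 𝑢𝑖 + 𝑒𝑖𝑡, that the 𝑒𝑖𝑡 have constant variance, are serially uncorrelated, and are uncorrelated with 𝑢𝑖 (the usual RE assumptions).

```latex
\begin{aligned}
\mathrm{Var}(v_{it}) &= \mathrm{Var}(u_i) + \mathrm{Var}(e_{it}) = \sigma_u^2 + \sigma_e^2, \\
\mathrm{Cov}(v_{it}, v_{is}) &= \mathrm{Cov}(u_i + e_{it},\, u_i + e_{is}) = \mathrm{Var}(u_i) = \sigma_u^2, \quad t \neq s, \\
\mathrm{Corr}(v_{it}, v_{is}) &= \frac{\mathrm{Cov}(v_{it}, v_{is})}{\sqrt{\mathrm{Var}(v_{it})\,\mathrm{Var}(v_{is})}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{aligned}
```

The correlation is positive and the same for every pair of periods, because the only thing two errors for the same unit share is 𝑢𝑖.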
• For this procedure to have good properties, we should have
large 𝑁 and relatively small 𝑇.
• Deriving the GLS transformation that eliminates the serial
correlation in the errors requires sophisticated matrix algebra.
• But the transformation itself is simple.
• Define
$\theta = 1 - [\sigma_e^2 / (\sigma_e^2 + T \sigma_u^2)]^{1/2}$ (4.10)
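With 𝜃 defined, the GLS transformation amounts to subtracting a fraction 𝜃 of each time average (quasi-demeaning). The block below writes this out in its standard textbook form, under the assumption that the model includes an intercept 𝛽0 and that 𝑣𝑖𝑡 = 𝑢𝑖 + 𝑒𝑖𝑡 is the composite error; the equation number used in the original slides is not recoverable, so none is given.

```latex
y_{it} - \theta \bar{y}_i
  = \beta_0 (1 - \theta)
  + \beta_1 (x_{it1} - \theta \bar{x}_{i1})
  + \dots
  + \beta_k (x_{itk} - \theta \bar{x}_{ik})
  + (v_{it} - \theta \bar{v}_i)
```

Setting 𝜃 = 0 leaves the data untransformed (pooled OLS), while 𝜃 = 1 gives full time-demeaning (FE); RE lies between the two.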
• If our RE estimation shows that 𝜃 is close to zero, the RE estimates will
be close to the pooled OLS estimates. This is the case when the
unobserved effect, 𝑢𝑖, is relatively unimportant (𝜎𝑢² is small
relative to 𝜎𝑒²).
• But if 𝜃 is close to one, the RE estimates will be close to the FE estimates.
• 𝜎𝑢² must be large relative to 𝜎𝑒² to make 𝜃 close to
unity.
• Generally, as 𝑇 gets large, 𝜃 tends to one, and this makes the
RE and FE estimates very similar.
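These limiting cases can be read directly off the definition of 𝜃 once it is rewritten in terms of the variance ratio. The rearrangement below is only algebra on Eq(4.10), dividing numerator and denominator inside the square root by 𝜎𝑒².

```latex
\begin{aligned}
\theta &= 1 - \left[ \frac{1}{1 + T \sigma_u^2 / \sigma_e^2} \right]^{1/2}, \\
\sigma_u^2 / \sigma_e^2 \to 0 &\;\Rightarrow\; \theta \to 0 \quad \text{(RE approaches pooled OLS)}, \\
\sigma_u^2 / \sigma_e^2 \to \infty \ \text{or} \ T \to \infty &\;\Rightarrow\; \theta \to 1 \quad \text{(RE approaches FE)}.
\end{aligned}
```

Only the product 𝑇𝜎𝑢²/𝜎𝑒² matters, so a long panel can push 𝜃 toward one even when the unobserved effect has a modest variance.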
Example: A Wage Equation Using Panel Data
• We again use the data WAGEPAN.dta to estimate a wage
equation for men.
• We use three methods: pooled OLS, RE and FE.
• The dependent variable is log(𝑤𝑎𝑔𝑒) and the explanatory
variables are 𝑒𝑑𝑢𝑐, 𝑏𝑙𝑎𝑐𝑘, ℎ𝑖𝑠𝑝, 𝑒𝑥𝑝𝑒𝑟, 𝑒𝑥𝑝𝑒𝑟²,
𝑚𝑎𝑟𝑟𝑖𝑒𝑑, 𝑢𝑛𝑖𝑜𝑛, and a full set of year dummies.
Pooled OLS
Random Effects
Random-effects GLS regression Number of obs = 4,360
Group variable: nr Number of groups = 545
sigma_u .32460314
sigma_e .35099001
rho .46100215 (fraction of variance due to u_i)
Fixed-effects (within) regression Number of obs = 4,360
F(10,3805) = 83.85
corr(u_i, Xb) = -0.1212 Prob > F = 0.0000
educ 0 (omitted)
black 0 (omitted)
hisp 0 (omitted)
exper .1321464 .0098247 13.45 0.000 .1128842 .1514087
expersq -.0051855 .0007044 -7.36 0.000 -.0065666 -.0038044
married .0466804 .0183104 2.55 0.011 .0107812 .0825796
union .0800019 .0193103 4.14 0.000 .0421423 .1178614
d81 .0190448 .0203626 0.94 0.350 -.0208779 .0589674
d82 -.011322 .0202275 -0.56 0.576 -.0509798 .0283359
d83 -.0419955 .0203205 -2.07 0.039 -.0818357 -.0021553
d84 -.0384709 .0203144 -1.89 0.058 -.0782991 .0013573
d85 -.0432498 .0202458 -2.14 0.033 -.0829434 -.0035563
d86 -.027382 .0203863 -1.34 0.179 -.0673511 .0125872
d87 0 (omitted)
_cons 1.02764 .0299499 34.31 0.000 .9689201 1.086359
sigma_u .40092789
sigma_e .35099001
rho .56612235 (fraction of variance due to u_i)
F test that all u_i=0: F(544, 3805) = 9.64 Prob > F = 0.0000
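Two features of this output can be checked by hand: the time-constant regressors (educ, black, hisp) are omitted because time-demeaning turns them into columns of zeros, and the denominator df of the F statistic equals 𝑁𝑇 − 𝑁 − 𝑘. The sketch below uses hypothetical values for a single man observed over eight years; only the counts 4,360, 545 and 10 come from the output.

```python
def demean(values):
    """Subtract the time average from each period's value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical values for one man over the 8 sample years.
educ_one_man = [12] * 8                     # schooling does not change
union_one_man = [0, 0, 1, 1, 1, 0, 0, 1]    # union status does change

educ_demeaned = demean(educ_one_man)
union_demeaned = demean(union_one_man)
print(educ_demeaned)    # every entry is 0.0, so educ cannot be estimated by FE
print(union_demeaned)   # nonzero entries, so union can be estimated by FE

# Degrees of freedom: observations, groups, and estimated slopes
# (exper, expersq, married, union and six year dummies).
n_obs, n_groups, k = 4360, 545, 10
df_fe = n_obs - n_groups - k
print(df_fe)  # 3805
```

The value 3805 matches F(10,3805) in the output above.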
• The pooled OLS SE are the usual OLS SE, and these
underestimate the true SE because they ignore the positive
serial correlation; we report them for comparison only.
• The 𝑒𝑥𝑝𝑒𝑟 coefficient from RE differs from the pooled OLS one, and
both the 𝑚𝑎𝑟𝑟𝑖𝑒𝑑 and 𝑢𝑛𝑖𝑜𝑛 coefficients fall under RE estimation.
• The estimate of 𝜃 for the RE estimation is 𝜃 = 0.643, which helps
explain why, on the time-varying variables, the RE estimates lie
closer to the FE estimates than to the pooled OLS estimates.
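The reported value of 𝜃 can be reproduced from the RE output using Eq(4.10). The sketch below takes sigma_u and sigma_e from the random-effects output and 𝑇 = 8 (4,360 observations on 545 men); the variable names are illustrative.

```python
import math

sigma_u = 0.32460314  # sigma_u from the random-effects output
sigma_e = 0.35099001  # sigma_e from the random-effects output
T = 8                 # periods per man: 4,360 obs / 545 groups

# Eq(4.10): theta = 1 - [sigma_e^2 / (sigma_e^2 + T * sigma_u^2)]^(1/2)
theta = 1 - math.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_u**2))
print(round(theta, 3))  # 0.643
```

A value of 0.643 means that RE subtracts about 64% of each time average, which is closer to full demeaning (FE) than to no demeaning (pooled OLS).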
4.5 RANDOM EFFECTS OR FIXED EFFECTS?
• Because FE allows arbitrary correlation between 𝑢𝑖 and the
𝑥𝑗,𝑖𝑡 while RE does not, FE is widely thought to be the more
convincing tool for estimating ceteris paribus effects.
• Still, RE is applied in certain situations.
• As the WAGEPAN example shows, if the key
explanatory variable is constant over time, we cannot use FE
to estimate its effect.
• In that case we can use RE only if we are willing to assume that the
unobserved effect is uncorrelated with all explanatory
variables.
• RE is preferred to pooled OLS because RE is generally more
efficient.
• It is still fairly common to see researchers apply both RE and
FE, and then formally test for statistically significant
differences in the coefficients on the time-varying explanatory
variables.
• Hausman (1978) first proposed such a test.
• Some econometrics packages routinely compute the Hausman test
under the full set of RE assumptions.
• The idea is that one uses the RE estimates unless the Hausman test
rejects the condition
$\mathrm{Cov}(x_{itj}, u_i) = 0, \quad t = 1, 2, \dots, T; \; j = 1, 2, \dots, k.$ (4.12)
• In practice, a failure to reject means either that the RE and FE
estimates are sufficiently close that it does not matter
which is used, or that the sampling variation in the FE
estimates is so large that one cannot conclude that practically
significant differences are statistically significant.
• The null for the Hausman test is that both FE and RE are consistent.
• A rejection of the null in the Hausman test means that the key RE
assumption, Eq(4.12), is false, and then the FE estimates are
used.
• We again use the data WAGEPAN.dta to estimate a wage
equation for men.
• The dependent variable is log(𝑤𝑎𝑔𝑒) and the explanatory
variables are 𝑒𝑑𝑢𝑐, 𝑏𝑙𝑎𝑐𝑘, ℎ𝑖𝑠𝑝, 𝑒𝑥𝑝𝑒𝑟, 𝑒𝑥𝑝𝑒𝑟²,
𝑚𝑎𝑟𝑟𝑖𝑒𝑑, 𝑢𝑛𝑖𝑜𝑛, and a full set of year dummies.
• We use the Hausman test to decide which panel model, FE or RE,
is appropriate for our data.
Random-effects GLS regression Number of obs = 4,360
Group variable: nr Number of groups = 545
sigma_u .32460314
sigma_e .35099001
rho .46100215 (fraction of variance due to u_i)
Fixed Effects
Fixed-effects (within) regression Number of obs = 4,360
Group variable: nr Number of groups = 545
R-sq: Obs per group:
within = 0.1806 min = 8
between = 0.0005 avg = 8.0
overall = 0.0635 max = 8
F(10,3805) = 83.85
corr(u_i, Xb) = -0.1212 Prob > F = 0.0000
educ 0 (omitted)
black 0 (omitted)
hisp 0 (omitted)
exper .1321464 .0098247 13.45 0.000 .1128842 .1514087
expersq -.0051855 .0007044 -7.36 0.000 -.0065666 -.0038044
married .0466804 .0183104 2.55 0.011 .0107812 .0825796
union .0800019 .0193103 4.14 0.000 .0421423 .1178614
d81 .0190448 .0203626 0.94 0.350 -.0208779 .0589674
d82 -.011322 .0202275 -0.56 0.576 -.0509798 .0283359
d83 -.0419955 .0203205 -2.07 0.039 -.0818357 -.0021553
d84 -.0384709 .0203144 -1.89 0.058 -.0782991 .0013573
d85 -.0432498 .0202458 -2.14 0.033 -.0829434 -.0035563
d86 -.027382 .0203863 -1.34 0.179 -.0673511 .0125872
d87 0 (omitted)
_cons 1.02764 .0299499 34.31 0.000 .9689201 1.086359
sigma_u .40092789
sigma_e .35099001
rho .56612235 (fraction of variance due to u_i)
F test that all u_i=0: F(544, 3805) = 9.64 Prob > F = 0.0000
Hausman Test
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
FE_Model RE_Model Difference S.E.
chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 26.22
Prob>chi2 = 0.0001
(V_b-V_B is not positive definite)
• The Hausman test shows that the null hypothesis is
rejected even at the 1% significance level.
• That means the FE model is preferred to the RE model.
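The p-value behind this conclusion can be recomputed from the reported statistic, chi2(5) = 26.22. The sketch below is not how Stata computes it; `chi2_sf` is a hypothetical helper that uses the closed-form upper-tail probability of a chi-square distribution with integer degrees of freedom, built only from the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df a positive integer."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-z) * sum(z**j / math.factorial(j) for j in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    tail = sum(z**(j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail

hausman_stat = 26.22  # chi2(5) from the Hausman output
p_value = chi2_sf(hausman_stat, 5)
print(round(p_value, 4))  # 0.0001

# Decision rule at the 1% level: reject the RE assumption and use FE.
use_fe = p_value < 0.01
print(use_fe)
```

The recomputed p-value agrees with Prob>chi2 = 0.0001 in the output, so the rejection at the 1% level follows directly.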