Teklebirhan Alemnew (Assistant Professor)
[email protected]
AAU, 2024
In this part we will explore:
Introduction
Pooled OLS
First difference estimator
The fixed effect model
The random effect model
Hausman test
Illustration using real data
By: Teklebirhan A. 2
5.1. Introduction
Panel data are a type of longitudinal data: data collected at different points in time from the same units, so that the same subjects are measured repeatedly over time.
The term panel data is often used loosely to refer to any data set that has both a cross-sectional and a time-series dimension.
In other words, panel data are data sets with repeated observations on the same units, such as individuals, firms, or countries.
More precisely, a panel follows the same cross-section units over time.
If the units differ from period to period, the data set is a pooled cross-section instead.
The data structure looks like this
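As a minimal sketch of the long format, with one row per unit and period, the following uses made-up unit ids, years, and wage values (all hypothetical, chosen only for illustration) and checks whether the panel is balanced.

```python
# Hypothetical long-format panel: one row per (unit, year) pair.
# The ids, years, and lwage values are invented for illustration.
rows = [
    # (unit_id, year, lwage)
    (1, 1980, 1.20),
    (1, 1981, 1.35),
    (1, 1982, 1.41),
    (2, 1980, 1.85),
    (2, 1981, 1.90),
    (2, 1982, 2.02),
]

units = sorted({r[0] for r in rows})
years = sorted({r[1] for r in rows})
observed = {(r[0], r[1]) for r in rows}

# A panel is balanced when every unit is observed in every period.
is_balanced = all((u, t) in observed for u in units for t in years)

print("N units:", len(units), "T periods:", len(years))
print("balanced:", is_balanced)
```

Dropping any single row from `rows` would make the panel unbalanced, which is the situation created by missing observations or attrition.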
The availability of repeated observations on the same units:
Allows us to handle more realistic models than a single cross-section or a single time series
But complicates the analysis of non-linear and dynamic models
And often suffers from missing observations or attrition
Even if attrition is random, the standard analysis has to be adjusted
5.2. Advantages of Panel Data
An important advantage of panel data compared to time series or cross-sectional data sets is that they allow identification of certain parameters or questions without the need to make restrictive assumptions.
For example, panel data make it possible to analyze
changes on an individual level.
That is, panel data are not only suitable to model or explain
why individual units behave differently but also to model
why a given unit behaves differently at different time periods
(for example, because of a different past).
Efficiency of Parameter Estimators
Because panel data sets are typically larger than cross-sectional or time series data sets, and explanatory variables vary over two dimensions (individuals and time) rather than one, estimators based on panel data are quite often more accurate than estimators based on other sources.
In other words, if one is interested in changes from one period
to another, a panel will yield more efficient estimators than a
series of cross-sections.
Identification of Parameters
It reduces identification problems.
Although this advantage may come under different headings,
in many cases it involves identification in the presence of
endogenous regressors or measurement error, robustness to
omitted variables and the identification of individual
dynamics.
Omitted variable bias arises if a variable that is correlated
with the included variables is excluded from the model.
A classic example is the estimation of production functions
(Mundlak, 1961).
Management quality is an input but it is unobservable.
If panel data are available, this problem can be resolved by
introducing a firm specific effect and considering this as a
fixed unknown parameter.
In a similar way, a fixed time effect can be included in the
model to capture the effect of all (observed and unobserved)
variables that do not vary over the individual units.
This illustrates the proposition that panel data can reduce the
effects of omitted variable bias, or – in other words –
estimators from a panel data set may be more robust to an
incomplete model specification.
Why Analyze Panel Data?
a) We are interested in describing change over time, even at
individual level
b) Panel models can be used to inform policy – e.g. health, trend
of fertility, poverty
c) Multiple observations on each unit can provide superior
estimates as compared to single cross-sectional models of
association
d) We want to estimate causal models
The Classical Linear Regression Model (CLRM) states some preconditions for the use of OLS. If some of these assumptions are violated, OLS estimates can be biased, inconsistent, or inefficient.
Motivation: Unobserved heterogeneity
One solution to this problem is to use panel data analysis (the fixed effects model).
This model accounts for omitted variables that are constant over time within a unit.
Suppose that we want to model panel data with an unobserved fixed effect as follows:
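A common way to write a panel model with an unobserved unit effect is sketched below. The notation is an assumption made here for illustration: $\alpha_i$ is the unobserved effect of unit $i$ (reported as `u_i` in Stata output), $x_{it}$ the vector of explanatory variables, and $\varepsilon_{it}$ the idiosyncratic error.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + v_{it}, \qquad i = 1,\dots,N,\quad t = 1,\dots,T, \\
v_{it} &= \alpha_i + \varepsilon_{it}.
\end{aligned}
```

Pooled OLS treats the composite term $v_{it}$ as the error, so its properties depend on whether $\alpha_i$ is correlated with $x_{it}$.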
We have three cases:
Case-1: If the unobserved heterogeneity is correlated with one
or more of the explanatory variables, OLS parameter
estimates are biased and inconsistent.
Case-2: If the unobserved heterogeneity is uncorrelated with
the explanatory variables (Xi), OLS is unbiased even in a
single cross-section.
Case-3: If we have more than one observation on any unit, the
errors will be correlated and OLS estimates will be inefficient.
“Unobserved effects” means that one or some of the
explanatory variables are unobservable:
for example, consumption choice of one flavor of ice cream
over another is a function of personal preference, but
preference is unobservable.
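Case-1 can be illustrated with a small constructed example. The sketch below uses a hypothetical, noise-free panel in which the unit effect rises with the level of x, so the two are correlated; the true slope is 2 by construction. All names and numbers are assumptions for illustration only.

```python
# Hypothetical noise-free panel: y_it = beta * x_it + alpha_i, with beta = 2.
# The unit effect alpha_i rises with the unit's x level, so it is
# correlated with the regressor (Case-1).
beta = 2.0
panel = {
    # unit: (alpha_i, [x_i1, x_i2, x_i3])
    "A": (0.0, [0.0, 1.0, 2.0]),
    "B": (10.0, [5.0, 6.0, 7.0]),
    "C": (20.0, [10.0, 11.0, 12.0]),
}

x, y = [], []
for alpha_i, xs in panel.values():
    for x_it in xs:
        x.append(x_it)
        y.append(beta * x_it + alpha_i)


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Pooled OLS ignores alpha_i, which ends up in the error term.
pooled_slope = ols_slope(x, y)
print("true beta:", beta)
print("pooled OLS slope:", round(pooled_slope, 3))
```

Even without any noise, the pooled slope is far above 2, because the omitted unit effect is attributed to x.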
How can we estimate the above model, which includes an unobserved individual effect (also called individual heterogeneity, a fixed effect, or unobserved heterogeneity)?
Numerical Example
Notice that education, black, and Hispanic do not change over time.
Such data, collected from the same individuals, firms, or factories over a period of time, are called panel data.
In this section, we will see the motivation for panel data and
the various econometric methods of analyzing panel data.
1) Pooled OLS Regression
. reg lwage educ black hisp exper expersq married union

      Source |       SS           df       MS      Number of obs   =     4,360
-------------+----------------------------------   F(7, 4352)      =    142.61
       Model |  230.719766         7  32.9599665   Prob > F        =    0.0000
    Residual |  1005.80988     4,352  .231114402   R-squared       =    0.1866
-------------+----------------------------------   Adj R-squared   =    0.1853
       Total |  1236.52964     4,359  .283672779   Root MSE        =    .48074

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0993878   .0046776    21.25   0.000     .0902173    .1085583
       black |  -.1438417   .0235595    -6.11   0.000    -.1900303   -.0976531
        hisp |    .015698   .0208112     0.75   0.451    -.0251026    .0564985
       exper |   .0891791    .010111     8.82   0.000     .0693563    .1090019
     expersq |  -.0028487   .0007074    -4.03   0.000    -.0042354   -.0014619
     married |   .1076656   .0156965     6.86   0.000     .0768925    .1384387
       union |   .1800726   .0171205    10.52   0.000     .1465076    .2136375
       _cons |  -.0347057    .064569    -0.54   0.591    -.1612938    .0918824
------------------------------------------------------------------------------
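Because the dependent variable is the log of the wage, a coefficient b on a regressor is approximately a 100·b percent change in the wage, and the exact change is 100·(exp(b) − 1) percent. The sketch below applies this standard conversion to three of the pooled OLS coefficients reported above; the dictionary and variable names are hypothetical.

```python
import math

# Pooled OLS coefficients copied from the regression output above.
coefs = {"educ": 0.0993878, "black": -0.1438417, "union": 0.1800726}

# Exact percentage change in the wage implied by each log-wage coefficient.
pct_change = {name: 100.0 * (math.exp(b) - 1.0) for name, b in coefs.items()}

for name, b in coefs.items():
    print(f"{name}: approx {100.0 * b:.2f}%, exact {pct_change[name]:.2f}%")
```

The approximation and the exact figure are close for small coefficients and drift apart as the coefficient grows in absolute value.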
2) First Difference Estimator
. sort nr year
. regress D.(lwage educ black hisp exper expersq married union)
note: D.educ omitted because of collinearity
note: D.black omitted because of collinearity
note: D.hisp omitted because of collinearity
note: D.exper omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,815
-------------+----------------------------------   F(3, 3811)      =      5.36
       Model |  3.15766207         3  1.05255402   Prob > F        =    0.0011
    Residual |  748.036267     3,811   .19628346   R-squared       =    0.0042
-------------+----------------------------------   Adj R-squared   =    0.0034
       Total |  751.193929     3,814  .196956982   Root MSE        =    .44304

------------------------------------------------------------------------------
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |
         D1. |          0  (omitted)
       black |
         D1. |          0  (omitted)
        hisp |
         D1. |          0  (omitted)
       exper |
         D1. |          0  (omitted)
     expersq |
         D1. |  -.0038824   .0013863    -2.80   0.005    -.0066004   -.0011644
     married |
         D1. |   .0381377   .0229283     1.66   0.096    -.0068152    .0830905
       union |
         D1. |   .0427878   .0196575     2.18   0.030     .0042477     .081328
       _cons |     .11575   .0195867     5.91   0.000     .0773487    .1541514
------------------------------------------------------------------------------
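The omitted terms in the output above follow from the differencing itself. The sketch below uses a hypothetical, noise-free two-unit panel (all values invented) to show that first differences remove the unit effect, turn a time-invariant variable such as educ into zeros, and turn exper into a constant, which is collinear with the intercept of the differenced regression.

```python
# Hypothetical noise-free panel: y_it = 2 * x_it + alpha_i.
# educ is constant within a unit; exper rises by one each year.
panel = {
    "A": {"alpha": 5.0, "x": [1.0, 3.0, 4.0], "educ": [12, 12, 12], "exper": [1, 2, 3]},
    "B": {"alpha": 20.0, "x": [6.0, 7.0, 10.0], "educ": [16, 16, 16], "exper": [4, 5, 6]},
}


def diff(values):
    """First difference within a unit: value_t minus value_{t-1}."""
    return [b - a for a, b in zip(values, values[1:])]


dx, dy, d_educ, d_exper = [], [], [], []
for unit in panel.values():
    y = [2.0 * x_it + unit["alpha"] for x_it in unit["x"]]
    dx += diff(unit["x"])
    dy += diff(y)
    d_educ += diff(unit["educ"])
    d_exper += diff(unit["exper"])

# OLS through the origin on the differenced data; alpha_i has dropped out.
fd_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print("first-difference slope:", fd_slope)
print("D.educ:", d_educ)    # all zeros: no variation left to use
print("D.exper:", d_exper)  # all ones: collinear with a constant
```

The slope is recovered even though the two units have very different effects, because differencing removes anything that is fixed within a unit.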
3) FIXED EFFECT MODEL
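The fixed effects model is simply a linear regression model in which the intercept terms vary over the individual units i, i.e.
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$$
We can write this in the usual regression framework by including a dummy variable for each unit i in the model:
$$y_{it} = \sum_{j=1}^{N} \alpha_j d_{ij} + x_{it}'\beta + \varepsilon_{it}$$
where $d_{ij} = 1$ if $i = j$ and 0 elsewhere. We thus have a set of N dummy variables in the model.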
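With many units, the dummy-variable regression is usually not run directly. The same slope can be obtained by subtracting unit means from every variable (the within transformation) and then recovering each intercept from the unit means. The sketch below illustrates this with a hypothetical, noise-free two-unit panel; the data and names are assumptions for illustration.

```python
# Hypothetical noise-free panel: y_it = alpha_i + 2 * x_it.
panel = {
    "A": {"alpha": 5.0, "x": [1.0, 3.0, 4.0], "educ": [12, 12, 12]},
    "B": {"alpha": 20.0, "x": [6.0, 7.0, 10.0], "educ": [16, 16, 16]},
}


def mean(values):
    return sum(values) / len(values)


def demean(values):
    """Within transformation: subtract the unit's own time average."""
    m = mean(values)
    return [v - m for v in values]


x_w, y_w, educ_w, unit_means = [], [], [], {}
for name, unit in panel.items():
    y = [unit["alpha"] + 2.0 * x_it for x_it in unit["x"]]
    x_w += demean(unit["x"])
    y_w += demean(y)
    educ_w += demean(unit["educ"])
    unit_means[name] = (mean(unit["x"]), mean(y))

# Within (fixed effects) slope: OLS on the demeaned data.
fe_slope = sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)

# Each unit's intercept follows from its means: alpha_i = ybar_i - slope * xbar_i.
alpha_hat = {name: yb - fe_slope * xb for name, (xb, yb) in unit_means.items()}

print("within slope:", round(fe_slope, 6))
print("estimated intercepts:", {k: round(v, 6) for k, v in alpha_hat.items()})
print("demeaned educ:", educ_w)  # all zeros, so its coefficient cannot be estimated
```

The demeaned educ column is identically zero, which is why time-invariant regressors are reported as omitted in a fixed effects regression.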
. xtreg lwage educ black hisp exper expersq married union, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1780                                         min =          8
     between = 0.0005                                         avg =        8.0
     overall = 0.0638                                         max =          8

                                                F(4,3811)         =     206.38
corr(u_i, Xb)  = -0.1139                        Prob > F          =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |          0  (omitted)
       black |          0  (omitted)
        hisp |          0  (omitted)
       exper |   .1168467   .0084197    13.88   0.000     .1003392    .1333542
     expersq |  -.0043009   .0006053    -7.11   0.000    -.0054876   -.0031142
     married |   .0453033   .0183097     2.47   0.013     .0094056     .081201
       union |   .0820871   .0192907     4.26   0.000      .044266    .1199083
       _cons |    1.06488   .0266607    39.94   0.000     1.012609     1.11715
-------------+----------------------------------------------------------------
     sigma_u |   .4000539
     sigma_e |  .35125535
         rho |   .5646785   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(544, 3811) = 9.71                   Prob > F = 0.0000
4) RANDOM EFFECT MODEL
The random effects (EGLS) estimator combines the information from the between and within dimensions in an efficient way.
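One way to see how the two dimensions are combined is the quasi-demeaning weight θ = 1 − sqrt(σ_e² / (σ_e² + T·σ_u²)), a standard textbook expression that is assumed here rather than taken from this section. The random effects estimator is OLS on y_it − θ·ȳ_i and x_it − θ·x̄_i. The sketch below plugs in the `sigma_u`, `sigma_e`, and T = 8 values from the random effects output reported in this section; the variable names are hypothetical.

```python
import math

# Values copied from the random effects output in this section.
sigma_u = 0.32456727  # std. dev. of the unit effect u_i
sigma_e = 0.35125535  # std. dev. of the idiosyncratic error
T = 8                 # observations per unit (balanced panel)

# Share of the composite error variance that is due to the unit effect.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Quasi-demeaning weight (textbook formula, assumed here):
# theta = 0 reproduces pooled OLS, theta = 1 reproduces the within estimator.
theta = 1.0 - math.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_u**2))

print("rho:", round(rho, 4))
print("theta:", round(theta, 3))
```

The computed rho agrees with the `rho` line of the output, and a θ between 0 and 1 indicates that the random effects estimates should fall between the pooled OLS and fixed effects estimates.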
. xtreg lwage educ black hisp exper expersq married union, re

Random-effects GLS regression                   Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1774                                         min =          8
     between = 0.1837                                         avg =        8.0
     overall = 0.1808                                         max =          8

                                                Wald chi2(7)      =     943.95
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1012246   .0089133    11.36   0.000     .0837549    .1186943
       black |  -.1441307   .0476148    -3.03   0.002     -.237454   -.0508073
        hisp |   .0201511   .0426011     0.47   0.636    -.0633456    .1036477
       exper |   .1121195   .0082609    13.57   0.000     .0959285    .1283105
     expersq |  -.0040689   .0005918    -6.88   0.000    -.0052288   -.0029089
     married |   .0627951   .0167729     3.74   0.000     .0299209    .0956693
       union |   .1073789     .01783     6.02   0.000     .0724327     .142325
       _cons |  -.1074643   .1107057    -0.97   0.332    -.3244435    .1095149
-------------+----------------------------------------------------------------
     sigma_u |  .32456727
     sigma_e |  .35125535
         rho |  .46057172   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Compare the estimates
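. estimates table OLS Random Fixed

-----------------------------------------------------
    Variable |     OLS         Random        Fixed
-------------+---------------------------------------
        educ |  .09938779    .10122462    (omitted)
       black | -.14384171   -.14413068    (omitted)
        hisp |  .01569798    .02015107    (omitted)
       exper |  .08917907     .1121195    .11684669
     expersq | -.00284866   -.00406885   -.00430089
     married |  .10766558     .0627951    .04530332
       union |  .18007257    .10737886    .08208713
       _cons | -.03470569    -.1074643    1.0648798
-----------------------------------------------------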
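The table shows that the estimates differ across methods. Relative to pooled OLS, the random effects estimates are larger for educ, hisp, and exper, but clearly smaller for married and union; the fixed effects estimates of married and union are smaller still.
The standard errors also differ noticeably: for educ, for example, the random effects standard error (.0089) is almost twice the pooled OLS one (.0047).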
Hausman test
. hausman Fixed Random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     Fixed        Random       Difference          S.E.
-------------+----------------------------------------------------------------
       exper |    .1168467     .1121195        .0047272        .0016276
     expersq |   -.0043009    -.0040689        -.000232        .0001269
     married |    .0453033     .0627951       -.0174918        .0073427
       union |    .0820871     .1073789       -.0252917        .0073636
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       31.45
                Prob>chi2 =      0.0000
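To make the mechanics of the statistic concrete, the sketch below applies the same formula to a single coefficient, union, using only the numbers in the table above. This is an illustrative one-coefficient version with one degree of freedom, not a reproduction of the joint chi2(4) = 31.45 test, which needs the full covariance matrices that are not shown. The variable names are hypothetical.

```python
import math

# Values copied from the union row of the Hausman table above.
b_fe = 0.0820871     # fixed effects estimate (b)
b_re = 0.1073789     # random effects estimate (B)
se_diff = 0.0073636  # sqrt(V_b - V_B) for this coefficient

# One-coefficient Hausman statistic: (b - B)^2 / (V_b - V_B).
diff = b_fe - b_re
h_stat = (diff / se_diff) ** 2

# Upper-tail probability of a chi-squared variable with 1 degree of freedom:
# P(chi2_1 > h) = erfc(sqrt(h / 2)).
p_value = math.erfc(math.sqrt(h_stat / 2.0))

print("H (1 df):", round(h_stat, 2))
print("p-value:", round(p_value, 4))
print("reject Ho at 5%:", h_stat > 3.84)
```

A large statistic means the fixed and random effects estimates differ by more than sampling error would explain, which speaks against the random effects assumption that the unit effect is uncorrelated with the regressors.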
Since the fixed effects model wipes out time-invariant variables, how can we use such variables in a fixed effects model?
Simply by adding interactions with time (transforming the variable into a time-varying one)
Let us assume that we want to examine the effect of education on wage
First, generate an interaction of the education variable with a time dummy for each round of the panel data except the first round, which makes it time-varying
Then, estimate the FE model
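The construction step can be sketched as follows. The example builds education-by-year interaction columns for a hypothetical two-unit, three-year panel; the column names such as `educ_x_1981` and all values are assumptions made for illustration.

```python
# Hypothetical panel: education is fixed within each unit.
years = [1980, 1981, 1982]
educ = {"A": 12, "B": 16}

rows = []
for unit, e in educ.items():
    for year in years:
        row = {"unit": unit, "year": year, "educ": e}
        # One interaction per round except the first:
        # educ in that year, zero in every other year.
        for later in years[1:]:
            row[f"educ_x_{later}"] = e if year == later else 0
        rows.append(row)

# Within unit A, educ never changes but the interaction does.
educ_a = [r["educ"] for r in rows if r["unit"] == "A"]
inter_a = [r["educ_x_1981"] for r in rows if r["unit"] == "A"]

print("educ for unit A:       ", educ_a)
print("educ_x_1981 for unit A:", inter_a)
```

Because each interaction varies over time within a unit, it survives the within transformation, and its coefficient measures how the return to education in that round differs from the first round.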
. xtreg lwage d81 d82 d83 d84 d85 d86 d87 exper expersq married union, fe
note: exper omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1806                                         min =          8
     between = 0.0286                                         avg =        8.0
     overall = 0.0888                                         max =          8

                                                F(10,3805)        =      83.85
corr(u_i, Xb)  = -0.1222                        Prob > F          =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1511912   .0219489     6.89   0.000     .1081584     .194224
         d82 |   .2529709   .0244185    10.36   0.000     .2050963    .3008454
         d83 |   .3544437   .0292419    12.12   0.000     .2971125    .4117749
         d84 |   .4901148   .0362266    13.53   0.000     .4190894    .5611402
         d85 |   .6174823   .0452435    13.65   0.000     .5287784    .7061861
         d86 |   .7654966   .0561277    13.64   0.000     .6554532    .8755399
         d87 |   .9250249   .0687731    13.45   0.000     .7901893    1.059861
       exper |          0  (omitted)
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
       _cons |   1.426019   .0183415    77.75   0.000     1.390058    1.461979
-------------+----------------------------------------------------------------
     sigma_u |  .39176195
     sigma_e |  .35099001
         rho |  .55472817   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(544, 3805) = 9.16                   Prob > F = 0.0000
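If d81 to d87 are the education-by-year interactions, their positive and rising coefficients mean that the return to education increases over time.
Note, however, that exper is omitted because of collinearity, which is what happens when d81 to d87 are plain year dummies; in that case the coefficients measure year effects on log wage relative to the first round, not returns to education.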