Chapter - 5 - Panel Data Analysis

The document provides an overview of panel data analysis, including its definition, advantages, and various econometric methods such as Pooled OLS, Fixed Effect, and Random Effect models. It emphasizes the benefits of using panel data for identifying parameters and modeling changes over time, while addressing issues like omitted variable bias. The document also includes examples and results from regression analyses using real data.

Uploaded by

NATNAEL MENGISTU

Teklebirhan Alemnew (Assistant Professor)

[email protected]
AAU, 2024
Cont…
In this part we will explore:

• Introduction
• Pooled OLS
• First difference estimator
• The fixed effect model
• The random effect model
• Hausman test
• Illustration using real data


By: Teklebirhan A. 2
5.1. Introduction
• Panel data are a type of longitudinal data: data collected at different points in time on the same units.
• That is, the same subjects are measured at different points in time.
• The term panel data is often used loosely to refer to any data set that has both a cross-sectional and a time-series dimension.
• In other words, panel data sets contain repeated observations on the same units (individuals, firms, or countries).
• More precisely, a panel follows the same cross-sectional units over time.
• Otherwise, the data are a pooled cross-section.
• The data structure looks like this:
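As a rough illustration of that structure, the sketch below stores a tiny panel in "long" format, with one row per unit and year. The identifiers nr and year follow the variable names used in the Stata output later in these slides; the wage values and the helper functions are hypothetical and only serve the example.

```python
# Hypothetical long-format panel: one row per (unit, year) pair.
# The lwage values are made up for illustration.
rows = [
    {"nr": 1, "year": 1980, "lwage": 1.20},
    {"nr": 1, "year": 1981, "lwage": 1.85},
    {"nr": 1, "year": 1982, "lwage": 1.34},
    {"nr": 2, "year": 1980, "lwage": 1.68},
    {"nr": 2, "year": 1981, "lwage": 1.52},
    {"nr": 2, "year": 1982, "lwage": 1.74},
]


def periods_by_unit(rows, unit_key="nr", time_key="year"):
    """Map each unit identifier to the sorted list of periods observed."""
    out = {}
    for r in rows:
        out.setdefault(r[unit_key], []).append(r[time_key])
    return {u: sorted(t) for u, t in out.items()}


def is_balanced(rows):
    """A panel is balanced when every unit is observed in the same periods."""
    observed = list(periods_by_unit(rows).values())
    return all(p == observed[0] for p in observed)


print(periods_by_unit(rows))
print(is_balanced(rows))
```

Dropping any single row makes the panel unbalanced, which is the missing-observation or attrition situation that panel analysis has to deal with.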

• Repeated observations on the same units make it possible to handle more realistic models than a single cross-section or a single time series allows.
• But the analysis is more complicated in non-linear and dynamic models.
• Panels also suffer from missing observations or attrition.
• Even if attrition is random, standard analysis has to be adjusted.
5.2. Advantages of Panel Data
• An important advantage of panel data compared to time series or cross-sectional data sets is that it allows identification of certain parameters or questions, without the need to make restrictive assumptions.
• For example, panel data make it possible to analyze changes on an individual level.
• That is, panel data are not only suitable to model or explain why individual units behave differently but also to model why a given unit behaves differently at different time periods (for example, because of a different past).
• Efficiency of Parameter Estimators
• Because panel data sets are typically larger than cross-sectional or time series data sets, and explanatory variables vary over two dimensions (individuals and time) rather than one, estimators based on panel data are quite often more accurate than those from other sources.
• In other words, if one is interested in changes from one period to another, a panel will yield more efficient estimators than a series of cross-sections.

• Identification of Parameters
• It reduces identification problems.
• Although this advantage may come under different headings, in many cases it involves identification in the presence of endogenous regressors or measurement error, robustness to omitted variables and the identification of individual dynamics.
• Omitted variable bias arises if a variable that is correlated with the included variables is excluded from the model.

• A classic example is the estimation of production functions (Mundlak, 1961).
• Management quality is an input but it is unobservable.
• If panel data are available, this problem can be resolved by introducing a firm specific effect and considering this as a fixed unknown parameter.
• In a similar way, a fixed time effect can be included in the model to capture the effect of all (observed and unobserved) variables that do not vary over the individual units.

• This illustrates the proposition that panel data can reduce the effects of omitted variable bias, or, in other words, estimators from a panel data set may be more robust to an incomplete model specification.

• Why Analyze Panel Data?
a) We are interested in describing change over time, even at the individual level
b) Panel models can be used to inform policy, e.g. on health, fertility trends, or poverty
c) Multiple observations on each unit can provide superior estimates as compared to single cross-sectional models of association
d) We want to estimate causal models
• The Classical Linear Regression Model (CLRM) states several preconditions for the use of OLS. If some of these assumptions are violated, OLS estimates may be biased, inconsistent, or inefficient.

• One solution to this problem is to use panel data analysis (the fixed effect model).
• This model accounts for the problem of time-invariant omitted variables in our estimation.
• Suppose that we are interested in modelling panel data with an unobserved fixed effect as follows:
• Motivation: unobserved heterogeneity
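The equation on this slide did not survive extraction. A common way to write a panel model with an unobserved fixed effect is sketched below; the symbols are an assumption chosen for this illustration and are not taken from the slide.

```latex
% Assumed notation: i = 1,...,N indexes units, t = 1,...,T indexes periods.
% \alpha_i is the unobserved, time-invariant heterogeneity of unit i;
% \varepsilon_{it} is the idiosyncratic error.
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T
```

In the Stata output shown later, the unit effect is labelled u_i and the idiosyncratic error e, so sigma_u and sigma_e there correspond to the standard deviations of \alpha_i and \varepsilon_{it} in this notation.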


• We have three cases:
• Case-1: If the unobserved heterogeneity is correlated with one or more of the explanatory variables, OLS parameter estimates are biased and inconsistent.
• Case-2: If the unobserved heterogeneity is uncorrelated with the explanatory variables (Xi), OLS is unbiased even in a single cross-section.
• Case-3: If we have more than one observation on any unit, the errors will be correlated within units and OLS estimates will be inefficient.
• "Unobserved effects" means that one or more of the explanatory variables are unobservable:
• for example, the choice of one flavor of ice cream over another is a function of personal preference, but preference is unobservable.
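Case-1 can be made concrete with a small noise-free example. The sketch below is a minimal illustration under stated assumptions: the data, the unit effects, and the helper names are all hypothetical, and a single regressor is used so that the slope has a closed form. The unit effect rises with x, so pooled OLS overstates the true slope of 2, while a regression on deviations from unit means recovers it.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(values, units):
    """Subtract each unit's own time average from its observations."""
    totals, counts = {}, {}
    for v, u in zip(values, units):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for v, u in zip(values, units)]


# Hypothetical panel: 3 units, 3 periods, no noise, true slope = 2.
units = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
effect = {1: 0, 2: 5, 3: 10}  # unobserved heterogeneity, correlated with x
y = [2 * xi + effect[u] for xi, u in zip(x, units)]

pooled = ols_slope(x, y)
within = ols_slope(demean_by_unit(x, units), demean_by_unit(y, units))
print(pooled, within)
```

With these numbers the pooled slope is 3.5 and the within-unit slope is 2, so the whole gap is omitted variable bias from ignoring the unit effect.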

How can we estimate the above model, which includes individual heterogeneity (also called the unobserved effect, the fixed effect, or unobserved heterogeneity)?

Numerical Example

Notice that education, black, and Hispanic do not change over time.
• Such data, collected from the same individuals, firms, or factories over a period of time, are called panel data.
• In this section, we will see the motivation for panel data and the various econometric methods of analyzing panel data.

1) Pooled OLS Regression

. reg lwage educ black hisp exper expersq married union

      Source |       SS           df       MS      Number of obs   =     4,360
-------------+----------------------------------   F(7, 4352)      =    142.61
       Model |  230.719766         7  32.9599665   Prob > F        =    0.0000
    Residual |  1005.80988     4,352  .231114402   R-squared       =    0.1866
-------------+----------------------------------   Adj R-squared   =    0.1853
       Total |  1236.52964     4,359  .283672779   Root MSE        =    .48074

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0993878   .0046776    21.25   0.000     .0902173    .1085583
       black |  -.1438417   .0235595    -6.11   0.000    -.1900303   -.0976531
        hisp |    .015698   .0208112     0.75   0.451    -.0251026    .0564985
       exper |   .0891791    .010111     8.82   0.000     .0693563    .1090019
     expersq |  -.0028487   .0007074    -4.03   0.000    -.0042354   -.0014619
     married |   .1076656   .0156965     6.86   0.000     .0768925    .1384387
       union |   .1800726   .0171205    10.52   0.000     .1465076    .2136375
       _cons |  -.0347057    .064569    -0.54   0.591    -.1612938    .0918824
------------------------------------------------------------------------------
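To show how the summary statistics in the pooled OLS output relate to each other, the sketch below recomputes a few of them from numbers printed in the table. The variable names are chosen for this illustration; only the input values come from the output.

```python
# Values copied from the pooled OLS output above.
coef_educ, se_educ = 0.0993878, 0.0046776
ss_model, ss_resid, ss_total = 230.719766, 1005.80988, 1236.52964
n_obs, k = 4360, 7  # observations and slope coefficients

# t statistic: coefficient divided by its standard error.
t_educ = coef_educ / se_educ

# R-squared: share of total variation explained by the model.
r2 = ss_model / ss_total

# Adjusted R-squared penalises the number of regressors.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

# Overall F statistic: ratio of model to residual mean squares.
f_stat = (ss_model / k) / (ss_resid / (n_obs - k - 1))

print(round(t_educ, 2), round(r2, 4), round(adj_r2, 4), round(f_stat, 2))
```

Because the dependent variable is the log of the wage, the educ coefficient of about 0.099 reads as roughly a 9.9 percent higher wage per additional year of education, holding the other regressors fixed.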

2) First Difference Estimator

. sort nr year

. regress D.(lwage educ black hisp exper expersq married union)
note: D.educ omitted because of collinearity
note: D.black omitted because of collinearity
note: D.hisp omitted because of collinearity
note: D.exper omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,815
-------------+----------------------------------   F(3, 3811)      =      5.36
       Model |  3.15766207         3  1.05255402   Prob > F        =    0.0011
    Residual |  748.036267     3,811   .19628346   R-squared       =    0.0042
-------------+----------------------------------   Adj R-squared   =    0.0034
       Total |  751.193929     3,814  .196956982   Root MSE        =    .44304

------------------------------------------------------------------------------
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |
         D1. |          0  (omitted)
             |
       black |
         D1. |          0  (omitted)
             |
        hisp |
         D1. |          0  (omitted)
             |
       exper |
         D1. |          0  (omitted)
             |
     expersq |
         D1. |  -.0038824   .0013863    -2.80   0.005    -.0066004   -.0011644
             |
     married |
         D1. |   .0381377   .0229283     1.66   0.096    -.0068152    .0830905
             |
       union |
         D1. |   .0427878   .0196575     2.18   0.030     .0042477     .081328
             |
       _cons |     .11575   .0195867     5.91   0.000     .0773487    .1541514
------------------------------------------------------------------------------
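The omitted-variable notes in the first-difference output can be reproduced by hand. The sketch below uses made-up records for one worker and a hypothetical helper first_difference. A variable that never changes differences to zero, and a variable that rises by one every year differences to a constant, which cannot be separated from the intercept of the differenced regression.

```python
# Hypothetical three-year records for one worker (values are made up).
records = [
    {"year": 1980, "educ": 14, "exper": 1, "union": 0},
    {"year": 1981, "educ": 14, "exper": 2, "union": 1},
    {"year": 1982, "educ": 14, "exper": 3, "union": 1},
]
for r in records:
    r["expersq"] = r["exper"] ** 2


def first_difference(records, var):
    """Return var(t) - var(t-1) for consecutive years of one unit."""
    ordered = sorted(records, key=lambda r: r["year"])
    return [cur[var] - prev[var] for prev, cur in zip(ordered, ordered[1:])]


for var in ("educ", "exper", "expersq", "union"):
    print(var, first_difference(records, var))

# Differencing drops one period per unit: 545 units observed for 8 years.
n_units, n_periods = 545, 8
fd_obs = n_units * (n_periods - 1)
print(fd_obs)
```

The last line gives 3,815, the number of observations reported in the differenced regression.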

3) FIXED EFFECT MODEL
• The fixed effects model is simply a linear regression model in which the intercept terms vary over the individual units i, i.e.
  $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
• We can write this in the usual regression framework by including a dummy variable for each unit i in the model:
  $y_{it} = \sum_{j=1}^{N} \alpha_j d_{ij} + x_{it}'\beta + \varepsilon_{it}$
• where $d_{ij} = 1$ if $i = j$ and 0 elsewhere. We thus have a set of N dummy variables in the model.
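Estimating N dummy coefficients directly is rarely necessary. A standard equivalent route, sketched here in assumed notation (unit intercept \alpha_i, idiosyncratic error \varepsilon_{it}, unit time averages marked with a bar), is the within transformation:

```latex
% Model with unit-specific intercepts
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}

% Average over t = 1,...,T for each unit i
\bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i

% Subtract: \alpha_i cancels, leaving deviations from unit means
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta
                     + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Any regressor that is constant over time for every unit has x_{it} - \bar{x}_i = 0, so its coefficient cannot be estimated. This is why educ, black, and hisp are reported as omitted in the fixed effects output.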
. xtreg lwage educ black hisp exper expersq married union, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1780                                         min =          8
     between = 0.0005                                         avg =        8.0
     overall = 0.0638                                         max =          8

                                                F(4,3811)         =     206.38
corr(u_i, Xb)  = -0.1139                        Prob > F          =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |          0  (omitted)
       black |          0  (omitted)
        hisp |          0  (omitted)
       exper |   .1168467   .0084197    13.88   0.000     .1003392    .1333542
     expersq |  -.0043009   .0006053    -7.11   0.000    -.0054876   -.0031142
     married |   .0453033   .0183097     2.47   0.013     .0094056     .081201
       union |   .0820871   .0192907     4.26   0.000      .044266    .1199083
       _cons |    1.06488   .0266607    39.94   0.000     1.012609     1.11715
-------------+----------------------------------------------------------------
     sigma_u |   .4000539
     sigma_e |  .35125535
         rho |   .5646785   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(544, 3811) = 9.71                   Prob > F = 0.0000
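Two numbers at the bottom of the fixed effects output can be recovered from the others. The sketch below, with variable names chosen for this illustration, computes rho as the share of the total error variance attributed to the unit effects and checks the residual degrees of freedom used in the F tests.

```python
# Values copied from the fixed effects output above.
sigma_u = 0.4000539    # std. dev. of the unit effects u_i
sigma_e = 0.35125535   # std. dev. of the idiosyncratic error

# rho: fraction of the variance that is due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Residual degrees of freedom: observations minus unit intercepts
# minus the number of estimated slope coefficients.
n_obs, n_groups, n_slopes = 4360, 545, 4
resid_df = n_obs - n_groups - n_slopes

print(round(rho, 4), resid_df)
```

A rho of about 0.56 says that more than half of the unexplained variation in log wages is attributed to persistent differences between workers, and the F test at the bottom of the output rejects the hypothesis that all unit effects are zero.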

4) RANDOM EFFECT MODEL

The random effects (EGLS) estimator combines the information from the between and within dimensions in an efficient way.
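One common way to see how the two dimensions are combined is the quasi-demeaned form of the random effects model, sketched below. The notation is an assumption made for this illustration: \mu is a common intercept, \alpha_i is the unit effect with variance \sigma_\alpha^2 and assumed uncorrelated with the regressors, and \varepsilon_{it} is the idiosyncratic error with variance \sigma_\varepsilon^2.

```latex
% Random effects model
y_{it} = \mu + x_{it}'\beta + \alpha_i + \varepsilon_{it}

% GLS is equivalent to OLS on quasi-demeaned data
y_{it} - \theta\,\bar{y}_i
  = (1-\theta)\,\mu + (x_{it} - \theta\,\bar{x}_i)'\beta + v_{it},
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}
                        {\sigma_\varepsilon^2 + T\,\sigma_\alpha^2}}
```

With \theta = 0 this is pooled OLS and with \theta = 1 it is the fixed effects (within) estimator, so random effects lies between the two. Because only a fraction of each unit mean is removed, time-invariant regressors such as educ remain estimable.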
. xtreg lwage educ black hisp exper expersq married union, re

Random-effects GLS regression                   Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1774                                         min =          8
     between = 0.1837                                         avg =        8.0
     overall = 0.1808                                         max =          8

                                                Wald chi2(7)      =     943.95
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1012246   .0089133    11.36   0.000     .0837549    .1186943
       black |  -.1441307   .0476148    -3.03   0.002     -.237454   -.0508073
        hisp |   .0201511   .0426011     0.47   0.636    -.0633456    .1036477
       exper |   .1121195   .0082609    13.57   0.000     .0959285    .1283105
     expersq |  -.0040689   .0005918    -6.88   0.000    -.0052288   -.0029089
     married |   .0627951   .0167729     3.74   0.000     .0299209    .0956693
       union |   .1073789     .01783     6.02   0.000     .0724327     .142325
       _cons |  -.1074643   .1107057    -0.97   0.332    -.3244435    .1095149
-------------+----------------------------------------------------------------
     sigma_u |  .32456727
     sigma_e |  .35125535
         rho |  .46057172   (fraction of variance due to u_i)
------------------------------------------------------------------------------
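The variance components in the random effects output determine how much of each unit mean the estimator removes. The sketch below computes that weight, usually called theta, for a balanced panel with T = 8, using the textbook formula theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2)). The variable names are chosen for this illustration.

```python
import math

# Values copied from the random effects output above.
sigma_u = 0.32456727   # std. dev. of the unit effects
sigma_e = 0.35125535   # std. dev. of the idiosyncratic error
periods = 8            # observations per unit (balanced panel)

# Share of variance due to the unit effects (reported as rho).
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# Quasi-demeaning weight: 0 gives pooled OLS, 1 gives fixed effects.
theta = 1 - math.sqrt(sigma_e**2 / (sigma_e**2 + periods * sigma_u**2))

print(round(rho, 4), round(theta, 3))
```

The weight comes out at roughly 0.64, so the random effects estimates are expected to sit closer to the fixed effects estimates than to pooled OLS.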

Compare the estimates
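. estimates table OLS Random Fixed

-----------------------------------------------------
    Variable |     OLS          Random       Fixed
-------------+---------------------------------------
        educ |  .09938779    .10122462    (omitted)
       black | -.14384171   -.14413068    (omitted)
        hisp |  .01569798    .02015107    (omitted)
       exper |  .08917907     .1121195    .11684669
     expersq | -.00284866   -.00406885   -.00430089
     married |  .10766558     .0627951    .04530332
       union |  .18007257    .10737886    .08208713
       _cons | -.03470569    -.1074643    1.0648798
-----------------------------------------------------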

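• The above results show that the random effect estimates differ from the pooled OLS estimates: they are larger for educ and exper but smaller for married and union. For the time-varying regressors, the random effect estimates lie between the pooled OLS and fixed effect estimates.
• There is also a substantial difference in the standard errors: for educ, for example, .0047 under pooled OLS against .0089 under random effects.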


Hausman test

. hausman Fixed Random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     Fixed        Random       Difference          S.E.
-------------+----------------------------------------------------------------
       exper |    .1168467     .1121195        .0047272        .0016276
     expersq |   -.0043009    -.0040689        -.000232        .0001269
     married |    .0453033     .0627951       -.0174918        .0073427
       union |    .0820871     .1073789       -.0252917        .0073636
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       31.45
                Prob>chi2 =      0.0000
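The reported Prob>chi2 of 0.0000 is a rounded value. As a rough check, the sketch below evaluates the upper tail of a chi-squared distribution with 4 degrees of freedom at 31.45, using the closed form that exists for even degrees of freedom. The function name chi2_sf_even_df is hypothetical and written for this example.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(31.45, 4)
print(p_value)            # a very small number, well below 0.05
print(f"{p_value:.4f}")   # rounds to 0.0000 as in the output
```

Since the p-value is far below 0.05, the null hypothesis that the difference in coefficients is not systematic is rejected, which under the usual reading of the Hausman test favours the fixed effects estimates over random effects.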

• Since the fixed effects model wipes out time-invariant variables, how can we use time-invariant variables in a fixed effects model?
• Simply by adding interactions (transforming the variable into a time-varying one)
• Let's assume that we want to examine the effect of education on wage
• First, generate an interaction between the education variable and a time dummy for each round of the panel data except the first round, to make it time-varying
• Then, estimate the FE model
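The first step can be sketched as follows. The function add_educ_year_interactions and the column names educ_x_1981, educ_x_1982, and so on are hypothetical, chosen for this illustration; the example rows are made up. Each new column equals educ in its own year and zero otherwise, so it varies over time within a worker even though educ itself does not.

```python
def add_educ_year_interactions(rows, base_year=1980):
    """Add one educ-by-year interaction per round except the base round."""
    years = sorted({r["year"] for r in rows})
    out = []
    for r in rows:
        new_row = dict(r)
        for y in years:
            if y == base_year:
                continue  # the first round is the reference period
            new_row[f"educ_x_{y}"] = r["educ"] if r["year"] == y else 0
        out.append(new_row)
    return out


# Hypothetical records for one worker with 14 years of education.
rows = [
    {"nr": 1, "year": 1980, "educ": 14},
    {"nr": 1, "year": 1981, "educ": 14},
    {"nr": 1, "year": 1982, "educ": 14},
]
expanded = add_educ_year_interactions(rows)
for r in expanded:
    print(r)
```

In a fixed effects regression, the coefficient on each interaction then measures how the return to education in that round differs from the return in the base round; the base-round return itself remains unidentified.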
. xtreg lwage d81 d82 d83 d84 d85 d86 d87 exper expersq married union, fe
note: exper omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      4,360
Group variable: nr                              Number of groups  =        545

R-sq:                                           Obs per group:
     within  = 0.1806                                         min =          8
     between = 0.0286                                         avg =        8.0
     overall = 0.0888                                         max =          8

                                                F(10,3805)        =      83.85
corr(u_i, Xb)  = -0.1222                        Prob > F          =     0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1511912   .0219489     6.89   0.000     .1081584     .194224
         d82 |   .2529709   .0244185    10.36   0.000     .2050963    .3008454
         d83 |   .3544437   .0292419    12.12   0.000     .2971125    .4117749
         d84 |   .4901148   .0362266    13.53   0.000     .4190894    .5611402
         d85 |   .6174823   .0452435    13.65   0.000     .5287784    .7061861
         d86 |   .7654966   .0561277    13.64   0.000     .6554532    .8755399
         d87 |   .9250249   .0687731    13.45   0.000     .7901893    1.059861
       exper |          0  (omitted)
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
       _cons |   1.426019   .0183415    77.75   0.000     1.390058    1.461979
-------------+----------------------------------------------------------------
     sigma_u |  .39176195
     sigma_e |  .35099001
         rho |  .55472817   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(544, 3805) = 9.16                   Prob > F = 0.0000

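• The coefficients on d81 to d87 are positive and grow from year to year. If d81 to d87 are the education-by-time interactions described above, this means that the return to education increases over time and that education has a positive effect on wage. If they are plain year dummies, which is consistent with exper being dropped for collinearity, the coefficients measure year effects instead.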
