
Appendix 3: Panel data modeling and estimation process

Our final baseline specification has required adopting a sequential approach in order to deal
with a number of issues arising in panel data analysis. Our methodology leans heavily on
Cameron and Trivedi's (2009) book Microeconometrics Using Stata, Christopher F. Baum's
(2013) Financial Econometrics lectures (Boston College) and Torres's (2007) Panel Data
Analysis Using Stata lectures (Princeton). We have chosen to use the conventional notation of
Cameron and Trivedi's (2009) book throughout our study because there is no clear consensus
among researchers on a universal notation. In addition, the different tests and estimations have
been performed with the statistical software Stata®. In this section, we review in order: (a) the
'heterogeneity bias' problem, (b) three basic panel models, (c) the different model tests, (d) the
error/disturbance structure and (e) the estimation issue.

A. Unobserved heterogeneity

As explicitly stated by Cameron and Trivedi (2009), 'the goal of a linear regression is to
estimate the parameters of the linear conditional mean $E(y \mid x) = x'\beta = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_K x_K$'¹. However, any regression model may suffer from 'omitted variable
bias', meaning that unobserved individual- or time-specific factors might influence the
regression outcome beyond the defined regressors (Baum, 2013; Wooldridge, 2012). Not
controlling for this issue, which amounts to assuming that units and time periods are
homogeneous in levels, can lead to model misspecification and subsequently to
biased/inconsistent estimates (Baum, 2013). As mentioned by Baltagi (2005), panel data
models can appropriately deal with unobserved heterogeneity and capture its net effect.

B. Basic linear panel data models

We consider three basic panel models: two individual-effects models, namely the covariance
model and the error component model, and a pooled/population-averaged model. Once
specified, the different models will be subjected to several statistical tests to determine their
relevance for our panel data.

¹ Cameron, A.C., Trivedi, P.K. (2009), Microeconometrics Using Stata, Stata Press Publication, p. 80.
1. The covariance model or fixed-effects model (FE)²

$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$   for i = 1, …, N and t = 1, …, T   (1)

where $x_{it}$ is a (K×1) vector of explanatory variables, $\beta$ the corresponding parameter
vector, $\alpha_i$ are random individual-specific effects and $\varepsilon_{it}$ are idiosyncratic error terms with
$\varepsilon_{it} \sim i.i.d.(0, \sigma_\varepsilon^2)$. The model has (N+K) parameters.

The fixed-effects (FE) model is based on the following assumptions:

a) the $\alpha_i$ are permitted to be correlated with the regressors $x_{it}$;

b) strict exogeneity holds, that is $E(\varepsilon_{it} \mid \alpha_i, x_{it}) = 0$.

The fixed-effects model accounts for time-invariant unobserved features of the cross-sectional
units in order to obtain consistent estimates of the marginal effect of the regressors on
$E(y_{it} \mid \alpha_i, x_{it})$ (Cameron and Trivedi, 2009; Torres, 2007). The FE method controls for these
differences between cross-sectional units by including individual-specific intercepts (the $\alpha_i$)
while assuming a constant variance across individuals (Wooldridge, 2012). The regression is
estimated with an Ordinary Least Squares (OLS) estimator³.
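As footnote 3 notes, the FE regression can be performed with either the 'Within' or the 'LSDV' technique. A minimal sketch in Stata (variable and panel identifiers here are placeholders, not our actual dataset); both routes return identical slope coefficients:

* declare the panel structure (placeholder identifiers)
xtset country year
* within (demeaned) estimator
xtreg y x1 x2, fe
* LSDV: pooled OLS with explicit individual dummies
regress y x1 x2 i.country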

2. The error component model or random-effect model (RE)

$y_{it} = x_{it}'\beta + (\alpha_i + \varepsilon_{it})$   for i = 1, …, N and t = 1, …, T   (2)

In the random-effects model, the individual-specific effects $\alpha_i$ are assumed to be random
variables distributed independently of $x_{it}$. In order to capture this individual heterogeneity,
the RE method estimates error variances specific to cross-sectional units (Park, 2011).
Consequently, $\alpha_i$ is treated as a component of the composite error term, defined as follows:
$u_{it} = \alpha_i + \varepsilon_{it}$, where $\alpha_i \sim i.i.d.(\alpha, \sigma_\alpha^2)$ and $\varepsilon_{it} \sim i.i.d.(0, \sigma_\varepsilon^2)$ (Cameron and Trivedi,
2009).
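For the reader's convenience, this error-component structure implies the following within-individual covariance pattern (a standard result stated here for completeness; it is what the GLS estimator mentioned below exploits):

\[
\operatorname{Var}(u_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2, \qquad
\operatorname{Cov}(u_{it}, u_{is}) = \sigma_\alpha^2 \;\; (t \neq s),
\]

so that the composite errors of a given individual are equicorrelated with $\operatorname{Cor}(u_{it}, u_{is}) = \sigma_\alpha^2/(\sigma_\alpha^2 + \sigma_\varepsilon^2)$.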

² Due to the incidental parameters problem, we do not specify a two-way fixed-effects model with both
individual and time-specific effects. Indeed, such a specification would increase the number of parameters
to be estimated, which would imply a loss of degrees of freedom and thus less efficient parameter
estimates.
³ It can be performed with either the 'Within', 'LSDV' or 'Between' estimation technique.
The difference among units now lies in their individual disturbance terms rather than in their
specific intercepts (Park, 2011). Treating the $\alpha_i$ as purely random entails the following
assumptions about the model:

a) the $\alpha_i$ are uncorrelated with the regressors $x_{it}$;

b) strict exogeneity holds, that is $E(u_{it} \mid x_{it}) = 0$.

Although the RE model reduces the number of parameters to be estimated (K instead of N+K
in the FE model), it will produce inconsistent estimates if $\alpha_i$ and $x_{it}$ are correlated, because
this would imply that the explanatory variables are correlated with the error term (Cameron
and Trivedi, 2009). So the main question surrounding individual-effects models is whether
these effects are correlated with the regressors, rather than whether they should be imputed to
the intercept or to the variance components (Greene, 2008). The regression is estimated with a
Generalized Least Squares (GLS) estimator.

3. Pooled or population-averaged (PA) model:

$y_{it} = \beta_0 + x_{it}'\beta + u_{it}$   for i = 1, …, N and t = 1, …, T   (3)

The pooled model can be seen as a natural starting point where the data are pooled together
and individual effects are averaged out. Indeed, this basic regression does not include any
fixed or random effect but assumes a common intercept $\beta_0$ for every cross-sectional unit and
exogenous regressors $x_{it}$. The composite error equals $u_{it} = (\alpha_i - \alpha) + \varepsilon_{it}$, where the
individual effects $\alpha_i - \alpha$ are centered on zero and the idiosyncratic error
$\varepsilon_{it} \sim i.i.d.(0, \sigma_\varepsilon^2)$ (Cameron and Trivedi, 2009).

The regression is estimated either by a pooled OLS or a pooled FGLS technique. Like RE
estimators, pooled OLS provides consistent parameter estimates of $\beta$ if we can be certain
that the disturbance term $u_{it}$ is uncorrelated with $x_{it}$ (Baum, 2013; Cameron and Trivedi,
2009).
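A minimal sketch of the pooled OLS estimation, using the same variable names as the Stata commands in Section C below:

* pooled OLS: one common intercept, no individual effects
regress spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr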

C. Testing for individual specific effects

In order to choose between the different panel models, we first need to test for the presence of
unobserved/individual-specific effects ($\alpha_i$). Fixed effects are tested with a Fisher (F) test,
while random effects are explored with Breusch and Pagan's Lagrange Multiplier (LM) test
(Park, 2011). Following Park (2011), the F-test settles whether fixed effects or simple
pooled OLS better fits our panel data, whereas the LM test contrasts the random effects with
pooled OLS.

C.1. Testing for fixed effects (F-test)

It tests the null hypothesis that all individual intercepts are equal to zero, i.e. $H_0: \alpha_i = 0$ in
the regression model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$. More specifically, the result is an F-statistic with
(N−1, NT−N−K) degrees of freedom that quantifies by how much the goodness of fit has
changed (Park, 2011). By default, Stata's FE estimator command xtreg, fe includes the F-test
for fixed effects.
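For reference, a standard textbook form of this statistic, comparing the fit of the FE (LSDV) regression against the pooled regression (our reading of Park, 2011, and Greene, 2008), is:

\[
F(N-1,\; NT-N-K) \;=\; \frac{(R^2_{FE} - R^2_{pooled})/(N-1)}{(1 - R^2_{FE})/(NT-N-K)}.
\]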

xtreg spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr, fe

F test that all u_i=0: F(8, 415) = 11.57 Prob > F = 0.0000

Here, the p-value is small enough (significant at the 0.01 level) to reject the null hypothesis. So
there is a significant fixed effect, and the FE model is preferred to a pooled OLS model.

C.2. Testing for random effects (Breusch-Pagan LM test)

It tests the null hypothesis that all individual-specific variance components are zero, i.e.
$H_0: \sigma_\alpha^2 = 0$ in the regression model $y_{it} = x_{it}'\beta + (\alpha_i + \varepsilon_{it})$. After having run the random-effects
model, we test for this specification with Stata's command xttest0.

xtreg spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr, re

xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

Test: Var(u) = 0

chibar2(01) = 41.77

Prob > chibar2 = 0.0000

Here, the p-value is small enough (significant at the 0.01 level) to reject the null hypothesis. So
there is a significant random effect, and the RE model is preferred to the pooled OLS model.
C.3. Testing between FE and RE (Hausman Test)

The distinction between the covariance and error component model is crucial in panel data
analysis.

From an economic point of view, we have to consider countries' potential unobserved
heterogeneity. Inspired by Reinhart's (2010) 'timeline of countries' creditworthiness and
financial turmoil', we believe factors like countries' (i) serial default (countries that have
experienced multiple defaults), (ii) domestic debt, (iii) serial pattern in the incidence of
international assistance programs, (iv) ramp-up in short-term debt issuance and (v) national
banking crises could be included in the individual-specific effects and possibly be correlated
with the regressors. Furthermore, the nature of the sample's cross-sectional units may also
influence our model choice. According to Baum (2013), the FE model better fits observations
drawn from a mutually exhaustive set of cross-sections. Like the fifty states of the United
States, our nine countries nearly comprise the entire population of the Eurozone's founding
member states.

From an econometric point of view, the RE estimator has the advantage of securing more
efficient coefficient estimates, because it saves N−1 degrees of freedom (i.e. parameters to be
estimated) compared with its FE counterpart. Moreover, it offers the ability to estimate
coefficients of time-invariant regressors, a characteristic not shared by the within (FE)
estimator (Cameron and Trivedi, 2009). Nevertheless, the RE model rests on the
over-identifying restriction that individual-specific effects are independently distributed of the
regressors. If this additional orthogonality condition is violated, meaning that cross-sectional
characteristics are correlated with the explanatory variables, the estimated parameters are
biased and inconsistent (Podestà, 2002). So the crucial issue is to test for the existence of
such a correlation between the specific error term $\alpha_i$ and the regressors $x_{it}$. This is
performed with the Hausman test, which assesses the appropriateness of the RE estimator⁴.

Indeed, it tests the null hypothesis that the individual-specific effects are random, i.e.
$E(\alpha_i + \varepsilon_{it} \mid x_{it}) = 0$. More specifically, the Hausman test checks whether there are
systematic differences between the coefficient estimates of the two models (Baum, 2013).
Under the null hypothesis, both estimators are consistent and should display similar results,
whereas under the alternative one estimator differs widely from the consistent estimator
(Cameron and Trivedi, 2009). In light of this, the RE estimator is consistent and more
efficient than the FE estimator under H0, while only the FE estimator remains consistent
under the alternative.

⁴ The test is performed conditional on the specification of the model.
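Formally, the statistic computed by Stata below is the standard quadratic form (stated here for completeness, with $k$ the number of tested coefficients):

\[
H = (\hat\beta_{FE} - \hat\beta_{RE})' \big[\widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE})\big]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2(k) \text{ under } H_0.
\]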

Stata's hausman command implements the test as follows:

xtreg spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr,fe

est store FE

xtreg spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr,re

est store RE

hausman FE RE

Test: Ho: difference in coefficients not systematic

chi2(8) = (b-B)'[(V_b-V_B)^(-1)](b-B)

= 47.13

Prob>chi2 = 0.0000

(V_b-V_B is not positive definite)

Here, the overall statistic $\chi^2(k)$ has a p-value of 0.0000. This leads us to reject the null
hypothesis at any conventional confidence level. So the effects are fixed, and the regression
model should be an individual FE model.

We now shift to some other complications arising from the use of panel data structures.

D. Model errors structure

D.1. Structure of the disturbance term

Until now, we have assumed that the idiosyncratic errors were generated in a spherical
manner and thus satisfied the classical OLS assumptions of homoskedasticity and no
correlation (i.e. i.i.d. errors). So we have:

a) $E(e_{it}) = 0$

b) $Var(e_{it}) = \sigma^2$

c) $Cov(e_{it}, e_{js}) = 0$ if $t \neq s$ or $i \neq j$

This is equivalent to considering that the default variance-covariance matrix (VCE) of the
disturbance terms can be written as (Stata Manual):

\[
E(ee') = \Omega_{default} =
\begin{pmatrix}
\sigma^2 I & 0 & \cdots & 0 \\
0 & \sigma^2 I & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2 I
\end{pmatrix}
\]

However, panel data structures often violate these standard assumptions about the error
process (Podestà, 2002). So we need to check the assumptions concerning homoskedasticity,
cross-sectional correlation (contemporaneous correlation) and autocorrelation within units
(serial correlation). This is of primary importance in order to avoid our findings being
statistical artifacts.

Let us start with the diagnostics of the residuals of our individual FE model.

D.2. Testing for heteroskedasticity

In many panel datasets, the error variance can differ across cross-sectional units. Among the
reasons for this phenomenon, we can cite differences in the scale of the dependent variable
between units. Consequently, we perform a modified Wald test to detect the existence of
groupwise heteroskedasticity in the residuals of our fixed-effects regression. Under the null
hypothesis, the variance of the error is the same for all individuals:
$\sigma_i^2 = \sigma^2 \;\; \forall i = 1, \dots, N$. We test this assumption in Stata thanks to the user-written routine
xttest3 developed by C. Baum (2001).
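The user-written routines used in this section are assumed to have been installed beforehand from the SSC archive, e.g.:

* one-off installation of the post-estimation test
ssc install xttest3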

.xttest3

Modified Wald test for groupwise heteroskedasticity

in cross-sectional time-series FGLS regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (9) = 13981.60

Prob>chi2 = 0.0000

Here, the overall statistic $\chi^2(N)$ has a p-value of 0.0000. This leads us to strongly reject the
null hypothesis at any confidence level. So a phenomenon of groupwise heteroskedasticity is
present.
D.3. Testing for cross-sectional correlation

A second deviation from i.i.d. errors could result from the contemporaneous correlation of
errors across units, i.e. $E(e_{it} e_{jt}) \neq 0$ for $i \neq j$⁵. To test for cross-sectional dependence in
the error term, we run a Breusch-Pagan LM test. Under the null hypothesis, the residual
correlation matrix is an identity matrix of order N, which means that the error terms are not
correlated across entities (Baum, 2001). We test this assumption in Stata thanks to the
user-written routine xttest2 developed by C. Baum (2001).

. xttest2

Correlation matrix of residuals:

        __e1     __e2     __e3     __e4     __e5     __e6     __e7     __e8     __e9
__e1   1.0000
__e2   0.3363   1.0000
__e3   0.4143   0.3018   1.0000
__e4   0.4273  -0.1871  -0.1178   1.0000
__e5   0.3253   0.5059   0.4809  -0.0427   1.0000
__e6  -0.1811   0.1642   0.3839   0.0159   0.4727   1.0000
__e7   0.3569   0.7009   0.5620  -0.3256   0.6610   0.2769   1.0000
__e8   0.5311  -0.0166   0.0415   0.8932   0.2096   0.1167  -0.1386   1.0000
__e9   0.6979  -0.0880   0.1483   0.7328  -0.0016  -0.2429  -0.0606   0.6785   1.0000

Breusch-Pagan LM test of independence: chi2(36) = 284.279, Pr = 0.0000

Based on 48 complete observations over panel units

Here, the overall statistic $\chi^2(N(N-1)/2)$ has a p-value of 0.0000. This leads us to strongly
reject the null hypothesis at any confidence level. So the errors exhibit cross-sectional
correlation.

D.4. Testing for autocorrelation within units

According to Torres (2007), serial correlation is responsible for overly optimistic standard
errors. To check for this complication, we run a Wald test where the null hypothesis assumes
no first-order autocorrelation. Should serial correlation be detected, we may replace the
individual identity matrices along the diagonal of $\Omega_{default}$ with more general structures to
allow for this correlation (Stata Manual, 2014). We test this assumption in Stata using the
user-written routine xtserial.

⁵ According to Baltagi (2005), cross-sectional dependence is a complication particularly specific to long panels.

. xtserial spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr

Wooldridge test for autocorrelation in panel data

H0: no first-order autocorrelation

F( 1, 8) = 441.355

Prob > F = 0.0000

The p-value (<0.01) leads us to strongly reject the null hypothesis and to validate the presence
of first-order autocorrelation, i.e.:

$\varepsilon_{i,t} = \rho\, \varepsilon_{i,t-1} + \eta_{i,t}$, with $\eta_{i,t} \sim i.i.d.(0, \sigma_\eta^2)$, where the $\eta_{i,t}$ are incoming 'shocks'.

So, our error structure is characterized by panel heteroskedasticity, autocorrelation and
contemporaneous correlation (HPAC)⁶. However, controlling for these standard-error
complications depends upon the nature of the panel under study.

In short panels (T fixed, N → ∞), we can use alternative covariance-matrix estimators and
obtain valid standard errors (Cameron and Trivedi, 2009). White's (1980) robust standard
errors and Rogers' (1983) clustered standard errors are the most popular. Besides being
heteroskedasticity-consistent like White's robust SEs, the cluster option additionally controls
for arbitrary autocorrelation (Hoechle, 2007)⁷. Yet some conditions are required for the use of
these standard errors, notably that the errors be independent across individuals (often assumed
in short panels) and that the asymptotics in N (N → ∞) hold.

⁶ The HPAC acronym is taken over from Blackwell (2005).
⁷ In Stata, robust and clustered standard errors are respectively obtained with the options
vce(robust) and cluster(id), available for most estimation commands.
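As a minimal short-panel sketch of these options (placeholder variable names; as explained next, these options are not appropriate for our own data):

* White heteroskedasticity-robust standard errors
xtreg y x1 x2, fe vce(robust)
* Rogers standard errors clustered by panel unit
xtreg y x1 x2, fe vce(cluster id)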
In our case, the time periods (T = 48) are more numerous than the cross-sectional units (N = 9).
So our dataset is temporally dominated and can be characterized as a long panel (N fixed,
T → ∞). Since T is large relative to N, the asymptotics behind the correct functioning of the
robust and cluster options are violated. Consequently, long panels cannot rely on these options
and require putting some structure on the assumed error process, which is not the case in short
panels (Cameron and Trivedi, 2009)⁸. This emphasis on richer and more flexible models of the
disturbance term is paramount because it guides us toward different preferred estimation
methods.

E. Estimation issue

Since a covariance (individual fixed-effects) model better fits our panel data (see sub-section
C), we need to focus on the following estimation methods: (a) the feasible generalized least
squares (FGLS) estimator, (b) OLS with panel-corrected standard errors (PCSE) and (c) the
FE (Within, LSDV) estimator (Blackwell, 2005). Nevertheless, the HPAC structure of our
disturbance term (see sub-section D) rules out the simple FE estimators, which do not have the
appropriate options to deal with non-spherical errors. This leaves us with the two
large-T-consistent covariance-matrix estimators, namely Parks-Kmenta's (1986) FGLS
approach and Beck and Katz's (1995) PCSE method (Hoechle, 2007)⁹.

The former uses an application of GLS estimation that fits panel data models, namely the
FGLS estimator. This estimation strategy has the same optimal properties as GLS for panel
data but avoids the GLS assumption that the covariance matrix $\Omega$ is known (Podestà,
2002). Instead, it uses an estimate $\hat\Omega$ of the variance-covariance matrix in place of $\Omega$ in the
following formula, which gives us unbiased estimates of $\beta$ under very general conditions
(Stata Manual):

\[
\hat\beta = (X' \hat\Omega^{-1} X)^{-1} X' \hat\Omega^{-1} y
\]

⁸ The error-structure modeling will offer the possibility of relaxing the over-restrictive assumption of
independence of the spatial units (N). According to Hoechle (2007), panel data are likely to exhibit
cross-sectional dependencies, especially in a cross-national context (the case of our study). So it is
strongly advisable to deal with this complication.
⁹ We deliberately decided not to discuss the Driscoll and Kraay estimator, which applies nonparametric
corrections for the contemporaneous correlation. Modeling general forms of spatial dependence is of great
interest when the cross-sectional dimension N of the panel gets large, which is not the case in our study.
Implementing this estimation method would thus render our process unnecessarily intricate.
However, the Parks-Kmenta method requires that T be larger than N. Moreover, Beck and
Katz (1995) question the performance of FGLS in finite samples and claim that this method
tends to produce overconfident standard errors. So the authors suggest using a classic OLS
estimation method with large-T-based standard errors corrected for the HPAC complications,
namely the PCSEs (Beck and Katz, 1995).

Beck and Katz's (1995) argument of overconfident standard errors under FGLS needs to be
tempered in our case, however. First, as a general remark, the authors were unable to provide
analytic formulae for the degree of overconfidence, which obliges us to settle between the two
methods on the basis of Monte Carlo experiments. Second, our panel data structure (N = 9,
T = 48) is one of the most favorable cases for Parks-Kmenta: following Beck and Katz's
(1995) tables of Monte Carlo results, Parks becomes more efficient than OLS when the
average cross-sectional correlation of the residuals rises to 0.50. Calculated from the
correlation matrix of residuals (see sub-section D.3 above), this number stands at 0.4289
(≈ 0.5) in our case. When we combine this finding with our favorable T/N ratio, the results
indicate a 13% efficiency gain in favor of the FGLS estimation relative to OLS. This confirms
Cameron and Trivedi's (2009) argument about the FGLS estimator's efficiency in large-T
datasets.

In conclusion, we will use an FGLS estimation method for our regression models. The xtgls
command with the adequate options runs this estimation process in Stata.
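As a hedged sketch (the option choices below reflect our own reading of the HPAC diagnosis in sub-section D, not a specification prescribed by the sources cited above), the estimation could be run as:

* FGLS with panel-level heteroskedasticity and contemporaneous
* correlation (panels(correlated)) and a common AR(1) process (corr(ar1))
xtgls spreadgerm corspaaa ca ds debt budgetbal ir outdebt gdpgr, panels(correlated) corr(ar1)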
