Models for Panel Data
Panel Data
In panel data the same cross-sectional unit is surveyed over time. In short, panel data have space
as well as time dimensions. There are other names for panel data, such as pooled data,
combination of time series and cross-section data, micropanel data, and longitudinal data. Panel
data are increasingly used in economic research.
Advantages of Panel Data over Cross-Section or Time Series Data
Baltagi lists the following advantages of panel data:
Since panel data relate to individuals, firms, states, countries, etc. over time, there is
bound to be heterogeneity in these units. The techniques of panel data estimation can
take such heterogeneity explicitly into account by allowing for individual-specific
variables.
Panel data give more informative data, more variability, less collinearity among variables,
more degrees of freedom, and more efficiency.
Panel data are better suited to study the dynamics of change.
Panel data can better detect and measure effects that simply cannot be observed in pure
cross-section or pure time series data.
Panel data enable us to study more complicated behavioral models.
Panel data can minimize the bias that might result from aggregating individuals or firms into
broad aggregates.
Fixed Effects or Least Squares Dummy Variable (LSDV) Regression Model
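Let $\mathbf{y}_i$ and $\mathbf{X}_i$ be the $T$ observations for the $i$th unit, $\mathbf{i}$ be a $T \times 1$ column of ones, and let $\boldsymbol{\varepsilon}_i$ be the
associated $T \times 1$ vector of disturbances. Then
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{i}\alpha_i + \boldsymbol{\varepsilon}_i,$$
which, stacked over the $n$ units, can be written as
$$\mathbf{y} = [\mathbf{X}\ \ \mathbf{d}_1\ \ \mathbf{d}_2\ \cdots\ \mathbf{d}_n]\begin{bmatrix}\boldsymbol{\beta}\\ \boldsymbol{\alpha}\end{bmatrix} + \boldsymbol{\varepsilon},$$
where $\mathbf{d}_i$ is a dummy variable indicating the $i$th unit. Let the $nT \times n$ matrix $\mathbf{D} = [\mathbf{d}_1\ \mathbf{d}_2\ \cdots\ \mathbf{d}_n]$.
Then, assembling all $nT$ rows gives
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{D}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}.$$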
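This model is usually referred to as the least squares dummy variable (LSDV) model. It
is a classical regression model, so it can be estimated by ordinary least squares with
$K$ regressors in $\mathbf{X}$ and $n$ columns in $\mathbf{D}$, as a multiple regression with $K + n$ parameters. The least
squares estimator of $\boldsymbol{\beta}$ is
$$\mathbf{b} = [\mathbf{X}'\mathbf{M}_D\mathbf{X}]^{-1}[\mathbf{X}'\mathbf{M}_D\mathbf{y}],$$
where $\mathbf{M}_D = \mathbf{I} - \mathbf{D}(\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}'$. This amounts to a least squares regression using the transformed
data $\mathbf{X}_* = \mathbf{M}_D\mathbf{X}$ and $\mathbf{y}_* = \mathbf{M}_D\mathbf{y}$.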
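The structure of $\mathbf{D}$ is particularly convenient: its columns are orthogonal, so
$$\mathbf{M}_D = \begin{bmatrix}\mathbf{M}^0 & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \mathbf{M}^0 & \cdots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{M}^0\end{bmatrix}.$$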
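Each matrix on the diagonal is
$$\mathbf{M}^0 = \mathbf{I}_T - \frac{1}{T}\mathbf{i}\mathbf{i}',$$
so premultiplying any $T \times 1$ vector by $\mathbf{M}^0$ subtracts its mean from each element.
Therefore, the least squares regression of $\mathbf{M}_D\mathbf{y}$ on $\mathbf{M}_D\mathbf{X}$ is equivalent to a regression of
$[y_{it} - \bar{y}_{i\cdot}]$ on $[\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot}]$, where $\bar{y}_{i\cdot}$ and $\bar{\mathbf{x}}_{i\cdot}$ are the scalar and $K \times 1$ vector of means of $y_{it}$ and $\mathbf{x}_{it}$ over the
$T$ observations for group $i$. The dummy variable coefficients can be recovered from the other
normal equation in the partitioned regression,
$$\mathbf{D}'\mathbf{D}\mathbf{a} + \mathbf{D}'\mathbf{X}\mathbf{b} = \mathbf{D}'\mathbf{y}, \quad\text{or}\quad \mathbf{a} = [\mathbf{D}'\mathbf{D}]^{-1}\mathbf{D}'(\mathbf{y} - \mathbf{X}\mathbf{b}).$$
This implies that for each $i$,
$$a_i = \bar{y}_{i\cdot} - \mathbf{b}'\bar{\mathbf{x}}_{i\cdot}.$$
The appropriate estimator of the asymptotic covariance matrix for $\mathbf{b}$ is
$$\text{Est.Asy.Var}[\mathbf{b}] = s^2[\mathbf{X}'\mathbf{M}_D\mathbf{X}]^{-1},$$
which uses the second moment matrix with the $\mathbf{x}$'s now expressed as deviations from their
respective group means. The disturbance variance estimator is
$$s^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{it} - \mathbf{x}_{it}'\mathbf{b} - a_i)^2}{nT - n - K}.$$
The $it$th residual is
$$e_{it} = y_{it} - \mathbf{x}_{it}'\mathbf{b} - a_i = y_{it} - \mathbf{x}_{it}'\mathbf{b} - (\bar{y}_{i\cdot} - \bar{\mathbf{x}}_{i\cdot}'\mathbf{b}) = (y_{it} - \bar{y}_{i\cdot}) - (\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})'\mathbf{b}.$$
Thus, the numerator in $s^2$ is exactly the sum of squared residuals using the least squares slopes and the
data in group mean deviation form.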
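The equivalence between the dummy variable regression and the regression on group mean deviations can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small simulated balanced panel; the dimensions, coefficient values, and variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 4, 5, 2                          # hypothetical panel dimensions
alpha = np.array([1.0, -2.0, 0.5, 3.0])    # hypothetical group effects
beta = np.array([1.5, -0.7])               # hypothetical slopes

X = rng.normal(size=(n * T, K))            # rows ordered group by group
g = np.repeat(np.arange(n), T)             # group index of each row
y = X @ beta + alpha[g] + rng.normal(scale=0.1, size=n * T)

# LSDV: regress y on [X, D], where D holds one dummy column per group.
D = (g[:, None] == np.arange(n)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)
b_lsdv, a_lsdv = coef[:K], coef[K:]

# Within form: subtract group means, then regress without dummies.
x_bar = np.vstack([X[g == i].mean(axis=0) for i in range(n)])
y_bar = np.array([y[g == i].mean() for i in range(n)])
Xw, yw = X - x_bar[g], y - y_bar[g]
b_within, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# Recover the dummy coefficients: a_i = ybar_i - b' xbar_i.
a_within = y_bar - x_bar @ b_within

# Disturbance variance with nT - n - K degrees of freedom,
# and the estimated covariance matrix of the slopes.
e = yw - Xw @ b_within
s2 = e @ e / (n * T - n - K)
cov_b = s2 * np.linalg.inv(Xw.T @ Xw)
```

Both routes should give the same slopes and the same group constants; the within form avoids building the $nT \times n$ dummy matrix, which matters when $n$ is large.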
Testing the Significance of the Group Effects
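Let us consider the hypothesis that the group effects are all equal,
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_n.$$
Under the null hypothesis of equality, the efficient estimator is pooled least squares. To test the
hypothesis we use the following $F$ statistic:
$$F(n-1,\ nT-n-K) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(n-1)}{(1 - R^2_{LSDV})/(nT-n-K)},$$
where LSDV indicates the dummy variable model and Pooled indicates the pooled or restricted
model with only a single overall constant term. If $F$ exceeds the critical value of the $F$ distribution
with $(n-1,\ nT-n-K)$ degrees of freedom, then we reject the null hypothesis; otherwise we do not reject it.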
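As an illustration of the test, the sketch below computes the $F$ statistic from the two $R^2$ values on simulated data. The data, the helper name `r_squared`, and all parameter values are assumptions made for the example.

```python
import numpy as np

def r_squared(Z, y):
    """R^2 of an OLS regression of y on Z (Z must span a constant)."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n, T, K = 5, 8, 2                               # hypothetical dimensions
g = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, K))
alpha = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # deliberately unequal effects
y = X @ np.array([1.0, -1.0]) + alpha[g] + rng.normal(size=n * T)

# Unrestricted model: slopes plus one dummy per group.
D = (g[:, None] == np.arange(n)).astype(float)
r2_lsdv = r_squared(np.hstack([X, D]), y)

# Restricted model: slopes plus a single overall constant.
r2_pooled = r_squared(np.hstack([np.ones((n * T, 1)), X]), y)

df1, df2 = n - 1, n * T - n - K
F = ((r2_lsdv - r2_pooled) / df1) / ((1.0 - r2_lsdv) / df2)
```

The statistic is then compared with a critical value for $(n-1,\ nT-n-K)$ degrees of freedom, taken from an $F$ table or, if SciPy is available, from `scipy.stats.f.ppf(0.95, df1, df2)`.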
The Within and Between Groups Estimators
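We can formulate a pooled regression model in three ways. First, the original formulation is
$$y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}.$$
In terms of deviations from the group means,
$$y_{it} - \bar{y}_{i\cdot} = (\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})'\boldsymbol{\beta} + \varepsilon_{it} - \bar{\varepsilon}_{i\cdot},$$
and in terms of the group means,
$$\bar{y}_{i\cdot} = \alpha + \bar{\mathbf{x}}_{i\cdot}'\boldsymbol{\beta} + \bar{\varepsilon}_{i\cdot}.$$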
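Each of the above models can be used to estimate $\boldsymbol{\beta}$ by least squares. In the original formulation, the moments would accumulate
variation about the overall means, $\bar{y}$ and $\bar{\mathbf{x}}$, and we would use the total sums of squares and
cross products,
$$\mathbf{S}_{xx}^{total} = \sum_{i=1}^{n}\sum_{t=1}^{T}(\mathbf{x}_{it} - \bar{\mathbf{x}})(\mathbf{x}_{it} - \bar{\mathbf{x}})', \qquad \mathbf{S}_{xy}^{total} = \sum_{i=1}^{n}\sum_{t=1}^{T}(\mathbf{x}_{it} - \bar{\mathbf{x}})(y_{it} - \bar{y}).$$
For the deviations form, since the data are in deviations already, the means of $(y_{it} - \bar{y}_{i\cdot})$ and $(\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})$
are zero. The moment matrices are the within-groups sums of squares and cross products,
$$\mathbf{S}_{xx}^{within} = \sum_{i=1}^{n}\sum_{t=1}^{T}(\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})(\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})', \qquad \mathbf{S}_{xy}^{within} = \sum_{i=1}^{n}\sum_{t=1}^{T}(\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})(y_{it} - \bar{y}_{i\cdot}).$$
Finally, for the group means form, the mean of the group means is the overall mean. The moment matrices
are the between-groups sums of squares and cross products, i.e. the variation of the group means
around the overall means,
$$\mathbf{S}_{xx}^{between} = \sum_{i=1}^{n}T(\bar{\mathbf{x}}_{i\cdot} - \bar{\mathbf{x}})(\bar{\mathbf{x}}_{i\cdot} - \bar{\mathbf{x}})', \qquad \mathbf{S}_{xy}^{between} = \sum_{i=1}^{n}T(\bar{\mathbf{x}}_{i\cdot} - \bar{\mathbf{x}})(\bar{y}_{i\cdot} - \bar{y}).$$
We know that
$$\mathbf{S}_{xx}^{total} = \mathbf{S}_{xx}^{within} + \mathbf{S}_{xx}^{between} \quad\text{and}\quad \mathbf{S}_{xy}^{total} = \mathbf{S}_{xy}^{within} + \mathbf{S}_{xy}^{between}.$$
Therefore, the (total) least squares estimator is
$$\mathbf{b}^{total} = [\mathbf{S}_{xx}^{total}]^{-1}\mathbf{S}_{xy}^{total} = [\mathbf{S}_{xx}^{within} + \mathbf{S}_{xx}^{between}]^{-1}[\mathbf{S}_{xy}^{within} + \mathbf{S}_{xy}^{between}].$$
The within-groups estimator is
$$\mathbf{b}^{within} = [\mathbf{S}_{xx}^{within}]^{-1}\mathbf{S}_{xy}^{within}.$$
The between-groups estimator is
$$\mathbf{b}^{between} = [\mathbf{S}_{xx}^{between}]^{-1}\mathbf{S}_{xy}^{between}.$$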
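The decomposition of the total moments into within and between parts, and the resulting three estimators, can be illustrated with a short NumPy sketch. The simulated data and the helper `moments` are assumptions for the example; the last lines check that the total estimator is a matrix weighted average of the within and between estimators, with weight $[\mathbf{S}_{xx}^{within} + \mathbf{S}_{xx}^{between}]^{-1}\mathbf{S}_{xx}^{within}$ on the within estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, K = 6, 4, 2                                   # hypothetical dimensions
g = np.repeat(np.arange(n), T)
# Regressors with a group-level component so that group means vary.
X = rng.normal(size=(n * T, K)) + rng.normal(size=(n, K))[g]
y = X @ np.array([0.8, -0.3]) + rng.normal(size=n * T)

x_bar, y_bar = X.mean(axis=0), y.mean()             # overall means
xg = np.vstack([X[g == i].mean(axis=0) for i in range(n)])   # group means of x
yg = np.array([y[g == i].mean() for i in range(n)])          # group means of y

def moments(dx, dy):
    """Sums of squares and cross products of deviation data."""
    return dx.T @ dx, dx.T @ dy

Sxx_t, Sxy_t = moments(X - x_bar, y - y_bar)        # total
Sxx_w, Sxy_w = moments(X - xg[g], y - yg[g])        # within
# Between: expanding group means to all nT rows weights each group by T.
Sxx_b, Sxy_b = moments(xg[g] - x_bar, yg[g] - y_bar)

b_total = np.linalg.solve(Sxx_t, Sxy_t)
b_within = np.linalg.solve(Sxx_w, Sxy_w)
b_between = np.linalg.solve(Sxx_b, Sxy_b)

# Matrix weighted average of the within and between estimators.
F_w = np.linalg.solve(Sxx_t, Sxx_w)
b_check = F_w @ b_within + (np.eye(K) - F_w) @ b_between
```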
Fixed Time and Group Effect
The least squares dummy variable approach can be extended to include a time specific effect as
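well. One way to formulate the extended model is simply to add the time effect, as in
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \gamma_t + \varepsilon_{it}.$$
A symmetric form of the model is
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \mu + \alpha_i + \gamma_t + \varepsilon_{it},$$
where a full set of $n$ group effects and $T$ time effects is included, but the restrictions $\sum_i \alpha_i = \sum_t \gamma_t = 0$ are imposed.
Least squares estimates of the slopes in this model are obtained by regression of
$$y_{*it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y} \quad\text{on}\quad \mathbf{x}_{*it} = \mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot} - \bar{\mathbf{x}}_{\cdot t} + \bar{\mathbf{x}},$$
where the period-specific and overall means are
$$\bar{y}_{\cdot t} = \frac{1}{n}\sum_{i=1}^{n} y_{it} \quad\text{and}\quad \bar{y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} y_{it},$$
and likewise for $\bar{\mathbf{x}}_{\cdot t}$ and $\bar{\mathbf{x}}$.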
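The overall constant and the dummy variable coefficients can then be recovered from the normal
equations as
$$\hat{\mu} = m = \bar{y} - \bar{\mathbf{x}}'\mathbf{b},$$
$$\hat{\alpha}_i = a_i = (\bar{y}_{i\cdot} - \bar{y}) - (\bar{\mathbf{x}}_{i\cdot} - \bar{\mathbf{x}})'\mathbf{b},$$
$$\hat{\gamma}_t = c_t = (\bar{y}_{\cdot t} - \bar{y}) - (\bar{\mathbf{x}}_{\cdot t} - \bar{\mathbf{x}})'\mathbf{b}.$$
The estimated asymptotic covariance matrix for $\mathbf{b}$ is computed using the sums of squares and
cross products of $\mathbf{x}_{*it}$, the regressors in deviations from group and period means, and
$$s^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{it} - \mathbf{x}_{it}'\mathbf{b} - m - a_i - c_t)^2}{nT - (n-1) - (T-1) - K - 1}.$$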
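For a balanced panel, the two-way transformation can be sketched as follows. This is an illustrative NumPy example on simulated data (all values and names are assumptions); it compares the slopes from the transformed regression with those from an explicit regression on a constant, $n-1$ group dummies, and $T-1$ period dummies, and then recovers the constant and the effects.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, K = 5, 6, 2                      # hypothetical balanced panel
g = np.repeat(np.arange(n), T)         # group index (rows ordered group by group)
p = np.tile(np.arange(T), n)           # period index
X = rng.normal(size=(n * T, K))
alpha, gamma = rng.normal(size=n), rng.normal(size=T)
y = (2.0 + X @ np.array([1.0, 0.5]) + alpha[g] + gamma[p]
     + rng.normal(scale=0.2, size=n * T))

def two_way_demean(z):
    """z_it - zbar_i. - zbar_.t + zbar for a balanced panel."""
    z3 = z.reshape(n, T, -1)
    out = (z3 - z3.mean(axis=1, keepdims=True)
              - z3.mean(axis=0, keepdims=True)
              + z3.mean(axis=(0, 1), keepdims=True))
    return out.reshape(n * T, -1)

Xs, ys = two_way_demean(X), two_way_demean(y).ravel()
b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# Explicit dummy regression for comparison (one group and one period dropped).
Dg = (g[:, None] == np.arange(1, n)).astype(float)
Dp = (p[:, None] == np.arange(1, T)).astype(float)
Z = np.hstack([X, np.ones((n * T, 1)), Dg, Dp])
b_dummy = np.linalg.lstsq(Z, y, rcond=None)[0][:K]

# Recover the overall constant and the group and period effects.
xg = X.reshape(n, T, K).mean(axis=1); yg = y.reshape(n, T).mean(axis=1)
xp = X.reshape(n, T, K).mean(axis=0); yp = y.reshape(n, T).mean(axis=0)
m = y.mean() - X.mean(axis=0) @ b
a = (yg - y.mean()) - (xg - X.mean(axis=0)) @ b
c = (yp - y.mean()) - (xp - X.mean(axis=0)) @ b
```

The recovered effects `a` and `c` each sum to zero, matching the restrictions of the symmetric form.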
Panel Data Model for Random Effect
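If the unobserved individual heterogeneity, however formulated, can be assumed to be
uncorrelated with the included variables, then the model may be formulated as
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha + u_i + \varepsilon_{it}.$$
The random effects approach specifies that $u_i$ is a group-specific random element, similar to $\varepsilon_{it}$
except that for each group there is but a single draw that enters the regression identically in each
period. We assume further that
$$\begin{aligned}
E[\varepsilon_{it} \mid \mathbf{X}] &= E[u_i \mid \mathbf{X}] = 0,\\
E[\varepsilon_{it}^2 \mid \mathbf{X}] &= \sigma_\varepsilon^2,\\
E[u_i^2 \mid \mathbf{X}] &= \sigma_u^2,\\
E[\varepsilon_{it}u_j \mid \mathbf{X}] &= 0 \quad\text{for all } i, t, \text{ and } j,\\
E[\varepsilon_{it}\varepsilon_{js} \mid \mathbf{X}] &= 0 \quad\text{if } t \neq s \text{ or } i \neq j,\\
E[u_iu_j \mid \mathbf{X}] &= 0 \quad\text{if } i \neq j.
\end{aligned}$$
It is useful to view the formulation of the model in blocks of $T$ observations for group $i$: $\mathbf{y}_i$, $\mathbf{X}_i$, $u_i\mathbf{i}$, and $\boldsymbol{\varepsilon}_i$. For these
$T$ observations let
$$\eta_{it} = \varepsilon_{it} + u_i$$
and
$$\boldsymbol{\eta}_i = [\eta_{i1}, \eta_{i2}, \ldots, \eta_{iT}]'.$$
From the form of $\eta_{it}$, i.e. in the error components model,
$$\begin{aligned}
E[\eta_{it}^2 \mid \mathbf{X}] &= \sigma_\varepsilon^2 + \sigma_u^2,\\
E[\eta_{it}\eta_{is} \mid \mathbf{X}] &= \sigma_u^2, \quad t \neq s,\\
E[\eta_{it}\eta_{js} \mid \mathbf{X}] &= 0 \quad\text{for all } t \text{ and } s \text{ if } i \neq j.
\end{aligned}$$
For the $T$ observations for unit $i$, let $\boldsymbol{\Sigma} = E[\boldsymbol{\eta}_i\boldsymbol{\eta}_i' \mid \mathbf{X}]$; then
$$\boldsymbol{\Sigma} = \begin{bmatrix}\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2\\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2\end{bmatrix} = \sigma_\varepsilon^2\mathbf{I}_T + \sigma_u^2\mathbf{i}_T\mathbf{i}_T',$$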
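where $\mathbf{i}_T$ is a $T \times 1$ column vector of 1's. Because observations on different units are independent, the disturbance covariance matrix for the full $nT$ observations is
$$\boldsymbol{\Omega} = \begin{bmatrix}\boldsymbol{\Sigma} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma} & \cdots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Sigma}\end{bmatrix} = \mathbf{I}_n \otimes \boldsymbol{\Sigma}.$$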
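The block structure of the disturbance covariance matrix is easy to see in a small numerical case. The sketch below builds the within-unit covariance matrix and the block-diagonal full covariance matrix with a Kronecker product; the variance components and dimensions are hypothetical values chosen only for illustration.

```python
import numpy as np

n, T = 3, 4                            # hypothetical panel dimensions
sigma_e2, sigma_u2 = 1.0, 0.5          # hypothetical variance components

i_T = np.ones((T, 1))                  # T x 1 column of ones

# Within-unit covariance: sigma_e^2 on top of a constant sigma_u^2 everywhere.
Sigma = sigma_e2 * np.eye(T) + sigma_u2 * (i_T @ i_T.T)

# Full nT x nT covariance: units are independent, so it is block diagonal.
Omega = np.kron(np.eye(n), Sigma)
```

Each diagonal block has $\sigma_\varepsilon^2 + \sigma_u^2$ on its diagonal and $\sigma_u^2$ elsewhere, and all entries linking different units are zero.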
Generalized Least Squares
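To estimate the parameters of the random effects panel data model we use GLS. The generalized
least squares estimator is
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{y} = \left(\sum_{i=1}^{n}\mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^{n}\mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{y}_i\right).$$
To compute $\hat{\boldsymbol{\beta}}$ we first have to characterize $\boldsymbol{\Omega}$, and we need
$$\boldsymbol{\Omega}^{-1/2} = [\mathbf{I}_n \otimes \boldsymbol{\Sigma}]^{-1/2} = \mathbf{I}_n \otimes \boldsymbol{\Sigma}^{-1/2}.$$
We need only $\boldsymbol{\Sigma}^{-1/2}$, which is
$$\boldsymbol{\Sigma}^{-1/2} = \frac{1}{\sigma_\varepsilon}\left[\mathbf{I}_T - \frac{\theta}{T}\mathbf{i}_T\mathbf{i}_T'\right], \qquad \theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.$$
The transformation of $\mathbf{y}_i$ and $\mathbf{X}_i$ for GLS is therefore
$$\boldsymbol{\Sigma}^{-1/2}\mathbf{y}_i = \frac{1}{\sigma_\varepsilon}\begin{bmatrix}y_{i1} - \theta\bar{y}_{i\cdot}\\ y_{i2} - \theta\bar{y}_{i\cdot}\\ \vdots\\ y_{iT} - \theta\bar{y}_{i\cdot}\end{bmatrix},$$
and likewise for the rows of $\mathbf{X}_i$.
If $\theta = 1$, this is the group mean deviation transformation and we use LSDV. It can be shown that the GLS estimator is, like the OLS estimator, a
matrix weighted average of the within- and between-units estimators,
$$\hat{\boldsymbol{\beta}} = \hat{\mathbf{F}}^{within}\mathbf{b}^{within} + (\mathbf{I} - \hat{\mathbf{F}}^{within})\mathbf{b}^{between},$$
where
$$\hat{\mathbf{F}}^{within} = [\mathbf{S}_{xx}^{within} + \lambda\mathbf{S}_{xx}^{between}]^{-1}\mathbf{S}_{xx}^{within}, \qquad \lambda = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2} = (1 - \theta)^2,$$
and $\mathbf{S}_{xx}^{within}$ and $\mathbf{S}_{xx}^{between}$ are the within- and between-groups sums of squares and cross products of the regressors.
If $\lambda = 1$, then generalized least squares is identical to ordinary least squares. This situation would
occur if $\sigma_u^2$ were zero, in which case we use the OLS method. If $\lambda = 0$, then the estimator is the dummy
variable estimator.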
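The following sketch illustrates, under assumed known variance components, that GLS computed directly from the inverse covariance matrix coincides with OLS on the partially demeaned data. The simulated data and all numeric values are assumptions for the example; the common factor $1/\sigma_\varepsilon$ is omitted from the transformation because it does not change the least squares coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 6, 5                                    # hypothetical dimensions
sigma_e2, sigma_u2 = 1.0, 2.0                  # assumed known variance components
g = np.repeat(np.arange(n), T)

# Regressor matrix including the constant term.
X = np.hstack([np.ones((n * T, 1)), rng.normal(size=(n * T, 2))])
u = rng.normal(scale=np.sqrt(sigma_u2), size=n)
y = (X @ np.array([1.0, 0.5, -0.5]) + u[g]
     + rng.normal(scale=np.sqrt(sigma_e2), size=n * T))

# Direct GLS using the block-diagonal inverse covariance matrix.
Sigma = sigma_e2 * np.eye(T) + sigma_u2 * np.ones((T, T))
Omega_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Same estimator via partial deviations from group means.
theta = 1.0 - np.sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2))
xg = np.vstack([X[g == i].mean(axis=0) for i in range(n)])
yg = np.array([y[g == i].mean() for i in range(n)])
Xt, yt = X - theta * xg[g], y - theta * yg[g]
b_theta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)

# Weight parameter of the within/between average.
lam = sigma_e2 / (sigma_e2 + T * sigma_u2)
```

Setting `sigma_u2 = 0.0` gives `theta = 0` and `lam = 1`, so the transformation leaves the data unchanged and the result is pooled OLS.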
Feasible Generalized Least Squares (when $\boldsymbol{\Sigma}$ is unknown)
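If the variance components are unknown, then we use the feasible generalized least squares
procedure, and we first estimate the disturbance variances. A heuristic approach to estimation of
the variance components is as follows:
$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha + \varepsilon_{it} + u_i$$
and
$$\bar{y}_{i\cdot} = \bar{\mathbf{x}}_{i\cdot}'\boldsymbol{\beta} + \alpha + \bar{\varepsilon}_{i\cdot} + u_i.$$
Therefore, taking deviations from the group means removes the heterogeneity:
$$y_{it} - \bar{y}_{i\cdot} = [\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot}]'\boldsymbol{\beta} + [\varepsilon_{it} - \bar{\varepsilon}_{i\cdot}].$$
Since $E\left[\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})^2\right] = (T-1)\sigma_\varepsilon^2$, if $\boldsymbol{\beta}$ were observed, an unbiased estimator of $\sigma_\varepsilon^2$ based on the $T$ observations in group $i$ would be
$$\hat{\sigma}_\varepsilon^2(i) = \frac{\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})^2}{T-1}.$$
Since $\boldsymbol{\beta}$ must be estimated (the deviations form implies that the LSDV estimator is consistent, indeed unbiased in
general), we make the degrees of freedom correction and use the LSDV residuals in
$$s_e^2(i) = \frac{\sum_{t=1}^{T}(e_{it} - \bar{e}_{i\cdot})^2}{T-K-1}.$$
We have $n$ such estimators, so we average them to obtain
$$\bar{s}_e^2 = \frac{1}{n}\sum_{i=1}^{n}s_e^2(i) = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(e_{it} - \bar{e}_{i\cdot})^2}{nT - nK - n}.$$
This degrees of freedom correction is excessive because it treats $\alpha$ and $\boldsymbol{\beta}$ as re-estimated for each $i$; the estimated parameters are the $n$ group means and the $K$ slopes.
The unbiased estimator is
$$\hat{\sigma}_\varepsilon^2 = s_{LSDV}^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(e_{it} - \bar{e}_{i\cdot})^2}{nT - n - K}.$$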
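The two variance estimators can be compared on simulated data. This is a minimal NumPy sketch under assumed values (dimensions, coefficients, and the helper `demean` are illustrative); it uses the fact that LSDV residuals already have zero group means, so $e_{it} - \bar{e}_{i\cdot} = e_{it}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, K = 8, 6, 2                          # hypothetical balanced panel
g = np.repeat(np.arange(n), T)
X = rng.normal(size=(n * T, K))
u = rng.normal(size=n)
y = 1.0 + X @ np.array([0.5, -1.0]) + u[g] + rng.normal(size=n * T)

def demean(z):
    """Deviations from group means for rows ordered group by group."""
    z3 = z.reshape(n, T, -1)
    return (z3 - z3.mean(axis=1, keepdims=True)).reshape(n * T, -1)

# LSDV slopes and residuals from the group mean deviation regression.
Xw, yw = demean(X), demean(y).ravel()
b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
e_dev = (yw - Xw @ b).reshape(n, T)        # e_it - ebar_i., one row per group

# Group-by-group estimators with the T - K - 1 correction, then their average.
s2_i = (e_dev ** 2).sum(axis=1) / (T - K - 1)
s2_bar = s2_i.mean()

# Estimator with the nT - n - K degrees of freedom correction.
sigma_e2_hat = (e_dev ** 2).sum() / (n * T - n - K)
```

Because $nT - nK - n < nT - n - K$ whenever $n > 1$, the averaged estimator `s2_bar` is always larger than `sigma_e2_hat` for the same residuals.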
Testing for Random Effects
Breusch and Pagan have devised a Lagrange multiplier test for the random effects model
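based on the OLS residuals. Consider the hypothesis
$$H_0: \sigma_u^2 = 0 \quad\text{against}\quad H_1: \sigma_u^2 \neq 0.$$
The test statistic is
$$LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}e_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}e_{it}^2} - 1\right]^2,$$
where $e_{it}$ denotes the pooled OLS residual.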
Under the null hypothesis, LM is distributed as chi-squared with one degree of freedom.
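A sketch of the LM computation in Python with NumPy follows. The function name `breusch_pagan_lm` and the simulated data are assumptions for illustration; the residuals are taken from a pooled OLS regression with a single constant, and rows are assumed to be ordered group by group in a balanced panel.

```python
import numpy as np

def breusch_pagan_lm(e, n, T):
    """LM statistic from pooled OLS residuals of a balanced panel.

    e holds nT residuals ordered group by group.
    """
    e = np.asarray(e, dtype=float).reshape(n, T)
    ratio = np.sum(e.sum(axis=1) ** 2) / np.sum(e ** 2)
    return n * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

rng = np.random.default_rng(6)
n, T = 50, 5                                   # hypothetical dimensions
g = np.repeat(np.arange(n), T)
X = np.hstack([np.ones((n * T, 1)), rng.normal(size=(n * T, 2))])
u = rng.normal(size=n)                         # simulated random group effects
y = X @ np.array([1.0, 0.5, -0.5]) + u[g] + rng.normal(size=n * T)

# Pooled OLS residuals.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_ols

lm = breusch_pagan_lm(resid, n, T)
```

The value of `lm` is compared with the chi-squared critical value with one degree of freedom (3.84 at the 5% level); a larger value leads to rejection of the null hypothesis in favor of the random effects model.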