Week 1

The document outlines a course on Panel Data Analysis, focusing on applied econometrics and the methodologies for analyzing panel data sets. It covers prerequisites, textbooks, course applications, and a detailed outline of topics such as fixed and random effects, instrumental variables, and hypothesis testing. The course aims to equip students with the necessary tools to apply econometric methods to real-world data, emphasizing the advantages of panel data in studying individual behavior and dynamics over time.

Panel Data Analysis
Lecture 1
Introduction
Spring 2025
TS109
Mohamed Abdallah
Lecturer of Applied Statistics & Econometrics
[email protected]
01222596520
Panel Data Analysis

Overview
Panel Data Econometrics

This is an intermediate-level diploma course in applied econometrics dealing with panel data. The topics covered span a large part of econometrics generally, though we are particularly interested in those techniques as they are adapted to the analysis of panel data sets.
Why a Course on ‘Panel Data?’

• Microeconometrics and applications: a broad contemporary field in economics/econometrics
  • Behavioral modeling
  • Individual choice and response
• A platform for surveying econometric models and methods, covering most of the field
  • Various types
  • Recent developments
Prerequisites

• Introduction to econometrics
• Applied statistics

We will examine many empirical applications. You will apply the tools developed in the course.
Textbooks
• Main text: Baltagi (2008); read chapters 1, 2
• Recommended: Greene (2008); read chapters 1, 2, 9
• Suggested: Wooldridge (2002); read chapters 1, 2, 10
Course Applications

• Problem sets
• Software: R, EViews
• Questions and review as requested
Course Outline
• Statistics and Regression
• Fixed and Random Effects
• Instrumental Variables, MDE, GMM
• The One-way Error Component Regression Model
• The Two-way Error Component Regression Model
• Hypothesis testing
• Heteroskedasticity
• Serial Correlation

• Term project: Application of method(s) developed in class to a ‘live’ data set. Details to be given in class. (25%)
• Attendance (10%)
• Midterm, in class (25%)
• Final exam (40%)
Panel Data Analysis

1. Methodology
Econometrics: Modeling

• Theoretical foundations
  • Microeconometrics and macroeconometrics
  • Behavioral modeling
• Statistical foundations: Econometric methods
• Mathematical elements: the usual
• ‘Model’ building: the econometric model
  • Mathematical elements
  • The underlying truth: is there one?
Model Building in Econometrics

• Role of the assumptions
• Inference
  • Parametric analysis
Estimation Platforms
• Model based
  • Kernels and smoothing methods (nonparametric)
  • Moments and quantiles (semiparametric)
  • Likelihood and M-estimators (parametric)
• Methodology based (?)
  • Classical: parametric
The Sample and Measurement

[Diagram: the population, its measurement, and theory. Items measured: characteristics, behavior patterns, choices.]
Classical Inference

[Diagram: the population, its measurement, and econometrics. Items measured: characteristics, behavior patterns, choices. Caption: imprecise inference about the entire population, using sampling theory and asymptotics.]
Data Structures

• Observation mechanisms
  • Non-experimental
  • Active, experimental
  • The ‘natural experiment’
• Data types
  • Cross section
  • Time series
  • Panel or longitudinal data
Econometric Models

• Linear
  • Static
  • Dynamic
• Vector autoregressive (VAR)
• Structural models and demand systems
Estimation Methods and Applications
• Least squares etc.: OLS, GLS
• Maximum likelihood
• Instrumental variables and GMM
• Simulation based estimation
• Monte Carlo methods
Panel data
• Panel data models combine cross-section and time-series data.
• In panel data the same cross-sectional unit (industry, firm, country) is surveyed over time, so we have data that are pooled over space as well as time.
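
As a rough illustration of "pooled over space as well as time", the sketch below stores a tiny panel in long format, one row per (unit, period) pair. The firm names, years and figures are invented for illustration and are not taken from the course material.

```python
from collections import defaultdict

# Hypothetical long-format panel: one row per (unit, period) pair.
# Firm names and numbers are made up for illustration only.
rows = [
    {"firm": "A", "year": 2001, "investment": 10.0, "profit": 4.0},
    {"firm": "A", "year": 2002, "investment": 12.0, "profit": 5.0},
    {"firm": "B", "year": 2001, "investment": 20.0, "profit": 7.0},
    {"firm": "B", "year": 2002, "investment": 21.0, "profit": 8.0},
]

# Group observations by cross-sectional unit to get each unit's time series.
by_firm = defaultdict(list)
for row in rows:
    by_firm[row["firm"]].append(row)

n_units = len(by_firm)                      # N: cross-sectional units
n_periods = len({r["year"] for r in rows})  # T: distinct time periods
n_obs = len(rows)                           # pooled observations
```

Here each firm contributes its own short time series, and stacking the firms gives the pooled sample.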
Reasons for using Panel Data
1. Panel data can take explicit account of individual-specific heterogeneity (“individual” here refers to the micro-unit).
2. By combining data in two dimensions, panel data give more data variation, less collinearity and more degrees of freedom.
3. Panel data are better suited than cross-sectional data for studying the dynamics of change. For example, they are well suited to understanding transition behaviour such as company bankruptcy or merger.
4. Panel data are better at detecting and measuring effects that cannot be observed in either cross-section or time-series data.
5. Panel data enable the study of more complex behavioural models, for example the effects of technological change or economic cycles.
6. Panel data can minimise the effects of aggregation bias that arises from aggregating firms into broad groups.
Benefits of Panel Data
• Time and individual variation in behavior unobservable in cross sections or aggregate time series
• Observable and unobservable individual heterogeneity
• Rich hierarchical structures
• More complicated models
  • Features that cannot be modeled with cross-section or aggregate time-series data alone
  • Dynamics in economic behavior
• Panel data regression models are based on panel data, which are observations on the same cross-sectional, or individual, units over several time periods.
  • A balanced panel has the same number of time observations for each cross-sectional unit.
• Panel data have several advantages over purely cross-sectional or purely time-series data. These include:
  • (a) Increase in the sample size
  • (b) Study of dynamic changes in cross-sectional units over time
  • (c) Study of more complicated behavioral models, including study of time-invariant variables
Where Do We Go From Here?
• Review of familiar classical procedures
• Fundamental, familiar regression extensions; common effects models
• Endogeneity, instrumental variables, GMM estimation
• Dynamic models
• Models of heterogeneity
• Features of the linear, static and dynamic common effects models
Panel Data Analysis

2. Econometric Methods
A Statistical Relationship
• A relationship of interest:
  • Number of hospital visits: H = 0, 1, 2, …
  • Covariates: x1 = Age, x2 = Sex, x3 = Income, x4 = Health
• Causality and covariation
  • Theoretical implications of ‘causation’
  • Comovement and association
Models

• Conditional mean function: E[y | x]
• Other conditional characteristics: what is ‘the model?’
  • Conditional variance function: Var[y | x]
  • Other conditional moments
  • Conditional probabilities: P(y | x)
Using the Model
• Understanding the relationship:
  • Estimation of quantities of interest such as elasticities
• Prediction of the outcome of interest
• Control of the path of the outcome of interest
Panel Data Sets*

• Longitudinal data: ‘short panels’
  • National Longitudinal Survey of Youth (NLS)
  • British Household Panel Survey (BHPS)
  • Panel Study of Income Dynamics (PSID)
• Cross section time series: ‘long panels’
  • Grunfeld’s investment data
  • Penn World Tables
• Financial data by firm, year: ‘huge panels’
  • $r_{it} - r_{ft} = \beta_i (r_{mt} - r_{ft}) + \varepsilon_{it}$, $i = 1, \ldots, \text{many}$; $t = 1, \ldots, \text{many}$
  • Exchange rate data, essentially infinite T, large N
• Effects: $\beta_i = \beta + v_i$
* See Baltagi, Chapter 1
Notation
• Fixed Effects: the ‘dummy variable model’

$y_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$

• Random Effects: the ‘error components model’

$y_{it} = \alpha + \beta' x_{it} + \varepsilon_{it} + u_i$

Compound (“composed”) disturbance: $\varepsilon_{it} + u_i$

Exogeneity
• Exogeneity
  • $E[\varepsilon_{it} \mid x_{it}, c_i] = 0$
  • Not sufficient for regression
  • Doesn’t imply how to estimate β
• Strict Exogeneity
  • $E[\varepsilon_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i] = 0$
  • Can use first difference or fixed effects
  • Cannot hold if $x_{it}$ contains lagged values of $y_{it}$
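
To see why strict exogeneity is what first differencing and fixed effects rely on, it may help to write out both transformations. This is a sketch that assumes the linear effects model $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$ used elsewhere in these slides, with $\bar{y}_i$, $\bar{x}_i$ and $\bar{\varepsilon}_i$ denoting the time averages for unit $i$.

```latex
\begin{align*}
y_{it} &= x_{it}'\beta + c_i + \varepsilon_{it} \\
y_{it} - y_{i,t-1} &= (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})
  && \text{(first difference)} \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
  && \text{(deviation from unit means)}
\end{align*}
```

Both transformations remove $c_i$, but the transformed disturbance now involves $\varepsilon$ from other periods and the transformed regressor involves $x$ from other periods. Least squares on the transformed equation therefore needs $\varepsilon_{it}$ to be unrelated to $x_{is}$ for periods $s \neq t$ as well, which is what the strict exogeneity condition provides and the contemporaneous condition alone does not.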
• Suppose y is investment and x is a measure of profit. We have i = 1, …, n companies and t = 1, …, T time periods. Suppose we specify a simple econometric model which says that investment depends on profit:

$y_{it} = a_0 + a_1 x_{it} + u_{it}$  (1)

• $u_{it}$ is a random error term: $u_{it} \sim N(0, \sigma^2)$, so $E(u_{it}) = 0$.
• Estimation of (1) depends on the assumptions that we make about the intercept ($a_0$), the slope coefficient ($a_1$) and the error term ($u_{it}$).
Pooled regression by OLS
This is estimation option 1 on the list. But pooled regression may result in heterogeneity bias:

[Figure: scatter plot of y against x. The pooled regression line $y_{it} = a_0 + a_1 x_{it} + u_{it}$ is fitted through all observations, while Firms 1 to 4 each have their own true model line.]
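
The heterogeneity bias described on this slide can be illustrated with a small numerical sketch. The data below are invented and are not a reproduction of the figure: every firm is assumed to share a true slope of 1.0 but to have its own intercept, and firms with larger x are given smaller intercepts. The helper `ols_slope` is a hypothetical name for a plain simple-regression slope.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data: every firm has true slope 1.0 but its own intercept,
# and firms with larger x have smaller intercepts.
true_slope = 1.0
intercepts = {"Firm 1": 12.0, "Firm 2": 8.0, "Firm 3": 4.0, "Firm 4": 0.0}
x_values = {"Firm 1": [1.0, 2.0], "Firm 2": [3.0, 4.0],
            "Firm 3": [5.0, 6.0], "Firm 4": [7.0, 8.0]}

# Stack all firms into one pooled sample, ignoring which firm is which.
x_pooled, y_pooled = [], []
for firm, xs in x_values.items():
    for x in xs:
        x_pooled.append(x)
        y_pooled.append(intercepts[firm] + true_slope * x)

# Pooled OLS forces a single intercept on all firms.
pooled_slope = ols_slope(x_pooled, y_pooled)
```

In this constructed example the pooled slope comes out negative even though each firm's own slope is positive, because the ignored intercept differences are correlated with x.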
Fixed and Random Effects

• Unobserved individual effects in regression: $E[y_{it} \mid x_{it}, c_i]$
• Notation: $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
• Linear specification:
  • Fixed Effects: $E[c_i \mid X_i] = g(X_i)$; effects are correlated with included variables. Common: $\mathrm{Cov}[x_{it}, c_i] \neq 0$
  • Random Effects: $E[c_i \mid X_i] = \mu$; effects are uncorrelated with included variables. If $X_i$ contains a constant term, $\mu = 0$ WLOG. Common: $\mathrm{Cov}[x_{it}, c_i] = 0$, but $E[c_i \mid X_i] = \mu$ is needed for the full model
• However, panel models pose several estimation and inference problems, such as heteroscedasticity, autocorrelation, and cross-correlation in cross-sectional units at the same point in time.
• The fixed effects model (FEM) and the random effects model (REM), also known as the error components model (ECM), are commonly used methods to deal with one or more of these problems.
• In FEM, the intercept in the regression model is allowed to differ among individuals to reflect the unique features of individual units.
  • This is done by using dummy variables, provided we take care of the dummy variable trap.
• The FEM using dummy variables is known as the least-squares dummy variable model (LSDV).
• FEM is appropriate in situations where the individual-specific intercept may be correlated with one or more regressors, but it consumes a lot of degrees of freedom when N (the number of cross-sectional units) is very large.
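
A minimal sketch of the fixed effects idea, on invented noise-free data with one regressor. Rather than building one dummy per unit, it subtracts each unit's own means from x and y (the within transformation), which yields the same slope estimate as LSDV. All names and numbers here are assumptions for illustration.

```python
from collections import defaultdict

# Made-up panel: (unit, x, y) with y = intercept_i + 1.0 * x, no noise.
data = [
    ("Firm 1", 1.0, 13.0), ("Firm 1", 2.0, 14.0),
    ("Firm 2", 3.0, 11.0), ("Firm 2", 4.0, 12.0),
    ("Firm 3", 5.0, 9.0), ("Firm 3", 6.0, 10.0),
]

# Collect observations by unit.
groups = defaultdict(list)
for unit, x, y in data:
    groups[unit].append((x, y))

# Within transformation: subtract each unit's own means from x and y.
# This sweeps out the unit intercepts, giving the same slope as a
# regression with one dummy variable per unit (LSDV).
sxy = 0.0
sxx = 0.0
for obs in groups.values():
    x_bar = sum(x for x, _ in obs) / len(obs)
    y_bar = sum(y for _, y in obs) / len(obs)
    for x, y in obs:
        sxy += (x - x_bar) * (y - y_bar)
        sxx += (x - x_bar) ** 2

within_slope = sxy / sxx

# Recover each unit's intercept as y_bar - slope * x_bar.
unit_intercepts = {}
for unit, obs in groups.items():
    x_bar = sum(x for x, _ in obs) / len(obs)
    y_bar = sum(y for _, y in obs) / len(obs)
    unit_intercepts[unit] = y_bar - within_slope * x_bar
```

The demeaning route avoids estimating N dummy coefficients directly, which is convenient when N is large, although the degrees of freedom are still used up by the N unit means.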
Assumptions for Asymptotics
• Convergence of moments involving cross section $X_i$.
• N increasing, T or $T_i$ assumed fixed.
  • “Fixed T” asymptotics.
  • Time series characteristics are not relevant (may be nonstationary).
  • If T is also growing, need to treat as multivariate time series.
• Strict exogeneity and dynamics. If $x_{it}$ contains $y_{i,t-1}$ then $x_{it}$ cannot be strictly exogenous. $x_{it}$ will be correlated with the unobservables in period t-1. (To be revisited later.)
• Empirical characteristics of microeconomic data
Estimating β
• β is the partial effect of interest
• Can it be estimated (consistently) in the presence of (unmeasured) $c_i$?
  • Does pooled least squares “work?”
  • Strategies for “controlling for $c_i$” using the sample data
Balanced and Unbalanced Panels
• Distinction
• A notation to help with mechanics: $z_{i,t}$, $i = 1, \ldots, N$; $t = 1, \ldots, T_i$
  • The role of the assumption
• If all the cross-sectional units have the same number of time series observations, the panel is balanced; if not, it is unbalanced.
  • Balanced: $n = NT$
  • Unbalanced: $n = \sum_{i=1}^{N} T_i$
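
The counting rule above can be checked mechanically. The sketch below uses a hypothetical (unit, period) index and the slide's definition of balance, namely the same number of time observations for every unit.

```python
from collections import Counter

# Hypothetical (unit, period) index of an unbalanced panel.
index = [
    ("i1", 1), ("i1", 2), ("i1", 3),
    ("i2", 1), ("i2", 2),
    ("i3", 1), ("i3", 2), ("i3", 3),
]

T_i = Counter(unit for unit, _ in index)   # observations per unit
N = len(T_i)                               # number of cross-sectional units
n = sum(T_i.values())                      # n = sum of T_i over units
is_balanced = len(set(T_i.values())) == 1  # same T_i for every unit?
```

Unit `i2` has only two periods, so the panel is unbalanced and n is the sum of the individual T_i rather than N times a common T.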

Short Term Agenda for Simple Effects Models

• Models with individual effects
  • Interpretation of models
  • Computation (practice) and estimation (theory)
• Extensions
  • Nonstandard panels: Rotating, Pseudo-, Nested
  • Generalizing the regression model
  • Alternative estimators
• Methods
  • Least squares: OLS, GLS, FGLS
  • MLE and Maximum Simulated Likelihood
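The Pooled Regression
• Presence of omitted effects

$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
$\mathbf{y}_i = \mathbf{X}_i\beta + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i$, $T_i$ observations in group i ($\mathbf{i}$ is a column of ones)
$\quad\; = \mathbf{X}_i\beta + \mathbf{c}_i + \boldsymbol{\varepsilon}_i$, note $\mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$
$\mathbf{y} = \mathbf{X}\beta + \mathbf{c} + \boldsymbol{\varepsilon}$, $\sum_{i=1}^{N} T_i$ observations in the sample

• Potential bias/inconsistency of OLS: depends on ‘fixed’ or ‘random’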
Endogeneity
• Definition: $E[\varepsilon \mid x] \neq 0$
• Why is it not zero?
  • Omitted variables
  • Unobserved heterogeneity (equivalent to omitted variables)
  • Measurement error on the RHS (equivalent to omitted variables)
How do panel data fit into this?
• We can use the usual models.
• We can use far more elaborate models.
• We can study effects through time.
• Observations are surely correlated.
  • The same individual is observed more than once.
  • Unobserved heterogeneity that appears in the disturbance in a cross section remains persistent across observations (on the same ‘unit’).
  • Procedures must be adjusted.
• Dynamic effects are likely to be present.
Model Selection
• Regression models: Fit measure = $R^2$
• Nested models: log likelihood, GMM criterion function (distance function)
• Akaike information criterion = $(-2\log L + 2K)/N$
• Bayes (Schwarz) information criterion = $(-2\log L + K\log N)/N$
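
A small sketch of how the two information criteria might be computed and compared. It assumes the common per-observation normalization $(-2\log L + 2K)/N$ for Akaike and $(-2\log L + K\log N)/N$ for Schwarz, where smaller is better; sign and scaling conventions differ across texts. The log likelihoods and parameter counts are invented.

```python
import math

def aic(log_l, k, n):
    """Akaike criterion per observation: (-2 logL + 2K) / N."""
    return (-2.0 * log_l + 2.0 * k) / n

def bic(log_l, k, n):
    """Schwarz criterion per observation: (-2 logL + K log N) / N."""
    return (-2.0 * log_l + k * math.log(n)) / n

# Hypothetical fitted models on the same N = 100 observations.
n_obs = 100
small = {"log_l": -150.0, "k": 3}
large = {"log_l": -149.0, "k": 6}

aic_small = aic(small["log_l"], small["k"], n_obs)
aic_large = aic(large["log_l"], large["k"], n_obs)
bic_small = bic(small["log_l"], small["k"], n_obs)
bic_large = bic(large["log_l"], large["k"], n_obs)
```

With these made-up numbers the larger model raises the log likelihood by only one unit while adding three parameters, so both criteria favour the smaller model, and the Schwarz criterion penalizes the extra parameters more heavily.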
Estimation of the Parameters
• Least squares, LAD, other estimators: we will focus on least squares

$b = (X'X)^{-1} X'y$
$s^2 = e'e/N$ or $e'e/(N-K)$

• Classical estimation of β
• Properties
• Statistical inference: Hypothesis tests
• Prediction
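
The least squares formulas above can be worked through by hand for the smallest interesting case. The sketch below assumes a constant plus one regressor (K = 2), so that $X'X$ is a 2x2 matrix with a closed-form inverse; the sample values are invented.

```python
# Made-up sample: y regressed on a constant and one regressor x (K = 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
N, K = len(x), 2

# Elements of X'X and X'y, where each row of X is (1, x_i).
sx = sum(x)
sxx = sum(v * v for v in x)
sy = sum(y)
sxy = sum(a * b for a, b in zip(x, y))

# b = (X'X)^{-1} X'y, using the closed-form inverse of the 2x2 matrix X'X.
det = N * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det   # intercept
b1 = (N * sxy - sx * sy) / det     # slope

# Residuals e = y - Xb and the two variance estimators from the slide.
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
ee = sum(r * r for r in e)
s2_n = ee / N              # e'e / N
s2_dof = ee / (N - K)      # e'e / (N - K), degrees-of-freedom corrected
```

The two variance estimators differ only in the divisor, so the degrees-of-freedom corrected version is always the larger of the two, and the gap shrinks as N grows relative to K.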
Properties of Least Squares
• Finite sample properties: Unbiased, etc. No longer interested in these.
• Asymptotic properties
  • Consistent? Under what assumptions?
  • Efficient?
    • Contemporary work: Often not important
    • Efficiency within a class: GMM
  • Asymptotically normal: How is this established?
• Robust estimation: To be considered later
Remaining to Consider for the Linear Regression Model

• Failures of standard assumptions
  • Heteroscedasticity
  • Autocorrelation
• Robust estimation
• Omitted variables
• Measurement error
Thank you
