Econometric Modeling
Research Methods
Professor Lawrence W. Lan
Email: [email protected]
http://140.116.6.5/mdu/
Institute of Management
Outline
• Overview
• Single-equation Regression Models
• Simultaneous-equation Regression
Models
• Time-Series Models
Overview
• Objectives
• Model building
• Types of models
• Criteria of a good model
• Data
• Desirable properties of estimators
• Methods of estimation
• Software packages and books
Objectives
• Empirical verification of the theories in business,
economics, management and related disciplines is
becoming increasingly quantitative.
• Econometrics, or economic measurement, is a social
science in which the tools of economic theory,
mathematics, and statistics are applied to the analysis of
economic phenomena.
• Focus on models that can be expressed in equation form
and that relate variables quantitatively.
• Data are used to estimate the parameters of the
equations, and the theoretical relationships are tested
statistically.
• Used for policy analysis and forecasting.
Model Building
• Model building is both a science and an art, and it
serves policy analysis and forecasting.
– science: consists of a set of quantitative tools
used to construct and test mathematical
representations of real-world problems.
– art: consists of intuitive judgments that occur
during the modeling process. There are no clear-cut
rules for making these judgments.
Types of Models (1/4)
• Time-series models
– Examine the past behavior of a time series in
order to infer something about its future
behavior, without knowing about the causal
relationships that affect the variable we are
trying to forecast.
– Deterministic models (e.g. linear
extrapolation) vs. stochastic models (e.g.
ARIMA, SARIMA).
Types of Models (2/4)
• Single-equation models
– With causal relationships (based on
underlying theory) in which the variable (Y)
under study is explained by a single function
(linear or nonlinear) of a number of variables
(Xs)
– Y: explained or dependent variable
– Xs: explanatory or independent variables
Types of Models (3/4)
• Simultaneous-equation models (or multi-
equation simulation models)
– With causal relationships (based on
underlying theory) in which the dependent
variables (Ys) under study are related to each
other as well as to a number of explanatory
variables (Xs) through a set of equations (linear
or nonlinear)
Types of Models (4/4)
• Combination of time-series and regression
models
– Single-input vs. multiple-input transfer
function models
– Linear vs. rational transfer functions
– Simultaneous-equation transfer functions
– Transfer functions with interventions or
outliers
Criteria of a Good Model
• Parsimony
• Identifiability
• Goodness of fit
• Theoretical consistency
• Predictive power
Data
• Sample data: the set of observations from the
measurement of variables, which may come
from any number of sources and in a variety of
forms.
• Time-series data: describe the movement of any
variable over time.
• Cross-section data: describe the activities of any
individual or group at a given point in time.
• Pooled data: a combination of time-series and
cross-section data, also known as panel,
longitudinal, or micropanel data.
Desirable Properties of Estimators
• Unbiased: the mean or expected value of an
estimator is equal to the true value.
• Efficient (best): the variance of an estimator is
smaller than that of any other unbiased estimator.
• Minimum mean square error (MSE): trades off
bias against variance. MSE equals the squared bias
plus the variance of the estimator.
• Consistent: the probability limit of an estimator
equals the true value, i.e. the estimator converges in
probability to the true value as the sample size grows.
It is a large-sample or asymptotic property.
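The bias-variance trade-off behind the MSE criterion can be checked numerically. The sketch below is only an illustration: the function name and the four estimates are hypothetical, and the true value is assumed known, which is never the case in practice.

```python
from statistics import fmean

def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) of estimates around a known true value."""
    mean_estimate = fmean(estimates)
    bias = mean_estimate - true_value
    # Variance around the mean of the estimates (divides by n, not n - 1)
    variance = fmean([(e - mean_estimate) ** 2 for e in estimates])
    # Mean squared error around the true value
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

# Hypothetical estimates of a parameter whose true value is 2.0
bias, variance, mse = mse_decomposition([1.0, 2.0, 3.0, 6.0], true_value=2.0)
print(bias, variance, mse, bias ** 2 + variance)
```

The last two printed numbers coincide, which is the identity MSE = bias^2 + variance.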
Methods of Estimation
• Ordinary least squares (OLS)
• Maximum likelihood (ML)
• Weighted least squares (WLS)
• Generalized least squares (GLS)
• Instrumental variable (IV)
• Two-stage least squares (2SLS)
• Indirect least squares (ILS)
• Three-stage least squares (3SLS)
Software Packages and Books
• LIMDEP: single-equation and
simultaneous-equation regression models
• SCA: time series models
• Textbooks
– (1) Damodar Gujarati, Essentials of Econometrics,
2nd ed. McGraw-Hill, 1999.
– (2) Robert S. Pindyck and Daniel L. Rubinfeld,
Econometric Models and Economic Forecasts, 4th
ed. McGraw-Hill, 1997.
Single-equation Regression Models
• Assumptions
• Best Linear Unbiased Estimation (BLUE)
• Hypothesis testing
• Violations of assumptions 1 to 5
• Forecasting
Assumptions
• A1: (i) The relationship between Y and X exists
and is correctly specified. (ii) Xs are
nonstochastic variables whose values are fixed.
(iii) No exact linear relationship exists among the Xs.
• A2: The error term has zero expected value for
all observations.
• A3: The error term has constant variance for all
observations
• A4: The error terms are statistically independent.
• A5: The error term is normally distributed.
Best Linear Unbiased Estimation
• Gauss-Markov (GM) Theorem: Given
assumptions 1, 2, 3, and 4, the ordinary least
squares (OLS) estimators of the regression
parameters are the best (most efficient) linear
unbiased estimators (BLUE).
• GM theorem applies only to linear estimators
where the estimators can be written as a
weighted average of the individual observations
on Y.
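The linearity of the OLS estimator can be made concrete for the two-variable model Y = a + bX + e. In the sketch below (helper names are illustrative, not from the course software), the slope estimate is written as a weighted sum of the Y observations, with weights that depend only on the Xs.

```python
def ols_weights(x):
    """Weights w_i = (x_i - x_bar) / sum((x_j - x_bar)^2); they depend only on X."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]

def ols_fit(x, y):
    """Return (intercept, slope) of the simple regression of y on x."""
    w = ols_weights(x)
    slope = sum(wi * yi for wi, yi in zip(w, y))  # linear in the Y observations
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope

# Hypothetical data lying exactly on Y = 1 + 2X
print(ols_fit([1, 2, 3, 4], [3, 5, 7, 9]))
```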
Hypothesis Testing
• Normal, Chi-square, t, and F distributions
• Goodness of fit
• Testing the regression coefficients (single
equation)
• Testing the regression equation (joint
equations)
• Testing for structural stability or
transferability of regression models
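One common way to test structural stability is the Chow test, which compares a pooled regression with separate regressions on two subsamples. The slides do not name the test, so treat the sketch below as an assumed illustration; the residual sums of squares in the example are hypothetical inputs.

```python
def chow_f(rss_pooled, rss_1, rss_2, k, n1, n2):
    """Chow F statistic with k and (n1 + n2 - 2k) degrees of freedom.

    rss_pooled: residual sum of squares from the regression on all n1 + n2 observations
    rss_1, rss_2: residual sums of squares from the two subsample regressions
    k: number of estimated parameters per regression (including the intercept)
    """
    rss_unrestricted = rss_1 + rss_2
    numerator = (rss_pooled - rss_unrestricted) / k
    denominator = rss_unrestricted / (n1 + n2 - 2 * k)
    return numerator / denominator

# Hypothetical values; compare the result with the F(k, n1 + n2 - 2k) critical value
print(chow_f(rss_pooled=100.0, rss_1=30.0, rss_2=40.0, k=2, n1=10, n2=14))
```

A large F-value suggests that the coefficients differ between the two subsamples, i.e. the model is not stable or transferable.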
A1(i) Violation -- Specification Error
• Omitting relevant variables → biased and
inconsistent estimators
• Inclusion of irrelevant variables →
unbiased but inefficient estimators
• Incorrect functional form (nonlinearities,
structural changes) → biased and
inconsistent estimators
A1(ii) Violation – Xs Correlated with Error
• OLS leads to biased and inconsistent estimators
• Criteria of good instrumental (proxy) variables
• Instrumental-variables estimation → consistent,
but no guarantee of unbiased or unique
estimators
• Two-stage least squares (2SLS) estimation →
optimal instrumental variable, unique consistent
estimators
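For a single regressor X and a single instrument Z, the instrumental-variables slope and the 2SLS slope coincide. The sketch below illustrates this under that assumption; the data and function names are hypothetical.

```python
def _mean(v):
    return sum(v) / len(v)

def _cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = _mean(a), _mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def iv_slope(y, x, z):
    """IV estimator of the slope: cov(z, y) / cov(z, x)."""
    return _cross(z, y) / _cross(z, x)

def two_stage_slope(y, x, z):
    """2SLS: regress x on z, then regress y on the fitted values of x."""
    b1 = _cross(z, x) / _cross(z, z)           # first-stage slope
    a1 = _mean(x) - b1 * _mean(z)              # first-stage intercept
    x_hat = [a1 + b1 * zi for zi in z]         # fitted values
    return _cross(x_hat, y) / _cross(x_hat, x_hat)

# Hypothetical data
z = [1, 2, 3, 4]
x = [2, 3, 5, 6]
y = [1, 3, 4, 8]
print(iv_slope(y, x, z), two_stage_slope(y, x, z))
```

With more instruments than endogenous regressors the two are no longer the same calculation; 2SLS then combines the instruments through the first-stage fitted values.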
A1(iii) Violation -- Multicollinearity
• Perfect collinearity between any of Xs →
no solution will exist
• Near or imperfect multicollinearity → large
standard errors of OLS estimators or wider
confidence intervals; high R² but few
significant t values; wrong signs for
regression coefficients; difficulty in
explaining or assessing the individual
contribution of Xs to Y.
Detection of Multicollinearity
• Testing the significance of $R_i^2$ from the various
auxiliary regressions (each $X_i$ regressed on the
remaining Xs): $F = \dfrac{R_i^2/(k-1)}{(1-R_i^2)/(n-k)}$,
where n = number of observations, k = number of
explanatory variables including the intercept.
Check if the F-value is significantly different from zero. If
yes (F-value > F-table), $X_i$ is significantly
collinear with the other Xs.
• Variance inflation factor: $VIF = 1/(1-R_i^2)$. VIF = 1
represents no collinearity; VIF > 10 indicates a high degree
of multicollinearity.
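Both detection rules are simple functions of the auxiliary-regression R-squared. A minimal sketch, assuming that value has already been obtained from some regression routine (the function names and inputs are illustrative):

```python
def auxiliary_f(r2, n, k):
    """F statistic for an auxiliary regression, following the formula on this slide.

    r2: R-squared of the auxiliary regression
    n: number of observations
    k: number of explanatory variables including the intercept
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

def vif(r2):
    """Variance inflation factor: 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r2)

# Hypothetical auxiliary R-squared values
print(auxiliary_f(0.5, n=22, k=3))
print(vif(0.0), vif(0.9))
```

An auxiliary R-squared of 0.9 sits exactly at the VIF = 10 rule of thumb.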
A2 Violation – Measurement Error in Y
• OLS will result in biased intercept;
however, the estimated slope parameters
are still unbiased and consistent.
• Correction for the dependent variable
A3 Violation -- Heteroscedasticity
• It happens mostly for cross-sectional data;
sometimes for time-series data.
• OLS estimators remain unbiased but are no longer
efficient.
• Can be corrected by weighted least squares
(WLS) method
• Detection: Goldfeld-Quandt test, Breusch-Pagan
test, White test, Park-Glejser test, Bartlett test,
Peak test, Spearman’s rank correlation test, etc.
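The WLS correction can be sketched for the two-variable model. The example assumes the error variances are known up to a constant, so that each observation is weighted by the inverse of its variance; in practice the weights must be estimated or derived from an assumed variance pattern. The names and data are hypothetical.

```python
def wls_fit(x, y, weights):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b * x_i)^2)."""
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw   # weighted mean of x
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw   # weighted mean of y
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data; weights = 1 / variance, assuming variance grows with x
x = [1, 2, 3, 4]
y = [3.1, 4.8, 7.4, 8.5]
weights = [1.0, 0.5, 0.25, 0.125]
print(wls_fit(x, y, weights))
```

With equal weights the function reduces to OLS.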
A4 Violation -- Autocorrelation
• It happens mostly for time-series data;
sometimes for cross-sectional data.
• OLS estimators remain unbiased but are no longer
efficient.
• Can be corrected by generalized least squares
(GLS) method
• Detection: Durbin-Watson (DW) test, runs test. (With a
lagged dependent variable, DW is biased toward 2 even when
serial correlation is present; do not use the DW test, use
Durbin's h test or a t test instead.)
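The DW statistic is computed from the regression residuals as the sum of squared successive differences divided by the sum of squared residuals. A minimal sketch with hypothetical residual series:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: alternating signs (negative autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Hypothetical residuals: identical values (strong positive autocorrelation)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

The statistic ranges from 0 to 4: values well below 2 point to positive autocorrelation, values well above 2 to negative autocorrelation.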
A5 Violation – Non-normality
• Chi-square, t, and F tests are not exactly valid in
small samples; however, they remain approximately
valid in large samples.
• Detection: Shapiro-Wilk test, Anderson-
Darling test, Jarque-Bera (JB) test.
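$JB = \dfrac{n}{6}\left[S^2 + \dfrac{(K-3)^2}{4}\right]$, where n = sample
size, K = kurtosis, S = skewness. (For a
normal distribution, K = 3 and S = 0.) Under normality, JB
asymptotically follows a Chi-square distribution with 2 d.f.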
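The JB statistic follows directly from the sample moments. The sketch below uses moments that divide by n and a hypothetical five-point sample, far too small for the asymptotic test but convenient for checking the arithmetic by hand.

```python
def jarque_bera(data):
    """Return (JB, skewness S, kurtosis K) using moments that divide by n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, skewness, kurtosis

# Hypothetical residuals
jb, s, k = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, s, k)
```

Normality is rejected at the 5% level when JB exceeds 5.99, the Chi-square critical value with 2 d.f.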
Forecasting
• Ex post vs. ex ante forecast
• Unconditional forecasting
• Conditional forecasting
• Evaluation of ex post forecast errors
– means: root-mean-square error, root-mean-square
percent error, mean error, mean percent error, mean
absolute error, mean absolute percent error, Theil’s
inequality coefficient
– variances: Akaike information criterion (AIC), Schwarz
information criterion (SIC)
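Several of the ex post error measures listed above can be computed in a few lines. The sketch assumes the form of Theil's inequality coefficient that divides the RMSE by the sum of the root mean squares of the forecast and actual series, which bounds it between 0 and 1; the function name and data are hypothetical.

```python
from math import sqrt

def forecast_error_measures(actual, forecast):
    """Return a dict of ex post forecast error measures (actual values must be nonzero)."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    rms_forecast = sqrt(sum(f * f for f in forecast) / n)
    rms_actual = sqrt(sum(a * a for a in actual) / n)
    return {
        "rmse": rmse,
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "theil_u": rmse / (rms_forecast + rms_actual),
    }

# Hypothetical actual and forecast values
print(forecast_error_measures([100.0, 200.0], [110.0, 190.0]))
```

In this example the mean error is zero while the RMSE is 10: offsetting errors hide in the mean error, which is why it is read together with the absolute or squared measures.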
Simultaneous-equations
Regression Models
• Simultaneous-equation models
• Seemingly unrelated equation models
• Identification problem
Simultaneous-equations Models
• Endogenous variables exist on both sides of the
equations
• Structural model vs. reduced form model
• OLS yields biased and inconsistent estimators;
for exactly identified equations, the indirect least
squares (ILS) method yields consistent estimators
• Three-stage least squares (3SLS) method will
result in consistent estimation
• 3SLS often performs better than 2SLS in terms
of estimation efficiency
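The relation between structural and reduced forms, and the ILS idea, can be illustrated with a textbook two-equation model chosen here only as an example: a consumption function C = alpha + beta*Y + u and the identity Y = C + I, with investment I exogenous. Solving for C gives the reduced form C = pi0 + pi1*I + v, where pi0 = alpha/(1 - beta) and pi1 = beta/(1 - beta). ILS estimates pi0 and pi1 by OLS and then solves back for the structural parameters.

```python
def ils_keynesian(pi0, pi1):
    """Recover (alpha, beta) of C = alpha + beta*Y + u from reduced-form coefficients.

    Assumes the illustrative model C = alpha + beta*Y + u, Y = C + I, so that
    pi0 = alpha / (1 - beta) and pi1 = beta / (1 - beta).
    """
    beta = pi1 / (1.0 + pi1)
    alpha = pi0 / (1.0 + pi1)
    return alpha, beta

# Hypothetical reduced-form estimates obtained by OLS of C on I
print(ils_keynesian(pi0=50.0, pi1=4.0))
```

The mapping back is unique here because the consumption function is exactly identified; with overidentification, several conflicting solutions would exist and 2SLS is used instead.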
Seemingly Unrelated Equation Models
• Endogenous variables appear only on the
left hand side of equations
• OLS usually results in unbiased but
inefficient estimation
• Generalized least squares (GLS) method
is used to improve the efficiency → Zellner
method
Identification Problem
• Unidentified vs. identified (overidentified
or exactly identified)
• Order condition
• Rank condition
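The order condition can be stated as a counting rule: an equation can be identified only if the number of predetermined variables excluded from it is at least the number of endogenous variables included in it minus one. A minimal sketch of that rule follows (function and argument names are illustrative). The order condition is necessary but not sufficient; the rank condition must still be checked.

```python
def order_condition(excluded_predetermined, included_endogenous):
    """Classify one equation by the order condition (necessary, not sufficient)."""
    required = included_endogenous - 1
    if excluded_predetermined < required:
        return "unidentified"
    if excluded_predetermined == required:
        return "exactly identified"
    return "overidentified"

# Hypothetical equation with 2 endogenous variables and 1 excluded predetermined variable
print(order_condition(excluded_predetermined=1, included_endogenous=2))
```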
Time-series Models
• Time-series data
• Univariate time series models
• Box-Jenkins modeling approach
• Transfer function models
Time-series Data
• Yt: A sequence of data observed at equally
spaced time intervals
• Stationary vs. non-stationary time series
• Homogeneous vs. non-homogeneous time
series
• Seasonal vs. non-seasonal time series
Univariate Time Series Models
• Types of models: white noise model,
autoregressive (AR) models, moving-average
(MA) models, autoregressive-moving average
(ARMA) models, integrated autoregressive-
moving average (ARIMA) models, seasonal
ARIMA models
• Model identification: MA(q) → sample
autocorrelation function (ACF) cuts off after lag q; AR(p) →
sample partial autocorrelation function (PACF)
cuts off after lag p; ARMA(p,q) → both ACF and PACF die
out
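Identification starts from the sample ACF. The sketch below computes it with the usual estimator, which divides each lagged sum of cross-products by the total sum of squared deviations; the function name and the short series are hypothetical.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0, r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denominator = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denominator
        for k in range(max_lag + 1)
    ]

# Hypothetical short series
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))
```

In practice each r_k is judged against approximate bounds of plus or minus 2 divided by the square root of n to decide where the ACF "cuts off".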
Box-Jenkins Modeling Approach
• Tentative model identification (p, q) → extended
sample autocorrelation function (EACF)
• Estimation (maximum likelihood estimation,
conditional or exact)
• Diagnostic checking (t, R², Q tests, sample ACF
of residuals, residual plots, outlier analysis)
• Application (using minimum mean squared error
forecasts)
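For a stationary AR(1) model with mean mu and coefficient phi, the minimum mean squared error forecast h steps ahead is mu + phi^h * (y_T - mu), which decays toward the mean. A minimal sketch, assuming the parameters have already been estimated (the values below are hypothetical):

```python
def ar1_forecasts(last_value, mean, phi, horizon):
    """Minimum-MSE forecasts for horizons 1..horizon from a stationary AR(1) model."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, horizon + 1)]

# Hypothetical fitted model: mean 10, phi 0.5, last observation 14
print(ar1_forecasts(last_value=14.0, mean=10.0, phi=0.5, horizon=3))
```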
Transfer Function Models
• Single input (X) vs. multiple input (Xs) models
• Linear transfer function (LTF) vs. rational
transfer function (RTF) models
• Model identification (variables to be used; b, s, r
for each input variable using corner table
method; ARMA model for the noise)
• Model estimation: maximum likelihood
estimation (conditional or exact)
• Diagnostic checking: cross correlation function
(CCF)
• Forecasting: simultaneous forecasting
Simultaneous Transfer Function
(STF) Models
• Purposes (to facilitate forecasting and
structural analysis of a system, and to
improve forecast accuracy)
• Yt and Xt can be both endogenous
variables in the system
• Use LTF method for model identification,
FIML for estimation, CCM (cross
correlation matrices) for diagnostic
checking, simultaneous forecasting
Transfer Function Models with
Interventions or Outliers
• Additive Outlier (AO)
• Level Shift (LS)
• Temporary Change (TC)
• Innovational Outlier (IO)
• Intervention models
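The first three outlier types differ in how the effect at time T carries forward: an additive outlier affects a single observation, a level shift is permanent, and a temporary change decays geometrically. The sketch below generates these unit-effect patterns; the function name and the decay rate delta are assumptions for illustration. An innovational outlier is omitted because its pattern depends on the ARMA structure of the series.

```python
def outlier_pattern(kind, n, t0, delta=0.7):
    """Unit-effect pattern of length n for an outlier at index t0 (0-based).

    kind: "AO" (pulse), "LS" (step) or "TC" (geometric decay at rate delta)
    """
    if kind == "AO":
        return [1.0 if t == t0 else 0.0 for t in range(n)]
    if kind == "LS":
        return [1.0 if t >= t0 else 0.0 for t in range(n)]
    if kind == "TC":
        return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]
    raise ValueError("kind must be 'AO', 'LS' or 'TC'")

for kind in ("AO", "LS", "TC"):
    print(kind, outlier_pattern(kind, n=5, t0=2, delta=0.5))
```

Multiplying a pattern by an estimated magnitude and adding it to the model gives the corresponding intervention term.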