Econometrics Tutor
By: Aemro Tazeze (PhD)
Department:
BSc in Agricultural Economics
BSc in Agribusiness and Value Chain Management
Haramaya University, Ethiopia
5 March 2023 Aemro T.
Core Competence for National Exit Exam
General Objective
▪ Understand, estimate and predict economic variables using regression
Specific Objectives
▪ Understand and apply the methodology of econometrics for their research
▪ Estimate and use regression model using the real data and interpret the result,
▪ Use estimated equations to make predictions and forecasting
Outlines for Tutor
❖Introduction to Econometrics
❖Linear Regression & Econometric Problem
❖Dummy Regression
❖Non-linear Regression
❖Introduction to Time Series
Introduction to Econometrics
❖ What is Econometrics?
❖ Use of Econometrics
❖ Steps in Econometric Methodology
What is Econometrics?
◼ Intuitively, econometrics (econ + metrics) is economic measurement.
◼ It combines economic theory, mathematics and statistics to explain, model and measure quantitative economic relationships.
◼ Econometrics is relevant in virtually every branch of applied economics: finance, labor, health, industrial, macro, development, international, trade, marketing strategy, etc.
What is Econometrics?
What makes econometrics different from other applications of statistics?
1. Economic data are mostly non-experimental.
2. Economic models (either simple or sophisticated) are key to interpreting the statistical results in econometric applications.
Use of Econometrics
Testing economic theories
✓ Test Keynes hypothesis: Consumption
increases with income
Estimation of economic relationships
✓ Demand and supply equations
✓ Production functions
Use of Econometrics
Forecasting
✓ Use current and past economic data to predict
future values of variables such as inflation, GDP, stock
prices, etc.
Evaluating government policies
✓ Impact of coffee export on economic growth
Steps in Econometrics Methodology
1. Formulation of the question(s) of
interest
2. Collection of data
3. Specification of the econometric model
4. Estimation, validation, hypothesis testing and forecasting.
Steps in Econometrics Methodology
Step 1: Question(s)/hypothesis
• Suppose we want to test the hypothesis:
H1: Consumption increases as income increases
• This is important for decision-makers seeking to improve household consumption
Steps in Econometrics Methodology
Step 2: Collection of data
There are different datasets that can be collected to
test this hypothesis.
1. Data on consumption versus data on income.
2. Data at the household level versus the country level.
3. Types of data: time series data (over several years), cross-sectional data (at a single point in time), or panel data (many households over multiple periods).
Steps in Econometrics Methodology
Step 3: Specification of the econometric
model
An econometric model distinguishes between what is observable and what is not observable to the researcher.
The researcher's modelling decision depends critically on what is observable.
Steps in Econometrics Methodology
Step 3: Specification of the econometric
model
Whether we estimate one function or the
other depends very much on the available
data:
✓with data on consumption; data on income
Steps in Econometrics Methodology
Step 3: Specification of the econometric model
Suppose that we decide to estimate a consumption function.
$$Y = F(X_1, X_2, \dots, X_k, u)$$
Steps in Econometrics Methodology
Step 3: Specification of the econometric
model
▪ An important specification assumption is the choice of the functional form of the consumption function F(.), guided by the empirical literature
• For instance, is it a linear, quadratic, or Cobb-Douglas (C-D) function?
Steps in Econometrics Methodology
Step 3: Specification of the econometric model
The linear consumption function is:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u$$
• The β’s are the parameters to estimate.
• u represents factors unobservable to the econometrician, e.g., climate-related shocks
Steps in Econometrics Methodology
Step 3: Specification of the econometric model
Certain conditions on the statistical properties of
the error term are key for the good properties of
our estimators of the parameters of interest.
The economic interpretation of the error term is very important for interpreting our estimation results.
Steps in Econometrics Methodology
Step 4: Estimation, validation, hypotheses testing, Forecasting
We want to estimate the parameters β in the
consumption function.
After estimation, we have to make specification
tests to validate some of the specification
assumptions.
The results of these tests may imply a re-
specification and re-estimation of the model.
Steps in Econometrics Methodology
Step 4: Estimation, validation, hypotheses
testing, prediction
Once we have a validated model, we can
interpret the results from an economic point
of view, and make tests, and predictions.
Linear Regression & Econometric
Problem
Introduction
Economic theory specifies a set of precise, deterministic relationships among variables.
Econometrics, however, combines a deterministic component with a stochastic (random) component.
We start with OLS, which minimizes the sum of squared residuals.
Linear Regression
Theory: As income increases, consumption also increases, but not as much as income.
Mathematical model: $y_i = f(x_i) = b_0 + b_1 x_i$
Econometric model: $y_i = f(x_i) + e_i = b_0 + b_1 x_i + e_i$
Some Concepts: Regression,
Causation, and Correlation
▪ Correlation analysis shows the existence of a relationship between two variables.
✓ It measures the strength of linear association between variables.
• But it does not establish a cause-and-effect relationship.
▪ In correlation analysis, y and x are treated symmetrically: neither is designated as the dependent variable.
▪ The correlation coefficient shows the:
✓ existence, direction and magnitude of a relationship between the variables; it cannot be used to predict one from the other
Cont…
Regression is the estimation or prediction of the average value of a dependent variable on the basis of the fixed values of other variables.
Regression assumes a direction of dependence: the dependent variable y is explained by the independent variable x.
Causation comes from theory rather than statistics.
Cont…
When do we apply regression analysis?
When we want to study the relationship between dependent and independent variables by determining the:
◼ extent,
◼ direction and
◼ strength of the relationship
Cont…
The classical linear regression population model is of the form:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$
Where
◼ Y is the dependent variable,
◼ $X_1, X_2, \dots, X_k$ are the independent variables,
◼ ε is the unobservable random disturbance or error term; and
◼ $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the parameters (constants)
Cont…
Why include the disturbance term ε?
✓ Measurement errors
✓ Erratic human behavior
✓ Exclusion of important variables
✓ Simultaneity
The sample linear regression is given by:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i$$
Cont…
Why more than one predictor
variable?
✓ More than one variable influences a
dependent variable.
✓ Predictors may themselves be correlated
(multicollinearity)
Cont…
Model selection
✓ Should explain the most variation in the
dependent variable
Evaluation of assumptions
✓ Have we met the assumptions of OLS?
Model validation
✓ Validating the model results
Cont…
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i$$
$\beta_0$ - Intercept
$\beta_1, \dots, \beta_k$ - Partial regression slope coefficients
$\varepsilon_i$ - Error term associated with the ith observation
This model gives the expected value of Y conditional on the fixed values of $X_1, X_2, \dots, X_k$, plus an error
Matrix Representation
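For a sample of size n the regression model is best described as a system of equations:
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \dots + \beta_k X_{1k} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_{21} + \dots + \beta_k X_{2k} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{n1} + \dots + \beta_k X_{nk} + \varepsilon_n
\end{aligned}$$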
Cont…
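•We can re-write these equations in matrix form as:
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1k} \\ 1 & X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
$$\underset{(n \times 1)}{Y} = \underset{(n \times (k+1))}{X} \; \underset{((k+1) \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon}$$
With an intercept and k regressors, X has k + 1 columns. The degrees-of-freedom formulas later in this tutorial (n − k and k − 1) instead use k for the total number of estimated parameters, intercept included.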
OLS Assumptions
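Assumption 1: The expected value of the error vector is 0
$$E(\varepsilon) = E\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$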
OLS Assumptions
Assumption 2: There is no correlation between the ith and jth error terms
$$E(\varepsilon_i \varepsilon_j) = 0, \quad i \neq j$$
This is called no autocorrelation
OLS Assumptions
Assumption 3: The errors exhibit constant variance
$$E(\varepsilon \varepsilon') = \sigma^2 I$$
This is called homoscedasticity
If the errors do not exhibit constant variance, they are heteroscedastic
OLS Assumptions
Assumption 4: Covariance between the X’s and error terms is 0
◼ Usually satisfied if the predictor variables are fixed and non-stochastic
$$cov(\varepsilon, X) = 0$$
◼ In that case X is called an exogenous variable
◼ If not, then it is called an endogenous variable
OLS Assumptions
Assumption 5: No exact linear relationships
among X variables.
◼ Assumption of no multicollinearity
OLS Assumptions
If these assumptions hold…
◼ Then the OLS estimators are linear, unbiased and have minimum variance among all linear unbiased estimators
◼ In this case we say that the OLS estimators are BLUE (Best Linear Unbiased Estimators)
OLS Assumptions
What does it mean to be BLUE?
◼ Allows us to compute a number of
statistics.
◼ OLS estimation
OLS Assumptions
Assumption 6: The error terms are normally distributed.
$$\varepsilon_i \sim N(0, \sigma^2)$$
✓ Not strictly necessary for OLS estimation, but it eases statistical inference.
Assumption 7: The data generating process for X is not related to ε
OLS Estimation
Population regression model:
$$Y = Xb + e$$
OLS requires choosing values of b such that the residual sum-of-squares (SSR) is as small as possible.
The Normal Equations
Need to differentiate with respect to the unknowns (b):
$$SSR = e'e = (Y - Xb)'(Y - Xb)$$
This yields k simultaneous equations in k unknowns, also known as the Normal Equations
Matrix form of the normal equations
$$(X'X)b = X'Y$$
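The differentiation step can be written out as follows. This is the standard textbook derivation, added here as a sketch; it uses only the symbols already defined (Y, X, b, e) and assumes that X'X is invertible, which the no-multicollinearity assumption is meant to guarantee.

```latex
\begin{aligned}
SSR &= (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb \\
\frac{\partial\, SSR}{\partial b} &= -2X'Y + 2X'Xb = 0 \\
\Rightarrow \quad (X'X)\,b &= X'Y
\end{aligned}
```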
The solution for the “b’s”
•It should be apparent how to solve for the unknown parameters
•Pre-multiply by the inverse of X'X
$$(X'X)^{-1}(X'X)b = (X'X)^{-1}X'Y$$
$$b = (X'X)^{-1}X'Y$$
•This is the fundamental outcome of OLS theory
Goodness-of-Fit (R2)
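▪ The R² statistic is given by:
$$R^2 = \frac{SSE}{SST}$$
where SSE is the explained (regression) sum of squares and SST is the total sum of squares.
• It is the proportion of variability in the response variable that is accounted for by the explanatory variables
$$0 \le R^2 \le 1$$
Good fit: R² will be close to one.
Poor fit: R² will be near 0.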
R2 –Coefficient of Determination
$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{(Y - \hat{Y})'(Y - \hat{Y})}{(Y - \bar{Y})'(Y - \bar{Y})}$$
Critique of R2
R2 is inflated by increasing the number
of explanatory variables in the model
✓ Alternatively use the adjusted R2
Adjusted R²
$$\bar{R}^2 = 1 - \frac{(Y - \hat{Y})'(Y - \hat{Y}) / (n - k)}{(Y - \bar{Y})'(Y - \bar{Y}) / (n - 1)} = 1 - \frac{MSR}{MST}$$
$$k \ge 1 \;\Rightarrow\; \bar{R}^2 \le R^2$$
How does adjusted R² work?
The Total Sum-of-Squares is fixed, since it is independent of the number of explanatory variables
The numerator, SSR, decreases as the number of variables increases
R² is therefore artificially inflated by adding explanatory variables to the model
Adjusted R² takes into account the number of predictors in the model
Statistical Inference
Inference can be made using:
1) hypothesis testing
2) interval estimation
ANOVA Approach
Decomposition of the total sum-of-squares into components relating to
◼ explained variance (regression)
◼ unexplained variance (error)
ANOVA Table
Source of Variation    Sums-of-Squares      df      Mean Square                   F-ratio
Regression             b'X'Y − nȲ²          k−1     (b'X'Y − nȲ²) / (k−1)         MSE/MSR
Residual               Y'Y − b'X'Y          n−k     (Y'Y − b'X'Y) / (n−k)
Total                  Y'Y − nȲ²            n−1
Here MSE is the regression (explained) mean square and MSR is the residual mean square.
F-test/Test of Multiple Restrictions
•Tests the null hypothesis:
$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$
•The null hypothesis is known as a joint or simultaneous hypothesis, because it restricts the values of all $\beta_i$ simultaneously
•This tests the overall significance of the regression model
The F-test statistic and R² vary directly
$$F = \frac{(b'X'Y - n\bar{Y}^2)/(k-1)}{(Y'Y - b'X'Y)/(n-k)} = \frac{SSE/(k-1)}{SSR/(n-k)}$$
$$F = \frac{SSE/(k-1)}{(SST - SSE)/(n-k)} = \frac{SSE/SST}{1 - (SSE/SST)} \cdot \frac{n-k}{k-1}$$
$$F = \frac{R^2}{1 - R^2} \cdot \frac{n-k}{k-1}$$
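The link between the sums of squares, R², adjusted R² and F can be checked numerically. The sketch below is only an illustration: the function name `fit_statistics` is hypothetical, and the inputs are the Model and Residual sums of squares from the Stata output of the family expenditure example shown later in this tutorial (n = 10 observations, k = 3 estimated parameters including the constant).

```python
def fit_statistics(sse, ssr, n, k):
    """Return (R2, adjusted R2, F) from the explained (SSE) and residual (SSR)
    sums of squares, with n observations and k parameters incl. the constant."""
    sst = sse + ssr                                   # total sum of squares
    r2 = sse / sst                                    # R2 = SSE / SST
    adj_r2 = 1 - (ssr / (n - k)) / (sst / (n - 1))    # 1 - MSR / MST
    f_stat = (r2 / (1 - r2)) * (n - k) / (k - 1)      # F written in terms of R2
    return r2, adj_r2, f_stat


# Sums of squares copied from the expenditure regression output.
r2, adj_r2, f_stat = fit_statistics(sse=237.307999, ssr=34.6920014, n=10, k=3)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, F = {f_stat:.2f}")
```

The printed values should agree with the R-squared, Adj R-squared and F figures in that output, which illustrates that F can be computed from R² alone once n and k are known.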
Test statistic
$$t = \frac{b_i - \beta_i}{s\sqrt{c_{ii}}}$$
•Follows a t distribution with n – k df,
where $c_{ii}$ is the element in the ith row and ith column of $(X'X)^{-1}$
•The 100(1−α)% Confidence Interval is obtained from
$$b_i \pm t_{\alpha/2;\, n-k} \; s\sqrt{c_{ii}}$$
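As a hedged illustration of these two formulas, the sketch below computes the t-ratio and 95% confidence interval for the income coefficient of the family expenditure example shown later in this tutorial. The function name is hypothetical. The value of c_ii was computed separately from that example's data and the critical value 2.364624 is the tabulated t value for 7 degrees of freedom; both are hard-coded assumptions here, since the Python standard library has no t-distribution quantile function.

```python
import math


def t_ratio_and_interval(b_i, beta_null, s, c_ii, t_crit):
    """t = (b_i - beta_i) / (s * sqrt(c_ii)) and the matching confidence interval."""
    std_err = s * math.sqrt(c_ii)            # standard error of b_i
    t = (b_i - beta_null) / std_err
    return t, (b_i - t_crit * std_err, b_i + t_crit * std_err)


# Income coefficient: b_i = 2.175534, Root MSE s = 2.2262, H0: beta_i = 0.
# c_ii = 0.0218965 is the income diagonal element of (X'X)^-1 (assumed, see lead-in).
t, (lower, upper) = t_ratio_and_interval(
    b_i=2.175534, beta_null=0.0, s=2.2262, c_ii=0.0218965, t_crit=2.364624
)
print(f"t = {t:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The result should be close to the t value and confidence interval reported for income in the regression output.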
Econometric Problems
What happens if one or more of these
assumptions are violated or not fulfilled?
❖The estimators may be:
➢ Biased
➢ Inefficient
➢ Accompanied by unreliable standard errors
➢ Inconsistent
Cont…
The basic questions to be addressed for all
the assumptions are:
➢What is the nature of the problem?
➢What are the consequences of the problem?
➢How do we detect (diagnose) the problem?
➢What remedies (prescriptions) are available
for the problem?
Cont…
➢ The Zero Mean Assumption, i.e. $E(\varepsilon_i) = 0$
✓ If this assumption is violated, we obtain a biased estimate of the intercept term.
✓ But since the intercept term is usually of little interest, this can be tolerated.
✓ The slope coefficients remain unaffected even if assumption one is violated.
✓ The intercept term often has no physical interpretation either.
Cont…
➢Homoscedasticity Assumption
✓The error terms in the regression equation have a common variance, i.e., they are homoscedastic.
✓If they do not have a common variance, they are heteroscedastic.
Cont…
✓In the case of homoscedasticity, the spread of the disturbance term around its mean is constant, i.e. $var(\varepsilon_i) = \sigma^2$.
✓But in the case of heteroscedasticity, the variance of the disturbance term changes across observations, typically with the values of the explanatory variables.
What are the causes of Heteroscedasticity?
✓The problem is more common in cross-sectional data than in time-series data.
✓Inappropriate or faulty sampling design or mix-up of random sampling methods
✓Various observations within a population or re-grouping problem of non-overlapping samples.
SYMPTOMS: What are the effects or consequences of heteroscedasticity?
▪ An unbiased but inefficient estimate
✓ high standard errors
✓ wider confidence intervals
✓ increased variance of the parameter estimates
✓ OLS estimators are still unbiased
Cont…
✓It does affect the minimum variance
property. Thus the OLS estimators are
inefficient.
✓Thus the test statistics – t-test and F-test –
cannot be relied on in the face of
heteroscedasticity.
DIAGNOSES: How do we detect non-Constant Variance or Heteroscedasticity?
➢Breusch-Pagan (BP) test
✓One of the most common tests for heteroscedasticity is the Breusch-Pagan (BP) test.
✓Under the null hypothesis
◼ H0 : Constant variance,
compute χ² and compare it with the tabulated χ²; if the calculated value is greater than the tabulated chi-square, reject H0 and conclude that heteroscedasticity exists.
Example
✓ An organization dealing with Family Planning
wished to examine the relationship between
expenditure, income and family size in Oromia
region.
✓ The organization drew a random sample of ten
families and obtained the data given in Table below.
Determine the regression equation.
Cont…
Family    Expenditure    Income    Family size
1         19             6         3
2         20             7         4
3         14             6         2
4         10             4         4
5         22             7         6
6         23             8         5
7         17             6         3
8         15             4         3
9         7              2         4
10        23             10        3
Cont…
. sum
    Variable |   Obs    Mean    Std. Dev.    Min    Max
 expenditure |    10      17    5.497474       7     23
      income |    10       6    2.260777       2     10
  familysize |    10     3.7    1.159502       2      6
Cont…
. reg expenditure income familysize
      Source |         SS     df          MS        Number of obs =      10
             |                                      F(2, 7)       =   23.94
       Model | 237.307999      2  118.653999        Prob > F      =  0.0007
    Residual | 34.6920014      7  4.95600021        R-squared     =  0.8725
             |                                      Adj R-squared =  0.8360
       Total |        272      9  30.2222222        Root MSE      =  2.2262

 expenditure |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      income |  2.175534   .3294222    6.60    0.000    1.396574    2.954494
  familysize |  .9627217   .6423018    1.50    0.178   -.5560807    2.481524
       _cons |  .3847267   3.041991    0.13    0.903    -6.80844    7.577893
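The coefficients in this output can be reproduced from the normal equations b = (X'X)^{-1}X'Y. The following is a minimal sketch in plain Python, not the routine Stata itself uses: the helper names `solve_linear_system` and `ols` are hypothetical, and the system is solved by simple Gauss-Jordan elimination, which is adequate only for a small, well-conditioned example like this one.

```python
# Data copied from the family expenditure table in this tutorial.
expenditure = [19, 20, 14, 10, 22, 23, 17, 15, 7, 23]
income = [6, 7, 6, 4, 7, 8, 6, 4, 2, 10]
family_size = [3, 4, 2, 4, 6, 5, 3, 3, 4, 3]


def solve_linear_system(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [a | rhs], stored as floats.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(y, regressors):
    """Return b solving (X'X) b = X'Y, with a constant as the first column of X."""
    rows = [[1.0] + [float(x[i]) for x in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = ols(expenditure, [income, family_size])
print(f"_cons = {b0:.4f}, income = {b1:.4f}, familysize = {b2:.4f}")
# prints: _cons = 0.3847, income = 2.1755, familysize = 0.9627
```

Holding family size fixed, each additional unit of income is associated with about 2.18 more units of expenditure, which is the only coefficient that is statistically significant at the 5% level in the output above.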
Cont…
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of expenditure
chi2(1) = 0.96
Prob > chi2 = 0.3266
Cont…
✓The test for heteroskedasticity (BP test) indicates that there is no problem of heteroskedasticity (non-constant variance),
✓since the p-value of the chi-square statistic (p = 0.3266 > 0.05) suggests not rejecting the null hypothesis of constant variance.
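The decision rule can be sketched in a few lines. This is an illustration of the test decision only, not a re-computation of the Breusch-Pagan statistic: it takes the reported chi2(1) = 0.96 as given, and uses the fact that for one degree of freedom the upper-tail chi-square probability equals erfc(sqrt(x/2)). The function names are hypothetical.

```python
import math


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def bp_decision(stat, alpha=0.05):
    """Return the p-value and the test decision for H0: constant variance."""
    p_value = chi2_pvalue_1df(stat)
    if p_value < alpha:
        return p_value, "reject H0: evidence of heteroscedasticity"
    return p_value, "do not reject H0: no evidence of heteroscedasticity"


p_value, decision = bp_decision(0.96)   # chi2(1) reported by hettest
print(f"p = {p_value:.3f}: {decision}")
```

Because the reported statistic is rounded to two decimals, the p-value obtained this way agrees with Stata's 0.3266 only to about three decimal places.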
PRESCRIPTIONS: What are the
remedies for heteroskedasticity?
➢Transform the original data to log, x² or square root form, following an acceptable procedure.
➢Deflate the values by some measure of ‘size’.
➢Use weighted least squares (WLS) analysis
Multicollinearity
✓Multicollinearity arises when one of the predictors or independent variables is an exact (or nearly exact) linear combination of the others, so that $cov(X_i, X_j) \neq 0$.
CAUSES: What are the causes of
multicollinearity?
✓Use of highly related independent variables
✓Rounding off sensitive variables
✓Improper scaling (or choice of measurement
unit) of variables
✓Inclusion of extreme values via errors in data
collection
✓Use of many dummy independent variables
✓Use of many interaction terms in a model
SYMPTOMS: What are the effects or consequences of multicollinearity?
✓Inaccurate coefficient estimates/measurements
✓Incorrect model specification or estimation
✓High coefficient of determination
DIAGNOSES: How do we detect
multicollinearity problem?
✓Suspicious R² value, e.g. a very high R² alongside insignificant coefficients
✓OLS estimates might be insignificant, with high standard errors.
✓Variance inflation factor (VIF)
• If VIF > 10, then there is an intolerable problem of multicollinearity
Cont…
. vif
    Variable |    VIF      1/VIF
  familysize |   1.01   0.992814
      income |   1.01   0.992814
    Mean VIF |   1.01
. pwcorr
             | expend~e    income  family~e
 expenditure |   1.0000
      income |   0.9119    1.0000
  familysize |   0.2789    0.0848    1.0000
Cont…
✓The VIF values for both the family size (familysize) and income variables are far less than 10, so there is no multicollinearity problem.
✓The pairwise correlation matrix also shows that collinearity between income and family size is very weak (0.0848).
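With only two regressors, the VIF of each one reduces to 1 / (1 − r²), where r is the pairwise correlation between them. The sketch below uses this two-regressor shortcut (an assumption that does not carry over to models with more regressors, where each VIF needs an auxiliary regression) and the data from the expenditure table; `pearson_r` is a hypothetical helper name.

```python
import math

# Regressors copied from the family expenditure table in this tutorial.
income = [6, 7, 6, 4, 7, 8, 6, 4, 2, 10]
family_size = [3, 4, 2, 4, 6, 5, 3, 3, 4, 3]


def pearson_r(x, y):
    """Sample correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(income, family_size)
vif = 1.0 / (1.0 - r ** 2)          # two-regressor case only
print(f"r = {r:.4f}, VIF = {vif:.2f}, 1/VIF = {1 / vif:.6f}")
# prints: r = 0.0848, VIF = 1.01, 1/VIF = 0.992814
```

These figures match the `vif` and `pwcorr` output above, which is why both regressors show the same VIF.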
PRESCRIPTIONS: What are the remedies
for multicollinearity?
➢Drop one or some highly correlated variable
➢Scale the variables/adjust choice of
measurement.
➢Center the data set or normalize the data.
✓ For example, subtract the mean from each variable.
Normality
✓The normality assumption states that the disturbance terms are normally distributed
✓The violation of the normality assumption is known as the non-normality problem.
CAUSES: What are the causes of non-
normality
➢Outliers in the data.
➢Incorrect random sampling technique
➢Non-random sampling methods such as convenience sampling, purposive sampling and quota sampling
Cont…
➢Very small sample size
➢Observations mis-recorded (too many
outliers)
➢Omission of relevant variables
➢Missing value problems
SYMPTOMS: What are the effects or consequences of non-normality?
➢Biased estimates
➢Inflated standard error
Cont…
✓Apply a normality test to the residuals/standardized residuals.
✓Skewness/kurtosis test for normality of the distribution: sktest resid
✓Check normality using the previous data on expenditure, income and family size
Cont…
. sktest resid
Skewness/Kurtosis tests for Normality
                                                         joint
Variable |  Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
   resid |   10         0.9980         0.6411          0.22      0.8970
Cont…
✓In this case, since p = 0.8970 > 0.05, the departure of the residuals from normality is not statistically significant.
✓Therefore the Ho of no difference between the theoretical normal distribution and the distribution of the residuals cannot be rejected.
✓Thus, the normality assumption is not violated.
Cont…
. gen lnexp=ln(exp)
. kdensity lnexp, bwidth(0.2) normal n(10)
[Figure: kernel density estimate of lnexp with a normal density overlay; kernel = epanechnikov, bandwidth = 0.2000]
Cont…
✓The kernel density plot provides a smoother
version of histogram which looks like
normal graph.
✓The kernel density graph for expenditure
data is fairly smooth and it appears that it
closely matches the normal curve.
PRESCRIPTIONS: What are the
remedies for non-normality?
➢Trim or drop the outliers
➢Smooth or transform the original data
Autocorrelation
What is autocorrelation?
➢Autocorrelation is the
interdependence/correlation of a pair of error
terms in a model.
➢The problem should be eliminated/minimized
CAUSES: What are the causes of
autocorrelation?
➢Omission of independent variable because of
lack of data.
➢Faulty functional form or misspecification of
a model.
➢Missing values or observations
SYMPTOMS: What are the effects or consequences of autocorrelation?
❖Unbiased but inefficient parameter estimates with incorrect standard errors.
❖Incorrect statistical tests (such as misleading confidence intervals), because the estimated variances are biased
❖Thus, R², t and F statistics tend to be exaggerated.
DIAGNOSES: How Do We Detect
Autocorrelation Problem?
❖Durbin-Watson (DW) test
❖The value of DW lies between 0 and 4 inclusive (i.e., $0 \le DW \le 4$).
Cont…
✓If DW is equal to or in the neighborhood of 2: no evidence of autocorrelation
✓As DW moves towards 0: positive autocorrelation
✓As DW moves towards 4: negative autocorrelation
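The DW statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch follows; the two residual series are made-up illustrations (not output from any model in this tutorial), chosen so that one drifts smoothly and the other alternates in sign.

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


smooth = [1, 2, 1, -1, -2, -1]     # neighbours look alike: positive autocorrelation
alternating = [1, -1, 1, -1]       # neighbours flip sign: negative autocorrelation

dw_smooth = durbin_watson(smooth)
dw_alternating = durbin_watson(alternating)
print(f"smooth: DW = {dw_smooth:.2f}, alternating: DW = {dw_alternating:.2f}")
# prints: smooth: DW = 0.67, alternating: DW = 3.00
```

The first value lies well below 2 and the second well above it, in line with the reading of DW given above.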
PRESCRIPTIONS: What are the remedies for
autocorrelation?
✓Averaging/extrapolating to estimate the
missing values or observations
✓Differencing the data.
• This induces the stationarity condition by
removing trend and some seasonal components of
a time series data set
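How differencing removes a trend can be seen in a small sketch. The series below is a made-up deterministic linear trend, used only for illustration; real data would also contain a random component.

```python
def first_difference(series):
    """Return the series of changes y_t - y_{t-1}."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


trend = [2 + 3 * t for t in range(6)]    # 2, 5, 8, 11, 14, 17: rising mean
diff = first_difference(trend)
print(trend, "->", diff)
# prints: [2, 5, 8, 11, 14, 17] -> [3, 3, 3, 3, 3]
```

The level of the original series grows with t, while the differenced series is constant, so its mean no longer depends on time. Note that differencing shortens the sample by one observation.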
Endogeneity
What is the endogeneity problem?
➢Endogeneity is the condition in which any of the independent variables is correlated with the error term, that is, $Cov(x_i, \varepsilon_i) \neq 0$.
➢ A variable that satisfies $Cov(x_i, \varepsilon_i) = 0$ is called exogenous; a departure from this assumption is known as endogeneity.
CAUSES: What are the causes of endogeneity?
➢Endogeneity occurs where irrelevant variables or lagged dependent variable(s) are introduced as independent variable(s) in a model.
SYMPTOMS: What are the effects or consequences of endogeneity?
➢This leads to high standard errors and inefficient parameter estimates.
DIAGNOSES: How do we detect endogeneity?
➢The most widely applied test statistic is the Hausman test.
PRESCRIPTIONS: What are the remedies for endogeneity?
➢Omit irrelevant dependent or independent variables
➢Exclude lagged dependent variable(s) that were introduced as independent variable(s) in a model
Misspecification
What is misspecification or non-specification?
➢ Misspecification is usually a problem that may
arise due to a mismatch of a model and a data
set;
➢ On the other hand, misspecification occurs due
to inclusion of irrelevant variables, and
exclusion of relevant variables in a regression
equation.
What are the causes of misspecification?
➢Incorrect functional forms
SYMPTOMS: What are the effects or consequences of misspecification?
➢Biased estimates
➢Incorrect statistical estimates
DIAGNOSES: How do we detect
misspecification?
➢Observe for outliers
➢Notice if non-constant variance or
heteroskedasticity behavior exists
➢Unusual coefficient of determination value,
closer to 100%.
➢Use the Ramsey RESET test
PRESCRIPTIONS: What are the
remedies for misspecification?
➢Re-examine the data set and identify if the
right type of modeling is applied to it.
➢Recall that the type of data collected determines what type of model is appropriate
➢Transform the data set through the inclusion of relevant variables and exclusion of irrelevant variables.
Cont…
✓ The following is a test for misspecification on the expenditure data (the two-variable equation estimated previously), using the Stata command:
. ovtest
Ramsey RESET test using powers of the fitted values of expenditure
Ho: model has no omitted variables
F(3, 4) = 0.63
Prob > F = 0.6322
➢ The Ramsey RESET test (Prob > F = 0.6322 > 0.05)
indicates that there are no omitted variables for this
particular model with the two variables; therefore, there is
no need to improve the specification of the model.
Cont…
➢Conclusion: Interpreting the linear
regression equation for the expenditure
function
➢As demonstrated above, since the model has
passed all the regression hurdles
➢we therefore conclude that the model
adequately fits the data.
Dummy Regression
✓Dummy variables are discrete variables
taking a value of ‘0’ or ‘1’. They are often
called ‘on’ ‘off’ variables, being ‘on’ when
they are 1.
✓Dummy variables can be used either as
explanatory variables or as the dependent
variable.
Cont…
✓Variables can be measured on four scales: nominal, ordinal, interval and ratio.
✓Regression models do not deal only with ratio scale variables; they can also involve nominal and ordinal scale variables.
➢The rule in dummy variable regression is that if there are m categories, we need only m-1 dummy variables.
Cont…
✓Suppose consumption may behave differently in war time and in peace time. One approach to this problem would simply be to estimate two separate consumption functions, one for each period.
✓Suppose that we hypothesize that war time controls do not alter the marginal propensity to consume out of disposable income, but instead simply reduce the average propensity to consume.
Cont…
✓By this we mean that the slope remains the same, whereas the constant term becomes smaller in the war-time case.
✓With this assumption, the consumption function becomes
$$C_t = b_0 + b_1 Y_{dt} + b_2 D_t + u_t, \quad t = 1, 2, \dots, n$$
Cont…
✓ Where $D_t = 0$ during peace time years and $D_t = 1$ for war years
✓ The equation above says that during peace time, when $D_t = 0$, we have
$$C_t = b_0 + b_1 Y_{dt} + u_t$$
✓ which in periods of war ($D_t = 1$) becomes
$$C_t = (b_0 + b_2) + b_1 Y_{dt} + u_t$$
Cont…
✓Suppose the time period under consideration
has both war and peace periods.
✓Using the data, we could estimate the values
of the coefficient in equation with our
standard multiple regression equation.
Cont…
✓ Suppose that we in fact did this and obtained the equation
$$\hat{C}_t = 40 + 0.9 Y_{dt} - 30 D_t$$
✓ Let us say that the t-ratio corresponding to $D_t$ was of sufficient size to suggest that the parameter $b_2$ is not zero.
✓ We would then conclude that the war had a significant negative effect on consumption expenditures. The estimated consumption function would be
Cont…
✓ $\hat{C}_t = 40 + 0.9 Y_{dt}$, for years of peace ($D_t = 0$)
✓ $\hat{C}_t = 10 + 0.9 Y_{dt}$, for war years ($D_t = 1$)
✓ If consumption expenditures are measured in billions of dollars, a comparison of the above two equations would then suggest that, for corresponding levels of income, consumption expenditures were 30 billion dollars less during years of war.
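The intercept shift can be made concrete with a short sketch that evaluates the estimated equation for a peace year and a war year at the same income. The income level of 100 (billion dollars) is an arbitrary illustrative value, and the function name is hypothetical.

```python
def predicted_consumption(disposable_income, war):
    """C_hat = 40 + 0.9 * Yd - 30 * D, with D = 1 in war years and 0 otherwise."""
    d = 1 if war else 0
    return 40 + 0.9 * disposable_income - 30 * d


c_peace = predicted_consumption(100, war=False)
c_war = predicted_consumption(100, war=True)
print(c_peace, c_war, c_war - c_peace)
# prints: 130.0 100.0 -30.0
```

Whatever income level is chosen, the gap between the two predictions stays at 30, because the dummy variable shifts only the intercept and leaves the slope of 0.9 unchanged.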
Non-linear Regression
✓ What type of models do we use for qualitative observations?
✓ Several methods have been developed to analyze data using regression models with a dichotomous (binary) or multi-category dependent variable.
✓ The most common ones are the Linear Probability Model (LPM), Probit (or Normit), Logit, Tobit, etc.
Cont…
There are several situations in which the outcome variable we want to explain can take only two possible values.
In such cases researchers model the choice of an individual using binary choice models.
Cont…
Consumer economics: whether a consumer makes a
purchase or not.
Labor economics: whether an individual participates
in the labor market or not.
Agricultural economics: whether or not a farmer
adopts or uses organic practices,
marketing/production contracts, etc.
Binary Choice Models
Binary choice models are the foundation from which more complex models for ordinal, nominal, and count outcomes can be derived.
The decision/choice is whether or not to have, do, use, or adopt.
The dependent variable is a binary response.
It takes on two values: 0 and 1.
Logit /Probit Regression
The logit or probit model treats the dependent variable, a dichotomous observation, as a realization of a binomial process, with probabilities assigned to the occurrence or non-occurrence of an event.
The attractiveness of the logit/probit model is that it captures the effect of the explanatory variables on a categorical dependent variable.
Logit
For the logit model, $F(X'\beta)$ is the cdf of the logistic distribution.
The predicted probabilities are limited between 0 and 1.
Cont…
These probabilities indicate how often something happens (y = 1) or does not happen (y = 0).
Then the probability of the event happening is given by
$$P_i = \frac{e^{z}}{1 + e^{z}}$$
From the probability rule, the probability of the event not happening is
$$1 - P_i = 1 - \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{z}}$$
Cont…
Where,
$$z = \beta_0 + \beta_1 X$$
We can write the following
$$\frac{P_i}{1 - P_i} = \frac{e^{z} / (1 + e^{z})}{1 / (1 + e^{z})} = e^{z}$$
Now $P_i/(1 - P_i)$ is simply the odds ratio in favor of the event happening, i.e. the ratio of the probability that an event will happen to the probability that it will not happen.
Cont…
Now if we take the natural log of the above equation, we obtain:
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \ln e^{z_i} = z_i = \beta_0 + \beta_1 X_i$$
That is, L, the log of the odds ratio, is not only linear in X, but also (from the estimation viewpoint) linear in the parameters.
L is called the logit; the probability itself, however, is not linear in X.
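The chain from z to probability, odds and logit can be traced numerically. In the sketch below the coefficient values β0 = −2 and β1 = 0.5 are hypothetical, chosen only to illustrate the formulas; they are not estimates from any data in this tutorial.

```python
import math


def logit_probability(z):
    """P = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))


def odds(p):
    """Odds in favor of the event: P / (1 - P)."""
    return p / (1.0 - p)


def log_odds(p):
    """The logit L = ln(P / (1 - P))."""
    return math.log(odds(p))


beta0, beta1 = -2.0, 0.5          # hypothetical coefficients
for x in (0, 4, 8):
    z = beta0 + beta1 * x
    p = logit_probability(z)
    print(f"X = {x}: z = {z:+.1f}, P = {p:.3f}, odds = {odds(p):.3f}, L = {log_odds(p):+.3f}")
```

Equal steps in X move the logit L by equal amounts (here 2 units per step of 4 in X), while the probability P moves by unequal amounts, which is the sense in which L is linear in X and P is not.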
Probit Model
For the probit model, $F(X'\beta)$ is the cdf of the standard normal distribution.
The predicted probabilities are limited between 0 and 1.
Probit/Logit
[Figure: probit and logit curves of P(Y) plotted against z, for z from -10 to 10]
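The two curves can be tabulated with the standard library alone. This is a minimal sketch: the standard normal cdf is written with `math.erf`, the logistic cdf in closed form, and the grid of z values is an arbitrary choice for illustration.

```python
import math


def probit_cdf(z):
    """Standard normal cdf, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Logistic cdf, 1 / (1 + e^(-z)), equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4, -2, 0, 2, 4):
    print(f"z = {z:+d}: probit = {probit_cdf(z):.3f}, logit = {logit_cdf(z):.3f}")
```

Both functions pass through 0.5 at z = 0 and stay between 0 and 1; the logistic curve approaches its limits more slowly because it has heavier tails, which is one reason the two models give coefficients on different scales.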
Interpretation of Coefficients
An increase in x increases/decreases the likelihood that y=1
(makes that outcome more/less likely).
In other words, an increase in x makes the outcome of 1 more
or less likely.
We interpret the sign of the coefficient but not the magnitude.
The magnitude cannot be interpreted using the coefficient
because different models have different scales of coefficients.
Choice between the Logit and Probit
Model
The choice depends on the data generating process, which is unknown.
The models produce almost identical results (different coefficients but similar marginal effects). The choice is up to the researcher.
If we reverse the categories 0 and 1, the signs of the coefficients are reversed (positive become negative and vice versa) but the magnitudes are the same.
Tobit
❖This model is called Tobit because it was first proposed by Tobin (1958).
❖The model is used when we have all observations of the explanatory variables but the continuous dependent variable is “limited” in the sense that we observe it only if it is above or below some cut-off level.
Introduction to Time Series
Time series data is data collected for a single
entity at multiple points in time
Time-series analysis: The statistical analysis of a
sample of time-ordered, periodic observations
Example: annual data on the GDP (gross domestic product) and PCE (personal consumption expenditure) of a country.
Cont…
A time series is a collection of data yt (t=1,2,…,T), with
the interval between yt and yt+1 being fixed and
constant.
We can think of time series as being generated by a
stochastic process, or the data generating process
(DGP).
A time series (sample) is a particular realization of the
DGP (population).
Time series analysis is the estimation of difference
equations containing stochastic (error) terms
Cont…
Regression analysis based on time series data
implicitly assumes that the underlying time
series are stationary.
In practice most economic time series are
nonstationary.
There are various potential difficulties in the statistical analysis of time-series data that can invalidate the empirical results
Category of Time Series
Time series can broadly be categorized into two:
◼ Univariate time series: Concerned with the time series properties of a single series
◼ E.g. $y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$
◼ Multivariate time series: Concerned with the time series properties of more than one series
◼ E.g. $y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + \dots + \beta_i x_{t-i} + \varepsilon_t$
Stationarity and weakly
dependent time series
Stationarity is an important property that must hold before we can estimate a time-series model
Stationarity: A time series yt is strongly stationary if its joint probability distribution does not depend on time, i.e. the pdf of (ys, ys+1, ys+2, ..., ys+t) does not depend on the starting date s
✓ A stationary time series process is one whose probability distributions are stable over time
Weak stationarity: A series is weakly stationary if its first and second moments do not depend on t
Cont…
Time series data often have time-dependent moments (e.g. mean, variance, ...).
A stochastic process (y1, ..., yt, ..., yt+n) is weakly stationary if
✓ $E(y_t) = \mu$ does not depend on t (constant mean)
✓ $V(y_t) = \sigma^2$ does not depend on t (constant variance)
✓ $Cov(y_t, y_{t-s}) = \gamma_s$ depends only on s, the distance (gap) between the two periods, and not on t.
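The sample counterparts of these moments are easy to compute. The sketch below is illustrative only: the function names are hypothetical, the autocovariance uses the common convention of dividing by the full sample size T, and the short series are made up. Comparing the mean of the first and second halves of a series is an informal check, not a formal stationarity test.

```python
def sample_mean(y):
    return sum(y) / len(y)


def autocovariance(y, s):
    """Sample autocovariance at lag s: (1/T) * sum_t (y_t - mean)(y_{t-s} - mean)."""
    mean = sample_mean(y)
    return sum((y[t] - mean) * (y[t - s] - mean) for t in range(s, len(y))) / len(y)


def half_means(y):
    """Means of the first and second halves of the series."""
    mid = len(y) // 2
    return sample_mean(y[:mid]), sample_mean(y[mid:])


gamma0 = autocovariance([1, 2, 3, 4], 0)    # lag 0 is the sample variance
gamma1 = autocovariance([1, 2, 3, 4], 1)
print(gamma0, gamma1)                       # prints: 1.25 0.3125

trending = [1, 2, 3, 4, 5, 6, 7, 8]         # mean rises over time
stable = [5, 3, 5, 3, 5, 3, 5, 3]           # mean stays put
print(half_means(trending), half_means(stable))
# prints: (2.5, 6.5) (4.0, 4.0)
```

A large gap between the two half-sample means, as in the trending series, is a first hint that the constant-mean condition fails.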
Cont…
Weakly Dependent Time Series
Stationarity has to do with the joint
distributions of a process as it moves through
time.
A very different concept is that of weak
dependence, which places restrictions on how
strongly related the random variables xt and
xt+h can be as the time distance between them,
h, gets large.
Cont…
The mean or variance of many time series
increases over time.
This is a property of time series data called
nonstationarity.
If two independent, nonstationary series are
regressed on each other, the chances for finding
a spurious relationship are very high.
Great Love for you!!!
Good Luck!