Econometrics Lecture Note Chapter 4 and 5


Yadeta Ahmed

Chapter Four

Regression on Dummy Variables


4.1. The nature of dummy variables
In regression analysis the dependent variable is frequently influenced not only by
variables that can be readily quantified on some well-defined scale (e.g., income,
output, prices, costs, height, and temperature), but also by variables that are
essentially qualitative in nature (e.g., sex, race, color, religion, nationality, wars,
earthquakes, strikes, political upheavals, and changes in government economic
policy). For example, holding all other factors constant, female college professors
are found to earn less than their male counterparts, and nonwhites are found to
earn less than whites. This pattern may result from sex or racial discrimination, but
whatever the reason, qualitative variables such as sex and race do influence the
dependent variable and clearly should be included among the explanatory
variables. Since such qualitative variables usually indicate the presence or absence
of a “quality” or an attribute, such as male or female, black or white, or Christian
or Muslim, one method of “quantifying” such attributes is by constructing
artificial variables that take on values of 1 or 0, 0 indicating the absence of an
attribute and 1 indicating the presence (or possession) of that attribute. For
example, 1 may indicate that a person is a male, and 0 may designate a female; or
1 may indicate that a person is a college graduate, and 0 that he is not, and so on.
Variables that assume such 0 and 1 values are called dummy variables.
Alternative names are indicator variables, binary variables, categorical variables,
and dichotomous variables.

Dummy variables can be used in regression models just as easily as quantitative
variables. As a matter of fact, a regression model may contain explanatory
variables that are exclusively dummy, or qualitative, in nature.

Econometrics
Yadeta Ahmed

Example: $Y_i = \alpha + \beta D_i + u_i$ ------------------------------------------(5.01)

where $Y_i$ = annual salary of a college professor
$D_i = 1$ if male college professor
$D_i = 0$ otherwise (i.e., female professor)


Note that (5.01) is like the two variable regression models encountered previously
except that instead of a quantitative X variable we have a dummy variable D
(hereafter, we shall designate all dummy variables by the letter D).

Model (5.01) may enable us to find out whether sex makes any difference in a
college professor’s salary, assuming, of course, that all other variables such as age,
degree attained, and years of experience are held constant. Assuming that the
disturbances satisfy the usual assumptions of the classical linear regression
model, we obtain from (5.01):
Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ -------(5.02)
Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$

that is, the intercept term $\alpha$ gives the mean salary of female college professors
and the slope coefficient $\beta$ tells by how much the mean salary of a male college
professor differs from the mean salary of his female counterpart, $\alpha + \beta$ reflecting
the mean salary of the male college professor. A test of the null hypothesis that
there is no sex discrimination ($H_0: \beta = 0$) can be easily made by running
regression (5.01) in the usual manner and finding out whether on the basis of the t
test the estimated $\beta$ is statistically significant.
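The interpretation of the intercept and slope in (5.01) can be checked numerically. The sketch below uses made-up salary figures (an assumption for illustration only, not data from these notes) and a hypothetical helper `ols_simple`; it shows that the OLS intercept reproduces the female group mean and that intercept plus slope reproduces the male group mean.

```python
# Illustrative, made-up salaries (in thousands); D = 1 for male, 0 for female.
salary = [22.0, 19.0, 18.0, 21.5, 18.5, 21.0, 20.5, 17.0, 17.5, 21.0]
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


alpha_hat, beta_hat = ols_simple(male, salary)

# Group means computed directly from the data.
female_mean = sum(y for y, d in zip(salary, male) if d == 0) / male.count(0)
male_mean = sum(y for y, d in zip(salary, male) if d == 1) / male.count(1)

print(alpha_hat, female_mean)            # intercept vs. female mean
print(alpha_hat + beta_hat, male_mean)   # intercept + slope vs. male mean
```

With a single dummy regressor, OLS simply fits the two group means, which is why the two printed pairs agree up to rounding.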

4.2. Regression on one quantitative variable and one qualitative variable with
two classes, or categories
Consider the model: $Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$ ----------------------------(5.03)
where: $Y_i$ = annual salary of a college professor
$X_i$ = years of teaching experience
$D_i = 1$ if male
$D_i = 0$ otherwise
Model (5.03) contains one quantitative variable (years of teaching experience) and
one qualitative variable (sex) that has two classes (or levels, classifications, or
categories), namely, male and female. What is the meaning of this equation?
Assuming, as usual, that $E(u_i) = 0$, we see that
Mean salary of female college professor: $E(Y_i \mid X_i, D_i = 0) = \alpha_1 + \beta X_i$ ---------(5.04)
Mean salary of male college professor: $E(Y_i \mid X_i, D_i = 1) = (\alpha_1 + \alpha_2) + \beta X_i$ ------(5.05)
Geometrically, we have the situation shown in fig. 5.1 (for illustration, it is
assumed that $\alpha_1 > 0$). In words, model (5.03) postulates that the male and female
college professors’ salary functions in relation to the years of teaching experience
have the same slope ($\beta$) but different intercepts. In other words, it is assumed that
the level of the male professor’s mean salary is different from that of the female
professor’s mean salary (by $\alpha_2$) but the rate of change in the mean annual salary
by years of experience is the same for both sexes.


If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions (5.04) and (5.05) have the same intercept (i.e., there is no sex
discrimination) can be made easily by running the regression (5.03) and noting the
statistical significance of the estimated $\alpha_2$ on the basis of the traditional t test. If
the t test shows that $\hat{\alpha}_2$ is statistically significant, we reject the null hypothesis that
the male and female college professors’ levels of mean annual salary are the same.
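The t test on the differential intercept can be sketched with simulated data. Everything numeric below is an assumption made for illustration: the "true" values `alpha1`, `alpha2`, `beta`, the sample size, and the error variance are invented, and NumPy is used only as a convenient way to run OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 30, n)      # X_i, years of teaching experience
male = rng.integers(0, 2, n)            # D_i, 1 if male, 0 otherwise

# Assumed "true" parameters, used only to generate the simulated salaries.
alpha1, alpha2, beta = 20.0, 3.0, 0.5
salary = alpha1 + alpha2 * male + beta * experience + rng.normal(0, 2, n)

# Design matrix for model (5.03): constant, dummy, quantitative regressor.
X = np.column_stack([np.ones(n), male, experience])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Classical OLS standard errors and t statistics.
resid = salary - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])     # unbiased error-variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)         # estimated covariance of the coefficients
t_stats = coef / np.sqrt(np.diag(cov))

print("alpha2_hat =", coef[1], "t =", t_stats[1])
```

A t statistic for `coef[1]` that is large in absolute value leads to rejecting the hypothesis of equal intercepts, under the maintained assumption of a common slope.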

Before proceeding further, note the following features of the dummy variable
regression model considered previously.

1. To distinguish the two categories, male and female, we have introduced
only one dummy variable $D_i$. For if $D_i = 1$ always denotes a male, when
$D_i = 0$ we know that it is a female since there are only two possible
outcomes. Hence, one dummy variable suffices to distinguish two
categories. The general rule is this: If a qualitative variable has ‘m’
categories, introduce only ‘m-1’ dummy variables. In our example, sex
has two categories, and hence we introduced only a single dummy variable.
If this rule is not followed, we shall fall into what might be called the
dummy variable trap, that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and
female, is arbitrary in the sense that in our example we could have assigned
D=1 for female and D=0 for male.
3. The group, category, or classification that is assigned the value of 0 is often
referred to as the base, benchmark, control, comparison, reference, or
omitted category. It is the base in the sense that comparisons are made
with that category.


4. The coefficient $\alpha_2$ attached to the dummy variable D can be called the
differential intercept coefficient because it tells by how much the value of
the intercept term of the category that receives the value of 1 differs from
the intercept coefficient of the base category.
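The dummy variable trap mentioned in feature 1 can be seen directly in the design matrix. The following sketch (with an invented six-observation sample) compares a design that includes an intercept and both a male and a female dummy with one that follows the m-1 rule.

```python
import numpy as np

male = np.array([1, 0, 0, 1, 0, 1])   # invented sample: 1 if male
female = 1 - male                     # 1 if female
const = np.ones_like(male)            # column of ones for the intercept

# Intercept plus m = 2 dummies: const = male + female, so the columns are
# linearly dependent (perfect multicollinearity).
X_trap = np.column_stack([const, male, female])

# Intercept plus m - 1 = 1 dummy: the columns are linearly independent.
X_ok = np.column_stack([const, male])

print(np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1], "columns")
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1], "columns")
```

Because `X_trap` has fewer independent columns than regressors, the matrix X'X cannot be inverted and the OLS coefficients are not uniquely determined.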

4.3. Regression on one quantitative variable and one qualitative variable with
more than two classes
Suppose that, on the basis of the cross-sectional data, we want to regress the
annual expenditure on health care by an individual on the income and education of
the individual. Since the variable education is qualitative in nature, suppose we
consider three mutually exclusive levels of education: less than high school, high
school, and college. Now, unlike the previous case, we have more than two
categories of the qualitative variable education. Therefore, following the rule that
the number of dummies be one less than the number of categories of the variable,
we should introduce two dummies to take care of the three levels of education.
Assuming that the three educational groups have a common slope but different
intercepts in the regression of annual expenditure on health care on annual income,
we can use the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ --------------------------(5.06)

where $Y_i$ = annual expenditure on health care
$X_i$ = annual income
$D_2 = 1$ if high school education
$D_2 = 0$ otherwise
$D_3 = 1$ if college education
$D_3 = 0$ otherwise
Note that in the preceding assignment of the dummy variables we are arbitrarily
treating the “less than high school education” category as the base category.


Therefore, the intercept $\alpha_1$ will reflect the intercept for this category. The
differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much the intercepts of the other two
categories differ from the intercept of the base category, which can be readily
checked as follows: Assuming $E(u_i) = 0$, we obtain from (5.06)
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$

$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$

$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$

which are, respectively, the mean health care expenditure functions for the three
levels of education, namely, less than high school, high school, and college.
Geometrically, the situation is shown in fig 5.2 (for illustrative purposes it is
assumed that $\alpha_3 > \alpha_2$).
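Constructing the m-1 dummies for a three-level variable can be sketched as follows. The category labels and the five sample observations are hypothetical; the first entry of `categories` plays the role of the base (omitted) category.

```python
# Hypothetical education levels for five individuals.
education = ["less_than_hs", "high_school", "college", "high_school", "college"]

# Three mutually exclusive categories; the first one is treated as the base.
categories = ["less_than_hs", "high_school", "college"]
base, *others = categories

# One 0/1 column per non-base category: m - 1 = 2 dummies for m = 3 categories.
dummies = {c: [1 if e == c else 0 for e in education] for c in others}

for name, column in dummies.items():
    print(name, column)
```

An individual in the base category has zeros in both columns, so the intercept of the regression is the intercept for that group.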

4.4. Regression on one quantitative variable and two qualitative variables
The technique of dummy variable can be easily extended to handle more than one
qualitative variable. Let us revert to the college professors’ salary regression
(5.03), but now assume that in addition to years of teaching experience and sex the
skin color of the teacher is also an important determinant of salary. For simplicity,


assume that color has two categories: black and white. We can now write (5.03)
as:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ -------------------------------------------(5.07)

where $Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_2 = 1$ if male
$D_2 = 0$ otherwise
$D_3 = 1$ if white
$D_3 = 0$ otherwise
Notice that each of the two qualitative variables, sex and color, has two categories
and hence needs one dummy variable for each. Note also that the omitted, or base,
category now is “black female professor.”
Assuming $E(u_i) = 0$, we can obtain the following regressions from (5.07):
Mean salary for black female professor:
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$

Mean salary for black male professor:
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$

Mean salary for white female professor:
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$

Mean salary for white male professor:
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$

Once again, it is assumed that the preceding regressions differ only in the intercept
coefficient but not in the slope coefficient $\beta$.
An OLS estimation of (5.07) will enable us to test a variety of hypotheses. Thus,
if $\alpha_3$ is statistically significant, it will mean that color does affect a professor’s
salary. Similarly, if $\alpha_2$ is statistically significant, it will mean that sex also affects
a professor’s salary. If both these differential intercepts are statistically
significant, it would mean that sex as well as color is an important determinant of
professors’ salaries.

From the preceding discussion it follows that we can extend our model to include
more than one quantitative variable and more than two qualitative variables. The
only precaution to be taken is that the number of dummies for each qualitative
variable should be one less than the number of categories of that variable.

4.5. Testing for structural stability of regression models


Until now, in the models considered in this chapter we assumed that the qualitative
variables affect the intercept but not the slope coefficient of the various subgroup
regressions. But what if the slopes are also different? If the slopes are in fact
different, testing for differences in the intercepts may be of little practical
significance. Therefore, we need to develop a general methodology to find out
whether two (or more) regressions are different, where the difference may be in
the intercepts or the slopes or both.

4.6. Interaction effects


Consider the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ ---------------------------------(5.08)

where $Y_i$ = annual expenditure on clothing
$X_i$ = income
$D_2 = 1$ if female
$D_2 = 0$ if male
$D_3 = 1$ if college graduate
$D_3 = 0$ otherwise
Implicit in this model is the assumption that the differential effect of the sex
dummy D2 is constant across the two levels of education and the differential


effect of the education dummy D3 is also constant across the two sexes. That is, if,
say, the mean expenditure on clothing is higher for females than males this is so
whether they are college graduates or not. Likewise, if, say, college graduates on
the average spend more on clothing than non college graduates, this is so whether
they are female or males.

In many applications such an assumption may be untenable. A female college
graduate may spend more on clothing than a male graduate. In other words, there
may be interaction between the two qualitative variables $D_2$ and $D_3$, and therefore
their effect on mean Y may not be simply additive as in (5.08) but multiplicative
as well, as in the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$ -----------------(4.09)

From (4.09) we obtain
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$ ------------(4.10)

which is the mean clothing expenditure of graduate females. Notice that
$\alpha_2$ = differential effect of being a female
$\alpha_3$ = differential effect of being a college graduate
$\alpha_4$ = differential effect of being a female graduate

which shows that the mean clothing expenditure of graduate females is different
(by $\alpha_4$) from the mean clothing expenditure of females or college graduates. If
$\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive, the average clothing expenditure of females is
higher (than the base category, which here is male nongraduate), but it is much
more so if the females also happen to be graduates. Similarly, the average
expenditure on clothing by a college graduate tends to be higher than the base
category but much more so if the graduate happens to be a female. This shows
how the interaction dummy modifies the effect of the two attributes considered
individually. Whether the coefficient of the interaction dummy is statistically
significant can be tested by the usual t test. If it turns out to be significant, the
simultaneous presence of the two attributes will attenuate or reinforce the
individual effects of these attributes. Needless to say, incorrectly omitting a
significant interaction term will lead to a specification bias.
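How the interaction term changes the "female" differential across education levels can be illustrated with a small calculation. The coefficient values below are hypothetical numbers chosen for illustration, and `mean_clothing` is a helper name introduced here, not something defined in the notes.

```python
def mean_clothing(d2, d3, x, a1, a2, a3, a4, beta):
    """Conditional mean of Y in the interaction model:
    a1 + a2*D2 + a3*D3 + a4*(D2*D3) + beta*X."""
    return a1 + a2 * d2 + a3 * d3 + a4 * d2 * d3 + beta * x


# Hypothetical coefficients: all differential effects positive.
params = dict(a1=100.0, a2=20.0, a3=30.0, a4=15.0, beta=0.05)
income = 1000.0

# Female-minus-male difference among nongraduates (D3 = 0): equals a2.
gap_nongrad = (mean_clothing(1, 0, income, **params)
               - mean_clothing(0, 0, income, **params))

# Female-minus-male difference among graduates (D3 = 1): equals a2 + a4.
gap_grad = (mean_clothing(1, 1, income, **params)
            - mean_clothing(0, 1, income, **params))

print(gap_nongrad, gap_grad)
```

Without the interaction term (a4 = 0) the two gaps would coincide, which is exactly the additivity assumption built into (5.08).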

4.7. The use of dummy variables in seasonal analysis


Many economic time series based on monthly or quarterly data exhibit seasonal
patterns (regular oscillatory movement). Examples are sales of department stores
at Christmastime, demand for money (cash balances) by households at holiday
times, demand for ice cream and soft drinks during the summer, and prices of
crops right after the harvesting season. Often it is desirable to remove the seasonal
factor, or component, from a time series so that one may concentrate on the other
components, such as the trend. The process of removing the seasonal component
from a time series is known as deseasonalization, or seasonal adjustment, and the
time series thus obtained is called the deseasonalized, or seasonally adjusted, time
series. Important economic time series, such as the consumer price index, the
wholesale price index, and the index of industrial production, are usually published in
seasonally adjusted form.
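One common way to use dummy variables for this purpose, sketched here under assumptions that are not spelled out in the text above, is to regress a quarterly series on a constant and three quarterly dummies and then subtract the estimated seasonal shifts. The series is simulated, has no trend, and the seasonal pattern in `seasonal` is invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years = 10
quarter = np.tile(np.arange(4), n_years)      # 0, 1, 2, 3 repeated; quarter 0 is the base
seasonal = np.array([0.0, 5.0, -3.0, 8.0])    # assumed seasonal shifts per quarter
y = 50.0 + seasonal[quarter] + rng.normal(0, 1, quarter.size)

# Constant plus three dummies (quarters 1, 2, 3): m - 1 dummies for m = 4 seasons.
D = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])
X = np.column_stack([np.ones(quarter.size), D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Subtract the estimated seasonal shifts (but not the intercept) from the series.
y_adjusted = y - D @ coef[1:]

for q in range(4):
    print(q, y[quarter == q].mean(), y_adjusted[quarter == q].mean())
```

After the adjustment every quarter has the same mean as the base quarter, so what remains is the non-seasonal variation in the series.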

4.8. Piecewise linear regression


To illustrate yet another use of dummy variables, consider fig 5.3, which shows
how a hypothetical company remunerates its sales representatives.


It pays commissions based on sales in such a manner that up to a certain level, the
target, or threshold, level X*, there is one (stochastic) commission structure and
beyond that level another. (Note: Besides sales, other factors affect sales
commission. Assume that these other factors are represented by the stochastic
disturbance term.) More specifically, it is assumed that sales commission increases
linearly with sales until the threshold level X*, after which also it increases
linearly with sales but at a much steeper rate. Thus, we have a piece-wise linear
regression consisting of two linear pieces or segments, which are labeled I and II
in fig. 5.3, and the commission function changes its slope at the threshold value.
Given the data on commission, sales, and the value of the threshold level X*, the
technique of dummy variables can be used to estimate the (differing) slopes of the
two segments of the piecewise linear regression shown in fig. 5.3. We proceed as
follows:
$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$ ------------------------------------(5.11)

where $Y_i$ = sales commission
$X_i$ = volume of sales generated by the sales person
$X^*$ = threshold value of sales, also known as a knot (known in advance)
$D_i = 1$ if $X_i > X^*$
$D_i = 0$ if $X_i < X^*$
Assuming $E(u_i) = 0$, we see at once that
$E(Y_i \mid D_i = 0, X_i, X^*) = \alpha_1 + \beta_1 X_i$ ---------------------------------------(5.12)

which gives the mean sales commission up to the target level X* and
$E(Y_i \mid D_i = 1, X_i, X^*) = \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X_i$ ----------------------(5.13)

which gives the mean sales commission beyond the target level X*.

Thus, $\beta_1$ gives the slope of the regression line in segment I, and $\beta_1 + \beta_2$ gives the
slope of the regression line in segment II of the piecewise linear regression shown
in fig 5.3. A test of the hypothesis that there is no break in the regression at the
threshold value X* can be conducted easily by noting the statistical significance of
the estimated differential slope coefficient $\hat{\beta}_2$.
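A minimal sketch of estimating the two slopes follows. The data are simulated, and the threshold `x_star` and the parameter values `a1`, `b1`, `b2` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x_star = 50.0                               # assumed known threshold (knot) X*
sales = rng.uniform(0, 100, n)              # X_i
d = (sales > x_star).astype(float)          # D_i = 1 if X_i > X*, else 0

# Assumed "true" parameters used to simulate commissions.
a1, b1, b2 = 10.0, 0.2, 0.5
commission = a1 + b1 * sales + b2 * (sales - x_star) * d + rng.normal(0, 1, n)

# Regressors of the piecewise model: constant, X_i, and (X_i - X*) * D_i.
X = np.column_stack([np.ones(n), sales, (sales - x_star) * d])
coef, *_ = np.linalg.lstsq(X, commission, rcond=None)

slope_I = coef[1]               # slope in segment I
slope_II = coef[1] + coef[2]    # slope in segment II
print(slope_I, slope_II)
```

Because the extra regressor is zero at and below the knot, the fitted line is continuous at X* and only its slope changes there.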

Summary:
1. Dummy variables taking values of 1 and 0 (or their linear transforms) are a
means of introducing qualitative regressors in regression analysis.
2. Dummy variables are a data-classifying device in that they divide a sample
into various subgroups based on qualities or attributes (sex, marital status,
race, religion, etc.) and implicitly allow one to run individual regressions
for each subgroup. If there are differences in the response of the regressand
to the variation in the quantitative variables in the various subgroups,
they will be reflected in the differences in the intercepts or slope
coefficients, or both, of the various subgroup regressions.
3. Although a versatile tool, the dummy variable technique needs to be
handled carefully. First, if the regression contains a constant term, the
number of dummy variables must be one less than the number of classifications
of each qualitative variable. Second, the coefficient attached to the dummy
variables must always be interpreted in relation to the base, or reference,
group, that is, the group that gets the value of zero. Finally, if a model has
several qualitative variables with several classes, introduction of dummy
variables can consume a large number of degrees of freedom. Therefore,
one should always weigh the number of dummy variables to be introduced
against the total number of observations available for analysis.
4. Among its various applications, this chapter considered but a few. These
included (1) comparing two (or more) regressions, (2) deseasonalizing time
series data, (3) combining time series and cross-sectional data, and (4)
piecewise linear regression models.
5. Since the dummy variables are nonstochastic, they pose no special
problems in the application of OLS. However, care must be exercised in
transforming data involving dummy variables. In particular, the problems
of autocorrelation and heteroscedasticity need to be handled very carefully.

Test yourself question

In studying the effect of a number of qualitative attributes on the prices
charged for movie admissions in a large metropolitan area for the period 1961-
1964, R. D. Lampson obtained the following regression for the year 1961:
$\hat{Y} = 4.13 + 5.77D_1 + 8.12D_2 - 7.68D_3 - 1.13D_4 + 27.09D_5 + 31.46 \log X_1 + 0.81X_2$ + 3 other dummy variables
(2.04) (2.67) (2.51) (1.78) (3.58) (13.78) (0.17)
$R^2 = 0.961$

where: $D_1$ = theater location: 1 if suburban, 0 if city center
$D_2$ = theater age: 1 if less than 10 years since construction or major renovation, 0 otherwise
$D_3$ = type of theater: 1 if outdoor, 0 if indoor
$D_4$ = parking: 1 if provided, 0 otherwise
$D_5$ = screening policy: 1 if first run, 0 otherwise
$X_1$ = average percentage unused seating capacity per showing
$X_2$ = average film rental, cents per ticket charged by the distributor
$Y$ = adult evening admission price, cents

and where the figures in parentheses are standard errors.


a. Comment on the results.
b. How would you rationalize the introduction of the variable $X_1$?
c. How would you explain the negative value of the coefficient of $D_4$?


Chapter Five

An Introduction to Simultaneous Equation models


5.1. Introduction
In all the previous chapters, we have focused exclusively on the problems and
estimation of single-equation regression models. In such models, a
dependent variable is expressed as a linear function of one or more explanatory variables.
The cause-and-effect relationship in such models between the dependent and
independent variables is unidirectional. That is, the explanatory variables are the cause
and the dependent variable is the effect. But there are situations where such one-way or
unidirectional causation is not meaningful. This occurs if, for instance, Y
(dependent variable) is not only a function of the X’s (explanatory variables) but all or
some of the X’s are, in turn, determined by Y. There is, therefore, a two-way flow of
influence between Y and (some of) the X’s, which in turn makes the distinction between
dependent and independent variables doubtful. Under such circumstances, we
need to consider more than one regression equation, one for each interdependent
variable, to understand the multi-directional flow of influence among the variables. This is
precisely what is done in simultaneous equation models.

A system describing the joint dependence of variables is called a system of simultaneous
equations or a simultaneous equations model. The number of equations in such models is
equal to the number of jointly dependent or endogenous variables involved in the
phenomenon under analysis. Unlike the single equation models, in simultaneous equation
models it is not usually possible (it is possible only under specific assumptions) to estimate a
single equation of the model without taking into account the information provided by the
other equations of the system. If one applies OLS to estimate the parameters of each
equation disregarding the other equations of the model, the estimates so obtained are not only
biased but also inconsistent; i.e., even if the sample size increases indefinitely, the
estimators do not converge to their true values.


The bias arising from an estimation procedure that treats each
equation of the simultaneous equations model as though it were a single model is known
as simultaneity bias or simultaneous equation bias. To avoid this bias we will use other
methods of estimation, such as Indirect Least Squares (ILS), Two Stage Least Squares
(2SLS), Three Stage Least Squares (3SLS), Maximum Likelihood methods, and the Method
of Instrumental Variables (IV).

What happens to the parameters of the relationship if we estimate by applying OLS to
each equation without taking into account the information provided by the other
equations in the system? The application of OLS to estimate the parameters of economic
relationships presupposes the classical assumptions discussed in chapter one of this
course. One of the crucial assumptions of OLS is that the explanatory variables and
the disturbance term are independent, i.e., the explanatory variables are truly exogenous.
Symbolically: $E[X_i U_i] = 0$. As a result, the linear model can be interpreted as
describing the conditional expectation of the dependent variable (Y) given a set of
explanatory variables. In simultaneous equation models, such independence of
explanatory variables and disturbance term is violated, i.e., $E[X_i U_i] \neq 0$. If this
assumption is violated, the OLS estimator is biased and inconsistent.
Simultaneity bias of OLS estimators: The two-way causation in a relationship
leads to violation of this important assumption of the linear regression model, because a
variable can be the dependent variable in one equation and also an
explanatory variable in the other equations of the simultaneous-equation model.
In this case $E[X_i U_i]$ may be different from zero. To show simultaneity bias, let’s
consider the following simple simultaneous equation model.
$Y = \alpha_0 + \alpha_1 X + U$
$X = \beta_0 + \beta_1 Y + \beta_2 Z + V$ --------------------------------------------------(10)

Suppose that the following assumptions hold.
$E(U) = 0$, $E(V) = 0$
$E(U^2) = \sigma_u^2$, $E(V^2) = \sigma_v^2$
$E(U_i U_j) = 0$, $E(V_i V_j) = 0$ for $i \neq j$, also $E(U_i V_i) = 0$;

where X and Y are endogenous variables and Z is an exogenous variable.


The reduced form of X of the above model is obtained by substituting Y in the
equation of X.
$X = \beta_0 + \beta_1 (\alpha_0 + \alpha_1 X + U) + \beta_2 Z + V$

$X = \frac{\beta_0 + \alpha_0 \beta_1}{1 - \alpha_1 \beta_1} + \frac{\beta_2}{1 - \alpha_1 \beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1 \beta_1}$ -----------------------(11)
Applying OLS to the first equation of the above structural model will result in a
biased estimator because $\mathrm{cov}(X_i, U_i) = E(X_i U_i) \neq 0$. Now, let’s prove this
expression.
$\mathrm{cov}(X, U) = E\{[X - E(X)][U - E(U)]\}$

$= E\{[X - E(X)]\,U\}$, since $E(U) = 0$ -----------------------(12)

Substituting the value of X in equation (11) into equation (12), and noting that
$E(X) = \frac{\beta_0 + \alpha_0 \beta_1}{1 - \alpha_1 \beta_1} + \frac{\beta_2}{1 - \alpha_1 \beta_1} Z$:

$= E\left\{\left[\frac{\beta_0 + \alpha_0 \beta_1}{1 - \alpha_1 \beta_1} + \frac{\beta_2}{1 - \alpha_1 \beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1 \beta_1} - \frac{\beta_0 + \alpha_0 \beta_1}{1 - \alpha_1 \beta_1} - \frac{\beta_2}{1 - \alpha_1 \beta_1} Z\right] U\right\}$

$= E\left\{\frac{U}{1 - \alpha_1 \beta_1}\,(\beta_0 + \alpha_0 \beta_1 + \beta_2 Z + \beta_1 U + V - \beta_0 - \alpha_0 \beta_1 - \beta_2 Z)\right\}$

$= E\left\{\frac{U}{1 - \alpha_1 \beta_1}\,(\beta_1 U + V)\right\}$

$= \frac{1}{1 - \alpha_1 \beta_1}\,E(\beta_1 U^2 + UV)$

$= \frac{\beta_1}{1 - \alpha_1 \beta_1}\,E(U^2) = \frac{\beta_1 \sigma_u^2}{1 - \alpha_1 \beta_1} \neq 0$, since $E(UV) = 0$


That is, the covariance between X and U is not zero. As a consequence, if OLS is
applied to each equation of the model separately, the coefficients will turn out to be
biased. Now, let’s examine how the non-zero covariance of the error term and
the explanatory variable leads to bias in the OLS estimates of the parameters.
If we apply OLS to the first equation of the above structural model (10),
$Y = \alpha_0 + \alpha_1 X + U$, we obtain (lower-case letters denote deviations from sample means)

$\hat{\alpha}_1 = \frac{\sum xy}{\sum x^2} = \frac{\sum x(Y - \bar{Y})}{\sum x^2} = \frac{\sum xY}{\sum x^2}$ (since $\frac{\bar{Y}\sum x}{\sum x^2}$ is zero)

$= \frac{\sum x(\alpha_0 + \alpha_1 X + U)}{\sum x^2} = \frac{\alpha_0 \sum x}{\sum x^2} + \alpha_1 \frac{\sum xX}{\sum x^2} + \frac{\sum xU}{\sum x^2}$

But we know that $\sum x = 0$ and $\frac{\sum xX}{\sum x^2} = 1$, hence

$\hat{\alpha}_1 = \alpha_1 + \frac{\sum xU}{\sum x^2}$ -----------------------(13)

Taking the expected values on both sides:
$E(\hat{\alpha}_1) = \alpha_1 + E\left(\frac{\sum xU}{\sum x^2}\right)$

We have already proved that $E(XU) \neq 0$, which is the same as $E(xU) \neq 0$.
Consequently, $E(\hat{\alpha}_1) \neq \alpha_1$; that is, $\hat{\alpha}_1$ will be biased by the amount
equivalent to $E\left(\frac{\sum xU}{\sum x^2}\right)$.
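The result can also be illustrated by simulation. In the sketch below the structural parameter values, the unit variances of U, V, and Z, and the sample size are all assumptions chosen for illustration; the symbols follow model (10), with `a0`, `a1` for the Y equation and `b0`, `b1`, `b2` for the X equation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
a0, a1 = 1.0, 0.5              # assumed parameters of the Y equation
b0, b1, b2 = 2.0, 0.8, 1.0     # assumed parameters of the X equation

Z = rng.normal(0, 1, n)        # exogenous variable
U = rng.normal(0, 1, n)        # disturbance of the Y equation, variance 1
V = rng.normal(0, 1, n)        # disturbance of the X equation, variance 1

# Generate X from its reduced form, then Y from the first structural equation.
den = 1 - a1 * b1
X = (b0 + a0 * b1) / den + b2 / den * Z + (b1 * U + V) / den
Y = a0 + a1 * X + U

x = X - X.mean()                       # deviations from the sample mean
ols_slope = (x @ Y) / (x @ x)          # OLS estimate of a1 from the Y equation alone
cov_xu = np.mean(x * U)                # sample analogue of cov(X, U)
cov_theory = b1 * 1.0 / den            # beta_1 * sigma_u^2 / (1 - alpha_1 * beta_1)

print("true a1:", a1, "OLS estimate:", ols_slope)
print("sample cov(X, U):", cov_xu, "theoretical value:", cov_theory)
```

Even with a very large sample the OLS slope stays away from the assumed true value of 0.5, which is the inconsistency discussed above; the sample covariance between X and U is close to the value given by the derivation.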

5.2. Definitions of Some Concepts

• Endogenous and exogenous variables
In simultaneous equation models, variables are classified as endogenous and
exogenous. The traditional definition of these terms is that endogenous variables
are variables that are determined by the economic model (within the system) and
exogenous variables are those determined from outside. Exogenous variables are
also called predetermined. Predetermined variables can be divided into two
categories, which are considered in general as exogenous variables. These are:
current and lagged exogenous variables, and lagged endogenous variables. For instance,
$X_t$ and $X_{t-1}$ depict the current and lagged exogenous variables and $Y_{t-1}$ depicts
the lagged endogenous variable. This is on the assumption that X’s symbolize the
exogenous variables and Y’s symbolize the endogenous variables. Thus, $X_t$, $X_{t-1}$,
and $Y_{t-1}$ are regarded as predetermined (exogenous) variables.

Since the exogenous variables are predetermined, they are supposed to be
independent of the error terms in the model.
Consider the demand and supply functions.
$Q^d = \alpha_0 + \alpha_1 P + \alpha_2 Y + U_1$ -----------------(14)

$Q^s = \beta_0 + \beta_1 P + \beta_2 R + U_2$ -----------------(15)

where: Q = quantity, Y = income, P = price, R = rainfall, and $U_1$ and $U_2$ are error terms.

Here P and Q are endogenous variables and Y and R are exogenous variables.
• Structural models
A structural model describes the complete structure of the relationships among the
economic variables. Structural equations of the model may be expressed in terms
of endogenous variables, exogenous variables, and disturbances (random
variables). The parameters of a structural model express the direct effect of each
explanatory variable on the dependent variable. Variables not appearing explicitly in a
function may have an indirect effect, which is taken into account by the
simultaneous solution of the system. For instance, a change in consumption affects
investment indirectly and is not considered in the consumption function. The
effect of consumption on investment cannot be measured directly by any structural
parameter, but is measured indirectly by considering the system as a whole.


Example: The following simple Keynesian model of income determination can
be considered as a structural model.
$C = \alpha + \beta Y + U$ -----------------------------------------------(16)

$Y = C + Z$ ----------------------------------------------------(17)
for $\alpha > 0$ and $0 < \beta < 1$
where: C = consumption expenditure
Z = non-consumption expenditure
Y = national income
C and Y are endogenous variables while Z is an exogenous variable.

• Reduced form of the model:
The reduced form of a structural model is the model in which the endogenous
variables are expressed as a function of the predetermined variables and the error
term only.
Illustration: Find the reduced form of the above structural model.
Since C and Y are endogenous variables and only Z is an exogenous variable,
we have to express C and Y in terms of Z. To do this, substitute Y = C + Z into
equation (16).
$C = \alpha + \beta(C + Z) + U$
$C = \alpha + \beta C + \beta Z + U$

$C - \beta C = \alpha + \beta Z + U$

$C(1 - \beta) = \alpha + \beta Z + U$

$C = \frac{\alpha}{1 - \beta} + \frac{\beta}{1 - \beta} Z + \frac{U}{1 - \beta}$ ----------------------------------(18)

Substituting (18) into (17), we get:

$Y = \frac{\alpha}{1 - \beta} + \frac{1}{1 - \beta} Z + \frac{U}{1 - \beta}$ --------------------------------(19)
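The reduced form can be verified numerically: values of C and Y computed from the reduced-form expressions for the simple Keynesian model must satisfy both structural equations. The parameter values and the values of Z and U below are hypothetical, and exact fractions are used so that the check involves no rounding.

```python
from fractions import Fraction as F

# Hypothetical structural parameters with alpha > 0 and 0 < beta < 1.
alpha, beta = F(10), F(4, 5)
Z, U = F(100), F(2)          # arbitrary exogenous value and disturbance draw

# Reduced-form expressions for C and Y in terms of Z and U only.
C = alpha / (1 - beta) + beta / (1 - beta) * Z + U / (1 - beta)
Y = alpha / (1 - beta) + 1 / (1 - beta) * Z + U / (1 - beta)

# Check that the structural equations hold at these values.
consumption_ok = (C == alpha + beta * Y + U)   # consumption function
identity_ok = (Y == C + Z)                     # income identity

print(C, Y, consumption_ok, identity_ok)
```

With these numbers the coefficient of Z in the Y equation is 1/(1 - 0.8) = 5, so a one-unit rise in Z raises Y by five units once the feedback through consumption is taken into account.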


Equations (18) and (19) are called the reduced form of the above structural model.
We can write this more formally as:
Structural form equations        Reduced form equations
$C = \alpha + \beta Y + U$        $C = \frac{\alpha}{1 - \beta} + \frac{\beta}{1 - \beta} Z + \frac{U}{1 - \beta}$
$Y = C + Z$        $Y = \frac{\alpha}{1 - \beta} + \frac{1}{1 - \beta} Z + \frac{U}{1 - \beta}$

Parameters of the reduced form measure the total effect (direct and indirect) of a
change in exogenous variables on the endogenous variable. For instance, in the
above reduced form equation (18), $\frac{\beta}{1 - \beta}$ measures the total effect of a unit
change in the non-consumption expenditure on consumption. This total effect is
$\beta$, the direct effect, times $\frac{1}{1 - \beta}$, the indirect effect.
The reduced form equations can be obtained in two ways:
1) By expressing the endogenous variables directly as functions of the
predetermined variables.
2) By solving the structural system for the endogenous variables in terms of the
predetermined variables, the structural parameters, and the disturbance
terms.
Consider the following simple model for a closed economy.
$$C_t = a_1 Y_t + U_1 \tag{i}$$
$$I_t = b_1 Y_t + b_2 Y_{t-1} + U_2 \tag{ii}$$
$$Y_t = C_t + I_t + G_t \tag{iii}$$
This model has three equations in three endogenous variables ($C_t$, $I_t$, and $Y_t$) and
two predetermined variables ($G_t$ and $Y_{t-1}$).

To obtain the reduced form of this model, we may use two methods (the direct
method and the method of solving the structural model).
Direct Method: Express the three endogenous variables ($C_t$, $I_t$, and $Y_t$) as
functions of the two predetermined variables ($G_t$ and $Y_{t-1}$) directly, using $\pi$'s as the
parameters of the reduced form model, as follows.
$$C_t = \pi_{11} Y_{t-1} + \pi_{12} G_t + V_1 \tag{iv}$$
$$I_t = \pi_{21} Y_{t-1} + \pi_{22} G_t + V_2 \tag{v}$$
$$Y_t = \pi_{31} Y_{t-1} + \pi_{32} G_t + V_3 \tag{vi}$$
Note: $\pi_{11}$, $\pi_{12}$, $\pi_{21}$, $\pi_{22}$, $\pi_{31}$, and $\pi_{32}$ are reduced form parameters. By solving the
structural system for the endogenous variables in terms of the predetermined variables,
structural parameters and disturbances, the expressions for the reduced form parameters
can be obtained easily. For instance, the third structural equation (iii) can be
expressed in reduced form as follows:
$$Y_t = \frac{b_2}{1-a_1-b_1} Y_{t-1} + \frac{1}{1-a_1-b_1} G_t + \frac{U_1 + U_2}{1-a_1-b_1}$$
This equation is obtained by simply substituting structural equations (i) and (ii)
into (iii). From this expression:
$$\pi_{31} = \frac{b_2}{1-a_1-b_1}, \qquad \pi_{32} = \frac{1}{1-a_1-b_1}$$
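As a quick numerical check of the reduced-form equation for $Y_t$, the sketch below uses assumed values for the structural parameters, the predetermined variables and the disturbances (all hypothetical). The coefficient of $G_t$ is taken as $1/(1-a_1-b_1)$, as read off from the reduced-form equation for $Y_t$. The value of $Y_t$ is computed from the reduced form, then $C_t$ and $I_t$ are recovered from (i) and (ii), and the identity (iii) is verified.

```python
# Sketch: numerical check of the reduced-form equation for Y_t in the
# three-equation model (i)-(iii). All values are illustrative assumptions.

a1, b1, b2 = 0.5, 0.25, 0.125   # assumed structural parameters, with a1 + b1 < 1
Y_lag, G = 80.0, 40.0           # assumed predetermined variables Y_{t-1} and G_t
U1, U2 = 1.0, -0.5              # assumed disturbances

d = 1 - a1 - b1
pi31 = b2 / d                   # reduced-form coefficient of Y_{t-1}
pi32 = 1 / d                    # reduced-form coefficient of G_t
V3 = (U1 + U2) / d              # reduced-form disturbance

Y = pi31 * Y_lag + pi32 * G + V3

# Recover C_t and I_t from the structural equations (i) and (ii),
# then confirm that the identity (iii) holds.
C = a1 * Y + U1
I = b1 * Y + b2 * Y_lag + U2
assert abs(Y - (C + I + G)) < 1e-9
print(pi31, pi32, Y)
```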
Test yourself Questions:
a) Determine the reduced form equations for the structural equations (i) and
(ii).
b) Indicate the expressions for $\pi_{11}$, $\pi_{12}$, $\pi_{21}$, and $\pi_{22}$ from (a) above.
How to estimate the reduced form parameters?
The estimates of the reduced form coefficients ($\pi$'s) may be obtained in two ways.
1) Direct estimation of the reduced form coefficients by applying OLS.
2) Indirect estimation of the reduced form coefficients:
Steps:

i) Solve the system for the endogenous variables so that each equation contains
only predetermined explanatory variables. In this way we obtain
the system of parameters' relations (relations between the $\pi$'s and the
structural parameters).
ii) Obtain the estimates of the structural parameters by any appropriate
econometric method.
iii) Substitute the estimates of the structural coefficients into the system of
parameters' relations to find the estimates of the reduced form coefficients.
➢ Recursive models
A model is called recursive if its structural equations can be ordered in such a way
that the first equation includes only the predetermined variables in the right hand
side; the second equation contains predetermined variables and the first
endogenous variable (of the first equation) in the right hand side and so on. The
special feature of recursive model is that its equations may be estimated, one at a
time, by OLS without simultaneous equations bias.

OLS is not applicable if there is interdependence between the explanatory
variables and the error term. In simultaneous equation models, the endogenous
variables may depend on the error terms of the model; hence the OLS technique is
not appropriate for estimating an equation in a simultaneous equations model.
However, in a special type of simultaneous equations model called a recursive,
triangular or causal model, the use of the OLS procedure of estimation is
appropriate. Consider the following three-equation system to understand the
nature of such models:
$$Y_1 = \beta_{10} + \gamma_{11} X_1 + \gamma_{12} X_2 + U_1$$
$$Y_2 = \beta_{20} + \beta_{21} Y_1 + \gamma_{21} X_1 + \gamma_{22} X_2 + U_2$$
$$Y_3 = \beta_{30} + \beta_{31} Y_1 + \beta_{32} Y_2 + \gamma_{31} X_1 + \gamma_{32} X_2 + U_3$$

In the above illustration, as usual, the X's and Y's are exogenous and endogenous
variables respectively. The disturbance terms satisfy the following assumption:
$$E(U_1 U_2) = E(U_1 U_3) = E(U_2 U_3) = 0$$

The above assumption is the most crucial assumption that defines the recursive
model. If this does not hold, the above system is no longer recursive and OLS is
no longer valid. The first equation of the above system contains only the
exogenous variables on the right hand side. Since, by assumption, the exogenous
variables are independent of $U_1$, the first equation satisfies the critical assumption of
the OLS procedure. Hence OLS can be applied straightforwardly to this equation.

Consider the second equation. It contains the endogenous variable $Y_1$ as one of the
explanatory variables along with the non-stochastic X's. OLS can be applied to this
equation only if it can be shown that $Y_1$ and $U_2$ are independent of each other. This
is true because $U_1$, which affects $Y_1$, is by assumption uncorrelated with $U_2$, i.e.
$E(U_1 U_2) = 0$. $Y_1$ acts as a predetermined variable in so far as $Y_2$ is concerned.
Hence OLS can be applied to this equation. A similar argument can be extended to
the third equation because $Y_1$ and $Y_2$ are independent of $U_3$. In this way, in the
recursive system OLS can be applied to each equation separately.

Let us build a hypothetical recursive model for an agricultural commodity, say
wheat. The production of wheat, $Y_1$, may be assumed to depend on two exogenous
factors: $X_2$ = climatic conditions and $X_3$ = last season's price. The retail price,
$Y_2$, may be assumed to be a function of the production level $Y_1$ and the exogenous
factor $X_4$ = disposable income. Finally, the price obtained by the producer, $Y_3$, can
be expressed in terms of the retail price $Y_2$ and the exogenous factor $X_5$ = the cost of
marketing the product.

The relevant equations of the model may be described as under:
$$Y_1 = \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + U_1$$
$$Y_2 = \alpha_4 + \beta_1 Y_1 + \alpha_5 X_4 + U_2$$
$$Y_3 = \alpha_6 + \beta_2 Y_2 + \alpha_7 X_5 + U_3$$

In the first equation, there are only exogenous variables, which are assumed to be
independent of $U_1$. In the second equation, the causal relation between $Y_1$ and
$Y_2$ is in one direction. Also, $Y_1$ is independent of $U_2$ and can be treated just like an
exogenous variable. Similarly, since $Y_2$ is independent of $U_3$, OLS can be applied
to the third equation. Thus, we can rewrite the above equations as follows:
$$Y_1 = \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + U_1$$
$$-\beta_1 Y_1 + Y_2 = \alpha_4 + \alpha_5 X_4 + U_2$$
$$-\beta_2 Y_2 + Y_3 = \alpha_6 + \alpha_7 X_5 + U_3$$

We can again rewrite this in matrix form as follows:
$$\begin{bmatrix} 1 & 0 & 0 \\ -\beta_1 & 1 & 0 \\ 0 & -\beta_2 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 & 0 & 0 \\ \alpha_4 & 0 & 0 & \alpha_5 & 0 \\ \alpha_6 & 0 & 0 & 0 & \alpha_7 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \end{bmatrix} + \begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix}$$
The first matrix is the coefficient matrix of the endogenous variables and the second
is the coefficient matrix of the exogenous variables, where $X_1 = 1$ carries the intercepts.
The coefficient matrix of endogenous variables is thus a triangular one; hence
recursive models are also called triangular models.
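The triangular structure means the endogenous variables can be computed one after another. The sketch below illustrates this for the wheat model, writing the coefficients of the exogenous terms as `alpha[1]`..`alpha[7]` and those of the endogenous regressors as `beta[1]`, `beta[2]`. These symbol names and every numeric value are assumptions made for illustration only.

```python
# Sketch: solve the recursive wheat model equation by equation.
# All numbers are illustrative assumptions, not estimates from the text.

alpha = {1: 10.0, 2: 2.0, 3: 0.5, 4: 50.0, 5: 0.25, 6: 5.0, 7: -1.0}
beta = {1: -0.5, 2: 0.5}
X = {2: 3.0, 3: 8.0, 4: 40.0, 5: 2.0}   # exogenous variables X2..X5
U = {1: 1.0, 2: -2.0, 3: 0.5}           # disturbances U1..U3

# Forward substitution: each equation uses only values already computed.
Y1 = alpha[1] + alpha[2] * X[2] + alpha[3] * X[3] + U[1]
Y2 = alpha[4] + beta[1] * Y1 + alpha[5] * X[4] + U[2]
Y3 = alpha[6] + beta[2] * Y2 + alpha[7] * X[5] + U[3]

# Coefficient matrix of the endogenous variables (lower triangular).
B = [
    [1.0,      0.0,      0.0],
    [-beta[1], 1.0,      0.0],
    [0.0,      -beta[2], 1.0],
]
Y = [Y1, Y2, Y3]

# Right-hand sides after moving the endogenous regressors to the left.
rhs = [
    alpha[1] + alpha[2] * X[2] + alpha[3] * X[3] + U[1],
    alpha[4] + alpha[5] * X[4] + U[2],
    alpha[6] + alpha[7] * X[5] + U[3],
]
lhs = [sum(B[i][j] * Y[j] for j in range(3)) for i in range(3)]

# Every entry above the main diagonal of B is zero.
is_triangular = all(B[i][j] == 0.0 for i in range(3) for j in range(i + 1, 3))
assert is_triangular
assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
print(Y1, Y2, Y3)
```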
5.3. Problems of simultaneous equation models
Simultaneous equation models create three distinct problems. These are:
1. Mathematical completeness of the model: any model is said to be
(mathematically) complete only when it possesses as many independent
equations as endogenous variables. In other words if we happen to know values
of disturbance terms, exogenous variables and structural parameters, then all the
endogenous variables are uniquely determined.

2. Identification of each equation of the model: It often happens that a
given set of values of the disturbance terms and exogenous variables yields the same
values of the different endogenous variables included in the model. This is because the
equations are observationally indistinguishable. What is needed is that the
parameters of each equation in the system be uniquely determined.
Hence, certain tests are required to examine the identification of each equation
before its estimation.
3. Statistical estimation of each equation of the model: Since application of OLS
yields biased and inconsistent estimates, different statistical techniques have been
developed to estimate the structural parameters. Some of the most common
simultaneous methods* of estimation are:
i) The indirect least squares method (ILS)
ii) The two-stage least squares method (2SLS)
iii) The three-stage least squares method (3SLS)
iv) The limited information maximum likelihood method (LIML)
v) The instrumental variable method (IV)
vi) The mixed estimation method; and
vii) The full information maximum likelihood method (FIML)
* These methods of estimation are not discussed in this module as they are beyond the scope of this
introductory course.
Of the three problems, we are going to discuss the second problem (the identification
problem) in the following section.

5.4. The identification problem

In simultaneous equation models, the problem of identification is a problem of
model formulation; it is not concerned with the estimation of the model. The
estimation of the model depends upon the empirical data and the form of the
model. If the model is not in the proper statistical form, it may turn out that the
parameters cannot be uniquely estimated even though adequate and relevant data
are available. In the language of econometrics, a model is said to be identified only
when it is in a unique statistical form that enables us to obtain unique estimates of its
parameters from the sample data. To illustrate the problem of identification, let's
consider a simplified wage-price model.
$$W = \alpha + \beta P + \gamma E + U \tag{i}$$
$$P = \delta + \lambda W + V \tag{ii}$$
where W and P are percentage rates of wage and price inflation respectively, E is a
measure of excess demand in the labor market, and U and V are disturbances. If E
is assumed to be exogenously determined, then (i) and (ii) represent two equations
determining two endogenous variables: W and P. Let's explain the problem of
identification with the help of these two equations of a simultaneous equation model.
Let's use equation (ii) to express W in terms of P:
$$W = -\frac{\delta}{\lambda} + \frac{1}{\lambda} P - \frac{V}{\lambda} \tag{iii}$$
Now, suppose A and B are any two constants. Let's multiply equation (i) by A,
multiply equation (iii) by B and then add the two equations. This gives
$$(A + B) W = A\alpha - \frac{B\delta}{\lambda} + \left(A\beta + \frac{B}{\lambda}\right) P + A\gamma E + AU - \frac{B}{\lambda} V$$
or
$$W = \frac{A\alpha - B\delta/\lambda}{A+B} + \frac{A\beta + B/\lambda}{A+B} P + \frac{A\gamma}{A+B} E + \frac{AU - (B/\lambda)V}{A+B} \tag{iv}$$
Equation (iv) is what is known as a linear combination of (i) and (ii). The point
about equation (iv) is that it is of the same statistical form as the wage equation (i).
That is, it has the form:
W = constant + (constant)P + (constant)E + disturbance
Moreover, since A and B can take any values we like, this implies that our
wage-price model generates an infinite number of equations such as (iv), which are all
statistically indistinguishable from the wage equation (i). Hence, if we apply OLS
or any other technique to data on W, P and E in an attempt to estimate the wage
equation, we can't know whether we are actually estimating (i) rather than one of
the infinite number of possibilities given by (iv). Equation (i) is said to be
unidentified, and consequently there is no way in which unbiased or even
consistent estimators of its parameters may be obtained.
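The argument can be illustrated numerically. The sketch below writes the wage equation as W = alpha + beta*P + gamma*E + U and the price equation as P = delta + lam*W + V; these symbol names and all numeric values are assumptions for illustration. It generates a few observations that satisfy both equations and then shows that, for several arbitrary choices of A and B, the combined equation fits the same observations exactly, although its coefficients differ from those of the wage equation.

```python
from fractions import Fraction as F

# Assumed structural parameters (illustrative only, exact fractions).
alpha, beta, gamma = F(2), F(1, 2), F(3)   # wage equation
delta, lam = F(1), F(1, 4)                 # price equation

def solve_model(E, U, V):
    """Solve the wage and price equations jointly for W and P."""
    W = (alpha + beta * delta + gamma * E + U + beta * V) / (1 - beta * lam)
    P = delta + lam * W + V
    return W, P

def combo_coefficients(A, B):
    """Constant, P-coefficient and E-coefficient of the linear combination."""
    s = A + B
    const = (A * alpha - B * delta / lam) / s
    coef_P = (A * beta + B / lam) / s
    coef_E = A * gamma / s
    return const, coef_P, coef_E

def combo_disturbance(A, B, U, V):
    """Disturbance term of the linear combination."""
    return (A * U - B * V / lam) / (A + B)

# Hypothetical observations of (E, U, V).
observations = [(F(1), F(1, 10), F(-1, 5)),
                (F(2), F(-3, 10), F(1, 2)),
                (F(5), F(0), F(1, 4))]

for A, B in [(F(1), F(0)), (F(1), F(1)), (F(2), F(-5)), (F(3), F(7))]:
    const, coef_P, coef_E = combo_coefficients(A, B)
    for E, U, V in observations:
        W, P = solve_model(E, U, V)
        # The same data satisfy every combination exactly.
        assert W == const + coef_P * P + coef_E * E + combo_disturbance(A, B, U, V)
    print(A, B, const, coef_P, coef_E)
```

With A = 1 and B = 0 the combination reproduces the wage equation itself; the other choices give different coefficients that the data cannot distinguish from it.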

Notice that, in contrast, price equation (ii) cannot be confused with the linear
combination (iv), because it is a relationship involving W and P only and does not,
like (iv), contain the variable E. The price equation (ii) is therefore said to be
identified, and in principle it is possible to obtain consistent estimates of its
parameters. A function (an equation) belonging to a system of simultaneous
equations is identified if it has a unique statistical form, i.e. if there is no other
equation in the system, or formed by algebraic manipulation of the other
equations of the system, that contains the same variables as the function (equation)
in question.

Identification problems do not arise only in two-equation models. Using the
above procedure, we can check for identification problems easily if we have two or
three equations in a given simultaneous equation model. However, for a
simultaneous equation model with 'n' equations, such a procedure is very
cumbersome. In general, for any number of equations in a given simultaneous
equation model, there are two conditions that need to be satisfied for an equation
to be identified. In the following section we will see the formal conditions for
identification.
5.5. Formal Rules (Conditions) for Identification
Identification may be established either by the examination of the specification of
the structural model, or by the examination of the reduced form of the model.

Traditionally identification has been approached via the reduced form. Actually the term
'identification' was originally used to denote the possibility (or impossibility) of
deducing the values of the parameters of the structural relations from a knowledge of the
reduced form parameters. In this section we will examine both approaches. However, we
think that the reduced form approach is conceptually confusing and computationally
more difficult than the structural model approach, because it requires the derivation of the
reduced form first and then examination of the values of the determinant formed from
some of the reduced form coefficients. The structural form approach is simpler and more
useful.

In applying the identification rules we should either ignore the constant term, or, if we
want to retain it, we must include in the set of variables a dummy variable (say X0) which
would always take on the value 1. Either convention leads to the same results as far as
identification is concerned. In this chapter we will ignore the constant intercept.
5.5.1. Establishing identification from the structural form of the model
There are two conditions which must be fulfilled for an equation to be identified.
1. The order condition for identification
This condition is based on a counting rule of the variables included and excluded
from the particular equation. It is a necessary but not sufficient condition for the
identification of an equation. The order condition may be stated as follows.
For an equation to be identified the total number of variables (endogenous and
exogenous) excluded from it must be equal to or greater than the number of
endogenous variables in the model less one. Given that in a complete model the
number of endogenous variables is equal to the number of equations of the model,
the order condition for identification is sometimes stated in the following
equivalent form. For an equation to be identified the total number of variables
excluded from it but included in other equations must be at least as great as the
number of equations of the system less one.
Let: G = total number of equations (= total number of endogenous variables)
K = total number of variables in the model (endogenous and predetermined)
M = number of variables, endogenous and exogenous, included in a
particular equation.
Then the order condition for identification may be symbolically expressed as:
$$(K - M) \ge (G - 1)$$
[number of excluded variables] ≥ [total number of equations - 1]
 
For example, if a system contains 10 equations with 15 variables, ten endogenous
and five exogenous, an equation containing 11 variables is not identified, while
another containing 5 variables is identified.
a. For the first equation we have G = 10, K = 15, M = 11.
Order condition: $(K - M) \ge (G - 1)$.
Here $(15 - 11) < (10 - 1)$; that is, the order condition is not satisfied.
b. For the second equation we have G = 10, K = 15, M = 5.
Order condition: $(K - M) \ge (G - 1)$.
Here $(15 - 5) > (10 - 1)$; that is, the order condition is satisfied.
The order condition for identification is necessary for a relation to be identified,
but it is not sufficient, that is, it may be fulfilled in any particular equation and yet
the relation may not be identified.
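The counting rule is easy to mechanise. The following is a minimal sketch; the function name `order_condition` and its return labels are hypothetical choices made here. It only reports the outcome of the order condition, which remains a necessary and not a sufficient condition.

```python
def order_condition(K, M, G):
    """Classify an equation by the order (counting) condition.

    K: total number of variables in the model
    M: number of variables included in the equation
    G: number of equations (= number of endogenous variables)
    The result says nothing about the rank condition, which must be
    checked separately.
    """
    excluded = K - M
    if excluded < G - 1:
        return "not satisfied"
    if excluded == G - 1:
        return "satisfied with equality"
    return "satisfied with inequality"

# The two equations of the 10-equation, 15-variable example.
print(order_condition(15, 11, 10))
print(order_condition(15, 5, 10))
```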
2. The rank condition for identification
The rank condition states that: in a system of G equations any particular equation
is identified if and only if it is possible to construct at least one non-zero
determinant of order (G-1) from the coefficients of the variables excluded from
that particular equation but contained in the other equations of the model. The
practical steps for tracing the identifiability of an equation of a structural model
may be outlined as follows.
Firstly. Write the parameters of all the equations of the model in a separate table,
noting that the parameter of a variable excluded from an equation is equal to zero.
For example let a structural model be:
$$y_1 = 3y_2 - 2x_1 + x_2 + u_1$$
$$y_2 = y_3 + x_3 + u_2$$
$$y_3 = y_1 - y_2 - 2x_3 + u_3$$
where the y's are the endogenous variables and the x's are the predetermined
variables. This model may be rewritten in the form
$$-y_1 + 3y_2 + 0y_3 - 2x_1 + x_2 + 0x_3 + u_1 = 0$$
$$0y_1 - y_2 + y_3 + 0x_1 + 0x_2 + x_3 + u_2 = 0$$
$$y_1 - y_2 - y_3 + 0x_1 + 0x_2 - 2x_3 + u_3 = 0$$

Ignoring the random disturbance the table of the parameters of the model is as
follows:
                       Variables
Equations       Y1   Y2   Y3   X1   X2   X3
1st equation    -1    3    0   -2    1    0
2nd equation     0   -1    1    0    0    1
3rd equation     1   -1   -1    0    0   -2

Secondly. Strike out the row of coefficients of the equation which is being
examined for identification. For example, if we want to examine the identifiability
of the second equation of the model we strike out the second row of the table of
coefficients.
Thirdly. Strike out the columns in which a non-zero coefficient of the equation
being examined appears. By deleting the relevant row and columns we are left
with the coefficients of variables not included in the particular equation, but
contained in the other equations of the model. For example, if we are examining
the second equation of the system for identification, we will strike out the second,
third and sixth columns of the above table, thus obtaining the following tables.

Table of structural parameters
        Y1   Y2   Y3   X1   X2   X3
1st     -1    3    0   -2    1    0
2nd      0   -1    1    0    0    1    (row struck out)
3rd      1   -1   -1    0    0   -2

Table of parameters of excluded variables
        Y1   X1   X2
1st     -1   -2    1
3rd      1    0    0

Fourthly. Form the determinant(s) of order (G-1) and examine their value. If at
least one of these determinants is non-zero, the equation is identified. If all the
determinants of order (G-1) are zero, the equation is underidentified.
In the above example of exploration of the identifiability of the second structural
equation we have three determinants of order (G-1) = 3-1 = 2. They are:
$$\Delta_1 = \begin{vmatrix} -1 & -2 \\ 1 & 0 \end{vmatrix} \ne 0, \qquad \Delta_2 = \begin{vmatrix} -2 & 1 \\ 0 & 0 \end{vmatrix} = 0, \qquad \Delta_3 = \begin{vmatrix} -1 & 1 \\ 1 & 0 \end{vmatrix} \ne 0$$
(the symbol $\Delta$ stands for 'determinant'). We see that we can form two non-zero
determinants of order G-1 = 3-1 = 2; hence the second equation of our system is
identified.
Fifthly. To see whether the equation is exactly identified or overidentified we use
the order condition $(K - M) \ge (G - 1)$. With this criterion, if the equality sign is
satisfied, that is, if $(K - M) = (G - 1)$, the equation is exactly identified. If the
inequality sign holds, that is, if $(K - M) > (G - 1)$, the equation is overidentified.
In the case of the second equation we have:
G = 3, K = 6, M = 3
and the counting rule $(K - M) \ge (G - 1)$ gives
(6-3) > (3-1)
Therefore the second equation of the model is overidentified.
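The five steps can be sketched in code. In the version below, the rank of the table of excluded-variable coefficients is computed by Gaussian elimination, which is equivalent to asking whether at least one non-zero determinant of order G-1 exists. The function names `matrix_rank` and `check_identification` are hypothetical; the table is the one from the example above.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of lists) by Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def check_identification(table, eq):
    """Apply the rank and order conditions to equation `eq` (0-based row index)."""
    G, K = len(table), len(table[0])
    # Columns of variables excluded from the equation (zero coefficient).
    excluded_cols = [j for j in range(K) if table[eq][j] == 0]
    # Strike out the equation's own row and the columns of its included variables.
    sub = [[table[i][j] for j in excluded_cols] for i in range(G) if i != eq]
    if not excluded_cols or matrix_rank(sub) < G - 1:
        return "underidentified"
    M = K - len(excluded_cols)
    return "exactly identified" if K - M == G - 1 else "overidentified"

# Columns: Y1, Y2, Y3, X1, X2, X3 (table of structural parameters above).
table = [
    [-1,  3,  0, -2, 1,  0],
    [ 0, -1,  1,  0, 0,  1],
    [ 1, -1, -1,  0, 0, -2],
]
print(check_identification(table, 1))   # second equation
```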
The identification of a function is achieved by assuming that some variables of the
model have zero coefficient in this equation, that is, we assume that some
variables do not directly affect the dependent variable in this equation. This,
however, is an assumption which can be tested with the sample data. We will
examine some tests of identifying restrictions in a subsequent section. Some
examples will illustrate the application of the two formal conditions for
identification.
Example 1. Assume that we have a model describing the market of an agricultural
product. From the theory of partial equilibrium we know that the price in a market
is determined by the forces of demand and supply. The main determinants of the
demand are the price of the commodity, the prices of other commodities, incomes
and tastes of consumers. Similarly, the most important determinants of the supply
are the price of the commodity, other prices, technology, the prices of factors of
production, and weather conditions. The equilibrium condition is that demand be
equal to supply. The above theoretical information may be expressed in the form
of the following mathematical model.
$$D = a_0 + a_1 P_1 + a_2 P_2 + a_3 Y + a_4 t + u$$
$$S = b_0 + b_1 P_1 + b_2 P_2 + b_3 C + b_4 t + w$$
$$D = S$$
Where: D = quantity demanded
S = quantity supplied
$P_1$ = price of the given commodity
$P_2$ = price of other commodities
Y = income
C = costs (index of prices of factors of production)
t = time trend. In the demand function it stands for 'tastes'; in the supply function it stands for
'technology'.

The above model is mathematically complete in the sense that it contains three
equations in three endogenous variables, D, S and $P_1$. The remaining variables, Y,
$P_2$, C and t, are exogenous. Suppose we want to identify the supply function. We
apply the two criteria for identification:

1. Order condition: $(K - M) \ge (G - 1)$
In our example we have: K = 7, M = 5, G = 3
Therefore, (K-M) = (G-1), or (7-5) = (3-1) = 2
Consequently the second equation satisfies the first condition for identification.
2. Rank condition
The table of the coefficients of the structural model is as follows.
                       Variables
Equations        D    P1   P2   Y    t    S    C
1st equation    -1    a1   a2   a3   a4   0    0
2nd equation     0    b1   b2   0    b4  -1    b3
3rd equation     1    0    0    0    0   -1    0

Following the procedure explained earlier, we strike out the second row and the
second, third, fifth, sixth and seventh columns. Thus we are left with the table of
the coefficients of excluded variables:

Complete table of structural parameters
 D    P1   P2   Y    t    S    C
-1    a1   a2   a3   a4   0    0
 0    b1   b2   0    b4  -1    b3    (row struck out)
 1    0    0    0    0   -1    0

Table of parameters of variables excluded from the second equation
 D    Y
-1    a3
 1    0

From this table we can form only one determinant of order
(G-1) = (3-1) = 2:
$$\Delta = \begin{vmatrix} -1 & a_3 \\ 1 & 0 \end{vmatrix} = (0)(-1) - (1)(a_3) = -a_3$$
The value of the determinant is non-zero, provided that $a_3 \ne 0$.


We see that both the order and rank conditions are satisfied. Hence the second
equation of the model is identified. Furthermore, we see that in the order
condition the equality holds: (7-5) = (3-1) = 2. Consequently the second
structural equation is exactly identified.
Example 2. Assume the following simple version of the Keynesian model of
income determination.

Consumption function: $C_t = a_0 + a_1 Y_t + a_2 T_t + u$
Investment function: $I_t = b_0 + b_1 Y_{t-1} + v$
Taxation function: $T_t = c_0 + c_1 Y_t + w$
Definition: $Y_t = C_t + I_t + G_t$
This model is mathematically complete in the sense that it contains as many
equations as endogenous variables. There are four endogenous variables, C, I, T, Y,
and two predetermined variables, lagged income ($Y_{t-1}$) and government
expenditure (G).
A. The first equation (consumption function) is not identified
1. Order condition: $(K - M) \ge (G - 1)$
There are six variables in the model (K=6) and four equations (G=4). The
consumption function contains three variables (M=3).
(K-M)=3 and (G-1)=3
Thus (K-M)=(G-1), which shows that the order condition for identification is
satisfied.
2. Rank condition
The table of structural coefficients is as follows
                       Variables
Equations        C    Y    T    I    Y(t-1)   G
1st equation    -1    a1   a2   0    0        0
2nd equation     0    0    0   -1    b1       0
3rd equation     0    c1  -1    0    0        0
4th equation     1   -1    0    1    0        1

We strike out the first row and the first three columns of the table and thus obtain
the table of coefficients of excluded variables.

Complete table of structural parameters
 C    Y    T    I    Y(t-1)   G
-1    a1   a2   0    0        0     (row struck out)
 0    0    0   -1    b1       0
 0    c1  -1    0    0        0
 1   -1    0    1    0        1

Table of coefficients of excluded variables
 I    Y(t-1)   G
-1    b1       0
 0    0        0
 1    0        1
We evaluate the determinant of this table. Clearly the value of this determinant is
zero, since the second row contains only zeros. Consequently we cannot form any
nonzero determinant of order 3(=G-1). The rank condition is violated. Hence we
conclude that the consumption function is not identified, despite the satisfaction of
the order criterion.
B. The investment function is overidentified
1. Order condition
The investment function includes two variables (M = 2). Hence
K-M = 6-2 = 4
Clearly (K-M) > (G-1), given that G-1 = 3. The order condition is fulfilled.
2. Rank condition
Deleting the second row and the fourth and fifth columns of the structural
coefficients table we obtain:

Complete table of structural parameters
 C    Y    T    I    Y(t-1)   G
-1    a1   a2   0    0        0
 0    0    0   -1    b1       0     (row struck out)
 0    c1  -1    0    0        0
 1   -1    0    1    0        1

Table of coefficients of excluded variables
 C    Y    T    G
-1    a1   a2   0
 0    c1  -1    0
 1   -1    0    1

The value of the first 3x3 determinant of the parameters of excluded variables is
$$\Delta = -1 \begin{vmatrix} c_1 & -1 \\ -1 & 0 \end{vmatrix} - a_1 \begin{vmatrix} 0 & -1 \\ 1 & 0 \end{vmatrix} + a_2 \begin{vmatrix} 0 & c_1 \\ 1 & -1 \end{vmatrix} = 1 - a_1 - a_2 c_1 \ne 0$$
(provided $a_1 + a_2 c_1 \ne 1$)
The rank condition is satisfied since we can construct at least one non-zero
determinant of order 3 = (G-1).

Applying the counting rule $(K - M) \ge (G - 1)$ we see that the inequality sign holds:
4 > 3; hence the investment function is overidentified.
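The rank condition for this Keynesian model can also be checked by forming every determinant of order G-1 from the excluded-variable columns, as the procedure describes. The sketch below does this with assumed numeric values for a1, a2, b1 and c1 (chosen here purely for illustration; the function names are hypothetical).

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small tables)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def nonzero_determinants(table, eq):
    """Count non-zero determinants of order G-1 built from excluded-variable coefficients."""
    G = len(table)
    cols = [j for j, v in enumerate(table[eq]) if v == 0]   # excluded variables
    rows = [table[i] for i in range(G) if i != eq]          # the other equations
    count = 0
    for chosen in combinations(cols, G - 1):
        if det([[row[j] for j in chosen] for row in rows]) != 0:
            count += 1
    return count

# Assumed numeric values for the structural coefficients (illustration only).
a1, a2 = Fraction(4, 5), Fraction(-3, 10)
b1, c1 = Fraction(1, 2), Fraction(1, 5)

# Columns: C, Y, T, I, Y(t-1), G
table = [
    [-1, a1, a2,  0,  0, 0],   # consumption function
    [ 0,  0,  0, -1, b1, 0],   # investment function
    [ 0, c1, -1,  0,  0, 0],   # taxation function
    [ 1, -1,  0,  1,  0, 1],   # definition
]
print(nonzero_determinants(table, 0))   # consumption function
print(nonzero_determinants(table, 1))   # investment function
```

Under these assumed values the consumption function yields no non-zero determinant of order 3, while the investment function yields at least one, in line with the conclusions of parts A and B.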

Self-test Question: Determine the identifiability of the tax equation.

5.5.2. Establishing identification from the reduced form


As with identification from the structural equations, there are two
conditions for identification based on the reduced form of the model, an order
condition and a rank condition. The order condition is the same as in the
structural model. The rank condition here refers to the value of the determinant
formed from some of the reduced form parameters, the $\pi$'s.
1. Order condition (necessary condition), as applied to the reduced form
An equation belonging to a system of simultaneous equations is identified if
$$(K - M) \ge (G - 1)$$
[total number of excluded variables] ≥ [number of equations less one]
where K, M and G have the same meaning as before:
K= total number of variables, endogenous and exogenous, in the entire
model
M= number of variables, endogenous and exogenous, in any particular
equation
G= number of structural equations=number of all endogenous variables in
the model
If (K-M) = (G-1), the equation is exactly identified, provided that the rank
condition set out below is also satisfied. If (K-M)>(G-1), the equation is
overidentified, while if (K-M)<(G-1), the equation is underidentified, under the
same proviso.

2. Rank condition as applied to the reduced form
Let G* stand for the number of endogenous variables contained in a particular
equation. The rank condition as applied to the reduced form may be stated as
follows.
An equation containing G* endogenous variables is identified if and only if it is
possible to construct at least one non-zero determinant of order G*-1 from the
reduced form coefficients of the exogenous (predetermined) variables excluded
from that particular equation.

The practical steps involved in this method of identification may be outlined as
follows.
Firstly. Obtain the reduced form of the structural model. For example, assume that
the original model is
$$y_1 = b_{12} y_2 + \gamma_{11} x_1 + \gamma_{12} x_2 + u_1$$
$$y_2 = b_{23} y_3 + \gamma_{23} x_3 + u_2$$
$$y_3 = b_{31} y_1 + b_{32} y_2 + \gamma_{33} x_3 + u_3$$
This model is complete in the sense that it contains three equations in three
endogenous variables. The model contains altogether six variables, three
endogenous ($y_1$, $y_2$, $y_3$) and three exogenous ($x_1$, $x_2$, $x_3$).
The reduced form of the model is obtained by solving the original equations for
the endogenous variables in terms of the exogenous variables. The reduced form
in the above example is:
$$y_1 = \pi_{11} x_1 + \pi_{12} x_2 + \pi_{13} x_3 + v_1$$
$$y_2 = \pi_{21} x_1 + \pi_{22} x_2 + \pi_{23} x_3 + v_2$$
$$y_3 = \pi_{31} x_1 + \pi_{32} x_2 + \pi_{33} x_3 + v_3$$
where the $\pi$'s are functions of the structural parameters.


Secondly. Form the complete table of the reduced form coefficients.
                       Exogenous variables
Equations              x1     x2     x3
1st equation: y1       π11    π12    π13
2nd equation: y2       π21    π22    π23
3rd equation: y3       π31    π32    π33

Strike out the rows corresponding to endogenous variables excluded from the
particular equation being examined for identifiability. Also strike out all the
columns referring to exogenous variables included in the structural form of the
particular equation.
After these deletions we are left with the reduced form coefficients of exogenous
variables excluded (absent) from the structural equation. For example, assume that
we are investigating the identifiability of the second equation. The coefficients
relevant to the identification procedure are found by striking out the first
row (since $y_1$ does not appear in the second equation) and the third column
(since $x_3$ is included in this equation).

Complete table of reduced form coefficients
π11    π12    π13    (row struck out)
π21    π22    π23
π31    π32    π33

Table of reduced form coefficients of excluded exogenous variables
π21    π22
π31    π32

Thirdly. Examine the order of the determinants of the $\pi$'s of excluded exogenous
variables and evaluate them. If the order of the largest non-zero determinant is
G*-1, the equation is identified. Otherwise the equation is not identified.
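The reduced-form route can be sketched numerically. The example below is an illustration under assumptions: it borrows the numeric coefficients of the three-equation model used in section 5.5.1 (which has the same pattern of included variables as the model above), writes the system as B·y = Gamma·x + u, and solves for the matrix of π's. The function names are hypothetical.

```python
from fractions import Fraction

def reduced_form(B, Gamma):
    """Solve B * Pi = Gamma for Pi by Gauss-Jordan elimination (B assumed non-singular)."""
    n = len(B)
    aug = [[Fraction(v) for v in B[i]] + [Fraction(v) for v in Gamma[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def rank(rows):
    """Rank by Gaussian elimination; equals the order of the largest non-zero determinant."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

# Assumed model: y1 = 3*y2 - 2*x1 + x2,  y2 = y3 + x3,  y3 = y1 - y2 - 2*x3
B = [[1, -3, 0], [0, 1, -1], [-1, 1, 1]]        # endogenous coefficients, moved to the left
Gamma = [[-2, 1, 0], [0, 0, 1], [0, 0, -2]]     # exogenous coefficients (x1, x2, x3)

Pi = reduced_form(B, Gamma)

# Second equation: keep rows y2, y3 (included endogenous variables) and
# columns x1, x2 (excluded exogenous variables).
G_star = 2
sub = [[Pi[i][j] for j in (0, 1)] for i in (1, 2)]
identified = rank(sub) == G_star - 1
print(Pi, identified)
```

In this assumed instance the largest non-zero determinant of the retained π's has order G*-1 = 1, so the second equation is identified, which agrees with the result obtained from the structural form.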
