Topic 1
Background Material
John Stapleton
Table of Contents
1.1 A review of some basic statistical concepts
1.2 Random regressors
1.3 Modelling the conditional mean
1.3.1 Specifying a functional form for the conditional mean
1.3.2 Choosing the regressors
1.4 Some asymptotic theory
1.4.1 Introduction
1.4.2 Consistency
1.4.3 Asymptotic normality
1.4.4 Asymptotic efficiency
1.5 Testing linear restrictions on the parameters
1.6 A review of generalized least squares (GLS)
1.1 A review of some basic statistical concepts
Definition (1.1)
Let x be a discrete random variable which can take on the values (x1, x2, ..., xn) with probabilities (f(x1), f(x2), ..., f(xn)) respectively. Then the mean or expected value or expectation of x, which we denote by E(x), is defined as:

E(x) = Σ_{i=1}^{n} xi f(xi).

If x is a continuous random variable with probability density function f(x), then

E(x) = ∫ x f(x) dx.
For any set of random variables x, y and z, the expectations operator satisfies the following rules:
R1 E(x + y + z) = E(x) + E(y) + E(z).
R2 E(k) = k for any constant k.
R3 E(kx) = kE(x) for any constant k.
R4 E(k + x) = k + E(x) for any constant k.
R5 In general, E(xy) ≠ E(x)E(y).
Definition (1.2)
The variance of the random variable x, which we denote by Var(x), is defined as:

Var(x) = E{[x − E(x)]²}
       = E[x² − 2xE(x) + E(x)²]
       = E(x²) − 2E(x)E(x) + E(x)²
       = E(x²) − 2E(x)² + E(x)²
       = E(x²) − E(x)².
Informally, Var (x ) measures how tightly the values of x are clustered around the
mean.
Definition (1.3)
Let x and y be two random variables. Then the covariance between x and y, which we denote by Cov(x, y), is defined as:

Cov(x, y) = E{[x − E(x)][y − E(y)]}.

Cov(x, y) measures the degree of linear association between x and y.
Notice that

Cov(x, y) = E{[x − E(x)][y − E(y)]}
          = E[xy − xE(y) − yE(x) + E(x)E(y)]
          = E(xy) − E(x)E(y) − E(y)E(x) + E(x)E(y)
          = E(xy) − E(x)E(y).
Therefore, in the special case in which
E (x ) = 0 and/or E (y ) = 0,
the formula for the covariance between x and y simplifies to
Cov (x, y ) = E (xy ).
For any pair of random variables x and y and any constants a and b, the Var operator satisfies the following rules:
R6 Var(a) = 0.
R7 Var(ax) = a² Var(x).
R8 Var(ax + by) = a² Var(x) + b² Var(y) + 2ab Cov(x, y).
R9 If x and y are independent random variables, Cov(x, y) = 0 and
Var(ax + by) = a² Var(x) + b² Var(y).
As a measure of linear association, the covariance suffers from two serious limitations:
The value of Cov (x, y ) depends on the units in which x and y are
measured.
The value of Cov(x, y) is difficult to interpret. For example, how do we interpret the statement that
Cov(x, y) = 2?
Correlation, which we define below, is a superior measure of the degree of
linear association between two random variables.
Definition (1.4)
Let x and y be two random variables. Then the correlation between x and y, which we denote by Corr(x, y), is defined as:

Corr(x, y) = Cov(x, y) / [SD(x) SD(y)],

where

SD(x) = Var(x)^{1/2}, SD(y) = Var(y)^{1/2}.

It can be shown that

|Corr(x, y)| ≤ 1.
Corr (x, y ) is unit free and is easy to interpret. For example, if
Corr (x, y ) = 0.8
we conclude that there is a strong, positive, linear relationship between x
and y.
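As an informal illustration of these last two points, the following Python sketch (ours, not from the notes; the data are simulated) checks that rescaling x changes Cov(x, y) but leaves Corr(x, y) unchanged.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.8 * x + rng.normal(size=10_000)   # build in a positive linear association

cov_xy = np.cov(x, y)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]

# Remeasure x in different units (e.g. dollars -> cents): the covariance
# scales by 100, while the correlation is unit free.
print(np.cov(100 * x, y)[0, 1] / cov_xy)        # ~100
print(np.corrcoef(100 * x, y)[0, 1] - corr_xy)  # ~0: identical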
1.2 Random regressors
In introductory econometrics units it is often assumed that the regressors in
the model are not random variables. For example, in the simple bivariate
regression model

yi = β0 + β1 xi + ui

yi and ui are assumed to be random variables, but xi is assumed to be a fixed number which does not change in value from sample to sample.
While this assumption is useful for pedagogical purposes because it simplifies the analysis, it is inappropriate for the nonexperimental data with which we typically work in disciplines such as economics and finance.
Nonexperimental data is data that is not generated by performing a
controlled experiment.
When working with nonexperimental data, it is appropriate to treat both
the dependent variable and the regressors in our regression models as
random variables. Under this more realistic assumption, when we collect a
sample of data
(yi , xi ), i = 1, 2, ..., N
we are effectively making a drawing from the joint probability distribution
of the random variables
(yi , xi ).
Consider the multivariate linear regression model

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui.    (1.1)

Let

fJ(yi, xi1, ..., xik | θ)
denote the joint probability distribution of the random variables (yi, xi1, ..., xik), with parameter vector θ. That is, θ is the vector of parameters that appears in the mathematical formula for the joint probability distribution of (yi, xi1, ..., xik).

Recall from elementary statistics that

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ),    (1.2)

where:
fJ(yi, xi1, ..., xik | θ) is the joint probability distribution of (yi, xi1, ..., xik).
fC(yi | xi1, ..., xik, θ) is the probability distribution of yi conditional on (xi1, ..., xik).
fJ(xi1, ..., xik | θ) is the joint probability distribution of (xi1, ..., xik).
Notice that the conditional probability distribution, fC(yi | xi1, ..., xik, θ), enables us to make probability statements about y conditional on the values of (xi1, ..., xik) being fixed.

The most general statistical analysis of the behavior of (yi, xi1, ..., xik) would involve constructing a mathematical model of fJ(yi, xi1, ..., xik | θ). However, this task is usually too difficult and instead we restrict our attention to modelling fC(yi | xi1, ..., xik, θ).

Since

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ),    (1.2)

this strategy obviously means that we ignore fJ(xi1, ..., xik | θ), and lose any information that it contains regarding the parameter vector θ.
The strategy of focusing on fC(yi | xi1, ..., xik, θ) and ignoring fJ(xi1, ..., xik | θ) does not entail any loss of information in the following special case. Let

θ = (ψ, λ),

where ψ is the vector of parameters of interest, and assume that

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, ψ) fJ(xi1, ..., xik | λ).    (1.3)

Notice that in (1.3) the parameter vector of interest, ψ, appears only in the conditional distribution of yi.

When (1.3) holds, (xi1, ..., xik) are said to be weakly exogenous with respect to ψ, and there is no loss of information as a result of ignoring fJ(xi1, ..., xik | λ) and focusing exclusively on fC(yi | xi1, ..., xik, ψ).
In fact, even modelling fC(y | x1, x2, ..., xk) is usually too difficult. Instead, we typically focus on only one feature of the conditional distribution of yi, namely the conditional mean, which we denote by E(y | x1, x2, ..., xk). (To economize on notation, the parameter vector and the subscript i are suppressed.)

In particular, we are usually most interested in estimating and testing hypotheses about how the conditional mean of y changes in response to changes in (x1, x2, ..., xk).

Typically, y will not assume its conditional mean value. Let u denote the deviation of y from its conditional mean. Then, by definition,

u = y − E(y | x1, x2, ..., xk).    (1.4)

Rearranging (1.4) we obtain

y = E(y | x1, x2, ..., xk) + u.    (1.5)
Equation (1.5) is sometimes referred to as the error form of the model, or
the model in error form.
When we take conditional expectations of both sides of (1.5) we obtain

E(y | x1, x2, ..., xk) = E(y | x1, x2, ..., xk) + E(u | x1, x2, ..., xk),

which implies that

E(u | x1, x2, ..., xk) = 0.    (1.6)
Equations (1.5) and (1.6) together imply that we can always express y as
the sum of its true conditional mean and a random error term, which itself
has a conditional mean of zero.
If xj is a continuous variable, the marginal or partial effect of xj on the average value of y is given by

∂E(y | x1, x2, ..., xk)/∂xj.    (1.7)

A great deal of applied econometrics consists of trying to correctly specify the conditional mean of the dependent variable y, and trying to obtain an estimator of the marginal effects of interest that has good statistical properties.

There are two aspects to specifying the conditional mean of the dependent variable:

We must specify a functional form for the conditional mean.
We must decide what explanatory variables to include in the conditional mean function.

We briefly consider each of these issues in the following two subsections.
1.3 Modelling the conditional mean
1.3.1 Specifying a functional form for the conditional mean
In order to model the conditional mean we have to make an assumption about its functional form. The assumption that we make has important implications for:

How we compute the marginal effects of the x variables.
The properties of the marginal effects.
How we interpret the regression coefficients.
The method we use to estimate the regression coefficients.

In this section we briefly consider the most common specifications that are used for the conditional mean. To economize on notation, we assume a model with two explanatory variables and an intercept.

M1 The conditional mean is assumed to be linear in both the parameters and the regressors.
Under this specification the conditional mean is given by

E(y | x1, x2) = α + β1 x1 + β2 x2,    (1.8)

and the model in error form is

y = E(y | x1, x2) + u
  = α + β1 x1 + β2 x2 + u.    (1.9)

From (1.8) we have

∂E(y | x1, x2)/∂xj = βj, j = 1, 2.    (1.10)

Under this specification for the conditional mean:

The marginal effect of xj is constant and equal to βj.
βj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj, holding the other regressor constant.
The marginal effect of xj does not vary across observations and does not depend on the value of any of the regressors.

M2 The conditional mean is assumed to be linear in the parameters but nonlinear in one or more of the regressors.
For example,

E(y | x1, x2) = α + β1 x1² + β2 x2²,    (1.11)

or, in error form,

y = E(y | x1, x2) + u
  = α + β1 x1² + β2 x2² + u.    (1.12)
From (1.11) we have

∂E(y | x1, x2)/∂x1 = 2β1 x1,  ∂E(y | x1, x2)/∂x2 = 2β2 x2.    (1.13)

Under this specification:

The marginal effect of xj is not measured by βj.
The marginal effect of xj varies with the value of xj.
The marginal effect of xj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj, holding the other regressor constant.
In some cases, a model specification that allows some of the marginal effects to vary, such as M2, may be more realistic than one that constrains all the marginal effects to be constant. For example, if we wished to study the effect of education on average wages, we might specify the conditional mean of wages as

E(wage | educ, exper, race, gender)
= α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper².    (1.14)

Since (1.14) implies that

∂E(wage | educ, exper, race, gender)/∂exper = β2 + 2β5 exper,
this specification allows the marginal effect of experience to depend on the level of experience.
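As a quick numerical illustration (the coefficient values below are made up for this example, not estimates), the marginal effect β2 + 2β5 exper can be evaluated at several experience levels:

# Hypothetical coefficient values, for illustration only: in (1.14) the
# marginal effect of experience is beta2 + 2*beta5*exper, so it varies
# with the level of experience.
beta2, beta5 = 0.08, -0.002

for exper in (1, 10, 30):
    print(exper, beta2 + 2 * beta5 * exper)
# the marginal effect falls as experience accumulates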
M3 The conditional mean of the natural log of the dependent variable is assumed to be linear in the parameters and the natural log of the explanatory variables (log-linear model).
Under this specification

E(ln y | x1, x2) = α + β1 ln x1 + β2 ln x2,    (1.15)

or, in error form,

ln y = E(ln y | x1, x2) + u
     = α + β1 ln x1 + β2 ln x2 + u.    (1.16)
Although the model is nonlinear in the regressors, it is linear in the natural log of the regressors and in the parameters and can easily be estimated by OLS.

From (1.16) we have

∂ln y/∂ln xj = βj, j = 1, 2.    (1.17)

This specification is often attractive because the regression coefficients can be interpreted as elasticities or percentage changes.

In (1.17) βj measures the percentage change in the level of y arising from a one percent change in the level of xj, holding the other regressor constant. That is, βj measures the elasticity of y (not ln y) with respect to xj (not ln xj), holding the other regressor constant.
To see this note that

∂ln y/∂ln x = lim_{Δln x → 0} Δln y/Δln x ≈ Δln y/Δln x, for small Δln x.

Let

Δln y = ln y1 − ln y0
Δln x = ln x1 − ln x0.
Then

Δln y = ln y1 − ln y0
      = ln(y1/y0)
      = ln((y1 − y0 + y0)/y0)
      = ln((y1 − y0)/y0 + 1)
      ≈ (y1 − y0)/y0, for small changes in y,

so that

100 Δln y ≈ 100 (y1 − y0)/y0 = % change in y.
In deriving this approximation we have used the fact that

ln(N + 1) ≈ N

for any "small" number N. For example,

ln(0.2 + 1) = 0.18 ≈ 0.2.

Using the same logic,

100 Δln x ≈ % change in x.
Therefore, for small changes in x and y,

Δln y/Δln x = 100 Δln y / (100 Δln x) ≈ (% change in y)/(% change in x).

For example, if

β1 = 2

in M3, then a one percent increase in x1, holding x2 fixed, is associated with a two percent increase in y.
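The quality of this approximation is easy to check numerically. The Python sketch below (illustrative only) compares 100 Δln y with the exact percentage change in y:

import numpy as np

# 100*(ln y1 - ln y0) tracks the percentage change in y well for small
# changes and less well for large ones.
y0 = 100.0
for pct in (1, 5, 20, 50):
    y1 = y0 * (1 + pct / 100)
    print(pct, round(100 * (np.log(y1) - np.log(y0)), 2))
# 1 -> 1.0, 5 -> 4.88, 20 -> 18.23, 50 -> 40.55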
M4 The conditional mean of the log of the dependent variable is assumed to be linear in the parameters and in the level of the regressors (log-level model).
Under this specification the model is given by

E(ln y | x1, x2) = α + β1 x1 + β2 x2,    (1.18)

or, in error form,

ln y = E(ln y | x1, x2) + u
     = α + β1 x1 + β2 x2 + u.    (1.19)

From (1.19) we have

∂ln y/∂xj = βj, j = 1, 2.    (1.20)

Under this specification:
100βj measures the percentage change in the level of y arising from a one unit change in the level of xj, holding the other regressor constant, since

100βj = 100 Δln y/Δxj ≈ (% change in y)/Δxj.

For example, if

β1 = 0.2
in M4, then a one unit increase in x1, holding x2 fixed, is associated with a twenty percent increase in y.
The marginal effect of xj on the % change in y is constant.
All of the specifications for the conditional mean of y that we have considered so far have the property that they are linear in the parameters. Models that are linear in the parameters can generally be estimated by OLS. Of course, whether or not the OLS estimator has good statistical properties depends on other features of the model such as, for example, whether or not the errors are homoskedastic.
Many models that appear to be nonlinear in the parameters can be transformed into models that are linear in the parameters. For example, the model given by

y = e^{α + β1 x1 + β2 x2} e^u    (1.21)

is nonlinear in the parameters. However, taking logs on both sides of (1.21) we obtain

ln y = α + β1 x1 + β2 x2 + u,    (1.22)

which is linear in the parameters.

Notice that the parameters in (1.22) are exactly the same as the parameters in (1.21), so when we estimate (1.22) we get estimates of the parameters in (1.21). However, because it is linear in the parameters, (1.22) is much easier to estimate than (1.21).
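As a quick illustration (simulated data, with made-up parameter values), the sketch below generates data from (1.21), takes logs, and recovers the parameters by running OLS on (1.22):

import numpy as np

rng = np.random.default_rng(5)
n = 10_000
x1, x2 = rng.uniform(1, 2, n), rng.uniform(1, 2, n)
u = rng.normal(scale=0.1, size=n)
y = np.exp(1.0 + 0.5 * x1 - 0.25 * x2) * np.exp(u)   # equation (1.21)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]     # OLS on (1.22)
print(b)   # approximately (1.0, 0.5, -0.25)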
M5 The conditional mean of the dependent variable is intrinsically nonlinear in the parameters.
Some models of the conditional mean of the dependent variable are intrinsically nonlinear in the parameters in the sense that they cannot be made linear by applying a mathematical transformation, such as taking logs. For example, assume that

E(y | x1, x2) = 1 / (1 + e^{−(α + β1 x1 + β2 x2)}).    (1.23)

This model is known as the logit model and is studied in Topic 2. The logit model is intrinsically nonlinear since it cannot be made linear in the parameters by applying a mathematical transformation.

Intrinsically nonlinear models cannot be estimated by OLS. They are typically estimated by using the method of maximum likelihood or, less commonly, the method of nonlinear least squares.
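As a rough illustration of (1.23) (the parameter values below are hypothetical), the sketch computes the logit conditional mean and the marginal effect of x1 implied by differentiating (1.23), namely β1 p(1 − p):

import numpy as np

alpha, beta1, beta2 = -1.0, 0.5, 0.3   # made-up values

def cond_mean(x1, x2):
    # equation (1.23)
    return 1.0 / (1.0 + np.exp(-(alpha + beta1 * x1 + beta2 * x2)))

def marg_eff_x1(x1, x2):
    p = cond_mean(x1, x2)
    return beta1 * p * (1 - p)         # chain rule applied to (1.23)

print(marg_eff_x1(0.0, 0.0), marg_eff_x1(2.0, 0.0))  # differs across x1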
In intrinsically nonlinear models such as (1.23) the marginal effects of the regressors:

Are not given by the regression coefficients.
Depend on the values of the regressors.

As we will see in Topic 2, a nonlinear specification for the conditional mean of the dependent variable is sometimes more appropriate than a linear specification, given the nature of the dependent variable.
1.3.2 Choosing the regressors
Consider the linear regression model

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + βk xk + u.    (1.24)

It is very important to understand the role of the error term, u, in (1.24). The error term represents all those variables that affect the dependent variable that have not been explicitly included as regressors in the model.

If one of the regressors, say xi, in (1.24) is correlated with any of the omitted variables that are contained in u, then xi will necessarily be correlated with u. A regressor that is correlated with the error term is referred to as an endogenous regressor.
For example, suppose that the correct model in error form is

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + βk xk + u,    (1.24)

but we estimate

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + v.    (1.25)

In this case, we have omitted the relevant regressor xk. It follows from (1.24) and (1.25) that

v = βk xk + u.    (1.26)

The omitted variable xk is now incorporated in the error term, v, in (1.25). If, for example, xk is correlated with x2, then x2 will be correlated with v in (1.25). That is, x2 will be an endogenous regressor.
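The following small simulation (ours, not from the notes) illustrates the problem: because x2 is correlated with the omitted regressor xk, the OLS slope on x2 does not settle down at its true value of 1.0 even in a very large sample.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
xk = rng.normal(size=n)
x2 = 0.7 * xk + rng.normal(size=n)     # x2 correlated with the omitted xk
u = rng.normal(size=n)
y = 1.0 + 1.0 * x2 + 2.0 * xk + u      # the correct model includes xk

X = np.column_stack([np.ones(n), x2])  # the estimated model omits xk
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b[1])   # ~1.94 rather than 1.0: the estimator is inconsistent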
As we will see in Topic 3, when a regression equation contains one or more endogenous regressors both the OLS and GLS estimators of the regression coefficients lose their desirable statistical properties. Specifically, both estimators are inconsistent. (The concept of consistency is discussed in section 1.4 below.)
In light of this result, it is clearly very important to think carefully about
which regressors to include in the model, and in particular what factors we
wish to control for.
However, even when we are very careful in selecting the regressors, omitting
a relevant regressor may be unavoidable. This will be the case when one or
more of the relevant regressors is unobservable.
For example, suppose that we are interested in estimating the marginal effect of education on an individual's wage, controlling for experience, race, gender and ability. In this case the conditional mean of interest is

E(wage | educ, exper, race, gender, exper², ability)
= α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + β6 ability,    (1.27)

which implies that the model in error form is

wage = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + β6 ability + u.    (1.28)
In (1.28)

∂E(wage | educ, exper, race, gender, exper², ability)/∂educ = β1.

That is, β1 measures the marginal effect of education on the average wage, controlling for differences in experience, race, gender and ability.

Unfortunately, since ability is unobservable, we can't explicitly include it in the model. Consequently, the equation that we actually estimate is

wage = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + v,    (1.28a)

where

v = β6 ability + u.
We will see in Topic 3 that if, as we suspect, education and ability are correlated, the OLS estimator of β1 in equation (1.28a) will no longer be "reliable" even in very large samples. More specifically, the OLS estimator of β1 will be an inconsistent estimator of the marginal effect of education on the average wage controlling for differences in experience, race, gender and ability. (The concept of consistency is discussed in section 1.4 below.)

Informally, if we estimate (1.28a) by OLS, the OLS estimate of β1 will be an "unreliable" estimate of the marginal effect of education on wages, controlling for exper, race, gender and ability.

In Topic 4 we will discuss how to deal with the problem of endogenous regressors.
1.4 Some asymptotic theory
1.4.1 Introduction
In Topic 2 we will study models in which it is desirable to allow the conditional mean of the dependent variable to be nonlinear in the parameters, and in Topics 3 to 8 we will allow the regressors in our models to be correlated with the error term. In these models it is generally impossible to derive estimators that can be shown to be unbiased, efficient and normally distributed in finite samples. In fact, in these models:

The finite sample properties of the estimators that we use are typically unknown.
In addition, the finite sample distributions of our test statistics are also typically unknown.

When conducting inference in these models we are forced to rely almost entirely on asymptotic results, that is, results that can be proved to hold only as the sample size goes to infinity.
The strategy researchers use in these circumstances is to derive the asymptotic distributions of estimators and test statistics and to use these asymptotic distributions as approximations to the finite sample distributions of the estimators and test statistics. In effect, we proceed "as if" the asymptotic distributions are valid in finite samples. However, we never know how accurate these approximations are in a given application.

In this section we provide a brief and relatively informal discussion of the important concepts of consistency, asymptotic normality and asymptotic efficiency. A more detailed and technical discussion of these concepts is provided in ETC3400.
1.4.2 Consistency
Let θ̂n denote an estimator of the parameter θ, given a sample of size n. Formally, θ̂n is said to be a consistent estimator of θ if

Pr(|θ̂n − θ| < δ) → 1 as n → ∞, for all δ > 0.    (1.29)

When (1.29) holds we say that θ̂n converges in probability to θ, or that θ is the probability limit of θ̂n, which we denote by

plim(θ̂n) = θ.    (1.30)

Intuitively, θ̂n is a consistent estimator of θ if the probability that θ̂n is arbitrarily close to θ goes to 1 as the sample size gets infinitely large.
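A quick simulation sketch of (1.29) (illustrative only): for the sample mean of exponential draws, the fraction of replications that land within δ = 0.1 of the true mean rises towards 1 as n grows.

import numpy as np

rng = np.random.default_rng(2)
mu, delta, reps = 5.0, 0.1, 1_000

for n in (10, 100, 10_000):
    draws = rng.exponential(scale=mu, size=(reps, n))
    means = draws.mean(axis=1)
    # fraction of replications with |mean - mu| < delta
    print(n, np.mean(np.abs(means - mu) < delta))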
The practical implication of θ̂n being a consistent estimator of θ is that there is a very high probability that θ̂n will be very close to θ when the sample size is large, and in this sense θ̂n will be a "good" estimator of θ in large samples.

Obviously, consistency is a very desirable property for an estimator.

There are four useful properties of the plim operator which we state below without proof. We will use these properties on several occasions later in the lecture notes.
Let x1n and x2n be two random variables such that

plim(x1n) = x1, plim(x2n) = x2.

That is, the random variables x1n and x2n converge in probability to the random variables x1 and x2 respectively. Then the following properties can be shown to hold:
P1 The plim of a sum is the sum of the plims. That is,
plim(x1n + x2n) = plim(x1n) + plim(x2n) = x1 + x2.
P2 The plim of a product is the product of the plims. That is,
plim(x1n x2n) = plim(x1n) plim(x2n) = x1 x2.
P3 The plim of the inverse is the inverse of the plim. That is,
plim(1/x1n) = 1/plim(x1n) = 1/x1, provided x1 ≠ 0.
P4 The plim of a ratio is the ratio of the plims. That is,
plim(x1n/x2n) = plim(x1n)/plim(x2n) = x1/x2, provided x2 ≠ 0.
Although P1, P2, P3 and P4 above have been stated for scalar random
variables, they can be generalized to random vectors and random matrices.
(That is, vectors and matrices whose elements are random variables).
1.4.3 Asymptotic normality
Let the scalar θ̂n denote an estimator of the unknown parameter θ, given a sample of size n. The estimator θ̂n is a random variable and, like any random variable, has a probability distribution. The form of this distribution may depend on n. That is, as n increases the form of the probability distribution of θ̂n may change.

Using a body of mathematics known as central limit theorems, many random variables whose probability distribution based on a finite sample (the finite sample distribution) is unknown can be shown to have a well defined probability distribution as the sample size tends to infinity.

When this is the case, the random variable in question is said to "converge in distribution", and the probability distribution to which it converges is called a limiting (or limit) distribution.
When θ̂n is a consistent estimator,

plim(θ̂n) = θ,

which means that θ̂n collapses to a single point as n goes to infinity, in which case the limiting distribution of θ̂n is degenerate.

In order to obtain a non-degenerate limiting distribution for a consistent estimator we "normalize" θ̂n as described below.

Formally, we say that θ̂n has a limiting normal distribution if

√n(θ̂n − θ) →d N(0, V),    (1.31)

where N(0, V) denotes a normally distributed random variable with mean zero and some unknown variance V, and the notation →d denotes convergence in distribution as n tends to infinity.
Although we refer to the estimator θ̂n as having a limiting normal distribution, it is clear from (1.31) that it is actually the random variable

√n(θ̂n − θ)

that converges to a normal random variable as n goes to infinity.

Equation (1.31) is an exact result, not an approximation: it is strictly true only as n tends to infinity. However, assume that

√n(θ̂n − θ) ≈ N(0, V)    (1.32)

for large, but finite, n (where the symbol ≈ denotes "is approximately distributed as").
Recall that if x is a random variable and c and d are constants, then

E(c + dx) = c + dE(x)
var(c + dx) = d² var(x).

Using these results it follows that if

√n(θ̂n − θ) ≈ N(0, V),

then

θ̂n − θ ≈ (1/√n) N(0, V) ≈ N(0, V/n),
so that

θ̂n ≈ θ + N(0, V/n) ≈ N(θ, V/n).    (1.33)

Equation (1.33) states that in a large finite sample θ̂n is approximately normally distributed with mean θ and variance V/n.

It is conventional to rewrite (1.33) as

θ̂n ~asy N(θ, V/n).    (1.34)

Equation (1.34) is referred to as the asymptotic distribution of θ̂n, and V/n is referred to as the asymptotic variance of θ̂n.
In summary, whenever

√n(θ̂n − θ) →d N(0, V),    (1.31)

we say that θ̂n is asymptotically normally distributed with asymptotic distribution

θ̂n ~asy N(θ, V/n).    (1.34)

In econometrics we use the asymptotic distribution of θ̂n as an approximation to the true distribution of θ̂n in a finite sample (i.e. we use the asymptotic distribution of θ̂n as an approximation to its finite sample distribution).
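The following simulation sketch (not part of the formal development) illustrates (1.31) for the sample mean: √n(x̄n − μ) is approximately normal with variance V = var(x), even though the underlying draws are exponential rather than normal.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, n, reps = 1.0, 500, 5_000
draws = rng.exponential(scale=mu, size=(reps, n))
z = np.sqrt(n) * (draws.mean(axis=1) - mu)

print(z.var())   # ~1.0, the variance of an exponential(1) random variable
print(stats.kstest(z / z.std(), "norm").pvalue)  # normality typically not rejected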
Notice that the asymptotic distribution (1.34) is derived from the limiting distribution (1.31) by assuming that the latter is approximately true in large finite samples.

Obviously, the larger the sample size the more likely it is that the asymptotic distribution is a good approximation to the true finite sample distribution of θ̂n.

Note:

Most estimators used in econometrics satisfy

√n(θ̂n − θ) →d N(0, V).    (1.31)

The results stated in (1.31) and (1.34) generalize to the case in which θ̂n is a k×1 vector rather than a scalar, as assumed above.
In the case in which θ̂n is a k×1 vector, θ is also a k×1 vector and V/n is a k×k variance matrix.
Knowledge of the asymptotic distribution of θ̂n is useful for two principal reasons:

It can be used to construct confidence intervals for our estimates.
It can be used to construct (asymptotically valid) hypothesis tests, as we will see in section 1.5 below.
1.4.4 Asymptotic efficiency
The estimator θ̂n is asymptotically efficient if:

(i) θ̂n is a consistent estimator of θ.
(ii) The asymptotic variance of θ̂n is at least as small as that of any other consistent estimator. That is,

Avar(θ̂n) ≤ Avar(θ̃n),

where θ̃n denotes any other consistent estimator of θ.

Notice that, just as we restrict our attention to unbiased estimators when defining finite sample efficiency, we restrict our attention to consistent estimators when defining asymptotic efficiency.
Asymptotic variance is the criterion that we use to choose between two or more consistent estimators. The consistent estimator with the smallest asymptotic variance is generally preferred.

In Topic 2 we will introduce the estimation method known as maximum likelihood estimation. One of the most attractive features of maximum likelihood estimation is that, provided the statistical/econometric model is correctly specified, the maximum likelihood estimator will be:

consistent
asymptotically normally distributed
asymptotically efficient
1.5 Testing linear restrictions on the parameters of an econometric model
A hypothesis test that is valid in a sample of any size is called an exact test. Tests that are valid only in large samples are called asymptotic tests. Generally speaking, exact tests are available only in the linear regression model with normally distributed, homoskedastic, serially uncorrelated errors. Once we relax these very restrictive assumptions, we are forced to use asymptotic tests.

Many hypotheses of economic interest can be expressed as linear restrictions on the parameters of an econometric model. For example, consider the wage equation

wage = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + v.    (1.28)
Suppose that we wish to simultaneously test the following hypotheses:

(i) The marginal effect of educ is equal but opposite in sign to the marginal effect of exper for someone who has one year of experience.
(ii) The marginal effect of gender is twice that of race.

Since

ME_educ = β1 and, evaluated at one year of experience, ME_exper = β2 + 2β5,

the hypothesis that the marginal effect of educ is equal but opposite in sign to the marginal effect of exper implies that

β1 = −(β2 + 2β5), or β1 + β2 + 2β5 = 0.
Since

ME_gender = β4 and ME_race = β3,

the hypothesis that the marginal effect of gender is twice that of race implies that

β4 = 2β3, or β4 − 2β3 = 0.

Notice that each of these economic hypotheses has been expressed as a restriction on the parameters of the model.
The two hypotheses we wish to test impose the following two linear restrictions on the parameters of the wage equation:

β1 + β2 + 2β5 = 0
β4 − 2β3 = 0.    (1.35)

The restrictions in (1.35) can be written more compactly as

Rβ = r,    (1.36)
where R is 2×6, β is 6×1 and r is 2×1, with

R = [ 0  1  1   0  0  2
      0  0  0  −2  1  0 ],   β = (α, β1, β2, β3, β4, β5)′,   r = (0, 0)′.

To see this note that multiplying out Rβ = r gives
β1 + β2 + 2β5 = 0
β4 − 2β3 = 0,    (1.35)

which are exactly the restrictions in (1.35).

In general, q (independent) linear restrictions on the k×1 vector β can be written as

Rβ = r,    (1.37)

where R is q×k, β is k×1 and r is q×1. The precise definitions of R and r depend on the particular restrictions being tested.
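For concreteness, the numpy sketch below (with purely hypothetical coefficient values) writes the two restrictions in (1.35) in the form Rβ = r, with β ordered as (α, β1, β2, β3, β4, β5)′, and checks them for a candidate β:

import numpy as np

R = np.array([[0, 1, 1,  0, 0, 2],    # beta1 + beta2 + 2*beta5 = 0
              [0, 0, 0, -2, 1, 0]])   # beta4 - 2*beta3 = 0
r = np.zeros(2)

beta = np.array([0.5, -0.12, 0.08, 0.05, 0.10, 0.02])  # hypothetical values
print(R @ beta - r)   # (0, 0): this beta satisfies both restrictions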
The advantage of expressing our restrictions in the form of (1.37) is that it enables us to represent a set of linear restrictions on β without specifying exactly what the restrictions are, and to derive results that will hold for any set of linear restrictions on β.

Under the null hypothesis that the restrictions in (1.37) are correct,

Rβ − r = 0.    (1.38)

However, since β is unknown, how do we determine whether or not (1.38) holds?

An obvious approach is to consider whether or not

Rβ̂ − r = 0,

where β̂ is our estimator of β.
However, β̂ is a random variable the value of which varies from sample to sample. Therefore, the question we need to consider is whether or not Rβ̂ − r is statistically significantly different from zero.

To determine whether or not Rβ̂ − r is statistically significantly different from zero we need to know the probability distribution of Rβ̂ − r. We next show that the asymptotic distribution of Rβ̂ − r can be derived from our knowledge of the asymptotic distribution of β̂.

Assume that

β̂ ~asy N(β, V/n).    (1.39)

Then

Rβ̂ ~asy R N(β, V/n)
⇒ Rβ̂ ~asy N(Rβ, RVR′/n)

⇒ Rβ̂ − r ~asy N(Rβ − r, RVR′/n).    (1.40)

In going from the second line to the third line of the derivation we used the result that

Var(Rβ̂) = R Var(β̂) R′ = RVR′/n.
Equation (1.40) implies that under the null hypothesis that

Rβ − r = 0,

Rβ̂ − r ~asy N(0, RVR′/n).    (1.41)

In principle, we could use (1.41) as our test statistic. However, if we did so, the critical value for our test would depend on R, and there would be a different critical value for each possible choice of R.

We can eliminate the dependence on R of the critical value for our test statistic by transforming our test statistic from a normal random variable into a chi-square variable. The transformation is achieved by appealing to the following well known theorem in mathematical statistics.
Theorem (1)
Let Z be a k×1 random vector. If

Z ~asy N(0, Σ),

then

Z′ Σ⁻¹ Z ~asy χ²(q),

where q is the rank of the matrix Σ.
Applying Theorem 1 to

Rβ̂ − r ~asy N(0, RVR′/n),    (1.41)

with Rβ̂ − r playing the role of Z, we conclude that, under the null hypothesis

Rβ − r = 0,    (1.42)

(Rβ̂ − r)′ (RVR′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q),    (1.43)

where q is the number of restrictions imposed under the null hypothesis.
The statistic on the left-hand side of (1.43) is not feasible, since it depends on the unknown matrix V. A feasible test statistic for testing (1.42) is given by

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q),    (1.44)

where V̂ is a consistent estimator of V, i.e.

plim(V̂) = V.

As long as V̂ is a consistent estimator of V, the left-hand sides of (1.43) and (1.44) are asymptotically equivalent.

Note:
A hypothesis test based on (1.44) is called a Wald test (because it was first proposed by Abraham Wald in 1943).
A Wald test is the most common form of hypothesis test used in econometrics because, unlike other tests, a Wald test can be conducted no matter what estimation method is used to estimate the regression equation.
Since only the asymptotic distribution of W is known, the Wald test is an asymptotic test, and may be unreliable in small samples.
The Wald test statistic in (1.44) is sometimes written as

W = n (Rβ̂ − r)′ (RV̂R′)⁻¹ (Rβ̂ − r).    (1.45)
The derivation of (1.44) depends crucially on the result that

β̂ ~asy N(β, V/n),    (1.39)

and illustrates how knowledge of the asymptotic distribution of an estimator can be used to construct an asymptotically valid test statistic.

Testing at the 5% significance level, we reject the null hypothesis that

Rβ − r = 0    (1.42)

if

W_calc > χ²_{0.95}(q),

where W_calc denotes the sample value of the test statistic, and χ²_{0.95}(q) denotes the 95th percentile of the chi-square distribution with q degrees of freedom.
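A minimal implementation sketch of the Wald test (1.44) in numpy/scipy, assuming we already have estimates b and a consistent estimate of their variance matrix (avar_b below stands in for V̂/n); all inputs are hypothetical:

import numpy as np
from scipy import stats

def wald_test(b, avar_b, R, r):
    # avar_b estimates Var(b), so R @ avar_b @ R.T plays the role of R Vhat R'/n
    diff = R @ b - r
    W = diff @ np.linalg.inv(R @ avar_b @ R.T) @ diff
    q = R.shape[0]
    return W, 1 - stats.chi2.cdf(W, df=q)

b = np.array([0.5, -0.15, 0.08, 0.05, 0.10, 0.02])   # hypothetical estimates
avar_b = 0.01 * np.eye(6)                            # hypothetical Vhat/n
R = np.array([[0, 1, 1, 0, 0, 2], [0, 0, 0, -2, 1, 0]])
W, pval = wald_test(b, avar_b, R, np.zeros(2))
print(W, pval, W > stats.chi2.ppf(0.95, df=2))       # reject if True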
By Theorem 1 above,

q = rank(RV̂R′/n),

which in turn can be shown to equal the number of restrictions imposed under the null hypothesis. Therefore q in

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q)    (1.44)

is always equal to the number of restrictions imposed under the null hypothesis that

Rβ − r = 0.    (1.42)
Equivalently, we reject (1.42) if

p-value < 0.05,

where

p-value = prob[χ²(q) > W_calc].

It can be shown that

W/q ~asy F(q, n − k),    (1.46)

where n is the sample size and k denotes the number of regressors in the model (including the constant).
Consequently, one can also implement the Wald test as an asymptotic F-test. In this case we reject the null hypothesis if

F_calc = W_calc/q > F_{0.95}(q, n − k),

where F_{0.95}(q, n − k) denotes the 95th percentile of an F distribution with q degrees of freedom in the numerator and n − k degrees of freedom in the denominator.

Note:

Tests based on (1.44) and (1.46) are asymptotically equivalent. However, they produce different p-values in finite samples.
Some software packages report results based on (1.44), some report results based on (1.46) and some report both.
Some researchers believe that F(q, n − k) is a better approximation to the finite sample distribution of W/q than χ²(q) is to the finite sample distribution of W. Consequently, they use (1.46) in the hope that it will produce more reliable results in a finite sample.

In the special case of testing

H0: βk = 0

in the linear regression equation

y = β1 + β2 x2 + ... + βk xk + u

(i.e. testing the individual significance of xk), it can be shown that the test statistic

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q)    (1.44)
reduces to

Wz = β̂k / se(β̂k) ~asy N(0, 1).    (1.47)

If the model is estimated by maximum likelihood, the null hypothesis

Rβ − r = 0    (1.42)

can also be tested by performing a likelihood ratio (LR) test. It can be shown that under (1.42) the test statistic

LR = 2(lu − lr) ~asy χ²(q),    (1.48)

where lu and lr respectively denote the maximized values of the log-likelihood function of the unrestricted and restricted models, and q again denotes the number of restrictions imposed under the null. The LR test will be discussed in more detail in Topic 2.
1.6 A review of generalized least squares (GLS)
Consider the linear regression model

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui, i = 1, ..., n,    (1.49)

or, in matrix notation,

y = Xβ + u.    (1.50)

In introductory econometrics units it is often assumed that

var(ui | xi) = σ², i = 1, ..., n,    (1.51)

and

cov(ui, uj | xi, xj) = 0, i ≠ j,    (1.52)

where

xi = (xi1, xi2, ..., xik).

Equations (1.51) and (1.52) respectively state that the errors in (1.50) are conditionally homoskedastic and conditionally serially uncorrelated.
When (1.51) and (1.52) hold, the n×n conditional error variance matrix is diagonal with σ² in every diagonal position:

Var(u | X) = diag(σ², σ², ..., σ²) = σ² In.

When

Var(u | X) = σ² In,    (1.53)

the errors in (1.50) are said to be "spherical", and when (1.53) is violated they are said to be "non-spherical".

Notice that when the errors are spherical, the error covariance matrix is a scalar identity matrix, that is, an identity matrix multiplied by a scalar σ².

Assumption (1.51) is usually unrealistic for cross-section data, and assumption (1.52) is usually unrealistic for time series data.
Denote the conditional error variance matrix for non-spherical errors by

Var(u | X) = Ω ≠ σ² In,    (1.54)

where the precise form of Ω depends on the nature of the departure from sphericity. For example, in the case of conditionally uncorrelated, heteroskedastic errors, Ω is diagonal with observation-specific variances:

Ω = diag(σ1², σ2², ..., σn²).
It is well known that when (1.54) holds the OLS estimator of β in

y = Xβ + u    (1.50)

is inefficient. In this case an efficient estimator can be obtained by executing the following steps:

S1 Multiply both sides of (1.50) by Ω^{−1/2} to obtain

Ω^{−1/2} y = Ω^{−1/2} Xβ + Ω^{−1/2} u,

or

y* = X*β + u*,    (1.55)

where

y* ≡ Ω^{−1/2} y, X* ≡ Ω^{−1/2} X, u* ≡ Ω^{−1/2} u.
Notice that

Var(u* | X) = Var(Ω^{−1/2} u | X)
            = Ω^{−1/2} Var(u | X) (Ω^{−1/2})′
            = Ω^{−1/2} Ω Ω^{−1/2}    (using (1.54) and the symmetry of Ω^{−1/2})
            = In.    (1.56)

Therefore, the errors in (1.55) are spherical and, by the Gauss-Markov theorem, β in (1.55) can be efficiently estimated by OLS.
S2 Applying the usual OLS formula to

y* = X*β + u*    (1.55)

we obtain

β̂ = (X*′X*)⁻¹ X*′y*
  = [(Ω^{−1/2}X)′(Ω^{−1/2}X)]⁻¹ (Ω^{−1/2}X)′ Ω^{−1/2} y
  = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y.    (1.57)

The estimator in (1.57) is called the generalized least squares (GLS) estimator of β in the regression equation

y = Xβ + u,    (1.50)
and is denoted by

β̂_GLS = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y.    (1.58)

In summary, the OLS estimator of β in

y = Xβ + u    (1.50)

is

β̂_OLS = (X′X)⁻¹ X′y,

and the GLS estimator of β is

β̂_GLS = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y.    (1.58)
Notice that β̂_GLS can be obtained from the formula for the OLS estimator,

β̂_OLS = (X′X)⁻¹ X′y,

by inserting Ω⁻¹ between X′ and X and between X′ and y, where

Ω = Var(u | X).

β̂_GLS is not a feasible estimator, since it depends on the unknown matrix Ω⁻¹. A feasible GLS (FGLS) estimator is given by

β̂_FGLS = (X′Ω̂⁻¹X)⁻¹ X′Ω̂⁻¹y,    (1.59)

where

plim(Ω̂) = Ω.

That is, Ω̂ is a consistent estimator of Ω.

Many of the estimators that we will discuss in this unit are FGLS estimators.
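As a closing illustration, the sketch below implements a simple FGLS estimator of the form (1.59) under an assumed skedastic function Var(ui | xi) = exp(γ0 + γ1 xi). Both the data-generating process and the variance model are assumptions made for this example, not part of the notes.

import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.uniform(1, 5, size=n)
u = rng.normal(size=n) * np.exp(0.5 * x)        # heteroskedastic errors
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)       # step 1: OLS
resid = y - X @ b_ols
# step 2: model the log squared residuals to estimate the skedastic function
g = np.linalg.solve(X.T @ X, X.T @ np.log(resid**2))
omega_inv = 1.0 / np.exp(X @ g)                 # diagonal of Omega-hat^{-1}

XtOiX = X.T @ (omega_inv[:, None] * X)          # X' Omega-hat^{-1} X
XtOiy = X.T @ (omega_inv * y)                   # X' Omega-hat^{-1} y
b_fgls = np.linalg.solve(XtOiX, XtOiy)          # equation (1.59)
print(b_ols, b_fgls)   # both near (1, 2); FGLS is the more precise of the two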