
Generalized Least Squares

(Handout Version)∗

Walter Belluzzo

Econ 507 Econometric Analysis


Spring 2013

1 Introduction
Efficiency of the OLS Estimator

• Remember that the OLS estimator is efficient (the best linear unbiased estimator) if the DGP
belongs to the regression model

$$y = X\beta + u, \qquad u \mid X \sim \mathrm{iid}(0, \sigma^2 I),$$

a result stated in the Gauss-Markov theorem.

• For efficiency of least squares, the error terms must be uncorrelated and have equal
variance, that is, $\mathrm{Var}(u) = \sigma^2 I$.

• The usual estimators of the covariance matrices of the OLS and NLS estimators are not
valid when these assumptions do not hold.

• Alternative “sandwich” covariance matrix estimators that are asymptotically valid can be
obtained. But inefficiency of the estimators β̂ remains.

Regression Model with Non-spherical Disturbances

• Non-spherical disturbances affect both linear and nonlinear regression models in the same
way. So, we can focus our attention on the simpler, linear case.

• Let us consider the model

$$y = X\beta + u, \qquad E(uu') = \Omega.$$

• To obtain an efficient estimator of the vector β in this model, the idea is to find a
transformation of the model under which the Gauss-Markov conditions are satisfied.

• The resulting efficient estimator (why?) is called the generalized least squares, or
GLS, estimator.
∗ This lecture is based on D&M Chapter 6.
Econ 507 – Spring 2013

2 Generalized Least Squares


• The transformation we want to find must be such that the new, transformed error terms
have a covariance matrix of the form $\sigma^2 I$.

• Consider transforming the regression by premultiplying it by $\Psi'$. Then, the covariance
matrix of the transformed error vector $\Psi' u$ is

$$E(\Psi' u u' \Psi) = \Psi' E(uu') \Psi = \Psi' \Omega \Psi.$$

• For the right-most expression to reduce to the identity matrix $I$ (that is, $\sigma^2 I$ with
$\sigma^2 = 1$), we must define $\Psi$ such that
$$\Omega^{-1} = \Psi \Psi'.$$

Transforming Back to Classic Regression

• In this case, the covariance matrix of the transformed error reduces to

$$E(\Psi' u u' \Psi) = \Psi' \Omega \Psi = \Psi' (\Psi \Psi')^{-1} \Psi = \Psi' (\Psi')^{-1} \Psi^{-1} \Psi = I.$$

• Premultiplying the regression by $\Psi'$ gives

$$\Psi' y = \Psi' X \beta + \Psi' u.$$

• Because the covariance matrix Ω is nonsingular, the matrix Ψ must be as well, and so
the transformed regression model is perfectly equivalent to the original model.
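
As a minimal sketch of the transformation described above, the following NumPy code builds one valid Ψ from the Cholesky factor of Ω⁻¹ and checks that Ψ′ΩΨ is the identity. The function name `transformation_matrix` and the simulated matrix `omega` are illustrative assumptions, not part of the handout; any Ψ with Ω⁻¹ = ΨΨ′ would serve equally well.

```python
import numpy as np

def transformation_matrix(omega):
    """Return one Psi satisfying inv(omega) = Psi @ Psi.T (hypothetical helper)."""
    omega_inv = np.linalg.inv(omega)
    # np.linalg.cholesky returns a lower-triangular L with omega_inv = L @ L.T
    return np.linalg.cholesky(omega_inv)

# Hypothetical positive definite covariance matrix, for illustration only
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
omega = a @ a.T + 5 * np.eye(5)

psi = transformation_matrix(omega)
# Covariance matrix of the transformed errors Psi' u
transformed_cov = psi.T @ omega @ psi
```

The Cholesky factor is only one convenient choice here; it is used because it is cheap to compute and triangular.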

GLS of the Transformed Model

• The OLS estimator of β from the transformed regression is

$$\hat\beta_{gls} = (X' \Psi \Psi' X)^{-1} X' \Psi \Psi' y = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y.$$

• This is the expression for the generalized least squares estimator of β.

• Since $\hat\beta_{gls}$ is just the OLS estimator for the transformed model, its covariance matrix can
be found directly from the OLS covariance matrix, $\sigma_0^2 (X'X)^{-1}$.

• Replacing $X$ by $\Psi' X$ and $\sigma_0^2$ by 1, we get

$$\mathrm{Var}(\hat\beta_{gls}) = (X' \Psi \Psi' X)^{-1} = (X' \Omega^{-1} X)^{-1}.$$
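
The equivalence between OLS on the transformed model and the closed-form GLS expression can be checked numerically. The sketch below assumes a known Ω and simulated data; the function names `gls` and `gls_via_transformation` and all data-generating choices are hypothetical.

```python
import numpy as np

def gls(X, y, omega):
    """Closed-form GLS estimate and its covariance matrix (X' Omega^{-1} X)^{-1}."""
    omega_inv = np.linalg.inv(omega)
    xt_oi = X.T @ omega_inv
    cov = np.linalg.inv(xt_oi @ X)
    beta = cov @ (xt_oi @ y)
    return beta, cov

def gls_via_transformation(X, y, omega):
    """OLS on the transformed regression Psi'y = Psi'X beta + Psi'u."""
    psi = np.linalg.cholesky(np.linalg.inv(omega))
    beta, *_ = np.linalg.lstsq(psi.T @ X, psi.T @ y, rcond=None)
    return beta

# Simulated data with a hypothetical, known covariance matrix
rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
a = rng.normal(size=(n, n))
omega = a @ a.T / n + np.eye(n)
u = np.linalg.cholesky(omega) @ rng.normal(size=n)  # errors with covariance omega
y = X @ np.array([1.0, 2.0, -0.5]) + u

beta_direct, cov_gls = gls(X, y, omega)
beta_transformed = gls_via_transformation(X, y, omega)
```

Both routes should agree up to rounding error. Forming and inverting the full n × n matrix is done here only for clarity and would not scale to large n.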


The GLS Criterion Function

• The generalized least squares estimator $\hat\beta_{gls}$ can also be obtained by minimizing the GLS
criterion function
$$(y - X\beta)' \Omega^{-1} (y - X\beta),$$
which is just the sum of squared residuals from the transformed regression.

• This can be viewed as the SSR function from the original model, weighted by the inverse
of the matrix Ω.

• The effect of such a weighting scheme is clearest when Ω is a diagonal matrix. In that
case, the weight given to the $t$th observation is proportional to the inverse of $\mathrm{Var}(u_t)$.

3 Efficiency of the GLS Estimator


Method of Moments Representation of GLS

• The GLS estimator $\hat\beta_{gls}$ defined above is also the solution of the set of moment conditions
$$X' \Omega^{-1} (y - X \hat\beta_{gls}) = 0,$$
which are the usual method of moments conditions $W'(y - X\beta) = 0$ with $W = \Omega^{-1} X$.

• It is easy to verify that these moment conditions are equivalent to the first-order conditions
for the minimization of the GLS criterion function (do it as an exercise).

• Since the GLS estimator is a method of moments estimator, it is interesting to compare
it with estimators obtained with a general matrix $W$, denoted $\hat\beta_w$.

• We will obtain efficiency from this comparison.
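
A quick numerical illustration, not a proof, of the link between the GLS criterion function and the moment conditions: at the GLS estimate the moment conditions hold up to rounding error, and perturbing the estimate raises the criterion. The diagonal Ω, the simulated data, and the name `gls_criterion` are assumptions made for this sketch.

```python
import numpy as np

def gls_criterion(beta, X, y, omega_inv):
    """GLS criterion (y - X beta)' Omega^{-1} (y - X beta)."""
    r = y - X @ beta
    return float(r @ omega_inv @ r)

# Simulated heteroskedastic data (hypothetical values)
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
variances = rng.uniform(0.5, 3.0, size=n)
omega_inv = np.diag(1.0 / variances)
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n) * np.sqrt(variances)

# GLS estimate from the normal equations X' Omega^{-1} X beta = X' Omega^{-1} y
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# Moment conditions X' Omega^{-1} (y - X beta_gls), numerically zero
moments = X.T @ omega_inv @ (y - X @ beta_gls)

# The criterion at the estimate and at a slightly perturbed parameter vector
q_min = gls_criterion(beta_gls, X, y, omega_inv)
q_perturbed = gls_criterion(beta_gls + np.array([0.01, -0.01]), X, y, omega_inv)
```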

Method of Moments Representation of GLS

• Suppose that the DGP is a special case of that model, with parameter vector $\beta_0$ and
known covariance matrix Ω.

• Assume further that $E(u \mid X, W) = 0$. As we have seen before, to obtain consistency,
pre-determinedness would suffice.

• Substituting $X\beta_0 + u$ for $y$ in $W'(y - X\beta) = 0$, we see that

$$\hat\beta_w = \beta_0 + (W'X)^{-1} W' u.$$

• Therefore, the covariance matrix of $\hat\beta_w$ is

$$\begin{aligned}
\mathrm{Var}(\hat\beta_w) &= E\big[(\hat\beta_w - \beta_0)(\hat\beta_w - \beta_0)'\big] \\
&= E\big[(W'X)^{-1} W' u u' W (X'W)^{-1}\big] \\
&= (W'X)^{-1} W' \Omega W (X'W)^{-1}.
\end{aligned}$$


Efficiency of the GLS Estimator

• To show efficiency of $\hat\beta_{gls}$, we proceed as in previous cases and show that the difference
of the precision matrices,

$$X' \Omega^{-1} X - X' W (W' \Omega W)^{-1} W' X, \tag{1}$$

is positive semidefinite (do it as an exercise).

• This difference being positive semidefinite means that any other choice of the matrix $W$
yields a covariance matrix at least as large as the one obtained with $W = \Omega^{-1} X$.

• In fact, $\hat\beta_{gls}$ is typically more efficient for all elements of β, because it is only in very
special cases that the matrix (1) will have any zero diagonal elements.

• Note that $\hat\beta_w$ reduces to the OLS estimator when $W = X$. Thus these conclusions
apply in particular to the OLS estimator, $\hat\beta$.
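
The efficiency comparison can be illustrated numerically (this does not replace the exercise). The sketch below evaluates the covariance matrix of the method of moments estimator for W = Ω⁻¹X (GLS) and W = X (OLS) and inspects the eigenvalues of the difference of the precision matrices. The helper name `mm_covariance` and the simulated diagonal Ω are assumptions.

```python
import numpy as np

def mm_covariance(W, X, omega):
    """Covariance (W'X)^{-1} W' Omega W (X'W)^{-1} of the estimator defined by W."""
    wx_inv = np.linalg.inv(W.T @ X)
    # (X'W)^{-1} is the transpose of (W'X)^{-1}
    return wx_inv @ W.T @ omega @ W @ wx_inv.T

# Hypothetical design and heteroskedastic covariance matrix
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega = np.diag(rng.uniform(0.5, 4.0, size=n))
omega_inv = np.linalg.inv(omega)

cov_gls = mm_covariance(omega_inv @ X, X, omega)  # W = Omega^{-1} X
cov_ols = mm_covariance(X, X, omega)              # W = X

# Difference of precision matrices, as in expression (1) with W = X
precision_gap = np.linalg.inv(cov_gls) - np.linalg.inv(cov_ols)
# Symmetrize before eigvalsh to guard against rounding asymmetry
eigenvalues = np.linalg.eigvalsh((precision_gap + precision_gap.T) / 2)
```

All eigenvalues should be non-negative up to rounding error, which is what positive semidefiniteness of (1) requires.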

4 Computing GLS Estimates


• The main issue in computing the GLS estimator is that, in general, the matrix Ω is
unknown. But it is important to note that there is a computational difficulty even if Ω
is known.

• The reason is that when n is large, computation based on Ω, which is an n × n matrix,
can be very demanding in terms of computer memory.

• In general, computation of the GLS estimator will be easy only if the matrix Ψ has a
form that allows us to calculate $\Psi' x$ for any vector $x$ without having to store Ψ itself in memory.

GLS with Ω Known Up to a Constant

• Suppose that $\Omega = \sigma^2 \Delta$, where the n × n matrix Δ is known to the investigator, but the
positive scalar $\sigma^2$ is unknown.

• Then if we define Ψ in terms of Δ instead of Ω, the transformed regression is still valid,
but the error terms will now have variance $\sigma^2$ instead of variance 1.

• The OLS estimates from the transformed regression with the modified Ψ are numerically
identical to $\hat\beta_{gls}$:

$$\begin{aligned}
(X' \Delta^{-1} X)^{-1} X' \Delta^{-1} y &= (X' (\sigma^{-2} \Omega)^{-1} X)^{-1} X' (\sigma^{-2} \Omega)^{-1} y \\
&= (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y \\
&= \hat\beta_{gls}.
\end{aligned}$$

• Thus the GLS estimates will be the same whether we use Ω or Δ, that is, whether or not
we know $\sigma^2$.


• The covariance matrix of $\hat\beta_{gls}$ in this case can be written as

$$\mathrm{Var}(\hat\beta_{gls}) = \sigma^2 (X' \Delta^{-1} X)^{-1},$$

which can be estimated by replacing $\sigma^2$ with the usual OLS estimator of the error variance,
$s^2$, from the transformed regression.
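
The following sketch illustrates the case in which the covariance matrix is known only up to scale. It runs the transformed regression using Δ, estimates σ² by s², and shows that using σ²Δ instead leaves the coefficient estimates and the estimated covariance matrix unchanged. The function name `gls_known_up_to_scale` and the simulated values are assumptions made for illustration.

```python
import numpy as np

def gls_known_up_to_scale(X, y, delta):
    """GLS with Omega = sigma^2 * delta, where only delta is supplied."""
    psi = np.linalg.cholesky(np.linalg.inv(delta))
    Xs, ys = psi.T @ X, psi.T @ y               # transformed regression
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)                # usual OLS error variance estimator
    cov = s2 * np.linalg.inv(Xs.T @ Xs)         # s^2 (X' delta^{-1} X)^{-1}
    return beta, s2, cov

# Hypothetical data: delta is known, sigma2 is treated as unknown by the estimator
rng = np.random.default_rng(4)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
delta = np.diag(rng.uniform(0.5, 2.0, size=n))
sigma2 = 2.5
u = rng.normal(size=n) * np.sqrt(sigma2 * np.diag(delta))
y = X @ np.array([1.0, -1.0]) + u

beta_delta, s2_delta, cov_delta = gls_known_up_to_scale(X, y, delta)
beta_omega, s2_omega, cov_omega = gls_known_up_to_scale(X, y, sigma2 * delta)
```

Only the estimate of the scale changes between the two calls: `s2_delta` equals `sigma2 * s2_omega`, while the coefficients and their estimated covariance matrix coincide.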

Weighted Least Squares


• Let $\omega_t^2$ denote the $t$th diagonal element of Ω, and suppose that Ω is diagonal. That is,
the error terms are heteroskedastic but uncorrelated.

• Then $\Omega^{-1}$ is a diagonal matrix with $t$th diagonal element $\omega_t^{-2}$, and thus Ψ will be a
diagonal matrix with elements $\omega_t^{-1}$.

• In this case, the transformed regression can be written as

$$\frac{1}{\omega_t} y_t = \frac{1}{\omega_t} X_t \beta + \frac{1}{\omega_t} u_t,$$

and estimated by OLS.

• This special case of GLS estimation is often called weighted least squares, or WLS.

• The weight given to each observation is $\omega_t^{-1}$, and thus observations for which the variance
of the error term is large are given low weights, while those for which it is small are given high weights.

• Note that all the variables in the regression, including the constant term, must be multiplied
by the same weights.

• Note that the $R^2$ only makes sense in terms of the transformed regressand, since
“undoing” the weighting does not preserve orthogonality of residuals and fitted values.
That is,
$$\hat u \perp \hat y \;\not\Rightarrow\; \Psi^{-1} \hat u \perp \Psi^{-1} \hat y.$$
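
A minimal sketch of weighted least squares that never forms an n × n matrix: each row of the data, including the constant, is divided by ω_t, and OLS is run on the result. The function name `wls` and the simulated data are assumptions; the full GLS formula is computed only to check the answer.

```python
import numpy as np

def wls(X, y, omega_diag):
    """WLS given the diagonal of Omega (the error variances omega_t^2)."""
    w = 1.0 / np.sqrt(omega_diag)     # weights omega_t^{-1}
    Xs = X * w[:, None]               # every column, including the constant, is weighted
    ys = y * w
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

# Hypothetical heteroskedastic data
rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega_diag = rng.uniform(0.2, 5.0, size=n)
y = X @ np.array([2.0, 0.7]) + rng.normal(size=n) * np.sqrt(omega_diag)

beta_wls = wls(X, y, omega_diag)

# Check against the full GLS formula (forms the n x n matrix, for comparison only)
omega_inv = np.diag(1.0 / omega_diag)
beta_full = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)
```

Storing only the n weights instead of Ψ is what makes this special case cheap for large n.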

Generalized Nonlinear Least Squares


• Replacing the vector of regression functions $X\beta$ by $x(\beta)$, we obtain generalized
nonlinear least squares, or GNLS, estimates by minimizing the criterion function
$$(y - x(\beta))' \Omega^{-1} (y - x(\beta)).$$

• Differentiating with respect to β and dividing by −2 yields the moment conditions

$$X'(\beta) \Omega^{-1} (y - x(\beta)) = 0,$$

where $X(\beta)$ is the matrix of derivatives of $x(\beta)$ with respect to β.


5 Feasible Generalized Least Squares


GLS is Infeasible in Practice

• As we discussed before, even if the matrix Ψ is known, computation of GLS estimates is
expensive because there is an n × n matrix to be handled.

• Life is much easier if there is heteroskedasticity and no serial correlation. In this case, we
can simply use weighted least squares.

• But even in this case some information on $\omega_t$ is still necessary, such as the sampling design
or a direct relationship between $E(u_t^2)$ and some variable $z_t$ that can be used as a weight.

• In practice, the covariance matrix Ω is often not known even up to a scalar factor. This
makes it impossible to compute GLS estimates.

Estimating the Variance Matrix Ω

• In many cases it is reasonable to suppose that Ω, or Δ, depends in a known way on a
vector of unknown parameters γ, that is, to assume that $\Omega = \Omega(\gamma)$.

• In this case, if it is possible to obtain a consistent estimate of γ, then $\hat\Omega = \Omega(\hat\gamma)$ is
consistent for Ω.

• Then we can define $\Psi(\hat\gamma)$ such that

$$\hat\Omega^{-1} = \Psi(\hat\gamma) \Psi'(\hat\gamma),$$

and obtain GLS estimates conditional on $\Psi(\hat\gamma)$.

• The resulting estimator is called feasible generalized least squares, or feasible GLS.

Estimating Ω Using Skedastic Functions

• In the same way that a regression function determines the conditional mean of a random
variable, a skedastic function determines its conditional variance:

$$E(u_t^2 \mid X_t, z_t) = h(z_t; \gamma),$$

where γ is an l-vector of unknown parameters, and $z_t$ is a vector of observations on
exogenous or predetermined variables that belong to the information set on which we are
conditioning.

• An example of a skedastic function is $\exp(z_t \gamma)$, which conveniently produces positive
estimated variances for all γ.


Example of Feasible GLS Procedure

• Consider the linear regression model

$$y_t = X_t \beta + u_t, \qquad E(u_t^2) = \exp(z_t \gamma).$$

• In order to obtain consistent estimates of γ, we can start by obtaining consistent estimates
of the error terms from the vector of OLS residuals with typical element $\hat u_t$.

• We can then obtain OLS estimates $\hat\gamma$ by running the auxiliary linear regression

$$\log \hat u_t^2 = z_t \gamma + v_t.$$

• These estimates are then used to compute

$$\hat\omega_t = \big(\exp(z_t \hat\gamma)\big)^{1/2}$$

for all t.

• Finally, feasible GLS estimates of β are obtained by using ordinary least squares to estimate
the weighted regression, with the estimates $\hat\omega_t$ replacing the unknown $\omega_t$,
$$\frac{1}{\hat\omega_t} y_t = \frac{1}{\hat\omega_t} X_t \beta + \frac{1}{\hat\omega_t} u_t.$$

• This is an example of feasible weighted least squares.

• Under suitable regularity conditions, it can be shown that this type of procedure yields
a feasible GLS estimator $\hat\beta_f$ that is consistent and asymptotically equivalent to the GLS
estimator $\hat\beta_{gls}$.
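
The feasible weighted least squares steps above can be sketched as follows, assuming the exponential skedastic function of the example and simulated data. The function names `ols` and `feasible_wls`, the sample size, and the parameter values are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def feasible_wls(X, y, Z):
    """Feasible WLS assuming E(u_t^2) = exp(z_t gamma)."""
    u_hat = y - X @ ols(X, y)                 # step 1: OLS residuals
    gamma_hat = ols(Z, np.log(u_hat**2))      # step 2: auxiliary regression
    omega_hat = np.sqrt(np.exp(Z @ gamma_hat))  # step 3: estimated omega_t
    # step 4: OLS on the regression weighted by 1 / omega_hat
    beta_f = ols(X / omega_hat[:, None], y / omega_hat)
    return beta_f, gamma_hat, omega_hat

# Simulated data (hypothetical parameter values)
rng = np.random.default_rng(6)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
gamma_true = np.array([-1.0, 1.0])
u = np.exp(0.5 * (Z @ gamma_true)) * rng.normal(size=n)
y = X @ np.array([1.0, 2.0]) + u

beta_f, gamma_hat, omega_hat = feasible_wls(X, y, Z)
```

One caveat about this sketch: the intercept of the auxiliary regression absorbs the mean of the log of the squared standardized error, so it need not estimate the first element of γ. That shifts all the $\hat\omega_t$ by a common factor, which leaves the weighted least squares coefficients unchanged.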

Why Feasible GLS Works


Consistency of the GLS Estimator

• If we substitute $X\beta_0 + u$ for $y$ into the formula for the GLS estimator, we find that

$$\hat\beta_{gls} = \beta_0 + (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} u.$$

• Taking probability limits, after rearranging and multiplying each factor by an appropriate
power of n, we get

$$n^{1/2} (\hat\beta_{gls} - \beta_0) \overset{a}{=} \Big(\operatorname{plim} \frac{1}{n} X' \Omega^{-1} X\Big)^{-1} n^{-1/2} X' \Omega^{-1} u.$$

• As usual, we assume sufficient conditions for the probability limit in the first factor on the
right-hand side to be a non-stochastic, nonsingular k × k matrix.

• Then, we apply a CLT to the second factor to conclude that it is an asymptotically normal
random vector, and thus obtain root-n consistency and asymptotic normality.


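• Following the same argument for the feasible GLS estimator, we find that

$$n^{1/2} (\hat\beta_f - \beta_0) \overset{a}{=} \Big(\operatorname{plim} \frac{1}{n} X' \Omega^{-1}(\hat\gamma) X\Big)^{-1} n^{-1/2} X' \Omega^{-1}(\hat\gamma) u.$$

• Clearly, $\hat\beta_f$ will be asymptotically equivalent to $\hat\beta_{gls}$ if

$$\operatorname{plim} \frac{1}{n} X' \Omega^{-1}(\hat\gamma) X = \operatorname{plim} \frac{1}{n} X' \Omega^{-1} X$$
and
$$n^{-1/2} X' \Omega^{-1}(\hat\gamma) u \overset{a}{=} n^{-1/2} X' \Omega^{-1} u.$$

• For these conditions to hold, we typically need $\hat\gamma$ to be consistent, that is, $\operatorname{plim} \hat\gamma = \gamma$.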

Small Sample Properties of the Feasible GLS


• Whether or not feasible GLS is a desirable estimation method in practice depends on how
good an estimate of Ω can be obtained.

• If Ω(γ̂) is a very good estimate, then feasible GLS will have essentially the same properties
as GLS itself.

• As a result, inferences should be reasonably reliable, even though they will not be exact
in finite samples.

• On the other hand, if Ω(γ̂) is a poor estimate, feasible GLS estimates may have quite
different properties from real (infeasible) GLS estimates, and inferences may be quite
misleading.

Alternative Estimation Approaches


• It is possible to iterate a feasible GLS procedure, using $\hat\beta_f$ to compute a new set of residuals,
$\hat{\hat u}$.

• Then, use $\hat{\hat u}$ to obtain a second-round estimate $\hat{\hat\gamma}$, which can be used to calculate
second-round feasible GLS estimates, $\hat{\hat\beta}_f$, and so on.

• This procedure can either be stopped after a predetermined number of rounds or continued
until convergence is achieved (although convergence is not guaranteed).

• Iteration does not change the asymptotic distribution of the feasible GLS estimator, but
it does change its finite-sample distribution.

• Another way to estimate models in which the covariance matrix of the error terms depends
on one or more unknown parameters is to use the method of maximum likelihood.

• As we will see later on, in this case, β and γ are estimated jointly and consistency will
follow if the maximum likelihood regularity conditions are satisfied.

• In many cases, an iterated feasible GLS estimator will be the same as a maximum likelihood
estimator based on the assumption of normally distributed errors.
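
The iterated feasible GLS procedure described above can be sketched as a simple loop. This version assumes the exponential skedastic function $\exp(z_t \gamma)$ used in the earlier example; the function names, the stopping rule, and the simulated data are hypothetical, and convergence is reported rather than assumed.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def iterated_feasible_wls(X, y, Z, max_rounds=20, tol=1e-8):
    """Iterate feasible WLS, assuming E(u_t^2) = exp(z_t gamma)."""
    beta = ols(X, y)                               # round 0: plain OLS
    for rounds in range(1, max_rounds + 1):
        u_hat = y - X @ beta                       # residuals from current beta
        gamma = ols(Z, np.log(u_hat**2))           # updated skedastic parameters
        omega = np.sqrt(np.exp(Z @ gamma))
        beta_new = ols(X / omega[:, None], y / omega)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, gamma, rounds, converged

# Simulated data (hypothetical parameter values)
rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.exp(0.5 * (Z @ np.array([-0.5, 0.8]))) * rng.normal(size=n)
y = X @ np.array([1.0, 2.0]) + u

beta_it, gamma_it, rounds_used, converged = iterated_feasible_wls(X, y, Z)
```

The loop stops either when successive coefficient vectors differ by less than `tol` or after `max_rounds` rounds, mirroring the two stopping rules mentioned above.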


6 Testing for Heteroskedasticity


Model Specification and Heteroskedasticity

• It is important to note that in our usual setup, homoskedasticity is imposed as an assumption
in the model specification.

• If the true DGP is heteroskedastic, the heteroskedasticity will not be included in the estimated model, and
therefore there is a specification error.

• The specification error does not bias the OLS estimator, but renders it inefficient, as the
sandwich form of its covariance matrix suggests.

• As we have seen, we can compute asymptotically valid covariance matrix estimates for
the (inefficient) OLS and NLS parameter estimates.

• So, what if we choose to assume heteroskedasticity and settle for an inefficient estimator,
but the true DGP is homoskedastic?

• Simulation experiments suggest that this specification error frequently has little cost.

• This evidence can be taken as an indication that it may be prudent to employ an HCCME
anyway, especially if the sample size is large.

• However, in finite samples, tests and confidence intervals based on HCCMEs will always
be somewhat less reliable than ones based on the usual OLS covariance matrix under
homoskedasticity.

• If we have information on the form of the skedastic function, we might well wish to use
feasible generalized least squares, which is asymptotically equivalent to the efficient generalized
least squares estimator.

• However, the small-sample properties of feasible generalized least squares depend critically
on the estimates $\hat\Omega$.

• So, if the true DGP is homoskedastic and we assume heteroskedasticity, we can expect
that the specification error may be costly in small samples.

• So, before deciding to use an HCCME or a feasible GLS procedure, it is advisable to
perform a specification test of the null hypothesis that the error terms are homoskedastic.

Skedastic Function and Heteroskedasticity Testing

• Let us consider a reasonably general model of conditional heteroskedasticity, such as

$$E(u_t^2 \mid \Omega_t) = h(\delta + z_t \gamma),$$

where the skedastic function $h(\cdot)$ is a nonlinear function that can take on only positive
values, $z_t$ is a 1 × r vector of observations on exogenous or predetermined variables
that belong to the information set $\Omega_t$, δ is a scalar parameter, and γ is an r-vector of
parameters.


• Under the null hypothesis that γ = 0, the function $h(\delta + z_t \gamma)$ collapses to $h(\delta)$, a constant.

• If we think of the skedastic function as a regression equation in conditional expectation
form, then its error form can be written as

$$u_t^2 = h(\delta + z_t \gamma) + v_t.$$

• Alternatively, you can define $v_t$ as the difference between $u_t^2$ and its conditional expectation,
and rewrite the skedastic function as in the last expression.

• Suppose that we actually observe $u_t$. Then, we can test γ = 0 using a Gauss-Newton
regression

$$u_t^2 - h(\delta + z_t \gamma) = h'(\delta + z_t \gamma) b_\delta + h'(\delta + z_t \gamma) z_t b_\gamma + \text{residual},$$

where $h'(\cdot)$ is the first derivative of $h(\cdot)$, $b_\delta$ is the coefficient corresponding to δ, and $b_\gamma$ is the r-vector of
coefficients corresponding to γ.

GNR Testing for Heteroskedasticity

• Remember that we need to evaluate the GNR at “initial” parameter values.

• So, let us evaluate it at γ = 0 and $\delta = \tilde\delta \equiv h^{-1}(\tilde\sigma^2)$, where $\tilde\sigma^2$ is the sample variance of
$u_t$:
$$u_t^2 - \tilde\sigma^2 = h'(\tilde\delta) b_\delta + h'(\tilde\delta) z_t b_\gamma + \text{residual}.$$

• For the purpose of testing the null hypothesis that γ = 0, this regression is equivalent to

$$u_t^2 = b_\delta + z_t b_\gamma + \text{residual},$$

with a suitable redefinition of the artificial parameters $b_\delta$ and $b_\gamma$, which does not depend
on the functional form of $h(\cdot)$.

Residuals and Heteroskedasticity Testing

• It can be shown that replacing $u_t^2$ by $\hat u_t^2$ does not change the asymptotic distribution of
the $F$ and $nR^2$ statistics for testing the hypothesis $b_\gamma = 0$.

• The last issue is to choose the variables to be included in Z. White suggests including
all squares and cross-products of the variables in X (why?), which results in the White
Test.

• The general form of the test is basically the Breusch-Pagan Test. We will derive the
limiting distribution for this test later, in a more convenient framework.

• Since the asymptotic approximations for these test statistics may be inaccurate in finite
samples, bootstrapping them when the sample size is small or moderate may be a good
idea.
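
A minimal sketch of the $nR^2$ version of the test: the squared OLS residuals are regressed on a constant and the chosen variables Z, and n times the $R^2$ of that auxiliary regression is the test statistic. The function name `nr2_heteroskedasticity_test` and the simulated data are assumptions. The comparison with 3.84, the 5% critical value of a χ² distribution with one degree of freedom, relies on the standard asymptotic result for this statistic with r = 1, which the handout defers to a later lecture.

```python
import numpy as np

def nr2_heteroskedasticity_test(X, y, Z):
    """n * R^2 from regressing squared OLS residuals on a constant and Z.

    X is assumed to contain a constant column; Z must not contain one.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                     # squared OLS residuals
    A = np.column_stack([np.ones(len(y)), Z])    # auxiliary regressors
    coef, *_ = np.linalg.lstsq(A, u2, rcond=None)
    ssr = np.sum((u2 - A @ coef) ** 2)
    tss = np.sum((u2 - u2.mean()) ** 2)
    return len(y) * (1.0 - ssr / tss)

# Simulated data (hypothetical): one candidate variable z drives the variance
rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n)

y_het = X @ np.array([1.0, 2.0]) + np.exp(0.5 * z) * eps  # Var(u_t) = exp(z_t)
y_hom = X @ np.array([1.0, 2.0]) + eps                    # constant variance

stat_het = nr2_heteroskedasticity_test(X, y_het, z[:, None])
stat_hom = nr2_heteroskedasticity_test(X, y_hom, z[:, None])
```

For a White-type test, Z would instead hold the squares and cross-products of the regressors in X, excluding the constant.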
