ECON 5360 Class Notes
Heteroscedasticity
1 Introduction
In this chapter, we focus on the problem of heteroscedasticity within the multiple linear regression model.
Throughout, we assume that all other classical assumptions are satisfied. Assume the model is
$$Y = X\beta + \varepsilon \qquad (1)$$

where

$$E(\varepsilon\varepsilon') = \sigma^2\Omega = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \sigma_{n-1}^2 & 0 \\ 0 & 0 & \cdots & 0 & \sigma_n^2 \end{bmatrix}. \qquad (2)$$
Heteroscedasticity is a common occurrence in cross-sectional data. It can also occur in time series data
(e.g., AutoRegressive Conditional Heteroscedasticity, ARCH).
2 Ordinary Least Squares
We now examine several results related to OLS when heteroscedasticity is present in the model.
2.1 Summary of Findings
1. $b = (X'X)^{-1}X'Y$ is unbiased and consistent.

2. $\mathrm{var}(b) = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$ is the correct formula.

3. $\mathrm{var}(b) = \sigma^2 (X'X)^{-1}$ is the incorrect formula.

4. $b \overset{asy}{\sim} N\!\left(\beta,\ \tfrac{\sigma^2}{n}\, Q^{-1}\tilde{Q}Q^{-1}\right)$, where $\mathrm{plim}\,\tfrac{1}{n}(X'X) = Q$ and $\mathrm{plim}\,\tfrac{1}{n}(X'\Omega X) = \tilde{Q}$.
2.2 White’s Estimator of Var(b)
If we continue to use OLS, we need a good estimate of $\mathrm{var}(b) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$. White (1980)
suggests that if we don't know the form of $\Omega$, we can still find a consistent estimate of $\sigma^2 X'\Omega X$; that is,

$$S_0 = \frac{1}{n}\sum_{i=1}^{n} e_i^2\, x_i x_i'$$

will converge in probability to $\tfrac{1}{n}\sigma^2 X'\Omega X$, where the $e_i$ are the OLS residuals. Therefore, White's asymptotic
estimate of $\mathrm{var}(b)$ is

$$\mathrm{est.asy.var}(b) = (X'X)^{-1}\, nS_0\, (X'X)^{-1}.$$
Davidson and MacKinnon have shown that White's estimator can be unreliable in small samples and have
suggested appropriate modifications.
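The notes' applications use GAUSS; purely as an illustrative sketch, White's sandwich estimate $(X'X)^{-1}nS_0(X'X)^{-1}$ can be computed in NumPy as follows (the function name `white_vcov` and the data layout are our own choices, not from the notes):

```python
import numpy as np

def white_vcov(X, y):
    """White (1980) heteroscedasticity-consistent estimate of var(b).

    Returns the OLS coefficients b and the sandwich estimate
    (X'X)^{-1} n S0 (X'X)^{-1}, where S0 = (1/n) sum_i e_i^2 x_i x_i'.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y             # OLS estimates
    e = y - X @ b                     # OLS residuals
    S0 = (X.T * e**2) @ X / n         # (1/n) sum_i e_i^2 x_i x_i'
    V = XtX_inv @ (n * S0) @ XtX_inv  # White's sandwich estimate
    return b, V
```

The square roots of the diagonal of `V` are the heteroscedasticity-robust standard errors for `b`.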
2.3 Gauss Example
In this application, we are interested in measuring the degree of technical inefficiency of rice farmers in the
Ivory Coast. The data are both cross-sectional ($N = 154$ farmers) and time series ($T = 3$ years). The
model is

$$\ln(1/TE) = \alpha + X\beta + Z\gamma + \varepsilon$$

where $TE$ represents technical efficiency (i.e., the ratio of actual production to the efficient level from a production
frontier), $X$ is a set of managerial variables (e.g., years of experience, gender, age, education, etc.), and $Z$
is a set of exogenous variables (i.e., erosion, slope, weed density, pests, region dummies, year dummies,
etc.). The main point of the exercise is to see whether technical inefficiency is related to the managerial
characteristics of the rice farmers, once we have accounted for aspects of the production process outside their
control. See Gauss example 1 for further details.
3 Testing for Heteroscedasticity
All the tests below are based on the OLS residuals. This makes sense, at least asymptotically, because
$b \overset{p}{\to} \beta$.
3.1 Graphical Test
As a first step, it may be useful to graph $e_i^2$ or $e_i$ against any variable suspected of being related to the
heteroscedasticity. If you are unsure which variable is responsible, you can plot against $\hat{Y}_i = X_i b$, which is
simply a weighted sum of all the $X$ variables.
3.2 White’s Test
The advantage of White's test for heteroscedasticity (and similarly of White's estimator of $\mathrm{var}(b)$) is that you do
not need to know the specific form of $\Omega$. The null hypothesis is $H_0: \sigma_i^2 = \sigma^2\ \forall i$, and the alternative is that the
null is false. The motivation for the test is that if the null is true, $s^2(X'X)^{-1}$ and $s^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$
are both consistent estimators of $\mathrm{var}(b)$, while if the null is false, the two estimates will diverge. The test
procedure is

Regress $e_i^2$ on all the crosses and squares of $X$. The test statistic is $W = nR^2 \overset{asy}{\sim} \chi^2(P-1)$, where $P$
is the total number of regressors, including the constant.

The disadvantage of the test is that, because it is so general, it can easily detect sorts of misspecification
other than heteroscedasticity. Also, the test is nonconstructive, in the sense that once heteroscedasticity is
found, the test does not provide guidance on how to find an optimal estimator.
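The auxiliary regression above can be sketched in NumPy as follows; this is an illustration only (the helper `white_test` is ours), and it assumes the first column of `X` is a constant:

```python
import numpy as np
from itertools import combinations_with_replacement

def white_test(X, y):
    """White's general test: regress the squared OLS residuals on the
    levels, squares, and cross-products of the regressors.

    Returns W = n*R^2, asymptotically chi-squared(P-1) under the null,
    where P counts the auxiliary regressors including the constant.
    Assumes the first column of X is a constant (so the products with
    that column supply the constant and the levels).
    """
    n = X.shape[0]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ b) ** 2                       # squared OLS residuals
    # all squares and cross-products of the columns of X
    cols = [X[:, i] * X[:, j]
            for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
    Z = np.unique(np.column_stack(cols), axis=1)  # drop duplicate columns
    g = np.linalg.lstsq(Z, e2, rcond=None)[0]
    resid = e2 - Z @ g
    sst = (e2 - e2.mean()) @ (e2 - e2.mean())
    R2 = 1 - resid @ resid / sst
    return n * R2, Z.shape[1] - 1               # statistic and degrees of freedom
```

For a model with a constant and one regressor $x$, the auxiliary regressors are the constant, $x$, and $x^2$, so the degrees of freedom are $P - 1 = 2$.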
3.3 Goldfeld-Quandt Test
The Goldfeld-Quandt test addresses the disadvantage of White's test. It is a more powerful test that assumes
the sample can be divided into two groups, one with a low error variance and the other with a high error
variance. The trick is to find the variable on which to sort the data. The hypotheses are

$$H_0: \sigma_i^2 = \sigma^2\ \forall i$$
$$H_A: \sigma_n^2 \geq \sigma_{n-1}^2 \geq \cdots \geq \sigma_1^2$$

The test procedure is

1. Order the observations in ascending order according to the size of the error variances.
2. Omit $r$ central observations (often $r = n/3$).
3. Run two separate regressions: the first $(n-r)/2$ observations and the last $(n-r)/2$ observations.
4. Form the statistic $F = \dfrac{e_1'e_1/(n_1-k)}{e_2'e_2/(n_2-k)} \sim F(n_1-k,\ n_2-k)$, which requires that $\varepsilon \sim N(0, \sigma^2\Omega)$.
5. Reject or fail to reject the null hypothesis.
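The steps above can be sketched as follows. This is an illustration only (the helper `goldfeld_quandt` is ours); it follows the common convention of putting the higher-variance tail in the numerator, so large values of $F$ reject the null:

```python
import numpy as np

def goldfeld_quandt(X, y, sort_by, drop_frac=1/3):
    """Goldfeld-Quandt test sketch.

    Sort the observations by the variable suspected of driving the
    heteroscedasticity, drop the middle r = drop_frac*n observations,
    fit OLS separately on the two tails, and form the ratio of the
    residual variances (high-variance tail over low-variance tail).
    """
    n, k = X.shape
    order = np.argsort(sort_by)     # step 1: sort ascending
    X, y = X[order], y[order]
    r = int(drop_frac * n)          # step 2: omit r central observations
    m = (n - r) // 2
    def ssr(Xg, yg):                # step 3: separate OLS fits
        bg = np.linalg.lstsq(Xg, yg, rcond=None)[0]
        e = yg - Xg @ bg
        return e @ e
    # step 4: variance ratio, high-variance group in the numerator
    F = (ssr(X[n - m:], y[n - m:]) / (m - k)) / (ssr(X[:m], y[:m]) / (m - k))
    return F, (m - k, m - k)
```

With the error variance increasing in `sort_by`, the statistic should come out well above 1.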
3.4 Breusch-Pagan Test
One drawback of the Goldfeld-Quandt test is that you need to choose only one variable related to the
heteroscedasticity. Often there are many candidates. The Breusch-Pagan test allows you to choose a
vector, $z_i$, of variables causing the heteroscedasticity. The hypotheses are

$$H_0: \sigma_i^2 = \sigma^2\ \forall i$$
$$H_A: \sigma_i^2 = \sigma^2 f(\alpha_0 + \alpha' z_i).$$

The test statistic is

$$LM = \frac{g'Z(Z'Z)^{-1}Z'g}{2} \overset{asy}{\sim} \chi^2(P-1)$$

where $g_i = (e_i^2/\hat{\sigma}^2) - 1$ and $Z_i = (1, z_i)$. If $Z$ contains the regressors from White's test, then the two tests are
algebraically equivalent.
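The LM statistic is a direct matrix computation. As an illustrative sketch (the helper `breusch_pagan` is ours):

```python
import numpy as np

def breusch_pagan(X, y, z):
    """Breusch-Pagan LM test sketch.

    g_i = e_i^2/sigma2_hat - 1 with sigma2_hat = e'e/n, and
    LM = g'Z(Z'Z)^{-1}Z'g / 2, asymptotically chi-squared(P-1)
    under homoscedasticity, where Z = (1, z) has P columns.
    """
    n = X.shape[0]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / n                  # ML estimate of sigma^2
    g = e**2 / s2 - 1
    Z = np.column_stack([np.ones(n), z])
    LM = g @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T @ g / 2
    return LM, Z.shape[1] - 1
```

Since $\sum_i g_i = 0$ by construction, $g'Z(Z'Z)^{-1}Z'g$ is just the explained sum of squares from regressing $g$ on $Z$.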
3.5 Gauss Example (cont.)
We now perform the three tests for heteroscedasticity using the Ivory Coast rice-farming data. The Goldfeld-
Quandt test will not work because after sorting, the smaller X matrix is not of full rank. White’s test will
not work either because there are too many variables. See Gauss example 2 for the results from the
Breusch-Pagan test.
4 Generalized Least Squares
4.1 $\Omega$ is Known

Assume that the variance-covariance matrix of the errors is known (apart from the scalar $\sigma^2$) and is given
by (2). We learned that the efficient estimator is

$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y) = (X'P'PX)^{-1}(X'P'PY)$$

where $P\Omega P' = I$ and

$$P = \begin{bmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{bmatrix}.$$

GLS can be interpreted as "weighted least squares" because the transformation matrix $P$ weights every
observation by the inverse of its error standard deviation. Therefore, observations with the most inherent
uncertainty get the smallest weight.
Example. Let the model be

$$Y_i = \beta X_i + \varepsilon_i$$

where

$$\sigma_i^2 = \sigma^2 X_i^2.$$

The GLS estimator is therefore

$$\hat{\beta} = \left(\sum_i \frac{X_i^2}{\sigma^2 X_i^2}\right)^{-1} \sum_i \frac{X_i Y_i}{\sigma^2 X_i^2} = \frac{1}{n}\sum_i Y_i/X_i,$$

or the average y-x ratio.
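The algebra above can be checked numerically: weighted least squares with weights $1/X_i^2$ reproduces the average $y$-$x$ ratio exactly. A minimal sketch, with simulated data that is ours and purely hypothetical:

```python
import numpy as np

def gls_slope(x, y):
    """GLS for Y_i = beta*X_i + eps_i with sigma_i^2 = sigma^2*X_i^2:
    weighted least squares (no intercept) with weights 1/X_i^2."""
    w = 1.0 / x**2                        # inverse error variances (up to sigma^2)
    return (w * x * y).sum() / (w * x**2).sum()

rng = np.random.default_rng(4)            # hypothetical simulated data
x = rng.uniform(1.0, 4.0, size=100)
y = 2.0 * x + rng.normal(size=100) * x    # error sd proportional to X_i
beta_gls = gls_slope(x, y)
```

Algebraically, `beta_gls` equals `np.mean(y / x)`, the average y-x ratio.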
4.2 $\Omega$ is Unknown

There are too many $\sigma_i^2$ elements to estimate with a sample size equal to $n$. Therefore, we need to restrict
$\sigma_i^2$ so that it is a function of a smaller number of parameters (e.g., $\sigma_i^2 = \sigma^2 X_i^2$ or $\sigma_i^2 = \sigma^2 f(\alpha' z_i)$).
4.2.1 Two-Step Estimation
Since $\Omega$ is unknown, we need to estimate it. Let's refer to

$$\hat{\beta}_{FGLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$$

as the feasible GLS estimator. Consider the following two-step procedure for calculating $\hat{\beta}_{FGLS}$:

1. Estimate the regression model $e_i^2 = f(\alpha' z_i) + v_i$. Use $\hat{\alpha}$ to obtain the estimates $\hat{\sigma}_i^2 = f(\hat{\alpha}' z_i)$.
2. Calculate $\hat{\beta}_{FGLS}$.

Provided $\hat{\alpha}$ is a consistent estimate of $\alpha$ in step #1, then $\hat{\beta}_{FGLS}$ will be asymptotically efficient at step
#2. It may be possible to iterate steps #1 and #2 further, but nothing is gained asymptotically. Sometimes
it may be necessary to transform the regression model in step #1 (e.g., take natural logs of $\sigma_i^2 = \sigma^2\exp(\alpha' z_i)$).
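For the exponential specification, the two-step procedure can be sketched as follows (an illustration only; the helper `fgls` is ours):

```python
import numpy as np

def fgls(X, y, Z):
    """Two-step feasible GLS sketch for sigma_i^2 = sigma^2*exp(alpha'z_i).

    Step 1: OLS, then regress ln(e_i^2) on Z to get alpha_hat and
            fitted variances sigma2_hat_i = exp(Z @ alpha_hat).
    Step 2: weighted least squares with weights 1/sigma2_hat_i.
    """
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    alpha = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]
    s2 = np.exp(Z @ alpha)                 # estimated sigma_i^2 (up to scale)
    w = 1.0 / s2
    b_fgls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return b_fgls
```

Note that if `Z` is only a constant, the fitted weights are all equal and FGLS reduces to OLS, which is a handy sanity check.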
4.2.2 Maximum Likelihood Estimation
Write the heteroscedasticity generally as $\sigma_i^2 = \sigma^2 f_i(\alpha)$. The (normal) log likelihood function is

$$\ln L(\beta, \sigma^2, \alpha) = -\frac{n}{2}\left(\ln(2\pi) + \ln(\sigma^2)\right) - 0.5\sum_{i=1}^{n}\left[\ln f_i(\alpha) + \frac{(y_i - x_i'\beta)^2}{\sigma^2 f_i(\alpha)}\right].$$

The first-order conditions are

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{2\sigma^2}\sum_{i=1}^{n} \frac{2x_i y_i - 2x_i x_i'\beta}{f_i(\alpha)} = 0 \implies \sum_{i=1}^{n} \frac{x_i \varepsilon_i}{f_i(\alpha)} = 0 \qquad (3)$$

$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} \frac{\varepsilon_i^2}{f_i(\alpha)} = 0 \implies \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \frac{\varepsilon_i^2}{f_i(\alpha)} \qquad (4)$$

$$\frac{\partial \ln L}{\partial \alpha} = -\frac{1}{2}\sum_{i=1}^{n} \frac{g_i(\alpha)}{f_i(\alpha)} + \frac{1}{2\sigma^2}\sum_{i=1}^{n} \frac{\varepsilon_i^2\, g_i(\alpha)}{f_i(\alpha)^2} = 0 \qquad (5)$$

where $g_i(\alpha) = \partial f_i(\alpha)/\partial \alpha$. Notice that equation (3) gives the normal equation for GLS. Solving equations
(3) through (5) jointly for $\theta = \{\beta, \sigma^2, \alpha\}$ will produce the maximum likelihood estimates of the model. This
can be accomplished in a couple of different ways.
1. Brute force. Use one of the nonlinear optimization algorithms (e.g., Newton-Raphson) to maximize
the likelihood function.
2. Oberhofer and Kmenta two-step estimator. Start with a consistent estimator of $\alpha$. Use that estimate
to obtain estimates of $\beta$ and $\sigma^2$. Iterate back and forth until convergence.
The (efficient) asymptotic ML variance is given by the negative inverse of the information matrix

$$\mathrm{asy.var}(\hat{\theta}_{ML}) = -E\left[\frac{\partial^2 \ln L}{\partial \theta\, \partial \theta'}\right]^{-1}$$

and is given as equation (11-21) in Greene. If this matrix is not working well in the nonlinear optimization
algorithm or is not invertible, one could simply use the negative inverse Hessian (without expectations) or
the outer product of the gradients (OPG).
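The back-and-forth idea in the Oberhofer-Kmenta approach can be sketched for the exponential specification $\sigma_i^2 = \exp(\alpha' z_i)$. This is a simplified illustration (the helper `iterated_fgls` is ours): it iterates the two-step FGLS updates rather than solving the exact first-order condition (5) for $\alpha$, so it is an approximation to the ML iteration:

```python
import numpy as np

def iterated_fgls(X, y, Z, tol=1e-8, max_iter=100):
    """Simplified sketch of the iterate-until-convergence idea for
    sigma_i^2 = exp(alpha'z_i): alternate between (1) estimating alpha
    from the current residuals and (2) weighted least squares for beta,
    until the coefficients stop changing."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(max_iter):
        e = y - X @ b
        alpha = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]
        w = np.exp(-Z @ alpha)                 # weights 1/sigma2_hat_i
        b_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(b_new - b)) < tol:    # convergence check
            b = b_new
            break
        b = b_new
    return b, alpha
```

As the notes observe, the extra iterations gain nothing asymptotically over the single two-step pass.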
4.3 Model Based Test for Heteroscedasticity
As a final note, rather than use the OLS residuals to test for heteroscedasticity, one could test the null
hypothesis $H_0: \alpha = 0$ using one of the classical asymptotic tests. For example, the likelihood ratio test
would use

$$LR = -2[\ln(L_R) - \ln(L_U)] \overset{asy}{\sim} \chi^2(J)$$

where $L_R$ is the likelihood value with homoscedasticity imposed (i.e., $\alpha = 0$) and $L_U$ is the likelihood value
allowing for heteroscedasticity (i.e., $\alpha \neq 0$).
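As an illustration of the LR statistic, consider the simple case of two variance groups (a binary $z_i$), where the unrestricted MLE can be found by alternating WLS for $\beta$ with group residual variances. This sketch and the helper name `lr_het_test` are ours, not from the notes:

```python
import numpy as np

def lr_het_test(X, y, group):
    """Likelihood-ratio test sketch for group-wise heteroscedasticity.

    Restricted model: one common error variance (OLS is the MLE).
    Unrestricted model: a separate variance for each of two groups,
    fit by coordinate ascent (WLS for beta, then group variances).
    Returns LR = -2[lnL_R - lnL_U], asymptotically chi-squared(1).
    """
    n = X.shape[0]
    # restricted (homoscedastic) MLE and log likelihood
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / n
    lnL_R = -0.5 * n * (np.log(2 * np.pi) + np.log(s2) + 1)
    # unrestricted MLE: iterate group variances and WLS
    for _ in range(50):
        s2_g = np.array([np.mean(e[group == g] ** 2) for g in (0, 1)])
        w = 1.0 / s2_g[group]
        b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        e = y - X @ b
    s2_g = np.array([np.mean(e[group == g] ** 2) for g in (0, 1)])
    sig2_i = s2_g[group]
    lnL_U = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2_i) + e**2 / sig2_i)
    return -2 * (lnL_R - lnL_U)
```

Here $J = 1$ because the unrestricted model adds one free variance parameter; the statistic is compared against a $\chi^2(1)$ critical value.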
4.4 Gauss Application (cont.)
Using the Ivory Coast rice-farming example, we now calculate feasible GLS and ML estimates of $\beta$ and $\alpha$.
The heteroscedasticity is assumed to follow $\sigma_i^2 = \sigma^2\exp(\alpha' z_i)$, where $z_i = (1, region1_i, region2_i)$. See Gauss
example 3 for further details.