Hansen’s J Test: Is the model specification correct?
That is, is $E(z'u) = 0$ for $y = x\beta + u$ correct?
$H_0:\ E(z'u) = 0$ (The model is correct.)
$H_1:\ E(z'u) \neq 0$
The number of moment conditions is $r$, while the number of parameters is $k$.
The degrees of freedom are $r - k$.
\[
\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)'\,
\widehat{V}\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)^{-1}
\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)
\longrightarrow \chi^2(r - k),
\]
where $\hat{u}_i = y_i - x_i \hat{\beta}_{GMM}$.
$\widehat{V}\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)$ indicates the estimate of $V\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' u_i \Big)$ for $u_i = y_i - x_i \beta$.
The J test is called a test for over-identifying restrictions (過剰識別制約).
Remark 1: $X_1, X_2, \cdots, X_n$ are mutually independent.
$X_i \sim N(\mu, \sigma^2)$ is assumed.
Consider $\bar{X} = \dfrac{1}{n} \sum_{i=1}^n X_i$.
Then,
\[
\frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \longrightarrow N(0, 1).
\]
That is, $\sqrt{n}(\bar{X} - \mu) \longrightarrow N(0, \sigma^2)$.
Remark 2: $X_1, X_2, \cdots, X_n$ are mutually independent.
$X_i \sim N(\mu, \sigma^2)$ is assumed.
Then,
\[
\Big( \frac{X_i - \mu}{\sigma} \Big)^2 \sim \chi^2(1)
\qquad\text{and}\qquad
\sum_{i=1}^n \Big( \frac{X_i - \mu}{\sigma} \Big)^2 \sim \chi^2(n).
\]
If $\mu$ is replaced by its estimator $\bar{X}$, then
\[
\sum_{i=1}^n \Big( \frac{X_i - \bar{X}}{\sigma} \Big)^2 \sim \chi^2(n - 1).
\]
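As a quick numerical check of the loss of one degree of freedom, the following Python sketch (not part of the original notes; sample sizes and parameter values are arbitrary) simulates the statistic and compares its mean and variance with those of $\chi^2(n-1)$, namely $n-1$ and $2(n-1)$.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000            # illustrative choices
mu, sigma = 2.0, 3.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)
stat = (((x - xbar) / sigma) ** 2).sum(axis=1)

# chi^2(n-1) has mean n-1 = 9 and variance 2(n-1) = 18
print(stat.mean())   # close to 9
print(stat.var())    # close to 18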
Note:
\[
\begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix}'
\begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix}^{-1}
\begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix}
= \sum_{i=1}^n \Big( \frac{X_i - \bar{X}}{\sigma} \Big)^2 \sim \chi^2(n - 1)
\]
In the case of GMM,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' u_i \longrightarrow N(0, \Sigma),
\qquad\text{where } \Sigma = V\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' u_i \Big).
\]
Therefore, we obtain:
\[
\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' u_i \Big)' \Sigma^{-1} \Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' u_i \Big) \longrightarrow \chi^2(r).
\]
In order to obtain $\hat{u}_i$, we have to estimate $\beta$, which is a $k \times 1$ vector.
Therefore, replacing $u_i$ by $\hat{u}_i$, we have:
\[
\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)' \Sigma^{-1} \Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big) \longrightarrow \chi^2(r - k).
\]
Moreover, from $\hat{\Sigma} \longrightarrow \Sigma$, we obtain:
\[
\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big)' \hat{\Sigma}^{-1} \Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n z_i' \hat{u}_i \Big) \longrightarrow \chi^2(r - k),
\]
where $\hat{\Sigma}$ is a consistent estimator of $\Sigma$.
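As a concrete illustration, here is a minimal Python sketch of the statistic above for the linear model $y_i = x_i\beta + u_i$ with instruments $z_i$, computing $\hat{\Sigma}$ under the assumption of independent $u_i$. The function name and array layout (rows = observations) are hypothetical; $\hat{\beta}_{GMM}$ is taken as given.

import numpy as np

def hansen_j(y, X, Z, beta_hat):
    """J statistic for H0: E(z'u) = 0, assuming independent u_i."""
    n = len(y)
    u = y - X @ beta_hat                    # residuals u_hat_i
    m = Z * u[:, None]                      # i-th row: z_i' u_hat_i
    gbar = m.mean(axis=0)                   # (1/n) sum_i z_i' u_hat_i
    Sigma_hat = (m.T @ m) / n               # estimate of Sigma
    J = n * gbar @ np.linalg.solve(Sigma_hat, gbar)
    return J                                # ~ chi^2(r - k) under H0

Note that $\big(\frac{1}{\sqrt{n}}\sum_i z_i'\hat{u}_i\big)' \hat{\Sigma}^{-1} \big(\frac{1}{\sqrt{n}}\sum_i z_i'\hat{u}_i\big) = n\,\bar{g}'\hat{\Sigma}^{-1}\bar{g}$, which is what the last line computes.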
5.3 Generalized Method of Moments (GMM, 一般化積率法) II
— Nonlinear Case —
Consider the general case:
\[
E\big( h(\theta; w) \big) = 0,
\]
which is the orthogonality condition.
A $k \times 1$ vector $\theta$ denotes the parameter to be estimated.
$h(\theta; w)$ is an $r \times 1$ vector for $r \geq k$.
Let $w_i = (y_i, x_i)$ be the $i$th observed data, i.e., the $i$th realization of $w$.
Define $g(\theta; W)$ as:
\[
g(\theta; W) = \frac{1}{n} \sum_{i=1}^n h(\theta; w_i),
\]
where $W = \{w_n, w_{n-1}, \cdots, w_1\}$.
$g(\theta; W)$ is an $r \times 1$ vector for $r \geq k$.
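For example, in the linear IV case treated later (Example 2 below), $h(\theta; w_i) = z_i'(y_i - x_i\beta)$ and $g(\theta; W)$ is simply its sample mean. A minimal Python sketch with hypothetical array shapes (y: n-vector, X: n × k, Z: n × r):

import numpy as np

def g(beta, y, X, Z):
    """g(theta; W) = (1/n) sum_i z_i'(y_i - x_i beta), an r x 1 vector."""
    u = y - X @ beta                 # residuals
    return (Z * u[:, None]).mean(axis=0)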
Let $\hat{\theta}$ be the GMM estimator which minimizes:
\[
g(\theta; W)' S^{-1} g(\theta; W)
\]
with respect to $\theta$.
• Solve the following first-order condition:
\[
\frac{\partial g(\theta; W)'}{\partial \theta}\, S^{-1} g(\theta; W) = 0
\]
with respect to $\theta$. The moment conditions $g(\theta; W) = 0$ consist of $r$ equations in $k$ parameters, so for $r > k$ they cannot all hold exactly; the first-order condition gives $k$ equations in the $k$ parameters.
Computational Procedure:
Linearizing the first-order condition around $\theta = \hat{\theta}$,
\[
0 = \frac{\partial g(\theta; W)'}{\partial \theta}\, S^{-1} g(\theta; W)
\approx \frac{\partial g(\hat{\theta}; W)'}{\partial \theta}\, S^{-1} g(\hat{\theta}; W)
+ \frac{\partial g(\hat{\theta}; W)'}{\partial \theta}\, S^{-1} \frac{\partial g(\hat{\theta}; W)}{\partial \theta'}\, (\theta - \hat{\theta})
= \hat{D}' S^{-1} g(\hat{\theta}; W) + \hat{D}' S^{-1} \hat{D}\, (\theta - \hat{\theta}),
\]
where $\hat{D} = \dfrac{\partial g(\hat{\theta}; W)}{\partial \theta'}$, which is an $r \times k$ matrix.
Note that in the second term of the second line, the term involving the second derivative of $g$ is ignored.
Rewriting, we have the following equation:
\[
\theta - \hat{\theta} = -(\hat{D}' S^{-1} \hat{D})^{-1} \hat{D}' S^{-1} g(\hat{\theta}; W).
\]
Replacing $\theta$ and $\hat{\theta}$ by $\hat{\theta}^{(i+1)}$ and $\hat{\theta}^{(i)}$, respectively, we obtain:
\[
\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - (\hat{D}^{(i)\prime} S^{-1} \hat{D}^{(i)})^{-1} \hat{D}^{(i)\prime} S^{-1} g(\hat{\theta}^{(i)}; W),
\]
where $\hat{D}^{(i)} = \dfrac{\partial g(\hat{\theta}^{(i)}; W)}{\partial \theta'}$.
Given $S$, repeat the iterative procedure for $i = 1, 2, 3, \cdots$ until $\hat{\theta}^{(i+1)}$ is equal to $\hat{\theta}^{(i)}$.
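In code, this iteration looks as follows; a sketch assuming user-supplied functions g(theta) (the r × 1 moment vector) and D(theta) (its r × k Jacobian, e.g., by numerical differentiation). All names are hypothetical.

import numpy as np

def gmm_step(theta, g, D, S_inv, tol=1e-8, max_iter=100):
    """Iterate theta(i+1) = theta(i) - (D'S^-1 D)^-1 D'S^-1 g(theta(i))."""
    for _ in range(max_iter):
        Dk = D(theta)                       # r x k Jacobian of g
        gk = g(theta)                       # r x 1 moment vector
        A = Dk.T @ S_inv @ Dk               # k x k
        step = np.linalg.solve(A, Dk.T @ S_inv @ gk)
        theta_new = theta - step
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta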
How do we derive the weight matrix $S$?
• In the case where $h(\theta; w_i)$, $i = 1, 2, \cdots, n$, are mutually independent, $S$ is:
\begin{align*}
S &= V\big( \sqrt{n}\, g(\theta; W) \big) = n E\big( g(\theta; W) g(\theta; W)' \big) \\
&= n E\Big( \Big( \frac{1}{n} \sum_{i=1}^n h(\theta; w_i) \Big) \Big( \frac{1}{n} \sum_{j=1}^n h(\theta; w_j) \Big)' \Big)
= \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n E\big( h(\theta; w_i) h(\theta; w_j)' \big) \\
&= \frac{1}{n} \sum_{i=1}^n E\big( h(\theta; w_i) h(\theta; w_i)' \big),
\end{align*}
which is an $r \times r$ matrix.
Note that
(i) $E\big( h(\theta; w_i) \big) = 0$ for all $i$ and accordingly $E\big( g(\theta; W) \big) = 0$,
(ii) $g(\theta; W) = \dfrac{1}{n} \sum_{i=1}^n h(\theta; w_i) = \dfrac{1}{n} \sum_{j=1}^n h(\theta; w_j)$,
(iii) $E\big( h(\theta; w_i) h(\theta; w_j)' \big) = 0$ for $i \neq j$.
The estimator of $S$, denoted by $\hat{S}$, is given by:
\[
\hat{S} = \frac{1}{n} \sum_{i=1}^n h(\hat{\theta}; w_i) h(\hat{\theta}; w_i)' \longrightarrow S.
\]
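In code, $\hat{S}$ is just the average outer product of the moment contributions; a sketch where the n × r array m stacks $h(\hat{\theta}; w_i)'$ row by row:

import numpy as np

def s_hat_independent(m):
    """S_hat = (1/n) sum_i h_i h_i' for an n x r array m of moment contributions."""
    n = m.shape[0]
    return (m.T @ m) / n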
• Taking into account serial correlation of $h(\theta; w_i)$, $i = 1, 2, \cdots, n$, $S$ is given by:
\[
S = V\big( \sqrt{n}\, g(\theta; W) \big) = n E\big( g(\theta; W) g(\theta; W)' \big)
= n E\Big( \Big( \frac{1}{n} \sum_{i=1}^n h(\theta; w_i) \Big) \Big( \frac{1}{n} \sum_{j=1}^n h(\theta; w_j) \Big)' \Big)
= \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n E\big( h(\theta; w_i) h(\theta; w_j)' \big).
\]
Note that $E\Big( \sum_{i=1}^n h(\theta; w_i) \Big) = 0$.
Define $\Gamma_\tau = E\big( h(\theta; w_i) h(\theta; w_{i-\tau})' \big) < \infty$, i.e., $h(\theta; w_i)$ is stationary.
Stationarity:
(i) $E\big( h(\theta; w_i) \big)$ does not depend on $i$,
(ii) $E\big( h(\theta; w_i) h(\theta; w_{i-\tau})' \big)$ depends only on the time difference $\tau$.
$\Longrightarrow E\big( h(\theta; w_i) h(\theta; w_{i-\tau})' \big) = \Gamma_\tau$
\begin{align*}
S &= \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n E\big( h(\theta; w_i) h(\theta; w_j)' \big) \\
&= \frac{1}{n} \Big( E\big( h(\theta; w_1) h(\theta; w_1)' \big) + E\big( h(\theta; w_1) h(\theta; w_2)' \big) + \cdots + E\big( h(\theta; w_1) h(\theta; w_n)' \big) \\
&\qquad + E\big( h(\theta; w_2) h(\theta; w_1)' \big) + E\big( h(\theta; w_2) h(\theta; w_2)' \big) + \cdots + E\big( h(\theta; w_2) h(\theta; w_n)' \big) \\
&\qquad\ \vdots \\
&\qquad + E\big( h(\theta; w_n) h(\theta; w_1)' \big) + E\big( h(\theta; w_n) h(\theta; w_2)' \big) + \cdots + E\big( h(\theta; w_n) h(\theta; w_n)' \big) \Big) \\
&= \frac{1}{n} \Big( \Gamma_0 + \Gamma_1' + \Gamma_2' + \cdots + \Gamma_{n-1}' \\
&\qquad + \Gamma_1 + \Gamma_0 + \Gamma_1' + \cdots + \Gamma_{n-2}' \\
&\qquad\ \vdots \\
&\qquad + \Gamma_{n-1} + \Gamma_{n-2} + \Gamma_{n-3} + \cdots + \Gamma_0 \Big)
\end{align*}
\begin{align*}
&= \frac{1}{n} \Big( n \Gamma_0 + (n-1)(\Gamma_1 + \Gamma_1') + (n-2)(\Gamma_2 + \Gamma_2') + \cdots + (\Gamma_{n-1} + \Gamma_{n-1}') \Big) \\
&= \Gamma_0 + \sum_{i=1}^{n-1} \frac{n-i}{n} (\Gamma_i + \Gamma_i')
= \Gamma_0 + \sum_{i=1}^{n-1} \Big( 1 - \frac{i}{n} \Big) (\Gamma_i + \Gamma_i') \\
&\approx \Gamma_0 + \sum_{i=1}^{q} \Big( 1 - \frac{i}{q+1} \Big) (\Gamma_i + \Gamma_i').
\end{align*}
Note that $\Gamma_\tau' = E\big( h(\theta; w_{i-\tau}) h(\theta; w_i)' \big) = \Gamma_{-\tau}$, because $\Gamma_\tau = E\big( h(\theta; w_i) h(\theta; w_{i-\tau})' \big)$.
In the last line, $n$ is replaced by $q + 1$, where $q < n$.
We estimate $\Gamma_\tau$ as:
\[
\hat{\Gamma}_\tau = \frac{1}{n} \sum_{i=\tau+1}^n h(\hat{\theta}; w_i) h(\hat{\theta}; w_{i-\tau})'.
\]
When $\tau$ is large, $\hat{\Gamma}_\tau$ is based on few cross products and becomes unstable.
Therefore, we choose a $q$ which is less than $n$.
$S$ is estimated as:
\[
\hat{S} = \hat{\Gamma}_0 + \sum_{i=1}^{q} \Big( 1 - \frac{i}{q+1} \Big) (\hat{\Gamma}_i + \hat{\Gamma}_i')
\]
$\Longrightarrow$ the Newey–West estimator.
Note that $\hat{S} \longrightarrow S$, because $\hat{\Gamma}_\tau \longrightarrow \Gamma_\tau$ as $n \longrightarrow \infty$.
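A minimal Python sketch of the Newey–West estimator above, where m again stacks $h(\hat{\theta}; w_i)'$ row by row and q is the researcher-chosen truncation lag:

import numpy as np

def newey_west(m, q):
    """S_hat = Gamma_0 + sum_{i=1}^{q} (1 - i/(q+1)) (Gamma_i + Gamma_i')."""
    n = m.shape[0]
    S = (m.T @ m) / n                        # Gamma_0
    for tau in range(1, q + 1):
        Gamma = (m[tau:].T @ m[:-tau]) / n   # (1/n) sum_{i=tau+1}^n h_i h_{i-tau}'
        S += (1 - tau / (q + 1)) * (Gamma + Gamma.T)
    return S

The Bartlett weights $1 - i/(q+1)$ guarantee that the resulting $\hat{S}$ is positive semidefinite.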
Asymptotic Properties of GMM:
The GMM estimator is consistent and asymptotically normal:
\[
\sqrt{n}(\hat{\theta} - \theta) \longrightarrow N\big( 0, (D' S^{-1} D)^{-1} \big),
\]
where $D$ is an $r \times k$ matrix and $\hat{D}$ is an estimator of $D$, defined as:
\[
D = \frac{\partial g(\theta; W)}{\partial \theta'}, \qquad \hat{D} = \frac{\partial g(\hat{\theta}; W)}{\partial \theta'}.
\]
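In practice, the asymptotic variance is evaluated at $\hat{D}$ and $\hat{S}$, so $V(\hat{\theta}) \approx \frac{1}{n} (\hat{D}' \hat{S}^{-1} \hat{D})^{-1}$; a sketch of the resulting standard errors:

import numpy as np

def gmm_se(D_hat, S_hat, n):
    """Standard errors from Avar(theta_hat) = (1/n) (D'S^-1 D)^-1."""
    A = D_hat.T @ np.linalg.solve(S_hat, D_hat)   # D' S^-1 D
    avar = np.linalg.inv(A) / n
    return np.sqrt(np.diag(avar))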
Proof of Asymptotic Normality:
Assumption 1: $\hat{\theta} \longrightarrow \theta$.
Assumption 2: $\sqrt{n}\, g(\theta; W) \longrightarrow N(0, S)$, i.e., $S = \lim_{n \to \infty} V\big( \sqrt{n}\, g(\theta; W) \big)$.
The first-order condition of GMM is:
\[
\frac{\partial g(\theta; W)'}{\partial \theta}\, S^{-1} g(\theta; W) = 0.
\]
The GMM estimator, denoted by $\hat{\theta}$, satisfies the above equation.
Therefore, we have the following:
\[
\frac{\partial g(\hat{\theta}; W)'}{\partial \theta}\, \hat{S}^{-1} g(\hat{\theta}; W) = 0.
\]
Linearize $g(\hat{\theta}; W)$ around $\hat{\theta} = \theta$ as follows:
\[
g(\hat{\theta}; W) = g(\theta; W) + \frac{\partial g(\bar{\theta}; W)}{\partial \theta'} (\hat{\theta} - \theta) = g(\theta; W) + \bar{D} (\hat{\theta} - \theta),
\]
where $\bar{D} = \dfrac{\partial g(\bar{\theta}; W)}{\partial \theta'}$, and $\bar{\theta}$ is between $\hat{\theta}$ and $\theta$.
$\Longrightarrow$ Mean Value Theorem (平均値の定理)
Substituting the linear approximation into the first-order condition evaluated at $\hat{\theta}$, we obtain:
\begin{align*}
0 &= \hat{D}' \hat{S}^{-1} g(\hat{\theta}; W) \\
&= \hat{D}' \hat{S}^{-1} \big( g(\theta; W) + \bar{D} (\hat{\theta} - \theta) \big) \\
&= \hat{D}' \hat{S}^{-1} g(\theta; W) + \hat{D}' \hat{S}^{-1} \bar{D} (\hat{\theta} - \theta),
\end{align*}
which can be rewritten as:
\[
\hat{\theta} - \theta = -(\hat{D}' \hat{S}^{-1} \bar{D})^{-1} \hat{D}' \hat{S}^{-1} g(\theta; W).
\]
Note that $\bar{D} = \dfrac{\partial g(\bar{\theta}; W)}{\partial \theta'}$, where $\bar{\theta}$ is between $\hat{\theta}$ and $\theta$.
From Assumption 1, $\hat{\theta} \longrightarrow \theta$ implies $\bar{\theta} \longrightarrow \theta$.
Therefore,
\[
\sqrt{n}(\hat{\theta} - \theta) = -(\hat{D}' \hat{S}^{-1} \bar{D})^{-1} \hat{D}' \hat{S}^{-1} \times \sqrt{n}\, g(\theta; W).
\]
Accordingly, the GMM estimator $\hat{\theta}$ has the following asymptotic distribution:
\[
\sqrt{n}(\hat{\theta} - \theta) \longrightarrow N\big( 0, (D' S^{-1} D)^{-1} \big).
\]
Note that $\hat{D} \longrightarrow D$, $\bar{D} \longrightarrow D$, $\hat{S} \longrightarrow S$ and Assumption 2 are utilized.
Computational Procedure:
(1) Compute
\[
\hat{S}^{(i)} = \hat{\Gamma}_0 + \sum_{\tau=1}^{q} \Big( 1 - \frac{\tau}{q+1} \Big) (\hat{\Gamma}_\tau + \hat{\Gamma}_\tau'),
\qquad\text{where } \hat{\Gamma}_\tau = \frac{1}{n} \sum_{i=\tau+1}^n h(\hat{\theta}^{(i)}; w_i) h(\hat{\theta}^{(i)}; w_{i-\tau})'.
\]
$q$ is set by the researcher.
(2) Use the following iterative procedure:
\[
\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - (\hat{D}^{(i)\prime} \hat{S}^{(i)-1} \hat{D}^{(i)})^{-1} \hat{D}^{(i)\prime} \hat{S}^{(i)-1} g(\hat{\theta}^{(i)}; W).
\]
(3) Repeat (1) and (2) until $\hat{\theta}^{(i+1)}$ is equal to $\hat{\theta}^{(i)}$.
In (2), remember that when $S$ is given, we take the iterative procedure:
\[
\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - (\hat{D}^{(i)\prime} S^{-1} \hat{D}^{(i)})^{-1} \hat{D}^{(i)\prime} S^{-1} g(\hat{\theta}^{(i)}; W),
\qquad\text{where } \hat{D}^{(i)} = \frac{\partial g(\hat{\theta}^{(i)}; W)}{\partial \theta'}.
\]
Here $S$ is replaced by $\hat{S}^{(i)}$.
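Putting (1)–(3) together, a sketch of the iterated procedure, reusing the newey_west function sketched earlier; h_all(theta) (returning the n × r matrix of moment contributions) and jacobian(theta) (returning the r × k matrix $\hat{D}$) are hypothetical user-supplied functions.

import numpy as np

def iterated_gmm(theta0, h_all, jacobian, q, tol=1e-8, max_iter=200):
    theta = theta0
    for _ in range(max_iter):
        m = h_all(theta)                     # n x r matrix of h(theta; w_i)
        S = newey_west(m, q)                 # step (1): update S_hat
        gbar = m.mean(axis=0)                # g(theta; W)
        Dk = jacobian(theta)                 # r x k
        A = Dk.T @ np.linalg.solve(S, Dk)
        step = np.linalg.solve(A, Dk.T @ np.linalg.solve(S, gbar))
        theta_new = theta - step             # step (2)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new                 # step (3): convergence
        theta = theta_new
    return theta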
• If the assumption $E\big( h(\theta; w) \big) = 0$ is violated, the GMM estimator $\hat{\theta}$ is no longer consistent.
Therefore, we need to check whether $E\big( h(\theta; w) \big) = 0$ holds.
From Assumption 2, note the following:
\[
J = \big( \sqrt{n}\, g(\hat{\theta}; W) \big)' \hat{S}^{-1} \big( \sqrt{n}\, g(\hat{\theta}; W) \big) \longrightarrow \chi^2(r - k),
\]
which is called Hansen's J test.
Because there are $r$ moment conditions and $k$ parameters, the degrees of freedom are given by $r - k$.
If $J$ is small enough (i.e., insignificant), we judge that the specified model is correct.
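The judgment that "$J$ is small enough" is usually made via the $\chi^2(r-k)$ p-value; a sketch:

from scipy.stats import chi2

def j_test(J, r, k, alpha=0.05):
    """p-value of Hansen's J under chi^2(r - k); fail to reject H0 if p > alpha."""
    p = chi2.sf(J, r - k)
    return p, p > alpha          # (p-value, "model not rejected")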
Testing Hypotheses:
Remember that the GMM estimator $\hat{\theta}$ has the following asymptotic distribution:
\[
\sqrt{n}(\hat{\theta} - \theta) \longrightarrow N\big( 0, (D' S^{-1} D)^{-1} \big).
\]
Consider testing the following null and alternative hypotheses:
• The null hypothesis $H_0$: $R(\theta) = 0$,
• The alternative hypothesis $H_1$: $R(\theta) \neq 0$,
where $R(\theta)$ is a $p \times 1$ vector function for $p \leq k$.
$p$ denotes the number of restrictions.
$R(\theta)$ is linearized as: $R(\hat{\theta}) = R(\theta) + \bar{R}_\theta (\hat{\theta} - \theta)$, where $\bar{R}_\theta = \dfrac{\partial R(\bar{\theta})}{\partial \theta'}$, which is a $p \times k$ matrix.
Note that $\bar{\theta}$ is between $\hat{\theta}$ and $\theta$. If $\hat{\theta} \longrightarrow \theta$, then $\bar{\theta} \longrightarrow \theta$ and $\bar{R}_\theta \longrightarrow R_\theta$.
Under the null hypothesis $R(\theta) = 0$, we have $R(\hat{\theta}) = \bar{R}_\theta (\hat{\theta} - \theta)$, which implies that the distribution of $R(\hat{\theta})$ is equivalent to that of $\bar{R}_\theta (\hat{\theta} - \theta)$.
The distribution of $\sqrt{n}\, R(\hat{\theta})$ is given by:
\[
\sqrt{n}\, R(\hat{\theta}) = \sqrt{n}\, \bar{R}_\theta (\hat{\theta} - \theta) \longrightarrow N\big( 0, R_\theta (D' S^{-1} D)^{-1} R_\theta' \big).
\]
Therefore, under the null hypothesis, we have the following distribution:
\[
n R(\hat{\theta})' \big( R_\theta (D' S^{-1} D)^{-1} R_\theta' \big)^{-1} R(\hat{\theta}) \longrightarrow \chi^2(p).
\]
Practically, replacing $\theta$ by $\hat{\theta}$ in $R_\theta$, $D$ and $S$, we use the following test statistic:
\[
n R(\hat{\theta})' \big( R_{\hat{\theta}} (\hat{D}' \hat{S}^{-1} \hat{D})^{-1} R_{\hat{\theta}}' \big)^{-1} R(\hat{\theta}) \longrightarrow \chi^2(p)
\]
$\Longrightarrow$ Wald-type test
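A sketch of the Wald-type statistic above; R and R_jac (returning $R(\hat{\theta})$ and $R_{\hat{\theta}}$) are hypothetical user-supplied functions.

import numpy as np
from scipy.stats import chi2

def wald_test(theta_hat, R, R_jac, D_hat, S_hat, n):
    """n R' [R_th (D'S^-1 D)^-1 R_th']^-1 R ~ chi^2(p) under H0: R(theta) = 0."""
    r_val = R(theta_hat)                          # p x 1
    Rt = R_jac(theta_hat)                         # p x k
    A = D_hat.T @ np.linalg.solve(S_hat, D_hat)   # D' S^-1 D
    V = Rt @ np.linalg.solve(A, Rt.T)             # R_th (D'S^-1 D)^-1 R_th'
    W = n * r_val @ np.linalg.solve(V, r_val)
    return W, chi2.sf(W, len(r_val))              # statistic and p-value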
Examples of $h(\theta; w)$:
1. OLS:
Regression model: $y_i = x_i \beta + \epsilon_i$, $E(x_i' \epsilon_i) = 0$.
$h(\theta; w_i)$ is taken as:
\[
h(\theta; w_i) = x_i' (y_i - x_i \beta).
\]
2. IV (Instrumental Variable, 操作変数法):
Regression model: $y_i = x_i \beta + \epsilon_i$, $E(x_i' \epsilon_i) \neq 0$, $E(z_i' \epsilon_i) = 0$.
$h(\theta; w_i)$ is taken as:
\[
h(\theta; w_i) = z_i' (y_i - x_i \beta),
\]
where $z_i$ is a vector of instrumental variables.
When $z_i$ is a $1 \times k$ vector, the GMM estimator of $\beta$ is equivalent to the instrumental variable (IV) estimator (see the sketch after this list).
When $z_i$ is a $1 \times r$ vector for $r > k$, the GMM estimator of $\beta$ is equivalent to the two-stage least squares (2SLS) estimator.
3. NLS (Nonlinear Least Squares, 非線形最小二乗法):
Regression model: $f(y_i, x_i, \beta) = \epsilon_i$, $E(x_i' \epsilon_i) \neq 0$, $E(z_i' \epsilon_i) = 0$.
$h(\theta; w_i)$ is taken as:
\[
h(\theta; w_i) = z_i' f(y_i, x_i, \beta),
\]
where $z_i$ is a vector of instrumental variables.
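As referenced in Example 2, the just-identified case ($r = k$) can be checked directly: the sample moment conditions $\frac{1}{n}\sum_i z_i'(y_i - x_i \hat{\beta}) = 0$ can be solved exactly, giving the IV estimator. A sketch:

import numpy as np

def iv_estimator(y, X, Z):
    """Just-identified GMM / IV estimator: beta = (Z'X)^{-1} Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)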
Example: Demand function using STATA
Data: workers' households among two-or-more-person households (nationwide, Japan)
year
y  = real income (monthly, real data)
q1 = expenditure on cereals (annual, real data)
p1 = price of cereals (relative price = cereals CPI / overall CPI)
p2 = price of fish and shellfish (relative price = fish and shellfish CPI / overall CPI)
p3 = price of meat (relative price = meat CPI / overall CPI)
year y q1 p1 p2 p3
2000 567865 7087.0 1.043390 0.884965 0.818365
2001 561722 6993.1 1.032520 0.886179 0.822154
2002 553768 6934.4 1.031800 0.891282 0.834872
2003 539928 6816.8 1.050410 0.876543 0.843621
2004 547006 6651.6 1.089510 0.865226 0.868313
2005 541367 6615.8 1.020640 0.862745 0.887513
2006 540863 6523.7 1.000000 0.878601 0.891975
2007 543994 6680.5 0.994856 0.886831 0.908436
2008 541821 6494.7 1.043610 0.894523 0.932049
2009 533154 6477.3 1.066870 0.898148 0.934156
2010 539577 6458.2 1.040420 0.889119 0.924352
2011 529750 6448.4 1.025960 0.894081 0.925234
2012 538988 6377.6 1.057170 0.904366 0.917879
2013 542018 6360.7 1.047620 0.909938 0.916149
2014 523953 6174.6 1.016130 0.971774 0.960685
2015 525669 6268.0 1.000000 1.000000 1.000000
2016 527501 6244.8 1.018020 1.019020 1.017020
2017 531693 6106.6 1.027890 1.066730 1.025900
. tsset year
time variable: year, 2000 to 2017
delta: 1 unit
. reg q1 y p1 p2 p3 if year>2000.5
Source | SS df MS Number of obs = 17
-------------+---------------------------------- F(4, 12) = 25.83
Model | 913640.443 4 228410.111 Prob > F = 0.0000
Residual | 106100.077 12 8841.67308 R-squared = 0.8960
-------------+---------------------------------- Adj R-squared = 0.8613
Total | 1019740.52 16 63733.7825 Root MSE = 94.03
------------------------------------------------------------------------------
q1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y | .0067843 .0045443 1.49 0.161 -.003117 .0166856
p1 | -1128.834 998.7698 -1.13 0.280 -3304.966 1047.299
p2 | 356.8095 806.2301 0.44 0.666 -1399.815 2113.434
p3 | -3442.221 1130.078 -3.05 0.010 -5904.448 -979.9931
_cons | 6850.563 3179.316 2.15 0.052 -76.57278 13777.7
------------------------------------------------------------------------------
. gmm (q1-{b0}-{b1}*y-{b2}*p1-{b3}*p2-{b4}*p3) if year>2000.5, instruments(y p1
> p2 p3)
Step 1
Iteration 0: GMM criterion Q(b) = 42400764
Iteration 1: GMM criterion Q(b) = 6.781e-12
Iteration 2: GMM criterion Q(b) = 6.781e-12 (backed up)
Step 2
Iteration 0: GMM criterion Q(b) = 1.966e-15
Iteration 1: GMM criterion Q(b) = 1.963e-15 (backed up)
convergence not achieved
The Gauss-Newton stopping criterion has been met but missing standard errors
indicate some of the parameters are not identified.
GMM estimation
Number of parameters = 5
Number of moments = 5
Initial weight matrix: Unadjusted Number of obs = 17
GMM weight matrix: Robust
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/b0 | 6850.563 17645.71 0.39 0.698 -27734.4 41435.53
/b1 | .0067843 .0282325 0.24 0.810 -.0485504 .062119
/b2 | -1128.834 1057.915 -1.07 0.286 -3202.309 944.6415
/b3 | 356.8095 1565.86 0.23 0.820 -2712.219 3425.838
/b4 | -3442.221 5085.561 -0.68 0.498 -13409.74 6525.296
------------------------------------------------------------------------------
Instruments for equation 1: y p1 p2 p3 _cons
Warning: convergence not achieved
. gmm (q1-{b0}-{b1}*y-{b2}*p1-{b3}*p2-{b4}*p3) if year>2000.5, instruments(p1 p2
> p3 l.p1 l.p2 l.p3)
Step 1
Iteration 0: GMM criterion Q(b) = 42404066
Iteration 1: GMM criterion Q(b) = 2790.3146
Iteration 2: GMM criterion Q(b) = 2790.3146
Step 2
Iteration 0: GMM criterion Q(b) = .3201826
Iteration 1: GMM criterion Q(b) = .2469289
Iteration 2: GMM criterion Q(b) = .2469289
GMM estimation
Number of parameters = 5
Number of moments = 7
Initial weight matrix: Unadjusted Number of obs = 17
GMM weight matrix: Robust
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/b0 | -1192.466 4669.012 -0.26 0.798 -10343.56 7958.63
/b1 | .0186312 .0067682 2.75 0.006 .0053657 .0318967
/b2 | -1016.864 780.979 -1.30 0.193 -2547.554 513.8271
/b3 | -905.5585 598.0885 -1.51 0.130 -2077.79 266.6734
/b4 | -499.8064 1147.985 -0.44 0.663 -2749.815 1750.202
------------------------------------------------------------------------------
Instruments for equation 1: p1 p2 p3 L.p1 L.p2 L.p3 _cons
. estat overid
Test of overidentifying restriction:
Hansen’s J chi2(2) = 4.19779 (p = 0.1226)
Here $r - k = 7 - 5 = 2$. Since $p = 0.1226$ exceeds conventional significance levels, $H_0$: $E(z'u) = 0$ is not rejected, i.e., the overidentifying restrictions are consistent with the data.