
3 Delta Method

The delta method consists of using a Taylor expansion to approximate a random vector of the form $\phi(T_n)$ by the polynomial $\phi(\theta) + \phi'(\theta)(T_n - \theta) + \cdots$ in $T_n - \theta$. It is a simple but useful method to deduce the limit law of $\phi(T_n) - \phi(\theta)$ from that of $T_n - \theta$. Applications include the nonrobustness of the chi-square test for normal variances and variance-stabilizing transformations.

3.1 Basic Result


Suppose an estimator $T_n$ for a parameter $\theta$ is available, but the quantity of interest is $\phi(\theta)$ for some known function $\phi$. A natural estimator is $\phi(T_n)$. How do the asymptotic properties of $\phi(T_n)$ follow from those of $T_n$?

A first result is an immediate consequence of the continuous-mapping theorem. If the sequence $T_n$ converges in probability to $\theta$ and $\phi$ is continuous at $\theta$, then $\phi(T_n)$ converges in probability to $\phi(\theta)$.
Of greater interest is a similar question concerning limit distributions. In particular, if $\sqrt{n}(T_n - \theta)$ converges weakly to a limit distribution, is the same true for $\sqrt{n}(\phi(T_n) - \phi(\theta))$? If $\phi$ is differentiable, then the answer is affirmative. Informally, we have

$$\sqrt{n}\bigl(\phi(T_n) - \phi(\theta)\bigr) \approx \phi'(\theta)\,\sqrt{n}(T_n - \theta).$$

If $\sqrt{n}(T_n - \theta) \rightsquigarrow T$ for some variable $T$, then we expect that $\sqrt{n}(\phi(T_n) - \phi(\theta)) \rightsquigarrow \phi'(\theta)\,T$. In particular, if $\sqrt{n}(T_n - \theta)$ is asymptotically normal $N(0, \sigma^2)$, then we expect that $\sqrt{n}(\phi(T_n) - \phi(\theta))$ is asymptotically normal $N(0, \phi'(\theta)^2\sigma^2)$. This is proved in greater generality in the following theorem.
In the preceding paragraph it is silently understood that $T_n$ is real-valued, but we are more interested in considering statistics $\phi(T_n)$ that are formed out of several more basic statistics. Consider the situation that $T_n = (T_{n,1}, \ldots, T_{n,k})$ is vector-valued, and that $\phi : \mathbb{R}^k \mapsto \mathbb{R}^m$ is a given function defined at least on a neighbourhood of $\theta$. Recall that $\phi$ is differentiable at $\theta$ if there exists a linear map (matrix) $\phi'_\theta : \mathbb{R}^k \mapsto \mathbb{R}^m$ such that

$$\phi(\theta + h) - \phi(\theta) = \phi'_\theta(h) + o(\|h\|), \qquad h \to 0.$$

All the expressions in this equation are vectors of length $m$, and $\|h\|$ is the Euclidean norm. The linear map $h \mapsto \phi'_\theta(h)$ is sometimes called a "total derivative," as opposed to


partial derivatives. A sufficient condition for $\phi$ to be (totally) differentiable is that all partial derivatives $\partial\phi_i(x)/\partial x_j$ exist for $x$ in a neighborhood of $\theta$ and are continuous at $\theta$. (Just existence of the partial derivatives is not enough.) In any case, the total derivative is found from the partial derivatives. If $\phi$ is differentiable, then it is partially differentiable, and the derivative map $h \mapsto \phi'_\theta(h)$ is matrix multiplication by the matrix

$$\phi'_\theta = \begin{pmatrix} \dfrac{\partial\phi_1}{\partial x_1}(\theta) & \cdots & \dfrac{\partial\phi_1}{\partial x_k}(\theta) \\ \vdots & & \vdots \\ \dfrac{\partial\phi_m}{\partial x_1}(\theta) & \cdots & \dfrac{\partial\phi_m}{\partial x_k}(\theta) \end{pmatrix}.$$

If the dependence of the derivative $\phi'_\theta$ on $\theta$ is continuous, then $\phi$ is called continuously differentiable.

It is better to think of a derivative as a linear approximation $h \mapsto \phi'_\theta(h)$ to the function $h \mapsto \phi(\theta + h) - \phi(\theta)$ than as a set of partial derivatives. Thus the derivative at a point $\theta$ is a linear map. If the range space of $\phi$ is the real line (so that the derivative is a horizontal vector), then the derivative is also called the gradient of the function.

Note that what is usually called the derivative of a function $\phi : \mathbb{R} \mapsto \mathbb{R}$ does not completely correspond to the present derivative. The derivative at a point, usually written $\phi'(\theta)$, is written here as $\phi'_\theta$. Although $\phi'(\theta)$ is a number, the second object $\phi'_\theta$ is identified with the map $h \mapsto \phi'_\theta(h) = \phi'(\theta)h$. Thus in the present terminology the usual derivative function $\theta \mapsto \phi'(\theta)$ is a map from $\mathbb{R}$ into the set of linear maps from $\mathbb{R}$ to $\mathbb{R}$, not a map from $\mathbb{R}$ to $\mathbb{R}$. Graphically the "affine" approximation $h \mapsto \phi(\theta) + \phi'_\theta(h)$ is the tangent to the function $\phi$ at $\theta$.
3.1 Theorem. Let $\phi : \mathbb{D}_\phi \subset \mathbb{R}^k \mapsto \mathbb{R}^m$ be a map defined on a subset of $\mathbb{R}^k$ and differentiable at $\theta$. Let $T_n$ be random vectors taking their values in the domain of $\phi$. If $r_n(T_n - \theta) \rightsquigarrow T$ for numbers $r_n \to \infty$, then $r_n(\phi(T_n) - \phi(\theta)) \rightsquigarrow \phi'_\theta(T)$. Moreover, the difference between $r_n(\phi(T_n) - \phi(\theta))$ and $\phi'_\theta(r_n(T_n - \theta))$ converges to zero in probability.

Proof. Because the sequence $r_n(T_n - \theta)$ converges in distribution, it is uniformly tight and $T_n - \theta$ converges to zero in probability. By the differentiability of $\phi$ the remainder function $R(h) = \phi(\theta + h) - \phi(\theta) - \phi'_\theta(h)$ satisfies $R(h) = o(\|h\|)$ as $h \to 0$. Lemma 2.12 allows us to replace the fixed $h$ by a random sequence and gives

$$\phi(T_n) - \phi(\theta) - \phi'_\theta(T_n - \theta) = R(T_n - \theta) = o_P(\|T_n - \theta\|).$$

Multiply this left and right with $r_n$, and note that $o_P(r_n\|T_n - \theta\|) = o_P(1)$ by tightness of the sequence $r_n(T_n - \theta)$. This yields the last statement of the theorem. Because matrix multiplication is continuous, $\phi'_\theta(r_n(T_n - \theta)) \rightsquigarrow \phi'_\theta(T)$ by the continuous-mapping theorem. Apply Slutsky's lemma to conclude that the sequence $r_n(\phi(T_n) - \phi(\theta))$ has the same weak limit. ∎

A common situation is that $\sqrt{n}(T_n - \theta)$ converges to a multivariate normal distribution $N_k(\mu, \Sigma)$. Then the conclusion of the theorem is that the sequence $\sqrt{n}(\phi(T_n) - \phi(\theta))$ converges in law to the $N_m\bigl(\phi'_\theta\mu,\ \phi'_\theta\Sigma(\phi'_\theta)^T\bigr)$ distribution.
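As a numerical illustration (not part of the original text), the normal-limit statement can be checked by simulation. In the sketch below the Exp(1) observations, the choice $\phi(x) = x^2$, and all sample sizes are illustrative assumptions; the Monte Carlo standard deviation of $\sqrt{n}(\phi(\bar{X}) - \phi(\theta))$ should be close to the delta-method value $|\phi'(\theta)|\,\sigma$.

```python
# Minimal Monte Carlo sketch of the delta method (illustrative choices:
# Exp(1) observations, phi(x) = x**2, theta = 1, sigma^2 = 1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000
theta, sigma = 1.0, 1.0

xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)
lhs = np.sqrt(n) * (xbar**2 - theta**2)        # sqrt(n)(phi(T_n) - phi(theta))

print("Monte Carlo sd :", lhs.std().round(3))       # should be near 2.0
print("delta method sd:", abs(2 * theta) * sigma)   # |phi'(theta)| * sigma = 2.0
```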

3.2 Example (Sample variance). The sample variance of $n$ observations $X_1, \ldots, X_n$ is defined as $S^2 = n^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$ and can be written as $\phi(\bar{X}, \overline{X^2})$ for the function $\phi(x, y) = y - x^2$. (For simplicity of notation, we divide by $n$ rather than $n - 1$.) Suppose that $S^2$ is based on a sample from a distribution with finite first to fourth moments $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. By the multivariate central limit theorem,

$$\sqrt{n}\begin{pmatrix} \bar{X} - \alpha_1 \\ \overline{X^2} - \alpha_2 \end{pmatrix} \rightsquigarrow N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \alpha_2 - \alpha_1^2 & \alpha_3 - \alpha_1\alpha_2 \\ \alpha_3 - \alpha_1\alpha_2 & \alpha_4 - \alpha_2^2 \end{pmatrix}\right).$$

The map $\phi$ is differentiable at the point $\theta = (\alpha_1, \alpha_2)^T$, with derivative $\phi'_{(\alpha_1, \alpha_2)} = (-2\alpha_1, 1)$. Thus if the vector $(T_1, T_2)^T$ possesses the normal distribution in the last display, then

$$\sqrt{n}\bigl(\phi(\bar{X}, \overline{X^2}) - \phi(\alpha_1, \alpha_2)\bigr) \rightsquigarrow -2\alpha_1 T_1 + T_2.$$

The latter variable is normally distributed with zero mean and a variance that can be expressed in $\alpha_1, \ldots, \alpha_4$. In case $\alpha_1 = 0$, this variance is simply $\alpha_4 - \alpha_2^2$. The general case can be reduced to this case, because $S^2$ does not change if the observations $X_i$ are replaced by the centered variables $Y_i = X_i - \alpha_1$. Write $\mu_k = \mathrm{E}Y_i^k$ for the central moments of the $X_i$. Noting that $S^2 = \phi(\bar{Y}, \overline{Y^2})$ and that $\phi(\mu_1, \mu_2) = \mu_2$ is the variance of the original observations, we obtain

$$\sqrt{n}(S^2 - \mu_2) \rightsquigarrow N(0,\ \mu_4 - \mu_2^2).$$

In view of Slutsky's lemma, the same result is valid for the unbiased version $n/(n-1)\,S^2$ of the sample variance, because $\sqrt{n}\bigl(n/(n-1) - 1\bigr) \to 0$. □
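A short simulation (an illustration, not from the text) makes the limit concrete. For Exp(1) observations the central moments are $\mu_2 = 1$ and $\mu_4 = 9$, so the asymptotic variance of $\sqrt{n}(S^2 - \mu_2)$ should be $\mu_4 - \mu_2^2 = 8$:

```python
# Sketch checking sqrt(n)(S^2 - mu_2) ~ N(0, mu_4 - mu_2^2) for Exp(1) data,
# whose central moments are mu_2 = 1 and mu_4 = 9.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000
x = rng.exponential(1.0, size=(reps, n))
s2 = x.var(axis=1)                 # divides by n, as in the example
z = np.sqrt(n) * (s2 - 1.0)

print("Monte Carlo variance:", z.var().round(2))   # near mu_4 - mu_2^2 = 8
```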

3.3 Example (Level of the chi-square test). As an application of the preceding example, consider the chi-square test for testing variance. Normal theory prescribes to reject the null hypothesis $H_0 : \mu_2 \le 1$ for values of $nS^2$ exceeding the upper $\alpha$ point $\chi^2_{n,\alpha}$ of the $\chi^2_{n-1}$ distribution. If the observations are sampled from a normal distribution, then the test has exactly level $\alpha$. Is this still approximately the case if the underlying distribution is not normal? Unfortunately, the answer is negative.

For large values of $n$, this can be seen with the help of the preceding result. The central limit theorem and the preceding example yield the two statements

$$\sqrt{n}\left(\frac{S^2}{\mu_2} - 1\right) \rightsquigarrow N(0,\ \kappa + 2), \qquad \frac{\chi^2_{n-1} - (n - 1)}{\sqrt{2n - 2}} \rightsquigarrow N(0, 1),$$

where $\kappa = \mu_4/\mu_2^2 - 3$ is the kurtosis of the underlying distribution. The second statement implies that $\bigl(\chi^2_{n,\alpha} - (n - 1)\bigr)/\sqrt{2n - 2}$ converges to the upper $\alpha$ point $z_\alpha$ of the standard normal distribution. Thus the level of the chi-square test satisfies

$$P_{\mu_2 = 1}\bigl(nS^2 > \chi^2_{n,\alpha}\bigr) = P\left(\sqrt{n}\left(\frac{S^2}{\mu_2} - 1\right) > \frac{\chi^2_{n,\alpha} - n}{\sqrt{n}}\right) \to 1 - \Phi\left(\frac{z_\alpha\sqrt{2}}{\sqrt{\kappa + 2}}\right).$$

The asymptotic level reduces to $1 - \Phi(z_\alpha) = \alpha$ if and only if the kurtosis of the underlying distribution is 0. This is the case for normal distributions. On the other hand, heavy-tailed distributions have a much larger kurtosis. If the kurtosis of the underlying distribution is "close to" infinity, then the asymptotic level is close to $1 - \Phi(0) = 1/2$. We conclude that the level of the chi-square test is nonrobust against departures from normality that affect the value of the kurtosis. At least this is true if the critical values of the test are taken from the chi-square distribution with $n - 1$ degrees of freedom. If, instead, we would use a normal approximation to the distribution of $\sqrt{n}(S^2/\mu_2 - 1)$ the problem would not arise, provided the asymptotic variance $\kappa + 2$ is estimated accurately. Table 3.1 gives the level for two distributions with slightly heavier tails than the normal distribution. □

Table 3.1. Level of the test that rejects if $nS^2/\mu_2$ exceeds the 0.95 quantile of the $\chi^2_9$ distribution.

    Law                            Level
    Laplace                        0.12
    0.95 N(0,1) + 0.05 N(0,9)      0.12

Note: Approximations based on simulation of 10,000 samples.
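The Laplace row of Table 3.1 can be reproduced by a short simulation (a sketch under the setup above; scipy is assumed available for the chi-square quantile). A standard Laplace variable has variance 2, so it is scaled to variance 1 below:

```python
# Sketch of the Table 3.1 setup: level of the chi-square test for
# H0: mu_2 <= 1 at nominal level 0.05, with n = 10 Laplace observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 10, 10_000
crit = stats.chi2.ppf(0.95, df=n - 1)               # 0.95 quantile of chi^2_9

x = rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=(reps, n))   # variance 1
ns2 = n * x.var(axis=1)
print("estimated level:", (ns2 > crit).mean())      # around 0.12, not 0.05
```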

In the preceding example the asymptotic distribution of $\sqrt{n}(S^2 - \sigma^2)$ was obtained by the delta method. Actually, it can also and more easily be derived by a direct expansion. Write

$$\sqrt{n}(S^2 - \sigma^2) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - \sigma^2\right) - \sqrt{n}\,(\bar{X} - \mu)^2.$$

The second term converges to zero in probability; the first term is asymptotically normal by the central limit theorem. The whole expression is asymptotically normal by Slutsky's lemma.

Thus it is not always a good idea to apply general theorems. However, in many examples the delta method is a good way to package the mechanics of Taylor expansions in a transparent way.

3.4 Example. Consider the joint limit distribution of the sample variance $S^2$ and the $t$-statistic $\bar{X}/S$. Again for the limit distribution it does not make a difference whether we use a factor $n$ or $n - 1$ to standardize $S^2$. For simplicity we use $n$. Then $(S^2, \bar{X}/S)$ can be written as $\phi(\bar{X}, \overline{X^2})$ for the map $\phi : \mathbb{R}^2 \mapsto \mathbb{R}^2$ given by

$$\phi(x, y) = \left(y - x^2,\ \frac{x}{(y - x^2)^{1/2}}\right).$$

The joint limit distribution of $\sqrt{n}(\bar{X} - \alpha_1, \overline{X^2} - \alpha_2)$ is derived in the preceding example. The map $\phi$ is differentiable at $\theta = (\alpha_1, \alpha_2)$ provided $\sigma^2 = \alpha_2 - \alpha_1^2$ is positive, with derivative

$$\phi'_{(\alpha_1, \alpha_2)} = \begin{pmatrix} -2\alpha_1 & 1 \\ \dfrac{1}{\sigma} + \dfrac{\alpha_1^2}{\sigma^3} & -\dfrac{\alpha_1}{2\sigma^3} \end{pmatrix}.$$

It follows that the sequence $\sqrt{n}\bigl(S^2 - \sigma^2,\ \bar{X}/S - \alpha_1/\sigma\bigr)$ is asymptotically bivariate normally distributed, with zero mean and covariance matrix $\phi'_\theta\,\Sigma\,(\phi'_\theta)^T$, for $\Sigma$ the covariance matrix of the limit distribution in the preceding example. It is easy but uninteresting to compute this explicitly. □
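Although the explicit covariance is uninteresting, the delta-method sandwich $\phi'_\theta\,\Sigma\,(\phi'_\theta)^T$ is easy to evaluate and check numerically. The sketch below (an illustration, not from the text) does this for Exp(1) data, whose raw moments are $\alpha_k = k!$, using the derivative matrix displayed above:

```python
# Sketch of Example 3.4 for Exp(1) data: Monte Carlo covariance of
# sqrt(n)(S^2 - sigma^2, Xbar/S - a1/sigma) versus the delta-method sandwich.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 10_000
a1, a2, a3, a4 = 1.0, 2.0, 6.0, 24.0           # raw moments E X^k = k! of Exp(1)
sigma = np.sqrt(a2 - a1**2)

Sigma = np.array([[a2 - a1**2, a3 - a1 * a2],  # CLT covariance of (Xbar, Xbar2)
                  [a3 - a1 * a2, a4 - a2**2]])
D = np.array([[-2 * a1, 1.0],                  # derivative of phi at (a1, a2)
              [1 / sigma + a1**2 / sigma**3, -a1 / (2 * sigma**3)]])

x = rng.exponential(1.0, size=(reps, n))
s2 = x.var(axis=1)
z = np.sqrt(n) * np.column_stack([s2 - sigma**2,
                                  x.mean(axis=1) / np.sqrt(s2) - a1 / sigma])

print("Monte Carlo:\n", np.cov(z.T).round(2))
print("delta method:\n", (D @ Sigma @ D.T).round(2))
```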



3.5 Example (Skewness). The sample skewness of a sample $X_1, \ldots, X_n$ is defined as

$$l_n = \frac{n^{-1}\sum_{i=1}^n (X_i - \bar{X})^3}{\bigl(n^{-1}\sum_{i=1}^n (X_i - \bar{X})^2\bigr)^{3/2}}.$$

Not surprisingly it converges in probability to the skewness of the underlying distribution, defined as the quotient $\lambda = \mu_3/\sigma^3$ of the third central moment and the third power of the standard deviation of one observation. The skewness of a symmetric distribution, such as the normal distribution, equals zero, and the sample skewness may be used to test this aspect of normality of the underlying distribution. For large samples a critical value may be determined from the normal approximation for the sample skewness.

The sample skewness can be written as $\phi(\bar{X}, \overline{X^2}, \overline{X^3})$ for the function $\phi$ given by

$$\phi(a, b, c) = \frac{c - 3ab + 2a^3}{(b - a^2)^{3/2}}.$$

The sequence $\sqrt{n}(\bar{X} - \alpha_1, \overline{X^2} - \alpha_2, \overline{X^3} - \alpha_3)$ is asymptotically mean-zero normal by the central limit theorem, provided $\mathrm{E}X_1^6$ is finite. The value $\phi(\alpha_1, \alpha_2, \alpha_3)$ is exactly the population skewness. The function $\phi$ is differentiable at the point $(\alpha_1, \alpha_2, \alpha_3)$ and application of the delta method is straightforward. We can save work by noting that the sample skewness is location and scale invariant. With $Y_i = (X_i - \alpha_1)/\sigma$, the skewness can also be written as $\phi(\bar{Y}, \overline{Y^2}, \overline{Y^3})$. With $\lambda = \mu_3/\sigma^3$ denoting the skewness of the underlying distribution, the $Y$s satisfy

$$\sqrt{n}\begin{pmatrix} \bar{Y} \\ \overline{Y^2} - 1 \\ \overline{Y^3} - \lambda \end{pmatrix} \rightsquigarrow N\left(0,\ \begin{pmatrix} 1 & \lambda & \kappa + 3 \\ \lambda & \kappa + 2 & \mu_5/\sigma^5 - \lambda \\ \kappa + 3 & \mu_5/\sigma^5 - \lambda & \mu_6/\sigma^6 - \lambda^2 \end{pmatrix}\right).$$

The derivative of $\phi$ at the point $(0, 1, \lambda)$ equals $(-3, -3\lambda/2, 1)$. Hence, if $T$ possesses the normal distribution in the display, then $\sqrt{n}(l_n - \lambda)$ is asymptotically normally distributed with mean zero and variance equal to $\mathrm{var}(-3T_1 - 3\lambda T_2/2 + T_3)$. If the underlying distribution is normal, then $\lambda = \mu_5 = 0$, $\kappa = 0$ and $\mu_6/\sigma^6 = 15$. In that case the sample skewness is asymptotically $N(0, 6)$-distributed.

An approximate level $\alpha$ test for normality based on the sample skewness could be to reject normality if $\sqrt{n}|l_n| > \sqrt{6}\,z_{\alpha/2}$. Table 3.2 gives the level of this test for different values of $n$. □

Table 3.2. Level of the test that rejects if $\sqrt{n}|l_n|/\sqrt{6}$ exceeds the 0.975 quantile of the normal distribution, in the case that the observations are normally distributed.

    n      Level
    10     0.02
    20     0.03
    30     0.03
    50     0.05

Note: Approximations based on simulation of 10,000 samples.
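Table 3.2 can be reproduced with a few lines of simulation (a sketch; scipy is assumed available for the normal quantile):

```python
# Sketch reproducing Table 3.2: finite-sample level of the skewness test
# sqrt(n)|l_n| > sqrt(6) z_{alpha/2} for normal data, nominal level 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
reps = 10_000
crit = np.sqrt(6.0) * stats.norm.ppf(0.975)

for n in (10, 20, 30, 50):
    x = rng.standard_normal((reps, n))
    d = x - x.mean(axis=1, keepdims=True)
    ln = (d**3).mean(axis=1) / (d**2).mean(axis=1)**1.5
    print(n, (np.sqrt(n) * np.abs(ln) > crit).mean())   # ~0.02 to 0.05
```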

3.2 Variance-Stabilizing Transformations


Given a sequence of statistics $T_n$ with $\sqrt{n}(T_n - \theta) \rightsquigarrow N(0, \sigma^2(\theta))$ for a range of values of $\theta$, asymptotic confidence intervals for $\theta$ are given by

$$\left(T_n - z_\alpha\frac{\sigma(\theta)}{\sqrt{n}},\ T_n + z_\alpha\frac{\sigma(\theta)}{\sqrt{n}}\right).$$

These are asymptotically of level $1 - 2\alpha$ in that the probability that $\theta$ is covered by the interval converges to $1 - 2\alpha$ for every $\theta$. Unfortunately, as stated previously, these intervals are useless, because of their dependence on the unknown $\theta$. One solution is to replace the unknown standard deviations $\sigma(\theta)$ by estimators. If the sequence of estimators is chosen consistent, then the resulting confidence interval still has asymptotic level $1 - 2\alpha$. Another approach is to use a variance-stabilizing transformation, which often leads to a better approximation.

The idea is that no problem arises if the asymptotic variances $\sigma^2(\theta)$ are independent of $\theta$. Although this fortunate situation is rare, it is often possible to transform the parameter into a different parameter $\eta = \phi(\theta)$, for which this idea can be applied. The natural estimator for $\eta$ is $\phi(T_n)$. If $\phi$ is differentiable, then

$$\sqrt{n}\bigl(\phi(T_n) - \phi(\theta)\bigr) \rightsquigarrow N\bigl(0,\ \phi'(\theta)^2\sigma^2(\theta)\bigr).$$

For $\phi$ chosen such that $\phi'(\theta)\sigma(\theta) \equiv 1$, the asymptotic variance is constant and finding an asymptotic confidence interval for $\eta = \phi(\theta)$ is easy. The solution

$$\phi(\theta) = \int \frac{1}{\sigma(\theta)}\, d\theta$$

is a variance-stabilizing transformation. If it is well defined, then it is automatically monotone, so that a confidence interval for $\eta$ can be transformed back into a confidence interval for $\theta$.
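When $\sigma(\theta)$ has a closed form, the integral can be computed symbolically. As a sketch (sympy assumed available), taking $\sigma(\rho) = 1 - \rho^2$, the asymptotic standard deviation of the sample correlation coefficient in the example below, recovers the arctanh transformation:

```python
# Sketch: variance-stabilizing transformation phi = integral of 1/sigma(theta).
# With sigma(rho) = 1 - rho^2 the result is arctanh (SymPy may print it in log form).
import sympy as sp

rho = sp.symbols("rho")
phi = sp.integrate(1 / (1 - rho**2), rho)
print(phi)       # (1/2) log((1 + rho)/(1 - rho)) = arctanh(rho)
```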

3.6 Example (Correlation). Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with correlation coefficient $\rho$. The sample correlation coefficient is defined as

$$r = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\bigl(\sum_{i=1}^n (X_i - \bar{X})^2\ \sum_{i=1}^n (Y_i - \bar{Y})^2\bigr)^{1/2}}.$$

With the help of the delta method, it is possible to derive that $\sqrt{n}(r - \rho)$ is asymptotically zero-mean normal, with variance depending on the (mixed) third and fourth moments of $(X, Y)$. This is true for general underlying distributions, provided the fourth moments exist. Under the normality assumption the asymptotic variance can be expressed in the correlation of $X$ and $Y$. Tedious algebra gives

$$\sqrt{n}(r - \rho) \rightsquigarrow N\bigl(0,\ (1 - \rho^2)^2\bigr).$$

It does not work very well to base an asymptotic confidence interval directly on this result.

Table 3.3. Coverage probability of the asymptotic 95% confidence interval for the correlation coefficient, for two values of $n$ and five different values of the true correlation $\rho$.

    n      ρ = 0    ρ = 0.2    ρ = 0.4    ρ = 0.6    ρ = 0.8
    15     0.92     0.92       0.92       0.93       0.92
    25     0.93     0.94       0.94       0.94       0.94

Note: Approximations based on simulation of 10,000 samples.

Figure 3.1. Histogram of 1000 sample correlation coefficients, based on 1000 independent samples of the bivariate normal distribution with correlation 0.6, and histogram of the arctanh of these values.

The transformation

$$\phi(\rho) = \int \frac{1}{1 - \rho^2}\, d\rho = \frac{1}{2}\log\frac{1 + \rho}{1 - \rho} = \operatorname{arctanh}\rho$$

is variance stabilizing. Thus, the sequence $\sqrt{n}(\operatorname{arctanh} r - \operatorname{arctanh}\rho)$ converges to a standard normal distribution for every $\rho$. This leads to the asymptotic confidence interval for the correlation coefficient $\rho$ given by

$$\Bigl(\tanh\bigl(\operatorname{arctanh} r - z_\alpha/\sqrt{n}\bigr),\ \tanh\bigl(\operatorname{arctanh} r + z_\alpha/\sqrt{n}\bigr)\Bigr).$$

Table 3.3 gives an indication of the accuracy of this interval. Besides stabilizing the variance, the arctanh transformation has the benefit of symmetrizing the distribution of the sample correlation coefficient (which is perhaps of greater importance), as can be seen in Figure 3.1. □
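The interval and a coverage check in the spirit of Table 3.3 fit in a few lines (a sketch; numpy only, with $z_{0.025} \approx 1.96$ for a 95% interval):

```python
# Sketch: arctanh (Fisher z) confidence interval for rho and its coverage
# for n = 15 and rho = 0.6, cf. Table 3.3.
import numpy as np

rng = np.random.default_rng(5)
n, reps, rho, z = 15, 10_000, 0.6, 1.96

xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=(reps, n))
x, y = xy[..., 0], xy[..., 1]
dx = x - x.mean(axis=1, keepdims=True)
dy = y - y.mean(axis=1, keepdims=True)
r = (dx * dy).sum(axis=1) / np.sqrt((dx**2).sum(axis=1) * (dy**2).sum(axis=1))

lo = np.tanh(np.arctanh(r) - z / np.sqrt(n))
hi = np.tanh(np.arctanh(r) + z / np.sqrt(n))
print("coverage:", ((lo < rho) & (rho < hi)).mean())   # near 0.92-0.93
```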

*3.3 Higher-Order Expansions


To package a simple idea in a theorem has the danger of obscuring the idea. The delta
method is based on a Taylor expansion of order one. Sometimes a problem cannot be
exactly forced into the framework described by the theorem, but the principle of a Taylor
expansion is still valid.

In the one-dimensional case, a Taylor expansion applied to a statistic $T_n$ has the form

$$\phi(T_n) = \phi(\theta) + (T_n - \theta)\phi'(\theta) + \frac{1}{2}(T_n - \theta)^2\phi''(\theta) + \cdots.$$

Usually the linear term $(T_n - \theta)\phi'(\theta)$ is of higher order than the remainder, and thus determines the order at which $\phi(T_n) - \phi(\theta)$ converges to zero: the same order as $T_n - \theta$. Then the approach of the preceding section gives the limit distribution of $\phi(T_n) - \phi(\theta)$. If $\phi'(\theta) = 0$, this approach is still valid but not of much interest, because the resulting limit distribution is degenerate at zero. Then it is more informative to multiply the difference $\phi(T_n) - \phi(\theta)$ by a higher rate and obtain a nondegenerate limit distribution. Looking at the Taylor expansion, we see that the linear term disappears if $\phi'(\theta) = 0$, and we expect that the quadratic term determines the limit behavior of $\phi(T_n)$.

3.7 Example. Suppose that $\sqrt{n}\,\bar{X}$ converges weakly to a standard normal distribution. Because the derivative of $x \mapsto \cos x$ is zero at $x = 0$, the standard delta method of the preceding section yields that $\sqrt{n}(\cos\bar{X} - \cos 0)$ converges weakly to 0. It should be concluded that $\sqrt{n}$ is not the right norming rate for the random sequence $\cos\bar{X} - 1$. A more informative statement is that $-2n(\cos\bar{X} - 1)$ converges in distribution to a chi-square distribution with one degree of freedom. The explanation is that

$$\cos\bar{X} - \cos 0 = (\bar{X} - 0)\cdot 0 + \tfrac{1}{2}(\bar{X} - 0)^2\,(\cos x)''\big|_{x=0} + \cdots = -\tfrac{1}{2}\bar{X}^2 + \cdots.$$

That the remainder term is negligible after multiplication with $n$ can be shown along the same lines as the proof of Theorem 3.1. The sequence $n\bar{X}^2$ converges in law to a $\chi_1^2$ distribution by the continuous-mapping theorem; the sequence $-2n(\cos\bar{X} - 1)$ has the same limit, by Slutsky's lemma. □
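A quick simulation (a sketch with standard normal observations, so that $\sqrt{n}\,\bar{X}$ is exactly standard normal) confirms the chi-square limit:

```python
# Sketch of Example 3.7: quantiles of -2n(cos(Xbar) - 1) against chi^2_1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 1_000, 10_000
xbar = rng.standard_normal((reps, n)).mean(axis=1)
w = -2 * n * (np.cos(xbar) - 1)

for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(w, q).round(3), stats.chi2.ppf(q, df=1).round(3))
```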

A more complicated situation arises if the statistic $T_n$ is higher-dimensional with coordinates of different orders of magnitude. For instance, for a real-valued function $\phi$,

$$\phi(T_n) = \phi(\theta) + \sum_i (T_{n,i} - \theta_i)\frac{\partial\phi}{\partial x_i}(\theta) + \frac{1}{2}\sum_{i,j} (T_{n,i} - \theta_i)(T_{n,j} - \theta_j)\frac{\partial^2\phi}{\partial x_i\,\partial x_j}(\theta) + \cdots.$$

If the sequences $T_{n,i} - \theta_i$ are of different order, then it may happen, for instance, that the linear part involving $T_{n,i} - \theta_i$ is of the same order as the quadratic part involving $(T_{n,j} - \theta_j)^2$. Thus, it is necessary to determine carefully the rate of all terms in the expansion, and to rearrange these in decreasing order of magnitude, before neglecting the "remainder."

*3.4 Uniform Delta Method


Sometimes we wish to prove the asymptotic normality of a sequence $\sqrt{n}(\phi(T_n) - \phi(\theta_n))$ for centering vectors $\theta_n$ changing with $n$, rather than a fixed vector. If $\sqrt{n}(\theta_n - \theta) \to h$ for certain vectors $\theta$ and $h$, then this can be handled easily by decomposing

$$\sqrt{n}\bigl(\phi(T_n) - \phi(\theta_n)\bigr) = \sqrt{n}\bigl(\phi(T_n) - \phi(\theta)\bigr) - \sqrt{n}\bigl(\phi(\theta_n) - \phi(\theta)\bigr).$$

Several applications of Slutsky's lemma and the delta method yield as limit in law the vector $\phi'_\theta(T + h) - \phi'_\theta(h) = \phi'_\theta(T)$, if $T$ is the limit in distribution of $\sqrt{n}(T_n - \theta_n)$. For $\theta_n \to \theta$ at a slower rate, this argument does not work. However, the same result is true under a slightly stronger differentiability assumption on $\phi$.

3.8 Theorem. Let $\phi : \mathbb{R}^k \mapsto \mathbb{R}^m$ be a map defined and continuously differentiable in a neighborhood of $\theta$. Let $T_n$ be random vectors taking their values in the domain of $\phi$. If $r_n(T_n - \theta_n) \rightsquigarrow T$ for vectors $\theta_n \to \theta$ and numbers $r_n \to \infty$, then $r_n(\phi(T_n) - \phi(\theta_n)) \rightsquigarrow \phi'_\theta(T)$. Moreover, the difference between $r_n(\phi(T_n) - \phi(\theta_n))$ and $\phi'_\theta(r_n(T_n - \theta_n))$ converges to zero in probability.

Proof. It suffices to prove the last assertion. Because convergence in probability to zero of vectors is equivalent to convergence to zero of the components separately, it is no loss of generality to assume that $\phi$ is real-valued. For $0 \le t \le 1$ and fixed $h$, define $g_n(t) = \phi(\theta_n + th)$. For sufficiently large $n$ and sufficiently small $h$, both $\theta_n$ and $\theta_n + h$ are in a ball around $\theta$ inside the neighborhood on which $\phi$ is differentiable. Then $g_n : [0, 1] \mapsto \mathbb{R}$ is continuously differentiable with derivative $g_n'(t) = \phi'_{\theta_n + th}(h)$. By the mean-value theorem, $g_n(1) - g_n(0) = g_n'(\xi)$ for some $0 \le \xi \le 1$. In other words

$$\phi(\theta_n + h) - \phi(\theta_n) - \phi'_\theta(h) = \phi'_{\theta_n + \xi h}(h) - \phi'_\theta(h) =: R_n(h).$$

By the continuity of the map $\theta \mapsto \phi'_\theta$ there exists for every $\varepsilon > 0$ a $\delta > 0$ such that $\|\phi'_\xi(h) - \phi'_\theta(h)\| < \varepsilon\|h\|$ for every $\|\xi - \theta\| < \delta$ and every $h$. For sufficiently large $n$ and $\|h\| < \delta/2$, the vectors $\theta_n + \xi h$ are within distance $\delta$ of $\theta$, so that the norm $\|R_n(h)\|$ of the right side of the preceding display is bounded by $\varepsilon\|h\|$. Thus, for any $\eta > 0$,

$$P\Bigl(\bigl\|r_n\bigl(\phi(T_n) - \phi(\theta_n)\bigr) - \phi'_\theta\bigl(r_n(T_n - \theta_n)\bigr)\bigr\| > \eta\Bigr) \le P\bigl(\|T_n - \theta_n\| \ge \delta/2\bigr) + P\bigl(\varepsilon\, r_n\|T_n - \theta_n\| > \eta\bigr).$$

The first term converges to zero as $n \to \infty$. The second term can be made arbitrarily small by choosing $\varepsilon$ small. ∎

*3.5 Moments

So far we have discussed the stability of convergence in distribution under transformations. We can pose the same problem regarding moments: Can an expansion for the moments of $\phi(T_n) - \phi(\theta)$ be derived from a similar expansion for the moments of $T_n - \theta$? In principle the answer is affirmative, but unlike in the distributional case, in which a simple derivative of $\phi$ is enough, global regularity conditions on $\phi$ are needed to argue that the remainder terms are negligible.

One possible approach is to apply the distributional delta method first, thus yielding the qualitative asymptotic behavior. Next, the convergence of the moments of $\phi(T_n) - \phi(\theta)$ (or a remainder term) is a matter of uniform integrability, in view of Lemma 2.20. If $\phi$ is uniformly Lipschitz, then this uniform integrability follows from the corresponding uniform integrability of $T_n - \theta$. If $\phi$ has an unbounded derivative, then the connection between moments of $\phi(T_n) - \phi(\theta)$ and $T_n - \theta$ is harder to make, in general.

Notes

The delta method belongs to the folklore of statistics. It is not entirely trivial; proofs are sometimes based on the mean-value theorem and then require continuous differentiability in a neighborhood. A generalization to functions on infinite-dimensional spaces is discussed in Chapter 20.

PROBLEMS

1. Find the joint limit distribution of $\bigl(\sqrt{n}(\bar{X} - \mu), \sqrt{n}(S^2 - \sigma^2)\bigr)$ if $\bar{X}$ and $S^2$ are based on a sample of size $n$ from a distribution with finite fourth moment. Under what condition on the underlying distribution are $\sqrt{n}(\bar{X} - \mu)$ and $\sqrt{n}(S^2 - \sigma^2)$ asymptotically independent?

2. Find the asymptotic distribution of $\sqrt{n}(r - \rho)$ if $r$ is the correlation coefficient of a sample of $n$ bivariate vectors with finite fourth moments. (This is quite a bit of work. It helps to assume that the mean and the variance are equal to 0 and 1, respectively.)

3. Investigate the asymptotic robustness of the level of the $t$-test for testing the mean that rejects $H_0 : \mu \le 0$ if $\sqrt{n}\bar{X}/S$ is larger than the upper $\alpha$ quantile of the $t_{n-1}$ distribution.

4. Find the limit distribution of the sample kurtosis $k_n = n^{-1}\sum_{i=1}^n (X_i - \bar{X})^4/S^4 - 3$, and design an asymptotic level $\alpha$ test for normality based on $k_n$. (Warning: At least 500 observations are needed to make the normal approximation work in this case.)

5. Design an asymptotic level $\alpha$ test for normality based on the sample skewness and kurtosis jointly.

6. Let $X_1, \ldots, X_n$ be i.i.d. with expectation $\mu$ and variance 1. Find constants such that $a_n(\bar{X}_n^2 - b_n)$ converges in distribution if $\mu = 0$ or $\mu \ne 0$.

7. Let $X_1, \ldots, X_n$ be a random sample from the Poisson distribution with mean $\theta$. Find a variance-stabilizing transformation for the sample mean, and construct a confidence interval for $\theta$ based on this.

8. Let $X_1, \ldots, X_n$ be i.i.d. with expectation 1 and finite variance. Find the limit distribution of $\sqrt{n}(\bar{X}_n^{-1} - 1)$. If the random variables are sampled from a density $f$ that is bounded and strictly positive in a neighborhood of zero, show that $\mathrm{E}|\bar{X}_n^{-1}| = \infty$ for every $n$. (The density of $\bar{X}_n$ is bounded away from zero in a neighborhood of zero for every $n$.)
