Dirac Delta in Statistics

Andre Khuri
University of Florida
1. Introduction
The Dirac delta function (δ-function) was introduced by Paul Dirac at the end
of the 1920s in an effort to create the mathematical tools for the development of
quantum field theory (see Dirac [2]). It has since been used with great success in
applied mathematics and mathematical physics.
The δ-function does not actually conform to the usual mathematical definition
of a function, and is therefore referred to as a generalized function. Dirac initially
called it an “improper function” (see Dirac [3], page 58), and he denoted it by δ(x),
−∞ < x < ∞. The following are some of the basic properties of δ(x); more details
can be found in Hoskins [6], Kanwal [8], Saichev and Woyczynski [11]:
(a) $\delta(x) = 0$ if $x \neq 0$, and $\int_{-\infty}^{\infty} \delta(x)\,dx = 1$.

(b) If $f(x)$ is continuous in a neighborhood of $x_0$, then
$$\int_{-\infty}^{\infty} f(x)\,\delta(x - x_0)\,dx = f(x_0). \tag{1}$$
Formula (1) represents the so-called sifting, or sampling, property of the δ-
function. This formula can also be written as
$$\int_a^b f(x)\,\delta(x - x_0)\,dx = f(x_0) \tag{2}$$
for any a, b such that a < x0 < b. The integral in (2) represents a so-called linear
functional, which assigns to the continuous function f (x) the value f (x0 ).
(d) If $f(x)$ is any function with continuous derivatives up to the $n$th order in some neighborhood of $x_0$, then
$$\int_a^b f(x)\,\delta^{(n)}(x - x_0)\,dx = (-1)^n f^{(n)}(x_0), \quad n \ge 0, \tag{3}$$
for any $a$, $b$ such that $a < x_0 < b$.
(e) If $f(x)$ has simple zeros at $x_1, x_2, \ldots, x_n$ and is differentiable at these points with $f'(x_i) \neq 0$ for $i = 1, 2, \ldots, n$, then
$$\delta[f(x)] = \sum_{i=1}^{n} \frac{\delta(x - x_i)}{|f'(x_i)|}. \tag{5}$$
In particular, if $f(x)$ has only one simple zero at $x = x_0$ and $f'(x_0) \neq 0$, then
$$\delta[f(x)] = \frac{\delta(x - x_0)}{|f'(x_0)|}. \tag{6}$$
In the event $f(x)$ has higher-order zeros, no significance is attached to $\delta[f(x)]$ (see Kanwal [8], page 49). As an example of (6), if $f(x) = ax$, where $a$ is a nonzero constant, then
$$\delta(ax) = \frac{\delta(x)}{|a|}. \tag{7}$$
Using a = −1 in (7) we conclude that δ(−x) = δ(x), which indicates that δ(x)
is an even function.
Formula (5) remains true for an infinite set of simple zeros; for example, for $-\infty < x < \infty$, we have
$$\delta(\sin x) = \sum_{n=-\infty}^{\infty} \frac{\delta(x - n\pi)}{|\cos(n\pi)|} = \sum_{n=-\infty}^{\infty} \delta(x - n\pi).$$
(f) A function closely related to the δ-function is the Heaviside function $H(x)$, defined as the unit step function,
$$H(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases} \tag{8}$$
The δ-function is the generalized derivative of $H(x)$, that is,
$$\delta(x) = \frac{dH(x)}{dx} \tag{9}$$
[see, for example, Hoskins ([6], page 34), Hsu ([7], page 58)]. From (9) it follows that for any fixed $x_0$,
$$\delta(x - x_0) = \frac{dH(x - x_0)}{dx} = -\frac{dH(x_0 - x)}{dx}. \tag{10}$$
(g) The definition of the δ-function can be extended to $\mathbb{R}^n$, the $n$-dimensional Euclidean space. Thus, if $\mathbf{x} \in \mathbb{R}^n$ and $f(\mathbf{x})$ is a continuous function in a neighborhood of $\mathbf{x} = \mathbf{x}_0$, then
$$\int_{\mathbb{R}^n} f(\mathbf{x})\,\delta(\mathbf{x} - \mathbf{x}_0)\,d\mathbf{x} = f(\mathbf{x}_0), \tag{11}$$
where $d\mathbf{x} = dx_1\,dx_2 \cdots dx_n$. See, for example, Saichev and Woyczynski ([11], page 28).
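Several of these properties can be checked symbolically. The following is a minimal sketch using SymPy's DiracDelta and Heaviside objects; the particular test function and constants are arbitrary choices for illustration.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Sifting property, formula (1): integrating f(x)*delta(x - x0) recovers f(x0)
print(sp.integrate(sp.cos(x) * sp.DiracDelta(x - sp.pi / 3),
                   (x, -sp.oo, sp.oo)))          # cos(pi/3) = 1/2

# Formula (7): delta(a*x) = delta(x)/|a|, here with a = -3
print(sp.DiracDelta(-3 * x).expand(diracdelta=True, wrt=x))  # DiracDelta(x)/3

# Formula (9): the delta function is the derivative of the Heaviside function
print(sp.Heaviside(x).diff(x))                   # DiracDelta(x)
```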
2. Applications in statistics

2.1. Representation of discrete distributions
Suppose that $X$ is a discrete random variable that assumes the values $a_1, a_2, \ldots, a_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$ such that $\sum_{i=1}^{n} p_i = 1$. The probability mass function, $p(x)$, of $X$ can be represented as a generalized function of the form
$$p(x) = \sum_{i=1}^{n} p_i\,\delta(x - a_i). \tag{12}$$
The moments of $X$ can then be derived using integral notation instead of summation. For example, the $k$th noncentral moment of $X$ is written as
$$\begin{aligned} \int_{-\infty}^{\infty} x^k p(x)\,dx &= \int_{-\infty}^{\infty} x^k \sum_{i=1}^{n} p_i\,\delta(x - a_i)\,dx \\ &= \sum_{i=1}^{n} p_i \int_{-\infty}^{\infty} x^k \delta(x - a_i)\,dx \\ &= \sum_{i=1}^{n} a_i^k\,p_i, \end{aligned}$$
as can be seen from applying formula (1) to $f(x) = x^k$. Formula (12) is still applicable if the $a_i$'s are vector valued in $\mathbb{R}^m$. Thus, if $\mathbf{x}$ and the $\mathbf{a}_i$'s are in $\mathbb{R}^m$, then
$$p(\mathbf{x}) = \sum_{i=1}^{n} p_i\,\delta(\mathbf{x} - \mathbf{a}_i).$$
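For a concrete check of formula (12) and the moment calculation above, here is a short SymPy sketch with an arbitrary three-point distribution (the values and probabilities are illustrative assumptions, not taken from the text):

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Hypothetical three-point distribution: values a_i with probabilities p_i
a = [0, 1, 2]
p = [sp.Rational(1, 4), sp.Rational(1, 2), sp.Rational(1, 4)]

# Formula (12): p(x) as a sum of shifted delta functions
pmf = sum(pi * sp.DiracDelta(x - ai) for ai, pi in zip(a, p))

def moment(k):
    """k-th noncentral moment via the sifting property, formula (1)."""
    return sp.integrate(x**k * pmf, (x, -sp.oo, sp.oo))

print(moment(1))  # 1    (= sum of a_i * p_i)
print(moment(2))  # 3/2  (= sum of a_i^2 * p_i)
```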
2.2. Transformation of random variables

Let $X$ be a continuous random variable with density function $f(x)$, and let $Y = g(X)$. The density function, $\lambda(y)$, of $Y$ can then be represented as
$$\lambda(y) = \int_{-\infty}^{\infty} f(x)\,\delta[y - g(x)]\,dx. \tag{13}$$
One interesting advantage of this representation is that it does not require the function $g(\cdot)$ to be one-to-one, nor does it involve the computation of the Jacobian, as is usually the case with the conventional change-of-variable technique.
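A small SymPy sketch of formula (13) illustrates this point for the non-one-to-one transformation $Y = X^2$ with $X$ standard normal (an assumed example): the composed delta is expanded by property (e) and then sifted by formula (1).

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.symbols('y', positive=True)

# Density of Y = X^2 for X ~ N(0, 1) via formula (13); g(x) = x^2 is not
# one-to-one, yet no case-splitting or Jacobian bookkeeping is needed
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density
integrand = f * sp.DiracDelta(y - x**2)

# Expand the composed delta by property (e) (simple zeros at x = +/- sqrt(y)),
# then integrate by the sifting property
lam = sp.integrate(integrand.expand(diracdelta=True, wrt=x), (x, -sp.oo, sp.oo))
print(sp.simplify(lam))   # exp(-y/2)/sqrt(2*pi*y): chi-square density, 1 d.f.
```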
Formula (13) can be extended to a single transformation involving several random variables. If $Y = u(X_1, X_2, \ldots, X_n)$, where the $X_i$'s are continuous random variables with a joint density function $f(\mathbf{x})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$, then the density function, $\lambda(y)$, of $Y$ is given by
$$\lambda(y) = \int_{-\infty}^{\infty} f(\mathbf{x})\,\delta[y - u(\mathbf{x})]\,d\mathbf{x}, \tag{14}$$
where the integral is n-dimensional. A couple of examples were given by Au and Tam
[1] to illustrate the usefulness of this representation. Au and Tam [1] also pointed
out that the integral in (14) is considerably easier to use and more direct than the
conventional approach which requires the introduction of n − 1 additional random
variables.
Another extension of formula (13) is the derivation of the joint distribution of
several functions of X1 , X2 , . . . , Xn . For example, if Y = u(X1 , X2 , . . . , Xn ) and
Z = v(X1 , X2 , . . . , Xn ) are two such functions, then the bivariate density function,
τ (y, z), of Y and Z is given by
$$\tau(y, z) = \int_{-\infty}^{\infty} f(\mathbf{x})\,\delta[y - u(\mathbf{x})]\,\delta[z - v(\mathbf{x})]\,d\mathbf{x}. \tag{15}$$
To show the validity of (15), let $T(y, z)$ denote the cumulative bivariate distribution function of $Y$ and $Z$. Then,
$$T(y, z) = P[u(X_1, X_2, \ldots, X_n) \le y,\; v(X_1, X_2, \ldots, X_n) \le z] = \int_{-\infty}^{\infty} f(\mathbf{x})\,H[y - u(\mathbf{x})]\,H[z - v(\mathbf{x})]\,d\mathbf{x}.$$
Differentiating with respect to $y$ and $z$ and applying formula (10), we obtain
$$\frac{\partial^2 T(y, z)}{\partial y\,\partial z} = \int_{-\infty}^{\infty} f(\mathbf{x})\,\delta[y - u(\mathbf{x})]\,\delta[z - v(\mathbf{x})]\,d\mathbf{x} = \tau(y, z),$$
which establishes (15). This particular application was not mentioned in Au and Tam [1]. Extensions to more than two multivariable transformations are straightforward.
2.2.1. Examples
Consider the following examples that illustrate the application of the δ-function in
transforming random variables:
Example 1. One particular application of formula (14) is the derivation of the
density function of Y = X1 + X2 . In this case,
$$\begin{aligned} \lambda(y) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x_1, x_2)\,\delta(y - x_1 - x_2)\,dx_1\,dx_2 \\ &= \int_{-\infty}^{\infty} dx_2 \int_{-\infty}^{\infty} f(x_1, x_2)\,\delta[x_1 - (y - x_2)]\,dx_1 \\ &= \int_{-\infty}^{\infty} f(y - x_2, x_2)\,dx_2, \qquad (17) \end{aligned}$$
as can be seen from applying formula (1). If X1 and X2 are statistically independent
with marginal density functions f1 (x1 ) and f2 (x2 ), respectively, then
$$\lambda(y) = \int_{-\infty}^{\infty} f_1(y - x_2)\,f_2(x_2)\,dx_2. \tag{18}$$
Equivalently, integrating first over $x_2$ gives
$$\lambda(y) = \int_{-\infty}^{\infty} f_1(x_1)\,f_2(y - x_1)\,dx_1. \tag{19}$$
The integral in (18) [or (19)] is the convolution of $f_1(x)$ and $f_2(x)$.
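As a brief illustration of the convolution formula (18), the following SymPy sketch derives the density of the sum of two independent exponential random variables; the exponential choice is an assumption made here only for concreteness.

```python
import sympy as sp

x2, y = sp.symbols('x2 y', positive=True)

# Assumed example: X1, X2 independent Exp(1), so f_i(t) = exp(-t) for t > 0
f1 = lambda t: sp.exp(-t)
f2 = lambda t: sp.exp(-t)

# Formula (18): integrate f1(y - x2) * f2(x2) over x2; the integrand is
# supported on 0 < x2 < y, since f1(y - x2) vanishes for x2 > y
lam = sp.integrate(f1(y - x2) * f2(x2), (x2, 0, y))
print(sp.simplify(lam))   # y*exp(-y), the Gamma(2, 1) density
```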
Example 2. As another application of formula (15), let $X_1$ and $X_2$ be continuous random variables with joint density function $f(x_1, x_2)$, and consider $Y = X_1/(X_1 + X_2)$ and $Z = X_1 + X_2$. Integrating first over $x_1$ by means of $\delta(z - x_1 - x_2)$, which sets $x_1 = z - x_2$, we get
$$\tau(y, z) = \int_{-\infty}^{\infty} f(z - x_2, x_2)\,\delta\!\left[\frac{z - x_2}{z} - y\right] dx_2.$$
Now, by applying formula (6) to $\delta\!\left[\frac{z - x_2}{z} - y\right]$ we obtain
$$\delta\!\left[\frac{z - x_2}{z} - y\right] = \frac{\delta[x_2 - (z - zy)]}{|1/z|} = |z|\,\delta[x_2 - (z - zy)],$$
so that
$$\tau(y, z) = |z|\,f(zy,\, z - zy).$$
Example 3. Let $X_1$ and $X_2$ be independent standard normal random variables, and consider $Y = X_1^2 + X_2^2$ and $Z = X_1/X_2$. Here, $f(x_1, x_2) = \frac{1}{2\pi}\exp\!\left[-\frac{1}{2}(x_1^2 + x_2^2)\right]$, and applying formula (6) to $\delta(z - x_1/x_2)$, viewed as a function of $x_1$, gives
$$\delta(z - x_1/x_2) = \frac{\delta(x_1 - x_2 z)}{|1/x_2|} = |x_2|\,\delta(x_1 - x_2 z).$$
Hence,
$$\begin{aligned} \tau(y, z) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx_2 \int_{-\infty}^{\infty} \exp\!\left[-\tfrac{1}{2}(x_1^2 + x_2^2)\right] \delta(x_1^2 + x_2^2 - y)\,|x_2|\,\delta(x_1 - x_2 z)\,dx_1 \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} |x_2| \exp\!\left[-\tfrac{1}{2}(x_2^2 z^2 + x_2^2)\right] \delta(x_2^2 z^2 + x_2^2 - y)\,dx_2. \qquad (21) \end{aligned}$$
Making the substitution $t = x_2^2(1 + z^2)$ in (21) (and using the symmetry of the integrand in $x_2$), we obtain, for $y > 0$,
$$\tau(y, z) = \frac{1}{2\pi} \cdot \frac{e^{-y/2}}{1 + z^2} = \left[\frac{1}{2}\,e^{-y/2}\right] \left[\frac{1}{\pi(1 + z^2)}\right],$$
which shows that $Y$ and $Z$ are independently distributed with $Y \sim \chi^2_2$ and $Z$ having the Cauchy distribution with density function $\frac{1}{\pi(1 + z^2)}$.
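This conclusion lends itself to a quick Monte Carlo sanity check; the following NumPy/SciPy sketch (illustrative, with arbitrary sample size and seed) compares the simulated $Y$ and $Z$ with the $\chi^2_2$ and Cauchy distributions:

```python
import numpy as np
from scipy import stats

# With X1, X2 iid N(0,1), Y = X1^2 + X2^2 should be chi-square with 2 d.f.
# and Z = X1/X2 should be standard Cauchy, independently of Y
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 100_000))
y, z = x1**2 + x2**2, x1 / x2

print(stats.kstest(y, stats.chi2(df=2).cdf).pvalue)   # large p-value expected
print(stats.kstest(z, stats.cauchy.cdf).pvalue)       # large p-value expected
print(abs(np.corrcoef(np.sign(z), y)[0, 1]))          # near 0, hinting at independence
```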
Example 4. Let $X_1, X_2, \ldots, X_n$ be independent Bernoulli random variables, each assuming the value 1 with probability $p$ and the value 0 with probability $1 - p$, and let $Y = \sum_{i=1}^{n} X_i$. Write $Y = Y_1 + X_n$, where $Y_1 = \sum_{i=1}^{n-1} X_i$ has (arguing by induction on $n$) the binomial distribution with parameters $n - 1$ and $p$. The probability mass functions of $Y_1$ (evaluated at $y_1 = y - x_n$) and $X_n$ are
$$p_1(y_1) = p_1(y - x_n) = \sum_{i=0}^{n-1} \binom{n-1}{i} p^i (1 - p)^{n-1-i}\,\delta(y - x_n - i),$$
$$p_n(x_n) = p\,\delta(x_n - 1) + (1 - p)\,\delta(x_n),$$
respectively, as can be seen from applying formula (12). The probability mass function of $Y$ is therefore the convolution of $p_1(y_1)$ and $p_n(x_n)$, that is,
$$p(y) = \int_{-\infty}^{\infty} \sum_{i=0}^{n-1} \binom{n-1}{i} p^i (1 - p)^{n-1-i}\,\delta(y - x_n - i)\,[p\,\delta(x_n - 1) + (1 - p)\,\delta(x_n)]\,dx_n$$
[see formula (18)]. Noting that $\int_{-\infty}^{\infty} \delta(y - x_n - i)\,\delta(x_n - 1)\,dx_n = \delta(y - 1 - i)$ and $\int_{-\infty}^{\infty} \delta(y - x_n - i)\,\delta(x_n)\,dx_n = \delta(y - i)$, we obtain
$$\begin{aligned} p(y) &= \sum_{i=0}^{n-1} \binom{n-1}{i} p^{i+1} (1 - p)^{n-1-i}\,\delta(y - 1 - i) + \sum_{i=0}^{n-1} \binom{n-1}{i} p^i (1 - p)^{n-i}\,\delta(y - i) \\ &= \sum_{i=1}^{n} \binom{n-1}{i-1} p^i (1 - p)^{n-i}\,\delta(y - i) + \sum_{i=0}^{n-1} \binom{n-1}{i} p^i (1 - p)^{n-i}\,\delta(y - i) \\ &= p^n\,\delta(y - n) + \sum_{i=1}^{n-1} \left[\binom{n-1}{i-1} + \binom{n-1}{i}\right] p^i (1 - p)^{n-i}\,\delta(y - i) + (1 - p)^n\,\delta(y) \\ &= p^n\,\delta(y - n) + \sum_{i=1}^{n-1} \binom{n}{i} p^i (1 - p)^{n-i}\,\delta(y - i) + (1 - p)^n\,\delta(y) \\ &= \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i}\,\delta(y - i), \end{aligned}$$
which is the probability mass function of the binomial distribution with parameters $n$ and $p$.
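The same convolution can be checked numerically; the sketch below (with arbitrary choices $n = 5$ and $p = 0.3$) convolves the Binomial($n-1$, $p$) and Bernoulli($p$) mass vectors, mirroring the δ-function convolution above:

```python
import numpy as np
from scipy import stats

# Convolving the Binomial(n-1, p) mass function with the Bernoulli(p) mass
# function should reproduce Binomial(n, p); n and p are arbitrary choices
n, p = 5, 0.3
binom_n_minus_1 = stats.binom.pmf(np.arange(n), n - 1, p)   # masses at 0..n-1
bernoulli = np.array([1 - p, p])                            # masses at 0, 1

conv = np.convolve(binom_n_minus_1, bernoulli)              # masses at 0..n
print(np.allclose(conv, stats.binom.pmf(np.arange(n + 1), n, p)))  # True
```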
2.3. Markov’s inequality
Let X be a random variable (discrete or continuous). If g(x) is a nonnegative
function, then
$$P[g(X) \ge b] \le \frac{1}{b}\,E[g(X)], \tag{24}$$
provided that E[g(X)] exists, where b is a positive constant. This is known as
Markov’s inequality. Let us now prove this inequality using the δ-function approach.
The density function (or probability mass function) of $Y = g(X)$ is given by (13), where $f(x)$ is the density function (or probability mass function) of $X$. Then,
$$\begin{aligned} P[g(X) \ge b] &= \int_b^{\infty} dy \int_{-\infty}^{\infty} f(x)\,\delta[y - g(x)]\,dx \\ &= \int_{-\infty}^{\infty} f(x)\,dx \int_b^{\infty} \delta[y - g(x)]\,dy \\ &\le \frac{1}{b} \int_{-\infty}^{\infty} f(x)\,dx \int_b^{\infty} y\,\delta[y - g(x)]\,dy \\ &\le \frac{1}{b} \int_{-\infty}^{\infty} f(x)\,g(x)\,dx \qquad (25) \\ &= \frac{1}{b}\,E[g(X)]. \end{aligned}$$
Inequality (25) follows from the fact that
$$\int_b^{\infty} y\,\delta[y - g(x)]\,dy = \begin{cases} 0, & \text{if } g(x) < b, \\ \frac{1}{2}\,g(x), & \text{if } g(x) = b, \\ g(x), & \text{if } g(x) > b, \end{cases}$$
and hence $\int_b^{\infty} y\,\delta[y - g(x)]\,dy \le g(x)$.
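A numerical illustration of inequality (24), using the assumed choice $g(X) = X^2$ with $X$ standard normal (so that $E[g(X)] = 1$), is sketched below:

```python
import numpy as np

# Markov's inequality (24): P[g(X) >= b] <= E[g(X)]/b, demonstrated by
# simulation for g(X) = X^2 with X ~ N(0, 1); sample size and seed arbitrary
rng = np.random.default_rng(1)
gx = rng.standard_normal(1_000_000) ** 2

for b in [1.0, 2.0, 5.0]:
    lhs = np.mean(gx >= b)     # estimated P[g(X) >= b]
    rhs = gx.mean() / b        # estimated E[g(X)] / b
    print(f"b={b}: P={lhs:.4f} <= bound={rhs:.4f}")
```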
2.4. Moments and the moment generating function

Let $X$ be a continuous random variable whose noncentral moments $\mu_i = E(X^i)$, $i = 0, 1, 2, \ldots$, all exist. The density function $f(x)$ of $X$ can be formally represented as
$$f(x) = \sum_{i=0}^{\infty} \frac{(-1)^i \mu_i}{i!}\,\delta^{(i)}(x), \tag{26}$$
where $\delta^{(i)}(x)$ is the generalized $i$th derivative of $\delta(x)$ as in formula (4). Using (26) it is easy to verify that
$$\begin{aligned} \int_{-\infty}^{\infty} x^n f(x)\,dx &= \int_{-\infty}^{\infty} x^n \sum_{i=0}^{\infty} \frac{(-1)^i \mu_i}{i!}\,\delta^{(i)}(x)\,dx \\ &= \sum_{i=0}^{\infty} \frac{(-1)^i \mu_i}{i!} \int_{-\infty}^{\infty} x^n \delta^{(i)}(x)\,dx \\ &= \mu_n, \qquad (27) \end{aligned}$$
since by (4),
$$\int_{-\infty}^{\infty} x^n \delta^{(i)}(x)\,dx = \begin{cases} 0, & i \neq n, \\ (-1)^n n!, & i = n. \end{cases}$$
Similarly, the moment generating function of $X$ can be written as
$$\int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \sum_{i=0}^{\infty} \frac{(-1)^i \mu_i}{i!} \int_{-\infty}^{\infty} e^{tx}\,\delta^{(i)}(x)\,dx \qquad (28)$$
$$= \sum_{i=0}^{\infty} \frac{\mu_i}{i!}\,t^i, \qquad (29)$$
since, by (3), $\int_{-\infty}^{\infty} e^{tx}\,\delta^{(i)}(x)\,dx = (-1)^i t^i$.
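The integral identity used in (27) can be verified symbolically; in SymPy, DiracDelta(x, i) denotes the $i$th generalized derivative, so the following sketch tabulates $\int_{-\infty}^{\infty} x^n \delta^{(i)}(x)\,dx$ for small $n$ and $i$:

```python
import sympy as sp

x = sp.symbols('x', real=True)

def delta_moment(n, i):
    """Integral of x^n against the i-th generalized derivative of delta."""
    d = sp.DiracDelta(x) if i == 0 else sp.DiracDelta(x, i)
    return sp.integrate(x**n * d, (x, -sp.oo, sp.oo))

for n in range(4):
    print([delta_moment(n, i) for i in range(4)])
    # row n: (-1)^n * n! at position i = n, zero elsewhere
```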
The interchange of the order of integration and summation in (28) is permissible if the power series in (29) is uniformly convergent (with respect to $t$) in some neighborhood of the origin [see, for example, Fulks ([5], page 515)].
Note that the moments of $X$ uniquely determine the distribution of $X$ if the power series
$$\sum_{n=0}^{\infty} \frac{\mu_n}{n!}\,\tau^n \tag{30}$$
is absolutely convergent for some $\tau > 0$ [see, for example, Fisz ([4], Theorem 3.2.1)].
This follows from the fact that absolute convergence of the series in (30) guarantees
uniform convergence of the series in (29) within the interval (−τ, τ ) [see, for exam-
ple, Khuri ([10], Theorem 5.4.4)], and hence the existence of the moment generating
function within the same interval.
The representation of $f(x)$ as in (26) is instrumental in deriving an approximation for the integral $\int_a^b \varphi(x)\,e^{-\lambda \Psi(x)}\,dx$ using the method of Laplace, where $\lambda$ is a large positive constant, $\varphi(x)$ is continuous on $[a, b]$, and the first and second derivatives of $\Psi(x)$ are continuous on $[a, b]$ [see Kanwal ([8], Section 13.2)]. This integral was originally used by Pierre Laplace in his development of the central limit theorem. In addition, Laplace's approximation is useful in several areas of statistics, particularly in Bayesian statistics (see Kass, Tierney, and Kadane [9]).
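To make Laplace's method concrete, the following sketch (with assumed functions $\varphi(x) = \cos x$ and $\Psi(x) = (x - 1)^2$, chosen only for illustration) compares the one-term Laplace approximation $\varphi(x_0)\,e^{-\lambda \Psi(x_0)}\,[2\pi/(\lambda \Psi''(x_0))]^{1/2}$, where $x_0$ is the interior minimum of $\Psi$, with numerical quadrature:

```python
import numpy as np
from scipy import integrate

# Laplace approximation of the integral of phi(x)*exp(-lam*psi(x)) over [a, b],
# assuming psi has a unique interior minimum at x0; phi and psi are assumed
# examples: phi(x) = cos(x), psi(x) = (x - 1)^2 on [0, 2], minimum at x0 = 1
phi = np.cos
psi = lambda x: (x - 1.0) ** 2
psi2 = 2.0                     # second derivative of psi at x0
x0, a, b = 1.0, 0.0, 2.0

for lam in [10.0, 100.0, 1000.0]:
    exact, _ = integrate.quad(lambda x: phi(x) * np.exp(-lam * psi(x)), a, b)
    approx = phi(x0) * np.exp(-lam * psi(x0)) * np.sqrt(2 * np.pi / (lam * psi2))
    print(f"lam={lam:6.0f}: quad={exact:.6f}  laplace={approx:.6f}")
```

As $\lambda$ grows, the two values agree to more digits, reflecting the $O(\lambda^{-1})$ relative error of the one-term approximation.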
3. Concluding remarks
The Dirac delta function provides a very helpful tool in mathematical statistics.
Several examples were presented in this manuscript to demonstrate its usefulness.
One of its main advantages is the provision of a unified approach for the treatment
of discrete and continuous distributions. This was demonstrated in the discussions
concerning transformations of random variables. Given the generalized nature of the δ-
function and its derivatives, the δ-function approach has the potential of facilitating
the understanding and development of classical concepts in mathematical statistics.
References
[1] AU, C., and TAM, J., 1999, The American Statistician, 53, 270-272.
[2] DIRAC, P.A.M., 1927, Proceedings of the Royal Society of London, Series A, 113,
621-641.
[3] DIRAC, P.A.M., 1958, The Principles of Quantum Mechanics (London: Oxford
University Press).
[4] FISZ, M., 1963, Probability Theory and Mathematical Statistics, third edition
(New York: Wiley).
[5] FULKS, W., 1978, Advanced Calculus, third edition (New York: Wiley).
[6] HOSKINS, R.F., 1979, Generalized Functions (New York: Wiley).
[7] HSU, H.P., 1984, Applied Fourier Analysis (San Diego, CA: Harcourt Brace
Jovanovich).
[8] KANWAL, R.P., 1998, Generalized Functions: Theory and Technique, second edition (Boston, MA: Birkhäuser).
[9] KASS, R.E., TIERNEY, L., and KADANE, J.B., 1991, in Statistical Multiple Integration, edited by N. Flournoy and R.K. Tsutakawa (Providence, RI: American Mathematical Society), pp. 89-99.
[10] KHURI, A.I., 2003, Advanced Calculus with Applications in Statistics, second edition (New York: Wiley).
[11] SAICHEV, A.I., and WOYCZYNSKI, W.A., 1997, Distributions in the Physical
and Engineering Sciences (Boston, MA: Birkhäuser).