Introduction to Probability Theory
K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay
October 8, 2017
LECTURES 20-21
Theorem 0.1 Let X be a continuous non-negative random variable with pdf f. Then
\[
EX = \int_0^{\infty} x f(x)\, dx,
\]
provided the rhs integral exists.
Proof. By using the simple functions given in the proof of Theorem 7.4, we
get
\[
\begin{aligned}
EX &= \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, P\Big(\Big\{\frac{k}{2^n} \le X(\omega) < \frac{k+1}{2^n}\Big\}\Big) \\
&= \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, \Big[F\Big(\frac{k+1}{2^n}\Big) - F\Big(\frac{k}{2^n}\Big)\Big] \\
&= \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, f(t_k)\Big(\frac{k+1}{2^n} - \frac{k}{2^n}\Big) \\
&= \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} t_k f(t_k)\Big(\frac{k+1}{2^n} - \frac{k}{2^n}\Big)
 + \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \Big(\frac{k}{2^n} - t_k\Big) f(t_k)\Big(\frac{k+1}{2^n} - \frac{k}{2^n}\Big),
\end{aligned}
\tag{0.1}
\]
where t_k ∈ (k/2^n, (k+1)/2^n) is the point given by the mean value theorem.
Since |k/2^n − t_k| ≤ 1/2^n, the second term in (0.1) vanishes:
\[
0 \le \lim_{n\to\infty} \Big|\sum_{k=0}^{n2^n-1} \Big(\frac{k}{2^n} - t_k\Big) f(t_k)\Big(\frac{k+1}{2^n} - \frac{k}{2^n}\Big)\Big|
\le \lim_{n\to\infty} \frac{1}{2^n}\sum_{k=0}^{n2^n-1} f(t_k)\Big(\frac{k+1}{2^n} - \frac{k}{2^n}\Big)
\le \lim_{n\to\infty} \frac{1}{2^n}\int_0^{\infty} f(x)\, dx = 0.
\]
Hence
\[
EX = \int_0^{\infty} x f(x)\, dx.
\]
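As a sanity check of this formula (an illustration added here, not part of the original notes), one can compare a Monte Carlo estimate of EX with the integral of x f(x) over [0, ∞) for a concrete non-negative continuous random variable; the Exp(1) distribution below is an arbitrary choice, and the sketch assumes numpy and scipy are available.

```python
# Illustrative check of Theorem 0.1 for X ~ Exp(1): EX should equal
# the integral of x*f(x) over [0, infinity), i.e. 1.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)

f = lambda x: np.exp(-x)                    # pdf of Exp(1) on [0, inf)
integral, _ = quad(lambda x: x * f(x), 0, np.inf)

samples = rng.exponential(scale=1.0, size=10**6)
monte_carlo = samples.mean()

print(integral, monte_carlo)                # both close to 1
```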
Definition 7.4. Let X be a random variable on (Ω, F, P). The mean or expectation of X is said to exist if at least one of EX^+ and EX^- is finite. In this case EX is defined as
\[
EX = EX^{+} - EX^{-},
\]
where
\[
X^{+} = \max\{X, 0\}, \qquad X^{-} = \max\{-X, 0\}.
\]
Note that X^+ is the positive part and X^- is the negative part of X.
Theorem 0.2 Let X be a continuous random variable with finite mean and pdf f. Then
\[
EX = \int_{-\infty}^{\infty} x f(x)\, dx.
\]
Proof. Set
\[
Y_n(\omega) = \begin{cases}
\dfrac{k}{2^n} & \text{if } \dfrac{k}{2^n} \le X^{+}(\omega) < \dfrac{k+1}{2^n},\ k = 0, \cdots, n2^n - 1, \\[4pt]
0 & \text{if } X^{+}(\omega) \ge n.
\end{cases}
\]
Hence
\[
Y_n(\omega) = \begin{cases}
\dfrac{k}{2^n} & \text{if } \dfrac{k}{2^n} \le X(\omega) < \dfrac{k+1}{2^n},\ k = 0, \cdots, n2^n - 1, \\[4pt]
0 & \text{if } X(\omega) \ge n \text{ or } X(\omega) \le 0.
\end{cases}
\]
Then Y_n is a sequence of simple random variables such that
\[
EX^{+} = \lim_{n\to\infty} EY_n.
\]
Similarly, set
\[
Z_n(\omega) = \begin{cases}
\dfrac{k}{2^n} & \text{if } \dfrac{k}{2^n} \le X^{-}(\omega) < \dfrac{k+1}{2^n},\ k = 0, \cdots, n2^n - 1, \\[4pt]
0 & \text{if } X^{-}(\omega) \ge n.
\end{cases}
\]
Hence
\[
Z_n(\omega) = \begin{cases}
\dfrac{k}{2^n} & \text{if } \dfrac{k}{2^n} \le -X(\omega) < \dfrac{k+1}{2^n},\ k = 0, \cdots, n2^n - 1, \\[4pt]
0 & \text{if } X(\omega) \le -n \text{ or } X(\omega) \ge 0.
\end{cases}
\]
Then
\[
EX^{-} = \lim_{n\to\infty} EZ_n.
\]
Now
\[
\lim_{n\to\infty} EY_n = \lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, P\Big(\Big\{\frac{k}{2^n} \le X(\omega) < \frac{k+1}{2^n}\Big\}\Big)
\tag{0.2}
\]
and
\[
\begin{aligned}
-\lim_{n\to\infty} EZ_n
&= -\lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, P\Big(\Big\{\frac{k}{2^n} \le -X(\omega) < \frac{k+1}{2^n}\Big\}\Big) \\
&= -\lim_{n\to\infty} \sum_{k=0}^{n2^n-1} \frac{k}{2^n}\, P\Big(\Big\{\frac{-k-1}{2^n} < X(\omega) \le \frac{-k}{2^n}\Big\}\Big) \\
&= \lim_{n\to\infty} \sum_{k=-n2^n+1}^{0} \frac{k}{2^n}\, P\Big(\Big\{\frac{k-1}{2^n} < X(\omega) \le \frac{k}{2^n}\Big\}\Big) \\
&= \lim_{n\to\infty} \sum_{k=-n2^n+1}^{0} \frac{k-1}{2^n}\, P\Big(\Big\{\frac{k-1}{2^n} < X(\omega) \le \frac{k}{2^n}\Big\}\Big).
\end{aligned}
\tag{0.3}
\]
The last equality follows by the arguments from the proof of Theorem 6.0.26.
Combining (0.2) and (0.3), we get
\[
EX = \lim_{n\to\infty} \sum_{k=-n2^n}^{n2^n-1} \frac{k}{2^n}\, P\Big(\Big\{\frac{k}{2^n} \le X(\omega) < \frac{k+1}{2^n}\Big\}\Big).
\]
Now, arguing as in the proof of Theorem 6.0.26, we complete the proof.
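The dyadic approximation used in this proof can also be seen numerically. The sketch below (my addition, assuming numpy is available) builds Y_n = k/2^n on the event {k/2^n ≤ X < (k+1)/2^n}, truncated as in the proof, for X ∼ N(0, 1), and shows EY_n approaching EX^+ = 1/√(2π).

```python
# Illustrative sketch: the simple random variable Y_n = floor(2^n X)/2^n
# (set to 0 when X >= n or X <= 0) approximates X^+ from below, and
# E[Y_n] increases to E[X^+].  For X ~ N(0,1), E[X^+] = 1/sqrt(2*pi).
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10**6)

def EY_n(x, n):
    y = np.floor(2**n * x) / 2**n          # dyadic lower approximation
    y[(x <= 0) | (x >= n)] = 0.0           # truncation used in the proof
    return y.mean()

for n in (1, 2, 4, 8):
    print(n, EY_n(x, n))                   # approaches 1/sqrt(2*pi) ~ 0.3989
```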
We state the following useful properties of expectation. The proofs follow by an approximation argument using the corresponding properties of simple random variables.
Theorem 0.3 Let X, Y be random variables with finite mean. Then
(i) If X ≥ 0, then EX ≥ 0.
(ii) For a ∈ R,
E(aX + Y ) = aEX + EY .
(iii) Let Z ≥ 0 be a random variable such that Z ≤ X. Then Z has finite
mean and EZ ≤ EX.
In the context of Riemann integration, one can recall the following convergence theorem:
“If g_n, n ≥ 1, is a sequence of continuous functions defined on [a, b] such that g_n → g uniformly on [a, b], then
\[
\lim_{n\to\infty} \int_a^b g_n(x)\, dx = \int_a^b g(x)\, dx.”
\]
That is, to take the limit inside the integral, one needs uniform convergence of the functions. In many situations it is highly unlikely to get uniform convergence.
In fact, uniform convergence is not required to take a limit inside an integral. This is illustrated in the following two theorems; their proofs are beyond the scope of this course.
Theorem 0.4 (Monotone convergence theorem) Let X_n be an increasing sequence of nonnegative random variables such that lim_{n→∞} X_n = X. Then
\[
\lim_{n\to\infty} EX_n = EX.
\]
[Here lim_{n→∞} X_n = X means lim_{n→∞} X_n(ω) = X(ω) for all ω ∈ Ω.]
Theorem 0.5 (Dominated Convergence Theorem) Let X_n, X, Y be random variables such that
(i) Y has finite mean,
(ii) |X_n| ≤ Y,
(iii) lim_{n→∞} X_n = X.
Then
\[
\lim_{n\to\infty} EX_n = EX.
\]
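The following small numerical illustration (my addition, with the Exp(1) distribution chosen arbitrarily) shows the monotone convergence theorem at work: X_n = min(X, n) increases to X, and EX_n climbs to EX. Since X_n ≤ X and X has finite mean, the same example can also be read as an instance of dominated convergence.

```python
# Illustration of monotone convergence: X_n = min(X, n) increases to X,
# so E[X_n] -> E[X].  For X ~ Exp(1), E[X] = 1.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=10**6)

for n in (1, 2, 4, 8):
    x_n = np.minimum(x, n)                 # truncated (hence bounded) variable
    print(n, x_n.mean())                   # increases towards E[X] = 1
```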
Now we state the following theorem, which provides a useful tool to compute the expectation of random variables that are mixed in nature.
Theorem 0.6 Let X be a continuous random variable with pdf f and let ϕ : R → R be a continuous function such that the integral ∫_{-∞}^{∞} ϕ(x) f(x) dx is finite. Then
\[
E[\varphi \circ X] = \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx.
\]
Proof: First, I will give a proof for a special case, i.e. when ϕ : R → R is
strictly increasing and differentiable. Set Y = ϕ(X). Then Y has a density
g given by
\[
g(y) = f(\varphi^{-1}(y))\, \frac{1}{\varphi'(\varphi^{-1}(y))}, \quad y \in \varphi(\mathbb{R}); \qquad g(y) = 0 \ \text{otherwise}.
\]
Hence
\[
\begin{aligned}
E[\varphi(X)] &= \int_{-\infty}^{\infty} y\, g(y)\, dy \\
&= \int_{\varphi(\mathbb{R})} y\, f(\varphi^{-1}(y))\, \frac{1}{\varphi'(\varphi^{-1}(y))}\, dy \\
&= \int_{-\infty}^{\infty} \varphi(x) f(x)\, \frac{1}{\varphi'(x)}\, \varphi'(x)\, dx
\qquad \text{(using } y = \varphi(x) \text{, with Jacobian } \tfrac{dy}{dx} = \varphi'(x)\text{)} \\
&= \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx.
\end{aligned}
\]
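Before turning to the power functions, here is a quick numerical check of this change-of-variables step (only an illustration, not part of the notes): for a strictly increasing differentiable ϕ, a Monte Carlo estimate of E[ϕ(X)] should agree with ∫ ϕ(x) f(x) dx. The choices ϕ(x) = x³ + x and X ∼ Exp(1) below are arbitrary; the exact value is 3! + 1 = 7.

```python
# Illustrative check of E[phi(X)] = integral of phi(x) f(x) dx for a
# strictly increasing phi.  Here phi(x) = x**3 + x and X ~ Exp(1), so the
# exact value is 3! + 1 = 7.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(3)

phi = lambda x: x**3 + x
f = lambda x: np.exp(-x)                    # pdf of Exp(1)

lhs = phi(rng.exponential(size=10**6)).mean()          # Monte Carlo E[phi(X)]
rhs, _ = quad(lambda x: phi(x) * f(x), 0, np.inf)      # integral formula

print(lhs, rhs)                              # both close to 7
```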
Now consider ϕ(x) = x^{2n+1}, n ≥ 0, and assume that E[X^{2n+1}] exists. Note that ϕ is strictly increasing and differentiable. Though one can use the proof given above to conclude the result, we will give a direct proof. Here
\[
\varphi^{-1}(y) = y^{\frac{1}{2n+1}}, \qquad \varphi'(x) = (2n+1)x^{2n}.
\]
Hence the pdf g of X^{2n+1} is given by
\[
g(y) = \frac{1}{2n+1}\, f\big(y^{\frac{1}{2n+1}}\big)\, y^{-\frac{2n}{2n+1}}, \qquad y \ne 0.
\]
Hence
\[
\begin{aligned}
E[X^{2n+1}] &= \int_{-\infty}^{\infty} y\, g(y)\, dy \\
&= \frac{1}{2n+1}\int_{-\infty}^{\infty} y\, f\big(y^{\frac{1}{2n+1}}\big)\, y^{-\frac{2n}{2n+1}}\, dy \\
&= \int_{-\infty}^{\infty} x^{2n+1} f(x)\, dx
\qquad \text{(using } y = x^{2n+1}\text{, with Jacobian } \tfrac{dy}{dx} = (2n+1)x^{2n}\text{)}.
\end{aligned}
\]
Now consider ϕ(x) = x^{2n}; note that ϕ is not one-to-one.
Set Y = X^{2n} and let G, g denote respectively the distribution function and the pdf of Y. Then for y > 0,
\[
\begin{aligned}
G(y) &= P\{Y \le y\} \\
&= P\{-y^{\frac{1}{2n}} \le X \le y^{\frac{1}{2n}}\} \\
&= F\big(y^{\frac{1}{2n}}\big) - F\big(-y^{\frac{1}{2n}}\big).
\end{aligned}
\]
Hence
\[
g(y) = \frac{1}{2n}\, y^{\frac{1}{2n}-1}\Big[f\big(y^{\frac{1}{2n}}\big) + f\big(-y^{\frac{1}{2n}}\big)\Big], \quad y > 0, \qquad g(y) = 0 \ \text{for } y \le 0.
\]
Therefore
\[
E[X^{2n}] = \int_{-\infty}^{\infty} y\, g(y)\, dy
= \frac{1}{2n}\int_{0}^{\infty} y^{\frac{1}{2n}}\Big[f\big(y^{\frac{1}{2n}}\big) + f\big(-y^{\frac{1}{2n}}\big)\Big]\, dy.
\]
Consider
\[
\frac{1}{2n}\int_{0}^{\infty} y^{\frac{1}{2n}} f\big(y^{\frac{1}{2n}}\big)\, dy.
\]
We use the following change of variable argument. Set y = x^{2n} =: ψ(x), x > 0. Then ψ : (0, ∞) → (0, ∞) is a bijective map and the Jacobian is ψ'(x) = 2n x^{2n-1}, x > 0. Hence
\[
\frac{1}{2n}\int_{0}^{\infty} y^{\frac{1}{2n}} f\big(y^{\frac{1}{2n}}\big)\, dy
= \frac{1}{2n}\int_{\psi^{-1}((0,\infty))} x\, f(x)\, |\psi'(x)|\, dx
= \int_{0}^{\infty} x^{2n} f(x)\, dx.
\]
Similarly consider
\[
\frac{1}{2n}\int_{0}^{\infty} y^{\frac{1}{2n}} f\big(-y^{\frac{1}{2n}}\big)\, dy.
\]
Set y = x^{2n} =: ψ(x), x < 0; i.e., ψ : (−∞, 0) → (0, ∞) is a bijective map with Jacobian ψ'(x) = 2n x^{2n-1}, x < 0, which we write as ψ'(x) = 2n x^{2n}/x. Also note that y^{1/2n} = |x|, so that −y^{1/2n} = x for x < 0. Hence
\[
\frac{1}{2n}\int_{0}^{\infty} y^{\frac{1}{2n}} f\big(-y^{\frac{1}{2n}}\big)\, dy
= \frac{1}{2n}\int_{\psi^{-1}((0,\infty))} |x|\, f(x)\, |\psi'(x)|\, dx
= \int_{-\infty}^{0} x^{2n} f(x)\, dx.
\]
Now combining the integrals, we get the formula.
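As a sanity check of the even-power case (an illustration I am adding): for X ∼ N(0, 1) and 2n = 4, integrating y against the density g of Y = X⁴ derived above should reproduce E[X⁴] = ∫ x⁴ f(x) dx = 3.

```python
# Illustrative check of the even-power case with X ~ N(0,1) and 2n = 4:
# the pdf of Y = X^4 derived above integrates against y to E[X^4] = 3.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) pdf
n = 2                                                   # so phi(x) = x^(2n) = x^4

def g(y):
    # pdf of Y = X^(2n) for y > 0, as derived in the text
    r = y ** (1.0 / (2 * n))
    return (1.0 / (2 * n)) * y ** (1.0 / (2 * n) - 1) * (f(r) + f(-r))

lhs, _ = quad(lambda y: y * g(y), 0, np.inf)                   # E[Y] via density of Y
rhs, _ = quad(lambda x: x ** (2 * n) * f(x), -np.inf, np.inf)  # LOTUS formula

print(lhs, rhs)                                          # both close to 3
```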
Now, when ϕ is a polynomial, we can prove the theorem by writing ϕ as a linear combination of powers x^m and then using the linearity property of expectation. The proof of the theorem beyond polynomials requires more sophistication, so we will not consider it in this course.
The above theorem is sometimes referred to as the “Law of the unconscious statistician”, since users often treat the above formula as a definition itself.
Example 0.1 Let X ∼ U(0, 2) and Y = max{1, X}. Then for ϕ(x) = max{1, x}, we have
\[
\begin{aligned}
EY = E[\varphi \circ X] &= \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx \\
&= \frac{1}{2}\int_{0}^{2} \max\{1, x\}\, dx \\
&= \frac{1}{2} + \frac{1}{2}\int_{1}^{2} x\, dx \\
&= \frac{5}{4},
\end{aligned}
\]
where f is the pdf of U(0, 2).
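A quick simulation (my addition) confirms the value 5/4:

```python
# Monte Carlo check of Example 0.1: E[max(1, X)] for X ~ U(0, 2) is 5/4.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 2.0, size=10**6)
print(np.maximum(1.0, x).mean())            # close to 1.25
```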
Example 0.2 Let X be a random variable with pdf f. Find E[X I_{X≤a}], where a ∈ R.
Note that this does not come under the realm of Theorem 0.6, because here ϕ(x) = x I_{x≤a} has a discontinuity at x = a for a ≠ 0. So the discussion below gives a method to tackle ϕ with discontinuities.
So consider
\[
\varphi_a(x) = (x - a) I_{\{x \le a\}}.
\]
Note that ϕ_a is continuous, and hence using Theorem 0.6 we get
\[
\begin{aligned}
E[(X - a) I_{\{X \le a\}}] &= \int_{-\infty}^{\infty} (x - a) I_{\{x \le a\}} f(x)\, dx \\
&= \int_{-\infty}^{a} (x - a) f(x)\, dx \\
&= \int_{-\infty}^{a} x f(x)\, dx - a P\{X \le a\}.
\end{aligned}
\]
Now
\[
\begin{aligned}
E[(X - a) I_{\{X \le a\}}] &= E[X I_{\{X \le a\}}] - a E[I_{\{X \le a\}}] \\
&= E[X I_{\{X \le a\}}] - a P\{X \le a\}.
\end{aligned}
\]
Equating the two expressions, we get
\[
E[X I_{\{X \le a\}}] = \int_{-\infty}^{a} x f(x)\, dx.
\]
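The identity can also be checked numerically; in the sketch below, X ∼ N(0, 1) and a = 0.5 are arbitrary illustrative choices.

```python
# Illustrative check of E[X 1_{X <= a}] = integral of x f(x) dx over (-inf, a]
# for X ~ N(0,1) and a = 0.5.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(5)
a = 0.5
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = rng.standard_normal(10**6)
lhs = (x * (x <= a)).mean()                 # Monte Carlo E[X 1_{X <= a}]
rhs, _ = quad(lambda t: t * f(t), -np.inf, a)

print(lhs, rhs)                             # both close to -exp(-a**2/2)/sqrt(2*pi)
```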
Exercise. Let ϕ be a continuous function, a ∈ R, and let X be a random variable with pdf f such that E[ϕ(X)] is finite. Show that
\[
E[\varphi(X) I_{\{X \le a\}}] = \int_{-\infty}^{a} \varphi(x) f(x)\, dx.
\]
Along similar lines to Theorem 0.6, we have the following theorem.
Theorem 0.7 Let X and Y be continuous random variables with joint pdf f and let ϕ : R² → R be a continuous function. Then
\[
E[\varphi \circ (X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \varphi(x, y) f(x, y)\, dx\, dy.
\]
Here again the proof is beyond the scope of the course, but special cases like X + Y, XY, X², etc., can be handled directly. I will do the case ϕ(x, y) = xy as an example.
First note that the pdf g of Z = XY is given by
\[
g(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f\Big(x, \frac{z}{x}\Big)\, dx, \qquad z \in \mathbb{R}.
\]
(If you have not yet worked this out, do so now.)
Hence
\[
\begin{aligned}
E[XY] &= \int_{-\infty}^{\infty} z\, g(z)\, dz \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{z}{|x|}\, f\Big(x, \frac{z}{x}\Big)\, dx\, dz \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{z}{|x|}\, f\Big(x, \frac{z}{x}\Big)\, dz\, dx
\qquad \text{(changing the order of integration)} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{xy}{|x|}\, f(x, y)\, |x|\, dy\, dx
\qquad \text{(putting } z = xy\text{)} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y\, f(x, y)\, dy\, dx.
\end{aligned}
\]
[In the above, justify the change of variable calculation. Hint: split the outer integral (i.e. the integral over the x variable) into (−∞, 0) and (0, ∞), apply the change of variable formula separately, and then combine.]
Theorem 0.8 Let X and Y be independent random variables such that EX and EY exist. Then E[XY] exists and is given by
\[
E[XY] = EX\, EY.
\]
Proof: Using the above, one can see that when X and Y are independent and have a joint pdf f, then
\[
E[XY] = EX\, EY
\]
(exercise). When X, Y are discrete with joint pmf f, then again it is easy to prove (exercise).
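A small simulation (my addition, with the arbitrary choices X ∼ Exp(1) and Y ∼ U(0, 2)) illustrates Theorem 0.8:

```python
# Illustrative check: for independent X ~ Exp(1) and Y ~ U(0, 2),
# E[XY] should equal E[X] E[Y] = 1 * 1 = 1.
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(size=10**6)
y = rng.uniform(0.0, 2.0, size=10**6)       # generated independently of x

print((x * y).mean(), x.mean() * y.mean())  # both close to 1
```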
Variance and other higher order moments: In this subsection, we address the question of reconstructing the distribution of a random variable. To this end, we introduce objects called the moments of a random variable and later see whether we can determine the distribution function using them.
Definition 7.5. (Higher Order Moments) Let X be a random variable. Then EX^n is called the n-th moment of X and E(X − EX)^n is called the n-th central moment of X. The second central moment is called the variance.
Example 0.3 (1) Let X be Bernoulli (p). Then we have seen that EX = p, and hence for a Bernoulli random variable, knowing the first moment alone identifies the distribution uniquely. Also note that the other moments are EX^n = p, n ≥ 1. Observe the pattern of the moments, {p, p, · · · }.
(2) Let X be Binomial (n, p). Then EX = np. This does not determine the distribution uniquely, so let us compute the variance. To do this, we use the following. Let X_1, X_2, · · · , X_n be n independent Bernoulli(p) random variables. Then we know that X_1 + · · · + X_n is Binomial (n, p). Hence take X = X_1 + · · · + X_n. Now, using independence,
\[
E[X - EX]^2 = \sum_{k=1}^{n} E[X_k - p]^2 = \sum_{k=1}^{n} p(1 - p) = np(1 - p).
\]
Now, given EX and E[X − EX]^2, we can solve
\[
EX = np, \qquad E[X - EX]^2 = np(1 - p)
\]
to find the parameters n and p (a quick numerical check appears after this example).
Also find a few more moments (exercise).
(3) Let X ∼ N(0, 1). Then EX = 0 and EX² = 1, and hence the variance Var(X) = 1 (exercise).
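Here is the numerical check promised in item (2) (my addition, with illustrative parameters n = 10 and p = 0.3):

```python
# Check of Example 0.3 (2): for X ~ Binomial(10, 0.3), the mean is
# np = 3 and the variance is np(1-p) = 2.1.
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(n=10, p=0.3, size=10**6)

print(x.mean(), x.var())                    # close to 3 and 2.1
```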
The above examples only tell us that if we know a priori that a given random variable is of a certain type, such as Bernoulli, Binomial, or Poisson, but we do not know the parameters, then one can use its moments to determine the parameters and hence the distribution.
There is an interesting problem: given a sequence of numbers {a₁, a₂, · · · }, can one find a distribution function whose moments are given by {aₙ}? And if so, is it unique? This is called the moment problem.
For example, we have seen that the sequence {p, p, · · · } corresponds to the Bernoulli (p) distribution, but we are not yet sure whether it is the only distribution with moments given by this sequence.
Distributions that are uniquely determined by their moments are called moment determinate distributions, and the others are called moment indeterminate. In fact, the Bernoulli, Binomial, Poisson, Normal, etc., are moment determinate, but the lognormal distribution is moment indeterminate.
Chapter 8: Moment Generating and Characteristic functions
In this chapter, we introduce the notions of the moment generating function (in short, mgf) and the characteristic function of a random variable and study their properties. Unlike moments, both the moment generating function and the characteristic function can be used to identify distribution functions uniquely. In fact, a way to understand whether a distribution is moment determinate or not is to use either the moment generating function or the characteristic function. It is interesting to note that the mgf is closely related to the Laplace transform, and the characteristic function to its counterpart, the Fourier transform.
0.1 Moment generating function
In this subsection we study moment generating functions and their properties.
Definition 8.1 Given a random variable X on a probability space (Ω, F, P), its moment generating function, denoted by M_X, is defined as
\[
M_X(t) = E[e^{tX}], \qquad t \in I,
\]
where I is an interval on which the expectation on the right-hand side exists. In fact, for a non-negative random variable X, I always contains (−∞, 0]. If X is a non-negative random variable such that EX does not exist, then M_X(t) does not exist for t > 0 (exercise). An analogous comment holds for non-positive random variables. The moment generating function becomes useful if I contains an interval around 0.
Example 0.4 Let X ∼ Bernoulli (p). Then
\[
M_X(t) = (1 - p) + p e^{t}, \qquad t \in \mathbb{R}.
\]
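As a closing illustration (my addition), one can estimate M_X(t) = E[e^{tX}] by simulation and compare it with the closed form (1 − p) + p e^t; the value p = 0.4 below is arbitrary.

```python
# Empirical mgf of X ~ Bernoulli(0.4) versus the formula (1-p) + p*e^t.
import numpy as np

rng = np.random.default_rng(8)
p = 0.4
x = rng.binomial(n=1, p=p, size=10**6)      # Bernoulli(p) samples

for t in (-1.0, 0.0, 0.5, 1.0):
    empirical = np.exp(t * x).mean()        # Monte Carlo E[e^{tX}]
    exact = (1 - p) + p * np.exp(t)
    print(t, empirical, exact)
```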