Introduction to Probability Theory
K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay
September 9, 2017
LECTURES 14-15
Example 0.1 (Bernoulli distribution) Let (Ω, F, P) be a probability space, A ∈ F with p = P(A), and X = I_A the indicator function of A. Tossing a p-coin gives such a probability space, with A the event that the toss shows H. Note that X takes the two values 0 and 1.
The distribution function of X is given by
$$F(x) = \begin{cases} 0 & \text{if } x < 0,\\ 1 - p & \text{if } 0 \le x < 1,\\ 1 & \text{if } x \ge 1. \end{cases}$$
The distribution of X is given by
$$\mu(B) = \sum_{k \in B \cap \{0, 1\}} p_k, \quad B \in \mathcal{B}_{\mathbb{R}},$$
where p_0 = 1 − p, p_1 = p. The above function F and the probability measure µ on (R, B_R) are called, respectively, the Bernoulli distribution function and the Bernoulli distribution. Thus X = I_A is an example of a Bernoulli(p) random variable.
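A minimal Python sketch of the Bernoulli(p) distribution function F and the measure µ above. The function names are hypothetical, and a Borel set B is stood in for by a finite Python set of reals, which is only an illustration (it cannot represent a general Borel set).

```python
def bernoulli_cdf(x: float, p: float) -> float:
    """Bernoulli(p) distribution function: 0, then 1 - p on [0, 1), then 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0


def bernoulli_measure(B: set, p: float) -> float:
    """mu(B) = sum of p_k over k in B ∩ {0, 1}, with p_0 = 1 - p and p_1 = p.

    B is a finite set of reals standing in for a Borel set (illustration only).
    """
    masses = {0: 1.0 - p, 1: p}
    return sum(mass for k, mass in masses.items() if k in B)


p = 0.25
print(bernoulli_cdf(0.5, p))            # 0.75
print(bernoulli_measure({1, 2.5}, p))   # 0.25
print(bernoulli_measure({0, 1}, p))     # 1.0
```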
Example 0.2 (Binomial distribution with parameters (n, p)). Let X_1, X_2, ..., X_n be n independent Bernoulli(p) random variables defined on a probability space, and let X = X_1 + ··· + X_n. One can realize such independent Bernoulli random variables as follows: toss a p-coin n times independently and let X_k = 1 if the kth toss is H and X_k = 0 if the kth toss is T. Then, for k = 0, 1, ..., n,
$$P\{X = k\} = P\{X_i = 1 \text{ for exactly } k \text{ of the } i\text{'s and } X_i = 0 \text{ otherwise}\} = \binom{n}{k} p^k (1-p)^{n-k}.$$
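Hence the distribution function of X = X_1 + ··· + X_n is given by
$$F(x) = \begin{cases} 0 & \text{if } x < 0,\\ \binom{n}{0}(1-p)^n & \text{if } 0 \le x < 1,\\ \binom{n}{0}(1-p)^n + \binom{n}{1} p (1-p)^{n-1} & \text{if } 1 \le x < 2,\\ \displaystyle\sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i} & \text{if } k \le x < k+1,\ k = 2, \dots, n-1,\\ 1 & \text{if } x \ge n. \end{cases}$$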
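The distribution of X is given by
$$\mu_X(B) = \sum_{k \in B \cap \{0, 1, \dots, n\}} \binom{n}{k} p^k (1-p)^{n-k}, \quad B \in \mathcal{B}_{\mathbb{R}}.$$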
The above F and µ_X are called the Binomial (n, p) distribution function and the Binomial (n, p) distribution, respectively. A random variable with the Binomial distribution as its law (distribution) is called a Binomial random variable. At the beginning we have seen an example of a Binomial (n, p) random variable.
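The formulas above can be checked numerically. In this sketch (hypothetical function names), `binomial_cdf` sums the binomial probabilities up to ⌊x⌋, and `simulate_binomial` mimics the coin-tossing construction X = X_1 + ··· + X_n, using Python's `random` module as the source of independent tosses; the parameters n = 4, p = 1/2 are arbitrary choices.

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P{X = k} = C(n, k) p^k (1 - p)^(n - k) for 0 <= k <= n, and 0 otherwise."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(x: float, n: int, p: float) -> float:
    """F(x) = sum of P{X = i} over integers 0 <= i <= min(floor(x), n)."""
    if x < 0:
        return 0.0
    top = min(math.floor(x), n)
    return sum(binomial_pmf(i, n, p) for i in range(top + 1))


def simulate_binomial(n: int, p: float, rng: random.Random) -> int:
    """X = X_1 + ... + X_n, where X_k = 1 if the k-th toss of a p-coin is H."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))


rng = random.Random(2017)
samples = [simulate_binomial(4, 0.5, rng) for _ in range(20_000)]
empirical = sum(s <= 1 for s in samples) / len(samples)
print(binomial_cdf(1.5, 4, 0.5))   # 0.3125 = (1 + 4) / 16
print(empirical)                   # close to 0.3125
```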
Example 0.3 (Poisson distribution with parameter λ). Let λ > 0. On (R, B_R) define the probability measure
$$\mu(B) = \sum_{k \in B \cap \{0, 1, 2, \dots\}} \frac{\lambda^k e^{-\lambda}}{k!}, \quad B \in \mathcal{B}_{\mathbb{R}}.$$
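Then µ is a probability measure on B_R, called the Poisson distribution with parameter λ. The Poisson distribution function is given by
$$F(x) = \begin{cases} 0 & \text{if } x < 0,\\ \displaystyle\sum_{k \in \{0, 1, 2, \dots\},\, k \le x} \frac{\lambda^k e^{-\lambda}}{k!} & \text{if } x \ge 0. \end{cases}$$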
Question: Is F indeed a distribution function? By the definition of a distribution function (given at the beginning of this chapter), F is a distribution function if there exists a random variable X such that F(x) = P{X ≤ x} for all x ∈ R.
I will give one such construction. Observe that X should take values in {0, 1, 2, ...} = {0} ∪ N. So take Ω = {0} ∪ N, F = P(Ω) and
$$P(A) = \sum_{k \in A} \frac{\lambda^k e^{-\lambda}}{k!}, \quad A \subseteq \Omega.$$
On this probability space, define X : Ω → R by X(ω) = ω. Then X is a random variable and
$$P\{X \le x\} = \sum_{k \in \{0\} \cup \mathbb{N}:\, k \le x} \frac{\lambda^k e^{-\lambda}}{k!} = F(x), \quad x \ge 0,$$
and P{X ≤ x} = 0 = F(x) for x < 0, i.e., F is the distribution function of X.
Exercise: Give all the details.
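Part of these details can be seen in code. The sketch below (hypothetical name `poisson_cdf`) evaluates P{X ≤ x} on the space Ω = {0} ∪ N with X(ω) = ω by summing the weights λ^k e^{−λ}/k! over the integers k ≤ x; updating each term from the previous one by the factor λ/k is an implementation choice that avoids computing large factorials.

```python
import math


def poisson_cdf(x: float, lam: float) -> float:
    """P{X <= x} for X(omega) = omega on Omega = {0} ∪ N with Poisson(lam) weights.

    Sums lam^k e^{-lam} / k! over the integers 0 <= k <= x; returns 0 for x < 0.
    """
    if x < 0:
        return 0.0
    total = 0.0
    term = math.exp(-lam)   # the k = 0 term
    k = 0
    while k <= x:
        total += term
        k += 1
        term *= lam / k     # turns lam^(k-1) e^{-lam} / (k-1)! into lam^k e^{-lam} / k!
    return total


print(poisson_cdf(0.9, 2.0) == poisson_cdf(0.0, 2.0))  # True: F is flat on [0, 1)
print(poisson_cdf(60, 2.0))                            # very close to 1
```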
Example 0.4 (Geometric distribution) Let 0 < p < 1. On (R, B_R) define the probability measure
$$\mu(B) = \sum_{k \in B \cap \mathbb{N}} p(1-p)^{k-1}, \quad B \in \mathcal{B}_{\mathbb{R}},$$
so that µ({k}) = p(1 − p)^{k−1}, k = 1, 2, .... Then µ is a probability measure on B_R, called the geometric distribution with parameter p.
Students may add the details as in the previous example.
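One of these details is that µ(R) = Σ_{k≥1} p(1 − p)^{k−1} = 1, a geometric series; summing it up to ⌊x⌋ gives F(x) = 1 − (1 − p)^{⌊x⌋} for x ≥ 1 and F(x) = 0 for x < 1. The sketch below (hypothetical names, arbitrary p = 0.3) checks this closed form against the direct sum.

```python
import math


def geometric_pmf(k: int, p: float) -> float:
    """mu({k}) = p (1 - p)^(k - 1) for k = 1, 2, ...; 0 elsewhere."""
    return p * (1 - p) ** (k - 1) if k >= 1 else 0.0


def geometric_cdf(x: float, p: float) -> float:
    """F(x) = sum_{k=1}^{floor(x)} p (1 - p)^(k - 1) = 1 - (1 - p)^floor(x) for x >= 1."""
    if x < 1:
        return 0.0
    return 1.0 - (1.0 - p) ** math.floor(x)


p = 0.3
partial = sum(geometric_pmf(k, p) for k in range(1, 11))
print(abs(partial - geometric_cdf(10, p)) < 1e-12)   # True: geometric-series formula
print(geometric_cdf(200, p))                         # 1 - 0.7**200, essentially 1
```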
Example 0.5 (Uniform distribution on [0, 1)) The function F given by
$$F(x) = \begin{cases} 0 & \text{if } x \le 0,\\ x & \text{if } 0 < x \le 1,\\ 1 & \text{if } x \ge 1 \end{cases}$$
is called the Uniform [0, 1) distribution function.
As discussed above, one first needs to check that this is indeed a distribution function, i.e., we need to construct a random variable X such that P{X ≤ x} = F(x), x ∈ R.
To this end, we first describe a probability space. On (R, B_R), define the probability measure µ such that
µ(B) = l(B ∩ [0, 1))
whenever the Borel set B is an interval, where l(B ∩ [0, 1)) denotes the length of the interval B ∩ [0, 1) if it is non-empty, and 0 otherwise. Now take (Ω, F, P) = (R, B_R, µ) and define X : Ω → R by X(ω) = ω, the identity function. Then X is a random variable (exercise) and
$$P\{X \le x\} = P((-\infty, x]) = l((-\infty, x] \cap [0, 1)) = F(x), \quad x \in \mathbb{R}.$$
The probability measure µ is called the uniform distribution on [0, 1).
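For intervals of the form (−∞, x], the length formula can be evaluated directly. The sketch below (hypothetical name `uniform_cdf`) computes l((−∞, x] ∩ [0, 1)) and compares it with draws from Python's `random.Random.random()`, which returns floats in [0, 1) and is used here only as a stand-in for a uniform [0, 1) random variable.

```python
import random


def uniform_cdf(x: float) -> float:
    """l((-inf, x] ∩ [0, 1)): length of the intersection, taken to be 0 if empty."""
    # The intersection is empty for x < 0, [0, x] for 0 <= x < 1, and [0, 1) for x >= 1.
    return max(0.0, min(x, 1.0))


rng = random.Random(0)
draws = [rng.random() for _ in range(20_000)]   # stand-in for U ~ uniform [0, 1)
empirical = sum(u <= 0.3 for u in draws) / len(draws)
print(uniform_cdf(-2.0), uniform_cdf(0.25), uniform_cdf(3.0))  # 0.0 0.25 1.0
print(empirical)                                               # close to 0.3
```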
Example 0.6 (Normal distribution with parameters µ, σ) Let µ ∈ R and σ > 0. The function F : R → R given by
$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy$$
is called the normal distribution function with parameters µ, σ. Again, to see that F is indeed a distribution function, I will give another useful construction, this time of the ’normal’ random variable. Let U be a uniform [0, 1) random variable defined on (Ω, F, P). One can check (exercise) that F is strictly increasing and continuous with 0 < F(x) < 1 for all x ∈ R.
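Define X = F^{−1} ∘ U (on the event {U = 0}, which has probability zero, set X equal to any fixed value, since F^{−1} is defined only on (0, 1)). Then (exercise) X is a random variable on (Ω, F, P). Since F is strictly increasing, F^{−1}(U) ≤ x if and only if U ≤ F(x), so
$$P\{X \le x\} = P\{F^{-1}(U) \le x\} = P\{U \le F(x)\} = F(x),$$
where the last equality uses P{U ≤ t} = t for 0 < t < 1, together with 0 < F(x) < 1.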
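The construction X = F^{−1} ∘ U can be tried numerically. This sketch assumes Python's `statistics.NormalDist`, whose `inv_cdf` method serves as F^{−1} (a numerical inverse, not part of the notes), and redraws U in the probability-zero case U = 0, where `inv_cdf` is undefined. The parameters µ = 1, σ = 2 and the function name are arbitrary choices.

```python
import random
from statistics import NormalDist


def sample_normal(mu: float, sigma: float, rng: random.Random) -> float:
    """Draw X = F^{-1}(U), with F the normal(mu, sigma) distribution function."""
    F = NormalDist(mu, sigma)
    u = rng.random()            # U ~ uniform [0, 1)
    while u == 0.0:             # F^{-1}(0) is undefined; the event {U = 0} has probability 0
        u = rng.random()
    return F.inv_cdf(u)


rng = random.Random(1)
xs = [sample_normal(1.0, 2.0, rng) for _ in range(20_000)]
for x in (1.0, 3.0):
    empirical = sum(v <= x for v in xs) / len(xs)
    print(x, empirical, NormalDist(1.0, 2.0).cdf(x))   # the last two should be close
```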
Example 0.7 (Exponential distribution with parameter λ > 0)
The function F given by
$$F(x) = \begin{cases} 0 & \text{if } x \le 0,\\ 1 - e^{-\lambda x} & \text{if } x > 0 \end{cases}$$
is called the exponential distribution function with parameter λ. The details of this example are left as an exercise.
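For the exponential distribution, F is continuous and strictly increasing on (0, ∞), which it maps onto (0, 1), and solving 1 − e^{−λx} = u gives F^{−1}(u) = −log(1 − u)/λ, so the construction of the previous example applies. A minimal sketch with hypothetical names and λ = 2 chosen arbitrarily; for u = 0 it returns x = 0, which happens only with probability zero.

```python
import math
import random


def exp_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - e^{-lam x} for x > 0 and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def exp_inverse(u: float, lam: float) -> float:
    """Solve 1 - e^{-lam x} = u for x, i.e. x = -log(1 - u) / lam, for 0 <= u < 1."""
    return -math.log1p(-u) / lam   # log1p(-u) = log(1 - u), accurate for small u


lam = 2.0
rng = random.Random(7)
xs = [exp_inverse(rng.random(), lam) for _ in range(20_000)]
empirical = sum(x <= 0.5 for x in xs) / len(xs)
print(empirical, exp_cdf(0.5, lam))   # both close to 1 - e^{-1}
```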
Remark 0.1 If F : R → R satisfies the following:
(1)
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1,$$
(2) F is increasing (non-decreasing) and right continuous,
then one can show that there exists a random variable X on a probability space (Ω, F, P) such that
$$P\{X \le x\} = F(x), \quad x \in \mathbb{R}.$$
In fact, we have seen this in the above examples, where we used three methods to construct a probability space and a random variable on it satisfying P{X ≤ x} = F(x), x ∈ R. Two of the methods work only in special cases, where the construction is easy, and the third method works for any F satisfying (1) and (2), but its details are difficult (in fact, the ’finer details’ are beyond the scope of this course).
Method I: This method is for ’discrete’ F satisfying (1) and (2), i.e., F ’increases’ only through jumps, and hence all the jump sizes add up to 1. Examples of such F’s are the Bernoulli, Binomial, Poisson and Geometric distribution functions. The precise definition of a ’discrete’ F is given in the subsection on ’classification of random variables’.
Here one takes Ω = D = {x_i | i ∈ I}, the set of discontinuities of F (here I is countable), F = P(Ω), and P defined by
$$P(\{x_i\}) = F(x_i) - F(x_i-), \quad i \in I,$$
i.e., P({x_i}) is the jump size of F at x_i. Now define X : Ω → R by X(ω) = ω. Then
$$P\{X \le x\} = \sum_{i:\, x_i \le x} P\{X = x_i\} = \sum_{i:\, x_i \le x} P(\{x_i\}) = F(x).$$
(Instruction: Students should carefully check how each equality follows.)
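The construction of Method I can be sketched for a finite set of jumps. The helper names below are hypothetical, and the sketch assumes that F is 0 to the left of the smallest jump point and constant between consecutive jump points, so that F(x_i−) = F(x_{i−1}); the step function used, with jumps 1/4, 1/2, 1/4 at 0, 1, 2, is an arbitrary example.

```python
def jump_masses(F, D):
    """P({x_i}) = F(x_i) - F(x_i-) for a finite, sorted set D of jump points.

    Assumes (for this sketch) that F is 0 to the left of D[0] and constant
    between consecutive points of D, so that F(x_i-) = F(x_{i-1}).
    """
    masses = {}
    left_limit = 0.0                 # F(x_1-) = 0 under the assumption above
    for x in D:
        masses[x] = F(x) - left_limit
        left_limit = F(x)            # F is flat on [x_i, x_{i+1})
    return masses


def prob_le(masses, x):
    """P{X <= x} for X(omega) = omega on Omega = D: the sum of the jumps at x_i <= x."""
    return sum(m for xi, m in masses.items() if xi <= x)


def F(x):
    # A step distribution function with jumps 1/4, 1/2, 1/4 at 0, 1, 2.
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25
    if x < 2:
        return 0.75
    return 1.0


masses = jump_masses(F, [0, 1, 2])
print(masses)                                                                # {0: 0.25, 1: 0.5, 2: 0.25}
print(all(prob_le(masses, x) == F(x) for x in (-1, 0, 0.5, 1, 1.7, 2, 5)))  # True
```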
Method II: When F is continuous and strictly increasing, one can use the following. Let U be a uniform (0, 1) random variable on a probability space (Ω, F, P) and define X = F^{−1} ∘ U. Then, as explained in the example of the normal distribution,
$$P\{X \le x\} = P\{U \le F(x)\} = F(x).$$
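When F^{−1} has no closed form, it can be approximated numerically. The sketch below uses bisection (a swapped-in numerical technique, not part of the notes) on a continuous, strictly increasing F; the logistic function F(x) = 1/(1 + e^{−x}) is an assumption chosen only because its exact inverse log(u/(1 − u)) is available for comparison, and the function names are hypothetical.

```python
import math
import random


def inverse_by_bisection(F, u: float) -> float:
    """Approximate F^{-1}(u), for continuous strictly increasing F and 0 < u < 1."""
    lo, hi = -1.0, 1.0
    while F(lo) > u:      # widen the bracket until F(lo) <= u <= F(hi)
        lo *= 2.0
    while F(hi) < u:
        hi *= 2.0
    for _ in range(200):  # repeated halving; far more steps than double precision needs
        mid = 0.5 * (lo + hi)
        if F(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_cdf(x: float) -> float:
    # Example F (an assumption for illustration): continuous, strictly increasing,
    # with exact inverse log(u / (1 - u)) to compare against.
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(3)
u = rng.random()
while u == 0.0:           # F^{-1}(0) is undefined; {U = 0} has probability zero
    u = rng.random()
x = inverse_by_bisection(logistic_cdf, u)   # X = F^{-1}(U)
print(x, math.log(u / (1.0 - u)))           # the two values should agree closely
```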
Method III: This method is very general and works for any F satisfying (1) and (2). It relies on defining a probability measure P on (Ω, F) = (R, B_R) such that
$$P((-\infty, x]) = F(x), \quad x \in \mathbb{R}.$$
(Note that P is nothing but the distribution µ corresponding to F.)
Now define X : Ω → R by X(ω) = ω. Then
$$P\{X \le x\} = P((-\infty, x]) = F(x), \quad x \in \mathbb{R}.$$
See the example of the uniform distribution.
A Classification of random variables. Random variables can be classified, using their distribution functions, according to their continuity properties.
Definition 5.3: A random variable X with distribution function F : R → R is said to be a discrete random variable if
$$\sum_{x \in D} \big(F(x) - F(x-)\big) = 1,$$
where D is the set of discontinuities of F.
Observe that a ’discrete’ distribution function F exhausts all of its probability mass through its jumps.
Lemma 0.1 If F is a discrete distribution function, then it is of the form
$$F(x) = \sum_{i:\, x_i \in D} p_i H_0(x - x_i), \quad x \in \mathbb{R},$$
where H_0 denotes the Heaviside function¹, p_i = P{X = x_i} = F(x_i) − F(x_i−), and D is the set of discontinuities of F.
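Proof: Let D = {x_i | i ∈ I}, where the index set I is countable, and let X be a random variable with distribution function F. Since D is countable,
$$P\{X \in D\} = \sum_{i \in I} P\{X = x_i\} = \sum_{i \in I} \big(F(x_i) - F(x_i-)\big) = 1,$$
so P{X ∉ D} = 0. Therefore
$$F(x) = P\{X \le x\} = P\{X \le x,\ X \in D\} = \sum_{i \in I:\, x_i \le x} P\{X = x_i\} = \sum_{i \in I} p_i H_0(x - x_i).$$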
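The representation in Lemma 0.1 translates directly into a sum of shifted Heaviside functions. In this sketch the jump points and sizes are those of a Binomial(3, 1/2) distribution, an arbitrary choice, and the function names are hypothetical.

```python
def H0(x: float) -> float:
    """Heaviside function: 0 for x < 0 and 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def discrete_cdf(points, masses, x: float) -> float:
    """F(x) = sum_i p_i H0(x - x_i), the representation in Lemma 0.1."""
    return sum(p * H0(x - xi) for xi, p in zip(points, masses))


# Binomial(3, 1/2): jumps 1/8, 3/8, 3/8, 1/8 at 0, 1, 2, 3 (an example choice).
points = [0, 1, 2, 3]
masses = [0.125, 0.375, 0.375, 0.125]
for x in (-0.1, 0, 1, 2.5, 3):
    print(x, discrete_cdf(points, masses, x))
```

Because H_0(0) = 1, the sum already takes the value after the jump at each x_i, which is exactly the right continuity of F.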
The distributions in Examples 0.1, 0.2, 0.3 and 0.4 correspond to discrete random variables.
Definition 5.4 A random variable X whose distribution function F : R → R is continuous is said to be a random variable with a continuous distribution, or, in short, a continuous random variable.
The distributions given in Examples 0.5, 0.6 and 0.7 correspond to continuous random variables.
Definition 5.5 (Probability mass function)
Let X be a discrete random variable with distribution function F : R → R. Define f : R → R as follows:
$$f(x) = F(x) - F(x-), \quad x \in \mathbb{R}.$$
Then f is called the probability mass function (pmf) of X.
For example, the pmf of the discrete random variable given in Example 4.0.26 is given by
$$f(x) = \begin{cases} \tfrac14 & \text{if } x = 0, 2,\\ \tfrac12 & \text{if } x = 1,\\ 0 & \text{otherwise}. \end{cases}$$
¹ The Heaviside function H_0 is defined by
$$H_0(x) = \begin{cases} 0 & \text{if } x < 0,\\ 1 & \text{if } x \ge 0. \end{cases}$$
It is left as an exercise for the student to write down the pmfs of the random variables in Examples 0.1, 0.2, 0.3.
The pmf of a continuous random variable is the zero function, since F(x) = F(x−) for every x. Hence the notion of a pmf is useless for continuous random variables.
Definition 5.6 (Probability density function)
A continuous random variable X with distribution function F : R → R is said to have a probability density function (pdf) if there exists a function f : R → R such that
$$F(x) = \int_{-\infty}^{x} f(y)\, dy \quad \forall\, x \in \mathbb{R}.$$
If such an f exists, it is called the pdf of X.
A continuous random variable with a pdf is called an absolutely continuous random variable.
It is easy to see that if F is differentiable everywhere and its derivative F′ is a continuous function, then the corresponding random variable X has a pdf, given by f = F′. This condition is sufficient but not necessary, as the following example shows.
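Example 0.8 Define F : R → R as follows:
$$F(x) = \begin{cases} 0 & \text{if } x < 0,\\ x & \text{if } 0 \le x < \tfrac12,\\ \tfrac12 & \text{if } \tfrac12 \le x < 1,\\ x - \tfrac12 & \text{if } 1 \le x < \tfrac32,\\ 1 & \text{if } x \ge \tfrac32. \end{cases}$$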
Students can verify that F is the distribution function of the random variable given by the random experiment of picking a point ’at random’ from [0, 1/2] ∪ [1, 3/2].
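Then F is the distribution function of a continuous random variable, but F is not differentiable at x = 0, 1/2, 1, 3/2. The function
$$f(x) = \begin{cases} 0 & \text{if } x < 0,\\ 1 & \text{if } 0 \le x < \tfrac12,\\ 0 & \text{if } \tfrac12 \le x < 1,\\ 1 & \text{if } 1 \le x < \tfrac32,\\ 0 & \text{if } x \ge \tfrac32 \end{cases}$$
is a pdf of F.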
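That f is a pdf of F can be checked numerically by comparing ∫_{−∞}^x f(y) dy with F(x). The sketch below uses the midpoint rule (an approximation chosen for illustration) starting from −1, which loses nothing since f vanishes on (−∞, 0); the function names are hypothetical.

```python
def F(x: float) -> float:
    """Distribution function of Example 0.8."""
    if x < 0:
        return 0.0
    if x < 0.5:
        return x
    if x < 1:
        return 0.5
    if x < 1.5:
        return x - 0.5
    return 1.0


def f(x: float) -> float:
    """Candidate pdf: 1 on [0, 1/2) ∪ [1, 3/2) and 0 elsewhere."""
    return 1.0 if (0 <= x < 0.5) or (1 <= x < 1.5) else 0.0


def integral_of_f(x: float, a: float = -1.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, x].

    Since f = 0 on (-inf, 0), starting at a = -1 loses nothing.
    """
    if x <= a:
        return 0.0
    h = (x - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


for x in (-0.5, 0.25, 0.75, 1.2, 2.0):
    print(x, F(x), round(integral_of_f(x), 4))
```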
Distribution function of transformation of random variables: In this subsection, we will see how one can write down the distribution function of Y = ϕ ∘ X in terms of the distribution of X, where ϕ : R → R is Borel measurable. In general it is not possible to give an explicit formula, but in some cases one can. My plan is to give a general recipe and illustrate it through examples. As an example, I will treat one special class of transformations in detail. Though explicit formulas can be given in many other cases, I will not derive them; instead, I will take some examples and show you how to use the recipe.
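General Recipe: The distribution of Y is given by
$$\mu_Y(B) = P\{Y \in B\} = P\{\varphi(X) \in B\} = P\{X \in \varphi^{-1}(B)\} = \mu_X(\varphi^{-1}(B)), \quad B \in \mathcal{B}_{\mathbb{R}}.$$
Hence, taking B = (−∞, y], y ∈ R, we get the following:
$$F_Y(y) = \mu_X(\varphi^{-1}((-\infty, y])), \quad y \in \mathbb{R}.$$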
Hence, to compute the distribution function of Y in terms of the distribution of X, one needs to identify the set ϕ^{−1}((−∞, y]). I will illustrate this in the next example.
Example 0.9 (Reading exercise) Let ϕ : R → R be a continuous function which is increasing (non-decreasing), and let y ∈ R be such that ϕ^{−1}(y) := ϕ^{−1}({y}) is non-empty and bounded above. Then ϕ^{−1}((−∞, y]) = (−∞, sup ϕ^{−1}(y)]. This implies that
$$F_Y(y) = F_X(\sup \varphi^{-1}(y)).$$
In particular, if ϕ is strictly increasing, then F_Y(y) = F_X(ϕ^{−1}(y)).
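Now we prove that ϕ^{−1}((−∞, y]) = (−∞, sup ϕ^{−1}(y)]. First,
$$\begin{aligned} x \in \varphi^{-1}((-\infty, y]) &\Rightarrow \varphi(x) \in (-\infty, y]\\ &\Rightarrow \varphi(x) \le y\\ &\Rightarrow x \le z \text{ for all } z \in \varphi^{-1}(y), \text{ or } \varphi(x) = y\\ &\Rightarrow x \in (-\infty, \sup \varphi^{-1}(y)]. \end{aligned}$$
The implication ’ϕ(x) ≤ y ⇒ x ≤ z for all z ∈ ϕ^{−1}(y), or ϕ(x) = y’ follows from this argument: suppose there exists some z ∈ ϕ^{−1}(y) such that x > z; then ϕ(x) ≥ ϕ(z) = y, and since ϕ(x) ≤ y, we get ϕ(x) = y.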
Now we prove the reverse inclusion. Suppose x ≤ sup ϕ^{−1}(y). Then either (I): x ≤ z for some z ∈ ϕ^{−1}(y), or (II): x > z for all z ∈ ϕ^{−1}(y), in which case x = sup ϕ^{−1}(y) and there exists a sequence z_n ∈ ϕ^{−1}(y) with z_n → x. Now
$$\text{(I)} \Rightarrow \varphi(x) \le \varphi(z) = y \Rightarrow x \in \varphi^{-1}((-\infty, y]),$$
$$\text{(II)} \Rightarrow \varphi(x) = \lim_{n \to \infty} \varphi(z_n) = y \ (\text{by continuity of } \varphi) \Rightarrow x \in \varphi^{-1}(y) \subseteq \varphi^{-1}((-\infty, y]).$$
This completes the proof of the reverse inclusion, and hence the proof is complete.
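The role of the supremum shows up as soon as ϕ has a flat piece. The sketch below uses a hypothetical ϕ (ϕ(x) = x for x < 0, 0 on [0, 1], x − 1 for x > 1), for which ϕ^{−1}(0) = [0, 1], so F_Y(0) = F_X(1) rather than F_X(0); X is taken to be standard normal (an illustrative assumption) and F_Y is estimated from simulated samples using Python's `random` and `statistics.NormalDist`.

```python
import random
from statistics import NormalDist


def phi(x: float) -> float:
    """Hypothetical continuous, increasing (not strictly) map that is flat on [0, 1]."""
    if x < 0:
        return x
    if x <= 1:
        return 0.0
    return x - 1.0


def sup_preimage(y: float) -> float:
    """sup phi^{-1}(y), computed by hand for this phi (its range is all of R)."""
    # phi^{-1}(y) = {y} for y < 0, [0, 1] for y = 0, and {y + 1} for y > 0.
    return y if y < 0 else y + 1.0


F_X = NormalDist(0.0, 1.0).cdf                           # X ~ N(0, 1), an illustrative choice
rng = random.Random(11)
ys = [phi(rng.gauss(0.0, 1.0)) for _ in range(20_000)]   # samples of Y = phi(X)
for y in (-1.0, 0.0, 0.5):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(y, empirical, F_X(sup_preimage(y)))   # empirical F_Y(y) vs F_X(sup phi^{-1}(y))
```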
Example 0.10 ϕ(x) = x³ (prototype for a ϕ which is strictly increasing and continuous). Hence F_Y(y) = F_X(y^{1/3}). Here note that y^{1/3} = −|y|^{1/3} for y < 0.
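A small practical point when evaluating F_X(y^{1/3}) numerically: in Python, a negative number raised to the float power 1/3 gives a complex result, so the real cube root −|y|^{1/3} has to be formed explicitly. The sketch below (hypothetical name `real_cbrt`) does this and compares F_X(y^{1/3}) with simulated values of Y = X³, taking X standard normal as an illustrative assumption.

```python
import math
import random
from statistics import NormalDist


def real_cbrt(y: float) -> float:
    """Real cube root, y^{1/3} = -|y|^{1/3} for y < 0.

    In Python, (-8) ** (1/3) is a complex number, so the sign is handled explicitly.
    """
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


F_X = NormalDist(0.0, 1.0).cdf                           # X ~ N(0, 1), an illustrative choice
rng = random.Random(5)
ys = [rng.gauss(0.0, 1.0) ** 3 for _ in range(20_000)]   # samples of Y = X^3
for y in (-8.0, -0.5, 0.0, 1.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(y, empirical, F_X(real_cbrt(y)))
```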
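Example 0.11 Let ϕ(x) = x² + 1 (prototype for a ϕ which is continuous but not monotone, i.e., has a ’turning’ point). Then
$$\varphi^{-1}((-\infty, y]) = \begin{cases} \emptyset & \text{if } y < 1,\\ \{0\} & \text{if } y = 1,\\ [-\sqrt{y-1}, \sqrt{y-1}] & \text{if } y > 1. \end{cases}$$
Hence F_Y(y) = 0 for y < 1, and for y ≥ 1,
$$F_Y(y) = \mu_X([-\sqrt{y-1}, \sqrt{y-1}]) = F_X(\sqrt{y-1}) - F_X\big((-\sqrt{y-1})-\big).$$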
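The formula can be checked by simulation. In this sketch X is standard normal (an assumption for illustration) and the function name is hypothetical; since this F_X is continuous, F_X((−√(y − 1))−) = F_X(−√(y − 1)), which the code uses.

```python
import math
import random
from statistics import NormalDist


def F_Y(y: float, F_X) -> float:
    """F_Y(y) for Y = X^2 + 1 and a continuous F_X (so F_X(a-) = F_X(a))."""
    if y < 1:
        return 0.0                 # phi^{-1}((-inf, y]) is empty
    r = math.sqrt(y - 1.0)
    return F_X(r) - F_X(-r)        # mu_X([-r, r]); the left limit at -r equals F_X(-r) here


F_X = NormalDist(0.0, 1.0).cdf     # X ~ N(0, 1), an illustrative choice
rng = random.Random(9)
ys = [rng.gauss(0.0, 1.0) ** 2 + 1.0 for _ in range(20_000)]   # samples of Y = X^2 + 1
for y in (0.5, 1.0, 1.5, 2.0, 5.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(y, empirical, F_Y(y, F_X))
```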
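Example 0.12 Let ϕ be the Heaviside function, i.e., ϕ(x) = 0 if x < 0 and ϕ(x) = 1 if x ≥ 0 (prototype for a ϕ which is piecewise continuous). Then
$$\varphi^{-1}((-\infty, y]) = \begin{cases} \emptyset & \text{if } y < 0,\\ (-\infty, 0) & \text{if } 0 \le y < 1,\\ \mathbb{R} & \text{if } y \ge 1. \end{cases}$$
Hence
$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0,\\ F_X(0-) & \text{if } 0 \le y < 1,\\ 1 & \text{if } y \ge 1. \end{cases}$$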
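The left limit F_X(0−) matters when X has an atom at 0. The sketch below takes X uniform on {−1, 0, 1} (an illustrative choice, for which F_X(0−) = 1/3 but F_X(0) = 2/3), computes F_Y directly by pushing the distribution of X forward through H_0, and compares it with the formula above using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def H0(x) -> int:
    """Heaviside function: 0 for x < 0 and 1 for x >= 0."""
    return 1 if x >= 0 else 0


# X uniform on {-1, 0, 1}: an illustrative choice where F_X(0-) = 1/3 differs from F_X(0) = 2/3.
X_dist = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}


def F_Y_direct(y) -> Fraction:
    """P{H0(X) <= y}, by pushing the distribution of X forward through H0."""
    return sum((p for x, p in X_dist.items() if H0(x) <= y), Fraction(0))


def F_X_left(a) -> Fraction:
    """F_X(a-) = P{X < a}."""
    return sum((p for x, p in X_dist.items() if x < a), Fraction(0))


def F_Y_formula(y) -> Fraction:
    """The formula of Example 0.12: 0, F_X(0-), 1 on y < 0, 0 <= y < 1, y >= 1."""
    if y < 0:
        return Fraction(0)
    if y < 1:
        return F_X_left(0)
    return Fraction(1)


for y in (-1, 0, 0.5, 1, 2):
    print(y, F_Y_direct(y), F_Y_formula(y))
```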