Chapter 1
Probability
1.1 Definition
prop If E ∩ F = ∅ then P(E ∪ F) = P(E) + P(F)
prop Inclusion-exclusion principle:
P(A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ) = Σᵢ P(Aᵢ) − Σ_{i<j} P(Aᵢ ∩ Aⱼ) + Σ_{i<j<k} P(Aᵢ ∩ Aⱼ ∩ Aₖ) − ⋯ + (−1)^(n+1) P(A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ)
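As an illustration, the identity can be checked by brute force on a small equally likely sample space; the die events below are hypothetical examples.

```python
from itertools import combinations

def prob_union(events, sample_space):
    """P(A1 ∪ ... ∪ An) by inclusion-exclusion over equally likely outcomes."""
    total = 0.0
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)  # alternate +, -, +, ...
        for combo in combinations(events, r):
            inter = set.intersection(*combo)
            total += sign * len(inter) / len(sample_space)
    return total

# hypothetical events on a fair six-sided die
S = set(range(1, 7))
A = {1, 2, 3}    # low roll
B = {2, 4, 6}    # even roll
C = {3, 6}       # multiple of 3
# direct count: |A ∪ B ∪ C| / |S| = |{1,2,3,4,6}| / 6 = 5/6
print(prob_union([A, B, C], S))
```

The result agrees with counting the union directly, which is the point of the proposition.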
1.2 Conditional probability and Bayes’ theorem
def The conditional probability of event E given event F is defined as:
P(E | F) = P(E ∩ F) / P(F), provided P(F) > 0.
Multiplication rule
P(E₁E₂E₃⋯Eₙ) = P(E₁)P(E₂ | E₁)P(E₃ | E₁E₂)⋯P(Eₙ | E₁E₂⋯Eₙ₋₁)
Total probability rule
Let E1 , E2 , E3 , . . . , En be a collection of mutually exclusive events whose union is sample space
S. Let E be any event, then
P(E) = Σⱼ₌₁ⁿ P(E | Eⱼ)P(Eⱼ)
Bayes’ Theorem
Let E1 , E2 , E3 , . . . , En be a collection of mutually exclusive events whose union is sample space
S. Let E be any event such that P(E ) ̸= 0. Then for any event Ek , k = 1, 2, 3, . . . , n,
P(Eₖ | E) = P(E | Eₖ)P(Eₖ) / Σⱼ₌₁ⁿ P(E | Eⱼ)P(Eⱼ) = P(E | Eₖ)P(Eₖ) / P(E)
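A short numerical sketch of Bayes' theorem; the machine/defect numbers below are made up for illustration. The denominator is exactly the total probability rule.

```python
def bayes_posterior(priors, likelihoods, k):
    """P(Ek | E) via Bayes' theorem.

    priors[j]      = P(Ej)   (mutually exclusive, sum to 1)
    likelihoods[j] = P(E | Ej)
    """
    # P(E) by the total probability rule
    total = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[k] * priors[k] / total

# hypothetical: three machines produce 50%, 30%, 20% of all items
priors = [0.5, 0.3, 0.2]
# hypothetical defect rates: P(defective | machine j)
likelihoods = [0.01, 0.02, 0.03]
# probability a defective item came from machine 3 (index 2)
print(bayes_posterior(priors, likelihoods, 2))
```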
Chapter 2
Discrete Distributions
2.1 Random variables
def A random variable is any rule that associates a number with each outcome in a sample
space.
Random variables are customarily denoted by uppercase letters. A particular value of the random variable is denoted by a lowercase letter.
def A Bernoulli random variable is a rv whose only possible values are 0 and 1.
2.2 Probability distributions
def The probability distribution of X gives how the total probability of 1 is allocated to each
value of the rv. Also known as the probability mass function (pmf).
p(x) = P(X = x) = P({s ∈ S : X(s) = x})
def A parameter is a quantity that can be assigned any one of a number of possible values,
with each different value determining a different probability distribution.
The collection of all probability distributions for different values of the parameter is called a
family of probability distributions.
def The cumulative distribution function (cdf) of a discrete random variable X is the probability
that X will take a value less than or equal to x.
F(x) = P(X ≤ x) = Σ_{y : y ≤ x} p(y)
The graph of the cdf of a discrete random variable is a step function.
prop P(a ≤ X ≤ b) = F(b) − F(a−),
where "a−" represents the largest possible value of the rv X strictly less than a.
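A minimal sketch of a discrete cdf, assuming a fair six-sided die as the example distribution; note how P(2 ≤ X ≤ 4) uses F at the largest value below 2.

```python
def cdf(pmf, x):
    """F(x) = P(X <= x) for a discrete rv given as {value: probability}."""
    return sum(p for y, p in pmf.items() if y <= x)

# fair six-sided die (assumed example)
pmf = {x: 1 / 6 for x in range(1, 7)}
print(cdf(pmf, 4))                 # F(4) = 4/6
# P(2 <= X <= 4) = F(4) - F(2-) = F(4) - F(1)
print(cdf(pmf, 4) - cdf(pmf, 1))   # 3/6
```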
2.3 Expected values
def The expected value or mean value of a rv X is the average value of X on performing repeated
trials of an experiment.
The expectation of a discrete random variable is the weighted average of all possible outcomes,
where the weights are the probabilities of realizing each given value.
E(X) = μₓ = Σ_{x∈D} x · p(x)
The expected value of X describes where the probability distribution is centered.
prop For any function h(X), E[h(X)] = Σ_{x∈D} h(x) · p(x), where D is the set of possible values of the rv X and p(x) is its pmf.
prop For a linear function, E (aX + b) = a · E (X ) + b.
prop E [X + Y ] = E [X ] + E [Y ]
prop E [X Y ] = E [X ] · E [Y ] if X and Y are independent variables.
def If an rv X has a pmf p(x) and expected value μ, the variance (V(X) or σ²ₓ) is given by:
V(X) = Σ_{x∈D} (x − μ)² · p(x) = E[(X − μ)²] = E[X²] − (E[X])²
prop V(cX) = c²V(X)
Proof. V(cX) = E[(cX)²] − (E[cX])² = E[c²X²] − c²(E[X])² = c²(E[X²] − (E[X])²) = c²V(X)
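The expectation and variance formulas above can be checked directly, again assuming a fair die as the example distribution:

```python
def expectation(pmf, h=lambda x: x):
    """E[h(X)] = sum over D of h(x) * p(x)."""
    return sum(h(x) * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[X^2] - (E[X])^2."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mu ** 2

die = {x: 1 / 6 for x in range(1, 7)}  # assumed example
print(expectation(die))                      # 3.5
print(variance(die))                         # 35/12 ≈ 2.9167
# linearity: E(aX + b) = a E(X) + b
print(expectation(die, lambda x: 2 * x + 1))  # 2*3.5 + 1 = 8
```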
2.4 Moment generating functions
def The kth moment of X is defined as E[Xᵏ].
def The kth central moment of X is defined as E[(X − μ)ᵏ].
def The moment generating function of X is denoted by M_X(t) and defined as M_X(t) = E[e^(tX)].
prop The kth moment of a random variable X is given by:
E[Xᵏ] = (dᵏ/dtᵏ) M_X(t), evaluated at t = 0
If the Taylor expansion of the mgf of X is M_X(t) = a₀ + a₁t + a₂t² + ⋯ + aₙtⁿ + ⋯, then E[Xⁿ] = n! aₙ.
prop If the moment of a specified order exists, then all lower-order moments automatically exist.
prop M_{X+a}(t) = e^(at) M_X(t)
prop M_{bX}(t) = M_X(bt)
prop The mgf of the sum of a number of independent random variables is equal to the product
of their respective mgfs.
M_{X₁+X₂+⋯+Xₙ}(t) = M_{X₁}(t) · M_{X₂}(t) ⋯ M_{Xₙ}(t)
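The product property can be verified exactly for the sum of two independent dice (an assumed example), by building the pmf of the sum from the joint distribution:

```python
import math

def mgf(pmf, t):
    """M_X(t) = E[e^(tX)] for a discrete rv given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}
# pmf of the sum of two independent dice, via the joint distribution
total = {}
for x, px in die.items():
    for y, py in die.items():
        total[x + y] = total.get(x + y, 0) + px * py

t = 0.3
print(mgf(total, t))             # mgf of the sum...
print(mgf(die, t) * mgf(die, t)) # ...equals the product of the mgfs
```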
2.5 Binomial random variable
def A Bernoulli trial is a random experiment or a trial whose outcome can be classified as either
a success or a failure.
def The binomial random variable X denotes the number of successes that occur in n independent Bernoulli trials. It takes the parameters n and p, where p is the probability of success, which remains the same for every trial.
If X is a binomial random variable with parameters (n, p), then we write it as X ∼ Bin(n, p).
The pmf of a binomial random variable with parameters (n, p) is given by:
b(x; n, p) = C(n, x) p^x q^(n−x) for x = 0, 1, …, n, and b(x; n, p) = 0 otherwise,
where q = 1 − p and C(n, x) = n!/(x!(n − x)!).
prop E[X] = np
Proof. Write X = X₁ + X₂ + ⋯ + Xₙ, where Xᵢ is the Bernoulli rv that equals 1 if trial i is a success. Since E[Xᵢ] = p for each trial, E[X] = E[X₁] + ⋯ + E[Xₙ] = np.
prop V (X ) = np(1 − p)
prop M_X(t) = (pe^t + 1 − p)^n
prop If Xᵢ (i = 1, 2, …, k) are independent binomial random variables with parameters (nᵢ, p), then their sum
X₁ + X₂ + ⋯ + Xₖ ∼ Bin(n₁ + n₂ + ⋯ + nₖ, p)
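A quick check of the binomial pmf and the mean/variance propositions; the parameters n = 10, p = 0.3 are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # assumed example parameters
pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}
mean = sum(x * q for x, q in pmf.items())
var = sum(x * x * q for x, q in pmf.items()) - mean ** 2
print(mean)  # np = 3.0
print(var)   # np(1 - p) = 2.1
```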
2.6 Geometric random variable
def A geometric random variable X is one which has a geometric distribution with parameter
p, 0 < p < 1.
The density function (pmf) of a geometric rv is given by:
f(x) = (1 − p)^(x−1) p = q^(x−1) p for x = 1, 2, 3, …, where q = 1 − p.
The cumulative distribution function of a geometric rv is given by:
F(x) = 0 for x < 1, and F(x) = 1 − q^⌊x⌋ for x ≥ 1,
where ⌊x⌋ denotes the greatest integer less than or equal to x.
prop E[X] = 1/p
prop V(X) = q/p²
prop M_X(t) = pe^t / (1 − qe^t), for t < −ln q
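A numerical sanity check of the geometric pmf, cdf, and mean; the parameter p = 0.25 is an assumed example, and the mean is approximated by a long partial sum.

```python
import math

def geom_pmf(x, p):
    """f(x) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geom_cdf(x, p):
    """F(x) = 1 - (1 - p)^floor(x) for x >= 1, else 0."""
    return 0.0 if x < 1 else 1 - (1 - p) ** math.floor(x)

p = 0.25  # assumed example
# partial sum of x * f(x) as a check against E[X] = 1/p = 4
approx_mean = sum(x * geom_pmf(x, p) for x in range(1, 500))
print(approx_mean)      # ≈ 4.0 (tail beyond 500 is negligible)
print(geom_cdf(3, p))   # 1 - 0.75^3 = 0.578125
```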
2.7 Poisson random variable
def The Poisson random variable X is one which has a Poisson distribution with parameter k.
The pmf of a Poisson distribution is given as:
p(x; k) = e^(−k) k^x / x!, for x = 0, 1, 2, … and k > 0.
Since e^k = Σ_{x=0}^∞ k^x / x!, it follows that Σ_{x=0}^∞ e^(−k) k^x / x! = 1.
prop E [X ] = k
prop V [X ] = k
prop M_X(t) = e^(k(e^t − 1)) for all t ∈ ℝ
prop For any binomial experiment in which n is large (n > 50) and p is small, b(x; n, p) ≈ p(x; k), where k = np.
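The approximation can be seen numerically; the parameters n = 200, p = 0.01 below are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, k):
    """p(x; k) = e^(-k) k^x / x!"""
    return exp(-k) * k ** x / factorial(x)

n, p = 200, 0.01   # large n, small p (assumed example)
k = n * p          # k = np = 2
for x in range(5):
    # the two columns agree to a few decimal places
    print(x, round(binom_pmf(x, n, p), 5), round(poisson_pmf(x, k), 5))
```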
2.7.1 Poisson process
A Poisson process is a counting process with rate λ such that X_t, the number of events that occur during the interval [0, t), is a Poisson random variable with parameter λt.
2.8 Hypergeometric random variable
A random sample of size n is drawn from a collection of N objects. Of the N objects, r objects
have a trait of interest.
def The hypergeometric random variable X is the number of objects with the trait of interest
in the random sample. It takes the parameters (N, n, r ).
The pmf of a hypergeometric random variable is given by:
P(X = x) = C(r, x) · C(N − r, n − x) / C(N, n)
prop E[X] = n · (r/N)
prop V(X) = n · (r/N) · ((N − r)/N) · ((N − n)/(N − 1))
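A check of the hypergeometric pmf and its moments; the parameters N = 50, n = 10, r = 15 are assumed for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, n, r):
    """P(X = x) = C(r, x) C(N - r, n - x) / C(N, n), zero outside the support."""
    if x < 0 or x > r or n - x < 0 or n - x > N - r:
        return 0.0
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

N, n, r = 50, 10, 15   # sample 10 of 50 objects; 15 carry the trait
pmf = {x: hypergeom_pmf(x, N, n, r) for x in range(n + 1)}
mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean ** 2
print(mean)  # n * r / N = 3.0
print(var)   # n (r/N)((N-r)/N)((N-n)/(N-1)) ≈ 1.7143
```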