
Statistical Inference

Lecture 4: Multiple Random Variables

MING GAO

DASE @ ECNU
(for course related communications)
[email protected]

Mar. 21, 2018


Outline
Joint and Marginal Distributions
Conditional Distribution and Independence
Bivariate Transformations
Hierarchical Models and Mixture Distributions
Covariance and Correlation
Multivariate Distributions
Inequalities
Numerical Inequalities
Functional Inequalities
Take-aways
2 / 97
Random vector
Definition
An n-dimensional random vector is a function from a sample
space Ω into R^n, n-dimensional Euclidean space.

Example
Consider the experiment of tossing two fair dice. Let
X = sum of the two dice, and Y = |difference of the two dice|.

• For the sample point (3, 3), X = 6 and Y = 0.
• For the sample point (4, 1) or (1, 4), X = 5 and Y = 3.
• Since each of the 36 sample points in Ω is equally likely,
  P(X = 5, Y = 3) = 2/36 = 1/18.
3 / 97
Joint PMF
Definition
Let (X, Y) be a discrete bivariate random vector. Then the
function f(x, y) from R² into R defined by

f(x, y) = P(X = x, Y = y)

is called the joint probability mass function or joint pmf of
(X, Y). If it is necessary, the notation fX,Y(x, y) will be used.

Example
There are 21 possible values of (X, Y). Two of these values are
f(5, 3) = 1/18 and f(6, 0) = 1/36.
4 / 97
Probability calculation
The joint pmf can be used to compute the probability of any
event defined in terms of (X, Y). Let A be any subset of R².
Then

P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y).

Example
Let A = {(x, y) | x = 7 and y ≤ 4}. Thus

P(A) = P(X = 7, Y ≤ 4) = f(7, 1) + f(7, 3) = 1/18 + 1/18 = 1/9.
5 / 97
Expectation
Expectations of functions of random vectors are computed just
as with univariate r.v.s. Let g(x, y) be a real-valued function
defined for all possible values (x, y) of the discrete random vec-
tor (X, Y). Then g(X, Y) is itself a random variable and its
expected value E(g(X, Y)) is given by

E(g(X, Y)) = Σ_{(x,y)∈R²} g(x, y)f(x, y).

Question:
For the above given (X, Y), what is the average value of XY?
Answer: Letting g(x, y) = xy, we compute E(XY) = E(g(X, Y)).
Thus,

E(XY) = 2 · 0 · (1/36) + · · · + 7 · 5 · (1/18) = 13 + 11/18.
6 / 97
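The numbers above can be checked by brute-force enumeration. A minimal Python sketch (not part of the lecture) rebuilds the joint pmf of (X, Y) from the 36 equally likely outcomes and evaluates E(XY):

```python
# Sketch: enumerate the 36 outcomes of two fair dice to verify the joint pmf
# values and E(XY) quoted above.
from fractions import Fraction
from itertools import product

joint = {}                      # joint pmf of (X, Y) = (sum, |difference|)
for a, b in product(range(1, 7), repeat=2):
    x, y = a + b, abs(a - b)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

print(joint[(5, 3)])            # 1/18
print(joint[(6, 0)])            # 1/36
exy = sum(x * y * p for (x, y), p in joint.items())
print(exy)                      # 245/18, i.e. 13 + 11/18
```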
Properties of joint pmf
Properties
• For any (x, y), f(x, y) ≥ 0 since f(x, y) is a probability.
• Since (X, Y) is certain to be in R²,
  Σ_{(x,y)∈R²} f(x, y) = P((X, Y) ∈ R²) = 1.
• It turns out that any nonnegative function from R² to R
  that is nonzero for at most a countable number of (x, y)
  pairs and sums to 1 is the joint pmf for some bivariate
  discrete random vector (X, Y).
7 / 97
Marginal pmf
Theorem
Let (X, Y) be a discrete bivariate random vector with joint
pmf fX,Y(x, y). Then the marginal pmfs of X and Y,
fX(x) = P(X = x) and fY(y) = P(Y = y), are given by

fX(x) = Σ_{y∈R} fX,Y(x, y),   fY(y) = Σ_{x∈R} fX,Y(x, y).

Proof.
For any x ∈ R, let Ax = {(x, y) | y ∈ R}. That is, Ax is the
line in the plane with first coordinate equal to x. Then, for any
x ∈ R:

fX(x) = P(X = x) = P(X = x, −∞ < Y < ∞) = P((X, Y) ∈ Ax)
      = Σ_{(x,y)∈Ax} fX,Y(x, y) = Σ_{y∈R} fX,Y(x, y).
8 / 97
Example
Given the above joint pmf, we can compute the marginal pmf
of Y.

fY(0) = fX,Y(2, 0) + fX,Y(4, 0) + fX,Y(6, 0)
      + fX,Y(8, 0) + fX,Y(10, 0) + fX,Y(12, 0) = 1/6.

Similarly, we have fY(1) = 5/18, fY(2) = 2/9, fY(3) = 1/6,
fY(4) = 1/9, and fY(5) = 1/18.
Note that Σ_{k=0}^{5} fY(k) = 1, as it must, since these are the
only six possible values of Y.
9 / 97
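A short sketch, under the same two-dice setup, that recovers the marginal pmf of Y by summing probabilities over the outcomes; the printed values match the list above:

```python
# Sketch: marginal pmf of Y = |difference| for two fair dice, obtained by
# accumulating the probability of each outcome by its value of Y.
from fractions import Fraction
from itertools import product

fY = {}
for a, b in product(range(1, 7), repeat=2):
    y = abs(a - b)
    fY[y] = fY.get(y, Fraction(0)) + Fraction(1, 36)

for y in sorted(fY):
    print(y, fY[y])          # 0 1/6, 1 5/18, 2 2/9, 3 1/6, 4 1/9, 5 1/18
print(sum(fY.values()))      # 1
```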
Joint PDF
Definition
A function f(x, y) from R² into R is called a joint probability
density function or joint pdf of the continuous bivariate random
vector (X, Y) if, for every A ⊂ R²,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

• If g(x, y) is a real-valued function, then the expected
  value of g(X, Y) is defined to be

  E(g(X, Y)) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y)f(x, y) dx dy.

• The marginal probability density functions of X and Y are

  fX(x) = ∫_{−∞}^{+∞} fX,Y(x, y) dy,   fY(y) = ∫_{−∞}^{+∞} fX,Y(x, y) dx.
10 / 97
Example
Define a joint pdf by

f(x, y) = 6xy²,  0 < x < 1 and 0 < y < 1;
f(x, y) = 0,     otherwise.

It is indeed a joint pdf, since
• f(x, y) ≥ 0 for all (x, y) in the defined range;
• it integrates to 1:

  ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = ∫_0^1 ∫_0^1 6xy² dx dy
  = ∫_0^1 3x²y²|_0^1 dy = ∫_0^1 3y² dy = y³|_0^1 = 1.
11 / 97
Calculating probability I
Now consider calculating a probability such as P(X + Y ≥ 1).
Letting A = {(x, y) | x + y ≥ 1}, we want P((X, Y) ∈ A).

A = {(x, y) | x + y ≥ 1, 0 < x < 1, 0 < y < 1}
  = {(x, y) | x ≥ 1 − y, 0 < x < 1, 0 < y < 1}
  = {(x, y) | 1 − y ≤ x < 1, 0 < y < 1}

Thus,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy = ∫_0^1 ∫_{1−y}^1 6xy² dx dy = 9/10.
12 / 97
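A numerical cross-check of the last two computations, assuming scipy is available; dblquad integrates the pdf 6xy² over the unit square and over the region x ≥ 1 − y:

```python
# Sketch: numerical check of f(x, y) = 6xy^2 on (0,1)x(0,1): total mass 1 and
# P(X + Y >= 1) = 9/10.  dblquad expects the inner variable as first argument.
from scipy.integrate import dblquad

f = lambda x, y: 6 * x * y**2

# total mass: outer x in (0,1), inner y in (0,1)
total, _ = dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 1)
print(total)        # ~1.0

# P(X + Y >= 1): outer y in (0,1), inner x from 1 - y to 1
p, _ = dblquad(lambda x, y: f(x, y), 0, 1, lambda y: 1 - y, lambda y: 1)
print(p)            # ~0.9
```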
Calculating marginal pdf
To calculate fX(x), we note that for x ≥ 1 or x ≤ 0, f(x, y) = 0.
Thus for x ≥ 1 or x ≤ 0, we have

fX(x) = ∫_{−∞}^{+∞} f(x, y) dy = 0.

For 0 < x < 1, we have

fX(x) = ∫_{−∞}^{+∞} f(x, y) dy = ∫_0^1 6xy² dy = 2xy³|_0^1 = 2x.

Similarly, we can calculate

fY(y) = 3y²,  0 < y < 1;
fY(y) = 0,    otherwise.
13 / 97
Calculating probability II
Let f(x, y) = e^{−y}, 0 < x < y < ∞, and A = {(x, y) | x + y ≥ 1}.
Notice that region A is an unbounded region with three sides
given by the lines y = x, x + y = 1 and x = 0. To integrate
over this region, we would have to break the region into at least
two parts to write the appropriate limits of integration.
Thus P((X, Y) ∈ A) can be calculated as

P(X + Y ≥ 1) = 1 − P(X + Y < 1)
             = 1 − ∫_0^{1/2} ∫_x^{1−x} e^{−y} dy dx
             = 1 − ∫_0^{1/2} (e^{−x} − e^{−(1−x)}) dx
             = 2e^{−1/2} − e^{−1}.
14 / 97
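A similar numerical sketch for this unbounded-region example, again assuming scipy; it evaluates the complement P(X + Y < 1) over 0 < x < 1/2, x < y < 1 − x and compares with 2e^{−1/2} − e^{−1}:

```python
# Sketch: numerical check of P(X + Y >= 1) = 2*exp(-1/2) - exp(-1) for the
# joint pdf f(x, y) = exp(-y) on 0 < x < y < infinity.
import numpy as np
from scipy.integrate import dblquad

# complement region: x + y < 1 forces 0 < x < 1/2 and x < y < 1 - x
p_lt, _ = dblquad(lambda y, x: np.exp(-y), 0, 0.5,
                  lambda x: x, lambda x: 1 - x)
print(1 - p_lt)                        # ~0.8452
print(2 * np.exp(-0.5) - np.exp(-1))   # same value
```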
Joint cdf
The joint probability distribution of (X, Y) can be completely
described with the joint cdf rather than with the joint pmf or
joint pdf.
The joint cdf is the function F(x, y) defined by

F(x, y) = P(X ≤ x, Y ≤ y).

• The joint cdf is usually not very handy for discrete cases;
• For a continuous bivariate random vector,

  F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) ds dt

  and

  ∂²F(x, y) / ∂x∂y = f(x, y).
15 / 97
Conditional pmf
Let (X, Y) be a discrete bivariate random vector with joint pmf
f(x, y) and marginal pmfs fX(x) and fY(y). For any x such that
P(X = x) = fX(x) > 0, the conditional pmf of Y given that
X = x is the function of y denoted by f(y|x) and defined by

f(y|x) = P(Y = y|X = x) = f(x, y) / fX(x).

For any y such that P(Y = y) = fY(y) > 0, the conditional
pmf of X given that Y = y is the function of x denoted by
f(x|y) and defined by

f(x|y) = P(X = x|Y = y) = f(x, y) / fY(y).
16 / 97
Example
Define the joint pmf of (X, Y) by

f(0, 10) = f(0, 20) = 2/18,  f(1, 10) = f(1, 30) = 3/18,
f(1, 20) = 4/18,  f(2, 30) = 4/18.

First, the marginal pmf of X is

fX(0) = f(0, 10) + f(0, 20) = 4/18,  fX(2) = f(2, 30) = 4/18,
fX(1) = f(1, 10) + f(1, 20) + f(1, 30) = 10/18.

For x = 0,

f(10|0) = f(0, 10)/fX(0) = 1/2,  f(20|0) = f(0, 20)/fX(0) = 1/2.
17 / 97
Example Cont’d
For x = 1,

f(10|1) = f(1, 10)/fX(1) = 3/10,
f(20|1) = f(1, 20)/fX(1) = 4/10,
f(30|1) = f(1, 30)/fX(1) = 3/10.

For x = 2,

f(30|2) = f(2, 30)/fX(2) = 1.
18 / 97
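A small sketch that reproduces the conditional pmfs of this example by dividing each joint probability by the appropriate marginal fX(x), using exact fractions:

```python
# Sketch: conditional pmfs f(y|x) computed from the joint pmf table above.
from fractions import Fraction

joint = {(0, 10): Fraction(2, 18), (0, 20): Fraction(2, 18),
         (1, 10): Fraction(3, 18), (1, 30): Fraction(3, 18),
         (1, 20): Fraction(4, 18), (2, 30): Fraction(4, 18)}

fX = {}
for (x, y), p in joint.items():
    fX[x] = fX.get(x, Fraction(0)) + p       # marginal pmf of X

for (x, y), p in sorted(joint.items()):
    print(f"f({y}|{x}) = {p / fX[x]}")
# f(10|0) = 1/2, f(20|0) = 1/2, f(10|1) = 3/10, f(20|1) = 2/5,
# f(30|1) = 3/10, f(30|2) = 1
```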
Conditional pdf
Let (X, Y) be a continuous bivariate random vector with joint
pdf f(x, y) and marginal pdfs fX(x) and fY(y). For any x such
that fX(x) > 0, the conditional pdf of Y given that X = x is
the function of y denoted by f(y|x) and defined by

f(y|x) = f(x, y) / fX(x).

For any y such that fY(y) > 0, the conditional pdf of X given
that Y = y is the function of x denoted by f(x|y) and defined by

f(x|y) = f(x, y) / fY(y).
19 / 97
Calculating conditional pdf
Let f(x, y) = e^{−y}, 0 < x < y < ∞, as in the earlier example.
We need to compute the conditional pdf of Y given X = x.
The marginal pdf of X is computed as
• For x ≤ 0, fX(x) = 0 since f(x, y) = 0;
• For x > 0,

  fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^{∞} e^{−y} dy = e^{−x}.

Thus, the conditional pdf of Y given X = x is
• f(y|x) = f(x, y)/fX(x) = e^{−y}/e^{−x} = e^{−(y−x)}, if y > x;
• f(y|x) = f(x, y)/fX(x) = 0/e^{−x} = 0, if y ≤ x.
20 / 97
Conditional expectation
If g(Y) is a function of Y, then the conditional expected value
of g(Y) given that X = x is denoted by E(g(Y)|x) and is
defined by

E(g(Y)|x) = Σ_y g(y)f(y|x)              (discrete case),
E(g(Y)|x) = ∫_{−∞}^{∞} g(y)f(y|x) dy    (continuous case).

• The conditional expected value has all of the properties of
  the usual expected value;
• E(Y|x) provides the best guess at Y based on knowledge of X.
21 / 97
Calculating conditional expectation and variance
Given the above example, the conditional expected value of Y
given X = x can be calculated as

E(Y|X = x) = ∫_x^{∞} y e^{−(y−x)} dy = 1 + x.

The conditional variance can be computed as

Var(Y|X = x) = E(Y²|x) − (E(Y|x))²
             = ∫_x^{∞} y² e^{−(y−x)} dy − (∫_x^{∞} y e^{−(y−x)} dy)² = 1.

Note that the marginal distribution of Y is gamma(2, 1), which
has Var(Y) = 2. Given the knowledge that X = x, the variabil-
ity in Y is considerably reduced.
22 / 97
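A sketch that checks E(Y|X = x) = 1 + x and Var(Y|X = x) = 1 by numerical integration of the conditional pdf e^{−(y−x)}; the test point x = 0.7 is an arbitrary choice, not from the lecture:

```python
# Sketch: numerical check of the conditional mean and variance for
# f(y|x) = exp(-(y - x)), y > x.
import numpy as np
from scipy.integrate import quad

x = 0.7                                                # arbitrary test point
f_cond = lambda y: np.exp(-(y - x))                    # conditional pdf of Y given X = x
m1, _ = quad(lambda y: y * f_cond(y), x, np.inf)       # E(Y|x)
m2, _ = quad(lambda y: y**2 * f_cond(y), x, np.inf)    # E(Y^2|x)
print(m1, m2 - m1**2)                                  # ~1.7 and ~1.0
```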
Independent r.v.s
Let (X, Y) be a bivariate random vector with joint pdf or pmf
f(x, y) and marginal pdfs or pmfs fX(x) and fY(y). Then X
and Y are called independent r.v.s if, for any x ∈ R and y ∈ R,

f(x, y) = fX(x)fY(y).

• If X and Y are independent, the conditional pdf of Y given
  X = x is

  f(y|x) = f(x, y)/fX(x) = fX(x)fY(y)/fX(x) = fY(y).

• For any A ⊂ R and x ∈ R,

  P(Y ∈ A|x) = ∫_A f(y|x) dy = ∫_A fY(y) dy = P(Y ∈ A).
23 / 97
Checking independence I
Define the joint pmf of (X, Y) by

f(10, 1) = f(20, 1) = f(20, 2) = 1/10,
f(10, 2) = f(10, 3) = 1/5,  f(20, 3) = 3/10.

The marginal pmfs are

fX(10) = fX(20) = 1/2,
fY(1) = 1/5,  fY(2) = 3/10,  fY(3) = 1/2.

Thus, the r.v.s X and Y are not independent since

f(10, 3) = 1/5 ≠ (1/2)(1/2) = fX(10)fY(3).
24 / 97
Lemma for independent r.v.s
Let (X, Y) be a bivariate random vector with joint pdf or pmf
f(x, y). Then X and Y are independent r.v.s if and only if there
exist functions g(x) and h(y) such that, for every x ∈ R and
y ∈ R,

f(x, y) = g(x)h(y).

Proof.
⇒: Easy to prove based on the definition.
⇐: Let f(x, y) = g(x)h(y). We define

∫_{−∞}^{∞} g(x) dx = c,   ∫_{−∞}^{∞} h(y) dy = d.
25 / 97
Proof Cont’d

cd = (∫_{−∞}^{∞} g(x) dx)(∫_{−∞}^{∞} h(y) dy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) dx dy
   = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Furthermore, the marginal pdfs are given by

fX(x) = ∫_{−∞}^{∞} g(x)h(y) dy = d·g(x),   fY(y) = ∫_{−∞}^{∞} g(x)h(y) dx = c·h(y).

Thus we have

f(x, y) = g(x)h(y) = g(x)h(y)cd = fX(x)fY(y).

That is, X and Y are independent.
26 / 97
Checking independence II
Consider the joint pdf f(x, y) = (1/384) x²y⁴ e^{−y−x/2}, x > 0 and
y > 0.
Question: Please confirm whether the r.v.s X and Y are independent.
Answer: If we define

g(x) = x² e^{−x/2},  x > 0;   g(x) = 0, otherwise,
h(y) = (1/384) y⁴ e^{−y},  y > 0;   h(y) = 0, otherwise,

then f(x, y) = g(x)h(y) for all x ∈ R and y ∈ R. In terms of
the lemma, we conclude that X and Y are independent r.v.s.
Note that we do not have to compute the marginal pdfs.
27 / 97
Theorem for independent r.v.s
Let X and Y be independent r.v.s.
• For any A ⊂ R and B ⊂ R,

  P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B),

  i.e., the events {X ∈ A} and {Y ∈ B} are independent events;
• Let g(x) and h(y) be functions only of x and y,
  respectively; then

  E(g(X)h(Y)) = (E(g(X)))(E(h(Y))).

• The moment generating function of the r.v. Z = X + Y is
  given by

  MZ(t) = MX(t)MY(t).
28 / 97
Expectation of independent r.v.s
Let X and Y be independent exponential(1) r.v.s. Then

P(X ≥ 4, Y < 3) = P(X ≥ 4)P(Y < 3) = e^{−4}(1 − e^{−3}).

• Letting g(x) = x² and h(y) = y, we see that

  E(X²Y) = (E(X²))(E(Y)) = (Var(X) + (E(X))²)E(Y) = (1 + 1²)·1 = 2.
29 / 97
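A Monte Carlo sketch (sample size and seed are arbitrary) that checks both displayed values for independent exponential(1) variables:

```python
# Sketch: Monte Carlo check of P(X >= 4, Y < 3) = exp(-4)(1 - exp(-3)) and
# E(X^2 Y) = 2 for independent exponential(1) r.v.s X and Y.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)

print(np.mean((x >= 4) & (y < 3)), np.exp(-4) * (1 - np.exp(-3)))  # both ~0.0174
print(np.mean(x**2 * y))                                           # ~2.0
```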
MGF of a sum of normal variables
Let X ∼ N(µ1, σ1²) and Y ∼ N(µ2, σ2²) be independent r.v.s.
Then, the mgfs of X and Y are

MX(t) = exp(µ1 t + σ1² t²/2),
MY(t) = exp(µ2 t + σ2² t²/2).

In terms of the theorem, the mgf of Z = X + Y is

MZ(t) = MX(t)MY(t) = exp((µ1 + µ2)t + (σ1² + σ2²)t²/2).

Theorem
Let X ∼ N(µ1, σ1²) and Y ∼ N(µ2, σ2²) be independent r.v.s.
Then, the r.v. Z = X + Y has a N(µ1 + µ2, σ1² + σ2²) distribution.
30 / 97
Distribution of bivariate function
Let (X, Y) be a bivariate random vector with a known proba-
bility distribution. Now consider a new bivariate random vector
(U, V) defined by U = g1(X, Y) and V = g2(X, Y), where
gi(x, y) is some specified function.
For any B ⊂ R², (U, V) ∈ B if and only if (X, Y) ∈ A, where

A = {(x, y) | (g1(x, y), g2(x, y)) ∈ B}.

Thus

P((U, V) ∈ B) = P((X, Y) ∈ A),

i.e., the probability distribution of (U, V) is completely deter-
mined by the probability distribution of (X, Y).
31 / 97
Transformation of discrete r.v.s
If (X, Y) is a discrete bivariate random vector, then there is only
a countable set of values for which the joint pmf of (X, Y) is
positive. Call this set A.
Define the set

B = {(u, v) | u = g1(x, y) and v = g2(x, y) for some (x, y) ∈ A}.

Then B is the countable set of possible values for the discrete
random vector (U, V). And if, for any (u, v) ∈ B, we define

A_{uv} = {(x, y) ∈ A | u = g1(x, y) and v = g2(x, y)},

then the joint pmf of (U, V) can be computed as

fU,V(u, v) = P(U = u, V = v) = P((X, Y) ∈ A_{uv}) = Σ_{(x,y)∈A_{uv}} fX,Y(x, y).
32 / 97
Distribution of the sum of Poisson variables
Let X and Y be independent Poisson r.v.s with parameters θ1
and θ2, respectively. Thus the joint pmf of (X, Y) is

fX,Y(x, y) = (θ1^x e^{−θ1} / x!)(θ2^y e^{−θ2} / y!),  x ∈ N, y ∈ N.

Now define U = X + Y and V = Y. That is, g1(x, y) = x + y
and g2(x, y) = y. Thus,

A = {(x, y) | x ∈ N, y ∈ N},
B = {(u, v) | v ∈ N, u ≥ v, u ∈ N}.

fU,V(u, v) = fX,Y(u − v, v) = (θ1^{u−v} e^{−θ1} / (u − v)!)(θ2^v e^{−θ2} / v!).
33 / 97
Distribution of the sum of Poisson variables Cont’d
In this example it is interesting to compute the marginal pmf of
U. Thus

fU(u) = Σ_{v=0}^{u} (θ1^{u−v} e^{−θ1} / (u − v)!)(θ2^v e^{−θ2} / v!)
      = e^{−(θ1+θ2)} Σ_{v=0}^{u} θ1^{u−v} θ2^v / ((u − v)! v!)
      = (e^{−(θ1+θ2)} / u!) Σ_{v=0}^{u} C(u, v) θ1^{u−v} θ2^v
      = (e^{−(θ1+θ2)} / u!) (θ1 + θ2)^u
      = ((θ1 + θ2)^u / u!) e^{−(θ1+θ2)}.

Theorem
Let X and Y be independent Poisson r.v.s with parameters θ1
and θ2, respectively. Then X + Y ∼ Poisson(θ1 + θ2).
34 / 97
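A quick sketch that checks the theorem numerically: the convolution of two Poisson pmfs, computed exactly as in the marginal sum above, agrees with the Poisson(θ1 + θ2) pmf (the parameter values are arbitrary):

```python
# Sketch: the convolution of Poisson(theta1) and Poisson(theta2) pmfs matches
# the Poisson(theta1 + theta2) pmf at a test point u.
from math import exp, factorial

def pois(k, theta):
    return theta**k * exp(-theta) / factorial(k)

theta1, theta2, u = 2.0, 3.5, 4
conv = sum(pois(u - v, theta1) * pois(v, theta2) for v in range(u + 1))
print(conv, pois(u, theta1 + theta2))   # both ~0.156
```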
Transformation of continuous r.v.s
If (X, Y) is a continuous bivariate random vector with joint pdf
fX,Y(x, y), then the joint pdf of (U, V) can be expressed in terms
of fX,Y(x, y).
Define the sets

A = {(x, y) | fX,Y(x, y) > 0},
B = {(u, v) | u = g1(x, y) and v = g2(x, y) for some (x, y) ∈ A}.

For the simplest version of this result we assume that the trans-
formation u = g1(x, y) and v = g2(x, y) defines a one-to-one
transformation of A onto B.
For such a one-to-one, onto transformation, we can obtain the
inverse transformation x = h1(u, v) and y = h2(u, v). The
role played by a derivative in the univariate case is now played
by a quantity called the Jacobian of the transformation.
35 / 97
Transformation of continuous r.v.s
We further define the Jacobian determinant of the transforma-
tion as

J = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ] = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u),

where ∂x/∂u = ∂h1(u, v)/∂u, ∂y/∂v = ∂h2(u, v)/∂v,
∂x/∂v = ∂h1(u, v)/∂v, ∂y/∂u = ∂h2(u, v)/∂u.
The joint pdf of (U, V) is 0 outside the set B and on the set B
is given by

fU,V(u, v) = fX,Y(h1(u, v), h2(u, v))|J|,

where |J| is the absolute value of J.
Note that it is sometimes just as difficult to determine the set
B and verify that the transformation is one-to-one as it is to
substitute into the formula.
36 / 97
Sum and difference of normal variables
Let X and Y be independent, standard normal r.v.s. Consider
the transformation U = X + Y and V = X − Y; thus we have

g1(x, y) = x + y,  g2(x, y) = x − y,
h1(u, v) = (u + v)/2,  h2(u, v) = (u − v)/2.

Furthermore,

J = det [ 1/2  1/2 ; 1/2  −1/2 ] = −1/2.

fU,V(u, v) = fX,Y(h1(u, v), h2(u, v))|J|
           = (1/(4π)) e^{−((u+v)/2)²/2} e^{−((u−v)/2)²/2}
           = ((1/(√(2π)·√2)) e^{−u²/4})((1/(√(2π)·√2)) e^{−v²/4}).
37 / 97
Analysis
• The joint pdf has factored into a function of u and a
  function of v. By the above lemma, U and V are independent.
• U ∼ N(0, 2) and V ∼ N(0, 2).
• This important fact, that sums and differences of
  independent normal r.v.s are independent normal r.v.s, is
  true regardless of the means of X and Y, so long as
  Var(X) = Var(Y).

Theorem
Let X and Y be independent r.v.s. Let g(x) be a function only
of x and h(y) be a function only of y. Then the r.v.s U = g(X)
and V = h(Y) are independent.
38 / 97
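A simulation sketch of this analysis: for independent N(0, 1) draws, U = X + Y and V = X − Y each have variance about 2 and are essentially uncorrelated (seed and sample size arbitrary):

```python
# Sketch: simulate U = X + Y and V = X - Y for independent N(0,1) X, Y and
# check that Var(U) = Var(V) = 2 and the sample correlation is near 0.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10**6)
y = rng.standard_normal(10**6)
u, v = x + y, x - y

print(np.var(u), np.var(v))        # ~2, ~2
print(np.corrcoef(u, v)[0, 1])     # ~0
```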
Distribution of the ratio of normal variables
Let X and Y be independent N(0, 1) r.v.s. Consider the trans-
formation U = X/Y and V = |Y|.
Note that this transformation is not one-to-one since the points
(x, y) and (−x, −y) are both mapped into the same (u, v) point.
Let

A1 = {(x, y) : y > 0},  A2 = {(x, y) : y < 0},  A0 = {(x, y) : y = 0}.

Thus, B = {(u, v) : v > 0} is the image of both A1 and A2
under the transformation.
The inverse transformations from B to A1 and from B to A2 are
given by

x = h11(u, v) = uv,   y = h21(u, v) = v;
x = h12(u, v) = −uv,  y = h22(u, v) = −v.
39 / 97
Distribution of the ratio of normal variables Cont’d
Note that the Jacobians from the two inverses are J1 = J2 = v,
and the joint pdf is

fX,Y(x, y) = (1/(2π)) e^{−x²/2} e^{−y²/2}.

Thus, we obtain

fU,V(u, v) = (1/(2π)) e^{−(uv)²/2} e^{−v²/2} |v| + (1/(2π)) e^{−(−uv)²/2} e^{−(−v)²/2} |v|
           = (v/π) e^{−(u²+1)v²/2},  −∞ < u < ∞, 0 < v < ∞.

From this the marginal pdf of U can be computed to be

fU(u) = ∫_0^{∞} (v/π) e^{−(u²+1)v²/2} dv = (1/(2π)) ∫_0^{∞} e^{−(u²+1)z/2} dz   (z = v²)
      = 1 / (π(u² + 1)).

So we see that the ratio of two independent standard normal
r.v.s is a Cauchy r.v.
40 / 97
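A sketch comparing the empirical cdf of X/Y with the standard Cauchy cdf F(u) = 1/2 + arctan(u)/π at a few arbitrary points:

```python
# Sketch: the ratio X/Y of two independent N(0,1) r.v.s has the standard
# Cauchy cdf; compare empirical and exact values.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(10**6)
y = rng.standard_normal(10**6)
u = x / y

for t in (-2.0, 0.0, 1.0, 5.0):
    print(np.mean(u <= t), 0.5 + np.arctan(t) / np.pi)   # pairs agree closely
```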
Binomial-Poisson hierarchy
An insect lays a large number of eggs, each surviving with prob-
ability p. On the average, how many eggs will survive?
The “large number” of eggs laid is a r.v., often taken to be
Poisson(λ). Furthermore, if we assume that each egg’s survival
is independent, then we have Bernoulli trials. Let

X = number of survivors,  Y = number of eggs laid.

Thus, we have a hierarchical model as

X|Y ∼ Binomial(Y, p),
Y ∼ Poisson(λ).

Recall that we use notation such as X|Y ∼ Binomial(Y, p) to
mean that the conditional distribution of X given Y = y is
Binomial(y, p).
41 / 97
Binomial-Poisson hierarchy Cont’d

P(X = x) = Σ_{y=0}^{∞} P(X = x, Y = y) = Σ_{y=0}^{∞} P(X = x|Y = y)P(Y = y)
         = Σ_{y=0}^{∞} [C(y, x) p^x (1 − p)^{y−x}][λ^y e^{−λ} / y!]
         = ((λp)^x e^{−λ} / x!) Σ_{y=x}^{∞} ((1 − p)λ)^{y−x} / (y − x)!
         = ((λp)^x e^{−λ} / x!) Σ_{t=0}^{∞} ((1 − p)λ)^t / t!   (t = y − x)
         = ((λp)^x e^{−λ} / x!) e^{(1−p)λ} = ((λp)^x / x!) e^{−λp}.

Thus, any marginal inference on X is with respect to a
Poisson(λp) distribution, with Y playing no part at all.
The answer to the original question is now easy to compute:
E(X) = λp.
42 / 97
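A simulation sketch of the hierarchy (λ and p chosen arbitrarily); the sample mean and variance of X both come out near λp, consistent with a Poisson(λp) marginal:

```python
# Sketch: simulate X|Y ~ Binomial(Y, p), Y ~ Poisson(lam) and compare the
# marginal behaviour of X with Poisson(lam * p).
import numpy as np

rng = np.random.default_rng(3)
lam, p, n = 6.0, 0.3, 10**6
y = rng.poisson(lam, n)
x = rng.binomial(y, p)            # numpy accepts an array of trial counts

print(x.mean(), lam * p)          # both ~1.8
print(x.var())                    # ~1.8 as well: Poisson variance equals its mean
```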
Theorem for expectation of conditional expectation
If X and Y are any two r.v.s, then

E(X) = E(E(X|Y)),

provided that the expectations exist.

Proof.
Let f(x, y) denote the joint pdf of X and Y. By definition,
we have

E(X) = ∫∫ x f(x, y) dx dy = ∫ [∫ x f(x|y) dx] fY(y) dy.

Thus, we have

E(X) = ∫ E(X|y) fY(y) dy = E(E(X|Y)).

Replace integrals by sums to prove the discrete case.
43 / 97
Mixture distribution
From the above theorem, we can easily compute the expected
number of survivors:

E(X) = E(E(X|Y)) = E(pY) = pλ.

Definition
A r.v. X is said to have a mixture distribution if the distribution
of X depends on a quantity that also has a distribution.

In the above example, the Poisson(λp) distribution is a mixture
distribution since it is the result of combining a Binomial(Y, p) with
Y ∼ Poisson(λ).
In general, we can say that hierarchical models lead to mixture
distributions.
44 / 97
Example generalization
Instead of one mother insect, there are a large number of moth-
ers and one mother is chosen at random. We are still interested
in knowing the average number of survivors, but it is no longer
clear that the number of eggs laid follows the same Poisson
distribution for each mother.
The following three-stage hierarchy may be more appropriate.
Let

X = number of survivors,  X|Y ∼ Binomial(Y, p),
Y|Λ ∼ Poisson(Λ),  Λ ∼ exponential(β).

Thus, the expectation of X can easily be calculated as

E(X) = E(E(X|Y)) = E(pY) = E(E(pY|Λ)) = E(pΛ) = pβ.
45 / 97
Rethinking the three-stage model
Note that this three-stage model can also be thought of as a
two-stage hierarchy by combining the last two stages. If Y|Λ ∼
Poisson(Λ) and Λ ∼ exponential(β), then

P(Y = y) = P(Y = y, 0 < Λ < ∞) = ∫_0^{∞} f(y, λ) dλ
         = ∫_0^{∞} f(y|λ)f(λ) dλ = ∫_0^{∞} [e^{−λ} λ^y / y!] (1/β) e^{−λ/β} dλ
         = (1/(β y!)) ∫_0^{∞} λ^y e^{−λ(1+β^{−1})} dλ
         = (1/(β y!)) Γ(y + 1) (1/(1 + β^{−1}))^{y+1}
         = (1/(1 + β)) (β/(1 + β))^y.

This is a negative binomial pmf. Therefore, our three-stage
hierarchy is equivalent to the two-stage hierarchy

X|Y ∼ Binomial(Y, p),
Y ∼ negative binomial(r = 1, p = 1/(1 + β)).
46 / 97
Beta-binomial hierarchy
One generalization of the binomial distribution is to allow the
success probability to vary according to a distribution. A stan-
dard model for this situation is

X|P ∼ Binomial(n, P),
P ∼ beta(α, β).

By iterating the expectation, we calculate the mean of X as

E(X) = E(E(X|P)) = E(nP) = nα/(α + β).
49 / 97
Conditional variance identity
Theorem
For any two r.v.s,

Var(X) = E(Var(X|Y)) + Var(E(X|Y)),

provided that the expectations exist.

Proof.
By definition, we have

Var(X) = E((X − E(X))²) = E([X − E(X|Y) + E(X|Y) − E(X)]²).

The cross term vanishes: E([X − E(X|Y)][E(X|Y) − E(X)]) = 0.
For the two remaining terms,

E([X − E(X|Y)]²) = E(E{[X − E(X|Y)]²|Y}) = E(Var(X|Y)),
E([E(X|Y) − E(X)]²) = Var(E(X|Y)).
50 / 97
Beta-binomial hierarchy Cont’d
To calculate the variance of X, we start from

Var(X) = Var(E(X|P)) + E(Var(X|P)).

Note that E(X|P) = nP and Var(X|P) = nP(1 − P), where
P ∼ beta(α, β). Thus

Var(E(X|P)) = Var(nP) = n² αβ / ((α + β)²(α + β + 1)),

E(Var(X|P)) = nE(P(1 − P))
            = (nΓ(α + β)/(Γ(α)Γ(β))) ∫_0^1 p(1 − p) p^{α−1}(1 − p)^{β−1} dp
            = n (Γ(α + β)/(Γ(α)Γ(β))) (Γ(α + 1)Γ(β + 1)/Γ(α + β + 2))
            = nαβ / ((α + β)(α + β + 1)).

Thus we have

Var(X) = nαβ(α + β + n) / ((α + β)²(α + β + 1)).
51 / 97
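A Monte Carlo sketch of the beta-binomial mixture (n, α, β arbitrary) checking the mean and variance formulas just derived:

```python
# Sketch: simulate X|P ~ Binomial(n, P), P ~ beta(a, b) and compare the sample
# mean/variance of X with n*a/(a+b) and n*a*b*(a+b+n)/((a+b)^2 (a+b+1)).
import numpy as np

rng = np.random.default_rng(4)
n, a, b, m = 10, 2.0, 3.0, 10**6
p = rng.beta(a, b, m)
x = rng.binomial(n, p)

print(x.mean(), n * a / (a + b))                                      # ~4.0
print(x.var(), n * a * b * (a + b + n) / ((a + b)**2 * (a + b + 1)))  # ~6.0
```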
Dirichlet-multinomial hierarchy
Suppose we have a die of K sides. We toss the die and the
probability of landing on side k is p(t = k|f) = f_k. We throw
the die N times and obtain a set of results s = {s1, s2, · · · , sN}.
The joint probability is

p(s|f) = Π_{n=1}^{N} p(s_n|f) = f_1^{n_1} f_2^{n_2} · · · f_K^{n_K} = Π_{i=1}^{K} f_i^{n_i},

where n_i is the number of times side i occurs.

Suppose that f follows a Dirichlet distribution with α as hyper-
parameter. Then we express the probability of f as

Dir(f|α) = (Γ(Σ_{k=1}^{K} α_k) / Π_{k=1}^{K} Γ(α_k)) Π_{k=1}^{K} f_k^{α_k−1}.
52 / 97
Example Cont’d
If we want to estimate the parameter f based on the observation
of s, then we can express f in the following manner:

p(f|s, α) = p(s|f, α)p(f|α) / ∫ p(s|f, α)p(f|α) df
          = [Π_{i=1}^{K} f_i^{n_i}] [Γ(Σ_k α_k)/Π_k Γ(α_k)] Π_k f_k^{α_k−1}
            / ∫ [Π_{i=1}^{K} f_i^{n_i}] [Γ(Σ_k α_k)/Π_k Γ(α_k)] Π_k f_k^{α_k−1} df
          = [Π_k f_k^{n_k+α_k−1}] / ∫ Π_k f_k^{n_k+α_k−1} df
          = (Γ(Σ_{k=1}^{K} (n_k + α_k)) / Π_{k=1}^{K} Γ(n_k + α_k)) Π_{k=1}^{K} f_k^{n_k+α_k−1}.

Notice that after estimating f based on the s observations, f
still follows a Dirichlet distribution, now with parameter α + n,
where n = (n1, n2, · · · , nK). This property is known as
conjugate priors. Based on this property, estimating the
parameters f_i after observing N trials is a simple counting
procedure.
53 / 97
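A tiny sketch of the counting procedure: with made-up prior hyperparameters and observations, the posterior Dirichlet parameter is simply α plus the vector of face counts:

```python
# Sketch: Dirichlet-multinomial conjugacy as a counting update; the numbers
# below are illustration values, not from the lecture.
import numpy as np

alpha = np.array([1.0, 1.0, 2.0, 0.5])          # prior hyperparameters (K = 4)
s = np.array([2, 0, 1, 2, 3, 2, 0, 2])          # observed faces, coded 0..K-1
counts = np.bincount(s, minlength=len(alpha))   # n = (n_1, ..., n_K)

posterior = alpha + counts                      # Dirichlet(alpha + n)
print(posterior)                                # [3.  2.  6.  1.5]
print(posterior / posterior.sum())              # posterior mean estimate of f
```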
Covariance and correlation
In this section, we discuss two numerical measures of the strength of
a relationship between two r.v.s, the covariance and correlation.
The covariance and correlation of X and Y are the numbers
defined by

Cov(X, Y) = E((X − µX)(Y − µY)),
ρXY = Cov(X, Y) / (σX σY),

where the value of ρXY is also called the correlation coefficient.

• If large values of X tend to be observed with large values of Y
  and small values of X with small values of Y, then Cov(X, Y)
  will be positive.
• Thus the sign of Cov(X, Y) gives information regarding the
  relationship between X and Y.
54 / 97
Theorem
For any r.v.s X and Y,

Cov(X, Y) = E(XY) − µX µY.

Proof.
Cov(X, Y) = E((X − µX)(Y − µY))
          = E(XY − µX Y − µY X + µX µY)
          = E(XY) − µX E(Y) − µY E(X) + µX µY
          = E(XY) − µX µY.

The correlation is always between −1 and 1, with the values −1
and 1 indicating a perfect linear relationship between X and Y.
55 / 97
Example of correlation
Let the joint pdf of (X, Y) be f(x, y) = 1, 0 < x < 1, x < y < x + 1.
The marginal distribution of X is uniform(0, 1), so µX = 1/2 and
σX² = 1/12.
The marginal distribution of Y is fY(y) = y, 0 < y < 1, and
fY(y) = 2 − y, 1 ≤ y < 2, so µY = 1 and σY² = 1/6.

E(XY) = ∫_0^1 ∫_x^{x+1} xy dy dx = ∫_0^1 (1/2) x y²|_x^{x+1} dx
      = ∫_0^1 (x² + x/2) dx = 7/12.

ρXY = Cov(X, Y)/(σX σY) = (7/12 − (1/2)·1) / √((1/12)(1/6)) = 1/√2.
56 / 97
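A Monte Carlo sketch of this example: sampling X ∼ uniform(0, 1) and then Y = X + uniform(0, 1) reproduces the joint pdf f(x, y) = 1 on the strip, and the sample correlation is close to 1/√2:

```python
# Sketch: empirical check of rho_XY = 1/sqrt(2) for f(x, y) = 1 on
# 0 < x < 1, x < y < x + 1.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 10**6)
y = x + rng.uniform(0, 1, 10**6)    # given X = x, Y is uniform(x, x + 1)

print(np.corrcoef(x, y)[0, 1], 1 / np.sqrt(2))   # both ~0.707
```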
Theorem
If r.v.s X and Y are independent, then Cov(X, Y) = 0 and
ρXY = 0.

Proof.
Since X and Y are independent, we have E(XY) = E(X)E(Y).
Thus

Cov(X, Y) = E(XY) − E(X)E(Y) = 0,  and ρXY = 0.

The converse is false. For X ∼ f(x − θ), with f symmetric around 0
and E(X) = θ, let Y be the indicator function Y = I(|X − θ| < 2).
Then X and Y are obviously not independent. However,

E(XY) = ∫_{−∞}^{∞} x I(|x − θ| < 2) f(x − θ) dx = ∫_{−2}^{2} (t + θ)f(t) dt
      = θ ∫_{−2}^{2} f(t) dt = E(X)E(Y)   (since ∫_{−2}^{2} t f(t) dt = 0).

Thus, it is easy to find uncorrelated, dependent r.v.s.
57 / 97
Theorem
If X and Y are any two r.v.s, and a and b are any two constants,
then

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y).

If X and Y are independent r.v.s, then

Var(aX + bY) = a²Var(X) + b²Var(Y).

Proof.
The mean of aX + bY is E(aX + bY) = aµX + bµY. Thus,

Var(aX + bY) = E(((aX + bY) − (aµX + bµY))²)
             = E((a(X − µX) + b(Y − µY))²)
             = E(a²(X − µX)² + b²(Y − µY)² + 2ab(X − µX)(Y − µY))
             = a²Var(X) + b²Var(Y) + 2abCov(X, Y).
58 / 97
Theorem
If X and Y are any two r.v.s,
a. −1 ≤ ρXY ≤ 1.
b. |ρXY| = 1 if and only if there exist numbers a ≠ 0 and b
   such that P(Y = aX + b) = 1. If ρXY = 1, then a > 0; and
   if ρXY = −1, then a < 0.

Proof.
Consider the function h(t) defined by

h(t) = E(((X − µX)t + (Y − µY))²).

Expanding this expression, we obtain

h(t) = t² E((X − µX)²) + 2t E((X − µX)(Y − µY)) + E((Y − µY)²)
     = t² σX² + 2t Cov(X, Y) + σY².
59 / 97
Proof Cont’d
Since h(t) ≥ 0 for all t, the quadratic has at most one real root,
so its discriminant satisfies

∆ = (2Cov(X, Y))² − 4σX²σY² ≤ 0.

This is equivalent to

−σXσY ≤ Cov(X, Y) ≤ σXσY,  i.e.,  −1 ≤ ρXY = Cov(X, Y)/(σXσY) ≤ 1.

|ρXY| = 1 if and only if h(t) has a single root. But since
((X − µX)t + (Y − µY))² ≥ 0, the expected value
h(t) = E(((X − µX)t + (Y − µY))²) = 0 if and only if

P(((X − µX)t + (Y − µY))² = 0) = 1.

This is equivalent to

P((X − µX)t + (Y − µY) = 0) = 1.
60 / 97
Proof Cont’d
This is P(Y = aX + b) = 1 with a = −t and b = µX t + µY,
where t is the root of h(t). Using the quadratic formula, we see
that this root is t = −Cov(X, Y)/σX². Thus a = −t has the same
sign as ρXY, proving the final assertion.

If there is a line y = ax + b (a ≠ 0) such that the values of (X, Y)
have a high probability of being near this line, then the correlation
between X and Y will be near 1 or −1.
But if no such line exists, the correlation will be near 0. This is an
intuitive notion of the linear relationship that is being measured by
correlation.
61 / 97
Example
Let X have a uniform(−1, 1) distribution and Z have a
uniform(0, 1/10) distribution. Suppose X and Z are independent.
Let Y = X² + Z and consider the random vector (X , Y ). The
conditional distribution of Y given X = x is uniform(x², x² + 1/10).
The joint pdf of (X , Y ) is

    f (x, y ) = 5,   −1 < x < 1, x² < y < x² + 1/10.

There is a strong relationship between X and Y , as indicated by the
conditional distribution of Y given X = x.
In fact, E (X ) = E (X³) = 0 and, since X and Z are independent,
E (XZ ) = E (X )E (Z ). Hence

    Cov (X , Y ) = E (X (X² + Z )) − E (X )E (X² + Z ) = 0 and ρXY = 0.
62 / 97
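A short simulation of this example (Python/NumPy; the seed and sample size are arbitrary) makes the point concrete: the sample correlation between X and Y is essentially zero even though Y is almost a deterministic function of X.

    import numpy as np

    # Simulation sketch of the example above: Y = X^2 + Z is tightly determined
    # by X, yet the linear correlation is ~ 0.
    rng = np.random.default_rng(1)
    n = 1_000_000
    x = rng.uniform(-1, 1, n)
    z = rng.uniform(0, 0.1, n)
    y = x**2 + z

    print(np.corrcoef(x, y)[0, 1])        # close to 0: no linear association
    print(np.corrcoef(x**2, y)[0, 1])     # close to 1: Y is nearly a function of X^2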
Bivariate normal pdf
Let µX , µY ∈ R, σX , σY ∈ R+ and ρ ∈ [−1, 1] be five real
numbers. The bivariate normal pdf with means µX and µY ,
variances σX² and σY², and correlation ρ is the bivariate pdf given
by

    f (x, y ) = (2πσX σY √(1 − ρ²))^{−1}
              · exp( −1/(2(1 − ρ²)) [ ((x − µX )/σX )² − 2ρ((x − µX )/σX )((y − µY )/σY ) + ((y − µY )/σY )² ] )

• The marginal distribution of X is N(µX , σX²);
• The marginal distribution of Y is N(µY , σY²);
• The correlation between X and Y is ρXY = ρ;
• For any constants a and b, the distribution of aX + bY is
  N(aµX + bµY , a²σX² + b²σY² + 2abρσX σY ).
63 / 97
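These properties can be checked numerically. The sketch below (Python/NumPy, with one arbitrary choice of µX, µY, σX, σY, ρ) samples from the bivariate normal and compares the sample marginal moments, the correlation, and a linear combination with the values stated above.

    import numpy as np

    # Sketch checking the stated bivariate normal properties for one
    # illustrative parameter choice.
    rng = np.random.default_rng(2)
    mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 1.5, 0.5, 0.7
    cov = [[sd_x**2,           rho * sd_x * sd_y],
           [rho * sd_x * sd_y, sd_y**2]]
    x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000).T

    print(x.mean(), x.std())              # marginal of X: ~ N(muX, sdX^2)
    print(np.corrcoef(x, y)[0, 1])        # ~ rho
    a, b = 3.0, -1.0
    w = a * x + b * y
    print(w.mean(), w.var())              # ~ a*muX + b*muY and a^2 sdX^2 + b^2 sdY^2 + 2ab rho sdX sdY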
Multivariate distributions
We will use boldface letters to denote multiple variates. Thus, we
write X to denote the r.v.s X1 , · · · , Xn and x to denote the sample
x1 , · · · , xn .
The random vector X = (X1 , · · · , Xn ) has a sample space that
is a subset of Rn .
• If (X1 , · · · , Xn ) is a discrete random vector, then the joint
  pmf of (X1 , · · · , Xn ) is the function defined by
  f (x) = f (x1 , · · · , xn ) = P(X1 = x1 , · · · , Xn = xn ), and

      for any A ⊂ Rn , P(X ∈ A) = Σ_{x∈A} f (x).

• If (X1 , · · · , Xn ) is a continuous random vector, then the
  joint pdf of (X1 , · · · , Xn ) is the function f (x) = f (x1 , · · · , xn ) satisfying

      for any A ⊂ Rn , P(X ∈ A) = ∫ · · · ∫_A f (x) dx.
64 / 97
Multivariate distributions Cont’d
Let g (x) = g (x1 , · · · , xn ) be a real-valued function defined on
the sample space of X. Then the expected value of g (X) is

    E (g (X)) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g (x)f (x) dx   (continuous case)

    E (g (X)) = Σ_{x∈Rn} g (x)f (x)   (discrete case)

The marginal distribution of (X1 , · · · , Xk ), the first k coordinates of
X = (X1 , · · · , Xn ), is given by the pdf or pmf

    f (x1 , · · · , xk ) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f (x1 , · · · , xn ) dxk+1 · · · dxn

    f (x1 , · · · , xk ) = Σ_{(xk+1 ,··· ,xn )∈Rn−k} f (x1 , · · · , xn )
65 / 97
Multinomial distribution
Multinomial Theorem

Let n and m be positive integers, and let A be the set of vectors
x = (x1 , · · · , xn ) such that each xi is a nonnegative integer and
Σ_{i=1}^n xi = m. Then for any real numbers p1 , · · · , pn ,

    (p1 + · · · + pn )^m = Σ_{x∈A} (m!/(x1 ! · · · xn !)) p1^{x1} · · · pn^{xn} .

Let n and m be positive integers and p1 , · · · , pn be numbers
satisfying 0 ≤ pi ≤ 1, i = 1, · · · , n, and Σ_{i=1}^n pi = 1. Then
X = (X1 , · · · , Xn ) has a multinomial distribution with m trials
and cell probabilities p1 , · · · , pn if the joint pmf of X is

    f (x1 , · · · , xn ) = (m!/(x1 ! · · · xn !)) p1^{x1} · · · pn^{xn} = m! Π_{i=1}^n pi^{xi}/xi ! .
66 / 97
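As a concrete illustration, the sketch below (Python with SciPy; m = 10 and p = (0.2, 0.3, 0.5) are purely illustrative values) evaluates the joint pmf and checks numerically that summing it over the other coordinates reproduces a binomial marginal, as derived on the following slide.

    from scipy.stats import multinomial, binom

    # Sketch with illustrative numbers: m = 10 trials over n = 3 cells.
    m, p = 10, [0.2, 0.3, 0.5]
    print(multinomial.pmf([2, 3, 5], n=m, p=p))    # joint pmf at x = (2, 3, 5)

    # Summing the joint pmf over x1 with x2 = 3 fixed recovers the
    # binomial(m, p2) marginal.
    print(sum(multinomial.pmf([x1, 3, m - 3 - x1], n=m, p=p) for x1 in range(m - 2)))
    print(binom.pmf(3, m, 0.3))                    # the two numbers match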
Marginal pdf of multinomial distribution
    f (xn ) = Σ_{(x1 ,··· ,xn−1 )∈B} (m!/(x1 ! · · · xn !)) p1^{x1} · · · pn^{xn}

            = Σ_{(x1 ,··· ,xn−1 )∈B} (m!/(x1 ! · · · xn !)) p1^{x1} · · · pn^{xn}
              · ((m − xn )!(1 − pn )^{m−xn })/((m − xn )!(1 − pn )^{m−xn })

            = (m!/(xn !(m − xn )!)) pn^{xn} (1 − pn )^{m−xn}
              · Σ_{(x1 ,··· ,xn−1 )∈B} ((m − xn )!/(x1 ! · · · xn−1 !)) Π_{i=1}^{n−1} (pi /(1 − pn ))^{xi}

            = (m!/(xn !(m − xn )!)) pn^{xn} (1 − pn )^{m−xn} ,

where B is the set of vectors (x1 , · · · , xn−1 ) with nonnegative integer entries
summing to m − xn ; the remaining sum equals 1 by the Multinomial Theorem.
Hence, the marginal distribution of Xn is binomial(m, pn ).
Similar arguments show that each of the other coordinates is
marginally binomially distributed.
67 / 97
Mutually independent random vectors

Let (X1 , · · · , Xn ) be random vectors with joint pdf or pmf
f (x1 , · · · , xn ). Let fXi (xi ) denote the marginal pdf or pmf of
Xi . Then (X1 , · · · , Xn ) are called mutually independent random
vectors if, for every (x1 , · · · , xn ),

    f (x1 , · · · , xn ) = Π_{i=1}^n fXi (xi ).

If the Xi are all one-dimensional, then (X1 , · · · , Xn ) are called
mutually independent random variables.

68 / 97
Conditional pdf of multinomial distribution
    f (x1 , · · · , xn−1 |xn ) = f (x1 , · · · , xn )/f (xn )

        = [ (m!/(x1 ! · · · xn !)) p1^{x1} · · · pn^{xn} ] / [ (m!/(xn !(m − xn )!)) pn^{xn} (1 − pn )^{m−xn} ]

        = ((m − xn )!/(x1 ! · · · xn−1 !)) Π_{i=1}^{n−1} (pi /(1 − pn ))^{xi}

• This is the pmf of a multinomial distribution with m − xn
  trials and cell probabilities p1 /(1 − pn ), · · · , pn−1 /(1 − pn ).
• The conditional distribution of any subset of the coordinates
  of X1 , · · · , Xn given the values of the rest of the coordinates
  is a multinomial distribution.
• We see from the conditional distribution that the
  coordinates of the vector X1 , · · · , Xn are related. It turns
  out that all of the pairwise covariances are negative and are
  given by Cov (Xi , Xj ) = E [(Xi − mpi )(Xj − mpj )] = −mpi pj .
69 / 97
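The negative pairwise covariance is easy to verify by simulation. The sketch below (Python/NumPy; m, p, and the seed are arbitrary choices) compares the sample variance and covariance of multinomial counts with m pi(1 − pi) and −m pi pj.

    import numpy as np

    # Simulation sketch of multinomial(m, p) draws, checking the binomial
    # marginal variance m*p_i*(1 - p_i) and the covariance -m*p_i*p_j.
    rng = np.random.default_rng(3)
    m, p = 20, np.array([0.2, 0.3, 0.5])
    counts = rng.multinomial(m, p, size=500_000)     # each row is one draw (X1, X2, X3)

    print(counts[:, 0].var(), m * p[0] * (1 - p[0]))                   # Var(X1)
    print(np.cov(counts[:, 0], counts[:, 1])[0, 1], -m * p[0] * p[1])  # Cov(X1, X2)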
Mgf of mutually independent random variables

Let (X1 , · · · , Xn ) be mutually independent r.v.s.

• Let g1 , · · · , gn be real-valued functions such that gi (xi ) is a
  function only of xi , i = 1, · · · , n. Then

      E (Π_{i=1}^n gi (Xi )) = Π_{i=1}^n E (gi (Xi )).

• Let MX1 (t), · · · , MXn (t) be their mgfs, and Z = X1 + · · · + Xn .
  Then the mgf of Z is

      MZ (t) = Π_{i=1}^n MXi (t).

70 / 97
Mgf of mutually independent random variables Cont’d
Corollary

Let (X1 , · · · , Xn ) be mutually independent r.v.s with mgfs
MX1 (t), · · · , MXn (t). Let ai and bi be fixed constants,
and Z = Σ_{i=1}^n (ai Xi + bi ). Then the mgf of Z is

    MZ (t) = e^{t(Σ bi )} Π_{i=1}^n MXi (ai t).

Example
Let (X1 , · · · , Xn ) be mutually independent r.v.s, where Xi has a
Gamma(αi , β) distribution with mgf MXi (t) = (1 − βt)^{−αi } , t < 1/β.
Then the mgf of Z = X1 + · · · + Xn is

    MZ (t) = Π_{i=1}^n MXi (t) = Π_{i=1}^n (1 − βt)^{−αi } = (1 − βt)^{−Σ_{i=1}^n αi } .

This is the mgf of a Gamma(Σ_{i=1}^n αi , β) distribution.
71 / 97
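A simulation sketch of this example (Python with NumPy/SciPy; the shape parameters, scale, and sample size are arbitrary choices) compares the empirical distribution of the sum with the claimed Gamma(Σαi, β) distribution.

    import numpy as np
    from scipy.stats import gamma, kstest

    # Sketch of the Gamma example: a sum of independent Gamma(alpha_i, beta)
    # r.v.s with a common scale beta is Gamma(sum(alpha_i), beta).
    rng = np.random.default_rng(4)
    alphas, beta = [0.5, 1.0, 2.5], 2.0
    z = sum(rng.gamma(shape=a, scale=beta, size=200_000) for a in alphas)

    # Kolmogorov-Smirnov test against Gamma(sum(alphas), scale=beta); a large
    # p-value is consistent with the mgf argument above.
    print(kstest(z, gamma(a=sum(alphas), scale=beta).cdf))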
Linear combination of independent normal r.v.s

Let (X1 , · · · , Xn ) be mutually independent r.v.s with Xi ∼
N(µi , σi²). Let ai and bi be fixed constants. Then

    Z = Σ_{i=1}^n (ai Xi + bi ) ∼ N( Σ_{i=1}^n (ai µi + bi ), Σ_{i=1}^n ai² σi² ).

• A linear combination of independent normal r.v.s is normally
  distributed.
• It can be proved with the above corollary.

72 / 97
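The sketch below (Python/NumPy; the coefficients and parameters are arbitrary) checks the stated mean and variance of Z by simulation; normality itself follows from the result above.

    import numpy as np

    # Sketch: Z = sum_i (a_i X_i + b_i) for independent normals has mean
    # sum(a_i mu_i + b_i) and variance sum(a_i^2 sigma_i^2).
    rng = np.random.default_rng(5)
    mus    = np.array([0.0, 1.0, -2.0])
    sigmas = np.array([1.0, 0.5, 2.0])
    a      = np.array([1.0, -3.0, 0.5])
    b      = np.array([0.0, 2.0, 1.0])

    x = rng.normal(mus, sigmas, size=(200_000, 3))   # each row: (X1, X2, X3)
    z = (a * x + b).sum(axis=1)

    print(z.mean(), (a * mus + b).sum())             # means agree
    print(z.var(), (a**2 * sigmas**2).sum())         # variances agree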
Generalization

Let (X1 , · · · , Xn ) be random vectors. Then X1 , · · · , Xn are mutually
independent random vectors if and only if there exist functions
gi (xi ) such that the joint pdf or pmf of (X1 , · · · , Xn ) can
be written as

    f (x1 , · · · , xn ) = Π_{i=1}^n gi (xi ).

Let X1 , · · · , Xn be random vectors and let gi (xi ) be a function
only of xi . Then the random variables Ui = gi (Xi ) are mutually
independent.

73 / 97
Tail bounds

Question
Consider the experiment of tossing a fair coin n times. What is
the probability that the number of heads exceeds 3n/4?

Notes
Tail bounds for a r.v. X concern the probability
that X deviates significantly from its expected value E (X ) on a
run of the experiment.

74 / 97
Markov inequality
Markov inequality

If X is a nonnegative r.v. and 0 < a < +∞, then

    P(X ≥ a) ≤ E (X )/a,   or equivalently   P(X ≥ aE (X )) ≤ 1/a.

Proof.

    P(X ≥ a) = ∫_{x≥a} f (x) dx ≤ ∫_{x≥a} (x/a) f (x) dx ≤ (1/a) ∫_0^∞ x f (x) dx = E (X )/a.

Example
For X = # heads in n tosses of a fair coin, E (X ) = n/2, so

    P(X > 3n/4) ≤ (n/2)/(3n/4) = 2/3.
75 / 97
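For the coin-tossing question, the Markov bound can be compared with the exact binomial tail. The sketch below (Python with SciPy; the values of n are illustrative) shows how loose the bound is.

    from scipy.stats import binom

    # A quick look at how loose Markov's bound is for P(X > 3n/4),
    # X ~ binomial(n, 1/2).
    for n in (20, 100, 1000):
        exact  = binom.sf(3 * n / 4, n, 0.5)   # exact P(X > 3n/4)
        markov = (n / 2) / (3 * n / 4)         # E(X)/(3n/4) = 2/3, independent of n
        print(n, exact, markov)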
Chebyshev’s inequality
Let X be a random variable and let g (x) be a nonnegative
function. Then, for any r > 0,

    P(g (X ) ≥ r ) ≤ E (g (X ))/r .

Proof.

    E (g (X )) = ∫_{−∞}^{∞} g (x)fX (x) dx ≥ ∫_{x:g (x)≥r} g (x)fX (x) dx
              ≥ r ∫_{x:g (x)≥r} fX (x) dx = rP(g (X ) ≥ r )

Rearranging now produces the desired inequality.


76 / 97
The most widely used form of Chebyshev’s inequality
Let g (x) = (x − µ)²/σ², where µ = E (X ) and σ² = Var (X ). For
convenience write r = t². Then

    P((X − µ)²/σ² ≥ t²) ≤ E ((X − µ)²/σ²)/t² = 1/t².

• i.e., P(|X − µ| ≥ tσ) ≤ 1/t² and P(|X − µ| ≤ tσ) ≥ 1 − 1/t².
• For example, tossing a fair coin n times,

      P(X > 3n/4) < P(|X − n/2| > n/4) ≤ Var (X )/(n/4)² = 4/n.

• Many other probability inequalities exist similar in spirit to
  Chebyshev’s inequality, e.g., for t > 0,

      P(X ≥ a) ≤ MX (t)/e^{at} .
77 / 97
Chernoff bound
Deriving Chernoff bound

Let Xi be a sequence of independent r.v.s with P(Xi = 1) = pi
and P(Xi = 0) = 1 − pi , and let X = Σ_{i=1}^n Xi and µ = Σ_{i=1}^n pi .
Then for 0 < δ < 1,

• P(X < (1 − δ)µ) < ( e^{−δ} / (1 − δ)^{(1−δ)} )^µ
• P(X < (1 − δ)µ) < exp (−µδ²/2)

Proof.
For t > 0,

    P(X < (1 − δ)µ) = P( exp (−tX ) > exp (−t(1 − δ)µ) )
                    < Π_{i=1}^n E (exp (−tXi )) / exp (−t(1 − δ)µ) ,

by Markov’s inequality and the independence of the Xi .

78 / 97
Proof of Chernoff bound Cont’d
Note that 1 − x < e^{−x} if x > 0, so

    Π_{i=1}^n E (exp (−tXi )) = Π_{i=1}^n (pi e^{−t} + (1 − pi )) = Π_{i=1}^n (1 − pi (1 − e^{−t} ))
                              < Π_{i=1}^n exp (pi (e^{−t} − 1)) = exp (µ(e^{−t} − 1)).

That is,

    P(X < (1 − δ)µ) < exp (µ(e^{−t} − 1)) / exp (−t(1 − δ)µ) = exp (µ(e^{−t} + t − tδ − 1)).

Now it is time to choose t to make the bound as tight as possible.
Taking the derivative of µ(e^{−t} + t − tδ − 1) with respect to t and setting
−e^{−t} + 1 − δ = 0, we have t = ln (1/(1 − δ)) > 0, and

    P(X < (1 − δ)µ) < ( e^{−δ} / (1 − δ)^{(1−δ)} )^µ .
79 / 97
Proof of second statement
To get the simpler form of the bound, we need to get rid of the
clumsy term (1 − δ)^{(1−δ)} . Note that

    (1 − δ) ln (1 − δ) = (1 − δ)( − Σ_{i=1}^∞ δ^i /i ) > −δ + δ²/2 .

Thus, we have

    (1 − δ)^{(1−δ)} > exp (−δ + δ²/2) .

Furthermore,

    P(X < (1 − δ)µ) < ( e^{−δ} / (1 − δ)^{(1−δ)} )^µ < ( e^{−δ} / e^{(−δ+δ²/2)} )^µ = exp (−µδ²/2).
80 / 97
Chernoff bound (Upper tail)
Theorem
Let Xi be a sequence of independent r.v.s with P(Xi = 1) = pi
and P(Xi = 0) = 1 − pi , and let X = Σ_{i=1}^n Xi and µ = Σ_{i=1}^n pi .

• P(X > (1 + δ)µ) < ( e^δ / (1 + δ)^{(1+δ)} )^µ
• P(X > (1 + δ)µ) < exp (−µδ²/4)

Example
Let X be # heads in n tosses of a fair coin. Then µ = n/2, and with
δ = 1/2 we have

    P(X > 3n/4) = P(X > (1 + 1/2)(n/2)) < exp (−(n/2)δ²/4) = exp (−n/32).

If we toss the coin 1000 times, the probability is less than
exp (−125/4).
81 / 97
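Putting the bounds side by side for the running coin example: the sketch below (Python with SciPy; the values of n are illustrative) prints the exact tail probability together with the Markov, Chebyshev, and Chernoff bounds derived above.

    import math
    from scipy.stats import binom

    # Comparison sketch of the bounds for P(X > 3n/4), X ~ binomial(n, 1/2).
    for n in (100, 1000):
        exact     = binom.sf(3 * n / 4, n, 0.5)
        markov    = 2 / 3                      # E(X)/(3n/4)
        chebyshev = 4 / n                      # Var(X)/(n/4)^2
        chernoff  = math.exp(-n / 32)          # exp(-mu * delta^2 / 4) with delta = 1/2
        print(n, exact, markov, chebyshev, chernoff)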
Hoeffding inequality
Let X1 , X2 , · · · , Xn be i.i.d. observations such that E (Xi ) = µ
and a ≤ Xi ≤ b. Then, for any ε > 0,

    P(|X̄ − µ| > ε) ≤ 2 exp (−2nε²/(b − a)²),

where X̄ = (1/n) Σ_{i=1}^n Xi is the sample mean.

Example
If X1 , X2 , · · · , Xn ∼ Bernoulli(p),
• in terms of Hoeffding’s inequality, we have

      P(|X̄ − p| > ε) ≤ 2 exp (−2nε²);

• if p = 0.5 and ε = 1/4,

      P(X̄ − 0.5 > 1/4) < P(|X̄ − 0.5| > 1/4) ≤ 2 exp (−n/8).
82 / 97
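The sketch below (Python/NumPy; n, the number of replications, and the seed are arbitrary) estimates the left-hand side of the Bernoulli(0.5) example by simulation and compares it with the Hoeffding bound.

    import numpy as np

    # Empirical sketch of the Bernoulli(0.5) example: the observed frequency of
    # |mean - 0.5| > 1/4 sits below the Hoeffding bound 2*exp(-n/8).
    rng = np.random.default_rng(6)
    n, reps, eps = 40, 200_000, 0.25
    means = rng.binomial(n, 0.5, size=reps) / n      # sample means of n Bernoulli(0.5) draws

    print((np.abs(means - 0.5) > eps).mean())        # empirical probability
    print(2 * np.exp(-2 * n * eps**2))               # Hoeffding bound = 2*exp(-n/8)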
Outline
Joint and Marginal Distributions
Conditional Distribution and Independence
Bivariate Transformations
Hierarchical Models and Mixture Distributions
Hierarchical Models and Mixture Distributions
Covariance and Correlation
Multivariate Distributions
Inequalities
Numerical Inequalities
Functional Inequalities
Take-aways
83 / 97
Lemma
Let a and b be any positive numbers, and let p and q be any
positive numbers satisfying 1/p + 1/q = 1. Then

    (1/p)a^p + (1/q)b^q ≥ ab,

with equality if and only if a^p = b^q .

Proof.
Fix b, and consider the function

    g (a) = (1/p)a^p + (1/q)b^q − ab.

To minimize g (a), differentiate and set equal to 0:

    (d/da) g (a) = 0 ⇒ a^{p−1} − b = 0 ⇒ b = a^{p−1} .
84 / 97
Proof cont’d

A check of the second derivative will establish that this is indeed
a minimum. Noting that (p − 1)q = p, the value of the function
at the minimum is

    (1/p)a^p + (1/q)(a^{p−1} )^q − a·a^{p−1} = (1/p)a^p + (1/q)a^p − a^p = 0.

Since the minimum is unique, equality holds only if a^{p−1} = b,
which is equivalent to a^p = b^q .

The inequalities in this subsection, although often stated in terms of
expectations, rely mainly on properties of numbers. In fact, they are
all based on the preceding simple lemma.

85 / 97
Hölder’s inequality
Let X and Y be any two r.v.s, and let p and q satisfy 1/p + 1/q = 1.
Then

    |E (XY )| ≤ E |XY | ≤ (E |X |^p )^{1/p} (E |Y |^q )^{1/q} .

Proof.
The first inequality follows from −|XY | ≤ XY ≤ |XY |. To
prove the second inequality, define

    a = |X |/(E |X |^p )^{1/p}   and   b = |Y |/(E |Y |^q )^{1/q} .

Applying the above lemma,

    (1/p) |X |^p /E |X |^p + (1/q) |Y |^q /E |Y |^q ≥ |XY |/((E |X |^p )^{1/p} (E |Y |^q )^{1/q} ).
Now take expectations of both sides. The expectation of the
left-hand side is 1, and rearrangement gives the conclusion.
86 / 97
Cauchy-Schwarz inequality
For any two r.v.s X and Y ,

    |E (XY )| ≤ E |XY | ≤ (E |X |²)^{1/2} (E |Y |²)^{1/2} .

Perhaps the most famous special case of Hölder’s inequality is
that for which p = q = 2.

Example: covariance inequality

If X and Y have means µX and µY , and variances σX² and σY² ,
respectively, we can apply the Cauchy-Schwarz inequality to get

    E |(X − µX )(Y − µY )| ≤ {E (X − µX )²}^{1/2} {E (Y − µY )²}^{1/2} .

Squaring both sides and using statistical notation, we have

    (Cov (X , Y ))² ≤ σX² σY² .
87 / 97
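A quick numerical illustration of the covariance inequality (Python/NumPy; the particular dependent pair is an arbitrary choice):

    import numpy as np

    # Numerical check of Cov(X, Y)^2 <= sigmaX^2 * sigmaY^2 on one dependent pair.
    rng = np.random.default_rng(7)
    x = rng.exponential(scale=2.0, size=200_000)
    y = np.sqrt(x) + rng.normal(size=x.size)         # dependence on X plus noise

    lhs = np.cov(x, y)[0, 1] ** 2
    rhs = x.var() * y.var()
    print(lhs, rhs, lhs <= rhs)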
Special cases of Hölder’s inequality
If we set Y = 1, we get

    E |X | ≤ (E |X |^p )^{1/p} ,   1 < p < ∞.

For 1 < r < p, if we replace |X | by |X |^r , we obtain

    E |X |^r ≤ (E |X |^{pr} )^{1/p} ,   1 < p < ∞.

Now write s = pr (note that s > r ) and rearrange terms to get

    {E |X |^r }^{1/r} ≤ (E |X |^s )^{1/s} ,   1 < r < s < ∞,

which is known as Liapounov’s inequality.
88 / 97
Minkowski’s inequality
Let X and Y be any two r.v.s. Then for 1 ≤ p < ∞,

    {E |X + Y |^p }^{1/p} ≤ {E |X |^p }^{1/p} + {E |Y |^p }^{1/p} .

Proof:

    E |X + Y |^p = E (|X + Y ||X + Y |^{p−1} )
                 ≤ E (|X ||X + Y |^{p−1} ) + E (|Y ||X + Y |^{p−1} ),

where we have used the fact that |X + Y | ≤ |X | + |Y |.
Now apply Hölder’s inequality to each expectation on the right-
hand side of the above inequality to get

    E |X + Y |^p ≤ {E |X |^p }^{1/p} {E |X + Y |^{q(p−1)} }^{1/q} + {E |Y |^p }^{1/p} {E |X + Y |^{q(p−1)} }^{1/q} .

Now divide through by {E |X + Y |^{q(p−1)} }^{1/q} ; noting that q(p − 1) = p
and 1 − 1/q = 1/p, we obtain the conclusion.
89 / 97
A new version of Hölder’s inequality

For numbers ai and bi , i = 1, 2, · · · , n, the inequality

    Σ_{i=1}^n |ai bi | ≤ ( Σ_{i=1}^n |ai |^p )^{1/p} ( Σ_{i=1}^n |bi |^q )^{1/q} ,   1/p + 1/q = 1,

also holds. An important special case of the conclusion occurs when bi = 1
and p = q = 2. We then have

    (1/n) ( Σ_{i=1}^n |ai | )² ≤ Σ_{i=1}^n ai² .
90 / 97
Outline
Joint and Marginal Distributions
Conditional Distribution and Independence
Bivariate Transformations
Hierarchical Models and Mixture Distributions
Hierarchical Models and Mixture Distributions
Covariance and Correlation
Multivariate Distributions
Inequalities
Numerical Inequalities
Functional Inequalities
Take-aways
91 / 97
Convex inequality
A function g (x) is convex if

    g (λx + (1 − λ)y ) ≤ λg (x) + (1 − λ)g (y ),

for all x and y , and 0 < λ < 1. The function g (x) is concave if
−g (x) is convex.
Informally, we can think of convex functions as functions that
“hold water”, that is, they are bowl-shaped (g (x) = x² is convex),
while concave functions “spill water” (g (x) = log x is concave).
More formally, convex functions lie below lines connecting any
two points. As λ ranges from 0 to 1, λg (x1 ) + (1 − λ)g (x2 ) defines
a line connecting g (x1 ) and g (x2 ). This line lies above g (x) if
g (x) is convex.
92 / 97
Jensen’s inequality
For any r.v. X , if g (x) is a convex function, then

    E (g (X )) ≥ g (E (X )).

Equality holds if and only if, for every line a + bx that is tangent
to g (x) at x = E (X ), P(g (X ) = a + bX ) = 1.

Proof.
To establish the inequality, let l(x) be a tangent line to g (x) at
the point (E (X ), g (E (X ))). Write l(x) = a + bx for some a and b.
Now, by the convexity of g we have g (x) ≥ a + bx. Since
expectations preserve inequalities,

    E (g (X )) ≥ E (a + bX ) = a + bE (X ) = l(E (X )) = g (E (X )).

One immediate application of Jensen’s Inequality shows that
E (X²) ≥ (E (X ))², since g (x) = x² is convex.
93 / 97
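Two quick numerical instances of Jensen’s inequality (Python/NumPy; the Gamma distribution used for X is an arbitrary positive-valued choice):

    import numpy as np

    # Numeric sketch of Jensen's inequality for two convex choices of g
    # applied to an arbitrary positive r.v. X.
    rng = np.random.default_rng(8)
    x = rng.gamma(shape=2.0, scale=1.5, size=200_000)

    print((x**2).mean(), x.mean()**2)      # E(X^2) >= (E X)^2   (g(x) = x^2)
    print((1 / x).mean(), 1 / x.mean())    # E(1/X) >= 1/E(X)    (g(x) = 1/x on x > 0)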
An inequality for means
Jensen’s inequality can be used to prove an inequality between
three different kinds of means. If a1 , · · · , an are positive numbers, define

    aA = (1/n)(a1 + a2 + · · · + an ),   (arithmetic mean)
    aG = (a1 · a2 · · · · · an )^{1/n} ,   (geometric mean)
    aH = 1 / ( (1/n)(1/a1 + 1/a2 + · · · + 1/an ) ).   (harmonic mean)

An inequality relating these means is

    aH ≤ aG ≤ aA .

To apply Jensen’s inequality, let X be a r.v. with range
a1 , · · · , an and P(X = ai ) = 1/n, i = 1, · · · , n.
94 / 97
An inequality for means Cont’d
Since log x is a concave function, Jensen’s inequality shows that
E (log X ) ≤ log E (X ); hence

    log aG = (1/n) Σ_{i=1}^n log ai = E (log X ) ≤ log E (X ) = log aA ,

so aG ≤ aA .
Now again use the fact that log x is concave to get

    log (1/aH ) = log ( (1/n) Σ_{i=1}^n (1/ai ) ) = log E (1/X ) ≥ E (log (1/X )) = −E (log X ).

Since E (log X ) = log aG , it then follows that log (1/aH ) ≥ log (1/aG ),
or aG ≥ aH .
95 / 97
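A quick numerical check of aH ≤ aG ≤ aA (Python/NumPy; the numbers a1, ..., an are arbitrary positive values):

    import numpy as np

    # Sketch of the harmonic-geometric-arithmetic mean inequality.
    a = np.array([0.3, 1.0, 2.5, 7.0, 11.0])

    a_arith = a.mean()
    a_geom  = np.exp(np.log(a).mean())
    a_harm  = 1 / (1 / a).mean()
    print(a_harm, a_geom, a_arith, a_harm <= a_geom <= a_arith)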
Covariance inequality
If X is a r.v. with finite mean µ and g (x) is a nondecreasing function,
then E (g (X )(X − µ)) ≥ 0, since

    E (g (X )(X − µ)) = E (g (X )(X − µ)[I(−∞,0) (X − µ) + I(0,∞) (X − µ)])
                      ≥ E (g (µ)(X − µ)I(−∞,0) (X − µ)) + E (g (µ)(X − µ)I(0,∞) (X − µ))
                      = g (µ)E (X − µ) = 0,

where the inequality holds because g is nondecreasing: g (X )(X − µ) ≥ g (µ)(X − µ)
on both events.

Theorem

If X is a r.v. and g (x) and h(x) are any functions s.t. E (g (X )), E (h(X )),
and E (g (X )h(X )) exist:
• If g (x) is nondecreasing and h(x) is nonincreasing, then
  E (g (X )h(X )) ≤ E (g (X ))E (h(X )).
• If g (x) and h(x) are both nondecreasing or both nonincreasing, then
  E (g (X )h(X )) ≥ E (g (X ))E (h(X )).
96 / 97
Take-aways

Conclusions
 Joint and marginal distributions
 Continuous distributions
 Independence
 Bivariate transformation
 Hierarchical models and mixture distributions
 Multivariate distribution
 Inequalities

97 / 97
