Chapter 5
Week 5: Expectation for random variables
5.1 Discrete random variables
We introduced above the median as a characteristic of the "center" of a
distribution. There is an alternative and more popular parameter of a
distribution that describes its center.
Definition: Let X be a discrete random variable with frequency p(x)
such that
\[
  \sum_x |x|\, p(x) < +\infty.
\]
The expectation of X (or the mean value of X, or the expected value
of X) is
\[
  E(X) = \sum_x x\, p(x).
\]
Note that the condition $\sum_x |x|\,p(x) < +\infty$ is always satisfied
if the random variable takes values in a finite set.
Notation: It is acceptable to write EX instead of E(X).
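
The sum $E(X) = \sum_x x\,p(x)$ is easy to evaluate on a computer. Below is a minimal Python sketch; the helper name expectation and the pmf stored as a dict are illustrative choices, not part of the notes.

```python
# Minimal sketch: expectation of a discrete random variable from its
# frequency (pmf) table, E(X) = sum_x x * p(x).

def expectation(pmf):
    """pmf: dict mapping value x -> probability p(x)."""
    assert abs(sum(pmf.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in pmf.items())

# Bernoulli(p = 0.3): value 1 with probability p, value 0 with probability 1 - p
print(expectation({1: 0.3, 0: 0.7}))   # 0.3
```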
Example
Let X be a Bernoulli random variable such that
\[
  X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}
\]
We have that EX = 1 · p + 0 · (1 − p) = p.
Note that X takes only the values 0 and 1, and it never takes the value
p = EX if p ∈ (0, 1). This is why the term "the expected value"
should not be taken literally.
Note that the median of the Bernoulli distribution is not a very useful
parameter. If p = 1/2, then
\[
  P(X \le a) = P(X = 0) = P(X = 1) = P(X \ge a) = \tfrac{1}{2}
\]
for any a ∈ (0, 1), i.e., any point a ∈ (0, 1) is a median.
Example
If we toss a single die once and X is the outcome, then

x                  1     2     3     4     5     6
p(x) = P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

\[
  E(X) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6}
       + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = 3.5.
\]
Example
A pair of dice is tossed once and X is the sum of the outcomes:

x                  2     3     4     5     6     7     8     9    10    11    12
p(x) = P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

\[
  EX = \sum_{x=2}^{12} x\, p(x)
     = 2 \cdot \tfrac{1}{36} + 3 \cdot \tfrac{2}{36} + \dots + 12 \cdot \tfrac{1}{36} = 7.
\]
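
As a quick check of this value, one can enumerate the 36 equally likely outcomes directly. A small Python sketch (not part of the notes):

```python
# Sketch: expected value of the sum of two fair dice by direct enumeration
# of the 36 equally likely outcomes.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
EX = sum(a + b for a, b in outcomes) / len(outcomes)
print(EX)   # 7.0
```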
Example
A gambler bets on a coin-tossing game: he pays $100 if the coin lands heads and
receives $100 if it lands tails. What is his expected gain?
Solution: let X be the gain. It is a random variable with
frequency pX(x) given by

x        −100    100
pX(x)     1/2    1/2

This gives EX = 100 · (1/2) + (−100) · (1/2) = 0. This is a fair
game!
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38
numbers in total). If you bet $1 that an odd non-zero number comes up,
you win or lose $1 according to whether or not that event occurs. Let
X be the gain. We have
\[
  P(X = -1) = 20/38, \qquad P(X = 1) = 18/38.
\]
Hence
\[
  EX = -1 \cdot 20/38 + 1 \cdot 18/38 = -1/19.
\]
Your expected loss is about $0.05. This game gives an advantage to
the casino; it covers the casino's business expenses.
Example
Consider the roulette wheel again. If you bet $1 that the number 1 comes
up, you win $35 or lose $1 according to whether or not that event
occurs. Let X be the gain. We have
\[
  P(X = -1) = 37/38, \qquad P(X = 35) = 1/38.
\]
Hence
\[
  EX = -1 \cdot 37/38 + 35 \cdot 1/38 = -1/19.
\]
Your expected loss is about $0.05 again. Note that the expected value
is the same as in the previous example.
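
A small sketch comparing the two $1 bets numerically, using exact fractions (the variable names are illustrative):

```python
# Sketch comparing the two roulette bets above; both have expected gain -1/19.
from fractions import Fraction

# $1 on "odd number": win $1 with prob 18/38, lose $1 with prob 20/38
odd_bet    = Fraction(1) * Fraction(18, 38) + Fraction(-1) * Fraction(20, 38)
# $1 on the single number 1: win $35 with prob 1/38, lose $1 with prob 37/38
single_bet = Fraction(35) * Fraction(1, 38) + Fraction(-1) * Fraction(37, 38)

print(odd_bet, single_bet)   # -1/19 -1/19
```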
5.2 Expectation for continuous random variables
Definition: Let X be a continuous random variable with density
f(x) such that
\[
  \int_{-\infty}^{\infty} |x|\, f(x)\, dx < +\infty.
\]
The expectation of X (or the mean value of X, or the expected value
of X) is
\[
  EX = E(X) = \int_{-\infty}^{\infty} x f(x)\, dx.
\]
Example
Consider the uniform distribution over (0, a), with density
\[
  f(x) = \begin{cases} \frac{1}{a}, & 0 \le x \le a \\ 0, & \text{elsewhere} \end{cases}
\]
Then
\[
  E(X) = \int_0^a \frac{x}{a}\, dx = \frac{1}{a}\,\frac{x^2}{2}\Big|_0^a = \frac{a}{2}.
\]
Example
Consider the exponential distribution with density
\[
  f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}
\]
Then
\[
  EX = E(X) = \int_0^{+\infty} x \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}.
\]
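
Both integrals can be checked numerically. A short sketch, assuming SciPy is available and choosing the illustrative parameter values a = 2 and λ = 3:

```python
# Numerical check of the two continuous examples: E(X) = integral of x * f(x).
import math
from scipy.integrate import quad

a, lam = 2.0, 3.0                       # arbitrary parameter choices

uniform_mean, _ = quad(lambda x: x * (1 / a), 0, a)
expo_mean, _    = quad(lambda x: x * lam * math.exp(-lam * x), 0, math.inf)

print(uniform_mean)   # ~1.0     (= a/2)
print(expo_mean)      # ~0.3333  (= 1/lambda)
```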
5.3 The case of distributions with heavy tails (fat tails)
It may happen that
\[
  \int_{-\infty}^{\infty} |x|\, f(x)\, dx = +\infty
\]
(i.e., the integral diverges). This happens if f(x) does not decay fast
enough as |x| → +∞ (the case of a heavy-tailed (fat-tailed) distribution).
Technically, the integral $\int_{-\infty}^{\infty} x f(x)\, dx$ is not defined in this case.
It is common to accept the following rules:
(A) If
\[
  \int_{-\infty}^{0} x f(x)\, dx > -\infty, \qquad \int_{0}^{\infty} x f(x)\, dx = +\infty,
\]
then EX = +∞ (and, therefore, it is defined).
(B) If
\[
  \int_{-\infty}^{0} x f(x)\, dx = -\infty, \qquad \int_{0}^{\infty} x f(x)\, dx < +\infty,
\]
then EX = −∞ (and, therefore, it is defined).
The only case when EX is not defined in any sense is when
\[
  \int_{-\infty}^{0} x f(x)\, dx = -\infty, \qquad \int_{0}^{\infty} x f(x)\, dx = +\infty.
\]
Similar rules are often accepted for the case of discrete random
variables.
Example
Cauchy distribution: Consider a random variable X = X₁/X₂,
where X₁ and X₂ are independent N(0, 1) random variables. We found in Week 4 that X has density
\[
  f(x) = \frac{1}{\pi (1 + x^2)}
\]
(the Cauchy density). We have that
\[
  \int_{-\infty}^{0} \frac{x}{\pi(1+x^2)}\, dx = -\infty, \qquad
  \int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx = +\infty.
\]
This means that EX is not defined.
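
A quick Monte Carlo illustration (not part of the notes): the running average of simulated Cauchy draws never settles down, which reflects the fact that EX is undefined.

```python
# Illustration: the sample mean of Cauchy draws does not converge.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000) / rng.standard_normal(100_000)  # X1/X2, Cauchy
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[99, 9_999, 99_999]])   # keeps jumping around, no limit
```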
5.4 Expectations of functions of random variables
Discrete random variables
Theorem Let g(x) be a function, g : R → R. Let X be a discrete
random variable with frequency p(x) such that
\[
  \sum_x |g(x)|\, p(x) < +\infty.
\]
Let Y = g(X). Then
\[
  EY = \sum_x g(x)\, p(x).
\]
Proof. Assume that X takes values x_k, k = 1, 2, ..., and P(X = x_k) = p_X(x_k).
It follows that Y takes values y_k = g(x_k), k = 1, 2, .... Assume first that,
for any y_k, there is a unique x_k such that y_k = g(x_k). Then
P(Y = y_k) = p_X(x_k), i.e., the frequency function of Y is p_Y(y_k) = p_X(x_k).
We have
\[
  EY = \sum_k y_k\, p_Y(y_k) = \sum_k g(x_k)\, p_X(x_k) = \sum_x g(x)\, p(x).
\]
The case when there is more than one x such that y_k = g(x)
requires additional analysis. We have
\[
  P(Y = y_k) = \sum_{j:\, g(x_j) = y_k} p_X(x_j).
\]
Hence
\[
  EY = \sum_k y_k\, p_Y(y_k) = \sum_k y_k \sum_{j:\, g(x_j) = y_k} p_X(x_j)
     = \sum_k \sum_{j:\, g(x_j) = y_k} g(x_j)\, p_X(x_j) = \sum_x g(x)\, p(x).
\]
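
A small sketch of the theorem for a toy pmf (the values and probabilities below are assumed for illustration): computing E[g(X)] directly from p_X, and via the frequency of Y = g(X), gives the same number, including when several x map to the same y.

```python
# Sketch: E[g(X)] computed two ways for g(x) = x**2 and an assumed toy pmf.
from collections import defaultdict

p_X = {-1: 0.2, 0: 0.5, 1: 0.3}
g = lambda x: x ** 2

# Directly: sum_x g(x) p(x)
direct = sum(g(x) * p for x, p in p_X.items())

# Via the pmf of Y: p_Y(y) = sum of p_X(x) over all x with g(x) = y
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p
via_Y = sum(y * p for y, p in p_Y.items())

print(direct, via_Y)   # 0.5 0.5
```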
Example
Let X be a Bernoulli random variable such that
\[
  X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}
\]
We have that E(X²) = 1² · p + 0² · (1 − p) = p.
Notation: It is common to write EX² instead of E(X²). Therefore, EX²
means E(X²) rather than (EX)².
Continuous random variables
Let g(x) be a function, g : R → R. Let X be a continuous random
variable with density f(x) such that
\[
  \int_{-\infty}^{\infty} |g(x)|\, f(x)\, dx < +\infty.
\]
Then the expectation of Y = g(X) is
\[
  EY = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx.
\]
Example
Consider the uniform distribution over (0, a), with density
\[
  f(x) = \begin{cases} \frac{1}{a}, & 0 \le x \le a \\ 0, & \text{elsewhere} \end{cases}
\]
Then
\[
  E(X^2) = \int_0^a \frac{x^2}{a}\, dx = \frac{1}{a}\,\frac{x^3}{3}\Big|_0^a = \frac{a^2}{3}.
\]
Example
Consider the exponential distribution with density
\[
  f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}
\]
Then
\[
  EX^2 = E(X^2) = \int_0^{+\infty} x^2 \lambda e^{-\lambda x}\, dx = \frac{2}{\lambda^2}.
\]
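
A symbolic check of the two second moments, assuming SymPy is available:

```python
# Symbolic check of E(X^2) for the uniform and exponential examples.
import sympy as sp

x, a, lam = sp.symbols('x a lam', positive=True)

uniform_m2 = sp.integrate(x**2 / a, (x, 0, a))                           # a**2/3
expo_m2    = sp.integrate(x**2 * lam * sp.exp(-lam * x), (x, 0, sp.oo))  # 2/lam**2

print(uniform_m2, expo_m2)
```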
5.5 Joint random variables
Theorem Let X and Y be jointly distributed with joint density or joint
frequency function f(x, y). Then
\[
  E\,g(X, Y) =
  \begin{cases}
    \sum_x \sum_y g(x, y)\, f(x, y), & \text{if } X, Y \text{ are discrete} \\[4pt]
    \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\, dx\, dy, & \text{if } X, Y \text{ are continuous.}
  \end{cases}
\]
Example
Consider discrete random variables X and Y with joint frequency
\[
  p(x, y) = \begin{cases} \frac{1}{30}(x + y), & x = 0, 1, 2; \ y = 0, 1, 2, 3 \\ 0, & \text{elsewhere.} \end{cases}
\]
Then
\[
  E(XY) = \sum_{x=0}^{2} \sum_{y=0}^{3} x y \cdot \frac{1}{30}(x + y).
\]
It gives
\[
\begin{aligned}
  E(XY) &= \frac{1}{30} \sum_{x=0}^{2} \bigl\{ x \cdot 0 \cdot (x+0) + x \cdot 1 \cdot (x+1)
           + x \cdot 2 \cdot (x+2) + x \cdot 3 \cdot (x+3) \bigr\} \\
        &= \frac{1}{30} \sum_{x=0}^{2} \bigl\{ x(x+1) + 2x(x+2) + 3x(x+3) \bigr\} \\
        &= \frac{1}{30} \bigl\{ 0(0+1) + 2 \cdot 0 \cdot (0+2) + 3 \cdot 0 \cdot (0+3) \\
        &\qquad + 1(1+1) + 2 \cdot 1 \cdot (1+2) + 3 \cdot 1 \cdot (1+3) \\
        &\qquad + 2(2+1) + 2 \cdot 2 \cdot (2+2) + 3 \cdot 2 \cdot (2+3) \bigr\} \\
        &= \frac{72}{30} = 2.4.
\end{aligned}
\]
Example
In the previous example,
\[
  E(X + Y) = \sum_{x=0}^{2} \sum_{y=0}^{3} \frac{(x + y)^2}{30} = \frac{98}{30} \approx 3.27.
\]
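
A small sketch checking both computations by brute-force enumeration over the joint frequency (exact fractions avoid rounding):

```python
# Sketch: the joint-pmf computations above, done by a double loop.
from fractions import Fraction

p = {(x, y): Fraction(x + y, 30) for x in range(3) for y in range(4)}

E_XY  = sum(x * y * pr for (x, y), pr in p.items())
E_XpY = sum((x + y) * pr for (x, y), pr in p.items())

print(E_XY, E_XpY)   # 12/5 (= 2.4) and 49/15 (= 98/30)
```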
Special cases
Let g(X, Y ) = X. Let us verify that Eg(X, Y ) = E(X).
We have
\[
  E(X) =
  \begin{cases}
    \sum_x \sum_y x\, f(x, y), & \text{if } X, Y \text{ are discrete} \\[4pt]
    \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f(x, y)\, dx\, dy, & \text{if } X, Y \text{ are continuous.}
  \end{cases}
\]
In the case when X and Y are continuous,
\[
  E(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f(x, y)\, dx\, dy
       = \int_{-\infty}^{\infty} x \left\{ \int_{-\infty}^{\infty} f(x, y)\, dy \right\} dx
       = \int_{-\infty}^{\infty} x\, f_X(x)\, dx,
\]
where $f_X(x) \equiv \int_{-\infty}^{\infty} f(x, y)\, dy$ is the marginal density of X.
Similar conclusions hold for discrete random variables.
Theorem Let a be a constant. Then
E(a) = a.
Proof: a can be described as a discrete random variable that takes
only one value, a, with probability 1. Then Ea = a · 1 = a.
5.6 Expectation of a linear combination of random variables
Theorem Let X be a random variable, a ∈ R. Then
E(aX) = aEX.
Proof: We consider the continuous case only. Take g(x) = ax. We
have
\[
  E(aX) = \int_{-\infty}^{\infty} a x f(x)\, dx = a \int_{-\infty}^{\infty} x f(x)\, dx = a\, EX.
\]
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38
numbers in total). If you bet $1 that an odd non-zero number comes up,
you win or lose $1 according to whether or not that event occurs. Let
X be the gain. We found above that
\[
  EX = -\$1/19 \approx -\$0.05.
\]
If you bet $100 that an odd non-zero number comes up, your gain is
Y = 100X, and EY = 100 EX ≈ −$5.26.
Theorem Let X be a random variable, a ∈ R. Then E(a +
X) = a + EX.
Proof: We consider the continuous case only. Take g(x) = a + x.
We have
\[
  E(a + X) = \int_{-\infty}^{\infty} (a + x) f(x)\, dx
           = a \int_{-\infty}^{\infty} f(x)\, dx + \int_{-\infty}^{\infty} x f(x)\, dx = a + EX.
\]
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38
numbers in total). If you bet $1 that an odd non-zero number comes up,
you win or lose $1 according to whether or not that event occurs. Let
X be the gain. We found above that EX = −$1/19 ≈ −$0.05.
Assume that you have $10 in your pocket, and let Y be the total
amount of money in your pocket after the game. Then Y = $10 + X
and EY = $10 + EX ≈ $9.95.
Theorem Let X and Y be random variables. Then
E(X + Y) = EX + EY.
Proof: We consider the continuous case only. Take g(x, y) = x + y.
We have
\[
\begin{aligned}
  E(X + Y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y)\, f(x, y)\, dx\, dy \\
  &= \int_{-\infty}^{\infty} x \left\{ \int_{-\infty}^{\infty} f(x, y)\, dy \right\} dx
   + \int_{-\infty}^{\infty} y \left\{ \int_{-\infty}^{\infty} f(x, y)\, dx \right\} dy \\
  &= \int_{-\infty}^{\infty} x f_X(x)\, dx + \int_{-\infty}^{\infty} y f_Y(y)\, dy \\
  &= EX + EY.
\end{aligned}
\]
Here f_X(x) and f_Y(y) are the marginal densities of X and Y,
respectively.
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38
numbers in total). If you bet $1 that an odd non-zero number comes up,
you win or lose $1 according to whether or not that event occurs.
Let X be the gain. If you bet another $1 that the number 1 comes
up, you win $35 or lose $1 according to whether or not that event
occurs. Let Y be the gain. We found above that EX = EY =
−$1/19 ≈ −$0.05. The expected gain for this combined $2 bet is
\[
  E(X + Y) = -2/19 \approx -\$0.11.
\]
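
A sketch that enumerates the 38 equally likely numbers of one spin (the helper functions X and Y below are illustrative): the two bets are placed on the same spin, so X and Y are dependent, yet E(X + Y) = EX + EY still holds.

```python
# Sketch: linearity of expectation for the combined $2 bet, by enumerating the
# 38 equally likely numbers of a single spin.
from fractions import Fraction

numbers = ['0', '00'] + [str(n) for n in range(1, 37)]

def X(n):   # $1 on "odd non-zero number"
    return 1 if n.isdigit() and int(n) % 2 == 1 else -1

def Y(n):   # $1 on the single number 1
    return 35 if n == '1' else -1

p = Fraction(1, 38)
EX = sum(X(n) * p for n in numbers)
EY = sum(Y(n) * p for n in numbers)
E_sum = sum((X(n) + Y(n)) * p for n in numbers)
print(EX + EY, E_sum)   # both -2/19
```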
Corollary Let X₁, ..., X_n be random variables and a₀, a₁, ..., a_n ∈ R. Set
\[
  Z = a_0 + \sum_{i=1}^{n} a_i X_i.
\]
Then
\[
  EZ = a_0 + \sum_{i=1}^{n} a_i\, EX_i.
\]
Example
Let X be the total number of successes in n Bernoulli trials. We have
that X = X₁ + ... + X_n, where the X_i are Bernoulli variables such that
\[
  X_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}
\]
We have that EX_i = 1 · p + 0 · (1 − p) = p.
Hence EX = p + ... + p = np.
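
A quick check against the exact binomial pmf, with the illustrative choice n = 10, p = 0.3:

```python
# Sketch: E(X) = np for the number of successes in n Bernoulli(p) trials,
# checked against the exact binomial pmf.
from math import comb

n, p = 10, 0.3
EX = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(EX, n * p)   # 3.0 (up to rounding) and 3.0
```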
Corollary Let S(X) and T (X) be functions of X. Then
E(S(X) + T (X)) = E(S(X)) + E(T (X)).
Corollary Let X and Y be jointly distributed, and let g and h
be functions of the random variables X and Y. Then
E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)].
5.7 Expectation of a product
Is it correct that E(XY) = E(X)E(Y)?
The answer is no, in general. For example, take a random variable X such
that
\[
  X = \begin{cases} 1, & \text{with probability } 1/2 \\ -1, & \text{with probability } 1/2 \end{cases}
\]
and take Y = X. We have that EX = 0, but XY = X² = 1 with probability 1, so
\[
  E(XY) = EX^2 = 1 \ne EX \cdot EX = 0.
\]
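
The counterexample in a few lines of Python (with Y = X, as above):

```python
# Counterexample: Y = X, so E(XY) = E(X^2) = 1 while EX * EX = 0.
p = {1: 0.5, -1: 0.5}
EX  = sum(x * pr for x, pr in p.items())       # 0.0
EXY = sum(x * x * pr for x, pr in p.items())   # 1.0  (Y = X)
print(EXY, EX * EX)
```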
Theorem If X and Y are independent random variables, then
E(XY ) = E(X)E(Y )
Proof (for the continuous case):
\[
  E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y\, f(x, y)\, dx\, dy.
\]
Since X and Y are independent, we have f(x, y) = f_X(x) f_Y(y),
where f_X(x) and f_Y(y) are the marginal densities of X and Y,
respectively. Hence
\[
  E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y\, f_X(x) f_Y(y)\, dx\, dy
        = \int_{-\infty}^{\infty} x f_X(x)\, dx \int_{-\infty}^{\infty} y f_Y(y)\, dy = E(X)E(Y).
\]
Note: The condition E(XY) = E(X)E(Y) is necessary but not sufficient
for the independence of X and Y. That is,
E(XY) = E(X)E(Y) does not necessarily imply that X and Y are
independent.
5.8 Probability of an event
Let A be a random event. Consider the random variable I_A such that
\[
  I_A = \begin{cases} 1, & \text{if the event } A \text{ occurs} \\ 0, & \text{otherwise.} \end{cases}
\]
Theorem EI_A = P(A). Indeed, EI_A = 1 · P(A) + 0 · (1 − P(A)) = P(A).
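
A quick Monte Carlo sketch (the event A, "a fair die shows at least 5", is an assumed example with P(A) = 1/3):

```python
# Sketch: the empirical mean of the indicator I_A estimates P(A).
import random

random.seed(0)
trials = [1 if random.randint(1, 6) >= 5 else 0 for _ in range(100_000)]
print(sum(trials) / len(trials))   # close to 1/3
```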
5.9 Expectation in the axiomatic setting
Let Ω be the sample space, and let X(ω) be a random variable, i.e.,
a mapping X : Ω → R. In this setting, it is common to write
\[
  EX = \int_{\Omega} X(\omega)\, P(d\omega),
\]
meaning that the expectation is a kind of integral. Modern probability
theory gives a rigorous interpretation to this integration.
5.10 Moments
Definition: The kth moment of the random variable X is defined as
E(X^k); it is usually denoted by μ'_k.
By the rule of expectation, we have that
\[
  \mu'_k =
  \begin{cases}
    \sum_x x^k f(x), & \text{if } X \text{ is discrete} \\[4pt]
    \int_{-\infty}^{\infty} x^k f(x)\, dx, & \text{if } X \text{ is continuous,}
  \end{cases}
\]
k = 1, 2, .... The first moment (i.e., EX) is called the mean of X and
is often denoted by μ, i.e.,
\[
  \mu'_1 = E(X) = \text{mean of } X \equiv \mu.
\]
Example
Let X be a Bernoulli random variable such that
\[
  X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}
\]
We have that μ'_k = E(X^k) = 1^k · p + 0^k · (1 − p) = p.
Definition: The kth central moment (or the moment about the mean)
of the random variable X is defined as E[(X − μ)^k]. It is usually
denoted by μ_k. By the rule of expectation, we have that
\[
  \mu_k =
  \begin{cases}
    \sum_x (x - \mu)^k f(x), & \text{if } X \text{ is discrete} \\[4pt]
    \int_{-\infty}^{\infty} (x - \mu)^k f(x)\, dx, & \text{if } X \text{ is continuous.}
  \end{cases}
\]
Example
Let X be a Bernoulli random variable such that
\[
  X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}
\]
We have that μ = EX = p and
\[
  \mu_k = E(X - \mu)^k = (1 - p)^k \cdot p + (-p)^k \cdot (1 - p).
\]
In particular,
\[
  \mu_2 = (1 - p)^2 \cdot p + (-p)^2 \cdot (1 - p)
        = p - 2p^2 + p^3 + p^2 - p^3 = p - p^2 = p(1 - p).
\]
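
A symbolic check of μ₂ = p(1 − p), assuming SymPy is available:

```python
# Symbolic check of the Bernoulli second central moment mu_2 = p(1 - p).
import sympy as sp

p = sp.symbols('p')
mu = p                                       # EX for Bernoulli(p)
mu2 = (1 - mu)**2 * p + (-mu)**2 * (1 - p)   # E[(X - mu)^2]
print(sp.simplify(mu2 - p * (1 - p)))        # 0, so mu_2 = p(1 - p)
```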