
Slide set  Expectation

∙ Definition and properties


∙ Correlation and covariance
∙ Linear MSE estimation
∙ Summary

© Copyright  Abbas El Gamal


Definition of expectation

∙ We already introduced the notion of expectation (mean) of a r.v.


∙ We generalize this definition and discuss it in more depth
∙ Let X ∈ 𝒳 be a discrete r.v. with pmf pX(x) and g(x) be a function of x
  The expectation or expected value of g(X) is defined as

      E(g(X)) = Σ_{x∈𝒳} g(x) pX(x)

∙ For a continuous r.v. X ∼ fX(x), the expected value of g(X) is defined as

      E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

∙ Examples:
  ◦ g(X) = c, a constant, then E(g(X)) = c
  ◦ g(X) = X, E(X) is the mean of X
  ◦ g(X) = X^k, k = 1, 2, . . ., E(X^k) is the kth moment of X
  ◦ g(X) = (X − E(X))², E[(X − E(X))²] is the variance of X
 / 
Fundamental theorem of expectation

∙ Let X ∼ pX(x) and Y = g(X) ∼ pY(y), then

      E_Y(Y) = Σ_{y∈𝒴} y pY(y) = Σ_{x∈𝒳} g(x) pX(x) = E_X(g(X))

  Similarly, for continuous r.v.s X ∼ fX(x) and Y = g(X) ∼ fY(y), E_Y(Y) = E_X(g(X))
∙ Hence, E(g(X)) can be found using either fX(x) or fY(y)
  It is often easier to use fX(x) than to first find fY(y) and then use it to find E(Y)
∙ Proof: We prove the theorem for discrete r.v.s

      E_Y(Y) = Σ_y y pY(y)
             = Σ_y y Σ_{x: g(x)=y} pX(x)
             = Σ_y Σ_{x: g(x)=y} y pX(x)
             = Σ_y Σ_{x: g(x)=y} g(x) pX(x) = Σ_x g(x) pX(x) = E_X(g(X))

 / 
Expectation is linear

∙ For any constants a and b

      E[a g₁(X) + b g₂(X)] = a E(g₁(X)) + b E(g₂(X))

  This follows from the definition of expectation as a sum / integral


∙ Example: Let X be a r.v. with known mean and variance.
  Find the mean and variance of aX + b
  ◦ By linearity of expectation, the mean: E(aX + b) = a E(X) + b
  ◦ The variance

      Var(aX + b) = E[((aX + b) − E(aX + b))²]
                  = E[(aX + b − a E(X) − b)²]
                  = E[a²(X − E(X))²]
                  = a² E[(X − E(X))²] = a² Var(X)

 / 
Recap

∙ Let X ∼ fX(x), Y = g(X) be a function of X; the expectation of g(X) is defined as

      E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
              = ∫_{−∞}^{∞} y fY(y) dy        (fundamental theorem of expectation)

∙ Expectation is linear, i.e.,

      E[a g₁(X) + b g₂(X)] = a E[g₁(X)] + b E[g₂(X)]

 / 
Correlation and covariance
∙ We can define expectation for a function of two r.v.s: Let (X, Y) ∼ fX,Y(x, y)
  and g(x, y) be a function of x, y; the expectation of g(X, Y) is defined as

      E(g(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy

  The function g(X, Y) may be X, Y, X², X + Y, max{X, Y}, . . .


∙ The correlation of X and Y is defined as E(XY)
∙ The covariance of X and Y is defined as

      Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
                = E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
                = E(XY) − E[X E(Y)] − E[Y E(X)] + E(X) E(Y)      (linearity of expectation)
                = E(XY) − E(X) E(Y) − E(Y) E(X) + E(X) E(Y)      (linearity of expectation)
                = E(XY) − E(X) E(Y)

  ◦ If X = Y, then Cov(X, Y) = Var(X)
∙ X and Y are said to be uncorrelated if Cov(X, Y) = 0, i.e., if E(XY) = E(X) E(Y)
 / 
Example
∙ Let

      f(x, y) = 2   for x, y ≥ 0, x + y ≤ 1
                0   otherwise

  [Figure: the support of f is the triangle with vertices (0, 0), (1, 0), (0, 1)]

  Find Var(X), Var(Y), Cov(X, Y)
∙ The means of X, Y are

      E(X) = E(Y) = ∫_0^1 ∫_0^{1−y} 2x dx dy = ∫_0^1 (1 − y)² dy = 1/3

  The second moment of X is

      E(X²) = ∫_0^1 ∫_0^{1−y} 2x² dx dy = (2/3) ∫_0^1 (1 − y)³ dy = 1/6 = E(Y²)

  Hence the variance is Var(X) = E(X²) − [E(X)]² = 1/6 − 1/9 = 1/18 = Var(Y)

  The correlation of X and Y is

      E(XY) = ∫_0^1 ∫_0^{1−y} 2xy dx dy = 2 ∫_0^1 y ((1 − y)²/2) dy = ∫_0^1 y (1 − y)² dy = 1/12

  Hence, Cov(X, Y) = E(XY) − E(X) E(Y) = 1/12 − 1/9 = −1/36
 / 
The correlation coefficient
∙ The correlation coefficient of X and Y is defined as

      ρX,Y = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σX σY)

  ◦ For the previous example: ρX,Y = (−1/36) / (1/18) = −1/2
∙ If (X, Y) are uncorrelated, i.e., Cov(X, Y) = 0, then ρ = 0
∙ −1 ≤ ρX,Y ≤ 1. To show this, consider

      E[((X − E(X))/σX ± (Y − E(Y))/σY)²] ≥ 0,

  expanding and using linearity of expectation,

      E[(X − E(X))²]/Var(X) + E[(Y − E(Y))²]/Var(Y) ± 2 E[(X − E(X))(Y − E(Y))]/(σX σY) ≥ 0
      1 + 1 ± 2ρX,Y ≥ 0  ⟹  −2 ≤ 2ρX,Y ≤ 2  ⟹  −1 ≤ ρX,Y ≤ 1

∙ From the above, ρX,Y = ±1 iff (X − E(X))/σX = ±(Y − E(Y))/σY, i.e.,
  iff (X − E(X)) is a linear function of (Y − E(Y))
∙ In general, ρX,Y tells us how well X can be estimated by a linear function of Y
 / 
Visualizing correlation in data

[Figure: four scatter plots of (X, Y) data with the same means and variances:
uncorrelated (ρ = 0), positively correlated, negatively correlated, and
completely correlated samples]

 / 
Independent versus uncorrelated

∙ Let X and Y be independent, then for any functions g(X) and h(Y),

      E[g(X) h(Y)] = E[g(X)] E[h(Y)]

∙ Proof: Let’s assume that X ∼ fX(x) and Y ∼ fY(y), then

      E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) fX,Y(x, y) dx dy
                  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) fX(x) fY(y) dx dy        (by independence)
                  = ∫_{−∞}^{∞} g(x) fX(x) dx ∫_{−∞}^{∞} h(y) fY(y) dy = E[g(X)] E[h(Y)]

∙ In particular, if X, Y are independent, E(XY) = E(X) E(Y), i.e., Cov(X, Y) = 0
∙ Hence, independent ⟹ uncorrelated
∙ However, if X and Y are uncorrelated, they are not necessarily independent

 / 
Example

∙ Let X, Y ∈ {−2, −1, 1, 2} such that

      pX,Y(1, 2) = 1/4,    pX,Y(−1, −2) = 1/4,
      pX,Y(−2, 1) = 1/4,   pX,Y(2, −1) = 1/4,
      pX,Y(x, y) = 0 otherwise

  [Figure: the four mass points plotted in the (x, y) plane, each with probability 1/4]

  Are X and Y independent? Are they uncorrelated?
∙ Clearly X and Y are not independent, since if we know the outcome of one,
  we completely know the outcome of the other
  To check if they are uncorrelated, we find the covariance

      E(X) = E(Y) = 1 ⋅ (1/4) + (−1) ⋅ (1/4) + (−2) ⋅ (1/4) + 2 ⋅ (1/4) = 0,
      E(XY) = 1 ⋅ 2 ⋅ (1/4) + (−1) ⋅ (−2) ⋅ (1/4) + (−2) ⋅ 1 ⋅ (1/4) + 2 ⋅ (−1) ⋅ (1/4) = 0

  Thus, Cov(X, Y) = 0 and X and Y are uncorrelated!
 / 
Signal estimation

∙ Consider the following signal estimation problem:


      X → [Sensor] → Y → [Estimator] → X̂

∙ The signal X may be location, scene illumination, temperature, pressure, . . .


The sensor may be lidar, camera, temperature / pressure sensor, . . .
The sensor output Y is a noisy observation of X
∙ This setting may also represent prediction / forecasting, e.g.,
  Y is solar output power in hour t, X is solar power at hour t + 1 (see HW)
∙ Upon observing Y, the estimator tries to find a good estimate X̂ of X
∙ There are different types of estimators that one can use depending on
  ◦ The goodness / fidelity criterion
  ◦ Knowledge about the statistics of (X, Y)

 / 
Linear MSE estimation

∙ Consider the following signal estimation problem


      X → [Sensor] → Y → [Estimator] → X̂

∙ There are different types of estimators that one can use depending on
  ◦ The goodness / fidelity criterion
  ◦ Knowledge about the statistics of (X, Y)

∙ The most popular fidelity criterion is the mean squared error between X̂ and X,

      MSE = E[(X − X̂)²],   the smaller the better

∙ To find the optimal X̂, we need to know the distribution of (X, Y) (Slide set )
∙ We often have estimates only of the means, variances, and covariance of (X, Y)
∙ It turns out that with this information we can find the best linear MSE estimate, i.e.,
  the X̂ = aY + b that minimizes the MSE = E[(X − X̂)²]

∙ We refer to this estimator as the linear MMSE estimate


 / 
Linear MSE estimation

∙ Theorem: The linear MMSE estimate of X given Y is

      X̂ = (Cov(X, Y) / Var(Y)) (Y − E(Y)) + E(X)
         = ρX,Y σX ((Y − E(Y)) / σY) + E(X)

  and the minimum MSE is

      MMSE = Var(X) − Cov²(X, Y) / Var(Y) = (1 − ρX,Y²) σX²

∙ Properties of the linear MMSE estimate:
  ◦ E(X̂) = E(X), i.e., the estimate is unbiased
  ◦ If ρX,Y = 0, i.e., X, Y uncorrelated, then X̂ = E(X), i.e., ignore the observation!
  ◦ If ρX,Y = ±1, i.e., (Y − E(Y))/σY = ±(X − E(X))/σX, then the linear estimate is perfect

 / 
Proof of theorem

∙ We want to find a, b that minimize E[(X − aY − b)²]
  We can take partial derivatives and set them to 0, but let’s do it slightly differently
∙ First suppose we wish to estimate X by the best constant b that minimizes
  the MSE = E[(X − b)²]
∙ The answer is b = E(X) and the minimum MSE = Var(X), i.e., absent any
  observations, the mean is the MMSE estimate of X and the variance is its MSE
∙ We can show this using calculus or in a nicer way as follows:

      E[(X − b)²] = E[((X − E(X)) + (E(X) − b))²]
                  = E[(X − E(X))² + (E(X) − b)² + 2(E(X) − b)(X − E(X))]
                  = E[(X − E(X))²] + (E(X) − b)² + 2(E(X) − b) E(X − E(X))
                  = E[(X − E(X))²] + (E(X) − b)² ≥ E[(X − E(X))²],

  with equality iff b = E(X)

 / 
Proof of theorem (continued)
∙ We want to find a, b that minimize E[(X − aY − b)²]
∙ Suppose a has been chosen; what b minimizes E[((X − aY) − b)²]?
∙ From the above result, we have b = E(X − aY) = E(X) − a E(Y)    (1)
∙ So, we want to choose a to minimize

      MSE = E[((X − aY) − E(X − aY))²],  which is the same as
      E[((X − E(X)) − a(Y − E(Y)))²] = Var(X) + a² Var(Y) − 2a Cov(X, Y)

  This is a quadratic function of a and is minimized for

      a = Cov(X, Y) / Var(Y) = ρX,Y σX σY / σY² = ρX,Y σX / σY    (2)

∙ Substituting from (1) and (2), the linear MMSE estimate and the MMSE are

      X̂ = aY + b = (ρX,Y σX / σY) Y + E(X) − (ρX,Y σX / σY) E(Y),
      MMSE = Var(X) + a² Var(Y) − 2a Cov(X, Y)
           = σX² + ρX,Y² σX² − 2 (ρX,Y σX / σY) ⋅ ρX,Y σX σY = (1 − ρX,Y²) σX²
 / 
The additive noise channel

∙ When you measure a signal, e.g., location, scene, temperature, pressure, . . . ,
  the measuring device / circuit adds noise to the signal
∙ We model this system by an additive noise channel

      [Diagram: X → ⊕ → Y = X + Z, with the noise Z added at ⊕]

  ◦ The input signal X has known mean μX and variance σX²,
  ◦ the additive noise Z has zero mean and known variance σZ², and
  ◦ the output (observation) Y = X + Z, where X and Z are uncorrelated
∙ Find the linear MMSE estimate of the signal X given the output Y and its MSE

 / 
The additive noise channel

∙ The best linear MSE estimate is

      X̂ = (Cov(X, Y) / Var(Y)) (Y − E(Y)) + E(X)

∙ So we need to find E(Y), Var(Y), and Cov(X, Y) in terms of μX, σX², σZ²

      E(Y) = E(X + Z) = E(X) + E(Z) = μX + 0
      Var(Y) = E[(Y − E(Y))²] = E[((X − μX) + Z)²] = σX² + σZ²
      Cov(X, Y) = E[(X − μX)(X + Z − μX)] = E[(X − μX)² + (X − μX)Z] = σX² + 0

∙ Thus, the linear MMSE estimate is

      X̂ = (σX² / (σX² + σZ²)) (Y − μX) + μX = (σX² / (σX² + σZ²)) Y + (σZ² / (σX² + σZ²)) μX

  So if the signal to noise ratio (SNR) σX²/σZ² is high, we put more weight on Y,
  and if it’s low, we put more weight on μX
 / 
The additive noise channel

∙ From the theorem, the minimum MSE is

      MMSE = Var(X) − Cov²(X, Y) / Var(Y)

∙ From the model, we have Var(X) = σX², Cov(X, Y) = σX², Var(Y) = σX² + σZ²
∙ Hence, the minimum MSE is

      MMSE = σX² − σX⁴ / (σX² + σZ²)
           = σX² σZ² / (σX² + σZ²) = σX² / (SNR + 1)

  So the MMSE decreases from σX² to zero as the SNR increases from 0 to ∞

 / 
Summary

∙ Expectation is linear
∙ Covariance and correlation coefficient
∙ Independent ⟹ uncorrelated; reverse doesn’t hold in general
∙ Application: linear estimation
∙ The additive noise channel

 / 

You might also like