Statistics for Data Science - 2
Week 3 Notes
Expected value
• Expected value of a random variable
Definition: Suppose X is a discrete random variable with range T_X and PMF f_X. The
expected value of X, denoted E[X], is defined as

E[X] = Σ_{t ∈ T_X} t P(X = t),

assuming the above sum exists.
The expected value represents the “center” of a random variable.
1. A constant c can be viewed as a random variable X with P(X = c) = 1, so that
E[c] = c × 1 = c.
2. If X takes only non-negative values, i.e. P(X ≥ 0) = 1, then E[X] ≥ 0.
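As a quick numerical illustration, the defining sum can be evaluated directly from a PMF. A minimal sketch in Python, using a made-up fair-die PMF (the names pmf and expected_value are illustrative, not from the course):

```python
# Minimal sketch: computing E[X] directly from its PMF. The fair-die
# PMF below is a made-up example, not from the notes.

def expected_value(pmf):
    """E[X] = sum over t in T_X of t * P(X = t)."""
    return sum(t * p for t, p in pmf.items())

pmf = {t: 1 / 6 for t in range(1, 7)}   # P(X = t) = 1/6 for t = 1..6

print(expected_value(pmf))         # 3.5, the "center" of X
print(expected_value({4: 1.0}))    # a constant c = 4: E[c] = 4.0
```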
• Expected value of a function of random variables
Suppose X_1, . . . , X_n have joint PMF f_{X_1...X_n}, with the range of X_i denoted T_{X_i}. Let

g : T_{X_1} × . . . × T_{X_n} → R

be a function, and let Y = g(X_1, . . . , X_n) have range T_Y and PMF f_Y. Then,

E[g(X_1, . . . , X_n)] = Σ_{t ∈ T_Y} t f_Y(t) = Σ_{t_i ∈ T_{X_i}} g(t_1, . . . , t_n) f_{X_1...X_n}(t_1, . . . , t_n)
• Linearity of Expected value:
1. E[cX] = cE[X] for a random variable X and a constant c.
2. E[X + Y ] = E[X] + E[Y ] for any two random variables X, Y .
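The double sum above means E[g(X_1, . . . , X_n)] can be computed from the joint PMF without first deriving the PMF of Y. A minimal sketch, assuming a made-up joint PMF on {0, 1} × {0, 1}; it also verifies linearity, E[X + Y] = E[X] + E[Y], even though this X and Y are dependent:

```python
# Sketch: E[g(X, Y)] computed from a hypothetical joint PMF, without
# deriving the PMF of g(X, Y) first. The joint PMF below is made up.

joint_pmf = {  # (x, y) -> P(X = x, Y = y); X and Y are dependent here
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.3,
}

def expectation(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * f_XY(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_X = expectation(lambda x, y: x, joint_pmf)
E_Y = expectation(lambda x, y: y, joint_pmf)
E_sum = expectation(lambda x, y: x + y, joint_pmf)

# Linearity holds even though X and Y are dependent:
print(E_sum, E_X + E_Y)   # both 0.9
```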
• Zero-mean random variable:
A random variable X with E[X] = 0 is said to be a zero-mean random variable.
• Variance and Standard deviation:
Definition: The variance of a random variable X, denoted by Var(X), is defined as
Var(X) = E[(X − E[X])2 ]
Variance measures the spread about the expected value.
The variance of X is also given by Var(X) = E[X²] − (E[X])².
The standard deviation of X, denoted by SD(X), is defined as

SD(X) = +√Var(X)

The units of SD(X) are the same as the units of X.
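Both expressions for the variance give the same number. A minimal sketch checking this on a made-up PMF (the helper E is illustrative):

```python
# Sketch: Var(X) computed two ways on a hypothetical PMF; both agree.

pmf = {0: 0.5, 1: 0.3, 2: 0.2}   # made-up PMF with E[X] = 0.7

def E(g, pmf):
    """E[g(X)] for a discrete X with the given PMF."""
    return sum(g(t) * p for t, p in pmf.items())

mu = E(lambda t: t, pmf)
var_def = E(lambda t: (t - mu) ** 2, pmf)       # E[(X - E[X])^2]
var_alt = E(lambda t: t ** 2, pmf) - mu ** 2    # E[X^2] - (E[X])^2
sd = var_def ** 0.5                             # same units as X

print(mu, var_def, var_alt, sd)   # var_def == var_alt == 0.61
```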
• Properties: Scaling and translation
Let X be a random variable. Let a be a constant real number.
1. Var(aX) = a2 Var(X)
2. SD(aX) = |a| SD(X)
3. Var(X + a) = Var(X)
4. SD(X + a) = SD(X)
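These identities can be verified by transforming the support of X and recomputing. A minimal sketch on the same kind of made-up PMF:

```python
# Sketch: checking Var(aX) = a^2 Var(X) and Var(X + a) = Var(X)
# on a hypothetical PMF, by transforming the support of X.

pmf = {0: 0.5, 1: 0.3, 2: 0.2}   # made-up PMF
a = -3.0

def var(pmf):
    mu = sum(t * p for t, p in pmf.items())
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

scaled = {a * t: p for t, p in pmf.items()}     # PMF of aX
shifted = {t + a: p for t, p in pmf.items()}    # PMF of X + a

print(var(scaled), a ** 2 * var(pmf))           # equal
print(var(shifted), var(pmf))                   # equal: shifts don't spread
print(var(scaled) ** 0.5, abs(a) * var(pmf) ** 0.5)   # SD(aX) = |a| SD(X)
```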
• Sum and product of independent random variables
1. For any two random variables X and Y (independent or dependent), E[X + Y] = E[X] + E[Y].
2. If X and Y are independent random variables,
(a) E[XY ] = E[X]E[Y ]
(b) Var(X + Y ) = Var(X) + Var(Y )
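For independent X and Y the joint PMF factorizes into the product of the marginals, which is what drives both identities. A minimal sketch with made-up marginal PMFs:

```python
# Sketch: for independent X and Y (joint PMF = product of marginals),
# E[XY] = E[X]E[Y] and Var(X + Y) = Var(X) + Var(Y). Marginals are made up.

pmf_X = {0: 0.5, 1: 0.5}
pmf_Y = {1: 0.2, 2: 0.8}

# Build the joint PMF of an independent pair.
joint = {(x, y): px * py for x, px in pmf_X.items() for y, py in pmf_Y.items()}

def E(g, joint):
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_X, E_Y = E(lambda x, y: x, joint), E(lambda x, y: y, joint)
print(E(lambda x, y: x * y, joint), E_X * E_Y)   # equal: 0.9 and 0.9

var_sum = E(lambda x, y: (x + y) ** 2, joint) - E(lambda x, y: x + y, joint) ** 2
var_X = E(lambda x, y: x ** 2, joint) - E_X ** 2
var_Y = E(lambda x, y: y ** 2, joint) - E_Y ** 2
print(var_sum, var_X + var_Y)                    # equal: 0.41 and 0.41
```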
• Standardised random variables:
1. Definition: A random variable X is said to be standardised if E[X] = 0 and Var(X) = 1.
2. Let X be a random variable with SD(X) > 0. Then, Y = (X − E[X]) / SD(X) is a standardised random variable.
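Standardising is a translation followed by a scaling, so the earlier properties give E[Y] = 0 and Var(Y) = 1. A minimal sketch on a made-up PMF:

```python
# Sketch: standardising a hypothetical X. Y = (X - E[X]) / SD(X)
# has E[Y] = 0 and Var(Y) = 1 (up to floating-point rounding).

pmf = {0: 0.5, 1: 0.3, 2: 0.2}   # made-up PMF

mu = sum(t * p for t, p in pmf.items())
sd = sum((t - mu) ** 2 * p for t, p in pmf.items()) ** 0.5

pmf_Y = {(t - mu) / sd: p for t, p in pmf.items()}   # PMF of standardised Y

mu_Y = sum(t * p for t, p in pmf_Y.items())
var_Y = sum((t - mu_Y) ** 2 * p for t, p in pmf_Y.items())
print(mu_Y, var_Y)   # approximately 0 and 1
```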
• Covariance:
Definition: Suppose X and Y are random variables on the same probability space. The
covariance of X and Y , denoted as Cov(X, Y ), is defined as
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
It summarizes the relationship between two random variables.
Properties:
1. Cov(X, X) = Var(X)
2. Cov(X, Y ) = E[XY ] − E[X]E[Y ]
3. Covariance is symmetric: Cov(X, Y) = Cov(Y, X)
4. Covariance is “linear” in each argument.
(a) Cov(X, aY + bZ) = aCov(X, Y ) + bCov(X, Z)
(b) Cov(aX + bY, Z) = aCov(X, Z) + bCov(Y, Z)
5. Independence: If X and Y are independent, then X and Y are uncorrelated, i.e.
Cov(X, Y ) = 0
6. The converse does not hold: if X and Y are uncorrelated, they may still be dependent.
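Properties 2 and 6 can both be seen on the classic example X uniform on {−1, 0, 1} with Y = X²: the shortcut formula gives Cov(X, Y) = 0, yet Y is a function of X. A minimal sketch:

```python
# Sketch: covariance from a joint PMF, plus the classic example of
# uncorrelated-but-dependent variables: X uniform on {-1, 0, 1}, Y = X^2.

joint = {(x, x * x): 1 / 3 for x in (-1, 0, 1)}   # hypothetical joint PMF

def E(g, joint):
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_X, E_Y = E(lambda x, y: x, joint), E(lambda x, y: y, joint)
cov = E(lambda x, y: (x - E_X) * (y - E_Y), joint)    # definition
cov_alt = E(lambda x, y: x * y, joint) - E_X * E_Y    # shortcut formula

print(cov, cov_alt)   # both 0: X and Y are uncorrelated...
# ...yet clearly dependent: Y is a deterministic function of X,
# e.g. P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3.
```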
• Correlation coefficient:
Definition: The correlation coefficient or correlation of two random variables X and Y, denoted by ρ(X, Y), is defined as

ρ(X, Y) = Cov(X, Y) / (SD(X) SD(Y))
1. −1 ≤ ρ(X, Y ) ≤ 1.
2. ρ(X, Y ) summarizes the trend between random variables.
3. ρ(X, Y ) is a dimensionless quantity.
4. If ρ(X, Y ) is close to zero, there is no clear linear trend between X and Y .
5. If ρ(X, Y ) = 1 or ρ(X, Y ) = −1, Y is a linear function of X.
6. If |ρ(X, Y)| is close to one, X and Y are strongly correlated.
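A minimal sketch computing ρ(X, Y) from a made-up joint PMF, together with the extreme case where Y is a linear function of X (the helper rho is illustrative):

```python
# Sketch: correlation coefficient from a hypothetical joint PMF, and the
# extreme case rho = -1 when Y is a decreasing linear function of X.

def rho(joint):
    """Correlation coefficient from a joint PMF {(x, y): probability}."""
    def E(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())
    E_X, E_Y = E(lambda x, y: x), E(lambda x, y: y)
    cov = E(lambda x, y: x * y) - E_X * E_Y
    sd_X = (E(lambda x, y: x * x) - E_X ** 2) ** 0.5
    sd_Y = (E(lambda x, y: y * y) - E_Y ** 2) ** 0.5
    return cov / (sd_X * sd_Y)

# Made-up dependent pair with a mild positive trend.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
print(rho(joint))            # ~0.41, a dimensionless quantity

# Y = -2X + 5 is a decreasing linear function of X, so rho = -1.
linear = {(x, -2 * x + 5): 0.25 for x in range(4)}
print(rho(linear))           # -1.0 (up to rounding)
```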
• Bounds on probabilities using mean and variance
1. Markov’s inequality: Let X be a discrete random variable taking non-negative
values with a finite mean µ. Then, for any c > 0,

P(X ≥ c) ≤ µ / c

Through Markov’s inequality, the mean µ bounds the probability that a non-negative
random variable takes values much larger than its mean; a numerical check of both
inequalities appears after this list.
2. Chebyshev’s inequality: Let X be a discrete random variable with a finite mean
µ and a finite variance σ². Then, for any k > 0,

P(|X − µ| ≥ kσ) ≤ 1/k²

Other forms:
(a) P(|X − µ| ≥ c) ≤ σ²/c², and P((X − µ)² > k²σ²) ≤ 1/k²
(b) P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²

Through Chebyshev’s inequality, the mean µ and standard deviation σ bound the
probability that X deviates from µ by kσ or more.
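A minimal numerical check of both inequalities on a made-up non-negative PMF; the bounds hold but are typically loose:

```python
# Sketch: checking Markov's and Chebyshev's inequalities on a hypothetical
# non-negative PMF. The bounds hold, though they are usually not tight.

pmf = {0: 0.4, 1: 0.3, 2: 0.2, 10: 0.1}   # made-up, non-negative support

mu = sum(t * p for t, p in pmf.items())
sigma = sum((t - mu) ** 2 * p for t, p in pmf.items()) ** 0.5

# Markov: P(X >= c) <= mu / c for c > 0.
c = 5.0
p_tail = sum(p for t, p in pmf.items() if t >= c)
print(p_tail, mu / c)        # 0.1 <= 0.34

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2.
k = 2.0
p_dev = sum(p for t, p in pmf.items() if abs(t - mu) >= k * sigma)
print(p_dev, 1 / k ** 2)     # 0.1 <= 0.25
```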