1 The Multivariate Normal Density and Its Properties
1.1 The Multivariate Normal Density
A p-dimensional random vector X is distributed as multivariate normal if the joint density function (pdf) of X is

f (x) = (2π)−p/2 |Σ|−1/2 exp{−(x − µ)′ Σ−1 (x − µ)/2}
We shall denote this p-dimensional density by Np (µ, Σ).
Example: Bivariate normal density
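For p = 2, write µ = (µ1 , µ2 )′ , σ11 = Var(X1 ), σ22 = Var(X2 ), and ρ12 = σ12 /√(σ11 σ22 ) for the correlation. The density then takes the standard explicit form

f (x1 , x2 ) = [1/(2π√(σ11 σ22 (1 − ρ12²)))] exp{ −[1/(2(1 − ρ12²))] [ (x1 − µ1 )²/σ11 + (x2 − µ2 )²/σ22 − 2ρ12 (x1 − µ1 )(x2 − µ2 )/√(σ11 σ22 ) ] }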
1.2 Properties
• If Σ is positive definite, so that Σ−1 exists, then
Σe = λe implies Σ−1 e = (1/λ)e
so (λ, e) is an eigenvalue-eigenvector pair for Σ corresponding to the pair (1/λ, e)
for Σ−1 . Also, Σ−1 is positive definite.
• Contours of constant density for the p-dimensional normal distribution are ellip-
soids defined by x such that
(x − µ)′ Σ−1 (x − µ) = c2
The ellipsoids are centered at µ and have axes ±c√λi ei , where Σei = λi ei for
i = 1, 2, · · · , p.
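Example: the axis directions and half-lengths follow directly from the eigendecomposition of Σ. A minimal numpy sketch, with an assumed 2 × 2 covariance matrix:

```python
import numpy as np

# illustrative covariance matrix (assumed values, not from the text)
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
lam, E = np.linalg.eigh(Sigma)   # eigenvalues lam[i], eigenvectors E[:, i]
c = 1.0
# half-axes of the contour (x - mu)' Sigma^{-1} (x - mu) = c^2:
# column i of `axes` is c * sqrt(lam_i) * e_i
axes = c * np.sqrt(lam) * E
```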
• If X is distributed as Np (µ, Σ), then any linear combination of variables a′ X =
a1 X1 + a2 X2 + · · · + ap Xp is distributed as N (a′ µ, a′ Σa). Also if for every
a, a′ X is distributed as N (a′ µ, a′ Σa), then X must be Np (µ, Σ).
• If X is distributed as Np (µ, Σ), the q linear combinations
AX = [ a11 X1 + · · · + a1p Xp ]
     [ a21 X1 + · · · + a2p Xp ]
     [           ...           ]
     [ aq1 X1 + · · · + aqp Xp ]
are distributed as Nq (Aµ, AΣA′ ). Also, X + d, where d is a vector of con-
stants, is distributed as Np (µ + d, Σ).
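Example: the parameters a′ µ, a′ Σa and Aµ, AΣA′ can be computed directly. A small numpy sketch, with assumed values for a, A, µ, and Σ:

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])            # assumed mean vector
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])       # assumed covariance matrix
a = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 0.0, -1.0],
              [0.0, 2.0, 1.0]])

mean_aX, var_aX = a @ mu, a @ Sigma @ a   # a'X ~ N(a'mu, a'Sigma a)
mean_AX, cov_AX = A @ mu, A @ Sigma @ A.T # AX ~ N_q(A mu, A Sigma A')
```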
• All subsets of X are normally distributed. If we respectively partition X, its mean vector µ, and its covariance matrix Σ as

      X = [ X(1) ] ,   µ = [ µ(1) ] ,   Σ = [ Σ11  Σ12 ] ,
          [ X(2) ]         [ µ(2) ]         [ Σ21  Σ22 ]

then X(1) is distributed as Nq1 (µ(1) , Σ11 ).
• If X(1) and X(2) are independent, then Cov(X(1) , X(2) ) = 0.
• If

      [ X(1) ]             ( [ µ(1) ]   [ Σ11  Σ12 ] )
      [ X(2) ]  ∼  Nq1+q2 ( [ µ(2) ] , [ Σ21  Σ22 ] ) ,

then X(1) and X(2) are independent if and only if Σ12 = 0.
• If X(1) and X(2) are independent and are distributed as Nq1 (µ(1) , Σ11 ) and Nq2 (µ(2) , Σ22 ) respectively, then

      [ X(1) ]             ( [ µ(1) ]   [ Σ11   0  ] )
      [ X(2) ]  ∼  Nq1+q2 ( [ µ(2) ] , [  0   Σ22 ] ) .

• Let

      X = [ X(1) ]             ( [ µ(1) ]   [ Σ11  Σ12 ] )
          [ X(2) ]  ∼  Nq1+q2 ( [ µ(2) ] , [ Σ21  Σ22 ] ) .

Then the conditional distribution of X(1) , given that X(2) = x2 , is Nq1 (µ(1) + Σ12 Σ22^{-1} (x2 − µ(2) ), Σ11 − Σ12 Σ22^{-1} Σ21 ).
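These conditional formulas translate directly to numpy. A minimal sketch (the function name and the explicit block arguments are illustrative):

```python
import numpy as np

def conditional_mvn(mu1, mu2, S11, S12, S21, S22, x2):
    """Mean and covariance of X(1) | X(2) = x2 for a partitioned MVN."""
    S22_inv = np.linalg.inv(S22)
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)   # mu(1) + S12 S22^{-1}(x2 - mu(2))
    cond_cov = S11 - S12 @ S22_inv @ S21           # S11 - S12 S22^{-1} S21
    return cond_mean, cond_cov
```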
• Let X be distributed as Np (µ, Σ) with |Σ| > 0. Then (X − µ)′ Σ−1 (X − µ) ∼ χ2p .
• The Np (µ, Σ) distribution assigns probability 1 − α to the solid ellipsoid {x : (x − µ)′ Σ−1 (x − µ) ≤ χ2p (α)}, where χ2p (α) denotes the upper (100α)th percentile of the χ2p distribution.
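The cutoff χ2p (α) is an ordinary chi-square percentile; for example, with scipy (a small sketch, with assumed values of p and α):

```python
from scipy.stats import chi2

p, alpha = 2, 0.05
c2 = chi2.ppf(1 - alpha, df=p)   # upper (100*alpha)th percentile chi2_p(alpha)
# the ellipsoid (x - mu)' Sigma^{-1} (x - mu) <= c2 has probability 1 - alpha
```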
– Let X1 , X2 , · · · , Xn be mutually independent with Xj distributed as Np (µj , Σ). Then

V1 = c1 X1 + c2 X2 + · · · + cn Xn

is distributed as Np (∑_{j=1}^n cj µj , (∑_{j=1}^n cj²)Σ). Moreover, V1 and V2 = b1 X1 + b2 X2 + · · · + bn Xn are jointly multivariate normal with covariance matrix

      [ (∑_{j=1}^n cj²)Σ        (b′ c)Σ      ]
      [ (b′ c)Σ            (∑_{j=1}^n bj²)Σ ]

Consequently, V1 and V2 are independent if b′ c = 0.
2 Sampling from a Multivariate Normal Distribution and MLE
Let X1 , X2 , · · · , Xn be a random sample from a multivariate normal population
with mean µ and covariance Σ. Then
µ̂ = X̄   and   Σ̂ = (1/n) ∑_{j=1}^n (Xj − X̄)(Xj − X̄)′ = ((n − 1)/n) S

are the maximum likelihood estimators of µ and Σ, respectively. Their observed values, x̄ and (1/n) ∑_{j=1}^n (xj − x̄)(xj − x̄)′ , are called the maximum likelihood estimates of µ and Σ.
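In numpy these estimates are one line each. A sketch, with X a placeholder (n, p) data matrix:

```python
import numpy as np

X = np.random.randn(100, 3)      # placeholder data matrix (n x p)
n = X.shape[0]
mu_hat = X.mean(axis=0)          # MLE of mu: x-bar
S = np.cov(X, rowvar=False)      # sample covariance S (divisor n - 1)
Sigma_hat = (n - 1) / n * S      # MLE of Sigma: ((n - 1)/n) S
```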
2.1 Sufficient Statistics
Let X1 , X2 , · · · , Xn be a random sample from a multivariate normal population
with mean µ and covariance Σ. Then
X̄ and S are sufficient statistics.
2.2 Sampling Distribution
– Let X1 , X2 , · · · , Xn be a random sample from a multivariate normal pop-
ulation with mean µ and covariance Σ. Then
1. X̄ is distributed as Np (µ, (1/n)Σ)
2. (n − 1)S is distributed as a Wishart random matrix with n − 1 d.f.
3. X̄ and S are independent.
– Let X1 , X2 , · · · , Xn be a random sample from a population with mean µ
and finite (nonsingular) covariance Σ. Then for n − p large, we have
1. X̄ converges in probability to µ, and S converges in probability to Σ
2. √n (X̄ − µ) is approximately distributed as Np (0, Σ); equivalently, X̄ is approximately Np (µ, (1/n)Σ)
3. n (X̄ − µ)′ S−1 (X̄ − µ) is approximately distributed as χ2p
2.3 Wishart Distribution
Let X1 , X2 , · · · , Xn be a random sample from a multivariate normal population with mean µ and covariance Σ. Then W = ∑_{i=1}^n Xi Xi′ has a Wishart distribution with n d.f., denoted Wn (∆, Σ), where ∆ = µµ′ . When µ = 0, it is called the central Wishart distribution, denoted Wn (0, Σ).
– If Wj ∼ Wnj (∆j , Σ), j = 1, 2, · · · , m, are mutually independent, then ∑_{j=1}^m Wj ∼ Wn (∆, Σ), where n = ∑_{j=1}^m nj and ∆ = ∑_{j=1}^m ∆j .
– If W ∼ Wn (∆, Σ) and C is an m × p matrix, then CW C ′ ∼ Wn (C∆C ′ , CΣC ′ ).
3 Assessing the Assumption of Normality
– Evaluating the normality of the univariate marginal distribution
1. histogram
2. Q-Q plot
3. Shapiro-Wilk test
4. Kolmogorov-Smirnov test
– Evaluating bivariate normality
1. Chi-square plot
2. Multiple testing
3. Energy test
Example: Constructing a Q-Q plot A sample of n = 10 observations gives the
values in the following table:
Ordered observations Probability levels Standard normal quantiles
x(j) (j − 1/2)/n q(j)
-1.00 0.05 -1.645
-0.10 0.15 -1.036
0.16 0.25 -0.674
0.41 0.35 -0.385
0.62 0.45 -0.125
0.80 0.55 0.125
1.26 0.65 0.385
1.54 0.75 0.674
1.71 0.85 1.036
2.30 0.95 1.645
The steps leading to a Q-Q plot are as follows:
– Order the original observations to get x(1) , x(2) , . . . , x(n) and their corresponding probability values (1 − 1/2)/n, (2 − 1/2)/n, . . . , (n − 1/2)/n;
– Calculate the standard normal quantiles
q(1) , q(2) , . . . , q(n) ;
– Plot the pairs of observations
(q(1) , x(1) ), (q(2) , x(2) ), . . . , (q(n) , x(n) ),
and examine the "straightness" of the outcome.
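A minimal sketch of these steps in Python, using the n = 10 values from the table above (numpy, scipy, and matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# the n = 10 sample from the table above
x = np.array([-1.00, -0.10, 0.16, 0.41, 0.62, 0.80,
              1.26, 1.54, 1.71, 2.30])
n = len(x)
x_ord = np.sort(x)                        # ordered observations x_(j)
probs = (np.arange(1, n + 1) - 0.5) / n   # probability levels (j - 1/2)/n
q = norm.ppf(probs)                       # standard normal quantiles q_(j)

plt.scatter(q, x_ord)                     # plot the pairs (q_(j), x_(j))
plt.xlabel("q_(j)"); plt.ylabel("x_(j)")  # and examine the straightness
plt.show()
```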
3.1 Evaluating Multivariate Normality
– Recall: Let X be distributed as Np (µ, Σ) with |Σ| > 0. Then (X − µ)′ Σ−1 (X − µ) ∼ χ2p . So, approximately,

(X − X̄)′ S−1 (X − X̄) ∼ χ2p
– Chi-square plot: compute d²j = (xj − x̄)′ S−1 (xj − x̄), j = 1, 2, . . . , n.
1. Order the squared distances from the smallest to largest as d²(1) ≤ d²(2) ≤ · · · ≤ d²(n) .
2. Graph the pairs (qc,p ((j − 1/2)/n), d²(j) ), where qc,p ((j − 1/2)/n) is the 100(j − 1/2)/n quantile of the chi-square distribution with p degrees of freedom.
– The plot should resemble a straight line through the origin having slope 1. A sys-
tematic curved pattern suggests lack of normality.
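A sketch of how the plotting coordinates might be computed (the function name is illustrative):

```python
import numpy as np
from scipy.stats import chi2

def chisquare_plot_coords(X):
    """Coordinates for a chi-square plot; X is an (n, p) data matrix."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    D = X - xbar
    d2 = np.einsum('ij,jk,ik->i', D, S_inv, D)           # d_j^2, j = 1..n
    q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)  # q_{c,p}((j - 1/2)/n)
    return q, np.sort(d2)                                # plot pairs (q, d2 ordered)
```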
3.2 Detecting Outliers and Cleaning Data
– Make a dot plot for each variable
– Make a scatter plot for each pair of variables
– Calculate the standardized values zjk = (xjk − x̄k )/√skk for j = 1, 2, · · · , n and each column k = 1, 2, · · · , p. Examine these standardized values for large or small values.
– Calculate the generalized squared distances d²j = (xj − x̄)′ S−1 (xj − x̄). Examine these distances for unusually large values. In a chi-square plot, these would be the points farthest from the origin.
3.3 Transformation to Near Normality
– Transforming univariate observations:
Original Scale        Transformed Scale
Counts y              √y
Proportions p         logit(p) = (1/2) log(p/(1 − p))
Correlations r        Fisher's z(r) = (1/2) log((1 + r)/(1 − r))
– Box-Cox transformation:
x^(λ) = { (x^λ − 1)/λ ,   λ ≠ 0
        { ln x ,          λ = 0
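A direct implementation of this transform (a minimal sketch; box_cox is an illustrative helper name):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1)/lam if lam != 0, else ln(x); needs x > 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam
```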
– Transforming multivariate observations: let λ1 , λ2 , · · · , λp be the power
transformations for the p measured characteristics. Each λk can be selected
by maximizing
lk (λk ) = −(n/2) ln[ (1/n) ∑_{j=1}^n (xjk^(λk) − x̄k^(λk))² ] + (λk − 1) ∑_{j=1}^n ln xjk
where x1k , x2k , · · · , xnk are the n observations on the kth variable, k =
1, 2, · · · , p. Here
x̄k^(λk) = (1/n) ∑_{j=1}^n xjk^(λk) = (1/n) ∑_{j=1}^n (xjk^λk − 1)/λk
is the arithmetic average of the transformed observations.
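Each λ̂k can be found numerically. scipy's stats.boxcox (with lmbda unspecified) maximizes this same profile log-likelihood, so a per-column sketch is:

```python
import numpy as np
from scipy import stats

def select_lambdas(X):
    """MLE power lambda_k for each column of a positive (n, p) data matrix."""
    return np.array([stats.boxcox(X[:, k])[1] for k in range(X.shape[1])])
```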
– We start with the values λ̂1 , λ̂2 , . . . , λ̂p obtained from the preceding trans-
formations and iterate toward the set of values λ′ = [λ1 , λ2 , . . . , λp ], which
collectively maximizes
ℓ(λ1 , λ2 , . . . , λp ) = −(n/2) ln|S(λ)| + (λ1 − 1) ∑_{j=1}^n ln xj1 + (λ2 − 1) ∑_{j=1}^n ln xj2 + · · · + (λp − 1) ∑_{j=1}^n ln xjp
where S(λ) is the sample covariance matrix obtained from
           [ (xj1^λ1 − 1)/λ1 ]
           [ (xj2^λ2 − 1)/λ2 ]
xj^(λ) =   [       ...       ] ,   j = 1, 2, . . . , n
           [ (xjp^λp − 1)/λp ]
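A rough numerical sketch of this joint maximization (assumes all λk ≠ 0 inside the optimizer and uses simulated placeholder data; neg_ell is an illustrative name):

```python
import numpy as np
from scipy import stats, optimize

def neg_ell(lams, X):
    """Negative of l(lambda_1, ..., lambda_p); X is (n, p) with all entries > 0."""
    n, p = X.shape
    Xl = (X ** lams - 1.0) / lams               # columnwise Box-Cox (lams != 0)
    _, logdet = np.linalg.slogdet(np.cov(Xl, rowvar=False))  # ln|S(lambda)|
    return (n / 2.0) * logdet - np.sum((lams - 1.0) * np.log(X).sum(axis=0))

# start from the univariate lambda-hats and refine jointly
X = np.random.lognormal(size=(50, 3))           # placeholder positive data
lam0 = np.array([stats.boxcox(X[:, k])[1] for k in range(X.shape[1])])
lam_hat = optimize.minimize(neg_ell, lam0, args=(X,), method="Nelder-Mead").x
```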