
1 The Multivariate Normal Density and Its Properties

1.1 The Multivariate Normal Density


A p-dimensional random vector X has a multivariate normal distribution if the joint pdf of X is

f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2)(x − µ)′ Σ^{−1} (x − µ)}

We shall denote this p-dimensional density by Np (µ, Σ).
Example: Bivariate normal density
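
As a numerical illustration, a minimal sketch in Python (NumPy/SciPy); the mean vector, covariance matrix, and evaluation point are assumed values chosen for the example. It evaluates the bivariate (p = 2) density directly from the formula above and checks it against scipy.stats.multivariate_normal:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Assumed illustrative parameters for a bivariate (p = 2) normal
    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    x = np.array([1.0, 0.0])

    # Direct evaluation of f(x) = (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2)(x-µ)'Σ^{-1}(x-µ)}
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-µ)'Σ^{-1}(x-µ)
    f = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

    # The two values agree
    print(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))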

1.2 Properties
• If Σ is positive definite, so that Σ^{−1} exists, then

  Σe = λe implies Σ^{−1} e = (1/λ) e,

  so (λ, e) is an eigenvalue-eigenvector pair for Σ corresponding to the pair (1/λ, e) for Σ^{−1}. Also, Σ^{−1} is positive definite.

• Contours of constant density for the p-dimensional normal distribution are ellipsoids defined by x such that

  (x − µ)′ Σ^{−1} (x − µ) = c²

  The ellipsoids are centered at µ and have axes ±c√λ_i e_i, where Σe_i = λ_i e_i for i = 1, 2, · · · , p. (See the sketch after this list.)
• If X is distributed as Np (µ, Σ), then any linear combination of variables a′X = a_1 X_1 + a_2 X_2 + · · · + a_p X_p is distributed as N(a′µ, a′Σa). Conversely, if a′X is distributed as N(a′µ, a′Σa) for every a, then X must be Np (µ, Σ).
• If X is distributed as Np (µ, Σ), the q linear combinations

  AX = (a_11 X_1 + · · · + a_1p X_p, a_21 X_1 + · · · + a_2p X_p, . . . , a_q1 X_1 + · · · + a_qp X_p)′

  are distributed as Nq (Aµ, AΣA′). Also, X + d, where d is a vector of constants, is distributed as Np (µ + d, Σ).
• All subsets of X are normally distributed. If we partition X, its mean vector µ, and its covariance matrix Σ as

  X = [X^(1); X^(2)],  µ = [µ^(1); µ^(2)],  Σ = [Σ_11 Σ_12; Σ_21 Σ_22],

  then X^(1) is distributed as Nq1 (µ^(1), Σ_11).

• If X(1) and X(2) are independent, then Cov(X(1) , X(2) ) = 0.
• If [X^(1); X^(2)] ∼ Nq1+q2 ([µ^(1); µ^(2)], [Σ_11 Σ_12; Σ_21 Σ_22]), then X^(1) and X^(2) are independent if and only if Σ_12 = 0.
• If X^(1) and X^(2) are independent and are distributed as Nq1 (µ^(1), Σ_11) and Nq2 (µ^(2), Σ_22) respectively, then [X^(1); X^(2)] ∼ Nq1+q2 ([µ^(1); µ^(2)], [Σ_11 0; 0 Σ_22]).
• Let X = [X^(1); X^(2)] ∼ Nq1+q2 ([µ^(1); µ^(2)], [Σ_11 Σ_12; Σ_21 Σ_22]). Then the conditional distribution of X^(1), given that X^(2) = x_2, is Nq1 (µ^(1) + Σ_12 Σ_22^{−1} (x_2 − µ^(2)), Σ_11 − Σ_12 Σ_22^{−1} Σ_21). (See the sketch after this list.)

• Let X be distributed as Np (µ, Σ) with |Σ| > 0. Then (X − µ)′ Σ^{−1} (X − µ) ∼ χ²_p.

• The Np (µ, Σ) distribution assigns probability 1 − α to the solid ellipsoid {x : (x − µ)′ Σ^{−1} (x − µ) ≤ χ²_p(α)}, where χ²_p(α) denotes the upper (100α)th percentile of the χ²_p distribution.
• Let X_1, X_2, · · · , X_n be mutually independent with X_j distributed as Np (µ_j, Σ). Then

  V_1 = c_1 X_1 + c_2 X_2 + · · · + c_n X_n

  is distributed as Np (∑_{j=1}^n c_j µ_j, (∑_{j=1}^n c_j²) Σ). Moreover, V_1 and V_2 = b_1 X_1 + b_2 X_2 + · · · + b_n X_n are jointly multivariate normal with covariance matrix

  [ (∑_{j=1}^n c_j²) Σ    (b′c) Σ
    (b′c) Σ    (∑_{j=1}^n b_j²) Σ ]

  Consequently, V_1 and V_2 are independent if b′c = 0.
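
The following sketch (Python with NumPy/SciPy; µ, Σ, A, c, x_2, and α are assumed illustrative values) exercises several of the properties above: the ellipsoid axes ±c√λ_i e_i, the distribution of AX, the conditional distribution of X^(1) given X^(2) = x_2, and the 1 − α probability ellipsoid:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    mu = np.array([1.0, 0.0, -1.0])
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])

    # Contour axes: half-axes of {x : (x-µ)'Σ^{-1}(x-µ) = c²} are ±c√λ_i e_i
    c = 1.0
    lam, E = np.linalg.eigh(Sigma)      # eigenvalues (ascending) and orthonormal eigenvectors
    half_axes = c * np.sqrt(lam) * E    # column i is c√λ_i e_i

    # Linear combinations: AX ~ N_q(Aµ, AΣA')
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, -1.0]])
    X = rng.multivariate_normal(mu, Sigma, size=100_000)
    Y = X @ A.T
    print(Y.mean(axis=0), A @ mu)                                   # sample mean vs Aµ
    print(np.abs(np.cov(Y, rowvar=False) - A @ Sigma @ A.T).max())  # near zero

    # Conditional distribution of X1 given (X2, X3) = x2
    S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
    S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
    x2 = np.array([0.5, -0.5])
    cond_mean = mu[:1] + S12 @ np.linalg.solve(S22, x2 - mu[1:])
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    print(cond_mean, cond_cov)

    # Probability ellipsoid: P[(X-µ)'Σ^{-1}(X-µ) ≤ χ²_p(α)] = 1 - α
    alpha = 0.05
    diff = X - mu
    d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
    print(np.mean(d2 <= chi2.ppf(1 - alpha, df=3)))   # ≈ 0.95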

2 Sampling from a Multivariate Normal Distribution and MLE
Let X_1, X_2, · · · , X_n be a random sample from a multivariate normal population with mean µ and covariance Σ. Then

µ̂ = X̄  and  Σ̂ = (1/n) ∑_{j=1}^n (X_j − X̄)(X_j − X̄)′ = ((n − 1)/n) S

are the maximum likelihood estimators of µ and Σ, respectively. Their observed values, x̄ and (1/n) ∑_{j=1}^n (x_j − x̄)(x_j − x̄)′, are called the maximum likelihood estimates of µ and Σ.
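
A minimal sketch (Python/NumPy, with simulated data standing in for a real sample) computing the MLEs and confirming the relation Σ̂ = ((n − 1)/n) S:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)   # rows are x_j'; n = 50, p = 2

    n = X.shape[0]
    mu_hat = X.mean(axis=0)                        # MLE of µ: the sample mean x̄
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n          # MLE of Σ: (1/n) ∑ (x_j - x̄)(x_j - x̄)'
    S = centered.T @ centered / (n - 1)            # sample covariance matrix S
    print(np.allclose(Sigma_hat, (n - 1) / n * S))   # True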

2.1 Sufficient Statistics
Let X1 , X2 , · · · , Xn be a random sample from a multivariate normal population
with mean µ and covariance Σ. Then

X̄ and S are sufficient statistics.

2.2 Sampling Distribution

– Let X_1, X_2, · · · , X_n be a random sample from a multivariate normal population with mean µ and covariance Σ. Then
  1. X̄ is distributed as Np (µ, (1/n)Σ);
  2. (n − 1)S is distributed as a Wishart random matrix with n − 1 d.f.;
  3. X̄ and S are independent.
– Let X_1, X_2, · · · , X_n be a random sample from a population with mean µ and finite (nonsingular) covariance Σ. Then for n − p large, we have
  1. X̄ → µ and S → Σ in probability;
  2. √n (X̄ − µ) is approximately distributed as Np (0, Σ);
  3. n(X̄ − µ)′ S^{−1} (X̄ − µ) is approximately distributed as χ²_p (a small simulation checking this follows).
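
A small simulation sketch (Python with NumPy/SciPy; the population parameters, sample size, and replication count are assumed for illustration) checking the third large-sample result:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    n, reps, p = 200, 5_000, 2

    stats = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(mu, Sigma, size=n)
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        stats[r] = n * (xbar - mu) @ np.linalg.solve(S, xbar - mu)

    # The empirical upper 5% point should be close to χ²_2(0.05) ≈ 5.99
    print(np.quantile(stats, 0.95), chi2.ppf(0.95, df=p))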

2.3 Wishart Distribution

Let X_1, X_2, · · · , X_n be a random sample from a multivariate normal population with mean µ and covariance Σ. Then W = ∑_{i=1}^n X_i X_i′ has a Wishart distribution with n degrees of freedom, denoted Wn (∆, Σ), where ∆ = µµ′. When µ = 0, it is called the central Wishart distribution, denoted Wn (0, Σ).

– If W_j ∼ Wnj (∆_j, Σ), j = 1, 2, · · · , m, are mutually independent, then ∑_{j=1}^m W_j ∼ Wn (∆, Σ), where n = ∑_{j=1}^m n_j and ∆ = ∑_{j=1}^m ∆_j.
– If W ∼ Wn (∆, Σ) and C is an m × p matrix, then CWC′ ∼ Wn (C∆C′, CΣC′).
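
A sketch (Python with NumPy/SciPy; Σ and n are assumed values) building a central Wishart draw directly from the definition W = ∑ X_i X_i′, and comparing a Monte Carlo mean with E[W] = nΣ via scipy.stats.wishart, which covers the central case:

    import numpy as np
    from scipy.stats import wishart

    rng = np.random.default_rng(3)
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    n = 10   # degrees of freedom

    # One W_n(0, Σ) draw from the definition: W = ∑ X_i X_i' with X_i ~ N_p(0, Σ)
    X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    W = X.T @ X

    # Monte Carlo mean of scipy's central Wishart approaches E[W] = nΣ
    draws = wishart.rvs(df=n, scale=Sigma, size=5_000, random_state=rng)
    print(W)
    print(draws.mean(axis=0))   # ≈ n * Sigma
    print(n * Sigma)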

3 Assessing the Assumption of Normality


– Evaluating the normality of the univariate marginal distributions
  1. histogram
  2. Q-Q plot
  3. Shapiro-Wilk test
  4. Kolmogorov-Smirnov test
– Evaluating bivariate normality
  1. chi-square plot
  2. multiple testing
  3. energy test
Example: Constructing a Q-Q plot. A sample of n = 10 observations gives the values in the following table:

Ordered observations x_(j), probability levels (j − 1/2)/n, and standard normal quantiles q_(j):

x_(j)    (j − 1/2)/n    q_(j)
-1.00    0.05           -1.645
-0.10    0.15           -1.036
0.16     0.25           -0.674
0.41     0.35           -0.385
0.62     0.45           -0.125
0.80     0.55           0.125
1.26     0.65           0.385
1.54     0.75           0.674
1.71     0.85           1.036
2.30     0.95           1.645

The steps leading to a Q-Q plot are as follows:

– Order the original observations to get x_(1), x_(2), . . . , x_(n) and their corresponding probability values

  (1 − 1/2)/n, (2 − 1/2)/n, . . . , (n − 1/2)/n;

– Calculate the standard normal quantiles

  q_(1), q_(2), . . . , q_(n);

– Plot the pairs of observations

  (q_(1), x_(1)), (q_(2), x_(2)), . . . , (q_(n), x_(n)),

  and examine the "straightness" of the outcome. (A sketch carrying out these steps on the table above follows.)
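
A sketch in Python (NumPy/SciPy) carrying out these steps on the n = 10 observations from the table above:

    import numpy as np
    from scipy.stats import norm

    # Ordered observations x_(j) from the example table
    x = np.array([-1.00, -0.10, 0.16, 0.41, 0.62,
                  0.80, 1.26, 1.54, 1.71, 2.30])
    n = len(x)

    probs = (np.arange(1, n + 1) - 0.5) / n   # probability levels (j - 1/2)/n
    q = norm.ppf(probs)                       # standard normal quantiles q_(j)

    # Pairs (q_(j), x_(j)) to plot; a roughly straight pattern supports normality
    for qj, xj in zip(q, x):
        print(f"{qj:7.3f}  {xj:6.2f}")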

3.1 Evaluating Multivariate Normality


– Recall: if X is distributed as Np (µ, Σ) with |Σ| > 0, then (X − µ)′ Σ^{−1} (X − µ) ∼ χ²_p. So, approximately,

  (X − X̄)′ S^{−1} (X − X̄) ∼ χ²_p

– chi-square plot: compute d²_j = (x_j − x̄)′ S^{−1} (x_j − x̄), j = 1, 2, . . . , n.
  1. Order the squared distances from smallest to largest as d²_(1) ≤ d²_(2) ≤ · · · ≤ d²_(n).
  2. Graph the pairs (q_{c,p}((j − 1/2)/n), d²_(j)), where q_{c,p}((j − 1/2)/n) is the 100(j − 1/2)/n quantile of the chi-square distribution with p degrees of freedom.
– The plot should resemble a straight line through the origin having slope 1. A sys-
tematic curved pattern suggests lack of normality.
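
A sketch (Python with NumPy/SciPy; the data matrix is simulated as a stand-in for real observations) computing the ordered distances d²_(j) and the matching chi-square quantiles:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(4)
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=50)  # stand-in data
    n, p = X.shape

    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - xbar
    d2 = np.einsum('ij,ij->i', diff @ S_inv, diff)   # d²_j = (x_j - x̄)'S^{-1}(x_j - x̄)

    d2_sorted = np.sort(d2)                              # d²_(1) ≤ ... ≤ d²_(n)
    q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)  # q_{c,p}((j - 1/2)/n)
    # Plot the pairs (q_j, d²_(j)); they should hug a line through the origin with slope 1
    print(np.c_[q, d2_sorted][:5])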

3.2 Detecting Outliers and Cleaning Data
– Make a dot plot for each variable
– Make a scatter plot for each pair of variables

– Calculate the standardized values z_jk = (x_jk − x̄_k)/√s_kk for j = 1, 2, · · · , n for each column k = 1, 2, · · · , p. Examine these standardized values for large or small values.
– Calculate the generalized squared distances d²_j = (x_j − x̄)′ S^{−1} (x_j − x̄). Examine these distances for unusually large values; in a chi-square plot, these would be the points farthest from the origin. (A sketch covering the standardized-value check follows.)
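
A sketch of the standardized-value check (Python/NumPy; the data, the injected outlier, and the cutoff of 3 are assumptions for illustration, since the text only says to look for large values):

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=30)   # stand-in data matrix
    X[0] = [5.0, -5.0]                                            # injected outlier

    # Standardized values z_jk = (x_jk - x̄_k) / √s_kk, column by column
    z = (X - X.mean(axis=0)) / np.sqrt(np.diag(np.cov(X, rowvar=False)))

    # Flag rows with any |z_jk| beyond an (assumed) cutoff of 3
    print(np.where(np.any(np.abs(z) > 3, axis=1))[0])   # row indices to inspect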

3.3 Transformation to Near Normality


– Transforming univariate observations:

  Original Scale        Transformed Scale
  Counts, y             √y
  Proportions, p        logit(p) = (1/2) log(p/(1 − p))
  Correlations, r       Fisher's z(r) = (1/2) log((1 + r)/(1 − r))

– Box-Cox transformation (see the sketch at the end of this section):

  x^(λ) = (x^λ − 1)/λ   if λ ≠ 0
  x^(λ) = ln x          if λ = 0

– Transforming multivariate observations: let λ_1, λ_2, · · · , λ_p be the power transformations for the p measured characteristics. Each λ_k can be selected by maximizing

  ℓ_k(λ_k) = −(n/2) ln[ (1/n) ∑_{j=1}^n (x_jk^(λ_k) − x̄_k^(λ_k))² ] + (λ_k − 1) ∑_{j=1}^n ln x_jk

  where x_1k, x_2k, · · · , x_nk are the n observations on the kth variable, k = 1, 2, · · · , p. Here

  x̄_k^(λ_k) = (1/n) ∑_{j=1}^n x_jk^(λ_k) = (1/n) ∑_{j=1}^n (x_jk^{λ_k} − 1)/λ_k

  is the arithmetic average of the transformed observations.
– We start with the values λ̂_1, λ̂_2, . . . , λ̂_p obtained from the preceding transformations and iterate toward the set of values λ′ = [λ_1, λ_2, . . . , λ_p] that collectively maximizes

  ℓ(λ_1, λ_2, . . . , λ_p) = −(n/2) ln|S(λ)| + (λ_1 − 1) ∑_{j=1}^n ln x_j1 + (λ_2 − 1) ∑_{j=1}^n ln x_j2 + · · · + (λ_p − 1) ∑_{j=1}^n ln x_jp

where S(λ) is the sample covariance matrix computed from the transformed observations

  x_j^(λ) = [ (x_j1^{λ_1} − 1)/λ_1 , (x_j2^{λ_2} − 1)/λ_2 , . . . , (x_jp^{λ_p} − 1)/λ_p ]′,   j = 1, 2, . . . , n
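
A sketch (Python/NumPy; the lognormal test data and the search grid are assumptions) selecting a single λ_k by maximizing ℓ_k(λ_k) over a grid, per the formula above:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.lognormal(mean=0.0, sigma=0.7, size=100)   # positive data; ln x is exactly normal

    def ell(lam, x):
        """Profile log-likelihood ℓ_k(λ) for one variable, as defined above."""
        n = len(x)
        xt = np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1) / lam   # x^(λ)
        return -n / 2 * np.log(np.mean((xt - xt.mean()) ** 2)) + (lam - 1) * np.log(x).sum()

    # Grid search for the maximizing power; an optimizer could replace this
    grid = np.linspace(-2.0, 2.0, 401)
    lam_hat = grid[np.argmax([ell(l, x) for l in grid])]
    print(lam_hat)   # lands near 0, matching the log transform for lognormal data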
