
STAT 5205

Matrix Approach to Simple Linear Regression
Geometry of Vectors

• A vector of order n is a point in n-dimensional space.
• The line running through the origin and the point represented by the vector defines a 1-dimensional subspace of the n-dim space.
• Any p linearly independent vectors of order n, p < n, define a p-dimensional subspace of the n-dim space.
• Two vectors x and y are orthogonal if x′y = y′x = 0; they form a 90° angle at the origin.
• Two vectors x and y are linearly dependent if they form a 0° or 180° angle at the origin.
Geometry of Vectors

• The length $L_{\boldsymbol{x}}$ of a vector $\boldsymbol{x} = (x_1, \ldots, x_n)'$ (a.k.a. the norm) is defined as

$$\text{length of } \boldsymbol{x} = \|\boldsymbol{x}\| = L_{\boldsymbol{x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{\boldsymbol{x}'\boldsymbol{x}}$$

• Cauchy-Schwarz inequality:

$$|\boldsymbol{a} \cdot \boldsymbol{b}| \le \|\boldsymbol{a}\|\,\|\boldsymbol{b}\|$$

• How does it relate to statistics? If $\boldsymbol{x} = (x_1, \ldots, x_n)'$ is a random sample with sample mean $\bar{x}$, then the sample standard deviation of $\boldsymbol{x}$ is

$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}} = \text{length of } \frac{\boldsymbol{x} - \bar{x}\boldsymbol{1}}{\sqrt{n-1}}$$
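As a quick sanity check of this identity, here is a minimal NumPy sketch (the vector `x` is arbitrary illustrative data, not from the slides) showing that the length of $(\boldsymbol{x} - \bar{x}\boldsymbol{1})/\sqrt{n-1}$ matches the usual sample standard deviation:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # illustrative data
n = len(x)

length_x = np.sqrt(x @ x)                            # L_x = sqrt(x'x)
s_via_length = np.linalg.norm(x - x.mean()) / np.sqrt(n - 1)
s_builtin = x.std(ddof=1)                            # usual sample standard deviation

print(length_x)                    # sqrt(232) ≈ 15.23
print(s_via_length, s_builtin)     # both ≈ 2.138
```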
Geometry of Vectors

• The angle $\theta$ between two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ is defined to be such that

$$\cos\theta = \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{L_{\boldsymbol{x}} L_{\boldsymbol{y}}} = \frac{\boldsymbol{x}'\boldsymbol{y}}{\sqrt{\boldsymbol{x}'\boldsymbol{x}}\sqrt{\boldsymbol{y}'\boldsymbol{y}}}
\;\Rightarrow\; \theta = \arccos\left(\frac{\boldsymbol{x}'\boldsymbol{y}}{\sqrt{\boldsymbol{x}'\boldsymbol{x}}\sqrt{\boldsymbol{y}'\boldsymbol{y}}}\right)$$

• If $\boldsymbol{a} \cdot \boldsymbol{b} = 0$, then the vectors are called orthogonal.

• In statistics: if two vectors each have mean 0 among their elements, then $\cos(\theta)$ is the correlation between the two vectors.
Example: Geometry of Vectors

Let
$$\boldsymbol{x} = \begin{pmatrix} -3 \\ -1 \\ 5 \\ 1 \end{pmatrix}, \qquad \boldsymbol{y} = \begin{pmatrix} 1 \\ 5 \\ -3 \\ 3 \end{pmatrix}$$
Find $\cos(\theta)$.

$$\cos\theta = \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{L_{\boldsymbol{x}} L_{\boldsymbol{y}}} = \frac{\boldsymbol{x}'\boldsymbol{y}}{\sqrt{\boldsymbol{x}'\boldsymbol{x}}\sqrt{\boldsymbol{y}'\boldsymbol{y}}}, \qquad
L_{\boldsymbol{x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
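A minimal NumPy check of this example, applying the formula above directly:

```python
import numpy as np

x = np.array([-3.0, -1.0, 5.0, 1.0])
y = np.array([1.0, 5.0, -3.0, 3.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))

print(cos_theta)   # x'y = -20, L_x = 6, L_y = sqrt(44), so cos(theta) ≈ -0.503
print(theta_deg)   # ≈ 120.2 degrees
```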


Geometry of Vectors

• The projection of $\boldsymbol{y}$ on $\boldsymbol{x}$ is:

$$\text{projection of } \boldsymbol{y} \text{ on } \boldsymbol{x} = \frac{\boldsymbol{y}'\boldsymbol{x}}{L_{\boldsymbol{x}}^2}\,\boldsymbol{x} = \frac{\boldsymbol{y}'\boldsymbol{x}}{\boldsymbol{x}'\boldsymbol{x}}\,\boldsymbol{x} = \frac{\boldsymbol{y}'\boldsymbol{x}}{L_{\boldsymbol{x}}}\,\frac{1}{L_{\boldsymbol{x}}}\,\boldsymbol{x}$$

• and the length of the projection is:

$$\frac{|\boldsymbol{y}'\boldsymbol{x}|}{L_{\boldsymbol{x}}} = L_{\boldsymbol{y}}\,\frac{|\boldsymbol{y}'\boldsymbol{x}|}{L_{\boldsymbol{x}} L_{\boldsymbol{y}}} = L_{\boldsymbol{y}}\,|\cos\theta|$$

• Applications in statistics: many, for example, regression and principal components.
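A short NumPy sketch of the projection formula, reusing the x and y from the previous example:

```python
import numpy as np

x = np.array([-3.0, -1.0, 5.0, 1.0])
y = np.array([1.0, 5.0, -3.0, 3.0])

proj = (y @ x) / (x @ x) * x                       # projection of y on x
length_of_proj = np.abs(y @ x) / np.linalg.norm(x)

print(proj)                    # (-20/36) * x
print(length_of_proj)          # 20/6 ≈ 3.33
print(np.linalg.norm(proj))    # same value, as expected
```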
Graphical Depiction of Matrix Multiplication

Consider
$$\boldsymbol{M} = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix}, \quad \boldsymbol{a} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \boldsymbol{b} = \begin{pmatrix} 1 \\ 7 \end{pmatrix}, \quad \boldsymbol{c} = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$$

Find the image of the quadrilateral 0abc.

• Matrix multiplication scales/rotates/skews a geometric plane.

(Figure: the unit vectors $(1,0)'$ and $(0,1)'$ are mapped by $\boldsymbol{A}$ to its columns $(a_{11}, a_{21})'$ and $(a_{12}, a_{22})'$, illustrating how $\boldsymbol{A}\boldsymbol{x}$ transforms the plane.)
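A minimal NumPy sketch for the quadrilateral exercise: multiply each vertex by M to obtain its image (the vertices a, b, c are as read off the slide above):

```python
import numpy as np

M = np.array([[1, 2],
              [3, -1]])
vertices = np.array([[0, 0],    # origin
                     [3, 4],    # a
                     [1, 7],    # b
                     [-2, 3]])  # c

image = vertices @ M.T          # image of each vertex under M
print(image)                    # [[0 0] [11 5] [15 -4] [4 -9]]
```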
DETERMINANTS OF SQUARE MATRICES

$$\boldsymbol{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\qquad \det(\boldsymbol{A}) = a_{11}a_{22} - a_{21}a_{12}$$

$|\det(\boldsymbol{A})|$ = area of the image of the unit square under $\boldsymbol{A}$

(Figure: the unit square, spanned by $(1,0)'$ and $(0,1)'$, is mapped to the parallelogram spanned by the columns of $\boldsymbol{A}$, whose area is $|\det(\boldsymbol{A})|$.)

Example:
$$\boldsymbol{M} = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix} \;\Rightarrow\; \det(\boldsymbol{M}) = 1(-1) - 2(3) = -7$$

Time to think: What does the negative sign represent?
DETERMINANTS OF SQUARE MATRICES

In general,
$$\det(\boldsymbol{A}) = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}, \quad \forall i = 1, \ldots, n$$
where the $A_{ij}$ are the so-called cofactors.

Exercise:
$$\boldsymbol{A} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ -3 & 4 & 1 \end{pmatrix}$$
Find $\det(\boldsymbol{A})$.
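For a numerical check of both determinants, a short NumPy sketch (the hand computation via the cofactor expansion is still the point of the exercise):

```python
import numpy as np

M = np.array([[1, 2],
              [3, -1]])
A = np.array([[1, 2, 1],
              [0, 1, 2],
              [-3, 4, 1]])

print(np.linalg.det(M))   # ≈ -7, matching 1(-1) - 2(3)
print(np.linalg.det(A))   # compare with your cofactor expansion
```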
Matrix Inverse

• Note: For scalars (except 0), multiplying a number by its reciprocal gives 1:
$$2(1/2) = 1 \qquad x(1/x) = x\,x^{-1} = 1$$

• In matrix form, if A is a square matrix of full rank (all rows and columns are linearly independent), then A has an inverse A⁻¹ such that A⁻¹A = AA⁻¹ = I.

• Example: Let
$$\boldsymbol{A} = \begin{pmatrix} 2 & 8 \\ 4 & -2 \end{pmatrix} \;\Rightarrow\; \boldsymbol{A}^{-1} = \begin{pmatrix} \tfrac{2}{36} & \tfrac{8}{36} \\ \tfrac{4}{36} & -\tfrac{2}{36} \end{pmatrix}$$

Verify:
$$\boldsymbol{A}^{-1}\boldsymbol{A} = \begin{pmatrix} \tfrac{2}{36} & \tfrac{8}{36} \\ \tfrac{4}{36} & -\tfrac{2}{36} \end{pmatrix}\begin{pmatrix} 2 & 8 \\ 4 & -2 \end{pmatrix}
= \begin{pmatrix} \tfrac{4}{36} + \tfrac{32}{36} & \tfrac{16}{36} - \tfrac{16}{36} \\ \tfrac{8}{36} - \tfrac{8}{36} & \tfrac{32}{36} + \tfrac{4}{36} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \boldsymbol{I}$$
Computing an Inverse of a 2x2 Matrix

For a 2x2 matrix with nonzero determinant,
$$\boldsymbol{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \det(\boldsymbol{A}) = ad - bc \neq 0
\;\Rightarrow\; \boldsymbol{A}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
This is the formula used on the next slide.
Use of Inverse Matrix – Solving Simultaneous Equations

AY = C, where A and C are matrices of constants and Y is a matrix of unknowns
⇒ A⁻¹AY = A⁻¹C ⇒ Y = A⁻¹C (assuming A is square and of full rank)

Equation 1: 12y₁ + 6y₂ = 48    Equation 2: 10y₁ − 2y₂ = 12

$$\boldsymbol{A} = \begin{pmatrix} 12 & 6 \\ 10 & -2 \end{pmatrix} \quad \boldsymbol{Y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \quad \boldsymbol{C} = \begin{pmatrix} 48 \\ 12 \end{pmatrix} \qquad \boldsymbol{Y} = \boldsymbol{A}^{-1}\boldsymbol{C}$$

$$\Rightarrow \boldsymbol{A}^{-1} = \frac{1}{12(-2) - 6(10)}\begin{pmatrix} -2 & -6 \\ -10 & 12 \end{pmatrix} = \frac{1}{84}\begin{pmatrix} 2 & 6 \\ 10 & -12 \end{pmatrix}$$

$$\boldsymbol{Y} = \boldsymbol{A}^{-1}\boldsymbol{C} = \frac{1}{84}\begin{pmatrix} 2 & 6 \\ 10 & -12 \end{pmatrix}\begin{pmatrix} 48 \\ 12 \end{pmatrix} = \frac{1}{84}\begin{pmatrix} 96 + 72 \\ 480 - 144 \end{pmatrix} = \frac{1}{84}\begin{pmatrix} 168 \\ 336 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$$

Note the wisdom of waiting to divide by |A| until the end of the calculation!
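A minimal NumPy counterpart to this slide; in practice `np.linalg.solve` is preferred over forming the inverse explicitly:

```python
import numpy as np

A = np.array([[12, 6],
              [10, -2]], dtype=float)
C = np.array([48, 12], dtype=float)

Y_via_inverse = np.linalg.inv(A) @ C   # Y = A^{-1} C, as on the slide
Y_via_solve = np.linalg.solve(A, C)    # numerically preferable

print(Y_via_inverse)   # [2. 4.]
print(Y_via_solve)     # [2. 4.]
```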


Useful Matrix Results
All rules assume that the matrices are conformable to operations.

• Addition rules:
A+B=B+A
(A + B) + C = A + (B + C)

• Multiplication rules:
(A B)C = A(BC)
C(A + B) = CA + CB
k(A + B) = kA + kB, where k is a scalar

• Transpose rules:
(A′)′ = A
(A + B)′ = A′ + B′
(ABC)′ = C′B′A′

• Inverse rules (assuming square matrices of full rank):
(A⁻¹)⁻¹ = A
(ABC)⁻¹ = C⁻¹B⁻¹A⁻¹
(A′)⁻¹ = (A⁻¹)′
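A quick NumPy verification of the transpose and inverse rules for a product, using arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# (ABC)' = C'B'A'
print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))        # True

# (ABC)^(-1) = C^(-1) B^(-1) A^(-1); random Gaussian matrices are
# invertible with probability 1 (a singular draw would raise LinAlgError)
lhs = np.linalg.inv(A @ B @ C)
rhs = np.linalg.inv(C) @ np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                               # True
```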
Important Matrix Results

In general,
• 𝑨𝑩 ≠ 𝑩𝑨 (no commutative law)
• 𝑨𝑩 = 𝟎 does not imply 𝑨 = 0 or 𝑩 = 0
• 𝑨𝑩 = 𝑨𝑪 does not imply 𝑩 = 𝑪 even if 𝑨 ≠ 𝟎
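A concrete NumPy illustration of the first two bullets (the matrices here are simple examples chosen for illustration):

```python
import numpy as np

# AB = 0 does not imply A = 0 or B = 0:
A = np.array([[1, 0],
              [0, 0]])
B = np.array([[0, 0],
              [0, 1]])
print(A @ B)                          # the 2x2 zero matrix

# AB != BA in general:
C = np.array([[1, 2],
              [3, 4]])
D = np.array([[0, 1],
              [1, 0]])
print(np.array_equal(C @ D, D @ C))   # False
```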
PROPERTIES OF DETERMINANTS

• Another notation for det(A) is |A|


• |A′| = |A|
• |AB| = |A||B| when A and B are square matrices of the same order
• |A⁻¹| = |A|⁻¹
• Partitioned matrices (assuming the inverted block is nonsingular):
$$\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix}
= |A_{11}|\,\left|A_{22} - A_{21}A_{11}^{-1}A_{12}\right|
= |A_{22}|\,\left|A_{11} - A_{12}A_{22}^{-1}A_{21}\right|$$
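A numerical sanity check of the partitioned-determinant identity, using a random positive definite matrix so that the blocks being inverted are nonsingular:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)          # positive definite => nonsingular blocks

A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

lhs = np.linalg.det(A)
rhs1 = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ np.linalg.inv(A11) @ A12)
rhs2 = np.linalg.det(A22) * np.linalg.det(A11 - A12 @ np.linalg.inv(A22) @ A21)

print(np.allclose(lhs, rhs1), np.allclose(lhs, rhs2))   # True True
```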
Trace of a matrix

Definition: If A is a square n x n matrix, then the trace is the operator defined as
$$\operatorname{tr}(\boldsymbol{A}) = \sum_{i=1}^{n} a_{ii}$$

Properties:

• tr(αA) = α tr(A)
• tr(A + B) = tr(A) + tr(B)
• tr(AB) = tr(BA)
• $\operatorname{tr}(\boldsymbol{A}'\boldsymbol{A}) = \operatorname{tr}(\boldsymbol{A}\boldsymbol{A}') = \sum_{i=1}^{n}\sum_{j=1}^{p} a_{ij}^2$, where A is an n x p matrix
• Circular shift: tr(ABC) = tr(BCA) = tr(CAB)
• The trace is a useful tool when we need to find the MLE of the covariance matrix of the multivariate normal distribution.
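A brief NumPy check of the less obvious trace properties, with random matrices whose shapes make all products well defined:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))   # n x p
B = rng.standard_normal((5, 3))
C = rng.standard_normal((3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # True
print(np.isclose(np.trace(A.T @ A), np.sum(A ** 2)))           # True
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))    # True (circular shift)
```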
Random Vectors and Matrices

Shown for the case n = 3; generalizes to any n.

Random variables $Y_1, Y_2, Y_3$:
$$\boldsymbol{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}$$

Expectation:
$$E\{\boldsymbol{Y}\} = \begin{pmatrix} E\{Y_1\} \\ E\{Y_2\} \\ E\{Y_3\} \end{pmatrix}
\qquad \text{In general: } E\{\boldsymbol{Y}\}_{n \times p} = \left[\, E\{Y_{ij}\} \,\right], \; i = 1, \ldots, n; \; j = 1, \ldots, p$$

Variance-Covariance Matrix for a Random Vector:
$$\boldsymbol{\sigma}^2\{\boldsymbol{Y}\} = E\left\{ \left[\boldsymbol{Y} - E\{\boldsymbol{Y}\}\right]\left[\boldsymbol{Y} - E\{\boldsymbol{Y}\}\right]' \right\}
= E\left\{ \begin{pmatrix} Y_1 - E\{Y_1\} \\ Y_2 - E\{Y_2\} \\ Y_3 - E\{Y_3\} \end{pmatrix}
\begin{pmatrix} Y_1 - E\{Y_1\} & Y_2 - E\{Y_2\} & Y_3 - E\{Y_3\} \end{pmatrix} \right\}$$

$$= E\left\{ \begin{pmatrix}
(Y_1 - E\{Y_1\})^2 & (Y_1 - E\{Y_1\})(Y_2 - E\{Y_2\}) & (Y_1 - E\{Y_1\})(Y_3 - E\{Y_3\}) \\
(Y_2 - E\{Y_2\})(Y_1 - E\{Y_1\}) & (Y_2 - E\{Y_2\})^2 & (Y_2 - E\{Y_2\})(Y_3 - E\{Y_3\}) \\
(Y_3 - E\{Y_3\})(Y_1 - E\{Y_1\}) & (Y_3 - E\{Y_3\})(Y_2 - E\{Y_2\}) & (Y_3 - E\{Y_3\})^2
\end{pmatrix} \right\}
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_2^2 & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2
\end{pmatrix}$$
Linear Regression Example (n = 3)

Error terms are assumed to be independent, with mean 0 and constant variance $\sigma^2$:
$$E\{\varepsilon_i\} = 0 \qquad \sigma^2\{\varepsilon_i\} = \sigma^2 \qquad \sigma\{\varepsilon_i, \varepsilon_j\} = 0 \;\; \forall i \neq j$$

$$\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix}
\qquad E\{\boldsymbol{\varepsilon}\} = \boldsymbol{0}
\qquad \boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} = \begin{pmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix} = \sigma^2 \boldsymbol{I}$$

$$\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\qquad E\{\boldsymbol{Y}\} = E\{\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\} = \boldsymbol{X}\boldsymbol{\beta} + E\{\boldsymbol{\varepsilon}\} = \boldsymbol{X}\boldsymbol{\beta}$$

$$\boldsymbol{\sigma}^2\{\boldsymbol{Y}\} = \boldsymbol{\sigma}^2\{\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\} = \boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} = \sigma^2 \boldsymbol{I}$$
Mean and Variance of Linear Functions of Y

Frequently we encounter a random vector W that is obtained by multiplying a random vector Y by a constant matrix A:

W = AY

That is, if A is k x n and Y is n x 1, then W is k x 1 with:

$$\boldsymbol{W} = \begin{pmatrix} W_1 \\ \vdots \\ W_k \end{pmatrix} = \begin{pmatrix} a_{11}Y_1 + \cdots + a_{1n}Y_n \\ \vdots \\ a_{k1}Y_1 + \cdots + a_{kn}Y_n \end{pmatrix}$$

Some basic results:

E{W} = A E{Y}

σ²{W} = σ²{AY} = A σ²{Y} A′

or

cov(W) = A cov(Y) A′
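A minimal NumPy check of these two results by simulation (the A, μ, and Σ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 1.0, 0.0],
              [0.5, -0.5, 2.0]])            # k x n with k = 2, n = 3
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.diag([1.0, 4.0, 9.0])

Y = rng.multivariate_normal(mu, Sigma, size=200_000)
W = Y @ A.T                                 # each row is W = A Y for one draw

print(W.mean(axis=0), A @ mu)               # E{W} ≈ A E{Y}
print(np.cov(W, rowvar=False))              # ≈ A Sigma A'
print(A @ Sigma @ A.T)
```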
Exercise (5.18 on p. 211)

Consider the following functions of the random variables $Y_1, Y_2, Y_3$, and $Y_4$:

$$W_1 = \frac{1}{4}\left(Y_1 + Y_2 + Y_3 + Y_4\right)$$

$$W_2 = \frac{1}{2}\left(Y_1 + Y_2\right) - \frac{1}{2}\left(Y_3 + Y_4\right)$$

a) State the above in matrix notation;
b) Find E{W};
c) Find cov(W).
Multivariate Normal Distribution

The observation vector Y contains an observation from each of the p variables:
$$\boldsymbol{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_p \end{pmatrix}$$

The mean vector E{Y}, denoted by μ, contains the expected value of each of the p variables:
$$E\{\boldsymbol{Y}\} = \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$

Finally, the covariance matrix σ²{Y}, denoted Σ, contains the variances and covariances:
$$\boldsymbol{\Sigma} = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2
\end{pmatrix}$$
Multivariate Normal Distribution

The density function of the multivariate normal distribution can be stated as:
$$f(\boldsymbol{Y}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2}(\boldsymbol{Y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\boldsymbol{Y} - \boldsymbol{\mu}) \right]$$

We abbreviate this as:

Y ~ N(μ, Σ)

It can be shown that marginally each $Y_i$ is normally distributed:
$$Y_i \sim N(\mu_i, \sigma_i^2), \quad i = 1, \ldots, p$$
and σ{Yᵢ, Yⱼ} = σᵢⱼ, i ≠ j.

Theorem: If A is a matrix of fixed constants, then:

W = AY ~ N(Aμ, AΣA′)
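As a sketch of how this density can be evaluated in code, the formula above can be written out directly and compared against `scipy.stats.multivariate_normal` (the μ, Σ, and evaluation point are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
y = np.array([0.3, 0.8])
p = len(mu)

# Density written out exactly as in the formula above
diff = y - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

f_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(y)
print(f_manual, f_scipy)   # the two agree
```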
Simple Linear Regression in Matrix Form
Estimating Parameters by Least Squares

The normal equations are obtained from $\frac{\partial Q}{\partial \beta_0}$ and $\frac{\partial Q}{\partial \beta_1}$, setting each equal to 0:

$$n b_0 + b_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$$

Note: In matrix form:
$$\boldsymbol{X}'\boldsymbol{X} = \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix}
\qquad \boldsymbol{X}'\boldsymbol{Y} = \begin{pmatrix} \sum Y_i \\ \sum X_i Y_i \end{pmatrix}
\qquad \text{Defining } \boldsymbol{b} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$$
$$\Rightarrow \boldsymbol{X}'\boldsymbol{X}\boldsymbol{b} = \boldsymbol{X}'\boldsymbol{Y} \;\Rightarrow\; \boldsymbol{b} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}$$

Exercise: Verify that the formula $\boldsymbol{b} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}$ reduces to the previous version with $SS_{XX}$, $SS_{XY}$, …

Based on the matrix form:
$$Q = (\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta})
= \boldsymbol{Y}'\boldsymbol{Y} - \boldsymbol{Y}'\boldsymbol{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\boldsymbol{X}'\boldsymbol{Y} + \boldsymbol{\beta}'\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta}$$
$$= \boldsymbol{Y}'\boldsymbol{Y} - 2\left( \beta_0 \sum_{i=1}^{n} Y_i + \beta_1 \sum_{i=1}^{n} X_i Y_i \right) + n\beta_0^2 + 2\beta_0\beta_1 \sum_{i=1}^{n} X_i + \beta_1^2 \sum_{i=1}^{n} X_i^2$$

$$\frac{\partial Q}{\partial \boldsymbol{\beta}} = \begin{pmatrix} \dfrac{\partial Q}{\partial \beta_0} \\[2mm] \dfrac{\partial Q}{\partial \beta_1} \end{pmatrix}
= \begin{pmatrix} -2\sum Y_i + 2n\beta_0 + 2\beta_1 \sum X_i \\[1mm] -2\sum X_i Y_i + 2\beta_0 \sum X_i + 2\beta_1 \sum X_i^2 \end{pmatrix}
= -2\boldsymbol{X}'\boldsymbol{Y} + 2\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta}$$

Setting this equal to zero and replacing $\boldsymbol{\beta}$ with $\boldsymbol{b}$: $\boldsymbol{X}'\boldsymbol{X}\boldsymbol{b} = \boldsymbol{X}'\boldsymbol{Y}$.
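A minimal NumPy illustration of the matrix solution $\boldsymbol{b} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}$ for simple linear regression; the data are made up for illustration:

```python
import numpy as np

# Illustrative data
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(X1), X1])   # design matrix with intercept column

XtX = X.T @ X
XtY = X.T @ Y
b = np.linalg.solve(XtX, XtY)                 # solves X'X b = X'Y

print(b)                                      # [b0, b1]
print(np.polyfit(X1, Y, deg=1)[::-1])         # same estimates from a built-in fit
```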
Fitted Values and Residuals

$$\hat{Y}_i = b_0 + b_1 X_i \qquad e_i = Y_i - \hat{Y}_i$$

In matrix form:
$$\hat{\boldsymbol{Y}} = \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix}
= \begin{pmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{pmatrix}
= \boldsymbol{X}\boldsymbol{b} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} = \boldsymbol{H}\boldsymbol{Y},
\qquad \boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'$$

H is called the "hat" or "projection" matrix. Note that H is idempotent (HH = H) and symmetric (H = H′):
$$\boldsymbol{H}\boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}' = \boldsymbol{X}\,\boldsymbol{I}\,(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}' = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}' = \boldsymbol{H}$$
$$\boldsymbol{H}' = \left[\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\right]' = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}' = \boldsymbol{H}$$

$$\boldsymbol{e} = \begin{pmatrix} Y_1 - \hat{Y}_1 \\ Y_2 - \hat{Y}_2 \\ \vdots \\ Y_n - \hat{Y}_n \end{pmatrix}
= \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} - \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix}
= \boldsymbol{Y} - \hat{\boldsymbol{Y}} = \boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b} = \boldsymbol{Y} - \boldsymbol{H}\boldsymbol{Y} = (\boldsymbol{I} - \boldsymbol{H})\boldsymbol{Y}$$

Note:
$$E\{\hat{\boldsymbol{Y}}\} = E\{\boldsymbol{H}\boldsymbol{Y}\} = \boldsymbol{H}E\{\boldsymbol{Y}\} = \boldsymbol{H}\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{X}\boldsymbol{\beta}
\qquad \boldsymbol{\sigma}^2\{\hat{\boldsymbol{Y}}\} = \boldsymbol{H}\,\sigma^2\boldsymbol{I}\,\boldsymbol{H}' = \sigma^2\boldsymbol{H}$$
$$E\{\boldsymbol{e}\} = E\{(\boldsymbol{I} - \boldsymbol{H})\boldsymbol{Y}\} = (\boldsymbol{I} - \boldsymbol{H})E\{\boldsymbol{Y}\} = (\boldsymbol{I} - \boldsymbol{H})\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{X}\boldsymbol{\beta} - \boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{0}
\qquad \boldsymbol{\sigma}^2\{\boldsymbol{e}\} = (\boldsymbol{I} - \boldsymbol{H})\,\sigma^2\boldsymbol{I}\,(\boldsymbol{I} - \boldsymbol{H})' = \sigma^2(\boldsymbol{I} - \boldsymbol{H})$$
$$\boldsymbol{s}^2\{\hat{\boldsymbol{Y}}\} = MSE \cdot \boldsymbol{H} \qquad \boldsymbol{s}^2\{\boldsymbol{e}\} = MSE \cdot (\boldsymbol{I} - \boldsymbol{H})$$
Analysis of Variance

How do we write the sums of squares in matrix form?
$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}$$
$$\sum_{i=1}^{n} Y_i^2 = \boldsymbol{Y}'\boldsymbol{Y}, \qquad \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} = \frac{1}{n}\boldsymbol{Y}'\boldsymbol{J}\boldsymbol{Y}, \qquad \boldsymbol{J} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}$$
$$\Rightarrow SST = \boldsymbol{Y}'\boldsymbol{Y} - \frac{1}{n}\boldsymbol{Y}'\boldsymbol{J}\boldsymbol{Y} = \boldsymbol{Y}'\left(\boldsymbol{I} - \frac{1}{n}\boldsymbol{J}\right)\boldsymbol{Y}$$
$$SSE = \boldsymbol{e}'\boldsymbol{e} = (\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b})'(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{b}) = \boldsymbol{Y}'\boldsymbol{Y} - \boldsymbol{Y}'\boldsymbol{X}\boldsymbol{b} - \boldsymbol{b}'\boldsymbol{X}'\boldsymbol{Y} + \boldsymbol{b}'\boldsymbol{X}'\boldsymbol{X}\boldsymbol{b}
= \boldsymbol{Y}'\boldsymbol{Y} - \boldsymbol{b}'\boldsymbol{X}'\boldsymbol{Y} = \boldsymbol{Y}'(\boldsymbol{I} - \boldsymbol{H})\boldsymbol{Y}$$
since $\boldsymbol{b}'\boldsymbol{X}'\boldsymbol{Y} = \boldsymbol{Y}'\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} = \boldsymbol{Y}'\boldsymbol{H}\boldsymbol{Y}$.

Exercise: Show that
$$SSR = \boldsymbol{Y}'\left(\boldsymbol{H} - \frac{1}{n}\boldsymbol{J}\right)\boldsymbol{Y}$$

Note: All of these sums of squares are of the form $\boldsymbol{Y}'\boldsymbol{A}\boldsymbol{Y}$, where A is a symmetric matrix.
Time to think: What is the name of expressions like $\boldsymbol{Y}'\boldsymbol{A}\boldsymbol{Y}$?
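A compact numerical check of these quadratic-form identities, again on the illustrative data used above:

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(Y)

X = np.column_stack([np.ones_like(X1), X1])
H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
I = np.eye(n)

SST = Y @ (I - J / n) @ Y
SSE = Y @ (I - H) @ Y
SSR = Y @ (H - J / n) @ Y

print(SST, SSE, SSR)
print(np.isclose(SST, SSE + SSR))                    # the decomposition SST = SSE + SSR holds
print(np.isclose(SST, ((Y - Y.mean())**2).sum()))    # matches the scalar definition
```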
Eigenvalues and Eigenvectors

Def: If A is a square matrix, λ is a scalar, and x is a nonzero vector such that
Ax = λx,
then we say that λ is an eigenvalue of A and x is its corresponding eigenvector.

Note: To find the eigenvalues we solve |A − λI| = 0.

Note: If A is n⨉n, then A has n eigenvalues λ₁, …, λₙ.
The λ's are not necessarily all distinct, nonzero, or real numbers.

Properties:
• $\operatorname{tr}(\boldsymbol{A}) = \sum_{i=1}^{n} \lambda_i$
• $|\boldsymbol{A}| = \prod_{i=1}^{n} \lambda_i$

Note: A symmetric matrix is positive definite if all its eigenvalues are positive.

Exercise: Find the eigenvalues and eigenvectors of $\boldsymbol{A} = \begin{pmatrix} 10 & 3 \\ 3 & 8 \end{pmatrix}$.
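One way to check the exercise (and the two trace/determinant properties) numerically with NumPy; the hand derivation via |A − λI| = 0 is still the intended route:

```python
import numpy as np

A = np.array([[10.0, 3.0],
              [3.0, 8.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the eigenvectors

print(eigenvalues)
print(np.isclose(eigenvalues.sum(), np.trace(A)))        # tr(A) = sum of eigenvalues
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # |A| = product of eigenvalues
```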
Inferences in Linear Regression

$$\boldsymbol{b} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}
\;\Rightarrow\; E\{\boldsymbol{b}\} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'E\{\boldsymbol{Y}\} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$

$$\boldsymbol{\sigma}^2\{\boldsymbol{b}\} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\,\boldsymbol{\sigma}^2\{\boldsymbol{Y}\}\,\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}
= \sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{I}\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1} = \sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}
\qquad \boldsymbol{s}^2\{\boldsymbol{b}\} = MSE \cdot (\boldsymbol{X}'\boldsymbol{X})^{-1}$$

Recall:
$$(\boldsymbol{X}'\boldsymbol{X})^{-1} = \begin{pmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[3mm]
\dfrac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{pmatrix}
\;\Rightarrow\;
\boldsymbol{s}^2\{\boldsymbol{b}\} = \begin{pmatrix}
\dfrac{MSE}{n} + \dfrac{\bar{X}^2\,MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{-\bar{X}\,MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[3mm]
\dfrac{-\bar{X}\,MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \dfrac{MSE}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{pmatrix}$$

Estimated mean response at $X = X_h$:
$$\hat{Y}_h = b_0 + b_1 X_h = \boldsymbol{X}_h'\boldsymbol{b}, \qquad \boldsymbol{X}_h = \begin{pmatrix} 1 \\ X_h \end{pmatrix},
\qquad s^2\{\hat{Y}_h\} = \boldsymbol{X}_h'\,\boldsymbol{s}^2\{\boldsymbol{b}\}\,\boldsymbol{X}_h = MSE \cdot \boldsymbol{X}_h'(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}_h$$

Predicted new response at $X = X_h$:
$$\hat{Y}_h = b_0 + b_1 X_h = \boldsymbol{X}_h'\boldsymbol{b}, \qquad s^2\{pred\} = MSE\left[ 1 + \boldsymbol{X}_h'(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}_h \right]$$
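A final NumPy sketch putting these pieces together on the same illustrative data: standard errors of b, plus the estimated variances for a mean response and a new prediction at a hypothetical X_h = 3.5:

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, p = len(Y), 2

X = np.column_stack([np.ones_like(X1), X1])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y

e = Y - X @ b
MSE = e @ e / (n - p)

s2_b = MSE * XtX_inv                      # s^2{b}
Xh = np.array([1.0, 3.5])                 # X_h = 3.5 (illustrative)
s2_mean = MSE * Xh @ XtX_inv @ Xh         # s^2{Y_hat_h}
s2_pred = MSE * (1 + Xh @ XtX_inv @ Xh)   # s^2{pred}

print(b, np.sqrt(np.diag(s2_b)))          # estimates and their standard errors
print(s2_mean, s2_pred)
```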
