04 LinAlgTutorial
Yunkai Zhou
Department of Mathematics
Southern Methodist University
Dallas, Texas 75075
[email protected]
Spring, 2011
Acknowledgements
Matrices (element-wise):
An m × n matrix A ∈ R^{m×n} (or A ∈ C^{m×n}):
A = [a_{i,j}]
Column-wise: A = [a_1, a_2, . . . , a_n],
where a_i ∈ R^m (or C^m), i = 1, 2, . . . , n.
Transpose:
A = [a_{i,j}]_{m×n} ⟺ A^T = [a_{j,i}]_{n×m}
Example:
\[ A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \iff A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31}\\ a_{12} & a_{22} & a_{32} \end{bmatrix} \]
Example:
\[ A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \iff A^H = \begin{bmatrix} \bar a_{11} & \bar a_{21} & \bar a_{31}\\ \bar a_{12} & \bar a_{22} & \bar a_{32} \end{bmatrix} \]
A is symmetric if A = A^T
(usually this refers to “real” symmetric; it can also be “complex” symmetric)
A is hermitian if A = A^H (or A = A^∗)
Vector-wise notation:
a ∈ C^m ⟺ a^T ∈ C^{1×m}
\[ A = [a_1, a_2, \cdots, a_n] \in C^{m\times n} \iff A^T = \begin{bmatrix} a_1^T\\ a_2^T\\ \vdots\\ a_n^T \end{bmatrix} \in C^{n\times m} \]
Matrix-vector product b = Ax
Element-wise: b_i = \sum_{j=1}^{n} a_{i,j} x_j ,  i = 1, 2, . . . , m
Vector-wise: b = \sum_{j=1}^{n} a_j x_j
A(x + y) = Ax + Ay, ∀ x, y ∈ C^n
A(αx) = αAx, ∀ α ∈ C
Vector-wise, b = \sum_{j=1}^{n} a_j x_j , where
a_j = A(:, j) = A e_j ,  A(i, :) = e_i^T A
Matrix-matrix product C = AB
Vector-wise (compare columns in C = AB):
[c_1, c_2, . . . , c_k] = A[b_1, b_2, . . . , b_k]
⟹ c_j = A b_j = \sum_{k=1}^{n} a_k b_{k,j}
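The three views of the product are easy to check numerically. Below is a minimal MATLAB sketch (my addition, not from the original slides) verifying that the element-wise sum, the column-wise sum, and the column-by-column product all agree with the built-in operators.

m = 4; n = 3; k = 2;
A = randn(m,n); x = randn(n,1); B = randn(n,k);
b1 = zeros(m,1);                  % element-wise: b_i = sum_j a_ij * x_j
for i = 1:m
  for j = 1:n
    b1(i) = b1(i) + A(i,j)*x(j);
  end
end
b2 = zeros(m,1);                  % vector-wise: b = sum_j a_j * x_j
for j = 1:n
  b2 = b2 + A(:,j)*x(j);
end
C = zeros(m,k);                   % column view: c_j = A*b_j
for j = 1:k
  C(:,j) = A*B(:,j);
end
[norm(b1 - A*x), norm(b2 - A*x), norm(C - A*B)]   % all ~ 0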
Vendor provided:
— Intel Math Kernel Library (MKL)
— AMD Core Math Library (ACML)
— Sun Performance Library
— SGI Scientific Computing Software Library
Automatically Tuned Linear Algebra Software (ATLAS)
— Analyzes hardware to produce BLAS libraries for any platform
— Used in MATLAB, precompiled libraries freely available
— Sometimes outperforms vendor libraries
GOTO BLAS (mainly for Intel processors)
— Manually optimized assembly code
(the fastest implementation for Intel processors)
range(A) = span{a_1, a_2, . . . , a_n}
= all linear combinations of the columns of A
= {Ax | x ∈ C^n}
null(A) = {x | Ax = 0}
Rank-nullity theorem: rank(A) + dim(null(A)) = n
Theorem
An m × n matrix A (m ≥ n) is full rank iff null(A) = {0}.
In other words, a full-rank matrix never maps two different vectors to the
same vector.
Theorem:
An elementary matrix I − uv^T is invertible whenever v^T u ≠ 1, and its inverse is
\[ (I - uv^T)^{-1} = I - \frac{uv^T}{v^T u - 1} \]
\[ \begin{bmatrix} A_{11} & 0\\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & 0\\ -A_{22}^{-1}A_{21}A_{11}^{-1} & A_{22}^{-1} \end{bmatrix} \]
\[ \begin{bmatrix} A_{11} & A_{12}\\ 0 & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1}\\ 0 & A_{22}^{-1} \end{bmatrix} \]
In particular,
\[ \begin{bmatrix} I & 0\\ A_{21} & I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0\\ -A_{21} & I \end{bmatrix}, \qquad \begin{bmatrix} I & A_{12}\\ 0 & I \end{bmatrix}^{-1} = \begin{bmatrix} I & -A_{12}\\ 0 & I \end{bmatrix} \]
If A_{22} is invertible (with Schur complement Ŝ = A_{11} − A_{12}A_{22}^{-1}A_{21}),
\[ \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} I & A_{12}A_{22}^{-1}\\ 0 & I \end{bmatrix} \begin{bmatrix} \hat S & 0\\ 0 & A_{22} \end{bmatrix} \begin{bmatrix} I & 0\\ A_{22}^{-1}A_{21} & I \end{bmatrix} \]
If A is nonsingular (with S = A_{22} − A_{21}A_{11}^{-1}A_{12}), then
\[ \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} I & -A_{11}^{-1}A_{12}\\ 0 & I \end{bmatrix} \begin{bmatrix} A_{11}^{-1} & 0\\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I & 0\\ -A_{21}A_{11}^{-1} & I \end{bmatrix} \]
\[ = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1}\\ -S^{-1}A_{21}A_{11}^{-1} & S^{-1} \end{bmatrix} \]
Similarly,
\[ \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} I & 0\\ -A_{22}^{-1}A_{21} & I \end{bmatrix} \begin{bmatrix} \hat S^{-1} & 0\\ 0 & A_{22}^{-1} \end{bmatrix} \begin{bmatrix} I & -A_{12}A_{22}^{-1}\\ 0 & I \end{bmatrix} \]
\[ = \begin{bmatrix} \hat S^{-1} & -\hat S^{-1}A_{12}A_{22}^{-1}\\ -A_{22}^{-1}A_{21}\hat S^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}\hat S^{-1}A_{12}A_{22}^{-1} \end{bmatrix} \]
Special cases:
(Sherman-Morrison) If A is nonsingular, u, v ∈ C^n, and 1 + v^H A^{-1} u ≠ 0, then
\[ (A + uv^H)^{-1} = A^{-1} - \frac{A^{-1}uv^H A^{-1}}{1 + v^H A^{-1} u} \]
(Sherman-Morrison-Woodbury) If A is nonsingular, U, V ∈ C^{n×k},
and I_k + V^H A^{-1} U is invertible, then
\[ (A + UV^H)^{-1} = A^{-1} - A^{-1}U \left( I + V^H A^{-1} U \right)^{-1} V^H A^{-1} \]
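A quick numerical sanity check of the Sherman-Morrison-Woodbury formula in MATLAB (a sketch added here for illustration; all names are arbitrary):

n = 6; k = 2;
A = randn(n) + n*eye(n);          % diagonally dominated, safely nonsingular
U = randn(n,k); V = randn(n,k);
Ai = inv(A);
lhs = inv(A + U*V');
rhs = Ai - Ai*U*((eye(k) + V'*Ai*U)\(V'*Ai));
norm(lhs - rhs)                   % ~ machine precision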
Basic Linear Algebra: Vector Norms
Definition:
A vector norm ‖·‖ on a vector space X is a real-valued function on X
which satisfies the following three conditions:
1. ‖x‖ ≥ 0, ∀ x ∈ X, and ‖x‖ = 0 iff x = 0.
2. ‖αx‖ = |α| ‖x‖, ∀ x ∈ X, ∀ α ∈ C.
3. (Triangle inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀ x, y ∈ X.
The p-norm:
\[ \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \quad p \ge 1. \]
(if p < 1, does ‖x‖_p define a norm?)
Verification of Norm Conditions
Example 2: Show that ‖x‖_M = √(x^H M x), where M is (hermitian) PD,
defines a norm on C^n. (This is called a weighted 2-norm.)
1. Since M is PD, ‖x‖_M = √(x^H M x) ≥ 0, and ‖x‖_M = 0 iff x = 0.
2. ‖αx‖_M = √(ᾱ x^H M αx) = |α| ‖x‖_M.
3. ‖x + y‖²_M = (x + y)^H M (x + y) = x^H M x + x^H M y + y^H M x + y^H M y.
Since M is PD, let M = W^H W for some nonsingular W; then
x^H M y + y^H M x = (Wx)^H (Wy) + (Wy)^H (Wx) ≤ 2 ‖Wx‖_2 ‖Wy‖_2 = 2 ‖x‖_M ‖y‖_M,
therefore ‖x + y‖²_M ≤ (‖x‖_M + ‖y‖_M)².
Basic Linear Algebra: Matrix norms
Clearly,
‖Ax‖_α ≤ ‖A‖_α ‖x‖_α.
Property: Every induced matrix norm is sub-multiplicative.
Proof: For any compatible A, B and an induced matrix norm ‖·‖_α,
\[ \|AB\|_\alpha = \max_{\|x\|_\alpha = 1} \|ABx\|_\alpha \le \max_{\|x\|_\alpha = 1} \|A\|_\alpha \|Bx\|_\alpha \le \|A\|_\alpha \|B\|_\alpha. \]
1-norm:
\[ \|A\|_1 = \max_{x \ne 0} \frac{\|Ax\|_1}{\|x\|_1} = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \]
2-norm:
\[ \|A\|_2 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \sqrt{\lambda_{\max}(A^H A)} = \sigma_{\max}(A) \]
The Schatten p-norm:
\[ \|X\|_{S_p} := \left( \sum_{i=1}^{n} \sigma_i(X)^p \right)^{1/p}. \]
Special cases:
Nuclear norm (p = 1), also called the trace norm or Ky-Fan norm:
\[ \|X\|_* = \|X\|_{tr} := \sum_{i=1}^{n} \sigma_i(X) \]
For x, y ∈ R^n,
⟨x, y⟩ := y^T x = x^T y = \sum_{i=1}^n x_i y_i
For x, y ∈ C^n,
⟨x, y⟩ := y^H x = \overline{x^H y} = \sum_{i=1}^n x_i ȳ_i
⟨Ax, y⟩ = ⟨x, A^H y⟩, ∀ x ∈ C^n, y ∈ C^m
⟨x, x⟩ ≥ 0, ∀ x
Cauchy inequality (Cauchy-Bunyakowski-Schwarz):
|⟨x, y⟩| ≤ ‖x‖_2 ‖y‖_2
Properties of an inner product on V over a field F:
⟨u, v⟩ = \overline{⟨v, u⟩}, ∀ u, v ∈ V
⟨αu, v⟩ = α ⟨u, v⟩, ∀ α ∈ F
⟨u_1 + u_2, v⟩ = ⟨u_1, v⟩ + ⟨u_2, v⟩, ∀ u_1, u_2, v ∈ V
Two sets X, Y are orthogonal if ⟨x, y⟩ = 0, ∀ x ∈ X, ∀ y ∈ Y
A set S is orthogonal if ⟨x, y⟩ = 0, ∀ x, y ∈ S, x ≠ y
Q is orthogonal (real): Q^{-1} = Q^T
Q is unitary (complex): Q^{-1} = Q^H
Qx = b ⟺ x = Q^H b
[Figure: a unitary Q preserves lengths and angles between u, v and Qu, Qv.]
Rotating \vec{OA} anticlockwise by θ to \vec{OÃ}. Denote L = ‖\vec{OA}‖ = ‖\vec{OÃ}‖.
\[ \Longrightarrow \begin{bmatrix} \tilde x\\ \tilde y \end{bmatrix} = G(\theta) \begin{bmatrix} x\\ y \end{bmatrix}, \qquad G(\theta) := \begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \]
If we rotate clockwise by θ, then the Givens rotation matrix is
\[ G(-\theta) = \begin{bmatrix} \cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta) \end{bmatrix}, \qquad G^{-1}(\theta) = G(-\theta). \]
\[ \begin{bmatrix} \tilde x\\ \tilde y \end{bmatrix} = G(\theta) \begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta) - y\sin(\theta)\\ y\cos(\theta) + x\sin(\theta) \end{bmatrix} \]
To zero out the 2nd element in [x, y]^T, simply choose a θ s.t. ỹ = 0,
i.e., cot(θ) = −x/y.
There are more numerically stable ways to compute sin(θ), cos(θ) from
x, y (a sketch follows the G(i, j, θ) matrix below).
To selectively zero out k elements in a length-n vector, apply
corresponding Givens rotations k times sequentially.
\[ G(i, j, \theta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & \cos(\theta) & \cdots & -\sin(\theta) & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & \sin(\theta) & \cdots & \cos(\theta) & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \]
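One common stable approach is to factor out the larger of |x|, |y| so that no intermediate square overflows. A MATLAB sketch (my addition; the function name and sign conventions are illustrative, matching ỹ = y cos(θ) + x sin(θ) = 0 above):

function [c, s] = givens_cs(x, y)
% compute c = cos(theta), s = sin(theta) with y*c + x*s = 0,
% avoiding overflow in x^2 + y^2 by factoring out the larger entry
if y == 0
  c = 1; s = 0;
elseif abs(y) > abs(x)
  t = -x/y;                  % t = cot(theta)
  s = 1/sqrt(1 + t^2); c = s*t;
else
  t = -y/x;                  % t = tan(theta)
  c = 1/sqrt(1 + t^2); s = c*t;
end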
Seek a unitary H with Hx = αe_1.
H is unitary ⟹ ‖Hx‖_2 = ‖αe_1‖_2 = |α| = ‖x‖_2
With w = Hx − x, the projection of x onto w is w(w^H x)/(w^H w).
From x, need to go twice the length of this projection to reach Hx = αe_1:
\[ Hx = x - 2\,\frac{ww^H x}{w^H w} \quad\Longrightarrow\quad H = I - 2\,\frac{ww^H}{w^H w} = I - 2vv^H, \;\text{ where } \|v\|_2 = 1. \]
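For completeness, a sketch of a Householder vector generator in MATLAB. The name house_gen and the interface [u, nu] = house_gen(x) are assumptions chosen to match the hessen2 listing later in these notes: it returns u with ‖u‖_2 = √2 (so that I − uu' = I − 2vv^H with v = u/√2) and nu such that (I − uu')x = nu·e_1; the sign of nu is chosen to avoid cancellation.

function [u, nu] = house_gen(x)
nu = norm(x);
if nu == 0, u = x; u(1) = sqrt(2); return; end
u = x/nu;
if u(1) >= 0
  u(1) = u(1) + 1; nu = -nu;   % reflect to -||x||*e1 to avoid cancellation
else
  u(1) = u(1) - 1;             % reflect to +||x||*e1
end
u = u/sqrt(abs(u(1)));         % now ||u||^2 = 2 and (I - u*u')*x = nu*e1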
LU decomposition:
A = LU
where L is unit lower triangular, U is upper triangular
Cholesky decomposition (for hermitian PSD matrices):
A = R^H R = LDL^H
Eigendecomposition (for diagonalizable A): A = XΛX^{−1}, Λ = diag(λ_1, · · · , λ_n)
Unitary eigendecomposition (for normal A): A = QΛQ^H
Schur decomposition (for any square A): A = USU^H, with S upper triangular
If A = \begin{bmatrix} σ_1 & \\ & σ_2 \end{bmatrix}, then AS = \left\{ (y_1, y_2) \;\middle|\; \frac{y_1^2}{σ_1^2} + \frac{y_2^2}{σ_2^2} = 1 \right\} is an ellipse in R².
If A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}, then AS = \left\{ (y_1, y_2) \;\middle|\; y_i = \sum_j a_{ij}x_j,\; x_1^2 + x_2^2 = 1 \right\} is an ellipse in R².
If A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix}, then AS is a (reduced) ellipsoid in R³
(essentially it is still a 2-d ellipse).
Av_j = σ_j u_j
[Figure: the unit circle S with v_1, v_2 mapped by A to the ellipse AS with semiaxes σ_1u_1, σ_2u_2.]
AS = UΣV^T S:
V^T S contains rotations/reflections of S, so it is still a unit sphere;
Σ(V^T S) scales the new unit sphere, resulting in a hyperellipse; and
U(ΣV^T S) contains rotations/reflections of the hyperellipse, without
changing its shape.
Geometrical interpretation of SVD
Fact: The image of S = {x : ‖x‖_2 = 1, x ∈ R^n} under any
A = UΣV^T ∈ R^{m×n} is a hyperellipse AS in R^m.
The σ_i(A)'s measure how much distortion A applies to S:
U^T AS is a hyperellipse in standard position, with k-th semiaxis equal
to σ_k(A).
Note U^T AS = {y | y = U^T Ax, x ∈ S}; (assume σ_i > 0, i = 1, . . . , n)
y := U^T Ax = U^T UΣV^T x = ΣV^T x, ∀ x ∈ S
\[ \|x\|_2 = \|V^T x\|_2 = \|\Sigma^{-1} y\|_2 = 1 \;\Longrightarrow\; \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} + \cdots + \frac{y_n^2}{\sigma_n^2} = 1 \]
In matrix notation,
\[ A \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix} \;\Longrightarrow\; AV = U\Sigma \]
Full SVD:
\[ A = \begin{bmatrix} U & U_\perp \end{bmatrix} \begin{bmatrix} \Sigma\\ 0 \end{bmatrix} V^H = \tilde U \tilde\Sigma V^H \]
(Thin) SVD:
\[ A = U\Sigma V^H \]
Truncated SVD:
\[ A \approx U_k \Sigma_k V_k^H \]
EVD uses the same basis X for row and column space;
SVD uses two different bases V , U
EVD generally does not maintain an orthonormal basis in X ,
unless A is normal;
SVD always has two orthonormal bases
EVD is defined only for square matrices;
SVD exists for all matrices
For hermitian/symmetric positive definite matrices A, EVD and
SVD are the same (assuming same order in Λ and Σ)
For hermitian/symmetric matrices A, EVD and SVD are the same
except that σi = |λi | (assuming same order in Λ and Σ)
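These relations are easy to observe numerically; a small MATLAB check (my addition):

n = 5;
B = randn(n); A = (B + B')/2;            % real symmetric, possibly indefinite
lam = eig(A);
[~, idx] = sort(abs(lam), 'descend');
norm(svd(A) - abs(lam(idx)))             % ~ 0: sigma_i = |lambda_i|
C = B'*B + eye(n);                       % symmetric positive definite
norm(svd(C) - sort(eig(C), 'descend'))   % ~ 0: EVD and SVD coincide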
In other words,
\[ \sigma_{k+1} = \min_{\mathrm{rank}(B) = k} \|A - B\|_2, \qquad \sum_{i=k+1}^{r} \sigma_i^2 = \min_{\mathrm{rank}(B) = k} \|A - B\|_F^2. \]
The proof uses a standard technique in linear algebra which may be called a
dimensionality argument.
Proof. By contradiction: suppose ∃ B ∈ C^{m×n}, rank(B) ≤ k, s.t.
‖A − B‖_2 < σ_{k+1}(A). Then ∀ w ∈ ker(B), w ≠ 0,
‖Aw‖_2 = ‖(A − B)w‖_2 ≤ ‖A − B‖_2 ‖w‖_2 < σ_{k+1}(A) ‖w‖_2.
Note that dim(ker(B)) ≥ n − k, and
dim(span{v_1, v_2, . . . , v_{k+1}}) = k + 1, therefore
∃ w_0 ∈ ker(B) ∩ span{v_1, v_2, . . . , v_{k+1}}, where w_0 = \sum_{i=1}^{k+1} c_i v_i ≠ 0,
for which it must be true that
‖Aw_0‖_2 = ‖\sum_{i=1}^{k+1} c_i σ_i(A) u_i‖_2 ≥ σ_{k+1}(A) ‖w_0‖_2. A contradiction. ∎
The {Z_j}_{j=1}^{r} form part of an orthonormal basis of the space C^{m×n}:
⟨Z_i, Z_j⟩ = trace(Z_j^∗ Z_i) = δ_{ij},  σ_j = ⟨A, Z_j⟩
[Figure: singular values of the clown image (linear and log scale); the horizontal
lines mark the 1st, 5th, 10th, 15th, 20th, · · · , 65th, 70th singular values.]
[Figures: the 200 × 320 clown image (rank 200) and its truncated-SVD
approximations of rank 1, 2, 3, 5, 10, 15, 25, 50; each panel is labeled with its
relative truncation error sigma(k+1)/sigma(1), e.g. rank 1: 2.315e−01,
rank 2: 2.006e−01, decreasing to roughly 5e−03 by rank 50.]
[Figure: singular values of the lena image (linear and log scale); the horizontal
lines mark the 1st, 5th, 10th, 15th, 20th, · · · , 65th, 70th singular values.]
[Figures: the 512 × 512 lena image (rank 512) and its truncated-SVD
approximations, e.g. rank 3 (sigma(4)/sigma(1) = 9.206e−02) and
rank 5 (sigma(6)/sigma(1) = 7.938e−02).]
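Such pictures can be produced with a few lines of MATLAB; a sketch (my addition, assuming the classic built-in clown demo data, which provides a 200 × 320 matrix X and a colormap map):

load clown                             % X (200x320) and colormap map
[U, S, V] = svd(X, 'econ');
s = diag(S);
semilogy(s, '.')                       % singular value decay
for k = [1 2 3 5 10 15 25 50]
  Xk = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % rank-k truncated SVD
  figure, image(Xk), colormap(map), axis image
  title(sprintf('svd rank = %d, sigma(%d)/sigma(1) = %.3e', k, k+1, s(k+1)/s(1)));
end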
SVD: Proof of existence
This theorem can be stated as an exercise: Let the SVD of A ∈ C^{m×n}
be A = UΣV^H. Find the eigendecomposition of
\[ \begin{bmatrix} 0 & A^H\\ A & 0 \end{bmatrix}. \]
for k = 0, 1, · · · , n − 1 .
(semi-) Variational Principle
A projector P satisfies P² = P.
If v ∈ range(P), then Pv = v:
since with v = Px,
Pv = P²x = Px = v.
The projection is along the direction Pv − v ∈ null(P),
since P(Pv − v) = P²v − Pv = 0.
[Figure: an oblique projector: v is projected onto Pv ∈ range(P) along the
direction Pv − v.]
Orthogonal projector: range(P) = (null(P))^⊥ ⟺ P = P^∗.
For the (⟹) part: Given any x ∈ C^m, let y = Px ∈ range(P). Since
range(P) = (null(P))^⊥ = range(P^∗), y ∈ range(P^∗). Now apply the
properties of a projector:
y = Py = P²x = Px,
y = P^∗y = P^∗Px,
which leads to Px = P^∗Px, or (P − P^∗P)x = 0, for all x ∈ C^m. This is
only possible when P = P^∗P; taking the conjugate transpose gives
P = P^∗ = P^∗P.
Equivalence of the definitions
‖P‖_2 = 1 ⟺ P = P^∗.
(Comment: This proof shows that the singular values, as well as eigenvalues, of
an orthogonal projector must be 1 or 0.)
By Schur decomposition of P: Let P = QSQ^∗; then
P² = P ⟹ S² = S. Let diag(S) = (s_ii); comparing diagonal elements
of S² = S we have s_ii = 1 or 0 for all i. Assume S is ordered as
S = \begin{bmatrix} S_{11} & S_{12}\\ & S_{22} \end{bmatrix}, where diag(S_{11}) = I_k, diag(S_{22}) = 0_{m−k}. Then clearly
S_{22}² = S_{22} ⟹ S_{22} = (0)_{m−k}. Now use the condition ‖S‖_2 = 1 to show
that S_{12} = (0) and S_{11} = I_k: Let s_{i:} = S(i, :), i = 1 : k; by the variational
principle, σ_1(S) = 1 ≥ e_i^∗ S s_{i:}^∗ / ‖s_{i:}‖_2 = s_{i:} s_{i:}^∗ / ‖s_{i:}‖_2 = ‖s_{i:}‖_2.
Since s_ii = 1, this forces the remaining entries of each of the first k rows to vanish.
Proof. For the (⟸) part, ‖P‖_2 ≥ 1 easily follows from P² = P. Now
show ‖P‖_2 ≤ 1: Since range(P) ⊥ null(P), and (I − P)x ∈ null(P) for
any x, x = Px + (I − P)x is an orthogonal decomposition; by the
Pythagorean theorem ‖x‖_2 ≥ ‖Px‖_2, hence ‖P‖_2 ≤ 1.
For the (⟹) part: Given any nonzero x, y, with x ∈ range(P),
y ∈ null(P), need to show x ⊥ y:
Decompose x as x = αy + r where r ⊥ y and α ∈ C; then by the
Pythagorean theorem, ‖x‖_2² = |α|²‖y‖_2² + ‖r‖_2². However, P is a
projector with ‖P‖_2 = 1, so x = Px = αPy + Pr = Pr, giving
‖x‖_2 ≤ ‖r‖_2; this forces α = 0, i.e., x ⊥ y.
P_A v − v ⊥ range(A), or A^∗(P_A v − v) = 0; writing P_A v = Ax,
A^∗(Ax − v) = 0 ⟺ A^∗Ax = A^∗v
Since A^∗A is nonsingular,
x = (A^∗A)^{−1}A^∗v
Recall that
A^+ = (A^∗A)^+A^∗ = A^∗(AA^∗)^+
If A has full column rank,
A^+ = (A^∗A)^{−1}A^∗
Similarly, the orthogonal projector that projects onto range(A^∗) (the row
space of A) is
P_{A^∗} = A^∗(AA^∗)^+A = A^+A.
span{q_1, q_2, . . . , q_j} = span{a_1, a_2, . . . , a_j}, for j = 1, . . . , n
or, in matrix form, A = QR (full QR) or A = Q̃R̃ (reduced/thin QR), with
r_ij = q_i^∗ a_j, (i ≠ j)
and
\[ |r_{jj}| = \Big\| a_j - \sum_{i=1}^{j-1} r_{ij}\, q_i \Big\|_2 \]
R(1,1) = norm(A(:,1)); Q(:,1) = A(:,1)/R(1,1);
for j = 2:n,
  R(1:j-1, j) = Q(:, 1:j-1)' * A(:,j);
  Q(:,j) = A(:,j) - Q(:, 1:j-1) * R(1:j-1, j);
  R(j,j) = norm(Q(:,j));
  if (R(j,j) == 0), error(['columns linearly dependent']); end
  Q(:,j) = Q(:,j) / R(j,j);
end
\[ q_1 = \frac{P_1 a_1}{\|P_1 a_1\|}, \quad q_2 = \frac{P_2 a_2}{\|P_2 a_2\|}, \quad \ldots, \quad q_n = \frac{P_n a_n}{\|P_n a_n\|} \]
where
\[ P_j = I - \hat Q_{j-1} \hat Q_{j-1}^* \quad \text{with} \quad \hat Q_{j-1} = \begin{bmatrix} q_1 & q_2 & \cdots & q_{j-1} \end{bmatrix} \]
In MGS, P_j is applied as a product of the rank-one projectors
P_{⊥q} = I − qq^∗
v_j = P_j a_j
Classical:
v_1 ← (1, ǫ, 0, 0)^T, r_11 = √(1 + ǫ²) ≈ 1, q_1 = v_1/1 = (1, ǫ, 0, 0)^T
v_2 ← (1, 0, ǫ, 0)^T, r_12 = q_1^T a_2 = 1, v_2 ← v_2 − 1q_1 = (0, −ǫ, ǫ, 0)^T
r_22 = √2 ǫ, q_2 = v_2/r_22 = (0, −1, 1, 0)^T/√2
v_3 ← (1, 0, 0, ǫ)^T, r_13 = q_1^T a_3 = 1, v_3 ← v_3 − 1q_1 = (0, −ǫ, 0, ǫ)^T
r_23 = q_2^T a_3 = 0, v_3 ← v_3 − 0q_2 = (0, −ǫ, 0, ǫ)^T
r_33 = √2 ǫ, q_3 = v_3/r_33 = (0, −1, 0, 1)^T/√2
Modified:
v_1 ← (1, ǫ, 0, 0)^T, r_11 = √(1 + ǫ²) ≈ 1, q_1 = v_1/1 = (1, ǫ, 0, 0)^T
v_2 ← (1, 0, ǫ, 0)^T, r_12 = q_1^T v_2 = 1, v_2 ← v_2 − 1q_1 = (0, −ǫ, ǫ, 0)^T
r_22 = √2 ǫ, q_2 = v_2/r_22 = (0, −1, 1, 0)^T/√2
v_3 ← (1, 0, 0, ǫ)^T, r_13 = q_1^T v_3 = 1, v_3 ← v_3 − 1q_1 = (0, −ǫ, 0, ǫ)^T
r_23 = q_2^T v_3 = ǫ/√2, v_3 ← v_3 − r_23 q_2 = (0, −ǫ/2, −ǫ/2, ǫ)^T
r_33 = √6 ǫ/2, q_3 = v_3/r_33 = (0, −1, −1, 2)^T/√6
Check orthogonality:
Classical: q_2^T q_3 = (0, −1, 1, 0)(0, −1, 0, 1)^T/2 = 1/2
Modified: q_2^T q_3 = (0, −1, 1, 0)(0, −1, −1, 2)^T/√12 = 0
MGS is numerically stable (less sensitive to rounding errors)
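The 4 × 3 example above can be run in floating point by taking ǫ = √eps, so that fl(1 + ǫ²) = 1. A small MATLAB experiment (my addition) comparing the two variants:

ep = sqrt(eps);                       % 1 + ep^2 rounds to 1
A = [1 1 1; ep 0 0; 0 ep 0; 0 0 ep];
n = size(A, 2);
Qc = A; Qm = A;                       % CGS and MGS working copies
for j = 1:n
  for i = 1:j-1
    Qc(:,j) = Qc(:,j) - (Qc(:,i)'*A(:,j))*Qc(:,i);   % CGS: r_ij = q_i'*a_j
    Qm(:,j) = Qm(:,j) - (Qm(:,i)'*Qm(:,j))*Qm(:,i);  % MGS: r_ij = q_i'*v_j
  end
  Qc(:,j) = Qc(:,j)/norm(Qc(:,j));
  Qm(:,j) = Qm(:,j)/norm(Qm(:,j));
end
% CGS loses orthogonality badly (O(1)); MGS loss is far smaller
[norm(Qc'*Qc - eye(n)), norm(Qm'*Qm - eye(n))]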
Flops counts of Gram-Schmidt QR
Count each +, −, ∗, /, √ as one flop;
only look at the higher-order terms.
Orthonormalize A ∈ R^{m×n}, (m ≥ n):
1. For j = 1, . . . , n Do: q_j := a_j
2.   For i = 1, . . . , j − 1 Do:
3.     r_ij = q_i^T a_j (CGS) or r_ij = q_i^T q_j (MGS); q_j := q_j − r_ij q_i
4.   EndDo
5.   r_jj = ‖q_j‖_2. If r_jj = 0 exit
6.   q_j := q_j/r_jj
7. EndDo
Approximate total flops for MGS (same for CGS): 4m · (n²/2) = 2mn²
\[ A \underbrace{R_1 R_2 \cdots R_n}_{R^{-1}} = Q \]
“Triangular orthogonalization”
\[ \underbrace{Q_n \cdots Q_2 Q_1}_{Q^*} A = R \]
“Orthogonal triangularization”
\[ \underbrace{\begin{bmatrix} \times&\times&\times\\ \times&\times&\times\\ \times&\times&\times\\ \times&\times&\times\\ \times&\times&\times \end{bmatrix}}_{A^{(0)} := A} \;\xrightarrow{Q_1}\; \underbrace{\begin{bmatrix} X&X&X\\ 0&X&X\\ 0&X&X\\ 0&X&X\\ 0&X&X \end{bmatrix}}_{A^{(1)} := Q_1 A} \;\xrightarrow{Q_2}\; \underbrace{\begin{bmatrix} \times&\times&\times\\ 0&X&X\\ 0&0&X\\ 0&0&X\\ 0&0&X \end{bmatrix}}_{A^{(2)} := Q_2 Q_1 A} \;\xrightarrow{Q_3}\; \underbrace{\begin{bmatrix} \times&\times&\times\\ 0&\times&\times\\ 0&0&X\\ 0&0&0\\ 0&0&0 \end{bmatrix}}_{A^{(3)} := Q_3 Q_2 Q_1 A} \]
(entries marked X are those modified at that step)
Question: what is v_k?
\[ \tilde v_k = A^{(k-1)}(k\!:\!m,\, k), \qquad v_k \leftarrow \alpha \|\tilde v_k\|_2\, e_1 - \tilde v_k, \;(\alpha = ?), \qquad v_k \leftarrow \frac{v_k}{\|v_k\|_2} \]
The Householder Algorithm
Compute Qx = Q_1 Q_2 · · · Q_n x implicitly.
Givens rotations offer an alternative. Recall that
\[ G(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \]
rotates x ∈ R² anticlockwise by θ.
To set an element to zero, choose cos θ and sin θ so that
\[ \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_i\\ x_j \end{bmatrix} = \begin{bmatrix} \sqrt{x_i^2 + x_j^2}\\ 0 \end{bmatrix} \]
or
\[ \cos\theta = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad \sin\theta = \frac{-x_j}{\sqrt{x_i^2 + x_j^2}} \]
“Orthogonal Triangularization”
Geometric Interpretation
For any x ∈ C^n, Ax ∈ range(A).
Minimizing ‖b − Ax‖_2 means finding the shortest distance from b to
range(A), with residual r = b − Ax.
Need Ax = Pb, where P is the orthogonal projector onto range(A),
i.e., r ⊥ range(A).
[Figure: b, its projection y = Pb = Ax onto range(A), and r = b − Ax ⊥ range(A).]
Solving Least Squares Problems
Via (thin) QR, A = QR:
Ax = Pb ⟹ QRx = QQ^∗b ⟹ Rx = Q^∗b
Via the SVD: if A = UΣV^∗, then P = UU^∗, and
Ax = Pb ⟹ UΣV^∗x = UU^∗b ⟹ ΣV^∗x = U^∗b
Via the normal equations:
Ax = Pb ⟹ Ax = A(A^∗A)^{−1}A^∗b ⟹ A^∗Ax = A^∗b.
The normal equation can also be obtained by expanding min_{x∈C^n} ‖Ax − b‖_2² as
f(x) = x^∗A^∗Ax − b^∗Ax − x^∗A^∗b + b^∗b,
then setting the first-order derivative of f(x) w.r.t. x to 0. This also leads
to the normal equation A^∗Ax = A^∗b.
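A sketch (my addition) solving one random least squares problem by all three routes and comparing with backslash:

m = 100; n = 5;
A = randn(m, n); b = randn(m, 1);
[Q, R] = qr(A, 0);                 % thin QR
x_qr  = R \ (Q'*b);                % R x = Q^* b
[U, S, V] = svd(A, 'econ');
x_svd = V*(S \ (U'*b));            % Sigma V^* x = U^* b
x_ne  = (A'*A) \ (A'*b);           % normal equations
[norm(x_qr - A\b), norm(x_svd - A\b), norm(x_ne - A\b)]   % all small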
\[ \underbrace{\begin{bmatrix} \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times \end{bmatrix}}_{A} \;\xrightarrow{L_1}\; \underbrace{\begin{bmatrix} \times&\times&\times&\times\\ 0&X&X&X\\ 0&X&X&X\\ 0&X&X&X \end{bmatrix}}_{L_1 A} \;\xrightarrow{L_2}\; \underbrace{\begin{bmatrix} \times&\times&\times&\times\\ &\times&\times&\times\\ &0&X&X\\ &0&X&X \end{bmatrix}}_{L_2 L_1 A} \;\xrightarrow{L_3}\; \underbrace{\begin{bmatrix} \times&\times&\times&\times\\ &\times&\times&\times\\ &&\times&\times\\ &&0&X \end{bmatrix}}_{L_3 L_2 L_1 A} \]
“Triangular triangularization”
The Matrices Lk
The multipliers ℓ_jk = x_jk/x_kk appear in L_k:
\[ L_k = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & -\ell_{k+1,k} & 1 & \\ & & \vdots & & \ddots & \\ & & -\ell_{m,k} & & & 1 \end{bmatrix} = \prod_{j=k+1}^{m} E_a(k, j, -\ell_{jk}) \]
\[ L_k^{-1} = I - \frac{\ell_k e_k^*}{e_k^* \ell_k - 1} = I + \ell_k e_k^* \]
\[ L_k^{-1} L_{k+1}^{-1} = (I + \ell_k e_k^*)(I + \ell_{k+1} e_{k+1}^*) = I + \ell_k e_k^* + \ell_{k+1} e_{k+1}^* \]
The product L = L_1^{-1} L_2^{-1} · · · L_{m−1}^{-1} is obtained by inserting ℓ_k into
the k-th column of I:
\[ L = L_1^{-1} L_2^{-1} \cdots L_{m-1}^{-1} = \begin{bmatrix} 1 & & & & \\ \ell_{21} & 1 & & & \\ \ell_{31} & \ell_{32} & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \ell_{m1} & \ell_{m2} & \cdots & \ell_{m,m-1} & 1 \end{bmatrix} \]
for k = 1 : m-1
  if (A(k,k) == 0),
    error('without pivoting, LU decomposition fails')
  else
    A(k+1:m, k) = A(k+1:m, k) / A(k,k);
    A(k+1:m, k+1:m) = A(k+1:m, k+1:m) - A(k+1:m, k) * A(k, k+1:m);
  end
end
Operation count: ∼ \sum_{k=1}^{m} 2(m − k)(m − k) ∼ 2 \sum_{k=1}^{m} k² ∼ 2m³/3
Pivoting
Full pivoting searches among all valid pivots, i.e., at the k-th step it
chooses max_{i≥k, j≥k} |a_ij| as pivot (interchanging rows and columns);
this is expensive.
Partial pivoting considers pivots in column k only, i.e., it chooses
max_{i≥k} |a_ik| as pivot (interchanging rows only).
[Diagram: pivot selection (the entry x_ik of largest magnitude in column k),
row interchange by P_1, then elimination by L_1 zeroing the entries below
the pivot.]
In terms of matrices:
L_{m−1}P_{m−1} · · · L_2P_2L_1P_1A = U,
where the P_i's are elementary permutation matrices, each used to switch two
rows when a pivot interchange is necessary.
The PA = LU Factorization
L_{m−1}P_{m−1} · · · L_2P_2L_1P_1A = U
⟹ (L'_{m−1} · · · L'_2L'_1)(P_{m−1} · · · P_2P_1)A = U
where
L'_k = P_{m−1} · · · P_{k+1} L_k P_{k+1}^{−1} · · · P_{m−1}^{−1}
This gives PA = LU.
for j = 1 : n-1
  % choose the one with largest magnitude from A(j:n,j) as pivot
  [amax, ip] = max(abs(A(j:n, j)));
  % ip from above is in [1 : n-j+1], point it to the true row number in A
  ip = ip + j - 1;
  if (ip ~= j),
    % apply Pj to both A and b; this is nothing but row swapping
    tmp = A(ip, j:n); A(ip, j:n) = A(j, j:n); A(j, j:n) = tmp;
    tmp = b(ip); b(ip) = b(j); b(j) = tmp;
  end
  if (A(j,j) ~= 0),
    % apply the standard gauss elimination
    A(j+1:n, j) = A(j+1:n, j) / A(j,j);
    A(j+1:n, j+1:n) = A(j+1:n, j+1:n) - A(j+1:n, j) * A(j, j+1:n);
    b(j+1:n) = b(j+1:n) - A(j+1:n, j) * b(j);
  else
    error('singular matrix');
  end
end
x = triu(A) \ b;
Collecting the row interchanges in P and the column interchanges in Q
(full pivoting), one obtains
PAQ = LU
Let α = √a_11. The first step for A = R^∗R is
\[ A := \begin{bmatrix} a_{11} & w^*\\ w & A^{(1)} \end{bmatrix} = \begin{bmatrix} \alpha & 0\\ w/\alpha & I \end{bmatrix} \begin{bmatrix} \alpha & w^*/\alpha\\ 0 & A^{(1)} - ww^*/a_{11} \end{bmatrix} \]
\[ = \begin{bmatrix} \alpha & 0\\ w/\alpha & I \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & A^{(1)} - ww^*/a_{11} \end{bmatrix} \begin{bmatrix} \alpha & w^*/\alpha\\ 0 & I \end{bmatrix} =: R_1^* A_1 R_1 \]
That is, R_{(1,1)} = √(A_{(1,1)}), R_{(1,2:n)} = A^∗_{(2:n,1)}/R_{(1,1)}.
Can apply the same to A^{(2)} := A^{(1)} − ww^∗/a_{11} (also PD, why?):
\[ A = R_1^* \begin{bmatrix} 1 & 0\\ 0 & \tilde R_2^* \tilde A_2 \tilde R_2 \end{bmatrix} R_1 = R_1^* R_2^* A_2 R_2 R_1, \qquad R_2 = \begin{bmatrix} 1 & 0\\ 0 & \tilde R_2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 0\\ 0 & \tilde A_2 \end{bmatrix} \]
Note R_{(2,2)} = √(A^{(2)}_{(1,1)}), R_{(2,2:n)} = (A^{(2)}_{(2:n,1)})^∗/R_{(2,2)}.
Apply the same recursively to the diagonal block of A^{(k)}.
Computing A = R ∗ R (A is PD, two versions)
R = A;
for k = 1 : n
  for j = k+1 : n  % only update upper triangular part (symmetry)
    R(j, j:n) = R(j, j:n) - R(k, j:n) * R(k,j)' / R(k,k);
  end
  if (R(k,k) <= 0),
    error('A is not HPD, try ''A=R^*DR'' instead'),
  end
  R(k, k:n) = R(k, k:n) / sqrt(R(k,k));
end
R = triu(R);
R = zeros(n, n);
for i = 1 : n,
  tmp = A(i,i) - R(1:i-1, i)' * R(1:i-1, i);
  if (tmp <= 0),
    error('A is not HPD, try ''A=R^*DR'' instead'),
  end
  R(i,i) = sqrt(tmp);
  for j = i+1 : n
    R(i,j) = (A(i,j) - R(1:i-1, i)' * R(1:i-1, j)) / R(i,i);
  end
end
R = eye(n);  % the returned R is unit upper triangular
for j = 1 : n-1,
  dv(j) = real(A(j,j));
  R(j, j+1:n) = A(j, j+1:n) / dv(j);
  for i = j+1 : n  % only update upper triangular row elements
    A(i, i:n) = A(i, i:n) - R(j,i)' * dv(j) * R(j, i:n);
  end
end
dv(n) = A(n,n);  % D = diag(dv(1:n))
R = eye(n);
for j = 1 : n-1,
  dv(j) = real(A(j,j));
  for i = j+1 : n
    R(j,i) = A(j,i) / dv(j);
    for k = j+1 : i  % only update lower triangular column elements
      A(k,i) = A(k,i) - R(j,i) * dv(j) * R(j,k)';
    end
  end
end
dv(n) = A(n,n);
If f is differentiable, the absolute condition number is
κ̂(f, x) = ‖J_f(x)‖
where the Jacobian (J_f)_{ij} = ∂f_i/∂x_j, and the matrix norm is
induced by the norms on X and Y.
Relative Condition Number
\[ \kappa(f, x) = \frac{\hat\kappa}{\|f(x)\| / \|x\|} = \sup_{\delta x \ne 0} \left( \frac{\|f(x + \delta x) - f(x)\|}{\|f(x)\|} \Big/ \frac{\|\delta x\|}{\|x\|} \right) \]
If f is differentiable,
\[ \kappa(f, x) = \frac{\|J_f(x)\|}{\|f(x)\| / \|x\|} \]
Conditioning, Condition number
For f(x) = Ax,
\[ \kappa = \frac{\|J_f\|}{\|f(x)\| / \|x\|} = \|A\| \frac{\|x\|}{\|Ax\|} \le \|A\| \|A^{-1}\| \]
For the perturbed system
(A + δA)(x + δx) = b
“ϕ(s, t) = O(ψ(t)) uniformly in s” means there is a constant C such that
|ϕ(s, t)| ≤ Cψ(t) for any s.
Example: (sin² t)(sin² s) = O(t²) uniformly as t → 0, but not if
sin² s is replaced by s².
In bounds such as ‖x̃ − x‖ ≤ Cκ(A)ǫ_machine ‖x‖, C does not
depend on A or b, but it might depend on the dimension m.
f̃(x) = f(x̃) for some x̃ with ‖x̃ − x‖/‖x‖ = O(ǫ_machine)
[Diagram: x and a nearby point x̃; f maps x̃ to f(x̃), while the computed map f̃
takes x to f̃(x). The backward error is ‖x − x̃‖ and the forward error is
‖f(x) − f̃(x)‖.]
f̃ is stable (in the mixed forward-backward sense): nearly the right solution to a
nearly right problem.
[Diagram: the same picture, now with f(x̃) = f̃(x) exactly.]
f̃ is backward stable: exactly the right solution to a nearly right problem.
Linking forward error with backward error
Assume that forward error, backward error, and condition number are
defined mutually consistently; then a rule of thumb in error analysis is
forward error ≲ condition number × backward error.
That is,
‖f(x) − f̃(x)‖ ≤ C κ̂(f, x) ‖x − x̃‖,  with  ‖x̃ − x‖/‖x‖ = O(ǫ_machine).
For solving Ax = b via Householder QR:
Q̃R̃ = A + δA,  ‖δA‖/‖A‖ = O(ǫ_machine)
(R̃ + δR)x̃ = ỹ,  ‖δR‖/‖R̃‖ = O(ǫ_machine)
(A + ∆A)x̃ = b,  ‖∆A‖/‖A‖ = O(ǫ_machine)
\[ \frac{\|\tilde R\|}{\|A\|} \le \|\tilde Q^*\| \frac{\|A + \delta A\|}{\|A\|} = O(1) \]
\[ \frac{\|(\delta Q)\tilde R\|}{\|A\|} \le \|\delta Q\| \frac{\|\tilde R\|}{\|A\|} = O(\epsilon_{machine}) \]
\[ \frac{\|\tilde Q(\delta R)\|}{\|A\|} \le \|\tilde Q\| \frac{\|\delta R\|}{\|\tilde R\|} \frac{\|\tilde R\|}{\|A\|} = O(\epsilon_{machine}) \]
\[ \frac{\|(\delta Q)(\delta R)\|}{\|A\|} \le \|\delta Q\| \frac{\|\delta R\|}{\|A\|} = O(\epsilon_{machine}^2) \]
\[ \frac{\|\tilde x - x\|}{\|x\|} = O(\kappa(A)\,\epsilon_{machine}) \]
\[ \pm m\,\beta^{e-p} = \pm \frac{m}{\beta^p}\,\beta^e \]
Minimum when d_1 = 1, d_i = 0 (i > 1), e = e_min, i.e., (1/β)β^{e_min}.
Maximum when d_i = β − 1 (i ≥ 1), e = e_max, i.e.,
\[ \sum_{i=1}^{p} \frac{\beta - 1}{\beta^i}\,\beta^{e_{max}} = \beta^{e_{max}}(1 - \beta^{-p}). \]
Or, by using mβ^{e−p}:  β^{p−1}β^{e_min−p} ≤ |fl(y)| ≤ (β^p − 1)β^{e_max−p}.
Machine epsilon and unit roundoff
\[ \left| \frac{fl(y) - y}{y} \right| \le \epsilon_{machine} \]
The gaps between adjacent numbers scale with the size of the numbers.
For all x ∈ R in the range of a floating point system, there exists
a floating point number fl(x) such that |x − fl(x)| ≤ ǫ_machine |x|.
Example: β = 2, p = 3, e_min = −1, e_max = 3, normalized:
\[ \pm\left( \frac{d_1}{2} + \frac{d_2}{2^2} + \frac{d_3}{2^3} \right) 2^e, \quad e \in \{-1, 0, 1, 2, 3\}, \]
d_1 ≡ 1, d_2, d_3 ∈ {0, 1} (essentially only two bits are needed for p = 3).
[Number line: the representable values from 1/4 up to 7, clustering toward 0.]
Number of floating point numbers between adjacent powers of 2: 2^{p−1} − 1.
(# of floating point numbers between adjacent powers of β: (β − 1)β^{p−1} − 1)
Denormalized (or Subnormal) Numbers
[Number line: the same toy system with subnormal numbers filling the gap
between 0 and the smallest normalized number 1/4.]
Two equivalent floating point representations
\[ \pm 0.d_1d_2\cdots d_p \times \beta^e = \pm\left( \frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \cdots + \frac{d_p}{\beta^p} \right)\beta^e, \]
where 0 ≤ d_i ≤ β − 1, and d_1 ≠ 0.
Another equivalent representation (more often used, as in IEEE) is
\[ \pm d_1.d_2\cdots d_p \times \beta^{e-1} = \pm\left( d_1 + \frac{d_2}{\beta} + \cdots + \frac{d_p}{\beta^{p-1}} \right)\beta^{e-1}, \]
where 0 ≤ d_i ≤ β − 1, and d_1 ≠ 0.
No essential difference at all, except that in order to represent the same floating
point numbers, the e_min and e_max of the first representation should be 1 greater
than those of the second representation (which can cause some confusion).
For example, the previous example using the second representation should be
β = 2, p = 3, e_min = −2, e_max = 2.
An exercise
[Two number lines: the top one contains the normalized, while the bottom one
contains both the normalized and the subnormal, floating point numbers.]
1. Which representation is the system using,
the 0.d_1d_2 · · · d_p × β^e or the d_1.d_2 · · · d_p × β^e?
2. Determine the possible values of β and p for this system.
Answer: To solve this problem, apply the formula that determines the number of
floating point numbers between adjacent powers of β, which is (β − 1)β^{p−1} − 1.
(This formula can be obtained in several ways.)
Here, since (β − 1)β^{p−1} − 1 = 11, the only two integer solution pairs are
(β, p) = (4, 2) and (13, 1). (Note the proportion of the gaps is not drawn
accurately enough to reveal the value of β.)
4 − ∞ = lim_{x→∞} (4 − x) = −∞
IEEE single precision layout:
S | E (8 bits) | M (23 physical bits, effective 24 bits)
S ∈ {0, 1};  e_min = 1 − 127 = −126,  e_max = 2^8 − 2 − 127 = 127;
there are 2^23 − 1 floating point reals in (2^e, 2^{e+1}) for every integer e ∈ [e_min, e_max].
Represented numbers:
S  E         M                        Quantity
0  11111111  00000100000000000000000  NaN
1  11111111  00100010001001010101010  NaN
0  11111111  00000000000000000000000  ∞
0  10000001  10100000000000000000000  +1 · 2^{129−127} · 1.101 = 6.5
0  10000000  00000000000000000000000  +1 · 2^{128−127} · 1.0 = 2
0  00000001  00000000000000000000000  +1 · 2^{1−127} · 1.0 = 2^{−126}
0  00000000  10000000000000000000000  +1 · 2^{−126} · 0.1 = 2^{−127}
0  00000000  00000000000000000000001  +1 · 2^{−126} · 2^{−23} = 2^{−149}
0  00000000  00000000000000000000000  0
1  00000000  00000000000000000000000  −0
1  10000001  10100000000000000000000  −1 · 2^{129−127} · 1.101 = −6.5
1  11111111  00000000000000000000000  −∞
>> format long e
>> eps/2
ans = 1.110223024625157e-16
>> 1. + eps/2 - 1.
ans = 0
>> eps/1.5
ans = 1.480297366166875e-16
>> 1. + eps/1.5 - 1.
ans = 2.220446049250313e-16
>> 2. + eps - 2.
ans = 0
>> 2. + 1.1*eps - 2.
ans = 4.440892098500626e-16
>> 2. + 2*eps - 2.
ans = 4.440892098500626e-16
>> 4. + 2*eps - 4.
ans = 0
>> 4. + 3*eps - 4.
ans = 8.881784197001252e-16
>> 4. + 4*eps - 4.
ans = 8.881784197001252e-16
>> 2^9*eps
ans = 1.136868377216160e-13
>> 1024. + 2^9*eps - 1024.
ans = 0
>> 1024. + (1 + 1.e-16)*2^9*eps - 1024.
ans = 0
>> 1024. + (1+eps)*2^9*eps - 1024.
ans = 2.273736754432321e-13
>> 1024. + 2^10*eps - 1024.
ans = 2.273736754432321e-13
>> 2^11 + 2^10*eps - 2^11
ans = 0
>> 3*2^10*eps
ans = 6.821210263296962e-13
>> [2^11 + 3*2^10*eps - 2^11, 2^11 + 5*2^10*eps - 2^11]
ans = 9.094947017729282e-13  9.094947017729282e-13
Ax = λx
A = XΛX^{−1} or AX = XΛ
Ax = b → (X^{−1}b) = Λ(X^{−1}x)
Since
\[ \prod_{j=1}^{m} |\lambda_j| = \prod_{j=1}^{m} \sigma_j \]
The Rayleigh quotient:
\[ r(x) = \frac{x^* A x}{x^* x}, \quad x \in C^m, \; x \ne 0 \]
Questions:
1. Under what condition does it converge?
2. How to determine convergence?
Convergence may be determined from |λ^{(k+1)} − λ^{(k)}|, or from the
angle between v^{(k+1)} and v^{(k)}, or by the residual norm
‖Av^{(k)} − λ^{(k)}v^{(k)}‖.
3. If it converges, what does it converge to?
v^{(0)} = a_1q_1 + a_2q_2 + · · · + a_mq_m
v^{(k)} = c_kA^kv^{(0)} = c_k(a_1λ_1^kq_1 + a_2λ_2^kq_2 + · · · + a_mλ_m^kq_m)
= c_kλ_1^k(a_1q_1 + a_2(λ_2/λ_1)^kq_2 + · · · + a_m(λ_m/λ_1)^kq_m)
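A minimal power-method sketch in MATLAB (my addition), normalized each step and stopped on the residual norm, as discussed above:

A = randn(6); A = A + A';          % symmetric example
v = randn(6, 1); v = v/norm(v);
for k = 1:500
  w = A*v;
  lam = v'*w;                      % Rayleigh quotient estimate
  if norm(w - lam*v) <= 1e-10*abs(lam), break; end
  v = w/norm(w);                   % -> q_1 if |lambda_1| > |lambda_2|, a_1 ~= 0
end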
Algorithm: RQI
Choose v^{(0)} = some unit-length (random) vector
Compute λ^{(0)} = (v^{(0)})^∗Av^{(0)}
for k = 1, 2, . . .
  Solve (A − λ^{(k−1)}I)w = v^{(k−1)} for w (shift-invert)
  v^{(k)} = w/‖w‖ (normalize)
  λ^{(k)} = (v^{(k)})^∗Av^{(k)} (current Rayleigh quotient)
Convergence rate:
(locally) quadratic in v and λ when A is not hermitian
(locally) cubic in v and 6th order in λ when A is hermitian
Block Power Method
The iterates converge to S as j → ∞.
For practical reasons, an eigen-algorithm should converge with a
reasonably small j.
For hermitian A, the sequence converges to a diagonal matrix.
Since a real matrix may have complex eigenvalues (and they
always appear in conjugate pairs), the Q and S in its Schur form
can be complex.
When only real Q and S are desired, one uses a real Schur
factorization, in which S may have 2 × 2 blocks on its diagonal.
Unitary similarity triangularization
function [H, Q] = hessen2(A)
[m, n] = size(A);  H = A;
Q = eye(n);
for k = 1 : n-2
  [Q(k+1:n, k), H(k+1, k)] = house_gen(H(k+1:n, k));
  % premultiply by (I - u*u'), u = Q(k+1:n, k);
  H(k+1:n, k+1:n) = H(k+1:n, k+1:n) - ...
      Q(k+1:n, k) * (Q(k+1:n, k)' * H(k+1:n, k+1:n));
  H(k+2:n, k) = zeros(n-k-1, 1);
  % postmultiply by (I - u*u')
  H(1:n, k+1:n) = H(1:n, k+1:n) - (H(1:n, k+1:n) * Q(k+1:n, k)) * Q(k+1:n, k)';
end
% accumulate Q, use backward accumulation (fewer flops)
for k = n-2 : -1 : 1
  u = Q(k+1:n, k);
  Q(k+1:n, k+1:n) = Q(k+1:n, k+1:n) - u * (u' * Q(k+1:n, k+1:n));
  Q(:, k) = zeros(n, 1);  Q(k, k) = 1;
end
Operation counts and stability
Change notation a bit: use V to denote the unitary matrix that transforms A
into H, i.e., V^∗AV = H.
i. reduce A to upper Hessenberg form: AV = VH
ii. while not convergent Do:
  1. select a shift µ
  2. QR factorization of the shifted H: QR = H − µI
  3. update V: V ← VQ
  4. update H: H ← RQ + µI (= Q^∗HQ)
(A − µI)V = V(QR) = (VQ)R = V⁺R, where V⁺ = VQ is the updated V
⟹ At each iteration, QR is block power iteration with shift µ
⟹ In total, QR is subspace iteration with variable shifts
function [H, V] = qrschur(A, tol)
% compute A = V*H*V', where H converges to upper triangular
[m, n] = size(A);
% (nargout > 1 assumed here; the extracted listing had nargout > 2,
%  which can never hold for a two-output function)
if (nargout > 1), [H, V] = hessen2(A); else, [H] = hessen(A); end
k = n; it = 1; itmax = n^2;
while (k > 1 & it <= itmax)
  % compute the Wilkinson shift
  mu = eig(H(k-1:k, k-1:k));
  if abs(mu(1) - H(k,k)) <= abs(mu(2) - H(k,k)), mu = mu(1);
  else, mu = mu(2); end
  % one shifted QR step on the active k-by-k block (this step is implied
  % by the algorithm above; the extracted listing omitted it)
  [Q, R] = qr(H(1:k, 1:k) - mu*eye(k));
  H(1:k, 1:k) = R*Q + mu*eye(k);
  if k < n, H(1:k, k+1:n) = Q' * H(1:k, k+1:n); end
  if (nargout > 1), V(:, 1:k) = V(:, 1:k) * Q; end
  % deflate if a subdiagonal is small enough
  if abs(H(k, k-1)) < tol * (abs(H(k-1,k-1)) + abs(H(k,k))),
    H(k, k-1) = 0; k = k - 1;
  end
  it = it + 1;
end
Shifted QR iteration demos:
2. A symmetric, with Wilkinson shift: http://faculty.smu.edu/yzhou/Teach/demo/sym_wilks.gif
3. A symmetric, with Rayleigh quotient shift: http://faculty.smu.edu/yzhou/Teach/demo/sym_RQshifts.gif
4. A nonsymmetric, with Wilkinson shift: http://faculty.smu.edu/yzhou/Teach/demo/nonsym_wilks.gif
Quite a few details left out
The Jacobi rotator
\[ J = \begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix} \]
looks very much like a Givens rotator, but with an intrinsic difference:
the need to keep a similarity transformation.
E.g., diagonalize a 2 × 2 real symmetric matrix using J:
\[ J^T \begin{bmatrix} a & d\\ d & b \end{bmatrix} J = \begin{bmatrix} \times & 0\\ 0 & \times \end{bmatrix} \;\Longrightarrow\; \tan(2\theta) = \frac{2d}{b - a} \]
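A one-step check in MATLAB (my addition) that this choice of θ indeed annihilates the off-diagonal:

a = 2; b = 1; d = 0.5;
theta = 0.5*atan2(2*d, b - a);     % tan(2*theta) = 2d/(b-a); atan2 handles b == a
J = [cos(theta) sin(theta); -sin(theta) cos(theta)];
J' * [a d; d b] * J                % off-diagonal entries ~ 0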
For a symmetric tridiagonal T, split off the coupling β as a rank-one correction:
\[ T = \begin{bmatrix} T_1 & \beta e_k e_1^T\\ \beta e_1 e_k^T & T_2 \end{bmatrix} = \begin{bmatrix} \hat T_1 & \\ & \hat T_2 \end{bmatrix} + \beta \begin{bmatrix} e_k\\ e_1 \end{bmatrix} \begin{bmatrix} e_k\\ e_1 \end{bmatrix}^T \]
[Figure: the secular equation, with poles at d_1, d_2, d_3, d_4 and its roots λ.]