Eigenvalue Computation Methods
This lecture discusses a few numerical methods for the computation of eigenvalues and eigenvectors of matrices. Most of this lecture will focus on the computation of a few eigenvalues of a large symmetric matrix, but some nonsymmetric matrices also will be considered, including the Google matrix. The QR-algorithm for both symmetric and nonsymmetric matrices of small to modest size will be discussed towards the end of these notes. We use the same notation as in Lecture 13. The norm ∥ · ∥ denotes the Euclidean vector norm.
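The power method

Let A ∈ R^{n×n} be a large symmetric matrix whose eigenvalue λ1 of largest magnitude is to be computed together with an associated eigenvector, and let v ∈ R^n be an initial vector. The power method can be described by the following pseudo-code: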
for k=1,2,3,...
    w=A*v;          % apply A to the current vector
    lambda=norm(w); % approximation of the eigenvalue magnitude
    v=w/lambda;     % normalize to obtain the next vector
end
The scalars lambda converge to the magnitude of the desired eigenvalue and the vector v to an associated eigenvector as k increases. The reason for this is quite easy to see. Let {λj, vj}_{j=1}^{n} denote the eigenpairs of A. Since A is symmetric, we may assume that the eigenvectors vj form an orthonormal basis of R^n. The initial vector v for the power method can be expressed in terms of this basis, i.e.,

    v = ∑_{j=1}^{n} αj vj

for certain coefficients αj, where we assume that α1 ≠ 0. By assumption |λ1| ≫ |λj| for 2 ≤ j ≤ n.
Multiplying v by the matrix A k times yields

    A^k v = ∑_{j=1}^{n} αj A^k vj = ∑_{j=1}^{n} αj λj^k vj = λ1^k ( α1 v1 + ∑_{j=2}^{n} αj (λj/λ1)^k vj ).
Since the quotients λj/λ1, 2 ≤ j ≤ n, are of magnitude smaller than unity, the vector A^k v/λ1^k converges to a multiple of v1 as k increases. If the eigenvalues are ordered so that |λ1| ≥ |λ2| ≥ · · · ≥ |λn|, then the rate of convergence is determined by the ratio

    |λ2/λ1|.    (1)
The vector v determined by k steps of the pseudo-code differs from A^k v/λ1^k only by a scaling factor. It follows that the vectors v generated by the code converge to v1 up to a scaling factor. If v = v1, then

    ∥Av∥ = |λ1| ∥v1∥ = |λ1|.

This shows that the scalars lambda in the code converge to the magnitude of λ1. We can determine the proper sign of λ1 by comparing the signs of nonvanishing components of w and v. For instance, if v is an accurate approximation of v1 whose first component is nonvanishing, then sign(λ1) is the sign of the quotient of the first components of w and v. The iterations with the power method are terminated when two consecutively determined values of lambda are sufficiently close.
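For illustration, the pseudo-code might be completed into a MATLAB function as sketched below; the function name powermethod, the tolerance tol, and the iteration limit maxit are our own choices, and the sign of λ1 is recovered from a component of v of large magnitude rather than from its first nonvanishing component.

    function [lambda, v] = powermethod(A, v, tol, maxit)
    % Power method sketch for a symmetric matrix A and an initial vector v.
    % Returns an approximation lambda of the eigenvalue of largest magnitude
    % (with its sign) and an associated unit eigenvector v.
      v = v/norm(v);
      lambda_old = inf;
      for k = 1:maxit
        w = A*v;
        lambda = norm(w);            % approximation of |lambda_1|
        [~, j] = max(abs(v));        % a nonvanishing component of v
        s = sign(w(j)/v(j));         % sign of lambda_1
        v = w/lambda;
        % terminate when two consecutive values of lambda are close
        if abs(lambda - lambda_old) <= tol*abs(lambda)
          break
        end
        lambda_old = lambda;
      end
      lambda = s*lambda;             % attach the proper sign
    end

For the matrices of the exercises below, one might call, e.g., [lambda, v] = powermethod(A, v, 1e-10, 1000).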
Exercise 1. Generate the matrix M = rand(100), the vector v = rand(100, 1), and let A = (M + M^T)/2. This is a symmetric matrix; it is the symmetric part of the randomly generated matrix M. Apply the power method to A with initial vector v and print successive values of lambda. Do the scalars lambda converge quickly or slowly to the largest eigenvalue of A? Explain! Hint: Compute all eigenvalues of A with the MATLAB function eig. □
Exercise 2. Generate the matrix M = randn(100), the vector v = randn(100, 1), and let A = (M + M^T)/2. Apply the power method and print successive values of lambda. Do the scalars lambda converge quickly or slowly to the magnitude of the eigenvalue of largest magnitude of A? Explain! □
This is the power method applied to a suitable matrix, known as the Google matrix. To see that (2) is one step of the power method, we introduce the hyperlink matrix H = [hjk] with hjk = 1/|Pj| if there is a link from page j to page k. The remaining entries of H are zero. Let v^(k) be the PageRank vector at iteration k; the jth entry of v^(k) is given by rk(Pj). Then (2) can be written in the form

    v^(k+1) = H^T v^(k),    (3)

which shows that (2) is the power method applied to the transpose of H.
Note that the matrix H is huge; it is of size n × n with n > 8 · 10^9. The matrix H also is very sparse. On average there are 10 outlinks from each web page. Therefore, on average, each row of H contains only 10 nonvanishing entries. Only the nonzero entries are stored. The computational effort required to evaluate the matrix-vector product in (3) is proportional to n. (If H were a dense matrix, the evaluation of a matrix-vector product would require on the order of n^2 arithmetic operations.)
Unfortunately, the iterations (3) are not guaranteed to converge. However, the matrix H easily can be adjusted to secure convergence. First note that H may have zero rows caused by so-called dangling nodes, i.e., web pages without outpointers. We replace these rows by the row vector (1/n) e^T, where e = [1, 1, . . . , 1]^T. This gives the n × n matrix S. Let the vector a ∈ R^n have the entry 1 in those rows that are zero rows of H; the remaining entries of a are zero. Then we can express S as

    S = H + (1/n) a e^T.
Note that this matrix is not explicitly stored.
Convergence of the power method is not guaranteed when applied to S either, since S may have zero
entries. We therefore modify S to obtain a positive matrix, the Google matrix. It is given by
1
G = αS + (1 − α) eeT .
n
The matrix G is not explicitly stored. Instead one expresses the matrix in the form
    G = α (H + (1/n) a e^T) + (1 − α) (1/n) e e^T = α H + (1/n) (α a + (1 − α) e) e^T.    (4)
Note that only the matrix H and the vector a have to be stored. The entries of the vector e are known and therefore do not have to be explicitly stored. The matrix-vector products required for the iterations are evaluated by using the decomposed representation (4) of G. Since G is positive, the iterations are guaranteed to converge. The rate of convergence depends on the ratio of the second largest to the largest eigenvalue of G. The largest eigenvalue of G is 1, because the rows of G sum to one; the second largest eigenvalue of G generally is α. The value α = 0.85 is said to be used by Google. The PageRank is said to be updated once a week.
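For illustration, one iteration v^(k+1) = G^T v^(k) might be evaluated from the decomposition (4), without forming G, as in the following sketch; the function name and the variable names (the sparse matrix H, the vector a, and the scalar alpha) are ours.

    function v = pagerank_step(H, a, v, alpha)
    % One power-method step with the Google matrix G of (4), applied to the
    % transpose of G: v <- G'*v.  Only the sparse matrix H and the vector a
    % are stored; the product e'*v is evaluated as sum(v).
      n = length(v);
      v = alpha*(H'*v) + ((alpha*(a'*v) + (1-alpha)*sum(v))/n)*ones(n,1);
    end

Repeated calls of this function with a nonnegative starting vector whose entries sum to one yield approximations of the PageRank vector.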
When searching for something with Google, the PageRank determines, in part, the result of the search. Pages with a large PageRank are listed before pages with a small PageRank.
Exercise 3. Let the largest and second largest eigenvalues of G be 1 and 0.85, respectively. How many iterations are required to determine the eigenvector with about 3 significant digits? □
Exercise 4. Generate the matrix A = rand(100) and the vector v = rand(100, 1). This is (generally) a nonsymmetric nonnegative matrix. Apply the power method to A with initial vector v and print successive values of lambda. Do the scalars lambda converge quickly or slowly to the largest eigenvalue of A? Hint: Compute all eigenvalues of A with the MATLAB function eig. □
Exercise 5. The MATLAB command magic(n) determines an n × n matrix, whose entries form a magic
square. Determine the eigenvalues of a few magic squares using the power method. What is the largest
eigenvalue? Explain! □
Subspace iteration
The purpose of this method is to determine several, say ℓ, of the largest eigenvalues and associated eigenvectors of a large symmetric matrix A ∈ R^{n×n}. Let the matrix V ∈ R^{n×ℓ} have linearly independent columns. The function orth in the pseudo-code for subspace iteration below orthonormalizes these columns.
V=[v_1,v_2,...,v_l]=orth(V);
for k=1,2,3,...
    W=[w_1,w_2,...,w_l]=A*V;
    Lambda=diag([norm(w_1),norm(w_2),...,norm(w_l)]);
    for j=1,2,...,l
        v_j=w_j/Lambda_j;
    end
    V=orth([v_1,v_2,...,v_l]);
end
The orthogonalization of the columns in each loop secures convergence to eigenvalues and eigenvectors
associated with the ℓ eigenvalues of A of largest magnitude. Without orthogonalization all columns would
converge to eigenvectors associated with the eigenvalue(s) of largest magnitude.
Let the eigenvalues λj of A be ordered so that their magnitude decreases with increasing index, i.e.,

    |λ1| ≥ |λ2| ≥ · · · ≥ |λn|.

The eigenvalues λℓ and λℓ+1 are required to be of different magnitudes. Then subspace iteration yields convergence towards the jth eigenvector for j ≤ ℓ with rate |λℓ+1/λj|.
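A MATLAB realization of the pseudo-code might look as follows; the function name, the fixed number of iterations maxit, and the form in which the approximate eigenvalues are returned are our own choices.

    function [lambda, V] = subspace_iteration(A, V, maxit)
    % Subspace iteration sketch; the l columns of V are linearly independent.
    % Returns approximations lambda of the l eigenvalues of A of largest
    % magnitude and a matrix V with orthonormal columns that span the
    % corresponding approximate invariant subspace.
      V = orth(V);                    % orthonormalize the initial columns
      l = size(V,2);
      lambda = zeros(l,1);
      for k = 1:maxit
        W = A*V;                      % apply A to all columns
        for j = 1:l
          lambda(j) = norm(W(:,j));   % scaling factors, as in the pseudo-code
        end
        V = orth(W);                  % orthonormalize the new columns
      end
    end

A reduced QR-factorization, [V, ~] = qr(W, 0), is a common alternative to orth for the orthonormalization.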
Exercise 6. Determine the two largest eigenvalues of the symmetric tridiagonal 2000 × 2000 matrix with
diagonal entries 2 and sub- and super-diagonal entries −1 by subspace iteration. Use sparse storage format
for the matrix, i.e., first generate it as a dense matrix and then store it as a sparse one using the MATLAB
command sparse. The sparse format only stores the nonvanishing elements of the matrix and of its LU-
factorization. Hint: Type help sparse in MATLAB. The MATLAB command spy lets you see which entries
are stored. The dense matrix can be generated by the command toeplitz. □
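For instance, the matrix of this exercise might be generated along the lines of the hint as follows (the variable names are ours):

    n = 2000;
    c = [2, -1, zeros(1, n-2)];    % first column of the symmetric Toeplitz matrix
    A = sparse(toeplitz(c));       % generate the dense matrix, then store it in sparse format
    spy(A)                         % display the location of the stored nonzero entries

The MATLAB function spdiags can generate the matrix directly in sparse format, without forming the dense matrix first.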
Inverse iteration
Assume that we would like to determine an eigenvalue of the symmetric matrix A in the vicinity of µ, say
µ = 1. We would like to use a method that is simple to code, such as the power method. However, when µ
is inside the convex hull of the spectrum of A, the power method will not converge to a desired eigenpair.
Inverse iteration is a simple modification of the power method, which makes it possible to determine the
desired eigenpair. The following pseudo-code describes the inverse iteration method:
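    % Sketch: the power method applied to (A-mu*I)^{-1}; the LU-factorization
    % of A-mu*I is computed once, before the iterations.
    Compute the LU-factorization of A-mu*I;
    for k=1,2,3,...
        Solve (A-mu*I)w=v for w using the LU-factorization;
        lambda=norm(w);
        v=w/lambda;
    end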
Inverse iteration can be seen to be the power method applied to the matrix (A − µI)^{-1}. This matrix has the same eigenvectors as A. Let λj be an eigenvalue of A. Then (λj − µ)^{-1} is an eigenvalue of (A − µI)^{-1}; see Exercise 7. Inverse iteration gives convergence to the eigenvalue (λj − µ)^{-1} of largest magnitude of the matrix (A − µI)^{-1}. The expression |λj − µ|^{-1} is largest when λj is closest to µ. In order for inverse iteration to converge, there must be a unique eigenvalue of A closest to µ. As with the power method, inverse iteration is terminated when two consecutively computed values of lambda are sufficiently close. The desired eigenvalue approximation has to be determined from lambda and µ. The parameter µ in inverse iteration is referred to as the shift.
Exercise 7. Let {λ, v} be an eigenpair of A. Show that v is an eigenvector of (A − µI)^{-1}. What is the corresponding eigenvalue? □
Exercise 8. Give an expression for the rate of convergence of the eigenvector associated with the eigenvalue
closest to µ for inverse iteration, similar to (1). □
Exercise 9. Determine the smallest eigenvalue of the symmetric tridiagonal matrix of Exercise 6. Use sparse storage format for the matrix. The factors of the LU-factorization are then also stored as sparse matrices. Verify this with the MATLAB command spy. □
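Rayleigh quotient iteration

Given an approximation v of an eigenvector of the symmetric matrix A, an approximation α of the associated eigenvalue can be determined by solving the least-squares problem

    min_α ∥v α − A v∥.    (6)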
Here α is the unknown, v the "matrix", and Av the "right-hand side" of a standard least-squares problem. This least-squares problem seeks to determine the best "approximate eigenvalue" associated with the "approximate eigenvector" v.
The normal equations associated with (6) are

    v^T v α = v^T A v,

i.e., α = (v^T A v)/(v^T v). When v is normalized so that ∥v∥ = 1, this gives α = v^T A v, the Rayleigh quotient of v. Replacing the shift µ in inverse iteration by this quantity, updated in every iteration, yields the Rayleigh quotient iteration:
% alpha is initialized with an available approximation of the desired
% eigenvalue, e.g., the shift mu used in inverse iteration
for k=1,2,3,...
    Solve (A-alpha*I)w=v for w using LU-factorization;
    lambda=norm(w);
    v=w/lambda;
    alpha=v'*A*v;   % Rayleigh quotient of the updated vector v
end
The desired outputs are alpha and the vector v. The Rayleigh quotient iteration method requires the LU-factorization of the matrix A-alpha*I in every iteration. This generally is the most expensive part of the method, and one step of Rayleigh quotient iteration is more expensive than one step of inverse iteration.
Generally Rayleigh quotient iteration requires fewer iterations than inverse iteration and can be competitive,
in particular, when high accuracy is desired.
Exercise 10. Apply Rayleigh quotient iteration to the problem in Exercise 9. Compare the rate of
convergence with that of inverse iteration. □
The QR-algorithm
This algorithm is designed to compute all eigenvalues and associated eigenvectors of a small to moderately
sized matrix A ∈ Rn×n . We will focus on the computation of eigenvalues and first describe the algorithm
for nonsymmetric matrices A. At the end of this subsection, we discuss simplifications that arise when A is
symmetric.
Let µ be a shift close to a desired eigenvalue. The basic step of the QR-algorithm is the QR-factorization
of the matrix A − µI and the multiplication of the factors in reverse order:
for k=1,2,3,...
1. Compute QR-factorization: A-mu*I => Q and R;
2. Multiply factors: A:=R*Q + mu*I;
end
Here Q is an n × n orthogonal matrix and R an n × n upper triangular matrix. The matrix in line 2 of the
above algorithm is similar to the one in line 1. This can be seen by expressing the matrix R in line 1 as
    R = Q^T (A − µI).
Substituting this expression into line 2 yields

    R Q + µI = Q^T (A − µI) Q + µI = Q^T A Q,

where the last equality follows from the fact that Q is orthogonal. Since Q^T = Q^{-1}, the matrices A and Q^T A Q are similar and therefore have the same eigenvalues.
The QR-algorithm is generally applied to upper Hessenberg matrices. These are matrices of the form
    Ã = [ * * * * *
          * * * * *
          0 * * * *
          0 0 * * *
          0 0 0 * * ].
Thus, all entries below the subdiagonal of an upper Hessenberg matrix vanish. We can bring A into upper Hessenberg form by an orthogonal similarity transformation. In other words, there is an orthogonal matrix U, such that

    Ã = U A U^T

is an upper Hessenberg matrix. The orthogonal matrix U can be determined as a product of Householder-type matrices Hk. These are n × n matrices of the form

    Hk = [ I  0
           0  H ],

whose leading k × k principal submatrix is the identity matrix I, whose trailing (n − k) × (n − k) principal submatrix H is a Householder matrix, and whose remaining entries are zero; see Lecture 12 from the fall semester for details on Householder and Householder-type matrices. We will use that Householder-type matrices are symmetric and orthogonal.
In the first step, we apply the Householder-type matrix H1 from the left to generate zeros in the first
column of A below the subdiagonal. Thus,
    H1 A = [ * * * * *
             + + + + +
             0 + + + +
             0 + + + +
             0 + + + + ].
Elements marked by + may be nonvanishing and are generally not the same as in A; elements marked by ∗
are the same as in A. Thus, multiplication by the Householder-type matrix H1 does not change the entries
of the first row of A. Similarly, multiplying H1 A by H1 = H1^T from the right does not change the entries in the first column. In particular, the zeros generated in H1 A are preserved and we obtain
    H1 A H1 = [ * * * * *
                * * * * *
                0 * * * *
                0 * * * *
                0 * * * * ].
Note that this matrix is similar to A.
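To illustrate the first step, the Householder-type matrix H1 might be generated as in the following sketch; it assumes that the entries of A below the diagonal in the first column are not all zero, and the variable names are ours.

    n = size(A,1);
    x = A(2:n,1);                  % entries below the diagonal in column 1
    s = norm(x);  if x(1) < 0, s = -s; end
    u = x;  u(1) = u(1) + s;       % Householder vector
    u = u/norm(u);
    H = eye(n-1) - 2*(u*u');       % (n-1) x (n-1) Householder matrix
    H1 = blkdiag(1, H);            % Householder-type matrix with leading 1 x 1 identity block
    B = H1*A*H1;                   % B(3:n,1) vanishes; B is similar to A

In practice the matrix H1 is not formed explicitly; the products with H1 are evaluated by using the vector u only.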
We now proceed by applying a Householder-type matrix H2 from the left to H1 AH1 to generate zeros
below the subdiagonal of the second column. This gives the matrix
    H2 H1 A H1 = [ * * * * *
                   * * * * *
                   0 + + + +
                   0 0 + + +
                   0 0 + + + ].
Entries that may be nonvanishing and different from those in H1 A H1 are marked by +. The similarity transformation is completed by applying H2 from the right. This does not affect the entries in the first two columns of H2 H1 A H1. We obtain
    H2 H1 A H1 H2 = [ * * * * *
                      * * * * *
                      0 * * * *
                      0 0 * * *
                      0 0 * * * ].
We similarly apply the Householder-type matrix H3 from the left and the right to obtain the upper Hessenberg matrix

    Ã = H3 H2 H1 A H1 H2 H3 = [ * * * * *
                                * * * * *
                                0 * * * *
                                0 0 * * *
                                0 0 0 * * ],
which is similar to the given matrix A.
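In MATLAB, the reduction to upper Hessenberg form is provided by the built-in function hess. The following lines, with our own variable names, might be used to check that the reduction is an orthogonal similarity transformation and that it preserves the eigenvalues:

    A = rand(5);
    [U, Ah] = hess(A);                    % A = U*Ah*U' with U orthogonal, Ah upper Hessenberg
    norm(U'*U - eye(5))                   % close to zero: U is orthogonal
    norm(sort(eig(Ah)) - sort(eig(A)))    % close to zero: the eigenvalues agree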
We apply the iterations of the QR-algorithm to the upper Hessenberg matrix Ã. The shift µ is often
chosen to be the eigenvalue of the trailing principal 2 × 2 submatrix of Ã closest to the last diagonal entry. The
upper Hessenberg form of the matrix is preserved by the iterations. The QR-algorithm therefore generates
a sequence of upper Hessenberg matrices. The last subdiagonal element of the Hessenberg matrices in this
sequence typically converges to zero. Only rarely does another subdiagonal entry become zero. When the last
subdiagonal entry vanishes, the last diagonal entry is an eigenvalue. We can continue the computations
with the leading (n − 1) × (n − 1) principal submatrix of the Hessenberg matrix. This submatrix also is of
upper Hessenberg form. The reduction of size is referred to as deflation. In actual implementations in finite-
precision floating-point arithmetic, deflation is carried out when the last subdiagonal element is of sufficiently
small magnitude. The computations proceed until a 2 × 2 upper Hessenberg matrix remains. Its eigenvalues
can be determined in closed form. The eigenvalue approximations determined by the QR-algorithm converge
at least quadratically with the iteration number.
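A compact MATLAB sketch of the shifted QR-algorithm with deflation is shown below for a symmetric matrix, so that the shift is real; the function name, the tolerance tol, and the iteration limit maxit are our own choices.

    function lam = qr_algorithm(A, tol, maxit)
    % Shifted QR-algorithm sketch for a symmetric matrix A; returns
    % approximations lam of all eigenvalues.  Deflation is carried out when
    % the last subdiagonal entry is of sufficiently small magnitude.
      A = hess(A);                       % reduce to Hessenberg (tridiagonal) form
      n = size(A,1);
      lam = zeros(n,1);
      while n > 1
        for k = 1:maxit
          % shift: eigenvalue of the trailing 2 x 2 submatrix closest to A(n,n)
          e = eig(A(n-1:n, n-1:n));
          [~, i] = min(abs(e - A(n,n)));
          mu = e(i);
          [Q, R] = qr(A - mu*eye(n));    % QR-factorization
          A = R*Q + mu*eye(n);           % multiply the factors in reverse order
          if abs(A(n,n-1)) <= tol*(abs(A(n-1,n-1)) + abs(A(n,n)))
            break                        % the last subdiagonal entry is negligible
          end
        end
        lam(n) = A(n,n);                 % accept an eigenvalue and deflate
        A = A(1:n-1, 1:n-1);
        n = n - 1;
      end
      lam(1) = A(1,1);
    end

For a nonsymmetric matrix the shift may be complex, and a practical implementation uses the double-shift technique, which is not discussed in these notes.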
The computation of the QR-factorization of an upper Hessenberg matrix is much cheaper than determining the QR-factorization of a general matrix of the same size. This is because the Householder-type matrices used in the factorization are sparse; see Exercise 12 below.
We turn to symmetric matrices A. Then the upper Hessenberg matrix Ã also is symmetric and therefore tridiagonal. This matrix form is preserved by the QR-algorithm.
Exercise 11. Show that λ = 3 is an eigenvalue of the matrix
    A = [ 1 2 3 4
          2 3 4 5
          0 4 5 6
          0 0 0 3 ]. □
Exercise 12. Consider the computation of the QR-factorization of an upper Hessenberg matrix by Householder-type matrices. Do the Householder matrices in the Householder-type matrices have a special form? If so, explain. □
Exercise 13. The number of iterations required by the QR-algorithm typically is independent of the size
of the matrix. Under this assumption, how fast does the computational work grow with n when the given
matrix A is i) nonsymmetric and of upper Hessenberg form, and ii) symmetric and tridiagonal? How does this computational effort compare with that of the initial reduction of the matrix to upper Hessenberg or symmetric tridiagonal form? □
Exercise 14. The MATLAB command hilb(n) determines the n×n Hilbert matrix. Compute the eigenval-
ues of Hilbert matrices of orders 3, 4, 5, . . . . Determine experimentally the growth of the condition number
of Hilbert matrices with n. Does the condition number grow linearly, quadratically, exponentially? Faster
than exponentially? How can you find out by computing and plotting? We will come across Hilbert matrices
in the next lecture. □