
LINEAR ALGEBRA II

J. B. Cooper
Johannes Kepler Universität Linz

Contents

1 DETERMINANTS
1.1 Introduction
1.2 Existence of the determinant and how to calculate it
1.3 Further properties of the determinant
1.4 Applications of the determinant

2 COMPLEX NUMBERS AND COMPLEX VECTOR SPACES
2.1 The construction of C
2.2 Polynomials
2.3 Complex vector spaces and matrices

3 EIGENVALUES
3.1 Introduction
3.2 Characteristic polynomials and diagonalisation
3.3 The Jordan canonical form
3.4 Functions of matrices and operators
3.5 Circulants and geometry
3.6 The group inverse and the Drazin inverse

4 EUCLIDEAN AND HERMITIAN SPACES
4.1 Euclidean space
4.2 Orthogonal decompositions
4.3 Self-adjoint mappings—the spectral theorem
4.4 Conic sections
4.5 Hermitian spaces
4.6 The spectral theorem—complex version
4.7 Normal operators
4.8 The Moore-Penrose inverse
4.9 Positive definite matrices

5 MULTILINEAR ALGEBRA
5.1 Dual spaces
5.2 Duality in euclidean spaces
5.3 Multilinear mappings
5.4 Tensors

1 DETERMINANTS
1.1 Introduction
In this chapter we treat one of the most important themes of linear algebra—
that of the determinant. We begin with some remarks which will motivate
the formal definition:
I. Recall that the system

ax + by = e
cx + dy = f

has the unique solution


\[
x = \frac{ed - fb}{ad - bc}, \qquad y = \frac{af - ce}{ad - bc},
\]
provided that the denominator ad − bc is non-zero. If we introduce the
notation
\[
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc
\]
we can write the solution in the form
   
\[
x = \det\begin{pmatrix} e & b \\ f & d \end{pmatrix} \div \det\begin{pmatrix} a & b \\ c & d \end{pmatrix},
\qquad
y = \det\begin{pmatrix} a & e \\ c & f \end{pmatrix} \div \det\begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
(Note that the numerators are formed by replacing the column of the matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
corresponding to the unknown by the column vector on the right hand side).
Earlier, we displayed a similar formula for the solution of a system of
three equations in three unknowns. It is therefore natural to ask whether
we can define a function det on the space Mn of n × n matrices so that the
solution of the equation AX = Y is, under suitable conditions, given by the
formula
\[
x_i = \frac{\det A_i}{\det A}
\]
where Ai is the matrix that we obtain by replacing the i-th column of A by
Y i.e.
\[
A_i = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1,i-1} & y_1 & a_{1,i+1} & \dots & a_{1n} \\
\vdots & & & & \vdots & & & \vdots \\
a_{n1} & a_{n2} & \dots & a_{n,i-1} & y_n & a_{n,i+1} & \dots & a_{nn}
\end{pmatrix}.
\]
II. Recall that
\[
ad - bc = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
is the area of the parallelogram spanned by the vectors (a, c) and (b, d). Now
if f is the corresponding linear mapping on R2 , this is just the image of the
standard unit square (i.e. the square with vertices (0, 0), (1, 0), (0, 1), (1, 1))
under f . The natural generalisation would be to define the determinant of
an n × n matrix to be the n-dimensional volume of the image of the standard
hypercube in Rn under the linear mapping induced by the matrix. Although
we do not intend to give a rigorous treatment of the volume concept in higher
dimensional spaces, it is geometrically clear that it should have the following
properties:
a) the volume of the standard hypercube is 1. This means that the determi-
nant of the unit matrix is 1;
b) the volume depends linearly on the length of a fixed side. This means
that the function det is linear in each column i.e.
\[
\det[A_1 \dots A_i + A_i' \dots A_n] = \det[A_1 \dots A_i \dots A_n] + \det[A_1 \dots A_i' \dots A_n]
\]
and
\[
\det[A_1 \dots \lambda A_i \dots A_n] = \lambda \det[A_1 \dots A_i \dots A_n].
\]
c) The volume of a degenerate parallelopiped is zero. This means that if two
columns of the matrix coincide, then its determinant vanishes.
(Note that the volume referred to here can take on negative values—
depending on the orientation of the parallelopiped).

1.2 Existence of the determinant
and how to calculate it
We shall now proceed to show that a function with the above properties
exists. In fact it will be more convenient to demand the analogous properties
for the rows i.e. we shall construct, for each n, a function

det : Mn → R

with the properties


d1) det In = 1;
d2)
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ \lambda A_i + \mu A_i' \\ \vdots \\ A_n \end{pmatrix}
= \lambda \det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_n \end{pmatrix}
+ \mu \det\begin{pmatrix} A_1 \\ \vdots \\ A_i' \\ \vdots \\ A_n \end{pmatrix};
\]
d3) if Ai = Aj (i ≠ j), then
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix} = 0.
\]

Before we prove the existence of such a function, we shall derive some further
properties which are a consequence of d1) - d3):
d4) if we add a multiple of one row to another one, the value of the determinant
remains unaltered i.e.
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_i + A_j \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix};
\]

d5) if we interchange two rows of a matrix, then we alter the sign of the
determinant i.e.
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= -\det\begin{pmatrix} A_1 \\ \vdots \\ A_j \\ \vdots \\ A_i \\ \vdots \\ A_n \end{pmatrix};
\]
d6) if one row of A is a linear combination of the others, then det A = 0.
Hence if r(A) < n (i.e. if A is not invertible), then det A = 0.
Proof. d4)
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_i + A_j \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
+ \det\begin{pmatrix} A_1 \\ \vdots \\ A_j \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
\]
by d3).
d5)
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_i \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_i + A_j \\ \vdots \\ A_j \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_i + A_j \\ \vdots \\ -A_i \\ \vdots \\ A_n \end{pmatrix}
= -\det\begin{pmatrix} A_1 \\ \vdots \\ A_j \\ \vdots \\ A_i \\ \vdots \\ A_n \end{pmatrix}.
\]
d6) Suppose that Ai = λ1 A1 + · · · + λi−1 Ai−1 . Then
\[
\det\begin{pmatrix} A_1 \\ \vdots \\ A_{i-1} \\ A_i \\ \vdots \\ A_n \end{pmatrix}
= \det\begin{pmatrix} A_1 \\ \vdots \\ A_{i-1} \\ \lambda_1 A_1 + \dots + \lambda_{i-1} A_{i-1} \\ \vdots \\ A_n \end{pmatrix} = 0
\]
since if we expand the expression by using the linearity in the i-th row we
obtain a sum of multiples of determinants each of which has two identical
rows and these vanish.
Note the fact that with this information we are able to calculate the
determinant of a given matrix, despite the fact that it has not yet been
defined! We simply reduce the matrix A to Hermitian form à by using
elementary transformations. At each step the above rules tell us the effect
on the determinant. If there is a zero on the diagonal of à (i.e. if r(A) < n),
then det A = 0 by d6) above. If not, we can continue to reduce the matrix to
the unit matrix by further row operations and so calculate its determinant.
In fact, a little reflection shows that most of these calculations are superfluous
and that it suffices to reduce the matrix to upper triangular form since the
determinant of the latter is the product of its diagonal elements.
We illustrate this by “calculating” the determinant of the 3 × 3 matrix
 
\[
\begin{pmatrix} 0 & 2 & 3 \\ 1 & 2 & 1 \\ 2 & -3 & 2 \end{pmatrix}.
\]
We have
\[
\det\begin{pmatrix} 0 & 2 & 3 \\ 1 & 2 & 1 \\ 2 & -3 & 2 \end{pmatrix}
= -\det\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 3 \\ 2 & -3 & 2 \end{pmatrix}
= -\det\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 3 \\ 0 & -7 & 0 \end{pmatrix}
= -2\det\begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & \tfrac{3}{2} \\ 0 & -7 & 0 \end{pmatrix}
= -2\det\begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & \tfrac{3}{2} \\ 0 & 0 & \tfrac{21}{2} \end{pmatrix}
= -21.
\]
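The reduction just carried out is easy to mechanise. The following minimal Python sketch (my own illustration, not part of the original notes; the function name and the use of exact fractions are arbitrary choices) computes a determinant exactly as described above: row exchanges (d5), adding multiples of rows (d4), and finally multiplying the diagonal entries of the resulting triangular matrix.

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Determinant via reduction to upper triangular form (d4/d5 row operations)."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    for j in range(n):
        # find a row with a non-zero pivot in column j
        pivot = next((i for i in range(j, n) if a[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)            # rank < n, so det = 0 (property d6)
        if pivot != j:
            a[j], a[pivot] = a[pivot], a[j]
            sign = -sign                  # a row exchange changes the sign (d5)
        for i in range(j + 1, n):
            factor = a[i][j] / a[j][j]
            a[i] = [x - factor * y for x, y in zip(a[i], a[j])]  # d4: det unchanged
    prod = Fraction(1)
    for j in range(n):
        prod *= a[j][j]                   # determinant of a triangular matrix
    return sign * prod

print(det_by_elimination([[0, 2, 3], [1, 2, 1], [2, -3, 2]]))    # -21
```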

In fact, what the above informal argument actually proves is the unique-
ness of the determinant function. This fact is often useful and we state it as
a Proposition.

Proposition 1 There exists at most one mapping det : Mn → R with the
properties d1)-d3) above.
The main result of this section is the fact that such a function does in fact
exist. The proof uses an induction argument on n. We already know that a
determinant function exists for n = 1, 2, 3. In order to motivate the following
proof note the formula

a11 (a22 a33 − a32 a23 ) − a21 (a12 a33 − a32 a13 ) + a31 (a12 a23 − a22 a13 )

for the determinant of the 3 × 3 matrix
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.
\]
This is called the development of the determinant along the first column and
suggests how to extend the definition to one dimension higher. This will be
carried out formally in the proof of the following Proposition:
Proposition 2 There is a (and hence exactly one) function det : Mn → R
with the properties d1)-d3) (and so also d4)-d6)).
Proof. As indicated above, we prove this by induction on n. The case
n = 1 is clear (take det[a] = a). The step from n − 1 to n: we define
\[
\det A = \sum_{i=1}^{n} (-1)^{i+1} a_{i1} \det A_{i1}
\]

where Ai1 is the (n−1) × (n−1) matrix obtained by deleting the first column
and the i-th row of A (the induction hypothesis ensures that its determinant
is defined) and show that this function satisfies d1), d2) and d3). It is clear
that det In = 1. We verify the linearity in the k-th row as follows. It
suffices to show that each term ai1 det Ai1 is linear in the k-th row. Now if
i ≠ k a part of the k-th row of A is a row of Ai1 and so this term is linear by
the induction hypothesis. If i = k, then det Ai1 is independent of the k-th row
and ai1 depends linearly on it.
It now remains to show that det A = 0 whenever two rows of A are
identical, say the k-th and the l-th (with k < l). Consider the sum
\[
\sum_{i=1}^{n} (-1)^{i+1} a_{i1} \det A_{i1}.
\]
Then Ai1 has two identical rows (and so vanishes by the induction hypothesis)
except for the cases where i = k or i = l. This leaves the two terms
\[
(-1)^{k+1} a_{k1} \det A_{k1} \quad \text{and} \quad (-1)^{l+1} a_{l1} \det A_{l1}
\]
and they are equal in absolute value, but with opposite signs. (For ak1 = al1
and Ak1 is obtained from Al1 by moving one row (l − k − 1) places. This
can be achieved by the same number of row exchanges and so multiplies the
determinant by (−1)^(l−k−1)).
The above proof yields the formula
\[
\det A = \sum_{i=1}^{n} (-1)^{i+1} a_{i1} \det A_{i1}
\]
for the determinant, which is called the development along the first column.
Similarly, one can develop det A along the j-th column i.e. we have
the formula
\[
\det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}
\]

where Aij is the (n − 1) × (n − 1) matrix obtained from A by omitting the


i-th row and the j-th column. This can be proved by repeating the above
proof with this recursion formula in place of the original one.
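The recursive definition used in the proof of Proposition 2 can also be transcribed directly; the sketch below (again my own, with an arbitrarily chosen function name) develops along the first column. It runs in exponential time, so it is only sensible for very small matrices, but it mirrors the formula exactly.

```python
def det_by_expansion(a):
    """Development along the first column: det A = sum_i (-1)^(i+1) a_i1 det A_i1."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for i in range(n):
        # A_i1: delete the i-th row and the first column
        minor = [row[1:] for k, row in enumerate(a) if k != i]
        total += (-1) ** i * a[i][0] * det_by_expansion(minor)
    return total

print(det_by_expansion([[0, 2, 3], [1, 2, 1], [2, -3, 2]]))   # -21
```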

Example: If we expand the determinant of the triangular matrix
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
0 & a_{22} & \dots & a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots & a_{nn}
\end{pmatrix}
\]
along the first column, we see that its determinant is
\[
a_{11} \det \begin{pmatrix}
a_{22} & \dots & a_{2n} \\
\vdots & \ddots & \vdots \\
0 & \dots & a_{nn}
\end{pmatrix}.
\]

An obvious induction argument shows that the determinant is a11 a22 . . . ann ,
the product of the diagonal elements. In particular, this holds for diagonal
matrices.
This provides a justification for the method for calculating the determi-
nant of a matrix by reducing it to triangular form by means of elementary
row operations. Note that for small matrices it is usually more convenient
to calculate the determinant directly from the formulae given earlier.

1.3 Further properties of the determinant
d7) if r(A) = n i.e. A is invertible, then det A ≠ 0.
Proof. For then the Hermitian form of A has non-zero diagonal elements
and so the determinant of A is non-zero.
Combining d5) and d7) we have the following Proposition:

Proposition 3 An n × n matrix A is invertible if and only if its determinant
is non-zero.

Shortly we shall see how the determinant can be used to give an explicit
formula for the inverse.
d8) The determinant is multiplicative i.e.

det AB = det A · det B.

Proof. This is a typical application of the uniqueness of the determinant


function. We first dispose of the case where det B = 0. Then r(B) < n and
so r(AB) < n. In this case the formula holds trivially since both sides are
zero.
If det B ≠ 0, then the mapping
\[
A \mapsto \frac{\det AB}{\det B}
\]
is easily seen to satisfy the three characteristic properties d1)-d3) of the
determinant function and so is the determinant.
d9) Suppose that A is an n × n matrix whose determinant does not vanish.
Then, as we have seen, A is invertible and we now show that the inverse of
A can be written down explicitly as follows:
\[
A^{-1} = \frac{1}{\det A}\,\operatorname{adj} A
\]
where adj A is the matrix [(−1)^(i+j) det Aji ] (i.e. we form the matrix whose
(i, j)-th entry is the determinant of the matrix obtained by removing the i-th
row and the j-th column of A, with sign according to the chess-board pattern
\[
\begin{pmatrix}
+ & - & + & - & \dots \\
- & + & - & + & \dots \\
\vdots & & & & \ddots
\end{pmatrix}.
\]
This matrix is then transposed and the result is divided by det A.)

Proof. We show that (adj A)A = (det A)I. Suppose that bik is the (i, k)-th
element of the product i.e.
\[
b_{ik} = \sum_{j=1}^{n} (-1)^{i+j} a_{jk} \det A_{ji}.
\]

If i = k this is just the expansion of det A along the i-th column i.e. bii =
det A.
If i ≠ k, it is the expansion of the determinant of the matrix obtained
from A by replacing the i-th column with the k-th one and so is 0 (since this
is a matrix with two identical columns and so of rank < n).
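As an illustration of d9) (a sketch of my own, not from the original text), the adjugate can be assembled entry by entry as (adj A)_{ij} = (−1)^(i+j) det A_{ji} and the identity (adj A)A = (det A)I checked numerically:

```python
import numpy as np

def adjugate(a):
    """adj A, whose (i, j) entry is the cofactor (-1)^(i+j) det A_ji."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # delete row j and column i of A (note the transposition)
            minor = np.delete(np.delete(a, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[0., 2, 3], [1, 2, 1], [2, -3, 2]])
print(np.allclose(adjugate(A) @ A, np.linalg.det(A) * np.eye(3)))   # True
```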
We have discussed the determinant function in terms of its properties
with regard to rows. Of course, it would have been just as logical to work
with columns and we now show that the result would have been the same.
To do this we introduce as a temporary notation the name Det for a function
on the n × n matrices with the following properties:
D1) Det In = 1;
D2) Det is linear in the columns;
D3) Det A = 0 whenever two columns of A coincide.
Of course we can prove the existence of such a function exactly as we did
for det (exchanging the word “column” for “row” everywhere). Even simpler,
we note that if we put
Det A = det At
then this will fulfill the required conditions.
All the properties of det carry over in the obvious way. In particular, there
is only one function with the above properties and we have the expansions
\[
\operatorname{Det} A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \operatorname{Det} A_{ij}
\]

along the i-th row. We shall now prove the following result:
Proposition 4 For each n × n matrix A, det A = Det A.
In other words, det A = det At and the notation “Det” is superfluous.
Again the proof is a typical application of the uniqueness. It suffices to
show that the function Det satisfies conditions d1)-d3). Of course, we have
Det I = 1. In order to prove the other two assertions, we use induction on
the order n and inspect the expansion
\[
\operatorname{Det} A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \operatorname{Det} A_{ij}
\]
which is clearly linear in aij (and so in the i-th row). By the induction
hypothesis, it is linear in the other rows (since each of the Aij are). To
complete the proof, we need only show that Det A vanishes if two rows of
A coincide. But then r(A) < n and so we have Det A = 0 by the column
analogue of property d6).
d11) One can often reduce the computations involved in calculating deter-
minants by using suitable block decompositions. For example, if A has the
decomposition
\[
\begin{pmatrix} B & C \\ 0 & D \end{pmatrix}
\]
where B and D are square matrices, then
\[
\det A = \det B \cdot \det D.
\]

(Warning: it is not true that if
\[
A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}
\]
then there is a simple formula such as
\[
\det A = \det B \cdot \det E - \det D \cdot \det C
\]
which would allow us to calculate the determinant of A from those of B, C,
D and E. However, such formulae do exist under suitable conditions, the
above being the simplest example).
Proof. We can assume that A and hence also B and D are invertible (for
otherwise both sides vanish). Then if we multiply A on the right by the
matrix
\[
\begin{pmatrix} I & -B^{-1}C \\ 0 & I \end{pmatrix}
\]
which has determinant 1, we get the value
\[
\det \begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix}.
\]
Now the function
\[
B \mapsto \det \begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix} \div \det D
\]
fulfills the conditions d1)-d3) and so is the determinant function.

Determinants of linear operators Since square matrices are the coordi-
nate versions of linear operators on a vector space V it is tempting to extend
the definition of determinants to such operators. The obvious way to do this
is to choose some basis (x1 , . . . , xn ) and to define the determinant det f of f
to be the determinant of the matrix of f with respect to this basis. We must
then verify that this value is independent of the choice of basis. But if A′ is
the matrix of f with respect to another basis, we know that

A′ = S −1 AS

for some invertible matrix S. Then we have

\[
\det A' = \det(S^{-1}AS) = \det S^{-1} \cdot \det S \cdot \det A = \det(S^{-1}S)\det A = \det A.
\]

Of course, it is essential to employ the matrix of f with respect to a single


basis for calculating the determinant.
Some of the properties of the determinant can now be interpreted as
follows:
a) det f ≠ 0 if and only if f is an isomorphism;
b) det(f g) = det f · det g (f, g ∈ L(V ));
c) det Id = 1.

Example: Calculate
\[
\det \begin{pmatrix}
6 & 0 & 2 & 0 \\
4 & 0 & 0 & 2 \\
0 & 1 & 2 & 0 \\
2 & 0 & 2 & 2
\end{pmatrix}.
\]
We have, expanding along the second column,
\[
\det \begin{pmatrix}
6 & 0 & 2 & 0 \\
4 & 0 & 0 & 2 \\
0 & 1 & 2 & 0 \\
2 & 0 & 2 & 2
\end{pmatrix}
= -\det \begin{pmatrix} 6 & 2 & 0 \\ 4 & 0 & 2 \\ 2 & 2 & 2 \end{pmatrix}
= -8 \det \begin{pmatrix} 3 & 1 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}
= -8(-4) = 32.
\]

Example: Calculate
\[
\det \begin{pmatrix}
x & 1 & 1 & 1 \\
1 & x & 1 & 1 \\
1 & 1 & x & 1 \\
1 & 1 & 1 & x
\end{pmatrix}.
\]
Solution: Subtracting the last row from each of the first three rows, we have
\[
\det \begin{pmatrix}
x & 1 & 1 & 1 \\
1 & x & 1 & 1 \\
1 & 1 & x & 1 \\
1 & 1 & 1 & x
\end{pmatrix}
= \det \begin{pmatrix}
x-1 & 0 & 0 & 1-x \\
0 & x-1 & 0 & 1-x \\
0 & 0 & x-1 & 1-x \\
1 & 1 & 1 & x
\end{pmatrix}
= (x-1)^3 \det \begin{pmatrix}
1 & 0 & 0 & -1 \\
0 & 1 & 0 & -1 \\
0 & 0 & 1 & -1 \\
1 & 1 & 1 & x
\end{pmatrix}
\]
and, adding the first three columns to the last one,
\[
= (x-1)^3 \det \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
1 & 1 & 1 & x+3
\end{pmatrix}
= (x-1)^3 (x+3).
\]

Example: Calculate the determinant dn of the n × n matrix
\[
\begin{pmatrix}
2 & -1 & 0 & \dots & 0 \\
-1 & 2 & -1 & \dots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \dots & -1 & 2
\end{pmatrix}.
\]
We can express the determinant in terms of the analogous (n − 1) × (n − 1) and
(n − 2) × (n − 2) determinants by expanding along the first column:
\[
d_n = 2 d_{n-1} + \det \begin{pmatrix}
-1 & 0 & 0 & \dots & 0 \\
-1 & 2 & -1 & \dots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \dots & -1 & 2
\end{pmatrix}
= 2 d_{n-1} - d_{n-2}.
\]
It then follows easily by induction that dn = n + 1.

Example: Calculate
\[
\det \begin{pmatrix}
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 1 & 0 \\
\vdots & & & & \vdots \\
1 & 0 & \dots & 0 & 0
\end{pmatrix}.
\]
We have
\[
\det \begin{pmatrix}
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 1 & 0 \\
\vdots & & & & \vdots \\
1 & 0 & \dots & 0 & 0
\end{pmatrix}
= (-1)^{n-1} \det \begin{pmatrix}
0 & \dots & 0 & 1 \\
0 & \dots & 1 & 0 \\
\vdots & & & \vdots \\
1 & \dots & 0 & 0
\end{pmatrix}
\]
where the left hand matrix is n × n and the right hand one is (n−1) × (n−1).
From this it follows that the value of the given determinant is
\[
(-1)^{n-1} (-1)^{n-2} \dots (-1)^{2-1} = (-1)^{n(n-1)/2}.
\]

Example: Calculate
\[
\det \begin{pmatrix}
x & a & a & \dots & a \\
a & x & a & \dots & a \\
\vdots & & \ddots & & \vdots \\
a & a & a & \dots & x
\end{pmatrix}.
\]
Solution: Subtracting the first row from each of the others, the required determinant is
\[
\det \begin{pmatrix}
x & a & a & \dots & a \\
a-x & x-a & 0 & \dots & 0 \\
\vdots & & \ddots & & \vdots \\
a-x & 0 & 0 & \dots & x-a
\end{pmatrix}
= x(x-a)^{n-1} - (a-x)\,a(x-a)^{n-2} + \dots
= x(x-a)^{n-1} + a(x-a)^{n-1} + \dots
= (x + (n-1)a)(x-a)^{n-1}.
\]

Example: Calculate the determinant of the Vandermonde matrix
\[
\begin{pmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \dots & x_2^{n-1} \\
\vdots & & & & \vdots \\
1 & x_n & x_n^2 & \dots & x_n^{n-1}
\end{pmatrix}.
\]

Solution: Subtracting from each column x1 times the one on its left we see
that the determinant is equal to
\[
\det \begin{pmatrix}
1 & 0 & \dots & 0 \\
1 & x_2 - x_1 & \dots & x_2^{n-2}(x_2 - x_1) \\
\vdots & & & \vdots \\
1 & x_n - x_1 & \dots & x_n^{n-2}(x_n - x_1)
\end{pmatrix}
\]
which is equal to
\[
(x_2 - x_1)(x_3 - x_1) \dots (x_n - x_1) \det \begin{pmatrix}
1 & x_2 & \dots & x_2^{n-2} \\
\vdots & & & \vdots \\
1 & x_n & \dots & x_n^{n-2}
\end{pmatrix}.
\]
Hence, by induction, the value of the determinant is
\[
\prod_{1 \le i < j \le n} (x_j - x_i)
\]
(a product of n(n − 1)/2 terms). (In particular, this determinant is non-zero
if the xi are distinct).
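A quick numerical sanity check of the product formula (my own sketch; the sample points are arbitrary):

```python
import numpy as np
from itertools import combinations

x = [2.0, 3.0, 5.0, 7.0]
n = len(x)
V = np.array([[xi ** k for k in range(n)] for xi in x])   # rows (1, x_i, ..., x_i^{n-1})
product = np.prod([x[j] - x[i] for i, j in combinations(range(n), 2)])
print(np.isclose(np.linalg.det(V), product))              # True
```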

Exercises: 1) Evaluate the determinants of the following matrices:
\[
\det \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\qquad
\det \begin{pmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{pmatrix}.
\]
2) Calculate the determinants of
\[
\det \begin{pmatrix}
a+b+2c & a & b \\
c & b+c+2a & b \\
c & a & c+a+2b
\end{pmatrix}
\qquad
\det \begin{pmatrix}
x & 1 & 0 & x \\
0 & x & x & 1 \\
1 & x & x & 0 \\
x & 0 & 1 & x
\end{pmatrix}.
\]
3) For which values of x does the determinant
\[
\det \begin{pmatrix} x & 1 & x \\ 0 & x & 1 \\ 2 & x & 1 \end{pmatrix}
\]

vanish?
4) Evaluate the following determinants:
\[
\det \begin{pmatrix}
1 & 2 & 3 & \dots & n \\
2 & 3 & 4 & \dots & 1 \\
\vdots & & & & \vdots \\
n & 1 & 2 & \dots & n-1
\end{pmatrix}
\qquad
\det \begin{pmatrix}
0 & 1 & 1 & \dots & 1 \\
-1 & 0 & 1 & \dots & 1 \\
\vdots & & & & \vdots \\
-1 & -1 & -1 & \dots & 0
\end{pmatrix}
\]
\[
\det \begin{pmatrix}
1 & -a & 0 & \dots & 0 \\
-b & 1 & -a & \dots & 0 \\
0 & -b & 1 & \dots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \dots & -b & 1
\end{pmatrix}
\qquad
\det \begin{pmatrix}
\lambda_1 & 1 & 0 & \dots & 0 \\
-1 & \lambda_2 & 1 & \dots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \dots & -1 & \lambda_n
\end{pmatrix}
\]
\[
\det \begin{pmatrix}
\lambda_1 & a & a & \dots & a \\
b & \lambda_2 & a & \dots & a \\
\vdots & & & \ddots & \vdots \\
b & b & b & \dots & \lambda_n
\end{pmatrix}
\qquad
\det \begin{pmatrix}
\lambda & a & 0 & \dots & 0 \\
a & \lambda & a & \dots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \dots & a & \lambda
\end{pmatrix}.
\]
5) Show that if P is a projection, then the dimension k of the range of P is
determined by the equation

2^k = det(I + P ).

Use this to show that if t 7→ P (t) is a continuous mapping from R into the
family of all projections on Rn , then the dimension of the range of P (t) is
constant.

6) Show that if A (resp. B) is an m × n matrix (resp. an n × m matrix),
then Im + AB is invertible if and only if In + BA is.
7) Let A and B be n × n matrices. Show that
• adj (AB) = adj B · adj A;

• adj (adj A) = (det A)^(n−2) A;

• adj (λA) = λ^(n−1) adj A.


8) Let A be an invertible matrix whose elements are integers. Show that A−1
has the same property if and only if the determinant of A is 1 or −1.
9) Let f be a mapping from Mn into R which is linear in each row and such
that f (A) = −f (B) whenever B is obtained from A by exchanging two rows.
Show that there is a c ∈ R so that f (A) = c · det A.
10) Show that if A is an n×n matrix with At = −A (such matrices are called
skew symmetric), then the determinant of A vanishes whenever n is odd.
11) Let A be an m × n matrix. Show that in the reduction of A to Hermitian
form by means of Gaußian elimination, the pivot element never vanishes (i.e.
we do not require the use of row exchanges) if and only if det Ak ≠ 0 for each
k ≤ min(m, n). (Ak is the k × k matrix [aij ]_{1≤i≤k, 1≤j≤k}.)
Deduce that A then has a factorisation of the form LU where L is an
invertible m × m lower triangular matrix and U is an m × n upper triangular
matrix.
12) Show that a matrix A has rank r if and only if there is an r × r minor of
A with non-vanishing determinant and no such (r + 1) × (r + 1) minor. (An
r-minor of A is a matrix of the form
\[
\begin{pmatrix}
a_{i_1, j_1} & \dots & a_{i_1, j_r} \\
\vdots & & \vdots \\
a_{i_r, j_1} & \dots & a_{i_r, j_r}
\end{pmatrix}
\]
for increasing sequences
\[
1 \le i_1 < i_2 < \dots < i_r \le m, \qquad 1 \le j_1 < j_2 < \dots < j_r \le n).
\]


13) Let x and y be continuously differentiable functions on the interval [a, b].
Apply Rolle’s theorem to the following function
 
\[
t \mapsto \det \begin{pmatrix}
1 & 1 & 1 \\
x(t) & x(a) & x(b) \\
y(t) & y(a) & y(b)
\end{pmatrix}.
\]

Which well-known result of analysis follows?
14) Let
\[
\begin{aligned}
x_1 &= r c_1 c_2 \dots c_{n-2} c_{n-1} \\
x_2 &= r c_1 \dots c_{n-2} s_{n-1} \\
&\ \ \vdots \\
x_j &= r c_1 \dots c_{n-j} s_{n-j+1} \\
&\ \ \vdots \\
x_n &= r s_1
\end{aligned}
\]
where ci = cos θi , si = sin θi (these are the equations of the transformation to
polar coordinates in n dimensions). Calculate the determinant of the Jacobi
matrix
\[
\frac{\partial(x_1, x_2, \dots, x_n)}{\partial(r, \theta_1, \dots, \theta_{n-1})}.
\]
15) Consider the Vandermonde matrix
\[
V_n = \begin{pmatrix}
1 & \dots & 1 \\
t_1 & \dots & t_n \\
\vdots & & \vdots \\
t_1^{n-1} & \dots & t_n^{n-1}
\end{pmatrix}.
\]
Show that Vn Vnt is the matrix
\[
\begin{pmatrix}
s_0 & s_1 & \dots & s_{n-1} \\
s_1 & s_2 & \dots & s_n \\
\vdots & & & \vdots \\
s_{n-1} & s_n & \dots & s_{2n-2}
\end{pmatrix}
\]
where s_k = \sum_{i=1}^{n} t_i^k.
Use this to calculate the determinant of this matrix.
B C
16) Suppose that the square matrix A has a block representation
D E
where B is square and non-singular. Show that
det A = det B det(E − DB −1 C).
Deduce that if D is also square and commutes with B, then det A = det(BE−
DC).
17) Suppose that A0 , . . . , Ar are complex n × n matrices and consider the
matrix function
p(t) = A0 + A1 t + · · · + Ar tr .
Show that if det p(t) is constant, then so is p(t) (i.e. A0 is the only non-
vanishing term).

1.4 Applications of the determinant
We conclude this chapter by listing briefly some applications of the determi-
nant:

I. Solving systems of equations—Cramer’s rule: returning to one of


our original motivations for introducing determinants, we show that if the
determinant of the matrix A of the system AX = Y is non-zero, then the
unique solution is given by the formulae
\[
x_i = \frac{\det A_i}{\det A}
\]
where Ai is the matrix obtained by replacing the i-th column of A with the
column vector Y . To see this note that we can write the system in the form
\[
x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \end{pmatrix} + \dots +
\begin{pmatrix} x_i a_{1i} - y_1 \\ \vdots \\ x_i a_{ni} - y_n \end{pmatrix} + \dots +
x_n \begin{pmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{pmatrix} = 0
\]
and this just means that the columns of the matrix
\[
\begin{pmatrix}
a_{11} & \dots & a_{1,i-1} & (x_i a_{1i} - y_1) & \dots & a_{1n} \\
\vdots & & & \vdots & & \vdots \\
a_{n1} & \dots & a_{n,i-1} & (x_i a_{ni} - y_n) & \dots & a_{nn}
\end{pmatrix}
\]
are linearly dependent. Hence its determinant vanishes and, using the
linearity in the i-th column, this means that det Ai − xi det A = 0.
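As an illustration of Cramer's rule (a sketch of my own, not part of the notes), one can solve a small system literally by replacing columns and dividing determinants; numpy's general solver is used only to confirm the answer.

```python
import numpy as np

def cramer(A, y):
    """Solve Ax = y via x_i = det(A_i) / det(A), where A_i has column i replaced by y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("det A = 0: Cramer's rule does not apply")
    x = np.empty(len(y))
    for i in range(len(y)):
        Ai = A.copy()
        Ai[:, i] = y                       # replace the i-th column by the right hand side
        x[i] = np.linalg.det(Ai) / d
    return x

A = [[2, 1], [1, 3]]
y = [3, 5]
print(cramer(A, y), np.linalg.solve(A, y))   # both give [0.8 1.4]
```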

II. The recognition of bases: If (x1 , . . . , xn ) is a basis for a vector space


V (e.g. the canonical basis for Rn ), then a set (x′1 , . . . , x′n ) is a basis if and
only if the determinant of the transfer matrix T = [tij ] whose columns are
the coordinates of the x′j with respect to the (xi ) is non-zero.

III. Areas, volumes etc.: If ξ, η are points in R, then
\[
\det \begin{pmatrix} \eta & 1 \\ \xi & 1 \end{pmatrix} = \eta - \xi
\]
is the directed length of the interval from ξ to η.
If A = (ξ1 , ξ2 ), B = (η1 , η2 ), C = (ζ1 , ζ2 ) are points in the plane, then
\[
\frac{1}{2} \det \begin{pmatrix}
\xi_1 & \xi_2 & 1 \\
\eta_1 & \eta_2 & 1 \\
\zeta_1 & \zeta_2 & 1
\end{pmatrix}
\]

is the area of the triangle ABC. The area is positive if the direction A →
B → C is anticlockwise, otherwise it is negative. (By taking the signed area we
assure that it is additive i.e. that

△ABC = △OAB + △OBC + △OCA

regardless of the position of O with respect to the triangle).
If A = (ξ1 , ξ2 , ξ3 ), B = (η1 , η2 , η3 ), C = (ζ1 , ζ2 , ζ3 ), D = (υ1 , υ2 , υ3 ) are
points in space, then
\[
\frac{1}{3!} \det \begin{pmatrix}
\xi_1 & \xi_2 & \xi_3 & 1 \\
\eta_1 & \eta_2 & \eta_3 & 1 \\
\zeta_1 & \zeta_2 & \zeta_3 & 1 \\
\upsilon_1 & \upsilon_2 & \upsilon_3 & 1
\end{pmatrix}
\]
is the volume of the tetrahedron ABCD.
Of course, analogous formulae hold in higher dimensions.

IV. The action of linear mappings on volumes: If f is a linear mapping
from R3 into R3 with matrix
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\]
then f multiplies volumes by a factor of det f . For suppose that f maps the
tetrahedron BCDE into the tetrahedron B1 C1 D1 E1 . The volume of BCDE is
\[
\frac{1}{3!} \det \begin{pmatrix}
\xi_1 & \xi_2 & \xi_3 & 1 \\
\eta_1 & \eta_2 & \eta_3 & 1 \\
\zeta_1 & \zeta_2 & \zeta_3 & 1 \\
\upsilon_1 & \upsilon_2 & \upsilon_3 & 1
\end{pmatrix}
\]
where B is the point (ξ1 , ξ2 , ξ3 ) etc. and that of the image is
\[
\frac{1}{3!} \det \begin{pmatrix}
\xi_1^1 & \xi_2^1 & \xi_3^1 & 1 \\
\eta_1^1 & \eta_2^1 & \eta_3^1 & 1 \\
\zeta_1^1 & \zeta_2^1 & \zeta_3^1 & 1 \\
\upsilon_1^1 & \upsilon_2^1 & \upsilon_3^1 & 1
\end{pmatrix}.
\]
Now we have
\[
\begin{pmatrix}
\xi_1^1 & \xi_2^1 & \xi_3^1 & 1 \\
\eta_1^1 & \eta_2^1 & \eta_3^1 & 1 \\
\zeta_1^1 & \zeta_2^1 & \zeta_3^1 & 1 \\
\upsilon_1^1 & \upsilon_2^1 & \upsilon_3^1 & 1
\end{pmatrix}
=
\begin{pmatrix}
\xi_1 & \xi_2 & \xi_3 & 1 \\
\eta_1 & \eta_2 & \eta_3 & 1 \\
\zeta_1 & \zeta_2 & \zeta_3 & 1 \\
\upsilon_1 & \upsilon_2 & \upsilon_3 & 1
\end{pmatrix}
\begin{pmatrix}
A^t & 0 \\
0 & 1
\end{pmatrix}
\]
and the determinant of the right hand factor is det A^t = det A,

and so we have that the volume of B1 C1 D1 E1 is det A times the volume of
BCDE. It follows from a limiting argument that the same formula holds for
arbitrary figures. (This justifies the original geometrical motivation for the
existence and properties of the determinant).
Once again, an analogous result holds in higher dimensions.

V. The equations of curves: If P = (ξ1^1 , ξ2^1 ) and Q = (ξ1^2 , ξ2^2 ) are distinct
points in the plane, then the line L through P and Q has equation
\[
\det \begin{pmatrix}
\xi_1 & \xi_2 & 1 \\
\xi_1^1 & \xi_2^1 & 1 \\
\xi_1^2 & \xi_2^2 & 1
\end{pmatrix} = 0.
\]
For if the equation of the line has the form aξ1 + bξ2 + c = 0, then we have
\[
a\xi_1^1 + b\xi_2^1 + c = 0, \qquad a\xi_1^2 + b\xi_2^2 + c = 0.
\]
This means that the above three homogeneous equations (in the variables
a, b, c) have a non-trivial solution. As we know, this is equivalent to the
vanishing of the above determinant.
In exactly the same way one shows that the plane through (ξ1^1 , ξ2^1 , ξ3^1 ),
(ξ1^2 , ξ2^2 , ξ3^2 ) and (ξ1^3 , ξ2^3 , ξ3^3 ) has equation
\[
\det \begin{pmatrix}
\xi_1 & \xi_2 & \xi_3 & 1 \\
\xi_1^1 & \xi_2^1 & \xi_3^1 & 1 \\
\xi_1^2 & \xi_2^2 & \xi_3^2 & 1 \\
\xi_1^3 & \xi_2^3 & \xi_3^3 & 1
\end{pmatrix} = 0.
\]
The circle through (ξ1^1 , ξ2^1 ), (ξ1^2 , ξ2^2 ) and (ξ1^3 , ξ2^3 ) has equation:
\[
\det \begin{pmatrix}
(\xi_1)^2 + (\xi_2)^2 & \xi_1 & \xi_2 & 1 \\
(\xi_1^1)^2 + (\xi_2^1)^2 & \xi_1^1 & \xi_2^1 & 1 \\
(\xi_1^2)^2 + (\xi_2^2)^2 & \xi_1^2 & \xi_2^2 & 1 \\
(\xi_1^3)^2 + (\xi_2^3)^2 & \xi_1^3 & \xi_2^3 & 1
\end{pmatrix} = 0.
\]
(Note that the coefficient of (ξ1)^2 + (ξ2)^2 is
\[
\det \begin{pmatrix}
\xi_1^1 & \xi_2^1 & 1 \\
\xi_1^2 & \xi_2^2 & 1 \\
\xi_1^3 & \xi_2^3 & 1
\end{pmatrix}
\]
and this fails to vanish precisely when the points are non-collinear).

VI. Orientation: A linear isomorphism f on a vector space V is said
to preserve orientation if its determinant is positive—otherwise it reverses
orientation. This concept is particularly important for isometries and those
which preserve orientation are called proper. Thus the only proper isome-
tries of the plane are translations and rotations.
Two bases (x1 , . . . , xn ) and (x′1 , . . . , x′n ) have the same orientation if
the linear mapping which maps xi onto x′i for each i preserves orientation.
This just means that the transfer matrix from (xi ) to (x′j ) has positive de-
terminant. For instance, in R3 , (e1 , e2 , e3 ) and (e3 , e1 , e2 ) have the same
orientation, whereas that of (e2 , e1 , e3 ) is different.

Example Is
\[
\begin{pmatrix}
\cos\alpha \cos\beta & \sin\alpha \cos\beta & -\sin\beta \\
\cos\alpha \sin\beta & \sin\alpha \sin\beta & \cos\beta \\
-\sin\alpha & \cos\alpha & 0
\end{pmatrix}
\]
the matrix of a rotation?
Solution: Firstly the columns are orthonormal and so the matrix induces an
isometry. But the determinant is
\[
-\sin^2\alpha \cos^2\beta - \cos^2\alpha \sin^2\beta - \sin^2\alpha \sin^2\beta - \cos^2\alpha \cos^2\beta = -1.
\]
Hence it is not a rotation.

Example: Solve the system
\[
\begin{aligned}
(m+1)x + y + z &= 2-m \\
x + (m+1)y + z &= -2 \\
x + y + (m+1)z &= m.
\end{aligned}
\]
The determinant of the matrix A of the system is m^2 (m + 3), which is
non-zero unless m = 0 or m = −3. For the other values of m the solution is,
by Cramer's rule,
\[
x = \frac{1}{m^2(m+3)} \det \begin{pmatrix}
2-m & 1 & 1 \\
-2 & m+1 & 1 \\
m & 1 & m+1
\end{pmatrix}
= \frac{2-m}{m}.
\]
The values of y and z can be calculated similarly.

Exercises: 1) Show that the centre of the circle through the points (ξ1^1 , ξ2^1 ),
(ξ1^2 , ξ2^2 ) and (ξ1^3 , ξ2^3 ) has coordinates
\[
\left( \frac{1}{2D} \det \begin{pmatrix}
(\xi_1^1)^2 + (\xi_2^1)^2 & \xi_2^1 & 1 \\
(\xi_1^2)^2 + (\xi_2^2)^2 & \xi_2^2 & 1 \\
(\xi_1^3)^2 + (\xi_2^3)^2 & \xi_2^3 & 1
\end{pmatrix},
\; -\frac{1}{2D} \det \begin{pmatrix}
(\xi_1^1)^2 + (\xi_2^1)^2 & \xi_1^1 & 1 \\
(\xi_1^2)^2 + (\xi_2^2)^2 & \xi_1^2 & 1 \\
(\xi_1^3)^2 + (\xi_2^3)^2 & \xi_1^3 & 1
\end{pmatrix} \right)
\quad \text{where} \quad
D = \det \begin{pmatrix}
\xi_1^1 & \xi_2^1 & 1 \\
\xi_1^2 & \xi_2^2 & 1 \\
\xi_1^3 & \xi_2^3 & 1
\end{pmatrix}.
\]

2) Show that in Rn the equation of the hyperplane through the affinely


independent points x1 , . . . , xn is
 
\[
\det \begin{pmatrix}
\xi_1 & \xi_2 & \dots & \xi_n & 1 \\
\xi_1^1 & \xi_2^1 & \dots & \xi_n^1 & 1 \\
\vdots & & & & \vdots \\
\xi_1^n & \xi_2^n & \dots & \xi_n^n & 1
\end{pmatrix} = 0.
\]

3) Let A be an invertible n × n matrix. Use that fact that if AX = Y , then


AX̃ = Ỹ where
   
\[
\tilde X = \begin{pmatrix}
x_1 & 0 & \dots & 0 \\
x_2 & 1 & \dots & 0 \\
\vdots & & \ddots & \vdots \\
x_n & 0 & \dots & 1
\end{pmatrix}
\qquad
\tilde Y = \begin{pmatrix}
y_1 & a_{12} & \dots & a_{1n} \\
y_2 & a_{22} & \dots & a_{2n} \\
\vdots & & & \vdots \\
y_n & a_{n2} & \dots & a_{nn}
\end{pmatrix}
\]

to give an alternative proof of Cramer’s rule.


4) Let

p(t) = a0 + · · · + am tm
q(t) = b0 + · · · + bn tn

be polynomials whose leading coefficients are non-zero. Show that they have
a common root if and only if the determinant of the (m+ n) ×(m+ n) matrix
 
\[
A = \begin{pmatrix}
a_m & a_{m-1} & \dots & a_1 & a_0 & 0 & \dots & 0 \\
0 & a_m & \dots & a_2 & a_1 & a_0 & \dots & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & \dots & 0 & a_m & a_{m-1} & \dots & a_0 \\
b_n & b_{n-1} & \dots & b_1 & b_0 & 0 & \dots & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & \dots & b_n & \dots & & & b_0
\end{pmatrix}
\]

is non-zero. (This is known as Sylvester's criterion for the existence of a
common root). In order to prove it calculate the determinants of the matrices
B and BA where B is the (m + n) × (m + n) matrix
\[
B = \begin{pmatrix}
t^{n+m-1} & 0 & 0 & \dots & 0 \\
t^{n+m-2} & 1 & 0 & \dots & 0 \\
t^{n+m-3} & 0 & 1 & \dots & 0 \\
\vdots & & & \ddots & \vdots \\
t & 0 & 0 & \dots & 0 \\
1 & 0 & 0 & \dots & 0 \\
\vdots & & & & \vdots \\
1 & & \dots & & 1
\end{pmatrix}.
\]

2 COMPLEX NUMBERS AND COMPLEX
VECTOR SPACES
2.1 The construction of C
When we discuss the eigenvalue problem in the next chapter, it will be con-
venient to consider complex vector spaces i.e. those for which the complex
numbers play the role taken by the reals in the third chapter. We therefore
bring a short introduction to the theme of complex numbers.
Complex numbers were stumbled on by the renaissance mathematician
Cardano in the famous formulae
\[
\lambda_1 = \sqrt[3]{\alpha} + \sqrt[3]{\beta}, \qquad
\lambda_2 = \omega\sqrt[3]{\alpha} + \omega^2\sqrt[3]{\beta}, \qquad
\lambda_3 = \omega^2\sqrt[3]{\alpha} + \omega\sqrt[3]{\beta}
\]
where
\[
\alpha = \frac{-q + \sqrt{q^2 + 4p^3}}{2}, \qquad
\beta = \frac{-q - \sqrt{q^2 + 4p^3}}{2}, \qquad
\omega = \frac{-1 + \sqrt{3}\,i}{2}
\]
for the roots λ1 , λ2 and λ3 of the cubic equation
\[
x^3 + 3px + q = 0
\]

(which he is claimed to have stolen from a colleague). In the above formulae


i denotes the square root of −1 i.e. a number with the property that i2 = −1.
This quantity appears already in the solution of the quadratic equation by
radicals but only in the case where the quadratic has no real roots. In
the cubic equation, its occurrence is unavoidable, even in the case where
the discriminant q 2 + 4p3 is positive, in which case the cubic has three real
roots. Since the square of a real number is positive, no such number with the
defining property of i exists and mathematicians simply calculated formally
with expressions of the form x + iy as if the familiar rules of arithmetic still
hold for such expressions. Just how uncomfortable they were in doing this is
illustrated by the following quotation from Leibniz:
• “The imaginary numbers are a free and marvellous refuge of the divine
intellect, almost an amphibian between existence and non-existence.”
The nature of the complex numbers was clarified by Gauß in the nine-
teenth century with the following geometrical interpretation: the real num-
bers are identified with the points on the x-axis of a coordinate plane. One
associates the number i with the point (0, 1) on the plane and, accordingly,
the number x + iy with the point (x, y). The arithmetical operations of
addition and multiplication are defined geometrically as in the following fig-
ure (where the triangles OAB and OCD are similar). A little analytical
geometry shows that these operations can be expressed as follows:

addition: (x, y) + (x1 , y1 ) = (x + x1 , y + y1 );
multiplication: (x, y) · (x1 , y1 ) = (xx1 − yy1 , xy1 + x1 y).
Note that these correspond precisely to the expressions obtained by formally
adding and multiplying x + iy and x1 + iy1 .
This leads to the following definition: a complex number is an ordered
pair (x, y) of real numbers. On the set of such numbers we define addition
and multiplication by the above formulae. We use the following conventions:
• 1) i denotes the complex number (0, 1) and we identify the real number
x with the complex number (x, 0). Then i² = −1 since
(0, 1) · (0, 1) = (−1, 0).
Every complex number (x, y) has a unique representation x + iy where
x, y ∈ R. (It is customary to use letters such as z, w, . . . for complex
numbers). If z = x + iy (x, y ∈ R), then x is called the real part of z
(written ℜz) and y is called the imaginary part (written ℑz).
• 2) If z = x + iy, we denote the complex number x − iy (i.e. the mirror
image of z in the x-axis) by z̄—the complex-conjugate of z. Then
the following simple relations hold:
\[
\overline{z + z_1} = \bar z + \bar z_1; \qquad
\overline{z z_1} = \bar z \cdot \bar z_1; \qquad
\Re z = \tfrac{1}{2}(z + \bar z); \qquad
\Im z = \tfrac{1}{2i}(z - \bar z); \qquad
z \cdot \bar z = |z|^2 \text{ where } |z| = \sqrt{x^2 + y^2}.
\]
|z| is called the modulus or absolute value of z. It is multiplicative
in the sense that |zz1 | = |z||z1 |.
• 3) every non-zero complex number z has a unique representation of the
form
ρ(cos θ + i sin θ)
where ρ > 0 and θ ∈ [0, 2π[. Here ρ = |z| and θ is the unique real
number in [0, 2π[ so that cos θ = x/ρ, sin θ = y/ρ.
We denote the set of complex numbers by C. Of course, as a set, it is
identical with R2 and we use the notation C partly for historical reasons and
partly to emphasis the fact that we are considering it not just as a vector
space but also with its multiplicative structure.

Proposition 5 For complex numbers z, z1 , z2 , z3 we have the relationships

• z1 + z2 = z2 + z1 ;

• z1 + (z2 + z3 ) = (z1 + z2 ) + z3 ;

• z1 + 0 = 0 + z1 = z1 ;

• z1 + (−z1 ) = 0 where − z1 = (−1)z1 ;

• z1 (z2 + z3 ) = z1 z2 + z1 z3 ;

• z1 z2 = z2 z1 ;

• z1 (z2 z3 ) = (z1 z2 )z3 ;

• z1 · 1 = 1 · z1 = z1 ;

• if z ≠ 0, there is an element z −1 so that z · z −1 = 1 (take z −1 = z̄/|z|² ).

This result will be of some importance for us since in our treatment of linear
equations, determinants, vector spaces and so on, the only properties of the
real numbers that we have used are those which correspond to the above
list. Hence the bulk of our definitions, results and proofs can be carried over
almost verbatim to the complex case and, with this justification, we shall use
the complex versions of results which we have proved only for the real case
without further comment.
It is customary to call a set with multiplication and addition operations
with such properties a field. A further example of a field is the set Q of
rational numbers.

de Moivre’s formula: The formula for multiplication of complex numbers


has a particularly transparent form when the complex numbers are repre-
sented in polar form: we have

ρ1 (cos θ1 + i sin θ1 )ρ2 (cos θ2 + i sin θ2 ) = ρ1 ρ2 (cos(θ1 + θ2 ) + i sin(θ1 + θ2 )).

This is derived by multiplying out the left hand side and using the addition
formulae for the trigonometric functions.
This equation can be interpreted geometrically as follows: multiplication
by the complex number z = ρ(cos θ + i sin θ) has the effect of rotating a
second complex number through an angle of θ and multiplying its absolute
value by ρ (of course this is one of the similarities considered in the second

chapter—in fact, a rotary dilation). As a Corollary of the above formula we
have the famous result

(cos θ + i sin θ)n = cos nθ + i sin nθ

of de Moivre. It is obtained by a simple induction argument on n ∈ N.


Taking complex conjugates gives the result for −n and so it holds for each
n ∈ Z.
From it we can deduce the following fact: if z = ρ(cos θ + i sin θ) is a
non-zero complex number, then there are n solutions of the equation ζ n = z
(n ∈ N) given by the complex numbers
\[
\rho^{1/n}\left(\cos\frac{2\pi r + \theta}{n} + i \sin\frac{2\pi r + \theta}{n}\right)
\]
(r = 0, 1, . . . , n−1). In particular, there are n roots of unity (i.e. solutions of
the equations ζ n = 1), namely the complex numbers 1, ω, ω², . . . , ω^(n−1) where
ω = cos(2π/n) + i sin(2π/n) is the primitive n-th root of unity.
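The n-th roots described here are easy to generate numerically; the following sketch (my own, using Python's cmath module) applies the polar-form formula directly.

```python
import cmath

def nth_roots(z, n):
    """The n solutions of zeta^n = z, via the polar form of z."""
    rho, theta = abs(z), cmath.phase(z)
    return [rho ** (1 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * r) / n)
            for r in range(n)]

# the cube roots of 8 are 2, 2*omega and 2*omega^2
for zeta in nth_roots(8, 3):
    print(zeta, zeta ** 3)    # each cube is (numerically) 8
```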

Example: If z is a complex number, not equal to one, show that


\[
1 + z + z^2 + \dots + z^n = \frac{1 - z^{n+1}}{1 - z}.
\]
Use this to calculate the sums

1 + cos θ + · · · + cos nθ

and
sin θ + · · · + sin nθ.
Solution: The first part is proved exactly as in the case of the partial sums
of a real geometric series. If we set z = cos θ + i sin θ and take the real part,
we get
\[
1 + \cos\theta + \dots + \cos n\theta = \Re\,\frac{1 - \cos(n+1)\theta - i\sin(n+1)\theta}{1 - \cos\theta - i\sin\theta}
\]
which simplifies to the required formula (we leave the details to the reader).
The sine part is calculated with the aid of the imaginary part.

Example: Describe the geometric form of the set

C = {z ∈ C : zz̄ + āz + az̄ + b = 0}

where a is a complex number and b a real number.

Solution Substituting z = x + iy we get

C = {(x, y) : x2 + y 2 + 2a1 x + 2a2 y + b = 0}

(where a = a1 + ia2 ) which is a circle, a point or the empty set depending


on the values of a1 , a2 and b.

Exercises on complex numbers: 1) Calculate ℜz, ℑz, |z|, z −1 and arg z
where
\[
z = 1 - i, \qquad z = \sqrt{3} + 2i, \qquad z = \frac{1+i}{1-i}.
\]
Show that if z1 , z2 ∈ C are such that z1 z2 and z1 + z2 are real, then either
z1 = z̄2 or z1 and z2 are themselves real.
2) If x, x1 , y, y1 ∈ R and X = diag(x, x), Y = diag(y, y) etc. while
\[
J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},
\]
calculate (X + JY )(X − JY ), (X + JY )(X1 + JY1 ), (X + JY )^(−1)
(when it exists) and det(X + JY ). (Compare the results with the arithmetic
operations in C. This exercise can be used as the basis for an alternative
construction of the complex numbers).
3) Use de Moivre’s theorem to derive a formula for cos4 θ in terms of cos 2θ
and cos 4θ.
4) Show that if n is even, then
\[
\cos n\theta = \cos^n\theta - \binom{n}{2}\cos^{n-2}\theta\,\sin^2\theta + \dots + (-1)^{n/2}\sin^n\theta;
\]
\[
\sin n\theta = \binom{n}{1}\cos^{n-1}\theta\,\sin\theta + \dots + (-1)^{n/2 - 1}\, n\,\cos\theta\,\sin^{n-1}\theta.
\]

What are the corresponding results for n odd?


5) Suppose that |r| < 1. Calculate

1 + r cos θ + r 2 cos 2θ + . . .

and
r sin θ + r 2 sin 2θ + . . .
6) Show that the points z1 , z2 and z3 in the complex plane are the vertices
of an equilateral triangle if and only if

z1 + ωz2 + ω 2 z3 = 0

or
z1 + ω 2 z2 + ωz3 = 0

where ω = e^(2πi/3) .
If z1 , z2 , z3 , z4 are four complex numbers, what is the geometrical signifi-
cance of the condition

z1 + iz2 + (i2 )z3 + (i3 )z4 = 0?

(Note that i is the primitive fourth root of unity).


7) Show that if z1 , z2 , z3 , z4 are complex numbers, then

(z1 − z4 )(z2 − z3 ) + (z2 − z4 )(z3 − z1 ) + (z3 − z4 )(z1 − z2 ) = 0.

Deduce that if A, B, C, D are four points in the plane, then

|AD||BC| ≤ |BD||CA| + |CD||AB|.

8) We defined the complex numbers formally as pairs of real numbers. In


this exercise, we investigate what happens if we continue this process i.e. we
consider pairs (z, w) of complex numbers. On the set Q of such pairs we
define the natural addition and multiplication as follows:

(z0 , w0 )(z1 , w1 ) = (z0 z1 − w̄0 w1 , z̄0 w1 + z1 w0 ).

Show that Q satisfies all of the axioms of a field with the exception of the
commutativity of multiplication (such structures are called skew fields).
Show that if we put i = (i, 0), j = (0, i), k = (0, 1), then ij = −ji = k,
jk = −kj = i etc. and i2 = j2 = k2 = −1. Also every element of Q has a
unique representation of the form

ξ0 + (ξ1 i + ξ2 j + ξ3 k)

with ξ0 , ξ1, ξ2 , ξ3 ∈ R. (The elements of Q are called quaternions).

2.2 Polynomials
The field of complex numbers has one significant advantage over the real
field. All polynomials have roots. This result will be very useful in the next
chapter—it is known as the fundamental theorem of algebra and can
be stated in the following form:

Proposition 6 Let

p(λ) = a0 + · · · + an−1 λn−1 + λn

be a complex polynomial with n ≥ 1. Then there are complex numbers


λ1 , . . . , λn so that
p(λ) = (λ − λ1 ) . . . (λ − λn ).

There is no simple algebraic proof of this result which we shall take for
granted.
The fundamental theorem has the following Corollary on the factorisation
of real polynomials.

Corollary 1 Let
p(t) = a0 + · · · + an−1 tn−1 + tn
be a polynomial with real coefficients. Then there are real numbers

t1 , . . . , tr , α1 , . . . , αs , β1 , . . . , βs

where r + 2s = n so that

p(t) = (t − t1 ) . . . (t − tr )(t2 − 2α1 t + α12 + β12 ) . . . (t2 − 2αs t + αs2 + βs2 ).

Proof. We denote by λ1 , . . . , λn the complex roots of the polynomial

p(λ) = a0 + a1 λ + · · · + λn .

Since p(λ) = p(λ̄) (the coefficients being real), we see that a complex number
λ is a root if and only if its complex conjugate is also one. Hence we can list
the roots of p as follows: firstly the real ones

t1 , . . . , tr

and then the complex ones in conjugate pairs:

α1 + iβ1 , α1 − iβ1 , . . . , αs + iβs , αs − iβs .

Then we see that p has the required form by multiplying out the correspond-
ing linear and quadratic terms.
The next result concerns the representation of rational functions. These
are functions of the form p/q where p and q are polynomials. By long division
every such function can be expressed as the sum of a polynomial and a
rational function p̃/q where the degree d(p̃) of p̃ (i.e. the index of its highest
power) is strictly less than that of q. Hence from now on we shall tacitly
assume that this condition is satisfied. Further it is no loss of generality to
suppose that the leading coefficient of q is “1”.
We consider first the case where q has simple zeros i.e.
q(λ) = (λ − λ1 ) . . . (λ − λn )
where the λi are distinct. Then we claim that there are uniquely determined
complex numbers a1 , . . . , an so that
\[
\frac{p(\lambda)}{q(\lambda)} = \frac{a_1}{\lambda - \lambda_1} + \dots + \frac{a_n}{\lambda - \lambda_n}
\]
for λ ∈ C \ {λ1 , . . . , λn }.
Proof. This is equivalent to the equation
\[
p(\lambda) = \sum_{i=1}^{n} a_i q_i(\lambda)
\]

where qi (λ) = q(λ)/(λ − λi ). If this holds for all λ as above then it holds for all λ in
C since both sides are polynomials. Substituting successively λ1 ,λ2 , . . . , λn
in the equation we see that
\[
a_1 = \frac{p(\lambda_1)}{(\lambda_1 - \lambda_2)\dots(\lambda_1 - \lambda_n)}, \quad \dots, \quad
a_n = \frac{p(\lambda_n)}{(\lambda_n - \lambda_1)\dots(\lambda_n - \lambda_{n-1})}.
\]
The general result (i.e. where q has multiple zeros) is more complicated to
state. We suppose that
\[
q(\lambda) = (\lambda - \lambda_1)^{n_1} \dots (\lambda - \lambda_r)^{n_r}
\]
where the λi are distinct and claim that the rational function can be written
as a linear combination of functions of the form 1/(λ − λi )^j for 1 ≤ i ≤ r and
1 ≤ j ≤ ni .

Proof. Write
\[
\frac{p(\lambda)}{q(\lambda)} = \frac{p(\lambda)}{(\lambda - \lambda_1)^{n_1} q_1(\lambda)}
\]
where q1 (λ) = (λ − λ2 )^(n_2) . . . (λ − λr )^(n_r). We claim that there is a polynomial
p1 with d(p1 ) = d(p) − 1 and an a ∈ C so that
\[
\frac{p(\lambda)}{(\lambda - \lambda_1)^{n_1} q_1(\lambda)} =
\frac{a}{(\lambda - \lambda_1)^{n_1}} + \frac{p_1(\lambda)}{(\lambda - \lambda_1)^{n_1 - 1} q_1(\lambda)}
\]
from which the proof follows by induction.
For the above equation is equivalent to the following one:
\[
\frac{p(\lambda) - a q_1(\lambda)}{q(\lambda)} = \frac{p_1(\lambda)}{(\lambda - \lambda_1)^{n_1 - 1} q_1(\lambda)}.
\]
Hence it suffices to choose a ∈ C so that p(λ) − aq1 (λ) contains a factor
λ − λ1 and there is precisely one such a, namely a = p(λ1 )/q1 (λ1 ).
We remark that the degree function satisfies the following properties:
d(p + q) ≤ max(d(p), d(q))
with equality if d(p) ≠ d(q), and
d(pq) = d(p) + d(q)
The standard high school method for the division of polynomials can be
used to prove the existence part of the following result:
Proposition 7 Let p and q be polynomials with d(p) ≥ 1. Then there are
unique polynomials r, s so that
q = ps + r
where r = 0 or d(r) < d(p).
Proof. In the light of the above remark, we can confine ourselves to a proof
of the uniqueness: suppose that
q = ps + r = ps1 + r1
for suitable s, r, s1 , r1 . Then
p(s − s1 ) = r − r1 .
Now the right hand side is a polynomial of degree strictly less than that of p
and hence so is the left hand side. But this can only be the case if s = s1 .

The above division algorithm can be used to prove an analogue of the
Euclidean algorithm for determining the greatest common divisor of two
polynomials p, q. We say that for two such polynomials, q is a divisor of
p (written q | p) if there is a polynomial s so that p = qs. Note that then
d(p) ≥ d(q) (where d(p) denotes the degree of p). Hence if p | q and q | p, then
d(p) = d(q) and it follows that p is a non-zero constant times q (we are tacitly
assuming that the polynomials p and q are both non-zero). The greatest
common divisor of p and q is by definition a common divisor which has
as divisor each other divisor of p and q. It is then uniquely determined up
to a scalar multiple and we denote it by g.c.d. (p, q). It can be calculated as
follows: we suppose that d(q) ≤ d(p) and use the division algorithm to write

p = qs1 + r1

with r1 = 0 or d(r1 ) < d(q). In the first case, q is the greatest common
divisor. Otherwise we write

q = s2 r1 + r2

then
r1 = s3 r2 + r3
and continue until we reach a final equation rk = sk+2 rk+1 without remainder.
Then rk+1 is the greatest common divisor and by substituting backwards
along the equations, we can compute a representation of it in the form mp+nq
for suitable polynomials m and n.
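The division algorithm and the resulting Euclidean algorithm can be sketched in a few lines (my own illustration; polynomials are represented as coefficient lists with the highest power first, and the computed gcd is only determined up to a scalar multiple).

```python
def poly_divmod(p, q):
    """p = q*s + r with deg r < deg q.  Coefficient lists, highest power first."""
    r = [float(c) for c in p]
    s = []
    while len(r) >= len(q):
        coeff = r[0] / q[0]
        s.append(coeff)
        for j in range(len(q)):            # subtract coeff * q * t^(deg r - deg q)
            r[j] -= coeff * q[j]
        r.pop(0)                           # the leading coefficient is now zero
    return s, r

def poly_gcd(p, q):
    """Euclidean algorithm: the last non-zero remainder."""
    while q and any(abs(c) > 1e-9 for c in q):
        _, r = poly_divmod(p, q)
        while r and abs(r[0]) < 1e-9:      # strip leading zeros of the remainder
            r.pop(0)
        p, q = q, r
    return p

# (t-1)(t-2) = t^2 - 3t + 2 and (t-1)(t-3) = t^2 - 4t + 3 have gcd t - 1
print(poly_gcd([1, -3, 2], [1, -4, 3]))    # [1.0, -1.0]
```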

Lagrange interpolation A further useful property of polynomials is the


following interpolation method: suppose that we have (n + 1) distinct points
t0 , . . . , tn in R. Then for any complex numbers a0 , . . . , an we can find a
polynomial p of degree at most n so that p(ti ) = ai for each i. To do this
note that the polynomial
\[
p_i(t) = \prod_{j \neq i} \frac{t - t_j}{t_i - t_j}
\]

has the property that it takes on the value 1 at ti and 0 at the other tj . Then
\[
p = \sum_{i=0}^{n} a_i p_i
\]

is the required polynomial.
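A direct implementation of this interpolation formula (my own sketch; the sample data are arbitrary) evaluates p at a point by summing the terms a_i p_i(t):

```python
def lagrange_eval(points, values, t):
    """Evaluate the Lagrange interpolating polynomial through (t_i, a_i) at t."""
    total = 0.0
    for i, (ti, ai) in enumerate(zip(points, values)):
        pi = 1.0
        for j, tj in enumerate(points):
            if j != i:
                pi *= (t - tj) / (ti - tj)   # p_i is 1 at t_i and 0 at the other t_j
        total += ai * pi
    return total

# the parabola through (0, 1), (1, 2), (2, 5) is t^2 + 1
print([lagrange_eval([0, 1, 2], [1, 2, 5], t) for t in (0, 1, 2, 3)])  # [1.0, 2.0, 5.0, 10.0]
```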

Exercises: 1) Show that a complex number λ0 is a root of order r of the
polynomial p (i.e. (λ − λ0 )r divides p) if and only if

p(λ0 ) = p′ (λ0 ) = · · · = p(r−1) (λ0 ) = 0.

2) Show that, in order to prove the fundamental theorem of algebra, it suffices


to prove that every polynomial p of degree ≥ 1 has at least one zero.
3) Prove the statement of 2) (and hence the fundamental theorem of algebra)
by verifying the following steps:

• show that if p is a non-constant polynomial, then there is a point λ0 in


C so that |p(λ0 )| ≤ |p(λ)|(λ ∈ C).

• show that p(λ0 ) = 0.

(If a0 = p(λ0 ) is non-zero, consider the Taylor expansion

p(λ) = a0 + ar (λ − λ0 )r + · · · + an (λ − λ0 )n

of p at λ0 where ar is the first coefficient after a0 which does not vanish.


Show that there is a point λ1 so that |p(λ1 )| < |a0 | which is a contradiction).
4) Show that the set of rational functions is a field with the natural algebraic
operations.

2.3 Complex vector spaces and matrices
We are now in a position to define complex vector spaces.

Definition: A complex vector space (or vector space over C) is a


set V together with an addition and a scalar multiplication i.e. mappings
(x, y) 7→ x + y resp. (λ, x) 7→ λx from V × V into V resp. from C × V into
V so that
• x + y = y + x (x, y ∈ V );
• x + (y + z) = (x + y) + z (x, y, z ∈ V );
• there is a vector 0 so that x + 0 = x for each x ∈ V ;
• for each x ∈ V there is a vector y so that x + y = 0;
• (λµ)x = λ(µx) (λ, µ ∈ C, x ∈ V );
• 1 · x = x (x ∈ V );
• λ(x + y) = λx + λy and (λ + µ)x = λx + µx (λ, µ ∈ C, x, y ∈ V ).
The following modifications of our examples of real vector spaces provide us
with examples of complex ones:
• Cn —the space of n-tuples (λ1 , . . . , λn ) of complex numbers;
• PolC (n)—the space of polynomials with complex coefficients of degree
at most n;
• M_{m,n}^C —the space of m × n matrices with complex elements.
We can then define linear dependence resp. independence for elements of a
complex vector space and hence the concept of basis. Every complex vector
space which is spanned by finitely many elements has a basis and so is iso-
morphic to Cn where n is the dimension of the space i.e. the cardinality of
a basis.
If V and W are complex vector spaces, the notion of a linear mapping
from V into W is defined exactly as in the real case (except, of course, for
the fact that the homogeneity condition f (λx) = λf (x) must now hold for all
complex λ). Such a mapping f is determined by a matrix [aij ] with respect
to bases (x1 , . . . , xn ) resp. (y1 , . . . , ym ) where the elements of the matrix are
now complex numbers and are determined by the equations
\[
f(x_j) = \sum_{i=1}^{m} a_{ij} y_i.
\]

The theory of chapters I and V for matrices can then be carried over in the
obvious way to complex matrices.
Sometimes it is convenient to be able to pass between complex and real
vectors and this can be achieved as follows: if V is a complex vector space,
then we can regard it as a real vector space simply by ignoring the fact that
we can multiply by complex scalars. We denote this space by VR . This
notation may seem rather pedantic—but note that if the dimension of V is n
then that of VR is 2n. This reflects the fact that elements of V can be linearly
dependent in V without being so in VR since there are fewer possibilities for
building linear combinations in the latter. For example, the sequence

(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, . . . , 0, 1)

is a basis for Cn whereas the longer sequence

(1, 0, . . . , 0), (i, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, . . . , 0, i)

is necessary to attain a basis for the real space Cn which is thus 2n dimen-
sional.
On the other hand, if V is a real vector space we can define a correspond-
ing complex vector space VC as follows: as a set VC is V × V . It has the
natural addition and scalar multiplication is defined by the equation

(λ + iµ)(x1 , x2 ) = (λx1 − µx2 , µx1 + λx2 ).

The dimensions of V (as a real space) and VC (as a complex space) are the
same. If f : V → W is a linear mapping between complex vector space then
it is a fortiori a linear mapping from VR into WR . However, a real linear
mapping between the latter spaces need not be complex-linear. On the other
hand, if f : V → W is a linear mapping between real vector spaces, we can
extend it to a complex linear mapping fC between VC and WC by defining

fC (x1 , x2 ) = (f (x1 ), f (x2 )).

Exercises: 1) Solve the system

(1 − i)x − 9y =0
2x + (1 − i)y = 1.

2) Find a Hermitian form for the matrix


 
i −(1 + i) 1
 1 2 −1  .
2i 1 1

3) Show that if z1 , z2 , z3 are complex numbers, then
 
\[
\det \begin{pmatrix}
z_1 & \bar z_1 & 1 \\
z_2 & \bar z_2 & 1 \\
z_3 & \bar z_3 & 1
\end{pmatrix}
\]

is 4i times the area of the triangle with z1 , z2 , z3 as vertices.


4) Let A and B be real n × n matrices. Show that if A and B are similar
as complex matrices (i.e. if there is an invertible complex matrix P so that
P −1 AP = B) then they are similar as real matrices (i.e. there is a real matrix
P with the same property).
5) Let A and B be real n × n matrices. Show that
\[
\overline{\det(A + iB)} = \det(A - iB)
\]
and that
\[
\det \begin{pmatrix} A & -B \\ B & A \end{pmatrix} = |\det(A + iB)|^2 .
\]
6) (The following exercise shows that complex 2 × 2 matrices can be used to
give a natural approach to the two products in R3 ). Consider the space M2C
of 2 × 2 complex matrices. If A is such a matrix, say
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\]
then we write A∗ for the matrix
\[
\begin{pmatrix} \bar a_{11} & \bar a_{21} \\ \bar a_{12} & \bar a_{22} \end{pmatrix}.
\]

(The significance of this matrix will be discussed in more detail in Chapter


VII). We let E3 denote the family of those A which satisfy the conditions
A = A∗ and tr A = 0. The set of such matrices is a real vector space (but
not a complex one). In fact if x = (ξ1 , ξ2, ξ3 ) is an element of R3 , then the
matrix  
ξ3 ξ1 − iξ2
Ax =
ξ1 + ξ2 −ξ3
is in E3 . Show that this induces an isomorphism between R3 and E3 . Show
that E3 is not closed under (matrix) products but that if A, B ∈ E3 , then
\[
A * B = \frac{1}{2}(AB + BA)
\]

is also in E3 and that

Ax ∗ Ay = (x|y)I2 + Ax×y .

7) Complex 2 × 2 matrices also allow a natural approach to the subject of


quaternions (cf. Exercise 8) of section VI.1). Consider the set of matrices of
the form
\[
\begin{pmatrix} z & w \\ \bar w & -\bar z \end{pmatrix}
\]
where z, w ∈ C. Show that this is closed under addition and multiplication
and that the mapping
\[
(z, w) \mapsto \begin{pmatrix} z & w \\ \bar w & -\bar z \end{pmatrix}
\]
is a bijection between Q and the set of such matrices which preserves the
algebraic operations. (Note that under this identification, the special quater-
nions have the form
     
\[
i = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
j = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad
k = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\]

and that the quaternion ξ0 + ξ1 i + ξ2 j + ξ3 k is represented by the matrix


ξ0 I2 + Ax where Ax is as in exercise 6)).
8) Use the results of the last two exercises to give new proofs of the following
identities involving the vector and scalar products:

x × y = −y × x;
‖x × y‖ = ‖x‖ ‖y‖ sin θ where θ is the angle between x and y;
(x × y) × z = (x|z)y − (y|z)x;
(x × y) × z + (y × z) × x + (z × x) × y = 0.

3 EIGENVALUES
3.1 Introduction
In this chapter we discuss the so-called eigenvalue problem for operators
or matrices. This means that for a given operator f ∈ L(V ) a scalar λ and
a non-zero vector x are sought so that f (x) = λx (i.e. the vector x is not
rotated by f ). Such problems arise in many situations, some of which we
shall become acquainted with in the course of this chapter. In fact, if the
reader examines the discussion of conic sections in the plane and three di-
mensional space he will recognise that the main point in the proof was the
solution of an eigenvalue problem. The underlying theoretical reason for the
importance of eigenvalues is the following: we know that a matrix is the
coordinate representation of an operator. Even in the most elementary an-
alytic geometry one soon appreciates the advantage of choosing a basis for
which the matrix has a particularly simple form. The simplest possible form
is that of a diagonal matrix and the reader will observe that we obtain such
a representation precisely when the basis elements are so-called eigenvectors
of the operator f i.e. they satisfy the condition f (xi ) = λi xi for suitable
eigenvalues λ1 , . . . , λn (which then form the diagonal elements of the corre-
sponding matrix). Stated in terms of matrices this comprises what we may
call the diagonalisation problem: given an n × n matrix A can we find an
invertible matrix S so that S −1 AS is diagonal?
Amongst the advantages that such a diagonalisation brings is the fact that
one can then calculate simply and quickly arbitrary powers and thus poly-
nomial functions of a matrix by doing this for the diagonal matrix and then
transforming back. We shall discuss some applications of this below. On the
other hand, if A is the matrix of a linear mapping in Rn we can immediately
read off the geometrical form of the latter from its diagonalisation.
We begin with the formal definition. If f ∈ L(V ), an eigenvalue of f
is a scalar λ so that there exists a non-zero x with f (x) = λx. The space
Ker (f − λId) is then non-trivial and is called the eigenspace of λ and each
non-zero element therein is called an eigenvector. Our main concern in
this chapter will be the following: given an operator f , can we find a basis
for V consisting of eigenvectors? In general the answer is no as very simple
examples show but we shall obtain a result which, while being much less
direct, is still useful in theory and applications.
We can restate the eigenvalue problem in terms of matrices: an eigenvalue
resp. eigenvector for an n × n matrix A is an eigenvalue resp. eigenvector for
the operator fA i.e. λ is an eigenvalue if and only if there exists a non-zero
column vector X so that AX = λX and X is then called an eigenvector.

Before beginning a systematic development of the theory, we consider a
simple example where an eigenvalue problem arises naturally—in this case
in the solution of a linear system of ordinary differential equations:

Example: Consider the coupled system


$$\frac{df}{dt} = 3f + 2g, \qquad \frac{dg}{dt} = f + 2g.$$
If we introduce the (vector-valued) function
 
$$F(t) = \begin{pmatrix} f(t) \\ g(t) \end{pmatrix}$$

we can write this equation in the form


$$\frac{dF}{dt} = AF$$
where $A = \begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix}$ and $\frac{dF}{dt}$ is, of course, the function $\begin{pmatrix} df/dt \\ dg/dt \end{pmatrix}$.
Formally, this looks very much like one of the simplest of all differential
equations—the equation
$$\frac{df}{dt} = af$$
and by analogy, we try the substitution
 
$$F(t) = e^{\lambda t}X \quad\text{where}\quad X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

This leads to the condition

Aeλt X = AF (t) = F ′ (t) = λeλt X

or AX = λX i.e. an eigenvalue problem for A which we now proceed to


solve. The column vector must be a non-trivial solution of the homogeneous
system
(3 − λ)x1 + 2x2 = 0
x1 + (2 − λ)x2 = 0.

Now we know that such a solution exists if and only if the determinant of
the corresponding matrix vanishes. This leads to the equation

(3 − λ)(2 − λ) − 2 = 0

for λ which has solutions λ = 1 or λ = 4.


If we solve the corresponding homogeneous systems we get eigenvectors
(−1, 1) and (2, 1) respectively and they form a basis for R2 . Hence if
 
$$S = \begin{pmatrix} -1 & 2 \\ 1 & 1 \end{pmatrix}$$

is the matrix whose columns are the eigenvectors of A we see that


 
$$S^{-1}AS = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}.$$

Then our differential equation has the solution


   
$$F(t) = c_1 e^{t}\begin{pmatrix} -1 \\ 1 \end{pmatrix} + c_2 e^{4t}\begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

i.e. $f(t) = -c_1 e^{t} + 2c_2 e^{4t}$, $g(t) = c_1 e^{t} + c_2 e^{4t}$ for arbitrary constants $c_1, c_2$.
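The same computation can be carried out numerically. The sketch below assumes Python with numpy; the initial value F0 is made up for illustration.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [1.0, 2.0]])

# eigenvalues and eigenvectors (columns of S are eigenvectors of A)
lam, S = np.linalg.eig(A)
print(lam)                           # 4 and 1, in some order
print(np.linalg.inv(S) @ A @ S)      # approximately diag(lam)

def F(t, F0):
    # F(t) = S diag(exp(lam t)) S^{-1} F(0) solves dF/dt = AF
    return S @ np.diag(np.exp(lam * t)) @ np.linalg.inv(S) @ F0

F0 = np.array([1.0, 0.0])
print(F(0.5, F0))
```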

3.2 Characteristic polynomials and diagonalisation
The above indicates the following method for characterising eigenvalues:
Proposition 8 If A is an n × n matrix, then λ is an eigenvalue of A if and
only if λ is a root of the equation

det(A − λI) = 0.

Proof. λ is an eigenvalue if and only if the homogeneous equation AX =


λX has a non-trivial solution and the matrix of this equation is A − λI.
We remark that det(A − λI) = 0 is a polynomial equation of degree n.
From this simple result we can immediately draw some useful conclusions:
I. Every matrix over C has n eigenvalues, the n roots of the above equa-
tion (although some of the eigenvalues can occur as repeated roots of the
equation).
II. Every real matrix has at least one real eigenvalue if n is odd (in particular
if n = 3). The example of a rotation in R2 shows that 2 × 2 matrices over R
need not have any real eigenvalues.
III. If A is a triangular matrix, say
 
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ 0 & a_{22} & \dots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & a_{nn} \end{pmatrix},$$

then the eigenvalues of A are just the diagonal elements a11 , . . . , ann (in
particular, this holds if A is diagonal). For

det(A − λI) = (a11 − λ) . . . (ann − λ)

with roots a11 , . . . , ann .


Owing to the special role played by the polynomial det(A − λI) in the
eigenvalue problem we give it a special name—the characteristic polyno-
mial of A—in symbols χA . For example, if
 
$$A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$$

then χA (λ) = λ(1 − λ)(λ − 3).
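This can be checked numerically, for instance with numpy (assumed available here); note that numpy's convention is the monic polynomial det(λI − A), which differs from χA by the factor (−1)^n.

```python
import numpy as np

A = np.array([[1, -1, 0],
              [-1, 2, -1],
              [0, -1, 1]])

# np.poly(A) returns the coefficients of det(lambda*I - A) = (-1)^n chi_A(lambda)
print(np.poly(A))              # [ 1. -4.  3.  0.]  i.e.  lambda^3 - 4 lambda^2 + 3 lambda
print(np.roots(np.poly(A)))    # the eigenvalues 3, 1, 0 (in some order)
print(np.linalg.eigvals(A))    # the same spectrum, computed directly
```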


For an operator f ∈ L(V ), the characteristic polynomial is defined by the
equation
χf (λ) = det(f − λId)

and the eigenvalues are the roots of this polynomial.
We now turn to the topic of the diagonalisation problem. The connection
with the eigenvalue problem is made explicit in the following result:
Proposition 9 A linear operator f ∈ L(V ) is diagonalisable if and only if
V has a basis (x1 , . . . , xn ) consisting of eigenvectors of f .
Proposition 10 If an n×n matrix A has n linearly independent eigenvectors
X1 , . . . , Xn and S is the matrix [X1 . . . Xn ], then S diagonalises A i.e.
S −1 AS = diag (λ1 , . . . , λn )
where the λi are the corresponding eigenvalues.
Proof. If the matrix of f with respect to the basis (x1 , . . . , xn ) is the
diagonal matrix  
$$\begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{pmatrix},$$
then f (xi ) = λi xi i.e. each xi is an eigenvector. Conversely, if (xi ) is a basis
so that f (xi ) = λi xi for each i, then the matrix of f is as above. The second
result is simply the coordinate version.
As already mentioned, the example of a rotation in R2 shows that the
condition of the above theorems need not always hold. The problem is that
the matrices of rotations (with the trivial exceptions Dπ and D0 ) have no
(real) eigenvalues. There is no problem if the operator does have n distinct
eigenvalues, as the next result shows:
Proposition 11 Let f ∈ L(V ) be a linear operator in an n dimensional
space and suppose that f has r distinct eigenvalues with eigenvectors x1 , . . . , xr .
Then {x1 , . . . , xr } is linearly independent. Hence if f has n distinct eigen-
values, it is diagonalisable.
Proof. If the xi are linearly dependent, there is a smallest s so that xs is
linearly dependent on x1 , . . . , xs−1 , say xs = µ1 x1 + . . . µs−1 xs−1 . If we apply
f to both sides and then subtract λs times the original equation, we get:
0 = µ1 (λ1 − λs )x1 + · · · + µs−1 (λs−1 − λs )xs−1
and this implies that the x1 , . . . , xs−1 are linearly dependent which is a con-
tradiction.
Of course, it is not necessary for a matrix to have n distinct eigenvalues
in order for it to be diagonalisable, the simplest counterexample being the
unit matrix.

Estimates for eigenvalues For applications it is often useful to have esti-
mates for the eigenvalues of a given matrix, rather than their precise values.
In this section, we bring two such estimates, together with some applications.
Recall that if a matrix A is dominated by the diagonal in the sense that
for each i
$$|a_{ii}| - \sum_{j\neq i} |a_{ij}| > 0,$$

then it is invertible (see Chapter IV). This can be used to give the following
estimate:
Proposition 12 Let A be a complex n×n matrix with eigenvalues λ1 , . . . , λn .
Put for each i
$$\alpha_i = \sum_{j\neq i} |a_{ij}|.$$

Then the eigenvalues lie in the region
$$\bigcup_{i=1}^{n} \{z \in \mathbb{C} : |z - a_{ii}| \leq \alpha_i\}.$$

Proof. It is clear that if λ does not lie in one of the above circular regions,
then the matrix (λI − A) is dominated by the diagonal in the above sense
and so is invertible i.e. λ is not an eigenvalue.
We can use this result to obtain a classical estimate for the zeros of poly-
nomials. Consider the polynomial p which maps t onto a0 + a1 t + · · · +
an−1 tn−1 + tn . The roots of p coincide with the eigenvalues of the companion
matrix
$$C = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1} \end{pmatrix}$$
(see Exercise 4) below).
It follows from the above criterion that if λ is a zero of p, then

|λ| ≤ max(|a0 |, 1 + |a1 |, . . . , 1 + |an−1 |).
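A small sketch of both estimates, assuming numpy; the helper names and the sample polynomial are ours. Since A and its transpose have the same spectrum, the discs may equally be formed from column sums, which is what gives the quoted bound for the companion matrix.

```python
import numpy as np

def gershgorin_discs(A):
    # centres a_ii and radii sum_{j != i} |a_ij|; the spectrum lies in their union
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

def companion(a):
    # companion matrix of p(t) = a[0] + a[1] t + ... + a[n-1] t^{n-1} + t^n
    n = len(a)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)
    C[-1, :] = -np.asarray(a)
    return C

a = [2.0, -3.0, 0.0, 1.0]                 # p(t) = 2 - 3t + t^3 + t^4
C = companion(a)
print(gershgorin_discs(C.T))              # discs built from the columns of C
bound = max(abs(a[0]), *(1.0 + abs(c) for c in a[1:]))
print(bound, max(abs(np.linalg.eigvals(C))))   # every root of p has modulus <= bound
```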

Our second result shows that the eigenvalues of a small matrix cannot be
too large. More precisely, if A is an n × n matrix and a = maxi,j |aij |, then
each eigenvalue λ satisfies the inequality: |λ| ≤ na. For suppose that
 
$$X = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}$$
is a corresponding eigenvector. Then we have

(AX|X) = λ(X|X)

i.e.
$$\lambda \sum_i \xi_i\bar\xi_i = \sum_{i,j} a_{ij}\,\xi_j\bar\xi_i.$$

Taking absolute values, we have the inequality


X X
|λ| |ξi |2 ≤ |aij ||ξi ||ξj |
i i,j
X
≤a |ξi ||ξj |
i,j
X X
= a( |ξi |)( |ξj |)
i j
X
2
≤ na( |ξi | )
i

which implies the result. (In the last inequality, we use the Cauchy-Schwarz
inequality which implies that $\sum_i |\xi_i| \leq n^{1/2}\big(\sum_i |\xi_i|^2\big)^{1/2}$. See the next chapter
for details).
We conclude this section with an application of the diagonalisation method:

Difference equations One of many applications of the technique of diago-


nalising a matrix is the solving of difference equations. Rather than develop a
general theory we shall show how to solve a particular equation—the method
is, however, quite general.
The example we choose is the difference equation which defines the famous
Fibonacci series. This is the sequence (fn ) which is defined by the initial
conditions f1 = f2 = 1 and the recursion formula fn+2 = fn + fn+1 . If
we write Xn for the 2 × 1 matrix $\begin{pmatrix} f_{n+1} \\ f_n \end{pmatrix}$, then we can write the defining
conditions in the form
$$X_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad X_{n+1} = AX_n$$
where $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.
A simple induction argument shows that the general solution is then
Xn = An−1 X1 . In order to be able to exploit this representation it is necessary

to compute the powers of A. To do this directly would involve astronomical
computations. The task is simplified by diagonalising A. A simple calculation
shows that A has eigenvalues $\frac{1+\sqrt5}{2}$ and $\frac{1-\sqrt5}{2}$, with eigenvectors
$$\begin{pmatrix} \frac{1+\sqrt5}{2} \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \frac{1-\sqrt5}{2} \\ 1 \end{pmatrix}.$$
Hence
$$S^{-1}AS = \begin{pmatrix} \frac{1+\sqrt5}{2} & 0 \\ 0 & \frac{1-\sqrt5}{2} \end{pmatrix}
\quad\text{where}\quad S = \begin{pmatrix} \frac{1+\sqrt5}{2} & \frac{1-\sqrt5}{2} \\ 1 & 1 \end{pmatrix}.$$
From this it follows that
$$A = \frac{1}{\sqrt5}\begin{pmatrix} \frac{1+\sqrt5}{2} & \frac{1-\sqrt5}{2} \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} \frac{1+\sqrt5}{2} & 0 \\ 0 & \frac{1-\sqrt5}{2} \end{pmatrix}
\begin{pmatrix} 1 & -\frac{1-\sqrt5}{2} \\ -1 & \frac{1+\sqrt5}{2} \end{pmatrix}$$
and
$$A^n = \frac{1}{\sqrt5}\begin{pmatrix} \frac{1+\sqrt5}{2} & \frac{1-\sqrt5}{2} \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} \frac{1+\sqrt5}{2} & 0 \\ 0 & \frac{1-\sqrt5}{2} \end{pmatrix}^{n}
\begin{pmatrix} 1 & -\frac{1-\sqrt5}{2} \\ -1 & \frac{1+\sqrt5}{2} \end{pmatrix}.$$

This leads to the formula
$$f_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n} - \left(\frac{1-\sqrt5}{2}\right)^{n}\right].$$
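Both the matrix recursion and the closed formula are easy to test numerically. The following sketch assumes numpy; the helper names are ours, and the integer arithmetic is exact only for moderate n.

```python
import numpy as np

A = np.array([[1, 1],
              [1, 0]])

def fib(n):
    # X_n = A^{n-1} X_1 with X_1 = (f_2, f_1)^t = (1, 1)^t; second entry is f_n
    return (np.linalg.matrix_power(A, n - 1) @ np.array([1, 1]))[1]

print([fib(n) for n in range(1, 11)])      # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55

phi = (1 + 5 ** 0.5) / 2
binet = lambda n: round((phi ** n - (1 - phi) ** n) / 5 ** 0.5)
print([binet(n) for n in range(1, 11)])    # the same values, from the formula above
```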

Examples: 1) Calculate χA (λ) where


 
$$A = \begin{pmatrix} 1 & 1 & \dots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix}.$$
Solution:
$$\chi_A(\lambda) = \det\begin{pmatrix} 1-\lambda & 1 & \dots & 1 \\ \vdots & & \ddots & \vdots \\ 1 & 1 & \dots & 1-\lambda \end{pmatrix} = (n-\lambda)\lambda^{n-1}(-1)^{n-1}$$
by a result of the previous chapter.

2) Calculate the eigenvalues of the n × n matrix
 
$$A = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \dots & 0 \end{pmatrix}.$$
Solution:
$$\chi_A(\lambda) = \det\begin{pmatrix} -\lambda & 1 & 0 & \dots & 0 \\ 0 & -\lambda & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \dots & -\lambda \end{pmatrix} = (-1)^{n}(\lambda^{n} - 1).$$
Hence the eigenvalues are the n-th roots of unity $e^{2\pi i r/n}$ (r = 0, . . . , n − 1).
3) Calculate the eigenvalues of the linear mapping
   
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \mapsto \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}$$
on M2 .
Solution: With respect to the basis
$$x_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad x_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad x_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad x_4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$
the mapping has matrix
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
and this has eigenvalues 1, 1, 1, −1.
4) Show that $f_{n+1}f_{n-1} - f_n^2 = (-1)^n$ where fn is the n-th Fibonacci number.
Solution: Note that
$$f_{n+1}f_{n-1} - f_n^2 = \det\begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}
= \det\left(\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n-1}\begin{pmatrix} f_2 & f_1 \\ f_1 & f_0 \end{pmatrix}\right)
= \left(\det\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\right)^{n-1}\det\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = (-1)^{n}$$
(here f0 = 0, so the second factor is again the matrix A of the text).

3.3 The Jordan canonical form
As we have seen, not every matrix can be reduced to diagonal form and in
this section we shall investigate what can be achieved in the general case.
We begin by recalling that failure to be diagonalisable can result from two
causes. Firstly, the matrix can fail to have a sufficient number of eigenvalues
(i.e. zeroes of χA ). By the fundamental theorem of algebra, this can only
happen in the real case and in this section we shall avoid this difficulty by
confining our attention to complex matrices resp. vector spaces. The second
difficulty is that the matrix may have n eigenvalues (with repetitions) but
may fail to have enough eigenvectors to span the space. A typical example
is the shear operator
(ξ1 , ξ2 ) 7→ (ξ1 + ξ2 , ξ2 )
 
1 1
with matrix .
0 1
This has the double eigenvalue 1 but the only eigenvectors are multiples
of the unit vector (1, 0). We will investigate in detail the case of repeated
eigenvalues and it will turn out that in a certain sense the shear operator
represents the typical situation. The precise result that we shall obtain is
rather more delicate to state and prove than the diagonalisable case and we
shall proceed by way of a series of partial results. We begin with the following
Proposition which allows us to reduce to the case where the operator f has
a single eigenvalue.
In order to avoid tedious repetitions we assume from now until the end
of this section that f is a fixed operator on a complex vector space V of
dimension n and that f has eigenvalues
λ1 , . . . , λ1 , λ2 , . . . , λr , . . . , λr
where λi occurs ni times. This means that f has characteristic polynomial
(λ1 − λ)n1 . . . (λr − λ)nr
where n1 + · · · + nr = n.
Proposition 13 There is a direct sum decomposition
V = V1 ⊕ · · · ⊕ Vr
where
• each Vi is f invariant (i.e. f (Vi ) ⊂ Vi );
• the dimension of Vi is ni and (f − λi Id)ni |Vi = 0.
In particular, the only eigenvalue of f |Vi is λi .

Proof. Fix i. It is clear that

Ker (f − λi Id) ⊂ Ker (f − λi Id)2 ⊂ . . .

Hence there exists a smallest ri so that

Ker (f − λi Id)ri = Ker (f − λi Id)ri +1

and so
Ker (f − λi Id)ri = Ker (f − λi Id)ri +m
for m ∈ N.
Then we claim that

V = Ker(f − λi Id)ri ⊕ Im(f − λi Id)ri .

Since the sum of the dimensions of these two spaces is that of V , it suffices
to show that their intersection is {0}. But if y ∈ Ker (f − λi Id)ri and y =
(f − λi Id)ri (x), then (f − λi Id)2ri (x) = 0 and so x ∈ Ker (f − λi Id)2ri =
Ker (f − λi Id)ri i.e. y = 0. It is now clear that if Vi = Ker(f − λi Id)ri , then

V = V1 ⊕ · · · ⊕ Vr

is the required decomposition.


If f ∈ L(V ), then the sequences

Ker f ⊂ Ker f 2 ⊂ . . .

and
f (V ) ⊃ f 2 (V ) ⊃ . . .
become stationary at points r, s i.e. we have

$$\mathrm{Ker}\, f \neq \mathrm{Ker}\, f^2 \neq \dots \neq \mathrm{Ker}\, f^r = \mathrm{Ker}\, f^{r+1} = \dots$$
and
$$f(V) \neq f^2(V) \neq \dots \neq f^s(V) = f^{s+1}(V) = \dots$$
Then the above proof actually shows the following:

Proposition 14 r = s and V = V1 ⊕ V2 where V1 = f r (V ) and V2 = Ker f r .

Corollary 2 If f is such that Ker f = Ker f 2 , then V = f (V ) ⊕ Ker f .

Using the above result, we can concentrate on the restrictions of f to the sum-
mands. These have the special property that they have only one eigenvalue.
Typical examples of matrices with this property are the Jordan matrices
which we introduced in the first chapter. Recall the notation
 
$$J_n(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \dots & 0 \\ 0 & \lambda & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda \end{pmatrix}.$$

In particular, the shear matrix is J2 (1).


The following facts can be computed easily.
1) Jn (λ) has one eigenvalue, namely λ, and one eigenvector (1, 0, . . . , 0) (or
rather multiples of this vector);
2) If p is a polynomial, then
$$p(J_n(\lambda)) = \begin{pmatrix} p(\lambda) & p'(\lambda) & \dots & \frac{p^{(n-1)}(\lambda)}{(n-1)!} \\ 0 & p(\lambda) & \dots & \frac{p^{(n-2)}(\lambda)}{(n-2)!} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & p(\lambda) \end{pmatrix}.$$

3) Jn (λ) − λI is the matrix


 
$$J_n(0) = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}.$$

Hence (Jn (λ) − λI)n = 0 and (Jn (λ) − λI)r ≠ 0 if r < n.


The next result shows that all operators with only one eigenvalue can be
represented by blocks of Jordan matrices:
Proposition 15 Let g be a linear operator on the n-dimensional space W
so that (g − λI)n = 0 for some λ ∈ C. Then there is a decomposition

W = W1 ⊕ · · · ⊕ Wk

so that each Wi is g-invariant and has a basis with respect to which the matrix
of g is the Jordan matrix Jsi (λ) where si = dim Wi .
By replacing g by g − λI we can reduce to the following special case which
is the one which we shall prove:

Proposition 16 Let g ∈ L(V ) be nilpotent with g r = 0, g r−1 6= 0. Then
there is a decomposition
V = V1 ⊕ · · · ⊕ Vk
with each Vi g-invariant and a basis for each Vi so that g|Vi has matrix Jsi (0)
where si = dim Vi .

Proof. Choose x1 ∈ V so that g r−1 (x1 ) 6= 0. Then the vectors

x1 , g(x1), . . . , g r−1(x1 )

are linearly independent. Otherwise there is a greatest k so that g k (x1 ) is


linearly dependent on g k+1(x1 ), . . . , g r−1(x1 ) say

g k (x1 ) = λk+1 g k+1(x1 ) + · · · + λr−1 g r−1 (x1 ).

But if we apply g r−k−1 to both sides we get g r−1(x1 ) = 0—a contradiction.


Now there are two possibilities: a) (x1 , g(x1 ), . . . , g r−1(x1 )) spans V . Then

y1 = g r−1 (x1 ), y2 = g r−2(x1 ), . . . , y r = x1

is a basis for V with respect to which g has matrix Jr (0).


b) V1 = [x1 , g(x1 ), . . . , g r−1(x1 )] 6= V . We then construct a g-invariant sub-
space V2 whose intersection with V1 is the zero-vector. We do this as follows:
for each y not in V1 there is an integer s with g s−1(y) ∈ / V1 , and g s (y) ∈ V1
(since g i (y) is eventually zero). Choose a y ∈ V \ V1 for which this value of
s is maximal.
Suppose that $g^s(y) = \sum_{j=0}^{r-1} \lambda_j g^j(x_1)$. Then

$$0 = g^r(y) = g^{r-s}(g^s(y)) = \sum_{j=0}^{r-1} \lambda_j g^{j+r-s}(x_1) = \sum_{j=0}^{s-1} \lambda_j g^{j+r-s}(x_1).$$

Hence λj = 0 for j = 0, . . . , s − 1 since x1 , g(x1 ), . . . , g r−1 (x1 ) are linearly
independent and so $g^s(y) = \sum_{j=s}^{r-1} \lambda_j g^j(x_1)$. Put $x_2 = y - \sum_{j=s}^{r-1} \lambda_j g^{j-s}(x_1)$.
Then g s (x2 ) = 0 and by the same argument as above, {x2 , g(x2 ), . . . , g s−1 (x2 )}
is linearly independent. Then V2 = [x2 , g(x2 ), . . . , g s−1(x2 )] is g-invariant and
has the desired property.

Now if V = V1 ⊕ V2 we are finished. If not we can proceed in the same
manner to obtain a suitable V3 and so on until we have exhausted V .
We are now in a position to state and prove our general result. Starting
with the operator f ∈ L(V ) we first split V up in the form
V = V1 ⊕ · · · ⊕ Vr
where each Vi is f -invariant and the restriction of (f −λi Id) to Vi is nilpotent.
Applying the second result we get a further splitting
Vi = W1i ⊕ · · · ⊕ Wkii
and a basis for Wji so that the matrix is a Jordan matrix. Combining all
of the bases for the various Wij we get one for V with respect to which the
matrix of f has the form
diag (J(λ1 ), . . . , J(λ1 ), J(λ2 ), . . . , J(λr ), . . . , J(λr ))
where we have omitted the subscripts indicating the dimensions of the Jordan
matrices.
This result about the existence of the above representation (which is
called the Jordan canonical form of the operator) is rather powerful and
can often be used to prove non-trivial facts about matrices by reducing to
the case of Jordan matrices. We use this technique in the following proof
of the so-called Cayley-Hamilton theorem which states that a matrix is a
“solution” of its own characteristic equation.
Proposition 17 Let A be an n × n matrix. Then χA (A) = 0.
Proof. We begin with the case where A has Jordan form i.e. a block
representation diag (A1 , . . . , Ar ) where Ai is the part corresponding to the
eigenvalue λi . Ai itself can be divided into Jordan blocks i.e.
Ai = diag (J(λi ), . . . , J(λi )).
Now if p is a polynomial, then p(A) = diag (p(A1 ), . . . , p(Ar )) and so it
suffices to show that χA (Ai ) = 0 for each i. But χA contains the factor
(λi − λ)ni and so χA (Ai ) contains the factor (λi I − Ai )ni and we have seen
that this is zero.
We now consider the general case i.e. where A is not necessarily in Jordan
form. We can find an invertible matrix S so that à = S −1 AS has Jordan form.
Then χà = χA and so
$$\chi_A(A) = \chi_{\tilde A}(A) = \chi_{\tilde A}(S\tilde A S^{-1}) = S\,\chi_{\tilde A}(\tilde A)\,S^{-1} = 0.$$
The Cayley-Hamilton theorem can be used to calculate higher powers and
inverses of matrices. We illustrate this with a simple example:

Example: If
$$A = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & 2 \\ 0 & 3 & 1 \end{pmatrix},$$
then
χA (λ) = −λ3 + 3λ2 + 3λ − 2
and so
−A3 + 3A2 + 3A − 2I = 0.
Hence A3 = 3A2 + 3A − 2I. From this it follows that

$$A^4 = 3A^3 + 3A^2 - 2A
= 3\begin{pmatrix} 13 & 18 & 30 \\ 9 & 13 & 21 \\ 9 & 18 & 22 \end{pmatrix}
+ 3\begin{pmatrix} 3 & 7 & 7 \\ 2 & 5 & 5 \\ 3 & 3 & 7 \end{pmatrix}
- 2\begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & 2 \\ 0 & 3 & 1 \end{pmatrix}
= \begin{pmatrix} 44 & 77 & 105 \\ 31 & 54 & 74 \\ 36 & 57 & 85 \end{pmatrix}.$$

Also 2I = −A3 + 3A2 + 3A = A(−A2 + 3A + 3I) i.e.
$$A^{-1} = \frac{1}{2}(-A^2 + 3A + 3I)
= \begin{pmatrix} 3 & -5 & 1 \\ \tfrac12 & -1 & \tfrac12 \\ -\tfrac32 & 3 & -\tfrac12 \end{pmatrix}.$$
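The computations above can be verified with a few lines of numpy (assumed available; the checks simply compare the Cayley-Hamilton expressions with direct matrix arithmetic):

```python
import numpy as np

A = np.array([[2, -1, 3],
              [1, 0, 2],
              [0, 3, 1]])
I = np.eye(3)
A2 = A @ A
A3 = A @ A2

# Cayley-Hamilton: -A^3 + 3A^2 + 3A - 2I = 0
print(np.allclose(-A3 + 3 * A2 + 3 * A - 2 * I, 0))

# hence A^4 = 3A^3 + 3A^2 - 2A and A^{-1} = (1/2)(-A^2 + 3A + 3I)
print(np.allclose(3 * A3 + 3 * A2 - 2 * A, np.linalg.matrix_power(A, 4)))
print(np.allclose(0.5 * (-A2 + 3 * A + 3 * I), np.linalg.inv(A)))
```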

A further interesting fact that can easily be verified with help of the Jordan
form is the following:

Proposition 18 Let A be an n × n matrix with eigenvalues λ1 , . . . , λn and


let p be a polynomial. Then the eigenvalues of p(A) are p(λ1 ), . . . , p(λn ).

Proof. Without loss of generality, we can assume that A has Jordan form
and then p(A) is a triangular matrix with diagonal entries p(λ1 ), . . . , p(λn ).
The above calculations indicate the usefulness of a polynomial p such that
p(A) = 0. The Cayley-Hamilton theorem provides us with one of degree n.
In general, however, there will be suitable polynomials of lower degree. For
example, the characteristic polynomial of the identity matrix In is (1 − λ)n
but p(I) = 0 where p is the linear polynomial p(λ) = 1 − λ. Since it is
obviously of advantage to take the polynomial of smallest degree with this
property, we introduce the following definition:

Definition: Let A be an n × n matrix with characteristic polynomial
χA (λ) = (λ1 − λ)n1 . . . (λr − λ)nr .
Then there exists for each i a smallest mi (≤ ni ) so that p(A) = 0 where
p(λ) = (λ1 − λ)m1 . . . (λr − λ)mr .
This polynomial is called the minimal polynomial of A and denoted by
mA . In principle it can be calculated by considering the n1 · n2 . . . nr divisors
of the characteristic polynomial which contain the factor (λ1 − λ) . . . (λr − λ)
and determining the one of lowest degree which annihilates A. In terms of
the Jordan canonical form of A it is clear that mi is the order of the largest
Jordan matrix in the block corresponding to the eigenvalue λi .
We conclude with two simple and typical applications of the Cayley-
Hamilton theorem.
I. Suppose that we are given a polynomial p with roots λ1 , . . . , λn and are
required to construct a second one whose roots are the squares of the λi
(without calculating these roots explicitly). This can be done as follows: let
A be the companion matrix of p so that the eigenvalues of A are the roots
of p. Then if B = A2 , the eigenvalues of B are the required numbers. Hence
q = χB is a suitable polynomial.
II. Suppose that we are given two polynomials p and q whereby the roots of
p are λ1 , . . . , λn . If A is the companion matrix of p, then the eigenvalues of
q(A) are q(λ1 ), . . . , q(λn ). Hence p and q have a common root if and only
if det q(A) = 0. This gives a criterium for the two polynomials to have
a common root. For this reason the quantity ∆ = det q(A) is called the
resultant of p and q.
The particular case where q is the derivative of p is useful since the ex-
istence of a common root for p and p′ implies that p has a double root. In
this case the expression ∆ = det p′ (A) is called the discriminant of p.
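A short numerical sketch of both quantities, assuming numpy; the helper functions and the sample polynomials are ours, chosen so that the answers are easy to check by hand.

```python
import numpy as np

def companion(a):                 # a = [a0, ..., a_{n-1}] for monic p of degree n
    n = len(a)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)
    C[-1, :] = -np.asarray(a)
    return C

def polyval_matrix(coeffs, A):    # evaluate q(A); coeffs = [q0, q1, ...], lowest first
    return sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(coeffs))

# p(t) = t^2 - 3t + 2 = (t-1)(t-2),  q(t) = t - 1: they share the root 1
A = companion([2.0, -3.0])
print(np.linalg.det(polyval_matrix([-1.0, 1.0], A)))   # resultant ~ 0, common root

# discriminant of p: det p'(A) with p'(t) = 2t - 3; p has no repeated root, so nonzero
print(np.linalg.det(polyval_matrix([-3.0, 2.0], A)))
```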

Example: For which values of a, b, c is the matrix
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 4 & a & b \\ -2 & 1 & c \end{pmatrix}$$
nilpotent?
Solution: We calculate as follows:
 
$$A^2 = \begin{pmatrix} -2 & 1 & c \\ 4a-2b & a^2+b & 4+ab+bc \\ 4-2c & a+c & -2+b+c^2 \end{pmatrix}$$
which is never zero.
 
$$A^3 = \begin{pmatrix} 4-2c & a+c & -2+b+c^2 \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix}$$

(we do not require the entries marked by a question mark). If A3 = 0, we


must have c = 2, a = −2, b = −2. Then
 
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 4 & -2 & -2 \\ -2 & 1 & 2 \end{pmatrix}$$

and one calculates that A3 = 0 i.e. A is nilpotent if and only if a, b and c


have the above values.

Example: Show that the matrix


 
$$A = \begin{pmatrix} 1 & 1 & 0 \\ -3 & -2 & 1 \\ -1 & 0 & 1 \end{pmatrix}$$

is nilpotent and find a basis which reduces it to Jordan form.


Solution: One calculates that
 
$$A^2 = \begin{pmatrix} -2 & -1 & 1 \\ 2 & 1 & -1 \\ -2 & -1 & 1 \end{pmatrix}$$

and A3 = 0. Now
       
$$A\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix}
\quad\text{and}\quad
A^2\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \\ -2 \end{pmatrix}.$$
Then (−2, 2, −2), (1, −3, −1) and (1, 0, 0) are linearly independent and with
respect to this basis fA has matrix
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Example: Calculate the minimal polynomials of
   
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Solution: χA (λ) = (1 − λ)3 and
$$A - I = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

It is clear that (A − I)2 6= 0 and (A − I)3 = 0 i.e. mA (λ) = (1 − λ)3 . χB (λ) =


λ2 (3 − λ) and one calculates that B(B − 3I) = 0 i.e. mB (λ) = λ(3 − λ).

Exercises: 1) For which values of a, b and c are the following matrices


nilpotent?
   
$$A = \begin{pmatrix} 2 & -2 & a \\ 2 & -2 & b \\ 1 & -1 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & a & b \\ 6 & -3 & c \\ -2 & 1 & -3 \end{pmatrix}.$$

2) For each eigenvalue λ of the matrices A and B below calculate the value
of r for which Ker (A − λI)r becomes stationary:
 
  1 0 0 0
4 6 0  2 1 0 0 
A =  −1 −1 0  B=  3
.
2 1 0 
0 0 1
4 3 2 1

3) Solve the differential equations:

$$\frac{df}{dt} = g, \qquad \frac{dg}{dt} = -6f + 5g.$$
4) Diagonalise the matrix
 
$$A = \begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & 2 \\ 2 & 2 & -2 \end{pmatrix}$$

and use it to solve the difference equations:

an+1 = an − bn + 2cn
bn+1 = −an + bn + 2cn
cn+1 = 2an + 2bn − 2cn

resp. the system of equations:


$$\frac{dx}{dt} = x - y + 2z, \qquad \frac{dy}{dt} = -x + y + 2z, \qquad \frac{dz}{dt} = 2x + 2y - 2z.$$
5) Let V1 be a subspace of the vector space V which is invariant under the
operator f ∈ L(V ). Show that if f is diagonalisable, then so is the restriction
of f to V1 .
6) A cyclic element for an operator f ∈ L(V ) is a vector x so that

{x, f (x), f 2 (x), . . . , f n−1 (x)}

forms a basis for V (where n = dim V ). Show that if f is diagonalisable,


then it has a cyclic element if and only if its eigenvalues are distinct.
7) Show that if two matrices are diagonalised by the same invertible matrix,
then they commute.
8) Let A be a diagonalisable n × n matrix. Show that a matrix B commutes
with every matrix that commutes with A if and only if there is a polynomial
p so that B = p(A). Show that if A has distinct eigenvalues, then it suffices
for this that B commutes with A.
9) Let f ∈ L(V ) be such that each non-zero vector x ∈ V is an eigenvector of f . Show that
there is a λ ∈ R so that f = λId.
10) Let f, g ∈ L(V ) commute. Show that each eigenspace of f is g-invariant.
11) Suppose that f ∈ L(V ) has n distinct eigenvalues where n = dim V .
Show that V contains precisely 2n f -invariant subspaces.
12) Show that f ∈ L(V ) is nilpotent if and only if its characteristic polyno-
mial has the form ±λn where n = dim V . Deduce that if f is nilpotent, then
f n = 0.
13) A nilpotent operator f on an n dimensional space V is nilcyclic
if and only if V has a basis of the form (x, f (x), f 2 (x), . . . , f n−1(x)) for some
x ∈ V (i.e. x is a cyclic vector as defined in Exercise 6). Show that this is
equivalent to each of the following conditions:

• f n = 0 but f n−1 6= 0;

• r(f ) = n − 1;

• dim Ker f = 1;

• there is no decomposition V = V1 ⊕ V2 where V1 and V2 are non-trivial


subspaces which are f -invariant.

14) Show that if A is an invertible n × n matrix, then

$$\chi_{A^{-1}}(\lambda) = (-1)^n \frac{\lambda^n}{\det A}\,\chi_A\!\left(\frac{1}{\lambda}\right).$$
Show that in general (i.e. without the condition on invertibility of A) we
have
$$\mathrm{tr}\,(\lambda I - A)^{-1} = \frac{\chi_A'(\lambda)}{\chi_A(\lambda)}$$
whenever λ is not an eigenvalue of A.
15) Show that if A is a nilpotent n × n matrix with Ak = 0, then I − A is
invertible and
(I − A)−1 = I + A + · · · + Ak−1 .
16) Let A be a complex 2 × 2 matrix which is not a multiple of the unit
matrix. Show that any matrix which commutes with A can be written in the
form λI + µA (λ, µ ∈ C).
17) Find a Jordan canonical form for the operator

D : Pol (n) → Pol (n).

18) Show that every n × n matrix over C can be represented as a sum D + N


where D is a diagonalisable matrix, N is nilpotent and N and D commute.
Show that there is only one such representation.
19) A linear mapping f : V → V has an upper triangular representation
(i.e. a basis with respect to which its matrix is upper triangular) if and
only if there is a sequence (Vi ) of subspaces of V (i = 0, . . . , n) where the
dimension of Vi is i and Vi ⊂ Vi+1 for each i so that f (Vi ) ⊂ Vi for each
i. Show directly (i.e. without using the Jordan canonical form) that such
a sequence exists (for operators on complex vector spaces) and use this to
give a proof of the Cayley-Hamilton theorem independent of the existence of
a Jordan form. Show that if f and V are real, then such a representation
exists if and only if χf has n real zeroes.
20) Show that if A is an n × n matrix with eigenvalues λ1 , . . . , λn then the
eigenvalues of p(A) are p(λ1 ), . . . , p(λn ).
21) Let J be a Jordan block of the form Jn (λ). Calculate

• the set of matrices B which commute with J ;
• the set of matrices which commute with all matrices which commute
with J .
Deduce that a matrix is in the latter set if and only if it has the form p(J)
for some polynomial p.
22) Use 21) to show that a matrix B commutes with all matrices which
commute with a given matrix A if and only if B = p(A) for some polynomial
p (cf. Exercise 8) above).
23) Show that if A1 and A2 are commuting n × n matrices, then their eigen-
values can be ordered as
λ1 , . . . , λn resp. µ1 , . . . , µn
in such a way that for any polynomial p of two variables, the eigenvalues of
p(A1 , A2 ) are
p(λ1 , µ1 ), . . . , p(λn , µn ).
Generalise to commuting r-tuples A1 , . . . , Ar of matrices.
24) Show that if p and q are polynomials and A is the companion matrix of p,
then the nullity of q(A) is the number of common roots of p and q (counted
with multiplicities). In the case where q = p′ , the rank of p′ (A) is the number
of distinct roots of p.
25) Consider the companion matrix
 
$$C = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1} \end{pmatrix}$$
of the polynomial p (cf. a previous exercise) and suppose now that p has
repeated roots
λ1 , . . . , λ1 , λ2 , . . . , λi , . . . , λr , . . . , λr
where λi occurs ni times. Show that C has Jordan form
diag (Jn1 (λ1 ), Jn2 (λ2 ), . . . , Jnr (λr ))
and that this is induced by the following generalised Vandermonde matrix:
 
$$\begin{pmatrix}
1 & 0 & \dots & 1 & \dots & 0 \\
\lambda_1 & 1 & \dots & \lambda_2 & \dots & 0 \\
\lambda_1^2 & 2\lambda_1 & \dots & \lambda_2^2 & \dots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
\lambda_1^{n-1} & (n-1)\lambda_1^{n-2} & \dots & \lambda_2^{n-1} & \dots & \lambda_r^{\,n-n_r}
\end{pmatrix}$$
(The first n1 columns are obtained by successive differentiation of the first
one and so on).

3.4 Functions of matrices and operators
We have often used the fact that we can substitute square matrices into
polynomials. For many applications, it is desirable to be able to do this
for more general functions and we discuss briefly some of the possibilities.
Suppose firstly that A is a diagonal matrix, say

A = diag (λ1 , . . . , λn ).

Then if we recall that for a polynomial p,

p(A) = diag (p(λ1 ), . . . , p(λn ))

it is natural to define x(A) to be diag (x(λ1 ), . . . , x(λn )) for a suitable function


x. In order for this to make sense, it suffices that the domain of definition of x
contain the set {λ1 , . . . , λn } of eigenvalues of A. In view of the importance of
the latter set in what follows, we denote it by σ(A). It is called the spectrum
of A. Of course, substitution satisfies the rules that (x+y)(A) = x(A)+y(A)
and (xy)(A) = x(A)y(A).
If A is diagonalisable, say

$$S^{-1}AS = D = \mathrm{diag}\,(\lambda_1, \dots, \lambda_n),$$

we define x(A) to be

S · x(D) · S −1 = S · diag (x(λ1 ), . . . , x(λn )) · S −1 .

The case where A is not diagonalisable turns out to be rather more tricky.
Firstly, we note that it suffices to be able to define x(A) for Jordan blocks.
For if A has Jordan form

S −1 AS = diag (J1 , . . . , Jr )

then we can define x(A) to be

S · diag (x(J1 ), . . . , x( Jr )) · S −1

once we know how to define the x(Ji ). (We are using the notation diag (J1 , . . . , Jr )
for the representation of a Jordan form as a blocked diagonal matrix).
In order to motivate the general definition, consider the case of the square
root of the Jordan matrix Jn (λ). Firstly, we remark that for λ = 0, no such
square root exists. We show this for the simplest case (n = 2) but the same
argument works in general.

Example: Show that there is no matrix A so that
 
$$A^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Solution: Suppose that such a matrix exists. Then A4 = 0 and so A is


nilpotent. But a 2 × 2 nilpotent matrix must satisfy the equation A2 = 0
(since its characteristic polynomial is λ2 ) which is patently absurd. We now
turn to the general case and show that if λ is non-zero, then Jn (λ) does have
a square root. In fact, if λ1/2 denotes one of the square roots of λ, then the
matrix  1 1  1 

1 12 λ1 2
1 . . . 2 1
n−1 λn−1
 λ2
1 1 


1/2  0 1 1
. . . n−2 λn−2 
1
A=λ 
2
1 λ
2

. . 
 .. .. 
0 0 0 ... 1

(where for any real number α and n ∈ N, αn is the binomial coefficient

α(α − 1) . . . (α − n + 1)
)
n!
satisfies the equation A2 = Jn (λ).
If this matrix seems rather mysterious, notice that the difference between
the cases λ = 0 and λ 6= 0 lies in the fact that the complex function z 7→ z 1/2
is analytic in the neighbourhood of a non-zero λ (i.e. is expressible as a power
series in a neighbourhood of λ) whereas this is not the case at 0. In fact, we
calculated the above root by writing
$$J_n(\lambda) = \lambda\Big(I + \frac{1}{\lambda}N\Big)$$
where N is the nilpotent matrix Jn (0). We then wrote
$$J_n(\lambda)^{1/2} = \lambda^{1/2}\Big(I + \frac{1}{\lambda}N\Big)^{1/2}$$
and substituted $\frac{1}{\lambda}N$ for z in the Taylor series
$$(1+z)^{1/2} = \sum_{i=0}^{\infty} \binom{1/2}{i} z^i.$$

Of course, the resulting infinite series terminates after n-terms since N n = 0.
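The construction is easy to test numerically. The following sketch assumes numpy, takes a positive real λ for simplicity (a general non-zero complex λ would need a complex square root), and the helper names are ours.

```python
import numpy as np

def binom(alpha, k):
    # generalised binomial coefficient alpha(alpha-1)...(alpha-k+1)/k!
    out = 1.0
    for i in range(k):
        out *= (alpha - i) / (i + 1)
    return out

def jordan_sqrt(lmbda, n):
    # square root of J_n(lmbda), lmbda != 0, via the terminating binomial series
    N = np.diag(np.ones(n - 1), 1)                 # the nilpotent part J_n(0)
    S = sum(binom(0.5, k) * np.linalg.matrix_power(N / lmbda, k) for k in range(n))
    return np.sqrt(lmbda) * S

n, lmbda = 4, 3.0
J = lmbda * np.eye(n) + np.diag(np.ones(n - 1), 1)   # J_n(lmbda)
R = jordan_sqrt(lmbda, n)
print(np.allclose(R @ R, J))                          # True
```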

If we apply the same method to the Taylor series

(1 + z)−1 = 1 − z + z 2 − z 3 + . . .

then we can calculate the inverse of Jn (λ) for λ 6= 0. The reader can check
that the result coincides with that given above.
This suggests the following method for defining x(A) where, for the sake
of simplicity, we shall assume that x is entire. This will ensure that x has a
Taylor expansion around each λ in the spectrum of A. As noted above, we
use the Jordan form
S −1 AS = diag (J1 , . . . , Jr )
where Ji is Jni (λi ). We define x(Ji ) as follows. x has the Taylor expansion

x(λ) = x(λi ) + x′ (λi )(λ − λi ) + . . .

around λi . If we substitute formally we get the expression


$$x(J_i) = x(\lambda_i)I + x'(\lambda_i)N + \dots + \frac{x^{(n_i-1)}(\lambda_i)}{(n_i-1)!}\,N^{\,n_i-1}$$
where Ji = λi I + N (i.e. N = Jni (0)).
We now give a purely algebraic form of the definition. Suppose that A is
an n × n matrix with distinct eigenvalues λ1 , . . . , λr and minimal polynomial
$$m(\lambda) = \prod_{i=1}^{r} (\lambda - \lambda_i)^{m_i}.$$

We shall define x(A) for functions x which are defined on a neighbourhood


of the set of eigenvalues of A and have derivatives up to order mi − 1 at λi
for each i. It is clear that there is a polynomial p so that

p(k) (λi ) = x(k) (λi )

for each i and k ≤ mi − 1.


Perhaps the easiest way to do this is as follows. Firstly we construct a
polynomial Lik (of minimal degree) which vanishes, together with its deriva-
tives up to order mj − 1 at λj (j 6= i) and is such that the k-th derivative at
λi is one, while all other ones vanish there (up to order mi − 1). We denote
this polynomial by Lik (it can easily be written down explicitly but as we
shall not require this directly we leave its computation as an exercise for the
reader).
Then the polynomial which we require is
$$p = \sum_{i,k} x_{ik} L_{ik}$$
where xik = x(k) (λi ). Hence if we write Pik for the operator Lik (A), then
$$x(A) = \sum_{i,k} x_{ik} P_{ik}.$$

The Pik are called the components of A.


In the special case where A has n distinct eigenvalues, then χA has the
form
(λ − λ1 ) . . . (λ − λn ).
The only relevant L’s are the Li0 ’s which we denote simply by Li . Thus Li
is the Lagrange interpolating polynomial
$$L_i(\lambda) = \prod_{j\neq i} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}$$

which takes on the value 1 at λi and the value 0 at the other λ’s. We note also
the fact that the sum of the Li ’s is the constant function one and that L2i = Li
(both of these when the functions are evaluated at the eigenvalues). In this
case, the components Pi = Li (A) satisfy the equations Pi2 = Pi (i.e. they
are projections) and their sum is the identity operator. The most important
example of such a function of a matrix is the exponential function. Since the
latter is entire, we can substitute any matrix and we denote the result by
exp(A) or eA . We note some of its simple properties:
• if A is diagonalisable, say A = SDS −1 where D = diag (λ1 , . . . , λn ),
then
exp A = S · diag (eλ1 , . . . , eλn ) · S −1 ;
• if A = D + N where D is diagonalisable and N is nilpotent, both
commuting, then exp A = exp D · exp N and

$$\exp N = \sum_{k=0}^{\infty} \frac{N^k}{k!}$$

where the series breaks off after finitely many terms;


• exp A · exp(−A) = I (and so exp A is always invertible);
• if (λ1 , . . . , λn ) are the eigenvalues of A, then (eλ1 , . . . , eλn ) are the eigen-
values of exp A;
• if A and B commute, then so do exp A and exp B and we have the
formula
exp(A + B) = exp A · exp B.
In particular, if s, t ∈ R, then exp(s + t)A = exp sA · exp tA;

• the function t 7→ exp tA from R into the set of n × n matrices is
differentiable and
$$\frac{d}{dt}(\exp tA) = A \cdot \exp(tA).$$
We remark that the last statement means that the elements of the matrix
exp tA, as functions of t, are differentiable. The derivative on the left hand
side is then the matrix obtained by differentiating its elements.
This property is particularly important since it means that the general
solution of the system
$$\frac{dX}{dt} = AX$$
of differential equations where X is the column matrix
 
$$X(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}$$

whose elements are smooth functions is given by the formula

X(t) = exp tA · X0

where X0 is the column matrix


 
$$X_0 = \begin{pmatrix} x_1(0) \\ \vdots \\ x_n(0) \end{pmatrix}$$

of initial conditions.
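As a sketch, the matrix exponential is available in scipy (assumed here; the matrix and initial value are the ones from the example at the start of the chapter, and the finite-difference check is only approximate):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 2.0],
              [1.0, 2.0]])
X0 = np.array([1.0, 0.0])

def X(t):
    return expm(t * A) @ X0          # X(t) = exp(tA) X_0 solves dX/dt = AX

h, t = 1e-6, 0.7
print((X(t + h) - X(t)) / h)         # difference quotient ...
print(A @ X(t))                      # ... agrees with A X(t) to high accuracy
```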

Example: Consider the general linear ordinary differential equation of de-


gree n
x(n) + an−1 x(n−1) + · · · + a0 x = 0
with constant coefficients. We can write this equation in the form
$$\frac{dX}{dt} = AX$$
where X is the column vector
 
$$X = \begin{pmatrix} x(t) \\ x'(t) \\ \vdots \\ x^{(n-1)}(t) \end{pmatrix}$$

67
and A is the companion matrix
 
$$A = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1} \end{pmatrix}$$

of the polynomial
p(t) = tn + an−1 tn−1 + · · · + a0 .
As we know, the characteristic polynomial of this matrix is p and so its
eigenvalues are the roots λ1 , . . . , λn of p.
We suppose that these λi are all distinct. Then A is diagonalisable and
the diagonalising matrix is the Vandermonde matrix
 
$$V(\lambda_1, \dots, \lambda_n) = \begin{pmatrix} 1 & \dots & 1 \\ \lambda_1 & \dots & \lambda_n \\ \vdots & & \vdots \\ \lambda_1^{n-1} & \dots & \lambda_n^{n-1} \end{pmatrix}.$$

Hence if we write S for this matrix, we have

exp At = S · diag (eλ1 t , . . . , eλn t ) · S −1

and the solution of the above equation can be read off from the formula

$$X(t) = S \cdot \mathrm{diag}\,(e^{\lambda_1 t}, \dots, e^{\lambda_n t}) \cdot S^{-1} X_0$$

where X0 is the column matrix of initial values


 
x(0)
 x′ (0) 
 
 .. .
 . 
(n−1)
x (0)

The reader can check that this provides the classical solution.
In principle, the case of repeated roots for p can be treated similarly.
Instead of the above diagonalisation, we use the reduction of A to its Jordan
form. The details form one of the main topics in most elementary books on
differential equations.

Exercises: 1) Calculate exp A where
 
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 2 & 0 & 2 \\ 0 & 1 & 0 \end{pmatrix} \quad
A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad
A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \quad
A = \begin{pmatrix} 0 & t \\ -t & 0 \end{pmatrix}.$$

2) Solve the following system of differential equations:

$$\frac{dx_1}{dt} = x_1 - 12x_3 + e^{-3t}, \qquad
\frac{dx_2}{dt} = -x_1 + 7x_2 - 20x_3, \qquad
\frac{dx_3}{dt} = x_1 + 5x_3 + \cos t.$$
3) If  
$$C = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 1 & 0 & 0 & \dots & 0 \end{pmatrix}$$
calculate exp tC.
4) Calculate exp A where A is the matrix of the differentiation operator D
on Pol (n).
5) Show that if A is an n × n matrix all of whose entries are positive, then
the same holds for exp A.
6) Show that det(exp A) = exp(tr A).
7) Show that the general solution of the equation

$$\frac{dX}{dt} = AX + B$$
with initial condition X(0) = X0 , where A is a constant n × n matrix and B
is a continuous mapping from R into the space of n × 1 column matrices, is
given by the equation
$$X(t) = \int_0^t \exp((t-s)A)\,B(s)\,ds + \exp tA \cdot X_0.$$

8) Show that if A is an n × n matrix with distinct eigenvalues λ1 , . . . , λr and


minimal polynomial
$$m(\lambda) = \prod_{i=1}^{r} (\lambda - \lambda_i)^{m_i},$$
then two polynomials p and q agree on A (i.e. are such that p(A) = q(A)) if
and only if for each i
$$p(\lambda_i) = q(\lambda_i),\quad p'(\lambda_i) = q'(\lambda_i),\quad \dots,\quad p^{(m_i-1)}(\lambda_i) = q^{(m_i-1)}(\lambda_i).$$
9) Let A and B be n × n matrices and define a matrix function X(t) as
follows:
$$X(t) = e^{At} C e^{Bt}.$$
Show that X is a solution of the differential equation
$$\frac{dX}{dt} = AX + XB$$
with initial condition X(0) = C.
Deduce that if the integral
$$Y = -\int_0^{\infty} e^{At} C e^{Bt}\,dt$$

converges for given matrices A, B and C, then it is a solution of the equation


AY + Y B = C.
10) Let A and B be n×n matrices which commute and suppose that A2 = A.
Show that M(s) = AeBs is a solution of the functional equation M(s + t) =
M(s)M(t). Show that if, on the other hand, M is a smooth function which
satisfies this equation, then M(s) has the form AeBs where A = M(0) and
B = M ′ (0).
11) For which matrices A are the following expressions defined:
sin A, cos A, tan A, sinh A, cosh A, tanh A, ln(I + A)?
Show how to use the methods of this section to solve a differential equation
of the form
$$\frac{d^2X}{dt^2} = AX$$
with initial values X(0) = X0 , X ′ (0) = Y0 .
12) Show that if a function x is
such that x(A) is defined, then so is x(At ) and we have
x(At ) = x(A)t .
13) Show that if A and B are commuting n × n matrices, then
$$x(B) = x(A) + x'(A)\cdot(B-A) + \dots + x^{(m)}(A)\cdot\frac{(B-A)^m}{m!}$$
for suitable functions x and integers m.

3.5 Circulants and geometry
An n × n matrix A is a circulant if it has the form
 
$$\begin{pmatrix} a_0 & a_1 & \dots & a_{n-1} \\ a_{n-1} & a_0 & \dots & a_{n-2} \\ \vdots & & \ddots & \vdots \\ a_1 & a_2 & \dots & a_0 \end{pmatrix}.$$
Note that this just means that A is a polynomial function of the special
circulant  
$$C = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \dots & 0 \end{pmatrix}.$$
In fact, A is then p(C) where
p(t) = a0 + a1 t + · · · + an−1 tn−1 .
The following result gives an alternative characterisation of circulant matri-
ces. It can be verified by direct calculation.
Proposition 19 An n × n matrix is circulant if and only if it commutes
with C.
We have already calculated the eigenvalues of C and found them to be
$$1, \omega, \omega^2, \dots, \omega^{n-1}$$
where ω is the primitive root $\cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$. The eigenvector corresponding
to $\omega^k$ is easily seen to be $u_k = \frac{1}{\sqrt n}(\omega^k, \omega^{2k}, \dots, \omega^{nk})$. (The reason for the
factor $\frac{1}{\sqrt n}$ will become apparent later). These eigenvectors are particularly
interesting since they are also eigenvectors for all polynomial functions of C
i.e. for the circulant matrices.
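A small numerical illustration (assuming numpy; the size n and the coefficients a are ours, chosen arbitrarily): the unitary matrix whose columns are the vectors u_k diagonalises every polynomial in C, i.e. every circulant.

```python
import numpy as np

n = 6
omega = np.exp(2j * np.pi / n)

# columns u_k = (omega^k, omega^{2k}, ..., omega^{nk}) / sqrt(n)
F = np.array([[omega ** (j * k) for k in range(n)] for j in range(1, n + 1)]) / np.sqrt(n)

C = np.roll(np.eye(n), -1, axis=0)          # the special circulant above

a = np.array([1.0, 2.0, 0.0, -1.0, 0.0, 3.0])
A = sum(a[k] * np.linalg.matrix_power(C, k) for k in range(n))   # circ(a_0, ..., a_{n-1})

D = np.conj(F).T @ A @ F
print(np.allclose(D, np.diag(np.diag(D))))  # True: D is diagonal
print(np.diag(D))                            # eigenvalues  sum_k a_k omega^{jk}
```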
Here we shall discuss briefly the circulants of the form
 
$$A = \frac{1}{m}\begin{pmatrix} 1 & 1 & 1 & \dots & 1 & 0 & \dots & 0 \\ 0 & 1 & 1 & \dots & 1 & 1 & \dots & 0 \\ \vdots & & & & & & & \vdots \\ 1 & 1 & 1 & \dots & 0 & 0 & \dots & 1 \end{pmatrix}$$
where there are m “ones” in each row. In other words,
$$A = p(C) \quad\text{where}\quad p(t) = \frac{1}{m}(1 + t + \dots + t^{m-1}).$$
It follows from the above that the eigenvalues of A are λ1 , . . . , λn where
λn = 1 and
$$\lambda_k = \frac{1}{m}\cdot\frac{1 - \omega^{km}}{1 - \omega^k}$$
for k ≠ n.
Then we see immediately
• that A is invertible if and only if m and n are relatively prime;
• that if d is the greatest common divisor of m and n, then the dimension
of the kernel of fA is d − 1 and it has a basis consisting of the vectors
$$\frac{1}{\sqrt n}\,\big(\omega^{jn/d}, \omega^{2jn/d}, \dots, 1, \omega^{jn/d}, \dots\big)$$
for 1 ≤ j ≤ d − 1.
These results have the following geometrical interpretation. Suppose that P
is an n-gon in R2 . If we identify the points of R2 with complex numbers, we
can specify P by an n-tuple (z1 , . . . , zn ) of complex numbers (its vertices).
For example, the standard square corresponds to the 4-tuple (0, 1, 1 + i, i).
An n × n matrix A can be regarded as acting on such polygons by left
multiplication of the corresponding column matrix i.e. we define the polygon
Q = A(P) to be the one with vertices ζ1 , . . . , ζn where
$$\begin{pmatrix} \zeta_1 \\ \vdots \\ \zeta_n \end{pmatrix} = A\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}.$$
Consider the transformation with matrix
 
$$A = \frac{1}{m}\begin{pmatrix} 1 & 1 & \dots & 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 1 & 1 & \dots & 0 \\ \vdots & & & & & & \vdots \\ 1 & 1 & \dots & 1 & 0 & \dots & 1 \end{pmatrix}$$
discussed above. In this case, Q is the polygon whose vertices are the cen-
troids of the vertices P1 , . . . , Pm resp. P2 , . . . , Pm+1 and so on. This polygon
is called the m-descendant of P.
The results on the matrix A that we obtained above can now be expressed
geometrically as follows:
If m and n are relatively prime, then every polygon Q is the m-descendant
of a unique polygon P.
A more delicate investigation of the case where the greatest common
factor d of m and n is greater than 1 leads to a characterisation of those
polygons P which are m descendants (see the Exercises below).

Exercises: 1) Show that the determinant of the circulant matrix circ (a0 , . . . , an−1 )
is
$$\prod_{j=0}^{n-1} \sum_{k=0}^{n-1} \omega^{jk} a_k$$

where ω is the primitive n-th root of unity.


2) A matrix A is called r-circulant if CA = AC r where C is the circulant
matrix of the text. Characterise such matrices directly and show that if A is
r-circulant and B s-circulant, then AB is rs-circulant.
3) Describe those matrices which are polynomial functions of the matrix
Jn (0).
4) Suppose that the greatest common factor d of m and n is greater than 1.
Identify the kernel and the range of the matrix
 
$$A = \frac{1}{m}\begin{pmatrix} 1 & 1 & \dots & 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 1 & 1 & \dots & 0 \\ \vdots & & & & & & \vdots \\ 1 & 1 & \dots & 1 & 0 & \dots & 1 \end{pmatrix}$$

and use this to give a characterisation of those polygons which are m-descendants
resp. whose m-descendants are the trivial polygon with all vertices at the
origin.
5) Diagonalise the following circulant matrices:

• circ (a, a + h, . . . , a + (n − 1)h);

• circ (a, ah, . . . , ahn−1 );

• circ (0, 1, 0, 1, . . . , 0, 1).

6) Show that if A is a circulant matrix and x is a suitable function, then


x(A) is also a circulant.

3.6 The group inverse and the Drazin inverse
As a further application of the Jordan form we shall construct two special
types of generalised inverse for linear mappings (resp. matrices). These are
of some importance in certain applications. The method will be the same in
both cases. Firstly we construct the inverse for matrices in Jordan form and
then use this to deal with the general case. We begin with the group inverse.
A group inverse for an n × n matrix A is an n × n matrix S so that

ASA = A SAS = S

and A and S commute. As a simple example, suppose that A is the diagonal


matrix
diag (λ1 , . . . , λr , 0, . . . , 0).
Then S = diag (1/λ1 , . . . , 1/λr , 0, . . . , 0) is obviously a group inverse for A. Hence
every diagonalisable matrix has a group inverse. (If à = P −1 AP is diagonal
and S̃ is a group inverse for à as above, then S = P S̃P −1 is a group inverse
for A).
More generally, note that if the vector space has the splitting

V = f (V ) ⊕ Ker f

then we can use this to define a generalised inverse g = P ◦ f˜−1 where P is


the projection onto f (V ) along Ker f and f˜ is the isomorphism induced by f
on f (V ). In this case, g is easily seen to be a group inverse for f . Note that
from the analysis of section 3 of the present chapter, f has such a splitting
in the case where r(f ) = r(f 2 ). In terms of the Jordan form this means
that the block corresponding to the zero eigenvalues is the zero block i.e. the
Jordan form is
diag (J1 (λ1 ), . . . , Jr (λr ), 0, . . . , 0)
where λ1 , . . . , λr are the non-zero eigenvalues. The group inverse is then

diag (J1 (λ1 )−1 , . . . , Jr (λr )−1 , 0, . . . , 0).

In terms of the minimal polynomial, this means that the matrix has a group
inverse provided the latter has the form

mA (λ) = λǫ (λ1 − λ)n1 . . . (λr − λ)nr

where ǫ is either 0 or 1.
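For a diagonalisable matrix the construction is easy to carry out explicitly. The sketch below assumes numpy; the matrices P and D are chosen arbitrarily for illustration.

```python
import numpy as np

# a diagonalisable singular matrix with eigenvalues 2, 1, 0
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
D = np.diag([2.0, 1.0, 0.0])
A = P @ D @ np.linalg.inv(P)

# group inverse: invert the non-zero eigenvalues, keep the zeros
D_sharp = np.diag([0.5, 1.0, 0.0])
S = P @ D_sharp @ np.linalg.inv(P)

print(np.allclose(A @ S @ A, A))    # ASA = A
print(np.allclose(S @ A @ S, S))    # SAS = S
print(np.allclose(A @ S, S @ A))    # AS = SA
```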

The Drazin inverse A Drazin inverse for an n × n matrix A is an n × n
matrix S so that

• SAS = S;

• S and A commute;
• λ ∈ σ(A) if and only if λ† ∈ σ(S), where λ† = 1/λ for λ ≠ 0 and λ† = 0
for λ = 0;

• Ak+1 S = Ak for some positive integer k.

Suppose first that A is a Jordan block Jn (λ). Then we define S to be Jn (λ)−1


if λ is non-zero and to be 0 if λ = 0. Then S satisfies the above four conditions
(with k = n if λ = 0). We denote this S by AD . Now if A has Jordan form

diag (A1 , A2 , . . . , Ar )

where each Ai is a Jordan block, we define AD to be the matrix

$$\mathrm{diag}\,(A_1^D, \dots, A_r^D).$$

Once again, this satisfies the above four conditions (this time with the integer
k the maximum of the orders (ki ) of the individual blocks corresponding to
the zero eigenvalue).
For a general matrix A we choose an invertible P so that A = P −1 ÃP
where à has Jordan form. Then we define AD to be P −1 ÃD P . Of course,
AD is a Drazin inverse for A.
In terms of operators, the Drazin inverse can be described as follows:
suppose that f : V → V is a linear transformation. Then, as we have seen,
there is a smallest integer p at which the sequences

V ⊃ f (V ) ⊃ f 2 (V ) ⊃ . . .

and
Ker f ⊂ Ker (f 2 ) ⊂ Ker(f 3 ) ⊂ . . .
become stationary. This integer is called the index of f and we have the
splitting
V = f p (V ) ⊕ Ker (f p ).
f , restricted to f p (V ), is an isomorphism of this space onto itself. The
operator f D is that one which is obtained by composing the inverse of the
latter with the projection onto f p (V ) along Ker (f p ).

Exercises: 1) For f ∈ L(V ) we define A(f ) = f (V ) ∩ Ker f . Show that

• dim A(f ) = r(f ) − r(f 2 );

• Ker f ⊂ f (V ) if and only if r(f ) = r(f 2 ) + n(f ) where n(f ) =
dim Ker f ;

• f (V ) = Ker f if and only if f 2 = 0 and n(f ) = r(f );

• V = f (V ) ⊕ Ker f if and only if r(f ) = r(f 2 ).

2) Show that the Drazin inverse is uniquely determined i.e. that there is at
most one matrix S so that SAS = S, AS = SA and Ak+1 S = Ak for some
positive integer k.
3) Show that the group inverse is uniquely determined i.e. there is at most
one matrix S so that ASA = A, SAS = S and AS = SA.
4) Show that if f ∈ L(V ) has a Drazin inverse f D , then f D (V ) = f (V ),
Ker f D = Ker(f k ) and f f D = f D f is the projection onto f (V ) along Ker f k .
5) The following exercise shows how the existence of the Drazin inverse can
be deduced directly from the existence of the minimal polynomial, without
using the Jordan form. Suppose that mA has the form

t 7→ ak tk + · · · + tr

where tk is the lowest power of t which occurs with non-vanishing coefficient


(we can suppose that k ≥ 1). The equation mA (A) = 0 can then be written
in the form
Ak = Ak+1 B
where B is a suitable polynomial in A. Show that Ak+r B r = Ak for each
r ≥ 1 and deduce that S = Ak B k+1 satisfies the defining conditions for
the Drazin inverse. 6) Show that if A is an n × n matrix, then there is a
polynomial p so that AD = p(A).

4 EUCLIDEAN AND HERMITIAN SPACES
4.1 Euclidean space
In chapter II we saw that a number of basic geometrical concepts could be
defined in terms of the scalar product. We now discuss such products in
higher dimensions where, in the spirit of chapter III, we use the axiomatic
approach. We shall prove higher dimensional versions of many of the results
of chapter II, culminating in the spectral theorem for self-adjoint operators.

Definition: A scalar product (or inner product) on a real vector space


V is a mapping from V × V → R, denoted as

(x, y) 7→ (x|y)

so that

• the mapping is bilinear i.e.

(λ1 x1 +λ2 x2 |µ1 y1 +µ2 y2 ) = λ1 µ1 (x1 |y1)+λ1 µ2 (x1 |y2 )+λ2 µ1 (x2 |y1 )+λ2 µ2 (x2 |y2 )

for λ1 , λ2 , µ1, µ2 ∈ R, x1 , x2 , y1, y2 ∈ V ;

• it is symmetric i.e. (x|y) = (y|x) (x, y ∈ V );

• it is positive definite i.e. (x|x) > 0 if x 6= 0.

Property (3) implies the following further property:


(4) if x ∈ V is such that (x|y) = 0 for each y ∈ V , then x = 0. (Choose
y = x in (3)).
A euclidean vector space is a (real) vector space V together with a
scalar product.
Of course, a vector space can be provided with many distinct scalar prod-
ucts. This can be visualised in R2 where a scalar product is given by a bilinear
form of the type
$$(x, y) \mapsto \sum_{i,j=1}^{2} a_{ij}\,\xi_i\eta_j$$

and the positive definiteness means exactly that the corresponding conic
section
$$\sum_{i,j=1}^{2} a_{ij}\,\xi_i\xi_j = 1$$
is an ellipse. Hence the choice of a scalar product in R2 is just the choice of
an ellipse with centre 0.
The standard euclidean space is Rn with the scalar product
$$(x|y) = \sum_i \xi_i\eta_i.$$

Another example is the space Pol (n) with the scalar product
$$(p|q) = \int_0^1 p(t)q(t)\,dt.$$

The latter is a subspace of the infinite dimensional space C([0, 1]) with scalar
product
$$(x|y) = \int_0^1 x(t)y(t)\,dt.$$

Using the scalar product we can define the length (or norm) of a vector
x—written kxk. It is defined by the formula
$$\|x\| = \sqrt{(x|x)}.$$

We then define the distance between two points x, y in V to be kx − yk,


the length of their difference. As in R2 and R3 there is a close connection
between the concepts of norm and scalar product. Of course, the norm is
defined in terms of the latter. On the other hand, the scalar product can be
expressed in terms of the norm as follows:

kx + yk2 = (x + y|x + y) = (x|x) + 2(x|y) + (y|y)

and so
1
(x|y) = (kx + yk2 − kxk2 − kyk2).
2
2
As in R the norm and scalar product satisfy the Cauchy-Schwartz in-
equality:
|(x|y)| ≤ kxkkyk (x, y ∈ V ).
This is named after the discoverers of the classical case
$$\xi_1\eta_1 + \dots + \xi_n\eta_n \leq (\xi_1^2 + \dots + \xi_n^2)^{1/2}(\eta_1^2 + \dots + \eta_n^2)^{1/2}.$$

To prove the general inequality, we consider the quadratic function

$$t \mapsto (x + ty\,|\,x + ty) = t^2\|y\|^2 + 2t(x|y) + \|x\|^2$$
which is non-negative by the positive-definiteness of the product. Hence its
discriminant is less than or equal to zero i.e. 4(x|y)2 −4(kxk2 kyk2) ≤ 0 which
reduces to the required inequality. (Note that the same proof shows that the
Cauchy-Schwarz inequality is strict i.e. |(x|y)| < kxkkyk unless x and y are
proportional. For the above quadratic is strictly positive if x and y are not
proportional and then the discriminant must be negative).
From this we can deduce the triangle inequality

kx + yk ≤ kxk + kyk.

For

kx + yk2 = (x + y|x + y)
= kxk2 + 2(x|y) + kyk2
≤ kxk2 + 2kxkkyk + kyk2
= (kxk + kyk)2.

From the Cauchy-Schwarz inequality we see that if x and y are non-zero then
the quotient
$$\frac{(x|y)}{\|x\|\,\|y\|}$$
lies between −1 and 1. Hence there is a unique θ ∈ [0, π] so that
$$\cos\theta = \frac{(x|y)}{\|x\|\,\|y\|}.$$

θ is called the angle between x and y.


We can also define the concept of orthogonality or perpendicularity: two
vectors are perpendicular (written x ⊥ y) if (x|y) = 0.
As in R2 and R3 we use this notion to define a special class of bases:

Definition: An orthogonal basis in V is a sequence (x1 , . . . , xn ) of non-


zero vectors so that (xi |xj ) = 0 for i 6= j. If, in addition, each vector has
unit length, then the system is orthonormal.
An orthogonal system is automatically linearly independent. For if

$$\lambda_1 x_1 + \dots + \lambda_m x_m = 0,$$
then
$$0 = (\lambda_1 x_1 + \dots + \lambda_m x_m \,|\, x_i) = \lambda_i(x_i|x_i)$$
and so λi = 0 for each i.

Hence an orthonormal system (x1 , . . . , xn ) with n elements in an n-dimensional
space is a basis and such bases are called orthonormal bases. The classical
example is the canonical basis (e1 , . . . , en ) for Rn .
One advantage of an orthonormal basis is the fact that the coefficients of
a vector with respect to the basis can be calculated simply by taking scalar
products. In fact $x = \sum_{k=1}^{n}(x|x_k)x_k$. (This is sometimes called the Fourier
series of x). This is proved by a calculation similar to the one above. Also if
$x = \sum_{i=1}^{n} \lambda_i x_i$ and $y = \sum_{k=1}^{n} \mu_k x_k$, then
$$(x|y) = \Big(\sum_{i=1}^{n}\lambda_i x_i \,\Big|\, \sum_{k=1}^{n}\mu_k x_k\Big) = \sum_{i,k=1}^{n}\lambda_i\mu_k(x_i|x_k) = \sum_{i=1}^{n}\lambda_i\mu_i$$
and, in particular, $\|x\| = \sqrt{\lambda_1^2 + \dots + \lambda_n^2}$. Thus the scalar product and norm
can be calculated from the coordinates with respect to an orthonormal basis
exactly as we calculate them in Rn .
Every euclidean space has an orthonormal basis and to prove this we use
a construction which has a natural geometrical background and which we
have already used in dimensions 2 and 3.

Proposition 20 If V is an n-dimensional euclidean space, then V has an


orthonormal basis (x1 , . . . , xn ).

Proof. We construct the basis recursively. For x1 we take any unit vector.
If we have constructed x1 , . . . , xr we construct xr+1 as follows. We take any
z which does not lie in the span of x1 , . . . , xr (of course, if there is no such
element we have already constructed a basis). Then define
$$\tilde x_{r+1} = z - \sum_{i=1}^{r} (z|x_i)\,x_i.$$

x̃r+1 is non-zero and is perpendicular to each xi (1 ≤ i ≤ r). Hence if we put


$$x_{r+1} = \frac{\tilde x_{r+1}}{\|\tilde x_{r+1}\|}$$
then (x1 , . . . , xr+1 ) is an orthonormal system.


The proof actually yields the following result:

Proposition 21 Every orthonormal system (x1 , . . . , xr ) in V can be ex-


tended to an orthonormal basis i.e. there exist elements xr+1 , . . . , xn so that
(x1 , . . . , xn ) is an orthonormal basis for V .

The standard way to construct an orthonormal basis is called the Gram-
Schmidt process and consists in applying the above method in connection
with a given basis (y1 , . . . , yn ), using y1 for x1 and, at the r-th step using
yr+1 for z. This produces an orthonormal basis x1 , . . . , xn of the form
x1 = b11 y1
x2 = b21 y1 + b22 y2
..
.
xn = bn1 y1 + · · · + bnn yn
where the diagonal elements bii are non-zero. If we apply this method to the
case where the space is Rn (identified with the space of row vectors) and the
basis (y1, . . . , yn ) consists of the rows of an invertible n × n matrix A, we
obtain a lower triangular matrix
 
$$B = \begin{pmatrix} b_{11} & 0 & \dots & 0 \\ b_{21} & b_{22} & \dots & 0 \\ \vdots & & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{pmatrix}$$
and a matrix Q whose rows form an orthonormal basis for Rn (such matrices
are called orthonormal) so that Q = BA. Since B is invertible and its
inverse L is also a lower triangular matrix, we obtain the following result on
matrices:
Proposition 22 Any n × n invertible matrix A has a representation of the
form A = LQ where Q is an orthonormal matrix and L is lower triangular.
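The following sketch carries the construction out explicitly (in Python with numpy, which we assume; the function name, the bookkeeping of the triangular factor and the sample matrix are ours). It orthonormalises the rows of A and recovers the factorisation A = LQ of the proposition.

```python
import numpy as np

def gram_schmidt_rows(A):
    # orthonormalise the rows of A; returns Q with orthonormal rows and the
    # lower triangular B with Q = B A, so that A = B^{-1} Q = L Q
    n = A.shape[0]
    Q = np.zeros_like(A, dtype=float)
    B = np.zeros((n, n))
    for r in range(n):
        v = A[r].astype(float)
        coeffs = np.zeros(n)
        coeffs[r] = 1.0
        for i in range(r):
            c = v @ Q[i]
            v = v - c * Q[i]
            coeffs -= c * B[i]
        norm = np.linalg.norm(v)
        Q[r] = v / norm
        B[r] = coeffs / norm
    return Q, B

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0]])
Q, B = gram_schmidt_rows(A)
print(np.allclose(Q @ Q.T, np.eye(3)))   # rows of Q are orthonormal
L = np.linalg.inv(B)                     # lower triangular
print(np.allclose(L @ Q, A))             # A = L Q
```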
We can use this fact to prove a famous inequality for the determinant of
an n × n matrix A. We have
$$|\det A| \leq \prod_i \Big(\sum_j |a_{ij}|^2\Big)^{1/2}$$
(i.e. the determinant is bounded by the product of the euclidean norms of
the rows. This is known as Hadamard’s inequality).
Proof. We can write the above equation as L = AQᵗ. Hence

|lii| = | Σ_j aij qij | ≤ ( Σ_j aij² )^{1/2} ( Σ_j qij² )^{1/2},

the last step using the Cauchy-Schwarz inequality. Hence (since Σ_j qij² = 1),

| det A| = | det L| | det Q| = | det L| = ∏_i |lii| ≤ ∏_i ( Σ_j aij² )^{1/2}.

(We have used the fact that if the matrix Q is orthonormal, then its deter-
minant is ±1. This follows as in the 2-dimensional case from the equation
Qᵗ Q = I — see below.)
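The inequality is easy to test numerically; the following sketch (Python with numpy, used here only for illustration) compares |det A| with the product of the euclidean row norms for a random matrix.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

lhs = abs(np.linalg.det(A))
rhs = np.prod(np.linalg.norm(A, axis=1))   # product of the euclidean row norms
print(lhs, rhs, lhs <= rhs + 1e-12)        # Hadamard's inequality holds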
In the context of euclidean space, those linear mappings which preserve
distance (i.e. are such that kf (x)k = kxk for x ∈ V ) are of particular interest.
As in the two and three dimensional cases, we can make the following simple
remarks (note that we only consider linear isometries):
I. If f is an isometry, then f preserves scalar products i.e. (f (x)|f (y)) = (x|y)
(x, y ∈ V ). For

(f(x)|f(y)) = ½ ( ‖f(x + y)‖² − ‖f(x)‖² − ‖f(y)‖² )
            = ½ ( ‖x + y‖² − ‖x‖² − ‖y‖² )
            = (x|y).

On the other hand, this property implies that f is an isometry. (Take x = y).
II. An isometry from V into V1 is automatically injective and so is surjective
if and only if dim V = dim V1 . In particular, any isometry of V into itself is
a bijection.
III. An isometry maps orthonormal systems onto orthonormal systems. In
particular, if dim V = dim V1 , then f maps orthonormal bases onto orthonor-
mal bases. On the other hand, if f maps one orthonormal basis (x1 , . . . , xn )
onto an orthonormal system (y1, . . . , yn) in V1, then f is an isometry. For if
x = Σ_k λk xk, then f(x) = Σ_k λk yk and so

‖f(x)‖² = Σ_k λk² = ‖x‖².

IV. By this criterion, if A is an n × n matrix, then since fA maps the


canonical basis for Rn onto the columns of A, we see that fA is an isometry
if and only if the columns of A form an orthonormal basis. This can be
conveniently expressed in the equation At · A = I (or equivalently, At = A−1
or A · At = I). Matrices with this property are called orthonormal as we
noted above.
Typical isometries in Rn are reflections i.e. operators with matrices of
the form  
1 0 0 ... 0 0
 0 1 0 ... 0 0 
 
 .. .. 
 . . 
 
 0 0 0 ... 1 0 
0 0 0 . . . 0 −1

with respect to some orthonormal basis and rotations i.e. those with ma-
trices of the form
 
1 0 0 ... 0 0
 0 1 0 ... 0 0 
 
 .. .. 
 . . 
 
 0 0 0 . . . cos θ − sin θ 
0 0 0 . . . sin θ cos θ

with respect to some orthonormal basis.

Example: Show that the mapping

(A|B) = tr (AB t )

is a scalar product on Mn .
Solution: The bilinearity and symmetry:

(A1 + A2 |B) = tr((A1 + A2 )B t )


= tr (A1 B t + A2 B t )
= tr(A1 B t ) + tr (A2 B t )
= (A1 |B) + (A2 |B).

(A|B) = tr (AB t )
= tr((AB t )t )
= tr(BAt )
= (B|A).

Positive definiteness:
(A|A) = tr(AAᵗ) = Σ_{i,j} aij² > 0

for A ≠ 0.
We remark that the basis (eij : i, j = 1, . . . , n) is then orthonormal where
eij is the matrix with a 1 in the (i, j)-th position and zeroes elsewhere.

Example: Show that the mapping

(x|y) = 4ξ1 η1 − 2ξ1η2 − 2ξ2η1 + 3ξ2 η2

is a scalar product on R2 .
Solution: The matrix of the quadratic form is
 
4 −2
−2 3
with eigenvalues the roots of λ² − 7λ + 8, which are positive.
We calculate an orthonormal basis with respect to this scalar product by
applying the Gram-Schmidt process to x1 = (1, 0), x2 = (0, 1). This gives

e1 = x1/‖x1‖ = (1/2, 0);

y2 = (0, 1) − ((0, 1)|(1/2, 0))(1/2, 0) = (1/2, 1);

e2 = y2/‖y2‖ = (1/√2)(1/2, 1).

(Here ((0,1)|(1/2, 0)) = −1 and ‖y2‖² = 2, the norms and scalar products being taken with respect to the given scalar product.)
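A quick numerical check of this example (Python with numpy; the Gram matrix G below encodes the given scalar product) confirms that e1 and e2 form an orthonormal system for it.

import numpy as np

G = np.array([[4.0, -2.0], [-2.0, 3.0]])   # matrix of the scalar product
ip = lambda x, y: x @ G @ y

e1 = np.array([0.5, 0.0])
e2 = (1 / np.sqrt(2)) * np.array([0.5, 1.0])
for u, v in [(e1, e1), (e2, e2), (e1, e2)]:
    print(round(ip(u, v), 10))             # 1.0, 1.0, 0.0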

Example: Construct an orthonormal basis for Pol (2) with scalar product
(p|q) = ∫₀¹ p(t)q(t) dt

by applying the Gram-Schmidt method to (1, t, t²).


Solution:

x̃0 = 1, x0 = 1;

x̃1(t) = t − (t|1)1 = t − 1/2;

x1(t) = √12 (t − 1/2);

x̃2(t) = t² − (t²|x0)x0(t) − (t²|x1)x1(t) = t² − t + 1/6;

x2(t) = 6√5 (t² − t + 1/6).
We calculate the Fourier series of 1 + t + t² with respect to this basis. We
have

λ1 = ∫₀¹ (t² + t + 1) dt = 11/6,

λ2 = √12 ∫₀¹ (t² + t + 1)(t − 1/2) dt = 1/√3,

λ3 = 6√5 ∫₀¹ (t² + t + 1)(t² − t + 1/6) dt = 1/(6√5),

and so

t² + t + 1 = 11/6 + 2(t − 1/2) + (t² − t + 1/6).
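These computations can be verified by numerical quadrature; the following sketch (Python with scipy, used here only as a check) reproduces the orthonormality of x0, x1, x2 and the three Fourier coefficients.

import numpy as np
from scipy.integrate import quad

ip = lambda p, q: quad(lambda t: p(t) * q(t), 0.0, 1.0)[0]

x0 = lambda t: 1.0
x1 = lambda t: np.sqrt(12.0) * (t - 0.5)
x2 = lambda t: 6.0 * np.sqrt(5.0) * (t**2 - t + 1.0 / 6.0)
f  = lambda t: t**2 + t + 1.0

# pairwise scalar products: 1, 1, 1 on the diagonal, 0 off the diagonal
print([round(ip(a, b), 8) for a, b in
       [(x0, x0), (x1, x1), (x2, x2), (x0, x1), (x0, x2), (x1, x2)]])
# Fourier coefficients of 1 + t + t^2: 11/6, 1/sqrt(3), 1/(6*sqrt(5))
print(ip(f, x0), ip(f, x1), ip(f, x2))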

Example If (e1 , . . . , en ) is an orthonormal basis for V and x1 , . . . , xn ∈ V ,


then the square of the determinant of the n × n matrix [(xi|ej)] equals the determinant of the Gram matrix [(xi|xj)]:

( det [(xi|ej)] )² = det [(xi|xj)].

(And hence the right hand side is non-negative and vanishes if and only if
the xi are linearly dependent).
Solution: This follows from the equalities

(xi|xj) = Σ_{k=1}^n (xi|ek)(xj|ek)

which can be written in the matrix form

[(xi|ej)] [(xi|ej)]ᵗ = [(xi|xj)].

Exercises: 1)

• Apply the Gram-Schmidt process to obtain an orthonormal system


from the following sets:

(1, 0, 1), (2, 1, −1), (−1, 1, 0) in R3 ;


(1, 2, 1, 2), (1, 1, 1, 0), (2, 1, 0, 1) in R4 .

• Calculate the orthogonal projection of (1, 3, −2) on [(2, 1, 3), (2, 0, 5)].

2) Show that if the ai are positive, then

( Σ_{i=1}^n ai )( Σ_{i=1}^n 1/ai ) ≥ n²

and

( Σ_{i=1}^n ai )² ≤ n ( Σ_{i=1}^n ai² ).

3) If x1 , . . . , xm are points in Rn , then the set of points which are equidistant
from x1 , . . . , xm is an affine subspace which is of dimension n − m if the xi
are affinely independent.
4) Let (x0 , . . . , xn ) be affinely independent points in Rn . Then there is exactly
one hypersphere through these points and its equation is

det ( 1          1          . . .   1           1     )
    ( X0         X1         . . .   Xn          X     )  =  0
    ( (x0|x0)    (x1|x1)    . . .   (xn|xn)     (x|x) )

where X is the column vector corresponding to x etc.


5) Suppose that a1, . . . , an, b1, . . . , bn are positive numbers. Show that either

a1/b1 + · · · + an/bn ≥ n

or

b1/a1 + · · · + bn/an ≥ n.
6) Show that a sequence (x1, . . . , xn) in a euclidean space V is an orthonormal
basis if and only if for each x ∈ V,

‖x‖² = Σ_{i=1}^n |(x|xi)|².

7) Calculate an orthonormal basis of polynomials for Pol (3) by applying


the Gram-Schmidt process to the basis (1, t, t², t³) with respect to the scalar
product

(p|q) = ∫₀^∞ p(t)q(t)e^{−t} dt.
8) Let pn be the polynomial

pn(t) = dⁿ/dtⁿ (t² − 1)ⁿ.

Show that

∫₋₁¹ tᵏ pn(t) dt = 0   (k < n)

and hence that the system (pn ) is orthogonal for the corresponding scalar
product. (This shows that the system (pn ) is, up to norming factors, the
sequence obtained by applying the Gram-Schmidt process to the sequence
(tn ). These functions are called the Legendre polynomials).

9) Approximate sin x by a polynomial of degree 3 using the orthonormal
basis of the last exercise. Use this to check the accuracy by calculating an
approximate value of sin 1.
10) Show that for an n × n matrix A the following inequality holds:

Σ_{i,j=1}^n aij² ≥ (1/n)(tr A)².

For which matrices do we have equality?


11) Show that if (ξi) and (ηi) are vectors in Rn, then

( Σ_i ξi² )( Σ_i ηi² ) = ( Σ_i ξi ηi )² + Σ_{1≤i<j≤n} (ξi ηj − ξj ηi)².

(This is a quantitative version of the Cauchy-Schwarz inequality.)


12) Consider the euclidean space Mn with the scalar product discussed in the
example above. If A is an n × n matrix one can apply the Gram-Schmidt
process to the sequence I, A, A², A³, . . . . This will stop at the smallest k
so that Aᵏ is a linear combination of the earlier powers. This provides a specific
the details.
13) Let (x1, . . . , xk) be a linearly independent set in the euclidean space V.
Show that if x ∈ V, then its orthogonal projection x0 onto the linear span of
the above vectors is given by the formula

x0 = Σ_{r=1}^k λr xr

where λj = det Gj / det G. Here G is the matrix [gij] where gij = (xi|xj) and Gj is
the matrix obtained from G by replacing the j-th column by ??
14) Show that if x1, . . . , xn is a basis for the euclidean space V, then the
Gram-Schmidt process, applied to this basis, leads to the system (yk) where

yk = (dk−1 dk)^{−1/2} Dk

where dk = det Gk (Gk being the Gram matrix of x1, . . . , xk, as in 13)) and

Dk = det ( (x1|x1)      . . .   (x1|xk)   )
         (   ..                   ..      )
         ( (xk−1|x1)    . . .   (xk−1|xk) )
         (  x1          . . .    xk       )

(The last expression is to be understood as the linear combination of the x’s
obtained by formally expanding the “determinant” along the last row).
15) Use Hadamard's inequality to show that

| det A|² ≤ K^{2n} · nⁿ

where A is an n × n matrix and K = max_{i,j} |aij|.


16) Show that

min ∫₀^∞ e^{−t} (1 + a1 t + · · · + an tⁿ)² dt = 1/(n + 1);

min ∫₀¹ (1 + a1 t + · · · + an tⁿ)² dt = 1/(n + 1)²

where the minima are each taken over the possible choices of the coefficients
a1, . . . , an.

4.2 Orthogonal decompositions
In our discussion of vector spaces we saw that each subspace has a com-
plementary subspace which determines a splitting of the original space. In
general this complementary subspace is not unique but, as we shall now see,
the structure of a euclidean space allows us to choose a unique one in a
natural way.

Definition: If V1 is a subspace of the euclidean space V , then the set

V1⊥ = {x ∈ V : (x|y) = 0 for each y ∈ V1 }

of vectors which are orthogonal to each vector in V1 forms a subspace of V


called the orthogonal complement of V1 .

Proposition 23 V1 and V1⊥ are complementary subspaces. More precisely,


if (x1 , . . . , xn ) is an orthonormal basis for V so that V1 = [x1 , . . . , xr ], then
V1⊥ = [xr+1 , . . . , xn ]. Hence V = V1 ⊕ V1⊥ .
Proof. It suffices to check that y = Σ_{i=1}^n λi xi is orthogonal to each xi
(i = 1, . . . , r), and so to V1, if and only if λi = 0 for i = 1, . . . , r, i.e.
y ∈ [xr+1, . . . , xn].
We shall write V = V1 ⊥ V2 to denote the fact that V = V1 ⊕ V2 and
V1 ⊥ V2.
If V = V1 ⊥ V2 and (x1, . . . , xn) is an orthonormal basis with V1 =
[x1, . . . , xr], then

PV1(x) = Σ_{i=1}^r (x|xi)xi

is the projection of x onto V1 along V1⊥ . It is called the orthogonal pro-


jection of x onto V1 and has the following important geometric property:
PV1 (x) is the nearest point to x in V1 i.e.

kx − PV1 (x)k < kx − zk (z ∈ V1 , z 6= PV1 (x)).

For if z ∈ V1 , we have

‖x − z‖² = ‖x − PV1(x)‖² + ‖PV1(x) − z‖²

since x − PV1(x) ⊥ PV1(x) − z.


The point x − 2PV1 (x) is then the mirror image (or reflection) of x in
V1⊥ . The linear mapping Id − 2PV1 is called a reflection.
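The following minimal sketch (Python with numpy; the subspace V1 and the point x are made-up example data) illustrates the orthogonal projection, the nearest-point property and the reflection Id − 2PV1.

import numpy as np

# V1 = span{(1,0,1), (0,1,0)} in R^3; the rows of X form an orthonormal basis of V1
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X[0] /= np.linalg.norm(X[0])

P = X.T @ X                               # matrix of the orthogonal projection onto V1
x = np.array([1.0, 2.0, 3.0])
print(P @ x)                              # nearest point to x in V1: (2, 2, 2)
# (2,2,2) is nearer to x than another point of V1, e.g. (1,2,1)
print(np.linalg.norm(x - P @ x) <= np.linalg.norm(x - np.array([1.0, 2.0, 1.0])))
print((np.eye(3) - 2 * P) @ x)            # the reflection Id - 2 P_{V1} applied to x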

In the same way we define an orthogonal decomposition

V = V1 ⊥ · · · ⊥ Vr .

This means that V = V1 ⊕ · · · ⊕ Vr and Vi ⊥ Vj (i.e. x ⊥ y if x ∈ Vi , y ∈ Vj )


when i 6= j.
Then if Pi is the orthogonal projection onto Vi ,

• P1 + · · · + Pr = Id;

• Pi Pj = 0 (i 6= j).

Such a sequence of projections is called a partition of unity.


Orthogonal splittings of the space are obtained by partitioning an or-
thonormal basis (x1, . . . , xn), i.e. we define Vi to be [x_{k_{i−1}}, . . . , x_{k_i − 1}] where
1 = k0 < k1 < · · · < kr = n + 1.
On the other hand, if we have such a splitting V1 ⊥ · · · ⊥ Vr, then we
can construct an orthonormal basis for V by combining orthonormal bases
for the components V1, . . . , Vr.

Exercise: 1) Consider the space Mn with the scalar product

(A|B) = tr (B t A).

Show that Mn is the orthogonal direct sum of the symmetric and the anti-
symmetric matrices.

4.3 Self-adjoint mappings — the spectral theorem
One of the most important consequences of the existence of a scalar product
is the fact that it induces a certain symmetry on the linear operators
on the space. If f : V → V1 we say that g : V1 → V is adjoint to f if
(f(x)|y) = (x|g(y)) for x ∈ V and y ∈ V1. We shall presently see that such
a g always exists. Furthermore it is unique. The general construction is
illustrated by the following two examples:
I. Consider the mapping f : R² → R² defined by the matrix

( a11  a12 )
( a21  a22 ).

A simple calculation shows that

(f (x)|y) = a11 ξ1 η1 + a12 ξ2 η1 + a21 ξ1 η2 + a22 ξ2 η2 .

If g is the mapping with matrix At , then analogously

(x|g(y)) = a11 η1 ξ1 + a21 η2 ξ1 + a12 η1 ξ2 + a22 η2 ξ2

and both are equal.


II. Now consider the mapping f : Rn → Rm defined by the matrix A = [aij ].
We know that aij is the i-th coordinate of f (ej ) i.e. that aij = (f (ej )|ei ) (re-
member the method of calculating coordinates with respect to an orthonor-
mal basis). Hence if a mapping g : Rm → Rn exists which is adjoint to f ,
then
aij = (f (ej )|ei ) = (ej |g(ei ))
and by the same reasoning the latter is the (j, i)-th entry of g. In other
words the only possible choice for g is the mapping with matrix B = At . A
simple calculation shows that this mapping does in fact satisfy the required
conditions. Since these considerations are perfectly general, we state and
prove them in the form of the following Proposition:

Proposition 24 If f : V → V1 is a linear mapping there is exactly one


linear mapping g : V1 → V which is adjoint to f . If f has matrix A with
respect to orthonormal bases (xi ) resp. (yj ) then g is the mapping with matrix
At with respect to (yj ) and (xi ).

Proof. Suppose that x ∈ V and y ∈ V1 . Then

(f(x)|y) = ( f( Σ_{j=1}^n (x|xj)xj ) | Σ_{k=1}^m (y|yk)yk )
         = ( Σ_{i=1}^m Σ_{j=1}^n aij (x|xj) yi | Σ_{k=1}^m (y|yk)yk )
         = Σ_{i=1}^m Σ_{j=1}^n Σ_{k=1}^m aij (x|xj)(y|yk)(yi|yk)
         = Σ_{i=1}^m Σ_{j=1}^n aij (x|xj)(y|yi).

Similarly, we have

(x|g(y)) = Σ_{i=1}^m Σ_{j=1}^n aij (x|xj)(y|yi)

if g is the operator with matrix as in the formulation and so

(f (x)|y) = (x|g(y))

i.e. g is adjoint to f .
Naturally, we shall denote g by f t and note the following simple properties:
• (f + g)t = f t + g t ;
• (λf )t = λf t ;
• (f g)t = g t f t ;
• (f t )t = f ;
• an operator f is an isometry if and only if f t f = Id. In particular, if
V = V1 , this means that f t = f −1 .
We prove the last statement. We have seen that f is an isometry if and only
if (f (x)|f (y)) = (x|y) for each x, y and this can be restated in the form

(f t f (x)|y) = (x|y)

i.e. f t f = Id.
A mapping f : V → V is said to be self-adjoint if f t = f i.e.

(f (x)|y) = (x|f (y))

for x, y ∈ V. This is equivalent to f being represented by a symmetric matrix A
with respect to an orthonormal basis. In particular, this will be the case if
f is represented by a diagonal matrix. The most important result of this
chapter is the following which states that the converse is true:

Proposition 25 Let f : V → V be self-adjoint. Then V possesses an or-


thonormal basis consisting of eigenvectors of f . With respect to this basis, f
is represented by the diagonal matrix diag (λ1 , . . . , λn ) where the λi are the
eigenvalues of f .

In terms of matrices this can be stated in the following form:

Proposition 26 Let A be a symmetric n × n matrix. Then there exists an


orthonormal matrix U and real numbers λ1 , . . . , λn so that

U t AU = U −1 AU = diag (λ1 , . . . , λn ).

In particular, a symmetric matrix is diagonalisable and its characteristic poly-


nomial has n real roots.

This follows immediately from the above Proposition, since the transfer ma-
trix between orthonormal bases is orthonormal.
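Numerically, the diagonalisation of a symmetric matrix is exactly what standard eigenvalue routines compute; the following sketch (Python with numpy, illustration only, with a small made-up symmetric matrix) checks Proposition 26.

import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])            # symmetric

lam, U = np.linalg.eigh(A)                 # real eigenvalues and orthonormal eigenvectors
print(lam)                                 # -sqrt(2), 0, sqrt(2)
print(np.allclose(U.T @ U, np.eye(3)))     # U is orthonormal
print(np.allclose(U.T @ A @ U, np.diag(lam)))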
Before proceeding with the proof we recall that we have essentially proved
this result in the two dimensional case in our treatment of conic sections in
Chapter II.
One of the consequences of the result is the fact that every symmetric
matrix A has at least one eigenvalue. Indeed this is the essential part of
the proof as we shall see and before proving this we reconsider the two di-
mensional case. The symmetric matrix

( a11  a12 )
( a12  a22 )

induces the quadratic form
φ1(x) = (fA(x)|x)
and so defines the conic section

Q1 = {x : φ1 (x) = 1}.

If we compare this to the form φ2 (x) = (x|x) which defines the unit circles
Q2 = {x : φ2 (x) = 1} and consider the case where Q1 is an ellipse, then we
see that the eigenvectors of A are just the major and minor axes of the ellipse
and these can be characterised as those directions x for which the ratio
φ1(x)/φ2(x) is extremal. Hence we can reduce the search for eigenvalues to one for the
extremal value of a suitable functional, a problem which we can solve with

the help of elementary calculus. This simple idea will allow us to prove the
following result which is the core of the proof. We use the Proposition that
a continuous function on a closed, bounded subset of Rn is bounded and
attains its supremum.
Proposition 27 (Lemma) Let f : V → V be self-adjoint. Then there exists
an x ∈ V with kxk = 1 and λ ∈ R so that f (x) = λx (i.e. f has an
eigenvalue).
Proof. We first consider the case where V = Rn with the natural scalar
product. Then f is defined by an n × n symmetric matrix A. Consider the
function
(f (x)|x)
φ : x 7→
(x|x)
on V \ {0}. Then φ(λx) = φ(x) for λ 6= 0. There exists an x1 ∈ V so
that kx1 k = 1 and φ(x1 ) ≥ φ(x) for x ∈ V with kxk = 1. Hence, by the
homogeneity,
φ(x1 ) ≥ φ(x) (x ∈ V \ {0}).
We show that x1 is an eigenvector with f (x1 ) = λ1 x1 where λ1 = φ(x1 ) =
(f(x1)|x1). To do this choose y ≠ 0 in V. The function
ψ : t ↦ φ(x1 + ty)
has a maximum at t = 0. We show that ψ′(0) exists and that its value is
2(y|f(x1)) − 2(y|x1)λ1. Hence this must vanish for each y, i.e.
(y|f(x1) − λ1 x1) = 0.
Since this holds for each y, f (x1 ) = λ1 x1 . To calculate the derivative of ψ
we compute the limit as t tends to zero of the difference quotient
1 1
(ψ(t) − ψ(0)) = (φ(x1 + ty) − φ(x1 )).
t t
But this is the limit of the expression

(1/t) · [ (x1|x1)[(f(x1)|x1) + 2t(f(x1)|y) + t²(f(y)|y)] − (f(x1)|x1)[(x1|x1) + 2t(x1|y) + t²(y|y)] ] / [ (x1 + ty|x1 + ty)(x1|x1) ]

which is easily seen to be
2(y|f(x1)) − 2(y|x1)λ1.
In order to prove the general case (i.e. where f is an operator on an
abstract euclidean space), we consider an orthonormal basis (x1 , . . . , xn ) for
V and let A be the matrix of f with respect to this basis. Then A has an
eigenvalue and of course this means that f also has one.
We can now continue to the proof of the main result:

Proof. The proof is an induction argument on the dimension of V . For
dim V = 1, the result is trivial (all 1 × 1 matrices are diagonal!) The step
n − 1 → n: By the Lemma, there exists an eigenvector x1 with kx1 k = 1 and
eigenvalue λ1 . Put V1 = {x1 }⊥ = {x ∈ V : (x|x1 ) = 0}. Then V1 is (n − 1)-
dimensional and f (V1 ) ⊂ V1 since if x ∈ V1 , then (f (x)|x1 ) = (x|f (x1 )) =
(x|λ1 x1 ) = 0 and so f (x) ∈ V1 . Hence by the induction hypothesis there
exists an orthonormal basis (x2 , . . . , xn ) consisting of eigenvectors for f . Then
(x1 , . . . , xn ) is the required orthonormal basis for V .
The above proof implies the following useful characterisation of the largest
resp. smallest eigenvalue of f ;
Corollary 3 Let f : V → V be a self-adjoint linear mapping with eigenvalues
λ1, . . . , λn numbered so that λ1 ≤ · · · ≤ λn. Then
λ1 = min{φ(x) : x ∈ V \ {0}}
λn = max{φ(x) : x ∈ V \ {0}}.
This can be generalised to the following so-called minimax characterisation
of the k-th eigenvalue:
λk = min max{φ(x) : x ∈ V \ {0}, (x|y1) = · · · = (x|yr) = 0},
the minimum being taken over all finite sequences y1, . . . , yr of unit vectors,
where r = n − k.
We remark here that it follows from the proofs of the above results that if
f is such that (f (x)|x) ≥ 0 for each x in V , then its eigenvalues are all non-
negative (and they are positive if (f (x)|x) > 0 for non-zero x). Such f are
called positive semi-definite resp. positive definite and will be examined
in some detail below. It follows from the above minimax description of the
eigenvalues that
λk (f + g) ≥ λk (f )
whenever f is self-adjoint and g is positive semi-definite (with strict inequality
when g is positive definite). As we have seen, a symmetric operator is always
diagonalisable. Using this fact, we can prove the following weaker result for
arbitrary linear operators between euclidean spaces.
Proposition 28 Let f : V → V1 be a linear mapping. Then there exist or-
thonormal bases (x1 , . . . , xn ) for V and (y1 , . . . , ym ) for V1 so that the matrix
of f with respect to these bases has the form
 
A1 0
0 0
where A1 = diag (µ1 , . . . , µr ), r is the rank of f and µ1 , . . . , µr are positive
scalars.

The corresponding result for matrices is the following:
Proposition 29 If A is an m × n matrix then there exist orthonormal ma-
trices U1 and U2 so that U1 AU2 has the above form.
Proof. We prove the operator form of the result. Note that the mapping
f t f on V is self-adjoint since (f t f )t = f t f tt = f t f . We can thus choose
an orthonormal basis (xi ) consisting of eigenvectors for f t f . If λi is the
corresponding eigenvalue we can number the xi so that the first r eigenvalues
are non-zero but the following ones all vanish. Then each λi is positive
(i = 1, . . . , r) and ‖f(xi)‖ = √λi. For

λi = λi (xi |xi ) = (λi xi |xi ) = (f t f (xi )|xi ) = (f (xi )|f (xi )) > 0.

Also if i ≠ j then f(xi) and f(xj) are perpendicular since

(f (xi )|f (xj )) = (f t f (xi )|xj ) = λi (xi |xj ) = 0.

Hence if we put

y1 = f(x1)/√λ1 , . . . , yr = f(xr)/√λr
then (y1 , . . . , yr ) is an orthonormal system in V1 . We extend it to an or-
thonormal basis (y1 , . . . , ym ) for V1 . The matrix of f with respect to these
bases clearly has the desired form.
The proof shows that the µi are the square roots of the eigenvalues of f t f
and so are uniquely determined by f. They are called the singular values of
f.
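In numerical libraries this decomposition is the singular value decomposition; the following sketch (Python with numpy, illustration only) checks that the singular values returned by svd are the square roots of the eigenvalues of AᵗA.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)               # A = U diag(s) V^t with U, V orthonormal
print(s)                                  # the singular values of A
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))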

Example: Let V be a euclidean space with a basis (xi ) which is not as-
sumed to be orthonormal. Show that if f ∈ L(V ) has matrix A with re-
spect to this basis, then f is self-adjoint if and only if At G = GA where
G = [(xi |xj )]i,j .
Solution: f is self-adjoint if and only if

( f( Σ_{i=1}^n λi xi ) | Σ_{j=1}^n µj xj ) = ( Σ_{i=1}^n λi xi | f( Σ_{j=1}^n µj xj ) )

for all choices of scalars λ1 , . . . , λn , µ1, . . . , µn and this holds if and only if

(f (xi )|xj ) = (xi |f (xj ))

for each i and j. Substituting the values of f (xi ) and f (xj ) one gets the
required equation At G = GA.

Exercises: 1) Calculate the adjoints of the following operators:

• (ξ1 , ξ2 , ξ3 ) 7→ (ξ1 + ξ2 + ξ3 , ξ1 + 2ξ2 + 2ξ3 , ξ1 − 2ξ2 − 2ξ3 ) (on R3 with


the usual scalar product);

• the differentiation operator D on Pol (n);

• p 7→ (t 7→ tp(t)) from Pol (n) into Pol (n + 1).

2) Show that if f ∈ L(V ), then

Ker fᵗ = f(V)^⊥ and fᵗ(V) = (Ker f)^⊥.

3) Show that for every self-adjoint operator f there are orthogonal projections
P1 , . . . , Pr and real scalars λ1 , . . . , λr so that

• Pi Pj = 0 if i 6= j;

• P1 + · · · + Pr = Id;

• f = λ1 P1 + · · · + λr Pr .

Calculate the Pi explicitly for the operator on R3 with matrix


 
0 1 0
 1 0 1 .
0 1 0

4) Show that an operator f is an orthogonal projection if and only if f 2 = f


and f is self-adjoint. Let f be a self-adjoint operator so that f m = f n for
distinct positive integers m and n. Show that f is an orthogonal projection.
5) Show that if f is an invertible, self-adjoint operator, then λ is an eigenvalue
of f if and only if 1/λ is an eigenvalue of f⁻¹. Show that f and f⁻¹ have the
same eigenvectors.

4.4 Conic sections
As mentioned above, the theory of conic sections in the plane was the main
source of the ideas which lie behind the spectral theorem. We now indicate
briefly how the latter can be used to give a complete classification of higher
dimensional conic sections. The latter are defined as follows:

Definition: A conic section in a euclidean space V is a set of the form

Q = {x ∈ V : (f (x)|x) + 2(b|x) + c = 0}

where b ∈ V , c ∈ R and f is a self-adjoint operator on V . In order to simplify


the analysis we assume that the conic section is central i.e. has a point of
central symmetry (for example, in R2 an ellipse is central whereas a parabola
is not). If we then take this point to be the origin, it follows that x ∈ Q if
and only if −x ∈ Q. Analytically this means that b = 0, i.e. Q has the form

{x ∈ V ; (f (x)|x) + c = 0}.

Proposition 30 If Q is a conic section of the above form, then there is an


orthonormal basis (x1 , . . . , xn ) for V and real numbers λ1 , . . . , λn so that

Q = {x = ξ1 x1 + · · · + ξn xn : λ1 ξ1² + · · · + λn ξn² + c = 0}.

This result can be restated as follows: let f be the isometry from V into Rn
which maps (x1, . . . , xn) onto the canonical basis (e1, . . . , en). Then

f(Q) = {(ξ1, . . . , ξn) ∈ Rn : λ1 ξ1² + · · · + λn ξn² + c = 0}.

Proof. Choose for (x1 , . . . , xn ) an orthonormal basis for V so that f (xi ) =


λi xi . Then
(f(x)|x) = λ1 ξ1² + · · · + λn ξn²
if x = ξ1 x1 + · · · + ξn xn.
As an application, we classify the central conics in R3 . Up to an isometry
they have the form

{(ξ1, ξ2, ξ3) : λ1 ξ1² + λ2 ξ2² + λ3 ξ3² + c = 0}.

By distinguishing the various possibilities for the signs of the λ's, we obtain
the following types:

• λ1, λ2, λ3 all positive. Then we can reduce to the form

ξ1²/a² + ξ2²/b² + ξ3²/c² + d = 0

and this is an ellipsoid for d < 0, a point for d = 0 and the empty set for d > 0.

• λ1, λ2, λ3 all negative. This can be reduced to the first case by multiplying
by −1.

• λ1, λ2 positive, λ3 negative. Then we can write the equation in the form

ξ1²/a² + ξ2²/b² − ξ3²/c² + d = 0.

(The cases λ1, λ3 > 0, λ2 < 0 resp. λ2, λ3 > 0, λ1 < 0 can be reduced to the
above by permuting the unknowns.) The above equation represents a circular
cone (d = 0) or a one-sheeted or two-sheeted hyperboloid (depending on the
sign of d). The cases where at least one of the λ's vanishes can be reduced
to the two-dimensional case.

Exercises: 1) Show that each non-central conic

{x ∈ R³ : (f(x)|x) + 2(b|x) + c = 0}

where b ≠ 0 is isometric to one of the following:

{x : ξ1²/a² + ξ2²/b² − 2ξ3 + d = 0}
{x : ξ1²/a² − ξ2²/b² − 2ξ3 + d = 0}
{x : ξ1² − 2ξ2 + d = 0}

and discuss their geometrical forms.


2) If

Q = {x ∈ R³ : ξ1²/a² + ξ2²/b² + ξ3²/c² = 1}

is an ellipsoid with a ≤ b ≤ c, show that the number of planes through 0
which cut Q in a circle is 1 (if a = b or b = c but a ≠ c), 2 (if a, b, c are
distinct) or infinite (if a = b = c).
3) Diagonalise the quadratic form

φ(x) = Σ_{i,j=1}^n (ξi − ξj)²

on Rn .

4.5 Hermitian spaces
For several reasons, it is useful to reconsider the theory developed in this
chapter in the context of complex vector space. Amongst other advantages
this will allow us to give a purely algebraic proof of the central result—the
diagonalisation of symmetric matrices. This is because the existence of an
eigenvalue in the complex case follows automatically from the fundamental
theorem of algebra.
We begin by introducing the concept of a hermitian vector space i.e.
a vector space V over C with a mapping
( | ):V ×V →C
so that
• (λx + y|z) = λ(x|z) + (y|z) (linearity in the first variable);
• (x|x) ≥ 0 and (x|x) = 0 if and only if x = 0;
• (x|y) = (y|x) (x, y ∈ V ).
Examples: All the examples of euclidean spaces can be “complexified” in
the natural and obvious ways. Thus we have: a) the standard scalar product
((λ1, . . . , λn)|(µ1, . . . , µn)) = λ1 µ̄1 + · · · + λn µ̄n
on Cn ;
b) the scalar product
(p|q) = ∫₀¹ p(t) q̄(t) dt
on the space PolC (n) of polynomials with complex coefficients. Note the
rather unexpected appearance of the complex conjugation in the above
examples. This means that the scalar product is no longer bilinear; the conjugation is needed
in order to ensure that the positivity condition 2) can hold. However, the product is
sesqui-linear (from the classical Greek for one and a half) i.e. satisfies the
condition

( Σ_{i=1}^n λi xi | Σ_{j=1}^n µj yj ) = Σ_{i,j=1}^n λi µ̄j (xi|yj).

All of the concepts which we have introduced for euclidean spaces can be
employed, with suitable changes usually necessitated by the sesquilinearity
of the scalar product, for hermitian spaces. We shall review them briefly: The
length of a vector x is ‖x‖ = √(x|x). Once again we have the inequality
|(x|y)| ≤ ‖x‖ ‖y‖ (x, y ∈ V).

(Since there is a slight twist in the argument we give the proof. Firstly we
have (for each t ∈ R),

0 ≤ (x + ty|x + ty) = kxk2 + 2tℜ(x|y) + t2 kyk2

and, as in the real case, this gives the inequality

ℜ(x|y) ≤ kxkkyk.

Now there is a complex number λ with |λ| = 1 and λ(x|y) > 0. If we apply
the above inequality with x replaced by λx we get

|(x|y)| = |(λx|y)| = λ(x|y) = ℜλ(x|y) = ℜ(λx|y) ≤ kλxkkyk = kxkkyk.

Using this inequality, one proves just as before that the distance function
satisfies the triangle inequality i.e.

kx + yk ≤ kxk + kyk.

For reasons that we hope will be obvious we do not attempt to define the
angle between two vectors in hermitian space but the concept of orthogonality
continues to play a central role. Thus we say that x and y are perpendicular
if (x|y) = 0 (written x ⊥ y). Then we can define orthonormal systems and
bases as before and the Gram-Schmidt method can be used to show that every
hermitian space V has an orthonormal basis (x1 , . . . , xn ) and the mapping

(λ1 , λ2 , . . . , λn ) 7→ λ1 x1 + · · · + λn xn

is an isometry from Cn onto V i.e. we have the formulae

(x|y) = λ1 µ̄1 + · · · + λn µ̄n
‖x‖² = |λ1|² + · · · + |λn|²

if x = λ1 x1 + · · · + λn xn, y = µ1 x1 + · · · + µn xn.

Exercises: 1) Let f be the linear mapping on the two dimensional space
C² with matrix

A = ( a  b )
    ( c  d ).

Then the scalar product on C² defines a norm there and so a norm for operators. Show that the norm of the above
operator is given by the formula

‖f‖² = ½ ( h + √(h² − 4| det A|²) ),   where h = |a|² + |b|² + |c|² + |d|².
2

(Calculate the singular values of A).
2) Show that if H is a hermitian matrix of the form A + iB where A and B
are real and A is non-singular, then we have the formula

(det H)2 = (det A)2 det(I + A−1 BA−1 B).

3) Show that if U is a complex n × n matrix of the form P + iQ where P and


Q are real, then U is unitary (i.e. such that U ∗ U = I) if and only if P t Q is
symmetric and P t P + Qt Q = I.

4.6 The spectral theorem—complex version
If f ∈ L(V, V1 ) there is exactly one mapping g : V1 → V so that
(f (x)|y) = (x|g(y)) (x ∈ V, y ∈ V1 ).
We denote this mapping by f ∗ . The proof is exactly the same as for the real
case, except that we use the formula
aij = (f(xj)|yi) = (xj|g(yi)) (and the latter is the complex conjugate of (g(yi)|xj))
for the elements of the matrix A of f with respect to the orthonormal bases
(x1 , . . . , xn ) resp. (y1 , . . . , ym) to show that the matrix of f ∗ is A∗ , the n × m
matrix obtained from A by taking the complex conjugates of elements and
then transposing.
The linear mapping f on V is hermitian if f ∗ = f i.e. if (f (x)|y) =
(x|f (y)) (x, y ∈ V ). This means that the matrix A of f with respect to
an orthonormal basis satisfies the condition A = A∗ (i.e. aij = aji for each
i, j). Such matrices are also called hermitian. f : V → V1 is unitary
if (f (x)|f (y)) = (x|y) (x, y ∈ V ). This is equivalent to the condition that
f ∗ f = Id. Hence the matrix U of f (with respect to orthonormal bases) must
satisfy the condition U ∗ U = I (i.e. the columns of U are an orthonormal
system in Cn ). If dim V = dim V1 (= n say), then U is an n × n matrix and
the above condition is equivalent to the equation U ∗ = U −1 . Such matrices
are called unitary.
We now proceed to give a purely algebraic proof of the so-called spectral
theorem for hermitian operators. We begin with some preliminary results on
eigenvalues and eigenvectors:
Lemma 1 If f ∈ L(V ) with eigenvalue λ, then
• λ is real if f is hermitian;
• |λ| = 1 if f is unitary.
Proof. 1) if the non-zero element x is a corresponding eigenvector, then we
have
(f (x)|x) = (λx|x) = λ(x|x)
and
((f (x)|x) = (x|f (x)) = (x|λx) = λ̄(x|x)
and so λ = λ̄.
2) Here we have
(x|x) = (f(x)|f(x)) = (λx|λx) = |λ|² (x|x)
and so |λ|² = 1.

Proposition 31 (Lemma) If λ1, λ2 are distinct eigenvalues of the hermitian
mapping f with corresponding eigenvectors x1 , x2 , then x1 ⊥ x2 .

Proof. We have (f (x1 )|x2 ) = (x1 |f (x2 )) and so

(λ1 x1 |x2 ) = (x1 |λ2 x2 ) i.e. (λ1 − λ2 )(x1 |x2 ) = 0.

(Recall that λ2 = λ̄2 by the Lemma above). Hence (x1 |x2 ) = 0.

Proposition 32 (Lemma) If f ∈ L(V), then Ker f = Ker (f∗f).

Proof. Clearly Ker f ⊂ Ker f ∗ f . Now if x ∈ Ker f ∗ f , then f ∗ f (x) = 0 and


so (f ∗ f (x)|x) = 0 i.e. (f (x)|f (x)) = 0. Thus f (x) = 0 i.e. x ∈ Ker f .
In passing we note that this means that if A is an m × n matrix, then
r(A) = r(A∗ A) = r(A∗ ).
Corollary 4 If f ∈ L(V) is hermitian and r ∈ N, then Ker f = Ker fʳ.
Proof. Applying the last result repeatedly, we get the chain of equalities

Ker f = Ker f² = Ker f⁴ = · · · = Ker f^(2ᵏ) = · · ·

Each Ker fʳ lies between two terms of this chain and so coincides with Ker f.
We now come to our main result:
Proposition 33 If f ∈ L(V ) is a hermitian mapping, then there exists an
orthonormal basis (xi ) for V so that each xi is an eigenvector for f . With
respect to this basis, f has the matrix diag (λ1 , . . . , λn ) where the λi are the
(real) eigenvalues of f .
Proof. Let Vi = Ker (f −λi Id). It follows from the above corollary (applied
to f − λ1 Id) that V is the direct sum

V = V1 ⊕ Im(f − λ1 Id).

(Recall from Chapter VII that we have such a splitting exactly when the ker-
nel of a mapping coincides with the kernel of its square). A simple induction
argument shows that V is the direct sum

V = V1 ⊕ V2 ⊕ · · · ⊕ Vr .

However, we know, by the Lemma above, that Vi ⊥ Vj if i ≠ j and so in fact


we have
V = V1 ⊥ · · · ⊥ Vr .
Hence we can construct the required basis by piecing together orthonormal
bases for the various Vi .

Corollary 5 If A is a hermitian n × n matrix, then there exists a unitary
n × n matrix U and real numbers λ1 , . . . , λn so that

U −1 AU = diag (λ1 , . . . , λn ).

A useful sharpening of the spectral theorem is as follows:

Proposition 34 Let f and g be commuting hermitian operators on V . Then


V has an orthonormal basis (x1 , . . . , xn ) whereby the xi are simultaneously
eigenvectors for f and g.

Proof. Let λ1 , . . . , λr be the distinct eigenvalues of f and put Vi = Ker (f −


λi Id). Then g(Vi ) ⊂ Vi . For if x ∈ Vi , then f (x) = λi x and f (g(x)) =
g(f (x)) = g(λi x) = λi g(x) i.e. g(x) ∈ Vi . Hence if we apply the above
result to g on Vi , we can find an orthonormal basis for the latter consisting
of eigenvectors for g (regarded as an operator on Vi ). Of course they are
also eigenvectors for f and so the basis that we get for V by piecing together
these sub-bases has the required properties.
For matrices, this means that if A and B are hermitian n × n matrices,
then there is a unitary matrix U so that both U ∗ AU and U ∗BU are diagonal
(real) matrices.
 
Example: We diagonalise the matrix

A = ( 1+i   1  )
    ( 1    1+i ).

Solution: The characteristic polynomial is

λ² − 2(1 + i)λ + (1 + i)² − 1

with roots 2 + i and i. The corresponding eigenvectors are (1/√2)(1, 1) and
(1/√2)(1, −1) respectively. Hence U∗AU = diag (i, 2 + i) where

U = (1/√2) (  1   1 )
           ( −1   1 ).
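A short numerical check of this example (Python with numpy, illustration only):

import numpy as np

A = np.array([[1 + 1j, 1], [1, 1 + 1j]])
U = (1 / np.sqrt(2)) * np.array([[1, 1], [-1, 1]])

print(np.allclose(U.conj().T @ U, np.eye(2)))   # U is unitary
print(np.round(U.conj().T @ A @ U, 10))          # diag(i, 2+i)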

Example: Show that if A is hermitian, then I + iA is invertible and U =


(I − iA)(I + iA)−1 is unitary.
Solution: We can diagonalise A as

V ∗ AV = diag (λ1 , . . . , λn )

where the λi are real. Then

V∗(I + iA)V = diag (1 + iλ1, . . . , 1 + iλn)

and the right hand side is invertible. Hence so is the left hand side. This
implies that (I + iA)−1 exists. Then

V∗(I − iA)(I + iA)⁻¹V = diag ( (1 − iλ1)/(1 + iλ1), . . . , (1 − iλn)/(1 + iλn) ).

The right hand side is clearly unitary.

Exercises: 1) Show that an operator
f on a hermitian space is an orthogonal projection if and only if f = f∗f.
2) Let A be a complex n × n matrix, p a polynomial. Show that if p(A∗A) = 0,
then p(AA∗ ) = 0.
3) Let p be a complex polynomial in two variables. Show that if A is an n×n
complex matrix so that p(A, A∗ ) = 0, then p(λ, λ̄) = 0 for any eigenvalue
λ of A. What can you deduce about the eigenvalues of a matrix A which
satisfies one of the conditions:

A∗ = cA (c ∈ R),   A∗A = A∗ + A,   A∗A = −I?

4) Let A be an n × n complex matrix with eigenvector X and eigenvalue λ1 .


Show that there is a unitary matrix U so that U −1 AU has the block form
 
( λ1        )
(  0        )
(  ⋮    B   )
(  0        )

for some n × (n − 1) matrix B. Deduce that there is a unitary matrix Ũ so


that Ũ −1 AŨ is upper triangular and that if A is hermitian, then the latter
matrix is diagonal. (This exercise provides an alternative proof of the spectral
theorem).

4.7 Normal operators
Normal operators are a generalisation of hermitian ones. Consider first the
diagonal matrix
A = diag (λ1 , . . . , λn ).
A need not necessarily be hermitian (indeed this is the case precisely when the
λi are real). However, it does satisfy the weaker condition that AA∗ = A∗ A
i.e. that A and A∗ commute. We say that such A are normal. Similarly,
an operator f on V is normal if f ∗ f = f f ∗ . Note that unitary mappings
are examples of normal mappings—they are not usually hermitian. We shall
now show that normal mappings have diagonal representations. In order
to do this, we note that any f ∈ L(V ) has a unique representation in the
form f = g + ih where g and h are hermitian (compare the representation
of a complex number z in the form x + iy with x and y real). Indeed if
f = g + ih, then f∗ = g − ih and so f + f∗ = 2g, i.e. g = ½(f + f∗). Similarly,
h = (1/2i)(f − f∗). This proves the uniqueness. On the other hand, it is clear
that if g and h are as in the above formula, then they are hermitian and
f = g + ih.
The fact that normal operators are diagonalisable will follow easily from
the following simple characterisation: f is normal if and only if g and h
commute.
Proof. Clearly, if g and h commute, then so do f = g + ih and f ∗ = g − ih.
On the other hand, if f and f ∗ commute then so do g and h since both are
linear combinations of f and f ∗ .

Proposition 35 If f ∈ L(V ) is normal, then V has an orthonormal basis


(x1 , . . . , xn ) consisting of eigenvectors of f .

Proof. We write f = g + ih as above and find an orthonormal basis


(x1 , . . . , xn ) consisting of vectors which are simultaneously eigenvectors for g
and h, say g(xi ) = λi xi , h(xi ) = µi xi . Then

f(xi) = g(xi) + ih(xi) = (λi + iµi) xi

as claimed.

Corollary 6 If f ∈ L(V) is unitary, then there exists an orthonormal ba-


sis (x1 , . . . , xn ) so that the matrix of f has the form diag (eiθ1 , . . . , eiθn ) for
suitable θ1 , . . . , θn ∈ R.

We close this section with the classification of the isometries of Rn . This
generalises the results of Chapter II on those of R2 and R3 . The method
we use is a standard one for deducing results about the real case from the
complex one and can also be employed to deduce the spectral theorem for
self-adjoint operators on euclidean spaces from the corresponding one for
hermitian operators.
We require the following simple Lemma:
Lemma 2 Let V be a subspace of Cn with the property that if z ∈ V , then
ℜz and ℑz also belong to V (where if z = (z1 , . . . , zn ), then

ℜz = (ℜz1 , . . . , ℜzn ) ℑz = (ℑz1 , . . . , ℑzn )).

Then V has an orthonormal basis consisting of real vectors (i.e. vectors z


with ℑz = 0).
Proof. We consider the set of all real vectors in V . Then it follows from
the assumptions that this is a euclidean space whose real dimension coincides
with the complex one of V . If (x1 , . . . , xr ) is an orthonormal basis for this
space, it is also one for V .
Proposition 36 Let f be an isometry of the euclidean space V . Then there
is an orthonormal basis (x1 , . . . , xn ) for V with respect to which f has block
matrix of the form

diag ( Ir , −Is , Dθ1 , . . . , Dθt )

for suitable integers r, s, t with r + s + 2t = n, where Dθ denotes the matrix

( cos θ   − sin θ )
( sin θ     cos θ ).

Proof. Let A be the matrix of f with respect to an orthonormal basis. We


regard A as a unitary operator on Cn . Since its non-real eigenvalues occur
in complex conjugate pairs there is a (complex) unitary matrix U so that

U −1 AU = diag (1, 1, . . . , 1, −1, . . . , −1, eiθ1 , e−iθ1 , . . . , eiθt , e−iθt ).

From this it follows that Cn splits into the direct sum

V1 ⊕ V−1 ⊕ (W1 ⊕ W1′ ) ⊕ · · · ⊕ (Wt ⊕ Wt′ )

where

V1 = {x : fA (x) = x}
V−1 = {x : fA (x) = −x}
Wi = {x : fA (x) = eiθi x}
Wi′ = {x : fA (x) = e−iθi x}.

Now consider V1 . Then since A is real, z ∈ V1 if and only if ℜz ∈ V1 and


ℑz ∈ V1 . Hence by the Lemma above we can choose an orthonormal basis
(x1 , . . . , xr ) for V1 consisting of real vectors. Similarly, we can construct an
orthonormal basis for V−1 consisting of real vectors (y1 , . . . , ys ).
For the same reason, the mapping

z ↦ z̄ = ℜz − iℑz

maps Wi onto Wi′ . Hence if (z1 , . . . , zt ) is an orthonormal basis for Wi ,


(z̄1 , . . . , z̄t ) is an orthonormal basis for Wi′ . Now fA maps the two dimensional
space spanned by zi and z̄i into itself and has matrix
 iθ 
e i 0
0 e−iθi

with respect to this basis.


We introduce the real bases wi and wi′ where
wi = (zi + z̄i)/√2 ,   wi′ = (zi − z̄i)/(i√2)
and consider the mapping fA on the two dimensional real space spanned by
these vectors. Then a simple calculation shows that wi and wi′ are perpen-
dicular (since zi ⊥ z̄i ) and that the matrix of fA with respect to this basis
is  
cos θi − sin θi
.
sin θi cos θi
If we combine all the basis elements obtained in this way, we obtain an
orthonormal basis of the required sort.

Exercises: 1) Show that if f is a linear operator on a hermitian space


V , then either of the following conditions is equivalent to the fact that f is
normal:

• (f (x)|f (y)) = (f ∗ (x)|f ∗ (y)) (x, y ∈ V );

• kf (x)k = kf ∗ (x)k (x ∈ V ).
2) Show that an m × n complex matrix has factorisations
A = UB = CV

where U (resp. V ) is unitary m × m (resp. n × n) and B and C are positive


semi-definite. Show that if A is normal (and so square), then U and B can
be chosen in such a way that they commute.
3) Show that a complex n × n matrix A is normal if and only if there is a
complex polynomial p so that A∗ = p(A).
4) Let A be a normal matrix and suppose that all of the row sums have a
common value c. Show that the same is true of the column sums. What is
the common value in this case?
5) Let A be a normal n × n matrix. Show that the set of eigenvalues of A is
symmetric about zero (i.e. is such that if λ is an eigenvalue then so is −λ) if
and only if tr A2k+1 = 0 for k = 0, 1, 2, . . . .
6) Let U be an isometry of the hermitian space V and put
V1 = {x : Ux = x}.
Show that if Un is the mapping
Un = (1/n)(I + U + · · · + U^{n−1})
then for each x ∈ V , Un x converges to the orthogonal projection of x onto
V1 .
7) If f ∈ L(V ) is such that f ∗ = −f , then Id + f is invertible and (Id −
f )(Id + f )−1 is unitary. Which unitary operators can be expressed in this
form?
8) Show that if an n × n complex matrix is simultaneously unitary and
triangular, then it is diagonal.
9) Let A be a normal matrix with the property that distinct eigenvalues
have distinct absolute values. Show that if B is a normal matrix, then AB
is normal if and only if AB = BA. (Note that it is not true in general that
AB is normal if and only if A and B commute (where A and B are normal
matrices) although the corresponding result holds for self-adjoint mappings.
The above result implies that this is true in the special case where either A
or B is positive definite).
10) Let f ∈ L(V ) be such that if x ⊥ y, then f (x) ⊥ f (y). Show that there
exists an isometry g and a λ ∈ R so that f = λg.
11) (This exercise provides the basis for a direct proof of the spectral theorem
for isometries on Rn without having recourse to the complex result). Suppose

that f is an orthonormal operator on Rn . Show that there exists a one- or
two-dimensional subspace of Rn which is f -invariant. (One can suppose that
f has no eigenvalues. Choose a unit vector x so that the angle between x
and f (x) is minimum. Show that if y is the bisector of the angle between
x and f(x) (i.e. y = (x + f(x))/2), then f(y) lies on the plane through x and
f (x)—hence the latter is f -invariant).

4.8 The Moore-Penrose inverse
We now return once again to the topic of generalised inverses. Recall that if
f : V → W is a linear mapping, we construct a generalised inverse for f by
considering splittings
V = V1 ⊕ V2 W = W1 ⊕ W2
where V2 is the kernel of f , W1 the image and V1 and W2 are complementary
subspaces. In the absence of any further structure, there is no natural way to
choose W2 and V1 . However, when V and W are euclidean space, the most
obvious choices are the orthogonal complements V1 = V2⊥ and W2 = W1⊥ .
The generalised inverse that we obtain in this way is uniquely specified and
denoted by f † . It is called the Moore-Penrose inverse of f and has the
following properties:
• f †f f † = f †;
• f f †f = f †;
• f † f is the orthogonal projection onto V1 and so is self-adjoint;
• f f † is the orthogonal projection onto W1 and so is self-adjoint.
In fact, these properties characterise f † —it is the only linear mapping from
W into V which satisfies them as can easily be seen.
It follows that if y ∈ W , then x = f † (y) is the “best” solution of the
equation f (x) = y in the sense that
kf (x) − yk ≤ kf (z) − yk
for each z ∈ V i.e. f (x) is the nearest point to y in f (V ). In addition x is
the element of smallest norm which is mapped onto this nearest point.
In terms of matrices, these results can be restated as follows: let V and W
have orthonormal bases (x1 , . . . , xn ) resp. (y1 , . . . , ym ) and let f have matrix
A with respect to them. Then the matrix A† of f † satisfies the conditions:
AA†A = A,   A†AA† = A†,   and AA† and A†A are self-adjoint.
Of course, A† is then called the Moore-Penrose inverse of A and is
uniquely determined by the above equations. The existence of f † can also
be proved elegantly by using the result on singular values from the third
paragraph. Recall that we can choose orthonormal bases (x1 , . . . , xn ) resp.
(y1 , . . . , ym ) so that the matrix of f has the block form
 
A1 0
0 0

with A1 = diag (µ1 , . . . , µr ). Then f † is the operator with matrix
( A1⁻¹  0 )
(  0    0 )
with respect to (y1 , . . . , ym ) resp. (x1 , . . . , xn ). Note that f is injective if and
only if r = n. In this case, f has matrix
 
A1
0
and f † is the mapping (f t f )−1 f t as one can verify by computing the matrix
of the latter product.
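Numerically the Moore-Penrose inverse is available as numpy's pinv (computed internally via the singular value decomposition); the following sketch (Python, with a made-up rank-deficient matrix) checks the four defining conditions.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # rank at most 3

Ap = np.linalg.pinv(A)                    # Moore-Penrose inverse
print(np.allclose(A @ Ap @ A, A))
print(np.allclose(Ap @ A @ Ap, Ap))
print(np.allclose((A @ Ap).T, A @ Ap))    # A A† is self-adjoint (an orthogonal projection)
print(np.allclose((Ap @ A).T, Ap @ A))    # A† A is self-adjoint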
Of course, the abstract geometric description of the Moore-Penrose in-
verse is of little help in calculating concrete examples and we mention some
explicit formulae which are often useful.
Firstly, suppose that A has block representation [B C] where B is an
invertible (and hence square) matrix. Then it follows from the results on
positive definite matrices that BB t + CC t is invertible. The Moore-Penrose
inverse of A is then given by the formula
A† = ( Bᵗ(BBᵗ + CCᵗ)⁻¹ )
     ( Cᵗ(BBᵗ + CCᵗ)⁻¹ )
as can be checked by multiplying out.

Example: Consider the matrix


 
1 0 ... 0 1
 0 1 ... 0 1 
 
A =  .. ..  .
 . . 
0 0 ... 1 1
Then A is of the above form with B = In and one can easily calculate that
A† is the matrix

(1/(n+1)) · (  n   −1   . . .  −1 )
            ( −1    n   . . .  −1 )
            (  ⋮              ⋮ )
            ( −1   −1   . . .   n )
            (  1    1   . . .   1 )
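A short numerical check of this formula (Python with numpy, illustration only, here with n = 4):

import numpy as np

n = 4
A = np.hstack([np.eye(n), np.ones((n, 1))])          # the matrix [In | 1]

expected = np.vstack([(n + 1) * np.eye(n) - np.ones((n, n)),
                      np.ones((1, n))]) / (n + 1)

print(np.allclose(np.linalg.pinv(A), expected))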
The Moore-Penrose inverse can also be calculated by means of the fol-
lowing recursion method. We put
B = A∗ A

and define

C0 = I
C1 = tr(C0 B) I − C0 B
C2 = ½ tr(C1 B) I − C1 B

and so on (in general Ck = (1/k) tr(Ck−1 B) I − Ck−1 B). It turns out that Cr B vanishes, where r is the rank of A, and the
earlier values have non-vanishing trace. Then we have the formula

A† = r Cr−1 A∗ / tr(Cr−1 B).

This can be checked by noting that the various steps are independent of the
choice of basis. Hence we can choose bases so that the matrix of the operator
defined by A has the form  
A1 0
0 0
where A1 = diag (λ1 , . . . , λr ). This is a simple calculation.
As an application, we consider the problem of the least square fitting of
data. Let (t1 , y1 ), . . . , (tn , yn ) be points in R2 . We determine real numbers
c, d so that the line y = ct + d provides an optimal fit. This means that c
and d should be a “solution” of the equation
 
( t1  1 )            ( y1 )
(  ⋮  ⋮ ) ( c )  =   (  ⋮ )
( tn  1 ) ( d )      ( yn ).

If we interpret this in the sense that c and d are to be chosen so that the
error
(y1 − ct1 − d)² + · · · + (yn − ctn − d)²
be as small as possible, then this reduces to calculating the Moore Penrose
inverse of  
t1 1
 
A =  ... ... 
tn 1
since the solution is

( c )         ( y1 )
( d )  =  A†  (  ⋮ )
              ( yn ).

If the ti are distinct (which we tacitly assume), then A† is given by the
formula
A† = (At A)−1 At .
In this case

AᵗA = ( Σ ti²   t̄ )
      (  t̄      n )
where t̄ = t1 + · · · + tn .

Example: We consider the concrete case of the data


0, 1, 2, 3, 5
for the values of t and
0, 2, 3, 4, 6
for the corresponding values of the y’s, then we have
 
0 1
 1 1 
 
A=  2 1 

 3 1 
5 1
and so

AᵗA = ( 39  11 )
      ( 11   5 )

and

(AᵗA)⁻¹ = (1/74) (   5  −11 )
                 ( −11   39 ).
This leads to the solution

( c )          (   5  −11 ) ( Σ ti yi )          (   5  −11 ) ( 50 )          ( 85 )
( d ) = (1/74) ( −11   39 ) (  Σ yi   ) = (1/74) ( −11   39 ) ( 15 ) = (1/74) ( 35 ),

i.e. c = 85/74 and d = 35/74.
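This computation can be confirmed numerically; the following sketch (Python with numpy, illustration only) computes the least square solution both via the Moore-Penrose inverse and via a library least squares routine.

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 5.0])
y = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
A = np.column_stack([t, np.ones_like(t)])

c, d = np.linalg.pinv(A) @ y             # least square solution via the Moore-Penrose inverse
print(c, d)                              # 85/74, 35/74
print(np.allclose([c, d], np.linalg.lstsq(A, y, rcond=None)[0]))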
Similar applications of the Moore-Penrose inverse arise in the problem of
curve fitting. Here one is interested in fitting lower order curves to given
data. In chapter V we saw how the methods of linear algebra could be
applied. In practical applications, however, the data will be overdetermined
and will not fit the required type of curve exactly. In this case, the Moore-
Penrose inverse can be used to find a curve which provides what is, in a
certain sense, a best fit. We illustrate this with an example.

Example: Suppose that we are given a set of points P1 , . . . , Pn in the plane
and are looking for an ellipse which passes through them. In order to simplify
the arithmetic, we shall assume that the ellipse has equation of the form
αξ1² + βξ2² = 1

(i.e. that the principal axes are on the coordinate axes). Then we are required
to find (positive) α and β so that the equations

α(ξ1i )2 + β(ξ2i )2 = 1
are satisfied (where Pi has coordinates (ξ1i , ξ2i )). This is a linear equation
with matrix

A = ( (ξ1¹)²   (ξ2¹)² )
    (   ⋮        ⋮   )
    ( (ξ1ⁿ)²   (ξ2ⁿ)² ).
Our theory would lead us to expect that the vector

( α )         ( 1 )
( β )  =  A†  ( ⋮ )
              ( 1 )

will be a good solution.
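A sketch of this procedure (Python with numpy; the sample points are made-up data lying near the ellipse ξ1²/4 + ξ2² = 1, slightly perturbed):

import numpy as np

rng = np.random.default_rng(3)
phi = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
pts = np.column_stack([2 * np.cos(phi), np.sin(phi)]) + 0.02 * rng.standard_normal((12, 2))

A = pts**2                                 # rows (xi^2, yi^2)
alpha, beta = np.linalg.pinv(A) @ np.ones(len(pts))
print(alpha, beta)                         # close to 0.25 and 1.0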

Exercises: 1) Show that the least square solution of the system


x = bi (i = 1, . . . , n)

is the mean value of b1 , . . . , bn .


2) Calculate explicitly the least square solution of the system

yi = c ti + d
considered above.
3) Suppose that f is an operator on the hermitian space V . Show that if f is
surjective, then f f t is invertible and the Moore-Penrose inverse of f is given
by the formula
f † = f t (f f t )−1 .
Interpret this in terms of matrices.
4) Show that f ∈ L(V ) commutes with its Moore-Penrose inverse if and only
if the ranges of f and f ∗ coincide and this is equivalent to the fact that we
have splitting
V = f (V ) ⊥ Ker (f )

resp. that there is an orthonormal basis with respect to which f has matrix
 
A 0
0 0

where A is invertible.
5) Show that if A is an m × n matrix, then there are polynomials p and q so
that
A† = A∗ p(AA∗ ) and A† = q(A∗ A)A∗ .
6) Show that the Moore-Penrose inverse of A can be written down explicitly
with the help of the following integrals:

A† = ∫₀^∞ e^{−(A∗A)t} A∗ dt

A† = (1/2πi) ∮_c (1/z) (zI − A∗A)⁻¹ A∗ dz

(the latter being integrated around a simple closed curve which encloses
the non-zero eigenvalues of A∗A but not 0. These integrals of matrix-valued functions
are to be interpreted in the natural way i.e. they are integrated elementwise).
7) Show that if A is normal with diagonalisation
 

U∗AU = ( A1  0 )
       (  0  0 )

where A1 = diag (λ1, . . . , λr) with λ1, . . . , λr the non-zero eigenvalues, then

A† = (A + P)⁻¹ − P

where

P = U ( 0     0    ) U∗.
      ( 0   In−r   )
8) Use 7) to show that if C is a circulant with rank r = n − p and F ∗ CF =
diag (λ1 , . . . , λn ) is its diagonalisation as above, then

C † = C(I + K)−1 − K

where K is defined as follows. Suppose that the eigenvalues λi1 , . . . , λip


vanish. Then

K = Σ_s K_{i_s}   with   K_{i_r} = (1/n) circ (1, ω^{−i_r+1}, . . . , ω^{(n−1)(−i_r+1)}).

9) Show how to use the Moore-Penrose inverse in obtaining a polynomial of
degree at most n − 1 to fit data

(t1 , x1 ), . . . , (tm , xm )

where m > n. (The ti are assumed to be distinct.)


10) Show that an operator f commutes with its Moore-Penrose inverse if and
only if its range is the orthogonal complement of its kernel (or, alternatively,
if the kernels of f and f ∗ coincide). Show that in this case, f † is a polynomial
in f .

4.9 Positive definite matrices
We conclude this chapter with a discussion of the important topic of positive
definite matrices. Recall the following characterisation:
Proposition 37 Let f be a self-adjoint operator on V . Then the following
are equivalent:
• f is positive definite;
• all of the eigenvalues of f are positive;
• there is an invertible operator g on V so that f = g t g.
Proof. (1) implies (2): If λ is an eigenvalue, with unit eigenvector x, then

0 < (f (x)|x) = (λx|x) = λ(x|x) = λ.

(2) implies (3): Choose an orthonormal basis (xi ) of eigenvectors for f . Then
the matrix of f is diag (λ1, . . . , λn) where, by assumption, each λi > 0. Let g
be the operator with matrix diag (√λ1, . . . , √λn). Then f = gᵗg. (3) implies
(1): If f = g t g, then

(f (x)|x) = (g t g(x)|x) = (g(x)|g(x)) = kg(x)k2 > 0

if x 6= 0.
There are corresponding characterisations of positive-semidefinite opera-
tors, resp. positive definite operators on hermitian spaces.
Suppose that the n × n matrix A is positive definite. By the above, A
has a factorisation B t B for some invertible n × n matrix. We shall now show
that B can be chosen to be upper triangular (in which case it is unique). For
if A = [aij ], then a11 > 0 (put X = (1, 0, . . . , 0) in the condition X t AX > 0).
Hence there is a matrix L1 of the form

( 1/√a11    0    . . .  0 )
( −b21      1    . . .  0 )
(   ⋮              ⋱   ⋮ )
( −bn1      0    . . .  1 )

so that in the first column of L1 A all entries below the first vanish

(we are applying the Gaußian elimination method to reduce the first column).
Since A is symmetric, we then have the equality

L1 A L1ᵗ = ( 1   0  )
           ( 0   A2 )

where A2 is also positive definite. Note that the matrix L1 is lower triangular.
Proceeding inductively, we obtain a sequence L1 , . . . , Ln−1 of such matrices so
that if L = Ln−1 . . . L1 , then LALt = I. Hence A has the factorisation B t B
where B = (L⁻¹)ᵗ and so is upper triangular. This is called the Cholesky
factorisation of A.
An almost immediate Corollary of the above is the following characteri-
sation of positive definite matrices: A symmetric n × n matrix A is positive
definite if and only if det Ak > 0 for k = 1, . . . , n, where Ak is the k × k matrix
[aij]_{i,j=1}^k.
Proof. Necessity: Note that if A is positive definite then det A > 0 since
the determinant is the product of the eigenvalues of A. Clearly each Ak is
positive definite if A is (apply the defining condition on A to the vectors of
the form (ξ1 , . . . , ξk , 0, . . . , 0)). Sufficiency: Let A satisfy the above condition.
In particular, a11 > 0. As above we find a lower triangular matrix L1 with
 
Ã = L1 A L1ᵗ = ( 1  0 )
               ( 0  C )

for a suitable (n − 1) × (n − 1) matrix C. The submatrices Ãk of this new


matrix are obtained from those of A by multiplying the first row resp. column
by 1/√a11 (which is positive) and by subtracting multiples of the first row (resp. column) from
the later ones; hence det Ãk > 0 for each k. This implies that the new matrix Ã, and hence C, satisfies the
same conditions and we can proceed as in the construction of the Cholesky
factorisation to get A = BᵗB, i.e. A is positive definite.
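Both the leading-minor criterion and the Cholesky factorisation are easy to check numerically; the following sketch (Python with numpy, illustration only; note that numpy returns the lower-triangular factor) uses a small positive definite matrix as example data.

import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # symmetric

# the leading principal minors are all positive, so A is positive definite
print([np.linalg.det(A[:k, :k]) for k in (1, 2, 3)])

C = np.linalg.cholesky(A)                  # lower-triangular factor C with A = C C^t
B = C.T                                    # B = C^t is the upper-triangular factor with A = B^t B
print(np.allclose(B.T @ B, A))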
There are a whole range of inequalities involving positive definite matri-
ces. Many of their proofs are based on the following formula: Let A be a
positive definite n × n matrix. Then

∫_{Rⁿ} e^{−Σ aij ξi ξj} dξ1 . . . dξn = π^{n/2} / √(det A).
Proof. If we choose new coordinates (η1 , . . . , ηn ) with respect to which A
is diagonal, we get the integral

∫_{Rⁿ} e^{−(λ1 η1² + · · · + λn ηn²)} dη1 . . . dηn

whose value is ∏_{i=1}^n ( ∫_R e^{−λi ηi²} dηi ) (λ1, . . . , λn are the eigenvalues of A). The
result now follows from the classical formula

∫_R e^{−y²} dy = √π

by a change of variables.

Exercises: 1) Show that if f ∈ L(V ) is positive definite, then

(f (x)|x)(f −1 (y)|y) ≥ (x|y)2

for x, y ∈ V .
2) Let f and g be self-adjoint operators where g is positive definite. The
generalised eigenvalue problem for f and g is the equation f (x) = λg(x)
(where, as usual, only non-zero x are of interest). Show that the space has
a basis of eigenvectors for this problem (put g = ht h where h is invertible
and note that the problem is equivalent to the usual eigenvalue problem for
(h−1 )t f h−1 ).
3) Show that every operator f on a euclidean space has uniquely determined
representations
f = hu = u1 h1
where u and u1 are isometries and h, h1 are positive semi-definite. Show that
f is then normal if and only if h and u commute, in which case u1 = u and
h1 = h.
4) Show that if A is a real, positive definite matrix, then

(det A)^{1/n} = min_{det B = 1, B ≥ 0} (1/n) tr(AB).

Hence deduce that

(det(A + B))^{1/n} ≥ (det A)^{1/n} + (det B)^{1/n}.

5) Let A be a positive definite n × n matrix and denote by Ai the matrix


obtained by deleting the i-th row and the i-th column. Show that

det(A + B)/det(Ai + Bi) ≥ det A/det Ai + det B/det Bi

(where B is also positive definite and Bi is defined as for Ai).
6) Show that


if f is a positive definite operator with eigenvalues

λ1 > · · · > λn ,

then

∏ λi ≤ ∏ (f(xi)|xi)
Σ λi ≤ Σ (f(xi)|xi)

for any orthonormal basis (xi ).


7) Let f and g be operators on the euclidean space V. Show that they satisfy
the condition ‖f(x)‖ ≤ ‖g(x)‖ for each x ∈ V if and only if g^t g − f^t f is
positive semi-definite. Show that if they are normal and commute, this is
equivalent to the fact that they have simultaneous matrix representations
diag(λ_1, ..., λ_n) and diag(μ_1, ..., μ_n) (with respect to an orthonormal basis)
where |λ_i| ≥ |μ_i| for each i.
8) Show that if A is a symmetric matrix such that there are points a < b for which
\[ (A - aI)(A - bI) \]
is positive definite, then A has no eigenvalues between a and b.
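The following is a minimal sketch (Python with scipy; the matrices are illustrative assumptions) of the generalised eigenvalue problem of exercise 2: scipy.linalg.eigh accepts a second, positive definite matrix and solves f(x) = λ g(x) directly.

import numpy as np
from scipy.linalg import eigh

# Illustrative symmetric F and positive definite G (assumptions for this sketch).
F = np.array([[2.0, 1.0],
              [1.0, 0.0]])
G = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Solve F x = lambda G x; the eigenvectors come out G-orthonormal.
lam, X = eigh(F, G)

for j in range(len(lam)):
    x = X[:, j]
    print(lam[j], np.allclose(F @ x, lam[j] * (G @ x)))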

5 MULTILINEAR ALGEBRA
In this chapter we give a brief introduction to the topic of multilinear alge-
bra. This includes such important subjects as tensors and multilinear forms.
As usual, we employ a coordinate-free approach but show how to calculate
with coordinates via suitable bases in the spaces considered. We begin with
the concept of the dual space.

5.1 Dual spaces


If V is a vector space, then the dual space of V is the space L(V ; R) of
linear mappings from V into R. As we have seen, this is a linear space with
the natural operations. We denote it by V ∗ . For example, if V is the space
Rn , then, as we know, the space of linear mappings from Rn into R can
be identified with the space M1,n of 1 × n matrices, where y = [η1 , . . . , ηn ]
induces the linear mapping

fy : (ξ1 , . . . , ξn ) 7→ ξ1 η1 + · · · + ξn ηn .

Since M1,n is naturally isomorphic to Rn we can express this in the equation


(Rn )∗ = Rn which we regard as a shorthand for the fact that the mapping
y 7→ fy is an isomorphism from the second space onto the first one.
We note the following simple properties of the dual space:

Proposition 38 Let x, y be elements of a vector space V . Then

• x ≠ 0 if and only if there is an f ∈ V ∗ with f (x) = 1;

• x and y are linearly independent if and only if there is an f ∈ V ∗ so


that f (x) = 1, f (y) = 0 or f (x) = 0, f (y) = 1.

Proof. It suffices to prove these for the special case V = Rn , where they are
trivial.
If f is a non-zero element in the dual V ∗ of V , then the subset

Hαf = {x ∈ V : f (x) = α}

is called a hyperplane in V . It is an affine subspace of dimension one less


than that of V , in fact, a translate of the kernel of f . Every hyperplane is
of the above form and Hαf and Hβf are parallel (i.e. they can be mapped
onto each other by a translation). Then the above result can be expressed
as follows:

• a point x in V is non-zero if and only if it lies on some hyperplane
which does not pass through zero;
• two points x and y in V are linearly independent if and only if there
are parallel, but distinct, hyperplanes of the form Hαf and H0f so that
x ∈ Hαf and y ∈ H0f or vice versa.
The dual basis: Suppose now that V has a basis (x_1, ..., x_n). For each i
there is precisely one f_i ∈ V^* so that
\[ f_i(x_i) = 1 \quad\text{and}\quad f_i(x_j) = 0 \ (i \ne j). \]
In other words, f_i is that element of the dual space which associates to each
x ∈ V its i-th coefficient with respect to (x_j), for if x = \sum_{j=1}^n \lambda_j x_j, then
\[ f_i(x) = \sum_{j=1}^n \lambda_j f_i(x_j) = \lambda_i. \]

We claim that this sequence (f_i) is a basis for V^*, called the dual basis to
(x_i). For if f ∈ V^*, it has the representation f = \sum_{i=1}^n f(x_i) f_i, which
should be compared to the representation x = \sum_{i=1}^n f_i(x) x_i in V. In order
to prove this it suffices to show that
\[ f(x) = \sum_{i=1}^n f(x_i) f_i(x) \]
for each x ∈ V, and this follows from an application of f to both sides of the
equation x = \sum_{i=1}^n f_i(x) x_i.
In order to see that the f_i are linearly independent, suppose that the linear
combination \sum_{i=1}^n \lambda_i f_i is zero. Then applying this form to x_j and using
the defining condition on the f_i we see that \lambda_j = 0.
(Of course, the last step is, strictly speaking, unnecessary since we already
know that V and V^* have the same dimension.)
The principle used in this argument will be applied again and so, in order
to avoid tedious repetitions, we state an abstract form of it as a Lemma:
Lemma 3 Let V be a vector space whose elements are functions defined on
a set S with values in R so that the arithmetic operations on V coincide with
the natural ones for functions (i.e. (x + y)(t) = x(t) + y(t), (λx)(t) = λx(t)).
Then if x_1, ..., x_n is a sequence in V and there are points t_1, ..., t_n in S so
that
\[ x_i(t_j) = 0 \ (i \ne j) \quad\text{and}\quad x_i(t_i) = 1, \]
the sequence x_1, ..., x_n is linearly independent.
The proof is trivial. If a linear combination \sum_{i=1}^n \lambda_i x_i vanishes, then
evaluation at t_j shows that \lambda_j = 0.

Examples of dual bases: We calculate the dual bases to
• (1, 1), (1, 0) for R2 ;
• the canonical basis (1, t, . . . , tn ) for Pol (n).
(1) Let x1 = (1, 1), x2 = (1, 0) and let the dual basis be (f1 , f2 ) where
f1 = (ξ11 , ξ21), f2 = (ξ12 , ξ22). Then we have the four equations
f1 (x1 ) = ξ11 + ξ21 = 1 f2 (x1 ) = ξ12 + ξ22 = 0
f1 (x2 ) = ξ11 = 0 f2 (x2 ) = ξ12 = 1
with solutions f1 = (0, 1), f2 = (1, −1).
(2) Let f_i be the functional
\[ p \mapsto \frac{p^{(i)}(0)}{i!}. \]
Then, of course, f_i(t^j) = 1 if i = j and 0 otherwise. Hence (f_i) is the dual
basis and if p ∈ Pol(n), then its expansion
\[ p = \sum_{i=0}^n f_i(p)\, t^i = \sum_{i=0}^n \frac{p^{(i)}(0)}{i!}\, t^i \]
with respect to the natural basis is the formal Taylor expansion of p.
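A dual basis can also be computed numerically: if the basis vectors of R^n are placed as the columns of a matrix X, the rows of X^{-1} are the coordinate vectors of the dual functionals. A minimal sketch in Python (numpy), reproducing example (1) above:

import numpy as np

# Columns are the basis vectors x1 = (1, 1) and x2 = (1, 0).
X = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# Rows of the inverse give the dual functionals f1, f2.
dual = np.linalg.inv(X)
print(dual)                              # [[0, 1], [1, -1]]  ->  f1 = (0, 1), f2 = (1, -1)

# Check the defining property f_i(x_j) = delta_ij.
print(np.allclose(dual @ X, np.eye(2)))  # True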


We now investigate the behaviour of the dual basis under coordinate
transformations. Let (x_i) resp. (x'_j) be bases for V with dual bases (f_i) and
(f'_j). Let T = [t_{ij}] be the transfer matrix from (x_i) to (x'_j), i.e. we have the
equations
\[ x'_j = \sum_{i=1}^n t_{ij} x_i. \]
Then we know that the coordinate representations
\[ x = \sum_{i=1}^n \lambda_i x_i = \sum_{j=1}^n \lambda'_j x'_j \]

of x with respect to the two bases are related by the equations
\[ \lambda_i = \sum_{j=1}^n t_{ij} \lambda'_j \qquad\quad \lambda'_i = \sum_{j=1}^n \tilde t_{ij} \lambda_j \]
where the matrix [\tilde t_{ij}] is the inverse of T. Remembering that \lambda_i = f_i(x) we
can write these equations in the forms
\[ f_i(x) = \sum_{j=1}^n t_{ij} f'_j(x) \quad\text{resp.}\quad f'_i(x) = \sum_{j=1}^n \tilde t_{ij} f_j(x). \]
Since these hold for any x ∈ V we have
\[ f_i = \sum_{j=1}^n t_{ij} f'_j \quad\text{resp.}\quad f'_i = \sum_{j=1}^n \tilde t_{ij} f_j. \]
Comparing these with the defining formula
\[ f'_j = \sum_{i=1}^n s_{ij} f_i \]
for the transfer matrix S from (f_i) to (f'_j), we see that S = (T^t)^{-1}. Thus we
have proved

Proposition 39 If T is the transfer matrix from (x_i) to (x'_j), then (T^t)^{-1}
is the transfer matrix from (f_i) to (f'_j).
We now consider duality for mappings. Suppose that f : V → W is a linear
mapping. We define the transposed mapping f t (which maps the dual W ∗
of W into V ∗ ) as follows: if g ∈ W ∗ , then f t (g) is defined in the natural way
as the composition g ◦ f i.e. we have the equation
f t (g) : x 7→ g(f (x)) or f t (g)(x) = g(f (x)).
As the notation suggests, this is the coordinate-free version of the transpose
of a matrix:
Proposition 40 If (x1 , . . . , xn ) resp. (y1 , . . . , ym ) are bases for V and W
resp. and f : V → W is a linear mapping with matrix A = [aij ], then the
matrix of f t with respect to the dual bases (g1 , . . . , gm ) and (f1 , . . . , fn ) is At ,
the transpose of A.
Proof. The matrix A is determined by the fact that f maps \sum_{j=1}^n \lambda_j x_j
into \sum_{i=1}^m \big(\sum_{j=1}^n a_{ij}\lambda_j\big) y_i or, in terms of the f_j's and g_i's,
\[ \sum_{j=1}^n a_{ij} f_j(x) = g_i(f(x)) = f^t(g_i)(x) \]
(the latter equation by the definition of f^t).
Since this holds for each x we have
\[ f^t(g_i) = \sum_{j=1}^n a_{ij} f_j \]
and if we compare this with the defining relation
\[ f^t(g_i) = \sum_{j=1}^n b_{ji} f_j \]
for the matrix B of f^t, we see that B = A^t.
The bidual: If V is a vector space, we can form the dual of its dual space
i.e. the space (V ∗ )∗ which we denote by V ∗∗ . As we have already seen, the
vector space V is isomorphic to its dual space V ∗ and hence also to its bidual.
However, there is an essential difference between the two cases. The first was
dependent on an (arbitrary) choice of basis for V . We shall now show how to
define a natural isomorphism from V onto V ∗∗ which is independent of any
additional structure of V .

Definition: If V is a vector space, we construct a mapping iV from V into


V ∗∗ by defining the form iV (x) (x ∈ V ) as follows:

iV (x)(f ) = f (x) (f ∈ V ∗ ).

It is easy to see that iV is a linear injection. It is surjective since the dimen-


sions of V and V ∗∗ coincide.
We now turn to duality for subspaces: let M be a subspace of V . Then

M o = {f ∈ V ∗ : f (x) = 0 for all x ∈ M}

is a subspace of V ∗ —called the annihilator of M (in V ∗ ). Similarly, if N


is a subspace of V ∗ , then

No = {x ∈ V : f (x) = 0 for all f ∈ N}

is the annihilator of N in V. Notice that this is just i_V^{-1}(N^o) where N^o is
the annihilator of N in V^{**}, the dual of V^*. Hence there is a perfect symmetry
in the relationship between M and M^o resp. N and N_o. Because of this fact,
it will often suffice to prove just half of the statements of some of the next
results.

Proposition 41 If M ⊂ V and N ⊂ V ∗ , then

dim M + dim M o = dim V = dim V ∗ = dim N + dim No .

Proof. Choose a basis (xi ) for V so that (x1 , . . . , xr ) is one for M. Let
(fi ) be the dual basis. Then it is clear that M o = [fr+1 , . . . , fn ] (cf. the
calculation above) from which the first half of the equation follows. The
second follows from the symmetry mentioned above.

Corollary 7 If M and N are as above, then M = (M^o)_o and N = (N_o)^o.

Proof. It is clear that M ⊂ (M^o)_o. To verify equality, we count dimensions:
\[ \dim (M^o)_o = \dim V - \dim M^o = \dim V - (\dim V - \dim M) = \dim M. \]

Proposition 42 If f : V → W is a linear mapping, then Ker f^t = (Im f)^o,
Im(f^t) = (Ker f)^o, Ker f = (Im f^t)_o, Im f = (Ker f^t)_o.

Proof. In fact, we only have to prove one of these results, say the first one,
Ker f^t = (Im f)^o. This follows from the following chain of equivalences:

g ∈ Ker f^t if and only if f^t(g) = 0
            if and only if (f^t(g))(x) = 0 for x ∈ V
            if and only if g(f(x)) = 0 for x ∈ V
            if and only if g ∈ (Im f)^o.

In order to obtain the other three we proceed as follows: if we take annihilators
of both sides we get
\[ (\operatorname{Ker} f^t)_o = ((\operatorname{Im} f)^o)_o = \operatorname{Im} f \]
which is the fourth equation.
The remaining two are obtained by exchanging the roles of f, V, W with
those of f^t, V^* and W^* in the two that we have just proved.
If we apply this result to the linear mapping defined by a matrix, we obtain
the following criterion for the solvability of a system of linear equations.
Proposition 43 Let A be an m × n matrix. Then the equation AX = Y
has a solution if and only if Y t Z = 0 for each solution Z of the homogeneous
equation At Z = 0.
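A minimal numerical sketch of this criterion in Python (numpy), with illustrative data chosen for the purpose: the null space of A^t is read off from the singular value decomposition, and Y is solvable exactly when it is orthogonal to every such Z.

import numpy as np

# Illustrative rank-deficient matrix and right-hand side (assumptions for this sketch).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
Y = A @ np.array([3.0, -1.0])          # by construction, AX = Y is solvable

# Basis of the null space of A^t from the SVD of A (left singular vectors
# belonging to zero singular values).
U, s, Vt = np.linalg.svd(A)
null_At = U[:, len(s[s > 1e-10]):]     # columns Z with A^t Z = 0

# Solvability criterion: Y^t Z = 0 for every such Z.
print(np.allclose(null_At.T @ Y, 0))   # True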

Product and quotient spaces: If V_1 and V_2 are vector spaces, then so is
V = V_1 × V_2 under the operations

(x, y) + (x1 , y1) = (x + x1 , y + y1 )


λ(x, y) = (λx, λy)

as is easily checked. The mappings x ↦ (x, 0) and y ↦ (0, y) are isomor-


phisms from V1 resp. V2 onto the subspaces

V˜1 = {(x, 0) : x ∈ V1 } resp. V˜2 = {(0, y) : y ∈ V2 }

and V = V˜1 ⊕ V˜2 . Hence V1 × V2 is sometimes called the external direct
sum of V1 and V2 .
It is easily checked that the dual (V1 × V2 )∗ of such a product is naturally
isomorphic to V1∗ ×V2∗ where a pair (f, g) in the latter space defines the linear
form
(x, y) 7→ f (x) + g(y).
We now introduce a construction which is in some sense dual to that of
taking subspaces and which can sometimes be used in a similar way to reduce
dimension. Suppose that V1 is a subspace of the vector space V . We introduce
an equivalence relation ∼ on V as follows:

x ∼ y   if and only if   x − y ∈ V_1.

(i.e. we are reducing V1 and all the affine subspaces parallel to it to points).
V /V1 is, by definition, the corresponding set of equivalence classes {[x] : x ∈
V } where [x] = {y : y ∼ x}. V /V1 is a vector space in its own right, where
we define the operations by the equations

[x] + [y] = [x + y]
λ[x] = [λx]

and the mapping π : V → V /V1 which maps x onto [x] is linear and surjective.
Further we have the following characteristic property:

Proposition 44 Let f be a linear mapping from V into a vector space W


which vanishes on V1 . Then there is a linear mapping f˜ : V /V1 → W which
is such that f = f˜ ◦ π.

If we apply this to the case where W = R, we see that the dual space of V /V1
is naturally isomorphic to the polar V1o of V1 in V ∗ . From this it follows that
the dimension of (V /V1)∗ and hence of V /V1 is

dim V − dim V1 .

Exercises: 1) Calculate the dual basis to the basis

(1, 1, 1), (1, 1, −1), (1, −1, −1)

for R^3.
2) Calculate the coordinates of the functional p ↦ \int_0^1 p(t)\, dt on Pol(n) with
respect to the basis (f_{t_i}) where (t_i) is a sequence of distinct points in [0, 1]
and f_{t_i}(p) = p(t_i).

3) Let (x1 , . . . , xn ) be a basis for the vector space V with dual basis (f1 , . . . , fn ).
Show that the set
(x1 , x2 − λ2 x1 , . . . , xn − λn x1 )
is a basis and that
(f1 + λ2 f2 + . . . λn fn , f2 + λ3 f3 + · · · + λn fn , . . . , fn )
is the corresponding dual basis.
4) Find the dual basis to the basis
(1, t − a, . . . , (t − a)n )
for Pol (n).
5) Let t0 , . . . , tn be distinct points of [0, 1]. Show that the linear forms fi :
x → x(ti ) form a basis for the dual of Pol (n). What is the dual basis?
6) Let V1 be a subspace of a vector space V and let f : V1 → W be a linear
mapping. Show that there is a linear mapping f˜ : V → W which extends f .
Show that if S is a subset of V and f : S → W an arbitrary mapping, then
there is an extension of f to a linear mapping \tilde f from V into W if and only
if whenever \sum_{i=1}^n \lambda_i x_i = 0 (for x_1, ..., x_n ∈ S), then \sum_{i=1}^n \lambda_i f(x_i) = 0.
7) Let f be the linear form
\[ x \mapsto \int_0^1 x(t)\, dt \]
on Pol(n). Calculate D^t f where D^t is the transpose of the differentiation
operator D.
8) Let V1 and V2 be subspaces of a vector space V . Show that
(V1 + V2 )o = V1o ∩ V2o (V1 ∩ V2 )o = V1o + V2o .
9) Let P be a projection onto the subspace V1 of V . Show that P t is also a
projection and determine its range.
10) Show that if V and W are vector spaces, then L(W, V) is naturally
isomorphic to the dual of L(V, W) under the bilinear mapping
\[ (f, g) \mapsto \operatorname{tr}(g \circ f) \]
(i.e. the mapping
\[ g \mapsto (f \mapsto \operatorname{tr}(g \circ f)) \]
is an isomorphism from L(W, V) onto L(V, W)^*).
11) Show that if f is a linear functional on the vector space Mn so that
f (Id) = n and f (AB) = f (BA) for each pair A, B, then f is the trace
functional (i.e. f (A) = tr A for each A).

5.2 Duality in euclidean spaces
As we have seen, any vector space V is isomorphic to its dual space. In the
special case where V = R^n we used the particular isomorphism y ↦ f_y where
f_y is the linear form x ↦ \sum_i \xi_i\eta_i. In this case we see the special role of the
scalar product and this suggests the following result:

Proposition 45 Let V be a euclidean space with scalar product ( | ). Then
the mapping τ : y ↦ f_y, where f_y(x) = (x|y), is an isomorphism from V onto
V^*.

Proof. τ is clearly linear and injective. It is surjective since dim V =


dim V ∗ .
Using this fact, the duality theory for euclidean spaces can be given a
more symmetric form. We illustrate this by discussing briefly the concept
of covariant and contravariant vectors (cf. Chapter II.9). If (x_i) is a basis,
consider the dual basis (f_i) for V^*. Then if we define x^i to be τ^{-1}(f_i), (x^i)
is of course a basis for V. Hence each x ∈ V has two representations, namely
\[ x = \sum_i f_i(x)\, x_i = \sum_i (x|x^i)\, x_i, \]
which is called the contravariant representation, and
\[ x = \sum_i (x|x_i)\, x^i, \]
the covariant representation.


Note that a basis (xi ) is orthonormal exactly when it coincides with its
dual basis. Then the two representations for x are the same.
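A minimal numerical sketch of the two representations in Python (numpy; the basis is an illustrative assumption): with the basis vectors as the columns of X, the vectors x^i are the columns of X(X^tX)^{-1}, and the contravariant resp. covariant coefficients are the scalar products against the x^i resp. the x_i.

import numpy as np

# Illustrative (non-orthonormal) basis of R^2 as columns of X.
X = np.array([[1.0, 1.0],
              [0.0, 2.0]])
G = X.T @ X                      # Gram matrix (x_i | x_j)
Xdual = X @ np.linalg.inv(G)     # columns are the vectors x^i with (x^i | x_j) = delta_ij

x = np.array([3.0, -1.0])
contra = Xdual.T @ x             # contravariant components f_i(x) = (x | x^i)
co     = X.T @ x                 # covariant components (x | x_i)

print(np.allclose(X @ contra, x))     # x = sum_i contra_i * x_i
print(np.allclose(Xdual @ co, x))     # x = sum_i co_i * x^i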
We have been guilty of an abuse of notation by denoting the adjoint
of an operator between euclidean spaces studied in Chapter VIII and the
adjoint introduced here by the same symbol. This is justified by the fact
that they coincide up to the identification of the spaces with their duals via
the mapping τ .

5.3 Multilinear mappings
In this and the following section, we shall consider the concepts of multilinear
mappings and tensors. In fact, these are just two aspects of the same math-
ematical phenomenon—the difference in language having arisen during their
historical development. We begin with the concept of a multilinear mapping:

Definition: Let V1 , . . . , Vn and W be vector spaces. A multilinear mapping


from V_1 × ⋯ × V_n into W is a mapping
\[ f : V_1 \times \cdots \times V_n \to W \]
so that
\[ f(x_1, \ldots, \lambda x_i + \mu x'_i, \ldots, x_n) = \lambda f(x_1, \ldots, x_i, \ldots, x_n) + \mu f(x_1, \ldots, x'_i, \ldots, x_n) \]
for each choice of i, x_1, ..., x_i, x'_i, ..., x_n, λ, μ (i.e. f is linear in each variable
separately). Note that the linear mappings correspond to the special case n = 1.
We denote the space of all such mappings by Ln (V1 , . . . , Vn ; W ). If V1 =
· · · = Vn (which will almost always be the case in applications), we write
Ln (V ; W ). If the range space is R, we denote it by Ln (V1 , . . . , Vn ) resp.
Ln (V ). In particular, the space L1 (V ) is just the dual V ∗ .
In order to simplify the presentation, we shall, for the present, confine
ourselves to the simple case L^2(V_1, V_2) of bilinear forms on the product of
two vector spaces. In this case, the defining conditions can be combined in
the form
\[ f(\lambda_1 x_1 + \lambda_2 x_2, \mu_1 y_1 + \mu_2 y_2) = \sum_{i,j=1,2} \lambda_i \mu_j f(x_i, y_j). \]
More generally, we have
\[ f\Big(\sum_{i=1}^n \lambda_i x_i, \sum_{j=1}^m \mu_j y_j\Big) = \sum_{i,j} \lambda_i \mu_j f(x_i, y_j) \]

for any n and m. We have already met several examples of bilinear forms. For
example, the scalar product on a euclidean space and the bilinear form
\sum_{i,j} a_{ij}\xi_i\eta_j associated with a conic section. In fact, the typical bilinear
form on R^n can be written as
\[ f(x, y) = \sum_{i,j} a_{ij}\, \xi_i \eta_j \]
for suitable coefficients [a_{ij}]. For if we put a_{ij} = f(e_i, e_j) then
\[ f(x, y) = f\Big(\sum_{i=1}^n \xi_i e_i, \sum_{j=1}^n \eta_j e_j\Big) = \sum_{i,j} a_{ij}\, \xi_i \eta_j. \]
In matrix notation this can be conveniently written in the form X^t A Y where
X and Y are the column matrices
\[ X = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}, \qquad Y = \begin{pmatrix} \eta_1 \\ \vdots \\ \eta_n \end{pmatrix}. \]
Just as in the case of the representation of linear operators by matrices, this is
completely general and so if V_1 and V_2 are spaces with bases (x_1, ..., x_m) resp.
(y_1, ..., y_n) and if f ∈ L^2(V_1, V_2), then A = [a_{ij}], where a_{ij} = f(x_i, y_j), is called
the matrix of f with respect to these bases and we have the representation
\[ f(x, y) = \sum_{i,j} a_{ij}\, \lambda_i \mu_j \]
where x = \sum_i \lambda_i x_i and y = \sum_j \mu_j y_j. We can express this fact in a more
abstract way as follows. Suppose that f ∈ V_1^* and g ∈ V_2^*. Then we define
a bilinear form f ⊗ g on V_1 × V_2 as follows:
\[ f \otimes g : (x, y) \mapsto f(x)\, g(y). \]
Proposition 46 If (fi ) and (gi ) are the dual bases of V1 and V2 , then (fi ⊗gj )
is a basis for L2 (V1 , V2 ). Hence the dimension of the latter space is dim V1 ·
dim V2 .
Proof. The argument above shows that these elements span L2 (V1 , V2 ). On
the other hand, fi ⊗ gj (xk , yl ) vanishes unless i = k and j = l in which case
its value is one. Hence the set is linearly independent by the Lemma above.
We have thus seen that both linear mappings and bilinear forms are
representable by matrices. However, it is important to note that the formula for
the change in the representing matrices induced by new coordinate systems
is different in each case, as we shall now see. For suppose that we introduce
new bases (x'_1, ..., x'_m) resp. (y'_1, ..., y'_n) in the above situation with transfer
matrices S = [s_{ij}] and T = [t_{kl}], i.e.
\[ x'_j = \sum_i s_{ij} x_i, \qquad y'_l = \sum_k t_{kl} y_k. \]
Now if A' is the matrix of f with respect to the new bases, then
\[ a'_{jl} = f(x'_j, y'_l) = f\Big(\sum_i s_{ij} x_i, \sum_k t_{kl} y_k\Big) = \sum_i \sum_k s_{ij}\, a_{ik}\, t_{kl} \]
which is the (j, l)-th element of S^t A T. Thus we have the formula
\[ A' = S^t A T \]
for the new matrix which should be compared with that for the change in
the matrix of a linear mapping.
In the particular case where V1 = V2 = V and we use the same basis for
each space, the above equation takes on the form
A′ = S t AS.
It is instructive to verify this formula with the use of coordinates. In
matrix notation we have
\[ f(x, y) = X^t A Y = (X')^t A' Y' \]
where X, Y, X', Y' are the column matrices composed of the coordinates of
x and y with respect to the corresponding bases. Now we know that
X = S X' and Y = T Y', and if we substitute this in the formula we get
\[ f(x, y) = (S X')^t A (T Y') = (X')^t (S^t A T) Y' \]
as required.
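A quick numerical sketch of the change-of-basis rule A' = S^t A T in Python (numpy), with randomly generated illustrative data: the entries of S^t A T agree with the values f(x'_j, y'_l) of the form on the new basis vectors.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))     # matrix of the bilinear form on R^m x R^n
S = rng.standard_normal((m, m))     # new x-basis vectors are the columns of S
T = rng.standard_normal((n, n))     # new y-basis vectors are the columns of T

A_new = S.T @ A @ T

# Direct evaluation of f(x'_j, y'_l) = (x'_j)^t A y'_l.
direct = np.array([[S[:, j] @ A @ T[:, l] for l in range(n)] for j in range(m)])

print(np.allclose(A_new, direct))   # True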
We can distinguish two particularly important classes of bilinear forms f
on the product V × V of a vector space with itself. f ∈ L2 (V ) is said to be
• symmetric if f (x, y) = f (y, x)(x, y ∈ V );
• alternating if f (x, y) = −f (y, x)(x, y ∈ V ).
If f has the coordinate representation
\[ \sum_{i,j} a_{ij}\, f_i \otimes f_j \]
with respect to the basis (x_1, ..., x_n), then f is symmetric (resp. alternating)
if and only if A = A^t (resp. A = -A^t). (For a_{ij} = f(x_i, x_j) = ±f(x_j, x_i) =
±a_{ji} according as f is symmetric or alternating.)
Symmetric forms with representations
\[ f = \sum_{i=1}^p f_i \otimes f_i - \sum_{i=p+1}^{p+q} f_i \otimes f_i \]
are particularly transparent. On R^n these are the forms
\[ f(x, y) = \xi_1\eta_1 + \cdots + \xi_p\eta_p - (\xi_{p+1}\eta_{p+1} + \cdots + \xi_{p+q}\eta_{p+q}). \]
The central result on symmetric forms is the following:

Proposition 47 Let f be a symmetric bilinear form on V. Then there is a
basis (x_i) of V and integers p, q with p + q ≤ n so that
\[ f = \sum_{i=1}^p f_i \otimes f_i - \sum_{i=p+1}^{p+q} f_i \otimes f_i. \]
Before proving this result, we restate it as one on matrices:

Proposition 48 Let A be a symmetric n × n matrix. Then there is an
invertible n × n matrix S so that
\[ S^t A S = \operatorname{diag}(1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0). \]

Proof. We prove the matrix form of the result. First note that the result
on the diagonalisation of symmetric operators on euclidean space provides an
orthogonal matrix U so that U^t A U = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), where the λ_i are the
eigenvalues and can be ordered so that the first p (say) are positive, the next
q are negative and the rest zero. Now put
\[ T = \operatorname{diag}\Big(\frac{1}{\sqrt{\lambda_1}}, \ldots, \frac{1}{\sqrt{\lambda_p}}, \frac{1}{\sqrt{-\lambda_{p+1}}}, \ldots, \frac{1}{\sqrt{-\lambda_{p+q}}}, 1, \ldots, 1\Big). \]
Then if S = U T,
\[ S^t A S = \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
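This construction is easy to carry out numerically. The following sketch in Python (numpy), on an illustrative symmetric matrix chosen for the purpose, builds S from the eigendecomposition exactly as in the proof and checks that S^t A S is a diagonal of +1's, -1's and 0's:

import numpy as np

# Illustrative symmetric matrix with one positive, one negative and one zero eigenvalue.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

lam, U = np.linalg.eigh(A)                      # A = U diag(lam) U^t
scale = np.where(np.abs(lam) > 1e-10, 1.0 / np.sqrt(np.abs(lam)), 1.0)
S = U @ np.diag(scale)

D = S.T @ A @ S                                 # diagonal with entries +1, -1, 0
print(np.round(D, 10))
print("p =", np.sum(lam > 1e-10), " q =", np.sum(lam < -1e-10))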
We now turn to the signs involved in the canonical form
\[ \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
Although there is a wide range of choice of the diagonalising matrix S, it
turns out that the arrangement of signs is invariant:

Proposition 49 (Sylvester's law of inertia) Suppose that the symmetric form
f has the representations
\[ \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{resp.}\quad \begin{pmatrix} I_{p'} & 0 & 0 \\ 0 & -I_{q'} & 0 \\ 0 & 0 & 0 \end{pmatrix} \]
with respect to bases (x_i) resp. (x'_j). Then p = p' and q = q'.

Proof. First note that p + q and p' + q' are both equal to the rank of the
corresponding matrices and so are equal. Now put
\[ M_p = [x_1, \ldots, x_p], \quad M_{p'} = [x'_1, \ldots, x'_{p'}], \quad N_q = [x_{p+1}, \ldots, x_n], \quad N_{q'} = [x'_{p'+1}, \ldots, x'_n] \]
and note that f(x, x) > 0 for x ∈ M_p \ {0} (resp. M_{p'} \ {0}) and f(x, x) ≤ 0
for x ∈ N_q resp. N_{q'}.
Hence M_{p'} ∩ N_q = {0} = M_p ∩ N_{q'}. Suppose that p ≠ p', say p' > p. Then
\[ \dim(M_{p'} \cap N_q) = \dim M_{p'} + \dim N_q - \dim(M_{p'} + N_q) \ge p' + (n - p) - n = p' - p > 0, \]
which is a contradiction. Hence p = p' and so q = q'.


The above proof of the diagonalisation of a symmetric bilinear form is
rather artificial in that it introduces in an arbitrary way the inner product
on R^n. We give a direct, coordinate-free proof as follows: suppose that φ
is such a form. We shall show, by induction on the dimension of V, how to
construct a basis (x_1, ..., x_n) with respect to which the matrix of φ is
\[ \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{pmatrix} \]
for suitable p, q.

Proof. We choose a vector \tilde x_1 with φ(\tilde x_1, \tilde x_1) ≠ 0. (If there is no such vector
then the form φ vanishes and the result is trivially true.) Now let V_1 be the
linear span [\tilde x_1] and put
\[ V_2 = \{y \in V : φ(\tilde x_1, y) = 0\}. \]
It is clear that V_1 ∩ V_2 = {0} and the expansion
\[ y = \frac{φ(\tilde x_1, y)}{φ(\tilde x_1, \tilde x_1)}\, \tilde x_1 + z, \]
where z = y - \frac{φ(\tilde x_1, y)}{φ(\tilde x_1, \tilde x_1)}\, \tilde x_1 (from which it follows that z ∈ V_2),
shows that V = V_1 ⊕ V_2. By the induction hypothesis, there is a suitable basis
(x_2, ..., x_n) for V_2. We define x_1 to be \tilde x_1/\sqrt{φ(\tilde x_1, \tilde x_1)} if φ(\tilde x_1, \tilde x_1) > 0 and
\tilde x_1/\sqrt{-φ(\tilde x_1, \tilde x_1)} otherwise. Then the basis (x_1, ..., x_n) has the required properties.

A symmetric bilinear form φ is said to be non-singular if whenever
x ∈ V is such that φ(x, y) = 0 for each y ∈ V , then x vanishes. The reader
can check that this is equivalent to the fact that the rank of the matrix of φ
is equal to the dimension of V (i.e. p + q = n). In this case, just as in the
special case of a scalar product, the mapping

τ : x 7→ (y 7→ φ(x, y))

is an isomorphism from V onto its dual space V^*. The classical example of
a form which is non-singular but not positive definite is the mapping
\[ φ : (x, y) \mapsto \xi_1\eta_1 - \xi_2\eta_2 \]
on R^2.
Most of the results above can be carried over to the space L(V_1, ..., V_r; W)
of multilinear mappings from V_1 × ⋯ × V_r into W. We content ourselves with
the remark that if we have bases (x^1_1, ..., x^1_{n_1}), ..., (x^r_1, ..., x^r_{n_r}) for V_1, ..., V_r
with the corresponding dual bases and (y_1, ..., y_p) for W, then the set
\[ (f^1_{i_1} \otimes \cdots \otimes f^r_{i_r} \otimes y_j : 1 \le i_1 \le n_1, \ldots, 1 \le i_r \le n_r, 1 \le j \le p) \]
is a basis for L(V_1, ..., V_r; W), where for f_1 ∈ V_1^*, ..., f_r ∈ V_r^* and y ∈ W,
f_1 ⊗ ⋯ ⊗ f_r ⊗ y is the mapping
\[ (x_1, \ldots, x_r) \mapsto f_1(x_1) \cdots f_r(x_r)\, y. \]
In particular, the dimension of the latter space is
\[ (\dim V_1) \cdots (\dim V_r)(\dim W). \]

Example: Calculate the coordinates of
\[ f : (x, y) \mapsto 2\xi_1\eta_1 - \xi_1\eta_2 + \xi_2\eta_1 - \xi_2\eta_2 \]
with respect to the basis (1, 0), (1, 1).
Solution: The dual basis is f_1 = (1, -1), f_2 = (0, 1). Then f = \sum a_{ij}\, f_i \otimes f_j
where a_{ij} = f(x_i, x_j) with x_1 = (1, 0), x_2 = (1, 1). Thus a_{11} = 2, a_{12} = 1,
a_{21} = 3, a_{22} = 1, i.e.
\[ f = 2 f_1 \otimes f_1 + f_1 \otimes f_2 + 3 f_2 \otimes f_1 + f_2 \otimes f_2. \]

Exercises: 1) Reduce the following forms on R3 to their canonical forms:
• (ξ1 , ξ2 , ξ3 ) 7→ ξ1 ξ2 + ξ2 ξ3 + ξ3 ξ2 ;
• (ξ1 , ξ2 , ξ3 ) 7→ ξ2 η1 + ξ1 η2 + 2ξ2 η2 + 2ξ2η3 + 2ξ3 η2 + 5ξ3η3 .
2) Find the matrices of the bilinear forms
Z 1
(x, y) 7→ x(t)y(t) dt
0
(x, y) 7→ x(0)y(0)
(x, y) 7→ x(0)y ′ (0)
on Pol (n).
3) Let f be a symmetric bilinear form on V and φ be the mapping x 7→
f (x, x). Show that
• f(x, y) = (1/4)(φ(x + y) − φ(x − y)) (V real);
• f(x, y) = (1/4)(φ(x + y) − φ(x − y) + iφ(x + iy) − iφ(x − iy)) (V complex).
(This example shows how we can recover a symmetric 2-form from the
quadratic form it generates i.e. its values on the diagonal).
4) Let x1 , . . . , xn−1 be elements of Rn . Show that there exists a unique
element y of R^n so that (x|y) = det X for each x ∈ R^n where X is the matrix
with rows x1 , x2 , . . . , xn−1 , x. If we denote this y by
x1 × x2 × · · · × xn−1
show that this cross-product is linear in each variable xi (i.e. it is an
(n − 1)-linear mapping from Rn × · · · × Rn into Rn ). (When n = 3, this
coincides with the classical vector product studied in Chapter II).
5) Two spaces V and W with symmetric bilinear forms φ and ψ are said to
be isometric if there is a vector space isomorphism f : V → W so that
ψ(f (x), f (y)) = φ(x, y) (x, y ∈ V ).
Show that this is the case if and only if the dimensions of V and W respec-
tively the rank and signatures of φ and ψ coincide.
6) Let A be a symmetric, invertible n × n matrix. Show that the bilinear
form on R^n induced by A^{-1} is
\[ (x, y) \mapsto -\frac{1}{\det A}\, \det \begin{pmatrix} 0 & \xi_1 & \cdots & \xi_n \\ \eta_1 & & & \\ \vdots & & A & \\ \eta_n & & & \end{pmatrix}. \]

7) Let φ be a symmetric bilinear form on the vector space V . Show that V
has a direct sum representation

V = V+ ⊕ V− ⊕ V0

where

V0 = {x : φ(x, x) = 0}
V+ = {x : φ(x, x) > 0} ∪ {0}
V− = {x : φ(x, x) < 0} ∪ {0}.

8) Let ( | ) be a symmetric bilinear form on the vector space V and put

N = {x ∈ V : (x|y) = 0 for each y ∈ V }.

Show that one can define a bilinear form ( | )1 on V /N so that (π(x)|π(y))1 =


(x|y) for each x, y. Show that this form is non-singular. What is dim V /N?
(π denotes the natural mapping x 7→ [x] from V onto the quotient space).
9) What is the rank of the bilinear form with matrix
\[ \begin{pmatrix} a & 1 & 1 & \cdots & 1 \\ 1 & a & 1 & \cdots & 1 \\ \vdots & & \ddots & & \vdots \\ 1 & 1 & 1 & \cdots & a \end{pmatrix}? \]

10) Show that a multilinear form f ∈ Lr (V ) is alternating if and only if


f (x1 , . . . , xr ) = 0 whenever two of the xi coincide.
Show that every bilinear form on V has a unique representation f = fs +fa
as a sum of a symmetric and an alternating form.
11) R2 , together with the bilinear form

(x, y) 7→ ξ1 η1 − ξ2 η2

is often called the hyperbolic plane. Show that if V is a vector space with
a non-singular inner product and there is a vector x with (x|x) = 0, then
V contains a two dimensional subspace which is isometric to the hyperbolic
plane.
12) Suppose that φ and ψ are bilinear forms on a vector space V so that

{x : φ(x, x) = 0} ∩ {x : ψ(x, x) = 0} = {0}.

Show that there is a basis for V with respect to which both φ and ψ have
upper triangular matrices. Deduce that if φ and ψ are symmetric, there is a
basis for which both are diagonal.

13) Let A = [a_{ij}] be an n × n symmetric matrix and let A_k denote the
submatrix
\[ \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}. \]
Show that if the determinants of each of the A_k are non-zero, then the
corresponding quadratic form Q(x) = (Ax|x) can be written in the form
\[ \sum_{k=1}^n \frac{\det A_k}{\det A_{k-1}}\, \eta_k^2 \]
(with the convention det A_0 = 1) where \eta_k = \xi_k + \sum_{j=k+1}^n b_{jk}\xi_j for
suitable coefficients b_{jk}.
Deduce that A is positive definite if and only if each det Ak is positive.
Can you give a corresponding characterisation of positive semi-definiteness?
14) Show that if f is a symmetric mapping in L^r(V; W), then
\[ f(x_1, \ldots, x_r) = \frac{1}{r!\, 2^r} \sum \epsilon_1 \cdots \epsilon_r\, f(\epsilon_1 x_1 + \cdots + \epsilon_r x_r), \]
the sum being taken over all choices (ε_i) of sign (i.e. each ε_i is either 1 or
−1, there being 2^r summands).
15) Let φ be an alternating bilinear form on a vector space V . Show that V
has a basis so that the matrix of the form is
\[ \begin{pmatrix} 0 & I_r & 0 \\ -I_r & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

16) Let φ be as above and suppose that the rank of φ is n. Then it follows
from the above that n is even, i.e. of the form 2k for some k, and V has a
basis so that the matrix of φ is
\[ J = \begin{pmatrix} 0 & I_k \\ -I_k & 0 \end{pmatrix}. \]
A space V with such a φ is called a symplectic vector space. A linear
mapping f on such a space is called symplectic if φ(f(x), f(y)) = φ(x, y).
Show that this is the case if and only if A^t J A = J, where J is as above and A
is the matrix of f with respect to this basis. Deduce that A has determinant
1 and so is an isomorphism. Show that if λ is an eigenvalue of f, then so are
1/λ, \bar\lambda and 1/\bar\lambda.

5.4 Tensors
We now turn to tensor products. In fact, these are also multilinear mappings
which are now defined on the dual space. However, because of the symmetry
between a vector space and its dual this is of purely notational significance.

Definition: If V_1 and V_2 are vector spaces, the tensor product V_1 ⊗ V_2 of
V_1 and V_2 is the space L^2(V_1^*, V_2^*) of bilinear forms on the dual spaces V_1^*
and V_2^*. If x ∈ V_1, y ∈ V_2, x ⊗ y is the form
\[ (f, g) \mapsto f(x)\, g(y) \]
on V_1^* × V_2^*. We can then translate some previous results as follows:

• the mapping (x, y) ↦ x ⊗ y is bilinear from V_1 × V_2 into V_1 ⊗ V_2. In
other words we multiply out tensors in the usual way:
\[ \Big(\sum_i \lambda_i x_i\Big) \otimes \Big(\sum_j \mu_j y_j\Big) = \sum_{i,j} \lambda_i\mu_j\, x_i \otimes y_j; \]
• if (x_i) resp. (y_j) is a basis for V_1 resp. V_2, then
\[ (x_i \otimes y_j : 1 \le i \le m,\ 1 \le j \le n) \]
is a basis for V_1 ⊗ V_2 and so each z ∈ V_1 ⊗ V_2 has a representation
z = \sum_{i,j} a_{ij}\, x_i \otimes y_j where a_{ij} = z(f_i, g_j).

Once again, this last statement implies that every tensor is described by a
matrix. Of course the transformation laws for the matrix of a tensor are
again different from those that we have met earlier and in fact we have the
formula
\[ A' = S^{-1} A (T^{-1})^t \]
where A is the matrix of z with respect to (x_i) and (y_j), A' is the matrix
with respect to (x'_i) and (y'_j), and S and T are the corresponding transfer
matrices.
Every tensor z ∈ V_1 ⊗ V_2 is thus representable as a linear combination
of so-called simple tensors, i.e. those of the form x ⊗ y (x ∈ V_1, y ∈ V_2)
(stated more abstractly, the image of V_1 × V_2 in V_1 ⊗ V_2 spans the latter). Not
every tensor is simple. This can be perhaps most easily verified as follows:
if x = \sum_i \lambda_i x_i and y = \sum_j \mu_j y_j, then the matrix of x ⊗ y is [\lambda_i\mu_j] and this
has rank 1. Hence if the matrix of a tensor has rank more than one, it is not
a simple tensor.

Tensor products of linear mappings: Suppose now that we have linear
mappings f ∈ L(V_1, W_1) and g ∈ L(V_2, W_2). Then we can define a linear
mapping f ⊗ g from V_1 ⊗ V_2 into W_1 ⊗ W_2 by putting
\[ (f \otimes g)\Big(\sum_i x_i \otimes y_i\Big) = \sum_i f(x_i) \otimes g(y_i). \]

Higher order tensors: Similarly, we can define V_1 ⊗ ⋯ ⊗ V_r to be the
space L^r(V_1^*, ..., V_r^*) of multilinear mappings on V_1^* × ⋯ × V_r^*. Then if
(x^1_1, ..., x^1_{n_1}), ..., (x^r_1, ..., x^r_{n_r}) are bases for V_1, ..., V_r, the family
\[ (x^1_{i_1} \otimes \cdots \otimes x^r_{i_r} : 1 \le i_1 \le n_1, \ldots, 1 \le i_r \le n_r) \]
is a basis for the tensor product and so every tensor has a representation
\[ \sum_{1 \le i_1, \ldots, i_r \le n} t_{i_1 \ldots i_r}\, x^1_{i_1} \otimes \cdots \otimes x^r_{i_r}. \]
In practice one is almost always interested in tensor products of the form
\[ V \otimes V \otimes \cdots \otimes V \otimes V^* \otimes \cdots \otimes V^* \]
with p copies of V and q copies of the dual. We denote this space by \bigotimes^p_q V;
its elements are called (p + q)-tensors on V. They are contravariant of
degree p and covariant of degree q.
Thus the notation \bigotimes^p_q V is just a new one for the space
\[ L^{p+q}(V^*, \ldots, V^*, V, \ldots, V) \]
of multilinear forms on V^* × ⋯ × V^* × V × ⋯ × V. (Strictly speaking, the
last q spaces should be V^{**}'s but we are tacitly identifying V with V^{**}.)
We now give a useful list of operations on tensors. In doing so, we
shall specify them only on simple tensors. This means that they will be
defined on typical basis elements. They can then be extended by linearity or
multilinearity to arbitrary tensors.

Multiplication: We can multiply a tensor of degree (p, q) with one of
degree (p_1, q_1) to obtain one of degree (p + p_1, q + q_1). More precisely, there
is a bilinear mapping m from \bigotimes^p_q V × \bigotimes^{p_1}_{q_1} V into \bigotimes^{p+p_1}_{q+q_1} V whereby
\[ m(x_1 \otimes \cdots \otimes x_p \otimes f_1 \otimes \cdots \otimes f_q,\; x'_1 \otimes \cdots \otimes x'_{p_1} \otimes f'_1 \otimes \cdots \otimes f'_{q_1}) \]
is
\[ x_1 \otimes \cdots \otimes x_p \otimes x'_1 \otimes \cdots \otimes x'_{p_1} \otimes f_1 \otimes \cdots \otimes f_q \otimes f'_1 \otimes \cdots \otimes f'_{q_1}. \]
Contraction: We can reduce a tensor of degree (p, q) to one of degree
(p − 1, q − 1) by applying a covariant component to a contravariant one. More
precisely, there is a linear mapping c from \bigotimes^p_q V into \bigotimes^{p-1}_{q-1} V where
\[ c(x_1 \otimes \cdots \otimes x_p \otimes f_1 \otimes \cdots \otimes f_q) = f_1(x_p)\, x_1 \otimes \cdots \otimes x_{p-1} \otimes f_2 \otimes \cdots \otimes f_q. \]

Raising or lowering an index: In the case where V is a euclidean space,
we can apply the operator τ or its inverse to a component of a tensor to
change it from a contravariant one to a covariant one or vice versa. For
example we can map \bigotimes^p_q V into \bigotimes^{p-1}_{q+1} V as follows:
\[ x_1 \otimes \cdots \otimes x_p \otimes f_1 \otimes \cdots \otimes f_q \mapsto x_1 \otimes \cdots \otimes x_{p-1} \otimes \tau x_p \otimes f_1 \otimes \cdots \otimes f_q. \]

We continue with some brief remarks on the standard notation for tensors.
V is a vector space with basis (e_1, ..., e_n) and we denote by (f_1, ..., f_n) the
dual basis. Then we write (e^1, ..., e^n) for the corresponding basis for V,
identified with V^* by way of a scalar product (i.e. e^i = τ^{-1}(f_i)). Then we
have bases

• (e_{ij}) for V ⊗ V, where e_{ij} = e_i ⊗ e_j; a typical tensor has a representation
z = \sum_{i,j} \xi^{ij} e_{ij} where \xi^{ij} = f_i \otimes f_j(z) = (e^{ij}|z) with e^{ij} = e^i \otimes e^j;
• (e_i^j) for V ⊗ V, where e_i^j = e_i \otimes e^j; a typical tensor z has the
representation \sum_{i,j} \xi^i_j\, e_i^j where \xi^i_j = (z|e^i \otimes e_j).

In the general tensor space \bigotimes^p_q V we have a basis
\[ (e^{j_1 \ldots j_q}_{i_1 \ldots i_p}) \]
where the typical element is
\[ e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q} \]
and the tensor z has the representation
\[ \sum \xi^{i_1 \ldots i_p}_{j_1 \ldots j_q}\, e^{j_1 \ldots j_q}_{i_1 \ldots i_p}. \]
 
Example: Let f resp. g be the operators on R^2 induced by the 2 × 2 matrices
\[ A = \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix} \quad\text{resp.}\quad B = \begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix}. \]

We calculate the matrix of f ⊗ g with respect to the basis (y_1, y_2, y_3, y_4)
where
\[ y_1 = e_1 \otimes e_1, \quad y_2 = e_1 \otimes e_2, \quad y_3 = e_2 \otimes e_1, \quad y_4 = e_2 \otimes e_2. \]
Then

(f ⊗ g)(y_1) = -y_3 + 2y_4
(f ⊗ g)(y_2) = y_3
(f ⊗ g)(y_3) = -2y_1 + 4y_2 + y_3 - 2y_4
(f ⊗ g)(y_4) = 2y_1 - y_3.

Hence the required matrix is
\[ \begin{pmatrix} 0 & 0 & -2 & 2 \\ 0 & 0 & 4 & 0 \\ -1 & 1 & 1 & -1 \\ 2 & 0 & -2 & 0 \end{pmatrix}. \]

We can write this in the suggestive form
\[ \begin{pmatrix} 0 \begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} & 2 \begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} \\[4pt] 1 \begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} & -1 \begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} \end{pmatrix} \]
from which the following general pattern should be clear.

Proposition 50 If A is the matrix of the linear mapping f and B that of
g, then the matrix of f ⊗ g is
\[ \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ \vdots & & & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nn}B \end{pmatrix}. \]
(This matrix is called the Kronecker product of A and B, written A ⊗ B.)

Note that the basis used to define the matrix of the tensor product is the one
which is obtained by considering the array (x_i ⊗ y_j) of tensor products and
numbering it by reading along the successive rows in the customary manner.
Using these matrix representations one can check the following results:

• if f and g are injective, then so is f ⊗ g;

• r(f ⊗ g) = r(f )r(g);
• tr (f ⊗ g) = tr f · tr g;
• det(f ⊗ g) = (det f)^n (det g)^m where m is the dimension of the space
on which f acts, resp. n that of g.
One proves these results by choosing bases for which the matrices have a
simple form and then examining the Kronecker product. For example, con-
sider 3) and 4). We choose bases so that f and g are in Jordan form. Then
it is clear that the Kronecker product is upper triangular (try it out for small
matrices) and the elements in the diagonal are products of the form λi µj
where λi is an eigenvalue of f and µj one of g. The formulae 3) and 4) follow
by taking the sum resp. the product and counting how often the various
eigenvalues occur.
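These identities are easy to check numerically. A sketch in Python (numpy) with illustrative random matrices; here k is the dimension of the space on which f acts and l that of g:

import numpy as np

rng = np.random.default_rng(1)
k, l = 3, 4
A = rng.standard_normal((k, k))     # matrix of f
B = rng.standard_normal((l, l))     # matrix of g
K = np.kron(A, B)                   # Kronecker product, the matrix of f ⊗ g

print(np.linalg.matrix_rank(K) == np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B))
print(np.isclose(np.trace(K), np.trace(A) * np.trace(B)))
print(np.isclose(np.linalg.det(K), np.linalg.det(A) ** l * np.linalg.det(B) ** k))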
If the underlying spaces to be tensored are euclidean, then the same is
true of the tensor product space. For example, if V1 and V2 are euclidean,
then the assignment
\[ (x \otimes y\,|\,x_1 \otimes y_1) = (x|x_1)(y|y_1) \]
can be extended to a scalar product on V_1 ⊗ V_2. Note that the latter is defined
so as to ensure that if (xi ) is an orthonormal basis for V1 and (yj ) one for
V2 , then (xi ⊗ yj ) is also orthonormal. In this context, we have the natural
formula
(f ⊗ g)t = f t ⊗ g t
relating the adjoint of f ⊗ g with those of f and g. This easily implies that
the tensor product of two self-adjoint mappings is itself self-adjoint. The
same holds for normal mappings. Also we have the formula
(f ⊗ g)† = f † ⊗ g †
for the Moore-Penrose inverse of a tensor product.

Example: Consider the matrix
\[ A = \begin{pmatrix} 1 & 2 & 0 & 0 & 1 & 2 \\ 2 & 1 & 0 & 0 & 2 & 1 \\ 1 & 2 & 3 & 6 & -1 & -2 \\ 2 & 1 & 6 & 3 & -2 & -1 \end{pmatrix}. \]
Then its Moore-Penrose inverse can be easily calculated by noting that it is
the Kronecker product of the matrices
\[ \begin{pmatrix} 1 & 0 & 1 \\ 1 & 3 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \]

whose Moore-Penrose inverses can be readily computed.
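A sketch of this computation in Python (numpy), using the identity stated above that the Moore-Penrose inverse of a Kronecker product is the Kronecker product of the Moore-Penrose inverses:

import numpy as np

P = np.array([[1.0, 0.0, 1.0],
              [1.0, 3.0, -1.0]])
Q = np.array([[1.0, 2.0],
              [2.0, 1.0]])
A = np.kron(P, Q)                     # the 4 x 6 matrix of the example

pinv_direct = np.linalg.pinv(A)
pinv_kron   = np.kron(np.linalg.pinv(P), np.linalg.pinv(Q))

print(np.allclose(pinv_direct, pinv_kron))    # True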
Tensor products can be used to solve certain types of matrix equation
and we continue this section with some remarks on this theme. Firstly we
note that if p is a polynomial in two variables, say
\[ p(s, t) = \sum_{i,j} a_{ij}\, s^i t^j, \]
and A and B are n × n matrices, then we can define a new operator p(A, B)
by means of the formula
\[ p(A, B) = \sum_{i,j} a_{ij}\, A^i \otimes B^j. \]
(Warning: this is not the matrix obtained by substituting A and B for s and
t resp.; the latter is an n × n matrix whereas the matrix above is n^2 × n^2.)
The result which we shall require is the following:
Proposition 51 The eigenvalues of the above matrix are the scalars of the
form p(λ_i, μ_j) where λ_1, ..., λ_n are the eigenvalues of A and μ_1, ..., μ_n are
those of B.

This is proved by using a basis for which A has Jordan form. The required
matrix then has an upper triangular block form with diagonal blocks of
type p(λ_i, B), i.e. matrices which are obtained by substituting an eigenvalue
λ_i of A for s and the matrix B for t. Such a block has eigenvalues p(λ_i, μ_j)
(j = 1, ..., n), from which the result follows.
The basis of our application of tensor products is the following simple
remark. The space M_{m,n} of m × n matrices is of course identifiable as a
vector space with R^{mn}. In the following we shall do this systematically by
associating to an m × n matrix X = [X_1 ... X_n] (with columns X_1, ..., X_n) the
column vector
\[ \tilde X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \]
i.e. we place the columns of X on top of each other. The property that we
shall require in order to deal with matrix equations is the following:
Proposition 52 Let A be an m × m matrix, X an m × n matrix and B an
n × n matrix. Then we have the equation
\[ \widetilde{AXB} = (B^t \otimes A)\tilde X. \]

Proof. If X = [X_1 ... X_n] then the j-th column of AXB is \sum_{k=1}^n (b_{kj}A)X_k
and this is
\[ [\,b_{1j}A \;\; b_{2j}A \;\; \ldots \;\; b_{nj}A\,]\, \tilde X, \]
which implies the result.
The following special cases will be useful below:
\[ \widetilde{AX} = (I_n \otimes A)\tilde X; \qquad \widetilde{XB} = (B^t \otimes I_m)\tilde X; \qquad \widetilde{AX + XB} = \big((I_n \otimes A) + (B^t \otimes I_m)\big)\tilde X. \]
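A quick numerical check of the identity of Proposition 52 in Python (numpy); stacking the columns corresponds to flattening in column-major ('F') order.

import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 2
A = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))

def vec(M):
    # Stack the columns of M on top of each other.
    return M.flatten(order="F")

print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))          # True
print(np.allclose(vec(A @ X + X @ B),
                  (np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))) @ vec(X)))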

The above results can be used to tackle equations of the form
\[ A_1 X B_1 + \cdots + A_r X B_r = C. \]
Here the A's are given m × m matrices, the B's are n × n and C is m × n;
X is the unknown. Using the above apparatus, we can rewrite the equation
in the form G\tilde X = \tilde C where G = \sum_{j=1}^r B_j^t \otimes A_j.
Rather than consider the most general case, we shall confine our attention
to one special case which often occurs in applications, namely the equation

AX + XB = C.

In this case, the matrix G is

In ⊗ A + B t ⊗ Im .

This is just p(A, B t ) where p(s, t) = s+t. Hence by our preparatory remarks,
the eigenvalues of G are the scalars of the form λi + µj where the λ’s are the
eigenvalues of A and the µj are those of B.
Hence we have proved the following result:

Proposition 53 The equation AX + XB = C has a solution for any C if
and only if λ_i + μ_j ≠ 0 for each pair λ_i and μ_j of eigenvalues of A and B
respectively.

For the above condition means that 0 is not an eigenvalue of G, i.e. this
matrix is invertible.
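A minimal sketch in Python (numpy) of solving AX + XB = C by exactly this route, with illustrative random matrices; scipy.linalg.solve_sylvester solves the same equation and could be used as a cross-check.

import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

# G = I_n ⊗ A + B^t ⊗ I_m acts on the stacked columns of X.
G = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
x = np.linalg.solve(G, C.flatten(order="F"))
X = x.reshape((m, n), order="F")

print(np.allclose(A @ X + X @ B, C))    # True generically; solving fails only
                                        # when some eigenvalue sum λ_i + μ_j is 0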
We can also get information for the case where the above general equation
is not always solvable. Consider, for example, the equation

AXB = C

where we are not assuming that A and B are invertible. Suppose that S and
T are generalised inverses for A and B respectively. Then it is clear that
S ⊗ T is a generalised inverse for A ⊗ B (and, similarly, T^t ⊗ S is one for
B^t ⊗ A). If we rewrite the equation AXB = C in the form
\[ (B^t \otimes A)\tilde X = \tilde C, \]
then it follows from the general theory of such inverses that it has a solution
if and only if we have the equality
\[ ASCTB = C. \]
The general solution is then
\[ X = SCT + Y - SAYBT \]
where Y is arbitrary.
In general, one would choose the Moore-Penrose inverses A† and B † for
S and T . This gives the solution

X = A† CB †

which is best possible in the usual sense.

Exercises: 1) Let x1 , . . . , xn be elements of V . Show that

x1 ⊗ · · · ⊗ xn = 0

if and only if at least one x_i vanishes. If x, y, x', y' are non-vanishing elements
of V, show that x ⊗ y = x' ⊗ y' if and only if there is a λ ∈ R so that x = λx'
and y = (1/λ) y'.
2) Let V1 and V2 be vector spaces and denote by φ the natural bilinear
mapping (x, y) 7→ x⊗y from V1 ×V2 into V1 ⊗V2 . Show that for every bilinear
mapping T : V1 × V2 → W , there is a unique linear mapping f : V1 ⊗ V2 → W
so that T = f ◦ φ.
Show that this property characterises the tensor product i.e. if U is a
vector space and ψ is a bilinear mapping from V1 × V2 into U so that the
natural analogue of the above property holds, then there is an isomorphism
g from U onto V1 ⊗ V2 so that g ◦ ψ = φ.
3) Let (x_i) resp. (x'_i) be bases for the vector space V and suppose that
A = [a_{ij}] is the corresponding transfer matrix. Show that if a p-tensor z has
the coordinate representation
\[ z = \sum_{1 \le i_1, \ldots, i_p \le n} t_{i_1 \ldots i_p}\, f_{i_1} \otimes \cdots \otimes f_{i_p} \]

with respect to the first basis and
\[ z = \sum_{1 \le i_1, \ldots, i_p \le n} t'_{i_1 \ldots i_p}\, f'_{i_1} \otimes \cdots \otimes f'_{i_p} \]
with respect to the second, then
\[ t'_{i_1 \ldots i_p} = \sum_{1 \le j_1, \ldots, j_p \le n} a_{i_1 j_1} \cdots a_{i_p j_p}\, t_{j_1 \ldots j_p}. \]

4) Show that the tensor product

Pol (m) ⊗ Pol (n)

is naturally isomorphic to the vector space of polynomials in two variables, of


degree at most m in the first and at most n in the second.
5) What is the trace resp. the determinant of the linear mapping
\[ f \mapsto φ \circ f \circ ψ \]
on L(V), where φ and ψ are fixed members of L(V)?


6) Consider the linear mapping
\[ \Phi : f \mapsto f^t \]
on L(V) where V is Hermitian. Calculate the matrix of Φ with respect to the
usual basis for L(V) derived from an orthonormal basis for V. Is Φ normal
resp. unitary with respect to the scalar product
\[ (f|g) = \operatorname{tr}(g^t f)? \]
Calculate the characteristic polynomial resp. the eigenvalues of Φ.


7) Let p and q be polynomials and A and B be matrices with p and q as
characteristic polynomials (for example, the companion matrices of p and q).
Show that the number Δ = det(A ⊗ I − I ⊗ B) is a resultant of p and q, i.e.
has the property that it vanishes if and only if p and q have a common zero.
8) Consider the mapping
X 7→ AXAt
on the vector space of the n × n symmetric matrices, where A is a given n × n
matrix. Show that the determinant of this mapping is (det A)n+1 .

