Lalg 2
Lalg 2
J. B. Cooper
Johannes Kepler Universität Linz
Contents
1 DETERMINANTS 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Existence of the determinant
and how to calculate it . . . . . . . . . . . . . . . . . . . . . 5
1.3 Further properties of the determinant . . . . . . . . . . . . . . 10
1.4 Applications of the determinant . . . . . . . . . . . . . . . . . 20
3 EIGENVALUES 40
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Characteristic polynomials and diagonalisation . . . . . . . . . 44
3.3 The Jordan canonical form . . . . . . . . . . . . . . . . . . . . 50
3.4 Functions of matrices and operators . . . . . . . . . . . . . . 63
3.5 Circulants and geometry . . . . . . . . . . . . . . . . . . . . . 71
3.6 The group inverse and the Drazin inverse . . . . . . . . . . . 74
1
4.9 Positive definite matrices . . . . . . . . . . . . . . . . . . . . 119
2
1 DETERMINANTS
1.1 Introduction
In this chapter we treat one of the most important themes of linear algebra—
that of the determinant. We begin with some remarks which will motivate
the formal definition:
I. Recall that the system
ax + by = e
cx + dy = f
corresponding to the unknown by the column vector on the right hand side).
Earlier, we displayed a similar formula for the solution of a system of
three equations in three unknowns. It is therefore natural to ask whether
we can define a function det on the space Mn of n × n matrices so that the
solution of the equation AX = Y is, under suitable conditions, given by the
formula
det Ai
xi =
det A
3
where Ai is the matrix that we obtain by replacing the i-th column of A by
Y i.e.
a11 a12 . . . a1,i−1 y1 a1,i+1 . . . a1n
Ai = ... ..
. .
an1 an2 . . . an,i−1 yn an,i+1 . . . ann
II. Recall that
a b
ad − bc = det
c d
is the area of the parallelogram spanned by the vectors (a, c) and (b, d). Now
if f is the corresponding linear mapping on R2 , this is just the image of the
standard unit square (i.e. the square with vertices (0, 0), (1, 0), (0, 1), (1, 1))
under f . The natural generalisation would be to define the determinant of
an n × n matrix to be the n-dimensional volume of the image of the standard
hypercube in Rn under the linear mapping induced by the matrix. Although
we do not intend to give a rigorous treatment of the volume concept in higher
dimensional spaces, it is geometrically clear that it should have the following
properties:
a) the volume of the standard hypercube is 1. This means that the determi-
nant of the unit matrix is 1;
b) the volume depends linearly on the length of a fixed side. This means
that the function det is linear in each column i.e.
and
det[A1 . . . λAi . . . An ] = λ det[A1 . . . An ].
c) The volume of a degenerate parallelopiped is zero. This means that if two
columns of the matrix coincide, then its determinant vanishes.
(Note that the volume referred to here can take on negative values—
depending on the orientation of the parallelopiped).
4
1.2 Existence of the determinant
and how to calculate it
We shall now proceed to show that a function with the above properties
exists. In fact it will be more convenient to demand the analogous properties
for the rows i.e. we shall construct, for each n, a function
det : Mn → R
Before we prove the existence of such a function, we shall derive some further
properties which are a consequence of d1) - d3):
d4) if we add a multiple of one row to another one, the value of the determi-
nant remains unaltered i.e.
A1 A1
.. ..
. .
Ai Ai + Aj
. ..
det
.
. = det
. ;
A Aj
j
. .
.. ..
An An
5
d5) if we interchange two rows of a matrix, then we alter the sign of the
determinant i.e.
A1 A1
.. ..
. .
Ai Aj
. ..
det . = − det
.
. .
A Ai
j
. ..
.. .
An An
d6) if one row of A is a linear combination of the others, then det A = 0.
Hence if r(A) < n (i.e. if A is not invertible), then det A = 0.
Proof. d4)
A1 A1 A1 A1
.. .. .. ..
. . . .
Ai + Aj Ai Aj Ai
..
det . = det ... + det ... = det ...
Aj
Aj Aj Aj
.. . . .
. .. .. ..
An An An An
by d3).
d5)
A1 A1 A1 A1
.. .. .. ..
. . . .
Ai Ai + Aj Ai + Aj Aj
. .. ..
det . = det
. . det . = − det ... .
A Aj −Ai A
j i
. .. .. .
.. . . ..
An An An An
d6) Suppose that Ai = λ1 A1 + · · · + λi−1 Ai−1 . Then
A1 A1
.. ..
. .
A A
det i−1 = det i−1
=0
Ai (λ1 A1 + . . . λi−1 Ai−1 )
. ..
.. .
An An
6
since if we expand the expression by using the linearity in the i-th row we
obtain a sum of multiples of determinants each of which has two identical
rows and these vanish.
Note the fact that with this information we are able to calculate the
determinant of a given matrix, despite the fact that it has not yet been
defined! We simply reduce the matrix A to Hermitian form à by using
elementary transformations. At each step the above rules tell us the effect
on the determinant. If there is a zero on the diagonal of à (i.e. if r(A) < n),
then det A = 0 by d6) above. If not, we can continue to reduce the matrix to
the unit matrix by further row operations and so calculate its determinant.
In fact, a little reflection shows that most of these calculations are superfluous
and that it suffices to reduce the matrix to upper triangular form since the
determinant of the latter is the product of its diagonal elements.
We illustrate this by “calculating” the determinant of the 3 × 3 matrix
0 2 3
1 2 1 .
2 −3 2
We have
0 2 3 1 2 1
det 1 2 1 = − det 0 2 3
2 −3 2 2 −3 2
1 2 1
= − det 0 2 3
0 −7 0
1 2 1
= −2 det 0 1 23
0 −7 0
1 2 1
= −2 det 0 1 23
0 0 21
2
= −21.
In fact, what the above informal argument actually proves is the unique-
ness of the determinant function. This fact is often useful and we state it as
a Proposition.
7
The main result of this section is the fact that such a function does in fact
exist. The proof uses an induction argument on n. We already know that a
determinant function exists for n = 1, 2, 3. In order to motivate the following
proof note the formula
a11 (a22 a33 − a32 a23 ) − a21 (a12 a33 − a32 a13 ) + a31 (a12 a23 − a22 a13 )
where Ai1 is the (n−1) ×(n−1) matrix obtained by deleting the first column
and the i-throw of A (the induction hypothesis ensures that its determinant
is defined) and show that this function satisfies d1), d2) and d3). It is clear
that det In = 1. We verify the linearity in the k-th row as following. It
suffices to show that each term ai1 det Ai1 is linear in the k-th row. Now if
i 6= k a part of the k-th row of A is a row of Ai1 and so this term is linear by
the induction hypothesis. if i = k, then det Ai1 is independent of the k-throw
and ai1 depends linearly on it.
It now remains to show that det A = 0 whenever two rows of A are
identical, say the k-th and the l-th (with k < l). Consider the sum
n
X
(−1)i+1 ai1 det Ai1 .
i=1
then Ai1 has two identical rows (and so vanishes by the induction hypothesis)
except for the cases where j = k or j = l. This leaves the two terms
8
and they are equal in absolute value, but with opposite signs. (For ak1 = al1
and Ak1 is obtained from Al1 by moving one row (k − l − 1) places. This
can be achieved by the same number of row exchanges and so multiplies the
determinant by (−1)k−l−1 ).
The above proof yields the formula
n
X
det A = (−1)i+1 ai1 det Ai1
i=1
for the determinant which is called the development along the first col-
umn. Similarly, one can develop det A along the j-th column i.e. we have
the formula n
X
det A = (−1)i+j aij det Aij
i=1
An obvious induction argument shows that the determinant is a11 a22 . . . ann ,
the product of the diagonal elements. In particular, this holds for diagonal
matrices.
This provides a justification for the method for calculating the determi-
nant of a matrix by reducing it to triangular form by means of elementary
row operations. Note that for small matrices it is usually more convenient
to calculate the determinant directly from the formulae given earlier.
9
1.3 Further properties of the determinant
d7) if r(A) = n i.e. A is invertible, then det A 6= 0.
Proof. For then the Hermitian form of A has non-zero diagonal elements
and so the determinant of A is non-zero.
Combining d5) and d7) we have the following Proposition:
Shortly we shall see how the determinant can be used to give an explicit
formula for the inverse.
d8) The determinant is multiplicative i.e.
+ − + − ...
− + − + ...
..
.
10
Proof. We show that (adj A)A = (det A)I. Suppose that bik is the (i, k)-th
element of the product i.e.
n
X
bik = (−1)i+j ajk det Aji .
j=1
If i = k this is just the expansion of det A along the i-th column i.e. bii =
det A.
If i 6= k, it is the expansion of the determinant of the matrix obtained
from A by replacing the i-th column with the k-th one and so is 0 (since this
is a matrix with two identical columns and so of rank < n).
We have discussed the determinant function in terms of its properties
with regard to rows. Of course, it would have been just as logical to work
with columns and we now show that the result would have been the same.
To do this we introduce as a temporary notation the name Det for a function
on the n × n matrices with the following properties:
D1) Det In = 1;
D2) Det is linear in the columns;
D3) Det A = 0 whenever two columns of A coincide.
Of course we can prove the existence of such a function exactly as we did
for det (exchanging the word “column” for “row” everywhere). Even simpler,
we note that if we put
Det A = det At
then this will fulfill the required conditions.
All the properties of det carry over in the obvious way. In particular, there
is only one function with the above properties and we have the expansions
n
X
Det A = (−1)i+j aij Det Aij
j=1
along the i-th row. We shall now prove the following result:
Proposition 4 For each n × n matrix A, det A = Det A.
In other words, det A = det At and the notation “Det” is superfluous.
Again the proof is a typical application of the uniqueness. It suffices to
show that the function det satisfies conditions d1)-d3). Of course, we have
det I = 1. In order to prove the other two assertions, we use induction on
the order n and inspect the expansion
n
X
Det A = (−1)i+j aij Det Aij
j=1
11
which is clearly linear in aij (and so in the i-th row). By the induction
hypothesis, it is linear in the other rows (since each of the Aij are). To
complete the proof, we need only show that Det A vanishes if two rows of
A coincide. But then r(A) < n and so we have Det A = 0 by the column
analogue of property d6).
d11) One can often reduce the computations involved in calculating deter-
minants by using suitable block decompositions. For example, if A has the
decomposition
B C
0 D
where B and D are square matrices, then
12
Determinants of linear operators Since square matrices are the coordi-
nate versions of linear operators on a vector space V it is tempting to extend
the definition of determinants to such operators. The obvious way to do this
is to choose some basis (x1 , . . . , xn ) and to define the determinant det f of f
to be the determinant of the matrix of f with respect to this basis. We must
then verify that this value is independent of the choice of basis. But if A′ is
the matrix of f with respect to another basis, we know that
A′ = S −1 AS
Example: Calculate
6 0 2 0
4 0 0 2
det
0
1 2 0
2 0 2 2
We have
6 0 2 0
4 6 2 0
0 0 2
det
0 = − det 4 0 2
1 2 0
2 2 0
2 0 2 2
3 1 0
= −8 det 2 0 1
1 1 0
= −8(−3 + 1) = 16.
13
Example: Calculate
x 1 1 1
1 x 1 1
det
1
.
1 x 1
1 1 1 x
Solution: We have
x 1 1 1 0 1−x 1 − x2 1 − x2
1 x 1 1 0 −(1 − x) 0 1−x
det
1 1 x
= det
1 0 0 −(1 − x) 1 − x
1 1 1 x 1 1 1 x
2
1−x 1−x 1−x
= det −(1 − x) 0 1−x
0 −(1 − x) 1 − x
1 1 1+x
= (x − 1)3 det −1 0 1
0 −1 1
= (x − 1)3 (1 + x + 2) = (x − 1)3 (3 + x).
14
Example: Calculate
0 0 ... 0 1
0 0 ... 1 0
det .. .. .
. .
1 0 ... 0 0
We have
0 0 ... 0 1 0 ... 0 1
0 0 ... 1 0 0 ... 1 0
n
det .. .. = (−1) det .. ..
. . . .
1 0 ... 0 0 1 ... 0 0
where the left hand matrix is n×n and the right hand one is (n−1)×(n−1).
From this it follows that the value of the given determinant is
n(n−1)
(−1)n−1 (−1)n−2 . . . (−1)2−1 = (−1) 2 .
Example: Calculate
x a a ... a
a x a ... a
det .. .. .
. .
a a a ... x
15
Solution: Subtracting from each column x1 times the one on its left we see
that the determinant is equal to
1 0 ... 0
1 x2 − x1 . . . xn−2 (x2 − x1 )
2
det .. ..
. .
n−2
1 (xn − x1 ) . . . xn (xn − x1 )
which is equal to
1 x2 . . . x2n−2
..
(x2 − x1 )(x3 − x1 ) . . . (xn − x1 ) det ... .
n−2
1 xn . . . xn
n(n − 1)
(a product of terms). (In particular, this determinant is non-zero
2
if the xi are distinct).
16
vanish?
4) Evaluate the following determinants:
1 2 3 ... n
2 3 4 ... 1
det .. ..
. .
n 1 2 ... n−1
0 1 1 ... 1
−1 0 1 ... 1
det .. ..
. .
−1 −1 −1 ... 0
1 −a 0 ... 0
−b 1 −a ... 0
... 0
det 0 −b 1
.. ..
. .
0 0 ... −b 1
λ1 1 0 ... 0
−1 λ2 1 ... 0
det .. ..
. .
0 0 ... −1 λn
λ1 a a ... a
b λ2 a ... a
det .. ..
. .
b b b . . . λn
λ a 0 ... 0
a λ a ... 0
det .. .. .
. .
0 0 ... a λ
5) Show that if P is a projection, then the dimension k of the range of P is
determined by the equation
2k = det(I + P ).
Use this to show that if t 7→ P (t) is a continuous mapping from R into the
family of all projections on Rn , then the dimension of the range of P (t) is
constant.
17
6) Show that if A (resp. B) is an m × n matrix (resp. an n × m matrix),
then Im + AB is invertible if and only if In + BA is.
7) Let A and B be n × n matrices. Show that
• adj (AB) = adj A · adj B;
18
Which well-known result of analysis follows?
14) Let
x1 = rc1 c2 . . . cn−2 cn−1
x2 = rc1 . . . cn−2 sn−1
..
.
xj = rc1 . . . cn−j sn−j+1
..
.
xn = rs1 .
where ci = cos θi , si = sin θi (these are the equations of the transformation to
polar coordinates in n dimensions). Calculate the determinant of the Jacobi
matrix
∂(x1 , x2 , . . . , xn )
.
∂(r, θ1 , . . . , θn−1 )
15) Consider the Vandermonde matrix
1 ... 1
t1 . . . tn
Vn = .. .. .
. .
n−1 n−1
t1 . . . tn
Show that Vn Vnt is the matrix
s0 s1 . . . sn−1
s1 s2 . . . sn
.. ..
. .
sn−1 sn . . . s2n−2
Pn k
where sk = i=1 ti .
Use this to calculate the determinant of this matrix.
B C
16) Suppose that the square matrix A has a block representation
D E
where B is square and non-singular. Show that
det A = det B det(E − DB −1 C).
Deduce that if D is also square and commutes with B, then det A = det(BE−
DC).
17) Suppose that A0 , . . . , Ar are complex n × n matrices and consider the
matrix function
p(t) = A0 + A1 t + · · · + Ar tr .
Show that if det p(t) is constant, then so is p(t) (i.e. A0 is the only non-
vanishing term).
19
1.4 Applications of the determinant
We conclude this chapter by listing briefly some applications of the determi-
nant:
20
is the area of the triangle ABC. The area is positive if the direction A →
B → C is clockwise, otherwise it is negative. (By taking the signed area we
assure that it is additive i.e. that
regardless of the position of O with respect to the triangle (see figure ??).
If A = (ξ1 , ξ2 , ξ3 ), B = (η1 , η2 , η3 ), C = (ζ1 , ζ2 , ζ3 ), D = (υ1 , υ2, υ3 ) are
points in space, then
ξ1 ξ2 ξ3 1
1 η1 η2 η3 1
det ζ1 ζ2 ζ3 1
3!
υ1 υ2 υ3 1
is the volume of the tetrahedron ABCD.
Of course, analogous formulae hold in higher dimensions.
where B is the point (ξ1 , ξ2, ξ3 ) etc. and that of the image is
1 1
ξ1 ξ2 ξ31 1
1 η11 η21 η31 1
det ζ11 ζ21
.
3! ζ31 1
υ1 υ2 υ3 1
Now we have
t
ξ11 ξ21 ξ31 1 ξ11 ξ21 ξ31 1 A 0
1 η11 η21 η31 1 η11 η21 η31 1
det = 1 det
3! ζ11 ζ21 ζ31 1 3! ζ11 ζ21 ζ31 1
υ1 υ2 υ3 1 υ11 υ21 υ31 1 0 1
21
and so we have that the volume of B1 C1 D1 E1 is det A times the volume of
BCDE. It follows from a limiting argument that the same formula holds for
arbitrary figures. (This justifies the original geometrical motivation for the
existence and properties of the determinant).
Once again, an analogous result holds in higher dimensions.
V. The equations of curves: If P = (ξ11 , ξ21) and Q = (ξ12 , ξ22) are distinct
points in the plane, then the line L through P and Q has equation
ξ1 ξ2 1
det ξ11 ξ21 1 = 0.
ξ12 ξ22 1
For if the equation of the line has the form aξ1 + bξ2 + c = 0, then we have
aξ11 + bξ21 + c = 0
aξ12 + bξ22 + c = 0.
This means that the above three homogeneous equations (in the variables
a, b, c) has a non-trivial solution. As we know, this is equivalent to the
vanishing of the above determinant.
In exactly the same way one shows that the plane through (ξ11 , ξ21 , ξ31),
(ξ12 , ξ22 , ξ32) and (ξ13 , ξ23 , ξ33) has equation
ξ1 ξ2 ξ3 1
ξ11 ξ21 ξ31 1
det
ξ12 ξ22 ξ32 1 = 0.
ξ13 ξ23 ξ33 1
The circle through (ξ11 , ξ22), (ξ12 , ξ22) and (ξ13, ξ23 ) has equation:
(ξ1 )2 + (ξ2 )2 ξ1 ξ2 1
(ξ11 )2 + (ξ21 )2 ξ11 ξ21 1
det (ξ12 )2 + (ξ22 )2
= 0.
ξ12 ξ22 1
(ξ13 )2 + (ξ23 )2 ξ13 ξ23 1
and this fails to vanish precisely when the points are non-collinear).
22
VI. Orientation: A linear isomorphism f on a vector space V is said
to preserve orientation if its determinant is positive—otherwise it reverses
orientation. This concept is particularly important for isometries and those
which preserve orientation are called proper. Thus the only proper isome-
tries of the plane are translations and rotations.
Two bases (x1 , . . . , xn ) and (x′1 , . . . , x′n ) have the same orientation if
the linear mapping which maps xi onto x′i for each i preserves orientation.
This just means that the transfer matrix from (xi ) to (x′j ) has positive de-
terminant. For instance, in R3 , (e1 , e2 , e3 ) and (e3 , e1 , e2 ) have the same
orientation, whereas that of (e2 , e1 , e3 ) is different.
Example Is
cos α cos β sin α cos β − sin β
cos α sin β sin α sin β cos β
− sin α cos α 0
the matrix of a rotation?
Solution: Firstly the columns are orthonormal and so the matrix induces an
isometry. but the determinant is
(m + 1)x + y + z = 2−m
x + (m + 1)y + z = −2
x + y + (m + 1)z = m.
23
Exercises: 1) Show that the centre of the circle through the points (ξ11 , ξ22),
(ξ12 , ξ22 ) and (ξ13 , ξ23) has coordinates
1 2 1 2
(ξ1 ) + (ξ21 )2 ξ21 1 (ξ1 ) + (ξ21 )2 ξ11 1
( 21 det (ξ12)2 + (ξ22 )2 ξ22 1 , 21 det (ξ12 )2 + (ξ22 )2 ξ12 1 )
(ξ13)2 + (ξ23 )2 ξ23 1 (ξ 3 )2 + (ξ23 )2 ξ13 1
1 2 1 .
ξ1 ξ2 1
det ξ12 ξ22 1
ξ13 ξ23 1
p(t) = a0 + · · · + am tm
q(t) = b0 + · · · + bn tn
be polynomials whose leading coefficients are non-zero. Show that they have
a common root if and only if the determinant of the (m+ n) ×(m+ n) matrix
am am−1 . . . a1 a0 0 ... 0
0 am . . . a2 a1 a0 . . . 0
. ..
.
. .
A= 0 0 . . . 0 am am−1 . . . a0
bn bn−1 . . . b1 b0 0 ... 0
. ..
.. .
0 0 . . . bn ... b0
24
is non-zero. (This is known as Sylvester’s criterium for the existence of a
common root). In order to prove it calculate the determinants of the matrices
B and BA where B is the (m + n) × (m + n) matrix
tn+m−1 0 0 ... 0
tn+m−2 1 0 ... 0
tn+m−3 0 1 ... 0
.. ..
. .
.
t 0 0 ... 0
1 0 0 ... 0
.. ..
. .
1 ... 1
25
2 COMPLEX NUMBERS AND COMPLEX
VECTOR SPACES
2.1 The construction of C
When we discuss the eigenvalue problem in the next chapter, it will be con-
venient to consider complex vector spaces i.e. those for which the complex
numbers play the role taken by the reals in the third chapter. We therefore
bring a short introduction to the theme of complex numbers.
Complex numbers were stumbled on by the renaissance mathematician
Cardano in the famous formulae
√ p √ p √ p
λ1 = 3 α + 3 β λ2 = ω 3 α + ω 2 3 β λ3 = ω 2 3 α + ω 3 β
p p √
−q + q 2 + 4p3 −q − q 2 + 4p3 −1 + 3i
where α = , β = , ω = for the
2 2 2
roots λ1 ,λ2 and λ3 of the cubic equation
x3 + 3px = q = 0
26
addition: (x, y) + (x1 , y1) = (x + x1 , y + y1 )’
multiplication: (x, y) · (x1 , y1 ) = (xx1 − yy1, xy1 + x1 y).
Note that these correspond precisely to the expressions obtained by formally
adding and multiplying x + iy and x1 + iy1 .
This leads to the following definition: a complex number is an ordered
pair (x, y) of real numbers. On the set of such numbers we define addition
and multiplication by the above formulae. We use the following conventions:
• 1) i denotes the complex number (0, 1) and we identity the real number
x with the complex number (x, 0). Then i2 = −1 since
(0, 1) · (0, 1) = (−1, 0).
Every complex number (x, y) has a unique representation x = iy where
x, y ∈ R. (It is customary to use letters such as z, w, . . . for complex
numbers). If z = x + iy (x, y ∈ R), then x is called the real part of z
(written ℜz) and y is called the imaginary part (written ℑz).
• 2) If z = x + iy, we denote the complex number x − iy (i.e. the mirror
image of z in the x-axis) by z̄—the complex-conjugate of z. Then
the following simple relations holds:
z + z1 = z̄z1 ;
zz1 = z̄ · z1 ;
1
ℜz = (z + z̄);
2
1
ℑz = (z − z̄);
2i p
z · z̄ = |z|2 where |z| = x2 + y 2 .
|z| is called the modulus or absolute value of z. It is multiplicative
in the sense that |zz1 | = |z||z1 |.
• 3) every non-zero complex number z has a unique representation of the
form
ρ(cos θ + i sin θ)
where ρ > 0 and θ ∈ [0, 2π[. Here ρ = |z| and θ is the unique real
x y
number in [0, 2π[ so that cos θ = , sin θ = .
ρ ρ
We denote the set of complex numbers by C. Of course, as a set, it is
identical with R2 and we use the notation C partly for historical reasons and
partly to emphasis the fact that we are considering it not just as a vector
space but also with its multiplicative structure.
27
Proposition 5 For complex numbers z, z1 , z2 , z3 we have the relationships
• z1 + z2 = z2 + z1 ;
• z1 + (z2 + z3 ) = (z1 + z2 ) + z3 ;;
• z1 = 0 = 0 + z1 = z1 ’
• z1 (z2 + z3 ) = z1 z2 + z1 z3 ;
• z1 z2 = z2 z1 ;
• z1 · 1 = 1 · z1 = z1 ;
z̄
• if z 6= 0, there is an element z −1 so that z · z −1 = 1 (take z −1 = ).
|z|2
This result will be of some importance for us since in our treatment of linear
equations, determinants, vector spaces and so on, the only properties of the
real numbers that we have used are those which correspond to the above
list. Hence the bulk of our definitions, results and proofs can be carried over
almost verbatim to the complex case and, with this justification, we shall use
the complex versions of results which we have proved only for the real case
without further comment.
It is customary to call a set with multiplication and addition operations
with such properties a field. A further example of a field is the set Q of
rational numbers.
This is derived by multiplying out the left hand side and using the addition
formulae for the trigonometric functions.
This equation can be interpreted geometrically as follows: multiplication
by the complex number z = ρ(cos θ + i sin θ) has the effect of rotating a
second complex number through an angle of θ and multiplying its absolute
value by ρ (of course this is one of the similarities considered in the second
28
chapter—in fact, a rotary dilation). As a Corollary of the above formula we
have the famous result
1 + cos θ + · · · + cos nθ
and
sin θ + · · · + sin nθ.
Solution: The first part is proved exactly as in the case of the partial sums
of a real geometric series. If we set z = cos θ + i sin θ and take the real part,
we get
1 − cos(n + 1)θ − i sin(n + 1)θ
1 + cos θ + · · · + cos nθ = ℜ
1 − cos θ − i sin θ
which simplifies to the required formula (we leave the details to the reader).
The sine part is calculated with the aid of the imaginary part. Example:
Describe the geometric form of the set
29
Solution Substituting z = x + iy we get
1 + r cos θ + r 2 cos 2θ + . . .
and
r sin θ + r 2 sin 2θ + . . .
6) Show that the points z1 , z2 and z3 in the complex plane are the vertices
of an equilateral triangle if and only if
z1 + ωz2 + ω 2 z3 = 0
or
z1 + ω 2 z2 + ωz3 = 0
30
2πi
where ω = e 3 .
If z1 , z2 , z3 , z4 are four complex numbers, what is the geometrical signifi-
cance of the condition
Show that Q satisfies all of the axioms of a field with the exception of the
commutativity of multiplication (such structures are called skew fields).
Show that if we put i = (i, 0), j = (0, i), k = (0, 1), then ij = −ji = k,
jk = −kj = i etc. and i2 = j2 = k2 = −1. Also every element of Q has a
unique representation of the form
ξ0 + (ξ1 i + ξ2 j + ξ3 k)
31
2.2 Polynomials
The field of complex numbers has one significant advantage over the real
field. All polynomials have roots. This result will be very useful in the next
chapter—it is known as the fundamental theorem of algebra and can
be stated in the following form:
Proposition 6 Let
There is no simple algebraic proof of this result which we shall take for
granted.
The fundamental theorem has the following Corollary on the factorisation
of real polynomials.
Corollar 1 Let
p(t) = a0 + · · · + an−1 tn−1 + tn
be a polynomial with real coefficients. Then there are real numbers
t1 , . . . , tr , α1 , . . . , αs , β1 , . . . , βs
where r + 2s = n so that
p(λ) = a0 + a1 λ + · · · + λn .
Since p(λ) = p(λ̄) (the coefficients being real), we see that a complex number
λ is a root if and only if its complex conjugate is also one. Hence we can list
the roots of p as follows: firstly the real ones
t1 , . . . , tr
32
Then we see that p has the required form by multiplying out the correspond-
ing linear and quadratic terms.
The next result concerns the representation of rational functions. These
p
are functions of the form where p and q are polynomials. By long division
q
every such function can be expressed as the sum of a polynomial and a
ildep
rational function where the degree d(p̃) of p̃ (i.e. the index of its highest
q
power) is strictly less than that of q. Hence from now on we shall tacitly
assume that this condition is satisfied. Further it is no loss of generality to
suppose that the leading coefficient of q is “1”.
We consider first the case where q has simple zeros i.e.
q(λ) = (λ − λ1 ) . . . (λ − λn )
where the λi are distinct. Then we claim that there are uniquely determined
complex numbers a1 , . . . , an so that
p(λ) a1 an
= + ...
q(λ) λ − λ1 λ − λN
for λ ∈ C \ {λ1 , . . . , λn }.
Proof. This is equivalent to the equation
n
X
p(λ) = ai qi (λ)
i=1
q(λ)
where qi (λ) = λ−λ i
. If this holds for all λ as above then it holds for all λ in
C since both sides are polynomials. Substituting successively λ1 ,λ2 , . . . , λn
in the equation we see that
p(λ1 )
a1 =
(λ1 − λ2 ) . . . (λ1 − λn )
..
.
p(λn )
an =
(λ2 − λn ) . . . (λn − (λn−1
The general result (i.e. where q has multiple zeros) is more complicated to
state. We suppose that
q(λ) = (λ − λ1 )n−1 . . . (λ − λr )nr
where the λi are distinct and claim that the rational function can be written
1
as a linear combination of functions of the form (λ−λ i)
j for 1 ≤ i ≤ r and
1 ≤ j ≤ ni .
33
Proof. Write
p(λ) p(λ)
=
q(λ) (λ − λ1 )n1 q1 (λ)
where q1 (λ) = (λ − λ2 )n2 . . . (λ − λr )nr . We claim that there is a polynomial
p1 with d(p1 ) = d(p) − 1 and an a ∈ C so that
p(λ) a p1 (λ)
= +
(λ − λ1 )n1 q1 (λ) (λ − λ1 )n1 (λ − λ1 )n1 −1 q1 (λ)
from which the proof follows by induction.
For the above equation is equivalent to the following one:
p(λ) − aq1 (λ) p1 (λ)
= .
q(λ) (λ − λ1 )n1 −1 q(λ)
Hence it suffices to choose a ∈ C so that p(λ) − aq1 (λ) contains a factor
p(λ1 )
λ − λ1 and there is precisely one such a namely a = .
q1 (λ)
We remark that the degree function satisfies the following properties:
d(p + q) ≤ max(d(p), d(q))
with equality if d(p) 6= d(q)) and
d(pq) = d(p) + d(q)
provided that p and q are non-zero.
The standard high school method for the division of polynomials can be
used to prove the existence part of the following result:
Proposition 7 Let p and q be polynomials with d(p) ≥ 1. Then there are
unique polynomials r, s so that
q = ps + r
where r = 0 or d(r) < d(p).
Proof. In the light of the above remark, we can confine ourselves to a proof
of the uniqueness: suppose that
q = ps + r = ps1 + r1
for suitable s, r, s1 , r1 . Then
p(s − s1 ) = r − r1 .
Now the right hand side is a polynomial of degree strictly less than that of p
and hence so is the left hand side. But this can only be the case if s = s1 .
34
The above division algorithm can be used to prove an analogue of the
Euclidean algorithm for determining the greatest common divisor of two
polynomials p, q. We say that for two such polynomials, q is a divisor of
p (written q | p) if there is a polynomial s so that p = qs. Note that then
d(p) ≥ d(q) (where d(p) denotes the degree of p). Hence if p | q and q | p, then
d(p) = d(q) and it follows that p is a non-zero constant times q (we are tacitly
assuming that the polynomials p and q are both non-zero). The greatest
common divisor of p and q is by definition a common divisor which has
as divisor each other divisor of p and q. It is then uniquely determined up
to a scalar multiple and we denote it by g.c.d. (p, q). It can be calculated as
follows: we suppose that d(q) ≤ d(p) and use the division algorithm to write
p = qs1 + r1
with r1 = 0 or d(r1 ) < d(q). In the first case, q is the greatest common
divisor. Otherwise we write
q = s2 r1 + r2
then
r1 = s3 r2 + r3
and continue until we reach a final equation rk = sk+2rk+1 without remainder.
Then rk−1 is the greatest common divisor and by substituting backwards
along the equations, we can compute a representation of it in the form mp+nq
for suitable polynomials m and n.
has the property that it takes on the value 1 at ti and 0 at the other tj . Then
n
X
p= ai pi
i=0
35
Exercises: 1) Show that a complex number λ0 is a root of order r of the
polynomial p (i.e. (λ − λ0 )r divides p) if and only if
p(λ) = a0 + ar (λ − λ0 )r + · · · + an (λ − λ0 )n
36
2.3 Complex vector spaces and matrices
We are now in a position to define complex vector spaces.
37
The theory of chapters I and V for matrices can then be carried over in the
obvious way to complex matrices.
Sometimes it is convenient to be able to pass between complex and real
vectors and this can be achieved as follows: if V is a complex vector space,
then we can regard it as a real vector space simply by ignoring the fact that
we can multiply by complex scalars. We denote this space by V R . This
notation may seem rather pedantic—but note that if the dimension of V is n
then that of VR is 2n. This reflects the fact that elements of V can be linearly
dependent in V without being so in VR since there are less possibilities for
building linear combinations in the latter. For example, the sequence
is necessary to attain a basis for the real space Cn which is thus 2n dimen-
sional.
On the other hand, it V is a real vector space we can define a correspond-
ing complex vector space VC as follows: as a set VC is V × V . It has the
natural addition and scalar multiplication is defined by the equation
The dimensions of V (as a real space) and VC (as a complex space) are the
same. If f : V → W is a linear mapping between complex vector space then
it is a fortiori a linear mapping from VR into WR . However, a real linear
mapping between the latter spaces need not be complex-linear. On the other
hand, if f : V → W is a linear mapping between real vector spaces, we can
extend it to a complex linear mapping fC between VC and WC by defining
(1 − i)x − 9y =0
2x + (1 − i)y = 1.
38
3) Show that if z1 , z2 , z3 are complex numbers, then
z1 z1 1
det z2 z2 1
z3 z3 1
and that
A −B
det = | det(A + iB)|2 .
B A
6) (The following exercise shows that complex 2 × 2 matrices can be used to
give a natural approach to the two products in R3 ). Consider the space M2C
of 2 × 2 complex matrices. If A is such a matrix, say
a11 a22
A=
alpha21 a22
39
is also in E3 and that
Ax ∗ Ay = (x|y)I2 + Ax×y .
x × y = −y × x;
kx × yk = kxkkyk sin θ where θ is the angle between x and y;
(x × y) × z = (x|z)y − (y|z)x;
(x × y) × z + (y × z) × x + (z × x) × y = 0.
40
3 EIGENVALUES
3.1 Introduction
In this chapter we discuss the so-called eigenvalue problem for operators
or matrices. This means that for a given operator f ∈ L(V ) a scalar λ and
a non-zero vector x are sought so that f (x) = λx (i.e. the vector x is not
rotated by f ). Such problems arise in many situations, some of which we
shall become acquainted with in the course of this chapter. In fact, if the
reader examines the discussion of conic sections in the plane and three di-
mensional space he will recognise that the main point in the proof was the
solution of an eigenvalue problem. The underlying theoretical reason for the
importance of eigenvalues is the following: we know that a matrix is the
coordinate representation of an operator. Even in the most elementary an-
alytic geometry one soon appreciates the advantage of choosing a basis for
which the matrix has a particularly simple form. The simplest possible form
is that of a diagonal matrix and the reader will observe that we obtain such
a representation precisely when the basis elements are so-called eigenvectors
of the operator f i.e. they satisfy the condition f (xi ) = λi xi for suitable
eigenvalues λ1 , . . . , λn (which then form the diagonal elements of the corre-
sponding matrix). Stated in terms of matrices this comprises what we may
call the diagonalisation problem: given an n × n matrix A can we find an
invertible matrix S so that S −1 AS is diagonal?
Amongst the advantages that such a diagonalisation brings is the fact that
one can then calculate simply and quickly arbitrary powers and thus poly-
nomial functions of a matrix by doing this for the diagonal matrix and then
transforming back. We shall discuss some applications of this below. On the
other hand, if A is the matrix of a linear mapping in Rn we can immediately
read off the geometrical form of the latter from its diagonalisation.
We begin with the formal definition. If f ∈ L(V ), an eigenvalue of f
is a scalar λ so that there exists a non-zero x with f (x) = λx. The space
Ker (f − λId) is then non-trivial and is called the eigenspace of λ and each
non-zero element therein is called an eigenvector. Our main concern in
this chapter will be the following: given an operator f , can we find a basis
for V consisting of eigenvectors? In general the answer is no as very simple
examples show but we shall obtain a result which, while being much less
direct, is still useful in theory and applications.
We can restate the eigenvalue problem in terms of matrices: an eigenvalue
resp. eigenvector for an n × n matrix A is an eigenvalue resp. eigenvector for
the operator fA i.e. λ is an eigenvalue if and only if there exists a non-zero
column vector X so that AX = λX and X is then called an eigenvector.
41
Before beginning a systematic development of the theory, we consider a
simple example where an eigenvalue problem arises naturally—in this case
in the solution of a linear system of ordinary differential equations:
42
Now we know that such a solution exists if and only if the determinant of
the corresponding matrix vanishes. This leads to the equation
(3 − λ)(2 − λ) − 2 = 0
43
3.2 Characteristic polynomials and diagonalisation
The above indicates the following method for characterising eigenvalues:
Proposition 8 If A is an n × n matrix, then λ is an eigenvalue of A if and
only if λ is a root of the equation
det(A − λI) = 0.
then the eigenvalues of A are just the diagonal elements a11 , . . . , ann (in
particular, this holds if A is diagonal). For
44
and the eigenvalues are the roots of this polynomial.
We now turn to the topic of the diagonalisation problem. The connection
with the eigenvalue problem is made explicit in the following result:
Proposition 9 A linear operator f ∈ L(V ) is diagonalisable if and only if
V has a basis (x1 , . . . , xn ) consisting of eigenvectors of f .
Proposition 10 If an n×n matrix A has n linearly independent eigenvectors
X1 , . . . , Xn and S is the matrix [X1 . . . Xn ], then S diagonalises A i.e.
S −1 AS = diag (λ1 , . . . , λn )
where the λi are the corresponding eigenvalues.
Proof. If the matrix of f with respect to the basis (x1 , . . . , xn ) is the
diagonal matrix
λ1 0 . . . 0
0 λ2 . . . 0
.. .. ,
. .
0 0 . . . λn
then f (xi ) = λi xi i.e. each xi is an eigenvector. Conversely, if (xi ) is a basis
so that f (xi ) = λi xi for each i, then the matrix of f is as above. The second
result is simply the coordinate version.
As already mentioned, the example of a rotation in R2 shows that the
condition of the above theorems need not always hold. The problem is that
the matrices of rotations (with the trivial exceptions Dπ and D0 ) have no
(real) eigenvalues. There is no problem if the operator does have n distinct
eigenvalues, as the next result shows:
Proposition 11 Let f ∈ L(V ) be a linear operator in an n dimensional
space and suppose that f has r distinct eigenvalues with eigenvectors x1 , . . . , xr .
Then {x1 , . . . , xr } is linearly independent. Hence if f has n distinct eigen-
values, it is diagonalisable.
Proof. If the xi are linearly dependent, there is a smallest s so that xs is
linearly dependent on x1 , . . . , xs−1 , say xs = µ1 x1 + . . . µs−1 xs−1 . If we apply
f to both sides and then subtract λs times the original equation, we get:
0 = µ1 (λ1 − λs )x1 + · · · + µs−1 (λs−1 − λs )xs−1
and this implies that the x1 , . . . , xs−1 are linearly dependent which is a con-
tradiction.
Of course, it is not necessary for a matrix to have n distinct eigenvalues
in order for it to be diagonalisable, the simplest counterexample being the
unit matrix.
45
Estimates for eigenvalues For applications it is often useful to have esti-
mates for the eigenvalues of a given matrix, rather than their precise values.
In this section, we bring two such estimates, together with some applications.
Recall that if a matrix A is dominated by the diagonal in the sense that
for each i X
|aii | − |aij | > 0,
j6=i
then it is invertible (see Chapter IV). This can be used to give the following
estimate:
Proposition 12 Let A be a complex n×n matrix with eigenvalues λ1 , . . . , λn .
Put for each i X
αi = |aij |.
j6=i
Proof. It is clear that if λ does not lie in one of the above circular regions,
then the matrix (λI − A) is dominated by the diagonal in the above sense
and so is invertible i.e. λ is not an eigenvalue.
We can use this result to obtain a classical estimate for the zero of poly-
nomials. Consider the polynomial p which maps t onto a0 + a1 t + · · · +
an−1 tn−1 + tn . The roots of p coincide with the eigenvalues of the companion
matrix
0 1 0 ... 0
0 0 1 ... 0
C = .. ..
. .
−a0 −a1 −a2 . . . −an−1
(see Exercise 4) below).
It follows from the above criterium that if λ is a zero of p, then
Our second result shows that the eigenvalues of a small matrix cannot be
too large. More precisely, if A is an n × n matrix and a = maxi,j |aij |, then
each eigenvalue λ satisfies the inequality: |λ| ≤ na. For suppose that
ξ1
X = ...
ξn
46
is a corresponding eigenvector. Then we have
(AX|X) = λ(X|X)
i.e. X X
λ ξi ξi = aij ξi ξj .
i i,j
which implies the result. (In the last inequality, we use the Cauchy-Schwarz
P 1 P 1
inequality which implies that i |ξi | ≤ n 2 ( i |ξi |2 ) 2 . See the next chapter
for details).
We conclude this section with an application of the diagonalisation method:
47
to compute the powers of A. To do this directly would involve astronomical
computations. The task is simplified
√ by diagonalising
√ A. A simple calculation
1+ 5 1− 5
shows that A has eigenvalues and , with eigenvectors
2 2
1+√5 1−√5
2 and 2 .
1 1
Hence " √ #
1+ 5
0
S −1 AS = 2 √
1− 5
0 2
√
1+ 5
√
1− 5
where S = 2 .
2
1 1
From this it follows that
1+√5 1−√5 " √
1+ 5
#" √ #
1 0√ 1 − 1−√2 5
A= √ 2 2 2
1− 5
5 1 1 0 2
−1 1+2 5
and
√ √ " √
1+ 5
#n " √ #
1 1+ 5 1− 5 0 1 − 1−√2 5
An = √ 2 2 2 √
1− 5 1+ 5
5 1 1 0 2
−1 2
Solution:
1 −λ 1 ... 1
..
χA (λ) = det ... .
1 1 ... 1 − λ
= (n − λ)λn−1 (−1)n−1 .
by a result of the previous chapter.
48
2) Calculate the eigenvalues of the n × n matrix
0 1 0 ... 0
0 0 1 ... 0
A = .. .. .
. .
1 0 0 ... 0
Solution:
−λ 1 0 . . . 0
0 λ 1 ... 0
χA (λ) = .. ..
. .
1 0 0 ... λ
n−1
= (−1) (λn − 1).
2πir
Hence the eigenvalues are the roots e n of unity (r = 0, . . . , n − 1).
3) Calculate the eigenvalues of the linear mapping
a11 a12 a11 a21
7→
a21 a22 a12 a22
on M2 .
Solution: With respect to the basis
1 0 0 1 0 0 0 0
x1 = x2 = x3 = x4 =
0 0 0 0 1 0 0 1
the mapping has matrix
1 0 0 0
0 0 1 0
0 1 0 0
0 0 0 1
and this has eigenvalues 1, 1, 1, −1.
4) Show that fn+1 fn−1 −fn2 = (−1)n−1 where fn is the n-th Fibonacci number.
Solution: Note the
2 fn+1 fn
fn+1 fn−1 − fn = det
fn fn−1
n−1
1 1 f2 f1
= det
1 0 f1 f0
1 1 n−1 2 1
= (det ) det
1 0 1 1
= (−1)n−1 .
.
49
3.3 The Jordan canonical form
As we have seen, not every matrix can be reduced to diagonal form and in
this section we shall investigate what can be achieved in the general case.
We begin by recalling that failure to be diagonalisable can result from two
causes. Firstly, the matrix can fail to have a sufficient number of eigenvalues
(i.e. zeroes of χA ). By the fundamental theorem of algebra, this can only
happen in the real case and in this section we shall avoid this difficulty by
confining our attention to complex matrices resp. vector spaces. The second
difficulty is that the matrix may have n eigenvalues (with repetitions) but
may fail to have enough eigenvectors to span the space. A typical example
is the shear operator
(ξ1 , ξ2 ) 7→ (ξ1 + ξ2 , ξ2 )
1 1
with matrix .
0 1
This has the double eigenvalue 1 but the only eigenvectors are multiples
of the unit vector (1, 0). We will investigate in detail the case of repeated
eigenvalues and it will turn out that in a certain sense the shear operator
represents the typical situation. The precise result that we shall obtain is
rather more delicate to state and prove than the diagonalisable case and we
shall proceed by way of a series of partial results. We begin with the following
Proposition which allows us to reduce to the case where the operator f has
a single eigenvalue.
In order to avoid tedious repetitions we assume from now until the end
of this section that f is a fixed operator on a complex vector space V of
dimension and that f has eigenvalues
λ1 , . . . , λ1 , λ2 , . . . , λr , . . . , λr
where λi occurs ni times. This means that f has characteristic polynomial
(λ1 − λ)n1 . . . (λr − λ)nr
where n1 + · · · + nr = n).
Proposition 13 There is a direct sum decomposition
V = V1 ⊕ · · · ⊕ Vr
where
• each Vi is f invariant (i.e. f (Vi ) ⊂ Vi );
• the dimension of Vi is ni and (f − λi Id)ni |Vi = 0.
In particular, the only eigenvalue of f |Vi is λi .
50
Proof. Fix i. It is clear that
and so
Ker (f − λi Id)ri = Ker (f − λi Id)ri +m
for m ∈ N.
Then we claim that
Since the sum of the dimensions of these two spaces is that of V , it suffices
to show that their intersection is {0}. But if y ∈ Ker (f − λi Id)ri and y =
(f − λi Id)ri (x), then (f − λi Id)2ri (x) = 0 and so x ∈ Ker (f − λi Id)2ri =
Ker (f − λi Id)ri i.e. y = 0. It is now clear that if Vi = Ker(f − λi Id)ri , then
V = V1 ⊕ · · · ⊕ Vr
Ker f ⊂ Ker f 2 ⊂ . . .
and
f (V ) ⊃ f 2 (V ) ⊃ . . .
become stationary at points r, s i.e. we have
Ker f 6= Ker f 2 6= · · · =
6 Ker f r = Ker f r+1 = . . .
and
f (V ) 6= f 2 (V ) 6= · · · =
6 f s (V ) = f s+1 (V ) = . . .
Then the above proof actually shows the following:
51
Using the above result, we can concentrate on the restrictions of f to the sum-
mands. These have the special property that they have only one eigenvalue.
Typical examples of matrices with this property are the Jordan matrices
which we introduced in the first chapter. Recall the notation
λ 1 0 ... 0
0 λ 1 ... 0
Jn (λ) = .. .. .
. .
0 0 0 ... λ
W = W1 ⊕ · · · ⊕ Wk
so that each Wi is g-invariant and has a basis with respect to which the matrix
of g is the Jordan matrix Jsi (λ) where si = dim Wi .
By replacing g by g − λI we can reduce to the following special case which
is the one which we shall prove:
52
Proposition 16 Let g ∈ L(V ) be nilpotent with g r = 0, g r−1 6= 0. Then
there is a decomposition
V = V1 ⊕ · · · ⊕ Vk
with each Vi g-invariant and a basis for each Vi so that g|Vi has matrix Jsi (0)
where si = dim Vi .
x1 , g(x1), . . . , g r−1(x1 )
0 = g r (y)
= g r−s (g s (y))
r−1
X
= λj g j+r+s(x1 )
j=0
s−1
X
= λj g j+r−s(x1 ).
j=0
53
Now if V = V1 ⊕ V2 we are finished. If not we can proceed in the same
manner to obtain a suitable V3 and so on until we have exhausted V .
We are now in a position to state and prove our general result. Starting
with the operator f ∈ L(V ) we first split V up in the form
V = V1 ⊕ · · · ⊕ Vr
where each Vi is f -invariant and the restriction of (f −λi Id) to Vi is nilpotent.
Applying the second result we get a further splitting
Vi = W1i ⊕ · · · ⊕ Wkii
and a basis for Wji so that the matrix is a Jordan matrix. Combining all
of the bases for the various Wij we get one for V with respect to which the
matrix of f has the form
diag (J(λ1 ), . . . , J(λ1 ), J(λ2 ), . . . , J(λr ), . . . , J(λr ))
where we have omitted the subscripts indicating the dimensions of the Jordan
matrices.
This result about the existence of the above representation (which is
called the Jordan canonical form of the operator) is rather powerful and
can often be used to prove non-trivial facts about matrices by reducing to
the case of Jordan matrices. We use this technique in the following proof
of the so-called Cayley-Hamilton theorem which states that a matrix is a
“solution” of its own characteristic equation.
Proposition 17 Let A be an n × n matrix. Then χA (A) = 0.
Proof. We begin with the case where A has Jordan form i.e. a block
representation diag (A1 , . . . , Ar ) where Ai is the part corresponding to the
eigenvalue λi . Ai itself can be divided into Jordan blocks i.e.
A = diag(J(λi ), . . . , J(λi )).
Now if p is a polynomial, then p(A) = diag (p(A1 ), . . . , p(Ar )) and so it
suffices to show that χA (Ai ) = 0 for each i. But χA contains the factor
(λi − λ)ni and so χA (Ai ) contains the factor (λI − Ai )ni and we have seen
that this is zero.
We now consider the general case i.e. where A is not necessarily in Jordan
form. We can find an invertible matrix S with à = S −1 AS has Jordan form.
Then χà = χA and so
χA (A) = χà (S) = χà (S ÃS −1 ) = Sχà (Ã)S −1 = 0.
The Cayley-Hamilton theorem can be used to calculate higher powers and
inverses of matrices. We illustrate this with a simple example:
54
Example: If
2 −1 3
A = 1 0 2 ,
0 3 1
then
χA (λ) = −λ3 + 3λ2 + 3λ − 2
and so
−A3 + 3A2 + 3A − 2I = 0.
Hence A3 = 3A2 − 3A − 2I. From this it follows that
A4 = 3A3 + 3A2 − 2A
13 18 3 3 7 7 2 −1 3
= 3 9 13 21 + 3 2 5 5 − 2 1 0 2
9 18 12 3 3 7 0 3 1
44 77 105
= 31 57 74 .
36 57 85
A further interesting fact that can easily be verified with help of the Jordan
form is the following:
Proof. Without loss of generality, we can assume that A has Jordan form
and then p(A) is a triangular matrix with diagonal entries p(λ1 ), . . . , p(λn ).
The above calculations indicate the usefulness of a polynomial p such that
p(A) = 0. The Cayley-Hamilton theorem provides us with one of degree n.
In general, however, there will be suitable polynomials of lower degree. For
example, the characteristic polynomial of the identity matrix In is (1 − λ)n
but p(I) = 0 where p is the linear polynomial p(λ) = 1 − λ. Since it is
obviously of advantage to take the polynomial of smallest degree with this
property, we introduce the following definition:
55
Definition: Let A be an n × n matrix with characteristic polynomial‘
χA (λ) = (λ1 − λ)n1 . . . (λr − λ)nr .
Then there exists for each i a smallest mi (≤ ni ) so that p(A) = 0 where
p(λ) = (λ1 − λ)m1 . . . (λr − λ)mr .
This polynomial is called the minimal polynomial of A and denoted by
mA . In principle it can be calculated by considering the n1 · n2 . . . nr divisors
of the characteristic polynomial which contain the factor (λ1 − λ) . . . (λr − λ)
and determining the one of lowest degree which annihilates A. In terms of
the Jordan canonical form of A it is clear that mi is the order of the largest
Jordan matrix in the block corresponding to the eigenvalue λi .
We conclude with two simple and typical applications of the Cayley-
Hamilton theorem.
I. Suppose that we are given a polynomial p with roots λ1 , . . . , λn and are
required to construct a second one whose roots are the square of the λi
(without calculating these roots explicitly). This can be done as follows: let
A be the companion matrix of p so that the eigenvalues of A are the roots
of p. Then if B = A2 , the eigenvalues of B are the required numbers. Hence
q = χB is a suitable polynomial.
II. Suppose that we are given two polynomials p and q whereby the roots of
p are λ1 , . . . , λn . If A is the companion matrix of p, then the eigenvalues of
q(A) are q(λ1 ), . . . , q(λn ). Hence p and q have a common root if and only
if det q(A) = 0. This gives a criterium for the two polynomials to have
a common root. For this reason the quantity ∆ = det q(A) is called the
resultant of p and q.
The particular case where q is the derivative of p is useful since the ex-
istence of a common root for p and p′ implies that p has a double root. In
this case the expression ∆ = det p′ (A) is called the discriminant of p.
56
which is never zero.
4 − 2c a + c −2 + b + c2
A3 = ? ? ?
? ? ?
and A3 = 0. Now
1 1 1 −2
A 0 = −3 and A2 0 = 2 .
0 1 0 2
Then (−2, 2, 2), (1, −3, 1) and (1, 0, 0) are linearly independent and with
respect to this basis fA has matrix
0 1 0
0 0 1 .
0 0 0
57
Example: Calculate the minimal polynomials of
1 1 1 1 1 1
A= 0 1 1 B = 1 1 1 .
0 0 1 1 1 1
2) For each eigenvalue λ of the matrices A and B below calculate the value
of r for which Ker (A − λI)r becomes stationary:
1 0 0 0
4 6 0 2 1 0 0
A = −1 −1 0 B= 3
.
2 1 0
0 0 1
4 3 2 1
df
= g
dt
dg
= −6f + 5g.
dt
4) Diagonalise the matrix
1 −1 2
A = −1 1 2
2 2 −2
58
and use it to solve the difference equations:
an+1 = an − bn + 2cn
bn+1 = −an + bn + 2cn
cn+1 = 2an + 2bn − 2cn
• f n = 0 but f n−1 6= 0;
59
• r(f ) = n − 1;
• dim Ker f = 1;
λn 1
χA−1 (λ) = (−1)n χA ( ).
det A λ
Show that in general (i.e. without the condition on invertibility of A) we
have
χ′ (λ)
tr (λI − A)−1 = A
χA (λ)
whenever λ is not an eigenvalue of A.
15) Show that if A is a nilpotent n × n matrix with Ak = 0, then I − A is
invertible and
(I − A)−1 = I + A + · · · + Ak−1 .
16) Let A be a complex 2 × 2 matrix which is not a multiple of the unit
matrix. Show that any matrix which commutes with A can be written in the
form λI + µA (λ, µ ∈ C).
17) Find a Jordan canonical form for the operator
60
• the set of matrices B which commute with A;
• the set of matrices which commute with all matrices which commute
with A.
Deduce that a matrix is in the latter set if and only if it has the form p(A)
for some polynomial p.
22) Use 21) to show that a matrix B commutes with all matrices which
commute with a given matrix A if and only if B = p(A) for some polynomial
p (cf. Exercise 7) above).
23) Show that if A1 and A2 are commuting n × n matrices, then their eigen-
values can be ordered as
λ1 , . . . , λn resp. µ1 , . . . , µn
in such a way that for any polynomial p of two variables, the eigenvalues of
p(A1 , A2 ) are
p(λ1 , µ1 ), . . . , p(λn , µn ).
Generalise to commuting r-tuples A1 , . . . , Ar of matrices.
24) Show that if p and q are polynomials and A is the companion matrix of p,
then the nullity of q(A) is the number of common roots of p and q (counted
with multiplicities). In the case where q = p′ , the rank of p′ (A) is the number
of distinct roots of A.
25) Consider the companion matrix
0 1 0 ... 0
0 0 1 ... 0
.. .
.
C= . .
0 0 0 ... 1
−a0 −a1 −a2 . . . −a0
of the polynomial p (cf. a previous exercise) and suppose now that p has
repeated roots
λ1 , . . . , λ1 , λ2 , . . . , λi , . . . , λr , . . . , λr
where λi occurs ni times. Show that C has Jordan form
diag (Jn1 (λ1 ), Jn2 (λ2 ), . . . , Jnr (λr ))
and that this is induced by the following generalised Vandermonde matrix:
1 0 ... 1 ... 0
λ1 1 ... λ2 . . . 0
λ2 2λ1 ... λ22 . . . 0
1 .
.. ..
. .
λ1n−1 (n − 1)λ1n−2 . . . n−1
λ2 n−nr
. . . λr
61
(The first n1 columns are obtained by successive differentiation of the first
one and so on).
62
3.4 Functions of matrices and operators
We have often used the fact that we can substitute square matrices into
polynomials. For many applications, it is desirable to be able to do this
for more general functions and we discuss briefly some of the possibilities.
Suppose firstly that A is a diagonal matrix, say
A = diag (λ1 , . . . , λn ).
S −1 AS = D = diag (a,1 , . . . , λn ),
we define x(A) to be
The case where A is not diagonalisable turns out to be rather more tricky.
Firstly, we note that it suffices to be able to define x(A) for Jordan blocks.
For if A has Jordan form
S −1 AS = diag (J1 , . . . , Jr )
S · diag (x(J1 ), . . . , x( Jr )) · S −1
once we know how to define the x(Ji ). (We are using the notation diag (J1 , . . . , Jr )
for the representation of a Jordan form as a blocked diagonal matrix).
In order to motivate the general definition, consider the case of the square
root of the Jordan matrix Jn (λ). Firstly, we remark that for λ = 0, no such
square root exists. We show this for the simplest case (n = 2) but the same
argument works in general.
63
Example: Show that there is no matrix A so that
2 0 1
A = .
0 0
α(α − 1) . . . (α − n + 1)
)
n!
satisfies the equation A2 = Jn (λ).
If this matrix seems rather mysterious, notice that the difference between
the cases λ = 0 and λ 6= 0 lies in the fact that the complex function z 7→ z 1/2
is analytic in the neighbourhood of a non-zero λ (i.e. is expressible as a power
series in a neighbourhood of λ) whereas this is not the case at 0. In fact, we
calculated the above root by writing
1
Jn (λ) = λ(I + N)
λ
where N is the nilpotent matrix Jn (0). We then wrote
1
Jn (λ)1/2 = λ1/2 (I + N)1/2
λ
and substituted λ1 N for z in the Taylor series
X∞ 1
1/2 2 zi .
(1 + z) =
i=0
i
64
If we apply the same method to the Taylor series
(1 + z)−1 = 1 − z + z 2 − z 3 + . . .
then we can calculate the inverse of Jn (λ) for λ 6= 0. The reader can check
that the result coincides with that given above.
This suggest the following method for defining x(A) where, for the sake
of simplicity, we shall assume that x is entire. This will ensure that x has a
Taylor expansion around each λ in the spectrum of A. As noted above, we
use the Jordan form
S −1 AS = diag (J1 , . . . , Jr )
where Ji is Jni (λi ). We define x(Ji ) as follows. x has the Taylor expansion
65
where xik = x(k) (λi ). Hence if we write Pik for the operator Lik (A), then
X
x(A) = xik Pik .
i,k
j6=i
λk − λj
which takes on the value 1 at λi and the value 0 at the other λ’s. We note also
the fact that the sum of the Li ’s is the constant function one and that L2i = Li
(both of these when the functions are evaluated at the eigenvalues). In this
case, the components Pi = Li (A) satisfy the equations Pi2 = Pi (i.e. they
are projections) and their sum is the identity operator. The most important
example of such a function of a matrix is the exponential function. Since the
latter is entire, we can substitute any matrix and we denote the result by
exp(A) or eA . We note some of its simple properties:
• if A is diagonalisable, say A = SDS −1 where D = diag (λ1 , . . . , λn ),
then
exp A = S · diag (eλ1 , . . . , eλn ) · S −1 ;
• if A = D + N where D is diagonalisable and N is nilpotent, both
commuting, then exp A = exp D · exp N and
∞
X Nk
exp N =
k!
k=0
66
• the function t 7→ exp tA from R into the set of n × n matrices is
differentiable and
d
(exp tA) = A · exp(tA).
dt
We remark that the statement in (6) means that the elements of the matrix
exp tA, as functions of t, are differentiable. The derivative on the left hand
side is then the matrix obtained by differentiating its elements.
This property is particularly important since it means that the general
solution of the system
dX
= AX
dt
of differential equations where X is the column matrix
x1 (t)
..
X(t) = .
xin (t)
X(t) = exp tA · X0
of initial conditions.
67
and A is the companion matrix
0 1 0 ... 0
0 0 1 ... 0
.. ..
A= . .
0 0 0 ... 1
−a0 −a1 −a2 . . . −an−1
of the polynomial
p(t) = tn + an−1 tn−1 + · · · + a0 .
As we know, the characteristic polynomial of this matrix is p and so its
eigenvalues are the roots λ1 , . . . , λn of p.
We suppose that these λi are all distinct. Then A is diagonalisable and
the diagonalising matrix is the Vandermonde matrix
1 ... 1
λ1 . . . λn
V (λ1 , . . . , λn ) = .. .. .
. .
n−1 n−1
λ1 . . . λn
and the solution of the above equation can be read off from the formula
The reader can check that this provides the classical solution.
In principle, the case of repeated roots for p can be treated similarly.
Instead of the above diagonalisation, we use the reduction of A to its Jordan
form. The details form one of the main topics in most elementary books on
differential equations.
68
Exercises: 1) Calculate exp A where
0 1 0
cos θ − sin θ cos θ sin θ 0 t
A= 2 0 2 A= A= A= .
sin θ cos θ sin θ − cos θ −t 0
0 1 0
dx1
= x1 − 12x3 + e−3t
dt
dx2
= −x1 + 7x2 − 20x3
dt
dx3
= x1 + 5x3 + cos t.
dt
3) If
0 1 0 ... 0
0 0 1 ... 0
.. .. .
C= . .
0 0 0 ... 1
1 0 0 ... 0
calculate exp tC.
4) Calculate exp A where A is the matrix of the differentiation operator D
on Pol (n).
5) Show that if A is an n × n matrix all of whose entries are positive, then
the same holds for exp A.
6) Show that det(exp A) = exp(tr A).
7) Show that the general solution of the equation
dX
= AX + B
dt
with initial condition X(0) = X0 , where A is a constant n × n matrix and B
is a continuous mapping from R into the space of n × 1 column matrices, is
given by the equation
\[
X(t) = \int_0^t \exp((t-s)A)\,B(s)\,ds + \exp tA \cdot X_0.
\]
8) Show that if the minimal polynomial of A is $m_A(t) = \prod_i (t-\lambda_i)^{m_i}$, then two polynomials p and q agree on A (i.e. are such that p(A) = q(A)) if and only if for each i
\[
p(\lambda_i) = q(\lambda_i), \quad p'(\lambda_i) = q'(\lambda_i), \quad \ldots, \quad p^{(m_i-1)}(\lambda_i) = q^{(m_i-1)}(\lambda_i).
\]
9) Let A and B be n × n matrices and define a matrix function X(t) as
follows:
X(t) = e^{At}\,C\,e^{Bt}.
Show that X is a solution of the differential equation
dX
= AX + XB
dt
with initial condition X(0) = C.
Deduce that if the integral
\[
Y = -\int_0^{\infty} e^{At}\,C\,e^{Bt}\,dt
\]
exists, then Y satisfies the equation AY + YB = C.
3.5 Circulants and geometry
An n × n matrix A is a circulant if it has the form
\[
\begin{bmatrix}
a_0 & a_1 & \cdots & a_{n-1} \\
a_{n-1} & a_0 & \cdots & a_{n-2} \\
\vdots & & & \vdots \\
a_1 & a_2 & \cdots & a_0
\end{bmatrix}.
\]
Note that this just means that A is a polynomial function of the special
circulant
\[
C = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
1 & 0 & 0 & \cdots & 0
\end{bmatrix}.
\]
In fact, A is then p(C) where
p(t) = a0 + a1 t + · · · + an−1 tn−1 .
The following result gives an alternative characterisation of circulant matri-
ces. It can be verified by direct calculation.
Proposition 19 An n × n matrix is circulant if and only if it commutes
with C.
We have already calculated the eigenvalues of C and found them to be
\[
1, \omega, \omega^2, \ldots, \omega^{n-1}
\]
where ω is the primitive root cos(2π/n) + i sin(2π/n). The eigenvector corresponding
to ω^k is easily seen to be u_k = \frac{1}{\sqrt n}(\omega^{k}, \omega^{2k}, \ldots, \omega^{nk}). (The reason for the
factor \frac{1}{\sqrt n} will become apparent later). These eigenvectors are particularly
interesting since they are also eigenvectors for all polynomial functions of C
i.e. for the circulant matrices.
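Numerically this is easy to see: the eigenvalues of a circulant are the values of the polynomial p at the n-th roots of unity. The following sketch is an illustration only, with an arbitrarily chosen first row.

import numpy as np

a = np.array([2.0, 1.0, 0.0, 3.0])          # first row (a0, a1, ..., a_{n-1})
n = len(a)

# build the circulant: each row is the previous one shifted cyclically to the right
A = np.array([np.roll(a, k) for k in range(n)])

# eigenvalues predicted by the theory: p(w^k) = sum_j a_j w^{jk}, w = exp(2 pi i / n)
w = np.exp(2j * np.pi / n)
pred = np.array([sum(a[j] * w**(j * k) for j in range(n)) for k in range(n)])

eig = np.linalg.eigvals(A)
# compare as multisets (the orderings differ)
print(np.allclose(np.sort_complex(pred), np.sort_complex(eig)))   # True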
Here we shall discuss briefly the circulants of the form
\[
A = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 1 & 1 & \cdots & 1 & 1 & \cdots & 0 \\
\vdots & & & & & & & \vdots \\
1 & 1 & 1 & \cdots & 0 & 0 & \cdots & 1
\end{bmatrix}
\]
where there are m “ones” in each row. In other words,
\[
A = p(C) \quad \text{where} \quad p(t) = \frac{1}{m}\bigl(1 + t + \cdots + t^{m-1}\bigr).
\]
It follows from the above that the eigenvalues of A are λ1 , . . . , λn where
λn = 1 and
\[
\lambda_k = \frac{1}{m}\,\frac{1 - \omega^{km}}{1 - \omega^{k}}
\]
for k ≠ n.
Then we see immediately
• that A is invertible if and only if m and n are relatively prime;
• that if d is the greatest common divisor of m and n, then the dimension
of the kernel of fA is d and it has a basis consisting of the vectors
\[
\frac{1}{n}\bigl(\omega^{jd}, \omega^{2jd}, \ldots, 1, \omega^{jd}, \ldots\bigr)
\]
for 1 ≤ j ≤ d − 1.
These results have the following geometrical interpretation. Suppose that P
is an n-gon in R2 . If we identify the points of R2 with complex numbers, we
can specify P by an n-tuple (z1 , . . . , zn ) of complex numbers (its vertices).
For example, the standard square corresponds to the 4-tuple (0, 1, 1 + i, i).
An n × n matrix A can be regarded as acting on such polygons by left
multiplication of the corresponding column matrix i.e. we define the polygon
Q = A(P) to be the one with vertices ζ1 , . . . , ζn where
\[
\begin{bmatrix} \zeta_1 \\ \vdots \\ \zeta_n \end{bmatrix} = A \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}.
\]
Consider the transformation with matrix
\[
A = \frac{1}{m}\begin{bmatrix}
1 & 1 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 1 & 1 & \cdots & 0 \\
\vdots & & & & & & \vdots \\
1 & 1 & \cdots & 0 & 0 & \cdots & 1
\end{bmatrix}
\]
discussed above. In this case, Q is the polygon whose vertices are the cen-
troids of the vertices P1 , . . . , Pm resp. P2 , . . . , Pm+1 and so on. This polygon
is called the m-descendant of P.
The results on the matrix A that we obtained above can now be expressed
geometrically as follows:
If m and n are relatively prime, then every polygon Q is the m-descendant
of a unique polygon P.
A more delicate investigation of the case where the greatest common
factor d of m and n is greater than 1 leads to a characterisation of those
polygons P which are m descendants (see the Exercises below).
Exercises: 1) Show that the determinant of the circulant matrix circ (a0 , . . . , an−1 )
is
\[
\prod_{j=0}^{n-1}\sum_{k=0}^{n-1} \omega^{jk} a_k
\]
and use this to give a characterisation of those polygons which are m-descendants
resp. whose m-descendants are the trivial polygon with all vertices at the
origin.
5) Diagonalise the following circulant matrices:
3.6 The group inverse and the Drazin inverse
As a further application of the Jordan form we shall construct two special
types of generalised inverse for linear mappings (resp. matrices). These are
of some importance in certain applications. The method will be the same in
both cases. Firstly we construct the inverse for matrices in Jordan form and
then use this to deal with the general case. We begin with the group inverse.
A group inverse for an n × n matrix A is an n × n matrix S so that
\[
ASA = A, \qquad SAS = S, \qquad AS = SA.
\]
One can show that A has a group inverse exactly when the corresponding operator f satisfies
\[
V = f(V) \oplus \operatorname{Ker} f.
\]
In terms of the minimal polynomial, this means that the matrix has a group
inverse provided the latter has the form
\[
m_A(t) = t^{\epsilon}\,q(t) \quad \text{with } q(0) \neq 0,
\]
where ǫ is either 0 or 1.
The Drazin inverse A Drazin inverse for an n × n matrix A is an n × n
matrix S so that
• SAS = S;
• S and A commute;
• λ ∈ σ(A) if and only if λ† ∈ σ(S), where λ† = 1/λ for λ ≠ 0 and λ† = 0
for λ = 0;
For a matrix in the block diagonal form
\[
\operatorname{diag}(A_1, A_2, \ldots, A_r)
\]
of the Jordan decomposition, the Drazin inverse is defined blockwise as
\[
\operatorname{diag}(A_1^D, \ldots, A_r^D).
\]
Once again, this satisfies the above four conditions (this time with the integer
k the maximum of the orders (ki ) of the individual blocks corresponding to
the zero eigenvalue).
For a general matrix A we choose an invertible P so that A = P −1 ÃP
where à has Jordan form. Then we define AD to be P −1 ÃD P . Of course,
AD is a Drazin inverse for A.
In terms of operators, the Drazin inverse can be described as follows:
suppose that f : V → V is a linear transformation. Then, as we have seen,
there is a smallest integer p at which the sequences
V ⊃ f (V ) ⊃ f 2 (V ) ⊃ . . .
and
Ker f ⊂ Ker (f 2 ) ⊂ Ker(f 3 ) ⊂ . . .
become stationary. This integer is called the index of f and we have the
splitting
V = f p (V ) ⊕ Ker (f p ).
f , restricted to f p (V ), is an isomorphism of this space onto itself. The
operator f D is that one which is obtained by composing the inverse of the
latter with the projection onto f p (V ) along Ker (f p ).
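As an illustration (not taken from the text), a Drazin inverse can be computed numerically via the known identity A^D = A^k (A^{2k+1})^+ A^k, where k is at least the index of A and (·)^+ is the Moore-Penrose inverse treated in a later section; the defining conditions are then checked. The matrix below is an arbitrary example.

import numpy as np

def drazin(A, k):
    # Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, valid when k >= index of A
    Ak = np.linalg.matrix_power(A, k)
    return Ak @ np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1)) @ Ak

# example: an invertible 1x1 block together with a nilpotent block of index 2
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
k = 2                      # any k >= index of A (here the index is 2)
S = drazin(A, k)

print(np.allclose(S @ A @ S, S))                                   # SAS = S
print(np.allclose(A @ S, S @ A))                                   # S and A commute
print(np.allclose(np.linalg.matrix_power(A, k + 1) @ S,
                  np.linalg.matrix_power(A, k)))                   # A^(k+1) S = A^k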
Exercises: 1) For f ∈ L(V ) we define A(f ) = f (V ) ∩ Ker f . Show that
2) Show that the Drazin inverse is uniquely determined i.e. that there is at
most one matrix S so that SAS = S, AS = SA and Ak+1 S = Ak for some
positive integer k.
3) Show that the group inverse is uniquely determined i.e. there is at most
one matrix S so that ASA = A, SAS = S and AS = SA.
4) Show that if f ∈ L(V ) has a Drazin inverse f D , then f D (V ) = f (V ),
Ker f D = Ker(f k ) and f f D = f D f is the projection onto f (V ) along Ker f k .
5) The following exercise shows how the existence of the Drazin inverse can
be deduced directly from the existence of the minimal polynomial, without
using the Jordan form. Suppose that mA has the form
t \mapsto a_k t^k + \cdots + t^r
4 EUCLIDEAN AND HERMITIAN SPACES
4.1 Euclidean space
In chapter II we saw that a number of basic geometrical concepts could be
defined in terms of the scalar product. We now discuss such products in
higher dimensions where, in the spirit of chapter III, we use the axiomatic
approach. We shall prove higher dimensional versions of many of the results
of chapter II, culminating in the spectral theorem for self-adjoint operators.
Recall that a scalar product on a real vector space V is a mapping
\[
(x, y) \mapsto (x|y)
\]
from V × V into R which is symmetric, positive definite and bilinear, so that
\[
(\lambda_1 x_1 + \lambda_2 x_2 \mid \mu_1 y_1 + \mu_2 y_2) = \lambda_1\mu_1(x_1|y_1) + \lambda_1\mu_2(x_1|y_2) + \lambda_2\mu_1(x_2|y_1) + \lambda_2\mu_2(x_2|y_2).
\]
In the case of R², the scalar product has the form (x|y) = \sum_{i,j} a_{ij}\xi_i\eta_j for a symmetric matrix [a_{ij}],
and the positive definiteness means exactly that the corresponding conic
section
\[
\sum_{i,j} a_{ij}\,\xi_i\xi_j = 1
\]
is an ellipse. Hence the choice of a scalar product in R2 is just the choice of
an ellipse with centre 0.
The standard euclidean space is Rn with the scalar product
\[
(x|y) = \sum_i \xi_i\eta_i.
\]
Another example is the space Pol (n) with the scalar product
\[
(p|q) = \int_0^1 p(t)q(t)\,dt.
\]
The latter is a subspace of the infinite dimensional space C([0, 1]) with scalar
product
\[
(x|y) = \int_0^1 x(t)y(t)\,dt.
\]
Using the scalar product we can define the length (or norm) of a vector
x—written kxk. It is defined by the formula
\[
\|x\| = \sqrt{(x|x)}.
\]
Since \|x+y\|^2 = \|x\|^2 + 2(x|y) + \|y\|^2, the scalar product can be recovered from the norm:
\[
(x|y) = \tfrac12\bigl(\|x+y\|^2 - \|x\|^2 - \|y\|^2\bigr).
\]
As in R², the norm and scalar product satisfy the Cauchy-Schwarz in-
equality:
|(x|y)| ≤ kxkkyk (x, y ∈ V ).
This is named after the discoverers of the classical case
\[
\xi_1\eta_1 + \cdots + \xi_n\eta_n \le (\xi_1^2 + \cdots + \xi_n^2)^{\frac12}\,(\eta_1^2 + \cdots + \eta_n^2)^{\frac12}.
\]
To prove the inequality, one considers the quadratic function
\[
t \mapsto \|x + ty\|^2 = \|x\|^2 + 2t(x|y) + t^2\|y\|^2,
\]
which is non-negative by the positive-definiteness of the product. Hence its
discriminant is less than or equal to zero i.e. 4(x|y)^2 - 4\|x\|^2\|y\|^2 \le 0, which
reduces to the required inequality. (Note that the same proof shows that the
Cauchy-Schwarz inequality is strict i.e. |(x|y)| < kxkkyk unless x and y are
proportional. For the above quadratic is strictly positive if x and y are not
proportional and then the discriminant must be negative).
From this we can deduce the triangle inequality
\[
\|x + y\| \le \|x\| + \|y\|.
\]
For
\[
\|x+y\|^2 = (x+y|x+y) = \|x\|^2 + 2(x|y) + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.
\]
From the Cauchy-Schwarz inequality we see that if x and y are non-zero then
the quotient
\[
\frac{(x|y)}{\|x\|\,\|y\|}
\]
lies between −1 and 1. Hence there is a unique θ ∈ [0, π] so that
\[
\cos\theta = \frac{(x|y)}{\|x\|\,\|y\|}.
\]
An orthonormal system is a family (x_1, \ldots, x_m) of unit vectors which are mutually perpendicular, i.e. (x_i|x_j) = 0 for i ≠ j. Such a system is linearly independent, for if
\[
\lambda_1 x_1 + \cdots + \lambda_m x_m = 0,
\]
then
\[
0 = (\lambda_1 x_1 + \cdots + \lambda_m x_m \mid x_i) = \lambda_i(x_i|x_i)
\]
and so λ_i = 0 for each i.
Hence an orthonormal system (x1 , . . . , xn ) with n elements in an n-dimensional
space is a basis and such bases are called orthonormal bases. The classical
example is the canonical basis (e1 , . . . , en ) for Rn .
One advantage of an orthonormal basis is the fact that the coefficients of
a vector with respect to the basis can be calculated simply by taking scalar
products. In fact
\[
x = \sum_{k=1}^{n} (x|x_k)\,x_k.
\]
(This is sometimes called the Fourier series of x). This is proved by a calculation
similar to the one above. Also if x = \sum_{i=1}^{n}\lambda_i x_i and y = \sum_{k=1}^{n}\mu_k x_k, then
\[
(x|y) = \Bigl(\sum_{i=1}^{n}\lambda_i x_i \Bigm| \sum_{k=1}^{n}\mu_k x_k\Bigr) = \sum_{i,k=1}^{n}\lambda_i\mu_k(x_i|x_k) = \sum_{i=1}^{n}\lambda_i\mu_i
\]
and, in particular, \|x\| = \sqrt{\lambda_1^2 + \cdots + \lambda_n^2}. Thus the scalar product and norm
can be calculated from the coordinates with respect to an orthonormal basis
exactly as we calculate them in R^n.
Every euclidean space has an orthonormal basis and to prove this we use
a construction which has a natural geometrical background and which we
have already used in dimensions 2 and 3.
Proof. We construct the basis recursively. For x1 we take any unit vector.
If we have constructed x1 , . . . , xr we construct xr+1 as follows. We take any
z which does not lie in the span of x1 , . . . , xr (of course, if there is no such
element we have already constructed a basis). Then define
\[
\tilde{x}_{r+1} = z - \sum_{i=1}^{r} (z|x_i)\,x_i.
\]
One checks easily that \tilde{x}_{r+1} is non-zero and perpendicular to x_1, \ldots, x_r, and we take x_{r+1} = \tilde{x}_{r+1}/\|\tilde{x}_{r+1}\|.
The standard way to construct an orthonormal basis is called the Gram-
Schmidt process and consists in applying the above method in connection
with a given basis (y1 , . . . , yn ), using y1 for x1 and, at the r-th step using
yr+1 for z. This produces an orthonormal basis x1 , . . . , xn of the form
x1 = b11 y1
x2 = b21 y1 + b22 y2
..
.
xn = bn1 y1 + · · · + bnn yn
where the diagonal elements bii are non-zero. If we apply this method to the
case where the space is Rn (identified with the space of row vectors) and the
basis (y1, . . . , yn ) consists of the rows of an invertible n × n matrix A, we
obtain a lower triangular matrix
\[
B = \begin{bmatrix}
b_{11} & 0 & \cdots & 0 \\
b_{21} & b_{22} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{bmatrix}
\]
and a matrix Q whose rows form an orthonormal basis for Rn (such matrices
are called orthonormal) so that Q = BA. Since B is invertible and its
inverse L is also a lower triangular matrix, we obtain the following result on
matrices:
Proposition 22 Any n × n invertible matrix A has a representation of the
form A = LQ where Q is an orthonormal matrix and L is lower triangular.
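The following sketch (an illustration only, assuming A is square and invertible) carries out the row-wise Gram-Schmidt process numerically and checks the factorisation A = LQ of Proposition 22.

import numpy as np

def lq_via_gram_schmidt(A):
    # Gram-Schmidt applied to the rows of A: returns L (lower triangular) and Q
    # (rows orthonormal) with A = L @ Q.  Assumes A is square and invertible.
    n = A.shape[0]
    Q = np.zeros_like(A, dtype=float)
    L = np.zeros((n, n))
    for r in range(n):
        v = A[r].astype(float)
        for i in range(r):
            L[r, i] = A[r] @ Q[i]        # coefficient of the r-th row along q_i
            v = v - L[r, i] * Q[i]
        L[r, r] = np.linalg.norm(v)
        Q[r] = v / L[r, r]
    return L, Q

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
L, Q = lq_via_gram_schmidt(A)
print(np.allclose(A, L @ Q))             # True
print(np.allclose(Q @ Q.T, np.eye(3)))   # rows of Q are orthonormal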
We can use this fact to prove a famous inequality for the determinant of
the n × n matrix. We have
\[
|\det A| \le \prod_{i}\Bigl(\sum_{j}|a_{ij}|^2\Bigr)^{\frac12}.
\]
(We have used the fact that if the matrix U is orthonormal, then its deter-
minant is ±1. This follows as in the 2-dimensional case from the equation
U t U = I—see below).
In the context of euclidean space, those linear mappings which preserve
distance (i.e. are such that kf (x)k = kxk for x ∈ V ) are of particular interest.
As in the two and three dimensional cases, we can make the following simple
remarks (note that we only consider linear isometries):
I. If f is an isometry, then f preserves scalar products i.e. (f (x)|f (y)) = (x|y)
(x, y ∈ V ). For
\[
(f(x)|f(y)) = \tfrac12\bigl(\|f(x+y)\|^2 - \|f(x)\|^2 - \|f(y)\|^2\bigr) = \tfrac12\bigl(\|x+y\|^2 - \|x\|^2 - \|y\|^2\bigr) = (x|y).
\]
On the other hand, this property implies that f is an isometry. (Take x = y).
II. An isometry from V into V1 is automatically injective and so is surjective
if and only if dim V = dim V1 . In particular, any isometry of V into itself is
a bijection.
III. An isometry maps orthonormal systems onto orthonormal systems. In
particular, if dim V = dim V1 , then f maps orthonormal bases onto orthonor-
mal bases. On the other hand, if f maps one orthonormal basis (x1 , . . . , xn )
onto an orthonormal system (y_1, \ldots, y_n) in V_1, then f is an isometry. For if
x = \sum_k \lambda_k x_k, then f(x) = \sum_k \lambda_k y_k and so
\[
\|f(x)\|^2 = \sum_k \lambda_k^2 = \|x\|^2.
\]
with respect to some orthonormal basis and rotations i.e. those with ma-
trices of the form
\[
\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & \cos\theta & -\sin\theta \\
0 & 0 & \cdots & \sin\theta & \cos\theta
\end{bmatrix}.
\]
Example: Show that
\[
(A|B) = \operatorname{tr}(AB^t)
\]
is a scalar product on M_n.
Solution: The bilinearity and symmetry:
\[
(A|B) = \operatorname{tr}(AB^t) = \operatorname{tr}\bigl((AB^t)^t\bigr) = \operatorname{tr}(BA^t) = (B|A).
\]
Positive definiteness:
\[
(A|A) = \operatorname{tr}(AA^t) = \sum_{i,j} a_{ij}^2 > 0
\]
for A ≠ 0.
We remark that the basis (eij : i, j = 1, . . . , n) is then orthonormal where
eij is the matrix with a 1 in the (i, j)-th position and zeroes elsewhere.
Example: Show that the form
\[
\bigl((\xi_1,\xi_2)\bigm|(\eta_1,\eta_2)\bigr) = 4\xi_1\eta_1 - 2\xi_1\eta_2 - 2\xi_2\eta_1 + 3\xi_2\eta_2
\]
is a scalar product on R².
Solution: The matrix of the quadratic form is
\[
\begin{bmatrix} 4 & -2 \\ -2 & 3 \end{bmatrix}
\]
with eigenvalues the roots of λ2 − 7λ + 8 which are positive.
We calculate an orthonormal basis with respect to this scalar product by
applying the Gram-Schmidt process to x1 = (1, 0), x2 = (0, 1). This gives
\[
e_1 = \frac{x_1}{\|x_1\|} = \bigl(\tfrac12, 0\bigr);
\]
\[
y_2 = (0,1) - \bigl((0,1)\bigm|(\tfrac12,0)\bigr)\bigl(\tfrac12,0\bigr) = \bigl(\tfrac12, 1\bigr);
\]
\[
e_2 = \frac{1}{\sqrt2}\bigl(\tfrac12, 1\bigr).
\]
Example: Construct an orthonormal basis for Pol (2) with scalar product
\[
(p|q) = \int_0^1 p(t)q(t)\,dt.
\]
Solution: Applying the Gram-Schmidt process to the basis 1, t, t² one obtains
x_0(t) = 1, x_1(t) = \sqrt{12}\,(t - \tfrac12) and, after subtracting the projections of t² onto these,
\[
\tilde{y}_2(t) = t^2 - t + \tfrac16; \qquad x_2(t) = 6\sqrt5\,\bigl(t^2 - t + \tfrac16\bigr).
\]
We calculate the Fourier series of 1 + t + t² with respect to this basis. We have
\[
\lambda_1 = \int_0^1 (t^2 + t + 1)\,dt = \frac{11}{6},
\]
\[
\lambda_2 = \sqrt{12}\int_0^1 (t^2 + t + 1)\bigl(t - \tfrac12\bigr)\,dt = \frac{1}{\sqrt3},
\]
\[
\lambda_3 = 6\sqrt5\int_0^1 (t^2 + t + 1)\bigl(t^2 - t + \tfrac16\bigr)\,dt = \frac{1}{6\sqrt5}
\]
and so
\[
t^2 + t + 1 = \frac{11}{6} + 2\bigl(t - \tfrac12\bigr) + \bigl(t^2 - t + \tfrac16\bigr).
\]
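These integrals are easy to confirm numerically; the sketch below is illustrative only and evaluates them with scipy's quadrature routine.

import numpy as np
from scipy.integrate import quad

f = lambda t: t**2 + t + 1
x0 = lambda t: 1.0
x1 = lambda t: np.sqrt(12) * (t - 0.5)
x2 = lambda t: 6 * np.sqrt(5) * (t**2 - t + 1.0/6)

lam = [quad(lambda t, x=x: f(t) * x(t), 0, 1)[0] for x in (x0, x1, x2)]
print(np.allclose(lam, [11/6, 1/np.sqrt(3), 1/(6*np.sqrt(5))]))   # True

# the Fourier series reproduces f on Pol(2)
t = np.linspace(0, 1, 7)
series = lam[0]*x0(t) + lam[1]*x1(t) + lam[2]*x2(t)
print(np.allclose(series, f(t)))                                   # True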
(And hence the right hand side is non-negative and vanishes if and only if
the xi are linearly dependent).
Solution: This follows from the equalities
\[
(x_i|x_j) = \sum_{k=1}^{n} (x_i|e_k)(x_j|e_k)
\]
Exercises: 1)
• Calculate the orthogonal projection of (1, 3, −2) on [(2, 1, 3), (2, 0, 5)].
3) If x1 , . . . , xm are points in Rn , then the set of points which are equidistant
from x1 , . . . , xm is an affine subspace which is of dimension n − m if the xi
are affinely independent.
4) Let (x0 , . . . , xn ) be affinely independent points in Rn . Then there is exactly
one hypersphere through these points and its equation is
\[
\det \begin{bmatrix}
1 & 1 & \cdots & 1 \\
x_0 & x_1 & \cdots & x \\
(x_0|x_0) & (x_1|x_1) & \cdots & (x|x)
\end{bmatrix} = 0
\]
and hence that the system (pn ) is orthogonal for the corresponding scalar
product. (This shows that the system (pn ) is, up to norming factors, the
sequence obtained by applying the Gram-Schmidt process to the sequence
(tn ). These functions are called the Legendre polynomials).
9) Approximate sin x by a polynomial of degree 3 using the orthonormal
basis of the last exercise. Use this to check the accuracy by calculating an
approximate value of sin 1.
10) Show that for an n × n matrix A the following inequality holds:
\[
\sum_{i,j=1}^{n} a_{ij}^2 \ge \frac{1}{n}(\operatorname{tr} A)^2.
\]
where λ_j = \frac{\det G_j}{\det G}. Here G is the matrix [g_{ij}] where g_{ij} = (x_i|x_j) and G_j is
the matrix obtained from G by replacing the j-th column by ??
14) Show that if x1 , . . . , xn is a basis for the euclidean space V , then the
Gram-Schmidt process, applied to this basis, leads to the system (yk ) where
1
yk = (dk−1dk ) 2 D
(The last expression is to be understood as the linear combination of the x’s
obtained by formally expanding the “determinant” along the last row).
15) Use Hadamard’s inequality to show that
n
| det A|2 ≤ K n · n 2
where the minima are each taken over the possible choices of the coefficients
a1 , . . . , an .
4.2 Orthogonal decompositions
In our discussion of vector spaces we saw that each subspace has a com-
plementary subspace which determines a splitting of the original space. In
general this complementary subspace is not unique but, as we shall now see,
the structure of a euclidean space allows us to choose a unique one in a
natural way.
For if z ∈ V1 , we have
In the same way we define an orthogonal decomposition
V = V1 ⊥ · · · ⊥ Vr .
• P1 + · · · + Pr = Id;
• Pi Pj = 0 (i 6= j).
(A|B) = tr (B t A).
Show that Mn is the orthogonal direct sum of the symmetric and the anti-
symmetric matrices.
4.3 Self-adjoint mappings—the spectral theorem
One of the most important consequences of the existence of a scalar product
is the fact that it induces a certain symmetry on the linear operators
on the space. If f : V → V_1 we say that g : V_1 → V is adjoint to f if
(f (x)|y) = (x|g(y)) for x ∈ V and y ∈ V1 . We shall presently see that such
a g always exists. Furthermore it is unique. The general construction is
illustrated by the following two examples:
I. Consider the mapping f : R2 → R2 defined by the matrix
\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.
\]
Proof. Suppose that x ∈ V and y ∈ V1 . Then
\[
(f(x)|y) = \Bigl(f\Bigl(\sum_{j=1}^{n}(x|x_j)x_j\Bigr)\Bigm|\sum_{k=1}^{m}(y|y_k)y_k\Bigr)
= \Bigl(\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}(x|x_j)y_i\Bigm|\sum_{k=1}^{m}(y|y_k)y_k\Bigr)
= \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{m} a_{ij}(x|x_j)(y|y_k)(y_i|y_k)
= \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}(x|x_j)(y|y_i).
\]
Similarly, if g is the mapping with matrix A^t with respect to the bases (y_k) and (x_j), we have
\[
(x|g(y)) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}(x|x_j)(y|y_i)
\]
and so
\[
(f(x)|y) = (x|g(y)),
\]
i.e. g is adjoint to f.
Naturally, we shall denote g by f t and note the following simple properties:
• (f + g)t = f t + g t ;
• (λf )t = λf t ;
• (f g)t = g t f t ;
• (f t )t = f ;
• an operator f is an isometry if and only if f t f = Id. In particular, if
V = V1 , this means that f t = f −1 .
We prove the last statement. We have seen that f is an isometry if and only
if (f (x)|f (y)) = (x|y) for each x, y and this can be restated in the form
(f t f (x)|y) = (x|y)
i.e. f t f = Id.
A mapping f : V → V is said to be self-adjoint if f^t = f, i.e. if
\[
(f(x)|y) = (x|f(y))
\]
for x, y ∈ V. This is equivalent to f being represented by a symmetric matrix A
with respect to an orthonormal basis. In particular, this will be the case if
f is represented by a diagonal matrix. The most important result of this
chapter is the following which states that the converse is true: if f is self-adjoint, then there is an orthonormal basis of V consisting of eigenvectors of f. In matrix form: for every symmetric n × n matrix A there is an orthonormal matrix U so that
\[
U^tAU = U^{-1}AU = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).
\]
This follows immediately from the above Proposition, since the transfer ma-
trix between orthonormal bases is orthonormal.
Before proceeding with the proof we recall that we have essentially proved
this result in the two dimensional case in our treatment of conic sections in
Chapter II.
One of the consequences of the result is the fact that every symmetric
matrix A has at least one eigenvalue. Indeed this is the essential part of
the proof as we shall see and before proving this we reconsider the two-dimensional
case. The symmetric matrix
\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}
\]
induces the quadratic form
\[
\phi_1(x) = (f_A(x)|x)
\]
which defines the conic section
\[
Q_1 = \{x : \phi_1(x) = 1\}.
\]
If we compare this to the form φ_2(x) = (x|x) which defines the unit circle
Q_2 = \{x : \phi_2(x) = 1\} and consider the case where Q_1 is an ellipse, then we
see that the eigenvectors of A are just the major and minor axes of the ellipse,
and these can be characterised as those directions x for which the ratio
\[
\frac{\phi_1(x)}{\phi_2(x)}
\]
is extremal. Hence we can reduce the search for eigenvalues to one for the
extremal value of a suitable functional, a problem which we can solve with
the help of elementary calculus. This simple idea will allow us to prove the
following result which is the core of the proof. We use the Proposition that
a continuous function on a closed, bounded subset of Rn is bounded and
attains its supremum.
Proposition 27 Lemma Let f : V → V be self-adjoint. Then there exists
an x ∈ V with kxk = 1 and λ ∈ R so that f (x) = λx (i.e. f has an
eigenvalue).
Proof. We first consider the case where V = Rn with the natural scalar
product. Then f is defined by an n × n symmetric matrix A. Consider the
function
\[
\phi : x \mapsto \frac{(f(x)|x)}{(x|x)}
\]
on V \setminus \{0\}. Then φ(λx) = φ(x) for λ ≠ 0. There exists an x_1 ∈ V so
that kx1 k = 1 and φ(x1 ) ≥ φ(x) for x ∈ V with kxk = 1. Hence, by the
homogeneity,
φ(x1 ) ≥ φ(x) (x ∈ V \ {0}).
We show that x1 is an eigenvector with f (x1 ) = λ1 x1 where λ1 = φ(x1 ) =
(f(x_1)|x_1). To do this choose y ≠ 0 in V. The function
\[
\psi : t \mapsto \phi(x_1 + ty)
\]
has a maximum at t = 0. We show that ψ'(0) exists and that its value is
2(y|f(x_1)) − 2(y|x_1)λ_1. Hence this must vanish for each y i.e.
\[
(y \mid f(x_1) - \lambda_1 x_1) = 0.
\]
Since this holds for each y, f (x1 ) = λ1 x1 . To calculate the derivative of ψ
we compute the limit as t tends to zero of the difference quotient
\[
\frac{1}{t}\bigl(\psi(t) - \psi(0)\bigr) = \frac{1}{t}\bigl(\phi(x_1+ty) - \phi(x_1)\bigr).
\]
But this is the limit of the expression
\[
\frac{1}{t}\left[\frac{(x_1|x_1)\bigl[(f(x_1)|x_1) + 2t(f(x_1)|y) + t^2(f(y)|y)\bigr] - (f(x_1)|x_1)\bigl[(x_1|x_1) + 2t(x_1|y) + t^2(y|y)\bigr]}{(x_1+ty|x_1+ty)(x_1|x_1)}\right]
\]
which is easily seen to be
\[
2(y|f(x_1)) - 2(y|x_1)\lambda_1.
\]
In order to prove the general case (i.e. where f is an operator on an
abstract euclidean space), we consider an orthonormal basis (x1 , . . . , xn ) for
V and let A be the matrix of f with respect to this basis. Then A has an
eigenvalue and of course this means that f also has one.
We can now continue to the proof of the main result:
Proof. The proof is an induction argument on the dimension of V . For
dim V = 1, the result is trivial (all 1 × 1 matrices are diagonal!) The step
n − 1 → n: By the Lemma, there exists an eigenvector x1 with kx1 k = 1 and
eigenvalue λ1 . Put V1 = {x1 }⊥ = {x ∈ V : (x|x1 ) = 0}. Then V1 is (n − 1)-
dimensional and f (V1 ) ⊂ V1 since if x ∈ V1 , then (f (x)|x1 ) = (x|f (x1 )) =
(x|λ1 x1 ) = 0 and so f (x) ∈ V1 . Hence by the induction hypothesis there
exists an orthonormal basis (x2 , . . . , xn ) consisting of eigenvectors for f . Then
(x1 , . . . , xn ) is the required orthonormal basis for V .
The above proof implies the following useful characterisation of the largest
resp. smallest eigenvalue of f:

Corollary 3 Let f : V → V be a self-adjoint linear mapping with eigenvalues
λ_1, \ldots, λ_n numbered so that λ_1 ≤ \cdots ≤ λ_n. Then
\[
\lambda_1 = \min\{\phi(x) : x \in V \setminus \{0\}\}, \qquad
\lambda_n = \max\{\phi(x) : x \in V \setminus \{0\}\}.
\]
This can be generalised to the following so-called minimax characterisation
of the k-th eigenvalue:
λk = min max{(f (x)|x) : x ∈ V \ {0}, (x|y1) = · · · = (x|yr ) = 0}
the minimum being taken over all finite sequences y1 , . . . , yr of unit vectors
where r = n − k.
We remark here that it follows from the proofs of the above results that if
f is such that (f (x)|x) ≥ 0 for each x in V , then its eigenvalues are all non-
negative (and they are positive if (f (x)|x) > 0 for non-zero x). Such f are
called positive semi-definite resp. positive definite and will be examined
in some detail below. It follows from the above minimax description of the
eigenvalues that
λk (f + g) ≥ λk (f )
whenever f is self-adjoint and g is positive semi-definite (with strict inequality
when g is positive definite). As we have seen, a symmetric operator is always
diagonalisable. Using this fact, we can prove the following weaker result for
arbitrary linear operators between euclidean spaces.
Proposition 28 Let f : V → V1 be a linear mapping. Then there exist or-
thonormal bases (x1 , . . . , xn ) for V and (y1 , . . . , ym ) for V1 so that the matrix
of f with respect to these bases has the form
\[
\begin{bmatrix} A_1 & 0 \\ 0 & 0 \end{bmatrix}
\]
where A1 = diag (µ1 , . . . , µr ), r is the rank of f and µ1 , . . . , µr are positive
scalars.
The corresponding result for matrices is the following:
Proposition 29 If A is an m × n matrix then there exist orthonormal ma-
trices U1 and U2 so that U1 AU2 has the above form.
Proof. We prove the operator form of the result. Note that the mapping
f t f on V is self-adjoint since (f t f )t = f t f tt = f t f . We can thus choose
an orthonormal basis (xi ) consisting of eigenvectors for f t f . If λi is the
corresponding eigenvalue we can number the xi so that the first r eigenvalues
are non-zero but the following ones all vanish. Then each λ_i is positive
(i = 1, \ldots, r) and \|f(x_i)\| = \sqrt{\lambda_i}. For
\[
\lambda_i = \lambda_i(x_i|x_i) = (\lambda_i x_i|x_i) = (f^tf(x_i)|x_i) = (f(x_i)|f(x_i)) > 0.
\]
Hence if we put
\[
y_1 = \frac{f(x_1)}{\sqrt{\lambda_1}}, \quad \ldots, \quad y_r = \frac{f(x_r)}{\sqrt{\lambda_r}}
\]
then (y1 , . . . , yr ) is an orthonormal system in V1 . We extend it to an or-
thonormal basis (y1 , . . . , ym ) for V1 . The matrix of f with respect to these
bases clearly has the desired form.
The proof shows that the µi are the square roots of the eigenvalues of f t f
and so are uniquely determined by f. They are called the singular values of
f.
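Numerically the singular values can be obtained from the eigenvalues of A^tA, or directly with a singular value decomposition routine; the following sketch (an illustration only, on an arbitrary matrix) checks that the two agree.

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])           # an arbitrary 2 x 3 example

# square roots of the non-zero eigenvalues of A^t A ...
eig = np.linalg.eigvalsh(A.T @ A)
mu_from_eig = np.sqrt(np.sort(eig[eig > 1e-12])[::-1])

# ... coincide with the singular values returned by the SVD
mu_from_svd = np.linalg.svd(A, compute_uv=False)
print(np.allclose(mu_from_eig, mu_from_svd))    # True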
Example: Let V be a euclidean space with a basis (xi ) which is not as-
sumed to be orthonormal. Show that if f ∈ L(V ) has matrix A with re-
spect to this basis, then f is self-adjoint if and only if At G = GA where
G = [(xi |xj )]i,j .
Solution: f is self-adjoint if and only if
\[
\Bigl(f\Bigl(\sum_{i=1}^{n}\lambda_i x_i\Bigr)\Bigm|\sum_{j=1}^{n}\mu_j x_j\Bigr) = \Bigl(\sum_{i=1}^{n}\lambda_i x_i\Bigm| f\Bigl(\sum_{j=1}^{n}\mu_j x_j\Bigr)\Bigr)
\]
for all choices of scalars λ_1, \ldots, λ_n, µ_1, \ldots, µ_n and this holds if and only if
\[
(f(x_i)|x_j) = (x_i|f(x_j))
\]
for each i and j. Substituting the values of f(x_i) and f(x_j) one gets the
required equation At G = GA.
Exercises: 1) Calculate the adjoints of the following operators:
2) Show that
\[
\operatorname{Ker} f^t = f(V)^{\perp}, \qquad f^t(V) = (\operatorname{Ker} f)^{\perp}.
\]
3) Show that for every self-adjoint operator f there are orthogonal projections
P1 , . . . , Pr and real scalars λ1 , . . . , λr so that
• Pi Pj = 0 if i 6= j;
• P1 + · · · + Pr = Id;
• f = λ1 P1 + · · · + λr Pr .
4.4 Conic sections
As mentioned above, the theory of conic sections in the plane was the main
source of the ideas which lie behind the spectral theorem. We now indicate
briefly how the latter can be used to give a complete classification of higher
dimensional conic sections. The latter are defined as follows:
Q = {x ∈ V : (f (x)|x) + 2(b|x) + c = 0}
{x ∈ V ; (f (x)|x) + c = 0}.
This result can be restated as follows: let f be the isometry from V into Rn
which maps (x_1, \ldots, x_n) onto the canonical basis (e_1, \ldots, e_n). Then
By distinguishing the various possibilities for the signs of the λ’s, we obtain
the following types: λ_1, λ_2, λ_3 all positive. Then we can reduce to the form
\[
\frac{\xi_1^2}{a^2} + \frac{\xi_2^2}{b^2} + \frac{\xi_3^2}{c^2} + d = 0
\]
and this is an ellipsoid for d < 0, a point for d = 0 and the empty set for d > 0.
λ1 , λ2 , λ3 all negative. This can be reduced to the first case by multiplying
by −1. λ1 , λ2 positive, λ3 negative. Then we can write the equation in the
form
\[
\frac{\xi_1^2}{a^2} + \frac{\xi_2^2}{b^2} - \frac{\xi_3^2}{c^2} + d = 0.
\]
(the cases λ1 , λ3 > 0, λ2 < 0 resp. λ2 , λ3 > 0, λ1 < 0 can be reduced to the
above by permuting the unknowns). The above equation represents a circular
cone (d = 0) or a one-sheeted or two sheeted hyperboloid (depending on the
sign of d). The cases where at least one of the λ’s vanishes can be reduced
to the two-dimensional case.
{x ∈ R3 : (f (x)|x) + 2(g|x) + c = 0}
\[
\Bigl\{x : \frac{\xi_1^2}{a^2} + \frac{\xi_2^2}{b^2} - 2\xi_3 + d = 0\Bigr\}
\]
\[
\Bigl\{x : \frac{\xi_1^2}{a^2} - \frac{\xi_2^2}{b^2} - 2\xi_3 + d = 0\Bigr\}
\]
\[
\bigl\{x : \xi_1^2 - 2\xi_2 + d = 0\bigr\}
\]
on Rn .
4.5 Hermitian spaces
For several reasons, it is useful to reconsider the theory developed in this
chapter in the context of complex vector space. Amongst other advantages
this will allow us to give a purely algebraic proof of the central result—the
diagonalisation of symmetric matrices. This is because the existence of an
eigenvalue in the complex case follows automatically from the fundamental
theorem of algebra.
We begin by introducing the concept of a hermitian vector space i.e.
a vector space V over C with a mapping
( | ):V ×V →C
so that
• (λx + y|z) = λ(x|z) + (y|z) (linearity in the first variable);
• (x|x) ≥ 0 and (x|x) = 0 if and only if x = 0;
• (x|y) = (y|x) (x, y ∈ V ).
Examples: All the examples of euclidean spaces can be “complexified” in
the natural and obvious ways. Thus we have: a) the standard scalar product
\[
\bigl((\lambda_1, \ldots, \lambda_n)\bigm|(\mu_1, \ldots, \mu_n)\bigr) = \lambda_1\bar\mu_1 + \cdots + \lambda_n\bar\mu_n
\]
on Cn ;
b) the scalar product
\[
(p|q) = \int_0^1 p(t)\,\overline{q(t)}\,dt
\]
on the space PolC (n) of polynomials with complex coefficients. Note the
rather unexpected appearance of the complex conjugation in condition 2)
above. This means that the scalar product is no longer bilinear and is used
in order to ensure that condition 1) can hold. However, the product is
sesqui-linear (from the classical Greek for one and a half) i.e. satisfies the
condition
\[
\Bigl(\sum_{i=1}^{n}\lambda_i x_i \Bigm| \sum_{j=1}^{n}\mu_j y_j\Bigr) = \sum_{i,j=1}^{n}\lambda_i\bar\mu_j\,(x_i|y_j).
\]
All of the concepts which we have introduced for euclidean spaces can be
employed, with suitable changes usually necessitated by the sesquilinearity
of the scalar product, for hermitian spaces. We shall review them briefly: the
length of a vector x is \|x\| = \sqrt{(x|x)}. Once again we have the inequality
|(x|y)| ≤ kxkkyk (x, y ∈ V ).
(Since there is a slight twist in the argument we give the proof. Firstly, exactly
as in the real case (by considering \|x + ty\|^2 for t ∈ R), we have
\[
\Re(x|y) \le \|x\|\,\|y\|.
\]
Now there is a complex number λ with |λ| = 1 and λ(x|y) > 0. If we apply
the above inequality with x replaced by λx we get
\[
|(x|y)| = \lambda(x|y) = \Re(\lambda x|y) \le \|\lambda x\|\,\|y\| = \|x\|\,\|y\|.)
\]
Using this inequality, one proves just as before that the distance function
satisfies the triangle inequality i.e.
kx + yk ≤ kxk + kyk.
For reasons that we hope will be obvious we do not attempt to define the
angle between two vectors in hermitian space but the concept of orthogonality
continues to play a central role. Thus we say that x and y are perpendicular
if (x|y) = 0 (written x ⊥ y). Then we can define orthonormal systems and
bases as before and the Gram-Schmidt method can be used to show that every
hermitian space V has an orthonormal basis (x1 , . . . , xn ) and the mapping
\[
(\lambda_1, \lambda_2, \ldots, \lambda_n) \mapsto \lambda_1 x_1 + \cdots + \lambda_n x_n
\]
from C^n onto V preserves the scalar product. In particular,
\[
(x|y) = \lambda_1\bar\mu_1 + \cdots + \lambda_n\bar\mu_n, \qquad
\|x\|^2 = |\lambda_1|^2 + \cdots + |\lambda_n|^2
\]
if x = λ_1 x_1 + \cdots + λ_n x_n and y = µ_1 x_1 + \cdots + µ_n x_n.
Exercises: 1) Let f be the linear mapping on the two dimensional space
C² with matrix
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
\]
Then the scalar product on C² defines a
norm there and so a norm for operators. Show that the norm of the above
operator is given by the formula
\[
\|f\|^2 = \frac12\Bigl(h^2 + \sqrt{h^4 - 4|\det A|^2}\Bigr)
\]
where h^2 = |a|^2 + |b|^2 + |c|^2 + |d|^2.
(Calculate the singular values of A).
2) Show that if H is a hermitian matrix of the form A + iB where A and B
are real and A is non-singular, then we have the formula
4.6 The spectral theorem—complex version
If f ∈ L(V, V1 ) there is exactly one mapping g : V1 → V so that
(f (x)|y) = (x|g(y)) (x ∈ V, y ∈ V1 ).
We denote this mapping by f ∗ . The proof is exactly the same as for the real
case, except that we use the formula
a_{ij} = (f(x_j)|y_i) = (x_j|g(y_i)) = \overline{(g(y_i)|x_j)}
for the elements of the matrix A of f with respect to the orthonormal bases
(x1 , . . . , xn ) resp. (y1 , . . . , ym) to show that the matrix of f ∗ is A∗ , the n × m
matrix obtained from A by taking the complex conjugates of elements and
then transposing.
The linear mapping f on V is hermitian if f ∗ = f i.e. if (f (x)|y) =
(x|f (y)) (x, y ∈ V ). This means that the matrix A of f with respect to
an orthonormal basis satisfies the condition A = A^* (i.e. a_{ij} = \overline{a_{ji}} for each
i, j). Such matrices are also called hermitian. f : V → V_1 is unitary
if (f (x)|f (y)) = (x|y) (x, y ∈ V ). This is equivalent to the condition that
f ∗ f = Id. Hence the matrix U of f (with respect to orthonormal bases) must
satisfy the condition U ∗ U = I (i.e. the columns of U are an orthonormal
system in Cn ). If dim V = dim V1 (= n say), then U is an n × n matrix and
the above condition is equivalent to the equation U ∗ = U −1 . Such matrices
are called unitary.
We now proceed to give a purely algebraic proof of the so-called spectral
theorem for hermitian operators. We begin with some preliminary results on
eigenvalues and eigenvectors:
Lemma 1 If f ∈ L(V ) with eigenvalue λ, then
• λ is real if f is hermitian;
• |λ| = 1 if f is unitary.
Proof. 1) if the non-zero element x is a corresponding eigenvector, then we
have
(f (x)|x) = (λx|x) = λ(x|x)
and
(f(x)|x) = (x|f(x)) = (x|\lambda x) = \bar\lambda(x|x)
and so λ = λ̄. 2) Here we have
(x|x) = (f (x)|f (x)) = (λx|λx) = |λ|2 (x|x)
and so |λ|^2 = 1.
Proposition 31 Lemma If λ1 , λ2 are distinct eigenvalues of the hermitian
mapping f with corresponding eigenvectors x1 , x2 , then x1 ⊥ x2 .
Each Ker f r lies between two terms of this series and so coincides with Ker f .
We now come to our main result:
Proposition 33 If f ∈ L(V ) is a hermitian mapping, then there exists an
orthonormal basis (xi ) for V so that each xi is an eigenvector for f . With
respect to this basis, f has the matrix diag (λ1 , . . . , λn ) where the λi are the
(real) eigenvalues of f .
Proof. Let Vi = Ker (f −λi Id). It follows from the above corollary (applied
to f − λ1 Id) that V is the direct sum
V = V1 ⊕ Im(f − λ1 Id).
(Recall from Chapter VII that we have such a splitting exactly when the ker-
nel of a mapping coincides with the kernel of its square). A simple induction
argument shows that V is the direct sum
V = V1 ⊕ V2 ⊕ · · · ⊕ Vr .
Corollary 5 If A is a hermitian n × n matrix, then there exists a unitary
n × n matrix U and real numbers λ1 , . . . , λn so that
U −1 AU = diag (λ1 , . . . , λn ).
with roots 2 + i and i. The corresponding eigenvectors are \frac{1}{\sqrt2}(1, -1) and
\frac{1}{\sqrt2}(1, 1). Hence U^*AU = \operatorname{diag}(i, 2+i) where
\[
U = \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.
\]
V ∗ AV = diag (λ1 , . . . , λn )
and the right hand side is invertible. Hence so is the left hand side. This
implies that (I + iA)−1 exists. Then
\[
V^*(I - iA)(I + iA)^{-1}V = \operatorname{diag}\Bigl(\frac{1 - i\lambda_1}{1 + i\lambda_1}, \ldots, \frac{1 - i\lambda_n}{1 + i\lambda_n}\Bigr).
\]
The right hand side is clearly unitary. Exercises: 1) Show that an operator
f on a hermitian space is an orthogonal projection if and only if f = f ∗ f . 2)
Let A be a complex n × n matrix, p a polynomial. Show that if p(A∗ A) = 0,
then p(AA∗ ) = 0.
3) Let p be a complex polynomial in two variables. Show that if A is an n×n
complex matrix so that p(A, A∗ ) = 0, then p(λ, λ̄) = 0 for any eigenvalue
λ of A. What can you deduce about the eigenvalues of a matrix A which
satisfies one of the conditions:
A∗ = cA (c ∈ R) A∗ A = A∗ + A A∗ A = −I?
4.7 Normal operators
Normal operators are a generalisation of hermitian ones. Consider first the
diagonal matrix
A = diag (λ1 , . . . , λn ).
A need not necessarily be hermitian (indeed this is the case precisely when the
λi are real). However, it does satisfy the weaker condition that AA∗ = A∗ A
i.e. that A and A∗ commute. We say that such A are normal. Similarly,
an operator f on V is normal if f ∗ f = f f ∗ . Note that unitary mappings
are examples of normal mappings—they are not usually hermitian. We shall
now show that normal mappings have diagonal representations. In order
to do this, we note that any f ∈ L(V ) has a unique representation in the
form f = g + ih where g and h are hermitian (compare the representation
of a complex number z in the form x + iy with x and y real). Indeed if
f = g + ih, then f^* = g - ih and so f + f^* = 2g i.e. g = \frac12(f + f^*). Similarly,
h = \frac{1}{2i}(f - f^*). This proves the uniqueness. On the other hand, it is clear
that if g and h are as in the above formula, then they are hermitian and
f = g + ih.
The fact that normal operators are diagonalisable will follow easily from
the following simple characterisation: f is normal if and only if g and h
commute.
Proof. Clearly, if g and h commute, then so do f = g + ih and f ∗ = g − ih.
On the other hand, if f and f ∗ commute then so do g and h since both are
linear combinations of f and f ∗ .
Since commuting hermitian operators can be diagonalised simultaneously, there is an orthonormal basis consisting of common eigenvectors of g and h; with respect to such a basis f = g + ih is represented by a diagonal matrix, as claimed.
We close this section with the classification of the isometries of Rn . This
generalises the results of Chapter II on those of R2 and R3 . The method
we use is a standard one for deducing results about the real case from the
complex one and can also be employed to deduce the spectral theorem for
self-adjoint operators on euclidean spaces from the corresponding one for
hermitian operators.
We require the following simple Lemma:
Lemma 2 Let V be a subspace of Cn with the property that if z ∈ V , then
ℜz and ℑz also belong to V (where if z = (z_1, \ldots, z_n), then ℜz = (\Re z_1, \ldots, \Re z_n) and ℑz = (\Im z_1, \ldots, \Im z_n)).
where
V1 = {x : fA (x) = x}
V−1 = {x : fA (x) = −x}
Wi = {x : fA (x) = eiθi x}
Wi′ = {x : fA (x) = e−iθi x}.
z \mapsto \bar z = \Re z - i\,\Im z
• kf (x)k = kf ∗ (x)k (x ∈ V ).
2) Show that an m × n complex matrix has factorisations
A = UB = CV
that f is an orthonormal operator on Rn . Show that there exists a one- or
two-dimensional subspace of Rn which is f -invariant. (One can suppose that
f has no eigenvalues. Choose a unit vector x so that the angle between x
and f(x) is minimum. Show that if y is the bisector of the angle between
x and f(x) (i.e. y = \frac{x + f(x)}{2}), then f(y) lies on the plane through x and
f (x)—hence the latter is f -invariant).
4.8 The Moore-Penrose inverse
We now return once again to the topic of generalised inverses. Recall that if
f : V → W is a linear mapping, we construct a generalised inverse for f by
considering splittings
V = V1 ⊕ V2 W = W1 ⊕ W2
where V2 is the kernel of f , W1 the image and V1 and W2 are complementary
subspaces. In the absence of any further structure, there is no natural way to
choose W2 and V1 . However, when V and W are euclidean space, the most
obvious choices are the orthogonal complements V1 = V2⊥ and W2 = W1⊥ .
The generalised inverse that we obtain in this way is uniquely specified and
denoted by f † . It is called the Moore-Penrose inverse of f and has the
following properties:
• f †f f † = f †;
• f f^†f = f;
• f † f is the orthogonal projection onto V1 and so is self-adjoint;
• f f † is the orthogonal projection onto W1 and so is self-adjoint.
In fact, these properties characterise f † —it is the only linear mapping from
W into V which satisfies them as can easily be seen.
It follows that if y ∈ W , then x = f † (y) is the “best” solution of the
equation f (x) = y in the sense that
kf (x) − yk ≤ kf (z) − yk
for each z ∈ V i.e. f (x) is the nearest point to y in f (V ). In addition x is
the element of smallest norm which is mapped onto this nearest point.
In terms of matrices, these results can be restated as follows: let V and W
have orthonormal bases (x1 , . . . , xn ) resp. (y1 , . . . , ym ) and let f have matrix
A with respect to them. Then the matrix A† of f † satisfies the conditions:
AA^†A = A, \quad A^†AA^† = A^†, \quad AA^† \text{ and } A^†A \text{ are self-adjoint.}
Of course, A† is then called the Moore-Penrose inverse of A and is
uniquely determined by the above equations. The existence of f † can also
be proved elegantly by using the result on singular values from the third
paragraph. Recall that we can choose orthonormal bases (x1 , . . . , xn ) resp.
(y1 , . . . , ym ) so that the matrix of f has the block form
\[
\begin{bmatrix} A_1 & 0 \\ 0 & 0 \end{bmatrix}
\]
with A1 = diag (µ1 , . . . , µr ). Then f † is the operator with matrix
\[
\begin{bmatrix} A_1^{-1} & 0 \\ 0 & 0 \end{bmatrix}
\]
with respect to (y1 , . . . , ym ) resp. (x1 , . . . , xn ). Note that f is injective if and
only if r = n. In this case, f has matrix
\[
\begin{bmatrix} A_1 \\ 0 \end{bmatrix}
\]
and f † is the mapping (f t f )−1 f t as one can verify by computing the matrix
of the latter product.
Of course, the abstract geometric description of the Moore-Penrose in-
verse is of little help in calculating concrete examples and we mention some
explicit formulae which are often useful.
Firstly, suppose that A has block representation [B C] where B is an
invertible (and hence square) matrix. Then it follows from the results on
positive definite matrices that BB t + CC t is invertible. The Moore-Penrose
inverse of A is then given by the formula
\[
A^{\dagger} = \begin{bmatrix} B^t(BB^t + CC^t)^{-1} \\ C^t(BB^t + CC^t)^{-1} \end{bmatrix}
\]
as can be checked by multiplying out.
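A quick numerical check of this block formula (an illustration only, with arbitrarily chosen blocks B and C) can be made as follows.

import numpy as np

B = np.array([[2.0, 0.0],
              [1.0, 3.0]])                 # invertible 2 x 2 block
C = np.array([[1.0],
              [4.0]])                      # arbitrary 2 x 1 block
A = np.hstack([B, C])                      # A = [B  C]

M = np.linalg.inv(B @ B.T + C @ C.T)       # (BB^t + CC^t)^{-1}
A_dagger = np.vstack([B.T @ M, C.T @ M])   # the block formula for A^+

print(np.allclose(A_dagger, np.linalg.pinv(A)))   # True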
and define (with B = A^*A)
\[
C_0 = I, \qquad C_1 = \operatorname{tr}(C_0B)\,I - C_0B, \qquad C_2 = \tfrac12\operatorname{tr}(C_1B)\,I - C_1B,
\]
and so on. It turns out that C_rB vanishes, where r is the rank of A, and the
earlier values have non-vanishing trace. Then we have the formula
\[
A^{\dagger} = \frac{r\,C_{r-1}A^{*}}{\operatorname{tr}(C_{r-1}B)}.
\]
This can be checked by noting that the various steps are independent of the
choice of basis. Hence we can choose bases so that the matrix of the operator
defined by A has the form
A1 0
0 0
where A1 = diag (λ1 , . . . , λr ). This is a simple calculation.
As an application, we consider the problem of the least square fitting of
data. Let (t1 , y1 ), . . . , (tn , yn ) be points in R2 . We determine real numbers
c, d so that the line y = ct + d provides an optimal fit. This means that c
and d should be a “solution” of the equation
\[
\begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_n & 1 \end{bmatrix}
\begin{bmatrix} c \\ d \end{bmatrix}
= \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.
\]
If we interpret this in the sense that c and d are to be chosen so that the
error
\[
(y_1 - ct_1 - d)^2 + \cdots + (y_n - ct_n - d)^2
\]
be as small as possible, then this reduces to calculating the Moore Penrose
inverse of
\[
A = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_n & 1 \end{bmatrix}
\]
since the solution is
\[
\begin{bmatrix} c \\ d \end{bmatrix} = A^{\dagger}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.
\]
If the ti are distinct (which we tacitly assume), then A† is given by the
formula
A† = (At A)−1 At .
In this case
\[
A^tA = \begin{bmatrix} \sum t_i^2 & \bar t \\ \bar t & n \end{bmatrix}
\]
where \bar t = t_1 + \cdots + t_n.
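The following sketch is purely illustrative, with made-up data points: it computes the coefficients c, d with the Moore-Penrose inverse and checks that numpy's least-squares routine gives the same answer.

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])         # made-up data, roughly y = 2t + 1

A = np.column_stack([t, np.ones_like(t)])  # rows (t_i, 1)
cd = np.linalg.pinv(A) @ y                 # (c, d) = A^+ y
print(cd)

# the same coefficients from numpy's least-squares solver
cd2, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(cd, cd2))                # True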
Similar applications of the Moore-Penrose inverse arise in the problem of
curve fitting. Here one is interested in fitting lower order curves to given
data. In chapter V we saw how the methods of linear algebra could be
applied. In practical applications, however, the data will be overdetermined
and will not fit the required type of curve exactly. In this case, the Moore-
Penrose inverse can be used to find a curve which provides what is, in a
certain sense, a best fit. We illustrate this with an example.
Example: Suppose that we are given a set of points P1 , . . . , Pn in the plane
and are looking for an ellipse which passes through them. In order to simplify
the arithmetic, we shall assume that the ellipse has equation of the form
αξ12 + βξ22 = 1
(i.e. that the principal axes are on the coordinate axes). Then we are required
to find (positive) α and β so that the equations
α(ξ1i )2 + β(ξ2i )2 = 1
are satisfied (where Pi has coordinates (ξ1i , ξ2i )). This is a linear equation
with matrix
\[
A = \begin{bmatrix} (\xi_1^1)^2 & (\xi_2^1)^2 \\ \vdots & \vdots \\ (\xi_1^n)^2 & (\xi_2^n)^2 \end{bmatrix}.
\]
Our theory would lead us to expect that the vector
\[
\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = A^{\dagger}\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\]
y i = ci t + d
considered above.
3) Suppose that f is an operator on the hermitian space V . Show that if f is
surjective, then f f t is invertible and the Moore-Penrose inverse of f is given
by the formula
f † = f t (f f t )−1 .
Interpret this in terms of matrices.
4) Show that f ∈ L(V ) commutes with its Moore-Penrose inverse if and only
if the ranges of f and f ∗ coincide and this is equivalent to the fact that we
have splitting
V = f (V ) ⊥ Ker (f )
resp. that there is an orthonormal basis with respect to which f has matrix
\[
\begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}
\]
where A is invertible.
5) Show that if A is an m × n matrix, then there are polynomials p and q so
that
A† = A∗ p(AA∗ ) and A† = q(A∗ A)A∗ .
6) Show that the Moore-Penrose inverse of A can be written down explicitly
with the help of the following integrals:
\[
A^{\dagger} = \int_0^{\infty} e^{-(A^*A)t}\,A^*\,dt
\]
\[
A^{\dagger} = \frac{1}{2\pi i}\int_c \frac{1}{z}\,(zI - A^*A)^{-1}A^*\,dz
\]
(the latter being integrated around a simple closed curve which encloses
the non-zero eigenvalues of A∗ A. These integrals of matrix-valued functions
are to be interpreted in the natural way i.e. they are integrated elementwise).
7) Show that if A is normal with diagonalisation
\[
U^*AU = \begin{bmatrix} A_1 & 0 \\ 0 & 0 \end{bmatrix},
\]
then
\[
A^{\dagger} = (A + P)^{-1} - P
\]
where
\[
P = U\begin{bmatrix} 0 & 0 \\ 0 & I_{n-r} \end{bmatrix}U^*.
\]
8) Use 7) to show that if C is a circulant with rank r = n − p and F ∗ CF =
diag (λ1 , . . . , λn ) is its diagonalisation as above, then
C † = C(I + K)−1 − K
9) Show how to use the Moore-Penrose inverse in obtaining a polynomial of
degree at most n − 1 to fit data
(t1 , x1 ), . . . , (tm , xm )
4.9 Positive definite matrices
We conclude this chapter with a discussion of the important topic of positive
definite matrices. Recall the following characterisation:
Proposition 37 Let f be a self-adjoint operator on V . Then the following
are equivalent:
• f is positive definite;
• all of the eigenvalues of f are positive;
• there is an invertible operator g on V so that f = g t g.
Proof. (1) implies (2): If λ is an eigenvalue, with unit eigenvector x, then λ = λ(x|x) = (f(x)|x) > 0.
(2) implies (3): Choose an orthonormal basis (xi ) of eigenvectors for f . Then
the matrix of f is diag(λ_1, \ldots, λ_n) where, by assumption, each λ_i > 0. Let g
be the operator with matrix diag(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}). Then f = g^tg. (3) implies
(1): If f = g^tg, then
\[
(f(x)|x) = (g(x)|g(x)) = \|g(x)\|^2 > 0
\]
if x ≠ 0.
There are corresponding characterisations of positive-semidefinite opera-
tors, resp. positive definite operators on hermitian spaces.
Suppose that the n × n matrix A is positive definite. By the above, A
has a factorisation B t B for some invertible n × n matrix. We shall now show
that B can be chosen to be upper triangular (in which case it is unique). For
if A = [aij ], then a11 > 0 (put X = (1, 0, . . . , 0) in the condition X t AX > 0).
Hence there is a matrix L1 of the form
\[
L_1 = \begin{bmatrix}
\frac{1}{a_{11}} & 0 & \cdots & 0 \\
-b_{21} & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
-b_{n1} & 0 & \cdots & 1
\end{bmatrix}
\]
so that the first column of L_1A is
\[
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]
(we are applying the Gaußian elimination method to reduce the first column).
Since A is symmetric, we have the equality
\[
L_1AL_1^t = \begin{bmatrix} 1 & 0 \\ 0 & A_2 \end{bmatrix}
\]
where A2 is also positive definite. Note that the matrix L1 is lower triangular.
Proceeding inductively, we obtain a sequence L1 , . . . , Ln−1 of such matrices so
that if L = Ln−1 . . . L1 , then LALt = I. Hence A has the factorisation B t B
where B = (L^{-1})^t and so is upper triangular. This is called the Cholesky
factorisation of A.
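A short numerical sketch (illustrative only): numpy computes a lower triangular factor R with A = RRᵗ, so the upper triangular B of the text is simply Rᵗ. The matrix below is an arbitrary positive definite example.

import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])        # a positive definite matrix

R = np.linalg.cholesky(A)              # lower triangular, A = R @ R.T
B = R.T                                # upper triangular factor with A = B^t B
print(np.allclose(A, B.T @ B))         # True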
An almost immediate Corollary of the above is the following characteri-
sation of positive definite matrices: A symmetric n × n matrix A is positive
definite if and only if det A_k > 0 for k = 1, \ldots, n, where A_k is the k × k matrix
[a_{ij}]_{i,j=1}^{k}.
Proof. Necessity: Note that if A is positive definite then det A > 0 since
the determinant is the product of the eigenvalues of A. Clearly each Ak is
positive definite if A is (apply the defining condition on A to the vectors of
the form (ξ1 , . . . , ξk , 0, . . . , 0)). Sufficiency: Let A satisfy the above condition.
In particular, a11 > 0. As above we find a lower triangular matrix L1 with
\[
\tilde{A} = L_1AL_1^t = \begin{bmatrix} 1 & 0 \\ 0 & C \end{bmatrix}
\]
whose value is \prod_{i=1}^{n}\bigl(\int_{\mathbf R} e^{-\lambda_i\eta_i^2}\,d\eta_i\bigr) (λ_1, \ldots, λ_n are the eigenvalues of A). The
integral is brought to this form by a change of variables.
for x, y ∈ V .
2) Let f and g be self-adjoint operators where g is positive definite. The
generalised eigenvalue problem for f and g is the equation f (x) = λg(x)
(where, as usual, only non-zero x are of interest). Show that the space has
a basis of eigenvectors for this problem (put g = ht h where h is invertible
and note that the problem is equivalent to the usual eigenvalue problem for
(h−1 )t f h−1 ).
3) Show that every operator f on a euclidean space has uniquely determined
representations
f = hu = u1 h1
where u and u_1 are isometries and h, h_1 are positive semi-definite. Show that
f is then normal if and only if h and u commute, in which case u1 = u and
h1 = h.
4) Show that if A is a real, positive definite matrix, then
\[
(\det A)^{1/n} = \min_{\det B = 1,\; B \ge 0} \tfrac{1}{n}\operatorname{tr}(AB).
\]
λ1 > · · · > λn ,
then
\[
\prod \lambda_i \le \prod (f(x_i)|x_i), \qquad \sum \lambda_i \le \sum (f(x_i)|x_i)
\]
(A − aI)(A − bI)
5 MULTILINEAR ALGEBRA
In this chapter we give a brief introduction to the topic of multilinear alge-
bra. This includes such important subjects as tensors and multilinear forms.
As usual, we employ a coordinate-free approach but show how to manipulate
with coordinates via suitable bases in the spaces considered. We begin with
the concept of the dual space.
fy : (ξ1 , . . . , ξn ) 7→ ξ1 η1 + · · · + ξn ηn .
Proof. It suffices to prove these for the special case V = Rn where they
are trivial
If f is a non-zero element in the dual V ∗ of V , then the subset
Hαf = {x ∈ V : f (x) = α}
• a point x in V is non-zero if and only if it lies on some hyperplane
which does not pass through zero;
• two points x and y in V are linearly independent if and only if there
are parallel, but distinct, hyperplanes of the form Hαf and H0f so that
x ∈ Hαf and y ∈ H0f or vice versa.
The dual basis: Suppose now that V has a basis (x1 , . . . , xn ). For each i
there is precisely one fi ∈ V ∗ so that
fi (xi ) = 1 and fi (xj ) = 0 (i 6= j).
In other words, f_i is that element of the dual space which associates to each
x ∈ V its i-th coefficient with respect to (x_j), for if x = \sum_{j=1}^{n}\lambda_j x_j, then
\[
f_i(x) = \sum_{j=1}^{n}\lambda_j f_i(x_j) = \lambda_i.
\]
We claim that (f_1, \ldots, f_n) is a basis for V^*. Firstly, each f ∈ V^* has the representation f = \sum_{i=1}^{n} f(x_i)f_i, i.e.
\[
f(x) = \sum_{i=1}^{n} f(x_i)f_i(x)
\]
for each x ∈ V, and this follows from an application of f to both sides of the
equation x = \sum_{i=1}^{n} f_i(x)x_i.
In order to see that the f_i are linearly independent suppose that the linear
combination \sum_{i=1}^{n}\lambda_i f_i is zero. Then applying this form to x_j and using the
defining condition on the fi we see that λj = 0.
(Of course, the last step is, strictly speaking unnecessary since we already
know that V and V ∗ have the same dimension).
The principle used in this argument will be applied again and so, in order
to avoid tedious repetitions, we state an abstract form of it as a Lemma:
Lemma 3 Let V be a vector space whose elements are functions defined on
a set S with values in R so that the arithmetic operations on V coincide with
the natural ones for functions (i.e. (x+y)(t) = x(t)+y(t), (λx)(t) = λx(t))).
Then if x1 , . . . , xn is a sequence in V and there are points t1 , . . . , tn in S so
that
xi (tj ) = 0 (i 6= j) or 1 (i = j),
the sequence x1 , . . . , xn is linearly independent.
The proof is trivial. If a linear combination \sum_{i=1}^{n}\lambda_i x_i vanishes, then evalu-
ation at t_j shows that λ_j = 0.
ation at tj shows that λj = 0.
Examples of dual bases: We calculate the dual bases to
• (1, 1), (1, 0) for R2 ;
• the canonical basis (1, t, . . . , tn ) for Pol (n).
(1) Let x1 = (1, 1), x2 = (1, 0) and let the dual basis be (f1 , f2 ) where
f1 = (ξ11 , ξ21), f2 = (ξ12 , ξ22). Then we have the four equations
f1 (x1 ) = ξ11 + ξ21 = 1 f2 (x1 ) = ξ12 + ξ22 = 0
f1 (x2 ) = ξ11 = 0 f2 (x2 ) = ξ12 = 1
with solutions f1 = (0, 1), f2 = (1, −1).
(2) Let fi be the functional
\[
p \mapsto \frac{p^{(i)}(0)}{i!}.
\]
Then of course f_i(t^j) = 1 if i = j and 0 otherwise. Hence (f_i) is the dual
basis and if p ∈ Pol(n), then its expansion is
\[
p = \sum_{i=0}^{n} f_i(p)\,t^i = \sum_{i=0}^{n}\frac{p^{(i)}(0)}{i!}\,t^i
\]
Since these hold for any x ∈ V we have
\[
f_i = \sum_{j=1}^{n} t_{ij}\,f_j' \qquad\text{resp.}\qquad f_i' = \sum_{j=1}^{n}\tilde{t}_{ij}\,f_j.
\]
for the transfer matrix S from (fi ) to (fj′ ) we see that S = (T t )−1 . Thus we
have proved
Proposition 39 If T is the transfer matrix from (xi ) to (x′j ), then (T t )−1
is the transfer matrix from (fi ) to (fj′ ).
We now consider duality for mappings. Suppose that f : V → W is a linear
mapping. We define the transposed mapping f t (which maps the dual W ∗
of W into V ∗ ) as follows: if g ∈ W ∗ , then f t (g) is defined in the natural way
as the composition g ◦ f i.e. we have the equation
f t (g) : x 7→ g(f (x)) or f t (g)(x) = g(f (x)).
As the notation suggests, this is the coordinate-free version of the transpose
of a matrix:
Proposition 40 If (x1 , . . . , xn ) resp. (y1 , . . . , ym ) are bases for V and W
resp. and f : V → W is a linear mapping with matrix A = [aij ], then the
matrix of f t with respect to the dual bases (g1 , . . . , gm ) and (f1 , . . . , fn ) is At ,
the transpose of A.
Proof. The matrix A is determined by the fact that f maps \sum_{j=1}^{n}\lambda_j x_j into
\sum_{i=1}^{m}\bigl(\sum_{j=1}^{n} a_{ij}\lambda_j\bigr)y_i or, in terms of the f_j's and g_i's,
\[
\sum_{j=1}^{n} a_{ij}f_j(x) = g_i(f(x)) = f^tg_i(x)
\]
The bidual: If V is a vector space, we can form the dual of its dual space
i.e. the space (V ∗ )∗ which we denote by V ∗∗ . As we have already seen, the
vector space V is isomorphic to its dual space V ∗ and hence also to its bidual.
However, there is an essential difference between the two cases. The first was
dependent on an (arbitrary) choice of basis for V . We shall now show how to
define a natural isomorphism from V onto V ∗∗ which is independent of any
additional structure of V .
This is done by means of the mapping i_V : V → V^{**} defined by
\[
i_V(x)(f) = f(x) \qquad (f \in V^*).
\]
Proof. Choose a basis (xi ) for V so that (x1 , . . . , xr ) is one for M. Let
(fi ) be the dual basis. Then it is clear that M o = [fr+1 , . . . , fn ] (cf. the
calculation above) from which the first half of the equation follows. The
second follows from the symmetry mentioned above.
Proof. It is clear that M ⊂ (M o )o . To verify equality, we count dimensions:
Proof. In fact, we only have to prove one of these results, say the first one
Ker f^t = (Im f)^o. This follows from the following chain of equivalences:
\[
(\operatorname{Ker} f^t)^o = ((\operatorname{Im} f)^o)^o = \operatorname{Im} f
\]
and V = V˜1 ⊕ V˜2 . Hence V1 × V2 is sometimes called the external direct
sum of V1 and V2 .
It is easily checked that the dual (V1 × V2 )∗ of such a product is naturally
isomorphic to V1∗ ×V2∗ where a pair (f, g) in the latter space defines the linear
form
(x, y) 7→ f (x) + g(y).
We now introduce a construction which is in some sense dual to that of
taking subspaces and which can sometimes be used in a similar way to reduce
dimension. Suppose that V1 is a subspace of the vector space V . We introduce
an equivalence relation ∼ on V by setting x ∼ y if and only if x − y ∈ V_1
(i.e. we are reducing V_1 and all the affine subspaces parallel to it to points).
V /V1 is, by definition, the corresponding set of equivalence classes {[x] : x ∈
V } where [x] = {y : y ∼ x}. V /V1 is a vector space in its own right, where
we define the operations by the equations
[x] + [y] = [x + y]
λ[x] = [λx]
and the mapping π : V → V /V1 which maps x onto [x] is linear and surjective.
Further we have the following characteristic property:
If we apply this to the case where W = R, we see that the dual space of V /V1
is naturally isomorphic to the polar V1o of V1 in V ∗ . From this it follows that
the dimension of (V /V1)∗ and hence of V /V1 is
dim V − dim V1 .
for R³.
2) Calculate the coordinates of the functional p \mapsto \int_0^1 p(t)\,dt on Pol(n) with
respect to the basis (fti ) where (ti ) is a sequence of distinct points in [0, 1]
and fti (p) = p(ti ).
3) Let (x1 , . . . , xn ) be a basis for the vector space V with dual basis (f1 , . . . , fn ).
Show that the set
(x1 , x2 − λ2 x1 , . . . , xn − λn x1 )
is a basis and that
(f1 + λ2 f2 + . . . λn fn , f2 + λ3 f3 + · · · + λn fn , . . . , fn )
is the corresponding dual basis.
4) Find the dual basis to the basis
(1, t − a, . . . , (t − a)n )
for Pol (n).
5) Let t0 , . . . , tn be distinct points of [0, 1]. Show that the linear forms fi :
x → x(ti ) form a basis for the dual of Pol (n). What is the dual basis?
6) Let V1 be a subspace of a vector space V and let f : V1 → W be a linear
mapping. Show that there is a linear mapping f˜ : V → W which extends f .
Show that if S is a subset of V and f : S → W an arbitrary mapping, then
there is an extension of f to a linear mapping f̃ from V into W if and only
if whenever \sum_{i=1}^{n}\lambda_i x_i = 0 (for x_1, \ldots, x_n \in S), then \sum_{i=1}^{n}\lambda_i f(x_i) = 0.
7) Let f be the linear form
\[
x \mapsto \int_0^1 x(t)\,dt
\]
5.2 Duality in euclidean spaces
As we have seen, any vector space V is isomorphic to its dual space. In the
special case where V = R^n we used the particular isomorphism y \mapsto f_y where
f_y is the linear form x \mapsto \sum_i \xi_i\eta_i. In this case we see the special role of the
scalar product and this suggests the following result:
5.3 Multilinear mappings
In this and the following section, we shall consider the concepts of multilinear
mappings and tensors. In fact, these are just two aspects of the same math-
ematical phenomenon—the difference in language having arisen during their
historical development. We begin with the concept of a multilinear mapping:
for any m. We have already met several examples of bilinear forms. For example,
the scalar product on a euclidean space and the bilinear form \sum_{i,j} a_{ij}\xi_i\eta_j
associated with a conic section. In fact, the typical bilinear form on R^n can
be written as
\[
f(x, y) = \sum_{i,j} a_{ij}\,\xi_i\eta_j
\]
In matrix notation this can be conveniently written in the form X t AY where
X and Y are the column matrices
\[
X = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix}, \qquad
Y = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_n \end{bmatrix}.
\]
Just as in the case of the representation of linear operators by matrices, this is
completely general and so if V1 and V2 are spaces with bases (x1 , . . . , xm ) resp.
(y1 , . . . , yn ) and if f ∈ L2 (V1 , V2 ), then A = [aij ] where aij = f (xi , yj ) is called
the matrix of f with respect to these bases and we have the representation
\[
f(x, y) = \sum_{i,j} a_{ij}\,\lambda_i\mu_j
\]
where x = \sum_i \lambda_i x_i and y = \sum_j \mu_j y_j. We can express this fact in a more
abstract way as follows. Suppose that f ∈ V1∗ and g ∈ V2∗ . Then we define
a linear functional f ⊗ g on V1 × V2 as follows:
f \otimes g : (x, y) \mapsto f(x)g(y).
Proposition 46 If (fi ) and (gi ) are the dual bases of V1 and V2 , then (fi ⊗gj )
is a basis for L2 (V1 , V2 ). Hence the dimension of the latter space is dim V1 ·
dim V2 .
Proof. The argument above shows that these elements span L2 (V1 , V2 ). On
the other hand, fi ⊗ gj (xk , yl ) vanishes unless i = k and j = l in which case
its value is one. Hence the set is linearly independent by the Lemma above.
We have thus seen that both linear mappings and bilinear forms are rep-
resentable by matrices. However, it is important to note that the formula for
the change in the representing matrices induced by new coordinate systems
is different in each case as we shall now see. For suppose that we introduce
new bases (x′1 , . . . , x′m ) resp. (y1′ , . . . , yn′ ) in the above situation with transfer
matrices S = [sij ] and T = [tkl ] i.e.
x′_j = ∑_i sij xi,    y′_l = ∑_k tkl yk.
Now if A′ is the matrix of f with respect to the new bases, then
a′_jl = f(x′_j, y′_l) = f(∑_i sij xi, ∑_k tkl yk) = ∑_{i,k} sij aik tkl,
which is the (j, l)-th element of S^t AT. Thus we have the formula
A′ = S^t AT
for the new matrix which should be compared with that for the change in
the matrix of a linear mapping.
In the particular case where V1 = V2 = V and we use the same basis for
each space, the above equation takes on the form
A′ = S^t AS.
It is instructive to verify this formula with the use of coordinates. In
matrix notation we have
f(x, y) = X^t AY = (X′)^t A′ Y′
where X, Y, X′, Y′ are the column matrices composed of the coordinates of
x and y with respect to the corresponding bases. Now we know that
X = SX ′ and Y = T Y ′ and if we substitute this in the formula we get
f(x, y) = (SX′)^t A (TY′) = (X′)^t (S^t AT) Y′
as required.
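As a quick numerical illustration (an added sketch in Python/numpy, not part of the original text; the names and random data are arbitrary), one can check that the value X^t AY computed in the old coordinates agrees with (X′)^t (S^t AT) Y′ computed in the new ones:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))        # matrix of the bilinear form f in the old bases
S = rng.standard_normal((m, m))        # transfer matrix for the first space
T = rng.standard_normal((n, n))        # transfer matrix for the second space

A_new = S.T @ A @ T                    # A' = S^t A T

X_new = rng.standard_normal(m)         # new coordinates of x
Y_new = rng.standard_normal(n)         # new coordinates of y
X_old, Y_old = S @ X_new, T @ Y_new    # X = S X', Y = T Y'

assert np.isclose(X_old @ A @ Y_old, X_new @ A_new @ Y_new)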
We can distinguish two particularly important classes of bilinear forms f
on the product V × V of a vector space with itself. f ∈ L2 (V ) is said to be
• symmetric if f(x, y) = f(y, x) (x, y ∈ V );
• alternating if f(x, y) = −f(y, x) (x, y ∈ V ).
If f has the coordinate representation
∑_{i,j} aij fi ⊗ fj
Proposition 47 Let f be a symmetric bilinear form on V . Then there is a
basis (xi ) of V and integers p, q with p + q ≤ n so that
f = ∑_{i=1}^p fi ⊗ fi − ∑_{i=p+1}^{p+q} fi ⊗ fi.
Proof. We prove the matrix form of the result. First note that the result
on the diagonalisation of symmetric operators on euclidean space provides an
orthogonal matrix U so that U^t AU = diag(λ1, . . . , λn) where the λi are the
eigenvalues and can be ordered so that the first p (say) are positive, the next
q are negative and the rest zero. Now put
T = diag(1/√λ1, . . . , 1/√λp, 1/√(−λp+1), . . . , 1/√(−λp+q), 1, . . . , 1).
Then if S = UT,
          [ Ip    0   0 ]
S^t AS =  [  0  −Iq   0 ] .
          [  0    0   0 ]
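For readers who wish to experiment, the following Python/numpy sketch (an added illustration, not from the text; the helper name canonical_congruence and the tolerance tol are assumptions of this sketch) carries out the construction U, T, S = UT of the proof:

import numpy as np

def canonical_congruence(A, tol=1e-10):
    """Return S with S.T @ A @ S = diag(I_p, -I_q, 0) for a symmetric matrix A."""
    lam, U = np.linalg.eigh(A)                 # A = U diag(lam) U^T with U orthogonal
    pos = np.where(lam > tol)[0]
    neg = np.where(lam < -tol)[0]
    zero = np.where(np.abs(lam) <= tol)[0]
    order = np.concatenate([pos, neg, zero])   # positive, then negative, then zero eigenvalues
    lam, U = lam[order], U[:, order]
    scale = np.ones_like(lam)                  # the diagonal of T
    nonzero = np.abs(lam) > tol
    scale[nonzero] = 1.0 / np.sqrt(np.abs(lam[nonzero]))
    return U @ np.diag(scale)                  # S = U T as in the proof

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
S = canonical_congruence(A)
print(np.round(S.T @ A @ S, 10))               # diag(1, -1, 0), so p = q = 1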
We now turn to the signs involved in the canonical form
[ Ip    0   0 ]
[  0  −Iq   0 ] .
[  0    0   0 ]
Proof. First note that p + q and p′ + q ′ are both equal to the rank of the
corresponding matrices and so are equal. Now put
for suitable p, q.
Proof. We choose a vector x̃1 with φ(x̃1, x̃1) ≠ 0. (If there is no such vector
then the form φ vanishes and the result is trivially true). Now let V1 be the
linear span [x̃1 ] and put
V2 = {y ∈ V : φ(x̃1 , y) = 0}.
A symmetric bilinear form φ is said to be non-singular if whenever
x ∈ V is such that φ(x, y) = 0 for each y ∈ V , then x vanishes. The reader
can check that this is equivalent to the fact that the rank of the matrix of φ
is equal to the dimension of V (i.e. p + q = n). In this case, just as in the
special case of a scalar product, the mapping
τ : x ↦ (y ↦ φ(x, y))
φ : (x, y) ↦ ξ1 η1 − ξ2 η2
on R2.
Most of the results above can be carried over to the space L(V1 , . . . , Vr ; W )
of multilinear mappings from V1 ×· · ·×Vr into W . We content ourselves with
the remark that if we have bases (x^1_1, . . . , x^1_{n1}), . . . , (x^r_1, . . . , x^r_{nr}) for V1, . . . , Vr
with the corresponding dual bases and (y1, . . . , yp) for W, then the set
(f^1_{i1} ⊗ · · · ⊗ f^r_{ir} ⊗ yj : 1 ≤ i1 ≤ n1, . . . , 1 ≤ ir ≤ nr, 1 ≤ j ≤ p)
is a basis for L(V1, . . . , Vr ; W).
f : (x, y) ↦ 2ξ1 η1 − ξ1 η2 + ξ2 η1 − ξ2 η2
f = 2f1 ⊗ f1 + f1 ⊗ f2 + 3f2 ⊗ f1 + f2 ⊗ f2.
Exercises: 1) Reduce the following forms on R3 to their canonical forms:
• (ξ1, ξ2, ξ3) ↦ ξ1 ξ2 + ξ2 ξ3 + ξ3 ξ1;
• (x, y) ↦ ξ2 η1 + ξ1 η2 + 2ξ2 η2 + 2ξ2 η3 + 2ξ3 η2 + 5ξ3 η3.
2) Find the matrices of the bilinear forms
(x, y) ↦ ∫_0^1 x(t)y(t) dt
(x, y) ↦ x(0)y(0)
(x, y) ↦ x(0)y′(0)
on Pol (n).
3) Let f be a symmetric bilinear form on V and φ be the mapping x ↦
f(x, x). Show that
• f(x, y) = (1/4)(φ(x + y) − φ(x − y)) (V real);
• f(x, y) = (1/4)(φ(x + y) − φ(x − y) + iφ(x + iy) − iφ(x − iy)) (V complex).
(This example shows how we can recover a symmetric 2-form from the
quadratic form it generates i.e. its values on the diagonal).
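The real polarisation identity in exercise 3 is easy to test numerically; here is a small illustrative Python/numpy sketch (added here, not part of the text):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                      # a symmetric matrix

f = lambda x, y: x @ A @ y             # the symmetric bilinear form
phi = lambda x: f(x, x)                # the associated quadratic form

x, y = rng.standard_normal(4), rng.standard_normal(4)
assert np.isclose(f(x, y), 0.25 * (phi(x + y) - phi(x - y)))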
4) Let x1 , . . . , xn−1 be elements of Rn . Show that there exists a unique
element y of Rn so that (x|y) = det X for each x ∈ Rn where X is the matrix
with rows x1 , x2 , . . . , xn−1 , x. If we denote this y by
x1 × x2 × · · · × xn−1
show that this cross-product is linear in each variable xi (i.e. it is an
(n − 1)-linear mapping from Rn × · · · × Rn into Rn ). (When n = 3, this
coincides with the classical vector product studied in Chapter II).
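A possible implementation of this generalised cross product, obtained by expanding det X along its last row into cofactors, is sketched below in Python/numpy (the function name cross_product is an assumption of this added sketch, not from the text):

import numpy as np

def cross_product(*vectors):
    """n-1 vectors in R^n -> the unique y with (x|y) = det X for all x."""
    M = np.vstack(vectors)                         # (n-1) x n matrix with rows x1, ..., x_{n-1}
    n = M.shape[1]
    assert M.shape[0] == n - 1
    y = np.empty(n)
    for i in range(n):
        minor = np.delete(M, i, axis=1)            # delete column i
        y[i] = (-1) ** (n + 1 + i) * np.linalg.det(minor)   # cofactor of the (n, i) entry
    return y

# For n = 3 this reproduces the classical vector product:
a, b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
print(cross_product(a, b))                         # ~ [0, 0, 1] = a x b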
5) Two spaces V and W with symmetric bilinear forms φ and ψ are said to
be isometric if there is a vector space isomorphism f : V → W so that
ψ(f (x), f (y)) = φ(x, y) (x, y ∈ V ).
Show that this is the case if and only if the dimensions of V and W coincide
and the ranks and signatures of φ and ψ coincide.
6) Let A be a symmetric, invertible n × n matrix. Show that the bilinear
form on Rn induced by A^{−1} is
                              [ 0    ξ1  . . .  ξn ]
(x, y) ↦ − (1/det A) · det    [ η1                 ]
                              [ ..         A       ]
                              [ ηn                 ]
7) Let φ be a symmetric bilinear form on the vector space V . Show that V
has a direct sum representation
V = V+ ⊕ V− ⊕ V0
where
V0 = {x : φ(x, x) = 0}
V+ = {x : φ(x, x) > 0} ∪ {0}
V− = {x : φ(x, x) < 0} ∪ {0}.
(x, y) ↦ ξ1 η1 − ξ2 η2
is often called the hyperbolic plane. Show that if V is a vector space with
a non-singular inner product and there is a non-zero vector x with (x|x) = 0, then
V contains a two dimensional subspace which is isometric to the hyperbolic
plane.
12) Suppose that φ and ψ are bilinear forms on a vector space V so that
Show that there is a basis for V with respect to which both φ and ψ have
upper triangular matrices. Deduce that if φ and ψ are symmetric, there is a
basis for which both are diagonal.
13) Let A = [aij] be an n × n symmetric matrix and let Ak denote the sub-
matrix
[ a11  . . .  a1k ]
[  ..          .. ]
[ ak1  . . .  akk ] .
Show that if the determinants of each of the Ak are non-zero, then the cor-
responding quadratic form Q(x) = (Ax|x) can be written in the form
∑_{k=1}^n (det Ak / det Ak−1) ηk^2
where ηk = ξk + ∑_{j=k+1}^n bjk ξj for suitable bjk (with the convention det A0 = 1).
Deduce that A is positive definite if and only if each det Ak is positive.
Can you give a corresponding characterisation of positive semi-definiteness?
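Exercise 13 contains the familiar test for positive definiteness by leading principal minors; the following added Python/numpy sketch (illustrative only, with ad hoc helper names) applies it to a small example:

import numpy as np

def leading_minors(A):
    """The determinants det A_1, ..., det A_n of the leading submatrices."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def is_positive_definite(A):
    return all(d > 0 for d in leading_minors(A))

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])     # a standard positive definite example
print(leading_minors(A))               # [2.0, 3.0, 4.0] (up to rounding), all positive
print(is_positive_definite(A))         # True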
14) Show that if f is a symmetric mapping in Lr(V ; W), then
f(x1, . . . , xr) = (1/(r! 2^r)) ∑ ε1 · · · εr f(yε, . . . , yε),   where yε = ε1 x1 + · · · + εr xr,
the sum being taken over all choices (εi) of sign (i.e. each εi is either 1 or
−1, so that there are 2^r summands).
15) Let φ be an alternating bilinear form on a vector space V . Show that V
has a basis so that the matrix of the form is
[  0    Ir   0 ]
[ −Ir   0    0 ] .
[  0    0    0 ]
16) Let φ be as above and suppose that the rank of φ is n. Then it follows
from the above that n is even i.e. of the form 2k for some k and V has a
basis so that the matrix of φ is
J = [  0    Ik ]
    [ −Ik   0  ] .
5.4 Tensors
We now turn to tensor products. In fact, these are also multilinear mappings
which are now defined on the dual space. However, because of the symmetry
between a vector space and its dual this is of purely notational significance.
(xi ⊗ yj : 1 ≤ i ≤ m, 1 ≤ j ≤ n)
is a basis for V1 ⊗ V2 and so each z ∈ V1 ⊗ V2 has a representation
z = ∑_{i,j} aij xi ⊗ yj where aij = z(fi, gj).
Once again, this last statement implies that every tensor is described by a
matrix. Of course the transformation laws for the matrix of a tensor are
again different from those that we have met earlier and in fact we have the
formula
A′ = S^{−1} A (T^{−1})^t
where A is the matrix of z with respect to (xi) and (yj), A′ is the matrix
with respect to (x′i) and (y′j), and S and T are the corresponding transfer
matrices.
Every tensor z ∈ V1 ⊗ V2 is thus representable as a linear combination
of so-called simple tensors i.e. those of the form x ⊗ y (x ∈ V1 , y ∈ V2 )
(stated more abstractly, the image of V1 × V2 in V1 ⊗ V2 spans the latter). Not
every tensor is simple. This can perhaps be most easily verified as follows:
if x = ∑_i λi xi and y = ∑_j µj yj, then the matrix of x ⊗ y is [λi µj] and this
has rank at most one. Hence if the matrix of a tensor has rank more than
one, it is not a simple tensor.
Tensor products of linear mappings: Suppose now that we have linear
mappings f ∈ L(V1 , W1 ) and g ∈ L(V2 , W2 ). Then we can define a linear
mapping f ⊗ g from V1 ⊗ V2 into W1 ⊗ W2 by putting
f ⊗ g(∑_i xi ⊗ yi) = ∑_i f(xi) ⊗ g(yi).
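If the product basis (xi ⊗ yj) is ordered row by row as in the example further below, the matrix of f ⊗ g is the Kronecker product of the matrices of f and g; the following added Python/numpy sketch (not from the text) checks the defining property on a simple tensor:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))        # matrix of f
B = rng.standard_normal((3, 3))        # matrix of g
K = np.kron(A, B)                      # matrix of f ⊗ g (6 x 6)

x, y = rng.standard_normal(2), rng.standard_normal(3)
lhs = K @ np.kron(x, y)                # (f ⊗ g)(x ⊗ y)
rhs = np.kron(A @ x, B @ y)            # f(x) ⊗ g(y)
assert np.allclose(lhs, rhs)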
(x^1_{i1} ⊗ · · · ⊗ x^r_{ir} : 1 ≤ i1 ≤ n1, . . . , 1 ≤ ir ≤ nr)
is a basis for the tensor product and so every tensor has a representation
∑_{i1,...,ir} t_{i1 ...ir} x^1_{i1} ⊗ · · · ⊗ x^r_{ir}.
Lp+q (V ∗ , . . . , V ∗ , V, . . . , V )
is
(x1 ⊗ · · · ⊗ xp ⊗ x′1 ⊗ · · · ⊗ x′p1 ⊗ f1 ⊗ · · · ⊗ fq′1 ).
Contraction: We can reduce a tensor of degree (p, q) to one of degree (p −
1, q − 1) by applying a covariant component to a contravariant one. More
precisely, there is a linear mapping c from ⊗^p_q V into ⊗^{p−1}_{q−1} V with
c(x1 ⊗ · · · ⊗ xp ⊗ f1 ⊗ · · · ⊗ fq) = fq(xp) · x1 ⊗ · · · ⊗ xp−1 ⊗ f1 ⊗ · · · ⊗ fq−1.
We continue with some brief remarks on the standard notation for tensors.
V is a vector space with basis (e1, . . . , en) and we denote by (f1, . . . , fn) the
dual basis. Then we write (e^1, . . . , e^n) for the corresponding basis for V,
identified with V∗ by way of a scalar product (i.e. e^i = τ^{−1}(fi)). Then we
have bases
• (e_{ij}) for V ⊗ V where e_{ij} = ei ⊗ ej and a typical tensor has a represen-
tation z = ∑_{i,j} ξ^{ij} e_{ij} where ξ^{ij} = fi ⊗ fj(z) = (e^{ij}|z) (with e^{ij} = e^i ⊗ e^j);
• (e_i^j) for V ⊗ V∗ where e_i^j = ei ⊗ e^j and a typical tensor z has the
representation ∑_{i,j} ξ^i_j e_i^j where ξ^i_j = (z|e^i_j).
In the general tensor space ⊗^p_q V we have a basis
(e_{i1 ...ip}^{j1 ...jq})
Example: Let f resp. g be the operators on R2 induced by the 2 × 2 matrices
A = [ 0   2 ]          B = [ −1   1 ]
    [ 1  −1 ] ,            [  2   0 ] .
We calculate the matrix of f ⊗ g with respect to the basis (y1, y2, y3, y4)
where
y1 = e1 ⊗ e1,  y2 = e1 ⊗ e2,  y3 = e2 ⊗ e1,  y4 = e2 ⊗ e2.
Then the matrix of f ⊗ g is the Kronecker product
          [  0   0  −2   2 ]
A ⊗ B =   [  0   0   4   0 ]
          [ −1   1   1  −1 ]
          [  2   0  −2   0 ] .
Note that the basis used to define the matrix of the tensor product is the one
which is obtained by considering the array (xi ⊗ yj) of tensor products and
numbering it by reading along the successive rows in the customary manner.
Using these matrix representations one can check the following results:
• r(f ⊗ g) = r(f )r(g);
• tr (f ⊗ g) = tr f · tr g;
• det(f ⊗ g) = (det f)^n (det g)^m where m is the dimension of the space
on which f acts and n that of the space on which g acts.
One proves these results by choosing bases for which the matrices have a
simple form and then examining the Kronecker product. For example, con-
sider the trace and determinant formulae. We choose bases so that f and g
are in Jordan form. Then it is clear that the Kronecker product is upper
triangular (try it out for small matrices) and the elements on the diagonal
are products of the form λi µj where λi is an eigenvalue of f and µj one of g.
The trace and determinant formulae follow by taking the sum resp. the
product and counting how often the various eigenvalues occur.
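These rules are easy to test numerically; the following added Python/numpy sketch (illustrative, not from the text) checks all three on random matrices:

import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
F = rng.standard_normal((m, m))        # matrix of f
G = rng.standard_normal((n, n))        # matrix of g
K = np.kron(F, G)                      # matrix of f ⊗ g

assert np.linalg.matrix_rank(K) == np.linalg.matrix_rank(F) * np.linalg.matrix_rank(G)
assert np.isclose(np.trace(K), np.trace(F) * np.trace(G))
assert np.isclose(np.linalg.det(K), np.linalg.det(F) ** n * np.linalg.det(G) ** m)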
If the underlying spaces to be tensored are euclidean, then the same is
true of the tensor product space. For example, if V1 and V2 are euclidean,
then the assignment
(x ⊗ y|x1 ⊗ y1) = (x|x1)(y|y1)
can be extended to a scalar product on V1 ⊗ V2. Note that the latter is defined
so as to ensure that if (xi ) is an orthonormal basis for V1 and (yj ) one for
V2 , then (xi ⊗ yj ) is also orthonormal. In this context, we have the natural
formula
(f ⊗ g)t = f t ⊗ g t
relating the adjoint of f ⊗ g with those of f and g. This easily implies that
the tensor product of two self-adjoint mappings is itself self-adjoint. The
same holds for normal mappings. Also we have the formula
(f ⊗ g)† = f † ⊗ g †
for the Moore-Penrose inverse of a tensor product.
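This last identity can also be checked numerically; here is a small added Python/numpy sketch (not from the text) using numpy's pinv and kron:

import numpy as np

rng = np.random.default_rng(4)
F = rng.standard_normal((3, 2))        # a rectangular matrix (no inverse exists)
G = rng.standard_normal((2, 4))

assert np.allclose(np.linalg.pinv(np.kron(F, G)),
                   np.kron(np.linalg.pinv(F), np.linalg.pinv(G)))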
whose Moore-Penrose inverses can be readily computed.
Tensor products can be used to solve certain types of matrix equation
and we continue this section with some remarks on this theme. Firstly we
note that if p is a polynomial in two variables, say
p(s, t) = ∑_{i,j} aij s^i t^j,
and A and B are n × n matrices, then we can define a new operator p(A, B)
by means of the formula
p(A, B) = ∑_{i,j} aij A^i ⊗ B^j
(Warning: this is not the matrix obtained by substituting A and B for s and
t resp.—the latter is an n × n matrix whereas the matrix above is n2 × n2 ).
The result which we shall require is the following:
Proposition 51 The eigenvalues of the above matrix are the scalars of the
form p(λi, µj) where λ1, . . . , λn are the eigenvalues of A and µ1, . . . , µn are
those of B.
This is proved by using a basis for which A has Jordan form. The required
matrix then has an upper triangular block form with diagonal blocks of the
type p(λi, B) i.e. matrices which are obtained by substituting an eigenvalue
λi of A for s and the matrix B for t. Such a block has eigenvalues p(λi, µj)
(j = 1, . . . , n), from which the result follows.
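As an added illustration (not from the text), the following Python/numpy sketch checks Proposition 51 for the particular polynomial p(s, t) = st + 2s:

import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

P = np.kron(A, B) + 2 * np.kron(A, np.eye(n))   # p(A, B) for p(s, t) = s t + 2 s

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)
expected = np.array([l * m + 2 * l for l in lam for m in mu])
computed = np.linalg.eigvals(P)

# every predicted value p(lambda_i, mu_j) occurs among the eigenvalues of P
for z in expected:
    assert np.isclose(computed, z).any()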
The basis of our application of tensor products is the following simple
remark. The space Mm,n of m × n matrices is of course identifiable as a
vector space with Rmn . In the following we shall do this systematically by
associating to an m × n matrix X = [X1 . . . Xn ] the column vector
      [ X1 ]
X̃ =  [ ..  ]
      [ Xn ]
i.e. we place the columns of X on top of each other. The property that we
shall require in order to deal with matrix equations is the following:
Proposition 52 Let A be an m × m matrix, X an m × n matrix and B an
n × n matrix. Then we have the equation
(AXB)~ = (B^t ⊗ A)X̃.
Proof. If X = [X1 . . . Xn], then the j-th column of AXB is ∑_{k=1}^n bkj AXk
and this is
[b1j A  b2j A  . . .  bnj A] X̃,
i.e. the j-th block of (B^t ⊗ A)X̃, which implies the result.
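The operation X ↦ X̃ is the column-stacking ("vec") operation; the following added Python/numpy sketch (not from the text) verifies the identity of Proposition 52, using flatten('F') to stack the columns:

import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 4
A = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))

vec = lambda M: M.flatten('F')         # stack the columns of M on top of each other
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))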
The following special cases will be useful below.
(AX)~ = (In ⊗ A)X̃;
(XB)~ = (B^t ⊗ Im)X̃;
(AX + XB)~ = ((In ⊗ A) + (B^t ⊗ Im))X̃.
Consider now a general linear matrix equation
A1 X B1 + · · · + Ar X Br = C.
Here the A's are given m × m matrices, the B's are n × n matrices, C is m × n
and X is the unknown m × n matrix. Using the above apparatus, we can rewrite
the equation in the form GX̃ = C̃ where G = ∑_{j=1}^r B_j^t ⊗ Aj.
Rather than consider the most general case, we shall confine our attention
to one special one which often occurs in applications, namely the equation
AX + XB = C.
In this case the matrix G takes the form
In ⊗ A + B^t ⊗ Im.
This is just p(B^t, A) where p(s, t) = s + t. Hence by our preparatory remarks,
the eigenvalues of G are the scalars of the form λi + µj where the λi are the
eigenvalues of A and the µj are those of B.
Hence we have proved the following result: the equation AX + XB = C has
a unique solution X for each right hand side C if and only if λi + µj ≠ 0 for
all i and j. For the above condition means that 0 is not an eigenvalue of G
i.e. this matrix is invertible.
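In the solvable case the vectorised system can be solved directly; the following added Python/numpy sketch (an illustration assuming the generic situation in which no λi + µj vanishes) recovers X from GX̃ = C̃:

import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

G = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))   # I_n ⊗ A + B^t ⊗ I_m
x = np.linalg.solve(G, C.flatten('F'))                # solve G x = vec(C)
X = x.reshape((m, n), order='F')                      # un-stack the columns

assert np.allclose(A @ X + X @ B, C)                  # X solves AX + XB = C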
We can also get information for the case where the above general equation
is not always solvable. Consider, for example, the equation
AXB = C
where we are not assuming that A and B are invertible. Suppose that S and
T are generalised inverses for A and B respectively. Then it is clear that
S ⊗ T is one for A ⊗ B. If we rewrite the equation AXB = C in the form
(B^t ⊗ A)X̃ = C̃,
then it follows from the general theory of such inverses that it has a solution
if and only if we have the equality
ASCTB = C.
In this case the general solution is
X = SCT + Y − SAY BT
where Y is an arbitrary m × n matrix.
In general, one would choose the Moore-Penrose inverses A† and B † for
S and T . This gives the solution
X = A† CB †
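Here is a small added Python/numpy sketch (not from the text) of this recipe: it builds a consistent right hand side, checks the solvability criterion with Moore-Penrose inverses and recovers a solution:

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))   # a singular 3 x 3 matrix
B = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))   # a singular 4 x 4 matrix
X0 = rng.standard_normal((3, 4))
C = A @ X0 @ B                          # a consistent right hand side

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)
assert np.allclose(A @ (Ap @ C @ Bp) @ B, C)   # the solvability criterion A S C T B = C
X = Ap @ C @ Bp                         # the particular solution X = A† C B†
assert np.allclose(A @ X @ B, C)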
x1 ⊗ · · · ⊗ xn = 0
with respect to the first basis and
z = ∑ t′_{i1 ...ip} f ′_{i1} ⊗ · · · ⊗ f ′_{ip}
f ↦ φ ◦ f ◦ ψ
Φ : f ↦ f^t
(f |g) = tr(g^t f).