
Vector Space and Matrices

CU PG-I
Anirban Kundu

Contents

1 Introduction: Two-dimensional vectors

2 Linear vector space
2.1 Inner Product Space and Dual Space
2.2 Cauchy-Schwarz Inequality
2.3 Metric Space
2.4 Linear Independence and Basis

3 Linear Operators
3.1 Some Special Operators
3.2 Projection Operators
3.3 Eigenvalues and Eigenvectors

4 Matrices
4.1 Some Special Matrices
4.2 Representation
4.3 Eigenvalues and Eigenvectors, Again
4.4 Degenerate Eigenvalues
4.5 Functions of a Matrix: The Cayley-Hamilton Theorem

This note is mostly based upon:


Dennery and Krzywicki: Mathematics for Physicists
Palash B. Pal: A Physicist’s Introduction to Algebraic Structures
1 Introduction: Two-dimensional vectors

Let us recapitulate what we have already learnt about vectors (for simplicity, consider 2-dimensional
vectors in the cartesian coordinates, but the entire thing can be generalised for higher dimensions).

• Any vector A can be written as

A = a1 i + a2 j ,    (1)

where i and j are unit vectors along the x- and y-axes respectively, so that i · i = j · j = 1 and i · j = 0,
and a1 and a2 are real numbers. From now on, we will use the shorthand A = (a1, a2) for Eq. (1).
This is called an ordered pair, because (a1, a2) ≠ (a2, a1). A set of n such numbers where ordering
is important is known as an n-tuple.

• Two vectors A = (a1, a2) and B = (b1, b2) can be added to give another vector C = (c1, c2) with

c1 = a1 + b1 ,  c2 = a2 + b2 .

• We can multiply any vector A = (a1, a2) by a real number d to get the vector D = dA. The individual
components are multiplied by d, so the magnitude of the vector is scaled by a factor of |d|. Thus,
αA + βB = (αa1 + βb1, αa2 + βb2).

• The null vector 0 = (0, 0) always satisfies A + 0 = 0 + A = A. Also, there is a vector −A = (−a1, −a2)
so that A + (−A) = 0.

• The scalar product of two vectors A and B is defined as

A · B = Σ_{i=1}^{2} ai bi .    (2)

We can also write this simply as ai bi with the convention that every repeated index is summed
over. This is known as the Einstein convention. In this convention, the index that is repeated can
be used twice and only twice; expressions like ai bi ci are meaningless. Such a repeated index is
also known as a dummy index: whether we write ai bi or ap bp is irrelevant, it means the same
thing.
• The magnitude or length of A is given by a = √(A · A) = √(ai ai). Thus, A · B = ab cos θ where θ is the
angle between the two vectors. Obviously, |A · B| ≤ |A||B|, which is ab. The equality sign applies
only if θ = 0 or π.

• There is, of course, nothing sacred about the choice of i and j as the basis vectors. One can rotate
the coordinate axes by an angle θ in the counterclockwise direction, so that for the new axes,

i′ = i cos θ + j sin θ ,  j′ = −i sin θ + j cos θ .    (3)

It follows from i · i = j · j = 1 and i · j = 0 that i′ · i′ = j′ · j′ = 1 and i′ · j′ = 0.
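As a quick numerical check of Eq. (3), the sketch below (an illustration only; the rotation angle and numpy are our own choices) builds the primed basis and verifies that it is again orthonormal.

```python
import numpy as np

theta = 0.3                          # an arbitrary rotation angle (radians)
i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

# primed basis vectors of Eq. (3)
i_p = np.cos(theta) * i_hat + np.sin(theta) * j_hat
j_p = -np.sin(theta) * i_hat + np.cos(theta) * j_hat

# the primed basis is again orthonormal
print(np.dot(i_p, i_p), np.dot(j_p, j_p), np.dot(i_p, j_p))   # ~1, ~1, ~0
```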

Q. What are the components of A in the primed frame?


Q. Show that A · i projects out the component of A along i. Hence show that A − (A · i) i is orthogonal to i.
This is something that we will use later.
Q. Given that εij Ai Bj is antisymmetric under A ↔ B, and ε12 = 1, what are the other components of εij?
(i, j = 1, 2)

2 Linear vector space

Let us consider an assembly of some abstract objects [1], which we will denote as | 〉. If we want to label
them, we might call them something like |1〉 or |i 〉. (You will see this object a lot in quantum mechanics;
this is called a ket. In fact, we will develop the idea of vector spaces keeping its application in quantum
mechanics in mind.) Let this assembly be called V. We say that the kets live in the space V. This will be
called a linear vector space if the kets satisfy the following properties.

1. ∀|a〉 , |b〉 ∈ V, (|a〉 + |b〉) ∈ V. [The symbol ∀ means “for all”.]

2. ∀|a〉 ∈ V, α|a〉 ∈ V too, where α is a real or complex number.

3. There exists a null element |0〉 ∈ V such that ∀|a〉 ∈ V, |a〉 + |0〉 = |0〉 + |a〉 = |a〉.

4. ∀|a〉 ∈ V, there exists an |a′〉 ∈ V such that |a〉 + |a′〉 = |0〉. The ket |a′〉 is called the inverse of |a〉.

The addition and multiplication as defined above satisfy the standard laws:

1. |a〉 + |b〉 = |b〉 + |a〉, |a〉 + (|b〉 + |c〉) = (|a〉 + |b〉) + |c〉 .

2. 1 · |a〉 = |a〉 .

3. α(β|a〉) = (αβ)|a〉, (α + β)|a〉 = α|a〉 + β|a〉, α(|a〉 + |b〉) = α|a〉 + α|b〉.

If all the conditions are satisfied, the set V is called a linear vector space (LVS) and the objects |〉 are called
vectors. Note that the existence of a scalar (i.e., dot) product is not essential for the definition.
We will often write the null element or the null vector |0〉 as just 0, because any vector |a〉 multiplied
by 0 gives |0〉. We can write

|a〉 = 1|a〉 = (0 + 1)|a〉 = 0|a〉 + 1|a〉 = 0|a〉 + |a〉 , (4)

so by definition 0|a〉 = |0〉 for all |a〉.


This also defines subtraction of vectors,

|i〉 − |j〉 = |i〉 + (−1)|j〉 = |i〉 + |j′〉 ,    (5)

where |j′〉 is the inverse of |j〉.


Examples:
1. The 3-dimensional cartesian vectors form an LVS. Note that the maximum number of linearly
independent vectors is 3 here. This is known as the dimensionality of the vector space. A more formal
definition will be given later.
2. All complex numbers z = a + ib form an LVS. z can be written as an ordered pair (a, b), and the
scalar product, z1* z2, is (a1 a2 + b1 b2, a1 b2 − a2 b1). Note that it is z1* z2 and not just z1 z2. As we will see
later, the scalar product must be real if we take the product with the same vector. This is obvious here; if
z1 = z2, the product is real. The null vector is 0 = 0 + i0 and the inverse of (a, b) is (−a, −b).

[1] They can be something like 2- or 3-dimensional vectors living in the cartesian (or any other) space, so they need not be
completely abstract. However, they need not be functions of spatial coordinates, or momenta.
3. All arrows in a 2-dimensional space, when multiplied only by a real number, form a 2-dimensional
vector space. This is because such multiplications cannot change the orientations of the arrows. If the
multiplication is by a complex number, both magnitude and direction change, and the LVS becomes
1-dimensional.
4. All sinusoidal waves of period 2π form an LVS. Any such wave can be described as sin(x + θ) =
cos θ sin x +sin θ cos x, so one can treat sin x and cos x as basis vectors. This is again a 2-dimensional LVS.
5. All 4-vectors in the Minkowski space-time, of the form x µ ≡ (c t , x, y, z), form a linear vector space,
with µ = 0, 1, 2, 3.

2.1 Inner Product Space and Dual Space

The scalar, or inner, product of two vectors |a〉 and |b〉 in a vector space is a scalar. Thus, the inner
product is defined to be a map of the form V × V → S, where S is the set of scalars on which V is defined
(e.g., for ordinary 3-dimensional vectors, the inner product is the dot product of two vectors, and S is the
set of all real numbers).
The inner product of two vectors |i 〉 and | j 〉 is denoted by 〈i | j 〉; the order is important, see below.
The symbol 〈 is called a bra, so that 〈|〉 gives a closed bracket. The notation is due to Dirac.
The properties of the scalar product are as follows.

1. 〈a|b〉 = 〈b|a〉* . Thus, in general, 〈a|b〉 ≠ 〈b|a〉, but 〈a|a〉 is real. Also, 〈a|a〉 ≥ 0, where the equality
sign comes only if |a〉 = |0〉. This defines √〈a|a〉 as the magnitude of the vector |a〉.

2. If |d 〉 = α|a〉 + β|b〉, then 〈c|d 〉 = α〈c|a〉 + β〈c|b〉 is a linear function of α and β. However, 〈d |c〉 =
α∗ 〈a|c〉 + β∗ 〈b|c〉 is a linear function of α∗ and β∗ and not of α and β. This follows trivially from
〈d |c〉 = 〈c|d 〉∗ .

A vector space where the inner product is defined is called an inner product space. Note that in-
ner product may not be defined for every LVS. A very good example is the Minkowski space-time. The
contraction of two 4-vectors, written as

aµ bµ = (a0 b0 − a · b)    (6)

is not an inner product, as a µ a µ need not be non-negative; in fact, it is negative for a space-like 4-vector,
and zero for a light-like 4-vector (e.g., for two space-time coordinates connected by a light ray). Thus,
while you can always say that a µ b µ is a 4-dimensional dot product, this has a very important difference
with the 3-dimensional dot product, which is an inner product. While a · a is the square of the length of
a, a µ a µ is not the square of the “length” of a µ . The difference stems from the relative minus sign between
the zero-th and the spatial coordinates of a µ b µ .
If somehow 〈a|b〉 = 〈b|a〉, the LVS is called a real vector space. Otherwise, it is complex. The LVS of
ordinary 2- or 3-dimensional vectors is a real vector space as A · B = B · A. That of the complex numbers
(example 2 above) is a complex vector space.

In quantum mechanics, the space in which the wavefunctions live is also an LVS. This is known as
the Hilbert space [2], after the celebrated German mathematician David Hilbert. This is an inner product
space, but with some special properties of convergence that make it complete [3]. We can indeed check
that the Hilbert space is an LVS; in particular, that is why the superposition principle in quantum me-
chanics holds. The wavefunctions are, however, complex quantities, and the scalar product is defined
as [4]

〈ψ1|ψ2〉 = ∫ ψ1* ψ2 d³x .    (7)

This is a complex vector space as 〈ψ1 |ψ2 〉 = 〈ψ2 |ψ1 〉∗ .


The vectors |a〉 and |b〉 are orthogonal if they satisfy 〈a|b〉 = 0. Thus, if a vector |a〉 is orthogonal to all
vectors |〉 ∈ S, it must be a null vector, as 〈a|a〉 = 0.
If a set of vectors satisfies 〈i|j〉 = δij, where the Kronecker delta is defined as

δij = 1 if i = j ,  δij = 0 if i ≠ j ,    (8)

the vectors are said to be orthonormal.


To facilitate the scalar product, and also taking into account the fact 2 above, we define, for every LVS
V with kets |i 〉, another space VD with bras 〈i |, so that there is a one-to-one correspondence between |i 〉
and 〈i | (that’s why we use the same label i ). The space VD is called the dual space of V, and |i 〉 and 〈i | are
dual vectors to each other. For example, if the LVS V contains n-dimensional column matrices, the dual
LVS VD contains n-dimensional row matrices, which are the hermitian conjugates of the corresponding
members of V. The dual space must satisfy:

1. The product of 〈a| and |b〉 is just the scalar product:

〈a| · |b〉 = 〈a|b〉 . (9)

2. If |d 〉 = α|a〉 + β|b〉, then


〈d | = 〈a|α∗ + 〈b|β∗ . (10)
That is the rule to get a dual vector: change the ket to the corresponding bra, and complex conju-
gate the coefficients.

The scalar product is defined only between a vector from V and another vector from the dual space
VD . Of course, a lot of spaces, like the space for cartesian vectors, are self-dual; there is no distinction
between the original space and the dual space. And that is why you never knew about dual space when
learning dot product of ordinary vectors.
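For the concrete case mentioned above, where kets are column matrices and bras are the hermitian-conjugate row matrices, the defining properties of the scalar product can be checked numerically. This is only an illustrative sketch; the sample entries are arbitrary.

```python
import numpy as np

# kets as column matrices in C^2 (arbitrary sample entries)
a = np.array([[1.0 + 2.0j], [0.5 - 1.0j]])
b = np.array([[2.0 - 1.0j], [3.0 + 0.5j]])

bra_a = a.conj().T        # the dual (bra) is the hermitian-conjugate row matrix
bra_b = b.conj().T

inner_ab = (bra_a @ b).item()     # <a|b>
inner_ba = (bra_b @ a).item()     # <b|a>

print(np.isclose(inner_ab, np.conj(inner_ba)))   # <a|b> = <b|a>*
print((bra_a @ a).item().real >= 0)              # <a|a> is real and non-negative
```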
[2] The proper definition of the Hilbert space may be found in, say, Dennery and Krzywicki. It is an infinite-dimensional space,
but most of the time in quantum mechanics we work with a small subset of the original space which is finite.
[3] All inner product spaces are metric spaces. We will develop the concept of metric spaces later, but it is a space where we
can have the concept of a distance between two vectors: |c〉 = |a〉 − |b〉. If we can have a sequence of elements in an LVS where
the separation between successive elements becomes smaller and smaller as we proceed, and ultimately becomes infinitely
small, that is called a Cauchy sequence. If every Cauchy sequence in an LVS converges to an element within that LVS, it is called
a complete metric space.
[4] The vectors, the kets and bras, in the Hilbert space are not in general functions of x. The wavefunction ψ(x) is a scalar product
of |ψ〉 and |x〉, the eigenkets of the position operator x̂: ψ(x) = 〈x|ψ〉. So 〈ψ1|ψ2〉 = 〈ψ1| [∫ d³x |x〉〈x|] |ψ2〉 = ∫ d³x ψ1* ψ2 .

2.2 Cauchy-Schwarz Inequality

Consider the ket


|c〉 = |a〉 − x〈b|a〉|b〉 , (11)
where x is a real number. So 〈c| = 〈a| − x〈a|b〉〈b|. Thus,

〈c|c〉 = x²〈b|a〉〈a|b〉〈b|b〉 − 2x〈b|a〉〈a|b〉 + 〈a|a〉 ≥ 0 ,    (12)

because of the definition of the “length" of |c〉.


This is a quadratic in x, with all real coefficients (〈b|a〉〈a|b〉 is real), and can never be negative. Thus,
the function can never go below the x-axis; it can at most touch it from above. In other words, there
cannot be two distinct real roots, so the discriminant cannot be positive:

4 (〈b|a〉〈a|b〉)² ≤ 4 (〈b|a〉〈a|b〉〈b|b〉) 〈a|a〉 ⇒ 〈a|a〉〈b|b〉 ≥ 〈b|a〉〈a|b〉 .    (13)

Eq. (13) is known as the Cauchy-Schwarz inequality. For ordinary vectors, this just means

|A|²|B|² ≥ |A · B|² ⇒ |cos θ| ≤ 1 .    (14)

If either |a〉 or |b〉 is a null vector, this results in a trivial equality.


The triangle inequality of Euclidean geometry follows from (13). Consider three vectors |1〉, |2〉, and
|3〉 in a two-dimensional plane forming a triangle, so that |3〉 = |1〉 + |2〉 (this is a vector sum). Thus,

〈3|3〉 = 〈1|1〉 + 〈2|2〉 + 〈1|2〉 + 〈2|1〉
      = 〈1|1〉 + 〈2|2〉 + 2 Re〈1|2〉
      ≤ 〈1|1〉 + 〈2|2〉 + 2|〈1|2〉|
      ≤ 〈1|1〉 + 〈2|2〉 + 2√(〈1|1〉〈2|2〉)   (using the CS inequality) ,    (15)

so that √〈3|3〉 ≤ √〈1|1〉 + √〈2|2〉.
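Both inequalities can be spot-checked numerically. The sketch below is an illustration, not a proof; the random complex vectors and the dimension are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=3) + 1j * rng.normal(size=3)   # two arbitrary complex kets
b = rng.normal(size=3) + 1j * rng.normal(size=3)

inner = lambda x, y: np.vdot(x, y)                 # <x|y>, conjugating the first slot
norm = lambda x: np.sqrt(inner(x, x).real)

# Eq. (13): <a|a><b|b> >= <b|a><a|b> = |<a|b>|^2
print(inner(a, a).real * inner(b, b).real >= abs(inner(a, b)) ** 2)   # True

# Eq. (15): ||a+b|| <= ||a|| + ||b||
print(norm(a + b) <= norm(a) + norm(b))                               # True
```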

2.3 Metric Space

A set R is called a metric space if a real, non-negative number ρ(a, b) is associated with any pair of its elements
a, b ∈ R (remember that a and b need not be numbers) and
(1) ρ(a, b) ≥ 0 for any pair of elements a, b ∈ R;
(2) ρ(a, b) = ρ(b, a) (symmetry);
(3) ρ(a, b) = 0 only when a = b, i.e., when they refer to the same element in R;
(4) ρ(a, b) + ρ(b, c) ≥ ρ(a, c) (triangle inequality).
The number ρ(a, b) may be called the distance between a and b. The set R, together with the binary
operation ρ satisfying these conditions, constitute the metric space. The word “metric” comes from
metre, i.e., something with which one can measure the distance between two elements. For n-dimensional
Euclidean space,

ρ(a, b) = √( Σ_{i=1}^{n} (ai − bi)² ) .    (16)

Do not confuse ρ(a, b) with 〈a|b〉. In particular, ρ(a, a) = 0 (where a is some point in the LVS) but
√〈a|a〉 (where |a〉 is a vector) defines the length or norm of that vector. For example, 〈ψ|ψ〉 = 1 means
that the wavefunction has been normalized to unity. More precisely, if one thinks of |a〉 as the radius vector
starting at the origin and ending at the point a, and similarly for b, then ρ(a, b) is the norm of the vector
|a〉 − |b〉 (or the other way round).
If we have three vectors |a〉, |b〉, and |c〉 in an LVS and we define

|1〉 = |a〉 − |b〉 , |2〉 = |b〉 − |c〉 , |3〉 = |a〉 − |c〉 , (17)

then |1〉, |2〉, |3〉 satisfy the triangle inequality, and also the first two conditions of a metric space, so we
can say:
If the scalar product is defined in an LVS, it is a metric space.
Note that the scalar product need not be defined for all linear vector spaces, like the 4-dimensional
Minkowski space-time. Obviously, this is not a metric space.

1. The metric associated with the vectors A = (a1, a2, a3) and B = (b1, b2, b3) in three-dimensional
cartesian coordinates is √((a1 − b1)² + (a2 − b2)² + (a3 − b3)²).

2. If two points (x, y, z) and (x + dx, y + dy, z + dz) are sufficiently close, the metric is
√(dx² + dy² + dz²). We can also write this as √(dxi dxi) using the Einstein convention. We may
also remove the root by writing

ds² = dx² + dy² + dz² = dxi dxi .    (18)

3. Similarly, in the two-dimensional plane polar coordinate system, the separation between two points
(r, θ) and (r + dr, θ + dθ) is

ds² = dr² + r² dθ² .    (19)

In three-dimensional spherical polar coordinates (deduce this):

ds² = dr² + r² dθ² + r² sin²θ dφ² .    (20)

Note that there can be other types of metrics too, not only the distance between two points. For
example, consider the following function:

ρ(x, y) = 0 if x = y ,  ρ(x, y) = 1 otherwise .    (21)

Check that this satisfies all the properties of a metric space and can act as a metric on any set.

2.4 Linear Independence and Basis

If in a vector space V, we have vectors |v1〉, |v2〉, .... |vn〉 such that

a1 |v1〉 + a2 |v2〉 + .... + an |vn〉 = Σ_{i=1}^{n} ai |vi〉 = |0〉 ,    (22)

where the ai's are some scalars, holds with at least two of the ai's being nonzero, the vectors are called lin-
early dependent. If the only solution for this is ai = 0 for all i, the vectors are linearly independent. The
maximum number of linearly independent vectors in a vector space is called the dimensionality of the

vector space. If there is an infinite number of such linearly independent vectors, the space is infinite di-
mensional.
If the dimensionality of the vector space is n, the n linearly independent vectors |v1〉, |v2〉, .... |vn〉
form a basis of the vector space, and are said to span the space. Any other vector can be written as a
linear combination of the basis vectors:

|a〉 = Σ_{i=1}^{n} ai |i〉 ,    (23)
but no basis vector can be written as a linear combination of the other basis vectors; that is what linear
independence is all about. The numbers a i are called components of the vector |a〉 in the |i 〉 basis (com-
ponents depend on the choice of basis). It is easy to convince yourself that in a 3-dimensional cartesian
space, i, j, k are a suitable choice of basis; however, vectors like (i + j + k), (i − j), (i + j − 2k) are also
linearly independent and can act as a basis. How do you check that they are indeed linearly independent?
The 3-dimensional space can have at most three linearly independent vectors, that is why we call it
3-dimensional. On the other hand, there are infinitely many independent states for a particle in, say, an
infinitely deep one-dimensional potential well (we take the depth to be infinite so that all such states are
bound; for a well of finite depth, there will be a finite number of bound states and an infinite number of
unbound states). When we expand any arbitrary function in a Fourier series, there are infinitely many
sine or cosine functions in the expansion, and they are linearly independent [5], so this is another infinite
dimensional LVS.
Given a basis, the components are unique. Suppose the vector |a〉 can be written both as Σ ai |i〉 and
Σ bi |i〉. Subtracting one from the other, Σ (ai − bi)|i〉 = 0, so by the condition of linear independence of
the basis vectors, ai = bi for all i.
Starting from any basis |ai〉, where i can be finite or infinite (but these basis vectors need not be
either orthogonal or normalised), one can always construct another orthonormal basis |i〉. This is known
as Gram-Schmidt orthogonalisation. The procedure is as follows.
1. Normalise the first vector of the original basis:

|1〉 = (1/√〈a1|a1〉) |a1〉 .    (24)

Thus, 〈1|1〉 = 1.
2. Construct |2′〉 by taking |a2〉 and projecting out the part proportional to |1〉:

|2′〉 = |a2〉 − 〈1|a2〉|1〉 ,    (25)

which ensures 〈1|2′〉 = 0. Divide |2′〉 by its norm to get |2〉.
3. Repeat this procedure, so that

|m′〉 = |am〉 − Σ_{i=1}^{m−1} 〈i|am〉|i〉 .    (26)

It is easy to check that |m′〉 is orthogonal to |i〉, i = 1 to m − 1. Normalise |m′〉 to unit norm,

|m〉 = (1/√〈m′|m′〉) |m′〉 .    (27)

This completes the proof.

[5] How do we know that the sine and cosine functions are linearly independent? Easy: one cannot express, e.g., sin 2x as a linear
combination of sin x and cos x; sin x cos x is not a linear combination.
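A minimal sketch of the Gram-Schmidt procedure of Eqs. (24)-(27) in code; the function name and the sample (non-orthogonal) basis are our own choices.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors following Eqs. (24)-(27):
    subtract the projections on the previously built orthonormal vectors,
    then normalise."""
    basis = []
    for a in vectors:
        v = np.asarray(a, dtype=complex)
        for e in basis:
            v = v - np.vdot(e, v) * e          # |m'> = |a_m> - sum_i <i|a_m> |i>
        v = v / np.sqrt(np.vdot(v, v).real)    # divide by the norm
        basis.append(v)
    return basis

# an arbitrary non-orthogonal basis of R^3
e1, e2, e3 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(np.round([np.vdot(e1, e2), np.vdot(e1, e3), np.vdot(e2, e3)], 12))  # all ~0
print(np.round([np.vdot(e1, e1), np.vdot(e2, e2), np.vdot(e3, e3)], 12))  # all ~1
```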
Q. If the basis vectors are orthonormal, 〈i | j 〉 = δi j , show that the components of any vector |a〉 are pro-
jected out by the projection operator |i 〉〈i |.
Q. A two-dimensional vector is written as A = 3i + 2j. What will be its components in a basis obtained by
rotating the original basis by π/4 in the counterclockwise direction?
Q. In a two-dimensional space, the basis vectors |i 〉 and | j 〉 are such that 〈i |i 〉 = 1, 〈 j | j 〉 = 2, 〈i | j 〉 = 1.
Construct an orthonormal basis.
Q. Suppose we take in the three-dimensional cartesian space the following vectors as basis: a = i + j + k,
b = i − 2k, c = 2j + k. Normalize a and construct two other orthonormal basis vectors. (This shows you
that the orthonormal basis is not unique, of course we can take i, j, k as another orthonormal basis.)

3 Linear Operators

A function f (x) associates a number y with another number x according to a certain rule. For example,
f(x) = x² associates, with every number x, its square. The space for x and y need not be identical. For
example, if x is any real number, positive, negative, or zero, f (x) is confined only to the non-negative
part of the real number space.
Similarly, we can assign with every vector |x〉 of an LVS, another vector |y〉, either of the same LVS or
of a different one, according to a certain rule. We simply write this as

|y〉 = O|x〉 , (28)

and O is called an operator, which, acting on |x〉, gives |y〉. We often put a hat on O, like Ô, to indicate
that this is an operator. Unless a possible confusion can occur, we will not use the hat.
We will be interested in linear operators, satisfying

O[α|a〉 + β|b〉] = αO|a〉 + βO|b〉 . (29)

In very special circumstances, we may need an antilinear operator:

O[α|a〉 + β|b〉] = α∗O|a〉 + β∗O|b〉 . (30)


A function f(x) may not be defined for all x; f(x) = √x is not defined for x < 0 if both x and f(x) are
confined to be real. Similarly, O|x〉 may not be defined for all |x〉. The set of vectors |x〉 ∈ S for which O|x〉
is defined is called the domain of the operator O.
O|x〉 may take us outside S. The totality of all such O|x〉, where |x〉 is any vector in S and in the domain
of O, is called the range of the operator O. In quantum mechanics, we often encounter situations where
the range is S itself, or a part of it. We’ll see examples of both.

• The identity operator 1 takes a vector to itself without any multiplicative factors: 1|x〉 = |x〉 , ∀|x〉 ∈
S.

• The null operator 0 annihilates all vectors in S: 0|x〉 = |0〉 = 0 , ∀|x〉 ∈ S.

• If A and B are two linear operators acting on S, A = B means A|x〉 = B |x〉 , ∀|x〉 ∈ S.

• C = A + B means C |x〉 = A|x〉 + B |x〉 , ∀|x〉 ∈ S.

• D = AB means D|x〉 = A[B|x〉] , ∀|x〉 ∈ S. Note that AB is not necessarily the same as BA. A good
example is the angular momentum operators in quantum mechanics: Jx Jy ≠ Jy Jx. If AB = BA, the
commutator [A, B] = AB − BA is zero, and we say that the operators commute.

• The identity operator obviously commutes with any other operator A, as A1|x〉 = A|x〉, and 1[A|x〉] =
A|x〉.

• One can multiply an operator with a number. If A|x〉 = |y〉, then αA|x〉 = α|y〉. Obviously, Aα = αA.

• One can formally write higher powers of the operators. For example, A²|x〉 = A[A|x〉]. Similarly,

e^A ≡ 1 + A + (1/2!) A² + (1/3!) A³ + ··· .    (31)

• The operator A can also act on the dual space S D . If A|a〉 = |c〉, one may write 〈b|A|a〉 = 〈b|c〉. The
vector 〈d | = 〈b|A is defined in such a way that 〈d |a〉 = 〈b|c〉.
This is quite a common practice in quantum mechanics, e.g.,

〈ψ1|H|ψ2〉 = ∫ ψ1* H ψ2 d³x .    (32)

Note that 〈b|A is not the dual of A|b〉. To see this, consider the operator α1. Acting on |x〉, this gives
α|x〉. The dual of this is 〈x|α∗ , which can be obtained by operating 1α∗ , and not 1α, on 〈x|.

Q. If A and B are linear operators, show that A + B and AB are also linear operators.
Q. If A + B = 1 and AB = 0, what is the value of A² + B²?
Q. Show that e^A e^{−A} = 1.
Q. If [A, B] = 0, show that e^A e^B = e^{A+B}.
Q. If [A, B] = B, show that e^A B e^{−A} = eB (i.e., the number e times B).
Q. If O|x〉 = −|x〉, check whether O is linear. Do the same if O|x〉 = [|x〉]*.
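Using matrices as stand-ins for operators, the first two exponential identities above can be illustrated numerically. This is only a spot-check, not the requested proofs; the sample matrices are arbitrary and scipy's expm supplies the matrix exponential of the series (31).

```python
import numpy as np
from scipy.linalg import expm          # matrix exponential, cf. the series (31)

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))            # an arbitrary matrix standing in for an operator
D1 = np.diag([1.0, 2.0, 3.0])          # two diagonal matrices, which commute
D2 = np.diag([-1.0, 0.5, 2.0])

print(np.allclose(expm(A) @ expm(-A), np.eye(3)))        # e^A e^{-A} = 1
print(np.allclose(expm(D1) @ expm(D2), expm(D1 + D2)))   # e^A e^B = e^{A+B} when [A,B] = 0
```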

3.1 Some Special Operators

The identity operator 1 leaves the ket invariant:

1|j〉 = |j〉 ,    (33)

and so it can be written in terms of orthonormal basis vectors as

1 = Σ_i |i〉〈i| ,    (34)

which immediately gives

|j〉 = Σ_i cij |i〉 ,    (35)

where cij = 〈i|j〉 are the corresponding projections.

The inverse operator of A is the operator B if AB = 1 or BA = 1. The operator B is generally denoted by
A⁻¹. If A⁻¹A = 1, A⁻¹ is called the left inverse of A, and sometimes denoted by A⁻¹_l. Similarly, if AA⁻¹ = 1,
A⁻¹ is called the right inverse, and written as A⁻¹_r. Note that there is no guarantee that any one or both
of the inverse operators will exist. Also, in general, A A⁻¹_l ≠ 1 and A⁻¹_r A ≠ 1.

However, if both the inverses exist, they must be equal:

A⁻¹_l A = 1  ⇒  A⁻¹_l A A⁻¹_r = A⁻¹_r  ⇒  A⁻¹_l = A⁻¹_r ,    (36)

where we have used the fact that any operator O multiplying 1 gives O.
The inverse in this case is also unique. To prove this, suppose we have two different inverses A⁻¹_1 and
A⁻¹_2 (whether left or right does not matter any more). Now

A⁻¹_1 A = 1  ⇒  A⁻¹_1 A A⁻¹_2 = A⁻¹_2  ⇒  A⁻¹_1 = A⁻¹_2 ,    (37)

leading to a reductio ad absurdum.


One can now define a unique inverse operator A⁻¹ for the operator A. It also follows that (AB)⁻¹ =
B⁻¹A⁻¹, because

(B⁻¹A⁻¹)(AB) = B⁻¹ A⁻¹A B = B⁻¹ 1 B = B⁻¹B = 1 ,    (38)

so B⁻¹A⁻¹ is indeed the inverse. Similarly, (AB)(B⁻¹A⁻¹) = 1.
Suppose the scalar product is defined in S. If there is an operator B corresponding to an operator A
such that
〈a|A|b〉 = 〈b|B |a〉∗ , ∀ |a〉 , |b〉 ∈ S , (39)
then B is called the adjoint operator of A and denoted by A † . Thus, it follows that 〈b|A † is the dual vector
of A|b〉.
Now, 〈a|(A † )† |b〉 = 〈b|A † |a〉∗ = 〈a|A|b〉, so (A † )† = A. Also,

〈a|A † B † |b〉 = [〈a|A † ][B † |b〉] = {[〈b|B ][A|a〉]}∗ = 〈b|B A|a〉∗ = 〈a|(B A)† |b〉 , (40)

where we have used the duality property of the vectors. Thus, for any two operators A and B ,

A † B † = (B A)† . (41)

If A† = A, the operator is called self-adjoint or hermitian. If A† = −A, it is anti-hermitian. The hermitian
operators play an extremely important role in quantum mechanics.
The operator d/dx is anti-hermitian but i d/dx is hermitian. This is because

〈ψ1| d/dx |ψ2〉 = ∫ ψ1* (dψ2/dx) dx = ∫ d/dx (ψ1* ψ2) dx − ∫ (dψ1*/dx) ψ2 dx .    (42)

But the first integral is zero as both wavefunctions must vanish at the boundary of the integration region. Now,
complete the proof by showing that i d /d x is hermitian. This shows that momentum is indeed a hermitian opera-
tor in quantum mechanics.

Another important class of operators is where U† = U⁻¹. They are called unitary operators. One can
write

|U|a〉|² = [〈a|U†][U|a〉] = 〈a|U†U|a〉 = 〈a|U⁻¹U|a〉 = 〈a|a〉 = ||a〉|² ,    (43)

which means that operation by unitary operators keeps the length or norm of any vector unchanged.
The nomenclature is quite similar to that used for matrices. We will show later how one can represent [6]
the action of an operator on a vector by conventional matrix multiplication.
Note that the combination |a〉〈b| acts as a linear operator. Operating on a ket, this gives a ket; oper-
ating on a bra, this gives a bra.

(|a〉〈b|)|c〉 = 〈b|c〉|a〉 ,  〈d|(|a〉〈b|) = 〈d|a〉〈b| .    (44)

Also,

〈x|(|a〉〈b|)|y〉 = (〈x|a〉)(〈b|y〉) = [〈y|b〉〈a|x〉]* = 〈y|(|b〉〈a|)|x〉* ,    (45)

so |b〉〈a| is the adjoint of |a〉〈b|.
To sum up the important points once again:

1. The dual of A|x〉 is 〈x|A † . This is the definition of the adjoint operator. If A = A † , the operator is
hermitian.

2. In an expression like 〈x|A|x〉, A can act either on 〈x| or |x〉. But if A|x〉 = |y〉, 〈x|A ≠ 〈y| unless A is
hermitian.

3. The ∇ operator is a 3-dimensional vector as it satisfies all the transformation properties of a vector
(there is a 4-dimensional analogue too). Thus, ∇ · A is a scalar. ∇ × A is a cross product, which is just
a way to combine two vectors to get another vector: A = B × C ⇒ Ai = εijk Bj Ck. This is a vector
operator, a vector whose components are operators. Note that d/dx is an operator whose inverse
does not exist unless you specify the integration constant.

3.2 Projection Operators

Consider the LVS S of two-dimensional vectors, schematically written as |x〉. Let |i 〉 and | j 〉 be the two
unit vectors along the x- and y-axes. The operator P i = |i 〉〈i |, acting on any vector |x〉, gives 〈i |x〉|i 〉, a
vector along the x-direction with a magnitude 〈i |x〉.
Obviously, the set P i |x〉 is a one-dimensional LVS. It contains all those vectors of S that lie along
the x-direction, contains the null element, and also the unit vector |i 〉, which can be obtained by P i |i 〉.
Such a space S 0 , all whose members are members of S but not the other way round, is called a nontrivial
subspace of S. The null vector, and the whole set S itself, are trivial subspaces.
The operator P i is an example of the class known as projection operators. We will denote them by
P . These operators project out a subspace of S. Once a part is projected out, another projection cannot
do anything more, so P 2 = P . A projection operator must also be hermitian, since it is necessary that it
projects out the same part of the original space S and the dual space S D . Any operator that is hermitian
and satisfies P 2 = P is called a projection operator.
Suppose P1 and P2 are two projection operators. They project out different parts of the original LVS.
Is P1 + P2 a projection operator too? If P1† = P1 and P2† = P2, then (P1 + P2)† = P1 + P2. However,

(P1 + P2)² = P1² + P2² + P1P2 + P2P1 = (P1 + P2) + P1P2 + P2P1 ,    (46)

so that P1P2 + P2P1 must be zero. Multiplying from the left by P1 and using P1² = P1 gives P1P2 + P1P2P1 = 0.
Similarly, multiply by P1 from the right, and subtract one from the other, to get

P1P2 − P2P1 = 0 ,    (47)

so that the only solution is P1P2 = P2P1 = 0. Projection operators like this are called orthogonal projection
operators. As an important example, for any P, 1 − P is a projection operator orthogonal to P. They sum up
to 1, which projects the entire space onto itself.

[6] We will also explain what representation means.
In short, if several projection operators P1, P2, · · · Pn satisfy

Pi Pj = Pi for i = j ,  0 otherwise ,    (48)

then Σ_i Pi is also a projection operator.
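For the two-dimensional example that opened this subsection, the defining properties P² = P, mutual orthogonality, and the fact that the projectors sum up to 1 can be checked directly. This is an illustrative sketch with |i〉, |j〉 written as column matrices.

```python
import numpy as np

# unit vectors |i>, |j> of a two-dimensional space as column matrices
i_ket = np.array([[1.0], [0.0]])
j_ket = np.array([[0.0], [1.0]])

P_i = i_ket @ i_ket.T                  # |i><i|
P_j = j_ket @ j_ket.T                  # |j><j|

print(np.allclose(P_i @ P_i, P_i))                  # P^2 = P
print(np.allclose(P_i @ P_j, np.zeros((2, 2))))     # orthogonal: P_i P_j = 0
print(np.allclose(P_i + P_j, np.eye(2)))            # they sum up to 1
```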
Q. Show that d m /d x m is anti-hermitian if m is odd and hermitian if m is even.
Q. Show that if P is a projection operator, so is 1 − P , and it is orthogonal to P .
Q. For a two-dimensional cartesian space, show that |i 〉〈i | and | j 〉〈 j | are orthogonal projection operators
(|i 〉 and | j 〉 are the unit vectors along x and y axes respectively.)

3.3 Eigenvalues and Eigenvectors

If the effect of an operator A on a vector |a〉 is to yield the same vector multiplied by some constant,

A|a〉 = a|a〉 ,    (49)

we call it an eigenvalue equation, the vector |a〉 an eigenvector of A, and a the corresponding eigenvalue of A.
If A and B possess a set of common eigenvectors that spans the space, then A and B commute. This is
easy to show: for any common eigenvector |x〉, with A|x〉 = a|x〉 and B|x〉 = b|x〉,

(AB)|x〉 = A(b|x〉) = bA|x〉 = ab|x〉 ,  (BA)|x〉 = B(a|x〉) = aB|x〉 = ab|x〉 ,    (50)

and so [A, B]|x〉 = 0 for every vector of such a basis, and hence for every vector in the space; therefore
[A, B] is the null operator 0, or AB − BA = 0.


The reverse is not necessarily true. It is true only if both A and B have non-degenerate eigenvec-
tors. If there are two or more eigenvectors for which the eigenvalues of an operator are the same, the
eigenvectors are called degenerate. If no two eigenvectors have same eigenvalues, they are called non-
degenerate. If an operator A has two degenerate eigenvectors |a 1 〉 and |a 2 〉 with the same eigenvalue a,
any linear combination c|a 1 〉 + d |a 2 〉 is also an eigenvector, with the same eigenvalue (prove it). This is
not true for non-degenerate eigenvectors.
Suppose both A and B have non-degenerate eigenvectors, and [A, B ] = 0. Also suppose |x〉 is an
eigenvector (often called an eigenket) of A with eigenvalue a. We can write

[A, B ]|x〉 = 0|x〉 = 0 ⇒ AB |x〉 = B A|x〉 ⇒ A(B |x〉) = a(B |x〉) , (51)

or B |x〉 is also an eigenvector of A with the same eigenvalue a. But A has non-degenerate eigenvalues;
so this can only happen if B |x〉 is just some multiplicative constant times |x〉, or B |x〉 = b|x〉. Thus,
commuting operators must have simultaneous eigenvectors if they are non-degenerate.

One can have a counterexample from the angular momentum algebra of quantum mechanics. The
vectors are labelled by the angular momentum j and its projection m on some axis, usually taken to
be the z-axis. These vectors, |j m〉, are eigenvectors of the operator J² = Jx² + Jy² + Jz² [7]. They are also
eigenvectors of Jz but not of Jx or Jy. So here is a situation where J² and Jx commute, but the |j m〉 states,
though eigenvectors of J², are not eigenvectors of Jx. The reason is that all these |j m〉 states are degenerate
with respect to J² with an eigenvalue of j(j + 1)ħ².

[7] Although we have used the cartesian symbols x, y, z, the angular momentum operators can act on a completely different
space.
The eigenvalues of hermitian operators are necessarily real. Suppose A is hermitian, A = A † , and
A|a〉 = a|a〉. Then

〈a|A|a〉 = a〈a|a〉 ,
〈a|A|a〉 = 〈a|A † |a〉∗ = 〈a|A|a〉∗ = a ∗ 〈a|a〉 , (52)

as the scalar product 〈a|a〉 is real. So a = a ∗ , or hermitian operators have real eigenvalues.
If an hermitian operator has two different eigenvalues corresponding to two different eigenvectors,
these eigenvectors must be orthogonal to each other, i.e., their scalar product must be zero. Suppose for
an hermitian operator A, A|a〉 = a|a〉 and A|b〉 = b|b〉. So,

〈b|A|a〉 = a〈b|a〉 ,
〈a|A|b〉 = 〈b|A † |a〉∗ = 〈b|A|a〉∗ = b〈a|b〉 ⇒ 〈b|A|a〉 = b〈b|a〉 , (53)

using the fact that b is real and 〈a|b〉* = 〈b|a〉. Subtracting one from the other, and noting that a ≠ b, we
get 〈a|b〉 = 0, or they are orthogonal to each other.
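Both results can be seen numerically for a randomly generated hermitian matrix. This is a sketch, not a proof; numpy's eigh is designed for hermitian matrices and returns real eigenvalues and orthonormal eigenvectors, in line with Eqs. (52) and (53).

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = M + M.conj().T                     # H = H†, a hermitian matrix

evals, evecs = np.linalg.eigh(H)       # eigh assumes (and exploits) hermiticity
print(evals)                           # all real, as Eq. (52) demands
# eigenvectors of different eigenvalues are orthogonal, Eq. (53):
print(np.allclose(evecs.conj().T @ evecs, np.eye(3)))   # True
```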

4 Matrices

An m × n matrix A has m rows and n columns, and the i j -th element Ai j lives in the i -th row and the
j -th column. Thus, 1 ≤ i ≤ m and 1 ≤ j ≤ n. If m = n, A is called a square matrix.
The sum of two matrices A and B is defined only if they are of same dimensionality, i.e., both have
equal number of rows and equal number of columns. In that case, C = A + B means Ci j = Ai j + Bi j for
every pair (i , j ).
The inner product C = AB is defined if and only if the number of columns of A is equal to the number
of rows of B. In this case, we write

C = AB  ⇒  Cij = Σ_{k=1}^{n} Aik Bkj ,    (54)

one can also drop the explicit summation sign using the Einstein convention for repeated indices. If A
is an m × n matrix, and B is an n × p matrix, C will be an m × p matrix. Only if m = p are both AB and BA
defined. They are of the same dimensionality if m = n = p, i.e., both A and B are square matrices. Even if
the product is defined both ways, they need not commute; AB is not necessarily equal to BA, and in this
respect matrices differ from ordinary numbers, whose products always commute.
The direct, outer, or Kronecker product of two matrices is defined as follows. If A is an m × m matrix
and B is an n × n matrix, then the direct product C = A ⊗ B is an mn × mn matrix with elements Cpq =
Aij Bkl, where p = n(i − 1) + k and q = n(j − 1) + l. For example, if A and B are both 2 × 2 matrices,

A ⊗ B = ( a11 B   a12 B )
        ( a21 B   a22 B )

        ( a11 b11   a11 b12   a12 b11   a12 b12 )
      = ( a11 b21   a11 b22   a12 b21   a12 b22 ) .    (55)
        ( a21 b11   a21 b12   a22 b11   a22 b12 )
        ( a21 b21   a21 b22   a22 b21   a22 b22 )

A row matrix R of dimensionality 1 × m has only one row and m columns. A column matrix C of
dimensionality m × 1 similarly has only one column but m rows. Here, both RC and CR are
defined; the first is a number (or a 1 × 1 matrix), and the second an m × m square matrix.
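Numpy's kron computes exactly this direct product, so Eq. (55) and the index rule can be spot-checked; the sample matrices below are arbitrary.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

C = np.kron(A, B)            # the direct (Kronecker) product of Eq. (55)
print(C.shape)               # (4, 4)

# spot-check C_pq = A_ij B_kl with p = n(i-1)+k, q = n(j-1)+l (1-based, n = 2):
# i = 2, j = 1, k = 1, l = 2  ->  p = 3, q = 2
print(C[2, 1] == A[1, 0] * B[0, 1])   # True (both are 15)
```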
The unit matrix of dimension n is an n × n square matrix whose diagonal entries are 1 and all other
entries are zero: 1ij = δij. The unit matrix commutes with any other matrix: A1 = 1A = A, assuming that
the product is defined both ways (so that A is also a square matrix of the same dimension). From now on,
unless mentioned explicitly, all matrices will be taken to be square ones.
If two matrices P and Q satisfy PQ = QP = 1, P and Q are called inverses of each other, and we denote
Q by P⁻¹. It is easy to show that the left and right inverses are identical; the proof is along the same lines as
the proof for linear operators.
The necessary and sufficient condition for the inverse of a matrix A to exist is a nonzero determinant:
det A ≠ 0. The matrices with zero determinant are called singular matrices and do not have an inverse.
Note that for a square array

| a1  a2  a3  ··· |
| b1  b2  b3  ··· |
| c1  c2  c3  ··· |
| ··· ··· ··· ··· |

the determinant is defined as εijk... ai bj ck ..., where εijk... is an extension of the usual Levi-Civita symbol:
+1 for an even permutation of (i, j, k, ...) = (1, 2, 3, ...), −1 for an odd permutation, and 0 if any two indices
are repeated.
If we strike out the i-th row and the j-th column of the n × n determinant, the determinant of the
reduced (n − 1) × (n − 1) matrix is called the ij-th minor of the original matrix. For example, if we omit
the first row (with ai) and one of the columns in turn, we get the M1j minors. The determinant Dn for
this n × n matrix can also be written as

Dn = Σ_{j=1}^{n} (−1)^{1+j} aj M1j .    (56)

If the i-th row is omitted, the first factor would have been (−1)^{i+j}.
As A⁻¹A = 1, (det A⁻¹) × (det A) = 1, as unit matrices of any dimension always have unit determinant.
A similarity transformation on a matrix A is defined by

A′ = R⁻¹AR .    (57)

Similarity transformations keep the determinant invariant, as

det A′ = det (R⁻¹AR) = det R⁻¹ det A det R = det A .    (58)

Another thing that remains invariant under a similarity transformation is the trace of a matrix, which is
just the algebraic sum of the diagonal elements: tr A = Σ_i Aii. Even if A and B do not commute, their
traces commute, in the sense that tr (AB) = tr (BA). It can be generalized: the trace of the product of any
number of matrices remains invariant under a cyclic permutation of those matrices. The proof follows
from the definition of trace, and the product of matrices:

tr (ABC · · · P) = Σ_i (ABC · · · P)ii = Σ_{i,j,k,l,...,p} Aij Bjk Ckl · · · Ppi .    (59)

All the indices are summed over, so we can start from any point; e.g., if we start from the index k, we get
the trace as tr (C · · · PAB).
Note that this is valid only if the matrices are finite-dimensional. For infinite-dimensional matrices, tr (AB)
need not be equal to tr (BA). A good example can be given from quantum mechanics. One can write both position
and momentum operators, x and p, as infinite-dimensional matrices. The canonical commutation relation (which
underlies the uncertainty relation), written in the form of matrices, now reads [x, p] = iħ1. The trace of the
right-hand side is definitely nonzero; in fact, it is infinite because the unit matrix is infinite-dimensional. The
trace of the left-hand side is also nonzero, as tr (xp) ≠ tr (px); they are infinite-dimensional matrices too.
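For finite-dimensional matrices the cyclic property is easy to illustrate numerically; the random matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(4, 4)) for _ in range(3))

print(np.allclose(np.trace(A @ B), np.trace(B @ A)))    # tr(AB) = tr(BA)
print(np.allclose(A @ B, B @ A))                        # False: AB != BA in general

t = np.trace(A @ B @ C)                                 # cyclic invariance
print(np.allclose(t, np.trace(C @ A @ B)), np.allclose(t, np.trace(B @ C @ A)))
```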

Under a similarity transformation (57),

tr A′ = tr (R⁻¹AR) = tr (RR⁻¹A) = tr (1A) = tr A .    (60)

Now, some definitions.

• A diagonal matrix Ad has zero or nonzero entries along the diagonal, but necessarily zero entries
in all the off-diagonal positions. Two diagonal matrices always commute. Suppose the ii-th entry
of Ad is ai and the jj-th entry of Bd is bj. Then

(Ad Bd)ik = (Ad)ij (Bd)jk = ai δij bj δjk = ai bi δik ,    (61)

as the product is nonzero only when i = j = k. We get an identical result for (Bd Ad)ik, so they
always commute. A diagonal matrix need not commute with a nondiagonal matrix.

• The complex conjugate A∗ of a matrix A is given by (A∗ )i j = (Ai j )∗ , i.e., by simply taking the com-
plex conjugate of each entry. A need not be diagonal.

• The transpose AT of a matrix A is given by (AT )i j = A j i , i.e., by interchanging the row and the
column. The transpose of an m × n matrix is an n × m matrix; the transpose of a row matrix is a
column matrix, and vice versa. We have

(AB)Tij = (AB) j i = A j k Bki = BTik ATk j = (BT AT )i j , (62)

or (AB)T = BT AT .

• The hermitian conjugate A† of a matrix A is given by A†i j = (A j i )∗ , i.e., by interchanging the row
and the column entries and then by taking the complex conjugate (the order of these operations
does not matter). If A is real, A† = AT .

Q. Show that (AB)† = B†A†.
Q. If two matrices anticommute, i.e., AB = −BA, show that their product has trace zero.
Q. Show that for each of the three Pauli matrices

σ1 = ( 0  1 )      σ2 = ( 0  −i )      σ3 = ( 1   0 )
     ( 1  0 ) ,         ( i   0 ) ,         ( 0  −1 ) ,

σi⁻¹ = σi. What are the hermitian conjugates of these matrices?
Q. Show that

exp(i θ σ2 / 2) = cos(θ/2) + i σ2 sin(θ/2) .    (63)

Q. The Pauli matrices satisfy [σi, σj] = 2i εijk σk and {σi, σj} = 2δij. Show that for any two vectors A and
B,

(σ⃗ · A)(σ⃗ · B) = A · B + i σ⃗ · (A × B) .    (64)
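The Pauli-matrix identities quoted in these exercises can be spot-checked numerically; the following is an illustration, not the requested proofs (scipy's expm supplies the matrix exponential, and the vectors A, B are arbitrary).

```python
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# sigma_i^2 = 1, so each Pauli matrix is its own inverse
print(all(np.allclose(s @ s, I2) for s in (s1, s2, s3)))       # True

# Eq. (63)
theta = 0.7
print(np.allclose(expm(1j * theta * s2 / 2),
                  np.cos(theta / 2) * I2 + 1j * np.sin(theta / 2) * s2))   # True

# Eq. (64) for two arbitrary vectors A and B
A = np.array([1.0, 2.0, -1.0])
B = np.array([0.5, -3.0, 2.0])
sigma = [s1, s2, s3]
sA = sum(a * s for a, s in zip(A, sigma))
sB = sum(b * s for b, s in zip(B, sigma))
rhs = np.dot(A, B) * I2 + 1j * sum(c * s for c, s in zip(np.cross(A, B), sigma))
print(np.allclose(sA @ sB, rhs))                                # True
```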

4.1 Some Special Matrices

Some more definitions, valid for square matrices only.


1. If a real matrix O is the inverse of its transpose Oᵀ, i.e., Oᵀ = O⁻¹, or OOᵀ = OᵀO = 1, it is called
an orthogonal matrix.
2. If a complex matrix U is the inverse of its hermitian conjugate U†, i.e., U† = U⁻¹, or UU† = U†U = 1,
it is called a unitary matrix.
3. If a matrix H is equal to its hermitian conjugate, i.e., H = H†, it is called an hermitian matrix. As you
can see, all the three Pauli matrices are hermitian.
4. If Sij = Sji, i.e., S = Sᵀ, it is called a symmetric matrix. For a symmetric matrix of dimensionality n ×
n, the ½n(n−1) entries above the diagonal are identical to the ½n(n−1) entries below the diagonal. Thus,
a symmetric matrix has only n² − ½n(n−1) = ½n(n+1) independent entries. If the matrix is complex, we
have to multiply by a factor of 2 to get the number of independent elements.
5. If Aij = −Aji, i.e., A = −Aᵀ, it is called an antisymmetric matrix. For an antisymmetric matrix of
dimensionality n × n, the ½n(n−1) entries above the diagonal are the algebraic opposites of the ½n(n−1)
entries below the diagonal. The diagonal entries are obviously all zero. Thus, an antisymmetric matrix
has only ½n(n−1) independent entries; multiply by 2 if the entries are complex.
Any matrix P can be written as a sum of a symmetric and an antisymmetric matrix. S = P + Pᵀ is
obviously symmetric, and A = P − Pᵀ is antisymmetric, so P can be written as ½(S + A).
The n × n orthogonal matrix O has ½n(n−1) independent elements. We have n² elements to start
with, but the condition

(OOᵀ)ij = Oik (Oᵀ)kj = Oik Ojk = δij    (65)

gives several constraints. There are n such equations with the right-hand side equal to 1, which look like
Σ_k O1k² = 1 for i = j = 1, and so on. There are ⁿC₂ = ½n(n−1) conditions with the right-hand side equal
to zero, which look like

Σ_k O1k O2k = 0 .    (66)

Thus, the total number of independent elements is n² − n − ½n(n−1) = ½n(n−1). Note that OᵀO = 1
does not give any new constraints; it is just the transpose of the original equation.
Rotation in an n-dimensional space is nothing but transforming a vector by operators which can be
represented (we are yet to come to the exact definition of representation) by n × n orthogonal matrices,
with ½n(n−1) independent elements, or angles. Thus, a 2-dimensional rotation can be parametrized by
only one angle; a 3-dimensional rotation by three, which are known as Eulerian angles [8].

[8] There is a conventional choice of Eulerian angles, but it is by no means unique.
One can have an identical exercise for the n × n unitary matrix U. We start with 2n² real elements, as
the entries are complex numbers. The condition

(UU†)ij = Uik (U†)kj = Uik U*jk = δij    (67)

gives the constraints. There are again n such equations with the right-hand side equal to 1, which look
like

Σ_k |U1k|² = 1    (68)

for i = j = 1, and so on. All entries on the left-hand side are necessarily real. There are ⁿC₂ = ½n(n−1)
conditions with the right-hand side equal to zero, which look like

Σ_k U1k U*2k = 0 .    (69)

However, the entries are complex, so a single such equation is actually two equations, for the real and
the imaginary parts. Thus, the total number of independent elements is 2n² − n − n(n−1) = n². Again,
U†U = 1 does not give any new constraints; it is just the hermitian conjugate of the original equation.

4.2 Representation

Suppose we have an orthonormal basis |i〉, so that any vector |a〉 can be written as in (23). If the space
is n-dimensional, one can express these basis vectors as n-component column matrices, with all en-
tries equal to zero except one, which is unity. For example, in a 3-dimensional space, one can write the
orthonormal basis vectors as

|1〉 = ( 1 )      |2〉 = ( 0 )      |3〉 = ( 0 )
      ( 0 ) ,          ( 1 ) ,          ( 0 ) .    (70)
      ( 0 )            ( 0 )            ( 1 )

Of course there is nothing sacred about the orthonormal basis, but it makes the calculation easier. The
vector |a〉 can be expressed as

|a〉 = ( a1 )
      ( a2 ) .    (71)
      ( a3 )

Consider an operator A that takes |a〉 to |b〉, i.e., A|a〉 = |b〉. Obviously |b〉 has the same dimensionality as
|a〉, and can be written in a form similar to (71). The result is the same if we express the operator A as an
n × n matrix A with the following property:

Aij aj = bi .    (72)

We now call the matrix A a representation of the operator A, and the column matrices a, b representations
of the vectors |a〉 and |b〉 respectively.
Examples:

1. In a two-dimensional space, suppose A|1〉 = |1〉 and A|2〉 = −|2〉. Then a11 = 1, a22 = −1, a12 = a21 =
0, so that

A = ( 1   0 )
    ( 0  −1 ) .    (73)

2. In a three-dimensional space, take A|1〉 = |2〉, A|2〉 = |3〉, A|3〉 = |1〉. Thus, a21 = a32 = a13 = 1 and
the rest of the entries are zero, and

A = ( 0  0  1 )
    ( 1  0  0 ) .    (74)
    ( 0  1  0 )

3. Suppose the Hilbert space is 2-dimensional (i.e., the part of the original infinite-dimensional
space in which we are interested) and the operator A acts like A|ψ1〉 = (1/√2)[|ψ1〉 + |ψ2〉] and A|ψ2〉 =
(1/√2)[−|ψ1〉 + |ψ2〉]. Thus, a11 = a22 = a21 = −a12 = 1/√2.
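A small numerical sketch of the representation idea, using example 2: with the convention A|j〉 = Σ_i Aij |i〉, the j-th column of the representing matrix is the image of the basis ket |j〉. The construction below and the sample vector are our own choices.

```python
import numpy as np

# orthonormal basis kets of a 3-d space as column matrices, Eq. (70)
basis = [np.eye(3)[:, [k]] for k in range(3)]

# operator of example 2: A|1> = |2>, A|2> = |3>, A|3> = |1>
# the j-th column of the representing matrix is the image of |j>
A = np.hstack([basis[1], basis[2], basis[0]])
print(A.astype(int))                   # matches Eq. (74)

# Eq. (72): A_ij a_j = b_i
a = np.array([[2.0], [-1.0], [0.5]])   # an arbitrary vector |a>
b = A @ a
print(b.ravel())                       # [ 0.5  2.  -1. ]
```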

4.3 Eigenvalues and Eigenvectors, Again

If there is a square matrix A and a column matrix a such that Aa = αa, then a is called an eigenvector of
A and α is the corresponding eigenvalue. Again, this is exactly the same as what we got for operators and
vectors, eq. (49).
A square matrix can be diagonalized by a similarity transformation: Ad = RAR−1 . For a diagonal
matrix, the eigenvectors are just the orthonormal basis vectors, with the corresponding diagonal entries
as eigenvalues. (A note of caution: this is strictly true only for non-degenerate eigenvalues, i.e., when all
diagonal entries are different. Degenerate eigenvalues pose more complication which will be discussed
later.) If the matrix A is real symmetric, it can be diagonalized by an orthogonal transformation, i.e., R
becomes an orthogonal matrix. If A is hermitian, it can be diagonalized by a unitary transformation:

Ad = UAU† , (75)

where U† = U−1 . While the inverse does not exist if the determinant is zero, even such a matrix can be
diagonalized. However, determinant remains invariant under a similarity transformation, so at least one
of the eigenvalues will be zero for such a singular matrix.
Trace also remains invariant under similarity transformations. Thus, it is really easy to find out the
eigenvalues of a 2 × 2 matrix. Suppose the matrix is

( a  b )
( c  d ) ,

and the eigenvalues are λ1 and λ2. We need to solve two simultaneous equations,

λ1 λ2 = ad − bc ,  λ1 + λ2 = a + d ,    (76)

and that gives the eigenvalues.


We can find the eigenvalues by inspection for some special cases in higher-dimensional matrices too.
Consider, for example, the matrix

A = ( 1  1  1 )
    ( 1  1  1 ) .    (77)
    ( 1  1  1 )

The determinant is zero (all minors are zero for A) and there must be at least one zero eigenvalue. How
do we know how many eigenvalues are actually zero?

Suppose the ij-th element of an n × n matrix A is denoted by aij. If the system of equations

a11 x1 + a12 x2 + · · · + a1n xn = 0,
a21 x1 + a22 x2 + · · · + a2n xn = 0,
···
an1 x1 + an2 x2 + · · · + ann xn = 0,    (78)

has only the trivial solution (all xi = 0), then the equations are linearly independent and the matrix is nonsingular,
i.e., det A is nonzero. In this case no eigenvalue can be zero, and the matrix is said to be of rank n.
If one of these equations can be expressed as a linear combination of the others, then no unique
solution of (78) is possible. The determinant vanishes, i.e., A⁻¹ does not exist, and at least one of the eigenval-
ues is zero. If there are m rows (or columns) linearly dependent on the n − m linearly independent rows (or
columns), the matrix is said to be of rank n − m, and there are m zero eigenvalues.
Only one row of (77) is independent; the other two rows are identical, so linearly dependent, and the
rank is 1. Therefore, two of the eigenvalues are zero. The trace must be invariant, so the eigenvalues are
(0, 0, 3).
The normalized eigenvectors are always arbitrary up to an overall sign (more generally, an overall phase).
Consider the matrix

A = ( 1  1 )
    ( 1  1 ) .

The secular equation is

| 1−λ    1  |
|  1    1−λ | = 0 ,    (79)

which boils down to λ(λ − 2) = 0, so the eigenvalues are 0 and 2 (this can be checked just by looking at
the determinant and trace, without even caring about the secular equation). For λ = 0, the equation of
the eigenvector is

( 1−0   1  ) ( x )
(  1   1−0 ) ( y ) = 0 ,    (80)

or x + y = 0. Thus, we can choose the normalized eigenvector as (1/√2, −1/√2), but we could have taken
the minus sign in the first component too. Similarly, the second eigenvector corresponding to λ = 2 or
x − y = 0 can either be (1/√2, 1/√2) or (−1/√2, −1/√2).
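These statements are easy to confirm with numpy; the sketch below also checks the trace and determinant shortcuts and the rank argument for the matrix of Eq. (77).

```python
import numpy as np

A2 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
evals = np.linalg.eigvals(A2)
print(np.sort(evals))                                  # [0. 2.]
print(np.isclose(evals.sum(), np.trace(A2)))           # sum of eigenvalues = trace
print(np.isclose(evals.prod(), np.linalg.det(A2)))     # product = determinant

A3 = np.ones((3, 3))                                   # the matrix of Eq. (77)
print(np.linalg.matrix_rank(A3))                       # 1, so two zero eigenvalues
print(np.round(np.sort(np.linalg.eigvals(A3)), 10))    # [0. 0. 3.] (up to rounding)
```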
Q. For what value of x will the matrix

( 2   7 )
( −6  x )

have a zero eigenvalue? What is the other eigenvalue?
Show that in this case the second row is linearly dependent on the first row.
Q. What is the rank of the matrix whose eigenvalues are (i) 2, 1, 0; (ii) 1, −1, 2, 2; (iii) i, −i, 0, 0?
Q. The 3 rows of a 3 × 3 matrix are (a, b, c); (2a, −b, c); and (6a, 0, 4c). What is the rank of this matrix?
Q. Write down the secular equation for the matrix A for which a12 = a21 = 1 and the other elements are
zero. Find the eigenvalues and eigenvectors.

4.4 Degenerate Eigenvalues

If the eigenvalues of a matrix (or an operator) are degenerate, the eigenvectors are not unique. Consider
the operator A with two eigenvectors |x〉 and |y〉 having the same eigenvalue a, so that

A|x〉 = a|x〉 , A|y〉 = a|y〉 . (81)

Any linear combination of |x〉 and |y〉 will have the same eigenvalue. Consider the combination |m〉 =
α|x〉 + β|y〉, for which

A[α|x〉 + β|y〉] = α(A|x〉) + β(A|y〉) = a[α|x〉 + β|y〉] = a|m〉 . (82)

Thus one can take any linearly independent combination of the basis vectors for which the eigenvalues
are degenerate (technically, we say the basis vectors that span the degenerate subspace) and those new
vectors are equally good as a basis. One can, of course, find an orthonormal basis too using the Gram-
Schmidt method. The point to remember is that if a matrix, or an operator, has degenerate eigenvalues,
the eigenvectors are not unique.
Examples:
1. The unit matrix in any dimension has all degenerate eigenvalues, equal to 1. The eigenvectors can
be chosen to be the standard orthonormal set, with one element unity and the others zero. But any linear
combination of them is also an eigenvector. Indeed, any vector in that LVS is a linear combination of those
orthonormal basis vectors, so any vector is an eigenvector of the unit matrix, with eigenvalue 1, which is
obvious: 1|a〉 = |a〉.
2. Suppose the matrix

A = ( a  b )
    ( c  d )

has eigenvalues λ1 and λ2, and the eigenvectors (p1, q1) and (p2, q2) (written as columns).
The matrix A + 1 must have the same eigenvectors, as they are also the eigenvectors of the 2 × 2 unit
matrix 1. The new eigenvalues, µ1 and µ2, will satisfy

µ1 + µ2 = (a + 1) + (d + 1) = a + d + 2 = λ1 + λ2 + 2 ,
µ1 µ2 = (a + 1)(d + 1) − bc = (ad − bc) + a + d + 1 = λ1 λ2 + λ1 + λ2 + 1 ,    (83)

whose obvious solutions are µ1 = λ1 + 1, µ2 = λ2 + 1, as it should be.


3. Consider the matrix

A = ( 1  0  0 )
    ( 0  0  1 ) ,    (84)
    ( 0  1  0 )

for which the secular equation is (λ − 1)(λ² − 1) = 0, so that the three eigenvalues are −1, 1, and 1. First,
we find the eigenvector for the non-degenerate eigenvalue −1, which gives x = 0 and y + z = 0. So a
suitably normalized eigenvector is

|1〉 = (1/√2) (  0 )
             (  1 ) .    (85)
             ( −1 )

For λ = 1, the only equation that we have is y − z = 0 and there are infinitely many ways to solve this
equation. We can just pick a suitable choice:

|2〉 = (1/√2) ( 0 )
             ( 1 ) .    (86)
             ( 1 )

The third eigenvector, if we want the basis to be orthonormal, can be found by the Gram-Schmidt
method. Another easy way is to take the cross product of these two eigenvectors, and we find 〈3| =
(1, 0, 0).
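A numerical sketch of this example: any combination of the two degenerate eigenvectors is again an eigenvector with eigenvalue 1 (the mixing coefficients below are arbitrary).

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

evals, evecs = np.linalg.eigh(A)       # A is real symmetric, so eigh is appropriate
print(evals)                           # [-1.  1.  1.]

# any combination of the two lambda = 1 eigenvectors is again an eigenvector
v = 0.3 * evecs[:, 1] + 0.7 * evecs[:, 2]
print(np.allclose(A @ v, 1.0 * v))     # True
```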

4.5 Functions of a Matrix: The Cayley-Hamilton Theorem

One can write a function of a square matrix just as one wrote the functions of operators. In fact, to a very
good approximation, what goes for operators goes for square matrices too. Thus, if |a〉 is an eigenvector
of A with eigenvalue a, then A²|a〉 = a²|a〉 and Aⁿ|a〉 = aⁿ|a〉.
Suppose A is some n × n matrix. Consider the determinant of λ1 − A, which is a polynomial in λ, with
highest power λⁿ, and can be written as

det(λ1 − A) = λⁿ + c_{n−1} λⁿ⁻¹ + · · · + c1 λ + c0 .    (87)

The equation

det(λ1 − A) = λⁿ + c_{n−1} λⁿ⁻¹ + · · · + c1 λ + c0 = 0    (88)

is known as the secular or characteristic equation for A. The n roots correspond to the n eigenvalues
of A. The Cayley-Hamilton theorem states that if we replace λ by A in (88), the polynomial in A should be
equal to zero:

Aⁿ + c_{n−1} Aⁿ⁻¹ + · · · + c1 A + c0 1 = 0 .    (89)

In other words, a matrix always satisfies its characteristic equation.
Proof:
First, a wrong, or bogus proof. It is tempting to write det(A1 − A) = det(A − A) = det(0) = 0, so the
proof seems to be trivial. This is a bogus proof because (i) A1 is not supposed to be A × 1, and (ii) (88)
is an ordinary equation while (89) is a matrix equation, i.e., a set of n² equations, so they cannot be
compared as such.
Now, the actual proof [9]. Suppose |a〉 is an eigenvector of A with eigenvalue a. Obviously, the charac-
teristic equation (88) is satisfied for λ = a. Applying the left-hand side of (89) on |a〉, we get

[Aⁿ + c_{n−1} Aⁿ⁻¹ + · · · + c1 A + c0 1] |a〉 = [aⁿ + c_{n−1} aⁿ⁻¹ + · · · + c1 a + c0] |a〉 = 0 ,    (90)

from (88). This is true for all eigenvectors, so the matrix polynomial must identically be zero.
To see what exactly we mean by the Cayley-Hamilton theorem, consider the matrix

A = ( a  b )
    ( c  d ) .

The characteristic equation is

| λ−a   −b  |
| −c    λ−d | = λ² − (a + d)λ + (ad − bc) = 0 .    (91)

If we replace λ by A, we get the matrix polynomial

( a  b ) ( a  b )  −  (a + d) ( a  b )  +  (ad − bc) ( 1  0 )     (92)
( c  d ) ( c  d )             ( c  d )               ( 0  1 )

and it is straightforward to check that this is a 2 × 2 null matrix.
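A numerical spot-check of Eq. (92) for an arbitrary 2 × 2 matrix (an illustration of the theorem, not a proof).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.integers(-5, 5, size=(2, 2)).astype(float)   # an arbitrary 2x2 matrix

# Cayley-Hamilton for a 2x2 matrix, Eq. (92): A^2 - (tr A) A + (det A) 1 = 0
residual = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
print(np.allclose(residual, np.zeros((2, 2))))       # True
```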
Examples:
1. Suppose the matrix A satisfies A² − 5A + 4 = 0, where 4 means 4 times the unit matrix. The eigenvalues
must then satisfy λ² − 5λ + 4 = 0, so they can only be 1 or 4.
2. The Pauli matrices satisfy σi² = 1, so the eigenvalues must be either +1 or −1.

[9] This is not actually a watertight proof, but it will do for us.
