Vector Space
CU PG-I
Anirban Kundu
Contents
3 Linear Operators
3.1 Some Special Operators
3.2 Projection Operators
3.3 Eigenvalues and Eigenvectors
4 Matrices
4.1 Some Special Matrices
4.2 Representation
4.3 Eigenvalues and Eigenvectors, Again
4.4 Degenerate Eigenvalues
4.5 Functions of a Matrix: The Cayley-Hamilton Theorem
Let us recapitulate what we have already learnt about vectors (for simplicity, consider 2-dimensional
vectors in the cartesian coordinates, but the entire thing can be generalised for higher dimensions).
• We can multiply any vector A = (a_1, a_2) by a real number d to get the vector D = dA. The individual components are multiplied by d, so the magnitude of the vector increases by a factor of d. We can also add two vectors component by component, so that for any two scalars α and β,
αA + βB = (αa_1 + βb_1, αa_2 + βb_2).
• The null vector 0 = (0, 0) always satisfies A + 0 = 0 + A = A. Also, there is a vector −A = (−a_1, −a_2) so that A + (−A) = 0.
We can also write this simply as a_i b_i, with the convention that every repeated index is summed over. This is known as the Einstein convention. In this convention, the index that is repeated can be used twice and only twice; expressions like a_i b_i c_i are meaningless. Such a repeated index is also known as a dummy index, as whether we write a_i b_i or a_p b_p is irrelevant; it means the same thing.
• The magnitude or length of A is given by a = √(A · A) = √(a_i a_i). Thus, A · B = ab cos θ, where θ is the angle between the two vectors. Obviously, |A · B| ≤ |A||B|, which is ab. The equality sign applies only if θ = 0 or π.
• There is, of course, nothing sacred about the choice of i and j as the basis vectors. One can rotate the coordinate axes by an angle θ in the counterclockwise direction, so that for the new axes, i′ = cos θ i + sin θ j and j′ = −sin θ i + cos θ j, and the components of any vector transform accordingly. (A short numerical illustration of these points follows this list.)
2 Linear vector space
Let us consider an assembly of some abstract objects^1, which we will denote as |〉. If we want to label them, we might call them something like |1〉 or |i〉. (You will see this object a lot in quantum mechanics; it is called a ket. In fact, we will develop the idea of vector spaces keeping its application in quantum mechanics in mind.) Let this assembly be called V. We say that the kets live in the space V. This will be called a linear vector space if the kets satisfy the following properties.
1. The sum of any two kets is again a ket of V: ∀|a〉, |b〉 ∈ V, |a〉 + |b〉 ∈ V.
2. The multiplication of a ket by a scalar α gives another ket of V: ∀|a〉 ∈ V, α|a〉 ∈ V.
3. There exists a null element |0〉 ∈ V such that ∀|a〉 ∈ V, |a〉 + |0〉 = |0〉 + |a〉 = |a〉.
4. ∀|a〉 ∈ V, there exists an |a′〉 ∈ V such that |a〉 + |a′〉 = |0〉. The ket |a′〉 is called the inverse of |a〉.
The addition and multiplication as defined above satisfy the standard laws:
1. |a〉 + |b〉 = |b〉 + |a〉, |a〉 + (|b〉 + |c〉) = (|a〉 + |b〉) + |c〉 .
2. 1·|a〉 = |a〉 .
If all the conditions are satisfied, the set V is called a linear vector space (LVS) and the objects |〉 are called
vectors. Note that the existence of a scalar (i.e., dot) product is not essential for the definition.
We will often write the null element or the null vector |0〉 as just 0, because any vector |a〉 multiplied
by 0 gives |0〉. We can write
|i〉 − |j〉 = |i〉 + (−1)|j〉 = |i〉 + |j′〉 ,     (5)
^1 ... completely abstract. However, they need not be functions of spatial coordinates, or momenta.
later, the scalar product must be real if we take the product with the same vector. This is obvious here: if z_1 = z_2, the product is real. The null vector is 0 = 0 + i0, and the inverse of (a, b) is (−a, −b).
3. All arrows in a 2-dimensional space, when multiplied only by real numbers, form a 2-dimensional vector space. This is because such multiplications cannot change the orientations of the arrows. If multiplication by complex numbers is allowed, both magnitude and direction can change, and the LVS becomes 1-dimensional (over the complex numbers).
4. All sinusoidal waves of period 2π form an LVS. Any such wave can be described as sin(x + θ) =
cos θ sin x +sin θ cos x, so one can treat sin x and cos x as basis vectors. This is again a 2-dimensional LVS.
5. All 4-vectors in the Minkowski space-time, of the form x^μ ≡ (ct, x, y, z) with μ = 0, 1, 2, 3, form a linear vector space.
The scalar, or inner, product of two vectors |a〉 and |b〉 in a vector space is a scalar. Thus, the inner
product is defined to be a map of the form V × V → S, where S is the set of scalars on which V is defined
(e.g., for ordinary 3-dimensional vectors, the inner product is the dot product of two vectors, and S is the
set of all real numbers).
The inner product of two vectors |i 〉 and | j 〉 is denoted by 〈i | j 〉; the order is important, see below.
The symbol 〈 is called a bra, so that 〈|〉 gives a closed bracket. The notation is due to Dirac.
The properties of the scalar product are as follows.
1. 〈a|b〉 = 〈b|a〉^* . Thus, in general, 〈a|b〉 ≠ 〈b|a〉, but 〈a|a〉 is real. Also, 〈a|a〉 ≥ 0, where the equality sign comes only if |a〉 = |0〉. This defines √〈a|a〉 as the magnitude of the vector |a〉.
2. If |d 〉 = α|a〉 + β|b〉, then 〈c|d 〉 = α〈c|a〉 + β〈c|b〉 is a linear function of α and β. However, 〈d |c〉 =
α∗ 〈a|c〉 + β∗ 〈b|c〉 is a linear function of α∗ and β∗ and not of α and β. This follows trivially from
〈d |c〉 = 〈c|d 〉∗ .
A vector space where the inner product is defined is called an inner product space. Note that an inner product may not be defined for every LVS. A very good example is the Minkowski space-time. The contraction of two 4-vectors, written as
a^μ b_μ = a^0 b^0 − a · b ,     (6)
is not an inner product, as a^μ a_μ need not be non-negative; in fact, it is negative for a space-like 4-vector, and zero for a light-like 4-vector (e.g., for two space-time coordinates connected by a light ray). Thus, while you can always say that a^μ b_μ is a 4-dimensional dot product, it has a very important difference from the 3-dimensional dot product, which is an inner product. While a · a is the square of the length of a, a^μ a_μ is not the square of the “length” of a^μ. The difference stems from the relative minus sign between the zero-th and the spatial components in a^μ b_μ.
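A minimal numerical sketch of this point (assuming the metric signature (+, −, −, −) and arbitrary illustrative 4-vectors):

import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])      # Minkowski metric

def contract(a, b):
    """4-vector contraction a^mu b_mu = a^0 b^0 - a.b"""
    return a @ eta @ b

a_spacelike = np.array([1.0, 2.0, 0.0, 0.0])    # space-like 4-vector
a_lightlike = np.array([1.0, 1.0, 0.0, 0.0])    # on the light cone

print(contract(a_spacelike, a_spacelike))   # negative: cannot be an inner product
print(contract(a_lightlike, a_lightlike))   # zero for a light-like 4-vector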
If somehow 〈a|b〉 = 〈b|a〉, the LVS is called a real vector space. Otherwise, it is complex. The LVS of
ordinary 2- or 3-dimensional vectors is a real vector space as A · B = B · A. That of the complex numbers
(example 2 above) is a complex vector space.
In quantum mechanics, the space in which the wavefunctions live is also an LVS. This is known as the Hilbert space^2, after the celebrated German mathematician David Hilbert. This is an inner product space, but with some special properties of convergence that make it complete^3. We can indeed check that the Hilbert space is an LVS; in particular, that is why the superposition principle in quantum mechanics holds. The wavefunctions are, however, complex quantities, and the scalar product is defined as^4
〈ψ_1|ψ_2〉 = ∫ ψ_1^* ψ_2 d^3x .     (7)
δ_ij = 1 if i = j ,   δ_ij = 0 if i ≠ j ,     (8)
The scalar product is defined only between a vector from V and another vector from the dual space
VD . Of course, a lot of spaces, like the space for cartesian vectors, are self-dual; there is no distinction
between the original space and the dual space. And that is why you never knew about the dual space when learning the dot product of ordinary vectors.
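To make eq. (7) concrete, here is a rough numerical sketch of the scalar product, in one dimension rather than three, with two arbitrarily chosen normalizable wavefunctions:

import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

psi1 = np.exp(-x**2 / 2) * (1 + 0j)            # illustrative wavefunctions
psi2 = np.exp(-x**2 / 2) * np.exp(1j * x)

# <psi1|psi2> = integral of psi1* psi2 dx, approximated by a Riemann sum
inner = np.sum(np.conj(psi1) * psi2) * dx
print(inner)                                    # complex in general
# check <psi1|psi2> = <psi2|psi1>*
print(np.isclose(inner, np.conj(np.sum(np.conj(psi2) * psi1) * dx)))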
^2 The proper definition of the Hilbert space may be found in, say, Dennery and Krzywicki. It is an infinite-dimensional space, but most of the time in quantum mechanics we work with a small subset of the original space which is finite.
^3 All inner product spaces are metric spaces. We will develop the concept of metric spaces later, but it is a space where we can have the concept of a distance between two vectors: |c〉 = |a〉 − |b〉. If we can have a sequence of elements in an LVS where the separation between successive elements becomes smaller and smaller as we proceed, and ultimately becomes infinitely small, that is called a Cauchy sequence. If every Cauchy sequence in an LVS converges to an element within that LVS, it is called a complete metric space.
^4 The vectors, the kets and bras, in the Hilbert space are not in general functions of x. The wavefunction ψ(x) is a scalar product of |ψ〉 and |x〉, the eigenkets of the position operator x̂: ψ(x) = 〈x|ψ〉. So 〈ψ_1|ψ_2〉 = 〈ψ_1| [∫ d^3x |x〉〈x|] |ψ_2〉 = ∫ d^3x ψ_1^* ψ_2.
2.2 Cauchy-Schwarz Inequality
Eq. (13) is known as the Cauchy-Schwarz inequality. For ordinary vectors, this just means |A · B| ≤ |A||B|.
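For complex vectors the inequality reads |〈a|b〉|^2 ≤ 〈a|a〉〈b|b〉, which is easy to check numerically (a sketch with randomly chosen vectors):

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5) + 1j * rng.normal(size=5)
b = rng.normal(size=5) + 1j * rng.normal(size=5)

lhs = abs(np.vdot(a, b))                              # |<a|b>|; vdot conjugates the first argument
rhs = np.sqrt(np.vdot(a, a).real * np.vdot(b, b).real)
print(lhs <= rhs)                                     # True: Cauchy-Schwarz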
A set R is called a metric space if a real, non-negative number ρ(a, b) is associated with any pair of its elements a, b ∈ R (remember that a and b need not be numbers) and
(1) ρ(a, b) ≥ 0 for any pair of elements a, b ∈ R;
(2) ρ(a, b) = ρ(b, a) (symmetry);
(3) ρ(a, b) = 0 only when a = b, i.e., when they refer to the same element in R;
(4) ρ(a, b) + ρ(b, c) ≥ ρ(a, c) (triangle inequality).
The number ρ(a, b) may be called the distance between a and b. The set R, together with the binary operation ρ satisfying these conditions, constitutes the metric space. The word “metric” comes from metre, i.e., something with which one can measure the distance between two elements. For n-dimensional Euclidean space,
ρ(a, b) = √( Σ_{i=1}^{n} (a_i − b_i)^2 ) .     (16)
Do not confuse ρ(a, b) with 〈a|b〉. In particular, ρ(a, a) = 0 (where a is some point in the LVS), but √〈a|a〉 (where |a〉 is a vector) defines the length or norm of that vector. For example, 〈ψ|ψ〉 = 1 means that the wavefunction has been normalized to unity. More precisely, if one thinks of |a〉 as the radius vector starting at the origin and ending at the point a, and similarly for b, then ρ(a, b) is the norm of the vector |a〉 − |b〉 (or the other way round).
If we have three vectors |a〉, |b〉, and |c〉 in an LVS and we define
|1〉 = |a〉 − |b〉 , |2〉 = |b〉 − |c〉 , |3〉 = |a〉 − |c〉 , (17)
then |1〉, |2〉, |3〉 satisfy the triangle inequality, and also the first two conditions of a metric space, so we
can say:
If the scalar product is defined in an LVS, it is a metric space.
Note that the scalar product need not be defined for all linear vector spaces, like the 4-dimensional
Minkowski space-time. Obviously, this is not a metric space.
ds^2 = dx^2 + dy^2 + dz^2 = dx_i dx_i .     (18)
3. Similarly, in the two-dimensional plane polar coordinate system, the separation between two points (r, θ) and (r + dr, θ + dθ) is
ds^2 = dr^2 + r^2 dθ^2 .     (19)
In three-dimensional spherical polar coordinates (deduce this):
ds^2 = dr^2 + r^2 dθ^2 + r^2 sin^2θ dφ^2 .     (20)
Note that there can be other types of metric too, not only the distance between two points. For example, consider the function ρ(a, b) = 0 if a = b and ρ(a, b) = 1 if a ≠ b. Check that this satisfies all the properties of a metric space and can act as a metric on any set.
If a relation of the form Σ_i a_i |v_i〉 = 0, where the a_i are some scalars, holds with at least two of the a_i being nonzero, the vectors |v_i〉 are called linearly dependent. If the only solution of this is a_i = 0 for all i, the vectors are linearly independent. The maximum number of linearly independent vectors in a vector space is called the dimensionality of the vector space. If there are infinitely many such linearly independent vectors, the space is infinite dimensional.
If the dimensionality of the vector space is n, the n linearly independent vectors |v_1〉, |v_2〉, ..., |v_n〉 form a basis of the vector space, and are said to span the space. Any other vector can be written as a linear combination of the basis vectors:
|a〉 = Σ_{i=1}^{n} a_i |i〉 ,     (23)
but no basis vector can be written as a linear combination of the other basis vectors; that is what linear independence is all about. The numbers a_i are called the components of the vector |a〉 in the |i〉 basis (components depend on the choice of basis). It is easy to convince yourself that in a 3-dimensional cartesian space, i, j, k are a suitable choice of basis; however, vectors like (i + j + k), (i − j), (i + j − 2k) are also linearly independent and can act as a basis. How do you check that they are indeed linearly independent?
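One quick way to check: put the components of the three vectors as the rows of a matrix and see whether the determinant is nonzero (equivalently, whether the rank is 3). A small sketch:

import numpy as np

# components of (i+j+k), (i-j), (i+j-2k) in the i, j, k basis
M = np.array([[1,  1,  1],
              [1, -1,  0],
              [1,  1, -2]], dtype=float)

print(np.linalg.det(M))            # nonzero, so the three vectors are linearly independent
print(np.linalg.matrix_rank(M))    # 3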
The 3-dimensional space can have at most three linearly independent vectors, that is why we call it
3-dimensional. On the other hand, there are infinitely many independent states for a particle in, say, an
infinitely deep one-dimensional potential well (we take the depth to be infinite so that all such states are
bound; for a well of finite depth, there will be a finite number of bound states and an infinite number of
unbound states). When we expand any arbitrary function in a Fourier series, there are infinitely many
sine or cosine functions in the expansion, and they are linearly independent^5, so this is another infinite
dimensional LVS.
Given a basis, the components are unique. Suppose the vector |a〉 can be written both as Σ a_i |i〉 and Σ b_i |i〉. Subtracting one from the other, Σ (a_i − b_i)|i〉 = 0, so by the condition of linear independence of the basis vectors, a_i = b_i for all i.
Starting from any basis |a_i〉, where the range of i can be finite or infinite (but these basis vectors need not be either orthogonal or normalised), one can always construct an orthonormal basis |i〉. This is known as Gram-Schmidt orthogonalisation. The procedure is as follows.
1. Normalise the first vector of the original basis:
|1〉 = |a_1〉 / √〈a_1|a_1〉 .     (24)
Thus, 〈1|1〉 = 1.
2. Construct |2′〉 by taking |a_2〉 and projecting out the part proportional to |1〉:
|2′〉 = |a_2〉 − 〈1|a_2〉|1〉 ,     (25)
which ensures 〈1|2′〉 = 0. Divide |2′〉 by its norm to get |2〉.
3. Repeat this procedure, so that
|m′〉 = |a_m〉 − Σ_{i=1}^{m−1} 〈i|a_m〉|i〉 .     (26)
This completes the proof.
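A minimal implementation of the procedure of eqs. (24)-(26), assuming the ordinary numpy inner product (the input vectors below are arbitrary):

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors, following eqs. (24)-(26)."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for e in basis:
            w = w - np.vdot(e, w) * e                    # subtract the <i|a_m>|i> piece
        basis.append(w / np.sqrt(np.vdot(w, w).real))    # normalise
    return basis

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])
for e in gram_schmidt([a, b, c]):
    print(np.round(e, 3))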
Q. If the basis vectors are orthonormal, 〈i | j 〉 = δi j , show that the components of any vector |a〉 are pro-
jected out by the projection operator |i 〉〈i |.
Q. A two-dimensional vector is written as A = 3i + 2j. What will be its components in a basis obtained by
rotating the original basis by π/4 in the counterclockwise direction?
Q. In a two-dimensional space, the basis vectors |i 〉 and | j 〉 are such that 〈i |i 〉 = 1, 〈 j | j 〉 = 2, 〈i | j 〉 = 1.
Construct an orthonormal basis.
Q. Suppose in the three-dimensional cartesian space we take the following vectors as a basis: a = i + j + k, b = i − 2k, c = 2j + k. Normalize a and construct two other orthonormal basis vectors. (This shows you that the orthonormal basis is not unique; of course, we can take i, j, k as another orthonormal basis.)
3 Linear Operators
A function f (x) associates a number y with another number x according to a certain rule. For example,
f (x) = x 2 associates, with every number x, its square. The space for x and y need not be identical. For
example, if x is any real number, positive, negative, or zero, f (x) is confined only to the non-negative
part of the real number space.
Similarly, we can assign to every vector |x〉 of an LVS another vector |y〉, either of the same LVS or of a different one, according to a certain rule. We simply write this as |y〉 = O|x〉, and O is called an operator, which, acting on |x〉, gives |y〉. We often put a hat on O, like Ô, to indicate that this is an operator. Unless a possible confusion can occur, we will not use the hat.
We will be interested in linear operators, satisfying O(α|x〉 + β|y〉) = αO|x〉 + βO|y〉.
• The identity operator 1 takes a vector to itself without any multiplicative factors: 1|x〉 = |x〉 , ∀|x〉 ∈
S.
• If A and B are two linear operators acting on S, A = B means A|x〉 = B |x〉 , ∀|x〉 ∈ S.
• C = A + B means C |x〉 = A|x〉 + B |x〉 , ∀|x〉 ∈ S.
• D = AB means D|x〉 = A[B|x〉] , ∀|x〉 ∈ S. Note that AB is not necessarily the same as BA. A good example is provided by the angular momentum operators in quantum mechanics: J_x J_y ≠ J_y J_x. If AB = BA, the commutator [A, B] = AB − BA is zero, and we say that the operators commute.
• The identity operator obviously commutes with any other operator A, as A1|x〉 = A|x〉, and 1[A|x〉] =
A|x〉.
• One can multiply an operator with a number. If A|x〉 = |y〉, then αA|x〉 = α|y〉. Obviously, Aα = αA.
• One can formally write higher powers of the operators. For example, A^2|x〉 = A[A|x〉]. Similarly,
e^A ≡ 1 + A + (1/2!) A^2 + (1/3!) A^3 + · · ·     (31)
• The operator A can also act on the dual space S D . If A|a〉 = |c〉, one may write 〈b|A|a〉 = 〈b|c〉. The
vector 〈d | = 〈b|A is defined in such a way that 〈d |a〉 = 〈b|c〉.
This is quite a common practice in quantum mechanics, e.g.,
〈ψ_1|H|ψ_2〉 = ∫ ψ_1^* H ψ_2 d^3x .     (32)
Note that 〈b|A is not the dual of A|b〉. To see this, consider the operator α1. Acting on |x〉, this gives
α|x〉. The dual of this is 〈x|α∗ , which can be obtained by operating 1α∗ , and not 1α, on 〈x|.
Q. If A and B are linear operators, show that A + B and AB are also linear operators.
Q. If A + B = 1 and AB = 0, what is the value of A^2 + B^2?
Q. Show that e^A e^{−A} = 1 .
Q. If [A, B] = 0, show that e^A e^B = e^{A+B} .
Q. If [A, B] = B, show that e^A B e^{−A} = eB .
Q. If O|x〉 = −|x〉, check whether O is linear. Do the same if O|x〉 = [|x〉]^* .
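For the identity e^A B e^{−A} = eB, here is a concrete numerical check; the two matrices below are just one possible choice satisfying [A, B] = B:

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(A @ B - B @ A, B))       # [A, B] = B
lhs = expm(A) @ B @ expm(-A)
print(np.allclose(lhs, np.e * B))          # e^A B e^{-A} = e B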
1|j〉 = |j〉 ,     (33)
1 = Σ_i |i〉〈i| ,     (34)
The inverse operator of A is the operator B if AB = 1 or BA = 1. The operator B is generally denoted by A^{−1}. If A^{−1}A = 1, A^{−1} is called the left inverse of A, and sometimes denoted by A_l^{−1}. Similarly, if AA^{−1} = 1, A^{−1} is called the right inverse, and written as A_r^{−1}. Note that there is no guarantee that any one or both of the inverse operators will exist. Also, in general, A A_l^{−1} ≠ 1 and A_r^{−1} A ≠ 1. If both exist, however, they are identical:
A_l^{−1} A = 1 ⇒ A_l^{−1} A A_r^{−1} = A_r^{−1} ⇒ A_l^{−1} = A_r^{−1} ,     (36)
where we have used the fact that any operator O multiplying 1 gives O.
The inverse in this case is also unique. To prove this, suppose we have two different inverses A_1^{−1} and A_2^{−1} (whether left or right does not matter any more). Now
A_1^{−1} A = 1 ⇒ A_1^{−1} A A_2^{−1} = A_2^{−1} ⇒ A_1^{−1} = A_2^{−1} ,     (37)
〈a|A^†B^†|b〉 = [〈a|A^†][B^†|b〉] = {[〈b|B][A|a〉]}^* = 〈b|BA|a〉^* = 〈a|(BA)^†|b〉 ,     (40)
where we have used the duality property of the vectors. Thus, for any two operators A and B,
A^†B^† = (BA)^† .     (41)
〈ψ_1| d/dx |ψ_2〉 = ∫ ψ_1^* (dψ_2/dx) dx = ∫ d/dx (ψ_1^* ψ_2) dx − ∫ (dψ_1^*/dx) ψ_2 dx .     (42)
But the first integral is zero as both wavefunctions must vanish at the boundary of the integration region. Now, complete the proof by showing that i d/dx is hermitian. This shows that momentum is indeed a hermitian operator in quantum mechanics.
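One can also see this numerically: a central-difference approximation of d/dx on a grid, with the wavefunction vanishing at the boundaries, is an antisymmetric matrix, so i d/dx is hermitian (the discretisation below is only an illustration):

import numpy as np

N, h = 200, 0.01
# central-difference matrix for d/dx, with psi = 0 at the boundaries
D = (np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)) / (2 * h)

print(np.allclose(D.T, -D))            # d/dx is antisymmetric (antihermitian, since real)
P = 1j * D                             # i d/dx
print(np.allclose(P, P.conj().T))      # hermitian, hence so is -i hbar d/dx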
Another important class of operators is where U^† = U^{−1}. They are called unitary operators. One can write
|U|a〉|^2 = [〈a|U^†][U|a〉] = 〈a|U^†U|a〉 = 〈a|U^{−1}U|a〉 = 〈a|a〉 = ||a〉|^2 ,     (43)
which means that operation by unitary operators keeps the length or norm of any vector unchanged.
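A quick check of eq. (43) with a randomly generated unitary matrix (the QR decomposition is just a convenient way to manufacture a unitary U):

import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(M)                     # U is unitary: U^dagger U = 1

a = rng.normal(size=4) + 1j * rng.normal(size=4)
print(np.allclose(U.conj().T @ U, np.eye(4)))
print(np.isclose(np.linalg.norm(U @ a), np.linalg.norm(a)))   # the norm is unchanged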
The nomenclature is quite similar to that used for matrices. We will show later how one can represent^6 the action of an operator on a vector by conventional matrix multiplication.
^6 We will also explain what representation means.
Note that the combination |a〉〈b| acts as a linear operator. Operating on a ket, this gives a ket; oper-
ating on a bra, this gives a bra.
Also,
〈x|(|a〉〈b|)|y〉 = (〈x|a〉)(〈b|y〉) = [〈y|b〉〈a|x〉]^* = 〈y|(|b〉〈a|)|x〉^* ,     (45)
so |b〉〈a| is the adjoint of |a〉〈b|.
To sum up the important points once again:
1. The dual of A|x〉 is 〈x|A † . This is the definition of the adjoint operator. If A = A † , the operator is
hermitian.
2. In an expression like 〈x|A|x〉, A can act either on 〈x| or on |x〉. But if A|x〉 = |y〉, 〈x|A ≠ 〈y| unless A is hermitian.
3. The ∇ operator is a 3-dimensional vector as it satisfies all the transformation properties of a vector (there is a 4-dimensional analogue too). Thus, ∇ · A is a scalar. ∇ × A is a cross product, which is just a way to combine two vectors to get another vector: A = B × C ⇒ A_i = ε_ijk B_j C_k. This is a vector operator, a vector whose components are operators. Note that d/dx is an operator whose inverse does not exist unless you specify the integration constant.
Consider the LVS S of two-dimensional vectors, schematically written as |x〉. Let |i〉 and |j〉 be the two unit vectors along the x- and y-axes. The operator P_i = |i〉〈i|, acting on any vector |x〉, gives 〈i|x〉|i〉, a vector along the x-direction with magnitude 〈i|x〉.
Obviously, the set P_i|x〉 is a one-dimensional LVS. It contains all those vectors of S that lie along the x-direction; it contains the null element, and also the unit vector |i〉, which can be obtained as P_i|i〉. Such a space S′, all of whose members are members of S but not the other way round, is called a nontrivial subspace of S. The null vector, and the whole set S itself, are trivial subspaces.
The operator P i is an example of the class known as projection operators. We will denote them by
P . These operators project out a subspace of S. Once a part is projected out, another projection cannot
do anything more, so P 2 = P . A projection operator must also be hermitian, since it is necessary that it
projects out the same part of the original space S and the dual space S D . Any operator that is hermitian
and satisfies P 2 = P is called a projection operator.
Suppose P_1 and P_2 are two projection operators. They project out different parts of the original LVS. Is P_1 + P_2 a projection operator too? If P_1^† = P_1 and P_2^† = P_2, then (P_1 + P_2)^† = P_1 + P_2. However,
(P_1 + P_2)^2 = P_1^2 + P_2^2 + P_1P_2 + P_2P_1 = (P_1 + P_2) + P_1P_2 + P_2P_1 ,     (46)
so that P_1P_2 + P_2P_1 must be zero. Multiply from the left by P_1 and use P_1^2 = P_1; this gives P_1P_2 + P_1P_2P_1 = 0. Similarly, multiply by P_1 from the right, and subtract one from the other, to get P_1P_2 − P_2P_1 = 0, so that the only solution is P_1P_2 = P_2P_1 = 0. Projection operators like this are called orthogonal projection operators. As an important example, for any P, 1 − P is an orthogonal projection operator. They sum up to 1, which projects the entire space onto itself.
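All of these statements can be checked explicitly for the two-dimensional example above (a small numerical sketch):

import numpy as np

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])
P1 = np.outer(i_hat, i_hat)                      # |i><i|
P2 = np.outer(j_hat, j_hat)                      # |j><j|

print(np.allclose(P1 @ P1, P1))                  # P^2 = P
print(np.allclose(P1 @ P2, np.zeros((2, 2))))    # orthogonal projectors: P1 P2 = 0
print(np.allclose(P1 + P2, np.eye(2)))           # they sum up to the identity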
In short, if several projection operators P_1, P_2, · · ·, P_n satisfy
If the effect of an operator A on a vector |a〉 is to yield the same vector multiplied by some constant,
A|a〉 = a|a〉 ,     (49)
we call it an eigenvalue equation, the vector |a〉 an eigenvector of A, and a the corresponding eigenvalue of A.
If there is even one vector |x〉 which is a simultaneous eigenvector of both A and B, with eigenvalues a and b respectively, then A and B commute at least when acting on |x〉. This is easy to show, as
(AB)|x〉 = A(b|x〉) = bA|x〉 = ab|x〉 ,   (BA)|x〉 = B(a|x〉) = aB|x〉 = ab|x〉 .     (50)
Conversely, suppose [A, B] = 0 and A|x〉 = a|x〉. Then
[A, B]|x〉 = 0|x〉 = 0 ⇒ AB|x〉 = BA|x〉 ⇒ A(B|x〉) = a(B|x〉) ,     (51)
or B|x〉 is also an eigenvector of A with the same eigenvalue a. But A has non-degenerate eigenvalues, so this can only happen if B|x〉 is just some multiplicative constant times |x〉, i.e., B|x〉 = b|x〉. Thus, commuting operators must have simultaneous eigenvectors if the eigenvalues are non-degenerate.
One can have a counterexample from the angular momentum algebra of quantum mechanics. The vectors are labelled by the angular momentum j and its projection m on some axis, usually taken to be the z-axis. These vectors, |j m〉, are eigenvectors of the operator J^2 = J_x^2 + J_y^2 + J_z^2.^7 They are also eigenvectors of J_z, but not of J_x or J_y. So here is a situation where J^2 and J_x commute but the |j m〉 are not simultaneous eigenvectors of both. The reason is that all these |j m〉 states are degenerate with respect to J^2, with an eigenvalue of j(j + 1)ħ^2.
^7 Although we have used the cartesian symbols x, y, z, the angular momentum operators can act on a completely different space.
The eigenvalues of hermitian operators are necessarily real. Suppose A is hermitian, A = A † , and
A|a〉 = a|a〉. Then
〈a|A|a〉 = a〈a|a〉 ,
〈a|A|a〉 = 〈a|A † |a〉∗ = 〈a|A|a〉∗ = a ∗ 〈a|a〉 , (52)
as the scalar product 〈a|a〉 is real. So a = a ∗ , or hermitian operators have real eigenvalues.
If an hermitian operator has two different eigenvalues corresponding to two different eigenvectors,
these eigenvectors must be orthogonal to each other, i.e., their scalar product must be zero. Suppose for
an hermitian operator A, A|a〉 = a|a〉 and A|b〉 = b|b〉. So,
〈b|A|a〉 = a〈b|a〉 ,
〈a|A|b〉 = 〈b|A † |a〉∗ = 〈b|A|a〉∗ = b〈a|b〉 ⇒ 〈b|A|a〉 = b〈b|a〉 , (53)
using the fact that b is real and 〈a|b〉∗ = 〈b|a〉. Subtracting one from the other, and noting that a 6= b, we
get 〈a|b〉 = 0, or they are orthogonal to each other.
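Both statements are easy to verify numerically for a randomly generated hermitian matrix (an illustrative sketch):

import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M + M.conj().T                          # hermitian by construction

evals, evecs = np.linalg.eigh(A)            # eigh is meant for hermitian matrices
print(np.allclose(evals.imag, 0.0))         # the eigenvalues are real
print(np.allclose(evecs.conj().T @ evecs, np.eye(4)))   # the eigenvectors are orthonormal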
4 Matrices
An m × n matrix A has m rows and n columns, and the i j -th element Ai j lives in the i -th row and the
j -th column. Thus, 1 ≤ i ≤ m and 1 ≤ j ≤ n. If m = n, A is called a square matrix.
The sum of two matrices A and B is defined only if they are of same dimensionality, i.e., both have
equal number of rows and equal number of columns. In that case, C = A + B means Ci j = Ai j + Bi j for
every pair (i , j ).
The inner product C = AB is defined if and only if the number of columns of A is equal to the number of rows of B. In this case, we write
C = AB ,   C_ij = Σ_{k=1}^{n} A_ik B_kj ;     (54)
one can also drop the explicit summation sign using the Einstein convention for repeated indices. If A is an m × n matrix and B is an n × p matrix, C will be an m × p matrix. Only if m = p are both AB and BA defined. They are of the same dimensionality if m = n = p, i.e., both A and B are square matrices. Even if the product is defined both ways, they need not commute; AB is not necessarily equal to BA, and in this respect matrices differ from ordinary numbers, whose products always commute.
The direct, outer, or Krönecker product of two matrices is defined as follows. If A is an m × m matrix and B is an n × n matrix, then the direct product C = A ⊗ B is an mn × mn matrix with elements C_pq = A_ij B_kl, where p = n(i − 1) + k and q = n(j − 1) + l. For example, if A and B are both 2 × 2 matrices,
A ⊗ B = ( a_11 B   a_12 B )
        ( a_21 B   a_22 B )

        ( a_11 b_11   a_11 b_12   a_12 b_11   a_12 b_12 )
      = ( a_11 b_21   a_11 b_22   a_12 b_21   a_12 b_22 )     (55)
        ( a_21 b_11   a_21 b_12   a_22 b_11   a_22 b_12 )
        ( a_21 b_21   a_21 b_22   a_22 b_21   a_22 b_22 )
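numpy's kron follows exactly this block structure; a quick check with two arbitrary 2 × 2 matrices:

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

C = np.kron(A, B)                              # 4x4 matrix with blocks a_ij * B
print(C)
print(np.allclose(C[:2, :2], A[0, 0] * B))     # the top-left block is a_11 B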
A row matrix R of dimensionality 1 × m has only one row and m columns. A column matrix C of dimensionality m × 1 similarly has only one column but m rows. Here, both RC and CR are defined; the first is a number (or a 1 × 1 matrix), and the second an m × m square matrix.
The unit matrix of dimension n is an n × n square matrix whose diagonal entries are 1 and all other entries are zero: 1_ij = δ_ij. The unit matrix commutes with any other matrix: A1 = 1A = A, assuming that the product is defined both ways (so that A is also a square matrix of the same dimension). From now on, unless mentioned explicitly, all matrices will be taken to be square ones.
If two matrices P and Q satisfy PQ = QP = 1, P and Q are called inverses of each other, and we denote Q by P^{−1}. It is easy to show that the left and right inverses are identical; the proof is along the same lines as that for linear operators.
The necessary and sufficient condition for the inverse of a matrix A to exist is a nonzero determinant:
det A 6= 0. The matrices with zero determinant are called singular matrices and do not have an inverse.
Note that for a square array
| a_1  a_2  a_3  · · · |
| b_1  b_2  b_3  · · · |
| c_1  c_2  c_3  · · · |
| · · ·  · · ·  · · ·  · · · |
the determinant is defined as ε_{ijk...} a_i b_j c_k ..., where ε_{ijk...} is an extension of the usual Levi-Civita symbol: +1 for an even permutation of (i, j, k, ...) = (1, 2, 3, ...), −1 for an odd permutation, and 0 if any two indices are repeated.
If we strike out the i-th row and the j-th column of an n × n determinant, the determinant of the reduced (n − 1) × (n − 1) matrix is called the ij-th minor of the original matrix. For example, if we omit the first row (with the a_i) and one of the columns in turn, we get the minors M_1j. The determinant D_n of this n × n matrix can also be written as
D_n = Σ_{j=1}^{n} (−1)^{1+j} a_j M_{1j} .     (56)
If the i-th row is omitted, the first factor would have been (−1)^{i+j}.
As A^{−1}A = 1, (det A^{−1}) × (det A) = 1, as unit matrices of any dimension always have unit determinant.
A similarity transformation on a matrix A is defined by
A′ = R^{−1}AR .     (57)
The determinant remains invariant under such a transformation:
det A′ = det (R^{−1}AR) = det R^{−1} det A det R = det A .     (58)
Another thing that remains invariant under a similarity transformation is the trace of a matrix, which is just the algebraic sum of the diagonal elements: tr A = Σ_i A_ii. Even if A and B do not commute, their traces commute, as tr (AB) = tr (BA). This can be generalized: the trace of the product of any number of matrices remains invariant under a cyclic permutation of those matrices. The proof follows from the definition of the trace and of the product of matrices:
tr (ABC · · · P) = Σ_i (ABC · · · P)_ii = Σ_{i,j,k,l,...,p} A_ij B_jk C_kl · · · P_pi .     (59)
All the indices are summed over, so we can start from any point; e.g., if we start from the index k, we get the trace as tr (C · · · PAB).
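A numerical illustration of the cyclic property, with three arbitrary 3 × 3 matrices:

import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

t1 = np.trace(A @ B @ C)
print(np.isclose(t1, np.trace(C @ A @ B)))   # cyclic permutation: equal
print(np.isclose(t1, np.trace(B @ C @ A)))   # cyclic permutation: equal
print(np.isclose(t1, np.trace(B @ A @ C)))   # generally False: not a cyclic permutation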
Note that this is valid only if the matrices are finite-dimensional. For infinite-dimensional matrices, tr (AB)
need not be equal to tr (BA). A good example can be given from quantum mechanics. One can write both position
and momentum operators, x and p, as infinite-dimensional matrices. The uncertainty relation, written in the
form of matrices, now reads [x, p] = iħ1. The trace of the right-hand side is definitely nonzero; in fact, it is infinite because the unit matrix is infinite-dimensional. The trace of the left-hand side is also nonzero, as tr (xp) ≠ tr (px); they are infinite-dimensional matrices too.
• A diagonal matrix A_d has zero or nonzero entries along the diagonal, but necessarily zero entries in all the off-diagonal positions. Two diagonal matrices always commute. Suppose the ii-th entry of A_d is a_i and the jj-th entry of B_d is b_j. Then
(A_d B_d)_ik = Σ_j a_i δ_ij b_j δ_jk = a_i b_i δ_ik ,
as the product is nonzero only when i = j = k. We get an identical result for (B_d A_d)_ik, so they always commute. A diagonal matrix need not commute with a nondiagonal matrix.
• The complex conjugate A∗ of a matrix A is given by (A∗ )i j = (Ai j )∗ , i.e., by simply taking the com-
plex conjugate of each entry. A need not be diagonal.
• The transpose A^T of a matrix A is given by (A^T)_ij = A_ji, i.e., by interchanging the row and the column. The transpose of an m × n matrix is an n × m matrix; the transpose of a row matrix is a column matrix, and vice versa. We have
(AB)^T_ij = (AB)_ji = A_jk B_ki = (B^T)_ik (A^T)_kj ,
or (AB)^T = B^T A^T.
• The hermitian conjugate A^† of a matrix A is given by (A^†)_ij = (A_ji)^*, i.e., by interchanging the row and the column entries and then taking the complex conjugate (the order of these operations does not matter). If A is real, A^† = A^T.
Q. Show that for the Pauli matrices, σ_i^{−1} = σ_i. What are the hermitian conjugates of these matrices?
Q. Show that
exp (iθσ_2/2) = cos(θ/2) + i σ_2 sin(θ/2) .     (63)
Q. The Pauli matrices satisfy [σ_i, σ_j] = 2iε_ijk σ_k and {σ_i, σ_j} = 2δ_ij. Show that for any two vectors A and B,
(σ · A)(σ · B) = A · B + i σ · (A × B) .     (64)
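The identity can be verified numerically (a sketch with two arbitrary real vectors A and B):

import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = np.array([s1, s2, s3])

A = np.array([1.0, 2.0, 3.0])
B = np.array([-1.0, 0.5, 2.0])

sA = np.einsum('i,ijk->jk', A, sigma)        # sigma . A
sB = np.einsum('i,ijk->jk', B, sigma)        # sigma . B
lhs = sA @ sB
rhs = np.dot(A, B) * np.eye(2) + 1j * np.einsum('i,ijk->jk', np.cross(A, B), sigma)
print(np.allclose(lhs, rhs))                 # True: eq. (64)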
Thus, the total number of independent elements is n^2 − n − n(n − 1)/2 = n(n − 1)/2. Note that O^T O = 1 does not give any new constraints; it is just the transpose of the original equation.
Rotation in an n-dimensional space is nothing but transforming a vector by operators which can be represented (we are yet to come to the exact definition of representation) by n × n orthogonal matrices, with n(n − 1)/2 independent elements, or angles. Thus, a 2-dimensional rotation can be parametrized by only one angle, and a 3-dimensional rotation by three, which are known as Eulerian angles^8.
^8 There is a conventional choice of Eulerian angles, but it is by no means unique.
One can have an identical exercise for the n × n unitary matrix U. We start with 2n^2 real elements, as the entries are complex numbers. The condition UU^† = 1 gives the constraints. There are again n such equations with the right-hand side equal to 1, which look like
Σ_k |U_1k|^2 = 1     (68)
for i = j = 1, and so on. All entries on the left-hand side are necessarily real. There are ^nC_2 = n(n − 1)/2 conditions with the right-hand side equal to zero, which look like
Σ_k U_1k U_2k^* = 0 .     (69)
However, the entries are complex, so a single such equation is actually two equations, for the real and the imaginary parts. Thus, the total number of independent elements is 2n^2 − n − n(n − 1) = n^2. Again, U^†U = 1 does not give any new constraints; it is just the hermitian conjugate of the original equation.
4.2 Representation
Suppose we have an orthonormal basis |i〉, so that any vector |a〉 can be written as in (23). If the space is n-dimensional, one can express these basis vectors as n-component column matrices, with all entries equal to zero except one, which is unity. For example, in a 3-dimensional space, one can write the orthonormal basis vectors as
|1〉 = (1, 0, 0)^T ,   |2〉 = (0, 1, 0)^T ,   |3〉 = (0, 0, 1)^T .     (70)
Of course there is nothing sacred about the orthonormal basis, but it makes the calculation easier. The vector |a〉 can be expressed as
|a〉 = (a_1, a_2, a_3)^T .     (71)
Consider an operator A that takes |a〉 to |b〉, i.e., A|a〉 = |b〉. Obviously |b〉 has the same dimensionality as |a〉, and can be written in a form similar to (71). The result is the same if we express the operator A as an n × n matrix A with the following property:
A_ij a_j = b_i .     (72)
We now call the matrix A a representation of the operator A, and the column matrices a, b representations of the vectors |a〉 and |b〉 respectively.
Examples:
1. In a two-dimensional space, suppose A|1〉 = |1〉 and A|2〉 = −|2〉. Then a_11 = 1, a_22 = −1, a_12 = a_21 = 0, so that
A = ( 1   0 )
    ( 0  −1 ) .     (73)
2. In a three-dimensional space, take A|1〉 = |2〉, A|2〉 = |3〉, A|3〉 = |1〉. Thus, a_21 = a_32 = a_13 = 1, the rest of the entries are zero, and
A = ( 0  0  1 )
    ( 1  0  0 )
    ( 0  1  0 ) .     (74)
3. Suppose the Hilbert space is 2-dimensional (i.e., the part of the original infinite-dimensional space in which we are interested) and the operator A acts like A|ψ_1〉 = (1/√2)[|ψ_1〉 + |ψ_2〉] and A|ψ_2〉 = (1/√2)[−|ψ_1〉 + |ψ_2〉]. Thus, a_11 = a_22 = a_21 = −a_12 = 1/√2.
If there is a square matrix A and a column matrix a such that Aa = αa, then a is called an eigenvector of A and α the corresponding eigenvalue. Again, this is exactly the same as what we got for operators and vectors, eq. (49).
A square matrix can be diagonalized by a similarity transformation: A_d = RAR^{−1}. For a diagonal matrix, the eigenvectors are just the orthonormal basis vectors, with the corresponding diagonal entries as eigenvalues. (A note of caution: this is strictly true only for non-degenerate eigenvalues, i.e., when all diagonal entries are different. Degenerate eigenvalues pose more complications, which will be discussed later.) If the matrix A is real symmetric, it can be diagonalized by an orthogonal transformation, i.e., R becomes an orthogonal matrix. If A is hermitian, it can be diagonalized by a unitary transformation:
A_d = UAU^† ,     (75)
where U^† = U^{−1}. While the inverse does not exist if the determinant is zero, even such a matrix can be diagonalized. However, the determinant remains invariant under a similarity transformation, so at least one of the eigenvalues will be zero for such a singular matrix.
The trace also remains invariant under similarity transformations. Thus, it is really easy to find the eigenvalues of a 2 × 2 matrix. Suppose the matrix is
( a  b )
( c  d ) ,
and the eigenvalues are λ_1 and λ_2. We need to solve two simultaneous equations,
λ_1 λ_2 = ad − bc ,   λ_1 + λ_2 = a + d ,     (76)
Suppose the ij-th element of an n × n matrix A is denoted by a_ij. If the system of equations
a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = 0 ,
a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = 0 ,
· · ·
a_n1 x_1 + a_n2 x_2 + · · · + a_nn x_n = 0 ,     (78)
has only the unique solution x_i = 0 for all i, then the equations are linearly independent and the matrix is nonsingular, i.e., det A is nonzero. In this case no eigenvalue can be zero, and the matrix is said to be of rank n.
If one of these equations can be expressed as a linear combination of the others, then no unique solution of (78) is possible. The matrix is singular, i.e., A^{−1} does not exist, and one of the eigenvalues is zero. If there are m rows (or columns) that are linearly dependent on the remaining n − m linearly independent rows (or columns), the matrix is said to be of rank n − m, and there are m zero eigenvalues.
Only one row of (77) is independent; the other two rows are identical, so linearly dependent, and the
rank is 1. Therefore, two of the eigenvalues are zero. The trace must be invariant, so the eigenvalues are
(0, 0, 3).
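For instance (an illustrative matrix, not the one of eq. (77)), a 3 × 3 matrix with three identical rows has rank 1, hence two zero eigenvalues, the third being fixed by the trace:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])

print(np.linalg.matrix_rank(A))              # 1: two rows depend linearly on the first
print(np.round(np.linalg.eigvals(A), 10))    # two zero eigenvalues; the third equals tr A = 6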
The eigenvectors are always arbitrary up to an overall sign. Consider the matrix
A = ( 1  1 )
    ( 1  1 ) .
The secular equation is
| 1 − λ    1   |
|   1    1 − λ | = 0 ,     (79)
which boils down to λ(λ − 2) = 0, so the eigenvalues are 0 and 2 (this can be checked just by looking at the determinant and trace, without even caring about the secular equation). For λ = 0, the equation for the eigenvector is
( 1 − 0    1   ) ( x )
(   1    1 − 0 ) ( y ) = 0 ,     (80)
or x + y = 0. Thus, we can choose the normalized eigenvector as (1/√2, −1/√2), but we could have taken the minus sign in the first component too. Similarly, the second eigenvector, corresponding to λ = 2 or x − y = 0, can either be (1/√2, 1/√2) or (−1/√2, −1/√2).
Q. For what value of x will the matrix
( 2   7 )
( −6  x )
have a zero eigenvalue? What is the other eigenvalue? Show that in this case the second row is linearly dependent on the first row.
Q. What is the rank of the matrix whose eigenvalues are (i) 2, 1, 0; (ii) 1, −1, 2, 2; (iii) i , −i , 0, 0?
Q. The 3 rows of a 3 × 3 matrix are (a, b, c); (2a, −b, c); and (6a, 0, 4c). What is the rank of this matrix?
Q. Write down the secular equation for the matrix A for which a 12 = a 21 = 1 and the other elements are
zero. Find the eigenvalues and eigenvectors.
If the eigenvalues of a matrix (or an operator) are degenerate, the eigenvectors are not unique. Consider the operator A with two eigenvectors |x〉 and |y〉 having the same eigenvalue a, so that
A|x〉 = a|x〉 ,   A|y〉 = a|y〉 .     (81)
Any linear combination of |x〉 and |y〉 will have the same eigenvalue. Consider the combination |m〉 = α|x〉 + β|y〉, for which
A|m〉 = αA|x〉 + βA|y〉 = a(α|x〉 + β|y〉) = a|m〉 .     (82)
Thus one can take any linearly independent combination of the basis vectors for which the eigenvalues
are degenerate (technically, we say the basis vectors that span the degenerate subspace) and those new
vectors are equally good as a basis. One can, of course, find an orthonormal basis too using the Gram-
Schmidt method. The point to remember is that if a matrix, or an operator, has degenerate eigenvalues,
the eigenvectors are not unique.
Examples:
1. The unit matrix in any dimension has all eigenvalues degenerate, equal to 1. The eigenvectors can be chosen to be the standard orthonormal set, with one element unity and the others zero, but any linear combination of them is also an eigenvector. Since any vector in that LVS is a linear combination of those orthonormal basis vectors, any vector is an eigenvector of the unit matrix, with eigenvalue 1, which is obvious: 1|a〉 = |a〉.
2. Suppose the matrix A = ( a  b ; c  d ) has eigenvalues λ_1 and λ_2, and eigenvectors (p_1, q_1)^T and (p_2, q_2)^T. The matrix A + 1 must have the same eigenvectors, as they are also the eigenvectors of the 2 × 2 unit matrix 1. The new eigenvalues, μ_1 and μ_2, will satisfy
μ_1 + μ_2 = (a + 1) + (d + 1) = a + d + 2 = λ_1 + λ_2 + 2 ,
μ_1 μ_2 = (a + 1)(d + 1) − bc = (ad − bc) + a + d + 1 = λ_1 λ_2 + λ_1 + λ_2 + 1 ,     (83)
so that μ_1 = λ_1 + 1 and μ_2 = λ_2 + 1.
For λ = 1, the only equation that we have is y − z = 0, and there are infinitely many possible ways to solve this equation. We can just pick a suitable choice:
|2〉 = (1/√2) (0, 1, 1)^T .     (86)
The third eigenvector, if we want the basis to be orthonormal, can be found by the Gram-Schmidt method. Another easy way is to take the cross product of these two eigenvectors, and we find 〈3| = (1, 0, 0).
4.5 Functions of a Matrix: The Cayley-Hamilton Theorem
One can write a function of a square matrix just as one wrote functions of operators. In fact, to a very good approximation, what goes for operators goes for square matrices too. Thus, if |a〉 is an eigenvector of A with eigenvalue a, then A^2|a〉 = a^2|a〉 and A^n|a〉 = a^n|a〉.
Suppose A is some n × n matrix. Consider the determinant of λ1 − A, which is a polynomial in λ with highest power λ^n. The equation
det(λ1 − A) = λ^n + c_{n−1}λ^{n−1} + · · · + c_1λ + c_0 = 0     (88)
is known as the secular or characteristic equation for A. The n roots correspond to the n eigenvalues of A. The Cayley-Hamilton theorem states that if we replace λ by A in (88), the polynomial in A should be equal to zero:
A^n + c_{n−1}A^{n−1} + · · · + c_1A + c_0 = 0 .     (89)
In other words, a matrix always satisfies its characteristic equation.
Proof:
First, a wrong, or bogus, proof. It is tempting to write det(A1 − A) = det(A − A) = det(0) = 0, so the proof seems to be trivial. This is a bogus proof because (i) A1 is not supposed to be A × 1, and (ii) (88) is an ordinary equation while (89) is a matrix equation, i.e., a set of n^2 equations, so they cannot be compared as such.
Now, the actual proof^9. Suppose |a〉 is an eigenvector of A with eigenvalue a. Obviously, the characteristic equation (88) is satisfied for λ = a. Applying the left-hand side of (89) on |a〉, we get
[A^n + c_{n−1}A^{n−1} + · · · + c_1A + c_0] |a〉 = [a^n + c_{n−1}a^{n−1} + · · · + c_1a + c_0] |a〉 = 0 ,     (90)
from (88). This is true for all eigenvectors, so the matrix polynomial must identically be zero.
To see what we exactly mean by the Cayley-Hamilton theorem, consider the matrix
A = ( a  b )
    ( c  d ) .
The characteristic equation is
| λ − a    −b   |
|  −c     λ − d | = λ^2 − (a + d)λ + (ad − bc) = 0 .     (91)
If we replace λ by A, we get the matrix polynomial
( a  b ) ( a  b ) − (a + d) ( a  b ) + (ad − bc) ( 1  0 )
( c  d ) ( c  d )           ( c  d )             ( 0  1 ) ,     (92)
and it is straightforward to check that this is a 2 × 2 null matrix.
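The same check can be done numerically for any square matrix, using numpy's poly to get the coefficients of the characteristic polynomial and then evaluating the matrix polynomial (a sketch):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))

c = np.poly(A)          # coefficients of det(lambda*1 - A), highest power first, c[0] = 1
n = A.shape[0]

# evaluate A^n + c_{n-1} A^{n-1} + ... + c_1 A + c_0 * 1
P = sum(coeff * np.linalg.matrix_power(A, n - k) for k, coeff in enumerate(c))
print(np.allclose(P, np.zeros((n, n))))      # True: A satisfies its own characteristic equation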
Examples:
1. Suppose the matrix A satisfies A^2 − 5A + 4 = 0, where 4 means 4 times the unit matrix. The eigenvalues of A must then satisfy λ^2 − 5λ + 4 = 0, so each eigenvalue is either 1 or 4.
2. The Pauli matrices satisfy σ_i^2 = 1, so the eigenvalues must be either +1 or −1.
^9 This is not actually a watertight proof, but will do for us.