
Linear Algebra lecture notes

Kazufumi Ito
Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
November 18, 2020

CONTENTS

• Introduction — Linear map by matrix A, Linear system of equations, dot product, Matrix products, Field.

• Vector space — Subspaces, null space N (A) and range space R(A), Linearly independent vectors, Span, Gauss-Jordan reduction, Reduced Row Echelon Form, Elementary matrix multiplication and LU decomposition, Basis and dimension, Inverse of a square matrix A.

• Determinant and Matrix inverse — Properties of the determinant, Cramer's rule, Matrix inverse.

• Linear Transform T — Invertibility is equivalent to injectivity and surjectivity, Fundamental Theorem of Linear Maps, Matrix representation, Change of basis and Similarity transform, N (T ) and R(T ).

• Eigenvalues — Invariant subspaces, Diagonalization, Generalized eigenvectors, Jordan form, Applications to ODEs and Markov chains.

• Inner product and Orthogonality — Gram-Schmidt orthogonalization, Orthogonal decomposition theorems, Minimum-norm and least squares solutions, Generalized matrix inverse.

• QR decomposition and Singular value decomposition — Householder transform, Applications.

1 Introduction
Probably the most important problem in mathematics is that of solving a system of linear
equations. Well over 75 percent of all mathematical problems encountered in scientific or
industrial applications involve solving a linear system at some stage. By using the methods
of modern mathematics, it is often possible to take a sophisticated problem and reduce it
to a single system of linear equations. Linear systems arise in applications to such areas
as business, economics, sociology, ecology, demography, genetics, electronics, engineering,
physics, statistics, neural networks and AI. Therefore, it seems appropriate to begin the
lecture with a section on linear systems.
A linear map A sends a column vector x of dimension n,

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\]

onto a vector y of dimension m by

\[
y = A(x) = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ a_{21}x_1 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix},
\]

and it is linear: A(αx1 + βx2) = αAx1 + βAx2. The linear map A is thus defined by the matrix

\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},
\]

and maps the column vector x to the matrix product

y = Ax.

A linear system of equations is Ax = y, and we look for a solution vector x given the
right-hand side vector y.
The dot product of a vector a = (a1 , a2 , · · · , an ) with x is defined as

a · x = a1 x1 + a2 x2 + · · · + an xn ,

i.e., by multiplying the entries of a and x term by term and summing these n products. Then
yi is the dot product of the i-th row of A and x.
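As a quick illustration, here is a minimal MATLAB sketch (the particular numbers are my own choice, not from the text) that applies a 2 × 3 matrix to a vector and checks that each entry of y = Ax is the dot product of the corresponding row of A with x.

A = [1 2 3; 4 5 6];      % a 2-by-3 matrix, so m = 2 and n = 3
x = [1; 0; -1];          % a column vector of dimension n = 3
y = A*x;                 % the linear map applied to x
y1 = dot(A(1,:), x);     % first entry of y as the dot product of row 1 of A with x
y2 = dot(A(2,:), x);     % second entry likewise
disp([y [y1; y2]])       % the two columns agree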
EXAMPLE (traffic flow)
In the downtown section of a certain city, two sets of one-way streets intersect as shown
in Figure. The average hourly volume of traffic entering and leaving this section during rush
hour is given in the diagram.
At each intersection the number of automobiles entering must be the same as the number
leaving
x1 + 450 = x2 + 610 (intersection A)
x2 + 520 = x3 + 480 (intersection B)
x3 + 390 = x4 + 600 (intersection C)
x4 + 640 = x1 + 310 (intersection D)

Thus, we obtain a system of linear equations:

\[
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ -1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} 160 \\ -40 \\ 210 \\ -330 \end{pmatrix}.
\]

A defines a linear map from Rn into Rm , i.e.,

A(a1 x1 + a2 x2 ) = a1 Ax1 + a2 Ax2

for all a1 , a2 ∈ R and vectors x1 , x2 .

Matrix product C = AB. In general, let A be an m × n matrix and B an n × p matrix
(note: the number of columns of A and the number of rows of B must be the same, n),

\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},
\qquad
B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix}.
\]

The matrix product C = AB is defined to be the m × p matrix

\[
C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mp} \end{pmatrix}
\]

such that

\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj},
\]

for i = 1, · · · , m and j = 1, · · · , p. That is, the entry cij is the dot product of the i th row
of A and the jth column of B.
Therefore, C = AB can also be written as

\[
C = \begin{pmatrix}
a_{11}b_{11} + \cdots + a_{1n}b_{n1} & a_{11}b_{12} + \cdots + a_{1n}b_{n2} & \cdots & a_{11}b_{1p} + \cdots + a_{1n}b_{np} \\
a_{21}b_{11} + \cdots + a_{2n}b_{n1} & a_{21}b_{12} + \cdots + a_{2n}b_{n2} & \cdots & a_{21}b_{1p} + \cdots + a_{2n}b_{np} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}b_{11} + \cdots + a_{mn}b_{n1} & a_{m1}b_{12} + \cdots + a_{mn}b_{n2} & \cdots & a_{m1}b_{1p} + \cdots + a_{mn}b_{np}
\end{pmatrix}.
\]
Thus the product AB is defined if and only if the number of columns in A equals the number
of rows in B, in this case n.
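The entry formula for cij can be checked numerically; the following MATLAB sketch (sizes and entries chosen at random, purely for illustration) compares the built-in product with the row-by-column dot products.

m = 3; n = 4; p = 2;
A = rand(m,n); B = rand(n,p);
C = A*B;                          % built-in matrix product
Cij = zeros(m,p);
for i = 1:m
    for j = 1:p
        Cij(i,j) = A(i,:)*B(:,j); % dot product of the i-th row of A with the j-th column of B
    end
end
max(max(abs(C - Cij)))            % essentially zero (rounding error only)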
Note: An element s belongs to a set S ⇔ s ∈ S.
A ⊂ B: a set A is a subset of a set B, or equivalently B is a superset of A, if A is
contained in B, that is, if all elements of A are also elements of B.
For example, Q is a subset of R, and R is a subset of C.

2 Vector space
Linear algebra is the study of linear maps on finite-dimensional vector spaces. Eventually
we will learn what all these terms mean. In this chapter we will define vector spaces and
discuss their elementary properties. We recall that an n-tuple of real numbers is written as a column vector

\[
x = \vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]

for example, the solution of a linear system. A vector space is a collection of vectors. The
operations of addition and scalar multiplication for vectors obey, e.g.,

\[
a_1 \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + a_2 \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= \begin{pmatrix} a_1 u_1 + a_2 v_1 \\ a_1 u_2 + a_2 v_2 \\ a_1 u_3 + a_2 v_3 \end{pmatrix}
\]

for a1 , a2 ∈ R and vectors ~u, ~v ∈ R3 .

LEARNING OBJECTIVES FOR THIS CHAPTER:

• Vector space and subspace

• Linearly independent vectors and span

• Gauss elimination (method to solve Ax = b).

• Bases

• Dimension of subspace

Next, we present the formal definition of a vector space.


Definition (Field F ) A field is a set F together with two binary operations on F called
addition and multiplication. A binary operation on F is a mapping F × F → F , that is, a
correspondence that associates with each ordered pair of elements of F a uniquely determined
element of F . The result of the addition of a and b is called the sum of a and b, and is
denoted a + b. Similarly, the result of the multiplication of a and b is called the product of
a and b, and is denoted ab or a · b. These operations are required to satisfy the following
properties, referred to as field axioms. In these axioms, a, b, and c are arbitrary elements of
the field F .

• Associativity of addition and multiplication: a+(b+c) = (a+b)+c, and a(bc) = (ab)c.

• Commutativity of addition and multiplication: a + b = b + a, and ab = ba.

• Additive and multiplicative identity: there exist two different elements 0 and 1 in F
such that a + 0 = a and a · 1 = a.

• Additive inverses: for every a in F , there exists an element in F, denoted −a, called
the additive inverse of a, such that a + (−a) = 0.

• Multiplicative inverses: for every a ≠ 0 in F , there exists an element in F , denoted by
a−1 or 1/a, called the multiplicative inverse of a, such that aa−1 = 1.

• Distributivity of multiplication over addition: a(b + c) = (ab) + (ac).

This may be summarized by saying: a field has two operations, called addition and
multiplication; it is an abelian group under addition with 0 as the additive identity; the
nonzero elements are an abelian group under multiplication with 1 as the multiplicative
identity; and multiplication distributes over addition.
N and Z are not fields. C, R and Q are all fields. There are many other fields, including
some finite fields. For example, for each prime number p, there is a field Fp = {0, 1, 2, ..., p−1}
with p elements, where addition and multiplication are carried out modulo p. Thus, in F7 ,
we have 5 + 4 = 2, 5 × 4 = 6 and 5−1 = 3 because 5 × 3 = 1. The smallest such field F2 has
just two elements 0 and 1, where 1 + 1 = 0. This field is extremely important in Computer
Science since an element of F2 represents a bit of information.
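The arithmetic in Fp can be reproduced with MATLAB's mod function; the following minimal sketch simply re-checks the F7 and F2 computations quoted above.

p = 7;
mod(5 + 4, p)    % addition:       5 + 4 = 2 in F_7
mod(5 * 4, p)    % multiplication: 5 * 4 = 6 in F_7
mod(5 * 3, p)    % 5 * 3 = 1, so 3 is the multiplicative inverse of 5 in F_7
mod(1 + 1, 2)    % in F_2: 1 + 1 = 0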
Definition (Vector Space V ) A vector space over a field F is a set V of vectors v = ~v, together
with vector addition and scalar multiplication, satisfying the following axioms for all u, v, w ∈ V
and a, b ∈ F :

• Associativity of addition: u + (v + w) = (u + v) + w

• Commutativity of addition: u + v = v + u

• Identity element of addition: There exists an element 0 ∈ V , called the zero vector,
such that v + 0 = v for all v ∈ V .

• Inverse elements of addition: For every v ∈ V , there exists an element −v ∈ V , called
the additive inverse of v, such that v + (−v) = 0.

• Compatibility of scalar multiplication with field multiplication: a(bv) = (ab)v.

• Identity element of scalar multiplication: 1v = v, where 1 denotes the multiplicative
identity in F .

• Distributivity of scalar multiplication with respect to vector addition: a(u + v) = au + av.

• Distributivity of scalar multiplication with respect to field addition: (a + b)v = av + bv.

EXAMPLEs (1) The most familiar examples are

R2 = {~x = (x1 , x2 ), x1 , x2 ∈ R} and R3 = {~x = (x1 , x2 , x3 ), x1 , x2 , x3 ∈ R},

which we can think of geometrically as the points in ordinary 2 and 3-dimensional space,
equipped with a coordinate system. In general

Rn = {~x = (x1 , x2 , · · · , xn ), x1 , x2 , · · · , xn ∈ R}.

For example, in R3,

\[
\vec{u} + \vec{v} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{pmatrix},
\qquad
c\,\vec{u} = \begin{pmatrix} c\,u_1 \\ c\,u_2 \\ c\,u_3 \end{pmatrix},
\]

\[
(2\vec{u} + \vec{v}) - 3\vec{w} = \begin{pmatrix} 2u_1 \\ 2u_2 \\ 2u_3 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} - \begin{pmatrix} 3w_1 \\ 3w_2 \\ 3w_3 \end{pmatrix} = \begin{pmatrix} 2u_1 + v_1 - 3w_1 \\ 2u_2 + v_2 - 3w_2 \\ 2u_3 + v_3 - 3w_3 \end{pmatrix}.
\]
(2) The set Rm×n of all m × n matrices is itself a vector space over R using the operations
of addition and scalar multiplication.
     
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
+ \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}
= \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \end{pmatrix},
\]
\[
c \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
= \begin{pmatrix} ca_{11} & ca_{12} & ca_{13} \\ ca_{21} & ca_{22} & ca_{23} \\ ca_{31} & ca_{32} & ca_{33} \end{pmatrix}.
\]
(3) Let Pn be the set of polynomials in x with coefficients in the field F . That is,

Pn = {a0 + a1 x + · · · + an xn , ai ∈ R}.

Let C((0, 1), R) be the set of all functions f : (0, 1) → R, with the usual pointwise definitions
of addition and scalar multiplication of functions:

(f + g)(t) = f (t) + g(t), (cf )(t) = cf (t) for all t ∈ (0, 1).

We shall assume the following additional simple properties of vectors and scalars from
now on. They can all be deduced from the axioms (and it is a useful exercise to do so).

(i) a~0 = ~0, (ii) 0~v = ~0, (iii) −(a~v) = (−a)~v = a(−~v), (iv) a~v = ~0 implies a = 0 or ~v = ~0,

for all a ∈ F and v ∈ V .


(1) The identity element of addition 0 is unique. Proof: if 0 and 0′ are both additive identities, then 0 = 0 + 0′ = 0′.
(2) The additive inverse is unique: u + v = 0 = u + v′ implies v = v′. Proof:

v = v + 0 = v + (u + v′) = (v + u) + v′ = 0 + v′ = v′.

(ii) Proof: 0v = (0 + 0)v = 0v + 0v, and adding the additive inverse of 0v to both sides gives 0v = ~0.


Convention: ~u − ~v = ~u + (−1)~v = ~u + (−~v )

2.1 Subspaces
Definition (subspace) A subset U of vector space V is called a subspace of V if U is also a
vector space (using the same addition and scalar multiplication as on V ).
EXAMPLEs (1)

\[
\left\{ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} : x, y \in \mathbb{R} \right\}
\quad\text{and}\quad
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x + 2y - z = 0 \right\}
\]

are subspaces of R3, while

\[
\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x + 2y - z = 1 \right\}
\]

is not a subspace of R3. Also,

\[
\left\{ \begin{pmatrix} x & y \\ y & z \end{pmatrix} : x, y, z \in \mathbb{R} \right\} \quad \text{(symmetric matrices)}
\]

is a subspace of R2×2.
(2) The null space of a matrix A ∈ Rm×n,

N (A) = {x ∈ Rn : Ax = ~0},

is a subspace of Rn since

A(a1~x1 + a2~x2 ) = a1 A~x1 + a2 A~x2 .

The range space of a matrix A

R(A) = {y ∈ Rm : y = Ax, x ∈ Rn }

is a subspace of Rm since

a1 y1 + a2 y2 = A(a1 x1 + a2 x2 ) for y1 = Ax1 , y2 = Ax2 .

(3) P2 is a subspace of P3 .
(4) The space C 1 (0, 1) of all continuously differentiable functions on (0, 1) is a subspace of
the space of continuous functions C(0, 1).
(5) {f ∈ C 1 (0, 1) : f ′ (1/2) = f (1/2)} is a subspace of C 1 (0, 1).
(6) Let S be the set of all f ∈ C 2 (0, 1) such that f ′′ + f = 0. Then S is a subspace of C 2 (0, 1).
In fact,

(a1 f + a2 g)′′ + (a1 f + a2 g) = a1 (f ′′ + f ) + a2 (g ′′ + g) = 0,

so a1 f + a2 g ∈ S for all f, g ∈ S.
Proposition If W1 and W2 are subspaces of V then so is W1 ∩ W2 .
Proof. Let u, v ∈ W1 ∩ W2 and a ∈ F . Then u + v ∈ W1 (because W1 is a subspace)
and u + v ∈ W2 (because W2 is a subspace). Hence u + v ∈ W1 ∩ W2 . Similarly, we get
av ∈ W1 ∩ W2 , so W1 ∩ W2 is a subspace of V .
Warning! It is not necessarily true that W1 ∪ W2 is a subspace, as the following example
shows.
EXAMPLE Let V = R2 , let W1 = {(a, 0) : a ∈ R} and W2 = {(0, b) : b ∈ R}. Then W1 , W2
are subspaces of V , but W1 ∪ W2 is not a subspace, because (1, 0), (0, 1) ∈ W1 ∪ W2 , but
(1, 0) + (0, 1) = (1, 1) ∉ W1 ∪ W2 .
Note that any subspace of V that contains W1 and W2 has to contain all vectors of the
form u + v for u ∈ W1 , v ∈ W2 . This motivates the following definition.

Definition Let W1 , W2 be subspaces of the vector space V . Then the sum of W1 , W2 is

W1 + W2 = {w1 + w2 : w1 ∈ W1 , w2 ∈ W2 }.
Do not confuse W1 + W2 with W1 ∪ W2 .
Proposition If W1 , W2 are subspaces of V then so is W1 + W2 . In fact, it is the smallest
subspace that contains both W1 and W2 .
Proof. Let u, v ∈ W1 + W2 . Then u = u1 + u2 for some u1 ∈ W1 , u2 ∈ W2 and v = v1 + v2 for
some v1 ∈ W1 , v2 ∈ W2 . Then u + v = (u1 + v1 ) + (u2 + v2 ) ∈ W1 + W2 . Similarly, if a ∈ F
then av = av1 + av2 ∈ W1 + W2 . Thus W1 + W2 is a subspace of V . Any subspace of V that
contains both W1 and W2 must contain W1 + W2 , so it is the smallest such subspace.

2.2 Linear independent

Definition A sequence of vectors (v~1 , v~2 , · · · , v~k ) from a vector space V is said to be linearly
dependent if there exist scalars a1 , a2 , . . . , ak , not all zero, such that

a1 v~1 + a2 v~2 + · · · + ak v~k = ~0.

Notice that if not all of the scalars are zero, then at least one is nonzero, say a1 , in which
case this equation can be written in the form

\[
\vec{v}_1 = \frac{-a_2}{a_1}\vec{v}_2 + \cdots + \frac{-a_k}{a_1}\vec{v}_k .
\]

Thus, v~1 is shown to be a linear combination of the remaining vectors.
A sequence of vectors (v~1 , v~2 , . . . , v~n ) is said to be linearly independent if the equation

a1 v~1 + a2 v~2 + · · · + an v~n = ~0,

can only be satisfied by ai = 0, i = 1, · · · , n. This implies that no vector in the sequence can
be represented as a linear combination of the remaining vectors in the sequence. Even more

concisely, a sequence of vectors is linearly independent if and only if ~0 can be represented as
a linear combination of its vectors in a unique way.
An alternative definition: a sequence of vectors is linearly dependent if and only if
some vector in that sequence can be written as a linear combination of the other vectors.
Remark: (1) Let ~vi , 1 ≤ i ≤ n, be column vectors

\[
\vec{v}_i = \begin{pmatrix} v_{1,i} \\ \vdots \\ v_{m,i} \end{pmatrix} \in \mathbb{R}^m
\]

and let A = [~v1 |~v2 | · · · |~vn ]. Then a1~v1 + · · · + an~vn = ~0 reads

\[
\begin{pmatrix} v_{1,1} & \cdots & v_{1,n} \\ \vdots & & \vdots \\ v_{m,1} & \cdots & v_{m,n} \end{pmatrix}
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.
\]

(2) {~vk } are linearly independent if and only if Ax = ~0 has the unique solution x = ~0, i.e., N (A) = {~0}
(the null space of A). Moreover, Ax = b then has at most one solution: Ax1 = b and Ax2 = b imply
A(x1 − x2 ) = ~0 and thus x1 = x2 .
(3) {~vk } are linearly dependent if and only if Ax = ~0 has a nontrivial solution.
Question 1 and Objective: Identify linearly independent or dependent. How to
find N (A) and R(A).
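In MATLAB, Question 1 can be answered with rank and null; a small sketch (the vectors here are my own choice) is:

v1 = [1; 2; 3]; v2 = [2; 4; 6]; v3 = [0; 1; 1];
A = [v1 v2 v3];        % stack the vectors as columns of A
rank(A)                % rank(A) = 2 < 3 columns, so the vectors are linearly dependent
null(A)                % a nonzero basis vector of N(A), since v2 = 2*v1
rank([v1 v3])          % equals 2, so {v1, v3} are linearly independent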

2.3 Span
Let ~v1 , ~v2 , · · · , ~vn be vectors in a vector space V . A sum of the form

a1~v1 + a2~v2 + · · · + an~vn

where a1 , · · · an ∈ R, is called a linear combination of ~v1 , ~v2 , · · · , ~vn . The set of all linear
combinations of ~v1 , ~v2 , · · · , ~vn is called the span of ~v1 , ~v2 , · · · , ~vn , i.e.,

Span(~v1 , ~v2 , · · · , ~vn ) = {a1~v1 + a2~v2 + · · · + an~vn , ai ∈ R}.

which is a subspace of V .
Remark Let ~ei be the i-th unit vector, i.e., (~ei )j = 0 for j ≠ i and (~ei )i = 1. Then {~ei }ni=1
are linearly independent and Rn = span(~e1 , ~e2 , · · · , ~en ), i.e.,

~x = (x1 , · · · , xn ) = x1 ~e1 + x2 ~e2 + · · · + xn~en .

span(~v1 , ~v2 , · · · , ~vn ) = V1 + V2 + · · · + Vn (a sum of the one-dimensional subspaces Vi ),

where Vi = span(~vi ). Recall that if (~v1 , ~v2 , · · · , ~vn ) are linearly dependent with, say, a1 ≠ 0, then
~v1 ∈ span(~v2 , · · · , ~vn ) and thus span(~v1 , ~v2 , · · · , ~vn ) = span(~v2 , · · · , ~vn ).
For ~vi ∈ Rn ,

span(~v1 , ~v2 , · · · , ~vn ) = Rn

if and only if {~v1 , ~v2 , · · · , ~vn } are linearly independent.
Question 2 and Objective: How to determine the span of vectors.
EXAMPLEs (1) P3 = span(1, x, x2 , x3 )

p(x) = a0 + a1 x + a2 x2 + a3 x3

(2) The vectors {1, x, x2 , x3 } are linearly independent in P3 . Suppose they were linearly dependent,
so that, say,
x3 = a0 + a1 x + a2 x2 for some a0 , a1 , a2 ∈ R.
Taking the derivative of this three times in x, we obtain 6 = 0, which is a contradiction.
(3) Vectors {a11 +a12 x+a13 x2 , a21 +a22 x+a23 x2 , a31 +a32 x+a33 x2 } are linearly independent
if and only if N (A) = {~0} where
 
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.
\]

(4) The span of two linearly independent vectors ~v1 and ~v2 is the plane that contains them both. The vectors
~v1 = (1, 2, 3)t , ~v2 = (2, 4, 6)t are linearly dependent and

span(~v1 , ~v2 ) = span(~v1 ).

(5) If span(~v1 , ~v2 , · · · , ~vn ) = Rm , then n ≥ m. Conversely, if n > m, then (~v1 , ~v2 , · · · , ~vn )
are linearly dependent.

2.4 Gauss-Jordan Reduction


LEARNING OBJECTIVES FOR THIS SECTION: Gauss elimination to Triangular matrix
form U , and LU decomposition of matrix A. Examples and Applications.
In this section we study a way to solve a linear equation Ax = b. If there exists a unique
solution to Ax = b we write
x = A−1 b,
where A−1 ∈ Rn×n is the inverse of the matrix A, i.e., A−1 A = I = identity matrix.
Objective includes: Identify {~v1 , · · · , ~vn } linearly independent or dependent. Find N (A)
of matrix A of column vectors {~v1 , · · · , ~vn }:

A = [~v1 |~v2 | · · · |~vn ].

EXAMPLE 1 (Triangular System)

\[
\begin{cases} 3x_1 + 2x_2 + x_3 = 1 \\ x_2 - x_3 = 2 \\ 2x_3 = 4 \end{cases}
\quad\Leftrightarrow\quad
\begin{pmatrix} 3 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}
\]

is in upper triangular form, since the matrix A has all zeros below the diagonal. Because
of the strict triangular form, the system is easy to solve. It follows from the third equation
that x3 = 2. Using this value in the second equation, we obtain

x2 − 2 = 2 ⇒ x2 = 4.

Using x2 = 4, x3 = 2 in the first equation, we end up with

3x1 + 2 · 4 + 2 = 1 ⇒ x1 = −3.
Thus, the solution of the system is (−3, 4, 2).
Any n × n upper triangular system can be solved in the same manner as the last example.
First, the nth equation is solved for the value of xn . This value is used in the (n−1)st equation
to solve for xn−1 . The values xn and xn−1 are used in the (n − 2)nd equation to solve for
xn−2 , and so on. We will refer to this method of solving an upper triangular system as back
substitution.
Remark If all diagonal entries of upper triangle matrix A are nonzero, then
Ax = b has a unique solution by back substitution. Ax = ~0 has a unique solution
x = ~0, equivalently N (A) = {~0} and {~v1 , · · · , ~vn } are linearly independent.
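A minimal MATLAB sketch of back substitution, applied to the triangular system of EXAMPLE 1, might look as follows.

U = [3 2 1; 0 1 -1; 0 0 2];
b = [1; 2; 4];
n = length(b);
x = zeros(n,1);
for i = n:-1:1
    x(i) = (b(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);   % solve the i-th equation for x(i)
end
x   % returns (-3; 4; 2), matching the solution above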
Gauss-Jordan Reduction transforms A into an upper triangular matrix by
row operations as below (Gauss elimination).
EXAMPLE 2 Solve the system

\[
\begin{cases} x_1 + 2x_2 + x_3 = 3 \\ 3x_1 - x_2 - 3x_3 = -1 \\ 2x_1 + 3x_2 + x_3 = 4 \end{cases}
\quad\Leftrightarrow\quad
\begin{pmatrix} 1 & 2 & 1 \\ 3 & -1 & -3 \\ 2 & 3 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -1 \\ 4 \end{pmatrix}.
\]

Subtracting 3 times the first row from the second row yields −7x2 − 6x3 = −10.
Subtracting 2 times the first row from the third row yields −x2 − x3 = −2.
If the second and third equations of our system, respectively, are replaced by these new
equations, we obtain the equivalent system
    
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -7 & -6 \\ 0 & -1 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -10 \\ -2 \end{pmatrix}.
\]

If the third equation of this system is replaced by the sum of the third equation and −1/7
times the second equation, we end up with the following upper triangular system:

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -7 & -6 \\ 0 & 0 & -\tfrac{1}{7} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -10 \\ -\tfrac{4}{7} \end{pmatrix}.
\]
Using back substitution, we get x3 = 4, x2 = −2, x1 = 3.
With each system of equations Ax = b we may associate an augmented matrix of the form

\[
(A \mid b) = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix},
\]

where we attach to the coefficient matrix A an additional column b. The system can be
solved by performing operations on the augmented matrix. The xi 's are placeholders that
can be omitted until the end of the computation. Corresponding to the three operations used
to obtain equivalent systems, the following row operations may be applied to the augmented
matrix:
Elementary Row Operations

• [I] Interchange two rows.


• [II] Multiply a row by a nonzero real number.
• [III] Replace a row by its sum with a multiple of another row.

EXAMPLE 3 Solve the system

\[
\begin{cases}
0x_1 - x_2 - x_3 + x_4 = 0 \\
x_1 + x_2 + x_3 + x_4 = 6 \\
2x_1 + 4x_2 + x_3 - 2x_4 = -1 \\
3x_1 + x_2 - 2x_3 + 2x_4 = 3
\end{cases}
\]

The augmented matrix for this system is


 
\[
\begin{pmatrix}
0 & -1 & -1 & 1 & 0 \\
1 & 1 & 1 & 1 & 6 \\
2 & 4 & 1 & -2 & -1 \\
3 & 1 & -2 & 2 & 3
\end{pmatrix}.
\]
Since it is not possible to eliminate any entries by using 0 as a pivot element, we will use
row operation [I] to interchange the first two rows of the augmented matrix. The new first
row will be the pivotal row and the pivot element will be 1:
 
\[
(\text{pivot } a_{11} \neq 0) \;\rightarrow\;
\begin{pmatrix}
1 & 1 & 1 & 1 & 6 \\
0 & -1 & -1 & 1 & 0 \\
2 & 4 & 1 & -2 & -1 \\
3 & 1 & -2 & 2 & 3
\end{pmatrix}.
\]
Row operation [III] is then used twice to eliminate the two nonzero entries in the first column:
 
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 6 \\
0 & -1 & -1 & 1 & 0 \\
0 & 2 & -1 & -4 & -13 \\
0 & -2 & -5 & -1 & -15
\end{pmatrix}.
\]
Next, the second row is used as the pivotal row to eliminate the entries in the second column
below the pivot element −1:
 
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 6 \\
0 & -1 & -1 & 1 & 0 \\
0 & 0 & -3 & -2 & -13 \\
0 & 0 & -3 & -3 & -15
\end{pmatrix}.
\]

Finally, the third row is used as the pivotal row to eliminate the last element in the third
column:

\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 6 \\
0 & -1 & -1 & 1 & 0 \\
0 & 0 & -3 & -2 & -13 \\
0 & 0 & 0 & -1 & -2
\end{pmatrix}.
\]
This augmented matrix represents an upper triangular system. Solving by back substitution,
we obtain the solution (2, −1, 3, 2). In general, if an n × n linear system can be reduced to
upper triangular form, then it will have a unique solution that can be obtained by performing
back substitution on the triangular system. We can think of the reduction process as an
algorithm involving n − 1 steps. At the first step, a pivot element is chosen from among the
nonzero entries in the first column of the matrix. The row containing the pivot element is
called the pivotal row. We interchange rows (if necessary) so that the pivotal row is the new
first row. Multiples of the pivotal row are then subtracted from each of the remaining n − 1
rows so as to obtain 0s in the first entries of rows 2 through n. At the second step, a pivot
element is chosen from the nonzero entries in column 2, rows 2 through n, of the matrix.
The row containing the pivot is then interchanged with the second row of the matrix and
is used as the new pivotal row. Multiples of the pivotal row are then subtracted from the
remaining n − 2 rows so as to eliminate all entries below the pivot in the second column.
The same procedure is repeated for columns 3 through n − 1. Note that at the second step
row 1 and column 1 remain unchanged, at the third step the first two rows and first two
columns remain unchanged, and so on. At each step, the overall dimensions of the system
are effectively reduced by 1 If the elimination process can be carried out as described, we
will arrive at an equivalent strictly triangular system after n − 1 steps. The steps of Gauss
elimination is depicted by

However, the procedure will break down if, at any step, all possible choices for a pivot
element equals to 0. When this happens, the alternative is to reduce the system to certain
special echelon, or staircase-shaped, forms. These echelon forms will be studied in the next
section. They will also be used for m × n systems, where m 6= n.

2.5 Reduced Row Echelon Form


The Gauss-Jordan procedure will break down if, at any step, all possible choices for a pivot
element are equal to 0. When this happens, the alternative is to reduce the system to
certain special echelon, or staircase-shaped, forms. A matrix is in row echelon form if it has
the shape resulting from a Gaussian elimination. Specifically, a matrix is in row echelon
form if
(a) all rows consisting of only zeroes are at the bottom.
(b) the leading coefficient (also called the pivot) of a nonzero row is always strictly to the
right of the leading coefficient of the row above it.
These two conditions imply
(c) all entries in a column below a leading coefficient are zeros.

Suppose we start with b = (1, −1, 1, 3, 4)t . Then the reduction process will yield
the echelon-form augmented matrix with last column (1, 3, 0, 0, 0)t , and the last two
equations of the reduced system will be satisfied for any 5-tuple. Thus the solution set will
be the set of all 5-tuples satisfying the first three equations:

x1 + x2 + x3 + x 4 + x5 = 1
x3 + x4 + 2x5 = 0
x5 = 3.

The variables corresponding to the first nonzero elements in each row of the reduced matrix
will be referred to as lead variables. Thus x1 , x3 , and x5 are the lead variables. The remaining
variables corresponding to the columns skipped in the reduction process will be referred to
as free variables. Hence, x2 and x4 are the free variables. If we transfer the free variables
over to the right-hand side of these equations, we obtain the system

x1 + x3 + x5 = 1 − x2 − x4
x3 + 2x5 = −x4
x5 = 3.

This system is strictly triangular in the unknowns x1 , x3 , and x5 . Thus, for each pair of
values assigned to x2 = α and x4 = β, there will be a unique solution.

x5 = 3, x3 = −β − 6, x1 = −2 − α − 2β

Let b = (0, 0, 0, 0, 0)t . The reduced echelon form yields

x1 + x3 + x5 = −x2 − x4
x3 + 2x5 = −x4
x5 = 0

Thus, we have

N (A) = {(−α − 2β, α, −β, β, 0)t : α, β ∈ R} = α (−1, 1, 0, 0, 0) + β (−2, 0, −1, 1, 0).

EXAMPLE (traffic) The augmented matrix for the system


 
\[
\begin{pmatrix}
1 & -1 & 0 & 0 & 160 \\
0 & 1 & -1 & 0 & -40 \\
0 & 0 & 1 & -1 & 210 \\
-1 & 0 & 0 & 1 & -330
\end{pmatrix}
\]

is reduced to

\[
\begin{pmatrix}
1 & -1 & 0 & 0 & 160 \\
0 & 1 & -1 & 0 & -40 \\
0 & 0 & 1 & -1 & 210 \\
0 & -1 & 0 & 1 & -170
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & -1 & 0 & 0 & 160 \\
0 & 1 & -1 & 0 & -40 \\
0 & 0 & 1 & -1 & 210 \\
0 & 0 & -1 & 1 & -210
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & -1 & 0 & 0 & 160 \\
0 & 1 & -1 & 0 & -40 \\
0 & 0 & 1 & -1 & 210 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

The system is consistent, and since there is a free variable, there are many possible solutions.
The traffic flow diagram does not give enough information to determine x1 , x2 , x3 , and x4
uniquely. If the amount of traffic were known between any pair of intersections, the traffic
on the remaining arteries could easily be calculated. For example, if the amount of traffic
between intersections C and D averages 200 automobiles per hour, then x4 = 200. Using
this value, we can then solve for x1 , x2 , and x3 by back substitution: x1 = 530, x2 = 370,
x3 = 410.
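The traffic computation can be replayed with MATLAB's rref; the sketch below reduces the augmented matrix and exhibits the free variable.

Ab = [ 1 -1  0  0  160
       0  1 -1  0  -40
       0  0  1 -1  210
      -1  0  0  1 -330];
rref(Ab)        % the last row becomes zero, so x4 is a free variable
% Setting x4 = 200 and back-substituting gives x3 = 410, x2 = 370, x1 = 530.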
EXAMPLE (Underdetermined) Consider

x1 + x2 + x3 + x4 + x5 = 2, x1 + x2 + x3 + 2x4 + 2x5 = 3, x1 + x2 + x3 + 2x4 + 3x5 = 2


     
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 2 \\
1 & 1 & 1 & 2 & 2 & 3 \\
1 & 1 & 1 & 2 & 3 & 2
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 2 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 2 & 0
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 2 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}.
\]

It is consistent. Putting the free variables x2 , x3 on the right-hand side, it follows that

x1 = 1 − x2 − x3 , x4 = 2, x5 = −1.

Thus, for any real numbers α and β, the 5-tuple

(1 − α − β, α, β, 2, −1)

is a solution of the system.

N (A) = {(−α − β, α, β, 0, 0)t : α, β ∈ R} = α (−1, 1, 0, 0, 0)t + β (−1, 0, 1, 0, 0)t .

EXAMPLE (Overdetermined)

x1 + 2x2 + x3 = 1, 2x1 − x2 + x3 = 2, 4x1 + 3x2 + 3x3 = 4, 3x1 + x2 + 2x3 = 3


     
\[
\begin{pmatrix}
1 & 2 & 1 & 1 \\
2 & -1 & 1 & 2 \\
4 & 3 & 3 & 4 \\
3 & 1 & 2 & 3
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & 2 & 1 & 1 \\
0 & -5 & -1 & 0 \\
0 & -5 & -1 & 0 \\
0 & -5 & -1 & 0
\end{pmatrix}
\rightarrow
\begin{pmatrix}
1 & 2 & 1 & 1 \\
0 & -5 & -1 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]

It is consistent. Putting the free variable x3 on the right-hand side, it follows that

x2 = −(1/5) x3 ,   x1 = 1 − 2x2 − x3 = 1 − (3/5) x3 .

Thus, for any real number α, the 3-tuple

(1 − (3/5) α, −(1/5) α, α)

is a solution of the system, and

N (A) = {(−(3/5) α, −(1/5) α, α)} = α (−3/5, −1/5, 1).

2.5.1 Elementary matrix multiplication and LU decomposition of A


The elementary row operations of Gauss elimination can be rewritten in a matrix product
form A = LU where L is a lower triangular matrix and U is a reduced upper triangular
matrix. Recall that Gauss elimination uses (N − 1) steps to reduce A into U . That is, given
an N × N matrix A = (ai,j )1≤i,j≤N , define A(0) = A. At the n-th step we eliminate the matrix
elements below the main diagonal in the n-th column of A(n−1) by adding to the i-th row of
this matrix the n-th row multiplied by

\[
-\ell_{i,n} := -\frac{a^{(n-1)}_{i,n}}{a^{(n-1)}_{n,n}}, \qquad i = n + 1, \ldots, N.
\]

This can be done by multiplying A(n−1) from the left by the lower triangular matrix

\[
L_n = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & -\ell_{n+1,n} & \ddots & & \\
& & \vdots & & \ddots & \\
& & -\ell_{N,n} & & & 1
\end{pmatrix}.
\]

We set A(n) := Ln A(n−1) , which coincides with the n-th Gauss elimination step: the n-th step
matrix A(n) has all zeros in its n-th column below the diagonal entry, i.e., its (i, n) entries vanish
for n + 1 ≤ i ≤ N . After N − 1 steps, we have eliminated all the matrix elements below the main
diagonal, so we obtain an upper triangular matrix U = A(N −1) . We find the LU decomposition
A = LU , i.e.,

\[
U = A^{(N-1)} = L_{N-1} L_{N-2} \cdots L_1 A, \qquad
L = (L_{N-1} L_{N-2} \cdots L_1)^{-1} = L_1^{-1} L_2^{-1} \cdots L_{N-1}^{-1}.
\]

Because the inverse of a lower triangular matrix Ln is again a lower triangular matrix, and
the multiplication of two lower triangular matrices is again a lower triangular matrix, it
follows that L is a lower triangular matrix. Moreover, it can be seen that
 
\[
L = \begin{pmatrix}
1 & & & & & \\
\ell_{2,1} & \ddots & & & & \\
\vdots & & 1 & & & \\
\vdots & & \ell_{n+1,n} & \ddots & & \\
\vdots & & \vdots & & \ddots & \\
\ell_{N,1} & \cdots & \ell_{N,n} & \cdots & \ell_{N,N-1} & 1
\end{pmatrix}.
\]

It is clear that in order for this algorithm to work, one needs to have the pivot a(n−1)n,n ≠ 0 at each step
(see the definition of ℓi,n ). If this assumption fails at some point, one needs to interchange the n-th
row with another row below it before continuing (pivoting). This is why an LU decomposition
in general looks like A = P LU (P is a permutation matrix).
Remark If all diagonal entries of U are nonzero, then Ax = b for A ∈ Rn×n has
a unique solution by back substitution, and if Ax = ~0, then x = ~0; equivalently
N (A) = {~0} and the column vectors of A are linearly independent.
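MATLAB's lu returns this factorization with partial pivoting, in the form P A = LU ; the following sketch applies it to the matrix of EXAMPLE 2 and solves Ax = b by one forward and one backward substitution.

A = [1 2 1; 3 -1 -3; 2 3 1];
b = [3; -1; 4];
[L,U,P] = lu(A);
norm(P*A - L*U)     % essentially zero
y = L\(P*b);        % forward substitution
x = U\y             % back substitution; returns (3; -2; 4) as in EXAMPLE 2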

2.6 Basis and dimension
LEARNING OBJECTIVES FOR THIS SECTION: Basis and dimension of subspace and
Gauss elimination. Examples including N (A), R(A) and Properties and Algorithms.
Definition (1) A basis for a subspace S is a set of linearly independent vectors whose
span is S. The number n of vectors in a basis of the finite-dimensional subspace S is called
the dimension of S and we write dim(S) = n.
(2) The column rank of matrix A is the dimension of the column space of

A = [~v1 | · · · |~vk ],

where S = span(~v1 , · · · , ~vk ). Ref: MATLAB rank.


(3) A basis of V is a list of vectors in V that is linearly independent and spans V . The
number n of elements in a basis is always equal to the geometric dimension of the subspace
S.
Any spanning set for a subspace can be changed into a basis by removing
redundant vectors (column-wise Gauss elimination; see the algorithms below). If

S = span(~v1 , · · · , ~vn ) with {~v1 , · · · , ~vn } linearly independent, then dim(S) = n,

and for every s ∈ S there exist unique a1 , · · · , an ∈ R such that

s = a1 ~v1 + · · · + an~vn .

The dimension of the null space N (A) is called the nullity of the matrix, and is related
to the rank of the matrix A = [~v1 | · · · |~vk ] by the following equation:

rank(A) + nullity(A) = k,

which is known as the rank-nullity theorem. In fact, for rank(A) = n ≤ k, after reordering the
columns we may write

~vn+i = ai1~v1 + · · · + ain~vn , 1 ≤ i ≤ k − n,

and each of these k − n relations contributes one independent vector to N (A); thus dim(N (A)) = k − n.


EXAMPLE Let S be the subspace of R4 defined by the equations

x1 = 2x2 and x3 = 5x4 .

Then the vectors (2, 1, 0, 0) and (0, 0, 5, 1) are a basis for S. In particular, every vector that
satisfies the above equations can be written uniquely as a linear combination of the two basis
vectors:
(2t1 , t1 , 5t2 , t2 ) = t1 (2, 1, 0, 0) + t2 (0, 0, 5, 1).
The subspace S is two-dimensional. Geometrically, it is the plane in R4 passing through the
points (0, 0, 0, 0), (2, 1, 0, 0), and (0, 0, 5, 1).
EXAMPLE

In many applications, it is necessary to find a particular subspace of a vector space
V = R4 . This can be done by finding a set of basis elements of the subspace. For example,
to find all solutions of the system
x1 + x2 + x3 = 0, 2x1 + x2 + x4 = 0
we must find the null space of the matrix

\[
A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 2 & 1 & 0 & 1 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{pmatrix},
\]
and we have
x1 + x2 + x3 = 0, −x2 − 2x3 + x4 = 0
We choose x3 and x4 as free variables and solve for x1 , x2 ,
x2 = −2x3 + x4 , x1 = −x2 − x3 = x3 − x4 .
Thus, we obtain a basis of N (A)
   
\[
\begin{pmatrix} 1 \\ -2 \\ 1 \\ 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix},
\]
which corresponds to x3 = 1, x4 = 0 and x3 = 0, x4 = 1, respectively.
In general we have
Basis for a null space N (A) Recall that N (A) = {x ∈ Rn : Ax = ~0} is a subspace of Rn for a matrix
A ∈ Rm×n . One can use Gauss elimination to find a basis of N (A).
• Use elementary row operations to put A in reduced row echelon form.
• Using the reduced row echelon form, determine which of the variables x1 , x2 , · · · , xk
are free. Write equations for the dependent variables in terms of the free variables.
• For each free variable xi , choose a vector in the null space for which xi = 1 and the
remaining free variables are zero. The resulting collection of vectors is a basis for the
null space of A.
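For the matrix of the preceding example, the recipe above can be carried out in MATLAB as sketched below ('r' asks null for the "rational" basis built from the reduced row echelon form, one vector per free variable).

A = [1 1 1 0; 2 1 0 1];
rref(A)           % reduced row echelon form; x3 and x4 are the free variables
null(A, 'r')      % basis of N(A) with one basis vector per free variable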
EXAMPLE The standard basis for R3 is {~e1 , ~e2 , ~e3 }; however, there are many bases that we
could choose for R3 .
                 
 1 0 0   1 0 2   1 2 3 
 0 , 1  0  ,  1  1 , 0  ,  0 , 1  2  ,
0 0 1 1 1 1 0 0 1
     

Standard Bases We refer to the set {~e1 , ~e2 , ~e3 } as the standard basis for R3 . We refer to
this basis as the standard basis because it is the most natural one to use for representing
vectors in R3 . More generally, the standard basis for Rn is the set {~e1 , ~e2 , ..., ~en } since
 
\[
\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 \vec{e}_1 + \cdots + x_n \vec{e}_n .
\]

The most natural way to represent matrices in R2×2 is in terms of the standard 2 × 2 basis
matrices:

\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
= a_{11} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
+ a_{12} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
+ a_{21} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
+ a_{22} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
\]

The standard way to represent a polynomial in Pn is in terms of the standard basis functions
{1, x, x2 , ..., xn }, i.e.,

p(x) = a0 + a1 x + · · · + an xn .
In general, we have
Theorem 1 If {~v1 , · · · , ~vn } is a spanning set for a vector space V , then any collection of m
vectors in V , where m > n, are linearly dependent.
Proof: Let {~u1 , ~u2 , ..., ~um } be m vectors in V where m > n. Then, since {~v1 , · · · , ~vn } span
V, we have
~ui = a1,i~v1 + a2,i~v2 + · · · + an,i~vn .

Thus,

\[
c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_m\vec{u}_m
= \sum_{i=1}^{m} c_i \Big( \sum_{j=1}^{n} a_{ji}\vec{v}_j \Big)
= \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ji} c_i \Big) \vec{v}_j .
\]

Now consider the system of equations

\[
\sum_{i=1}^{m} a_{ji} c_i = 0, \quad j = 1, \ldots, n, \qquad \text{i.e.,} \quad A\vec{c} = \vec{0}.
\]

This is a homogeneous system with more unknowns than equations. Therefore, the system
must have a nontrivial solution (c1 , c2 , · · · , cm )t . Thus, {~u1 , ~u2 , ..., ~um } are linearly dependent.

Theorem 2 If V is a vector space of dimension n > 0, then
(I) any set of n linearly independent vectors spans V .
(II) any n vectors that span V are linearly independent.
Proof: Suppose that {~v1 , · · · , ~vn } are linearly independent and ~v is any other vector in V .
Since V has dimension n, it has a basis consisting of n vectors and these vectors span V . It
follows from Theorem 1 that {~v1 , · · · , ~vn , ~v } must be linearly dependent. Thus there exist
ci ∈ R, 1 ≤ i ≤ n + 1, not all zero, such that

c1~v1 + · · · + cn~vn + cn+1~v = ~0.

Then cn+1 cannot be zero, since cn+1 = 0 would force ci = 0 for 1 ≤ i ≤ n by the linear
independence of {~v1 , · · · , ~vn }. Hence

\[
\vec{v} = -\frac{c_1}{c_{n+1}}\vec{v}_1 - \cdots - \frac{c_n}{c_{n+1}}\vec{v}_n ,
\]

so ~v ∈ span(~v1 , · · · , ~vn ), which proves (I).

To prove (II), suppose that span(~v1 , · · · , ~vn ) = V . If {~v1 , · · · , ~vn } were linearly dependent,
then one of the ~vi 's, say ~vn , could be written as a linear combination of the others, so
V = span(~v1 , · · · , ~vn−1 ) and dim(V ) < n, which is a contradiction. 

Theorem 3 The dimension of the sum satisfies the inequality

max(dim W1 , dim W2 ) ≤ dim(W1 + W2 ) ≤ dim(W1 ) + dim(W2 ).

Here the lower bound is attained exactly when one subspace is contained in the other, while the
upper bound is attained exactly when W1 ∩ W2 = {~0}. The dimension of the intersection and the sum are related:

dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Proof: Let {~u1 , ~u2 , · · · , ~um } be a basis of W1 ∩ W2 , so dim(W1 ∩ W2 ) = m. Because
{~u1 , · · · , ~um } is a basis of W1 ∩ W2 , it is linearly independent in W1 . Hence this list can be
extended to a basis {~u1 , · · · , ~um , ~v1 , · · · , ~vj } of W1 , where dim(W1 ) = m + j. Also extend
{~u1 , · · · , ~um } to a basis {~u1 , · · · , ~um , ~w1 , · · · , ~wk } of W2 , so that dim(W2 ) = m + k.
We will show that {~u1 , · · · , ~um , ~v1 , · · · , ~vj , ~w1 , · · · , ~wk } is a basis of W1 + W2 . This will
complete the proof, because then we will have

dim(W1 + W2 ) = m + j + k = (m + j) + (m + k) − m = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Clearly span({~u1 , · · · , ~um , ~v1 , · · · , ~vj , ~w1 , · · · , ~wk }) contains both W1 and W2 , is contained in
W1 + W2 , and hence equals W1 + W2 . Suppose

a1~u1 + · · · + am~um + b1~v1 + · · · + bj ~vj + c1 ~w1 + · · · + ck ~wk = ~0.    (2.1)

Then it can be rewritten as

c1 ~w1 + · · · + ck ~wk = −(a1~u1 + · · · + am~um + b1~v1 + · · · + bj ~vj ),

which shows that c1 ~w1 + · · · + ck ~wk ∈ W1 ∩ W2 . Since {~u1 , · · · , ~um } is a basis of W1 ∩ W2 ,

c1 ~w1 + · · · + ck ~wk = d1~u1 + · · · + dm~um

for some d1 , · · · , dm . But since {~u1 , · · · , ~um , ~w1 , · · · , ~wk } are linearly independent, all
coefficients c's and d's are zero. Thus, equation (2.1) becomes

a1~u1 + · · · + am~um + b1~v1 + · · · + bj ~vj = ~0.

Since {~u1 , · · · , ~um , ~v1 , · · · , ~vj } are linearly independent, all a's and b's equal zero. Thus,
{~u1 , · · · , ~um , ~v1 , · · · , ~vj , ~w1 , · · · , ~wk } is a basis of W1 + W2 , which completes the proof. 
Remark If W1 ∩ W2 = {~0}, then dim(W1 + W2 ) = dim(W1 ) + dim(W2 ). Notation: in this
case we write W1 ⊕ W2 .
Find a basis for W = span(~a1 , · · · , ~an ), where ~ai ∈ Rm . Let A be the n × m matrix whose i-th row is ~ati :

\[
A = \begin{pmatrix} \vec{a}_1^t \\ \vdots \\ \vec{a}_n^t \end{pmatrix}
= \begin{pmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1} \\ \vdots & & & \vdots \\ a_{1,n} & a_{2,n} & \cdots & a_{m,n} \end{pmatrix}.
\]

Using elementary row operations, this matrix is transformed to row echelon form. Then
it has the following shape:

\[
\begin{pmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\
\vdots & \vdots & & \vdots \\
c_{q,1} & c_{q,2} & \cdots & c_{q,m} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}.
\]

Then
{~c1 , · · · , ~cq } is a basis of W and dim(W ) = q.
Zassenhaus algorithm An algorithm for finding bases for the intersection W1 ∩ W2 and the sum
W1 + W2 . Assume

W1 = span(~a1 , · · · , ~an ), W2 = span(~b1 , · · · , ~bk )

are subspaces of Rm and let

\[
A = \begin{pmatrix} \vec{a}_1^t \\ \vdots \\ \vec{a}_n^t \end{pmatrix},
\qquad
B = \begin{pmatrix} \vec{b}_1^t \\ \vdots \\ \vec{b}_k^t \end{pmatrix}.
\]

The algorithm creates the following block matrix of size (n + k) × (2m):

\[
\begin{pmatrix} A & A \\ B & 0 \end{pmatrix}
=
\begin{pmatrix}
a_{1,1} & \cdots & a_{1,m} & a_{1,1} & \cdots & a_{1,m} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n,1} & \cdots & a_{n,m} & a_{n,1} & \cdots & a_{n,m} \\
b_{1,1} & \cdots & b_{1,m} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
b_{k,1} & \cdots & b_{k,m} & 0 & \cdots & 0
\end{pmatrix}.
\]
Using elementary row operations, this matrix is transformed to row echelon form.
Then it has the following shape:

\[
\begin{pmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,m} & * & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
c_{q,1} & c_{q,2} & \cdots & c_{q,m} & * & * & \cdots & * \\
0 & 0 & \cdots & 0 & d_{1,1} & d_{1,2} & \cdots & d_{1,m} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & d_{\ell,1} & d_{\ell,2} & \cdots & d_{\ell,m} \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]

Here, ∗ stands for arbitrary numbers. Then

{~c1 , · · · , ~cq } is a basis of W1 + W2

and

{d~1 , · · · , d~ℓ } is a basis of W1 ∩ W2 .
EXAMPLE Consider the two subspaces

\[
W_1 = \operatorname{span}\left\{
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}
\right\},
\qquad
W_2 = \operatorname{span}\left\{
\begin{pmatrix} 5 \\ 0 \\ -3 \\ 3 \end{pmatrix},
\begin{pmatrix} 0 \\ 5 \\ -3 \\ -2 \end{pmatrix}
\right\}
\]

of the vector space R4 . Using the standard basis, we create the following matrix of dimension
(2 + 2) × (2 · 4):

\[
\begin{pmatrix}
1 & -1 & 0 & 1 & 1 & -1 & 0 & 1 \\
0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 \\
5 & 0 & -3 & 3 & 0 & 0 & 0 & 0 \\
0 & 5 & -3 & -2 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
Using elementary row operations, we transform this matrix into the following matrix:

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & * & * & * & * \\
0 & 1 & 0 & -1 & * & * & * & * \\
0 & 0 & 1 & -1 & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 1
\end{pmatrix}
\]

(some entries have been replaced by ∗ because they are irrelevant to the result). Therefore,

\[
\left\{
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}
\right\}
\]

is a basis of W1 + W2 , and

\[
\left\{
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}
\right\}
\]

is a basis of W1 ∩ W2 .
MATLAB implementation
Given matrices A ∈ Rm×n1 and B ∈ Rm×n2 whose columns span W1 and W2 , respectively, one can
use MATLAB's LU decomposition:

[L,U,P] = lu([[A'; B'] [A'; 0*B']]); U

where U is the resulting upper triangular form we are looking for. Try it with

A = rand(4,2); B = [sum(A,2) rand(4,1)];
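Alternatively, here is a runnable sketch using the W1 , W2 of the example above and rref, which produces the reduced row echelon form directly.

A = [1 -1 0 1; 0 0 1 -1]';     % columns of A span W1
B = [5 0 -3 3; 0 5 -3 -2]';    % columns of B span W2
Z = [[A'; B'] [A'; 0*B']];     % the Zassenhaus block matrix
rref(Z)
% Nonzero rows of the left block give a basis of W1 + W2; the right halves of the
% rows whose left block is zero give a basis of W1 ∩ W2 (here (1, -1, 0, 1)).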

2.7 Inverse of matrix A
LEARNING OBJECTIVES FOR THIS SECTION: Linear equations, Inverse of a matrix, Gauss
elimination. Nonsingular and singular matrices.
Let I = In ∈ Rn×n be the identity matrix, the diagonal matrix whose diagonal entries are
all one. Then In A = AIn = A for all A ∈ Rn×n .
Definition (Inverse of matrix A) Let A ∈ Rn×n be a square matrix. A matrix B ∈ Rn×n
is an inverse of A if

AB = In (identity matrix),

and we write B = A−1 , i.e.,

AA−1 = In . (2.2)

If so, A is nonsingular.
Recall that

x = A−1 b satisfies a linear equation Ax = b for all b ∈ Rn .

In fact,
Ax = A(A−1 b) = (AA−1 )b = In b = b.
Note that if B̃ ∈ Rn×n satisfies B̃A = I , then

B̃ = B̃I = B̃(AB) = (B̃A)B = IB = B

and thus
A−1 A = In . (2.3)
Theorem Inverse of product AB
If A, B ∈ Rn×n are nonsingular , then

(AB)−1 = B −1 A−1

Proof:
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In .
Definition (Transpose)
The transpose of an m × n matrix A, denoted by At , is the n × m matrix such that the
(j, i)-entry is given by Ai,j

(At )j,i = Ai,j for i = 1, · · · , m and j = 1, · · · , n.

In other words, column i of At comes from row i of A, or equivalently row j of At comes


from column j of A.
The following properties hold

(At )t = A, (A + B)t = At + B t , (AB)t = B t At and (At )−1 = (A−1 )t

In fact, since

\[
(AB)_{i,j} = \sum_k a_{ik} b_{k,j}, \qquad (B^t A^t)_{i,j} = \sum_k b_{k,i} a_{k,j},
\]

we have

\[
((AB)^t)_{i,j} = (AB)_{j,i} = \sum_k a_{jk} b_{k,i} = (B^t A^t)_{i,j}.
\]

For B = A−1 ,

In = Int = (AB)t = B t At ⇒ (At )−1 = B t = (A−1 )t by (2.2)–(2.3).
How to find A−1 by Gauss elimination Form the augmented matrix

[A | In ].

Then apply Gauss-Jordan reduction to obtain the reduced matrix (row echelon form)

[U | C],

where U is the reduced upper triangular form of A. Then

A−1 = U −1 C.
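A minimal MATLAB sketch of the [A | In ] recipe (taking the matrix of the next example with a = 0, which is invertible) is the following; here rref carries the reduction all the way to [In | A−1 ].

A = [1 0 2; -1 1 0; 0 0 -1];
n = size(A,1);
RC = rref([A eye(n)]);      % reduce the augmented matrix [A | I_n]
Ainv = RC(:, n+1:end)       % for nonsingular A the left block is I_n and the right block is A^{-1}
norm(A*Ainv - eye(n))       % essentially zero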
EXAMPLE Find all values of a such that the matrix

\[
A = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & a \\ 0 & a & -1 \end{pmatrix}
\]

is invertible. Solution: Gauss elimination of A,

\[
\begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & a \\ 0 & a & -1 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & a+2 \\ 0 & a & -1 \end{pmatrix}
\rightarrow
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & a+2 \\ 0 & 0 & -a^2 - 2a - 1 \end{pmatrix},
\]

implies that A is invertible if and only if

−a2 − 2a − 1 = −(a + 1)2 ≠ 0, i.e., a ≠ −1.
EXAMPLE A block matrix formula:

\[
\begin{pmatrix} A & B \\ O & C \end{pmatrix}^{-1}
= \begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix},
\]

where A ∈ Rn×n , B ∈ Rn×m and C ∈ Rm×m with A−1 and C −1 existing. Solution:

\[
\begin{pmatrix} A & B \\ O & C \end{pmatrix}
\begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix}
= \begin{pmatrix} AA^{-1} & -AA^{-1}BC^{-1} + BC^{-1} \\ O & CC^{-1} \end{pmatrix}
= \begin{pmatrix} I_n & O \\ O & I_m \end{pmatrix}.
\]

Equivalently, one finds a solution to

Ax + By = a, Cy = b,

i.e., y = C −1 b, x = A−1 (a − BC −1 b) by back substitution. Equivalently,

\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}.
\]

Remark A is nonsingular iff the reduced triangular matrix U has nonzero diagonals;
then U −1 C is carried out by backward substitution for each column vector of C.

3 Determinant and Matrix inverse
LEARNING OBJECTIVES FOR THIS CHAPTER: Determinant, Cramer's rule for the inverse of
a matrix A. Cofactors and minors of A. Inverse matrix, properties of the determinant. An alternative
to Gauss-Jordan reduction to upper triangular matrix form.
In linear algebra, the determinant is a scalar value that can be computed from the
elements of a square matrix and encodes certain properties of the linear transformation
described by the matrix. The determinant of a matrix A is denoted det(A) or |A|. Geomet-
rically, it can be viewed as the volume scaling factor of the linear transformation described
by the matrix. This is also the signed volume of the n-dimensional parallelepiped
spanned by the column or row vectors of the matrix. The determinant is positive or
negative according to whether the linear transformation preserves or reverses the orientation
of a real vector space.

In the case of a 2 × 2 matrix the determinant may be defined as

\[
|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]

Similarly, for a 3 × 3 matrix A, its determinant is

\[
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= a \begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b \begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c \begin{vmatrix} d & e \\ g & h \end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh.
\]

Each determinant of a 2 × 2 matrix in this equation is called a minor of the matrix A. This
procedure can be extended to give a recursive definition for the determinant of an n × n
matrix, the Laplace expansion.
The following scheme (rule of Sarrus) calculates the determinant of a 3 × 3 matrix: the
sum of the products of three diagonal north-west to south-east lines of matrix elements, minus
the sum of the products of three diagonal south-west to north-east lines of elements, when
the copies of the first two columns of the matrix are written beside it (illustration omitted).
Definition (Determinant) The determinant of an n × n matrix A, denoted det(A), is a
scalar associated with the matrix A that is defined inductively as

\[
\det(A) = \begin{cases} a_{11} & \text{if } n = 1, \\ a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n} & \text{if } n > 1, \end{cases}
\]

where

Aij = (−1)i+j det(Mij ).

This Laplace expansion expresses the determinant of a matrix in terms of its minors: the
minor Mij is the (n−1)×(n−1) matrix that results from A by removing the i-th row and the
j-th column, and the expression Aij = (−1)i+j det(Mij ) is known as a cofactor.
Equivalent Definition (Leibniz formula) The determinant of an n × n matrix A is the
scalar quantity

\[
\det(A) = \sum_{\phi \in S_n} \operatorname{sign}(\phi)\, a_{1\phi(1)} a_{2\phi(2)} \cdots a_{n\phi(n)},
\]

where Sn is the set of all permutations of the indices (1, 2, · · · , n) and sign(φ) is the sign of the permutation
(reordering) φ. If φ requires s interchanges of indices (1, 2, · · · , n), then sign(φ) = (−1)s .
In fact, grouping the permutations according to the index k with φ(k) = 1, we have

\[
\det(A) = \sum_{k=1}^{n} \sum_{\phi \in S_n,\ \phi(k)=1} \operatorname{sign}(\phi)\, a_{1\phi(1)} a_{2\phi(2)} \cdots a_{n\phi(n)}
= a_{11}A_{11} + a_{21}A_{21} + \cdots + a_{n1}A_{n1},
\]

since for each k one factors out ak1 and the remaining sum over the permutations of the other indices yields the cofactor Ak1 .

For example, S2 = {(1, 2), (2, 1)} and

S3 = {(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)}.

In general, Sn contains n! elements.

Remark (column-wise expansion)

det(A) = a11 A11 + a21 A21 + · · · + an1 An1 .

Remark (Expansion along the k-th row or column) Expanding along the k-th row, or along the
k-th column, of A likewise gives

det(A) = ak1 Ak1 + ak2 Ak2 + · · · + akn Akn ,

det(A) = a1k A1k + a2k A2k + · · · + ank Ank .

A number of properties relate to the effects on the determinant of changing particular
rows or columns, which all follow from the Laplace expansion and Leibniz formula.
(1) Viewing an n × n matrix as being composed of n columns, the determinant is an n-linear function.
This means that if the j-th column of a matrix A is written as a sum aj = v + w of two
column vectors v, w, and all other columns are left unchanged, then the determinant of A
is the sum of the determinants of the matrices obtained from A by replacing the j-th column
by v (denoted Av ) and by w (denoted Aw ), and a similar relation holds when a column is
written as a scalar multiple of a column vector:

\[
\det(A) = \det([a_1 | \ldots | a_j | \ldots | a_n]) = \det([\ldots | v + w | \ldots])
= \det([\ldots | v | \ldots]) + \det([\ldots | w | \ldots]) = \det(A_v) + \det(A_w).
\]
(2) If in a matrix, any row or column has all elements equal to zero, then the determinant of
that matrix is 0. This n-linear function is an alternating form. This means that whenever
two columns of a matrix are identical, or more generally some column can be expressed as
a linear combination of the other columns (i.e. the columns of the matrix form a linearly
dependent set), its determinant is 0.
Above all properties for columns have their counterparts in terms of rows: viewing an
n × n matrix as being composed of n rows, the determinant is an n-linear function.
(3) Whenever two rows of a matrix are identical, its determinant is 0.
(4) Interchanging any pair of columns or rows of a matrix multiplies its deter-
minant by −1. This follows from more generally, any permutation of the rows or columns
multiplies the determinant by the sign of the permutation. By permutation, it is meant
viewing each row as a vector Ri (equivalently each column as Ci ) and reordering the rows
(or columns) by interchange of Rj and Rk (or Cj and Ck ), where j, k are two indices chosen
from 1 to n for an n × n square matrix.
(5) Adding a scalar multiple of one column to another column does not change the value of
the determinant. since the determinant changes by a multiple of the determinant of a matrix
with two equal columns, which determinant is 0. Similarly, adding a scalar multiple of
one row to another row leaves the determinant unchanged.
For example, the determinant of

\[
A = \begin{pmatrix} -2 & 2 & -3 \\ -1 & 1 & 3 \\ 2 & 0 & -1 \end{pmatrix}
\]

can be computed using the following matrices (Gauss eliminations):

\[
B = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 0 & 4.5 \\ 2 & 0 & -1 \end{pmatrix},
\quad
C = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 0 & 4.5 \\ 0 & 2 & -4 \end{pmatrix},
\quad
D = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 2 & -4 \\ 0 & 0 & 4.5 \end{pmatrix}.
\]

Here, B is obtained from A by adding −1/2× the first row to the second, so that det(A) =
det(B). C is obtained from B by adding the first to the third row, so that det(C) = det(B).
Finally, D is obtained from C by exchanging the second and third row, so that det(D) =
− det(C). The determinant of the (upper) triangular matrix D is the product of its entries
on the main diagonal: (−2) · 2 · 4.5 = −18. Therefore, det(A) = − det(D) = +18.
Remark If the row vectors of A are linearly dependent, then det(A) = 0. Conversely, the row vectors of
A are linearly independent if and only if det(A) ≠ 0.
Definition A matrix A ∈ Rn×n is singular if det(A) = 0; otherwise it is nonsingular.
Theorem 1 If A is an n × n matrix, then det(At ) = det(A).
Theorem 2 det(cA) = cn det(A).
Theorem 3 For all elementary operations E, det(EA) = det(E) det(A) and

det(A) = det(L) det(U ) where L = E1 E2 · · · En−1

Proof: For elementary row operations [I], [II] and [III]

det(EA) = det(E) det(A).

Theorem 4 If U is an upper triangular matrix det(U ) = u11 × · · · × unn ,


Theorem 5 det(AB) = det(A) det(B).
Proof: Assume B is nonsingular and

B = LU = E1 E2 · · · En−1 diag(U )Ẽ1 · · · Ẽn−1 ,

where the Ek are elementary row operations and the Ẽj are elementary column operations. That is,

\[
U^t = \tilde{E}_1^t \cdots \tilde{E}_{n-1}^t \, \operatorname{diag}(U),
\]

where Ẽ1t , · · · , Ẽn−1t are lower triangular matrices of elementary operations. Thus, we have

det(AB) = det(A) det(E1 · · · En−1 ) det(diag(U )) det(Ẽ1 · · · Ẽn−1 ) = det(A) det(U ) = det(A) det(B).

Corollary det(A−1 ) = det(A)−1 since A−1 A = I.


EXAMPLE Expanding by a row or column can sometimes be a quick method of evaluating
the determinant of matrices containing a lot of zeros. For example, let
 
\[
A = \begin{pmatrix} 9 & 0 & 2 & 6 \\ 1 & 2 & 9 & -3 \\ 0 & 0 & -2 & 0 \\ -1 & 0 & -5 & 2 \end{pmatrix}.
\]

Then, expanding by the third row, we get

\[
\det(A) = -2 \times \det \begin{pmatrix} 9 & 0 & 6 \\ 1 & 2 & -3 \\ -1 & 0 & 2 \end{pmatrix},
\]

and by the second column,

\[
\det(A) = -2 \times 2 \times \det \begin{pmatrix} 9 & 6 \\ -1 & 2 \end{pmatrix} = -96.
\]

3.1 Cramer’s rule


For a matrix equation Ax = b, given that A has a nonzero determinant, the solution x = A−1 b
is given by Cramer’s rule:
\[
x_i = \frac{\det(A_i)}{\det(A)}, \qquad i = 1, 2, 3, \cdots, n,
\]

where Ai is the matrix formed by replacing the i-th column of A by the column vector b.
This follows immediately by column expansion of the determinant, i.e.,

\[
\det(A_i) = \det\big[a_1, \ldots, b, \ldots, a_n\big]
= \sum_{j=1}^{n} x_j \det\big[a_1, \ldots, a_{i-1}, a_j, a_{i+1}, \ldots, a_n\big]
= x_i \det(A),
\]

where aj is the j-th column of A, since

b = x1 a1 + x2 a2 + · · · + xn an .

The rule is also equivalently written as

A adj(A) = adj(A) A = det(A) In ,

or equivalently

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A),
\]

where the adjugate matrix adj(A) is the transpose of the matrix of cofactors, that is,

(adj(A))ij = Aji = (−1)i+j det(Mji ).

In fact, the i-th column ~xi of A−1 satisfies A~xi = ~ei , where ~ei is the i-th unit vector. By Cramer's
rule, the j-th coordinate of ~xi is given by

\[
(A^{-1})_{ji} = \frac{\det(A_j)}{\det(A)} = \frac{(-1)^{i+j}\det(M_{ij})}{\det(A)} = \frac{A_{ij}}{\det(A)},
\]

where here Aj denotes A with its j-th column replaced by ~ei ; this again says that adj(A) is the transpose of the matrix of cofactors of A.
The rule for the 3 × 3 case:

\[
\begin{cases}
a_1 x + b_1 y + c_1 z = d_1 \\
a_2 x + b_2 y + c_2 z = d_2 \\
a_3 x + b_3 y + c_3 z = d_3
\end{cases}
\]

which in matrix format is

\[
\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}.
\]

Then the values of x, y and z can be found as follows:

\[
x = \frac{\begin{vmatrix} d_1 & b_1 & c_1 \\ d_2 & b_2 & c_2 \\ d_3 & b_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}},
\qquad
y = \frac{\begin{vmatrix} a_1 & d_1 & c_1 \\ a_2 & d_2 & c_2 \\ a_3 & d_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}},
\qquad
z = \frac{\begin{vmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}.
\]
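A short MATLAB sketch of Cramer's rule for a 3 × 3 system (the coefficients are my own choice) together with a comparison against the backslash solver:

A = [2 1 1; 1 3 2; 1 0 0];  d = [4; 5; 6];
x = zeros(3,1);
for i = 1:3
    Ai = A;  Ai(:,i) = d;        % replace the i-th column of A by d
    x(i) = det(Ai) / det(A);
end
x, A\d                           % Cramer's rule agrees with MATLAB's solver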

EXAMPLE For

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]

suppose det(A) = ad − bc ≠ 0. Then

\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]

EXAMPLE

\[
\det \begin{pmatrix} A & B \\ O & C \end{pmatrix} = \det(A) \det(C),
\]

where A ∈ Rn×n , B ∈ Rn×m and C ∈ Rm×m . Solution: Apply Gauss elimination to A
and C to obtain an upper triangular matrix

\[
\begin{pmatrix} U_1 & \tilde{B} \\ O & U_2 \end{pmatrix},
\]

whose determinant is det(U1 ) det(U2 ) = det(A) det(C).


It has recently been shown that Cramer’s rule can be implemented in O(n3 ) time, which
is comparable to more common methods of solving systems of linear equations, such as LU,
QR, or singular value decomposition.
Theorem A matrix A is nonsingular (det(A) ≠ 0) if and only if A−1 exists. If so,
rank(A) = n, Ax = b has a unique solution x = A−1 b, and N (A) = {~0}.

4 Linear Transform
LEARNING OBJECTIVES FOR THIS CHAPTER Fundamental Theorem of Linear Maps
Matrix representation and Change of basis and Similarity transform, Inverse map, Injective
and Surjective map.
Let V and W be vector spaces with scalars coming from the same field F . A mapping
T : V → W is a linear transformation if for any two vectors x1 and x2 in V and any scalar
a1 , a2 ∈ F , the following are satisfied:

T (a1 x1 + a2 x2 ) = a1 T (x1 ) + a2 T (x2 )

Definition (Composition of linear transformations) Let T1 ∈ L(V, W ) and T2 ∈ L(W, U ). We
define a transformation T2 T1 : V → U by (T2 T1 )(u) = T2 (T1 (u)) for u ∈ V . In particular, we
define T 2 = T T and T i+1 = T i T for i ≥ 2.
EXAMPLE (Matrix) V = Rn and W = Rm and T (x) = Ax for A ∈ Rm×n .
EXAMPLE (Derivative) T1 = d/dx = D, the derivative, with V = C 1 (a, b) and W = C(a, b):

\[
\frac{d}{dx}(a_1 f_1 + a_2 f_2) = a_1 \frac{d}{dx} f_1 + a_2 \frac{d}{dx} f_2 .
\]

EXAMPLE (Integration) T2 f = ∫0x f dx, the integral, with V = C(a, b) and W = C 1 (a, b):

\[
\int_0^x (a_1 f_1 + a_2 f_2)\, dx = a_1 \int_0^x f_1\, dx + a_2 \int_0^x f_2\, dx .
\]

Since d/dx ( ∫0x f dx ) = f (x), we have

T1 T2 f = T1 (T2 f ) = f for f ∈ C(a, b).

EXAMPLE (Multiplication) (T f )(x) = (a + bx + cx2 )f (x) for a, b, c ∈ R, with V = W = C(a, b).

EXAMPLE (Composite of Derivative and Multiplication) (T f )(x) = x (d/dx) f , with V = C 1 (a, b),
W = C(a, b).

EXAMPLE (Shift) Let V = C(R), the space of continuous functions. Every α ∈ R gives
rise to two linear maps, the shift Sα : V → V , Sα (f ) = f (x − α), and the evaluation Eα : V → R,
Eα (f ) = f (α).
Isomorphism identifying V with dim(V ) = n with Rn . Assume dim(V ) = n and {~v1 , · · · , ~vn }
is a linearly independent basis, i.e., every vector ~v ∈ V is uniquely represented by

~v = a1 ~v1 + · · · + an ~vn

That is, ~v ∈ V corresponds to exactly one such column vector (a1 , · · · , an )t in Rn , and vice
versa. That is, for all intents and purposes, we have just identified the vector space V with
the more familiar space Rn .

EXAMPLE {1, x, x2 } is the standard basis of P2 :

\[
V = P_2 : \quad a + bx + cx^2 \;\longmapsto\; \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3
\]

defines an isomorphism identifying P2 with R3 .


Matrix representation A of T Assume dim(V ) = n and dim(W ) = m. We will now see
that we can express linear transformations as matrices as well. Hence, one can simply focus
on studying linear transformations of the form T (x) = Ax where A ∈ Rm×n is a matrix.
In fact, let {~v1 , · · · , ~vn } be a basis of V and {~w1 , · · · , ~wm } be a basis of W . Then, we have
$$T(\vec v_j) = a_{1j}\,\vec w_1 + \cdots + a_{mj}\,\vec w_m$$
and define A = (aij ) ∈ Rm×n , i.e.,
$$A=\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

In fact, if y = Ax ∈ Rm for a given x ∈ Rn , we have
$$T(x_1 \vec v_1 + \cdots + x_n \vec v_n) = x_1 T(\vec v_1) + \cdots + x_n T(\vec v_n) = y_1 \vec w_1 + \cdots + y_m \vec w_m$$
since
$$x_1 T(\vec v_1) + \cdots + x_n T(\vec v_n) = (a_{11}x_1 + \cdots + a_{1n}x_n)\,\vec w_1 + \cdots + (a_{m1}x_1 + \cdots + a_{mn}x_n)\,\vec w_m .$$

Corollary If T ∈ L(Rn , Rm ) then T (x) = Ax, where the jth column vector ~aj of A is given by ~aj = T (~ej ), j = 1, · · · , n.
EXAMPLE Consider the linear transformation D : P2 → P1 that sends f to df/dx. Then, the matrix representation A of D, with V = P2 (standard basis {1, x, x2 }) and W = P1 (standard basis {1, x}), is given by
$$A=\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \in \mathbb{R}^{2\times 3},\qquad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} b \\ 2c \end{pmatrix}.$$
This represents the fact that d/dx (a + bx + cx2 ) = b + 2cx.
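A hedged sketch of this construction in Python (NumPy; the helper D is an illustrative stand-in for the derivative map): following the corollary above, the columns of A are obtained by applying D to the standard basis vectors of P2.

```python
import numpy as np

def D(coeffs):
    """Derivative of a polynomial given by its coefficients in the basis 1, x, x^2, ..."""
    return np.array([j * coeffs[j] for j in range(1, len(coeffs))])

# Columns of A are the images of the standard basis 1, x, x^2 of P2.
basis = np.eye(3)
A = np.column_stack([D(e) for e in basis])
print(A)                                   # [[0. 1. 0.]
                                           #  [0. 0. 2.]]

# Check: D(1 + 4x + 5x^2) = 4 + 10x
print(A @ np.array([1.0, 4.0, 5.0]))       # [ 4. 10.]
```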
EXAMPLE Consider the integral map T2 : P2 → P3 that sends f to ∫_0^x f dx. Then, the matrix representation A of T2 , with V = P2 and W = P3 (standard basis {1, x, x2 , x3 }), is given by
$$A=\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac12 & 0 \\ 0 & 0 & \tfrac13 \end{pmatrix} \in \mathbb{R}^{4\times 3},\qquad
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac12 & 0 \\ 0 & 0 & \tfrac13 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ a \\ \tfrac12 b \\ \tfrac13 c \end{pmatrix}.$$
This represents the fact that ∫_0^x (a + bx + cx2 ) dx = ax + (1/2)bx2 + (1/3)cx3 .

EXAMPLE Consider the map T : P3 → P3 that sends f to x (d/dx) f . Then, the matrix representation A of T , with V = P3 and W = P3 (standard basis {1, x, x2 , x3 }), is given by
$$A=\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \in \mathbb{R}^{4\times 4},\qquad
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ b \\ 2c \\ 3d \end{pmatrix}.$$
This represents the fact that x (d/dx)(a + bx + cx2 + dx3 ) = bx + 2cx2 + 3dx3 .
EXAMPLE T : R2 → R2 , T = Rθ is a rotation by θ anti-clockwise about the origin. Since T (1, 0) = (cos θ, sin θ) and T (0, 1) = (− sin θ, cos θ),
$$T\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha\,T\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta\,T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \alpha\cos\theta - \beta\sin\theta \\ \alpha\sin\theta + \beta\cos\theta \end{pmatrix},$$
so the matrix using the standard bases is
$$A=\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Now clearly Rθ followed by Rφ is equal to Rθ+φ . Thus
$$R_\phi R_\theta = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix} \cos\phi\cos\theta - \sin\phi\sin\theta & -\cos\phi\sin\theta - \sin\phi\cos\theta \\ \sin\phi\cos\theta + \cos\phi\sin\theta & -\sin\phi\sin\theta + \cos\phi\cos\theta \end{pmatrix}
= \begin{pmatrix} \cos(\phi+\theta) & -\sin(\phi+\theta) \\ \sin(\phi+\theta) & \cos(\phi+\theta) \end{pmatrix} = R_{\phi+\theta},$$
which derives the addition formulae for sin and cos.
EXAMPLE T : P2 → R2 , T is the evaluation map T f = (f (0), f (1))t ∈ R2 . Then, with the standard basis {1, x, x2 } of P2 ,
$$T(a + bx + cx^2) = (a,\; a + b + c)^t \;\Rightarrow\; A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$

4.1 Invertibility is equivalent to injectivity and surjectivity


Definition T is injective (one-to-one) if N (T ) = {0}, i.e., T v1 = T v2 implies v1 = v2 .
EXAMPLE The differentiation map T1 = D is not injective since

D(c + p) = Dp for all constants c ∈ R.


The integral operator T2 f = ∫_0^x f dx is injective since

T2 f = T2 g  ⇒  f = D T2 f = D T2 g = g.

Definition T is surjective (onto) if R(T ) = W : for all w ∈ W , there exists v ∈ V such that w = T v.
EXAMPLE The differentiation map D : P5 → P5 is not surjective, because the polynomial x5 is not in the range of D. However, the differentiation map D : P5 → P4 is surjective.
Theorem A linear map T is invertible if and only if it is injective and surjective.
Proof Suppose T is injective and surjective. We want to prove that T is invertible. For each w ∈ W , define Sw to be the unique element of V such that T Sw = w (the existence and uniqueness of such an element follow from the surjectivity and injectivity of T ). Clearly T S equals the identity map on W . To prove that ST equals the identity map on V , let v ∈ V . Then
Then
T (ST v) = (T S)(T v) = I(T v) = T v.
This equation implies that ST v = v (because T is injective). Thus ST equals the identity
map on V . To complete the proof, we need to show that S is linear. To do this, suppose
w1 , w2 ∈ W . Then
T (Sw1 + Sw2 ) = T (Sw1 ) + T (Sw2 ) = w1 + w2
Thus, Sw1 + Sw2 is the unique element of V that T maps to w1 + w2 . By the definition of S, this implies that S(w1 + w2 ) = Sw1 + Sw2 . Hence S satisfies the additive property. Also, if w ∈ W and c ∈ R,

T (cSw) = cT (Sw) = c w.

Thus, S(cw) = cSw. Hence S is linear. 

4.2 Fundamental Theorem of Linear Maps


Suppose V is finite-dimensional and T ∈ L(V, W ). Then the range of T is finite-dimensional and
dim(V ) = dim(N (T )) + dim(R(T )).
Proof: Let {~u1 , · · · , ~um } be a basis of N (T ), so dim(N (T )) = m. The linearly independent list {~u1 , · · · , ~um } can be extended to a basis
{~u1 , · · · , ~um , ~v1 , · · · , ~vn }
of V . Thus dim(V ) = m + n. To complete the proof, we need to show that dim(R(T )) = n.
We will do this by proving that {T~v1 , · · · , T~vn } is a basis of R(T ). Let ~v ∈ V and
~v = a1 ~u1 + · · · + am ~um + b1 ~v1 + · · · + bn ~vn .
Applying T to both sides of this equation, we get
T ~v = a1 T ~u1 + · · · + am T ~um + b1 T ~v1 + · · · + bn T ~vn .
Since the terms of the form T ~uj = 0, this implies that {T ~v1 , · · · , T ~vn } spans the range of T . If we prove that {T ~v1 , · · · , T ~vn } are linearly independent, the proof is completed. Suppose c1 , · · · , cn satisfy
c1 T ~v1 + · · · + cn T ~vn = 0.
Then,
T (c1 ~v1 + · · · + cn ~vn ) = 0
and c1 ~v1 + · · · + cn ~vn ∈ N (T ). Since {~u1 , · · · , ~um } is a basis of N (T ),
c1 ~v1 + · · · + cn ~vn = d1 ~u1 + · · · + dm ~um .
Since {~u1 , · · · , ~um , ~v1 , · · · , ~vn } are linearly independent, c1 = · · · = cn = 0 (and d1 = · · · = dm = 0). 


Corollary 1 A map to a smaller dimensional space (dim(W ) < dim(V )) is not injective, since dim(N (T )) = dim(V ) − dim(R(T )) ≥ dim(V ) − dim(W ) ≥ 1.
Corollary 2 A map to a larger dimensional space (dim(W ) > dim(V )) is not surjective, since dim(R(T )) ≤ dim(V ) < dim(W ).
Corollary 3 A ∈ Rn×n . Then A is injective if and only if A is surjective. If so, A is nonsingular.
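A hedged numerical check of the rank-nullity identity dim(N(A)) + dim(R(A)) = n in Python (NumPy/SciPy; the matrix below is an illustrative choice): the matrix rank gives dim(R(A)) and scipy.linalg.null_space returns an orthonormal basis of N(A).

```python
import numpy as np
from scipy.linalg import null_space

# A maps R^4 -> R^3; the third row equals the sum of the first two, so rank(A) = 2.
A = np.array([[1.0, 2.0, 0.0, 3.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 4.0]])

rank = np.linalg.matrix_rank(A)        # dim R(A)
kernel = null_space(A)                 # orthonormal basis of N(A), shape (4, 4 - rank)

print("dim R(A) =", rank)              # 2
print("dim N(A) =", kernel.shape[1])   # 2
print("sum      =", rank + kernel.shape[1], "= n =", A.shape[1])
```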

4.3 Change of Basis and Similarity transform


In linear algebra, two n × n matrices A and B are called similar if there exists a nonsingular
n × n matrix P such that
B = P −1 AP.
Similar matrices represent the same linear map under two (possibly) different bases, with P
being the change of basis matrix, i.e.,

y = P x ⇔ x = P −1 y.

A transformation
A → P −1 AP
is called a similarity transformation or conjugation of the matrix A. In the general linear
group, similarity is therefore the same as conjugacy, and similar matrices are also called
conjugate.
Theorem (Change of Basis) Let E = {~v1 , · · · , ~vn } and F = {~w1 , . . . , ~wn } be two ordered bases for a vector space V , and let T : V → V be a linear operator. Let P be the transition matrix representing the change from F to E. If A is the matrix representing T with respect to E, and B is the matrix representing T with respect to F , then B = P −1 AP . (A : E → E and B : F → F are matrix representations of T : V → V .)
Proof: Let x be any vector in Rn and let

~v = x1 ~w1 + · · · + xn ~wn ,

i.e., x = ~v |F . Let y = P x, t = Ay and z = Bx. It follows from the definition of P that y = ~v |E , i.e.,

~v = y1 ~v1 + · · · + yn ~vn .

Since A represents T with respect to E, and B represents T with respect to F , we have t = T (~v )|E and z = T (~v )|F . Since the transition matrix from E to F is P −1 , we have

P −1 AP x = P −1 Ay = P −1 t = z = Bx,

which implies
P −1 AP x = Bx for all x ∈ Rn .
EXAMPLE Let E = {1, x, x2 } be the standard basis of P2 . Another basis for P2 is F = {x + 1, x − 1, 2x2 }. The transition matrix from F to E is
$$\begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$
and thus the transition matrix P from E to F is
$$P = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{pmatrix}.$$
Consider the element f = a + bx + cx2 ∈ P2 . Applying P to (a, b, c)t represents the fact that f can also be written as
$$f = \frac{a+b}{2}(x + 1) + \frac{b-a}{2}(x - 1) + \frac{c}{2}(2x^2).$$
When defining a linear transformation, it can be the case that a change of basis results in a simpler form of the same transformation. For example, the matrix representing a rotation in R3 whose axis of rotation is not aligned with a coordinate axis can be complicated to compute. If the axis of rotation were aligned with the positive z-axis, then it would simply be
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
where θ is the angle of rotation.

5 Eigenvalues
LEARNING OBJECTIVES FOR THIS CHAPTER: invariant subspaces, eigenvalues, eigenvectors, and eigenspaces, diagonalization and Jordan form, solution of linear ordinary differential equations, Markov Chain transition matrix.
Definition (Invariant subspace) Suppose T ∈ L(V, V ). A subspace U of V is called
invariant under T if u ∈ U implies T u ∈ U .
The null space and range space of a linear transformation, are prominent examples of
invariant subspaces. More importantly, a specific case of the invariant subspace is as follows.
An eigenvalue λ ∈ C of an n × n matrix A satisfies

(λ I − A)~v = 0 ⇔ A~v = λ ~v ⇔ span(~v ) is an invariant subspace of A

for a nontrivial vector ~v ∈ C n ; such a ~v is called an eigenvector corresponding to the eigenvalue λ ∈ C. Let A be an n × n matrix and λ ∈ C. The following statements are equivalent:
(a) λ is an eigenvalue of A.
(b) (A − λI)x = 0 has a nontrivial solution.
(c) N (A − λ I) 6= {0}
(d) (A − λI) is singular.

(e) det(A − λI) = 0.
Thus, λ satisfies the characteristic equation

χ(λ) = det(λ I − A) = 0.

and there exist n eigenvalues {λi } of A (counting algebraic multiplicities). Complex eigenvalues λ of A ∈ Rn×n appear in complex conjugate pairs λ = α ± iβ. Thus,

χ(λ) = (λ − λ1 ) × · · · × (λ − λn )

 
EXAMPLE Consider A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ R2×2 . Then,
$$\det(A - \lambda I) = \begin{vmatrix} a-\lambda & b \\ c & d-\lambda \end{vmatrix} = (a-\lambda)(d-\lambda) - bc = \lambda^2 - (a+d)\lambda + ad - bc = 0$$
and the eigenvalues λ are given by
$$\lambda = \frac{a+d \pm \sqrt{(a+d)^2 - 4(ad-bc)}}{2}.$$

Theorem Let A and B be n × n matrices. If B is similar to A, then the two matrices have
the same characteristic polynomial and, consequently, the same eigenvalues.
Proof: B = P −1 AP and

det(B − λI) = det(P −1 (A − λI)P ) = det(P −1 ) det(A − λI) det(P ) = det(A − λI).

Theorem λ1 × λ2 × · · · × λn = det(A) (since χ(0) = (−λ1 ) × · · · × (−λn ) = (−1)n det(A)) and

λ1 + · · · + λn = a11 + a22 + · · · + ann = trace(A) = sum of the diagonal entries of A,

which is minus the coefficient of λn−1 in χ(λ).


Definition (Diagonalizable) A ∈ Rn×n is said to be diagonalizable if there exists a nonsingular matrix P and a diagonal matrix Λ such that P −1 AP = Λ. We say that P diagonalizes A.
That is, if A is similar to a diagonal matrix Λ:

P −1 AP = Λ = diag(λ1 , · · · , λn ),

then the λi are the n eigenvalues of A and each column vector ~vi of P is an eigenvector corresponding to λi , i.e., A~vi = λi ~vi . In particular, if a diagonalizable A has a repeated eigenvalue λ with algebraic multiplicity r > 1, then A has r linearly independent eigenvectors corresponding to λ.
Theorem If the eigenvalues {λi } of A are distinct, then corresponding eigenvectors {~v1 , · · · , ~vn } are linearly independent and A is diagonalizable.

Proof We prove this by induction on r. It is true for r = 1, because eigenvectors are non-zero by definition. For r > 1, suppose that for some a1 , · · · , ar
a1~v1 + a2~v2 + · · · + ar~vr = 0.
Then, applying A to this equation gives
a1 λ1~v1 + · · · + ar λr~vr = 0.
Now, subtracting λ1 times the first equation from the second gives
a2 (λ2 − λ1 )~v2 + · · · + ar (λr − λ1 )~vr = 0.
By the inductive hypothesis, {~v2 , ..., ~vr } are linearly independent, so ak (λk − λ1 ) = 0 and thus ak = 0 for k > 1 (since λk ≠ λ1 ), and then also a1 = 0. Thus, {~v1 , ..., ~vr } are linearly independent. 
If we let P be the matrix whose column vectors are the eigenvectors {~vi }:
P = [~v1 | · · · |~vn ],
which are linearly independent, then A~vi = λi ~vi , 1 ≤ i ≤ n is written as a matrix identity
AP = P Λ, Λ = diag(λ1 , · · · , λn ) ⇔ P −1 AP = Λ.
That is, A is similar to a diagonal matrix Λ. 
Remark (1) A being diagonalizable does not mean A has distinct eigenvalues. For example, A = I2 has a repeated eigenvalue λ = 1, but A is diagonal.
(2) In general A ∈ Rn×n need not be diagonalizable. Consider A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. The characteristic polynomial is χ(λ) = (λ − 1)2 , so there is a repeated eigenvalue λ = 1. The eigenvector equation
$$(A - I)\vec v = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\vec v = 0$$
has a single solution ~v1 = c (1, 0)t where c is arbitrary. That is, A is not diagonalizable. To proceed we will introduce generalized eigenvectors so that one can complete the similarity P to a Jordan canonical form.
(3) Real 2 × 2 canonical form for the complex conjugate eigenvalue case. Assume A ∈ R2×2 has a complex conjugate pair of eigenvalues λ = a ± i b. Let ~v = ~v1 + i ~v2 be a corresponding eigenvector,

A~v = A~v1 + i A~v2 = λ ~v = a~v1 − b~v2 + i (b~v1 + a~v2 ),

and thus, equating real part and imaginary part,

A~v1 = a~v1 − b~v2 ,  A~v2 = b~v1 + a~v2 .

Equivalently, if we let P = [~v1 , ~v2 ] we have
$$P^{-1}AP = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} = \text{real canonical form}.$$

 
EXAMPLE Consider A = \begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix}. Then, we have
$$(1 - \lambda)(3 - \lambda) + 2 = \lambda^2 - 4\lambda + 5 = 0 \;\Rightarrow\; \lambda = 2 \pm i,$$
and
$$\begin{pmatrix} 1-(2\pm i) & 1 \\ -2 & 3-(2\pm i) \end{pmatrix}\vec v = 0 \;\Rightarrow\; \vec v = c\begin{pmatrix} 1 \\ 1\pm i \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

5.1 Application to ODE


Given a matrix A ∈ Rn×n , consider the linear system of ordinary differential equations
$$\frac{d}{dt}\vec x(t) = A\vec x(t), \qquad \vec x(0) = \vec x_0 .$$
For an eigenpair (λ, ~v ),

x(t) = c eλt ~v with c a constant

is a solution to (d/dt) x(t) = Ax(t), i.e.,

(d/dt) x(t) = c λ eλt ~v = Ax(t).

If A is diagonalizable, then the eigenvectors {~v1 , · · · , ~vn } are linearly independent and there exist unique (c1 , · · · , cn ) such that

x0 = c1 ~v1 + · · · + cn ~vn

and thus

x(t) = c1 eλ1 t ~v1 + · · · + cn eλn t ~vn (Superposition Principle).

Equivalently, x(t) = P eΛt P −1 x0 , where P defines a change of basis of Rn .
EXAMPLE 1 Consider the 2 × 2 system
$$\frac{d}{dt}\vec x(t) = \begin{pmatrix} -8 & -5 \\ 10 & 7 \end{pmatrix}\vec x(t).$$
The characteristic polynomial is χ(λ) = λ2 + λ − 6 = (λ − 2)(λ + 3), so there are two eigenvalues, each with algebraic multiplicity one, λ1 = 2 and λ2 = −3. The eigenvector equations for p~1 , p~2 are
$$(A - 2I)\vec p_1 = \begin{pmatrix} -10 & -5 \\ 10 & 5 \end{pmatrix}\vec p_1 = 0 \;\Rightarrow\; \vec p_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix},$$
$$(A + 3I)\vec p_2 = \begin{pmatrix} -5 & -5 \\ 10 & 10 \end{pmatrix}\vec p_2 = 0 \;\Rightarrow\; \vec p_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
Thus, we have
$$x(t) = c_1 e^{2t}\begin{pmatrix} 1 \\ -2 \end{pmatrix} + c_2 e^{-3t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
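A hedged numerical sketch of this superposition formula in Python (NumPy/SciPy; the matrix is the one from Example 1, and the initial vector and time are illustrative): the eigendecomposition solution x(t) = P e^{Λt} P^{-1} x0 is compared against scipy.linalg.expm.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-8.0, -5.0],
              [10.0,  7.0]])
x0 = np.array([1.0, 0.0])
t = 0.7

# Solution via the eigendecomposition A = P Lambda P^{-1}
lam, P = np.linalg.eig(A)                   # eigenvalues 2 and -3, eigenvectors as columns
x_eig = P @ np.diag(np.exp(lam * t)) @ np.linalg.solve(P, x0)

# Solution via the matrix exponential e^{At} x0
x_expm = expm(A * t) @ x0

print(x_eig)
print(x_expm)                               # the two agree to machine precision
```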

EXAMPLE 2 Consider
$$\frac{d}{dt}x = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} x(t).$$
The characteristic polynomial is χ(λ) = (λ − 1)2 , so there is a repeated eigenvalue λ = 1. The eigenvector equation
$$(A - I)\vec v = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\vec v = 0$$
has a single solution ~v1 = c (1, 0)t where c is arbitrary. One needs to find a second one. Using this eigenvector, we compute the generalized eigenvector ~v2 by solving

(A − λ I)~v2 = ~v1 .

Writing out the values:
$$\left(\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} - 1\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)\begin{pmatrix} v_{21} \\ v_{22} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} v_{21} \\ v_{22} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
This gives
$$\vec v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Thus, if P = [~v1 |~v2 ] then
$$AP = P\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}.$$
Note that
$$\vec v_1 = (A - \lambda I)\vec v_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
and
$$(A - \lambda I)^2\vec v_2 = (A - \lambda I)\vec v_1 = \vec 0 \quad\text{and}\quad (A - \lambda I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Also, we have
$$x(t) = c_1 e^{t}\begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_2 e^{t}\left(t\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right),$$
since (t et )' = t et + et , so et (t ~v1 + ~v2 ) solves x' = Ax with x(0) = ~v2 .
EXAMPLE 3 This example is more complex than Example 1. A lower triangular matrix A:
$$A=\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 \\
6 & 3 & 2 & 0 & 0 \\
10 & 6 & 3 & 2 & 0 \\
15 & 10 & 6 & 3 & 2
\end{pmatrix}$$
has eigenvalues λ1 = 1 and λ2 = 2 since χ(λ) = (λ − 1)2 (λ − 2)3 = 0, with algebraic multiplicities 2 and 3, respectively. The generalized eigenspaces of A are calculated below:

x1 is the ordinary eigenvector associated with λ1 = 1 and x2 is a generalized eigenvector associated with λ1 = 1. y1 is the ordinary eigenvector associated with λ2 = 2, and y2 , y3 are generalized eigenvectors associated with λ2 .
    
$$(A - 1I)x_1 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 \\
6 & 3 & 1 & 0 & 0 \\
10 & 6 & 3 & 1 & 0 \\
15 & 10 & 6 & 3 & 1
\end{pmatrix}\begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = 0,$$
$$(A - 1I)x_2 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 \\
6 & 3 & 1 & 0 & 0 \\
10 & 6 & 3 & 1 & 0 \\
15 & 10 & 6 & 3 & 1
\end{pmatrix}\begin{pmatrix} 1 \\ -15 \\ 30 \\ -1 \\ -45 \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix} = x_1,$$
$$(A - 2I)y_1 = \begin{pmatrix}
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = 0,$$
$$(A - 2I)y_2 = \begin{pmatrix}
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix} = y_1,$$
$$(A - 2I)y_3 = \begin{pmatrix}
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix} = y_2.$$
This results in a basis for each of the generalized eigenspaces of A. Together the two chains of generalized eigenvectors span the space of all 5-dimensional column vectors:
$$\{x_1, x_2\} = \left\{\begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix},\;\begin{pmatrix} 1 \\ -15 \\ 30 \\ -1 \\ -45 \end{pmatrix}\right\},\qquad
\{y_1, y_2, y_3\} = \left\{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix},\;\begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix},\;\begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 0 \end{pmatrix}\right\}.$$
An "almost diagonal" matrix J in Jordan normal form, similar to A, is obtained as follows:
$$P = \begin{pmatrix} x_1 & x_2 & y_1 & y_2 & y_3 \end{pmatrix} = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
3 & -15 & 0 & 0 & 0 \\
-9 & 30 & 0 & 0 & 1 \\
9 & -1 & 0 & 3 & -2 \\
-3 & -45 & 9 & 0 & 0
\end{pmatrix},\qquad
J = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 \\
0 & 0 & 0 & 2 & 1 \\
0 & 0 & 0 & 0 & 2
\end{pmatrix},$$
where P is a generalized eigenvector matrix for A, the columns of P are a canonical basis for A, and AP = P J.
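Numerically the Jordan form is very sensitive to rounding, so exact (symbolic) arithmetic is the safer way to reproduce this example; a hedged sketch with SymPy's Matrix.jordan_form follows (the returned P may differ from the chains above by scaling and ordering of the columns).

```python
from sympy import Matrix

A = Matrix([[1, 0, 0, 0, 0],
            [3, 1, 0, 0, 0],
            [6, 3, 2, 0, 0],
            [10, 6, 3, 2, 0],
            [15, 10, 6, 3, 2]])

P, J = A.jordan_form()        # A = P * J * P**-1, computed in exact arithmetic

print(J)                      # a 2x2 Jordan block for lambda = 1 and a 3x3 block for lambda = 2
print(P * J * P.inv() == A)   # should print True
```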
In general A is diagonalizable if and only if the sum of the dimensions of the eigenspaces is n, or, equivalently, if and only if A has n linearly independent eigenvectors. Not all matrices are diagonalizable; matrices that are not diagonalizable are called defective matrices. In addition to the above examples consider the following matrix:
$$A=\begin{pmatrix}
5 & 4 & 2 & 1 \\
0 & 1 & -1 & -1 \\
-1 & -1 & 3 & 0 \\
1 & 1 & -1 & 2
\end{pmatrix}.$$
Including multiplicity, the eigenvalues of A are λ = 1, 2, 4, 4. The dimension of the eigenspace corresponding to the eigenvalue λ = 4 is 1 (and not 2), so A is not diagonalizable. However,
there is an invertible matrix P such that J = P −1 AP , where
$$J=\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 4 & 1 \\
0 & 0 & 0 & 4
\end{pmatrix}.$$
The matrix J is almost diagonal. This is the Jordan normal form of A. For i = 1, 2, 3 there exists an eigenvector pi ∈ N (λi I − A). For the repeated (algebraic) eigenvalue λ3 = λ4 = 4, (4 I − A) does not have two independent eigenvectors, but there exists p4 ∈ N ((λ4 I − A)2 ) satisfying

(λ4 I − A)p4 = p3 ,

where p3 is an eigenvector of A corresponding to λ3 = λ4 = 4. p4 is called a generalized eigenvector of A.
Exercise Find the eigenvalues and eigenvectors of A = \begin{pmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{pmatrix}.
Solution:
$$\det\begin{pmatrix} 2-\lambda & -3 & 1 \\ 1 & -2-\lambda & 1 \\ 1 & -3 & 2-\lambda \end{pmatrix} = (2-\lambda)\big((-2-\lambda)(2-\lambda)+3\big) + 3\big((2-\lambda)-1\big) + \big(-3-(-2-\lambda)\big) = -\lambda(\lambda^2 - 2\lambda + 1) = 0.$$
Thus, the eigenvalues are λ1 = 0, λ2 = λ3 = 1. For λ = 0,
$$\begin{pmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{pmatrix}\vec v_1 = 0 \;\Rightarrow\; \vec v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
For λ = 1,
$$\begin{pmatrix} 1 & -3 & 1 \\ 1 & -3 & 1 \\ 1 & -3 & 1 \end{pmatrix}\vec v = \begin{pmatrix} 1 & -3 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\vec v = 0,$$
which implies that ~v2 = (3, 1, 0)t and ~v3 = (−1, 0, 1)t . Thus, A is diagonalizable.

5.2 Reduced form


In general, a square complex matrix A is similar to a block diagonal matrix
$$J=\begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_p \end{pmatrix}$$
where each block Ji is a square matrix of the form
$$J_i=\begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}.$$

Definition (Symmetric) A ∈ Rn×n is said to be symmetric if At = A.


Definition (Real orthogonal) A ∈ Rn×n is said to be orthogonal if At = A−1 , or equivalently,
if AAt = At A = In .
Theorem (Symmetric Case) If A ∈ Rn×n is a symmetric matrix, then there exists a real orthogonal matrix U (U t U = I) such that

AU = U Λ ⇔ A = U ΛU t ,

where Λ = diag(λ1 , · · · , λn ) with λi ∈ R the i-th eigenvalue of A, and the i-th column vector of U is the corresponding eigenvector to λi .
Proof: An eigenvalue λ of a matrix A is characterized by the algebraic relation Au = λ u. When A is an n × n symmetric matrix, a variational characterization (Riesz method) is also available. Consider the constrained maximization of

f (x) = xt Ax subject to |x|2 ≤ 1.

By the Weierstrass theorem there exists a maximizer u, and by the Lagrange multiplier theorem L(x, λ) = f (x) + λ (1 − |x|2 ) satisfies

(1/2) Lx (u, λ) = Au − λu = 0 and |u|2 = 1.

Therefore Au = λ u and |u| = 1. For every unit length eigenvector u of A its eigenvalue is f (u), so λ is the largest eigenvalue of A. The same calculation performed on the orthogonal complement of u, i.e., {x ∈ Rn : (x, u) = 0}, gives the next largest eigenvalue of A, and so on. That is, we obtain eigenpairs (λi , ui ) such that λ1 ≥ λ2 ≥ · · · ≥ λn and {ui } is orthonormal, i.e. (ui , uj ) = δi,j . 
Theorem (Jordan form A = P JP −1 ) Given an eigenvalue λ, its corresponding Jordan
block gives rise to a Jordan chain. The generator, or lead vector, say pr , of the chain is a
generalized eigenvector such that (A − λ I)r pr = 0, where r is the size of the Jordan block.
The vector p1 = (A − λI)r−1 pr is an eigenvector corresponding to λ. In general, pi is a
preimage of pi−1 under A − λI, i.e., (A − λ I)pi = pi−1 . So the lead vector generates the chain via multiplication by (A − λ I). Thus, APi = Pi Ji for the matrix Pi whose columns are the vectors of each Jordan chain. Therefore, the
statement that every square matrix A can be put in Jordan normal form is equivalent to the
claim that there exists a basis consisting only of eigenvectors and generalized eigenvectors
of A.

5.2.1 Matrix exponential solution


Recall: Given an eigenvalue λ we have
$$e^{\lambda t} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\lambda^k .$$
In the case of a matrix A the matrix exponential is computed in the same way:
$$e^{At} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,A^k$$

and
x(t) = eAt x(0)
defines the solution to the differential equation. If A = P −1 BP , then

eAt = P −1 eBt P

For example, B is a diagonal matrix Λ

eΛt = diag(eλ1 t , · · · , eλn t ).

If B is a Jordan block J of size r with eigenvalue λ, write J = λI + N with N nilpotent (N r = 0); then
$$\exp(Jt) = \exp(\lambda t)\left(I + Nt + \cdots + N^{r-1}\frac{t^{r-1}}{(r-1)!}\right),$$
and for B = \begin{pmatrix} a & b \\ -b & a \end{pmatrix},
$$\exp(Bt) = e^{at}\begin{pmatrix} \cos(bt) & \sin(bt) \\ -\sin(bt) & \cos(bt) \end{pmatrix}.$$
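A hedged check of these two formulas in Python (SciPy's expm; the values a = 0.3, b = 2.0, t = 1.5 and the size-3 Jordan block are illustrative choices).

```python
import numpy as np
from scipy.linalg import expm

a, b, t = 0.3, 2.0, 1.5
B = np.array([[a, b],
              [-b, a]])

lhs = expm(B * t)                                  # numerical matrix exponential
rhs = np.exp(a * t) * np.array([[np.cos(b * t), np.sin(b * t)],
                                [-np.sin(b * t), np.cos(b * t)]])
print(np.allclose(lhs, rhs))                       # True

# Jordan block of size 3: exp(Jt) = e^{lambda t} (I + N t + N^2 t^2/2)
lam = -1.0
N = np.diag([1.0, 1.0], k=1)                       # nilpotent part
J = lam * np.eye(3) + N
series = np.exp(lam * t) * (np.eye(3) + N * t + N @ N * t**2 / 2)
print(np.allclose(expm(J * t), series))            # True
```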

Moreover, let f (z) be an analytic function of a complex argument. Applying the function to an n × n Jordan block J with eigenvalue λ results in an upper triangular matrix:
$$f(J)=\begin{pmatrix}
f(\lambda) & f'(\lambda) & \frac{f''(\lambda)}{2} & \cdots & \frac{f^{(n-1)}(\lambda)}{(n-1)!} \\
0 & f(\lambda) & f'(\lambda) & \cdots & \frac{f^{(n-2)}(\lambda)}{(n-2)!} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & f(\lambda) & f'(\lambda) \\
0 & \cdots & 0 & 0 & f(\lambda)
\end{pmatrix},$$
so that the elements of the k-th super-diagonal of the resulting matrix are f (k) (λ)/k!. For a matrix in general Jordan normal form the above expression is applied to each Jordan block. The following example shows the application to the power function f (z) = z n :
$$\begin{pmatrix}
\lambda_1 & 1 & 0 & 0 & 0 \\
0 & \lambda_1 & 1 & 0 & 0 \\
0 & 0 & \lambda_1 & 0 & 0 \\
0 & 0 & 0 & \lambda_2 & 1 \\
0 & 0 & 0 & 0 & \lambda_2
\end{pmatrix}^{\!n}
=\begin{pmatrix}
\lambda_1^n & \binom{n}{1}\lambda_1^{n-1} & \binom{n}{2}\lambda_1^{n-2} & 0 & 0 \\
0 & \lambda_1^n & \binom{n}{1}\lambda_1^{n-1} & 0 & 0 \\
0 & 0 & \lambda_1^n & 0 & 0 \\
0 & 0 & 0 & \lambda_2^n & \binom{n}{1}\lambda_2^{n-1} \\
0 & 0 & 0 & 0 & \lambda_2^n
\end{pmatrix},$$
where the binomial coefficients are defined as $\binom{n}{k} = \prod_{i=1}^{k} \frac{n+1-i}{i}$. For positive integer n this reduces to the standard definition of the coefficients. For negative n the identity $\binom{-n}{k} = (-1)^k \binom{n+k-1}{k}$ may be of use.
Real Jordan Form decomposition A = P JP −1 . The real Jordan block is given by
$$J_i=\begin{pmatrix} C_i & I & & \\ & C_i & \ddots & \\ & & \ddots & I \\ & & & C_i \end{pmatrix},$$
where, for a non-real eigenvalue ai + ibi with given algebraic multiplicity, Ci is the 2 × 2 matrix
$$C_i=\begin{pmatrix} a_i & -b_i \\ b_i & a_i \end{pmatrix}.$$
This real Jordan form is a consequence of the complex Jordan form. For a real matrix the
nonreal eigenvectors and generalized eigenvectors can always be chosen to form complex
conjugate pairs. Taking the real and imaginary part (linear combination of the vector and
its conjugate), the matrix has this form with respect to the new basis.
Real Schur decomposition For A ∈ Rn×n one can always write A = U SU t where U ∈ Rn×n
is a real orthogonal matrix, U t U = In , S is a block upper triangular matrix called the real
Schur form. The blocks on the diagonal of S are of size 1 × 1 (in which case they represent
real eigenvalues) or 2 × 2 (in which case they are derived from complex conjugate eigenvalue
pairs). QR-algorithm is used to obtain S and U .

Basic QR-algorithm Let A0 = A. At the k-th step (starting with k = 0), we compute the QR decomposition Ak = Qk Rk . We then form Ak+1 = Rk Qk . Note that

$A_{k+1} = R_k Q_k = Q_k^{-1} A_k Q_k = Q_k^t A_k Q_k$,

so all the Ak are similar to A and hence they have the same eigenvalues. The algorithm is numerically stable because it proceeds by orthogonal similarity transforms. Let

Q̂k = Q0 · · · Qk and R̂k = R0 · · · Rk

be the orthogonal and triangular matrices generated by the QR algorithm. Then, we have

Ak+1 = Q̂tk AQ̂k .

With shifts σ0 , · · · , σk , starting with A, one has

Q̂k R̂k = (A − σ0 I) · · · (A − σk I),

which is used to prove the convergence of the iteration to the real Schur form (U, S).

5.3 Markov Chain Transition matrix


Suppose there is a physical or mathematical system that has k possible states and, at any one time, the system is in one and only one of its k states. Suppose also that at a given observation period, say the nth period, the probability of the system being in a particular state depends only on its state at the (n − 1)st period; such a system is called a Markov Chain or Markov process. Define aij to be the probability of the system being in state i after it was in state j (at any observation). The matrix A = (aij ) is called the transition matrix of the Markov Chain.
EXAMPLE In a certain town, 30% of the married women get divorced each year and 20% of the single women get married each year. Then
$$A=\begin{pmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{pmatrix}.$$
The eigenvalues λ satisfy
$$(.7 - \lambda)(.8 - \lambda) - .06 = \lambda^2 - 1.5\lambda + .5 = (\lambda - 1)(\lambda - .5) = 0.$$
Eigenvector for λ1 = 1: (A − λI)~v1 = 0 where
$$A - I = \begin{pmatrix} 0.7-1 & 0.2 \\ 0.3 & 0.8-1 \end{pmatrix} = \begin{pmatrix} -0.3 & 0.2 \\ 0.3 & -0.2 \end{pmatrix},$$
and thus ~v1 = c (2, 3)t . Eigenvector for λ2 = .5: (A − λI)~v2 = 0 where
$$A - 0.5\,I = \begin{pmatrix} 0.7-0.5 & 0.2 \\ 0.3 & 0.8-0.5 \end{pmatrix} = \begin{pmatrix} 0.2 & 0.2 \\ 0.3 & 0.3 \end{pmatrix},$$
and thus ~v2 = c (1, −1)t .

EXAMPLE Suppose in a small town there are three places to eat: two restaurants, one Chinese and the other Mexican, and a pizza place. Everyone in town eats dinner in one of these places or has dinner at home. Assume that of those who eat in the Chinese restaurant, 20% go to the Mexican restaurant next time, 20% eat at home, and 30% go to the pizza place. Of those who eat in the Mexican restaurant, 10% go to the pizza place, 25% go to the Chinese restaurant, and 25% eat at home next time. Of those who eat at the pizza place, 30% . . . ; of those who eat at home, 20% go to Chinese, 25% go to the Mexican place, and 30% to the pizza place. We call this situation a system. A person in the town can eat dinner in one of these four places, each of them called a state. In our example, the system has four states. We are interested in the success of these places in terms of their business. The transition matrix for this example is
$$A=\begin{pmatrix}
.25 & .20 & .25 & .30 \\
.20 & .30 & .25 & .30 \\
.25 & .20 & .40 & .10 \\
.30 & .30 & .10 & .30
\end{pmatrix}.$$


Note that the sum of each column in this matrix is one. Any matrix with this property is
called a (left) stochastic matrix, probability matrix or a Markov matrix. Define a (column) state vector
$$\vec x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix},$$
where, xi = probability that the system is in the ith state at the time of observation. That
is, ~x is a probability vector, i.e., xi ≥ 0 and the sum of the entries of the state vector has to
be one:
x1 + x2 + · · · + xk = 1.
Question What is the probability that the system is in the ith state at the nth observation?
Answer: ~x(n) = An ~x(0) , where ~x(0) is an initial probability vector.
For example, if ~x(0) = (1, 0, 0, 0)t we have
$$x^{(5)} = \begin{pmatrix} .2495 \\ .2634 \\ .2339 \\ .2532 \end{pmatrix},\qquad x^{(10)} = \begin{pmatrix} .2495 \\ .2634 \\ .2339 \\ .2532 \end{pmatrix},\qquad x^{(20)} = \begin{pmatrix} .2495 \\ .2634 \\ .2339 \\ .2532 \end{pmatrix}.$$
This suggests that the state vector approaches some fixed vector as the number of observation periods increases. In fact, the eigenvalues of A are 1.0000, −0.0962, 0.0774, 0.2688 and the first eigenvector is (0.2495, 0.2634, 0.2339, 0.2532), which is the asymptotic probability vector limn→∞ ~x(n) , independent of ~x(0) .
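A hedged sketch reproducing these numbers in Python (NumPy): repeated multiplication by the transition matrix, compared with the eigenvector for the eigenvalue 1 normalized to be a probability vector.

```python
import numpy as np

A = np.array([[.25, .20, .25, .30],
              [.20, .30, .25, .30],
              [.25, .20, .40, .10],
              [.30, .30, .10, .30]])
x = np.array([1.0, 0.0, 0.0, 0.0])     # initial probability vector

for n in range(20):                    # x^(20) = A^20 x^(0)
    x = A @ x
print(x)                               # ~ [0.2495 0.2634 0.2339 0.2532]

# Steady state from the eigenvector of the eigenvalue 1
lam, V = np.linalg.eig(A)
v = np.real(V[:, np.argmax(np.real(lam))])
print(v / v.sum())                     # same limiting vector
```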
This is not the case for every Markov Chain. For example, if
 
0 1
A= .
1 0

Theorem If a Markov chain with an n × n transition matrix A converges to a steady-state
vector x, then
(i) x is a probability vector.
(ii) λ1 = 1 is an eigenvalue of A and x is an eigenvector corresponding to λ = 1.
(iii) If λ1 = 1 is a dominant eigenvalue of a (left) stochastic matrix A (i.e., |λi | < 1, i ≥ 2)
, then the Markov chain with transition A will converge to a steady-state vector.
Proof: Since $\sum_{i=1}^{k} a_{i,j} = 1$ for all j, the vector (1, . . . , 1)t is an eigenvector of At with eigenvalue 1; since A and At have the same eigenvalues, λ1 = 1 is an eigenvalue of A. Next, if x is a probability vector, so is y = Ax, since
$$\sum_{i=1}^{k} y_i = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{i,j} x_j = \sum_{j=1}^{k}\Big(\sum_{i=1}^{k} a_{i,j}\Big)x_j = \sum_{j=1}^{k} x_j = 1.$$
For any probability vector ~x(0) ,

~x(0) = a1 ~v1 + · · · + ak ~vk (assuming A is diagonalizable)

and

An ~x(0) = a1 ~v1 + a2 (λ2 )n ~v2 + · · · + ak (λk )n ~vk → a1 ~v1 .

6 Inner product and Orthogonality


LEARNING OBJECTIVES FOR THIS CHAPTER: Inner product and Orthogonality of
vectors, Gram-Schmidt orthogonalization, Orthogonal decomposition theorem and Minimum
norm solution, Least Square solution to linear system of equations, Generalized matrix in-
verse.
The dot (inner) product of two vectors x = (x1 , · · · , xn )t and y = (y1 , · · · , yn )t

x · y = (x, y) = x1 y1 + x2 y2 + · · · + xn yn

Note that for c ∈ R and x, y, z ∈ Rn

(y, x) = (x, y), (cx, y) = c(x, y), (x + y, z) = (x, z) + (y, z)

For example, in x = (1, 3, −5), y = (4, −2, −1) ∈ R3

(x, y) = (1 × 4) + (3 × −2) + (−5 × −1) = 4 − 6 + 5 = 3.

Transpose of matrix At For A ∈ Rm×n , (Ax, y)Rm = (x, At y)Rn since
$$(Ax, y)_{\mathbb{R}^m} = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} a_{ij} x_j\Big) y_i = \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} a_{ij} y_i\Big) x_j = (x, A^t y)_{\mathbb{R}^n}.$$

Inner product space An inner product space is a vector space V over the field F = R together with a map

⟨·, ·⟩ : V × V → F,

called an inner product, that satisfies the following conditions for all vectors x, y, z ∈ V and all scalars a:

⟨ax, y⟩ = a⟨x, y⟩,
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩,

and

⟨x, y⟩ = ⟨y, x⟩, ⟨x, x⟩ > 0 for x ≠ 0.

EXAMPLE V = C(−1, 1) and ⟨x, y⟩ = ∫_{-1}^{1} x(t)y(t) dt.
EXAMPLE V = Pn and ⟨t^k , t^j ⟩ = ∫_{-1}^{1} t^k t^j dt = (1/(k+j+1))(1 − (−1)^{k+j+1} ).
Geometrically, we have the norm and the cosine of the angle,
$$\|\vec x\| = \sqrt{(\vec x, \vec x)} = \sqrt{x_1^2 + \cdots + x_n^2}, \qquad \cos(\theta) = \frac{(\vec x, \vec y)}{\|\vec x\|\,\|\vec y\|},$$
where ‖~x‖ is the norm of ~x and θ is the angle between the vectors ~x and ~y . Since for all t ∈ R

0 ≤ ‖~x + t~y ‖2 = (~x + t~y , ~x + t~y ) = (~x, ~x) + 2t(x, y) + t2 (y, y) = ‖~x‖2 + 2t(x, y) + t2 ‖y‖2 ,

we have the Cauchy-Schwarz inequality (letting t = −(x, y)/‖y‖2 )

|(~x, ~y )| ≤ ‖~x‖ ‖~y ‖.

Thus,

‖~x + ~y ‖2 = ‖~x‖2 + 2(~x, ~y ) + ‖~y ‖2 ≤ ‖~x‖2 + 2‖~x‖‖~y ‖ + ‖~y ‖2 = (‖~x‖ + ‖~y ‖)2

and we obtain the triangle inequality:

‖~x + ~y ‖ ≤ ‖~x‖ + ‖~y ‖.

Thus, (Rn , ‖ · ‖) is a normed space (‖x‖ = 0 iff ~x = 0 and ‖c~x‖ = |c| ‖~x‖ for all ~x ∈ Rn and c ∈ R).
In mathematics, particularly linear algebra and numerical analysis, the Gram-Schmidt
process is a method for orthonormalizing a set of vectors in an inner product space, most
commonly the Euclidean space Rn equipped with the standard dot product. The Gram-
Schmidt process takes a finite, linearly independent set S = {~v1 , ..., ~vk } for k ≤ n and
generates an orthogonal set S̃ = {~u1 , · · · , ~uk }, (~ui , ~uj ) = 0, i 6= j that spans the same
k-dimensional subspace S of Rn .
The method is named after Jorgen Pedersen Gram and Erhard Schmidt, but Pierre-
Simon Laplace had been familiar with it before Gram and Schmidt. In the theory of Lie
group decompositions it is generalized by the Iwasawa decomposition. The application of
the Gram-Schmidt process to the column vectors of a full column rank matrix yields the QR
decomposition (it is decomposed into an orthogonal and a triangular matrix).

We define the projection operator by
$$\mathrm{proj}_u(v) = \frac{\langle v, u\rangle}{\langle u, u\rangle}\, u,$$
i.e., this operator projects the vector v orthogonally onto the line spanned by the vector u, since
$$\langle v - \mathrm{proj}_u(v),\, u\rangle = \langle v, u\rangle - \langle v, u\rangle = 0.$$
The Gram-Schmidt orthogonalization works as follows:
$$u_1 = v_1, \qquad \tilde u_1 = \frac{u_1}{\|u_1\|}$$
$$u_2 = v_2 - \mathrm{proj}_{u_1}(v_2), \qquad \tilde u_2 = \frac{u_2}{\|u_2\|}$$
$$u_3 = v_3 - \mathrm{proj}_{u_1}(v_3) - \mathrm{proj}_{u_2}(v_3), \qquad \tilde u_3 = \frac{u_3}{\|u_3\|}$$
$$u_4 = v_4 - \mathrm{proj}_{u_1}(v_4) - \mathrm{proj}_{u_2}(v_4) - \mathrm{proj}_{u_3}(v_4), \qquad \tilde u_4 = \frac{u_4}{\|u_4\|}$$
$$\vdots$$
$$u_k = v_k - \sum_{j=1}^{k-1} \mathrm{proj}_{u_j}(v_k), \qquad \tilde u_k = \frac{u_k}{\|u_k\|}.$$
The sequence {~u1 , · · · , ~uk } is the required system of orthogonal vectors, and the normalized vectors {ũ1 , · · · , ũk } form an orthonormal set. Equivalently,

A = [~v1 | · · · |~vk ] = QR with Q = [ũ1 | · · · |ũk ],

where Qt Q = I and R is an upper triangular (coefficient) matrix.
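A hedged sketch of classical Gram-Schmidt in Python (NumPy), applied to the columns of a full column rank matrix; it returns Q with orthonormal columns and the upper triangular coefficient matrix R with A = QR (numpy.linalg.qr is the library routine one would normally use).

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A; returns Q (orthonormal columns) and R."""
    m, k = A.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, k))
    for j in range(k):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]      # coefficient of v_j along u_i
            u -= R[i, j] * Q[:, i]           # subtract the projection onto u_i
        R[j, j] = np.linalg.norm(u)
        Q[:, j] = u / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
Q, R = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: orthonormal columns
print(np.allclose(Q @ R, A))             # True: A = QR
```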


If {~uj }nj=1 is an orthogonal basis of V , we have the Fourier formula:
$$\vec u = a_1 \vec u_1 + \cdots + a_n \vec u_n \quad\text{with}\quad a_j = \frac{\langle \vec u, \vec u_j\rangle}{\langle \vec u_j, \vec u_j\rangle},$$
since ⟨~u, ~uj ⟩ = aj ⟨~uj , ~uj ⟩, and
$$\|\vec u\|^2 = a_1^2\|\vec u_1\|^2 + \cdots + a_n^2\|\vec u_n\|^2 .$$

6.1 Orthogonal decomposition


Definition (Orthogonal complement)

S ⊥ = {x ∈ Rn : (x, s) = 0 for all s ∈ S}

is the orthogonal complement of a subspace S of Rn .
Theorem (The Orthogonal Decomposition Theorem) Let S be a subspace of Rn . Then
each x ∈ Rn can be uniquely represented in the form

x = x̂ + z and ‖x‖2 = ‖x̂‖2 + ‖z‖2 ,

where x̂ ∈ S and z ∈ S ⊥ (Rn = S ⊕ S ⊥ ).


Proof: Let (~u1 , · · · , ~un ) be any orthonormal basis of S, i.e. for any s ∈ S,

s = (s, ~u1 ) ~u1 + · · · + (s, ~un ) ~un .

For x ∈ Rn define

x̂ = (x, ~u1 ) ~u1 + · · · + (x, ~un ) ~un ∈ S;

then z = x − x̂ ∈ S ⊥ since (z, ~ui ) = (x, ~ui ) − (x̂, ~ui ) = 0. The decomposition is unique since if there exist two decompositions of x,

x = x̂1 + z1 = x̂2 + z2 ,

then

x̂1 − x̂2 = z2 − z1 ∈ S ∩ S ⊥ ⇒ x̂1 − x̂2 = z2 − z1 = 0.

Orthogonal decomposition theorem II For a matrix A ∈ Rm×n , N (A) = R(At )⊥ , and thus from the orthogonal decomposition theorem

Rn = N (A) ⊕ R(At ).

Proof: Suppose x ∈ N (A), then

(x, At y)Rn = (Ax, y)Rm = 0

for all y ∈ Rm and thus N (A) ⊂ R(At )⊥ . Conversely, x∗ ∈ R(At )⊥ , i.e.,

(x∗ , At y) = (Ax∗ , y) = 0 for all y ∈ Rm

Thus, Ax∗ = 0 and R(At )⊥ ⊂ N (A). 

6.2 Generalized inverse
Minimum norm solution Similarly, we have

Rm = N (At ) ⊕ R(A), Rn = N (A) ⊕ R(At ).

Suppose A ∈ Rm×n , n ≥ m (under-determined) with R(A) = Rm , or equivalently N (At ) = {0}. From the theorem, the minimum norm solution to Ax = b has the form x = At y, and thus y satisfies AAt y = b and

x = At (AAt )−1 b is the minimum norm solution.

Note that if y ∈ N (AAt ) then (y, AAt y) = |At y|2 = 0 and y = 0, i.e., AAt is nonsingular.
Least squares solution Recall that if N (A) = {0}, equivalently R(At ) = Rn , for A ∈ Rn×n , then Ax = b has a unique solution. In general, for A ∈ Rm×n , m ≥ n (over-determined) with N (A) = {0},

x = (At A)−1 At b defines the least squares solution,

which minimizes the error ‖Ax − b‖2 . Note that if x ∈ N (At A) then (x, At Ax) = |Ax|2 = 0 and x = 0, i.e., At A is nonsingular.
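A hedged numerical sketch of both formulas in Python (NumPy; the small matrices are illustrative): the over-determined least squares solution (At A)^{-1} At b and the under-determined minimum norm solution At (AAt)^{-1} b, checked against numpy.linalg.lstsq and pinv.

```python
import numpy as np

# Over-determined: m > n, least squares solution x = (A^t A)^{-1} A^t b
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])
x_ls = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))   # True

# Under-determined: n > m with R(B) = R^m, minimum norm solution x = B^t (B B^t)^{-1} c
B = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0])
x_mn = B.T @ np.linalg.solve(B @ B.T, c)
print(np.allclose(x_mn, np.linalg.pinv(B) @ c))   # True: the pseudoinverse gives the same solution
```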
Generalized inverse of A Consider the regularized least squares formulation for α > 0:

min J(x) = ‖Ax − b‖2 + α ‖x‖2 .

Then the minimizer x is given by

x∗ = (At A + αI)−1 At b.

Note that if x ∈ N (At A + αI) then (x, (At A + αI)x) = |Ax|2 + α|x|2 = 0 and x = 0, i.e., At A + αI is nonsingular. In fact, for x̃ ∈ Rn ,

J(x̃) = ‖A(x̃ − x∗ ) + (Ax∗ − b)‖2 + α ‖(x̃ − x∗ ) + x∗ ‖2

= ‖A(x̃ − x∗ )‖2 + α ‖x̃ − x∗ ‖2 + 2(A(x̃ − x∗ ), Ax∗ − b) + 2α(x̃ − x∗ , x∗ ) + J(x∗ ),

where

(A(x̃ − x∗ ), Ax∗ − b) + α(x̃ − x∗ , x∗ ) = (x̃ − x∗ , (At A + αI)x∗ − At b) = 0.

Thus, we have

J(x̃) ≥ J(x∗ )

with equality if and only if x̃ = x∗ .

6.3 Approximation Theory


Let S be a subspace of V . Consider the least square approximation of ~u in V by a linear
combination of vectors in S = {~u1 , · · · , ~un by the least square criterion

min k~u − sk2 , over s = x1~u1 + · · · + xn~un ∈ S

58
If {~u1 , · · · , ~un } is an orthonormal basis of S then it follows from the orthogonal decomposition
theory
s∗ = h~u, ~u1 i ~u1 + · · · + h~u, ~un i ~un
is the best approximation of ~u. In general,

k~u − sk2 = |Ax − b|2Rn

where A = [~u1 | · · · |~un ]. The best solution x∗ is given by

x∗ = (At A)−1 (At b),

where
(At A)ij = h~ui , ~uj i, At b = (h~u, ~u1 i, · · · , h~u, ~un i)t .
EXAMPLE 1 (Polynomial approximation) Let S be the subspace P1 of all linear functions in C[0, 1], with inner product ⟨u, v⟩ = ∫_0^1 u(x)v(x) dx. Although the functions 1 and x span S, they are not orthogonal. By the Gram-Schmidt orthogonalization, u2 (x) = √12 (x − 1/2) is orthogonal to u1 = 1, i.e. {u1 (x), u2 (x)} is an orthonormal basis of P1 . Thus, the best linear approximation of u(x) = ex is given by a1 u1 (x) + a2 u2 (x) with
$$a_1 = \int_0^1 e^x u_1(x)\,dx = e - 1, \qquad a_2 = \int_0^1 e^x u_2(x)\,dx = \sqrt{3}\,(3 - e).$$
Next, let S = P3 . Then, we evaluate the matrix Qkj = ∫_0^1 x^{k-1} x^{j-1} dx = 1/(k + j − 1) and the vector ck = ∫_0^1 e^x x^{k-1} dx, k = 1, 2, 3, 4. Then, the best cubic approximation is given by

a1 + a2 x + a3 x2 + a4 x3 ,

where a ∈ R4 solves Qa = c.
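A hedged sketch of this cubic approximation in Python (NumPy/SciPy): build the Gram matrix Q and the moment vector c by the formulas above and solve Qa = c; the integrals ∫_0^1 e^x x^{k-1} dx are evaluated with scipy.integrate.quad.

```python
import numpy as np
from scipy.integrate import quad

n = 4                                         # basis 1, x, x^2, x^3
Q = np.array([[1.0 / (k + j - 1) for j in range(1, n + 1)] for k in range(1, n + 1)])
c = np.array([quad(lambda x, k=k: np.exp(x) * x ** (k - 1), 0.0, 1.0)[0]
              for k in range(1, n + 1)])

a = np.linalg.solve(Q, c)                     # coefficients of the best cubic approximation
print(a)

# Compare e^x with the approximation at a few points on [0, 1]
xs = np.linspace(0.0, 1.0, 5)
approx = sum(a[k] * xs ** k for k in range(n))
print(np.max(np.abs(np.exp(xs) - approx)))    # small error
```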
EXAMPLE 2 (Fourier cosine series) Let V be the space of even functions in C[−π, π] and S = span{1/√2, cos(x), cos(2x), · · · , cos(nx)}, with inner product
$$\langle u, v\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} u(x)v(x)\,dx .$$
Then {1/√2, cos(x), . . . , cos(nx)} is an orthonormal set of vectors in V , i.e., (1/π)∫_{-π}^{π} cos(kx) cos(jx) dx = 0 for k ≠ j. It follows from the orthogonal decomposition theorem that

s∗ = ⟨~u, ~u1 ⟩ ~u1 + · · · + ⟨~u, ~un ⟩ ~un ,

where the Fourier coefficients are given by
$$\langle u, \cos(kx)\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} u(x)\cos(kx)\,dx, \quad k \geq 1,$$
is the best approximation of a function u(x) ∈ V .

7 QR decomposition and Singular value decomposition
Householder transform and QR decomposition A = QR
Let e1 be the vector (1, 0, · · · , 0)t , let ‖ · ‖ be the Euclidean norm,
$$\|x\|^2 = x_1^2 + x_2^2 + \cdots + x_m^2 = x^t x = (x, x),$$
and let I be the m × m identity matrix. Given x ∈ Rm , set

u = x − αe1 with α = ±‖x‖,  v = u/‖u‖,  Q = I − 2vv t ,

where Q is an m-by-m Householder matrix and

Qx = (α, 0, · · · , 0)t .

Note that

Qt Q = (I − 2vv t )(I − 2vv t ) = I − 4vv t + 4v(v t v)v t = I,

since v t v = 1. This can be used to sequentially transform an m-by-n matrix A to upper triangular form. First, we multiply A with the Householder matrix Q1 obtained by choosing the first column of A for x. This results in a matrix Q1 A with zeros in the first column (except for the first row):
$$Q_1 A = \begin{pmatrix} \alpha_1 & \star & \cdots & \star \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix}.$$
This can be repeated for A1 (obtained from Q1 A by deleting the first row and first column), resulting in a Householder matrix Q′2 . Note that Q′2 is smaller than Q1 . Since we want it to operate on Q1 A instead of A1 , we need to expand it to the upper left, filling in a 1, or in general
$$Q_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & Q'_k \end{pmatrix}.$$
After t iterations of this process, t = min(m − 1, n),

$R = Q_t \cdots Q_2 Q_1 A$

is an upper triangular matrix. So, with

$Q = Q_1^T Q_2^T \cdots Q_t^T$,

A = QR is a QR decomposition of A.
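A hedged sketch of this Householder procedure in Python (NumPy); it accumulates the reflections explicitly, which is fine for illustration (numpy.linalg.qr is the production routine).

```python
import numpy as np

def householder_qr(A):
    """QR factorization by Householder reflections: A = Q R, Q orthogonal, R upper triangular."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        alpha = -np.linalg.norm(x) if x[0] >= 0 else np.linalg.norm(x)  # sign chosen for stability
        e1 = np.zeros(len(x))
        e1[0] = 1.0
        u = x - alpha * e1
        norm_u = np.linalg.norm(u)
        if norm_u == 0:
            continue                            # column already has the desired form
        v = u / norm_u
        Qk = np.eye(m)
        Qk[k:, k:] -= 2.0 * np.outer(v, v)      # embedded Householder matrix I - 2 v v^t
        R = Qk @ R
        Q = Q @ Qk                              # Qk is symmetric and orthogonal
    return Q, R

A = np.random.rand(5, 3)
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                    # True
print(np.allclose(Q.T @ Q, np.eye(5)))          # True
print(np.allclose(np.tril(R, -1), 0))           # True: R is upper triangular
```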

Remark Note that Q is a real orthogonal transform: Qt Q = I, Qt = Q−1 , and ‖Q~x‖ = ‖~x‖ (norm preserving for all x ∈ Rm ). In fact,

$Q^t Q = Q_t \cdots Q_2 Q_1 Q_1^t Q_2^t \cdots Q_t^t = I$

and

‖Qx‖2 = (Qx)t Qx = xt Qt Qx = xt x = ‖x‖2 .
QR method for Eigenvalue problems In numerical linear algebra, the QR algorithm is an
eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a
matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera
N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition,
writing the matrix as a product of an orthogonal matrix and an upper triangular matrix,
multiply the factors in the reverse order and iterate. It is a power method to compute
dominant eigen value-pairs.
Basic QR-algorithm Let A0 = A. At the k-th step (starting with k = 0), we compute the QR decomposition Ak = Qk Rk . We then form Ak+1 = Rk Qk . Note that

$A_{k+1} = R_k Q_k = Q_k^{-1} A_k Q_k = Q_k^t A_k Q_k$,

so all the Ak are similar to A and hence they have the same eigenvalues. The algorithm is numerically stable because it proceeds by orthogonal similarity transforms. Let

Q̂k = Q0 · · · Qk and R̂k = R0 · · · Rk

be the orthogonal and triangular matrices generated by the QR algorithm. Then, we have

Ak+1 = Q̂tk AQ̂k .

With shifts σ0 , · · · , σk , starting with A, one has

Q̂k R̂k = (A − σ0 I) · · · (A − σk I).
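A hedged sketch of the (unshifted) QR iteration in Python (NumPy; the test matrix and step count are illustrative): repeating A_{k+1} = R_k Q_k drives A_k toward a (block) upper triangular matrix whose diagonal reveals the eigenvalues. Practical implementations add shifts and a Hessenberg reduction first.

```python
import numpy as np

def qr_iteration(A, steps=200):
    """Unshifted QR algorithm: returns A_k, similar to A, approaching Schur form."""
    Ak = A.astype(float).copy()
    for _ in range(steps):
        Qk, Rk = np.linalg.qr(Ak)
        Ak = Rk @ Qk                 # A_{k+1} = R_k Q_k = Q_k^t A_k Q_k
    return Ak

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric, so A_k tends to a diagonal matrix

Ak = qr_iteration(A)
print(np.sort(np.diag(Ak)))                      # approximate eigenvalues
print(np.linalg.eigvalsh(A))                     # compare with LAPACK (ascending order)
```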

7.1 PCA (Principal Component Analysis)


PCA is defined as an orthogonal linear transformation that transforms the data to a new
coordinate system such that the greatest variance by some scalar projection of the data
comes to lie on the first coordinate (called the first principal component), the second greatest
variance on the second coordinate, and so on.
Consider an n × p data matrix, X, with column-wise zero empirical mean (the sample
mean of each column has been shifted to zero), where each of the n rows represents a different
repetition of the experiment, and each of the p columns gives a particular kind of feature
(say, the results from a particular sensor).
Mathematically, the transformation is defined by a set of size ℓ of p-dimensional vectors of weights or coefficients w(k) = (w1 , · · · , wp )(k) that map each row vector x(i) of X to a new vector of principal component scores

t(i) = (t1 , . . . , tℓ )(i) ,

given by

tk (i) = x(i) · w(k) for i = 1, . . . , n, k = 1, . . . , ℓ.

The first weight vector w(1) maximizes the variance of the first scores; since w(1) is defined to be a unit vector, it equivalently satisfies
$$w_{(1)} = \arg\max_{\|w\|=1} \, w^T X^T X w = \arg\max \frac{w^T X^T X w}{w^T w}.$$

The quantity to be maximised can be recognised as a Rayleigh quotient. A standard result


for a positive semidefinite matrix such as X T X is that the quotient’s maximum possible
value is the largest eigenvalue of the matrix, which occurs when w is the corresponding
eigenvector.
With w(1) found, the first principal component of a data vector x(i) can then be given
as a score t1 (i) = x(i) · w(1) in the transformed co-ordinates, or as the corresponding vector
in the original variables, (x(i) · w(1))w(1).
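A hedged sketch of PCA in Python (NumPy; the synthetic data set is an illustrative stand-in): center the data, take the right singular vectors of X (equivalently the eigenvectors of X^T X), and project onto them. scikit-learn's PCA would be the off-the-shelf alternative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [0.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])
X = X - X.mean(axis=0)                  # column-wise zero empirical mean

# Right singular vectors of X = eigenvectors of X^T X (principal directions)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt.T                                # columns are w_(1), w_(2), ...

T = X @ W[:, :2]                        # scores on the first two principal components
print(s**2 / (len(X) - 1))              # variances explained along each direction
print(W[:, 0])                          # first principal direction (close to +/- e_1 here)
```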

7.2 Singular value decomposition


A = U SV t ,

where U and V are real orthogonal matrices and S is a diagonal matrix consisting of the singular values of A.
We give two arguments for the existence of the singular value decomposition. First, since

At A = (U SV t )t U SV t = V SU t U SV t = V S 2 V t ,

AAt = U SV t (U SV t )t = U SV t V SU t = U S 2 U t ,

it follows that U consists of eigenvectors of AAt , V consists of eigenvectors of At A, and S 2 = Λ̃ = the eigenvalues of At A (and of AAt ).
Singular values are similar to eigenvalues in that they can be described by a variational principle. Consider the constrained optimization

max σ(u, v) = ut M v subject to |u| ≤ 1, |v| ≤ 1.

By the Lagrange multiplier theorem with

L(u, v, λ1 , λ2 ) = σ(u, v) + λ1 (|u|2 − 1) + λ2 (|v|2 − 1),

there exists (u1 , v1 ) such that

M v1 = λ1 u1 , M t u1 = λ2 v1 , |u1 | = |v1 | = 1.

Multiplying the first equation from the left by ut1 and the second equation from the left by v1t , we have

σ1 = ut1 M v1 = λ1 = λ2 .

The same calculation performed on the orthogonal complement {u ∈ Rn : (u, u1 ) = 0} × {v ∈ Rm : (v, v1 ) = 0} gives the next largest singular value of M , and so on. That is, we obtain singular value triples (σi , ui , vi ) such that σ1 ≥ σ2 ≥ · · · ≥ σn , {ui } is orthonormal, i.e. (ui , uj ) = δi,j , {vi } is orthonormal, i.e. (vi , vj ) = δi,j , and

M vi = σi ui , M t ui = σi vi , |ui | = |vi | = 1.

Thus, M = U SV t .
Application (Image compression)
A bitmap image is represented by an 864 × 1536 matrix, call it A. Compute the SVD A = U SV t and use a truncated SVD

Ã = Ũ S̃ Ṽ t ,

where S̃ = diag(s1 , · · · , st ), Ũ = (~u1 , · · · , ~ut ), Ṽ = (~v1 , · · · , ~vt ). We select t such that the first t singular values of A dominate the remaining singular values. It can be proved that Ã is the optimal rank-t approximation of A, i.e.,

‖A − Ã‖F is smallest.
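A hedged sketch of this truncated SVD compression in Python (NumPy), using a random matrix as a stand-in for an actual bitmap; numpy.linalg.svd returns the singular values in decreasing order, so keeping the first t triples gives the best rank-t approximation in the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((200, 300))            # stand-in for the 864 x 1536 image matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

t = 20                                # number of singular triples kept
A_t = U[:, :t] @ np.diag(s[:t]) @ Vt[:t, :]

rel_err = np.linalg.norm(A - A_t, 'fro') / np.linalg.norm(A, 'fro')
storage = t * (A.shape[0] + A.shape[1] + 1) / A.size
print(f"relative Frobenius error: {rel_err:.3f}, storage fraction: {storage:.3f}")
```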
