Linear Algebra, Lecture Notes, Math 405.
Kazufumi Ito (Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA)
November 18, 2020
CONTENTS
• Vector space. — Subspaces, null space N(A) and range space R(A), linearly independent vectors, span, Gauss-Jordan reduction, reduced row echelon form, elementary matrix multiplication and LU decomposition, basis and dimension, inverse of a square matrix A.
1 Introduction
Probably the most important problem in mathematics is that of solving a system of linear
equations. Well over 75 percent of all mathematical problems encountered in scientific or
industrial applications involve solving a linear system at some stage. By using the methods
of modern mathematics, it is often possible to take a sophisticated problem and reduce it
to a single system of linear equations. Linear systems arise in applications to such areas
as business, economics, sociology, ecology, demography, genetics, electronics, engineering,
physics, statistics, neural networks and AI. Therefore, it seems appropriate to begin the
lecture with a section on linear systems.
A linear map A maps a column vector x of dimension n,
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\]
onto a vector y of dimension m by
\[
y = A(x) = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ a_{21}x_1 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix},
\]
and is linear: A(\alpha x_1 + \beta x_2) = \alpha Ax_1 + \beta Ax_2.
The linear map A is thus defined by the matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},
\]
and maps the column vector x to the matrix product
\[
y = Ax.
\]
A linear system of equations is Ax = y, and we look for a solution vector x given the right-hand-side vector y.
The dot product of a vector a = (a_1, a_2, \cdots, a_n) with x is defined as
\[
a \cdot x = a_1x_1 + a_2x_2 + \cdots + a_nx_n,
\]
i.e., by multiplying the entries of a and x term by term and summing these n products. Then y_i is the dot product of the i-th row of A and x.
EXAMPLE (traffic flow)
In the downtown section of a certain city, two sets of one-way streets intersect as shown in the figure (not reproduced here). The average hourly volume of traffic entering and leaving this section during rush hour is given in the diagram.
At each intersection the number of automobiles entering must be the same as the number
leaving
x1 + 450 = x2 + 610 (intersection A)
x2 + 520 = x3 + 480 (intersection B)
x3 + 390 = x4 + 600 (intersection C)
x4 + 640 = x1 + 310 (intersection D)
Thus, we obtain a system of linear equations:
\[
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ -1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 160 \\ -40 \\ 210 \\ -330 \end{pmatrix}.
\]
Matrix product C = AB. In general, if A is an m × n matrix and B is an n × p matrix (note: the number of columns of A and the number of rows of B must be the same n),
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad
B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix},
\]
then C = AB is the m × p matrix with entries
\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj},
\]
for i = 1, \cdots, m and j = 1, \cdots, p. That is, the entry c_{ij} is the dot product of the i-th row of A and the j-th column of B.
Therefore, C = AB can also be written as
\[
C = \begin{pmatrix}
a_{11}b_{11}+\cdots+a_{1n}b_{n1} & a_{11}b_{12}+\cdots+a_{1n}b_{n2} & \cdots & a_{11}b_{1p}+\cdots+a_{1n}b_{np} \\
a_{21}b_{11}+\cdots+a_{2n}b_{n1} & a_{21}b_{12}+\cdots+a_{2n}b_{n2} & \cdots & a_{21}b_{1p}+\cdots+a_{2n}b_{np} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}b_{11}+\cdots+a_{mn}b_{n1} & a_{m1}b_{12}+\cdots+a_{mn}b_{n2} & \cdots & a_{m1}b_{1p}+\cdots+a_{mn}b_{np}
\end{pmatrix}.
\]
Thus the product AB is defined if and only if the number of columns in A equals the number of rows in B, in this case n.
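As an illustration of the entry formula c_{ij} = \sum_k a_{ik}b_{kj}, the following minimal MATLAB sketch (an addition to these notes; the particular matrices are arbitrary) computes the product with an explicit triple loop and compares it with the built-in product A*B.

% Matrix product via the entry formula c_ij = sum_k a_ik * b_kj
A = [1 2 3; 4 5 6];          % 2 x 3
B = [7 8; 9 10; 11 12];      % 3 x 2
[m, n] = size(A); [n2, p] = size(B);
assert(n == n2, 'inner dimensions must agree');
C = zeros(m, p);
for i = 1:m
  for j = 1:p
    for k = 1:n
      C(i,j) = C(i,j) + A(i,k)*B(k,j);  % dot product of i-th row of A and j-th column of B
    end
  end
end
disp(norm(C - A*B))   % should be 0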
Note: An element s belongs to a set S ⇔ s ∈ S.
A ⊂ B: a set A is a subset of a set B, or equivalently B is a superset of A, if A is
contained in B. That is, all elements of A are also elements of B.
For example, Q is a subset of R, and R is a subset of C.
2 Vector space
Linear algebra is the study of linear maps on finite-dimensional vector spaces. Eventually
we will learn what all these terms mean. In this chapter we will define vector spaces and
discuss their elementary properties. We recall that an n-tuple of real numbers is a column vector
\[
x = \vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]
for example, the solution of a linear system. A vector space is a collection of vectors. The operations of addition and scalar multiplication for vectors obey
\[
a_1\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + a_2\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= \begin{pmatrix} a_1u_1 + a_2v_1 \\ a_1u_2 + a_2v_2 \\ a_1u_3 + a_2v_3 \end{pmatrix}.
\]
LEARNING OBJECTIVES FOR THIS CHAPTER:
• Bases
• Dimension of subspace
A field F is a set with two operations, addition and multiplication, satisfying axioms that include:
• Additive and multiplicative identity: there exist two different elements 0 and 1 in F such that a + 0 = a and a1 = a.
• Additive inverses: for every a in F, there exists an element in F, denoted −a, called the additive inverse of a, such that a + (−a) = 0.
This may be summarized by saying: a field has two operations, called addition and
multiplication; it is an abelian group under addition with 0 as the additive identity; the
nonzero elements are an abelian group under multiplication with 1 as the multiplicative
identity; and multiplication distributes over addition.
N and Z are not fields. C, R and Q are all fields. There are many other fields, including
some finite fields. For example, for each prime number p, there is a field Fp = {0,1,2,...,p-1}
with p elements, where addition and multiplication are carried out modulo p. Thus, in F7 ,
we have 5 + 4 = 2, 5 × 4 = 6 and 5−1 = 3 because 5 × 3 = 1. The smallest such field F2 has
just two elements 0 and 1, where 1 + 1 = 0. This field is extremely important in Computer
Science since an element of F2 represents a bit of information.
Definition (Vector Space V ) v = ~v ∈ V
• Associativity of addition: u + (v + w) = (u + v) + w
• Commutativity of addition: u + v = v + u
• Identity element of addition: There exists an element 0 ∈ V , called the zero vector,
such that v + 0 = v for all v ∈ V .
(1) The basic examples are R^2 and R^3, which we can think of geometrically as the points in ordinary 2- and 3-dimensional space, equipped with a coordinate system. In general,
\[
\mathbb{R}^n = \{\vec{x} = (x_1, x_2, \cdots, x_n) : x_1, x_2, \cdots, x_n \in \mathbb{R}\},
\]
with
\[
\vec{u} + \vec{v} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{pmatrix}, \qquad
c\,\vec{u} = \begin{pmatrix} c\,u_1 \\ c\,u_2 \\ c\,u_3 \end{pmatrix},
\]
\[
(2\vec{u} + \vec{v}) - 3\vec{w}
= 2\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} - 3\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}
= \begin{pmatrix} 2u_1 + v_1 - 3w_1 \\ 2u_2 + v_2 - 3w_2 \\ 2u_3 + v_3 - 3w_3 \end{pmatrix}.
\]
(2) The set Rm×n of all m × n matrices is itself a vector space over R using the operations
of addition and scalar multiplication.
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
+ \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}
= \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \end{pmatrix},
\]
\[
c\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
= \begin{pmatrix} ca_{11} & ca_{12} & ca_{13} \\ ca_{21} & ca_{22} & ca_{23} \\ ca_{31} & ca_{32} & ca_{33} \end{pmatrix}.
\]
(3) Let Pn be the set of polynomials in x with coefficients in the field F . That is,
Pn = {a0 + a1 x + · · · + an xn , ai ∈ R}.
(4) Let C((0, 1), R) be the set of all functions f : (0, 1) → R with the usual pointwise definitions of addition and scalar multiplication of functions:
\[
(f + g)(t) = f(t) + g(t), \qquad (cf)(t) = cf(t) \quad \text{for all } t \in (0, 1).
\]
We shall assume the following additional simple properties of vectors and scalars from now on. They can all be deduced from the axioms (and it is a useful exercise to do so). For example, additive inverses are unique: if u + v = 0 and u + v' = 0, then
\[
v = v + 0 = v + (u + v') = v' + (u + v) = v' + 0 = v'.
\]
2.1 Subspaces
Definition (subspace) A subset U of vector space V is called a subspace of V if U is also a
vector space (using the same addition and scalar multiplication as on V ).
EXAMPLES
\[
\left\{\begin{pmatrix} x \\ y \\ 0 \end{pmatrix} : x, y \in \mathbb{R}\right\}, \qquad
\left\{\begin{pmatrix} x \\ y \\ z \end{pmatrix} : x + 2y - z = 0\right\}
\]
are subspaces of R^3, while
\[
\left\{\begin{pmatrix} x \\ y \\ z \end{pmatrix} : x + 2y - z = 1\right\}
\]
is not a subspace of R^3.
\[
\left\{\begin{pmatrix} x & y \\ y & z \end{pmatrix} : x, y, z \in \mathbb{R}\right\} \quad \text{(symmetric matrices)}
\]
is a subspace of R^{2×2}.
(2) The null space of a matrix A ∈ R^{m×n},
\[
N(A) = \{x \in \mathbb{R}^n : Ax = \vec{0}\},
\]
is a subspace of R^n, since Ax_1 = Ax_2 = \vec{0} implies A(a_1x_1 + a_2x_2) = a_1Ax_1 + a_2Ax_2 = \vec{0}. The range space
\[
R(A) = \{y \in \mathbb{R}^m : y = Ax,\ x \in \mathbb{R}^n\}
\]
is a subspace of R^m, since a_1Ax_1 + a_2Ax_2 = A(a_1x_1 + a_2x_2) \in R(A).
(3) P_2 is a subspace of P_3.
(4) The space C^1(0, 1) of all continuously differentiable functions on (0, 1) is a subspace of the space of continuous functions C(0, 1).
(5) {f ∈ C^1(0, 1) : f'(1/2) = f(1/2)} is a subspace of C^1(0, 1).
(6) The set S of all f ∈ C^2(0, 1) such that f'' + f = 0 is a subspace of C^2(0, 1). In fact,
\[
(a_1f + a_2g)'' + (a_1f + a_2g) = a_1(f'' + f) + a_2(g'' + g) = 0,
\]
so a_1f + a_2g ∈ S for all f, g ∈ S.
Proposition If W1 and W2 are subspaces of V then so is W1 ∩ W2 .
Proof. Let u, v ∈ W1 ∩ W2 and a ∈ F . Then u + v ∈ W1 (because W1 is a subspace)
and u + v ∈ W2 (because W2 is a subspace). Hence u + v ∈ W1 ∩ W2 . Similarly, we get
av ∈ W1 ∩ W2 , so W1 ∩ W2 is a subspace of V .
Warning! It is not necessarily true that W1 ∪ W2 is a subspace, as the following example
shows.
EXAMPLE Let V = R^2, let W_1 = {(a, 0) : a ∈ R} and W_2 = {(0, b) : b ∈ R}. Then W_1, W_2 are subspaces of V, but W_1 ∪ W_2 is not a subspace, because (1, 0), (0, 1) ∈ W_1 ∪ W_2, but (1, 0) + (0, 1) = (1, 1) ∉ W_1 ∪ W_2.
Note that any subspace of V that contains W1 and W2 has to contain all vectors of the
form u + v for u ∈ W1 , v ∈ W2 . This motivates the following definition.
Definition Let W_1, W_2 be subspaces of the vector space V. Then the sum of W_1, W_2 is
\[
W_1 + W_2 = \{w_1 + w_2 : w_1 \in W_1,\ w_2 \in W_2\}.
\]
Do not confuse W1 + W2 with W1 ∪ W2 .
Proposition If W1 , W2 are subspaces of V then so is W1 + W2 . In fact, it is the smallest
subspace that contains both W1 and W2 .
Proof. Let u, v ∈ W1 + W2 . Then u = u1 + u2 for some u1 ∈ W1 , u2 ∈ W2 and v = v1 + v2 for
some v1 ∈ W1 , v2 ∈ W2 . Then u + v = (u1 + v1 ) + (u2 + v2 ) ∈ W1 + W2 . Similarly, if a ∈ F
then av = av1 + av2 ∈ W1 + W2 . Thus W1 + W2 is a subspace of V . Any subspace of V that
contains both W1 and W2 must contain W1 + W2 , so it is the smallest such subspace.
Definition A sequence of vectors (\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_k) from a vector space V is said to be linearly dependent if there exist scalars a_1, a_2, \ldots, a_k, not all zero, such that
\[
a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k = \vec{0}.
\]
Notice that if not all of the scalars are zero, then at least one is non-zero, say a_1, in which case this equation can be written in the form
\[
\vec{v}_1 = \frac{-a_2}{a_1}\vec{v}_2 + \cdots + \frac{-a_k}{a_1}\vec{v}_k.
\]
Thus, \vec{v}_1 is shown to be a linear combination of the remaining vectors.
A sequence of vectors (\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n) is said to be linearly independent if the equation a_1\vec{v}_1 + \cdots + a_n\vec{v}_n = \vec{0} can only be satisfied by a_i = 0, i = 1, \cdots, n. This implies that no vector in the sequence can be represented as a linear combination of the remaining vectors in the sequence. Even more concisely, a sequence of vectors is linearly independent if and only if \vec{0} can be represented as a linear combination of its vectors in a unique way.
An alternative definition is that a sequence of vectors is linearly dependent if and only if some vector in the sequence can be written as a linear combination of the other vectors.
Remark: (1) Let \vec{v}_i, 1 ≤ i ≤ n, be column vectors
\[
\vec{v}_i = \begin{pmatrix} v_{1,i} \\ \vdots \\ v_{m,i} \end{pmatrix} \in \mathbb{R}^m.
\]
Then a_1\vec{v}_1 + \cdots + a_n\vec{v}_n = \vec{0} is equivalent to
\[
\begin{pmatrix} v_{1,1} & \cdots & v_{1,n} \\ \vdots & & \vdots \\ v_{m,1} & \cdots & v_{m,n} \end{pmatrix}
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.
\]
(2) {\vec{v}_k} are linearly independent if and only if Ax = \vec{0} has the unique solution x = \vec{0}, i.e., N(A) = {\vec{0}} (the null space of A). Moreover, Ax = b then has at most one solution: Ax_1 = b and Ax_2 = b imply A(x_1 − x_2) = \vec{0} and thus x_1 = x_2.
(3) {\vec{v}_k} are linearly dependent if and only if Ax = \vec{0} has a nontrivial solution.
Question 1 and Objective: Identify linearly independent or dependent. How to
find N (A) and R(A).
2.3 Span
Let \vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n be vectors in a vector space V. A sum of the form
\[
a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_n\vec{v}_n,
\]
where a_1, \cdots, a_n ∈ R, is called a linear combination of \vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n. The set of all linear combinations of \vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n is called the span of \vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n, i.e.,
\[
\mathrm{span}(\vec{v}_1, \cdots, \vec{v}_n) = \{a_1\vec{v}_1 + \cdots + a_n\vec{v}_n : a_i \in \mathbb{R}\},
\]
which is a subspace of V.
Remark Let \vec{e}_i be the i-th unit vector, (\vec{e}_i)_j = 0 for j ≠ i and (\vec{e}_i)_i = 1. Then {\vec{e}_i}_{i=1}^{n} are linearly independent and R^n = span(\vec{e}_1, \vec{e}_2, \cdots, \vec{e}_n), i.e., every \vec{x} ∈ R^n is written uniquely as \vec{x} = x_1\vec{e}_1 + \cdots + x_n\vec{e}_n. In general, such a representation in terms of \vec{v}_1, \cdots, \vec{v}_n is unique if and only if {\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n} are linearly independent.
Question 2 and Objective: How to determine the span of vectors.
EXAMPLEs (1) P3 = span(1, x, x2 , x3 )
p(x) = a0 + a1 x + a2 x2 + a3 x3
(2) The vectors {1, x, x^2, x^3} are linearly independent in P_3. Suppose they were linearly dependent, so that, say,
\[
x^3 = a_0 + a_1x + a_2x^2 \quad \text{for some } a_0, a_1, a_2 \in \mathbb{R}.
\]
Taking the derivative of this three times in x, we obtain 6 = 0, which is a contradiction.
(3) The vectors {a_{11} + a_{12}x + a_{13}x^2, a_{21} + a_{22}x + a_{23}x^2, a_{31} + a_{32}x + a_{33}x^2} are linearly independent if and only if N(A) = {\vec{0}}, where
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.
\]
(4) The span of two linearly independent vectors \vec{v}_1 and \vec{v}_2 is the plane that contains them both. For \vec{v}_1 = (1, 2, 3)^t, \vec{v}_2 = (2, 4, 6)^t, the vectors are linearly dependent and the span is only the line through \vec{v}_1.
(5) If span(\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n) = R^m, then n ≥ m. Conversely, if n > m, then (\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n) are linearly dependent.
is in upper triangular form, since the matrix A has all zeros under the diagonal. Because of the strict triangular form, the system is easy to solve. It follows from the third equation
that x3 = 2. Using this value in the second equation, we obtain
x 2 − 2 = 2 ⇒ x2 = 4
Using x2 = 4, x3 = 2 in the first equation, we end up with
3x1 + 2 · 4 + 2 = 1 ⇒ x1 = −3
Thus, the solution of the system is (−3, 4, 2).
Any n×n upper triangular system can be solved in the same manner as the last example.
First, the nth equation is solved for the value of xn . This value is used in the (n−1)st equation
to solve for xn−1 . The values xn and xn−1 are used in the (n − 2)nd equation to solve for
xn−2, and so on. We will refer to this method of solving an upper triangular system as back substitution.
Remark If all diagonal entries of an upper triangular matrix A are nonzero, then Ax = b has a unique solution by back substitution; Ax = \vec{0} has the unique solution x = \vec{0}, equivalently N(A) = {\vec{0}} and {\vec{v}_1, \cdots, \vec{v}_n} are linearly independent.
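A minimal MATLAB sketch of back substitution (an addition to these notes; the example matrix is one possible triangular system consistent with the worked example above, chosen for illustration):

% Back substitution for an upper triangular system U*x = b (save as backsub.m)
function x = backsub(U, b)
  n = length(b);
  x = zeros(n, 1);
  for i = n:-1:1
    % subtract the already-computed part and divide by the diagonal entry
    x(i) = (b(i) - U(i, i+1:n)*x(i+1:n)) / U(i,i);
  end
end
% Example use: U = [3 2 1; 0 1 -1; 0 0 2]; b = [1; 2; 4]; backsub(U, b) returns (-3, 4, 2)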
Gauss-Jordan reduction transforms A into an upper triangular matrix by row operations, as below (Gauss elimination).
EXAMPLE 2 Solve the system
\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 3 \\
3x_1 - x_2 - 3x_3 &= -1 \\
2x_1 + 3x_2 + x_3 &= 4
\end{aligned}
\quad\Longleftrightarrow\quad
\begin{pmatrix} 1 & 2 & 1 \\ 3 & -1 & -3 \\ 2 & 3 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -1 \\ 4 \end{pmatrix}.
\]
Subtracting 3 times the first row from the second row yields −7x2 − 6x3 = −10.
Subtracting 2 times the first row from the third row yields −x2 − x3 = −2.
If the second and third equations of our system, respectively, are replaced by these new
equations, we obtain the equivalent system
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -7 & -6 \\ 0 & -1 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -10 \\ -2 \end{pmatrix}.
\]
If the third equation of this system is replaced by the sum of the third equation and −1/7 times the second equation, we end up with the following upper triangular system:
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -7 & -6 \\ 0 & 0 & -\tfrac{1}{7} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 3 \\ -10 \\ -\tfrac{4}{7} \end{pmatrix}.
\]
Using back substitution, we get x_3 = 4, x_2 = −2, x_1 = 3.
With each system of equations Ax = b we may associate an augmented matrix of the form
\[
[A \mid b] = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix},
\]
where we attach to the coefficient matrix A an additional column b. The system can be solved by performing operations on the augmented matrix. The x_i's are placeholders that
can be omitted until the end of the computation. Corresponding to the three operations used
to obtain equivalent systems, the following row operations may be applied to the augmented
matrix:
Elementary Row Operations
I. Interchange two rows.
II. Multiply a row by a nonzero scalar.
III. Replace a row by its sum with a multiple of another row.
Finally, the third row is used as the pivotal row to eliminate the last element in the third column:
\[
\begin{pmatrix} 1 & 1 & 1 & 1 & 6 \\ 0 & -1 & -1 & 1 & 0 \\ 0 & 0 & -3 & -2 & -13 \\ 0 & 0 & 0 & -1 & -2 \end{pmatrix}.
\]
This augmented matrix represents an upper triangular system. Solving by back substitution,
we obtain the solution (2, −1, 3, 2). In general, if an n × n linear system can be reduced to
upper triangular form, then it will have a unique solution that can be obtained by performing
back substitution on the triangular system. We can think of the reduction process as an
algorithm involving n − 1 steps. At the first step, a pivot element is chosen from among the
nonzero entries in the first column of the matrix. The row containing the pivot element is
called the pivotal row. We interchange rows (if necessary) so that the pivotal row is the new
first row. Multiples of the pivotal row are then subtracted from each of the remaining n − 1
rows so as to obtain 0s in the first entries of rows 2 through n. At the second step, a pivot
element is chosen from the nonzero entries in column 2, rows 2 through n, of the matrix.
The row containing the pivot is then interchanged with the second row of the matrix and
is used as the new pivotal row. Multiples of the pivotal row are then subtracted from the
remaining n − 2 rows so as to eliminate all entries below the pivot in the second column.
The same procedure is repeated for columns 3 through n − 1. Note that at the second step
row 1 and column 1 remain unchanged, at the third step the first two rows and first two
columns remain unchanged, and so on. At each step, the overall dimensions of the system
are effectively reduced by 1 If the elimination process can be carried out as described, we
will arrive at an equivalent strictly triangular system after n − 1 steps. The steps of Gauss
elimination is depicted by
However, the procedure will break down if, at any step, all possible choices for a pivot element equal 0. When this happens, the alternative is to reduce the system to certain special echelon, or staircase-shaped, forms. These echelon forms will be studied in the next section. They will also be used for m × n systems, where m ≠ n.
A matrix is in row echelon form if it has the shape resulting from Gaussian elimination. Specifically, a matrix is in row echelon form if
(a) all rows consisting of only zeroes are at the bottom.
(b) the leading coefficient (also called the pivot) of a nonzero row is always strictly to the
right of the leading coefficient of the row above it.
These two conditions imply
(c) all entries in a column below a leading coefficient are zeros.
Suppose we start with b = (1, −1, 1, 3, 4)^t. Then the reduction process will yield an echelon-form augmented matrix whose last column is (1, 3, 0, 0, 0)^t, and the last two equations of the reduced system will be satisfied for any 5-tuple. Thus the solution set will be the set of all 5-tuples satisfying the first three equations:
x1 + x2 + x3 + x 4 + x5 = 1
x3 + x4 + 2x5 = 0
x5 = 3.
The variables corresponding to the first nonzero elements in each row of the reduced matrix
will be referred to as lead variables. Thus x1 , x3 , and x5 are the lead variables. The remaining
variables corresponding to the columns skipped in the reduction process will be referred to
as free variables. Hence, x2 and x4 are the free variables. If we transfer the free variables
over to the right-hand side of this, we obtain the system
x1 + x3 + x5 = 1 − x2 − x4
x3 + 2x5 = −x4
x5 = 3.
This system is strictly triangular in the unknowns x_1, x_3, and x_5. Thus, for each pair of values assigned to x_2 = α and x_4 = β, there will be a unique solution, namely
\[
x_5 = 3, \qquad x_3 = -\beta - 6, \qquad x_1 = -2 - \alpha - 2\beta.
\]
For the corresponding homogeneous right-hand side one has instead
\[
x_1 + x_3 + x_5 = -x_2 - x_4, \qquad x_3 + 2x_5 = -x_4, \qquad x_5 = 0.
\]
Returning to the traffic-flow example from the Introduction, its augmented matrix is reduced to
\[
\begin{pmatrix} 1 & -1 & 0 & 0 & 160 \\ 0 & 1 & -1 & 0 & -40 \\ 0 & 0 & 1 & -1 & 210 \\ 0 & -1 & 0 & 1 & -170 \end{pmatrix}
\to
\begin{pmatrix} 1 & -1 & 0 & 0 & 160 \\ 0 & 1 & -1 & 0 & -40 \\ 0 & 0 & 1 & -1 & 210 \\ 0 & 0 & -1 & 1 & -210 \end{pmatrix}
\to
\begin{pmatrix} 1 & -1 & 0 & 0 & 160 \\ 0 & 1 & -1 & 0 & -40 \\ 0 & 0 & 1 & -1 & 210 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
The system is consistent, and since there is a free variable, there are many possible solutions.
The traffic flow diagram does not give enough information to determine x1 , x2 , x3 , and x4
uniquely. If the amount of traffic were known between any pair of intersections, the traffic
on the remaining arteries could easily be calculated. For example, if the amount of traffic
between intersections C and D averages 200 automobiles per hour, then x_4 = 200. Using this value, we can then solve for x_1, x_2, and x_3 by back substitution: x_1 = 530, x_2 = 370, x_3 = 410.
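A minimal MATLAB sketch (an addition, for illustration) reproduces this computation: it row-reduces the augmented matrix and then back-solves after fixing the free variable x_4 = 200.

% Traffic-flow system: A*x = b with one free variable
A = [1 -1 0 0; 0 1 -1 0; 0 0 1 -1; -1 0 0 1];
b = [160; -40; 210; -330];
disp(rref([A b]))        % last row is all zeros: the system is underdetermined
x4 = 200;                % fix the free variable (traffic between C and D)
x3 = 210 + x4;           % from x3 - x4 = 210
x2 = x3 - 40;            % from x2 - x3 = -40
x1 = 160 + x2;           % from x1 - x2 = 160
disp([x1 x2 x3 x4])      % 530 370 410 200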
EXAMPLE (Underdetermined) Consider a consistent system whose row echelon form (display not reproduced here) has lead variables x_1, x_4, x_5 and free variables x_2, x_3. Putting the free variables x_2, x_3 on the right-hand side, it follows that
\[
x_1 = 1 - x_2 - x_3, \qquad x_4 = 2, \qquad x_5 = -1,
\]
so with x_2 = α, x_3 = β the solution set is
\[
(1 - \alpha - \beta,\ \alpha,\ \beta,\ 2,\ -1).
\]
EXAMPLE (Overdetermined)
This can be done by multiplying A^{(n-1)} on the left by the lower triangular matrix
\[
L_n = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & 1 & & \\
 & & -l_{n+1,n} & \ddots & \\
 & & \vdots & & \\
 & & -l_{N,n} & & 1
\end{pmatrix}.
\]
We set
\[
A^{(n)} := L_n A^{(n-1)},
\]
which coincides with the n-th Gauss elimination step; the n-th step matrix A^{(n)} has its n-th column with all zeros under a_{nn}, i.e., a^{(n)}_{i,n} = 0 for n + 1 ≤ i ≤ N. After N − 1 steps, we have eliminated all the matrix elements below the main diagonal, so we obtain an upper triangular matrix U = A^{(N-1)}. We find the LU decomposition A = LU, i.e.,
\[
U = A^{(N-1)} = L_{N-1}L_{N-2}\cdots L_1 A, \qquad
L = (L_{N-1}L_{N-2}\cdots L_1)^{-1} = L_1^{-1}L_2^{-1}\cdots L_{N-1}^{-1}.
\]
Because the inverse of a lower triangular matrix Ln is again a lower triangular matrix, and
the multiplication of two lower triangular matrices is again a lower triangular matrix, it
follows that L is a lower triangular matrix. Moreover, it can be seen that
\[
L = \begin{pmatrix}
1 & & & \\
l_{2,1} & 1 & & \\
\vdots & \ddots & \ddots & \\
l_{N,1} & \cdots & l_{N,N-1} & 1
\end{pmatrix}.
\]
It is clear that in order for this algorithm to work, one needs a^{(n-1)}_{n,n} ≠ 0 at each step (see the definition of l_{i,n}). If this assumption fails at some point, one needs to interchange the n-th row with another row below it before continuing (pivoting). This is why an LU decomposition in general looks like A = PLU (P is a permutation matrix).
Remark If all diagonal entries of U are nonzero, then Ax = b for A ∈ R^{n×n} has a unique solution by back substitution, and if Ax = \vec{0}, then x = \vec{0}; equivalently N(A) = {\vec{0}} and the column vectors of A are linearly independent.
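The following MATLAB sketch (an addition; the matrix and right-hand side are arbitrary) computes PA = LU with the built-in lu and uses it to solve Ax = b by forward and back substitution.

% LU decomposition with partial pivoting: P*A = L*U
A = [2 1 1; 4 -6 0; -2 7 2];
b = [5; -2; 9];
[L, U, P] = lu(A);        % L lower triangular, U upper triangular, P permutation
y = L \ (P*b);            % forward substitution
x = U \ y;                % back substitution
disp(norm(A*x - b))       % residual, should be ~0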
2.6 Basis and dimension
LEARNING OBJECTIVES FOR THIS SECTION: Basis and dimension of subspace and
Gauss elimination. Examples including N (A), R(A) and Properties and Algorithms.
Definition (1) A basis for a subspace S is a set of linearly independent vectors whose
span is S. The number n of vectors in a basis of the finite-dimensional subspace S is called
the dimension of S and we write dim(S) = n.
(2) The column rank of a matrix A is the dimension of the column space of
\[
A = [\vec{v}_1 | \cdots | \vec{v}_k],
\]
i.e., of the space of all linear combinations s = a_1\vec{v}_1 + \cdots + a_k\vec{v}_k of the columns of A.
The dimension of the null space N(A) is called the nullity of the matrix, and for A ∈ R^{m×n} it is related to the rank of A by the following equation:
\[
\mathrm{rank}(A) + \mathrm{nullity}(A) = n.
\]
EXAMPLE Let S = {x ∈ R^4 : x_1 = 2x_2,\ x_3 = 5x_4}. Then the vectors (2, 1, 0, 0) and (0, 0, 5, 1) are a basis for S. In particular, every vector that satisfies these equations can be written uniquely as a linear combination of the two basis vectors:
\[
(2t_1, t_1, 5t_2, t_2) = t_1(2, 1, 0, 0) + t_2(0, 0, 5, 1).
\]
The subspace S is two-dimensional. Geometrically, it is the plane in R^4 passing through the points (0, 0, 0, 0), (2, 1, 0, 0), and (0, 0, 5, 1).
EXAMPLE
In many applications, it is necessary to find a particular subspace of a vector space
V = R4 . This can be done by finding a set of basis elements of the subspace. For example,
to find all solutions of the system
x1 + x2 + x3 = 0, 2x1 + x2 + x4 = 0
we must find the null space of the matrix
\[
A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 2 & 1 & 0 & 1 \end{pmatrix}
\to
\begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{pmatrix},
\]
and we have
\[
x_1 + x_2 + x_3 = 0, \qquad -x_2 - 2x_3 + x_4 = 0.
\]
We choose x_3 and x_4 as free variables and solve for x_1, x_2:
\[
x_2 = -2x_3 + x_4, \qquad x_1 = -x_2 - x_3 = x_3 - x_4.
\]
Thus, we obtain a basis of N(A),
\[
\begin{pmatrix} 1 \\ -2 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix},
\]
which corresponds to x_3 = 1, x_4 = 0 and x_3 = 0, x_4 = 1, respectively.
In general we have
Basis for a null space N (A) Recall N (A) = {x ∈ Rn : Ax = ~0} is a subspace of matrix
A ∈ Rm×n . One can use Gauss elimination to find a basis of N (A).
• Use elementary row operations to put A in reduced row echelon form.
• Using the reduced row echelon form, determine which of the variables x1 , x2 , · · · , xk
are free. Write equations for the dependent variables in terms of the free variables.
• For each free variable xi , choose a vector in the null space for which xi = 1 and the
remaining free variables are zero. The resulting collection of vectors is a basis for the
null space of A.
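This procedure can be checked in MATLAB for the 2 × 4 example above; null(A,'r') returns a "rational" basis of N(A) obtained from the reduced row echelon form in exactly this way (a short sketch added here for illustration).

% Basis of the null space N(A) via the reduced row echelon form
A = [1 1 1 0; 2 1 0 1];
R = rref(A)               % reduced row echelon form; x3 and x4 are free
N = null(A, 'r')          % rational basis: columns correspond to x3 = 1 and x4 = 1
disp(A*N)                 % both columns are mapped to the zero vector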
EXAMPLE The standard basis for R^3 is {\vec{e}_1, \vec{e}_2, \vec{e}_3}; however, there are many bases that we could choose for R^3, e.g.,
\[
\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}, \quad
\left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}2\\0\\1\end{pmatrix}\right\}, \quad
\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}2\\1\\0\end{pmatrix}, \begin{pmatrix}3\\2\\1\end{pmatrix}\right\}.
\]
Standard Bases We refer to the set {~e1 , ~e2 , ~e3 } as the standard basis for R3 . We refer to
this basis as the standard basis because it is the most natural one to use for representing
vectors in R3 . More generally, the standard basis for Rn is the set {~e1 , ~e2 , ..., ~en } since
\[
\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1\vec{e}_1 + \cdots + x_n\vec{e}_n.
\]
The most natural way to represent matrices in R^{2×2} is in terms of the standard 2 × 2 basis matrices:
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
= a_{11}\begin{pmatrix}1&0\\0&0\end{pmatrix}
+ a_{12}\begin{pmatrix}0&1\\0&0\end{pmatrix}
+ a_{21}\begin{pmatrix}0&0\\1&0\end{pmatrix}
+ a_{22}\begin{pmatrix}0&0\\0&1\end{pmatrix}.
\]
The standard way to represent a polynomial in Pn is in terms of the standard basis functions
{1, x, x2 , ..., xn }, i.e.,
\[
p(x) = a_0 + a_1x + \cdots + a_nx^n.
\]
In general, we have
Theorem 1 If {~v1 , · · · , ~vn } is a spanning set for a vector space V , then any collection of m
vectors in V , where m > n, are linearly dependent.
Proof: Let {\vec{u}_1, \vec{u}_2, ..., \vec{u}_m} be m vectors in V where m > n. Then, since {\vec{v}_1, \cdots, \vec{v}_n} span V, we have
\[
\vec{u}_i = a_{1,i}\vec{v}_1 + a_{2,i}\vec{v}_2 + \cdots + a_{n,i}\vec{v}_n.
\]
Thus,
\[
c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_m\vec{u}_m
= \sum_{i=1}^{m} c_i\Big(\sum_{j=1}^{n} a_{j,i}\vec{v}_j\Big)
= \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} a_{j,i}c_i\Big)\vec{v}_j.
\]
The equations \sum_{i=1}^{m} a_{j,i}c_i = 0, j = 1, \cdots, n, form a homogeneous system with more unknowns than equations. Therefore, the system must have a nontrivial solution (c_1, c_2, \cdots, c_m)^t, for which the linear combination above equals \vec{0}. Thus, {\vec{u}_1, \vec{u}_2, ..., \vec{u}_m} are linearly dependent.
Theorem 2 If V is a vector space of dimension n > 0, then
(I) any set of n linearly independent vectors spans V .
(II) any n vectors that span V are linearly independent.
Proof: Suppose that {\vec{v}_1, \cdots, \vec{v}_n} are linearly independent and \vec{v} is any other vector in V. Since V has dimension n, it has a basis consisting of n vectors, and these vectors span V. It follows from Theorem 1 that {\vec{v}_1, \cdots, \vec{v}_n, \vec{v}} must be linearly dependent. Thus there exist c_i ∈ R, 1 ≤ i ≤ n + 1, not all zero, such that
\[
c_1\vec{v}_1 + \cdots + c_n\vec{v}_n + c_{n+1}\vec{v} = \vec{0}.
\]
Since {\vec{v}_1, \cdots, \vec{v}_n} are linearly independent, c_{n+1} ≠ 0, and hence \vec{v} is a linear combination of \vec{v}_1, \cdots, \vec{v}_n. Thus {\vec{v}_1, \cdots, \vec{v}_n} spans V, which proves (I).
To prove (II), suppose that span(\vec{v}_1, \cdots, \vec{v}_n) = V. If {\vec{v}_1, \cdots, \vec{v}_n} were linearly dependent, then one of the \vec{v}_i's, say \vec{v}_n, could be written as a linear combination of the others, so V = span(\vec{v}_1, \cdots, \vec{v}_{n-1}) and dim(V) < n, which is a contradiction.
Theorem 3 The dimension of the sum satisfies the inequality
\[
\max(\dim W_1, \dim W_2) \le \dim(W_1 + W_2) \le \dim W_1 + \dim W_2.
\]
Here the minimum only occurs if one subspace is contained in the other, while the maximum is the most general case. The dimension of the intersection and the sum are related:
\[
\dim(W_1 + W_2) + \dim(W_1 \cap W_2) = \dim(W_1) + \dim(W_2).
\]
Sketch of proof: Let {\vec{u}_1, \cdots, \vec{u}_m} be a basis of W_1 ∩ W_2; extend it to a basis {\vec{u}_1, \cdots, \vec{u}_m, \vec{v}_1, \cdots, \vec{v}_j} of W_1 and to a basis {\vec{u}_1, \cdots, \vec{u}_m, \vec{w}_1, \cdots, \vec{w}_k} of W_2. Suppose
\[
a_1\vec{u}_1 + \cdots + a_m\vec{u}_m + b_1\vec{v}_1 + \cdots + b_j\vec{v}_j + c_1\vec{w}_1 + \cdots + c_k\vec{w}_k = \vec{0}.
\]
Then it can be rewritten as
\[
c_1\vec{w}_1 + \cdots + c_k\vec{w}_k = -(a_1\vec{u}_1 + \cdots + a_m\vec{u}_m + b_1\vec{v}_1 + \cdots + b_j\vec{v}_j),
\]
which shows that c_1\vec{w}_1 + \cdots + c_k\vec{w}_k ∈ W_1 ∩ W_2. Since {\vec{u}_1, \vec{u}_2, \cdots, \vec{u}_m} is a basis of W_1 ∩ W_2,
\[
c_1\vec{w}_1 + \cdots + c_k\vec{w}_k = d_1\vec{u}_1 + \cdots + d_m\vec{u}_m,
\]
and since {\vec{u}_1, \cdots, \vec{u}_m, \vec{w}_1, \cdots, \vec{w}_k} are linearly independent, all c's (and d's) equal zero. Then, since {\vec{u}_1, \vec{u}_2, \cdots, \vec{u}_m, \vec{v}_1, \cdots, \vec{v}_j} are linearly independent, all a's and b's equal zero. Thus {\vec{u}_1, \vec{u}_2, \cdots, \vec{u}_m, \vec{v}_1, \cdots, \vec{v}_j, \vec{w}_1, \cdots, \vec{w}_k} is a basis of W_1 + W_2, which completes the proof.
Remark If W_1 ∩ W_2 = {\vec{0}}, then dim(W_1 + W_2) = dim(W_1) + dim(W_2). Notation: in this case we write the direct sum W_1 ⊕ W_2.
Find a basis for W = span(\vec{a}_1, \cdots, \vec{a}_n). Let
\[
A = \begin{pmatrix} \vec{a}_1^t \\ \vdots \\ \vec{a}_n^t \end{pmatrix}
= \begin{pmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1} \\ \vdots & \vdots & & \vdots \\ a_{1,n} & a_{2,n} & \cdots & a_{m,n} \end{pmatrix}.
\]
Using elementary row operations, this matrix is transformed to row echelon form. Then it has the following shape:
\[
\begin{pmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\
\vdots & \vdots & & \vdots \\
c_{q,1} & c_{q,2} & \cdots & c_{q,m} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}.
\]
Then {\vec{c}_1, \cdots, \vec{c}_q} (the nonzero rows) is a basis of W and dim(W) = q.
Zassenhaus algorithm Algorithm for finding bases for intersection W1 ∩ W2 and sum
W1 + W2 . Assume
W1 = span(~a1 , · · · ~an ), W2 = span(~b1 , · · · ~bk )
subspaces of Rm and let
\[
A = \begin{pmatrix} \vec{a}_1^t \\ \vdots \\ \vec{a}_n^t \end{pmatrix}, \qquad
B = \begin{pmatrix} \vec{b}_1^t \\ \vdots \\ \vec{b}_k^t \end{pmatrix}.
\]
The algorithm creates the following block matrix of size (n + k) × (2m):
\[
\begin{pmatrix} A & A \\ B & 0 \end{pmatrix},
\]
and transforms it by row operations into row echelon form
\[
\begin{pmatrix} C & \ast \\ 0 & D \\ 0 & 0 \end{pmatrix}.
\]
Then the nonzero rows {\vec{c}_1, \cdots, \vec{c}_q} of C form a basis of W_1 + W_2, and the nonzero rows {\vec{d}_1, \cdots, \vec{d}_\ell} of D form a basis of W_1 ∩ W_2.
EXAMPLE Consider the two subspaces
\[
W_1 = \mathrm{span}\left\{\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}, \begin{pmatrix}0\\0\\1\\-1\end{pmatrix}\right\}, \qquad
W_2 = \mathrm{span}\left\{\begin{pmatrix}5\\0\\-3\\3\end{pmatrix}, \begin{pmatrix}0\\5\\-3\\-2\end{pmatrix}\right\}
\]
of the vector space R^4. Using the standard basis, we create the following matrix of dimension (2 + 2) × (2 · 4):
\[
\begin{pmatrix}
1 & -1 & 0 & 1 & 1 & -1 & 0 & 1 \\
0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 \\
5 & 0 & -3 & 3 & 0 & 0 & 0 & 0 \\
0 & 5 & -3 & -2 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
Using elementary row operations, we transform this matrix into the following matrix:
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & \ast & \ast & \ast & \ast \\
0 & 1 & 0 & -1 & \ast & \ast & \ast & \ast \\
0 & 0 & 1 & -1 & \ast & \ast & \ast & \ast \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 1
\end{pmatrix}
\]
(some entries have been replaced by ∗ because they are irrelevant to the result). Therefore,
\[
\left\{\begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\\-1\end{pmatrix}, \begin{pmatrix}0\\0\\1\\-1\end{pmatrix}\right\}
\]
is a basis of W1 + W2 and
\[
\left\{\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\right\}
\]
is a basis of W1 ∩ W2 .
MATLAB implementation
Given matrices A ∈ R^{m×n_1} and B ∈ R^{m×n_2} whose columns span W_1 and W_2, one can use the MATLAB LU decomposition:
[L,U,P] = lu([[A'; B'] [A'; 0*B']]); U
where U is the resulting upper triangular form we are looking for. Try it with
A = rand(4,2); B = [sum(A,2) rand(4,1)];
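As a cross-check of the worked example above, the same block matrix can be reduced with rref instead of lu. The sketch below is an addition; it assumes the column-vector convention for A and B used in the snippet above.

% Zassenhaus algorithm via rref for the example above
A = [1 -1 0 1; 0 0 1 -1]';         % columns span W1
B = [5 0 -3 3; 0 5 -3 -2]';        % columns span W2
M = [A' A'; B' zeros(size(B'))];   % block matrix [A^t A^t; B^t 0]
R = rref(M);
m = size(A, 1);
left  = R(:, 1:m);                 % left block
right = R(:, m+1:end);             % right block
sum_basis = left(any(left, 2), :)                        % rows span W1 + W2
int_basis = right(all(left == 0, 2) & any(right, 2), :)  % rows span W1 ∩ W2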
2.7 Inverse of matrix A
LEARNING OBJECTIVES FOR THIS SECTION: Linear equation, Inverse of matrix, Gauss
elimination. Nonsingular and Singular matrixes.
Let I = I_n ∈ R^{n×n} be the identity matrix, i.e., the diagonal matrix whose diagonal entries are all one. Then I_nA = AI_n = A for all A ∈ R^{n×n}.
Definition (Inverse of matrix A) Let A ∈ R^{n×n} be a square matrix. A matrix B ∈ R^{n×n} is an inverse of A if
\[
AB = I_n \quad \text{(the identity matrix)},
\]
and we denote B = A^{-1}, i.e.,
\[
AA^{-1} = I_n. \tag{2.2}
\]
If so, A is nonsingular.
Recall that x = A^{-1}b then solves Ax = b. In fact,
\[
Ax = A(A^{-1}b) = (AA^{-1})b = I_n b = b.
\]
Note that if \tilde{B} ∈ R^{n×n} satisfies \tilde{B}A = I, then \tilde{B} = \tilde{B}(AA^{-1}) = (\tilde{B}A)A^{-1} = A^{-1}, and thus
\[
A^{-1}A = I_n. \tag{2.3}
\]
Theorem Inverse of product AB
If A, B ∈ Rn×n are nonsingular , then
(AB)−1 = B −1 A−1
Proof:
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In .
Definition (Transpose)
The transpose of an m × n matrix A, denoted by A^t, is the n × m matrix whose (j, i)-entry is given by A_{i,j}.
In fact, since
\[
(AB)_{i,j} = \sum_k a_{i,k}b_{k,j}, \qquad (B^tA^t)_{i,j} = \sum_k b_{k,i}a_{j,k},
\]
we have
\[
((AB)^t)_{i,j} = \sum_k a_{j,k}b_{k,i} = (B^tA^t)_{i,j}.
\]
For B = A^{-1},
\[
I_n = I_n^t = (AB)^t = B^tA^t \ \Rightarrow\ (A^t)^{-1} = B^t = (A^{-1})^t \quad \text{by (2.2)--(2.3)}.
\]
How to find A^{-1} by Gauss elimination: form the attached matrix
\[
[A \mid I_n].
\]
Then apply Gauss-Jordan reduction to obtain the reduced matrix (row echelon form)
\[
[U \mid C],
\]
where U is the reduced upper triangular form of A. Then
\[
A^{-1} = U^{-1}C.
\]
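In MATLAB this Gauss-Jordan computation can be carried out by row-reducing [A | I]; when A is nonsingular the right block of rref([A I]) is A^{-1}. The sketch below is an addition and uses an arbitrary invertible example matrix.

% Inverse via Gauss-Jordan reduction of [A | I]
A = [1 0 2; -1 1 3; 0 2 -1];
n = size(A, 1);
R = rref([A eye(n)]);
Ainv = R(:, n+1:end);          % right block is the inverse when A is nonsingular
disp(norm(Ainv - inv(A)))      % should be ~0
disp(norm(A*Ainv - eye(n)))    % should be ~0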
EXAMPLE Find all values of a such that the matrix
\[
A = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & a \\ 0 & a & -1 \end{pmatrix}
\]
is invertible. Solution: Gauss elimination of A:
\[
\begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & a \\ 0 & a & -1 \end{pmatrix}
\to
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & a+2 \\ 0 & a & -1 \end{pmatrix}
\to
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & a+2 \\ 0 & 0 & -a^2 - 2a - 1 \end{pmatrix}
\]
implies that A is invertible when
\[
-a^2 - 2a - 1 = -(a + 1)^2 \ne 0 \ \Rightarrow\ a \ne -1.
\]
EXAMPLE A block matrix formula:
\[
\begin{pmatrix} A & B \\ O & C \end{pmatrix}^{-1}
= \begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix},
\]
where A ∈ R^{n×n}, B ∈ R^{n×m} and C ∈ R^{m×m}. Solution:
\[
\begin{pmatrix} A & B \\ O & C \end{pmatrix}
\begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix}
= \begin{pmatrix} AA^{-1} & -AA^{-1}BC^{-1} + BC^{-1} \\ O & CC^{-1} \end{pmatrix}
= \begin{pmatrix} I_n & O \\ O & I_m \end{pmatrix}.
\]
Equivalently, one can find a solution to
\[
Ax + By = a, \qquad Cy = b,
\]
i.e., y = C^{-1}b, x = A^{-1}(a − BC^{-1}b) by back substitution. Equivalently,
\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} A^{-1} & -A^{-1}BC^{-1} \\ O & C^{-1} \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}.
\]
Remark A is nonsingular iff the reduced triangular matrix U has nonzero diagonal entries; U^{-1}C is then computed by back substitution applied to each column vector of C.
3 Determinant and Matrix inverse
LEARNING OBJECTIVES FOR THIS CHAPTER: Determinant, Cramer's rule for the inverse of a matrix A, cofactors and minors of A, inverse matrix, properties of the determinant, and the alternative to Gauss-Jordan reduction to an upper triangular matrix.
In linear algebra, the determinant is a scalar value that can be computed from the
elements of a square matrix and encodes certain properties of the linear transformation
described by the matrix. The determinant of a matrix A is denoted det(A) or |A|. Geomet-
rically, it can be viewed as the volume scaling factor of the linear transformation described
by the matrix. This is also the signed volume of the n-dimensional parallelepiped
spanned by the column or row vectors of the matrix. The determinant is positive or
negative according to whether the linear transformation preserves or reverses the orientation
of a real vector space.
\[
|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]
\[
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= a\begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b\begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c\begin{vmatrix} d & e \\ g & h \end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh.
\]
Each determinant of a 2 × 2 matrix in this equation is called a minor of the matrix A. This
procedure can be extended to give a recursive definition for the determinant of an n × n
matrix, the Laplace expansion.
The following scheme (rule of Sarrus) for calculating the determinant of a 3×3 matrix, the
sum of the products of three diagonal north-west to south-east lines of matrix elements, minus
the sum of the products of three diagonal south-west to north-east lines of elements, when
the copies of the first two columns of the matrix are written beside it (illustration not reproduced here).
Definition (Determinant) The determinant of an n × n matrix A, denoted det(A), is a scalar associated with the matrix A that is defined inductively as
\[
\det(A) = \begin{cases} a_{11} & \text{if } n = 1, \\ a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n} & \text{if } n > 1, \end{cases}
\]
where
\[
A_{i,j} = (-1)^{i+j}\det(M_{ij}).
\]
This Laplace expansion expresses the determinant of a matrix in terms of its minors: M_{ij} is the (n−1)×(n−1) matrix that results from A by removing the i-th row and the j-th column, det(M_{ij}) is called a minor, and the expression A_{ij} = (−1)^{i+j}\det(M_{ij}) is known as a cofactor.
Equivalent Definition (Leibniz formula) The determinant of an n × n matrix A is the scalar quantity
\[
\det(A) = \sum_{\phi \in S_n} \mathrm{sign}(\phi)\, a_{1\phi(1)}a_{2\phi(2)}\cdots a_{n\phi(n)},
\]
where S_n is the set of all permutations of the indices (1, 2, \cdots, n) and sign(φ) is the sign of the permutation (reordering) φ: if φ requires s interchanges of the indices (1, 2, \cdots, n), then sign(φ) = (−1)^s. In fact, grouping the terms that contain a given entry of the first column (or row) recovers the Laplace expansion. For example,
\[
S_3 = \{(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)\},
\]
and S_n contains n! elements.
Remark (column-wise)
det(A) = a11 A11 + a21 A21 + · · · + an1 An1 .
Remark (Expansion along the k-th row (column)) For any k,
\[
\det(A) = a_{k1}A_{k1} + a_{k2}A_{k2} + \cdots + a_{kn}A_{kn},
\]
which follows by interchanging rows (columns) to bring the k-th row (column) to the top and keeping track of the resulting sign changes in the cofactors.
EXAMPLE The determinant of
\[
A = \begin{pmatrix} -2 & 2 & -3 \\ -1 & 1 & 3 \\ 2 & 0 & -1 \end{pmatrix}
\]
can be computed using the following matrices (Gauss eliminations):
\[
B = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 0 & 4.5 \\ 2 & 0 & -1 \end{pmatrix}, \quad
C = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 0 & 4.5 \\ 0 & 2 & -4 \end{pmatrix}, \quad
D = \begin{pmatrix} -2 & 2 & -3 \\ 0 & 2 & -4 \\ 0 & 0 & 4.5 \end{pmatrix}.
\]
Here, B is obtained from A by adding −1/2 times the first row to the second, so that det(A) = det(B). C is obtained from B by adding the first row to the third, so that det(C) = det(B). Finally, D is obtained from C by exchanging the second and third rows, so that det(D) = −det(C). The determinant of the (upper) triangular matrix D is the product of its entries on the main diagonal: (−2) · 2 · 4.5 = −18. Therefore, det(A) = −det(D) = +18.
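The example can be checked numerically; the sketch below (an addition) verifies det(A) = 18 directly and also through the LU factors, where P*A = L*U and det(P) = ±1 accounts for the row exchanges.

% Determinant of the example matrix via det and via LU factors
A = [-2 2 -3; -1 1 3; 2 0 -1];
disp(det(A))                    % 18
[L, U, P] = lu(A);
disp(det(P) * prod(diag(U)))    % same value: det(A) = det(P)*prod(diag(U)) since det(P) = +-1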
Remark If the row vectors of A are linearly dependent, then det(A) = 0. Conversely, the row vectors of A are linearly independent if and only if det(A) ≠ 0.
Definition A matrix A ∈ Rn×n is singular if det(A) = 0, otherwise is non-singular.
Theorem 1 If A is an n × n matrix, then det(At ) = det(A).
Theorem 2 det(cA) = cn det(A).
Theorem 3 For all elementary row operations E, det(EA) = det(E) det(A); similarly, for elementary column operations Ẽ, det(AẼ) = det(A) det(Ẽ). In particular, if U = E_k \cdots E_1 A is the upper triangular matrix obtained by Gauss elimination, then det(A) is obtained from det(U), and further column operations reduce U to its diagonal, U^t = Ẽ_1^t \cdots Ẽ_{n-1}^t\, \mathrm{diag}(U), so that det(U) is the product of the diagonal entries of U.
EXAMPLE Expanding by the third row, we get
\[
\det(A) = -2 \times \det\begin{pmatrix} 9 & 0 & 6 \\ 1 & 2 & -3 \\ -1 & 0 & 2 \end{pmatrix},
\]
and, expanding this by the second column,
\[
\det(A) = -2 \times 2 \times \det\begin{pmatrix} 9 & 6 \\ -1 & 2 \end{pmatrix} = -96.
\]
Cramer's rule. The solution of Ax = b, i.e., of
\[
b = x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n
\]
(where the \vec{a}_j are the columns of A), is given by x_j = det(A_j)/det(A), where A_j is the matrix obtained from A by replacing its j-th column by b. Or, equivalently,
\[
A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A),
\]
where the adjugate matrix adj(A) is the transpose of the matrix of the cofactors, that is, adj(A)_{ji} = A_{ij}. In fact, the i-th column \vec{x}_i of A^{-1} equals A^{-1}\vec{e}_i where \vec{e}_i is the i-th unit vector. By Cramer's rule, the j-th coordinate of \vec{x}_i is given by
\[
(A^{-1})_{ji} = \frac{\det(A_j)}{\det(A)} = \frac{(-1)^{i+j}\det(M_{ij})}{\det(A)} = \frac{A_{ij}}{\det(A)},
\]
where here A_j is A with its j-th column replaced by \vec{e}_i.
The rule for the 3 × 3 case:
\[
\begin{aligned}
a_1x + b_1y + c_1z &= d_1 \\
a_2x + b_2y + c_2z &= d_2 \\
a_3x + b_3y + c_3z &= d_3.
\end{aligned}
\]
Then the values of x, y and z can be found as follows:
\[
x = \frac{\begin{vmatrix} d_1 & b_1 & c_1 \\ d_2 & b_2 & c_2 \\ d_3 & b_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \qquad
y = \frac{\begin{vmatrix} a_1 & d_1 & c_1 \\ a_2 & d_2 & c_2 \\ a_3 & d_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \qquad
z = \frac{\begin{vmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}.
\]
EXAMPLE
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
Suppose det(A) = ad − bc ≠ 0. Then
\[
A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
EXAMPLE
\[
\det\begin{pmatrix} A & B \\ O & C \end{pmatrix} = \det(A)\det(C),
\]
where A ∈ R^{n×n}, B ∈ R^{n×m} and C ∈ R^{m×m}. Solution: Apply Gauss elimination to A and C to obtain an upper triangular matrix
\[
\begin{pmatrix} U_1 & \tilde{B} \\ O & U_2 \end{pmatrix}.
\]
4 Linear Transform
LEARNING OBJECTIVES FOR THIS CHAPTER Fundamental Theorem of Linear Maps
Matrix representation and Change of basis and Similarity transform, Inverse map, Injective
and Surjective map.
Let V and W be vector spaces with scalars coming from the same field F. A mapping T : V → W is a linear transformation if for any two vectors x_1 and x_2 in V and any scalars a_1, a_2 ∈ F,
\[
T(a_1x_1 + a_2x_2) = a_1T(x_1) + a_2T(x_2).
\]
Definition (Composition of linear transformations) Let T_1 ∈ L(V, W) and T_2 ∈ L(W, U). We define a transformation T_2T_1 : V → U by (T_2T_1)(u) = T_2(T_1(u)) for u ∈ V. In particular, we define T^2 = TT and T^{i+1} = T^iT for i ≥ 2.
EXAMPLE (Matrix) V = Rn and W = Rm and T (x) = Ax for A ∈ Rm×n .
EXAMPLE (Derivative) T_1 = \frac{d}{dx} = D, the derivative, with V = C^1(a, b) and W = C(a, b):
\[
\frac{d}{dx}(a_1f_1 + a_2f_2) = a_1\frac{d}{dx}f_1 + a_2\frac{d}{dx}f_2.
\]
EXAMPLE (Integration) T_2f = \int_0^x f\,dx, the integral, with V = C(a, b) and W = C^1(a, b):
\[
\int_0^x (a_1f_1 + a_2f_2)\,dx = a_1\int_0^x f_1\,dx + a_2\int_0^x f_2\,dx.
\]
Since \frac{d}{dx}\left(\int_0^x f\,dx\right) = f(x), we have T_1T_2 = DT_2 = I, the identity map on C(a, b).
EXAMPLE (Composite of Derivative and Multiplication) (Tf)(x) = x\frac{d}{dx}f, with V = C^1(a, b) and W = C(a, b).
EXAMPLE (Shift) Let V = C(R), the space of continuous functions. Every α ∈ R gives rise to two linear maps, the shift S_α : V → V, S_α(f) = f(x − α), and the evaluation E_α : V → R, E_α(f) = f(α).
Isomorphism identifying V, with dim(V) = n, with R^n. Assume dim(V) = n and {\vec{v}_1, \cdots, \vec{v}_n} is a basis, i.e., every vector \vec{v} ∈ V is uniquely represented by
\[
\vec{v} = a_1\vec{v}_1 + \cdots + a_n\vec{v}_n.
\]
That is, \vec{v} ∈ V corresponds to exactly one column vector (a_1, \cdots, a_n)^t in R^n, and vice versa. For all intents and purposes, we have just identified the vector space V with the more familiar space R^n.
EXAMPLE {1, x, x^2} is the standard basis of P_2:
\[
V = P_2: \quad a + bx + cx^2 \ \to\ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3,
\]
since {1, x, x^2} are linearly independent and span P_2.
Corollary If T ∈ L(R^n, R^m), then T(x) = Ax, where the j-th column vector \vec{a}_j of A is given by \vec{a}_j = T(\vec{e}_j), j = 1, \cdots, n.
EXAMPLE Consider the linear transformation D : P_2 → P_1 that sends f to \frac{df}{dx}. Then the matrix representation A of D, with V = P_2 and W = P_1 and the standard bases, is given by
\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \in \mathbb{R}^{2\times 3}, \qquad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= \begin{pmatrix} b \\ 2c \end{pmatrix}.
\]
This represents the fact that \frac{d}{dx}(a + bx + cx^2) = b + 2cx.
EXAMPLE Consider the integral map T_2 : P_2 → P_3 that sends f to \int_0^x f\,dx. Then the matrix representation A of T_2, with V = P_2 and W = P_3 and the standard bases, is given by
\[
A = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{pmatrix} \in \mathbb{R}^{4\times 3}, \qquad
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{3} \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= \begin{pmatrix} 0 \\ a \\ \tfrac{1}{2}b \\ \tfrac{1}{3}c \end{pmatrix}.
\]
This represents the fact that \int_0^x (a + bx + cx^2)\,dx = ax + \tfrac{1}{2}bx^2 + \tfrac{1}{3}cx^3.
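These matrix representations can be checked numerically. The sketch below (an addition) builds the matrices of the derivative and integration maps with respect to the standard monomial bases and verifies that differentiation undoes integration on P_2.

% Matrix of d/dx : P2 -> P1 and of integration P2 -> P3 (standard bases)
D  = [0 1 0; 0 0 2];                    % (a + b x + c x^2)' = b + 2 c x
T2 = [0 0 0; 1 0 0; 0 1/2 0; 0 0 1/3];  % integral from 0 to x
p  = [1; -2; 3];                        % coefficients of 1 - 2x + 3x^2
q  = T2 * p;                            % coefficients of its antiderivative in P3
D3 = [0 1 0 0; 0 0 2 0; 0 0 0 3];       % d/dx : P3 -> P2
disp(D3 * q - p)                        % zero vector: differentiation undoes integration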
EXAMPLE Consider the map T : P_3 → P_3 that sends f to x\frac{df}{dx}. Then the matrix representation A of T, with V = W = P_3 and the standard basis {1, x, x^2, x^3}, is given by
\[
A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \in \mathbb{R}^{4\times 4}, \qquad
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
= \begin{pmatrix} 0 \\ b \\ 2c \\ 3d \end{pmatrix}.
\]
This represents the fact that x\frac{d}{dx}(a + bx + cx^2 + dx^3) = bx + 2cx^2 + 3dx^3.
EXAMPLE T : R^2 → R^2, T = R_θ is a rotation by θ anti-clockwise about the origin. Since T(1, 0) = (cos θ, sin θ) and T(0, 1) = (−sin θ, cos θ),
\[
T\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= \alpha T\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta T\begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} \alpha\cos\theta - \beta\sin\theta \\ \alpha\sin\theta + \beta\cos\theta \end{pmatrix},
\]
so the matrix using the standard bases is
\[
A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
Now clearly R_θ followed by R_φ is equal to R_{θ+φ}. Thus
\[
R_\varphi R_\theta
= \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix} \cos\varphi\cos\theta - \sin\varphi\sin\theta & -\cos\varphi\sin\theta - \sin\varphi\cos\theta \\ \sin\varphi\cos\theta + \cos\varphi\sin\theta & -\sin\varphi\sin\theta + \cos\varphi\cos\theta \end{pmatrix}
= \begin{pmatrix} \cos(\varphi+\theta) & -\sin(\varphi+\theta) \\ \sin(\varphi+\theta) & \cos(\varphi+\theta) \end{pmatrix}
= R_{\varphi+\theta},
\]
which derives the addition formulae for sin and cos.
EXAMPLE T : P_2 → R^2, T is the evaluation map Tf = (f(0), f(1))^t ∈ R^2. Then, with the standard basis {1, x, x^2} of P_2,
\[
T(a + bx + cx^2) = (a, a + b + c)^t \ \Rightarrow\ A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
\]
For example, the integration map T_2 above is injective (one-to-one):
\[
T_2f = T_2g \ \Rightarrow\ f = DT_2f = DT_2g = g.
\]
Definition T is surjective (onto) if R(T) = W, i.e., for all w ∈ W there exists v ∈ V such that w = Tv.
EXAMPLE The differentiation map D : P5 → P5 is not surjective, because the polynomial
x5 is not in the range of D. However, the differentiation map D : P5 → P4 is surjective.
Theorem A linear map T is invertible if and only if it is injective and surjective.
Proof Suppose T is injective and surjective. We want to prove that T is invertible. For each
w ∈ W , define Sw to be the unique element of V such that T Sw = w (the existence and
uniqueness of such an element follow from the surjectivity and injectivity of T). Clearly TS equals the identity map on W. To prove that ST equals the identity map on V, let v ∈ V.
Then
T (ST v) = (T S)(T v) = I(T v) = T v.
This equation implies that ST v = v (because T is injective). Thus ST equals the identity
map on V . To complete the proof, we need to show that S is linear. To do this, suppose
w1 , w2 ∈ W . Then
T (Sw1 + Sw2 ) = T (Sw1 ) + T (Sw2 ) = w1 + w2
Thus, Sw_1 + Sw_2 is the unique element of V that T maps to w_1 + w_2. By the definition of S, this implies that S(w_1 + w_2) = Sw_1 + Sw_2. Hence S satisfies the additive property. Also, if
w ∈ W and c ∈ R
T (cSw) = cT (Sw) = c w.
Thus, S(cw) = cSw. Hence S is linear.
Then,
T (c1~v1 + · · · + cn~vn ) = 0
and c1~v1 + · · · cn~vn ∈ N (T ). Since {~u1 , · · · , ~um } be a basis of N (T )
y = P x ⇔ x = P −1 y.
A transformation
A → P −1 AP
is called a similarity transformation or conjugation of the matrix A. In the general linear
group, similarity is therefore the same as conjugacy, and similar matrices are also called
conjugate.
Theorem (Change of Basis) Let E = {\vec{v}_1, \cdots, \vec{v}_n} and F = {\vec{w}_1, ..., \vec{w}_n} be two ordered bases for a vector space V, and let T : V → V be a linear operator. Let P be the transition matrix representing the change from F to E. If A is the matrix representing T with respect to E, and B is the matrix representing T with respect to F, then B = P^{-1}AP. (A : E → E and B : F → F are the matrix representations of T : V → V.)
Proof: Let \vec{w} be any vector in V and let x be its coordinate vector with respect to F, i.e.,
\[
\vec{w} = x_1\vec{w}_1 + \cdots + x_n\vec{w}_n.
\]
Then y = Px is the coordinate vector of \vec{w} with respect to E, i.e.,
\[
\vec{w} = y_1\vec{v}_1 + \cdots + y_n\vec{v}_n.
\]
Moreover, t = Ay is the coordinate vector of T\vec{w} with respect to E, and z = P^{-1}t is its coordinate vector with respect to F, so that
\[
P^{-1}APx = P^{-1}Ay = P^{-1}t = z = Bx,
\]
which implies
\[
P^{-1}APx = Bx \quad \text{for all } x \in \mathbb{R}^n.
\]
EXAMPLE Let E = {1, x, x^2} be the standard basis of P_2. Another basis for P_2 is F = {x + 1, x − 1, 2x^2}. The transition matrix from F to E is
\[
\begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix},
\]
and thus the transition matrix P from E to F is
\[
P = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}^{-1}
= \begin{pmatrix} 1/2 & 1/2 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{pmatrix}.
\]
Consider the element f = a + bx + cx^2 ∈ P_2. This represents the fact that f can also be written as \frac{a+b}{2}(x + 1) + \frac{b-a}{2}(x - 1) + \frac{c}{2}(2x^2).
When defining a linear transformation, it can be the case that a change of basis results in a simpler form of the same transformation. For example, the matrix representing a rotation in R^3 when the axis of rotation is not aligned with a coordinate axis can be complicated to compute. If the axis of rotation were aligned with the positive z-axis, then it would simply be
\[
\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
where θ is the angle of rotation.
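A hedged MATLAB sketch of this idea (an addition; the axis and angle are arbitrary choices, and the sense of the rotation depends on the orientation of the chosen basis): build the simple rotation about the z-axis and conjugate it by an orthogonal change of basis whose third column is the desired axis.

% Rotation about an arbitrary axis via a change of basis (similarity transform)
theta = pi/5;
Rz = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
a  = [1; 2; 2] / 3;          % unit axis of rotation (arbitrary choice)
N  = null(a');               % 3x2 orthonormal basis of the plane orthogonal to a
P  = [N a];                  % orthogonal change-of-basis matrix, third column = axis
R  = P * Rz * P';            % similar matrix: a rotation by theta about the axis a
disp(R*a - a)                % the axis is fixed (eigenvector with eigenvalue 1)
disp(norm(R'*R - eye(3)))    % R is orthogonal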
5 Eigenvalues
LEARNING OBJECTIVES FOR THIS CHAPTER: invariant subspaces, eigenvalues, eigenvectors, and eigenspaces, diagonalization and Jordan form, solution of linear ordinary differential equations, Markov chain transition matrices.
Definition (Invariant subspace) Suppose T ∈ L(V, V ). A subspace U of V is called
invariant under T if u ∈ U implies T u ∈ U .
The null space and range space of a linear transformation, are prominent examples of
invariant subspaces. More importantly, a specific case of the invariant subspace is as follows.
An eigenvalue λ ∈ C of an n × n matrix A satisfies, equivalently,
(a) A\vec{v} = λ\vec{v} for some nonzero vector \vec{v} (an eigenvector);
(b) (A − λI)\vec{v} = \vec{0} has a nontrivial solution;
(c) N(A − λI) ≠ {\vec{0}};
(d) A − λI is singular;
(e) det(A − λI) = 0.
Thus, λ satisfies the characteristic equation
\[
\chi(\lambda) = \det(\lambda I - A) = 0,
\]
and there exist n eigenvalues {λ_i} (counting algebraic multiplicities) of A. Complex eigenvalues λ of A ∈ R^{n×n} appear in complex conjugate pairs λ = α ± iβ. Thus,
\[
\chi(\lambda) = (\lambda - \lambda_1) \times \cdots \times (\lambda - \lambda_n).
\]
EXAMPLE Consider A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ R^{2×2}. Then
\[
\det(A - \lambda I) = \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix}
= (a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + ad - bc = 0.
\]
Theorem Let A and B be n × n matrices. If B is similar to A, then the two matrices have
the same characteristic polynomial and, consequently, the same eigenvalues.
Proof: B = P^{-1}AP and
\[
\det(B - \lambda I) = \det(P^{-1}(A - \lambda I)P) = \det(P^{-1})\det(A - \lambda I)\det(P) = \det(A - \lambda I).
\]
If A is diagonalizable, i.e.,
\[
P^{-1}AP = \Lambda = \mathrm{diag}(\lambda_1, \cdots, \lambda_n),
\]
then the λ_i are the n eigenvalues of A and each column vector \vec{v}_i of P is an eigenvector corresponding to λ_i, i.e., A\vec{v}_i = λ_i\vec{v}_i. In this case, even if A has a repeated eigenvalue λ with algebraic multiplicity r > 1, A has r linearly independent eigenvectors corresponding to λ.
Theorem If the eigenvalues {λ_i} of A are distinct, then the corresponding eigenvectors {\vec{v}_1, \cdots, \vec{v}_n} are linearly independent and A is diagonalizable.
Proof We prove this by induction on r. It is true for r = 1, because eigenvectors are non-zero
by definition. For r > 1, suppose that for some a1 , · · · , ar we assume
a1~v1 + a2~v2 + · · · + ar~vr = 0.
Then, applying A to this equation gives
a1 λ1~v1 + · · · + ar λr~vr = 0.
Now, subtracting λ1 times the first equation from the second gives
a2 (λ2 − λ1 )~v2 + · · · + ar (λr − λ1 )~vr = 0.
By the inductive hypothesis, {~v2 , ..., ~vr } are linearly independent, so ak (λk − λ1 ) = 0 and
thus ak = 0, k > 1 and also a1 = 0. Thus, {~v1 , ..., ~vr } are linearly independent.
If we let P be the matrix whose column vectors are the eigenvectors {\vec{v}_i},
\[
P = [\vec{v}_1 | \cdots | \vec{v}_n],
\]
which are linearly independent, then A\vec{v}_i = λ_i\vec{v}_i, 1 ≤ i ≤ n, is written as a matrix identity
\[
AP = P\Lambda, \quad \Lambda = \mathrm{diag}(\lambda_1, \cdots, \lambda_n) \quad\Longleftrightarrow\quad P^{-1}AP = \Lambda.
\]
That is, A is similar to a diagonal matrix Λ.
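In MATLAB, eig returns the eigenvector matrix P and the diagonal Λ directly; the sketch below (an addition, using the matrix of EXAMPLE 1 further below) verifies AP = PΛ and A = PΛP^{-1}.

% Diagonalization A = P*Lambda*inv(P) with eig
A = [-8 -5; 10 7];
[P, Lambda] = eig(A);          % columns of P are eigenvectors, Lambda is diagonal
disp(diag(Lambda)')            % eigenvalues (here 2 and -3)
disp(norm(A*P - P*Lambda))     % ~0: A*v_i = lambda_i*v_i columnwise
disp(norm(A - P*Lambda/P))     % ~0: A = P*Lambda*P^{-1}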
Remark (1) A is diagonalizable does not mean A has distinct eigenvalues. For example,
A = I2 has a repeated eigenvalue λ = 1. But A is diagonal.
(2) In general, A ∈ R^{n×n} need not be diagonalizable. Consider A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. The characteristic polynomial is χ(λ) = (λ − 1)^2, so there is a repeated eigenvalue λ = 1. The eigenvector equation
\[
(A - I)\vec{v} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\vec{v} = 0
\]
has the single solution \vec{v}_1 = c\begin{pmatrix} 1 \\ 0 \end{pmatrix}, where c is arbitrary. That is, A is not diagonalizable. To proceed, we will introduce generalized eigenvectors so that one can complete the similarity P to a Jordan canonical form.
(3) Real 2 × 2 canonical form for the complex conjugate eigenvalue case. Assume A ∈ R^{2×2} has a complex conjugate pair of eigenvalues λ = a ± ib. Let \vec{v} = \vec{v}_1 + i\vec{v}_2 be a corresponding eigenvector,
\[
A\vec{v} = A\vec{v}_1 + iA\vec{v}_2 = \lambda\vec{v} = a\vec{v}_1 - b\vec{v}_2 + i(b\vec{v}_1 + a\vec{v}_2),
\]
and thus, equating real and imaginary parts,
\[
A\vec{v}_1 = a\vec{v}_1 - b\vec{v}_2, \qquad A\vec{v}_2 = b\vec{v}_1 + a\vec{v}_2.
\]
Equivalently, if we let P = [\vec{v}_1, \vec{v}_2], we have
\[
P^{-1}AP = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} = \text{real canonical form}.
\]
EXAMPLE Consider A = \begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix}. Then we have
\[
(1 - \lambda)(3 - \lambda) + 2 = \lambda^2 - 4\lambda + 5 = 0 \ \Rightarrow\ \lambda = 2 \pm i,
\]
and
\[
\begin{pmatrix} 1 - (2 \pm i) & 1 \\ -2 & 3 - (2 \pm i) \end{pmatrix}\vec{v} = 0
\ \Rightarrow\ \vec{v} = c\begin{pmatrix} 1 \\ 1 \pm i \end{pmatrix}
\quad\text{and}\quad P = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
\]
Consider the linear ODE \frac{d}{dt}x(t) = Ax(t), x(0) = x_0. If A\vec{v} = λ\vec{v}, then x(t) = ce^{λt}\vec{v} is a solution, since
\[
\frac{d}{dt}x(t) = c\lambda e^{\lambda t}\vec{v} = Ax(t).
\]
If A is diagonalizable, then the eigenvectors {\vec{p}_1, \cdots, \vec{p}_n} are linearly independent and there exist unique (c_1, \cdots, c_n) such that
\[
x_0 = c_1\vec{p}_1 + \cdots + c_n\vec{p}_n,
\]
and thus
\[
x(t) = c_1e^{\lambda_1 t}\vec{p}_1 + \cdots + c_ne^{\lambda_n t}\vec{p}_n \quad \text{(Superposition Principle)}.
\]
Equivalently, x(t) = Pe^{\Lambda t}P^{-1}x_0, where P defines a change of basis of R^n.
EXAMPLE 1 Consider the 2 × 2 system
\[
\frac{d}{dt}\vec{x}(t) = \begin{pmatrix} -8 & -5 \\ 10 & 7 \end{pmatrix}\vec{x}(t).
\]
EXAMPLE 2 Consider
\[
\frac{d}{dt}x(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}x(t).
\]
The characteristic polynomial is χ(λ) = (λ − 1)2 , so there is a repeated eigenvalue λ = 1.
The eigenvector equation
\[
(A - I)\vec{v} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\vec{v} = 0
\]
has the single solution \vec{v}_1 = c\begin{pmatrix} 1 \\ 0 \end{pmatrix}, where c is arbitrary. One needs to find a second one. Using
this eigenvector, we compute the generalized eigenvector \vec{v}_2 by solving
\[
(A - \lambda I)\vec{v}_2 = \vec{v}_1.
\]
This gives
\[
\vec{v}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]
Thus, if P = [\vec{v}_1 | \vec{v}_2], then
\[
AP = P\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}.
\]
Note that
\[
\vec{v}_1 = (A - \lambda I)\vec{v}_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]
and
\[
(A - \lambda I)^2\vec{v}_2 = (A - \lambda I)\vec{v}_1 = \vec{0}, \qquad (A - \lambda I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
Also, we have
\[
x(t) = c_1e^t\begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_2e^t\Big(t\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\Big),
\]
since \frac{d}{dt}(te^t) = te^t + e^t.
EXAMPLE 3 This example is more complex than Example 1. Consider the lower triangular matrix
\[
A = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 \\
6 & 3 & 2 & 0 & 0 \\
10 & 6 & 3 & 2 & 0 \\
15 & 10 & 6 & 3 & 2
\end{pmatrix}.
\]
Here x_1 is an ordinary eigenvector associated with λ_1 = 1, and x_2 is a generalized eigenvector associated with λ_1 = 1; y_1 is an ordinary eigenvector associated with λ_2 = 2, and y_2, y_3 are generalized eigenvectors associated with λ_2:
\[
(A - 1I)x_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 3 & 0 & 0 & 0 & 0 \\ 6 & 3 & 1 & 0 & 0 \\ 10 & 6 & 3 & 1 & 0 \\ 15 & 10 & 6 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix} = \vec{0}, \qquad
(A - 1I)x_2 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 3 & 0 & 0 & 0 & 0 \\ 6 & 3 & 1 & 0 & 0 \\ 10 & 6 & 3 & 1 & 0 \\ 15 & 10 & 6 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ -15 \\ 30 \\ -1 \\ -45 \end{pmatrix}
= \begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix} = x_1,
\]
\[
(A - 2I)y_1 = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 \\ 3 & -1 & 0 & 0 & 0 \\ 6 & 3 & 0 & 0 & 0 \\ 10 & 6 & 3 & 0 & 0 \\ 15 & 10 & 6 & 3 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix} = \vec{0}, \qquad
(A - 2I)y_2 = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 \\ 3 & -1 & 0 & 0 & 0 \\ 6 & 3 & 0 & 0 & 0 \\ 10 & 6 & 3 & 0 & 0 \\ 15 & 10 & 6 & 3 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix} = y_1,
\]
\[
(A - 2I)y_3 = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 \\ 3 & -1 & 0 & 0 & 0 \\ 6 & 3 & 0 & 0 & 0 \\ 10 & 6 & 3 & 0 & 0 \\ 15 & 10 & 6 & 3 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix} = y_2.
\]
This results in a basis for each of the generalized eigenspaces of A. Together the two chains of generalized eigenvectors span the space of all 5-dimensional column vectors:
\[
\{x_1, x_2\} = \left\{\begin{pmatrix} 0 \\ 3 \\ -9 \\ 9 \\ -3 \end{pmatrix}, \begin{pmatrix} 1 \\ -15 \\ 30 \\ -1 \\ -45 \end{pmatrix}\right\}, \qquad
\{y_1, y_2, y_3\} = \left\{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 9 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 3 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 0 \end{pmatrix}\right\}.
\]
The Jordan form is
\[
J = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 \\
0 & 0 & 0 & 2 & 1 \\
0 & 0 & 0 & 0 & 2
\end{pmatrix},
\]
where P = [x_1 | x_2 | y_1 | y_2 | y_3] is a generalized eigenvector matrix for A, the columns of P are a canonical basis for A, and AP = PJ.
In general, A is diagonalizable if and only if the sum of the dimensions of the eigenspaces is n, or, equivalently, if and only if A has n linearly independent eigenvectors. Not all matrices are diagonalizable; matrices that are not diagonalizable are called defective matrices. In addition to the above examples, consider the following matrix:
\[
A = \begin{pmatrix}
5 & 4 & 2 & 1 \\
0 & 1 & -1 & -1 \\
-1 & -1 & 3 & 0 \\
1 & 1 & -1 & 2
\end{pmatrix}.
\]
Including multiplicity, the eigenvalues of A are λ = 1, 2, 4, 4. The dimension of the eigenspace corresponding to the eigenvalue λ = 4 is 1 (and not 2), so A is not diagonalizable. However, there is an invertible matrix P such that J = P^{-1}AP, where
\[
J = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 4 & 1 \\
0 & 0 & 0 & 4
\end{pmatrix}.
\]
The matrix J is almost diagonal. This is the Jordan normal form of A. For i = 1, 2, 3 there exists an eigenvector p_i ∈ N(λ_iI − A). For the repeated (algebraic) eigenvalue λ_3 = λ_4 = 4, (4I − A) does not have two independent eigenvectors. But there exists p_4 ∈ N((λ_4I − A)^2) satisfying
\[
(\lambda_4 I - A)p_4 = p_3,
\]
where p_3 is an eigenvector of A corresponding to λ_3 = λ_4 = 4. p_4 is called a generalized eigenvector of A.
Exercise Find the eigenvalues and eigenvectors of A = \begin{pmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{pmatrix}.
Solution:
\[
\det\begin{pmatrix} 2-\lambda & -3 & 1 \\ 1 & -2-\lambda & 1 \\ 1 & -3 & 2-\lambda \end{pmatrix}
= (2-\lambda)((-2-\lambda)(2-\lambda)+3) + 3(2-\lambda) - 3 - 3 - (-2-\lambda)
= -\lambda(\lambda^2 - 2\lambda + 1) = 0.
\]
Thus, the eigenvalues are λ_1 = 0, λ_2 = λ_3 = 1. For λ = 0,
\[
\begin{pmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{pmatrix}\vec{v}_1 = 0
\ \Rightarrow\ \vec{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]
For λ = 1,
\[
\begin{pmatrix} 1 & -3 & 1 \\ 1 & -3 & 1 \\ 1 & -3 & 1 \end{pmatrix}\vec{v} = 0
\ \Leftrightarrow\
\begin{pmatrix} 1 & -3 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\vec{v} = 0,
\]
which implies that \vec{v}_2 = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}, \vec{v}_3 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}. Thus, A is diagonalizable.
For a symmetric matrix A = A^t there is an orthogonal matrix U of eigenvectors with
\[
AU = U\Lambda \ \Leftrightarrow\ A = U\Lambda U^t.
\]
Indeed, consider maximizing f(x) = (Ax, x) over |x| = 1. By the Weierstrass theorem there exists a maximizer u, and by the Lagrange multiplier theorem L(x, λ) = f(x) + λ(1 − |x|^2) satisfies
\[
\tfrac{1}{2}L_x(u, \lambda) = Au - \lambda u = 0 \quad \text{and} \quad |u|^2 = 1.
\]
Therefore Au = λ u and |u| = 1. For every unit length eigenvector u of A its eigenvalue is
f (u), so λ is the largest eigenvalue of A. The same calculation performed on the orthogonal
complement of u, i.e., {x ∈ R^n : (x, u) = 0}, gives the next largest eigenvalue of A, and so on.
That is, we obtain eigen pairs (λi , ui ) such that λ1 ≥ λ2 ≥ · · · ≥ λn and {ui } is orthonormal
i.e. (ui , uj ) = δi,j .
Theorem (Jordan form A = P JP −1 ) Given an eigenvalue λ, its corresponding Jordan
block gives rise to a Jordan chain. The generator, or lead vector, say pr , of the chain is a
generalized eigenvector such that (A − λ I)r pr = 0, where r is the size of the Jordan block.
The vector p1 = (A − λI)r−1 pr is an eigenvector corresponding to λ. In general, pi is a
preimage of pi−1 under A − λI, i.e., (A − λ I)pi = pi−1 . So the lead vector generates the
chain via multiplication by (A − λ I). Thus, AP = P Ji for each Jordan chain. Therefore, the
statement that every square matrix A can be put in Jordan normal form is equivalent to the
claim that there exists a basis consisting only of eigenvectors and generalized eigenvectors
of A.
The matrix exponential is defined by e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}, and
\[
x(t) = e^{At}x(0)
\]
defines the solution to the differential equation \frac{d}{dt}x(t) = Ax(t). If A = P^{-1}BP, then
\[
e^{At} = P^{-1}e^{Bt}P.
\]
Moreover, let f(z) be an analytic function of a complex argument. Applying the function to an n × n Jordan block J with eigenvalue λ results in an upper triangular matrix:
\[
f(J) = \begin{pmatrix}
f(\lambda) & f'(\lambda) & \frac{f''(\lambda)}{2} & \cdots & \frac{f^{(n-1)}(\lambda)}{(n-1)!} \\
0 & f(\lambda) & f'(\lambda) & \cdots & \frac{f^{(n-2)}(\lambda)}{(n-2)!} \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & f(\lambda) & f'(\lambda) \\
0 & 0 & \cdots & 0 & f(\lambda)
\end{pmatrix},
\]
so that the elements of the k-th super-diagonal of the resulting matrix are \frac{f^{(k)}(\lambda)}{k!}. For a matrix of general Jordan normal form the above expression is applied to each Jordan block. The following example shows the application to the power function f(z) = z^n:
\[
\begin{pmatrix}
\lambda_1 & 1 & 0 & 0 & 0 \\
0 & \lambda_1 & 1 & 0 & 0 \\
0 & 0 & \lambda_1 & 0 & 0 \\
0 & 0 & 0 & \lambda_2 & 1 \\
0 & 0 & 0 & 0 & \lambda_2
\end{pmatrix}^{n}
=
\begin{pmatrix}
\lambda_1^n & \binom{n}{1}\lambda_1^{n-1} & \binom{n}{2}\lambda_1^{n-2} & 0 & 0 \\
0 & \lambda_1^n & \binom{n}{1}\lambda_1^{n-1} & 0 & 0 \\
0 & 0 & \lambda_1^n & 0 & 0 \\
0 & 0 & 0 & \lambda_2^n & \binom{n}{1}\lambda_2^{n-1} \\
0 & 0 & 0 & 0 & \lambda_2^n
\end{pmatrix},
\]
where the binomial coefficients are defined as \binom{n}{k} = \prod_{i=1}^{k}\frac{n+1-i}{i}. For positive integer n this reduces to the standard definition of the coefficients. For negative n the identity \binom{-n}{k} = (-1)^k\binom{n+k-1}{k} may be of use.
Real Jordan Form decomposition A = PJP^{-1}. The real Jordan block is given by
\[
J_i = \begin{pmatrix}
C_i & I & & \\
 & C_i & \ddots & \\
 & & \ddots & I \\
 & & & C_i
\end{pmatrix},
\]
where, for a non-real eigenvalue a_i + ib_i with given algebraic multiplicity, C_i is the 2 × 2 matrix
\[
C_i = \begin{pmatrix} a_i & -b_i \\ b_i & a_i \end{pmatrix}.
\]
This real Jordan form is a consequence of the complex Jordan form. For a real matrix the
nonreal eigenvectors and generalized eigenvectors can always be chosen to form complex
conjugate pairs. Taking the real and imaginary part (linear combination of the vector and
its conjugate), the matrix has this form with respect to the new basis.
Real Schur decomposition For A ∈ Rn×n one can always write A = U SU t where U ∈ Rn×n
is a real orthogonal matrix, U t U = In , S is a block upper triangular matrix called the real
Schur form. The blocks on the diagonal of S are of size 1 × 1 (in which case they represent
real eigenvalues) or 2 × 2 (in which case they are derived from complex conjugate eigenvalue
pairs). QR-algorithm is used to obtain S and U .
Basic QR algorithm. Let A_0 = A. At the k-th step (starting with k = 0), we compute the QR decomposition A_k = Q_kR_k. We then form A_{k+1} = R_kQ_k. Note that
\[
A_{k+1} = R_kQ_k = Q_k^{-1}A_kQ_k = Q_k^tA_kQ_k,
\]
so all the A_k are similar to A and hence they have the same eigenvalues. The algorithm is numerically stable because it proceeds by orthogonal similarity transforms. Under suitable conditions the orthogonal and triangular matrices generated by the QR algorithm drive the iterates A_k toward the (real Schur) upper triangular form, whose diagonal reveals the eigenvalues of A.
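A minimal MATLAB sketch of the basic (unshifted) QR iteration (an addition; the matrix with distinct eigenvalues and the fixed number of iterations are arbitrary choices, with no shifts or deflation):

% Basic QR algorithm: A_{k+1} = R_k * Q_k
A  = [4 1 0; 1 3 1; 0 1 2];   % an arbitrary example with distinct eigenvalues
Ak = A;
for k = 1:100
  [Q, R] = qr(Ak);
  Ak = R*Q;                   % orthogonal similarity transform; same eigenvalues as A
end
disp(sort(diag(Ak))')         % approximate eigenvalues on the diagonal
disp(sort(eig(A))')           % compare with the built-in eig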
EXAMPLE Suppose in a small town there are three places to eat: two restaurants, one Chinese and one Mexican, and a pizza place. Everyone in town eats dinner in one of these places or has dinner at home. Assume that of those who eat in the Chinese restaurant, 20% go to the Mexican restaurant next time, 20% eat at home, and 30% go to the pizza place. Of those who eat in the Mexican restaurant, 10% go to the pizza place, 25% go to the Chinese restaurant, and 25% eat at home next time. Of those who eat at the pizza place, 30% go to the Chinese restaurant, 10% go to the Mexican restaurant, and 30% eat at home. Of those who eat at home, 20% go to the Chinese restaurant, 25% go to the Mexican place, and 30% to the pizza place. We call this situation a system. A person in the town can eat dinner in one of these four places, each of them called a state. In our example, the system has four states. We are interested in the success of these places in terms of their business. So the transition matrix for this example, with the states ordered (home, Chinese, Mexican, pizza) and the entry A_{ij} giving the probability of moving from state j to state i, is
\[
A = \begin{pmatrix}
.25 & .20 & .25 & .30 \\
.20 & .30 & .25 & .30 \\
.25 & .20 & .40 & .10 \\
.30 & .30 & .10 & .30
\end{pmatrix}.
\]
This suggests that the state vector approaches some fixed vector as the number of observation periods increases. In fact, the eigenvalues of A are 1.0000, −0.0962, 0.0774, 0.2688, and the first eigenvector (eigenstate) is (0.2495, 0.2634, 0.2339, 0.2532), which is the asymptotic probability vector lim_{n→∞} \vec{x}^{(n)}, independent of \vec{x}^{(0)}.
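The asymptotic behaviour quoted above can be reproduced with a short MATLAB sketch (an addition; it uses the transition matrix reconstructed above, including its fourth row).

% Markov chain: steady state as the eigenvector for eigenvalue 1
A = [.25 .20 .25 .30; .20 .30 .25 .30; .25 .20 .40 .10; .30 .30 .10 .30];
disp(eig(A)')                  % contains 1.0000; the other eigenvalues have modulus < 1
[V, D] = eig(A);
[~, i] = min(abs(diag(D) - 1));
v = V(:, i); v = v / sum(v);   % normalize the eigenvector to a probability vector
disp(v')                       % approx (0.2495, 0.2634, 0.2339, 0.2532)
x = [1; 0; 0; 0];              % an arbitrary initial distribution
disp((A^50 * x)')              % repeated transitions approach the same steady state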
This is not the case for every Markov chain. For example, if
\[
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\]
then the state vectors oscillate between two values and do not converge (the second eigenvalue −1 also has modulus 1).
Theorem If a Markov chain with an n × n transition matrix A converges to a steady-state vector x, then
(i) x is a probability vector.
(ii) λ_1 = 1 is an eigenvalue of A and x is an eigenvector corresponding to λ_1 = 1.
(iii) If λ_1 = 1 is a dominant eigenvalue of a (left) stochastic matrix A (i.e., |λ_i| < 1, i ≥ 2), then the Markov chain with transition matrix A converges to a steady-state vector.
Proof: Since \sum_{i=1}^{k} a_{i,j} = 1 for all j, λ_1 = 1 is an eigenvalue (of A^t, and hence of A). Next, if x is a probability vector, so is y = Ax, since
\[
\sum_{i=1}^{k} y_i = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{i,j}x_j = \sum_{j=1}^{k}\Big(\sum_{i=1}^{k} a_{i,j}\Big)x_j = \sum_{j=1}^{k} x_j = 1.
\]
Moreover, writing \vec{x}_0 = a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k in terms of the eigenvectors \vec{v}_i,
\[
A^n\vec{x}_0 = a_1\vec{v}_1 + a_2(\lambda_2)^n\vec{v}_2 + \cdots + a_k(\lambda_k)^n\vec{v}_k \to a_1\vec{v}_1.
\]
We now turn to inner products. The dot product of x, y ∈ R^n is
\[
x \cdot y = (x, y) = x_1y_1 + x_2y_2 + \cdots + x_ny_n.
\]
Inner product space. An inner product space is a vector space V over the field F = R together with a map
\[
\langle\cdot, \cdot\rangle : V \times V \to F,
\]
called an inner product, that satisfies the following conditions for all vectors x, y, z ∈ V and all scalars a:
\[
\langle ax, y\rangle = a\langle x, y\rangle, \qquad \langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle,
\]
and
\[
\langle x, y\rangle = \langle y, x\rangle, \qquad \langle x, x\rangle > 0 \ \text{for } x \ne 0.
\]
EXAMPLE V = C(−1, 1) and \langle x, y\rangle = \int_{-1}^{1} x(t)y(t)\,dt.
EXAMPLE V = P_n and \langle t^k, t^j\rangle = \int_{-1}^{1} t^kt^j\,dt = \frac{1}{k+j+1}(1 - (-1)^{k+j+1}).
Geometrically, we have the norm and the cosine of the angle,

‖~x‖ = √(~x, ~x) = √(x_1^2 + · · · + x_n^2),   cos(θ) = (~x, ~y) / (‖~x‖ ‖~y‖),

where ‖~x‖ is the norm of ~x and θ is the angle between the vectors ~x and ~y. Since for all t ∈ R

0 ≤ ‖~x + t~y‖^2 = (~x + t~y, ~x + t~y) = (~x, ~x) + 2t(~x, ~y) + t^2 (~y, ~y) = ‖~x‖^2 + 2t(~x, ~y) + t^2 ‖~y‖^2,

the discriminant of this quadratic in t must be non-positive, and hence

|(~x, ~y)| ≤ ‖~x‖ ‖~y‖   (Cauchy–Schwarz inequality).

Thus,

‖~x + ~y‖^2 = ‖~x‖^2 + 2(~x, ~y) + ‖~y‖^2 ≤ ‖~x‖^2 + 2‖~x‖ ‖~y‖ + ‖~y‖^2 = (‖~x‖ + ‖~y‖)^2,

so ‖~x + ~y‖ ≤ ‖~x‖ + ‖~y‖ (the triangle inequality). Thus, (R^n, ‖ · ‖) is a normed space (‖~x‖ = 0 iff
~x = 0, and ‖c~x‖ = |c| ‖~x‖ for all ~x ∈ R^n and c ∈ R).
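These inequalities are easy to verify numerically; the following minimal NumPy sketch (with arbitrarily chosen vectors, not taken from the notes) checks Cauchy–Schwarz and the triangle inequality and computes the angle:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

norm = lambda v: np.sqrt(v @ v)           # ||v|| = sqrt((v, v))
print(abs(x @ y) <= norm(x) * norm(y))    # Cauchy-Schwarz holds
print(norm(x + y) <= norm(x) + norm(y))   # triangle inequality holds

cos_theta = (x @ y) / (norm(x) * norm(y))
print(np.degrees(np.arccos(cos_theta)))   # angle between x and y in degrees
```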
In mathematics, particularly linear algebra and numerical analysis, the Gram-Schmidt
process is a method for orthonormalizing a set of vectors in an inner product space, most
commonly the Euclidean space Rn equipped with the standard dot product. The Gram-
Schmidt process takes a finite, linearly independent set S = {~v_1, . . . , ~v_k}, k ≤ n, and
generates an orthogonal set S̃ = {~u_1, · · · , ~u_k}, with (~u_i, ~u_j) = 0 for i ≠ j, that spans the same
k-dimensional subspace of R^n as S.
The method is named after Jorgen Pedersen Gram and Erhard Schmidt, but Pierre-
Simon Laplace had been familiar with it before Gram and Schmidt. In the theory of Lie
group decompositions it is generalized by the Iwasawa decomposition. The application of
the Gram-Schmidt process to the column vectors of a full column rank matrix yields the QR
decomposition (it is decomposed into an orthogonal and a triangular matrix).
We define the projection operator by

proj_u(v) = (⟨v, u⟩ / ⟨u, u⟩) u,

i.e., this operator projects the vector v orthogonally onto the line spanned by the vector u,
since ⟨v − proj_u(v), u⟩ = 0. The Gram–Schmidt process then sets

~u_1 = ~v_1,  ~u_2 = ~v_2 − proj_{~u_1}(~v_2),  . . . ,  ~u_k = ~v_k − ∑_{j=1}^{k−1} proj_{~u_j}(~v_k),   ũ_i = ~u_i / ‖~u_i‖.

The sequence {~u_1, · · · , ~u_k} is the required system of orthogonal vectors, and the normalized
vectors {ũ_1, · · · , ũ_k} form an orthonormal set. Equivalently, every ~u ∈ span{~u_1, . . . , ~u_n} can
be written as

~u = a_1 ~u_1 + · · · + a_n ~u_n with a_j = ⟨~u, ~u_j⟩ / ⟨~u_j, ~u_j⟩.
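The recursion above translates directly into code. The following is a minimal NumPy sketch (the function name gram_schmidt and the test vectors are illustrative choices, not part of the notes); it runs classical Gram–Schmidt on the columns of a matrix:

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    U = []
    for v in V.T:
        u = v.astype(float)
        for q in U:                      # subtract proj_q(v) for each previous q
            u = u - (v @ q) * q          # q is already normalized, so <q, q> = 1
        U.append(u / np.linalg.norm(u))  # normalize
    return np.column_stack(U)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal
```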
The set S^⊥ = {~x ∈ R^n : (~x, ~s) = 0 for all ~s ∈ S} is the orthogonal complement of a subspace S of R^n.
Theorem (The Orthogonal Decomposition Theorem) Let S be a subspace of R^n. Then
each x ∈ R^n can be uniquely represented in the form

x = x̂ + z,   x̂ ∈ S,   z ∈ S^⊥.

Proof: Let {~u_1, · · · , ~u_n} be an orthonormal basis of S. For x ∈ R^n define

x̂ = (x, ~u_1) ~u_1 + · · · + (x, ~u_n) ~u_n ∈ S;

then z = x − x̂ ∈ S^⊥ since (z, ~u_i) = (x, ~u_i) − (x̂, ~u_i) = 0. The decomposition is unique since
if there exist two decompositions of x,

x = x̂_1 + z_1 = x̂_2 + z_2,

then

x̂_1 − x̂_2 = z_2 − z_1 ∈ S ∩ S^⊥   ⇒   x̂_1 − x̂_2 = z_2 − z_1 = 0.

In particular, applying the theorem to S = R(A^t) (whose orthogonal complement is N(A)) gives

R^n = N(A) ⊕ R(A^t).
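A small NumPy check of the decomposition (the matrix B, whose column space plays the role of S, and the vector x are arbitrary illustrative choices; np.linalg.qr is used only to produce an orthonormal basis of S):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])          # S = R(B), a 2-dimensional subspace of R^3
x = np.array([3.0, 1.0, 2.0])

Q, _ = np.linalg.qr(B)              # columns of Q: orthonormal basis of S
x_hat = Q @ (Q.T @ x)               # x_hat = sum_i (x, u_i) u_i, the part in S
z = x - x_hat                       # the part in the orthogonal complement

print(np.allclose(B.T @ z, 0))      # True: z is orthogonal to S
print(np.allclose(x, x_hat + z))    # True: x = x_hat + z
```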
6.2 Generalized inverse
Minimum norm solution Similarly, we have the least squares solution

x^∗ = (A^t A)^{−1} A^t b

that minimizes the error ‖Ax − b‖^2. Note that if N(A) = {0} and x ∈ N(A^t A), then
(x, A^t A x) = ‖Ax‖^2 = 0, so Ax = 0 and x = 0, i.e., A^t A is nonsingular.
Generalized inverse of A Consider the regularized least squares formulation for α > 0: minimize

J(x) = ‖Ax − b‖^2 + α ‖x‖^2,

whose minimizer is

x^∗ = (A^t A + αI)^{−1} A^t b.

Note that if x ∈ N(A^t A + αI), then (x, (A^t A + αI)x) = ‖Ax‖^2 + α ‖x‖^2 = 0 and x = 0, i.e.,
A^t A + αI is nonsingular. In fact, for x̃ ∈ R^n

J(x̃) = J(x^∗) + ‖A(x̃ − x^∗)‖^2 + α ‖x̃ − x^∗‖^2,

where we used (A^t A + αI)x^∗ = A^t b. Thus, we have

J(x̃) ≥ J(x^∗),

with equality if and only if x̃ = x^∗.
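A minimal NumPy sketch comparing the two formulas (the matrix A, vector b, and the value of α are arbitrary illustrations, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))      # overdetermined 6 x 3 system
b = rng.standard_normal(6)
alpha = 0.1

# Least squares: x* = (A^t A)^{-1} A^t b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Regularized least squares: x* = (A^t A + alpha I)^{-1} A^t b
x_reg = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T @ b)

J = lambda x: np.sum((A @ x - b) ** 2) + alpha * np.sum(x ** 2)
print(J(x_reg) <= J(x_ls))                            # True: x_reg minimizes J
print(np.linalg.norm(x_reg) <= np.linalg.norm(x_ls))  # regularization shrinks the solution
```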
If {~u_1, · · · , ~u_n} is an orthonormal basis of S, then it follows from the orthogonal decomposition
theorem that

s^∗ = ⟨~u, ~u_1⟩ ~u_1 + · · · + ⟨~u, ~u_n⟩ ~u_n

is the best approximation of ~u out of S. In general, if {~u_1, · · · , ~u_n} is merely a basis of S, then
s^∗ = a_1 ~u_1 + · · · + a_n ~u_n, where a solves the normal equations A^t A a = A^t b with

(A^t A)_{ij} = ⟨~u_i, ~u_j⟩,   A^t b = (⟨~u, ~u_1⟩, · · · , ⟨~u, ~u_n⟩)^t.
EXAMPLE 1 (Polynomial approximation) Let S be the subspace P_1 of all linear functions
in C[0, 1], with inner product ⟨u, v⟩ = ∫_0^1 u(x) v(x) dx. Although the functions 1 and x span S,
they are not orthogonal. By the Gram–Schmidt orthogonalization, u_2(x) = √12 (x − 1/2) is
orthogonal to u_1 = 1, i.e. {u_1(x), u_2(x)} is an orthonormal basis of P_1. Thus, the best linear
approximation to u(x) = e^x is given by a_1 u_1(x) + a_2 u_2(x) with

a_1 = ∫_0^1 e^x u_1(x) dx = e − 1,   a_2 = ∫_0^1 e^x u_2(x) dx = √3 (3 − e).

Next, let S = P_3. Then we evaluate the matrix Q_{kj} = ∫_0^1 x^{k−1} x^{j−1} dx = 1/(k + j − 1) and
the vector c_k = ∫_0^1 e^x x^{k−1} dx, for k, j = 1, 2, 3, 4. The best cubic approximation is given by

a_1 + a_2 x + a_3 x^2 + a_4 x^3,

where a ∈ R^4 solves Qa = c.
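A minimal NumPy sketch of this computation (the exact integrals c_k are replaced by a simple trapezoid-rule quadrature, which is an implementation convenience rather than anything from the notes):

```python
import numpy as np

# Best cubic approximation of e^x on [0, 1]: solve Q a = c.
k = np.arange(1, 5)
Q = 1.0 / (k[:, None] + k[None, :] - 1)            # Q_{kj} = 1/(k + j - 1)

xs = np.linspace(0.0, 1.0, 2001)
dx = xs[1] - xs[0]
w = np.full_like(xs, dx)
w[0] = w[-1] = dx / 2                              # trapezoid-rule weights
c = np.array([np.sum(w * np.exp(xs) * xs**(kk - 1)) for kk in k])

a = np.linalg.solve(Q, c)                          # coefficients of 1, x, x^2, x^3
approx = sum(a[i] * xs**i for i in range(4))
print(np.max(np.abs(approx - np.exp(xs))))         # small uniform error on [0, 1]
```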
EXAMPLE 2 (Fourier cosine series) Let V be the space of even functions in C[−π, π] with
inner product defined by

⟨u, v⟩ = (1/π) ∫_{−π}^{π} u(x) v(x) dx,

and let S = {1/√2, cos(x), cos(2x), · · · , cos(nx)}. Then S is an orthonormal set of vectors in V,
i.e., (1/π) ∫_{−π}^{π} cos(kx) cos(jx) dx = 0 for k ≠ j, and each element of S has norm 1.
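As an illustration (a minimal NumPy sketch; the target function |x| and the trapezoid-rule quadrature are assumptions made for the example, not taken from the notes), the best approximation of an even function in span S is obtained from the coefficients ⟨u, b_k⟩ with respect to this orthonormal set:

```python
import numpy as np

n = 5
xs = np.linspace(-np.pi, np.pi, 4001)
u = np.abs(xs)                                      # an even function on [-pi, pi]

dx = xs[1] - xs[0]
w = np.full_like(xs, dx)
w[0] = w[-1] = dx / 2                               # trapezoid-rule weights
inner = lambda f, g: np.sum(w * f * g) / np.pi      # <f, g> = (1/pi) * integral of f g

basis = [np.full_like(xs, 1 / np.sqrt(2))] + [np.cos(k * xs) for k in range(1, n + 1)]
coeffs = [inner(u, b) for b in basis]               # a_k = <u, b_k> (orthonormal basis)
approx = sum(a * b for a, b in zip(coeffs, basis))

print(np.max(np.abs(approx - u)))                   # uniform error of the degree-n cosine approximation
```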
7 QR decomposition and Singular value decomposition
Householder transform and QR decomposition A = QR
Let e_1 be the vector (1, 0, · · · , 0)^t and let ‖ · ‖ be the Euclidean norm. For a column vector x, set

u = x − α e_1,   v = u / ‖u‖,   Q = I − 2vv^t,

where α = ±‖x‖ (the sign is usually chosen opposite to that of the first entry of x for numerical
stability). Then Q is an m × m Householder matrix and

Qx = (α, 0, · · · , 0)^t.

Note that

Q^t Q = (I − 2vv^t)(I − 2vv^t) = I − 4vv^t + 4v(v^t v)v^t = I,

using v^t v = 1.
This can be used to sequentially transform an m × n matrix A to upper triangular
form. First, we multiply A by the Householder matrix Q_1 obtained by choosing the first
column of A for x. This results in a matrix Q_1 A with zeros in the first column (except
for the first row):

Q_1 A = \begin{pmatrix} α_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix}.

This can be repeated for A_1 (obtained from Q_1 A by deleting the first row and first column),
resulting in a smaller Householder matrix Q'_2. Since we want it to operate on Q_1 A instead of
A_1, we need to expand it to the upper left, filling in a 1, or in general

Q_k = \begin{pmatrix} I_{k−1} & 0 \\ 0 & Q'_k \end{pmatrix}.

After k iterations of this process, k = min(m − 1, n),

R = Q_k · · · Q_2 Q_1 A

is upper triangular, so with Q = Q_1^t Q_2^t · · · Q_k^t,

A = QR is a QR decomposition of A.
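The construction above can be coded directly; the following is a minimal NumPy sketch (the helper name householder_qr and the test matrix are illustrative, and the sign convention for α follows the stability remark above):

```python
import numpy as np

def householder_qr(A):
    """QR decomposition of an m x n matrix A (m >= n) via Householder reflections."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        # alpha = +/- ||x|| with sign opposite to x[0] (numerical stability)
        alpha = -np.linalg.norm(x) if x[0] >= 0 else np.linalg.norm(x)
        u = x - alpha * np.eye(m - k)[0]
        if np.linalg.norm(u) == 0:
            continue                         # column already has the desired form
        v = u / np.linalg.norm(u)
        Qk = np.eye(m)
        Qk[k:, k:] -= 2.0 * np.outer(v, v)   # embedded Householder matrix
        R = Qk @ R                           # zero out column k below the diagonal
        Q = Q @ Qk.T                         # accumulate Q = Q1^t Q2^t ... Qk^t
    return Q, R

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
print(np.allclose(np.tril(R, -1), 0))                          # R is upper triangular
```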
Remark Note that Q is a real orthogonal transform: Q^t Q = I, Q^t = Q^{−1}, and
‖Q~x‖ = ‖~x‖ (norm preserving for all ~x ∈ R^m). In fact,

Q^t Q = Q_k · · · Q_2 Q_1 Q_1^t Q_2^t · · · Q_k^t = I

and

‖Qx‖^2 = (Qx)^t Qx = x^t Q^t Q x = x^t x = ‖x‖^2.
QR method for Eigenvalue problems In numerical linear algebra, the QR algorithm is an
eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a
matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera
N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition,
writing the matrix as a product of an orthogonal matrix and an upper triangular matrix,
multiply the factors in the reverse order, and iterate. It can be viewed as a variant of the power
method for computing dominant eigenvalue pairs.
Basic QR-algorithm Let A_0 = A. At the k-th step (starting with k = 0), we compute the
QR decomposition A_k = Q_k R_k. We then form A_{k+1} = R_k Q_k. Note that

A_{k+1} = R_k Q_k = Q_k^{−1} A_k Q_k = Q_k^t A_k Q_k,
so all the Ak are similar to A and hence they have the same eigenvalues. The algorithm is
numerically stable because it proceeds by orthogonal similarity transforms. Let

Q̂_k = Q_0 · · · Q_k and R̂_k = R_k · · · R_0

be the orthogonal and triangular matrices generated by the QR algorithm. Then we have

A_{k+1} = Q̂_k^t A Q̂_k.

With shifts σ_0, · · · , σ_k, the iteration starting from A_0 = A becomes A_k − σ_k I = Q_k R_k,
A_{k+1} = R_k Q_k + σ_k I, and then

Q̂_k R̂_k = (A − σ_0 I) · · · (A − σ_k I).
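A minimal NumPy sketch of the unshifted iteration (shifts, which greatly accelerate convergence, are omitted here; the test matrix is an arbitrary symmetric example, not from the notes):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Basic (unshifted) QR iteration; the diagonal of the result approximates
    the eigenvalues of A when the iteration converges."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Qk, Rk = np.linalg.qr(Ak)    # A_k = Q_k R_k
        Ak = Rk @ Qk                 # A_{k+1} = R_k Q_k = Q_k^t A_k Q_k
    return Ak

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric, so A_k tends to a diagonal matrix
Ak = qr_algorithm(A)
print(np.round(np.sort(np.diag(Ak))[::-1], 6))   # approximate eigenvalues
print(np.round(np.linalg.eigvalsh(A)[::-1], 6))  # reference eigenvalues for comparison
```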
In principal component analysis, the scores of the data vectors x^{(i)} with respect to the loading
vectors w^{(k)} are given by

t_k^{(i)} = x^{(i)} · w^{(k)}   for i = 1, . . . , n,  k = 1, . . . , ℓ.

Since w^{(1)} has been defined to be a unit vector, it equivalently also satisfies

w^{(1)} = arg max_{‖w‖=1} w^t X^t X w = arg max_w (w^t X^t X w) / (w^t w).
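A minimal NumPy sketch of this characterization (the data matrix X is an arbitrary illustration; rows are observations and no centering is performed, matching the formula as stated):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.2])   # 200 observations in R^3

# w(1): unit eigenvector of X^t X belonging to the largest eigenvalue,
# i.e. the maximizer of w^t X^t X w over unit vectors w.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
w1 = eigvecs[:, -1]

t1 = X @ w1                                  # scores t_1^{(i)} = x^{(i)} . w^{(1)}
print(np.round(w1, 3))                       # direction of largest spread
print(round(float(t1 @ t1), 3), round(float(eigvals[-1]), 3))   # equal: w1^t X^t X w1 = lambda_max
```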
For the singular value decomposition A = U S V^t (U, V orthogonal, S diagonal with the singular
values on its diagonal), we have

A^t A = (U S V^t)^t U S V^t = V S U^t U S V^t = V S^2 V^t,
AA^t = U S V^t (U S V^t)^t = U S V^t V S U^t = U S^2 U^t,

thus the columns of U are eigenvectors of AA^t, the columns of V are eigenvectors of A^t A, and
S^2 = Λ̃ contains the eigenvalues of A^t A (equivalently, of AA^t).
Singular values are similar in that they can be described by a variational principle. Consider the
constrained optimization problem

max u^t M v subject to |u| = |v| = 1,

whose (Lagrange) optimality conditions are

M v_1 = λ_1 u_1,   M^t u_1 = λ_2 v_1,   |u_1| = |v_1| = 1.
Multiplying the first equation from the left by u_1^t and the second by v_1^t, we have

σ_1 = u_1^t M v_1 = λ_1 = λ_2.

The same calculation performed on the orthogonal complement {u ∈ R^n : (u, u_1) = 0} × {v ∈
R^m : (v, v_1) = 0} gives the next largest singular value of M, and so on. That is, we obtain
singular value triples (σ_i, u_i, v_i) such that σ_1 ≥ σ_2 ≥ · · · ≥ σ_n, {u_i} is orthonormal, i.e.
(u_i, u_j) = δ_{i,j}, {v_i} is orthonormal, i.e. (v_i, v_j) = δ_{i,j}, and

M v_i = σ_i u_i,   M^t u_i = σ_i v_i,   |u_i| = |v_i| = 1.

Thus, M = U S V^t.
Application (Image compression)
A bitmap image is represented by an 864 × 1536 matrix, call it A. Compute the SVD
A = U S V^t and use the truncated SVD

Ã = Ũ S̃ Ṽ^t,

where S̃ = diag(s_1, · · · , s_t), Ũ = (~u_1, · · · , ~u_t), Ṽ = (~v_1, · · · , ~v_t). We select t such that the
first t singular values of A dominate the remaining singular values. It can be proved that Ã is
the optimal rank-t approximation of A, i.e., the Frobenius norm |A − Ã|_F is smallest among
all matrices of rank t.
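A minimal NumPy sketch of the truncation (a small smooth-plus-noise matrix stands in for the actual bitmap, which is not reproduced here); note that storing Ũ, S̃, Ṽ requires only t(m + n + 1) numbers instead of mn:

```python
import numpy as np

def truncated_svd(A, t):
    """Rank-t approximation of A obtained by keeping the t largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :t] @ np.diag(s[:t]) @ Vt[:t, :]

rng = np.random.default_rng(0)
A = np.outer(np.linspace(0, 1, 120), np.linspace(0, 1, 200))   # smooth "image"
A += 0.01 * rng.standard_normal(A.shape)                       # plus a little noise

for t in (1, 5, 20):
    A_t = truncated_svd(A, t)
    print(t, np.linalg.norm(A - A_t, 'fro'))   # Frobenius error decreases as t grows
```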