
with Open Texts

A First Course in
LINEAR ALGEBRA
An Open Text
by Ken Kuttler

LYRYX with OPEN TEXTS ADAPTATION

UNIVERSITY OF CALGARY
MATH 211 LINEAR METHODS I
ALL SECTIONS FALL 2016

Creative Commons License (CC BY)


advancing learning

LYRYX WITH OPEN TEXTS

OPEN TEXT

This text can be downloaded in electronic format, printed, and distributed to students at no cost. Lyryx
will also adapt the content and provide custom editions for specific courses adopting Lyryx Assessment.
The original TeX files are also available if instructors wish to adapt certain sections themselves.

ONLINE ASSESSMENT

Lyryx has developed corresponding formative online assessment for homework and quizzes. These are
genuine questions for the subject, adapted to the content. Student answers are carefully analyzed by
the system and personalized feedback is immediately provided to help students improve on their work.
Lyryx provides all the tools required to manage online assessment including student grade re-
ports and student performance statistics.

INSTRUCTOR SUPPLEMENTS

A number of resources are available, including a full set of beamer slides for instructors and
students, as well as a partial solution manual.

SUPPORT

Lyryx provides all of the support instructors and students need! Starting from the course prepa-
ration time to beyond the end of the course, Lyryx staff is available 7 days/week to provide
assistance. This may include adapting the text, managing multiple sections of the course, pro-
viding course supplements, as well as timely assistance to students with registration, navigation,
and daily organization.

Contact Lyryx!
[email protected]
A First Course in Linear Algebra
Ken Kuttler

Version 2016 Revision A

Edits 2012-2014: The content of the text has been modified and adapted with the addition of new material
and several images.

Edits 2015: The content of the text continues to be modified and adapted with the addition of new material
including additional examples and proofs to existing material.

Edits 2016: The layout and appearance of the text have been updated, including the title page and newly
designed back cover.

All new content (text and images) is released under the same license as noted below.

Stephanie Keyowski, Managing Editor

Lyryx Learning

Copyright

Creative Commons License (CC BY): This text, including the art and illustrations, is available under
the Creative Commons license (CC BY), allowing anyone to reuse, revise, remix and redistribute the text.

To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/


Contents

1 Systems of Equations
1.1 Systems of Equations, Geometry
1.2 Systems of Equations, Algebraic Procedures
1.2.1 Elementary Operations
1.2.2 Gaussian Elimination
1.2.3 Uniqueness of the Reduced Row-Echelon Form
1.2.4 Rank and Homogeneous Systems

2 Matrices
2.1 Matrix Arithmetic
2.1.1 Addition of Matrices
2.1.2 Scalar Multiplication of Matrices
2.1.3 Multiplication of Matrices
2.1.4 The ijth Entry of a Product
2.1.5 Properties of Matrix Multiplication
2.1.6 The Transpose
2.1.7 The Identity and Inverses
2.1.8 Finding the Inverse of a Matrix
2.1.9 Elementary Matrices
2.1.10 More on Matrix Inverses

3 Determinants
3.1 Basic Techniques and Properties
3.1.1 Cofactors and 2 × 2 Determinants
3.1.2 The Determinant of a Triangular Matrix
3.1.3 Properties of Determinants I: Examples
3.1.4 Properties of Determinants II: Some Important Proofs
3.1.5 Finding Determinants using Row Operations
3.2 Applications of the Determinant
3.2.1 A Formula for the Inverse
3.2.2 Cramer's Rule
3.2.3 Polynomial Interpolation

4 R^n
4.1 Vectors in R^n
4.2 Algebra in R^n
4.2.1 Addition of Vectors in R^n
4.2.2 Scalar Multiplication of Vectors in R^n
4.3 Geometric Meaning of Vector Addition
4.4 Length of a Vector
4.5 Geometric Meaning of Scalar Multiplication
4.6 Parametric Lines
4.7 The Dot Product
4.7.1 The Dot Product
4.7.2 The Geometric Significance of the Dot Product
4.7.3 Projections
4.8 Planes in R^n
4.9 The Cross Product
4.9.1 The Box Product
4.10 Applications
4.10.1 Vectors and Physics
4.10.2 Work

5 Linear Transformations
5.1 Linear Transformations
5.2 The Matrix of a Linear Transformation
5.3 Properties of Linear Transformations
5.4 Special Linear Transformations in R^2

6 Complex Numbers
6.1 Complex Numbers
6.2 Polar Form
6.3 Roots of Complex Numbers
6.4 The Quadratic Formula

7 Spectral Theory
7.1 Eigenvalues and Eigenvectors of a Matrix
7.1.1 Definition of Eigenvectors and Eigenvalues
7.1.2 Finding Eigenvectors and Eigenvalues
7.1.3 Eigenvalues and Eigenvectors for Special Types of Matrices
7.2 Diagonalization
7.2.1 Diagonalizing a Matrix
7.2.2 Complex Eigenvalues
7.3 Applications of Spectral Theory
7.3.1 Raising a Matrix to a High Power
7.3.2 Raising a Symmetric Matrix to a High Power
7.3.3 Markov Matrices
7.3.3.1 Eigenvalues of Markov Matrices
7.3.4 Dynamical Systems

A Some Prerequisite Topics
A.1 Sets and Set Notation
A.2 Well Ordering and Induction

B Selected Exercise Answers

Index

Preface
A First Course in Linear Algebra presents an introduction to the fascinating subject of linear algebra.
As the title suggests, this text is designed as a first course in linear algebra for students who have a
reasonable understanding of basic algebra. Major topics of linear algebra are presented in detail, with
proofs of important theorems provided. Connections to additional topics covered in advanced courses are
introduced, in an effort to assist those students who are interested in continuing on in linear algebra.
Each chapter begins with a list of desired outcomes which a student should be able to achieve upon
completing the chapter. Throughout the text, examples and diagrams are given to reinforce ideas and
provide guidance on how to approach various problems. Suggested exercises are given at the end of each
section, and students are encouraged to work through a selection of these exercises.
A brief review of complex numbers is given, which can serve as an introduction to anyone unfamiliar
with the topic.
Linear algebra is a wonderful and interesting subject, which should not be reduced to a challenge
of performing correct arithmetic. The use of a computer algebra system can be a great help in long and
difficult computations. Some of the standard computations of linear algebra are easily done by the computer,
including finding the reduced row-echelon form. While the use of a computer system is encouraged, it is
not meant to be used without the student having an understanding of the computations.

1. Systems of Equations

1.1 Systems of Equations, Geometry

Outcomes
A. Relate the types of solution sets of a system of two (three) variables to the intersections of
lines in a plane (the intersection of planes in three space)

As you may remember, linear equations like 2x + 3y = 6 can be graphed as straight lines in the coordi-
nate plane. We say that this equation is in two variables, in this case x and y. Suppose you have two such
equations, each of which can be graphed as a straight line, and consider the resulting graph of two lines.
What would it mean if there exists a point of intersection between the two lines? This point, which lies on
both graphs, gives x and y values for which both equations are true. In other words, this point gives the
ordered pair (x, y) that satisfies both equations. If the point (x, y) is a point of intersection, we say that (x, y)
is a solution to the two equations. In linear algebra, we often are concerned with finding the solution(s)
to a system of equations, if such solutions exist. First, we consider graphical representations of solutions
and later we will consider the algebraic methods for finding solutions.
When looking for the intersection of two lines in a graph, several situations may arise. The follow-
ing picture demonstrates the possible situations when considering two equations (two lines in the graph)
involving two variables.
[Figure: three graphs of a pair of lines - One Solution, No Solutions, Infinitely Many Solutions]

In the first diagram, there is a unique point of intersection, which means that there is only one (unique)
solution to the two equations. In the second, there are no points of intersection and no solution. When no
solution exists, this means that the two lines are parallel and they never intersect. The third situation which
can occur, as demonstrated in diagram three, is that the two lines are really the same line. For example,
x + y = 1 and 2x + 2y = 2 are equations which when graphed yield the same line. In this case there are
infinitely many points which are solutions of these two equations, as every ordered pair which is on the
graph of the line satisfies both equations. When considering linear systems of equations, there are always
three types of solutions possible: exactly one (unique) solution, infinitely many solutions, or no solution.


Example 1.1: A Graphical Solution


Use a graph to find the solution to the following system of equations

x + y = 3
y - x = 5

Solution. Through graphing the above equations and identifying the point of intersection, we can find the
solution(s). Remember that we must have either one solution, infinitely many, or no solutions at all. The
following graph shows the two equations, as well as the intersection. Remember, the point of intersection
represents the solution of the two equations, or the (x, y) which satisfy both equations. In this case, there
is one point of intersection at (-1, 4) which means we have one unique solution, x = -1, y = 4.

[Figure: graph of the two lines, intersecting at (x, y) = (-1, 4)]


In the above example, we investigated the intersection point of two equations in two variables, x and
y. Now we will consider the graphical solutions of three equations in two variables.
Consider a system of three equations in two variables. Again, these equations can be graphed as
straight lines in the plane, so that the resulting graph contains three straight lines. Recall the three possible
types of solutions; no solution, one solution, and infinitely many solutions. There are now more complex
ways of achieving these situations, due to the presence of the third line. For example, you can imagine
the case of three intersecting lines having no common point of intersection. Perhaps you can also imagine
three intersecting lines which do intersect at a single point. These two situations are illustrated below.

[Figure: two configurations of three lines - No Solution and One Solution]

Consider the first picture above. While all three lines intersect with one another, there is no common
point of intersection where all three lines meet at one point. Hence, there is no solution to the three
equations. Remember, a solution is a point (x, y) which satisfies all three equations. In the case of the
second picture, the lines intersect at a common point. This means that there is one solution to the three
equations whose graphs are the given lines. You should take a moment now to draw the graph of a system
which results in three parallel lines. Next, try the graph of three identical lines. Which type of solution is
represented in each of these graphs?
We have now considered the graphical solutions of systems of two equations in two variables, as well
as three equations in two variables. However, there is no reason to limit our investigation to equations in
two variables. We will now consider equations in three variables.
You may recall that equations in three variables, such as 2x + 4y - 5z = 8, form a plane. Above, we
were looking for intersections of lines in order to identify any possible solutions. When graphically solving
systems of equations in three variables, we look for intersections of planes. These points of intersection
give the (x, y, z) that satisfy all the equations in the system. What types of solutions are possible when
working with three variables? Consider the following picture involving two planes, which are given by
two equations in three variables.

Notice how these two planes intersect in a line. This means that the points (x, y, z) on this line satisfy
both equations in the system. Since the line contains infinitely many points, this system has infinitely
many solutions.
It could also happen that the two planes fail to intersect. However, is it possible to have two planes
intersect at a single point? Take a moment to attempt drawing this situation, and convince yourself that it
is not possible! This means that when we have only two equations in three variables, there is no way to
have a unique solution! Hence, the types of solutions possible for two equations in three variables are no
solution or infinitely many solutions.
Now imagine adding a third plane. In other words, consider three equations in three variables. What
types of solutions are now possible? Consider the following diagram.

[Figure: three planes with no common point of intersection]

In this diagram, there is no point which lies in all three planes. There is no intersection between all
planes so there is no solution. The picture illustrates the situation in which the line of intersection of the
new plane with one of the original planes forms a line parallel to the line of intersection of the first two
planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they are
not parallel. Such lines are called skew lines.
Recall that when working with two equations in three variables, it was not possible to have a unique
solution. Is it possible when considering three equations in three variables? In fact, it is possible, and we
demonstrate this situation in the following picture.
[Figure: three planes meeting in a single point]

In this case, the three planes have a single point of intersection. Can you think of other types of
solutions possible? Another is that the three planes could intersect in a line, resulting in infinitely many
solutions, as in the following diagram.

We have now seen how three equations in three variables can have no solution, a unique solution, or
intersect in a line resulting in infinitely many solutions. It is also possible that the three equations graph
the same plane, which also leads to infinitely many solutions.
You can see that when working with equations in three variables, there are many more ways to achieve
the different types of solutions than when working with two variables. It may prove enlightening to spend
time imagining (and drawing) many possible scenarios, and you should take some time to try a few.
You should also take some time to imagine (and draw) graphs of systems in more than three variables.
Equations like x + y - 2z + 4w = 8 with more than three variables are often called hyper-planes. You may
soon realize that it is tricky to draw the graphs of hyper-planes! Through the tools of linear algebra, we
can algebraically examine these types of systems which are difficult to graph. In the following section, we
will consider these algebraic tools.

Exercises

Exercise 1.1.1 Graphically, find the point (x_1, y_1) which lies on both lines, x + 3y = 1 and 4x - y = 3.
That is, graph each line and see where they intersect.

Exercise 1.1.2 Graphically, find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1. That
is, graph each line and see where they intersect.

Exercise 1.1.3 You have a system of k equations in two variables, k ≥ 2. Explain the geometric signifi-
cance of

(a) No solution.

(b) A unique solution.

(c) An infinite number of solutions.

1.2 Systems Of Equations, Algebraic Procedures

Outcomes
A. Use elementary operations to find the solution to a linear system of equations.

B. Find the row-echelon form and reduced row-echelon form of a matrix.

C. Determine whether a system of linear equations has no solution, a unique solution or an
infinite number of solutions from its row-echelon form.

D. Solve a system of equations using Gaussian Elimination and Gauss-Jordan Elimination.

E. Model a physical system with linear equations and then solve.

We have taken an in-depth look at graphical representations of systems of equations, as well as how to
find possible solutions graphically. Our attention now turns to working with systems algebraically.

Definition 1.2: System of Linear Equations


A system of linear equations is a list of equations,

\[
\begin{array}{c}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m
\end{array}
\]

where a_{ij} and b_j are real numbers. The above is a system of m equations in the n variables
x_1, x_2, ..., x_n. Written more simply in terms of summation notation, the above can be written in
the form

\[
\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, 3, \ldots, m
\]

The relative size of m and n is not important here. Notice that we have allowed a_{ij} and b_j to be any
real number. We can also call these numbers scalars. We will use this term throughout the text, so keep
in mind that the term scalar just means that we are working with real numbers.
Now, suppose we have a system where b_i = 0 for all i. In other words every equation equals 0. This is
a special type of system.

Definition 1.3: Homogeneous System of Equations


A system of equations is called homogeneous if each equation in the system is equal to 0. A
homogeneous system has the form

\[
\begin{array}{c}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = 0 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = 0 \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = 0
\end{array}
\]

where a_{ij} are scalars and x_i are variables.

Recall from the previous section that our goal when working with systems of linear equations was to
find the point of intersection of the equations when graphed. In other words, we looked for the solutions to
the system. We now wish to find these solutions algebraically. We want to find values for x_1, ..., x_n which
solve all of the equations. If such a set of values exists, we call (x_1, ..., x_n) the solution set.
Recall the above discussions about the types of solutions possible. We will see that systems of linear
equations will have one unique solution, infinitely many solutions, or no solution. Consider the following
definition.

Definition 1.4: Consistent and Inconsistent Systems


A system of linear equations is called consistent if there exists at least one solution. It is called
inconsistent if there is no solution.

If you think of each equation as a condition which must be satisfied by the variables, consistent would
mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean there
is no choice of the variables which can satisfy all of the conditions.
The following sections provide methods for determining if a system is consistent or inconsistent, and
finding solutions if they exist.

1.2.1. Elementary Operations

We begin this section with an example. Recall from Example 1.1 that the solution to the given system was
(x, y) = (-1, 4).

Example 1.5: Verifying an Ordered Pair is a Solution


Algebraically verify that (x, y) = (-1, 4) is a solution to the following system of equations.

x + y = 3
y - x = 5

Solution. By graphing these two equations and identifying the point of intersection, we previously found
that (x, y) = (-1, 4) is the unique solution.
We can verify algebraically by substituting these values into the original equations, and ensuring that
the equations hold. First, we substitute the values into the first equation and check that it equals 3.
x + y = (-1) + (4) = 3
This equals 3 as needed, so we see that (-1, 4) is a solution to the first equation. Substituting the values
into the second equation yields
y - x = (4) - (-1) = 4 + 1 = 5
which is true. For (x, y) = (-1, 4) each equation is true and therefore, this is a solution to the system.
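
This kind of substitution check is easy to automate. The following is a minimal sketch in Python (not part of the original text) of the same verification:

```python
# Verify that (x, y) = (-1, 4) satisfies both equations of the system.
x, y = -1, 4
assert x + y == 3   # first equation holds
assert y - x == 5   # second equation holds
print("(-1, 4) is a solution to the system")
```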
Now, the interesting question is this: If you were not given these numbers to verify, how could you
algebraically determine the solution? Linear algebra gives us the tools needed to answer this question.
The following basic operations are important tools that we will utilize.

Definition 1.6: Elementary Operations


Elementary operations are those operations consisting of the following.

1. Interchange the order in which the equations are listed.

2. Multiply any equation by a nonzero number.

3. Replace any equation with itself added to a multiple of another equation.

It is important to note that none of these operations will change the set of solutions of the system of
equations. In fact, elementary operations are the key tool we use in linear algebra to find solutions to
systems of equations.

Consider the following example.

Example 1.7: Effects of an Elementary Operation


Show that the system
x + y = 7
2x - y = 8
has the same solution as the system
x + y = 7
-3y = -6

Solution. Notice that the second system has been obtained by taking the second equation of the first system
and adding -2 times the first equation, as follows:

2x - y + (-2)(x + y) = 8 + (-2)(7)

By simplifying, we obtain
-3y = -6
which is the second equation in the second system. Now, from here we can solve for y and see that y = 2.
Next, we substitute this value into the first equation as follows

x + y = x + 2 = 7

Hence x = 5 and so (x, y) = (5, 2) is a solution to the second system. We want to check if (5, 2) is also a
solution to the first system. We check this by substituting (x, y) = (5, 2) into the system and ensuring the
equations are true.
x + y = (5) + (2) = 7
2x - y = 2 (5) - (2) = 8
Hence, (5, 2) is also a solution to the first system.
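
As a quick sanity check, a few lines of Python confirm that (5, 2) satisfies both the original system and the modified one (a sketch, not part of the text's method):

```python
# (x, y) = (5, 2) should satisfy both systems of Example 1.7.
x, y = 5, 2
assert x + y == 7 and 2 * x - y == 8   # the first system
assert x + y == 7 and -3 * y == -6     # after the elementary operation
print("both systems share the solution (5, 2)")
```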
This example illustrates how an elementary operation applied to a system of two equations in two
variables does not affect the solution set. However, a linear system may involve many equations and many
variables and there is no reason to limit our study to small systems. For any size of system in any number
of variables, the solution set is still the collection of solutions to the equations. In every case, the above
operations of Definition 1.6 do not change the set of solutions to the system of linear equations.
In the following theorem, we use the notation E_i to represent an equation, while b_i denotes a constant.

Theorem 1.8: Elementary Operations and Solutions


Suppose you have a system of two linear equations

E_1 = b_1
E_2 = b_2     (1.1)

Then the following systems have the same solution set as 1.1:

1.
E_2 = b_2
E_1 = b_1     (1.2)

2.
E_1 = b_1
kE_2 = kb_2     (1.3)

for any scalar k, provided k ≠ 0.

3.
E_1 = b_1
E_2 + kE_1 = b_2 + kb_1     (1.4)

for any scalar k (including k = 0).

Before we proceed with the proof of Theorem 1.8, let us consider this theorem in context of Example
1.7. Then,
E_1 = x + y, b_1 = 7
E_2 = 2x - y, b_2 = 8

Recall the elementary operations that we used to modify the system in the solution to the example. First,
we added (-2) times the first equation to the second equation. In terms of Theorem 1.8, this action is
given by

E_2 + (-2) E_1 = b_2 + (-2) b_1

or

2x - y + (-2)(x + y) = 8 + (-2)(7)

This gave us the second system in Example 1.7, given by

E_1 = b_1
E_2 + (-2) E_1 = b_2 + (-2) b_1

From this point, we were able to find the solution to the system. Theorem 1.8 tells us that the solution
we found is in fact a solution to the original system.
We will now prove Theorem 1.8.
Proof.

1. The proof that the systems 1.1 and 1.2 have the same solution set is as follows. Suppose that
(x_1, ..., x_n) is a solution to E_1 = b_1, E_2 = b_2. We want to show that this is a solution to the system
in 1.2 above. This is clear, because the system in 1.2 is the original system, but listed in a different
order. Changing the order does not affect the solution set, so (x_1, ..., x_n) is a solution to 1.2.

2. Next we want to prove that the systems 1.1 and 1.3 have the same solution set. That is, E_1 = b_1,
E_2 = b_2 has the same solution set as the system E_1 = b_1, kE_2 = kb_2 provided k ≠ 0. Let (x_1, ..., x_n) be a
solution of E_1 = b_1, E_2 = b_2. We want to show that it is a solution to E_1 = b_1, kE_2 = kb_2. Notice that
the only difference between these two systems is that the second involves multiplying the equation
E_2 = b_2 by the scalar k. Recall that when you multiply both sides of an equation by the same number,
the sides are still equal to each other. Hence if (x_1, ..., x_n) is a solution to E_2 = b_2, then it will also
be a solution to kE_2 = kb_2. Hence, (x_1, ..., x_n) is also a solution to 1.3.
Similarly, let (x_1, ..., x_n) be a solution of E_1 = b_1, kE_2 = kb_2. Then we can multiply the equation
kE_2 = kb_2 by the scalar 1/k, which is possible only because we have required that k ≠ 0. Just as
above, this action preserves equality and we obtain the equation E_2 = b_2. Hence (x_1, ..., x_n) is also
a solution to E_1 = b_1, E_2 = b_2.

3. Finally, we will prove that the systems 1.1 and 1.4 have the same solution set. We will show that
any solution of E_1 = b_1, E_2 = b_2 is also a solution of 1.4. Then, we will show that any solution of
1.4 is also a solution of E_1 = b_1, E_2 = b_2. Let (x_1, ..., x_n) be a solution to E_1 = b_1, E_2 = b_2. Then
in particular it solves E_1 = b_1. Hence, it solves the first equation in 1.4. Similarly, it also solves
E_2 = b_2. By our proof of 1.3, it also solves kE_1 = kb_1. Notice that if we add E_2 and kE_1, this is equal
to b_2 + kb_1. Therefore, if (x_1, ..., x_n) solves E_1 = b_1, E_2 = b_2 it must also solve E_2 + kE_1 = b_2 + kb_1.
Now suppose (x_1, ..., x_n) solves the system E_1 = b_1, E_2 + kE_1 = b_2 + kb_1. Then in particular it is a
solution of E_1 = b_1. Again by our proof of 1.3, it is also a solution to kE_1 = kb_1. Now if we subtract
these equal quantities from both sides of E_2 + kE_1 = b_2 + kb_1 we obtain E_2 = b_2, which shows that
the solution also satisfies E_1 = b_1, E_2 = b_2.


Stated simply, the above theorem shows that the elementary operations do not change the solution set
of a system of equations.
We will now look at an example of a system of three equations and three variables. Similarly to the
previous examples, the goal is to find values for x, y, z such that each of the given equations is satisfied
when these values are substituted in.

Example 1.9: Solving a System of Equations with Elementary Operations


Find the solutions to the system,

x + 3y + 6z = 25
2x + 7y + 14z = 58 (1.5)
2y + 5z = 19

Solution. We can relate this system to Theorem 1.8 above. In this case, we have

E_1 = x + 3y + 6z, b_1 = 25
E_2 = 2x + 7y + 14z, b_2 = 58
E_3 = 2y + 5z, b_3 = 19

Theorem 1.8 claims that if we do elementary operations on this system, we will not change the solution
set. Therefore, we can solve this system using the elementary operations given in Definition 1.6. First,

replace the second equation by (-2) times the first equation added to the second. This yields the system

x + 3y + 6z = 25
y + 2z = 8 (1.6)
2y + 5z = 19

Now, replace the third equation with (-2) times the second added to the third. This yields the system

x + 3y + 6z = 25
y + 2z = 8 (1.7)
z=3

At this point, we can easily find the solution. Simply take z = 3 and substitute this back into the previous
equation to solve for y, and similarly to solve for x.

x + 3y + 6 (3) = x + 3y + 18 = 25
y + 2 (3) = y + 6 = 8
z=3

The second equation is now

y + 6 = 8
You can see from this equation that y = 2. Therefore, we can substitute this value into the first equation as
follows:
x + 3 (2) + 18 = 25
By simplifying this equation, we find that x = 1. Hence, the solution to this system is (x, y, z) = (1, 2, 3).
This process is called back substitution.
Alternatively, in 1.7 you could have continued as follows. Add (-2) times the third equation to the
second and then add (-6) times the second to the first. This yields

x + 3y = 7
y=2
z=3

Now add (-3) times the second to the first. This yields

x=1
y=2
z=3

a system which has the same solution set as the original system. This avoided back substitution and led
to the same solution set. It is your decision which you prefer to use, as both methods lead to the correct
solution, (x, y, z) = (1, 2, 3).
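
If you wish to confirm such a computation with a computer algebra system, sympy's linsolve (assuming sympy is available) reproduces the answer; this is only a check, not a replacement for the elimination above:

```python
from sympy import symbols, linsolve

x, y, z = symbols("x y z")
# The system of Example 1.9, written as expressions equal to zero.
system = [x + 3*y + 6*z - 25,
          2*x + 7*y + 14*z - 58,
          2*y + 5*z - 19]
print(linsolve(system, x, y, z))   # {(1, 2, 3)}
```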

1.2.2. Gaussian Elimination

The work we did in the previous section will always find the solution to the system. In this section, we
will explore a less cumbersome way to find the solutions. First, we will represent a linear system with

an augmented matrix. A matrix is simply a rectangular array of numbers. The size or dimension of a
matrix is defined as m × n, where m is the number of rows and n is the number of columns. In order to
construct an augmented matrix from a linear system, we create a coefficient matrix from the coefficients
of the variables in the system, as well as a constant matrix from the constants. The coefficients from one
equation of the system create one row of the augmented matrix.
For example, consider the linear system in Example 1.9

x + 3y + 6z = 25
2x + 7y + 14z = 58
2y + 5z = 19

This system can be written as an augmented matrix, as follows:

\[
\left[
\begin{array}{ccc|c}
1 & 3 & 6 & 25 \\
2 & 7 & 14 & 58 \\
0 & 2 & 5 & 19
\end{array}
\right]
\]

Notice that it has exactly the same information as the original system. Here it is understood that the
first column contains the coefficients from x in each equation, in order,
$\left[ \begin{array}{c} 1 \\ 2 \\ 0 \end{array} \right]$. Similarly, we create a
column from the coefficients on y in each equation,
$\left[ \begin{array}{c} 3 \\ 7 \\ 2 \end{array} \right]$, and a column from the coefficients on z in each
equation, $\left[ \begin{array}{c} 6 \\ 14 \\ 5 \end{array} \right]$. For a system of more than three variables, we would continue in this way constructing
a column for each variable. Similarly, for a system of less than three variables, we simply construct a
column for each variable.
Finally, we construct a column from the constants of the equations,
$\left[ \begin{array}{c} 25 \\ 58 \\ 19 \end{array} \right]$.
The rows of the augmented matrix correspond to the equations in the system. For example, the top
row in the augmented matrix, $\left[ \begin{array}{ccc|c} 1 & 3 & 6 & 25 \end{array} \right]$, corresponds to the equation

x + 3y + 6z = 25.
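
In software, an augmented matrix is simply a two-dimensional array. A minimal sketch (numpy here is an assumption; any array type would do):

```python
import numpy as np

# Augmented matrix of the system above: one row per equation,
# columns for the coefficients of x, y, z and the constants.
A = np.array([[1, 3,  6, 25],
              [2, 7, 14, 58],
              [0, 2,  5, 19]])
print(A.shape)   # (3, 4): three equations, three variables plus constants
```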

Consider the following definition.



Definition 1.10: Augmented Matrix of a Linear System


For a linear system of the form

\[
\begin{array}{c}
a_{11} x_1 + \cdots + a_{1n} x_n = b_1 \\
\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n = b_m
\end{array}
\]

where the x_i are variables and the a_{ij} and b_i are constants, the augmented matrix of this system is
given by

\[
\left[
\begin{array}{ccc|c}
a_{11} & \cdots & a_{1n} & b_1 \\
\vdots & & \vdots & \vdots \\
a_{m1} & \cdots & a_{mn} & b_m
\end{array}
\right]
\]

Now, consider elementary operations in the context of the augmented matrix. The elementary opera-
tions in Definition 1.6 can be used on the rows just as we used them on equations previously. Changes to
a system of equations as a result of an elementary operation are equivalent to changes in the augmented
matrix resulting from the corresponding row operation. Note that Theorem 1.8 implies that any elementary
row operations used on an augmented matrix will not change the solution to the corresponding system of
equations. We now formally define elementary row operations. These are the key tool we will use to find
solutions to systems of equations.

Definition 1.11: Elementary Row Operations


The elementary row operations (also known as row operations) consist of the following

1. Switch two rows.

2. Multiply a row by a nonzero number.

3. Replace a row by any multiple of another row added to it.
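
In code, each of these row operations is a single line of array arithmetic. A minimal sketch in Python (numpy is an assumption), applied to the augmented matrix of Example 1.9:

```python
import numpy as np

A = np.array([[1., 3., 6., 25.],
              [2., 7., 14., 58.],
              [0., 2., 5., 19.]])
A[[0, 1]] = A[[1, 0]]        # 1. switch rows 1 and 2
A[2] = 3 * A[2]              # 2. multiply row 3 by the nonzero number 3
A[1] = A[1] + (-2) * A[0]    # 3. add (-2) times row 1 to row 2
print(A)
```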

Recall how we solved Example 1.9. We can do the exact same steps as above, except now in the
context of an augmented matrix and using row operations. The augmented matrix of this system is

1 3 6 25
2 7 14 58
0 2 5 19

Thus the first step in solving the system given by 1.5 would be to take (-2) times the first row of the
augmented matrix and add it to the second row,

1 3 6 25
0 1 2 8
0 2 5 19

Note how this corresponds to 1.6. Next take (-2) times the second row and add to the third,

1 3 6 25
0 1 2 8
0 0 1 3
This augmented matrix corresponds to the system
x + 3y + 6z = 25
y + 2z = 8
z=3
which is the same as 1.7. By back substitution you obtain the solution x = 1, y = 2, and z = 3.
Through a systematic procedure of row operations, we can simplify an augmented matrix and carry it
to row-echelon form or reduced row-echelon form, which we define next. These forms are used to find
the solutions of the system of equations corresponding to the augmented matrix.
In the following definitions, the term leading entry refers to the first nonzero entry of a row when
scanning the row from left to right.

Definition 1.12: Row-Echelon Form


An augmented matrix is in row-echelon form if

1. All nonzero rows are above any rows of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any row above
it.

3. Each leading entry of a row is equal to 1.

We also consider another reduced form of the augmented matrix which has one further condition.

Definition 1.13: Reduced Row-Echelon Form


An augmented matrix is in reduced row-echelon form if

1. All nonzero rows are above any rows of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any rows above
it.

3. Each leading entry of a row is equal to 1.

4. All entries in a column above and below a leading entry are zero.

Notice that the first three conditions on a reduced row-echelon form matrix are the same as those for
row-echelon form.
Hence, every reduced row-echelon form matrix is also in row-echelon form. The converse is not
necessarily true; we cannot assume that every matrix in row-echelon form is also in reduced row-echelon
form. However, it often happens that the row-echelon form is sufficient to provide information about the
solution of a system.
The following examples describe matrices in these various forms. As an exercise, take the time to
carefully verify that they are in the specified form.

Example 1.14: Not in Row-Echelon Form


The following augmented matrices are not in row-echelon form (and therefore also not in reduced
row-echelon form).

\[
\left[
\begin{array}{cccc}
0 & 0 & 0 & 0 \\
1 & 2 & 3 & 3 \\
0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1
\end{array}
\right],\quad
\left[
\begin{array}{ccc}
1 & 2 & 3 \\
2 & 4 & 6 \\
4 & 0 & 7
\end{array}
\right],\quad
\left[
\begin{array}{cccc}
0 & 2 & 3 & 3 \\
1 & 5 & 0 & 2 \\
7 & 5 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

Example 1.15: Matrices in Row-Echelon Form


The following augmented matrices are in row-echelon form, but not in reduced row-echelon form.

\[
\left[
\begin{array}{cccccc}
1 & 0 & 6 & 5 & 8 & 2 \\
0 & 0 & 1 & 2 & 7 & 3 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}
\right],\quad
\left[
\begin{array}{cccc}
1 & 3 & 5 & 4 \\
0 & 1 & 0 & 7 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array}
\right],\quad
\left[
\begin{array}{cccc}
1 & 0 & 6 & 0 \\
0 & 1 & 4 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

Notice that we could apply further row operations to these matrices to carry them to reduced row-
echelon form. Take the time to try that on your own. Consider the following matrices, which are in
reduced row-echelon form.

Example 1.16: Matrices in Reduced Row-Echelon Form


The following augmented matrices are in reduced row-echelon form.

\[
\left[
\begin{array}{cccccc}
1 & 0 & 0 & 5 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}
\right],\quad
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array}
\right],\quad
\left[
\begin{array}{cccc}
1 & 0 & 0 & 4 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & 2
\end{array}
\right]
\]

One way in which the row-echelon form of a matrix is useful is in identifying the pivot positions and
pivot columns of the matrix.

Definition 1.17: Pivot Position and Pivot Column


A pivot position in a matrix is the location of a leading entry in the row-echelon form of a matrix.
A pivot column is a column that contains a pivot position.

For example consider the following.

Example 1.18: Pivot Position


Let

\[
A = \left[
\begin{array}{ccc|c}
1 & 2 & 3 & 4 \\
3 & 2 & 1 & 6 \\
4 & 4 & 4 & 10
\end{array}
\right]
\]
Where are the pivot positions and pivot columns of the augmented matrix A?

Solution. The row-echelon form of this matrix is



\[
\left[
\begin{array}{ccc|c}
1 & 2 & 3 & 4 \\
0 & 1 & 2 & \frac{3}{2} \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

This is all we need in this example, but note that this matrix is not in reduced row-echelon form.
In order to identify the pivot positions in the original matrix, we look for the leading entries in the
row-echelon form of the matrix. Here, the entry in the first row and first column, as well as the entry in
the second row and second column are the leading entries. Hence, these locations are the pivot positions.
We identify the pivot positions in the original matrix, as in the following:

\[
\left[
\begin{array}{ccc|c}
\fbox{1} & 2 & 3 & 4 \\
3 & \fbox{2} & 1 & 6 \\
4 & 4 & 4 & 10
\end{array}
\right]
\]

Thus the pivot columns in the matrix are the first two columns.
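
A computer algebra system reports the pivot columns directly; for instance, with sympy (an assumption; columns are 0-indexed there):

```python
from sympy import Matrix

# The matrix of Example 1.18; rref() returns the reduced row-echelon
# form together with the indices of the pivot columns.
A = Matrix([[1, 2, 3, 4], [3, 2, 1, 6], [4, 4, 4, 10]])
R, pivots = A.rref()
print(pivots)   # (0, 1): the first two columns are pivot columns
```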
The following is an algorithm for carrying a matrix to row-echelon form and reduced row-echelon
form. You may wish to use this algorithm to carry the above matrix to row-echelon form or reduced
row-echelon form yourself for practice.

Algorithm 1.19: Reduced Row-Echelon Form Algorithm


This algorithm provides a method for using row operations to take a matrix to its reduced row-
echelon form. We begin with the matrix in its original form.

1. Starting from the left, find the first nonzero column. This is the first pivot column, and the
position at the top of this column is the first pivot position. Switch rows if necessary to place
a nonzero number in the first pivot position.

2. Use row operations to make the entries below the first pivot position (in the first pivot column)
equal to zero.

3. Ignoring the row containing the first pivot position, repeat steps 1 and 2 with the remaining
rows. Repeat the process until there are no more rows to modify.

4. Divide each nonzero row by the value of the leading entry, so that the leading entry becomes
1. The matrix will then be in row-echelon form.

The following step will carry the matrix from row-echelon form to reduced row-echelon form.

5. Moving from right to left, use row operations to create zeros in the entries of the pivot columns
which are above the pivot positions. The result will be a matrix in reduced row-echelon form.

Most often we will apply this algorithm to an augmented matrix in order to find the solution to a system
of linear equations. However, we can use this algorithm to compute the reduced row-echelon form of any
matrix, which could be useful in other applications.
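
Algorithm 1.19 translates almost line for line into code. The following is a minimal sketch in Python (not part of the original text), using exact fractions so that no rounding occurs; it folds steps 4 and 5 into the main loop by clearing entries above and below each pivot as soon as it is found, which yields the same reduced row-echelon form:

```python
from fractions import Fraction

def rref(matrix):
    """Carry a matrix to reduced row-echelon form (cf. Algorithm 1.19)."""
    A = [[Fraction(entry) for entry in row] for row in matrix]
    m = len(A)
    pivot_row = 0
    for col in range(len(A[0])):
        # Find a row at or below pivot_row with a nonzero entry in this
        # column; if there is none, the column holds no pivot position.
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue
        A[pivot_row], A[r] = A[r], A[pivot_row]          # switch rows
        lead = A[pivot_row][col]
        A[pivot_row] = [e / lead for e in A[pivot_row]]  # leading entry -> 1
        for i in range(m):
            if i != pivot_row and A[i][col] != 0:        # zeros above, below
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return A

# The matrix of Example 1.20 below:
for row in rref([[0, -5, -4], [1, 4, 3], [5, 10, 7]]):
    print([str(e) for e in row])
# ['1', '0', '-1/5'], ['0', '1', '4/5'], ['0', '0', '0']
```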
Consider the following example of Algorithm 1.19.

Example 1.20: Finding Row-Echelon Form and Reduced Row-Echelon Form of a Matrix

Let

\[
A = \left[
\begin{array}{ccc}
0 & -5 & -4 \\
1 & 4 & 3 \\
5 & 10 & 7
\end{array}
\right]
\]

Find the row-echelon form of A. Then complete the process until A is in reduced row-echelon form.

Solution. In working through this example, we will use the steps outlined in Algorithm 1.19.

1. The first pivot column is the first column of the matrix, as this is the first nonzero column from the
left. Hence the first pivot position is the one in the first row and first column. Switch the first two
rows to obtain a nonzero entry in the first pivot position, outlined in a box below.

\[
\left[
\begin{array}{ccc}
\fbox{1} & 4 & 3 \\
0 & -5 & -4 \\
5 & 10 & 7
\end{array}
\right]
\]

2. Step two involves creating zeros in the entries below the first pivot position. The first entry of the
second row is already a zero. All we need to do is subtract 5 times the first row from the third row.
The resulting matrix is

\[
\left[
\begin{array}{ccc}
1 & 4 & 3 \\
0 & -5 & -4 \\
0 & -10 & -8
\end{array}
\right]
\]

3. Now ignore the top row. Apply steps 1 and 2 to the smaller matrix

\[
\left[
\begin{array}{cc}
-5 & -4 \\
-10 & -8
\end{array}
\right]
\]

In this matrix, the first column is a pivot column, and -5 is in the first pivot position. Therefore, we
need to create a zero below it. To do this, add -2 times the first row (of this matrix) to the second.
The resulting matrix is

\[
\left[
\begin{array}{cc}
-5 & -4 \\
0 & 0
\end{array}
\right]
\]

Our original matrix now looks like

\[
\left[
\begin{array}{ccc}
1 & 4 & 3 \\
0 & -5 & -4 \\
0 & 0 & 0
\end{array}
\right]
\]

We can see that there are no more rows to modify.

4. Now, we need to create leading 1s in each row. The first row already has a leading 1 so no work is
needed here. Divide the second row by -5 to create a leading 1. The resulting matrix is

\[
\left[
\begin{array}{ccc}
1 & 4 & 3 \\
0 & 1 & \frac{4}{5} \\
0 & 0 & 0
\end{array}
\right]
\]

This matrix is now in row-echelon form.

5. Now create zeros in the entries above pivot positions in each column, in order to carry this matrix
all the way to reduced row-echelon form. Notice that there is no pivot position in the third column
so we do not need to create any zeros in this column! The column in which we need to create zeros
is the second. To do so, subtract 4 times the second row from the first row. The resulting matrix is

\[
\left[
\begin{array}{ccc}
1 & 0 & -\frac{1}{5} \\
0 & 1 & \frac{4}{5} \\
0 & 0 & 0
\end{array}
\right]
\]

This matrix is now in reduced row-echelon form.


The above algorithm gives you a simple way to obtain the row-echelon form and reduced row-echelon
form of a matrix. The main idea is to do row operations in such a way as to end up with a matrix in
row-echelon form or reduced row-echelon form. This process is important because the resulting matrix
will allow you to describe the solutions to the corresponding linear system of equations in a meaningful
way.
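
As a check on the hand computation of Example 1.20, a computer algebra system gives the same reduced row-echelon form (sympy here is an assumption):

```python
from sympy import Matrix

A = Matrix([[0, -5, -4], [1, 4, 3], [5, 10, 7]])
R, _ = A.rref()
print(R)   # Matrix([[1, 0, -1/5], [0, 1, 4/5], [0, 0, 0]])
```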

In the next example, we look at how to solve a system of equations using the corresponding augmented
matrix.

Example 1.21: Finding the Solution to a System


Give the complete solution to the following system of equations

2x + 4y - 3z = -1
5x + 10y - 7z = -2
3x + 6y + 5z = 9

Solution. The augmented matrix for this system is

\[
\left[
\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
5 & 10 & -7 & -2 \\
3 & 6 & 5 & 9
\end{array}
\right]
\]

In order to find the solution to this system, we wish to carry the augmented matrix to reduced row-
echelon form. We will do so using Algorithm 1.19. Notice that the first column is nonzero, so this is our
first pivot column. The first entry in the first row, 2, is the first leading entry and it is in the first pivot
position. We will use row operations to create zeros in the entries below the 2. First, replace the second
row with -5 times the first row plus 2 times the second row. This yields

\[
\left[
\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
3 & 6 & 5 & 9
\end{array}
\right]
\]

Now, replace the third row with -3 times the first row plus 2 times the third row. This yields

\[
\left[
\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 19 & 21
\end{array}
\right]
\]

Now the entries in the first column below the pivot position are zeros. We now look for the second pivot
column, which in this case is column three. Here, the 1 in the second row and third column is in the pivot
position. We need to do just one row operation to create a zero below the 1.
Taking -19 times the second row and adding it to the third row yields

\[
\left[
\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 2
\end{array}
\right]
\]

We could proceed with the algorithm to carry this matrix to row-echelon form or reduced row-echelon
form. However, remember that we are looking for the solutions to the system of equations. Take another
look at the third row of the matrix. Notice that it corresponds to the equation

0x + 0y + 0z = 2
There is no solution to this equation because for all x, y, z, the left side will equal 0 and 0 ≠ 2. This shows
there is no solution to the given system of equations. In other words, this system is inconsistent.
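
In the reduced row-echelon form of an inconsistent system, the constant column is a pivot column. A small sketch of this test (assuming sympy):

```python
from sympy import Matrix

# Augmented matrix of Example 1.21.
A = Matrix([[2, 4, -3, -1], [5, 10, -7, -2], [3, 6, 5, 9]])
R, pivots = A.rref()
# A pivot in the last (constant) column means some row reads [0 0 0 | 1],
# so the system is inconsistent.
print(A.cols - 1 in pivots)   # True: no solution
```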
The following is another example of how to find the solution to a system of equations by carrying the
corresponding augmented matrix to reduced row-echelon form.

Example 1.22: An Infinite Set of Solutions


Give the complete solution to the system of equations

3x - y - 5z = 9
y - 10z = 0     (1.8)
-2x + y = -6

Solution. The augmented matrix of this system is

\[
\left[
\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
-2 & 1 & 0 & -6
\end{array}
\right]
\]

In order to find the solution to this system, we will carry the augmented matrix to reduced row-echelon
form, using Algorithm 1.19. The first column is the first pivot column. We want to use row operations to
create zeros beneath the first entry in this column, which is in the first pivot position. Replace the third
row with 2 times the first row added to 3 times the third row. This gives

\[
\left[
\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
0 & 1 & -10 & 0
\end{array}
\right]
\]

Now, we have created zeros beneath the 3 in the first column, so we move on to the second pivot column
(which is the second column) and repeat the procedure. Take -1 times the second row and add to the third
row.

\[
\left[
\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

The entry below the pivot position in the second column is now a zero. Notice that we have no more pivot
columns because we have only two leading entries.
At this stage, we also want the leading entries to be equal to one. To do so, divide the first row by 3.

\[
\left[
\begin{array}{ccc|c}
1 & -\frac{1}{3} & -\frac{5}{3} & 3 \\
0 & 1 & -10 & 0 \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

This matrix is now in row-echelon form.


Let's continue with row operations until the matrix is in reduced row-echelon form. This involves
creating zeros above the pivot positions in each pivot column. This requires only one step, which is to add
1/3 times the second row to the first row.

\[
\left[
\begin{array}{ccc|c}
1 & 0 & -5 & 3 \\
0 & 1 & -10 & 0 \\
0 & 0 & 0 & 0
\end{array}
\right]
\]

This is in reduced row-echelon form, which you should verify using Definition 1.13. The equations
corresponding to this reduced row-echelon form are

x - 5z = 3
y - 10z = 0
or
x = 3 + 5z
y = 10z
Observe that z is not restrained by any equation. In fact, z can equal any number. For example, we can
let z = t, where we can choose t to be any number. In this context t is called a parameter. Therefore, the
solution set of this system is
x = 3 + 5t
y = 10t
z=t
where t is arbitrary. The system has an infinite set of solutions which are given by these equations. For
any value of t we select, x, y, and z will be given by the above equations. For example, if we choose t = 4
then the corresponding solution would be

x = 3 + 5(4) = 23
y = 10(4) = 40
z=4
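
sympy's linsolve handles the free variable automatically, expressing x and y in terms of z; a sketch for the system of Example 1.22 (sympy is an assumption):

```python
from sympy import symbols, linsolve

x, y, z = symbols("x y z")
system = [3*x - y - 5*z - 9, y - 10*z, -2*x + y + 6]
print(linsolve(system, x, y, z))   # {(5*z + 3, 10*z, z)}: z is free
```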


In Example 1.22 the solution involved one parameter. It may happen that the solution to a system
involves more than one parameter, as shown in the following example.

Example 1.23: A Two Parameter Set of Solutions


Find the solution to the system
x + 2y - z + w = 3
x + y - z + w = 1
x + 3y - z + w = 5

Solution. The augmented matrix is

\[
\left[
\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
1 & 1 & -1 & 1 & 1 \\
1 & 3 & -1 & 1 & 5
\end{array}
\right]
\]

We wish to carry this matrix to row-echelon form. Here, we will outline the row operations used. However,
make sure that you understand the steps in terms of Algorithm 1.19.
Take -1 times the first row and add to the second. Then take -1 times the first row and add to the
third. This yields

\[
\left[
\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
0 & -1 & 0 & 0 & -2 \\
0 & 1 & 0 & 0 & 2
\end{array}
\right]
\]

Now add the second row to the third row and divide the second row by -1.

\[
\left[
\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 0
\end{array}
\right]
\tag{1.9}
\]

This matrix is in row-echelon form and we can see that x and y correspond to pivot columns, while
z and w do not. Therefore, we will assign parameters to the variables z and w. Assign the parameter s
to z and the parameter t to w. Then the first row yields the equation x + 2y - s + t = 3, while the second
row yields the equation y = 2. Since y = 2, the first equation becomes x + 4 - s + t = 3 showing that the
solution is given by

x = -1 + s - t
y = 2
z = s
w = t

It is customary to write this solution in the form

\[
\left[
\begin{array}{c}
x \\ y \\ z \\ w
\end{array}
\right]
=
\left[
\begin{array}{c}
-1 + s - t \\ 2 \\ s \\ t
\end{array}
\right]
\tag{1.10}
\]


This example shows a system of equations with an infinite solution set which depends on two param-
eters. It can be less confusing in the case of an infinite solution set to first place the augmented matrix in
reduced row-echelon form rather than just row-echelon form before seeking to write down the description
of the solution.
In the above steps, this means we don't stop with the row-echelon form in equation 1.9. Instead we
first place it in reduced row-echelon form as follows.

\[
\left[
\begin{array}{cccc|c}
1 & 0 & -1 & 1 & -1 \\
0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 0
\end{array}
\right]
\]

Then the solution is y = 2 from the second row and x = -1 + z - w from the first. Thus letting z = s and
w = t, the solution is given by 1.10.
You can see here that there are two paths to the correct answer, both of which yield the same solution.
Hence, either approach may be used. The process which we first used in the above solution is called
Gaussian Elimination. This process involves carrying the matrix to row-echelon form, converting back to
equations, and using back substitution to find the solution. When you do row operations until you obtain
reduced row-echelon form, the process is called Gauss-Jordan Elimination.

We have now found solutions for systems of equations with no solution and infinitely many solutions,
with one parameter as well as two parameters. Recall the three types of solution sets which we discussed
in the previous section; no solution, one solution, and infinitely many solutions. Each of these types of
solutions could be identified from the graph of the system. It turns out that we can also identify the type
of solution from the reduced row-echelon form of the augmented matrix.

No Solution: In the case where the system of equations has no solution, the row-echelon form of
the augmented matrix will have a row of the form
 
0 0 0 | 1

This row indicates that the system is inconsistent and has no solution.

One Solution: In the case where the system of equations has one solution, every column of the
coefficient matrix is a pivot column. The following is an example of an augmented matrix in reduced
row-echelon form for a system of equations with one solution.

1 0 0 5
0 1 0 0
0 0 1 2

Infinitely Many Solutions: In the case where the system of equations has infinitely many solutions,
the solution contains parameters. There will be columns of the coefficient matrix which are not
pivot columns. The following are examples of augmented matrices in reduced row-echelon form for
systems of equations with infinitely many solutions.

1 0 0 5
0 1 2 3
0 0 0 0
or  
1 0 0 5
0 1 0 3

1.2.3. Uniqueness of the Reduced Row-Echelon Form

As we have seen in earlier sections, we know that every matrix can be brought into reduced row-echelon
form by a sequence of elementary row operations. Here we will prove that the resulting matrix is unique;
in other words, the resulting matrix in reduced row-echelon form does not depend upon the particular
sequence of elementary row operations or the order in which they were performed.
Let A be the augmented matrix of a homogeneous system of linear equations in the variables x_1, x_2, ..., x_n
which is also in reduced row-echelon form. The matrix A divides the set of variables into two different types.
We say that x_i is a basic variable whenever A has a leading 1 in column number i, in other words, when
column i is a pivot column. Otherwise we say that x_i is a free variable.
Recall Example 1.23.

Example 1.24: Basic and Free Variables


Find the basic and free variables in the system

x + 2y - z + w = 3
x + y - z + w = 1
x + 3y - z + w = 5

Solution. Recall from the solution of Example 1.23 that the row-echelon form of the augmented matrix of
this system is given by
\[
\left[
\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 0
\end{array}
\right]
\]
You can see that columns 1 and 2 are pivot columns. These columns correspond to variables x and y,
making these the basic variables. Columns 3 and 4 are not pivot columns, which means that z and w are
free variables.
We can write the solution to this system as
x = -1 + s - t
y=2
z=s
w=t

Here the free variables are written as parameters, and the basic variables are given by linear functions
of these parameters.
In general, all solutions can be written in terms of the free variables. In such a description, the free
variables can take any values (they become parameters), while the basic variables become simple linear
functions of these parameters. Indeed, a basic variable xi is a linear function of only those free variables
x j with j > i. This leads to the following observation.

Proposition 1.25: Basic and Free Variables


If x_i is a basic variable of a homogeneous system of linear equations, then any solution of the system
with x_j = 0 for all those free variables x_j with j > i must also have x_i = 0.

Using this proposition, we prove a lemma which will be used in the proof of the main result of this
section below.

Lemma 1.26: Solutions and the Reduced Row-Echelon Form of a Matrix


Let A and B be two distinct augmented matrices for two homogeneous systems of m equations in n
variables, such that A and B are each in reduced row-echelon form. Then, the two systems do not
have exactly the same solutions.

Proof. With respect to the linear systems associated with the matrices A and B, there are two cases to
consider:

Case 1: the two systems have the same basic variables

Case 2: the two systems do not have the same basic variables

In case 1, the two matrices will have exactly the same pivot positions. However, since A and B are not
identical, there is some row of A which is different from the corresponding row of B and yet the rows each
have a pivot in the same column position. Let i be the index of this column position. Since the matrices are
in reduced row-echelon form, the two rows must differ at some entry in a column j > i. Let these entries
be a in A and b in B, where a ≠ b. Since A is in reduced row-echelon form, if x_j were a basic variable
for its linear system, we would have a = 0. Similarly, if x_j were a basic variable for the linear system of
the matrix B, we would have b = 0. Since a and b are unequal, they cannot both be equal to 0, and hence
x_j cannot be a basic variable for both linear systems. However, since the systems have the same basic
variables, x_j must then be a free variable for each system. We now look at the solutions of the systems in
which x_j is set equal to 1 and all other free variables are set equal to 0. For this choice of parameters, the
solution of the system for matrix A has x_i = -a, while the solution of the system for matrix B has x_i = -b,
so that the two systems have different solutions.
In case 2, there is a variable $x_i$ which is a basic variable for one matrix, let's say $A$, and a free variable for the other matrix $B$. The system for matrix $B$ has a solution in which $x_i = 1$ and $x_j = 0$ for all other free variables $x_j$. However, by Proposition 1.25 this cannot be a solution of the system for the matrix $A$. This completes the proof of case 2.
Now, we say that the matrix B is equivalent to the matrix A provided that B can be obtained from A
by performing a sequence of elementary row operations beginning with A. The importance of this concept
lies in the following result.

Theorem 1.27: Equivalent Matrices


The two linear systems of equations corresponding to two equivalent augmented matrices have
exactly the same solutions.

The proof of this theorem is left as an exercise.


Now, we can use Lemma 1.26 and Theorem 1.27 to prove the main result of this section.

Theorem 1.28: Uniqueness of the Reduced Row-Echelon Form


Every matrix A is equivalent to a unique matrix in reduced row-echelon form.

Proof. Let $A$ be an $m \times n$ matrix and let $B$ and $C$ be matrices in reduced row-echelon form, each equivalent to $A$. It suffices to show that $B = C$.
Let $A^+$ be the matrix $A$ augmented with a new rightmost column consisting entirely of zeros. Similarly, augment matrices $B$ and $C$ each with a rightmost column of zeros to obtain $B^+$ and $C^+$. Note that $B^+$ and $C^+$ are matrices in reduced row-echelon form which are obtained from $A^+$ by respectively applying the same sequence of elementary row operations which were used to obtain $B$ and $C$ from $A$.
Now, $A^+$, $B^+$, and $C^+$ can all be considered as augmented matrices of homogeneous linear systems in the variables $x_1, x_2, \ldots, x_n$. Because $B^+$ and $C^+$ are each equivalent to $A^+$, Theorem 1.27 ensures that all three homogeneous linear systems have exactly the same solutions. By Lemma 1.26 we conclude that $B^+ = C^+$. By construction, we must also have $B = C$.
According to this theorem we can say that each matrix A has a unique reduced row-echelon form.
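This uniqueness is easy to observe experimentally. In the sketch below (Python with sympy; an illustration of the theorem rather than part of the text, and the sympy row-operation API is our assumption), we scramble a matrix with elementary row operations and check that the reduced row-echelon form is unchanged.

    from sympy import Matrix

    A = Matrix([
        [1, 2, -1, 1, 3],
        [1, 1, -1, 1, 1],
        [1, 3, -1, 1, 5],
    ])

    # Apply a different sequence of elementary row operations first
    B = A.elementary_row_op("n->n+km", row=2, k=5, row2=0)  # R3 -> R3 + 5 R1
    B = B.elementary_row_op("n<->m", row1=0, row2=1)        # swap R1 and R2

    # Both matrices row reduce to the same reduced row-echelon form
    print(A.rref()[0] == B.rref()[0])  # True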

1.2.4. Rank and Homogeneous Systems

There is a special type of system which requires additional study. This type of system is called a homo-
geneous system of equations, which we defined above in Definition 1.3. Our focus in this section is to
consider what types of solutions are possible for a homogeneous system of equations.
Consider the following definition.

Definition 1.29: Trivial Solution


Consider the homogeneous system of equations given by
$$\begin{array}{c} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0 \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0 \end{array}$$
Then, $x_1 = 0, x_2 = 0, \ldots, x_n = 0$ is always a solution to this system. We call this the trivial solution.

If the system has a solution in which not all of the $x_1, \ldots, x_n$ are equal to zero, then we call this solution
nontrivial. The trivial solution does not tell us much about the system, as it says that $0 = 0$! Therefore,
when working with homogeneous systems of equations, we want to know when the system has a nontrivial
solution.
Suppose we have a homogeneous system of $m$ equations, using $n$ variables, and suppose that $n > m$.
In other words, there are more variables than equations. Then, it turns out that this system always has
a nontrivial solution. Not only will the system have a nontrivial solution, but it also will have infinitely
many solutions. It is also possible, but not required, to have a nontrivial solution if $n = m$ or $n < m$.
Consider the following example.

Example 1.30: Solutions to a Homogeneous System of Equations


Find the nontrivial solutions to the following homogeneous system of equations
$$\begin{array}{c} 2x + y - z = 0 \\ x + 2y - 2z = 0 \end{array}$$

Solution. Notice that this system has m = 2 equations and n = 3 variables, so n > m. Therefore by our
previous discussion, we expect this system to have infinitely many solutions.
The process we use to find the solutions for a homogeneous system of equations is the same process
we used in the previous section. First, we construct the augmented matrix, given by
$$\left[\begin{array}{ccc|c} 2 & 1 & -1 & 0 \\ 1 & 2 & -2 & 0 \end{array}\right]$$
Then, we carry this matrix to its reduced row-echelon form, given below.
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{array}\right]$$

The corresponding system of equations is
$$x = 0,\quad y - z = 0$$
Since $z$ is not constrained by any equation, we know that this variable will become our parameter. Let $z = t$
where $t$ is any number. Therefore, our solution has the form
$$x = 0,\quad y = t,\quad z = t$$
Hence this system has infinitely many solutions, with one parameter $t$.
Suppose we were to write the solution to the previous example in another form. Specifically,
$$x = 0,\quad y = 0 + t,\quad z = 0 + t$$
can be written as
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
Notice that we have constructed a column from the constants in the solution (all equal to 0), as well as a
column corresponding to the coefficients on $t$ in each equation. While we will discuss this form of solution
more in further chapters, for now consider the column of coefficients of the parameter $t$. In this case, this
is the column $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
This column is called a basic solution. The basic solutions of a system are
columns constructed from the coefficients on parameters in the solution. We often denote basic solutions
by $X_1, X_2$, etc., depending on how many solutions occur. Therefore, Example 1.30 has the basic solution
$X_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
We explore this further in the following example.

Example 1.31: Basic Solutions of a Homogeneous System


Consider the following homogeneous system of equations.
$$\begin{array}{c} x + 4y + 3z = 0 \\ 3x + 12y + 9z = 0 \end{array}$$
Find the basic solutions to this system.

Solution. The augmented matrix of this system and the resulting reduced row-echelon form are
$$\left[\begin{array}{ccc|c} 1 & 4 & 3 & 0 \\ 3 & 12 & 9 & 0 \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{ccc|c} 1 & 4 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

When written in equations, this system is given by

x + 4y + 3z = 0

Notice that only $x$ corresponds to a pivot column. In this case, we will have two parameters, one for $y$ and
one for $z$. Let $y = s$ and $z = t$ for any numbers $s$ and $t$. Then, our solution becomes
$$x = -4s - 3t,\quad y = s,\quad z = t$$
which can be written as
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$
You can see here that we have two columns of coefficients corresponding to parameters, specifically one
for $s$ and one for $t$. Therefore, this system has two basic solutions! These are
$$X_1 = \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix},\quad X_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$


We now present a new definition.

Definition 1.32: Linear Combination


Let $X_1, \ldots, X_n, V$ be column matrices. Then $V$ is said to be a linear combination of the columns
$X_1, \ldots, X_n$ if there exist scalars $a_1, \ldots, a_n$ such that
$$V = a_1X_1 + \cdots + a_nX_n$$

A remarkable result of this section is that a linear combination of the basic solutions is again a solution
to the system. Even more remarkable is that every solution can be written as a linear combination of these
solutions. Therefore, if we take a linear combination of the two solutions to Example 1.31, this would also
be a solution. For example, we could take the following linear combination
$$3\begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}$$

You should take a moment to verify that
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}$$
is in fact a solution to the system in Example 1.31.
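A quick numerical check of this fact, using Python with numpy (our own illustrative sketch, not part of the text):

    import numpy as np

    A = np.array([[1, 4, 3],
                  [3, 12, 9]])   # coefficient matrix of Example 1.31
    X1 = np.array([-4, 1, 0])    # basic solutions found above
    X2 = np.array([-3, 0, 1])

    V = 3 * X1 + 2 * X2          # the linear combination above
    print(V)                     # [-18   3   2]
    print(A @ V)                 # [0 0], so V solves the homogeneous system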


Another way in which we can find out more information about the solutions of a homogeneous system
is to consider the rank of the associated coefficient matrix. We now define what is meant by the rank of a
matrix.

Definition 1.33: Rank of a Matrix


Let A be a matrix and consider any row-echelon form of A. Then, the number r of leading entries
of A does not depend on the row-echelon form you choose, and is called the rank of A. We denote
it by rank(A).

Similarly, we could count the number of pivot positions (or pivot columns) to determine the rank of A.

Example 1.34: Finding the Rank of a Matrix


Consider the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 5 & 9 \\ 2 & 4 & 6 \end{bmatrix}$$
What is its rank?

Solution. First, we need to find the reduced row-echelon form of $A$. Through the usual algorithm, we find
that this is
$$\begin{bmatrix} \boxed{1} & 0 & -1 \\ 0 & \boxed{1} & 2 \\ 0 & 0 & 0 \end{bmatrix}$$
Here we have two leading entries, or two pivot positions, shown above in boxes. The rank of $A$ is $r = 2$.

Notice that we would have achieved the same answer if we had found the row-echelon form of A
instead of the reduced row-echelon form.
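In software the rank is available directly; for example, with Python's numpy (an illustrative sketch, not part of the text):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [1, 5, 9],
                  [2, 4, 6]])
    print(np.linalg.matrix_rank(A))  # 2, matching the two pivot positions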
Suppose we have a homogeneous system of m equations in n variables, and suppose that n > m. From
our above discussion, we know that this system will have infinitely many solutions. If we consider the
rank of the coefficient matrix of this system, we can find out even more about the solution. Note that we
are looking at just the coefficient matrix, not the entire augmented matrix.

Theorem 1.35: Rank and Solutions to a Homogeneous System


Let $A$ be the $m \times n$ coefficient matrix corresponding to a homogeneous system of equations, and
suppose $A$ has rank $r$. Then, the solution to the corresponding system has $n - r$ parameters.

Consider our above Example 1.31 in the context of this theorem. The system in this example has m = 2
equations in n = 3 variables. First, because n > m, we know that the system has a nontrivial solution, and
therefore infinitely many solutions. This tells us that the solution will contain at least one parameter. The
rank of the coefficient matrix can tell us even more about the solution! The rank of the coefficient matrix
of the system is 1, as it has one leading entry in row-echelon form. Theorem 1.35 tells us that the solution
will have $n - r = 3 - 1 = 2$ parameters. You can check that this is true in the solution to Example 1.31.
Notice that if n = m or n < m, it is possible to have either a unique solution (which will be the trivial
solution) or infinitely many solutions.
We are not limited to homogeneous systems of equations here. The rank of a matrix can be used to
learn about the solutions of any system of linear equations. In the previous section, we discussed that a
system of equations can have no solution, a unique solution, or infinitely many solutions. Suppose the
system is consistent, whether it is homogeneous or not. The following theorem tells us how we can use
the rank to learn about the type of solution we have.

Theorem 1.36: Rank and Solutions to a Consistent System of Equations


Let $A$ be the $m \times (n + 1)$ augmented matrix corresponding to a consistent system of equations in $n$
variables, and suppose $A$ has rank $r$. Then

1. the system has a unique solution if r = n

2. the system has infinitely many solutions if r < n

We will not present a formal proof of this, but consider the following discussions.

1. No Solution The above theorem assumes that the system is consistent, that is, that it has a solution.
It turns out that it is possible for the augmented matrix of a system with no solution to have any
rank r as long as r > 1. Therefore, we must know that the system is consistent in order to use this
theorem!

2. Unique Solution Suppose r = n. Then, there is a pivot position in every column of the coefficient
matrix of A. Hence, there is a unique solution.

3. Infinitely Many Solutions Suppose $r < n$. Then there are infinitely many solutions. There are fewer
pivot positions (and hence fewer leading entries) than columns, meaning that not every column is a
pivot column. The columns which are not pivot columns correspond to parameters. In fact, in this
case we have $n - r$ parameters.
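These cases can be told apart mechanically by comparing ranks. The sketch below (Python with numpy) is our own illustration, combining Theorem 1.36 with the remark on inconsistent systems above; it is not the book's algorithm.

    import numpy as np

    def classify(A, b):
        # Compare ranks of the coefficient and augmented matrices
        Ab = np.hstack([A, b.reshape(-1, 1)])
        r, n = np.linalg.matrix_rank(A), A.shape[1]
        if np.linalg.matrix_rank(Ab) > r:
            return "no solution"
        if r == n:
            return "unique solution"
        return "infinitely many solutions, %d parameters" % (n - r)

    A = np.array([[1.0, 4, 3], [3, 12, 9]])
    b = np.zeros(2)
    print(classify(A, b))   # infinitely many solutions, 2 parameters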

Exercises

Exercise 1.2.4 Find the point $(x_1, y_1)$ which lies on both lines, $x + 3y = 1$ and $4x - y = 3$.

Exercise 1.2.5 Find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1.

Exercise 1.2.6 Do the three lines, $x + 2y = 1$, $2x - y = 1$, and $4x + 3y = 3$ have a common point of
intersection? If so, find the point and if not, tell why they don't have such a common point of intersection.

Exercise 1.2.7 Do the three planes, $x + y - 3z = 2$, $2x + y + z = 1$, and $3x + 2y - 2z = 0$ have a common
point of intersection? If so, find one and if not, tell why there is no such point.

Exercise 1.2.8 Four times the weight of Gaston is 150 pounds more than the weight of Ichabod. Four
times the weight of Ichabod is 660 pounds less than seventeen times the weight of Gaston. Four times the
weight of Gaston plus the weight of Siegfried equals 290 pounds. Brunhilde would balance all three of the
others. Find the weights of the four people.

Exercise 1.2.9 Consider the following augmented matrix in which $\ast$ denotes an arbitrary number and $\blacksquare$
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
$$\left[\begin{array}{cccc|c} \blacksquare & \ast & \ast & \ast & \ast \\ 0 & \blacksquare & \ast & 0 & \ast \\ 0 & 0 & \blacksquare & \ast & \ast \\ 0 & 0 & 0 & 0 & \blacksquare \end{array}\right]$$

Exercise 1.2.10 Consider the following augmented matrix in which $\ast$ denotes an arbitrary number and $\blacksquare$
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
$$\left[\begin{array}{ccc|c} \blacksquare & \ast & \ast & \ast \\ 0 & \blacksquare & \ast & \ast \\ 0 & 0 & \blacksquare & \ast \end{array}\right]$$

Exercise 1.2.11 Consider the following augmented matrix in which $\ast$ denotes an arbitrary number and $\blacksquare$
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
$$\left[\begin{array}{ccccc|c} \blacksquare & \ast & \ast & \ast & \ast & \ast \\ 0 & \blacksquare & \ast & 0 & 0 & \ast \\ 0 & 0 & 0 & \blacksquare & \ast & \ast \\ 0 & 0 & 0 & 0 & \blacksquare & \ast \end{array}\right]$$

Exercise 1.2.12 Consider the following augmented matrix in which $\ast$ denotes an arbitrary number and $\blacksquare$
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
$$\left[\begin{array}{ccccc|c} \blacksquare & \ast & \ast & \ast & \ast & \ast \\ 0 & \blacksquare & \ast & \ast & 0 & \ast \\ 0 & 0 & 0 & 0 & \blacksquare & 0 \\ 0 & 0 & 0 & 0 & 0 & \blacksquare \end{array}\right]$$

Exercise 1.2.13 Suppose a system of equations has fewer equations than variables. Will such a system
necessarily be consistent? If so, explain why and if not, give an example which is not consistent.

Exercise 1.2.14 If a system of equations has more equations than variables, can it have a solution? If so,
give an example and if not, tell why not.

Exercise 1.2.15 Find $h$ such that
$$\left[\begin{array}{cc|c} 2 & h & 4 \\ 3 & 6 & 7 \end{array}\right]$$
is the augmented matrix of an inconsistent system.

Exercise 1.2.16 Find $h$ such that
$$\left[\begin{array}{cc|c} 1 & h & 3 \\ 2 & 4 & 6 \end{array}\right]$$
is the augmented matrix of a consistent system.

Exercise 1.2.17 Find $h$ such that
$$\left[\begin{array}{cc|c} 1 & 1 & 4 \\ 3 & h & 12 \end{array}\right]$$
is the augmented matrix of a consistent system.

Exercise 1.2.18 Choose $h$ and $k$ such that the augmented matrix shown has each of the following:

(a) one solution

(b) no solution

(c) infinitely many solutions

$$\left[\begin{array}{cc|c} 1 & h & 2 \\ 2 & 4 & k \end{array}\right]$$

Exercise 1.2.19 Choose $h$ and $k$ such that the augmented matrix shown has each of the following:

(a) one solution

(b) no solution

(c) infinitely many solutions

$$\left[\begin{array}{cc|c} 1 & 2 & 2 \\ 2 & h & k \end{array}\right]$$

Exercise 1.2.20 Determine if the system is consistent. If so, is the solution unique?
$$\begin{array}{c} x + 2y + z - w = 2 \\ x - y + z + w = 1 \\ 2x + y - z = 1 \\ 4x + 2y + z = 5 \end{array}$$

Exercise 1.2.21 Determine if the system is consistent. If so, is the solution unique?
$$\begin{array}{c} x + 2y + z - w = 2 \\ x - y + z + w = 0 \\ 2x + y - z = 1 \\ 4x + 2y + z = 3 \end{array}$$

Exercise 1.2.22 Determine which matrices are in reduced row-echelon form.

(a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 5 \\ 0 & 0 & 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{bmatrix}$

Exercise 1.2.23 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 0 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$

Exercise 1.2.24 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix}$$

Exercise 1.2.25 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 3 & 6 & 7 & 8 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 4 \end{bmatrix}$$

Exercise 1.2.26 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 2 & 4 & 5 & 15 \\ 1 & 2 & 3 & 9 \\ 1 & 2 & 2 & 6 \end{bmatrix}$$

Exercise 1.2.27 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 4 & 1 & 7 & 10 \\ 1 & 0 & 3 & 3 \\ 1 & 1 & 2 & 1 \end{bmatrix}$$

Exercise 1.2.28 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 3 & 5 & 4 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 0 \end{bmatrix}$$

Exercise 1.2.29 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
$$\begin{bmatrix} 2 & 3 & 8 & 7 \\ 1 & 2 & 5 & 5 \\ 1 & 3 & 7 & 8 \end{bmatrix}$$

Exercise 1.2.30 Find the solution of the system whose augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 1 & 3 & 4 & 2 \\ 1 & 0 & 2 & 1 \end{array}\right]$$

Exercise 1.2.31 Find the solution of the system whose augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 2 & 0 & 1 & 1 \\ 3 & 2 & 1 & 3 \end{array}\right]$$

Exercise 1.2.32 Find the solution of the system whose augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 1 & 0 & 1 \\ 1 & 0 & 4 & 2 \end{array}\right]$$

Exercise 1.2.33 Find the solution of the system whose augmented matrix is
$$\left[\begin{array}{ccccc|c} 1 & 0 & 2 & 1 & 1 & 2 \\ 0 & 1 & 0 & 1 & 2 & 1 \\ 1 & 2 & 0 & 0 & 1 & 3 \\ 1 & 0 & 1 & 0 & 2 & 2 \end{array}\right]$$

Exercise 1.2.34 Find the solution of the system whose augmented matrix is
$$\left[\begin{array}{ccccc|c} 1 & 0 & 2 & 1 & 1 & 2 \\ 0 & 1 & 0 & 1 & 2 & 1 \\ 0 & 2 & 0 & 0 & 1 & 3 \\ 1 & 1 & 2 & 2 & 2 & 0 \end{array}\right]$$

Exercise 1.2.35 Find the solution to the system of equations, 7x + 14y + 15z = 22, 2x + 4y + 3z = 5, and
3x + 6y + 10z = 13.

Exercise 1.2.36 Find the solution to the system of equations, $3x - y + 4z = 6$, $y + 8z = 0$, and $2x + y = 4$.

Exercise 1.2.37 Find the solution to the system of equations, $9x - 2y + 4z = 17$, $13x - 3y + 6z = 25$,
and $2x - z = 3$.

Exercise 1.2.38 Find the solution to the system of equations, 65x + 84y + 16z = 546, 81x + 105y + 20z =
682, and 84x + 110y + 21z = 713.

Exercise 1.2.39 Find the solution to the system of equations, $8x + 2y + 3z = 3$, $8x + 3y + 3z = 1$, and
$4x + y + 3z = 9$.

Exercise 1.2.40 Find the solution to the system of equations, 8x + 2y + 5z = 18, 8x + 3y + 5z = 13,
and 4x + y + 5z = 19.

Exercise 1.2.41 Find the solution to the system of equations, $3x - y - 2z = 3$, $y - 4z = 0$, and $2x + y = 2$.

Exercise 1.2.42 Find the solution to the system of equations, 9x + 15y = 66, 11x + 18y = 79, x + y =
4, and z = 3.

Exercise 1.2.43 Find the solution to the system of equations, 19x + 8y = 108, 71x + 30y = 404,
2x + y = 12, 4x + z = 14.

Exercise 1.2.44 Suppose a system of equations has fewer equations than variables and you have found a
solution to this system of equations. Is it possible that your solution is the only one? Explain.

Exercise 1.2.45 Suppose a system of linear equations has a $2 \times 4$ augmented matrix and the last column
is a pivot column. Could the system of linear equations be consistent? Explain.

Exercise 1.2.46 Suppose the coefficient matrix of a system of n equations with n variables has the property
that every column is a pivot column. Does it follow that the system of equations must have a solution? If
so, must the solution be unique? Explain.

Exercise 1.2.47 Suppose there is a unique solution to a system of linear equations. What must be true of
the pivot columns in the augmented matrix?

Exercise 1.2.48 The steady state temperature, $u$, of a plate solves Laplace's equation, $\Delta u = 0$. One way
to approximate the solution is to divide the plate into a square mesh and require the temperature at each
node to equal the average of the temperature at the four adjacent nodes. In the following picture, the
numbers represent the observed temperature at the indicated nodes. Find the temperature at the interior
nodes, indicated by $x$, $y$, $z$, and $w$. One of the equations is $z = \frac{1}{4}(10 + 0 + w + x)$.

        30   30
    20   y    w    0
    20   x    z    0
        10   10

Exercise 1.2.49 Find the rank of the following matrix.
$$\begin{bmatrix} 4 & 16 & 1 & 5 \\ 1 & 4 & 0 & 1 \\ 1 & 4 & 1 & 2 \end{bmatrix}$$

Exercise 1.2.50 Find the rank of the following matrix.
$$\begin{bmatrix} 3 & 6 & 5 & 12 \\ 1 & 2 & 2 & 5 \\ 1 & 2 & 1 & 2 \end{bmatrix}$$

Exercise 1.2.51 Find the rank of the following matrix.
$$\begin{bmatrix} 0 & 0 & 1 & 0 & 3 \\ 1 & 4 & 1 & 0 & 8 \\ 1 & 4 & 0 & 1 & 2 \\ 1 & 4 & 0 & 1 & 2 \end{bmatrix}$$

Exercise 1.2.52 Find the rank of the following matrix.
$$\begin{bmatrix} 4 & 4 & 3 & 9 \\ 1 & 1 & 1 & 2 \\ 1 & 1 & 0 & 3 \end{bmatrix}$$

Exercise 1.2.53 Find the rank of the following matrix.
$$\begin{bmatrix} 2 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 7 \\ 1 & 0 & 0 & 1 & 7 \end{bmatrix}$$

Exercise 1.2.54 Find the rank of the following matrix.
$$\begin{bmatrix} 4 & 15 & 29 \\ 1 & 4 & 8 \\ 1 & 3 & 5 \\ 3 & 9 & 15 \end{bmatrix}$$

Exercise 1.2.55 Find the rank of the following matrix.
$$\begin{bmatrix} 0 & 0 & 1 & 0 & 1 \\ 1 & 2 & 3 & 2 & 18 \\ 1 & 2 & 2 & 1 & 11 \\ 1 & 2 & 2 & 1 & 11 \end{bmatrix}$$

Exercise 1.2.56 Find the rank of the following matrix.
$$\begin{bmatrix} 1 & 2 & 0 & 3 & 11 \\ 1 & 2 & 0 & 4 & 15 \\ 1 & 2 & 0 & 3 & 11 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

Exercise 1.2.57 Find the rank of the following matrix.
$$\begin{bmatrix} 2 & 3 & 2 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \\ 3 & 0 & 3 \end{bmatrix}$$

Exercise 1.2.58 Find the rank of the following matrix.
$$\begin{bmatrix} 4 & 4 & 20 & 1 & 17 \\ 1 & 1 & 5 & 0 & 5 \\ 1 & 1 & 5 & 1 & 2 \\ 3 & 3 & 15 & 3 & 6 \end{bmatrix}$$

Exercise 1.2.59 Find the rank of the following matrix.
$$\begin{bmatrix} 1 & 3 & 4 & 3 & 8 \\ 1 & 3 & 4 & 2 & 5 \\ 1 & 3 & 4 & 1 & 2 \\ 2 & 6 & 8 & 2 & 4 \end{bmatrix}$$

Exercise 1.2.60 Suppose $A$ is an $m \times n$ matrix. Explain why the rank of $A$ is always no larger than
$\min(m, n)$.

Exercise 1.2.61 State whether each of the following sets of data are possible for the matrix equation
$AX = B$. If possible, describe the solution set. That is, tell whether there exists a unique solution, no
solution or infinitely many solutions. Here, $[A|B]$ denotes the augmented matrix.

(a) $A$ is a $5 \times 6$ matrix, $\text{rank}(A) = 4$ and $\text{rank}[A|B] = 4$.

(b) $A$ is a $3 \times 4$ matrix, $\text{rank}(A) = 3$ and $\text{rank}[A|B] = 2$.

(c) $A$ is a $4 \times 2$ matrix, $\text{rank}(A) = 4$ and $\text{rank}[A|B] = 4$.

(d) $A$ is a $5 \times 5$ matrix, $\text{rank}(A) = 4$ and $\text{rank}[A|B] = 5$.

(e) $A$ is a $4 \times 2$ matrix, $\text{rank}(A) = 2$ and $\text{rank}[A|B] = 2$.

Exercise 1.2.62 Consider the system $5x + 2y - z = 0$ and $5x - 2y - z = 0$. Both equations equal zero
and so $5x + 2y - z = 5x - 2y - z$, which is equivalent to $y = 0$. Does it follow that $x$ and $z$ can equal
anything? Notice that when $x = 1$, $z = 4$, and $y = 0$ are plugged in to the equations, the equations do
not equal 0. Why?
2. Matrices

2.1 Matrix Arithmetic

Outcomes
A. Perform the matrix operations of matrix addition, scalar multiplication, transposition and ma-
trix multiplication. Identify when these operations are not defined. Represent these operations
in terms of the entries of a matrix.

B. Prove algebraic properties for matrix addition, scalar multiplication, transposition, and ma-
trix multiplication. Apply these properties to manipulate an algebraic expression involving
matrices.

C. Compute the inverse of a matrix using row operations, and prove identities involving matrix
inverses.

E. Solve a linear system using matrix algebra.

F. Use multiplication by an elementary matrix to apply row operations.

G. Write a matrix as a product of elementary matrices.

You have now solved systems of equations by writing them in terms of an augmented matrix and
then doing row operations on this augmented matrix. It turns out that matrices are important not only for
systems of equations but also in many applications.
Recall that a matrix is a rectangular array of numbers. Several of them are referred to as matrices.
For example, here is a matrix.
$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{bmatrix} \tag{2.1}$$
Recall that the size or dimension of a matrix is defined as $m \times n$ where $m$ is the number of rows and $n$ is
the number of columns. The above matrix is a $3 \times 4$ matrix because there are three rows and four columns.
You can remember the columns are like columns in a Greek temple. They stand upright while the rows
lie flat like rows made by a tractor in a plowed field.
When specifying the size of a matrix, you always list the number of rows before the number of
columns. You might remember that you always list the rows before the columns by using the phrase
"Rowman Catholic."


Consider the following definition.

Definition 2.1: Square Matrix


A matrix $A$ which has size $n \times n$ is called a square matrix. In other words, $A$ is a square matrix if
it has the same number of rows and columns.

There is some notation specific to matrices which we now introduce. We denote the columns of a
matrix $A$ by $A_j$ as follows:
$$A = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}$$
Therefore, $A_j$ is the $j$th column of $A$, when counted from left to right.
The individual elements of the matrix are called entries or components of $A$. Elements of the matrix
are identified according to their position. The $(i, j)$-entry of a matrix is the entry in the $i$th row and $j$th
column. For example, in the matrix (2.1) above, 8 is in position $(2, 3)$ (and is called the $(2, 3)$-entry) because
it is in the second row and the third column.
In order to remember which matrix we are speaking of, we will denote the entry in the $i$th row and
the $j$th column of matrix $A$ by $a_{ij}$. Then, we can write $A$ in terms of its entries, as $A = [a_{ij}]$. Using this
notation on the matrix in (2.1), $a_{23} = 8$, $a_{32} = 9$, $a_{12} = 2$, etc.
There are various operations which are done on matrices of appropriate sizes. Matrices can be added
to and subtracted from other matrices, multiplied by a scalar, and multiplied by other matrices. We will
never divide a matrix by another matrix, but we will see later how matrix inverses play a similar role.
In doing arithmetic with matrices, we often define the action by what happens in terms of the entries
(or components) of the matrices. Before looking at these operations in depth, consider a few general
definitions.

Definition 2.2: The Zero Matrix


The $m \times n$ zero matrix is the $m \times n$ matrix having every entry equal to zero. It is denoted by 0.

One possible zero matrix is shown in the following example.

Example 2.3: The Zero Matrix


 
The $2 \times 3$ zero matrix is $0 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.

Note there is a $2 \times 3$ zero matrix, a $3 \times 4$ zero matrix, etc. In fact there is a zero matrix for every size!

Definition 2.4: Equality of Matrices


   
Let $A$ and $B$ be two $m \times n$ matrices. Then $A = B$ means that for $A = [a_{ij}]$ and $B = [b_{ij}]$, $a_{ij} = b_{ij}$
for all $1 \leq i \leq m$ and $1 \leq j \leq n$.
2.1. Matrix Arithmetic 43

In other words, two matrices are equal exactly when they are the same size and the corresponding
entries are identical. Thus
$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \neq \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
because they are different sizes. Also,
$$\begin{bmatrix} 0 & 1 \\ 3 & 2 \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$$
because, although they are the same size, their corresponding entries are not identical.
In the following section, we explore addition of matrices.

2.1.1. Addition of Matrices

When adding matrices, all matrices in the sum need to have the same size. For example,
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{bmatrix}$$
and
$$\begin{bmatrix} 1 & 4 & 8 \\ 2 & 8 & 5 \end{bmatrix}$$
cannot be added, as one has size $3 \times 2$ while the other has size $2 \times 3$.
However, the addition
$$\begin{bmatrix} 4 & 6 & 3 \\ 5 & 0 & 4 \\ 11 & 2 & 3 \end{bmatrix} + \begin{bmatrix} 0 & 5 & 0 \\ 4 & 4 & 14 \\ 1 & 2 & 6 \end{bmatrix}$$
is possible.
The formal definition is as follows.

Definition 2.5: Addition of Matrices


   
Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be two $m \times n$ matrices. Then $A + B = C$ where $C$ is the $m \times n$ matrix
$C = [c_{ij}]$ defined by
$$c_{ij} = a_{ij} + b_{ij}$$
This is demonstrated in the next example.

Example 2.6: Addition of Matrices of Same Size


Add the following matrices, if possible.
$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & 4 \end{bmatrix},\quad B = \begin{bmatrix} 5 & 2 & 3 \\ 6 & 2 & 1 \end{bmatrix}$$
44 Matrices

Solution. Notice that both $A$ and $B$ are of size $2 \times 3$. Since $A$ and $B$ are of the same size, the addition is
possible. Using Definition 2.5, the addition is done as follows.
$$A + B = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 2 & 3 \\ 6 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+2 & 3+3 \\ -1+6 & 0+2 & 4+1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 6 \\ 5 & 2 & 5 \end{bmatrix}$$


Addition of matrices obeys very much the same properties as normal addition with numbers. Note that
when we write for example A + B then we assume that both matrices are of equal size so that the operation
is indeed possible.

Proposition 2.7: Properties of Matrix Addition


Let $A$, $B$ and $C$ be matrices. Then, the following properties hold.

Commutative Law of Addition
$$A + B = B + A \tag{2.2}$$

Associative Law of Addition
$$(A + B) + C = A + (B + C) \tag{2.3}$$

Existence of an Additive Identity: There exists a zero matrix 0 such that
$$A + 0 = A \tag{2.4}$$

Existence of an Additive Inverse: There exists a matrix $-A$ such that
$$A + (-A) = 0 \tag{2.5}$$

Proof. Consider the Commutative Law of Addition given in (2.2). Let $A$, $B$, $C$, and $D$ be matrices such that
$A + B = C$ and $B + A = D$. We want to show that $D = C$. To do so, we will use the definition of matrix
addition given in Definition 2.5. Now,
$$c_{ij} = a_{ij} + b_{ij} = b_{ij} + a_{ij} = d_{ij}$$
Therefore, $C = D$ because the $ij$th entries are the same for all $i$ and $j$. Note that the conclusion follows
from the commutative law of addition of numbers, which says that if $a$ and $b$ are two numbers, then
$a + b = b + a$. The proofs of the other results are similar, and are left as an exercise.
We call the zero matrix in (2.4) the additive identity. Similarly, we call the matrix $-A$ in (2.5) the
additive inverse. $-A$ is defined to equal $(-1)A = [-a_{ij}]$. In other words, every entry of $A$ is multiplied
by $-1$. In the next section we will study scalar multiplication in more depth to understand what is meant
by $(-1)A$.

2.1.2. Scalar Multiplication of Matrices

Recall that we use the word scalar when referring to numbers. Therefore, scalar multiplication of a matrix
is the multiplication of a matrix by a number. To illustrate this concept, consider the following example in
which a matrix is multiplied by the scalar 3.
$$3\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 & 12 \\ 15 & 6 & 24 & 21 \\ 18 & 27 & 3 & 6 \end{bmatrix}$$
The new matrix is obtained by multiplying every entry of the original matrix by the given scalar.
The formal definition of scalar multiplication is as follows.
The formal definition of scalar multiplication is as follows.

Definition 2.8: Scalar Multiplication of Matrices


   
If $A = [a_{ij}]$ and $k$ is a scalar, then $kA = [ka_{ij}]$.

Consider the following example.

Example 2.9: Effect of Multiplication by a Scalar


Find the result of multiplying the following matrix $A$ by 7.
$$A = \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix}$$

Solution. By Definition 2.8, we multiply each element of $A$ by 7. Therefore,
$$7A = 7\begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 7(2) & 7(0) \\ 7(1) & 7(4) \end{bmatrix} = \begin{bmatrix} 14 & 0 \\ 7 & 28 \end{bmatrix}$$


Similarly to addition of matrices, there are several properties of scalar multiplication which hold.

Proposition 2.10: Properties of Scalar Multiplication


Let $A$, $B$ be matrices, and $k$, $p$ be scalars. Then, the following properties hold.

Distributive Law over Matrix Addition
$$k(A + B) = kA + kB$$

Distributive Law over Scalar Addition
$$(k + p)A = kA + pA$$

Associative Law for Scalar Multiplication
$$k(pA) = (kp)A$$

Rule for Multiplication by 1
$$1A = A$$

The proof of this proposition is similar to the proof of Proposition 2.7 and is left as an exercise to the
reader.

2.1.3. Multiplication of Matrices

The next important matrix operation we will explore is multiplication of matrices. The operation of matrix
multiplication is one of the most important and useful of the matrix operations. Throughout this section,
we will also demonstrate how matrix multiplication relates to linear systems of equations.
First, we provide a formal definition of row and column vectors.

Definition 2.11: Row and Column Vectors


Matrices of size $n \times 1$ or $1 \times n$ are called vectors. If $X$ is such a matrix, then we write $x_i$ to denote
the entry of $X$ in the $i$th row of a column matrix, or the $i$th column of a row matrix.
The $n \times 1$ matrix
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
is called a column vector. The $1 \times n$ matrix
$$X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$$
is called a row vector.

We may simply use the term vector throughout this text to refer to either a column or row vector. If
we do so, the context will make it clear which we are referring to.

In this chapter, we will again use the notion of linear combination of vectors as in Definition 4.7. In
this context, a linear combination is a sum consisting of vectors multiplied by scalars. For example,
       
$$\begin{bmatrix} 50 \\ 122 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9\begin{bmatrix} 3 \\ 6 \end{bmatrix}$$
is a linear combination of three vectors.
It turns out that we can express any system of linear equations as a linear combination of vectors. In
fact, the vectors that we will use are just the columns of the corresponding augmented matrix!

Definition 2.12: The Vector Form of a System of Linear Equations


Suppose we have a system of equations given by
$$\begin{array}{c} a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m \end{array}$$
We can express this system in vector form which is as follows:
$$x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

Notice that each vector used here is one column from the corresponding augmented matrix. There is
one vector for each variable in the system, along with the constant vector.
The first important form of matrix multiplication is multiplying a matrix by a vector. Consider the
product given by
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}$$
We will soon see that this equals
$$7\begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9\begin{bmatrix} 3 \\ 6 \end{bmatrix} = \begin{bmatrix} 50 \\ 122 \end{bmatrix}$$
In general terms,
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1\begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} + x_3\begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{bmatrix}$$
Thus you take $x_1$ times the first column, add to $x_2$ times the second column, and finally $x_3$ times the third
column. The above sum is a linear combination of the columns of the matrix. When you multiply a matrix

on the left by a vector on the right, the numbers making up the vector are just the scalars to be used in the
linear combination of the columns as illustrated above.
Here is the formal definition of how to multiply an $m \times n$ matrix by an $n \times 1$ column vector.

Definition 2.13: Multiplication of Vector by Matrix


 
Let $A = [a_{ij}]$ be an $m \times n$ matrix and let $X$ be an $n \times 1$ matrix given by
$$A = \begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix},\quad X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
Then the product $AX$ is the $m \times 1$ column vector which equals the following linear combination of
the columns of $A$:
$$x_1A_1 + x_2A_2 + \cdots + x_nA_n = \sum_{j=1}^{n} x_jA_j$$
If we write the columns of $A$ in terms of their entries, they are of the form
$$A_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$
Then, we can write the product $AX$ as
$$AX = x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

Note that multiplication of an $m \times n$ matrix and an $n \times 1$ vector produces an $m \times 1$ vector.


Here is an example.

Example 2.14: A Vector Multiplied by a Matrix


Compute the product $AX$ for
$$A = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{bmatrix},\quad X = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix}$$

Solution. We will use Definition 2.13 to compute the product. Therefore, we compute the product $AX$ as
follows.
$$1\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + 2\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} + 1\begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \\ 5 \end{bmatrix}$$


Using the above operation, we can also write a system of linear equations in matrix form. In this
form, we express the system as a matrix multiplied by a vector. Consider the following definition.

Definition 2.15: The Matrix Form of a System of Linear Equations


Suppose we have a system of equations given by
$$\begin{array}{c} a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + \cdots + a_{2n}x_n = b_2 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m \end{array}$$
Then we can express this system in matrix form as follows.
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

The expression AX = B is also known as the Matrix Form of the corresponding system of linear
equations. The matrix A is simply the coefficient matrix of the system, the vector X is the column vector
constructed from the variables of the system, and finally the vector B is the column vector constructed
from the constants of the system. It is important to note that any system of linear equations can be written
in this form.
Notice that if we write a homogeneous system of equations in matrix form, it would have the form
AX = 0, for the zero vector 0.
You can see from this definition that a vector
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
will satisfy the equation $AX = B$ only when the entries $x_1, x_2, \ldots, x_n$ of the vector $X$ are solutions to the
original system.
original system.
Now that we have examined how to multiply a matrix by a vector, we wish to consider the case where
we multiply two matrices of more general sizes, although these sizes still need to be appropriate as we will
see. For example, in Example 2.14, we multiplied a 3 4 matrix by a 4 1 vector. We want to investigate
how to multiply other sizes of matrices.
We have not yet given any conditions on when matrix multiplication is possible! For matrices A and
B, in order to form the product AB, the number of columns of A must equal the number of rows of B.
Consider a product AB where A has size m n and B has size n p. Then, the product in terms of size of
matrices is given by
these must match!
(m d
n) (n p ) = m p
Note the two outside numbers give the size of the product. One of the most important rules regarding
matrix multiplication is the following. If the two middle numbers dont match, you cant multiply the
matrices!
When the number of columns of A equals the number of rows of B the two matrices are said to be
conformable and the product AB is obtained as follows.

Definition 2.16: Multiplication of Two Matrices


Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times p$ matrix of the form
$$B = \begin{bmatrix} B_1 & \cdots & B_p \end{bmatrix}$$
where $B_1, \ldots, B_p$ are the $n \times 1$ columns of $B$. Then the $m \times p$ matrix $AB$ is defined as follows:
$$AB = A\begin{bmatrix} B_1 & \cdots & B_p \end{bmatrix} = \begin{bmatrix} (AB)_1 & \cdots & (AB)_p \end{bmatrix}$$
where $(AB)_k$ is an $m \times 1$ matrix or column vector which gives the $k$th column of $AB$.

Consider the following example.

Example 2.17: Multiplying Two Matrices


Find $AB$ if possible.
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix},\quad B = \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 1 \\ 2 & 1 & 1 \end{bmatrix}$$

Solution. The first thing you need to verify when calculating a product is whether the multiplication is
possible. The first matrix has size $2 \times 3$ and the second matrix has size $3 \times 3$. The inside numbers are
equal, so $A$ and $B$ are conformable matrices. According to the above discussion $AB$ will be a $2 \times 3$ matrix.
Definition 2.16 gives us a way to calculate each column of $AB$, as follows.
$$\text{First column: } \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix},\quad \text{Second column: } \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix},\quad \text{Third column: } \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
You know how to multiply a matrix times a vector, using Definition 2.13 for each of the three columns.
Thus
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 1 \\ 2 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 9 & 3 \\ 2 & 7 & 3 \end{bmatrix}$$

Since vectors are simply $n \times 1$ or $1 \times m$ matrices, we can also multiply a vector by another vector.

Example 2.18: Vector Times Vector Multiplication



Multiply if possible $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix}$.

Solution. In this case we are multiplying a matrix of size $3 \times 1$ by a matrix of size $1 \times 4$. The inside
numbers match so the product is defined. Note that the product will be a matrix of size $3 \times 4$. Using
Definition 2.16, we can compute this product column by column: the first column is $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}1$, the second is $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}2$, the third is $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}1$, and the fourth is $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}0$.
You can use Definition 2.13 to verify that this product is
$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{bmatrix}$$

Example 2.19: A Multiplication Which is Not Defined


Find $BA$ if possible.
$$B = \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 1 \\ 2 & 1 & 1 \end{bmatrix},\quad A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}$$

Solution. First check if it is possible. This product is of the form $(3 \times 3)(2 \times 3)$. The inside numbers do
not match and so you can't do this multiplication.
In this case, we say that the multiplication is not defined. Notice that these are the same matrices which
we used in Example 2.17. In this example, we tried to calculate $BA$ instead of $AB$. This demonstrates
another property of matrix multiplication. While the product $AB$ may be defined, we cannot assume
that the product $BA$ will be possible. Therefore, it is important to always check that the product is defined
before carrying out any calculations.
Earlier, we defined the zero matrix 0 to be the matrix (of appropriate size) containing zeros in all
entries. Consider the following example for multiplication by the zero matrix.

Example 2.20: Multiplication by the Zero Matrix


Compute the product $A0$ for the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$
and the $2 \times 2$ zero matrix given by
$$0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Solution. In this product, we compute
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
Hence, $A0 = 0$.
Notice that we could also multiply $A$ by the $2 \times 1$ zero vector given by $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$. The result would be the
$2 \times 1$ zero vector. Therefore, it is always the case that $A0 = 0$, for an appropriately sized zero matrix or
vector.

2.1.4. The ijth Entry of a Product

In previous sections, we used the entries of a matrix to describe the action of matrix addition and scalar
multiplication. We can also study matrix multiplication using the entries of matrices.
What is the $ij$th entry of $AB$? It is the entry in the $i$th row and the $j$th column of the product $AB$.
Now if $A$ is $m \times n$ and $B$ is $n \times p$, then we know that the product $AB$ has the form
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix}$$
The $j$th column of $AB$ is of the form
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}$$
which is an $m \times 1$ column vector. It is calculated by
$$b_{1j}\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + b_{2j}\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + b_{nj}\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$
Therefore, the $ij$th entry is the entry in row $i$ of this vector. This is computed by
$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
The following is the formal definition for the $ij$th entry of a product of matrices.

Definition 2.21: The ijth Entry of a Product

Let $A = [a_{ij}]$ be an $m \times n$ matrix and let $B = [b_{ij}]$ be an $n \times p$ matrix. Then $AB$ is an $m \times p$ matrix
and the $(i, j)$-entry of $AB$ is defined as
$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
Another way to write this is
$$(AB)_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$

In other words, to find the $(i, j)$-entry of the product $AB$, or $(AB)_{ij}$, you multiply the $i$th row of $A$, on
the left, by the $j$th column of $B$. To express $AB$ in terms of its entries, we write $AB = [(AB)_{ij}]$.
Consider the following example.

Example 2.22: The Entries of a Product


Compute $AB$ if possible. If it is, find the $(3, 2)$-entry of $AB$ using Definition 2.21.
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix},\quad B = \begin{bmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \end{bmatrix}$$

Solution. First check if the product is possible. It is of the form $(3 \times 2)(2 \times 3)$ and since the inside
numbers match, it is possible to do the multiplication. The result should be a $3 \times 3$ matrix. We can first
compute $AB$:
$$\begin{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ 7 \end{bmatrix}, & \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 3 \\ 6 \end{bmatrix}, & \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} \end{bmatrix}$$
where the commas separate the columns in the resulting product. Thus the above product equals
$$\begin{bmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{bmatrix}$$
which is a $3 \times 3$ matrix as desired. Thus, the $(3, 2)$-entry equals 42.
Now using Definition 2.21, we can find that the $(3, 2)$-entry equals
$$\sum_{k=1}^{2} a_{3k}b_{k2} = a_{31}b_{12} + a_{32}b_{22} = 2 \cdot 3 + 6 \cdot 6 = 42$$
Consulting our result for $AB$ above, this is correct!
You may wish to use this method to verify that the rest of the entries in $AB$ are correct.
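One way to do that checking is with a short program. The sketch below (Python with numpy; an illustration of Definition 2.21, not part of the text) computes the $(3, 2)$-entry by the explicit sum and compares it with the full product.

    import numpy as np

    A = np.array([[1, 2],
                  [3, 1],
                  [2, 6]])
    B = np.array([[2, 3, 1],
                  [7, 6, 2]])

    i, j = 2, 1   # the (3, 2)-entry, shifted to 0-based indices
    entry = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
    print(entry)            # 42
    print((A @ B)[i, j])    # 42, agreeing with the sum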
Here is another example.

Example 2.23: Finding the Entries of a Product


Determine if the product $AB$ is defined. If it is, find the $(2, 1)$-entry of the product.
$$A = \begin{bmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{bmatrix},\quad B = \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}$$

Solution. This product is of the form $(3 \times 3)(3 \times 2)$. The middle numbers match so the matrices are
conformable and it is possible to compute the product.
We want to find the $(2, 1)$-entry of $AB$, that is, the entry in the second row and first column of the
product. We will use Definition 2.21, which states
$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
In this case, $n = 3$, $i = 2$ and $j = 1$. Hence the $(2, 1)$-entry is found by computing
$$(AB)_{21} = \sum_{k=1}^{3} a_{2k}b_{k1} = \begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}$$
Substituting in the appropriate values, this product becomes
$$\begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} = \begin{bmatrix} 7 & 6 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = 7 \cdot 1 + 6 \cdot 3 + 2 \cdot 2 = 29$$
Hence, $(AB)_{21} = 29$.
You should take a moment to find a few other entries of $AB$. You can multiply the matrices to check
that your answers are correct. The product $AB$ is given by
$$AB = \begin{bmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{bmatrix}$$

2.1.5. Properties of Matrix Multiplication

As pointed out above, it is sometimes possible to multiply matrices in one order but not in the other order.
However, even if both AB and BA are defined, they may not be equal.

Example 2.24: Matrix Multiplication is Not Commutative


   
Compare the products $AB$ and $BA$, for matrices $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

Solution. First, notice that $A$ and $B$ are both of size $2 \times 2$. Therefore, both products $AB$ and $BA$ are defined.
The first product, $AB$, is
$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}$$
The second product, $BA$, is
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}$$
Therefore, $AB \neq BA$.
This example illustrates that you cannot assume $AB = BA$ even when multiplication is defined in both
orders. If for some matrices $A$ and $B$ it is true that $AB = BA$, then we say that $A$ and $B$ commute. This is
one important property of matrix multiplication.
The following are other important properties of matrix multiplication. Notice that these properties hold
only when the sizes of the matrices are such that the products are defined.

Proposition 2.25: Properties of Matrix Multiplication


The following hold for matrices $A$, $B$, and $C$ and for scalars $r$ and $s$:
$$A(rB + sC) = r(AB) + s(AC) \tag{2.6}$$
$$(B + C)A = BA + CA \tag{2.7}$$
$$A(BC) = (AB)C \tag{2.8}$$

Proof. First we will prove (2.6). We will use Definition 2.21 and prove this statement using the $ij$th entries
of a matrix. Therefore,
$$(A(rB + sC))_{ij} = \sum_k a_{ik}(rB + sC)_{kj} = \sum_k a_{ik}(rb_{kj} + sc_{kj})$$
$$= r\sum_k a_{ik}b_{kj} + s\sum_k a_{ik}c_{kj} = r(AB)_{ij} + s(AC)_{ij} = (r(AB) + s(AC))_{ij}$$
Thus $A(rB + sC) = r(AB) + s(AC)$ as claimed.
The proof of (2.7) follows the same pattern and is left as an exercise.
Statement (2.8) is the associative law of multiplication. Using Definition 2.21,
$$(A(BC))_{ij} = \sum_k a_{ik}(BC)_{kj} = \sum_k a_{ik}\sum_l b_{kl}c_{lj} = \sum_l (AB)_{il}c_{lj} = ((AB)C)_{ij}$$
This proves (2.8).

2.1.6. The Transpose

Another important operation on matrices is that of taking the transpose. For a matrix $A$, we denote the
transpose of $A$ by $A^T$. Before formally defining the transpose, we explore this operation on the following
matrix.
$$\begin{bmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 6 \end{bmatrix}$$
What happened? The first column became the first row and the second column became the second row.
Thus the $3 \times 2$ matrix became a $2 \times 3$ matrix. The number 4 was in the first row and the second column
and it ended up in the second row and first column.
The definition of the transpose is as follows.
The definition of the transpose is as follows.

Definition 2.26: The Transpose of a Matrix


Let $A$ be an $m \times n$ matrix. Then $A^T$, the transpose of $A$, denotes the $n \times m$ matrix given by
$$A^T = [a_{ij}]^T = [a_{ji}]$$
The $(i, j)$-entry of $A$ becomes the $(j, i)$-entry of $A^T$.


Consider the following example.

Example 2.27: The Transpose of a Matrix


Calculate $A^T$ for the following matrix
$$A = \begin{bmatrix} 1 & 2 & 6 \\ 3 & 5 & 4 \end{bmatrix}$$

Solution. By Definition 2.26, we know that for $A = [a_{ij}]$, $A^T = [a_{ji}]$. In other words, we switch the row
and column location of each entry. The $(1, 2)$-entry becomes the $(2, 1)$-entry. Thus,
$$A^T = \begin{bmatrix} 1 & 3 \\ 2 & 5 \\ 6 & 4 \end{bmatrix}$$
Notice that $A$ is a $2 \times 3$ matrix, while $A^T$ is a $3 \times 2$ matrix.


The transpose of a matrix has the following important properties.

Lemma 2.28: Properties of the Transpose of a Matrix


Let $A$ be an $m \times n$ matrix, $B$ an $n \times p$ matrix, and $r$ and $s$ scalars. Then

1. $(A^T)^T = A$

2. $(AB)^T = B^TA^T$

3. $(rA + sB)^T = rA^T + sB^T$

Proof. First we prove 2. From Definition 2.26,
$$(AB)^T = [(AB)_{ij}]^T = [(AB)_{ji}] = \left[\sum_k a_{jk}b_{ki}\right] = \left[\sum_k b_{ki}a_{jk}\right] = \left[\sum_k (B^T)_{ik}(A^T)_{kj}\right] = B^TA^T$$
The proof of Formula 3 is left as an exercise.
The transpose of a matrix is related to other important topics. Consider the following definition.

Definition 2.29: Symmetric and Skew Symmetric Matrices


An $n \times n$ matrix $A$ is said to be symmetric if $A = A^T$. It is said to be skew symmetric if $A = -A^T$.

We will explore these definitions in the following examples.

Example 2.30: Symmetric Matrices


Let
$$A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 5 & 3 \\ 3 & 3 & 7 \end{bmatrix}$$
Use Definition 2.29 to show that $A$ is symmetric.

Solution. By Definition 2.29, we need to show that $A = A^T$. Now, using Definition 2.26,
$$A^T = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 5 & 3 \\ 3 & 3 & 7 \end{bmatrix}$$
Hence, $A = A^T$, so $A$ is symmetric.

Example 2.31: A Skew Symmetric Matrix


Let
$$A = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & 2 \\ -3 & -2 & 0 \end{bmatrix}$$
Show that $A$ is skew symmetric.

Solution. By Definition 2.29,
$$A^T = \begin{bmatrix} 0 & -1 & -3 \\ 1 & 0 & -2 \\ 3 & 2 & 0 \end{bmatrix}$$
You can see that each entry of $A^T$ is equal to $-1$ times the same entry of $A$. Hence, $A^T = -A$ and so
by Definition 2.29, $A$ is skew symmetric.
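Both definitions are easy to test numerically. A minimal sketch using Python's numpy (our own illustration, not part of the text), with the matrices of the two examples above:

    import numpy as np

    A = np.array([[2, 1, 3],
                  [1, 5, 3],
                  [3, 3, 7]])
    S = np.array([[0, 1, 3],
                  [-1, 0, 2],
                  [-3, -2, 0]])

    print(np.array_equal(A, A.T))    # True: A is symmetric
    print(np.array_equal(S, -S.T))   # True: S is skew symmetric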

2.1.7. The Identity and Inverses

There is a special matrix, denoted $I$, which is referred to as the identity matrix. The identity matrix is
always a square matrix, and it has the property that there are ones down the main diagonal and zeroes
elsewhere. Here are some identity matrices of various sizes.
$$\begin{bmatrix} 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The first is the $1 \times 1$ identity matrix, the second is the $2 \times 2$ identity matrix, and so on. By extension, you
can likely see what the $n \times n$ identity matrix would be. When it is necessary to distinguish which size of
identity matrix is being discussed, we will use the notation $I_n$ for the $n \times n$ identity matrix.
The identity matrix is so important that there is a special symbol to denote the $ij$th entry of the identity
matrix. This symbol is given by $I_{ij} = \delta_{ij}$ where $\delta_{ij}$ is the Kronecker symbol defined by
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
$I_n$ is called the identity matrix because it is a multiplicative identity in the following sense.

Lemma 2.32: Multiplication by the Identity Matrix


Suppose $A$ is an $m \times n$ matrix and $I_n$ is the $n \times n$ identity matrix. Then $AI_n = A$. If $I_m$ is the $m \times m$
identity matrix, it also follows that $I_mA = A$.

Proof. The $(i, j)$-entry of $AI_n$ is given by
$$\sum_k a_{ik}\delta_{kj} = a_{ij}$$
and so $AI_n = A$. The other case is left as an exercise for you.


We now define the matrix operation which in some ways plays the role of division.

Definition 2.33: The Inverse of a Matrix


A square $n \times n$ matrix $A$ is said to have an inverse $A^{-1}$ if and only if
$$AA^{-1} = A^{-1}A = I_n$$
In this case, the matrix $A$ is called invertible.

Such a matrix $A^{-1}$ will have the same size as the matrix $A$. It is very important to observe that the
inverse of a matrix, if it exists, is unique. Another way to think of this is that if it acts like the inverse, then
it is the inverse.

Theorem 2.34: Uniqueness of Inverse


Suppose $A$ is an $n \times n$ matrix such that an inverse $A^{-1}$ exists. Then there is only one such inverse
matrix. That is, given any matrix $B$ such that $AB = BA = I$, $B = A^{-1}$.

Proof. In this proof, it is assumed that $I$ is the $n \times n$ identity matrix. Let $A$, $B$ be $n \times n$ matrices such that
$A^{-1}$ exists and $AB = BA = I$. We want to show that $A^{-1} = B$. Now using properties we have seen, we get:
$$A^{-1} = A^{-1}I = A^{-1}(AB) = (A^{-1}A)B = IB = B$$
Hence, $A^{-1} = B$ which tells us that the inverse is unique.


The next example demonstrates how to check the inverse of a matrix.

Example 2.35: Verifying the Inverse of a Matrix


   
Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$. Show that $\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$ is the inverse of $A$.

Solution. To check this, multiply
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$
and
$$\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$
showing that this matrix is indeed the inverse of $A$.
Unlike ordinary multiplication of numbers, it can happen that $A \neq 0$ but $A$ may fail to have an inverse.
This is illustrated in the following example.

Example 2.36: A Nonzero Matrix With No Inverse


 
Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Show that $A$ does not have an inverse.

Solution. One might think $A$ would have an inverse because it does not equal zero. However, note that
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
If $A^{-1}$ existed, we would have the following
$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = A^{-1}\begin{bmatrix} 0 \\ 0 \end{bmatrix} = A^{-1}\left(A\begin{bmatrix} -1 \\ 1 \end{bmatrix}\right) = (A^{-1}A)\begin{bmatrix} -1 \\ 1 \end{bmatrix} = I\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
This says that
$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
which is impossible! Therefore, $A$ does not have an inverse.
In the next section, we will explore how to find the inverse of a matrix, if it exists.

2.1.8. Finding the Inverse of a Matrix

In Example 2.35, we were given $A^{-1}$ and asked to verify that this matrix was in fact the inverse of $A$. In
this section, we explore how to find $A^{-1}$.
Let
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$
as in Example 2.35. In order to find $A^{-1}$, we need to find a matrix $\begin{bmatrix} x & z \\ y & w \end{bmatrix}$ such that
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x & z \\ y & w \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
We can multiply these two matrices, and see that in order for this equation to be true, we must find the
solution to the systems of equations
$$x + y = 1,\quad x + 2y = 0$$
and
$$z + w = 0,\quad z + 2w = 1$$
Writing the augmented matrix for these two systems gives
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & 2 & 0 \end{array}\right]$$
for the first system and
$$\left[\begin{array}{cc|c} 1 & 1 & 0 \\ 1 & 2 & 1 \end{array}\right] \tag{2.9}$$
for the second.
Let's solve the first system. Take $-1$ times the first row and add to the second to get
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & -1 \end{array}\right]$$
Now take $-1$ times the second row and add to the first to get
$$\left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & -1 \end{array}\right]$$
Writing in terms of variables, this says $x = 2$ and $y = -1$.
Now solve the second system, (2.9), to find $z$ and $w$. You will find that $z = -1$ and $w = 1$.
If we take the values found for $x$, $y$, $z$, and $w$ and put them into our inverse matrix, we see that the
inverse is
$$A^{-1} = \begin{bmatrix} x & z \\ y & w \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$$
After taking the time to solve the second system, you may have noticed that exactly the same row
operations were used to solve both systems. In each case, the end result was something of the form $[I|X]$
where $I$ is the identity and $X$ gave a column of the inverse. In the above,
$$\begin{bmatrix} x \\ y \end{bmatrix}$$
the first column of the inverse was obtained by solving the first system and then the second column
$$\begin{bmatrix} z \\ w \end{bmatrix}$$
To simplify this procedure, we could have solved both systems at once! To do so, we could have
written
$$\left[\begin{array}{cc|cc} 1 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{array}\right]$$
and row reduced until we obtained
$$\left[\begin{array}{cc|cc} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 1 \end{array}\right]$$
and read off the inverse as the $2 \times 2$ matrix on the right side.
This exploration motivates the following important algorithm.

Algorithm 2.37: Matrix Inverse Algorithm


Suppose $A$ is an $n \times n$ matrix. To find $A^{-1}$ if it exists, form the augmented $n \times 2n$ matrix
$$[A|I]$$
If possible do row operations until you obtain an $n \times 2n$ matrix of the form
$$[I|B]$$
When this has been done, $B = A^{-1}$. In this case, we say that $A$ is invertible. If it is impossible to
row reduce to a matrix of the form $[I|B]$, then $A$ has no inverse.

This algorithm shows how to find the inverse if it exists. It will also tell you if A does not have an
inverse.
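The algorithm is also easy to carry out with a computer algebra system. The sketch below (Python with sympy; illustrative only, not part of the text) forms $[A|I]$, row reduces, and reads off the right-hand block.

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 2],
                [1, 0, 2],
                [3, 1, -1]])

    M = A.row_join(eye(3))      # the augmented matrix [A|I]
    R = M.rref()[0]             # row reduce to [I|B]
    A_inv = R[:, 3:]            # B is the inverse when the left block is I
    print(A * A_inv == eye(3))  # True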
Consider the following example.

Example 2.38: Finding the Inverse



Let $A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 3 & 1 & -1 \end{bmatrix}$. Find $A^{-1}$ if it exists.

Solution. Set up the augmented matrix
$$[A|I] = \left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right]$$
Now we row reduce, with the goal of obtaining the $3 \times 3$ identity matrix on the left hand side. First,
take $-1$ times the first row and add to the second, followed by $-3$ times the first row added to the third
row. This yields
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array}\right]$$
Then take 5 times the second row and add to $-2$ times the third row (the second row itself is also multiplied by 5 in the matrix below).
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]$$
Next take the third row and add to $-7$ times the first row. This yields
$$\left[\begin{array}{ccc|ccc} -7 & -14 & 0 & -6 & 5 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]$$
Now take $-\frac{7}{5}$ times the second row and add to the first row.
$$\left[\begin{array}{ccc|ccc} -7 & 0 & 0 & 1 & -2 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]$$
Finally divide the first row by $-7$, the second row by $-10$ and the third row by 14, which yields
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 1 & 0 & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{array}\right]$$
Notice that the left hand side of this matrix is now the $3 \times 3$ identity matrix $I_3$. Therefore, the inverse is
the $3 \times 3$ matrix on the right hand side, given by
$$A^{-1} = \begin{bmatrix} -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{1}{2} & -\frac{1}{2} & 0 \\ \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{bmatrix}$$


It may happen that through this algorithm, you discover that the left hand side cannot be row reduced
to the identity matrix. Consider the following example of this situation.

Example 2.39: A Matrix Which Has No Inverse



Let $A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 2 & 2 & 4 \end{bmatrix}$. Find $A^{-1}$ if it exists.

Solution. Write the augmented matrix $[A|I]$
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 2 & 2 & 4 & 0 & 0 & 1 \end{array}\right]$$
and proceed to do row operations attempting to obtain $[I|A^{-1}]$. Take $-1$ times the first row and add to the
second. Then take $-2$ times the first row and add to the third row.
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -2 & 0 & -2 & 0 & 1 \end{array}\right]$$
Next add $-1$ times the second row to the third row.
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 1 \end{array}\right]$$
At this point, you can see there will be no way to obtain $I$ on the left side of this augmented matrix. Hence,
there is no way to complete this algorithm, and therefore the inverse of $A$ does not exist. In this case, we
say that $A$ is not invertible.
If the algorithm provides an inverse for the original matrix, it is always possible to check your answer.
To do so, use the method demonstrated in Example 2.35. Check that the products $AA^{-1}$ and $A^{-1}A$ both
equal the identity matrix. Through this method, you can always be sure that you have calculated $A^{-1}$
properly!
One way in which the inverse of a matrix is useful is to find the solution of a system of linear equations.
Recall from Definition 2.15 that we can write a system of equations in matrix form, which is of the form
$AX = B$. Suppose you find the inverse of the matrix, $A^{-1}$. Then you could multiply both sides of this
equation on the left by $A^{-1}$ and simplify to obtain
$$A^{-1}(AX) = A^{-1}B$$
$$(A^{-1}A)X = A^{-1}B$$
$$IX = A^{-1}B$$
$$X = A^{-1}B$$
Therefore we can find $X$, the solution to the system, by computing $X = A^{-1}B$. Note that once you have
found $A^{-1}$, you can easily get the solution for different right hand sides (different $B$). It is always just
$A^{-1}B$.
We will explore this method of finding the solution to a system in the following example.

Example 2.40: Using the Inverse to Solve a System of Equations


Consider the following system of equations. Use the inverse of a suitable matrix to give the solutions
to this system.
x+z = 1
xy+z = 3
x+yz = 2

Solution. First, we can write the system of equations in matrix form



1 0 1 x 1
AX = 1 1 1 y = 3 = B (2.10)
1 1 1 z 2

The inverse of the matrix


1 0 1
A = 1 1 1
1 1 1
is
1 1
0 2 2

A1 =
1 1 0

1 1
1 2 2

Verifying this inverse is left as an exercise.


From here, the solution to the given system 2.10 is found by

0 1 1 5
2 2
x 1 2
1
y = A B = 1 1 0 3 =
2


1 1
3
z 1 2 2 2 2



0
What if the right side, B, of 2.10 had been 1 ? In other words, what would be the solution to

3

1 0 1 x 0
1 1 1 y = 1 ?
1 1 1 z 3
66 Matrices

By the above discussion, the solution is given by



0 1 1
x 2 2 0 2

y = A B = 1 1
1
0 1 = 1

z 1 2 12
1 3 2

This illustrates that for a system AX = B where A^{-1} exists, it is easy to find the solution when the vector B is changed.
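As a quick illustration (a NumPy sketch added here, not part of the original text), the two solutions above can be reproduced by computing the inverse once and reusing it for each right hand side:

    import numpy as np

    A = np.array([[1.0,  0.0,  1.0],
                  [1.0, -1.0,  1.0],
                  [1.0,  1.0, -1.0]])
    A_inv = np.linalg.inv(A)

    # One inverse, two different right hand sides.
    print(A_inv @ np.array([1.0, 3.0, 2.0]))  # [ 2.5 -2.  -1.5]
    print(A_inv @ np.array([0.0, 1.0, 3.0]))  # [ 2. -1. -2.]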
We conclude this section with some important properties of the inverse.

Theorem 2.41: Inverses of Transposes and Products


Let A, B, and A_i for i = 1, ..., k be n × n matrices.

1. If A is an invertible matrix, then (A^T)^{-1} = (A^{-1})^T

2. If A and B are invertible matrices, then AB is invertible and (AB)^{-1} = B^{-1}A^{-1}

3. If A_1, A_2, ..., A_k are invertible, then the product A_1 A_2 ··· A_k is invertible, and (A_1 A_2 ··· A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} ··· A_2^{-1} A_1^{-1}

Consider the following theorem.

Theorem 2.42: Properties of the Inverse


Let A be an n × n matrix and I the usual identity matrix.

1. I is invertible and I^{-1} = I

2. If A is invertible then so is A^{-1}, and (A^{-1})^{-1} = A

3. If A is invertible then so is A^k, and (A^k)^{-1} = (A^{-1})^k

4. If A is invertible and p is a nonzero real number, then pA is invertible and (pA)^{-1} = (1/p)A^{-1}
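These identities are easy to spot-check numerically. Below is a brief NumPy sketch (illustrative only; the particular matrices are arbitrary invertible examples, not taken from the text):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    B = np.array([[1.0, 2.0], [0.0, 1.0]])

    # (A^T)^{-1} = (A^{-1})^T
    print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))           # True
    # (AB)^{-1} = B^{-1} A^{-1}
    print(np.allclose(np.linalg.inv(A @ B),
                      np.linalg.inv(B) @ np.linalg.inv(A)))              # True
    # (pA)^{-1} = (1/p) A^{-1}, for a nonzero scalar p
    p = 5.0
    print(np.allclose(np.linalg.inv(p * A), (1/p) * np.linalg.inv(A)))   # True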

2.1.9. Elementary Matrices

We now turn our attention to a special type of matrix called an elementary matrix. An elementary matrix is always a square matrix. Recall the row operations given in Definition 1.11. Any elementary matrix, which we often denote by E, is obtained from applying one row operation to the identity matrix of the same size.
For example, the matrix

    E = [ 0  1 ]
        [ 1  0 ]

is the elementary matrix obtained from switching the two rows. The matrix

    E = [ 1  0  0 ]
        [ 0  3  0 ]
        [ 0  0  1 ]

is the elementary matrix obtained from multiplying the second row of the 3 × 3 identity matrix by 3. The matrix

    E = [ 1  0 ]
        [ 3  1 ]

is the elementary matrix obtained from adding 3 times the first row to the second row.
You may construct an elementary matrix from any row operation, but remember that you can only
apply one operation.
Consider the following definition.

Definition 2.43: Elementary Matrices and Row Operations


Let E be an n × n matrix. Then E is an elementary matrix if it is the result of applying one row operation to the n × n identity matrix I_n.
Those which involve switching rows of the identity matrix are called permutation matrices.

Therefore, E constructed above by switching the two rows of I2 is called a permutation matrix.
Elementary matrices can be used in place of row operations and therefore are very useful. It turns out
that multiplying (on the left hand side) by an elementary matrix E will have the same effect as doing the
row operation used to obtain E.
The following theorem is an important result which we will use throughout this text.

Theorem 2.44: Multiplication by an Elementary Matrix and Row Operations


To perform any of the three row operations on a matrix A it suffices to take the product EA, where
E is the elementary matrix obtained by using the desired row operation on the identity matrix.

Therefore, instead of performing row operations on a matrix A, we can row reduce through matrix
multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of
the three row operations given in Definition 1.11.
First, consider the following lemma.

Lemma 2.45: Action of Permutation Matrix


Let P_{ij} denote the elementary matrix which involves switching the ith and the jth rows. Then P_{ij} is a permutation matrix and

    P_{ij} A = B

where B is obtained from A by switching the ith and the jth rows.

We will explore this idea more in the following example.



Example 2.46: Switching Rows with an Elementary Matrix


Let
    P_{12} = [ 0  1  0 ]        [ a  b ]
             [ 1  0  0 ] ,  A = [ g  d ]
             [ 0  0  1 ]        [ e  f ]
Find B where B = P_{12}A.

Solution. You can see that the matrix P_{12} is obtained by switching the first and second rows of the 3 × 3 identity matrix I.
Using our usual procedure, compute the product P_{12}A = B. The result is given by

        [ g  d ]
    B = [ a  b ]
        [ e  f ]

Notice that B is the matrix obtained by switching rows 1 and 2 of A. Therefore by multiplying A by P_{12}, the row operation which was applied to I to obtain P_{12} is applied to A to obtain B.
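A numerical sketch of this lemma (illustrative; the integer entries below are arbitrary stand-ins for a, b, g, d, e, f):

    import numpy as np

    P12 = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 1]])
    A = np.array([[1, 2],    # row (a, b)
                  [3, 4],    # row (g, d)
                  [5, 6]])   # row (e, f)

    # Multiplying on the left by P12 swaps the first two rows of A.
    print(P12 @ A)  # [[3 4] [1 2] [5 6]]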
Theorem 2.44 applies to all three row operations, and we now look at the row operation of multiplying
a row by a scalar. Consider the following lemma.

Lemma 2.47: Multiplication by a Scalar and Elementary Matrices


Let E (k, i) denote the elementary matrix corresponding to the row operation in which the ith row is
multiplied by the nonzero scalar, k. Then

E (k, i) A = B

where B is obtained from A by multiplying the ith row of A by k.

We will explore this lemma further in the following example.

Example 2.48: Multiplication of a Row by 5 Using Elementary Matrix


Let
    E(5, 2) = [ 1  0  0 ]        [ a  b ]
              [ 0  5  0 ] ,  A = [ c  d ]
              [ 0  0  1 ]        [ e  f ]
Find the matrix B where B = E(5, 2)A.

Solution. You can see that E(5, 2) is obtained by multiplying the second row of the identity matrix by 5.
Using our usual procedure for multiplication of matrices, we can compute the product E(5, 2)A. The resulting matrix is given by

        [  a   b  ]
    B = [ 5c   5d ]
        [  e   f  ]

Notice that B is obtained by multiplying the second row of A by the scalar 5.


There is one last row operation to consider. The following lemma discusses the final operation of
adding a multiple of a row to another row.

Lemma 2.49: Adding Multiples of Rows and Elementary Matrices


Let E(k · i + j) denote the elementary matrix obtained from I by adding k times the ith row to the jth. Then

    E(k · i + j)A = B

where B is obtained from A by adding k times the ith row to the jth row of A.

Consider the following example.

Example 2.50: Adding Two Times the First Row to the Last
Let
    E(2 · 1 + 3) = [ 1  0  0 ]        [ a  b ]
                   [ 0  1  0 ] ,  A = [ c  d ]
                   [ 2  0  1 ]        [ e  f ]
Find B where B = E(2 · 1 + 3)A.

Solution. You can see that the matrix E(2 · 1 + 3) was obtained by adding 2 times the first row of I to the third row of I.
Using our usual procedure, we can compute the product E(2 · 1 + 3)A. The resulting matrix B is given by

        [   a       b    ]
    B = [   c       d    ]
        [ 2a + e  2b + f ]

You can see that B is the matrix obtained by adding 2 times the first row of A to the third row.
Suppose we have applied a row operation to a matrix A. Consider the row operation required to return
A to its original form, to undo the row operation. It turns out that this action is how we find the inverse of
an elementary matrix E.
Consider the following theorem.

Theorem 2.51: Elementary Matrices and Inverses


Every elementary matrix is invertible and its inverse is also an elementary matrix.

In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on I. E^{-1} will be obtained by performing the row operation which would carry E back to I.

If E is obtained by switching rows i and j, then E^{-1} is also obtained by switching rows i and j.

If E is obtained by multiplying row i by the scalar k, then E^{-1} is obtained by multiplying row i by the scalar 1/k.

If E is obtained by adding k times row i to row j, then E^{-1} is obtained by subtracting k times row i from row j.

Consider the following example.

Example 2.52: Inverse of an Elementary Matrix


Let
    E = [ 1  0 ]
        [ 0  2 ]
Find E^{-1}.

Solution. Consider the elementary matrix E given by

    E = [ 1  0 ]
        [ 0  2 ]

Here, E is obtained from the 2 × 2 identity matrix by multiplying the second row by 2. In order to carry E back to the identity, we need to multiply the second row of E by 1/2. Hence, E^{-1} is given by

    E^{-1} = [ 1   0  ]
             [ 0  1/2 ]

We can verify that EE^{-1} = I. Take the product EE^{-1}, given by

    EE^{-1} = [ 1  0 ] [ 1   0  ] = [ 1  0 ]
              [ 0  2 ] [ 0  1/2 ]   [ 0  1 ]

This equals I, so we know that we have computed E^{-1} properly.
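The reverse-row-operation rule can also be checked mechanically; here is a brief NumPy sketch (illustrative, using the E from this example):

    import numpy as np

    E = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    # Apply the reverse row operation to I: multiply row 2 by 1/2.
    E_inv = np.eye(2)
    E_inv[1, :] *= 1/2

    print(np.allclose(E @ E_inv, np.eye(2)))      # True
    print(np.allclose(E_inv, np.linalg.inv(E)))   # True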


Suppose an m × n matrix A is row reduced to its reduced row-echelon form. By tracking each row operation completed, this row reduction can be completed through multiplication by elementary matrices.
Consider the following definition.

Definition 2.53: The Form B = UA


Let A be an m × n matrix and let B be the reduced row-echelon form of A. Then we can write B = UA where U is the product of all elementary matrices representing the row operations done to A to obtain B.

Consider the following example.



Example 2.54: The Form B = UA

Let
    A = [ 0  1 ]
        [ 1  0 ]
        [ 2  0 ]
Find B, the reduced row-echelon form of A, and write it in the form B = UA.

Solution. To find B, row reduce A. For each step, we will record the appropriate elementary matrix. First, switch rows 1 and 2.

    [ 0  1 ]      [ 1  0 ]
    [ 1  0 ]  →   [ 0  1 ]
    [ 2  0 ]      [ 2  0 ]

The resulting matrix is equivalent to finding the product of
    P_{12} = [ 0  1  0 ]
             [ 1  0  0 ]
             [ 0  0  1 ]
and A.
Next, add (-2) times row 1 to row 3.

    [ 1  0 ]      [ 1  0 ]
    [ 0  1 ]  →   [ 0  1 ]
    [ 2  0 ]      [ 0  0 ]

This is equivalent to multiplying by the matrix
    E(-2 · 1 + 3) = [  1  0  0 ]
                    [  0  1  0 ]
                    [ -2  0  1 ]
Notice that the resulting matrix is B, the required reduced row-echelon form of A.
We can then write

    B = E(-2 · 1 + 3)(P_{12}A)
      = (E(-2 · 1 + 3)P_{12})A
      = UA

It remains to find the matrix U.

    U = E(-2 · 1 + 3)P_{12}
      = [  1  0  0 ] [ 0  1  0 ]
        [  0  1  0 ] [ 1  0  0 ]
        [ -2  0  1 ] [ 0  0  1 ]
      = [ 0   1  0 ]
        [ 1   0  0 ]
        [ 0  -2  1 ]

We can verify that B = UA holds for this matrix U:

    UA = [ 0   1  0 ] [ 0  1 ]
         [ 1   0  0 ] [ 1  0 ]
         [ 0  -2  1 ] [ 2  0 ]
       = [ 1  0 ]
         [ 0  1 ]
         [ 0  0 ]
       = B


While the process used in the above example is reliable and simple when only a few row operations
are used, it becomes cumbersome in a case where many row operations are needed to carry A to B. The
following theorem provides an alternate way to find the matrix U .

Theorem 2.55: Finding the Matrix U


Let A be an m × n matrix and let B be its reduced row-echelon form. Then B = UA where U is an invertible m × m matrix found by forming the matrix [A|I_m] and row reducing to [B|U].

Let's revisit the above example using the process outlined in Theorem 2.55.

Example 2.56: The Form B = UA, Revisited

Let
    A = [ 0  1 ]
        [ 1  0 ]
        [ 2  0 ]
Using the process outlined in Theorem 2.55, find U such that B = UA.

Solution. First, set up the matrix [A|I_m].

    [ 0  1 | 1  0  0 ]
    [ 1  0 | 0  1  0 ]
    [ 2  0 | 0  0  1 ]

Now, row reduce this matrix until the left side equals the reduced row-echelon form of A.

    [ 0  1 | 1  0  0 ]      [ 1  0 | 0  1  0 ]      [ 1  0 | 0   1  0 ]
    [ 1  0 | 0  1  0 ]  →   [ 0  1 | 1  0  0 ]  →   [ 0  1 | 1   0  0 ]
    [ 2  0 | 0  0  1 ]      [ 2  0 | 0  0  1 ]      [ 0  0 | 0  -2  1 ]

The left side of this matrix is B, and the right side is U . Comparing this to the matrix U found above
in Example 2.54, you can see that the same matrix is obtained regardless of which process is used.
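The same bookkeeping is easy to confirm numerically; this NumPy sketch (illustrative only) checks that the U obtained above satisfies B = UA and is invertible:

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [1.0, 0.0],
                  [2.0, 0.0]])
    U = np.array([[0.0,  1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0, -2.0, 1.0]])
    B = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

    print(np.allclose(U @ A, B))              # True
    print(np.linalg.matrix_rank(U) == 3)      # True: U is invertible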
Recall from Algorithm 2.37 that an n × n matrix A is invertible if and only if A can be carried to the n × n identity matrix using the usual row operations. This leads to an important consequence related to the above discussion.
Suppose A is an n × n invertible matrix. Then, set up the matrix [A|I_n] as done above, and row reduce until it is of the form [B|U]. In this case, B = I_n because A is invertible.

    B = UA
    I_n = UA
    U = A^{-1}

Now suppose that U = E_1 E_2 ··· E_k where each E_i is an elementary matrix representing a row operation used to carry A to I. Then,

    U^{-1} = (E_1 E_2 ··· E_k)^{-1} = E_k^{-1} ··· E_2^{-1} E_1^{-1}

Remember that if E_i is an elementary matrix, so too is E_i^{-1}. It follows that

    A = U^{-1}
      = E_k^{-1} ··· E_2^{-1} E_1^{-1}

and A can be written as a product of elementary matrices.

Theorem 2.57: Product of Elementary Matrices


Let A be an n × n matrix. Then A is invertible if and only if it can be written as a product of
elementary matrices.

Consider the following example.

Example 2.58: Product of Elementary Matrices

Let
    A = [ 0   1  0 ]
        [ 1   1  0 ]
        [ 0  -2  1 ]
Write A as a product of elementary matrices.

Solution. We will use the process outlined in Theorem 2.55 to write A as a product of elementary matrices. We will set up the matrix [A|I] and row reduce, recording each row operation as an elementary matrix.
First:

    [ 0   1  0 | 1  0  0 ]      [ 1   1  0 | 0  1  0 ]
    [ 1   1  0 | 0  1  0 ]  →   [ 0   1  0 | 1  0  0 ]
    [ 0  -2  1 | 0  0  1 ]      [ 0  -2  1 | 0  0  1 ]

represented by the elementary matrix
    E_1 = [ 0  1  0 ]
          [ 1  0  0 ]
          [ 0  0  1 ]
Secondly:

    [ 1   1  0 | 0  1  0 ]      [ 1   0  0 | -1  1  0 ]
    [ 0   1  0 | 1  0  0 ]  →   [ 0   1  0 |  1  0  0 ]
    [ 0  -2  1 | 0  0  1 ]      [ 0  -2  1 |  0  0  1 ]

represented by the elementary matrix
    E_2 = [ 1  -1  0 ]
          [ 0   1  0 ]
          [ 0   0  1 ]
Finally:

    [ 1   0  0 | -1  1  0 ]      [ 1  0  0 | -1  1  0 ]
    [ 0   1  0 |  1  0  0 ]  →   [ 0  1  0 |  1  0  0 ]
    [ 0  -2  1 |  0  0  1 ]      [ 0  0  1 |  2  0  1 ]

represented by the elementary matrix
    E_3 = [ 1  0  0 ]
          [ 0  1  0 ]
          [ 0  2  1 ]
Notice that the reduced row-echelon form of A is I. Hence I = UA where U is the product of the above elementary matrices. It follows that A = U^{-1}. Since we want to write A as a product of elementary matrices, we wish to express U^{-1} as a product of elementary matrices.

    U^{-1} = (E_3 E_2 E_1)^{-1}
           = E_1^{-1} E_2^{-1} E_3^{-1}
           = [ 0  1  0 ] [ 1  1  0 ] [ 1   0  0 ]
             [ 1  0  0 ] [ 0  1  0 ] [ 0   1  0 ]
             [ 0  0  1 ] [ 0  0  1 ] [ 0  -2  1 ]
           = A

This gives A written as a product of elementary matrices. By Theorem 2.57 it follows that A is invertible.
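A NumPy sketch (illustrative) confirming the factorization above:

    import numpy as np

    E1_inv = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
    E2_inv = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
    E3_inv = np.array([[1, 0, 0], [0, 1, 0], [0, -2, 1]])
    A = np.array([[0, 1, 0], [1, 1, 0], [0, -2, 1]])

    # The product of the inverse elementary matrices recovers A.
    print(np.array_equal(E1_inv @ E2_inv @ E3_inv, A))  # True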

2.1.10. More on Matrix Inverses

In this section, we will prove three theorems which will clarify the concept of matrix inverses. In order to do this, first recall some important properties of elementary matrices.
Recall that an elementary matrix is a square matrix obtained by performing an elementary operation on an identity matrix. Each elementary matrix is invertible, and its inverse is also an elementary matrix. If E is an m × m elementary matrix and A is an m × n matrix, then the product EA is the result of applying to A the same elementary row operation that was applied to the m × m identity matrix in order to obtain E.
Let R be the reduced row-echelon form of an m × n matrix A. R is obtained by iteratively applying a sequence of elementary row operations to A. Denote by E_1, E_2, ..., E_k the elementary matrices associated with the elementary row operations which were applied, in order, to the matrix A to obtain the resulting R. We then have that R = (E_k(···(E_2(E_1 A))···)) = E_k ··· E_2 E_1 A. Let E denote the product matrix E_k ··· E_2 E_1 so that we can write R = EA where E is an invertible matrix whose inverse is the product (E_1)^{-1}(E_2)^{-1} ··· (E_k)^{-1}.
Now, we will consider some preliminary lemmas.

Lemma 2.59: Invertible Matrix and Zeros


Suppose that A and B are matrices such that the product AB is an identity matrix. Then the reduced
row-echelon form of A does not have a row of zeros.
Proof. Let R be the reduced row-echelon form of A. Then R = EA for some invertible square matrix E as described above. By hypothesis AB = I where I is an identity matrix, so we have a chain of equalities

    R(BE^{-1}) = (EA)(BE^{-1}) = E(AB)E^{-1} = EIE^{-1} = EE^{-1} = I

If R had a row of zeros, then so would the product R(BE^{-1}). But since the identity matrix I does not have a row of zeros, neither can R have one.
We now consider a second important lemma.

Lemma 2.60: Size of Invertible Matrix


Suppose that A and B are matrices such that the product AB is an identity matrix. Then A has at
least as many columns as it has rows.

Proof. Let R be the reduced row-echelon form of A. By Lemma 2.59, we know that R does not have a row
of zeros, and therefore each row of R has a leading 1. Since each column of R contains at most one of
these leading 1s, R must have at least as many columns as it has rows.
An important theorem follows from this lemma.

Theorem 2.61: Invertible Matrices are Square


Only square matrices can be invertible.

Proof. Suppose that A and B are matrices such that both products AB and BA are identity matrices. We will show that A and B must be square matrices of the same size. Let the matrix A have m rows and n columns, so that A is an m × n matrix. Since the product AB exists, B must have n rows, and since the product BA exists, B must have m columns so that B is an n × m matrix. To finish the proof, we need only verify that m = n.
We first apply Lemma 2.60 with A and B, to obtain the inequality m ≤ n. We then apply Lemma 2.60 again (switching the order of the matrices), to obtain the inequality n ≤ m. It follows that m = n, as we wanted.
Of course, not all square matrices are invertible. In particular, zero matrices are not invertible, along
with many other square matrices.
The following proposition will be useful in proving the next theorem.

Proposition 2.62: Reduced Row-Echelon Form of a Square Matrix


If R is the reduced row-echelon form of a square matrix, then either R has a row of zeros or R is an
identity matrix.

The proof of this proposition is left as an exercise to the reader. We now consider the second important
theorem of this section.
Theorem 2.63: Unique Inverse of a Matrix


Suppose A and B are square matrices such that AB = I where I is an identity matrix. Then it follows that BA = I. Further, both A and B are invertible, and B = A^{-1} and A = B^{-1}.

Proof. Let R be the reduced row-echelon form of a square matrix A. Then, R = EA where E is an invertible matrix. Since AB = I, Lemma 2.59 gives us that R does not have a row of zeros. By noting that R is a square matrix and applying Proposition 2.62, we see that R = I. Hence, EA = I.
Using both that EA = I and AB = I, we can finish the proof with a chain of equalities as given by

    BA = IBIA = (EA)B(E^{-1}E)A
       = E(AB)E^{-1}(EA)
       = EIE^{-1}I
       = EE^{-1} = I

It follows from the definition of the inverse of a matrix that B = A^{-1} and A = B^{-1}.


This theorem is very useful, since with it we need only test one of the products AB or BA in order to
check that B is the inverse of A. The hypothesis that A and B are square matrices is very important, and
without this the theorem does not hold.
We will now consider an example.

Example 2.64: Non Square Matrices

Let
    A = [ 1  0 ]
        [ 0  1 ]
        [ 0  0 ]
Show that A^T A = I but AA^T ≠ I.

Solution. Consider the product A^T A given by

    [ 1  0  0 ] [ 1  0 ]   [ 1  0 ]
    [ 0  1  0 ] [ 0  1 ] = [ 0  1 ]
                [ 0  0 ]

Therefore, A^T A = I_2, where I_2 is the 2 × 2 identity matrix. However, the product AA^T is

    [ 1  0 ]               [ 1  0  0 ]
    [ 0  1 ] [ 1  0  0 ] = [ 0  1  0 ]
    [ 0  0 ] [ 0  1  0 ]   [ 0  0  0 ]

Hence AA^T is not the 3 × 3 identity matrix. This shows that for Theorem 2.63, it is essential that both matrices be square and of the same size.
Is it possible to have matrices A and B such that AB = I, while BA = 0? This question is left to the
reader to answer, and you should take a moment to consider the answer.
We conclude this section with an important theorem.

Theorem 2.65: The Reduced Row-Echelon Form of an Invertible Matrix


For any matrix A the following conditions are equivalent:

A is invertible

The reduced row-echelon form of A is an identity matrix

Proof. In order to prove this, we show that for any given matrix A, each condition implies the other. We
first show that if A is invertible, then its reduced row-echelon form is an identity matrix, then we show that
if the reduced row-echelon form of A is an identity matrix, then A is invertible.
If A is invertible, there is some matrix B such that AB = I. By Lemma 2.59, we get that the reduced row-
echelon form of A does not have a row of zeros. Then by Theorem 2.61, it follows that A and the reduced
row-echelon form of A are square matrices. Finally, by Proposition 2.62, this reduced row-echelon form of
A must be an identity matrix. This proves the first implication.
Now suppose the reduced row-echelon form of A is an identity matrix I. Then I = EA for some product
E of elementary matrices. By Theorem 2.63, we can conclude that A is invertible.
Theorem 2.65 corresponds to Algorithm 2.37, which claims that A^{-1} is found by row reducing the augmented matrix [A|I] to the form [I|A^{-1}]. This will be a matrix product E[A|I] where E is a product of elementary matrices. By the rules of matrix multiplication, we have that E[A|I] = [EA|EI] = [EA|E].
It follows that the reduced row-echelon form of [A|I] is [EA|E], where EA gives the reduced row-echelon form of A. By Theorem 2.65, if EA ≠ I, then A is not invertible, and if EA = I, A is invertible. If EA = I, then by Theorem 2.63, E = A^{-1}. This proves that Algorithm 2.37 does in fact find A^{-1}.

Exercises

Exercise 2.1.1 For the following pairs of matrices, determine if the sum A + B is defined. If so, find the
sum.
   
(a) A = [ 1  0 ] , B = [ 0  1 ]
        [ 0  1 ]       [ 1  0 ]

(b) A = [ 2  1  2 ] , B = [ 1  0  3 ]
        [ 1  1  0 ]       [ 0  1  4 ]

(c) A = [ 1  0 ]        [ 2  7  1 ]
        [ 2  3 ] , B =  [ 0  3  4 ]
        [ 4  2 ]

Exercise 2.1.2 For each matrix A, find the matrix -A such that A + (-A) = 0.

 
(a) A = [ 1  2 ]
        [ 2  1 ]

(b) A = [ 2  3 ]
        [ 0  2 ]

(c) A = [ 0  1  2 ]
        [ 1  1  3 ]
        [ 4  2  0 ]

Exercise 2.1.3 In the context of Proposition 2.7, describe -A and 0.

Exercise 2.1.4 For each matrix A, find the products (-2)A, 0A, and 3A.

(a) A = [ 1  2 ]
        [ 2  1 ]

(b) A = [ 2  3 ]
        [ 0  2 ]

(c) A = [ 0  1  2 ]
        [ 1  1  3 ]
        [ 4  2  0 ]

Exercise 2.1.5 Using only the properties given in Proposition 2.7 and Proposition 2.10, show -A is unique.

Exercise 2.1.6 Using only the properties given in Proposition 2.7 and Proposition 2.10 show 0 is unique.

Exercise 2.1.7 Using only the properties given in Proposition 2.7 and Proposition 2.10 show 0A = 0.
Here the 0 on the left is the scalar 0 and the 0 on the right is the zero matrix of appropriate size.

Exercise 2.1.8 Using only the properties given in Proposition 2.7 and Proposition 2.10, as well as previous problems, show (-1)A = -A.
     
Exercise 2.1.9 Consider the matrices
    A = [ 1  2  3 ] , B = [ 3  1  2 ] , C = [ 1  2 ] , D = [ 1  2 ] , E = [ 2 ]
        [ 2  1  7 ]       [ 3  2  1 ]       [ 3  1 ]       [ 2  3 ]       [ 3 ]
Find the following if possible. If it is not possible explain why.

(a) 3A

(b) 3B - A

(c) AC

(d) CB

(e) AE

(f) EA


Exercise 2.1.10 Consider the matrices
    A = [ 1  2 ]       B = [ 2  5  2 ] , C = [ 1  2 ] , D = [ 1  1 ] , E = [ 1 ]
        [ 3  2 ] ,         [ 3  2  1 ]       [ 5  0 ]       [ 4  3 ]       [ 3 ]
        [ 1  1 ]
Find the following if possible. If it is not possible explain why.

(a) 3A

(b) 3B - A

(c) AC

(d) CA

(e) AE

(f) EA

(g) BE

(h) DE


Exercise 2.1.11 Let
    A = [ 1  1 ]       B = [ 1  1  2 ] ,  C = [ 1  1  3 ]
        [ 2  1 ] ,         [ 2  1  2 ]        [ 1  2  0 ]
        [ 1  2 ]                              [ 3  1  0 ]
Find the following if possible.

(a) AB

(b) BA

(c) AC

(d) CA

(e) CB

(f) BC

 
Exercise 2.1.12 Let
    A = [ 1  1 ]
        [ 3  3 ]
Find all 2 × 2 matrices B such that AB = 0.

Exercise 2.1.13 Let X = [ 1  1  1 ] and Y = [ 0  1  2 ]. Find X^T Y and XY^T if possible.
   
Exercise 2.1.14 Let
    A = [ 1  2 ] ,  B = [ 1  2 ]
        [ 3  4 ]        [ 3  k ]
Is it possible to choose k such that AB = BA? If so, what should k equal?

Exercise 2.1.15 Let
    A = [ 1  2 ] ,  B = [ 1  2 ]
        [ 3  4 ]        [ 1  k ]
Is it possible to choose k such that AB = BA? If so, what should k equal?

Exercise 2.1.16 Find 2 × 2 matrices A, B, and C such that A ≠ 0, C ≠ B, but AC = AB.

Exercise 2.1.17 Give an example of matrices (of any size) A, B, C such that B ≠ C, A ≠ 0, and yet AB = AC.

Exercise 2.1.18 Find 2 × 2 matrices A and B such that A ≠ 0 and B ≠ 0 but AB = 0.

Exercise 2.1.19 Give an example of matrices (of any size) A, B such that A ≠ 0 and B ≠ 0 but AB = 0.

Exercise 2.1.20 Find 2 × 2 matrices A and B such that A ≠ 0 and B ≠ 0 with AB ≠ BA.

Exercise 2.1.21 Write the system

    x1 - x2 + 2x3
    2x3 + x1
    3x3
    3x4 + 3x2 + x1

in the form A(x1, x2, x3, x4)^T where A is an appropriate matrix.

Exercise 2.1.22 Write the system

    x1 + 3x2 + 2x3
    2x3 + x1
    6x3
    x4 + 3x2 + x1

in the form A(x1, x2, x3, x4)^T where A is an appropriate matrix.

Exercise 2.1.23 Write the system

    x1 + x2 + x3
    2x3 + x1 + x2
    x3 - x1
    3x4 + x1

in the form A(x1, x2, x3, x4)^T where A is an appropriate matrix.

Exercise 2.1.24 A matrix A is called idempotent if A^2 = A. Let

    A = [  2  0   2 ]
        [  1  1   2 ]
        [ -1  0  -1 ]

and show that A is idempotent.

Exercise 2.1.25 For each pair of matrices, find the (1, 2)-entry and (2, 3)-entry of the product AB.

(a) A = [ 1  2  1 ] , B = [ 4  6  2 ]
        [ 3  4  0 ]       [ 7  2  1 ]
        [ 2  5  1 ]       [ 1  0  0 ]

(b) A = [ 1  3  1 ] , B = [ 2   3  0 ]
        [ 0  2  4 ]       [ 4  16  1 ]
        [ 1  0  5 ]       [ 0   2  2 ]

Exercise 2.1.26 Suppose A and B are square matrices of the same size. Which of the following are necessarily true?

(a) (A - B)^2 = A^2 - 2AB + B^2

(b) (AB)^2 = A^2 B^2

(c) (A + B)^2 = A^2 + 2AB + B^2

(d) (A + B)^2 = A^2 + AB + BA + B^2

(e) A^2 B^2 = A(AB)B

(f) (A + B)^3 = A^3 + 3A^2 B + 3AB^2 + B^3

(g) (A + B)(A - B) = A^2 - B^2


Exercise 2.1.27 Consider the matrices
    A = [ 1  2 ]       B = [ 2  5  2 ] , C = [ 1  2 ] , D = [ 1  1 ] , E = [ 1 ]
        [ 3  2 ] ,         [ 3  2  1 ]       [ 5  0 ]       [ 4  3 ]       [ 3 ]
        [ 1  1 ]
Find the following if possible. If it is not possible explain why.

(a) 3A^T

(b) 3B - A^T

(c) E^T B

(d) EE^T

(e) B^T B

(f) CA^T

(g) D^T BE

Exercise 2.1.28 Let A be an n × n matrix. Show A equals the sum of a symmetric and a skew symmetric matrix. Hint: Show that (1/2)(A^T + A) is symmetric and then consider using this as one of the matrices.

Exercise 2.1.29 Show that the main diagonal of every skew symmetric matrix consists of only zeros.
Recall that the main diagonal consists of every entry of the matrix which is of the form aii .

Exercise 2.1.30 Prove 3. That is, show that for an m × n matrix A, an n × p matrix B, and scalars r, s, the following holds:

    (rA + sB)^T = rA^T + sB^T

Exercise 2.1.31 Prove that I_m A = A where A is an m × n matrix.

Exercise 2.1.32 Suppose AB = AC and A is an invertible n × n matrix. Does it follow that B = C? Explain why or why not.

Exercise 2.1.33 Suppose AB = AC and A is a non invertible n × n matrix. Does it follow that B = C? Explain why or why not.

Exercise 2.1.34 Give an example of a matrix A such that A^2 = I and yet A ≠ I and A ≠ -I.

Exercise 2.1.35 Let
    A = [ 2  1 ]
        [ 1  3 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.36 Let
    A = [ 0  1 ]
        [ 5  3 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.37 Let
    A = [ 2  1 ]
        [ 3  0 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.38 Let
    A = [ 2  1 ]
        [ 4  2 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.
 
Exercise 2.1.39 Let A be a 2 × 2 invertible matrix, with
    A = [ a  b ]
        [ c  d ]
Find a formula for A^{-1} in terms of a, b, c, d.

Exercise 2.1.40 Let
    A = [ 1  2  3 ]
        [ 2  1  4 ]
        [ 1  0  2 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.41 Let
    A = [ 1  0  3 ]
        [ 2  3  4 ]
        [ 1  0  2 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.42 Let
    A = [ 1  2  3  ]
        [ 2  1  4  ]
        [ 4  5  10 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.43 Let
    A = [ 1  2  0  2 ]
        [ 1  1  2  0 ]
        [ 2  1  3  2 ]
        [ 1  2  1  2 ]
Find A^{-1} if possible. If A^{-1} does not exist, explain why.

Exercise 2.1.44 Using the inverse of the matrix, find the solution to the systems:

(a) [ 2  4 ] [ x ] = [ 1 ]
    [ 1  1 ] [ y ]   [ 2 ]

(b) [ 2  4 ] [ x ] = [ 2 ]
    [ 1  1 ] [ y ]   [ 0 ]

Now give the solution in terms of a and b to

    [ 2  4 ] [ x ] = [ a ]
    [ 1  1 ] [ y ]   [ b ]

Exercise 2.1.45 Using the inverse of the matrix, find the solution to the systems:

(a) [ 1  0  3 ] [ x ]   [ 1 ]
    [ 2  3  4 ] [ y ] = [ 0 ]
    [ 1  0  2 ] [ z ]   [ 1 ]

(b) [ 1  0  3 ] [ x ]   [ 3 ]
    [ 2  3  4 ] [ y ] = [ 1 ]
    [ 1  0  2 ] [ z ]   [ 2 ]

Now give the solution in terms of a, b, and c to the following:

    [ 1  0  3 ] [ x ]   [ a ]
    [ 2  3  4 ] [ y ] = [ b ]
    [ 1  0  2 ] [ z ]   [ c ]

Exercise 2.1.46 Show that if A is an n × n invertible matrix and X is an n × 1 matrix such that AX = B for B an n × 1 matrix, then X = A^{-1}B.

Exercise 2.1.47 Prove that if A^{-1} exists and AX = 0 then X = 0.

Exercise 2.1.48 Show that if A^{-1} exists for an n × n matrix, then it is unique. That is, if BA = I and AB = I, then B = A^{-1}.

Exercise 2.1.49 Show that if A is an invertible n × n matrix, then so is A^T and (A^T)^{-1} = (A^{-1})^T.

Exercise 2.1.50 Show (AB)^{-1} = B^{-1}A^{-1} by verifying that

    (AB)(B^{-1}A^{-1}) = I

and

    (B^{-1}A^{-1})(AB) = I

Hint: Use Problem 2.1.48.


 
Exercise 2.1.51 Show that (ABC)^{-1} = C^{-1}B^{-1}A^{-1} by verifying that (ABC)(C^{-1}B^{-1}A^{-1}) = I and (C^{-1}B^{-1}A^{-1})(ABC) = I. Hint: Use Problem 2.1.48.

Exercise 2.1.52 If A is invertible, show (A^2)^{-1} = (A^{-1})^2. Hint: Use Problem 2.1.48.

Exercise 2.1.53 If A is invertible, show (A^{-1})^{-1} = A. Hint: Use Problem 2.1.48.
   
Exercise 2.1.54 Let
    A = [ 2  3 ]
        [ 1  2 ]
Suppose a row operation is applied to A and the result is
    B = [ 1  2 ]
        [ 2  3 ]
Find the elementary matrix E that represents this row operation.

Exercise 2.1.55 Let
    A = [ 4  0 ]
        [ 2  1 ]
Suppose a row operation is applied to A and the result is
    B = [ 8  0 ]
        [ 2  1 ]
Find the elementary matrix E that represents this row operation.

Exercise 2.1.56 Let
    A = [ 1  -3 ]
        [ 0   5 ]
Suppose a row operation is applied to A and the result is
    B = [ 1  -3 ]
        [ 2  -1 ]
Find the elementary matrix E that represents this row operation.

Exercise 2.1.57 Let
    A = [ 1   2  1 ]
        [ 0   5  1 ]
        [ 2  -1  4 ]
Suppose a row operation is applied to A and the result is
    B = [ 1   2  1 ]
        [ 2  -1  4 ]
        [ 0   5  1 ]

(a) Find the elementary matrix E such that EA = B.

(b) Find the inverse of E, E^{-1}, such that E^{-1}B = A.


Exercise 2.1.58 Let
    A = [ 1   2  1 ]
        [ 0   5  1 ]
        [ 2  -1  4 ]
Suppose a row operation is applied to A and the result is
    B = [ 1   2  1 ]
        [ 0  10  2 ]
        [ 2  -1  4 ]

(a) Find the elementary matrix E such that EA = B.

(b) Find the inverse of E, E^{-1}, such that E^{-1}B = A.


Exercise 2.1.59 Let
    A = [ 1   2  1 ]
        [ 0   5  1 ]
        [ 2  -1  4 ]
Suppose a row operation is applied to A and the result is
    B = [ 1    2    1 ]
        [ 0    5    1 ]
        [ 1  -1/2   2 ]

(a) Find the elementary matrix E such that EA = B.

(b) Find the inverse of E, E^{-1}, such that E^{-1}B = A.


Exercise 2.1.60 Let
    A = [ 1   2  1 ]
        [ 0   5  1 ]
        [ 2  -1  4 ]
Suppose a row operation is applied to A and the result is
    B = [ 1   2  1 ]
        [ 2   4  5 ]
        [ 2  -1  4 ]

(a) Find the elementary matrix E such that EA = B.

(b) Find the inverse of E, E^{-1}, such that E^{-1}B = A.


3. Determinants

3.1 Basic Techniques and Properties

Outcomes
A. Evaluate the determinant of a square matrix using either Laplace Expansion or row operations.

B. Demonstrate the effects that row operations have on determinants.

C. Verify the following:

(a) The determinant of a product of matrices is the product of the determinants.


(b) The determinant of a matrix is equal to the determinant of its transpose.

3.1.1. Cofactors and 2 × 2 Determinants

Let A be an n × n matrix. That is, let A be a square matrix. The determinant of A, denoted by det(A), is a very important number which we will explore throughout this section.
If A is a 2 × 2 matrix, the determinant is given by the following formula.

Definition 3.1: Determinant of a Two By Two Matrix


 
Let
    A = [ a  b ]
        [ c  d ]
Then

    det(A) = ad - cb

The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus

    det [ a  b ] = | a  b | = ad - bc
        [ c  d ]   | c  d |

The following is an example of finding the determinant of a 2 2 matrix.


Example 3.2: A Two by Two Determinant


 
Find det(A) for the matrix
    A = [  2  4 ]
        [ -1  6 ]

Solution. From Definition 3.1,

    det(A) = (2)(6) - (-1)(4) = 12 + 4 = 16
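For a quick numerical cross-check (a NumPy sketch, illustrative only), the 2 × 2 formula agrees with a library determinant:

    import numpy as np

    A = np.array([[2.0, 4.0],
                  [-1.0, 6.0]])
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]

    print(a * d - c * b)        # 16.0, from Definition 3.1
    print(np.linalg.det(A))     # 16.0 (up to floating point)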


The 2 × 2 determinant can be used to find the determinant of larger matrices. We will now explore how to find the determinant of a 3 × 3 matrix, using several tools including the 2 × 2 determinant.
We begin with the following definition.

Definition 3.3: The i jth Minor of a Matrix


Let A be a 3 × 3 matrix. The ijth minor of A, denoted as minor(A)_{ij}, is the determinant of the 2 × 2 matrix which results from deleting the ith row and the jth column of A.
In general, if A is an n × n matrix, then the ijth minor of A is the determinant of the (n - 1) × (n - 1) matrix which results from deleting the ith row and the jth column of A.

Hence, there is a minor associated with each entry of A. Consider the following example which
demonstrates this definition.

Example 3.4: Finding Minors of a Matrix


Let
    A = [ 1  2  3 ]
        [ 4  3  2 ]
        [ 3  2  1 ]
Find minor(A)_{12} and minor(A)_{23}.

Solution. First we will find minor(A)_{12}. By Definition 3.3, this is the determinant of the 2 × 2 matrix which results when you delete the first row and the second column. This minor is given by

    minor(A)_{12} = det [ 4  2 ]
                        [ 3  1 ]

Using Definition 3.1, we see that

    det [ 4  2 ] = (4)(1) - (3)(2) = 4 - 6 = -2
        [ 3  1 ]

Therefore minor(A)_{12} = -2.



Similarly, minor(A)_{23} is the determinant of the 2 × 2 matrix which results when you delete the second row and the third column. This minor is therefore

    minor(A)_{23} = det [ 1  2 ] = -4
                        [ 3  2 ]

Finding the other minors of A is left as an exercise.


The ijth minor of a matrix A is used in another important definition, given next.

Definition 3.5: The ijth Cofactor of a Matrix

Suppose A is an n × n matrix. The ijth cofactor, denoted by cof(A)_{ij}, is defined to be

    cof(A)_{ij} = (-1)^{i+j} minor(A)_{ij}

It is also convenient to refer to the cofactor of an entry of a matrix as follows. If a_{ij} is the ijth entry of the matrix, then its cofactor is just cof(A)_{ij}.

Example 3.6: Finding Cofactors of a Matrix


Consider the matrix
    A = [ 1  2  3 ]
        [ 4  3  2 ]
        [ 3  2  1 ]
Find cof(A)_{12} and cof(A)_{23}.

Solution. We will use Definition 3.5 to compute these cofactors.

First, we will compute cof(A)_{12}. Therefore, we need to find minor(A)_{12}. This is the determinant of the 2 × 2 matrix which results when you delete the first row and the second column. Thus minor(A)_{12} is given by

    det [ 4  2 ] = -2
        [ 3  1 ]

Then,

    cof(A)_{12} = (-1)^{1+2} minor(A)_{12} = (-1)^{1+2}(-2) = 2

Hence, cof(A)_{12} = 2.
Similarly, we can find cof(A)_{23}. First, find minor(A)_{23}, which is the determinant of the 2 × 2 matrix which results when you delete the second row and the third column. This minor is therefore

    det [ 1  2 ] = -4
        [ 3  2 ]

Hence,

    cof(A)_{23} = (-1)^{2+3} minor(A)_{23} = (-1)^{2+3}(-4) = 4


You may wish to find the remaining cofactors for the above matrix. Remember that there is a cofactor
for every entry in the matrix.
We have now established the tools we need to find the determinant of a 3 × 3 matrix.

Definition 3.7: The Determinant of a Three By Three Matrix


Let A be a 3 × 3 matrix. Then, det(A) is calculated by picking a row (or column) and taking the product of each entry in that row (column) with its cofactor and adding these products together. This process when applied to the ith row (column) is known as expanding along the ith row (column) and is given by

    det(A) = a_{i1} cof(A)_{i1} + a_{i2} cof(A)_{i2} + a_{i3} cof(A)_{i3}

When calculating the determinant, you can choose to expand any row or any column. Regardless of
your choice, you will always get the same number which is the determinant of the matrix A. This method of
evaluating a determinant by expanding along a row or a column is called Laplace Expansion or Cofactor
Expansion.
Consider the following example.

Example 3.8: Finding the Determinant of a Three by Three Matrix


Let
    A = [ 1  2  3 ]
        [ 4  3  2 ]
        [ 3  2  1 ]
Find det(A) using the method of Laplace Expansion.

Solution. First, we will calculate det(A) by expanding along the first column. Using Definition 3.7, we take the 1 in the first column and multiply it by its cofactor,

    1(-1)^{1+1} | 3  2 | = (1)(1)(-1) = -1
                | 2  1 |

Similarly, we take the 4 in the first column and multiply it by its cofactor, as well as with the 3 in the first column. Finally, we add these numbers together, as given in the following equation.

    det(A) = 1(-1)^{1+1} | 3  2 | + 4(-1)^{2+1} | 2  3 | + 3(-1)^{3+1} | 2  3 |
                         | 2  1 |               | 2  1 |               | 3  2 |

Calculating each of these, we obtain

    det(A) = 1(1)(-1) + 4(-1)(-4) + 3(1)(-5) = -1 + 16 - 15 = 0

Hence, det(A) = 0.



As mentioned in Definition 3.7, we can choose to expand along any row or column. Let's try now by expanding along the second row. Here, we take the 4 in the second row and multiply it by its cofactor, then add this to the 3 in the second row multiplied by its cofactor, and the 2 in the second row multiplied by its cofactor. The calculation is as follows.

    det(A) = 4(-1)^{2+1} | 2  3 | + 3(-1)^{2+2} | 1  3 | + 2(-1)^{2+3} | 1  2 |
                         | 2  1 |               | 3  1 |               | 3  2 |

Calculating each of these products, we obtain

    det(A) = 4(-1)(-4) + 3(1)(-8) + 2(-1)(-4) = 16 - 24 + 8 = 0

You can see that for both methods, we obtained det(A) = 0.
As mentioned above, we will always come up with the same value for det (A) regardless of the row or
column we choose to expand along. You should try to compute the above determinant by expanding along
other rows and columns. This is a good way to check your work, because you should come up with the
same number each time!
We present this idea formally in the following theorem.

Theorem 3.9: The Determinant is Well Defined


Expanding the n × n matrix along any row or column always gives the same answer, which is the determinant.

We have now looked at the determinants of 2 × 2 and 3 × 3 matrices. It turns out that the method used to calculate the determinant of a 3 × 3 matrix can be used to calculate the determinant of any sized matrix. Notice that Definition 3.3, Definition 3.5 and Definition 3.7 can all be applied to a matrix of any size. For example, the ijth minor of a 4 × 4 matrix is the determinant of the 3 × 3 matrix you obtain when you delete the ith row and the jth column. Just as with the 3 × 3 determinant, we can compute the determinant of a 4 × 4 matrix by Laplace Expansion, along any row or column.
Consider the following example.

Example 3.10: Determinant of a Four by Four Matrix


Find det(A) where
    A = [ 1  2  3  4 ]
        [ 5  4  2  3 ]
        [ 1  3  4  5 ]
        [ 3  4  3  2 ]

Solution. As in the case of a 3 × 3 matrix, you can expand this along any row or column. Let's pick the third column. Then, using Laplace Expansion,

    det(A) = 3(-1)^{1+3} | 5  4  3 | + 2(-1)^{2+3} | 1  2  4 |
                         | 1  3  5 |               | 1  3  5 |
                         | 3  4  2 |               | 3  4  2 |

           + 4(-1)^{3+3} | 1  2  4 | + 3(-1)^{4+3} | 1  2  4 |
                         | 5  4  3 |               | 5  4  3 |
                         | 3  4  2 |               | 1  3  5 |

Now, you can calculate each 3 × 3 determinant using Laplace Expansion, as we did above. You should complete these as an exercise and verify that det(A) = -12.
The following provides a formal definition for the determinant of an n × n matrix. You may wish to take a moment and consider the above definitions for 2 × 2 and 3 × 3 determinants in context of this definition.

Definition 3.11: The Determinant of an n × n Matrix

Let A be an n × n matrix where n ≥ 2 and suppose the determinant of an (n - 1) × (n - 1) matrix has been defined. Then

    det(A) = Σ_{j=1}^{n} a_{ij} cof(A)_{ij} = Σ_{i=1}^{n} a_{ij} cof(A)_{ij}

The first formula consists of expanding the determinant along the ith row, and the second expands the determinant along the jth column.

In the following sections, we will explore some important properties and characteristics of the deter-
minant.
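This recursive definition translates directly into code. Below is a short Python sketch (illustrative, not from the text) that computes the determinant by cofactor expansion along the first row; it reproduces det(A) = 0 for the matrix of Example 3.8:

    def det_laplace(A):
        # Cofactor (Laplace) expansion along the first row.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # Minor: delete the first row and the (j+1)st column.
            minor = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det_laplace(minor)
        return total

    print(det_laplace([[1, 2, 3], [4, 3, 2], [3, 2, 1]]))  # 0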

3.1.2. The Determinant of a Triangular Matrix

There is a certain type of matrix for which finding the determinant is a very simple procedure. Consider
the following definition.

Definition 3.12: Triangular Matrices

A matrix A is upper triangular if a_{ij} = 0 whenever i > j. Thus the entries of such a matrix below the main diagonal equal 0, as shown. Here, ∗ refers to any nonzero number.

    [ ∗  ∗  ⋯  ∗ ]
    [ 0  ∗  ⋯  ∗ ]
    [ ⋮  ⋮  ⋱  ⋮ ]
    [ 0  ⋯  0  ∗ ]

A lower triangular matrix is defined similarly as a matrix for which all entries above the main diagonal are equal to zero.

The following theorem provides a useful way to calculate the determinant of a triangular matrix.

Theorem 3.13: Determinant of a Triangular Matrix


Let A be an upper or lower triangular matrix. Then det (A) is obtained by taking the product of the
entries on the main diagonal.

The verification of this Theorem can be done by computing the determinant using Laplace Expansion
along the first row or column.
Consider the following example.

Example 3.14: Determinant of a Triangular Matrix

Let
    A = [ 1  2  3  77   ]
        [ 0  2  6   7   ]
        [ 0  0  3  33.7 ]
        [ 0  0  0  -1   ]
Find det(A).

Solution. From Theorem 3.13, it suffices to take the product of the entries on the main diagonal. Thus det(A) = 1 × 2 × 3 × (-1) = -6.
Without using Theorem 3.13, you could use Laplace Expansion. We will expand along the first column. This gives

    det(A) = 1 | 2  6   7   | + 0(-1)^{2+1} | 2  3  77   |
               | 0  3  33.7 |               | 0  3  33.7 |
               | 0  0  -1   |               | 0  0  -1   |

           + 0(-1)^{3+1} | 2  3  77 | + 0(-1)^{4+1} | 2  3  77   |
                         | 2  6   7 |               | 2  6   7   |
                         | 0  0  -1 |               | 0  3  33.7 |

and the only nonzero term in the expansion is

    1 × | 2  6   7   |
        | 0  3  33.7 |
        | 0  0  -1   |

Now find the determinant of this 3 × 3 matrix, by expanding along the first column to obtain

    det(A) = 1 × ( 2 | 3  33.7 | + 0(-1)^{2+1} | 6   7 | + 0(-1)^{3+1} | 6   7   | )
                     | 0  -1   |               | 0  -1 |               | 3  33.7 |

           = 1 × 2 × | 3  33.7 |
                     | 0  -1   |

Next use Definition 3.1 to find the determinant of this 2 × 2 matrix, which is just 3 × (-1) - 0 × 33.7 = -3.
Putting all these steps together, we have

    det(A) = 1 × 2 × 3 × (-1) = -6



which is just the product of the entries down the main diagonal of the original matrix!
You can see that while both methods result in the same answer, Theorem 3.13 provides a much quicker
method.
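A NumPy sketch (illustrative) of Theorem 3.13 for the matrix above:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0, 77.0],
                  [0.0, 2.0, 6.0, 7.0],
                  [0.0, 0.0, 3.0, 33.7],
                  [0.0, 0.0, 0.0, -1.0]])

    # For a triangular matrix, the determinant is the product of the diagonal.
    print(np.prod(np.diag(A)))   # -6.0
    print(np.linalg.det(A))      # -6.0 (up to floating point)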
In the next section, we explore some important properties of determinants.

3.1.3. Properties of Determinants I: Examples

There are many important properties of determinants. Since many of these properties involve the row
operations discussed in Chapter 1, we recall that definition now.

Definition 3.15: Row Operations


The row operations consist of the following

1. Switch two rows.

2. Multiply a row by a nonzero number.

3. Replace a row by a multiple of another row added to itself.

We will now consider the effect of row operations on the determinant of a matrix. In future sections,
we will see that using the following properties can greatly assist in finding determinants. This section will
use the theorems as motivation to provide various examples of the usefulness of the properties.
The first theorem explains the effect on the determinant of a matrix when two rows are switched.

Theorem 3.16: Switching Rows


Let A be an n × n matrix and let B be a matrix which results from switching two rows of A. Then det(B) = -det(A).

When we switch two rows of a matrix, the determinant is multiplied by -1. Consider the following example.

Example 3.17: Switching Two Rows


   
Let A = [ 1  2 ] and let B = [ 3  4 ]. Knowing that det(A) = -2, find det(B).
        [ 3  4 ]             [ 1  2 ]

Solution. By Definition 3.1, det(A) = 1 × 4 - 3 × 2 = -2. Notice that the rows of B are the rows of A but switched. By Theorem 3.16, since two rows of A have been switched, det(B) = -det(A) = -(-2) = 2. You can verify this using Definition 3.1.
The next theorem demonstrates the effect on the determinant of a matrix when we multiply a row by a
scalar.

Theorem 3.18: Multiplying a Row by a Scalar


Let A be an n × n matrix and let B be a matrix which results from multiplying some row of A by a scalar k. Then det(B) = k det(A).

Notice that this theorem is true when we multiply one row of the matrix by k. If we were to multiply two rows of A by k to obtain B, we would have det(B) = k^2 det(A). Suppose we were to multiply all n rows of A by k to obtain the matrix B, so that B = kA. Then det(B) = k^n det(A). This gives the next theorem.

Theorem 3.19: Scalar Multiplication

Let A and B be n × n matrices and k a scalar, such that B = kA. Then det(B) = k^n det(A).

Consider the following example.

Example 3.20: Multiplying a Row by 5


   
Let A = [ 1  2 ], B = [ 5  10 ]. Knowing that det(A) = -2, find det(B).
        [ 3  4 ]      [ 3   4 ]

Solution. By Definition 3.1, det(A) = -2. We can also compute det(B) using Definition 3.1, and we see that det(B) = -10.
Now, let's compute det(B) using Theorem 3.18 and see if we obtain the same answer. Notice that the first row of B is 5 times the first row of A, while the second row of B is equal to the second row of A. By Theorem 3.18, det(B) = 5 det(A) = 5 × (-2) = -10.
You can see that this matches our answer above.
Finally, consider the next theorem for the last row operation, that of adding a multiple of a row to
another row.

Theorem 3.21: Adding a Multiple of a Row to Another Row


Let A be an n × n matrix and let B be a matrix which results from adding a multiple of a row to another row. Then det(A) = det(B).

Therefore, when we add a multiple of a row to another row, the determinant of the matrix is unchanged. Note that if a matrix A contains a row which is a multiple of another row, det(A) will equal 0. To see this, suppose the first row of A is equal to -1 times the second row. By Theorem 3.21, we can add the first row to the second row, and the determinant will be unchanged. However, this row operation will result in a row of zeros. Using Laplace Expansion along the row of zeros, we find that the determinant is 0.
Consider the following example.

Example 3.22: Adding a Row to Another Row


   
Let A = [ 1  2 ] and let B = [ 1  2 ]. Find det(B).
        [ 3  4 ]             [ 5  8 ]

Solution. By Definition 3.1, det(A) = -2. Notice that the second row of B is two times the first row of A added to the second row. By Theorem 3.21, det(B) = det(A) = -2. As usual, you can verify this answer using Definition 3.1.

Example 3.23: Multiple of a Row


 
Let A = [ 1  2 ]. Show that det(A) = 0.
        [ 2  4 ]

Solution. Using Definition 3.1, the determinant is given by

    det(A) = 1 × 4 - 2 × 2 = 0

However, notice that the second row is equal to 2 times the first row. Then by the discussion above following Theorem 3.21 the determinant will equal 0.
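The three row-operation effects (Theorems 3.16, 3.18, and 3.21) can all be observed numerically; here is a brief NumPy sketch (illustrative):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    print(np.linalg.det(A))        # -2.0

    B = A[[1, 0], :]               # switch the two rows
    print(np.linalg.det(B))        # 2.0 = -det(A)

    C = A.copy()
    C[0, :] *= 5                   # multiply row 1 by 5
    print(np.linalg.det(C))        # -10.0 = 5 det(A)

    D = A.copy()
    D[1, :] += 2 * D[0, :]         # add 2 times row 1 to row 2
    print(np.linalg.det(D))        # -2.0 = det(A)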
Until now, our focus has primarily been on row operations. However, we can carry out the same
operations with columns, rather than rows. The three operations outlined in Definition 3.15 can be done
with columns instead of rows. In this case, in Theorems 3.16, 3.18, and 3.21 you can replace the word,
"row" with the word "column".
There are several other major properties of determinants which do not involve row (or column) opera-
tions. The first is the determinant of a product of matrices.

Theorem 3.24: Determinant of a Product


Let A and B be two n × n matrices. Then

det (AB) = det (A) det (B)

In order to find the determinant of a product of matrices, we can simply take the product of the deter-
minants.
Consider the following example.

Example 3.25: The Determinant of a Product


Compare det(AB) and det(A) det(B) for

    A = [  1  2 ] ,  B = [ 3  2 ]
        [ -3  2 ]        [ 4  1 ]

Solution. First compute AB, which is given by

    AB = [  1  2 ] [ 3  2 ] = [ 11   4 ]
         [ -3  2 ] [ 4  1 ]   [ -1  -4 ]

and so by Definition 3.1

    det(AB) = det [ 11   4 ] = -40
                  [ -1  -4 ]

Now

    det(A) = det [  1  2 ] = 8
                 [ -3  2 ]

and

    det(B) = det [ 3  2 ] = -5
                 [ 4  1 ]

Computing det(A) × det(B) we have 8 × (-5) = -40. This is the same answer as above, and you can see that det(A) det(B) = 8 × (-5) = -40 = det(AB).
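A NumPy sketch (illustrative) of the product rule, using the matrices from this example:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [-3.0, 2.0]])
    B = np.array([[3.0, 2.0],
                  [4.0, 1.0]])

    # det(AB) = det(A) det(B)
    print(np.linalg.det(A @ B))                   # -40.0
    print(np.linalg.det(A) * np.linalg.det(B))    # -40.0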
Consider the next important property.

Theorem 3.26: Determinant of the Transpose


Let A be a matrix where A^T is the transpose of A. Then,

    det(A^T) = det(A)

This theorem is illustrated in the following example.

Example 3.27: Determinant of the Transpose


Let
    A = [ 2  5 ]
        [ 4  3 ]
Find det(A^T).

Solution. First, note that

    A^T = [ 2  4 ]
          [ 5  3 ]

Using Definition 3.1, we can compute det(A) and det(A^T). It follows that det(A) = 2 × 3 - 4 × 5 = -14 and det(A^T) = 2 × 3 - 5 × 4 = -14. Hence, det(A) = det(A^T).
The following provides an essential property of the determinant, as well as a useful way to determine
if a matrix is invertible.

Theorem 3.28: Determinant of the Inverse


Let A be an n × n matrix. Then A is invertible if and only if det(A) ≠ 0. If this is true, it follows that

    det(A^{-1}) = 1 / det(A)

Consider the following example.

Example 3.29: Determinant of an Invertible Matrix


   
Let
    A = [ 3  6 ] ,  B = [ 2  3 ]
        [ 2  4 ]        [ 5  1 ]
For each matrix, determine if it is invertible. If so, find the determinant of the inverse.

Solution. Consider the matrix A first. Using Definition 3.1 we can find the determinant as follows:

    det(A) = 3 × 4 - 2 × 6 = 12 - 12 = 0

By Theorem 3.28 A is not invertible.

Now consider the matrix B. Again by Definition 3.1 we have

    det(B) = 2 × 1 - 5 × 3 = 2 - 15 = -13

By Theorem 3.28 B is invertible and the determinant of the inverse is given by

    det(B^{-1}) = 1 / det(B) = 1 / (-13) = -1/13
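A NumPy sketch (illustrative) confirming the computation for B:

    import numpy as np

    B = np.array([[2.0, 3.0],
                  [5.0, 1.0]])

    print(np.linalg.det(B))                  # -13.0
    print(np.linalg.det(np.linalg.inv(B)))   # -0.0769... = -1/13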

3.1.4. Properties of Determinants II: Some Important Proofs

This section includes some important proofs on determinants and cofactors.

First we recall the definition of a determinant. If A = [a_{ij}] is an n × n matrix, then det A is defined by computing the expansion along the first row:

    det A = Σ_{i=1}^{n} a_{1,i} cof(A)_{1,i}.     (3.1)

If n = 1 then det A = a_{1,1}.

The following example is straightforward and strongly recommended as a means for getting used to definitions.

Example 3.30:
(1) Let E_{ij} be the elementary matrix obtained by interchanging the ith and jth rows of I. Then det E_{ij} = -1.
(2) Let E_{ik} be the elementary matrix obtained by multiplying the ith row of I by k. Then det E_{ik} = k.
(3) Let E_{ijk} be the elementary matrix obtained by multiplying the ith row of I by k and adding it to its jth row. Then det E_{ijk} = 1.
(4) If C and B are such that CB is defined and the ith row of C consists of zeros, then the ith row of CB consists of zeros.
(5) If E is an elementary matrix, then det E = det E^T.

Many of the proofs in this section use the Principle of Mathematical Induction. This concept is discussed in Appendix A.2 and is reviewed here for convenience. First we check that the assertion is true for n = 2 (the case n = 1 is either completely trivial or meaningless).
Next, we assume that the assertion is true for n - 1 (where n ≥ 3) and prove it for n. Once this is accomplished, by the Principle of Mathematical Induction we can conclude that the statement is true for all n × n matrices for every n ≥ 2.
If A is an n × n matrix and 1 ≤ j ≤ n, then the matrix obtained by removing the first row and the jth column from A is an (n - 1) × (n - 1) matrix (we shall denote this matrix by A(j) below). Since these matrices are used in the computation of the cofactors cof(A)_{1,i}, for 1 ≤ i ≤ n, the inductive assumption applies to these matrices.
Consider the following lemma.

Lemma 3.31:
If A is an n × n matrix such that one of its rows consists of zeros, then det A = 0.

Proof. We will prove this lemma using Mathematical Induction.

If n = 2 this is easy (check!).
Let n ≥ 3 be such that every matrix of size (n - 1) × (n - 1) with a row consisting of zeros has determinant equal to zero. Let i be such that the ith row of A consists of zeros. Then we have a_{ij} = 0 for 1 ≤ j ≤ n.
Fix j ∈ {1, 2, ..., n} such that j ≠ i. Then the matrix A(j) used in the computation of cof(A)_{1,j} has a row consisting of zeros, and by our inductive assumption cof(A)_{1,j} = 0.
On the other hand, if j = i then a_{1,j} = 0. Therefore a_{1,j} cof(A)_{1,j} = 0 for all j and by (3.1) we have

    det A = Σ_{j=1}^{n} a_{1,j} cof(A)_{1,j} = 0

as each of the summands is equal to 0.

Lemma 3.32:
Assume A, B and C are n × n matrices that for some 1 ≤ i ≤ n satisfy the following.

1. The jth rows of all three matrices are identical, for j ≠ i.

2. Each entry in the ith row of A is the sum of the corresponding entries in the ith rows of B and C.

Then det A = det B + det C.

Proof. This is not difficult to check for n = 2 (do check it!).

Now assume that the statement of the lemma is true for (n - 1) × (n - 1) matrices and fix A, B and C as in the statement. The assumptions state that we have a_{l,j} = b_{l,j} = c_{l,j} for j ≠ i and for 1 ≤ l ≤ n, and a_{l,i} = b_{l,i} + c_{l,i} for all 1 ≤ l ≤ n. Therefore A(i) = B(i) = C(i), and for j ≠ i the matrix A(j) has the property that its ith row is the sum of the ith rows of B(j) and C(j), while the other rows of all three matrices are identical. Therefore by our inductive assumption we have cof(A)_{1,j} = cof(B)_{1,j} + cof(C)_{1,j} for j ≠ i.
By (3.1) we have (using all equalities established above)

    det A = Σ_{l=1}^{n} a_{1,l} cof(A)_{1,l}
          = Σ_{l≠i} a_{1,l} (cof(B)_{1,l} + cof(C)_{1,l}) + (b_{1,i} + c_{1,i}) cof(A)_{1,i}
          = det B + det C

This proves that the assertion is true for all n and completes the proof.

Theorem 3.33:
Let A and B be n × n matrices.

1. If A is obtained by interchanging the ith and jth rows of B (with i ≠ j), then det A = -det B.

2. If A is obtained by multiplying the ith row of B by k, then det A = k det B.

3. If two rows of A are identical, then det A = 0.

4. If A is obtained by multiplying the ith row of B by k and adding it to the jth row of B (i ≠ j), then det A = det B.

Proof. We prove all statements by induction. The case n = 2 is easily checked directly (and it is strongly suggested that you do check it).
We assume n ≥ 3 and that (1)-(4) are true for all matrices of size (n - 1) × (n - 1).
(1) We prove the case when j = i + 1, i.e., we are interchanging two consecutive rows.
Let l ∈ {1, ..., n} \ {i, j}. Then A(l) is obtained from B(l) by interchanging two of its rows (draw a picture) and by our assumption

    cof(A)_{1,l} = -cof(B)_{1,l}.     (3.2)

Now consider a_{1,i} cof(A)_{1,i}. We have that a_{1,i} = b_{1,j} and also that A(i) = B(j). Since j = i + 1, we have

    (-1)^{1+j} = (-1)^{1+i+1} = -(-1)^{1+i}

and therefore a_{1,i} cof(A)_{1,i} = -b_{1,j} cof(B)_{1,j} and a_{1,j} cof(A)_{1,j} = -b_{1,i} cof(B)_{1,i}. Putting this together with (3.2) into (3.1), we see that if in the formula for det A we change the sign of each of the summands we obtain the formula for det B.

    det A = Σ_{l=1}^{n} a_{1,l} cof(A)_{1,l} = -Σ_{l=1}^{n} b_{1,l} cof(B)_{1,l} = -det B.

We have therefore proved the case of (1) when j = i + 1. In order to prove the general case, one needs the following fact. If i < j, then in order to interchange the ith and jth rows one can proceed by interchanging two adjacent rows 2(j - i) + 1 times: First swap the ith and (i+1)st, then the (i+1)st and (i+2)nd, and so on. After one interchanges the (j-1)st and jth rows, we have the ith row in the position of the jth, and the lth row in the position of the (l-1)st for i + 1 ≤ l ≤ j. Then proceed backwards swapping adjacent rows until everything is in place.
Since 2(j - i) + 1 is an odd number, (-1)^{2(j-i)+1} = -1 and we have that det A = -det B.
(2) This is like (1)... but much easier. Assume that (2) is true for all (n - 1) × (n - 1) matrices. We have that a_{ji} = k b_{ji} for 1 ≤ j ≤ n. In particular a_{1i} = k b_{1i}, and for l ≠ i the matrix A(l) is obtained from B(l) by multiplying one of its rows by k. Therefore cof(A)_{1,l} = k cof(B)_{1,l} for l ≠ i, and for all l we have a_{1,l} cof(A)_{1,l} = k b_{1,l} cof(B)_{1,l}. By (3.1), we have det A = k det B.
(3) This is a consequence of (1). If two rows of A are identical, then A is equal to the matrix obtained by interchanging those two rows, and therefore by (1) det A = -det A. This implies det A = 0.
(4) Assume (4) is true for all (n - 1) × (n - 1) matrices and fix A and B such that A is obtained by multiplying the ith row of B by k and adding it to the jth row of B (i ≠ j). If k = 0 then A = B and there is nothing to prove, so we may assume k ≠ 0.
Let C be the matrix obtained by replacing the jth row of B by the ith row of B multiplied by k. By Lemma 3.32, we have that

    det A = det B + det C

and we only need to show that det C = 0. But the ith and jth rows of C are proportional. If D is obtained by multiplying the jth row of C by 1/k, then by (2) we have det C = k det D (recall that k ≠ 0!). But the ith and jth rows of D are identical, hence by (3) we have det D = 0 and therefore det C = 0.

Theorem 3.34:
Let A and B be two n × n matrices. Then

    det(AB) = det(A) det(B)

Proof. If A is an elementary matrix of either type, then multiplying by A on the left has the same effect as performing the corresponding elementary row operation. Therefore the equality det(AB) = det A det B in this case follows by Example 3.30 and Theorem 3.33.
If C is the reduced row-echelon form of A, then we can write A = E_1 E_2 ··· E_m C for some elementary matrices E_1, ..., E_m.
Now we consider two cases.
Assume first that C = I. Then A = E_1 E_2 ··· E_m and AB = E_1 E_2 ··· E_m B. By applying the above equality m times, and then m - 1 times, we have that

    det AB = det E_1 det E_2 ··· det E_m det B
           = det(E_1 E_2 ··· E_m) det B
           = det A det B.

Now assume C ≠ I. Since it is in reduced row-echelon form, its last row consists of zeros and by (4) of Example 3.30 the last row of CB consists of zeros. By Lemma 3.31 we have det C = det(CB) = 0 and therefore

    det A = det(E_1 E_2 ··· E_m) det(C) = det(E_1 E_2 ··· E_m) × 0 = 0

and also

    det AB = det(E_1 E_2 ··· E_m) det(CB) = det(E_1 E_2 ··· E_m) × 0 = 0

hence det AB = 0 = det A det B.
The same machine used in the previous proof will be used again.

Theorem 3.35:
Let A be a matrix where A^T is the transpose of A. Then,

    det(A^T) = det(A)

Proof. Note first that the conclusion is true if A is elementary by (5) of Example 3.30.
Let C be the reduced row-echelon form of A. Then we can write A = E_1 E_2 ··· E_m C. Then A^T = C^T E_m^T ··· E_2^T E_1^T. By Theorem 3.34 we have

    det(A^T) = det(C^T) det(E_m^T) ··· det(E_2^T) det(E_1^T).

By (5) of Example 3.30 we have that det E_j = det E_j^T for all j. Also, det C is either 0 or 1 (depending on whether C = I or not) and in either case det C = det C^T. Therefore det A = det A^T.
The above discussions allow us to now prove Theorem 3.9. It is restated below.

Theorem 3.36:
Expanding an n × n matrix along any row or column always gives the same result, which is the determinant.

Proof. We first show that the determinant can be computed along any row. The case n = 1 does not apply and thus let n ≥ 2.
Let A be an n × n matrix and fix j > 1. We need to prove that

    det A = Σ_{i=1}^{n} a_{j,i} cof(A)_{j,i}.
Let us prove the case when j = 2.

Let B be the matrix obtained from A by interchanging its 1st and 2nd rows. Then by Theorem 3.33 we have

    det A = -det B.

Now we have

    det B = Σ_{i=1}^{n} b_{1,i} cof(B)_{1,i}.

Since B is obtained by interchanging the 1st and 2nd rows of A, we have that b_{1,i} = a_{2,i} for all i, and one can see that minor(B)_{1,i} = minor(A)_{2,i}.
Further,

    cof(B)_{1,i} = (-1)^{1+i} minor(B)_{1,i} = -(-1)^{2+i} minor(A)_{2,i} = -cof(A)_{2,i}

hence det B = -Σ_{i=1}^{n} a_{2,i} cof(A)_{2,i}, and therefore det A = -det B = Σ_{i=1}^{n} a_{2,i} cof(A)_{2,i} as desired.
The case when j > 2 is very similar; we still have minor(B)_{1,i} = minor(A)_{j,i}, but checking that det B = Σ_{i=1}^{n} a_{j,i} cof(A)_{j,i} holds with the correct sign is slightly more involved.
Now the cofactor expansion along column j of A is equal to the cofactor expansion along row j of A^T, which is, by the result just proved, equal to the cofactor expansion along row 1 of A^T, which is equal to the cofactor expansion along column 1 of A. Thus the cofactor expansion along any column yields the same result.
Finally, since det A = det A^T by Theorem 3.35, we conclude that the cofactor expansion along row 1 of A is equal to the cofactor expansion along row 1 of A^T, which is equal to the cofactor expansion along column 1 of A. Thus the proof is complete.

3.1.5. Finding Determinants using Row Operations

Theorems 3.16, 3.18 and 3.21 illustrate how row operations affect the determinant of a matrix. In this
section, we look at two examples where row operations are used to find the determinant of a large matrix.
Recall that when working with large matrices, Laplace Expansion is effective but time-consuming, as there are
many steps involved. This section provides useful tools for an alternative method. By first applying row
operations, we can obtain a simpler matrix to which we apply Laplace Expansion.
While working through questions such as these, it is useful to record your row operations as you go
along. Keep this in mind as you read through the next example.

Example 3.37: Finding a Determinant

Find the determinant of the matrix

    A = [ 1  2   3  4 ]
        [ 5  1   2  3 ]
        [ 4  5   4  3 ]
        [ 2  2  -4  5 ]

Solution. We will use the properties of determinants outlined above to find det(A). First, add -5 times the first row to the second row. Then add -4 times the first row to the third row, and -2 times the first
104 Determinants

row to the fourth row. This yields the matrix



1 2 3 4
0 9 13 17
B=
0 3 8 13

0 2 10 3

Notice that the only row operation we have done so far is adding a multiple of a row to another row.
Therefore, by Theorem 3.21, det (B) = det (A) .
At this stage, you could use Laplace Expansion to find det (B). However, we will continue with row
operations to find an even simpler matrix to work with.
Add 3 times the third row to the second row. By Theorem 3.21 this does not change the value of the
determinant. Then, multiply the fourth row by 3. This results in the matrix

1 2 3 4
0 0 11 22
C= 0 3 8 13

0 6 30 9

Here, det (C) = 3 det (B), which means that det (B) = 13 det (C)

Since det (A) = det (B), we now have that det (A) = 13 det (C). Again, you could use Laplace
Expansion here to find det (C). However, we will continue with row operations.
Now replace the add 2 times the third row to the fourth row. This does not change the value of the
determinant by Theorem 3.21. Finally switch the third and second rows. This causes the determinant to
be multiplied by 1. Thus det (C) = det (D) where

1 2 3 4
0 3 8 13
D= 0

0 11 22
0 0 14 17
 
Hence, det (A) = 13 det (C) = 13 det (D)
You could do more row operations or you could note that this can be easily expanded along the first
column. Then, expand the resulting 3 3 matrix also along the first column. This results in

11 22

det (D) = 1 (3) = 1485
14 17
1

and so det (A) = 3 (1485) = 495.
You can see that by using row operations, we can simplify a matrix to the point where Laplace Ex-
pansion involves only a few steps. In Example 3.37, we also could have continued until the matrix was in
upper triangular form, and taken the product of the entries on the main diagonal. Whenever computing the
determinant, it is useful to consider all the possible methods and tools.
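If you want to check the arithmetic in Example 3.37, a short NumPy computation (our own sketch, not part of the text) confirms the value:

import numpy as np

# The matrix A from Example 3.37
A = np.array([[1, 2, 3, 4],
              [5, 1, 2, 3],
              [4, 5, 4, 3],
              [2, 2, -4, 5]], dtype=float)

# np.linalg.det returns a float; round it to recover the exact integer value
print(round(np.linalg.det(A)))  # 495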
Consider the next example.

Example 3.38: Find the Determinant

Find the determinant of the matrix
A =
[ 1   2  3  2 ]
[ 1  -3  2  1 ]
[ 2   1  2  5 ]
[ 3  -4  1  2 ]

Solution. Once again, we will simplify the matrix through row operations. Add -1 times the first row to
the second row. Next add -2 times the first row to the third and finally take -3 times the first row and add
to the fourth row. This yields
B =
[ 1    2   3   2 ]
[ 0   -5  -1  -1 ]
[ 0   -3  -4   1 ]
[ 0  -10  -8  -4 ]
By Theorem 3.21, det (A) = det (B).
Remember you can work with the columns also. Take -5 times the fourth column and add to the
second column. This yields
C =
[ 1  -8   3   2 ]
[ 0   0  -1  -1 ]
[ 0  -8  -4   1 ]
[ 0  10  -8  -4 ]
By Theorem 3.21 det (A) = det (C).
Now take -1 times the third row and add to the top row. This gives
D =
[ 1   0   7   1 ]
[ 0   0  -1  -1 ]
[ 0  -8  -4   1 ]
[ 0  10  -8  -4 ]
which by Theorem 3.21 has the same determinant as A.
Now, we can find det (D) by expanding along the first column as follows. You can see that there will
be only one nonzero term.

det (D) = 1 det [ 0 -1 -1 ; -8 -4 1 ; 10 -8 -4 ] + 0 + 0 + 0

Expanding again along the first column, we have

det (D) = 1 ( 0 + 8 det [ -1 -1 ; -8 -4 ] + 10 det [ -1 -1 ; -4 1 ] ) = 8(4 - 8) + 10(-1 - 4) = -82

Now since det (A) = det (D), it follows that det (A) = -82.
Remember that you can verify these answers by using Laplace Expansion on A. Similarly, if you first
compute the determinant using Laplace Expansion, you can use the row operation method to verify.

Exercises

Exercise 3.1.1 Find the determinants of the following matrices.


(a)
[ 1 3 ]
[ 0 2 ]

(b)
[ 0 3 ]
[ 0 2 ]

(c)
[ 4 3 ]
[ 6 2 ]


Exercise 3.1.2 Let
A =
[ 1 2 4 ]
[ 0 1 3 ]
[ 2 5 1 ]
Find the following.

(a) minor(A)11

(b) minor(A)21

(c) minor(A)32

(d) cof(A)11

(e) cof(A)21

(f) cof(A)32

Exercise 3.1.3 Find the determinants of the following matrices.



(a)
[ 1 2 3 ]
[ 3 2 2 ]
[ 0 9 8 ]

(b)
[ 4 3 2 ]
[ 1 7 8 ]
[ 3 9 3 ]

(c)
[ 1 2 3 2 ]
[ 1 3 2 3 ]
[ 4 1 5 0 ]
[ 1 2 1 2 ]

Exercise 3.1.4 Find the following determinant by expanding along the first row and second column.

1 2 1

2 1 3

2 1 1

Exercise 3.1.5 Find the following determinant by expanding along the first column and third row.

1 2 1

1 0 1

2 1 1

Exercise 3.1.6 Find the following determinant by expanding along the second row and first column.

1 2 1

2 1 3

2 1 1

Exercise 3.1.7 Compute the determinant by cofactor expansion. Pick the easiest row or column to use.

1 0 0 1

2 1 1 0

0 0 0 2

2 1 3 1

Exercise 3.1.8 Find the determinant of the following matrices.


 
(a) A =
[ 1 34 ]
[ 0  2 ]

(b) A =
[ 4 3 14 ]
[ 0 2  0 ]
[ 0 0  5 ]

(c) A =
[ 2 3 15 0 ]
[ 0 4  1 7 ]
[ 0 0  3 5 ]
[ 0 0  0 1 ]

Exercise 3.1.9 An operation is done to get from the first matrix to the second. Identify what was done and
tell how it will affect the value of the determinant.
[ a b ]    [ a c ]
[ c d ]    [ b d ]

Exercise 3.1.10 An operation is done to get from the first matrix to the second. Identify what was done
and tell how it will affect the value of the determinant.
[ a b ]    [ c d ]
[ c d ]    [ a b ]

Exercise 3.1.11 An operation is done to get from the first matrix to the second. Identify what was done
and tell how it will affect the value of the determinant.
[ a b ]    [  a    b  ]
[ c d ]    [ a+c  b+d ]

Exercise 3.1.12 An operation is done to get from the first matrix to the second. Identify what was done
and tell how it will affect the value of the determinant.
[ a b ]    [  a   b ]
[ c d ]    [ 2c  2d ]

Exercise 3.1.13 An operation is done to get from the first matrix to the second. Identify what was done
and tell how it will affect the value of the determinant.
[ a b ]    [ b a ]
[ c d ]    [ d c ]

Exercise 3.1.14 Let A be an r × r matrix and suppose there are r - 1 rows (columns) such that all rows
(columns) are linear combinations of these r - 1 rows (columns). Show det (A) = 0.

Exercise 3.1.15 Show det (aA) = a^n det (A) for an n × n matrix A and scalar a.

Exercise 3.1.16 Construct 2 × 2 matrices A and B to show that det A det B = det (AB).

Exercise 3.1.17 Is it true that det (A + B) = det (A) + det (B)? If this is so, explain why. If it is not so, give
a counter example.

Exercise 3.1.18 An n × n matrix A is called nilpotent if A^k = 0 for some positive integer k. If A is
a nilpotent matrix and k is the smallest possible integer such that A^k = 0, what are the possible values of
det (A)?

Exercise 3.1.19 A matrix A is said to be orthogonal if A^T A = I. Thus the inverse of an orthogonal matrix is
just its transpose. What are the possible values of det (A) if A is an orthogonal matrix?

Exercise 3.1.20 Let A and B be two n × n matrices. A ∼ B (A is similar to B) means there exists an
invertible matrix P such that A = P^{-1} B P. Show that if A ∼ B, then det (A) = det (B).

Exercise 3.1.21 Tell whether each statement is true or false. If true, provide a proof. If false, provide a
counter example.

(a) If A is a 3 × 3 matrix with a zero determinant, then one column must be a multiple of some other
column.

(b) If any two columns of a square matrix are equal, then the determinant of the matrix equals zero.

(c) For two n × n matrices A and B, det (A + B) = det (A) + det (B).

(d) For an n × n matrix A, det (3A) = 3 det (A).

(e) If A^{-1} exists then det (A^{-1}) = det (A)^{-1}.

(f) If B is obtained by multiplying a single row of A by 4 then det (B) = 4 det (A).

(g) For A an n × n matrix, det (-A) = (-1)^n det (A).

(h) If A is a real n × n matrix, then det (A^T A) ≥ 0.

(i) If A^k = 0 for some positive integer k, then det (A) = 0.

(j) If AX = 0 for some X ≠ 0, then det (A) = 0.

Exercise 3.1.22 Find the determinant using row operations to first simplify.

1 2 1

2 3 2

4 1 2

Exercise 3.1.23 Find the determinant using row operations to first simplify.

2 1 3

2 4 2

1 4 5

Exercise 3.1.24 Find the determinant using row operations to first simplify.

1 2 1 2

3 1 2 3

1 0 3 1

2 3 2 2

Exercise 3.1.25 Find the determinant using row operations to first simplify.

1 4 1 2

3 2 2 3

1 0 3 3

2 1 2 2

3.2 Applications of the Determinant

Outcomes
A. Use determinants to determine whether a matrix has an inverse, and evaluate the inverse using
cofactors.

B. Apply Cramer's Rule to solve a 2 × 2 or a 3 × 3 linear system.

C. Given data points, find an appropriate interpolating polynomial and use it to estimate points.

3.2.1. A Formula for the Inverse

The determinant of a matrix also provides a way to find the inverse of a matrix. Recall the definition of
the inverse of a matrix in Definition 2.33. We say that A^{-1}, an n × n matrix, is the inverse of A, also n × n,
if AA^{-1} = I and A^{-1} A = I.
We now define a new matrix called the cofactor matrix of A. The cofactor matrix of A is the matrix
whose ij^th entry is the ij^th cofactor of A. The formal definition is as follows.

Definition 3.39: The Cofactor Matrix

Let A = [a_ij] be an n × n matrix. Then the cofactor matrix of A, denoted cof (A), is defined by
cof (A) = [cof (A)_ij], where cof (A)_ij is the ij^th cofactor of A.

Note that cof (A)_ij denotes the ij^th entry of the cofactor matrix.
We will use the cofactor matrix to create a formula for the inverse of A. First, we define the adjugate
of A to be the transpose of the cofactor matrix. We can also call this matrix the classical adjoint of A, and
we denote it by adj (A).
In the specific case where A is a 2 × 2 matrix given by
A =
[ a b ]
[ c d ]
then adj (A) is given by
adj (A) =
[  d  -b ]
[ -c   a ]
In general, adj (A) can always be found by taking the transpose of the cofactor matrix of A. The
following theorem provides a formula for A^{-1} using the determinant and adjugate of A.

Theorem 3.40: The Inverse and the Determinant

Let A be an n × n matrix. Then
A adj (A) = adj (A) A = det (A) I
Moreover A is invertible if and only if det (A) ≠ 0. In this case we have:
A^{-1} = (1/det (A)) adj (A)

Notice that the first formula holds for any n × n matrix A, and in the case A is invertible we actually
have a formula for A^{-1}.
Consider the following example.

Example 3.41: Find Inverse Using the Determinant

Find the inverse of the matrix
A =
[ 1 2 3 ]
[ 3 0 1 ]
[ 1 2 1 ]
using the formula in Theorem 3.40.

Solution. According to Theorem 3.40,
A^{-1} = (1/det (A)) adj (A)

First we will find the determinant of this matrix. Using Theorems 3.16, 3.18, and 3.21, we can first
simplify the matrix through row operations. First, add -3 times the first row to the second row. Then add
-1 times the first row to the third row to obtain
B =
[ 1  2  3 ]
[ 0 -6 -8 ]
[ 0  0 -2 ]
By Theorem 3.21, det (A) = det (B). By Theorem 3.13, det (B) = 1 · (-6) · (-2) = 12. Hence, det (A) = 12.
Now, we need to find adj (A). To do so, first we will find the cofactor matrix of A. This is given by
cof (A) =
[ -2 -2  6 ]
[  4 -2  0 ]
[  2  8 -6 ]
Here, the ij^th entry is the ij^th cofactor of the original matrix A which you can verify. Therefore, from
Theorem 3.40, the inverse of A is given by
A^{-1} = (1/12)
[ -2 -2  6 ]^T   [ -1/6  1/3   1/6 ]
[  4 -2  0 ]   = [ -1/6 -1/6   2/3 ]
[  2  8 -6 ]     [  1/2    0  -1/2 ]

Remember that we can always verify our answer for A^{-1}. Compute the products AA^{-1} and A^{-1}A and
make sure each product is equal to I.
Compute A^{-1}A as follows
A^{-1}A =
[ -1/6  1/3   1/6 ] [ 1 2 3 ]   [ 1 0 0 ]
[ -1/6 -1/6   2/3 ] [ 3 0 1 ] = [ 0 1 0 ] = I
[  1/2    0  -1/2 ] [ 1 2 1 ]   [ 0 0 1 ]

You can verify that AA^{-1} = I and hence our answer is correct.
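The cofactor construction used above translates directly into code. Here is a minimal Python/NumPy sketch of Theorem 3.40; the helper names cofactor_matrix and inverse_via_adjugate are our own, not from the text.

import numpy as np

def cofactor_matrix(A):
    """Return the matrix whose ij-th entry is the ij-th cofactor of A."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Delete row i and column j to obtain the ij-th minor
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

def inverse_via_adjugate(A):
    """Compute A^{-1} = adj(A)/det(A); the adjugate is the transposed cofactor matrix."""
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("det(A) = 0, so A is not invertible")
    return cofactor_matrix(A).T / d

A = np.array([[1, 2, 3], [3, 0, 1], [1, 2, 1]], dtype=float)
print(inverse_via_adjugate(A))  # reproduces the inverse found in Example 3.41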
We will look at another example of how to use this formula to find A^{-1}.

Example 3.42: Find the Inverse From a Formula

Find the inverse of the matrix
A =
[  1/2    0   1/2 ]
[ -1/6  1/3  -1/2 ]
[ -5/6  2/3  -1/2 ]
using the formula given in Theorem 3.40.

Solution. First we need to find det (A). This step is left as an exercise and you should verify that
det (A) = 1/6. The inverse is therefore equal to
A^{-1} = (1/(1/6)) adj (A) = 6 adj (A)

We continue to calculate as follows. Here we show the 2 × 2 determinants needed to find the cofactors.

A^{-1} = 6
[  det[ 1/3 -1/2 ; 2/3 -1/2 ]   -det[ -1/6 -1/2 ; -5/6 -1/2 ]    det[ -1/6 1/3 ; -5/6 2/3 ] ]^T
[ -det[ 0 1/2 ; 2/3 -1/2 ]       det[ 1/2 1/2 ; -5/6 -1/2 ]     -det[ 1/2 0 ; -5/6 2/3 ]    ]
[  det[ 0 1/2 ; 1/3 -1/2 ]      -det[ 1/2 1/2 ; -1/6 -1/2 ]      det[ 1/2 0 ; -1/6 1/3 ]    ]

Expanding all the 2 × 2 determinants, this yields

A^{-1} = 6
[  1/6  1/3   1/6 ]^T   [ 1  2 -1 ]
[  1/3  1/6  -1/3 ]   = [ 2  1  1 ]
[ -1/6  1/6   1/6 ]     [ 1 -2  1 ]

Again, you can always check your work by multiplying A^{-1}A and AA^{-1} and ensuring these products
equal I.
A^{-1}A =
[ 1  2 -1 ] [  1/2    0   1/2 ]   [ 1 0 0 ]
[ 2  1  1 ] [ -1/6  1/3  -1/2 ] = [ 0 1 0 ]
[ 1 -2  1 ] [ -5/6  2/3  -1/2 ]   [ 0 0 1 ]

This tells us that our calculation for A^{-1} is correct. It is left to the reader to verify that AA^{-1} = I.
The verification step is very important, as it is a simple way to check your work! If you multiply A^{-1}A
and AA^{-1} and these products are not both equal to I, be sure to go back and double check each step. One
common error is to forget to take the transpose of the cofactor matrix, so be sure to complete this step.
We will now prove Theorem 3.40.
Proof. (of Theorem 3.40) Recall that the (i, j)-entry of adj (A) is equal to cof (A)_ji. Thus the (i, j)-entry of
B = A adj (A) is:
B_ij = Σ_{k=1}^n a_ik adj (A)_kj = Σ_{k=1}^n a_ik cof (A)_jk
By the cofactor expansion theorem, we see that this expression for B_ij is equal to the determinant of the
matrix obtained from A by replacing its jth row by a_i1, a_i2, . . ., a_in, i.e., its ith row.
If i = j then this matrix is A itself and therefore B_ii = det A. If on the other hand i ≠ j, then this matrix
has its ith row equal to its jth row, and therefore B_ij = 0 in this case. Thus we obtain:
A adj (A) = det (A) I
Similarly we can verify that:
adj (A) A = det (A) I
And this proves the first part of the theorem.
Further, if A is invertible, then by Theorem 3.24 we have:
1 = det (I) = det (AA^{-1}) = det (A) det (A^{-1})
and thus det (A) ≠ 0. Equivalently, if det (A) = 0, then A is not invertible.
Finally if det (A) ≠ 0, then the above formula shows that A is invertible and that:
A^{-1} = (1/det (A)) adj (A)
This completes the proof.


This method for finding the inverse of A is useful in many contexts. In particular, it is useful with
complicated matrices where the entries are functions, rather than numbers.
Consider the following example.

Example 3.43: Inverse for Non-Constant Matrix

Suppose
A(t) =
[ e^t      0        0    ]
[ 0     cos t    sin t   ]
[ 0    -sin t    cos t   ]
Show that A(t)^{-1} exists and then find it.

Solution. First note det (A(t)) = e^t (cos^2 t + sin^2 t) = e^t ≠ 0 so A(t)^{-1} exists.
The cofactor matrix is
C(t) =
[ 1         0            0       ]
[ 0     e^t cos t    e^t sin t   ]
[ 0    -e^t sin t    e^t cos t   ]
and so the inverse is
(1/e^t)
[ 1         0            0       ]^T   [ e^{-t}     0        0    ]
[ 0     e^t cos t    e^t sin t   ]   = [ 0       cos t   -sin t  ]
[ 0    -e^t sin t    e^t cos t   ]     [ 0       sin t    cos t  ]

3.2.2. Cramer's Rule

Another context in which the formula given in Theorem 3.40 is important is Cramer's Rule. Recall that
we can represent a system of linear equations in the form AX = B, where the solutions to this system
are given by X. Cramer's Rule gives a formula for the solutions X in the special case that A is a square
invertible matrix. Note this rule does not apply if you have a system of equations in which there is a
different number of equations than variables (in other words, when A is not square), or when A is not
invertible.
Suppose we have a system of equations given by AX = B, and we want to find solutions X which
satisfy this system. Then recall that if A^{-1} exists,

AX = B
A^{-1} (AX) = A^{-1} B
(A^{-1} A) X = A^{-1} B
IX = A^{-1} B
X = A^{-1} B

Hence, the solutions X to the system are given by X = A^{-1} B. Since we assume that A^{-1} exists, we can use
the formula for A^{-1} given above. Substituting this formula into the equation for X, we have
X = A^{-1} B = (1/det (A)) adj (A) B
Let x_i be the ith entry of X and b_j be the jth entry of B. Then this equation becomes
x_i = Σ_{j=1}^n (a^{-1})_ij b_j = Σ_{j=1}^n (1/det (A)) adj (A)_ij b_j
where adj (A)_ij is the ij^th entry of adj (A).
By the formula for the expansion of a determinant along a column,
x_i = (1/det (A)) det
[ *  · · ·  b_1  · · ·  * ]
[ .           .         . ]
[ *  · · ·  b_n  · · ·  * ]
where here the ith column of A is replaced with the column vector [b_1, · · ·, b_n]^T. The determinant of this
modified matrix is taken and divided by det (A). This formula is known as Cramer's rule.
We formally define this method now.

Procedure 3.44: Using Cramer's Rule

Suppose A is an n × n invertible matrix and we wish to solve the system AX = B for X =
[x_1, · · ·, x_n]^T. Then Cramer's rule says
x_i = det (A_i) / det (A)
where A_i is the matrix obtained by replacing the ith column of A with the column matrix
B = [ b_1 · · · b_n ]^T

We illustrate this procedure in the following example.

Example 3.45: Using Cramer's Rule

Find x, y, z if
[ 1  2  1 ] [ x ]   [ 1 ]
[ 3  2  1 ] [ y ] = [ 2 ]
[ 2 -3  2 ] [ z ]   [ 3 ]

Solution. We will use the method outlined in Procedure 3.44 to find the values for x, y, z which give the
solution to this system. Let
B =
[ 1 ]
[ 2 ]
[ 3 ]
In order to find x, we calculate
x = det (A_1) / det (A)
where A_1 is the matrix obtained from replacing the first column of A with B.
Hence, A_1 is given by
A_1 =
[ 1  2  1 ]
[ 2  2  1 ]
[ 3 -3  2 ]
Therefore,
x = det (A_1) / det (A) = det [ 1 2 1 ; 2 2 1 ; 3 -3 2 ] / det [ 1 2 1 ; 3 2 1 ; 2 -3 2 ] = 1/2

Similarly, to find y we construct A_2 by replacing the second column of A with B. Hence, A_2 is given by
A_2 =
[ 1  1  1 ]
[ 3  2  1 ]
[ 2  3  2 ]
Therefore,
y = det (A_2) / det (A) = det [ 1 1 1 ; 3 2 1 ; 2 3 2 ] / det [ 1 2 1 ; 3 2 1 ; 2 -3 2 ] = -1/7

Similarly, A_3 is constructed by replacing the third column of A with B. Then, A_3 is given by
A_3 =
[ 1  2  1 ]
[ 3  2  2 ]
[ 2 -3  3 ]
Therefore, z is calculated as follows.
z = det (A_3) / det (A) = det [ 1 2 1 ; 3 2 2 ; 2 -3 3 ] / det [ 1 2 1 ; 3 2 1 ; 2 -3 2 ] = 11/14

Cramer's Rule gives you another tool to consider when solving a system of linear equations.
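Procedure 3.44 is equally mechanical in code. Below is a small Python/NumPy sketch (the name cramer_solve is ours), applied to the system of Example 3.45:

import numpy as np

def cramer_solve(A, b):
    """Solve AX = b by Cramer's rule; assumes A is square with det(A) != 0."""
    d = np.linalg.det(A)
    x = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        Ai = A.copy()
        Ai[:, i] = b  # replace the i-th column of A with b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[1, 2, 1], [3, 2, 1], [2, -3, 2]], dtype=float)
b = np.array([1, 2, 3], dtype=float)
print(cramer_solve(A, b))  # [0.5, -0.142857..., 0.785714...], i.e. x = 1/2, y = -1/7, z = 11/14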
We can also use Cramer's Rule when the coefficients are functions rather than numbers. Consider the
following system in which the matrix A has functions for entries.

Example 3.46: Use Cramer's Rule for Non-Constant Matrix

Solve for z if
[ 1        0            0       ] [ x ]   [ 1   ]
[ 0    e^t cos t    e^t sin t   ] [ y ] = [ t   ]
[ 0   -e^t sin t    e^t cos t   ] [ z ]   [ t^2 ]

Solution. We are asked to find the value of z in the solution. We will solve using Cramer's rule. Thus

z = det [ 1 0 1 ; 0 e^t cos t t ; 0 -e^t sin t t^2 ] / det [ 1 0 0 ; 0 e^t cos t e^t sin t ; 0 -e^t sin t e^t cos t ]
  = (t e^t (t cos t + sin t)) / e^{2t} = t ((cos t) t + sin t) e^{-t}

3.2.3. Polynomial Interpolation

In studying a set of data that relates variables x and y, it may be the case that we can use a polynomial to
fit the data. If such a polynomial can be established, it can be used to estimate values of x and y which
have not been provided.
Consider the following example.

Example 3.47: Polynomial Interpolation

Given data points (1, 4), (2, 9), (3, 12), find an interpolating polynomial p(x) of degree at most 2 and
then estimate the value corresponding to x = 1/2.

Solution. We want to find a polynomial given by
p(x) = r_0 + r_1 x + r_2 x^2
such that p(1) = 4, p(2) = 9 and p(3) = 12. To find this polynomial, substitute the known values in for x
and solve for r_0, r_1, and r_2.

p(1) = r_0 + r_1 + r_2 = 4
p(2) = r_0 + 2 r_1 + 4 r_2 = 9
p(3) = r_0 + 3 r_1 + 9 r_2 = 12

Writing the augmented matrix, we have
[ 1 1 1 |  4 ]
[ 1 2 4 |  9 ]
[ 1 3 9 | 12 ]

After row operations, the resulting matrix is
[ 1 0 0 | -3 ]
[ 0 1 0 |  8 ]
[ 0 0 1 | -1 ]

Therefore the solution to the system is r_0 = -3, r_1 = 8, r_2 = -1 and the required interpolating polynomial is
p(x) = -3 + 8x - x^2
To estimate the value for x = 1/2, we calculate p(1/2):

p(1/2) = -3 + 8(1/2) - (1/2)^2
       = -3 + 4 - 1/4
       = 3/4

This procedure can be used for any number of data points, and any degree of polynomial. The steps
are outlined below.

Procedure 3.48: Finding an Interpolating Polynomial

Suppose that values of x and corresponding values of y are given, such that the actual relationship
between x and y is unknown. Then, values of y can be estimated using an interpolating polynomial
p(x). If given x_1, ..., x_n and the corresponding y_1, ..., y_n, the procedure to find p(x) is as follows:

1. The desired polynomial p(x) is given by
p(x) = r_0 + r_1 x + r_2 x^2 + ... + r_{n-1} x^{n-1}

2. p(x_i) = y_i for all i = 1, 2, ..., n so that
r_0 + r_1 x_1 + r_2 x_1^2 + ... + r_{n-1} x_1^{n-1} = y_1
r_0 + r_1 x_2 + r_2 x_2^2 + ... + r_{n-1} x_2^{n-1} = y_2
...
r_0 + r_1 x_n + r_2 x_n^2 + ... + r_{n-1} x_n^{n-1} = y_n

3. Set up the augmented matrix of this system of equations
[ 1  x_1  x_1^2  · · ·  x_1^{n-1} | y_1 ]
[ 1  x_2  x_2^2  · · ·  x_2^{n-1} | y_2 ]
[ .   .     .             .       |  .  ]
[ 1  x_n  x_n^2  · · ·  x_n^{n-1} | y_n ]

4. Solving this system will result in a unique solution r_0, r_1, · · ·, r_{n-1}. Use these values to
construct p(x), and estimate the value of p(a) for any x = a.

This procedure motivates the following theorem.

Theorem 3.49: Polynomial Interpolation

Given n data points (x_1, y_1), (x_2, y_2), · · ·, (x_n, y_n) with the x_i distinct, there is a unique polynomial
p(x) = r_0 + r_1 x + r_2 x^2 + · · · + r_{n-1} x^{n-1} such that p(x_i) = y_i for i = 1, 2, · · ·, n. The resulting
polynomial p(x) is called the interpolating polynomial for the data points.

We conclude this section with another example.

Example 3.50: Polynomial Interpolation

Consider the data points (0, 1), (1, 2), (3, 22), (5, 66). Find an interpolating polynomial p(x) of
degree at most three, and estimate the value of p(2).

Solution. The desired polynomial p(x) is given by:
p(x) = r_0 + r_1 x + r_2 x^2 + r_3 x^3
Using the given points, the system of equations is

p(0) = r_0 = 1
p(1) = r_0 + r_1 + r_2 + r_3 = 2
p(3) = r_0 + 3 r_1 + 9 r_2 + 27 r_3 = 22
p(5) = r_0 + 5 r_1 + 25 r_2 + 125 r_3 = 66

The augmented matrix is given by:
[ 1 0  0   0 |  1 ]
[ 1 1  1   1 |  2 ]
[ 1 3  9  27 | 22 ]
[ 1 5 25 125 | 66 ]

The resulting matrix is
[ 1 0 0 0 |  1 ]
[ 0 1 0 0 | -2 ]
[ 0 0 1 0 |  3 ]
[ 0 0 0 1 |  0 ]

Therefore, r_0 = 1, r_1 = -2, r_2 = 3, r_3 = 0 and p(x) = 1 - 2x + 3x^2. To estimate the value of p(2), we
compute p(2) = 1 - 2(2) + 3(2^2) = 1 - 4 + 12 = 9.
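In code, Procedure 3.48 amounts to building the matrix of powers of the x_i and solving the resulting linear system. A Python/NumPy sketch (ours, not the text's) reproducing Example 3.50:

import numpy as np

x = np.array([0.0, 1.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 22.0, 66.0])

# Coefficient matrix with columns 1, x, x^2, x^3 (a Vandermonde matrix)
V = np.vander(x, N=4, increasing=True)

# Solve V r = y for the coefficients r0, r1, r2, r3
r = np.linalg.solve(V, y)
print(r)  # [ 1. -2.  3.  0.], i.e. p(x) = 1 - 2x + 3x^2

# Estimate p(2); np.polyval expects the highest-degree coefficient first
print(np.polyval(r[::-1], 2.0))  # 9.0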

Exercises

Exercise 3.2.26 Let
A =
[ 1 2 3 ]
[ 0 2 1 ]
[ 3 1 0 ]
Determine whether the matrix A has an inverse by finding whether the determinant is nonzero. If the
determinant is nonzero, find the inverse using the formula for the inverse which involves the cofactor
matrix.

Exercise 3.2.27 Let
A =
[ 1 2 0 ]
[ 0 2 1 ]
[ 3 1 1 ]
Determine whether the matrix A has an inverse by finding whether the determinant is nonzero. If the
determinant is nonzero, find the inverse using the formula for the inverse.

Exercise 3.2.28 Let
A =
[ 1 3 3 ]
[ 2 4 1 ]
[ 0 1 1 ]
Determine whether the matrix A has an inverse by finding whether the determinant is nonzero. If the
determinant is nonzero, find the inverse using the formula for the inverse.

Exercise 3.2.29 Let
A =
[ 1 2 3 ]
[ 0 2 1 ]
[ 2 6 7 ]
Determine whether the matrix A has an inverse by finding whether the determinant is nonzero. If the
determinant is nonzero, find the inverse using the formula for the inverse.

Exercise 3.2.30 Let
A =
[ 1 0 3 ]
[ 1 0 1 ]
[ 3 1 0 ]
Determine whether the matrix A has an inverse by finding whether the determinant is nonzero. If the
determinant is nonzero, find the inverse using the formula for the inverse.

Exercise 3.2.31 For the following matrices, determine if they are invertible. If so, use the formula for the
inverse in terms of the cofactor matrix to find each inverse. If the inverse does not exist, explain why.

(a)
[ 1 1 ]
[ 1 2 ]

(b)
[ 1 2 3 ]
[ 0 2 1 ]
[ 4 1 1 ]

(c)
[ 1 2 1 ]
[ 2 3 0 ]
[ 0 1 2 ]

Exercise 3.2.32 Consider the matrix
A =
[ 1      0        0    ]
[ 0   cos t    sin t   ]
[ 0  -sin t    cos t   ]

Does there exist a value of t for which this matrix fails to have an inverse? Explain.

Exercise 3.2.33 Consider the matrix
A =
[ 1  t  t^2 ]
[ 0  1  2t  ]
[ t  0  2   ]
Does there exist a value of t for which this matrix fails to have an inverse? Explain.

Exercise 3.2.34 Consider the matrix
A =
[ e^t   cosh t   sinh t ]
[ e^t   sinh t   cosh t ]
[ e^t   cosh t   sinh t ]
Does there exist a value of t for which this matrix fails to have an inverse? Explain.

Exercise 3.2.35 Consider the matrix
A =
[ e^t    e^t cos t                e^t sin t               ]
[ e^t    e^t cos t - e^t sin t    e^t sin t + e^t cos t   ]
[ e^t   -2 e^t sin t              2 e^t cos t             ]
Does there exist a value of t for which this matrix fails to have an inverse? Explain.

Exercise 3.2.36 Show that if det (A) ≠ 0 for A an n × n matrix, it follows that if AX = 0, then X = 0.

Exercise 3.2.37 Suppose A, B are n × n matrices and that AB = I. Show that then BA = I. Hint: First
explain why det (A), det (B) are both nonzero. Then (AB) A = A and then show BA (BA - I) = 0. From this
use what is given to conclude A (BA - I) = 0. Then use Problem 3.2.36.

Exercise 3.2.38 Use the formula for the inverse in terms of the cofactor matrix to find the inverse of the
matrix
A =
[ e^t       0                        0                    ]
[ 0     e^t cos t                e^t sin t                ]
[ 0     e^t cos t - e^t sin t    e^t cos t + e^t sin t    ]

Exercise 3.2.39 Find the inverse, if it exists, of the matrix
A =
[ e^t    cos t    sin t  ]
[ e^t   -sin t    cos t  ]
[ e^t   -cos t   -sin t  ]

Exercise 3.2.40 Suppose A is an upper triangular matrix. Show that A^{-1} exists if and only if all elements
of the main diagonal are nonzero. Is it true that A^{-1} will also be upper triangular? Explain. Could the
same be concluded for lower triangular matrices?

Exercise 3.2.41 If A, B, and C are each n × n matrices and ABC is invertible, show why each of A, B, and
C is invertible.

Exercise 3.2.42 Decide if this statement is true or false: Cramer's rule is useful for finding solutions to
systems of linear equations in which there is an infinite set of solutions.

Exercise 3.2.43 Use Cramer's rule to find the solution to
x + 2y = 1
2x - y = 2

Exercise 3.2.44 Use Cramer's rule to find the solution to
x + 2y + z = 1
2x - y - z = 2
x + z = 1
4. R^n

4.1 Vectors in R^n

Outcomes
A. Find the position vector of a point in R^n.

The notation R^n refers to the collection of ordered lists of n real numbers, that is
R^n = { (x_1, · · ·, x_n) : x_j ∈ R for j = 1, · · ·, n }
In this chapter, we take a closer look at vectors in R^n. First, we will consider what R^n looks like in more
detail. Recall that the point given by 0 = (0, · · ·, 0) is called the origin.
Now, consider the case of R^n for n = 1. Then from the definition we can identify R with points in R^1
as follows:
R = R^1 = { (x_1) : x_1 ∈ R }
Hence, R is defined as the set of all real numbers and geometrically, we can describe this as all the points
on a line.
Now suppose n = 2. Then, from the definition,
R^2 = { (x_1, x_2) : x_j ∈ R for j = 1, 2 }
Consider the familiar coordinate plane, with an x axis and a y axis. Any point within this coordinate plane
is identified by where it is located along the x axis, and also where it is located along the y axis. Consider
as an example the following diagram.
[Figure: the points P = (2, 1) and Q = (-3, 4) plotted in the xy-plane]

Hence, every element in R^2 is identified by two components, x and y, in the usual manner. The
coordinates x, y (or x_1, x_2) uniquely determine a point in the plane. Note that while the definition uses x_1
and x_2 to label the coordinates and you may be used to x and y, these notations are equivalent.
Now suppose n = 3. You may have previously encountered the 3-dimensional coordinate system, given
by
R^3 = { (x_1, x_2, x_3) : x_j ∈ R for j = 1, 2, 3 }

Points in R3 will be determined by three coordinates, often written (x, y, z) which correspond to the x,
y, and z axes. We can think as above that the first two coordinates determine a point in a plane. The third
component determines the height above or below the plane, depending on whether this number is positive
or negative, and all together this determines a point in space. You see that the ordered triples correspond to
points in space just as the ordered pairs correspond to points in a plane and single real numbers correspond
to points on a line.
The idea behind the more general R^n is that we can extend these ideas beyond n = 3. This discussion
regarding points in R^n leads into a study of vectors in R^n. While we consider R^n for all n, we will largely
focus on n = 2, 3 in this section.
Consider the following definition.

Definition 4.1: The Position Vector

Let P = (p_1, · · ·, p_n) be the coordinates of a point in R^n. Then the vector ~0P with its tail at
0 = (0, · · ·, 0) and its tip at P is called the position vector of the point P. We write
~0P = [ p_1 · · · p_n ]^T

For this reason we may write both P = (p_1, · · ·, p_n) ∈ R^n and ~0P = [ p_1 · · · p_n ]^T ∈ R^n.
This definition is illustrated in the following picture for the special case of R^3.

[Figure: the position vector ~0P = [ p_1 p_2 p_3 ]^T drawn from the origin to the point P = (p_1, p_2, p_3)]

Thus every point P in R^n determines its position vector ~0P. Conversely, every such position vector ~0P
which has its tail at 0 and point at P determines the point P of R^n.
Now suppose we are given two points, P, Q whose coordinates are (p_1, · · ·, p_n) and (q_1, · · ·, q_n)
respectively. We can also determine the position vector from P to Q (also called the vector from P to Q)
defined as follows.
~PQ = [ q_1 - p_1 · · · q_n - p_n ]^T = ~0Q - ~0P

Now, imagine taking a vector in R^n and moving it around, always keeping it pointing in the same
direction as shown in the following picture.

[Figure: the vector ~0P = [ p_1 p_2 p_3 ]^T drawn from the origin, and the same vector drawn again with its tail at a point A and its tip at a point B]

After moving it around, it is regarded as the same vector. Each of the vectors ~0P and ~AB has the same
length (or magnitude) and direction. Therefore, they are equal.
Consider now the general definition for a vector in R^n.

Definition 4.2: Vectors in R^n

Let R^n = { (x_1, · · ·, x_n) : x_j ∈ R for j = 1, · · ·, n }. Then,
~x = [ x_1 · · · x_n ]^T
is called a vector. Vectors have both size (magnitude) and direction. The numbers x_j are called the
components of ~x.

Using this notation, we may use ~p to denote the position vector of point P. Notice that in this context,
~p = ~0P. These notations may be used interchangeably.
You can think of the components of a vector as directions for obtaining the vector. Consider n = 3.
Draw a vector with its tail at the point (0, 0, 0) and its tip at the point (a, b, c). This vector is obtained
by starting at (0, 0, 0), moving parallel to the x axis to (a, 0, 0) and then from here, moving parallel to the
y axis to (a, b, 0) and finally parallel to the z axis to (a, b, c) . Observe that the same vector would result if
you began at the point (d, e, f ), moved parallel to the x axis to (d + a, e, f ) , then parallel to the y axis to
(d + a, e + b, f ) , and finally parallel to the z axis to (d + a, e + b, f + c). Here, the vector would have its
tail sitting at the point determined by A = (d, e, f ) and its point at B = (d + a, e + b, f + c) . It is the same
vector because it will point in the same direction and have the same length. It is like you took an actual
arrow, and moved it from one location to another keeping it pointing the same direction.
We conclude this section with a brief discussion regarding notation. In previous sections, we have
written vectors as columns, or n × 1 matrices. For convenience in this chapter we may write vectors as the
transpose of row vectors, or 1 × n matrices. These are of course equivalent and we may move between
both notations. Therefore, recognize that
[ 2 ]
[ 3 ]  =  [ 2 3 ]^T
Notice that two vectors ~u = [ u_1 · · · u_n ]^T and ~v = [ v_1 · · · v_n ]^T are equal if and only if all corresponding
components are equal. Precisely,
~u = ~v if and only if
u_j = v_j for all j = 1, · · ·, n

Thus [ 1 2 4 ]^T ∈ R^3 and [ 2 1 4 ]^T ∈ R^3, but [ 1 2 4 ]^T ≠ [ 2 1 4 ]^T because, even though
the same numbers are involved, the order of the numbers is different.
For the specific case of R^3, there are three special vectors which we often use. They are given by
~i = [ 1 0 0 ]^T
~j = [ 0 1 0 ]^T
~k = [ 0 0 1 ]^T
We can write any vector ~u = [ u_1 u_2 u_3 ]^T as a linear combination of these vectors, written as
~u = u_1 ~i + u_2 ~j + u_3 ~k. This notation will be used throughout this chapter.

4.2 Algebra in R^n

Outcomes
A. Understand vector addition and scalar multiplication, algebraically.

B. Introduce the notion of linear combination of vectors.

Addition and scalar multiplication are two important algebraic operations done with vectors. Notice
that these operations apply to vectors in Rn , for any value of n. We will explore these operations in more
detail in the following sections.

4.2.1. Addition of Vectors in R^n

Addition of vectors in R^n is defined as follows.

Definition 4.3: Addition of Vectors in R^n

If ~u = [ u_1 · · · u_n ]^T, ~v = [ v_1 · · · v_n ]^T ∈ R^n then ~u + ~v ∈ R^n and is defined by
~u + ~v = [ u_1 + v_1 · · · u_n + v_n ]^T

To add vectors, we simply add corresponding components exactly as we did for matrices. Therefore,
in order to add vectors, they must be the same size.
Similarly to matrices, addition of vectors satisfies some important properties. These are outlined in the
following theorem.

Theorem 4.4: Properties of Vector Addition

The following properties hold for vectors ~u, ~v, ~w ∈ R^n.

The Commutative Law of Addition
~u + ~v = ~v + ~u

The Associative Law of Addition
(~u + ~v) + ~w = ~u + (~v + ~w)

The Existence of an Additive Identity
~u + ~0 = ~u (4.1)

The Existence of an Additive Inverse
~u + (-~u) = ~0

The proof of this theorem follows from the similar properties for matrix operations. Thus the additive
identity shown in equation 4.1 is also called the zero vector, the n × 1 vector in which all components
are equal to 0. Further, -~u is simply the vector with all components having the same value as those of ~u
but opposite sign; this is just (-1)~u. This will be made more explicit in the next section when we explore
scalar multiplication of vectors. Note that subtraction is defined as ~u - ~v = ~u + (-~v).

4.2.2. Scalar Multiplication of Vectors in R^n

Scalar multiplication of vectors in R^n is defined as follows. Notice that, just like addition, this definition
is the same as the corresponding definition for matrices.

Definition 4.5: Scalar Multiplication of Vectors in R^n

If ~u ∈ R^n and k ∈ R is a scalar, then k~u ∈ R^n is defined by
k~u = k [ u_1 · · · u_n ]^T = [ k u_1 · · · k u_n ]^T

Just as with addition, scalar multiplication of vectors satisfies several important properties. These are
outlined in the following theorem.

Theorem 4.6: Properties of Scalar Multiplication

The following properties hold for vectors ~u, ~v ∈ R^n and k, p scalars.

The Distributive Law over Vector Addition
k (~u + ~v) = k~u + k~v

The Distributive Law over Scalar Addition
(k + p) ~u = k~u + p~u

The Associative Law for Scalar Multiplication
k (p~u) = (kp) ~u

Rule for Multiplication by 1
1~u = ~u

Proof. Again the verification of these properties follows from the corresponding properties for scalar
multiplication of matrices.
As a refresher we can show that
k (~u + ~v) = k~u + k~v
Note that:
k (~u + ~v) = k [ u_1 + v_1 · · · u_n + v_n ]^T
           = [ k (u_1 + v_1) · · · k (u_n + v_n) ]^T
           = [ k u_1 + k v_1 · · · k u_n + k v_n ]^T
           = [ k u_1 · · · k u_n ]^T + [ k v_1 · · · k v_n ]^T
           = k~u + k~v

We now present a useful notion that you may have seen earlier, combining vector addition and scalar
multiplication.

Definition 4.7: Linear Combination

A vector ~v is said to be a linear combination of the vectors ~u_1, · · ·, ~u_n if there exist scalars
a_1, · · ·, a_n such that
~v = a_1 ~u_1 + · · · + a_n ~u_n

For example,
3 [ 4 1 0 ]^T + 2 [ 3 0 1 ]^T = [ 18 3 2 ]^T
Thus we can say that
~v = [ 18 3 2 ]^T
is a linear combination of the vectors
~u_1 = [ 4 1 0 ]^T and ~u_2 = [ 3 0 1 ]^T
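Deciding whether a given ~v is a linear combination of ~u_1, ~u_2 amounts to asking whether the linear system with the ~u_i as columns and ~v as right-hand side is consistent. A Python/NumPy sketch (our own illustration, handy for Exercises 4.2.3 and 4.2.4 below):

import numpy as np

u1 = np.array([4.0, 1.0, 0.0])
u2 = np.array([3.0, 0.0, 1.0])
v = np.array([18.0, 3.0, 2.0])

# Use u1 and u2 as the columns of a matrix U and look for scalars a with U a = v
U = np.column_stack([u1, u2])
a, _, _, _ = np.linalg.lstsq(U, v, rcond=None)

# v is a linear combination of u1, u2 exactly when U a reproduces v
print(a)                      # [3. 2.]
print(np.allclose(U @ a, v))  # True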

Exercises

Exercise 4.2.1 Find 3 [ 5 1 2 3 ]^T + 5 [ 8 2 3 6 ]^T.

Exercise 4.2.2 Find 7 [ 6 0 4 1 ]^T + 6 [ 13 1 1 6 ]^T.

Exercise 4.2.3 Decide whether
~v = [ 4 4 3 ]^T
is a linear combination of the vectors
~u_1 = [ 3 1 1 ]^T and ~u_2 = [ 2 2 1 ]^T.

Exercise 4.2.4 Decide whether
~v = [ 4 4 4 ]^T
is a linear combination of the vectors
~u_1 = [ 3 1 1 ]^T and ~u_2 = [ 2 2 1 ]^T.

4.3 Geometric Meaning of Vector Addition

Outcomes
A. Understand vector addition, geometrically.

Recall that an element of Rn is an ordered list of numbers. For the specific case of n = 2, 3 this can
be used to determine a point in two or three dimensional space. This point is specified relative to some
coordinate axes.
Consider the case n = 3. Recall that taking a vector and moving it around without changing its length or
direction does not change the vector. This is important in the geometric representation of vector addition.
Suppose we have two vectors, ~u and ~v in R3 . Each of these can be drawn geometrically by placing the
tail of each vector at 0 and its point at (u1 , u2 , u3 ) and (v1 , v2 , v3 ) respectively. Suppose we slide the vector
~v so that its tail sits at the point of ~u. We know that this does not change the vector ~v. Now, draw a new
vector from the tail of ~u to the point of ~v. This vector is ~u +~v.
The geometric significance of vector addition in Rn for any n is given in the following definition.

Definition 4.8: Geometry of Vector Addition


Let ~u and ~v be two vectors. Slide ~v so that the tail of ~v is on the point of ~u. Then draw the arrow
which goes from the tail of ~u to the point of ~v. This arrow represents the vector ~u +~v.

[Figure: ~v drawn with its tail at the tip of ~u, and ~u + ~v drawn from the tail of ~u to the tip of ~v]

This definition is illustrated in the following picture in which ~u + ~v is shown for the special case n = 3.

[Figure: vectors ~u and ~v in three dimensions, with ~v slid to the tip of ~u and ~u + ~v drawn from the tail of ~u]

Notice the parallelogram created by ~u and ~v in the above diagram. Then ~u + ~v is the directed diagonal
of the parallelogram determined by the two vectors ~u and ~v.
When you have a vector ~v, its additive inverse -~v will be the vector which has the same magnitude as
~v but the opposite direction. When one writes ~u - ~v, the meaning is ~u + (-~v) as with real numbers. The
following example illustrates these definitions and conventions.

Example 4.9: Graphing Vector Addition

Consider the following picture of vectors ~u and ~v.

[Figure: two vectors ~u and ~v]

Sketch a picture of ~u + ~v and ~u - ~v.

Solution. We will first sketch ~u + ~v. Begin by drawing ~u and then at the point of ~u, place the tail of ~v as
shown. Then ~u + ~v is the vector which results from drawing a vector from the tail of ~u to the tip of ~v.

[Figure: ~v placed at the tip of ~u, with ~u + ~v drawn from the tail of ~u to the tip of ~v]

Next consider ~u - ~v. This means ~u + (-~v). From the above geometric description of vector addition,
-~v is the vector which has the same length but which points in the opposite direction to ~v. Here is a
picture.

[Figure: -~v placed at the tip of ~u, with ~u - ~v drawn from the tail of ~u to the tip of -~v]

4.4 Length of a Vector

Outcomes
A. Find the length of a vector and the distance between two points in R^n.

B. Find the corresponding unit vector to a vector in R^n.

In this section, we explore what is meant by the length of a vector in R^n. We develop this concept by
first looking at the distance between two points in R^n.
First, we will consider the concept of distance for R, that is, for points in R^1. Here, the distance
between two points P and Q is given by the absolute value of their difference. We denote the distance
between P and Q by d(P, Q) which is defined as
d(P, Q) = √( (P - Q)^2 ) (4.2)

Consider now the case for n = 2, demonstrated by the following picture.

[Figure: points P = (p_1, p_2) and Q = (q_1, q_2) in the plane, with the segment PQ as the hypotenuse of a right triangle through (p_1, q_2)]

There are two points P = (p_1, p_2) and Q = (q_1, q_2) in the plane. The distance between these points
is shown in the picture as a solid line. Notice that this line is the hypotenuse of a right triangle which
is half of the rectangle shown in dotted lines. We want to find the length of this hypotenuse which will
give the distance between the two points. Note the lengths of the sides of this triangle are |p_1 - q_1| and
|p_2 - q_2|, the absolute value of the difference in these values. Therefore, the Pythagorean Theorem implies
the length of the hypotenuse (and thus the distance between P and Q) equals
( |p_1 - q_1|^2 + |p_2 - q_2|^2 )^{1/2} = ( (p_1 - q_1)^2 + (p_2 - q_2)^2 )^{1/2} (4.3)

Now suppose n = 3 and let P = (p_1, p_2, p_3) and Q = (q_1, q_2, q_3) be two points in R^3. Consider the
following picture in which the solid line joins the two points and a dotted line joins the points (q_1, q_2, q_3)
and (p_1, p_2, q_3).

[Figure: points P = (p_1, p_2, p_3) and Q = (q_1, q_2, q_3) in space, with auxiliary points (p_1, p_2, q_3) and (p_1, q_2, q_3)]

Here, we need to use the Pythagorean Theorem twice in order to find the length of the solid line. First, by
the Pythagorean Theorem, the length of the dotted line joining (q_1, q_2, q_3) and (p_1, p_2, q_3) equals
( (p_1 - q_1)^2 + (p_2 - q_2)^2 )^{1/2}
while the length of the line joining (p_1, p_2, q_3) to (p_1, p_2, p_3) is just |p_3 - q_3|. Therefore, by the
Pythagorean Theorem again, the length of the line joining the points P = (p_1, p_2, p_3) and Q = (q_1, q_2, q_3)
equals
( ( ( (p_1 - q_1)^2 + (p_2 - q_2)^2 )^{1/2} )^2 + (p_3 - q_3)^2 )^{1/2}
= ( (p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2 )^{1/2} (4.4)

This discussion motivates the following definition for the distance between points in R^n.

Definition 4.10: Distance Between Points

Let P = (p_1, · · ·, p_n) and Q = (q_1, · · ·, q_n) be two points in R^n. Then the distance between these
points is defined as
distance between P and Q = d(P, Q) = ( Σ_{k=1}^n |p_k - q_k|^2 )^{1/2}

This is called the distance formula. We may also write |P - Q| as the distance between P and Q.

From the above discussion, you can see that Definition 4.10 holds for the special cases n = 1, 2, 3, as
in Equations 4.2, 4.3, 4.4. In the following example, we use Definition 4.10 to find the distance between
two points in R^4.

Example 4.11: Distance Between Points

Find the distance between the points P and Q in R^4, where P and Q are given by
P = (1, 2, -4, 6)
and
Q = (2, 3, -1, 0)

Solution. We will use the formula given in Definition 4.10 to find the distance between P and Q. Use the
distance formula and write
d(P, Q) = ( (1 - 2)^2 + (2 - 3)^2 + (-4 - (-1))^2 + (6 - 0)^2 )^{1/2} = (47)^{1/2}

Therefore, d(P, Q) = √47.
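The distance formula is a one-liner in code. A small Python/NumPy sketch (ours) checking Example 4.11:

import numpy as np

P = np.array([1.0, 2.0, -4.0, 6.0])
Q = np.array([2.0, 3.0, -1.0, 0.0])

# d(P, Q) = square root of the sum of squared coordinate differences
d = np.sqrt(np.sum((P - Q) ** 2))
print(d, np.sqrt(47))  # both print sqrt(47) = 6.8556...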

There are certain properties of the distance between points which are important in our study. These are
outlined in the following theorem.

Theorem 4.12: Properties of Distance

Let P and Q be points in R^n, and let the distance between them, d(P, Q), be given as in Definition
4.10. Then, the following properties hold.

d(P, Q) = d(Q, P)

d(P, Q) ≥ 0, and equals 0 exactly when P = Q.

There are many applications of the concept of distance. For instance, given two points, we can ask what
collection of points are all the same distance from each of the given points. This is explored in the
following example.

Example 4.13: The Plane Between Two Points

Describe the points in R^3 which are the same distance from (1, 2, 3) and (0, 1, 2).

Solution. Let P = (p_1, p_2, p_3) be such a point. Therefore, P is the same distance from (1, 2, 3) and (0, 1, 2).
Then by Definition 4.10,
√( (p_1 - 1)^2 + (p_2 - 2)^2 + (p_3 - 3)^2 ) = √( (p_1 - 0)^2 + (p_2 - 1)^2 + (p_3 - 2)^2 )
Squaring both sides we obtain
(p_1 - 1)^2 + (p_2 - 2)^2 + (p_3 - 3)^2 = p_1^2 + (p_2 - 1)^2 + (p_3 - 2)^2
and so
p_1^2 - 2 p_1 + 14 + p_2^2 - 4 p_2 + p_3^2 - 6 p_3 = p_1^2 + p_2^2 - 2 p_2 + 5 + p_3^2 - 4 p_3
Simplifying, this becomes
-2 p_1 + 14 - 4 p_2 - 6 p_3 = -2 p_2 + 5 - 4 p_3
which can be written as
2 p_1 + 2 p_2 + 2 p_3 = 9 (4.5)
Therefore, the points P = (p_1, p_2, p_3) which are the same distance from each of the given points form a
plane whose equation is given by 4.5.

We can now use our understanding of the distance between two points to define what is meant by the
length of a vector. Consider the following definition.

Definition 4.14: Length of a Vector

Let ~u = [ u_1 · · · u_n ]^T be a vector in R^n. Then, the length of ~u, written ‖~u‖, is given by
‖~u‖ = √( u_1^2 + · · · + u_n^2 )

This definition corresponds to Definition 4.10, if you consider the vector ~u to have its tail at the point
0 = (0, · · ·, 0) and its tip at the point U = (u_1, · · ·, u_n). Then the length of ~u is equal to the distance between
0 and U, d(0, U). In general, d(P, Q) = ‖~PQ‖.
Consider Example 4.11. By Definition 4.14, we could also find the distance between P and Q as the
length of the vector connecting them. Hence, if we were to draw a vector ~PQ with its tail at P and its point
at Q, this vector would have length equal to √47.
We conclude this section with a new definition for the special case of vectors of length 1.

Definition 4.15: Unit Vector

Let ~u be a vector in R^n. Then, we call ~u a unit vector if it has length 1, that is if
‖~u‖ = 1

Let ~v be a vector in R^n. Then, the vector ~u which has the same direction as ~v but length equal to 1 is
the corresponding unit vector of ~v. This vector is given by
~u = (1/‖~v‖) ~v

We often use the term normalize to refer to this process. When we normalize a vector, we find the
corresponding unit vector of length 1. Consider the following example.

Example 4.16: Finding a Unit Vector

Let ~v be given by
~v = [ 1 -3 4 ]^T
Find the unit vector ~u which has the same direction as ~v.

Solution. We will use Definition 4.15 to solve this. Therefore, we need to find the length of ~v which, by
Definition 4.14, is given by
‖~v‖ = √( v_1^2 + v_2^2 + v_3^2 )
Using the corresponding values we find that
‖~v‖ = √( 1^2 + (-3)^2 + 4^2 )
     = √( 1 + 9 + 16 )
     = √26
In order to find ~u, we divide ~v by √26. The result is
~u = (1/‖~v‖) ~v
   = (1/√26) [ 1 -3 4 ]^T
   = [ 1/√26  -3/√26  4/√26 ]^T

You can verify using Definition 4.14 that ‖~u‖ = 1.
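Normalizing a vector in code follows Definition 4.15 directly. A Python/NumPy sketch (ours) reproducing Example 4.16:

import numpy as np

v = np.array([1.0, -3.0, 4.0])

# The length of v, sqrt(1 + 9 + 16) = sqrt(26)
length = np.linalg.norm(v)

# The corresponding unit vector has the same direction and length 1
u = v / length
print(u)
print(np.linalg.norm(u))  # 1.0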

4.5 Geometric Meaning of Scalar Multiplication

Outcomes
A. Understand scalar multiplication, geometrically.

Recall that the point P = (p_1, p_2, p_3) determines a vector ~p from 0 to P. The length of ~p, denoted
‖~p‖, is equal to √( p_1^2 + p_2^2 + p_3^2 ) by Definition 4.10.
Now suppose we have a vector ~u = [ u_1 u_2 u_3 ]^T and we multiply ~u by a scalar k. By Definition
4.5, k~u = [ k u_1  k u_2  k u_3 ]^T. Then, by using Definition 4.10, the length of this vector is given by
√( (k u_1)^2 + (k u_2)^2 + (k u_3)^2 ) = |k| √( u_1^2 + u_2^2 + u_3^2 )
Thus the following holds.
‖k~u‖ = |k| ‖~u‖
In other words, multiplication by a scalar magnifies or shrinks the length of the vector by a factor of |k|.
If |k| > 1, the length of the resulting vector will be magnified. If |k| < 1, the length of the resulting vector
will shrink. Remember that by the definition of the absolute value, |k| ≥ 0.
What about the direction? Draw a picture of ~u and k~u where k is negative. Notice that this causes the
resulting vector to point in the opposite direction while if k > 0 it preserves the direction the vector points.
Therefore the direction can either reverse, if k < 0, or remain preserved, if k > 0.
Consider the following example.

Example 4.17: Graphing Scalar Multiplication

Consider the vectors ~u and ~v drawn below.

[Figure: two vectors ~u and ~v]

Draw -~u, 2~v, and -(1/2)~v.

Solution.
In order to find -~u, we preserve the length of ~u and simply reverse the direction. For 2~v, we double
the length of ~v, while preserving the direction. Finally -(1/2)~v is found by taking half the length of ~v and
reversing the direction. These vectors are shown in the following diagram.

[Figure: ~u and -~u, together with ~v, 2~v, and -(1/2)~v]


Now that we have studied both vector addition and scalar multiplication, we can combine the two
actions. Recall Definition 4.7 of linear combinations of column matrices. We can apply this definition to
vectors in Rn . A linear combination of vectors in Rn is a sum of vectors multiplied by scalars.
In the following example, we examine the geometric meaning of this concept.

Example 4.18: Graphing a Linear Combination of Vectors

Consider the following picture of the vectors ~u and ~v.

[Figure: two vectors ~u and ~v]

Sketch a picture of ~u + 2~v and ~u - (1/2)~v.

Solution. The two vectors are shown below.

[Figure: ~u + 2~v drawn from ~u and 2~v, and ~u - (1/2)~v drawn from ~u and -(1/2)~v]

4.6 Parametric Lines

Outcomes
A. Find the vector and parametric equations of a line.

We can use the concept of vectors and points to find equations for arbitrary lines in R^n, although in
this section the focus will be on lines in R^3.
To begin, consider the case n = 1 so we have R^1 = R. There is only one line here which is the familiar
number line, that is R itself. Therefore it is not necessary to explore the case of n = 1 further.
Now consider the case where n = 2, in other words R^2. Let P and P0 be two different points in R^2
which are contained in a line L. Let ~p and ~p0 be the position vectors for the points P and P0 respectively.
Suppose that Q is an arbitrary point on L. Consider the following diagram.

[Figure: a line L through the points P0 and P, with an arbitrary point Q on L]

Our goal is to be able to define Q in terms of P and P0. Consider the vector ~P0P = ~p - ~p0 which has its
tail at P0 and point at P. If we add ~p - ~p0 to the position vector ~p0 for P0, the sum would be a vector with
its point at P. In other words,
~p = ~p0 + (~p - ~p0)

Now suppose we were to add t(~p - ~p0) to ~p0 where t is some scalar. You can see that by doing so, we
could find a vector with its point at Q. In other words, we can find t such that
~q = ~p0 + t (~p - ~p0)
This equation determines the line L in R^2. In fact, it determines a line L in R^n. Consider the following
definition.
definition.

Definition 4.19: Vector Equation of a Line

Suppose a line L in R^n contains the two different points P and P0. Let ~p and ~p0 be the position
vectors of these two points, respectively. Then, L is the collection of points Q which have the
position vector ~q given by
~q = ~p0 + t (~p - ~p0)
where t ∈ R.
Let ~d = ~p - ~p0. Then ~d is the direction vector for L and the vector equation for L is given by
~p = ~p0 + t ~d,  t ∈ R

Note that this definition agrees with the usual notion of a line in two dimensions and so this is consistent
with earlier concepts. Consider now points in R^3. If a point P ∈ R^3 is given by P = (x, y, z), and P0 ∈ R^3
by P0 = (x_0, y_0, z_0), then we can write
[ x ]   [ x_0 ]     [ a ]
[ y ] = [ y_0 ] + t [ b ]
[ z ]   [ z_0 ]     [ c ]
where ~d = [ a b c ]^T. This is the vector equation of L written in component form.
The following theorem claims that such an equation is in fact a line.

Proposition 4.20: Algebraic Description of a Straight Line

Let ~a, ~b ∈ R^n with ~b ≠ ~0. Then ~x = ~a + t~b, t ∈ R, is a line.

Proof. Let ~x_1, ~x_2 ∈ R^n. Define ~x_1 = ~a and let ~x_2 - ~x_1 = ~b. Since ~b ≠ ~0, it follows that ~x_2 ≠ ~x_1. Then
~a + t~b = ~x_1 + t (~x_2 - ~x_1). It follows that ~x = ~a + t~b is a line containing the two different points X_1 and
X_2 whose position vectors are given by ~x_1 and ~x_2 respectively.
We can use the above discussion to find the equation of a line when given two distinct points. Consider
the following example.

Example 4.21: A Line From Two Points

Find a vector equation for the line through the points P0 = (1, 2, 0) and P = (2, -4, 6).

Solution. We will use the definition of a line given above in Definition 4.19 to write this line in the form
~q = ~p0 + t (~p - ~p0)
Let ~q = [ x y z ]^T. Then, we can find ~p and ~p0 by taking the position vectors of points P and P0
respectively. Then,
~q = ~p0 + t (~p - ~p0)
can be written as
[ x ]   [ 1 ]     [  1 ]
[ y ] = [ 2 ] + t [ -6 ],  t ∈ R
[ z ]   [ 0 ]     [  6 ]
Here, the direction vector [ 1 -6 6 ]^T is obtained by ~p - ~p0 = [ 2 -4 6 ]^T - [ 1 2 0 ]^T as indicated above
in Definition 4.19.
Notice that in the above example we said that we found a vector equation for the line, not the
equation. The reason for this terminology is that there are infinitely many different vector equations for
the same line. To see this, replace t with another parameter, say 3s. Then you obtain a different vector
equation for the same line because the same set of points is obtained.

In Example 4.21, the vector given by [ 1 -6 6 ]^T is the direction vector defined in Definition 4.19. If we
know the direction vector of a line, as well as a point on the line, we can find the vector equation.
Consider the following example.

Example 4.22: A Line From a Point and a Direction Vector

Find a vector equation for the line which contains the point P0 = (1, 2, 0) and has direction vector
~d = [ 1 2 1 ]^T

Solution. We will use Definition 4.19 to write this line in the form ~p = ~p0 + t ~d, t ∈ R. We are given the
direction vector ~d. In order to find ~p0, we can use the position vector of the point P0. This is given by
[ 1 2 0 ]^T. Letting ~p = [ x y z ]^T, the equation for the line is given by
[ x ]   [ 1 ]     [ 1 ]
[ y ] = [ 2 ] + t [ 2 ],  t ∈ R  (4.6)
[ z ]   [ 0 ]     [ 1 ]



We sometimes elect to write a line such as the one given in 4.6 in the form
x = 1 + t
y = 2 + 2t    where t ∈ R  (4.7)
z = t
This set of equations gives the same information as 4.6, and is called the parametric equation of the line.
Consider the following definition.

Definition 4.23: Parametric Equation of a Line

Let L be a line in R^3 which has direction vector ~d = [ a b c ]^T and goes through the point P0 =
(x_0, y_0, z_0). Then, letting t be a parameter, we can write L as
x = x_0 + ta
y = y_0 + tb    where t ∈ R
z = z_0 + tc
This is called a parametric equation of the line L.

You can verify that the form discussed following Example 4.22 in equation 4.7 is of the form given in
Definition 4.23.
There is one other form for a line which is useful, which is the symmetric form. Consider the line
given by 4.7. You can solve for the parameter t to write
t = x - 1
t = (y - 2)/2
t = z
Therefore,
x - 1 = (y - 2)/2 = z
This is the symmetric form of the line.
In the following example, we look at how to take the equation of a line from symmetric form to
parametric form.

Example 4.24: Change Symmetric Form to Parametric Form

Suppose the symmetric form of a line is
(x - 2)/3 = (y - 1)/2 = z + 3
Write the line in parametric form as well as vector form.

Solution. We want to write this line in the form given by Definition 4.23. This is of the form
x = x_0 + ta
y = y_0 + tb    where t ∈ R
z = z_0 + tc
Let t = (x - 2)/3, t = (y - 1)/2 and t = z + 3, as given in the symmetric form of the line. Then solving
for x, y, z, yields
x = 2 + 3t
y = 1 + 2t    with t ∈ R
z = -3 + t
This is the parametric equation for this line.
Now, we want to write this line in the form given by Definition 4.19. This is the form
~p = ~p0 + t ~d
where t ∈ R. This equation becomes
[ x ]   [  2 ]     [ 3 ]
[ y ] = [  1 ] + t [ 2 ],  t ∈ R
[ z ]   [ -3 ]     [ 1 ]

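Generating points on a line from its vector equation is a direct translation of Definition 4.23. A Python/NumPy sketch (ours, with the helper name point_on_line our own) for the line of Example 4.24:

import numpy as np

p0 = np.array([2.0, 1.0, -3.0])  # a point on the line
d = np.array([3.0, 2.0, 1.0])    # the direction vector

def point_on_line(t):
    """Return the point p0 + t*d of the line with parameter t."""
    return p0 + t * d

# A few points on the line, including p0 itself at t = 0
for t in (-1.0, 0.0, 1.0, 2.0):
    print(t, point_on_line(t))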
Exercises

Exercise 4.6.5 Find the vector equation for the line through (7, 6, 0) and (1, 1, 4) . Then, find the
parametric equations for this line.

Exercise 4.6.6 Find parametric equations for the line through the point (7, 7, 1) with a direction vector
~d = [ 1 6 2 ]^T.

Exercise 4.6.7 Parametric equations of the line are
x = t + 2
y = 6 - 3t
z = t - 6
Find a direction vector for the line and a point on the line.

Exercise 4.6.8 Find the vector equation for the line through the two points (5, 5, 1), (2, 2, 4) . Then, find
the parametric equations.

Exercise 4.6.9 The equation of a line in two dimensions is written as y = x - 5. Find parametric equations
for this line.

Exercise 4.6.10 Find parametric equations for the line through (6, 5, 2) and (5, 1, 2) .

Exercise 4.6.11 Find the vector equation and parametric equations for the line through the point
(7, 10, 6) with a direction vector ~d = [ 1 1 3 ]^T.

Exercise 4.6.12 Parametric equations of the line are
x = 2t + 2
y = 5 - 4t
z = t - 3
Find a direction vector for the line and a point on the line, and write the vector equation of the line.

Exercise 4.6.13 Find the vector equation and parametric equations for the line through the two points
(4, 10, 0), (1, 5, 6) .

Exercise 4.6.14 Find the point on the line segment from P = (4, 7, 5) to Q = (2, 2, 3) which is 1/7 of
the way from P to Q.

Exercise 4.6.15 Suppose a triangle in R^n has vertices at P_1, P_2, and P_3. Consider the lines which are
drawn from a vertex to the midpoint of the opposite side. Show these three lines intersect in a point and
find the coordinates of this point.

4.7 The Dot Product

Outcomes
A. Compute the dot product of vectors, and use this to compute vector projections.

4.7.1. The Dot Product

There are two ways of multiplying vectors which are of great importance in applications. The first of these
is called the dot product. When we take the dot product of vectors, the result is a scalar. For this reason,
the dot product is also called the scalar product and sometimes the inner product. The definition is as
follows.

Definition 4.25: Dot Product

Let ~u, ~v be two vectors in R^n. Then we define the dot product ~u · ~v as
~u · ~v = Σ_{k=1}^n u_k v_k

The dot product ~u · ~v is sometimes denoted as (~u, ~v) where a comma replaces the dot. It can also be
written as ⟨~u, ~v⟩. If we write the vectors ~u and ~v as row matrices, it is equal to the matrix product ~u ~v^T.
Consider the following example.

Example 4.26: Compute a Dot Product

Find ~u · ~v for
~u = [ 1 2 0 -1 ]^T,  ~v = [ 0 1 2 3 ]^T

Solution. By Definition 4.25, we must compute
~u · ~v = Σ_{k=1}^4 u_k v_k
This is given by
~u · ~v = (1)(0) + (2)(1) + (0)(2) + (-1)(3)
       = 0 + 2 + 0 - 3
       = -1

With this definition, there are several important properties satisfied by the dot product.

Proposition 4.27: Properties of the Dot Product

Let k and p denote scalars and ~u, ~v, ~w denote vectors. Then the dot product ~u · ~v satisfies the
following properties.

~u · ~v = ~v · ~u

~u · ~u ≥ 0 and equals zero if and only if ~u = ~0

(k~u + p~v) · ~w = k (~u · ~w) + p (~v · ~w)

~u · (k~v + p~w) = k (~u · ~v) + p (~u · ~w)

‖~u‖^2 = ~u · ~u

The proof is left as an exercise. This proposition tells us that we can also use the dot product to find
the length of a vector.

Example 4.28: Length of a Vector

Find the length of
~u = [ 2 1 4 2 ]^T
That is, find ‖~u‖.

Solution. By Proposition 4.27, ‖~u‖^2 = ~u · ~u. Therefore, ‖~u‖ = √(~u · ~u). First, compute ~u · ~u.
This is given by
~u · ~u = (2)(2) + (1)(1) + (4)(4) + (2)(2)
       = 4 + 1 + 16 + 4
       = 25
Then,
‖~u‖ = √(~u · ~u)
     = √25
     = 5

You may wish to compare this to our previous definition of length, given in Definition 4.14.
The Cauchy-Schwarz inequality is a fundamental inequality satisfied by the dot product. It is given
in the following theorem.

Theorem 4.29: Cauchy-Schwarz Inequality

The dot product satisfies the inequality
|~u · ~v| ≤ ‖~u‖ ‖~v‖ (4.8)
Furthermore equality is obtained if and only if one of ~u or ~v is a scalar multiple of the other.

Proof. First note that if ~v = ~0 both sides of 4.8 equal zero and so the inequality holds in this case.
Therefore, it will be assumed in what follows that ~v ≠ ~0.
Define a function of t ∈ R by
f(t) = (~u + t~v) · (~u + t~v)
Then by Proposition 4.27, f(t) ≥ 0 for all t ∈ R. Also from Proposition 4.27
f(t) = ~u · (~u + t~v) + t~v · (~u + t~v)
     = ~u · ~u + t (~u · ~v) + t~v · ~u + t^2 ~v · ~v
     = ‖~u‖^2 + 2t (~u · ~v) + ‖~v‖^2 t^2
Now this means the graph of y = f(t) is a parabola which opens up and either its vertex touches the t
axis or else the entire graph is above the t axis. In the first case, there exists some t where f(t) = 0 and
this requires ~u + t~v = ~0 so one vector is a multiple of the other. Then clearly equality holds in 4.8. In the
case where ~v is not a multiple of ~u, it follows f(t) > 0 for all t which says f(t) has no real zeros and so
from the quadratic formula,
(2 (~u · ~v))^2 - 4 ‖~u‖^2 ‖~v‖^2 < 0
which is equivalent to |~u · ~v| < ‖~u‖ ‖~v‖.
Notice that this proof was based only on the properties of the dot product listed in Proposition 4.27.
This means that whenever an operation satisfies these properties, the Cauchy-Schwarz inequality holds.
There are many other instances of these properties besides vectors in R^n.
The Cauchy-Schwarz inequality provides another proof of the triangle inequality for distances in R^n.

Theorem 4.30: Triangle Inequality


For ~u, ~v ∈ Rn ,
‖~u + ~v‖ ≤ ‖~u‖ + ‖~v‖   (4.9)
and equality holds if and only if one of the vectors is a non-negative scalar multiple of the other.
Also,
| ‖~u‖ − ‖~v‖ | ≤ ‖~u − ~v‖   (4.10)

Proof. By properties of the dot product and the Cauchy Schwarz inequality,

‖~u + ~v‖^2 = (~u + ~v) · (~u + ~v)
           = (~u · ~u) + (~u · ~v) + (~v · ~u) + (~v · ~v)
           = ‖~u‖^2 + 2 (~u · ~v) + ‖~v‖^2
           ≤ ‖~u‖^2 + 2 |~u · ~v| + ‖~v‖^2
           ≤ ‖~u‖^2 + 2‖~u‖‖~v‖ + ‖~v‖^2 = (‖~u‖ + ‖~v‖)^2

Hence,
‖~u + ~v‖^2 ≤ (‖~u‖ + ‖~v‖)^2
Taking square roots of both sides you obtain 4.9.
It remains to consider when equality occurs. Suppose ~u = ~0. Then ~u = 0~v and the claim about when
equality occurs is verified. The same argument holds if ~v = ~0. Therefore, it can be assumed both vectors
are nonzero. To get equality in 4.9 above, Theorem 4.29 implies one of the vectors must be a multiple of
the other. Say ~v = k~u. If k < 0 then equality cannot occur in 4.9, because in this case

~u · ~v = k‖~u‖^2 < 0 < |k| ‖~u‖^2 = |~u · ~v|

Therefore, k ≥ 0.
To get the other form of the triangle inequality, write

~u = ~u − ~v + ~v

so

‖~u‖ = ‖~u − ~v + ~v‖
     ≤ ‖~u − ~v‖ + ‖~v‖

Therefore,
‖~u‖ − ‖~v‖ ≤ ‖~u − ~v‖   (4.11)
Similarly,
‖~v‖ − ‖~u‖ ≤ ‖~v − ~u‖ = ‖~u − ~v‖   (4.12)
It follows from 4.11 and 4.12 that 4.10 holds. This is because | ‖~u‖ − ‖~v‖ | equals the left side of either 4.11
or 4.12, and either way, | ‖~u‖ − ‖~v‖ | ≤ ‖~u − ~v‖.

4.7.2. The Geometric Significance of the Dot Product

Given two vectors ~u and ~v, the included angle is the angle θ between these two vectors such that
0 ≤ θ ≤ π. The dot product can be used to determine the included angle between two vectors.
Consider the following picture, where θ gives the included angle.

[Figure: vectors ~u and ~v drawn from a common point, with included angle θ between them]

Proposition 4.31: The Dot Product and the Included Angle


Let ~u and ~v be two vectors in Rn , and let θ be the included angle. Then the following equation
holds.
~u · ~v = ‖~u‖‖~v‖ cos θ

In words, the dot product of two vectors equals the product of the magnitude (or length) of the two
vectors multiplied by the cosine of the included angle. Note this gives a geometric description of the dot
product which does not depend explicitly on the coordinates of the vectors.
Consider the following example.

Example 4.32: Find the Angle Between Two Vectors


Find the angle between the vectors given by

~u = [2 1 −1]^T , ~v = [3 4 1]^T

Solution. By Proposition 4.31,

~u · ~v = ‖~u‖‖~v‖ cos θ

Hence,
cos θ = (~u · ~v) / (‖~u‖‖~v‖)
First, we can compute ~u · ~v. By Definition 4.25, this equals

~u · ~v = (2)(3) + (1)(4) + (−1)(1) = 9

Then,
‖~u‖ = √((2)(2) + (1)(1) + (−1)(−1)) = √6
‖~v‖ = √((3)(3) + (4)(4) + (1)(1)) = √26
Therefore, the cosine of the included angle equals

cos θ = 9 / (√26 √6) = 0.7205766...

With the cosine known, the angle can be determined by computing the inverse cosine of that value,
giving approximately θ = 0.76616 radians.
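The same computation can be checked numerically. A short sketch in Python with NumPy, using the
vectors of Example 4.32 (the code is our own illustration, not part of the text):

import numpy as np

u = np.array([2.0, 1.0, -1.0])
v = np.array([3.0, 4.0, 1.0])

# cos(theta) = (u . v) / (||u|| ||v||), by Proposition 4.31
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.arccos(cos_theta))   # approximately 0.76616 radians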
Another application of the geometric description of the dot product is in finding the angle between two
lines. Typically one would assume that the lines intersect. In some situations, however, it may make sense
to ask this question when the lines do not intersect, such as the angle between two object trajectories. In
any case we understand it to mean the smallest angle between (any of) their direction vectors. The only
subtlety here is that if ~u is a direction vector for a line, then so is any multiple k~u, and thus we will find
supplementary angles among all angles between direction vectors for two lines, and we simply take the
smaller of the two.

Example 4.33: Find the Angle Between Two Lines


Find the angle between the two lines

L1 : [x y z]^T = [1 2 0]^T + t [−1 1 2]^T

and

L2 : [x y z]^T = [0 4 −3]^T + s [2 1 −1]^T

Solution. You can verify that these lines do not intersect, but as discussed above this does not matter and
we simply find the smallest angle between any direction vectors for these lines.
To do so we first find the angle between the direction vectors given above:

~u = [−1 1 2]^T , ~v = [2 1 −1]^T

In order to find the angle, we solve the following equation for θ:

~u · ~v = ‖~u‖‖~v‖ cos θ

to obtain cos θ = −1/2, and since we choose included angles between 0 and π we obtain θ = 2π/3.
Now the angles between any two direction vectors for these lines will either be 2π/3 or its supplement
π − 2π/3 = π/3. We choose the smaller angle, and therefore conclude that the angle between the two lines
is π/3.
We can also use Proposition 4.31 to compute the dot product of two vectors.

Example 4.34: Using Geometric Description to Find a Dot Product


Let ~u, ~v be vectors with ‖~u‖ = 3 and ‖~v‖ = 4. Suppose the angle between ~u and ~v is π/3. Find ~u · ~v.

Solution. From the geometric description of the dot product in Proposition 4.31,

~u · ~v = (3)(4) cos(π/3) = 3 · 4 · 1/2 = 6


Two nonzero vectors are said to be perpendicular, sometimes also called orthogonal, if the included
angle is π/2 radians (90°).
Consider the following proposition.

Proposition 4.35: Perpendicular Vectors


Let ~u and ~v be nonzero vectors in Rn . Then ~u and ~v are perpendicular exactly when

~u · ~v = 0

Proof. This follows directly from Proposition 4.31. First, if the dot product of two nonzero vectors is equal
to 0, this tells us that cos θ = 0 (this is where we need nonzero vectors). Thus θ = π/2 and the vectors are
perpendicular.
If on the other hand ~v is perpendicular to ~u, then the included angle is π/2 radians. Hence cos θ = 0
and ~u · ~v = 0.
Consider the following example.

Example 4.36: Determine if Two Vectors are Perpendicular


Determine whether the two vectors,

~u = [2 1 −1]^T , ~v = [1 3 5]^T

are perpendicular.

Solution. In order to determine if these two vectors are perpendicular, we compute the dot product. This
is given by

~u · ~v = (2)(1) + (1)(3) + (−1)(5) = 0

Therefore, by Proposition 4.35 these two vectors are perpendicular.

4.7.3. Projections

In some applications, we wish to write a vector as a sum of two related vectors. Through the concept
of projections, we can find these two vectors. First, we explore an important theorem. The result of this
theorem will provide our definition of a vector projection.

Theorem 4.37: Vector Projections


Let ~v and ~u be nonzero vectors. Then there exist unique vectors ~v|| and ~v⊥ such that

~v = ~v|| + ~v⊥   (4.13)

where ~v|| is a scalar multiple of ~u, and ~v⊥ is perpendicular to ~u.

Proof. Suppose 4.13 holds and ~v|| = k~u. Taking the dot product of both sides of 4.13 with ~u and using
~v⊥ · ~u = 0, this yields

~v · ~u = (~v|| + ~v⊥) · ~u
      = k~u · ~u + ~v⊥ · ~u
      = k‖~u‖^2

which requires k = (~v · ~u)/‖~u‖^2 . Thus there can be no more than one vector ~v|| . It follows that ~v⊥ must equal
~v − ~v|| . This verifies there can be no more than one choice for both ~v|| and ~v⊥ and proves their uniqueness.
Now let

~v|| = ((~v · ~u)/‖~u‖^2) ~u

and let

~v⊥ = ~v − ~v|| = ~v − ((~v · ~u)/‖~u‖^2) ~u

Then ~v|| = k~u where k = (~v · ~u)/‖~u‖^2 . It only remains to verify ~v⊥ · ~u = 0. But

~v⊥ · ~u = ~v · ~u − ((~v · ~u)/‖~u‖^2) ~u · ~u
       = ~v · ~u − ~v · ~u
       = 0


The vector ~v|| in Theorem 4.37 is called the projection of ~v onto ~u and is denoted by

~v|| = proj~u (~v)



We now make a formal definition of the vector projection.

Definition 4.38: Vector Projection


Let ~u and ~v be vectors. Then, the projection of ~v onto ~u is given by

proj~u (~v) = ((~v · ~u)/(~u · ~u)) ~u = ((~v · ~u)/‖~u‖^2) ~u

Consider the following example of a projection.

Example 4.39: Find the Projection of One Vector Onto Another


Find proj~u (~v) if

~u = [2 3 4]^T , ~v = [1 −2 −1]^T

Solution. We can use the formula provided in Definition 4.38 to find proj~u (~v). First, compute ~v · ~u. This
is given by

~v · ~u = (2)(1) + (3)(−2) + (4)(−1)
      = 2 − 6 − 4
      = −8

Similarly, ~u · ~u is given by

~u · ~u = (2)(2) + (3)(3) + (4)(4)
      = 4 + 9 + 16
      = 29

Therefore, the projection is equal to

proj~u (~v) = (−8/29) [2 3 4]^T = [−16/29 −24/29 −32/29]^T
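Since the projection formula of Definition 4.38 is used repeatedly in what follows, here is a minimal
sketch of it in Python with NumPy (our own illustration; the helper name proj is hypothetical):

import numpy as np

def proj(u, v):
    # Projection of v onto u: ((v . u) / (u . u)) * u
    return (np.dot(v, u) / np.dot(u, u)) * u

u = np.array([2.0, 3.0, 4.0])
v = np.array([1.0, -2.0, -1.0])
print(proj(u, v))   # [-16/29, -24/29, -32/29], matching Example 4.39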


We will conclude this section with an important application of projections. Suppose a line L and a
point P are given such that P is not contained in L. Through the use of projections, we can determine the
shortest distance from P to L.

Example 4.40: Shortest Distance from a Point to a Line


Let P = (1, 3, 5) be a point in R3 , and let L be the line which goes through the point P0 = (0, 4, −2) with
direction vector d~ = [2 1 2]^T . Find the shortest distance from P to the line L, and find the point Q on
L that is closest to P.

Solution. In order to determine the shortest distance from P to L, we will first find the vector P0P (the
vector from P0 to P) and then find the projection of this vector onto L. The vector P0P is given by

[1 3 5]^T − [0 4 −2]^T = [1 −1 7]^T

Then, if Q is the point on L closest to P, it follows that

P0Q = proj_d~ (P0P)
    = ((P0P · d~)/‖d~‖^2) d~
    = (15/9) [2 1 2]^T
    = (5/3) [2 1 2]^T

Now, the distance from P to L is given by

‖QP‖ = ‖P0P − P0Q‖ = √26

The point Q is found by adding the vector P0Q to the position vector 0P0 for P0 as follows

[0 4 −2]^T + (5/3) [2 1 2]^T = [10/3 17/3 4/3]^T

Therefore, Q = (10/3, 17/3, 4/3).
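The steps of Example 4.40 translate directly into a numerical check. A sketch in Python with NumPy
(our own illustration):

import numpy as np

P  = np.array([1.0, 3.0, 5.0])
P0 = np.array([0.0, 4.0, -2.0])   # point on the line
d  = np.array([2.0, 1.0, 2.0])    # direction vector

w = P - P0                                   # vector from P0 to P
Q = P0 + (np.dot(w, d) / np.dot(d, d)) * d   # closest point on the line
print(Q)                      # [10/3, 17/3, 4/3]
print(np.linalg.norm(P - Q))  # sqrt(26), approximately 5.0990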


Exercises

Exercise 4.7.16 Find [1 2 3 4]^T · [2 0 1 3]^T .

Exercise 4.7.17 Use the formula given in Proposition 4.31 to verify the Cauchy Schwarz inequality and
to show that equality occurs if and only if one of the vectors is a scalar multiple of the other.

Exercise 4.7.18 For ~u, ~v vectors in R3 , define the product ~u ∗ ~v = u1 v1 + 2u2 v2 + 3u3 v3 . Show the axioms
for a dot product all hold for this product. Prove

|~u ∗ ~v| ≤ (~u ∗ ~u)^{1/2} (~v ∗ ~v)^{1/2}

   
Exercise 4.7.19 Let ~a, ~b be vectors. Show that ~a · ~b = (1/4) (‖~a + ~b‖^2 − ‖~a − ~b‖^2).

Exercise 4.7.20 Using the axioms of the dot product, prove the parallelogram identity:

‖~a + ~b‖^2 + ‖~a − ~b‖^2 = 2‖~a‖^2 + 2‖~b‖^2

Exercise 4.7.21 Let A be a real m × n matrix and let ~u ∈ Rn and ~v ∈ Rm . Show A~u · ~v = ~u · AT~v. Hint:
Use the definition of matrix multiplication to do this.

Exercise 4.7.22 Use the result of Problem 4.7.21 to verify directly that (AB)T = BT AT without making
any reference to subscripts.

Exercise 4.7.23 Find the angle between the vectors

~u = [3 1 1]^T , ~v = [1 4 2]^T

Exercise 4.7.24 Find the angle between the vectors

~u = [1 2 1]^T , ~v = [1 2 7]^T


Exercise 4.7.25 Find proj~v (~w) where ~w = [1 0 2]^T and ~v = [1 2 3]^T .


Exercise 4.7.26 Find proj~v (~w) where ~w = [1 2 2]^T and ~v = [1 0 3]^T .

Exercise 4.7.27 Find proj~v (~w) where ~w = [1 2 2 1]^T and ~v = [1 2 3 0]^T .

Exercise 4.7.28 Let P = (1, 2, 3) be a point in R3 . Let L be the line through the point P0 = (1, 4, 5) with
direction vector d~ = [1 1 1]^T . Find the shortest distance from P to L, and find the point Q on L that is
closest to P.

Exercise 4.7.29 Let P = (0, 2, 1) be a point in R3 . Let L be the line through the point P0 = (1, 1, 1) with
direction vector d~ = [3 0 1]^T . Find the shortest distance from P to L, and find the point Q on L that is
closest to P.

Exercise 4.7.30 Does it make sense to speak of proj~0 (~w)?

Exercise 4.7.31 Prove the Cauchy Schwarz inequality in Rn as follows. For ~w, ~v vectors, consider

(~w − proj~v ~w) · (~w − proj~v ~w) ≥ 0

Simplify using the axioms of the dot product and then substitute in the formula for the projection. Notice
that this expression equals 0, and you get equality in the Cauchy Schwarz inequality, if and only if
~w = proj~v ~w. What is the geometric meaning of ~w = proj~v ~w?

Exercise 4.7.32 Let ~v, ~w, ~u be vectors. Show that (~w + ~u)⊥ = ~w⊥ + ~u⊥ , where ~w⊥ = ~w − proj~v (~w).

Exercise 4.7.33 Show that


(~v − proj~u (~v) , ~u) = (~v − proj~u (~v)) · ~u = 0
and conclude every vector in Rn can be written as the sum of two vectors, one which is perpendicular and
one which is parallel to the given vector.

4.8 Planes in Rn

Outcomes
A. Find the vector and scalar equations of a plane.

Much like the above discussion with lines, vectors can be used to determine planes in Rn . Given a
vector ~n in Rn and a point P0 , it is possible to find a unique plane which contains P0 and is perpendicular
to the given vector.

Definition 4.41: Normal Vector


Let ~n be a nonzero vector in Rn . Then ~n is called a normal vector to a plane if and only if

~n · ~v = 0

for every vector ~v in the plane.

In other words, we say that ~n is orthogonal (perpendicular) to every vector in the plane.
Consider now a plane with normal vector given by ~n, and containing a point P0 . Notice that this plane
is unique. If P is an arbitrary point on this plane, then by definition the normal vector is orthogonal to the
vector between P0 and P. Letting 0P and 0P0 be the position vectors of points P and P0 respectively, it
follows that

~n · (0P − 0P0) = 0

or

~n · P0P = 0
The first of these equations gives the vector equation of the plane.

Definition 4.42: Vector Equation of a Plane


Let ~n be the normal vector for a plane which contains a point P0 . If P is an arbitrary point on this
plane, then the vector equation of the plane is given by


~n · (0P − 0P0) = 0

Notice that this equation can be used to determine if a point P is contained in a certain plane.

Example 4.43: A Point in a Plane



Let ~n = [1 2 3]^T be the normal vector for a plane which contains the point P0 = (2, 1, 4). Determine
if the point P = (5, 4, 1) is contained in this plane.

Solution. By Definition 4.42, P is a point in the plane if it satisfies the equation

~n · (0P − 0P0) = 0

Given the above ~n, P0 , and P, this equation becomes

[1 2 3]^T · ([5 4 1]^T − [2 1 4]^T) = [1 2 3]^T · [3 3 −3]^T
                                   = 3 + 6 − 9 = 0

Therefore P = (5, 4, 1) is contained in the plane.




Suppose ~n = [a b c]^T , P = (x, y, z) and P0 = (x0 , y0 , z0 ).
Then

~n · (0P − 0P0) = 0
[a b c]^T · ([x y z]^T − [x0 y0 z0 ]^T) = 0
[a b c]^T · [x − x0 , y − y0 , z − z0 ]^T = 0
a(x − x0) + b(y − y0) + c(z − z0) = 0

We can also write this equation as

ax + by + cz = ax0 + by0 + cz0

Notice that since P0 is given, ax0 + by0 + cz0 is a known scalar, which we can call d. This equation
becomes

ax + by + cz = d

Definition 4.44: Scalar Equation of a Plane



Let ~n = [a b c]^T be the normal vector for a plane which contains the point P0 = (x0 , y0 , z0 ). Then if
P = (x, y, z) is an arbitrary point on the plane, the scalar equation of the plane is given by

ax + by + cz = d

where a, b, c, d ∈ R and d = ax0 + by0 + cz0 .



Consider the following example.

Example 4.45: Finding the Equation of a Plane



Find an equation of the plane containing P0 = (3, −2, 5) and orthogonal to ~n = [−2 4 1]^T .

Solution. The above vector ~n is the normal vector for this plane. Using Definition 4.42, we can determine
the vector equation for this plane.

~n · (0P − 0P0) = 0
[−2 4 1]^T · ([x y z]^T − [3 −2 5]^T) = 0
[−2 4 1]^T · [x − 3 , y + 2 , z − 5]^T = 0

Using Definition 4.44, we can determine the scalar equation of the plane.

−2x + 4y + 1z = −2(3) + 4(−2) + 1(5) = −9

Hence, the vector equation of the plane is

[−2 4 1]^T · [x − 3 , y + 2 , z − 5]^T = 0

and the scalar equation is

−2x + 4y + 1z = −9

Suppose a point P is not contained in a given plane. We are then interested in the shortest distance
from that point P to the given plane. Consider the following example.

Example 4.46: Shortest Distance From a Point to a Plane


Find the shortest distance from the point P = (3, 2, 3) to the plane given by
2x + y + 2z = 2, and find the point Q on the plane that is closest to P.

Solution. Pick an arbitrary point P0 on the plane. Then, it follows that

QP = proj~n (P0P)

and ‖QP‖ is the shortest distance from P to the plane. Further, the vector 0Q = 0P − QP gives the necessary
point Q.
From the above scalar equation, we have that ~n = [2 1 2]^T . Now, choose P0 = (1, 0, 0) so that
~n · 0P0 = 2 = d. Then,

P0P = [3 2 3]^T − [1 0 0]^T = [2 2 3]^T

Next, compute QP = proj~n (P0P).

QP = proj~n (P0P)
   = ((P0P · ~n)/‖~n‖^2) ~n
   = (12/9) [2 1 2]^T
   = (4/3) [2 1 2]^T

Then, ‖QP‖ = 4 so the shortest distance from P to the plane is 4.
Next, to find the point Q on the plane which is closest to P we have

0Q = 0P − QP
   = [3 2 3]^T − (4/3) [2 1 2]^T
   = [1/3 2/3 1/3]^T

Therefore, Q = (1/3, 2/3, 1/3).
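As with distance to a line, this computation is easy to verify numerically. A sketch in Python with
NumPy, using the data of Example 4.46 (our own illustration):

import numpy as np

n  = np.array([2.0, 1.0, 2.0])   # normal vector of 2x + y + 2z = 2
P  = np.array([3.0, 2.0, 3.0])
P0 = np.array([1.0, 0.0, 0.0])   # any point satisfying the plane equation

QP = (np.dot(P - P0, n) / np.dot(n, n)) * n   # projection onto the normal
print(np.linalg.norm(QP))   # 4.0, the distance from P to the plane
print(P - QP)               # Q = [1/3, 2/3, 1/3]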

4.9 The Cross Product

Outcomes

A. Compute the cross product and box product of vectors in R3 .

Recall that the dot product is one of two important products for vectors. The second type of product
for vectors is called the cross product. It is important to note that the cross product is only defined in
R3 . First we discuss the geometric meaning and then a description in terms of coordinates is given, both
of which are important. The geometric description is essential in order to understand the applications to
physics and geometry while the coordinate description is necessary to compute the cross product.
Consider the following definition.

Definition 4.47: Right Hand System of Vectors


Three vectors, ~u,~v,~w form a right hand system if when you extend the fingers of your right hand
along the direction of vector ~u and close them in the direction of ~v, the thumb points roughly in the
direction of ~w.

For an example of a right handed system of vectors, see the following picture.

[Figure: a right handed system of vectors ~u, ~v, ~w, with ~w pointing up from the plane of ~u and ~v]

In this picture the vector ~w points upwards from the plane determined by the other two vectors. Point
the fingers of your right hand along ~u, and close them in the direction of ~v. Notice that if you extend the
thumb on your right hand, it points in the direction of ~w.
You should consider how a right hand system would differ from a left hand system. Try using your left
hand and you will see that the vector ~w would need to point in the opposite direction.
Notice that the special vectors, ~i, ~j,~k will always form a right handed system. If you extend the fingers
of your right hand along ~i and close them in the direction ~j, the thumb points in the direction of ~k.

[Figure: the standard basis vectors ~i, ~j, ~k forming a right handed system]

The following is the geometric description of the cross product. Recall that the dot product of two
vectors results in a scalar. In contrast, the cross product results in a vector, as the product gives a direction
as well as magnitude.
162 Rn

Definition 4.48: Geometric Definition of Cross Product


Let ~u and ~v be two vectors in R3 . Then the cross product, written ~u × ~v, is defined by the following
two rules.

1. Its length is ‖~u × ~v‖ = ‖~u‖‖~v‖ sin θ,
   where θ is the included angle between ~u and ~v.

2. It is perpendicular to both ~u and ~v, that is (~u × ~v) · ~u = 0, (~u × ~v) · ~v = 0,
   and ~u, ~v, ~u × ~v form a right hand system.

The cross product of the special vectors ~i, ~j, ~k is as follows.

~i × ~j = ~k    ~j × ~i = −~k
~k × ~i = ~j    ~i × ~k = −~j
~j × ~k = ~i    ~k × ~j = −~i

With this information, the following gives the coordinate description of the cross product.
Recall that the vector ~u = [u1 u2 u3 ]^T can be written in terms of ~i, ~j, ~k as ~u = u1~i + u2~j + u3~k.

Proposition 4.49: Coordinate Description of Cross Product


Let ~u = u1~i + u2~j + u3~k and ~v = v1~i + v2~j + v3~k be two vectors. Then

~u × ~v = (u2 v3 − u3 v2)~i − (u1 v3 − u3 v1)~j + (u1 v2 − u2 v1)~k   (4.14)

Writing ~u × ~v in the usual way, it is given by

~u × ~v = [u2 v3 − u3 v2 , −(u1 v3 − u3 v1) , u1 v2 − u2 v1 ]^T

We now prove this proposition.


Proof. From the above table and the properties of the cross product listed,

~u × ~v = (u1~i + u2~j + u3~k) × (v1~i + v2~j + v3~k)
       = u1 v2~i × ~j + u1 v3~i × ~k + u2 v1~j × ~i + u2 v3~j × ~k + u3 v1~k × ~i + u3 v2~k × ~j
       = u1 v2~k − u1 v3~j − u2 v1~k + u2 v3~i + u3 v1~j − u3 v2~i
       = (u2 v3 − u3 v2)~i + (u3 v1 − u1 v3)~j + (u1 v2 − u2 v1)~k   (4.15)


There is another version of 4.14 which may be easier to remember, in case you have already covered
the notion of matrix determinant; otherwise you can skip the next few lines. We can express the cross
product as the determinant of a matrix, as follows (here the semicolons separate the rows of the matrix).

~u × ~v = det [ ~i ~j ~k ; u1 u2 u3 ; v1 v2 v3 ]   (4.16)

Expanding the determinant along the top row yields

~i (−1)^{1+1} det [ u2 u3 ; v2 v3 ] + ~j (−1)^{1+2} det [ u1 u3 ; v1 v3 ] + ~k (−1)^{1+3} det [ u1 u2 ; v1 v2 ]

= ~i det [ u2 u3 ; v2 v3 ] − ~j det [ u1 u3 ; v1 v3 ] + ~k det [ u1 u2 ; v1 v2 ]

Expanding these 2 × 2 determinants leads to

(u2 v3 − u3 v2)~i − (u1 v3 − u3 v1)~j + (u1 v2 − u2 v1)~k

which is the same as 4.15.


The cross product satisfies the following properties.

Proposition 4.50: Properties of the Cross Product


Let ~u, ~v, ~w be vectors in R3 , and k a scalar. Then, the following properties of the cross product hold.

1. ~u × ~v = −(~v × ~u), and ~u × ~u = ~0

2. (k~u) × ~v = k (~u × ~v) = ~u × (k~v)

3. ~u × (~v + ~w) = ~u × ~v + ~u × ~w

4. (~v + ~w) × ~u = ~v × ~u + ~w × ~u

Proof. Formula 1. follows immediately from the definition. The vectors ~u × ~v and ~v × ~u have the same
magnitude, ‖~u‖‖~v‖ sin θ, and an application of the right hand rule shows they have opposite direction.
Formula 2. is proven as follows. If k is a non-negative scalar, the direction of (k~u) × ~v is the same as
the direction of ~u × ~v, k (~u × ~v) and ~u × (k~v). The magnitude is k times the magnitude of ~u × ~v, which is the
same as the magnitude of k (~u × ~v) and ~u × (k~v). Using this yields equality in 2. In the case where k < 0,
everything works the same way except the vectors are all pointing in the opposite direction and you must
multiply by |k| when comparing their magnitudes.
The distributive laws, 3. and 4., are much harder to establish. For now, it suffices to notice that if we
know that 3. is true, 4. follows. Thus, assuming 3., and using 1.,

(~v + ~w) × ~u = −~u × (~v + ~w)
            = −(~u × ~v + ~u × ~w)
            = ~v × ~u + ~w × ~u


We will now look at an example of how to compute a cross product.

Example 4.51: Find a Cross Product


Find ~u × ~v for the following vectors

~u = [1 −1 2]^T , ~v = [3 −2 1]^T

Solution. Note that we can write ~u, ~v in terms of the special vectors ~i, ~j, ~k as

~u = ~i − ~j + 2~k
~v = 3~i − 2~j + ~k

We will use the equation given by 4.16 to compute the cross product.

~u × ~v = det [ ~i ~j ~k ; 1 −1 2 ; 3 −2 1 ]
       = det [ −1 2 ; −2 1 ]~i − det [ 1 2 ; 3 1 ]~j + det [ 1 −1 ; 3 −2 ]~k
       = 3~i + 5~j + ~k

We can write this result in the usual way, as

~u × ~v = [3 5 1]^T
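For a numerical check of the coordinate description, NumPy provides a built-in cross product. A minimal
sketch in Python (our own illustration):

import numpy as np

u = np.array([1.0, -1.0, 2.0])
v = np.array([3.0, -2.0, 1.0])

w = np.cross(u, v)
print(w)                            # [3. 5. 1.], matching Example 4.51
print(np.dot(w, u), np.dot(w, v))   # both 0.0: w is perpendicular to u and v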

An important geometrical application of the cross product is as follows. The size of the cross product,
‖~u × ~v‖, is the area of the parallelogram determined by ~u and ~v, as shown in the following picture.

[Figure: parallelogram with sides ~u and ~v, base ‖~u‖ and height ‖~v‖ sin(θ)]

We examine this concept in the following example.

Example 4.52: Area of a Parallelogram


Find the area of the parallelogram determined by the vectors ~u and ~v given by

~u = [1 −1 2]^T , ~v = [3 −2 1]^T

Solution. Notice that these vectors are the same as the ones given in Example 4.51. Recall from the
geometric description of the cross product that the area of the parallelogram is simply the magnitude of
~u × ~v. From Example 4.51, ~u × ~v = 3~i + 5~j + ~k. We can also write this as

~u × ~v = [3 5 1]^T

Thus the area of the parallelogram is

‖~u × ~v‖ = √((3)(3) + (5)(5) + (1)(1)) = √(9 + 25 + 1) = √35

We can also use this concept to find the area of a triangle. Consider the following example.

Example 4.53: Area of Triangle


Find the area of the triangle determined by the points (1, 2, 3), (0, 2, 5), (5, 1, 2).

Solution. This triangle is obtained by connecting the three points with lines. Picking (1, 2, 3) as a starting
point, there are two displacement vectors, [−1 0 2]^T and [4 −1 −1]^T . Notice that if we add
either of these vectors to the position vector of the starting point, the result is the position vector of
one of the other two points. Now, the area of the triangle is half the area of the parallelogram determined
by [−1 0 2]^T and [4 −1 −1]^T . The required cross product is given by

[−1 0 2]^T × [4 −1 −1]^T = [2 7 1]^T

Taking the size of this vector gives the area of the parallelogram, given by

√((2)(2) + (7)(7) + (1)(1)) = √(4 + 49 + 1) = √54

Hence the area of the triangle is (1/2)√54 = (3/2)√6.
In general, if you have three points P, Q, R in R3 , the area of the triangle is given by

(1/2) ‖PQ × PR‖

Recall that PQ is the vector running from point P to point Q.

[Figure: triangle with vertices P, Q, R and displacement vectors PQ and PR]
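The formula (1/2)‖PQ × PR‖ is easy to apply in code. A sketch in Python with NumPy, using the points
of Example 4.53 (our own illustration):

import numpy as np

P = np.array([1.0, 2.0, 3.0])
Q = np.array([0.0, 2.0, 5.0])
R = np.array([5.0, 1.0, 2.0])

area = 0.5 * np.linalg.norm(np.cross(Q - P, R - P))
print(area)   # (1/2) * sqrt(54), approximately 3.6742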

In the next section, we explore another application of the cross product.



4.9.1. The Box Product

Recall that we can use the cross product to find the area of a parallelogram. It follows that we can use
the cross product together with the dot product to find the volume of a parallelepiped.
We begin with a definition.

Definition 4.54: Parallelepiped


A parallelepiped determined by the three vectors, ~u,~v, and ~w consists of

{r~u + s~v + t~w : r, s, t ∈ [0, 1]}

That is, if you pick three numbers, r, s, and t each in [0, 1] and form r~u + s~v + t~w then the collection
of all such points makes up the parallelepiped determined by these three vectors.

The following is an example of a parallelepiped.

[Figure: a parallelepiped spanned by ~u, ~v, ~w; the base is the parallelogram determined by ~u and ~v,
and θ is the angle between ~w and ~u × ~v]

Notice that the base of the parallelepiped is the parallelogram determined by the vectors ~u and ~v.
Therefore, its area is equal to ‖~u × ~v‖. The height of the parallelepiped is ‖~w‖ cos θ, where θ is the angle
shown in the picture between ~w and ~u × ~v. The volume of this parallelepiped is the area of the base times
the height, which is just

‖~u × ~v‖‖~w‖ cos θ = (~u × ~v) · ~w

This expression is known as the box product and is sometimes written as [~u,~v,~w]. You should consider
what happens if you interchange the ~v with the ~w or the ~u with the ~w. You can see geometrically from
drawing pictures that this merely introduces a minus sign. In any case the box product of three vectors
always equals either the volume of the parallelepiped determined by the three vectors or else −1 times this
volume.

Proposition 4.55: The Box Product


Let ~u, ~v, ~w be three vectors in R3 that define a parallelepiped. Then the volume of the parallelepiped
is the absolute value of the box product, given by

|(~u × ~v) · ~w|

Consider an example of this concept.



Example 4.56: Volume of a Parallelepiped


Find the volume of the parallelepiped determined by the vectors

~u = [1 2 −5]^T , ~v = [1 3 −6]^T , ~w = [3 2 3]^T

Solution. According to the above discussion, pick any two of these vectors, take the cross product and then
take the dot product of this with the third of these vectors. The result will be either the desired volume or
−1 times the desired volume. Therefore by taking the absolute value of the result, we obtain the volume.
We will take the cross product of ~u and ~v. This is given by

~u × ~v = [1 2 −5]^T × [1 3 −6]^T
       = det [ ~i ~j ~k ; 1 2 −5 ; 1 3 −6 ] = 3~i + ~j + ~k = [3 1 1]^T

Now take the dot product of this vector with ~w, which yields

(~u × ~v) · ~w = [3 1 1]^T · [3 2 3]^T
            = (3~i + ~j + ~k) · (3~i + 2~j + 3~k)
            = 9 + 2 + 3
            = 14

This shows the volume of this parallelepiped is 14 cubic units.
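The box product is equally simple to check numerically, either directly or via the determinant description
derived later in this section. A sketch in Python with NumPy (our own illustration):

import numpy as np

u = np.array([1.0, 2.0, -5.0])
v = np.array([1.0, 3.0, -6.0])
w = np.array([3.0, 2.0, 3.0])

box = np.dot(np.cross(u, v), w)   # the box product (u x v) . w
print(abs(box))   # 14.0, the volume of the parallelepiped
# Equivalently, the determinant of the matrix whose rows are u, v, w:
print(abs(np.linalg.det(np.array([u, v, w]))))   # 14.0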


There is a fundamental observation which comes directly from the geometric definitions of the cross
product and the dot product.

Proposition 4.57: Order of the Product


Let ~u, ~v, and ~w be vectors. Then (~u × ~v) · ~w = ~u · (~v × ~w).

Proof. This follows from observing that either (~u × ~v) · ~w and ~u · (~v × ~w) both give the volume of the
parallelepiped or they both give −1 times the volume.
In case you have covered the notion of matrix determinant, you will remember that we can express the
cross product as the determinant of a particular matrix. It turns out that the same can be done for the box
product. Suppose you have three vectors, ~u = [a b c]^T , ~v = [d e f ]^T , and ~w = [g h i]^T .
Then the box product ~u · (~v × ~w) is given by the following.

~u · (~v × ~w) = [a b c]^T · det [ ~i ~j ~k ; d e f ; g h i ]
            = a det [ e f ; h i ] − b det [ d f ; g i ] + c det [ d e ; g h ]
            = det [ a b c ; d e f ; g h i ]

To take the box product, you can simply take the determinant of the matrix which results by letting the
rows be the components of the given vectors in the order in which they occur in the box product.
This follows directly from the definition of the cross product given above and the way we expand
determinants. Thus the volume of a parallelepiped determined by the vectors ~u, ~v, ~w is just the absolute
value of the above determinant.

Exercises

Exercise 4.9.34 Show that if ~a × ~u = ~0 for any unit vector ~u, then ~a = ~0.

Exercise 4.9.35 Find the area of the triangle determined by the three points, (1, 2, 3) , (4, 2, 0) and (3, 2, 1) .

Exercise 4.9.36 Find the area of the triangle determined by the three points, (1, 0, 3) , (4, 1, 0) and (3, 1, 1) .

Exercise 4.9.37 Find the area of the triangle determined by the three points, (1, 2, 3) , (2, 3, 4) and (3, 4, 5) .
Did something interesting happen here? What does it mean geometrically?

Exercise 4.9.38 Find the area of the parallelogram determined by the vectors [1 2 3]^T and [3 2 1]^T .

Exercise 4.9.39 Find the area of the parallelogram determined by the vectors [1 0 3]^T and [4 2 1]^T .

Exercise 4.9.40 Is ~u × (~v × ~w) = (~u × ~v) × ~w? What is the meaning of ~u × ~v × ~w? Explain. Hint: Try it
with ~i, ~j, ~k.

Exercise 4.9.41 Verify directly that the coordinate description of the cross product, ~u × ~v, has the property
that it is perpendicular to both ~u and ~v. Then show by direct computation that this coordinate description
satisfies

‖~u × ~v‖^2 = ‖~u‖^2 ‖~v‖^2 − (~u · ~v)^2
           = ‖~u‖^2 ‖~v‖^2 (1 − cos^2(θ))

where θ is the angle included between the two vectors. Explain why ‖~u × ~v‖ has the correct magnitude.

Exercise 4.9.42 Suppose A is a 3 × 3 skew symmetric matrix such that AT = −A. Show there exists a
vector ~Ω such that for all ~u ∈ R3 ,

A~u = ~Ω × ~u

Hint: Explain why, since A is skew symmetric, it is of the form

A = [ 0 −ω3 ω2 ; ω3 0 −ω1 ; −ω2 ω1 0 ]

where the ωi are numbers. Then consider ω1~i + ω2~j + ω3~k.



Exercise 4.9.43 Find the volume of the parallelepiped determined by the vectors [1 7 5]^T , [1 2 6]^T ,
and [3 2 3]^T .

Exercise 4.9.44 Suppose ~u,~v, and ~w are three vectors whose components are all integers. Can you con-
clude the volume of the parallelepiped determined from these three vectors will always be an integer?

Exercise 4.9.45 What does it mean geometrically if the box product of three vectors gives zero?

Exercise 4.9.46 Using Problem 4.9.45, find an equation of a plane containing the two position vectors, ~p
and ~q and the point 0. Hint: If (x, y, z) is a point on this plane, the volume of the parallelepiped determined
by (x, y, z) and the vectors ~p,~q equals 0.

Exercise 4.9.47 Using the notion of the box product yielding either plus or minus the volume of the
parallelepiped determined by the given three vectors, show that

(~u × ~v) · ~w = ~u · (~v × ~w)

In other words, the dot and the cross can be switched as long as the order of the vectors remains the same.
Hint: There are two ways to do this, by the coordinate description of the dot and cross product and by
geometric reasoning.

Exercise 4.9.48 Simplify (~u × ~v) · (~v × ~w) × (~w × ~z).



Exercise 4.9.49 Simplify ‖~u × ~v‖^2 + (~u · ~v)^2 − ‖~u‖^2 ‖~v‖^2 .

Exercise 4.9.50 For ~u, ~v functions of t, prove the following product rules:

(~u × ~v)′ = ~u′ × ~v + ~u × ~v′

(~u · ~v)′ = ~u′ · ~v + ~u · ~v′

4.10 Applications

Outcomes
A. Apply the concepts of vectors in Rn to the applications of physics and work.

4.10.1. Vectors and Physics

Suppose you push on something. Then, your push is made up of two components, how hard you push and
the direction you push. This illustrates the concept of force.

Definition 4.58: Force


Force is a vector. The magnitude of this vector is a measure of how hard it is pushing. It is measured
in units such as Newtons or pounds or tons. The direction of this vector is the direction in which
the push is taking place.

Vectors are used to model force and other physical vectors like velocity. As with all vectors, a vector
modeling force has two essential ingredients, its magnitude and its direction.
Recall the special vectors which point along the coordinate axes. These are given by

~ei = [0 ··· 0 1 0 ··· 0]^T

where the 1 is in the ith slot and there are zeros in all the other spaces. The direction of ~ei is referred to as
the ith direction.
Consider the following picture which illustrates the case of R3 . Recall that in R3 , we may refer to these
vectors as ~i, ~j, and ~k.

[Figure: the coordinate axes of R3 with unit vectors ~e1 , ~e2 , ~e3 along the x, y, and z axes]

Given a vector ~u = [u1 ··· un ]^T , it follows that

~u = u1~e1 + ··· + un~en = ∑_{i=1}^{n} ui~ei

What does addition of vectors mean physically? Suppose two forces are applied to some object. Each
of these would be represented by a force vector and the two forces acting together would yield an overall
force acting on the object which would also be a force vector, known as the resultant. Suppose the two
vectors are ~u = ∑_{i=1}^{n} ui~ei and ~v = ∑_{i=1}^{n} vi~ei . Then the vector ~u involves a component in the
ith direction given by ui~ei , while the component in the ith direction of ~v is vi~ei . Then the vector ~u + ~v
should have a component in the ith direction equal to (ui + vi)~ei . This is exactly what is obtained when the
vectors ~u and ~v are added.

~u + ~v = [u1 + v1 ··· un + vn ]^T = ∑_{i=1}^{n} (ui + vi)~ei

Thus the addition of vectors according to the rules of addition in Rn which were presented earlier
yields the appropriate vector which duplicates the cumulative effect of all the vectors in the sum.
Consider now some examples of vector addition.

Example 4.59: The Resultant of Three Forces


There are three ropes attached to a car and three people pull on these ropes. The first exerts a force
of ~F1 = 2~i + 3~j − 2~k Newtons, the second exerts a force of ~F2 = 3~i + 5~j + ~k Newtons and the third
exerts a force of ~F3 = 5~i − ~j + 2~k Newtons. Find the total force in the direction of ~i.

Solution. To find the total force, we add the vectors as described above. This is given by

(2~i + 3~j − 2~k) + (3~i + 5~j + ~k) + (5~i − ~j + 2~k)
= (2 + 3 + 5)~i + (3 + 5 − 1)~j + (−2 + 1 + 2)~k
= 10~i + 7~j + ~k

Hence, the total force is 10~i + 7~j + ~k Newtons. Therefore, the force in the ~i direction is 10 Newtons.
Consider another example.

Example 4.60: Finding a Vector from Geometric Description


An airplane flies North East at 100 miles per hour. Write this as a vector.

Solution. A picture of this situation follows.

[Figure: vector ~u of length 100 pointing north east, at 45° above the east direction]

Therefore, we need to find the vector ~u which has length 100 and direction as shown in this diagram.
We can consider the vector ~u as the hypotenuse of a right triangle having equal sides, since the direction
of ~u corresponds with the 45° line. The sides, corresponding to the ~i and ~j directions, should each be of
length 100/√2. Therefore, the vector is given by

~u = (100/√2)~i + (100/√2)~j = [100/√2 100/√2]^T

This example also motivates the concept of velocity, defined below.

Definition 4.61: Speed and Velocity


The speed of an object is a measure of how fast it is going. It is measured in units of length per unit
time. For example, miles per hour, kilometers per minute, feet per second. The velocity is a vector
having the speed as the magnitude but also specifying the direction.

Thus the velocity vector in the above example is (100/√2)~i + (100/√2)~j, while the speed is 100 miles
per hour.

Consider the following example.

Example 4.62: Position From Velocity and Time


The velocity of an airplane is 100~i + ~j +~k measured in kilometers per hour and at a certain instant
of time its position is (1, 2, 1) .
Find the position of this airplane one minute later.

Solution. Here imagine a Cartesian coordinate system in which the third component is altitude and the
first and second components are measured on a line from West to East and a line from South to North.
Consider the vector [1 2 1]^T , which is the initial position vector of the airplane. As the plane
moves, the position vector changes according to the velocity vector. After one minute (considered as 1/60
of an hour) the airplane has moved in the ~i direction a distance of 100 × (1/60) = 5/3 kilometers. In the ~j
direction it has moved 1/60 kilometer during this same time, while it moves 1/60 kilometer in the ~k direction.
Therefore, the new displacement vector for the airplane is

[1 2 1]^T + [5/3 1/60 1/60]^T = [8/3 121/60 61/60]^T

Now consider an example which involves combining two velocities.

Example 4.63: Sum of Two Velocities


A certain river is one half kilometer wide with a current flowing at 4 kilometers per hour from East
to West. A man swims directly toward the opposite shore from the South bank of the river at a speed
of 3 kilometers per hour. How far down the river does he find himself when he has swum across?
How far does he end up swimming?

Solution. Consider the following picture which demonstrates the above scenario.

[Figure: swimmer's velocity of 3 km/h across the river, with the 4 km/h current carrying him downstream]

First we want to know the total time of the swim across the river. The velocity in the direction across
the river is 3 kilometers per hour, and the river is 1/2 kilometer wide. It follows the trip takes 1/6 hour or
10 minutes.
Now, we can compute how far downstream he will end up. Since the river runs at a rate of 4 kilometers
per hour, and the trip takes 1/6 hour, the distance traveled downstream is given by 4 (1/6) = 2/3 kilometers.
The distance traveled by the swimmer is given by the hypotenuse of a right triangle. The two arms of
the triangle are given by the distance across the river, 1/2 km, and the distance traveled downstream, 2/3 km.
Then, using the Pythagorean Theorem, we can calculate the total distance d traveled.

d = √((2/3)^2 + (1/2)^2) = 5/6 km

Therefore, the swimmer travels a total distance of 5/6 kilometers.

4.10.2. Work

The mathematical concept of work is an application of vectors in Rn . The physical concept of work differs
from the notion of work employed in ordinary conversation. For example, suppose you were to slide a
150 pound weight off a table which is three feet high and shuffle along the floor for 50 yards, keeping the
height always three feet and then deposit this weight on another three foot high table. The physical concept
of work would indicate that the force exerted by your arms did no work during this project. The reason
for this definition is that even though your arms exerted considerable force on the weight, the direction of
motion was at right angles to the force they exerted. The only part of a force which does work in the sense
of physics is the component of the force in the direction of motion.
Work is defined to be the magnitude of the component of this force times the distance over which it
acts, when the component of force points in the direction of motion. In the case where the force points
in exactly the opposite direction of motion, work is given by (−1) times the magnitude of this component
times the distance. Thus the work done by a force on an object as the object moves from one point to
another is a measure of the extent to which the force contributes to the motion. This is illustrated in the
following picture in the case where the given force contributes to the motion.

[Figure: force ~F applied to an object moving from P to Q, decomposed as ~F = ~F|| + ~F⊥ with ~F|| along
the direction of motion]

Recall that for any vector ~u in Rn , we can write ~u as a sum of two vectors, as in
~u = ~u|| + ~u⊥

For any force ~F, we can write this force as the sum of a vector in the direction of the motion and a vector
perpendicular to the motion. In other words,

~F = ~F|| + ~F⊥

In the above picture the force ~F is applied to an object which moves on the straight line from P to Q.
There are two vectors shown, ~F|| and ~F⊥ , and the picture is intended to indicate that when you add these
two vectors you get ~F. In other words, ~F = ~F|| + ~F⊥ . Notice that ~F|| acts in the direction of motion and ~F⊥
acts perpendicular to the direction of motion. Only ~F|| contributes to the work done by ~F on the object as it
moves from P to Q. ~F|| is called the component of the force in the direction of motion. From trigonometry,
you see the magnitude of ~F|| should equal ‖~F‖ |cos θ|. Thus, since ~F|| points in the direction of the vector
from P to Q, the total work done should equal

‖~F‖‖PQ‖ cos θ = ‖~F‖‖~q − ~p‖ cos θ

Now, suppose the included angle had been obtuse. Then the work done by the force ~F on the object
would have been negative, because ~F|| would point in (−1) times the direction of the motion. In this case,
cos θ would also be negative and so it is still the case that the work done would be given by the above
formula. Thus from the geometric description of the dot product given above, the work equals

‖~F‖‖~q − ~p‖ cos θ = ~F · (~q − ~p)

This explains the following definition.

Definition 4.64: Work Done on an Object by a Force


Let ~F be a force acting on an object which moves from the point P to the point Q, which have
position vectors given by ~p and ~q respectively. Then the work done on the object by the given force
equals ~F · (~q − ~p).

Consider the following example.

Example 4.65: Finding Work


Let ~F = [2 7 −3]^T Newtons. Find the work done by this force in moving from the point
(1, 2, 3) to the point (−9, −3, 4) along the straight line segment joining these points, where distances
are measured in meters.

Solution. First, compute the vector ~q − ~p, given by

[−9 −3 4]^T − [1 2 3]^T = [−10 −5 1]^T

According to Definition 4.64 the work done is

[2 7 −3]^T · [−10 −5 1]^T = −20 + (−35) + (−3)
                          = −58 Newton meters
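Definition 4.64 is a single dot product, so it is trivially checked in code. A sketch in Python with
NumPy, using the data of Example 4.65 (our own illustration):

import numpy as np

F = np.array([2.0, 7.0, -3.0])    # force in Newtons
p = np.array([1.0, 2.0, 3.0])     # starting point
q = np.array([-9.0, -3.0, 4.0])   # ending point

print(np.dot(F, q - p))   # -58.0 Newton meters (Joules)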


Note that if the force had been given in pounds and the distance had been given in feet, the units on
the work would have been foot pounds. In general, work has units equal to units of a force times units of
a length. Recall that 1 Newton meter is equal to 1 Joule. Also notice that the work done by the force can
be negative as in the above example.

Exercises

Exercise 4.10.51 The wind blows from the South at 20 kilometers per hour and an airplane which flies at
600 kilometers per hour in still air is heading East. Find the velocity of the airplane and its location after
two hours.

Exercise 4.10.52 The wind blows from the West at 30 kilometers per hour and an airplane which flies at
400 kilometers per hour in still air is heading North East. Find the velocity of the airplane and its position
after two hours.

Exercise 4.10.53 The wind blows from the North at 10 kilometers per hour. An airplane which flies at
300 kilometers per hour in still air is supposed to go to the point whose coordinates are at (100, 100) . In
what direction should the airplane fly?


Exercise 4.10.54 Three forces act on an object. Two are [3 1 1]^T and [1 3 4]^T Newtons. Find the third
force if the object is not to move.


Exercise 4.10.55 Three forces act on an object. Two are [6 3 3]^T and [2 1 3]^T Newtons. Find the third
force if the total force on the object is to be [7 1 3]^T Newtons.

Exercise 4.10.56 A river flows West at the rate of b miles per hour. A boat can move at the rate of 8 miles
per hour. Find the smallest value of b such that it is not possible for the boat to proceed directly across the
river.

Exercise 4.10.57 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air is heading North West. What is the velocity of the airplane relative
to the ground? What is the component of this velocity in the direction North?

Exercise 4.10.58 The wind blows from West to East at a speed of 60 miles per hour and an airplane
travels at 100 miles per hour in still air. How many degrees West of North should the airplane head
in order to travel exactly North?

Exercise 4.10.59 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air heading somewhat West of North so that, with the wind, it is flying
due North. It uses 30.0 gallons of gas every hour. If it has to travel 600.0 miles due North, how much gas
will it use in flying to its destination?

Exercise 4.10.60 An airplane is flying due north at 150.0 miles per hour but it is not actually going due
North because there is a wind which is pushing the airplane due east at 40.0 miles per hour. After one
hour, the plane starts flying 30° East of North. Assuming the plane starts at (0, 0), where is it after 2
hours? Let North be the direction of the positive y axis and let East be the direction of the positive x axis.

Exercise 4.10.61 City A is located at the origin (0, 0) while city B is located at (300, 500) where distances
are in miles. An airplane flies at 250 miles per hour in still air. This airplane wants to fly from city A to
city B but the wind is blowing in the direction of the positive y axis at a speed of 50 miles per hour. Find a
unit vector such that if the plane heads in this direction, it will end up at city B having flown the shortest
possible distance. How long will it take to get there?

Exercise 4.10.62 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man swims directly toward the opposite shore from the South bank of the river at a speed
of 3 miles per hour. How far down the river does he find himself when he has swum across? How far does
he end up traveling?

Exercise 4.10.63 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man can swim at 3 miles per hour in still water. In what direction should he swim in order
to travel directly across the river? What would the answer to this problem be if the river flowed at 3 miles
per hour and the man could swim only at the rate of 2 miles per hour?

Exercise 4.10.64 Three forces are applied to a point which does not move. Two of the forces are
2~i + 2~j − 6~k Newtons and 8~i + 8~j + 3~k Newtons. Find the third force.

Exercise 4.10.65 The total force acting on an object is to be 4~i + 2~j − 3~k Newtons. A force of
−3~i − 1~j + 8~k Newtons is being applied. What other force should be applied to achieve the desired total
force?

Exercise 4.10.66 A bird flies from its nest 8 km in the direction 56° north of east, where it stops to rest
on a tree. It then flies 1 km in the direction due southeast and lands atop a telephone pole. Place an xy
coordinate system so that the origin is the bird's nest, and the positive x axis points east and the positive y
axis points north. Find the displacement vector from the nest to the telephone pole.
   
Exercise 4.10.67 If ~F is a force and ~D is a vector, show proj~D (~F) = (‖~F‖ cos θ)~u, where ~u is the unit
vector in the direction of ~D, that is ~u = ~D/‖~D‖, and θ is the included angle between the two vectors ~F and
~D. ‖~F‖ cos θ is sometimes called the component of the force ~F in the direction ~D.

Exercise 4.10.68 A boy drags a sled for 100 feet along the ground by pulling on a rope which is 20 degrees
from the horizontal with a force of 40 pounds. How much work does this force do?

Exercise 4.10.69 A girl drags a sled for 200 feet along the ground by pulling on a rope which is 30 degrees
from the horizontal with a force of 20 pounds. How much work does this force do?

Exercise 4.10.70 A large dog drags a sled for 300 feet along the ground by pulling on a rope which is 45
degrees from the horizontal with a force of 20 pounds. How much work does this force do?

Exercise 4.10.71 How much work does it take to slide a crate 20 meters along a loading dock by pulling
on it with a 200 Newton force at an angle of 30° from the horizontal? Express your answer in Newton
meters.

Exercise 4.10.72 An object moves 10 meters in the direction of ~j. There are two forces acting on this
object, ~F1 = ~i + ~j + 2~k, and ~F2 = 5~i + 2~j − 6~k. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force. Why?

Exercise 4.10.73 An object moves 10 meters in the direction of ~j + ~i. There are two forces acting on this
object, ~F1 = ~i + 2~j + 2~k, and ~F2 = 5~i + 2~j − 6~k. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force. Why?

Exercise 4.10.74 An object moves 20 meters in the direction of ~k + ~j. There are two forces acting on this
object, ~F1 = ~i + ~j + 2~k, and ~F2 = ~i + 2~j − 6~k. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force.
5. Linear Transformations

5.1 Linear Transformations

Outcomes
A. Understand the definition of a linear transformation, and that all linear transformations are
determined by matrix multiplication.

Recall that when we multiply an m × n matrix by an n × 1 column vector, the result is an m × 1 column
vector. In this section we will discuss how, through matrix multiplication, an m × n matrix transforms an
n × 1 column vector into an m × 1 column vector.
Recall that the n × 1 vector given by

~x = [x1 x2 ··· xn ]^T

is said to belong to Rn , which is the set of all n × 1 vectors. In this section, we will discuss transformations
of vectors in Rn .
Consider the following example.

Example 5.1: A Function Which Transforms Vectors


 
Consider the matrix A = [ 1 2 0 ; 2 1 0 ]. Show that by matrix multiplication A transforms vectors in
R3 into vectors in R2 .

Solution. First, recall that vectors in R3 are vectors of size 3 × 1, while vectors in R2 are of size 2 × 1. If
we multiply A, which is a 2 × 3 matrix, by a 3 × 1 vector, the result will be a 2 × 1 vector. This is what we
mean when we say that A transforms vectors.
Now, for [x y z]^T in R3 , multiply on the left by the given matrix to obtain the new vector. This product
looks like

[ 1 2 0 ; 2 1 0 ] [x y z]^T = [x + 2y , 2x + y]^T

The resulting product is a 2 × 1 vector which is determined by the choice of x and y. Here are some
numerical examples.

[ 1 2 0 ; 2 1 0 ] [1 2 3]^T = [5 4]^T

Here, the vector [1 2 3]^T in R3 was transformed by the matrix into the vector [5 4]^T in R2 .
Here is another example:

[ 1 2 0 ; 2 1 0 ] [10 5 3]^T = [20 25]^T

The idea is to define a function which takes vectors in R3 and delivers new vectors in R2 . In this case,
that function is multiplication by the matrix A.
Let T denote such a function. The notation T : Rn → Rm means that the function T transforms vectors
in Rn into vectors in Rm . The notation T (~x) means the transformation T applied to the vector ~x. The above
example demonstrated a transformation achieved by matrix multiplication. In this case, we often write

TA (~x) = A~x

Therefore, TA is the transformation determined by the matrix A. In this case we say that TA is a matrix
transformation.
Recall the property of matrix multiplication that states that for k and p scalars,

A (kB + pC) = kAB + pAC

In particular, for A an m × n matrix and B and C, n × 1 vectors in Rn , this formula holds.


In other words, this means that matrix multiplication gives an example of a linear transformation,
which we will now define.

Definition 5.2: Linear Transformation


Let T : Rn → Rm be a function, where for each ~x ∈ Rn , T (~x) ∈ Rm . Then T is a linear transforma-
tion if whenever k, p are scalars and ~x1 and ~x2 are vectors in Rn (n × 1 vectors),

T (k~x1 + p~x2) = kT (~x1) + pT (~x2)

Consider the following example.



Example 5.3: Linear Transformation


Let T : R3 → R2 be a transformation defined by

T ([x y z]^T) = [x + y , x − z]^T for all [x y z]^T ∈ R3

Show that T is a linear transformation.

Solution. By Definition 5.2 we need to show that T (k~x1 + p~x2) = kT (~x1) + pT (~x2) for all scalars k, p and
vectors ~x1 , ~x2 . Let

~x1 = [x1 y1 z1 ]^T , ~x2 = [x2 y2 z2 ]^T

Then

T (k~x1 + p~x2) = T (k[x1 y1 z1 ]^T + p[x2 y2 z2 ]^T)
             = T ([kx1 + px2 , ky1 + py2 , kz1 + pz2 ]^T)
             = [(kx1 + px2) + (ky1 + py2) , (kx1 + px2) − (kz1 + pz2)]^T
             = [(kx1 + ky1) + (px2 + py2) , (kx1 − kz1) + (px2 − pz2)]^T
             = [kx1 + ky1 , kx1 − kz1 ]^T + [px2 + py2 , px2 − pz2 ]^T
             = k[x1 + y1 , x1 − z1 ]^T + p[x2 + y2 , x2 − z2 ]^T
             = kT (~x1) + pT (~x2)

Therefore T is a linear transformation.


Two important examples of linear transformations are the zero transformation and identity transfor-
mation. The zero transformation defined by T (~x) = ~0 for all ~x is an example of a linear transformation.
Similarly the identity transformation defined by T (~x) = ~x is also linear. Take the time to prove these
using the method demonstrated in Example 5.3.
We began this section by discussing matrix transformations, where multiplication by a matrix trans-
forms vectors. These matrix transformations are in fact linear transformations.

Theorem 5.4: Matrix Transformations are Linear Transformations


Let T : Rn → Rm be a transformation defined by T (~x) = A~x. Then T is a linear transformation.

It turns out that every linear transformation can be expressed as a matrix transformation, and thus linear
transformations are exactly the same as matrix transformations.

Exercises

Exercise 5.1.1 Show the map T : Rn → Rm defined by T (~x) = A~x, where A is an m × n matrix and ~x is an
n × 1 column vector, is a linear transformation.

Exercise 5.1.2 Show that the function T~u defined by T~u (~v) = ~v − proj~u (~v) is also a linear transformation.

Exercise 5.1.3 Let ~u be a fixed vector. The function T~u defined by T~u (~v) = ~u + ~v has the effect of translating
all vectors by adding ~u ≠ ~0. Show this is not a linear transformation. Explain why it is not possible to
represent T~u in R3 by multiplying by a 3 × 3 matrix.

5.2 The Matrix of a Linear Transformation

Outcomes
A. Find the matrix of a linear transformation and determine the action on a vector in Rn .

In the above examples, the action of the linear transformations was to multiply by a matrix. It turns
out that this is always the case for linear transformations. If T is any linear transformation which maps
Rn to Rm , there is always an m n matrix A with the property that

T (~x) = A~x (5.1)

for all ~x Rn .

Theorem 5.5: Matrix of a Linear Transformation


Let T : Rn → Rm be a linear transformation. Then we can find a matrix A such that T (~x) = A~x. In
this case, we say that T is determined or induced by the matrix A.

Here is why. Suppose T : Rn → Rm is a linear transformation and you want to find the matrix defined
by this linear transformation as described in 5.1. Note that

~x = [x1 x2 ··· xn ]^T = x1~e1 + x2~e2 + ··· + xn~en = ∑_{i=1}^{n} xi~ei

where ~ei is the ith column of In , that is, the n × 1 vector which has zeros in every slot but the ith and a 1 in
this slot.
Then since T is linear,

T (~x) = ∑_{i=1}^{n} xi T (~ei)
     = [ T (~e1) ··· T (~en) ] [x1 ··· xn ]^T
     = A [x1 ··· xn ]^T

Therefore, the desired matrix is obtained from constructing the ith column as T (~ei ) . We state this formally
as the following theorem.

Theorem 5.6: Matrix of a Linear Transformation


Let T : Rn → Rm be a linear transformation. Then the matrix A satisfying T (~x) = A~x is given by

A = [ T (~e1) ··· T (~en) ]

where ~ei is the ith column of In , and then T (~ei) is the ith column of A.

The following Corollary is an essential result.

Corollary 5.7: Matrix and Linear Transformation


A transformation T is a linear transformation if and only if it is a matrix transformation.

Consider the following example.



Example 5.8: The Matrix of a Linear Transformation


Suppose T is a linear transformation, T : R3 → R2 , where

T ([1 0 0]^T) = [1 2]^T , T ([0 1 0]^T) = [9 3]^T , T ([0 0 1]^T) = [1 1]^T

Find the matrix A of T such that T (~x) = A~x for all ~x.

Solution. By Theorem 5.6 we construct A as follows:

A = [ T (~e1) ··· T (~en) ]

In this case, A will be a 2 × 3 matrix, so we need to find T (~e1), T (~e2), and T (~e3). Luckily, we have
been given these values so we can fill in A as needed, using these vectors as the columns of A. Hence,

A = [ 1 9 1 ; 2 3 1 ]

In this example, we were given the resulting vectors of T (~e1 ) , T (~e2 ) , and T (~e3 ). Constructing the
matrix A was simple, as we could simply use these vectors as the columns of A. The next example shows
how to find A when we are not given the T (~ei ) so clearly.

Example 5.9: The Matrix of a Linear Transformation: Inconveniently Defined

Suppose T is a linear transformation, T : R2 → R2 , and

T ([1 −1]^T) = [1 2]^T , T ([0 1]^T) = [3 2]^T

Find the matrix A of T such that T (~x) = A~x for all ~x.

Solution. By Theorem 5.6, to find this matrix we need to determine the action of T on ~e1 and ~e2 . In
Example 5.8, we were given these resulting vectors. However, in this example, we have been given T of
two different vectors. How can we find out the action of T on ~e1 and ~e2 ? In particular for ~e1 , suppose there
exist x and y such that

[1 0]^T = x[1 −1]^T + y[0 1]^T   (5.2)

Then, since T is linear,

T ([1 0]^T) = xT ([1 −1]^T) + yT ([0 1]^T)

Substituting in values, this sum becomes

T ([1 0]^T) = x[1 2]^T + y[3 2]^T   (5.3)

Therefore, if we know the values of x and y which satisfy 5.2, we can substitute these into equation
5.3. By doing so, we find T (~e1) which is the first column of the matrix A.
We proceed to find x and y. We do so by solving 5.2, which can be done by solving the system

x = 1
−x + y = 0

We see that x = 1 and y = 1 is the solution to this system. Substituting these values into equation 5.3,
we have

T ([1 0]^T) = 1 [1 2]^T + 1 [3 2]^T = [1 2]^T + [3 2]^T = [4 4]^T

Therefore [4 4]^T is the first column of A.
Computing the second column is done in the same way, and is left as an exercise.
The resulting matrix A is given by

A = [ 4 3 ; 4 2 ]

This example illustrates a very long procedure for finding the matrix A. While this method is reliable
and will always result in the correct matrix A, the following procedure provides an alternative method.

Procedure 5.10: Finding the Matrix of Inconveniently Defined Linear Transformation


Suppose T : Rn Rm is a linear transformation. Suppose there exist vectors {~a1 , ,~an } in Rn
 1
such that ~a1 ~an exists, and
T (~ai ) = ~bi
Then the matrix of T must be of the form
  1
~b1 ~bn ~a1 ~an

We will illustrate this procedure in the following example. You may also find it useful to work through
Example 5.9 using this procedure.
186 Linear Transformations

Example 5.11: Matrix of a Linear Transformation


Given Inconveniently
Suppose T : R3 R3 is a linear transformation and

1 0 0 2 1 0
T 3 = 1 ,T 1 = 1 ,T 1 = 0

1 1 1 3 0 1

Find the matrix of this linear transformation.

1
1 0 1 0 2 0
Solution. By Procedure 5.10, A = 3 1 1 and B = 1 1 0
1 1 0 1 3 1
Then, Procedure 5.10 claims that the matrix of T is

2 2 4
1
C = BA = 0 0 1
4 3 6

Indeed you can first verify that T (~x) = C~x for the 3 vectors above:

2 2 4 1 0 2 2 4 0 2
0 0 1 3 = 1 , 0 0 1 1 = 1
4 3 6 1 1 4 3 6 1 3

2 2 4 1 0
0 0 1 1 = 0
4 3 6 0 1

But more generally T (~x) = C~x for any ~x. To see this, let ~y = A1~x and then using linearity of T :
!
T (~x) = T (A~y) = T ~yi~ai = ~yi T (~ai ) ~yi~bi = B~y = BA1~x = C~x
i


Recall the dot product discussed earlier. Consider the map ~v7 proj~u (~v) which takes a vector a trans-
forms it to its projection onto a given vector ~u. It turns out that this map is linear, a result which follows
from the properties of the dot product. This is shown as follows.
 
(k~v + p~w) ~u
proj~u (k~v + p~w) = ~u
~u ~u
   
~v ~u ~w ~u
= k ~u + p ~u
~u ~u ~u ~u
= k proj~u (~v) + p proj~u (~w)

Consider the following example.


5.2. The Matrix of a Linear
Transformation 187

Example 5.12: Matrix of a Projection Map



1
Let ~u = 2 and let T be the projection map T : R3 7 R3 defined by

3

T (~v) = proj~u (~v)

for any ~v R3 .

1. Does this transformation come from multiplication by a matrix?

2. If so, what is the matrix?

Solution.

1. First, we have just seen that T (~v) = proj~u (~v) is linear. Therefore by Theorem 5.5, we can find a
matrix A such that T (~x) = A~x.

2. The columns of the matrix for T are defined above as T (~ei ). It follows that T (~ei ) = proj~u (~ei ) gives
the ith column of the desired matrix. Therefore, we need to find
 
~ei ~u
proj~u (~ei ) = ~u
~u ~u

For the given vector ~u , this implies the columns of the desired matrix are

1 1 1
1 2 3
2 , 2 , 2
14 14 14
3 3 3

which you can verify. Hence the matrix of T is



1 2 3
1
2 4 6
14
3 6 9

Exercises

Exercise 5.2.4 Consider the following functions which map Rn to Rn .

(a) T multiplies the jth component of ~x by a nonzero number b.


188 Linear Transformations

(b) T replaces the ith component of ~x with b times the jth component added to the ith component.
(c) T switches the ith and jth components.

Show these functions are linear transformations and describe their matrices A such that T (~x) = A~x.

Exercise 5.2.5 You are given a linear transformation T : Rn Rm and you know that
T (Ai ) = Bi
 1
where A1 An exists. Show that the matrix of T is of the form
  1
B1 Bn A1 An

Exercise 5.2.6 Suppose T is a linear transformation such that



1 5
T 2 = 1
6 3

1 1

T 1 = 1
5 5

0 5
T 1 = 3
2 2
Find the matrix of T . That is find A such that T (~x) = A~x.

Exercise 5.2.7 Suppose T is a linear transformation such that



1 1
T 1 = 3
8 1

1 2
T 0 = 4
6 1

0 6
T 1 = 1
3 1
Find the matrix of T . That is find A such that T (~x) = A~x.

Exercise 5.2.8 Suppose T is a linear transformation such that



1 3
T 3 = 1
7 3
5.2. The Matrix of a Linear
Transformation 189


1 1
T 2 = 3
6 3

0 5
T 1 = 3
2 3
Find the matrix of T . That is find A such that T (~x) = A~x.

Exercise 5.2.9 Suppose T is a linear transformation such that



1 3
T 1 = 3
7 3

1 1
T 0 = 2
6 3

0 1
T 1 = 3
2 1
Find the matrix of T . That is find A such that T (~x) = A~x.

Exercise 5.2.10 Suppose T is a linear transformation such that



1 5
T 2 = 2
18 5

1 3

T 1 = 3
15 5

0 2
T 1 = 5
4 2
Find the matrix of T . That is find A such that T (~x) = A~x.

Exercise 5.2.11 Consider the following functions T : R3 R2 . Show that each is a linear transformation
and determine for each the matrix A such that T (~x) = A~x.

x  
x + 2y + 3z
(a) T y =
2y 3x + z
z

x  
7x + 2y + z
(b) T y =
3x 11y + 2z
z
190 Linear Transformations


x  
3x + 2y + z
(c) T y =
x + 2y + 6z
z

x  
2y 5x + z
(d) T y =
x+y+z
z

Exercise 5.2.12 Consider the following functions T : R3 R2 . Explain why each of these functions T is
not linear.

x  
x + 2y + 3z + 1
(a) T y =
2y 3x + z
z

x  2 + 3z

x + 2y
(b) T y =
2y + 3x + z
z

x  
sin x + 2y + 3z
(c) T y =
2y + 3x + z
z

x  
x + 2y + 3z
(d) T y =
2y + 3x ln z
z

Exercise 5.2.13 Suppose


 1
A1 An
exists where each A j Rn and let vectors {B1 , , Bn } in Rm be given. Show that there always exists a
linear transformation T such that T (Ai ) = Bi .
 T
Exercise 5.2.14 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 2 3 .
 T
Exercise 5.2.15 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 5 3 .
 T
Exercise 5.2.16 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 0 3 .
5.3. Properties of Linear Transformations 191

5.3 Properties of Linear Transformations

Outcomes
A. Use properties of linear transformations to solve problems.

B. Find the composite of transformations and the inverse of a transformation.

Let T : Rn 7 Rm be a linear transformation. Then there are some important properties of T which will
be examined in this section. Consider the following theorem.

Theorem 5.13: Properties of Linear Transformations


Let T : Rn 7 Rm be a linear transformation and let ~x Rn .

T preserves the zero vector.

T (0~x) = 0T (~x). Hence T (~0) = ~0

T preserves the negative of a vector:

T ((1)~x) = (1)T (~x). Hence T (~x) = T (~x).

T preserves linear combinations:

Let ~x1 , ...,~xk Rn and a1 , ..., ak R.

Then if ~y = a1~x1 + a2~x2 + ... + ak~xk , it follows that


T (~y) = T (a1~x1 + a2~x2 + ... + ak~xk ) = a1 T (~x1 ) + a2 T (~x2 ) + ... + ak T (~xk ).

These properties are useful in determining the action of a transformation on a given vector. Consider
the following example.
192 Linear Transformations

Example 5.14: Linear Combination


Let T : R3 7 R4 be a linear transformation such that

4 4
1 4 4 5
T 3 =
0 , T 0 = 1


1 5
2 5

7
Find T 3 .
9


7 7
Solution. Using the third property in Theorem 5.13, we can find T 3 by writing 3 as a linear
9 9
1 4
combination of 3 and 0 .

1 5
Therefore we want to find a, b R such that

7 1 4
3 = a 3 +b 0
9 1 5

The necessary augmented matrix and resulting reduced row-echelon form are given by:

1 4 7 1 0 1
3 0 3 0 1 2
1 5 9 0 0 0

Hence a = 1, b = 2 and
7 1 4
3 = 1 3 + (2) 0
9 1 5
Now, using the third property above, we have

7 1 4
T 3 = T 1 3 + (2) 0

9 1 5

1 4
= 1T 3 2T 0
1 5

4 4
4
= 2 5
0 1
2 5
5.3. Properties of Linear Transformations 193


4
6
=


2
12

4
7 6
Therefore, T 3 =

.
2
9
12
Suppose two linear transformations act in the same way on ~x for all vectors. Then we say that these
transformations are equal.

Definition 5.15: Equal Transformations


Let S and T be linear transformations from Rn to Rm . Then S = T if and only if for every ~x Rn ,

S (~x) = T (~x)

Suppose two linear transformations act on the same vector ~x, first the transformation T and then a
second transformation given by S. We can find the composite transformation that results from applying
both transformations.

Definition 5.16: Composition of Linear Transformations


Let T : Rk 7 Rn and S : Rn 7 Rm be linear transformations. Then the composite of S and T is

S T : Rk 7 Rm

The action of S T is given by

(S T )(~x) = S(T (~x)) for all ~x Rk

Notice that the resulting vector will be in Rm . Be careful to observe the order of transformations. We
write S T but apply the transformation T first, followed by S.

Theorem 5.17: Composition of Transformations


Let T : Rk 7 Rn and S : Rn 7 Rm be linear transformations such that T is induced by the matrix
A and S is induced by the matrix B. Then S T is a linear transformation which is induced by the
matrix BA.

Consider the following example.


194 Linear Transformations

Example 5.18: Composition of Transformations


Let T be a linear transformation induced by the matrix
 
1 2
A=
2 0

and S a linear transformation induced by the matrix


 
2 3
B=
0 1
 
1
Find the matrix of the composite transformation S T . Then, find (S T )(~x) for ~x = .
4

Solution. By Theorem 5.17, the matrix of S T is given by BA.


    
2 3 1 2 8 4
BA = =
0 1 2 0 2 0

To find (S T )(~x), multiply ~x by BA as follows


    
8 4 1 24
=
2 0 4 2

To check, first determine T (~x):     


1 2 1 9
=
2 0 4 2
Then, compute S(T (~x)) as follows:
    
2 3 9 24
=
0 1 2 2

Consider a composite transformation S T , and suppose that this transformation acted such that (S
T )(~x) =~x. That is, the transformation S took the vector T (~x) and returned it to ~x. In this case, S and T are
inverses of each other. Consider the following definition.

Definition 5.19: Inverse of a Transformation


Let T : Rn 7 Rn and S : Rn 7 Rn be linear transformations. Suppose that for each ~x Rn ,

(S T )(~x) =~x

and
(T S)(~x) =~x
Then, S is called an inverse of T and T is called an inverse of S. Geometrically, they reverse the
action of each other.
5.3. Properties of Linear Transformations 195

The following theorem is crucial, as it claims that the above inverse transformations are unique.

Theorem 5.20: Inverse of a Transformation


Let T : Rn 7 Rn be a linear transformation induced by the matrix A. Then T has an inverse trans-
formation if and only if the matrix A is invertible. In this case, the inverse transformation is unique
and denoted T 1 : Rn 7 Rn . T 1 is induced by the matrix A1 .

Consider the following example.

Example 5.21: Inverse of a Transformation


Let T : R2 7 R2 be a linear transformation induced by the matrix
 
2 3
A=
3 4

Show that T 1 exists and find the matrix B which it is induced by.

Solution. Since the matrix A is invertible, it follows that the transformation T is invertible. Therefore, T 1
exists.
You can verify that A1 is given by:
 
1 4 3
A =
3 2

Therefore the linear transformation T 1 is induced by the matrix A1 .

Exercises
 
Exercise 5.3.17 Show that if a function T : Rn Rm is linear, then it is always the case that T ~0 = ~0.

 
3 1
Exercise 5.3.18 Let T be a linear transformation induced by the matrix A = and S a linear
  1 2  
0 2 2
transformation induced by B = . Find matrix of S T and find (S T ) (~x) for ~x = .
4 2 1
  
1 2
Exercise 5.3.19 Let T be a linear transformation and suppose T = . Suppose S is a
  4 3 
1 2 1
linear transformation induced by the matrix B = . Find (S T ) (~x) for ~x = .
1 3 4
196 Linear Transformations

 
2 3
Exercise 5.3.20 Let T be a linear transformation induced by the matrix A = and S a linear
    1 1
1 3 5
transformation induced by B = . Find matrix of S T and find (S T ) (~x) for ~x = .
1 2 6
 
2 1
Exercise 5.3.21 Let T be a linear transformation induced by the matrix A = . Find the matrix of
5 2
T 1 .
 
4 3
Exercise 5.3.22 Let T be a linear transformation induced by the matrix A = . Find the matrix
2 2
of T 1 .
     
1 9 0
Exercise 5.3.23 Let T be a linear transformation and suppose T = , T =
2 8 1
 
4
. Find the matrix of T 1 .
3

5.4 Special Linear Transformations in R2

Outcomes

A. Find the matrix of rotations and reflections in R2 and determine the action of each on a vector
in R2 .

In this section, we will examine some special examples of linear transformations in R2 including rota-
tions and reflections. We will use the geometric descriptions of vector addition and scalar multiplication
discussed earlier to show that a rotation of vectors through an angle and reflection of a vector across a line
are examples of linear transformations.
More generally, denote a transformation given by a rotation by T . Why is such a transformation linear?
Consider the following picture which illustrates a rotation. Let ~u,~v denote vectors.

T (~u) + T (~v) T (~v)

T (~u)
~v ~u +~v
T (~v)
~v

~u
5.4. Special Linear Transformations in R2 197

Lets consider how to obtain T (~u +~v). Simply, you add T (~u) and T (~v). Here is why. If you add
T (~u) to T (~v) you get the diagonal of the parallelogram determined by T (~u) and T (~v), as this action is our
usual vector addition. Now, suppose we first add ~u and ~v, and then apply the transformation T to ~u +~v.
Hence, we find T (~u +~v). As shown in the diagram, this will result in the same vector. In other words,
T (~u +~v) = T (~u) + T (~v).
This is because the rotation preserves all angles between the vectors as well as their lengths. In par-
ticular, it preserves the shape of this parallelogram. Thus both T (~u) + T (~v) and T (~u +~v) give the same
vector. It follows that T distributes across addition of the vectors of R2 .
Similarly, if k is a scalar, it follows that T (k~u) = kT (~u). Thus rotations are an example of a linear
transformation by Definition 5.2.
The following theorem gives the matrix of a linear transformation which rotates all vectors through an
angle of .

Theorem 5.22: Rotation


Let R : R2 R2 be a linear transformation given by rotating vectors through an angle of . Then
the matrix A of R is given by  
cos ( ) sin ( )
sin ( ) cos ( )

   
1 0
Proof. Let~e1 = and~e2 = . These identify the geometric vectors which point along the positive
0 1
x axis and positive y axis as shown.

~e2
(sin( ), cos( ))
R (~e1 ) (cos( ), sin( ))
R (~e2 )

~e1

From Theorem 5.6, we need to find R (~e1 ) and R (~e2 ), and use these as the columns of the matrix A
of T . We can use cos, sin of the angle to find the coordinates of R (~e1 ) as shown in the above picture.
The coordinates of R (~e2 ) also follow from trigonometry. Thus
   
cos sin
R (~e1 ) = , R (~e2 ) =
sin cos
Therefore, from Theorem 5.6,  
cos sin
A=
sin cos
We can also prove this algebraically without the use of the above picture. The definition of (cos ( ) , sin ( ))
is as the coordinates of the point of R (~e1 ). Now the point of the vector~e2 is exactly /2 further along the
198 Linear Transformations

unit circle from the point of ~e1 , and therefore after rotation through an angle of the coordinates x and y
of the point of R (~e2 ) are given by

(x, y) = (cos ( + /2) , sin ( + /2)) = ( sin , cos )


Consider the following example.

Example 5.23: Rotation in R2


Let R 2 : R2 R2 denote rotation through /2. Find the matrix of R 2 . Then, find R 2 (~x) where
 
1
~x = .
2

Solution. By Theorem 5.22, the matrix of R is given by


2
     
cos ( ) sin ( ) cos ( /2) sin ( /2) 0 1
= =
sin ( ) cos ( ) sin ( /2) cos ( /2) 1 0

To find R 2 (~x), we multiply the matrix of R 2 by ~x as follows


    
0 1 1 2
=
1 0 2 1


We now look at an example of a linear transformation involving two angles.

Example 5.24: The Rotation Matrix of the Sum of Two Angles


Find the matrix of the linear transformation which is obtained by first rotating all vectors through an
angle of and then through an angle . Hence the linear transformation rotates all vectors through
an angle of + .

Solution. Let R + denote the linear transformation which rotates every vector through an angle of + .
Then to obtain R + , we first apply R and then R where R is the linear transformation which rotates
through an angle of and R is the linear transformation which rotates through an angle of . Denoting
the corresponding matrices by A + , A , and A , it follows that for every ~u

R + (~u) = A +~u = A A ~u = R R (~u)

Notice the order of the matrices here!


Consequently, you must have
 
cos ( + ) sin ( + )
A + =
sin ( + ) cos ( + )
5.4. Special Linear Transformations in R2 199

  
cos sin cos sin
= = A A
sin cos sin cos

The usual matrix multiplication yields


 
cos ( + ) sin ( + )
A + =
sin ( + ) cos ( + )
 
cos cos sin sin cos sin sin cos
=
sin cos + cos sin cos cos sin sin
= A A

Dont these look familiar? They are the usual trigonometric identities for the sum of two angles derived
here using linear algebra concepts.

Here we have focused on rotations in two dimensions. However, you can consider rotations and other
geometric concepts in any number of dimensions. This is one of the major advantages of linear algebra.
You can break down a difficult geometrical procedure into small steps, each corresponding to multiplica-
tion by an appropriate matrix. Then by multiplying the matrices, you can obtain a single matrix which can
give you numerical information on the results of applying the given sequence of simple procedures.
Linear transformations which reflect vectors across a line are a second important type of transforma-
tions in R2 . Consider the following theorem.

Theorem 5.25: Reflection


Let Qm : R2 R2 be a linear transformation given by reflecting vectors over the line ~y = m~x. Then
the matrix of Qm is given by  
1 1 m2 2m
1 + m2 2m m2 1

Consider the following example.

Example 5.26: Reflection in R2


Let Q2 : R2 R2 denote reflection over the line  ~y =2~x. Then Q2 is a linear transformation. Find
1
the matrix of Q2 . Then, find Q2 (~x) where ~x = .
2

Solution. By Theorem 5.25, the matrix of Q2 is given by


     
1 1 m2 2m 1 1 (2)2 2(2) 1 3 8
= =
1 + m2 2m m2 1 1 + (2)2 2(2) (2)2 1 5 8 3

To find Q2 (~x) we multiply ~x by the matrix of Q2 as follows:


   " 19 #
1 3 8 1 5
= 2
5 8 3 2 5
200 Linear Transformations


Consider the following example which incorporates a reflection as well as a rotation of vectors.

Example 5.27: Rotation Followed by a Reflection


Find the matrix of the linear transformation which is obtained by first rotating all vectors through
an angle of /6 and then reflecting through the x axis.

Solution. By Theorem 5.22, the matrix of the transformation which involves rotating through an angle of
/6 is 1
  3 12
cos ( /6) sin ( /6) 2
= 1 1

sin ( /6) cos ( /6) 2 2 3

Reflecting across the x axis is the same action as reflecting vectors over the line ~y = m~x with m = 0.
By Theorem 5.25, the matrix for the transformation which reflects all vectors through the x axis is
     
1 1 m2 2m 1 1 (0)2 2(0) 1 0
= =
1 + m2 2m m2 1 1 + (0)2 2(0) (0)2 1 0 1

Therefore, the matrix of the linear transformation which first rotates through /6 and then reflects
through the x axis is given by
1
  1 3 1 3 1
1 0 2 2
=
2 2

0 1 1 1 1 1
2 2 3 2 2 3

Exercises

Exercise 5.4.24 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /3.

Exercise 5.4.25 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /4.

Exercise 5.4.26 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /3.

Exercise 5.4.27 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 2 /3.
5.4. Special Linear Transformations in R2 201

Exercise 5.4.28 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /12. Hint: Note that /12 = /3 /4.

Exercise 5.4.29 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 2 /3 and then reflects across the x axis.

Exercise 5.4.30 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /3 and then reflects across the x axis.

Exercise 5.4.31 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /4 and then reflects across the x axis.

Exercise 5.4.32 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of /6 and then reflects across the x axis followed by a reflection across the y axis.

Exercise 5.4.33 Find the matrix for the linear transformation which reflects every vector in R2 across the
x axis and then rotates every vector through an angle of /4.

Exercise 5.4.34 Find the matrix for the linear transformation which reflects every vector in R2 across the
y axis and then rotates every vector through an angle of /4.

Exercise 5.4.35 Find the matrix for the linear transformation which reflects every vector in R2 across the
x axis and then rotates every vector through an angle of /6.

Exercise 5.4.36 Find the matrix for the linear transformation which reflects every vector in R2 across the
y axis and then rotates every vector through an angle of /6.

Exercise 5.4.37 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 5 /12. Hint: Note that 5 /12 = 2 /3 /4.

Exercise 5.4.38 Find the matrix of the linear transformation which rotates every vector in R3 counter
clockwise about the z axis when viewed from the positive z axis through an angle of 30 and then reflects
through the xy plane.
 
a
Exercise 5.4.39 Let ~u = be a unit vector in R2 . Find the matrix which reflects all vectors across
b
this vector, as shown in the following picture.

~u

   
a cos
Hint: Notice that = for some . First rotate through . Next reflect through the x
b sin
axis. Finally rotate through .
6. Complex Numbers

6.1 Complex Numbers

Outcomes
A. Understand the geometric significance of a complex number as a point in the plane.

B. Prove algebraic properties of addition and multiplication of complex numbers, and apply
these properties. Understand the action of taking the conjugate of a complex number.

C. Understand the absolute value of a complex number and how to find it as well as its geometric
significance.

Although very powerful, the real numbers are inadequate to solve equations such as x2 +1 = 0, and this
is where complex numbers come in. We define the number i as the imaginary number such that i2 = 1,
and define complex numbers as those of the form z = a + bi where a and b are real numbers. We call this
the standard form, or Cartesian form, of the complex number z. Then, we refer to a as the real part of z,
and b as the imaginary part of z. It turns out that such numbers not only solve the above equation, but
in fact also solve any polynomial of degree at least 1 with complex coefficients. This property, called the
Fundamental Theorem of Algebra, is sometimes referred to by saying C is algebraically closed. Gauss is
usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first
completely correct proof was due to Argand in 1806.
Just as a real number can be considered as a point on the line, a complex number z = a + bi can be
considered as a point (a, b) in the plane whose x coordinate is a and whose y coordinate is b. For example,
in the following picture, the point z = 3 + 2i can be represented as the point in the plane with coordinates
(3, 2) .

z = (3, 2) = 3 + 2i

Addition of complex numbers is defined as follows.

(a + bi) + (c + di) = (a + c) + (b + d) i

203
204 Complex Numbers

This addition obeys all the usual properties as the following theorem indicates.

Theorem 6.1: Properties of Addition of Complex Numbers


Let z, w, and v be complex numbers. Then the following properties hold.

Commutative Law for Addition


z+w = w+z

Additive Identity
z+0 = z

Existence of Additive Inverse


For each z C, there exists z C such that z + (z) = 0
In fact if z = a + bi, then z = a bi.

Associative Law for Addition

(z + w) + v = z + (w + v)

The proof of this theorem is left as an exercise for the reader.


Now, multiplication of complex numbers is defined the way you would expect, recalling that i2 = 1.

(a + bi) (c + di) = ac + adi + bci + i2bd


= (ac bd) + (ad + bc) i

Consider the following examples.

Example 6.2: Multiplication of Complex Numbers

(2 3i)(3 + 4i) = 6 + 17i

(4 7i)(6 2i) = 10 50i

(3 + 6i)(5 i) = 9 + 33i

The following are important properties of multiplication of complex numbers.


6.1. Complex Numbers 205

Theorem 6.3: Properties of Multiplication of Complex Numbers


Let z, w and v be complex numbers. Then, the following properties of multiplication hold.

Commutative Law for Multiplication


zw = wz

Associative Law for Multiplication

(zw) v = z (wv)

Multiplicative Identity
1z = z

Existence of Multiplicative Inverse

For each z 6= 0, there exists z1 such that zz1 = 1

Distributive Law
z (w + v) = zw + zv

You may wish to verify some of these statements. The real numbers also satisfy the above axioms, and
in general any mathematical structure which satisfies these axioms is called a field. There are many other
fields, in particular even finite ones particularly useful for cryptography, and the reason for specifying these
axioms is that linear algebra is all about fields and we can do just about anything in this subject using any
field. Although here, the fields of most interest will be the familiar field of real numbers, denoted as R,
and the field of complex numbers, denoted as C.
An important construction regarding complex numbers is the complex conjugate denoted by a hori-
zontal line above the number, z. It is defined as follows.

Definition 6.4: Conjugate of a Complex Number


Let z = a + bi be a complex number. Then the conjugate of z, written z is given by

a + bi = a bi

Geometrically, the action of the conjugate is to reflect a given complex number across the x axis.
Algebraically, it changes the sign on the imaginary part of the complex number. Therefore, for a real
number a, a = a.
206 Complex Numbers

Example 6.5: Conjugate of a Complex Number

If z = 3 + 4i, then z = 3 4i, i.e., 3 + 4i = 3 4i.

2 + 5i = 2 5i.

i = i.

7 = 7.

Consider the following computation.



a + bi (a + bi) = (a bi) (a + bi)
= a2 + b2 (ab ab) i = a2 + b2

Notice that there is no imaginary part in the product, thus multiplying a complex number by its conjugate
results in a real number.

Theorem 6.6: Properties of the Conjugate


Let z and w be complex numbers. Then, the following properties of the conjugate hold.

z w = z w.

(zw) = z w.

(z) = z.

wz = wz .

z is real if and only if z = z.

Division of complex numbers is defined as follows. Let z = a + bi and w = c + di be complex numbers


such that c, d are not both zero. Then the quotient z divided by w is
z a + bi a + bi c di
= =
w c + di c + di c di
(ac + bd) + (bc ad)i
=
c2 + d 2
ac + bd bc ad
= 2 + i.
c + d 2 c2 + d 2
z z
In other words, the quotient w is obtained by multiplying both top and bottom of w by w and then
simplifying the expression.
6.1. Complex Numbers 207

Example 6.7: Division of Complex Numbers



1 1 i i
= = 2 = i
i i i i

2i 2i 3 4i (6 4) + (3 8)i 2 11i 2 11
= = 2 2
= = i
3 + 4i 3 + 4i 3 4i 3 +4 25 25 25

1 2i 1 2i 2 5i (2 10) + (4 5)i 12 1
= = 2 2
= i
2 + 5i 2 + 5i 2 5i 2 +5 29 29

Interestingly every nonzero complex number a +bi has a unique multiplicative inverse. In other words,
for a nonzero complex number z, there exists a number z1 (or 1z ) so that zz1 = 1. Note that z = a + bi is
nonzero exactly when a2 + b2 6= 0, and its inverse can be written in standard form as defined now.

Definition 6.8: Inverse of a Complex Number


Let z = a + bi be a complex number. Then the multiplicative inverse of z, written z1 exists if and
only if a2 + b2 6= 0 and is given by
1 1 a bi a bi a b
z1 = = = 2 2
= 2 2
i 2
a + bi a + bi a bi a + b a +b a + b2

Note that we may write z1 as 1z . Both notations represent the multiplicative inverse of the complex
number z. Consider now an example.

Example 6.9: Inverse of a Complex Number


Consider the complex number z = 2 + 6i. Then z1 is defined, and
1 1
=
z 2 + 6i
1 2 6i
=
2 + 6i 2 6i
2 6i
= 2
2 + 62
2 6i
=
40
1 3
= i
20 20
You can always check your answer by computing zz1 .

Another important construction of complex numbers is that of the absolute value, also called the mod-
ulus. Consider the following definition.
208 Complex Numbers

Definition 6.10: Absolute Value


The absolute value, or modulus, of a complex number, denoted |z| is defined as follows.
p
|a + bi| = a2 + b2

Thus, if z is the complex number z = a + bi, it follows that

|z| = (zz)1/2

Also from the definition, if z = a + bi and w = c + di are two complex numbers, then |zw| = |z| |w| .
Take a moment to verify this.
The triangle inequality is an important property of the absolute value of complex numbers. There are
two useful versions which we present here, although the first one is officially called the triangle inequality.

Proposition 6.11: Triangle Inequality


Let z, w be complex numbers.
The following two inequalities hold for any complex numbers z, w:

|z + w| |z| + |w|
||z| |w|| |z w|

The first one is called the Triangle Inequality.

Proof. Let z = a + bi and w = c + di. First note that

zw = (a + bi) (c di) = ac + bd + (bc ad) i

and so |ac + bd| |zw| = |z| |w| .


Then,
|z + w|2 = (a + c + i (b + d)) (a + c i (b + d))
= (a + c)2 + (b + d)2 = a2 + c2 + 2ac + 2bd + b2 + d 2
|z|2 + |w|2 + 2 |z| |w| = (|z| + |w|)2
Taking the square root, we have that

|z + w| |z| + |w|

so this verifies the triangle inequality.


To get the second inequality, write

z = z w + w, w = w z + z

and so by the first form of the inequality we get both:

|z| |z w| + |w| , |w| |z w| + |z|


6.1. Complex Numbers 209

Hence, both |z| |w| and |w| |z| are no larger than |z w|. This proves the second version because
||z| |w|| is one of |z| |w| or |w| |z|.
With this definition, it is important to note the following. You may wish to take the time to verify this
remark. q
Let z = a + bi and w = c + di. Then |z w| = (a c)2 + (b d)2 . Thus the distance between the
point in the plane determined by the ordered pair (a, b) and the ordered pair (c, d) equals |z w| where z
and w are as just described.
For example, consider the distance between (2, 5) and (1,8) . Letting z = 2 + 5i and w = 1 + 8i, z w =
1 3i, (z w) (z w) = (1 3i) (1 + 3i) = 10 so |z w| = 10.
Recall that we refer to z = a + bi as the standard form of the complex number. In the next section, we
examine another form in which we can express the complex number.

Exercises

Exercise 6.1.1 Let z = 2 + 7i and let w = 3 8i. Compute the following.

(a) z + w

(b) z 2w

(c) zw
w
(d) z

Exercise 6.1.2 Let z = 1 4i. Compute the following.

(a) z

(b) z1

(c) |z|

Exercise 6.1.3 Let z = 3 + 5i and w = 2 i. Compute the following.

(a) zw

(b) |zw|

(c) z1 w
210 Complex Numbers

Exercise 6.1.4 If z is a complex number, show there exists a complex number w with |w| = 1 and wz = |z| .

Exercise 6.1.5 If z, w are complex numbers prove zw = z w and then show by induction that z1 zm =
z1 zm . Also verify that m m
k=1 zk = k=1 zk . In words this says the conjugate of a product equals the
product of the conjugates and the conjugate of a sum equals the sum of the conjugates.

Exercise 6.1.6 Suppose p (x) = an xn + an1 xn1 + + a1 x + a0 where all the ak are real numbers. Sup-
pose also that p (z) = 0 for some z C. Show it follows that p (z) = 0 also.

Exercise 6.1.7 I claim that 1 = 1. Here is why.


q
2
1 = i = 1 1 = (1)2 = 1=1

This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?

6.2 Polar Form

Outcomes
A. Convert a complex number from standard form to polar form, and from polar form to standard
form.

In the previous section, we identified a complex number z = a + bi with a point (a, b) in the coordinate
plane. There is another form in which we can express the same number, called the polar form. The polar
form is the focus of this section. It will turn out to be very useful if not crucial for certain calculations as
we shall soon see.

Suppose z = a + bi is a complex number, and let r = a2 + b2 = |z|. Recall that r is the modulus of z
. Note first that
 a  2  b  2 a2 + b2
+ = =1
r r r2

and so ar , br is a point on the unit circle. Therefore, there exists an angle (in radians) such that

a b
cos = , sin =
r r
In other words is an angle such that a = r cos and b = r sin , that is = cos1 (a/r) and =
sin1 (b/r). We call this angle the argument of z.
We often speak of the principal argument of z. This is the unique angle ( , ] such that

a b
cos = , sin =
r r
6.2. Polar Form 211

The polar form of the complex number z = a + bi = r (cos + i sin ) is for convenience written as:

z = rei

where is the argument of z.

Definition 6.12: Polar Form of a Complex Number


Let z = a + bi be a complex number. Then the polar form of z is written as

z = rei

where r = a2 + b2 and is the argument of z.

When given z = rei , the identity ei = cos + i sin will convert z back to standard form. Here we
think of ei as a short cut for cos + i sin . This is all we will need in this course, but in reality ei can be
considered as the complex equivalent of the exponential function where this turns out to be a true equality.

z = a + bi = rei
r= a2 + b2 r

Thus we can convert any complex number in the standard (Cartesian) form z = a + bi into its polar
form. Consider the following example.

Example 6.13: Standard to Polar Form


Let z = 2 + 2i be a complex number. Write z in the polar form

z = rei


Solution. First, find r. By the above discussion, r = a2 + b2 = |z|. Therefore,
p
r= 22 + 22 = 8=2 2
Now, to find , we plot the point (2, 2) and find the angle from the positive x axis to the line between
this point and the origin. In this case, = 45 = 4 . That is we found the unique angle such that

= cos1 (1/ 2) and = sin1 (1/ 2).
Note that in polar form, we always express angles in radians, not degrees.
Hence, we can write z as
212 Complex Numbers


z = 2 2ei 4

Notice that the standard and polar forms are completely equivalent. That is not only can we transform
a complex number from standard form to its polar form, we can also take a complex number in polar form
and convert it back to standard form.

Example 6.14: Polar to Standard Form


Let z = 2e2 i/3 . Write z in the standard form

z = a + bi

Solution. Let z = 2e2 i/3 be the polar form of a complex number. Recall that ei = cos + i sin . There-
fore using standard values of sin and cos we get:

z = 2ei2 /3 = 2(cos(2 /3) + i sin(2 /3))


!
1 3
= 2 +i
2 2

= 1 + 3i

which is the standard form of this complex number.


You can always verify your answer by converting it back to polar form and ensuring you reach the
original answer.

Exercises

Exercise 6.2.8 Let z = 3 + 3i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = rei .

Exercise 6.2.9 Let z = 2i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = rei .
2
Exercise 6.2.10 Let z = 4e 3 i be a complex number written in polar form. Convert z to standard form,
and write it in the form z = a + bi.

Exercise 6.2.11 Let z = 1e 6 i be a complex number written in polar form. Convert z to standard form,
and write it in the form z = a + bi.

Exercise 6.2.12 If z and w are two complex numbers and the polar form of z involves the angle while
the polar form of w involves the angle , show that in the polar form for zw the angle involved is + .
6.3. Roots of Complex Numbers 213

6.3 Roots of Complex Numbers

Outcomes
A. Understand De Moivres theorem and be able to use it to find the roots of a complex number.

A fundamental identity is the formula of De Moivre with which we begin this section.

Theorem 6.15: De Moivres Theorem


For any positive integer n, we have  n
ei = ein
Thus for any real number r > 0 and any positive integer n, we have:

(r (cos + i sin ))n = rn (cos n + i sin n )

Proof. The proof is by induction on n. It is clear the formula holds if n = 1. Suppose it is true for n. Then,
consider n + 1.
(r (cos + i sin ))n+1 = (r (cos + i sin ))n (r (cos + i sin ))
which by induction equals

= rn+1 (cos n + i sin n ) (cos + i sin )


= rn+1 ((cos n cos sin n sin ) + i (sin n cos + cos n sin ))
= rn+1 (cos (n + 1) + i sin (n + 1) )

by the formulas for the cosine and sine of the sum of two angles.
The process used in the previous proof, called mathematical induction is very powerful in Mathematics
and Computer Science and explored in more detail in the Appendix.
Now, consider a corollary of Theorem 6.15.

Corollary 6.16: Roots of Complex Numbers


Let z be a non zero complex number. Then there are always exactly k many kth roots of z in C.

Proof. Let z = a + bi and let z = |z| (cos + i sin ) be the polar form of the complex number. By De
Moivres theorem, a complex number

w = rei = r (cos + i sin )

is a kth root of z if and only if

wk = (rei )k = rk eik = rk (cos k + i sin k ) = |z| (cos + i sin )


214 Complex Numbers

This requires rk = |z| and so r = |z|1/k . Also, both cos (k ) = cos and sin (k ) = sin . This can only
happen if
k = + 2
for an integer. Thus
+ 2
= , = 0, 1, 2, , k 1
k
and so the kth roots of z are of the form
    
1/k + 2 + 2
|z| cos + i sin , = 0, 1, 2, , k 1
k k
Since the cosine and sine are periodic of period 2 , there are exactly k distinct numbers which result from
this formula.
The procedure for finding the k kth roots of z C is as follows.

Procedure 6.17: Finding Roots of a Complex Number


Let w be a complex number. We wish to find the nth roots of w, that is all z such that zn = w.
There are n distinct nth roots and they can be found as follows:.

1. Express both z and w in polar form z = rei , w = sei . Then zn = w becomes:

(rei )n = rn ein = sei

We need to solve for r and .

2. Solve the following two equations:

rn = s

ein = ei (6.1)

3. The solutions to rn = s are given by r = n
s.

4. The solutions to ein = ei are given by:

n = + 2 , for = 0, 1, 2, , n 1

or
2
= + , for = 0, 1, 2, , n 1
n n
5. Using the solutions r, to the equations given in (6.1) construct the nth roots of the form
z = rei .

Notice that once the roots are obtained in the final step, they can then be converted to standard form
if necessary. Lets consider an example of this concept. Note that according to Corollary 6.16, there are
exactly 3 cube roots of a complex number.
6.3. Roots of Complex Numbers 215

Example 6.18: Finding Cube Roots


Find the three cube roots of i. In other words find all z such that z3 = i.

Solution. First, convert each number to polar form: z = rei and i = 1ei /2 . The equation now becomes

(rei )3 = r3 e3i = 1ei /2

Therefore, the two equations that we need to solve are r3 = 1 and 3i = i /2. Given that r R and r3 = 1
it follows that r = 1.
Solving the second equation is as follows. First divide by i. Then, since the argument of i is not unique
we write 3 = /2 + 2 for = 0, 1, 2.

3 = /2 + 2 for = 0, 1, 2
2
= /6 + for = 0, 1, 2
3
For = 0:
2
= /6 + (0) = /6
3
For = 1:
2 5
= /6 + (1) =
3 6
For = 2:
2 3
= /6 + (2) =
3 2
Therefore, the three roots are given by
5 3
1ei /6 , 1ei 6 , 1ei 2

Written in standard form, these roots are, respectively,



3 1 3 1
+i , + i , i
2 2 2 2

The ability to find kth roots can also be used to factor some polynomials.

Example 6.19: Solving a Polynomial Equation


Factor the polynomial x3 27.

!
1 3
Solution. First find the cube roots of 27. By the above procedure , these cube roots are 3, 3 +i ,
2 2
!
1 3
and 3 i . You may wish to verify this using the above steps.
2 2
216 Complex Numbers

Therefore, x3 27 =
!! !!
1 3 1 3
(x 3) x 3 +i x3 i
2 2 2 2
     
Note also x 3 1
2 +i 2
3
x 3 12 i 2
3
= x2 + 3x + 9 and so

x3 27 = (x 3) x2 + 3x + 9
where the quadratic polynomial x2 + 3x + 9 cannot be factored without using complex numbers.
Note that even though the polynomial x3 27 has all real coefficients, it has some complex zeros,
! !
1 3 1 3
3 +i , and 3 i . These zeros are complex conjugates of each other. It is always
2 2 2 2
the case that if a polynomial has real coefficients and a complex root, it will also have a root equal to the
complex conjugate.

Exercises

Exercise 6.3.13 Give the complete solution to x4 + 16 = 0.

Exercise 6.3.14 Find the complex cube roots of 8.

Exercise 6.3.15 Find the four fourth roots of 16.

Exercise 6.3.16 De Moivres theorem says [r (cost + i sint)]n = rn (cos nt + i sin nt) for n a positive integer.
Does this formula continue to hold for all integers n, even negative integers? Explain.

Exercise 6.3.17 Factor x3 + 8 as a product of linear factors. Hint: Use the result of 6.3.14.

Exercise 6.3.18 Write x3 + 27 in the form (x + 3) x2 + ax + b where x2 + ax + b cannot be factored any
more using only real numbers.

Exercise 6.3.19 Completely factor x4 + 16 as a product of linear factors. Hint: Use the result of 6.3.15.

Exercise 6.3.20 Factor x4 + 16 as the product of two quadratic polynomials each of which cannot be
factored further without using complex numbers.

Exercise 6.3.21 If n is an integer, is it always true that (cos i sin )n = cos (n ) i sin (n )? Explain.

Exercise 6.3.22 Suppose p (x) = an xn + an1 xn1 + + a1 x + a0 is a polynomial and it has n zeros,
z1 , z2 , , zn
listed according to multiplicity. (z is a root of multiplicity m if the polynomial f (x) = (x z)m divides p (x)
but (x z) f (x) does not.) Show that
p (x) = an (x z1 ) (x z2 ) (x zn )
6.4. The Quadratic Formula 217

6.4 The Quadratic Formula

Outcomes
A. Use the Quadratic Formula to find the complex roots of a quadratic equation.

The roots (or solutions) of a quadratic equation ax2 + bx + c = 0 where a, b, c are real numbers are
obtained by solving the familiar quadratic formula given by

b b2 4ac
x=
2a

When working with real numbers, we cannot solve this formula if b2 4ac < 0. However, complex
numbers allow us to find square roots of negative numbers, and the quadratic formula remains valid for
finding roots of the corresponding quadratic
equation. In
this case there are exactly two distinct (complex)
2
square roots of b 4ac, which are i 4ac b and i 4ac b2 .
2

Here is an example.

Example 6.20: Solutions to Quadratic Equation


Find the solutions to x2 + 2x + 5 = 0.

Solution. In terms of the quadratic equation above, a = 1, b = 2, and c = 5. Therefore, we can use the
quadratic formula with these values, which becomes
q
2
b b 4ac 2 (2) 4(1)(5)
2
x= =
2a 2(1)

Solving this equation, we see that the solutions are given by



2i 4 20 2 4i
x= = = 1 2i
2 2

We can verify that these are solutions of the original equation. We will show x = 1 + 2i and leave
x = 1 2i as an exercise.

x2 + 2x + 5 = (1 + 2i)2 + 2(1 + 2i) + 5


= 1 4i 4 2 + 4i + 5
= 0

Hence x = 1 + 2i is a solution.
218 Complex Numbers

What if the coefficients of the quadratic equation are actually complex numbers? Does the formula
hold even in this case? The answer is yes. This is a hint on how to do Problem 6.4.26 below, a special case
of the fundamental theorem of algebra, and an ingredient in the proof of some versions of this theorem.
Consider the following example.

Example 6.21: Solutions to Quadratic Equation


Find the solutions to x2 2ix 5 = 0.

Solution. In terms of the quadratic equation above, a = 1, b = 2i, and c = 5. Therefore, we can use
the quadratic formula with these values, which becomes
q
2
b b2 4ac 2i (2i) 4(1)(5)
x= =
2a 2(1)
Solving this equation, we see that the solutions are given by

2i 4 + 20 2i 4
x= = = i2
2 2
We can verify that these are solutions of the original equation. We will show x = i + 2 and leave
x = i 2 as an exercise.

x2 2ix 5 = (i + 2)2 2i(i + 2) 5


= 1 + 4i + 4 + 2 4i 5
= 0

Hence x = i + 2 is a solution.
We conclude this section by stating an essential theorem.

Theorem 6.22: The Fundamental Theorem of Algebra


Any polynomial of degree at least 1 with complex coefficients has a root which is a complex number.

Exercises

Exercise 6.4.23 Show that 1 + i, 2 + i are the only two roots to


p (x) = x2 (3 + 2i) x + (1 + 3i)
Hence complex zeros do not necessarily come in conjugate pairs if the coefficients of the equation are not
real.

Exercise 6.4.24 Give the solutions to the following quadratic equations having real coefficients.
6.4. The Quadratic Formula 219

(a) x2 2x + 2 = 0

(b) 3x2 + x + 3 = 0

(c) x2 6x + 13 = 0

(d) x2 + 4x + 9 = 0

(e) 4x2 + 4x + 5 = 0

Exercise 6.4.25 Give the solutions to the following quadratic equations having complex coefficients.

(a) x2 + 2x + 1 + i = 0

(b) 4x2 + 4ix 5 = 0

(c) 4x2 + (4 + 4i) x + 1 + 2i = 0

(d) x2 4ix 5 = 0

(e) 3x2 + (1 i) x + 3i = 0

Exercise 6.4.26 Prove the fundamental theorem of algebra for quadratic polynomials having coefficients
in C. That is, show that an equation of the form
ax2 + bx + c = 0 where a, b, c are complex numbers, a 6= 0 has a complex solution. Hint: Consider the
fact, noted earlier that the expressions given from the quadratic formula do in fact serve as solutions.
7. Spectral Theory

7.1 Eigenvalues and Eigenvectors of a Matrix

Outcomes
A. Describe eigenvalues geometrically and algebraically.

B. Find eigenvalues and eigenvectors for a square matrix.

Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of fundamental
importance in many areas and is the subject of our study for this chapter.

7.1.1. Definition of Eigenvectors and Eigenvalues

In this section, we will work with the entire set of complex numbers, denoted by C. Recall that the real
numbers, R are contained in the complex numbers, so the discussions in this section apply to both real and
complex numbers.
To illustrate the idea behind what will be discussed, consider the following example.

Example 7.1: Eigenvectors and Eigenvalues


Let
0 5 10
A = 0 22 16
0 9 2
Compute the product AX for
5 1
X = 4 , X = 0
3 0
What do you notice about AX in each of these products?

Solution. First, compute AX for


5
X = 4
3

221
222 Spectral Theory

This product is given by



0 5 10 5 50 5
AX = 0 22 16 4 = 40 = 10 4
0 9 2 3 30 3

In this case, the product AX resulted in a vector which is equal to 10 times the vector X . In other
words, AX = 10X .
Lets see what happens in the next product. Compute AX for the vector

1
X = 0
0

This product is given by



0 5 10 1 0 1

AX = 0 22 16 0 = 0 =0 0

0 9 2 0 0 0

In this case, the product AX resulted in a vector equal to 0 times the vector X , AX = 0X .
Perhaps this matrix is such that AX results in kX , for every vector X . However, consider

0 5 10 1 5
0 22 16 1 = 38
0 9 2 1 11

In this case, AX did not result in a vector of the form kX for some scalar k.
There is something special about the first two products calculated in Example 7.1. Notice that for
each, AX = kX where k is some scalar. When this equation holds for some X and k, we call the scalar
k an eigenvalue of A. We often use the special symbol instead of k when referring to eigenvalues. In
Example 7.1, the values 10 and 0 are eigenvalues for the matrix A and we can label these as 1 = 10 and
2 = 0.
When AX = X for some X 6= 0, we call such an X an eigenvector of the matrix A. The eigenvectors
of A are associated to an eigenvalue. Hence, if 1 is an eigenvalue of A and AX = 1 X , we can label this
eigenvector as X1 . Note again that in order to be an eigenvector, X must be nonzero.
There is also a geometric significance to eigenvectors. When you have a nonzero vector which, when
multiplied by a matrix results in another vector which is parallel to the first or equal to 0, this vector is
called an eigenvector of the matrix. This is the meaning when the vectors are in Rn .
The formal definition of eigenvalues and eigenvectors is as follows.
7.1. Eigenvalues and Eigenvectors of a Matrix 223

Definition 7.2: Eigenvalues and Eigenvectors


Let A be an n n matrix and let X Cn be a nonzero vector for which

AX = X (7.1)
for some scalar . Then is called an eigenvalue of the matrix A and X is called an eigenvector of
A associated with , or a -eigenvector of A.
The set of all eigenvalues of an n n matrix A is denoted by (A) and is referred to as the spectrum
of A.

The eigenvectors of a matrix A are those vectors X for which multiplication by A results in a vector in
the same direction or opposite direction to X . Since the zero vector 0 has no direction this would make no
sense for the zero vector. As noted above, 0 is never allowed to be an eigenvector.
Lets look at eigenvectors in more detail. Suppose X satisfies 7.1. Then

AX X = 0
or
(A I) X = 0

for some X 6= 0. Equivalently you could write ( I A) X = 0, which is more commonly used. Hence,
when we are looking for eigenvectors, we are looking for nontrivial solutions to this homogeneous system
of equations!
Recall that the solutions to a homogeneous system of equations consist of basic solutions, and the
linear combinations of those basic solutions. In this context, we call the basic solutions of the equation
( I A) X = 0 basic eigenvectors. It follows that any (nonzero) linear combination of basic eigenvectors
is again an eigenvector.
Suppose the matrix ( I A) is invertible, so that ( I A)1 exists. Then the following equation
would be true.

X = IX
 
= ( I A)1 ( I A) X
= ( I A)1 (( I A) X )
= ( I A)1 0
= 0

This claims that X = 0. However, we have required that X 6= 0. Therefore ( I A) cannot have an inverse!
Recall that if a matrix is not invertible, then its determinant is equal to 0. Therefore we can conclude
that
det ( I A) = 0 (7.2)
Note that this is equivalent to det (A I) = 0.
The expression det (xI A) is a polynomial (in the variable x) called the characteristic polynomial
of A, and det (xI A) = 0 is called the characteristic equation. For this reason we may also refer to the
eigenvalues of A as characteristic values, but the former is often used for historical reasons.
224 Spectral Theory

The following theorem claims that the roots of the characteristic polynomial are the eigenvalues of A.
Thus when 7.2 holds, A has a nonzero eigenvector.

Theorem 7.3: The Existence of an Eigenvector


Let A be an n n matrix and suppose det ( I A) = 0 for some C.
Then is an eigenvalue of A and thus there exists a nonzero vector X Cn such that AX = X .

Proof. For A an n n matrix, the method of Laplace Expansion demonstrates that det ( I A) is a polyno-
mial of degree n. As such, the equation 7.2 has a solution C by the Fundamental Theorem of Algebra.
The fact that is an eigenvalue is left as an exercise.

7.1.2. Finding Eigenvectors and Eigenvalues

Now that eigenvalues and eigenvectors have been defined, we will study how to find them for a matrix A.
First, consider the following definition.

Definition 7.4: Multiplicity of an Eigenvalue


Let A be an n n matrix with characteristic polynomial given by det (xI A). Then, the multiplicity
of an eigenvalue of A is the number of times occurs as a root of that characteristic polynomial.

For example, suppose the characteristic polynomial of A is given by (x 2)2 . Solving for the roots of
this polynomial, we set (x 2)2 = 0 and solve for x. We find that = 2 is a root that occurs twice. Hence,
in this case, = 2 is an eigenvalue of A of multiplicity equal to 2.
We will now look at how to find the eigenvalues and eigenvectors for a matrix A in detail. The steps
used are summarized in the following procedure.

Procedure 7.5: Finding Eigenvalues and Eigenvectors


Let A be an n n matrix.

1. First, find the eigenvalues of A by solving the equation det (xI A) = 0.

2. For each , find the basic eigenvectors X 6= 0 by finding the basic solutions to ( I A) X = 0.

To verify your work, make sure that AX = X for each and associated eigenvector X .

We will explore these steps further in the following example.

Example 7.6: Find the Eigenvalues and Eigenvectors


 
5 2
Let A = . Find its eigenvalues and eigenvectors.
7 4
7.1. Eigenvalues and Eigenvectors of a Matrix 225

Solution. We will use Procedure 7.5. First we find the eigenvalues of A by solving the equation

det (xI A) = 0

This gives
    
1 0 5 2
det x = 0
0 1 7 4
 
x + 5 2
det = 0
7 x4

Computing the determinant as usual, the result is

x2 + x 6 = 0

Solving this equation, we find that 1 = 2 and 2 = 3.


Now we need to find the basic eigenvectors for each . First we will find the eigenvectors for 1 = 2.
We wish to find all vectors X 6= 0 such that AX = 2X . These are the solutions to (2I A)X = 0.
        
1 0 5 2 x 0
2 =
0 1 7 4 y 0
    
7 2 x 0
=
7 2 y 0

The augmented matrix for this system and corresponding reduced row-echelon form are given by
  " #
7 2 0 1 72 0

7 2 0 0 0 0

The solution is any vector of the form


" # " #
2 2
7s 7
=s
s 1

Multiplying this vector by 7 we obtain a simpler description for the solution to this system, given by
 
2
t
7

This gives the basic eigenvector for 1 = 2 as


 
2
7

To check, we verify that AX = 2X for this basic eigenvector.


226 Spectral Theory

      
5 2 2 4 2
= =2
7 4 7 14 7
This is what we wanted, so we know this basic eigenvector is correct.
Next we will repeat this process to find the basic eigenvector for 2 = 3. We wish to find all vectors
X 6= 0 such that AX = 3X . These are the solutions to ((3)I A)X = 0.
        
1 0 5 2 x 0
(3) =
0 1 7 4 y 0
    
2 2 x 0
=
7 7 y 0

The augmented matrix for this system and corresponding reduced row-echelon form are given by
   
2 2 0 1 1 0

7 7 0 0 0 0

The solution is any vector of the form


   
s 1
=s
s 1

This gives the basic eigenvector for 2 = 3 as


 
1
1

To check, we verify that AX = 3X for this basic eigenvector.


      
5 2 1 3 1
= = 3
7 4 1 3 1
This is what we wanted, so we know this basic eigenvector is correct.
The following is an example using Procedure 7.5 for a 3 3 matrix.

Example 7.7: Find the Eigenvalues and Eigenvectors


Find the eigenvalues and eigenvectors for the matrix

5 10 5
A= 2 14 2
4 8 6

Solution. We will use Procedure 7.5. First we need to find the eigenvalues of A. Recall that they are the
solutions of the equation
det (xI A) = 0
7.1. Eigenvalues and Eigenvectors of a Matrix 227

In this case the equation is



1 0 0 5 10 5
det x 0 1 0 2 14 2 = 0
0 0 1 4 8 6

which becomes

x5 10 5
det 2 x 14 2 = 0
4 8 x6
Using Laplace Expansion, compute this determinant and simplify. The result is the following equation.

(x 5) x2 20x + 100 = 0

Solving this equation, we find that the eigenvalues are 1 = 5, 2 = 10 and 3 = 10. Notice that 10 is
a root of multiplicity two due to
x2 20x + 100 = (x 10)2
Therefore, 2 = 10 is an eigenvalue of multiplicity two.
Now that we have found the eigenvalues for A, we can compute the eigenvectors.
First we will find the basic eigenvectors for 1 = 5. In other words, we want to find all non-zero vectors
X so that AX = 5X . This requires that we solve the equation (5I A) X = 0 for X as follows.

1 0 0 5 10 5 x 0
5 0 1 0 2 14 2 y = 0
0 0 1 4 8 6 z 0

That is you need to find the solution to



0 10 5 x 0
2 9 2 y = 0
4 8 1 z 0

By now this is a familiar problem. You set up the augmented matrix and row reduce to get the solution.
Thus the matrix you must row reduce is

0 10 5 0
2 9 2 0
4 8 1 0

The reduced row-echelon form is


1 0 45 0
1
0 1 2 0
0 0 0 0
228 Spectral Theory

and so the solution is any vector of the form


5 5
4 s 4
1 1

2 s = s
2
s 1

where s R. If we multiply this vector by 4, we obtain a simpler description for the solution to this system,
as given by
5
t 2 (7.3)
4
where t R. Here, the basic eigenvector is given by

5
X1 = 2
4

Notice that we cannot let t = 0 here, because this would result in the zero vector and eigenvectors are
never equal to 0! Other than this value, every other choice of t in 7.3 results in an eigenvector.
It is a good idea to check your work! To do so, we will take the original matrix and multiply by the
basic eigenvector X1 . We check to see if we get 5X1 .

5 10 5 5 25 5
2 14 2 2 = 10 = 5 2
4 8 6 4 20 4

This is what we wanted, so we know that our calculations were correct.


Next we will find the basic eigenvectors for 2 , 3 = 10. These vectors are the basic solutions to the
equation,
1 0 0 5 10 5 x 0
10 0 1 0 2 14 2 y = 0
0 0 1 4 8 6 z 0
That is you must find the solutions to

5 10 5 x 0
2 4 2 y = 0
4 8 4 z 0

Consider the augmented matrix


5 10 5 0
2 4 2 0
4 8 4 0
The reduced row-echelon form for this matrix is

1 2 1 0
0 0 0 0
0 0 0 0
7.1. Eigenvalues and Eigenvectors of a Matrix 229

and so the eigenvectors are of the form



2s t 2 1
s = s 1 +t 0
t 0 1
Note that you cant pick t and s both equal to zero because this would result in the zero vector and
eigenvectors are never equal to zero.
Here, there are two basic eigenvectors, given by

2 1
X2 = 1 , X3 = 0
0 1

Taking any (nonzero) linear combination of X2 and X3 will also result in an eigenvector for the eigen-
value = 10. As in the case for = 5, always check your work! For the first basic eigenvector, we can
check AX2 = 10X2 as follows.

5 10 5 1 10 1
2 14 2 0 = 0 = 10 0
4 8 6 1 10 1
This is what we wanted. Checking the second basic eigenvector, X3 , is left as an exercise.
It is important to remember that for any eigenvector X , X 6= 0. However, it is possible to have eigen-
values equal to zero. This is illustrated in the following example.

Example 7.8: A Zero Eigenvalue


Let
2 2 2
A = 1 3 1
1 1 1
Find the eigenvalues and eigenvectors of A.

Solution. First we find the eigenvalues of A. We will do so using Definition 7.2.


In order to find the eigenvalues of A, we solve the following equation.

x 2 2 2
det (xI A) = det 1 x 3 1 =0
1 1 x 1

This reduces to x3 6x2 + 8x = 0. You can verify that the solutions are 1 = 0, 2 = 2, 3 = 4. Notice
that while eigenvectors can never equal 0, it is possible to have an eigenvalue equal to 0.
Now we will find the basic eigenvectors. For 1 = 0, we need to solve the equation (0I A) X = 0.
This equation becomes AX = 0, and so the augmented matrix for finding the solutions is given by

2 2 2 0
1 3 1 0
1 1 1 0
230 Spectral Theory

The reduced row-echelon form is


1 0 1 0
0 1 0 0
0 0 0 0

1

Therefore, the eigenvectors are of the form t 0 where t 6= 0 and the basic eigenvector is given by
1

1
X1 = 0
1

We can verify that this eigenvector is correct by checking that the equation AX1 = 0X1 holds. The
product AX1 is given by
2 2 2 1 0
AX1 = 1 3 1 0 = 0

1 1 1 1 0
This clearly equals 0X1 , so the equation holds. Hence, AX1 = 0X1 and so 0 is an eigenvalue of A.
Computing the other basic eigenvectors is left as an exercise.
In the following sections, we examine ways to simplify this process of finding eigenvalues and eigen-
vectors by using properties of special types of matrices.

7.1.3. Eigenvalues and Eigenvectors for Special Types of Matrices

There are three special kinds of matrices which we can use to simplify the process of finding eigenvalues
and eigenvectors. Throughout this section, we will discuss similar matrices, elementary matrices, as well
as triangular matrices.
We begin with a definition.

Definition 7.9: Similar Matrices


Let A and B be n n matrices. Suppose there exists an invertible matrix P such that

A = P1 BP

Then A and B are called similar matrices.

It turns out that we can use the concept of similar matrices to help us find the eigenvalues of matrices.
Consider the following lemma.

Lemma 7.10: Similar Matrices and Eigenvalues


Let A and B be similar matrices, so that A = P1 BP where A, B are n n matrices and P is invertible.
Then A, B have the same eigenvalues.
7.1. Eigenvalues and Eigenvectors of a Matrix 231

Proof. We need to show two things. First, we need to show that if A = P1 BP, then A and B have the same
eigenvalues. Secondly, we show that if A and B have the same eigenvalues, then A = P1 BP.
Here is the proof of the first statement. Suppose A = P1 BP and is an eigenvalue of A, that is
AX = X for some X 6= 0. Then
P1 BPX = X
and so
BPX = PX
Since P is one to one and X 6= 0, it follows that PX 6= 0. Here, PX plays the role of the eigenvector in
this equation. Thus is also an eigenvalue of B. One can similarly verify that any eigenvalue of B is also
an eigenvalue of A, and thus both matrices have the same eigenvalues as desired.
Proving the second statement is similar and is left as an exercise.
Note that this proof also demonstrates that the eigenvectors of A and B will (generally) be different.
We see in the proof that AX = X , while B (PX ) = (PX ). Therefore, for an eigenvalue , A will have
the eigenvector X while B will have the eigenvector PX .
The second special type of matrices we discuss in this section is elementary matrices. Recall from
Definition 2.43 that an elementary matrix E is obtained by applying one row operation to the identity
matrix.
It is possible to use elementary matrices to simplify a matrix before searching for its eigenvalues and
eigenvectors. This is illustrated in the following example.

Example 7.11: Simplify Using Elementary Matrices


Find the eigenvalues for the matrix

33 105 105
A = 10 28 30
20 60 62

Solution. This matrix has big numbers and therefore we would like to simplify as much as possible before
computing the eigenvalues.
We will do so using row operations. First, add 2 times the second row to the third row. To do so, left
multiply A by E (2, 2). Then right multiply A by the inverse of E (2, 2) as illustrated.

1 0 0 33 105 105 1 0 0 33 105 105
0 1 0 10 28 30 0 1 0 = 10 32 30
0 2 1 20 60 62 0 2 1 0 0 2
By Lemma 7.10, the resulting matrix has the same eigenvalues as A where here, the matrix E (2, 2) plays
the role of P.
We do this step again, as follows. In this step, we use the elementary matrix obtained by adding 3
times the second row to the first row.

1 3 0 33 105 105 1 3 0 3 0 15
0 1 0 10 32 30 0 1 0 = 10 2 30 (7.4)
0 0 1 0 0 2 0 0 1 0 0 2
232 Spectral Theory

Again by Lemma 7.10, this resulting matrix has the same eigenvalues as A. At this point, we can easily
find the eigenvalues. Let
3 0 15
B = 10 2 30
0 0 2
Then, we find the eigenvalues of B (and therefore of A) by solving the equation det (xI B) = 0. You
should verify that this equation becomes

(x + 2) (x + 2) (x 3) = 0

Solving this equation results in eigenvalues of 1 = 2, 2 = 2, and 3 = 3. Therefore, these are also
the eigenvalues of A.

Through using elementary matrices, we were able to create a matrix for which finding the eigenvalues
was easier than for A. At this point, you could go back to the original matrix A and solve ( I A) X = 0
to obtain the eigenvectors of A.
Notice that when you multiply on the right by an elementary matrix, you are doing the column op-
eration defined by the elementary matrix. In 7.4 multiplication by the elementary matrix on the right
merely involves taking three times the first column and adding to the second. Thus, without referring to
the elementary matrices, the transition to the new matrix in 7.4 can be illustrated by

33 105 105 3 9 15 3 0 15
10 32 30 10 32 30 10 2 30
0 0 2 0 0 2 0 0 2

The third special type of matrix we will consider in this section is the triangular matrix. Recall Defi-
nition 3.12 which states that an upper (lower) triangular matrix contains all zeros below (above) the main
diagonal. Remember that finding the determinant of a triangular matrix is a simple procedure of taking
the product of the entries on the main diagonal.. It turns out that there is also a simple way to find the
eigenvalues of a triangular matrix.
In the next example we will demonstrate that the eigenvalues of a triangular matrix are the entries on
the main diagonal.

Example 7.12: Eigenvalues for a Triangular Matrix



1 2 4
Let A = 0 4 7 . Find the eigenvalues of A.
0 0 6

Solution. We need to solve the equation det (xI A) = 0 as follows



x 1 2 4
det (xI A) = det 0 x 4 7 = (x 1) (x 4) (x 6) = 0
0 0 x6
7.1. Eigenvalues and Eigenvectors of a Matrix 233

Solving the equation (x 1) (x 4) (x 6) = 0 for x results in the eigenvalues 1 = 1, 2 = 4 and


3 = 6. Thus the eigenvalues are the entries on the main diagonal of the original matrix.
The same result is true for lower triangular matrices. For any triangular matrix, the eigenvalues are
equal to the entries on the main diagonal. To find the eigenvectors of a triangular matrix, we use the usual
procedure.
In the next section, we explore an important process involving the eigenvalues and eigenvectors of a
matrix.

Exercises

Exercise 7.1.1 If A is an invertible n n matrix, compare the eigenvalues of A and A1 . More generally,
for m an arbitrary integer, compare the eigenvalues of A and Am .

Exercise 7.1.2 If A is an n n matrix and c is a nonzero constant, compare the eigenvalues of A and cA.

Exercise 7.1.3 Let A, B be invertible n n matrices which commute. That is, AB = BA. Suppose X is an
eigenvector of B. Show that then AX must also be an eigenvector for B.

Exercise 7.1.4 Suppose A is an n n matrix and it satisfies Am = A for some m a positive integer larger
than 1. Show that if is an eigenvalue of A then | | equals either 0 or 1.

Exercise 7.1.5 Show that if AX = X and AY = Y , then whenever k, p are scalars,

A (kX + pY ) = (kX + pY )

Does this imply that kX + pY is an eigenvector? Explain.

Exercise 7.1.6 Suppose A is a 3 3 matrix and the following information is available.



0 0
A 1 = 0 1
1 1

1 1
A 1 = 2 1
1 1

2 2
A 3 = 2 3
2 2

1
Find A 4 .
3
234 Spectral Theory

Exercise 7.1.7 Suppose A is a 3 3 matrix and the following information is available.



1 1
A 2 = 1 2
2 2

1 1
A 1 = 0 1
1 1

1 1
A 4 = 2 4
3 3

3
Find A 4 .
3

Exercise 7.1.8 Suppose A is a 3 3 matrix and the following information is available.



0 0
A 1 = 2 1
1 1

1 1
A 1 = 1 1
1 1

3 3
A 5 = 3 5
4 4

2
Find A 3 .
3

Exercise 7.1.9 Find the eigenvalues and eigenvectors of the matrix



6 92 12
0 0 0
2 31 4

One eigenvalue is 2.

Exercise 7.1.10 Find the eigenvalues and eigenvectors of the matrix



2 17 6
0 0 0
1 9 3

One eigenvalue is 1.
7.2. Diagonalization 235

Exercise 7.1.11 Find the eigenvalues and eigenvectors of the matrix



9 2 8
2 6 2
8 2 5

One eigenvalue is 3.

Exercise 7.1.12 Find the eigenvalues and eigenvectors of the matrix



6 76 16
2 21 4
2 64 17

One eigenvalue is 2.

Exercise 7.1.13 Find the eigenvalues and eigenvectors of the matrix



3 5 2
8 11 4
10 11 3

One eigenvalue is -3.

Exercise 7.1.14 Is it possible for a nonzero matrix to have only 0 as an eigenvalue?

Exercise 7.1.15 If A is the matrix of a linear transformation which rotates all vectors in R2 through 60 ,
explain why A cannot have any real eigenvalues. Is there an angle such that rotation through this angle
would have a real eigenvalue? What eigenvalues would be obtainable in this way?

Exercise 7.1.16 Let A be the 2 2 matrix of the linear transformation which rotates all vectors in R2
through an angle of . For which values of does A have a real eigenvalue?

Exercise 7.1.17 Let T be the linear transformation which reflects vectors about the x axis. Find a matrix
for T and then find its eigenvalues and eigenvectors.

Exercise 7.1.18 Let T be the linear transformation which rotates all vectors in R2 counterclockwise
through an angle of /2. Find a matrix of T and then find eigenvalues and eigenvectors.

Exercise 7.1.19 Let T be the linear transformation which reflects all vectors in R3 through the xy plane.
Find a matrix for T and then obtain its eigenvalues and eigenvectors.

7.2 Diagonalization
236 Spectral Theory

Outcomes
A. Determine when it is possible to diagonalize a matrix.

B. When possible, diagonalize a matrix.

7.2.1. Diagonalizing a Matrix

The most important theorem about diagonalizability is the following major result.

Theorem 7.13: Eigenvectors and Diagonalizable Matrices


An n n matrix A is diagonalizable if and only if there is an invertible matrix P given by
 
P = X1 X2 Xn

where the Xk are eigenvectors of A.


Moreover if A is diagonalizable, the corresponding eigenvalues of A are the diagonal entries of the
diagonal matrix D.

Proof. Suppose P is given as above as an invertible matrix whose columns are eigenvectors of A. Then
P1 is of the form
W1T
WT
2
P1 = ..
.
WnT
where WkT X j = k j , which is the Kroneckers symbol defined by

1 if i = j
i j =
0 if i 6= j

Then

W1T
W2T  

P1 AP = .. AX1 AX2 AXn
.
WnT

W1T
W2T  

= .. 1 X1 2 X2 n Xn
.
WnT

1 0
..
= .
0 n
7.2. Diagonalization 237

Conversely, suppose A is diagonalizable so that P1 AP = D. Let


 
P = X1 X2 Xn

where the columns are the Xk and


1 0
..
D= .
0 n
Then
1 0
  ..
AP = PD = X1 X2 Xn .
0 n
and so    
AX1 AX2 AXn = 1 X1 2 X2 n Xn
showing the Xk are eigenvectors of A and the k are eigenvectors.
Notice that because the matrix P defined above is invertible it follows that the set of eigenvectors of A,
{X1 , X2 , , Xn}, form a basis of Rn .
We demonstrate the concept given in the above theorem in the next example. Note that not only are
the columns of the matrix P formed by eigenvectors, but P must be invertible so must consist of a wide
variety of eigenvectors. We achieve this by using basic eigenvectors for the columns of P.

Example 7.14: Diagonalize a Matrix


Let
2 0 0
A= 1 4 1
2 4 4
Find an invertible matrix P and a diagonal matrix D such that P1 AP = D.

Solution. By Theorem 7.13 we use the eigenvectors of A as the columns of P, and the corresponding
eigenvalues of A as the diagonal entries of D.
First, we will find the eigenvalues of A. To do so, we solve det (xI A) = 0 as follows.

1 0 0 2 0 0
det x 0 1 0 1 4 1 = 0
0 0 1 2 4 4

This computation is left as an exercise, and you should verify that the eigenvalues are 1 = 2, 2 = 2,
and 3 = 6.
238 Spectral Theory

Next, we need to find the eigenvectors. We first find the eigenvectors for 1 , 2 = 2. Solving (2I A) X =
0 to find the eigenvectors, we find that the eigenvectors are

2 1
t 1 +s 0

0 1

where t, s are scalars. Hence there are two basic eigenvectors which are given by

2 1
X1 = 1 , X2 = 0

0 1

0
You can verify that the basic eigenvector for 3 = 6 is X3 = 1
2
Then, we construct the matrix P as follows.

  2 1 0
P = X1 X2 X3 = 1 0 1
0 1 2

That is, the columns of P are the basic eigenvectors of A. Then, you can verify that
1 1 1
4 2 4

1 1
1
P = 2 1 2

1 1 1
4 2 4

Thus,

41 1
2
1
4
2 0 0 2 1 0
1 1
P AP =
1
2 1 2

1 4 1 1 0 1
1 1 1 2 4 4 0 1 2
4 2 4

2 0 0
= 0 2 0
0 0 6

You can see that the result here is a diagonal matrix where the entries on the main diagonal are the
eigenvalues of A. We expected this based on Theorem 7.13. Notice that eigenvalues on the main diagonal
must be in the same order as the corresponding eigenvectors in P.
Consider the next important theorem.
7.2. Diagonalization 239

Theorem 7.15: Linearly Independent Eigenvectors


Let A be an n n matrix, and suppose that A has distinct eigenvalues 1 , 2 , . . . , m . For each i, let
Xi be a i -eigenvector of A. Then {X1 , X2 , . . . , Xm } is linearly independent.

The corollary that follows from this theorem gives a useful tool in determining if A is diagonalizable.

Corollary 7.16: Distinct Eigenvalues


Let A be an n n matrix and suppose it has n distinct eigenvalues. Then it follows that A is diago-
nalizable.

It is possible that a matrix A cannot be diagonalized. In other words, we cannot find an invertible
matrix P so that P1 AP = D.
Consider the following example.

Example 7.17: A Matrix which cannot be Diagonalized


Let  
1 1
A=
0 1
If possible, find an invertible matrix P and diagonal matrix D so that P1 AP = D.

Solution. Through the usual procedure, we find that the eigenvalues of A are 1 = 1, 2 = 1. To find the
eigenvectors, we solve the equation ( I A) X = 0. The matrix ( I A) is given by
 
1 1
0 1

Substituting in = 1, we have the matrix


   
1 1 1 0 1
=
0 11 0 0

Then, solving the equation ( I A) X = 0 involves carrying the following augmented matrix to its
reduced row-echelon form.    
0 1 0 0 1 0

0 0 0 0 0 0
Then the eigenvectors are of the form  
1
t
0
and the basic eigenvector is  
1
X1 =
0
240 Spectral Theory

In this case, the matrix A has one eigenvalue of multiplicity two, but only one basic eigenvector. In
order to diagonalize A, we need to construct an invertible 2 2 matrix P. However, because A only has
one basic eigenvector, we cannot construct this P. Notice that if we were to use X1 as both columns of P,
P would not be invertible. For this reason, we cannot repeat eigenvectors in P.
Hence this matrix cannot be diagonalized.
The idea that a matrix may not be diagonalizable suggests that conditions exist to determine when it
is possible to diagonalize a matrix. We saw earlier in Corollary 7.16 that an n n matrix with n distinct
eigenvalues is diagonalizable. It turns out that there are other useful diagonalizability tests.
First we need the following definition.

Definition 7.18: Eigenspace


Let A be an n n matrix and R. The eigenspace of A corresponding to , written E (A) is the
set of all eigenvectors corresponding to .

In other words, the eigenspace E (A) is all X such that AX = X . Notice that this set can be written
E (A) = null( I A), showing that E (A) is a subspace of Rn .
Recall that the multiplicity of an eigenvalue is the number of times that it occurs as a root of the
characteristic polynomial.
Consider now the following lemma.

Lemma 7.19: Dimension of the Eigenspace


If A is an n n matrix, then
dim(E (A)) m
where is an eigenvalue of A of multiplicity m.

This result tells us that if is an eigenvalue of A, then the number of linearly independent -eigenvectors
is never more than the multiplicity of . We now use this fact to provide a useful diagonalizability condi-
tion.

Theorem 7.20: Diagonalizability Condition


Let A be an n n matrix A. Then A is diagonalizable if and only if for each eigenvalue of A,
dim(E (A)) is equal to the multiplicity of .

7.2.2. Complex Eigenvalues

In some applications, a matrix may have eigenvalues which are complex numbers. For example, this often
occurs in differential equations. These questions are approached in the same way as above.
Consider the following example.
7.2. Diagonalization 241

Example 7.21: A Real Matrix with Complex Eigenvalues


Let
1 0 0
A = 0 2 1
0 1 2
Find the eigenvalues and eigenvectors of A.

Solution. We will first find the eigenvalues as usual by solving the following equation.

1 0 0 1 0 0
det x 0 1 0 0 2 1 = 0
0 0 1 0 1 2

This reduces to (x 1) x2 4x + 5 = 0. The solutions are 1 = 1, 2 = 2 + i and 3 = 2 i.
There is nothing new about finding the eigenvectors for 1 = 1 so this is left as an exercise.
Consider now the eigenvalue 2 = 2 + i. As usual, we solve the equation ( I A) X = 0 as given by

1 0 0 1 0 0 0
(2 + i) 0 1 0 0 2 1 X = 0
0 0 1 0 1 2 0
In other words, we need to solve the system represented by the augmented matrix

1+i 0 0 0
0 i 1 0
0 1 i 0

We now use our row operations to solve the system. Divide the first row by (1 + i) and then take i
times the second row and add to the third row. This yields

1 0 0 0
0 i 1 0
0 0 0 0
Now multiply the second row by i to obtain the reduced row-echelon form, given by

1 0 0 0
0 1 i 0
0 0 0 0
Therefore, the eigenvectors are of the form
0
t i
1
and the basic eigenvector is given by
0
X2 = i
1
242 Spectral Theory

As an exercise, verify that the eigenvectors for 3 = 2 i are of the form



0
t i
1

Hence, the basic eigenvector is given by


0
X3 = i
1
As usual, be sure to check your answers! To verify, we check that AX3 = (2 i) X3 as follows.

1 0 0 0 0 0
0 2 1 i = 1 2i = (2 i) i
0 1 2 1 2i 1

Therefore, we know that this eigenvector and eigenvalue are correct.


Notice that in Example 7.21, two of the eigenvalues were given by 2 = 2 + i and 3 = 2 i. You may
recall that these two complex numbers are conjugates. It turns out that whenever a matrix containing real
entries has a complex eigenvalue , it also has an eigenvalue equal to , the conjugate of .

Exercises

Exercise 7.2.20 Find the eigenvalues and eigenvectors of the matrix



5 18 32
0 5 4
2 5 11

One eigenvalue is 1. Diagonalize if possible.

Exercise 7.2.21 Find the eigenvalues and eigenvectors of the matrix



13 28 28
4 9 8
4 8 9

One eigenvalue is 3. Diagonalize if possible.

Exercise 7.2.22 Find the eigenvalues and eigenvectors of the matrix



89 38 268
14 2 40
30 12 90
7.2. Diagonalization 243

One eigenvalue is 3. Diagonalize if possible.

Exercise 7.2.23 Find the eigenvalues and eigenvectors of the matrix



1 90 0
0 2 0
3 89 2

One eigenvalue is 1. Diagonalize if possible.

Exercise 7.2.24 Find the eigenvalues and eigenvectors of the matrix



11 45 30
10 26 20
20 60 44

One eigenvalue is 1. Diagonalize if possible.

Exercise 7.2.25 Find the eigenvalues and eigenvectors of the matrix



95 25 24
196 53 48
164 42 43

One eigenvalue is 5. Diagonalize if possible.

Exercise 7.2.26 Suppose A is an n n matrix and let V be an eigenvector such that AV = V . Also
suppose the characteristic polynomial of A is

det (xI A) = xn + an1 xn1 + + a1 x + a0

Explain why 
An + an1 An1 + + a1 A + a0 I V = 0
If A is diagonalizable, give a proof of the Cayley Hamilton theorem based on this. This theorem says A
satisfies its characteristic equation,

An + an1 An1 + + a1 A + a0 I = 0

Exercise 7.2.27 Suppose the characteristic polynomial of an n n matrix A is 1 X n . Find Amn where m
is an integer.

Exercise 7.2.28 Find the eigenvalues and eigenvectors of the matrix



15 24 7
6 5 1
58 76 20

One eigenvalue is 2. Diagonalize if possible. Hint: This one has some complex eigenvalues.
244 Spectral Theory

Exercise 7.2.29 Find the eigenvalues and eigenvectors of the matrix



15 25 6
13 23 4
91 155 30

One eigenvalue is 2. Diagonalize if possible. Hint: This one has some complex eigenvalues.

Exercise 7.2.30 Find the eigenvalues and eigenvectors of the matrix



11 12 4
8 17 4
4 28 3

One eigenvalue is 1. Diagonalize if possible. Hint: This one has some complex eigenvalues.

Exercise 7.2.31 Find the eigenvalues and eigenvectors of the matrix



14 12 5
6 2 1
69 51 21

One eigenvalue is 3. Diagonalize if possible. Hint: This one has some complex eigenvalues.

Exercise 7.2.32 Suppose A is an n n matrix consisting entirely of real entries but a + ib is a complex
eigenvalue having the eigenvector, X + iY Here X and Y are real vectors. Show that then a ib is also an
eigenvalue with the eigenvector, X iY . Hint: You should remember that the conjugate of a product of
complex numbers equals the product of the conjugates. Here a + ib is a complex number whose conjugate
equals a ib.

7.3 Applications of Spectral Theory

Outcomes
A. Use diagonalization to find a high power of a matrix.

B. Use diagonalization to solve dynamical systems.

7.3.1. Raising a Matrix to a High Power

Suppose we have a matrix A and we want to find A50 . One could try to multiply A with itself 50 times, but
this is computationally extremely intensive (try it!). However diagonalization allows us to compute high
7.3. Applications of Spectral Theory 245

powers of a matrix relatively easily. Suppose A is diagonalizable, so that P1 AP = D. We can rearrange


this equation to write A = PDP1 .
Now, consider A2 . Since A = PDP1 , it follows that
2
A2 = PDP1 = PDP1 PDP1 = PD2 P1

Similarly,
3
A3 = PDP1 = PDP1 PDP1 PDP1 = PD3 P1
In general, n
An = PDP1 = PDn P1
Therefore, we have reduced the problem to finding Dn . In order to compute Dn , then because D is
diagonal we only need to raise every entry on the main diagonal of D to the power of n.
Through this method, we can compute large powers of matrices. Consider the following example.

Example 7.22: Raising a Matrix to a High Power



2 1 0
Let A = 0 1 0 . Find A50 .
1 1 1

Solution. We will first diagonalize A. The steps are left as an exercise and you may wish to verify that the
eigenvalues of A are 1 = 1, 2 = 1, and 3 = 2.
The basic eigenvectors corresponding to 1 , 2 = 1 are

0 1
X1 = 0 , X2 = 1
1 0

The basic eigenvector corresponding to 3 = 2 is



1
X3 = 0
1

Now we construct P by using the basic eigenvectors of A as the columns of P. Thus



  0 1 1
P = X1 X2 X3 = 0 1 0
1 0 1

Then also
1 1 1
P1 = 0 1 0
1 1 0
which you may wish to verify.
246 Spectral Theory

Then,

1 1 1 2 1 0 0 1 1
P1 AP = 0 1 0 0 1 0 0 1 0
1 1 0 1 1 1 1 0 1

1 0 0
= 0 1 0
0 0 2
= D

Now it follows by rearranging the equation that



0 1 1 1 0 0 1 1 1
A = PDP1 = 0 1 0 0 1 0 0 1 0
1 0 1 0 0 2 1 1 0

Therefore,
A50 = PD50 P1
50
0 1 1 1 0 0 1 1 1
= 0 1 0 0 1 0 0 1 0
1 0 1 0 0 2 1 1 0

By our discussion above, D50 is found as follows.


50 50
1 0 0 1 0 0
0 1 0 = 0 150 0
0 0 2 0 0 2 50

It follows that
50
0 1 1 1 0 0 1 1 1
A50 = 0 1 0 0 150 0 0 1 0
1 0 1 0 0 2 50 1 1 0

250 1 + 250 0
= 0 1 0
12 50 12 50 1


Through diagonalization, we can efficiently compute a high power of A. Without this, we would be
forced to multiply this by hand!
The next section explores another interesting application of diagonalization.

7.3.2. Raising a Symmetric Matrix to a High Power

We already have seen how to use matrix diagonalization to compute powers of matrices. This requires
computing eigenvalues of the matrix A, and finding an invertible matrix of eigenvectors P such that P1 AP
7.3. Applications of Spectral Theory 247

is diagonal. In this section we will see that if the matrix A is symmetric (see Definition 2.29), then we can
actually find such a matrix P that is an orthogonal matrix of eigenvectors. Thus P1 is simply its transpose
PT , and PT AP is diagonal. When this happens we say that A is orthogonally diagonalizable
In fact this happens if and only if A is a symmetric matrix as shown in the following important theorem.

Theorem 7.23: Principal Axis Theorem


The following conditions are equivalent for an n n matrix A:

1. A is symmetric.

2. A has an orthonormal set of eigenvectors.

3. A is orthogonally diagonalizable.

Proof. The complete proof is beyond this course, but to give an idea assume that A has an orthonormal
set of eigenvectors, and let P consist of these eigenvectors as columns. Then P1 = PT , and PT AP = D a
diagonal matrix. But then A = PDPT , and

AT = (PDPT )T = (PT )T DT PT = PDPT = A

so A is symmetric.
Now given a symmetric matrix A, one shows that eigenvectors corresponding to different eigenvalues
are always orthogonal. So it suffices to apply the Gram-Schmidt process on the set of basic eigenvectors
of each eigenvalue to obtain an orthonormal set of eigenvectors.
We demonstrate this in the following example.

Example 7.24: Orthogonal Diagonalization of a Symmetric Matrix



1 0 0
0 3 1
Let A =

2 2 . Find an orthogonal matrix P such that PT AP is a diagonal matrix.

0 12 32

Solution. In this case, verify that the eigenvalues are 2 and 1. First we will find an eigenvector for the
eigenvalue 2. This involves row reducing the following augmented matrix.

21 0 0 0
0 2 32 12 0


0 12 2 32 0

The reduced row-echelon form is


1 0 0 0
0 1 1 0
0 0 0 0
248 Spectral Theory

and so an eigenvector is
0
1
1
Finally to obtain an eigenvector of length one (unit eigenvector) we simply divide this vector by its length
to yield:
0
1/ 2

1/ 2
Next consider the case of the eigenvalue 1. To obtain basic eigenvectors, the matrix which needs to be
row reduced in this case is
11 0 0 0
0 1 32 12 0


0 12 1 32 0

The reduced row-echelon form is


0 1 1 0
0 0 0 0
0 0 0 0
Therefore, the eigenvectors are of the form
s
t
t
Note that all these vectors are automatically orthogonal to eigenvectors corresponding to the first eigen-
value. This follows from the fact that A is symmetric, as mentioned earlier.
We obtain basic eigenvectors
1 0
0 and 1
0 1
Since they are themselves orthogonal (by luck here) we do not need to use the Gram-Schmidt process and
instead simply normalize these vectors to obtain

1 0
0 and 1/ 2

0 1/ 2
An orthogonal matrix P to orthogonally diagonalize A is then obtained by letting these basic vectors be
the columns.
0 1 0
P = 1/ 2 0 1/2
1/ 2 0 1/ 2
We verify this works. PT AP is of the form

0 12 2 12 2 1 0 0
0 3 1 0 1 0
1 0 0 2 2 1/ 2 0 1/ 2

1
0 2 2 2 2 1 0 12 32 1/ 2 0 1/ 2
7.3. Applications of Spectral Theory 249


1 0 0
= 0 1 0
0 0 2
which is the desired diagonal matrix.
We can now apply this technique to efficiently compute high powers of a symmetric matrix.

Example 7.25: Powers of a Symmetric Matrix



1 0 0
0 3 1
Let A =

2 2 . Compute A7 .

0 12 32

Solution. We found in Example 7.24 that PT AP = D is diagonal, where



0 1 0 1 0 0
P = 1/ 2 0 1/2 and D = 0 1 0
1/ 2 0 1/ 2 0 0 2

Thus A = PDPT and A7 = PDPT PDPT PDPT = PD7 PT which gives:



7 0 21 2 1
2
0 1 0 1 0 0 2

A7
= 1/ 2 0 1/2 0 1 0 1 1 0 0

0 0 2 1
1/ 2 0 1/ 2 0 2 2 2 2

0 1 2 1
0 1 0 1 0 0 2 2 2


= 1/ 2 0 1/2
0 1 0 1 0 0
7 1
1

1/ 2 0 1/ 2 0 0 2 0 2 2 2 2

0 12 2 1
2 2
0 1 0
= 1/ 2 0 1/2

1 0 0
27 27

1/ 2 0 1/ 2 0 2 2 2 2

1 0 0
27 +1 27 1
0 2 2
=
7 27 +1
0 2 21 2

7.3.3. Markov Matrices

There are applications of great importance which feature a special type of matrix. Matrices whose columns
consist of non-negative numbers that sum to one are called Markov matrices. An important application
250 Spectral Theory

of Markov matrices is in population migration, as illustrated in the following definition.

Definition 7.26: Migration Matrices


Let m locations be denoted by the numbers 1, 2, , m. Suppose it is the case that each year the
proportion of residents in location j which move to location i is ai j . Also suppose no one escapes or
emigrates from without these m locations. This last assumption requires i ai j = 1, and means that
the matrix A, such that A = ai j , is a Markov matrix. In this context, A is also called a migration
matrix.

Consider the following example which demonstrates this situation.

Example 7.27: Migration Matrix


Let A be a Markov matrix given by  
.4 .2
A=
.6 .8
Verify that A is a Markov matrix and describe the entries of A in terms of population migration.

Solution. The columns of A are comprised of non-negative numbers which sum to 1. Hence, A is a Markov
matrix.
Now, consider the entries ai j of A in terms of population. The entry a11 = .4 is the proportion of
residents in location one which stay in location one in a given time period. Entry a21 = .6 is the proportion
of residents in location 1 which move to location 2 in the same time period. Entry a12 = .2 is the proportion
of residents in location 2 which move to location 1. Finally, entry a22 = .8 is the proportion of residents
in location 2 which stay in location 2 in this time period.
Considered as a Markov matrix, these numbers are usually identified with probabilities. Hence, we
can say that the probability that a resident of location one will stay in location one in the time period is .4.

Observe that in Example 7.27 if there was initially say 15 thousand people in location 1 and 10 thou-
sands in location 2, then after one year there would be .4 15 + .2 10 = 8 thousands people in location
1 the following year, and similarly there would be .6 15 + .8 10 = 17 thousands people in location 2
the following year.
More generally let Xn = [x1n xmn ]T where xin is the population of location i at time period n. We
call Xn the state vector at period n. In particular, we call X0 the initial state vector. Letting A be the
migration matrix, we compute the population in each location i one time period later by AXn . In order to
find the population of location i after k years, we compute the ith component of Ak X . This discussion is
summarized in the following theorem.

Theorem 7.28: State Vector


Let A be the migration matrix of a population and let Xn be the vector whose entries give the
population of each location at time period n. Then Xn is the state vector at period n and it follows
that
Xn+1 = AXn
7.3. Applications of Spectral Theory 251

The sum of the entries of Xn will equal the sum of the entries of the initial vector X0 . Since the columns
of A sum to 1, this sum is preserved for every multiplication by A as demonstrated below.
!
ai j x j = x j ai j = xj
i j j i j

Consider the following example.

Example 7.29: Using a Migration Matrix


Consider the migration matrix
.6 0 .1
A = .2 .8 0
.2 .2 .9
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and
400 in location 3. Find the population in the three locations after 1, 2, and 10 units of time.

Solution. Using Theorem 7.28 we can find the population in each location using the equation Xn+1 = AXn .
For the population after 1 unit, we calculate X1 = AX0 as follows.

X1 = AX0

x11 .6 0 .1 100
x21 = .2 .8 0 200
x31 .2 .2 .9 400

100
= 180
420

Therefore after one time period, location 1 has 100 residents, location 2 has 180, and location 3 has 420.
Notice that the total population is unchanged, it simply migrates within the given locations. We find the
locations after two time periods in the same way.

X2 = AX1

x12 .6 0 .1 100
x22 = .2 .8 0 180
x32 .2 .2 .9 420

102
= 164
434

We could progress in this manner to find the populations after 10 time periods. However from our
above discussion, we can simply calculate (An X0 )i , where n denotes the number of time periods which
have passed. Therefore, we compute the populations in each location after 10 units of time as follows.

X10 = A10 X0
252 Spectral Theory

10
x110 .6 0 .1 100
x210 = .2 .8 0 200
x310 .2 .2 .9 400

115. 085 829 22
= 120. 130 672 44
464. 783 498 34

Since we are speaking about populations, we would need to round these numbers to provide a logical
answer. Therefore, we can say that after 10 units of time, there will be 115 residents in location one, 120
in location two, and 465 in location three.
A second important application of Markov matrices is the concept of random walks. Suppose a walker
has m locations to choose from, denoted 1, 2, , m. Let ai j refer to the probability that the person will
travel to location i from location j. Again, this requires that
k
ai j = 1
i=1

In this context, the vector Xn = [x1n xmn ]T contains the probabilities xin the walker ends up in location
i, 1 i m at time n.

Example 7.30: Random Walks


Suppose three locations exist, referred to as locations 1, 2 and 3. The Markov matrix of probabilities
A = [ai j ] is given by
0.4 0.1 0.5
0.4 0.6 0.1
0.2 0.3 0.4
If the walker starts in location 1, calculated the probability that he ends up in location 3 at time
n = 2.

Solution. Since the walker begins in location 1, we have



1
X0 = 0

0

The goal is to calculate X2 , using Xn+1 = AXn.

X1 = AX0

0.4 0.1 0.5 1
= 0.4 0.6 0.1 0
0.2 0.3 0.4 0

0.4
= 0.4
0.2
7.3. Applications of Spectral Theory 253

X2 = AX1

0.4 0.1 0.5 0.4
= 0.4 0.6 0.1 0.4
0.2 0.3 0.4 0.2

0.3
= 0.42
0.28

This gives the probabilities that our walker ends up in locations 1, 2, and 3. For this example we are
interested in location 3, with a probability on 0.28.
Returning to the context of migration, suppose we wish to know how many residents will be in a
certain location after a very long time. It turns out that if some power of the migration matrix has all
positive entries, then there is a vector Xs such that An X0 approaches Xs as n becomes very large. Hence as
more time passes and n increases, An X0 will become closer to the vector Xs.
Consider Theorem 7.28. Let n increase so that Xn approaches Xs . As Xn becomes closer to Xs , so too
does Xn+1 . For sufficiently large n, the statement Xn+1 = AXn can be written as Xs = AXs.
This discussion motivates the following theorem.

Theorem 7.31: Steady State Vector


Let A be a migration matrix. Then there exists a steady state vector written Xs such that

Xs = AXs

where Xs has positive entries which have the same sum as the entries of X0.
As n increases, the state vectors Xn will approach Xs .

Note that the condition in Theorem 7.31 can be written as (I A)Xs = 0, representing a homogeneous
system of equations.
Consider the following example. Notice that it is the same example as the Example 7.29 but here it
will involve a longer time frame.

Example 7.32: Populations over the Long Run


Consider the migration matrix
.6 0 .1
A = .2 .8 0
.2 .2 .9
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and
400 in location 4. Find the population in the three locations after a long time.

Solution. By Theorem 7.31 the steady state vector Xs can be found by solving the system (I A)Xs = 0.
254 Spectral Theory

Thus we need to find a solution to



1 0 0 .6 0 .1 x1s 0
0 1 0 .2 .8 0 x2s = 0
0 0 1 .2 .2 .9 x3s 0

The augmented matrix and the resulting reduced row-echelon form are given by

0.4 0 0.1 0 1 0 0.25 0
0.2 0.2 0 0 0 1 0.25 0
0.2 0.2 0.1 0 0 0 0 0

Therefore, the eigenvectors are


0.25
t 0.25
1
The initial vector X0 is given by
100
200
400
Now all that remains is to choose the value of t such that

0.25t + 0.25t + t = 100 + 200 + 400


1400
Solving this equation for t yields t = 3 . Therefore the population in the long run is given by

0.25 116.666 666 666 666 7
1400
0.25 = 116.666 666 666 666 7
3
1 466.666 666 666 666 7

Again, because we are working with populations, these values need to be rounded. The steady state
vector Xs is given by
117
117
466

We can see that the numbers we calculated in Example 7.29 for the populations after the 10th unit of
time are not far from the long term values.
Consider another example.
7.3. Applications of Spectral Theory 255

Example 7.33: Populations After a Long Time


Suppose a migration matrix is given by

1 1 1
5 2 5

1 1 1

A= 4 4 2

11 1 3
20 4 10

Find the comparison between the populations in the three locations after a long time.

Solution. In order to compare the populations in the long term, we want to find the steady state vector Xs .
Solve
1 1 1

5 2 5
1 0 0 1 1 1 x1s 0

0 1 0 4 4 2 x2s = 0


0 0 1 11 1 3 x3s 0
20 4 10

The augmented matrix and the resulting reduced row-echelon form are given by

4 1 1
5 2 5 0
1 0 16
0
19
14 3
1
0 18
4 2 0 1 19 0

11 1 7
0 0 0 0 0
20 4 10

and so an eigenvector is
16
18
19
18
Therefore, the proportion of population in location 2 to location 1 is given by 16 . The proportion of
19
population 3 to location 2 is given by 18 .

7.3.3.1. Eigenvalues of Markov Matrices

The following is an important proposition.

Proposition 7.34: Eigenvalues of a Migration Matrix


 
Let A = ai j be a migration matrix. Then 1 is always an eigenvalue for A.

Proof. Remember that the determinant of a matrix always equals that of its transpose. Therefore,
  
T
det (xI A) = det (xI A) = det xI AT
256 Spectral Theory

because I T = I. Thus the characteristic equation for A is the same as the characteristic equation for AT .
Consequently, A and AT have the same eigenvalues. We will show that 1 is an eigenvalue for AT and then
it will follow that 1 is an eigenvalue for A.
 
Remember that for a migration matrix, i ai j = 1. Therefore, if AT = bi j with bi j = a ji , it follows
that
bi j = a ji = 1
j j

Therefore, from matrix multiplication,



1 j bi j 1
..
AT ... = ... = .
1 j bi j 1

1

Notice that this shows that ... is an eigenvector for AT corresponding to the eigenvalue, = 1. As
1
explained above, this shows that = 1 is an eigenvalue for A because A and AT have the same eigenvalues.

7.3.4. Dynamical Systems

The migration matrices discussed above give an example of a discrete dynamical system. We call them
discrete because they involve discrete values taken at a sequence of points rather than on a continuous
interval of time.
An example of a situation which can be studied in this way is a predator prey model. Consider the
following model where x is the number of prey and y the number of predators in a certain area at a certain
time. These are functions of n N where n = 1, 2, are the ends of intervals of time which may be of
interest in the problem. In other words, x (n) is the number of prey at the end of the nth interval of time.
An example of this situation may be modeled by the following equation
    
x (n + 1) 2 3 x (n)
=
y (n + 1) 1 4 y (n)

This says that from time period n to n + 1, x increases if there are more x and decreases as there are more
y. In the context of this example, this means that as the number of predators increases, the number of prey
decreases. As for y, it increases if there are more y and also if there are more x.
This is an example of a matrix recurrence which we define now.
7.3. Applications of Spectral Theory 257

Definition 7.35: Matrix Recurrence


Suppose a dynamical system is given by

xn+1 = axn + byn


yn+1 = cxn + dyn
   
xn a b
This system can be expressed as Vn+1 = AVn where Vn = and A = .
yn c d

In this section, we will examine how to find solutions to a dynamical system given certain initial
conditions. This process involves several concepts previously studied, including matrix diagonalization
and Markov matrices. The procedure is given as follows. Recall that when diagonalized, we can write
An = PDn P1 .

Procedure 7.36: Solving a Dynamical System


Suppose a dynamical system is given by

xn+1 = axn + byn


yn+1 = cxn + dyn

Given initial conditions x0 and y0 , the solutions to the system are found as follows:

1. Express the dynamical system in the form Vn+1 = AVn.

2. Diagonalize A to be written as A = PDP1 .

3. Then Vn = PDn P1V0 where V0 is the vector containing the initial conditions.

4. If given specific values for n, substitute into this equation. Otherwise, find a general solution
for n.

We will now consider an example in detail.

Example 7.37: Solutions of a Discrete Dynamical System


Suppose a dynamical system is given by

xn+1 = 1.5xn 0.5yn


yn+1 = 1.0xn

Express this system as a matrix recurrence and find solutions to the dynamical system for initial
conditions x0 = 20, y0 = 10.

Solution. First, we express the system as a matrix recurrence.

Vn+1 = AVn
258 Spectral Theory

    
x (n + 1) 1.5 0.5 x (n)
=
y (n + 1) 1.0 0 y (n)

Then  
1.5 0.5
A=
1.0 0
You can verify that the eigenvalues of A are 1 and .5. By diagonalizing, we can write A in the form
   
1 1 1 1 0 2 1
P DP =
1 2 0 .5 1 1

Now given an initial condition  


x0
V0 =
y0
the solution to the dynamical system is given by
Vn = PDn P1V0
    n   
x (n) 1 1 1 0 2 1 x0
=
y (n) 1 2 0 .5 1 1 y0
    
1 1 1 0 2 1 x0
=
1 2 0 (.5)n 1 1 y0
 n n 
y0 ((.5) 1) x0 ((.5) 2)
=
y0 (2 (.5)n 1) x0 (2 (.5)n 2)

If we let n become arbitrarily large, this vector approaches


 
2x0 y0
2x0 y0

Thus for large n,    


x (n) 2x0 y0

y (n) 2x0 y0
Now suppose the initial condition is given by
   
x0 20
=
y0 10

Then, we can find solutions for various values of n. Here are the solutions for values of n between 1
and 5      
25.0 27.5 28.75
n=1: ,n = 2 : ,n = 3 :
20.0 25.0 27.5
   
29.375 29.688
n=4: ,n = 5 :
28.75 29.375
Notice that as n increases, we approach the vector given by
     
2x0 y0 2 (20) 10 30
= =
2x0 y0 2 (20) 10 30

These solutions are graphed in the following figure.


7.3. Applications of Spectral Theory 259

y
29
28
27
x
28 29 30


The following example demonstrates another system which exhibits some interesting behavior. When
we graph the solutions, it is possible for the ordered pairs to spiral around the origin.

Example 7.38: Finding Solutions to a Dynamical System


Suppose a dynamical system is of the form
    
x (n + 1) 0.7 0.7 x (n)
=
y (n + 1) 0.7 0.7 y (n)

Find solutions to the dynamical system for given initial conditions.

Solution. Let  
0.7 0.7
A=
0.7 0.7
To find solutions, we must diagonalize A. You can verify that the eigenvalues of A are complex and are
given by 1 = .7 + .7i and 2 = .7 .7i. The eigenvector for 1 = .7 + .7i is
 
1
i
and that the eigenvector for 2 = .7 .7i is  
1
i
Thus the matrix A can be written in the form

   1
12 i
1 1 .7 + .7i 0 2
1 1

i i 0 .7 .7i 2 2i

and so,
Vn = PDn P1V0

    n  1
12 i  
x (n) 1 1 (.7 + .7i) 0 2 x0
=
y (n) i i 0 (.7 .7i)n 1
2
1
2i
y0

The explicit solution is given by


  n 
x0 12 (0.7 0.7i)n + 12 (0.7 + 0.7i)n  + y0 1 n
2 i (0.7 0.7i)
1
i (0.7 + 0.7i)
n
2
y0 12 (0.7 0.7i)n + 12 (0.7 + 0.7i)n x0 1 n
2 i (0.7 0.7i)
1
2 i (0.7 + 0.7i)
260 Spectral Theory

Suppose the initial condition is    


x0 10
=
y0 10
Then one obtains the following sequence of values which are graphed below by letting n = 1, 2, , 20

In this picture, the dots are the values and the dashed line is to help to picture what is happening.
These points are getting gradually closer to theorigin, but they are circling
  the origin in the clockwise
x (n) 0
direction as they do so. As n increases, the vector approaches
y (n) 0
This type of behavior along with complex eigenvalues is typical of the deviations from an equilibrium
point in the Lotka Volterra system of differential equations which is a famous model for predator-prey
interactions. These differential equations are given by

x = x (a by)
y = y (c dx)

where a, b, c, d are positive constants. For example, you might have X be the population of moose and Y
the population of wolves on an island.
Note that these equations make logical sense. The top says that the rate at which the moose population
increases would be aX if there were no predators Y . However, this is modified by multiplying instead
by (a bY ) because if there are predators, these will militate against the population of moose. The more
predators there are, the more pronounced is this effect. As to the predator equation, you can see that the
equations predict that if there are many prey around, then the rate of growth of the predators would seem
to be high. However, this is modified by the term cY because if there are many predators, there would
be competition for the available food supply and this would tend to decrease Y .
The behavior near an equilibrium point, which is a point where the right side of the differential equa-
tions equals zero, is of great interest. In this case, the equilibrium point is
c a
x = ,y =
d b
Then one defines new variables according to the formula
c a
x+ = x, y = y +
d b
7.3. Applications of Spectral Theory 261

In terms of these new variables, the differential equations become


 c   a 
x = x + ab y+
 d a  b c 
y = y + cd x+
b d
Multiplying out the right sides yields
c
x = bxy b y
d
a
y = dxy + dx
b
The interest is for x, y small and so these equations are essentially equal to
c a
x = b y, y = dx
d b
x(t+h)x(t)
Replace x with the difference quotient h where h is a small positive number and y with a
similar difference quotient. For example one could have h correspond to one day or even one hour. Thus,
for h small enough, the following would seem to be a good approximation to the differential equations.
c
x (t + h) = x (t) hb y
d
a
y (t + h) = y (t) + h dx
b
Let 1, 2, 3, denote the ends of discrete intervals of time having length h chosen above. Then the above
equations take the form
  " hbc
# 
x (n + 1) 1 d x (n)
= had
y (n + 1) b 1 y (n)
Note that the eigenvalues of this matrix are always complex.
We are not interested in time intervals of length h for h very small. Instead, we are interested in much
longer lengths of time. Thus, replacing the time interval with mh,
  " #m  
x (n + m) 1 hbc d x (n)
= had
y (n + m) b 1 y (n)

For example, if m = 2, you would have


  " # 
x (n + 2) 1 ach2 2b dc h x (n)
=
y (n + 2) 2 ab dh 1 ach2 y (n)

Note that most of the time, the eigenvalues of the new matrix will be complex.
You can also notice that the upper right corner will be negative by considering higher powers of the
matrix. Thus letting 1, 2, 3, denote the ends of discrete intervals of time, the desired discrete dynamical
system is of the form     
x (n + 1) a b x (n)
=
y (n + 1) c d y (n)
262 Spectral Theory

where a, b, c, d are positive constants and the matrix will likely have complex eigenvalues because it is a
power of a matrix which has complex eigenvalues.
You can see from the above discussion that if the eigenvalues of the matrix used to define the dynamical
system are less than 1 in absolute value, then the origin is stable in the sense that as n , the solution
converges to the origin. If either eigenvalue is larger than 1 in absolute value, then the solutions to the
dynamical system will usually be unbounded, unless the initial condition is chosen very carefully. The
next example exhibits the case where one eigenvalue is larger than 1 and the other is smaller than 1.
The following example demonstrates a familiar concept as a dynamical system.

Example 7.39: The Fibonacci Sequence


The Fibonacci sequence is the sequence given by

1, 1, 2, 3, 5,

which is defined recursively in the form

x (0) = 1 = x (1) , x (n + 2) = x (n + 1) + x (n)

Show how the Fibonacci Sequence can be considered a dynamical system.

Solution. This sequence is extremely important in the study of reproducing rabbits. It can be considered
as a dynamical system as follows. Let y (n) = x (n + 1) . Then the above recurrence relation can be written
as         
x (n + 1) 0 1 x (n) x (0) 1
= , =
y (n + 1) 1 1 y (n) y (0) 1
Let  
0 1
A=
1 1

The eigenvalues of the matrix A are 1 = 21 12 5 and 2 = 21 5 + 12 . The corresponding eigenvectors
are, respectively, " 1 # " 1 #
2 5 12 2 5 1
2
X1 = , X2 =
1 1
You can see from a short computation that one of the eigenvalues is smaller than 1 in absolute value
while the other is larger than 1 in absolute value. Now, diagonalizing A gives us
1 1
5 12 12 5 12   1 5 1 1 5 1
2 0 1 2 2 2 2

1 1
1 1 1 1
1
2 5 + 12 0
= 1

0 2 12 5
7.3. Applications of Spectral Theory 263

Then it follows that for a given initial condition, the solution to this dynamical system is of the form
1  n
  5 12 12 5 12 1 1
x (n) 2 5 + 0
= 2 2
 n
y (n) 1 1
1 1 0 22 5
1 1
1
5 5 10 5 + 2  
  1
1 1 1
5 5 5 5 2 5 21 1

It follows that
 n      
1 1 1 1 1 1 n 1 1
x (n) = 5+ 5+ + 5 5
2 2 10 2 2 2 2 10

Here is a picture of the ordered pairs (x (n) , y (n)) for n = 0, 1, , n.

There is so much more that can be said about dynamical systems. It is a major topic of study in
differential equations and what is given above is just an introduction.

Exercises
 
1 2
Exercise 7.3.33 Let A = . Diagonalize A to find A10 .
2 1

1 4 1
Exercise 7.3.34 Let A = 0 2 5 . Diagonalize A to find A50 .
0 0 5

1 2 1
Exercise 7.3.35 Let A = 2 1 1 . Diagonalize A to find A100 .
2 3 1
264 Spectral Theory

Exercise 7.3.36 The following is a Markov (migration) matrix for three locations

7 1 1
10 9 5

1 7 2

10 9 5

1 1 2
5 9 5

(a) Initially, there are 90 people in location 1, 81 in location 2, and 85 in location 3. How many are in
each location after one time period?

(b) The total number of individuals in the migration process is 256. After a long time, how many are in
each location?

Exercise 7.3.37 The following is a Markov (migration) matrix for three locations

1 1 2
5 5 5

2 2 1

5 5 5

2 2 2
5 5 5

(a) Initially, there are 130 individuals in location 1, 300 in location 2, and 70 in location 3. How many
are in each location after two time periods?

(b) The total number of individuals in the migration process is 500. After a long time, how many are in
each location?

Exercise 7.3.38 The following is a Markov (migration) matrix for three locations

3 3 1
10 8 3

1 3 1

10 8 3

3 1 1
5 4 3

The total number of individuals in the migration process is 480. After a long time, how many are in each
location?

Exercise 7.3.39 The following is a Markov (migration) matrix for three locations

3 1 1
10 3 5

3 1 7

10 3 10

2 1 1
5 3 10
7.3. Applications of Spectral Theory 265

The total number of individuals in the migration process is 1155. After a long time, how many are in each
location?

Exercise 7.3.40 The following is a Markov (migration) matrix for three locations

2 1 1
5 10 8

3 2 5

10 5 8

3 1 1
10 2 4

The total number of individuals in the migration process is 704. After a long time, how many are in each
location?

Exercise 7.3.41 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by
0.1 0.3 0.7
0.1 0.3 0.2
0.8 0.4 0.1
If the walker starts in location 2, what is the probability of ending back in location 2 at time n = 3?

Exercise 7.3.42 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by
0.5 0.1 0.6
0.2 0.9 0.2
0.3 0 0.2
It is unknown where the walker starts, but the probability of starting in each location is given by

0.2
X0 = 0.25
0.55

What is the probability of the walker being in location 1 at time n = 2?

Exercise 7.3.43 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.

SE NE NW SW
1 1 1 1
SE 3 10 10 5
1 7 1 1
NE 3 10 5 10
2 1 3 1
NW 9 10 5 5
1 1 1 1
SW 9 10 10 2
266 Spectral Theory

In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 413?

Exercise 7.3.44 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.
SE NE NW SW
1 1 1 1
SE 7 4 10 5
2 1 1 1
NE 7 4 5 10
1 1 3 1
NW 7 4 5 5
3 1 1 1
SW 7 4 10 2

In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 1469.

Exercise 7.3.45 The following table describes the transition probabilities between the states rainy, partly
cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the
next day with probability 51 . If it starts off sunny, it ends up sunny the next day with probability 52 and so
forth.
rains sunny p.c.
1 1 1
rains 5 5 3
1 2 1
sunny 5 5 3
3 2 1
p.c. 5 5 3

Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?

Exercise 7.3.46 The following table describes the transition probabilities between the states rainy, partly
cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the
1
next day with probability 10 . If it starts off sunny, it ends up sunny the next day with probability 25 and so
forth.
rains sunny p.c.
1 1 1
rains 5 5 3
1 2 4
sunny 10 5 9
7 2 2
p.c. 10 5 9

Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?

Exercise 7.3.47 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
7.3. Applications of Spectral Theory 267

locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.

SE NE NW SW
5 1 1 1
SE 11 10 10 5
1 7 1 1
NE 11 10 5 10
2 1 3 1
NW 11 10 5 5
3 1 1 1
SW 11 10 10 2

In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 407?

Exercise 7.3.48 The University of Poohbah offers three degree programs, scouting education (SE), dance
appreciation (DA), and engineering (E). It has been determined that the probabilities of transferring from
one program to another are as in the following table.

SE DA E
SE .8 .1 .3
DA .1 .7 .5
E .1 .2 .2

where the number indicates the probability of transferring from the top program to the program on the
left. Thus the probability of going from DA to E is .2. Find the probability that a student is enrolled in the
various programs.

Exercise 7.3.49 In the city of Nabal, there are three political persuasions, republicans (R), democrats (D),
and neither one (N). The following table shows the transition probabilities between the political parties,
the top row being the initial political party and the side row being the political affiliation the following
year.
R D N
R 15 61 27
1 1 4
D 5 3 7
3 1 1
N 5 2 7

Find the probabilities that a person will be identified with the various political persuasions. Which party
will end up being most important?

Exercise 7.3.50 The following table describes the transition probabilities between the states rainy, partly
cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the
next day with probability 51 . If it starts off sunny, it ends up sunny the next day with probability 72 and so
268 Spectral Theory

forth.
rains sunny p.c.
1 2 5
rains 5 7 9
1 2 1
sunny 5 7 3
3 3 1
p.c. 5 7 9

Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?
A. Some Prerequisite Topics
The topics presented in this section are important concepts in mathematics and therefore should be exam-
ined.

A.1 Sets and Set Notation

A set is a collection of things called elements. For example {1, 2, 3, 8} would be a set consisting of the
elements 1,2,3, and 8. To indicate that 3 is an element of {1, 2, 3, 8} , it is customary to write 3 {1, 2, 3, 8} .
We can also indicate when an element is not in a set, by writing 9 / {1, 2, 3, 8} which says that 9 is not an
element of {1, 2, 3, 8} . Sometimes a rule specifies a set. For example you could specify a set as all integers
larger than 2. This would be written as S = {x Z : x > 2} . This notation says: S is the set of all integers,
x, such that x > 2.
Suppose A and B are sets with the property that every element of A is an element of B. Then we
say that A is a subset of B. For example, {1, 2, 3, 8} is a subset of {1, 2, 3, 4, 5, 8} . In symbols, we write
{1, 2, 3, 8} {1, 2, 3, 4, 5, 8} . It is sometimes said that A is contained in B" or even B contains A". The
same statement about the two sets may also be written as {1, 2, 3, 4, 5, 8} {1, 2, 3, 8}.
We can also talk about the union of two sets, which we write as A B. This is the set consisting of
everything which is an element of at least one of the sets, A or B. As an example of the union of two sets,
consider {1, 2, 3, 8} {3, 4, 7, 8} = {1, 2, 3, 4, 7, 8}. This set is made up of the numbers which are in at least
one of the two sets.
In general
A B = {x : x A or x B}
Notice that an element which is in both A and B is also in the union, as well as elements which are in only
one of A or B.
Another important set is the intersection of two sets A and B, written A B. This set consists of
everything which is in both of the sets. Thus {1, 2, 3, 8} {3, 4, 7, 8} = {3, 8} because 3 and 8 are those
elements the two sets have in common. In general,

A B = {x : x A and x B}

If A and B are two sets, A \ B denotes the set of things which are in A but not in B. Thus

A \ B = {x A : x
/ B}

For example, if A = {1, 2, 3, 8} and B = {3, 4, 7, 8}, then A \ B = {1, 2, 3, 8} \ {3, 4, 7, 8} = {1, 2}.
A special set which is very important in mathematics is the empty set denoted by 0. / The empty set, 0,
/
is defined as the set which has no elements in it. It follows that the empty set is a subset of every set. This

269
270 Some Prerequisite Topics

is true because if it were not so, there would have to exist a set A, such that 0/ has something in it which is
not in A. However, 0/ has nothing in it and so it must be that 0/ A.
We can also use brackets to denote sets which are intervals of numbers. Let a and b be real numbers.
Then

[a, b] = {x R : a x b}

[a, b) = {x R : a x < b}

(a, b) = {x R : a < x < b}

(a, b] = {x R : a < x b}

[a, ) = {x R : x a}

(, a] = {x R : x a}

These sorts of sets of real numbers are called intervals. The two points a and b are called endpoints,
or bounds, of the interval. In particular, a is the lower bound while b is the upper bound of the above
intervals, where applicable. Other intervals such as (, b) are defined by analogy to what was just
explained. In general, the curved parenthesis, (, indicates the end point is not included in the interval,
while the square parenthesis, [, indicates this end point is included. The reason that there will always be
a curved parenthesis next to or is that these are not real numbers and cannot be included in the
interval in the way a real number can.
To illustrate the use of this notation relative to intervals consider three examples of inequalities. Their
solutions will be written in the interval notation just described.

Example A.1: Solving an Inequality


Solve the inequality 2x + 4 x 8.

Solution. We need to find x such that 2x + 4 x 8. Solving for x, we see that x 12 is the answer.
This is written in terms of an interval as (, 12].
Consider the following example.

Example A.2: Solving an Inequality


Solve the inequality (x + 1) (2x 3) 0.

Solution. We need to find x such that (x + 1) (2x 3) 0. The solution is given by x 1 or x 32 .


Therefore, x which fit into either of these intervals gives a solution. In terms of set notation this is denoted
by (, 1] [ 32 , ).

Consider one last example.


A.2. Well Ordering and Induction 271

Example A.3: Solving an Inequality


Solve the inequality x (x + 2) 4.

Solution. This inequality is true for any value of x where x is a real number. We can write the solution as
R or (, ) .
In the next section, we examine another important mathematical concept.

A.2 Well Ordering and Induction


j
We begin this section with some important notation. Summation notation, written i=1 i, represents a sum.
Here, i is called the index of the sum, and we add iterations until i = j. For example,
j
i = 1+2++ j
i=1

Another example:
3
a11 + a12 + a13 = a1i
i=1

The following notation is a specific use of summation notation.

Notation A.4: Summation Notation


Let ai j be real numbers, and suppose 1 i r while 1 j s. These numbers can be listed in a
rectangular array as given by
a11 a12 a1s
a21 a22 a2s
. . .
.. .. ..
ar1 ar2 ars
Then sj=1 ri=1 ai j means to first sum the numbers in each column (using i as the index) and then to
add the sums which result (using j as the index). Similarly, ri=1 sj=1 ai j means to sum the vectors
in each row (using j as the index) and then to add the sums which result (using i as the index).

Notice that since addition is commutative, sj=1 ri=1 ai j = ri=1 sj=1 ai j .


We now consider the main concept of this section. Mathematical induction and well ordering are two
extremely important principles in math. They are often used to prove significant things which would be
hard to prove otherwise.

Definition A.5: Well Ordered


A set is well ordered if every nonempty subset S, contains a smallest element z having the property
that z x for all x S.
272 Some Prerequisite Topics

In particular, the set of natural numbers defined as

N = {1, 2, }

is well ordered.
Consider the following proposition.

Proposition A.6: Well Ordered Sets


Any set of integers larger than a given number is well ordered.

This proposition claims that if a set has a lower bound which is a real number, then this set is well
ordered.
Further, this proposition implies the principle of mathematical induction. The symbol Z denotes the
set of all integers. Note that if a is an integer, then there are no integers between a and a + 1.

Theorem A.7: Mathematical Induction


A set S Z, having the property that a S and n + 1 S whenever n S, contains all integers x Z
such that x a.

Proof. Let T consist of all integers larger than or equal to a which are not in S. The theorem will be proved
if T = 0.
/ If T 6= 0/ then by the well ordering principle, there would have to exist a smallest element of T ,
denoted as b. It must be the case that b > a since by definition, a
/ T . Thus b a + 1, and so b 1 a and
b1 / S because if b 1 S, then b 1 + 1 = b S by the assumed property of S. Therefore, b 1 T
which contradicts the choice of b as the smallest element of T . (b 1 is smaller.) Since a contradiction is
obtained by assuming T 6= 0, / it must be the case that T = 0/ and this says that every integer at least as large
as a is also in S.
Mathematical induction is a very useful device for proving theorems about the integers. The procedure
is as follows.

Procedure A.8: Proof by Mathematical Induction


Suppose Sn is a statement which is a function of the number n, for n = 1, 2, , and we wish to show
that Sn is true for all n 1. To do so using mathematical induction, use the following steps.

1. Base Case: Show S1 is true.

2. Assume Sn is true for some n, which is the induction hypothesis. Then, using this assump-
tion, show that Sn+1 is true.

Proving these two steps shows that Sn is true for all n = 1, 2, .

We can use this procedure to solve the following examples.


A.2. Well Ordering and Induction 273

Example A.9: Proving by Induction


n (n + 1) (2n + 1)
Prove by induction that nk=1 k2 = .
6

Solution. By Procedure A.8, we first need to show that this statement is true for n = 1. When n = 1, the
statement says that
1
1 (1 + 1) (2(1) + 1)
k2 =
6
k=1
6
=
6
= 1

The sum on the left hand side also equals 1, so this equation is true for n = 1.
Now suppose this formula is valid for some n 1 where n is an integer. Hence, the following equation
is true.
n
n (n + 1) (2n + 1)
k2 = 6
(1.1)
k=1
We want to show that this is true for n + 1.
Suppose we add (n + 1)2 to both sides of equation 1.1.
n+1 n
k2 = k2 + (n + 1)2
k=1 k=1
n (n + 1) (2n + 1)
= + (n + 1)2
6
The step going from the first to the second line is based on the assumption that the formula is true for n.
Now simplify the expression in the second line,
n (n + 1) (2n + 1)
+ (n + 1)2
6
This equals  
n (2n + 1)
(n + 1) + (n + 1)
6
and
n (2n + 1) 6 (n + 1) + 2n2 + n (n + 2) (2n + 3)
+ (n + 1) = =
6 6 6
Therefore,
n+1
(n + 1) (n + 2) (2n + 3) (n + 1) ((n + 1) + 1) (2 (n + 1) + 1)
k2 = 6
=
6
k=1
showing the formula holds for n + 1 whenever it holds for n. This proves the formula by mathematical
induction. In other words, this formula is true for all n = 1, 2, .
Consider another example.
274 Some Prerequisite Topics

Example A.10: Proving an Inequality by Induction


1 3 2n 1 1
Show that for all n N, < .
2 4 2n 2n + 1

Solution. Again we will use the procedure given in Procedure A.8 to prove that this statement is true for
all n. Suppose n = 1. Then the statement says
1 1
<
2 3
which is true.
Suppose then that the inequality holds for n. In other words,
1 3 2n 1 1
<
2 4 2n 2n + 1
is true.
2n+1
Now multiply both sides of this inequality by 2n+2 . This yields

1 3 2n 1 2n + 1 1 2n + 1 2n + 1
< =
2 4 2n 2n + 2 2n + 1 2n + 2 2n + 2
1
The theorem will be proved if this last expression is less than . This happens if and only if
2n + 3
 2
1 1 2n + 1
= >
2n + 3 2n + 3 (2n + 2)2

which occurs if and only if (2n + 2)2 > (2n + 3) (2n + 1) and this is clearly true which may be seen from
expanding both sides. This proves the inequality.
Lets review the process just used. If S is the set of integers at least as large as 1 for which the formula
holds, the first step was to show 1 S and then that whenever n S, it follows n + 1 S. Therefore, by
the principle of mathematical induction, S contains [1, ) Z, all positive integers. In doing an inductive
proof of this sort, the set S is normally not mentioned. One just verifies the steps above.
B. Selected Exercise Answers
x + 3y = 1  10 1

1.1.1 , Solution is: x = 13 , y = 13 .
4x y = 3

3x + y = 3
1.1.2 , Solution is: [x = 1, y = 0]
x + 2y = 1

x + 3y = 1  10 1

1.2.4 , Solution is: x = 13 , y = 13
4x y = 3

x + 3y = 1  10 1

1.2.5 , Solution is: x = 13 , y = 13
4x y = 3

x + 2y = 1  
1.2.6 2x y = 1 , Solution is: x = 35 , y = 15
4x + 3y = 3

1.2.7
No solution exists. You can see this
by writing the
augmented matrix and doing row operations.
1 1 3 2 1 0 4 0
2 1 1 1 , row echelon form: 0 1 7 0 . Thus one of the equations says 0 = 1 in an
3 2 2 0 0 0 0 1
equivalent system of equations.

4g I = 150
4I 17g = 660
1.2.8 , Solution is : {g = 60, I = 90, b = 200, s = 50}
4g + s = 290
g+I +sb = 0

1.2.9 The solution exists but is not unique.

1.2.10 A solution exists and is unique.

1.2.12 There might be a solution. If so, there are infinitely many.

1.2.13 No. Consider x + y + z = 2 and x + y + z = 1.

1.2.14 These can have a solution. For example, x + y = 1, 2x + 2y = 2, 3x + 3y = 3 even has an infinite set
of solutions.

1.2.15 h = 4

275
276 Selected Exercise Answers

1.2.16 Any h will work.

1.2.17 Any h will work.

1.2.18 If h 6= 2 there will be a unique solution for any k. If h = 2 and k 6= 4, there are no solutions. If h = 2
and k = 4, then there are infinitely many solutions.

1.2.19 If h 6= 4, then there is exactly one solution. If h = 4 and k 6= 4, then there are no solutions. If h = 4
and k = 4, then there are infinitely many solutions.

1.2.20 There is no solution. The system is inconsistent.


You can see thisfrom the augmented matrix.
1 0 0 1
1 2 1 1 2 3 0
1 1
1 1 1 , reduced row-echelon form: 0 1 0 23 0
2 .
1 1 0 1 0 0 1 0 0
4 2 1 0 5 0 0 0 0 1
 
1.2.21 Solution is: w = 32 y 1, x = 23 12 y, z = 13

1.2.22 (a) This one is not.

(b) This one is.

(c) This one is.


1 1

1 0 2 2

1.2.31 The reduced row-echelon form is 0 1 14 3
4
. Therefore, the solution is of the form z =

0 0 0 0
3 1
 1 1
t, y = 4 + t 4 , x = 2 2 t where t R.
 
1 0 4 2
1.2.32 The reduced row-echelon form is and so the solution is z = t, y = 4t, x = 2 4t.
0 1 4 1

1 0 0 0 9 3
0 1 0 0 4 0
1.2.33 The reduced row-echelon form is
0
and so x5 = t, x4 = 1 6t, x3 = 1 +
0 1 0 7 1
0 0 0 1 6 1
7t, x2 = 4t, x1 = 3 9t.

1 0 2 0 12 5
2

0 1 0 0 1 3
2 2
1.2.34 The reduced row-echelon form is . Therefore, let x5 = t, x3 = s. Then
0 0 0 1 3 1
2
2
0 0 0 0 0 0
the other variables are given by x4 = 2 2 t, x2 = 2 t 2 , , x1 = 2 + 12 t 2s.
1 3 3 1 5
277

1.2.35 Solution is: [x = 1 2t, z = 1, y = t]

1.2.36 Solution is: [x = 2 4t, y = 8t, z = t]

1.2.37 Solution is: [x = 1, y = 2, z = 1]

1.2.38 Solution is: [x = 2, y = 4, z = 5]

1.2.39 Solution is: [x = 1, y = 2, z = 5]

1.2.40 Solution is: [x = 1, y = 5, z = 4]

1.2.41 Solution is: [x = 2t + 1, y = 4t, z = t]

1.2.42 Solution is: [x = 1, y = 5, z = 3]

1.2.43 Solution is: [x = 4, y = 4, z = 2]

1.2.44 No. Consider x + y + z = 2 and x + y + z = 1.

1.2.45 No. This would lead to 0 = 1.

1.2.46 Yes. It has a unique solution.

1.2.47 The last column must not be a pivot column. The remaining columns must each be pivot columns.

1
4 (20 + 30 + w + x) y = 0
1
1.2.48 You need 4 (y + 30 + 0 + z) w = 0 , Solution is: [w = 15, x = 15, y = 20, z = 10] .
1
4 (20 + y + z + 10) x = 0
1
4 (x + w + 0 + 10) z = 0

1.2.60 It is because you cannot have more than min (m, n) nonzero rows in the reduced row-echelon form.
Recall that the number of pivot columns is the same as the number of nonzero rows from the description
of this reduced row-echelon form.

1.2.61 (a) This says B is in the span of four of the columns. Thus the columns are not independent.
Infinite solution set.

(b) This surely cant happen. If you add in another column, the rank does not get smaller.

(c) This says B is in the span of the columns and the columns must be independent. You cant have the
rank equal 4 if you only have two columns.

(d) This says B is not in the span of the columns. In this case, there is no solution to the system of
equations represented by the augmented matrix.

(e) In this case, there is a unique solution since the columns of A are independent.
278 Selected Exercise Answers

1.2.62 These are not legitimate row operations. They do not preserve the solution set of the system.

2.1.3 To get A, just replace every entry of A with its additive inverse. The 0 matrix is the one which has
all zeros in it.

2.1.5 Suppose B also works. Then

A = A + (A + B) = (A + A) + B = 0 + B = B

2.1.6 Suppose 0 also works. Then 0 = 0 + 0 = 0.

2.1.7 0A = (0 + 0) A = 0A + 0A. Now add (0A) to both sides. Then 0 = 0A.

2.1.8 A + (1) A = (1 + (1)) A = 0A = 0. Therefore, from the uniqueness of the additive inverse proved
in the above Problem 2.1.5, it follows that A = (1) A.
 
3 6 9
2.1.9 (a)
6 3 21
 
8 5 3
(b)
11 5 4
(c) Not possible
 
3 3 4
(d)
6 1 7
(e) Not possible

(f) Not possible



3 6
2.1.10 (a) 9 6
3 3
(b) Not possible.

11 2
(c) 13 6
4 2
(d) Not possible.

7
(e) 9
2
(f) Not possible.

(g) Not possible.


279

 
2
(h)
5

3 0 4
2.1.11 (a) 4 1 6
5 1 6
 
1 2
(b)
2 3
(c) Not possible

4 6
(d) 5 3
1 2
 
8 1 3
(e)
7 6 6

2.1.12
    
1 1 x y x z w y
=
3 3 z w 3x + 3z 3w + 3y
 
0 0
=
0 0
 
x y
Solution is: w = y, x = z so the matrices are of the form .
x y

0 1 2
2.1.13 X T Y = 0 1 2 , XY T = 1
0 1 2

2.1.14
    
1 2 1 2 7 2k + 2
=
3 4 3 k 15 4k + 6
    
1 2 1 2 7 10
=
3 k 3 4 3k + 3 4k + 6

3k + 3 = 15
Thus you must have , Solution is: [k = 4]
2k + 2 = 10

2.1.15
    
1 2 1 2 3 2k + 2
=
3 4 1 k 7 4k + 6
    
1 2 1 2 7 10
=
1 k 3 4 3k + 1 4k + 2
However, 7 6= 3 and so there is no possible choice of k which will make these matrices commute.
280 Selected Exercise Answers

     
1 1 1 1 2 2
2.1.16 Let A = ,B = ,C = .
1 1 1 1 2 2
    
1 1 1 1 0 0
=
1 1 1 1 0 0
    
1 1 2 2 0 0
=
1 1 2 2 0 0
   
1 1 1 1
2.1.18 Let A = ,B = .
1 1 1 1
    
1 1 1 1 0 0
=
1 1 1 1 0 0
   
0 1 1 2
2.1.20 Let A = ,B = .
1 0 3 4
    
0 1 1 2 3 4
=
1 0 3 4 1 2
    
1 2 0 1 2 1
=
3 4 1 0 4 3

1 1 2 0
1 0 2 0
2.1.21 A =
0

0 3 0
1 3 0 3

1 3 2 0
1 0 2 0
2.1.22 A =
0

0 6 0
1 3 0 1

1 1 1 0
1 1 2 0
2.1.23 A =
1

0 1 0
1 0 0 3

2.1.26 (a) Not necessarily true.

(b) Not necessarily true.

(c) Not necessarily true.

(d) Necessarily true.

(e) Necessarily true.


281

(f) Not necessarily true.

(g) Not necessarily true.


 
3 9 3
2.1.27 (a)
6 6 3
 
5 18 5
(b)
11 4 4
 
(c) 7 1 5
 
1 3
(d)
3 9

13 16 1
(e) 16 29 8
1 8 5
 
5 7 1
(f)
5 15 5
(g) Not possible.

1

2.1.28 Show that 2 AT + A is symmetric and then consider using this as one of the matrices. A =
A+AT AAT
2 + 2 .

2.1.29 If A is symmetric then A = AT . It follows that aii = aii and so each aii = 0.

2.1.31 (Im A)i j j ik Ak j = Ai j

2.1.32 Yes B = C. Multiply AB = AC on the left by A1 .



1 0 0
2.1.34 A = 0 1 0
0 0 1

 1 3
17
2 1 7
2.1.35 = 1 2

1 3 7 7

 1 " #
0 1 35 1
5
2.1.36 =
5 3 1 0

 1 " 1
#
2 1 0 3
2.1.37 =
3 0 1 32
282 Selected Exercise Answers

 1 " #
1
2 1 1 2
2.1.38 does not exist. The reduced row-echelon form of this matrix is
4 2 0 0

 1  d b

a b adbc adbc
2.1.39 = c a
c d adbc adbc

1
1 2 3 2 4 5
2.1.40 2 1 4 = 0 1 2
1 0 2 1 2 3

1 2 0 3

1 0 3
1 2
2.1.41 2 3 4 = 0 3 3
1 0 2 1 0 1
5

1 0 3
2
2.1.42 The reduced row-echelon form is 0 1 3 . There is no inverse.
0 0 0

1 1 1
1
1
2 2 2
1 2 0 2 1 1 5
1 3
2 2 2
1 2 0
2.1.43
2
=
1 3 2 1 0 0 1

1 2 1 2 2 3 1 9
4 4 4


x 1
2
2.1.45 (a) y = 3
z 0

x 12
(b) y = 1
z 5

x 3c 2a
y = 1b 2c
3 3
z ac

2.1.46 Multiply both sides of AX = B on the left by A1 .

2.1.47 Multiply on both sides on the left by A1 . Thus



0 = A1 0 = A1 (AX ) = A1 A X = IX = X

2.1.48 A1 = A1 I = A1 (AB) = A1 A B = IB = B.
283

T
2.1.49 You need to show that A1 acts like the inverse of AT because from uniqueness in the above
problem, this will imply it is the inverse. From properties of the transpose,
T T
AT A1 = A1 A = IT = I
 T T
A1 AT = AA1 = IT = I
T 1
Hence A1 = AT and this last matrix exists.
 
2.1.50 (AB) B1 A1 = A BB1 A1 = AA1 = I B1 A1 (AB) = B1 A1 A B = B1 IB = B1 B = I

2.1.51 The proof of this exercise follows from the previous one.
2 2
2.1.52 A2 A1 = AAA1 A1 = AIA1 = AA1 = I A1 A2 = A1 A1 AA = A1 IA = A1 A = I

1
2.1.53 A1 A = AA1 = I and so by uniqueness, A1 = A.

3.1.3 (a) The answer is 31.

(b) The answer is 375.

(c) The answer is 2.

3.1.4
1 2 1

2 1 3 =6

2 1 1

3.1.5
1 2 1

1 0 1 =2

2 1 1

3.1.6
1 2 1

2 1 3 =6

2 1 1

3.1.7
1 0 0 1

2 1 1 0
= 4
0 0 0 2

2 1 3 1

3.1.9 It does not change the determinant. This was just taking the transpose.

3.1.10 In this case two rows were switched and so the resulting determinant is 1 times the first.
284 Selected Exercise Answers

3.1.11 The determinant is unchanged. It was just the first row added to the second.

3.1.12 The second row was multiplied by 2 so the determinant of the result is 2 times the original deter-
minant.

3.1.13 In this case the two columns were switched so the determinant of the second is 1 times the
determinant of the first.

3.1.14 If the determinant is nonzero, then it will remain nonzero with row operations applied to the matrix.
However, by assumption, you can obtain a row of zeros by doing row operations. Thus the determinant
must have been zero after all.

3.1.15 det (aA) = det (aIA) = det (aI)det (A) = an det (A) . The matrix which has a down the main diagonal
has determinant equal to an .

3.1.16   
1 2 1 2
det = 8
3 4 5 6
   
1 2 1 2
det det = 2 4 = 8
3 4 5 6
   
1 0 1 0
3.1.17 This is not true at all. Consider A = ,B = .
0 1 0 1

3.1.18 It must be 0 because 0 = det (0) = det Ak = (det (A))k .
 
3.1.19 You would need det AAT = det (A) det AT = det (A)2 = 1 and so det (A) = 1, or 1.
  
3.1.20 det (A) = det S1 BS = det S1 det (B) det (S) = det (B) det S1 S = det (B).

1 1 2
3.1.21 (a) False. Consider 1 5 4
0 3 3
(b) True.

(c) False.

(d) False.

(e) True.

(f) True.

(g) True.

(h) True.
285

(i) True.
(j) True.

3.1.22
1 2 1

2 3 2 = 6

4 1 2

3.1.23
2 1 3

2 4 2 = 32

1 4 5

3.1.24 One can row reduce this using only row operation 3 to

1 2 1 2
0 5 5 3

0 0 2 9
5
0 0 0 63
10

and therefore, the determinant is 63.



1 2 1 2

3 1 2 3
= 63
1 0 3 1

2 3 2 2

3.1.25 One can row reduce this using only row operation 3 to

1 4 1 2
0 10 5 3

0 0 2 19
5
0 0 0 211
20

Thus the determinant is given by


1 4 1 2

3 2 2 3
= 211
1 0 3 3

2 1 2 2

1 2 3
3.2.26 det 0 2 1 = 13 and so it has an inverse. This inverse is
3 1 0

2 1 0 2 T
0 1
1 0 3 0 3 1 T
1 3 6
1
2 3
1 3


1 2 =
1
3 9 5
13 1 0 3 0 3 1 13
2 3 1 2 4 1 2
1 3
2 1 0 1 0 2
286 Selected Exercise Answers


1 3 4
13 13 13

3 9 1

= 13 13 13

6 5 2
13 13 13


1
27 2
T 7 7

1 2 0 1 3 6 3 1
1
3.2.27 det 0 2 1 = 7 so it has an inverse. This inverse is 7 2 1 5
= 7 7 17


3 1 1 2 1 2 6 5 2
7 7 7

3.2.28
1 3 3
det 2 4 1 = 3
0 1 1
so it has an inverse which is
1 0 3
2 1 5
3 3 3

2
3 13 23

3.2.30
1 0 3
det 1 0 1 = 2
3 1 0
and so it has an inverse. The inverse turns out to equal
1 3
2 2 0
3
9
2 2 1
1 1

2 2 0


1 1
3.2.31
(a) =1
1 2

1 2 3

(b) 0 2 1 = 15
4 1 1

1 2 1

(c) 2 3 0 =0

0 1 2

3.2.32 No. It has a nonzero determinant for all t


287

3.2.33
1 t t2
det 0 1 2t = t 3 + 2
t 0 2

and so it has no inverse when t = 3 2

3.2.34
et cosht sinht
det et sinht cosht = 0
et cosht sinht
and so this matrix fails to have a nonzero determinant at any value of t.

3.2.35
et et cost et sint
det et et cost et sint et sint + et cost = 5et 6= 0
et 2et sint 2et cost
and so this matrix is always invertible.

3.2.36 If det (A) 6= 0, then A1 exists and so you could multiply on both sides on the left by A1 and obtain
that X = 0.

3.2.37 You have 1 = det (A) det (B). Hence both A and B have inverses. Letting X be given,
A (BA I) X = (AB) AX AX = AX AX = 0
and so it follows from the above problem that (BA I)X = 0. Since X is arbitrary, it follows that BA = I.

3.2.38
et 0 0
det 0 et cost et sint = e3t .
t t t t
0 e cost e sint e cost + e sint
Hence the inverse is
T
e2t 0 0 
e3t 0 e2t cost + e2t sint e2t cost e2t sin t
0 e2t sint e2t cos (t)
t
e 0 0
= 0 et (cost + sint) (sint) et
0 et (cost sint) (cost) et

3.2.39
1
et cost sint
et sint cost
et cost sint
1 t 1 t
2e 0 2e
= 12 cost + 21 sint sint 12 sint 12 cost
1 1
2 sint 2 cost cost 12 cost 12 sint
288 Selected Exercise Answers

3.2.40 The given condition is what it takes for the determinant to be non zero. Recall that the determinant
of an upper triangular matrix is just the product of the entries on the main diagonal.

3.2.41 This follows because det (ABC) = det (A) det (B) det (C) and if this product is nonzero, then each
determinant in the product is nonzero and so each of these matrices is invertible.

3.2.42 False.

3.2.43 Solution is: [x = 1, y = 0]

3.2.44 Solution is: [x = 1, y = 0, z = 0] . For example,



1 1 1

2 2 1

1 1 1
y= =0
1 2 1

2 1 1

1 0 1

55
13
4.2.1
21
39

4.2.3
4 3 2
4 = 2 1 2
3 1 1

4.2.4 The system


4 3 2
4 = a1 1 + a2 2
4 1 1
has no solution.

1 2
2 0
4.7.16
3
= 17
1
4 3

4.7.17 This formula says that ~u ~v = k~ukk~vk cos where is the included angle between the two vectors.
Thus
k~u ~vk = k~ukk~vkk cos k k~ukk~vk
and equality holds if and only if = 0 or . This means that the two vectors either point in the same
direction or opposite directions. Hence one is a multiple of the other.
289

4.7.18 This follows from the Cauchy Schwarz inequality and the proof of Theorem 4.29 which only used
the properties of the dot product. Since this new product has the same properties the Cauchy Schwarz
inequality holds for it as well.

4.7.21 A~x ~y = k (A~x)k yk = k i Aki xi yk = i k ATik xi yk =~x AT~y

4.7.22

AB~x ~y = B~x AT~y


= ~x BT AT~y
= ~x (AB)T ~y

Since this is true for all ~x, it follows that, in particular, it holds for

~x = BT AT~y (AB)T ~y

and so from the axioms of the dot product,


   
BT AT~y (AB)T ~y BT AT~y (AB)T ~y = 0

and so BT AT~y (AB)T ~y = ~0. However, this is true for all ~y and so BT AT (AB)T = 0.
h iT h iT
3 1 1 1 4 2
4.7.23
9+1+1 1+16+4
= 3 = 0.197 39 = cos Therefore we need to solve 0.197 39 =
11 21
cos . Thus = 1.7695 radians.

4.7.24 1+4+110
1+4+49
= 0.55555 = cos Therefore we need to solve 0.55555 = cos , which gives
= 2. 031 3 radians.

5
1
14
~u~v 5 5

4.7.25 ~u~u~u = 14 2 = 7
3 15
14

1

1 2
~u~v 5
4.7.26 ~u~u~u = 10 0 = 0
3 32


1
14
h iT h iT 1
1 2 2 1 1 2 3 0 2 17
~u~v =
4.7.27 ~u~u~u= 1+4+9 3 3
14
0
0

4.7.30 No, it does not. The 0 vector has no direction. The formula for proj~0 (~w) doesnt make sense either.
290 Selected Exercise Answers

4.7.31    
~u ~v ~u ~v 1 1
~u 2
~v ~u 2
~v = k~uk2 2 (~u ~v)2 2
+ (~u ~v)2 0
k~vk k~vk k~vk k~vk2
And so
k~uk2 k~vk2 (~u ~v)2
~u~v
You get equality exactly when ~u = proj~v~u = k~vk2
~v in other words, when ~u is a multiple of ~v.

4.7.32

~w proj~v (~w) +~u proj~v (~u)


= ~w +~u (proj~v (~w) + proj~v (~u))
= ~w +~u proj~v (~w +~u)

This follows because


~u ~v ~w ~v
proj~v (~w) + proj~v (~u) = 2
~v + ~v
k~vk k~vk2
(~u + ~w) ~v
= ~v
k~vk2
= proj~v (~w +~u)
 
(vu)
~
4.7.33 (~v proj~u (~v)) ~u =~v ~u ~u
~u =~v ~u ~v ~u = 0. Therefore, ~v =~v proj~u (~v) + proj~u (~v) .
k~uk2
The first is perpendicular to ~u and the second is a multiple of ~u so it is parallel to ~u.

4.9.34 If ~a 6= ~0, then the condition says that k~a ~uk = k~ak sin = 0 for all angles . Hence ~a = ~0 after
all.

3 4 0
4.9.35 0 0 = 18 . So the area is 9.
3 2 0

3 4 1
4.9.36 1 1 = 18 . The area is given by
3 2 7
q
1 1
1 + (18)2 + 49 = 374
2 2
     
4.9.37 1 1 1 2 2 2 = 0 0 0 . The area is 0. It means the three points are on the same
line.

1 3 8
4.9.38 2 2 = 8 . The area is 8 3
3 1 8
291


1 4 6
4.9.39 0 2 = 11 . The area is 36 + 121 + 4 = 161
3 1 2
   
4.9.40 ~i ~j ~j =~k ~j = ~i. However, ~i ~j ~j = ~0 and so the cross product is not associative.

4.9.41 Verify directly from the coordinate description of the cross product that the right hand rule applies
to the vectors ~i, ~j,~k. Next verify that the distributive law holds for the coordinate description of the cross
product. This gives another way to approach the cross product. First define it in terms of coordinates and
then get the geometric properties from this. However, this approach does not yield the right hand rule
property very easily. From the coordinate description,

~a ~b ~a = i jk a j bk ai = jik a j bk ai = jik bk ai a j = ~a ~b ~a

and so ~a ~b is perpendicular to ~a. Similarly, ~a ~b is perpendicular to ~b. Now we need that



k~a ~bk2 = k~ak2 k~bk2 1 cos2 = k~ak2 k~bk2 sin2

and so k~a ~bk = k~akk~bk sin , the area of the parallelogram determined by ~a,~b. Only the right hand rule
is a little problematic. However, you can see right away from the component definition that the right hand
rule holds for each of the standard unit vectors. Thus ~i ~j =~k etc.

~i ~j ~k

1 0 0 =~k

0 1 0


1 7 5

4.9.43 1 2 6 = 113
3 2 3

4.9.44 Yes. It will involve the sum of product of integers and so it will be an integer.

4.9.45 It means that if you place them so that they all have their tails at the same point, the three will lie
in the same plane.
 
4.9.46 ~x ~a ~b = 0

4.9.48 Here [~v,~w,~z] denotes the box product. Consider the cross product term. From the above,

(~v ~w) (~w ~z) = [~v,~w,~z]~w [~w,~w,~z]~v


= [~v,~w,~z]~w

Thus it reduces to
(~u ~v) [~v,~w,~z]~w = [~v,~w,~z] [~u,~v, ~w]
292 Selected Exercise Answers

4.9.49

k~u ~vk2 = i jk u j vk irs ur vs = jr ks kr js ur vs u j vk
= u j vk u j vk uk v j u j vk = k~uk2 k~vk2 (~u ~v)2
It follows that the expression reduces to 0. You can also do the following.
k~u ~vk2 = k~uk2 k~vk2 sin2

= k~uk2 k~vk2 1 cos2
= k~uk2 k~vk2 k~uk2 k~vk2 cos2
= k~uk2 k~vk2 (~u ~v)2
which implies the expression equals 0.

4.9.50 We will show it using the summation convention and permutation symbol.
 
(~u ~v) i = ((~u ~v)i ) = i jk u j vk

= i jk uj vk + i jk uk vk = ~u ~v +~u ~v i
and so (~u ~v) = ~u ~v +~u ~v .
   
4.10.57 The velocity is the sum of two vectors. 50~i + 300
~i + ~j = 50 + 300 ~i + 300
~j. The component
2 2 2
in the direction of North is then 300
= 150 2 and the velocity relative to the ground is
2
 
300 ~ 300 ~
50 + i+ j
2 2
     
4.10.60 Velocity of plane for the first hour: 0 h 150 +i 40 0 = 40 150 . After one hour it is at
 
(40, 150). Next the velocity of the plane is 150 12 23 + 40 0 in miles per hour. After two hours
h i      
it is then at (40, 150) + 150 12 23 + 40 0 = 155 75 3 + 150 = 155.0 279. 9
     
4.10.61 Wind: 0 50 . Direction it needs to travel: (3, 5) 1 . Then you need 250 a b + 0 50
  34
to have this direction where a b is an appropriate unit vector. Thus you need
a2 + b2 = 1

250b + 50 5
=
250a 3
 
Thus a = 35 , b = 45 . The velocity of the plane relative to the ground is 150 250 . The speed of the plane
relative to the ground is given by
q
(150)2 + (250)2 = 291.55 miles per hour
q
It has to go a distance of (300)2 + (500)2 = 583. 10 miles. Therefore, it takes
583. 1
= 2 hours
291. 55
293

     
4.10.62 Water: 2 0 Swimmer: 0 3 Speed relative to earth: 2 3 . It takes him 1/6 of an

hour to get across. Therefore, he ends up travelling 16 4 + 9 = 16 13 miles. He ends up 1/3 mile down
stream.
   
4.10.63 Man: 3 ah b Water: 2 0 Then you need 3a = 2 and so a = 2/3 and hence b = 5/3.
i
The vector is then 23 35 .
 
In the second case, he could not do it. You would need to have a unit vector a b such that 2a = 3
which is not possible.
     
~ ~ ~ ~
4.10.67 proj~D ~F = F~ D ~D = k~Fk cos ~D = k~Fk cos ~u
kDk kDk kDk

20

4.10.68 40 cos 180 100 = 3758.8


4.10.69 20 cos 6 200 = 3464.1


4.10.70 20 cos 4 300 = 4242.6

4.10.71 200 cos 6 20 = 3464.1

4 0
4.10.72 3 1 10 = 30 You can consider the resultant of the two forces because of the prop-

4 0
erties of the dot product.

4.10.73

1 1 1
2 2   2
~F1
1

10 + ~F2 1

10 = ~F1 + ~F2
1

10
2 2 2
0 0 0
1

6 2

= 4 1 10
2
4 0

= 50 2

2 0
1
4.10.74 3 2 20 = 10 2
4 1
2

5.1.1 This result follows from the properties of matrix multiplication.

5.1.2
(a~v + b~w ~u)
T~u (a~v + b~w) = a~v + b~w ~u
k~uk2
(~v ~u) (~w ~u)
= a~v a 2
~u + b~w b ~u
k~uk k~uk2
= aT~u (~v) + bT~u (~w)
294 Selected Exercise Answers

5.1.3 Linear transformations take ~0 to ~0 which T does not. Also T~a (~u +~v) 6= T~a~u + T~a~v.

5.2.4 (a) The matrix of T is the elementary matrix which multiplies the jth diagonal entry of the identity
matrix by b.
(b) The matrix of T is the elementary matrix which takes b times the jth row and adds to the ith row.
(c) The matrix of T is the elementary matrix which switches the ith and the jth rows where the two
components are in the ith and jth positions.

5.2.5 Suppose
~cT1
..  1
. = ~a1 ~an
~cTn
Thus ~cTi ~a j = i j . Therefore,

~cT1
  1   .
~b1 ~bn ~a1 ~an ~ai = ~b1 ~bn .. ~ai
~cTn
 
= ~b1 ~bn ~ei
= ~bi
  1  
Thus T~ai = ~b1 ~bn ~a1 ~an ~ai = A~ai . If~x is arbitrary, then since the matrix ~a1 ~an
 
is invertible, there exists a unique ~y such that ~a1 ~an ~y =~x Hence
! !
n n n n
T~x = T yi~ai = yi T~ai = yi A~ai = A yi~ai = A~x
i=1 i=1 i=1 i=1

5.2.6
5 1 5 3 2 1 37 17 11
1 1 3 2 2 1 = 17 7 5
3 5 2 4 1 1 11 14 6

5.2.7
1 2 6 6 3 1 52 21 9
3 4 1 5 3 1 = 44 23 8
1 1 1 6 2 1 5 4 1

5.2.8
3 1 5 2 2 1 15 1 3
1 3 3 1 2 1 = 17 11 7
3 3 3 4 1 1 9 3 3

5.2.9
3 1 1 6 2 1 29 9 5
3 2 3 5 2 1 = 46 13 8
3 3 1 6 1 1 27 11 5
295

5.2.10
5 3 2 11 4 1 109 38 10
2 3 5 10 4 1 = 112 35 10
5 5 2 12 3 1 81 34 8

~v~u
5.2.14 Recall that proj~u (~v) = k~uk2
~u and so the desired matrix has ith column equal to proj~u (~ei ) . Therefore,
the matrix desired is
1 2 3
1
2 4 6
14
3 6 9

5.2.15
1 5 3
1
5 25 15
35
3 15 9

5.2.16
1 0 3
1
0 0 0
10
3 0 9

5.3.18 The matrix of S T is given by BA.


    
0 2 3 1 2 4
=
4 2 1 2 10 8

Now, (S T ) (~x) = (BA)~x.     


2 4 2 8
=
10 8 1 12

5.3.19 To find (S T ) (~x) we compute S(T (~x)).


    
1 2 2 4
=
1 3 3 11

5.3.21 The matrix of T 1 is A1 .  1  


2 1 2 1
=
5 2 5 2

    1 
cos 3 sin 3 1
3
5.4.24 = 1
2 2
1
sin 3 cos 3 2 3 2

    1 
cos 4 sin 4 2 2 12 2
5.4.25 = 1
sin 4 cos 4 2 2
1
2 2
     
cos 3  sin 3 1
2
1
2 3
5.4.26 =
sin 3 cos 3 12 3 1
2
296 Selected Exercise Answers

 2
 2
   
cos 3 sin 3 21 12 3
5.4.27 2 2 = 1
sin 3 cos 3 2 3 12

5.4.28
      
cos 3  sin 3 cos 4  sin 4
sin 3 cos 3 sin 4 cos 4
 1 1
1 1

4 2
3 + 4 2
1
4 2 4 23
= 1 1 1
4 2 3 4 2 4 2 3+ 4 2

5.4.29       
2 2 1 1
1 0 cos 3 sin 3 2 2 3
2 2 = 1 1
0 1 sin 3 cos 3 2 3 2

5.4.30       

1 0 cos 3 sin 3 1
2 1
2 3
= 1 1
0 1 sin 3 cos 3 2 3 2

5.4.31       1 

1 0 cos 4 sin 4 2 2 1
2 2
= 1 1
0 1 sin 4 cos 4 2 2 2 2

5.4.32       1 

1 0 cos 6 sin 6 2 3 1

2
=
0 1 sin 6 cos 6 1
2
1
2 3

5.4.33       1 

cos 4 sin 4 1 0 2 2 12 2
= 1
sin 4 cos 4 0 1 2 2 2 2
1

5.4.34       1 

cos 4 sin 4 1 0 2 2 12 2
=
sin 4 cos 4 0 1 12 2 21 2

5.4.35       1 

cos 6 sin 6 1 0 2 3 1
2
=
sin 6 cos 6 0 1 1
2 12 3

5.4.36       1 

cos 6 sin 6 1 0 2 3 12
=
sin 6 cos 6 0 1 12 1
2 3

5.4.37       
2
cos 3 sin 23 cos 4  sin 4
=
sin 2
3 cos 23 sin 4 cos 4
 1 1
1
1

4 23 4 2 4 2 3 2
4
1 1 1 1
4 2 3+ 4 2 4 2 3 4 2
Note that it doesnt matter about the order in this case.
297

5.4.38   1
1 0 0 cos 6  sin 6 0 2 3 12 0
0 1 0 sin 6
cos 6
0 = 1 1
0
2 2 3
0 0 1 0 0 1 0 0 1

5.4.39
   
cos ( ) sin ( ) 1 0 cos ( ) sin ( )
sin ( ) cos ( ) 0 1 sin ( ) cos ( )
 
cos2 sin2 2 cos sin
=
2 cos sin sin2 cos2

Now to write in terms of (a, b) , note that a/ a2 + b2 = cos , b/ a2 + b2 = sin . Now plug this in to the
above. The result is " 2 2 #  2 
a b ab
a2 +b2
2 a2 +b2 1 a b2 2ab
= 2
2 a2ab 2
b2 a2
2 2
a + b2 2ab b2 a2
+b a +b

Since this is a unit vector, a2 + b2 = 1 and so you get


 2 
a b2 2ab
2ab b2 a2

6.1.1 (a) z + w = 5 i

(b) z 2w = 4 + 23i

(c) zw = 62 + 5i

(d) w
z = 50 37
53 53 i

z
6.1.4 If z = 0, let w = 1. If z 6= 0, let w =
|z|

6.1.5 (a + bi) (c + di) = ac bd + (ad + bc) i = (ac bd)(ad + bc) i (a bi) (c di) = acbd (ad + bc) i
which is the same thing. Thus it holds for a product of two complex numbers. Now suppose you have that
it is true for the product of n complex numbers. Then

z1 zn+1 = z1 zn zn+1

and now, by induction this equals


z1 zn zn+1
As to sums, this is even easier.
n  n n
x j + iy j = xj +i yj
j=1 j=1 j=1
n n n n 
= xj i yj = x j iy j = x j + iy j .
j=1 j=1 j=1 j=1
298 Selected Exercise Answers

6.1.6 If p (z) = 0, then you have

p (z) = 0 = an zn + an1 zn1 + + a1 z + a0

= an zn + an1 zn1 + + a1 z + a0
= an zn + an1 zn1 + + a1 z + a0
= an zn + an1 zn1 + + a1 z + a0
= p (z)

6.1.7 The problem is that there is no single 1.

6.2.12 You have z = |z| (cos + i sin ) and w = |w| (cos + i sin ) . Then when you multiply these, you
get

|z| |w| (cos + i sin ) (cos + i sin )


= |z| |w| (cos cos sin sin + i (cos sin + cos sin ))
= |z| |w| (cos ( + ) + i sin ( + ))

6.3.13 Solution is:


(1 i) 2, (1 + i) 2, (1 i) 2, (1 + i) 2

6.3.14 The cube roots are the solutions to z3 + 8 = 0, Solution is: i 3 + 1, 1 i 3, 2

6.3.15 The fourth roots are the solutions to z4 + 16 = 0, Solution is:



(1 i) 2, (1 + i) 2, (1 i) 2, (1 + i) 2

6.3.16 Yes, it holds for all integers. First of all, it clearly holds if n = 0. Suppose now that n is a negative
integer. Then n > 0 and so
1 1
[r (cost + i sint)]n = n = r n (cos (nt) + i sin (nt))
[r (cost + i sint)]

rn rn (cos (nt) + i sin (nt))


= =
(cos (nt) i sin (nt)) (cos (nt) i sin (nt)) (cos (nt) + i sin (nt))
= rn (cos (nt) + i sin (nt))

because (cos (nt) i sin (nt)) (cos (nt) + i sin (nt)) = 1.



6.3.17 Solution is: i 3 + 1, 1 i 3, 2 and so this polynomial equals
     
(x + 2) x i 3 + 1 x 1i 3


6.3.18 x3 + 27 = (x + 3) x2 3x + 9
299

6.3.19 Solution is:


(1 i) 2, (1 + i) 2, (1 i) 2, (1 + i) 2.
These are just the fourth roots of 16. Then to factor, you get
     
x (1 i) 2 x (1 + i) 2
     
x (1 i) 2 x (1 + i) 2

  
6.3.20 x4 + 16 2 2
= x 2 2x + 4 x + 2 2x + 4 . You can use the information in the preceding prob-
lem. Note that (x z) (x z) has real coefficients.

6.3.21 Yes, this is true.

(cos i sin )n = (cos ( ) + i sin ( ))n


= cos (n ) + i sin (n )
= cos (n ) i sin (n )

6.3.22 p (x) = (x z1 ) q (x) + r (x) where r (x) is a nonzero constant or equal to 0. However, r (z1 ) = 0 and
so r (x) = 0. Now do to q (x) what was done to p (x) and continue until the degree of the resulting q (x)
equals 0. Then you have the above factorization.

6.4.23
(x (1 + i)) (x (2 + i)) = x2 (3 + 2i) x + 1 + 3i

6.4.24 (a) Solution is: 1 + i, 1 i



(b) Solution is: 61 i 35 16 , 16 i 35 16

(c) Solution is: 3 + 2i, 3 2i



(d) Solution is: i 5 2, i 5 2

(e) Solution is: 12 + i, 12 i



6.4.25 (a) Solution is : x = 1 + 12 2 12 i 2, x = 1 12 2 + 12 i 2

(b) Solution is : x = 1 21 i, x = 1 12 i

(c) Solution is : x = 12 , x = 12 i

(d) Solution is : x = 1 + 2i, x = 1 + 2i


 
(e) Solution is : x = 16 + 16 19 + 16 16 19 i, x = 16 16 19 + 1
6 + 16 19 i

7.1.1 Am X = m X for any integer. In the case of 1, A1 X = AA1 X = X so A1 X = 1 X . Thus the


eigenvalues of A1 are just 1 where is an eigenvalue of A.
300 Selected Exercise Answers

7.1.2 Say AX = X . Then cAX = c X and so the eigenvalues of cA are just c where is an eigenvalue
of A.

7.1.3 BAX = ABX = A X = AX . Here it is assumed that BX = X .

7.1.4 Let X be the eigenvector. Then Am X = m X , Am X = AX = X and so

m =
Hence if 6= 0, then
m1 = 1
and so | | = 1.

7.1.5 The formula follows from properties of matrix multiplications. However, this vector might not be
an eigenvector because it might equal 0 and eigenvectors cannot equal 0.
 
0 1
7.1.14 Yes. works.
0 0

7.1.16 When you think of this geometrically, it is clear that the only two values of are 0 and or these
added to integer multiples of 2
 
1 0
7.1.17 The matrix of T is . The eigenvectors and eigenvalues are:
0 1
   
0 1
1, 1
1 0
 
0 1
7.1.18 The matrix of T is . The eigenvectors and eigenvalues are:
1 0
   
i i
i, i
1 1

1 0 0
7.1.19 The matrix of T is 0 1 0 The eigenvectors and eigenvalues are:
0 0 1

0 1 0
0 1, 0 , 1 1

1 0 0

7.2.20 The eigenvalues are 1, 1, 1. The eigenvectors corresponding to the eigenvalues are:

10 7
2 1, 2 1

3 2
Therefore this matrix is not diagonalizable.
301

7.2.21 The eigenvectors and eigenvalues are:



2 2 7
0 1, 1 1, 2 3

1 0 2

The matrix P needed to diagonalize the above matrix is



2 2 7
0 1 2
1 0 2

and the diagonal matrix D is


1 0 0
0 1 0
0 0 3

7.2.22 The eigenvectors and eigenvalues are:



6 5 8
1 6, 2 3, 2 2

2 2 3

The matrix P needed to diagonalize the above matrix is



6 5 8
1 2 2
2 2 3

and the diagonal matrix D is


6 0 0
0 3 0
0 0 2

7.2.27 The eigenvalues are distinct because they are the nth roots of 1. Hence if X is a given vector with
n
X= a jV j
j=1

then
n n n
Anm X = Anm a jV j = a j AnmV j = a jV j = X
j=1 j=1 j=1

so Anm = I.

7.2.32 AX = (a + ib) X . Now take conjugates of both sides. Since A is real,

AX = (a ib) X
302 Selected Exercise Answers

7.3.33 First we write A = PDP1 .



     12 1
1 2 1 1 1 0 2
= 1 1

2 1 1 1 0 3 2 2

Therefore A10 = PD10 P1 .



 10   10 12 1
1 2 1 1 1 0 2
= 1 1

2 1 1 1 0 3 2 2

   12 1
1 1 (1)10 0
2

=
1 1 0 310 1
2
1
2
 
29525 29524
=
29524 29525

90
7.3.36 (a) Multiply the given matrix by the initial state vector given by 81 . After one time period
85
there are 89 people in location 1, 106 in location 2, and 61 in location 3.

x1s
(b) Solve the system given by (I A)Xs = 0 where A is the migration matrix and Xs = x2s is the
x3s
steady state vector. The solution to this system is given by
8
x1s = x3s
5
63
x2s = x3s
25
Letting x3s = t and using the fact that there are a total of 256 individuals, we must solve
8 63
t + t + t = 256
5 25
We find that t = 50. Therefore after a long time, there are 80 people in location 1, 126 in location 2,
and 50 in location 3.

x1s
7.3.38 We solve (I A)Xs = 0 to find the steady state vector Xs = x2s . The solution to the system is
x3s
given by
5
x1s = x3s
6
2
x2s = x3s
3
303

Letting x3s = t and using the fact that there are a total of 480 individuals, we must solve
5 2
t + t + t = 480
6 3
We find that t = 192. Therefore after a long time, there are 160 people in location 1, 128 in location 2, and
192 in location 3.

7.3.41
0.38
X3 = 0.18
0.44
Therefore the probability of ending up back in location 2 is 0.18.

7.3.42
0.367
X2 = 0.4625
0.1705
Therefore the probability of ending up in location 1 is 0.367.

7.3.43 The migration matrix is


1 1 1 1
3 10 10 5


1 7 1 1
3 10 5 10

A=
2 1 3 1


9 10 5 5


1 1 1 1
9 10 10 2

To find the number of trailers


in each location after a long time we solve system (I A)Xs = 0 for the

x1s
x2s
steady state vector Xs =
x3s . The solution to the system is
x4s
9
x1s = x4s
10
12
x2s = x4s
5
8
x3s = x4s
5
Letting x4s = t and using the fact that there are a total of 413 trailers we must solve
9 12 8
t + t + t + t = 413
10 5 5
We find that t = 70. Therefore after a long time, there are 63 trailers in the SE, 168 in the NE, 112 in the
NW and 70 in the SW.
Index
, 269 cofactor, 89
, 269 expanding along row or column, 90
\, 269 matrix inverse formula, 111
row-echelon form, 16 minor, 88
reduced row-echelon form, 16 product, 96, 101
algorithm, 19 row operations, 94
direction vector, 141
adjugate, 110 distance formula, 135
back substitution, 13 properties, 136
base case, 272 dot product, 145
basic solution, 29 properties, 146
box product, 166 eigenvalue, 223
Cauchy Schwarz inequality, 147 eigenvalues
characteristic equation, 223 calculating, 224
classical adjoint, 110 eigenvector, 223
Cofactor Expansion, 90 eigenvectors
cofactor matrix, 110 calculating, 224
complex eigenvalues, 240 elementary matrix, 67
complex numbers inverse, 69
absolute value, 208 elementary operations, 9
addition, 203 elementary row operations, 15
argument, 210 empty set, 269
conjugate, 205 field axioms, 204
conjugate of a product, 210 force, 171
modulus, 208, 210 Fundamental Theorem of Algebra, 203
multiplication, 204
polar form, 210 Gauss-Jordan Elimination, 24
roots, 213 Gaussian Elimination, 24
standard form, 203
triangle inequality, 208 hyper-planes, 6
component form, 141 idempotent, 80
component of a force, 175, 177 identity transformation, 181
consistent system, 8 included angle, 149
Cramers rule, 115 inconsistent system, 8
cross product, 161, 162 induction hypothesis, 272
area of parallelogram, 164 inner product, 145
coordinate description, 162 intersection, 269
geometric description, 161, 162 intervals
De Moivres theorem, 213 notation, 270
determinant, 87 Kronecker symbol, 59

305
306 INDEX

Laplace expansion, 90 properties, 56


leading entry, 16 vectors, 48
linear combination, 30, 130 migration matrix, 250
linear transformation, 180 multiplicity, 224
composite, 193
matrix, 183 Newton, 171
lines nilpotent, 108
parametric equation, 143 nontrivial solution, 28
symmetric form, 143 parallelepiped, 166
vector equation, 141 volume, 166
main diagonal, 92 parameter, 23
Markov matrices, 249 permutation matrices, 67
mathematical induction, 272 pivot column, 18
matrix, 14, 41 pivot position, 18
addition, 43 plane
augmented matrix, 14, 15 normal vector, 157
coefficient matrix, 14 scalar equation, 158
commutative, 55 vector equation, 157
components of a matrix, 42 polynomials
conformable, 50 factoring, 215
dimension, 14 position vector, 126
entries of a matrix, 42 quadratic formula, 217
equality, 42
equivalent, 27 random walk, 252
finding the inverse, 62 reflection
identity, 58 across a given vector, 201
inverse, 59 resultant, 172
invertible, 59 right handed system, 161
lower triangular, 92 row operations, 15, 94
orthogonal, 108
orthogonally diagonalizable, 247 scalar, 8
properties of addition, 44 scalar product, 145
properties of scalar multiplication, 46 scalars, 129
properties of transpose, 57 set notation, 269
raising to a power, 245 similar matrix, 230
rank, 31 skew lines, 6
scalar multiplication, 45 spectrum, 223
skew symmetric, 58 speed, 173
square, 42 state vector, 250
symmetric, 58 summation notation, 271
transpose, 57 system of equations, 8
upper triangular, 92 homogeneous, 8
matrix form AX=B, 49 matrix form, 49
matrix multiplication, 50 solution set, 8
ijth entry, 53 vector form, 47
INDEX 307

triangle inequality, 148


trigonometry
sum of two angles, 199
trivial solution, 28

union, 269
unit vector, 137

variable
basic, 25
free, 25
vector, 127
addition, 128
addition, geometric meaning, 127
components, 127
corresponding unit vector, 137
length, 137
orthogonal, 151
perpendicular, 151
points and vectors, 126
projection, 152, 153
scalar multiplication, 129
subtraction, 129
vectors, 46
column, 46
row vector, 46
velocity, 173

well ordered, 272


work, 175

zero matrix, 42
zero transformation, 181
zero vector, 129
ADAPTED FORMATIVE ONLINE COURSE COURSE LOGISTICS
OPEN TEXT ASSESSMENT SUPPLEMENTS & SUPPORT

You might also like