
A GENTLE INTRODUCTION TO

ABSTRACT ALGEBRA
by
B.A. Sethuraman
California State University Northridge
Copyright 2012 B.A. Sethuraman.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant Sec-
tions, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled GNU Free Documentation License.
Source files for this book are available at
<http://www.csun.edu/~asethura/giaa/>
Contents
Preface v
To the Student: How to Read a Mathematics Book ix
1 Divisibility in the Integers 1
2 Rings and Fields 23
2.1 Rings: Definition and Examples . . . . . . . . . . . . . . . . . 23
2.2 Subrings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3 Integral Domains and Fields . . . . . . . . . . . . . . . . . . . 45
2.4 Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Quotient Rings . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6 Ring Homomorphisms and Isomorphisms . . . . . . . . . . . 63
2.7 Further Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 77
3 Vector Spaces 95
3.1 Vector Spaces: Definition and Examples . . . . . . . . . . . . 95
3.2 Linear Independence, Bases, Dimension . . . . . . . . . . . . 103
3.3 Subspaces and Quotient Spaces . . . . . . . . . . . . . . . . . 125
3.4 Vector Space Homomorphisms: Linear Transformations . . . 133
3.5 Further Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 148
4 Groups 157
4.1 Groups: Definition and Examples . . . . . . . . . . . . . . . . 157
4.2 Subgroups, Cosets, Lagrange's Theorem . . . . . . . . . . . . 180
4.3 Normal Subgroups, Quotient Groups . . . . . . . . . . . . . . 192
4.4 Group Homomorphisms and Isomorphisms . . . . . . . . . . . 197
4.5 Further Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 204
A Sets, Functions, and Relations 215
B Partially Ordered Sets, Zorn's Lemma 219
C GNU Free Documentation License 227
GNU Free Documentation License 227
1. APPLICABILITY AND DEFINITIONS . . . . . . . . . . . . . 228
2. VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . . 230
3. COPYING IN QUANTITY . . . . . . . . . . . . . . . . . . . . 230
4. MODIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . 231
5. COMBINING DOCUMENTS . . . . . . . . . . . . . . . . . . . 233
6. COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . 234
7. AGGREGATION WITH INDEPENDENT WORKS . . . . . . 234
8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . 235
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . 235
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . 236
11. RELICENSING . . . . . . . . . . . . . . . . . . . . . . . . . . 236
ADDENDUM: How to use this License for your documents . . . . 237
Preface
This book is a gentle introduction to abstract algebra. It is ideal as a text
for a one-semester course designed to provide a first exposure of the subject
to students in mathematics, science, or engineering. Such a course would
teach students the basic objects of algebra, providing plentiful examples
and enough theory to allow interested students to transition easily to more
advanced abstract algebra. At the same time, this course would allow future
users of the subject, including students interested in various other subfields
of mathematics and students of science and engineering, to gain enough
familiarity with the objects of algebra to be able to study them further
within the manifold contexts in which they are needed.
Thus, this book deals with groups, rings and fields, and vector spaces.
The approach to these objects is elementary, with a focus on examples and
on computation with these examples. The book starts with rings, reflecting
my experience that students find rings easier to grasp as an abstraction
since they are already familiar with the integers, the rationals, the reals, the
complexes, 2 × 2 matrices with real entries, and polynomials with real
coefficients. Vector spaces are treated next, followed by groups. It is expected
that students have had some exposure to proof-based mathematics, such as
can be obtained in basic proofs courses common in many American uni-
versities. Such students are likely to be familiar with the properties of the
integers already, but for completeness, a preliminary chapter on divisibility
in the integers has been included. Material on sets, functions, and relations,
that belongs more commonly to a proofs course, has also been provided as
an appendix.
The style of the book is conversational (a style that mirrors my own
approach to teaching), with a stress on exposition. I have attempted to
show that there are some common themes to the study of the three objects:
rings, vector spaces, and groups. For each, I introduce the object using a
large number of examples. For each, I introduce their various subobjects
(subrings, ideals, subspaces, subgroups, normal subgroups), again with nu-
merous examples. I introduce quotient objects, and then for each object I
introduce the appropriate notion of homomorphism and isomorphism. I end
with the fundamental homomorphism theorem for each object. I find that
when students see the same concept three different times in mildly different
guises, such as the notion of a structure preserving map, the notion of a
kernel, or the notion of an appropriate quotient object, they become quite
comfortable with these concepts by the end of the semester. For example, I
find that they have no trouble with quotient groups (a traditionally difficult
idea to convey if abstract algebra is introduced first through groups) since
they have already computed with quotient rings in more intuitive settings
such as the integers mod n or the polynomials over a field mod a linear or
quadratic polynomial.
The entire material in the book can be covered in a traditional sixteen
week semester, judiciously speeding up here and there. Besides copious ex-
amples and exercises (most of a computational kind, based on the examples,
and some that extend the theory developed in the text), each chapter comes
with end notes: remarks about various aspects of the theory, occasional hints
to some exercise, and several glimpses into material beyond the course. The
book shares some material with an earlier text I wrote called Rings, Fields
and Vector Spaces, but the focus and end goal of the two books are quite
different.
I am grateful to the various faculty members at California State University
Northridge who have taught the introductory abstract algebra course,
Math 360, for several years now from this book. I am also grateful to the
students in the course; together, both the faculty and students have pro-
vided valuable feedback. The National Science Foundation has supported
me professionally through two research grants during much of the time when
this book was being developed, and I am grateful to them.
I owe a special debt of gratitude to the most extraordinary student I have
ever worked with, one whom I have never met. He will remain unnamed
here. He is currently in prison, but rather than succumb to circumstances,
he chose the positive route, and enrolled in mathematics courses at Califor-
nia State University Northridge as an extension student. Faculty would send
him course material by U.S. mail (the only form of interaction he is allowed
under incarceration), and he would complete his assignments under super-
vision and mail them back. He offered to read through this book and give
suggestions, an offer I readily accepted. I was amazed when I received his
edit suggestions! I have yet to see such meticulousness in any student, such
attention to the right word, such alertness for the clumsy phrase. But more
importantly, he proved to be a brilliant student, and made several powerful
suggestions that went beyond the writing and into the mathematics. There
are many explanations here and many additional remarks that owe their
existence to him. (All errors that remain, of course, are to be blamed on
me.) I was privileged that he learned abstract algebra from this book, and
to him I would like to say: Thank you, my friend! I hope to meet you some
day.
B.A. Sethuraman
California State University Northridge
To the Student: How to
Read a Mathematics Book
How should you read a mathematics book? The answer, which applies to
every book on mathematics, and in particular to this one, can be given in
one word: actively. You may have heard this before, but it can never be
overstressed: you can only learn mathematics by doing mathematics. This
means much more than attempting all the problems assigned to you (al-
though attempting every problem assigned to you is a must). What it
means is that you should take time out to think through every sentence
and confirm every assertion made. You should accept nothing on trust; in-
stead, not only should you check every statement, you should also attempt
to go beyond what is stated, searching for patterns, looking for connections
with other material that you may have studied, and probing for possible
generalizations.
Let us consider an example. On page 29 in Chapter 2, you will find the
following sentence:
Yet, even in this extremely familiar number system, multi-
plication is not commutative; for instance,

\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\neq
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\]

(The number system referred to is the set of 2 × 2 matrices whose entries
are real numbers.) When you read a sentence such as this, the first thing
that you should do is verify the computation yourselves. Mathematical in-
sight comes from mathematical experience, and you cannot expect to gain
mathematical experience if you merely accept somebody else's word that
the product on the left side of the equation does not equal the product on
the right side.
The very process of multiplying out these matrices will make the set
of 2 × 2 matrices a more familiar system of objects, but as you do the
calculations, more things can happen if you keep your eyes and ears open.
Some or all of the following may occur:
1. You may notice that not only are the two products not the same,
but that the product on the right side gives you the zero matrix. This
should make you realize that although it may seem impossible that two
nonzero numbers can multiply out to zero, this is only because you
are confining your thinking to the real or complex numbers. Already,
the set of 2 × 2 matrices (with which you have at least some familiarity)
contains nonzero elements whose product is zero.
2. Intrigued by this, you may want to discover other pairs of nonzero
matrices that multiply out to zero. You will do this by taking arbitrary
pairs of matrices and determining their product. It is quite probable
that you will not find an appropriate pair. At this point you may be
tempted to give up. However, you should not. You should try to be
creative, and study how the entries in the various pairs of matrices
you have selected affect the product. It may be possible for you to
change one or two entries in such a way that the product comes out
to be zero. For instance, suppose you consider the product

\[
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 4 & 0 \\ 2 & 0 \end{pmatrix}
=
\begin{pmatrix} 6 & 0 \\ 6 & 0 \end{pmatrix}.
\]

You should observe that no matter what the entries of the first matrix
are, the product will always have zeros in the (1, 2) and the (2, 2) slots.
This gives you some freedom to try to adjust the entries of the rst
matrix so that the (1, 1) and the (2, 1) slots also come out to be zero.
After some experimentation, you should be able to do this.
3. You may notice a pattern in the two matrices that appear in our
inequality on page ix. Both matrices have only one nonzero entry, and
that entry is a 1. Of course, the 1 occurs in different slots in the two
matrices. You may wonder what sorts of products occur if you take
similar pairs of matrices, but with the nonzero 1 occurring at other
locations. To settle your curiosity, you will multiply out pairs of such
matrices, such as

\[
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\quad\text{or}\quad
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
\]

You will try to discern a pattern behind how such matrices multiply.
To help you describe this pattern, you will let e_{i,j} stand for the matrix
with 1 in the (i, j)-th slot and zeros everywhere else, and you will try
to discover a formula for the product of e_{i,j} and e_{k,l}, where i, j, k, and
l can each be any element of the set {1, 2}.
4. You may wonder whether the fact that we considered only 2 × 2 ma-
trices is significant when considering noncommutative multiplication
or when considering the phenomenon of two nonzero elements that
multiply out to zero. You will ask yourselves whether the same phe-
nomena occur in the set of 3 × 3 matrices or 4 × 4 matrices. You will
next ask yourselves whether they occur in the set of n × n matrices,
where n is arbitrary. But you will caution yourselves about letting n
be too arbitrary. Clearly n needs to be a positive integer, since n × n
matrices is meaningless otherwise, but you will wonder whether n can
be allowed to equal 1 if you want such phenomena to occur.
5. You may combine 3 and 4 above, and try to define the matrices e_{i,j}
analogously in the general context of n × n matrices. You will study
the product of such matrices in this general context and try to discover
a formula for their product. (A computational sketch follows this list.)
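The program in steps 1 through 5 can also be carried out by machine. Here is a small Python sketch (ours, not part of the original text; the helper names mat_mult and e are our own, and indices run from 0 rather than 1, as is usual in code) that multiplies out every pair e_{i,j} e_{k,l} of 2 × 2 matrices:

```python
def mat_mult(A, B):
    """Multiply two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def e(i, j, n):
    """The n x n matrix with 1 in the (i, j) slot and zeros elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

n = 2  # try 3 or 4 to explore step 4's question about larger matrices
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                print((i, j), (k, l), mat_mult(e(i, j, n), e(k, l, n)))
# Staring at the output suggests the pattern: e_{i,j} e_{k,l} equals
# e_{i,l} when j == k, and the zero matrix otherwise.
```

Of course, the point of the exercise is to discover and then prove this pattern by hand; the sketch only supplies raw data to stare at.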
Notice that a single sentence can lead to an enormous amount of mathe-
matical activity! Every step requires you to be alert and actively involved
in what you are doing. You observe patterns for yourselves, you ask your-
selves questions, and you try to answer these questions on your own. In
the process, you discover most of the mathematics yourselves. This is re-
ally the only way to learn mathematics (and in particular, it is the way
every professional mathematician has learned the subject). Mathematical
concepts are developed precisely because mathematicians observe patterns
in various mathematical objects (such as the 2 × 2 matrices), and to have a
good understanding of these concepts you must try to notice these patterns
for yourselves.
May you spend many many hours happily playing in the rich and beau-
tiful world of mathematics!
Exercises
1. Carry out the program in steps (1) through (5) above.
Chapter 1
Divisibility in the Integers
We will begin our study with a very concrete set of objects, the integers, that
is, the set {0, 1, −1, 2, −2, . . .}. This set is traditionally denoted Z and is very
familiar to us; in fact, we were introduced to this set so early in our lives that
we think of ourselves as having grown up with the integers. Moreover, we
view ourselves as having completely absorbed the process of integer division;
we unhesitatingly say that 3 divides 99 and equally unhesitatingly say that
5 does not divide 101.
As it turns out, this very familiar set of objects has an immense amount of
structure to it. It turns out, for instance, that there are certain distinguished
integers (the primes) that serve as building blocks for all other integers.
These primes are rather beguiling objects; their existence has been known
for over two thousand years, yet there are still several unanswered questions
about them. They serve as building blocks in the following sense: every
positive integer greater than 1 can be expressed uniquely as a product of
primes. (Negative integers less than −1 also factor into a product of primes,
except that they have a minus sign in front of the product.)
The fact that nearly every integer breaks up uniquely into building blocks
is an amazing one; this is a property that holds in very few number systems,
and our goal in this chapter is to establish this fact. (In the exercises to
Chapter 2 we will see an example of a number system whose elements do not
factor uniquely into building blocks. Chapter 2 will also contain a discussion
of what a number system is; see Remark 2.8.)
We will begin by examining the notion of divisibility and defining divisors
and multiples. We will study the division algorithm and how it follows from
the Well-Ordering Principle. We will explore greatest common divisors and
the notion of relative primeness. We will then introduce primes and prove
our factorization theorem. Finally, we will look at what is widely considered
as the ultimate illustration of the elegance of pure mathematics: Euclid's
proof that there are infinitely many primes.
Let us start with something that seems very innocuous, but is actually
rather profound. Write N for the set of nonnegative integers, that is, N =
{0, 1, 2, 3, . . .}. (N stands for natural numbers, as the nonnegative integers
are sometimes referred to.) Let S be any nonempty subset of N. For example,
S could be the set {0, 5, 10, 15, . . .}, or the set {1, 4, 9, 16, . . .}, or else the
set {100, 1000}. The following is rather obvious: there is an element in S
that is smaller than every other element in S, that is, S has a smallest or
least element. This fact, namely that every nonempty subset of N has a least
element, turns out to be a crucial reason why the integers possess all the
other beautiful properties (such as a notion of divisibility, and the existence
of prime factorizations) that make them so interesting.
Contrast the integers with another very familiar number system, the
rationals, that is, the set {a/b | a and b are integers, with b ≠ 0}. (This set
is traditionally denoted by Q.) Can you think of a nonempty subset of the
positive rationals that fails to have a least element?
We will take this property of the integers as a fundamental axiom, that
is, we will merely accept it as given and not try to prove it from more
fundamental principles. Also, we will give it a name:
Well-Ordering Principle: Every nonempty subset of the nonnegative
integers has a least element.
Now let us look at divisibility. Why do we say that 2 divides 6? It is
because there is another integer, namely 3, such that the product 2 times 3
exactly gives us 6. On the other hand, why do we say that 2 does not divide
7? This is because no matter how hard we search, we will not be able to
find an integer b such that 2 times b equals 7. This idea will be the basis of
our definition:
Definition 1.1. A (nonzero) integer d is said to divide an integer a (denoted
d | a) if there exists an integer b such that a = db. If d divides a, then d is
referred to as a divisor of a or a factor of a, and a is referred to as a multiple
of d.
Observe that this is a slightly more general definition than most of us
are used to: according to this definition, −2 divides 6 as well, since there
exists an integer, namely −3, such that −2 times −3 equals 6. Similarly, 2
divides −6, since 2 times −3 equals −6. More generally, if d divides a, then
all of the following are also true: d | −a, −d | a, −d | −a. (Take a minute to
prove this formally!) It is quite reasonable to include negative integers in
our concept of divisibility, but for convenience, we will often focus on the
case where the divisor is positive.
The following easy result will be very useful:
Lemma 1.2. If d is a nonzero integer such that d | a and d | b for two integers
a and b, then for any integers x and y, d | (xa + yb). (In particular, d | (a + b)
and d | (a − b).)
Proof. Since d | a, a = dm for some integer m. Similarly, b = dn for some
integer n. Hence xa + yb = xdm + ydn = d(xm + yn). Since we have
succeeded in writing xa + yb as d times the integer xm + yn, we find that
d | (xa + yb). As for the statement in the parentheses, taking x = 1 and y = 1,
we find that d | (a + b), and taking x = 1 and y = −1, we find that d | (a − b). □
Question 1.3. If a nonzero integer d divides both a and a + b,
must d divide b as well?
The following lemma holds the key to the division process. Its statement
is often referred to as the division algorithm. The Well-Ordering Principle
plays a central role in its proof.
Lemma 1.4. (Division Algorithm) Given integers a and b with b > 0, there
exist unique integers q and r, with 0 ≤ r < b, such that a = bq + r.
Remark 1.5. First, observe the range that r lies in. It is constrained to lie
between 0 and b − 1 (with both 0 and b − 1 included as possible values for
r). Next, observe that the lemma does not just state that integers q and r
exist with 0 ≤ r < b and a = bq + r; it goes further: it states that these
integers q and r are unique. This means that if somehow one were to have
a = bq_1 + r_1 and a = bq_2 + r_2 for integers q_1, r_1, q_2, and r_2 with 0 ≤ r_1 < b
and 0 ≤ r_2 < b, then q_1 must equal q_2 and r_1 must equal r_2. The integer q is
referred to as the quotient and the integer r is referred to as the remainder.
Proof of Lemma 1.4. Let S be the set {a − bn | n ∈ Z}. Thus, S contains
the following integers: a (= a − b · 0), a − b, a + b, a − 2b, a + 2b, a − 3b,
a + 3b, etc. Let S′ be the set of all those elements in S that are nonnegative,
that is, S′ = {a − bn | n ∈ Z, and a − bn ≥ 0}. It is not immediate that
S′ is nonempty, but if we think a bit harder about this, it will be clear
that S′ indeed has elements in it. For if a is nonnegative, then a ∈ S′. If
a is negative, then a − ba is nonnegative (check! remember that b itself is
positive, by hypothesis), so a − ba ∈ S′. By the Well-Ordering Principle,
since S′ is a nonempty subset of N, S′ has a least element; call it r. (The
notation r is meant to be suggestive; this element will be the r guaranteed
by the lemma.)
Since r is in S (actually in S′ as well), r must be expressible as a − bq
for some integer q, since every element of S is expressible as a − bn for some
integer n. (The notation q is also meant to be suggestive; this integer will
be the q guaranteed by the lemma.) Since r = a − bq, we find a = bq + r.
What we need to do now is to show that 0 ≤ r < b, and that q and r are
unique.
Observe that since r is in S′ and since all elements of S′ are nonnegative,
r must be nonnegative, that is, 0 ≤ r. Now suppose r ≥ b. We will arrive
at a contradiction: Write r = b + x, where x ≥ 0 (why is x ≥ 0?). Writing
b + x for r in a = bq + r, we find a = bq + b + x, or a = b(q + 1) + x, or
x = a − b(q + 1). This form of x shows that x belongs to the set S (why?).
Since we have already seen that x ≥ 0, we find further that x ∈ S′. But
more is true: since x = r − b and b > 0, x must be less than r (why?). Thus,
x is an element of S′ that is smaller than r, a contradiction to the fact that
r is the least element of S′! Hence, our assumption that r ≥ b must have
been false, so r < b. Putting this together with the fact that 0 ≤ r, we find
that 0 ≤ r < b, as desired.
Now for the uniqueness of q and r. Suppose a = bq + r and as well,
a = bq′ + r′, for integers q, r, q′, and r′ with 0 ≤ r < b and 0 ≤ r′ < b.
Then b(q − q′) = r′ − r. Thus, r′ − r is a multiple of b. Now the fact that
0 ≤ r < b and 0 ≤ r′ < b shows that −b < r′ − r < b. (Convince yourselves
of this!) The only multiple of b in the range (−b, b) (both endpoints of the
range excluded) is 0. Hence, r′ − r must equal 0, that is, r′ = r. It follows
that b(q − q′) = 0, and since b ≠ 0, we find that q = q′. □
Observe that to test whether a given (positive) integer d divides a given
integer a, it is enough to write a as dq + r (0 ≤ r < d) as in Lemma 1.4
and examine whether the remainder r is zero or not. For d | a if and only if
there exists an integer x such that a = dx. View this as a = dx + 0. By the
uniqueness part of Lemma 1.4, we find that a = dx + 0 if and only if q = x
and r = 0.
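This is worth trying mechanically. The following Python sketch (ours, not the book's; Python's built-in divmod already performs this computation) finds q and r the way the proof of Lemma 1.4 does, by walking to the least nonnegative element of the set S = {a − bn | n ∈ Z}:

```python
def divide(a, b):
    """Return (q, r) with a = b*q + r and 0 <= r < b, assuming b > 0."""
    assert b > 0
    q, r = 0, a
    while r < 0:       # climb up through S until we reach S' (nonnegative)
        r += b
        q -= 1
    while r >= b:      # step down to the least element of S'
        r -= b
        q += 1
    return q, r

print(divide(38, 5))   # (7, 3), since 38 = 5*7 + 3
print(divide(-7, 3))   # (-3, 2), since -7 = 3*(-3) + 2
assert divide(38, 5) == divmod(38, 5)   # agrees with the built-in
```

Notice that divide(-7, 3) returns (-3, 2) rather than (-2, -1): the remainder is always taken nonnegative, exactly as the lemma requires.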
Now, given two nonzero integers a and b, it is natural to wonder whether
they have any divisors in common. Notice that 1 is automatically a common
divisor of a and b, no matter what a and b are. Recall that |a| denotes the
absolute value of a, and notice that every divisor d of a is less than or equal
to |a|. (Why? Notice, too, that |a| is a divisor of a.) Also, for every divisor
d of a, we must have d ≥ −|a|. (Why? Notice, too, that −|a| is a divisor
of a.) Similarly, every divisor d of b must be less than or equal to |b| and
greater than or equal to −|b| (and both |b| and −|b| are divisors of b). It
follows that every common divisor of a and b must be less than or equal to
the lesser of |a| and |b|, and must be greater than or equal to the greater of
−|a| and −|b|. Thus, there are only finitely many common divisors of a and
b, and they all lie in the range max(−|a|, −|b|) to min(|a|, |b|).
We will now focus on a very special common divisor of a and b.
Definition 1.6. Given two (nonzero) integers a and b, the greatest common
divisor of a and b (written as gcd(a, b)) is the largest of the common divisors
of a and b.
Note that since there are only finitely many common divisors of a and
b, it makes sense to talk about the largest of the common divisors.
Question 1.7. By contrast, must an infinite set of integers nec-
essarily have a largest element? Must an infinite set of integers
necessarily fail to have a largest element? What would your answers
to these two questions be if we restricted our attention to an infinite
set of positive integers? How about if we restricted our attention to
an infinite set of negative integers?
Notice that since 1 is already a common divisor, the greatest common
divisor of a and b must be at least as large as 1. We can conclude from this
that the greatest common divisor of two nonzero integers a and b must be
positive.
Question 1.8. If p and q are two positive integers and if q divides
p, what must gcd(p, q) be?
See the notes on Page 20 for a discussion on the restriction that both a
and b be nonzero in Definition 1.6 above.
Let us derive an alternative formulation for the greatest common divisor
that will be very useful. Given two nonzero integers a and b, any integer
that can be expressed in the form xa + yb for some integers x and y is called
a linear combination of a and b. (For example, a = 1 · a + 0 · b is a linear
combination of a and b; so are 3a − 5b, −6a + 10b, −b = 0 · a + (−1) · b, etc.)
Write P for the set of linear combinations of a and b that are positive. (For
instance, if a = 2 and b = 3, then −2 = (−1) · 2 + (0) · 3 would not be in
P as −2 is negative, but 7 = 2 · 2 + 3 would be in P as 7 is positive.) Now
here is something remarkable: the smallest element in P turns out to be the
greatest common divisor of a and b! We will prove this below.
Theorem 1.9. Given two nonzero integers a and b, let P be the set {xa +
yb | x, y ∈ Z, xa + yb > 0}. Let d be the least element in P. Then d =
gcd(a, b). Moreover, every element of P is divisible by d.
Proof. First observe that P is not empty. For if a > 0, then a ∈ P, and if
a < 0, then −a ∈ P. Thus, since P is a nonempty subset of N (actually, of
the positive integers as well), the Well-Ordering Principle guarantees that
there is a least element d in P, as claimed in the statement of the theorem.
To show that d = gcd(a, b), we need to show that d is a common divisor
of a and b, and that d is the largest of all the common divisors of a and b.
First, since d ∈ P, and since every element in P is a linear combination
of a and b, d itself can be written as a linear combination of a and b. Thus,
there exist integers x and y such that d = xa + yb. (Note: These integers x
and y need not be unique. For instance, if a = 4 and b = 6, we can express
2 as both (−1) · 4 + 1 · 6 and (−4) · 4 + 3 · 6. However, this will not be a
problem; we will simply pick one pair x, y for which d = xa + yb and stick
to it.)
Let us show that d is a common divisor of a and b. Write a = dq + r for
integers q and r with 0 ≤ r < d (division algorithm). We need to show that
r = 0. Suppose to the contrary that r > 0. Write r = a − dq. Substituting
xa + yb for d, we find that r = (1 − xq)a + (−yq)b. Thus, r is a positive
linear combination of a and b that is less than d, a contradiction, since d is
the smallest positive linear combination of a and b. Hence r must be zero,
that is, d must divide a. Similarly, one can prove that d divides b as well,
so that d is indeed a common divisor of a and b.
Now let us show that d is the largest of the common divisors of a and
b. This is the same as showing that if c is any common divisor of a and b,
then c must be no larger than d. So let c be any common divisor of a and b.
Then, by Lemma 1.2 and the fact that d = xa + yb, we find that c | d. Thus,
c ≤ |d| (why?). But since d is positive, |d| is the same as d. Thus, c ≤ d, as
desired.
To prove the last statement of the theorem, note that we have already
proved that d | a and d | b. By Lemma 1.2, d must divide all linear combina-
tions of a and b, and must hence divide every element of P.
We have thus proved our theorem. □
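Theorem 1.9 is pleasant to confirm experimentally. The following brute-force Python sketch (ours; the search window bound=50 is an arbitrary choice, adequate for small a and b) computes the least positive linear combination and compares it with the greatest common divisor:

```python
from math import gcd

def least_positive_combination(a, b, bound=50):
    """Smallest positive value of x*a + y*b over the window |x|, |y| <= bound."""
    return min(x * a + y * b
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1)
               if x * a + y * b > 0)

for a, b in [(4, 6), (48, 30), (35, -22)]:
    print(a, b, least_positive_combination(a, b), gcd(a, b))
    # the last two columns agree in every row
```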
In the course of proving Theorem 1.9 above, we have actually proved
something else as well, which we will state as a separate result:
Proposition 1.10. Every common divisor of two nonzero integers a and b
divides their greatest common divisor.
Proof. As remarked above, the ideas behind the proof of this proposition are
already contained in the proof of Theorem 1.9 above. We saw there that if
c is any common divisor of a and b, then c must divide d, where d is the
minimum of the set P defined in the statement of the theorem. But this,
along with the other arguments in the proof of the theorem, showed that d
must be the greatest common divisor of a and b. Thus, to say that c divides
d is really to say that c divides the greatest common divisor of a and b, thus
proving the proposition. □
Exercise 1.37 will yield yet another description of the greatest common
divisor.
Question 1.11. Given two nonzero integers a and b for which one
can find integers x and y such that xa + yb = 2, can you conclude
from Theorem 1.9 that gcd(a, b) = 2? If not, why not? What,
then, are the possible values of gcd(a, b)? Now suppose there exist
integers x′ and y′ such that x′a + y′b = 1. Can you conclude that
gcd(a, b) = 1? (See the notes on Page 20 after you have thought
about these questions for at least a little bit yourselves!)
Given two nonzero integers a and b, we noted that 1 is a common divisor
of a and b. In general, a and b could have other common divisors greater
than 1, but in certain cases, it may turn out that the greatest common
divisor of a and b is precisely 1. We give a special name to this:
Definition 1.12. Two nonzero integers a and b are said to be relatively prime
if gcd(a, b) = 1.
We immediately have the following:
Corollary 1.13. Given two nonzero integers a and b, gcd(a, b) = 1 if and
only if there exist integers x and y such that xa + yb = 1.
Proof. You should be able to prove this yourselves! (See Question 1.11
above.) □
The following lemma will be useful:
Lemma 1.14. Let a and b be positive integers, and let c be a third integer.
If a | bc and gcd(a, b) = 1, then a | c.
Proof. Since gcd(a, b) = 1, Theorem 1.9 shows that there exist integers x
and y such that 1 = xa + yb. Multiplying by c, we find that c = xac + ybc.
Since a | a and a | bc, a must divide c by Lemma 1.2. □
We are now ready to introduce the notion of a prime!
Definition 1.15. An integer p greater than 1 is said to be prime if its only
positive divisors are 1 and p. (An integer greater than 1 that is not prime is
said to be composite.)
The first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. The
hundredth prime is 541.
Primes are intriguing things to study. On the one hand, they should be
thought of as being simple, in the sense that their only positive divisors are 1
and themselves. (This is sometimes described by the statement: primes have
no nontrivial divisors.) On the other hand, there is an immense number
of questions about them that are still unanswered, or at best, only partially
answered. For instance: is every even integer greater than 4 expressible as a
sum of two primes? (This is known as Goldbach's conjecture. The answer
is unknown.) Are there infinitely many twin primes? (The answer to this is
also unknown.) Is there any pattern to the occurrence of the primes among
the integers? Here, some partial answers are known. The following is just a
sample: There are arbitrarily large gaps between consecutive primes, that
is, given any n, it is possible to find two consecutive primes that differ by at
least n. (See Exercise 1.31.) It is known that for any n > 1, there is always
a prime between n and 2n. (It is unknown whether there is a prime between
n^2 and (n + 1)^2, however!) It is known that as n becomes very large, the
number of primes less than n is approximately n/ln(n), in the sense that the
ratio between the number of primes less than n and n/ln(n) approaches 1
as n becomes large. (This is the celebrated Prime Number Theorem.) Also,
it is known that given any arithmetic sequence a, a + d, a + 2d, a + 3d, . . . ,
where a and d are nonzero integers with gcd(a, d) = 1, infinitely many of
the integers that appear in this sequence are primes!
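None of these deeper results is proved in this book, but the Prime Number Theorem at least is easy to watch numerically. A short Python sketch (ours) counts the primes below n with the sieve of Eratosthenes and prints the ratio to n/ln(n):

```python
from math import log

def count_primes_up_to(n):
    """Count primes <= n with the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return sum(sieve)

for n in (10**3, 10**4, 10**5, 10**6):
    pi_n = count_primes_up_to(n)
    print(n, pi_n, round(pi_n / (n / log(n)), 3))   # ratio creeps toward 1
```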
Those of you who find this fascinating should delve deeper into number
theory, which is the branch of mathematics that deals with such questions.
It is a wonderful subject with hordes of problems that will seriously challenge
your creative abilities! For now, we will content ourselves with proving the
unique prime factorization property and the infinitude of primes already
referred to at the beginning of this chapter.
The following lemmas will be needed:
Lemma 1.16. Let p be a prime and a an arbitrary integer. Then either p | a
or else gcd(p, a) = 1.
Proof. If p already divides a, we have nothing to prove, so let us assume
that p does not divide a. We need to prove that gcd(p, a) = 1. Write x for
gcd(p, a). By definition x divides p. Since the only positive divisors of p
are 1 and p, either x = 1 (which is what we want to show), or else x = p.
Suppose x = p. Then, as x divides a as well, we find p divides a. But we
have assumed that p does not divide a. Hence x = 1. □
Lemma 1.17. Let p be a prime. If p | ab for two integers a and b, then either
p | a or else p | b.
Proof. If p already divides a, we have nothing to prove, so let us assume that
p does not divide a. Then by Lemma 1.16, gcd(p, a) = 1. It now follows
from Lemma 1.14 that p | b. □
The following generalization of Lemma 1.17 will be needed in the proof
of Theorem 1.19 below:
Exercise 1.18. Show using induction and Lemma 1.17 that if a
prime p divides a product of integers a_1 a_2 · · · a_k (k ≥ 2), then p
must divide one of the a_i's.
We are ready to prove our factorization theorem!
Theorem 1.19. (Fundamental Theorem of Arithmetic) Every positive in-
teger greater than 1 can be factored into a product of primes. The primes
that occur in any two factorizations are the same, except perhaps for the
order in which they occur in the factorization.
Remark 1.20. The statement of this theorem has two parts to it. The first
sentence is an existence statement: it asserts that for every positive integer
greater than 1, a prime factorization exists. The second sentence is a unique-
ness statement. It asserts that except for rearrangement, there can only be
one prime factorization. To understand this second assertion a little better,
consider the two factorizations of 12 as 12 = 3 · 2 · 2, and 12 = 2 · 3 · 2.
The orders in which the 2s and the 3 appear are different, but in both fac-
torizations, 2 appears twice, and 3 appears once. The uniqueness part of
the theorem tells us that no matter how 12 is factored, we will at most be
able to rearrange the order in which the two 2s and the 3 appear such as in
the two factorizations above, but every factorization must consist of exactly
two 2s and one 3.
Proof of Theorem 1.19. We will prove the existence part first. The proof
is very simple. Assume to the contrary that there exists an integer greater
than 1 that does not admit prime factorization. Then, the set of positive
integers greater than 1 that do not admit prime factorization is nonempty,
and hence, by the Well-Ordering Principle, there must be a least positive
integer greater than 1, call it a, that does not admit prime factorization. Now
a cannot itself be prime, or else, a = a would be its prime factorization,
contradicting our assumption about a. Hence, a = bc for suitable positive
integers b and c, with 1 < b < a and 1 < c < a. But then, b and c must
both admit factorization into primes, since they are greater than 1 and less
than a, and a was the least positive integer greater than 1 without a prime
factorization. If b = p_1 p_2 · · · p_k and c = q_1 q_2 · · · q_l are prime factorizations
of b and c respectively, then a (= bc) = p_1 p_2 · · · p_k q_1 q_2 · · · q_l yields a prime
factorization of a, contradicting our assumption about a. Hence, no such
integer a can exist, that is, every positive integer greater than 1 must factor
into a product of primes.
Let us move on to the uniqueness part of the theorem. The basic ideas
behind the proof of this portion of the theorem are quite simple as well. The
key is to recognize that if an integer a has two prime factorizations, then
some prime in the first factorization must equal some prime in the second
factorization. This will then allow us to cancel the two primes, one from
each factorization, and arrive at two factorizations of a smaller integer. The
rest is just induction.
So assume to the contrary that there exists a positive integer greater than
1 with two different (i.e., other than for rearrangement) prime factorizations.
Then, exactly as in the proof of the existence part above, the Well-Ordering
Principle applied to the (nonempty) set of positive integers greater than
1 that admit two different prime factorizations shows that there must be
a least positive integer greater than 1, call it a, that admits two different
prime factorizations. Suppose that
prime factorizations. Suppose that
a = p
n
1
1
p
ns
s
= q
m
1
1
q
mt
t
,
where the p
i
(i = 1, . . . , s) are distinct primes, and the q
j
(j = 1, . . . , t) are
distinct primes, and the n
i
and the m
j
are positive integers. (By distinct
primes we mean that p
1
, p
2
, . . . , p
s
are all dierent from one another,
and similarly, q
1
, q
2
, . . . , q
t
are all dierent from one another.) Since p
1
divides a, and since a = q
m
1
1
q
mt
t
, p
1
must divide q
m
1
1
q
mt
t
. Now, by
Exercise 1.18 above (which simply generalizes Lemma 1.17), we nd that
since p
1
divides the product q
m
1
1
q
mt
t
, it must divide one of the factors of
this product, that is, it must divide one of the q
j
. Relabeling the primes q
j
if necessary (remember, we do not consider a rearrangement of primes to be
a dierent factorization), we may assume that p
1
divides q
1
. Since the only
positive divisors of q
1
are 1 and q
1
, we nd p
1
= q
1
.
Since now p_1 = q_1, consider the integer a′ = a/p_1 = a/q_1. If a′ = 1, this
means that a = p_1 = q_1, and there is nothing to prove, the factorization of
a is already unique. So assume that a′ > 1. Then a′ is a positive integer
greater than 1 and less than a, so by our assumption about a, any prime
factorization of a′ must be unique (that is, except for rearrangement of
factors). But then, since a′ is obtained by dividing a by p_1 (= q_1), we find
that a′ has the prime factorizations

a′ = p_1^{n_1 − 1} p_2^{n_2} · · · p_s^{n_s} = q_1^{m_1 − 1} q_2^{m_2} · · · q_t^{m_t}.

So, by the uniqueness of prime factorization of a′, we find that n_1 − 1 = m_1 − 1
(so n_1 = m_1), s = t, and after relabeling the primes if necessary, p_i = q_i,
and similarly, n_i = m_i, for i = 2, . . . , s (= t). This establishes that the two
prime factorizations of a we began with are indeed the same, except perhaps
for rearrangement. □
Remark 1.21. While Theorem 1.19 only talks about integers greater than
1, a similar result holds for integers less than −1 as well: every integer less
than −1 can be factored as −1 times a product of primes, and these primes
are unique, except perhaps for order. This is clear, since, if a is a negative
integer less than −1, then a = −1 · |a|, and of course, |a| > 1 and therefore
admits unique prime factorization.
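The existence half of Theorem 1.19 even suggests a procedure: keep splitting off the smallest divisor greater than 1, which is necessarily prime. The Python sketch below (ours; trial division is far too slow for large integers, but it makes the theorem concrete) returns the factorization as a mapping from primes to exponents:

```python
def prime_factorization(a):
    """Return the factorization of an integer a > 1 as {prime: exponent}."""
    assert a > 1
    factors = {}
    d = 2
    while d * d <= a:
        while a % d == 0:        # divide out each prime completely
            factors[d] = factors.get(d, 0) + 1
            a //= d
        d += 1
    if a > 1:                    # whatever remains is itself prime
        factors[a] = factors.get(a, 0) + 1
    return factors

print(prime_factorization(12))     # {2: 2, 3: 1}, i.e., 12 = 2^2 * 3
print(prime_factorization(30031))  # {59: 1, 509: 1}
```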
The following result follows easily from studying prime factorizations
and will be useful in the exercises:
Proposition 1.22. Let a and b be integers greater than 1. Then b divides
a if and only if the prime factors of b are a subset of the prime factors of a,
and if a prime p occurs in the factorization of b with exponent y and in the
factorization of a with exponent x, then y ≤ x.
Proof. Let us assume that b | a, so a = bc for some integer c. If c = 1,
then a = b, and there is nothing to prove, the assertion is obvious. So
suppose c > 1. Then c also has a factorization into primes, and multiplying
together the prime factorizations of b and c, we get a factorization of bc into
a product of primes. On the other hand, bc is just a, and a has its own
prime factorization as well. By the uniqueness of prime factorizations, the
prime factorization of bc that we get from multiplying together the prime
factorizations of b and c must be the prime factorization of a. In particular,
the prime factors of b (and c) must be a subset of the prime factors of a. Now
suppose that a prime p occurs to the power x in the factorization of a, to the
power y in the factorization of b, and to the power z in the factorization of c.
Multiplying together the factorizations of b and c, we find that p occurs to
the power y + z in the factorization of bc. Since the factorization of bc is just
the factorization of a and since p occurs to the power x in the factorization
of a, we find that x = y + z. In particular, y ≤ x. This proves one half of
the proposition.
As for the converse, assume that b has the prime factorization b =
p_1^{n_1} · · · p_s^{n_s}. Then, by the hypothesis, the primes p_1, . . . , p_s must all appear in
the prime factorization of a with exponents at least n_1, . . . , n_s (respectively).
Thus, the prime factorization of a must look like a = p_1^{m_1} · · · p_s^{m_s} p_{s+1}^{m_{s+1}} · · · p_t^{m_t},
where m_i ≥ n_i for i = 1, . . . , s, and where p_{s+1}, . . . , p_t are other primes.
Writing c for p_1^{m_1 − n_1} · · · p_s^{m_s − n_s} p_{s+1}^{m_{s+1}} · · · p_t^{m_t} and noting that m_i − n_i ≥ 0
for i = 1, . . . , s by hypotheses, we find that c is an integer, and of course,
clearly, a = (p_1^{n_1} · · · p_s^{n_s})c, i.e., a = bc. This proves the converse. □
We have proved the Fundamental Theorem of Arithmetic, but there
remains the question of showing that there are infinitely many primes. The
proof that we provide is due to Euclid, and is justly celebrated for its beauty.
Theorem 1.23. (Euclid) There exist infinitely many prime numbers.
Proof. Assume to the contrary that there are only finitely many primes.
Label them p_1, p_2, . . . , p_n. (Thus, we assume that there are n primes.)
Consider the integer a = p_1 p_2 · · · p_n + 1. Since a > 1, a admits a prime
factorization by Theorem 1.19. Let q be any prime factor of a. Since the
set {p_1, p_2, . . . , p_n} contains all the primes, q must be in this set, so q
must equal, say, p_i. But then, a = q(p_1 p_2 · · · p_{i−1} p_{i+1} · · · p_n) + 1, so we get
a remainder of 1 when we divide a by q. In other words, q cannot divide a.
This is a contradiction. Hence there must be infinitely many primes! □
Question 1.24. What is wrong with the following proof of The-
orem 1.23? "There are infinitely many positive integers. Each of
them factors into primes by Theorem 1.19. Hence there must be
infinitely many primes."
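Euclid's argument is constructive, and it is fun to watch it run. The Python sketch below (ours) takes a finite list of primes and extracts from p_1 p_2 · · · p_n + 1 a prime missing from the list; note that the product plus 1 need not itself be prime, as the second example shows:

```python
def a_missing_prime(primes):
    """Return the smallest prime factor of (product of primes) + 1.
    It cannot be in the given list: dividing by any listed prime
    leaves remainder 1, exactly as in the proof of Theorem 1.23."""
    a = 1
    for p in primes:
        a *= p
    a += 1
    d = 2
    while a % d != 0:
        d += 1
    return d

print(a_missing_prime([2, 3, 5, 7]))          # 211 (2*3*5*7 + 1 is prime)
print(a_missing_prime([2, 3, 5, 7, 11, 13]))  # 59, since 30031 = 59 * 509
```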
Further Exercises
Exercise 1.25. In this exercise, we will formally prove the validity of various
quick tests for divisibility that we learn in high school!
1. Prove that an integer is divisible by 2 if and only if the digit in the units
place is divisible by 2. (Hint: Look at a couple of examples: 58 = 5 · 10 + 8,
while 57 = 5 · 10 + 7. What does Lemma 1.2 suggest in the context of
these examples?)
2. Prove that an integer (with two or more digits) is divisible by 4 if and only
if the integer represented by the tens digit and the units digit is divisible
by 4. (To give you an example, the integer represented by the tens digit
and the units digit of 1024 is 24, and the assertion is that 1024 is divisible
by 4 if and only if 24 is divisible by 4, which it is!)
3. Prove that an integer (with three or more digits) is divisible by 8 if and
only if the integer represented by the hundreds digit and the tens digit
and the units digit is divisible by 8.
4. Prove that an integer is divisible by 3 if and only if the sum of its digits is
divisible by 3. (For instance, the sum of the digits of 1024 is 1 + 0 + 2 + 4 =
7, and the assertion is that 1024 is divisible by 3 if and only if 7 is divisible
by 3, and therefore, since 7 is not divisible by 3, we can conclude that
1024 is not divisible by 3 either! Here is a hint in the context of this
example: 1024 = 1 · 1000 + 0 · 100 + 2 · 10 + 4 = 1 · (999 + 1) + 0 · (99 +
1) + 2 · (9 + 1) + 4. What can you say about the terms containing 9, 99,
and 999 as far as divisibility by 3 is concerned? Then, what does Lemma
1.2 suggest?)
5. Prove that an integer is divisible by 9 if and only if the sum of its digits
is divisible by 9.
6. Prove that an integer is divisible by 11 if and only if the difference between
the sum of the digits in the units place, the hundreds place, the ten
thousands place, . . . (the places corresponding to the even powers of
10) and the sum of the digits in the tens place, the thousands place,
the hundred thousands place, . . . (the places corresponding to the odd
powers of 10) is divisible by 11. (Hint: 10 = 11 − 1, 100 = 99 + 1,
1000 = 1001 − 1, 10000 = 9999 + 1, etc. What can you say about the
integers 11, 99, 1001, 9999, etc. as far as divisibility by 11 is concerned?)
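Before proving the tests in Exercise 1.25, you may enjoy confirming a few of them numerically. The Python sketch below (ours) checks the tests for 3, 9, 11, and 4 against direct division for every integer below a bound:

```python
def digit_sum(n):
    return sum(int(ch) for ch in str(n))

def alternating_digit_sum(n):
    # units digit, minus tens digit, plus hundreds digit, and so on
    return sum((-1) ** i * int(ch) for i, ch in enumerate(reversed(str(n))))

for n in range(1, 100000):
    assert (n % 3 == 0) == (digit_sum(n) % 3 == 0)
    assert (n % 9 == 0) == (digit_sum(n) % 9 == 0)
    assert (n % 11 == 0) == (alternating_digit_sum(n) % 11 == 0)
    assert (n % 4 == 0) == (int(str(n)[-2:]) % 4 == 0)
print("all digit tests agree below 100000")
```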
Exercise 1.26. Given nonzero integers a and b, with b > 0, write a = bq + r
(division algorithm). Show that gcd(a, b) = gcd(b, r).
(This exercise forms the basis for the Euclidean algorithm for finding the
greatest common divisor of two nonzero integers. For instance, how do we
find the greatest common divisor of, say, 48 and 30 using this algorithm? We
divide 48 by 30 and find a remainder of 18, then we divide 30 by 18 and
find a remainder of 12, then we divide 18 by 12 and find a remainder of 6,
and finally, we divide 12 by 6 and find a remainder of 0. Since 6 divides 12
evenly, we claim that gcd(48, 30) = 6. What is the justification for this claim?
Well, applying the statement of this exercise to the first division, we find that
gcd(48, 30) = gcd(30, 18). Applying the statement to the second division,
we find that gcd(30, 18) = gcd(18, 12). Applying the statement to the third
division, we find that gcd(18, 12) = gcd(12, 6). Since the fourth division shows
that 6 divides 12 evenly, gcd(12, 6) = 6. Working our way backwards, we obtain
gcd(48, 30) = gcd(30, 18) = gcd(18, 12) = gcd(12, 6) = 6.)
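The worked example above translates directly into a few lines of Python (our sketch, written for positive a and b):

```python
def euclid_gcd(a, b):
    """gcd by repeated division: gcd(a, b) = gcd(b, r) until r is 0."""
    while b != 0:
        a, b = b, a % b    # replace the pair (a, b) by (b, r)
    return a

# Runs through the chain (48, 30) -> (30, 18) -> (18, 12) -> (12, 6) -> (6, 0):
print(euclid_gcd(48, 30))   # 6
```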
Exercise 1.27. Given nonzero integers a and b, let h = a/gcd(a, b) and
k = b/gcd(a, b). Show that gcd(h, k) = 1.
Exercise 1.28. Show that if a and b are nonzero integers with gcd(a, b) = 1,
and if c is an arbitrary integer, then a | c and b | c together imply ab | c. Give a
counterexample to show that this result is false if gcd(a, b) ≠ 1. (Hint: Just as
in the proof of Lemma 1.14, use the fact that gcd(a, b) = 1 to write 1 = xa + yb
for suitable integers x and y, and then multiply both sides by c. Now stare hard
at your equation!)
Exercise 1.29. The Fibonacci Sequence, 1, 1, 2, 3, 5, 8, 13, . . . is defined as
follows: If a_i stands for the ith term of this sequence, then a_1 = 1, a_2 = 1,
and for n ≥ 3, a_n is given by the formula a_n = a_{n−1} + a_{n−2}. Prove that for all
n ≥ 2, gcd(a_n, a_{n−1}) = 1.
Exercise 1.30. Given an integer n ≥ 1, recall that n! is the product 1 · 2 ·
3 · · · (n − 1) · n. Show that the integers (n + 1)! + 2, (n + 1)! + 3, . . . , (n + 1)! +
(n + 1) are all composite.
Exercise 1.31. Use Exercise 1.30 to prove that given any positive integer n,
one can always find consecutive primes p and q such that q − p ≥ n.
Exercise 1.32. If m and n are odd integers, show that 8 divides m^2 − n^2.
Exercise 1.33. Show that 3 divides n^3 − n for any integer n. (Hint: Factor
n^3 − n as n(n^2 − 1) = n(n − 1)(n + 1). Write n as 3q + r, where r is one of 0,
1, or 2, and examine, for each value of r, the divisibility of each of these factors
by 3. This result is a special case of Fermat's Little Theorem, which you will
encounter as Theorem 4.42 in Chapter 4 ahead.)
Exercise 1.34. Here is another instance of Fermat's Little Theorem: show
that 5 divides n^5 − n for any integer n. (Hint: As in the previous exercise, factor
n^5 − n appropriately, and write n = 5q + r for 0 ≤ r < 5.)
Exercise 1.35. Show that 7 divides n^7 − n for any integer n.
Exercise 1.36. Use Proposition 1.22 to show that if n has the prime factor-
ization n = p_1^{n_1} p_2^{n_2} · · · p_k^{n_k}, then the number of positive
divisors of n is (n_1 + 1)(n_2 + 1) · · · (n_k + 1).
Exercise 1.37. Let m and n be positive integers. By allowing the exponents
in the prime factorizations of m and n to equal 0 if necessary, we may assume
that m = p_1^{m_1} p_2^{m_2} · · · p_k^{m_k} and n = p_1^{n_1} p_2^{n_2} · · · p_k^{n_k}, where for i = 1, . . . , k, p_i
is prime, m_i ≥ 0, and n_i ≥ 0. (For instance, we can rewrite the factorizations
84 = 2^2 · 3 · 7 and 375 = 3 · 5^3 as 84 = 2^2 · 3 · 5^0 · 7 and 375 = 2^0 · 3 · 5^3 · 7^0.)
For each i, let d_i = min(m_i, n_i). Prove that gcd(m, n) = p_1^{d_1} p_2^{d_2} · · · p_k^{d_k}.
Exercise 1.38. Given two (nonzero) integers a and b, the least common mul-
tiple of a and b (written as lcm(a, b)) is defined to be the smallest of all the
positive common multiples of a and b.
1. Show that this definition makes sense, that is, show that the set of positive
common multiples of a and b has a smallest element.
2. Retaining the notation of Exercise 1.37 above, let l_i = max(m_i, n_i) (i =
1, . . . , k). Show that lcm(m, n) = p_1^{l_1} p_2^{l_2} · · · p_k^{l_k}. (A computational
sketch of this min/max description follows the exercise.)
3. Use Exercise 1.37 and Part 2 above to show that lcm(a, b) = ab/gcd(a, b).
4. Conclude that if gcd(a, b) = 1, then lcm(a, b) = ab.
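Here is a Python sketch (ours) of the min/max description in Exercises 1.37 and 1.38, using trial-division factorization; a prime absent from one factorization simply receives exponent 0, as in the 84 and 375 example above:

```python
from math import gcd

def factor(a):
    """Prime factorization of a > 1 as {prime: exponent}, by trial division."""
    f, d = {}, 2
    while d * d <= a:
        while a % d == 0:
            f[d] = f.get(d, 0) + 1
            a //= d
        d += 1
    if a > 1:
        f[a] = f.get(a, 0) + 1
    return f

def gcd_lcm(m, n):
    fm, fn = factor(m), factor(n)
    g = l = 1
    for p in set(fm) | set(fn):
        g *= p ** min(fm.get(p, 0), fn.get(p, 0))   # Exercise 1.37
        l *= p ** max(fm.get(p, 0), fn.get(p, 0))   # Exercise 1.38, part 2
    return g, l

g, l = gcd_lcm(84, 375)
print(g, l)                                    # 3 10500
print(gcd(84, 375), 84 * 375 // gcd(84, 375))  # the same, via part 3
```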
Exercise 1.39. Let a = p^n, where p is a prime and n is a positive integer.
Prove that the number of integers x such that 1 ≤ x ≤ a and gcd(x, a) = 1 is
p^n − p^{n−1}.
(More generally, if a is any integer greater than 1, one can ask for the number
of integers x such that 1 ≤ x ≤ a and gcd(x, a) = 1. This number is denoted
by φ(a), and is referred to as Euler's φ-function. It turns out that if a has the
prime factorization p_1^{m_1} p_2^{m_2} · · · p_k^{m_k}, then φ(a) = φ(p_1^{m_1}) φ(p_2^{m_2}) · · · φ(p_k^{m_k})!
Delightful as this statement is, we will not delve deeper into it in this book, but
you are encouraged to read about it in any introductory textbook on number
theory.)
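A brute-force Python check (ours) of Exercise 1.39, together with one instance of the multiplicativity mentioned in the parenthetical remark:

```python
from math import gcd

def phi(a):
    """Euler's phi-function by direct count of x in 1..a with gcd(x, a) = 1."""
    return sum(1 for x in range(1, a + 1) if gcd(x, a) == 1)

for p, n in [(2, 3), (3, 2), (5, 2)]:
    print(p ** n, phi(p ** n), p ** n - p ** (n - 1))   # the counts agree

print(phi(72), phi(8) * phi(9))   # 24 24, since 72 = 8 * 9 and gcd(8, 9) = 1
```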
Exercise 1.40. The series 1 + 1/2 + 1/3 + · · · is known as the harmonic series.
This exercise concerns the partial sums (see below) of this series.
1. Fix an integer n ≥ 1, and let S_n denote the set {1, 2, . . . , n}. Let 2^t be
the highest power of 2 that appears in S_n. Show that 2^t does not divide
any element of S_n other than itself.
2. For any integer n ≥ 1, the nth partial sum of the harmonic series is the
sum of the first n terms of the series, that is, it is the number 1 + 1/2 +
1/3 + · · · + 1/n. Show that if n ≥ 2, the nth partial sum is not an integer
as follows:
(a) Clearing denominators, show that the nth partial sum may be written
as a/b, where b = n! and a = (2 · 3 · · · n) + (1 · 3 · 4 · · · n) +
(1 · 2 · 4 · 5 · · · n) + · · · + (1 · 2 · 3 · · · (n − 1)).
(b) Let S_n and 2^t be as in part 1 above. Also, let 2^m be the highest
power of 2 that divides n!. Show that m ≥ t ≥ 1 and that m ≥
m − t + 1 ≥ 1.
(c) Conclude from part 2b above that 2^{m−t+1} divides b.
(d) Use part 1 to show that 2^{m−t+1} divides all the summands in the
expression in part 2a above for a except the term (1 · 2 · 3 · · · (2^t − 1) ·
(2^t + 1) · · · n).
(e) Conclude that 2^{m−t+1} does not divide a.
(f) Conclude that the nth partial sum is not an integer.
Exercise 1.41. Fix an integer n ≥ 1, and let S_n denote the set {1, 3, 5, . . . ,
2n − 1}. Let 3^t be the highest power of 3 that appears in S_n. Show that 3^t does
not divide any element of S_n other than itself. Can you use this result to show
that the nth partial sums (n ≥ 2) of a series analogous to the harmonic series
(see Exercise 1.40 above) are not integers?
Exercise 1.42. Prove using the unique prime factorization theorem that √2
is not a rational number. Using essentially the same ideas, show that √p is not
a rational number for any prime p. (Hint: Suppose that √2 = a/b for some
two integers a and b with b ≠ 0. Rewrite this as a^2 = 2b^2. What can you say
about the exponent of 2 in the prime factorizations of a^2 and 2b^2?)
Notes
Remarks on Definition 1.6. The alert reader may wonder why we have re-
stricted both integers a and b to be nonzero in Definition 1.6 above. Let us explore
this question further: Suppose first that a and b are both zero. Note that every
nonzero integer divides 0, since given any nonzero integer n, we certainly have the
relation n · 0 = 0. Thus, if a and b are both zero, we find that every nonzero integer
is a common divisor of a and b, and thus, there is no greatest common divisor at
all. The concept of the greatest common divisor therefore has no meaning in this
situation. Next, let us assume just one of a and b is nonzero. For concreteness, let
us assume a ≠ 0 and b = 0. Then, as we have seen in the discussions preceding
Definition 1.6, |a| is a divisor of a, and is the largest of the divisors of a. Also, since
every nonzero integer divides 0 and we have assumed b = 0, we find |a| divides b.
It follows that |a| is a common divisor of a and b, and since |a| is the largest among
the divisors of a, it has to be the greatest of the common divisors of a and b. We
find therefore that if exactly one of a and b, say a, is nonzero, then the concept of
gcd(a, b) has meaning, and the gcd in this case equals |a|. However, this situation
may be viewed as somewhat less interesting, since every integer anyway divides b.
The more interesting case, therefore, is when both a and b are nonzero, and we
have chosen to focus on that situation in Definition 1.6.
Remarks on Theorem 1.9 and Question 1.11. It is very crucial that d
be the least positive linear combination of a and b for you to be able to conclude
that gcd(a, b) = d. For instance, if you only know that there exist integers x and y
such that xa + yb = 2, you cannot conclude that gcd(a, b) = 2; for all you know,
there may exist two other integers x′ and y′ such that x′a + y′b = 1!
Notice though that if you know that there exist integers x′ and y′ such that
x′a + y′b = 1, you can conclude that gcd(a, b) = 1. For 1 has to be the least positive
linear combination of a and b, since there is no positive integer smaller than 1.
Remarks on the denition of the greatest common divisor. We have
dened the greatest common divisor of two nonzero integers a and b to be the largest
of their common divisors (Denition 1.6), and we have noted that gcd(a, b) must
be positive. On the other hand, Corollary 1.10 showed that every common divisor
of a and b must divide gcd(a, b). Putting these together, we nd that gcd(a, b) has
21
the following specic properties:
1. gcd(a, b) is a positive integer.
2. gcd(a, b) is a common divisor of a and b.
3. Every common divisor of a and b must divide gcd(a, b).
You will nd that many textbooks have turned these properties around and have
used these properites to dene the greatest common divisor! Thus, these textbooks
dene the greatest common divisor of a and b to be that integer d which has the
following properties:
1. d is a positive integer.
2. d is a common divisor of a and b.
3. Every common divisor of a and b must divide d.
Of course, it is not immediately clear that such an integer d must exist, nor is it clear that it must be unique, and these books then give a proof of the existence and uniqueness of such a d. Furthermore, it is not immediately clear that the integer d yielded by this definition is the same as the greatest common divisor as we have defined it (although it will be clear if one takes a moment to think about it). The reason why many books prefer to define the greatest common divisor as above is that this definition applies (with a tiny modification) to other number systems where the concept of a "largest" common divisor may not exist.
In the case of the integers, however, we prefer our Definition 1.6, since the largest of the common divisors of a and b is exactly what we would intuitively expect gcd(a, b) to be!
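(An aside from us, not part of the original notes: the extended Euclidean algorithm makes the existence of integers x and y with xa + yb = gcd(a, b) completely concrete. Here is a minimal Python sketch; the function name extended_gcd is our own.)

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and x*a + y*b = g."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = extended_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a % b); back-substitute to keep a linear combination.
    return (g, y, x - (a // b) * y)

a, b = 240, 46
g, x, y = extended_gcd(a, b)
assert g == gcd(a, b) and x * a + y * b == g
print(g, x, y)  # the gcd, together with one choice of x and y
```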
Chapter 2
Rings and Fields
2.1 Rings: Definition and Examples
Abstract algebra begins with the observation that several sets that occur
naturally in mathematics, such as the set of integers, the set of rationals,
the set of 2 × 2 matrices with entries in the reals, the set of functions from the
reals to the reals, all come equipped with certain operations that allow one
to combine any two elements of the set and come up with a third element.
These operations go by different names, such as addition, multiplication, or
composition (you would have seen the notion of composing two functions in
calculus). Abstract algebra studies mathematics from the point of view of
these operations, asking, for instance, what properties of a given mathemat-
ical set can be deduced just from the existence of a given operation on the
set with a given list of properties. We will be dealing with some of the more
rudimentary aspects of this approach to mathematics in this book.
However, do not let the abstract nature of the subject fool you into
thinking that mathematics no longer deals with concrete objects! Abstrac-
tion grows only from extensive studies of the concrete; it is merely a device
(albeit an extremely effective one) for codifying phenomena that simultane-
ously occur in several concrete mathematical sets. In particular, to under-
stand an abstract concept well, you must work with the specic examples
from which the abstract concept grew (remember the advice on active learn-
ing).
Let us look at Z, focusing on the operations of addition and multiplica-
tion.
Given a set S, recall that a binary operation on S is a process that takes
an ordered pair of elements from S and gives us a third member of the set.
It is helpful to think of this in more abstract terms: a binary operation on S is just a function f : S × S → S, that is, a rule that assigns to each ordered pair (a, b), a third element f(a, b). Given an arbitrary set S, it is quite easy to define binary operations on it, but it is much harder to define binary operations that satisfy additional properties.
Question 2.1. How many different binary operations can be defined on the set {0, 1}? Now select some of these binary operations and check whether they are associative or commutative. How many binary operations can be constructed on a set T that has n elements?
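(A computational aside from us, not the original text: the counts asked for in Question 2.1 can be checked by brute force, since a set with n elements admits exactly n^(n^2) binary operations. A minimal Python sketch for the set {0, 1}:)

```python
from itertools import product

S = (0, 1)
pairs = list(product(S, repeat=2))
n_assoc = n_comm = 0
for values in product(S, repeat=len(pairs)):   # all 2**4 = 16 operation tables
    op = dict(zip(pairs, values))              # one binary operation on S
    if all(op[(op[(a, b)], c)] == op[(a, op[(b, c)])]
           for a, b, c in product(S, repeat=3)):
        n_assoc += 1
    if all(op[(a, b)] == op[(b, a)] for a, b in pairs):
        n_comm += 1
print(n_assoc, n_comm)   # how many of the 16 are associative / commutative
```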
What will be crucial to us is that addition and multiplication are special
binary operations on Z that satisfy certain extra properties.
First, why are addition and multiplication binary operations? The pro-
cess of adding two integers is of course familiar to us, but suppose we view
addition abstractly as a rule that assigns to each ordered pair of integers
(m, n) the integer m+n. (For instance, addition assigns to the ordered pair
(2, 3) the integer 5, to the ordered pair (3, −4) the integer −1, to the ordered
pair (1, 0) the integer 1, etc.) It is clear then that addition is indeed a binary
operation: it takes an ordered pair of integers, namely (m, n), and gives us
a third uniquely determined integer, namely m+n. Similarly, multiplication
too is a binary operation: it is a rule that assigns to every ordered pair of integers (m, n) the uniquely determined integer m · n.
What are the properties of these binary operations? Let us consider
addition first. It is customary to write (Z, +) to emphasize the fact that we
are considering Z not just as a set of objects, but as a set with the binary
operation of addition. (We will temporarily ignore the fact that Z has a
second binary operation, namely multiplication, defined on it.) The first property that (Z, +) has is that + is associative. That is, for all integers a, b, and c, (a + b) + c = a + (b + c). The second property that (Z, +) has is the existence of an identity element with respect to +. This is the integer 0: it satisfies the condition a + 0 = 0 + a = a for all integers a. The third property of (Z, +) is the existence of inverses with respect to +. For every integer a, there is an integer b (depending on a) such that a + b = b + a = 0. (It is clear what this integer b is: it is just the integer −a.)
What these observations show is that the integers form a group with
respect to addition. We will study groups in detail in Chapter 4 ahead,
but let us introduce the concept here. It turns out that the situation we
have encountered above (namely, a set equipped with a binary operation
with certain properties) arises in several different areas of mathematics.
Precisely because the same situation appears in so many different contexts,
it has been given a name and has been studied extensively as a subject in
its own right.
Definition 2.2. A group is a set S with a binary operation ∗ : S × S → S such that
1. ∗ is associative, i.e., a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, and c in S,
2. S has an identity element with respect to ∗, i.e., an element id such that a ∗ id = id ∗ a = a for all a in S, and
3. every element of S has an inverse with respect to ∗, i.e., for every element a in S there exists an element a^{-1} such that a ∗ a^{-1} = a^{-1} ∗ a = id.
To emphasize that there are two ingredients in this definition, the set S and the operation ∗ with these special properties, the group is sometimes written as (S, ∗), and S is often referred to as a group with respect to the operation ∗.
The reason that the integers form a group with respect to addition is
that if we take the set S of this definition to be Z, and if we take the binary operation ∗ to be +, then the three conditions of the definition are
met. There is a vast and beautiful theory about groups, the beginnings of
which we will pursue in Chapter 4 ahead.
Observe that there is one more property of addition that we have not
listed yet, namely commutativity. This is the property that for all integers
a and b, a + b = b + a. In the language of group theory, this makes (Z, +)
an abelian group:
Definition 2.3. An abelian group is one in which the function ∗ in Definition 2.2 above satisfies the additional condition a ∗ b = b ∗ a for all a and b in S.
Commutativity of addition is a crucial property of the integers; the only reason we delayed introducing it was to allow us first to introduce the notion of a group.
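(Another aside from us: for a finite set, the group axioms can be checked mechanically. The sketch below, with names of our own choosing, confirms that the integers modulo 5 form a group under addition but not under multiplication; compare Question 2.4 below.)

```python
from itertools import product

def is_group(S, op):
    """Check closure and the three group axioms for a finite set S with op."""
    S = list(S)
    if any(op(a, b) not in S for a, b in product(S, repeat=2)):
        return False   # op is not even a binary operation on S
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(S, repeat=3)):
        return False   # associativity fails
    ids = [e for e in S if all(op(a, e) == op(e, a) == a for a in S)]
    if not ids:
        return False   # no identity element
    e = ids[0]
    return all(any(op(a, b) == op(b, a) == e for b in S) for a in S)  # inverses

Z5 = range(5)
print(is_group(Z5, lambda a, b: (a + b) % 5))   # True
print(is_group(Z5, lambda a, b: (a * b) % 5))   # False: 0 has no inverse
```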
Now let us consider multiplication. As with addition, we write (Z, ·) to emphasize the fact that we are considering Z as a set with the binary operation of multiplication, temporarily ignoring the operation addition. As with addition, we find that multiplication is associative, that is, for all integers a, b, and c, (a · b) · c = a · (b · c). Also, Z has an identity with respect to multiplication. This is the integer 1; it satisfies a · 1 = 1 · a = a for all integers a.
Question 2.4. Is (Z, ·) a group? In other words, do the integers form a group with respect to multiplication? To answer this question, you would check whether the three group axioms above hold for (Z, ·). What is the inverse with respect to multiplication of 1?
What is the inverse of 2? What is the inverse of 0?
There are two more properties of multiplication of integers we wish to
consider. The first is that multiplication is commutative, that is, a · b = b · a for all integers a and b. The second, which is not a property of just multiplication alone, but rather a property that connects multiplication and addition together, is the distributivity of multiplication over addition, that is, for all integers a, b, and c, a · (b + c) = a · b + a · c, and (a + b) · c = a · c + b · c. (Notice that since multiplication of integers is commutative, the second relation in the previous sentence follows from the first!)
There are other properties of these operations of course (for instance
a · b = 0 implies that either a = 0 or b = 0), but we will study these
later. Let us meanwhile reflect on the properties that we have considered
so far. Studying them closely, one gets the sense that these properties are
somehow rather natural. For instance, if one were to think of the integers
as (intellectual) counting tools, then it is clear that addition must necessarily
be commutative, since commutativity of addition corresponds to the fact
that if you have a certain number of objects in one pile and a certain number
in another, then the total number of objects can be obtained either by
counting all the objects in the rst pile and then all the objects in the
second pile, or by counting all the objects in the second pile and then all
the objects in the rst pile.
This sense of these properties being natural is further reinforced when
we consider other number systems that we encounter in mathematics. For
instance, consider the set of all polynomials in one variable whose coefficients are real numbers, a set with which you are already very familiar. (The real numbers are traditionally denoted by R, and the set of all polynomials in one variable whose coefficients are real numbers is traditionally denoted by
R[x].) This set, too, is more than just a collection of objects. Just as
with the integers, R[x] has two binary operations, also called addition and
multiplication. Recall that given two polynomials g(x) = ∑_{i=0}^{n} g_i x^i and h(x) = ∑_{j=0}^{m} h_j x^j, we add g and h by adding together the coefficients of the same powers of x, and we multiply g and h by multiplying each monomial g_i x^i of g by each monomial h_j x^j of h and adding the results together. (For instance, (1 + x + x^2) + (x + √3 x^3) is 1 + 2x + x^2 + √3 x^3, and (1 + x + x^2) · (x + √3 x^3) is x + x^2 + (1 + √3)x^3 + √3 x^4 + √3 x^5.) Furthermore, it is our experience that these binary operations on R[x] satisfy the very same properties above that the corresponding operations on Z satisfied.
It turns out that these properties of addition and multiplication are
shared not just by Z and R[x], but by a whole host of number systems
in mathematics. Because of the importance of such sets with two binary
operations with these special properties, there is a special term for them: they are called rings.
Definition 2.5. A ring is a set R with two binary operations + and · such that
1. a + (b + c) = (a + b) + c for all a, b, c in R.
2. There exists an element in R, denoted 0, such that a + 0 = 0 + a = a for all a in R.
3. For each a in R there exists an element in R, denoted −a, such that a + (−a) = (−a) + a = 0.
4. a + b = b + a for all elements a, b in R.
5. a · (b · c) = (a · b) · c for all elements a, b, c in R.
6. There exists an element in R, denoted 1, such that a · 1 = 1 · a = a for all a in R.
7. a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c for all elements a, b, c in R.
Remark 2.6. The binary operation + is usually referred to as addition and the binary operation · is usually referred to as multiplication, in keeping with the terminology for the integers and other familiar rings. As is the usual practice in high school algebra, one often suppresses the multiplication symbol, that is, one often writes ab for a · b.
Remark 2.7. Just as we did earlier with the integers, if we temporarily ignore the operation · on R and write (R, +) to indicate that we are focusing on just the operation +, then the first four conditions in the definition of the ring R show that (R, +) is an abelian group.
Remark 2.8. We have used the term "number system" at several places in
the book without really being explicit about what a number system is. We
did not have the language before this point to make our meaning precise,
but what we had intended to convey loosely by this term is the concept
of a set with two binary operations with properties much like those of the
integers. But now that we have the language, let us be precise: a number
system is just a ring as defined above!
It must be borne in mind however that "number system" is a nonstandard term: it is not used very widely, and when used at all, different authors mean different things by the term! So it is better to stick to "rings," which is standard.
Observe that we left out one important property of the integers in our definition of a ring, namely the commutativity of multiplication. And correspondingly, we have included both left distributivity (a · (b + c) = a · b + a · c) and right distributivity ((a + b) · c = a · c + b · c) of multiplication over addition. While this may seem strange at first, think about the set of 2 × 2 matrices with entries in R. Convince yourselves that this is a ring with respect to the usual definitions of matrix addition and multiplication; see Example 2.16
is not commutative; for instance,
is not commutative; for instance,

    [ 1 0 ]   [ 0 1 ]     [ 0 1 ]   [ 1 0 ]
    [ 0 0 ] · [ 0 0 ]  ≠  [ 0 0 ] · [ 0 0 ] .
Rings in which multiplication is not commutative are fairly common in
mathematics, and hence requiring commutativity of multiplication in the
definition of a ring would be too restrictive. On the other hand, there is no denying that a significant proportion of the rings that we come across
indeed have multiplication that is commutative. Thus, it is reasonable to
single them out as special cases of rings, and we have the following:
Definition 2.9. A commutative ring is a ring R in which a · b = b · a for all a and b in R.
(Rings in which the multiplication is not commutative are referred to as
noncommutative rings.)
The following are various examples of rings. (Once again, recall the ad-
vice in the preliminary chapter To the Student, page ix, on reading actively.)
Example 2.10. The set of rational numbers, Q, with the usual operations of
addition and multiplication forms a ring. We know how to add and multiply
two rational numbers very well, and we know that all the ring axioms hold
for the rationals. (One can take a more advanced perspective and prove that
the ring axioms hold for the rationals, starting from the fact that they hold
for the integers. Although sound, such an approach is unduly technical for
a first course.) Q is, in fact, a commutative ring.
Question 2.10.1. Q has one crucial property (with respect to
multiplication) that Z does not have. Can you discover what that
might be? (See the remarks on page 86 in the notes, but only after
you have thought about this question on your own!)
Example 2.11. In a like manner, both the reals, R, and the complexes,
usually denoted C, are rings under the usual operations of addition and
multiplication. Again, we will not try to prove that the ring axioms hold;
we will just invoke our intimate knowledge of R and C to recognize that
they are rings.
Example 2.12. Let Q[√2] denote the set of all real numbers of the form a + b√2, where a and b are arbitrary rational numbers. For instance, this includes numbers like 1/2 + 3√2, 1/7 + (1/5)√2, etc. You know from your experience with real numbers how to add and multiply two elements a + b√2 and c + d√2 of this set. Under these operations, this set indeed forms a ring; let us see why:
Question 2.12.1. Here is the first point you need to check: under this method of addition and multiplication, do the sum and product of any two elements of this set also lie in this set? (Remember, a binary operation should take an ordered pair of elements to another element in the same set. If, say, the usual product of some two elements a + b√2 and c + d√2 of this set does not belong to this set, then our usual product will not be a valid binary operation on this set, and hence we cannot claim that this set is a ring!)
Question 2.12.2. Why should associativity of addition and multi-
plication and distributivity of multiplication over addition all follow
from the fact that this set is contained in R?
Question 2.12.3. Are all other ring axioms satised? Check!
Question 2.12.4. You know that √2 is not a rational number (see Chapter 1, Exercise 1.42). Why does it follow that if a and b are rational numbers, then a + b√2 = 0 if and only if both a and b are zero?
See the notes on page 86 (but as always, after you have played with this
example yourselves!). Also, see the notes on page 90 (in particular, Example
2.141 on page 92) for an explanation of the notation Q[√2].
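(An aside from us: the closure questions above can be explored by modeling a + b√2 as the pair (a, b) of rational numbers. A minimal Python sketch using the standard fractions module; the helper names are ours.)

```python
from fractions import Fraction

# Represent a + b*sqrt(2) in Q[sqrt(2)] as the pair (a, b) of rationals.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    # (a + b*sqrt(2)) * (c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2),
    # and both coordinates are rational again: the set is closed.
    (a, b), (c, d) = u, v
    return (a * c + 2 * b * d, a * d + b * c)

x = (Fraction(1, 2), Fraction(3))      # 1/2 + 3*sqrt(2)
y = (Fraction(1, 7), Fraction(1, 5))   # 1/7 + (1/5)*sqrt(2)
print(add(x, y))
print(mul(x, y))
```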
Example 2.13. Now let us generalize Example 2.12 above. Let m be any rational number. Note that if m is negative, √m will not be a real number but a complex (non-real) number. Let Q[√m] denote the set of all complex numbers of the form a + b√m, where a and b are arbitrary rational numbers. (Of course, if m ≥ 0, then Q[√m] will actually be contained in the reals.)
Question 2.13.1. What familiar set of numbers does Q[√m] reduce to if m is the square of a rational number?
Question 2.13.2. More generally, compare the sets Q[√m] and Q[√m′] when m and m′ satisfy the relation m = q^2 m′ for some rational number q. Are these the same sets?
Question 2.13.3. Under the usual addition and multiplication of complex numbers, does Q[√m] form a ring? (Follow the same steps as in Example 2.12 above.)
Question 2.13.4. Is it true that if a and b are rational numbers, then a + b√m = 0 if and only if a = b = 0? (As always, if you claim something is not true, give a counterexample!)
Example 2.14. As a specific example of Example 2.13, take m = −1. We get the ring Q[i], which is the set of all complex numbers of the form a + bi, where a and b are arbitrary rational numbers and i stands for √−1.
Exercise 2.14.1. Show that if a and b are real numbers, then a + bi = 0 if and only if both a and b are zero. (See the notes on page 87 for a clue.)
Example 2.15. Consider the set of rational numbers q that have the property that when q is written in the reduced form a/b with a, b integers and gcd(a, b) = 1, the denominator b is odd. This set is usually denoted by Z_(2), and contains elements like 1/3, 5/7, 6/19, etc., but does not contain 1/4 or 5/62.
Question 2.15.1. Does Z_(2) contain 2/6?
Notice that every element of Z_(2) is just a fraction (albeit of a particular kind). We know how to add and multiply two fractions together, so we can use this knowledge to add and multiply any two elements of Z_(2). Here is the punch line: Z_(2) forms a ring under the usual operations of addition and multiplication of fractions! Strange as this ring may seem at first, it plays an important role in number theory.
Question 2.15.2. Check that if you add (or multiply) two fractions in Z_(2) you get a fraction that is not an arbitrary rational number but one that also lives in Z_(2). What role does the fact that the denominators are odd play in ensuring this? (The role of the odd denominators is rather crucial; make sure that you understand it!)
Question 2.15.3. Why do associativity and distributivity follow from the fact that Z_(2) ⊆ Q?
Question 2.15.4. Do the other ring axioms hold? Check!
Question 2.15.5. Can you generalize this construction to other
subsets of Q where the denominators have analogous properties?
(See the notes on page 87 for some comments.)
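(A small experiment from us for Questions 2.15.1 and 2.15.2: the standard fractions module keeps every Fraction in lowest terms, so membership in Z_(2) is just a parity test on the reduced denominator. The helper name is ours.)

```python
from fractions import Fraction

def in_Z2(q):
    """In lowest terms, is the denominator of q odd?"""
    return q.denominator % 2 == 1

x, y = Fraction(1, 3), Fraction(5, 7)
print(in_Z2(x + y), in_Z2(x * y))   # True, True: sums and products stay in Z_(2)
print(in_Z2(Fraction(2, 6)))        # Fraction reduces 2/6 to 1/3 first
print(in_Z2(Fraction(1, 4)))        # False
```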
Example 2.16. The set of n × n matrices with entries in R (M_n(R)), where n is a positive integer, forms a ring with respect to the usual operations of matrix addition and multiplication. For almost all values of n, matrix multiplication is not commutative.
Question 2.16.1. What is the exception?
Checking associativity of addition and multiplication and the distribu-
tivity of multiplication over addition is tedious, but you should check at
least one of them so as to be familiar with the process.
Exercise 2.16.1. For example, prove that for any three matrices
A, B, and C, (A+B) +C = A+ (B +C).
What is important is that you get a feel for how associativity and distributivity in M_n(R) derive from the fact that associativity and distributivity hold for R.
Question 2.16.2. What about the ring axioms other than asso-
ciativity and distributivity: do they hold?
Question 2.16.3. What are the additive and multiplicative iden-
tities?
Question 2.16.4. Let e_{i,j} denote the matrix with 1 in the (i, j)-th slot and 0 everywhere else. Study the case of 2 × 2 matrices and guess at a formula for the product e_{i,j} · e_{k,l}. (You need not try to prove formally that your formula is correct, but after you have made your guess, substitute various values for i, j, k, and l and test your guess.)
Question 2.16.5. Would the ring axioms still be satisfied if we only considered the set of n × n matrices whose entries came from Q? From Z?
Question 2.16.6. Now suppose R is any ring. Let us consider the set M_n(R) of n × n matrices with entries in R with the usual definitions of matrix addition and multiplication. Is M_n(R) with these operations a ring? What if R is not commutative? Does this affect whether M_n(R) is a ring or not?
(See the notes on page 88 for some hints.)
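(An experimental aside from us: Question 2.16.4 invites direct computation. The sketch below, in plain Python with no libraries and with names of our own choosing, prints every product e_{i,j} · e_{k,l} of 2 × 2 matrix units so that you can test your guessed formula against the output.)

```python
def e(i, j, n=2):
    """The n x n matrix unit: 1 in the (i, j) slot (1-indexed), 0 elsewhere."""
    return [[1 if (r, c) == (i - 1, j - 1) else 0 for c in range(n)]
            for r in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

for i in (1, 2):
    for j in (1, 2):
        for k in (1, 2):
            for l in (1, 2):
                print(f"e{i}{j} * e{k}{l} =", matmul(e(i, j), e(k, l)))
```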
Example 2.17. R[x], the set of polynomials in one variable with coefficients
from R, forms a ring with respect to the usual operations of polynomial
addition and multiplication. (We have considered this before.) Here, x
denotes the variable. Of course, one could use any letter to represent the
variable. For instance, one could refer to the variable as t, in which case
the set of polynomials with coefficients in R would be denoted by R[t].
Sometimes, to emphasize our choice of notation for the variable, we refer to
R[x] as the set of polynomials in the variable x with coefficients in R, and we refer to R[t] as the set of polynomials in the variable t with coefficients in
R. Both R[x] and R[t], of course, refer to the same set of objects. Likewise,
we often write "f(x)" (or "f(t)") for a polynomial, rather than just "f," to
emphasize that the variable is x (or t).
If f(x) = a_0 + a_1x + a_2x^2 + · · · is a nonzero polynomial in R[x], the degree of f(x) is the largest value of n for which a_n ≠ 0, a_nx^n is known as the highest term, and a_n is known as the highest coefficient. Thus, the polynomials of degree 0 are precisely the nonzero constants. Polynomials of degree 1 are called linear, polynomials of degree 2 are called quadratic, polynomials of degree 3 are called cubic, and so on. Note that we have not defined the degree of the zero polynomial. This is on purpose: it will be convenient for the formulation of certain theorems if the zero polynomial does not have a degree!
It is worth recalling an elementary property of polynomials that we will use frequently (in fact, in a more formal treatment of polynomials, this fact is built into the definitions of polynomials): two polynomials are equal if and only if their coefficients are equal. That is, ∑ f_i x^i = ∑ g_i x^i if and only if f_i = g_i (i = 0, 1, . . . ). In particular, a polynomial ∑ f_i x^i equals 0 if and only if each f_i = 0.
Exercise 2.17.1. Now just as with Example 2.16, prove that if
f, g, and h are any three polynomials in R[x], then (f + g) + h =
f + (g + h). Your proof should invoke the fact that associativity
holds in R.
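(A small illustration from us, not the book: representing a polynomial by its coefficient list [a_0, a_1, . . .] makes it plain how the operations on R[x] are built from those of the coefficient ring, coefficient by coefficient. A minimal Python sketch with integer coefficients standing in for elements of R:)

```python
from itertools import zip_longest

def poly_add(f, g):
    # Add coefficients of like powers; the shorter list is padded with 0.
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_mul(f, g):
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b   # monomial a*x^i times monomial b*x^j
    return prod

f, g, h = [1, 1, 1], [0, 1, 0, 3], [2, 0, 1]   # 1+x+x^2, x+3x^3, 2+x^2
print(poly_add(f, g), poly_mul(f, g))
# Associativity of + on these lists reduces to associativity of + in the
# coefficients, one power of x at a time:
assert poly_add(poly_add(f, g), h) == poly_add(f, poly_add(g, h))
```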
Example 2.18. Instead of polynomials with coefficients from R, we can consider polynomials in the variable x with coefficients from an arbitrary ring R, with the usual definition of addition and multiplication of polynomials. We get a ring, denoted R[x]. Thus, if we were to consider polynomials in
the variable x whose coefficients are all integers, we get the ring Z[x].
Question 2.18.1. As always, convince yourself that for a general
ring R, the set of polynomials R[x] forms a ring. For arbitrary R, is
R[x] commutative?
(See the notes on page 88 for some hints and more remarks.)
Example 2.19. Generalizing Example 2.17, the set R[x, y] of polynomials in two variables x and y forms a ring. A polynomial in x and y is of the form ∑_{i,j} f_{i,j} x^i y^j. (For example, consider the polynomial 4 + 2x + 3y + x^2y + 5xy^3; here, f_{0,0} is the coefficient of x^0y^0, i.e., the coefficient of 1, so f_{0,0} = 4. Similarly, f_{1,3} is the coefficient of x^1y^3, so it equals 5. On the other hand, f_{1,1} is zero, since there is no xy term.) Two polynomials ∑_{i,j} f_{i,j} x^i y^j and ∑_{i,j} g_{i,j} x^i y^j are equal if and only if for each pair (i, j), f_{i,j} = g_{i,j}.
In the same manner, we can consider R[x_1, . . . , x_n], the set of polynomials in n variables x_1, . . . , x_n with coefficients in R. These too form a ring. More generally, if R is any ring we may consider R[x_1, . . . , x_n], the set of polynomials in n variables x_1, . . . , x_n with coefficients in R. Once again, we get a ring.
Example 2.20. Here is a ring with only two elements! Divide the integers into two sets, the even integers and the odd integers. Let [0]_2 denote the set of even integers, and let [1]_2 denote the set of odd integers. (Notice that [0]_2 and [1]_2 are precisely the equivalence classes of Z under the equivalence relation defined by a ∼ b iff a − b is even.) Denote by Z/2Z the set {[0]_2, [1]_2}. Each element of {[0]_2, [1]_2} is itself a set containing an infinite number of integers, but we will ignore this fact. Instead, we will view all the even integers together as one "number" of Z/2Z, and we will view all the odd integers together as another "number" of Z/2Z. How should we add and multiply these new numbers? Recall that if we add two even integers we get an even integer, if we add an even and an odd integer we get an odd integer, and if we add two odd integers we get an even integer. This suggests the following addition rules in Z/2Z:
  +      [0]_2   [1]_2
[0]_2    [0]_2   [1]_2
[1]_2    [1]_2   [0]_2
(There is an obvious way to interpret this table: if you want to know what
a + b is, you go to the cell corresponding to row a and column b.)
Similarly, we know that the product of two even integers is even, the product
of an even integer and an odd integer is even, and the product of two odd
integers is odd. This gives us the following multiplication rules:
  ·      [0]_2   [1]_2
[0]_2    [0]_2   [0]_2
[1]_2    [0]_2   [1]_2
Later in this chapter (see Example 2.83 and the discussions preceding that example), we will interpret the ring Z/2Z differently: as a quotient ring of Z. This interpretation, in particular, will prove that Z/2Z is indeed a ring under the given operations. Just accept for now the fact that we get a ring, and play with it to develop a feel for it.
Question 2.20.1. How would you get a ring with three elements
in it? With four?
Example 2.21. Here is the answer to the previous two questions! We have observed that [0]_2 and [1]_2 are just the equivalence classes of Z under the equivalence relation a ∼ b iff a − b is even. Analogously, let us consider the equivalence classes of Z under the equivalence relation a R b iff a − b is divisible by 3. Since a − b is divisible by 3 exactly when a and b each leaves the same remainder when divided by 3, there are three equivalence classes: (i) [0]_3, the set of all those integers that yield a remainder of 0 when you divide them by 3. In other words, [0]_3 consists of all multiples of 3, that is, all integers of the form 3k, k ∈ Z. (ii) [1]_3 for the set of all those integers that yield a remainder of 1, so [1]_3 consists of all integers of the form 3k + 1, k ∈ Z. (iii) [2]_3 for the set of all those integers that yield a remainder of 2,
so [2]_3 consists of all integers of the form 3k + 2, k ∈ Z. Write Z/3Z for the set {[0]_3, [1]_3, [2]_3}. Just as in the case of Z/2Z, every element of this set is itself a set consisting of an infinite number of integers, but we will ignore this fact. How would you add two elements of this set? In Z/2Z, we defined addition using observations like "an odd integer plus an odd integer gives you an even integer." The corresponding observations here are "an integer of the form 3k + 1 plus another integer of the form 3k + 1 gives you an integer of the form 3k + 2," "an integer of the form 3k + 1 plus another integer of the form 3k + 2 gives you an integer of the form 3k," "an integer of the form 3k + 2 plus another integer of the form 3k + 2 gives you an integer of the form 3k + 1," etc. We thus get the following addition table:
  +      [0]_3   [1]_3   [2]_3
[0]_3    [0]_3   [1]_3   [2]_3
[1]_3    [1]_3   [2]_3   [0]_3
[2]_3    [2]_3   [0]_3   [1]_3
Exercise 2.21.1. Similarly, study how the remainders work out when we multiply two integers. (For instance, we find that an integer of the form 3k + 2 times an integer of the form 3k + 2 gives you an integer of the form 3k + 1, etc.) Derive the following multiplication table:
  ·      [0]_3   [1]_3   [2]_3
[0]_3    [0]_3   [0]_3   [0]_3
[1]_3    [0]_3   [1]_3   [2]_3
[2]_3    [0]_3   [2]_3   [1]_3
This process can easily be generalized to yield a ring with n elements (Z/nZ) for any n ≥ 2.
Exercise 2.21.2. Construct the addition and multiplication tables
for the ring Z/4Z.
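(A computational aside from us: the tables for any Z/nZ can be generated mechanically, which gives a quick way to check your answer to Exercise 2.21.2. The names below are our own; entry (i, j) holds the class of i + j, respectively i · j, modulo n.)

```python
def tables(n):
    add = [[(i + j) % n for j in range(n)] for i in range(n)]
    mul = [[(i * j) % n for j in range(n)] for i in range(n)]
    return add, mul

add4, mul4 = tables(4)
print("addition:")
for row in add4:
    print(row)
print("multiplication:")
for row in mul4:
    print(row)   # note the entry for [2]_4 * [2]_4
```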
Example 2.22. Suppose R and S are two rings. (For example, take R = Z/2Z, and take S = Z/3Z.) Consider the Cartesian product T = R × S,
which is the set of ordered pairs (r, s) with r ∈ R and s ∈ S. Define addition in T by (r, s) + (r′, s′) = (r + r′, s + s′). Here, r + r′ refers to the addition of two elements of R according to the definition of addition in R, and similarly, s + s′ refers to the addition of two elements of S according to the definition of addition in S. For instance, in Z/2Z × Z/3Z, ([0]_2, [1]_3) + ([1]_2, [2]_3) = ([1]_2, [0]_3). Similarly, define multiplication in T by (r, s) · (r′, s′) = (rr′, ss′). Once again, rr′ refers to the multiplication of two elements of R according to the definition of multiplication in R, and ss′ refers to the multiplication of two elements of S according to the definition of multiplication in S. Thus, in Z/2Z × Z/3Z again, ([0]_2, [1]_3) · ([1]_2, [2]_3) = ([0]_2, [2]_3).
Question 2.22.1. Do these definitions of addition and multiplication make T a ring? Check!
Denition 2.23. Given two rings R and S, the ring T constructed above is
known as the direct product of R and S.
Question 2.23.1. What are the identity elements with respect to
addition and multiplication?
Question 2.23.2. Now take R = S = Z. Can you find pairs of nonzero elements a and b in the ring T = Z × Z such that a · b = 0? (Note that Z itself does not contain pairs of such elements.) If R and S are arbitrary rings, can you find a pair of nonzero elements a and b in T = R × S such that a · b = 0?
(See the notes on page 89 for hints.)
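(A two-line illustration from us; try Question 2.23.2 on your own first. With the componentwise multiplication on T = Z × Z defined above:)

```python
def mul(u, v):
    # componentwise multiplication in T = Z x Z
    return (u[0] * v[0], u[1] * v[1])

a, b = (1, 0), (0, 1)   # both nonzero elements of T
print(mul(a, b))        # (0, 0), the zero element of T
```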
Remark 2.24. The examples above should have convinced you that our definition of a ring (Definition 2.5 above) is rather natural, and that it very effectively models several number systems that arise in mathematics. Here is further evidence that our axioms are the "correct" ones. Notice that in all the rings that we have come across, the following properties hold:
1. The additive identity is unique, that is, there is precisely one element
0 in the ring that has the property that a +0 = a for all elements a in
the ring.
2. The multiplicative identity is unique, that is, there is precisely one element 1 in the ring that has the property that a · 1 = 1 · a = a for all elements a in the ring.
3. a + b = a + c implies b = c for any elements a, b, and c in the ring.
4. For every element a in the ring, there is precisely one element −a that satisfies the condition that a + (−a) = 0.
5. For every element a in the ring, −(−a) is just a.
6. a · 0 = 0 · a = 0 for all elements a.
7. (−1) · a = a · (−1) = −a for all elements a.
8. More generally, a · (−b) = (−a) · b = −(ab) for all elements a and b.
9. (−1) · (−1) = 1.
10. More generally, (−a) · (−b) = ab for all elements a and b.
Now these properties all seem extremely natural, and we would certainly
like them to hold in all rings. (More strongly, a ring in which any of these
properties fail would appear very pathological to us!) Now, if our ring axioms
were the "correct" ones, then the properties above would be deducible from
the ring axioms themselves, thereby showing that they hold in all rings.
As it turns out, this is indeed true: they are deducible from the axioms,
and therefore, they do hold in every ring R. Although we will not verify
this in the text, it is good practice for you to verify that at least some
of these properties above follow from the axioms, so we have included the
verification as Exercise 2.114 in the exercises at the end of this chapter (see
also the remarks on page 89).
Property 3 above is known as additive cancellation. It is actually a
consequence of the fact that if R is any ring, then (R, +) is a group: in any group (G, ∗), if a ∗ b = a ∗ c for elements a, b, and c in G then b = c (see
Exercise 4.18 in Chapter 4 ahead). In much the same way, properties 1, 4,
and 5 are really consequences of the fact that these properties hold for any
group. (See Exercises 4.15, 4.16, and 4.17 in Chapter 4.)
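(As a sample of how such verifications go — our sketch, in the spirit of Exercise 2.114 — here is property 6 deduced from the axioms. For any element a,

    a · 0 = a · (0 + 0)      (since 0 + 0 = 0, by the defining property of 0)
          = a · 0 + a · 0    (by distributivity),

and adding −(a · 0) to both sides, or equivalently applying additive cancellation (property 3), gives 0 = a · 0. The equality 0 · a = 0 follows in the same way from the other distributive law.)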
In fact, after you do Exercise 2.115 at the end of this chapter, you will
realize that property 2 above is also a property that comes from a particular
group structure on a particular subset of R!
Notice that there is one property that is very similar to additive cancel-
lation, namely multiplicative cancellation: a · b = a · c implies b = c, which
we have not listed above. The reason for its absence is very simple: multi-
plicative cancellation cannot be deduced from the ring axioms. In turn, the
reason that it cannot be deduced from the axioms is because multiplicative
cancellation does not hold in all rings!
Question 2.25. Can you think of an example of a ring R and elements a, b, and c in R such that ab = ac yet b ≠ c?
2.2 Subrings
In Examples 2.12, 2.13, 2.14, and 2.15 above, we came across the following phenomenon: A ring R and a subset S of R that had the following two properties: For any s_1 and s_2 in S, s_1 + s_2 was in S and s_1 s_2 was in S. In
Example 2.12, the ring R was R, and the subset S was the set of all real numbers of the form a + b√2 with a and b rational numbers. In Example 2.14, R was C and S was the set of all complex numbers of the form a + bi
2.14, R was C and S was the set of all complex numbers of the form a +bi
with a and b rational numbers. In Example 2.15, R was Q, and S was
the set of all reduced fractions with odd denominator. Moreover, in all
three examples, we endowed S with binary operations in the following way:
Given s_1 and s_2 in S, we viewed them as elements of R, and formed the sum s_1 + s_2 (the sum being defined according to the definition of addition in R). Next, we observed that s_1 + s_2 was actually in S (this is one of the two properties alluded to above). Similarly, we observed that s_1 s_2 (the product being formed according to the definition of multiplication in R) was also in
S. These two facts hence gave us two binary operations on S. We then found
that with respect to these binary operations, S was not just an arbitrary
subset of R, it was actually a ring in its own right.
The crucial reason (although not the only reason) why the set S in all
our examples was itself a ring was that S had the properties described at
the beginning of the previous paragraph. We give these properties a name.
Definition 2.26. Given an arbitrary nonempty subset S of a ring R, we say that S is closed under addition if for any s_1 and s_2 in S, s_1 + s_2 is also in S. Similarly, we say that S is closed under multiplication if for any s_1 and s_2 in S, s_1 s_2 is also in S.
As we have observed, if a subset S of a ring R is closed under addition,
then the addition operation on R, when restricted to ordered pairs of ele-
ments of S, yields a binary operation on S (which we also call addition), and
we say that the addition on S is induced by the addition on R. Similarly,
when S is closed under multiplication, we get a binary operation on S (also
called multiplication) that we say is induced by the multiplication on R.
Now suppose that S is a subset of a ring R that is closed with respect
to addition and multiplication, and just as in our examples above, suppose
that with respect to the induced operations, S is itself a ring. We will give
a special name to this situation:
Definition 2.27. Let S be a subset of a ring R that is closed with respect to addition and multiplication. Suppose that 1 ∈ S. Suppose further that with respect to these addition and multiplication operations on S that are induced from those on R, S is itself a ring. We say that S is a subring of R. We also describe R as a ring extension of S, and refer to R and S jointly as the ring extension R/S.
Examples 2.12, 2.13, 2.14, and 2.15 above are therefore all instances of subrings: Q[√2] is a subring of R, Q[i] is a subring of C, and Z_(2) is a subring of Q. (See the notes on page 90 for a remark on Definition 2.27 above.)
Question 2.28. Consider the subset S of Z consisting of the positive even integers, that is, the set {2n | n ∈ Z and n > 0}. Check that S is closed with respect to both addition and multiplication. Does this make S a subring of Z? Next, consider the set T of all nonnegative integers. Check that T is also closed with respect to addition and multiplication. Clearly, T contains 1. Does this make T a subring of Z?
Here is a quick exercise, which is really a special case of Exercise 4.22 in
Chapter 4 ahead:
Exercise 2.29. Let S be a subring of the ring R. Thus, by definition (S, +) is an abelian group. Let 0_S denote the identity element of this group, and write 0_R for the usual 0 of R. Show that 0_S = 0_R. (See also Exercise 3.54 in Chapter 3 ahead.)
Before we proceed to look at further examples of subrings, let us first
consider a criterion that will help us decide whether a given subset of a ring
is actually a subring.
Lemma 2.30. Let S be a subset of a ring R which has the following prop-
erties:
1. S is closed under addition,
2. S is closed under multiplication,
3. 1 is in S, and
4. For all a ∈ S, −a is also in S.
Then S is a subring of R.
Proof. As discussed above, since S is closed with respect to addition and multiplication, the addition and multiplication operations on R induce addition and multiplication operations on S. Now consider addition. For any a, b, and c in S, we may view a, b, and c as elements of R, and since addition is associative in R, we find (a + b) + c = a + (b + c). Viewing a, b, and c back as elements of S in this equation, we find that the induced addition
operation on S is associative. Similarly, since addition is commutative in R, the induced addition on S is commutative. Now we are given that 1 ∈ S, so property (4) shows that −1 is also in S. From the fact that S is closed under addition, we find that 1 + (−1) is also in S, so 0 is in S. The relation s + 0 = s holds for all s ∈ S, since it holds more generally for any s ∈ R. Thus, S has an additive identity, namely 0. For every s ∈ S, we are given that −s is also in S, so every element of S has an additive inverse. As for multiplication, given a, b, and c in S, we may view these as elements of R, and since multiplication in R is associative, we find that (ab)c = a(bc). As before, viewing a, b, and c back as elements of S in this equation, we find that the induced multiplication operation on S is associative. Since s · 1 = 1 · s = s for all s ∈ S (as this is true more generally for all s ∈ R), and since 1 ∈ S, we find that S has a multiplicative identity, namely 1. Finally, exactly as in the arguments for associativity above, the relations a(b + c) = ab + ac and (a + b)c = ac + bc hold for all a, b, and c in S because they hold in R, so distributivity is satisfied. S is hence a ring in its own right with respect to the induced operations of addition and multiplication and it contains 1. Thus, S is a subring of R. □
The following are further examples of subrings. Play with these exam-
ples to gain familiarity with them. Check that they are indeed examples of
subrings of the given rings by applying Lemma 2.30.
Example 2.31. The set of all real numbers of the form a + b√2 where a and b are integers is a subring of Q[√2]. Why? It is denoted by Z[√2].
Example 2.32. The set of all complex numbers of the form a + bi where a and b are integers is a subring of Q[i]. It is denoted by Z[i]. (It is often called the ring of Gaussian integers.)
Example 2.33. Let Z[1/2] denote the set of all rational numbers that are
such that when written in the reduced form a/b with gcd(a, b) = 1, the
denominator b is a power of 2. (Contrast this set with Z_(2).) This is a subring of Q.
Question 2.33.1. What are the rational numbers that this ring has in common with Z_(2)?
(See the notes on page 90 for clues.)
Example 2.34. Let Q[√2, √3] denote the set of all real numbers of the form a + b√2 + c√3 + d√6, where a, b, c, and d are all rational numbers. This is a subring of the reals. (See the notes on page 93, in particular see Example 2.146 there, for an explanation of this notation.)
Question 2.34.1. Is the set of all real numbers of the form a + b√2 + c√3 where a, b, and c are rationals a subring of the reals?
Example 2.35. If S is a subring of a ring R, then S[x] is a subring of R[x].
Exercise 2.35.1. Prove this assertion!
In particular, this shows that Q[x] is a subring of R[x], which in turn is
a subring of C[x].
Example 2.36. Similarly, if S is a subring of a ring R, then M_n(S) is a subring of M_n(R).
Example 2.37. Let U_n(R) denote the upper triangular matrices, that is, the subset of M_n(R) consisting of all matrices whose entries below the main diagonal are all zero. Thus, U_n(R) is the set of all (a_{i,j}) in M_n(R) with a_{i,j} = 0 for i > j. (You may have seen the notation (a_{i,j}) before: it denotes the matrix whose entry in the ith row and jth column is the element a_{i,j}.) Then U_n(R) is a subring of M_n(R).
Question 2.37.1. Why?
Question 2.37.2. For what values of n will U_n(R) be the same as M_n(R)?
Question 2.37.3. Suppose we considered the set of strictly upper triangular matrices, namely the set of all (a_{i,j}) in M_n(R) with a_{i,j} = 0 for i ≥ j. Would we still get a subring of M_n(R)?
Example 2.38. Here is another subring of M_n(R). For each real number r, let diag(r) denote the matrix in which each diagonal entry is just r and in which the off-diagonal entries are all zero. The set of matrices in M_n(R) of the form diag(r) (as r ranges through R) is then a subring.
Question 2.38.1. What observations can you make about the function from R to M_n(R) that sends r to diag(r)?
(See Example 2.106 ahead.)
2.3 Integral Domains and Fields
In passing from the concrete example of the integers to the abstract definition of a ring, observe that we have introduced some phenomena that at first seem pathological. The first, which we have already pointed out explicitly and is already present in M_2(R), is noncommutativity of multiplication. The second, which is also present in M_2(R), and examples of which you have seen as far back as in the preliminary chapter To the Student, page ix, is the existence of zero-divisors.
Definition 2.39. A zero-divisor in a ring R is a nonzero element a for which there exists a nonzero element b such that either a · b = 0 or b · a = 0.
Just as noncommutativity of multiplication, on closer observation, turns out to be quite a natural phenomenon after all, the existence of zero-divisors is really not very pathological either. It merely seems so because most of our experience has been restricted to various rings that appear as subrings of the complex numbers.
Besides matrix rings (try to discover lots of zero-divisors in M_2(R) for yourselves), zero-divisors occur in several rings that arise naturally in mathematics, including many commutative ones. For instance, the direct product of two rings always contains zero-divisors (see Example 2.22 above). Also (see Exercise 2.21.2), Z/4Z contains zero-divisors: [2]_4 · [2]_4 = [0]_4! In fact,
as long as n is not prime, you should be able to discover zero-divisors in any
of the rings Z/nZ (see 2.58 ahead). (It can be proved, however, that Z/nZ
cannot have zero-divisors if n is prime, see 2.59 ahead.)
On the other hand, there is no doubt that the absence of zero-divisors
in a ring indeed makes the ring relatively easy to work with. If, in addition,
such a ring is also commutative, it becomes exceptionally nice to work with.
With this in mind, we make the following definition:
Definition 2.40. An integral domain is a commutative ring with no zero-
divisors.
(Alternatively, an integral domain is a commutative ring R with the property that whenever a · b = 0 for two elements a and b in R, then either a must be 0 or else b must be 0.)
Z, Q, R, and C are all obvious examples of integral domains. (Again, we
are simply invoking our knowledge of these rings when we make this claim.)
Question 2.41. Is R[x] an integral domain? More generally, if R is an arbitrary ring, can you determine necessary and sufficient conditions on R that will guarantee that R[x] has no zero-divisors? (See the notes on page 88 for a definition of R[x], and for some discussions that may help you answer this question.)
Notice that any subring S of an integral domain R must itself be an integral domain. (If ab = 0 holds in S for some nonzero elements a and b, then viewing a and b as elements of R, we would find ab = 0 in R, which is a contradiction, since R is an integral domain.) In particular, any subring of C is an integral domain.
Question 2.42. Now suppose S is a subring of R and suppose
that S (note!) is an integral domain. Must R also be an integral
domain? (Hint: Look at Example 2.38 above for inspiration!)
Integral domains have one nice property: one can always cancel elements
from both sides of an equation, i.e., multiplicative cancellation holds! More
precisely, we have the following:
Lemma 2.43. (Multiplicative Cancellation in Integral Domains:) Let R be
an integral domain, and let a be a nonzero element of R. If ab = ac for two
elements b and c in R, then b = c.
Proof. Write ab = ac as a(b − c) = 0. Since a ≠ 0 and since R is an integral domain, b − c = 0, or b = c! □
Now, integral domains are definitely very nice rings, but one can go out
on a limb and require that rings be even nicer! We can require that we be
able to divide any element a by any nonzero element b. This would certainly
make the ring behave much more like Q or R.
To understand division better, let us look at the process of dividing two integers a little closer. To divide 3 by 5 is really to multiply together 3 and 1/5 (just as to subtract, say, 6 from 9 is really to add together 9 and −6). The reason this cannot be done within the context of the integers is that 1/5 is not an integer. (After all, if 1/5 were an integer, then the product of 3 and 1/5 would also be an integer.) Now let us look at 1/5 a different way. 1/5 has the property that 1/5 · 5 = 5 · 1/5 = 1. In other words, 1/5 is the inverse of 5 with respect to multiplication (just as −6 is the inverse of 6
with respect to addition). First, let us pause to give a name to this:
Definition 2.44. If R is an arbitrary ring, a nonzero element a is said to be invertible or to have a multiplicative inverse if there exists an element b ∈ R
such that ab = ba = 1. In such a situation, b is known as the multiplicative
inverse of a, and a is known as the multiplicative inverse of b.
Invertible elements of a ring R are also known as units of R.
(Notice that for an arbitrary ring, it is not enough in the definition of invertibility to insist that ab = 1; we also need ba to equal 1. It is certainly possible to have two elements a and b in a ring R such that ab = 1 but ba ≠ 1: see Exercise 3.103 in Chapter 3 ahead.)
Question 2.45. What are the units of Z?
Putting all this together, the reason that we cannot divide within the
context of the integers is that given an arbitrary (nonzero integer) m, it
need not be invertible. With this in mind, we have the following definition:
Definition 2.46. A field is an integral domain in which every nonzero element a is invertible. The multiplicative inverse of a nonzero element a is usually denoted either by 1/a or by a^{-1}.
(For a comment on this definition, see the notes on page 93.)
Here is a quick exercise:
Exercise 2.47. Let R be a commutative ring. Suppose that every nonzero element in R is invertible. Prove that R cannot have zero-divisors, and hence, R must be a field. Give an example of a commutative ring to show that, conversely, if R is commutative and has no zero-divisors, then all nonzero elements need not be invertible.
We will often use the letter F to denote a field. The set of nonzero elements of a field F is often denoted by F^*.
Question 2.48. If F is a field, is F^* a group with respect to multiplication?
(See also Exercise 2.115 at the end of the chapter.)
Remark 2.49. Notice that 0 can never have a multiplicative inverse, since a · 0 = 0 for any a. (See Remark 2.24.) We describe this by saying that division by 0 is not defined.
Perhaps the most familiar example of a field is Q. We have already seen that it is a ring (Example 2.10). The multiplicative inverse of the nonzero rational number m/n is, of course, n/m. Here are more examples:
Example 2.50. The reals, R.
Example 2.51. Q[√2].
Question 2.51.1. Q[√2] is a subring of R, and hence an integral domain. Explicitly exhibit the multiplicative inverse of the nonzero number a + b√2 as c + d√2 for suitable rational numbers c and d. (Think in terms of rationalizing denominators.)
Question 2.51.2. Is Z[√2] a field?
Example 2.52. The complex numbers, C.
Question 2.52.1. What is the inverse of the nonzero number a + bi? (Give the inverse as c + di for suitable real numbers c and d: think in terms of "real-izing" denominators.)
Example 2.53. Q[i].
Question 2.53.1. Why is Q[i] a field?
Question 2.53.2. Is Z[i] a field?
Example 2.54. Here is a new example: the set of rational functions with coefficients from the reals, R(x). (Note the parentheses around the x.) This is the set of all quotients of polynomials with coefficients from the reals, that is, the set {f(x)/g(x)}, where f(x) and g(x) are elements of R[x], and g(x) ≠ 0. (Of course, we take f(x)/g(x) = f′(x)/g′(x) if f(x)g′(x) = g(x)f′(x).) Addition and multiplication in R(x) are similar to addition and multiplication in Q:

    f_1(x)/g_1(x) + f_2(x)/g_2(x) = (f_1(x)·g_2(x) + f_2(x)·g_1(x)) / (g_1(x)·g_2(x)),

and

    (f_1(x)/g_1(x)) · (f_2(x)/g_2(x)) = (f_1(x)·f_2(x)) / (g_1(x)·g_2(x)).

The multiplicative inverse of the nonzero element f(x)/g(x) is just g(x)/f(x).
Example 2.55. More generally, if F is any field, we may consider the set of rational functions with coefficients from F, denoted F(x). This is analogous to R(x): it is the set {f(x)/g(x)}, where f(x) and g(x) are now elements of F[x] instead of R[x], and g(x) ≠ 0. (As with R(x), we take f(x)/g(x) = f′(x)/g′(x) if f(x)g′(x) = g(x)f′(x).) Addition and multiplication are defined just as in R(x), and we can check that we get a field.
Example 2.56. The ring Z/2Z is a field! This is easy to see from the multiplication table for Z/2Z (see page 36): since the only nonzero element is [1]_2, and since [1]_2 · [1]_2 = [1]_2 ≠ [0]_2, it is clear that Z/2Z is an integral domain. But we can read one more fact from the relation [1]_2 · [1]_2 = [1]_2: the only nonzero element in Z/2Z is actually invertible! Thus, Z/2Z is indeed a field.
Question 2.56.1. How many elements does the ring M_2(Z/2Z) have? Which of these elements are invertible?
(It would be helpful to recall from more elementary courses that a matrix with entries in, say, the real numbers, is invertible if and only if its determinant is nonzero. You may accept for now that this same result holds for matrices with entries in any field.)
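(A brute-force aside from us: since M_2(Z/2Z) is finite, Question 2.56.1 can also be settled by exhaustive search, with no determinants needed. The names below are our own.)

```python
from itertools import product

mats = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]

def mul(A, B):
    # 2 x 2 matrix multiplication with entries reduced mod 2
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

I = ((1, 0), (0, 1))
units = [A for A in mats
         if any(mul(A, B) == I and mul(B, A) == I for B in mats)]
print(len(mats), len(units))   # total number of matrices, and number invertible
```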
Example 2.57. The ring Z/3Z is also a field!
Question 2.57.1. Study the multiplication table in Z/3Z on page 37. There are no zeros in the table other than in the first row and in the first column (which correspond to multiplication by zero). Why does this show that there are no zero-divisors in this ring? Now notice that every row and every column (other than the first) has [1]_3 in it. Why does this show that every nonzero element is invertible?
(After this example and the previous example of Z/2Z, you may find Exercise 2.127 at the end of the chapter illuminating.)
Example 2.58. It would be tempting to jump to the conclusion that Z/mZ is a field for all m ≥ 2. However, we have already seen on page 45 that Z/4Z has a zero-divisor. This shows that Z/4Z is not an integral domain, and hence most definitely not a field.
Question 2.58.1. Study the observation on page 45 that shows that Z/4Z is not an integral domain. How should you generalize this observation to prove that for any composite integer m ≥ 4, Z/mZ is not an integral domain?
Example 2.59. However, Examples 2.56 and 2.57 do generalize suitably: it turns out that for any prime p, the ring Z/pZ is a field (with p elements).
Recall from the discussions in Examples 2.20 and 2.21 that the elements of Z/pZ are equivalence classes of integers under the relation a ∼ b if and only if a − b is divisible by p. The equivalence class [a]_p of an integer a is thus the set of integers of the form a ± p, a ± 2p, a ± 3p, . . . . Addition and multiplication in Z/pZ are defined by the rules
1. [a]_p + [b]_p = [a + b]_p
2. [a]_p · [b]_p = [a · b]_p
Exercise 2.59.1. Show that addition and multiplication are well-defined, that is, if a ∼ a′ and b ∼ b′, then a + b ∼ a′ + b′ and a · b ∼ a′ · b′.
Exercise 2.59.2. Show that the zero in this ring is [0]_p, and the 1 in this ring is [1]_p. (In particular, [a]_p is nonzero in Z/pZ precisely when a is not divisible by p.)
Exercise 2.59.3. Now let [a]_p be a nonzero element in Z/pZ. Show that [a]_p is invertible. (Hint: Invoking the fact that a and p are relatively prime, we find that there must exist integers x and y such that xa + yp = 1. So?)
Exercise 2.59.4. Now conclude using Exercise 2.47 and Exercise 2.59.3 above that Z/pZ is a field.
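(A computational aside from us: for small p, the inverse promised by Exercise 2.59.3 can be found by brute force, as in the sketch below; the function name is ours. Python 3.8 and later can also compute it directly as pow(a, -1, p).)

```python
def inv_mod(a, p):
    """Return x with (a * x) % p == 1; assumes p is prime and p does not divide a."""
    for x in range(1, p):
        if (a * x) % p == 1:
            return x
    raise ValueError("a is divisible by p, or p is not prime")

p = 7
print([(a, inv_mod(a, p)) for a in range(1, p)])
# each pair (a, x) printed satisfies [a]_7 * [x]_7 = [1]_7
```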
We end this section with the concept of a subfield. The idea is very simple (compare with Definition 2.27 above):
Definition 2.60. A subset F of a field K is called a subfield of K if F is a subring of K and is itself a field. In this situation, we also describe K as a field extension of F, and refer to F and K jointly as the field extension K/F.
The difference between being a subring of K and a subfield of K is as follows: Suppose R is a subring of K. Given a nonzero element a in R, its multiplicative inverse 1/a certainly exists in K (why?). However, 1/a may not live inside R. If 1/a happens to live inside R, we say that a has a multiplicative inverse in R itself. Now, if every nonzero a in R has a multiplicative inverse in R itself, then by Definition 2.46 (why is R an
integral domain?), R is a field. Therefore, by Definition 2.60 above, R is then a subfield of K.
Thus, Q is a subfield of R, but Z is only a subring of R; it is not a subfield of R. Similarly, R is a subfield of C. (Is R a subring of C?) Q[√2] is a subfield of C. In fact, more is true: Q is a subfield of Q[√2], which in turn is a subfield of R, which in turn is a subfield of C.
Question 2.61. By contrast, is Q[i] a subfield of R? Of C?
Question 2.62. Is Z[i] a subfield of R? Of C?
(See Exercise 2.128 at the end of the chapter for a situation in which we can conclude that a subring of a field must actually be a subfield.)
2.4 Ideals
Consider the ring Z, and consider the subset of even integers, denoted (suggestively) 2Z. The set 2Z is closed under addition (the sum of two even integers is again an even integer), and in fact, (2Z, +) is even an abelian group (this is because (i) 0 is an even integer and hence in 2Z, (ii) for any even integer a, −a is also an even integer and hence in 2Z, (iii) and of course, addition of integers, restricted to 2Z, is both an associative and commutative operation). Moreover, the set 2Z has one extra property that will be of interest: for any integer a ∈ 2Z and for any arbitrary integer m, am is also an even integer and hence in 2Z. Subsets such as these play a crucial role in the structure of rings, and are given a special name: they are referred to as ideals.
Definition 2.63. Let R be a ring. A subset I of R is called an ideal of R if I is closed under the addition operation of R and under this induced binary operation (I, +) is an abelian group, and if for any i ∈ I and arbitrary r ∈ R, both ri ∈ I and ir ∈ I. An ideal I is called proper if I ≠ R.
Remark 2.64. Of course, if R is commutative, as in the example of Z and 2Z above, ri ∈ I if and only if ir ∈ I, but in an arbitrary ring, one must specify in the definition that both ri and ir be in I.
Remark 2.65. Notice in the definition of ideals above that if ir ∈ I for all r ∈ R, then in particular, taking r to come from I, we find that I must be closed under multiplication as well, that is, for any i and j in I, ij must also be in I. Once we find that ideals are closed under multiplication, the associative and distributive laws will then be inherited from R, so ideals seem like they should be the same as subrings. However, they differ from subrings in one crucial aspect: ideals do not have to contain the multiplicative identity 1. (Recall the definition of subrings, and see the example of 2Z above; it certainly does not contain 1.)
Exercise 2.66. Show that if I is an ideal of a ring R, then 1 I
implies I = R.
Here is an alternative characterization of ideals:
Lemma 2.67. Let I be a subset of a ring R. Then I is an ideal of R if and
only if
1. I is nonempty,
2. I is closed under addition, and
3. for all i I and r R, both ir and ri are in I.
Proof. If I is an ideal of R, then by denition, I is closed under addition,
and for all i I and r R, both ir and ri are in I. Moreover, by denition
of being an ideal, (I, +) is an abelian group, so it has at least one element
(the identity element). This shows that I is nonempty.
Now assume that I is nonempty, closed under addition, and for all i I
and r R, both ir and ri are in I. Since I is nonempty, there exists at
least one element in I, call it a. Then, by the hypotheses, a 0 = 0 must
be in I. Also, for any i I, i (1) = i I. Since commutativity and
54 CHAPTER 2. RINGS AND FIELDS
associativity of addition in I follows from that in R, we nd that indeed
(I, +) is an abelian group. 2
Exercise 2.68. If I is an ideal of R, then by denition, (I, +) is
an abelian group. Consequently, it has an identity element, call it
0
I
, that satises the property that i + 0
I
= 0
I
+i = i for all i I.
On the other hand, the element 0 in R is the identity element for
the group (R, +). Prove that the element 0
I
must be the same as
the element 0.
(See Exercise 4.22 in Chapter 4 ahead.)
The signicance of ideals will become clear when we study quotient rings
and ring homomorphisms a little ahead, but rst let us consider several
examples of ideals in rings:
Example 2.69. Convince yourselves that if R is any ring, then both R and
the set 0 are both ideals of R. The ideal 0 is often referred to informally
as the zero ideal.
Example 2.70. Just as with the set 2Z, we may consider, for any integer
m, the set of all multiples of m, denoted mZ.
Exercise 2.70.1. Prove that mZ is an ideal of Z.
Question 2.70.1. What does mZ look like when m = 1?
Question 2.70.2. What does mZ look like when m = 0?
Example 2.71. In the ring R[x], let x denote the set of all polynomials
that are a multiple of x, i.e. the set xg(x) [ g(x) R[x].
Exercise 2.71.1. Prove that x is an ideal of R[x].
Exercise 2.71.2. More generally, let f(x) be an arbitrary poly-
nomial, and let f(x) denote the set of all polynomials that are a
multiple of f(x), i.e. the set f(x)g(x) [ g(x) R[x]. Show that
f(x) is an ideal of R[x].
2.4. IDEALS 55
Example 2.72. In the ring R[x, y], let x, y denote the set of all polyno-
mials that can be expressed as xf(x, y) + yg(x, y) for suitable polynomials
f(x, y) and g(x, y). For example, the polynomial x + 2y + x
2
y + xy
3
is in
x, y because it can be rewritten as x(1 +xy) +y(2 +xy
2
). (Note that this
rewrite is not uniqueit can also be written as x(1 +xy +y
3
) +2ybut this
will not be an issue.)
Exercise 2.72.1. Show that x, y is an ideal of R[x, y].
Exercise 2.72.2. More generally, given two arbitrary polynomials
p(x, y) and q(x, y), let p(x, y), q(x, y) denote the set of all polyno-
mials that can be expressed as p(x, y)f(x, y)+q(x, y)g(x, y) for suit-
able polynomials f(x, y) and g(x, y). Show that p(x, y), q(x, y) is
an ideal of R[x, y].
Example 2.73. Fix an integer n 1. In the ring M
n
(Z) (see Exercise
2.16.5), the subset M
n
(2Z) consisting of all matrices all of whose entries are
even, is an ideal.
Exercise 2.73.1. Prove this.
Question 2.73.1. Given an arbitrary integer m, is the subset
M
n
(mZ) consisting of all matrices all of whose entries are a multiple
of m an ideal of M
n
(Z)?
Example 2.74. Let R be an arbitrary ring, and let I be an ideal of R. Fix
an integer n 1. In M
n
(R), let M
n
(I) denote the subset of all matrices all
of whose entries come from I.
Exercise 2.74.1. Prove that M
n
(I) is an ideal of M
n
(R).
Example 2.75. In the ring Z
(2)
, denote by 2
(2)
the set of all fractions of
the (reduced) form a/b where b is odd and a is even.
Question 2.75.1. Study Example 2.15 carefully. Is 2
(2)
a proper
subset of Z
(2)
?
Exercise 2.75.1. Prove that 2
(2)
is an ideal of Z
(2)
.
56 CHAPTER 2. RINGS AND FIELDS
Example 2.76. Let R and S be rings, and let I
1
be an ideal of R and I
2
an ideal of S. Let I
1
I
2
denote the set (a, b) [ a I
1
, b I
2
.
Exercise 2.76.1. Prove that I
1
I
2
is an ideal of R S.
(See Exercise 2.129 ahead.)
Example 2.77. For simplicity, we will restrict ourselves in this example to
commutative rings. First, just to point out terminology that we have already
introduced in Example 2.71, by a multiple of r in a general commutative ring
R, we mean the set ra [ a R. (This obviously generalizes the notion
of multiple that we use in Z.) In Examples 2.70 and 2.71, we considered
the set of all multiples of a given element of our ring (multiples of m in the
case of Z, multiples of f(x) in the case of R[x]), and observed that these
formed an ideal. In Example 2.72, we considered something more general:
the set p(x, y), q(x, y) is the set of sums of multiples of p(x, y) and q(x, y).
This process can be generalized even further. If a
1
, . . . , a
n
are elements of
a commutative ring R, we denote by a
1
, . . . , a
n
the set of all elements of R
that are expressible as a
1
r
1
+ +a
n
r
n
for suitable elements r
1
, . . . , r
n
in
R. Thus, the elements of a
1
, . . . , a
n
are sums of multiples of the a
i
. (As
in Example 2.72, the r
i
may not be uniquely determined, but this will not
be an issue.)
Exercise 2.77.1. Show that a
1
, . . . , a
n
is an ideal of R.
The ideal a
1
, . . . , a
n
is known as the ideal generated by a
1
, . . . , a
n
.
An ideal generated by a single element is known as a principal ideal. Thus,
the ideal 2Z is a principal ideal in Z. (Of course, the ideal 2Z could just as
easily have been denoted by 2.) See Exercise 2.130 ahead.
Exercise 2.77.2. Show that a
1
, . . . , a
n
is the smallest ideal con-
taining a
1
, . . . , a
n
, in the sense that if J is any ideal of R that
contains a
1
, . . . , a
n
, then a
1
, . . . , a
n
J.
2.5. QUOTIENT RINGS 57
Question 2.77.1. Convince yourselves that 1 = R and 0 is
just the zero ideal 0.
Exercise 2.77.3. Suppose that R is a eld, and let a be a nonzero
element of R. Show that a = R. (Hint: play with the fact that
a
1
exists in R and that a is an ideal.)
Exercise 2.77.4. Conclude that the only ideals in a eld F are
the set 0 and F.
2.5 Quotient Rings
We now come to a fundamental method of constructing a new ring from a
given ring and an ideal in the ring, namely, the quotient ring construction.
Let R be a ring (not necessarily commutative) and let I be an ideal in R.
We dene a relation on R by declaring a b if and only if a b I.
It is immediate that is an equivalence relation:(i) certainly, for any a,
a a = 0 I; (ii) if a b, then by denition a b I, but since I is an
ideal, 1(a b) = b a I, so b a as well; (iii) nally, if a b and b c,
then by denition, a b I and b c I, so again because I is an ideal,
(a b) + (b c) = a c I, so a c as well.
Let us denote the equivalence class of an element a as [a]. (Recall what
this means: it is the set of all elements in R that are related to a under this
equivalence relation.) Let us also denote by a + I the set of all elements of
the ring of the form a + i as i varies in I. The set denoted a + I is called
the coset of I with respect to a. We have the following:
Lemma 2.78. The equivalence class [a] is precisely the coset a +I.
Proof. Take b [a]. Then b a, so by denition, b a I. Thus, b a = i
for some i I, or written dierently, b = a +i. Thus, b a +I, and since b
was arbitrary, we nd [a] a + I. Conversely, take any element b a + I.
Then by denition of the set a+I, we nd b = a+i for some i I. But this
just means b a I, that is b a. Thus, b [a] and since b was arbitrary,
we nd a +I [a]. This proves that the two sets are equal.
58 CHAPTER 2. RINGS AND FIELDS
2
Let us write R/I (R mod I) for the set of equivalence classes of R under
the relation above. Thanks to Lemma 2.78 we know that the equivalence
class of r R is the same as the coset r +I, so we will use the notation [r]
and r+I interchangeably for the equivalence class of r. The key observation
we make is that the set R/I can be endowed with two binary operations +
(addition) and (multiplication) by the following rather natural denitions:
Denition 2.79. [a] + [b] = [a + b] and [a] [b] = [a b] for all [a] and [b] in
R/I. (In coset notation, this would read (a + I) + (b + I) = (a + b) + I, and
(a +I)(b +I) = ab +I.) As always, if the context is clear, we will often omit
the sign and write [a][b] for [a] [b].
Before proceeding any further, we need to settle the issue of whether
these denitions make sense, in other words, whether these operations are
well-dened. Observe that the denition of addition, for instance, depends
on which representative we use for the equivalence classes. Now recall that
if a

a, then [a] = [a

]. Similarly, if b

b, then [b] = [b

]. If we use a and
b as representatives for the equivalence classes to which they belong, our
denition of the sum of the two classes is the class to which a + b belongs.
However, if we use a

and b

as representatives for the classes [a] and [b], our


denition says that the sum of the two classes is the class to which a

+ b

belongs. Can we be certain that the class to which a+b belongs is the same
as the class to which a

+b

belongs? If yes, then we can be certain that our


denition of addition is independent of which representative we use for the
equivalence class. We have the following:
Lemma 2.80. The operations of addition and multiplication on R/I de-
scribed above in Denition 2.79 are indeed well-dened. Moreover, the ad-
dition operation is commutative.
Proof. As in the paragraph above, suppose that a

a and b

b. Then,
by denition, a

a = i for some i I, and b

b = j for some j I. Thus,


2.5. QUOTIENT RINGS 59
a

+b

= (a +i) + (b +j) = (a +b) + (i +j). Since I is an ideal and hence


closed under addition, i +j is also in I. Thus, we nd that (a

+b

) (a+b)
is in I, that is, a

+ b

is related to a + b. Put dierently, this just means


that [a

+b

] = [a +b], so indeed, addition is well-dened.


As for multiplication, note that a

= (a +i)(b +j) = ab +aj +ib +ij.


Since I is an ideal, j I implies that aj I and ij I, and again since
I is an ideal, i I implies that ib I. Thus, aj + ib + ij I as well (as
I is closed under addition). It follows that a

ab I, or put dierently,
[a

] = [ab]. This shows that multiplication is well-dened.


Finally, note that [a] +[b] = [a +b] = [b +a] (the last equality is because
a + b = b + a in the ring R), and of course, [b + a] = [b] + [a]. Hence,
[a] + [b] = [b] + [a]. 2
Remark 2.81. The proof above illustrates why we require in the denition
of ideals that they be closed under addition and that ir I and ri I for
all i in I and all r R (see Lemma 2.67). It was this that allowed us to say
that addition and multiplication are well-dened: we needed to know above
that i +j I in the proof that addition is well-dened, and that aj I and
ib I and ij I and then aj +ib +ij I in the proof that multiplication is
well-dened, and for this, we invoked the corresponding properties of ideals.
Having proved that the operations + and on R/I are well-dened, let
us proceed to prove that all ring axioms hold in R/I:
Theorem 2.82. (R/I, +, ) is a ring.
Proof. We proceed to check all axioms one by one:
1. Associativity of +: Given elements [a], [b], and [c] in R/I, we need to
check that ([a] + [b]) + [c] = [a] + ([b] + [c]). Now ([a] + [b]) + [c] =
[a+b]+[c] by denition of [a]+[b], and similarly [a+b]+[c] = [(a+b)+c].
But by the associativity of addition in R, (a+b)+c = a+(b+c). Hence,
[(a +b) +c] = [a + (b +c)]. But applying the denition of addition of
60 CHAPTER 2. RINGS AND FIELDS
two elements of R/I in reverse, [a +(b +c)] is just [a] +[b +c], which
is then [a] + ([b] + [c]). Thus, + is associative in R/I.
2. Existence of identity element for +: The element [0] is the additive
identity, since for any element [a], [a]+[0] = [a+0] = [a], and [0]+[a] =
[0 +a] = [a].
3. Existence of inverses under +: For any element [a], the element [a]
is the inverse of [a] under +, since [a] + [a] = [a + (a)] = [0], and
similarly, [a] + [a] = [a +a] = [0].
4. Commutativity of +: This was already observed in Lemma 2.80 above.
5. Associativity of : This proof is similar to the proof of associativity of
+ above.
6. Existence of identity for : The element [1] acts as the 1 of R/I
since for any element [a], [a] [1] = [a 1] = [a], and similarly, [1] [a] =
[1 a] = [a].
7. Distributivity of over +: For any elements [a], [b], and [c] in R/I,
we have [a] ([b] + [c]) = [a] [b + c] = [a (b + c)] = [a b + a c]
(this last equality is because of the distributive property in R). And
of course, [a b + a c] = [a b] + [a c] = [a] [b] + [a] [c]. Putting
it together, we nd [a] ([b] + [c]) = [a] [b] + [a] [c]. The proof that
([a] + [b]) [c] = [a] [c] + [b] [c] is similar.
2
Denition 2.83. (R/I, +, ) is called the quotient ring of R by the ideal I.
How should one visualize R/I? Here is one intuitive way. Note that the
zero of R/I is the element [0], which is just the coset 0+I (see Lemma 2.78).
But the coset 0 +I is the set of all elements of R of the form 0 +i for some
i I, and of course, the set of all such elements is just I. Thus, we may
2.5. QUOTIENT RINGS 61
view the quotient construction as something that takes the ring R and simply
converts all elements in I to zeromore colloquially, the construction kills
all elements in I, or divides out all elements in I. This last description
explains the term quotient ring, and pushing the analogy one step further,
R/I can then be thought of as the set of all remainders after dividing out
by I, endowed with the natural quotient binary operations of Denition
2.79.
Example 2.84. As our rst example, take R to be R[x], and I to be x
(Example 2.71). What does R/I look like here? Any polynomial in R[x] is
of the form a
0
+a
1
x+a
2
x
2
+ +a
n
x
n
for some n 0 and some a
i
R. The
monomials a
1
x, a
2
x
2
, . . . , a
n
x
n
are all in I since each of these is a multiple
of x. If we set these to zero, we are left with simply a
0
which is a real
number. Thus, R[x]/x is just the set of constant terms (the coecients of
x
0
) as we range through all the polynomials in R[x]. But the set of constant
terms is precisely the set of all real numbers, since every constant term is
just a real number, and every real number shows up as the constant term of
some polynomial. Thus, R[x]/x equals R. But this equality is more than
just an equality of sets: it is an equality that preserves the ring structure.
(We will make the notion of preserving ring structure more precise in the
next sectionsee Example 2.104; Example 2.96 is also relevant.)
Example 2.85. Here is another example that would help us understand
how to visualize R/I. Consider R[x] again, but this time take I to be
x
2
+ 1 (Example 2.71, Exercise 2.71.2). Notice that x
2
is in the same
equivalence class as 1, since x
2
(1) = x
2
+ 1 is clearly in I. What
this means is in the quotient ring R[x]/x
2
+1, we may represent the coset
x
2
+ I by 1 + I. (Another way of thinking about this is to note that
x
2
may be written as (x
2
+ 1) + (1). If we kill o the rst summand
x
2
+ 1, which is in I, we arrive at the representative 1 + I for x
2
+ I.)
But there is more. As we have seen while proving the well-denedness of
multiplication in R/I (Lemma 2.80 above), if x
2
1, then x
2
x
2

62 CHAPTER 2. RINGS AND FIELDS


(1) (1). Thus, x
4
1, so we may replace x
4
+ I by 1 + I. Proceeding,
we nd x
6
+I is the same as 1+I, x
8
+I is the same as 1+I, etc. Moreover,
x
3
+I = (x+I)(x
2
+I) = (x+I)(1+I) = (x+I), etc. The coset of any
monomial x
n
is thus either 1+I or x+I. For instance, while considering
the equivalence class of a polynomial such as 2 5x +3x
2
+2x
3
2x
4
+x
5
,
which is (2+I)(5+I)(x+I)+(3+I)(x
2
+I)+(2+I)(x
3
+I)(2+I)(x
4
+I)+
(x
5
+I), we may make the replacements above to nd that it is the same as
(2+I)(5+I)(x+I)+(3+I)(1+I)+(2+I)(x+I)(2+I)(1+I)+(x+I).
Multiplying out, we nd this is the same as (2 5x 3 2x 2 + x) + I,
which simplies to (36x) +I or (3+I) 6(x+I). Temporarily writing
x for x + I, we loosely think of (3 6x) + I as the element 3 6x
subject to the relation x
2
+ 1 = 0, or what is the same thing, x
2
= 1.
But if x
2
= 1, then x behaves like our familiar complex number ! Thus,
we appear to have obtained the complex number 3 6 as the equivalence
class of 2 5x + 3x
2
+ 2x
3
2x
4
+x
5
mod x
2
+ 1. Indeed this is true: it
turns out that the ring R[x]/x
2
+ 1 is the same as the set of complex
numbers. We will make all this precise and justify these heuristics in the
next section (where this example will appear in Exercise 2.112).
Let us revisit a couple of quotient rings that we have already considered.
Example 2.86. The ring Z/2Z is really just the quotient ring of Z by the
ideal 2Z. Recall that [0]
2
and [1]
2
are precisely the equivalence classes of Z
under the equivalence relation a b i a b is even (see 2.20). Since the
even integers constitute the ideal 2Z, this is precisely the sort of equivalence
relation we have considered in this section.
Question 2.86.1. Write down the addition and multiplication ta-
bles for the ring operations on Z/2Z that we have introduced in
Denition 2.79 of this section. Can you see that these are precisely
the ring operations we dened early on in Example 2.20?
Example 2.87. In a similar manner, the ring Z/3Z of Example 2.21 is the
quotient ring of Z by the ideal 3Z.
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 63
Question 2.87.1. Do you see this?
More generally, one can consider the ideal mZ for m 4 and construct
the ring Z/nZ with operations as in Denition 2.79.
Exercise 2.87.1. Redo Exercise 2.21.2 in this new light.
2.6 Ring Homomorphisms and Isomorphisms
The process of forming the quotient ring of a ring R by an ideal I is worth
studying from an alternative perspective. Intuitively speaking, the ring
operations in R/I are essentially the same as the operations in R except
that the elements of R have all been divided out by I. What do we mean by
this? Let us take addition: Suppose we have two elements r + I and s + I
in R/I, and we wish to know how to add them. By the very denition of
addition in R/I, to add r +I and s +I in R/I is the same as adding r and
s rst in the ring R and then pushing the answer down to R/I to obtain
the coset (r +s) +I. It is in this sense that adding in R/I is essentially the
same as adding in R. We can view this in terms of the function f : R R/I
that pushes r R down to r + I. Since r + I = f(r), s + I = f(s),
and (r + s) + I = f(r + s), we nd f(r) + f(s) = f(r + s). The function f
that sends r to r +I, along with the property f(r) +f(s) = f(r +s) for all
r and s in R, precisely captures the notion that addition in R/I and R are
essentially the same.
Similarly, the denition of multiplication in R/I: (r +I)(s +I) = rs +I
gives the feeling that multiplication in R/I is the same as the multiplica-
tion in R except for dividing out by I: once again this intuition is captured
by the function f above along with the property f(r)f(s) = f(rs) for all r
and s in R.
We will turn this situation around. Suppose one has a function f from
one ring R to another ring S which has the two properties described above
(along with one another: f(1
R
) = 1
S
, where 1
R
and 1
S
are the multiplicative
64 CHAPTER 2. RINGS AND FIELDS
identities in R and S respectively). In the paragraphs above, the map f :
R R/I was surjective, but let us be more general, and not assume that
our map f from R to S is surjective. It will turn out that the image of f
will, all the same, be a subring of S (see Lemma 2.103 ahead). In such a
situation too, it will turn out, the ring operations in the ring R and in the
image of f (a subring of S) will essentially be the same except perhaps
for dividing out by some ideal. We will give this a name:
Denition 2.88. Let R and S be two rings, and let f : R S be a function.
Suppose that f has the following properties:
1. f(a) +f(b) = f(a +b) for all a, b, in R,
2. f(a)f(b) = f(ab) for all a, b, in R,
3. f(1
R
) = 1
S
.
Then f is said to be a ring homomorphism from R to S.
Remark 2.89. There are some features of this denition that are worth not-
ing:
1. In the equation f(a) + f(b) = f(a + b), note that the operation on
the left side represents addition in the ring S, while the operation on
the right side represents addition in the ring R. Loosely, we say that
any function f : R S satisfying f(a) + f(b) = f(a + b) preserves
addition.
2. Similarly for the equation f(a)f(b) = f(ab): the operation on the left
side represents multiplication in S, while the operation on the right
side represents multiplication in R. Loosely, we say that any function
f : R S satisfying f(a)f(b) = f(ab) preserves multiplication.
3. By the very denition of a function, f is dened on all of R. The
image of R under f, however, need not be all of S (i.e, f need not
be surjective). We will see examples of this ahead (see Example 2.97
and Example 2.98 for instance). However, the image of R under f is
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 65
not an arbitrary subset of S. The denition of a ring homomorphism
ensures that the image of R under f is actually a subring of S (see
Lemma 2.103 later in this section).
4. In fact, the stipulation f(1
R
) = 1
S
in the denition of a ring homo-
morphism is made precisely to ensure that the image of f is a subring
of S.
5. Writing 0
R
and 0
S
for the additive identities of R and S respectively,
note that it is not necessary to stipulate that f(0
R
) = 0
S
this prop-
erty holds automatically, as we will prove in Lemma 2.90 below.
Lemma 2.90. Let f : R S be a ring homomorphism. Then f(0
R
) = 0
S
.
Proof. We start with the fact that f(0
R
) = f(0
R
+ 0
R
) = f(0
R
) + f(0
R
),
where the rst equality is because 0
R
= 0
R
+ 0
R
, and the second equality
is because f(a + b) = f(a) + f(b) for all a and b in R. We now have an
equality in S: f(0
R
) = f(0
R
) + f(0
R
). Since f(0
R
) = 0
S
+ f(0
R
), we nd
0
S
+f(0
R
) = f(0
R
) +f(0
R
). By additive cancellation (see Remark 2.24) we
nd 0
S
= f(0
R
), thereby proving the lemma.
2
There is an immediate corollary to this that will be useful (see Corollary
4.60 in Chapter 4):
Corollary 2.91. Let f : R S be a ring homomorphism. Then, for all
a R, f(a) = f(a). In particular, f(1
R
) = 1
S
.
Proof. Note that 0
R
= a+(a). Hence, f(0
R
) = f(a+(a)) = f(a)+f(a).
Since f(0
R
) = 0
S
by Lemma 2.90, we nd 0
S
= f(a) + f(a) (and by
commutativity of addition in S, 0
S
= f(a) +f(a) as well). It follows that
f(a) = f(a). In particular, taking a = 1
R
and noting that f(1
R
) = 1
S
by denition of a ring homomorphism, the last line of the corollary follows.
2
66 CHAPTER 2. RINGS AND FIELDS
Before proceeding to examples of ring homomorphisms, let us consider
one remaining issue: the concept of a ring homomorphism was introduced
to capture the notion of operations on two rings being the same except for
dividing out by some ideal. What is the natural candidate for this ideal? To
divide out by an ideal in R is to make it zero in S (recall our discussion after
Denition 2.83 on how to view R/I). This leads naturally to the following:
Denition 2.92. Given a ring homomorphism f : R S, the kernel of f is
the set r R [ f(r) = 0
S
. It is denoted ker(f).
(Thus, the kernel of f is the set of all elements of R that get mapped to
0
S
under f.)
After these discussions, the following statement should come as no sur-
prise:
Proposition 2.93. The kernel of a ring homomorphism f : R S is an
ideal of R.
Proof. Given a and b in ker(f), note that f(a+b) = f(a)+f(b) = 0
S
+0
S
=
0
S
, so a + b is also in the kernel of f. Hence ker(f) is closed under the
addition operation on R. We rst wish to show that (ker(f), +) is an abelian
group. Both associativity and commutativity of + follow from the fact that
these properties hold for the addition operation on all of R, so we only need
to show that 0
R
is in ker(f) and that for all a ker(f), a is also in ker(f).
The fact that 0
R
ker(f) is just a restatement of Lemma 2.90. Now, for
any a ker(f), f(a) = 0
S
by denition of the kernel. By Corollary 2.91,
f(a) = f(a), so f(a) = 0
S
= 0
S
. This shows that a is in ker(f) as
well. Thus, (ker(f), +) is indeed an abelian group.
Next, note that for any r R and a ker(f), f(ra) = f(r)f(a) =
f(r) 0
S
= 0
S
(for the last equality, recall the properties in Remark 2.24),
and similarly, f(ar) = f(a)f(r) = 0
S
f(r) = 0
S
, so both ra and ar are also
in ker(f). This proves that ker(f) is indeed an ideal of R.
2
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 67
Here are some examples of ring homomorphisms:
Example 2.94. Let us revisit the example that started this discussion on
ring homomorphisms: a ring R, an ideal I in R, the quotient ring R/I,
and the function f : R R/I that sends r to its equivalence class modulo
I, i.e., [r], or what is the same thing, r + I. We have already observed in
our discussion above that f(r + s) = f(r) + f(s) and f(rs) = f(r)f(s)
in fact, it is these properties of f that led us to the denition of a ring
homomorphism. It is immediate that the third property of Denition 2.88
also holds: By Theorem 2.82, the multiplicative identity in R/I is 1+I, and
indeed, f(1) = 1 +I. Thus, f(1
R
) = 1
R/I
as desired. What is the kernel of
f? We expect it be I, since our entire discussion of kernels was modeled on
how, in this very example, we are dividing out by I. Let us formally verify
this: the kernel of f is all r R such that f(r) = 0
R/I
. Now, by Theorem
2.82, the zero in R/I is 0 + I. Thus, f(r) = 0 if and only if r + I = 0 + I,
i.e., if and only if r +I = I. Now if r +I = I, this means in particular that
r (= r +0), which is an element of r +I, must be in I. Conversely, if r I,
then it is easy to see (prove it!) that the set r + I must equal I. Putting
this together, we nd that the kernel of f is precisely I.
Example 2.95. As a special case of Example 2.94, we have, for any m 2,
a ring homomorphism from Z to Z/mZ dened by f(a) = a + mZ, whose
kernel if precisely the ideal mZ (see Example 2.87).
Example 2.96. Consider the function f : R[x] R that sends x to 0 and
more generally, a polynomial p(x) to p(0). (Thus, given a polynomial p(x) =
a
0
+a
1
x+a
2
x
2
+ a
k
x
k
, f simply sets x to zero, so f sends p(x) to a
0
.)
Exercise 2.96.1. Prove that f is a ring homomorphism from R[x]
to R.
Exercise 2.96.2. Prove that the kernel of f is precisely the ideal
x.
68 CHAPTER 2. RINGS AND FIELDS
See the discussion on page 61. We will have more to say on this example
ahead (see Example 2.104 and Theorem 2.110). See also Example 2.101
ahead for a generalization.
Example 2.97. Consider Z as a subset of Q. The function f : Z Q that
sends n Z to the fraction n/1 is a ring homomorphism.
Exercise 2.97.1. Prove this.
Exercise 2.97.2. Prove that the kernel of f is the zero ideal in Z.
Note that the image of f is just the integers, and in particular, f is not
surjective.
Example 2.98. More generally, if R is a subring of S, the function f : R
S that sends r to r is a ring homomorphism. The image of f is just R, so if
R is a proper subset of S, then f will not be surjective.
Example 2.99. Consider the function f : Q[x] Q[

2] that sends x to

2 and more generally p(x) to p(

2). (Thus, given a polynomial p(x) =


a
0
+ a
1
x + a
2
x
2
+ a
k
x
k
, f simply sets x to

2, so f sends p(x) to
the element a
0
+ a
1

2 + a
2
(

2)
2
+ a
k
(

2)
k
. Of course, this horrible
expression simplies into one of the form a + b

2, by using the fact that


(

2)
2
= 2, (

2)
3
= 2

2, etc.)
Exercise 2.99.1. Prove that f is a ring homomorphism.
Exercise 2.99.2. Prove that f is surjective.
(Hint: Given rationals a and b what is the image of a +bx?)
Let us determine the kernel of this homomorphism. Since x
2
goes to 2,
x
2
2 is certainly in the kernel. Since the kernel is an ideal of Q[x], the
set of multiples of x
2
2 (which is the principal ideal denoted x
2
2, see
Example 2.77), will also be in the kernel. We will show that there are no
other elements in the kernel, that is, ker(f) = x
2
2. To this end, let us
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 69
invoke polynomial long division that is taught in high school (and which we
will revisit in Exercise 2.131 at the end of this chapter). So, suppose we are
given an arbitrary polynomial p(x) that is in ker(f). We wish to show that
p(x) is a multiple of x
2
2. Dividing p(x) by x
2
2 using long division,
we can write p(x) = q(x)(x
2
2) + r(x) for some quotient polynomial q(x)
and some remainder r(x) that is at most of degree 1. We wish to show
that r(x) is actually zero. Since r(x) is at most of degree 1, we may write
it as a + bx for some a and b in Q. Since f is a ring homomorphism,
f(p(x)) = f(q(x))f(x
2
2) +a +b

2, and since p(x) goes to zero under f,


we nd p(

2) = 0 = q(

2) 0+a+b

2. Thus, we nd a+b

2 = 0. But we
have seen in Exercise 2.12.4 that this is impossible unless a = b = 0. Thus,
r(x) must be zero, thereby showing that ker(f) = x
2
2.
Question 2.99.1. After all, x goes to

2 under f, so why is
x

2 not in the kernel of f?


Example 2.100. Here is an example similar in spirit to Example 2.99 above.
Exercise 2.100.1. Show that the function f : Q[x] Q[] that
sends x to i and more generally p(x) to p(i) is a surjective ring
homomorphism, whose kernel is the ideal x
2
+ 1.
Example 2.101. After seeing in Examples 2.99 and 2.100 above how long
division can be used to determine kernels of homomorphisms from Q[x] to
other rings, the following should be easy:
Exercise 2.101.1. Let F be any eld, and let a F be arbitrary.
Show that the function f : F[x] F that sends x to a and more
generally p(x) to p(a) is a surjective ring homomorphism whose
kernel is the ideal generated by x a.
Notice that this example generalizes 2.96 above. The process of
sending x to a is also known as evaluation at a, and hence this
homomorphism is also known as the Evaluation Homomorphism.
We now come to a very special family of ring homomorphisms, namely,
ring isomorphisms. While ring homomorphisms capture the notion that
somehow the addition and multiplication in two rings are essentially the
70 CHAPTER 2. RINGS AND FIELDS
same except perhaps for dividing out by some ideal, isomorphisms cap-
ture a stronger notion: that multiplication in two rings are essentially the
same without even having to divide out by any ideal.
First, we need a couple of lemmas:
Lemma 2.102. Let f : R S be a ring homomorphism. Then f is an
injective function if and only if ker(f) = 0
R
(the zero ideal in R).
Proof. Suppose f is injective. Suppose that r ker(f), so f(r) = 0
S
. By
Lemma 2.90, f(0
R
) = 0
S
. Since both r and 0
R
map to the same element
in S and since f is injective, we nd r = 0. Thus, the kernel of f consists
of just the element 0
R
l. Conversely, suppose that ker(f) = 0
R
. Suppose
that f(r
1
) = f(r
2
) for r
1
, r
2
in R. Since f is a ring homomorphism, we nd
f(r
1
r
2
) = f(r
1
) +f(r
2
) = f(r
1
) f(r
2
) = 0
S
(the last but one equality
is because of Corollary 2.91). Thus, r
1
r
2
ker(f). But ker(f) is the zero
ideal, so r
1
r
2
= 0, i.e., r
1
= r
2
. Hence, f is injective.
2
Lemma 2.103. Let f : R S be a ring homomorphism. Write f(R) for
the image of R under f. Then f(R) is a subring of S.
Proof. We will apply Lemma 2.30 to f(R). Given arbitrary s
1
and s
2
, in
f(R), note that by denition of being in the image of R, s
1
= f(r
1
) and s
2
=
f(r
2
) for some elements r
1
and r
2
in R (these elements are not necessarily
uniquely determined in R). Then s
1
+s
2
= f(r
1
) +f(r
2
) = f(r
1
+r
2
) (the
last equality is because f is a ring homomorphism), thus showing that s
1
+s
2
is also in the image of R. Hence f(R) is closed under addition. Similarly,
s
1
s
2
= f(r
1
)f(r
2
) = f(r
1
r
2
), so f(R) is also closed under multiplication.
By denition, f(1
R
) = 1
S
, so 1
S
f(R). Finally, we need to show that
s
1
f(R) (recall that s
1
is an arbitrary element of f(R)). But this is easy:
thanks to Corollary 2.91, f(r
1
) = f(r
1
) = s
1
, so indeed s
1
f(R).
Hence, f(R) is a subring of S.
2
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 71
We now quantify our observation (see the discussion on page 61) that
somehow, the rings R/x and R are equal. Let us revisit this example
again in a new light:
Example 2.104. Let us dene

f : R[x]/x R by

f(p(x) + x) = p(0).
Let us explain this: the equivalence class of a polynomial p(x) under the
equivalence relation that denes the ring R/I is the coset p(x) + I (see
Lemma 2.78). Our function sends the equivalence class of p(x) to the
constant term of p(x). We rst need to check that this function is well-
dened: we have dened

f in terms of one representative of an equiv-
alence class, what if we had used another representative? So, suppose
p(x) + x = q(x) + x, then if we had used q(x), we would have dened

f(p(x) +x) =

f(q(x) +x) = q(0). Earlier, we had dened

f(p(x) +x)
to be p(0): are these denitions the same? In other words, is p(0) = q(0)?
The answer is yes! For, the fact that p(x) + x = q(x) + x means that
p(x) q(x) x (why?), or alternatively, p(x) q(x) is a multiple of x.
Hence, the constant term of p(x) q(x), which is p(0) q(0), must be zero,
i.e., p(0) = q(0). It follows that

f is indeed well-dened.
Now that we know

f is well-dened, it is easy to check that

f is a
ring homomorphism (do it!). What is the kernel of

f? It consists of all
equivalence classes p(x) +x such that the constant term p(0) is zero. But
to say that p(0) is zero is to say that p(x) is divisible by x (why?), or in
other words, that p(x) is already in x. Thus, the kernel of

f consists
of just the equivalence class xbut this is the zero element in the ring
R[x]/x. Thus, the kernel of

f is just the zero ideal, so by Lemma 2.102,

f is injective. Moreover,

f is clearly surjective, since every real number r
arises as the constant term of some polynomial in R[x] (for example, the
polynomial r + 0x + 0x
2
+ ).
The function

f quanties why R[x]/x and R are really equal to each
other. There are two ingredients to this: the function

f, being injective
and surjective, provides a one-to-one correspondence between R[x]/x and
72 CHAPTER 2. RINGS AND FIELDS
R as sets, and the fact that

f is a ring homomorphism tells us that the
addition and multiplication in R is essentially the same as that in R[x]/x.
Moreover, since

f has kernel zero, we do not even have to divide out by any
ideal in R[x]/x to realize this sameness of ring operations. Thus, R/x
and R are really the same rings, even though they look dierent. We say
that R[x]/x is isomorphic to R via the map

f.
Denition 2.105. Let f : R S be a ring homomorphism. If f is both
injective and surjective, then f is said to be an isomorphism between R and S.
Two rings R and S are said to be isomorphic (written R

= S) if there is some
function f : R S that is an isomorphism between R and S.
Let us look at some examples of ring isomorphisms:
Example 2.106. Let us revisit Example 2.38. Denote the function that
sends r R to diag(r) by f.
Exercise 2.106.1. Check that f is bijective as a function from R
to the subring of M
n
(R) consisting of matrices of the form diag(r).
Exercise 2.106.2. Also, check that f(r + s) = f(r) + f(s), and
f(rs) = f(r)f(s).
Moreover f(1) is clearly the identity matrix. Thus, the function f is
indeed a ring homomorphism from R to the subring of M
n
(R) consisting of
matrices of the form diag(r) that is both injective and surjective, or described
alternatively, f is an isomorphism between these two rings. Intuitively, these
two rings are the same, even though one appears as a set of ordinary
numbers, while the other appears in the form of special matrices.
Example 2.107. Dene a function

f from the quotient ring Q[x]/x
2
2
to Q[

2] by the rule

f(p(x) +x
2
2) = p(

2). We will prove that



f is an
isomorphism between Q[x]/x
2
2 and Q[

2].
Exercise 2.107.1. Show that

f is well dened. (Hint: If p(x) +
x
2
2 = q(x) + x
2
2, then p(x) q(x) x
2
2, so
p(x)q(x) = g(x)(x
2
2) for some polynomial g(x) Q[x]. What
happens if you set x =

2 in this?)
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 73
Exercise 2.107.2. Show that

f is a ring homomorphism.
Exercise 2.107.3. Show that

f is surjective.
Exercise 2.107.4. Show that

f is injective. (Hint: Recall that we
have proved in Example 2.99 that p(

2) is zero precisely when p(x)


is divisible by x
2
2.)
Thus,

f provides an isomorphism between Q[x]/x
2
2 and Q[

2].
Intuitively, the two rings are the same, even though one appears as a
quotient ring of polynomials, while the other appears as a subring of the
reals.
Example 2.108. The following examples show that well-known elds can
show up as subrings of matrices!
Exercise 2.108.1. Let S denote the subset of M
2
(Q) consisting
of all matrices of the form
_
a 2b
b a
_
where a and b are arbitrary rational numbers.
1. Show that S is a subring of M
2
(Q).
2. Prove that the map f : Q[

2] S that sends a + b

2 to
the matrix above is an isomorphism between Q[

2] and S.
Exercise 2.108.2. Let S denote the subset of M
2
(R) consisting
of all matrices of the form
_
a b
b a
_
where a and b are arbitrary real numbers.
1. Show that S is a subring of M
2
(R).
2. Prove that the map f : C S that sends a+b to the matrix
above is an isomorphism between C and S.
74 CHAPTER 2. RINGS AND FIELDS
The two examples in the exercises above are referred to as the regular
representation of Q[

2] in M
2
(Q) and C in M
2
(R) (respectively). More
generally, let K/F be any eld extension (see Denition 2.60). Then K can
be considered as a vector space over F (we will study this in Example 3.7
in Chapter 3 ahead). When the dimension of K over F is nite, say n, then
one can always nd a subring of M
n
(F) that is isomorphic to K: this is
considered in Exercise 3.105 in Chapter 3.
Example 2.109. It is not necessary that the rings R and S in the denition
of a ring isomorphism be dierent rings. A ring isomorphism f : R R is
to be thought of as a one-to-one onto map from R to R that preserves the
ring structure. (Such a map is also known as an automorphism of R.) Here
are some examples:
Exercise 2.109.1. Prove that the map f : Q[

2] Q[

2] that
sends a + b

2 (for a, b, in the rationals) to a b

2 is a ring
isomorphism. What are the elements of Q[

2] on which f acts as
the identity map?
Exercise 2.109.2. Let F be a eld, and let a be a nonzero element
of F. Let b be an arbitrary element of F. Prove that the map
f : F[x] F[x] that sends x to ax + b and more generally, a
polynomial p
0
+ p
1
x + + p
n
x
n
to the polynomial p
0
+ p
1
(ax +
b) + +p
n
(ax +b)
n
is an automorphism of F[x].
Exercise 2.109.3. Prove that the complex conjugation map f :
C C that sends a+b (given real number a and b) to the complex
number a b is an automorphism of C. Determine the set of
complex numbers on which f acts as the identity map.
We now come to a fundamental result that connects homomorphisms
and isomorphisms. To motivate this, compare Examples 2.96 and 2.104. In
the rst example, we dened a function f : R[x] R that sends p(x) to p(0)
and observed that it was a ring homomorphism whose image was all of R
and whose kernel was the ideal x, while in the second example, we dened
a function

f : R[x]/x R by

f(p(x) + x) = p(0), and observed that it
was well-dened and that it gave us an isomorphism between R[x]/x and
2.6. RING HOMOMORPHISMS AND ISOMORPHISMS 75
R. Observe the close connection between how the functions f and

f are
dened in the two examples, and observe that the ring R[x]/x is obtained
by modding R[x] by the kernel of f. Now as another instance, compare
Examples 2.99 and 2.107. Here too, in the rst example, we dened a
function f : Q[x] Q[

2] that sends p(x) to p(

2), and we observed


that it was a surjective ring homomorphism whose kernel was x
2
2. In
the second example, we dened a function

f : Q[x]/x
2
2 Q[

2] by

f(p(x) +x
2
2) = p(

2), and observed that it was well-dened and that


it gave us an isomorphism between Q[x]/x
2
2 and Q[

2]. Once again,


observe the close connection between how the functions f and

f are dened
in the two examples, and observe that the ring Q[x]/x
2
2 is obtained by
modding out Q[x] by the kernel of f.
These connections in the two pairs of examples above are not accidental,
and are merely instances of a more general phenomenon, captured by the
following:
Theorem 2.110. (Fundamental Theorem of Homomorphisms of Rings.)
Let f : R S be a homomorphism of rings, and write f(R) for the
image of R under f. Then the function

f : R/ker(f) f(R) dened by

f(r + ker(f)) = f(r) is well-dened, and provides an isomorphism between


R/ker(f) and f(R).
Proof. The idea of the proof is already contained in the two sets of examples
2.96 and 2.104, and 2.99 and 2.107 discussed above.
We check that

f is well-dened. Suppose r +ker(f) = s +ker(f). Then
r s ker(f), so f(r s) = f(r) f(s) = 0, so f(r) = f(s). Thus,

f(r +ker(f)) =

f(s +ker(f)), i.e.,

f is well-dened.
Now let us go through the three ingredients in Denition 2.88 and check
that

f is a ring homomorphism. We have

f ((r +ker(f)) + (s +ker(f))) =

f ((r +s) +ker(f)) = f(r+s) = f(r)+f(s) =



f(r+ker(f))+

f(s+ker(f)).
Exercise 2.110.1. Justify all the equalities above.
76 CHAPTER 2. RINGS AND FIELDS
Similarly,

f ((r +ker(f)) (s +ker(f))) =

f ((r s) +ker(f)) = f(r
s) = f(r) f(s) =

f(r +ker(f))

f(s +ker(f)).
Exercise 2.110.2. Again, justify all the equalities above.
Finally,

f(1
R
+ker(f)) = f(1
R
) = 1
S
.
Question 2.110.1. Why?
Hence

f is a ring homomorphism.
We check that

f is surjective as a function from R/ker(f) to f(R). Note
that any element of f(R) is, by denition, of the form f(r) for some r R.
But then, by the way we have dened

f, we nd f(r) =

f(r + ker(f)), so
indeed

f is surjective.
Finally, we check that

f is injective. Note that

f(r +ker(f)) = f(r) = 0
means that r ker(f). But this means that r + ker(f) = ker(f) (why?),
so r +ker(f) is the zero element of R/ker(f). Thus

f is injective.
Putting this together, we nd that

f provides an isomorphism between
R/ker(f) and f(R).
2
Here are examples of applications of this theorem, all built from examples
of ring homomorphisms that we have already seen.
Example 2.111. We have the isomorphism (see Example 2.100) Q[x]/x
2
+
1

= Q[].
Example 2.112. By the same token, we nd R[x]/x
2
+ 1

= C.
Exercise 2.112.1. Mimic Example 2.100 and construct a homo-
morphism from R[x] to C that sends p(x) to p(i) and prove that
it is surjective with kernel x
2
+ 1. Then apply Theorem 2.110 to
establish the claim that R[x]/x
2
+ 1

= C.
Example 2.113. Example 2.101 along with Theorem 2.110 above estab-
lishes that for any eld F and any a F, F[x]/x a

= F.
2.7. FURTHER EXERCISES 77
2.7 Further Exercises
Exercise 2.114. Starting from the ring axioms, prove that the properties
stated in Remark 2.24 hold for any ring R.
(See the notes on page 89 for some hints.)
Exercise 2.115. This generalizes Exercise 2.48: If R is a ring, let R

denote
the set of invertible elements of R. Prove that R

forms a group with respect


to multiplication.
Exercise 2.116. This exercise determines the units of the ring Z[]:
1. Dene a function N : Z[] Z by N(a + b) = a
2
+ b
2
. Show that
N(xy) = N(x)N(y) for all x and y in Z[].
2. If x is invertible in Z[], show that N(x) must equal 1.
3. Conclude that the only units of Z[] are 1 and .
Exercise 2.117. Consider the ring Q[

m] of Example 2.12. Now assume for


this exercise that m is not a perfect square. Show that a + b

m = 0 (for a
and b in Q) if and only if a = b = 0. Show that Q[

m] is a eld.
Exercise 2.118. The following concerns the ring Q[

2,

3] of Example 2.34,
and is designed to show that if a, b, c, and d are rational numbers, then a +
b

2 +c

3 +d

6 = 0 if and only if a, b, c, and d are all zero.


1. Show that
_
3/2 is not rational. (This is similar to showing that

p is
not rational for any prime p.
2. Show that

3 , Q[

2]. (Hint: Assume that

3 Q[

2]. Then there


must exist rational numbers x and y such that

3 = x + y

2. Square
both sides and arrive at a contradiction. You will need to invoke a fact
about Q[

2] that you were asked to prove in Example 2.12, as well as


the results of Chapter 1, Exercise 1.42, and part 1 above.)
3. Now assume that a +b

2 +c

3 +d

6 = 0 for some choice of rational


numbers a, b, c, and d. Write this as (a + b

2) +

3(c + d

2) = 0.
Prove that c + d

2 must be zero. (Hint: Argue that otherwise we can


write

3 =
a +b

2
c +d

2
. Why is this last equality a contradiction?)
78 CHAPTER 2. RINGS AND FIELDS
4. Conclude that this forces a = b = c = d = 0.
5. Observe that if a = b = c = d = 0 then a + b

2 + c

3 + d

6 = 0
trivially. This proves the assertion stated at the beginning.
Exercise 2.119. We will prove in this exercise that Q[

2,

3] is actually a
eld.
1. You know that if a and b are rational numbers, then (a+b

2) (ab

2)
is also rational. (Why?) Similarly, if c and d are rational numbers, then
(c +d

3) (c d

3) is also rational. Now show the following: if a, b, c,


and d are all rational numbers, then the product of the four terms
(a +b

2 +c

3 +d

6) (a +b

2 c

3 d

6)
(a b

2 +c

3 d

6) (a b

2 c

3 +d

6)
is also rational. (This just involves multiplying out all the terms above
do it! However, you can save yourselves a lot of work by multiplying the
rst two terms together using the formula (x +y)(x y) = x
2
y
2
, and
then multiplying the remaining two terms together, and looking out for
patterns.)
2. Now show using part (1) above that Q[

2,

3] is a eld. (Hint: Given


a nonzero element a + b

2 + c

3 + d

6 in Q[

2,

3], rst note that


by Exercise 2.118 above, none of (a + b

2 c

3 d

6), (a b

2 +
c

3 d

6) or (a b

2 c

3 +d

6) can be zerowhy? Now, in the


case of Q[

2], one nds the inverse of x +y

2 by multiplying both the


numerator and the denominator of the fraction
1
x+y

2
by x y

2 and
taking advantage of the fact that (x +y

2)(x y

2) is rational. What
ideas do you get from part (1) above?)
Exercise 2.120. Let R be an integral domain. Show that an element in R[x] is
invertible, if and only if it is the constant polynomial r(= r +0x+0x
2
+ ) for
some invertible element r R. In particular, if R is a eld, then a polynomial
in R[x] is invertible if and only if it is a nonzero element of R. (See the notes
on Page 88 for a discussion on polynomials with coecients from an arbitrary
ring.)
By contrast, show that the (nonconstant) polynomial 1 +[2]
4
x in the poly-
nomial ring Z/4Z[x] is invertible, by explicitly nding the inverse of 1 + [2]
4
x.
Repeat the exercise by nding the inverse of 1 + [2]
8
x in the polynomial ring
Z/8Z[x]. (Hint: Think in terms of the usual Binomial Series for 1/(1 +t) from
your Calculus courses. Do not worry about convergence issues. Instead, think
2.7. FURTHER EXERCISES 79
about what information would you glean from this series if, due to some miracle,
t
n
= 0 for some positive integer n?)
Exercise 2.121. We will revisit some familiar identities from high school in
the context of rings! Let R be a ring:
1. Show that a
2
b
2
= (a b)(a + b) for all a and b in R if and only if R
is commutative.
2. Show that (a +b)
2
= a
2
+ 2ab +b
2
for all a and b in R if and only if R
is commutative.
3. More generally, if R is a commutative ring, prove that the Binomial The-
orem holds in R: for all a and b in R and for all positive integers n,
(a+b)
n
=
_
n
0
_
a
n
+
_
n
1
_
a
n1
b+
_
n
2
_
a
n2
b
2
+ +
_
n
n 1
_
ab
n1
+
_
n
n
_
b
n
Exercise 2.122. An element a in a ring is said to be nilpotent if a
n
= 0 for
some positive integer n.
1. Show that if a is nilpotent, then 1a and 1+a are both invertible. (Hint:
Just as in Exercise 2.120 above, think in terms of the Binomial Series for
1/(1t) and 1/(1+t). Do not worry about convergence, but ask yourself
what you can learn from the series if t
n
= 0 for some positive integer n.)
2. Let R be a commutative ring. Show that the set of all nilpotent elements
in R forms an ideal in R. (Hint: Suppose that a
n
= 0 and b
m
= 0. What
can you say about (a + b)
n+m1
, given your knowledge of the Binomial
Theorem for commutative rings from Exercise 2.121 above?
Exercise 2.123. Let S denote the set of all functions f : R R. Given f
and g in S, dene two binary operations + and on S by the rules
(f +g)(x) = f(x) +g(x)
(f g)(x) = f(x)g(x)
(These are referred to, respectively, as the pointwise addition and multiplication
of functions.)
1. Convince yourselves that (S, +, ) is a ring. What is the 0 of S? What
is the 1 of S?
80 CHAPTER 2. RINGS AND FIELDS
2. Show that S is not an integral domain. (Hint: Play with functions like
f(x) = x +[x[ or g(x) = x [x[.)
3. More generally, show that every nonzero f S is either a unit or a
zero-divisor by showing:
(a) f is a unit if and only if f(x) ,= 0 for all x R.
(b) f is a zero-divisor if and only if f(x) = 0 for at least one x R.
4. Let s : R S be the function that sends the real number r to the function
s
r
dened by s
r
(x) = r for all x R. Show that s is an injective ring
homomorphism from R to S The image of s in R is therefore a subring of
R that is isomorphic to R. It is known as the set of constant functions.
Exercise 2.124. Let R be a ring.
Denition 2.124.1. The center of R, written Z(R), is dened
to be the set r R [ rx = xr for all x R.
1. Show that Z(R) is a subring of R.
2. If R is commutative, what is Z(R)?
3. Determine Z(M
2
(Z)). (Hint: Invoke the fact that a matrix in the center
must commute with the four matrices e
i,j
, where e
i,j
is as dened in
Exercise 2.16.4.)
Exercise 2.125. Let R be a ring.
1. If I and J are ideals of R, show that I J is an ideal of R. (Is I J an
ideal of R?)
2. If S and T are subrings of R, show that S T is a subring of R. (Is S T
a subring of R?)
3. If R is a eld, and if S and T are subelds of R, show that S T is a
subeld of R.
Exercise 2.126. Here is an example of a ring in which elements do not factor
uniquely into a product of primes! Consider the subring of C generated by Z and

5, namely, Z[

5]. By arguments nearly identical to those that you must


have used in Exercise 2.117 above, every element in this ring can be written
uniquely as a + b

5 for suitable integers a and b. We dene a function N


from Z[

5] to Z as follows: N(a +b

5) = a
2
+5b
2
. (Notice that a
2
+5b
2
is just (a +b

5) (a b

5).)
2.7. FURTHER EXERCISES 81
1. Show that N is multiplicative, that is, N(xy) = N(x)N(y) for any two
elements x and y of Z[

5].
2. Show that if x in Z[

5] is such that N(x) = 1, then x must be 1.


3. Use part 1 and Question 2.45 to show that if x is a unit in Z[

5], then
N(x) must be 1.
4. Use parts 2 and 3 above to show that if x is a unit in Z[

5], then x
can only be 1.
5. If R is a commutative ring, an irreducible in R is a nonzero element x
such that if x = bc for two elements b and c, then either b or c must be
a unit. (It turns out that this is the correct generalization of the concept
of primes that is needed to study unique factorization in arbitrary rings.)
Also, just as in Z, we say an element b in an arbitrary commutative ring
R divides an element a (or is a divisor of a) if there exists an element
c in R such that a = bc. Using part 4, show that if x is an irreducible
element in Z[

5], then the only divisors of x are 1 and x. (Thus, at


least in Z[

5], it is clear that irreducible elements are just like primes.)


6. Show that if x in Z[

5] is such that N(x) is a prime integer, then x


must be irreducible.
7. Show that there is no element x in Z[

5] with N(x) = 2. Similarly,


show that there is no element x with N(x) = 3.
8. Show that 2 is irreducible in Z[

5]. (Hint: Suppose 2 = xy. Then


4 = N(2) = N(x)N(y), as N is multiplicative. Study the various factor-
izations of 4 and use the previous parts.)
9. Similarly, show that 3 is irreducible in Z[

5].
10. Study the various factors of N(1 +

5) and of N(1

5) and show
that both 1 +

5 and 1

5 are irreducible.
11. Two irreducibles x and y in a commutative ring R are said to be associates
if x = yu for some unit u. Part 4 shows that in the ring Z[

5], two
elements x and y are associates if and only if x = y. Now use the fact
that every element in Z[

5] is uniquely expressible as a+b

5 to show
that neither 2 nor 3 is an associate of either 1 +

5 or 1

5.
12. A commutative ring R is said to possess unique prime factorization if every
element a R that is not a unit factors into a product of irreducibles,
82 CHAPTER 2. RINGS AND FIELDS
and if a = x
1
x
2
x
s
and a = y
1
y
2
y
t
are two factorizations of a
into irreducibles, then s must equal t, and after relabeling if necessary,
each x
i
must be an associate of the corresponding y
i
. (Again, it turns
out that this is the correct generalization of the concept of unique prime
factorization in the integers to arbitrary commutative rings.) Prove that
Z[

5] does not possess unique prime factorization by considering two


dierent factorizations of 6 into irreducibles. (Hint: Look at parts 8, 9,
10, and 11.)
Exercise 2.127. Prove that any nite integral domain must be a eld. (Hint:
Write R for the integral domain. Given a nonzero a R, you need to show that
a is invertible. What can you say about the function f
a
: R R that sends
any r to ar? Is it injective? Is it surjective? So?)
Exercise 2.128. Let K be a eld, and let R be a subring of K. Assume that
every element of K satises a monic polynomial with coecients in R: this
means that given any k in K, there exists a positive integer n and elements
r
0
, r
1
, . . . , r
n1
in R such that k
n
+r
n1
k
n1
+ +r
1
k +r
0
= 0. (The term
monic refers to the fact that the coecient of k
n
in the relation above is 1.)
Show that R must also be a eld.
(Hint: Since R is already an integral domain, you only need to show that
every nonzero element of R is invertible. Given a nonzero r R, note that r is
invertible as an element of K since K is a eld. In particular, r
1
exists in K.
Use the hypothesis to show that r
1
actually lives in R.)
Exercise 2.129. Let R and S be two rings. This exercise studies ideals in the
direct product R and S. Let I be an ideal of R S.
1. Let I
1
= a R such that (a, b) I for some b S. Show that I
1
is
an ideal of R.
2. Similarly dene I
2
to be the set b S such that (a, b) I for some
a R. Show that I
2
is an ideal of S.
3. Recall from Example 2.76 the meaning of I
1
I
2
. Show that I = I
1
I
2
.
(Hint: (a, b) = (1
R
, 0
S
)(a, b) + (0
R
, 1
S
)(a, b).)
We saw in Example 2.76 that if I
1
and I
2
are ideals of R and S respectively,
then I
1
I
2
is an ideal of R S. This exercise therefore shows the converse:
every ideal of RS is of the form I
1
I
2
where I
1
and I
2
are ideals of R and
S respectively.
2.7. FURTHER EXERCISES 83
Exercise 2.130. We will prove in this exercise that every ideal in Z is principal
(see Example 2.77). Let I be an ideal of Z. If I consists of just the element 0,
then I = 0 and is already principal. So, assume in what follows that I is a
nonzero ideal.
1. Show that I contains at least one positive integer.
2. Let d be the least positive integer in I (Well-Ordering Principle). Given
an arbitrary i I, write i = dq + r for some 0 r d by the division
algorithm. Show that r I.
3. Conclude that r must be zero.
4. Conclude that I = d.
An integral domain in which every ideal is principal is called a principal ideal
domain. This exercise therefore establishes that Z is a principal ideal domain.
Exercise 2.131. The purpose of this exercise is to prove the following result,
which is known as the division algorithm for polynomials: Let F be a eld,
and let f(x) and g(x) be two polynomials in F[x] with g(x) ,= 0. Then, there
exist unique polynomials q(x) and r(x), such that f(x) = g(x)q(x) +r(x) with
either r(x) = 0 or else deg(r(x)) < deg(g(x)).
Note the similarity of this result with Lemma 1.4 of Chapter 1. This result,
of course, simply codes the output of the familiar process of dividing f(x) by
g(x) using long division, and in fact, the long division process can be turned
around to furnish a proof of this result. However, we will prove this result
instead in a manner analogous to how we proved Lemma 1.4 of Chapter 1, to
underscore certain similarities between the integers and polynomials. (You will
notice that the core of this proof, however, invokes a crucial ingredient of the
long division process in Part 5a below!)
1. First prove the uniqueness of q(x) and r(x) as follows: Suppose that
f(x) = g(x)q(x) +r(x) and as well, f(x) = g(x)q

(x) +r

(x), for poly-


nomials q(x), r(x), q

(x), and r

(x) with either r(x) = 0 or deg(r(x)) <


deg(g(x)), and similarly, either r

(x) = 0 or deg(r

(x)) < deg(g(x)).


Rewrite this as g(x)(q(x) q

(x)) = r

(x) r(x). Show that if r

(x)
r(x) ,= 0 then the degree of the right side must be less than the degree
of the left side, and hence conclude that r(x) = r

(x) and q(x) = q

(x).
This establishes the uniqueness of q(x) and r(x).
2. Now for the existence of q(x) and r(x). First, let S

denote the set


f(x) g(x)h(x) [ h(x) F[x]. Show that S

is nonempty.
84 CHAPTER 2. RINGS AND FIELDS
3. If S

contains 0, show that we have proved the existence of q(x) and r(x)
with the required properties.
4. So assume from now on that S

does not contain 0. Show that among


the elements of S

there must be an element of least degree.


5. Let r(x) denote an element of least degree in S

, and let q(x) F[x]


be that polynomial such that f(x) g(x)q(x) = r(x). First show that
deg(r(x)) < deg(g(x)) as follows:
(a) Assume to the contrary that deg(r(x)) deg(g(x)). Let m =
deg(r(x)) and n = deg(g(x)), so m n. Let r
m
and g
n
(re-
spectively) be the highest coecients of r(x) and g(x) (so, by def-
inition of highest coecient, r
m
and g
n
are nonzero). Show that
r(x) (r
m
/g
n
)x
mn
g(x) has degree less than r(x).
(b) Now show that the element f(x) g(x)(q(x) +(r
m
/g
n
)x
mn
) is an
element of S

that has degree less than that of r(x). Conclude that


deg(r(x)) < deg(g(x)).
6. Conclude that we have proved the existence of q(x) and r(x) with the
required properties in the case where S

does not contain 0, and have


hence proved our result in all cases.
Exercise 2.132. We saw in Exercise 2.130 that Z is a principal ideal domain.
The key to that proof was the division algorithm in the integers. Now that we
have established a corresponding division algorithm in the ring F[x], where F is
any eld (see Exercise 2.131), we will use it to show that F[x] is also a principal
ideal domain.
Accordingly, let I be an ideal of F[x]. If I consists of just the element 0,
then I = 0 and is already principal. Similarly, if I = R, then I = 1 (see
Exercise 2.77.1) so I is principal in this case as well. So, assume in what follows
that I is a nonzero proper ideal of R (proper simply means that I ,= R).
In particular, I cannot contain any constant polynomials other than 0, since, if
some nonzero a F is in I, then a a
1
= 1 is also in I, contradicting what
we have assumed about I.
Let f(x) be a polynomial in I whose degree is least among all (nonzero)
polynomials in I. (Such a polynomial exists by the Well-Ordering Principle.)
Note that f(x) must have positive degree by our assumption about I. Let g(x)
be an arbitrary polynomial in I. Apply the division algorithm and, using similar
ideas as in Exercise 2.131, prove that g(x) must a multiple of f(x). Conclude
that I = f(x).
2.7. FURTHER EXERCISES 85
Exercise 2.133. By contrast with the situation in Exercise 2.132 above, Z[x]
is not a principal ideal domain! Prove this. (Hint: Show that the ideal 2, x of
Z[x] cannot be generated by a single polynomial f(x) with coecients in Z.)
Exercise 2.134. Let R be a ring, and let I and J be two ideals of R. The
sum of I and J, denoted I + J, is the set i + j [ i I and j J (i.e., it
consists of all elements of R that are expressible as a sum of an element from
I and an element from J). Show that I +J is an ideal of R.
Exercise 2.135. Let R be a ring, and for simplicity, assume throughout this
exercise that R is commutative. A proper ideal I is said to be maximal if for
any other proper ideal J, I J implies that I = J.
1. Show that if I is maximal then for any other ideal J, J I implies
I +J = R. (Hint: Assume that I is maximal and J is another ideal with
J I. Pick an element j J I. Show that the set K = i +rj [ i
I and r R is an ideal of R. Now invoke the fact I K.)
2. Show that the converse is true as well: if I is a proper ideal such that for
any other ideal J, J I implies I + J = R, then I is maximal. (Hint:
Assume that I has this property, and let J be a proper ideal with I J.
If I ,= J, then clearly J I. The hypothesis then says I + J = R. But
what else can you say about I +J that then gives a contradiction?)
(Thus either property could be used to dene maximal ideals.)
3. Show that a proper ideal I is maximal if and only if R/I is a eld. (Hint:
Assume that I is maximal. Pick a nonzero element [x] in R/I. Since [x]
is nonzero, x , I. Study the set K = i +rx [ i I and r R, which
you showed in part (1) to be an ideal of R. By maximality of I show that
there must be some i I and r R such that i + rx = 1. What does
this relation read in R/I? Now invoke Exercise 2.47. A similar argument
should also establish that if R/I is a eld, then I must be maximal.)
It is instructive to note that maximal ideals always existsee Theorem B.6
in Appendix B.
Exercise 2.136. Let R be a commutative ring. A proper ideal I of R is said
to be prime if whenever ab I for a and b in R, then either a or b must be in
I.
1. Show that I is a prime ideal if and only if R/I is an integral domain.
(Hint: This is just a matter of translating the denition of a prime ideal
over to the ring R/I: for instance, assume that I is prime. If we have a
relation [a][b] = 0
R/I
in R/I, then this means that ab I in R.)
86 CHAPTER 2. RINGS AND FIELDS
2. Show that every maximal ideal is necessarily prime.
3. Show that if p is a prime integer, then the ideal p in Z/pZ is a prime
ideal.
4. Let F be a eld, and let p(x) be an irreducible polynomial in F[x]. (This
means that whenever p(x) = q(x)r(x) for two polynomials in F[x], then
either q(x) or r(x) must be a constant polynomial.) Show that the ideal
p(x)) is a prime ideal in F[x].
Exercise 2.137. Let R be any ring containing the rationals. Prove that the
only ring homomorphism f : Q R is the identity map that sends any rational
number to itself. (Hint: given that f(1) must be 1, what can you say about
f(2), f(3), etc.? Next, what can you say about f(1/2), f(1/3), etc.? So now,
what can you say about f(m/n) for arbitrary integers m and n with n ,= 0?
Exercise 2.138. Prove that the following are all ring isomorphisms from
Q[

2,

3] to itself. Here, a, b, c, and d are, as usual, rational numbers.


1. The map that sends a +b

2 +c

3 +d

6 to a b

2 +c

3 d

6.
2. The map that sends a +b

2 +c

3 +d

6 to a +b

2 c

3 d

6.
3. The map that sends a +b

2 +c

3 +d

6 to a b

2 c

3 +d

6.
(Of course, the identity map that sends a+b

2+c

3+d

6 to a+b

2+
c

3 +d

6 is also a ring isomorphism. It can be shown that these four are all
the ring isomorphism from Q[

2,

3] to itself.)
Notes
Remarks on Example 2.10 Every nonzero element in Q has a multiplicative
inverse, that is, given any q Q with q ,= 0, we can nd a rational number q

such
that qq

= 1. The same cannot be said for the integers: not every nonzero integer
has a multiplicative inverse within the integers. For example, there is no integer a
such that 2a = 1, so 2 does not have a multiplicative inverse.
Remarks on Example 2.12 The sum and product of any two elements a +
b

2 and c +d

2 of Q[

2] are (respectively) (a +c) + (b +d)

2 and (ac + 2bd) +


(ad + bc)

2. Since a + c, b + d, ac + 2bd and ad + bc are all rational numbers,


2.7. FURTHER EXERCISES 87
the sum and product also lie in Q[

2]. Thus, the standard method of adding and


multiplying two real numbers of the form x +y

2 with x and y in Q indeed gives


us binary operations on Q[

2]. (In the language of the next section, Q[

2] is
closed under addition and multiplication.) Now suppose you were trying to prove
that, say, addition in Q[

2] is associative, that is, for any u, v, and w in Q[

2],
(u +v) +w = u + (v +w). Notice that in addition to being in Q[

2], u, v, and w
are also real numbers. Since associativity holds in the reals, we nd upon viewing
u, v, and w as real numbers that (u + v) + w = u + (v + w). Now viewing u, v,
and w in this equation back again as elements of Q[

2] , we nd that associativity
holds in Q[

2]! This same argument holds for associativity of multiplication and


distributivity of multiplication over addition. To prove that a + b

2 = 0 i a = 0
and b = 0, proceed as follows: If b is not zero, a+b

2 = 0 yields

2 = a/b. Since
a/b is a rational number, this contradicts Chapter 1, Exercise 1.42, so b must be
zero. But if b = 0, a +b

2 = 0 reads a = 0, so we nd that both a and b are zero.


Remarks on Example 2.14 Assume that a +b = 0. If b is not zero, we can
write = a/b, and squaring both sides, we nd 1 = a
2
/b
2
. The right hand side
is positive, since both a
2
and b
2
are positive (they are squares). But the left hand
side is negative. Because of this contradiction, b must be zero. As before, we nd
that a is also zero.
Remarks on Examples 2.15 Given two elements a and b in Z
(2)
, write a
as x/y where gcd(x, y) = 1 and y is odd. (Why can you do this?) Write b as
u/v where gcd(u, v) = 1 and v is odd. Then a + b = (xv + yu)/yv. This fraction
may not be reduced, but notice that yv, being a product of two odd integers, is
odd. After you cancel all common factors from (xv + yu) and yv, the resultant
fraction will still have an odd denominator (why?). Hence a + b will be in Z
(2)
.
In a similar way, show that ab (gotten by the usual multiplication of two rational
numbers) will also be in Z
(2)
. Now that you have two binary operations on Z
(2)
,
you can check that the ring axioms hold. As with previous examples, associativity
and distributivity follow from the fact that they hold for the rationals. Notice that
the fact that the product of two odd integers is odd was essential in showing that
both a +b and ab lie in Z
(2)
. How could we generalize this? Rewrite this property
in the contrapositive form, yv is even implies either y or v is even, that is, 2[yv
implies 2[y or 2[v. If we could nd another integer n that has the property that
88 CHAPTER 2. RINGS AND FIELDS
n[yv implies n[y or n[v, we could use the same arguments to show that Z
(n)
is also
a ring. (Assuming you found such an integer n, how would you dene Z
(n)
?) Can
you think of other integers that have this property? (Hint: You have come across
such integers in the previous chapter.)
Remarks on Example 2.16 For n = 1, M
n
(R) is just R, so it is commutative.
For all other n, M
n
(R) is noncommutative. Given A in M
n
(R), write A as (a
i,j
).
(Recall what this notation means: you are referring to the (i, j)-th entry as a
i,j
.)
Similarly, write B as (b
i,j
) and C as (c
i,j
). Consider (A+B)+C. What is the (i, j)-
th entry of this resultant matrix? It is (a
i,j
+b
i,j
)+c
i,j
. On the other hand, what is
the (i, j)-th entry of A+(B+C)? It is a
i,j
+(b
i,j
+c
i,j
). Are the two (i, j)-th entries
equal on both sides? Yes! Why? Because a
i,j
, b
i,j
, and c
i,j
are just real numbers,
and since addition is associative in R, (a
i,j
+b
i,j
) +c
i,j
= a
i,j
+(b
i,j
+c
i,j
)! Since
this is true for every pair (i, j), we nd that (A+B) +C = A+ (B +C). (Notice
how the associativity of addition in M
n
(R) depends on the associativity of addition
in R.) In a similar manner, try to prove the distributive property of multiplication
over addition for M
n
(R); your proof should invoke the fact that distributivity holds
in R. Actually, if R is any ring, M
n
(R) is also a ring. It is noncommutative if
n 2. When n = 1, M
n
(R) is just R, so for n = 1, M
n
(R) is commutative if and
only if R is commutative.
Remarks on Example 2.17 For any ring R, we can consider the set of polyno-
mials with coecients in R with the usual denition of addition and multiplication
of polynomials. This will be a ring, with additive identity the constant polynomial
0 and multiplicative identity the constant polynomial 1. If R is commutative, R[x]
will also be commutative. (Why? Play with two general polynomials f =

n
i=0
f
i
x
i
and g =

m
j=0
g
j
x
j
and study fg and gf.) If R is not commutative, R[x] will also
not be commutative. To see this last assertion, suppose a and b in R are such that
ab ,= ba. Then viewing a and b as constant polynomials in R[x], we nd that we
get two dierent products of the polynomials a and b depending on the order in
which we multiply them!
Here is something strange that can happen with polynomials with coecients
in an arbitrary ring R. First, the degree and highest coecient of polynomials in
R[x] (where R is arbitrary) are dened exactly as for polynomials with coecients
in the reals. Now over R[x], if f(x) and g(x) are two nonzero polynomials, then
2.7. FURTHER EXERCISES 89
deg(f(x)g(x)) = deg(f(x)) + deg(g(x). But for an arbitrary ring R, the degree of
f(x)g(x) can be less than deg(f(x)) + deg(g(x))!
To see why this is, suppose f(x) = f
n
x
n
+ lower-degree terms (with f
n
,= 0),
and suppose g(x) = g
m
x
m
+lower-degree terms (with g
m
,= 0). On multiplying out
f(x) and g(x), the highest power of x that will show up in the product is x
n+m
,
and its coecient will be f
n
g
m
. If we are working in R, then f
n
,= 0 and g
m
,= 0
will force f
n
g
m
to be nonzero, so the degree of f(x)g(x) will be exactly n+m. But
over arbitrary rings, it is quite possible for f
n
g
m
to be zero even though f
n
and g
m
are themselves nonzero. (You have already seen examples of this in matrix rings.
Elements a and b in a ring R such that a ,= 0 and b ,= 0 but ab = 0 will be referred
to later in the chapter as zero-divisors.) When this happens, the highest nonzero
term in f(x)g(x) will be something lower than the x
n+m
term, so the degree of
f(x)g(x) will be less than n +m!
Clearly, this phenomenon will not occur if the coecient ring R does not have
any zero-divisors. As will be explained further along in the chapter, elds do
not have any zero-divisors (i.e., they are integral domains.) Hence if F is a eld
and f(x) and g(x) are two nonzero polynomials in F[x], then deg(f(x)g(x)) =
deg(f(x)) + deg(g(x)). (In particular, this shows that if F is any eld, F[x] also
does not have zero-divisorswhy?)
Remarks on Example 2.22 The additive identity is (0, 0) and the multi-
plicative identity is (1, 1). What is the product of (1, 0) and (0, 1)? Of (2, 0) and
(0, 2)?
Remarks on some properties of rings deducible from the axioms
Here is a hint for some of these properties listed in Remark 2.24:
1. Uniqueness of additive identity: Suppose 0 and 0

are two additive identities


in a ring R. Consider the expression 0 +0

. First view 0 as the identity, and


then view 0

as the identity. What do you nd?


2. Additive cancellation: Given that a+b = a+c, what happens if you add the
additive inverse of a to both sides of the equation, and use associativity?
3. a 0 = 0 a = 0. What happens if you invoke the fact that 0 = 0 + 0 and
multiply both sides by a?
90 CHAPTER 2. RINGS AND FIELDS
4. (1) (1) = 1. You would by now already have proved that a 0 = 0 a = a
for all a in your ring R. Write (1) 0 as (1) (1+(1)) and play with this!
Remarks on Denition 2.27 The requirement that 1 be in S arises from a
rather nasty technical point that can be ignored during a rst reading. If you are
curious, recall rst that 1 is merely notation for the multiplicative identity of R;
we could just as easily have referred to it as e or something else all along. It
turns out that if we dened subrings without the condition that 1 be in S, then
it is possible for S to be a subring of R (under this hypothetical denition) with
S and R having dierent multiplicative identities! This is a scenario we wish to
avoid, and it turns out that insisting that the multiplicative identity of R (namely
1) be in S will take care of this problem. At the same time, it turns out that no
such precaution needs to be taken for the additive identitythe additive identities
of R and S will necessarily be equal. (The proof is simple: write 0
S
and 0
R
for the
two additive identities, so we wish to prove that 0
S
= 0
R
. We know that 1 S.
So, working in S, we nd 0
S
+ 1 = 1, by the very denition of 0
S
. On the other
hand, working in R, we nd 0
R
+ 1 = 1. Comparing the two expressions for 1 and
working in R we nd 1 = 0
S
+ 1 = 0
R
+ 1. Additive cancellation in R now shows
that 0
S
= 0
R
.) This is of course all too pedantic for a rst go aroundwe would
do best by just accepting the denition above and getting on with our lives!
Remarks on Examples 2.33 Since every integer a can be written as a/1,
and since 1 of course is 2
0
, Z Z[1/2]. Since 2 does not divide 1, every integer a
(= a/1) is also in Z
(2)
. Hence, Z[1/2] Z
(2)
certainly contains Z. Now let x be any
rational number in Z[1/2] Z
(2)
. Since x Z[1/2], x can be written in the reduced
form a/2
n
, for some integer a and some n 0. If n > 0, then x cannot be in Z
(2)
.
Hence n = 0, that is, x Z. It follows that Z[1/2] Z
(2)
is precisely Z.
Remarks on the notation Q[

2]: Subring Generated by an Element


We have used notation like Q[

2], Q[], Z[1/2], to denote various rings that we have


studied. There is a reason for this notation: these are all examples of rings generated
by a subring and an element. We consider this notion here.
We will consider only commutative rings, even though the notion exists for
noncommutative rings as well. Accordingly, let R be a commutative ring, and let S
be a subring. (Must S be commutative as well?) Let a be any element in R. (For
2.7. FURTHER EXERCISES 91
instance, let R be the reals, let S be the rationals, and let a be the real number
1 +

2.) In general, S a will not be a subring of R, since this new set may not
be closed under addition and multiplication. (In our example, the square of 1+

2,
which is 3 +2

2, is not in Q 1 +

2. Similarly, the sum of, say 2 and 1 +

2,
which is 3 +

2 is not in Q1 +

2.) One could then ask: If in general S a


is not a subring of R, what are the elements of R that you should adjoin to the set
S a to get a set that is actually a subring of R?
To get a subring of R that contains both S and a, it is clear that we need to
be able to multiply a with itself any number of times, since our desired set must be
closed under multiplication. Hence, we need to adjoin all the elements a
2
, a
3
,. . .
Next, once all powers a
i
are adjoined, we need to be able to multiply any power of
a with any element of S, so we need to adjoin all products of the form sa
i
, where
s is an arbitrary element of S and a
i
is an arbitrary power of a. (The assumption
that R is commutative is being used here somewhere. Where exactly do you think
it is used?) Once we have such products, we need to be able to add such products
together if we are to have a ring (remember, our target set must be closed under
addition), so we need to have all elements of the form s
0
+s
1
a +s
2
a
2
+ +s
n
a
n
,
where the s
i
are arbitrary elements of S, and n 0. Is this enough? It turns out
it is!
Denition 2.139. Let R be a commutative ring, S a subring, and a an element of
R. An expression such as s
0
+s
1
a+s
2
a
2
+ +s
n
a
n
is called a polynomial expression
in a with coecients in S. Let S[a] denote the set of all polynomial expressions in a
with coecients in S, that is, the set of all elements of R that can be written in the
form s
0
+s
1
a +s
2
a
2
+ +s
n
a
n
, for some n 0, and some elements s
0
, s
1
,. . . , s
n
in S. S[a] is known as the subring of R generated by S and a. (If it is clear that we
are working inside a xed ring R, we often refer to S[a] merely as the ring generated
by S and a.)
Of course, we have blithely referred to S[a] as a ring in the denition above,
but we have yet to prove that S[a] is actually a ring! We will do so in a moment.
Lemma 2.140. Let R be a commutative ring, and let S be a subring of R. Let a
be an element of R. The set S[a] dened above is a subring of R.
Proof. Since S S[a], and since 1 S, 1 is in S[a]. Every element in S[a] is of the
form s
0
+s
1
a +s
2
a
2
+ +s
n
a
n
for some n 0 and some elements s
0
, s
1
,. . . , s
n
in S. The negative of such an element is (s
0
) +(s
1
)a+(s
2
)a
2
+ +(s
n
)a
n
,
which is also a polynomial expression in a with coecients in S, and is hence in
92 CHAPTER 2. RINGS AND FIELDS
S[a]. By Lemma 2.2.1, we only need to show that S[a] is closed under addition and
multiplication. You should be able to do this yourselves: show that the sum and
product of two polynomial expressions in a with coecients in S are also polynomial
expressions in a with coecients in S. 2
Notice that S[a] includes both S and a. Our arguments preceding the lemma
above show that any subring of R that contains both S and a must contain all
polynomial expressions in a with coecients in S, that is, it must contain S[a].
S[a] should thus be thought of as the smallest subring of R that contains both S
and a.
Here is an exercise: In the setup above, if two polynomial expressions s
0
+s
1
a+
s
2
a
2
+ + s
n
a
n
and s

0
+ s

1
a + s

2
a
2
+ + s

m
a
m
are equal (as elements of R),
can you conclude that n = m and s
i
= s

i
for i = 0, . . . , n? (Hint: See the examples
below.)
Now let us consider some examples:
Example 2.141. What, according to our denition above, is the subring of the
reals generated by Q and

2? It is the set of all polynomial expressions in

2
with coecients in Q, that is, the set of all expressions of the form q
0
+ q
1

2 +
q
2
(

2)
2
+ +q
n
(

2)
n
. Now let us look at these expressions more closely. Since
(

2)
2
= 2, q
2
(

2)
2
is just 2q
2
, q
4
(

2)
4
is just 4q
4
, etc. Similarly, q
3
(

2)
3
is just
2q
3

2, q
5
(

2)
5
is just 4q
5

2, etc. By collecting terms together, it follows that every


polynomial expression in

2 with coecients in Q can be rewritten as a +b

2 for
suitable rational numbers a and b. (For example, 1+2

2+(1/2)(

2)
2
+(1/4)(

2)
3
can be rewritten as 2 +(5/2)

2.) Hence, the subring of the reals generated by the


rationals and

2 is the set of all real numbers of the form a + b

2. It is for this
reason that we denoted this ring Q[

2] as far back as Example 2.12.


Example 2.142. Similarly, the subring of Q[

2] generated by Z and

2 is the set
of all real numbers of the form a +b

2, where a and b are integers. This is why we


denoted this ring Z[

2] in Example 2.31.
Example 2.143. Using the fact that i
2
= 1, show that the subring of C generated
by Q and i is the set of all complex numbers of the form a +bi, where a and b are
rational numbers. This explains the notation Q[] for the ring in Example 2.14.
2.7. FURTHER EXERCISES 93
Example 2.144. Similarly, the subring of Q[] generated by Z and i is is the set
of all complex numbers of the form a + bi, where a and b are integers. Hence the
notation Z[] in Example 2.32.
Example 2.145. Show that the subring of Q generated by Z and 1/2 is the set
of all rational numbers that have the property that, when written in the reduced
form a/b with gcd(a, b) = 1, the denominator b is a power of 2. This explains the
notation Z[1/2] in Example 2.33.
Example 2.146. Prove that the subring of R generated by Q[

2] and

3 is
precisely the ring of Example 2.34. Thus, this ring should be denoted Q[

2][

3].
We will often avoid using the second pair of brackets and simply refer to this ring
as Q[

2,

3].
Here is a quick exercise: In Lemma 2.140, suppose a is actually in S. Can you
prove that the ring generated by S and a is just S?
Remarks on Denition 2.46 Most textbooks dene a eld to be a commuta-
tive ring in which every nonzero a is invertible. In other words, the extra condition
that we have imposed, namely that the ring in question rst be an integral domain,
is omitted by most textbooks. This is because this extra condition is not really
requiredone can show easily that any commutative ring in which every nonzero
element a is invertible must necessarily be an integral domain. (If there were to ex-
ist a pair of nonzero elements a and b such that ab = 0, then multiplying both sides
by a
1
, which exists by hypothesis, we would nd b = 0, a contradiction. Hence
there can be no pair of nonzero elements that multiply out to zero.) The reason we
have chosen to dene a eld as an integral domain in which every nonzero element
is invertible is to highlight the hierarchical nature of the objects that we have been
considering: rings are fairly general objects, commutative rings are special rings
that are nicer to deal with, integral domains are special commutative rings that are
even nicer, and nally, elds are special integral domains that are nicest of all!
94 CHAPTER 2. RINGS AND FIELDS
Chapter 3
Vector Spaces
3.1 Vector Spaces: Denition and Examples
Recall from elementary linear algebra the notation R
2
for 2-dimensional xy
space and R
3
for 3-dimensional xyz space. A vector in R
2
(respectively
R
3
) is an arrow with its base at the origin and its tip at some point in R
2
(respectively R
3
). If v and w are vectors, then we add v and w using the
parallelogram law. We know that this process of addition is commutative,
that is, v + w = w + v for all vectors v and w. Vector addition is also
associative, that is, v +(w+u) = (v +w)+u for all vectors v, w, and u. The
vector whose base and tip are at the origin is denoted 0 (suggestively), and
satises v +0 = 0+v for all vectors v. Finally, for every vector v, the vector
we get by inverting v about the origin is denoted v (also suggestively), and
satises v + (v) = (v) +v = 0.
Focusing just on R
2
for convenience, let us stop thinking of R
2
as a
geometric object. Instead, since every point of R
2
corresponds to a vector
whose tip is at the given point, let us consider R
2
as a set consisting of
abstract objects called vectors. This set has a binary operation dened on
itaddition, where v + w is dened as the vector we get by temporarily
reverting to the geometric interpretation of R
2
as a plane and considering
95
96 CHAPTER 3. VECTOR SPACES
the vector obtained as the diagonal of the parallelogram formed by v and
w. What do you notice about this set of vectors with this binary operation?
The binary operation satises all the axioms for an abelian group! Thus,
in addition to being a geometric object (the plane), R
2
, when considered as
a set with a binary operation, has an algebraic structureit is an abelian
group!
But there is more. Let us go back to the interpretation of R
2
as 2-
dimensional xy space, and let us recall the notion of scalar multiplication.
A scalar is any real number, and given a scalar r and a vector v, we multiply
r and v according to the following denitionif r 0, then r v is the vector
in the same direction as v but whose length is r times the length of v, and if
r < 0, then r v is the vector in the opposite direction as v but whose length
is [r[ times the length of v. What are the properties of scalar multiplication?
If r and s are any two scalars, and if v and w are any two vectors, we have
the following: r (v+w) = r v+r w, (r+s) v = r v+s v, (rs) v = r (s v),
and 1 v = v.
Observe that the set of scalars, namely the real numbers, is a eld. Now,
let us attempt to generalize all this. In the case of R
2
above, we have seen
that the geometric interpretation of R
2
as 2-dimensional xy space furnishes
us with the notion of vector addition and scalar multiplication, but once
these denitions have been furnished, R
2
seems to have an algebraic life of
its own. For instance, (R
2
, +) is an abelian group, while scalar multiplication
has the (algebraic) properties listed above. Could similar sets of objects
called vectors and scalars not arise in dierent circumstances, with the same
properties as the ones listed above, but with the vector addition and scalar
multiplication perhaps dened by some process other than a geometric one?
The answer is yes, and in fact, they arise in vastly dierent situations. As
with the other concepts that we have seen (groups, rings, elds, etc.), it is
worth isolating this phenomenon and studying it in its own right.
Denition 3.1. Let F be a eld. A vector space over F (also called an F
vector space) is an abelian group V together with a function F V V called
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 97
scalar multiplication and denoted such that for all r and s in F and v and w
in V ,
1. r (v +w) = r v +r w,
2. (r +s) v = r v +s v,
3. (rs) v = r (s v), and
4. 1 v = v.
The elements of V are called vectors and the elements of F are called scalars.
Thus, R
2
and R
3
are both vector spaces over R. Let us look at several ex-
amples of vector spaces that arise from other than geometric considerations:
Example 3.2. We have looked at R
2
and R
3
, why not generalize these,
and consider R
4
, R
5
, etc.? These would of course correspond to higher-
dimensional worlds. It is certainly hard to visualize such spaces, but there
is no problem considering them in a purely algebraic manner. Recall that
every vector in R
2
can be described uniquely by the pair (a, b), consisting
of the x and y components of the vector. (Uniquely means that the
vector (a, b) equals the vector (a

, b

) if and only if a = a

and b = b

.)
Similarly, every vector in R
3
can be described uniquely by the triple (a, b, c),
consisting of the x, y, and z components of the vector. Thus, R
2
and R
3
can be described respectively as the set of all pairs (a, b) and the set of all
triples (a, b, c), where a, b, and c are arbitrary real numbers. Proceeding
analogously, for any positive integer n, we will let R
n
denote the set of n-
tuples (a
1
, a
2
, . . . , a
n
), where the a
i
are arbitrary real numbers. (As with
R
2
and R
3
, the understanding here is that two n-tuples (a
1
, a
2
, . . . , a
n
) and
(a

1
, a

2
, . . . , a

n
) are equal if and only if their respective components are equal,
that is, a
1
= a

1
, a
2
= a

2
, . . . , and a
n
= a

n
.) These n-tuples will be our
vectors; how should we add them? Recall that in R
2
we add the vectors
(a, b) and (a

, b

) by adding a and a

together and b and b

together, that is,


by adding componentwise.
98 CHAPTER 3. VECTOR SPACES
Exercise 3.2.1. Deduce from the parallelogram law of addition of
vectors in R
2
that the sum of (a, b) and (a

, b

) is (a +a

, b +b

).
We will do the same with R
n
we will decree that (a
1
, a
2
, . . . , a
n
) +
(a

1
, a

2
, . . . , a

n
) = (a
1
+a

1
, a
2
+a

2
, . . . , a
n
+a

n
).
Exercise 3.2.2. Check that with this denition of addition,
(R
n
, +) is an abelian group.
What should our scalars be? Just as in R
2
and R
3
, let us take our scalars
to be the eld R. How about scalar multiplication? In R
2
, the product
of the scalar r and the vector (a, b) is (ra, rb), that is, we multiply each
component of the vector (a, b) by the real number r. (Is that so? Check!)
We will multiply scalars and vectors in R
n
in the same way: we will decree
that the product of the real number r and the n-tuple (a
1
, a
2
, . . . , a
n
) is
(ra
1
, ra
2
, . . . , ra
n
).
Exercise 3.2.3. Check that this denition satises the axioms of
scalar multiplication in Denition 3.1.
Thus, R
n
is a vector space over R.
Example 3.3. Now, why restrict the examples above to n-tuples of R?
For any eld F, let F
n
stand for the set of n-tuples (a
1
, a
2
, . . . , a
n
), where
the a
i
are arbitrary elements of F. Add two such n-tuples componentwise,
that is, dene addition via the rule (a
1
, a
2
, . . . , a
n
) +(a

1
, a

2
, . . . , a

n
) = (a
1
+
a

1
, a
2
+ a

2
, . . . , a
n
+ a

n
). Take the eld F to be the eld of scalars, and
dene scalar multiplication just as in R
n
: given an arbitrary f F, and
an arbitrary n-tuple (a
1
, a
2
, . . . , a
n
), dene their scalar product to be the
n-tuple (fa
1
, fa
2
, . . . , fa
n
).
Exercise 3.3.1. Check that these denitions of vector addition
and scalar multiplication make F
n
a vector space over F.
Taking F = C and n = 2 for instance, we get complex 2-space, which,
for example, is a natural arena in which to study plane curves.
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 99
Example 3.4. Similarly, for any eld F, let

0
F denote the set of all
innite-tuples (a
0
, a
1
, a
2
, . . . ), where the a
i
are in F. (It is convenient in
certain applications to index the components from 0 rather than 1, but if
this bothers you, it is harmless to think of the tuples as (a
1
, a
2
, a
3
, . . . ).)
Addition and scalar multiplication are dened just as in F
n
, except that
we now have innitely many components. With these denitions,

0
F
becomes an Fvector space. (This example is known as the direct product
of (countably innite) copies of F.)
Example 3.5. Consider the ring M
n
(R). Focusing just on the addition op-
eration on M
n
(R), recall that (M
n
(R), +) is an abelian group. (Remember,
for any ring R, (R, +) is always an abelian group.) We will treat the reals
as scalars. Given any real number r and any matrix (a
i,j
) in M
n
(R), we will
dene their product to be the matrix (ra
i,j
). (See the notes on page 153 for
a comment on this product.) Verify that with this denition, M
n
(R) is a
vector space over R. In a similar manner, if F is any eld, M
n
(F) will be a
vector space over F.
Example 3.6. Consider the eld Q[

2]. Then (Q[

2], +) is an abelian
group (why?). Think of the rationals as scalars. There is a very natural way
of multiplying a rational number q with an element a+b

2 of Q[

2], namely,
q (a+b

2) = qa+qb

2. With this denition of scalar multiplication, check


that Q[

2] becomes a vector space over the rationals.


If you probe this example a little harder, you may come up with an
apparent anomaly. What exactly is the role of the rationals here? True, we
want to think of the rationals as scalars. However, Q Q[

2], so every
rational number is also an element of Q[

2], and is therefore also a vector!


How do we resolve this conict? As it turns out, there really is nothing to
resolve, we merely accept the fact that the rationals have a dual role in this
example! When we see a rational number q by itself, we want to think
of it as q + 0

2, that is, we want to think of q as an element of Q[

2], or
in other words, we want to think of q as a vector. However, when we see q
100 CHAPTER 3. VECTOR SPACES
in an expression like q(a + b

2), we want to think of q as a scalar, that is,


something we multiply vectors by!
Example 3.7. Let us generalize Example 3.6. What we needed above were
that
1. Q[

2] is a eld, so (Q[

2], +) is automatically an abelian group, and


2. Q Q[

2], so that we could use the natural multiplication inside


Q[

2] to multiply any q Q with any a +b

2 Q[

2].
These two facts together gave us a Qvector space structure on Q[

2]. Now
let K/F be any eld extension. Since K is a eld, (K, +) is an abelian
group. Next, let us consider multiplication. Given any two elements k and
l of K, we know we can multiply the two elements together. However,
let us ignore this fact temporarily, and just consider the fact that given
any element f of F and any element k of K, we can multiply f and k.
(Notice that we have restricted the rst element to be from F. However,
we have placed no restriction on the second element, it can be any element
of K. This is just like considering the multiplication of any q Q and any
a+b

2 Q[

2] in Example 3.6 above.) Now note the following properties of


this (restricted) multiplication, which are just consequences of the properties
of the (unrestricted) multiplication in K: If f and g are any two elements
of F, and k and l are any two elements of K, then 1) f (k +l) = f k +f l,
2) (f +g) k = f k +g k, 3) (fg) k = f (g k), and 4) 1 k = k. (In this
last property, we consider 1 as an element of F.) What do we notice? If we
take the eld F as our scalars, (K, +) as our vectors, and the multiplication
operation between elements of F and elements of K (that arises from the
multiplication operation on K) as scalar multiplication, then, just as in
Example 3.6 above, K becomes an Fvector space!
Also, exactly as in Example 3.6 above, the elements of F have a dual
role, both as scalars and as vectors. When we see an element f F by
itself, f is playing the role of a vector. But when we see an element f F
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 101
in an expression like f k, f is playing the role of a scalar that is multiplying
the vector k!
Example 3.8. Now let us generalize Example 3.6 even further, by once
again considering the two conditions at the beginning of Example 3.7. Do
we really need the full force of the fact that Q[

2] is a eld? No, all we


need is the fact that Q[

2] is a ring that contains the eld Q; this is enough


to provide an abelian group structure on (Q[

2], +) and to furnish a scalar


product between elements of Q and elements of Q[

2]. Now let R be any


ring that contains a eld F. Then just as in Example 3.7 above, (R, +)
is an abelian group, and we can use the multiplication in R to dene the
scalar product between any element f of F and any element r of R. This
multiplication clearly satises the scalar product axioms in Denition 3.1, so
R becomes an Fvector space. Just as in Example 3.7 above, the elements
of F have a dual role, both as scalars and as vectors.
Here is a familiar instance of this phenomenon. Consider the polynomial
ring R[x]. This ring contains R (since every real number r lives inside R[x] as
the constant polynomial r+0x+0x
2
+ ). Thus, R[x] is a vector space over
R. Explicitly, the scalar product of any real number r and any polynomial
f =
n

i=0
a
i
x
i
(where the a
i
are real numbers and n is some nonnegative
integer) is the polynomial
n

i=0
ra
i
x
i
. The real numbers have a dual role here:
when we see a real number r by itself, we want to think of it as a vector,
and when we see it in an expression r f, we want to think of it as a scalar
multiplying the vector f.
In the same vein, F[x] is an Fvector space for any eld F.
Example 3.9. Here is an example related to F[x]. For any eld F and any
nonnegative integer n, write F
n
[x] for the set of all polynomials in x with
coecients in F whose degrees are at most n. Then F
n
[x] is an Fvector
space.
Question 3.9.1. Why?
102 CHAPTER 3. VECTOR SPACES
Example 3.10. Now think about this: Suppose V is a vector space over a
eld K. Suppose F is a subeld of K. Then V is also a vector space over
F!
Question 3.10.1. Why? What do you think the scalar multiplica-
tion ought to be? (See the notes on page 153 for some remarks on
this.)
As an example of this phenomenon, R[x], besides being an Rvector
space, is also a Qvector space. Vector addition is the usual addition of poly-
nomials. As for scalar multiplication, when we consider R[x] as a Qvector
space, we only allow multiplication of polynomials by rational numberswe
ignore the fact that we can multiply polynomials by arbitrary real numbers.
Similarly, M
2
(Q[

2], besides being a Q[

2]vector space, is also a Q


vector space.
Example 3.11. Here is an example that may seem pathological at rst,
but is not really so! Consider the trivial abelian group V : this consists of a
single element, namely, the identity element 0
V
. The only addition rule here
is 0
V
+0
V
= 0
V
, and it is easy to check that the set 0
V
with the addition
rule above is indeed an abelian group. Now let F be any eld. Then V is a
vector space over F with the product rule f 0
V
= 0
V
. There is only vector
in this space, namely 0
V
, although, there are lots of scalars! This vector
space is known as the trivial vector space or the zero vector space over F,
and shows up quite naturally as kernels of injective linear transformations
(see Lemma 3.87 ahead, for instance).
Remark 3.12. Now observe that all these examples of vector spaces have the
following properties:
1. For any scalar f, f times the zero vector is just the zero vector.
2. For any vector v, the scalar 0 times v is the zero vector.
3. For any scalar f and any vector v, (f) v = (f v).
3.2. LINEAR INDEPENDENCE, BASES, DIMENSION 103
4. If v is a nonzero vector, then f v = 0 for some scalar f implies f = 0.
These properties somehow seem very natural, and one would expect them
to hold for all vector spaces. Just as in Remark 2.24, where we considered
a similar set of properties for rings, we would like these properties to be
deducible from the vector space axioms themselves. This would, among
other things, convince us that our vector space axioms are the correct
ones, that is, they yield objects that behave more or less like the examples
above instead of objects that are rather pathological. As it turns out, our
expectations are not misguided: these properties are deducible from the
vector space axioms, and therefore do hold in all vector spaces. We will
leave the verication of this to the exercises (see Exercise 3.97).
3.2 Linear Independence, Bases, Dimension
Now, given a eld F and an Fvector space V , it is natural to wonder about
the size of V . To measure this size, we need to consider the concept of the
dimension of a vector space.
Let us contrast R
2
with R
3
. We all share the intuition that R
3
is somehow
bigger than R
2
. But what precisely is it about R
2
and R
3
that makes us
feel that one is bigger than the other? If we examine our intuition a little
more closely, we discover that the reason that R
3
seems bigger than R
2
is
that R
3
has three coordinate axes, while R
2
has only two. Hidden in this
fact is the concept of the dimension of a vector space. And in fact, without
necessarily having paused to think through the notion of dimension or make
it precise, most of us have already absorbed this concept and integrated it
into our liveswe readily describe R
2
as a 2-dimensional space and R
3
as a
3-dimensional space.
With this in mind, what should we take to be the dimension of a vector
space? The number of coordinate axes it contains? As it turns out, this is
indeed correct, but we have some work to do rst. Remember, a vector space
104 CHAPTER 3. VECTOR SPACES
is an algebraic object. It is dened as an abelian group (V, +) along with a
scalar multiplication F V V with the properties that we have described
above. Thus, while the term coordinate axes has an obvious meaning
in the geometric examples of R
2
and R
3
, it is not clear what meaning it
should have in a general vector space. So our rst task is to convert the
geometric notion of coordinate axes into an algebraic notion. Next, we
need to worry about the possibility that an arbitrary vector space dened
purely algebraically may not have any coordinate axes at all, as well as the
possibility that dierent sets of coordinate axes of the same vector space may
have dierent numbers of axes in each set! If either of these possibilities were
to occur, we would not have a unique number that we could assign as the
dimension of the vector space. As it turns out, neither of these can happen,
and our second task is to consider the impossibility of these two scenarios.
Let us turn to the rst task. Focusing on R
2
for convenience, let us
denote the vector with tip at the point (1, 0) by i, and the one with the tip
at the point (0, 1) by j. From vector calculus, we know that if we take an
arbitrary vector in R
2
, say u, with its tip at (a, b), then the projection of u
onto the x-axis is just a times the vector i and the projection on the y-axis
is just b times the vector j. The parallelogram law then shows that u is
the sum of a i and b j, that is, u = a i + b j. Since u was an arbitrary
vector in this discussion, we nd that every vector in R
2
can be written as a
scalar times i added to another scalar times j. This example motivates two
denitions.
Denition 3.13. Let V be a vector space over a eld F. A linear combination
of vectors v
1
, , v
n
(or, an F-linear combination of vectors v
1
, , v
n
, if we
wish to emphasize the eld over which the vector space is dened) is any vector
in V that can be written as f
1
v
1
+ +f
n
v
n
for suitable scalars f
1
, , f
n
.
Thus, what we found above is that every vector in R
2
can be written as a
R-linear combination of the vectors i and j. (To give you more examples, the
vectors i+j,

2i3j =

2i+(3)j, and i+3


2
j are all linear combinations
of i and j.)
3.2. LINEAR INDEPENDENCE, BASES, DIMENSION 105
The other denition motivated by the example of the vectors i and j in
R
2
is the following:
Denition 3.14. Let V be a vector space over a eld F. A subset S of V is
said to span V (or S is said to be a spanning set for V ) if every vector v V
can be written as
n

i=1
f
i
v
i
for some integer n 1, some choice of vectors v
1
,
. . . , v
n
from S, and some choice of scalars f
1
, . . . , f
n
. (In the language of
Denition 3.13 above, S is a spanning set for V if every vector in V is expressible
as a linear combination of some elements of S.
The discussion before Denition 3.13 showed that the set S = i, j is a
spanning set for R
2
. Here are more examples:
Example 3.15. We have seen in Example 3.6 that Q[

2] is a Qvector
space. Note that every element of Q[

2] is of the form a +b

2 for suitable
a and b Q. Thinking of a as a 1, this tells us that every element
of Q[

2] is expressible as a Q-linear combination of 1 and

2. (We are
thinking of 1 as a vector in this last statement. Recall the discussion of the
dual role of Q in Example 3.6.) Hence, S = 1,

2 is a spanning set for


the Q-vector space Q[

2].
Example 3.16. The set 1, x, x
2
, . . . is a spanning set for the polynomial
ring R[x] considered as a vector space over R (see Example 3.8 above). This
is clear since every polynomial in R[x] is of the form r
0
+r
1
x+ +r
n
x
n
for
some integer n 0 and suitable real numbers r
0
, r
1
, . . . , r
n
. Put dierently,
every polynomial can be expressed as a R-linear combination of 1, x, . . . , x
n
for some integer n 0. Since dierent polynomials have dierent degrees,
we need to use all powers x
i
(i = 1, 2, . . . ) to get a spanning set for R[x].
Remark 3.17. By convention, the empty set is taken as a spanning set for
the zero vector space. Moreover, by convention, the trivial space is the only
space spanned by the empty set. This convention will be useful later, when
dening the dimension of the zero vector space.
So, returning to our study of dimension, should we take the algebraic
analog of coordinate axes to be any set S of vectors that span V ? No, not
106 CHAPTER 3. VECTOR SPACES
yet! There could be redundancy in this set! It may turn out, for example,
that the smaller set S v obtained by deleting a particular vector v from
the set already spans V ! (If so, why bother using this vector v as one of
coordinate axes?!?)
Let us formulate this as a denition:
Denition 3.18. Given a vector space V over a eld F, a vector v in a
spanning set S is said to be redundant if the subset S v obtained by
removing v is itself a spanning set for V . (Put dierently, v is redundant in
S if every vector in V can already be expressed as a linear combination of
elements in S v, so the vector v is not needed at all.) We will say that
there is redundancy in the spanning set S if any one of the vectors in this set is
redundant.
Example 3.19. For an example of a spanning set with redundancy in it,
we do not have to look very far: Going back to R
2
, let us write w for the
vector with tip at (1/

2, 1/

2). Then i, j, and w also span R


2
.
Question 3.19.1. This is of course very trivial to seethe vector
with tip at (a, b) can be written as the sum a i + b j + 0 w.
More interestingly, can you show that it can also be written as (a
1/

2) i + (b 1/

2) j +w?
Since i and j already span R
2
, there is clearly redundancy in the set
i, j, w.
To push this example a bit further, note that i and w also form a span-
ning set for R
2
. To see this, note that j = i +

2w. Thus, any vector


a i +b j in R
2
, can be written as (ab) i +(

2b) w by simply substituting


i +

2w for j. This shows that i and w also span R


2
.
Notice that there is no redundancy in the set i, w, because if you
remove, say w, then the remaining vector i alone will not span R
2
: the
various linear combinations of i are the vectors of the form ri, where
r is an arbitrary real number, and these are all aligned with the vector i
and will therefore not give all of R
2
. (Similarly, if you remove i, the linear
combinations of remaining vector w will all be aligned with w and will not
give all of R
2
.)
3.2. LINEAR INDEPENDENCE, BASES, DIMENSION 107
Question 3.19.2. Similarly, can you show that j and w also span
R
2
.
In this example, we would of course take the set i, j as coordinate axes
for R
2
, as is the usual practice, but we could just as easily take the set i, w
or the set j, w as coordinate axes.
Example 3.20. Let S be a spanning set for a vector space V . If v is any
vector in V that is not in S, then S v is also a spanning set for V in
which there is redundancy. More generally, if T is any nonempty subset of
V that is disjoint from S, then S T is also a spanning set for V in which
there is redundancy.
Exercise 3.20.1. Convince yourself of this!
For instance, we have seen in Example 3.16 above that the set 1, x, x
2
, . . .
is a spanning set for the polynomial ring R[x] considered as a vector space
over R. Taking T = 1 +x, 1 +x +x
2
, 1 +x +x
2
+x
3
, . . . , it follows that
the set U = 1, x, 1 +x, x
2
, 1 +x +x
2
, x
3
, 1 +x +x
2
+x
3
, . . . is a spanning
set for R[x] in where there is redundancy.
Exercise 3.20.2. Continuing with the example of the polynomial
ring R[x] considered as a vector space over R, show that there is no
redundancy in the spanning set 1, x, x
2
, . . . .
Remember, we are trying to formulate an algebraic denition of coordi-
nate axes. Our intuition from Example 3.19, as well as Example 3.16 and
Exercise 3.20.2 above, would suggest that a set of coordinate axes, rst,
should span our vector space, and next, should not have more vectors than
are needed to span the space, that is, should not have redundancy in it.
It would be very useful to have alternative characterizations of redun-
dancy. We have the following:
Lemma 3.21. Let V be a vector space over a eld F, and let S be a spanning
set for V . Then, the following are equivalent:
108 CHAPTER 3. VECTOR SPACES
1. There is redundancy in S,
2. Some vector v in S is expressible as a linear combination of some
vectors from the set S v, and
3. There exist a positive integer m and scalars f
1
, . . . , f
m
, not all zero,
such that for some vectors v
1
, . . . , v
m
from S, we have the relation
f
1
v
1
+ +f
m
v
m
= 0.
Proof. Let us prove the implications (1) (2) (3) (1).
(1) (2): Given that there is redundancy in S, this means that there is
some v S such that S v is already a spanning set for V . Thus, by
denition of what it means to span V , there are some vectors v
1
, . . . , v
n
(for some integer n 1) in S v such that v is expressible as a linear
combination of v
1
, . . . , v
n
. Thus, (1) (2).
(2) (3): Given that v in S is expressible as a linear combination of some
vectors from the set S v, this means that there exist vectors v
1
, . . . , v
n
(for some integer n 1) in S v, and some scalars f
1
, . . . , f
n
, such that
v = f
1
v
1
+ +f
n
v
n
. We may rewrite this as f
1
v
1
+ +f
n
v
n
+(1)v = 0.
Thus, taking m to be n+1, v
m
to be v, f
m
to be 1, we nd f
1
v
1
+ +
f
n
v
n
+f
m
v
m
= 0. Notice here that f
m
,= 0, and that m 2 > 1. Thus, we
have proved that (2) (3).
(3) (1): Given a dependence relation f
1
v
1
+ +f
n
v
n
+f
m
v
m
= 0 where
some f
i
is nonzero, assume for convenience that f
m
,= 0. If m = 1, then the
relation f
1
v
1
= 0 yields v
1
= 0. Clearly, S is redundant, since the 0-vector
is not needed for any spanning relation, so S 0 will continue to span V .
Now assume that m > 1. Then, dividing by f
m
and moving v
m
to the other
side, we nd
v
m
= (f
1
/f
m
)v
1
+ + (f
m1
/f
m
)v
m1
Write v for v
m
. We claim now that the set S v is already a spanning
set for V . For, given a vector w, we know that it is expressible as a linear
3.2. LINEAR INDEPENDENCE, BASES, DIMENSION 109
combination g
1
u
1
+ + g
n
u
n
of some elements u
1
, . . . , u
n
from S (here
the g
i
are scalars). If v is not one of these vectors u
1
, . . . , u
n
, then u
1
, . . . ,
u
n
are all in S v, so w is already expressible as a linear combination
of vectors from S v. So assume that v is one of these vectors, say (for
simplicity), v = u
n
. Then, invoking our earlier expression for v = v
m
, we
nd
w = g
1
u
1
+ +g
n
u
n
= g
1
u
1
+ +g
n
((f
1
/f
m
)v
1
+ + (f
m1
/f
m
)v
m1
)
= g
1
u
1
+ + (g
n
f
1
/f
m
)v
1
+ + (g
n
f
m1
/f
m
)v
m1
Notice that w is now expressed as a linear combination of the vectors u
1
,
. . . , u
n1
, v
1
, . . . , v
m1
all of which are in S v. It follows therefore
that every vector in V is expressible as a linear combination of vectors in
S v. In other words, S v already spans V , so there is redundancy
in S. Thus, we have proved that (3) (1).
2
With this lemma in mind, we make the following denition:
Denition 3.22. Let F be a eld and V an Fvector space. Let v
1
, . . . , v
n
be elements of v. Then v
1
, . . . , v
n
are said to be linearly dependent over F,
or Flinearly dependent if there exist scalars a
1
, . . . , a
n
, not all zero, such
that a
1
v
1
+ + a
n
v
n
= 0. If no such scalars exist, then v
1
, . . . , v
n
are
said to be linearly independent over F, or Flinearly independent. (If there is
no ambiguity about the eld F, the vectors are merely referred to as linearly
dependent or linearly independent. Also, if v
1
, . . . , v
n
are linearly independent,
respectively linearly dependent, vectors, then the set v
1
, . . . , v
n
is said to
be a linearly independent, respectively a linearly dependent, set.) An arbitrary
subset S of V is said to be linearly independent if every nite subset of S is
linearly independent. Similarly, an arbitrary subset S of V is said to be linearly
dependent if some nite subset of S is linearly dependent.
Thus, the implications 1 3 of Lemma 3.21 can be stated in this new
language as follows: There is redundancy in S if and only if S is linearly
dependent.
110 CHAPTER 3. VECTOR SPACES
Before proceeding further, here are a few quick exercises:
Exercise 3.22.1. Show that if v is a nonzero vector, then the set
v must be linearly independent. See Property (4) in Remark 3.12.
Exercise 3.22.2. Show that two vectors are linearly dependent if
and only if one is a scalar multiple of the other.
Exercise 3.22.3. Are the following subsets of the given vector
spaces linearly independent? (Very little computation, if any, is
necessary.)
1. In R
3
: (1, 1, 1), (10, 20, 30), (23, 43, 63)
2. In R
3
: (1, 0, 0), (2, 2, 0), (3, 3, 3)
3. In R[x]: (x + 1)
3
, x
2
+x, x
3
+ 1
Exercise 3.22.4. We know that C
2
is a vector-space over both C
(Example 3.3) and over R (Example 3.10). Show that v = (1+, 2)
and w = (1, 1 +) are linearly dependent when C
2
is considered as
a C vector space, but linearly independent when considered as a R
vector space.
Also, let us illustrate the meaning of the last two sentences of the De-
nition 3.22 above. Let us consider the following:
Example 3.23. Consider the subset S = 1, x, x
2
, x
3
, . . . of R[x], with
R[x] viewed as a vector space over R (we have already considered this set in
Examples 3.16 and 3.20 above). This is, of course, an innite set. Consider
any nonempty nite subset of S, for instance, the subset x, x
5
, x
17
, or the
subset 1, x, x
2
, x
20
, or the subset 1, x
3
, x
99
, x
100
, x
1001
, x
1004
. In general,
a nonempty nite subset of S would contain n elements (for some n 1), and
these elements would be various powers of xsay x
i
1
, x
i
2
, . . . , x
in
. These
elements are denitely linearly independent, since if a
1
x
i
1
+ + a
n
x
in
is
the zero polynomial, then by the denition of the zero polynomial, each a
i
must be zero. This is true regardless of which nite subset of S we takeall
that would be dierent in dierent nite subsets is the number of elements
(the integer n) and the particular powers of x (the integers i
1
through i
n
)
chosen. Thus, according to our denition, the set S is linearly independent.
3.2. LINEAR INDEPENDENCE, BASES, DIMENSION 111
On the other hand, consider the subset S

= S 1 + x. Any nite
subset of S

that does not contain all three vectors 1, x and 1 + x will be


linearly independent (check!). However, this alone is not enough for you to
conclude that S

is a linearly independent set. For the subset 1, x, 1 + x


of T is linearly dependent: 1 1 +1 x+(1) (1 +x) = 0. By the denition
above, T is a linearly dependent set.
Remark 3.24. Note that the zero vector is linearly dependent: for example,
the nonzero scalar 1 multiplied by 0
V
gives 0
V
. Thus, if V is the zero vector
space, then 0
V
is a linearly dependent spanning set, so by Lemma 3.21,
this set has to have redundancy. Hence, some subset of 0
V
must already
span the trivial space. But the only subset of 0
V
is the empty set, hence
this lemma tells us that the empty set must span 0
V
. This is indeed
consistent with the convention adopted in Remark 3.17 above.
We are now ready to construct the algebraic analog of coordinate axes.
We will choose as our candidate any set of vectors that spans our vector
space and in which there is no redundancy. Moreover, instead of using the
term coordinate axes (which is inspired by the geometric examples of R^2
and R^3), we will coin a new term: the algebraic analog of coordinate axes
will be called a basis of our vector space. Since redundancy is equivalent to
linear dependence (Lemma 3.21), lack of redundancy is equivalent to linear
independence. We hence have the following definition:
Definition 3.25. Let F be a field and V an F-vector space. A subset S of
V is said to be a basis of V if S spans V and there is no redundancy in S.
Alternatively, since lack of redundancy is equivalent to linear independence, S is
said to be a basis of V if S spans V and is linearly independent. The individual
vectors that belong to S are referred to as basis vectors. Sometimes, when we
wish to emphasize the field of scalars, we refer to S as an F-basis of V.
Here are some examples of bases of vector spaces:
Example 3.26. The set consisting of the vectors i and j is a basis for R^2.
We have already seen in the text that i and j span R^2.
Exercise 3.26.1. Argue carefully why there is no redundancy in
the set {i, j}. Alternatively, argue why the set {i, j} is linearly
independent.
Exercise 3.26.2. Show that the set consisting of the vectors i and
w = (1/√2, 1/√2) also forms a basis. (We have already done this,
in Example 3.19!)
Example 3.27. Recall the definition of the vector space R^n in Example 3.2.
Let e_i stand for the vector whose components are all zero except in the i-th
slot, where the component is 1. (For example, in R^4, e_1 = (1, 0, 0, 0), e_3 =
(0, 0, 1, 0), etc.) Then the e_i form a basis for R^n as an R-vector space. They
clearly span R^n, since any n-tuple (r_1, ..., r_n) ∈ R^n is just r_1 e_1 + ... + r_n e_n.
As for the linear independence, assume that r_1 e_1 + ... + r_n e_n = 0 for some
scalars r_1, ..., r_n. Since the sum r_1 e_1 + ... + r_n e_n is just the vector
(r_1, ..., r_n), we find (r_1, ..., r_n) = (0, ..., 0), so each r_i must be zero.
This basis is known as the standard basis for R^n. Of course, in R^2, e_1
and e_2 are more commonly written as i and j, and in R^3, e_1, e_2, and e_3 are
more commonly written as i, j, and k.
Exercise 3.27.1. Show that the vectors e_1, e_2 + e_1, e_3 + e_2, ...,
e_n + e_{n−1} also form a basis for R^n.
Example 3.28. The set consisting of the elements 1 and √2 forms a basis
for Q[√2] as a vector space over Q. (We have seen in Example 3.15 above
that 1 and √2 span Q[√2]. As for the Q-linear independence of 1 and √2,
you were asked to prove this in Exercise 2.12.4 in Chapter 2!)
Example 3.29. The set {1, x, x^2, ...} forms a basis for R[x] as a vector
space over R. We have seen in Example 3.16 that this set spans R[x]. As
for the linear independence, see the argument in Example 3.23 above.
Exercise 3.29.1. Prove that the set B = {1, 1 + x, 1 + x + x^2,
1 + x + x^2 + x^3, ...} is also a basis for R[x] as a vector space over R.
(Hint: Writing v_0 = 1, v_1 = 1 + x, v_2 = 1 + x + x^2, etc., note
that for i = 1, 2, ..., x^i = v_i − v_{i−1}. It follows that all powers
of x (including x^0) are expressible as linear combinations of the v_i.
Why does it follow from this that the v_i span R[x]? As for linear
independence, suppose that for some finite collection v_{i_1}, ..., v_{i_k}
(with i_1 < i_2 < ... < i_k), there exist scalars r_1, ..., r_k such that
r_1 v_{i_1} + ... + r_k v_{i_k} = 0. What is the highest power of x in this
expression? In how many of the elements v_{i_1}, ..., v_{i_k} does it show
up? What is its coefficient? So?)
Example 3.30. Consider F_n[x] as an F-vector space (see Example 3.9
above). You should easily be able to describe a basis for this space and
prove that your candidate is indeed a basis.
Example 3.31. The set {1, √2, √3, √6} forms a basis for Q[√2, √3] as
a vector space over Q. You have seen in Example 2.34 that, by our very
definition of the ring, every element of Q[√2, √3] is of the form a + b√2 +
c√3 + d√6, where a, b, c, and d are all rational numbers. This simply
says that the set {1, √2, √3, √6} spans Q[√2, √3] as a vector space over Q.
As for the linear independence of this set, this was precisely the point of
Exercise 2.118 in Chapter 2!
Example 3.32. The n^2 matrices e_{i,j} (see Exercise 2.16.4 of Chapter 2 for
this notation) form a basis for M_n(R).
Exercise 3.32.1. Prove this! To start you off, here is a hint: In
M_2(R), for example, a matrix such as
[ 1 2 ]
[ 3 4 ]
can be written as the linear combination e_{1,1} + 2e_{1,2} + 3e_{2,1} + 4e_{2,2}.
Example 3.33. Certain linear combinations of basis vectors also give us a
basis:
Exercise 3.33.1. Going back to Q[√2], show that the vectors 1
and 1 + √2 also form a basis. (Hint: Any vector a + b√2 can be
rewritten as (a − b)·1 + b(1 + √2). So?)
Exercise 3.33.2. Now show that if V is any vector space over any
field with basis {v_1, v_2}, then the vectors v_1, v_1 + v_2 also form a
basis. How would you generalize this pattern to a vector space that
has a basis consisting of n elements v_1, v_2, ..., v_n? Prove that
your candidate forms a basis.
Exercise 3.33.3. Let V be a vector space with basis {v_1, ..., v_n}.
Study Exercise 3.27.1 and come up with a linear combination of the
v_i, similar to that exhibited in that exercise, that also forms a basis
for V. Prove that your candidate forms a basis.
Example 3.34. Consider the vector space ∏_0^∞ F of Example 3.4 above.
You may find it hard to describe explicitly a basis for this space. However,
let e_i (for i = 0, 1, ...) be the infinite tuple with 1 in the position indexed
by i and zeros elsewhere. (Thus, e_0 = (1, 0, 0, ...), e_1 = (0, 1, 0, ...), etc.)
Exercise 3.34.1. Why is the set S = {e_0, e_1, e_2, ...} not a basis
for ∏_0^∞ F? Is S at least linearly independent? (See the notes on
page 154 for some comments on this example.)
Example 3.35. The empty set is a basis for the trivial vector space. This
follows from Remark 3.17 (see also Remark 3.24), since the empty set spans
the trivial space, and since the empty set is vacuously linearly independent.
Here is a result that describes a useful property of bases and is very easy
to prove.
Proposition 3.36. Let V be a vector space over a field F, and let S be a
basis. Then in any expression of a vector v ∈ V as v = f_1 b_1 + ... + f_n b_n for
suitable vectors b_i ∈ S and nonzero scalars f_i, the b_i and the f_i are uniquely
determined.
Proof. What we need to show is that if v is expressible as f_1 b_1 + ... + f_n b_n
for suitable vectors b_i ∈ S and nonzero scalars f_i, and is also expressible
as g_1 c_1 + ... + g_m c_m for suitable vectors c_i ∈ S and nonzero scalars g_i,
then n = m, and after relabelling if necessary, each b_i = c_i and each f_i = g_i
(i = 1, ..., n). To do this, assume, after relabelling if necessary, that b_1 = c_1,
..., b_t = c_t (for some t ≤ min(m, n)), and that the sets {b_{t+1}, ..., b_n} and
{c_{t+1}, ..., c_m} are disjoint. Then, bringing all terms to one side, we may
rewrite our equality as
(f_1 − g_1)b_1 + ... + (f_t − g_t)b_t + f_{t+1} b_{t+1} + ... + f_n b_n
− g_{t+1} c_{t+1} − ... − g_m c_m = 0
By the linear independence of the subset {b_1, ..., b_t, b_{t+1}, ..., b_n, c_{t+1}, ..., c_m}
of S, we find that f_1 = g_1, ..., f_t = g_t, f_{t+1} = ... = f_n = 0, and g_{t+1} = ... =
g_m = 0. But since the scalars were assumed to be nonzero, f_{t+1} = 0
and g_{t+1} = 0 are impossible, so, to begin with, there must have been no
f_{t+1} or g_{t+1} to speak of! Thus, t must have equaled n, and similarly,
t must have equaled m. From this, we get n = m (= t), and then, by our
very definition of t, we find that b_1 = c_1, ..., b_n = c_n. Coupled with our
derivation that f_1 = g_1, ..., f_t = g_t, we have our desired result. □
Now that we have arrived at the algebraic analog of coordinate axes, we
turn our attention to the next step in our program: we need to show that
every vector space has a basis, and that different bases of the same vector
space have the same number of elements in them.
The first of these two tasks, namely, showing that every vector space has
a basis, is a little tricky: to do full justice to it, we need to
invoke Zorn's Lemma, an extremely useful tool of logic. (Zorn's Lemma, in
spite of its name, is really not a lemma, but an axiom of logic. See Chapter
B in the Appendix.) For a first introduction to abstract algebra, any usage
of Zorn's Lemma can seem dense and somewhat foreboding (what else will
the Gods of Logic hurl at us?), so we will relegate the full proof to the same
Chapter B in the Appendix (see Theorem B.7 there). However, to help build
a more concrete feel for the existence of bases, we will also give a proof of
the existence of a basis in the special case when we know that the vector
space in question has a finite spanning set.
We will assume that our vector space is not the trivial space, since we
already know that the trivial space has a basis (see Example 3.35 above).
Proposition 3.37. Let V be a vector space over a field F. Let S be a
spanning set for V, and assume that S is a finite set. Then some subset of
S is a basis of V. In particular, every vector space with a finite spanning
set has a basis.
Proof. Note that S is nonempty, since V has been assumed to not be the
trivial space (see Remark 3.17). If the zero vector appears in S, then the
set S′ = S \ {0} that we get by throwing out the zero vector will still span
V (why?) and will still be finite. Any subset of S′ will also be a subset of
S, so if we can show that some subset of S′ must be a basis of V, then we
would have proved our theorem. Hence, we may assume that we are given
a spanning set S for V that is not only finite, but one in which none of the
vectors is zero.
Let S = {v_1, v_2, ..., v_n} for some n ≥ 1. If there is no redundancy in S,
then there is nothing to prove: S would be a basis by the very definition of a
basis. So assume that there is redundancy in S. By relabelling if necessary,
we may assume that v_n is redundant. Thus, S_1 = {v_1, v_2, ..., v_{n−1}} is itself
a spanning set for V. Once again, if there is no redundancy in S_1, then we
would be done; S_1 would be a basis. So assume that there is redundancy
in S_1. Repeating the arguments above and shrinking our set further and
further, we find that this process must stop somewhere, since at worst, we
would shrink our spanning set down to one vector, say S_{n−1} = {v_1}, and a set
containing just one nonzero vector must be linearly independent (Exercise
3.22.1), so S_{n−1} would form a basis. (Note that this is only the worst case;
in actuality, this process may stop well before we shrink our spanning set
down to just one vector.) When this process stops, we would have a subset
of S that would be a basis of V. □
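For instance, the set S = {i, j, i + j} spans R^2, but there is redundancy
in it: the vector i + j is a linear combination of the other two, so the
shrinking process of the proof discards it and stops at the basis {i, j}.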
Remark 3.38. Notice that to prove that bases exist (in the special case where
V has a finite spanning set) what we really did was to show that every finite
spanning set of V can be shrunk down to a basis of V. This result is true
more generally: given any spanning set S of a vector space V (in other
words, not just a finite spanning set S), there exists a subset S′ of S that
forms a basis of V. See the notes on page 223 of Chapter B in the Appendix.
Having proved that every vector space has a basis, we now need to show
that different bases of a vector space have the same number of elements in
them. (Remember our original program: we wish to measure the size of
a vector space, and based on our examples of R^2 and R^3, we think that a
good measure of the size would be the number of coordinate axes, or basis
elements, that a vector space has. However, for this to make sense, we need
to be guaranteed that every vector space has a basis (we just convinced
ourselves of this) and that different bases of a vector space have the same
number of elements in them.) In preparation, we will prove an important
lemma. Our desired results will fall out as corollaries.
We continue to assume that our vector space is not the trivial space.
Lemma 3.39 (Exchange Lemma). Let V be a vector space over a field
F, and let B = {v_1, ..., v_n} (n ≥ 1) be a spanning set for V. Let C =
{w_1, ..., w_m} be a linearly independent set. Then m ≤ n.
Proof. The basic idea behind the proof is to replace vectors in the spanning
set B one after another with vectors in C, and to observe at the end that
if m were greater than n, then there would not be enough replacements of
elements of B to guarantee linear independence of the set C.
We begin as follows: Since B spans V, every vector in V is expressible as
a linear combination of elements of B. In particular, we may write w_1 as a
linear combination of elements of B, that is, w_1 = c_1 v_1 + c_2 v_2 + ... + c_n v_n
for suitable scalars c_i, not all zero. Since one of these scalars is nonzero, we
may assume for convenience (by relabelling the vectors of B if necessary) that
c_1 ≠ 0. As usual, we may write v_1 = (1/c_1)w_1 − (c_2/c_1)v_2 − (c_3/c_1)v_3 −
... − (c_n/c_1)v_n. Now go back and study how we proved (2) ⇒ (3) in
Lemma 3.21. We are going to use the same sort of an argument here: we
will prove that the set {w_1, v_2, v_3, ..., v_n} spans V. For given any vector v in
V, it can be written as a linear combination v = f_1 v_1 + f_2 v_2 + ... + f_n v_n for
suitable scalars f_i (why?). Now, in this expression, substitute (1/c_1)w_1 −
(c_2/c_1)v_2 − (c_3/c_1)v_3 − ... − (c_n/c_1)v_n for v_1, and what do you find? v
is expressible as a linear combination of w_1, v_2, v_3, ..., v_n! Thus, the set
{w_1, v_2, v_3, ..., v_n} spans V as claimed.
Now observe what we have done: we have replaced v_1 with w_1. Let us
take this to the next step. Since the set {w_1, v_2, v_3, ..., v_n} spans V, we can
write w_2 as a linear combination of elements of this set. Thus, w_2 = g_1 w_1 +
g_2 v_2 + g_3 v_3 + ... + g_n v_n for suitable scalars g_i, not all zero. Now the scalars
g_2, g_3, ..., g_n cannot all be zero, since g_1 would then have to be nonzero
(why?) and this relation would then read w_2 = g_1 w_1, a contradiction, as
the set C is linearly independent. Hence, one of the scalars g_2, g_3, ...,
g_n must be nonzero. For convenience, we may assume (by relabelling the
vectors v_2, v_3, ..., v_n if necessary) that g_2 ≠ 0. Dividing by g_2 and moving
all terms but v_2 to one side, we can write v_2 as a linear combination of the
vectors w_1, w_2, v_3, ..., v_n. Exactly as in the last paragraph, we find that
since the set {w_1, v_2, v_3, ..., v_n} spans V, the set {w_1, w_2, v_3, ..., v_n} also
spans V.
So far, we have succeeded in replacing v_1 with w_1 and v_2 with w_2, and the
resultant set {w_1, w_2, v_3, ..., v_n} still spans V. Now continue this process,
and consider what would happen if we were to assume that m is greater
than n. Well, we would replace v_3 by w_3, v_4 by w_4, etc., and then v_n by w_n.
(We know that we would be able to replace all the v's with w's because, by
assumption, there are more w's than v's.) At each stage of the replacement,
we would be left with a set that spans V. In particular, the set we would be
left with after replacing v_n by w_n, namely {w_1, w_2, ..., w_n}, would span V.
But since we assumed that m is greater than n, there would be at least one
w left, namely w_{n+1}. Since {w_1, w_2, ..., w_n} would span V, we would be
able to write w_{n+1} as a linear combination of the vectors w_1, w_2, ..., w_n.
This is a contradiction, since the set C is linearly independent! Hence m
cannot be greater than n; that is, m ≤ n! □
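As a concrete instance of the Exchange Lemma: the set {i, j} spans R^2,
so any linearly independent subset of R^2 has at most two elements. Thus a
set such as {(1, 1), (1, 2), (2, 1)} must be linearly dependent, and indeed
(1, 2) + (2, 1) − 3·(1, 1) = (0, 0).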
We are now ready to prove that different bases of a given vector space
have the same number of elements. We will distinguish between two cases:
vector spaces having bases with finitely many elements, and those having
bases with infinitely many elements. We will take care of the infinite case
first.
Corollary 3.40. If a vector space V has one basis with an infinite number
of elements, then every other basis of the vector space also has an infinite
number of elements.
Proof. Let S be the basis of V with an infinite number of elements (which
exists by hypothesis), and let T be any other basis. Assume that T has only
finitely many elements, say m. Since S has infinitely many elements, we can
certainly pick m + 1 vectors from it. So pick any m + 1 vectors from S and
denote this selected set of vectors by S′. Since the vectors in S′ are part of
the basis S, they are certainly linearly independent. We may think of the
set T as the set B of Lemma 3.39 (after all, T, being a basis, will span V),
and we may think of the set S′ as the set C of the same lemma (after all,
S′ is linearly independent). The lemma then shows that m + 1 ≤ m, a clear
contradiction. Hence T must also be infinite! □
We settle the finite case now. Recall that we are assuming that our
vector space is not the trivial space. The trivial space has only one basis
anyway, the empty set (see Remark 3.24).
Corollary 3.41. If a vector space V has one basis with a finite number of
elements n, then every basis of V contains n elements.
Proof. Let S = {x_1, ..., x_n} be the given basis of V with n elements, and
let T be any other basis. If T were infinite, Corollary 3.40 above says that
S must also be infinite. Since this is not true, we find that T must have
a finite number of elements. So, assume that T has m elements, say T =
{y_1, ..., y_m}. We wish to show that m = n. We may think of S as the set
B of Lemma 3.39, since it clearly spans V. Also, we may think of the set
T as the set C of the lemma, since T, being a basis, is certainly linearly
independent. Then the lemma says that m must be less than or equal to n.
Now let us reverse this situation: let us think of T as the set B, and let
us think of S as the set C. (Why can we do this?) Then the lemma says
that n must be less than or equal to m. Thus, we have m ≤ n and n ≤ m,
so we find that n = m. □
We are finally ready to make the notion of the size of a vector space
precise!
Definition 3.42. A (nontrivial) vector space V over a field F is said to be
finite-dimensional (or finite-dimensional over F) if it has a basis with a finite
number of elements in it; otherwise, it is said to be infinite-dimensional (or
infinite-dimensional over F). If V is finite-dimensional, the dimension of V is
defined to be the number of elements in any basis. If V is infinite-dimensional,
the dimension of V is defined to be infinite. If V has dimension n, then V is
also referred to as an n-dimensional space (or as being n-dimensional over F);
this is often written as dim_F(V) = n.
Remark 3.43. By convention, the dimension of the trivial space is taken to
be zero. This is consistent with the fact that it has as basis the empty set,
which has zero elements.
Let us consider the dimensions of some of the vector spaces in the
examples on page 97 (see also the examples on page 111, where we consider
bases of these vector spaces). R^2 and R^3 have dimensions 2 and 3
(respectively) as vector spaces over R.
Question 3.44. What is the dimension of R^n?
Q[√2] is 2-dimensional over Q. R[x] is infinite-dimensional over R, while
Q[√2, √3] is 4-dimensional over Q. Similarly, M_n(R) is n^2-dimensional
over R.
Question 3.45. What is the dimension of F_n[x] over F? (Warning!
It is not n.)
Question 3.46. By Example 3.7, C is a vector space over C, and
as well, over R. What is the dimension of C as a C-vector space?
As an R-vector space?
With the definition of dimension under our belt, the following is another
corollary to the Exchange Lemma (Lemma 3.39):
Corollary 3.47. Let V be an n-dimensional vector space. Then every
subset S of V consisting of more than n elements is linearly dependent.
(Alternatively, if S is a linearly independent subset of V, then S has at most
n elements.)
Proof. Assume, to the contrary, that V contains a linearly independent
subset S that contains more than n elements. Therefore, we can find n + 1
distinct elements v_1, v_2, ..., v_{n+1} in S. Write C for the set {v_1, v_2, ..., v_{n+1}}
and let B be any basis. By the very definition of dimension, B must have
n elements. Now apply Lemma 3.39 to the sets B and C: we find that
n + 1 ≤ n, which is a contradiction. Hence every subset of V consisting of
more than n elements must be linearly dependent, or, what is the same, any
linearly independent subset of V must have at most n elements. □
Similarly, with the definition of dimension under our belt, the following
is an easy corollary of Proposition 3.37:
Corollary 3.48. Let V be an n-dimensional vector space. Then any
spanning set for V has at least n elements.
Proof. Let S be a spanning set, and assume that |S| = t < n. By Proposition
3.37, some subset of S is a basis of V. Since this subset can have at most t
elements, it follows that the dimension of V, which is the size of this basis,
is at most t. This contradicts the fact that the dimension of V is n. □
Putting together Corollary 3.48 and Proposition 3.37, we find that if V
is an n-dimensional vector space, then any spanning set for V must have at
least n elements, and this set can then be shrunk to a basis of V (consisting of
exactly n elements). There is a corresponding result for linearly independent
elements in V. Corollary 3.47 shows that any linearly independent subset
of V must have at most n elements. What we will see in Proposition 3.49
below is that any linearly independent subset of V can be expanded to a
basis of V (which will then have exactly n elements).
Proposition 3.49 below holds even when V is not assumed to be finite-
dimensional, but a full proof requires the use of Zorn's Lemma. The proof
of the general case is sketched in the remarks on page 224 in Chapter B in
the Appendix.
Proposition 3.49. Let V be a finite-dimensional vector space, and let C
be a linearly independent set. Then C can be expanded to a basis of V, i.e.,
there exists a basis B of V such that C ⊆ B.
Proof. Let n be the dimension of V. Then by Corollary 3.47, C has at
most n elements in it. Assume that C = {v_1, v_2, ..., v_t} for some t ≤ n.
If C already spans V, then C would be a basis and we would be done.
(And if this happens, you know that t must equal n by Corollary 3.41!) So
assume that C does not span V. By the very definition of what it means
to span a vector space, there must be a vector in V, call it v_{t+1}, that is not
expressible as a linear combination of the elements in C. We claim that the
set C_1 = {v_1, v_2, ..., v_t, v_{t+1}} must be linearly independent. For suppose
f_1 v_1 + ... + f_t v_t + f_{t+1} v_{t+1} = 0 for some scalars f_i, not all of which are
zero. Then f_{t+1} cannot be zero, since otherwise our relation would read
f_1 v_1 + ... + f_t v_t = 0 for scalars f_i not all zero, and this would violate the
linear independence of C. Therefore, we may divide our original relation by
f_{t+1} to find v_{t+1} = −(f_1/f_{t+1})v_1 − ... − (f_t/f_{t+1})v_t, contradicting the fact
that v_{t+1} is not expressible as a linear combination of elements of C. Thus,
C_1 is indeed linearly independent as claimed.
Note that the set C_1 has t + 1 elements. If C_1 spans V, then C_1 would
be a basis of V containing C, and we would be done. Otherwise, we could
expand C_1 to a linearly independent set C_2 and repeat our arguments ....
Notice that in the process above, we start with our set C with t elements,
and at each stage, we come up with a set that has one more element than the
set at the previous stage. When we reach a set with exactly n elements, this
set must span V, for if not, the set we would get at the next stage would
contain n + 1 elements and would be linearly independent, contradicting
Corollary 3.47 above. This set with exactly n elements would therefore be
a basis of V containing C. □
Example 3.50. For example, in R^2, consider the linearly independent set
{i}. The contention of the theorem above is that one can adjoin one other
vector to this to get a basis for R^2: for instance the set {i, j} is a basis, and
so, for that matter, is the set {i, w}. (Here, just as earlier in the chapter,
i = (1, 0), j = (0, 1), and w = (1/√2, 1/√2).)
We end this section with two more easy results concerning spanning sets
and linearly independent sets: the proofs simply consist of combining earlier
results!
Proposition 3.51. Let V be an n-dimensional vector space and S a subset
of V. Then:
1. If S is a spanning set for V (so |S| ≥ n by Corollary 3.48), and if
moreover |S| = n, then S is a basis for V.
2. If S is a linearly independent set (so |S| ≤ n by Corollary 3.47), and
if moreover |S| = n, then S is a basis for V.
Proof. As promised, the proof simply consists of combining previous results:
1. Given S, a spanning set with n elements, Proposition 3.37 shows that
some subset S′ of S is a basis. Hence, as V is n-dimensional, |S′| = n.
Since |S| = n as well, we find S′ = S, i.e., S is already a basis for V.
2. Given S, a linearly independent set with n elements, Proposition 3.49
shows that S can be expanded to a basis S′. Hence, as V is
n-dimensional, |S′| = n. Since |S| = n as well, we find S′ = S, i.e.,
S is already a basis for V. □
Remark 3.52. We have proved quite a few results in this section concerning
spanning sets, linearly independent sets, and bases. It would be helpful to
summarize these results here. In what follows, V is, as usual, a vector space
over a field F:
1. A basis for V is a subset of V that spans V and in which there is
no redundancy. Alternatively, a basis is a subset that spans V and is
linearly independent.
2. Bases always exist.
3. If one basis for V has an infinite number of elements in it, then every
other basis for V must also have an infinite number of elements. When
this occurs, we say V is infinite-dimensional.
4. If one basis for V has a finite number of elements n in it, then every
other basis must also have n elements. When this occurs, we say V is
finite-dimensional and we define the dimension of V to be n.
5. Assume that V is of finite dimension n:
(a) Any spanning set S for V must contain at least n elements.
(b) Any spanning set S can be shrunk to a basis for V.
(c) If a spanning set S has exactly n elements, then it is already a
basis for V.
(d) Any linearly independent set S must contain at most n elements.
(e) Any linearly independent set S can be expanded to a basis for V.
(f) If a linearly independent set S has exactly n elements in it, then
it is already a basis for V.
Of course, the statements in both (5b) and (5e) above hold even when
V is infinite-dimensional.
3.3 Subspaces and Quotient Spaces
The idea behind subspaces is very similar to the idea behind subrings, while
the idea behind quotient spaces is very similar to the idea behind quotient
rings. (There is one key difference: quotient rings are obtained by modding
out rings by ideals; modding out by subrings will not work. However,
quotient spaces can be made by modding out by subspaces. We will see this
later in the chapter.)
We will consider subspaces first:
Definition 3.53. Given a vector space V over a field F, a subspace of V
is a nonempty subset W of V that is closed with respect to vector addition
and scalar multiplication, such that with respect to this addition and scalar
multiplication, W is itself a vector space (that is, W satisfies all the axioms
of a vector space).
Now, we saw in the context of rings (Exercise 2.28 in Chapter 2) that
one could have a subset S of a ring R such that S is closed with respect to
addition and multiplication, and yet S is not a subring of R. It turns out
that in the case of vector spaces, it is enough for a (nonempty) subset W
of a vector space V to be closed with respect to vector addition and scalar
multiplication: W will then automatically satisfy all the axioms of a vector
space. This is the content of Theorem 3.55 below.
But first, a quick exercise, which is really a special case of Exercise 4.22
in Chapter 4 ahead:
Exercise 3.54. Let W be a subspace of the vector space V. Thus,
by definition, (W, +) is an abelian group. Let 0_W denote the identity
element of this group, and let 0_V denote the usual 0 of V. Show
that 0_W = 0_V. (See also Exercise 2.29 in Chapter 2.)
Theorem 3.55. Let V be a vector space over a field F, and let W be
a nonempty subset of V that is closed with respect to vector addition and
scalar multiplication. Then W is a subspace of V.
Proof. We need to check that all the axioms of a vector space hold. Let us
first check that (W, +) is an abelian group. Vector addition in W is both
commutative and associative, since for any v_1, v_2, v_3 ∈ W, we may consider
v_1, v_2, and v_3 to be elements of V, and in V, the relations v_1 + v_2 = v_2 + v_1
and (v_1 + v_2) + v_3 = v_1 + (v_2 + v_3) certainly hold. Next, given any v ∈ W,
let us show that −v is also in W. For this we invoke the fact that W is closed
with respect to scalar multiplication: since v ∈ W, −1 · v is also in W, and
−1 · v is, of course, just −v (see Remark 3.12 above). Now let us show that
0 is in W. Observe that so far, we have not used the hypothesis that W
is nonempty. (The proofs that we have given for the fact that addition in
W is associative and that every element in W has its additive inverse in W
hold vacuously even in the case where W is empty. For instance, the chain
of arguments "v ∈ W ⇒ −1 · v ∈ W (as W is closed with respect to scalar
multiplication) ⇒ −v ∈ W" is correct even when there is no vector v in W
to begin with!) Now let us use the fact that W is nonempty. Since W is
nonempty, it contains at least one vector, call it v. Then, by what we proved
above, −v is also in W. Since W is closed under vector addition, v + (−v)
is in W, and so 0 is in W. We have thus shown that (W, +) is an abelian
group.
It remains to be shown that the four axioms of scalar multiplication also
hold for W. But for any r and s in F and v and w in W, we may consider
v and w to be elements of V, and as elements of V, we certainly have the
relations r·(v + w) = r·v + r·w, (r + s)·v = r·v + s·v, (rs)·v = r·(s·v),
and 1·v = v. Hence, the axioms of scalar multiplication hold for W.
This proves that W is a subspace of V. □
We have the following, which captures both closure conditions of the test
in Theorem 3.55 above:
Corollary 3.56. Let V be a vector space over a field F, and let W be a
nonempty subset of V that is closed under linear combinations, i.e., for all
w_1, w_2 in W and all f_1, f_2 in F, the element f_1 w_1 + f_2 w_2 is also in W.
Then W is a subspace of V. Conversely, if W is a subspace, then W is closed
under linear combinations.
Proof. Assume that W is closed under linear combinations. Taking f_1 =
f_2 = 1, we find that w_1 + w_2 is in W for all w_1, w_2 in W, i.e., W is closed
under addition. Taking f_2 = 0, we find f_1 w_1 is in W for all w_1 in W and
all f_1 in F, i.e., W is closed under scalar multiplication. Thus, by Theorem
3.55, W is a subspace. Conversely, if W is a subspace, then for all w_1, w_2 in W
and all f_1, f_2 in F, f_1 w_1 and f_2 w_2 are both in W because W is closed under
scalar multiplication, and then f_1 w_1 + f_2 w_2 is in W because W is closed
under vector addition. Hence, W is closed under linear combinations. □
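As a sample application of this test, consider the subset W =
{(a, b, 0) | a, b ∈ R} of R^3. For any f_1, f_2 ∈ R, we have f_1(a, b, 0) +
f_2(a′, b′, 0) = (f_1 a + f_2 a′, f_1 b + f_2 b′, 0), which again lies in W, so W is
closed under linear combinations and is therefore a subspace. (This is just
the xy-plane of Example 3.58 below.)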
Here are some examples of subspaces. In each case, check that the
conditions of Theorem 3.55 apply.
Example 3.57. The set consisting of just the element 0 is a subspace.
Question 3.57.1. Why?
We refer to this as the zero subspace.
Example 3.58. If you think of R^2 as the vectors lying along the xy plane
of 3-dimensional xyz space, then R^2 becomes a subspace of R^3.
Example 3.59. For any nonnegative integers n and m with n < m, F_n[x]
is a subspace of F_m[x]. Also, F_n[x] and F_m[x] are both subspaces of F[x].
Example 3.60. U_n(R) (the set of upper triangular n × n matrices with
entries in R) is a subspace of the R-vector space M_n(R).
Example 3.61. Q[√2] is a subspace of the Q-vector space Q[√2, √3]. Of
course, we know very well by now that since Q ⊆ Q[√2], Q[√2] is directly
a Q-vector space. Both Q-vector space structures on Q[√2] are the same;
that is, in both ways of looking at Q[√2] as a Q-vector space, the rules
for vector addition and scalar multiplication are the same. In the first way
(viewing Q[√2] as a subspace of Q[√2, √3]), we first think of any element
a + b√2 of Q[√2] as the element a + b√2 + 0√3 + 0√6 of Q[√2, √3]. Doing so,
the vector sum of a + b√2 + 0√3 + 0√6 (= a + b√2) and a′ + b′√2 + 0√3 + 0√6
(= a′ + b′√2) is (a + a′) + (b + b′)√2 + 0√3 + 0√6 (= (a + a′) + (b + b′)√2). On
the other hand, viewing Q[√2] directly as a Q-vector space, the vector sum
of a + b√2 and a′ + b′√2 is also (a + a′) + (b + b′)√2. In a similar manner,
you can see that the rules for scalar multiplication are also identical.
Example 3.62. The example above generalizes as follows: Suppose F ⊆
K ⊆ L are fields. The field extension L/F makes L an F-vector space.
Since K is closed with respect to vector addition and scalar multiplication,
K becomes a subspace of L. But the field extension K/F exhibits K directly
as an F-vector space. The two F-vector space structures on K, one that
we get from viewing K as a subspace of the F-vector space L and the other
that we get directly from the field extension K/F, are the same.
Example 3.63. In Example 3.4, let ⊕_0^∞ F denote the set of all infinite
tuples (a_0, a_1, ...) in which only finitely many of the a_i are nonzero. Then
⊕_0^∞ F is a subspace of ∏_0^∞ F.
Exercise 3.63.1. Prove this!
Exercise 3.63.2. Show that the set S = {e_0, e_1, e_2, ...} is a basis
for ⊕_0^∞ F. (Contrast this with Exercise 3.34.1 above.)
This example is known as the direct sum of (countably infinite) copies
of F.
Example 3.64. For any field F, F[x^2] (that is, the set of all polynomials
of the form ∑_{i=0}^{n} f_i x^{2i}, n ≥ 0) is a subspace of F[x].
Question 3.64.1. What is the dimension of this subspace? Can
you discover a basis for this subspace?
Example 3.65. Let V be a vector space over a field F, and let S be any
nonempty subset of V.
Definition 3.65.1. The linear span of S is defined as the set
of all linear combinations of elements of S, that is, the set of all
vectors in V that can be written as c_1 s_1 + c_2 s_2 + ... + c_k s_k for
some integer k ≥ 1, some scalars c_i, and some vectors s_i ∈ S.
Exercise 3.65.1. Show that the linear span of S is a subspace of
V.
For instance, in R^3, if we take S = {i, j}, then the linear span of S is
the set of all vectors in R^3 that are of the form ai + bj for suitable scalars a
and b, in other words, the xy-plane. As we saw in Example 3.58 above, the
xy-plane is a subspace of R^3!
You should be able to do the following:
Question 3.66. Which of the following are subspaces of R^3?
1. {(a, b, c) | a + 3b = c}
2. {(a, b, c) | a = b^2}
3. {(a, b, c) | ab = 0}
We turn our attention now to quotient spaces. Recall how we constructed
the quotient ring R/I given a ring R and an ideal I: we first defined an
equivalence relation on R by a ∼ b if and only if a − b ∈ I (see page 57 in
Chapter 2). We found that the equivalence class of an element a is precisely
the coset a + I (Lemma 2.78 in that chapter). We then defined the ring R/I
to be the set of equivalence classes of R under the naturally induced definitions
[a] + [b] = [a + b] and [a][b] = [ab] (see Definition 2.79 in that chapter). Of
course, we had to check that our operations were well-defined and that we
indeed obtained a ring by this process (see Lemma 2.80 and Theorem 2.82
in that chapter). We will follow the same approach here.
So, given a vector space V over a field F, and a subspace W, we define
an equivalence relation on V by v ∼ w if and only if v − w ∈ W. Exactly as
on page 57, we can see that this is indeed an equivalence relation. We define
the coset a + W to be the set of all elements of the vector space of the form
a + w as w varies in W, and we call this the coset of W with respect to a.
We have the following, whose proof is exactly as in Lemma 2.78 of Chapter
2 and is therefore omitted:
Lemma 3.67. The equivalence class [a] is precisely the coset a + W.
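To make the cosets concrete: take V = R^3 and W the xy-plane, the
subspace of all vectors of the form (a, b, 0) (Example 3.58). Then the vectors
(1, 2, 5) and (0, 0, 5) lie in the same equivalence class, since their difference
(1, 2, 0) is in W; in symbols, [(1, 2, 5)] = (1, 2, 5) + W = (0, 0, 5) + W.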
As with quotient rings, we will denote the set of equivalence classes of V
by V/W, whose members we will denote as both [a] and a + W. We define
an addition operation on V/W and a scalar multiplication F × V/W → V/W
by the following:
Definition 3.68. [u] + [v] = [u + v] and f · [u] = [f · u] for all [u] and [v] in
V/W and all f in F. (In coset notation, this would read (u + W) + (v + W) =
(u + v) + W, and f(u + W) = fu + W.) As always, if the context is clear, we
will often omit the · sign and write r[b] for r · [b].
The following should now be easy, after your experience with quotient
rings (see Lemma 2.80 in Chapter 2):
Exercise 3.69. Show that the operations of addition and scalar
multiplication on V/W described above in Definition 3.68 are well-
defined. Show that the addition operation is commutative.
We now have the following:
Theorem 3.70. (V/W, +, ·) is a vector space over F.
Proof. As in Theorem 2.82 of Chapter 2, the proof involves checking that all
the vector space axioms of Definition 3.1 hold. The proof that (V/W, +) is
an abelian group is in fact identical to the proof that (R/I, +) is an abelian
group, and we will not do it here (see the remarks on page 154 on where
the similarity comes from). As for the axioms for scalar multiplication, let
us go through them one by one:
1. For all r ∈ F and [v], [w] ∈ V/W, we have r([v] + [w]) = r[v + w] =
[r(v + w)] = [rv + rw], where the first and second equalities are because
of the way operations are defined on V/W and the last equality is
because r(v + w) = rv + rw is a property that holds in the original
vector space V. On the other hand, r[v] + r[w] = [rv] + [rw] = [rv + rw],
where the equalities are because of the way operations are defined on
V/W. Thus, both sides equal [rv + rw], so indeed r([v] + [w]) =
r[v] + r[w].
2. For all r, s ∈ F and [v] ∈ V/W, (r + s)[v] = [(r + s)v] = [rv + sv],
where the last equality is because of properties of the original vector
space V. On the other hand, r[v] + s[v] = [rv] + [sv] = [rv + sv]. It
follows that (r + s)[v] = r[v] + s[v].
3. For all r, s ∈ F and [v] ∈ V/W, (rs)[v] = [(rs)v] = [r(sv)], where the
last equality is because of properties of the original vector space V,
while r(s[v]) = r[sv] = [r(sv)]. It follows that (rs)[v] = r(s[v]).
4. For all [v] ∈ V/W, 1[v] = [1v] = [v], where the last equality is because
1 · v = v holds in V. □
Definition 3.71. (V/W, +, ·) is called the quotient space of V by the
subspace W.
As with the case of quotient rings, the intuition behind V/W is that it
is a space formed by setting all elements of W to zero. More colloquially,
the construction "kills" all elements in W, or "divides out" all elements in
W. This last description explains the term quotient space, and, pushing
the analogy one step further, V/W can then be thought of as the set of all
"remainders" after dividing out by W, endowed with the natural quotient
binary operation and scalar multiplication of Definition 3.68.
For example, take V = R^3 and W to be the subspace consisting of all
vectors lying on the xy plane (Example 3.58 above). What sense do we make
of V/W? Every vector v in R^3 can be written as ai + bj + ck for unique
real numbers a, b, and c (see Example 3.27 above). Notice that both ai and
bj are in W. If we set these to zero, we are left simply with ck, which is
a vector lying on the z-axis. Moreover, every vector ck lying on the z-axis
arises this way (why?), so we find that V/W is precisely the z-axis. As in the
case of rings, this is more than just an equality of sets: this identification of
V/W with the z-axis preserves the vector space structure as well, which we
will make more precise in the next section.
The following lemma will be useful ahead. We will state the result only
for finite-dimensional vector spaces, although the result (suitably phrased)
is true for infinite-dimensional spaces as well (see Exercise 3.109):
Lemma 3.72. Let V be a finite-dimensional vector space over a field F and
let W be a subspace. Let {b_1, ..., b_m} be a basis for W. Expand this to a
basis {b_1, ..., b_m, b_{m+1}, ..., b_n} of V (see Proposition 3.49). Then the set (of
equivalence classes of vectors) {b_{m+1} + W, ..., b_n + W} is a basis for the
quotient space V/W.
Proof. Given any v + W ∈ V/W, we may write v = r_1 b_1 + ... + r_m b_m +
r_{m+1} b_{m+1} + ... + r_n b_n for suitable scalars r_1, ..., r_n. Since the vectors b_1,
..., b_m are in W, so is the vector r_1 b_1 + ... + r_m b_m. Thus, v − (r_{m+1} b_{m+1} +
... + r_n b_n) ∈ W. But this just says that v + W = (r_{m+1} b_{m+1} + ... + r_n b_n) + W.
Recalling how vector addition and scalar multiplication are defined in V/W,
we find v + W = (r_{m+1} b_{m+1} + ... + r_n b_n) + W = r_{m+1}(b_{m+1} + W) + ... +
r_n(b_n + W). This shows that the set {b_{m+1} + W, ..., b_n + W} spans V/W.
As for the linear independence, assume that r_{m+1}(b_{m+1} + W) + ... +
r_n(b_n + W) = 0_{V/W} for some scalars r_{m+1}, ..., r_n. Since 0_{V/W} is the class
of W, we find r_{m+1} b_{m+1} + ... + r_n b_n = w for some w ∈ W. But the set
{b_1, ..., b_m} is a basis for W, so we may write w = r_1 b_1 + ... + r_m b_m for
suitable scalars r_1, ..., r_m. Putting this together, we find r_1 b_1 + ... + r_m b_m +
(−r_{m+1})b_{m+1} + ... + (−r_n)b_n = 0. Since the set {b_1, ..., b_m, b_{m+1}, ..., b_n} is
a basis of V, each r_i (i = 1, ..., n) must be zero. In particular, r_{m+1}, ..., r_n
must all be zero, proving the linear independence of {b_{m+1} + W, ..., b_n + W}.
□
We get an easy corollary from this:
Corollary 3.73. Let V be a finite-dimensional vector space over a field F
and let W be a subspace. Then dim(V) = dim(W) + dim(V/W).
Proof. This is clear from the statement of the lemma above. □
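For example, with V = R^3 and W the xy-plane as above, we have
dim(V) = 3 and dim(W) = 2, so the corollary gives dim(V/W) = 3 − 2 = 1;
this matches the identification of V/W with the z-axis discussed earlier.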
3.4 Vector Space Homomorphisms: Linear Transformations
The ideas in this section parallel the development of ring homomorphisms
in Chapter 2. As in the passage from R to R/I, we notice some preservation
of structure when passing from V to V/W: the operations in V/W are
essentially the same as the operations in V, except that the elements of
V have all been divided out by W. What this means is analogous to the
situation with R and R/I: let us denote by f the function f : V → V/W
that pushes u ∈ V down to u + W. Since u + W = f(u), v + W = f(v),
and (u + v) + W = f(u + v), we find f(u) + f(v) = f(u + v). The function f
that sends u to u + W, along with the property f(u) + f(v) = f(u + v) for
all u and v in V, precisely captures the notion that addition in V/W and V
are essentially the same.
Similarly, the definition of scalar multiplication in V/W, r(u + W) =
ru + W (here r is in F), gives the feeling that scalar multiplication in V/W
is the same as the scalar multiplication in V except for "dividing out" by
W; once again this intuition is captured by the function f above along with
the property rf(u) = f(ru) for all r ∈ F and u in V.
Just as with rings, we will turn this situation around. Suppose one has
a function f from one vector space V over F to another vector space X
over F (note that the set of scalars F is the same for both spaces) which
has the two properties described above; then one similarly gets the sense
that the vector space operations in the two F-vector spaces V and X are
essentially the same, except perhaps for dividing out by some subspace. In
analogy with rings, we should call this a vector space homomorphism, but
traditionally, such a function has been called a linear transformation:
Definition 3.74. Let V and X be two vector spaces over a field F, and let
f : V → X be a function. Suppose that f has the following properties:
1. f(u) + f(v) = f(u + v) for all u, v in V,
2. rf(u) = f(ru) for all r in F and u in V.
Then f is said to be a linear transformation from V to X.
Remark 3.75. As with ring homomorphisms, there are some features of this
definition that are worth noting:
1. In the equation f(u) + f(v) = f(u + v), note that the operation on the
left side represents vector addition in the vector space X, while the
operation on the right side represents addition in the vector space V.
2. Similarly for the equation rf(u) = f(ru): the operation on the left
side represents scalar multiplication in X, while the operation on the
right side represents scalar multiplication in V.
3. By the very definition of a function, f is defined on all of V; however,
the image of V under f need not be all of X, i.e., f need not be
surjective (see Example 3.83 or Example 3.84 for instance, although
such examples are really very easy to write down). However, the image
of V under f is not an arbitrary subset of X: the definition of a
linear transformation ensures that the image of V under f is actually
a subspace of X (see Lemma 3.88 later in this section).
4. Note that it is not necessary to stipulate that f(0_V) = 0_X, since this
property holds automatically; see Lemma 3.77 below.
5. Condition (1) of the definition simply says that f should be a
group homomorphism from the group (V, +) to the group (X, +) (see
Definition 4.57 in Chapter 4 ahead), while the second condition (2)
says that the group homomorphism should, in addition, be F-linear.
The following lemma combines the two conditions in the definition of a
linear transformation into one:
Lemma 3.76. Let V and X be two F-vector spaces, and let f : V → X be
a function that satisfies the property that f(r_1 v_1 + r_2 v_2) = r_1 f(v_1) + r_2 f(v_2)
for all v_1, v_2 in V and all r_1, r_2 in F. Then f is a linear transformation.
Conversely, if f is a linear transformation, then f(r_1 v_1 + r_2 v_2) = r_1 f(v_1) +
r_2 f(v_2) for all v_1, v_2 in V and all r_1, r_2 in F.
Proof. Assume that f satisfies the property that f(r_1 v_1 + r_2 v_2) = r_1 f(v_1) +
r_2 f(v_2) for all v_1, v_2 in V and all r_1, r_2 in F. Taking r_1 = r_2 = 1, we see that
f(v_1 + v_2) = f(v_1) + f(v_2), and taking r_2 = 0, we see that f(r_1 v_1) = r_1 f(v_1).
Thus, f is a linear transformation. As for the converse, if f is a linear
transformation, then for all v_1, v_2 in V and all r_1, r_2 in F, f(r_1 v_1 + r_2 v_2) =
f(r_1 v_1) + f(r_2 v_2) = r_1 f(v_1) + r_2 f(v_2), as desired. □
The following lemma is analogous to Lemma 2.90 in Chapter 2:
Lemma 3.77. Let V and X be two F-vector spaces, and let f : V → X be
a linear transformation. Then f(0_V) = 0_X.
Proof. This proof is identical to the proof of the corresponding Lemma 2.90
in Chapter 2 (since, ultimately, these are both proofs that a group
homomorphism from a group G to a group H maps the identity in G to the
identity in H; see Lemma 4.59 in Chapter 4 ahead). We start with the fact
that f(0_V) = f(0_V + 0_V) = f(0_V) + f(0_V). We now have an equality in X:
f(0_V) = f(0_V) + f(0_V). Since (X, +) is an abelian group, every element
of X has an additive inverse, so there is an element, denoted −f(0_V), with
the property that f(0_V) + (−f(0_V)) = (−f(0_V)) + f(0_V) = 0_X. Adding
−f(0_V) to both sides of f(0_V) = f(0_V) + f(0_V), we get −f(0_V) + f(0_V) =
−f(0_V) + (f(0_V) + f(0_V)). The left side is just 0_X, while by associativity,
the right side is (−f(0_V) + f(0_V)) + f(0_V) = 0_X + f(0_V). But by the
definition of 0_X, 0_X + f(0_V) is just f(0_V). We thus find 0_X = f(0_V), thereby
proving the lemma. □
Remark 3.78. Here is another way to prove the statement of the lemma
above: Pick any v ∈ V. Then 0_V = 0_F · v, so f(0_V) = f(0_F · v) = 0_F · f(v) =
0_X. (Here, the first equality is due to Remark 3.12.2, and the last but
one equality is because f(rv) = rf(v) for any scalar r, since f is a linear
transformation.)
Before proceeding to examples of linear transformations, let us consider
one remaining object, analogous to the kernel of a ring homomorphism. The
concept of a linear transformation was introduced to capture the notion of
operations on two F-vector spaces being the same except for dividing out
by some subspace. Just as with ring homomorphisms, the natural candidate
for this subspace is the following:
Definition 3.79. Given a linear transformation f : V → X between two
F-vector spaces, the kernel of f is the set {u ∈ V | f(u) = 0_X}. It is denoted
ker(f).
As in the case of kernels of ring homomorphisms, the following statement
should come as no surprise:
Proposition 3.80. Let V and X be vector spaces over a field F. The kernel
of a linear transformation f : V → X is a subspace of V.
Proof. By Corollary 3.56, it is sufficient to check that ker(f) is a nonempty
subset of V that is closed under linear combinations. Since 0_V ∈ ker(f)
(Lemma 3.77), ker(f) is nonempty. Now, for any w_1, w_2 in ker(f) and any
r_1, r_2 in F, we find f(r_1 w_1 + r_2 w_2) = r_1 f(w_1) + r_2 f(w_2) = r_1 · 0_X + r_2 · 0_X =
0_X. Hence r_1 w_1 + r_2 w_2 is indeed in the kernel of f, so ker(f) is closed under
linear combinations. □
Remark 3.81. As in the case of ring homomorphisms, for any linear
transformation f : V → X between two F-vector spaces, we will have f(−v) =
−f(v). One proof is exactly the same as in Remark 2.91 in Chapter 2, and
this is not surprising: this is really a proof that in any group homomorphism
f from a group G to a group H, f(g^{−1}) will equal (f(g))^{−1} for all g ∈ G (see
Corollary 4.60 in Chapter 4). Another proof, of course, is to invoke scalar
multiplication and Remark 3.12.3: f(−v) = f(−1 · v) = −1 · f(v) = −f(v).
We are now ready to study examples of linear transformations. The
first example is really the master-example: it provides an algorithm for
constructing linear transformations and leads to matrix representations of
linear transformations that are useful for computations:
Example 3.82. Master-Example of Linear Transformation: Let V be an
F-vector space that is (for simplicity) finite-dimensional, and let {b_1, ..., b_n}
be a basis for V. Let X be an F-vector space, and let w_1, ..., w_n be arbitrary
vectors in X. Then we have the following:
Lemma 3.82.1. The function f : V → X that sends each basis element
b_i to the vector w_i (i = 1, ..., n) and a general linear combination r_1 b_1 +
... + r_n b_n (r_i ∈ F) to the vector r_1 w_1 + ... + r_n w_n is a (well-defined)
linear transformation. Conversely, any linear transformation f : V → X
is determined fully by where f sends each basis vector b_i: if f(b_i) = w_i,
then f must be defined on all of V by the formula f(r_1 b_1 + ... + r_n b_n) =
r_1 w_1 + ... + r_n w_n.
Proof. That f is well-defined comes from the fact that the b_i form a basis
for V, so each element u ∈ V is expressible as r_1 b_1 + ... + r_n b_n for a unique
choice of scalars r_i. Hence, defining what f does to the element u in terms
of the scalars r_i poses no problem, as the r_i are uniquely determined by u.
It is now trivial to check that f is a linear transformation: Given u =
r_1 b_1 + ... + r_n b_n and v = s_1 b_1 + ... + s_n b_n (here, the r_i and the s_j are
scalars), we find u + v = (r_1 + s_1)b_1 + ... + (r_n + s_n)b_n, so f(u + v) =
(r_1 + s_1)w_1 + ... + (r_n + s_n)w_n = (r_1 w_1 + ... + r_n w_n) + (s_1 w_1 + ... +
s_n w_n) = f(u) + f(v). Similarly, given any scalar r ∈ F, rv = r(s_1 b_1 + ... +
s_n b_n) = (rs_1)b_1 + ... + (rs_n)b_n, so f(rv) = (rs_1)w_1 + ... + (rs_n)w_n =
r(s_1 w_1 + ... + s_n w_n) = rf(v).
Exercise 3.82.1. Which vector space axioms were used in the two
chains of equalities in the proof above that f(u + v) = f(u) + f(v)
and f(rv) = rf(v)?
Exercise 3.82.2. Would the proof be any more complicated if V
were not assumed to be finite-dimensional? (Work it out!)
Conversely, if f is any linear transformation from V to X and if f(b_i) =
w_i (i = 1, ..., n), then, since f is a linear transformation, f(r_1 b_1 + ... +
r_n b_n) = r_1 f(b_1) + ... + r_n f(b_n) = r_1 w_1 + ... + r_n w_n. Since any vector in
V is a linear combination of the vectors b_1, ..., b_n, this formula completely
determines what f sends each vector in V to. □
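As a minimal instance of Lemma 3.82.1: view R as an R-vector space,
and define f : R^2 → R by declaring f(i) = 1 and f(j) = 0 on the standard
basis. The lemma then produces the linear transformation f(ai + bj) =
a·1 + b·0 = a, that is, projection onto the first coordinate.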
Now let us carry this one step further. Let f : V → X be a linear
transformation, and suppose (for simplicity) that X is also finite-dimensional,
with some basis {c_1, ..., c_m}. Thus, every vector w ∈ X can be uniquely
expressed as s_1 c_1 + ... + s_m c_m for suitable scalars s_i. In particular, each of
the vectors w_i (= f(b_i)) can therefore be expressed as a linear combination
of the c_j as follows:
w_1 = p_{1,1} c_1 + ... + p_{1,m} c_m
w_2 = p_{2,1} c_1 + ... + p_{2,m} c_m
...
w_n = p_{n,1} c_1 + ... + p_{n,m} c_m
(The p_{i,j} are scalars. Note how they are indexed: p_{i,j} stands for the
coefficient of c_j in the expression of w_i as a linear combination of the various
c's. Thus, across each row of this equation, it is the second index in p_{i,j} that
varies.) Now consider an arbitrary u ∈ V, expressed as a linear combination
u = r_1 b_1 + ... + r_n b_n for suitable scalars r_i. Then
f(u) = r_1 w_1 + ... + r_n w_n
     = r_1 (p_{1,1} c_1 + ... + p_{1,m} c_m)
     + r_2 (p_{2,1} c_1 + ... + p_{2,m} c_m)
     ...
     + r_n (p_{n,1} c_1 + ... + p_{n,m} c_m)
Now let us regroup the right side so that all the scalars that are attached to
the basis vector c_1 are together, all scalars attached to the basis vector c_2
are together, etc. Doing so, we find
f(u) = (p_{1,1} r_1 + p_{2,1} r_2 + ... + p_{n,1} r_n) c_1
     + (p_{1,2} r_1 + p_{2,2} r_2 + ... + p_{n,2} r_n) c_2
     ...
     + (p_{1,m} r_1 + p_{2,m} r_2 + ... + p_{n,m} r_n) c_m
(Study this relation carefully: note how the indices of the p_{i,j} behave: p_{i,j}
multiplies r_i and is attached to c_j. Notice that across each row of this
equation, it is the first index of p_{i,j} that changes; this is in contrast to the
behavior of the indices in the previous equations. There, it was the second
index of p_{i,j} that changed in each row.)
Now suppose that we adopt the convention that we will write any vector
u ∈ V, u = r_1 b_1 + ... + r_n b_n, as the column vector
u = [ r_1 ]
    [ r_2 ]
    [ ... ]
    [ r_n ]
and any vector w ∈ X, w = s_1 c_1 + ... + s_m c_m, as the column vector
w = [ s_1 ]
    [ s_2 ]
    [ ... ]
    [ s_m ]
Let us rewrite our equation for f(u) above in the form f(u) = s_1 c_1 + ... +
s_m c_m for suitable scalars s_i. Since the coefficient of c_1 in f(u) is p_{1,1} r_1 +
p_{2,1} r_2 + ... + p_{n,1} r_n (see the equation above), we find s_1 = p_{1,1} r_1 +
p_{2,1} r_2 + ... + p_{n,1} r_n. Similarly, since the coefficient of c_2 in f(u) is
p_{1,2} r_1 + p_{2,2} r_2 + ... + p_{n,2} r_n, we find s_2 = p_{1,2} r_1 + p_{2,2} r_2 + ... +
p_{n,2} r_n. Proceeding thus, we find that the vectors u and f(u) are related by
the matrix equation
[ s_1 ]   [ p_{1,1}  p_{2,1}  ...  p_{n,1} ] [ r_1 ]
[ s_2 ] = [ p_{1,2}  p_{2,2}  ...  p_{n,2} ] [ r_2 ]     (3.1)
[ ... ]   [  ...      ...     ...   ...    ] [ ... ]
[ s_m ]   [ p_{1,m}  p_{2,m}  ...  p_{n,m} ] [ r_n ]
There is an easy way to remember how this matrix is constructed: notice
that the first column of the matrix is precisely the vector w_1 written in terms
of its coefficients in the basis c_1, ..., c_m, the second column is precisely
the vector w_2 written in terms of its coefficients in the basis c_1, ..., c_m,
and so on, until the last column is precisely the vector w_n written in terms
of its coefficients in the basis c_1, ..., c_m.
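As a quick check of this recipe, consider the linear transformation
f : R^2 → R^2 with f(e_1) = e_1 + e_2 and f(e_2) = 2e_2, using the standard
basis on both sides. The first column of the matrix is (1, 1) (the coefficients
of f(e_1)) and the second column is (0, 2), so, for instance, u = 3e_1 + 4e_2 is
sent to s_1 = 1·3 + 0·4 = 3 and s_2 = 1·3 + 2·4 = 11, in agreement with
f(u) = 3(e_1 + e_2) + 4(2e_2) = 3e_1 + 11e_2.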
Notice something else about this matrix: it depends vitally on the choice
of the basis {b_1, ..., b_n} for V and on the choice of the basis {c_1, ..., c_m}
of X. For, our entire derivation of Equation 3.1 depended on our writing
u as a linear combination of the vectors b_1, ..., b_n and the w_i as linear
combinations of the vectors c_1, ..., c_m. A different choice of basis for V or a
different choice of basis for X would have led to different linear combinations,
hence to a different matrix in Equation 3.1 above.
We refer to this matrix as the matrix of the linear transformation f in the
bases {b_1, ..., b_n} for V and {c_1, ..., c_m} for X. (If V = X and we use the
same basis to describe both vectors and their images under f, then we simply
refer to this matrix as the matrix of f in the basis {b_1, ..., b_n}.) Since each
linear transformation is uniquely determined by the w_i = f(b_i), and since
each w_i can be written uniquely as a linear combination of the basis vectors
c_1, ..., c_m, and since these unique coefficients of the c_j then become the
i-th column in the matrix, we find that each linear transformation uniquely
determines an m × n matrix with coefficients in F by this procedure.
But more is true. Since an arbitrary $m \times n$ matrix with coefficients in $F$ determines a collection of vectors $w_1, \ldots, w_n$ from $X$ (with the $i$-th column representing $w_i$), and since a linear transformation $f$ can be constructed from these vectors $w_1, \ldots, w_n$ by defining $f(r_1 b_1 + \cdots + r_n b_n) = r_1 w_1 + \cdots + r_n w_n$, we find that an arbitrary $m \times n$ matrix with coefficients in $F$ leads to a linear transformation $f : V \to X$. Thus, the set of linear transformations $f : V \to X$ is in one-to-one correspondence with $m \times n$ matrices with coefficients in $F$.

This is all very pretty!
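For readers who want to experiment with this correspondence, here is an informal computational sketch (not part of the text's development; it assumes Python with the numpy library, and the specific vectors are made up for illustration). It builds the matrix of a linear transformation $f : \mathbb{R}^2 \to \mathbb{R}^3$ column by column from the images of the basis vectors, exactly as described above:

```python
import numpy as np

# A made-up linear transformation f : R^2 -> R^3, specified, as in the text,
# by where it sends the standard basis vectors b_1 = (1,0) and b_2 = (0,1).
w1 = np.array([1.0, 0.0, 2.0])   # w_1 = f(b_1)
w2 = np.array([0.0, 3.0, 1.0])   # w_2 = f(b_2)

# The matrix of f in these bases: the i-th column is w_i written in terms of
# its coefficients in the standard basis c_1, c_2, c_3 of R^3.
M = np.column_stack([w1, w2])

# Equation (3.1): applying f to u = r_1 b_1 + r_2 b_2 is the same as
# multiplying M against the column vector (r_1, r_2).
u = np.array([2.0, -5.0])
print(M @ u)                     # the coordinates of f(u)
print(2.0 * w1 - 5.0 * w2)       # the same vector, computed directly
```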
Question 3.82.1. For practice with a concrete example, think about the following:

1. If you are given that a linear transformation $f : \mathbb{R}^2 \to \mathbb{R}^3$ sends the vector $i = (1, 0)$ to the vector $(1, 2, 0)$ and the vector $j = (0, 1)$ to the vector $(2, 1, 3)$, what does $f$ do to an arbitrary vector $(a, b)$ in $\mathbb{R}^2$?

2. What is the matrix of $f$ with respect to the basis $i, j$ of $\mathbb{R}^2$ and the basis $(1, 0, 0), (0, 1, 0), (0, 0, 1)$ of $\mathbb{R}^3$?

3. What is the matrix of $f$ with respect to the basis $i, w = (1/\sqrt{2}, 1/\sqrt{2})$ of $\mathbb{R}^2$ (see Example 3.26) and the basis $(1, 0, 0), (0, 1, 0), (0, 0, 1)$ of $\mathbb{R}^3$? (Hint: What does $f$ do to $w$?)
Question 3.82.2. What are the coordinates, in the standard basis for $\mathbb{R}^3$ (see Example 3.27), of the vector $x i + y j$, after it undergoes the linear transformation $f : \mathbb{R}^2 \to \mathbb{R}^3$ given by the matrix
$$\begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix}$$
where the matrix is written with respect to the basis $i, w = (1/\sqrt{2}, 1/\sqrt{2})$ of $\mathbb{R}^2$ and the basis $(1, 0, 0), (1, 1, 0), (0, 1, 1)$ of $\mathbb{R}^3$? (See Exercise 3.27.1 for why $(1, 0, 0), (1, 1, 0), (0, 1, 1)$ is a basis of $\mathbb{R}^3$.)
Question 3.82.3. How will the treatment in this example change if either $V$ or $X$ (or both) were to be infinite-dimensional $F$-vector spaces? (See the remarks on page 155 in the notes for some hints.)
Example 3.83. Let $V$ be an $F$-vector space. The map $f : V \to V$ that sends any $v \in V$ to $0$ is a linear transformation.
Question 3.83.1. If $V$ is $n$-dimensional with basis $b_1, \ldots, b_n$, what is the matrix of $f$ with respect to this basis?
Example 3.84. Let $V$ be an $F$-vector space, and let $W$ be a subspace. The map $f : W \to V$ defined by $f(w) = w$ is a linear transformation.
Question 3.84.1. Assume that $W$ is $m$-dimensional and $V$ is $n$-dimensional. Pick a basis $B = b_1, \ldots, b_m$ of $W$ and expand to a basis $C = b_1, \ldots, b_m, b_{m+1}, \ldots, b_n$ of $V$. What is the matrix of $f$ with respect to the basis $B$ of $W$ and the basis $C$ of $V$?
Example 3.85. Let $F$ be a field, and view $M_n(F)$ as a vector space over $F$ (see Example 3.5). Now view $F$ as an $F$-vector space (see Example 3.7: note that $F$ is trivially an extension field of $F$). Then the function $f : M_n(F) \to F$ that sends a matrix to its trace is a linear transformation. (Recall that the trace of a matrix is the sum of its diagonal entries.)

To prove this, note that this is really a function that sends basis vectors of the form $e_{i,i}$ to $1$ and $e_{i,j}$ ($i \neq j$) to $0$, and an arbitrary matrix $\sum_{i,j} m_{i,j} e_{i,j}$ to $m_{1,1} \cdot 1 + \cdots + m_{n,n} \cdot 1$. Now apply Lemma 3.82.1 to conclude that $f$ must be a linear transformation.

See Exercise 3.100 at the end of the chapter.
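As an informal numerical illustration of this example (not from the text; it assumes Python with numpy), one can spot-check the linearity property $\operatorname{tr}(r_1 A_1 + r_2 A_2) = r_1 \operatorname{tr}(A_1) + r_2 \operatorname{tr}(A_2)$ on randomly chosen matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A1, A2 = rng.random((3, 3)), rng.random((3, 3))
r1, r2 = 2.0, -7.0

# Linearity of the trace map f : M_3(R) -> R.
lhs = np.trace(r1 * A1 + r2 * A2)
rhs = r1 * np.trace(A1) + r2 * np.trace(A2)
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```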
Example 3.86. Let $V$ be a vector space over a field $F$ and let $W$ be a subspace. Assume that $V$ is finite-dimensional (for simplicity). Let $\dim_F(V) = n$, and $\dim_F(W) = m$. Let $b_1, \ldots, b_m$ be a basis for $W$, and let us expand this to a basis $b_1, \ldots, b_m, b_{m+1}, \ldots, b_n$ (see Theorem 3.49). Given any $v \in V$, we may therefore write $v = f_1 b_1 + \cdots + f_m b_m + f_{m+1} b_{m+1} + \cdots + f_n b_n$ for unique scalars $f_i \in F$.
Exercise 3.86.1. Show that the function $\pi : V \to W$ that sends any $v$ expressed as above to the vector $f_1 b_1 + \cdots + f_m b_m$ in $W$ is a linear transformation from $V$ to $W$.
Exercise 3.86.2. Is $\pi$ surjective? Describe a basis for $\ker(\pi)$.
Exercise 3.86.3. The basis $b_1, \ldots, b_m$ of $W$ can be expanded to a basis $b_1, \ldots, b_m, b_{m+1}, \ldots, b_n$ of $V$ in many different ways (see Example 3.50). The definition of $\pi$ above depends on which choice of $b_{m+1}, \ldots, b_n$ we make. For example, take $V = \mathbb{R}^2$ and $W$ the subspace represented by the $x$-axis. Take the vector $b_1 = i$ $(= (1, 0))$ as a basis for $W$. Show that the definition of $\pi$ depends crucially on the choice of vector $b_2$ used to expand $b_1$ to a basis for $\mathbb{R}^2$ as follows: Select $b_2$ in two different ways and show that for suitable $v \in \mathbb{R}^2$, $\pi(v)$ defined with one choice of $b_2$ will be different from $\pi(v)$ defined by the other choice of $b_2$.
We now come to isomorphisms between vector spaces. In analogy with ring isomorphisms, vector space isomorphisms capture the notion that the vector space structures in two spaces are essentially the same without even having to divide out by any subspace. As with rings, we need a couple of lemmas first:
Lemma 3.87. Let $V$ and $X$ be two vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. Then $f$ is an injective function if and only if $\ker(f)$ is the zero subspace.
Exercise 3.87.1. The proof of this is very similar to the proof of the corresponding Lemma 2.102 in Chapter 2: study that proof and write down a careful proof of Lemma 3.87 above.
Our next lemma is analogous to Lemma 2.103 of Chapter 2:
Lemma 3.88. Let $V$ and $X$ be two vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. Write $f(V)$ for the image of $V$ under $f$. Then $f(V)$ is a subspace of $X$.
Proof. We will apply Corollary 3.56 to $f(V)$. By Lemma 3.77, $0_X \in f(V)$, so $f(V)$ is nonempty. Now take any $w_1, w_2$ in $f(V)$, and any $r_1, r_2$ in $F$. We wish to show that $r_1 w_1 + r_2 w_2$ is also in $f(V)$. Since $w_1 \in f(V)$, $w_1 = f(v_1)$ for some $v_1 \in V$. Similarly, $w_2 = f(v_2)$ for some $v_2 \in V$. Then, $f(r_1 v_1 + r_2 v_2) = f(r_1 v_1) + f(r_2 v_2) = r_1 f(v_1) + r_2 f(v_2) = r_1 w_1 + r_2 w_2$, as desired. $\Box$
Lemma 3.88 above allows us to make a quick definition:
Definition 3.89. Let $V$ and $X$ be two vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. The rank of $f$ is defined to be the dimension of $f(V)$ as an $F$-vector space.
It is tempting to prove the following two easy results before proceeding
to vector space isomorphisms:
Lemma 3.90. Let $V$ and $X$ be vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. Let $B$ be a basis for $V$. Then the vectors $\{f(b) \mid b \in B\}$ span $f(V)$.
Proof. Any vector in $f(V)$ is of the form $f(v)$ for some $v \in V$. Since $B$ is a basis for $V$, $v = r_1 b_1 + \cdots + r_n b_n$ for some scalars $r_1, \ldots, r_n$ and some vectors $b_1, \ldots, b_n$ from $B$. Then, $f(v) = r_1 f(b_1) + \cdots + r_n f(b_n)$, showing that every vector in $f(V)$ is expressible as a linear combination of the vectors $\{f(b) \mid b \in B\}$, as desired. $\Box$
Lemma 3.91. Continuing with the notation of Lemma 3.90, assume further that $f$ is injective. Then the vectors $\{f(b) \mid b \in B\}$ form a basis for $f(V)$.
Proof. With the additional assumption that $f$ is injective, we need to show that the vectors $\{f(b) \mid b \in B\}$ are linearly independent, since we already know from Lemma 3.90 that they span $f(V)$. Assume that $r_1 f(b_1) + \cdots + r_n f(b_n) = 0_X$ for some scalars $r_1, \ldots, r_n$ and some vectors $b_1, \ldots, b_n$ from $B$. Since $f$ is a linear transformation, the left side is just $f(r_1 b_1 + \cdots + r_n b_n)$. By the injectivity of $f$, we find $r_1 b_1 + \cdots + r_n b_n = 0_V$. But since the $b_i$ are linearly independent in $V$, $r_1, \ldots, r_n$ must all be zero, showing that the vectors $\{f(b) \mid b \in B\}$ are indeed linearly independent. $\Box$
We now have the following, completely in analogy with rings:
Definition 3.92. Let $V$ and $X$ be vector spaces over a field $F$, and let $f : V \to X$ be a linear transformation. If $f$ is both injective and surjective, then $f$ is said to be an isomorphism between $V$ and $X$. Two vector spaces $V$ and $X$ are said to be isomorphic (written $V \cong X$) if there is some function $f : V \to X$ that is an isomorphism between $V$ and $X$.
Example 3.93. Any two vector spaces over the same field $F$ of the same dimension $n$ are isomorphic. For, if $V$ and $W$ are two vector spaces over $F$ both of dimension $n$, and if, say, $v_1, v_2, \ldots, v_n$ is a basis for $V$ and $w_1, w_2, \ldots, w_n$ is a basis for $W$, then the function $f : V \to W$ defined by $f(v_1) = w_1$, $f(v_2) = w_2$, \ldots, $f(v_n) = w_n$, and $f(r_1 v_1 + r_2 v_2 + \cdots + r_n v_n) = r_1 w_1 + r_2 w_2 + \cdots + r_n w_n$ is an $F$-linear transformation, by Lemma 3.82.1. This map is injective: if $v = r_1 v_1 + r_2 v_2 + \cdots + r_n v_n$ is such that $f(v) = 0$, then this means $r_1 w_1 + r_2 w_2 + \cdots + r_n w_n = 0$, and since the $w_i$ form a basis for $W$, each $r_i$ must be zero, so $v$ must be zero. Also, $f$ is surjective: clearly, given any $w = r_1 w_1 + r_2 w_2 + \cdots + r_n w_n$ in $W$, the vector $v = r_1 v_1 + r_2 v_2 + \cdots + r_n v_n$ maps to $w$ under $f$. Thus, $f$ is an isomorphism between $V$ and $W$.
Remark 3.93.1. If $f : V \to W$ is an isomorphism between two vector spaces $V$ and $W$ over a field $F$ then, since $f$ provides a bijection between $V$ and $W$, we may define $f^{-1} : W \to V$ by: $f^{-1}(w)$ equals that unique $v \in V$ such that $f(v) = w$. Clearly, the composite function $V \xrightarrow{f} W \xrightarrow{f^{-1}} V$ is just the identity map on $V$, and similarly, the composite function $W \xrightarrow{f^{-1}} V \xrightarrow{f} W$ is just the identity map on $W$. But more: the map $f^{-1}$ is a linear transformation from $W$ to $V$.
Exercise 3.93.1. If $f : V \to W$ is an isomorphism, show that the map $f^{-1}$ of Remark 3.93.1 above is a linear transformation from $W$ to $V$.
The following is analogous to Theorem 2.110 of Chapter 2:
Theorem 3.94. (Fundamental Theorem of Linear Transformations of Vector Spaces.) Let $V$ and $X$ be vector spaces over a field $F$. Let $f : V \to X$ be a linear transformation, and write $f(V)$ for the image of $V$ under $f$. Then the function $\bar{f} : V/\ker(f) \to f(V)$ defined by $\bar{f}(v + \ker(f)) = f(v)$ is well-defined, and provides an isomorphism between $V/\ker(f)$ and $f(V)$.
Proof. The proof is similar to the proof of Theorem 2.110 of Chapter 2. We first check that $\bar{f}$ is well-defined. Suppose $u + \ker(f) = v + \ker(f)$. Then $u - v \in \ker(f)$, so $f(u - v) = f(u) - f(v) = 0_X$, so $f(u) = f(v)$. Thus, $\bar{f}(u + \ker(f)) = \bar{f}(v + \ker(f))$, i.e., $\bar{f}$ is well-defined.

Now let us apply Lemma 3.76: We have
$$\bar{f}\left(r_1(v_1 + \ker(f)) + r_2(v_2 + \ker(f))\right) = \bar{f}\left((r_1 v_1 + \ker(f)) + (r_2 v_2 + \ker(f))\right) = \bar{f}\left((r_1 v_1 + r_2 v_2) + \ker(f)\right)$$
$$= f(r_1 v_1 + r_2 v_2) = r_1 f(v_1) + r_2 f(v_2) = r_1 \bar{f}(v_1 + \ker(f)) + r_2 \bar{f}(v_2 + \ker(f)).$$
Hence $\bar{f}$ is a linear transformation.
Exercise 3.94.1. Justify all the equalities above.
We check that $\bar{f}$ is surjective as a function from $V/\ker(f)$ to $f(V)$. Note that any element of $f(V)$ is, by definition, of the form $f(v)$ for some $v \in V$. But then, by the way we have defined $\bar{f}$, we find $f(v) = \bar{f}(v + \ker(f))$, so indeed $\bar{f}$ is surjective.
Finally, we check that $\bar{f}$ is injective. Suppose that $v + \ker(f)$ is in $\ker(\bar{f})$. Thus, $\bar{f}(v + \ker(f)) = 0_X$. Since $\bar{f}(v + \ker(f)) = f(v)$, we find $f(v) = 0_X$. Hence $v \in \ker(f)$. But this means that the coset $v + \ker(f)$ equals the coset $\ker(f)$ (why?), so $v + \ker(f)$ is the zero element of $V/\ker(f)$. Thus $\bar{f}$ is injective.

Putting this together, we find that $\bar{f}$ provides an isomorphism between $V/\ker(f)$ and $f(V)$. $\Box$
We now study the relation between the dimensions of $V$, $\ker(f)$ and $f(V)$ in the case where $V$ is finite-dimensional. But first, let us state a consequence of Lemmas 3.90 and 3.91:
Corollary 3.95. Let $V$ and $X$ be vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. If $f$ is an isomorphism between $V$ and $X$, then $f$ sends any basis of $V$ to a basis of $X$.
Exercise 3.95.1. Convince yourselves that this follows from Lemmas 3.90 and 3.91!
We are now ready to prove:
Theorem 3.96. Let $V$ and $X$ be vector spaces over a field $F$ and let $f : V \to X$ be a linear transformation. Assume that $V$ is finite-dimensional. Then $\dim_F(V) = \dim_F(f(V)) + \dim_F(\ker(f))$.
Proof. The proof is a combination of Theorem 3.94, Lemma 3.72, and Corollary 3.95. Start with a basis $b_1, \ldots, b_m$ of $\ker(f)$, and expand this to a basis $b_1, \ldots, b_m, b_{m+1}, \ldots, b_n$ of $V$. (Thus, $\dim_F(\ker(f)) = m$ and $\dim_F(V) = n$.) Then, according to Lemma 3.72, the set $b_{m+1} + \ker(f), \ldots, b_n + \ker(f)$ is a basis for $V/\ker(f)$. By Theorem 3.94, the function $\bar{f} : V/\ker(f) \to f(V)$ defined by $\bar{f}(v + \ker(f)) = f(v)$ is an isomorphism, so by Corollary 3.95 the set of vectors $\bar{f}(b_{m+1} + \ker(f)), \ldots, \bar{f}(b_n + \ker(f))$ forms a basis for $f(V)$. In particular, the dimension of $f(V)$ must be the size of this set, which is $n - m$. It follows that $\dim_F(V) = \dim_F(f(V)) + \dim_F(\ker(f))$. $\Box$
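Here is an informal numerical sanity check of Theorem 3.96 (an illustration only, assuming Python with numpy; the matrix is made up, and its singular value decomposition is used to read off the rank and a basis of the kernel):

```python
import numpy as np

rng = np.random.default_rng(1)
# A deliberately rank-deficient 4x6 matrix A; it defines f : R^6 -> R^4 by
# f(x) = A x, so dim V = 6 here.
A = rng.random((4, 2)) @ rng.random((2, 6))   # rank 2 with probability 1

_, singular_values, Vt = np.linalg.svd(A)
rank = int(np.sum(singular_values > 1e-10))   # dim f(V)
kernel_basis = Vt[rank:]                      # rows spanning ker(f)

# dim V = dim f(V) + dim ker(f):  6 = 2 + 4
print(A.shape[1], "=", rank, "+", len(kernel_basis))
assert np.allclose(A @ kernel_basis.T, 0)     # the kernel vectors map to 0
```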
3.5 Further Exercises
Exercise 3.97. Starting from the vector space axioms, prove that the properties listed in Remark 3.12 hold for all vector spaces. (Hint: You should get ideas from the solutions to the corresponding Exercise 2.114 of Chapter 2: the proofs of the first three properties are quite similar in spirit. As for the last property, look to $f^{-1}$ for help!)
Exercise 3.98. Prove that the polynomials $1, 1 + x, (1 + x)^2, (1 + x)^3, \ldots$ also form a basis for $\mathbb{R}[x]$ as an $\mathbb{R}$-vector space. (Hint: To show that these polynomials span $\mathbb{R}[x]$, it is sufficient to show that the polynomials $1, x, x^2, \ldots$ are in the linear span (see Example 3.65 above) of $1, 1 + x, (1 + x)^2, (1 + x)^3, \ldots$ (Why?) The vector $1$ is of course in the linear span. Assuming inductively that the vectors $1, x, \ldots,$ and $x^{n-1}$ are in the linear span, show that $x^n$ is also in the linear span by considering the binomial expansion of $(1 + x)^n$. As for linear independence, suppose that $\sum_{i=0}^{n} d_i (1 + x)^i = 0$. You may assume that $d_n \neq 0$ (why?). Now expand each term $(1 + x)^i$ above and consider the coefficient of $x^n$. What do you find?)

If you find the hint too computational, you can also establish this result by invoking Exercise 3.106 ahead and Exercise 2.109.2 in Chapter 2. (However, note that Exercise 2.109.2 in turn is computational, so this merely shifts all the computations to a different place!)
Exercise 3.99. Show that the matrices $e_{i,j}$ and $\sqrt{2}\,e_{i,j}$ ($1 \le i, j \le 2$) form a basis for $M_2(\mathbb{Q}[\sqrt{2}])$ considered as a $\mathbb{Q}$-vector space. ($\sqrt{2}\,e_{i,j}$ is the $2 \times 2$ matrix with $\sqrt{2}$ in the $(i, j)$ slot, and zeros in the remaining slots.) Now discover a basis for $M_2(\mathbb{C})$ considered as a vector space over $\mathbb{R}$.
Exercise 3.100. Show that the set of all matrices in $M_n(\mathbb{R})$ whose trace is zero is a subspace of $M_n(\mathbb{R})$ by exhibiting this space as the kernel of a suitable homomorphism that we have considered in the text. Use Theorem 3.96 to prove that this subspace has dimension $n^2 - 1$. Discover a basis for this subspace.
Exercise 3.101. Let $V$ be an $F$-vector space. So far, we have considered individual linear transformations of the form $f : V \to V$; this exercise deals with the collection of all such $F$-linear transformations. Let $\mathrm{End}_F(V)$ denote the set of all $F$-linear transformations from $V$ to $V$. (End is short for the word endomorphism, which is another word for a homomorphism from one (abelian) group to itself, while the subscript $F$ indicates that we are considering those (abelian) group homomorphisms that are in addition $F$-linear; see (5) in Remark 3.75 earlier in this chapter.)

1. Let $f$ and $g$ be two elements in $\mathrm{End}_F(V)$. Consider the function, suggestively denoted $f + g$, that is obtained by defining $(f + g)(v) = f(v) + g(v)$. Show that $f + g$ is also an $F$-linear transformation, and hence is an element of $\mathrm{End}_F(V)$.

2. Show that $\mathrm{End}_F(V)$, with this definition of addition of two linear transformations, is an abelian group. What is the identity element in this group? How do you define the inverse with respect to addition of any $f \in \mathrm{End}_F(V)$?

3. Let $f \circ g$ denote the usual composition of functions on $V$, defined by $(f \circ g)(v) = f(g(v))$. Show that $f \circ g$ is also an $F$-linear transformation, and hence is an element of $\mathrm{End}_F(V)$.

4. Show that by thinking of function composition as a multiplication operation on $\mathrm{End}_F(V)$, the set $(\mathrm{End}_F(V), +, \circ)$ becomes a ring. What is the multiplicative identity in this ring? Is this ring commutative? (What if the dimension of $V$ is 1?)
Exercise 3.102. Prove that an element $f \in \mathrm{End}_F(V)$ (see Exercise 3.101 above) is invertible if and only if $f$ is an isomorphism. (Hint: For one direction of this problem, Remark 3.93.1 and Exercise 3.93.1 may be helpful.)
Exercise 3.103. Now that you have shown that $\mathrm{End}_F(V)$ is a ring in Exercise 3.101 above, here is an example that shows $ab = 1$ doesn't imply $ba = 1$ in an arbitrary ring! (See Definition 2.44 in Chapter 2.)

Let $V$ be a vector space with a countably infinite basis $v_i$, $i = 1, 2, \ldots$ (For example, see Exercise 3.63.2 earlier in this chapter.) Let $T$ be the $F$-linear transformation that sends $v_i$ to $v_{i+1}$ for $i = 1, 2, \ldots$, and let $S$ be the linear transformation that sends $v_i$ to $v_{i-1}$ for $i = 1, 2, \ldots$, with the understanding that $v_0$ means the zero vector. (Why are these linear transformations? See the remarks on page 155 on how to define linear transformations between infinite-dimensional spaces.) Show that in the ring $\mathrm{End}_F(V)$, the product $ST = 1$ but the product $TS$ sends $v_1$ to zero and hence is not $1$.
Exercise 3.104. Let $V$ be an $F$-vector space of dimension $n$ with basis $b_1, \ldots, b_n$. Recall from Example 3.82 how one can assign to each $F$-linear transformation $T$ on $V$ the $(n \times n)$ matrix of $T$ with respect to the basis $b_1, \ldots, b_n$. Write $M_T$ for the matrix in $M_n(F)$ that corresponds to $T$ under this assignment. Study the addition and multiplication operations on $\mathrm{End}_F(V)$ in Exercise 3.101 above, and prove that the map $M : \mathrm{End}_F(V) \to M_n(F)$ that sends $T$ to $M_T$ provides a ring isomorphism between $\mathrm{End}_F(V)$ and $M_n(F)$.
Exercise 3.105. Let $K/F$ be a field extension. By Example 3.7, $K$ may be viewed as an $F$-vector space. Assume that the dimension of $K$ as an $F$-vector space is $n$. This exercise shows how $K$ may be realized as a subring of $M_n(F)$, thus generalizing Example 2.108 in Chapter 2.

1. For each $k \in K$, write $l_k$ for the map from $K$ to $K$ that sends any $x \in K$ to $kx$. Show that $l_k$ is an $F$-linear transformation from $K$ to $K$.

2. Recall from Exercise 3.101 that $\mathrm{End}_F(V)$, the set of all $F$-linear transformations of an $F$-vector space $V$, is a ring, under the operation of composition of functions. In particular, viewing $K$ as an $F$-vector space, $\mathrm{End}_F(K)$ is a ring, and the linear transform $l_k$ of Part (1) above is an element of this ring. Show that the map $l : K \to \mathrm{End}_F(K)$ that sends $k \in K$ to the linear transform $l_k$ is an injective ring homomorphism from $K$ to $\mathrm{End}_F(K)$.

3. Let $b_1, \ldots, b_n \in K$ be an $F$-basis of $K$. The linear transformation $l_k$ corresponds to a matrix $M_{l_k}$ with respect to the basis $b_1, \ldots, b_n$ (as in Example 3.82). Show that the map from $K$ to $M_n(F)$ that sends $k$ to $M_{l_k}$ is a ring homomorphism.

(Hint: By Exercise 3.104 above, $\mathrm{End}_F(K)$ is isomorphic to $M_n(F)$ via the map $M$ that sends a linear transform $T$ to its matrix $M_T$ written in the basis $b_1, \ldots, b_n$. Compose the map $l : K \to \mathrm{End}_F(K)$ with the map $M : \mathrm{End}_F(K) \to M_n(F)$.)

4. Show that the ring homomorphism in (3) above is injective. Conclude that $K$ is isomorphic to a subring of $M_n(F)$ using Lemma 2.103 and Theorem 2.110 of Chapter 2.

The image of $K$ under the homomorphism in (3) above is called the regular representation of $K$ in $M_n(F)$.
Exercise 3.106. Let $R$ be a ring containing a field $F$, so $R$ is an $F$-vector space (see Example 3.8 earlier in this chapter). Let $f : R \to R$ be a ring isomorphism that acts as the identity on $F$ (i.e., $f(r) = r$ for all $r \in F$). Show that if $B \subseteq R$ is an $F$-basis of $R$, then the set $f(B) = \{f(b) \mid b \in B\}$ is also an $F$-basis of $R$.
Exercise 3.107. Recall from Exercise 2.123 in Chapter 2 that the set $S$ of all functions from $\mathbb{R}$ to $\mathbb{R}$ is a ring under the operation of pointwise addition and multiplication of functions. Since, by that same exercise, the set of constant functions is a subring of $S$ that is isomorphic to $\mathbb{R}$, $S$ carries the natural structure of an $\mathbb{R}$-vector space. (Explicitly, the vector space structure is given by the map $\mathbb{R} \times S \to S$ that sends $(r, f)$ to the function $s_r f$, where $s_r$ is as in Exercise 2.123. More simply, however, the product of the real number $r$ and the function $f(x)$ is the function, suggestively denoted $r \cdot f$, defined by $(r \cdot f)(x) = r f(x)$.)

1. Which of the following are subspaces of $S$?

(a) $\{f : \mathbb{R} \to \mathbb{R} \mid f(1) = 0\}$

(b) $\{f : \mathbb{R} \to \mathbb{R} \mid f(0) = 1\}$

(c) The set of all constant functions.

(d) $\{f : \mathbb{R} \to \mathbb{R} \mid f(x) \ge 0 \text{ for all } x \in \mathbb{R}\}$

2. Show that the set $\{1, \sin^2(x), \cos^2(x)\}$ is linearly dependent.

3. Is the set $\{e^x, 1, x, x^2, x^3, \ldots\}$ linearly dependent or independent?
Exercise 3.108. Prove Proposition 3.49 without the assumption that $V$ is finite-dimensional. (See the notes on page 224 in Chapter B in the Appendix for hints.)
Exercise 3.109. This exercise shows that Lemma 3.72 holds even for infinite-dimensional spaces. Let $V$ be a vector space over a field $F$ and let $W$ be a subspace. Let $B$ be a basis for $W$. Expand this to a basis $S$ of $V$ (see Proposition 3.49, as well as the remarks on page 224 in Chapter B in the Appendix). Write $T$ for $S \setminus B$ (so $S$ is the disjoint union of $B$ and $T$). Prove that the set (of equivalence classes of vectors) $\{t + W \mid t \in T\}$ is a basis for the quotient space $V/W$.
Exercise 3.110. If $V$ is a finite-dimensional vector space and if $W$ is a subspace of $V$, prove that the dimension of $W$ is no bigger than the dimension of $V$. Now prove that if the dimensions of $W$ and $V$ are equal, then $W = V$.
Exercise 3.111. Let $V$ be a vector space over a field $F$, and let $U$ and $W$ be two subspaces.

1. Show that $U \cap W$ is a subspace of $V$. (Is $U \cup W$ a subspace of $V$?)

2. Denote by $U + W$ the set $\{u + w \mid u \in U \text{ and } w \in W\}$. Show that $U + W$ is a subspace of $V$.

3. Now assume that $V$ is finite-dimensional. The aim of this part is to establish the following:
$$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$$

(a) Let $v_1, \ldots, v_p$ be a basis for $U \cap W$ (so $\dim(U \cap W) = p$). Expand this to a basis $v_1, \ldots, v_p, u_1, \ldots, u_q$ of $U$, and also to a basis $v_1, \ldots, v_p, w_1, \ldots, w_r$ of $W$ (so $\dim(U) = p + q$ and $\dim(W) = p + r$). Show that the set $B = \{v_1, \ldots, v_p, u_1, \ldots, u_q, w_1, \ldots, w_r\}$ spans $U + W$.

(b) Show that the set $B$ is linearly independent. (Hint: Assume that we have the relation $f_1 v_1 + \cdots + f_p v_p + g_1 u_1 + \cdots + g_q u_q + h_1 w_1 + \cdots + h_r w_r = 0$. Rewrite this as $g_1 u_1 + \cdots + g_q u_q = -(f_1 v_1 + \cdots + f_p v_p + h_1 w_1 + \cdots + h_r w_r)$. Observe that the left side is in $U$ while the right is in $W$, so $g_1 u_1 + \cdots + g_q u_q$ must be in $U \cap W$. Hence, $g_1 u_1 + \cdots + g_q u_q = j_1 v_1 + \cdots + j_p v_p$ for some scalars $j_1, \ldots, j_p$. Why does this show that the $g_i$ must be zero? Now proceed to show that the $f_i$ and the $h_i$ must also be zero.)

(c) Conclude that $\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$.

(d) Prove that any two 2-dimensional subspaces of $\mathbb{R}^3$ must intersect in a space of dimension at least 1.
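Before (or after) proving part 3, you may enjoy testing the dimension formula numerically; the following is an informal sketch (assuming Python with numpy; a subspace is represented by a matrix whose columns span it, and dimensions are computed as matrix ranks):

```python
import numpy as np

rng = np.random.default_rng(2)
# Columns of U_mat and W_mat span two subspaces U and W of R^5. They share
# the column `common`, so generically dim(U ∩ W) = 1.
common = rng.random((5, 1))
U_mat = np.hstack([common, rng.random((5, 2))])   # dim U = 3 (generically)
W_mat = np.hstack([common, rng.random((5, 1))])   # dim W = 2 (generically)

dim_U = np.linalg.matrix_rank(U_mat)
dim_W = np.linalg.matrix_rank(W_mat)
dim_sum = np.linalg.matrix_rank(np.hstack([U_mat, W_mat]))   # dim(U + W)

print(dim_U, dim_W, dim_sum)           # 3 2 4 (generically)
assert dim_sum == dim_U + dim_W - 1    # matches dim(U ∩ W) = 1 by construction
```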
Exercise 3.112. Show that the $n$th Bernstein polynomials $B_i^{(n)}(x) = \binom{n}{i} x^i (1 - x)^{n-i}$, $(i = 0, 1, \ldots, n)$ form a basis for $\mathbb{R}_n[x]$ ($n \ge 1$) as follows:

1. Show that $1 = \sum_{i=0}^{n} B_i^{(n)}$.

2. The equation in part 1 above continues to hold if we replace $n$ by $n - 1$ everywhere. (Why?) Make this replacement, multiply throughout by $x$, and derive the relation $x = \sum_{i=0}^{n} (i/n) B_i^{(n)}$. (Hint: you will need to use the relation $\binom{n-1}{i-1} = (i/n)\binom{n}{i}$. Why does this last relation hold?)

3. Similarly, for $k = 2, \ldots, n - 1$, show that
$$x^k = \sum_{i=0}^{n} \frac{i(i-1)\cdots(i-k+1)}{n(n-1)\cdots(n-k+1)}\, B_i^{(n)}.$$

4. Now conclude that the $B_i^{(n)}$ span $\mathbb{R}_n[x]$.

5. Use Proposition 3.51 above to conclude that the $B_i^{(n)}$ form a basis.

These Bernstein polynomials find applications in diverse areas of mathematics, as well as in various applied fields, such as computer graphics! For instance, in advanced calculus, they are useful in showing that any continuous function on an interval $[a, b]$ can be approximated arbitrarily closely by a polynomial function. (This is known as the Weierstrass Approximation Theorem.) In computer graphics, they are used to fit, through a given set of points, a curve that is smooth and has minimal "wiggle," and as well, to provide convenient handles by which the user can then control the shape of this curve.
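If you would like to see parts 1 through 4 in action before proving them, here is an informal numerical sketch (assuming Python with numpy; it builds each $B_i^{(n)}$ as a coefficient list and checks that every monomial $x^k$ equals the displayed combination of Bernstein polynomials, the case $k = 0$ being part 1 and $k = 1$ being part 2):

```python
import numpy as np
from math import comb
from numpy.polynomial import polynomial as P

n = 5

def bernstein(i, n):
    # Coefficients (lowest degree first) of B_i^(n)(x) = C(n,i) x^i (1-x)^(n-i).
    poly = np.array([0.0] * i + [float(comb(n, i))])   # C(n,i) * x^i
    for _ in range(n - i):
        poly = P.polymul(poly, [1.0, -1.0])            # multiply by (1 - x)
    return poly

B = [bernstein(i, n) for i in range(n + 1)]

def falling(a, k):
    # a(a-1)...(a-k+1); the empty product (k = 0) is 1.
    out = 1.0
    for j in range(k):
        out *= a - j
    return out

# x^k = sum over i of [ falling(i,k)/falling(n,k) ] B_i^(n),  k = 0, 1, ..., n.
for k in range(n + 1):
    total = np.zeros(n + 1)
    for i in range(n + 1):
        total[:len(B[i])] += (falling(i, k) / falling(n, k)) * B[i]
    expected = np.zeros(n + 1)
    expected[k] = 1.0
    assert np.allclose(total, expected)
print("1, x, ..., x^n all lie in the span of the Bernstein polynomials")
```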
Notes
Remarks on Example 3.5 It is worth remarking that our definition of scalar multiplication is a very natural one. First, observe that we can consider $\mathbb{R}$ to be a subring of $M_n(\mathbb{R})$ in the following way: the set of matrices of the form $\mathrm{diag}(r)$, as $r$ ranges through $\mathbb{R}$, is essentially the same as $\mathbb{R}$ (see Example 2.106 in Chapter 2). (Observe that this makes the set of diagonal matrices of the form $\mathrm{diag}(r)$ a field in its own right!) Under this identification of $r \in \mathbb{R}$ with $\mathrm{diag}(r)$, what is the most natural way to multiply a scalar $r$ and a vector $(a_{i,j})$? Well, we think of $r$ as $\mathrm{diag}(r)$, and then define $r \cdot (a_{i,j})$ as just the usual product of the two matrices $\mathrm{diag}(r)$ and $(a_{i,j})$. But, as you can check easily, the product of $\mathrm{diag}(r)$ and $(a_{i,j})$ is just $(r a_{i,j})$! It is in this sense that our definition of scalar multiplication is natural: it arises from the rules of matrix multiplication itself. Notice that once $\mathbb{R}$ has been identified with the subring of $M_n(\mathbb{R})$ consisting of the set of matrices of the form $\mathrm{diag}(r)$, this example is just another special case of Example 3.8.
Remarks on Example 3.10 $(V, +)$ remains an abelian group. This does not change when we restrict our attention to the subfield $F$. So we only need to worry about what the new scalar multiplication ought to be. But there is a natural way to multiply any element $f$ of $F$ with any element $v$ of $V$: simply consider $f$ as an element of $K$, and use the multiplication already defined between elements of $K$ and elements of $V$! The scalar multiplication axioms clearly hold: for any $f$ and $g$ in $F$ and any $v$ and $w$ in $V$, we may first think of $f$ and $g$ as elements of $K$, and since the scalar multiplication axioms hold for $V$ viewed as a vector space over $K$, we certainly have $f \cdot (v + w) = f \cdot v + f \cdot w$, $(f + g) \cdot v = f \cdot v + g \cdot v$, $(fg) \cdot v = f \cdot (g \cdot v)$, and $1 \cdot v = v$.
Remarks on Example 3.34 This example is a bit tricky. Why are the $e_i$ not a basis? They are certainly linearly independent, since if $\sum_{i=0}^{n} c_i e_i = 0$ for some scalars $c_i \in F$, then the tuple $(c_0, c_1, \ldots, c_n, 0, 0, \ldots)$ must be zero, but a tuple is zero if and only if each of its components is zero. Thus, each of $c_0, c_1, \ldots, c_n$ must be zero, proving linear independence. However, the $e_i$ do not span $\prod_{0}^{\infty} F$ (the space of all infinite tuples), contrary to what one might expect. To understand this, let us look at something that has been implicit all along in the definition of linear combination. The $e_i$ would span $\prod_{0}^{\infty} F$ if every vector in $\prod_{0}^{\infty} F$ could be written as a linear combination of elements of the set $\{e_0, e_1, e_2, \ldots\}$. Now notice that whenever we consider linear combinations, we only consider sums of a finite number of terms. Hence, a linear combination of elements of the set $\{e_0, e_1, e_2, \ldots\}$ looks like $c_{i_1} e_{i_1} + c_{i_2} e_{i_2} + \cdots + c_{i_n} e_{i_n}$ for some finite $n$. It is clear that any vector that is expressible in such a manner will have only finitely many components that are nonzero. (These will be at most the ones at the slots $i_1, i_2, \ldots, i_n$; all other components will be zero.) Consequently, the vectors in $\prod_{0}^{\infty} F$ in which infinitely many components are nonzero (for example, the vector $(1, 1, 1, \ldots)$), cannot be expressed as linear combinations of the $e_i$. On the other hand, see Exercise 3.63.2.
It is worth pointing out that infinite sums have no algebraic meaning. Addition is, to begin with, a binary operation, that is, it is a rule that assigns to $a_1$ and $a_2$ the element $a_1 + a_2$. This can be extended inductively to a finite number of $a_i$: for instance, the sum $a_1 + a_2 + a_3 + a_4 + a_5$ is defined as $(((a_1 + a_2) + a_3) + a_4) + a_5$. (In other words, we first determine $a_1 + a_2$, then we add $a_3$ to this, then $a_4$ to what we get from adding $a_3$, and then finally $a_5$ to what we got at the previous step.) While this inductive definition makes sense for a finite number of terms, it makes no sense for an infinite number of terms. To interpret infinite sums of elements, we really need to have a notion of convergence (such as the ones you may have seen in a course on real analysis). Such notions may not exist for arbitrary fields.
Remarks on the proof of Theorem 3.70 The reason why the proofs that $(V/W, +)$ and $(R/I, +)$ are abelian groups are so similar is that what we are essentially proving in both is that if $(G, +)$ is an abelian group and if $H$ is a subgroup, then the set of equivalence classes of $G$ under the relation $g_1 \sim g_2$ if and only if $g_1 - g_2 \in H$, with the operation $[g_1] + [g_2] = [g_1 + g_2]$, is indeed an abelian group in its own right! We will take this up in Chapter 4 ahead.
Remarks on linear transformations $f : V \to X$ when $V$ or $X$ are not necessarily finite-dimensional Similar considerations will apply: we let $S = \{b_\alpha \mid \alpha \in B\}$ be a basis for $V$, where $B$ is some index set. Let $\{w_\alpha \mid \alpha \in B\}$ be arbitrary vectors in $X$. Every vector in $V$ can be uniquely written as $r_1 b_{\alpha_1} + \cdots + r_k b_{\alpha_k}$, where the $r_i$ are scalars from the field $F$ and $b_{\alpha_1}, \ldots, b_{\alpha_k}$ is some finite subset of $S$. Then, just as in the finite-dimensional case, the function $f : V \to X$ that sends $r_1 b_{\alpha_1} + \cdots + r_k b_{\alpha_k}$ to $r_1 w_{\alpha_1} + \cdots + r_k w_{\alpha_k}$ is a linear transformation, and all linear transformations from $V$ to $X$ are given in this way. Let $T = \{c_\beta \mid \beta \in C\}$ be a given basis of $X$ (again, $C$ is some index set). The matrix representation of $f$ with respect to the basis $S$ of $V$ and $T$ of $X$ is a $|C| \times |B|$ matrix (where $|B|$ and $|C|$ are the cardinalities of the possibly infinite sets $B$ and $C$), with the rows indexed by the basis vectors in $T$ and the columns indexed by the basis vectors in $S$. The column with index $\alpha$ represents the image of $b_\alpha$ under $f$, written as a column vector, whose entry in the row indexed by $\beta$ is the coefficient of $c_\beta$ (in the expression of $f(b_\alpha)$ as a linear combination of the $c_\beta$). Note that since any vector is always expressed as a finite linear combination of the basis vectors in $T$ (see the remarks on Example 3.34 on page 154), each column of the matrix will have only finitely many nonzero entries. Conversely, given any $|C| \times |B|$ matrix with entries in $F$ in which each column has only finitely many nonzero entries, one can define a linear transformation $f : V \to X$ exactly as in Example 3.82, with the column indexed by $\alpha$ corresponding to $f(b_\alpha)$.
Chapter 4

Groups

4.1 Groups: Definition and Examples
Of all the algebraic objects that we have considered in this course (groups, rings, fields, and vector spaces), groups are technically the most elementary: they are sets with just one binary operation, and there are just three axioms that govern them: (i) the binary operation should be associative, (ii) there should be an identity for this operation, and (iii) every element should have an inverse with respect to this operation (see Definition 2.2 in Chapter 2). Yet, we have reserved our study of groups to the last and have started with rings instead. The primary reason for this is that even if they are technically more complicated than groups, rings are a much more familiar object to most students who are seeing abstract algebra for the first time: after all, the number systems that we have grown up with and are so intimate with, namely the integers, the rationals, the reals, and the complexes, are all examples of rings. Rings are thus, for many, a natural entry point into algebra. In the same vein, examples like $\mathbb{R}^2$ and $\mathbb{R}^3$ make vector spaces also a familiar object, and their study is therefore a natural candidate to follow our study of rings.
However, let neither their elementary definition nor the location of this chapter in this book lull you into underestimating the importance of groups: groups are vitally important in mathematics, and they show up in just about every nook and corner of the subject. Although this may not be obvious from the examples that we have seen so far (which have all been groups of the form $(R, +)$, where $R$ is a ring and $+$ is the addition operation on the ring, or of the form $(R^*, \cdot)$, the set of invertible elements of a ring $R$ under multiplication), groups are objects by which one measures symmetry in mathematical objects. Of course, what a mathematician means by symmetry is something very abstract, but it is merely a generalization (albeit a vast one) of what people mean by symmetry in day-to-day contexts. Since symmetry is so central to mathematics (one view would have it that all of mathematics is a study of symmetry!), it should come as no surprise that groups are central to mathematics.
Here is what a mathematician would mean by symmetry. Suppose you have a set, and suppose the set has some structure on it. To say that the set has some structure is merely to say that it has some specific feature that we are focusing on for the moment: the set could have lots of other features as well, but we will ignore those temporarily. A symmetry of a set with a given structure is merely a bijective correspondence from the set to itself which preserves the structure, i.e., a one-to-one onto map from the set to itself which preserves the feature that we are considering. The set of all such one-to-one and onto maps whose inverse also preserves the feature that we are considering constitute a group, which is called the symmetry group of the set for the given structure, and both the size and the nature of this group then quantify the kind of symmetry that the set with structure has.

Now this is too advanced for a first reading, so we will postpone consideration of sets with structure and their symmetry groups to the notes at the end of the chapter (see page 206). But first, let us repeat the definition of groups from Chapter 2, just so as to have the definition within this chapter:
Definition 4.1. (Repeat of Definition 2.2.) A group is a set $S$ with a binary operation $* : S \times S \to S$ such that

1. $*$ is associative, i.e., $a * (b * c) = (a * b) * c$ for all $a$, $b$, and $c$ in $S$,

2. $S$ has an identity element with respect to $*$, i.e., an element $id$ such that $a * id = id * a = a$ for all $a$ in $S$, and

3. every element of $S$ has an inverse with respect to $*$, i.e., for every element $a$ in $S$ there exists an element $a^{-1}$ such that $a * a^{-1} = a^{-1} * a = id$.

To emphasize that there are two ingredients in this definition (the set $S$ and the operation $*$ with these special properties), the group is sometimes written as $(S, *)$, and $S$ is often referred to as a group with respect to the operation $*$.
Recall from Chapter 2 (Definition 2.3) that an abelian group is one in which the group operation is commutative, i.e., $a * b = b * a$ for all $a$ and $b$ in the group.
Here are some examples of groups other than those that appear as the additive group of a ring or the multiplicative group of a field:
4.1.1 Symmetric groups
Example 4.2. Consider the set $\{1, 2, 3\}$, and consider one-to-one and onto maps from this set to itself: in more common language, such maps are known as permutations of $\{1, 2, 3\}$. Let us, for example, write $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}$ for the permutation that sends 1 to 2, 2 to 3, and 3 to 1 (so we write the image of an element under the element; we will call this the stack notation). Then it is easy to see that there are exactly six permutations, and they are listed in the following table (where we have given a name to each permutation):

$$id = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix} \quad r_1 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} \quad r_2 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$$
$$f_1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix} \quad f_2 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} \quad f_3 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$$
Now let us see how these permutations compose. You will observe that $r_1 \circ r_1$ takes 1 to 2 under the first application of $r_1$ and then 2 to 3 under the second application of $r_1$. Likewise, $r_1 \circ r_1$ takes 2 to 3 and then to 1, and similarly, 3 to 1 and then to 2. The net result: $r_1 \circ r_1$ is the permutation $\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$, that is, $r_1 \circ r_1 = r_2$!
Now play with $r_1 \circ f_1$ and compare it with $f_1 \circ r_1$. When computing $r_1 \circ f_1$, for instance, we observe that 1 goes to 1 under $f_1$, and then 1 goes to 2 under $r_1$. Computing fully, we find that $r_1 \circ f_1$ is the permutation $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$, that is, $r_1 \circ f_1 = f_3$. Computing $f_1 \circ r_1$ similarly, we find $f_1 \circ r_1 = f_2$!
Computing fully, we find the following table that describes how the six permutations of $\{1, 2, 3\}$ compose:

$$\begin{array}{c|cccccc}
\circ & id & r_1 & r_2 & f_1 & f_2 & f_3 \\ \hline
id & id & r_1 & r_2 & f_1 & f_2 & f_3 \\
r_1 & r_1 & r_2 & id & f_3 & f_1 & f_2 \\
r_2 & r_2 & id & r_1 & f_2 & f_3 & f_1 \\
f_1 & f_1 & f_2 & f_3 & id & r_1 & r_2 \\
f_2 & f_2 & f_3 & f_1 & r_2 & id & r_1 \\
f_3 & f_3 & f_1 & f_2 & r_1 & r_2 & id
\end{array}$$
Now observe the following: (i) The composition of two permutations of $\{1, 2, 3\}$ is another permutation of $\{1, 2, 3\}$: we computed this out explicitly above, but we already would have known this from an earlier exposure to functions: if $f : S \to S$ and $g : S \to S$ are functions (here $S$ is some set) and if both $f$ and $g$ are bijective, then both compositions $(g \circ f)$ and $(f \circ g)$ from $S$ to $S$ are also bijective. (ii) Composition of functions is an associative operation: this too would be familiar to us from our earlier exposure to functions: if $f : S \to S$, $g : S \to S$, and $h : S \to S$ are three functions on some set $S$, then for any $s \in S$, $((f \circ g) \circ h)(s) = (f \circ g)(h(s)) = f(g(h(s)))$, while $(f \circ (g \circ h))(s) = f((g \circ h)(s)) = f(g(h(s)))$, so indeed $(f \circ g) \circ h = f \circ (g \circ h)$. (iii) The permutation $id$ acts as the identity element: this is clear from the first row and the first column of the table above. And finally, (iv) every permutation of $\{1, 2, 3\}$ has an inverse: $r_1 \circ r_2 = r_2 \circ r_1 = id$, $id \circ id = id$, $f_1 \circ f_1 = id$, $f_2 \circ f_2 = id$ and $f_3 \circ f_3 = id$. Hence, the set of permutations of $\{1, 2, 3\}$ forms a group under composition. We denote this group as $S_3$, and call it the symmetric group on three elements. ($S_3$ can be interpreted as the set of symmetries of $\{1, 2, 3\}$ with the trivial structure: see Example 4.88 in the notes at the end of the chapter.)
Observe something about this group: it is not a commutative group! For instance, as we observed above, $r_1 \circ f_1 = f_3$ while $f_1 \circ r_1 = f_2$. We say that the group is nonabelian.
From now on we will suppress the $\circ$ symbol, and simply write $fg$ for the composition $f \circ g$. Not only is there less writing involved, but it is notation that we are used to: it is the notation we use for multiplication. Continuing the analogy, we write $f \circ f$ as $f^2$, and so on, and we sometimes write $1$ for the identity (see Remark 4.14 ahead for more on the notation used for the identity and the group operation). In this notation, note that $r_1^3 = r_2^3 = 1$, $f_1^2 = f_2^2 = f_3^2 = 1$.

The table, such as the one above, that describes how pairs of elements in a group compose under the given binary operation is called the group table for the group.
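As a computational aside (not from the text), permutations in stack notation are easy to model in a few lines of Python: represent each one as the tuple of images of 1, 2, 3, and compose by substitution. The sketch below reproduces a few entries of the group table above:

```python
# Each permutation is a tuple (image of 1, image of 2, image of 3).
id_, r1, r2 = (1, 2, 3), (2, 3, 1), (3, 1, 2)
f1, f2, f3 = (1, 3, 2), (3, 2, 1), (2, 1, 3)

def compose(f, g):
    # (f o g)(x) = f(g(x)); tuples are 0-indexed, elements are 1, 2, 3.
    return tuple(f[g[x - 1] - 1] for x in (1, 2, 3))

print(compose(r1, r1) == r2)   # True: r1 r1 = r2
print(compose(r1, f1) == f3)   # True: r1 f1 = f3
print(compose(f1, r1) == f2)   # True: f1 r1 = f2, so the group is nonabelian
```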
Exercise 4.2.1. Use the group table to show that every element of $S_3$ can be written as $r_1^i f_1^j$ for uniquely determined integers $i \in \{0, 1, 2\}$ and $j \in \{0, 1\}$.
Example 4.3. Just as we considered the set of permutations of the set $\{1, 2, 3\}$ above, we can consider, for any integer $n \ge 1$, the permutations of the set $\{1, 2, \ldots, n\}$. This set forms a group under composition, just as $S_2$ and $S_3$ did above.
Definition 4.3.1. The set of permutations of $\{1, 2, \ldots, n\}$, which forms a group under composition, is denoted $S_n$ and is called the symmetric group on $n$ elements.
Exercise 4.3.1. Write down the set of permutations of the set $\{1, 2\}$ and construct the table that describes how the permutations compose. Verify that the set of permutations of $\{1, 2\}$ forms a group. Is it abelian? This group is denoted $S_2$, and called the symmetric group on two elements.
Exercise 4.3.2. Compare the group table of $S_2$ that you get in the exercise above with the table for $(\mathbb{Z}/2\mathbb{Z}, +)$ on page 36. What similarities do you see?
Exercise 4.3.3. Prove that $S_n$ has $n!$ elements.
Exercise 4.3.4. Find an element $g \in S_n$ such that $g^n = 1$ but $g^t \neq 1$ for any positive integer $t < n$ (see Remark 4.14 ahead on notation for the identity element).
Here is an alternative notation that is used for a special class of permutations, which we will call the cycle notation. Working for the sake of concreteness in $\{1, 2, 3, 4, 5\}$, consider the permutation that sends 1 to 3, 3 to 4, and 4 back to 1, and acts as the identity on the remaining elements 2 and 5. (This is the permutation we have denoted up to now as $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 4 & 1 & 5 \end{pmatrix}$.) Notice the cyclic nature of this permutation: it moves 1 to 3 to 4 back to 1, and leaves 2 and 5 untouched. We will use the notation $(1, 3, 4)$ for this special permutation and call it a 3-cycle. In general, if $a_1, \ldots, a_d$ are distinct elements of the set $\{1, 2, \ldots, n\}$ (so $1 \le d \le n$), we will denote by $(a_1, a_2, \ldots, a_d)$ the permutation that sends $a_1$ to $a_2$, $a_2$ to $a_3$, \ldots, $a_{d-1}$ to $a_d$, $a_d$ back to $a_1$, and acts as the identity on all elements of $\{1, 2, \ldots, n\}$ other than these $a_i$. We will refer to $(a_1, a_2, \ldots, a_d)$ as a $d$-cycle or a cycle of length $d$. A 2-cycle $(a_1, a_2)$ is known as a transposition, since it only swaps $a_1$ and $a_2$ and leaves all other elements unchanged. Of course a 1-cycle $(a_1)$ is really just the identity element since it sends $a_1$ to $a_1$ and acts as the identity on all other elements of $\{1, 2, \ldots, n\}$.
Notice something about cycles: the cycle $(1, 3, 4)$ is the same as $(3, 4, 1)$, as they both clearly represent the same permutation. More generally, the cycle $(a_1, a_2, \ldots, a_d)$ is the same as $(a_2, a_3, \ldots, a_d, a_1)$, which is the same as $(a_3, a_4, \ldots, a_d, a_1, a_2)$, etc. We will refer to these different representations of the same cycle as internal cyclic rearrangements.
Since a $d$-cycle is just a special case of a permutation, it makes perfect sense to compose a $d$-cycle and an $e$-cycle: it is just the composition of two (albeit special) permutations. For instance, in any $S_n$ (for $n \ge 3$), we have the relation $(1, 3)(1, 2) = (1, 2, 3)$ (check!). (We will see shortly, in Corollary 4.3.1 ahead, that every permutation in $S_n$ can be factored into transpositions.)
Exercise 4.3.5. Write the 4-cycle $(1, 2, 3, 4)$ of $S_n$ (here $n$ is at least 4) as a product of three transpositions.
Exercise 4.3.6. Show that any $k$-cycle in $S_n$ (here $n \ge k \ge 2$) can be written as the product of $k - 1$ transpositions.
Two cycles $(a_1, \ldots, a_d)$ and $(b_1, \ldots, b_e)$ are said to be disjoint if none of the integers $a_1, \ldots, a_d$ appear among the integers $b_1, \ldots, b_e$ and none of the integers $b_1, \ldots, b_e$ appear among the integers $a_1, \ldots, a_d$. For example, in $S_6$, the cycles $s = (1, 4, 5)$ and $t = (2, 3)$ are disjoint. Notice something with this pair of permutations: $s$ and $t$ commute! Let us rewrite $s$ and $t$ in the stack notation and compute:
$$st = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 2 & 3 & 5 & 1 & 6 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 2 & 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 3 & 2 & 5 & 1 & 6 \end{pmatrix}$$
$$ts = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 2 & 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 2 & 3 & 5 & 1 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 3 & 2 & 5 & 1 & 6 \end{pmatrix}$$

This computation is of course very explicit, but the intuitive idea behind why $s$ and $t$ commute is the following: $s$ only moves the elements 1, 4, and 5 among themselves, and in particular, it leaves the elements 2 and 3 untouched. On the other hand, $t$ swaps the elements 2 and 3, and leaves the elements 1, 4 and 5 untouched. Since $s$ and $t$ operate on disjoint sets of elements, the action of $s$ is not affected by $t$ and the action of $t$ is not affected by $s$. In particular, it makes no difference whether we perform $s$ first and then $t$ or the other way around.
Essentially these same ideas lead to the following:
Lemma 4.3.1. Let $s$ and $t$ be any two disjoint cycles in $S_n$. Then $s$ and $t$ commute.
Exercise 4.3.7. Prove this assertion carefully by writing $s = (a_1, \ldots, a_d)$ and $t = (b_1, \ldots, b_e)$ for disjoint integers $a_1, \ldots, a_d, b_1, \ldots, b_e$, and writing out the effect of both $st$ and $ts$ on each integer $1, \ldots, n$. (See the notes on page 210 for some hints.)
Now let us consider another feature of permutations: it turns out that any permutation can be decomposed into a product of disjoint cycles! To take an example, consider the permutation $s = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 2 & 1 & 6 & 4 & 5 \end{pmatrix}$. Let us take the element 1 and follow it under repeated action of $s$: 1 goes to 3 which goes back to 1. Thus, the effect of $s$ on the subset $\{1, 3\}$ is to act as a swap, or a transposition. Now pick another element not equal to either 1 or 3, say 2, and follow it under repeated action of $s$: 2 stays untouched. Thus, the effect of $s$ on the subset $\{2\}$ is to act as the identity. So now, pick an element not equal to either 1 or 3 or 2, say 4: we find 4 goes to 6 which goes to 5 which then goes back to 4. Hence, the effect of $s$ on the subset $\{4, 5, 6\}$ is to act as the 3-cycle $(4, 6, 5)$. It is now easy to see, either by explicitly computing, or by using the same intuition as we did above for why disjoint cycles commute, that $s = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 2 & 1 & 6 & 4 & 5 \end{pmatrix} = (4, 6, 5)(2)(1, 3)$. (Since $(2)$ is just the identity permutation, it is typically omitted, and this product is written as $(4, 6, 5)(1, 3)$.)

Notice that since disjoint cycles commute, $(4, 6, 5)(1, 3)$ is the same as $(1, 3)(4, 6, 5)$. Notice, too, that had we started with, for instance, 6 and followed it around, and then picked 3 and followed it around, we would have found $s = (3, 1)(6, 5, 4)$. Any other decomposition of $s$ into disjoint cycles must be related to the first decomposition $s = (4, 6, 5)(1, 3)$ in a similar manner as these two above: either the cycles could have been swapped, or, internally, a cycle could have been rearranged cyclically (such as $(6, 5, 4)$ instead of $(4, 6, 5)$). This is because the product of disjoint cycles simply follows, one by one, the various elements of $\{1, 2, 3, 4, 5, 6\}$ under repeated action of $s$, and no matter in which manner the cycles are written, the repeated action of $s$ must be the same.
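The "follow each element around" procedure just described is easy to mechanize. Here is an informal Python sketch (a permutation is given as a dict sending each element to its image; the variable names are ours):

```python
def disjoint_cycles(perm):
    # Decompose a permutation (dict: element -> image) into disjoint cycles
    # by following each not-yet-visited element under repeated application.
    seen, cycles = set(), []
    for start in perm:
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:          # 1-cycles are the identity; omit them
            cycles.append(tuple(cycle))
    return cycles

s = {1: 3, 2: 2, 3: 1, 4: 6, 5: 4, 6: 5}
print(disjoint_cycles(s))           # [(1, 3), (4, 6, 5)]
```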
These same ideas apply to arbitrary permutations, and we have the following (whose proof we omit because it is somewhat tedious to write in full generality):
Proposition 4.3.1. Every permutation in $S_n$ factors into a product of disjoint cycles. Two factorizations can only differ in the order in which the cycles appear, or, within any one cycle, by an internal cyclic rearrangement.
Corollary 4.3.1. Every permutation in $S_n$ can be written as a product of transpositions.
Proof. This is just a combination of Proposition 4.3.1 and Exercise 4.3.6 above, which establishes that every cycle can be written as a product of transpositions. $\Box$
Remark 4.3.1. Unlike the factorization of a permutation into disjoint cycles, there is no uniqueness to the factorization into transpositions. (For instance, in addition to the factorization $(1, 3)(1, 2) = (1, 2, 3)$ we had before, we also find $(1, 2)(3, 2) = (1, 2, 3)$.) But something a little weaker than uniqueness holds even for factorizations into transpositions: if a permutation $s$ has two factorizations $s = d_1 d_2 \cdots d_l$ and $s = e_1 e_2 \cdots e_m$ where the $d_i$ and $e_j$ are transpositions, then either $l$ and $m$ will both be even or both be odd! (The proof is slightly complicated, and we will omit it since this is an introduction to the subject.) This allows us to define unambiguously the parity of a permutation: we call a permutation even if the number of transpositions that appear in any factorization into transpositions is even, and likewise, we call it odd if this number is odd.
4.1.2 Dihedral groups
Example 4.4. Consider a piece of cardboard in the shape of an equilateral triangle. Now consider all operations we can perform on the piece of cardboard that do not shrink, stretch, or in any way distort the triangle, but are such that after we perform the operation, nobody can tell that we did anything to the triangle! To help determine what such operations could be, pretend that the piece of cardboard has been placed at a fixed location on a table, and the location has been marked by lines drawn under the edges of the cardboard. Also, label the points on the table that lie directly under the vertices of the triangle as $a$, $b$, and $c$ respectively. After we have done our (yet to be determined!) operation on the cardboard, the triangle should stay at the same location; otherwise it would be obvious that somebody has done something to the piece of cardboard. This means that after our operation, each vertex of the triangle must somehow end up once again on top of one of the three points $a$, $b$, and $c$ marked on the table.
[Figure: the equilateral triangle in its marked location, with the points on the table beneath its vertices labeled $a$, $b$, and $c$.]
We will refer to our operations as symmetries of the equilateral triangle. We will also refer to each operation as a rigid motion, because, by not distorting the cardboard, it preserves its rigidity. Observe that since we are not allowed to distort the triangle, once we know where the vertices have gone under our operation, we would immediately know where every other point on the triangle would have gone. For, if a point $P$ is at a distance $x$ from a vertex $A$, a distance $y$ from a vertex $B$ and a distance $z$ from the third vertex $C$, then the image of $P$ must be at a distance $x$ from the image of $A$, a distance $y$ from the image of $B$ and a distance $z$ from the image of $C$, and this fixes the location of the image of $P$. (Actually, more is true: it is sufficient to know where any two vertices have gone under our operation to know where every point has gone: see Remark 4.5.1 ahead if you are interested. But of course, if you know where two vertices have gone, then you automatically know where the third vertex has gone.) Hence, it is enough to study the possible rearrangements, or permutations, of the vertices of the triangle to determine our operations. A key sticking point is that while every symmetry of the triangle corresponds to a permutation of the vertices, it is conceivable that not every permutation of the vertices comes from a symmetry. As it turns out, this is not the case, as we will see below.
Let us, for example, write $\begin{pmatrix} a & b & c \\ b & c & a \end{pmatrix}$ for the permutation of the vertices that takes whichever vertex was on the point on the table marked $a$ and moves it to the point marked $b$, whichever vertex was on the point on the table marked $b$ and moves it to the point marked $c$, and whichever vertex was on the point on the table marked $c$ and moves it to the point marked $a$. Notice that since there are three vertices, there are only six permutations to consider. With this notation, let us consider each of the six permutations in turn, and show that they can be realized as a symmetry of the triangle:
1. $id = \begin{pmatrix} a & b & c \\ a & b & c \end{pmatrix}$. This of course corresponds to doing nothing to the triangle. This is a valid operation of the sort that we are seeking: it is clearly a rigid motion of the triangle (there is no distortion of the cardboard), and after we have performed this operation, we would not be able to tell whether anybody has disturbed the triangle or not!

2. $\rho = \begin{pmatrix} a & b & c \\ b & c & a \end{pmatrix}$. This can be realized by rotating the triangle counter-clockwise by $120^\circ$. This is a rigid motion (there is no stretching or other distortion involved), and of course, after the rotation is over, we would not be able to tell if the cardboard has been moved.

3. $\rho^2 = \begin{pmatrix} a & b & c \\ c & a & b \end{pmatrix}$. This can be realized as a counter-clockwise rotation by $240^\circ$, or, what is the same thing, a clockwise rotation by $120^\circ$. Notice that if we were to form the composition $\rho \circ \rho$, we would arrive at this permutation, and it is for this reason that we have denoted this permutation by $\rho^2$.

4. $\sigma_a = \begin{pmatrix} a & b & c \\ a & c & b \end{pmatrix}$. This can be realized by flipping the triangle about the line joining the point $a$ and the midpoint of the opposite side $bc$. This too is a rigid motion, and after the flip is over, we would not be able to tell if the cardboard has been moved.

5. $\sigma_b = \begin{pmatrix} a & b & c \\ c & b & a \end{pmatrix}$. This can be realized by flipping the triangle about the line joining the point $b$ and the midpoint of the opposite side $ac$. Like $\sigma_a$, this too is a rigid motion, and after the flip is over, we would not be able to tell if the cardboard has been moved.

6. $\sigma_c = \begin{pmatrix} a & b & c \\ b & a & c \end{pmatrix}$. This is just like $\sigma_a$ and $\sigma_b$, and can be realized by flipping the triangle about the line joining the point $c$ and the midpoint of the opposite side $ab$.
Thus, we have obtained all six permutations as symmetries of the triangle! Notice that these six symmetries compose as follows:
$$\begin{array}{c|cccccc}
\circ & id & \rho & \rho^2 & \sigma_a & \sigma_b & \sigma_c \\ \hline
id & id & \rho & \rho^2 & \sigma_a & \sigma_b & \sigma_c \\
\rho & \rho & \rho^2 & id & \sigma_c & \sigma_a & \sigma_b \\
\rho^2 & \rho^2 & id & \rho & \sigma_b & \sigma_c & \sigma_a \\
\sigma_a & \sigma_a & \sigma_b & \sigma_c & id & \rho & \rho^2 \\
\sigma_b & \sigma_b & \sigma_c & \sigma_a & \rho^2 & id & \rho \\
\sigma_c & \sigma_c & \sigma_a & \sigma_b & \rho & \rho^2 & id
\end{array}$$
Notice that we get a group: the composition of any two symmetries is a symmetry, composition is associative since this is always true for composition of functions, the element $id$ acts as the identity, and it is clear from the relations $\rho \rho^2 = \rho^2 \rho = id$, $\sigma_a \sigma_a = \sigma_b \sigma_b = \sigma_c \sigma_c = id$ that every element has an inverse. This group is called the dihedral group of index 3 and is denoted $D_3$. (Notice the similarity between this group and the group $S_3$ of Example 4.2. We will take this up again when we consider isomorphisms later in this chapter.)
See Example 4.89 in the notes at the end of the chapter, where $D_3$ is interpreted as the group of symmetries of the equilateral triangle with the rigid structure.
Example 4.5. This example is similar in spirit to the previous example. We consider a piece of cardboard in the shape of a square. We wish to determine all operations we can perform on the piece of cardboard that do not shrink, stretch, or in any way distort the square, but are such that after we perform the operation, nobody can tell that we did anything to the square! To help determine what such operations could be, pretend that the piece of cardboard has been placed at a fixed location on a table, and the location has been marked by lines drawn under the edges of the cardboard. Also, label the points on the table that lie directly under the vertices of the square as $a$, $b$, $c$, and $d$ respectively. After we have done our (yet to be determined!) operation on the cardboard, the square should stay at the same location; otherwise it would be obvious that somebody has done something to the piece of cardboard.
[Figure: the square in its marked location, with the points on the table beneath its vertices labeled $a$, $b$, $c$, and $d$.]
We will refer to our operations as symmetries of the square and will refer to each operation as a rigid motion. Just as in the previous example, each vertex of the square must somehow end up once again on top of one of the four points $a$, $b$, $c$, and $d$ marked on the table after the application of a symmetry. As before, the preservation of the rigidity of the square ensures that once we know where the vertices have gone under the application of a symmetry, we would immediately know where every other point on the square would have gone. (In fact, it is enough to know where two adjacent vertices have gone; see Remark 4.5.1 ahead.) Hence, it is enough to study the possible permutations of the vertices of the square to determine its symmetries. Unlike the previous example, however, it is not true that every permutation of the vertices comes from a symmetry.
As before, we write, for example, $\begin{pmatrix} a & b & c & d \\ b & c & d & a \end{pmatrix}$ for the permutation of the vertices that takes whichever vertex was on the point on the table marked $a$ and moves it to the point marked $b$, the vertex on $b$ to $c$, the vertex on $c$ to $d$, and the vertex on $d$ to $a$. Notice that since there are four vertices, there are $4! = 24$ permutations to consider (see Exercise 4.3.3 above). With this notation, let us see which of these 24 permutations can be realized as a symmetry of the square:
1. $id = \begin{pmatrix} a & b & c & d \\ a & b & c & d \end{pmatrix}$. This of course corresponds to doing nothing to the square. As with the operation of the previous example that does nothing to the equilateral triangle, this operation on the square is a rigid motion of the square, and after we have performed this operation, we would not be able to tell whether anybody has disturbed the square or not.

2. $\rho = \begin{pmatrix} a & b & c & d \\ b & c & d & a \end{pmatrix}$. This can be effected by rotating the square counter-clockwise by $90^\circ$. This too is a rigid motion, and after the rotation is over, we would not be able to tell if the cardboard has been moved.

3. $\rho^2 = \begin{pmatrix} a & b & c & d \\ c & d & a & b \end{pmatrix}$. This is effected by rotating the square counter-clockwise (or clockwise) by $180^\circ$, and corresponds to the composition $\rho \circ \rho$. Hence the name $\rho^2$ for this symmetry.

4. $\rho^3 = \begin{pmatrix} a & b & c & d \\ d & a & b & c \end{pmatrix}$. This is effected by rotating the square counter-clockwise by $270^\circ$ (or clockwise by $90^\circ$), and corresponds to the composition $\rho \circ \rho \circ \rho$.

5. $\sigma_H = \begin{pmatrix} a & b & c & d \\ b & a & d & c \end{pmatrix}$. This corresponds to flipping the square about its horizontal axis (i.e., the line joining the midpoints of the sides $ab$ and $cd$).

6. $\sigma_V = \begin{pmatrix} a & b & c & d \\ d & c & b & a \end{pmatrix}$. This corresponds to flipping the square about its vertical axis (i.e., the line joining the midpoints of the sides $ad$ and $bc$).

7. $\sigma_{ac} = \begin{pmatrix} a & b & c & d \\ a & d & c & b \end{pmatrix}$. This corresponds to flipping the square about the top-left to bottom-right diagonal, i.e., the line joining the points $a$ and $c$.

8. $\sigma_{bd} = \begin{pmatrix} a & b & c & d \\ c & b & a & d \end{pmatrix}$. This corresponds to flipping the square about the bottom-left to top-right diagonal, i.e., the line joining the points $b$ and $d$.
One can check that the remaining permutations of the vertices cannot be realized by rigid motions. For instance, consider the permutation $\begin{pmatrix} a & b & c & d \\ b & c & a & d \end{pmatrix}$. If, for example, vertex $A$ of the square lies on the point $a$ and vertex $D$ of the square lies on $d$, then this permutation fixes $D$ but moves the vertex $A$ so that it lies on $b$. The segment $AD$ therefore is converted from a side of the square to a diagonal of the square, and is hence lengthened; this is clearly not a rigid motion!

Dispensing with all remaining permutations similarly, we find that our square, with its structure of being a rigid object lying at the given location, has just these eight symmetries above.
Remark 4.5.1. Here is another way of seeing that there are only eight symmetries: Observe that once we know where a pair of adjacent vertices have gone under the application of a symmetry, we immediately know where every other point on the square has gone, because of the rigidity. This is because if a point $P$ of the square is at a distance $x$ from a vertex $A$ and a distance $y$ from the adjacent vertex $B$, then the image of $P$ must be at a distance $x$ from the image of $A$ and a distance $y$ from the image of $B$. There is a unique point on or inside the four lines drawn on the table that satisfies this property, and this point will be the image of $P$.

Now consider a pair of adjacent vertices $A$ and $B$. After the application of a symmetry, $A$ can end up in one of four possible locations marked $a$, $b$, $c$, or $d$ that correspond to the four vertices of the square. Moreover, for the cardboard to not get distorted, $B$ must end up at one of these locations that is adjacent to $A$. Hence, once a symmetry has placed $A$ in one of four locations, there are only two possible locations where the symmetry could place $B$: either adjacent to $A$ in the clockwise direction or adjacent to $A$ in the counter-clockwise direction. Since the symmetries of the square are determined by where they send the adjacent vertices $A$ and $B$, we find that there are $4 \cdot 2 = 8$ potential symmetries. On the other hand, we have explicitly exhibited eight distinct symmetries already. Hence the set of symmetries of the square consists precisely of these eight symmetries above.
It is a fun exercise for you to prove that these symmetries form a group
(see below). Notice that this group is noncommutative:
H
=
H

3
for
example.
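If you like to experiment on a computer, here is a small optional sketch in Python (the encoding of symmetries as dictionaries, and the convention that a product means "apply the right factor first", are our own choices for this sketch, not notation from the text). It lets you verify relations such as $\rho\sigma_H = \sigma_H\rho^3$:

    # Each symmetry is recorded as a dictionary telling where the vertex
    # currently on each marked point ends up.
    rho     = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'a'}   # rotation by 90 degrees
    sigma_H = {'a': 'b', 'b': 'a', 'c': 'd', 'd': 'c'}   # horizontal flip

    def compose(s, t):
        # The symmetry "first apply t, then s" (ordinary function composition).
        return {x: s[t[x]] for x in 'abcd'}

    rho3 = compose(rho, compose(rho, rho))                   # rho cubed
    print(compose(rho, sigma_H) == compose(sigma_H, rho3))   # True
    print(compose(rho, sigma_H) == compose(sigma_H, rho))    # False: noncommutative!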
Exercise 4.5.1. Create a table that shows how these symmetries
compose and argue, as in Example 4.4 above, why this table shows
that the set of symmetries forms a group.

This group is called the dihedral group of index 4, and is denoted $D_4$.
Exercise 4.5.2. Use the group table of $D_4$ to show that every
element of $D_4$ can be expressed as $\rho^i \sigma_H^j$ for uniquely determined
integers $i \in \{0, 1, 2, 3\}$ and $j \in \{0, 1\}$.
Definition 4.5.1. The center of a group is defined to be the set
of all elements in the group that commute with all other elements
in the group. (For instance, the identity element is always in the
center of a group, as it commutes with all other elements.)
Exercise 4.5.3. Determine the elements in $D_4$ that lie in its center.

4.1.3 Cyclic groups
4.1.3 Cyclic groups
Example 4.6. Notice that the subset $\{1, -1\}$ of Z, endowed with the usual
multiplication operation of the integers, is a group!
Question 4.6.1. What similarities do you see between this group
and the group (Z/2Z, +)?
Question 4.6.2. Let G be any group that has exactly two elements.
Can you see that G must be similar to the group (Z/2Z, +)
in exactly the same way that this group $\{1, -1\}$ is similar to
(Z/2Z, +)? Now that you have seen the notion of isomorphism
in the context of rings and vector spaces, can you formulate precisely
how any group with exactly two elements must be similar to
(Z/2Z, +)?
We will now generalize Example 4.6.
Example 4.7. Let $n \geq 3$ be any integer. Recall that the complex number
$z_n = \cos(2\pi/n) + i\sin(2\pi/n)$ has modulus 1, and is at an angle $\theta_n = 2\pi/n$
with respect to the positive real axis. DeMoivre's theorem
($(\cos(\theta) + i\sin(\theta))^k = \cos(k\theta) + i\sin(k\theta)$ for $k = 1, 2, \dots$) shows that the complex
number $z_n^2 = \cos(4\pi/n) + i\sin(4\pi/n)$. This also has modulus 1, but is now
at an angle $2\theta_n = 4\pi/n$. Proceeding, we find that the complex numbers
$1, z_n, z_n^2, z_n^3, \dots, z_n^{n-1}$ are evenly spaced around the unit circle, and $z_n^n$ gives
you back the complex number 1.

The elements $z_6^i$ are shown below:
[Figure: the six elements $1 = z_6^0, z_6, z_6^2, z_6^3, z_6^4, z_6^5$ marked at equal spacing around the unit circle. Caption: Cyclic group of order 6.]
It is easy to see that the set $C_n = \{1, z_n, z_n^2, \dots, z_n^{n-1}\}$ is a group; it is
known as the cyclic group of order $n$. (See Lemma 4.29.1 to note that $C_n =
\langle z_n \rangle$, with notation as in that lemma. Thus, in the language of Definition
4.29.1 ahead, $C_n$ is the group generated by $z_n$. See also Remark 4.32 ahead
as well as Exercise 4.72.1.)
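Here is a small optional Python check of these facts (cmath is part of Python's standard library; the comparisons allow for floating-point round-off):

    import cmath

    n = 6
    z = cmath.exp(2j * cmath.pi / n)          # z_n = cos(2*pi/n) + i*sin(2*pi/n)
    elements = [z**k for k in range(n)]       # 1, z, z^2, ..., z^(n-1)
    print(abs(z**n - 1) < 1e-12)              # True: z^n is the complex number 1
    # Closure: the product of two listed elements is again (numerically) in the list.
    w = elements[2] * elements[5]
    print(min(abs(w - u) for u in elements) < 1e-12)   # True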
Question 4.7.1. If $z_n^i z_n^j = z_n^k$ for some $k$ with $0 \leq k < n$, what
is $k$ in terms of $i$ and $j$?

Question 4.7.2. If $(z_n^i)^{-1} = z_n^j$ for some $j$ with $0 \leq j < n$, what
is $j$ in terms of $i$?
Question 4.7.3. Consider the group (Z/nZ, +), for a fixed integer
$n \geq 1$. Notice that every element in this group is obtained by adding
$[1]_n$ to itself various numbers of times. For instance, $[2]_n = [1]_n + [1]_n$
(which we write as $2 \cdot [1]_n$), $[3]_n = [1]_n + [1]_n + [1]_n$ (which we write
as $3 \cdot [1]_n$), etc. What similarities do you see between (Z/nZ, +)
and the group $C_n$ above? Now that you have seen the notion of
isomorphism in the context of rings and of vector spaces, can you
formulate precisely how (Z/nZ, +) and $C_n$ are similar?
4.1.4 Direct product of groups
Example 4.8. Let G and H be groups. We endow the cartesian product $G \times H$
with the operation $(g_1, h_1)(g_2, h_2) = (g_1 g_2, h_1 h_2)$ (compare with Example
2.22 in Chapter 2). Here, the product $g_1 g_2$ refers to the operation in G,
while the product $h_1 h_2$ refers to the operation in H.
Exercise 4.8.1. Verify that with this definition of operation, the
set $G \times H$ forms a group.

This is known as the direct product of G and H.
Question 4.8.1. What is the identity element in $G \times H$? What is
the inverse of an element $(g, h)$?

Question 4.8.2. If G and H are abelian, must $G \times H$ necessarily
be abelian? If $G \times H$ is not abelian, can G or H be abelian? Can
both G and H be abelian?
Exercise 4.8.2. Consider the direct product (Z/2Z, +) $\times$
(Z/3Z, +). Show by direct computation that every element of this
group is a multiple of the element $([1]_2, [1]_3)$. What similarities do
you see between this group and (Z/6Z, +)? With your experience
with isomorphisms in the context of rings and vector spaces, can
you formulate precisely how (Z/2Z, +) $\times$ (Z/3Z, +) and (Z/6Z, +)
are similar?
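The direct computation asked for above can be sketched in Python as follows (we represent an element of the direct product simply as a pair of integer remainders, a bookkeeping choice of ours):

    # Repeatedly add ([1]_2, [1]_3) to itself, componentwise mod 2 and mod 3.
    seen = set()
    x = (0, 0)
    for k in range(6):
        seen.add(x)
        x = ((x[0] + 1) % 2, (x[1] + 1) % 3)
    print(len(seen))   # 6: every element of the direct product is a multiple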
4.1.5 Matrix groups
Example 4.9. We know (see Exercise 2.115 in Chapter 2) that the set
of invertible elements of a ring R, denoted $R^*$, forms a group under the
multiplication operation in the ring. In particular, taking R to be $M_n(F)$
for a fixed field F, we find that the set of $n \times n$ invertible matrices with
entries in F forms a group with respect to matrix multiplication. This is
a very important group in mathematics, and has its own notation and its
own name: it is denoted by $Gl_n(F)$ and is called the general linear group of
order n over F. Recall (see the parenthetical remarks in Exercise 2.56.1 in
Chapter 2) that a matrix with entries in a field is invertible if and only if
its determinant is nonzero. Thus, $Gl_n(F)$ may be thought of as the group
of all $n \times n$ matrices with entries in F whose determinant is nonzero.
Exercise 4.9.1. Write down the group table for $Gl_2$(Z/2Z), the
group of units of the ring of $2 \times 2$ matrices over Z/2Z (see Exercise
2.56.1 in Chapter 2). What familiar group is this isomorphic to?
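If you would like a quick count before writing down the table, the following optional Python sketch enumerates the invertible $2 \times 2$ matrices over Z/2Z by brute force (over a field, a matrix is invertible exactly when its determinant is nonzero, which here means odd):

    from itertools import product

    units = [(a, b, c, d) for a, b, c, d in product(range(2), repeat=4)
             if (a * d - b * c) % 2 != 0]
    print(len(units))   # 6 invertible matrices, so a group of order 6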
Remark 4.9.1. Recall from Exercise 3.104 in Chapter 3 that if V is an n-dimensional
vector space over F, then the ring of F-linear transformations
from V to V, namely $End_F(V)$ (see Exercise 3.101 of that same chapter
as well), is isomorphic to the ring $M_n(F)$. As in Exercise 3.104, let $M :
End_F(V) \to M_n(F)$ be the function that provides this isomorphism ($M$
depends on a choice of basis for V, but that is not important at this point).
It follows at once that the invertible elements of $End_F(V)$ will correspond
bijectively to the invertible elements of $M_n(F)$ under $M$. Since the invertible
elements of $M_n(F)$ are what we have denoted as $Gl_n(F)$ above, and since
the invertible elements of $End_F(V)$ are just those linear transformations
that are both injective and surjective (see Exercise 3.102 in Chapter 3),
we find that elements of $Gl_n(F)$ correspond to the injective and surjective
linear transformations of V. But more: if the matrix $M_T$ corresponds to
the linear transformation T and the matrix $M_S$ to the linear transformation S under
this isomorphism, then, since the map $M$ preserves multiplication, we
find that the product matrix $M_T M_S$ corresponds to $T \circ S$. Although we
have not studied the notion of isomorphisms of groups (we will in Section
4.4 ahead), we have enough experience with isomorphisms in the context of
rings and vector spaces by now to realize that $Gl_n(F)$ and the group of units
of $End_F(V)$ are isomorphic as groups.
Many interesting groups arise as subgroups of $Gl_n(F)$ for suitable n and
F. (Of course, we have not formally defined the term subgroup yet; we
will in Section 4.2 ahead. But we have enough experience with subrings and
subspaces already to know what that term should mean: a subset of a group
G that is closed with respect to the operation in G and forms a group on its
own under this operation.) In fact, every finite group that has n elements
in it occurs as a subgroup of $Gl_n(F)$, for every field F. We consider a few
examples of subgroups of $Gl_n(F)$ ahead. The identity element in all such
groups will be the $n \times n$ identity matrix. Moreover, the multiplication in
such groups will necessarily be associative, since matrix multiplication is
known to be an associative operation (see Question 2.16.6 in Chapter 2).
Example 4.10. The set of matrices in $Gl_n(F)$ whose determinant is 1 forms
a group under matrix multiplication, denoted $Sl_n(F)$, and called the special
linear group of order n over F.

Question 4.10.1. Why does this subset form a group? Check that
the axioms hold.
Example 4.11. Let $B_2$(R) be the set of matrices in $M_2$(R) of the form
$g = \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ where $ad \neq 0$. Since the determinant of $g$ is precisely $ad$,
the condition $ad \neq 0$ shows that $g$ is invertible, i.e., $g \in Gl_2$(R). $B_2$(R) is a
group with respect to matrix multiplication.

Question 4.11.1. What does the product of two such matrices $g$
and $h$ above look like?

Question 4.11.2. What does the inverse of a matrix such as $g$
above look like?
More generally, consider the subset $B_n(F)$ of upper triangular matrices in
$M_n(F)$ whose product of diagonal entries is nonzero. Since the determinant
of an upper triangular matrix is just the product of its diagonal entries,
$B_n(F)$ is a subset of $Gl_n(F)$. $B_n(F)$ forms a group with respect to matrix
multiplication. It is referred to as the upper triangular subgroup of $Gl_n(F)$,
or the standard Borel subgroup of $Gl_n(F)$.
Exercise 4.11.1. Prove that the determinant of an upper triangular
$n \times n$ matrix (with, say, entries in a field F) is just the product
of its diagonal entries.

Exercise 4.11.2. Show that the product of two upper triangular
matrices is also upper triangular.

Exercise 4.11.3. Show that the inverse of an invertible upper
triangular matrix is also upper triangular.
Example 4.12. Let $U_2$(R) be the subset of matrices in $M_2$(R) of the form
$g_a = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$, where $a \in$ R. Note that the determinant of $g_a$ is 1 for all
$a$. $U_2$(R) is a group with respect to matrix multiplication.

Question 4.12.1. What is the product of $g_a$ and $g_b$ in terms of $a$
and $b$?

Question 4.12.2. What is the multiplicative inverse of $g_a$?

Question 4.12.3. What similarity do you see between $U_2$(R) and
(R, +)?

Question 4.12.4. View the elements of $U_2$(R) as the matrices of
linear transformations of $R^2$ with respect to the usual basis $i$ and $j$.
Where do $i$ and $j$ go to under the action of $g_a$?

Question 4.12.5. How would any of the calculations above in
Questions (4.12.1), (4.12.2), or (4.12.4) be changed if we had
restricted $a$ to be an integer? What similarity would you then have
seen between this modified set (with $a$ now restricted to be an
integer) and (Z, +)?

See Exercise 4.79 at the end of the chapter for a generalization to $n \times n$
matrices.
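The following short sketch (Python with the third-party NumPy library, which we assume is available) lets you test your answers to Questions 4.12.1 and 4.12.2 numerically:

    import numpy as np

    def g(a):
        return np.array([[1.0, a], [0.0, 1.0]])

    print(g(2.0) @ g(3.0))          # equals g(5.0): compare with your formula
    print(np.linalg.inv(g(2.0)))    # equals g(-2.0): compare with your formula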
Example 4.13. As we will see in Exercise 4.78, the set of $2 \times 2$ matrices
with entries in R satisfying $A^t A = I$, where $I$ is the identity matrix and $A^t$
stands for the transpose of $A$, forms a group: it is precisely the group of
symmetries of the set described in Example 4.92 on page 208. This set of
matrices is indeed a subset of $Gl_2$(R), since, as you are asked to prove as
well in Exercise 4.78, the relation $A^t A = I$ yields that the determinant of $A$
is $\pm 1$, and in particular, nonzero. More generally, one can consider the set
of $n \times n$ matrices $A$ with entries in any field F satisfying $A^t A = I$. This set
will form a group, known as the orthogonal group of order n over F. We
will use the notation $O_n(F)$ for such groups.

See also Remark 4.5 at the end of the chapter, which discusses more general
orthogonal groups than the one we have introduced above.
Exercise 4.13.1. Prove that the set of $n \times n$ matrices $A$ with
entries in F satisfying $A^t A = I$ forms a group under matrix multiplication.
(Hint: you will need to show that $A^t A = I$ is equivalent
to $AA^t = I$, and from this, that $(A^{-1})^t = (A^t)^{-1}$.)
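As a quick numerical illustration (again Python with NumPy assumed available), a rotation matrix satisfies $A^t A = I$:

    import numpy as np

    theta = 0.7                                   # any angle
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.allclose(A.T @ A, np.eye(2)))        # True: A is orthogonal
    print(np.linalg.det(A))                       # 1.0 (a reflection would give -1.0)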
4.1.6 Remarks and some general properties of groups
Remark 4.14. In an abstract group, several different symbols are used to
denote the binary operation, the identity element, and the inverse of an
element. Sometimes, one uses the symbol $id$ for the identity element, as we
have done above for the groups $S_3$, $D_3$, etc. Sometimes the symbol $e$ is used
for the identity element. Very often, one imagines the group operation to be
some sort of multiplication between group elements (Warning: this is just
an informal way to think about the operation; in general, the operation may
not represent any sort of actual multiplication in the sense of multiplication
in rings), and in such cases, one uses the familiar symbol 1 to represent
the identity element. (In such a situation we say that the group is written
in multiplicative notation, or written multiplicatively.) When writing the
group multiplicatively, one simply writes the binary operation without any
symbol; thus, the product of two elements $a$ and $b$ is simply written $ab$.
(We have followed this convention already with $S_3$, for example.) In the case
where the group is abelian, one often imagines the group operation as some
sort of addition in analogy with the addition operation in rings (Same
Warning: this is just an informal way to think about the operation), and
one writes $+$ for the group operation and 0 for the identity. And continuing
with the analogy, one writes $-a$ for the inverse of an element $a$. (In such a
situation, we say that the group is written in additive notation, or written
additively.)
Before proceeding further, here are some exercises that would be useful.
You would have encountered many of these results already in the context of
the additive group of a ring or a vector space. (For instance, see Remark
2.24, Exercise 2.114, and the notes on page 89 in Chapter 2):
Exercise 4.15. Show that the identity element in a group is unique.
Exercise 4.16. Show that the inverse of an element in a group is
unique.
Exercise 4.17. Show that for any element $a$ in a group G,
$(a^{-1})^{-1} = a$.
Exercise 4.18. (Cancellation in Groups) If $ab = ac$ for elements $a$,
$b$, and $c$ in a group G, show that $b = c$ (left cancellation). Similarly,
if $ba = ca$, show that $b = c$ (right cancellation).
Exercise 4.19. Let G be a group, and let $a$ and $b$ be elements in
G. Show that $(ab)^{-1} = b^{-1}a^{-1}$.
Exercise 4.20. Let G be a group written multiplicatively. For any
element $a \in G$ and for any positive integer $j$, it is customary to
write $a^j$ for $\underbrace{a \cdot a \cdots a}_{j \text{ times}}$. Similarly, for any negative integer $j$, it is
customary to write $a^j$ for $\underbrace{a^{-1} \cdot a^{-1} \cdots a^{-1}}_{|j| \text{ times}}$. Finally, it is customary
to take $a^0$ to be 1. Prove the following:

1. If $y = a^j$ for some integer $j$, then $y^{-1} = a^{-j}$. (Hint: Compute
$a^j a^{-j}$ by invoking the definition of $a^j$ and $a^{-j}$; you would of
course have to divide your proof into whether $j$ is positive,
negative, or zero.)

2. For integers $s$ and $t$, prove that $a^s a^t = a^{s+t}$. (Hint: First
dispose of the case where either $s$ or $t$ is zero, and then divide
your proof into four cases according to whether $s$ is positive
or negative and $t$ is positive or negative.)
4.2 Subgroups, Cosets, Lagrange's Theorem
After our practice with subrings and subspaces, the following concept must
now be quite intuitive:
Definition 4.21. Let G be a group. A subgroup of G is a subset H that
is closed with respect to the binary operation such that with respect to this
operation, H is itself a group.
Exercise 4.22. Let G be a group and let H be a subgroup. Prove
that the identity element of H must be the same as the identity
element of G. (Hint: Write $id_G$ and $id_H$ for the respective identities.
Then $id_H \cdot id_H = id_H$. Also, $id_G \cdot id_H = id_H$. So?)
The following lemma allows us to check if a nonempty subset of a group
is a subgroup.
Lemma 4.23. (Subgroup Test) Let G be a group, and let H be a nonempty
subset. If for all $a$ and $b$ in H the product $ab^{-1}$ is also in H, then H is a
subgroup of G.
Proof. Since H is nonempty (note that we are invoking the nonemptiness
hypothesis!), H has at least one element in it, call it $a$. Then, taking $b = a$
in the statement of the lemma, we find $aa^{-1} = e \in H$. Thus, H contains
the identity. Next, given any $x \in H$, we take $a = e$ and $b = x$ in the
statement of the lemma to find that the product $ex^{-1} = x^{-1}$ must be in H,
so H contains inverses of all its elements. Finally, given any $x$ and $y$ in H,
note that $y^{-1}$ must also be in H by what we just saw, so, taking $a = x$ and
$b = y^{-1}$, we find $x(y^{-1})^{-1} = xy$ must be in H. Putting all this together,
we find that H is closed with respect to the group operation, contains the
identity, and contains inverses of all its elements. Since the associativity of
the group operation is simply inherited from the fact that the operation is
associative on all of G, we find that H satisfies all group axioms, and hence,
H is a subgroup of G. □
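The subgroup test is easy to automate for small groups. Here is an optional Python sketch that encodes $S_3$ as permutation tuples (a bookkeeping device of ours, not notation from the text) and applies the test of Lemma 4.23 to two subsets:

    from itertools import permutations

    G = list(permutations(range(3)))      # S_3: p[i] is the image of i

    def mult(p, q):                       # composition: first q, then p
        return tuple(p[q[i]] for i in range(3))

    def inv(p):
        r = [0, 0, 0]
        for i in range(3):
            r[p[i]] = i
        return tuple(r)

    def is_subgroup(H):
        # The test of Lemma 4.23 on a nonempty subset H.
        return len(H) > 0 and all(mult(a, inv(b)) in H for a in H for b in H)

    identity = (0, 1, 2)
    print(is_subgroup({identity, (1, 0, 2)}))   # True: a two-element subgroup
    print(is_subgroup({identity, (1, 2, 0)}))   # False: not closed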
Example 4.24. Let G be a group. The subset $\{1_G\}$ is a subgroup, called
the trivial subgroup.

Exercise 4.24.1. Prove this by applying the subgroup test (Lemma
4.23).
Example 4.25. In the group $S_3$, the subset $\{id, r_1, r_2\}$ is a subgroup, as
are the subsets $\{id, f_1\}$, $\{id, f_2\}$, and $\{id, f_3\}$.

Exercise 4.25.1. Prove these assertions by studying the group
table of $S_3$ on page 160.
Example 4.26. In the group $S_n$ of permutations of $\{1, 2, \dots, n\}$, let H be
the subset consisting of all permutations that act as the identity on $n$.

Exercise 4.26.1. Prove that H is a subgroup of $S_n$ using the
subgroup test (Lemma 4.23).

Exercise 4.26.2. Compare H with $S_{n-1}$. What similarities do you
see?
Example 4.27. The various matrix groups we considered above, such as
$Sl_n(F)$, $B_n(F)$, $O_n(F)$, etc., are all subgroups of $Gl_n(F)$.
Example 4.28. Let G be a group. Recall that we have defined the center
of G (see Definition 4.5.1) to be the subset consisting of all elements of G
that commute with every other element of G.

Exercise 4.28.1. Prove that the center of G is a subgroup of G.

Question 4.28.1. What can you say about the center of G when
G is abelian?
4.2.1 Subgroup generated by an element
Example 4.29. Let G be a group, and let $a$ be an element in G. What
would be the smallest subgroup of G that contains $a$? (By smallest, we
mean smallest with respect to set-theoretic inclusion, that is, we seek a
subgroup H of G that contains $a$ such that if K is any other subgroup of
G that contains $a$, then $H \subseteq K$.) Let us write G multiplicatively. Then,
any subgroup H that contains $a$ must contain, along with $a$, the elements
$a \cdot a = a^2$, $a \cdot a^2 = a^3$, ..., because the subgroup must be closed with respect
to the group operation. It must contain the identity 1 ($= a^0$) since it is
a subgroup. Similarly, it must contain the inverse $a^{-1}$, and then, it must
contain all products $a^{-1}a^{-1} = a^{-2}$, $a^{-1}a^{-2} = a^{-3}$, .... We have the
following:
Lemma 4.29.1. The set $\langle a \rangle = \{a^n \mid n \in \mathbb{Z}\}$ is a subgroup of G. It is
the smallest subgroup of G that contains $a$, in the sense that if H is any
subgroup of G that contains $a$, then $\langle a \rangle \subseteq H$.

Proof. The discussions just before the statement of this lemma show that
if H is any subgroup of G that contains $a$, then H must contain all $a^i$, for
$i \in \mathbb{Z}$, that is, H must contain $\langle a \rangle$. Thus, we only need to show that $\langle a \rangle$ is a
subgroup of G. But this is easy by the subgroup test (Lemma 4.23): $\langle a \rangle$ is
nonempty since $a$ is in there. Given any two elements $x$ and $y$ in $\langle a \rangle$, $x = a^i$
for some $i \in \mathbb{Z}$, and $y = a^j$ for some $j \in \mathbb{Z}$. Note that $y^{-1} = a^{-j}$ (Exercise
4.20). Then $xy^{-1} = a^i a^{-j}$. Hence (Exercise 4.20 again), $xy^{-1} = a^{i-j} \in \langle a \rangle$,
proving that $\langle a \rangle$ is indeed a subgroup. □
Before proceeding further, we pause to give a name to the object considered
in the lemma above:

Definition 4.29.1. Let G be a group, and let $a$ be an element
in G. The subgroup $\langle a \rangle$ is called the subgroup generated by $a$.
A subgroup H of G is called cyclic if $H = \langle g \rangle$ for some $g \in G$.
In particular, G itself is called cyclic if $G = \langle g \rangle$ for some $g \in G$.
Exercise 4.29.1. Let G be a group written multiplicatively, and let
$a \in G$. For integers $s$ and $t$, prove that $a^s a^t = a^{s+t}$ by mimicking
the proof that $(a^j)^{-1} = a^{-j}$ in the lemma above. (Hint: First
dispose of the case where either $s$ or $t$ is zero, and then divide your
proof into four cases according to whether $s$ is positive or negative
and $t$ is positive or negative.)
In $S_3$, for instance, we see that $\langle r_1 \rangle$ is the (finite) set $\{id, r_1, r_1^2 = r_2\}$.
This is because we need no further powers: $r_1^3 = id$, so $r_1^4 = r_1^7 = \cdots = r_1$,
and $r_1^5 = r_1^8 = \cdots = r_1^2 = r_2$. Similarly, $r_1^{-1} = r_2$, so $r_1^{-2} = r_1^{-1} r_1^{-1} = r_2^2 =
r_1$, and from this, we see that every power $r_1^n$ ($n = \pm 1, \pm 2, \dots$) is one of $id$, $r_1$,
or $r_2$.

By contrast, the subgroup $\langle 1 \rangle$ of the additive group (Z, +) is all of Z.
This is easy to see: $\langle 1 \rangle$ contains $1$, $1 + 1 = 2$, $1 + 2 = 3$, ..., as well as $0$, $-1$, $(-1) + (-1) =
-2$, $(-1) + (-2) = -3$, ....
These examples suggest an interesting and important concept:
Definition 4.29.2. Let G be a group (written multiplicatively),
and let $a \in G$. The order of $a$ (written $o(a)$) is the least positive
integer $n$ (if it exists) such that $a^n = 1$. If no such integer exists,
we say that $a$ has infinite order.
We now have the following:
Lemma 4.29.2. Let G be a group and let $a \in G$. Then $o(a)$ is finite if
and only if $\langle a \rangle$ is a finite set. When these (equivalent) conditions hold,
$o(a)$ equals the number of elements in the subgroup $\langle a \rangle$, and if this common
integer is $m$, then the elements $1, a, \dots, a^{m-1}$ are all distinct, and $\langle a \rangle =
\{1, a, \dots, a^{m-1}\}$.

Proof. Assume that $o(a)$ is finite, say $m$. Then, any integer $l$ can be written
as $bm + q$ for $0 \leq q < m$, so $a^l = a^{bm+q} = (a^m)^b a^q = 1 \cdot a^q$. Hence, every
power of $a$ can be written as $a^q$ for some $q$ between 0 and $m - 1$, that is,
$\langle a \rangle = \{1, a, \dots, a^{m-1}\}$. This shows that $\langle a \rangle$ is a finite set.

Now assume that $\langle a \rangle$ is a finite set. Then, the powers $1, a, a^2, \dots$ cannot
all be distinct (otherwise $\langle a \rangle$ would be infinite), so there exist nonnegative
integers $k$ and $l$, with $k < l$, such that $a^k = a^l$. Multiplying by $a^{-k} = (a^k)^{-1}$,
we find $1 = a^{l-k}$. Note that $l - k$ is positive. Thus, the set of positive integers
$t$ such that $a^t = 1$ is nonempty, since $l - k$ is in this set. By the well-ordering
principle, there is a least positive integer $s$ such that $a^s = 1$. This shows
that $a$ has finite order, namely $s$.

Now assume that these equivalent conditions hold. We have already seen
above that if $o(a)$ is finite and equal to some $m$, then $\langle a \rangle = \{1, a, \dots, a^{m-1}\}$.
Note that these elements are all distinct, since if $a^j = a^k$ for $0 \leq j < k \leq
m - 1$, then, multiplying both sides by $a^{-j} = (a^j)^{-1}$, we would find $1 = a^{k-j}$,
and since $0 < k - j < m$, this would contradict the fact that $m$ is the least
positive integer $l$ such that $a^l = 1$. It follows that the number of elements
in $\langle a \rangle$ is precisely $m$, the order of $a$. □
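For finite groups, the order of an element can be found by simply multiplying until the identity appears. The following optional Python sketch does this for the additive group (Z/12Z, +):

    def order(a, op, identity):
        # Least n >= 1 with a^n = identity (loops forever if the order is infinite).
        x, n = a, 1
        while x != identity:
            x = op(x, a)
            n += 1
        return n

    add12 = lambda x, y: (x + y) % 12
    print(order(5, add12, 0), order(8, add12, 0))   # 12 3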
The following result is useful; its proof uses an idea that we have already
encountered in the proof of Lemma 4.29.2 above:

Lemma 4.29.3. Let $a$ be an element of a group G and suppose that $a^l = 1$
for some nonzero integer $l$. Then the order of $a$ divides $l$. (In particular, the order
of $a$ is finite.)

Proof. Note that if $l$ is negative, then $a^{-l} = (a^l)^{-1} = 1$. Hence, the set of
positive integers $n$ such that $a^n = 1$ is nonempty, since either $l$ or $-l$ is in
that set. By the well-ordering principle, this set has a least element, so
indeed the order of $a$ is finite.

Now suppose the order of $a$ is $m$. Write $l = bm + r$ for integers $b$ and
$r$ with $0 \leq r < m$. Then $a^r = a^l a^{-bm} = a^l (a^m)^{-b} = 1$, because both $a^l$
and $a^m$ equal 1. Since $m$ is the least positive integer $n$ such that $a^n = 1$, it
follows that $r = 0$, i.e., that $m$ divides $l$. □
We now prove a result that determines the order of $a^d$ in terms of the
order of $a$, in the case where the order of $a$ is finite:

Lemma 4.29.4. Let G be a group and let $a \in G$ have finite order $m$. Then,
for any integer $d$, the order of $a^d$ equals $m/\gcd(m, d)$.

Proof. Let us assume first that $d$ is positive. Denote the order of $a^d$ by $t$.
Thus, the integer $t$ will correspond to the first time that 1 occurs in the list
$a^d, a^{2d}, a^{3d}, \dots$. By Lemma 4.29.3 above, the integer $t$ will correspond to
the first time $m$ divides a member of the list $d, 2d, 3d, \dots$. This member will
then be a common multiple of $d$ and $m$, and since it is the first common
multiple in the list, it will be the least common multiple of $d$ and $m$. In other
words, $t$ will be such that $dt$ is the least common multiple of $m$ and $d$.
Since $dm = \gcd(m, d) \cdot \mathrm{lcm}(m, d) = \gcd(m, d) \cdot dt$, we find $t = m/\gcd(m, d)$, as
desired.

If $d$ is zero or negative, choose a positive integer $p$ so that $pm + d > 0$.
Now observe that $a^{pm+d} = (a^m)^p a^d = 1^p a^d = a^d$. Hence, the order of $a^d$ is
the same as the order of $a^{pm+d}$, and since $pm + d > 0$, we may apply the result
of the last paragraph to find that the order of $a^d$ is $m/\gcd(m, pm + d)$. To
finish the proof, we will show that $\gcd(m, d) = \gcd(m, pm + d)$. It is enough
to show that the sets of common divisors of the two pairs of integers are the
same. But this is easy: if $l$ divides both $m$ and $d$, then it must also divide
the linear combination $pm + d$ by Lemma 1.2, so $l$ is a common divisor of $m$
and $pm + d$. Thus the set of common divisors of $m$ and $d$ is a subset of the
common divisors of $m$ and $pm + d$. On the other hand, if $l$ divides both $m$
and $pm + d$, then $l$ must divide the linear combination $(-p)m + (pm + d) = d$,
so $l$ is a common divisor of $m$ and $d$. Thus, we have the reverse inclusion:
the set of common divisors of $m$ and $pm + d$ is a subset of the common
divisors of $m$ and $d$. The two sets are hence equal. □
We have the following immediately:

Corollary 4.29.1. With $a$ and $m$ as above,

1. If $d$ divides $m$, then $a^d$ has order $m/d$.

2. If $d$ is relatively prime to $m$, then $a^d$ has order $m$, and $\langle a^d \rangle = \langle a \rangle$.

Proof. It follows directly from the lemma that if $d$ divides $m$ then $a^d$ has
order $m/d$, and that if $d$ and $m$ are relatively prime, then $a^d$ has order $m$.
To see that $\langle a^d \rangle = \langle a \rangle$ in the case where $d$ and $m$ are relatively prime, note
that by Lemma 4.29.2, the subgroup $\langle a \rangle$ has $m$ elements since $a$ has order
$m$, and likewise, the subgroup $\langle a^d \rangle$ has $m$ elements since $a^d$ has order $m$.
But $\langle a^d \rangle$ is a subset of $\langle a \rangle$, since any power of $a^d$ is also a power of $a$. Since
a subset T of a finite set S that has the same number of elements as S must
equal S, we find that $\langle a^d \rangle = \langle a \rangle$. □
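Lemma 4.29.4 is easy to test numerically. In the sketch below (Python), we model $a$ as the element $[1]_{12}$ of (Z/12Z, +), so that $a^d$ corresponds to $[d]_{12}$:

    from math import gcd

    m = 12
    for d in range(1, m + 1):
        x, t = d % m, 1            # find the order of [d] by repeated addition
        while x != 0:
            x = (x + d) % m
            t += 1
        assert t == m // gcd(m, d)
    print("Lemma 4.29.4 checks out for m = 12")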
Here is a quick exercise to show you that cyclic groups can come in
hidden forms!

Exercise 4.29.2. Show that Z/2Z $\times$ Z/3Z is cyclic. (Hint: What
is the order of the element $([1]_2, [1]_3)$?) After you work this out, see
Exercise 4.72.1 ahead and Exercise 4.86 at the end of the chapter.
Now let H be a subgroup of a group G, and assume that the number
of elements in G is finite. Then there is a very tight restriction on the
possible number of elements in H (Theorem 4.39 ahead), and we will work
towards understanding this restriction in the next two subsections. First, a
definition:

Definition 4.30. Let G be a group. The order of G (written $o(G)$) is the
number of elements in G, if this number is finite. If the number of elements in
G is infinite, we say G is of infinite order.
Remark 4.31. Do not confuse the order of an element $a$ in a group G with
the order of the group G; these refer to two separate concepts. (All the
same, even though these are separate concepts, we will see (Corollary 4.40
ahead) that the two integers are related.) Note that Lemma 4.29.2 above
says that the order of $a$ equals the order of the subgroup generated by $a$.
Thus in the special case when G is cyclic, i.e., when $G = \langle a \rangle$ for some
$a \in G$ (see Definition 4.29.1 above), the order of $a$ and the order of the
group $G = \langle a \rangle$ are indeed the same integers, even though they arise out of
different concepts.

Remark 4.32. Continuing with the special situation at the end of Remark
4.31 above, let G be any cyclic group of order $n$. Thus $G = \langle a \rangle$ for some
$a \in G$, and since G has order $n$, Lemma 4.29.2 shows that the element $a$
must have order $n$, and that $G = \{1, a, \dots, a^{n-1}\}$. Notice the similarity
with Example 4.7 above. See also Exercise 4.72.1.
4.2.2 Cosets
We have already seen the notion of the coset of a subgroup with respect to
an element before. We saw this in the context of subgroups I of abelian
groups of the form (R, +), where R is a ring and I is an ideal (see page 57).
We also saw this in the context of subgroups W of abelian groups of the
form (V, +), where V is a vector space and W is a subspace (see page 130).
The following should therefore come as no surprise; the only novel feature
is that we need to distinguish between right and left cosets, since the group
operation in an arbitrary group need not be commutative:

Definition 4.33. Let G be a group and let H be a subgroup. Given any
$a \in G$, the left coset of H with respect to $a$ is the set of all elements of the
form $ah$ as $h$ varies in H, and is denoted $aH$. Similarly, the right coset of H
with respect to $a$ is the set of all elements of the form $ha$ as $h$ varies in H, and
is denoted $Ha$.
Example 4.34. Let us consider an example that will show that indeed left
and right cosets can be different. Take G to be $S_3$ (see Example 4.2), and
let $H = \langle f_1 \rangle$. Since the order of $f_1$ is 2 (see the group table for $S_3$ on page
160), $\langle f_1 \rangle = \{1, f_1\}$ by Lemma 4.29.2. Take $a$ to be the element $r_1$. Then
the left coset $r_1\langle f_1 \rangle = \{r_1, r_1 f_1\} = \{r_1, f_3\}$ (see the group table), while the
right coset $\langle f_1 \rangle r_1 = \{r_1, f_1 r_1\} = \{r_1, f_2\}$. Clearly, the left and right cosets
of $r_1$ with respect to the subgroup $\langle f_1 \rangle$ are not equal!

Continuing with this example, let us make a table of all left and right
cosets of $\langle f_1 \rangle$:

    $a$       Left coset $a\langle f_1 \rangle$       Right coset $\langle f_1 \rangle a$
    $id$      $\{1, f_1\}$                 $\{1, f_1\}$
    $r_1$     $\{r_1, f_3\}$               $\{r_1, f_2\}$
    $r_2$     $\{r_2, f_2\}$               $\{r_2, f_3\}$
    $f_1$     $\{1, f_1\}$                 $\{1, f_1\}$
    $f_2$     $\{f_2, r_2\}$               $\{f_2, r_1\}$
    $f_3$     $\{f_3, r_1\}$               $\{f_3, r_2\}$

Notice that every coset (left or right) has exactly two elements, which is
the same number as the number of elements in the subgroup $\langle f_1 \rangle$ that we
are considering. This will be useful in understanding the proof of Lagrange's
theorem (Theorem 4.39) below.
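You can reproduce a table like the one above by machine. The following optional Python sketch lists the left and right cosets of a two-element subgroup of $S_3$, with permutations encoded as tuples (our own bookkeeping convention from the earlier sketches):

    from itertools import permutations

    G = list(permutations(range(3)))                       # S_3 as tuples
    mult = lambda p, q: tuple(p[q[i]] for i in range(3))

    t = (1, 0, 2)                         # an element of order 2 (a "flip")
    H = [(0, 1, 2), t]                    # the subgroup it generates

    for a in G:
        left  = sorted(mult(a, h) for h in H)              # aH
        right = sorted(mult(h, a) for h in H)              # Ha
        print(a, left, right, left == right)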
Exercise 4.35. Take $G = S_3$ and take $H = \langle r_1 \rangle$. Write down all
left cosets of H and all right cosets of H with respect to all the
elements of G. What observation do you make?
The following equivalence relation in Lemma 4.36 below is analogous to
the corresponding equivalence relations for rings (see page 57) and vector
spaces (see page 130), except that once again, we need to distinguish two
cases because the group operation need not be commutative. Note that in
the case of rings, for example, we define $a \sim b$ if and only if $a - b \in I$ (where
I is some given ideal). Now note that $a - b$ is really $a + (-b)$. Thus, in
the group situation, the expression analogous to $a + (-b)$ would be $ab^{-1}$,
and this is indeed the expression we consider in the lemma below. (And
while $a + (-b) = (-b) + a$ in the situation of rings, the operation in a group
need not be commutative, so we need to consider the expression analogous
to $(-b) + a$ as well, which is $b^{-1}a$.)

Lemma 4.36. Let G be a group and H a subgroup. Define two relations
on G, denoted $\sim_L$ and $\sim_R$, by the following rules: $a \sim_L b$ if and only
if $b^{-1}a \in H$, and $a \sim_R b$ if and only if $ab^{-1} \in H$. Then $\sim_L$ and $\sim_R$ are
both equivalence relations on G. The equivalence class $[a]_L$ of an element
$a$ with respect to the relation $\sim_L$ is the left coset $aH$, while its equivalence
class $[a]_R$ with respect to the relation $\sim_R$ is the right coset $Ha$.
Proof. The proof that $\sim_L$ is an equivalence relation is similar to the proof
of Lemma 2.78 in Chapter 2, except that we have to account for the fact
that the group operation need not be commutative.

To show that $a \sim_L a$, simply note that $a^{-1}a = 1 \in H$. To show that
$a \sim_L b$ implies that $b \sim_L a$, note that $a \sim_L b$ gives (by definition) $b^{-1}a = h$
for some $h \in H$, and taking inverses of both sides (see Exercise 4.19 above),
we find $(b^{-1}a)^{-1} = a^{-1}b = h^{-1}$. Since $h^{-1}$ is also in H as H is a subgroup,
we find $a^{-1}b$ is in H, which shows that $b \sim_L a$. Finally, given $a \sim_L b$ and
$b \sim_L c$, note that (by definition) $b^{-1}a = h_1$ and $c^{-1}b = h_2$ for some $h_1$ and
$h_2$ in H. Then $h_2 h_1 = c^{-1}b \cdot b^{-1}a = c^{-1}a$, and since $h_2 h_1$ is also in H (as
H is a subgroup), we find $a \sim_L c$ as well.

The proof that $\sim_R$ is an equivalence relation is similar.

To prove that $[a]_L = aH$, note that any element $b$ in $aH$ is of the form $ah$
for some $h \in H$. Multiplying by $a^{-1}$, we find $a^{-1}b = h$ and hence $a^{-1}b \in H$.
This shows that $b \sim_L a$. Thus, all elements in $aH$ are in the equivalence
class of $a$, i.e., $aH \subseteq [a]_L$. For the other direction, take any $b \in [a]_L$. Then
$b \sim_L a$, so (by definition) $a^{-1}b = h$ for some $h \in H$. Thus, multiplying both
sides by $a$, we find $b = ah$, so $b \in aH$. Hence, $[a]_L \subseteq aH$ as well.

The proof that $[a]_R = Ha$ is similar. □
Note that we immediately have:

Corollary 4.37. Any two left cosets $aH$ and $bH$ are either equal or disjoint.
Similarly, any two right cosets $Ha$ and $Hb$ are either equal or disjoint.

Proof. This follows from the fact that $aH = [a]_L$ and $bH = [b]_L$, and the
fact that any two equivalence classes arising from an equivalence relation
are either equal or disjoint. (The proof is identical for right cosets.) □
Here is a quick exercise:
Exercise 4.38. Show that H is itself a left coset, as well as a
right coset. Now show that the left coset $aH$ equals H if and only
if $a \in H$. Similarly, show that the right coset $Hb$ equals H if and
only if $b \in H$. More generally, show that the left coset $aH$ equals the
left coset $bH$ if and only if $a \in bH$ if and only if $b \in aH$ (and
similarly for right cosets).
4.2.3 Lagrange's theorem
Theorem 4.39. (Lagrange's Theorem) Let G be a group of finite order, and
let H be a subgroup. Then the order of H divides the order of G.

Proof. The crux of the proof is to show that any two left cosets of H have the
same number of elements (recall that we have already seen this phenomenon
in Example 4.34 above: see the table of left and right cosets in that example).
Once we have shown this, it will follow that every coset has $o(H)$ elements
in it, since H itself is one of these left cosets (it is the left coset $hH$ for any
$h \in H$, for instance; see Exercise 4.38). From this it is trivial to conclude
that $o(H)$ must divide $o(G)$: since the left cosets are disjoint and their union
is G, and since each left coset has $o(H)$ elements, we find that $o(H)$
times the number of distinct left cosets of H must equal $o(G)$, i.e., $o(H)$
must divide $o(G)$.

To prove that any two left cosets of H have the same number of elements,
take two left cosets $aH$ and $bH$. Every element of $aH$ can be written as $ah$
for some unique $h \in H$. For, by definition, every element of $aH$ is already
of the form $ah$ for some $h \in H$; we only have to show that $h$ is unique. But
this is clear: if $ah = ah'$, then by cancellation (see Exercise 4.18), $h$ must
equal $h'$. Hence, the following function $f : aH \to bH$ is well defined: take
an element in $aH$; it is expressible as $ah$ for some uniquely determined $h$,
so send this element to $bh$, which lives in the left coset $bH$. This is a one-to-one
function, since if $ah_1 = ah_2$, then, cancelling $a$, we would find $h_1 = h_2$.
It is onto as well, because every element in $bH$ is of the form $bh$ for some
(uniquely determined) element $h$ in H, and hence $f(ah) = bh$. It follows
that the number of elements in $aH$ equals the number of elements in $bH$.

The rest of the proof of the theorem is as described in the first paragraph.
(Note that essentially the same proof shows that any two right cosets of
H have the same number of elements.) □
Here is an immediate corollary containing a result promised in Remark
4.31 above:

Corollary 4.40. Let G be a group of finite order. Then the order of any
element of G divides the order of G.

Proof. By Lemma 4.29.2, the order of any element $a$ equals the order of the
subgroup $\langle a \rangle$ generated by $a$. But, by the theorem above, $o(\langle a \rangle)$ divides
$o(G)$. It follows that $o(a)$ divides $o(G)$. □
Here is a corollary to Corollary 4.40:

Corollary 4.41. Let G be a group of finite order $d$. Then $a^d = 1$ for all
$a \in G$.

Proof. Let the order of $a$ be $q$. We saw in Corollary 4.40 that $q \mid d$, so $d = mq$
for some integer $m$. Then, $a^d = (a^q)^m = 1^m = 1$. □
One of the prettiest theorems in elementary number theory is Fermat's
Little Theorem, which is purely an application of the corollary above.

Theorem 4.42. Let $p$ be a prime, and let $a$ be any integer. Then $a^p \equiv a
\pmod{p}$.

Proof. If $p \mid a$, then clearly $p \mid a^p$, so both $a^p$ and $a$ are congruent to 0 mod
$p$. In particular, this means that $a^p \equiv a \pmod{p}$ for such $a$. Thus, we only
need to consider the case where $p \nmid a$. In that case, note that $[a]_p = [r]_p$,
where $r$ is one of $1, 2, \dots, p - 1$. Now we have already observed in our earlier
examples (see Example 2.59 in Chapter 2) that Z/pZ is a field. By Exercise
2.48 of the same chapter, the nonzero elements of a field form a group under
multiplication. In particular, this means that the nonzero elements of Z/pZ
form a group under multiplication, and this group clearly has order $p - 1$.
So, by the corollary above, $[r]_p^{p-1} = [1]_p$. Multiplying both sides by $[r]_p$, we
find $[r]_p^p = [r]_p$. Since $[r]_p$ is just $[a]_p$, we find $[a]_p^p = [a]_p$ in the ring Z/pZ,
so, back in Z, we find $a^p \equiv a \pmod{p}$. □
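Python's built-in pow function (which, given three arguments, computes modular powers efficiently) gives a quick numerical check of this theorem for any particular prime; for instance:

    p = 101                                                    # a prime
    print(all(pow(a, p, p) == a % p for a in range(2 * p)))    # True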
4.3 Normal Subgroups, Quotient Groups
Recall how we formed a quotient ring R/I (see page 57) from a ring R and an
ideal I; the elements of R/I were the cosets $a + I$ as $a$ ranged through R, and
the addition and multiplication were defined respectively by $(a+I)+(b+I) =
(a + b) + I$ and $(a + I) \cdot (b + I) = (ab) + I$. We showed that these rules for
addition and multiplication were well-defined (Lemma 2.80) and then went
on to show (Theorem 2.82) that R/I with these operations was indeed a ring.
Similarly, recall how we formed a quotient space V/W (see page 130) from a
vector space V over a field F and a subspace W: the elements of V/W were
the cosets $u + W$ as $u$ ranged through V, and the vector addition and scalar
multiplication were defined respectively by $(u+W)+(v+W) = (u+v)+W$
and $f(u + W) = fu + W$. Once again, we observed that these rules for vector
addition and scalar multiplication were well-defined (Exercise 3.69) and then
went on to show that V/W with these operations was indeed a vector space
over F (Theorem 3.70).

We would of course like to mimic these constructions and form a quotient
group G/H from a group G and a subgroup H: we would take the elements
of G/H to be the various (say, left) cosets $gH$ as $g$ ranges through G, and we
would define the group operation on G/H by $aH \cdot bH = (ab)H$. But when
we carry out this program, we run into a slight problem: in general, the
operation $aH \cdot bH = (ab)H$ is not well defined! For, suppose that $aH = a'H$
and $bH = b'H$. Viewing $aH$ as $a'H$ and $bH$ as $b'H$, our desired operation
should yield that $aH \cdot bH = a'H \cdot b'H = (a'b')H$. Thus, $(ab)H$ ought to equal
$(a'b')H$ whenever $aH = a'H$ and $bH = b'H$ (or put differently, whenever
$a \sim_L a'$ and $b \sim_L b'$).
In general, this need not happen. For instance, take $G = S_3$, and take
$H = \langle f_1 \rangle$. Consider the cosets $r_1\langle f_1 \rangle = \{r_1, f_3\}$ and $r_2\langle f_1 \rangle = \{r_2, f_2\}$
(see the table in Example 4.34). Now, it is clear from the table that
$r_1\langle f_1 \rangle = f_3\langle f_1 \rangle$ and that $r_2\langle f_1 \rangle = f_2\langle f_1 \rangle$. So, the question is: is $(r_1 r_2)\langle f_1 \rangle =
(f_3 f_2)\langle f_1 \rangle$? The answer is no! We find that $(r_1 r_2)\langle f_1 \rangle = 1 \cdot \langle f_1 \rangle = \{1, f_1\}$,
while $(f_3 f_2)\langle f_1 \rangle = r_2\langle f_1 \rangle = \{r_2, f_2\}$.
So how should one fix this problem? Let us first analyze the situation
some more. Since $a' = a' \cdot 1 \in a'H$ and since $a'H = aH$, we find $a' \in aH$,
so $a' = ah$ for some $h \in H$. Similarly, $b' = bk$ for some $k \in H$. Then
$a'b' = ahbk$. If $(ab)H$ ought to equal $(a'b')H$, then $a'b'$ ought to equal $abl$
for some $l \in H$ (see Exercise 4.38). We have gotten $a'b'$ to look like $ahbk$;
let us massage this a bit and write it as $ab\,b^{-1}hb\,k$. Now, suppose that $b^{-1}hb$
is also in H by some miracle, say that $b^{-1}hb = j$ for some $j \in H$. Then,
$a'b' = ahbk = ab\,b^{-1}hb\,k = ab\,jk$, and of course, $jk \in H$ as both $j$ and $k$ are
in H. It would follow that if this miracle were to happen, then $a'b'$ would
look like $ab$ times an element of H, and therefore, $abH$ would equal $a'b'H$.

As the example of $G = S_3$ and $H = \langle f_1 \rangle$ above shows, this miracle will
not always happen, but there are some special situations where this will
happen, and we give this a name:
Definition 4.43. Let G be a group. A subgroup H of G is called a normal
subgroup if for any $g \in G$, $g^{-1}hg \in H$ for all $h \in H$.

Remark 4.44. Alternatively, write $g^{-1}Hg$ for the set $\{g^{-1}hg \mid h \in H\}$. Then
we may rewrite the definition above as follows: H is said to be normal if
$g^{-1}Hg \subseteq H$ for all $g \in G$. Note that this is equivalent to requiring that
$gHg^{-1} \subseteq H$ for all $g \in G$. For, setting $y$ to be $g^{-1}$, note that as $g$ ranges
through all the elements of G, $y = g^{-1}$ ranges through all the elements of G
as well.
Example 4.45. Take $G = S_3$ again, but this time around, take $H = \langle r_1 \rangle =
\{1, r_1, r_2\}$. Let us consider the sets $g^{-1}Hg$ as $g$ ranges through $S_3$. We can
obtain the various products by using the group table for $S_3$ on page 160;
for instance, $f_1\{1, r_1, r_2\}f_1^{-1} = f_1\{1, r_1, r_2\}f_1 = \{f_1 f_1, f_1 r_1 f_1, f_1 r_2 f_1\} =
\{1, r_2, r_1\}$, etc. Doing so, we obtain the following:

    $g$       $g\{1, r_1, r_2\}g^{-1}$
    $id$      $\{1, r_1, r_2\}$
    $r_1$     $\{1, r_1, r_2\}$
    $r_2$     $\{1, r_1, r_2\}$
    $f_1$     $\{1, r_2, r_1\}$
    $f_2$     $\{1, r_2, r_1\}$
    $f_3$     $\{1, r_2, r_1\}$

Thus, for each $y \in G$, we find $yHy^{-1} = H$ (so most definitely, $yHy^{-1} \subseteq
H$, as required by Definition 4.43 in the form recast in Remark 4.44), so indeed H is a normal subgroup of G.
It was not a coincidence in the example above that $yHy^{-1}$ actually
turned out to be equal to H instead of merely being a subset. We have the
following easy result:

Lemma 4.46. If H is a normal subgroup of G, then indeed
$yHy^{-1} = H$ for all $y \in G$.

Proof. Fix a $y \in G$. Since H is normal, we know that $yHy^{-1} \subseteq H$. We wish
to show that $H \subseteq yHy^{-1}$ as well. But since H is normal, $y^{-1}H(y^{-1})^{-1} \subseteq H$,
so $y^{-1}Hy \subseteq H$. Thus, for any $h \in H$, $y^{-1}hy = k$ for some $k \in H$. We may
rewrite this as $h = yky^{-1}$ by pre-multiplying by $y$ and post-multiplying by
$y^{-1}$. But $yky^{-1}$ is an element of $yHy^{-1}$ as $k \in H$, so we find that for each
$h \in H$, $h \in yHy^{-1}$. Thus $H \subseteq yHy^{-1}$ as desired. □
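Normality can also be checked by brute force for small groups. The following optional Python sketch conjugates two subgroups of $S_3$ (encoded as permutation tuples, our own convention from the earlier sketches) by every group element; the subgroup of "rotations" passes the test, while a two-element "flip" subgroup does not, in agreement with Example 4.45 and the example before Definition 4.43:

    from itertools import permutations

    G = list(permutations(range(3)))
    mult = lambda p, q: tuple(p[q[i]] for i in range(3))

    def inv(p):
        r = [0] * 3
        for i, pi in enumerate(p):
            r[pi] = i
        return tuple(r)

    def is_normal(H):
        return all(mult(mult(g, h), inv(g)) in H for g in G for h in H)

    rotations = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}     # id and the two 3-cycles
    flip_sub  = {(0, 1, 2), (1, 0, 2)}                # id and one transposition
    print(is_normal(rotations), is_normal(flip_sub))  # True False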
There is an immediate corollary to Lemma 4.46:

Corollary 4.47. Let G be a group, and let N be a normal subgroup. Then
for any $g \in G$, the left coset $gN$ and the right coset $Ng$ are equal.

Proof. Since N is normal, we may apply the lemma above with $y = g$ to find
$gNg^{-1} = N$. Hence, for any $n \in N$, we have $gng^{-1} = m$ for some $m \in N$.
Post-multiplying this by $g$, we find $gn = mg$. Thus, $gn \in Ng$. Since this
is true for arbitrary $n \in N$, we find $gN \subseteq Ng$. For the reverse inclusion,
take $y = g^{-1}$ in Lemma 4.46 to find $g^{-1}Ng = N$ as well. Hence, given any
$n \in N$, $g^{-1}ng = m$ for some $m \in N$. Pre-multiplying this by $g$, we find
$ng = gm$, so $ng \in gN$. Since this is true for arbitrary $n \in N$, we find
$Ng \subseteq gN$. □
Remark 4.48. As a result of this corollary, if N is normal in G, we may
simply talk of the cosets of N, without specifying whether these are left or
right cosets.

Exercise 4.49. Prove the converse of Corollary 4.47: If N is a
subgroup of G such that for every $g \in G$, the left coset $gN$ equals
the right coset $Ng$, then N is normal.
Exercise 4.50. Prove that the center of a group (see Definition
4.5.1) is a normal subgroup.

Exercise 4.51. Prove that every subgroup of an abelian group is
normal.
The following is now a consequence of all our discussions:

Lemma 4.52. Let G be a group, and let N be a normal subgroup. Denote by
G/N the set of cosets (see Remark 4.48) of N. Then the binary operation
defined on G/N by $(aN)(bN) = (ab)N$ is well-defined.

Proof. The proof of this lemma is contained in the discussions just before
Definition 4.43. In fact, it was precisely the analysis of what would make
the operation $(aH)(bH) = (ab)H$ on the (left) cosets of an arbitrary subgroup
H well-defined that led us to the definition of normal subgroups. It would
be a good idea to read that discussion and furnish the proof of this lemma
yourselves. □
Theorem 4.53. Let G be a group, and let N be a normal subgroup. Then
the set G/N, with the operation defined in the statement of Lemma 4.52,
is a group.

Proof. We have observed in Lemma 4.52 that this operation is well-defined.
We have to check that all group axioms are satisfied.

1. Associativity: Given $aN$, $bN$, and $cN$ in G/N, we have $(aN)[(bN)(cN)] =
(aN)[(bc)N] = [a(bc)]N = [(ab)c]N$ (the last equality because of associativity
in G). On the other hand, $[(aN)(bN)](cN) = [(ab)N](cN) =
[(ab)c]N$. Hence, $(aN)[(bN)(cN)] = [(aN)(bN)](cN)$.

2. Identity element: The coset $N = 1 \cdot N$ acts as the identity element. For,
for any $aN$, we have $(aN)(1 \cdot N) = (a \cdot 1)N = aN$, and $(1 \cdot N)(aN) =
(1 \cdot a)N = aN$.

3. Existence of inverses: For any $aN$, consider the coset $a^{-1}N$. We find
$(aN)(a^{-1}N) = (aa^{-1})N = 1 \cdot N = N$, and similarly, $(a^{-1}N)(aN) =
(a^{-1}a)N = 1 \cdot N = N$. Since the coset $N$ is the identity element in
G/N, we find that the coset $a^{-1}N$ is the inverse of the coset $aN$.

This proves that G/N with the operation as above is indeed a group. □
Definition 4.54. The set G/N with the binary operation defined in the statement
of Lemma 4.52 is called the quotient group of G by the normal subgroup
N.

Exactly as with quotient rings and quotient vector spaces, the intuition
behind quotient groups is that G/N is a group obtained from a group G by
killing off (or dividing out by) all elements in a given normal subgroup N. Thus,
G/N is to be thought of as the set of all remainders left after dividing out
by N, endowed with the natural quotient operation of Lemma 4.52.
Exercise 4.55. If the order of G is finite, show that $o(G/N) =
o(G)/o(N)$.

Exercise 4.56. Take G to be $D_4$, and take N to be the subgroup
generated by $\rho^2$ (see Example 4.5 and also Exercise 4.5.2).

1. Prove that N is normal in G.
2. Prove that G/N has order 4.
3. Prove that G/N is abelian.
4. Prove that G/N is not cyclic.
4.4 Group Homomorphisms and Isomorphisms
Having had enough experience with quantifying the fact that sometimes the
ring operations in two given rings may be the same except perhaps for
dividing out by an ideal, or that, sometimes, the vector space operations
in two given vector spaces over a field may be the same except perhaps
for dividing out by a subspace, the following concept should now be very
intuitive:

Definition 4.57. Let G and H be groups. A function $f : G \to H$ is called a
group homomorphism if $f(g)f(h) = f(gh)$ for all $g, h \in G$.
Remark 4.58. Just as with the definitions of ring homomorphisms and vector
space homomorphisms (linear transformations), there are some features of
this definition that are worth noting:

1. In the equation $f(g)f(h) = f(gh)$, note that the operation on the left
side represents the group operation in the group H, while the operation
on the right side represents the group operation in the group G.

2. By the very definition of a function, $f$ is defined on all of G. The
image of G under $f$, however, need not be all of H (i.e., $f$ need not
be surjective). We will see examples of this ahead (see Example 4.63
and Example 4.64 for instance). However, the image of G under $f$ is
not an arbitrary subset of H: the definition of a group homomorphism
ensures that the image of G under $f$ is actually a subgroup of H (see
Lemma 4.68 later in this section).

3. Note that it is not necessary to stipulate that $f(1_G) = 1_H$, since this
property holds automatically; see Lemma 4.59 below.
Lemma 4.59. Let $f : G \to H$ be a group homomorphism. Then $f(1_G) =
1_H$.

Proof. We have already seen the proof of this in the context of ring homomorphisms
(Lemma 2.90 in Chapter 2) and of vector space homomorphisms
(Lemma 3.77 in Chapter 3). For completeness, we will prove it again: you
should read this proof and go back and re-read the proofs of the corresponding
lemmas on ring homomorphisms and vector space homomorphisms. We
have $f(1_G) = f(1_G \cdot 1_G) = f(1_G) \cdot f(1_G)$, so putting this together, we have
$f(1_G) \cdot 1_H = f(1_G) = f(1_G) \cdot f(1_G)$. Invoking left cancellation (see Exercise
4.18), we find $1_H = f(1_G)$. □
We get an immediate corollary to this (see Corollary 2.91 in Chapter 2,
as also Remark 3.81 in Chapter 3):

Corollary 4.60. Let $f : G \to H$ be a group homomorphism. Then for any
$g \in G$, $f(g^{-1}) = (f(g))^{-1}$.

Proof. Since $gg^{-1} = 1_G$, we have $f(gg^{-1}) = f(g)f(g^{-1}) = f(1_G) = 1_H$, and
similarly, from $g^{-1}g = 1_G$ we find $f(g^{-1})f(g) = 1_H$. This shows that $f(g)$
and $f(g^{-1})$ are inverses in H. □
The following definition should be natural at this point, after your experiences
with ring homomorphisms and vector space homomorphisms:

Definition 4.61. Given a group homomorphism $f : G \to H$, the kernel of $f$
is the set $\{g \in G \mid f(g) = 1_H\}$. It is denoted $\ker(f)$.
No surprise here:

Proposition 4.62. The kernel of a group homomorphism $f : G \to H$ is a
normal subgroup of G.

Proof. Let us prove first that $\ker(f)$ is a subgroup. Since $1_G \in \ker(f)$
(see Lemma 4.59), $\ker(f)$ is certainly nonempty. Now that we know it is
nonempty, by Lemma 4.23, it is sufficient to show that whenever $g$ and $k$
are in $\ker(f)$, then $gk^{-1}$ is also in $\ker(f)$. First note that by Corollary 4.60,
$f(k)$ and $f(k^{-1})$ are inverses of each other in the group H. With this at
hand, we have $f(gk^{-1}) = f(g)f(k^{-1}) = f(g)(f(k))^{-1} = 1_H \cdot 1_H = 1_H$ (we
have invoked the fact here that both $g$ and $k$ are in the kernel of $f$, so they
get mapped to $1_H$ under $f$). We thus find $gk^{-1} \in \ker(f)$ as desired.

To show $\ker(f)$ is normal, we need to show that $gkg^{-1} \in \ker(f)$ for all
$g \in G$ and all $k \in \ker(f)$. But this is easy: for any $g \in G$ and $k \in \ker(f)$,
$f(gkg^{-1}) = f(g)f(k)f(g^{-1}) = f(g) \cdot 1_H \cdot (f(g))^{-1} = f(g)(f(g))^{-1} = 1_H$, so
indeed, $gkg^{-1} \in \ker(f)$. □
Example 4.63. Given groups G and H, the map $f : G \to H$ that sends
every $g \in G$ to $1_H$ is a group homomorphism.

Question 4.63.1. Why is this $f$ a group homomorphism? What
is the kernel of $f$?

Notice that if H has more than just the identity element, then $f$ is not
surjective.
Example 4.64. Let R and S be rings, and let $f : R \to S$ be a ring homomorphism.
Then, focusing just on the addition operations in R and S (with
respect to which we know that R and S are abelian groups), the function
$f : (R, +) \to (S, +)$ is a group homomorphism. In particular, if $f$ is not
surjective as a ring homomorphism (for example, the natural inclusion map
Z $\to$ Q, see Example 2.97 in Chapter 2), then $f$ is not surjective as a group
homomorphism either.
Example 4.65. Let G and H be groups (see Example 4.8). Define a function
$f : G \times H \to H$ by $f(g, h) = h$.
Example 4.66. Define a function $f : S_3 \to \{1, -1\}$ (see Example 4.6) by
$f(r_1^i f_1^j) = (-1)^j$ (see Exercise 4.2.1).
Question 4.66.1. Why is this f a group homomorphism? What
is the kernel of f?
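If we encode $S_3$ as permutation tuples as in the earlier sketches, this map agrees with the familiar "sign" of a permutation (this identification, and the inversion-counting formula below, are ours; the text has not introduced signs), and its kernel can be computed directly:

    from itertools import permutations

    G = list(permutations(range(3)))

    def sign(p):
        # +1 for even permutations, -1 for odd, by counting inversions.
        inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
        return 1 if inversions % 2 == 0 else -1

    print([p for p in G if sign(p) == 1])   # three elements: the counterpart of {id, r_1, r_2}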
We now come to group isomorphisms. Just as ring isomorphisms capture
the notion that the addition and multiplication in two rings are essentially
the same without even having to divide out by any ideal, and just as vector
space isomorphisms capture the notion that the vector space operations in
two vector spaces are essentially the same without even having to divide
out by any subspace, group isomorphisms capture the notion that the group
operations in two groups are essentially the same without even having to
divide out by any normal subgroup.
As with rings and vector spaces, we need a couple of lemmas first:

Lemma 4.67. Let G and H be two groups and let $f : G \to H$ be a group
homomorphism. Then $f$ is an injective function if and only if $\ker(f)$ is the
trivial subgroup $\{1_G\}$.
Proof. The proof of this is very similar to the proof of the corresponding
Lemma 2.102 in Chapter 2; let us redo that proof in the context of groups.
Suppose $f$ is injective. Suppose that $g \in \ker(f)$, so $f(g) = 1_H$. By Lemma
4.59, $f(1_G) = 1_H$. Since both $g$ and $1_G$ map to the same element in H and
since $f$ is injective, we find $g = 1_G$. Thus, the kernel of $f$ consists of just
the element $1_G$, which is precisely the trivial subgroup. Conversely, suppose
that $\ker(f) = \{1_G\}$. Suppose that $f(g_1) = f(g_2)$ for $g_1$, $g_2$ in G. Since $f$ is
a group homomorphism, we find $f(g_1 g_2^{-1}) = f(g_1)f(g_2^{-1}) = f(g_1)(f(g_2))^{-1}$
(the last equality is because of Corollary 4.60), and of course $f(g_1)(f(g_2))^{-1} =
f(g_1)(f(g_1))^{-1} = 1_H$. Thus, $g_1 g_2^{-1} \in \ker(f)$. But $\ker(f) = \{1_G\}$, so
$g_1 g_2^{-1} = 1_G$, i.e., $g_1 = g_2$. Hence, $f$ is injective. □
Our next lemma is analogous to Lemma 2.103 of Chapter 2 and Lemma
3.88 of Chapter 3:

Lemma 4.68. Let G and H be two groups and let $f : G \to H$ be a group
homomorphism. Write $f(G)$ for the image of G under $f$. Then $f(G)$ is a
subgroup of H.

Proof. Note that $1_H \in f(G)$ by Lemma 4.59, so $f(G)$ is nonempty. We can
hence apply Lemma 4.23: given $h$ and $k$ in $f(G)$, we need to show that
$hk^{-1}$ is also in $f(G)$. By definition of being in $f(G)$, there exist $g$ and $j$ in G
such that $f(g) = h$ and $f(j) = k$. Note that $f(j^{-1}) = k^{-1}$, by Corollary 4.60.
Hence $f(gj^{-1}) = f(g)f(j^{-1}) = hk^{-1}$, showing that $hk^{-1} \in f(G)$. Hence
$f(G)$ is a subgroup of H. □
We are now ready for:

Definition 4.69. Let $f : G \to H$ be a group homomorphism. If $f$ is both
injective and surjective, then $f$ is said to be an isomorphism between G and H.
Two groups G and H are said to be isomorphic (written $G \cong H$) if there is
some function $f : G \to H$ that is an isomorphism between G and H.
Here are some examples:

Example 4.70. The function $f : S_2 \to$ (Z/2Z, +) that sends $1_{S_2}$ to $[0]_2$
and the element $(1, 2)$ (written in cycle notation) to $[1]_2$ is an isomorphism
(see Exercise 4.3.2). Verify this!
Example 4.71. The groups $S_3$ and $D_3$ are isomorphic.

Question 4.71.1. Compare their group tables on pages 160 and
168. Can you determine a function $f : S_3 \to D_3$ that effects an
isomorphism between $S_3$ and $D_3$?
Example 4.72. Recall that Remark 4.32 showed that if G is a cyclic group
of order $n$, generated by an element $g$, then $g$ also has order $n$ and that
$G = \{1, g, \dots, g^{n-1}\}$.

Exercise 4.72.1. Extend this statement to prove: If G and H are
any two cyclic groups of order $n$, then $G \cong H$.

Example 4.73. Let G be a cyclic group of order $n$ and H a cyclic group of
order $m$. If $m$ and $n$ are relatively prime, then the direct product $G \times H$ is
isomorphic to $C_{nm}$. (See Exercise 4.29.2, as also Exercise 4.86 at the end
of the chapter.)

Exercise 4.73.1. Prove this by showing first that if $G = \langle g \rangle$ and
$H = \langle h \rangle$, then $(g, h)$ must have order $mn$. Since $mn$ is also the
order of $G \times H$, $G \times H$ must equal the cyclic subgroup generated
by $(g, h)$. Now use Exercise 4.72.1 above.
Example 4.74. Recall the group G/N where $G = D_4$ and N is the subgroup
generated by $\rho^2$ (see Exercise 4.56).

Exercise 4.74.1. Prove that G/N is isomorphic to (Z/2Z, +) $\times$
(Z/2Z, +).
Finally, we have the following:

Theorem 4.75. (Fundamental Theorem of Homomorphisms of Groups)
Let $f : G \to H$ be a homomorphism of groups, and write $f(G)$ for the
image of G under $f$. Then the function $\bar{f} : G/\ker(f) \to f(G)$ defined by
$\bar{f}(g\ker(f)) = f(g)$ is well-defined, and provides an isomorphism between
$G/\ker(f)$ and $f(G)$.

Proof. The proof is similar to the proofs of the corresponding theorems for
rings (Theorem 2.110 of Chapter 2) and vector spaces (Theorem 3.94 of
Chapter 3). We first check that $\bar{f}$ is well-defined. Suppose that $g\ker(f) =
h\ker(f)$. Then $gh^{-1} \in \ker(f)$, so $f(gh^{-1}) = 1_H$. But also, $f(gh^{-1}) =
f(g)f(h^{-1}) = f(g)(f(h))^{-1}$. Thus, $1_H = f(g)(f(h))^{-1}$, so $f(g) = f(h)$.

Now we check that $\bar{f}$ is a homomorphism. We have $\bar{f}(g\ker(f)) \cdot \bar{f}(h\ker(f)) = f(g)f(h) = f(gh)$ (as $f$ is a group homomorphism). On the other
hand, $\bar{f}((gh)\ker(f)) = f(gh)$. Hence $\bar{f}$ is a group homomorphism.

We check that $\bar{f}$ is surjective: note that any $h \in f(G)$ is by definition
of the form $f(g)$ for some $g \in G$. It is clear that $\bar{f}(g\ker(f)) = f(g) = h$,
so $\bar{f}$ is surjective.

Now we check that $\bar{f}$ is injective. Suppose that $\bar{f}(g\ker(f)) = 1_H$.
Then $f(g) = 1_H$, so $g \in \ker(f)$. It follows that $g\ker(f) = \ker(f)$, i.e.,
$g\ker(f) = 1_{G/\ker(f)}$. Hence $\bar{f}$ is injective. □
Here is a quick exercise that uses this theorem:
Exercise 4.76. Let G be a group of finite order, and let $f : G \to
H$ be a surjective group homomorphism. Prove that H also has
finite order, and that the order of H divides the order of G. (Hint:
Combine Exercise 4.55 and the theorem above.)
4.5 Further Exercises
Exercise 4.77. You have seen the dihedral groups of index 3 and 4 in the text
(Examples 4.4 and 4.5). The groups $D_n$ are defined analogously for $n \geq 5$.
Determine the group table for $D_5$ and determine its center.
Exercise 4.78. We will determine the group of symmetries of the set in Example 4.92 (see Page 208). Recall from Example 3.82 in Chapter 3 that after fixing a basis of R^2, we can identify the set of all linear transformations of R^2 with M_2(R).

Let T be a linear transformation that preserves the structure of our set, and let M_T be its matrix representation with respect to, say, the standard basis i, j of R^2. Then,

    M_T = [ a  b ]
          [ c  d ]

for suitable real numbers a, b, c, and d. We will describe the conditions that a, b, c, and d must satisfy:

1. By considering the lengths of an arbitrary vector (x, y) before and after applying T, prove that (ax + by)^2 + (cx + dy)^2 = x^2 + y^2 for all x and y in R.

2. Show that this relation leads to the following necessary and sufficient conditions for M_T to represent a symmetry of our set:

   (a) a^2 + c^2 = 1,
   (b) b^2 + d^2 = 1, and
   (c) ab + cd = 0.

3. Show that the conditions in (2) above are equivalent to the condition (M_T)^t M_T = I, where I is the identity matrix, and (M_T)^t stands for the transpose of M_T. Conclude from this that any matrix that satisfies the conditions in (2) above must have determinant equal to ±1.

4. Now assume that M_T satisfies these conditions. Observe that this means that the columns of M_T are of length 1, and that the two columns are perpendicular to each other. (Such a matrix is called orthonormal.) We have thus determined the symmetries of the set in Example 4.92 to be the set of 2 × 2 orthonormal matrices with entries in R. Now prove that this set actually forms a group under matrix multiplication. This group is known as the orthogonal group of order 2 over R. You should go back and revisit Example 4.13 as well. In alignment with that example, the group in this exercise should be denoted O_2(R).
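For a numerical sanity check, here is a short Python sketch (ours; it uses NumPy, and the sample matrices are our own choices): a rotation and a reflection both satisfy A^t A = I, with determinants +1 and -1 respectively.

    import numpy as np

    theta = 0.7
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    reflection = np.array([[1.0,  0.0],
                           [0.0, -1.0]])
    for A in (rotation, reflection):
        print(np.allclose(A.T @ A, np.eye(2)), round(float(np.linalg.det(A))))
    # Should print: True 1, then True -1.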
Exercise 4.79. Here is a group that generalizes Example 4.12. Let U_n(R) denote the set of n × n upper triangular matrices with entries in R, all of whose diagonal entries equal 1. Thus, every matrix in U_n(R) can be expressed as the sum of the identity matrix I and a strictly upper triangular matrix N. We will show that U_n(R) forms a group with respect to multiplication.

1. For any matrix M in M_n(R), define the level l of M as follows: the level of the zero matrix is ∞, and for nonzero matrices M, l(M) = min{j - i | M_{i,j} ≠ 0}, where M_{i,j} stands for the (i, j) entry of M. Thus, a matrix is of level 0 if and only if it is upper triangular with at least one nonzero entry along the diagonal, and it is of level 1 if and only if it is strictly upper triangular, with at least one nonzero entry along the super diagonal (the super diagonal is the set of entries that run diagonally from the (1, 2) slot down to the (n-1, n) slot), etc. Show that l(MN) ≥ l(M) + l(N). Give an example of matrices M and N such that MN ≠ 0, and l(MN) > l(M) + l(N).

2. Conclude that any strictly upper triangular matrix N is nilpotent (see Exercise 2.122 in Chapter 2).

3. Now show using Parts (1) and (2) above that U_n(R) forms a group with respect to matrix multiplication. (You may also want to look at Exercise 2.122 in Chapter 2 for some ideas.)
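Here is a brief Python sketch (ours, using NumPy) of the level function of Part (1), checked on a 3 × 3 strictly upper triangular matrix: the levels of successive powers climb, so the third power is already the zero matrix, illustrating Part (2).

    import numpy as np

    def level(M):
        # min of j - i over the nonzero entries; infinity for the zero matrix.
        nz = np.argwhere(M != 0)
        return float('inf') if nz.size == 0 else int((nz[:, 1] - nz[:, 0]).min())

    N = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 0, 0]])
    print(level(N), level(N @ N), level(N @ N @ N))  # Should print: 1 2 inf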
Exercise 4.80. Let G be a group with an even number of elements. Prove that G has at least one nonidentity element a such that a^2 = 1. (Hint: To say that a^2 = 1 is to say that a = a^{-1}. Now pair the elements in the group suitably, and invoke the fact that the group has an even number of elements.)
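A quick Python illustration of the pairing idea (our toy example, on the group Z/10Z under addition): pairing each element with its inverse leaves exactly the elements satisfying a + a = 0 unpaired, namely the identity 0 and the nonidentity element 5.

    n = 10
    print([a for a in range(n) if (a + a) % n == 0])  # Should print: [0, 5]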
Exercise 4.81. Prove that a group G is abelian if and only if the function f : G → G that sends any g to g^{-1} is a homomorphism.
Exercise 4.82. Prove that a group G is abelian if and only if (gh)^2 = g^2 h^2 for all g and h in G.
Exercise 4.83. The discussions preceding Definition 4.43 established the following: if N is a normal subgroup of G, then the operation on the left cosets of N determined by (aN)(bN) = abN is well-defined. Prove the converse of this: if N is a subgroup of G such that the operation on the left cosets of N determined by (aN)(bN) = abN is well-defined, then N must be normal in G. (Hint: For any g ∈ G, consider the product of left cosets g^{-1}N · gN = 1_G N = N. For any n ∈ N, note that the coset (g^{-1}n)N is the same as g^{-1}N (why?). Since the product is well-defined, (g^{-1}n)N · gN should also equal N. So?)
Exercise 4.84. What is the last digit in 43^{99999}? (Hint: Work mod 2 and mod 5 separately, applying Corollary 4.41 above, and then combine the results.)
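Here is a direct machine check (ours) using Python's built-in three-argument pow, which performs fast modular exponentiation; it verifies, but does not replace, the intended pencil-and-paper argument.

    print(pow(43, 99999, 10))                    # the last digit: 7
    print(pow(43, 99999, 2), pow(43, 99999, 5))  # 1 and 2, which recombine to 7 mod 10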
Exercise 4.85. By Exercise 4.51, the center Z(G) of a group G is a normal subgroup of G. Hence, it makes sense to talk of the quotient group G/Z(G). Prove that if G/Z(G) is cyclic, then G must be abelian, and thus Z(G) must equal G.
Exercise 4.86. Let G be a cyclic group of order m and H a cyclic group of order n. Show that G × H is cyclic if and only if gcd(m, n) = 1.
Notes
Remarks on sets with structure and their symmetry groups  Recall from the text (Page 158) that a set with structure is simply a set with a certain feature that we wish to focus on, and a symmetry of such a set is a one-to-one onto map from the set to itself that preserves this feature. If f and g are two such maps, then the compositions f ∘ g as well as g ∘ f will also be one-to-one onto maps that preserve the feature. (Recall that if f : S → S and g : S → S are two functions from a set S to itself, then the composition of f and g, written f ∘ g, takes s ∈ S to f(g(s)), and similarly, g ∘ f takes s to g(f(s)).) Often, if f is such a feature-preserving map, then f^{-1} (which exists because f is a bijection) will also preserve the feature, although this is not always guaranteed. (See the remarks on Page 211 later in these notes for some examples where the inverse of a structure-preserving map is not structure-preserving.) So, if we restrict our attention to those structure-preserving maps whose inverse is also structure-preserving, then these maps constitute a group, called the symmetry group of the set with the given structure.

We consider some examples below of sets with structure and their symmetry groups:
Example 4.87. The set in question could be any set, such as {1, 2, 3}, with the feature of interest being merely the fact that it is a set. (This structure is called the trivial structure.) Of course this particular set has lots of other features (for example, each element in {1, 2, 3} corresponds to a length on a number line; see Example 4.88 below), but we do not focus on any other feature for the moment. Any one-to-one onto map from a set such as {1, 2, 3} to itself will certainly preserve the feature that {1, 2, 3} is a set, so the symmetries of a set with trivial structure are precisely the various one-to-one onto maps from the set to itself. We have already considered this group in Example 4.2; it is S_3.
Example 4.88. If we consider instead {1, 2, 3} with the feature that each element corresponds to a length on a number line (1 to the unit length, 2 to twice the unit length, and 3 to three times the unit length), then our symmetry group would be different. Any symmetry f would now have to satisfy the property that if n ∈ {1, 2, 3} corresponds to a certain length, then f(n) should also correspond to the same length. It follows immediately that f(1) has to equal 1: f(1) cannot be 2 or 3 since 1 has unit length while 2 has twice the unit length and 3 has three times the unit length. Similarly, f(2) = 2 and f(3) = 3. Hence, f can only be the identity map. Thus, the symmetry group of {1, 2, 3} with the feature that each element corresponds to a length on the number line is the trivial group consisting of just the identity.
Example 4.89. The set could be a piece of cardboard cut in the shape of an equilateral triangle, with the feature that it is a rigid object. Again, this set could have other features (for example, the cardboard could be colored in alternating horizontal strips of black and white; see Question 4.89.1 below), but we will ignore those. The symmetries of this set would be those one-to-one and onto maps f from the triangle to itself that preserve the rigidity of the triangle, i.e., that do not distort the cardboard. (Put differently, if p and q are any two points on the triangle, then the distance between p and q should be the same as the distance between f(p) and f(q).) We have seen this group before: it is the group D_3 (Example 4.4).

Question 4.89.1. Pick one edge of the triangle, and refer to its direction as the horizontal direction. Suppose the piece of cardboard of this example had, additionally, been colored in alternating horizontal strips of black and white. Suppose that the total number of strips is odd (and at least three), so that the strips along the two horizontal edges are both of the same color and there is at least one strip of the other color. What would be the symmetries of this new set? What would be the symmetries if the total number of strips were even, so that the strips along the two horizontal edges are of different color?
Example 4.90. Just as in the last example, the set could be a piece of cardboard cut in the shape of a square, with the structure that it is a rigid object. We have seen the symmetries of this set: it is D_4 (Example 4.5).

Question 4.90.1. Pick one edge of the square, and refer to its direction as the horizontal direction. Suppose the piece of cardboard of this example had, additionally, been colored in alternating horizontal strips of black and white. What would be the symmetries of this new set?
Example 4.91. The set could be R^2, and the feature of interest could be the fact that it is a vector space over the reals. What would be the symmetries of a vector space that preserve its vector space structure? In fact, what should it mean to preserve the vector space structure? A vector space is characterized by two operations: an addition operation on vectors, and a scalar multiplication operation between a scalar and a vector. We say that a map f : R^2 → R^2 preserves its vector space structure if for any two vectors v and w in R^2 and for any scalar a ∈ R, f respects these operations, i.e., if f sends v to some f(v) and w to some f(w), then f must send v + w to f(v) + f(w) and av to af(v). We have seen such maps f before in Chapter 3: f must be a linear transformation of R^2. Hence, a symmetry of R^2 with its vector space structure is a linear transformation of R^2 that is both injective and surjective.

Exercise 4.91.1. Show that if f is a one-to-one onto linear transformation of R^2, then the inverse map f^{-1} is also a linear transformation of R^2. (Hence, the inverse also preserves the vector space structure of R^2. It follows that the symmetry group of R^2 with its vector space structure is precisely the set of injective and surjective linear transformations of R^2.)
Example 4.92. This example puts further conditions on Example 4.91: We could take the set to be R^2 as before, with the structure being that it is a vector space over the reals and that each vector has a length. (Recall that the length of the vector (a, b) in R^2 is taken to be √(a^2 + b^2).) The symmetries of this set would be the one-to-one onto linear transformations of R^2 that in addition preserve the length of a vector. (See Exercise 4.78 at the end of the chapter.)
Example 4.93. This example and the next are central in Galois theory, and you may wish to postpone them for a future reading. The set could be the field Q[√2], and the structure could be that (i) Q[√2] is a field, and (ii) every element in Q[√2] satisfies a family of polynomials over the rationals.

The symmetries of Q[√2] that preserve the fact that it is a field are those one-to-one onto maps f : Q[√2] → Q[√2] that satisfy the property that for all a and b in Q[√2], f(a + b) = f(a) + f(b), f(ab) = f(a)f(b), and f(1) = 1: in other words, f must also be a ring homomorphism. (Note that once f satisfies the property that it is a ring homomorphism, the relations ab = 1 will mean that f(a)f(b) = 1, so pairs of multiplicative inverses will go to pairs of multiplicative inverses under f. Thus, the essential character of Q[√2] that gives it the structure of not just a ring but a field will automatically be preserved.) Put differently, we find that f must be a ring isomorphism from Q[√2] to Q[√2] if it is to preserve the field structure of Q[√2].
As for the second feature, we say that a one-to-one onto map f : Q[√2] → Q[√2] preserves the minimal polynomial over the rationals if for any a ∈ Q[√2], the element f(a) satisfies the same minimal polynomial over the rationals as a. Now take an arbitrary a ∈ Q (note). It will have minimal polynomial x - a (why?). Hence f(a) must also have minimal polynomial x - a if the minimal polynomial is to be preserved. In particular, f(a) must satisfy f(a) - a = 0, i.e., f(a) must equal a. Since this is true for arbitrary a ∈ Q, we find that any symmetry of Q[√2] that preserves the field structure and the minimal polynomial over the rationals must be a ring isomorphism that acts as the identity map on the rationals. Moreover, it is easy to see that any ring isomorphism from Q[√2] to Q[√2] that is the identity on the rationals necessarily preserves the minimal polynomial over the rationals of an arbitrary a ∈ Q[√2]: if a satisfies the polynomial p(x) = x^t + q_{t-1}x^{t-1} + ... + q_1 x + q_0 with coefficients in Q, then applying f to the equation a^t + q_{t-1}a^{t-1} + ... + q_1 a + q_0 = 0 and using the fact that f is a ring homomorphism that is the identity on the rationals, we find that f(a)^t + q_{t-1}f(a)^{t-1} + ... + q_1 f(a) + q_0 = 0, i.e., f(a) also satisfies p(x). Conversely, if f(a) satisfies some monic polynomial q(x) with coefficients in the rationals, then applying f^{-1}, we find that a also satisfies q(x). In particular, since a and f(a) satisfy the same monic polynomials with rational coefficients, they must both have the same minimal polynomial over the rationals.
Thus, the symmetries of Q[√2] that preserve both the field structure and the minimal polynomial of elements must be ring isomorphisms from Q[√2] to Q[√2] which act as the identity on Q. But by Exercise 2.137 in Chapter 2, any ring homomorphism from Q to Q[√2] must automatically be the identity map on the rationals, so this extra condition is not necessary. It follows that the symmetries of Q[√2] that preserve both the field structure and the minimal polynomial of elements over the rationals are precisely the set of ring isomorphisms from Q[√2] to itself.
Exercise 4.93.1. Using the ideas developed in these remarks and using Exercise 2.109.1 in Chapter 2, prove that the only non-trivial symmetry of Q[√2] with the structure above is the familiar conjugation map that sends each a + b√2 (a, b in the rationals) to a - b√2. Hence, there are precisely two symmetries of this set: the "do nothing" symmetry, that is, the identity map on Q[√2], and this conjugation map.
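If you wish to experiment before writing the proof, here is a small Python sketch (ours): representing a + b√2 as the pair (a, b) of rationals, one can check on sample elements that conjugation respects products; mul implements the multiplication rule of Q[√2], and the sample values are arbitrary choices of ours.

    from fractions import Fraction

    def mul(x, y):
        # (a + b*sqrt(2)) * (c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2)
        (a, b), (c, d) = x, y
        return (a * c + 2 * b * d, a * d + b * c)

    conj = lambda x: (x[0], -x[1])
    x = (Fraction(1), Fraction(2))        # 1 + 2*sqrt(2)
    y = (Fraction(3), Fraction(-1, 2))    # 3 - (1/2)*sqrt(2)
    print(conj(mul(x, y)) == mul(conj(x), conj(y)))  # Should print: True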
Example 4.94. (This is also from Galois theory, and as with the previous example, you may wish to postpone this for a future reading.) The set could be the field Q[√2, √3], and the structure could be that (i) Q[√2, √3] is a field (see Exercise 2.119 in Chapter 2), and (ii) every element in Q[√2, √3] satisfies a family of polynomials over the field Q[√2].

The same considerations as in Example 4.93 above apply: The symmetries of Q[√2, √3] that preserve the field structure are precisely the set of ring isomorphisms from Q[√2, √3] to itself. The symmetries which also preserve the minimal polynomial of elements over Q[√2] can be determined exactly as above: these must be the ring isomorphisms from Q[√2, √3] to itself that act as the identity on Q[√2]. (Note that unlike the previous example, it is not true that every ring isomorphism from Q[√2, √3] to itself acts as the identity on Q[√2].)
Exercise 4.94.1. Using the ideas developed in these remarks and using Exercise 2.138 in Chapter 2, prove that the only non-trivial symmetry of Q[√2, √3] with the structure above is the map that sends a + b√2 + c√3 + d√6 to a + b√2 - c√3 - d√6 (here a, b, c, and d are rational numbers). Thus, there are precisely two symmetries of this set. Note, however, that Exercise 2.138 of Chapter 2 shows that there are other ring isomorphisms from Q[√2, √3] to itself: these however do not act as the identity on Q[√2].
Remarks on Exercise 4.3.7  Here is how you may start this exercise: Let j be an integer in the set {1, 2, ..., n}. Let us consider the case where j is one of the b's, say j = b_k for some k. Then t(j) = b_{k+1}, where the subscript is taken modulo e (the length of the cycle t) so as to lie in the set {1, 2, ..., e}. Hence st(j) would be s(b_{k+1}). Now, because s and t are disjoint cycles, b_{k+1} will not appear among the a's, and hence s(b_{k+1}) would equal b_{k+1}. Now work out ts(j) for this particular case. Next consider the case where j is not one of the b's and work out the details.
Remarks on structure-preserving maps forming a group  It is not always true that the inverse of a structure-preserving map also preserves the structure. Typically this is so, but occasionally this is not the case. It is for this reason that, when viewing the symmetries as a group, we only consider symmetries of a set whose inverse also preserves the given structure. Here is an example:

We consider the real numbers with their differentiable structure. What this means is that there exists a notion of differentiability of functions R → R. A symmetry of R with its differentiable structure would be a one-to-one onto function f : R → R that preserves this differentiable structure. This means that f should satisfy the condition that for any differentiable map g : R → R, the composite g ∘ f must also be differentiable. A necessary and sufficient condition for this to happen is that f itself must be differentiable. It is now easy to find bijections f : R → R that are differentiable but whose inverse is not differentiable. One example is the function f(x) = x^3. It is differentiable at all values of x, but its inverse function f^{-1}(x) = x^{1/3} fails to be differentiable at x = 0.
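A numerical sketch (ours) of this phenomenon: the difference quotients of f(x) = x^3 at 0 shrink to 0, while those of the inverse x^{1/3} grow without bound as h approaches 0.

    for h in (1e-2, 1e-4, 1e-6):
        print(h, h**3 / h, h**(1/3) / h)
    # The middle column tends to 0; the last column blows up like h**(-2/3).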
Remarks on orthogonal groups  Orthogonal groups come in more guises than the one we have described in Example 4.13. Recall the origins of the n = 2 case over R that we exhibited in Exercise 4.78: the group O_2(R) is the set of symmetries of R^2 with the structure that it is a vector space over the reals and that every vector has a length (Example 4.92). Now let us examine length more closely. The length of a vector pi + qj is defined to be √(p^2 + q^2). Temporarily ignoring the square root (we will put it back later), the squared length of a general vector xi + yj is thus given by x^2 + y^2. This is an example of a quadratic form (a polynomial all of whose monomials are of degree 2) in two variables. Now note that the polynomial x^2 + y^2 can be written as

    (x, y) [ 1  0 ] (x, y)^t
           [ 0  1 ]

(Here, recall from elementary linear algebra that the product of a row vector (s, t) and a column vector (p, q)^t is given by sp + tq. Thus, since the identity matrix times (x, y)^t is just (x, y)^t, the product above becomes (x, y)(x, y)^t = x^2 + y^2, as claimed.)
Mathematicians have found it useful to define length differently as well (we will see a famous example of this ahead). More generally, let q = ax^2 + 2bxy + cy^2 be any quadratic form with coefficients a, b, c from the reals. (It is convenient to write the coefficient of xy as 2b.) Then, q may be written as

    (x, y) [ a  b ] (x, y)^t
           [ b  c ]

(Check this! Notice how the fact that we wrote the coefficient of xy as 2b allows us to write the (1, 2) and (2, 1) entries of the matrix above as b. Had we taken the coefficient of xy as b, then these entries would have had to be b/2.) Using this quadratic form, we define the q-length of a vector pi + qj as √(ap^2 + 2bpq + cq^2). (The length may well turn out to be imaginary, since the quantity under the square root sign may be negative, but that only makes matters more interesting!) Moreover, we define the q-dot product of two vectors si + tj and pi + qj to be asp + b(sq + tp) + ctq.

Writing M_q for the matrix above, we find that the q-length of pi + qj is given by the square root of

    (p, q) [ a  b ] (p, q)^t,
           [ b  c ]

and the q-dot product of si + tj and pi + qj is given by

    (s, t) [ a  b ] (p, q)^t.
           [ b  c ]

The matrix M_q allows us to compute q-lengths and q-dot products; notice that M_q is a symmetric matrix (the (1, 2) entry and (2, 1) entry are equal).
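Here is a small Python sketch (ours, using NumPy) that computes q-lengths and q-dot products through M_q, for the sample form q = x^2 + 4xy + y^2 (so a = 1, 2b = 4, c = 1); the form and the vectors are our own choices.

    import numpy as np

    M_q = np.array([[1, 2],
                    [2, 1]])
    v = np.array([1, 1])    # the vector i + j
    w = np.array([1, -1])   # the vector i - j
    print(v @ M_q @ v)      # 6: the squared q-length of v
    print(v @ M_q @ w)      # 0: v and w are q-orthogonal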
Given an arbitrary quadratic form q in two variables with coefficients in the reals, we may now consider the symmetries of R^2 with q-structure: this is the structure that R^2 is a vector space over the reals, and that every vector has a q-length. The symmetries then would be the one-to-one onto linear transformations of R^2 that in addition preserve q-length. These symmetries form a group that we will denote O_2(R, q). It is called the orthogonal group of q over R.
Exercise 4.95. Prove that O_2(R, q) consists of those 2 × 2 matrices A with entries in R satisfying A^t M_q A = M_q.
Exercise 4.96. Given a one-to-one onto linear transformation T, let us say that it satisfies Property (1) if the q-length of T(v) is the same as the q-length of v for all v in R^2. Let us say that it satisfies Property (2) if the q-dot product of T(v) and T(w) is the same as the q-dot product of v and w, for all v and w in R^2. Show that T satisfies Property (1) if and only if it satisfies Property (2).
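Here is a Python sketch (ours) of Exercise 4.95 for the sample form q = x^2 - y^2, so that M_q = diag(1, -1): the hyperbolic matrix below satisfies A^t M_q A = M_q, and so lies in O_2(R, q), even though A^t A is not the identity.

    import numpy as np

    t = 0.9
    A = np.array([[np.cosh(t), np.sinh(t)],
                  [np.sinh(t), np.cosh(t)]])
    M_q = np.diag([1.0, -1.0])
    print(np.allclose(A.T @ M_q @ A, M_q))  # True: A preserves q-lengths
    print(np.allclose(A.T @ A, np.eye(2)))  # False: A is not in O_2(R)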
More generally, an arbitrary quadratic form q in n variables x_1, x_2, ..., x_n over a field F is a polynomial in these variables with coefficients in F, all of whose monomials are of degree 2. As long as 2 ≠ 0 in this field (so we rule out fields like Z/2Z), we may form a symmetric n × n matrix M_q as above, where the entries in the slots (i, j) and (j, i) both equal half the coefficient of x_i x_j in the quadratic form q. (We have to impose the 2 ≠ 0 condition, because otherwise, we would not be able to divide by 2!) The set of n × n matrices A satisfying A^t M_q A = M_q forms a group O_n(F, q), referred to as the orthogonal group of q over F.
Perhaps the most famous example of the length of vectors in R^n being measured by quadratic forms other than x_1^2 + x_2^2 + ... + x_n^2 is given by Einstein's theory of relativity. There, space-time is considered as a four-dimensional space, and the length of the vector (t, x, y, z)^t, where t is the time coordinate and x, y, and z are the usual spatial coordinates, is given by √(t^2 - x^2 - y^2 - z^2). (Actually, this is a drastic simplification: space-time is not really a vector space but a four-dimensional manifold, and the length formula above applies on the tangent spaces, which are actual vector spaces, but that is too mathematically advanced for now.) This quadratic form t^2 - x^2 - y^2 - z^2 has associated symmetric matrix

    [ 1  0  0  0 ]
    [ 0 -1  0  0 ]
    [ 0  0 -1  0 ]
    [ 0  0  0 -1 ]

The associated orthogonal group is called the Lorentz group.
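As a final sketch (ours, with an arbitrarily chosen boost parameter r), one can check numerically that a standard boost matrix preserves this quadratic form, i.e., satisfies B^t M B = M for M = diag(1, -1, -1, -1), and hence belongs to the Lorentz group.

    import numpy as np

    M = np.diag([1.0, -1.0, -1.0, -1.0])
    r = 0.5
    B = np.eye(4)
    B[0, 0] = B[1, 1] = np.cosh(r)
    B[0, 1] = B[1, 0] = np.sinh(r)
    print(np.allclose(B.T @ M @ B, M))  # Should print: True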
Appendix A
Sets, Functions, and Relations
We review here some basic notions that you would have seen in an earlier course
on proofs or on discrete mathematics.
A set is simply a collection of objects. We are of course being informal here: there are more formal definitions of sets that are based on various axioms designed to avoid paradoxes, but we will not go into such depths in this appendix. If A is a set, the objects whose collection makes up the set A are also referred to as the elements of A. You will be familiar with both notations for sets: the explicit notation, such as A = {2, 3, 5, 7}, as well as the implicit or set-builder notation, such as A = {n | n is a prime integer between 2 and 10}. You will also be familiar with the notation ∈ for "element of."

If A and B are two sets, we say A is a subset of B (written A ⊆ B) if x ∈ A implies x ∈ B. If A ⊆ B and B ⊆ A, we say A = B. If A ⊆ B but A ≠ B, we say that A is a proper subset of B, and we write A ⊊ B.

The union of two sets A and B, denoted A ∪ B, is simply the set {x | x ∈ A or x ∈ B}. The intersection of two sets A and B, denoted A ∩ B, is the set {x | x ∈ A and x ∈ B}. The difference of two sets A and B, denoted A - B, is the set {x | x ∈ A and x ∉ B}. (Note that in general, A - B ≠ B - A.)
A function f from A to B (written f : A → B) is a rule that assigns to each element of A a unique element of B. A function f : A → B is called injective or one-to-one if f(a_1) = f(a_2) for a_1, a_2 ∈ A implies that a_1 = a_2 (or alternatively, if a_1 ≠ a_2, then f(a_1) ≠ f(a_2)). A function f : A → B is called surjective or onto if for each b ∈ B, there exists a ∈ A such that f(a) = b. A function f : A → B that is both injective and surjective is said to be bijective; also, f is said to provide a bijection or a one-to-one correspondence between A and B.
Example A.1. Consider the following functions from the integers to itself:

1. f(n) = 2n.

2. g(n) = n if n is odd, and g(n) = n/2 if n is even.

3. h(n) = n^2 + 1.

4. b(n) = n + 1.

Then f is injective but not surjective, g is surjective but not injective, h is neither injective nor surjective, and b is bijective.
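A small Python sketch (ours) can make these claims concrete on a finite window of integers; since injectivity and surjectivity are statements about all of Z, this only illustrates, for instance, that g(1) = g(2) = 1 witnesses the failure of injectivity for g.

    f = lambda n: 2 * n
    g = lambda n: n if n % 2 else n // 2
    h = lambda n: n * n + 1
    b = lambda n: n + 1
    window = range(-50, 51)
    for name, fn in [("f", f), ("g", g), ("h", h), ("b", b)]:
        images = [fn(n) for n in window]
        print(name, "one-to-one on this window:", len(images) == len(set(images)))
    # Should print True, False, False, True respectively.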
The Cartesian Product of two sets A and B, denoted A × B, is simply the set of all ordered pairs {(a, b) | a ∈ A, b ∈ B}. A relation on a set A is simply a subset of A × A. Let R be a relation on a set A. If (a, b) ∈ R, we say a is related to b, and we often write a R b to indicate that a is related to b under the relation R. The relation R is said to be reflexive if for each a ∈ A, a R a. R is said to be symmetric if whenever a R b, then b R a as well. Finally, R is said to be transitive if whenever a R b and b R c, then a R c as well.

A relation R on a set A that is reflexive, symmetric, and transitive is called an equivalence relation on A. For any a ∈ A, let us write [a] for the set of all elements of A that are related to a, that is, [a] = {b | a R b}. The set [a] is called the equivalence class of a. We have the following: if R is an equivalence relation on A, then for any two elements a and b in A, either [a] = [b] or else [a] and [b] are disjoint. In particular, this means that the equivalence classes divide A into disjoint sets of the form [a], whose union is all of A.

The symbol ∼ is often used instead of R to denote a relation on a set.
Example A.2. The easiest and most central example perhaps of an equivalence relation on a set is the relation on Z defined by saying that m is related to n (or m ∼ n) iff m - n is even. Convince yourself that this relation is indeed an equivalence relation, and that there are precisely two equivalence classes: the class [0] and the class [1].
A binary operation on a set A is simply a function f : A × A → A. As we have seen, the usual operations of addition and multiplication in, for example, the integers are just binary operations on Z, that is, functions Z × Z → Z.

Question A.3. Is division a binary operation on the rationals? How about on the set Q - {0}?
A set A is said to be countable if there exists a one-to-one correspondence between A and some subset of N. If no such correspondence exists, then A is said to be uncountable. If there exists a one-to-one correspondence between A and the subset {1, 2, ..., n} of Z (for some n), then A is said to be finite. If no such n ∈ Z exists for which there is a one-to-one correspondence between A and {1, 2, ..., n}, then A is said to be infinite. Note that an infinite set can be either countable or uncountable.
Example A.4. Any set with a finite number of elements is countable, by definition of finiteness and countability.

Example A.5. Any subset of a countable set is also countable.

Example A.6. The integers are countable. One one-to-one correspondence between Z and N is the one that sends a to 2a if a ≥ 0, and a to 2(-a) - 1 if a < 0.
Example A.7. The Cartesian product of two countable sets is countable. Here is a sketch of a proof when both A and B are infinite. There exists a one-to-one correspondence between A and N (why?), and in turn, there exists a one-to-one correspondence between N and the set {2^n | n ∈ N}. Composing, we get a one-to-one correspondence f between A and the set {2^n | n ∈ N}. Similarly, we have a one-to-one correspondence g between B and the set {3^n | n ∈ N}. Now define the map h : A × B → N by h(a, b) = f(a)g(b), and show that h is one-to-one, so that A × B is in one-to-one correspondence with a subset of N.
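Here is the heart of that argument on a finite window, in a short Python sketch (ours), specialized to the correspondences f(a) = 2^a and g(b) = 3^b on natural-number inputs: no two pairs collide, by unique factorization, so h is one-to-one.

    window = range(10)
    values = [2**a * 3**b for a in window for b in window]
    print(len(values) == len(set(values)))  # Should print: True (no collisions)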
Example A.8. The rationals Q are countable. This is because we may view Q ⊆ Z × Z by identifying the rational number a/b, written in lowest terms, with the ordered pair (a, b). By Example A.7 above, Z × Z is countable, and hence by Example A.5, Q is also countable.

Example A.9. The reals R are uncountable. The proof of this is the famous Cantor diagonalization argument.
Appendix B
Partially Ordered Sets, and Zorn's Lemma
A Partial Order ≤ on a set S is a relation on S that is reflexive, antisymmetric (i.e., a ≤ b and b ≤ a imply that a = b), and transitive. Here are two examples:
Example B.1. Define a relation ≤ on the positive integers by the rule m ≤ n if and only if m divides n. Since m|m for all positive integers m, ≤ is reflexive. Since m|n and n|m imply m = n (recall that we are only allowing positive integers in our set), our relation ≤ is indeed antisymmetric. Finally, if m|n and n|q, then indeed m|q, so ≤ is transitive.
Example B.2. Let S be a nonempty set, and write T for the set of all proper subsets of S. Define a relation ≤ on T by defining X ≤ Y if and only if X ⊆ Y. You should be able to verify easily that ≤ is a partial order on T.

This partial order could also have been defined on the set of all subsets of S; we chose to define it only on the set of proper subsets to make the situation more interesting (see Example B.4 ahead, for instance)!
Given a partial order ≤ on a set, two elements x and y are said to be comparable if either x ≤ y or y ≤ x. If neither x ≤ y nor y ≤ x, then x and y are said to be incomparable. For instance, in Example B.1, 2 and 3 are incomparable, since neither 2|3 nor 3|2. Similarly, in the set of all proper subsets of, say, the set {1, 2, 3}, the subsets {1, 2} and {1, 3} are incomparable, since neither of these sets is a subset of the other.
Given a partial order ≤ on a set S, and given a subset A of S, an upper bound of A is an element z ∈ S such that x ≤ z for all x ∈ A.

Example B.3. In Example B.1, if we take A to be the set {1, 2, 3, 4, 5, 6}, then lcm(1, 2, 3, 4, 5, 6) = 60 is an upper bound for A.

Note that not all subsets of S need have an upper bound. For instance, if we take B in this same example to be the set of all powers of 2, then there is no integer divisible by 2^m for all values of m, so B will not have an upper bound.
Given a partial order ≤ on a set S, a maximal element in S is an element x such that for any other element y, either y ≤ x or else x and y are incomparable.

Example B.4. In Example B.2, suppose we took S = {1, 2, 3}, so

    T = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}}.

Then {1, 2} is maximal: each of ∅, {1}, {2}, and {1, 2} is ≤ {1, 2}, while {1, 3} and {2, 3} cannot be compared with {1, 2}.

Of course, these same arguments show that {1, 3} and {2, 3} are also maximal elements.

Note that if, instead, we had taken T to be the set of all subsets of {1, 2, 3}, then there would only have been one maximal element, namely {1, 2, 3}, and all other subsets X would have satisfied X ≤ {1, 2, 3}. Having several maximal elements incomparable to one another is certainly a more intriguing situation!
A partial order ≤ on a set that has the further property that any two elements are comparable is called a linear order. For example, the usual order relation ≤ on R is a linear order.

Given a partial order ≤ on a set S, a chain in S is a nonempty subset A of S that is linearly ordered with respect to ≤, i.e., for all x and y from A, either x ≤ y or y ≤ x.
Example B.5. In Example B.3, note that B is a chain, since every element of B is a power of 2, and given elements 2^m and 2^n in B, if m ≤ n then 2^m | 2^n, else 2^n | 2^m. On the other hand, A is not a chain: we have already seen that 2 and 3 are incomparable.
Zorn's Lemma, in spite of its name, is really not a lemma, but a universally accepted axiom of logic. It states the following:

Zorn's Lemma: Let S be a nonempty set with a partial order ≤. If every chain in S has an upper bound in S, then S has a maximal element.

Zorn's Lemma is equivalent to certain other axioms of logic, most famously, to the Axiom of Choice. What this means is that if one were to accept the statement of Zorn's Lemma as a fundamental axiom of logic, then in conjunction with other accepted axioms of logic, one can derive the statement of the Axiom of Choice. Conversely, if one were to accept the Axiom of Choice as a fundamental axiom of logic, then in conjunction with other accepted axioms of logic, one can derive the statement of Zorn's Lemma.
Here is a typical application of Zorn's Lemma. Recall from Exercise 2.135 of Chapter 2 the definition of maximal ideals.

Theorem B.6. Let R be a ring. Then R contains maximal ideals.

Proof. Let S be the set of all proper ideals of R. Note that S is nonempty, since the zero ideal {0} is in S. We define a partial order ≤ on S by I ≤ J if and only if I ⊆ J (see Example B.2 above). Let T be a chain in S. Recall what this means: T is a collection of proper ideals of R such that if I and J are in the collection, then either I ⊆ J or else J ⊆ I. We claim that T has an upper bound in S, i.e., there exists a proper ideal K in R such that I ⊆ K for all I in our chain T. The proof of the claim is simple. By the definition of being a chain, T is nonempty, so T contains at least one ideal of R. We define K, as a set, to be the union of all the ideals I in T. We need to show that K is a proper ideal of R. This is easy. Note that since there is at least one ideal in T, and since this ideal contains 0, K must be nonempty, as it must contain at least the element 0. Now given a and b in K, note that a must live in some ideal I in T and b must live in some ideal J in T, since K is, after all, the union of all the ideals in T. Since T is linearly ordered (this is where the property that chains are linearly ordered comes in), either I ⊆ J or else J ⊆ I. Say I ⊆ J. Then both a and b are in J. Hence, a + b is also in J, as J is an ideal. Since J in turn is contained in K, we find a + b ∈ K. This shows that K is closed under addition. Now given any a ∈ K, as before, a ∈ I for some ideal I in T. Since I is an ideal, both ar and ra are in I for all r ∈ R. Since I ⊆ K, we find ar and ra are in K. By Lemma 2.67 of Chapter 2, we find K is an ideal. Of course, K is clearly an upper bound for T, since I ⊆ K for all I in T by the very manner in which we have defined K.

Note that indeed K is a proper ideal of R, i.e., K is in S. For, if not, then K = R, so in particular, this means that 1 ∈ K. Since K is the union of the ideals in T, we find 1 ∈ I for some ideal I in T. But this is a contradiction, since I is a proper ideal of R (remember that the set S was defined as the set of all proper ideals of R, and I is a member of S).

Since T was arbitrary, we have found that every chain in S has an upper bound in S. By Zorn's Lemma, S has a maximal element. But a maximal element of S is precisely a maximal ideal of R! □
Now we will present the proof that bases exist in all vector spaces, not just in those with a finite spanning set; this proof invokes Zorn's Lemma. Recall that we can assume that our vector space is nontrivial, thanks to Example 3.35 of Chapter 3.
Theorem B.7. Every vector space has a basis.

Proof. Let S be the set of all linearly independent subsets of V. Since V is not trivial by assumption, it has at least one nonzero vector, say v, and the set {v} is then linearly independent (Exercise 3.22.1). It follows that S is a nonempty set.

Define a partial order ≤ on S by declaring, for any two linearly independent subsets X and Y, that X ≤ Y if and only if X ⊆ Y. It is easy to check that this is indeed a partial order: First, given any linearly independent subset X of V, clearly X ⊆ X, so indeed X ≤ X. Next, if X and Y are two linearly independent subsets of V and if X ≤ Y and Y ≤ X, this means that X ⊆ Y and Y ⊆ X, so indeed X = Y. Finally, if X ≤ Y and Y ≤ Z for three linearly independent subsets X, Y, and Z of V, then this means that X ⊆ Y ⊆ Z, i.e., X ⊆ Z, so indeed X ≤ Z.

Our strategy will be to first establish that S has a maximal element with respect to this partial order, and then to show that this maximal element must be a basis for V.

Given any chain T in S (recall that this means that T consists of linearly independent subsets of V with the property that if X and Y are in T, then either X ⊆ Y or Y ⊆ X), we will show that T has an upper bound in S. Write K for the union of all linearly independent subsets X that are contained in T. We claim that K is an upper bound for T. Let us first show that K is a linearly independent subset of V. By Definition 3.22 of Chapter 3, we need to show that every finite subset of K is linearly independent. Given any finite set of vectors v_1, ..., v_n from K, note that each v_i must live in some linearly independent subset X_i in the chain T. Since T is a chain, the subsets in T are linearly ordered (this is where we use the defining property that the elements of a chain are linearly ordered!), so we must have X_{i_1} ⊆ X_{i_2} ⊆ ... ⊆ X_{i_n} for some permutation (i_1, i_2, ..., i_n) of the integers (1, 2, ..., n). Thus, all the vectors v_1, ..., v_n belong to X_{i_n}. But since X_{i_n} is a linearly independent set, Definition 3.22 of Chapter 3 implies that the vectors v_1, ..., v_n must be linearly independent! Since this is true for any finite set of vectors in K, we find that K is a linearly independent set. In particular, K is in S.

Now note that given any linearly independent subset X contained in the chain T, we have X ⊆ K by the very definition of K, so by definition of the order relation, X ≤ K. This shows that indeed T has an upper bound in S.

By Zorn's Lemma, S has a maximal element, call it B. We will show that B must be a basis of V. Since B is already linearly independent, we only need to show that B spans V. So let v be any nonzero vector in V: we need to show that v can be written as a linear combination of elements of B. If v is already in B, there is nothing to prove (why?). If v is not in B, B ∪ {v} must be linearly dependent, otherwise B ∪ {v} would be a linearly independent subset of V strictly containing B, violating the maximality of B. Thus, there exists a relation f_0 v + f_1 b_1 + f_2 b_2 + ... + f_k b_k = 0 for some scalars f_0, f_1, ..., f_k (not all zero), and some vectors b_1, b_2, ..., b_k of B. Notice that f_0 ≠ 0, since otherwise our relation would read f_1 b_1 + f_2 b_2 + ... + f_k b_k = 0 (with not all f_i equal to zero), which is impossible since the b_i are in B and B is a linearly independent set. Therefore, we can divide by f_0 to find v = -(f_1/f_0)b_1 - (f_2/f_0)b_2 - ... - (f_k/f_0)b_k. Hence v can be written as a linear combination of elements of B, so B spans V.

Thus, B is a basis of V. □
Remarks on Proposition 3.37, Chapter 3: Shrinking infinite spanning sets down to a basis  The proof that any spanning set of V can be shrunk to a basis, even when V is infinite-dimensional, involves a modification of the proof of Theorem B.7.

Let us use Σ to denote the given spanning set of V, and as in the proof of Theorem B.7, let S denote the set of all linearly independent sets of V that are contained in Σ. (The italicized condition is where we depart from the proof of Theorem B.7.) Note that S is not empty, since Σ is nonempty (recall V is not the trivial space), and therefore, for any nonzero v ∈ Σ, {v} will be a linearly independent set, so {v} will be an element of S.

Now impose the same partial order on S as in the proof of Theorem B.7: X ≤ Y if and only if X ⊆ Y for two sets X and Y in S. Argue exactly as in that proof that S must have a maximal element. (Note that if T is a chain in S, then K, the union of all the sets contained in T, will also be contained in Σ, since every set in T is contained in Σ.) Let B be a maximal element of S. (Note that by construction B ⊆ Σ.) The claim is that B is a basis for V.

To prove this it is of course sufficient to prove that B spans V, since B is already linearly independent. For this, we claim that it is sufficient to show that every vector in Σ is expressible as a linear combination of elements of B. For, assume that we have shown this. Then, given any vector v ∈ V, first write it as v = f_1 u_1 + ... + f_n u_n for suitable vectors u_i ∈ Σ and scalars f_i, invoking the fact that Σ spans V. Next, since we would have shown that every vector in Σ is expressible as a linear combination of elements in B, we find that each u_i is expressible as u_i = f_{i,1} b_{i,1} + ... + f_{i,n_i} b_{i,n_i} for some vectors b_{i,j} ∈ B and scalars f_{i,j}. Substituting these expressions for each u_i into the expression above for v, we find that v is expressible as a linear combination of elements of B, i.e., that B spans V.

To show that every vector in Σ is expressible as a linear combination of elements of B, assume that some u ∈ Σ is not expressible as a linear combination of elements of B. Then, exactly as in the proof of Proposition 3.49 (see how we showed C_1 = C ∪ {v_{t+1}} must be linearly independent), we would find that B ∪ {u} is linearly independent. But this contradicts the maximality of B! Hence every vector in Σ must be expressible as a linear combination of elements of B, which means that B must be a basis. Since B ⊆ Σ, we have succeeded in shrinking Σ down to a basis.
Remarks on Proposition 3.49, Chapter 3: the general case  The proof of this proposition when V is not assumed to be finite-dimensional involves just a minor modification of the proof of Theorem B.7. What we need to show is that there is a maximal linearly independent subset B of V that contains C. Then, exactly as in the proof of Theorem B.7, this maximal linearly independent set would be a basis of V, and of course, it would have been chosen so as to contain C. To show the existence of B, we need to consider the set S of all linearly independent subsets of V that contain C. One would impose a partial order on this set exactly as in the proof of Theorem B.7. Once again, S, with this partial order, will turn out to satisfy the extra hypothesis of Zorn's Lemma, and will hence have a maximal element. That maximal element would be our desired maximal linearly independent subset of V that contains C.
Appendix C
GNU Free Documentation License
Version 1.3, 3 November 2008
Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
<http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
The "publisher" means any person or entity that distributes copies of the Document to the public.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text
and in their titles. Section numbers or the equivalent are not considered part
of the section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties (for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard).
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled History in the
various original documents, forming one section Entitled History; likewise com-
bine any sections Entitled Acknowledgements, and any sections Entitled Dedi-
cations. You must delete all sections Entitled Endorsements.
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this License in the
various documents with a single copy that is included in the collection, provided that
you follow the rules of this License for verbatim copying of each of the documents
in all other respects.
You may extract a single document from such a collection, and distribute it
individually under this License, provided you insert a copy of this License into the
extracted document, and follow this License in all other respects regarding verbatim
copying of that document.
7. AGGREGATION WITH INDEPENDENT
WORKS
A compilation of the Document or its derivatives with other separate and inde-
pendent documents or works, in or on a volume of a storage or distribution medium,
is called an aggregate if the copyright resulting from the compilation is not used
to limit the legal rights of the compilations users beyond what the individual works
permit. When the Document is included in an aggregate, this License does not ap-
ply to the other works in the aggregate which are not themselves derivative works
of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the
Document, then if the Document is less than one half of the entire aggregate,
the Documents Cover Texts may be placed on covers that bracket the Document
within the aggregate, or the electronic equivalent of covers if the Document is in
electronic form. Otherwise they must appear on printed covers that bracket the
whole aggregate.
235
8. TRANSLATION
Translation is considered a kind of modication, so you may distribute transla-
tions of the Document under the terms of section 4. Replacing Invariant Sections
with translations requires special permission from their copyright holders, but you
may include translations of some or all Invariant Sections in addition to the original
versions of these Invariant Sections. You may include a translation of this License,
and all the license notices in the Document, and any Warranty Disclaimers, provided
that you also include the original English version of this License and the original
versions of those notices and disclaimers. In case of a disagreement between the
translation and the original version of this License or a notice or disclaimer, the
original version will prevail.
If a section in the Document is Entitled Acknowledgements, Dedications, or
History, the requirement (section 4) to Preserve its Title (section 1) will typically
require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as
expressly provided under this License. Any attempt otherwise to copy, modify,
sublicense, or distribute it is void, and will automatically terminate your rights
under this License.
However, if you cease all violation of this License, then your license from a par-
ticular copyright holder is reinstated (a) provisionally, unless and until the copy-
right holder explicitly and nally terminates your license, and (b) permanently, if
the copyright holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated per-
manently if the copyright holder noties you of the violation by some reasonable
means, this is the rst time you have received notice of violation of this License
(for any work) from that copyright holder, and you cure the violation prior to 30
days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses
of parties who have received copies or rights from you under this License. If your
rights have been terminated and not permanently reinstated, receipt of a copy of
some or all of the same material does not give you any rights to use it.
236 APPENDIX C. GNU FREE DOCUMENTATION LICENSE
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU
Free Documentation License from time to time. Such new versions will be similar
in spirit to the present version, but may dier in detail to address new problems or
concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the
Document species that a particular numbered version of this License or any later
version applies to it, you have the option of following the terms and conditions
either of that specied version or of any later version that has been published (not
as a draft) by the Free Software Foundation. If the Document does not specify a
version number of this License, you may choose any version ever published (not as
a draft) by the Free Software Foundation. If the Document species that a proxy
can decide which future versions of this License can be used, that proxys public
statement of acceptance of a version permanently authorizes you to choose that
version for the Document.
11. RELICENSING
Massive Multiauthor Collaboration Site (or MMC Site) means any World
Wide Web server that publishes copyrightable works and also provides prominent
facilities for anybody to edit those works. A public wiki that anybody can edit is
an example of such a server. A Massive Multiauthor Collaboration (or MMC)
contained in the site means any set of copyrightable works thus published on the
MMC site.
CC-BY-SA means the Creative Commons Attribution-Share Alike 3.0 license
published by Creative Commons Corporation, a not-for-prot corporation with a
principal place of business in San Francisco, California, as well as future copyleft
versions of that license published by that same organization.
Incorporate means to publish or republish a Document, in whole or in part,
as part of another Document.
An MMC is eligible for relicensing if it is licensed under this License, and if
all works that were rst published under this License somewhere other than this
MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no
cover texts or invariant sections, and (2) were thus incorporated prior to November
1, 2008.
237
The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009, provided the
MMC is eligible for relicensing.
ADDENDUM: How to use this License for your
documents
To use this License in a document you have written, include a copy of the
License in the document and put the following copyright and license notices just
after the title page:
Copyright YEAR YOUR NAME. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.3 or any later version pub-
lished by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the license is
included in the section entitled GNU Free Documentation License.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace
the with . . . Texts. line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other combination
of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend
releasing these examples in parallel under your choice of free software license, such
as the GNU General Public License, to permit their use in free software.
238 APPENDIX C. GNU FREE DOCUMENTATION LICENSE
Index
Active learning, xi
Bernstein polynomials, 152
Binary operation, 24, 28, 95
Chain, 220
Closure
under addition, 41
under multiplication, 41
Division algorithm, 4, 83
Euclid, 15
Eulers -function, 18
Fermats Little Theorem, 18, 192
Field, 48
examples, 48
eld extension, 51
nite, 50
multiplicative group, 48
Galois theory, 208, 210
Greatest common divisor, 6, 20
Group, 25, 157, 158
Gl
n
(F), 175
Sl
n
(F), 176
abelian, 26
center, 172
cyclic group, 173, 183
Dihedral group
D
3
, 165
D
4
, 169
direct product, 174
homomorphism, 198
Fundamental Theorem, 203
kernel, 199
isomorphism, 202
examples, 202
Lorentz group, 213
nonabelian, 161
normal subgroup, 194
order, 187
order of element, 183
Orthogonal group, 178, 204
orthogonal group
of a quadratic form, 212
quotient group, 197
subgroup, 180
coset, 187
subgroup generated by element, 182
symmetric group
S
2
, 162
S
3
, 159
S
n
, 161
d-cycle, 162
symmetry group of set with struc-
ture, 158, 206
239
240 INDEX
table, 159, 160
upper triangular invertible matrices,
176
Harmonic series, 19
Ideal, 52
coset with respect to, 57
examples, 54
ideal generated by a set, 56
principal ideal, 56
Integers, 1
addition, 25
composite, 9
division algorithm, 4
divisor, 3
greatest common divisor, 6, 20
least common multiple, 18
linear combination, 6
multiple, 3
multiplication, 26
prime, 9
relatively prime, 9
unique prime factorization, 11
Least common multiple, 18
Matrices
over R, 32
over arbitrary ring, 33
strictly upper triangular, 45
upper triangular, 44
Natural numbers, 2
Number system, 28
Partial Order, 219
Polynomial
Bernstein, 152
expression, 91
Polynomial expression, 91
Polynomials
division algorithm, 83
Prime, 9
innitely many, 15
Prime Number Theorem, 10
Principal Ideal Domain, 83
quadratic form, 211
Ring, 28
center, 80
commutative, 29
examples, 29
homomorphism, 63
examples, 67
Fundamental Theorem, 75
kernel, 66
integral domains, 46
invertible element, 47
irreducible element, 81
isomorphism, 63, 72
automorphism, 74
examples, 72
nilpotent element, 79
noncommutative, 29
quotient ring, 57, 60
examples, 62
ring extension, 41
unit, 47
zero-divisors, 45
Spanning set, 105
INDEX 241
redundancy, 106
Subeld, 51
Subring, 41
examples, 43
generated by an element, 90
examples, 92
test, 42
Subspace
test, 126
Unique prime factorization, 11
Vector Space
linear transformations
Fundamental Theorem, 146
Vector space, 96
basis, 111
examples, 111
basis vectors, 111
dimension, 120
examples, 97
isomorphism, 145
linear combination, 104
linear transformations, 133
kernel, 136
matrix representation, 137
linearly dependent, 109
linearly independent, 109
quotient space, 125, 129, 131
scalar multiplication, 97
scalars, 97
spanning set, 105
subspace, 125
coset with respect to, 130
examples, 127
vectors, 97
Weierstrass Approximation Theorem, 153
Well-Ordering Principle, 2
Zorns Lemma, 115, 122, 221

You might also like