Abstract Algebra
With Applications to Galois Theory, Algebraic Geometry,
Representation Theory and Cryptography
3rd edition
Mathematics Subject Classification 2020
Primary: 11-01, 12-01, 13-01, 14-01, 16-01, 20-01, 20C15; Secondary: 01-01, 08-01, 94-01
Authors
Prof. Dr. Gerhard Rosenberger
University of Hamburg
Bundesstr. 55
20146 Hamburg
Germany

Dr. Leonard Wienke
University of Bremen
Bibliothekstr. 5
28359 Bremen
Germany
Annika Schürenberg
Grundschule Hoheluft
Wrangelstr. 80
20253 Hamburg
Germany
ISBN 978-3-11-113951-7
e-ISBN (PDF) 978-3-11-114252-4
e-ISBN (EPUB) 978-3-11-114284-5
www.degruyter.com
Preface
Traditionally, mathematics has been separated into three main areas: algebra, analysis,
and geometry. Of course, there is a great deal of overlap between these areas. In gen-
eral, algebraic methods and symbolism pervade all of mathematics, and it is essential
for anyone learning any advanced mathematics to be familiar with the concepts and
methods in abstract algebra.
This is an introductory text on abstract algebra. It grew out of courses given to ad-
vanced undergraduate and beginning graduate students in the United States, and to
mathematics students and teachers in Germany. We assume that the reader is famil-
iar with calculus and with some linear algebra, primarily matrix algebra and the basic
concepts of vector spaces, bases, and dimensions. All other necessary material is intro-
duced and explained in the book. Our expectation is that the material in this text can be
completed in a full year’s course.
We present the material sequentially, so that polynomials and field extensions pre-
cede an in-depth look at advanced topics in group theory and Galois theory. This text
follows the new approach of conveying abstract algebra starting with rings and fields,
rather than with groups. Our teaching experience shows that examples of groups seem
rather abstract and require a certain formal framework and mathematical maturity that
would distract a course from its main objectives. The idea is that the integers provide
the most natural example of an algebraic structure that students know from school.
A student who goes through ring theory first will attain a solid background in abstract
algebra and will be able to move on to more advanced topics.
The centerpiece of our book is the development of Galois theory and its important
applications, especially the insolvability of the quintic polynomial. After introducing the
basic algebraic structures, groups, rings, and fields, we begin with the theory of polyno-
mials and polynomial equations over fields. We then develop the main ideas of field
extensions and adjoining elements to fields. Since the second edition, we have included
material on skew field extensions of ℂ and Frobenius's theorem.
After this, we present the necessary material from group theory needed to complete
both the insolvability of the quintic polynomial and solvability by radicals in general.
Hence, the middle part of the book, Chapters 9 through 14, is concerned with group
theory, including permutation groups, solvable groups, Abelian groups, and group ac-
tions. Chapter 14 is somewhat off to the side of the main theme of the book. Here, we give
a brief introduction to free groups, group presentations, and combinatorial group the-
ory. In this third edition, we have extended Chapter 14 to include a primer on hyperbolic
groups. With the group theory material at hand, we return to Galois theory and study
general normal and separable extensions, and the fundamental theorem of Galois the-
ory. Using this approach, we present several major applications of the theory, including
solvability by radicals and the insolvability of the quintic, the fundamental theorem of
algebra, the construction of regular n-gons, and the famous impossibilities: squaring the
circle, doubling the cube, and trisecting an angle.
https://doi.org/10.1515/9783111142524-201
We continue with the theory of modules and prove the fundamental theorem for
finitely generated modules over principal ideal domains. We then consider transcendental
field extensions and prove Noether's normalization theorem as preparation for
algebraic geometry based on Hilbert's basis theorem and the Nullstellensatz, and describe
several applications. Since the second edition, we include a new chapter on
algebras and group representations. We finish in a slightly different direction, giving an
introduction to algebraic and noncommutative group-based cryptography. In this third
edition, we have devoted a modernized chapter to each of these topics including recent
developments and results.
In the bibliography we choose to mention some interesting books and papers which
are not used explicitly in our exposition but are very much related to the topics of the
present book and could be helpful for additional reading.
We were very pleased with the response to the second edition of this book, and we
were very happy to do a third edition. In this third edition, we have added the extensions
mentioned above, cleaned up various typos pointed out by readers, and have incorpo-
rated their suggestions. Here, we have to give a special thank you to Ahmad Mirzay and
O-joung Kwon. We would also like to thank Anja Rosenberger, who helped tremendously
with editing and LaTeX, and who made some invaluable suggestions about the contents.
Last but not least, we thank De Gruyter for publishing our book.
1 Groups, Rings and Fields
1.1 Abstract Algebra
Abstract algebra or modern algebra can be best described as the theory of algebraic
structures. Briefly, an algebraic structure is a set together with one or more binary oper-
ations on it satisfying axioms governing the operations. There are many algebraic struc-
tures, but the most commonly studied structures are groups, rings, fields, and vector
spaces. Also widely used are modules and algebras. In this first chapter, we will look at
some basic preliminaries concerning groups, rings, and fields. We will only briefly touch
on groups here; a more extensive treatment will be done later in the book.
Mathematics traditionally has been subdivided into three main areas—analysis, al-
gebra, and geometry. These areas overlap in many places so that it is often difficult, for
example, to determine whether a topic is one in geometry or in analysis. Algebra and
algebraic methods permeate all these disciplines, and most of mathematics has been
algebraicized; that is, it uses the methods and language of algebra. Groups, rings, and fields
play a major role in the study of analysis, topology, geometry, and even applied mathe-
matics. We will see these connections in examples throughout the book.
Abstract algebra has its origins in two main areas and questions that arose in these
areas—the theory of numbers and the theory of equations. The theory of numbers deals
with the properties of the basic number systems—integers, rationals, and reals, whereas
the theory of equations, as the name indicates, deals with solving equations, in partic-
ular, polynomial equations. Both are subjects that date back to classical times. A whole
section of Euclid's Elements is dedicated to number theory. The foundations for the modern
study of number theory were laid by Fermat in the 1600s, and then by Gauss in the
1800s. In an attempt to prove Fermat's Last Theorem, Gauss introduced the complex integers
a + bi, where a and b are integers, and showed that this set has unique factorization.
These ideas were extended by Dedekind and Kronecker, who developed a wide ranging
theory of algebraic number fields and algebraic integers. A large portion of the termi-
nology used in abstract algebra, such as rings, ideals, and factorization, comes from the
study of algebraic number fields. This has evolved into the modern discipline of alge-
braic number theory.
The second origin of modern abstract algebra was the problem of trying to deter-
mine a formula for finding the solutions in terms of radicals of a fifth degree polynomial.
It was proved first by Ruffini in 1800, and then by Abel, that it is impossible to find a
formula in terms of radicals for such a solution. Galois, around 1830, extended this and showed
that such a formula is impossible for any degree five or greater. In proving this, he laid
the groundwork for much of the development of modern abstract algebra, especially
field theory and finite group theory. Earlier, in 1800, Gauss proved the fundamental the-
orem of algebra, which says that any nonconstant complex polynomial equation must
have a solution. One of the goals of this book is to present a comprehensive treatment
of Galois theory and a proof of the results mentioned above.
https://doi.org/10.1515/9783111142524-001
The locus of real points (x, y), which satisfy a polynomial equation f (x, y) = 0, is
called an algebraic plane curve. Algebraic geometry deals with the study of algebraic
plane curves and extensions to loci in a higher number of variables. Algebraic geometry
is intricately tied to abstract algebra and especially commutative algebra. We will touch
on this in the book also.
Finally, linear algebra, although a part of abstract algebra, arose in a somewhat different
context. Historically, it grew out of the study of solution sets of systems of linear
equations and the study of the geometry of real n-dimensional spaces. It began to be
developed formally in the early 1800s with work of Jordan and Gauss, and then later in
the century by Cayley, Hamilton, and Sylvester.
1.2 Rings
The primary motivating examples for algebraic structures are the basic number sys-
tems: the integers ℤ, the rational numbers ℚ, the real numbers ℝ, and the complex
numbers ℂ. Each of these has two basic operations, addition and multiplication, and
forms what is called a ring. We formally define this.
Definition 1.2.1. A ring is a set R with two binary operations defined on it: addition,
denoted by +, and multiplication, denoted by ⋅, or just by juxtaposition, satisfying the
following six axioms:
(1) Addition is commutative: a + b = b + a for each pair a, b in R.
(2) Addition is associative: a + (b + c) = (a + b) + c for a, b, c ∈ R.
(3) There exists an additive identity, denoted by 0, such that a + 0 = a for each a ∈ R.
(4) For each a ∈ R, there exists an additive inverse, denoted by −a, such that a + (−a) = 0.
(5) Multiplication is associative: a(bc) = (ab)c for a, b, c ∈ R.
(6) Multiplication is left and right distributive over addition: a(b + c) = ab + ac, and
(b + c)a = ba + ca for a, b, c ∈ R.
A set G with one operation, +, on it satisfying axioms (1) through (4) is called an
Abelian group. We will discuss these further later in the chapter.
A ring R is commutative if multiplication is commutative: ab = ba for all a, b ∈ R. The
ring R has an identity if there is an element 1 ∈ R such that a ⋅ 1 = 1 ⋅ a = a for each a ∈ R.
The number systems ℤ, ℚ, ℝ, ℂ are commutative rings with identity.
A ring R with only one element is called trivial. A ring R with identity is trivial if and
only if 0 = 1. A finite ring is a ring R with only finitely many elements in it. Otherwise, R is
an infinite ring. ℤ, ℚ, ℝ, ℂ are all infinite rings. Examples of finite rings are given by the
integers modulo n, ℤn , with n > 1. The ring ℤn consists of the elements 0, 1, 2, . . . , n − 1
with addition and multiplication done modulo n. That is, for example 4 ⋅ 3 = 12 = 2
modulo 5. Hence, in ℤ5 , we have 4 ⋅ 3 = 2. The rings ℤn are all finite commutative rings
with identity.
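The arithmetic in ℤn is easy to experiment with. The following Python sketch (the helper names are ours, purely illustrative) checks the computation above and previews the different behavior of ℤ6:

```python
# Sketch: arithmetic in the ring Z_n is done modulo n.
# The function names are illustrative, not from the text.

def add_mod(a, b, n):
    """Addition in Z_n."""
    return (a + b) % n

def mul_mod(a, b, n):
    """Multiplication in Z_n."""
    return (a * b) % n

# The example from the text: 4 * 3 = 12 = 2 modulo 5, so 4 * 3 = 2 in Z_5.
print(mul_mod(4, 3, 5))   # -> 2

# In Z_6, by contrast, 2 * 3 = 6 = 0 modulo 6, although neither factor is 0.
print(mul_mod(2, 3, 6))   # -> 0
```

The second computation foreshadows the notion of a zero divisor introduced below.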
To give examples of rings without an identity, consider the set nℤ = {nz : z ∈ ℤ}
consisting of all multiples of the fixed integer n. It is an easy verification (see exercises)
that this forms a ring under the same addition and multiplication as in ℤ, but that there
is no identity for multiplication. Hence, for each n ∈ ℤ with n > 1, we get an infinite
commutative ring without an identity.
To obtain examples of noncommutative rings, we consider matrices. Let M(2, ℤ) be
the set of (2 × 2)-matrices with integral entries. Addition of matrices is done component-
wise; that is,
⎛a₁ b₁⎞   ⎛a₂ b₂⎞   ⎛a₁ + a₂   b₁ + b₂⎞
⎝c₁ d₁⎠ + ⎝c₂ d₂⎠ = ⎝c₁ + c₂   d₁ + d₂⎠ ,

and multiplication is the usual matrix multiplication:

⎛a₁ b₁⎞   ⎛a₂ b₂⎞   ⎛a₁a₂ + b₁c₂   a₁b₂ + b₁d₂⎞
⎝c₁ d₁⎠ ⋅ ⎝c₂ d₂⎠ = ⎝c₁a₂ + d₁c₂   c₁b₂ + d₁d₂⎠ .
Then again, it is an easy verification (see exercises) that M(2, ℤ) forms a ring. Further,
since matrix multiplication is noncommutative, this forms a noncommutative ring.
However, the identity matrix does form a multiplicative identity for it. The ring M(2, nℤ)
of (2 × 2)-matrices with entries in nℤ, with n > 1, provides an example of an infinite
noncommutative ring without an identity. Finally, M(2, ℤn ) for n > 1 gives an example
of a finite noncommutative ring.
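The noncommutativity claim is easy to check computationally. This Python fragment (illustrative, not from the text) multiplies two (2 × 2)-matrices over ℤ5 in both orders:

```python
# Sketch (illustrative): 2x2 matrix arithmetic over Z_n, showing
# that M(2, Z_n) is a noncommutative ring.

def mat_mul(A, B, n):
    """Multiply 2x2 matrices, entries reduced modulo n."""
    return [[(A[i][0] * B[0][j] + A[i][1] * B[1][j]) % n for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

# AB and BA differ, so multiplication is noncommutative.
print(mat_mul(A, B, 5))  # [[2, 1], [1, 1]]
print(mat_mul(B, A, 5))  # [[1, 1], [1, 2]]
```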
Definition 1.3.1. A zero divisor in a ring R is an element a ∈ R with a ≠ 0 such that there
exists an element b ≠ 0 with ab = 0. A commutative ring with an identity 1 ≠ 0 and with
no zero divisors is called an integral domain.
Notice that having no zero divisors is equivalent to the fact that if ab = 0 in R, then
either a = 0, or b = 0.
Hence, ℤ, ℚ, ℝ, ℂ are all integral domains, but from the example above, ℤ6 is not.
In general, we have the following:

Theorem 1.3.2. The ring ℤn is an integral domain if and only if n is a prime.
Proof. First of all, notice that under multiplication modulo n, an element m is 0 if and
only if n divides m. We will make this precise shortly. Recall further Euclid’s lemma
(see Chapter 2), which says that if a prime p divides a product ab, then p divides a, or p
divides b.
Now suppose that n is a prime and ab = 0 in ℤn . Then n divides ab. From Euclid’s
lemma it follows that n divides a, or n divides b. In the first case, a = 0 in ℤn , whereas
in the second, b = 0 in ℤn . It follows that there are no zero divisors in ℤn , and since ℤn
is a commutative ring with an identity, it is an integral domain.
Conversely, suppose ℤn is an integral domain. Suppose that n is not prime. Then n =
ab with 1 < a < n, 1 < b < n. It follows that ab = 0 in ℤn with neither a nor b being zero.
Therefore, they are zero divisors, which is a contradiction. Hence, n must be prime.
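The theorem can be verified experimentally for small n with a brute-force Python sketch (the function name is ours):

```python
# Sketch (illustrative): search Z_n for zero divisors. By the theorem,
# Z_n is an integral domain exactly when this list is empty, i.e.,
# exactly when n is prime.

def zero_divisors(n):
    """All a != 0 in Z_n such that a*b = 0 in Z_n for some b != 0."""
    return sorted({a for a in range(1, n)
                     for b in range(1, n) if (a * b) % n == 0})

print(zero_divisors(5))   # [] -> Z_5 is an integral domain
print(zero_divisors(6))   # [2, 3, 4] -> e.g., 2 * 3 = 0 in Z_6
```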
A field K is a commutative ring with an identity 1 ≠ 0 in which each nonzero element
is a unit; that is, each nonzero element has a multiplicative inverse. Hence, a field K
always contains at least two elements, a zero element 0 and an identity 1 ≠ 0.
The rationals ℚ, the reals ℝ, and the complexes ℂ are all fields. If we relax the com-
mutativity requirement and just require that in the ring R with identity, each nonzero
element is a unit, then we get a skew field or division ring.
Lemma 1.3.5. Every field K is an integral domain.

Proof. Since a field K is already a commutative ring with an identity, we must only show
that there are no zero divisors in K.
Suppose that ab = 0 with a ≠ 0. Since K is a field and a is nonzero, it has an inverse
a−1 . Hence,

b = 1 ⋅ b = (a−1 a)b = a−1 (ab) = a−1 ⋅ 0 = 0.

Therefore, K has no zero divisors and is an integral domain.
Theorem 1.3.6. ℤn is a field if and only if n is a prime.

Proof. First suppose that ℤn is a field. Then from Lemma 1.3.5, it is an integral domain.
Therefore, from Theorem 1.3.2, n must be a prime.
Conversely, suppose that n is a prime. We must show that ℤn is a field. Since we
already know that ℤn is an integral domain, we must only show that each nonzero ele-
ment of ℤn is a unit. Here, we need some elementary facts from number theory. If a, b
are integers, we use the notation a|b to indicate that a divides b.
Recall that given nonzero integers a, b, their greatest common divisor or GCD d > 0
is a positive integer, which is a common divisor; that is, d|a and d|b, and if d1 is any
other common divisor, then d1 |d. We denote the greatest common divisor of a, b by either
gcd(a, b) or (a, b). It can be proved that given nonzero integers a, b their GCD exists, is
unique and can be characterized as the least positive linear combination of a and b. If
the GCD of a and b is 1, then we say that a and b are relatively prime or coprime. This is
equivalent to being able to express 1 as a linear combination of a and b (see Chapter 3
for proofs and more details).
Now let a ∈ ℤn with n prime and a ≠ 0. Since a ≠ 0, we have that n does not divide a.
Since n is prime, it follows that a and n must be relatively prime, (a, n) = 1. From the
number theoretic remarks above, we then have that there exist x, y with
ax + ny = 1.

In ℤn , we have ny = 0; hence, reducing this equation modulo n gives

ax = 1.
Therefore, a has a multiplicative inverse in ℤn and is, hence, a unit. Since a was an
arbitrary nonzero element, we conclude that ℤn is a field.
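The proof is constructive: the extended Euclidean algorithm actually produces the x with ax + ny = 1, and hence the inverse of a in ℤn . A Python sketch (illustrative names, nothing beyond the standard algorithm):

```python
# Sketch (illustrative): find inverses in Z_n via the extended
# Euclidean algorithm, as in the proof above.

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inverse_mod(a, n):
    """The multiplicative inverse of a in Z_n, when (a, n) = 1."""
    g, x, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError(f"{a} is not a unit in Z_{n}")
    return x % n

# Every nonzero element of Z_7 is a unit, since 7 is prime:
print([inverse_mod(a, 7) for a in range(1, 7)])  # [1, 4, 5, 2, 3, 6]
```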
The theorem above is actually a special case of a more general result from which
Theorem 1.3.6 could also be obtained.
Theorem 1.3.7. Every finite integral domain is a field.

Proof. Let K be a finite integral domain. We must show that K is a field. It is clearly
sufficient to show that each nonzero element of K is a unit. Let
{0, 1, r1 , . . . , rn }
be the elements of K. Let ri be a fixed nonzero element and multiply each element of K
by ri on the left. Now
if ri rj = ri rk , then ri (rj − rk ) = 0. Since K is an integral domain and ri ≠ 0, this forces
rj = rk . Hence, the products 0, ri , ri r1 , . . . , ri rn are all distinct, and since K is finite,

K = {0, 1, r1 , . . . , rn } = ri K = {0, ri , ri r1 , . . . , ri rn }.
Therefore, the identity element 1 must be in the right-hand list; that is, there is an rj such
that ri rj = 1. Therefore, ri has a multiplicative inverse and is, hence, a unit. Therefore,
K is a field.
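The pigeonhole step of this proof can be watched in action in, say, ℤ7. A small Python sketch (illustrative):

```python
# Sketch (illustrative): in a finite integral domain, multiplying all
# elements by a fixed nonzero r merely permutes them, so 1 occurs in
# the list r*K; the element producing it is the inverse of r.

K = list(range(7))            # Z_7, a finite integral domain
r = 3                         # a fixed nonzero element

rK = [(r * k) % 7 for k in K]
print(sorted(rK) == K)        # True: multiplication by r permutes Z_7

# Hence some k satisfies r*k = 1; that k is the inverse of r.
print(rK.index(1))            # 5, since 3 * 5 = 15 = 1 in Z_7
```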
Definition 1.4.1. A subring of a ring R is a nonempty subset S that is also a ring under
the same operations as R. If R is a field and S also a field, then it is a subfield.
Lemma 1.4.2. A subset S of a ring R is a subring if and only if S is nonempty, and whenever
a, b ∈ S, we have a + b ∈ S, a − b ∈ S and ab ∈ S.
Example 1.4.3. Show that if n > 1, the set nℤ is a subring of ℤ. Here, clearly nℤ is
nonempty. Suppose a = nz1 , b = nz2 are two elements of nℤ. Then

a + b = n(z1 + z2 ) ∈ nℤ, a − b = n(z1 − z2 ) ∈ nℤ, and ab = (nz1 )(nz2 ) = n(nz1 z2 ) ∈ nℤ.

Therefore, nℤ is a subring.
Example 1.4.4. Show that the set of real numbers of the form

S = {u + v√2 : u, v ∈ ℚ}

forms a subring of ℝ. Clearly, S is nonempty. Suppose a = u1 + v1 √2, b = u2 + v2 √2 are
two elements of S. Then

a ± b = (u1 ± u2 ) + (v1 ± v2 )√2 ∈ S and ab = (u1 u2 + 2v1 v2 ) + (u1 v2 + u2 v1 )√2 ∈ S.

Therefore, S is a subring.
Definition 1.4.5. Let R be a ring and I ⊂ R. Then I is a (two-sided) ideal if the following
properties hold:
(1) I is nonempty.
(2) If a, b ∈ I, then a ± b ∈ I.
(3) If a ∈ I and r is any element of R, then ra ∈ I, and ar ∈ I.
Lemma 1.4.6. Let R be a commutative ring with an identity, and let a ∈ R. Then

⟨a⟩ = aR = {ar : r ∈ R}

is an ideal of R, called the principal ideal generated by a.
Proof. We must verify the three properties of the definition. Since a ∈ R, we have that
aR is nonempty. If u = ar1 , v = ar2 are two elements of aR, then

u ± v = a(r1 ± r2 ) ∈ aR,

so aR is closed under addition and subtraction. Further, for any r ∈ R, we have
ru = ur = a(r1 r) ∈ aR, since R is commutative. Therefore, aR is an ideal.
Theorem 1.4.7. Any subring of ℤ is of the form nℤ for some n. Hence, each subring of ℤ
is actually a principal ideal.
Proof. Let S be a subring of ℤ. If S = {0}, then S = 0ℤ, so we may assume that S has
nonzero elements. Since S is a subring, if it has nonzero elements, then it must have
positive elements (since it contains the additive inverse of each of its elements).
Let S + be the set of positive elements in S. From the remarks above, this is a
nonempty set, and so, there must be a least positive element n. We claim that S = nℤ.
Let m be a positive element in S. By the division algorithm
m = qn + r,
where either r = 0, or 0 < r < n (see Chapter 3). Suppose that r ≠ 0. Then

r = m − qn.

Since m ∈ S and qn ∈ S, it follows that r ∈ S with 0 < r < n, contradicting the minimality
of n. Therefore, r = 0, and m = qn ∈ nℤ. Since S contains the additive inverse of each of
its elements, every element of S lies in nℤ; conversely, nℤ ⊂ S, since S is closed under
addition and additive inverses. Hence, S = nℤ.
We mention that every subring of ℤ is an ideal, but this is not true in general rings.
For example, ℤ is a subring of ℚ, but not an ideal. An extension of the proof of
Lemma 1.4.6 gives the following. We
leave the proof as an exercise.
Lemma 1.4.8. Let R be a commutative ring with an identity, and let a1 , . . . , an ∈ R. Then

⟨a1 , . . . , an ⟩ = {r1 a1 + r2 a2 + ⋅ ⋅ ⋅ + rn an : ri ∈ R}

is an ideal of R.
Theorem 1.4.9. Let R be a commutative ring with an identity 1 ≠ 0. Then R is a field if and
only if the only ideals in R are {0} and R.
Proof. Suppose that R is a field and I ⊲ R is an ideal. We must show that either I = {0},
or I = R. Suppose that I ≠ {0}, then we must show that I = R.
Since I ≠ {0}, there exists an element a ∈ I with a ≠ 0. Since R is a field, this element
a has an inverse a−1 . Since I is an ideal, it follows that a−1 a = 1 ∈ I. Let r ∈ R, then, since
1 ∈ I, we have r ⋅ 1 = r ∈ I. Hence, R ⊂ I and, therefore, R = I.
Conversely, suppose that R is a commutative ring with an identity, whose only ideals
are {0} and R. We must show that R is a field, or equivalently, that every nonzero element
of R has a multiplicative inverse. Let a ∈ R with a ≠ 0, and consider the principal ideal
⟨a⟩ = aR. Since a = a ⋅ 1 ∈ aR, we have aR ≠ {0}, and hence, aR = R. In particular,
1 ∈ aR; that is, there is an r ∈ R with ar = 1. Therefore, a is a unit, and R is a field.
1.5 Factor Rings and Ring Homomorphisms
Definition 1.5.1. Let I be an ideal in a ring R. For r ∈ R, the set

r + I = {r + i : i ∈ I}

is called the coset of I determined by r.
Lemma 1.5.2. Let I be an ideal in a ring R. Then the cosets of I partition R; that is, any
two cosets either coincide or are disjoint.
We leave the proof to the exercises. Now, on the set of all cosets of an ideal, we will
build a new ring.
Theorem 1.5.3. Let I be an ideal in a ring R. Let R/I = {r + I : r ∈ R} be the set of all cosets
of I in R. We define addition and multiplication on R/I in the following manner:

(r1 + I) + (r2 + I) = (r1 + r2 ) + I,
(r1 + I)(r2 + I) = r1 r2 + I.
Then R/I forms a ring called the factor ring of R modulo I. The zero element of R/I is
0 + I and the additive inverse of r + I is −r + I. Further, if R is commutative, then R/I is
commutative, and if R has an identity, then R/I has an identity 1 + I.
Proof. The proof that R/I satisfies the ring axioms under the definitions above is
straightforward. For example,

(r1 + I) + (r2 + I) = (r1 + r2 ) + I = (r2 + r1 ) + I = (r2 + I) + (r1 + I),

and so, addition is commutative. What must be shown is that both addition and multiplication
are well defined. That is, if

r1 + I = r1′ + I and r2 + I = r2′ + I,

then

(r1 + r2 ) + I = (r1′ + r2′ ) + I and r1 r2 + I = r1′ r2′ + I.
Now if r1 + I = r1′ + I, then r1 ∈ r1′ + I, and so, r1 = r1′ + i1 for some i1 ∈ I. Similarly, if
r2 + I = r2′ + I, then r2 ∈ r2′ + I, and so, r2 = r2′ + i2 for some i2 ∈ I. Then

(r1 + r2 ) + I = (r1′ + r2′ ) + (i1 + i2 ) + I = (r1′ + r2′ ) + I,

and

r1 r2 + I = (r1′ + i1 )(r2′ + i2 ) + I = r1′ r2′ + (r1′ i2 + i1 r2′ + i1 i2 ) + I = r1′ r2′ + I,

since all the other products are in the ideal I. This shows that addition and multiplication
are well defined. It also shows why the ideal property is necessary.
As an example, let R be the integers ℤ. As we have seen, each subring is an ideal and
of the form nℤ for some natural number n. The factor ring ℤ/nℤ is called the residue
class ring modulo n, denoted ℤn . Notice that we can take as cosets

0 + nℤ, 1 + nℤ, . . . , (n − 1) + nℤ.

Addition and multiplication of cosets is then just addition and multiplication modulo n.
As we can see, this is just a formalization of the ring ℤn , which we have already looked
at. Recall that ℤn is an integral domain if and only if n is prime and ℤn is a field for
precisely the same n. If n = 0, then ℤ/nℤ is the same as ℤ.
We now show that ideals and factor rings are closely related to certain mappings
between rings.
Definition 1.5.4. Let R and S be rings. A map f : R → S is a ring homomorphism if

f (a + b) = f (a) + f (b) and f (ab) = f (a)f (b) for all a, b ∈ R.

In addition,
(1) f is an epimorphism if it is surjective.
(2) f is a monomorphism if it is injective.
(3) f is an isomorphism if it is bijective; that is, both surjective and injective. In this case,
R and S are said to be isomorphic rings, which we denote by R ≅ S.
Lemma 1.5.5. Let R and S be rings, and let f : R → S be a ring homomorphism. Then
(1) f (0) = 0, where the first and second 0 are the zero elements of R and S, respectively.
(2) f (−r) = −f (r) for any r ∈ R.
Proof. We obtain f (0) = 0 from the equation f (0) = f (0 + 0) = f (0) + f (0). Hence,
0 = f (0) = f (r − r) = f (r + (−r)) = f (r) + f (−r); that is, f (−r) = −f (r).
Definition 1.5.6. Let R and S be rings, and let f : R → S be a ring homomorphism. Then
the kernel of f is

ker(f ) = {r ∈ R : f (r) = 0},

and the image of f is im(f ) = f (R) = {f (r) : r ∈ R}.
Theorem 1.5.7 (Ring isomorphism theorem). Let R and S be rings, and let

f :R→S

be a ring homomorphism. Then:
(1) ker(f ) is an ideal in R, im(f ) is a subring of S, and

R/ ker(f ) ≅ im(f ).
(2) Conversely, suppose that I is an ideal in a ring R. Then the map f : R → R/I, given by
f (r) = r + I for r ∈ R, is a ring homomorphism, whose kernel is I, and whose image
is R/I.
The theorem says that the concepts of ideal of a ring and kernel of a ring homomor-
phism coincide; that is, each ideal is the kernel of a homomorphism and the kernel of
each ring homomorphism is an ideal.
Proof. If s1 , s2 ∈ im(f ), then there exist r1 , r2 ∈ R, such that f (r1 ) = s1 , and f (r2 ) = s2 .
Then

s1 ± s2 = f (r1 ) ± f (r2 ) = f (r1 ± r2 ) ∈ im(f ) and s1 s2 = f (r1 )f (r2 ) = f (r1 r2 ) ∈ im(f ).

Hence, im(f ) is a subring of S by Definition 1.5.4 and Lemma 1.5.5. Now, let
I = ker(f ). We show first that I is an ideal. If r1 , r2 ∈ I, then f (r1 ) = f (r2 ) = 0. It follows
from the homomorphism property that

f (r1 ± r2 ) = f (r1 ) ± f (r2 ) = 0 and f (r1 r2 ) = f (r1 )f (r2 ) = 0.

Therefore, I is a subring.
Further, if r ∈ R and a ∈ I, then

f (ra) = f (r)f (a) = f (r) ⋅ 0 = 0 and f (ar) = f (a)f (r) = 0,

so ra ∈ I and ar ∈ I. Therefore, I is an ideal. Now the map φ : R/I → im(f ) given by
φ(r + I) = f (r) is well defined, since r + I = r ′ + I implies r − r ′ ∈ I and, hence,
f (r) = f (r ′ ). It is straightforward to verify that φ is a bijective ring homomorphism;
therefore, R/ ker(f ) ≅ im(f ). Part (2) is a direct verification from the definitions.
Theorem 1.5.7 is called the ring isomorphism theorem or the first ring isomorphism
theorem. We mention that there is an analogous theorem for each algebraic structure,
in particular, for groups and vector spaces. We will mention the result for groups in
Section 1.8.
Theorem 1.6.1. The rationals ℚ are the smallest field containing the integers ℤ. That is,
if ℤ ⊂ K ⊂ ℚ with K a subfield of ℚ, then K = ℚ.
Theorem 1.6.2. Let D be an integral domain. Then there is a field K containing D, called
the field of fractions for D, such that each element of K is a fraction from D; that is, an
element of the form d1 d2−1 with d1 , d2 ∈ D. Further, K is unique up to isomorphism and is
the smallest field containing D.
Proof. The proof is just the mimicking of the construction of the rationals from the in-
tegers. Let
K ′ = {(d1 , d2 ) : d1 , d2 ∈ D, d2 ≠ 0}.

Define an equivalence relation on K ′ by (d1 , d2 ) ∼ (e1 , e2 ) if and only if d1 e2 = e1 d2 .
Let K be the set of equivalence classes, and define addition and multiplication in the
usual manner as for fractions, where the result is the equivalence class:

(d1 , d2 ) + (e1 , e2 ) = (d1 e2 + e1 d2 , d2 e2 ),
(d1 , d2 ) ⋅ (e1 , e2 ) = (d1 e1 , d2 e2 ).
It is now straightforward to verify the ring axioms for K. The inverse of (d1 , 1) is (1, d1 )
for d1 ≠ 0 in D. As with ℤ, we identify the elements of K as fractions d1 /d2 . The proof that
K is the smallest field containing D is the same as for ℚ from ℤ.
As examples, we have that ℚ is the field of fractions for ℤ. A familiar, but less com-
mon, example is the following:
Let ℝ[x] be the set of polynomials over the real numbers ℝ. It can be shown that
ℝ[x] forms an integral domain (see Chapter 3). Its field of fractions consists of all formal
quotients f (x)/g(x), where f (x), g(x) are real polynomials with g(x) ≠ 0. This field is
called the field of rational functions over ℝ and is denoted ℝ(x).
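Python's standard fractions module implements exactly the field-of-fractions construction for D = ℤ, and can be used to experiment with it (illustrative sketch):

```python
# Sketch (illustrative): the fractions module realizes the field of
# fractions of Z -- pairs (d1, d2) with d2 != 0, identified up to the
# usual equivalence (d1, d2) ~ (e1, e2) iff d1*e2 = e1*d2.

from fractions import Fraction

a = Fraction(1, 2)       # the class of the pair (1, 2)
b = Fraction(3, 4)

print(a + b)             # 5/4  -- (d1*e2 + e1*d2, d2*e2), reduced
print(a * b)             # 3/8
print(Fraction(2, 4) == Fraction(1, 2))  # True: equivalent pairs

# Every nonzero element is a unit, as required in a field:
print(a * (1 / a))       # 1
```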
Lemma 1.7.2. Let K be any field. Then K contains a prime field, that is, a field with no
proper subfields, as a subfield.
Definition 1.7.3. Let R be a commutative ring with an identity 1 ≠ 0. The smallest posi-
tive integer n such that n ⋅ 1 = 1 + 1 + ⋅ ⋅ ⋅ + 1 = 0 is called the characteristic of R. If there
is no such n, then R has characteristic 0. We denote the characteristic by char(R).
We have seen that every field contains a prime field. We extend this.
Definition 1.7.5. A commutative ring R with an identity 1 ≠ 0 is a prime ring if the only
subring containing the identity is the whole ring.
Clearly both the integers ℤ and the modular integers ℤn are prime rings. In fact, up
to isomorphism, they are the only prime rings.
Theorem 1.7.6. Let R be a prime ring. Then char(R) = 0 implies R ≅ ℤ, whereas char(R) =
n > 0 implies R ≅ ℤn .
Theorem 1.7.6 can be extended to fields, with ℚ taking the place of ℤ, and ℤp , with
p a prime, taking the place of ℤn .
Theorem 1.7.7. Let K be a field. If char(K) = 0, then the prime field of K is isomorphic
to ℚ, whereas if char(K) = p, with p a prime, then the prime field of K is isomorphic
to ℤp .

Proof. The proof is identical to that of Theorem 1.7.6; however, we consider the smallest
subfield K1 of K containing the subring S generated by the identity.
We mention that there can be infinite fields of characteristic p. Consider, for ex-
ample, the field of fractions of the polynomial ring ℤp [x]. This is the field of rational
functions with coefficients in ℤp .
We give a theorem on fields of characteristic p that will be important much later
when we look at Galois theory.
Theorem 1.7.8. Let K be a field with char(K) = p, a prime. Then the map ϕ : K → K given
by ϕ(x) = xᵖ is an injective homomorphism of K into itself, called the Frobenius
homomorphism.

Proof. Since K is commutative, ϕ(xy) = (xy)ᵖ = xᵖ yᵖ = ϕ(x)ϕ(y). For sums, expand
(x + y)ᵖ by the binomial theorem. Each mixed term xⁱ yᵖ⁻ⁱ with 1 ≤ i ≤ p − 1 carries the
binomial coefficient

⎛p⎞   p(p − 1) ⋅ ⋅ ⋅ (p − i + 1)
⎝i⎠ = ─────────────────────── ,
          i ⋅ (i − 1) ⋅ ⋅ ⋅ 1

and since p is prime, p divides this coefficient for 1 ≤ i ≤ p − 1. Hence, in K, every
mixed term vanishes, and so, we have

ϕ(x + y) = (x + y)ᵖ = xᵖ + yᵖ = ϕ(x) + ϕ(y).
Therefore, ϕ is a homomorphism.
Further, ϕ is always injective. To see this, suppose that ϕ(x) = ϕ(y). Then

ϕ(x − y) = ϕ(x) − ϕ(y) = 0 ⇒ (x − y)ᵖ = 0.

Since a field has no zero divisors, it follows that x − y = 0; that is, x = y.
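The identity (x + y)ᵖ = xᵖ + yᵖ, sometimes called the "freshman's dream," can be checked exhaustively in ℤp with a short Python sketch (illustrative):

```python
# Sketch (illustrative): verify the Frobenius identity in Z_p
# exhaustively for a small prime p.

p = 7
frobenius = lambda x: pow(x, p, p)   # x -> x^p in Z_p

for x in range(p):
    for y in range(p):
        assert frobenius((x + y) % p) == (frobenius(x) + frobenius(y)) % p

# On the finite field Z_p, phi is in fact a bijection:
print(sorted(frobenius(x) for x in range(p)) == list(range(p)))  # True
```

(On ℤp itself, Fermat's little theorem makes ϕ the identity map; on larger fields of characteristic p, it is injective but generally not the identity.)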
1.8 Groups
We close this first chapter by introducing some basic definitions and results from
group theory that mirror the results, which were presented for rings and fields. We
will look at group theory in more detail later in the book. Proofs will be given at that
point.
Definition 1.8.1. A group G is a set with one binary operation (which we will denote by
multiplication) such that
(1) the operation is associative;
(2) there exists an identity for this operation; and
(3) each g ∈ G has an inverse for this operation.
If, in addition, the operation is commutative, the group G is called an Abelian group. The
order of G is the number of elements in G, denoted by |G|. If |G| < ∞, G is a finite group;
otherwise G is an infinite group.
Groups most often arise from invertible mappings of a set onto itself. Such mappings
are called permutations.
Theorem 1.8.2. The set of all permutations on a set A forms a group called the symmetric
group on A, which we denote by SA . If A has more than two elements, then SA is
non-Abelian.
Theorem 1.8.5. If A1 and A2 are sets with |A1 | = |A2 |, then SA1 ≅ SA2 . If |A| = n with n
finite, we call SA the symmetric group on n elements, which we denote by Sn . Further, we
have |Sn | = n!.
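Both claims, |Sn| = n! and non-Abelianness for n ≥ 3, are easy to check for small n with a Python sketch (the composition convention below, (s ∘ t)(i) = s(t(i)), is our choice):

```python
# Sketch (illustrative): S_n realized as tuples via itertools,
# confirming |S_n| = n! and that S_n is non-Abelian for n >= 3.

from itertools import permutations
from math import factorial

n = 4
S_n = list(permutations(range(n)))
print(len(S_n) == factorial(n))       # True: |S_4| = 24

def compose(s, t):
    """(s o t)(i) = s(t(i)); permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

s = (1, 0, 2, 3)                      # swap 0 and 1
t = (0, 2, 1, 3)                      # swap 1 and 2
print(compose(s, t) == compose(t, s)) # False: S_4 is non-Abelian
```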
As with rings, the cosets of a subgroup partition a group. We call the number of right
cosets of a subgroup H in a group G the index of H in G, denoted |G : H|. One can prove
that the number of right cosets is equal to the number of left cosets. For finite groups,
we have the following beautiful result, called Lagrange's theorem.
Theorem 1.8.8 (Lagrange’s theorem). Let G be a finite group and H a subgroup. Then the
order of H divides the order of G. In particular,

|G| = |G : H| ⋅ |H|.
Theorem 1.8.10. Let H be a normal subgroup of a group G. Let G/H be the set of all cosets
of H in G; that is, G/H = {gH : g ∈ G}. Define a multiplication on G/H by

(g1 H)(g2 H) = g1 g2 H.
Then G/H forms a group called the factor group or quotient group of G modulo H.
The identity element of G/H is 1H, and the inverse of gH is g −1 H. Further, if G is Abelian,
then G/H is also Abelian.
Finally, as with rings, normal subgroups and factor groups are closely tied to
homomorphisms.

Theorem 1.8.11 (Group isomorphism theorem). Let G1 and G2 be groups, and let
f : G1 → G2 be a group homomorphism. Then ker(f ) = {g ∈ G1 : f (g) = 1} is a normal
subgroup of G1 , im(f ) is a subgroup of G2 , and

G1 / ker(f ) ≅ im(f ).
1.9 Exercises
1. Let ϕ : K → R be a homomorphism from a field K to a ring R. Show that either
ϕ(a) = 0 for all a ∈ K, or ϕ is a monomorphism.
2. Let R be a ring and M ≠ ∅ an arbitrary set. Show that the following are equivalent:
(i) The ring of all mappings from M to R is a field.
(ii) M contains only one element and R is a field.
3. Let π be a set of prime numbers. Define

ℚπ = { a/b : a, b ∈ ℤ, b ≠ 0, and all prime divisors of b are in π }.

Show that ℚπ is a subring of ℚ.
Let R be a commutative ring with an identity 1 ≠ 0, and let A, B, C be ideals in R.
Show the following:
(i) A + B := {a + b : a ∈ A, b ∈ B} ⊲ R, and A + B = ⟨A ∪ B⟩.
(ii) AB = {a1 b1 + ⋅ ⋅ ⋅ + an bn : n ∈ ℕ, ai ∈ A, bi ∈ B} ⊲ R, and AB ⊂ A ∩ B.
(iii) A(B + C) = AB + AC, (A + B)C = AC + BC, (AB)C = A(BC).
(iv) A = R ⇔ A ∩ R∗ ≠ ∅, where R∗ is the set of units of R.
(v) a, b ∈ R ⇒ ⟨a⟩ + ⟨b⟩ = {xa + yb : x, y ∈ R}.
(vi) a, b ∈ R ⇒ ⟨a⟩⟨b⟩ = ⟨ab⟩. Here, ⟨a⟩ = Ra = {xa : x ∈ R}.
6. Solve the following congruence:
3x ≡ 5 (mod 7).
2 Maximal and Prime Ideals
Theorem 2.1.1. The factor ring ℤn = ℤ/nℤ is an integral domain if and only if n = p is a
prime. Furthermore, ℤn is a field, again if and only if n = p is a prime.
Hence, for the integers ℤ, a factor ring is a field if and only if it is an integral domain.
We will see later that this is not true in general. However, what is clear is that special
ideals nℤ lead to integral domains and fields when n is a prime. We look at the ideals
pℤ with p a prime in two different ways, and then use these in subsequent sections to
give the general definitions. We first need a famous result, Euclid’s lemma, from number
theory. For integers a, b, the notation a|b means that a divides b.
Lemma 2.1.2 (Euclid's lemma). If p is a prime, and p|ab for integers a, b, then p|a, or p|b.

Proof. Recall that the greatest common divisor or GCD of two integers a, b is an integer
d > 0 such that d is a common divisor of both a and b, and if d1 is another common
divisor of a and b, then d1 |d. We express the GCD of a, b by d = (a, b). It is known that
for any two integers a, b, their GCD exists and is unique, and is the least positive linear
combination of a and b; that is, the least positive integer of the form ax + by for integers
x, y. The integers a, b are relatively prime if their GCD is 1, (a, b) = 1. In this case, 1 is a
linear combination of a and b (see Chapter 3 for proofs and more details).
Now suppose p|ab, where p is a prime. If p does not divide a, then since the only
positive divisors of p are 1 and p, it follows that (a, p) = 1. Hence, 1 is expressible as a
linear combination of a and p. That is, ax+py = 1 for some integers x, y. Multiply through
by b, so that
https://doi.org/10.1515/9783111142524-002
abx + pby = b.
Now p|ab, so p|abx and p|pby. Therefore, p|abx + pby; that is, p|b.
We now recast this lemma in two different ways in terms of the ideal pℤ. Notice
that pℤ consists precisely of all the multiples of p.
Hence, p|ab is equivalent to ab ∈ pℤ, and Euclid's lemma can be restated as follows:
if ab ∈ pℤ, then either a ∈ pℤ, or b ∈ pℤ.
This conclusion will be taken as a motivation for the definition of a prime ideal in
the next section.
Lemma 2.1.4. If p is a prime and pℤ ⊂ nℤ, then n = 1, or n = p. That is, every ideal in ℤ
containing pℤ with p a prime is either all of ℤ or pℤ.
In Section 2.3, the conclusion of this lemma will be taken as a motivation for the
definition of a maximal ideal.
Definition 2.2.1. Let R be a commutative ring. A nontrivial ideal P in R is a prime ideal
if ab ∈ P for a, b ∈ R implies that a ∈ P, or b ∈ P.
This property of an ideal is precisely what is necessary and sufficient to make the
factor ring R/I an integral domain.
Theorem 2.2.2. Let R be a commutative ring with an identity 1 ≠ 0, and let P be a non-
trivial ideal in R. Then P is a prime ideal if and only if the factor ring R/P is an integral
domain.
Proof. Let R be a commutative ring with an identity 1 ≠ 0, and let P be a prime ideal. We
show that R/P is an integral domain. From the results in the last chapter, we have that
R/P is again a commutative ring with an identity. Therefore, we must show that there
are no zero divisors in R/P. Suppose that (a + P)(b + P) = 0 in R/P. The zero element in
R/P is 0 + P and, hence,
(a + P)(b + P) = 0 = 0 + P ⇒ ab + P = 0 + P ⇒ ab ∈ P.
Since P is a prime ideal, either a ∈ P, or b ∈ P; that is, a + P = 0, or b + P = 0. Hence,
R/P has no zero divisors and is an integral domain.
Conversely, suppose that R/P is an integral domain, and let a, b ∈ R with ab ∈ P. Then
(a + P)(b + P) = ab + P = 0 + P = 0.
However, R/P is an integral domain, so it has no zero divisors. It follows that either
a + P = 0 and, hence, a ∈ P or b + P = 0, and b ∈ P. Therefore, either a ∈ P, or b ∈ P.
Therefore, P is a prime ideal.
Definition 2.2.3. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. Define
AB = {a1 b1 + ⋅ ⋅ ⋅ + an bn : ai ∈ A, bi ∈ B, n ∈ ℕ}.
Lemma 2.2.4. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. Then AB is an ideal.
Proof. We must verify that AB is a subring, and that it is closed under multiplication
from R. Let r1 , r2 ∈ AB. Then
r1 = a1 b1 + ⋅ ⋅ ⋅ + an bn for some ai ∈ A, bi ∈ B,
and
r2 = a1′ b1′ + ⋅ ⋅ ⋅ + am′ bm′ for some ai′ ∈ A, bi′ ∈ B.
Then
r1 ± r2 = a1 b1 + ⋅ ⋅ ⋅ + an bn ± a1′ b1′ ± ⋅ ⋅ ⋅ ± am′ bm′ ,
r1 ⋅ r2 = a1 b1 a1′ b1′ + ⋅ ⋅ ⋅ + an bn am′ bm′ .
Consider, for example, the first term a1 b1 a1′ b1′ . Since R is commutative, this is equal to
(a1 a1′ )(b1 b1′ ).
Now a1 a1′ ∈ A since A is a subring, and b1 b1′ ∈ B since B is a subring. Hence, this term
is in AB. Similarly, for each of the other terms. Therefore, r1 r2 ∈ AB and, hence, AB is a
subring.
Now let r ∈ R, and consider rr1 . This is then
rr1 = (ra1 )b1 + ⋅ ⋅ ⋅ + (ran )bn .
Now rai ∈ A for each i since A is an ideal. Hence, each summand is in AB, and then
rr1 ∈ AB. Therefore, AB is an ideal.
Lemma 2.2.5. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. If P is a prime ideal in R, then AB ⊂ P implies that A ⊂ P or B ⊂ P.
Proof. Suppose that AB ⊂ P with P a prime ideal, and suppose that B is not contained
in P. We show that A ⊂ P. Since AB ⊂ P, each product ai bj ∈ P. Choose a b ∈ B with b ∉ P,
and let a be an arbitrary element of A. Then ab ∈ P. Since P is a prime ideal, this implies
either a ∈ P, or b ∈ P. But by assumption b ∉ P, so a ∈ P. Since a was arbitrary, we have
A ⊂ P.
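In ℤ, where every ideal is of the form nℤ, Lemma 2.2.5 can be checked concretely. The following sketch is our own illustration; encoding the ideal nℤ by its nonnegative generator n is our convention, under which containment aℤ ⊆ bℤ amounts to b|a and the product (aℤ)(bℤ) equals (ab)ℤ:

```python
def contained(a, b):
    """aZ is contained in bZ exactly when b divides a."""
    return a % b == 0

A, B, P = 4, 6, 2            # A = 4Z, B = 6Z, P = 2Z (a prime ideal, since 2 is prime)
AB = A * B                   # (4Z)(6Z) = 24Z
assert contained(AB, P)                     # AB is contained in P
assert contained(A, P) or contained(B, P)   # Lemma 2.2.5: A or B lies in P
```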
Definition 2.3.1. Let R be a commutative ring. A nontrivial ideal I in R is a maximal ideal
if, whenever J is an ideal in R with I ⊂ J, then J = I, or J = R.
Theorem 2.3.2. Let R be a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R. Then I is a maximal ideal if and only if the factor ring R/I is a field.
Proof. Suppose that R is a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R. Suppose first that I is a maximal ideal, and we show that the factor ring R/I is a field.
Since R is a commutative ring with an identity, the factor ring R/I is also a commu-
tative ring with an identity. We must show then that each nonzero element of R/I has a
multiplicative inverse. Suppose then that r = r + I ∈ R/I is a nonzero element of R/I. It
follows that r ∉ I. Consider the set ⟨r, I⟩ = {rx + i : x ∈ R, i ∈ I}. This is also an ideal (see
exercises) called the ideal generated by r and I, denoted ⟨r, I⟩. Clearly, I ⊂ ⟨r, I⟩, and
since r ∉ I, and r = r ⋅ 1 + 0 ∈ ⟨r, I⟩, it follows that ⟨r, I⟩ ≠ I. Since I is a maximal ideal,
it follows that ⟨r, I⟩ = R the whole ring. Hence, the identity element 1 ∈ ⟨r, I⟩, and so,
there exist elements x ∈ R and i ∈ I such that 1 = rx + i. But then 1 + I = rx + I, and
so, 1 + I = (r + I)(x + I). Since 1 + I is the multiplicative identity of R/I, it follows that
x + I is the multiplicative inverse of r + I. Therefore, each nonzero element of R/I has a
multiplicative inverse, and R/I is a field.
Conversely, suppose that R/I is a field, and let J be an ideal with I ⊂ J and I ≠ J.
Choose r ∈ J with r ∉ I. Then r + I ≠ 0 in R/I, so there is an x + I with (r + I)(x + I) = 1 + I.
Hence, 1 − rx ∈ I ⊂ J, and since rx ∈ J, we get 1 ∈ J, and J = R. Therefore, I is a maximal
ideal.
Recall that a field is already an integral domain. Combining this with the ideas of
prime and maximal ideals we obtain:
Theorem 2.3.3. Let R be a commutative ring with an identity 1 ≠ 0. Then each maximal
ideal is a prime ideal.
Proof. Suppose that R is a commutative ring with an identity and I is a maximal ideal
in R. Then from Theorem 2.3.2, we have that the factor ring R/I is a field. But a field is an
integral domain, so R/I is an integral domain. Therefore, from Theorem 2.2.2, we have
that I must be a prime ideal.
The converse is not true in general. That is, there are prime ideals that are not max-
imal. Consider, for example, R = ℤ the integers and I = {0}. Then I is an ideal, and
R/I = ℤ/{0} ≅ ℤ is an integral domain. Hence, {0} is a prime ideal. However, ℤ is not
a field, so {0} is not maximal. Note, however, that in the integers ℤ, a nonzero proper
ideal is maximal if and only if it is a prime ideal.
Zorn’s lemma. Let M be a partially ordered set. If each chain of M has an upper bound
in M, then there is at least one maximal element in M.
Axiom of well-ordering. Each set M can be well-ordered; that is, M admits an order in
which each nonempty subset of M contains a least element.
Axiom of choice. Let {Mi : i ∈ I} be a nonempty collection of nonempty sets. Then there
is a mapping f : I → ⋃i∈I Mi with f (i) ∈ Mi for all i ∈ I.
Theorem 2.4.1. Zorn’s lemma, the axiom of well-ordering and the axiom of choice are all
equivalent.
We now show the existence of maximal ideals in commutative rings with identity.
Theorem 2.4.2. Let R be a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R with I ≠ R. Then there exists a maximal ideal I0 in R with I ⊂ I0 . In particular, a ring
with an identity contains maximal ideals.
Proof. Let I be an ideal in the commutative ring R. We must show that there exists a
maximal ideal I0 in R with I ⊂ I0 .
Let
M = {X : X is an ideal in R with I ⊂ X and X ≠ R}.
Then M is partially ordered by containment. We want to show first that each chain in M
has an upper bound in M. If K = {Xj : Xj ∈ M, j ∈ J} is a chain, let
X ′ = ⋃j∈J Xj .
Lemma 2.5.1. Let R be a commutative ring and a1 , . . . , an be elements of R. Then the set
⟨a1 , . . . , an ⟩ = {r1 a1 + ⋅ ⋅ ⋅ + rn an : ri ∈ R}
forms an ideal in R, called the ideal generated by a1 , . . . , an .
Proof. If
a = r1 a1 + ⋅ ⋅ ⋅ + rn an , b = s1 a1 + ⋅ ⋅ ⋅ + sn an
are two elements of this set, then a ± b = (r1 ± s1 )a1 + ⋅ ⋅ ⋅ + (rn ± sn )an and, for r ∈ R,
ra = (rr1 )a1 + ⋅ ⋅ ⋅ + (rrn )an also lie in the set. Hence, it is an ideal.
Theorem 2.5.2. Every ideal in ℤ is a principal ideal.
Proof. Every ideal I in ℤ is of the form nℤ. This is the principal ideal generated by n.
Definition 2.5.4. A principal ideal domain or PID is an integral domain, in which every
ideal is principal.
We mention that the set of polynomials K[x] with coefficients from a field K is also
a principal ideal domain. We will return to this in the next chapter.
Not every integral domain is a PID. Consider K[x, y] = (K[x])[y], the set of polyno-
mials over K in two variables x, y (see Chapter 4). Let I consist of all the polynomials
with zero constant term.
Lemma 2.5.6. The set I in K[x, y] as defined above is an ideal, but not a principal ideal.
Proof. We leave the proof that I forms an ideal to the exercises. To show that it is not
a principal ideal, suppose I = ⟨p(x, y)⟩. Now the polynomial q(x) = x has zero constant
term, so q(x) ∈ I. Hence, p(x, y) cannot be a constant polynomial. In addition, if p(x, y)
had any terms with y in them, there would be no way to multiply p(x, y) by a polynomial
h(x, y) and obtain just x. Therefore, p(x, y) can contain no terms with y in them. But the
same argument, using s(y) = y, shows that p(x, y) cannot have any terms with x in them.
Therefore, there can be no such p(x, y) generating I, and so, I is not principal, and K[x, y]
is not a principal ideal domain.
2.6 Exercises
1. Consider the set ⟨r, I⟩ = {rx + i : x ∈ R, i ∈ I}, where I is an ideal. Prove that this is
also an ideal called the ideal generated by r and I, denoted ⟨r, I⟩.
2. Let R and S be commutative rings, and let ϕ : R → S be a ring epimorphism. Let
M be a maximal ideal in R. Show that ϕ(M) is a maximal ideal in S if and only if
ker(ϕ) ⊂ M. Is ϕ(M) always a prime ideal of S?
3. Let A1 , . . . , At be ideals of a commutative ring R. Let P be a prime ideal of R. Show:
(i) ⋂ti=1 Ai ⊂ P implies Aj ⊂ P for at least one index j.
(ii) ⋂ti=1 Ai = P implies Aj = P for at least one index j.
4. Which of the following ideals A are prime ideals of R? Which are maximal ideals?
(i) A = ⟨x⟩, R = ℤ[x].
(ii) A = ⟨x 2 ⟩, R = ℤ[x].
(iii) A = ⟨1 + √5⟩, R = ℤ[√5] = {a + b√5 : a, b ∈ ℤ}.
(iv) A = ⟨x, y⟩, R = ℚ[x, y].
5. Let w = (1 + √−3)/2. Show that ⟨2⟩ is a prime ideal and even a maximal ideal of ℤ[w],
but ⟨2⟩ is neither a prime ideal nor a maximal ideal of ℤ[i], i = √−1 ∈ ℂ.
6. Let R = {a/b : a, b ∈ ℤ, b odd}. Show that R is a subring of ℚ, and that there is only
one maximal ideal M in R.
7. Let R be a commutative ring with an identity. Let x, y ∈ R, where x ≠ 0 and x is not a
zero divisor. Furthermore, let ⟨x⟩ be a prime ideal with ⟨x⟩ ⊂ ⟨y⟩ ≠ R. Show that ⟨x⟩ = ⟨y⟩.
8. Consider K[x, y] the set of polynomials over K in two variables x, y. Let I consist of
all the polynomials with zero constant term. Prove that the set I is an ideal.
3 Prime Elements and Unique Factorization Domains
In this chapter we use again polynomials over integral domains with one or two indeter-
minates in an elementary fashion. We will consider polynomial rings in detail in later
chapters.
Theorem 3.1.1 (Fundamental theorem of arithmetic). Given any integer n ≠ 0, there is a
factorization
n = cp1 p2 ⋅ ⋅ ⋅ pk ,
where c = ±1 is a unit, and p1 , . . . , pk are primes (perhaps with no prime factors at all
when n = ±1). This factorization is unique up to the ordering of the factors.
There are two main ingredients that go into the proof: induction and Euclid’s lemma.
We presented this in the last chapter. In turn, however, Euclid’s lemma depends upon
the existence of greatest common divisors and their linear expressibility. Therefore, to
begin, we present several basic ideas from number theory.
The starting point for the theory of numbers is divisibility.
Definition 3.1.2. If a, b are integers, we say that a divides b, or that a is a factor or divisor
of b, if there exists an integer q such that b = aq. We denote this by a|b. b is then a multiple
of a. If b > 1 is an integer whose only factors are ±1, ±b, then b is a prime; otherwise, b
is composite.
Theorem 3.1.4 (Division algorithm). Given integers a, b with a > 0, then there exist unique
integers q and r such that b = qa + r, where either r = 0 or 0 < r < a.
One may think of q and r as the quotient and remainder, respectively, when dividing
b by a.
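The division algorithm is easy to realize in code; for a > 0, Python's floor division already yields the remainder 0 ≤ r < a, even for negative b. An illustrative sketch of ours:

```python
def division(b, a):
    """Division algorithm of Theorem 3.1.4: b = q*a + r with 0 <= r < a, for a > 0."""
    q, r = b // a, b % a     # floor division keeps the remainder nonnegative when a > 0
    assert b == q * a + r and 0 <= r < a
    return q, r

assert division(23, 7) == (3, 2)
assert division(-23, 7) == (-4, 5)   # the remainder stays nonnegative for negative b
```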
Proof. Consider the set
S = {b − qa : q ∈ ℤ, b − qa ≥ 0}.
If b > 0, then b + a ≥ 0, and the sum is in S. If b ≤ 0, then there exists a q > 0 with
−qa < b. Then b + qa > 0 and is in S. Therefore, in either case, S is nonempty. Hence, S
is a nonempty subset of ℕ ∪ {0} and, therefore, has a least element r. If r ≠ 0, we must
show that 0 < r < a. Suppose r ≥ a, then r = a + x with x ≥ 0, and x < r since a > 0.
Then b − qa = r = a + x ⇒ b − (q + 1)a = x. This means that x ∈ S. Since x < r, this
contradicts the minimality of r. Therefore, if r ≠ 0, it follows
that 0 < r < a.
The only thing left is to show the uniqueness of q and r. Suppose b = q1 a + r1 also.
By the construction above, r1 must also be the minimal element of S. Hence, r1 ≤ r, and
r ≤ r1 so r = r1 . Now
b − qa = b − q1 a ⇒ (q1 − q)a = 0, and since a > 0, it follows that q1 = q.
Definition 3.1.5. Given nonzero integers a, b, their greatest common divisor or GCD
d > 0 is a positive integer such that it is their common divisor, that is, d|a and d|b, and
if d1 is any other common divisor, then d1 |d. We denote the greatest common divisor of
a, b by either gcd(a, b) or (a, b).
Certainly, if a, b are nonzero integers with a > 0 and a|b, then a = gcd(a, b).
The next result says that given any nonzero integers, they do have a greatest com-
mon divisor, and it is unique.
Theorem 3.1.6. Given nonzero integers a, b, their GCD exists, is unique, and can be char-
acterized as the least positive linear combination of a and b.
Proof. Consider the set of positive linear combinations of a and b:
S = {ax + by : x, y ∈ ℤ, ax + by > 0}.
Now, a ⋅ a + b ⋅ b = a2 + b2 > 0, so S is a nonempty subset of ℕ and, hence, has a least
element, d > 0. We show that d is the GCD.
First we must show that d is a common divisor. Now d = ax + by and is the least
such positive linear combination. By the division algorithm, a = qd + r with 0 ≤ r < d.
Suppose r ≠ 0. Then r = a − qd = a − q(ax + by) = (1 − qx)a − qby > 0. Hence, r is a
positive linear combination of a and b, and therefore in S. But then r < d, contradicting
the minimality of d in S. It follows that r = 0, and so, a = qd, and d|a. An identical
argument shows that d|b, and so, d is a common divisor of a and b. Let d1 be any other
common divisor of a and b. Then d1 divides any linear combination of a and b, and so
d1 |d. Therefore, d is the GCD of a and b.
Finally, we must show that d is unique. Suppose d1 is another GCD of a and b. Then
d1 > 0, and d1 is a common divisor of a, b. Then d1 |d since d is a GCD. Identically, d|d1
since d1 is a GCD. Therefore, d = ±d1 , and then d = d1 since they are both positive.
If (a, b) = 1, then we say that a, b are relatively prime. It follows that a and b are
relatively prime if and only if 1 is expressible as a linear combination of a and b. We
need the following three results:
Lemma 3.1.7. If d = (a, b), so that a = a1 d and b = b1 d, then (a1 , b1 ) = 1.
Proof. If d = (a, b), then d|a, and d|b. Hence, a = a1 d, and b = b1 d. We have
d = ax + by = a1 dx + b1 dy.
Dividing through by d, we obtain
1 = a1 x + b1 y.
Since (a1 , b1 ) divides every linear combination of a1 and b1 , it divides 1.
Therefore, (a1 , b1 ) = 1.
Lemma 3.1.8. For any integer c, we have that (a, b) = (a, b + ac).
Proof. Suppose (a, b) = d and (a, b + ac) = d1 . Now d is the least positive linear combi-
nation of a and b; suppose d = ax + by. d1 is a linear combination of a and b + ac, say
d1 = au + (b + ac)v = a(u + cv) + bv, so that d1 is also a positive linear combination of a
and b, and d ≤ d1 . On the other hand, d = ax + by = a(x − cy) + (b + ac)y is a positive
linear combination of a and b + ac, so that d1 ≤ d. Therefore, d = d1 .
The next result, called the Euclidean algorithm, provides a technique for both find-
ing the GCD of two integers and expressing the GCD as a linear combination.
Theorem 3.1.9 (Euclidean algorithm). Given integers b and a > 0 with a ∤ b, the following
repeated divisions are formed:
b = q1 a + r1 , 0 < r1 < a
a = q2 r1 + r2 , 0 < r2 < r1
⋮
rn−2 = qn rn−1 + rn , 0 < rn < rn−1
rn−1 = qn+1 rn .
Then rn , the last nonzero remainder, is the GCD (a, b), and it can be expressed as a linear
combination of a and b by working backward through the divisions.
Proof. In taking the successive divisions as outlined in the statement of the theorem,
each remainder ri gets strictly smaller and still nonnegative. Hence, it must finally end
with a zero remainder. Therefore, there is a last nonzero remainder rn . We must show
that this is the GCD.
Now from Lemma 3.1.8, the gcd (a, b) = (a, b − q1 a) = (a, r1 ) = (r1 , a − q2 r1 ) = (r1 , r2 ).
Continuing in this manner, we have then that (a, b) = (rn−1 , rn ) = rn since rn divides rn−1 .
This shows that rn is the GCD.
To express rn as a linear combination of a and b, first notice that
rn = rn−2 − qn rn−1 .
Substituting rn−1 = rn−3 − qn−1 rn−2 and continuing backward through the divisions, we
eventually express rn as a linear combination of a and b.
Example 3.1.10. Find the GCD of 270 and 2412, and express it as a linear combination of
270 and 2412.
We apply the Euclidean algorithm:
2412 = 8 ⋅ 270 + 252,
270 = 1 ⋅ 252 + 18,
252 = 14 ⋅ 18.
Therefore, the last nonzero remainder is 18, which is the GCD. We now must express 18
as a linear combination of 270 and 2412. From the second equation, 18 = 270 − 1 ⋅ 252,
and from the first equation, 252 = 2412 − 8 ⋅ 270. Hence,
18 = 270 − (2412 − 8 ⋅ 270) = 9 ⋅ 270 − 1 ⋅ 2412.
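The computation in Example 3.1.10 can be verified mechanically. The following sketch (ours, not part of the text) records the successive divisions of Theorem 3.1.9 and checks the resulting linear combination:

```python
def euclid_steps(b, a):
    """Run the repeated divisions of Theorem 3.1.9; return (gcd, list of (q, r) pairs)."""
    steps = []
    while a != 0:
        q, r = divmod(b, a)
        steps.append((q, r))
        b, a = a, r
    return b, steps

g, steps = euclid_steps(2412, 270)
assert g == 18                       # the last nonzero remainder is the GCD
assert 9 * 270 - 1 * 2412 == 18      # back-substitution expresses 18 in terms of 270, 2412
```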
The next result that we need is Euclid’s lemma. We stated and proved this in the last
chapter, but we restate it here.
Lemma 3.1.11 (Euclid’s lemma). If p is a prime and p|ab, then p|a, or p|b.
Lemma 3.1.12. Any integer n > 1 can be expressed as a product of primes, perhaps with
only one factor.
Proof. The proof is by induction. n = 2 is prime, so the statement is true at the lowest
level. Suppose that each integer k with 2 ≤ k < n can be decomposed into prime factors.
We must show that n then also has a prime factorization.
If n is prime, then we are done. Suppose then that n is composite. Hence, n = m1 m2
with 1 < m1 < n, 1 < m2 < n. By the inductive hypothesis, both m1 and m2 can be
expressed as products of primes. Therefore, n can, also using the primes from m1 and
m2 , completing the proof.
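The induction in this proof translates directly into a recursive procedure. A sketch of ours: trial division up to √n either finds a proper factorization n = m1 m2 , on whose parts we recurse, or certifies that n is prime.

```python
def prime_factors(n):
    """Decompose n > 1 into primes, mirroring the induction in Lemma 3.1.12."""
    for m in range(2, int(n ** 0.5) + 1):
        if n % m == 0:                     # n = m1 * m2 is composite; recurse on both parts
            return prime_factors(m) + prime_factors(n // m)
    return [n]                             # n has no proper factor, so n is prime

assert prime_factors(360) == [2, 2, 2, 3, 3, 5]
assert prime_factors(97) == [97]
```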
Before we continue to the fundamental theorem, we mention that the existence of
a prime decomposition, unique or otherwise, can be used to prove that the set of primes
is infinite. The proof we give goes back to Euclid and is quite straightforward.
Theorem 3.1.13 (Euclid). There are infinitely many primes.
Proof. Suppose that there are only finitely many primes p1 , . . . , pn . Each of these is posi-
tive, so we can form the positive integer
N = p1 p2 ⋅ ⋅ ⋅ pn + 1.
From Lemma 3.1.12, N has a prime decomposition. In particular, there is a prime p, which
divides N. Then
p|(p1 p2 ⋅ ⋅ ⋅ pn + 1).
Since the only primes are assumed to be p1 , p2 , . . . , pn , it follows that p = pi for some i =
1, . . . , n. But then p|p1 p2 ⋅ ⋅ ⋅ pn , so p cannot divide p1 ⋅ ⋅ ⋅ pn + 1, which is a contradiction.
Therefore, p is not one of the given primes, showing that the list of primes must be
endless.
Proof. We assume that n ≥ 1. If n ≤ −1, we take c = −1, apply the same argument to −n,
and the proof is the same. The statement certainly holds for n = 1 with k = 0. Now
suppose n > 1. From Lemma 3.1.12,
n has a prime decomposition:
n = p1 p2 ⋅ ⋅ ⋅ pm .
We must show that this is unique up to the ordering of the factors. Suppose then that n
has another such factorization n = q1 q2 ⋅ ⋅ ⋅ qk with the qi all prime. We must show that
m = k, and that the primes are the same up to order. Now we have
n = p1 p2 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qk .
Since p1 |n,
it follows that p1 |q1 q2 ⋅ ⋅ ⋅ qk . From Lemma 3.1.11 then, we must have that p1 |qi for some i.
But qi is prime, and p1 > 1, so it follows that p1 = qi . Therefore, we can eliminate p1 and
qi from both sides of the factorization to obtain
p2 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qi−1 qi+1 ⋅ ⋅ ⋅ qk .
Continuing in this manner, we can eliminate all the pi from the left side of the factoriza-
tion (assuming, without loss of generality, that m ≤ k) to obtain
1 = qm+1 ⋅ ⋅ ⋅ qk .
If k > m, the primes qm+1 , . . . , qk would all divide 1, which is impossible. Therefore,
m = k, and each prime pi was included among the primes q1 , . . . , qk . Therefore, the
factorizations differ only in the
order of the factors, proving the theorem.
Notice that in the integers ℤ, the units are just ±1. The set of prime elements co-
incides with the set of irreducible elements. In ℤ, these are precisely the set of prime
numbers. On the other hand, if K is a field, every nonzero element is a unit. Therefore,
in K, there are no prime elements and no irreducible elements.
Recall that the modular rings ℤn are fields (and integral domains) when n is a prime.
In general, if n is not a prime then ℤn is a commutative ring with an identity, and a unit
is still an invertible element. We can characterize the units within ℤn .
Lemma 3.2.2. An element a ∈ ℤn is a unit if and only if (a, n) = 1.
Proof. Suppose (a, n) = 1. Then there exist x, y ∈ ℤ such that ax + ny = 1. This implies
that ax ≡ 1 (mod n), which in turn implies that ax = 1 in ℤn and, therefore, a is a unit.
Conversely, suppose a is a unit in ℤn . Then there is an x ∈ ℤn with ax = 1. In terms
of congruence then
ax ≡ 1 (mod n) ⇒ n|(ax − 1) ⇒ ax − 1 = ny ⇒ ax − ny = 1.
Hence, 1 is a linear combination of a and n, and therefore, (a, n) = 1.
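This characterization of the units of ℤn is easy to verify computationally. The sketch below (our own illustration) lists the units of ℤ8 and computes their inverses with Python's built-in modular inverse pow(a, -1, n), which works exactly when (a, n) = 1:

```python
from math import gcd

def units(n):
    """The units of Z_n are exactly the residues coprime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

assert units(8) == [1, 3, 5, 7]
# each unit has an inverse: pow(a, -1, n) realizes the linear combination ax + ny = 1
assert all((a * pow(a, -1, 8)) % 8 == 1 for a in units(8))
```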
In any commutative ring with an identity, the set of units forms a group.
Lemma 3.2.3. If R is a commutative ring with an identity, then the set of units in R form
an Abelian group under ring multiplication. This is called the unit group of R, denoted
U(R).
Proof. The commutativity and associativity of U(R) follow from the ring properties. The
identity of U(R) is the multiplicative identity of R, whereas the ring multiplicative in-
verse for each unit is the group inverse. We must show that U(R) is closed under ring
multiplication. Suppose that a, b ∈ U(R), so that a−1 , b−1 ∈ R. Then
(ab)(b−1 a−1 ) = a(bb−1 )a−1 = aa−1 = 1.
Hence, ab has an inverse, namely, b−1 a−1 (= a−1 b−1 in a commutative ring) and, hence,
ab is also a unit. Therefore, U(R) is closed under ring multiplication.
In general, irreducible elements are not prime. Consider, for example, the subring of
the complex numbers (see exercises) given by
R = ℤ[√−5] = {x + iy√5 : x, y ∈ ℤ}.
This is a subring of the complex numbers ℂ and, hence, can have no zero divisors. There-
fore, R is an integral domain.
For an element x + iy√5 ∈ R, define its norm by
N(x + iy√5) = x^2 + 5y^2 .
Lemma 3.2.4. The norm on R is multiplicative, that is, N(ab) = N(a)N(b) for all a, b ∈ R.
Moreover, a ∈ R is a unit if and only if N(a) = 1, and the only units in R are ±1.
Proof. The fact that the norm is multiplicative is straightforward and left to the exer-
cises. If a ∈ R is a unit, then there exists a multiplicative inverse b ∈ R with ab = 1. Then
N(ab) = N(a)N(b) = 1. Since both N(a) and N(b) are nonnegative integers, we must
have N(a) = N(b) = 1.
Conversely, suppose that N(a) = 1. If a = x + iy√5, then x^2 + 5y^2 = 1. Since x, y ∈ ℤ,
we must have y = 0 and x^2 = 1. Then a = x = ±1.
Using this lemma, we can show that R possesses irreducible elements that are not
prime.
Lemma 3.2.5. In R, the element 3 is irreducible, but not a prime element.
Proof. First, 3 is irreducible: if 3 = ab with a, b ∈ R, then 9 = N(3) = N(a)N(b). If neither
a nor b were a unit, then by Lemma 3.2.4, N(a) = N(b) = 3; but x^2 + 5y^2 = 3 has no
integral solutions. Hence, a or b is a unit.
We show that 3 is not prime in R. Let a = 2 + i√5 and b = 2 − i√5. Then ab = 9 and,
hence, 3|ab. Suppose 3|a so that a = 3c for some c ∈ R. Then
9 = N(a) = N(3)N(c) = 9N(c) ⇒ N(c) = 1.
Therefore, c is a unit in R, and from Lemma 3.2.4, we get c = ±1. Hence, a = ±3. This
is a contradiction, so 3 does not divide a. An identical argument shows that 3 does not
divide b. Therefore, 3 is not a prime element in R.
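The norm computations behind this argument can be checked by machine. The following sketch (our own illustration) verifies the two factorizations of 9 and confirms that no element of R has norm 3; since x^2 + 5y^2 already exceeds 3 when |x| > 2 or |y| > 1, the finite search suffices.

```python
def norm(x, y):
    """N(x + iy*sqrt(5)) = x^2 + 5y^2 on R = Z[sqrt(-5)]."""
    return x * x + 5 * y * y

# 9 factors two ways: 9 = 3*3 and 9 = (2 + i sqrt5)(2 - i sqrt5)
assert norm(3, 0) == 9 and norm(2, 1) * norm(2, -1) == 81
# 3 is irreducible: a proper factorization would need an element of norm 3,
# but x^2 + 5y^2 = 3 has no integer solutions
assert all(norm(x, y) != 3 for x in range(-2, 3) for y in range(-1, 2))
```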
Theorem 3.2.6. Let R be an integral domain, and let p ∈ R with p ≠ 0. Then we have the
following:
(1) If p is a prime element, then p is irreducible.
(2) p is a prime element if and only if pR is a prime ideal.
(3) p is irreducible if and only if pR is maximal in the set of principal ideals of R not
equal to R.
Proof. (1) Suppose that p ∈ R is a prime element, and p = ab. We must show that either
a or b must be a unit. Now p|ab, so either p|a, or p|b. Without loss of generality, we may
assume that p|a, so a = pr for some r ∈ R. Hence, p = ab = (pr)b = p(rb). However, R is
an integral domain, so p − prb = p(1 − rb) = 0 implies that 1 − rb = 0 and, hence, rb = 1.
Therefore, b is a unit and, hence, p is irreducible.
(2) Suppose that p is a prime element. Then p ≠ 0. Consider the ideal pR, and suppose
that ab ∈ pR. Then ab is a multiple of p and, hence, p|ab. Since p is prime, it follows that
p|a or p|b. If p|a, then a ∈ pR, whereas if p|b, then b ∈ pR. Therefore, pR is a prime ideal.
Conversely, suppose that pR is a prime ideal, and suppose that p|ab. Then ab ∈ pR,
so a ∈ pR, or b ∈ pR. If a ∈ pR, then p|a, and if b ∈ pR, then p|b. Therefore, p is prime.
(3) Let p be irreducible, then p ≠ 0. Suppose that pR ⊂ aR, where a ∈ R. Then p = ra
for some r ∈ R. Since p is irreducible, it follows that either a is a unit, or r is a unit. If
r is a unit, we have pR = raR = aR ≠ R since p is not a unit. If a is a unit, then aR = R,
and pR = rR ≠ R. Therefore, pR is maximal in the set of principal ideals not equal to R.
Conversely, suppose p ≠ 0 and pR is a maximal ideal in the set of principal ideals ≠ R. Let
p = ab with a not a unit. We must show that b is a unit. Since aR ≠ R, and pR ⊂ aR, from
the maximality we must have pR = aR. Hence, a = rp for some r ∈ R. Then p = ab = rpb
and, as before, we must have rb = 1 and b a unit.
Theorem 3.2.7. Let R be a principal ideal domain. Then we have the following:
(1) An element p ∈ R is irreducible if and only if it is a prime element.
(2) A nonzero ideal of R is a maximal ideal if and only if it is a prime ideal.
(3) The maximal ideals of R are precisely those ideals pR, where p is a prime element.
Proof. First note that {0} is a prime ideal, but not maximal.
(1) We already know that prime elements are irreducible. To show the converse,
suppose that p is irreducible. Since R is a principal ideal domain from Theorem 3.2.6, we
have that pR is a maximal ideal, and each maximal ideal is also a prime ideal. Therefore,
from Theorem 3.2.6, we have that p is a prime element.
(2) We already know that each maximal ideal is a prime ideal. To show the converse,
suppose that I ≠ {0} is a prime ideal. Then I = pR, where p is a prime element with
p ≠ 0. Therefore, p is irreducible from part (1) and, hence, pR is a maximal ideal from
Theorem 3.2.6.
(3) This follows directly from the proof in part (2) and Theorem 3.2.6.
This theorem, in particular, explains the following remark at the end of Section 2.3: in
the principal ideal domain ℤ, a nonzero proper ideal is maximal if and only if it is a
prime ideal.
Definition 3.3.1. An integral domain R is a unique factorization domain or UFD if each
r ∈ R with r ≠ 0, which is not a unit, has a factorization into irreducible elements, unique
up to ordering and associates; that is, if
r = p1 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qk ,
with all pi , qj irreducible, then m = k, and after reordering, each pi is an associate of qi .
There are several relationships in integral domains that are equivalent to unique
factorization. Consider the following properties of an integral domain R:
(A) Each r ∈ R with r ≠ 0, which is not a unit, is a product of irreducible elements.
(A′) Each r ∈ R with r ≠ 0, which is not a unit, is a product of prime elements.
(B) Each factorization into irreducible elements is unique up to ordering and associates.
(C) Each irreducible element of R is a prime element.
Notice that properties (A) and (C) together are equivalent to what we defined as
unique factorization. Hence, an integral domain satisfying (A) and (C) is a UFD. Next, we
show that there are other equivalent formulations.
Theorem 3.3.2. Let R be an integral domain. Then the following are equivalent:
(1) R is a UFD.
(2) R satisfies properties (A) and (B).
(3) R satisfies properties (A) and (C).
(4) R satisfies property (A′).
Proof. As remarked before, properties (A) and (C) are, by definition, equivalent to
unique factorization, so that (1) and (3) are equivalent. We show here that (2), (3), and (4)
are equivalent.
First, we show that (2) implies (3).
Suppose that R satisfies properties (A) and (B). We must show that it also satisfies (C);
that is, we must show that if q ∈ R is irreducible, then q is prime. Suppose that q ∈ R
is irreducible and q|ab with a, b ∈ R. Then we have ab = cq for some c ∈ R. If a is a
unit, then from ab = cq, we get that b = a−1 cq, and q|b. The result is identical if b is a
unit. Therefore, we may assume that neither a nor b is a unit.
If c = 0, then since R is an integral domain, either a = 0, or b = 0, and q|a, or q|b.
We may assume then that c ≠ 0.
If c is a unit, then q = c−1 ab, and since q is irreducible, either c−1 a or b is a unit. If
c−1 a is a unit, then a is also a unit. Therefore, if c is a unit, either a or b is a unit, contrary
to our assumption.
Therefore, we may assume that c ≠ 0, and c is not a unit. From (A) we have
a = q1 ⋅ ⋅ ⋅ qr
b = q1′ ⋅ ⋅ ⋅ qs′
c = q1′′ ⋅ ⋅ ⋅ qt′′ ,
with all the factors irreducible. Then
q1′′ ⋅ ⋅ ⋅ qt′′ q = cq = ab = q1 ⋅ ⋅ ⋅ qr q1′ ⋅ ⋅ ⋅ qs′
gives two factorizations of ab into irreducible elements. From (B), q is an associate of
some qi or qj′ . Hence, q|qi or q|qj′ . It follows that q|a, or q|b and, therefore, q is a prime
element.
That (3) implies (4) is direct.
We show that (4) implies (2). Suppose that R satisfies (A′ ). We must show that it satis-
fies both (A) and (B). We show first that (A) follows from (A′ ) by showing that irreducible
elements are prime. Suppose that q is irreducible. Then from (A′ ), we have
q = p1 ⋅ ⋅ ⋅ pr
with each pi prime. Since q is irreducible and each pi is a nonunit, we must have r = 1,
for otherwise q = p1 (p2 ⋅ ⋅ ⋅ pr ) would be a product of two nonunits. Thus, q = p1 , and
q is prime; that is, irreducible elements are prime. Since prime elements are always
irreducible, each factorization into primes given by (A′) is a factorization into irre-
ducible elements. Therefore, (A) holds.
We now show that (B) holds. Let
q1 ⋅ ⋅ ⋅ qr = q1′ ⋅ ⋅ ⋅ qs′
be two factorizations into irreducible elements. Since irreducible elements are prime,
we have
q1′ |q1 ⋅ ⋅ ⋅ qr ,
and so, q1′ |qi for some i. Without loss of generality, suppose q1′ |q1 . Then q1 = aq1′ . Since q1
is irreducible, it follows that a is a unit, and q1 and q1′ are associates. Canceling q1′ , it
follows then that
aq2 ⋅ ⋅ ⋅ qr = q2′ ⋅ ⋅ ⋅ qs′ ,
since R has no zero divisors. Property (B) holds then by induction, and the theorem is
proved.
Note that in our new terminology, ℤ is a UFD. In the next section, we will present
other examples of UFDs. However, not every integral domain is a unique factorization
domain.
As we defined in the last section, let R = ℤ[√−5] = {x + iy√5 : x, y ∈ ℤ}, a subring
of ℂ. The two factorizations
9 = 3 ⋅ 3 = (2 + i√5)(2 − i√5)
give two different decompositions for an element in terms of irreducible elements. The
fact that R is not a UFD also follows from the fact that 3 is an irreducible element, which
is not prime.
Unique factorization is tied to the famous solution of Fermat’s last theorem. Wiles
and Taylor in 1995 proved the following:
Theorem 3.3.4. The equation x p + yp = zp has no integral solutions with xyz ≠ 0 for any
prime p ≥ 3.
The connection arises as follows: if p ≥ 3 is a prime and ϵ = e^{2πi/p} is a primitive pth
root of unity, then
z^p − y^p = (z − y)(z − ϵy) ⋅ ⋅ ⋅ (z − ϵ^{p−1} y),
a factorization that takes place in the ring
R = ℤ[ϵ] = {a0 + a1 ϵ + ⋅ ⋅ ⋅ + ap−1 ϵ^{p−1} : aj ∈ ℤ}.
Kummer proved that if R is a UFD, then property (Fp ) holds. However, by independent
results of Uchida and of Montgomery (1971), R is a UFD only if p ≤ 19 (see [59]).
Consider an ascending chain of ideals in a ring R:
I1 ⊂ I2 ⊂ ⋅ ⋅ ⋅ ⊂ In ⊂ ⋅ ⋅ ⋅ .
The chain becomes stationary if there is an index n0 such that In = In0 for all n ≥ n0 .
Theorem 3.4.1. Let R be an integral domain. If each ascending chain of principal ideals
in R becomes stationary, then R satisfies property (A).
Proof. Suppose that a ≠ 0 is not a unit in R, and suppose that a is not a product of ir-
reducible elements. Clearly then, a cannot itself be irreducible. Hence, a = a1 b1 with
a1 , b1 ∈ R, and a1 , b1 are not units. If both a1 and b1 can be expressed as products of irre-
ducible elements, then so can a. Without loss of generality then, suppose that a1 is not a
product of irreducible elements.
Since a1 |a, we have the inclusion of ideals aR ⊆ a1 R. If a1 R = aR, then a1 ∈ aR, and
a1 = ar = a1 b1 r, which implies that b1 is a unit contrary to our assumption. Therefore,
aR ≠ a1 R, and the inclusion is proper. By iteration then, we obtain a strictly increasing
chain of ideals
aR ⊂ a1 R ⊂ ⋅ ⋅ ⋅ ⊂ an R ⊂ ⋅ ⋅ ⋅ .
From our hypothesis on R, this chain must become stationary, contradicting the fact
that each inclusion above is proper. Therefore, a must be a product of irreducibles.
Theorem 3.4.2. Each principal ideal domain is a unique factorization domain.
Proof. Suppose that R is a principal ideal domain. R satisfies property (C) by Theo-
rem 3.2.7(1). Therefore, to show that it is a unique factorization domain, we must show
that it also satisfies property (A). From the previous theorem, it suffices to show that
each ascending chain of principal ideals becomes stationary. Consider such an ascend-
ing chain
a1 R ⊂ a2 R ⊂ ⋅ ⋅ ⋅ ⊂ an R ⊂ ⋅ ⋅ ⋅ .
Now let
I = ⋃_{i=1}^{∞} ai R.
Then I is again an ideal in R (see the exercises). Since R is a principal ideal domain,
I = aR for some a ∈ R. Now a ∈ ai R for some index i, and then I = aR ⊂ ai R ⊂ I.
Hence, an R = ai R for all n ≥ i, and the chain becomes stationary. Therefore, R satisfies
property (A) and is a unique factorization domain.
Let K be a field, and let K[x] denote the set of polynomials in the indeterminate x
with coefficients in K; that is, formal expressions
P(x) = a0 + a1 x + ⋅ ⋅ ⋅ + an x^n , ai ∈ K,
where, if an ≠ 0, n is the degree of P(x), denoted deg P(x). For a second polynomial
Q(x) = b0 + b1 x + ⋅ ⋅ ⋅ + bm x^m , addition and subtraction are done coefficientwise;
that is, the coefficient of x^i in P(x) ± Q(x) is ai ± bi , where ai = 0 for i > n, and bj = 0 for
j > m. Multiplication is given by convolution: the coefficient of x^k in P(x)Q(x) is
a0 bk + a1 bk−1 + ⋅ ⋅ ⋅ + ak b0 .
From the definitions, the following degree relationships are clear. The proofs are in
the exercises.
Lemma 3.4.4. Let 0 ≠ P(x), 0 ≠ Q(x) in K[x]. Then the following hold:
(1) deg P(x)Q(x) = deg P(x) + deg Q(x).
(2) deg(P(x) ± Q(x)) ≤ max(deg P(x), deg Q(x)) if P(x) ± Q(x) ≠ 0.
Theorem 3.4.5. If K is a field, then K[x] forms an integral domain. K can be naturally
embedded into K[x] by identifying each element of K with the corresponding constant
polynomial. The only units in K[x] are the nonzero elements of K.
Proof. Verification of the basic ring properties is solely computational and is left to the
exercises. Since deg P(x)Q(x) = deg P(x) + deg Q(x), it follows that if P(x) ≠ 0 and
Q(x) ≠ 0, then P(x)Q(x) ≠ 0 and, therefore, K[x] is an integral domain.
If G(x) is a unit in K[x], then there exists an H(x) ∈ K[x] with G(x)H(x) = 1. From
the degrees, we have deg G(x) + deg H(x) = 0, and since deg G(x) ≥ 0, deg H(x) ≥ 0. This
is possible only if deg G(x) = deg H(x) = 0. Therefore, G(x) ∈ K.
Now that we have K[x] as an integral domain, we proceed to show that K[x] is a
principal ideal domain and, hence, there is unique factorization into primes.
We first repeat the definition of a prime in K[x]. If 0 ≠ f (x) has no nontrivial,
nonunit factors (it cannot be factorized into polynomials of lower degree), then f (x) is
a prime in K[x] or a prime polynomial. A prime polynomial is also called an irreducible
polynomial. Clearly, if deg g(x) = 1, then g(x) is irreducible.
The fact that K[x] is a principal ideal domain follows from the division algorithm
for polynomials, which is entirely analogous to the division algorithm for integers.
Lemma 3.4.6 (Division algorithm in K[x]). If 0 ≠ f (x), 0 ≠ g(x) ∈ K[x], then there exist
unique polynomials q(x), r(x) ∈ K[x] such that f (x) = q(x)g(x) + r(x), where r(x) = 0 or
deg r(x) < deg g(x).
(The polynomials q(x) and r(x) are called, respectively, the quotient and remainder.)
We give a formal proof in Chapter 4 on polynomials and polynomial rings. For now
we content ourselves here with doing two computations in ℚ[x] in the following exam-
ple.
Example 3.4.7. In ℚ[x], we have
(3x^4 − 6x^2 + 8x − 6) ÷ (2x^2 + 4) = (3/2)x^2 − 6 with remainder 8x + 18,
and
(2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x) ÷ (x^2 + x) = 2x^3 + 6x + 4.
Theorem 3.4.8. Let K be a field. Then the polynomial ring K[x] is a principal ideal do-
main; hence a unique factorization domain.
Proof. The proof is essentially analogous to the proof in the integers. Let I be an ideal
in K[x] with I ≠ K[x]. Let f (x) be a polynomial in I of minimal degree. We claim that
I = ⟨f (x)⟩, the principal ideal generated by f (x). Let g(x) ∈ I. We must show that g(x) is
a multiple of f (x). By the division algorithm in K[x], we have
g(x) = q(x)f (x) + r(x),
where r(x) = 0, or deg(r(x)) < deg(f (x)). If r(x) ≠ 0, then deg(r(x)) < deg(f (x)). How-
ever, r(x) = g(x)−q(x)f (x) ∈ I since I is an ideal, and g(x), f (x) ∈ I. This is a contradiction
since f (x) was assumed to be a polynomial in I of minimal degree. Therefore, r(x) = 0
and, hence, g(x) = q(x)f (x) is a multiple of f (x). Therefore, each element of I is a multi-
ple of f (x) and, hence, I = ⟨f (x)⟩.
Therefore, K[x] is a principal ideal domain and, from Theorem 3.4.2, a unique fac-
torization domain.
We proved that in a principal ideal domain, every ascending chain of ideals becomes
stationary. In general, a ring R (commutative or not) satisfies the ascending chain con-
dition or ACC if every ascending chain of left (or right) ideals in R becomes stationary.
A ring satisfying the ACC is called a Noetherian ring.
3.5 Euclidean Domains
Definition 3.5.1. An integral domain R is a Euclidean domain if there exists a norm N : R ∖ {0} → ℕ ∪ {0}, called a Euclidean norm, with the following properties:
(1) N(a) ≤ N(ab) for all a, b ∈ R ∖ {0};
(2) for all a, b ∈ R with b ≠ 0, there exist q, r ∈ R such that a = qb + r, where r = 0 or N(r) < N(b).
Therefore, Euclidean domains are precisely those integral domains that admit a division algorithm. In the integers ℤ, define N(z) = |z|. Then N is a Euclidean norm on
ℤ and, hence, ℤ is a Euclidean domain. On K[x], define N(p(x)) = deg(p(x)) if p(x) ≠ 0.
Then N is also a Euclidean norm on K[x] so that K[x] is also a Euclidean domain. In any
Euclidean domain, we can mimic the proofs of unique factorization in both ℤ and K[x]
to obtain the following:
Theorem 3.5.2. Every Euclidean domain is a principal ideal domain; hence a unique fac-
torization domain.
Before proving this theorem, we must develop some results on the number theory
of general Euclidean domains. First, some properties of the norm.
(b) Suppose u is a unit. Then there exists u⁻¹ with u ⋅ u⁻¹ = 1. Then N(u) ≤ N(u ⋅ u⁻¹) = N(1) and, from part (a), N(1) ≤ N(u); hence, N(u) = N(1). Conversely, suppose N(u) = N(1). By the division algorithm, there are q, r with
1 = qu + r.
If r ≠ 0, then N(r) < N(u) = N(1), contradicting the minimality of N(1). Therefore, r = 0,
and 1 = qu. Then u has a multiplicative inverse and, hence, is a unit.
(c) Suppose a, b ∈ R⋆ are associates. Then a = ub with u a unit. Then N(b) ≤ N(ub) = N(a). Similarly, b = u⁻¹a, so N(a) ≤ N(b). Since N(a) ≤ N(b), and N(b) ≤ N(a), it follows that N(a) = N(b).
(d) Suppose N(a) = N(ab). Apply the division algorithm to a and ab:
a = q(ab) + r,
where r = 0 or N(r) < N(ab). If r ≠ 0, then r = a(1 − qb), so N(a) ≤ N(r), contradicting N(r) < N(ab) = N(a). Hence, r = 0, and a = q(ab) = (qb)a. Then
a = (qb)a = 1 ⋅ a ⇒ qb = 1
and, hence, b is a unit.
The proof of Theorem 3.5.2 now mimics the argument for K[x]: if I ≠ {0} is an ideal in a Euclidean domain R, choose a ∈ I with N(a) minimal. For any b ∈ I, write b = qa + r, with r = 0 or N(r) < N(a). Since r = b − qa ∈ I, the minimality of N(a) forces r = 0. Therefore, I = ⟨a⟩, and R is a principal ideal domain; hence a unique factorization domain.
The Gaussian integers are the set
ℤ[i] = {a + bi : a, b ∈ ℤ}.
It was first observed by Gauss that this set permits unique factorization. To show this,
we need a Euclidean norm on ℤ[i].
For α = a + bi ∈ ℤ[i], define
N(α) = N(a + bi) = a² + b².
The basic properties of this norm follow directly from the definition (see exercises).
From the multiplicativity of the norm, we have the following concerning primes and units in ℤ[i].

Lemma 3.5.6. An element u ∈ ℤ[i] is a unit if and only if N(u) = 1. Furthermore, if π ∈ ℤ[i] with N(π) = p for some rational prime p, then π is a prime in ℤ[i].

Proof. Certainly, u is a unit if and only if N(u) = N(1). But in ℤ[i], we have N(1) = 1. Therefore, the first part follows.
Suppose next that π ∈ ℤ[i] with N(π) = p for some prime p ∈ ℤ. Suppose that π = π₁π₂. From the multiplicativity of the norm, we have
p = N(π) = N(π₁)N(π₂).
Since each norm is a positive ordinary integer, and p is a prime, it follows that either N(π₁) = 1, or N(π₂) = 1. Hence, either π₁ or π₂ is a unit. Therefore, π is a prime in ℤ[i].
Armed with this norm, we can show that ℤ[i] is a Euclidean domain.
Proof. That ℤ[i] forms a commutative ring with an identity can be verified directly and
easily. If αβ = 0, then N(α)N(β) = 0, and since there are no zero divisors in ℤ, we must
have N(α) = 0, or N(β) = 0. But then either α = 0, or β = 0 and, hence, ℤ[i] is an integral
domain. To complete the proof, we show that the norm N is a Euclidean norm.
From the multiplicativity of the norm, we have, for α, β ≠ 0,
N(α) ≤ N(α)N(β) = N(αβ).
Therefore, property (1) of Euclidean norms is satisfied. We must now show that the di-
vision algorithm holds.
Let α = a + bi and β = c + di be Gaussian integers. Recall that the inverse of a nonzero complex number z = x + iy is
1/z = z̄/|z|² = (x − iy)/(x² + y²).
Hence,
α/β = αβ̄/|β|² = (a + bi)(c − di)/(c² + d²) = (ac + bd)/(c² + d²) + ((bc − ad)/(c² + d²))i = u + iv.
Thus, α/β = u + iv lies in the field {u + iv : u, v ∈ ℚ}. Choose rational integers m and n with |u − m| ≤ 1/2 and |v − n| ≤ 1/2, and set q = m + ni ∈ ℤ[i]. Now
|α/β − q| = |(u − m) + i(v − n)| = √((u − m)² + (v − n)²) ≤ √((1/2)² + (1/2)²) < 1.
Therefore, setting r = α − qβ, we have α = qβ + r, where r = 0 or N(r) = |β|² |α/β − q|² < |β|² = N(β). Hence, the division algorithm holds, and N is a Euclidean norm on ℤ[i].
Since ℤ[i] forms a Euclidean domain, it follows from our previous results that ℤ[i]
must be a principal ideal domain; hence a unique factorization domain.
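The rounding argument above is constructive, so it can be carried out in code. The following Python sketch (our illustration; the pair representation and the names `gauss_divmod`, `norm` are assumptions of the example, not the book's notation) computes q and r in ℤ[i]:

```python
def gauss_divmod(alpha, beta):
    """Division algorithm in Z[i]: alpha, beta are pairs (a, b) representing
    a + bi, with beta != 0.  Returns (q, r) with alpha = q*beta + r and
    N(r) < N(beta)."""
    a, b = alpha
    c, d = beta
    Nb = c * c + d * d                     # N(beta) = c^2 + d^2
    un, vn = a * c + b * d, b * c - a * d  # alpha/beta = un/Nb + (vn/Nb)i
    # nearest integers m, n: |un/Nb - m| <= 1/2 and |vn/Nb - n| <= 1/2
    m = (2 * un + Nb) // (2 * Nb)
    n = (2 * vn + Nb) // (2 * Nb)
    # r = alpha - (m + ni) * beta
    r = (a - (m * c - n * d), b - (m * d + n * c))
    return (m, n), r

def norm(z):
    return z[0] * z[0] + z[1] * z[1]

alpha, beta = (27, 23), (8, 1)
q, r = gauss_divmod(alpha, beta)
assert norm(r) < norm(beta)                                    # N(r) < N(beta)
m, n = q
assert (m * 8 - n * 1 + r[0], m * 1 + n * 8 + r[1]) == alpha   # alpha = q*beta + r
```

The floor-division trick rounds the rational quotients to nearest integers using only integer arithmetic, mirroring the bound |α/β − q| < 1 from the proof.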
Since we will now be dealing with many kinds of integers, we will refer to the ordi-
nary integers ℤ as the rational integers and the ordinary primes p as the rational primes.
It is clear that ℤ can be embedded into ℤ[i]. However, not every rational prime is also
prime in ℤ[i]. The primes in ℤ[i] are called the Gaussian primes. For example, we can
show that both 1 + i and 1 − i are Gaussian primes; that is, primes in ℤ[i]. However,
(1 + i)(1 − i) = 2. Therefore, the rational prime 2 is not a prime in ℤ[i]. Using the multi-
plicativity of the Euclidean norm in ℤ[i], we can describe all the units and primes in ℤ[i].
Theorem 3.5.9. (1) The only units in ℤ[i] are ±1, ±i.
(2) Suppose π is a Gaussian prime. Then π is one of the following:
(a) a positive rational prime p ≡ 3 (mod 4), or an associate of such a rational prime.
(b) 1 + i, or an associate of 1 + i.
(c) a + bi, or a − bi, where a > 0, b > 0, a is even, and N(π) = a² + b² = p with p a rational prime congruent to 1 modulo 4, or an associate of a + bi, or a − bi.
Proof. (1) Suppose u = x + iy ∈ ℤ[i] is a unit. Then, from Lemma 3.5.6, N(u) = x² + y² = 1, implying that (x, y) = (0, ±1) or (x, y) = (±1, 0). Hence, u = ±1 or u = ±i.
(2) Now suppose that π is a Gaussian prime. Since N(π) = ππ̄, and π̄ ∈ ℤ[i], it follows that π|N(π). N(π) is a rational integer, so N(π) = p₁ ⋯ pₖ, where the pᵢ's are rational primes. By Euclid's lemma, π|pᵢ for some pᵢ and, hence, a Gaussian prime must
divide at least one rational prime. On the other hand, suppose π|p and π|q, where p, q
are different primes. Then (p, q) = 1 and, hence, there exist x, y ∈ ℤ such that 1 = px +qy.
It follows that π|1, which is a contradiction. Therefore, a Gaussian prime divides one and only
one rational prime.
Let p be the rational prime that π divides. Then N(π)|N(p) = p². Since N(π) is a rational integer, it follows that N(π) = p, or N(π) = p². If π = a + bi, then a² + b² = p, or a² + b² = p².
If p = 2, then a² + b² = 2, or a² + b² = 4. It follows that π = ±2, ±2i, or π = 1 + i, or an associate of 1 + i. Since (1 + i)(1 − i) = 2, and neither 1 + i nor 1 − i is a unit, it follows that neither 2 nor any of its associates is prime. Then π = 1 + i, or an associate of 1 + i. To see that 1 + i is prime, suppose 1 + i = αβ. Then N(1 + i) = 2 = N(α)N(β). It follows that either N(α) = 1 or N(β) = 1, and either α or β is a unit.
If p ≠ 2, then either p ≡ 3 (mod 4), or p ≡ 1 (mod 4). First suppose p ≡ 3 (mod 4). Then a² + b² = p would imply, by Fermat's two-square theorem (see [53]), that p ≡ 1 (mod 4). Therefore, from the remarks above, a² + b² = p², and N(π) = N(p). Since π|p, we have π = αp with α ∈ ℤ[i]. From N(π) = N(p), we get that N(α) = 1, and α is a unit. Therefore, π and p are associates. Hence, in this case, π is an associate of a rational prime congruent to 3 modulo 4.
Finally, suppose p ≡ 1 (mod 4). From the remarks above, either N(π) = p, or N(π) = p². If N(π) = p², then a² + b² = p². Since p ≡ 1 (mod 4), from Fermat's two-square theorem, there exist m, n ∈ ℤ with m² + n² = p. Let u = m + in; then the norm N(u) = uū = p. Since p is a rational prime, it follows that u is a Gaussian prime. Similarly, its conjugate ū is also a Gaussian prime. Now π|p, and p = uū; hence, π|uū, and from Euclid's lemma, either π|u, or π|ū. If π|u, then they are associates since both are primes. But this is a contradiction since N(π) = p² ≠ p = N(u). The same is true if π|ū.
It follows that if p ≡ 1 (mod 4), then N(π) ≠ p². Therefore, N(π) = p = a² + b². An associate of π has both a, b > 0 (see exercises). Furthermore, since a² + b² = p, one of a or b must be even. If a is odd, then b is even, and iπ is an associate of π with even real part, completing the proof.
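The classification in Theorem 3.5.9 can be explored numerically. A brief Python sketch (helper names are ours): an odd rational prime p remains a Gaussian prime exactly when p ≡ 3 (mod 4); otherwise a two-square decomposition a² + b² = p exists and gives the Gaussian prime a + bi.

```python
from math import isqrt

def stays_gaussian_prime(p):
    """An odd rational prime p remains prime in Z[i] iff p ≡ 3 (mod 4)."""
    return p % 4 == 3

def two_squares(p):
    """Brute-force a, b >= 1 with a^2 + b^2 = p, or None if none exists."""
    for a in range(1, isqrt(p) + 1):
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return a, b
    return None

assert two_squares(2) == (1, 1)        # 2 = (1 + i)(1 - i)
assert two_squares(13) == (2, 3)       # 13 = (2 + 3i)(2 - 3i), 13 ≡ 1 (mod 4)
assert two_squares(7) is None          # 7 ≡ 3 (mod 4): 7 is a Gaussian prime
assert stays_gaussian_prime(7) and not stays_gaussian_prime(13)
```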
Finally, we mention that the methods used in ℤ[i] cannot be applied to all quadratic
integers. For example, we have seen that there is not unique factorization in ℤ[√−5].
Definition 3.6.1. (1) A Dedekind domain D is an integral domain such that each nonzero proper ideal A ({0} ≠ A ≠ D) can be written as a product of prime ideals,
A = P₁ ⋯ Pᵣ,
with each Pᵢ a prime ideal and the factorization unique up to ordering.
(2) A Prüfer ring R is an integral domain such that
A ⋅ (B ∩ C) = AB ∩ AC
for all ideals A, B, C of R.
Dedekind domains arise naturally in algebraic number theory. It can be proved that the ring of algebraic integers in any algebraic number field is a Dedekind domain (see [53]). If R is a Dedekind domain, it is also a Prüfer ring. If R is a Prüfer ring and a unique factorization domain, then R is a principal ideal domain. In the next chapter, we will prove a theorem of Gauss, which states that if R is a UFD, then the polynomial ring R[x] is also a UFD. If K is a field, we have already seen that K[x] is a UFD. Hence,
the polynomial ring in several variables K[x1 , . . . , xn ] is also a UFD. This fact plays an
important role in algebraic geometry.
3.7 Exercises
1. Let R be an integral domain, and let π ∈ R \ (U(R) ∪ {0}). Show the following:
(i) If for each a ∈ R with π ∤ a, there exist λ, μ ∈ R with λπ + μa = 1, then π is a
prime element of R.
(ii) Give an example for a prime element π in a UFD R, which does not satisfy the
conditions of (i).
2. Let R be a UFD, and let a1 , . . . , at be pairwise coprime elements of R. If a1 ⋅ ⋅ ⋅ at is an
m-th power (m ∈ ℕ), then all factors ai are associates of an m-th power. Is each ai
necessarily an m-th power?
3. Decide if the unit group of ℤ[√k] = {a + b√k : a, b ∈ ℤ}, k = 3, 5, 7, is finite or infinite.
For which a ∈ ℤ are (1 − √5) and (a + √5) associates in ℤ[√5]?
4. Let k ∈ ℤ and k ≠ x 2 for all x ∈ ℤ. Let α = a + b√k and β = c + d √k be elements of
ℤ[√k], and N(α) = a2 − kb2 , N(β) = c2 − kd 2 . Show the following:
(i) The equality of the absolute values of N(α) and N(β) is necessary for α and β to be associates in ℤ[√k]. Is this condition also sufficient?
(ii) The irreducibility of N(α) in ℤ is sufficient for the irreducibility of α in ℤ[√k]. Is it also necessary?
5. In general, irreducible elements are not prime. Consider the set of complex numbers given by
f = ∑_{i=0}^{∞} rᵢxⁱ = ∑_{i=0}^{m} rᵢxⁱ
for some m ≥ 0, since rᵢ ≠ 0 for only finitely many i. Furthermore, this presentation is unique. We now call x an indeterminate over R, and write each element of R̃ as f(x) = ∑_{i=0}^{m} rᵢxⁱ with f(x) = 0 or rₘ ≠ 0. We also now write R[x] for R̃. Each element of R[x] is called a polynomial over R. The elements r₀, . . . , rₘ are called the coefficients of
f (x) with rm the leading coefficient. If rm ≠ 0, the non-negative integer m is called the
degree of f (x), which we denote by deg f (x). We say that f (x) = 0 has degree −∞. The
uniqueness of the representation of a polynomial implies that two nonzero polynomi-
als are equal if and only if they have the same degree and exactly the same coefficients.
A polynomial of degree 1 is called a linear polynomial, whereas one of degree two is a
quadratic polynomial. The set of polynomials of degree 0, together with 0, forms a ring isomorphic to R and, hence, can be identified with R; these are the constant polynomials. Thus, the
ring R embeds in the set of polynomials R[x]. The following results are straightforward
concerning degree:
Lemma 4.1.1. Let f (x) ≠ 0, g(x) ≠ 0 ∈ R[x]. Then the following hold:
(a) deg f (x)g(x) ≤ deg f (x) + deg g(x).
(b) deg(f (x) ± g(x)) ≤ max(deg f (x), deg g(x)).
Theorem 4.1.2. Let R be a commutative ring with an identity. Then the set of polynomials
R[x] forms a ring called the ring of polynomials over R. The ring R identified with 0 and the
polynomials of degree 0 naturally embeds into R[x]. R[x] is commutative. Furthermore,
R[x] is uniquely determined by R and x.
∑_{i≥0} rᵢxⁱ ↦ ∑_{i≥0} rᵢαⁱ.
Hence, R[x] is uniquely determined by R and x. We remark that R[α] must be commutative.
If f(x) = r₀ + r₁x + ⋯ + rₙxⁿ ∈ R[x] and c ∈ R, then substituting c for x yields the element
f(c) = r₀ + r₁c + ⋯ + rₙcⁿ ∈ R.
Definition 4.1.4. If f (x) ∈ R[x] and f (c) = 0 for c ∈ R, then c is called a zero or a root of
f (x) in R.
4.2 Polynomial Rings over Fields
Theorem 4.2.1. If K is a field, then K[x] forms an integral domain. K can be naturally
embedded into K[x] by identifying each element of K with the corresponding constant
polynomial. The only units in K[x] are the nonzero elements of K.
Proof. Verification of the basic ring properties is solely computational and is left to the
exercises. Since deg P(x)Q(x) = deg P(x) + deg Q(x), it follows that if P(x) ≠ 0 and Q(x) ≠ 0, then P(x)Q(x) ≠ 0. Therefore, K[x] is an integral domain.
If G(x) is a unit in K[x], then there exists an H(x) ∈ K[x] with G(x)H(x) = 1.
From the degrees, we have deg G(x) + deg H(x) = 0, and since deg G(x) ≥ 0,
deg H(x) ≥ 0. This is possible only if deg G(x) = deg H(x) = 0. Therefore, G(x) ∈ K.
Now that we have K[x] as an integral domain, we proceed to show that K[x] is a
principal ideal domain and, hence, there is unique factorization into primes. We first
repeat the definition of a prime in K[x]. If 0 ≠ f (x) has no nontrivial, nonunit factors (it
cannot be factorized into polynomials of lower degree), then f (x) is a prime in K[x] or
a prime polynomial. A prime polynomial is also called an irreducible polynomial over K.
Clearly, if deg g(x) = 1, then g(x) is irreducible.
The fact that K[x] is a principal ideal domain follows from the division algorithm
for polynomials, which is entirely analogous to the division algorithm for integers.
Theorem 4.2.2 (Division algorithm in K[x]). If 0 ≠ f (x), 0 ≠ g(x) ∈ K[x], then there exist
unique polynomials q(x), r(x) ∈ K[x] such that f (x) = q(x)g(x) + r(x), where r(x) = 0, or
deg r(x) < deg g(x). (The polynomials q(x) and r(x) are called respectively the quotient
and remainder.)
Proof. If deg f(x) = 0 and deg g(x) ≥ 1, then we just choose q(x) = 0 and r(x) = f(x). If deg f(x) = 0 = deg g(x), then f(x) = f ∈ K and g(x) = g ∈ K, and we choose q(x) = f/g and r(x) = 0. Hence, Theorem 4.2.2, including the uniqueness statement, is proved for deg f(x) = 0.
Now, let n > 0, and assume that Theorem 4.2.2 is proved for all f(x) ∈ K[x] with deg f(x) < n. Given f(x) = aₙxⁿ + ⋯ + a₀ of degree n and g(x) = bₘxᵐ + ⋯ + b₀ of degree m (if n < m, we may take q(x) = 0 and r(x) = f(x)), define
h(x) = f(x) − (aₙ/bₘ)x^{n−m} g(x).
We have deg h(x) < n. Hence, by induction assumption, there are q1 (x) and r(x) with
h(x) = q1 (x)g(x) + r(x) and deg r(x) < deg g(x). Then
f(x) = h(x) + (aₙ/bₘ)x^{n−m} g(x)
     = ((aₙ/bₘ)x^{n−m} + q₁(x)) g(x) + r(x)
     = q(x)g(x) + r(x) with q(x) = (aₙ/bₘ)x^{n−m} + q₁(x),
where r(x) = 0 or deg r(x) < deg g(x). This proves existence.
For uniqueness, suppose that f(x) = q₁(x)g(x) + r₁(x) = q₂(x)g(x) + r₂(x) with
deg r₁(x) < deg g(x), and deg r₂(x) < deg g(x)
(or r₁(x) = 0, respectively r₂(x) = 0). Then r₁(x) − r₂(x) = (q₂(x) − q₁(x))g(x). If r₁(x) ≠ r₂(x), then q₂(x) − q₁(x) ≠ 0, so the right-hand side has degree ≥ deg g(x), whereas deg(r₁(x) − r₂(x)) < deg g(x), a contradiction. Therefore, r₁(x) = r₂(x), and furthermore q₁(x) = q₂(x) because K[x] is an integral domain.
For example, in ℚ[x],
(2x³ + x² − 5x + 3)/(x² + x + 1) = 2x − 1 with remainder −6x + 4.
Hence, q(x) = 2x − 1, r(x) = −6x + 4, and
2x³ + x² − 5x + 3 = (2x − 1)(x² + x + 1) + (−6x + 4).
Theorem 4.2.4. Let K be a field. Then the polynomial ring K[x] is a principal ideal domain,
and hence a unique factorization domain.
Theorem 4.2.5. Let f(x) ∈ K[x] be a nonzero polynomial, and let c ∈ K with f(c) = 0. Then
f(x) = (x − c)h(x),
where h(x) ∈ K[x] and deg h(x) < deg f(x).

Proof. By the division algorithm, f(x) = (x − c)h(x) + r(x), where r(x) = 0, or deg r(x) < deg(x − c) = 1. Hence, if r(x) ≠ 0, then r(x) is a polynomial of degree 0, that is, a constant polynomial, and thus r(x) = r for some r ∈ K. Hence, we have
f(x) = (x − c)h(x) + r.
Substituting c for x gives
0 = f(c) = 0 ⋅ h(c) + r = r
and, therefore, r = 0, and f (x) = (x − c)h(x). Since deg(x − c) = 1, we must have that
deg h(x) < deg f (x).
If f (x) = (x − c)k h(x) for some k ≥ 1 with h(c) ≠ 0, then c is called a zero of order k.
Theorem 4.2.6. Let f (x) ∈ K[x] with degree 2 or 3. Then f is irreducible if and only if f (x)
does not have a zero in K.
Proof. Suppose that f (x) is irreducible of degree 2 or 3. If f (x) has a zero c, then from
Theorem 4.2.5, we have f (x) = (x − c)h(x) with h(x) of degree 1 or 2. Therefore, f (x) is
reducible, a contradiction; hence, f(x) cannot have a zero.
From Theorem 4.2.5, if f (x) has a zero and is of degree greater than 1, then f (x) is
reducible.
Conversely, if f(x) is reducible, then f(x) = g(x)h(x) with two nonunit factors; since deg f(x) is 2 or 3, one factor, say g(x), has deg g(x) = 1 and, hence, f(x) has a zero in K.
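Over a finite field, Theorem 4.2.6 becomes a finite check. A Python sketch (the helper name and ℤ_p representation are ours) tests a degree-2 or degree-3 polynomial for a root:

```python
def irreducible_deg2or3_mod_p(coeffs, p):
    """Theorem 4.2.6 over Z_p: a polynomial of degree 2 or 3 (coefficient
    list, lowest degree first, leading coefficient nonzero mod p) is
    irreducible iff it has no zero in Z_p.  Valid only for these degrees."""
    assert len(coeffs) in (3, 4) and coeffs[-1] % p != 0
    return all(sum(a * c**i for i, a in enumerate(coeffs)) % p != 0
               for c in range(p))

assert irreducible_deg2or3_mod_p([1, 1, 1], 2)        # x^2 + x + 1 over Z_2
assert not irreducible_deg2or3_mod_p([1, 0, 1], 2)    # x^2 + 1 = (x + 1)^2
assert irreducible_deg2or3_mod_p([1, 2, 0, 1], 3)     # x^3 + 2x + 1 over Z_3
```

Note that the root test is specific to degrees 2 and 3; a quartic such as (x² + x + 1)² has no root yet is reducible.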
Notice that this concept depends on the ring R. For example, 6 and 9
are not coprime over the integers ℤ since 3|6 and 3|9 and 3 is not a unit. However, 6 and
9 are coprime over the rationals ℚ. Here, 3 is a unit.
Definition 4.3.2. Let f(x) = ∑_{i=0}^{n} rᵢxⁱ ∈ R[x], where R is an integral domain. Then f(x) is a primitive polynomial, or just primitive, if r₀, r₁, . . . , rₙ are coprime in R.
Theorem 4.3.3. Let R be an integral domain. Then the units of R[x] are precisely the units of R, and each prime element of R is also a prime element of R[x].

Proof. If r ∈ R is a unit, then since R embeds into R[x], it follows that r is also a unit in R[x]. Conversely, suppose that h(x) ∈ R[x] is a unit. Then there is a g(x) such that h(x)g(x) = 1. Hence, deg h(x) + deg g(x) = deg 1 = 0. Since degrees are nonnegative integers, it follows that deg h(x) = deg g(x) = 0. Hence, h(x) = h ∈ R with hg = 1, so h is a unit in R.
Now suppose that p is a prime element of R. Then p ≠ 0, and pR is a prime ideal in R.
We must show that pR[x] is a prime ideal in R[x]. Consider the map τ : R[x] → (R/pR)[x] that reduces each coefficient modulo pR.
Then τ is an epimorphism with kernel pR[x]. Since pR is a prime ideal, we know that
R/pR is an integral domain. It follows that (R/pR)[x] is also an integral domain. Hence,
pR[x] must be a prime ideal in R[x], and therefore p is also a prime element of R[x].
Recall that each integral domain R can be embedded into a unique field of frac-
tions K. We can use results on K[x] to deduce some results in R[x].
Lemma 4.3.4. Let K be a field. Then each nonzero f(x) ∈ K[x] is primitive.

Proof. Since K is a field, each nonzero element of K is a unit. Therefore, the only common divisors of the coefficients of f(x) are units and, hence, f(x) ∈ K[x] is primitive.
Theorem 4.3.5. Let R be an integral domain. Then each irreducible f (x) ∈ R[x] of degree
> 0 is primitive.
Proof. Let f (x) be an irreducible polynomial in R[x], and let r ∈ R be a common divisor
of the coefficients of f (x). Then f (x) = rg(x), where g(x) ∈ R[x].
Then deg f (x) = deg g(x) > 0, so g(x) ∉ R. Since the units of R[x] are the units of R,
it follows that g(x) is not a unit in R[x]. Since f (x) is irreducible, it follows that r must
be a unit in R[x] and, hence, r is a unit in R. Therefore, f (x) is primitive.
Theorem 4.3.6. Let R be an integral domain and K its field of fractions. If f (x) ∈ R[x] is
primitive and irreducible in K[x], then f (x) is irreducible in R[x].
Proof. Suppose that f (x) ∈ R[x] is primitive and irreducible in K[x], and suppose that
f (x) = g(x)h(x), where g(x), h(x) ∈ R[x] ⊂ K[x]. Since f (x) is irreducible in K[x], either
g(x) or h(x) must be a unit in K[x]. Without loss of generality, suppose that g(x) is a unit
in K[x]. Then g(x) = g ∈ K. But g(x) ∈ R[x], and K ∩ R[x] = R.
Hence, g ∈ R. Then g is a divisor of the coefficients of f (x), and as f (x) is primitive,
g(x) must be a unit in R and, therefore, also a unit in R[x]. Therefore, f (x) is irreducible
in R[x].
4.4 Polynomial Rings over Unique Factorization Domains
Theorem 4.4.1 (Gauss’ lemma). Let R be a UFD and f (x), g(x) primitive polynomials
in R[x]. Then their product f (x)g(x) is also primitive.
Proof. Let R be a UFD and f (x), g(x) primitive polynomials in R[x]. Suppose that f (x)g(x)
is not primitive. Then there is a prime element p ∈ R that divides each of the coefficients
of f (x)g(x). Then p|f (x)g(x). Since prime elements of R are also prime elements of R[x],
it follows that p is also a prime element of R[x] and, hence, p|f (x), or p|g(x). Therefore,
either f (x) or g(x) is not primitive, giving a contradiction.
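Over ℤ, primitivity is simply a gcd condition on the coefficients, so Gauss' lemma is easy to spot-check. A sketch (helper names are ours):

```python
from math import gcd
from functools import reduce

def is_primitive(coeffs):
    """f in Z[x] is primitive iff the gcd of its coefficients is 1 (a unit)."""
    return reduce(gcd, (abs(c) for c in coeffs)) == 1

def poly_mul(f, g):
    """Product in Z[x]; coefficient lists, lowest degree first."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

f, g = [2, 3], [3, 0, 2]                 # 3x + 2 and 2x^2 + 3
assert is_primitive(f) and is_primitive(g)
assert is_primitive(poly_mul(f, g))      # Gauss' lemma: the product is primitive
assert not is_primitive([6, 9, 12])      # gcd 3: not primitive
```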
Theorem 4.4.2. Let R be a UFD, and let K be its field of fractions. Then the following hold:
(a) For each nonzero g(x) ∈ K[x], there is a nonzero a ∈ K such that ag(x) is primitive in R[x].
(b) If g(x) ∈ R[x] is primitive, a ∈ K, and f(x) = ag(x) ∈ R[x], then a ∈ R.
(c) For each nonzero f(x) ∈ R[x], there exist b ∈ R and a primitive g(x) ∈ R[x] with f(x) = bg(x).

Proof. (a) Suppose that g(x) = ∑_{i=0}^{n} aᵢxⁱ with aᵢ = rᵢ/sᵢ, rᵢ, sᵢ ∈ R. Set s = s₀s₁ ⋯ sₙ. Then sg(x) is a nonzero element of R[x]. Let d be a greatest common divisor of the coefficients of sg(x). If we set a = s/d, then ag(x) is primitive.
(b) For a ∈ K, there are coprime r, s ∈ R satisfying a = r/s. Suppose that a ∉ R. Then there is a prime element p ∈ R dividing s. Since g(x) is primitive, p does not divide all the coefficients of g(x). However, we also have f(x) = ag(x) = (r/s)g(x). Hence, sf(x) = rg(x), where p|s and p does not divide r. Therefore, p divides all the coefficients of g(x), a contradiction. Hence, a ∈ R.
(c) From part (a), there is a nonzero a ∈ K such that af(x) is primitive in R[x]. Then f(x) = a⁻¹(af(x)). From part (b), we must have a⁻¹ ∈ R. Set g(x) = af(x) and b = a⁻¹.
Theorem 4.4.3. Let R be a UFD and K its field of fractions. Let f (x) ∈ R[x] be a polynomial
of degree ≥ 1.
(a) If f (x) is primitive and f (x)|g(x) in K[x], then f (x) divides g(x) also in R[x].
(b) If f (x) is irreducible in R[x], then it is also irreducible in K[x].
(c) If f (x) is primitive and a prime element of K[x], then f (x) is also a prime element
of R[x].
Proof. (a) Suppose that g(x) = f(x)h(x) with h(x) ∈ K[x]. From Theorem 4.4.2 part (a), there is a nonzero a ∈ K such that h₁(x) = ah(x) is primitive in R[x]. Hence, g(x) = (1/a)(f(x)h₁(x)). From Gauss' lemma, f(x)h₁(x) is primitive in R[x]. Therefore, from Theorem 4.4.2 part (b), we have 1/a ∈ R. It follows that f(x)|g(x) in R[x].
(b) Suppose that g(x) ∈ K[x] is a factor of f(x). From Theorem 4.4.2 part (a), there is a nonzero a ∈ K with g₁(x) = ag(x) primitive in R[x]. Since a is a unit in K, it follows that g₁(x) is also a factor of f(x) in K[x] and, by part (a), g₁(x) divides f(x) in R[x].
However, by assumption, f (x) is irreducible in R[x]. This implies that either g1 (x) is a
unit in R, or g1 (x) is an associate of f (x).
If g₁(x) is a unit, then g₁ ∈ K, and g₁ = ag. Hence, g ∈ K; that is, g(x) = g is a unit in K[x].
If g1 (x) is an associate of f (x), then f (x) = bg(x), where b ∈ K since g1 (x) = ag(x)
with a ∈ K. Combining these, it follows that f (x) has only trivial factors in K[x], and
since—by assumption—f (x) is nonconstant, it follows that f (x) is irreducible in K[x].
(c) Suppose that f (x)|g(x)h(x) with g(x), h(x) ∈ R[x]. Since f (x) is a prime element
in K[x], we have that f (x)|g(x) or f (x)|h(x) in K[x]. From part (a), we have f (x)|g(x) or
f (x)|h(x) in R[x] implying that f (x) is a prime element in R[x].
Theorem 4.4.4 (Gauss). Let R be a UFD. Then the polynomial ring R[x] is also a UFD.
Proof. By induction, on degree, we show that each nonunit f (x) ∈ R[x], f (x) ≠ 0, is a
product of prime elements. Since R is an integral domain, so is R[x]. Therefore, the fact
that R[x] is a UFD then follows from Theorem 3.3.3.
If deg f (x) = 0, then f (x) = f is a nonunit in R. Since R is a UFD, f is a product of
prime elements in R. However, from Theorem 4.3.3, each prime factor is then also prime
in R[x]. Therefore, f (x) is a product of prime elements.
Now suppose n > 0 and that the claim is true for all polynomials f (x) of degree < n.
Let f (x) be a polynomial of degree n > 0. From Theorem 4.4.2 (c), there is an a ∈ R and a
primitive h(x) ∈ R[x] satisfying f (x) = ah(x). Since R is a UFD, the element a is a product
of prime elements in R, or a is a unit in R. Since the units in R[x] are the units in R, and a
prime element in R is also a prime element in R[x], it follows that a is a product of prime
elements in R[x], or a is a unit in R[x]. Let K be the field of fractions of R. Then K[x] is a
UFD. Hence, h(x) is a product of prime elements of K[x].
Let p(x) ∈ K[x] be a prime divisor of h(x). From Theorem 4.4.2, we can assume by
multiplication of field elements that p(x) ∈ R[x], and p(x) is primitive.
From Theorem 4.4.3 (c), it follows that p(x) is a prime element of R[x]. Furthermore, from Theorem 4.4.3 (a), p(x) is a divisor of h(x) in R[x]. Therefore,
h(x) = p(x)g(x) with g(x) ∈ R[x] and deg g(x) < deg h(x).
By our inductive hypothesis, we have then that g(x) is a product of prime elements in
R[x], or g(x) is a unit in R[x]. Therefore, the claim holds for f (x), and therefore holds for
all f (x) by induction.
If R[x] is a polynomial ring over R, we can form a polynomial ring in a new indeter-
minate y over this ring to form (R[x])[y]. It is straightforward that (R[x])[y] is isomor-
phic to (R[y])[x]. We denote both of these rings by R[x, y] and consider this as the ring
of polynomials in two commuting variables x, y with coefficients in R.
If R is a UFD, then from Theorem 4.4.4, R[x] is also a UFD. Hence, R[x, y] is also a
UFD. Inductively then, the ring of polynomials in n commuting variables R[x1 , x2 , . . . , xn ]
is also a UFD.
Here, the ring R[x₁, . . . , xₙ] is inductively given by R[x₁, . . . , xₙ] = (R[x₁, . . . , xₙ₋₁])[xₙ] for n ≥ 2.
We now give a condition for a polynomial in R[x] to have a zero in K, where K is the field of fractions of R. Suppose that f(x) = xⁿ + aₙ₋₁xⁿ⁻¹ + ⋯ + a₀ ∈ R[x] is monic, and suppose that β = r/s ∈ K is a zero of f(x) with r, s ∈ R coprime. Then
f(r/s) = 0 = rⁿ/sⁿ + aₙ₋₁ rⁿ⁻¹/sⁿ⁻¹ + ⋯ + a₀.
Multiplying by sⁿ gives rⁿ = −s(aₙ₋₁rⁿ⁻¹ + aₙ₋₂rⁿ⁻²s + ⋯ + a₀sⁿ⁻¹). Hence, it follows that s must divide rⁿ. Since r and s are coprime, s must be a unit, and then, without loss of generality, we may assume that s = 1. Then β = r ∈ R, and
0 = f(r) = r(rⁿ⁻¹ + aₙ₋₁rⁿ⁻² + ⋯ + a₁) + a₀,
and so r|a₀.
Note that since ℤ is a UFD, Gauss’ theorem implies that ℤ[x] is also a UFD. However,
ℤ[x] is not a principal ideal domain. For example, the set of integral polynomials with even constant term is an ideal, but not principal. We leave the verification to the exercises. On the other hand, we saw that if K is a field, K[x] is a PID. The question arises as
to when R[x] actually is a principal ideal domain. It turns out to be precisely when R is
a field.
Theorem 4.4.7. Let R be a commutative ring with an identity. Then the following are
equivalent:
(a) R is a field.
(b) R[x] is Euclidean.
(c) R[x] is a principal ideal domain.
Proof. From Section 4.2, we know that (a) implies (b), which in turn implies (c). There-
fore, we must show that (c) implies (a). Assume then that R[x] is a principal ideal domain.
Define the map
τ : R[x] → R
by
τ( f (x)) = f (0).
It is easy to see that τ is a ring homomorphism with R[x]/ ker(τ) ≅ R. Therefore, ker(τ) ≠
R[x]. Since R[x] is a principal ideal domain, it is an integral domain. It follows that ker(τ)
must be a prime ideal since the quotient ring is an integral domain. However, since R[x]
is a principal ideal domain, prime ideals are maximal ideals; hence, ker(τ) is a maximal
ideal by Theorem 3.2.7. Therefore, R ≅ R[x]/ ker(τ) is a field.
We now consider the relationship between irreducibles in R[x] for a general integral
domain and irreducibles in K[x], where K is its field of fractions. This is handled by the
next result called Eisenstein’s criterion.
Theorem 4.4.8 (Eisenstein's criterion). Let R be an integral domain and K its field of fractions. Let f(x) = ∑_{i=0}^{n} aᵢxⁱ ∈ R[x] be of degree n > 0. Let p be a prime element of R satisfying the following:
(1) p|aᵢ for i = 0, . . . , n − 1.
(2) p does not divide aₙ.
(3) p² does not divide a₀.
Then the following hold:
(a) If f(x) is primitive, then f(x) is irreducible in R[x].
(b) f(x) is irreducible in K[x].
Proof. (a) Suppose that f (x) = g(x)h(x) with g(x), h(x) ∈ R[x]. Suppose that
k l
g(x) = ∑ bi x i , bk ≠ 0 and h(x) = ∑ cj x j , cl ≠ 0.
i=0 j=0
4.4 Polynomial Rings over Unique Factorization Domains � 61
Then a0 = b0 c0 . Now p|a0 , but p2 does not divide a0 . This implies that either p does not
divide b0 , or p doesn’t divide c0 . Without loss of generality, assume that p|b0 and p does
not divide c0 .
Since aₙ = bₖcₗ, and p does not divide aₙ, it follows that p does not divide bₖ. Let bⱼ be the first coefficient of g(x) not divisible by p. Consider
aⱼ = bⱼc₀ + ⋯ + b₀cⱼ,
where every term after the first is divisible by p. Since p divides neither bⱼ nor c₀, it follows that p does not divide bⱼc₀. Therefore, p does not divide aⱼ, which implies
that j = n. Then from j ≤ k ≤ n, it follows that k = n.
Therefore, deg g(x) = deg f (x) and, hence, deg h(x) = 0. Thus, h(x) = h ∈ R. Then
from f (x) = hg(x) with f primitive, it follows that h is a unit and, therefore, f (x) is
irreducible.
(b) Suppose that f (x) = g(x)h(x) with g(x), h(x) ∈ R[x]. The fact that f (x) was prim-
itive was only used in the final part of part (a). Therefore, by the same arguments as in
part (a), we may assume—without loss of generality—that h ∈ R ⊂ K. Therefore, f (x) is
irreducible in K[x].
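The three hypotheses of Eisenstein's criterion are mechanical to verify for ℤ[x]. A sketch (function name is ours; a return value of False only means the criterion does not apply at this p, not that f is reducible):

```python
def eisenstein(coeffs, p):
    """Eisenstein's criterion for f in Z[x] at the prime p (coefficient list,
    lowest degree first): p | a_i for all i < n, p does not divide a_n, and
    p^2 does not divide a_0."""
    *lower, an = coeffs
    return (all(a % p == 0 for a in lower)
            and an % p != 0
            and lower[0] % (p * p) != 0)

assert eisenstein([6, 9, 3, 1], 3)       # x^3 + 3x^2 + 9x + 6: irreducible over Q
assert not eisenstein([4, 2, 1], 2)      # fails: p^2 divides a_0
assert not eisenstein([3, 1, 1], 3)      # fails: p does not divide a_1
```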
Example 4.4.9. Let R = ℤ and p a prime number. Suppose that n, m are integers such
that n ≥ 1 and p does not divide m. Then xⁿ ± pm is irreducible in ℤ[x] and ℚ[x]. In particular, (pm)^{1/n} is irrational.
Example 4.4.10. Let p be a prime number, and consider the p-th cyclotomic polynomial
Φₚ(x) = (xᵖ − 1)/(x − 1) = xᵖ⁻¹ + xᵖ⁻² + ⋯ + 1.
Since all the coefficients of Φₚ(x) are equal to 1, Eisenstein's criterion is not directly applicable. However, for any integer a, Φₚ(x) is irreducible in ℤ[x] if and only if the polynomial Φₚ(x + a) is irreducible. It follows that
Φₚ(x + 1) = ((x + 1)ᵖ − 1)/((x + 1) − 1) = (xᵖ + \binom{p}{1}xᵖ⁻¹ + ⋯ + \binom{p}{p−1}x + 1 − 1)/x
          = xᵖ⁻¹ + \binom{p}{1}xᵖ⁻² + ⋯ + \binom{p}{p−1}.
Every coefficient except the leading one is divisible by p (see Exercise 8), and the constant term \binom{p}{p−1} = p is not divisible by p². Hence, Φₚ(x + 1) is irreducible by Eisenstein's criterion, and therefore Φₚ(x) is irreducible in ℤ[x].
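The binomial coefficients appearing in Φₚ(x + 1) can be checked directly; a quick Python verification (ours) of the Eisenstein hypotheses for several primes:

```python
from math import comb

def shifted_cyclotomic(p):
    """Coefficients of Phi_p(x + 1) = ((x + 1)^p - 1)/x, lowest degree first:
    the k-th coefficient is the binomial coefficient C(p, k + 1)."""
    return [comb(p, k) for k in range(1, p + 1)]

for p in (3, 5, 7, 11):
    c = shifted_cyclotomic(p)
    assert c[-1] == 1                        # monic
    assert all(a % p == 0 for a in c[:-1])   # p divides every lower coefficient
    assert c[0] == p and c[0] % (p * p) != 0 # constant term p, not divisible by p^2
```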
Theorem 4.4.11. Let R be a UFD and K its field of fractions. Let f(x) = ∑_{i=0}^{n} aᵢxⁱ ∈ R[x] be a polynomial of degree ≥ 1. Let P be a prime ideal in R with aₙ ∉ P. Let R̄ = R/P, and let α : R[x] → R̄[x] be defined by
α(∑_{i=0}^{m} rᵢxⁱ) = ∑_{i=0}^{m} (rᵢ + P)xⁱ.
Then α is an epimorphism. If α(f(x)) is irreducible in R̄[x], then f(x) is irreducible in K[x].
Proof. By Theorem 4.4.2 (c), there exist a ∈ R and a primitive g(x) ∈ R[x] satisfying f(x) = ag(x). Since aₙ ∉ P, we have that α(a) ≠ 0. Furthermore, the highest coefficient of g(x) is also not an element of P. If α(g(x)) were reducible, then α(f(x)) would also be reducible; thus, α(g(x)) is irreducible. If the theorem holds for the primitive polynomial g(x), then g(x) is irreducible in K[x], and therefore f(x) = ag(x), with a a unit in K, is also irreducible in K[x]. Therefore, to prove the theorem, it suffices to consider the case where f(x) is primitive in R[x].
Now suppose that f (x) is primitive. We show that f (x) is irreducible in R[x].
Suppose that f (x) = g(x)h(x), g(x), h(x) ∈ R[x] with h(x), g(x) nonunits in R[x].
Since f (x) is primitive, g, h ∉ R. Therefore, deg g(x) < deg f (x), and deg h(x) < deg f (x).
Now we have α(f(x)) = α(g(x))α(h(x)). Since P is a prime ideal, R̄ = R/P is an integral domain. Therefore, in R̄[x] we have
deg α(f(x)) = deg α(g(x)) + deg α(h(x)).
Now deg α(g(x)) ≤ deg g(x) and deg α(h(x)) ≤ deg h(x), whereas deg α(f(x)) = deg f(x) because aₙ ∉ P. Therefore, deg α(g(x)) = deg g(x), and deg α(h(x)) = deg h(x). In particular, both factors are nonconstant, so α(f(x)) is reducible, and we have a contradiction.
It is important to note that α(f(x)) being reducible does not imply that f(x) is reducible. For example, f(x) = x² + 1 is irreducible in ℤ[x]. However, in ℤ₂[x], we have
x² + 1 = (x + 1)².
On the other hand, consider a polynomial f(x) ∈ ℤ[x] of degree 5 with odd leading coefficient whose reduction modulo 2 is
α(f(x)) = x⁵ + x² + 1 ∈ ℤ₂[x].
Suppose that in ℤ2 [x], we have α(f (x)) = g(x)h(x). Without loss of generality, we may
assume that g(x) is of degree 1 or 2.
If deg g(x) = 1, then α(f(x)) has a zero c in ℤ₂. The two possibilities for c are c = 0 or c = 1. Then the following hold:
If c = 0, then 0 + 0 + 1 = 1 ≠ 0.
If c = 1, then 1 + 1 + 1 = 1 ≠ 0.
Hence, α(f(x)) has no zero in ℤ₂ and, therefore, no factor of degree 1.
If deg g(x) = 2, then g(x) is one of
x² + x + 1, x² + x, x² + 1, x².
The last three, x² + x, x² + 1, x², all have zeros in ℤ₂. Therefore, they cannot divide α(f(x)). Therefore, g(x) must be x² + x + 1. Applying the division algorithm, we obtain
α(f(x)) = (x³ + x²)(x² + x + 1) + 1
and, therefore, x 2 + x + 1 does not divide α(f (x)). It follows that α(f (x)) is irreducible,
and from the previous theorem, f (x) must be irreducible in ℚ[x].
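The case analysis above is a brute-force search over possible factors, which is easy to automate over ℤ₂. A sketch (helper names are ours):

```python
from itertools import product

def mod2_remainder(f, g):
    """Remainder of f divided by g over Z_2 (coefficient lists of 0/1,
    lowest degree first, g monic)."""
    f = f[:]
    while f and f[-1] == 0:
        f.pop()
    while len(f) >= len(g):
        d = len(f) - len(g)
        for i, gc in enumerate(g):
            f[i + d] ^= gc          # subtraction = XOR in Z_2
        while f and f[-1] == 0:
            f.pop()
    return f

def irreducible_mod2(coeffs):
    """Brute force: f of degree n is irreducible over Z_2 iff no monic
    polynomial of degree 1 .. n//2 divides it."""
    n = len(coeffs) - 1
    for d in range(1, n // 2 + 1):
        for tail in product((0, 1), repeat=d):
            if not mod2_remainder(coeffs, list(tail) + [1]):
                return False        # empty remainder: a divisor was found
    return True

assert irreducible_mod2([1, 0, 1, 0, 0, 1])   # x^5 + x^2 + 1, as in the text
assert not irreducible_mod2([1, 0, 1])        # x^2 + 1 = (x + 1)^2
```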
4.5 Exercises
1. For which a, b ∈ ℤ does the polynomial x 2 + 3x + 1 divide the polynomial
x 3 + x 2 + ax + b?
For f(x) ∈ R[x], define f⁽⁰⁾(x) := f(x) and f⁽ᵏ⁾(x) := (f⁽ᵏ⁻¹⁾)′(x) for k ≥ 1.
Show that α is a zero of order k of the polynomial f(x) ∈ R[x] if f⁽ⁱ⁾(α) = 0 for 0 ≤ i ≤ k − 1, but f⁽ᵏ⁾(α) ≠ 0.
7. Prove that the set of integral polynomials with even constant term is an ideal, but
not principal.
8. Prove that p divides the binomial coefficient \binom{p}{i} for 1 ≤ i ≤ p − 1, where p is a prime.
5 Field Extensions
5.1 Extension Fields and Finite Extensions
Much of algebra in general arose from the theory of equations, specifically polynomial
equations. As discovered by Galois and Abel, the solutions of polynomial equations over fields are intimately tied to the theory of field extensions. This theory eventually blossoms into Galois theory. In this chapter, we discuss the basic material concerning field
extensions.
Recall that if L is a field and K ⊂ L is also a field under the same operations as L,
then K is called a subfield of L. If we view this situation from the viewpoint of K, we say
that L is an extension field or field extension of K. If K, L are fields with K ⊂ L, we always
assume that K is a subfield of L.
Definition 5.1.1. If K, L are fields with K ⊂ L, then we say that L is a field extension or
extension field of K. We denote this by L|K.
Note that this is equivalent to having a field monomorphism
i:K →L
Definition 5.1.2. If L is an extension field of K, then the degree of the extension L|K is
defined as the dimension, dimK (L), of L, as a vector space over K. We denote the degree
by |L : K|. The field extension L|K is a finite extension if the degree |L : K| is finite.
We first observe that |ℂ : ℝ| = 2, whereas |ℝ : ℚ| = ∞.

Proof. Every complex number can be written uniquely as a + ib, where a, b ∈ ℝ. Hence,
the elements 1, i constitute a basis for ℂ over ℝ and, therefore, the dimension is 2. That
is, |ℂ : ℝ| = 2.
The fact that |ℝ : ℚ| = ∞ depends on the existence of transcendental numbers.
An element r ∈ ℝ is algebraic (over ℚ) if it satisfies some nonzero polynomial with
coefficients from ℚ. That is, P(r) = 0, where
0 ≠ P(x) = a_0 + a_1 x + ⋅ ⋅ ⋅ + a_n x^n with a_i ∈ ℚ.
Any q ∈ ℚ is algebraic since if P(x) = x − q, then P(q) = 0. However, many irrationals are
also algebraic. For example, √2 is algebraic since x^2 − 2 has √2 as a zero. An element
r ∈ ℝ is transcendental if it is not algebraic.
In general, it is very difficult to show that a particular element is transcendental.
However, there are uncountably many transcendental elements (see exercises). Specific
examples are e and π. We will give a proof of their transcendence in Chapter 20.
Since e is transcendental, for any natural number n, the set {1, e, e^2, . . . , e^n} must be
independent over ℚ, for otherwise there would be a polynomial that e would satisfy.
Therefore, we have infinitely many independent vectors in ℝ over ℚ, which would be
impossible if ℝ had finite degree over ℚ.
If L|K and L1 |K1 are field extensions, then they are isomorphic field extensions if
there exists a field isomorphism f : L → L1 such that f|K is an isomorphism from K to K1 .
Suppose that K ⊂ L ⊂ M are fields. In this situation, we call L an intermediate field of M|K. Below we show that the degrees multiply.

Theorem 5.1.5. Let K ⊂ L ⊂ M be fields. Then
|M : K| = |M : L| |L : K|.
Proof. Let {xi : i ∈ I} be a basis for L as a vector space over K, and let {yj : j ∈ J} be a basis
for M as a vector space over L. To prove the result, it is sufficient to show that the set
B = {xi yj : i ∈ I, j ∈ J}
is a basis for M as a vector space over K. To show this, we must show that B is a linearly
independent set over K, and that B spans M.
Suppose that
∑_j (∑_i k_ij x_i) y_j = 0 with k_ij ∈ K, only finitely many nonzero.
But ∑_i k_ij x_i ∈ L. Since {y_j : j ∈ J} is a basis for M over L, the y_j are independent over L;
hence, for each j, we get ∑_i k_ij x_i = 0. Now since {x_i : i ∈ I} is a basis for L over K, it follows
that the x_i are linearly independent, and since for each j we have ∑_i k_ij x_i = 0, it must be
that k_ij = 0 for all i and for all j. Therefore, the set B is linearly independent over K.
Now suppose that m ∈ M. Then since {y_j : j ∈ J} spans M over L, we have
m = ∑_j c_j y_j with c_j ∈ L.
Since {x_i : i ∈ I} spans L over K, each c_j can be written as c_j = ∑_i k_ij x_i with k_ij ∈ K. Substituting, we obtain
m = ∑_{i,j} k_ij x_i y_j
and, hence, B spans M over K. Therefore, B is a basis for M over K, and the result is
proved.
Corollary 5.1.6. (a) If |L : K| is a prime number, then there exists no proper intermediate
field between L and K.
(b) If K ⊂ L and |L : K| = 1, then L = K.
Let L|K be a field extension, and suppose that A ⊂ L. Then certainly there are sub-
rings of L containing both A and K, for example L. We denote by K[A] the intersection of
all subrings of L containing both K and A. Since the intersection of subrings is a subring,
it follows that K[A] is a subring containing both K and A and the smallest such subring.
We call K[A] the ring adjunction of A to K.
In an analogous manner, we let K(A) be the intersection of all subfields of L contain-
ing both K and A. This is then a subfield of L, and the smallest subfield of L containing
both K and A. The subfield K(A) is called the field adjunction of A to K.
Definition 5.1.7. The field extension L|K is finitely generated if there exist elements
a1 , . . . , an ∈ L such that L = K(a1 , . . . , an ). The extension L|K is a simple extension if there
is an a ∈ L with L = K(a). In this case, a is called a primitive element of L|K.
For the remainder of this section, we assume that L|K is a field extension.
Theorem 5.2.3. Every finite extension L|K is algebraic; that is, every element of L is algebraic over K.
Proof. Suppose that L|K is a finite extension and a ∈ L. We must show that a is algebraic
over K. Suppose that |L : K| = n < ∞; then dim_K(L) = n. It follows that any n + 1 elements
of L are linearly dependent over K.
Now consider the elements 1, a, a^2, . . . , a^n in L. These are n + 1 elements of L,
so they are linearly dependent over K. Hence, there exist c_0, . . . , c_n ∈ K, not all zero, such that
c_0 + c_1 a + ⋅ ⋅ ⋅ + c_n a^n = 0.
Therefore, a satisfies a nonzero polynomial over K; that is, a is algebraic over K.
From the previous theorem, it follows that every finite extension is algebraic. The
converse is not true; that is, there are algebraic extensions that are not finite. We will
give examples in Section 5.4.
The following lemma gives some examples of algebraic and transcendental exten-
sions.
Lemma 5.2.4. ℂ|ℝ is algebraic, but ℝ|ℚ and ℂ|ℚ are transcendental. If K is any field,
then K(x)|K is transcendental.
5.3 Minimal Polynomials and Simple Extensions

Definition 5.3.1. Suppose that L|K is a field extension and a ∈ L is algebraic over K. The
polynomial ma (x) ∈ K[x] is the minimal polynomial of a over K if the following hold:
(1) ma (x) has leading coefficient 1; that is, it is a monic polynomial.
(2) ma (a) = 0.
(3) If f (x) ∈ K[x] with f (a) = 0, then ma (x)| f (x).
Hence, ma (x) is the monic polynomial of minimal degree that has a as a zero.
We prove next that every algebraic element has such a minimal polynomial.
Theorem 5.3.2. Suppose that L|K is a field extension and a ∈ L is algebraic over K. Then
we have:
(1) The minimal polynomial ma (x) ∈ K[x] exists and is irreducible over K.
(2) K[a] ≅ K(a) ≅ K[x]/(ma (x)), where (ma (x)) is the principal ideal in K[x] generated
by ma (x).
(3) |K(a) : K| = deg(ma (x)). Therefore, K(a)|K is a finite extension.
Proof. (1) Let I = {f(x) ∈ K[x] : f(a) = 0}. Then I is an ideal in K[x]; it is the kernel of the evaluation map at a, and I ≠ {0} since a is algebraic over K. Since K[x] is a principal ideal domain, there exists g(x) ∈ K[x] with I = (g(x)). Let b be the leading coefficient of
g(x). Then ma (x) = b−1 g(x) is a monic polynomial. We claim that ma (x) is the minimal
polynomial of a and that ma (x) is irreducible. First, it is clear that I = (g(x)) = (ma (x)). If
f (x) ∈ K[x] with f (a) = 0, then f (x) = h(x)ma (x) for some h(x). Therefore, ma (x) divides
any polynomial that has a as a zero. It follows that ma (x) is the minimal polynomial.
Suppose that ma (x) = g1 (x)g2 (x). Then since ma (a) = 0, it follows that either g1 (a) =
0 or g2 (a) = 0. Suppose g1 (a) = 0. Then from above, ma (x)|g1 (x), and since g1 (x)|ma (x),
we must then have that g2 (x) is a unit. Therefore, ma (x) is irreducible.
(2) Consider the map τ : K[x] → K[a] given by
τ(∑_i k_i x^i) = ∑_i k_i a^i.
Then τ is a surjective ring homomorphism whose kernel is precisely I = (m_a(x)). By the ring isomorphism theorem, K[x]/(m_a(x)) ≅ K[a].
Since m_a(x) is irreducible, K[x]/(m_a(x)) is a field and, therefore, K[a] is a field, so K[a] = K(a).
(3) Let n = deg(m_a(x)). We claim that the elements 1, a, . . . , a^{n−1} are a basis for K[a] =
K(a) over K. First suppose that
∑_{i=0}^{n−1} c_i a^i = 0
with c_i ∈ K, not all zero. This implies that a is a zero of a nonzero polynomial of degree at most n − 1, contradicting the minimality of deg(m_a(x)) = n. Hence, 1, a, . . . , a^{n−1} are linearly independent over K. Moreover, writing any f(x) ∈ K[x] as f(x) = q(x)m_a(x) + r(x) with deg(r(x)) < n shows that f(a) = r(a), so 1, a, . . . , a^{n−1} span K[a] over K. Therefore, |K(a) : K| = n = deg(m_a(x)).
Theorem 5.3.3. Suppose that L|K is a field extension and a ∈ L is algebraic over K. Sup-
pose that f (x) ∈ K[x] is a monic polynomial with f (a) = 0. Then f (x) is the minimal
polynomial if and only if f (x) is irreducible in K[x].
Proof. Suppose that f (x) is the minimal polynomial of a. Then f (x) is irreducible from
the previous theorem.
Conversely, suppose that f (x) is monic, irreducible and f (a) = 0. From the previous
theorem ma (x)| f (x). Since f (x) is irreducible, we have f (x) = cma (x) with c ∈ K. How-
ever, since both f (x) and ma (x) are monic, we must have c = 1, and f (x) = ma (x).
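As a concrete illustration of Theorems 5.3.2 and 5.3.3, one can verify with exact integer arithmetic that s = √2 + √3 satisfies x^4 − 10x^2 + 1, the classical candidate for its minimal polynomial over ℚ (that this monic polynomial is irreducible, and hence equals m_s(x), still has to be checked separately). A sketch, with our own helper names:

```python
# Elements of Q(sqrt(2), sqrt(3)) stored as coordinates in the basis
# {1, sqrt(2), sqrt(3), sqrt(6)}; TABLE[i][j] = (scalar, basis index) of b_i * b_j.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (2, 0), (1, 3), (2, 2)],
    [(1, 2), (1, 3), (3, 0), (3, 1)],
    [(1, 3), (2, 2), (3, 1), (6, 0)],
]

def mul(u, v):
    """Multiply two elements given by their 4 coordinates."""
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            c, k = TABLE[i][j]
            out[k] += c * u[i] * v[j]
    return out

s = [0, 1, 1, 0]          # sqrt(2) + sqrt(3)
s2 = mul(s, s)            # [5, 0, 0, 2], i.e. 5 + 2*sqrt(6)
s4 = mul(s2, s2)          # [49, 0, 0, 20]

# s^4 - 10 s^2 + 1 should be identically zero:
relation = [s4[k] - 10 * s2[k] + (1 if k == 0 else 0) for k in range(4)]
print(relation)  # [0, 0, 0, 0]
```

The resulting degree 4 = 2 · 2 is consistent with the degree formula applied to the tower ℚ ⊂ ℚ(√2) ⊂ ℚ(√2, √3).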
Theorem 5.3.4. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a finite extension.
(2) L|K is an algebraic extension and there exist a1 , . . . , an ∈ L with L = K(a1 , . . . , an ).
(3) There exist algebraic elements a1 , . . . , an ∈ L such that L = K(a1 , . . . , an ).
Proof. (1) ⇒ (2). We have seen in Theorem 5.2.3 that a finite extension is algebraic.
Suppose that a1 , . . . , an are a basis for L over K. Then clearly L = K(a1 , . . . , an ).
(2) ⇒ (3). If L|K is an algebraic extension and L = K(a1 , . . . , an ), then each ai is
algebraic over K.
(3) ⇒ (1). Suppose that there exist algebraic elements a1 , . . . , an ∈ L such that
L = K(a1 , . . . , an ). We show that L|K is a finite extension. We do this by induction on n.
If n = 1, then L = K(a) for some algebraic element a, and the result follows from Theo-
rem 5.3.2. Suppose now that n ≥ 2. We assume then that an extension K(a1 , . . . , an−1 )
with a1 , . . . , an−1 algebraic elements is a finite extension. Now suppose that we have
L = K(a1 , . . . , an ) with a1 , . . . , an algebraic elements.
Then
|K(a_1, . . . , a_n) : K| = |K(a_1, . . . , a_{n−1})(a_n) : K(a_1, . . . , a_{n−1})| ⋅ |K(a_1, . . . , a_{n−1}) : K|.
The second term |K(a1 , . . . , an−1 ) : K| is finite from the inductive hypothesis. The first
term |K(a1 , . . . , an−1 )(an ) : K(a1 , . . . , an−1 )| is also finite from Theorem 5.3.2 since it is
a simple extension of the field K(a1 , . . . , an−1 ) by the algebraic element an . Therefore,
|K(a1 , . . . , an ) : K| is finite.
Theorem 5.3.5. Suppose that K is a field and R is an integral domain with K ⊂ R. Then R
can be viewed as a vector space over K. If dimK (R) < ∞, then R is a field.
Proof. Let r_0 be an arbitrary nonzero element of R. Define the map τ : R → R by
τ(r) = r r_0.
It is easy to show (see exercises) that this is a linear transformation from R to R, consid-
ered as a vector space over K.
Suppose that τ(r) = 0. Then rr0 = 0 and, hence, r = 0 since r0 ≠ 0 and R is an
integral domain. It follows that τ is an injective map. Since R is a finite-dimensional
vector space over K, and τ is an injective linear transformation, it follows that τ must
also be surjective. This implies that there exists an r1 with τ(r1 ) = 1. Then r1 r0 = 1 and,
hence, r0 has an inverse within R. Since r0 was an arbitrary nonzero element of R, it
follows that R is a field.
Theorem 5.3.6. Let K ⊂ L ⊂ M be fields. Then M|K is algebraic if and only if both M|L and L|K are algebraic.
Proof. If M|K is algebraic, then certainly M|L and L|K are algebraic.
Now suppose that M|L and L|K are algebraic. We show that M|K is algebraic. Let
a ∈ M. Then since a is algebraic over L, there exist b_0, b_1, . . . , b_n ∈ L, not all zero, with
b_0 + b_1 a + ⋅ ⋅ ⋅ + b_n a^n = 0.
Hence, a is algebraic over the subfield K(b_0, . . . , b_n), which is a finite extension of K by Theorem 5.3.4, since each b_i is algebraic over K. Therefore, |K(b_0, . . . , b_n)(a) : K| is finite, so a lies in a finite extension of K and is algebraic over K.
Theorem 5.4.1. Suppose that L|K is a field extension, and let 𝒜K denote the set of all ele-
ments of L that are algebraic over K. Then 𝒜K is a subfield of L. 𝒜K is called the algebraic
closure of K in L.
Proof. Suppose a, b ∈ 𝒜_K . By Theorem 5.3.4, K(a, b) is a finite extension of K, and hence every element of K(a, b) is algebraic over K. Now a, b ∈ K(a, b), and K(a, b) is a field. Therefore, a ± b, ab, and a/b (if b ≠ 0) are
all in K(a, b) and, hence, all algebraic over K. Therefore, a ± b, ab, and a/b, if b ≠ 0, are all
in 𝒜_K . It follows that 𝒜_K is a subfield of L.
Theorem 5.4.2. Let 𝒜 be the algebraic closure of the rational numbers ℚ within the com-
plex numbers ℂ. Then 𝒜 is an algebraic extension of ℚ, but |𝒜 : ℚ| = ∞.
We will let 𝒜 denote the totality of algebraic numbers within the complex num-
bers ℂ, and 𝒯 the set of transcendentals so that ℂ = 𝒜 ∪ 𝒯 . In the language of the last
subsection, 𝒜 is the algebraic closure of ℚ within ℂ. As in the general case, if α ∈ ℂ is
algebraic, we will let mα (x) denote the minimal polynomial of α over ℚ.
We now examine the sets 𝒜 and 𝒯 more closely. Since 𝒜 is precisely the algebraic
closure of ℚ in ℂ, we have from our general result that 𝒜 actually forms a subfield
of ℂ. Furthermore, since the intersection of subfields is again a subfield, it follows that
𝒜′ = 𝒜 ∩ ℝ, the set of real algebraic numbers, forms a subfield of the reals.
Since each rational is algebraic, it is clear that there are algebraic numbers. Fur-
thermore, there are irrational algebraic numbers, √2 for example, since it satisfies the
irreducible polynomial x^2 − 2 over ℚ. On the other hand, we have not examined the
question of whether transcendental numbers really exist. To show that any particular
complex number is transcendental is, in general, quite difficult. However, it is relatively
easy to show that there are uncountably infinitely many transcendentals.
Theorem 5.5.3. The set 𝒜 of algebraic numbers is countably infinite. Therefore, 𝒯 , the
set of transcendental numbers, and 𝒯 ′ = 𝒯 ∩ ℝ, the real transcendental numbers, are
uncountably infinite.
Proof. For each n ≥ 1, let 𝒫_n denote the set of polynomials of degree at most n with rational coefficients. Each such polynomial a_0 + a_1 x + ⋅ ⋅ ⋅ + a_n x^n corresponds to the tuple (a_0, . . . , a_n) of its coefficients in
ℚ^{n+1} = ℚ × ℚ × ⋅ ⋅ ⋅ × ℚ.
Since a finite Cartesian product of countable sets is still countable, it follows that 𝒫_n is
a countable set.
Now let
ℬ_n = ⋃_{p(x) ∈ 𝒫_n} {zeros of p(x)};
that is, ℬn is the union of all zeros in ℂ of all rational polynomials of degree ≤ n. Since
each such p(x) has a maximum of n zeros, and since 𝒫n is countable, it follows that ℬn
is a countable union of finite sets and, hence, is still countable. Now
𝒜 = ⋃_{n=1}^{∞} ℬ_n,
a countable union of countable sets, so 𝒜 is countable; it is infinite since ℚ ⊂ 𝒜. Since ℝ and ℂ are uncountable and ℂ = 𝒜 ∪ 𝒯, it follows that 𝒯 and 𝒯′ = 𝒯 ∩ ℝ are uncountably infinite.
From Theorem 5.5.3, we know that there exist infinitely many transcendental num-
bers. Liouville, in 1851, gave the first proof of the existence of transcendentals by exhibit-
ing a few. He gave the following as one example: the number
c = ∑_{j=1}^{∞} 1/10^{j!}
is transcendental.
Proof. The series defining c is dominated termwise by the geometric series ∑_{j=1}^{∞} 1/10^j; hence it converges, and c is
a real number. Furthermore, since ∑_{j=1}^{∞} 1/10^j = 1/9, it follows that c < 1/9 < 1. Suppose that
c is algebraic so that g(c) = 0 for some rational nonzero polynomial g(x). Multiplying
through by the least common multiple of all the denominators in g(x), we may suppose
that f(c) = 0 for some integral polynomial f(x) = ∑_{j=0}^{n} m_j x^j. Then c satisfies
∑_{j=0}^{n} m_j c^j = 0.
Let c_k = ∑_{j=1}^{k} 1/10^{j!} denote the k-th partial sum of the series for c, so that 0 < c − c_k < 2/10^{(k+1)!}. By the mean value theorem,
f(c) − f(c_k) = (c − c_k) f′(ζ)
for some ζ with c_k < ζ < c < 1. Now since 0 < ζ < 1, we have |f′(ζ)| < B for some bound B depending only on f, and therefore
|c − c_k| |f′(ζ)| < 2B/10^{(k+1)!}.
On the other hand, since f(x) can have at most n zeros, it follows that for all k large
enough, we have f(c_k) ≠ 0. Since f(c) = 0, we have
|f(c) − f(c_k)| = |f(c_k)| = |∑_{j=0}^{n} m_j c_k^j| ≥ 1/10^{nk!},
since for each j, m_j c_k^j is a rational number with denominator dividing 10^{jk!}, so the nonzero sum is a rational number with denominator dividing 10^{nk!}. However, if k is chosen
sufficiently large while n stays fixed, we have
1/10^{nk!} > 2B/10^{(k+1)!},
contradicting the bound obtained from the mean value theorem. Therefore, c is transcendental.
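The inequalities driving this proof can be checked with exact rational arithmetic. In the sketch below (our own helper names; the partial sum c_K stands in for c), the partial sums approximate the Liouville constant far better than 1/q^k, the quality of approximation that Liouville-type estimates forbid for algebraic numbers:

```python
# Exact-arithmetic illustration (not part of the proof) of the approximation
# quality: the partial sum c_k has denominator q = 10^(k!), yet
# |c - c_k| < 2/10^((k+1)!) < 1/q^k.
from fractions import Fraction
from math import factorial

def partial_sum(k):
    """c_k = sum_{j=1..k} 10^(-j!), as an exact rational."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, k + 1))

K = 5                      # c_K stands in for the full constant c
c = partial_sum(K)
for k in range(1, 4):
    q = 10 ** factorial(k)               # denominator of c_k
    err = c - partial_sum(k)             # exact rational error
    assert 0 < err < Fraction(2, 10 ** factorial(k + 1))
    assert err < Fraction(1, q ** k)     # far better than 1/q^k
print("approximation bounds verified")
```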
Theorem 5.5.5. Suppose that L|K is a field extension and a ∈ L is transcendental over K.
Then K(a)|K is isomorphic to K(x)|K. Here the isomorphism μ : K(x) → K(a) can be
chosen such that μ(x) = a.
Proof. Define μ : K(x) → K(a) by
μ(f(x)/g(x)) = f(a)/g(a)
for f(x), g(x) ∈ K[x] with g(x) ≠ 0; since a is transcendental over K, we have g(a) ≠ 0 whenever g(x) ≠ 0, so μ is well defined. Then μ is a homomorphism with μ(x) = a that fixes K. Since
μ ≠ 0 and K(x) is a field, μ is injective; since its image is a subfield of K(a) containing K and a, μ is surjective. Hence, μ is an isomorphism.
5.6 Exercises
1. Let a ∈ ℂ with a3 − 2a + 2 = 0 and b = a2 − a. Compute the minimal polynomial
mb (x) of b over ℚ and compute the inverse of b in ℚ(a).
2. Determine the algebraic closure of ℝ in ℂ(x).
3. Let a_n := 2^{1/2^n} ∈ ℝ for n = 1, 2, 3, . . ., and let A := {a_n : n ∈ ℕ} and E := ℚ(A). Show the
following:
(i) |ℚ(a_n) : ℚ| = 2^n.
(ii) |E : ℚ| = ∞.
(iii) E = ⋃_{n=1}^{∞} ℚ(a_n).
(iv) E is algebraic over ℚ.
4. Determine |E : ℚ| for
(i) E = ℚ(√2, √−2).
(ii) E = ℚ(√3, ∛(3 + √3)).
5. Let L|K be a field extension and a ∈ L. Show that the evaluation map τ : K[x] → K[a] given by
τ(∑_i k_i x^i) = ∑_i k_i a^i
is a ring homomorphism.
6. Let K be a field and R an integral domain with K ⊂ R, viewed as a vector space over K. For a fixed r_0 ∈ R, show that the map τ : R → R given by
τ(r) = r r_0
is a linear transformation of R as a vector space over K.
6 Field Extensions and Compass and Straightedge Constructions

6.2 Constructible Numbers and Field Extensions

Definition 6.2.1. Suppose we are given a line segment of unit length. An α ∈ ℝ is constructible if we can construct a line segment of length |α|, in a finite number of steps,
from the unit segment using a straightedge and compass.
Our first result is that the set of all constructible numbers forms a subfield of ℝ.
Theorem 6.2.2. The set 𝒞 of all constructible numbers forms a subfield of ℝ. Furthermore,
ℚ ⊂ 𝒞.
Proof. Let 𝒞 be the set of all constructible numbers. Since the given unit length segment
is constructible, we have 1 ∈ 𝒞 . Therefore, 𝒞 ≠ ∅. To show that 𝒞 is a field, we must
show that it is closed under the field operations.
Suppose α, β are constructible. We must show then that α ± β, αβ, and α/β for β ≠ 0
are constructible. If α, β > 0, construct a line segment of length |α|. At one end of this
line segment, extend it by a segment of length |β|. This will construct a segment of length
α + β. Similarly, if α > β, lay off a segment of length |β| at the beginning of a segment of
length |α|. The remaining piece will be α − β. By considering cases, we can do this in the
same manner if either α or β, or both, are negative. These constructions are pictured in
Figure 6.1. Therefore, α ± β are constructible.
In Figure 6.2, we show how to construct αβ. Let the line segment OA have length |α|.
Consider a line L through O not coincident with OA. Let OB have length |β| as in the
diagram. Let P be on ray OB so that OP has length 1. Draw AP and then find Q on ray OA
such that BQ is parallel to AP. From similar triangles, we then have
1/|β| = |α|/|OQ| ⟹ |OQ| = |α||β|,
so αβ is constructible.
A similar construction, pictured in Figure 6.3, shows that α/β for β ≠ 0 is con-
structible. Find OA, OB, OP as above. Now, connect A to B, and let PQ be parallel to AB.
From similar triangles again, we have
1/|β| = |OQ|/|α| ⟹ |OQ| = |α|/|β|.
Let us now consider how a constructible number is found in the plane. Starting at
the origin and using the unit length and the constructions above, we can locate any point
in the plane with rational coordinates. That is, we can construct the point P = (q1 , q2 )
with q1 , q2 ∈ ℚ. Using only straightedge and compass, any further point in the plane can
be determined in one of the following three ways:
1. The intersection point of two lines, each of which passes through two known points
each having rational coordinates.
2. The intersection point of a line passing through two known points having rational
coordinates and a circle, whose center has rational coordinates, and whose radius
squared is rational.
3. The intersection point of two circles, each of whose centers has rational coordinates,
and each of whose radii is the square root of a rational number.
Analytically, the first case involves the solution of a pair of linear equations, each with
rational coefficients and, thus, only leads to other rational numbers. In cases two and
three, we must solve equations of the form x^2 + y^2 + ax + by + c = 0, with a, b, c ∈ ℚ. These
will then be quadratic equations over ℚ and, thus, the solutions will either be in ℚ, or
in a quadratic extension ℚ(√α) of ℚ. Once a real quadratic extension of ℚ is found, the
process can be iterated. Conversely, using the altitude theorem, if α is constructible, so
is √α. A much more detailed description of the constructible numbers can be found in
[52]. We thus can prove the following theorem:
Theorem 6.2.3. If γ is constructible with γ ∉ ℚ, then there exists a finite number of el-
ements α1 , . . . , αr ∈ ℝ with αr = γ such that for i = 1, . . . , r, ℚ(α1 , . . . , αi ) is a quadratic
extension of ℚ(α_1 , . . . , α_{i−1} ). In particular, |ℚ(γ) : ℚ| = 2^n for some n ≥ 1.
Therefore, the constructible numbers are precisely those real numbers that are con-
tained in repeated quadratic extensions of ℚ. In the next section, we use this idea to
show the impossibility of the first three mentioned construction problems.
6.3 Four Classical Construction Problems

Theorem 6.3.1. It is impossible to square the circle. That is, it is impossible in general,
given a circle, to construct using straightedge and compass a square having area equal to
that of the given circle.
Proof. Suppose the given circle has radius 1. It is then constructible and has an area
of π. A corresponding square would then have to have a side of length √π. To be constructible, a number α must have |ℚ(α) : ℚ| = 2^m < ∞ and, hence, must be algebraic. However, π is transcendental, so √π is also transcendental (see Section 20.4) and, therefore, not constructible.
Theorem 6.3.2. It is impossible to double the cube. This means that it is impossible in
general, given a cube of given side length, to construct using a straightedge and compass,
a side of a cube having double the volume of the original cube.
Proof. Let the given side length be 1, so that the original volume is also 1. To double this,
we would have to construct a side of length 2^{1/3}. However, |ℚ(2^{1/3}) : ℚ| = 3, since the
minimal polynomial of 2^{1/3} over ℚ is m_{2^{1/3}}(x) = x^3 − 2. This is not a power of 2, so 2^{1/3} is not
constructible.
The final construction problem we consider is the construction of regular n-gons. The
algebraic study of the constructibility of regular n-gons was initiated by Gauss in the
early part of the nineteenth century.
Notice first that a regular n-gon will be constructible for n ≥ 3 if and only if the
angle 2π/n is constructible, which is the case if and only if the length cos(2π/n) is a constructible number. From our techniques, if cos(2π/n) is a constructible number, then necessarily |ℚ(cos(2π/n)) : ℚ| = 2^m for some m. After we discuss Galois theory, we will see that
this condition is also sufficient. Therefore, cos(2π/n) is a constructible number if and only
if |ℚ(cos(2π/n)) : ℚ| = 2^m for some m.
The solution of this problem, that is, the determination of when |ℚ(cos(2π/n)) : ℚ| = 2^m,
involves two concepts from number theory: the Euler phi-function and Fermat primes.
Definition 6.3.4. For any natural number n, the Euler phi-function ϕ(n) is defined as the number of integers a with 1 ≤ a ≤ n that are relatively prime to n.

Lemma 6.3.5. If p is a prime and m ≥ 1, then
ϕ(p^m) = p^m − p^{m−1} = p^m (1 − 1/p).
Proof. If 1 ≤ a ≤ p, then either a = p, or (a, p) = 1. It follows that the positive integers
less than or equal to p^m that are not relatively prime to p^m are precisely the multiples
of p, that is, p, 2p, 3p, . . . , p^{m−1} ⋅ p. There are p^{m−1} of these, and all other positive a ≤ p^m are relatively prime to p^m.
Hence, the number of integers relatively prime to p^m is
p^m − p^{m−1}.
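The phi-function can be computed directly from the definition; a sketch (our own function name), checking the prime-power formula numerically:

```python
# Euler phi-function straight from the definition, plus a check of
# phi(p^m) = p^m - p^(m-1) for small prime powers.
from math import gcd

def phi(n):
    """Count the integers 1 <= a <= n with gcd(a, n) == 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

for p in (2, 3, 5):
    for m in (1, 2, 3):
        assert phi(p ** m) == p ** m - p ** (m - 1)
print(phi(8), phi(9), phi(25))  # 4 6 20
```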
Lemma 6.3.6. If a and b are natural numbers with (a, b) = 1, then ϕ(ab) = ϕ(a)ϕ(b).
Proof. Given a natural number n, a reduced residue system modulo n is a set of integers
x1 , . . . , xk such that each xi is relatively prime to n, xi ≠ xj modulo n unless i = j, and if
(x, n) = 1 for some integer x, then x ≡ xi (mod n) for some i. Clearly, ϕ(n) is the size of a
reduced residue system modulo n.
Let Ra = {x1 , . . . , xϕ(a) } be a reduced residue system modulo a, Rb = {y1 , . . . , yϕ(b) } be
a reduced residue system modulo b, and let
S = {a y_i + b x_j : 1 ≤ j ≤ ϕ(a), 1 ≤ i ≤ ϕ(b)}.
We claim that S is a reduced residue system modulo ab. Since S has ϕ(a)ϕ(b) elements,
it will follow that ϕ(ab) = ϕ(a)ϕ(b).
To show that S is a reduced residue system modulo ab, we must show three things:
first that each x ∈ S is relatively prime to ab; second that the elements of S are distinct;
and, finally, that given any integer n with (n, ab) = 1, then n ≡ s (mod ab) for some s ∈ S.
Let x = ayi + bxj . Then since (xj , a) = 1 and (a, b) = 1, it follows that (x, a) = 1.
Analogously, (x, b) = 1. Since x is relatively prime to both a and b, we have (x, ab) = 1.
This shows that each element of S is relatively prime to ab.
Next suppose that
ayi + bxj ≡ ayk + bxl (mod ab).
Then, reducing modulo b, we get a y_i ≡ a y_k (mod b).
Since (a, b) = 1, it follows that yi ≡ yk (mod b). But then yi = yk since Rb is a reduced residue
system. Similarly, xj = xl . This shows that the elements of S are distinct modulo ab.
Finally, suppose (n, ab) = 1. Since (a, b) = 1, there exist x, y with ax + by = 1. Then
anx + bny = n.
Since (x, b) = 1, and (n, b) = 1, it follows that (nx, b) = 1. Therefore, there is an si with
nx = s_i + tb, where s_i ∈ R_b . In the same manner, (ny, a) = 1, and so there is an r_j ∈ R_a with ny = r_j + ua. Then
n = a(nx) + b(ny) = a s_i + b r_j + ab(t + u) ≡ a s_i + b r_j (mod ab),
so n is congruent modulo ab to an element of S. Therefore, S is a reduced residue system modulo ab, and ϕ(ab) = ϕ(a)ϕ(b).
Theorem 6.3.7. For every natural number n,
∑_{d|n} ϕ(d) = n.
Proof. We first prove the theorem for prime powers and then paste together via the
fundamental theorem of arithmetic.
Suppose that n = p^e for p a prime. Then the divisors of n are 1, p, p^2, . . . , p^e, so
∑_{d|n} ϕ(d) = ϕ(1) + ϕ(p) + ⋅ ⋅ ⋅ + ϕ(p^e) = 1 + (p − 1) + (p^2 − p) + ⋅ ⋅ ⋅ + (p^e − p^{e−1}).
Notice that this sum telescopes; that is, 1 + (p − 1) = p, p + (p^2 − p) = p^2, and so on.
Hence, the sum is just pe , and the result is proved for n a prime power.
We now do an induction on the number of distinct prime factors of n. The above
argument shows that the result is true if n has only one distinct prime factor. Assume
that the result is true whenever an integer has fewer than k distinct prime factors, and
suppose that n = p_1^{e_1} ⋅ ⋅ ⋅ p_k^{e_k} has k distinct prime factors. Then n = p^e c, where p = p_1, e = e_1,
and c has fewer than k distinct prime factors. By the inductive hypothesis,
∑ ϕ(d) = c.
d|c
Since (c, p) = 1, the divisors of n are all of the form p^α d_1, where d_1 | c and
α = 0, 1, . . . , e. It follows, using the multiplicativity ϕ(p^α d_1) = ϕ(p^α)ϕ(d_1), that
∑_{d|n} ϕ(d) = ∑_{α=0}^{e} ∑_{d_1 | c} ϕ(p^α) ϕ(d_1) = (1 + (p − 1) + (p^2 − p) + ⋅ ⋅ ⋅ + (p^e − p^{e−1})) ⋅ ∑_{d_1 | c} ϕ(d_1).
As in the case of prime powers, the first factor telescopes to p^e, giving the final result
∑ ϕ(d) = pe c = n.
d|n
Example 6.3.11. Consider n = 10. The divisors are 1, 2, 5, 10. Then ϕ(1) = 1, ϕ(2) = 1,
ϕ(5) = 4, and ϕ(10) = 4. Then
∑_{d|10} ϕ(d) = 1 + 1 + 4 + 4 = 10 = n.
We will see later in the book that the Euler phi-function plays an important role in
the structure theory of Abelian groups.
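Both facts established above about the phi-function, multiplicativity on coprime arguments and the divisor-sum identity, lend themselves to a direct numerical check; a sketch with our own helper name:

```python
# Check phi(ab) = phi(a)phi(b) for (a, b) = 1 and sum_{d|n} phi(d) = n.
from math import gcd

def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

# multiplicativity on coprime pairs (and its failure without coprimality):
assert phi(8 * 15) == phi(8) * phi(15)     # 32 = 4 * 8
assert phi(4 * 6) != phi(4) * phi(6)       # gcd(4, 6) != 1

# divisor-sum identity, including the n = 10 example above:
assert sum(phi(d) for d in range(1, 11) if 10 % d == 0) == 10
assert all(sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n
           for n in range(1, 200))
print("phi identities verified")
```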
We now turn to Fermat primes.
Definition 6.3.12. The Fermat numbers are the sequence (Fn ) of positive integers de-
fined by
F_n = 2^{2^n} + 1, n = 0, 1, 2, 3, . . . .
Fermat believed that all the numbers in this sequence were primes. In fact, F0 , F1 ,
F2 , F3 , F4 are all primes, but F5 is composite and divisible by 641 (see exercises). It is still
an open question whether or not there are infinitely many Fermat primes. It has been
conjectured that there are only finitely many. On the other hand, if a number of the form
2n + 1 is a prime for some integer n, then it must be a Fermat prime.
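The quoted facts about the first Fermat numbers are easy to check by machine; a sketch with a naive trial-division primality test:

```python
# F_0 .. F_4 are prime, while F_5 is divisible by 641 (Euler's observation).

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

fermat = [2 ** (2 ** n) + 1 for n in range(6)]
print(fermat[:5])                  # [3, 5, 17, 257, 65537]
assert all(is_prime(F) for F in fermat[:5])
assert fermat[5] % 641 == 0        # F_5 = 4294967297 = 641 * 6700417
```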
Lemma 6.3.13. Let a ≥ 2 and n ≥ 1 be integers. If a^n + 1 is a prime, then a is even and n is a power of 2. In particular, if 2^n + 1 is a prime, then it is a Fermat prime.
Proof. If a is odd, then a^n + 1 is even and greater than 2 and, hence, not a prime. Suppose then that a is even
and n = kl with k odd and k ≥ 3. Then
(a^{kl} + 1)/(a^l + 1) = a^{(k−1)l} − a^{(k−2)l} + ⋅ ⋅ ⋅ + 1,
so a^l + 1 is a proper divisor of a^n + 1, and a^n + 1 is not a prime. Hence, n has no odd factor k ≥ 3; that is, n is a power of 2.
Theorem 6.3.14. A regular n-gon is constructible with a straightedge and compass if and
only if n = 2^m p_1 ⋅ ⋅ ⋅ p_k , where p_1 , . . . , p_k are distinct Fermat primes.
Before proving the theorem, notice, for example, that a regular 20-gon is constructible since 20 = 2^2 ⋅ 5, and 5 is a Fermat prime. On the other hand, a regular 11-gon
is not constructible, since 11 is a prime but not a Fermat prime.
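The Gauss criterion translates directly into a short factorization check. A sketch (our own function name; it uses only the five known Fermat primes, which suffices for the small values of n tested here):

```python
# Gauss criterion: a regular n-gon is constructible iff
# n = 2^m * p_1 * ... * p_k with the p_i distinct Fermat primes.
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)   # the only Fermat primes known

def ngon_constructible(n):
    if n < 3:
        return False
    while n % 2 == 0:          # strip the power of 2
        n //= 2
    # the remaining odd part must be a product of DISTINCT Fermat primes
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
            if n % p == 0:     # repeated Fermat prime factor is not allowed
                return False
    return n == 1

print([n for n in range(3, 21) if ngon_constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```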
Proof. Let μ = e^{2πi/n} be a primitive n-th root of unity. Since
e^{2πi/n} = cos(2π/n) + i sin(2π/n),
we have (see exercises)
μ + 1/μ = 2 cos(2π/n).
Writing n = 2^m p_1^{e_1} p_2^{e_2} ⋅ ⋅ ⋅ p_k^{e_k} with m ≥ 1 and p_1, . . . , p_k distinct odd primes, the multiplicativity of the Euler phi-function gives
ϕ(n) = 2^{m−1} ⋅ (p_1^{e_1} − p_1^{e_1−1})(p_2^{e_2} − p_2^{e_2−1}) ⋅ ⋅ ⋅ (p_k^{e_k} − p_k^{e_k−1}).
This is a power of 2 if and only if e_1 = ⋅ ⋅ ⋅ = e_k = 1 and each p_i − 1 is a power of 2, that is, if and only if each p_i is a Fermat prime.
6.4 Exercises
1. Let ϕ be a given angle. In which of the following cases is the angle ψ constructible
from the angle ϕ by compass and straightedge?
(a) ϕ = π/13 and ψ = π/26.
(b) ϕ = π/33 and ψ = π/11.
(c) ϕ = π/7 and ψ = π/12.
2. (The golden section) In the plane, let AB be a given segment from A to B with length a.
The segment AB should be divided such that the proportion of AB to the length of the
bigger subsegment is equal to the proportion of the length of the bigger subsegment
to the length of the smaller subsegment:
a/b = b/(a − b),
where b is the length of the bigger subsegment. Such a division is called division by
the golden section. If we write b = ax, 0 < x < 1, then 1/x = x/(1 − x); that is, x^2 = 1 − x. Do
the following:
(a) Show that 1/x = (1 + √5)/2 = α.
(b) Construct the division of AB by the golden section with compass and straight-
edge.
(c) Show that if we divide the radius r > 0 of a circle by the golden section, then
the bigger part of the divided radius is the side of the regular 10-gon whose 10
vertices lie on the circle.
3. Given a regular 10-gon such that the 10 vertices are on the circle with radius R > 0.
Show that the length of each side is equal to the bigger part of the radius divided by
the golden section. Describe the procedure of the construction of the regular 10-gon
and 5-gon.
4. Construct the regular 17-gon with compass and straightedge.
Hint: We have to construct the number (1/2)(ω + ω^{−1}) = cos(2π/17), where ω = e^{2πi/17}. First,
construct the positive zero ω_1 of the polynomial x^2 + x − 4; we get
ω_1 = (1/2)(√17 − 1) = ω + ω^{−1} + ω^2 + ω^{−2} + ω^4 + ω^{−4} + ω^8 + ω^{−8}.
Next, we get
ω_2 = (1/4)(√17 − 1 + √(34 − 2√17)) = ω + ω^{−1} + ω^4 + ω^{−4}.
5. Using
e^{2πi/n} = cos(2π/n) + i sin(2π/n),
show that for μ = e^{2πi/n},
μ + 1/μ = 2 cos(2π/n).
7 Kronecker’s Theorem and Algebraic Closures
7.1 Kronecker’s Theorem
In the last chapter, we proved that if L|K is a field extension, then there exists an inter-
mediate field K ⊂ 𝒜 ⊂ L such that 𝒜 is algebraic over K, and contains all the elements of
L that are algebraic over K. We call 𝒜 the algebraic closure of K within L. In this chapter,
we prove that starting with any field K, we can construct an extension field K that is al-
gebraic over K and is algebraically closed. By this, we mean that there are no algebraic
extensions of K or, equivalently, that there are no irreducible nonlinear polynomials in
K[x]. In the final section of this chapter, we will give a proof of the famous fundamental
theorem of algebra, which in the language of this chapter says that the field ℂ of com-
plex numbers is algebraically closed. We will present another proof of this important
result later in the book after we discuss Galois theory.
First, we need the following crucial result of Kronecker, which says that given a
polynomial f (x) in K[x], where K is a field, we can construct an extension field L of K, in
which f (x) has a zero α. We say that L has been constructed by adjoining α to K. Recall
that if f (x) ∈ K[x] is irreducible, then f (x) can have no zeros in K. We first need the
following concept:
Definition 7.1.1. Let L|K and L′|K be field extensions. Then a K-isomorphism is an isomorphism τ : L → L′ that restricts to the identity map on K; thus, it fixes each element of K.
Theorem 7.1.2 (Kronecker's theorem). Let K be a field and f(x) ∈ K[x] a nonconstant polynomial. Then there exists
a finite extension K′ of K in which f(x) has a zero.
Proof. Suppose that f (x) ∈ K[x]. We know that f (x) factors into irreducible polynomials.
Let p(x) be an irreducible factor of f (x). From the material in Chapter 4, we know that
since p(x) is irreducible, the principal ideal ⟨p(x)⟩ in K[x] is a maximal ideal. To see this,
suppose that g(x) ∉ ⟨p(x)⟩, so that g(x) is not a multiple of p(x). Since p(x) is irreducible,
it follows that (p(x), g(x)) = 1. Thus, there exist h(x), k(x) ∈ K[x] with
h(x)p(x) + k(x)g(x) = 1.
The element on the left is in the ideal (g(x), p(x)), so the identity, 1, is in this ideal. There-
fore, the whole ring K[x] is in this ideal. Since g(x) was arbitrary, this implies that the
principal ideal ⟨p(x)⟩ is maximal.
Now let K ′ = K[x]/⟨p(x)⟩. Since ⟨p(x)⟩ is a maximal ideal, it follows that K ′ is a field.
We show that K can be embedded in K ′ , and that p(x) has a zero in K ′ .
First, consider the map α : K[x] → K ′ by α(f (x)) = f (x) + ⟨p(x)⟩. This is a homo-
morphism. Since the identity element 1 ∈ K is not in ⟨p(x)⟩, it follows that α restricted
to K is nontrivial. Since ker(α|_K) is an ideal in the field K, it is either {0} or all of K; as α|_K is nontrivial, ker(α|_K) = {0}, and α restricted to K is a monomorphism. Therefore, K can be embedded into α(K), which is contained in K′.
Therefore, K ′ can be considered as an extension field of K. Consider the element a =
x + ⟨p(x)⟩ ∈ K ′ . Then p(a) = p(x) + ⟨p(x)⟩ = 0 + ⟨p(x)⟩ since p(x) ∈ ⟨p(x)⟩. But 0 + ⟨p(x)⟩
is the zero element 0 of the factor ring K[x]/⟨p(x)⟩. Therefore, in K ′ , we have p(a) = 0;
hence, p(x) has a zero in K ′ . Since p(x) divides f (x), we must have f (a) = 0 in K ′ also.
Therefore, we have constructed an extension field of K, in which f (x) has a zero.
In conformity with Chapter 5, we write K(a) for the field adjunction of a = x + ⟨p(x)⟩
to K. We now outline an intuitive version of the construction. In this language, we say that the field K(α) is
constructed by adjoining the zero α to K. We remark that this outline is not a
formally correct proof like the one given for Theorem 7.1.2.
We can assume that f (x) is irreducible. Suppose that f (x) = a0 + a1 x + ⋅ ⋅ ⋅ + an x n with
an ≠ 0. Define α to satisfy
a0 + a1 α + ⋅ ⋅ ⋅ + an αn = 0.
Then on K(α), define addition and subtraction componentwise, and define multiplication by algebraic manipulation, replacing powers of α of degree ≥ n by lower powers using the relation
α^n = −(1/a_n)(a_0 + a_1 α + ⋅ ⋅ ⋅ + a_{n−1} α^{n−1}).
We claim that K′ = K(α) then forms a field of finite degree over K. The basic
ring properties follow easily by computation (see exercises) using the definitions. We
must show then that every nonzero element of K(α) has a multiplicative inverse. Let
g(α) be a nonzero element of K(α). Then the corresponding polynomial g(x) ∈ K[x] is a nonzero polynomial of degree ≤ n − 1. Since f(x) is irreducible of degree n, it follows that f(x) and g(x) must be
relatively prime; that is, (f(x), g(x)) = 1. Hence, there exist a(x), b(x) ∈ K[x] with
a(x)f(x) + b(x)g(x) = 1.
Substituting α and using f(α) = 0, we get
b(α)g(α) = 1.
Now b(α) might have degree higher than n − 1 in α. However, using the relation
f(α) = 0, we can rewrite b(α) as an expression b̄(α) of degree ≤ n − 1 in α, which therefore
lies in K(α). Then
b̄(α)g(α) = 1;
hence, g(α) has a multiplicative inverse. It follows that K(α) is a field and, by definition,
f(α) = 0. The elements 1, α, . . . , α^{n−1} form a basis for K(α) over K and, hence,
|K(α) : K| = n.
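The inversion argument is entirely constructive: since (f, g) = 1 in K[x], the extended Euclidean algorithm produces a(x)f(x) + b(x)g(x) = 1, and b(α) is the inverse of g(α). A sketch over K = ℚ with exact `Fraction` arithmetic (all helper names are ours; polynomials are coefficient lists with index i the coefficient of x^i):

```python
# Invert g(alpha) in Q[x]/(f(x)) via the extended Euclidean algorithm.
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def pdivmod(a, b):
    """Quotient and remainder of polynomials a / b over Q."""
    b = [Fraction(c) for c in b]
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    r = [Fraction(c) for c in a]
    while len(trim(r)) >= len(b):
        r = trim(r)
        shift = len(r) - len(b)
        c = r[-1] / b[-1]
        q[shift] += c
        for i, bc in enumerate(b):
            r[shift + i] -= c * bc
    return trim(q), trim(r)

def pmul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return trim(out)

def psub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([Fraction(x) - Fraction(y) for x, y in zip(a, b)])

def inverse_mod(g, f):
    """b with b*g = 1 (mod f), assuming gcd(f, g) = 1 in Q[x]."""
    r0, r1 = [Fraction(c) for c in f], [Fraction(c) for c in g]
    s0, s1 = [Fraction(0)], [Fraction(1)]      # Bezout coefficients of g
    while trim(r1):
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, psub(s0, pmul(q, s1))
    c = r0[0]                                  # nonzero constant gcd
    return [x / c for x in s0]

f = [-2, 0, 0, 1]      # f(x) = x^3 - 2, irreducible over Q
g = [0, 1]             # g(x) = x, the element alpha itself
b = inverse_mod(g, f)
print(b)               # [Fraction(0, 1), Fraction(0, 1), Fraction(1, 2)], i.e. alpha^2 / 2
# check: b(x) * x leaves remainder 1 modulo x^3 - 2
assert pdivmod(pmul(b, g), f)[1] == [Fraction(1)]
```

Indeed α · α²/2 = α³/2 = 2/2 = 1 in ℚ(α) when α³ = 2.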
Example 7.1.3. Let f(x) = x^2 + 1 ∈ ℝ[x]. This is irreducible over ℝ. We construct the
field in which it has a zero. Let K′ = ℝ[x]/⟨x^2 + 1⟩, and let α ∈ K′ with f(α) = 0. The
extension field ℝ(α) then has the form
K′ = ℝ(α) = {x + αy : x, y ∈ ℝ, α^2 = −1}.
It is clear that this field is ℝ-isomorphic to the complex numbers ℂ; ℝ(α) ≅ ℝ(i) ≅ ℂ.
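This ℝ-isomorphism can be seen concretely: multiplying pairs (x, y) standing for x + αy with the relation α² = −1 reproduces exactly the multiplication rule of the complex numbers. A sketch (our own function name):

```python
# Multiplication in R[x]/(x^2 + 1), elements stored as pairs (x, y) = x + alpha*y.

def mul(u, v):
    """(a + alpha*b)(c + alpha*d) with alpha^2 = -1."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

u, v = (1.0, 2.0), (3.0, -1.0)
assert mul(u, v) == (5.0, 5.0)
# agrees with Python's built-in complex arithmetic:
z = complex(*u) * complex(*v)
assert (z.real, z.imag) == mul(u, v)
print(mul(u, v))  # (5.0, 5.0)
```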
Theorem 7.1.4. Let p(x) ∈ K[x] be an irreducible polynomial, and let K ′ = K(α) be the
extension field of K constructed in Kronecker’s theorem, in which p(x) has a zero α. Let L
be an extension field of K, and suppose that a ∈ L is algebraic with minimal polynomial
ma (x) = p(x). Then K(α) is K-isomorphic to K(a).
Proof. If L|K is a field extension and a ∈ L with p(a) = 0 and if deg(p(x)) = n, then the
elements 1, a, . . . , an−1 constitute a basis for K(a) over K, and the elements 1, α, . . . , αn−1
constitute a basis for K(α) over K. The mapping
τ : K(a) → K(α)
defined by τ(k) = k if k ∈ K and τ(a) = α, and then extended by linearity, is easily shown
to be a K-isomorphism.
Suppose (1); that is, each nonconstant polynomial in K[x] has a zero in K. Let f (x) ∈ K[x]
be nonconstant of degree n. Then f (x) has a zero a1 ∈ K, so
f (x) = (x − a1 )h(x)
with deg(h(x)) = n − 1. In turn, h(x) has a zero a2 ∈ K, so h(x) = (x − a2 )g(x)
with deg(g(x)) = n − 2. Continue in this manner, and f (x) factors completely into linear
factors,
f (x) = b(x − a1 ) ⋅ ⋅ ⋅ (x − an ).
Hence, (1) implies (2).
Now suppose (2); that is, that each nonconstant polynomial in K[x] factors into lin-
ear factors over K. Suppose that f (x) is irreducible. If deg(f (x)) > 1, then f (x) factors
into linear factors and, hence, is not irreducible. Therefore, f (x) must be of degree 1,
and (2) implies (3).
Now suppose that an element of K[x] is irreducible if and only if it is of degree one,
and suppose that L|K is an algebraic extension. Let a ∈ L. Then a is algebraic over K.
Its minimal polynomial ma (x) is monic and irreducible over K and, hence, from (3), is
linear. Therefore, ma (x) = x−a ∈ K[x]. It follows that a ∈ K and, hence, K = L. Therefore,
(3) implies (4).
Finally, suppose that whenever L|K is an algebraic extension, then L = K. Suppose
that f (x) is a nonconstant polynomial in K[x]. From Kronecker’s theorem, there exists
a field extension L, and a ∈ L with f (a) = 0. However, L is an algebraic extension.
Therefore, by supposition, K = L. Therefore, a ∈ K, and f (x) has a zero in K. Therefore,
(4) implies (1), completing the proof.
In the next section, we will prove that given a field K, we can always find an exten-
sion field of K with the properties of Theorem 7.2.2.
Theorem 7.2.2. A field K is algebraically closed if and only if it satisfies any one of the fol-
lowing conditions:
(1) Each nonconstant polynomial in K[x] has a zero in K.
(2) Each nonconstant polynomial in K[x] factors into linear factors over K. That is, for
each f (x) ∈ K[x], there exist elements a1 , . . . , an , b ∈ K with
f (x) = b(x − a1 ) ⋅ ⋅ ⋅ (x − an ).
(3) A polynomial in K[x] is irreducible if and only if it is of degree one.
(4) If L|K is an algebraic extension, then L = K.
The prime example of an algebraically closed field is the field ℂ of complex num-
bers. The fundamental theorem of algebra says that any nonconstant complex polyno-
mial has a complex zero.
We now show that the algebraic closure of one field within an algebraically closed
field is algebraically closed. First, we define a general algebraic closure.
Theorem 7.2.4. Let K be a field and L|K an extension of K with L algebraically closed. Let
K̄ = 𝒜K be the algebraic closure of K within L. Then K̄ is an algebraic closure of K.
Proof. Let K̄ = 𝒜K be the algebraic closure of K within L. We know that K̄|K is algebraic.
Therefore, we must show that K̄ is algebraically closed.
Let f (x) be a nonconstant polynomial in K̄[x]. Then f (x) ∈ L[x]. Since L is alge-
braically closed, f (x) has a zero a in L. Since f (a) = 0 and f (x) ∈ K̄[x], it follows that a is
algebraic over K̄. However, K̄ is algebraic over K. Therefore, a is also algebraic over K.
Hence, a ∈ K̄, and f (x) has a zero in K̄. Therefore, K̄ is algebraically closed.
We want to note the distinction between being algebraically closed and being an
algebraic closure.
Lemma 7.2.5. The complex numbers ℂ are an algebraic closure of ℝ, but not an algebraic
closure of ℚ. An algebraic closure of ℚ is 𝒜, the field of algebraic numbers within ℂ.
We now show that every field has an algebraic closure. To do this, we first show that
any field can be embedded into an algebraically closed field.
Theorem 7.2.6. Let K be a field. Then K can be embedded into an algebraically closed
field.
Proof. We show first that there is an extension field L of K, in which each nonconstant
polynomial f (x) ∈ K[x] has a zero in L.
Assign to each nonconstant f (x) ∈ K[x] the symbol yf , and consider
R = K[ yf : f (x) ∈ K[x]],
I = { ∑_{j=1}^{n} fj (yfj ) rj : n ∈ ℕ, rj ∈ R, fj (x) ∈ K[x] }.
7.2 Algebraic Closures and Algebraically Closed Fields · 93
We claim that I ≠ R. If not, then 1 ∈ I, and we could write
1 = g1 f1 (yf1 ) + ⋅ ⋅ ⋅ + gn fn (yfn ),
where gi ∈ R.
In the n polynomials g1 , . . . , gn , there are only a finite number of variables, say
yf1 , . . . , yfm with m ≥ n. Hence,
1 = ∑_{i=1}^{n} gi (yf1 , . . . , yfm ) fi (yfi ). (∗)
Substituting, for each i, a zero of fi (x) in a suitable extension field of K (which exists by
Kronecker’s theorem) into (∗) yields 1 = 0, a contradiction. Therefore, I ≠ R, and I is
contained in a maximal ideal M of R. Then L = R/M is a field extension of K, and for
each nonconstant f (x) ∈ K[x],
f (yf + M) = f (yf ) + M = M,
so yf + M is a zero of f (x) in L. Iterating this construction gives a tower of fields
K ⊂ K1 (= L) ⊂ K2 ⊂ ⋅ ⋅ ⋅ ,
in which each nonconstant polynomial over Ki has a zero in Ki+1 . The union of this
tower is then an algebraically closed field containing K.
Proof. Let K̂ be an algebraically closed field containing K, which exists from Theo-
rem 7.2.6. Now let K̄ = 𝒜K be the set of elements of K̂ that are algebraic over K. From
Theorem 7.2.4, K̄ is an algebraic closure of K.
Proof. This is a generalized version of Theorem 7.1.4. If b ∈ K(a), then from the con-
struction of K(a), there is a polynomial g(x) ∈ K[x] with b = g(a). Define a map
ψ : K(a) → K ′ (a′ )
by
ψ(b) = ϕ(g(x))(a′ ).
If also b = h(a) for some h(x) ∈ K[x], then f (x) divides g(x) − h(x). Since ϕ(f (x))(a′ ) = 0,
this implies that ϕ(g(x))(a′ ) = ϕ(h(x))(a′ ); hence, the map ψ is well defined.
It is easy to show that ψ is a homomorphism. Let b1 = g1 (a), b2 = g2 (a). Then b1 b2 =
(g1 g2 )(a). Hence,
ψ(b1 b2 ) = ϕ((g1 g2 )(x))(a′ ) = ϕ(g1 (x))(a′ )ϕ(g2 (x))(a′ ) = ψ(b1 )ψ(b2 ).
In the same manner, we have ψ(b1 + b2 ) = ψ(b1 ) + ψ(b2 ). Now suppose that k ∈ K so that
k ∈ K[x] is a constant polynomial. Then ψ(k) = (ϕ(k))(a′ ) = ϕ(k). Therefore, ψ restricted
to K is precisely ϕ. As ψ is not the zero mapping, it follows that ψ is a monomorphism.
Finally, since K(a) is generated from K and a, and ψ restricted to K is ϕ, it follows
that ψ is uniquely determined by ϕ and ψ(a) = a′ . Hence, ψ is unique.
Before we give the proof, we note the tower of fields and monomorphisms given by the
theorem (diagram omitted here).
Now the set ℳ is nonempty since (K, ϕ) ∈ ℳ. Order ℳ by (M1 , τ1 ) < (M2 , τ2 ) if
M1 ⊂ M2 and (τ2 )|M1 = τ1 . Let
𝒦 = {(Mi , τi ) : i ∈ I}
be a chain in ℳ. Let M = ⋃i∈I Mi , and define τ : M → L1 by τ|Mi = τi for each i. It is
clear that (M, τ) ∈ ℳ is an upper bound for the chain 𝒦. Since each chain has an upper
bound, it follows from Zorn’s lemma that ℳ has a maximal element (N, ρ). We show that
N = L.
Suppose that N ⊊ L. Let a ∈ L \ N. Since L|K is algebraic, a is algebraic over K and,
hence, also algebraic over N. Let ma (x) ∈ N[x] be the minimal polynomial of a relative
to N. Since L1 is algebraically closed, ρ(ma (x)) has a zero a′ ∈ L1 . Therefore, there is a
monomorphism ρ′ : N(a) → L1 with ρ′ restricted to N, the same as ρ. It follows that
(N, ρ) < (N(a), ρ′ ) since a ∉ N. This contradicts the maximality of N. Therefore, N = L,
completing the proof.
Combining the previous two theorems, we can now prove that any two algebraic
closures of a field K are unique up to K-isomorphism; that is, up to an isomorphism
that is the identity on K.
Theorem 7.2.11. Let L1 and L2 be algebraic closures of the field K. Then there is a
K-isomorphism τ : L1 → L2 . Again, by K-isomorphism, we mean that τ is the identity
on K.
Corollary 7.2.12. Let L|K and L′ |K be field extensions with a ∈ L and a′ ∈ L′ algebraic
elements over K. Then K(a) is K-isomorphic to K(a′ ) if and only if |K(a) : K| = |K(a′ ) : K|,
and there is an element a′′ ∈ K(a′ ) with ma (x) = ma′′ (x).
We have just seen that given an irreducible polynomial over a field K, we can always
find a field extension in which this polynomial has a zero. We now push this further to
obtain field extensions in which a given polynomial has all its zeros.
Definition 7.3.1. If K is a field and 0 ≠ f (x) ∈ K[x], and K ′ is an extension field of K, then
f (x) splits in K ′ (K ′ may be K), if f (x) factors into linear factors in K ′ [x]. Equivalently,
this means that all the zeros of f (x) are in K ′ .
K ′ is a splitting field for f (x) over K if K ′ is the smallest extension field of K in which
f (x) splits. (A splitting field for f (x) is the smallest extension field in which f (x) has all
its possible zeros.)
K ′ is a splitting field over K if it is the splitting field for some finite set of polynomials
over K.
Theorem 7.3.2. If K is a field and 0 ≠ f (x) ∈ K[x], then there exists a splitting field for
f (x) over K.
Proof. The splitting field is constructed by repeatedly adjoining zeros. Suppose, with-
out loss of generality, that f (x) is irreducible of degree n over K. From Theorem 7.1.2,
there exists a field K ′ containing α with f (α) = 0. Then f (x) = (x − α)g(x) ∈ K ′ [x] with
deg g(x) = n − 1. By an inductive argument, g(x) has a splitting field over K ′ ; therefore,
f (x) has a splitting field over K.
Definition 7.3.3. A group G is a set with one binary operation, which we will denote by
multiplication, such that the following hold:
(1) The operation is associative; that is, (g1 g2 )g3 = g1 (g2 g3 ) for all g1 , g2 , g3 ∈ G.
(2) There exists an identity for this operation; that is, an element 1 such that 1g = g for
each g ∈ G.
(3) Each g ∈ G has an inverse for this operation; that is, for each g, there exists a g −1
with the property that gg −1 = 1.
If in addition the operation is commutative (g1 g2 = g2 g1 for all g1 , g2 ∈ G), the group G
is called an Abelian group. The order of G is the number of elements in G, denoted |G|. If
|G| < ∞, G is a finite group. H ⊂ G is a subgroup if H is also a group under the same op-
eration as G. Equivalently, H is a subgroup if H ≠ ∅, and H is closed under the operation
and inverses.
Groups most often arise from invertible mappings of a set onto itself. Such mappings
are called permutations.
Theorem 7.3.5. For any set T, ST forms a group under composition called the symmetric
group on T. If T, T1 have the same cardinality (size), then ST ≅ ST1 . If T is a finite set with
|T| = n, then ST is a finite group, and |ST | = n!.
Proof. If ST is the set of all permutations on the set T, we must show that composition
is an operation on ST that is associative and has an identity and inverses.
Let f , g ∈ ST . Then f , g are one-to-one mappings of T onto itself.
Consider f ∘g : T → T. If f ∘g(t1 ) = f ∘g(t2 ), then f (g(t1 )) = f (g(t2 )), and g(t1 ) = g(t2 ),
since f is one-to-one. But then t1 = t2 since g is one-to-one.
If t ∈ T, there exists t1 ∈ T with f (t1 ) = t since f is onto. Then there exists t2 ∈ T
with g(t2 ) = t1 since g is onto. Putting these together, f (g(t2 )) = t; therefore, f ∘ g is onto.
Therefore, f ∘ g is also a permutation, and composition gives a valid binary operation
on ST .
The identity function 1(t) = t for all t ∈ T will serve as the identity for ST , whereas
the inverse function for each permutation will be the inverse. Such unique inverse func-
tions exist since each permutation is a bijection.
Finally, composition of functions is always associative; therefore, ST forms a group.
If T, T1 have the same cardinality, then there exists a bijection σ : T → T1 . Define a
map F : ST → ST1 in the following manner: if f ∈ ST , let F(f ) be the permutation on T1
given by F(f )(t1 ) = σ(f (σ −1 (t1 ))). It is straightforward to verify that F is an isomorphism
(see the exercises).
Finally, suppose |T| = n < ∞. Then T = {t1 , . . . , tn }. Each f ∈ ST can be pictured as
t1 ⋅⋅⋅ tn
f =( ).
f (t1 ) ⋅⋅⋅ f (tn )
For t1 , there are n choices for f (t1 ). For t2 , there are only n − 1 choices since f is one-to-
one. This continues down to only one choice for tn . Using the multiplication principle,
the number of choices for f and, therefore, the size of ST is
n(n − 1) ⋅ ⋅ ⋅ 1 = n!.
For a set with n elements, we denote ST by Sn called the symmetric group on n sym-
bols.
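The counting argument |ST| = n! is easy to verify by brute force. This small sketch (ours, not the book's) enumerates all permutations of {1, …, n} for several n:

```python
from itertools import permutations
from math import factorial

# each tuple lists (f(1), ..., f(n)) for a bijection f of {1, ..., n}
for n in range(1, 6):
    perms = list(permutations(range(1, n + 1)))
    assert len(perms) == factorial(n)   # |S_n| = n!

size_S3 = len(list(permutations([1, 2, 3])))
assert size_S3 == 6
```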
Example 7.3.6. Write down the six elements of S3 , and give the multiplication table for
the group.
7.3 The Fundamental Theorem of Algebra · 99
     (1 2 3)        (1 2 3)        (1 2 3)
1 =  (1 2 3),  a =  (2 3 1),  b =  (3 1 2),

     (1 2 3)        (1 2 3)        (1 2 3)
c =  (2 1 3),  d =  (3 2 1),  e =  (1 3 2).
The multiplication table for S3 can be written down directly by doing the required
composition. For example,
      (1 2 3) (1 2 3)   (1 2 3)
ac =  (2 3 1) (2 1 3) = (3 2 1) = d.
        1      a      a²     c      ac     a²c
1       1      a      a²     c      ac     a²c
a       a      a²     1      ac     a²c    c
a²      a²     1      a      a²c    c      ac
c       c      a²c    ac     1      a²     a
ac      ac     c      a²c    a      1      a²
a²c     a²c    ac     c      a²     a      1
S3 = ⟨a, c; a3 = c2 = 1, ac = ca2 ⟩.
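The entries of this table can be verified mechanically. In the sketch below (our own check, not from the text), a permutation f of {1, 2, 3} is stored as the tuple (f(1), f(2), f(3)), and fg means "apply g, then f", matching the computation ac = d above:

```python
def compose(f, g):
    """(fg)(t) = f(g(t)); a permutation h is stored as (h(1), h(2), h(3))."""
    return tuple(f[g[t - 1] - 1] for t in (1, 2, 3))

one = (1, 2, 3)
a = (2, 3, 1)   # 1 -> 2, 2 -> 3, 3 -> 1
c = (2, 1, 3)   # transposition swapping 1 and 2
d = (3, 2, 1)

assert compose(a, c) == d                   # ac = d, as computed above
assert compose(a, compose(a, a)) == one     # a^3 = 1
assert compose(c, c) == one                 # c^2 = 1
a2 = compose(a, a)
assert compose(a, c) == compose(c, a2)      # the defining relation ac = ca^2
```

The final assertion confirms the relation used in the presentation of S3.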
An important result, the form of which we will see later in our work on extension
fields, is the following:
Lemma 7.3.7. Let T be a set and T1 ⊂ T a subset. Let H be the subset of ST that fixes each
element of T1 ; that is, f ∈ H if f (t) = t for all t ∈ T1 . Then H is a subgroup of ST .
Example 7.3.9. Let K be a field and k0 , k1 ∈ K. Let h(y1 , y2 ) = k0 (y1 + y2 ) + k1 (y1 y2 ). There
are two permutations on {y1 , y2 }, namely, σ1 : y1 → y1 , y2 → y2 and σ2 : y1 → y2 , y2 → y1 .
Applying either one of these two to {y1 , y2 } leaves h(y1 , y2 ) invariant. Therefore, h(y1 , y2 )
is a symmetric polynomial.
In general, the pattern of the last example holds for y1 , . . . , yn . That is,
s1 = y1 + y2 + ⋅ ⋅ ⋅ + yn
s2 = y1 y2 + y1 y3 + ⋅ ⋅ ⋅ + yn−1 yn
s3 = y1 y2 y3 + y1 y2 y4 + ⋅ ⋅ ⋅ + yn−2 yn−1 yn
..
.
sn = y1 ⋅ ⋅ ⋅ yn .
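The connection between these elementary symmetric polynomials and the coefficients of a polynomial (the content of Lemma 7.3.13 below) can be checked directly; the roots 1, 2, 3 in this sketch are our own illustrative choice:

```python
import sympy as sp

x = sp.symbols('x')
roots = [1, 2, 3]

# elementary symmetric polynomials s1, s2, s3 evaluated at the roots
s1 = sum(roots)
s2 = sum(roots[i]*roots[j] for i in range(3) for j in range(i + 1, 3))
s3 = roots[0]*roots[1]*roots[2]

# the coefficients of x^2, x^1, x^0 are -s1, +s2, -s3 respectively
p = sp.expand((x - 1)*(x - 2)*(x - 3))
assert p == x**3 - s1*x**2 + s2*x - s3
```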
From this theorem, we obtain the following two lemmas, which will be crucial in
our proof of the fundamental theorem of algebra.
Lemma 7.3.13. Let p(x) ∈ K[x], and suppose p(x) has the zeros α1 , . . . , αn in the splitting
field K ′ . Then the elementary symmetric polynomials in α1 , . . . , αn are in K.
Proof. Suppose p(x) = c0 + c1 x + ⋅ ⋅ ⋅ + cn x n ∈ K[x]. Since p(x) splits in K ′ [x], with zeros
α1 , . . . , αn , we have that, in K ′ [x],
p(x) = cn (x − α1 ) ⋅ ⋅ ⋅ (x − αn ).
The coefficients are then cn (−1)i si (α1 , . . . , αn ), where the si (α1 , . . . , αn ) are the ele-
mentary symmetric polynomials in α1 , . . . , αn . However, p(x) ∈ K[x], so each coefficient
is in K. It follows then that for each i, cn (−1)i si (α1 , . . . , αn ) ∈ K; hence, si (α1 , . . . , αn ) ∈ K
since cn ∈ K.
Lemma 7.3.14. Let p(x) ∈ K[x], and suppose p(x) has the zeros α1 , . . . , αn in the split-
ting field K ′ . Suppose further that g(x) = g(x, α1 , . . . , αn ) ∈ K ′ [x]. If g(x) is a symmetric
polynomial in α1 , . . . , αn , then g(x) ∈ K[x].
The proof depends on the following sequence of lemmas. The crucial one now is the
last, which says that any real polynomial must have a complex zero.
Lemma 7.3.16. Any odd-degree real polynomial must have a real zero.
Proof. Suppose that P(x) is a real polynomial of odd degree. We may assume that its
leading coefficient is positive (otherwise, consider −P(x)). Then: (1) P(x) → +∞ as
x → +∞; and (2) P(x) → −∞ as x → −∞.
From (1), P(x) gets arbitrarily large positively, so there exists an x1 with P(x1 ) > 0. Simi-
larly, from (2), there exists an x2 with P(x2 ) < 0.
A real polynomial is a continuous real-valued function for all x ∈ ℝ. Since
P(x1 )P(x2 ) < 0, it follows from the intermediate value theorem that there exists an
x3 , between x1 and x2 , such that P(x3 ) = 0.
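The intermediate value argument is also an algorithm. This small sketch (ours, with an arbitrarily chosen cubic) locates a real zero of the odd-degree polynomial x³ − 2x − 5 by bisection:

```python
def P(t):
    return t**3 - 2*t - 5        # odd degree, so P takes both signs

# a sign-change bracket: P(0) = -5 < 0 and P(3) = 16 > 0
lo, hi = 0.0, 3.0
for _ in range(60):              # bisect until the bracket is tiny
    mid = (lo + hi) / 2
    if P(mid) > 0:
        hi = mid
    else:
        lo = mid

root = (lo + hi) / 2
assert abs(P(root)) < 1e-9       # a real zero, as the lemma guarantees
```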
Lemma 7.3.17. Any degree-two complex polynomial must have a complex zero.
Proof. This is a consequence of the quadratic formula and of the fact that any complex
number has a square root.
If P(x) = ax^2 + bx + c, a ≠ 0, then the zeros formally are
x = (−b ± √(b^2 − 4ac))/(2a),
and these square roots exist in ℂ.
(1) Let P̄(x) denote the polynomial whose coefficients are the complex conjugates of
those of P(x) = a0 + a1 x + ⋅ ⋅ ⋅ + an x^n . For every z ∈ ℂ,
P̄(z̄) = ā0 + ā1 z̄ + ⋅ ⋅ ⋅ + ān z̄^n ,
which is the complex conjugate of P(z).
(2) Suppose P(x) is real. Then āi = ai for all its coefficients; hence, P̄(x) = P(x).
Conversely, suppose P̄(x) = P(x). Then āi = ai for all its coefficients; hence, ai ∈ ℝ for
each ai ; therefore, P(x) is a real polynomial.
(3) The proof is a computation and left to the exercises.
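Part (3) — that P(x) times its coefficient-conjugate polynomial has real coefficients — can also be checked numerically. In this sketch (our own; the sample coefficients are arbitrary), a polynomial is a coefficient list, lowest degree first:

```python
def conj_poly(coeffs):
    """Coefficients of P-bar: conjugate each coefficient of P."""
    return [c.conjugate() for c in coeffs]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

P = [2 - 1j, 3 + 4j, 1j]          # an arbitrary complex polynomial
H = poly_mul(P, conj_poly(P))     # H = P * P-bar

# every coefficient of H is (numerically) real
assert all(abs(c.imag) < 1e-12 for c in H)
```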
Lemma 7.3.21. If every nonconstant real polynomial has a complex zero, then every non-
constant complex polynomial has a complex zero.
Proof. Let P(x) ∈ ℂ[x], and suppose that every nonconstant real polynomial has at least
one complex zero. Let H(x) = P(x)P̄(x). From Lemma 7.3.20, H(x) ∈ ℝ[x]. By supposition,
there exists a z0 ∈ ℂ with H(z0 ) = 0. Then P(z0 )P̄(z0 ) = 0, and since ℂ is a field, it has no
zero divisors.
Hence, either P(z0 ) = 0, or P̄(z0 ) = 0. In the first case, z0 is a zero of P(x). In the
second case, taking complex conjugates and using Lemma 7.3.19, we get P(z̄0 ) = 0.
Therefore, z̄0 is a zero of P(x).
Now we come to the crucial lemma.
This is in K ′ [x]. In forming H(x), we chose pairs of zeros {αi , αj }, so the number of
such pairs is the number of ways of choosing two elements out of n = 2^m q elements. This
is given by
(2^m q)(2^m q − 1)/2 = 2^{m−1} q(2^m q − 1) = 2^{m−1} q′
with q′ odd. Therefore, the degree of H(x) is 2^{m−1} q′ .
H(x) is a symmetric polynomial in the zeros α1 , . . . , αn . Since α1 , . . . , αn are the zeros
of a real polynomial, from Lemma 7.3.14, any polynomial in the splitting field symmetric
in these zeros must be a real polynomial.
Therefore, H(x) ∈ ℝ[x] with degree 2m−1 q′ . By the inductive hypothesis, then, H(x)
must have a complex zero. This implies that there exists a pair {αi , αj } with
αi + αj + hαi αj ∈ ℂ.
Since h was an arbitrary integer, for any integer h1 , there must exist such a pair
{αi , αj } with
αi + αj + h1 αi αj ∈ ℂ.
Now let h1 vary over the integers. Since there are only finitely many such pairs
{αi , αj }, it follows that there must be at least two different integers h1 , h2 such that
z1 = αi + αj + h1 αi αj ∈ ℂ, and z2 = αi + αj + h2 αi αj ∈ ℂ.
Then z1 − z2 = (h1 − h2 )αi αj ∈ ℂ with h1 ≠ h2 ; hence, αi αj ∈ ℂ, and then also
αi + αj ∈ ℂ. Therefore, αi and αj are the zeros of
p(x) = x^2 − (αi + αj )x + αi αj ∈ ℂ[x].
However, p(x) is then a degree-two complex polynomial, and so from Lemma 7.3.17, its
zeros are complex. Therefore, αi , αj ∈ ℂ; thus, f (x) has a complex zero.
It is now easy to give a proof of the fundamental theorem of algebra. From Lem-
ma 7.3.22, every nonconstant real polynomial has a complex zero. From Lemma 7.3.21, if
every nonconstant real polynomial has a complex zero, then every nonconstant complex
polynomial has a complex zero, proving the fundamental theorem.
Proof. Let a ∈ E. Consider the elements 1, a, a^2 , . . . . Since E is of finite degree over ℂ,
these elements become linearly dependent over ℂ, and we get a nonconstant polynomial
over ℂ with zero a. By the fundamental theorem of algebra, we know that a ∈ ℂ.
We refer to Section 17.6 where we revisit the fundamental theorem of algebra and
provide a Galois theoretic proof.
The piece a x1^{i1} ⋅ ⋅ ⋅ xn^{in} with a ≠ 0 is called higher than the piece b x1^{j1} ⋅ ⋅ ⋅ xn^{jn} with b ≠ 0,
if the first one of the differences i1 − j1 , i2 − j2 , . . . , in − jn that differs from zero is in fact
positive. The highest piece of a polynomial f (x1 , . . . , xn ) is denoted by HG(f ).
Hence,
s1 = x1 + x2 + ⋅ ⋅ ⋅ + xn
s2 = x1 x2 + x1 x3 + ⋅ ⋅ ⋅ + xn−1 xn
s3 = x1 x2 x3 + x1 x2 x4 + ⋅ ⋅ ⋅ + xn−2 xn−1 xn
..
.
sn = x1 ⋅ ⋅ ⋅ xn .
where the sum is taken over all the (n choose k) different systems of indices i1 , . . . , ik with
i1 < i2 < ⋅ ⋅ ⋅ < ik . Furthermore, a polynomial s(x1 , . . . , xn ) is a symmetric polynomial
if s(x1 , . . . , xn ) is unchanged by any permutation σ of {x1 , . . . , xn }, that is, s(x1 , . . . , xn ) =
s(σ(x1 ), . . . , σ(xn )).
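The inductive procedure proved below — subtract a multiple of s1^(k1−k2) s2^(k2−k3) ⋯ sn^(kn) matching the highest piece, then repeat — can be sketched for three variables. This implementation is our own, not the book's:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
S1, S2, S3 = sp.symbols('s1 s2 s3')

# elementary symmetric polynomials in x, y, z
e1, e2, e3 = x + y + z, x*y + y*z + x*z, x*y*z

def to_elementary(expr):
    """Write a symmetric polynomial in x, y, z as a polynomial in s1, s2, s3."""
    result, rest = sp.Integer(0), sp.expand(expr)
    while rest != 0:
        # highest piece in lexicographic order (Poly.terms() is lex-sorted)
        (k1, k2, k3), a = sp.Poly(rest, x, y, z).terms()[0]
        result += a * S1**(k1 - k2) * S2**(k2 - k3) * S3**k3
        rest = sp.expand(rest - a * e1**(k1 - k2) * e2**(k2 - k3) * e3**k3)
    return sp.expand(result)

# power sums expressed in the elementary symmetric polynomials
p2 = to_elementary(x**2 + y**2 + z**2)
assert p2 == S1**2 - 2*S2
p3 = to_elementary(x**3 + y**3 + z**3)
assert p3 == S1**3 - 3*S1*S2 + 3*S3
```

By Lemma 7.4.3, each subtraction strictly lowers the highest piece, so the loop terminates.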
Lemma 7.4.2. In the highest piece a x1^{k1} ⋅ ⋅ ⋅ xn^{kn} with a ≠ 0 of a symmetric polynomial
s(x1 , . . . , xn ), we have k1 ≥ k2 ≥ ⋅ ⋅ ⋅ ≥ kn .
Proof. Assume that ki < kj for some i < j. As a symmetric polynomial, s(x1 , . . . , xn ) must
then also contain the piece a x1^{k1} ⋅ ⋅ ⋅ xi^{kj} ⋅ ⋅ ⋅ xj^{ki} ⋅ ⋅ ⋅ xn^{kn} , which is higher than
a x1^{k1} ⋅ ⋅ ⋅ xi^{ki} ⋅ ⋅ ⋅ xj^{kj} ⋅ ⋅ ⋅ xn^{kn} , giving a contradiction.
Lemma 7.4.3. The product s1^{k1 −k2} s2^{k2 −k3} ⋅ ⋅ ⋅ s_{n−1}^{k_{n−1} −kn} sn^{kn} with
k1 ≥ k2 ≥ ⋅ ⋅ ⋅ ≥ kn has the highest piece x1^{k1} x2^{k2} ⋅ ⋅ ⋅ xn^{kn} .
Proof. From the definition of the elementary symmetric polynomials, we have that
HG(sk^t ) = (x1 x2 ⋅ ⋅ ⋅ xk )^t , 1 ≤ k ≤ n, t ≥ 1.
Proof. We prove the existence of the polynomial f by induction on the size of the highest
pieces. If in the highest piece of a symmetric polynomial all exponents are zero, then it
is constant, that is, an element of R. Therefore, there is nothing to prove.
Now we assume that each symmetric polynomial with the highest piece smaller
than that of s(x1 , . . . , xn ) can be written as a polynomial in the elementary symmetric
polynomials. Let a x1^{k1} ⋅ ⋅ ⋅ xn^{kn} , a ≠ 0, be the highest piece of s(x1 , . . . , xn ). Let
t(x1 , . . . , xn ) = s(x1 , . . . , xn ) − a s1^{k1 −k2} ⋅ ⋅ ⋅ s_{n−1}^{k_{n−1} −kn} sn^{kn} .
Clearly, t(x1 , . . . , xn ) is another symmetric polynomial, and from Lemma 7.4.3, the
highest piece of t(x1 , . . . , xn ) is smaller than that of s(x1 , . . . , xn ). Therefore, by the
inductive hypothesis, t(x1 , . . . , xn ) can be written as a polynomial in the elementary
symmetric polynomials. Hence, s(x1 , . . . , xn ) = t(x1 , . . . , xn ) + a s1^{k1 −k2} ⋅ ⋅ ⋅ sn^{kn} can
be written as a polynomial in s1 , . . . , sn . To prove the uniqueness of this expression,
assume that
Then
i^2 = j^2 = k^2 = −1,
ij = k, jk = i, ki = j,
ji = −k, kj = −i, ik = −j.
For
x = x0 + x1 i + x2 j + x3 k and y = y0 + y1 i + y2 j + y3 k,
Together with this addition and multiplication, V becomes a noncommutative ring with
unit element 1. For each quaternion
x = x0 + x1 i + x2 j + x3 k,
we define the conjugate
x̄ := x0 − x1 i − x2 j − x3 k.
Conjugation satisfies the following rules: the conjugate of x̄ is x; the conjugate of x + y
is x̄ + ȳ; the conjugate of λx is λx̄ for λ ∈ ℝ; and the conjugate of xy is ȳ ⋅ x̄.
With the help of conjugation, we may now define the norm and the length of a quater-
nion
x = x0 + x1 i + x2 j + x3 k
by
n(x) = x x̄ = x̄ x = x0^2 + x1^2 + x2^2 + x3^2 and |x| = √(x0^2 + x1^2 + x2^2 + x3^2 ).
For x ≠ 0, we therefore have
x ⋅ x̄/(x x̄) = 1 = x̄/(x̄ x) ⋅ x;
hence, x^{−1} = x̄/n(x).
Hence, together with the addition and multiplication, V becomes a skew field, in which
ℝ can be embedded via r → r ⋅ 1 for r ∈ ℝ.
Theorem 7.5.1. The set of quaternions ℍ is a skew field, which contains both the reals
and the complexes as subfields. It has dimension 4 as a vector space over ℝ. Furthermore,
rx = xr for all x ∈ ℍ, and all r ∈ ℝ (considered as elements of ℍ).
In ℍ, there is an important multiplicative rule for the norm and the length:
n(xy) = n(x)n(y) and |xy| = |x| ⋅ |y| for all x, y ∈ ℍ.
This result on norms in the quaternions provides the general equation in ℝ on sums
of four squares:
Theorem 7.5.2 (Theorem of Lagrange). Each natural number n can be written as a sum
n = a^2 + b^2 + c^2 + d^2
with a, b, c, d ∈ ℤ.
Hint: We only have to show that if p is a prime number with p ≡ 3 (mod 4), then
p = a^2 + b^2 + c^2 + d^2 for some a, b, c, d ∈ ℤ (see [53, Chapter 3.2]). A proof of this can
be found, for instance, in the book [53].
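The four-square identity behind Lagrange's theorem is just the norm rule n(xy) = n(x)n(y), which can be checked directly. The multiplication below follows the relations i² = j² = k² = −1, ij = k, jk = i, ki = j; the sample values are our own:

```python
def quat_mul(x, y):
    """Hamilton product of x = (x0, x1, x2, x3) and y, in the basis 1, i, j, k."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (x0*y0 - x1*y1 - x2*y2 - x3*y3,
            x0*y1 + x1*y0 + x2*y3 - x3*y2,
            x0*y2 - x1*y3 + x2*y0 + x3*y1,
            x0*y3 + x1*y2 - x2*y1 + x3*y0)

def norm(x):
    return sum(c*c for c in x)

x, y = (1, 2, 3, 4), (5, 6, 7, 8)
xy = quat_mul(x, y)
# (1+4+9+16)(25+36+49+64) is again a sum of four integer squares
assert norm(xy) == norm(x) * norm(y)
```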
We remark that the skew field ℍ of the quaternions can be embedded into M(2, ℂ)
via
1 → ( 1 0 ; 0 1 ), i → ( i 0 ; 0 −i ), j → ( 0 1 ; −1 0 ), k → ( 0 i ; i 0 ),
where each matrix is written row by row, with rows separated by semicolons.
A quaternion x = x0 + x1 i + x2 j + x3 k then corresponds to the matrix
( x0 + x1 i x2 + x3 i ; −x2 + x3 i x0 − x1 i ) = ( w z ; −z̄ w̄ )
with w = x0 + x1 i ∈ ℂ and z = x2 + x3 i ∈ ℂ.
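The embedding can be verified by checking that the image matrices satisfy the quaternion relations; the 2×2 arithmetic below is a small self-contained sketch of ours:

```python
def mat_mul(A, B):
    """Product of 2x2 complex matrices given as ((a, b), (c, d))."""
    return tuple(tuple(sum(A[r][t]*B[t][c] for t in range(2)) for c in range(2))
                 for r in range(2))

I = ((1j, 0), (0, -1j))       # image of i
J = ((0, 1), (-1, 0))         # image of j
K = ((0, 1j), (1j, 0))        # image of k
minusE = ((-1, 0), (0, -1))   # image of -1

assert mat_mul(I, I) == minusE    # i^2 = -1
assert mat_mul(J, J) == minusE    # j^2 = -1
assert mat_mul(K, K) == minusE    # k^2 = -1
assert mat_mul(I, J) == K         # ij = k
assert mat_mul(J, I) == tuple(tuple(-v for v in row) for row in K)   # ji = -k
```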
We have shown that the quaternions form a skew field of degree 4 over the real
numbers. We ask whether there can be other finite degree skew field extensions of ℝ.
Let V be an ℝ-vector space with dimℝ (V ) = n < ∞. For which n may we provide V with
a multiplication such that V , with the vector addition and this multiplication, becomes a
field or a skew field?
We remark that some nonzero vector in V has to be the unit element 1; therefore,
we automatically have an embedding ℝ → V .
Let n ≥ 2. Since the irreducible polynomials from ℝ[x] have degree 1 or 2, then
under the existence of such a multiplication, each element α ∈ V , which is not in ℝ
(considered as a subset of V ), must be a zero of a quadratic polynomial from ℝ[x].
We now assume that we have in V a multiplication such that V , together with the
addition in V and this multiplication, is a field or a skew field.
If n = 2, we get the field ℂ of the complex numbers.
Now, let n = 3. By arguments analogous to those used in the construction of ℂ, we may
construct in two steps a basis {1, i, j} of V such that 1 is the unit element of V , and i^2 = j^2 =
−1. Recall that a two-dimensional subfield of V has to be isomorphic to ℂ as a subfield
of V .
Let k = ij. Since dimℝ (V ) = 3, we must have k = a1 + b1 i + c1 j with a1 , b1 , c1 ∈ ℝ.
Multiplication from the left with i results in
−j = a1 i − b1 + c1 k = a1 i − b1 + c1 (a1 + b1 i + c1 j),
and since 1, i, j are linearly independent, we get c1^2 = −1, which is impossible
in ℝ. Therefore, the case n = 3 is not possible.
If n = 4, we may construct in V three linearly independent elements 1, i, j such that
1 is the unit element of V , and i2 = j2 = −1. Certainly ij is linearly independent from 1, i
and j, because otherwise, we get a contradiction as in the case n = 3. Also ji is linearly
independent from 1, i and j. Now i + j and i − j are both zeros of quadratic polynomials
over ℝ; that is, there exist r1 , s1 , r2 , s2 ∈ ℝ with
(i + j)^2 = r1 (i + j) + s1 and (i − j)^2 = r2 (i − j) + s2 .
If we add these equations, we see that r1 = r2 = 0; therefore, we get from the first
equation that ij + ji = c ∈ ℝ. Here, we used that 1, i and j are linearly independent.
Now, we may replace j by j + (c/2)i, which gives
i(j + (c/2)i) + (j + (c/2)i)i = 0.
So altogether, we may construct a basis {1, i, j, k} of V such that 1 is the unit element of V ,
and i2 = j2 = k 2 = −1, k = ij = −ji. Thereby, V is isomorphic to the skew field ℍ of the
quaternions.
Finally, let n ≥ 5. Analogously to the case n = 4, and using the general observation for
the subfield isomorphic to ℂ, we may construct a basis {1, i, j, k, l, . . .} such that
i^2 = j^2 = k^2 = l^2 = −1 and k = ij = −ji.
Analogously, as in the case n = 4, we have that i + l and i − l are both zeros of quadratic
polynomials over ℝ.
Therefore, as in the case n = 4,
il + li = a2 ∈ ℝ.
In the same way,
jl + lj = b2 ∈ ℝ and kl + lk = c2 ∈ ℝ.
We calculate, using these relations and k = ij,
2lk = a2 j − b2 i + c2 ,
and multiplying from the right by k,
−2l = a2 i + b2 j + c2 k.
This contradicts the linear independence of 1, i, j, k, l. Therefore, the case n ≥ 5 is not
possible.
7.6 Exercises
1. Let f , g ∈ K[x] be irreducible polynomials of degree 2 over the field K. Let α1 , α2
(respectively, β1 , β2 ) be zeros of f and g. For 1 ≤ i, j ≤ 2, let νij = αi + βj . Show the
following:
(a) |K(νij ) : K| ∈ {1, 2, 3, 4}.
(b) For fixed f , g, there are at most two different degrees in (a).
(c) Decide which sets of combinations of degrees in (b) (with f , g variable) are pos-
sible, and give an example in each case.
2. Let L|K be a field extension; let ν ∈ L and f (x) ∈ L[x], a polynomial of degree ≥ 1.
Let all coefficients of f (x) be algebraic over K. If f (ν) = 0, then ν is algebraic over K.
3. Let L|K be a field extension, and let M be an intermediate field. The extension M|K
is algebraic. For ν ∈ L, the following are equivalent:
(a) ν is algebraic over M.
(b) ν is algebraic over K.
4. Let L|K be a field extension and ν1 , ν2 ∈ L. Then the following are equivalent:
(a) ν1 and ν2 are algebraic over K.
(b) ν1 + ν2 and ν1 ν2 are algebraic over K.
5. Let L|K be a simple field extension. Then there is an extension field L′ of L of the
form L′ = K(ν1 , ν2 ) with the following:
(a) ν1 and ν2 are transcendental over K.
(b) The set of all elements of L′ that are algebraic over K is L.
6. In the proof of Theorem 7.1.4, show that the mapping
τ : K(a) → K(α)
is a K-isomorphism.
11. Determine all irreducible polynomials over ℝ. Factorize f (x) ∈ ℝ[x] into irreducible
polynomials.
8 Splitting Fields and Normal Extensions
8.1 Splitting Fields
In the last chapter, we introduced splitting fields and used this idea to present a proof of
the fundamental theorem of algebra. The concept of a splitting field is essential to the
Galois theory of equations. Therefore, in this chapter, we look more deeply at this idea.
Definition 8.1.1. Let K be a field and f (x) a nonconstant polynomial in K[x]. An exten-
sion field L of K is a splitting field for f (x) over K if the following hold:
(a) f (x) splits into linear factors in L[x].
(b) If K ⊂ M ⊂ L and M ≠ L, then f (x) does not split into linear factors in M[x].
Lemma 8.1.2. L is a splitting field for f (x) ∈ K[x] if and only if f (x) splits into linear
factors in L[x], and if f (x) = b(x − a1 ) ⋅ ⋅ ⋅ (x − an ) with b ∈ K, then L = K(a1 , . . . , an ).
Example 8.1.3. The field ℂ of complex numbers is a splitting field for the polynomial
p(x) = x^2 + 1 in ℝ[x]. In fact, since ℂ is algebraically closed, it is a splitting field for any
real polynomial f (x) ∈ ℝ[x] that has at least one nonreal zero.
The field ℚ(i), obtained by adjoining i to ℚ, is a splitting field for x^2 + 1 over ℚ.
The next result was used in the previous chapter. We restate and reprove it here.
Theorem 8.1.4. Let K be a field. Then each nonconstant polynomial in K[x] has a splitting
field.
https://doi.org/10.1515/9783111142524-008
Before giving the proof of this theorem, we note that the following important result
is a direct consequence of it:
Proof of Theorem 8.1.5. Suppose that f (x) = b(x − a1 ) ⋅ ⋅ ⋅ (x − an ) ∈ L[x] and suppose that
f ′ (x) = b′ (x − a1′ ) ⋅ ⋅ ⋅ (x − an′ ) ∈ L′ [x]. Then
We have proved that polynomials have unique factorization over fields. Since L′ ⊂ L′′ ,
it follows that the set of zeros (ψ(a1 ), . . . , ψ(an )) is a permutation of the set of zeros
(a1′ , . . . , an′ ). In particular, this implies that ψ(ai ) ∈ L′ for each i; thus, ψ(L) ⊂ L′ .
Since the image of ψ is K ′ (a1′ , . . . , an′ ) = K ′ (ψ(a1 ), . . . , ψ(an )), it is clear that ψ is uniquely
determined by the images ψ(ai ). This proves part (a).
For part (b), embed L′ in an algebraic closure L′′ . Hence, there is a monomorphism
ϕ′ : K(a) → L′′
with ϕ′|K = ϕ and ϕ′ (a) = a′ . Hence, there is a monomorphism ψ : L → L′′ with ψ|K(a) = ϕ′ .
Then from part (a), it follows that ψ : L → L′ is an isomorphism.
Example 8.1.7. Let f (x) = x 3 −7 ∈ ℚ[x]. This has no zeros in ℚ, and since it is of degree 3,
it follows that it must be irreducible in ℚ[x].
Let ω = −1/2 + (√3/2)i ∈ ℂ. Then it is easy to show by computation that
ω^2 = −1/2 − (√3/2)i, and ω^3 = 1. Therefore, the three zeros of f (x) in ℂ are as follows:
a1 = 71/3
a2 = ω ⋅ 71/3
a3 = ω2 ⋅ 71/3 .
Hence, L = ℚ(a1 , a2 , a3 ) is the splitting field of f (x). Since the minimal polynomial of
all three zeros over ℚ is the same f (x), it follows that
|ℚ(a1 ) : ℚ| = |ℚ(a2 ) : ℚ| = |ℚ(a3 ) : ℚ| = 3.
Since ℚ(a1 ) ⊂ ℝ and a2 , a3 are nonreal, it is clear that a2 , a3 ∉ ℚ(a1 ). Suppose that
ℚ(a2 ) = ℚ(a3 ). Then ω = a3 a2^{−1} ∈ ℚ(a2 ), and so 7^{1/3} = ω^{−1} a2 ∈ ℚ(a2 ). Hence,
ℚ(a1 ) ⊂ ℚ(a2 ); therefore, ℚ(a1 ) = ℚ(a2 ) since they have the same degree over ℚ. This
contradiction shows that ℚ(a2 ) and ℚ(a3 ) are distinct.
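The zeros in Example 8.1.7 are easy to check numerically; this quick sketch (ours, not the book's) verifies ω³ = 1, ω² = ω̄, and that a1, a2, a3 are the three cube roots of 7:

```python
w = complex(-0.5, 3 ** 0.5 / 2)            # omega = -1/2 + (sqrt(3)/2) i
assert abs(w**3 - 1) < 1e-12               # omega is a cube root of unity
assert abs(w**2 - w.conjugate()) < 1e-12   # omega^2 = -1/2 - (sqrt(3)/2) i

a = 7 ** (1 / 3)
zeros = [a, w * a, w**2 * a]
for r in zeros:
    assert abs(r**3 - 7) < 1e-9            # each is a zero of x^3 - 7
```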
8.2 Normal Extensions · 115
We do not know however if there are any more intermediate fields. There could,
for example, be infinitely many. However, as we will see when we do the Galois theory,
there are no others.
Note that in Example 8.1.7, the extension fields ℚ(ai )|ℚ are not normal extensions. Al-
though f (x) has a zero in ℚ(ai ), the polynomial f (x) does not split into linear factors in
ℚ(ai )[x].
We now show that L|K is a finite normal extension if and only if L is the splitting
field for some f (x) ∈ K[x].
Theorem 8.2.2. Let L|K be a finite extension. Then the following are equivalent:
(a) L|K is a normal extension.
(b) L|K is a splitting field for some f (x) ∈ K[x].
(c) If L ⊂ L′ and ψ : L → L′ is a monomorphism with ψ|K , the identity map on K, then ψ
is an automorphism of L; that is, ψ(L) = L.
Proof. Suppose that L|K is a finite normal extension. Since L|K is a finite extension, L is
algebraic over K, and since of finite degree, we have L = K(a1 , . . . , an ) with ai algebraic
over K.
Let fi (x) ∈ K[x] be the minimal polynomial of ai . Since L|K is a normal extension,
fi (x) splits in L[x]. This is true for each i = 1, . . . , n. Let f (x) = f1 (x)f2 (x) ⋅ ⋅ ⋅ fn (x). Then
f (x) splits into linear factors in L[x]. Since L = K(a1 , . . . , an ), the polynomial f (x) cannot
have all its zeros in any intermediate extension between K and L. Therefore, L is the
splitting field for f (x). Hence, (a) implies (b).
Now suppose that L ⊂ L′ and ψ : L → L′ is a monomorphism with ψ|K the identity
map on K. Then the extension field ψ(L) of K is also a splitting field for f (x) since ψ|K
is the identity on K. Hence, ψ maps the zeros of f (x) in L ⊂ L′ onto the zeros of f (x) in
ψ(L) ⊂ L′ , and thus it follows that ψ(L) = L. Hence, (b) implies (c).
Finally, suppose (c). Hence, we assume that if L ⊂ L′ and ψ : L → L′ is a monomor-
phism with ψ|K , the identity map on K, then ψ is an automorphism of L; that is, ψ(L) = L.
As before L|K is algebraic since L|K is finite. Suppose that f (x) ∈ K[x] is irreducible
and that a ∈ L is a zero of f (x). There are algebraic elements a1 , . . . , an ∈ L with L =
K(a1 , . . . , an ) since L|K is finite. For i = 1, . . . , n, let fi (x) ∈ K[x] be the minimal polynomial
of ai , and let g(x) = f (x)f1 (x) ⋅ ⋅ ⋅ fn (x). Let L′ be the splitting field of g(x). Clearly, L ⊂ L′ .
Let b ∈ L′ be a zero of f (x). From Theorem 8.1.5, there is an automorphism ψ of L′ with
ψ(a) = b and ψ|K , the identity on K. Hence, by our assumption, ψ|L is an automorphism
of L. It follows that b ∈ L; hence, f (x) splits in L[x]. Therefore, (c) implies (a), completing
the proof.
Later, we will tie this result to group theory when we prove that a subgroup of in-
dex 2 must be a normal subgroup.
Example 8.2.4. As a first example of the lemma, consider the polynomial f (x) = x 2 −2. In
ℝ, this splits as (x − √2)(x + √2); hence, the field ℚ(√2) is the splitting field of f (x) = x 2 −2
over ℚ. Therefore, ℚ(√2) is a normal extension of ℚ.
Example 8.2.5. As a second example, consider the polynomial x 4 − 2 in ℚ[x]. The zeros
in ℂ are

α, −α, iα, −iα, where α = ⁴√2 is the real positive fourth root of 2.

Hence, the splitting field of x 4 − 2 over ℚ is

L = ℚ(α, i).

Therefore, we have that L is a normal extension of ℚ with

|L : ℚ| = |ℚ(α, i) : ℚ(α)| ⋅ |ℚ(α) : ℚ| = 2 ⋅ 4 = 8.

Note that ℚ(α)|ℚ is not normal: x 4 − 2 is irreducible over ℚ and has the zero α ∈ ℚ(α), but it does not split in ℚ(α)[x] since iα ∉ ℚ(α) ⊂ ℝ.
8.3 Exercises
1. Determine the splitting field of f (x) ∈ ℚ[x] and its degree over ℚ in the following
cases:
(a) f (x) = x 4 − p, where p is a prime.
(b) f (x) = x p − 2, where p is a prime.
2. Determine the degree of the splitting field of the polynomial x 4 +4 over ℚ. Determine
the splitting field of x 6 + 4x 4 + 4x 2 + 3 over ℚ.
3. For each a ∈ ℤ, let fa (x) = x 3 − ax 2 + (a − 3)x + 1 ∈ ℚ[x] be given. Show:
(a) fa is irreducible over ℚ for each a ∈ ℤ.
(b) If b ∈ ℝ is a zero of fa , then also (1 − b)−1 and (b − 1)b−1 are zeros of fa .
(c) Determine the splitting field L of fa (x) over ℚ and its degree |L : ℚ|.
4. Let K be a field and f (x) ∈ K[x] a polynomial of degree n. Let L be a splitting field
of f (x). Show the following:
(a) If a1 , . . . , an ∈ L are the zeros of f , then |K(a1 , . . . , at ) : K| ≤ n ⋅ (n − 1) ⋅ ⋅ ⋅ (n − t + 1)
for each t with 1 ≤ t ≤ n.
(b) L over K is of degree at most n!.
(c) If f (x) is irreducible over K, then n divides |L : K|.
9 Groups, Subgroups and Examples
9.1 Groups, Subgroups and Isomorphisms
Recall from Chapter 1 that the three most commonly studied algebraic structures are
groups, rings and fields. We have now looked rather extensively at rings and fields. In
this chapter, we consider the basic concepts of group theory. Groups arise in many differ-
ent areas of mathematics. For example, they arise in geometry as groups of congruence
motions, and in topology as groups of various types of continuous functions. Later in
this book, they will appear in Galois theory as groups of automorphisms of fields. First,
we recall the definition of a group given previously in Chapter 1.
Definition 9.1.1. A group G is a set with one binary operation, which we will denote by
multiplication, such that
(1) The operation is associative; that is, (g1 g2 )g3 = g1 (g2 g3 ) for all g1 , g2 , g3 ∈ G.
(2) There exists an identity for this operation; that is, an element 1 such that 1g = g and
g1 = g for each g ∈ G.
(3) Each g ∈ G has an inverse for this operation; that is, for each g, there exists a g −1
with the property that gg −1 = 1, and g −1 g = 1.
If, in addition, the operation is commutative; that is, g1 g2 = g2 g1 for all g1 , g2 ∈ G, the
group G is called an Abelian group.
The order of G, denoted |G|, is the number of elements in the group G. If |G| < ∞,
then G is a finite group; otherwise, it is an infinite group.
It follows easily from the definition that the identity is unique, and that each element
has a unique inverse.
Proof. Suppose that 1 and e are both identities for G. Then 1e = e since 1 is an identity,
and 1e = 1 since e is an identity. Therefore, 1 = e, and there is only one identity.
Next suppose that g ∈ G and that g1 and g2 are both inverses for g. Then

g1 = g1 1 = g1 (gg2 ) = (g1 g)g2 = 1g2 = g2 .

Therefore, the inverse of g is unique.
Similarly, if g1 , g2 ∈ G, then (g1 g2 )(g2−1 g1−1 ) = g1 (g2 g2−1 )g1−1 = g1 g1−1 = 1, and likewise
(g2−1 g1−1 )(g1 g2 ) = 1. Therefore, g2−1 g1−1 is an inverse for g1 g2 , and since inverses are unique, it is the inverse
of the product; that is, (g1 g2 )−1 = g2−1 g1−1 .
Groups most often arise as permutations on a set. We will see this, as well as other
specific examples of groups, in the next sections.
Finite groups can be completely described by their group tables or multiplication
tables. These are sometimes called Cayley tables. In general, let G = {g1 , . . . , gn } be a
group, then the multiplication table of G is
        g1     g2     ⋅⋅⋅   gj      ⋅⋅⋅   gn
  g1
  g2
  ⋮
  gi                  ⋅⋅⋅   gi gj
  ⋮
  gn
The entry in the row of gi ∈ G and column of gj ∈ G is the product (in that order)
gi gj in G.
Groups satisfy the cancellation law for multiplication: if ga = gb, then a = b, and if ag = bg, then a = b (multiply by g −1 on the appropriate side).
A consequence of Lemma 9.1.3 is that each row and each column in a group table is
just a permutation of the group elements. That is, each group element appears exactly
once in each row and each column.
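The fact that each row and column of a group table is a permutation of the group elements is easy to verify by machine. The following Python sketch (our illustration, not part of the text) checks it for the additive group ℤ5:

```python
# Build the Cayley table of (Z_5, +) and check that every row and every
# column is a permutation of the group elements.
n = 5
elements = list(range(n))
table = [[(g + h) % n for h in elements] for g in elements]

rows_ok = all(sorted(row) == elements for row in table)
cols_ok = all(sorted(table[i][j] for i in range(n)) == elements
              for j in range(n))
assert rows_ok and cols_ok
```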
A subset H ⊂ G is a subgroup of G if H is also a group under the same operation
as G. As for rings and fields, a subset of a group is a subgroup if it is nonempty and
closed under both the group operation and inverses.
For g ∈ G, let ⟨g⟩ denote the set of all powers of g:

⟨g⟩ = {1 = g 0 , g, g −1 , g 2 , g −2 , . . .}.
Lemma 9.1.5. If G is a group and g ∈ G, then ⟨g⟩ forms a subgroup of G called the cyclic
subgroup generated by g. ⟨g⟩ is Abelian, even if G is not.
Suppose that g ∈ G and g m = 1 for some positive integer m. Then let n be the smallest
positive integer such that g n = 1. It follows that the elements {1, g, g 2 , . . . , g n−1 } are
all distinct, but for any other power g k , we have g k = g t for some t = 0, 1, . . . , n − 1 (see
exercises). The cyclic subgroup generated by g then has order n, and we say that g has
order n, which we denote by o(g) = n. If no such n exists, we say that g has infinite order.
We will look more deeply at cyclic groups and subgroups in Section 9.5.
We introduce one more concept before looking at examples.
A map f : G → H between two groups is a homomorphism if f (g1 g2 ) = f (g1 )f (g2 ) for all
g1 , g2 ∈ G; a bijective homomorphism is called an isomorphism. As with rings and fields, we say that two groups G and H are isomorphic, denoted
by G ≅ H, if there exists an isomorphism f : G → H. This means that, abstractly, G and
H have exactly the same algebraic structure.
9.2 Examples of Groups

In a field K, the nonzero elements are all invertible and form a group under multiplication. This is called the multiplicative group of the field K and is usually denoted
by K ∗ . Since multiplication in a field is commutative, the multiplicative group of a field
is an Abelian group. Hence, ℚ∗ , ℝ∗ , ℂ∗ are all infinite Abelian groups, whereas if p is
a prime, ℤ∗p forms a finite Abelian group. Recall that if p is a prime, then the modular
ring ℤp is a field.
Within ℚ∗ , ℝ∗ , ℂ∗ , there are certain multiplicative subgroups. Since the positive
rationals ℚ+ and the positive reals ℝ+ are closed under multiplication and inverse, they
form subgroups of ℚ∗ and ℝ∗ , respectively. In ℂ∗ , the set of all complex numbers z with
|z| = 1 forms a multiplicative subgroup. Furthermore, within this subgroup, for a fixed n,
the set of n-th roots of unity (that is, the z with z n = 1) forms a subgroup, this time of
finite order.
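As an illustration (ours, not the book's), one can check numerically in Python that the n-th roots of unity are closed under multiplication and inverses, for example for n = 6:

```python
import cmath

# The 6th roots of unity: z_k = exp(2*pi*i*k/6), k = 0, ..., 5.
n = 6
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

def is_root(z, tol=1e-9):
    # membership test up to floating-point tolerance
    return any(abs(z - r) < tol for r in roots)

assert all(is_root(z * w) for z in roots for w in roots)  # closure
assert all(is_root(1 / z) for z in roots)                 # inverses
assert all(abs(abs(z) - 1) < 1e-9 for z in roots)         # all lie on |z| = 1
```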
The multiplicative group of a field is a special case of the unit group of a ring. If R
is a ring with identity, recall that a unit is an element of R with a multiplicative inverse.
Hence, in ℤ, the only units are ±1, whereas in any field every nonzero element is a unit.
Lemma 9.2.1. If R is a ring with identity, then the set of units in R forms a group under
multiplication called the unit group of R, and is denoted by U(R). If R is a field, then
U(R) = R∗ .
Proof. Let R be a ring with identity. Then the identity 1 itself is a unit, so 1 ∈ U(R); hence,
U(R) is nonempty. If e ∈ R is a unit, then it has a multiplicative inverse e−1 . Clearly then,
the multiplicative inverse has an inverse, namely, e so e−1 ∈ U(R) if e is. Hence, to show
U(R) is a group, we must show that it is closed under product.
Let e1 , e2 ∈ U(R). Then there exist e1−1 , e2−1 . It follows that e2−1 e1−1 is an inverse for e1 e2 .
Hence, e1 e2 is also a unit, and U(R) is closed under product. Therefore, for any ring R
with identity U(R) forms a multiplicative group.
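For a concrete case, the following Python check (ours, not the book's) computes U(ℤ12) and verifies the group properties; the units of ℤn are exactly the residues coprime to n:

```python
from math import gcd

# Units of Z_12: residues a with gcd(a, 12) = 1.
n = 12
units = [a for a in range(1, n) if gcd(a, n) == 1]

assert units == [1, 5, 7, 11]
# closed under multiplication mod n
assert all((a * b) % n in units for a in units for b in units)
# every unit has a multiplicative inverse in the set
assert all(any((a * b) % n == 1 for b in units) for a in units)
```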
For a field K and n ≥ 1, let

GL(n, K) = {A : A an n × n matrix over K with det(A) ≠ 0}

and

SL(n, K) = {A ∈ GL(n, K) : det(A) = 1}.
Lemma 9.2.2. If K is a field, then for n ≥ 2, GL(n, K) forms a non-Abelian group under
matrix multiplication, and SL(n, K) forms a subgroup.
GL(n, K) is called the n-dimensional general linear group over K, whereas SL(n, K) is
called the n-dimensional special linear group over K.
Proof. Recall that for two n × n matrices A and B with n ≥ 2 over a field, we have
det(AB) = det(A) det(B) where det is the determinant.
Now for any field, the n × n identity matrix I has determinant 1; hence, I ∈ GL(n, K).
Since the determinant is multiplicative, the product of two matrices with nonzero de-
terminant has nonzero determinant, so GL(n, K) is closed under product. Furthermore,
over a field K, if A is an invertible matrix, then det(A−1 ) = 1/ det(A).
Therefore, if A has nonzero determinant, so does its inverse. It follows that GL(n, K)
has the inverse of any of its elements. Since matrix multiplication is associative, it fol-
lows that GL(n, K) forms a group. It is non-Abelian since in general matrix multiplica-
tion is noncommutative. SL(n, K) forms a subgroup of GL(n, K) because det(A−1 ) = 1 if
det(A) = 1.
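A brute-force count in Python (our sketch; the counting formula is the standard one and is not derived in the text) illustrates Lemma 9.2.2 over the finite field ℤ3:

```python
from itertools import product

# Count invertible 2x2 matrices over Z_p, represented as 4-tuples (a, b, c, d).
p = 3

def det(m):
    a, b, c, d = m
    return (a * d - b * c) % p

matrices = list(product(range(p), repeat=4))
gl = [m for m in matrices if det(m) != 0]   # GL(2, Z_3)
sl = [m for m in matrices if det(m) == 1]   # SL(2, Z_3)

# standard counts: |GL(2,p)| = (p^2 - 1)(p^2 - p), |SL(2,p)| = |GL(2,p)|/(p-1)
assert len(gl) == (p**2 - 1) * (p**2 - p)
assert len(sl) == len(gl) // (p - 1)
```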
A congruence motion (or isometry) of the Euclidean plane ℰ 2 is a mapping T of ℰ 2 onto itself that preserves distances; that is, d(T(a), T(b)) = d(a, b) for all points a, b, where d denotes the Euclidean distance.

Theorem 9.2.3. The set of congruence motions of ℰ 2 forms a group called the Euclidean
group. We denote the Euclidean group by ℰ .
Proof. The identity map I is clearly an isometry, and since composition of mappings is
associative, we need only to show that the product of isometries is an isometry, and that
the inverse of an isometry is an isometry.
Let T, U be isometries. Then d(a, b) = d(T(a), T(b)) and d(a, b) = d(U(a), U(b)) for
any points a, b. Now consider the composition TU. For any points a, b, we have

d(TU(a), TU(b)) = d(U(a), U(b)) = d(a, b);

hence, TU is an isometry. Finally, applying the defining property of T to the points T −1 (a) and T −1 (b) gives d(T −1 (a), T −1 (b)) = d(T(T −1 (a)), T(T −1 (b))) = d(a, b); hence, T −1 is an isometry as well.
One of the major results concerning ℰ is the following; we refer to [41], [42], [27],
and [35] for a more thorough treatment.

Theorem 9.2.4. Every congruence motion of ℰ 2 is a translation, a rotation, a reflection, or a glide reflection.
Proof. We outline a brief proof. If T is an isometry and T fixes the origin (0, 0), then T is
a linear mapping. It follows that T is a rotation or a reflection. If T does not fix the origin,
then there is a translation T0 such that T0 T fixes the origin. This gives translations and
glide reflections. In the exercises, we expand out more of the proof.
Now let D be a figure in ℰ 2 ; that is, a set of points of the plane. A symmetry of D is a congruence motion that maps D onto itself, and Sym(D) denotes the set of all symmetries of D.

Lemma 9.2.5. For any figure D in ℰ 2 , Sym(D) forms a group, called the symmetry group of D.
Proof. We show that Sym(D) is a subgroup of ℰ . The identity map I fixes D, that is, I ∈
Sym(D), and thus Sym(D) is nonempty. Let T, U ∈ Sym(D). Then T maps D to D, and so
does U. It follows directly that so does the composition TU; hence, TU ∈ Sym(D). If T
maps D to D, then certainly T −1 does as well; hence, T −1 ∈ Sym(D), and Sym(D) is a subgroup of ℰ .
Example 9.2.6. Let T be an equilateral triangle. Then there are exactly six symmetries
of T (see exercises). These are as follows:
– I is the identity,
– r is a rotation of 120∘ around the center of T,
– r 2 is a rotation of 240∘ around the center of T,
– f is a reflection over the perpendicular bisector of one of the sides,
– fr is the composition of f and r, and
– fr 2 is the composition of f and r 2 .
The group Sym(T) is called the dihedral group D3 . In the next section, we will see that it
is isomorphic to S3 , the symmetric group on 3 symbols.
For a set A, a permutation of A is a one-to-one mapping of A onto itself. We denote by SA the set of all permutations of A.
Theorem 9.3.2. For any set A, SA forms a group under composition, called the symmetric
group on A. If |A| > 2, then SA is non-Abelian. Furthermore, if A, B have the same cardi-
nality, then SA ≅ SB .
Proof. If SA is the set of all permutations on the set A, we must show that composition
is an operation on SA that is associative, and has an identity and inverses. Let f , g ∈ SA .
Then f , g are one-to-one mappings of A onto itself.
Consider f ∘ g : A → A. If f ∘ g(a1 ) = f ∘ g(a2 ), then f (g(a1 )) = f (g(a2 )), and g(a1 ) =
g(a2 ), since f is one-to-one. But then a1 = a2 since g is one-to-one.
If a ∈ A, there exists a1 ∈ A with f (a1 ) = a since f is onto. Then there exists a2 ∈ A
with g(a2 ) = a1 since g is onto. Putting these together, f (g(a2 )) = a; therefore, f ∘g is onto.
Therefore, f ∘ g is also a permutation, and composition gives a valid binary operation
on SA .
The identity function 1(a) = a for all a ∈ A will serve as the identity for SA , whereas
the inverse function for each permutation will be the inverse. Such unique inverse func-
tions exist since each permutation is a bijection.
Finally, composition of functions is always associative; therefore, SA forms a group.
Suppose that |A| > 2. Then A has at least 3 elements. Call them a1 , a2 , a3 . Consider
the two permutations f and g, which fix (leave unchanged) all of A except a1 , a2 , a3 , and on
these three elements:
f (a1 ) = a2 , f (a2 ) = a3 , f (a3 ) = a1
g(a1 ) = a2 , g(a2 ) = a1 , g(a3 ) = a3 .
Then f ∘ g(a1 ) = f (a2 ) = a3 , whereas g ∘ f (a1 ) = g(a2 ) = a1 . Hence, f ∘ g ≠ g ∘ f , and SA is non-Abelian. The final statement, that SA ≅ SB whenever A and B have the same cardinality, is left to the exercises.
A permutation group is any subgroup of SA for some set A. We now look at finite
permutation groups. Let A be a finite set, say A = {a1 , a2 , . . . , an }. Then each f ∈ SA can
be pictured as
f = ( a1       a2      ⋅⋅⋅  an
      f (a1 )  f (a2 ) ⋅⋅⋅  f (an ) ).
For a1 , there are n choices for f (a1 ). For a2 , there are only n − 1 choices since f is one-to-
one. This continues down to only one choice for an . Using the multiplication principle,
the number of choices for f , and, therefore, the size of SA , is
n(n − 1) ⋅ ⋅ ⋅ 1 = n!.
Example 9.3.5. Write down the six elements of S3 and give the multiplication table for
the group.
Name the three elements 1, 2, 3. The six elements of S3 are then as follows:
1 = ( 1 2 3        a = ( 1 2 3        b = ( 1 2 3
      1 2 3 ),           2 3 1 ),          3 1 2 ),

c = ( 1 2 3        d = ( 1 2 3        e = ( 1 2 3
      2 1 3 ),           3 2 1 ),          1 3 2 ).
The multiplication table for S3 can be written down directly by doing the required
composition. For example,
ac = ( 1 2 3   ( 1 2 3   = ( 1 2 3   = d.
       2 3 1 )   2 1 3 )     3 2 1 )
Writing b = a², d = ac, and e = a²c, the complete table is:

        1     a     a²    c     ac    a²c
 1      1     a     a²    c     ac    a²c
 a      a     a²    1     ac    a²c   c
 a²     a²    1     a     a²c   c     ac
 c      c     a²c   ac    1     a²    a
 ac     ac    c     a²c   a     1     a²
 a²c    a²c   ac    c     a²    a     1
In terms of generators and relations, this gives the presentation

S3 = ⟨a, c; a3 = c2 = 1, ac = ca2 ⟩.
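The computations above are easy to automate. The following Python sketch (ours, not the book's) realizes 1, a, and c as permutations and confirms the relations a3 = c2 = 1 and ac = ca2:

```python
# Permutations of {1, 2, 3} as tuples of images; compose(f, g) is f after g.
def compose(f, g):
    return tuple(f[g[i] - 1] for i in range(3))

one = (1, 2, 3)
a = (2, 3, 1)    # a: 1 -> 2, 2 -> 3, 3 -> 1
c = (2, 1, 3)    # c: swaps 1 and 2

a2 = compose(a, a)
assert compose(a2, a) == one                       # a^3 = 1
assert compose(c, c) == one                        # c^2 = 1
assert compose(a, c) == compose(compose(c, a), a)  # ac = ca^2
```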
Theorem 9.3.6 (Cayley’s theorem). Let G be a group. Then G is isomorphic to a permutation
group on the set G; that is, G is isomorphic to a subgroup of SG .
9.4 Cosets and Lagrange’s Theorem

Let G be a group and H ⊂ G a subgroup. For a, b ∈ G, define a ∼ b if a−1 b ∈ H.
Lemma 9.4.2. Let G be a group and H ⊂ G a subgroup. Then the relation defined above is
an equivalence relation on G. The equivalence classes all have the form aH for a ∈ G and
are called the left cosets of H in G. Clearly, G is a disjoint union of its left cosets.
Proof. Let us show, first of all, that this is an equivalence relation. Now a ∼ a since
a−1 a = e ∈ H. Therefore, the relation is reflexive. Furthermore, a ∼ b implies a−1 b ∈ H,
but since H is a subgroup of G, we have b−1 a = (a−1 b)−1 ∈ H. Thus, b ∼ a. Therefore,
the relation is symmetric. Finally, suppose that a ∼ b and b ∼ c. Then a−1 b ∈ H, and
b−1 c ∈ H. Since H is a subgroup a−1 b ⋅ b−1 c = a−1 c ∈ H; hence, a ∼ c. Therefore, the
relation is transitive and, hence, is an equivalence relation.
For a ∈ G, the equivalence class is

[a] = {g ∈ G : a ∼ g} = {g ∈ G : a−1 g ∈ H}.

If a−1 g = h ∈ H, then g = ah. But then, clearly, g ∈ aH. It follows that the equivalence class for a ∈ G is precisely the set

aH = {ah : h ∈ H}.
These classes, aH, are called left cosets of H, and since they are equivalence classes,
they partition G. This means that every element of G is in one and only one left coset. In
particular, bH = H = eH if and only if b ∈ H.
In an analogous manner, we may define a ∼ b if ab−1 ∈ H; the equivalence classes are then the sets Ha = {ha : h ∈ H}, called right cosets of H. Also, of course, G is the (disjoint) union of distinct right cosets.
It is easy to see that any two left (right) cosets have the same order (number of
elements). To demonstrate this, consider the mapping aH → bH via ah → bh, where
h ∈ H. It is not hard to show that this mapping is 1–1 and onto (see exercises). Thus, we
have |aH| = |bH|. (This is also true for right cosets and can be established in a similar
manner.) Letting b ∈ H in the above discussion, we see |aH| = |H|, for any a ∈ G. That
is, the size of each left or right coset is exactly the same as the subgroup H.
One can also see that the collection {aH} of all distinct left cosets has the same num-
ber of elements as the collection {Ha} of all distinct right cosets. In other words, the
number of left cosets equals the number of right cosets (this number may be infinite).
For example, consider the map f : aH → Ha−1 . This mapping is well defined; for if
aH = bH, then b = ah, where h ∈ H. Thus, f (bH) = Hb−1 = Hh−1 a−1 = f (aH). It is not
hard to show that this mapping is 1–1 and onto (see exercises). Hence, the number of left
cosets equals the number of right cosets.
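These facts about cosets can be observed concretely. The Python sketch below (ours, not the book's) lists the left and right cosets of H = {1, c} in S3:

```python
from itertools import permutations

def compose(f, g):                       # permutations of {1,2,3} as tuples
    return tuple(f[g[i] - 1] for i in range(3))

S3 = list(permutations((1, 2, 3)))
H = [(1, 2, 3), (2, 1, 3)]               # the subgroup {1, c}

left = {frozenset(compose(g, h) for h in H) for g in S3}
right = {frozenset(compose(h, g) for h in H) for g in S3}

assert len(left) == len(right) == 3             # [S3 : H] = 3
assert all(len(C) == len(H) for C in left)      # every coset has size |H|
assert set().union(*left) == set(S3)            # the left cosets partition S3
```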
Definition 9.4.3. Let G be a group and H ⊂ G a subgroup. The number of distinct left
cosets, which is the same as the number of distinct right cosets, is called the index of H
in G, denoted by [G : H].
Now let us consider the case where the group G is finite. Each left coset has the
same size as the subgroup H; here, both are finite. Hence, |aH| = |H| for each coset. In
addition, the group G is a disjoint union of the left cosets; that is,
G = H ∪ g1 H ∪ ⋅ ⋅ ⋅ ∪ gn H.
Counting elements gives |G| = |H| + |g1 H| + ⋅ ⋅ ⋅ + |gn H| = (n + 1)|H|, and we obtain the following fundamental result:

Theorem 9.4.4 (Lagrange’s theorem). Let G be a finite group and H ⊂ G a subgroup. Then |G| = [G : H] ⋅ |H|; in particular, |H| divides |G|.
If G is a finite group, this implies that both the order of a subgroup and the index of a
subgroup are divisors of the order of the group.
This theorem plays a crucial role in the structure theory of finite groups since it
greatly restricts the size of subgroups. For example, in a group of order 10, there can be
proper subgroups only of orders 1, 2, and 5.
As an immediate corollary, we have the following result:
Corollary 9.4.5. The order of any element g ∈ G, where G is a finite group, divides the
order of the group. In particular, if |G| = n and g ∈ G, then o(g)|n, and g n = 1.
Proof. Let g ∈ G and o(g) = m. Then m is the size of the cyclic subgroup generated by g;
hence divides n from Lagrange’s theorem. Then n = mk, and so
g n = g mk = (g m )k = 1k = 1.
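A quick computational check of this corollary (ours, not the book's) in the unit group U(ℤ20):

```python
from math import gcd

# U(Z_20) under multiplication mod 20; |G| = phi(20) = 8.
n = 20
G = [a for a in range(1, n) if gcd(a, n) == 1]

def order(g):
    k, x = 1, g
    while x != 1:
        x = (x * g) % n
        k += 1
    return k

assert len(G) == 8
assert all(len(G) % order(g) == 0 for g in G)   # o(g) divides |G|
assert all(pow(g, len(G), n) == 1 for g in G)   # g^|G| = 1
```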
Before leaving this section, we consider some results concerning general subsets of
a group.
Suppose that G is a group and S is an arbitrary nonempty subset of G; that is, S ⊂ G and
S ≠ ∅. Such a set S is usually called a complex of G.
If U and V are two complexes of G, the product UV is defined as follows:
UV = {uv ∈ G : u ∈ U, v ∈ V }.
Now suppose that U, V are subgroups of G. When is the complex UV again a sub-
group of G?
Theorem 9.4.6. Let U and V be subgroups of G. Then UV is a subgroup of G if and only if U and V commute; that is, if and only if for all u ∈ U and v ∈ V there are u1 ∈ U and v1 ∈ V with uv = v1 u1 .
Proof. We note first that when we say U and V commute, we do not demand that this
is so elementwise. In other words, it is not required that uv = vu for all u ∈ U and all
v ∈ V . All that is required is that for any u ∈ U and v ∈ V uv = v1 u1 for some elements
u1 ∈ U and v1 ∈ V .
Assume that UV is a subgroup of G. Let u ∈ U and v ∈ V . Then u ∈ U ⋅ 1 ⊂ UV and
v ∈ 1 ⋅ V ⊂ UV . But since UV is assumed itself to be a subgroup, it follows that vu ∈ UV ; that is, vu = u1 v1 for some u1 ∈ U, v1 ∈ V . Taking inverses, with u and v arbitrary, this also yields uv = v2 u2 with u2 ∈ U, v2 ∈ V . Hence, U and V commute.
Conversely, suppose that U and V commute. Clearly, 1 ∈ UV , so UV ≠ ∅. Given u1 v1 , u2 v2 ∈ UV , we have v1 u2 = u3 v3 for some u3 ∈ U, v3 ∈ V ; hence, (u1 v1 )(u2 v2 ) = u1 (v1 u2 )v2 = (u1 u3 )(v3 v2 ) ∈ UV , and UV is closed under products. Finally, (u1 v1 )−1 = v1−1 u1−1 = u4 v4 ∈ UV for some u4 ∈ U, v4 ∈ V . Therefore, UV is a subgroup of G.
Theorem 9.4.7 (Product formula). Let U, V be subgroups of G, and let R be a left transversal
of the intersection U ∩ V in U; that is, R contains exactly one element from each left coset of U ∩ V in U. Then

UV = ⋃r∈R rV ,

and this union is disjoint. In particular, if U and V are finite, then

|UV | = |U||V | / |U ∩ V |.
Proof. Since R ⊂ U, we certainly have

⋃r∈R rV ⊂ UV .

Conversely, since R is a left transversal of U ∩ V in U,

U = ⋃r∈R r(U ∩ V ).

Hence, if u ∈ U and v ∈ V , then u = rv′ for some r ∈ R and v′ ∈ U ∩ V , and so

uv = rv′ v ∈ rV .

Therefore, uv ∈ ⋃r∈R rV , and UV = ⋃r∈R rV . The cosets rV with r ∈ R are pairwise distinct; for if r1 V = r2 V with r1 , r2 ∈ R, then r1−1 r2 ∈ U ∩ V , so r1 (U ∩ V ) = r2 (U ∩ V ) and, hence, r1 = r2 . If U and V are finite, counting elements gives

|UV | = |R||V | = |U : U ∩ V ||V | = (|U| / |U ∩ V |)|V | = |U||V | / |U ∩ V |.
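The product formula can be checked on a small example in Python (ours, not the book's), taking U = {1, c} and V = {1, a, a²} inside S3:

```python
from itertools import permutations

def compose(f, g):                       # permutations of {1,2,3} as tuples
    return tuple(f[g[i] - 1] for i in range(3))

U = [(1, 2, 3), (2, 1, 3)]               # {1, c}
V = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]    # {1, a, a^2}

UV = {compose(u, v) for u in U for v in V}
I = [x for x in U if x in V]             # U ∩ V = {1}

assert len(I) == 1
assert len(UV) == len(U) * len(V) // len(I)      # |UV| = |U||V| / |U ∩ V| = 6
assert UV == set(permutations((1, 2, 3)))        # here UV is all of S3
```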
We now show that index is multiplicative. Later, we will see how this fact is related
to the multiplicativity of the degree of field extensions.
Theorem 9.4.8. Suppose G is a group and U and V are subgroups with U ⊂ V ⊂ G. If G is the disjoint union

G = ⋃r∈R rV ,

and V is the disjoint union

V = ⋃s∈S sU,

then G is the disjoint union

G = ⋃r∈R,s∈S rsU.

In particular, if the indices are finite,

[G : U] = [G : V ][V : U].
Proof. Now

G = ⋃r∈R rV = ⋃r∈R r( ⋃s∈S sU ) = ⋃r∈R,s∈S rsU.

If rsU = r′ s′ U with r, r′ ∈ R and s, s′ ∈ S, then, since sU ⊂ V and s′ U ⊂ V , we get rV = r′ V ; hence, r = r′ , and then sU = s′ U gives s = s′ . Therefore, the union is disjoint, and counting cosets of U gives [G : U] = |R||S| = [G : V ][V : U].
The next result says that the intersection of subgroups of finite index must again be
of finite index.
Theorem 9.4.9 (Poincaré). Suppose that U, V are subgroups of finite index in G. Then U ∩V
is also of finite index. Furthermore,
[G : U ∩ V ] ≤ [G : U][G : V ].
Proof. Let r be the number of left cosets of U in G that are contained in VU. r is finite
since the index [G : U] is finite. From Theorem 9.4.7, we then have

|V : U ∩ V | = r ≤ [G : U].

Hence, by Theorem 9.4.8,

[G : U ∩ V ] = [G : V ][V : U ∩ V ] ≤ [G : V ][G : U],

which is finite.
Corollary 9.4.10. Suppose that [G : U] and [G : V ] are finite and relatively prime. Then
G = UV .

Proof. By Theorem 9.4.8, both [G : U] and [G : V ] divide [G : U ∩ V ]. Since they are relatively prime, their product also divides [G : U ∩ V ], and by Theorem 9.4.9, [G : U ∩ V ] ≤ [G : U][G : V ]. Hence,

[G : U ∩ V ] = [G : U][G : V ].

Since also

[G : U ∩ V ] = [G : V ][V : U ∩ V ],

comparing the two expressions gives

[V : U ∩ V ] = [G : U].

As in the proof of Theorem 9.4.9, [V : U ∩ V ] is the number of left cosets of U in G that are contained in VU. Hence, the number of left cosets of U in G that are contained in VU is equal to the number of
all left cosets of U in G. It follows then that we must have G = VU and, hence, also G = UV .
9.5 Generators and Cyclic Groups
Lemma 9.5.1. If U and V are subgroups of a group G, then their intersection U ∩ V is also
a subgroup.
Proof. Since the identity of G is in both U and V , we have that U ∩V is nonempty. Suppose
that g1 , g2 ∈ U ∩ V . Then g1 , g2 ∈ U; hence, g1−1 g2 ∈ U since U is a subgroup. Analogously,
g1−1 g2 ∈ V . Hence, g1−1 g2 ∈ U ∩ V ; therefore, U ∩ V is a subgroup.
Since the same argument applies to the intersection of any collection of subgroups, for a subset M ⊂ G we may define ⟨M⟩ to be the intersection of all subgroups of G containing M. This is the smallest subgroup of G containing M.
Definition 9.5.2. A subset M of a group G is a set of generators for G if G = ⟨M⟩; that is,
the smallest subgroup of G containing M is all of G. We say that G is generated by M, and
that M is a set of generators for G.
Notice that any group G has at least one set of generators, namely G itself. If we have
G = ⟨M⟩ and M is a finite set, then G is called finitely generated. Clearly, any finite group
is finitely generated. Shortly, we will give an example of a finitely generated infinite
group.
Example 9.5.3. The set of all reflections forms a set of generators for the Euclidean
group ℰ . Recall that any T ∈ ℰ is either a translation, a rotation, a reflection, or a glide
reflection. It can be shown (see exercises) that any one of these can be expressed as a
product of three or fewer reflections.
Definition 9.5.4. A group G is called cyclic if G = ⟨g⟩ for a single element g ∈ G.
In this case, G = {g n : n ∈ ℤ}; that is, G consists of all the powers of the element g.
If there exists an integer m such that g m = 1, then there exists a smallest such positive
integer say n. It follows that g k = g l if and only if k ≡ l (mod n). In this situation, the
distinct powers of g are precisely
{1 = g 0 , g, g 2 , . . . , g n−1 }.
It follows that |G| = n. We then call G a finite cyclic group. If no such power exists, then
all the powers of g are distinct, and G is an infinite cyclic group.
We show next that any two cyclic groups of the same order are isomorphic.
Theorem 9.5.5. (a) If G = ⟨g⟩ is an infinite cyclic group, then G ≅ (ℤ, +); that is, the
integers under addition.
(b) If G = ⟨g⟩ is a finite cyclic group of order n, then G ≅ (ℤn , +); that is, the integers
modulo n under addition.
It follows that for a given order there is only one cyclic group up to isomorphism.
Proof. Let G be an infinite cyclic group with generator g. Map g onto 1 ∈ (ℤ, +). Since g
generates G and 1 generates ℤ under addition, this can be extended to a homomorphism.
It is straightforward to show that this defines an isomorphism.
Now let G be a finite cyclic group of order n with generator g. As above, map g to
1 ∈ ℤn and extend to a homomorphism. Again it is straightforward to show that this
defines an isomorphism.
Now let G and H be two cyclic groups of the same order. If both are infinite, then
both are isomorphic to (ℤ, +) and, hence, isomorphic to each other. If both are finite of
order n, then both are isomorphic to (ℤn , +) and, hence, isomorphic to each other.
Theorem 9.5.6. Let G = ⟨g⟩ be a finite cyclic group of order n. Then every subgroup of G
is also cyclic. Furthermore, if d|n, there exists a unique subgroup of G of order d.
Proof. Let G = ⟨g⟩ be a finite cyclic group of order n, and suppose that H is a subgroup
of G. Notice that if g m ∈ H, then g −m is also in H since H is a subgroup. Hence, H must
contain positive powers of the generator g. Let t be the smallest positive power of g such
that g t ∈ H. We claim that H = ⟨g t ⟩, the cyclic subgroup of G generated by g t . Let h ∈ H,
then h = g m for some positive integer m ≥ t. Divide m by t to get m = qt + r with 0 ≤ r < t. Then g r = g m−qt = g m (g t )−q ∈ H, and since t is the least positive power of g lying in H and r < t, we must have r = 0. Hence, h = g m = (g t )q ∈ ⟨g t ⟩, and H = ⟨g t ⟩ is cyclic.
Furthermore, g n = 1 ∈ H, so the same division argument shows that t|n; hence, |H| = |⟨g t ⟩| = n/t. Now suppose that d|n. Then ⟨g n/d ⟩ is a subgroup of order d, and if H is any subgroup of order d, then by the above H = ⟨g t ⟩ with t|n and n/t = d; that is, t = n/d. Therefore, there is exactly one subgroup of order d.
Theorem 9.5.7. Let G = ⟨g⟩ be an infinite cyclic group. Then a subgroup H is of the form
H = ⟨g t ⟩ for a positive integer t. Furthermore, if t1 , t2 are positive integers with t1 ≠ t2 ,
then ⟨g t1 ⟩ and ⟨g t2 ⟩ are distinct.
Proof. Let G = ⟨g⟩ be an infinite cyclic group and H a subgroup of G. As in the proof of
Theorem 9.5.6, H must contain positive powers of the generator g. Let t be the smallest
positive power of g such that g t ∈ H. We claim that H = ⟨g t ⟩, the cyclic subgroup of G
generated by g t . Let h ∈ H, then h = g m for some positive integer m ≥ t. Divide m by t
to get m = qt + r with 0 ≤ r < t. If r > 0, then g r = g m−qt = g m (g t )−q ∈ H,
a contradiction since r < t and t is the least positive power in H. It follows that r = 0,
so m = qt. This implies that g m = g qt = (g t )q ; that is, g m is a power of g t . Therefore,
every element of H is a power of g t and, therefore, g t generates H; hence, H = ⟨g t ⟩.
From the proof above in the subgroup ⟨g t ⟩, the integer t is the smallest positive
power of g in ⟨g t ⟩. Therefore, if t1 , t2 are positive integers with t1 ≠ t2 , then ⟨g t1 ⟩ and
⟨g t2 ⟩ are distinct.
Theorem 9.5.8. Let G = ⟨g⟩ be a cyclic group. Then the following hold:
(a) If G = ⟨g⟩ is finite of order n, then g k is also a generator if and only if (k, n) = 1. That
is, the generators of G are precisely those powers g k , where k is relatively prime to n.
(b) If G = ⟨g⟩ is infinite, then the only generators are g, g −1 .
Proof. (a) Let G = ⟨g⟩ be a finite cyclic group of order n, and suppose that (k, n) = 1.
Then there exist integers x, y with kx + ny = 1. It follows that

g = g kx+ny = (g k )x (g n )y = (g k )x .

Hence, g ∈ ⟨g k ⟩, and therefore g k generates G. Conversely, suppose that g k generates G. Then g = (g k )x for some integer x, so g kx−1 = 1, and n divides kx − 1; say kx − 1 = −ny. Then

kx + ny = 1,

and it follows that (k, n) = 1.
(b) Let G = ⟨g⟩ be infinite, and suppose that g k generates G. Then g = (g k )x = g kx for some integer x. Since the powers of g are all distinct, kx = 1; hence, k = ±1. Conversely, g and g −1 clearly generate G.
Corollary 9.5.11. If G = ⟨g⟩ is finite of order n, then there are ϕ(n) generators for G, where
ϕ is the Euler phi-function.
Proof. From Theorem 9.5.8, the generators of G are precisely the powers g k , where
(k, n) = 1. The numbers relatively prime to n are counted by the Euler phi-function.
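For a concrete instance (ours, not the book's), take the additive cyclic group ℤ12; the generators are exactly the residues coprime to 12, and there are ϕ(12) = 4 of them:

```python
from math import gcd

# k generates (Z_12, +) iff the multiples of k hit every residue.
n = 12

def generates(k):
    return len({(k * i) % n for i in range(n)}) == n

gens = [k for k in range(n) if generates(k)]

assert gens == [k for k in range(n) if gcd(k, n) == 1]
assert gens == [1, 5, 7, 11]      # phi(12) = 4 generators
```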
Recall that in an arbitrary group G, if g ∈ G, then the order of g, denoted o(g), is the
order of the cyclic subgroup generated by g. Given two elements g, h ∈ G, in general,
there is no relationship between o(g), o(h) and the order of the product gh. However, if
they commute, there is a very direct relationship.
Lemma 9.5.12. Let G be an arbitrary group and g, h ∈ G both of finite order o(g), o(h). If
g and h commute; that is, gh = hg, then o(gh) divides lcm(o(g), o(h)). In particular, if G is
an Abelian group, then o(gh)| lcm(o(g), o(h)) for all g, h ∈ G of finite order. Furthermore,
if ⟨g⟩ ∩ ⟨h⟩ = {1}, then o(gh) = lcm(o(g), o(h)).
Proof. Suppose o(g) = n and o(h) = m are finite. If g, h commute, then for any k, we
have (gh)k = g k hk . Let t = lcm(n, m), then t = k1 m, t = k2 n. Hence,
k k
(gh)t = g t ht = (g m ) 1 (hn ) 2 = 1.
Therefore, the order of gh is finite and divides t. Suppose that ⟨g⟩ ∩ ⟨h⟩ = {1}; that is, the
cyclic subgroup generated by g intersects trivially with the cyclic subgroup generated
by h. Let k = o(gh), which we know is finite from the first part of the lemma.
Let t = lcm(n, m). We then have (gh)k = g k hk = 1, which implies that g k = h−k .
Since the cyclic subgroups have only trivial intersection, this implies that g k = 1 and
hk = 1. But then n|k and m|k; hence t|k. Since k|t it follows that k = t.
Recall that if m and n are relatively prime, then lcm(m, n) = mn. Furthermore, if the
orders of g and h are relatively prime, it follows from Lagrange’s theorem that ⟨g⟩∩⟨h⟩ =
{1}. We then get the following:
Corollary 9.5.13. If g, h commute and o(g) and o(h) are finite and relatively prime, then
o(gh) = o(g)o(h).
Definition 9.5.14. If G is a finite Abelian group, then the exponent of G is the lcm of the
orders of all elements of G. That is,

exp(G) = lcm{o(g) : g ∈ G}.
Lemma 9.5.15. Let G be a finite Abelian group. Then G contains an element of order
exp(G).
Proof. Suppose that exp(G) = p1^e1 ⋅ ⋅ ⋅ pk^ek with pi distinct primes. By the definition of
exp(G), for each i there is a gi ∈ G with o(gi ) = pi^ei ri , where pi and ri are relatively prime. Let hi = gi^ri .
Then from Lemma 9.5.12, we get o(hi ) = pi^ei . Now let g = h1 h2 ⋅ ⋅ ⋅ hk . From the corollary
to Lemma 9.5.12, we have o(g) = p1^e1 ⋅ ⋅ ⋅ pk^ek = exp(G).
Theorem 9.5.16. Let K be a field, and let A be a finite subgroup of the multiplicative group K ∗ . Then A is cyclic.
Proof. Let A ⊂ K ⋆ with |A| = n. Suppose that m = exp(A). Consider the polynomial
f (x) = x m − 1 ∈ K[x]. Since the order of each element in A divides m, it follows that
am = 1 for all a ∈ A; hence, each a ∈ A is a zero of the polynomial f (x). Hence, f (x) has
at least n zeros. Since a polynomial of degree m over a field can have at most m zeros, it
follows that n ≤ m. From Lemma 9.5.15, there is an element a ∈ A with o(a) = m. Since
|A| = n, it follows that m|n; hence, m ≤ n. Therefore, m = n; hence, A = ⟨a⟩ showing that
A is cyclic.
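As a check (ours, not the book's): the multiplicative group ℤ13∗ of the field ℤ13 must be cyclic of order 12, and a brute-force search in Python finds its generators:

```python
# Find the generators of the multiplicative group of Z_13 by brute force.
p = 13
nonzero = set(range(1, p))

def powers(g):
    return {pow(g, k, p) for k in range(p - 1)}

generators = [g for g in nonzero if powers(g) == nonzero]

assert generators != []          # Z_13^* is indeed cyclic
assert 2 in generators           # in fact 2 already generates it
```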
We close this section with two other results concerning cyclic groups. The first
proves, using group theory, a very interesting number theoretic result concerning the
Euler phi-function.
Theorem 9.5.17. For any natural number n,

∑d|n ϕ(d) = n.
Proof. Consider a cyclic group G of order n. For each d|n, d ≥ 1, there is a unique cyclic
subgroup H of order d. H then has ϕ(d) generators. Each element in G generates its
own cyclic subgroup H1 , say of order d and, hence, must be included in the ϕ(d) gener-
ators of H1 . Therefore, ∑d|n ϕ(d) is the sum of the numbers of generators of the cyclic
subgroups of G. But this must be the whole group; hence, this sum is n.
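This identity is easy to verify numerically; the Python sketch below (ours, not the book's) checks ∑d|n ϕ(d) = n for several n:

```python
from math import gcd

def phi(m):
    # Euler phi-function by direct count
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

for n in (1, 6, 12, 36, 100):
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    assert sum(phi(d) for d in divisors) == n
```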
We shall make use of the above theorem directly in the following theorem.
Theorem 9.5.18. If |G| = n and if for each positive d such that d|n, G has at most one cyclic
subgroup of order d, then G is cyclic (and, consequently, has exactly one cyclic subgroup
of order d).
Proof. For each d|n, d > 0, let ψ(d) denote the number of elements of G of order d. Then
∑ ψ(d) = n.
d|n
Now suppose that ψ(d) ≠ 0 for a given d|n. Then there exists an a ∈ G of order d, which
generates a cyclic subgroup, ⟨a⟩, of order d of G. We claim that all elements of G of
order d are in ⟨a⟩. Indeed, if b ∈ G with o(b) = d and b ∉ ⟨a⟩, then ⟨b⟩ is a second cyclic
subgroup of order d, distinct from ⟨a⟩. This contradicts the hypothesis, so the claim is
proved. Thus, if ψ(d) ≠ 0, then ψ(d) = ϕ(d). In general, we have ψ(d) ≤ ϕ(d), for all
positive d|n. But n = ∑d|n ψ(d) ≤ ∑d|n ϕ(d), by the previous theorem. It follows, clearly,
from this that ψ(d) = ϕ(d) for all d|n. In particular, ψ(n) = ϕ(n) ≥ 1. Hence, there exists
at least one element of G of order n; hence, G is cyclic. This completes the proof.
Corollary 9.5.19. If in a group G of order n, for each d|n, the equation x d = 1 has at most
d solutions in G, then G is cyclic.
Proof. The hypothesis clearly implies that G can have at most one cyclic subgroup of
order d since all elements of such a subgroup satisfy the equation. So Theorem 9.5.18
applies to give our result.
Theorem 9.5.20. Let G be a finitely generated group. The number of subgroups of index
n < ∞ is finite.
9.6 Exercises
1. Prove Lemma 9.1.4.
2. Let G be a group and H a nonempty subset. H is a subgroup of G if and only if
ab−1 ∈ H for all a, b ∈ H.
3. Suppose that g ∈ G and g m = 1 for some positive integer m. Let n be the smallest
positive integer such that g n = 1.
Show that the elements {1, g, g 2 , . . . , g n−1 } are all distinct, but for any other
power g k we have g k = g t for some t = 0, 1, . . . , n − 1.
4. Let G be a group and U1 , U2 be finite subgroups of G. If |U1 | and |U2 | are relatively
prime, then U1 ∩ U2 = {e}.
5. Let A, B be subgroups of a finite group G. If |A| ⋅ |B| > |G| then A ∩ B ≠ {e}.
6. Let G be the set of all real matrices of the form

       ( a  −b
         b   a ),

   where a² + b² ≠ 0. Show:
(a) G is a group.
(b) For each n ∈ ℕ there is at least one element of order n in G.
7. Let p be a prime, and let G = SL(2, p) = SL(2, ℤp ). Show: G has at least 2p−2 elements
of order p.
8. Let p be a prime and a ∈ ℤ. Show that ap ≡ a (mod p).
9. Here we outline a proof that every planar Euclidean congruence motion is either a
rotation, translation, reflection or glide reflection. An isometry in this problem is a
planar Euclidean congruence motion. Show:
(a) If T is an isometry, then it is completely determined by its action on a triangle;
equivalently, if T fixes three noncollinear points, then it must be the identity.
(b) If an isometry T has exactly one fixed point then it must be a rotation with that
point as center.
(c) If an isometry T has two fixed points then it fixes the line joining them. Then
show that if T is not the identity it must be a reflection through this line.
(d) If an isometry T has no fixed point but preserves orientation then it must be a
translation.
(e) If an isometry T has no fixed point but reverses orientation then it must be a
glide reflection.
10. Let Pn be a regular n-gon and Dn its group of symmetries. Show that |Dn | = 2n.
(Hint: First show that |Dn | ≤ 2n and then exhibit 2n distinct symmetries.)
11. If A, B have the same cardinality, then there exists a bijection σ : A → B. Define a
map F : SA → SB in the following manner: if f ∈ SA , let F(f ) be the permutation on
B given by F(f )(b) = σ(f (σ −1 (b))). Show that F is an isomorphism.
12. Prove Lemma 9.3.3.
10 Normal Subgroups, Factor Groups and Direct
Products
10.1 Normal Subgroups and Factor Groups
In rings, we saw that there were certain special types of subrings, called ideals, which
allowed us to define factor rings. The analogous object for groups is called a normal
subgroup, which we will define and investigate in this section.
Definition 10.1.1. Let G be an arbitrary group and suppose that H1 and H2 are subgroups
of G. We say that H2 is conjugate to H1 if there exists an element a ∈ G such that H2 =
a−1 H1 a. H1 , H2 are then called conjugate subgroups of G.
Lemma 10.1.2. Let G be an arbitrary group. Then the relation of conjugacy is an equiva-
lence relation on the set of subgroups of G.
Let g ∈ G be fixed, and consider the map f : G → G given by f (a) = g −1 ag. For a, b ∈ G, we have f (ab) = g −1 abg = (g −1 ag)(g −1 bg) = f (a)f (b). Hence, f is a homomorphism.
If f (a1 ) = f (a2 ), then g −1 a1 g = g −1 a2 g. Clearly, by the cancellation law, we then have
a1 = a2 ; hence, f is one-to-one.
Finally, let a ∈ G, and let a1 = gag −1 . Then a = g −1 a1 g; hence, f (a1 ) = a. It follows
that f is onto; therefore, f is an automorphism on G.
https://doi.org/10.1515/9783111142524-010
inclusion.
Lemma 10.1.5. Let N be a subgroup of a group G. If a−1 Na ⊂ N for all a ∈ G, then
a−1 Na = N for all a ∈ G. In particular, a−1 Na ⊂ N for all a ∈ G implies that N is a normal subgroup.
Notice that if g −1 Hg = H, then Hg = gH; that is, as sets, the left coset gH is equal to
the right coset Hg. Hence, for each h1 ∈ H, there is an h2 ∈ H with gh1 = h2 g. If H ⊲ G,
this is true for all g ∈ G. Furthermore, if H is normal, then for the product of two cosets
g1 H and g2 H, we have
(g1 H)(g2 H) = g1 (Hg2 )H = g1 (g2 H)H = (g1 g2 )(HH) = (g1 g2 )H.
Lemma 10.1.6. Let H be a subgroup of a group G. Then the following are equivalent:
(1) H is a normal subgroup of G.
(2) g −1 Hg = H for all g ∈ G.
(3) gH = Hg for all g ∈ G.
(4) (g1 H)(g2 H) = (g1 g2 )H for all g1 , g2 ∈ G.
This is precisely the condition needed to construct factor groups. First we give some
examples of normal subgroups.
For example, if H is a subgroup of index 2 in G and g ∈ G with g ∉ H, then
G = H ∪ gH = H ∪ Hg.
Since each union is a disjoint union, we must have gH = Hg; hence, H is normal.
Lemma 10.1.9. Let K be any field. Then the group SL(n, K) is a normal subgroup of
GL(n, K) for any positive integer n.
Proof. Recall that GL(n, K) is the group of n × n matrices over the field K with nonzero
determinant, whereas SL(n, K) is the subgroup of n × n matrices over the field K with
determinant equal to 1. Let U ∈ SL(n, K) and T ∈ GL(n, K). Consider T −1 UT. Then
det(T −1 UT) = det(T −1 ) det(U) det(T) = det(T)−1 ⋅ 1 ⋅ det(T) = 1.
Hence, T −1 UT ∈ SL(n, K) for any U ∈ SL(n, K), and any T ∈ GL(n, K). It follows that
T −1 SL(n, K)T ⊂ SL(n, K); therefore, SL(n, K) is normal in GL(n, K).
The intersection of normal subgroups is again normal, and the product of normal
subgroups is normal.
Lemma 10.1.10. Let N1 , N2 be normal subgroups of the group G. Then the following hold:
(1) N1 ∩ N2 is a normal subgroup of G.
(2) N1 N2 is a normal subgroup of G.
(3) If H is any subgroup of G, then N1 ∩ H is a normal subgroup of H, and N1 H = HN1 .
For (2), for example, if n1 n2 ∈ N1 N2 and g ∈ G, then
g −1 (n1 n2 )g = (g −1 n1 g)(g −1 n2 g) ∈ N1 N2 ,
so N1 N2 is closed under conjugation.
Definition 10.1.11. Let G be an arbitrary group and N a normal subgroup of G. Let G/N
denote the set of distinct left (and hence also right) cosets of N in G. On G/N, define the
multiplication (g1 N)(g2 N) = g1 g2 N for any elements g1 N, g2 N in G/N.
Theorem 10.1.12. Let G be a group and N a normal subgroup of G. Then G/N under the
operation defined above forms a group. This group is called the factor group or quotient
group of G modulo N. The identity element is the coset 1N = N, and the inverse of a coset
gN is g −1 N.
Proof. We first show that the operation on G/N is well defined. Suppose that a′ N = aN
and b′ N = bN, then b′ ∈ bN, and so b′ = bn1 . Similarly a′ = an2 , where n1 , n2 ∈ N.
Therefore,
a′ b′ N = an2 bn1 N = an2 bN.
Since N is normal, n2 b = bn3 for some n3 ∈ N; hence,
an2 bN = abn3 N = abN.
Thus, we have shown that if N ⊲ G, then a′ b′ N = abN, and the operation on G/N is
indeed well defined.
The associative law is true, because coset multiplication as defined above uses the
ordinary group operation, which is by definition associative.
The coset N serves as the identity element of G/N. Notice that
aN ⋅ N = aN 2 = aN,
and
N ⋅ aN = aN 2 = aN.
Finally, since N is normal, the inverse of the coset aN is a−1 N:
aNa−1 N = aa−1 N 2 = N.
We emphasize that the elements of G/N are cosets, and thus subsets of G. We have
|G/N| = [G : N], the number of cosets of N in G; in particular, if |G| < ∞, then |G/N| = |G|/|N|. It is also to be emphasized that for
G/N to be a group, N must be a normal subgroup of G.
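To make the construction concrete, here is a small numerical sketch; the choice G = ℤ12 written additively with N = ⟨3⟩ = {0, 3, 6, 9} is ours, purely for illustration. It checks that the cosets partition G and that the induced coset operation is well defined.

```python
from itertools import product

# Illustrative choice: G = Z_12 under addition mod 12, N = <3> = {0, 3, 6, 9}.
n = 12
N = frozenset({0, 3, 6, 9})

# The distinct cosets a + N; they partition G, so |G/N| = [G : N].
cosets = {frozenset((a + x) % n for x in N) for a in range(n)}
assert len(cosets) == n // len(N)

# The coset operation (a + N) + (b + N) = (a + b) + N is well defined:
# the setwise sum of two cosets is again a single coset.
def add_cosets(c1, c2):
    s = frozenset((x + y) % n for x, y in product(c1, c2))
    assert s in cosets
    return s

identity = N  # the coset N itself is the identity of G/N
for c in cosets:
    assert add_cosets(c, identity) == c and add_cosets(identity, c) == c
```

Here |G/N| = [G : N] = 12/4 = 3, as the first assertion verifies.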
In some cases, properties of G are preserved in factor groups.
Lemma 10.1.13. If G is Abelian, then any factor group of G is also Abelian. If G is cyclic,
then any factor group of G is also cyclic.
Definition 10.1.14. A group G ≠ {1} is simple, provided that N⊲G implies N = G or N = {1}.
One of the most outstanding problems in group theory has been to give a complete
classification of all finite simple groups. In other words, this is the program to discover
all finite simple groups, and to prove that there are no more to be found. This was ac-
complished through the efforts of many mathematicians. The proof of this magnificent
result took thousands of pages. We refer the reader to [30] for a complete discussion of
this. We give one elementary example:
Lemma 10.1.15. Any finite group of prime order is simple and cyclic.
Proof. Suppose that G is a finite group and |G| = p, where p is a prime. Let g ∈ G with
g ≠ 1. Then ⟨g⟩ is a nontrivial subgroup of G, so its order divides the order of G by
Lagrange’s theorem. Since g ≠ 1, and p is a prime, we must have |⟨g⟩| = p. Therefore,
⟨g⟩ is all of G; that is, G = ⟨g⟩; hence, G is cyclic.
The argument above shows that G has no nontrivial proper subgroups and, there-
fore, no nontrivial normal subgroups. Therefore, G is simple.
In the next chapter, we will examine certain other finite simple groups.
That is, the kernel is the set of the elements of G1 that map onto the identity of G2 . The
image of f , denoted im(f ), is the set of elements of G2 mapped onto by f from elements
of G1 . That is, im(f ) = {h ∈ G2 : h = f (g) for some g ∈ G1 }.
Proof. Suppose that f is injective. Since f (1) = 1, we always have 1 ∈ ker(f ). Suppose
that g ∈ ker(f ). Then f (g) = f (1). Since f is injective, this implies that g = 1; hence,
ker(f ) = {1}.
Conversely, suppose that ker(f ) = {1} and f (g1 ) = f (g2 ). Then f (g1 g2−1 ) = f (g1 )(f (g2 ))−1 = 1, so g1 g2−1 ∈ ker(f ) = {1}. Hence, g1 = g2 , and f is injective.
We now state the group isomorphism theorem. This is entirely analogous to the ring
isomorphism theorem replacing ideals by normal subgroups. We note that this theorem
is sometimes called the first group isomorphism theorem.
Theorem 10.2.3 (Group isomorphism theorem). (a) Let G1 and G2 be groups and f : G1 →
G2 a group homomorphism. Then ker(f ) is a normal subgroup of G1 , im(f ) is a sub-
group of G2 , and
G1 / ker(f ) ≅ im(f ).
(b) Conversely, suppose that N is a normal subgroup of a group G. Then there exists a
group H and a homomorphism f : G → H such that ker(f ) = N, and im(f ) = H.
Proof. We first show (a). Since 1 ∈ ker(f ), the kernel is nonempty. Now suppose that
g1 , g2 ∈ ker(f ). Then f (g1 ) = f (g2 ) = 1. It follows that f (g1 g2−1 ) = f (g1 )(f (g2 ))−1 = 1. Hence,
g1 g2−1 ∈ ker(f ); therefore, ker(f ) is a subgroup of G1 . Furthermore, for g ∈ G1 and k ∈ ker(f ), we have
f (g −1 kg) = f (g −1 )f (k)f (g) = (f (g))−1 ⋅ 1 ⋅ f (g) = 1,
using f (g −1 ) = (f (g))−1 , which follows from f (g −1 )f (g) = f (g −1 g) = f (1) = 1. Hence, g −1 kg ∈ ker(f ), and ker(f ) is a normal subgroup of G1 .
Now define f ̂ : G1 / ker(f ) → im(f ) by f ̂(g ker(f )) = f (g).
Suppose that g1 ker(f ) = g2 ker(f ), then g1 g2−1 ∈ ker(f ) so that f (g1 g2−1 ) = 1. This
implies that f (g1 ) = f (g2 ); hence, the map f ̂ is well defined. Now,
f ̂((g1 ker(f ))(g2 ker(f ))) = f ̂(g1 g2 ker(f )) = f (g1 g2 ) = f (g1 )f (g2 ) = f ̂(g1 ker(f ))f ̂(g2 ker(f ));
therefore, f ̂ is a homomorphism. Suppose that f ̂(g1 ker(f )) = f ̂(g2 ker(f )), then it follows
that f (g1 ) = f (g2 ); and hence, g1 ker(f ) = g2 ker(f ). It follows that f ̂ is injective.
Finally, suppose that h ∈ im(f ). Then there exists a g ∈ G1 with f (g) = h. Then
f ̂(g ker(f )) = h, and f ̂ is a surjection onto im(f ). Therefore, f ̂ is an isomorphism com-
pleting the proof of part (a).
Conversely, suppose that N is a normal subgroup of G. Define the map f : G → G/N
by f (g) = gN for g ∈ G. By the definition of the product in the quotient group G/N, it is
clear that f is a homomorphism with im(f ) = G/N. If g ∈ ker(f ), then f (g) = gN = N
since N is the identity in G/N. However, this implies that g ∈ N; hence, it follows that
ker(f ) = N, completing the proof.
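As a quick numerical illustration of part (a), with our own toy example (the additive groups ℤ12 and ℤ4 and the homomorphism f (x) = x mod 4), the following sketch checks that f is constant on each coset of its kernel, so the induced map is well defined, and that the number of cosets equals |im(f )|.

```python
# Toy example (additive groups): f : Z_12 -> Z_4, f(x) = x mod 4.
G1 = range(12)

def f(x):
    return x % 4

# ker(f) = elements mapping to the identity 0 of Z_4.
kernel = [x for x in G1 if f(x) == 0]
assert kernel == [0, 4, 8]

# Cosets of ker(f) in G_1.
cosets = {frozenset((a + k) % 12 for k in kernel) for a in G1}

# f is constant on each coset, so the induced map is well defined ...
for c in cosets:
    assert len({f(x) for x in c}) == 1

# ... and |G_1 / ker(f)| = |im(f)|, as the isomorphism theorem predicts.
assert len(cosets) == len({f(x) for x in G1})
```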
There are two related theorems that are called the second isomorphism theorem
and the third isomorphism theorem.
Theorem 10.2.4 (Second isomorphism theorem). Let U be a subgroup of a group G, and let N be a normal subgroup of G. Then U ∩ N is a normal subgroup of U, and UN/N ≅ U/(U ∩ N).

Proof. From Lemma 10.1.10, we know that U ∩ N is normal in U. We define the map
α : UN → U/(U ∩ N) by α(un) = u(U ∩ N). If un = u′ n′ , then u′ −1 u = n′ n−1 ∈ U ∩ N.
Therefore, u′ (U ∩ N) = u(U ∩ N); hence, the map α is well defined.
Suppose that un, u′ n′ ∈ UN. Since N is normal in G, we have that unu′ n′ ∈ uu′ N.
Hence, unu′ n′ = uu′ n′′ with n′′ ∈ N. Then
α(unu′ n′ ) = α(uu′ n′′ ) = uu′ (U ∩ N).
However, U ∩ N is normal in U, so
u(U ∩ N)u′ (U ∩ N) = uu′ (U ∩ N) = α(un)α(u′ n′ );
hence, α is a homomorphism.
Theorem 10.2.5 (Third isomorphism theorem). Let N and M be normal subgroups of a group G with N ⊆ M. Then M/N is a normal subgroup of G/N, and
(G/N)/(M/N) ≅ G/M.
The isomorphism is induced by the map β : G/N → G/M given by
β(gN) = gM.
Theorem (Correspondence theorem). Let N be a normal subgroup of a group G, and let f : G → G/N be the canonical homomorphism. Then
ϕ : H → f (H),
defined on the subgroups H of G containing N, gives a bijection onto the set of subgroups of G/N.
Proof. We first show that the mapping ϕ is surjective. Let H1 be a subgroup of G/N, and
let
H = {g ∈ G : f (g) ∈ H1 }.
G = G1 × G2 = {(a, b) : a ∈ G1 , b ∈ G2 }.
On G, define the operation
(a1 , b1 )(a2 , b2 ) = (a1 a2 , b1 b2 ) for a1 , a2 ∈ G1 and b1 , b2 ∈ G2 .
With this operation, it is direct to verify the group axioms for G; hence, G becomes a
group.
Theorem 10.3.1. Let G1 , G2 be groups and G the Cartesian product G1 × G2 with the op-
eration defined above. Then G forms a group called the direct product of G1 and G2 . The
identity element is (1, 1), and (g, h)−1 = (g −1 , h−1 ).
This can be iterated to any finite number of groups (also to an infinite number, that
we will not consider here) G1 , . . . , Gn to form the direct product G1 × G2 × ⋅ ⋅ ⋅ × Gn .
For any groups G1 and G2 , we have G1 × G2 ≅ G2 × G1 ; moreover, G1 × G2 is Abelian if and only if both G1 and G2 are Abelian.

Proof. The map (a, b) → (b, a), where a ∈ G1 , b ∈ G2 , provides an isomorphism G1 × G2 →
G2 × G1 .
Suppose that both G1 , G2 are Abelian. Then if a1 , a2 ∈ G1 , b1 , b2 ∈ G2 , we have
(a1 , b1 )(a2 , b2 ) = (a1 a2 , b1 b2 ) = (a2 a1 , b2 b1 ) = (a2 , b2 )(a1 , b1 );
hence, G1 × G2 is Abelian.
Conversely, suppose G1 × G2 is Abelian, and suppose that a1 , a2 ∈ G1 . Then for the
identity 1 ∈ G2 , we have
(a1 a2 , 1) = (a1 , 1)(a2 , 1) = (a2 , 1)(a1 , 1) = (a2 a1 , 1);
hence, a1 a2 = a2 a1 , and G1 is Abelian. Similarly, G2 is Abelian.
The sets H1 = {(a, 1) : a ∈ G1 } and H2 = {(1, b) : b ∈ G2 } are normal subgroups of G = G1 × G2 with H1 ≅ G1 , H2 ≅ G2 , G/H1 ≅ G2 , and G/H2 ≅ G1 .

Proof. Map G1 × G2 onto G2 by (a, b) → b. It is clear that this map is a homomorphism, and
that the kernel is H1 = {(a, 1) : a ∈ G1 }. This establishes that H1 is a normal subgroup of G,
and that G/H1 ≅ G2 . In an identical fashion, we get that G/H2 ≅ G1 . The map (a, 1) → a
provides the isomorphism from H1 onto G1 .
If the factors are finite, it is easy to find the order of G1 × G2 . The size of the Cartesian
product is just the product of the sizes of the factors.
Lemma 10.3.4. If |G1 | and |G2 | are finite, then |G1 × G2 | = |G1 ||G2 |.
Suppose now that G is a group with normal subgroups G1 and G2 such that G = G1 G2 and G1 ∩ G2 = {1}; we show that G ≅ G1 × G2 .

Proof. Since G = G1 G2 , each element of G has the form ab with a ∈ G1 , b ∈ G2 . This representation as ab is unique as G1 ∩ G2 = {1}. We first show that each a ∈ G1 commutes with
each b ∈ G2 . Consider the element aba−1 b−1 . Since G1 is normal, ba−1 b−1 ∈ G1 , which implies that aba−1 b−1 ∈ G1 . Since G2 is normal, aba−1 ∈ G2 , which implies that aba−1 b−1 ∈ G2 .
Therefore, aba−1 b−1 ∈ G1 ∩ G2 = {1}; hence, aba−1 b−1 = 1, so that ab = ba.
Now map G onto G1 × G2 by f (ab) = (a, b). We claim that this is an isomorphism. It
is clearly onto. Now, using ab = ba for a ∈ G1 , b ∈ G2 , we have
f (ab ⋅ a′ b′ ) = f (aa′ ⋅ bb′ ) = (aa′ , bb′ ) = (a, b)(a′ , b′ ) = f (ab)f (a′ b′ ),
so f is a homomorphism. Injectivity follows from the uniqueness of the representation ab; hence, f is an isomorphism.
Theorem 10.4.1 (Basis theorem for finite Abelian groups). Let G be a finite Abelian group.
Then G is a direct product of cyclic groups of prime power order.
Before giving the proof, we give two examples showing how this theorem leads to
the classification of finite Abelian groups.
Since all cyclic groups of order n are isomorphic to (ℤn , +), we will denote a cyclic
group of order n by ℤn .
Example 10.4.2. Classify all Abelian groups of order 60. Let G be an Abelian group of
order 60. From Theorem 10.4.1, G must be a direct product of cyclic groups of prime
power order. Now 60 = 2² ⋅ 3 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order 2, 4, 3,
or 5 (by Lagrange's theorem, their orders must divide 60). Therefore, G must be of the
form
G ≅ ℤ4 × ℤ3 × ℤ5 or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ5 .
Hence, up to isomorphism, there are only two Abelian groups of order 60.
Example 10.4.3. Classify all Abelian groups of order 180. Now 180 = 2² ⋅ 3² ⋅ 5, so the only
primes involved are 2, 3, and 5. Hence, the cyclic groups involved in the direct product
decomposition of G have order 2, 4, 3, 9, or 5 (by Lagrange's theorem, their orders must
divide 180). Therefore, G must be of the form
G ≅ ℤ4 × ℤ9 × ℤ5 , or
G ≅ ℤ2 × ℤ2 × ℤ9 × ℤ5 , or
G ≅ ℤ4 × ℤ3 × ℤ3 × ℤ5 , or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ3 × ℤ5 .
Hence, up to isomorphism, there are exactly four Abelian groups of order 180.
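The counting implicit in these two examples can be automated. By the basis theorem, the number of Abelian groups of order p^e, up to isomorphism, equals the number of partitions of e, and the counts multiply over the prime factorization. The sketch below (function names are ours) reproduces the counts 2 and 4 found above.

```python
def num_partitions(k, largest=None):
    """Number of ways to write k as an unordered sum of positive integers."""
    if largest is None:
        largest = k
    if k == 0:
        return 1
    return sum(num_partitions(k - i, i) for i in range(1, min(k, largest) + 1))

def num_abelian_groups(n):
    """Count Abelian groups of order n up to isomorphism (basis theorem)."""
    total, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        total *= num_partitions(exponent)
        p += 1
    if n > 1:                       # one remaining prime factor
        total *= num_partitions(1)
    return total

print(num_abelian_groups(60), num_abelian_groups(180))  # 2 4
```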
Lemma 10.4.4. Let G be a finite Abelian group, and let p be a prime with p | |G|. Then
all the elements of G whose orders are a power of p form a normal subgroup of G. This
subgroup is called the p-primary component of G, which we will denote by Gp .
Proof. Let p be a prime with p | |G|, and let a and b be two elements of G of order a power
of p. Since G is Abelian, the order of ab is the lcm of the orders, which is again a power
of p. Therefore, ab ∈ Gp . The order of a−1 is the same as the order of a, so a−1 ∈ Gp ;
therefore, Gp is a subgroup.
Lemma 10.4.5. Let G be a finite Abelian group of order n. Suppose that n = p1^e1 ⋅ ⋅ ⋅ pk^ek with
p1 , . . . , pk distinct primes. Then
G ≅ Gp1 × ⋅ ⋅ ⋅ × Gpk ,
Proof. Each Gpi is normal since G is Abelian. Since the pi are distinct primes, an element
whose order is simultaneously a power of pi and of pj with i ≠ j must be trivial, so the
intersection of the Gpi is the identity. Therefore, Lemma 10.4.5 will follow by
showing that each element of G is a product of elements in the Gpi .
Let g ∈ G. Then the order of g is p1^f1 ⋅ ⋅ ⋅ pk^fk for suitable fi ≥ 0. For each i, write this
order as pi^fi mi with (mi , pi ) = 1. Then g^mi has order pi^fi and, hence, lies in Gpi . Now the
integers m1 , . . . , mk have greatest common divisor 1 (a common prime factor would have
to be some pi , but pi does not divide mi ), so there exist c1 , . . . , ck with
c1 m1 + ⋅ ⋅ ⋅ + ck mk = 1;
hence,
g = g^(c1 m1 + ⋅ ⋅ ⋅ + ck mk) = (g^m1 )^c1 ⋅ ⋅ ⋅ (g^mk )^ck ,
a product of elements of Gp1 , . . . , Gpk .
A set of elements g1 , . . . , gn of a finite Abelian group G is called a basis for G if
G = ⟨g1 ⟩ × ⋅ ⋅ ⋅ × ⟨gn ⟩;
that is, G is the direct product of the cyclic subgroups generated by the gi . The basis
theorem for finite Abelian groups says that any finite Abelian group has a basis. Suppose
that G is a finite Abelian group with a basis g1 , . . . , gk so that G = ⟨g1 ⟩ × ⋅ ⋅ ⋅ × ⟨gk ⟩. Since
G is finite, each gi has finite order, say mi . It follows then, from the fact that G is a direct
product, that each g ∈ G can be expressed as
g = g1^n1 ⋅ ⋅ ⋅ gk^nk ,
and, furthermore, the integers n1 , . . . , nk are unique modulo the order of gi . Hence, each
integer ni can be chosen in the range 0, 1, . . . , mi −1, and within this range for the element
g, the integer ni is unique.
From the previous lemma, each finite Abelian group splits into a direct product of
its p-primary components for different primes p. Hence, to complete the proof of the
basis theorem, we must show that any finite Abelian group of order pm for some prime
p has a basis. We call an Abelian group of order pm an Abelian p-group.
Consider an Abelian group G of order pm for a prime p. It is somewhat easier to com-
plete the proof if we consider the group using additive notation. That is, the operation is
considered +, the identity as 0, and powers are given by multiples. Hence, if an element
g ∈ G has order pk , then in additive notation, pk g = 0.
Lemma 10.4.6. Let G be a finite Abelian group of prime power order pn for some prime p.
Then G is a direct product of cyclic groups.
Proof. We use additive notation and induct on the exponent of G. Suppose first that G
has exponent p; that is, pg = 0 for all g ∈ G. Let {g1 , . . . , gk } be a minimal set of generators
of G, and suppose that
m1 g1 + ⋅ ⋅ ⋅ + mk gk = 0 (10.1)
for some set of integers mi . Since the order of each gi is p, as explained above, we may
assume that 0 ≤ mi < p for i = 1, . . . , k. Suppose that one mi ≠ 0.
Then (mi , p) = 1; hence, there exists an xi with mi xi ≡ 1 (mod p) (see Chapter 4).
Multiplying the equation (10.1) by xi , we get modulo p,
m1 xi g1 + ⋅ ⋅ ⋅ + gi + ⋅ ⋅ ⋅ + mk xi gk = 0,
and rearranging
gi = −m1 xi g1 − ⋅ ⋅ ⋅ − mk xi gk .
But then gi can be expressed in terms of the other gj , contradicting the minimality of
the set {g1 , . . . , gk }. Hence, all mi = 0; that is, g1 , . . . , gk are independent and constitute a
basis, and the lemma is true for the exponent p.
Now suppose that any finite Abelian group of exponent p^(n−1) has a basis, and assume
that G has exponent p^n . Consider the set G′ = pG = {pg : g ∈ G}. It is straightforward
that this forms a subgroup (see exercises). Since p^n g = 0 for all g ∈ G, it follows that
p^(n−1) (pg) = 0 for all g ∈ G, and so the exponent of G′ is at most p^(n−1) . By the inductive hypothesis, G′
has a basis
S = {pg1 , . . . , pgk }.
Consider the set {g1 , . . . , gk }, and adjoin to this set the set of all elements h ∈ G, satisfying
ph = 0. Call this set S1 , so that we have
S1 = {g1 , . . . , gk , h1 , . . . , ht }.
We claim that S1 is a set of generators for G. Let g ∈ G. Then pg ∈ G′ = pG, which has the basis
pg1 , . . . , pgk , so that
pg = m1 pg1 + ⋅ ⋅ ⋅ + mk pgk .
Hence,
p(g − m1 g1 − ⋅ ⋅ ⋅ − mk gk ) = 0,
so g − m1 g1 − ⋅ ⋅ ⋅ − mk gk is annihilated by p, and is therefore one of the adjoined elements, say hi . Thus,
g − m1 g1 − ⋅ ⋅ ⋅ − mk gk = hi , so that g = m1 g1 + ⋅ ⋅ ⋅ + mk gk + hi ,
and S1 generates G.
Now suppose that
m1 g1 + ⋅ ⋅ ⋅ + mr gr + n1 h1 + ⋅ ⋅ ⋅ + ns hs = 0 (10.2)
is a dependence relation among elements of S1 . If some coefficient were invertible modulo p, then, exactly as in the exponent-p case, we could solve for the corresponding generator, for instance
gi = −m1 xi g1 − ⋅ ⋅ ⋅ − ns xi hs ,
contradicting minimality. Multiplying (10.2) by p annihilates the hj , since phj = 0, and yields, with ai = mi ,
a1 pg1 + ⋅ ⋅ ⋅ + ar pgr = 0.
The pg1 , . . . , pgr are independent and, hence, ai pgi = 0 for each i; hence, ai = 0. Now (10.2)
becomes
n1 h1 + ⋅ ⋅ ⋅ + ns hs = 0.
For more details see the proof of the general result on modules over principal ideal
domains later in the book. There is also an additional elementary proof for the basis
theorem for finitely generated Abelian groups.
Example 10.5.1. In Example 9.2.6, we saw that the symmetry group of an equilateral
triangle has 6 elements and is generated by elements r and f , which satisfy the relations
r³ = f ² = 1, f −1 rf = r −1 , where r is a rotation of 120∘ about the center of the triangle, and
f is a reflection through an altitude. This was called the dihedral group D3 of order 6.
This can be generalized to any regular n-gon, n > 2. If Pn is a regular n-gon, then
the symmetry group Dn has 2n elements, and is called the dihedral group of order 2n. It
is generated by elements r and f , which satisfy the relations r^n = f ² = 1, f −1 rf = r^(n−1) ,
where r is a rotation of 2π/n about the center of the n-gon, and f is a reflection.
Hence, D4 , the symmetries of a square, has order 8 and D5 , the symmetries of a
regular pentagon, has order 10.
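These counts can be verified computationally by realizing Dn as permutations of the vertices 0, . . . , n − 1 of the n-gon. The sketch below (our own construction, for illustration) closes the set generated by a rotation and a reflection and checks |Dn | = 2n together with the defining relations.

```python
def compose(p, q):
    """(p ∘ q)(x) = p(q(x)); permutations as tuples with p[x] the image of x."""
    return tuple(p[q[x]] for x in range(len(p)))

def power(p, k):
    e = tuple(range(len(p)))
    for _ in range(k):
        e = compose(p, e)
    return e

def dihedral(n):
    r = tuple((i + 1) % n for i in range(n))   # rotation by 2*pi/n
    f = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    elems = {tuple(range(n))}
    frontier = [tuple(range(n))]
    while frontier:                            # close under both generators
        g = frontier.pop()
        for s in (compose(g, r), compose(g, f)):
            if s not in elems:
                elems.add(s)
                frontier.append(s)
    return r, f, elems

for n in (3, 4, 5):
    r, f, elems = dihedral(n)
    e = tuple(range(n))
    assert len(elems) == 2 * n                           # |D_n| = 2n
    assert power(r, n) == e and compose(f, f) == e       # r^n = f^2 = 1
    assert compose(compose(f, r), f) == power(r, n - 1)  # f^{-1} r f = r^{n-1}
```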
Consider the elements ±1, ±i, ±j, ±k with i² = j² = k² = ijk = −1.
These elements then form a group of order 8 called the quaternion group denoted by Q.
Since ijk = −1, we have ij = −ji, and the generators i and j satisfy the relations i⁴ = j⁴ = 1,
i² = j², ij = i²ji.
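The quaternion relations can be checked in a standard 2 × 2 complex matrix representation; this particular representation (i as diag(i, −i) and j as a rotation matrix) is our choice of illustration, not the only one.

```python
# One standard 2x2 complex matrix representation of the quaternion units.
def mul(A, B):
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )

def power(A, k):
    P = ((1, 0), (0, 1))
    for _ in range(k):
        P = mul(P, A)
    return P

E = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))   # the quaternion unit i
J = ((0, 1), (-1, 0))     # the quaternion unit j
K = mul(I, J)             # k = ij

assert power(I, 4) == E and power(J, 4) == E     # i^4 = j^4 = 1
assert power(I, 2) == power(J, 2)                # i^2 = j^2 (= -1)
assert mul(I, J) == mul(power(I, 2), mul(J, I))  # ij = i^2 ji, i.e. ij = -ji
assert mul(mul(I, J), K) == power(I, 2)          # ijk = -1
```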
We now state the main classification, and then prove it in a series of lemmas.

Theorem 10.5.3. Let G be a group of order n ≤ 10. If n = 1, 2, 3, 5, 7, then G is cyclic. If n = 4, then G ≅ ℤ4 or G ≅ ℤ2 × ℤ2 . If n = 6, then G ≅ ℤ6 or G ≅ D3 . If n = 8, then G is isomorphic to one of ℤ8 , ℤ4 × ℤ2 , ℤ2 × ℤ2 × ℤ2 , D4 , or Q. If n = 9, then G ≅ ℤ9 or G ≅ ℤ3 × ℤ3 . If n = 10, then G ≅ ℤ10 or G ≅ D5 .

Recall from Section 10.1 that a finite group of prime order must be cyclic. Hence, in
the theorem, the cases |G| = 2, 3, 5, 7 are handled. We next consider the case where G
has order p², where p is a prime.
Definition 10.5.4. If G is a group, then its center, denoted Z(G), is the set of elements in G
which commute with everything in G. That is,
Z(G) = {z ∈ G : zg = gz for all g ∈ G}.
Lemma 10.5.5. For any group G: (a) Z(G) is a subgroup of G; (b) Z(G) is a normal subgroup of G; (c) if G/Z(G) is cyclic, then G is Abelian.

Proof. (a) and (b) are direct, and we leave them to the exercises. Consider the case
where G/Z(G) is cyclic, generated by gZ(G) for some g ∈ G. Then each coset of Z(G) has the form g^m Z(G).
Let a, b ∈ G. Then since a, b are in cosets of the center, we have a = g^m u and b = g^n v
with u, v ∈ Z(G). Then
ab = g^m ug^n v = g^(m+n) uv = g^n vg^m u = ba,
since u and v commute with every element; hence, G is Abelian.
A p-group is any finite group of prime power order p^k . We need the following lemma;
its proof is based on what is called the class equation, which we will prove in Chapter 13.

Lemma 10.5.6. If G is a finite p-group with G ≠ {1}, then G has a nontrivial center; that is, Z(G) ≠ {1}.
Lemma 10.5.7. If |G| = p² with p a prime, then G is Abelian; hence, we have G ≅ ℤp² or
G ≅ ℤp × ℤp .
Proof. Suppose that |G| = p² . Then from the previous lemma, G has a nontrivial center;
hence, |Z(G)| = p, or |Z(G)| = p² . If |Z(G)| = p² , then G = Z(G), and G is Abelian. If
|Z(G)| = p, then |G/Z(G)| = p. Since p is a prime this implies that G/Z(G) is cyclic; hence,
from Lemma 10.5.5, G is Abelian.
Lemma 10.5.7 handles the cases n = 4 and n = 9. Therefore, if |G| = 4, we must have
G ≅ ℤ4 , or G ≅ ℤ2 × ℤ2 , and if |G| = 9, we must have G ≅ ℤ9 , or G ≅ ℤ3 × ℤ3 .
This leaves n = 6, 8, 10. We next handle the cases 6 and 10.
Lemma 10.5.8. If G is any group, where every nontrivial element has order 2, then G is
Abelian.
Proof. Suppose that g² = 1 for all g ∈ G. This implies that g = g −1 for all g ∈ G. Let a, b
be arbitrary elements of G. Then
ab = (ab)−1 = b−1 a−1 = ba;
hence, G is Abelian.
g³ = h² = 1, h−1 gh = g −1 .
Proof. The proof is almost identical to that for n = 6. Since 10 = 2 ⋅ 5, if G were Abelian,
G ≅ ℤ2 × ℤ5 = ℤ10 .
g 5 = h2 = 1; h−1 gh = g −1 .
This leaves the case n = 8, the most difficult. If |G| = 8, and G is Abelian, then clearly,
G ≅ ℤ8 , or G ≅ ℤ4 ×ℤ2 , or G ≅ ℤ2 ×ℤ2 ×ℤ2 . The proof of Theorem 10.5.3 is then completed
with the following:
If h−1 gh = g, then as in the cases 6 and 10, ⟨g, h⟩ defines an Abelian subgroup of order 8;
hence, G is Abelian. If h−1 gh = g 2 , then
(h−1 gh)² = (g²)² = g⁴ = 1 ⇒ g = h−2 gh² = h−1 g²h = g⁴ ⇒ g³ = 1,
contradicting the fact that g has order 4. Therefore, h−1 gh = g 3 = g −1 . It follows that g,
h define a subgroup of order 8, isomorphic to D4 . Since |G| = 8, this must be all of G and
G ≅ D4 .
Therefore, we may now assume that every element h ∈ G with h ∉ ⟨g⟩ has order 4.
Let h be such an element. Then h2 has order 2, so h2 ∈ ⟨g⟩, which implies that h2 = g 2 .
This further implies that g 2 is central; that is, commutes with everything. Identifying g
with i, h with j, and g 2 with −1, we get that G is isomorphic to Q, completing Lemma 10.5.11
and the proof of Theorem 10.5.3.
In principle, this type of analysis can be used to determine the structure of any finite
group, although it quickly becomes impractical. A major tool in this classification is the
following important result known as the Sylow theorem, which we just state. We will
prove this theorem in Chapter 13. If |G| = p^m n with p a prime and (n, p) = 1, then a
p-Sylow subgroup of G is a subgroup of order p^m .
Theorem 10.5.12 (Sylow theorem). Let |G| = p^m n with p a prime and (n, p) = 1.
(a) G contains a p-Sylow subgroup.
(b) All p-Sylow subgroups of G are conjugate.
(c) Any p-subgroup of G is contained in a p-Sylow subgroup.
(d) The number of p-Sylow subgroups of G is of the form 1 + pk and divides n.
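Parts (a) and (d) can be illustrated by brute force in a group small enough to enumerate all subsets. Taking G = S3 (our choice of example), where |S3 | = 6 = 2 ⋅ 3, each p-Sylow subgroup has order p, and the counts below match the statement: 3 = 1 + 2 ⋅ 1 divides 3, and 1 = 1 + 3 ⋅ 0 divides 2.

```python
from itertools import combinations, permutations

# Brute-force Sylow counts in S_3, where |S_3| = 6 = 2 * 3.
def compose(p, q):
    return tuple(p[q[x]] for x in range(3))

S3 = list(permutations(range(3)))
e = (0, 1, 2)

def is_subgroup(H):
    # A finite subset containing e and closed under the operation is a subgroup.
    return all(compose(a, b) in H for a in H for b in H)

def sylow_count(p):
    # Here each p-Sylow subgroup has order exactly p.
    others = [g for g in S3 if g != e]
    return sum(
        1
        for rest in combinations(others, p - 1)
        if is_subgroup(frozenset((e,) + rest))
    )

assert sylow_count(2) == 3   # 3 = 1 + 2*1, and 3 divides 3
assert sylow_count(3) == 1   # 1 = 1 + 3*0, and 1 divides 2
```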
For a fixed a ∈ G, the inner automorphism ia determined by a is defined by
ia : G → G, ia (x) = axa−1 .
The set of all inner automorphisms of G is denoted by Inn(G).
Theorem 10.6.4. Inn(G) is a normal subgroup of Aut(G); that is, Inn(G) ⊲ Aut(G).
Hence, ker(φ) = Z(G), the center of G. Now, from Theorem 10.2.3, we get the following: G/Z(G) ≅ Inn(G).
Let G be a group and f ∈ Aut(G). If a ∈ G has order n, then f (a) also has order n; if
a ∈ G has infinite order then f (a) also has infinite order.
Example 10.6.6. Let V ≅ ℤ2 × ℤ2 ; that is, V has four elements 1, a, b, and ab with a² =
b² = (ab)² = 1.
V is often called the Klein four group. An automorphism of V permutes the three
elements a, b and ab of order 2, and each permutation of {a, b, ab} defines an automor-
phism of V . Hence, Aut(V ) ≅ S3 .
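This last claim is easy to confirm by brute force. Writing V additively as ℤ2 × ℤ2 (our encoding), the following sketch counts the bijections of V that fix the identity and respect the operation, and finds exactly 6 = |S3 | of them.

```python
from itertools import permutations

V = [(0, 0), (0, 1), (1, 0), (1, 1)]

def add(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

automorphisms = []
for image in permutations(V):
    f = dict(zip(V, image))
    # An automorphism must fix the identity and respect addition.
    if f[(0, 0)] == (0, 0) and all(
        f[add(a, b)] == add(f[a], f[b]) for a in V for b in V
    ):
        automorphisms.append(f)

assert len(automorphisms) == 6  # Aut(V) has 3! = 6 elements, matching S_3
```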
10.7 Exercises
1. Prove that if G is cyclic, then any factor group of G is also cyclic.
2. Prove that for any group G, the center Z(G) is a normal subgroup, and G = Z(G) if
and only if G is Abelian.
3. Let U1 and U2 be subgroups of a group G. Let x, y ∈ G. Show the following:
(i) If xU1 = yU2 , then U1 = U2 .
(ii) Give an example showing that xU1 = U2 x does not imply U1 = U2 .
4. Let U, V be subgroups of a group G. Let x, y ∈ G. Show: if UxV ∩ UyV ≠ ∅, then UxV = UyV .
5. Let N be a cyclic normal subgroup of the group G. Then all subgroups of N are
normal subgroups of G. Give an example to show that the statement is not correct
if N is not cyclic.
6. Let N1 and N2 be normal subgroups of G. Show the following:
(i) If all elements in N1 and N2 have finite order, then so do all elements of N1 N2 .
(ii) Let e1 , e2 ∈ ℕ. If ni^ei = 1 for all ni ∈ Ni (i = 1, 2), then x^(e1 e2 ) = 1 for all x ∈ N1 N2 .
7. Find groups N1 , N2 and G with N1 ⊲ N2 ⊲ G, but N1 is not a normal subgroup of G.
8. Let G be a group generated by a and b and let bab−1 = ar and an = 1 for suitable
r ∈ ℤ, n ∈ ℕ. Show the following:
(i) The subgroup A := ⟨a⟩ is a normal subgroup of G.
(ii) G/A = ⟨bA⟩.
(iii) G = {bj ai : i, j ∈ ℤ}.
9. Prove that any group of order 24 cannot be simple.
10. Let G be a group with subgroups G1 , G2 . Then the following are equivalent:
(i) G ≅ G1 × G2 .
(ii) G1 ⊲ G, G2 ⊲ G, G = G1 G2 , and G1 ∩ G2 = {1}.
(iii) Every g ∈ G has a unique expression g = g1 g2 , where g1 ∈ G1 , g2 ∈ G2 , and
g1 g2 = g2 g1 for each g1 ∈ G1 , g2 ∈ G2 .
11. Suppose that G is a finite group with normal subgroups G1 , G2 such that
(|G1 |, |G2 |) = 1. If |G| = |G1 ||G2 |, then G ≅ G1 × G2 .
12. Let G be a group with normal subgroups G1 and G2 such that G = G1 G2 . Then G/(G1 ∩ G2 ) ≅ (G/G1 ) × (G/G2 ).
1 = ( 1 2 3 / 1 2 3 ), a = ( 1 2 3 / 2 3 1 ), b = ( 1 2 3 / 3 1 2 ),
c = ( 1 2 3 / 2 1 3 ), d = ( 1 2 3 / 3 2 1 ), e = ( 1 2 3 / 1 3 2 ),
in two-row notation, where each symbol in the top row is mapped to the symbol below it. Then
S3 = ⟨a, c; a³ = c² = 1, ac = ca²⟩.
Definition 11.1.1. Suppose that f is a permutation of A = {1, 2, . . . , n}, which has the
following effect on the elements of A: There exists an element a1 ∈ A with f (a1 ) = a2 ,
f (a2 ) = a3 , . . . , f (ak−1 ) = ak , f (ak ) = a1 , and f leaves all other elements (if there are any)
of A fixed; that is, f (aj ) = aj for aj ≠ ai , i = 1, 2, . . . , k. Such a permutation f is called a
cycle or a k-cycle.
We then write f = (a1 , a2 , . . . , ak ).
The cycle notation is read from left to right. It says f takes a1 into a2 , a2 into a3 , et
cetera, and finally ak , the last symbol, into a1 , the first symbol. Moreover, f leaves all the
other elements not appearing in the representation above fixed.
Note that one can write the same cycle in many ways using this type of notation; for
example, f = (a2 , a3 , . . . , ak , a1 ). In fact, any cyclic rearrangement of the symbols gives
the same cycle. The integer k is the length of the cycle. Note we allow a cycle to have
length 1, that is, f = (a1 ), for instance. This is just the identity map. For this reason, we
will usually designate the identity of Sn by (1), or just 1. (Of course, it also could be written
as (ai ), where ai ∈ A.)
If f and g are two cycles, they are called disjoint cycles if the elements moved by
one are left fixed by the other; that is, their representations contain different elements
of the set A (their representations are disjoint as sets).
Lemma 11.1.2. If f and g are disjoint cycles, then they must commute; that is, fg = gf .
Proof. Since the cycles f and g are disjoint, each element moved by f is fixed by g, and
vice versa. First, suppose f (ai ) ≠ ai . This implies that g(ai ) = ai , and f 2 (ai ) ≠ f (ai ).
But since f 2 (ai ) ≠ f (ai ), g(f (ai )) = f (ai ). Thus, (fg)(ai ) = f (g(ai )) = f (ai ), whereas
(gf )(ai ) = g(f (ai )) = f (ai ). Similarly, if g(aj ) ≠ aj , then (fg)(aj ) = (gf )(aj ). Finally, if
f (ak ) = ak and g(ak ) = ak , clearly then, (fg)(ak ) = ak = (gf )(ak ). Thus, gf = fg.
Before proceeding further with the theory, let us consider a specific example. Let
A = {1, 2, . . . , 8}, and let
1 2 3 4 5 6 7 8
f =( ).
2 4 6 5 1 7 3 8
We pick an arbitrary number from the set A, say 1. Then f (1) = 2, f (2) = 4, f (4) = 5,
f (5) = 1. Now select an element from A not in the set {1, 2, 4, 5}, say 3. Then f (3) = 6,
f (6) = 7, f (7) = 3.
Next select any element of A that does not occur in the set {1, 2, 4, 5} ∪ {3, 6, 7}. The
only element left is 8, and f (8) = 8. It is clear that we can now write the permutation f
as a product of cycles:
f = (1, 2, 4, 5)(3, 6, 7)(8),
where the order of the cycles is immaterial since they are disjoint and, therefore, com-
mute. It is customary to omit such cycles as (8) and write f simply as
f = (1, 2, 4, 5)(3, 6, 7)
with the understanding that the elements of A not appearing are left fixed by f .
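The procedure just carried out by hand is mechanical, and a short function (ours, for illustration) reproduces it; permutations are encoded as dicts mapping each point to its image.

```python
def cycle_decomposition(f):
    """Disjoint-cycle decomposition of a permutation given as {i: f(i)}."""
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:          # follow start -> f(start) -> ... back
            seen.add(x)
            cycle.append(x)
            x = f[x]
        cycles.append(tuple(cycle))
    return cycles

# The permutation f from the example above.
f = {1: 2, 2: 4, 3: 6, 4: 5, 5: 1, 6: 7, 7: 3, 8: 8}
print(cycle_decomposition(f))  # [(1, 2, 4, 5), (3, 6, 7), (8,)]
```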
It is not difficult to generalize what was done here for a specific example, and show
that any permutation f can be written uniquely, except for order, as a product of disjoint
cycles. Thus, let f be a permutation on the set A = {1, 2, . . . , n}, and let a1 ∈ A. Let f (a1 ) =
a2 , f 2 (a1 ) = f (a2 ) = a3 , et cetera, and continue until a repetition is obtained. We claim
that this first occurs for a1 ; that is, the first repetition is, say,
f^k (a1 ) = a1 .
For suppose the first repetition occurs at the k-th iterate of f , and
f^k (a1 ) = f^(j−1) (a1 ) = aj for some j;
applying f^(−(j−1)) to both sides gives f^(k−j+1) (a1 ) = a1 . However, k − j + 1 < k if j ≠ 1, and we assumed that the first repetition occurred for k. Thus, j = 1, and so f does cyclically permute the set {a1 , a2 , . . . , ak }.
If k < n, then there exists b1 ∈ A such that b1 ∉ {a1 , a2 , . . . , ak }, and we may proceed
similarly with b1 . We continue in this manner until all the elements of A are accounted
for. It is then seen that f can be written in the form
f = (a1 , a2 , . . . , ak )(b1 , b2 , . . . , bl )(c1 , c2 , . . . , cm ) ⋅ ⋅ ⋅ ,
and so on. Here, by definition, b1 is the smallest element in {1, 2, . . . , n}, which does not
belong to {a1 = f^0 (a1 ) = f^k (a1 ), a2 = f^1 (a1 ), . . . , ak = f^(k−1) (a1 )}; c1 is the smallest element
in {1, 2, . . . , n}, which does not belong to {a1 , . . . , ak } ∪ {b1 , . . . , bl }; and so on.

Theorem 11.1.3. Every permutation f ∈ Sn can be written as a product of disjoint cycles, and this decomposition is unique up to the order of the cycles.
Example 11.1.4. The elements of S3 can be written in cycle notation as 1 = (1), (1, 2), (1, 3),
(2, 3), (1, 2, 3), (1, 3, 2). This is the largest symmetric group, which consists entirely of cy-
cles.
In S4 , for example, the element (1, 2)(3, 4) is not a cycle, but a product of cycles. Sup-
pose we multiply two elements of S3 , say (1, 2) and (1, 3). In forming the product or com-
position here, we read from right to left. Thus, to compute (1, 2)(1, 3): We note the per-
mutation (1, 3) takes 1 into 3, and then the permutation (1, 2) takes 3 into 3. Therefore,
the composite (1, 2)(1, 3) takes 1 into 3. Continuing the permutation, (1, 3) takes 3 into 1,
and then the permutation (1, 2) takes 1 into 2. Therefore, the composite (1, 2)(1, 3) takes 3
into 2. Finally, (1, 3) takes 2 into 2, and then (1, 2) takes 2 into 1. So (1, 2)(1, 3) takes 2 into 1.
Thus, we see (1, 2)(1, 3) = (1, 3, 2).
As another example of this cycle multiplication consider (1, 2)(2, 4, 5)(1, 3)(1, 2, 5)
in S5 :
Reading from right to left 1 → 2 → 2 → 4 → 4 so 1 → 4. Now 4 → 4 → 4 → 5 → 5
so 4 → 5. Next 5 → 1 → 3 → 3 → 3 so 5 → 3. Then 3 → 3 → 1 → 1 → 2 so 3 → 2.
Finally, 2 → 5 → 5 → 2 → 1, so 2 → 1. Since all the elements of A = {1, 2, 3, 4, 5} have
been accounted for, we have (1, 2)(2, 4, 5)(1, 3)(1, 2, 5) = (1, 4, 5, 3, 2).
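Right-to-left cycle multiplication is also easy to mechanize; the sketch below (helper names are ours) recomputes the product just worked out in S5.

```python
from functools import reduce

def cycle_to_map(cycle, n=5):
    """The permutation of {1, ..., n} given by one cycle."""
    f = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        f[a] = b
    return f

def compose(f, g):
    """(f g)(x) = f(g(x)): the right factor g is applied first."""
    return {x: f[g[x]] for x in g}

cycles = [(1, 2), (2, 4, 5), (1, 3), (1, 2, 5)]
product = reduce(compose, [cycle_to_map(c) for c in cycles])
assert product == cycle_to_map((1, 4, 5, 3, 2))  # matches the hand computation
```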
From Theorem 11.1.3, any permutation can be written in terms of cycles, but from the
above, any cycle can be written as a product of transpositions. Thus, we have the following result: every permutation in Sn , n ≥ 2, can be written as a product of transpositions.
Now, for a permutation f written as a product of disjoint cycles of lengths k, j, . . . , t, set
W (f ) = (k − 1) + (j − 1) + ⋅ ⋅ ⋅ + (t − 1).
Suppose now that f is represented as a product of disjoint cycles, where we include all
the 1-cycles of elements of A, which f fixes, if any. If a and b occur in the same cycle in
this representation for f ,
f = ⋅ ⋅ ⋅ (a, b1 , . . . , bk , b, c1 , . . . , ct ) ⋅ ⋅ ⋅ ,
then, in the computation of W (f ), this cycle contributes k + t + 1. Now consider (a, b)f .
Since the cycles are disjoint and disjoint cycles commute, we may assume that the cycle in question appears first, so that (a, b) acts on
(a, b1 , . . . , bk , b, c1 , . . . , ct ).
A direct computation gives (a, b)(a, b1 , . . . , bk , b, c1 , . . . , ct ) = (a, b1 , . . . , bk )(b, c1 , . . . , ct ), so in (a, b)f this cycle splits into two cycles, and its contribution to W drops from k + t + 1 to k + t. If instead a and b occur in different cycles of f , the same computation read backwards shows that the two cycles merge, and the contribution rises by 1. In either case,
W ((a, b)f ) = W (f ) ± 1.
Then, writing f as a product of m transpositions and multiplying f by these transpositions one at a time reduces f to the identity. Iterating this, together with the fact that W (1) = 0, shows that
W (f ) + (±1) + (±1) + ⋅ ⋅ ⋅ + (±1) = 0,
and hence
W (f ) = (±1) + (±1) + ⋅ ⋅ ⋅ + (±1),
m times.
Note, if exactly p are + and q = m − p are −, then m = p + q, and W (f ) = p − q. Hence,
m ≡ W (f ) (mod 2). Thus, W (f ) is even if and only if m is even, and this completes the
proof.
It now makes sense to state the following definition since we know that the parity
is indeed unique:
Definition 11.2.3. For n ≥ 2 we define the sign function sgn : Sn → (ℤ2 , +) by setting
sgn(π) = 0 if π is an even permutation and sgn(π) = 1 if π is an odd permutation.
We note that if f and g are even permutations, then so are fg and f −1 and also the
identity permutation is even. Furthermore, if f is even and g is odd, it is clear that fg is
odd. From this it is straightforward to establish the following:
Lemma 11.2.4. The map sgn is a homomorphism from Sn , for n ≥ 2, onto (ℤ2 , +).
We now let
An = {π ∈ Sn : sgn(π) = 0}.
Theorem 11.2.5. For each n ∈ ℕ, n ≥ 2, the set An forms a normal subgroup of index 2 in
Sn , called the alternating group on n symbols. Furthermore, |An | = n!/2.
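Both the homomorphism property of sgn and the count |An | = n!/2 can be confirmed numerically for small n. The sketch below computes the parity from the cycle decomposition (a k-cycle being a product of k − 1 transpositions) and checks the case n = 4.

```python
from itertools import permutations

def sgn(p):
    """Parity of a permutation given as a tuple with p[i] the image of i."""
    seen, parity = set(), 0
    for i in range(len(p)):
        if i in seen:
            continue
        length, j = 0, i
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        parity += length - 1     # a k-cycle is k - 1 transpositions
    return parity % 2

def compose(p, q):
    return tuple(p[q[x]] for x in range(len(p)))

S4 = list(permutations(range(4)))
A4 = [p for p in S4 if sgn(p) == 0]
assert len(A4) == len(S4) // 2   # |A_4| = 4!/2 = 12

# sgn is a homomorphism into (Z_2, +).
assert all(
    sgn(compose(p, q)) == (sgn(p) + sgn(q)) % 2 for p in S4 for q in S4
)
```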
have the same cycle structure. In particular, if π1 , π2 are two permutations in Sn , then
π1 , π2 are conjugates if and only if they have the same cycle structure. Therefore, in S8 ,
for example, any two permutations of the same cycle structure are conjugates.
Lemma 11.3.1. Let
π = (a11 , a12 , . . . , a1k1 ) ⋅ ⋅ ⋅ (as1 , as2 , . . . , asks )
be the cycle decomposition of π ∈ Sn . Let τ ∈ Sn , and denote the image of aij under τ by aij^τ .
Then
τπτ −1 = (a11^τ , a12^τ , . . . , a1k1^τ ) ⋅ ⋅ ⋅ (as1^τ , as2^τ , . . . , asks^τ ).
Proof. Consider a11^τ ; then, operating on the left like functions, we have
τπτ −1 (a11^τ ) = τπ(a11 ) = τ(a12 ) = a12^τ .
The same computation then follows for all the symbols aij , proving the lemma.
Theorem 11.3.2. Two permutations π1 , π2 ∈ Sn are conjugates if and only if they are of
the same cycle structure.
Proof. Suppose that π2 = τπ1 τ −1 . Then, from Lemma 11.3.1, we have that π1 and π2 are
of the same cycle structure.
Conversely, suppose that π1 and π2 are of the same cycle structure. Let
where we place the cycles of the same length under each other. Let τ be the permutation
in Sn that maps each symbol in π1 to the digit below it in π2 . Then, from Lemma 11.3.1,
we have τπ1 τ −1 = π2 ; hence, π1 and π2 are conjugate.
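Theorem 11.3.2 can be verified by brute force for a small case. The sketch below (helper names are ours) checks that two permutations of S4 are conjugate exactly when they have the same cycle structure:

```python
from itertools import permutations

def compose(f, g):
    # (f ∘ g)(i) = f(g(i))
    return tuple(f[g[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f):
        inv[v] = i
    return tuple(inv)

def cycle_type(p):
    # Sorted tuple of cycle lengths, counting fixed points as 1-cycles.
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, k = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                k += 1
            lengths.append(k)
    return tuple(sorted(lengths))

S4 = list(permutations(range(4)))
conjugate = lambda t, p: compose(compose(t, p), inverse(t))  # τπτ⁻¹

for p1 in S4:
    for p2 in S4:
        are_conjugate = any(conjugate(t, p1) == p2 for t in S4)
        assert are_conjugate == (cycle_type(p1) == cycle_type(p2))
```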
Case (3): (a, b)(c, d) = (a, b)(b, c)(b, c)(c, d) = (c, a, b)(c, d, b)
since (b, c)(b, c) = 1. Therefore, it is also true here, proving the theorem.
Now our main result: for n ≥ 5, the alternating group An is simple.
Proof. Let N ≠ {1} be a normal subgroup of An ; we must show that N = An . We claim
first that N must contain a 3-cycle. Let 1 ≠ π ∈ N, then π is not a transposition since
π ∈ An . Therefore, π moves at least 3 digits. If π moves exactly 3 digits, then it
is a 3-cycle, and we are done. Suppose then that π moves at least 4 digits. Let π = τ1 ⋅ ⋅ ⋅ τr
with τi disjoint cycles.
Case (1): There is a τi = (. . . , a, b, c, d). Set σ = (a, b, c) ∈ An . Then
However, from Lemma 11.3.1, (b, c, d) = (aτi , bτi , cτi ). Furthermore, since π ∈ N and N is
normal, we have
and
Proof. Let
π = ( 1 2 a3 ⋅ ⋅ ⋅ an
      1 2 3 ⋅ ⋅ ⋅ n ) .
Then, by Lemma 11.3.1, π(1, 2, a3 , . . . , an )π −1 = (1, 2, 3, . . . , n).
Furthermore, π(1, 2)π −1 = (1, 2). Hence, U1 = πUπ −1 contains (1, 2) and (1, 2, . . . , n).
Now we have
Analogously,
and so on until
Proof. Suppose, without loss of generality, that τ = (1, 2). Since α, α2 , . . . , αp−1 are p-cycles
with no fixed points (recall that p is a prime number), there exists an i with αi (1) = 2.
Without loss of generality, we may assume that α = (1, 2, a3 , . . . , ap ). Now the result fol-
lows from Theorem 11.4.3.
11.5 Exercises
1. Show that for n ≥ 3, the group An is generated by {(1, 2, k) : k ≥ 3}.
2. Let σ ∈ Sn be a product of disjoint cycles of lengths k1 , . . . , ks . Show that the order of σ is the least
common multiple of k1 , . . . , ks . Compute the order of
τ = ( 1 2 3 4 5 6 7
      2 6 5 1 3 4 7 ) ∈ S7 .
3. Let G = S4 .
(i) Determine a noncyclic subgroup H of order 4 of G.
(ii) Show that H is normal.
(iii) Show that f (g)(h) := ghg −1 defines an epimorphism f : G → Aut(H) for g ∈ G
and h ∈ H. Determine its kernel.
4. Show that all subgroups of order 6 of S4 are conjugate.
5. Let σ1 = (1, 2)(3, 4) and σ2 = (1, 3)(2, 4) ∈ S4 . Determine τ ∈ S4 such that τσ1 τ −1 = σ2 .
6. Let σ = (a1 , . . . , ak ) ∈ Sn . Describe σ −1 .
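The claim in Exercise 2 can be tested computationally. This Python sketch (with our own helper names) confirms over all of S5 that the order of a permutation is the least common multiple of its disjoint-cycle lengths:

```python
from itertools import permutations
from math import lcm

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(f)))

def order(p):
    # Smallest k >= 1 with p^k equal to the identity.
    e = tuple(range(len(p)))
    q, k = p, 1
    while q != e:
        q = compose(q, p)
        k += 1
    return k

def cycle_lengths(p):
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, k = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                k += 1
            lengths.append(k)
    return lengths

# Check the claim over all of S5.
for p in permutations(range(5)):
    assert order(p) == lcm(*cycle_lengths(p))
```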
12 Solvable Groups
12.1 Solvability and Solvable Groups
The original motivation for Galois theory grew out of a famous problem in the theory of
equations. This problem was to determine the solvability or insolvability of a polynomial
equation of degree 5 or higher in terms of a formula involving the coefficients of the
polynomial and only using algebraic operations and radicals. This question arose out of
the well-known quadratic formula.
The ability to solve quadratic equations and, in essence, the quadratic formula was
known to the Babylonians some 3600 years ago. With the discovery of imaginary num-
bers, the quadratic formula then says that any second degree polynomial over ℂ can
be solved by radicals in terms of the coefficients. In the sixteenth century, the Italian
mathematician, Niccolo Tartaglia, discovered a similar formula in terms of radicals to
solve cubic equations. This cubic formula is now known erroneously as Cardano’s for-
mula in honor of Cardano, who first published it in 1545. An earlier special version of
this formula was discovered by Scipione del Ferro. Cardano’s student, Ferrari, extended
the formula to solutions by radicals for fourth degree polynomials. The combination of
these formulas says that polynomial equations of degree four or less over the complex
numbers can be solved by radicals.
From Cardano’s work until the very early nineteenth century, attempts were made
to find similar formulas for degree five polynomials. In 1805, Ruffini proved that fifth de-
gree polynomial equations are insolvable by radicals in general. Therefore, there exists
no comparable formula for degree 5. Abel (in 1825–1826) and Galois (in 1831) extended
Ruffini’s result and proved the insolubility by radicals for all degrees five or greater. In
doing this, Galois developed a general theory of field extensions and its relationship to
group theory. This has come to be known as Galois theory and is really the main focus
of this book.
The solution of the insolvability of the quintic and higher polynomials involved a
translation of the problem into a group theory setting. For a polynomial equation to
be solvable by radicals, its corresponding Galois group (a concept we will introduce in
Chapter 16) must be a solvable group. This is a group with a certain defined structure. In
this chapter, we introduce and discuss this class of groups.
A normal series for a group G is a finite chain of subgroups beginning with G and
ending with the identity subgroup {1}
G = G0 ⊃ G1 ⊃ G2 ⊃ ⋅ ⋅ ⋅ ⊃ Gn−1 ⊃ Gn = {1},
in which each Gi+1 is a proper normal subgroup of Gi . The factor groups Gi /Gi+1 are
called the factors of the series, and n is the length of the series.
Definition 12.1.1. A group G is solvable if it has a normal series with Abelian factors;
that is, Gi /Gi+1 is Abelian for all i = 0, 1, . . . , n − 1. Such a normal series is called a solvable
series.
For example, consider the series
S3 ⊃ A3 ⊃ {1}.
Since |S3 | = 6, we have |A3 | = 3; hence, A3 is cyclic and therefore Abelian. Furthermore,
|S3 /A3 | = 2; hence, the factor group S3 /A3 is also cyclic, thus Abelian. Therefore, the
series above gives a solvable series for S3 .
Lemma 12.1.2. If G is a finite solvable group, then G has a normal series with cyclic fac-
tors.
Proof. If G is a finite solvable group, then by definition, it has a normal series with
Abelian factors. Hence, to prove the lemma, it suffices to show that a finite Abelian group
has a normal series with cyclic factors. Let A be a nontrivial finite Abelian group. We do
an induction on the order of A. If |A| = 2, then A itself is cyclic, and the result follows.
Suppose that |A| > 2. Choose an 1 ≠ a ∈ A. Let N = ⟨a⟩ so that N is cyclic. Then we have
the normal series A ⊃ N ⊃ {1} with A/N Abelian. Moreover, A/N has order less than
that of A, so by induction A/N has a normal series with cyclic factors. Pulling this series
back to A via the correspondence theorem, and appending the cyclic subgroup N ⊃ {1},
gives a normal series for A with cyclic factors, and the result follows.
Theorem 12.1.3. Let G be a solvable group. Then every subgroup of G is solvable, and for
every normal subgroup N of G, the factor group G/N is solvable.
Proof. (1) Suppose that
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}
is a solvable series for G. Hence, Gi+1 is a normal subgroup of Gi for each i, and the factor
group Gi /Gi+1 is Abelian.
Now let H be a subgroup of G, and consider the chain of subgroups
H = H ∩ G0 ⊃ H ∩ G1 ⊃ ⋅ ⋅ ⋅ ⊃ H ∩ Gr = {1}.
Since Gi+1 is normal in Gi , we know that H ∩ Gi+1 is normal in H ∩ Gi ; this gives a finite
normal series for H. Furthermore, from the second isomorphism theorem, we have
(H ∩ Gi )/(H ∩ Gi+1 ) ≅ (H ∩ Gi )Gi+1 /Gi+1 ⊂ Gi /Gi+1
for each i. However, Gi /Gi+1 is Abelian, so each factor in the normal series for H is
Abelian. Therefore, the above series is a solvable series for H; hence, H is also solvable.
(2) Let N be a normal subgroup of G. Then from (1) N is also solvable. As above, let
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}
be a solvable series for G, and consider the chain of subgroups
G/N = G0 N/N ⊃ G1 N/N ⊃ ⋅ ⋅ ⋅ ⊃ Gr N/N = {1}.
It follows that Gi+1 N is normal in Gi N for each i; therefore, the series for G/N is a normal
series.
Again, from the isomorphism theorems,
(Gi N/N)/(Gi+1 N/N) ≅ Gi N/Gi+1 N ≅ Gi /(Gi ∩ Gi+1 N) ≅ (Gi /Gi+1 )/((Gi ∩ Gi+1 N)/Gi+1 ).
However, the last group (Gi /Gi+1 )/((Gi ∩ Gi+1 N)/Gi+1 ) is a factor group of the group
Gi /Gi+1 , which is Abelian. Hence, this last group is also Abelian; therefore, each factor
in the normal series for G/N is Abelian. Hence, this series is a solvable series, and G/N
is solvable.
The following is a type of converse of the above theorem:
Theorem 12.1.4. Let G be a group and N a normal subgroup of G. If both N and G/N are
solvable, then G is solvable.
Proof. Let
N = N0 ⊃ N1 ⊃ ⋅ ⋅ ⋅ ⊃ Nr = {1}
be a solvable series for N, and let
G/N = G0 /N ⊃ G1 /N ⊃ ⋅ ⋅ ⋅ ⊃ Gs /N = N/N = {1}
be a solvable series for G/N, where each Gi is a subgroup of G containing N. Then
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs = N ⊃ N1 ⊃ ⋅ ⋅ ⋅ ⊃ Nr = {1}
gives a normal series for G. Furthermore, from the isomorphism theorems again,
(Gi /N)/(Gi+1 /N) ≅ Gi /Gi+1 ;
hence, each factor is Abelian. Therefore, this is a solvable series for G; hence, G is solv-
able.
This theorem allows us to prove that solvability is preserved under direct products.
Corollary 12.1.5. Let G and H be solvable groups. Then their direct product G × H is also
solvable.
Proof. Suppose that G and H are solvable groups and K = G × H. Recall from Chapter 10
that G can be considered as a normal subgroup of K with K/G ≅ H. Therefore, G is a solv-
able subgroup of K, and K/G is a solvable quotient. It follows then, from Theorem 12.1.4,
that K is solvable.
We saw that the symmetric group S3 is solvable. However, the following theorem
shows that the symmetric group Sn is not solvable for n ≥ 5. This result will be crucial
to the proof of the insolvability of the quintic and higher polynomials.
Theorem 12.1.6. For n ≥ 5, the symmetric group Sn is not solvable.
Proof. For n ≥ 5, we saw that the alternating group An is simple and non-Abelian. Since
An is simple, its only normal series is An ⊃ {1}, and its factor An is non-Abelian, so An
has no solvable series.
Therefore, An is not solvable. If Sn were solvable for n ≥ 5, then from Theorem 12.1.3,
An would also be solvable. Therefore, Sn must also be nonsolvable for n ≥ 5.
Lemma 12.1.7. If a group G is both simple and solvable, then G is cyclic of prime order.
Proof. Suppose that G is a nontrivial simple, solvable group. Since G is simple, the only
normal series for G is G = G0 ⊃ {1}. Since G is solvable, the factors are Abelian; hence,
G is Abelian. Again, since G is simple, G must be cyclic. If G were infinite, then G ≅
(ℤ, +). However, then 2ℤ is a proper normal subgroup, a contradiction. Therefore, G
must be finite cyclic. If the order were not prime, then for each proper divisor of the
order, there would be a nontrivial proper normal subgroup. Therefore, G must be of
prime order.
In general, a finite p-group is solvable.
Definition 12.2.1. For a group G, let G′ be the subgroup of G generated by the set of all
commutators [x, y] = x −1 y−1 xy:
G′ = gp({[x, y] : x, y ∈ G}).
G′ is called the commutator subgroup, or derived group, of G.
Theorem 12.2.2. For any group G, the commutator subgroup G′ is a normal subgroup of
G, and G/G′ is Abelian. Furthermore, if H is a normal subgroup of G, then G/H is Abelian
if and only if G′ ⊂ H.
Proof. The commutator subgroup G′ consists of all finite products of commutators and
inverses of commutators. However,
[x, y]−1 = (x −1 y−1 xy)−1 = y−1 x −1 yx = [y, x],
and so the inverse of a commutator is once again a commutator. It then follows that G′ is
precisely the set of all finite products of commutators; that is, G′ is the set of all elements
of the form
h1 h2 ⋅ ⋅ ⋅ hn ,
where each hi is a commutator. Since g −1 [x, y]g = [g −1 xg, g −1 yg] for any g ∈ G, conjugates
of commutators are again commutators; hence, G′ is a normal subgroup of G. Moreover,
for a, b ∈ G, we have ab = ba[a, b], so that
(aG′ )(bG′ ) = abG′ = ba[a, b]G′ = baG′ = (bG′ )(aG′ ),
since [a, b] ∈ G′ . In other words, any two elements of G/G′ commute; therefore, G/G′ is
Abelian.
Now let N be a normal subgroup of G with G/N Abelian. Let a, b ∈ G; then aN and
bN commute since G/N is Abelian. Therefore,
abN = baN, so that [a, b] = a−1 b−1 ab ∈ N.
Since a, b were arbitrary, G′ ⊂ N. Conversely, if G′ ⊂ N, then the computation above,
with N in place of G′ , shows that G/N is Abelian.
From the second part of Theorem 12.2.2, we see that G′ is the minimal normal sub-
group of G such that G/G′ is Abelian. We call G/G′ = Gab the Abelianization of G.
We consider next the following inductively defined sequence of subgroups of an
arbitrary group G called the derived series:
Definition 12.2.3. For an arbitrary group G, define G(0) = G and G(1) = G′ , and then,
inductively, G(n+1) = (G(n) )′ . That is, G(n+1) is the commutator subgroup or derived group
of G(n) . The chain of subgroups
G = G(0) ⊃ G(1) ⊃ G(2) ⊃ ⋅ ⋅ ⋅
is called the derived series of G.
Notice that since G(i+1) is the commutator subgroup of G(i) , we have G(i) /G(i+1) is
Abelian. If the derived series were finite, that is, if G(n) = {1} for some n, then G would
have a normal series with Abelian factors; hence, G would be solvable. The converse is
also true and characterizes solvable
groups in terms of the derived series.
Theorem 12.2.4. A group G is solvable if and only if its derived series is finite. That is,
there exists an n such that G(n) = {1}.
Proof. If G(n) = {1} for some n, then as explained above, the derived series provides a
solvable series for G; hence, G is solvable. Conversely, suppose that G is solvable, and let
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}
be a solvable series for G. We claim first that Gi ⊃ G(i) for all i. We do this by induction
on r. If r = 0, then G = G0 = G(0) . Suppose that Gi ⊃ G(i) . Then Gi′ ⊃ (G(i) )′ = G(i+1) . Since
Gi /Gi+1 is Abelian, it follows, from Theorem 12.2.2, that Gi+1 ⊃ Gi′ . Therefore, Gi+1 ⊃ G(i+1) ,
establishing the claim. Now if G is solvable, from the claim, we have that Gr ⊃ G(r) .
However, Gr = {1}; therefore, G(r) = {1}, proving the theorem.
The length of the derived series is called the solvability length of a solvable group G.
The class of solvable groups of class c consists of those solvable groups of solvability
length c, or less.
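The derived series of Theorem 12.2.4 can be computed directly for a small group. The following sketch (closure and derived are our own helper names) computes the derived series of S4, exhibiting the solvable series S4 ⊃ A4 ⊃ V ⊃ {1}:

```python
from itertools import permutations

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f):
        inv[v] = i
    return tuple(inv)

def closure(gens, n):
    # Subgroup generated by gens; for a finite set of permutations, closure
    # under products already yields inverses and the identity.
    G = {tuple(range(n))} | set(gens)
    while True:
        new = {compose(a, b) for a in G for b in G} - G
        if not new:
            return G
        G |= new

def derived(G, n):
    # G' = subgroup generated by all commutators [a, b] = a^-1 b^-1 a b.
    comms = {compose(compose(inverse(a), inverse(b)), compose(a, b))
             for a in G for b in G}
    return closure(comms, n)

G = set(permutations(range(4)))  # S4
sizes = [len(G)]
while len(G) > 1:
    G = derived(G, 4)
    sizes.append(len(G))
print(sizes)  # [24, 12, 4, 1]: the derived series S4 ⊃ A4 ⊃ V ⊃ {1}
```

Here V is the Klein four-group of double transpositions, so S4 has solvability length 3.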
12.3 Composition Series and the Jordan–Hölder Theorem
If
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs = {1} and G = H0 ⊃ H1 ⊃ ⋅ ⋅ ⋅ ⊃ Ht = {1}
are two normal series for the group G, then the second is a refinement of the first if
all the terms of the first occur in the second series. Furthermore, two normal series
are called equivalent (or isomorphic) if there exists a 1–1 correspondence between the
factors of the two series (hence the lengths must be the same) such that the corresponding
factors are isomorphic.
Theorem 12.3.1 (Schreier’s theorem). Any two normal series for a group G have equiva-
lent refinements.
Proof. Let
G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs−1 ⊃ Gs = {1},
G = H0 ⊃ H1 ⊃ ⋅ ⋅ ⋅ ⊃ Ht−1 ⊃ Ht = {1}
be the two given normal series. Now define
Gij = (Gi ∩ Hj )Gi+1 for 0 ≤ j ≤ t, and Hji = (Gi ∩ Hj )Hj+1 for 0 ≤ i ≤ s.
Then we have
Gi = Gi0 ⊃ Gi1 ⊃ ⋅ ⋅ ⋅ ⊃ Git = Gi+1
and
Hj = Hj0 ⊃ Hj1 ⊃ ⋅ ⋅ ⋅ ⊃ Hjs = Hj+1 .
Now, applying the third isomorphism theorem to the groups Gi , Hj , Gi+1 , Hj+1 , we have
that Gi(j+1) = (Gi ∩ Hj+1 )Gi+1 is a normal subgroup of Gij = (Gi ∩ Hj )Gi+1 , and also that
Hj(i+1) = (Gi+1 ∩ Hj )Hj+1 is a normal subgroup of Hji = (Gi ∩ Hj )Hj+1 . Furthermore,
Gij /Gi(j+1) ≅ Hji /Hj(i+1) .
Thus, the above two are normal series, which are refinements of the two given series,
and they are equivalent.
Definition 12.3.2. A composition series for a group G is a normal series, where all the
inclusions are proper and such that Gi+1 is maximal in Gi . Equivalently, a normal series,
where each factor is simple.
It is possible that an arbitrary group does not have a composition series, or even if
it does have one, a subgroup of it may not have one. Of course, a finite group does have
a composition series.
In the case in which a group G does have a composition series, the following impor-
tant theorem, called the Jordan–Hölder theorem, provides a type of unique factoriza-
tion.
Theorem 12.3.3 (Jordan–Hölder theorem). If a group G has a composition series, then any
two composition series are equivalent; that is, the composition factors are unique.
Proof. Suppose we are given two composition series. Applying Theorem 12.3.1, we get
that the two composition series have equivalent refinements. But the only refinement
of a composition series is one obtained by introducing repetitions. If in the 1–1 corre-
spondence between the factors of these refinements, the paired factors equal to {e} are
disregarded; that is, if we drop the repetitions, clearly, we get that the original composi-
tion series are equivalent.
We remarked in Chapter 10 that the simple groups are important, because they play
a role in finite group theory somewhat analogous to that of the primes in number theory.
In particular, an arbitrary finite group G can be broken down into simple components.
These uniquely determined simple components are, according to the Jordan–Hölder the-
orem, the factors of a composition series for G.
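As a small illustration of the Jordan–Hölder theorem, two different composition series of the cyclic group ℤ12 yield the same multiset of composition-factor orders. The chains below are recorded by subgroup order (a convention of ours for this sketch):

```python
from collections import Counter

def factor_sizes(chain):
    # Multiset of composition-factor orders for a chain of subgroup orders.
    return Counter(a // b for a, b in zip(chain, chain[1:]))

# Two composition series of Z_12, by subgroup orders:
# Z12 ⊃ ⟨2⟩ ⊃ ⟨4⟩ ⊃ {0}  and  Z12 ⊃ ⟨3⟩ ⊃ ⟨6⟩ ⊃ {0}.
s1 = [12, 6, 3, 1]
s2 = [12, 4, 2, 1]
print(factor_sizes(s1) == factor_sizes(s2))  # True: factors {2, 2, 3} in both
```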
12.4 Exercises
1. Let K be a field and
G = {( a x y
       0 b z
       0 0 c ) : a, b, c, x, y, z ∈ K, abc ≠ 0} .
13 Group Actions and the Sylow Theorems
Let G be a group and A a nonempty set. We say that G acts on A if to each g ∈ G there
corresponds a map
πg : A → A
such that
(1) πg1 (πg2 (a)) = πg1 g2 (a) for all g1 , g2 ∈ G and for all a ∈ A,
(2) π1 (a) = a for all a ∈ A.
For the remainder of this chapter, if g ∈ G and a ∈ A, we will write ga for πg (a). Group
actions are an extremely important idea, and we use this idea in the present chapter to
prove several fundamental results in group theory. If G acts on the set A, then we say
that two elements a1 , a2 ∈ A are congruent under G if there exists a g ∈ G with ga1 = a2 .
The set
Ga = {ga : g ∈ G}
is called the orbit of a under G.
Lemma 13.1.1. If G acts on the set A, then congruence under G is an equivalence relation
on A.
Proof. Any element a ∈ A is congruent to itself via the identity map; hence, the relation
is reflexive. If a1 ∼ a2 so that ga1 = a2 for some g ∈ G, then g −1 a2 = a1 , and so a2 ∼ a1 ,
and the relation is symmetric. Finally, if g1 a1 = a2 and g2 a2 = a3 , then g2 g1 a1 = a3 , and
the relation is transitive.
Recall that the equivalence classes under an equivalence relation partition a set.
For a given a ∈ A, its equivalence class under this relation is precisely its orbit Ga , as
defined above.
Corollary 13.1.2. If G acts on the set A, then the orbits under G partition the set A.
We say that G acts transitively on A if any two elements of A are congruent under G.
That is, the action is transitive if for any a1 , a2 ∈ A there is some g ∈ G such that ga1 = a2 .
If a ∈ A, the stabilizer of a consists of those g ∈ G that fix a. Hence,
StabG (a) = {g ∈ G : ga = a}.
Lemma 13.1.3. If G acts on A, then for any a ∈ A, the stabilizer StabG (a) is a subgroup of G.
Theorem 13.1.4. Suppose that G acts on A and a ∈ A. Let Ga be the orbit of a under G and
StabG (a) its stabilizer. Then
|G : StabG (a)| = |Ga |.
That is, the size of the orbit of a is the index of its stabilizer in G.
Proof. Suppose that g1 , g2 ∈ G with g1 StabG (a) = g2 StabG (a); that is, they define the
same left coset of the stabilizer. Then g2−1 g1 ∈ StabG (a). This implies that g2−1 g1 a = a so
that g2 a = g1 a. Hence, any two elements in the same left coset of the stabilizer produce
the same image of a in Ga . Conversely, if g1 a = g2 a, then g1 , g2 define the same left coset
of StabG (a). This shows that there is a one-to-one correspondence between left cosets of
StabG (a) and elements of Ga . It follows that the size of Ga is precisely the index of the
stabilizer.
We will use this theorem repeatedly with different group actions to obtain impor-
tant group theoretic results.
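Theorem 13.1.4 is easy to test numerically. The sketch below checks the orbit–stabilizer identity for S4 acting naturally on {0, 1, 2, 3}, and for a 3-cycle subgroup (a construction of our own) with two orbits:

```python
from itertools import permutations

G = list(permutations(range(4)))  # S4 acting on A = {0, 1, 2, 3} by g · a = g(a)
for a in range(4):
    orbit = {g[a] for g in G}
    stab = [g for g in G if g[a] == a]
    assert len(G) // len(stab) == len(orbit)  # |G : Stab_G(a)| = |Ga|

# A smaller action: H = ⟨(0, 1, 2)⟩ ⊂ S4 has two orbits on {0, 1, 2, 3}.
H = [(0, 1, 2, 3), (1, 2, 0, 3), (2, 0, 1, 3)]
orbits = {frozenset(h[a] for h in H) for a in range(4)}
print(sorted(len(o) for o in orbits))  # [1, 3]
```

Note that the orbit sizes 1 and 3 are exactly the indices of the corresponding stabilizers in H, which has order 3.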
13.2 Conjugacy Classes and the Class Equation
Earlier, we defined the center of a group G,
Z(G) = {z ∈ G : zx = xz for all x ∈ G},
and showed that it is a normal subgroup of G. We use this normal subgroup in conjunction
with what we call the class equation to show that any finite p-group has a nontrivial
center. In this section, we use group actions to derive the class equation and prove the
result for finite p-groups.
Recall that if G is a group, then two elements g1 , g2 ∈ G are conjugate if there exists a
g ∈ G with g −1 g1 g = g2 . We saw that conjugacy is an equivalence relation on G. The
equivalence class of g ∈ G is called its conjugacy class, which we will denote by Cl(g).
Thus,
Cl(g) = {x −1 gx : x ∈ G}.
If g ∈ G, then its centralizer CG (g) is the set of elements in G that commute with g:
CG (g) = {x ∈ G : xg = gx}.
Theorem 13.2.1. Let G be a finite group and g ∈ G. Then the centralizer of g is a subgroup
of G, and
|G : CG (g)| = |Cl(g)|.
That is, the index of the centralizer of g is the size of its conjugacy class.
In particular, for a finite group the size of each conjugacy class divides the order of
the group.
Proof. Let the group G act on itself by conjugation. That is, g(g1 ) = g −1 g1 g. It is easy
to show that this is an action on the set G (see exercises). The orbit of g ∈ G under this
action is precisely its conjugacy class Cl(g), and the stabilizer is its centralizer CG (g). The
statements in the theorem then follow directly from Theorem 13.1.4.
For any group G, since conjugacy is an equivalence relation, the conjugacy classes
partition G. Hence,
G = ⋃̇ Cl(g),
g∈G
where this union is taken over the distinct conjugacy classes. It follows that
|G| = ∑ |Cl(g)|,
g∈G
Now, Cl(g) = {g} if and only if g ∈ Z(G); hence, we may write
G = Z(G) ∪ ⋃̇ Cl(g),
g∉Z(G)
where again the second union is taken over the distinct conjugacy classes Cl(g) with
g ∉ Z(G). The size of G is then the sum of these disjoint pieces, so
|G| = |Z(G)| + ∑ |Cl(g)|,
g∉Z(G)
where the sum is taken over the distinct conjugacy classes Cl(g) with g ∉ Z(G). However,
from Theorem 13.2.1, |Cl(g)| = |G : CG (g)|, so the equation above becomes
|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)
where the sum is taken over the distinct indices |G : CG (g)| with g ∉ Z(G). This is known
as the class equation.
|G| = |Z(G)| + ∑ |G : CG (g)|.
g∉Z(G)
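The class equation can be computed explicitly for S4. The following sketch (helper names are ours) finds the center and the conjugacy class sizes, giving 24 = 1 + 3 + 6 + 6 + 8:

```python
from itertools import permutations

def compose(f, g):
    # (f ∘ g)(i) = f(g(i))
    return tuple(f[g[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f):
        inv[v] = i
    return tuple(inv)

G = list(permutations(range(4)))  # S4
center = [g for g in G if all(compose(g, x) == compose(x, g) for x in G)]

# Partition G into conjugacy classes Cl(g) = {x g x^-1 : x in G}.
classes, remaining = [], set(G)
while remaining:
    g = next(iter(remaining))
    cl = {compose(compose(x, g), inverse(x)) for x in G}
    classes.append(cl)
    remaining -= cl

sizes = sorted(len(c) for c in classes)
print(len(center), sizes)  # 1 [1, 3, 6, 6, 8]
assert sum(sizes) == len(G)                 # the class equation for S4
assert all(len(G) % s == 0 for s in sizes)  # each class size divides |G|
```

The five classes correspond to the five cycle structures in S4: identity, double transpositions, transpositions, 4-cycles, and 3-cycles.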
As a first application, we prove the result that finite p-groups have nontrivial centers
(see Lemma 10.5.6).
Proof. Let G be a finite p-group so that |G| = pn for some n, and consider the class equa-
tion
|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)
where the sum is taken over the distinct conjugacy classes Cl(g) with g ∉ Z(G). For each
such g, the index |G : CG (g)| is greater than 1 and divides |G| = pn ; hence, p||G : CG (g)|.
Furthermore, p||G|. Therefore, p must divide |Z(G)|; hence, |Z(G)| = pm for some m ≥ 1,
and Z(G) is nontrivial.
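As a check of the result just proved, the sketch below builds the dihedral 2-group of order 8 inside S4 (our own permutation representation) and verifies that its center is nontrivial:

```python
def compose(f, g):
    return tuple(f[g[i]] for i in range(len(f)))

def closure(gens, n):
    # Subgroup generated by gens; closure under products suffices for finite sets.
    G = {tuple(range(n))} | set(gens)
    while True:
        new = {compose(a, b) for a in G for b in G} - G
        if not new:
            return G
        G |= new

r = (1, 2, 3, 0)   # rotation i -> i + 1 (mod 4)
s = (0, 3, 2, 1)   # reflection i -> -i (mod 4)
D4 = closure({r, s}, 4)   # dihedral group of order 8: a 2-group inside S4
center = {g for g in D4 if all(compose(g, x) == compose(x, g) for x in D4)}
print(len(D4), len(center))  # 8 2: the center {id, r²} is nontrivial
```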
The idea of conjugacy and the centralizer of an element can be extended to sub-
groups. If H1 , H2 are subgroups of a group G, then H1 , H2 are conjugate if there exists a
g ∈ G such that g −1 H1 g = H2 . As for elements, conjugacy is an equivalence relation on
the set of subgroups of G.
If H ⊂ G is a subgroup, then its conjugacy class consists of all the subgroups of G
conjugate to it. The normalizer of H is
NG (H) = {g ∈ G : g −1 Hg = H}.
As for elements, let G act on the set of subgroups of G by conjugation. That is, for
g ∈ G, the map is given by H → g −1 Hg. For H ⊂ G, the stabilizer under this action is pre-
cisely the normalizer. Hence, exactly as for elements, we obtain the following theorem:
Theorem 13.2.4. Let G be a group and H ⊂ G a subgroup. Then the normalizer NG (H) of
H is a subgroup of G, H is normal in NG (H), and
|G : NG (H)| = number of conjugates of H in G.
13.3 The Sylow Theorems
Lemma 13.3.1. The alternating group on 4 symbols A4 has order 12, but has no subgroup
of order 6.
Proof. Suppose that there exists a subgroup U ⊂ A4 with |U| = 6. Then |A4 : U| = 2 since
|A4 | = 12; hence, U is normal in A4 .
Now id, (1, 2)(3, 4), (1, 3)(2, 4), (1, 4)(2, 3) are in A4 . These each have order 2 and com-
mute, so they form a normal subgroup V ⊂ A4 of order 4. This subgroup V is isomorphic
to ℤ2 × ℤ2 . Then
12 = |A4 | ≥ |VU| = |V ||U|/|V ∩ U| = (4 ⋅ 6)/|V ∩ U|.
It follows that V ∩U ≠ {1}, and since U is normal, we have that V ∩U is also normal in A4 .
Now (1, 2)(3, 4) ∈ V , and by renaming the entries in V , if necessary, we may assume
that it is also in U, so that (1, 2)(3, 4) ∈ V ∩ U. Since (1, 2, 3) ∈ A4 , and V ∩ U is normal
in A4 , we have
(1, 2, 3)(1, 2)(3, 4)(1, 2, 3)−1 = (2, 3)(1, 4) ∈ V ∩ U,
and then
(1, 2, 3)(2, 3)(1, 4)(1, 2, 3)−1 = (1, 3)(2, 4) ∈ V ∩ U.
But then V ⊂ V ∩ U, and so V ⊂ U. But this is impossible since |V | = 4, which does not
divide |U| = 6.
Definition 13.3.2. Let G be a finite group with |G| = n, and let p be a prime such that
pa |n, but no higher power of p divides n. A subgroup of G of order pa is called a p-Sylow
subgroup.
It is not at all clear that a p-Sylow subgroup must exist. We will prove that for each
prime p with p|n, a p-Sylow subgroup exists.
We first consider and prove a very special case.
Theorem 13.3.3. Let G be a finite Abelian group, and let p be a prime such that p||G|. Then
G contains at least one element of order p.
Proof. Suppose that G is a finite Abelian group of order pn. We use induction on n. If
n = 1, then G has order p, and hence is cyclic. Therefore, it has an element of order p.
Suppose that the theorem is true for all Abelian groups of order pm with m < n, and
suppose that G has order pn. Suppose that g ∈ G. If the order of g is pt for some integer t,
then g t ≠ 1, and g t has order p, proving the theorem in this case. Hence, we may suppose
that g ∈ G has order prime to p, and we show that there must be an element, whose order
is a multiple of p, and then use the above argument to get an element of exact order p.
Hence, we have g ∈ G with order m, where (m, p) = 1. Since m||G| = pn, we must
have m|n. Since G is Abelian, ⟨g⟩ is normal, and the factor group G/⟨g⟩ is Abelian of
order p(n/m) < pn. By the inductive hypothesis, G/⟨g⟩ has an element h⟨g⟩ of order p,
h ∈ G; hence, hp ∈ ⟨g⟩, say hp = g k . Now g k has order m1 |m; therefore, h has order pm1 ,
and, as above, hm1 has order p, proving the theorem.
Theorem 13.3.4 (First Sylow theorem). Let G be a finite group, and let p||G|, then G con-
tains a p-Sylow subgroup; that is, a p-Sylow subgroup exists.
Proof. Let |G| = pt m with (p, m) = 1 and t ≥ 1. We use induction on the order of G;
assume that the theorem holds for all groups of order less than |G|. If there is a proper
subgroup H ⊂ G with p ∤ |G : H|, then pt divides |H|, and by the inductive hypothesis,
H contains a subgroup of order pt , which is then a p-Sylow subgroup of G. Hence, we
may assume that p divides |G : H| for every proper subgroup H; in particular, p||G : CG (g)|
for every g ∉ Z(G). Consider the class equation
|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)
where the sum is taken over the distinct conjugacy classes with g ∉ Z(G). By assumption,
each of the indices is divisible by p, and also p||G|. Therefore, p||Z(G)|. It follows that Z(G) is a finite Abelian
group, whose order is divisible by p. From Theorem 13.3.3, there exists an element g ∈
Z(G) ⊂ G of order p. Since g ∈ Z(G), we must have ⟨g⟩ normal in G. The factor group
G/⟨g⟩ then has order pt−1 m, and—by the inductive hypothesis—must have a p-Sylow
subgroup K1 of order pt−1 , hence of index m. By the Correspondence Theorem 10.2.6,
there is a subgroup K of G with ⟨g⟩ ⊂ K such that K/⟨g⟩ ≅ K1 . Therefore, |K| = p ⋅ pt−1 = pt ,
and K is a p-Sylow subgroup of G.
On the basis of this theorem, we can now strengthen the result obtained in Theo-
rem 13.3.3.
Theorem 13.3.5 (Cauchy). If G is a finite group, and if p is a prime such that p||G|, then G
contains at least one element of order p.
Proof. Let P be a p-Sylow subgroup of G, and let |P| = pt . If g ∈ P, g ≠ 1, then the order
of g is pt1 for some t1 ≥ 1, and g raised to the power pt1 −1 then has order p.
We have seen that p-Sylow subgroups exist. We now wish to show that any two
p-Sylow subgroups are conjugate. This is the content of the second Sylow theorem:
Theorem 13.3.6 (Second Sylow theorem). Let G be a finite group and p a prime such that
p||G|. Then any p-subgroup H of G is contained in a p-Sylow subgroup. Furthermore, all
p-Sylow subgroups of G are conjugate. That is, if P1 and P2 are any two p-Sylow subgroups
of G, then there exists an a ∈ G such that P1 = aP2 a−1 .
Proof. Let Ω be the set of p-Sylow subgroups of G, and let G act on Ω by conjugation. This
action will, of course, partition Ω into disjoint orbits. Let P be a fixed p-Sylow subgroup
and ΩP be its orbit under the conjugation action. The size of the orbit is the index of its
stabilizer; that is, |ΩP | = |G : StabG (P)|. Now P ⊂ StabG (P), and P is a maximal p-subgroup
of G. It follows that the index of StabG (P) must be prime to p, and so the number of
p-Sylow subgroups conjugate to P is prime to p.
Now let H be a p-subgroup of G, and let H act on ΩP by conjugation. ΩP will itself
decompose into disjoint orbits under this action. Furthermore, the size of each orbit is
an index of a subgroup of H, hence must be a power of p. On the other hand, the size of
the whole set ΩP is prime to p. Therefore, there must be one orbit that has size exactly 1.
This orbit contains a p-Sylow subgroup P′ , and P′ is fixed by H under conjugation; that
is, H normalizes P′ . It follows that HP′ is a subgroup of G, and P′ is normal in HP′ . From
the second isomorphism theorem, we then obtain
HP′ /P′ ≅ H/(H ∩ P′ ),
so that |HP′ | is a power of p. Since P′ is a maximal p-subgroup of G, we must have
HP′ = P′ , and hence H ⊂ P′ . This shows that every p-subgroup of G is contained in a
p-Sylow subgroup. In particular, if H is itself a p-Sylow subgroup, then H = P′ . Since P′
lies in the orbit of P under conjugation, H is conjugate to P; as P was an arbitrary
p-Sylow subgroup, any two p-Sylow subgroups of G are conjugate.
We come now to the last of the three Sylow theorems. This one gives us information
concerning the number of p-Sylow subgroups.
Theorem 13.3.7 (Third Sylow theorem). Let G be a finite group and p a prime such that
p||G|. Then the number of p-Sylow subgroups of G is of the form 1 + pk and divides the
order of G. It follows that if |G| = pa m with (p, m) = 1, then the number of p-Sylow
subgroups divides m.
Proof. Let P be a p-Sylow subgroup, and let P act on Ω, the set of all p-Sylow subgroups,
by conjugation. Now P normalizes itself, so there is one orbit, namely, {P}, having exactly
size 1. Every other orbit has size a power of p greater than 1, since its size is the index of
a proper subgroup of P, and therefore is divisible by p. Hence, the size of Ω is of the form
1 + pk. Furthermore, by the second Sylow theorem, G acts transitively on Ω by
conjugation, so |Ω| = |G : NG (P)| divides |G|. Since |Ω| = 1 + pk is relatively prime to p,
it must divide m.
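The three Sylow theorems can be illustrated in S4, where |S4| = 24 = 2³ ⋅ 3. The sketch below (helper names ours) counts the conjugates of one 2-Sylow and one 3-Sylow subgroup, giving n2 = 3 and n3 = 4, each ≡ 1 modulo the respective prime and dividing 24:

```python
from itertools import permutations

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f):
        inv[v] = i
    return tuple(inv)

def closure(gens, n):
    # Subgroup generated by gens; closure under products suffices for finite sets.
    G = {tuple(range(n))} | set(gens)
    while True:
        new = {compose(a, b) for a in G for b in G} - G
        if not new:
            return G
        G |= new

S4 = list(permutations(range(4)))  # |S4| = 24 = 2^3 · 3

P2 = frozenset(closure({(1, 2, 3, 0), (0, 3, 2, 1)}, 4))  # a 2-Sylow subgroup (order 8)
P3 = frozenset(closure({(1, 2, 0, 3)}, 4))                # a 3-Sylow subgroup (order 3)

def num_conjugates(P):
    # By the second Sylow theorem, the conjugates of P are all the p-Sylow subgroups.
    return len({frozenset(compose(compose(g, p), inverse(g)) for p in P) for g in S4})

n2, n3 = num_conjugates(P2), num_conjugates(P3)
print(n2, n3)  # 3 4
assert n2 % 2 == 1 and 24 % n2 == 0  # n2 ≡ 1 (mod 2) and n2 | 24
assert n3 % 3 == 1 and 24 % n3 == 0  # n3 ≡ 1 (mod 3) and n3 | 24
```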
13.4 Some Applications of the Sylow Theorems
Theorem 13.4.1. Let G be a group of order pn , p a prime number. Then G contains at least
one normal subgroup of order pm for each m such that 0 ≤ m ≤ n.
Proof. We use induction on n. For n = 1, the theorem is trivial. By Lemma 10.5.7, any
group of order p2 is Abelian. This, together with Theorem 13.3.3, establishes the claim
for n = 2.
We now assume the theorem is true for all groups G of order pk , where 1 ≤ k < n,
where n > 2. Let G be a group of order pn . From Lemma 10.3.4, G has a nontrivial center
of order at least p, hence an element g ∈ Z(G) of order p. Let N = ⟨g⟩. Since g ∈ Z(G),
it follows that N is a normal subgroup of order p. Then G/N is of order pn−1 and therefore
contains (by the induction hypothesis) normal subgroups of orders pm−1 , for 0 ≤ m − 1 ≤
n − 1. These groups are of the form H/N, where the normal subgroup H ⊂ G contains N
and is of order pm , 1 ≤ m ≤ n, because |H| = |N|[H : N] = |N| ⋅ |H/N|.
On the basis of the first Sylow theorem, we see that if G is a finite group, and if pk ||G|,
then G must contain a subgroup of order pk . One can actually show that, as in the case
of Sylow p-groups, the number of such subgroups is of the form 1 + pt, but we shall not
prove this here.
Theorem 13.4.2. Let G be a finite Abelian group of order n. Suppose that d|n. Then G
contains a subgroup of order d.
Proof. Suppose that n = p1^e1 ⋅ ⋅ ⋅ pk^ek is the prime factorization of n. Then d = p1^f1 ⋅ ⋅ ⋅ pk^fk
for some nonnegative f1 , . . . , fk with fi ≤ ei . Now G has a p1 -Sylow subgroup H1 of order
p1^e1 . Hence, from Theorem 13.4.1, H1 has a subgroup K1 of order p1^f1 . Similarly, there are
subgroups K2 , . . . , Kk of G of respective orders p2^f2 , . . . , pk^fk . Moreover, since the orders
are relatively prime, Ki ∩ Kj = {1} if i ≠ j, and thus ⟨K1 , K2 , . . . , Kk ⟩ has order
|K1 ||K2 | ⋅ ⋅ ⋅ |Kk | = p1^f1 ⋅ ⋅ ⋅ pk^fk = d.
In Section 10.5, we examined the classification of finite groups of small orders. Here,
we use the Sylow theorems to extend some of this material further.
Theorem 13.4.3. Let p, q be distinct primes with p < q and q not congruent to 1 modulo p.
Then any group of order pq is cyclic. For example, any group of order 15 must be cyclic.
Proof. Suppose that |G| = pq with p < q and q not congruent to 1 modulo p. The number
of q-Sylow subgroups is of the form 1 + qk and divides p. Since q is greater than p, this
implies that there can be only one; hence, there is a normal q-Sylow subgroup H. Since
q is a prime, H is cyclic of order q; therefore, there is an element g of order q.
The number of p-Sylow subgroups is of the form 1 + pk and divides q. Since q is not
congruent to 1 modulo p, this implies that there also can be only one p-Sylow subgroup;
hence, there is a normal p-Sylow subgroup K. Since p is a prime K is cyclic of order p;
therefore, there is an element h of order p.
Since p, q are distinct primes H ∩ K = {1}. Consider the element g −1 h−1 gh. Since
K is normal, g −1 hg ∈ K. Then g −1 h−1 gh = (g −1 h−1 g)h ∈ K. But H is also normal, so
h−1 gh ∈ H. This then implies that g −1 h−1 gh = g −1 (h−1 gh) ∈ H; and therefore we have
g −1 h−1 gh ∈ K ∩ H. It follows then that g −1 h−1 gh = 1 or gh = hg. Since g, h commute, the
order of gh is the lcm of the orders of g and h, which is pq. Therefore, G has an element
of order pq. Since |G| = pq, this implies that G is cyclic.
In the above theorem, since we assumed that q is not congruent to 1 modulo p, we
necessarily have p ≠ 2: if p = 2, then q, being odd, is congruent to 1 modulo 2. In the case
where p = 2, we get another possibility.
Theorem 13.4.4. Let p be an odd prime and G a finite group of order 2p. Then either G is
cyclic, or G is isomorphic to the dihedral group of order 2p; that is, the group of symmetries
of a regular p-gon. In this latter case, G is generated by two elements, g and h, which satisfy
the relations g p = h2 = (gh)2 = 1.
Proof. As in the proof of Theorem 13.4.3, G must have a normal cyclic subgroup of or-
der p, say ⟨g⟩. Since 2||G|, the group G must have an element of order 2, say h. Consider
the order of gh. By Lagrange’s theorem, this element can have order 1, 2, p, 2p. If the
order is 1, then gh = 1 or g = h−1 = h. This is impossible since g has order p, and h
has order 2. If the order of gh is p, then from the second Sylow theorem, gh ∈ ⟨g⟩. But
this implies that h ∈ ⟨g⟩, which is impossible since every nontrivial element of ⟨g⟩ has
order p. Therefore, the order of gh is either 2 or 2p.
If the order of gh is 2p, then since G has order 2p, it must be cyclic.
If the order of gh is 2, then within G, we have the relations g p = h2 = (gh)2 = 1. Let
H = ⟨g, h⟩ be the subgroup of G generated by g and h. The relations g p = h2 = (gh)2 = 1
imply that H has order 2p. Since |G| = 2p, we get that H = G. Hence, G is isomorphic to
the dihedral group Dp of order 2p (see exercises).
In the above description, g represents a rotation by 2π/p of a regular p-gon about its center, whereas h represents any reflection across a line of symmetry of the regular p-gon.
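The relations g^p = h^2 = (gh)^2 = 1 can also be checked concretely by realizing D_p as permutations of the p vertices of the polygon. The following Python sketch does this for p = 7; the helpers `compose` and `power` are our names, not from the text.

```python
# Model D_p on the vertices {0, ..., p-1} of a regular p-gon:
# g = rotation by 2*pi/p, h = a reflection fixing vertex 0.

def compose(s, t):
    """(s . t)(i) = s(t(i)); permutations are stored as tuples."""
    return tuple(s[t[i]] for i in range(len(s)))

def power(s, n):
    result = tuple(range(len(s)))       # identity permutation
    for _ in range(n):
        result = compose(result, s)
    return result

p = 7
g = tuple((i + 1) % p for i in range(p))   # rotation
h = tuple((-i) % p for i in range(p))      # reflection
identity = tuple(range(p))

assert power(g, p) == identity              # g^p = 1
assert power(h, 2) == identity              # h^2 = 1
assert power(compose(g, h), 2) == identity  # (gh)^2 = 1
```

The same check works for any p; together with the fact that ⟨g, h⟩ has order 2p, it pins down the dihedral group.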
Example 13.4.5 (The groups of order 21). Let G be a group of order 21. The number of
7-Sylow subgroups of G is 1, because it is of the form 1 + 7k and divides 3. Hence, the
7-Sylow subgroup K is normal and cyclic; that is, K ⊲ G and K = ⟨a⟩ with a of order 7.
The number of 3-Sylow subgroups is analogously 1 or 7. If it is 1, then we have exactly
one element of order 3 in G, and if it is 7, there are 14 elements of order 3 in G.
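Sylow counts such as these can be enumerated mechanically from the two Sylow conditions alone (n_p ≡ 1 (mod p) and n_p divides the p-free part of |G|). A small Python sketch; the function name `sylow_counts` is ours.

```python
def sylow_counts(order, p):
    """Possible numbers of p-Sylow subgroups of a group of the given order:
    divisors n of the p-free part m of the order with n = 1 (mod p)."""
    m = order
    while m % p == 0:
        m //= p
    return [n for n in range(1, m + 1) if m % n == 0 and n % p == 1]

assert sylow_counts(21, 7) == [1]     # the 7-Sylow subgroup must be normal
assert sylow_counts(21, 3) == [1, 7]  # either 1 or 7, as in the example
```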
190 · 13 Group Actions and the Sylow Theorems
Example 13.4.6. Consider GL(n, p), the group of n × n invertible matrices over ℤ_p. If {v_1, . . . , v_n} is a basis for (ℤ_p)^n over ℤ_p, then the size of GL(n, p) is the number of independent images {w_1, . . . , w_n} of {v_1, . . . , v_n}. For w_1 there are p^n − 1 choices; for w_2 there are p^n − p choices, and so on. It follows that

|GL(n, p)| = (p^n − 1)(p^n − p) ⋯ (p^n − p^{n−1}) = p^{1+2+⋯+(n−1)} m = p^{n(n−1)/2} m

with (p, m) = 1. Therefore, a p-Sylow subgroup must have size p^{n(n−1)/2}.
Let P be the subgroup of upper triangular matrices with 1's on the diagonal. Then P has size p^{1+2+⋯+(n−1)} = p^{n(n−1)/2}, and is therefore a p-Sylow subgroup of GL(n, p).
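The order formula and the claim about the p-part can be verified numerically; a short Python sketch (the function names are ours):

```python
def gl_order(n, p):
    """|GL(n, p)| = (p^n - 1)(p^n - p) ... (p^n - p^(n-1))."""
    order = 1
    for k in range(n):
        order *= p**n - p**k
    return order

def p_exponent(m, p):
    """Exponent of p in the prime factorization of m."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e

# the p-part of |GL(n, p)| is exactly p^(n(n-1)/2)
for n, p in [(2, 3), (3, 2), (4, 5)]:
    assert p_exponent(gl_order(n, p), p) == n * (n - 1) // 2
```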
The final example is a bit more difficult. We mentioned that a major result on finite
groups is the classification of the finite simple groups. This classification showed that
any finite simple group is either cyclic of prime order, in one of several classes of groups
such as the An , n > 4, or one of a number of special examples called sporadic groups.
One of the major tools in this classification is the following famous result, called the
Feit–Thompson theorem, which showed that any finite group G of odd order is solvable
and, in addition, if G is not cyclic, then G is nonsimple.
Theorem 13.4.7 (Feit–Thompson theorem). Any finite group of odd order is solvable.
The proof of this theorem, one of the major results in algebra in the twentieth cen-
tury, is way beyond the scope of this book. The proof is actually hundreds of pages in
length, when one counts the results used. However, we look at the smallest non-Abelian
simple group.
Theorem 13.4.8. Suppose that G is a simple group of order 60. Then G is isomorphic to A5 .
Moreover, A5 is the smallest non-Abelian finite simple group.
Proof. The number of 3-Sylow subgroups is of the form 1 + 3k and divides 20. Hence, there are 1, 4, or 10. We claim that there are 10. There cannot be only 1, since G is simple. Suppose
there were 4. Let G act on the set of 3-Sylow subgroups by conjugation. Since an action
is a permutation, this gives a homomorphism f from G into S4 . By the first isomorphism
theorem, G/ ker(f ) ≅ im(f ).
However, since G is simple, the kernel must be trivial, and this implies that G would
imbed into S4 . This is impossible, since |G| = 60 > 24 = |S4 |. Therefore, there are 10
3-Sylow subgroups. Since each of these is cyclic of order 3, they intersect only in the identity. Therefore, these 10 subgroups cover 20 distinct elements of order 3. The number of 5-Sylow subgroups is of the form 1 + 5k and divides 12; hence, there are 1 or 6, and since G is simple, there must be 6, and these cover 24 distinct elements of order 5. Hence, from the 3-Sylow and 5-Sylow subgroups together, we have 44 nontrivial elements.
The number of 2-Sylow subgroups is of the form 1 + 2k and divides 15. Hence, there are 1, 3, 5, or 15. We claim that there are 5. As before, there cannot be only 1, since G is simple. There cannot be 3, since as for the case of 3-Sylow subgroups, this would imply an
imbedding of G into S3 , which is impossible, given |S3 | = 6. Suppose that there were 15
2-Sylow subgroups, each of order 4. The intersections would have a maximum of 2 ele-
ments. Therefore, each of these would contribute at least 2 distinct elements. This gives
a minimum of 30 distinct elements. However, we already have 44 nontrivial elements
from the 3-Sylow and 5-Sylow subgroups. Since |G| = 60, this is too many. Therefore, G
must have 5 2-Sylow subgroups.
Now let G act on the set of 2-Sylow subgroups. This then, as above, implies an imbed-
ding of G into S5 , so we may consider G as a subgroup of S5 . However, the only subgroup
of S5 of order 60 is A5 ; therefore, G ≅ A5 .
The proof that A5 is the smallest non-Abelian simple group is actually brute force.
We show that any group G of order less than 60 either has prime order, or is nonsimple.
There are strong tools that we can use. By the Feit–Thompson theorem, we must only
consider groups of even order. From Theorem 13.4.4, we do not have to consider or-
ders 2p. The rest can be done by an analysis using Sylow theory. For example, we show
that any group of order 20 is nonsimple. Since 20 = 22 ⋅ 5, the number of 5-Sylow sub-
groups is 1 + 5k and divides 4. Hence, there is only one; therefore, it must be normal, and
so G is nonsimple. There is a strong theorem by Burnside, whose proof is usually done
with representation theory (see Chapter 22), which says that any group, whose order is
divisible by only two primes, is solvable. Therefore, for |G| = 60, we only have to show
that groups of order 30 = 2 ⋅ 3 ⋅ 5 and 42 = 2 ⋅ 3 ⋅ 7 are nonsimple. This is done in the same
manner as the first part of this proof. Suppose |G| = 30. The number of 5-Sylow sub-
groups is of the form 1 + 5k and divides 6. Hence, there are 1 or 6. If G were simple there
would have to be 6 covering 24 distinct elements. The number of 3-Sylow subgroups is
of the form 1 + 3k and divides 10; hence, there are 1 or 10. If there were 10 these would
cover an additional 20 distinct elements, which is impossible, since we already have 24
and G has order 30. Therefore, there is only one, hence a normal 3-Sylow subgroup. It fol-
lows that G cannot be simple. The case |G| = 42 is even simpler. There must be a normal
7-Sylow subgroup.
13.5 Exercises
1. Prove Lemma 13.1.3.
2. Let the group G act on itself by conjugation; that is, g(g_1) = g^{-1}g_1g. Prove that this is an action on the set G.
3. Show that the dihedral group D_n of order 2n has the presentation

⟨r, f; r^n = f^2 = (rf)^2 = 1⟩
1 = ( 1 2 3 ; 1 2 3 ), a = ( 1 2 3 ; 2 3 1 ), b = ( 1 2 3 ; 3 1 2 ),
c = ( 1 2 3 ; 2 1 3 ), d = ( 1 2 3 ; 3 2 1 ), e = ( 1 2 3 ; 1 3 2 ).

⟨a, c; a^3 = c^2 = (ac)^2 = 1⟩

1 = 1, a = a, b = a^2, c = c, d = ac, e = a^2c,
and so a, c generate S3 .
Now from (ac)^2 = acac = 1, we get that ca = a^2c. This implies that if we write any sequence (or word, in our later language) in a and c, we can also rearrange it so that the only nontrivial powers of a are a and a^2, the only nontrivial power of c is c, and all a terms precede c terms. For example,

cac = (ca)c = a^2c · c = a^2c^2 = a^2.
Therefore, using the three relations from the presentation above, each element of S_3 can be written as a^α c^β with α = 0, 1, 2 and β = 0, 1. From this, the multiplication of any two elements can be determined.
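The rewriting just described can be automated: pushing each a leftward past the c's via ca = a^2c = a^{-1}c yields the normal form directly. A Python sketch (the function name and the string encoding of words are ours):

```python
def normal_form(word):
    """Normal form a^alpha c^beta of a word over {'a', 'c'} in S_3,
    using the relations a^3 = c^2 = 1 and c a = a^(-1) c."""
    alpha, beta = 0, 0
    for letter in word:
        if letter == 'a':
            # moving this 'a' left past beta copies of c inverts it beta times
            alpha = (alpha + (1 if beta % 2 == 0 else -1)) % 3
        elif letter == 'c':
            beta = (beta + 1) % 2
    return alpha, beta

assert normal_form("cac") == (2, 0)    # cac = a^2
assert normal_form("acac") == (0, 0)   # (ac)^2 = 1
```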
https://doi.org/10.1515/9783111142524-014
194 · 14 Free Groups and Group Presentations
This type of argument applies exactly to all the dihedral groups D_n. We saw that, in general, |D_n| = 2n. Since these are the symmetry groups of a regular n-gon, we always have a rotation r by the angle 2π/n about the center of the n-gon. This element r has order n. Let f be a reflection about any line of symmetry. Then f^2 = 1, and rf is a reflection about the rotated line, which is also a line of symmetry. Therefore, (rf)^2 = 1. Exactly as for S_3, the relation (rf)^2 = 1 implies that fr = r^{-1}f = r^{n−1}f. This allows us to always place r terms in front of f terms in any word on r and f. Therefore, the elements of D_n are always of the form

r^α f^β, α = 0, 1, 2, . . . , n − 1, β = 0, 1.
Theorem 14.1.1. If D_n is the symmetry group of a regular n-gon, then a presentation for D_n is given by

D_n = ⟨r, f; r^n = f^2 = (rf)^2 = 1⟩.
We now give one class of infinite examples. If G is an infinite cyclic group, so that
G ≅ ℤ, then G = ⟨g; ⟩ is a presentation for G. That is, G has a single generator with no
relations.
A direct product of n copies of ℤ is called a free Abelian group of rank n. We will denote this by ℤ^n. A presentation for ℤ^n is then given by

ℤ^n = ⟨x_1, . . . , x_n; x_i x_j = x_j x_i, 1 ≤ i < j ≤ n⟩.
We first show that given any set X, there does exist a free group with free basis X.
Let X = {x_i}_{i∈I} be a set (possibly empty). We will construct a group F(X), which is free with free basis X. First, let X^{-1} be a set disjoint from X, but bijective to X. If x_i ∈ X, then we denote by x_i^{-1} the corresponding element of X^{-1} under the bijection, and say that x_i and x_i^{-1} are associated. The set X^{-1} is called the set of formal inverses from X, and we
14.2 Free Groups · 195
call X ∪ X^{-1} the alphabet. Elements of the alphabet are called letters. Hence, a letter has the form x_i^ε, where ε = ±1. A word in X is a finite sequence of letters from the alphabet; that is, a word has the form

w = x_{i_1}^{ε_1} x_{i_2}^{ε_2} ⋯ x_{i_n}^{ε_n},

where x_{i_j} ∈ X and ε_j = ±1. If n = 0, we call it the empty word, which we will denote as e. The integer n is called the length of the word. Words of the form x_i x_i^{-1} or x_i^{-1} x_i are called trivial words. We let W(X) be the set of all words on X.
If w_1, w_2 ∈ W(X), we say that w_1 is equivalent to w_2, denoted as w_1 ∼ w_2, if w_1 can be converted to w_2 by a finite string of insertions and deletions of trivial words. For example, if w_1 = x_3 x_4 x_4^{-1} x_2 x_2 and w_2 = x_3 x_2 x_2, then w_1 ∼ w_2. It is straightforward to verify that this is an equivalence relation on W(X) (see exercises). Let F(X) denote the set of equivalence classes in W(X) under this relation; hence, F(X) is a set of equivalence classes of words from X.
A word w ∈ W(X) is said to be freely reduced or reduced if it has no trivial subwords (a subword is a connected sequence within a word). Hence, in the example above, w_2 = x_3 x_2 x_2 is reduced, but w_1 = x_3 x_4 x_4^{-1} x_2 x_2 is not reduced. There is a unique element of minimal length in each equivalence class in F(X). Furthermore, this element must be reduced, or else it would be equivalent to something of smaller length. Two reduced words in W(X) are either equal or not in the same equivalence class in F(X). Hence, F(X) can also be considered as the set of all reduced words from W(X).
Given a word w = x_{i_1}^{ε_1} x_{i_2}^{ε_2} ⋯ x_{i_n}^{ε_n}, we can find the unique reduced word w̄ equivalent to w via the following free reduction process. Beginning from the left side of w, we cancel each occurrence of a trivial subword. After all these possible cancellations, we have a word w′. Now we repeat the process again, starting from the left side. Since w has finite length, eventually the resulting word will either be empty or reduced. The final reduced word w̄ is the free reduction of w.
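Free reduction is easy to implement; a single left-to-right pass with a stack performs all of the repeated passes described above at once. A Python sketch (the integer encoding of letters, with −k the formal inverse of the generator k, is our choice, not the text's):

```python
def free_reduce(word):
    """Freely reduce a word given as a list of nonzero integers,
    where -k is the formal inverse of the generator k."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()           # delete the trivial subword just formed
        else:
            stack.append(letter)
    return stack

def multiply(w1, w2):
    """The product in F(X): concatenate, then freely reduce."""
    return free_reduce(free_reduce(w1) + free_reduce(w2))

# the example from the text: x3 x4 x4^-1 x2 x2 reduces to x3 x2 x2
assert free_reduce([3, 4, -4, 2, 2]) == [3, 2, 2]
# a word times its formal inverse reduces to the empty word
w = [1, 2, -1]
assert multiply(w, [-letter for letter in reversed(w)]) == []
```

The `multiply` function anticipates the multiplication on F(X) defined next: concatenation followed by free reduction.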
Now we build a multiplication on F(X). If

w_1 = x_{i_1}^{ε_{i_1}} x_{i_2}^{ε_{i_2}} ⋯ x_{i_n}^{ε_{i_n}}, w_2 = x_{j_1}^{ε_{j_1}} x_{j_2}^{ε_{j_2}} ⋯ x_{j_m}^{ε_{j_m}}

are two words in W(X), then their concatenation w_1 ⋆ w_2 is simply w_2 placed after w_1:

w_1 ⋆ w_2 = x_{i_1}^{ε_{i_1}} ⋯ x_{i_n}^{ε_{i_n}} x_{j_1}^{ε_{j_1}} ⋯ x_{j_m}^{ε_{j_m}}.

On F(X), we define

w_1 w_2 = equivalence class of w_1 ⋆ w_2.

That is, we concatenate w_1 and w_2, and the product is the equivalence class of the resulting word. It is easy to show that if w_1 ∼ w_1′ and w_2 ∼ w_2′, then w_1 ⋆ w_2 ∼ w_1′ ⋆ w_2′, so that the above multiplication is well defined. Equivalently, we can think of this product in
the following way. If w_1, w_2 are reduced words, then to find w_1 w_2, first concatenate, and then freely reduce. Notice that if x_{i_n}^{ε_{i_n}} x_{j_1}^{ε_{j_1}} is a trivial word, then it is cancelled when the concatenation is formed. We say then that there is cancellation in forming the product w_1 w_2. Otherwise, the product is formed without cancellation.
Theorem 14.2.2. Let X be a set, and let F(X) be as above. Then F(X) is a free group with free basis X. Furthermore, if X = ∅, then F(X) = {1}; if |X| = 1, then F(X) ≅ ℤ; and if |X| ≥ 2, then F(X) is non-Abelian.
Proof. We first show that F(X) is a group, and then show that it satisfies the universal
mapping property on X. We consider F(X) as the set of reduced words in W (X) with
the multiplication defined above. Clearly, the empty word acts as the identity element 1. If w = x_{i_1}^{ε_1} x_{i_2}^{ε_2} ⋯ x_{i_n}^{ε_n} and w_1 = x_{i_n}^{−ε_n} x_{i_{n−1}}^{−ε_{n−1}} ⋯ x_{i_1}^{−ε_1}, then both w ⋆ w_1 and w_1 ⋆ w freely reduce to the empty word, and so w_1 is the inverse of w. Therefore, each element of F(X) has an inverse. Hence, to show that F(X) forms a group, we must show that the multiplication is associative. Let

w_1 = x_{i_1}^{ε_1} ⋯ x_{i_n}^{ε_n}, w_2 = x_{j_1}^{δ_1} ⋯ x_{j_m}^{δ_m}, w_3 = x_{k_1}^{η_1} ⋯ x_{k_p}^{η_p}

be reduced words. The verification that w_1(w_2 w_3) = (w_1 w_2)w_3 splits into four cases, according to whether or not there is cancellation in forming w_1 w_2 and in forming w_2 w_3. The delicate case is when both cancellations occur and meet in the middle; there the boundary letters are equal, since x_{i_n}^{ε_n} = x_{k_1}^{η_1}. In each case, a direct check gives w_1(w_2 w_3) = (w_1 w_2)w_3.
It follows, inductively, from these four cases, that the associative law holds in F(X);
therefore, F(X) forms a group.
Now suppose that f : X → G is a map from X into a group G. By the construction of F(X) as a set of reduced words, this can be extended to a unique homomorphism: if w ∈ F(X) with w = x_{i_1}^{ε_1} ⋯ x_{i_n}^{ε_n}, then define f(w) = f(x_{i_1})^{ε_1} ⋯ f(x_{i_n})^{ε_n}. Since multiplication in F(X) is concatenation followed by free reduction, this defines a homomorphism, and again from the construction of F(X), it is the only one extending f. This is analogous to constructing a linear transformation from one vector space to another by specifying the images of a basis. Therefore, F(X) satisfies the universal mapping property of Definition 14.2.1. Hence, F(X) is a free group with free basis X.
The final parts of Theorem 14.2.2 are straightforward. If X is empty, the only reduced
word is the empty word; hence, the group is just the identity. If X has a single letter, then
F(X) has a single generator, and is therefore cyclic. It is easy to see that it must be torsion-
free. Therefore, F(X) is infinite cyclic; that is, F(X) ≅ ℤ. Finally, if |X| ≥ 2, let x1 , x2 ∈ X.
Then x1 x2 ≠ x2 x1 , and both are reduced. Therefore, F(X) is non-Abelian.
The proof of Theorem 14.2.2 provides another way to look at free groups.
Theorem 14.2.3. F is a free group if and only if there is a generating set X such that every
element of F has a unique representation as a freely reduced word on X.
The structure of a free group is entirely dependent on the cardinality of a free basis.
In particular, the cardinality of a free basis X for a free group F is unique, and is called
the rank of F. If |X| < ∞, F is of finite rank. If F has rank n and X = {x1 , x2 , . . . , xn }, we
say that F is free on {x1 , x2 , . . . , xn }. We denote this by F(x1 , x2 , . . . , xn ).
Theorem 14.2.4. If X and Y are sets with the same cardinality, that is, |X| = |Y |, then
F(X) ≅ F(Y ), the resulting free groups are isomorphic. Furthermore, if F(X) ≅ F(Y ), then
|X| = |Y |.
Then N(X) is a normal subgroup, and the factor group F(X)/N(X) is Abelian, where
every nontrivial element has order 2 (see exercises). Therefore, F(X)/N(X) can be con-
sidered as a vector space over ℤ2 , the finite field of order 2, with X as a vector space
basis. Hence, |X| is the dimension of this vector space. Let N(Y ) be the corresponding
subgroup of F(Y ). Since F(X) ≅ F(Y ), we would have F(X)/N(X) ≅ F(Y )/N(Y ); therefore,
|Y | is the dimension of the vector space F(Y )/N(Y ). Thus, |X| = |Y | from the uniqueness
of dimension of vector spaces.
Expressing elements of F(X) as reduced words gives a normal form for elements in a free group F. As we will see in Section 14.5, this solves what is termed the word problem for free groups. Another important concept is the following: a freely reduced word W = x_{v_1}^{e_1} x_{v_2}^{e_2} ⋯ x_{v_n}^{e_n} is cyclically reduced if v_1 ≠ v_n, or if v_1 = v_n, then e_1 ≠ −e_n. Clearly then, every element of a free group is conjugate to an element given by a cyclically reduced word. This provides a method to determine conjugacy in free groups.
Theorem 14.2.5. In a free group F, two elements g1 , g2 are conjugate if and only if a cycli-
cally reduced word for g1 is a cyclic permutation of a cyclically reduced word for g2 .
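Theorem 14.2.5 translates directly into an algorithm for the conjugacy problem in free groups. A Python sketch with the same integer encoding of letters as before (−k is the formal inverse of k; the function names are ours):

```python
def free_reduce(word):
    """Freely reduce a word (list of nonzero ints, -k inverse of k)."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return stack

def cyclically_reduce(word):
    """Strip inverse pairs from the two ends of the reduced word."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == -w[-1]:
        w = w[1:-1]
    return w

def conjugate_in_free_group(u, v):
    """Theorem 14.2.5: conjugate iff the cyclically reduced forms are
    cyclic permutations of one another."""
    cu, cv = cyclically_reduce(u), cyclically_reduce(v)
    if len(cu) != len(cv):
        return False
    return any(cv == cu[i:] + cu[:i] for i in range(max(1, len(cu))))

# x2 x1 x2^-1 is conjugate to x1, but x1 x2 is not conjugate to x1 x2^-1
assert conjugate_in_free_group([2, 1, -2], [1])
assert not conjugate_in_free_group([1, 2], [1, -2])
```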
The theory of free groups has a large and extensive literature. We close this section
by stating several important properties. Proofs for these results can be found in [37], [36]
or [21].
Finally, a celebrated theorem of Nielsen and Schreier states that a subgroup of a free
group must be free.
Example 14.2.10. Let F be free on {a, b}, and let H = F(X^2) be the normal subgroup of F generated by all squares in F. Then F/F(X^2) = ⟨a, b; a^2 = b^2 = (ab)^2 = 1⟩ = ℤ_2 × ℤ_2 (see Section 14.3 for the concept of group presentations). It follows that a Schreier system for F modulo H is {1, a, b, ab}, where the coset representative of a is a, of b is b, and of ba is ab. From this it can be shown that H is free on the generating set
The theorem also allows for a computation of the rank of H, given the rank of F and
the index. Specifically:
From the example, we see that F is free of rank 2, H has index 4, so H is free of rank
2 ⋅ 4 − 4 + 1 = 5.
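In general, the Nielsen–Schreier index formula gives rank(H) = 1 + [F : H](rank(F) − 1) for a subgroup H of finite index in a free group F, which agrees with the arithmetic 2 · 4 − 4 + 1 = 5 above. As a one-line sketch:

```python
def schreier_rank(rank_f, index):
    """Rank of a subgroup of index `index` in a free group of rank
    `rank_f` (Nielsen-Schreier index formula)."""
    return 1 + index * (rank_f - 1)

assert schreier_rank(2, 4) == 5   # the example: F of rank 2, H of index 4
```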
Theorem 14.3.1. Every group G is a homomorphic image of a free group. That is, let G be
any group. Then G = F/N, where F is a free group.
In the above theorem, instead of taking all the elements of G, we can consider just
a set X of generators for G. Then G is a factor group of F(X), G ≅ F(X)/N. The normal
subgroup N is the kernel of the homomorphism from F(X) onto G. We use Theorem 14.3.1
to formally define a group presentation.
If H is a subset of a group G, then the normal closure of H, denoted by N(H), is the smallest normal subgroup of G containing H. This can be described alternatively in the
following manner. The normal closure of H is the subgroup of G generated by all conju-
gates of elements of H.
Now suppose that G is a group with X, a set of generators for G. We also call X a gen-
erating system for G. Now let G = F(X)/N as in Theorem 14.3.1 and the comments after
it. N is the kernel of the homomorphism f : F(X) → G. It follows that if r is a free group
word with r ∈ N, then r = 1 in G (under the homomorphism). We then call r a relator
in G, and the equation r = 1 a relation in G. Suppose that R is a subset of N such that
N = N(R), then R is called a set of defining relators for G. The equations r = 1, r ∈ R, are
a set of defining relations for G. It follows that any relator in G is a product of conjugates
of elements of R. Equivalently, r ∈ F(X) is a relator in G if and only if r can be reduced
to the empty word by insertions and deletions of elements of R, and trivial words.
Definition 14.3.2. Let G be a group. Then a group presentation for G consists of a set of
generators X for G and a set R of defining relators. In this case, we write G = ⟨X; R⟩. We
could also write the presentation in terms of defining relations as G = ⟨X; r = 1, r ∈ R⟩.
From Theorem 14.3.1, it follows immediately that every group has a presentation.
However, in general, there are many presentations for the same group. If R ⊂ R1 , then
R1 is also a set of defining relators.
Theorem 14.3.4. F is a free group if and only if F has a presentation of the form F = ⟨X; ⟩.
Mimicking the construction of a free group from a set X, we can show that to each
presentation corresponds a group. Suppose that we are given a supposed presentation
⟨X; R⟩, where R is given as a set of words in X. Consider the free group F(X) on X. Define
two words w1 , w2 on X to be equivalent if w1 can be transformed into w2 using insertions
and deletions of elements of R and trivial words. As in the free group case, this is an
equivalence relation. Let G be the set of equivalence classes. If we define multiplication
as before, as concatenation followed by the appropriate equivalence class, then G is a
group. Furthermore, each r ∈ R must equal the identity in G so that G = ⟨X; R⟩. Notice
that here there may be no unique reduced word for an element of G.
Theorem 14.3.5. Given (X, R), where X is a set and R is a set of words on X, there exists a group G with presentation ⟨X; R⟩.
F_n = ⟨x_1, . . . , x_n; ⟩.

ℤ_n = ⟨x; x^n = 1⟩.
Example 14.3.9. The dihedral group of order 2n, the symmetry group of a regular n-gon, has the presentation

D_n = ⟨r, f; r^n = f^2 = (rf)^2 = 1⟩.
In this section, we give a more complicated example, and then a nice application to num-
ber theory.
If R is a commutative ring with identity, then the set of invertible (n × n)-matrices
with entries from R forms a group under matrix multiplication called the n-dimen-
sional general linear group over R, see [41]. This group is denoted by GL(n, R). Since
det(A) det(B) = det(AB) for square matrices A, B, it follows that the subset of GL(n, R),
consisting of those matrices of determinant 1, forms a subgroup. This subgroup is called
the special linear group over R and is denoted by SL(n, R). In this section, we concentrate
on SL(2, ℤ), or more specifically, a quotient of it, PSL(2, ℤ), and find presentations for
them. The group SL(2, ℤ) then consists of (2 × 2)-matrices of determinant 1 with integral
entries:
SL(2, ℤ) = { ( a, b ; c, d ) : a, b, c, d ∈ ℤ, ad − bc = 1 }.
The group SL(2, ℤ) is called the homogeneous modular group, and an element of SL(2, ℤ)
is called a unimodular matrix. If G is any group, recall that its center Z(G) consists of those elements of G that commute with all elements of G:

Z(G) = {z ∈ G : zg = gz for all g ∈ G}.

The group Z(G) is a normal subgroup of G. Hence, we can form the factor group G/Z(G).
For G = SL(2, ℤ), the only unimodular matrices that commute with all others are ±I = ±( 1, 0 ; 0, 1 ). Therefore, Z(SL(2, ℤ)) = {I, −I}. The quotient

SL(2, ℤ)/{I, −I}

is denoted by PSL(2, ℤ) and is called the projective special linear group or inhomogeneous
modular group. More commonly, PSL(2, ℤ) is just called the modular group, and denoted
by M.
M arises in many different areas of mathematics, including number theory, complex analysis, Riemann surface theory, and the theory of automorphic forms and
functions. M is perhaps the most widely studied single finitely presented group. Com-
plete discussions of M and its structure can be found in the books Integral Matrices by
M. Newman, see [56], and Algebraic Theory of the Bianchi Groups by B. Fine, see [51].
Since M = PSL(2, ℤ) = SL(2, ℤ)/{I, −I}, it follows that each element of M can be
considered as ±A, where A is a unimodular matrix. A projective unimodular matrix is
then
±( a, b ; c, d ), a, b, c, d ∈ ℤ, ad − bc = 1.
The elements of M can also be considered as linear fractional transformations over the
complex numbers
z′ = (az + b)/(cz + d), a, b, c, d ∈ ℤ, ad − bc = 1, where z ∈ ℂ.
Thought of in this way, M forms a Fuchsian group, which is a discrete group of isometries
of the non-Euclidean hyperbolic plane. The book by Katok, see [33], gives a solid and clear
introduction to such groups. This material can also be found in condensed form in [53].
We now determine presentations for both SL(2, ℤ) and M = PSL(2, ℤ).
The group SL(2, ℤ) is generated by the two matrices

X = ( 0, −1 ; 1, 0 ) and Y = ( 0, 1 ; −1, −1 ).

Furthermore, a complete set of defining relations for the group in terms of these generators is given by

X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I.

Hence, SL(2, ℤ) has the presentation

⟨X, Y; X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I⟩.
Proof. We first show that SL(2, ℤ) is generated by X and Y ; that is, every matrix A in the
group can be written as a product of powers of X and Y .
Let

U = ( 1, 1 ; 0, 1 ).

Then a direct multiplication shows that U = XY, and we show that SL(2, ℤ) is generated by X and U, which implies that it is also generated by X and Y. Furthermore,
14.3 Group Presentations · 203
U^n = ( 1, n ; 0, 1 );

XA = ( −c, −d ; a, b ), and U^k A = ( a + kc, b + kd ; c, d )

for any k ∈ ℤ. We may assume that |c| ≤ |a|; otherwise, start with XA rather than A. If c = 0, then A = ±U^q for some q. If A = U^q, then certainly A is in the group generated by X and U. If A = −U^q, then A = X^2 U^q, since X^2 = −I. It follows that here also A is in the group generated by X and U.
Now suppose c ≠ 0. Apply the Euclidean algorithm to a and c in the following modified way:

a = q_0 c + r_1,
−c = q_1 r_1 + r_2,
r_1 = q_2 r_2 + r_3,
⋮
(−1)^n r_{n−1} = q_n r_n + 0.

Therefore,

A = X^m U^{q_0} X U^{q_1} ⋯ X U^{q_n} X U^{q_{n+1}}
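The generation half of this argument is effective: the reduction can be run as an algorithm that writes a given unimodular matrix as a word in X and U (and hence in X and Y). The following Python sketch uses our own bookkeeping, not the text's exact recursion; it reduces the first column Euclidean-style.

```python
# Write A in SL(2, Z) as a product of powers of X = (0, -1; 1, 0) and
# U = XY = (1, 1; 0, 1). Function names are ours.

X = [[0, -1], [1, 0]]
U = [[1, 1], [0, 1]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(A):
    (a, b), (c, d) = A
    return [[d, -b], [-c, a]]          # valid since det(A) = 1

def mat_pow(A, n):
    if n < 0:
        A, n = mat_inv(A), -n
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R

def decompose(A):
    """Return factors ('X', k) / ('U', k), left to right, with product A."""
    M = [row[:] for row in A]
    factors = []
    while M[1][0] != 0:
        q = M[0][0] // M[1][0]         # floor division: remainder < |c|
        # M = U^q X^{-1} (X U^{-q} M), and X U^{-q} M has smaller |c|
        factors += [('U', q), ('X', -1)]
        M = mat_mul(X, mat_mul(mat_pow(U, -q), M))
    if M[0][0] == 1:                   # M = U^t
        factors.append(('U', M[0][1]))
    else:                              # M = -U^{-t}, and X^2 = -I
        factors += [('X', 2), ('U', -M[0][1])]
    return factors

def evaluate(factors):
    R = [[1, 0], [0, 1]]
    for name, k in factors:
        R = mat_mul(R, mat_pow(X if name == 'X' else U, k))
    return R

A = [[7, 3], [2, 1]]                   # det = 1
assert evaluate(decompose(A)) == A
```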
It remains to show that the relations

X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I

form a complete set of defining relations for SL(2, ℤ), or that every relation on these generators is derivable from these. It is straightforward to see that X and Y do satisfy these relations. Assume then that we have a relation

S = X^{ε_1} Y^{α_1} X^{ε_2} Y^{α_2} ⋯ Y^{α_n} X^{ε_{n+1}} = I.

Using the relations

X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I,

we may assume that S has the form

S = X^{ε_1} Y^{α_1} X Y^{α_2} ⋯ Y^{α_m} X^{ε_{m+1}}

with 1 ≤ α_i ≤ 2, and hence a relation of the form

Y^{α_1} X ⋯ Y^{α_m} X = X^α = S_1.
Write the matrix S_1 = Y^{α_1} X ⋯ Y^{α_m} X in the form

S_1 = ( a, −b ; −c, d ).

We claim that, for m ≥ 1, either

a, b, c, d ≥ 0, b + c > 0,

or

a, b, c, d ≤ 0, b + c < 0.

The claim is proved by induction on m. For m = 1, we have

YX = ( 1, 0 ; −1, 1 ), and Y^2 X = ( −1, 1 ; 0, −1 ),

and the claim holds.
Suppose the claim is correct for S_2 = ( a_1, −b_1 ; −c_1, d_1 ). Then

YXS_2 = ( a_1, −b_1 ; −(a_1 + c_1), b_1 + d_1 ) and Y^2 XS_2 = ( −a_1 − c_1, b_1 + d_1 ; c_1, −d_1 ).
Therefore, the claim is correct for all S_1 with m ≥ 1. This gives a contradiction, for the entries of X^α with α = 0, 1, 2, or 3 do not satisfy the claim. Hence, m = 0, and S can be reduced to a trivial relation by the given set of relations. Therefore, they are a complete set of defining relations, and the theorem is proved.
Corollary 14.3.11. The modular group M = PSL(2, ℤ) has the presentation

M = ⟨x, y; x^2 = y^3 = 1⟩,

where, as linear fractional transformations,

x : z′ = −1/z, and y : z′ = −1/(z + 1).
Proof. The center of SL(2, ℤ) is {±I}. Since X^2 = −I, setting X^2 = I in the presentation for SL(2, ℤ) gives the presentation for M. Writing the projective matrices as linear fractional transformations gives the second statement.
This corollary says that M is the free product of a cyclic group of order 2 and a cyclic
group of order 3, a concept we will introduce in Section 14.7.
We note that there is an elementary alternative proof of Corollary 14.3.11, as far as showing that X^2 = Y^3 = 1 is a complete set of defining relations. As linear fractional transformations, we have

X(z) = −1/z, Y(z) = −1/(z + 1), Y^2(z) = −(z + 1)/z.
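These relations can be spot-checked with exact rational arithmetic; the following Python sketch tests sample points (a sanity check, not a proof):

```python
from fractions import Fraction

def X(z): return -1 / z
def Y(z): return -1 / (z + 1)

for z in [Fraction(1, 2), Fraction(5, 3), Fraction(-7, 4)]:
    assert X(X(z)) == z              # X has order 2
    assert Y(Y(Y(z))) == z           # Y has order 3
    assert Y(Y(z)) == -(z + 1) / z   # the formula for Y^2 above
```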
Now let S be a nontrivial product of the form

S = Y^{α_1} X Y^{α_2} X ⋯ X Y^{α_n}

with 1 ≤ α_i ≤ 2 and α_1 = α_n. Then, if x ∈ ℝ^+, a computation with the transformations above shows that S(x) ∈ ℝ^−; hence, S ≠ 1.
This type of ping-pong argument can be used in many examples, see [36], [21]
and [31]. As another example, consider the unimodular matrices

A = ( 0, 1 ; −1, 2 ), B = ( 0, −1 ; 1, 2 ).

Then

A^n = ( −n + 1, n ; −n, n + 1 ) and B^n = ( −n + 1, −n ; n, n + 1 ) for n ∈ ℤ.

Considered as linear fractional transformations, these satisfy

A^n(ℝ^−) ⊂ ℝ^+ and B^n(ℝ^+) ⊂ ℝ^−
for all n ≠ 0. The ping-pong argument, applied to any element of the type

S = A^{n_1} B^{m_1} ⋯ B^{m_k} A^{n_{k+1}}

with all n_i, m_i ≠ 0 and n_1 + n_{k+1} ≠ 0, shows that S(x) ∈ ℝ^+ if x ∈ ℝ^−. It follows that there are no nontrivial relations on A and B; therefore, the subgroup of M generated by A, B must be a free group of rank 2.
To close this section, we present a significant number-theoretic application of the modular group. First, we need the following corollary to Corollary 14.3.11:
Theorem 14.3.14 (Fermat's two-square theorem). Let n > 0 be a natural number. Then n = a^2 + b^2 with (a, b) = 1 if and only if −1 is a quadratic residue modulo n.
Proof. Suppose −1 is a quadratic residue modulo n. Then there exists an x such that x^2 ≡ −1 (mod n); that is, x^2 = −1 − mn for some m ∈ ℤ. This implies that −x^2 − mn = 1, so that there must exist a projective unimodular matrix

A = ±( x, n ; m, −x ).
As an element of order 2 in M, the matrix A must be conjugate to X, say A = TXT^{-1} with T = ±( a, b ; c, d ). Then

T^{-1} = ±( d, −b ; −c, a ),

and

TXT^{-1} = ( a, b ; c, d )( 0, 1 ; −1, 0 )( d, −b ; −c, a ) = ±( −(bd + ac), a^2 + b^2 ; −(c^2 + d^2), bd + ac ).   (∗)
Therefore, any conjugate of X must have the form (∗), and thus A also must have the form (∗). Therefore, n = a^2 + b^2. Furthermore, (a, b) = 1, since in finding the form (∗), we had ad − bc = 1.
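The equivalence in the theorem can be tested by brute force for small n; the following Python sketch is a numerical check of the statement, not part of the proof:

```python
from math import gcd, isqrt

def minus_one_is_qr(n):
    """Is -1 a quadratic residue modulo n?"""
    return any(x * x % n == (n - 1) % n for x in range(n))

def sum_of_coprime_squares(n):
    """Is n = a^2 + b^2 with gcd(a, b) = 1?"""
    r = isqrt(n)
    return any(gcd(a, b) == 1 and a * a + b * b == n
               for a in range(r + 1) for b in range(a, r + 1))

# the two conditions agree for every n in this range
for n in range(1, 200):
    assert minus_one_is_qr(n) == sum_of_coprime_squares(n)
```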
Conversely, suppose that n = a^2 + b^2 with (a, b) = 1. Since (a, b) = 1, there exist c, d ∈ ℤ with ad − bc = 1, giving a projective unimodular matrix

T = ±( a, b ; c, d ).

Then, by (∗),

TXT^{-1} = ±( α, a^2 + b^2 ; γ, −α ) = ±( α, n ; γ, −α )

for some α, γ ∈ ℤ. Taking determinants gives −α^2 − γn = 1, so α^2 ≡ −1 (mod n); that is, −1 is a quadratic residue modulo n.
This type of group-theoretical proof can be extended in several directions. Kern-Isberner and Rosenberger, see [34], considered groups of matrices of the form

U = ( a, b√N ; c√N, d ), a, b, c, d, N ∈ ℤ, ad − Nbc = 1,

or

U = ( a√N, b ; c, d√N ), a, b, c, d, N ∈ ℤ, Nad − bc = 1.
N ∈ {1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 16, 18, 22, 25, 28, 37, 58}
The proof of the above results depends on the class number of ℚ(√−N) (see [34]).
In another direction, Fine [50] and [49] showed that the Fermat two-square property
is actually a property satisfied by many rings R. These are called sum of squares rings.
For example, if p ≡ 3 (mod 4), then ℤpn for n > 1 is a sum of squares ring.
Reidemeister–Schreier process
Let G, H and T be as above. Then H is generated by the set
with a complete set of defining relations given by conjugates of the original relators
rewritten in terms of the subgroup generating set.
To actually rewrite the relators in terms of the new generators, we use a mapping τ
on words on the generators of G called the Reidemeister rewriting process. This map is
defined as follows: If

W = a_{v_1}^{e_1} a_{v_2}^{e_2} ⋯ a_{v_j}^{e_j} with e_i = ±1

defines an element of H, then

τ(W) = S_{t_1, a_{v_1}}^{e_1} S_{t_2, a_{v_2}}^{e_2} ⋯ S_{t_j, a_{v_j}}^{e_j},
We present two examples; one with a finite group, and then an important example
with a free group, which shows that a countable free group contains free subgroups of
arbitrary ranks.
Let H = A_4′ be the commutator subgroup of the alternating group A_4. We use the above method to find a presentation for H. Now A_4/A_4′ ≅ ℤ_3; therefore, |A_4 : A_4′| = 3. A Schreier system is then {1, b, b^2}. The generators for A_4′ are then
Example 14.4.2. Let F = ⟨x, y; ⟩ be the free group of rank 2, and let H = F′ be the commutator subgroup. Then F/F′ ≅ ℤ × ℤ, a free Abelian group of rank 2. It follows that H has infinite index in F. As Schreier coset representatives, we can take
The relations are only trivial; therefore, H is free on the countably infinite set of generators above. It follows that a free group of rank 2 contains as a subgroup a free group of countably infinite rank. Since a free group of countably infinite rank contains as subgroups free groups of all finite ranks, it follows that a free group of rank 2 contains as a subgroup a free subgroup of any finite rank.
Theorem 14.4.3. Let F be free of rank 2. Then the commutator subgroup F′ is free of countably infinite rank. In particular, a free group of rank 2 contains as a subgroup a free group of any finite rank n.
Corollary 14.4.4. Let n, m be any pair of positive integers n, m ≥ 2 and Fn , Fm free groups
of ranks n, m, respectively. Then Fn can be embedded into Fm , and Fm can be embedded
into Fn .
Theorem 14.5.1. Suppose that K is a connected cell complex. Suppose that T is a maximal
tree within the 1-skeleton of K. Then a presentation for π(K) can be determined in the
following manner:
Generators: all edges outside of the maximal tree T.
Relations: (a) {u, v} = 1 if {u, v} is an edge in T.
(b) {u, v}{v, w} = {u, w} if u, v, w lie in a simplex of K.
Corollary 14.5.2. The fundamental group of a connected graph is free. Furthermore, its
rank is the number of edges outside a maximal tree.
p^{-1}(S) = ⋃_i S_i
Lemma 14.5.4. If K1 is a connected covering complex for K, then K1 and K have the same
dimension.
What is crucial in using covering complexes to study the fundamental group is that
there is a Galois theory of covering complexes and maps. The covering map p induces a
homomorphism of the fundamental group, which we will also call p. Then we have the
following:
Theorem 14.5.5. Let K1 be a covering complex of K with covering map p. Then p(π(K1 )) is
a subgroup of π(K). Conversely, to each subgroup H of π(K), there is a covering complex
K1 with π(K1 ) = H. Hence, there is a one-to-one correspondence between subgroups of the
fundamental group of a complex K and covers of K.
We will see the analog of this theorem in regard to algebraic field extensions in
Chapter 15.
A topological space X is simply connected if π(X) = {1}. Hence, the covering com-
plex of K corresponding to the identity in π(K) is simply connected. This is called the
universal cover of K since it covers any other cover of K.
Based on Theorem 14.5.1, we get a very simple proof of the Nielsen–Schreier theo-
rem.
Proof. Let F be a free group. Then F = π(K), where K is a connected graph. Let H be a
subgroup of F. Then H corresponds to a cover K1 of K. But a cover is also 1-dimensional;
hence, H = π(K1 ), where K1 is a connected graph. Therefore, H is also free.
Theorem 14.5.7. Given an arbitrary presentation ⟨X; R⟩, there exists a connected 2-com-
plex K with π(K) = ⟨X; R⟩.
We note that the books by Rotman, see [43], and Fine, Moldenhauer, Rosenberger,
and Wienke, see [26], have significantly detailed and accessible descriptions of groups
and complexes. Cayley, and then Dehn, introduced for each group G a graph, now called
Cayley graph, as a tool to apply complexes to the study of G. The Cayley graph is actually
tied to a presentation, and not to the group itself. Gromov reversed the procedure and
showed that by considering the geometry of the Cayley graph, one could get information
about the group. This led to the development of the theory of hyperbolic groups.
In the following, we need a special kind of generating system for finitely presented groups G = ⟨X; R⟩. Let S ⊂ G be a generating system for G. Then S is called a valid
generating system if it has the following two properties:
(a) 1 ∉ S where 1 is the neutral element of G.
(b) the set S is a symmetric generating system, that is, if γ ∈ S then also γ−1 ∈ S.
In the following, the pair (G, S) denotes a finitely presented group G together with a valid
generating system S. Given such a pair we define a metric on G with respect to S in the
following way. Let (G, S) be a pair as above. Then define lS : G → [0, ∞) as follows: If γ ∈ G, then lS(γ) = 0 if γ = 1, and if γ ≠ 1, then let lS(γ) be the minimal length of a word in the elements of S that represents γ. This length is also called the S-length.
14.5 Geometric Interpretation � 213
We now define the desired metric dS : G × G → [0, ∞) via dS (γ1 , γ2 ) = lS (γ1−1 γ2 ) and
check that dS is indeed a metric:
1. The equivalence lS (γ) = 0 if and only if γ = 1 implies the equivalence dS (γ1 , γ2 ) = 0
if and only if γ1 = γ2 .
2. We have dS (γ1 , γ2 ) = lS (γ1−1 γ2 ) = lS (γ2−1 γ1 ) = dS (γ2 , γ1 ), because S is symmetric.
3. We have dS (γ1 , γ2 ) ≤ dS (γ1 , β) + dS (β, γ2 ) for all γ1 , γ2 , β ∈ G as γ1−1 γ2 = γ1−1 ββ−1 γ2 .
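The S-length lS and the metric dS can be computed by breadth-first search: lS(γ) is the graph distance from 1 to γ in the Cayley graph of (G, S). A small Python sketch for G = S3 with the symmetric generating set consisting of the transpositions (0 1) and (1 2) (the permutation encoding and helper names are our own):

```python
from collections import deque

def compose(p, q):
    # (p ∘ q)(i) = p[q[i]]; permutations are encoded as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def word_lengths(identity, gens):
    """S-length l_S on the group generated by gens, by breadth-first
    search from the identity: l_S(g) = shortest word in gens representing g."""
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = compose(g, s)       # append one more letter from S
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist

e = (0, 1, 2)
S = [(1, 0, 2), (0, 2, 1)]  # the transpositions (0 1) and (1 2); S is symmetric, 1 ∉ S
l = word_lengths(e, S)
d = lambda g1, g2: l[compose(inverse(g1), g2)]  # d_S(g1, g2) = l_S(g1^{-1} g2)
print(len(l), max(l.values()))  # 6 3 : all of S3 is reached; diameter 3
```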
The construction of the Cayley graph depends on the choice of S, as does the metric on (G, S). We would like to have an equivalence relation that permits us to relate the different metric spaces for G if we alter S.
Definition 14.5.8. Let (X, d) and (X′, d′) be metric spaces. Then (X, d) and (X′, d′) are quasi-isometric if there are functions f : X → X′ and g : X′ → X together with constants
λ > 0 and C ≥ 0, such that
(a) d ′ (f (x), f (y)) ≤ λd(x, y) + C for all x, y ∈ X,
(b) d(g(x ′ ), g(y′ )) ≤ λd ′ (x ′ , y′ ) + C for all x ′ , y′ ∈ X ′ ,
(c) d(g(f (x)), x) ≤ C for all x ∈ X, and
(d) d ′ (f (g(x ′ )), x ′ ) ≤ C for all x ′ ∈ X ′ .
Quasi-isometry is an equivalence relation; only transitivity requires an argument. Suppose we have functions f : X → X′, g : X′ → X and f′ : X′ → X″, g′ : X″ → X′, together with constants λ, C and λ′, C′, respectively, such that the conditions (a)–(d) are satisfied. We look for functions f″ : X → X″, g″ : X″ → X and constants λ″, C″, such that conditions (a)–(d) are satisfied again. Set f″ = f′ ∘ f and g″ = g ∘ g′, λ″ = λλ′, and C″ = λ′C + λC′ + 2C + C′. For condition (a), we obtain
d″(f″(x), f″(y)) ≤ λ′d′(f(x), f(y)) + C′ ≤ λλ′d(x, y) + λ′C + C′ ≤ λ″d(x, y) + C″,
and condition (b) is analogous. For condition (c), we use
d′(g′(f′(f(x))), f(x)) ≤ C′,
which gives
d(g″(f″(x)), x) ≤ d(g(g′(f′(f(x)))), g(f(x))) + d(g(f(x)), x) ≤ (λC′ + C) + C ≤ C″;
condition (d) is analogous.
Theorem 14.5.10. Let G be a finitely presented group with finite valid generating systems S and S′. Then the metric spaces (G, S) and (G, S′) are quasi-isometric.
The proof of (b) is analogous, and (c) and (d) are obvious because f and g are inverses of each other.
We observe: The quasi-isometry class of the metric spaces for (G, S), S finite, is an
invariant of the group G and does not depend on the finite generating set S.
We ask: Is this invariant suitable in order to study group theoretical properties of G
and to what extent does quasi-isometry preserve group theoretic properties?
We call two finitely presented groups G1 and G2 quasi-isometric, if the metric spaces
for (G1 , S1 ), S1 a valid generating set for G1 , and (G2 , S2 ), S2 a valid generating set for G2 ,
are quasi-isometric.
To motivate hyperbolic groups, we first have to describe what a hyperbolic metric space is.
Note that the definition explicitly allows degenerate triangles; for instance, take y = z with two different geodesic segments from x to y and from x to z.
An example of a geodesic space is the Cayley graph of a finitely presented group. If the Cayley graph is not a tree, then it contains a circle (an embedded loop). Hence, more than one geodesic segment between the same pair of points is possible.
We fix the following notation: Let x0 , x1 ∈ X for a geodesic space X. Although several
geodesic segments in X with start points x0 and end points x1 are allowed, we denote by
[x0 , x1 ] a given geodesic segment with x0 and x1 as start and end points.
Definition 14.5.12. Let δ ≥ 0. We say that a geodesic space X satisfies the Rips condition
for the constant δ if for every geodesic triangle [x, y] ∪ [y, z] ∪ [z, x] in X and for every
u ∈ [x, y] the following holds: d(u, [y, z] ∪ [z, x]) ≤ δ, see Figure 14.1. We call a geodesic
space X hyperbolic if it satisfies the Rips condition for a constant δ ≥ 0.
Figure 14.1: Geodesic triangle [x, y] ∪ [y, z] ∪ [z, x] with a point u on the side [x, y].
Theorem 14.5.13. Let X1 and X2 be geodesic spaces that are quasi-isometric. If X1 is hy-
perbolic then also X2 is hyperbolic.
Definition 14.5.14. Let Γ be a finitely generated group. Γ is called a hyperbolic group if there is a finite valid generating system S such that the metric space for (Γ, S) (equivalently, the Cayley graph of (Γ, S)) is a hyperbolic space.
A proof is given in [26]. Hyperbolic groups have many other important properties
(see, for instance, [26]). We end this section with a collection of examples of hyperbolic
groups.
Example 14.5.16. The following groups are hyperbolic. For proofs see [26].
1. Finite groups and infinite cyclic groups.
2. Fundamental groups of compact, connected Riemannian manifolds of negative sectional curvature. In particular, cocompact Fuchsian and Kleinian groups.
3. One-relator groups with torsion.
4. Free products of finitely many hyperbolic groups, see Section 14.8.
5. A group G of F-type is a group with a presentation
G = ⟨a1, …, an; a1^{r1} = ⋯ = an^{rn} = u(a1, …, ap)v(ap+1, …, an) = 1⟩.
Theorem 14.6.1 (Dyck’s theorem). Let G = ⟨X; R⟩, and suppose that H ≅ G/N, where N is
a normal subgroup of G. Then a presentation for H is ⟨X; R ∪ R1 ⟩ for some set of words R1
on X. Conversely, the presentation ⟨X; R ∪ R1⟩ defines a group that is a factor group of G.
Proof. Since each element of H is a coset of N, it has the form gN for some g ∈ G. It is clear
then that the images of X generate H. Furthermore, since H is a homomorphic image of
G, each relator in R is a relator in H. Let N1 be a set of elements that generate N, and
let R1 be the corresponding words in the free group on X. Then R1 is an additional set of
relators in H. Hence, R ∪ R1 is a set of relators for H. Any relator in H is either a relator
in G, hence a consequence of R, or can be realized as an element of G that lies in N, and
therefore a consequence of R1 . Therefore, R ∪ R1 is a complete set of defining relators for
H, and H has the presentation H = ⟨X; R ∪ R1 ⟩.
Conversely, let G = ⟨X; R⟩ and G1 = ⟨X; R ∪ R1⟩. Then G = F(X)/N1, where N1 = N(R), and G1 = F(X)/N2, where N2 = N(R ∪ R1). Hence, N1 ⊂ N2. The normal subgroup N2/N1 of F(X)/N1 corresponds to a normal subgroup N of G, and therefore, by the isomorphism theorem, G1 ≅ G/N; that is, G1 is a factor group of G.
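As a one-generator illustration of Dyck's theorem (our own worked example): G = ⟨x; x^a⟩ is cyclic of order a, and adjoining the further relator x^b yields the factor group ⟨x; x^a, x^b⟩, which is cyclic of order gcd(a, b):

```python
from math import gcd

# <x; x^a, x^b> is the quotient of Z by the subgroup generated by a and b,
# which is gcd(a, b)Z; so the quotient is cyclic of order gcd(a, b).
print(gcd(6, 4))  # 2: <x; x^6, x^4> is cyclic of order 2
```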
All three of these problems have negative answers in general. That is, for each of these
problems one can find a finite presentation, for which these questions cannot be an-
swered algorithmically (see [36]). Attempts for solutions, and for solutions in restricted
cases, have been of central importance in combinatorial group theory. For this reason
combinatorial group theory has always searched for and studied classes of groups, in
which these decision problems are solvable.
For finitely generated free groups, there are simple and elegant solutions to all three
problems. If F is a free group on x1 , . . . , xn and W is a freely reduced word in x1 , . . . , xn ,
then W ≠ 1 if and only if L(W ) ≥ 1 for L(W ) the length of W . Since freely reducing
any word to a freely reduced word is algorithmic, this provides a solution to the word
problem. Furthermore, a freely reduced word W = x_{v1}^{e1} x_{v2}^{e2} ⋯ x_{vn}^{en} is cyclically reduced if
v1 ≠ vn, or if v1 = vn, then e1 ≠ −en.
conjugate to an element given by a cyclically reduced word called a cyclic reduction.
This leads to a solution to the conjugacy problem. Suppose V and W are two words in the generators of F with respective cyclic reductions V0 and W0. Then V is conjugate to W if and only if V0 is a cyclic permutation of W0. Finally, two finitely generated free groups
are isomorphic if and only if they have the same rank.
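The three algorithms just described (free reduction, cyclic reduction, and the cyclic-permutation test for conjugacy) can be sketched in a few lines of Python; words are encoded here as lists of (generator, exponent) pairs, an encoding of our own choosing:

```python
def free_reduce(word):
    """Freely reduce a word; letters are pairs (generator, ±1).
    Cancels adjacent x^e x^{-e} until none remain (stack-based)."""
    out = []
    for gen, exp in word:
        if out and out[-1][0] == gen and out[-1][1] == -exp:
            out.pop()            # cancel x^e x^{-e}
        else:
            out.append((gen, exp))
    return out

def cyclic_reduce(word):
    """Cyclically reduce: strip matching first/last letters."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0][0] == w[-1][0] and w[0][1] == -w[-1][1]:
        w = w[1:-1]
    return w

def conjugate_in_free_group(v, w):
    """v ~ w in a free group iff their cyclic reductions are
    cyclic permutations of each other."""
    cv, cw = cyclic_reduce(v), cyclic_reduce(w)
    if len(cv) != len(cw):
        return False
    return any(cw[i:] + cw[:i] == cv for i in range(max(len(cw), 1)))

x, X = ('x', 1), ('x', -1)
y, Y = ('y', 1), ('y', -1)
print(free_reduce([x, y, Y, X]))                 # [] : the word is trivial
print(conjugate_in_free_group([x, y], [y, x]))   # True: xy ~ yx (conjugate by x)
```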
Definition 14.8.1. The free product of A and B, denoted by A ∗ B, is the group G with
the presentation ⟨a1 , . . . , b1 , . . . ; R1 , . . . , S1 , . . .⟩; that is, the generators of G consist of the
disjoint union of the generators of A and B with relators taken as the disjoint union of
the relators Ri of A and Sj of B. A and B are called the factors of G.
Free products exist and are nontrivial. In that regard, we have the following:
Theorem 14.8.3. Let G = A ∗ B. Then the maps A → G and B → G are injections. The
subgroup of G generated by the generators of A has the presentation ⟨generators of A;
relators of A⟩, that is, is isomorphic to A. Similarly for B. Thus, A and B can be considered
as subgroups of G. In particular, A ∗ B is nontrivial if A and B are.
14.8 Group Amalgams: Free Products and Direct Products � 219
Free products share many properties with free groups. First of all there is a categor-
ical formulation of free products. Specifically we have the following:
Theorem 14.8.4. A group G is the free product of its subgroups A and B if A and B generate
G, and given homomorphisms f1 : A → H, f2 : B → H into a group H, there exists a unique
homomorphism f : G → H, extending f1 and f2 .
Secondly, each element of a free product has a normal form related to the reduced
words of free groups. If G = A ∗ B, then a reduced sequence or reduced word in G is a
sequence g1 g2 . . . gn , n ≥ 0, with gi ≠ 1, each gi in either A or B and gi , gi+1 not both in the
same factor. Then the following hold:
Theorem 14.8.7. If two elements of a free product commute, then they are both powers
of a single element or are contained in a conjugate of an Abelian subgroup of a factor.
Theorem 14.8.8 (Kurosh). A subgroup of a free product is also a free product. Explicitly,
if G = A ∗ B and H ⊂ G, then
H = F ∗ (∗Aα ) ∗ (∗Bβ ),
where F is a free group, (∗Aα ) is a free product of conjugates of subgroups of A, and (∗Bβ )
is a free product of conjugates of subgroups of B.
We note that the rank of F and the number of the other factors can be computed.
A complete discussion of these is in [37], [36] and [21].
If A and B are disjoint groups, then we now have two types of products forming new
groups out of them: the free product and the direct product. In both these products, the
original factors inject. In the free product, there are no relations between elements of A
and elements of B, whereas in a direct product, each element of A commutes with each
element of B. If a ∈ A and b ∈ B, a cross commutator is [a, b] = aba−1 b−1 . The direct
product is a factor group of the free product, and the kernel is precisely the normal
subgroup generated by all the cross commutators.
In other words, A × B = (A ∗ B)/H, where H is this normal subgroup.
14.9 Exercises
1. Let X^{-1} be a set disjoint from X, but bijective to X. A word in X is a finite sequence of letters from the alphabet X ∪ X^{-1}. That is, a word has the form
w = x_{i1}^{ε1} x_{i2}^{ε2} ⋯ x_{in}^{εn},
where x_{ij} ∈ X, and εj = ±1. Let W(X) be the set of all words on X.
If w1 , w2 ∈ W (X), we say that w1 is equivalent to w2 , denoted by w1 ∼ w2 , if w1 can be
converted to w2 by a finite string of insertions and deletions of trivial words. Verify
that this is an equivalence relation on W (X).
2. In F(X), let N(X) be the subgroup generated by all squares in F(X); that is, N(X) = ⟨w^2 | w ∈ F(X)⟩.
Show that N(X) is a normal subgroup, and that the factor group F(X)/N(X) is
Abelian, where every nontrivial element has order 2.
3. Show that a free group F is torsion-free.
4. Let F be a free group, and a, b ∈ F. Show: If ak = bk , k ≠ 0, then a = b.
5. Let F = ⟨a, b; ⟩ be a free group with basis {a, b}. Let ci = a−i bai , i ∈ ℤ. Show that
then G = ⟨ci , i ∈ ℤ⟩ is free with basis {ci | i ∈ ℤ}.
6. Show that ⟨x, y; x 2 y3 , x 3 y4 ⟩ ≅ ⟨x; x⟩ = {1}.
7. Let G = ⟨v1 , . . . , vn ; v21 ⋅ ⋅ ⋅ v2n ⟩, n ≥ 1, and α : G → ℤ2 be the epimorphism with
α(vi ) = −1 for all i. Let U be the kernel of α. Show that then U has a presentation
https://doi.org/10.1515/9783111142524-015
222 � 15 Finite Galois Extensions
Definition 15.2.1. Let L|K be a field extension. Then
Aut(L|K) = {α ∈ Aut(L) : α|K = 1K}
is called the set of automorphisms of L over K. Notice that if α ∈ Aut(L|K), then α(k) = k for all k ∈ K.
Lemma 15.2.2. Let L|K be a field extension. Then Aut(L|K) forms a group called the Galois
group of L|K.
Proof. Aut(L|K) ⊂ Aut(L). Hence, to show that Aut(L|K) is a group, we only have to show that it is a subgroup of Aut(L). Now the identity map on L is certainly the identity map on
K, so 1 ∈ Aut(L|K); hence, Aut(L|K) is nonempty. If α, β ∈ Aut(L|K), then consider α−1 β.
If k ∈ K, then β(k) = k, and α(k) = k, so α−1 (k) = k.
Therefore, α−1 β(k) = k for all k ∈ K, and hence α−1 β ∈ Aut(L|K). It follows that
Aut(L|K) is a subgroup of Aut(L), and therefore a group.
If f (x) ∈ K[x] \ K and L is the splitting field of f (x) over K, then Aut(L|K) is also
called the Galois group of f (x).
Lemma 15.2.3. Every automorphism of a prime field P is the identity. In particular, Aut(L|P) = Aut(L) for any field L with prime field P.
Proof. We must show that any automorphism of a prime field P is the identity. Now
if α ∈ Aut(L), then α(1) = 1, and so α(n ⋅ 1) = n ⋅ 1. Therefore, in P, α fixes all integer
multiples of the identity. However, every element of P can be written as a quotient (m · 1)/(n · 1) of
integer multiples of the identity. Since α is a field homomorphism and α fixes both the
top and the bottom, it follows that α will fix every element of this form, and hence fix
each element of P.
For splitting fields, the Galois group is a permutation group on the zeros of the defin-
ing polynomial.
Theorem 15.2.4. Let f (x) ∈ K[x] and L the splitting field of f (x) over K. Suppose that f (x)
has zeros α1 , . . . , αn ∈ L.
(a) Then each ϕ ∈ Aut(L|K) permutes the zeros. In particular, Aut(L|K) is isomorphic to a subgroup of Sn, and each ϕ ∈ Aut(L|K) is uniquely determined by its values on the zeros of f(x).
(b) If f (x) is irreducible, then Aut(L|K) operates transitively on {α1 , . . . , αn }. Hence, for
each i, j, there is a ϕ ∈ Aut(L|K) such that ϕ(αi ) = αj .
(c) If f (x) = b(x − α1 ) ⋅ ⋅ ⋅ (x − αn ) with α1 , . . . , αn pairwise distinct and Aut(L|K) operates
transitively on α1 , . . . , αn , then f (x) is irreducible.
Example 15.2.5. Let f(x) = (x^2 − 2)(x^2 − 3) ∈ ℚ[x]. The field L = ℚ(√2, √3) is the splitting field of f(x).
Over L, we have
f(x) = (x − √2)(x + √2)(x − √3)(x + √3).
We show that Aut(L) consists of exactly four elements.
Proof. First, we show that |Aut(L)| ≤ 4. Let α ∈ Aut(L). Then α is uniquely determined
by α(√2) and α(√3), and
2 = α(2) = α((√2)^2) = (α(√2))^2.
Hence, α(√2) = ±√2. Analogously, α(√3) = ±√3. From this it follows that |Aut(L)| ≤ 4 and that α^2 = 1 for any α ∈ Aut(L).
Next we show that the polynomial f (x) = x 2 − 3 is irreducible over K = ℚ(√2).
Assume that x^2 − 3 were reducible over K. Then √3 ∈ K. This implies that √3 = a/b + (c/d)√2 with a, b, c, d ∈ ℤ and b ≠ 0 ≠ d, and gcd(c, d) = 1. Then bd√3 = ad + bc√2, hence 3b^2d^2 = a^2d^2 + 2b^2c^2 + 2√2 abcd. Since bd ≠ 0 and √2 is irrational, this implies that we must have ac = 0.
If c = 0, then √3 = a/b ∈ ℚ, a contradiction. If a = 0, then √3 = (c/d)√2, which implies 3d^2 = 2c^2. Then 3 | c, and hence also 3 | d, so that 3 | gcd(c, d) = 1, again a contradiction.
Hence f (x) = x 2 − 3 is irreducible over K = ℚ(√2).
Since L is the splitting field of f(x) over K and f(x) is irreducible over K, there exists
an automorphism α ∈ Aut(L) with α(√3) = −√3 and α|K = IK ; that is, α(√2) = √2.
Analogously, there is a β ∈ Aut(L) with β(√2) = −√2 and β(√3) = √3.
Clearly, α ≠ β, αβ = βα and α ≠ αβ ≠ β. It follows that Aut(L) = {1, α, β, αβ}, complet-
ing the proof.
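The example can be verified with exact rational arithmetic: represent an element of L = ℚ(√2, √3) by the tuple (a, b, c, d) standing for a + b√2 + c√3 + d√6, so that α and β act by sign changes on coordinates. A Python sketch (the coordinate model is our own):

```python
from fractions import Fraction as F

# Elements of Q(√2, √3) as tuples (a, b, c, d) = a + b√2 + c√3 + d√6.
def mul(u, v):
    a, b, c, d = u
    e, f, g, h = v
    return (a*e + 2*b*f + 3*c*g + 6*d*h,      # coefficient of 1
            a*f + b*e + 3*(c*h + d*g),        # coefficient of √2  (√3·√6 = 3√2)
            a*g + c*e + 2*(b*h + d*f),        # coefficient of √3  (√2·√6 = 2√3)
            a*h + d*e + b*g + c*f)            # coefficient of √6

alpha = lambda u: (u[0],  u[1], -u[2], -u[3])   # √3 -> -√3, fixes √2
beta  = lambda u: (u[0], -u[1],  u[2], -u[3])   # √2 -> -√2, fixes √3

u = (F(1), F(2), F(3), F(4))
v = (F(5), F(-1), F(0), F(2))
# alpha is multiplicative, and alpha, beta are commuting involutions:
print(alpha(mul(u, v)) == mul(alpha(u), alpha(v)))             # True
print(alpha(alpha(u)) == u, alpha(beta(u)) == beta(alpha(u)))  # True True
```

Together with 1, this exhibits {1, α, β, αβ} as a Klein four-group inside Aut(L).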
Theorem 15.3.1. For a subset G ⊂ Aut(K), the set Fix(K, G) = {a ∈ K : g(a) = a for all g ∈ G} is a subfield of K, called the fix field of G in K.
Proof. 1 ∈ K is in Fix(K, G), so Fix(K, G) is not empty. Let k1 , k2 ∈ Fix(K, G), and let g ∈ G.
Then g(k1 ± k2 ) = g(k1 ) ± g(k2 ) since g is an automorphism.
Then g(k1 ) ± g(k2 ) = k1 ± k2 , and it follows that k1 ± k2 ∈ Fix(K, G). In an analogous
manner, k1 k2−1 ∈ Fix(K, G) if k2 ≠ 0; therefore, Fix(K, G) is a subfield of K.
Definition 15.3.2. The extension L|K is a (finite) Galois extension if there exists a finite
subgroup G ⊂ Aut(L) such that K = Fix(L, G).
Lemma 15.3.3. Let L = ℚ(√2, √3) and K = ℚ. Then L|K is a Galois extension.
Proof. Let G = Aut(L|ℚ). From the example in the previous section, there are automorphisms α, β ∈ G with
α(√2) = √2, α(√3) = −√3 and β(√2) = −√2, β(√3) = √3.
We have Fix(L, G) = ℚ: an element of L fixed by both α and β already lies in ℚ. Hence, L|ℚ is a Galois extension.
Theorem 15.4.1 (Fundamental theorem of Galois theory). Let L|K be a Galois extension
with Galois group G = Aut(L|K). For each intermediate field E, let τ(E) be the subgroup
of G fixing E. Then the following hold:
(1) τ is a bijection between intermediate fields containing K and subgroups of G.
(2) L|K is a finite extension, and if M is an intermediate field, then |L : M| = |Aut(L|M)|
and |M : K| = |Aut(L|K) : Aut(L|M)|.
(3) If M is an intermediate field, then the following hold:
(a) L|M is always a Galois extension.
(b) M|K is a Galois extension if and only if Aut(L|M) is a normal subgroup of
Aut(L|K).
(4) If M is an intermediate field and M|K is a Galois extension we have the following:
(a) α(M) = M for all α ∈ Aut(L|K).
(b) The map ϕ : Aut(L|K) → Aut(M|K) with ϕ(α) = α|M = β is an epimorphism.
(c) Aut(M|K) = Aut(L|K)/ Aut(L|M).
(5) The lattice of subfields of L containing K is the inverted lattice of subgroups of
Aut(L|K).
We will prove this main result via a series of theorems, and then combine them all.
Theorem 15.4.2. Let G be a group, K a field, and α1 , . . . , αn pairwise distinct group ho-
momorphisms from G to K ⋆ , the multiplicative group of K. Then α1 , . . . , αn are linearly
independent elements of the K-vector space of all homomorphisms from G to K.
Proof. Suppose that
∑_{i=1}^{n} ki αi = 0 with k1, …, kn ∈ K;
then we must show that all ki = 0. We use induction on n. For n = 1, the claim is clear, since α1(g) ≠ 0 for all g ∈ G. Now let n > 1, and assume the claim holds for fewer than n homomorphisms. Since α1 ≠ αn, there exists an a ∈ G such that α1(a) ≠ αn(a). Let g ∈ G and apply the sum above to ag. We get
n
∑ ki (αi (a))(αi (g)) = 0. (∗∗)
i=1
Multiplying the original relation ∑_{i=1}^{n} ki αi = 0, evaluated at g, by αn(a) gives
∑_{i=1}^{n} ki (αn(a))(αi(g)) = 0. (∗∗∗)
If we subtract equation (∗∗∗) from equation (∗∗), then the last term vanishes, and we have an equation in the n − 1 homomorphisms α1, …, αn−1. Since these are linearly independent by induction, we obtain
k1(α1(a) − αn(a)) = 0
for the coefficient of α1. Since α1(a) ≠ αn(a), we must have k1 = 0. Now α2, …, αn are by induction linearly independent, so k2 = ⋯ = kn = 0 also. Hence, all the coefficients must be zero, and therefore the mappings are independent.
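Theorem 15.4.2 can be illustrated concretely: the n distinct homomorphisms ℤ/nℤ → ℂ⋆ given by g ↦ ω^{kg}, with ω = e^{2πi/n}, have as value matrix a Vandermonde matrix in the distinct values ω^k, which is invertible. A floating-point Python sketch (the naive determinant routine is our own):

```python
import cmath

# Distinct homomorphisms Z/nZ -> C* are g -> w^(k g) for w = e^(2πi/n).
# Their value matrix (χ_k(g))_{k,g} is Vandermonde in the distinct w^k,
# hence invertible, so the characters are linearly independent.
n = 5
w = cmath.exp(2j * cmath.pi / n)
M = [[w ** (k * g) for g in range(n)] for k in range(n)]

def det(m):
    # Laplace expansion along the first row (fine for small matrices)
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

print(abs(det(M)) > 1e-6)   # True: the characters are linearly independent
```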
Theorem 15.4.3. Let α1, …, αn be pairwise distinct monomorphisms from the field K into the field K′. Let
L = {a ∈ K : α1(a) = ⋯ = αn(a)}.
Then L is a subfield of K, and |K : L| ≥ n.
Proof. Certainly L is a field. Assume that r = |K : L| < n, and let {a1, …, ar} be a basis of the L-vector space K. We consider the following system of linear equations with r equations and n unknowns x1, …, xn:
∑_{i=1}^{n} xi(αi(aj)) = 0, j = 1, …, r.
Since r < n, it has a nontrivial solution (x1, …, xn) with entries in K′. For each a ∈ K, write a = ∑_{j=1}^{r} lj aj with lj ∈ L. Then
∑_{i=1}^{n} xi(αi(a)) = ∑_{i=1}^{n} xi(∑_{j=1}^{r} αi(lj)αi(aj)) = ∑_{j=1}^{r} (α1(lj)) ∑_{i=1}^{n} xi(αi(aj)) = 0
since α1 (lj ) = αi (lj ) for i = 2, . . . , n. This holds for all a ∈ K, and hence ∑ni=1 xi αi = 0,
contradicting Theorem 15.4.2. Therefore, our assumption that |K : L| < n must be false,
and hence |K : L| ≥ n.
15.4 The Fundamental Theorem of Galois Theory � 227
Definition 15.4.4. Let K be a field and G = {α1, …, αn} a finite subgroup of Aut(K). The map trG : K → K, given by
trG(a) = ∑_{i=1}^{n} αi(a),
is called the trace map of G.
Theorem 15.4.6. Let K be a field and G a finite subgroup of Aut(K). Then
|K : Fix(K, G)| = |G|.
Proof. Let L = Fix(K, G), and suppose that |G| = n. From Theorem 15.4.3, we know that
|K : L| ≥ n. We must show that |K : L| ≤ n.
Suppose that G = {α1, …, αn}. To prove the result, we show that if m > n and a1, …, am ∈ K, then a1, …, am are linearly dependent over L.
We consider the following system of n linear equations in the m unknowns x1, …, xm:
∑_{j=1}^{m} αi(aj) xj = 0, i = 1, …, n.
Since m > n, there exists a nontrivial solution (y1, …, ym) ∈ K^m; fix an index l with yl ≠ 0. By Theorem 15.4.2, the trace map trG = α1 + ⋯ + αn is not the zero map, so we may choose k ∈ K with trG(k) ≠ 0. Set
(x1, …, xm) = k yl^{-1} (y1, …, ym).
This m-tuple (x1, …, xm) is then also a nontrivial solution of the system of equations considered above, and xl = k, so trG(xl) ≠ 0.
Applying αi^{-1} to the i-th equation, we have
∑_{j=1}^{m} aj (αi^{-1}(xj)) = 0 for i = 1, …, n.
Since αi^{-1} runs through all of G as αi does, summation over i leads to
0 = ∑_{j=1}^{m} aj ∑_{i=1}^{n} (αi(xj)) = ∑_{j=1}^{m} (trG(xj)) aj.
Each trG(xj) lies in L = Fix(K, G), and trG(xl) ≠ 0. Hence, a1, …, am are linearly dependent over L, as claimed.
Theorem 15.4.7. Let K be a field and G = {α1, …, αn} a finite subgroup of Aut(K). Then
Aut(K|Fix(K, G)) = G.
Proof. Clearly, G ⊂ Aut(K|Fix(K, G)). Assume there is an α ∈ Aut(K|Fix(K, G)) with α ∉ G. Then α, α1, …, αn are n + 1 pairwise distinct automorphisms of K that all fix Fix(K, G) elementwise. From Theorem 15.4.3, we have that |K : Fix(K, G)| ≥ n + 1. However, from Theorem 15.4.6, |K : Fix(K, G)| = n, giving a contradiction.
Suppose that L|K is a Galois extension. We now establish that the map τ between
intermediate fields K ⊂ E ⊂ L and subgroups of Aut(L|K) is a bijection.
Theorem 15.4.8. Let L|K be a Galois extension. Then we have the following:
(1) Aut(L|K) is finite and
Fix(L, Aut(L|K)) = K.
(2) If H is a subgroup of Aut(L|K), then
Aut(L|Fix(L, H)) = H.
Proof. If L|K is a Galois extension, there exists a finite subgroup G of Aut(L) with K = Fix(L, G). From Theorem 15.4.7, we have G = Aut(L|K). In particular, Aut(L|K) is finite,
and K = Fix(L, Aut(L|K)).
Now, let H ⊂ Aut(L|K). From the first part, H is finite, and then Aut(L|Fix(L, H)) = H
from Theorem 15.4.7.
Theorem 15.4.9. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a Galois extension.
(2) |L : K| = |Aut(L|K)| < ∞.
(3) |Aut(L|K)| < ∞, and K = Fix(L, Aut(L|K)).
Proof. (1) ⇒ (2): Now, from Theorem 15.4.8, |Aut(L|K)| < ∞, and Fix(L, Aut(L|K)) = K.
Therefore, from Theorem 15.4.6, |L : K| = |Aut(L|K)|.
(2) ⇒ (3): Let G = Aut(L|K). Then K ⊂ Fix(L, G) ⊂ L. From Theorem 15.4.6, we have |L : Fix(L, G)| = |G| = |L : K|, and hence Fix(L, G) = K.
(3) ⇒ (1) follows directly from the definition, completing the proof.
We now show that if L|K is a Galois extension, then L|M is also a Galois extension
for any intermediate field M.
Proof. Let G = Aut(L|K). Then, from Theorem 15.4.9, |G| < ∞, and K = Fix(L, G). Define
H = Aut(L|M) and M ′ = Fix(L, H). We must show that M ′ = M for then L|M is a Galois
extension.
Since the elements of H fix M, we have M ⊂ M ′ . Let G = ⋃ri=1 αi H, a disjoint union
of the cosets of H. Let α1 = 1, and define βi = αi|M. The β1, …, βr are pairwise distinct: for if βi = βj, that is, αi|M = αj|M, then αj^{-1}αi ∈ H, so αi and αj lie in the same coset, a contradiction.
We claim that
M ∩ Fix(L, G) = M ∩ K = K.
Lemma 15.4.11. Let L|K be a field extension, M an intermediate field, and α ∈ Aut(L|K). Then
Aut(L|α(M)) = α Aut(L|M)α^{-1}.
Proof. Now, β ∈ Aut(L|α(M)) if and only if β(α(a)) = α(a) for all a ∈ M. This occurs if
and only if α−1 βα(a) = a for all a ∈ M, which is true if and only if β ∈ α Aut(L|M)α−1 .
Proof. (1) ⇒ (2): Suppose that M|K is a Galois extension. Let Aut(M|K) = {α1, …, αr}. Consider the αi as monomorphisms from M into L. Let αr+1 : M → L be a monomorphism with αr+1|K = 1. Then
Fix(M, Aut(M|K)) = K,
since M|K is a Galois extension. Therefore, from Theorem 15.4.3, we have that if the α1, …, αr, αr+1 are pairwise distinct, then
|M : K| ≥ r + 1 > r = |Aut(M|K)| = |M : K|,
a contradiction. Hence, αr+1 ∈ {α1, …, αr}.
Aut(L|α(M)) = Aut(L|M).
Now L|M and L|α(M) are Galois extensions by Theorem 15.4.10. Therefore, M = Fix(L, Aut(L|M)) = Fix(L, Aut(L|α(M))) = α(M).
We now combine all of these results to give the proof of Theorem 15.4.1, the funda-
mental theorem of Galois theory.
For (3), let M be an intermediate field of L|K. From Theorem 15.4.10, we have that
L|M is a Galois extension, hence (a) holds. From Theorem 15.4.13, M|K is a Galois exten-
sion if and only if Aut(L|M) is a normal subgroup of Aut(L|K), that is, (b) holds.
For (4), let M|K be a Galois extension. Assertion (a) holds because α(M) = M for all
α ∈ Aut(L|K) by Theorem 15.4.13. The map ϕ : Aut(L|K) → Aut(M|K) with ϕ(α) = α|M = β
is an epimorphism by Lemma 15.4.12 and Theorem 15.4.13, hence (b) holds. Assertion (c),
that is, Aut(M|K) = Aut(L|K)/ Aut(L|M), follows directly from the group isomorphism
theorem.
That the lattice of subfields of L containing K is the inverted lattice of subgroups of Aut(L|K) follows directly from the previous results. This shows (5) and finishes the proof.
In Chapter 8, we looked at Example 8.1.7. Here, we analyze it further using Galois theory.
Example 15.4.14. Let f (x) = x 3 − 7 ∈ ℚ[x]. This has no zeros in ℚ, and since it is of
degree 3, it follows that it must be irreducible in ℚ[x].
Let ω = −1/2 + (√3/2)i ∈ ℂ. Then it is easy to show by computation that
ω^2 = −1/2 − (√3/2)i and ω^3 = 1.
Therefore, the three zeros of f(x) in ℂ are
a1 = 7^{1/3}, a2 = ω · 7^{1/3}, a3 = ω^2 · 7^{1/3},
where 7^{1/3} denotes the real cube root of 7.
Hence, L = ℚ(a1, a2, a3) is the splitting field of f(x). Since the minimal polynomial of all three zeros over ℚ is the same f(x), it follows that
|ℚ(ai) : ℚ| = 3 for i = 1, 2, 3.
The question then arises as to whether these are all the intermediate fields. The
answer is yes, which we now prove.
Let G = Aut(L|ℚ) = Aut(L). (Aut(L|ℚ) = Aut(L), since ℚ is a prime field.) We show that G ≅ S3. First, G acts transitively on {a1, a2, a3}, since f is irreducible. Let δ : ℂ → ℂ be the
automorphism of ℂ taking each element to its complex conjugate; that is, δ(z) = z. Then
δ(f ) = f and δ|L ∈ G (Theorem 8.2.2). Since a1 ∈ ℝ, we get that δ|{a1 ,a2 ,a3 } = (a2 , a3 ), the
2-cycle that maps a2 to a3 and a3 to a2 . Since G is transitive on {a1 , a2 , a3 }, there is a τ ∈ G
with τ(a1 ) = a2 .
Case 1: τ(a3 ) = a3 . Then τ = (a1 , a2 ), and (a1 , a2 )(a2 , a3 ) = (a1 , a2 , a3 ) ∈ G.
Case 2: τ(a3 ) ≠ a3 . Then τ is a 3-cycle. In either case, G is generated by a transposition
and a 3-cycle. Hence, G is all of S3 . Then L|ℚ is a Galois extension from Theorem 15.4.9,
since |G| = |L : ℚ|.
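The action of complex conjugation on the three zeros can be checked numerically in floating point (variable names are ours):

```python
# The three complex zeros of x^3 - 7: a1 real, a2 = w*a1, a3 = w^2*a1,
# where w = -1/2 + (sqrt(3)/2)i is a primitive cube root of unity.
w = complex(-0.5, 3 ** 0.5 / 2)
a1 = 7 ** (1 / 3)
a2, a3 = w * a1, w * w * a1
# Each is a zero of x^3 - 7 (up to rounding):
print(all(abs(z ** 3 - 7) < 1e-9 for z in (a1, a2, a3)))   # True
# Complex conjugation fixes a1 and swaps a2 and a3, i.e. acts as (a2, a3):
print(abs(a2.conjugate() - a3) < 1e-9)                     # True
```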
The subgroups of S3 are as follows:
Hence, the above lattice of fields is complete. L|ℚ, ℚ|ℚ, ℚ(ω)|ℚ and L|ℚ(ai ) are Ga-
lois extensions, whereas ℚ(ai )|ℚ with i = 1, 2, 3 are not Galois extensions.
15.5 Exercises
1. Let K ⊂ M ⊂ L be a chain of fields, and let ϕ : Aut(L|K) → Aut(M|K) be defined by
ϕ(α) = α|M . Show that ϕ is an epimorphism with kernel ker(ϕ) = Aut(L|M).
2. Show that ℚ(5^{1/4})|ℚ(√5) and ℚ(√5)|ℚ are Galois extensions, and that ℚ(5^{1/4})|ℚ is not a Galois extension.
3. Let L|K be a field extension and u, v ∈ L algebraic over K with |K(u) : K| = m and
|K(v) : K| = n. If m and n are coprime, then |K(u, v) : K| = n ⋅ m.
4. Let p, q be prime numbers with p ≠ q. Let L = ℚ(√p, q^{1/3}). Show that L = ℚ(√p · q^{1/3}).
Determine a basis of L over ℚ and the minimal polynomial of √p · q^{1/3}.
5. Let K = ℚ(2^{1/n}) with n ≥ 2.
(i) Determine the number of ℚ-embeddings σ : K → ℝ. Show that for each such
embedding, we have σ(K) = K.
(ii) Determine Aut(K|ℚ).
6. Let α = √(5 + 2√5).
(i) Determine the minimal polynomial of α over ℚ.
(ii) Show that ℚ(α)|ℚ is a Galois extension.
(iii) Determine Aut(ℚ(α)|ℚ).
7. Let K be a field of prime characteristic p, and let f(x) = x^p − x + a ∈ K[x] be an irreducible polynomial. Let L = K(v), where v is a zero of f(x). Show the following:
(i) If α is a zero of f (x), then also α + 1 is.
(ii) L|K is a Galois extension.
(iii) There is exactly one K-automorphism σ of L with σ(v) = v + 1.
(iv) The Galois group Aut(L|K) is cyclic with generating element σ.
16 Separable Field Extensions
16.1 Separability of Fields and Polynomials
In the previous chapter, we introduced and examined Galois extensions. Recall that L|K
is a Galois extension if there exists a finite subgroup G ⊂ Aut(L) with K = Fix(L, G). The
following questions logically arise:
(1) Under what conditions is a field extension L|K a Galois extension?
(2) Is L|K a Galois extension when L is the splitting field of a polynomial f(x) ∈ K[x]?
In this chapter, we consider these questions and completely characterize Galois exten-
sions. To do this, we must introduce separable extensions.
Definition 16.1.1. Let K be a field. Then a nonconstant polynomial f (x) ∈ K[x] is called
separable over K if each irreducible factor of f (x) has only simple zeros in its splitting
field.
Definition 16.1.2. Let L|K be a field extension and a ∈ L. Then a is separable over K if a
is a zero of a separable polynomial. The field extension L|K is a separable field extension,
or just separable if all a ∈ L are separable over K. In particular, a separable extension is
an algebraic extension.
Lemma 16.1.4. Let K be a field and f (x) an irreducible nonconstant polynomial in K[x].
Then f (x) is separable if and only if its formal derivative is nonzero.
https://doi.org/10.1515/9783111142524-016
Proof. Let L be the splitting field of f(x) over K. Suppose first that f(x) is separable, and let a ∈ L be a zero of f(x). Since a is a simple zero, f(x) = (x − a)g(x), where (x − a) does not divide g(x). Then
f′(x) = g(x) + (x − a)g′(x).
If g′(x) = 0, then f′(x) = g(x) ≠ 0. Now suppose that g′(x) ≠ 0. Assume that f′(x) = 0; then g(x) = −(x − a)g′(x), so, necessarily, (x − a)|g(x), giving a contradiction. Therefore, f′(x) ≠ 0.
Conversely, suppose that f′(x) ≠ 0. Assume that f(x) is not separable. Then f(x) and f′(x) have a common zero a ∈ L. Let ma(x) be the minimal polynomial of a in K[x]. Then ma(x)|f(x), and ma(x)|f′(x). Since f(x) is irreducible, the degree of ma(x) must equal the degree of f(x). But ma(x) also divides f′(x) ≠ 0, whose degree is less than that of f(x), giving a contradiction. Therefore, f(x) must be separable.
We now consider the following example of a nonseparable polynomial over the fi-
nite field ℤp of p elements. We will denote this field now as GF(p), the Galois field of p
elements.
Example 16.1.5. Let K = GF(p) and L = K(t), the field of rational functions in t over K.
Consider the polynomial f (x) = x p − t ∈ L[x].
Now K[t]/tK[t] ≅ K. Since K is a field, this implies that tK[t] is a maximal ideal,
and hence a prime ideal in K[t] with prime element t ∈ K[t] (see Theorem 3.2.7). By
the Eisenstein criterion, f(x) is an irreducible polynomial in L[x] (see Theorem 4.4.8).
However, f ′ (x) = px p−1 = 0, since char(K) = p. Therefore, f (x) is not separable.
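The derivative criterion behind this example is mechanical to test. A Python sketch with polynomials over GF(p) encoded as coefficient lists (our own encoding); note that over GF(5) itself, x^5 − 2 = (x − 2)^5 is a polynomial in x^5 with vanishing derivative, while a genuinely inseparable irreducible example needs a coefficient field such as GF(p)(t), as above:

```python
def formal_derivative(coeffs, p):
    """Formal derivative of sum(coeffs[i] * x^i) over GF(p),
    returned as a coefficient list (mod p)."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

# f(x) = x^5 - 2 over GF(5): coefficients of 1, x, ..., x^5
f = [-2 % 5, 0, 0, 0, 0, 1]
print(formal_derivative(f, 5))   # [0, 0, 0, 0, 0] : f' = 5x^4 = 0
# Indeed x^5 - 2 = (x - 2)^5 over GF(5), a polynomial in x^5.
g = [1, 1, 1]                    # x^2 + x + 1 over GF(5): g' = 1 + 2x != 0
print(formal_derivative(g, 5))   # [1, 2]
```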
16.2 Perfect Fields
Definition 16.2.1. A field K is called perfect if every nonconstant polynomial f(x) ∈ K[x] is separable over K.
Theorem 16.2.2. Every field of characteristic 0 is perfect.
Proof. Suppose that K is a field with char(K) = 0. Suppose that f(x) is a nonconstant
polynomial in K[x]. Then f ′ (x) ≠ 0. If f (x) is irreducible, then f (x) is separable from
Lemma 16.1.4. Therefore, by definition, each nonconstant polynomial f (x) ∈ K[x] is sep-
arable.
We remark that in the original motivation for Galois theory, the ground field was
the rationals ℚ. Since this has characteristic zero, it is perfect and all extensions are sep-
arable. Hence, the question of separability did not arise until the question of extensions
of fields of prime characteristic arose.
Theorem 16.2.3. Let K be a field with char(K) = p ≠ 0, and let f(x) ∈ K[x] be nonconstant. Then the following are equivalent:
(1) f′(x) = 0;
(2) f(x) is a polynomial in x^p, that is, f(x) = g(x^p) for some g(x) ∈ K[x].
If in (1) and (2) f(x) is irreducible, then f(x) is not separable over K if and only if f(x) is a polynomial in x^p.
Proof. Let f(x) = ∑_{i=0}^{n} ai x^i. Then f′(x) = 0 if and only if p|i for all i ≥ 1 with ai ≠ 0. But this is equivalent to
f(x) = a0 + a_p x^p + ⋯ + a_{mp} x^{mp}.
If f (x) is irreducible, then f (x) is not separable if and only if f ′ (x) = 0 from
Lemma 16.1.4.
Theorem 16.2.4. Let K be a field with char(K) = p ≠ 0. Then the following are equivalent:
(1) K is perfect.
(2) Each element in K has a p-th root in K.
(3) The Frobenius homomorphism τ : x → x p is an automorphism of K.
Proof. First we show that (1) implies (2). Suppose that K is perfect, and a ∈ K. Then
x p − a is separable over K. Let g(x) ∈ K[x] be an irreducible factor of x p − a. Let L be
the splitting field of g(x) over K, and b a zero of g(x) in L. Then bp = a. Furthermore,
x p − bp = (x − b)p ∈ L[x], since the characteristic of K is p. Hence, g(x) = (x − b)s , and
then s must equal 1 since g(x) is irreducible. Therefore, b ∈ K, and b is a p-th root of a.
Now we show that (2) implies (3). Recall that the Frobenius homomorphism τ is
injective (see Theorem 1.8.8). We must show that it is also surjective. Let a ∈ K, and let
b be a p-th root of a so that a = bp . Then τ(b) = bp = a, and τ is surjective.
Finally, we show that (3) implies (1). Let τ : x → x p be surjective. It follows that each
a ∈ K has a p-th root in K. Now let f (x) ∈ K[x] be irreducible. Assume that f (x) is not
separable. From Theorem 16.2.3, there is a g(x) ∈ K[x] with f (x) = g(x p ); that is,
f(x) = a0 + a1 x^p + ⋯ + am x^{mp}.
Let bi ∈ K with ai = bi^p. Then
f(x) = b0^p + b1^p x^p + ⋯ + bm^p x^{mp} = (b0 + b1 x + ⋯ + bm x^m)^p,
contradicting the irreducibility of f(x). Therefore, each irreducible polynomial in K[x] is separable, and K is perfect.
Theorem 16.2.5. Let K be a field with char(K) = p ≠ 0. Then each element of K has at most one p-th root in K.
Proof. Suppose that b1, b2 ∈ K with b1^p = b2^p = a. Then
0 = b1^p − b2^p = (b1 − b2)^p,
and hence b1 = b2.
16.3 Finite Fields
Theorem 16.3.1. Each finite field is perfect.
Proof. Let K be a finite field of characteristic p > 0. Then the Frobenius map τ is surjective, since it is injective and K is finite. Therefore, K is perfect from Theorem 16.2.4.
Next we show that each finite field has order pm for some prime p and natural num-
ber m > 0.
Lemma 16.3.2. Let K be a finite field. Then |K| = p^m for some prime p and natural number m > 0.

Proof. Let K be a finite field with characteristic p > 0. Then K can be considered as a vector space over its prime field GF(p), and hence of finite dimension, since |K| < ∞. If α1, . . . , αm is a basis, then each f ∈ K can be written as f = c1α1 + ⋅ ⋅ ⋅ + cmαm with each ci ∈ GF(p). Hence, there are p choices for each ci, and therefore p^m choices for f.
In Theorem 9.5.16, we proved that any finite subgroup of the multiplicative group
of a field is cyclic. If K is a finite field, then its multiplicative subgroup K ⋆ is finite, and
hence cyclic.
Lemma 16.3.3. Let K be a finite field. Then its multiplicative subgroup K ⋆ is cyclic.
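Lemma 16.3.3 can be illustrated concretely in the smallest nonprime case. The following Python sketch models GF(4) = GF(2)[t]/(t^2 + t + 1), encoding a + bt as the pair (a, b) (an ad hoc representation of our own), and checks that the multiplicative group of the 4-element field is cyclic of order 3, generated by t:

```python
# A sketch of GF(4) = GF(2)[t]/(t^2 + t + 1); elements are pairs (a, b) ~ a + b*t.
def add(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])  # coefficientwise addition mod 2

def mul(u, v):
    a, b = u
    c, d = v
    # (a + b t)(c + d t) = ac + (ad + bc) t + bd t^2, and t^2 = t + 1.
    s = b & d
    return ((a & c) ^ s, (a & d) ^ (b & c) ^ s)

elements = [(a, b) for a in (0, 1) for b in (0, 1)]
t = (0, 1)
# Powers of t: t, t^2 = t + 1, t^3 = 1 -- so GF(4)* is cyclic of order 3.
powers = [t]
for _ in range(2):
    powers.append(mul(powers[-1], t))
assert powers[2] == (1, 0)                                   # t has multiplicative order 3
assert set(powers) == {e for e in elements if e != (0, 0)}   # t generates GF(4)*
```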
If K is a finite field of order p^m, then its multiplicative group K^⋆ has order p^m − 1. Then, from Lagrange's theorem, each nonzero element raised to the power p^m − 1 is the identity.

Lemma 16.3.4. Let K be a field of order p^m. Then each α ∈ K is a zero of the polynomial x^{p^m} − x. In particular, if α ≠ 0, then α is a zero of x^{p^m −1} − 1.
Theorem 16.3.5. Any two finite fields of the same order p^m are isomorphic.

Proof. Let |K1| = |K2| = p^m. From the remarks above, K1 = GF(p)(α), where α has order p^m − 1 in K1^⋆. Similarly, K2 = GF(p)(β), where β also has order p^m − 1 in K2^⋆. Hence, GF(p)(α) ≅ GF(p)(β), and therefore K1 ≅ K2.
In Lemma 16.3.2, we saw that if K is a finite field, then |K| = p^n for some prime p and positive integer n. We now show that, given a prime power p^n, there does exist a finite field of that order.
Theorem 16.3.6. Let p be a prime and n > 0 a natural number. Then there exists a field K of order p^n.
Proof. Given a prime p, consider the polynomial g(x) = x^{p^n} − x ∈ GF(p)[x]. Let K be the splitting field of this polynomial over GF(p). Since a finite field is perfect, K is a separable extension, and hence all the zeros of g(x) are distinct in K.
Let F be the set of p^n distinct zeros of g(x) within K. Let a, b ∈ F. Since

(a ± b)^{p^n} = a^{p^n} ± b^{p^n} and (ab)^{p^n} = a^{p^n} b^{p^n},

it follows that F forms a subfield of K. However, F contains all the zeros of g(x), and since K is the smallest extension of GF(p) containing all the zeros of g(x), we must have K = F. Since F has p^n elements, it follows that the order of K is p^n.
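The splitting-field construction can be probed computationally for p = 3, n = 2. The Python sketch below (polynomials encoded as coefficient lists, lowest degree first, an ad hoc choice) finds the three monic irreducible quadratics over GF(3) and checks that x^9 ≡ x modulo each of them, so that every element of GF(9) = GF(3)[x]/(m(x)) is a zero of x^{3^2} − x:

```python
p, n = 3, 2  # check x^(p^n) = x in GF(9), realized as GF(3)[x]/(m(x))

def polymod(a, m, p):
    # reduce polynomial a modulo the monic polynomial m, coefficients mod p
    a = [c % p for c in a]
    while len(a) >= len(m):
        c = a[-1]
        for i in range(len(m)):
            a[len(a) - len(m) + i] = (a[len(a) - len(m) + i] - c * m[i]) % p
        while a and a[-1] == 0:
            a.pop()
    return a

def polymulmod(a, b, m, p):
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return polymod(res, m, p)

# monic quadratics x^2 + bx + c with no zero in GF(3) are exactly the irreducible ones
irreducible = [[c, b, 1] for b in range(p) for c in range(p)
               if all((x * x + b * x + c) % p != 0 for x in range(p))]
assert len(irreducible) == (p * p - p) // 2  # three of them

for m in irreducible:
    acc = [0, 1]                      # the polynomial x
    for _ in range(p ** n - 1):
        acc = polymulmod(acc, [0, 1], m, p)   # multiply by x repeatedly
    assert acc == [0, 1]              # x^9 = x (mod m): every element of GF(9) is a zero of g
```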
Combining Theorems 16.3.5 and 16.3.6, we get the following summary result, indicating that up to isomorphism there exists one and only one finite field of order p^n.

Theorem 16.3.7. Let p be a prime and n > 0 a natural number. Then up to isomorphism, there exists a unique finite field of order p^n.
Lemma 16.4.2. Let L|K be a finite extension with L ⊂ L̄, where L̄ is algebraically closed. In particular, L = K(a1, . . . , an), where the ai are algebraic over K. Let pi be the number of pairwise distinct zeros in L̄ of the minimal polynomial m_{ai} of ai over K(a1, . . . , a_{i−1}). Then there are exactly p1 ⋅ ⋅ ⋅ pn monomorphisms β : L → L̄ with β|K = 1K.
Proof. From Theorem 16.4.1, there are exactly p1 monomorphisms α : K(a1) → L̄ with α|K equal to the identity on K. Each such α has exactly p2 extensions to monomorphisms K(a1, a2) → L̄. We now continue in this manner.
Theorem 16.4.3. Let L|K be a field extension, M an intermediate field, and a ∈ L separable over K. Then a is also separable over M.

Proof. This follows directly from the fact that the minimal polynomial of a over M divides the minimal polynomial of a over K.
Theorem 16.4.4. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is finite and separable.
(2) There are finitely many separable elements a1, . . . , an over K with L = K(a1, . . . , an).
(3) L|K is finite, and if L ⊂ L̄ with L̄ algebraically closed, then there are exactly [L : K] monomorphisms α : L → L̄ with α|K = 1K.
Proof. That (1) implies (2) follows directly from the definitions. We show then that (2)
implies (3). Let L = K(a1 , . . . , an ), where a1 , . . . , an are separable elements over K. The
extension L|K is finite (see Theorem 5.3.4).
Let pi be the number of pairwise distinct zeros in L̄ of the minimal polynomial m_{ai}(x) = fi(x) of ai over K(a1, . . . , a_{i−1}). Then

pi ≤ deg(fi) = [K(a1, . . . , ai) : K(a1, . . . , a_{i−1})].

Hence, pi = deg(fi(x)), since ai is separable over K(a1, . . . , a_{i−1}) from Theorem 16.4.3. Therefore, [L : K] = p1 ⋅ ⋅ ⋅ pn is equal to the number of monomorphisms α : L → L̄ with α|K the identity on K.
Finally, we show that (3) implies (1). Suppose then the conditions of (3). Since L|K is finite, there are finitely many a1, . . . , an ∈ L with L = K(a1, . . . , an). Let pi and fi(x) be as in the proof above, and hence pi ≤ deg(fi(x)). By assumption we have

[L : K] = p1 ⋅ ⋅ ⋅ pn,

so pi = deg(fi(x)) for all i. Suppose that some b ∈ L is not separable over K. Then char(K) = p ≠ 0, and the minimal polynomial of b has the form

mb(x) = ∑_{i=0}^{k} bi x^{pi}, bi ∈ K, bk = 1.

Substituting b gives

b0 + b1 b^p + ⋅ ⋅ ⋅ + bk b^{pk} = 0.
Theorem 16.4.5. Let L|K be a field extension, and let M be an intermediate field. Then the
following are equivalent:
(1) L|K is separable.
(2) L|M and M|K are separable.
Proof. We first show that (1) implies (2): If L|K is separable, then L|M is separable by Theorem 16.4.3, and M|K is separable, since M ⊂ L.
Now suppose (2), and let M|K and L|M be separable. Let a ∈ L, and let b1, . . . , b_{n−1} ∈ M be the coefficients of the minimal polynomial of a over M. Set

M′ = K(b1, . . . , b_{n−1}).

Then a is separable over M′, and, applying Theorem 16.4.4 to the finite extensions M′|K and M′(a)|M′, the element a is separable over K. Hence, L|K is separable.
Theorem 16.4.6. Let L|K be a field extension, and let S ⊂ L such that all elements of S are
separable over K. Then K(S)|K is separable, and K[S] = K(S).
Proof. Let W be the set of finite subsets of S. Let T ∈ W . From Theorem 16.4.4, we
obtain that K(T)|K is separable. Since each element of K(S) is contained in some K(T),
we have that K(S)|K is separable. Since all elements of S are algebraic, we have that
K[S] = K(S).
Theorem 16.4.7. Let L|K be a field extension. Then there exists in L a uniquely determined
maximal field M with the property that M|K is separable. If a ∈ L is separable over M,
then a ∈ M. M is called the separable hull of K in L.
Proof. Let S be the set of all elements in L that are separable over K. We now define
M = K(S). Then M|K is separable from Theorem 16.4.6. Now, let a ∈ L be separable
over M. Then M(a)|M is separable from Theorem 16.4.4. Furthermore, M(a)|K is sepa-
rable from Theorem 16.4.5. It follows that a ∈ M.
Theorem 16.5.1. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a Galois extension.
(2) L is the splitting field of a separable polynomial in K[x].
(3) L|K is finite, normal, and separable.
Therefore, we may characterize Galois extensions of a field K as finite, normal, and sepa-
rable extensions of K.
Proof. Recall from Theorem 8.2.2 that an extension L|K is normal if the following hold:
(1) L|K is algebraic, and
(2) each irreducible polynomial f(x) ∈ K[x] that has a zero in L splits into linear factors in L[x].
Now suppose that L|K is a Galois extension. Then L|K is finite from Theorem 15.4.1.
Let L = K(b1 , . . . , bm ) and mbi (x) = fi (x) be the minimal polynomial of bi over K. Let
ai1 , . . . , ain be the pairwise distinct elements from
Hi = {α(bi ) : α ∈ Aut(L|K)}.
Define

gi(x) = (x − ai1) ⋅ ⋅ ⋅ (x − ain).

If α ∈ Aut(L|K), then α(gi) = gi, since α permutes the elements of Hi. This means that the coefficients of gi(x) are in Fix(L, Aut(L|K)) = K; hence, gi(x) ∈ K[x]. Because bi is one of the aij, we have fi(x) | gi(x). The group Aut(L|K) acts transitively on {ai1, . . . , ain} by the choice of ai1, . . . , ain. Therefore, each gi(x) is irreducible (see Theorem 15.2.4). It follows that fi(x) = gi(x). Now, fi(x) has only simple zeros in L; that is, no zero has multiplicity ≥ 2, and hence fi(x) splits over L. Thus, L is a splitting field of f(x) = f1(x) ⋅ ⋅ ⋅ fm(x), and f(x) is separable by definition. Hence, (1) implies (2).
Now suppose that L is a splitting field of the separable polynomial f (x) ∈ K[x], and
L|K is finite. From Theorem 16.4.4, we get that L|K is separable, since L = K(a1 , . . . , an )
with each ai separable over K. Therefore, L|K is normal from Definition 8.2.1. Hence,
(2) implies (3).
Finally, suppose that L|K is finite, normal, and separable. Since L|K is finite and separable, from Theorem 16.4.4 there exist exactly [L : K] monomorphisms α : L → L̄, where L̄ is the algebraic closure of L, with α|K the identity on K. Since L|K is normal, these monomorphisms are already automorphisms of L from Theorem 8.2.2.
Hence, [L : K] ≤ |Aut(L|K)|. Furthermore, [L : K] ≥ |Aut(L|K)| from Theorem 15.4.3. Combining these, we have [L : K] = |Aut(L|K)|, and hence L|K is a Galois extension from Theorem 15.4.9. Therefore, (3) implies (1), completing the proof.
Recall that any field of characteristic 0 is perfect, and therefore any finite extension
is separable. Applying this to ℚ implies that the Galois extensions of the rationals are
precisely the splitting fields of polynomials.
Corollary 16.5.2. The Galois extensions of the rationals are precisely the splitting fields
of polynomials in ℚ[x].
Theorem 16.5.3. Let L|K be a finite, separable field extension. Then there exists an exten-
sion field M of L such that M|K is a Galois extension.
Proof. Let L = K(a1 , . . . , an ) with all ai separable over K. Let fi (x) be the minimal poly-
nomial of ai over K. Then each fi (x), and hence also f (x) = f1 (x) ⋅ ⋅ ⋅ fn (x), is separable
over K. Let M be the splitting field of f (x) over K. Then M|K is a Galois extension from
Theorem 16.5.1.
Example 16.5.4. Let K = ℚ be the rationals, and let f(x) = x^4 − 2 ∈ ℚ[x]. From Chapter 8, we know that L = ℚ(⁴√2, i) is a splitting field of f(x). By the Eisenstein criterion, f(x) is irreducible over ℚ, and ±⁴√2, ±i ⁴√2 are the zeros of f(x). Since the rationals are perfect, f(x) is separable. L|K is a Galois extension by Theorem 16.5.1. From the calculations in Chapter 15, we have Aut(L|K) = Aut(L) and |Aut(L|K)| = [L : K] = 8.
Let G = Aut(L|K).
We want to determine the subgroup lattice of the Galois group G. We show G ≅ D4 , the
dihedral group of order 8. Since there are 4 zeros of f (x), and G permutes these, G must
be a subgroup of S4, and since the order is 8, G is a 2-Sylow subgroup of S4. If we let τ = (2, 4) and σ = (1, 2, 3, 4), then G = ⟨σ, τ⟩, and we get the isomorphism between G and D4. From Theorem 14.1.1, we know that D4 = ⟨r, f ; r^4 = f^2 = (rf)^2 = 1⟩.
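The claimed isomorphism can be verified mechanically. In the Python sketch below, a permutation of {1, 2, 3, 4} is stored as the tuple of its values (an ad hoc encoding); we generate ⟨σ, τ⟩ inside S4 and check the defining relations of D4 with r = σ and f = τ:

```python
from itertools import product

def compose(f, g):
    # (f ∘ g)(i) = f(g(i)); a permutation p of {1,2,3,4} is the tuple (p(1), p(2), p(3), p(4))
    return tuple(f[g[i] - 1] for i in range(4))

identity = (1, 2, 3, 4)
sigma = (2, 3, 4, 1)   # the 4-cycle (1, 2, 3, 4)
tau = (1, 4, 3, 2)     # the transposition (2, 4)

# generate the subgroup <sigma, tau> of S4 by closing under composition
G = {identity}
while True:
    new = {compose(a, b) for a, b in product(G | {sigma, tau}, repeat=2)} - G
    if not new:
        break
    G |= new

assert len(G) == 8  # a 2-Sylow subgroup of S4
# defining relations of D4 = <r, f ; r^4 = f^2 = (rf)^2 = 1> with r = sigma, f = tau
r4 = compose(compose(sigma, sigma), compose(sigma, sigma))
rf = compose(sigma, tau)
assert r4 == identity
assert compose(tau, tau) == identity
assert compose(rf, rf) == identity
```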
This can also be seen in the following manner. Let
a1 = ⁴√2, a2 = i ⁴√2, a3 = −⁴√2, a4 = −i ⁴√2.
Let α ∈ G. Then α is determined if we know α(⁴√2) and α(i). The possibilities for α(i) are i and −i, and the possibilities for α(⁴√2) are a1, a2, a3, a4, giving at most 8 possibilities for α. These are exactly the elements of the group G. We have δ, τ ∈ G with

δ(⁴√2) = i ⁴√2, δ(i) = i

and

τ(⁴√2) = ⁴√2, τ(i) = −i.
It is straightforward to show that δ has order 4, τ has order 2, and δτ has order 2. These
define a group of order 8 isomorphic to D4 , and since G has 8 elements, this must be all
of G.
We now look at the subgroup lattice of G, and then the corresponding field lattice.
Let δ and τ be as above. Then G has 5 subgroups of order 2, namely

{1, τ}, {1, τδ}, {1, τδ^2}, {1, τδ^3}, {1, δ^2},

and 3 subgroups of order 4, namely ⟨δ⟩, {1, δ^2, τ, τδ^2}, and {1, δ^2, τδ, τδ^3}.
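The subgroup count can be confirmed by brute force. The following Python sketch rebuilds G = ⟨σ, τ⟩ ≅ D4 as a permutation group and enumerates all of its subgroups directly (for a finite set, closure under composition together with the identity already forces a subgroup):

```python
from itertools import combinations

def compose(f, g):
    return tuple(f[g[i] - 1] for i in range(4))

identity = (1, 2, 3, 4)
sigma, tau = (2, 3, 4, 1), (1, 4, 3, 2)

# build G = <sigma, tau>, a copy of D4 inside S4
G = {identity}
while True:
    frontier = {compose(a, b) for a in G | {sigma, tau} for b in G | {sigma, tau}} - G
    if not frontier:
        break
    G |= frontier

def is_subgroup(H):
    return identity in H and all(compose(a, b) in H for a in H for b in H)

# test every subset whose size divides 8 (Lagrange's theorem rules out the rest)
subgroups = [set(H) for k in (1, 2, 4, 8)
             for H in combinations(sorted(G), k) if is_subgroup(set(H))]
assert len(subgroups) == 10
sizes = sorted(len(H) for H in subgroups)
assert sizes.count(2) == 5   # five subgroups of order 2
assert sizes.count(4) == 3   # three subgroups of order 4
```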
From this, we construct the lattice of fields and intermediate fields. Since G has 10 subgroups, from the fundamental theorem of Galois theory there are 10 intermediate fields in L|ℚ (including ℚ and L themselves), namely, the fixed fields Fix(L, H), where H is a subgroup of G.
of G. In the identification, the extension field corresponding to the whole group G is the
ground field ℚ (recall that the lattice of fields is the inverted lattice of the subgroups),
whereas the extension field corresponding to the identity is the whole field L. We now
consider the other proper subgroups. Let δ, τ be as before.
(1) Let M1 = Fix(L, {1, τ}). Now, {1, τ} fixes ℚ(⁴√2) elementwise, so that ℚ(⁴√2) ⊂ M1. Since [L : M1] = |{1, τ}| = 2 = [L : ℚ(⁴√2)], we get M1 = ℚ(⁴√2).
(2) Consider M2 = Fix(L, {1, τδ}). We compute

τδ(⁴√2) = τ(i ⁴√2) = −i ⁴√2,
τδ(i ⁴√2) = τ(−⁴√2) = −⁴√2,
τδ(−⁴√2) = τ(−i ⁴√2) = i ⁴√2,
τδ(−i ⁴√2) = τ(⁴√2) = ⁴√2.

Hence, τδ interchanges a1 and a4, as well as a2 and a3.
(3) Consider M3 = Fix(L, {1, τδ^2}). The map τδ^2 interchanges a1 and a3 and fixes a2 and a4. Therefore, M3 = ℚ(i ⁴√2).
In an analogous manner, we can then consider the other 5 proper subgroups and the corresponding intermediate fields, and in this way obtain the full lattice of fields and subfields.
Theorem 16.6.1 (Primitive element theorem). Let L = K(γ1 , . . . , γn ), and suppose that
each γi is separable over K. Then there exists a γ0 ∈ L such that L = K(γ0 ). The element
γ0 is called a primitive element.
Proof. Suppose first that K is a finite field. Then L is also a finite field, and therefore
L⋆ = ⟨γ0 ⟩ is cyclic. Therefore, L = K(γ0 ), and the theorem is proved if K is a finite field.
Now suppose that K is infinite. Inductively, it suffices to prove the theorem for n = 2.
Hence, let α, β ∈ L be separable over K. We must show that there exists a γ ∈ L with
K(α, β) = K(γ).
Let L̄ be the splitting field of the polynomial mα(x)mβ(x) over L, where mα(x), mβ(x) are, respectively, the minimal polynomials of α, β over K. In L̄[x], we have

mα(x) = (x − α1) ⋅ ⋅ ⋅ (x − αr) and mβ(x) = (x − β1) ⋅ ⋅ ⋅ (x − βt),

where α1 = α, β1 = β, and the βj are pairwise distinct, since β is separable. For each pair (i, j) with j ≥ 2, the equation

α1 + zβ1 = αi + zβj

has at most one solution z ∈ K. Since K is infinite, we may therefore choose a c ∈ K with

α1 + cβ1 ≠ αi + cβj for all i and all j ≥ 2.

Set

γ = α + cβ = α1 + cβ1.
We claim that K(α, β) = K(γ) holds. It suffices to show that β ∈ K(γ), for then α =
γ − cβ ∈ K(γ). This implies that K(α, β) ⊂ K(γ), and since γ ∈ K(α, β), it follows that
K(α, β) = K(γ). To show that β ∈ K(γ), we first define f (x) = mα (γ − cx), and let d(x) =
gcd(f (x), mβ (x)). We may assume that d(x) is monic. We show that d(x) = x − β. Then
β ∈ K(γ), since d(x) ∈ K(γ)[x].
Assume first that d(x) = 1. Then gcd(f (x), mβ (x)) = 1, and f (x) and mβ (x) are also
relatively prime in L[x]. This is a contradiction, since f (x) and mβ (x) have the common
zero β ∈ L, and hence the common divisor x − β.
Therefore, d(x) ≠ 1, so deg(d(x)) ≥ 1.
The polynomial d(x) is a divisor of mβ(x), and hence d(x) splits in L̄[x] into distinct linear factors of the form x − βj, 1 ≤ j ≤ t. The proof is completed if we can show that no linear factor of the form x − βj with 2 ≤ j ≤ t is a divisor of f(x); that is, we must show that f(βj) ≠ 0 in L̄ if j ≥ 2.
Now f(βj) = mα(γ − cβj) = mα(α1 + cβ1 − cβj). Suppose that f(βj) = 0 for some j ≥ 2. This would imply that α1 + cβ1 − cβj = αi for some i; that is, α1 + cβ1 = αi + cβj with j ≥ 2. This contradicts the choice of the value c. Therefore, f(βj) ≠ 0 if j ≥ 2, completing the proof.
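The construction in the proof can be followed numerically in the classical example α = √2, β = √3 over K = ℚ, where c = 1 already works, so that γ = √2 + √3 is a primitive element. A Python sketch:

```python
from math import sqrt, isclose

# primitive element construction for alpha = sqrt(2), beta = sqrt(3), c = 1
alpha, beta = sqrt(2), sqrt(3)
gamma = alpha + 1 * beta          # gamma = alpha + c*beta with c = 1

# gamma^3 = 11*sqrt(2) + 9*sqrt(3), so both square roots lie in Q(gamma):
assert isclose((gamma ** 3 - 9 * gamma) / 2, alpha)   # sqrt(2) as a polynomial in gamma
assert isclose((11 * gamma - gamma ** 3) / 2, beta)   # sqrt(3) as a polynomial in gamma
# hence Q(sqrt(2), sqrt(3)) = Q(gamma)
```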
Corollary 16.6.2. Let L|K be a finite extension with K a perfect field. Then L = K(γ) for
some γ ∈ L.
Corollary 16.6.3. Let L|K be a finite extension with K a perfect field. Then there exist only
finitely many intermediate fields E with K ⊂ E ⊂ L.
Proof. Since K is a perfect field, we have L = K(γ) for some γ ∈ L. Let mγ(x) ∈ K[x] be the minimal polynomial of γ over K, and let L̄ be the splitting field of mγ(x) over K. Then L̄|K is a Galois extension; hence, there are only finitely many intermediate fields between K and L̄. Therefore, there are also only finitely many fields between K and L.
Suppose that L|K is algebraic. Then, in general, L = K(γ) for some γ ∈ L if and only
if there exist only finitely many intermediate fields E with K ⊂ E ⊂ L.
This condition on intermediate fields implies that L|K is finite if L|K is algebraic. Hence, we have proved this result in the case that K is perfect. The general case is discussed in the book of S. Lang [13].
16.7 Exercises
1. Let f(x) = x^4 − 8x^3 + 24x^2 − 32x + 14 ∈ ℚ[x], and let v ∈ ℂ be a zero of f. Let α := v(4 − v), and let K be a splitting field of f over ℚ. Show the following:
(i) f is irreducible over ℚ, and f(x) = f(4 − x).
(ii) There is exactly one automorphism σ of ℚ(v) with σ(v) = 4 − v.
(iii) L := ℚ(α) is the fixed field of σ, and [L : ℚ] = 2.
(iv) Determine the minimal polynomial of α over ℚ, and determine α.
(v) [ℚ(v) : L] = 2; determine the minimal polynomial of v over L, and also determine v and all other zeros of f(x).
(vi) Determine the degree [K : ℚ].
(vii) Determine the structure of Aut(K|ℚ).
2. Let L|K be a field extension and f ∈ K[x] a separable polynomial. Let Z be a splitting
field of f over L and Z0 a splitting field of f over K. Show that Aut(Z|L) is isomorphic
to a subgroup of Aut(Z0 |K).
3. Let L|K be a field extension and v ∈ L. Show that K(v + c) = K(v) for each element c ∈ K, and that K(cv) = K(v) for each c ∈ K with c ≠ 0.
4. Let v = √2 + √3, and let K = ℚ(v). Show that √2 and √3 can be expressed as ℚ-linear combinations of 1, v, v^2, v^3. Conclude that K = ℚ(√2, √3).
5. Let L be the splitting field of x 3 − 5 over ℚ in ℂ. Determine a primitive element t of
L over ℚ.
17 Applications of Galois Theory
As we mentioned in Chapter 1, Galois theory was originally developed as part of the proof that polynomial equations of degree 5 or higher over the rationals cannot, in general, be solved by formulas in terms of radicals. In this chapter, we first carry this out and prove the insolvability of the general quintic polynomial by radicals. To do this, we must examine in detail what we call radical extensions.
We then return to some geometric material we started in Chapter 6. There, using
general field extensions, we proved the impossibility of certain geometric compass and
straightedge constructions. Here, we use Galois theory to consider constructible n-gons.
Finally, we will use Galois theory to present a proof of the fundamental theorem of
algebra, which says, essentially, that the complex number field ℂ is algebraically closed.
In Chapter 17, we always assume that K is a field of characteristic 0; in particular, K
is perfect. We remark that some parts of Sections 17.1–17.4 go through for finite fields of
characteristic p > 3.
Definition 17.1.1. A field extension L|K is an extension by radicals (or radical extension) if there exists a chain of fields

K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lm = L

such that Li = L_{i−1}(βi) with βi^{ni} ∈ L_{i−1} for some ni ∈ ℕ, i = 1, . . . , m. A polynomial equation f(x) = 0 with f(x) ∈ K[x] is solvable by radicals if the splitting field of f(x) over K lies in an extension of K by radicals.

In proving the insolvability of the quintic polynomial, we will look for necessary and sufficient conditions for the solvability of polynomial equations. Our main result will be that if f(x) ∈ K[x], then f(x) = 0 is solvable by radicals over K if and only if the Galois group of the splitting field of f(x) over K is a solvable group (see Chapter 11).
In the remainder of this section, we assume that all fields have characteristic zero.
The next theorem gives a characterization of simple extensions by radicals:
https://doi.org/10.1515/9783111142524-017
Theorem 17.1.2. Let L|K be a field extension and n ∈ ℕ. Assume that the polynomial x^n − 1 splits into linear factors in K[x], so that K contains all the n-th roots of unity.
Then L = K(ⁿ√a) for some a ∈ K if and only if L is a Galois extension over K and Aut(L|K) ≅ ℤ/mℤ for some m ∈ ℕ with m | n.
Proof. The n-th roots of unity, that is, the zeros of the polynomial x^n − 1 ∈ K[x], form a cyclic multiplicative group ℱ ⊂ K^⋆ of order n, since each finite subgroup of the multiplicative group K^⋆ of K is cyclic, and |ℱ| = n. We call an n-th root of unity ω primitive if ℱ = ⟨ω⟩.
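A quick numerical illustration in Python (with the arbitrary choice n = 12): the powers of a primitive n-th root of unity ω = e^{2πi/n} exhaust all zeros of x^n − 1, and ω^m is again primitive exactly when gcd(m, n) = 1:

```python
import cmath
from math import gcd

n = 12  # arbitrary example
omega = cmath.exp(2j * cmath.pi / n)   # a primitive n-th root of unity

powers = [omega ** k for k in range(n)]
# each power is a zero of x^n - 1 ...
assert all(abs(z ** n - 1) < 1e-9 for z in powers)
# ... and the n powers are pairwise distinct, so F = <omega> has order n
assert all(abs(powers[i] - powers[j]) > 1e-6 for i in range(n) for j in range(i))
# omega^m is primitive exactly when gcd(m, n) = 1; there are phi(12) = 4 such exponents
assert sum(1 for m in range(1, n + 1) if gcd(m, n) == 1) == 4
```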
Now let L = K(ⁿ√a) with a ∈ K; that is, L = K(β) with β^n = a ∈ K. Let ω be a primitive n-th root of unity. With this β, the elements ωβ, ω^2β, . . . , ω^nβ = β are the zeros of x^n − a. Hence, the polynomial x^n − a splits into linear factors over L; hence, L = K(β) is a splitting field of x^n − a over K. It follows that L|K is a Galois extension.

Let σ ∈ Aut(L|K). Then σ(β) = ω^ν β for some 0 < ν ≤ n. The element ω^ν is uniquely determined by σ, and we may write ω^ν = ωσ.
Consider the map ϕ : Aut(L|K) → ℱ given by σ ↦ ωσ, where ωσ is defined as above by σ(β) = ωσβ. If τ, σ ∈ Aut(L|K), then

σ(τ(β)) = σ(ωτ β) = ωτ σ(β) = ωτ ωσ β,

because ωτ ∈ K.
Therefore, ϕ(στ) = ϕ(σ)ϕ(τ); hence, ϕ is a homomorphism. The kernel ker(ϕ) consists of all the K-automorphisms σ of L for which σ(β) = β. However, since L = K(β), it follows that ker(ϕ) contains only the identity. The Galois group Aut(L|K) is, therefore, isomorphic to a subgroup of ℱ. Since ℱ is cyclic of order n, we have that Aut(L|K) is cyclic of order m for some m | n, completing one direction of the theorem.
Conversely, first suppose that L|K is a Galois extension with Aut(L|K) cyclic of order n, and let σ be a generator of Aut(L|K). Since the automorphisms σ, σ^2, . . . , σ^n are linearly independent over L, there exists an η ∈ L such that the Lagrange resolvent

ω ⋆ η = ∑_{ν=1}^{n} ω^ν σ^ν(η) ≠ 0.

Using σ^{n+1} = σ and ω^{n+1} = ω, we compute

σ(ω ⋆ η) = ∑_{ν=1}^{n} ω^ν σ^{ν+1}(η) = ω^{−1} ∑_{ν=1}^{n} ω^{ν+1} σ^{ν+1}(η) = ω^{−1} ∑_{ν=2}^{n+1} ω^ν σ^ν(η) = ω^{−1} ∑_{ν=1}^{n} ω^ν σ^ν(η) = ω^{−1}(ω ⋆ η).

Hence, for β = ω ⋆ η, we have σ^i(β) = ω^{−i}β, so the n conjugates σ^i(β), 1 ≤ i ≤ n, are pairwise distinct, while σ(β^n) = (ω^{−1}β)^n = β^n. Therefore, β^n = a ∈ K, the degree [K(β) : K] is at least n = [L : K], and L = K(β) = K(ⁿ√a).
Theorem 17.1.3. Let L|K be an extension by radicals. Then there exists an extension field L̃ of L such that L̃|K is a Galois extension by radicals.

Proof. We use induction on the degree m = [L : K]. Suppose that m = 1. If L = K(ⁿ√a), then, if ω is a primitive n-th root of unity, define K̃ = K(ω) and L̃ = K̃(ⁿ√a). We then get the chain K ⊂ K̃ ⊂ L̃ with L ⊂ L̃, and L̃|K is a Galois extension. This last statement is due to the fact that L̃ is the splitting field of the polynomial x^n − a ∈ K[x] over K. Hence, the theorem is true if m = 1.
Now suppose that m ≥ 2, and suppose that the theorem is true for all extensions F of K by radicals with [F : K] < m. Since m ≥ 2, by the definition of an extension by radicals there exists a field E with

K ⊂ E ⊂ L, [L : E] ≥ 2,

such that L|E is a simple extension by a radical.
By the induction hypothesis, there exists an extension field Ẽ of E such that Ẽ|K is a Galois extension by radicals. Let the distinct images of the radical's n-th power under Aut(Ẽ|K) be α1, . . . , αs ∈ Ẽ, and let L̃ be the splitting field over Ẽ of

f(x) = (x^n − α1) ⋅ ⋅ ⋅ (x^n − αs)

with αi ∈ Ẽ for i = 1, . . . , s. All zeros of f(x) in L̃ are radicals over Ẽ. Therefore, L̃ is an extension by radicals of Ẽ. Since Ẽ is also an extension by radicals of K, we obtain that L̃ is an extension by radicals of K.
Since Ẽ is a Galois extension of K, we have that Ẽ is a splitting field of a polynomial g(x) ∈ K[x]. Furthermore, L̃ is a splitting field of f(x) ∈ K[x] over Ẽ. Altogether then, we have that L̃ is a splitting field of f(x)g(x) ∈ K[x] over K. Therefore, L̃ is a Galois extension of K, completing the proof.
Lemma 17.1.4. Let K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr = L be a chain of fields such that L|K is a Galois extension and each Li|L_{i−1}, i = 1, . . . , r, is a Galois extension with Abelian Galois group Aut(Li|L_{i−1}). Then G = Aut(L|K) is solvable.

Proof. We prove the lemma by induction on r. If r = 0, then G = {1}, and there is nothing to prove. Suppose then that r ≥ 1, and assume that the lemma holds for all such chains of fields with a length r′ < r. Since L1|K is a Galois extension, Aut(L|L1) is a normal subgroup of G by the fundamental theorem of Galois theory. Moreover,

G/Aut(L|L1) ≅ Aut(L1|K),

which is Abelian. By the induction hypothesis applied to the chain L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr, the group Aut(L|L1) is solvable. Hence, G is solvable.
Lemma 17.1.5. Let L|K be a field extension. Let K̃ and L̃ be the splitting fields of the polynomial x^n − 1 ∈ K[x] over K and L, respectively. Since K ⊂ L, we have K̃ ⊂ L̃. Then the following hold:
(1) If σ ∈ Aut(L̃|L), then σ|K̃ ∈ Aut(K̃|K), and the map

Aut(L̃|L) → Aut(K̃|K), given by σ ↦ σ|K̃,

is an injective homomorphism.
(2) Suppose that, in addition, L|K is a Galois extension. Then L̃|K is also a Galois extension. If, furthermore, σ ∈ Aut(L̃|K̃), then σ|L ∈ Aut(L|K), and

Aut(L̃|K̃) → Aut(L|K), given by σ ↦ σ|L,

is an injective homomorphism.
Proof. (1) Let ω be a primitive nth root of unity. Then K̃ = K(ω) and L̃ = L(ω). Each σ ∈ Aut(L̃|L) maps ω onto a primitive nth root of unity and fixes K ⊂ L elementwise. Hence, from σ ∈ Aut(L̃|L), we get that σ|K̃ ∈ Aut(K̃|K). Certainly, the map σ ↦ σ|K̃ defines a homomorphism Aut(L̃|L) → Aut(K̃|K). Let σ|K̃ = 1 with σ ∈ Aut(L̃|L). Then σ(ω) = ω; therefore, we already have that σ = 1, since L̃ = L(ω).

(2) If L is the splitting field of a polynomial g(x) over K, then L̃ is the splitting field of g(x)(x^n − 1) over K. Hence, L̃|K is a Galois extension. Therefore, K ⊂ L ⊂ L̃, and L|K, L̃|L, and L̃|K are all Galois extensions. Therefore, from the fundamental theorem of Galois theory, the restriction σ ↦ σ|L defines an injective homomorphism Aut(L̃|K̃) → Aut(L|K).
Definition 17.2.1. The splitting field of the polynomial x^n − 1 ∈ ℚ[x] with n ≥ 2 is called the nth cyclotomic field, denoted by kn.

We have kn = ℚ(ω), where ω is a primitive nth root of unity, for example, ω = e^{2πi/n}. The extension kn|ℚ is a Galois extension, and the Galois group Aut(kn|ℚ) is the set of automorphisms σm : ω ↦ ω^m with 1 ≤ m ≤ n and gcd(m, n) = 1.
To understand this group G, we need the following concept: A prime residue class modulo n is a residue class a + nℤ with gcd(a, n) = 1. The set of the prime residue classes modulo n is just the set of invertible elements of the ring ℤ/nℤ with respect to multiplication. This forms a multiplicative group that we denote by (ℤ/nℤ)^⋆ = Pn. We have |Pn| = ϕ(n), where ϕ(n) is the Euler phi-function. If G = Aut(kn|ℚ), then G ≅ Pn under the map σm ↦ m + nℤ. If n = p is a prime number, then G = Aut(kp|ℚ) is cyclic with |G| = p − 1.
If n = p^2, then |G| = |Aut(k_{p^2}|ℚ)| = p(p − 1), since

(x^{p^2} − 1)/(x^p − 1) = x^{p(p−1)} + x^{p(p−2)} + ⋅ ⋅ ⋅ + x^p + 1.
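These counts are easy to check numerically. The Python sketch below computes the prime residue classes modulo n directly from the definition; the helper phi is our own naive implementation of the Euler phi-function:

```python
from math import gcd

def phi(n):
    # Euler phi via the prime residue classes: phi(n) = |(Z/nZ)*|
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

n = 12  # arbitrary example
Pn = [a for a in range(1, n + 1) if gcd(a, n) == 1]
assert len(Pn) == phi(n) == 4                     # P12 = {1, 5, 7, 11}
# closure under multiplication mod n: (Z/nZ)* is a group
assert all((a * b) % n in Pn for a in Pn for b in Pn)

# |Aut(k_{p^2} | Q)| = phi(p^2) = p(p - 1), e.g. for p = 5:
p = 5
assert phi(p * p) == p * (p - 1) == 20
```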
Lemma 17.2.2. Let K be a field of characteristic 0, and let K̃ be the splitting field of x^n − 1 over K. Then Aut(K̃|K) is Abelian.

Proof. We apply Lemma 17.1.5 to the field extension K|ℚ. This can be done, since the characteristic of K is zero, and ℚ is the prime field of K. It follows that Aut(K̃|K) is isomorphic to a subgroup of Aut(ℚ̃|ℚ) from part (1) of Lemma 17.1.5. But ℚ̃ = kn, and hence Aut(ℚ̃|ℚ) is Abelian. Therefore, Aut(K̃|K) is Abelian.
Theorem 17.3.1. Let L|K be a Galois extension by radicals. Then G = Aut(L|K) is solvable.

Proof. Suppose that L|K is a Galois extension by radicals. Then we have a chain of fields

K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr = L

such that Lj = L_{j−1}(^{nj}√aj) for some aj ∈ L_{j−1}. Let n = n1 ⋅ ⋅ ⋅ nr, and let L̃j be the splitting field of the polynomial x^n − 1 ∈ K[x] over Lj for each j = 0, 1, . . . , r. Then L̃j = L̃_{j−1}(^{nj}√aj), and

K ⊂ K̃ = L̃0 ⊂ L̃1 ⊂ ⋅ ⋅ ⋅ ⊂ L̃r = L̃.
From part (2) of Lemma 17.1.5, we get that L̃|K is a Galois extension. Furthermore, each L̃j|L̃_{j−1} is a Galois extension with Aut(L̃j|L̃_{j−1}) cyclic from Theorem 17.1.2. In particular, Aut(L̃j|L̃_{j−1}) is Abelian. The group Aut(K̃|K) is Abelian from Lemma 17.2.2. Therefore, we may apply Lemma 17.1.4 to the chain

K ⊂ K̃ = L̃0 ⊂ ⋅ ⋅ ⋅ ⊂ L̃r = L̃.

Therefore, G̃ = Aut(L̃|K) is solvable. The group G = Aut(L|K) is a homomorphic image of G̃ from the fundamental theorem of Galois theory. Since homomorphic images of solvable groups are still solvable (see Theorem 12.1.3), it follows that G is solvable.
Lemma 17.3.2. Let L|K be a Galois extension, and suppose that G = Aut(L|K) is solv-
able. Assume further that K contains all q-th roots of unity for each prime divisor q of
m = [L : K]. Then L is an extension of K by radicals.
Proof. Let L|K be a Galois extension, and suppose that G = Aut(L|K) is solvable; also
assume that K contains all the q-th roots of unity for each prime divisor q of m = [L : K].
We prove the result by induction on m.
If m = 1, then L = K, and the result is clear. Now suppose that m ≥ 2, and as-
sume that the result holds for all Galois extensions L′ |K ′ with [L′ : K ′ ] < m. Now
G = Aut(L|K) is solvable, and G is nontrivial since m ≥ 2. Let q be a prime divisor of m.
From Lemma 12.1.2 and Theorem 13.3.5, it follows that there is a normal subgroup H of
G with G/H cyclic of order q. Let E = Fix(L, H). From the fundamental theorem of Galois
theory, E|K is a Galois extension with Aut(E|K) ≅ G/H, and hence Aut(E|K) is cyclic of
order q. From Theorem 17.1.2, E|K is a simple extension of K by a radical. The proof is
completed if we can show that L is an extension of E by radicals.
The extension L|E is a Galois extension, and the group Aut(L|E) is solvable, since it
is a subgroup of G = Aut(L|K). Each prime divisor p of [L : E] is also a prime divisor of
m = [L : K] by the degree formula. Hence, as an extension of K, the field E contains all
the p-th roots of unity. Finally,
[L : E] = [L : K]/[E : K] = m/q < m.

By the induction hypothesis, L is an extension of E by radicals. Since E|K is a simple extension by a radical, L is an extension of K by radicals.
Theorem 17.4.1. Let K be a field of characteristic 0, and let f (x) ∈ K[x]. Suppose that L
is the splitting field of f (x) over K. Then the polynomial equation f (x) = 0 is solvable by
radicals if and only if Aut(L|K) is solvable.
Proof. Suppose first that f(x) = 0 is solvable by radicals. Then L is contained in an extension L′ of K by radicals. Hence, L is contained in a Galois extension L̃ of K by radicals from Theorem 17.1.3. The group G̃ = Aut(L̃|K) is solvable from Theorem 17.3.1. Furthermore, L|K is a Galois extension. Therefore, the Galois group Aut(L|K) is solvable as a homomorphic image of G̃.

Conversely, suppose that the group Aut(L|K) is solvable. Let q1, . . . , qr be the prime divisors of m = [L : K], and let n = q1 ⋅ ⋅ ⋅ qr. Let K̃ and L̃ be the splitting fields of the
polynomial x^n − 1 ∈ K[x] over K and L, respectively. We have K̃ ⊂ L̃. From part (2) of Lemma 17.1.5, we have that L̃|K is a Galois extension, and Aut(L̃|K̃) is isomorphic to a subgroup of Aut(L|K). From this, we first obtain that [L̃ : K̃] = |Aut(L̃|K̃)| is a divisor of [L : K] = |Aut(L|K)|. Hence, each prime divisor q of [L̃ : K̃] is also a prime divisor of [L : K]. Therefore, L̃ is an extension by radicals of K̃ by Lemma 17.3.2. Since K̃ = K(ω), where ω is a primitive n-th root of unity, we obtain that L̃ is also an extension of K by radicals. Since L ⊂ L̃, the equation f(x) = 0 is solvable by radicals.
Corollary 17.4.2. Let K be a field of characteristic 0, and let f (x) ∈ K[x] be a polynomial
of degree m with 1 ≤ m ≤ 4. Then the equation f (x) = 0 is solvable by radicals.
Proof. Let L be the splitting field of f(x) over K. The Galois group Aut(L|K) is isomorphic to a subgroup of the symmetric group Sm. Now, the group S4 is solvable via the chain

{1} ⊂ ℤ2 ⊂ D2 ⊂ A4 ⊂ S4,

where ℤ2 is the cyclic group of order 2, and D2 is the Klein 4-group, which is isomorphic to ℤ2 × ℤ2. Because Sm ⊂ S4 for 1 ≤ m ≤ 4, it follows that Aut(L|K) is solvable. From Theorem 17.4.1, the equation f(x) = 0 is solvable by radicals.
Corollary 17.4.2 uses the general theory to show that any polynomial equation of
degree less than or equal to 4 is solvable by radicals. This, however, does not provide
explicit formulas for the solutions. We present these below:
Let K be a field of characteristic 0, and let f(x) ∈ K[x] be a polynomial of degree m with 1 ≤ m ≤ 4. As mentioned above, we denote by L the splitting field of the respective polynomial.
Case (1): If deg(f(x)) = 1, then f(x) = ax + b with a, b ∈ K and a ≠ 0. The zero is then given by k = −b/a.
Case (2): If deg(f(x)) = 2, then f(x) = ax^2 + bx + c with a, b, c ∈ K and a ≠ 0. The zeros are then given by the quadratic formula

k = (−b ± √(b^2 − 4ac)) / (2a).

We note that the quadratic formula holds over any field of characteristic not equal to 2. Whether there is a solution within the field K then depends on whether b^2 − 4ac has a square root within K.
For the cases of degrees 3 and 4, we have the general forms of what are known as
Cardano’s formulas.
Case (3): If deg(f(x)) = 3, then f(x) = ax^3 + bx^2 + cx + d with a, b, c, d ∈ K and a ≠ 0. Dividing through by a, we may assume, without loss of generality, that a = 1. By the substitution x = y − b/3, the polynomial is transformed into

g(y) = y^3 + py + q ∈ K[y].

Let L be the splitting field of g(y) over K, and let α ∈ L be a zero of g(y), so that

α^3 + pα + q = 0.
If p = 0, then α = ³√−q, so that g(y) has the three zeros

³√−q, ω ³√−q, ω^2 ³√−q,

where ω is a primitive third root of unity. Now suppose that p ≠ 0. We seek a β ≠ 0 with α = β − p/(3β). Substituting this into

α^3 + pα + q = 0,

we get

β^3 − p^3/(27β^3) + q = 0.
Define γ = β^3 and δ = (−p/(3β))^3, so that

γ + δ + q = 0.

Then

γ^2 + qγ − (p/3)^3 = 0 and −p^3/(27δ) + δ + q = 0 and δ^2 + qδ − (p/3)^3 = 0.

Hence, the solutions of the quadratic equation

x^2 + qx − (p/3)^3 = 0

are

γ, δ = −q/2 ± √((q/2)^2 + (p/3)^3).
In particular, γ = δ if and only if

√((q/2)^2 + (p/3)^3) = 0.

From the definitions of γ and δ, we have γ = β^3 and δ = (−p/(3β))^3. From above, α = β − p/(3β). Therefore, we get α by finding the cube roots of γ and δ.
There are certain possibilities and combinations with these cube roots but, because of the conditions, the cube roots of γ and δ are not independent: we must satisfy the condition

³√γ ³√δ = β ⋅ (−p/(3β)) = −p/3.

The three zeros of g(y) are then

u + v, ωu + ω^2 v, ω^2 u + ωv,

where

u = ³√(−q/2 + √((q/2)^2 + (p/3)^3)) and v = ³√(−q/2 − √((q/2)^2 + (p/3)^3)),

and the cube roots u = ³√γ and v = ³√δ are chosen so that uv = −p/3.
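Cardano's formulas can be exercised numerically over ℂ. The Python sketch below implements them for a depressed cubic y^3 + py + q, choosing v so that uv = −p/3; the test case y^3 − 7y + 6 = (y − 1)(y − 2)(y + 3) is an arbitrary choice (one where the inner square root is imaginary even though all three zeros are real):

```python
import cmath

def depressed_cubic_roots(p, q):
    # numerical sketch of Cardano's formula for y^3 + p*y + q = 0
    w = cmath.exp(2j * cmath.pi / 3)                 # primitive third root of unity
    disc = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    u = complex(-q / 2 + disc) ** (1 / 3)            # a cube root of gamma (principal branch)
    if abs(u) < 1e-12:                               # only possible when p = 0
        v = complex(-q) ** (1 / 3)
        return [v, w * v, w * w * v]
    v = -p / (3 * u)                                 # cube root of delta chosen so u*v = -p/3
    return [u + v, w * u + w * w * v, w * w * u + w * v]

roots = depressed_cubic_roots(-7, 6)                 # y^3 - 7y + 6 = (y - 1)(y - 2)(y + 3)
assert sorted(round(r.real) for r in roots) == [-3, 1, 2]
assert all(abs(r ** 3 - 7 * r + 6) < 1e-9 for r in roots)
```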
Case (4): If deg(f(x)) = 4, then, as before, we may assume that f(x) is monic, and the substitution x = y − b/4 transforms f(x) into

g(y) = y^4 + py^2 + qy + r.

We have to find the zeros of g(y). Let x1, x2, x3, x4 be the solutions, in the splitting field, of the equation

y^4 + py^2 + qy + r = 0.
Then
0 = x1 + x2 + x3 + x4 ,
p = x1 x2 + x1 x3 + x1 x4 + x2 x3 + x2 x4 + x3 x4 ,
−q = x1 x2 x3 + x1 x2 x4 + x1 x3 x4 + x2 x3 x4 ,
r = x1 x2 x3 x4 .
We define
y1 = (x1 + x2 )(x3 + x4 ),
y2 = (x1 + x3 )(x2 + x4 ),
y3 = (x1 + x4 )(x2 + x3 ).
From x1 + x2 + x3 + x4 = 0, we get

y1 = −(x1 + x2)^2, y2 = −(x1 + x3)^2, y3 = −(x1 + x4)^2.

Let y^3 + fy^2 + gy + h = 0 be the cubic equation with the solutions y1, y2, and y3. This polynomial y^3 + fy^2 + gy + h is called the cubic resolvent of the equation of degree four.
If we compare the coefficients, we get the following:

f = −y1 − y2 − y3,
g = y1y2 + y1y3 + y2y3,
h = −y1y2y3.

Expressing y1, y2, y3 in terms of the coefficients p, q, r gives

f = −2p,
g = p^2 − 4r,
h = q^2.
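These identities can be checked numerically for an arbitrary depressed quartic. In the Python sketch below, we start from four roots with sum 0 (an arbitrary choice), build p, q, r as elementary symmetric functions, and compare the resolvent coefficients:

```python
from itertools import combinations

# four numbers with sum 0 serve as the roots x1..x4 of a depressed quartic
x = [1.0, 2.0, -0.5, -2.5]
assert abs(sum(x)) < 1e-12

# elementary symmetric functions give the coefficients of y^4 + p y^2 + q y + r
p = sum(a * b for a, b in combinations(x, 2))
q = -sum(a * b * c for a, b, c in combinations(x, 3))
r = x[0] * x[1] * x[2] * x[3]

y1 = (x[0] + x[1]) * (x[2] + x[3])
y2 = (x[0] + x[2]) * (x[1] + x[3])
y3 = (x[0] + x[3]) * (x[1] + x[2])

# coefficients of the cubic resolvent y^3 + f y^2 + g y + h
f = -(y1 + y2 + y3)
g = y1 * y2 + y1 * y3 + y2 * y3
h = -y1 * y2 * y3
assert abs(f - (-2 * p)) < 1e-9
assert abs(g - (p * p - 4 * r)) < 1e-9
assert abs(h - q * q) < 1e-9
```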
Since y1 = −(x1 + x2)^2, y2 = −(x1 + x3)^2, and y3 = −(x1 + x4)^2, we obtain

x1 + x2 = −(x3 + x4) = ±√−y1,
x1 + x3 = −(x2 + x4) = ±√−y2,
x1 + x4 = −(x2 + x3) = ±√−y3.

Adding these three equations and using x1 + x2 + x3 + x4 = 0 gives 2x1 = ±√−y1 ± √−y2 ± √−y3. The formulas for x2, x3, and x4 follow analogously and are of the same type as that for x1. By variation of the signs, we get eight numbers ±x1, ±x2, ±x3, and ±x4. Four of them are the solutions of the equation

y^4 + py^2 + qy + r = 0.

The correct ones are found by substituting into the equation. They are as follows:
x₁ = ½(√−y₁ + √−y₂ + √−y₃),
x₂ = ½(√−y₁ − √−y₂ − √−y₃),
x₃ = ½(−√−y₁ + √−y₂ − √−y₃),
x₄ = ½(−√−y₁ − √−y₂ + √−y₃).
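The resolvent construction can be verified on a concrete example. In the sketch below, the sample roots and the sign choice s₁s₂s₃ = −q (a known compatibility condition; the text selects the correct signs by substitution) are our own choices for illustration:

```python
import itertools

# four sample roots with x1 + x2 + x3 + x4 = 0 (our choice)
xs = [1, 2, -3, 0]
x1, x2, x3, x4 = xs

# coefficients of y^4 + p*y^2 + q*y + r via the elementary symmetric functions
p = sum(a * b for a, b in itertools.combinations(xs, 2))
q = -sum(a * b * c for a, b, c in itertools.combinations(xs, 3))
r = x1 * x2 * x3 * x4

y1 = (x1 + x2) * (x3 + x4)
y2 = (x1 + x3) * (x2 + x4)
y3 = (x1 + x4) * (x2 + x3)

# y1, y2, y3 are roots of the cubic resolvent y^3 - 2p*y^2 + (p^2 - 4r)*y + q^2
for y in (y1, y2, y3):
    assert y ** 3 - 2 * p * y ** 2 + (p ** 2 - 4 * r) * y + q ** 2 == 0

# square roots s_i of -y_i chosen with s1*s2*s3 = -q recover the four roots
s1, s2, s3 = 3, -2, 1
assert s1 * s1 == -y1 and s2 * s2 == -y2 and s3 * s3 == -y3 and s1 * s2 * s3 == -q
half = [(s1 + s2 + s3) / 2, (s1 - s2 - s3) / 2, (-s1 + s2 - s3) / 2, (-s1 - s2 + s3) / 2]
assert half == xs
```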
The following theorem is due to Abel; it shows the insolvability of the general de-
gree 5 polynomial over the rationals ℚ.
Theorem 17.4.3. Let L be the splitting field of the polynomial f(x) = x⁵ − 2x⁴ + 2 ∈ ℚ[x]
over ℚ. Then Aut(L|ℚ) = S₅, the symmetric group on 5 letters. Since S₅ is not solvable, the
equation f (x) = 0 is not solvable by radicals.
Proof. The polynomial f (x) is irreducible over ℚ by the Eisenstein criterion. Further-
more, f (x) has five zeros in the complex numbers ℂ by the fundamental theorem of al-
gebra (see Section 17.6). We claim that f (x) has exactly 3 real zeros and 2 nonreal zeros,
which then necessarily are complex conjugates. In particular, the 5 zeros are pairwise
distinct.
To see the claim, notice first that f (x) has at least 3 real zeros from the intermediate
value theorem. As a real function, f (x) is continuous, f (−1) = −1 < 0, f (0) = 2 > 0, so
it must have a real zero between −1 and 0. Furthermore, we have f(3/2) = −17/32 < 0 and
f(2) = 2 > 0. Hence, there must be distinct real zeros between 0 and 3/2, and between 3/2
and 2. Suppose that f (x) has more than 3 real zeros. Then f ′ (x) = x 3 (5x − 8) has at least 3
pairwise distinct real zeros from Rolle’s theorem. But f ′ (x) clearly has only 2 real zeros,
so this is not the case. Therefore, f (x) has exactly 3 real zeros, and hence 2 nonreal zeros
that are complex conjugates.
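The sign computations in this proof can be checked exactly with rational arithmetic; a brief sketch (the values are as computed here, e.g. f(3/2) = −17/32):

```python
from fractions import Fraction

def f(x):
    return x ** 5 - 2 * x ** 4 + 2

# the sign pattern used in the proof
assert f(-1) == -1 and f(0) == 2 and f(2) == 2
assert f(Fraction(3, 2)) == Fraction(-17, 32)

# f' = x^3(5x - 8) vanishes only at 0 and 8/5; f > 0 at the local maximum
# and f < 0 at the local minimum, so f has exactly three real zeros
assert f(0) > 0 and f(Fraction(8, 5)) < 0
```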
Let L be the splitting field of f (x). The field L lies in ℂ, and the restriction of the map
δ : z → z of ℂ to L maps the set of zeros of f (x) onto themselves. Therefore, δ is an
automorphism of L. The map δ fixes the 3 real zeros and transposes the 2 nonreal zeros.
From this, we now show that Aut(L|ℚ) = Aut L = G = S5 , the full symmetric group on 5
symbols. Clearly, G ⊂ S5 , since G acts as a permutation group on the 5 zeros of f (x).
Since δ transposes the 2 nonreal zeros, G (as a permutation group) contains at least
one transposition. Since f (x) is irreducible, G acts transitively on the zeros of f (x). Let
x0 be one of the zeros of f (x), and let Gx0 be the stabilizer of x0 .
Since G acts transitively, x0 has five images under G; therefore, the index of the
stabilizer must be 5 (see Chapter 10):
5 = [G : Gx0 ],
which, by Lagrange's theorem, must divide the order of G. Therefore, from the Sylow
theorems, G contains an element of order 5; hence, G contains a 5-cycle and a transposition. Since 5 is prime, a 5-cycle together with any transposition generates S₅, so G = S₅.
Since Abel’s theorem shows that there exists a degree 5 polynomial that cannot be
solved by radicals, it follows that there can be no formula like Cardano’s formula in
terms of radicals for degree 5.
Corollary 17.4.4. There is no general formula for solving by radicals a fifth degree poly-
nomial over the rationals.
We now show that this result can be further extended to any degree greater than 5.
Theorem 17.4.5. For each n ≥ 5, there exist polynomials f (x) ∈ ℚ[x] of degree n, for
which the equation f (x) = 0 is not solvable by radicals.
Proof. Let f (x) = x n−5 (x 5 − 2x 4 + 2), and let L be the splitting field of f (x) over ℚ. Then
Aut(L|ℚ) = Aut(L) contains a subgroup that is isomorphic to S5 . It follows that Aut(L) is
not solvable; therefore, the equation f (x) = 0 is not solvable by radicals.
Corollary 17.4.6. There is no general formula for solving by radicals polynomial equa-
tions over the rationals of degree 5 or greater.
17.5 Constructibility of Regular n-Gons

Recall the Fermat numbers Fₙ = 2^(2^n) + 1, n = 0, 1, 2, …. Fermat believed that all the numbers in this sequence were primes. In fact, F₀, F₁,
F2 , F3 , F4 are all prime, but F5 is composite and divisible by 641 (see exercises). It is still
an open question whether or not there are infinitely many Fermat primes. It has been
conjectured that there are only finitely many. On the other hand, if a number of the form
2n + 1 is a prime for some integer n, then it must be a Fermat prime; that is, n must be a
power of 2.
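The claims about the Fermat numbers can be checked directly; a small stdlib sketch (the function names are ours), which also anticipates Exercise 12:

```python
def fermat(n):
    # F_n = 2^(2^n) + 1
    return 2 ** (2 ** n) + 1

def is_prime(m):
    # naive trial division, sufficient for these sizes
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

assert all(is_prime(fermat(n)) for n in range(5))  # F_0, ..., F_4 are prime
assert fermat(5) % 641 == 0                        # F_5 is divisible by 641

# if 2^n + 1 is prime, then n is a power of 2 (checked for small n)
for n in range(1, 30):
    if is_prime(2 ** n + 1):
        assert n & (n - 1) == 0
```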
We first need the following:
ℚ = L₀ ⊂ L₁ ⊂ ⋯ ⊂ Lₙ = kp,  with [Lⱼ : Lⱼ₋₁] = 2 for j = 1, …, n.
with [Uj−1 : Uj ] = 2 for j = 1, . . . , n. From the fundamental theorem of Galois theory, the
fields Lj = Fix(kp , Uj ) with j = 0, . . . , n have the desired properties.
The following corollaries describe completely the constructible n-gons, tying them
to Fermat primes.
Corollary 17.5.2. Consider the numbers 0, 1, that is, a unit line segment or a unit circle.
A regular p-gon with p ≥ 3 prime is constructible from {0, 1} using a straightedge and
compass if and only if p = 2^(2^s) + 1, s ≥ 0, is a Fermat prime.
Proof. From Theorem 6.3.13, we have that if a regular p-gon is constructible with a
straightedge and compass, then p must be a Fermat prime. The sufficiency follows from
Theorem 17.5.1.
We now extend this to general n-gons. Let m, n ∈ ℕ. Assume that we may construct
from {0, 1} a regular n-gon and a regular m-gon. In particular, this means that we may
construct the real numbers cos(2π/n), sin(2π/n), cos(2π/m), and sin(2π/m). If gcd(m, n) = 1,
then we may construct from {0, 1} a regular mn-gon.
To see this, notice that
cos(2π/n + 2π/m) = cos(2(n + m)π/(nm)) = cos(2π/n) cos(2π/m) − sin(2π/n) sin(2π/m),
and
sin(2π/n + 2π/m) = sin(2(n + m)π/(nm)) = sin(2π/n) cos(2π/m) + cos(2π/n) sin(2π/m).
Therefore, we may construct from {0, 1} the numbers cos(2π/(mn)) and sin(2π/(mn)), because
gcd(n + m, mn) = 1. Hence, we may construct from {0, 1} a regular mn-gon.
Now let p ≥ 3 be a prime. Then [kp² : ℚ] = p(p − 1), which is not a power of 2.
Therefore, from {0, 1} it is not possible to construct a regular p²-gon. Hence, altogether
we have the following:
Corollary 17.5.3. Consider the numbers 0, 1, that is, a unit line segment or a unit circle.
A regular n-gon with n ∈ ℕ is constructible from {0, 1} using a straightedge and compass
if and only if
(i) n = 2m , m ≥ 0 or
(ii) n = 2m p1 p2 ⋅ ⋅ ⋅ pr , m ≥ 0, and the pi are pairwise distinct Fermat primes.
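Corollary 17.5.3 is equivalent to saying that Euler's totient φ(n) is a power of 2 (the Gauss–Wantzel criterion). A short check of that equivalent form for small n (function names are ours):

```python
from math import gcd

def phi(n):
    # Euler's totient by direct counting (fine for small n)
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def constructible(n):
    # regular n-gon (n >= 3) is constructible iff phi(n) is a power of 2
    return n >= 3 and phi(n) & (phi(n) - 1) == 0

assert [n for n in range(3, 21) if constructible(n)] == [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```

Note that 7, 9, 11, 13, 14, 18, 19 are exactly the excluded cases below 21, consistent with conditions (i) and (ii).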
17.6 The Fundamental Theorem of Algebra

Theorem 17.6.1. Each nonconstant polynomial f(x) ∈ ℂ[x], where ℂ is the field of complex numbers, has a zero in ℂ. Therefore, ℂ is an algebraically closed field.
Proof. Let f (x) ∈ ℂ[x] be a nonconstant polynomial, and let K be the splitting field of
f (x) over ℂ. Since the characteristic of the complex numbers ℂ is zero, this will be a
Galois extension of ℂ. Since ℂ is a finite extension of ℝ, this field K would also be a
Galois extension of ℝ. The fundamental theorem of algebra asserts that K must be ℂ
itself; hence, the fundamental theorem of algebra is equivalent to the fact that any
finite Galois extension of ℂ is ℂ itself.
Let K be any finite extension of ℝ with [K : ℝ] = 2^m q, gcd(2, q) = 1. If m = 0, then K is
an odd-degree extension of ℝ. Since K is separable over ℝ, from the primitive element
theorem, it is a simple extension, and hence K = ℝ(α), where the minimal polynomial
mα (x) over ℝ has odd degree. However, odd-degree real polynomials always have a real
zero, and therefore mα (x) is irreducible only if its degree is one. But then, α ∈ ℝ, and
K = ℝ. Therefore, if K is a nontrivial finite extension of ℝ of degree 2^m q, we must have
m > 0. This shows more generally that there are no odd-degree finite extensions of ℝ.
The fact that ℂ is algebraically closed limits the possible algebraic extensions of the
reals.
Corollary 17.6.2. Let K be a finite field extension of the real numbers ℝ. Then K = ℝ or
K = ℂ.
Proof. Since |K : ℝ| < ∞ by the primitive element theorem, K = ℝ(α) for some α ∈ K.
Then the minimal polynomial mα (x) of α over ℝ is in ℝ[x], and hence in ℂ[x]. Therefore,
from the fundamental theorem of algebra it has a zero in ℂ. Hence, α ∈ ℂ. If α ∈ ℝ, then
K = ℝ; if not, then K = ℂ.
17.7 Exercises
1. For f(x) ∈ ℚ[x] with

³√(2 ± √−121) = 2 ± √−1,
x n − 1 = (x − ξ1 )(x − ξ2 ) ⋅ ⋅ ⋅ (x − ξn ),
ξν = e^{2πiν/n} = cos(2πν/n) + i ⋅ sin(2πν/n),  1 ≤ ν ≤ n,
are all (different) n-th roots of unity; in particular, ξₙ = 1. These ξν form a multiplicative cyclic group G = {ξ₁, ξ₂, …, ξₙ} generated by ξ₁; we have ξν = ξ₁^ν.
An n-th root of unity ξν is called a primitive n-th root of unity, if ξν is not an m-th root
of unity for any m < n.
Show that the following are equivalent:
(i) ξν is a primitive n-th root of unity.
(ii) ξν is a generating element of G.
(iii) gcd(ν, n) = 1.
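The equivalence in this exercise can be tested numerically for a specific n; the sketch below (n = 12 and the tolerance are our choices) compares the orders of the ξν with the gcd condition and with the generator property:

```python
import cmath
from math import gcd

n = 12
xi = [cmath.exp(2j * cmath.pi * v / n) for v in range(1, n + 1)]

def order(z):
    # smallest m >= 1 with z^m = 1, up to floating point tolerance
    m, w = 1, z
    while abs(w - 1) > 1e-9:
        w *= z
        m += 1
    return m

# (i) primitive n-th root of unity: order exactly n
prim = [v for v in range(1, n + 1) if order(xi[v - 1]) == n]
# (ii) generator of G: the powers xi_v^k = xi_{kv mod n} exhaust G
gens = [v for v in range(1, n + 1) if len({(k * v) % n for k in range(n)}) == n]
# (iii) gcd(v, n) = 1
cop = [v for v in range(1, n + 1) if gcd(v, n) == 1]
assert prim == gens == cop
```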
11. The polynomial ϕₙ(x) ∈ ℂ[x], whose zeros are exactly the primitive n-th roots of
unity, is called the n-th cyclotomic polynomial. By Exercise 6, we have
ϕₙ(x) = ∏ (x − ξν) = ∏ (x − e^{2πiν/n}), where both products run over 1 ≤ ν ≤ n with gcd(ν, n) = 1.
The degree of ϕₙ(x) is the number of integers in {1, …, n} that are coprime to n.
Show the following:
(i) xⁿ − 1 = ∏_{d | n} ϕ_d(x).
(ii) ϕn (x) ∈ ℤ[x] for all n ≥ 1.
(iii) ϕn (x) is irreducible over ℚ (and therefore also over ℤ) for all n ≥ 1.
12. Show that the Fermat numbers F0 , F1 , F2 , F3 , F4 are all prime but F5 is composite and
divisible by 641.
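The recursion hidden in identity (i), namely ϕₙ(x) = (xⁿ − 1) / ∏ ϕ_d(x) over the proper divisors d of n, can be used to compute cyclotomic polynomials with exact rational arithmetic; a sketch (helper names are ours, coefficients listed from the constant term up):

```python
from fractions import Fraction

def div(a, b):
    # exact division of polynomials; asserts that the remainder vanishes
    a = [Fraction(c) for c in a]
    q = [Fraction(0)] * (len(a) - len(b) + 1)
    for i in reversed(range(len(q))):
        q[i] = a[i + len(b) - 1] / b[-1]
        for j, c in enumerate(b):
            a[i + j] -= q[i] * c
    assert all(c == 0 for c in a)
    return q

def cyclotomic(n):
    # phi_n = (x^n - 1) / prod of phi_d over proper divisors d of n
    p = [Fraction(-1)] + [Fraction(0)] * (n - 1) + [Fraction(1)]
    for d in range(1, n):
        if n % d == 0:
            p = div(p, cyclotomic(d))
    return p

assert cyclotomic(6) == [1, -1, 1]          # phi_6 = x^2 - x + 1
assert cyclotomic(12) == [1, 0, -1, 0, 1]   # phi_12 = x^4 - x^2 + 1
assert all(c.denominator == 1 for c in cyclotomic(20))  # integer coefficients
```

That the exact divisions never leave a remainder, and that all denominators are 1, illustrates parts (i) and (ii) of Exercise 11.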
18 The Theory of Modules
18.1 Modules over Rings
Recall that a vector space V over a field K is an Abelian group V with a scalar multipli-
cation ⋅ : K × V → V , satisfying the following:
(1) f (v1 + v2 ) = fv1 + fv2 for f ∈ K and v1 , v2 ∈ V .
(2) (f1 + f2 )v = f1 v + f2 v for f1 , f2 ∈ K and v ∈ V .
(3) (f1 f2 )v = f1 (f2 v) for f1 , f2 ∈ K and v ∈ V .
(4) 1v = v for v ∈ V .
Vector spaces are the fundamental algebraic structures in linear algebra, and the study
of linear equations. Vector spaces have been crucial in our study of fields and Galois
theory, since any field extension is a vector space over any subfield. In this context, the
degree of a field extension is just the dimension of the extension field as a vector space
over the base field. If we modify the definition of a vector space to allow scalar multipli-
cation from an arbitrary ring, we obtain a more general structure called a module. We
will formally define this below. Modules generalize vector spaces, but the fact that the
scalars do not necessarily have inverses makes the study of modules much more com-
plicated. Modules will play an important role in both the study of rings and the study
of Abelian groups. In fact, any Abelian group is a module over the integers ℤ so that
modules, besides being generalizations of vector spaces, can also be considered as gen-
eralizations of Abelian groups.
In this chapter, we will introduce the theory of modules. In particular, we will ex-
tend to modules the basic algebraic properties such as the isomorphism theorems, which
have been introduced earlier in presenting groups, rings, and fields. We restrict our-
selves to commutative rings, so that throughout R is always a commutative ring. If R has
an identity 1, then we always consider only the case that 1 ≠ 0. Throughout this chapter,
we use letters a, b, c, m, . . . for ideals in R. For principal ideals, we write ⟨a⟩ or aR for
the ideal generated by a ∈ R. We note, however, that the definition can be extended to
include modules over noncommutative rings (see Chapter 22). In this case, we would
speak of left modules and right modules.
Definition 18.1.1. Let R = (R, +, ⋅) be a commutative ring and M = (M, +) an Abelian group.
M together with a scalar multiplication ⋅ : R × M → M, (α, x) → αx, is called an R-module
or module over R if the following axioms hold:
(M1) (α + β)x = αx + βx,
(M2) α(x + y) = αx + αy, and
(M3) (αβ)x = α(βx) for all α, β ∈ R and x, y ∈ M.
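The axioms (M1)–(M3) can be spot-checked mechanically for a small example; here we take M = (ℤ/6ℤ)² as a ℤ-module (our choice of example):

```python
import itertools

MOD = 6  # our example: M = (Z/6Z)^2 as a Z-module

def add(x, y):
    return tuple((a + b) % MOD for a, b in zip(x, y))

def smul(n, x):
    # scalar multiplication Z x M -> M
    return tuple((n * a) % MOD for a in x)

elts = list(itertools.product(range(MOD), repeat=2))
for a, b in itertools.product(range(-3, 4), repeat=2):
    for x, y in itertools.product(elts[:8], repeat=2):
        assert smul(a + b, x) == add(smul(a, x), smul(b, x))      # (M1)
        assert smul(a, add(x, y)) == add(smul(a, x), smul(a, y))  # (M2)
        assert smul(a * b, x) == smul(a, smul(b, x))              # (M3)
```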
https://doi.org/10.1515/9783111142524-018
(2) Every Abelian group G becomes a ℤ-module via

⋅ : ℤ × G → G, (n, x) → nx,

where 0 ⋅ x = 0, nx = x + ⋯ + x (n summands) if n > 0, and nx = (−n)(−x) if n < 0.
(3) Let S be a subring of R. Then, via (s, r) → sr, the ring R itself becomes an S-module.
(4) Let K be a field, V a K-vector space, and f : V → V a linear map of V.
Let p = ∑ᵢ αᵢtⁱ ∈ K[t]. Then p(f) := ∑ᵢ αᵢf ⁱ defines a linear map of V, and V is a
unitary K[t]-module via the scalar multiplication p ⋅ v := p(f)(v).
Basic to all algebraic theory is the concept of substructures. Next we define submod-
ules.
We next extend to modules the concept of a generating system. For a single genera-
tor, as with groups, this is called cyclic.
⟨⋃_{i∈I} Uᵢ⟩ = {∑_{i∈L} aᵢ : aᵢ ∈ Uᵢ, L ⊂ I finite}.

We write ⟨⋃_{i∈I} Uᵢ⟩ =: ∑_{i∈I} Uᵢ and call this submodule the sum of the Uᵢ. A sum ∑_{i∈I} Uᵢ
is called a direct sum if for each representation of 0, as 0 = ∑ aᵢ, aᵢ ∈ Uᵢ, it follows that
all aᵢ = 0. This is equivalent to Uᵢ ∩ ∑_{j≠i} Uⱼ = {0} for all i ∈ I.
Notation: ⨁i∈I Ui ; and if I = {1, . . . , n}, then we also write U1 ⊕ ⋅ ⋅ ⋅ ⊕ Un .
In analogy with our previously defined algebraic structure, we extend to modules
the concepts of quotient modules and module homomorphisms.
Definition 18.1.8. Let U be a submodule of the R-module M. Let M/U be the factor
group. We define a (well defined) scalar multiplication

α(x + U) := αx + U, α ∈ R, x ∈ M.

With this, M/U is an R-module, the factor module or quotient module of M by U. In M/U,
we have the operations
(x + U) + (y + U) = (x + y) + U,
and
α(x + U) = αx + U.
A module M over a ring R can also be considered as a module over a quotient ring
of R. The following is straightforward to verify (see exercises):
Lemma 18.1.9. Let a ⊲ R be an ideal in R and M an R-module. The set of all finite sums of the
form ∑ αᵢxᵢ, αᵢ ∈ a, xᵢ ∈ M, is a submodule of M, which we denote by aM. The factor group
M/aM becomes an R/a-module via the well defined scalar multiplication

(α + a)(x + aM) := αx + aM.
If here R has an identity 1 and a is a maximal ideal, then M/aM becomes a vector space
over the field K = R/a.
A map f : M → N between R-modules is called a module homomorphism if

f(x + y) = f(x) + f(y) and f(αx) = αf(x)

for all α ∈ R and all x, y ∈ M. Endo-, epi-, mono-, iso- and automorphisms are defined
analogously via the corresponding properties of the maps. If f : M → N and g : N → P
are module homomorphisms, then g ∘ f : M → P is also a module homomorphism. If
f : M → N is an isomorphism, then f⁻¹ : N → M is also an isomorphism.
and

f(M) ≅ M/ker(f).

For submodules U and V of M, we further have

U/(U ∩ V) ≅ (U + V)/V.
For the proofs, as for groups, just consider the map f : U + V → U/(U ∩ V ), u + v →
u + (U ∩ V ), which is well defined because U ∩ V is a submodule of U; then we have
ker(f ) = V .
Note that α → αρ, for fixed ρ ∈ R, defines a module homomorphism R → R if we
consider R itself as an R-module.
Definition 18.2.1. Let M be an R-module. For a fixed a ∈ M, consider the module homo-
morphism λa : R → M, λa(α) := αa, where we consider R as an R-module. We call ker(λa)
the annihilator of a, denoted by Ann(a); that is,
Ann(a) = {α ∈ R : αa = 0}.
Lemma 18.2.2. The annihilator Ann(a) is a submodule of R and the module isomorphism
theorem (1) gives R/ Ann(a) ≅ Ra.
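Lemma 18.2.2 can be illustrated in the ℤ-module ℤ/12ℤ, where Ann(a) is generated by 12/gcd(a, 12) and the cyclic submodule ℤa has exactly 12/gcd(a, 12) elements; a quick sketch (the function name is ours):

```python
from math import gcd

# example: M = Z/12Z as a Z-module; Ann(a) = { n in Z : n*a ≡ 0 (mod 12) }
def ann_generator(a, m=12):
    # Ann(a) is the principal ideal generated by m // gcd(a, m)
    return m // gcd(a, m)

for a in range(12):
    d = ann_generator(a)
    assert all((n * a) % 12 == 0 for n in range(0, 48, d))  # multiples of d annihilate a
    assert all((n * a) % 12 != 0 for n in range(1, d))      # nothing smaller does
    # R/Ann(a) ≅ Ra: the cyclic submodule Za has exactly d elements
    assert len({(n * a) % 12 for n in range(12)}) == d
```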
As for single elements, since Ann(U) = ⋂_{u∈U} Ann(u), Ann(U) is a submodule
of R. If ρ ∈ R and u ∈ U, then ρu ∈ U; hence, if α ∈ Ann(U), then also αρ ∈ Ann(U),
because (αρ)u = α(ρu) = 0. Hence, Ann(U) is an ideal in R.
Suppose that G is an Abelian group. Then as aforementioned, G is a ℤ-module. An
element g ∈ G is a torsion element, or has finite order if ng = 0 for some n ∈ ℕ. The
set Tor(G) consists of all the torsion elements in G. An Abelian group is torsion-free if
Tor(G) = {0}.
Lemma 18.2.4. Let G be an Abelian group. Then Tor(G) is a subgroup of G, and the factor
group G/ Tor(G) is torsion-free.
Definition 18.2.5. The R-module M is called faithful if Ann(M) = {0}. We call an element
a ∈ M a torsion element, or element of finite order, if Ann(a) ≠ {0}. A module without
torsion elements ≠ 0 is called torsion-free. If the R-module M is torsion-free, then R has
no zero divisors ≠ 0.
Theorem 18.2.6. Let R be an integral domain and M an R-module (by our agreement M
is unitary). Let Tor(M) = T(M) be the set of torsion elements of M. Then Tor(M) is a
submodule of M, and M/ Tor(M) is torsion-free.
For a bijection π : I → I, we have

∏_{i∈I} Mᵢ ≅ ∏_{i∈I} M_{π(i)} and ⨁_{i∈I} Mᵢ ≅ ⨁_{i∈I} M_{π(i)},

and, if I = ⋃_{j∈J} Iⱼ is a partition of I into disjoint subsets, then

∏_{i∈I} Mᵢ ≅ ∏_{j∈J} (∏_{i∈Iⱼ} Mᵢ) and ⨁_{i∈I} Mᵢ ≅ ⨁_{j∈J} (⨁_{i∈Iⱼ} Mᵢ).
Proof. We first consider (1). If there is such a ϕ, then the jth component of ϕ(a) equals
ϕⱼ(a), because πⱼ ∘ ϕ = ϕⱼ. Hence, define ϕ(a) ∈ ∏_{i∈I} Mᵢ via ϕ(a)(i) := ϕᵢ(a), and ϕ is the
desired map.
We now prove (2). If there is such a Ψ with Ψ ∘ αj = Ψj , then
Hence, define Ψ((xi )) = ∑i∈I Ψi (xi ), and Ψ is the desired map (recall that the sum is well
defined).
We now define a basis for a module. Those modules that actually have a basis are
called free modules.
Let R be a ring with identity 1, M be a unitary R-module, and S ⊂ M. Each finite sum
∑ αᵢsᵢ, with the αᵢ ∈ R and the sᵢ ∈ S, is called a linear combination in S. Since M is unitary
and S ≠ ∅, ⟨S⟩ is exactly the set of all linear combinations in S. In the following, we
assume that S ≠ ∅. If S = ∅, then ⟨S⟩ = {0}, and this case is not interesting. By
convention, in the following, we always assume mᵢ ≠ mⱼ if i ≠ j in a finite sum ∑ αᵢmᵢ
with all αᵢ ∈ R and all mᵢ ∈ M.
Example 18.4.3. 1. R × R = R2 , as an R-module, is free with basis {(1, 0), (0, 1)}.
2. More generally, let I ≠ 0. Then ⨁i∈I Ri with Ri = R for all i ∈ I is free with basis
{ϵi : I → R : ϵi (j) = δij , i, j ∈ I}, where
δᵢⱼ = 0 if i ≠ j, and δᵢⱼ = 1 if i = j.
Theorem 18.4.4. The R-module M is free on S if and only if each m ∈ M can be written
uniquely in the form ∑ αᵢsᵢ with αᵢ ∈ R, sᵢ ∈ S. This is exactly the case where M = ⨁_{s∈S} Rs
is the direct sum of the cyclic submodules Rs, and each Rs is module isomorphic to R.
The rest of the theorem is, essentially, a rewriting of the definition. If each m ∈ M can
be written as m = ∑ αᵢsᵢ, then M = ∑_{s∈S} Rs. If x ∈ Rs′ ∩ ∑_{s∈S, s≠s′} Rs with s′ ∈ S, then
x = α′s′ = ∑_{sᵢ∈S, sᵢ≠s′} αᵢsᵢ, and 0 = α′s′ − ∑_{sᵢ∈S, sᵢ≠s′} αᵢsᵢ. Therefore, α′ = 0 and αᵢ = 0 for
all i. This gives M = ⨁_{s∈S} Rs. The cyclic modules Rs are isomorphic to R/Ann(s), and
Ann(s) = {0} in a free module. Conversely, such modules are free on S.
M ≅ Rⁿ = R ⊕ ⋯ ⊕ R (n summands).
Proof. Part (1) is clear. For (2), let M = ⟨x₁, …, x_r⟩ and S a basis of M. Each xᵢ is uniquely
representable as a linear combination of finitely many elements of S. Since the xᵢ generate M,
every m ∈ M is a linear combination of the xᵢ, and hence of those finitely many elements
of S; so we need only finitely many elements of S to generate M. Hence, S is finite.
Theorem 18.4.6. Let R be a commutative ring with identity 1, and M a free R-module.
Then any two bases of M have the same cardinality.
Proof. The ring R contains a maximal ideal m, and R/m is a field (see Theorems 2.3.2
and 2.4.2). Then M/mM is a vector space over R/m. From M ≅ ⨁s∈S Rs with basis S, we
get mM = ⨁_{s∈S} ms; hence,

M/mM ≅ ⨁_{s∈S} Rs/ms ≅ ⨁_{s∈S} R/m.
Therefore, the R/m-vector space M/mM has a basis of the cardinality of S. This gives the
result.
Let R be a commutative ring with identity 1, and M a free R-module. The cardinality
of a basis is an invariant of M, called the rank of M or dimension of M.
If rank(M) = n < ∞, then this means M ≅ Rn .
Proof. Let S be a basis of F. By the axiom of choice, there exists for each s ∈ S an element
ms ∈ M with f (ms ) = s (f is surjective). We define the map g : F → M via s → ms
linearly; that is, g(∑si ∈S αi si ) = ∑si ∈S αi msi . Since F is free, the map g is well defined.
Obviously, f ∘ g(s) = f (ms ) = s for s ∈ S; that means f ∘ g = idF , because F is free on S. For
each m ∈ M, we have also m = g ∘ f (m) + (m − g ∘ f (m)), where g ∘ f (m) = g(f (m)) ∈ g(F).
Since f ∘ g = idF , the elements of the form m − g ∘ f (m) are in the kernel of f . Therefore,
M = g(F) + ker(f ). Now let x ∈ g(F) ∩ ker(f ). Then x = g(y) for some y ∈ F and 0 = f (x) =
f ∘ g(y) = y, and hence x = 0. Therefore, the sum is direct: M = g(F) ⊕ ker(f ).
Corollary 18.4.9. Let M be an R-module and N a submodule such that M/N is free. Then
there is a submodule N ′ of M with M = N ⊕ N ′ .
Proof. Apply the above theorem for the canonical map π : M → M/N with ker(π) = N.
18.5 Modules over Principal Ideal Domains

Theorem 18.5.1. Let M be a free R-module of finite rank over the principal ideal domain R.
Then each submodule U is free of finite rank, and rank(U) ≤ rank(M).
Proof. We prove the theorem by induction on n = rank(M). The theorem certainly holds
if n = 0. Now let n ≥ 1, and assume that the theorem holds for all free R-modules of
rank < n. Let M be a free R-module of rank n with basis {x1 , . . . , xn }. Let U be a submod-
ule of M. We represent the elements of U as linear combinations of the basis elements
x1 , . . . , xn , and we consider the set of coefficients of x1 for the elements of U:
a = {β ∈ R : βx₁ + ∑_{i=2}^{n} βᵢxᵢ ∈ U for some βᵢ ∈ R}.
Certainly a is an ideal in R. Since R is a principal ideal domain, we have a = (α1 ) for some
α1 ∈ R. Let u ∈ U be an element in U, which has α1 as its first coefficient; that is
u = α₁x₁ + ∑_{i=2}^{n} αᵢxᵢ ∈ U.
If α₁ = 0, then U ⊂ ⟨x₂, …, xₙ⟩, and the claim follows from the induction hypothesis. So let α₁ ≠ 0. By the induction hypothesis, U ∩ ⟨x₂, …, xₙ⟩ is free of rank t ≤ n − 1 with basis y₁, …, y_t. Every w ∈ U has first coefficient βα₁ for some β ∈ R, so w − βu ∈ U ∩ ⟨x₂, …, xₙ⟩; hence u, y₁, …, y_t generate U, and it remains to show that they are linearly independent. Suppose that 0 = γu + ∑_{i=1}^{t} μᵢyᵢ. We write u and the yᵢ as linear combinations in the basis
elements x₁, …, xₙ of M. Only γu contributes an x₁-portion. Hence,

0 = γα₁x₁ + ∑_{i=2}^{n} μ′ᵢxᵢ.
Therefore, first γα₁x₁ = 0; that is, γ = 0, because α₁ ≠ 0 and R has no zero divisors ≠ 0;
furthermore, μ′₂ = ⋯ = μ′ₙ = 0, and hence μ₁ = ⋯ = μ_t = 0.
Let R be a principal ideal domain. Then the annihilator Ann(x) in R-modules M has
certain further properties. Let x ∈ M. By definition
Ann(x) = {α ∈ R : αx = 0} ⊲ R, an ideal in R,
hence Ann(x) = (δx ). If x = 0, then (δx ) = R. δx is called the order of x and (δx ) the
order ideal of x. δx is uniquely determined up to units in R (that is, up to elements η with
ηη′ = 1 for some η′ ∈ R). For a submodule U of M, we call Ann(U) = ⋂u∈U (δu ) = (μ), the
order ideal of U.
In an Abelian group G, considered as a ℤ-module, this order for elements corre-
sponds exactly to the order as group elements if we choose δx ≥ 0 for x ∈ G.
Theorem 18.5.2. Let R be a principal ideal domain and M be a finitely generated torsion-
free R-module. Then M is free.
Proof. Let M = ⟨x₁, …, xₙ⟩ be torsion-free and R a principal ideal domain. Each submodule
⟨xᵢ⟩ = Rxᵢ is free, because M is torsion-free. We call a subset S ⊂ {x₁, …, xₙ} free if the
submodule ⟨S⟩ is free. Since the ⟨xᵢ⟩ with xᵢ ≠ 0 are free, such nonempty free subsets exist.
Among all free subsets S ⊂ {x₁, …, xₙ}, we choose one with a maximal number of elements. We
may assume, after possible renaming, that {x₁, …, xₛ}, 1 ≤ s ≤ n, is such a maximal set.
If s = n, then the theorem holds. Now, let s < n. By the choice of s, the sets {x1 , . . . , xs , xj }
with s < j ≤ n are not free. Hence, there are αⱼ ∈ R and αᵢ ∈ R, not all 0, with

αⱼxⱼ = ∑_{i=1}^{s} αᵢxᵢ,  αⱼ ≠ 0,  s < j ≤ n.
For the product α := αs+1 ⋅ ⋅ ⋅ αn ≠ 0, we get αxj ∈ Rx1 ⊕ ⋅ ⋅ ⋅ ⊕ Rxs =: F, s < j ≤ n, because
αxi ∈ F for 1 ≤ i ≤ s. Altogether, we get αM ⊂ F. αM is a submodule of the free R-module
F of rank s. By Theorem 18.5.1, we have that αM is free. Since α ≠ 0, and M is torsion-
free, the map M → αM, x → αx, defines a module isomorphism; that is, M ≅ αM.
Therefore, M is also free.
Corollary 18.5.3. Let R be a principal ideal domain and M be a finitely generated R-mod-
ule. Then M = T(M) ⊕ F with a free submodule F ≅ M/T(M).
Proof. M/T(M) is a finitely generated, torsion-free R-module, and hence free. By Corol-
lary 18.4.9, we have M = T(M) ⊕ F, F ≅ M/T(M).
From now on, we are interested in the case where M ≠ {0} is a torsion R-module;
that is, M = T(M). Let R be a principal ideal domain and M = T(M) an R-module. Let
M ≠ {0} and finitely generated. As above, let δx be the order of x ∈ M, unique up to units
in R, and let (δx ) = {α ∈ R : αx = 0} be the order ideal of x.
Let (μ) = ⋂_{x∈M} (δₓ) be the order ideal of M. Since (μ) ⊂ (δₓ), we have δₓ | μ for
all x ∈ M. Since principal ideal domains are unique factorization domains, if μ ≠ 0,
then, up to units, only finitely many essentially different orders can occur, namely the
divisors of μ. Since M ≠ {0} is finitely generated, we have in any case μ ≠ 0: if
M = ⟨x₁, …, xₙ⟩ and αᵢxᵢ = 0 with αᵢ ≠ 0, then αM = {0} for α := α₁ ⋯ αₙ ≠ 0.
Lemma 18.5.4. Let R be a principal ideal domain and M ≠ {0} be an R-module with M =
T(M).
(1) If the orders δx and δy of x, y ∈ M are relatively prime; that is, gcd(δx , δy ) = 1, then
(δx+y ) = (δx δy ).
(2) Let δz be the order of z ∈ M, z ≠ 0. If δz = αβ with gcd(α, β) = 1, then there exist
x, y ∈ M with z = x + y and (δx ) = (α), (δy ) = (β).
Since gcd(α, β) = 1, there are ρ, σ ∈ R with ρα + σβ = 1. Then

z = 1 ⋅ z = ραz + σβz = y + x = x + y,  where y := ραz and x := σβz.
Since αx = ασβz = σδz z = 0, we get α ∈ (δz ); that means, δx |α. On the other hand, from
0 = δx x = σβδx z, we get δz |σβδx , and hence αβ|σβδx , because δz = αβ. Therefore, α|σδx .
From gcd(α, σ) = 1, we get α|δx . Therefore, α is associated to δx ; that is α = δx ϵ with ϵ a
unit in R, and furthermore, (α) = (δx ). Analogously, (β) = (δy ).
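Part (1) of Lemma 18.5.4 can be illustrated in the ℤ-module ℤ/36ℤ (our choice of example): the element 9 has order 4, the element 4 has order 9, and their sum has order 36:

```python
from math import gcd

def order(a, m):
    # additive order of a in Z/mZ (equals m // gcd(a, m))
    k, s = 1, a % m
    while s != 0:
        s = (s + a) % m
        k += 1
    return k

m = 36
x, y = 9, 4                           # orders 4 and 9, which are relatively prime
assert order(x, m) == 4 and order(y, m) == 9 and gcd(4, 9) == 1
assert order((x + y) % m, m) == 4 * 9  # order of x + y is the product of the orders
```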
Corollary 18.5.5. Let R be a principal ideal domain and M ≠ {0} be an R-module with
M = T(M).
1. Let x₁, …, xₙ ∈ M be pairwise different with pairwise relatively prime orders δxᵢ = αᵢ.
Then y = x₁ + ⋯ + xₙ has order α := α₁ ⋯ αₙ.
2. Let 0 ≠ x ∈ M and let δₓ = επ₁^{k₁} ⋯ πₙ^{kₙ} be a prime decomposition of the order δₓ of x (ε a
unit in R and the πᵢ pairwise nonassociate prime elements in R), where n > 0, kᵢ > 0.
Then there exist xᵢ, i = 1, …, n, with δxᵢ associated with πᵢ^{kᵢ} and x = x₁ + ⋯ + xₙ.

This is Exercise 7.
18.6 The Fundamental Theorem for Finitely Generated Modules

Theorem 18.6.1 (Theorem 10.4.1, basis theorem for finite Abelian groups). Let G be a finite Abelian group. Then G is a direct product of cyclic groups of prime power order.
This allowed us, for a given finite order n, to present a complete classification of
Abelian groups of order n. In this section, we extend this result to general modules
over principal ideal domains. As a consequence, we obtain the fundamental decom-
position theorem for finitely generated (not necessarily finite) Abelian groups, which
finally proves Theorem 10.4.1. In the next chapter, we present a separate proof of this in
a slightly different format.
Definition 18.6.2. Let π ∈ R be a prime element. The π-primary component of M is
Mπ := {x ∈ M : π^k x = 0 for some k ≥ 0}; M is called π-primary if M = Mπ.

Theorem 18.6.3. Let R be a principal ideal domain and M ≠ {0} be an R-module with
M = T(M). Then M is the direct sum of its π-primary components.
Proof. Each x ∈ M has finite order δₓ. Let δₓ = επ₁^{k₁} ⋯ πₙ^{kₙ} be a prime decomposition of δₓ. By
Corollary 18.5.5, we have that x = ∑ xᵢ with xᵢ ∈ Mπᵢ. That means M = ∑_{π∈P} Mπ, where P
is the set of the prime elements of R. Let y ∈ Mπ ∩ ∑_{σ∈P, σ≠π} Mσ; that is, δy = π^k for some
k ≥ 0 and y = ∑ xᵢ with xᵢ ∈ Mσᵢ, σᵢ ≠ π; that means δxᵢ = σᵢ^{lᵢ} for some lᵢ ≥ 0. By Corollary 18.5.5,
y has the order ∏_{σᵢ≠π} σᵢ^{lᵢ}; that means π^k is associated to ∏_{σᵢ≠π} σᵢ^{lᵢ}. Therefore,
k = lᵢ = 0 for all i, and the sum is direct.
Corollary 18.6.4. Let R be a principal ideal domain and {0} ≠ M be a finitely gener-
ated torsion R-module. Then M has only finitely many nontrivial primary components
Mπ₁, …, Mπₙ, and we have

M = ⨁_{i=1}^{n} Mπᵢ.
Theorem 18.6.5. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0}
an R-module with π^k M = {0}; furthermore, let m ∈ M with (δm) = (π^k). Then there exists a
submodule N ⊂ M with M = Rm ⊕ N.
Proof. By Zorn’s lemma, the set {U : U submodule of M and U ∩Rm = {0}} has a maximal
element N. This set is nonempty, because it contains {0}. We consider M ′ := N ⊕Rm ⊂ M,
and have to show that M′ = M. Assume that M′ ≠ M. Then there exists an x ∈ M with
x ∉ M ′ , especially x ∉ N. Then N is properly contained in the submodule Rx+N = ⟨x, N⟩.
By our choice of N, we get A := (Rx + N) ∩ Rm ≠ {0}. If z ∈ A, z ≠ 0, then z = ρm = αx + n
with ρ, α ∈ R and n ∈ N. Since z ≠ 0, we have ρm ≠ 0; also αx ≠ 0, because otherwise
z ∈ Rm ∩ N = {0}; α is not a unit in R, because otherwise x = α⁻¹(ρm − n) ∈ M′. Hence
we have: If x ∈ M, x ∉ M ′ , then there exist α ∈ R, α ≠ 0, α not a unit in R, ρ ∈ R with
ρm ≠ 0, and n ∈ N such that
αx = ρm + n. (⋆)
In particular, αx ∈ M ′ .
Now let α = ϵπ1 ⋅ ⋅ ⋅ πr be a prime decomposition. We consider one after the other
the elements x, πr x, πr−1 πr x, . . . , ϵπ1 ⋅ ⋅ ⋅ πr x = αx. We have x ∉ M ′ , but αx ∈ M ′ ; hence,
there exists a y ∉ M′ with πᵢy ∈ N + Rm for some i.
1. If πᵢ is not associated to π (π the prime element in the statement of the theorem), then
gcd(πᵢ, π^k) = 1; hence, there are σ, σ′ ∈ R with σπᵢ + σ′π^k = 1, and, since π^k m = 0, we get

Rm = (Rπᵢ + Rπ^k)m = πᵢRm.

2. If πᵢ is associated to π, write πᵢy = n₀ + ρm with n₀ ∈ N and ρ ∈ R. Multiplying by π^{k−1} and using π^k M = {0} gives π^{k−1}ρm = −π^{k−1}n₀ ∈ Rm ∩ N = {0}; hence π^k divides π^{k−1}ρ, so π | ρ and ρm ∈ πᵢRm.

Hence, in any case, we have πᵢy ∈ N + πᵢRm; that is, πᵢy = n + πᵢz with n ∈ N and z ∈ Rm.
It follows that πi (y − z) = n ∈ N.
πᵢ(y − z) = n. (⋆⋆)

Since y ∉ M′ and z ∈ Rm ⊂ M′, we have y − z ∉ M′. By the maximality of N, we get
(R(y − z) + N) ∩ Rm ≠ {0}, say 0 ≠ ρ′m = μ(y − z) + n′ with μ ∈ R and n′ ∈ N. If πᵢ | μ,
then μ(y − z) ∈ N by (⋆⋆), and ρ′m ∈ N ∩ Rm = {0}, a contradiction. Hence, gcd(μ, πᵢ) = 1,
and with aμ + bπᵢ = 1 we get y − z = aμ(y − z) + bπᵢ(y − z) = a(ρ′m − n′) + bn ∈ M′, again
a contradiction. Therefore, M′ = M.
Theorem 18.6.6. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0}
a finitely generated π-primary R-module. Then there exist finitely many m1 , . . . , ms ∈ M
with M = ⨁si=1 Rmi .
Since Rmᵢ ≅ R/Ann(mᵢ), and Ann(mᵢ) = (δmᵢ) = (π^{kᵢ}), we get the following extension
of Theorem 18.6.6:
Theorem 18.6.7. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0} a
finitely generated π-primary R-module. Then there exist finitely many k₁, …, kₛ ∈ ℕ with

M ≅ ⨁_{i=1}^{s} R/(π^{kᵢ}),  k₁ ≥ k₂ ≥ ⋯ ≥ kₛ,

and the kᵢ are uniquely determined.
Proof. The first part, that is, a description M ≅ ⨁_{i=1}^{s} R/(π^{kᵢ}), follows directly from
Theorem 18.6.6. Now, let

M ≅ ⨁_{i=1}^{n} R/(π^{kᵢ}) ≅ ⨁_{i=1}^{m} R/(π^{lᵢ})

with k₁ ≥ ⋯ ≥ kₙ and l₁ ≥ ⋯ ≥ lₘ.
Setting N := M/πM, which is a vector space over the field R/(π), we get

n = dim_{R/(π)} N = m. (⋆⋆⋆)
Assume that there is an i with ki < li or li < ki . Without loss of generality, assume that
there is an i with ki < li .
Let j be the smallest index for which kⱼ < lⱼ. Then, because of the ordering of the kᵢ,

M′ := π^{kⱼ} M ≅ ⨁_{i=1}^{n} π^{kⱼ}R/π^{kᵢ}R ≅ ⨁_{i=1}^{j−1} π^{kⱼ}R/π^{kᵢ}R,
Theorem 18.6.8 (Fundamental theorem for finitely generated modules over principal ideal domains). Let R be a principal ideal domain and M ≠ {0} be a finitely generated (unitary) R-module. Then there exist prime elements π₁, …, π_r ∈ R, 0 ≤ r < ∞, and numbers k₁, …, k_r ∈ ℕ, t ∈ ℕ₀, such that

M ≅ R/(π₁^{k₁}) ⊕ R/(π₂^{k₂}) ⊕ ⋯ ⊕ R/(π_r^{k_r}) ⊕ R ⊕ ⋯ ⊕ R (t summands),

and M is, up to isomorphism, uniquely determined by (π₁^{k₁}, …, π_r^{k_r}, t).
The prime elements πᵢ are not necessarily pairwise different (up to units in R); that is,
we may have πᵢ = επⱼ for i ≠ j, where ε is a unit in R.
Proof. The proof is a combination of the preceding results. The free part of M is isomor-
phic to M/T(M), and the rank of M/T(M), which we call here t, is uniquely determined,
because two bases of M/T(M) have the same cardinality. Therefore, we may restrict ourselves to torsion modules. Here, we have a reduction to π-primary modules, because in
a decomposition M = ⨁ᵢ R/(πᵢ^{kᵢ}), the π-primary component of M is Mπ = ⨁_{πᵢ=π} R/(πᵢ^{kᵢ})
(an isomorphism certainly maps a π-primary component onto a π-primary component).
Therefore, it only remains to consider π-primary modules M. The uniqueness statement
now follows from Theorem 18.6.7.
Theorem 18.6.9 (Fundamental theorem for finitely generated Abelian groups). Let {0} ≠
G = (G, +) be a finitely generated Abelian group. Then there exist prime numbers p1 , . . . , pr ,
0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 such that
G ≅ ℤ/(p₁^{k₁}ℤ) ⊕ ⋯ ⊕ ℤ/(p_r^{k_r}ℤ) ⊕ ℤ ⊕ ⋯ ⊕ ℤ (t summands),

and G is, up to isomorphism, uniquely determined by (p₁^{k₁}, …, p_r^{k_r}, t).
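For R = ℤ and a cyclic group ℤ/nℤ, the primary decomposition of the torsion part is given by the Chinese remainder theorem; a short sketch (the function name is ours):

```python
def primary_decomposition(n):
    # Z/nZ ≅ ⊕ Z/(p^k)Z over the prime powers p^k dividing n exactly (CRT)
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors

assert primary_decomposition(60) == [4, 3, 5]
assert primary_decomposition(360) == [8, 9, 5]
```

For instance, the cyclic group of order 60 decomposes as ℤ₄ ⊕ ℤ₃ ⊕ ℤ₅, the shape used when classifying Abelian groups of a given order.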
18.7 Exercises
1. Let M and N be isomorphic modules over a commutative ring R. Then EndR (M) and
EndR(N) are isomorphic rings. (EndR(M) is the set of all R-module endomorphisms
of M.)
2. Let R be an integral domain and M an R-module with M = Tor(M) (torsion module).
Show that HomR (M, R) = 0. (HomR (M, R) is the set of all R-module homomorphisms
from M to R.)
3. Prove the isomorphism theorems for modules (1), (2), and (3) in Theorem 18.1.11.
4. Let M, M ′ , N be R-modules, R a commutative ring. Show the following:
(i) HomR (M ⊕ M ′ , N) ≅ HomR (M, N) × HomR (M ′ , N).
(ii) HomR (N, M × M ′ ) ≅ HomR (N, M) ⊕ HomR (N, M ′ ).
5. Show that two free R-modules having bases of equal cardinality are isomorphic.
6. Let M be a unitary R-module (R a commutative ring), and let {m₁, …, mₛ} be a finite
subset of M. Show that the following are equivalent:
(i) {m1 , . . . , ms } generates M freely.
(ii) {m1 , . . . , ms } is linearly independent and generates M.
(iii) Every element m ∈ M is uniquely expressible in the form m = ∑_{i=1}^{s} ri mi with ri ∈ R.
(iv) Each Rmi is torsion-free, and M = Rm1 ⊕ ⋅ ⋅ ⋅ ⊕ Rms .
7. Let R be a principal ideal domain and M ≠ {0} be an R-module with M = T(M).
(i) Let x1 , . . . , xn ∈ M be pairwise different with pairwise relatively prime orders δxi = αi . Then y = x1 + ⋅ ⋅ ⋅ + xn has order α := α1 ⋅ ⋅ ⋅ αn .
(ii) Let 0 ≠ x ∈ M and δx = ϵπ1^{k1} ⋅ ⋅ ⋅ πn^{kn} be a prime decomposition of the order δx of x (ϵ a unit in R and the πi pairwise nonassociate prime elements in R), where n > 0, ki > 0. Then there exist xi , i = 1, . . . , n, with δxi associated with πi^{ki} and x = x1 + ⋅ ⋅ ⋅ + xn .
19 Finitely Generated Abelian Groups
19.1 Finite Abelian Groups
In Chapter 10, we stated the theorem below, which completely describes the structure
of finite Abelian groups. As we saw in Chapter 18, this result is a special case of a general
result on modules over principal ideal domains.
Theorem 19.1.1 (Theorem 10.4.1, basis theorem for finite Abelian groups). Let G be a finite
Abelian group. Then G is a direct product of cyclic groups of prime power order.
We review two examples that show how this theorem leads to the classification of
finite Abelian groups. In particular, this theorem allows us, for a given finite order n, to
present a complete classification of Abelian groups of order n.
Since all cyclic groups of order n are isomorphic to (ℤn , +), ℤn = ℤ/nℤ, we will
denote a cyclic group of order n by ℤn .
Example 19.1.2. Classify all Abelian groups of order 60. Let G be an Abelian group of
order 60. From Theorem 10.4.1, G must be a direct product of cyclic groups of prime
power order. Now 60 = 22 ⋅ 3 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order either 2, 4,
3, or 5 (by Lagrange’s theorem they must be divisors of 60). Therefore, G must be of the
form
G ≅ ℤ4 × ℤ3 × ℤ5 ,
or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ5 .
Hence, up to isomorphism, there are only two Abelian groups of order 60.
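The two-step recipe in this example — factor n, then choose one partition of each prime's exponent — can be automated. The following Python sketch (our own illustration, not from the text; all function names are ours) lists the Abelian groups of a given order as tuples of cyclic orders:

```python
from itertools import product

def factorize(n):
    """Return the prime factorization of n as a dict {p: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def partitions(m, max_part=None):
    """Yield the partitions of m as non-increasing tuples."""
    if max_part is None:
        max_part = m
    if m == 0:
        yield ()
        return
    for first in range(min(m, max_part), 0, -1):
        for rest in partitions(m - first, first):
            yield (first,) + rest

def abelian_groups(n):
    """All Abelian groups of order n, each as a tuple of cyclic orders p**m_i."""
    per_prime = []
    for p, e in factorize(n).items():
        per_prime.append([tuple(p ** m for m in part) for part in partitions(e)])
    return [sum(choice, ()) for choice in product(*per_prime)]

for g in abelian_groups(60):
    print(" x ".join(f"Z_{k}" for k in g))
```

For n = 60 this prints `Z_4 x Z_3 x Z_5` and `Z_2 x Z_2 x Z_3 x Z_5`, matching the two groups found above.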
Example 19.1.3. Classify all Abelian groups of order 180. Let G be an Abelian group of
order 180. Now 180 = 22 ⋅ 32 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order either 2, 4,
3, 9, or 5 (by Lagrange’s theorem they must be divisors of 180). Therefore, G must be of
the form
G ≅ ℤ4 × ℤ9 × ℤ5 ,
G ≅ ℤ2 × ℤ2 × ℤ9 × ℤ5 ,
G ≅ ℤ4 × ℤ3 × ℤ3 × ℤ5 , or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ3 × ℤ5 .
Hence, up to isomorphism, there are four Abelian groups of order 180.
https://doi.org/10.1515/9783111142524-019
19.2 The Fundamental Theorem: p-Primary Components
The proof of Theorem 19.1.1 involves the lemmas that follow. We refer back to Chap-
ter 10 or Chapter 18 for the proofs. Notice how these lemmas mirror the results for
finitely generated modules over principal ideal domains considered in the last chap-
ter.
Lemma 19.1.4. Let G be a finite Abelian group, and let p | |G|, where p is a prime. Then all the elements of G whose orders are a power of p form a normal subgroup of G. This subgroup is called the p-primary component of G, which we will denote by Gp .
Lemma 19.1.5. Let G be a finite Abelian group of order n. Suppose that n = p1^{e1} ⋅ ⋅ ⋅ pk^{ek} with p1 , . . . , pk distinct primes. Then

G ≅ Gp1 × ⋅ ⋅ ⋅ × Gpk .
Theorem 19.1.6 (Basis theorem for finite Abelian groups). Let G be a finite Abelian group.
Then G is a direct product of cyclic groups of prime power order.
Theorem 19.2.1 (Fundamental theorem for finitely generated modules over principal ideal
domains). Let R be a principal ideal domain and M ≠ {0} be a finitely generated (uni-
tary) R-module. Then there exist prime elements π1 , . . . , πr ∈ R, 0 ≤ r < ∞ and numbers
k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 , such that
M ≅ R/(π1^{k1}) ⊕ R/(π2^{k2}) ⊕ ⋅ ⋅ ⋅ ⊕ R/(πr^{kr}) ⊕ R ⊕ ⋅ ⋅ ⋅ ⊕ R (t times),

and M is, up to isomorphism, uniquely determined by (π1^{k1}, . . . , πr^{kr}, t).
The prime elements πi are not necessarily pairwise different (up to units in R); that is, it can happen that πi = ϵπj for i ≠ j, where ϵ is a unit in R.
Since Abelian groups can be considered as ℤ-modules, and ℤ is a principal ideal
domain, we get the following corollary, which is extremely important in its own right.
Theorem 19.2.2 (Fundamental theorem for finitely generated Abelian groups). Suppose
{0} ≠ G = (G, +) is a finitely generated Abelian group. Then there exist prime numbers
p1 , . . . , pr , 0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 , such that
G ≅ ℤ/(p1^{k1} ℤ) ⊕ ⋅ ⋅ ⋅ ⊕ ℤ/(pr^{kr} ℤ) ⊕ ℤ ⊕ ⋅ ⋅ ⋅ ⊕ ℤ (t times),

and G is, up to isomorphism, uniquely determined by (p1^{k1}, . . . , pr^{kr}, t).
Notice that the number t of infinite components is unique. This is called the rank or
Betti number of the Abelian group G. This number plays an important role in the study
of homology and cohomology groups in topology.
If G = ℤ × ℤ × ⋅ ⋅ ⋅ × ℤ = ℤr for some r, we call G a free Abelian group of rank r.
Notice that if an Abelian group G is torsion-free, then the p-primary components are just
the identity. It follows that, in this case, G is a free Abelian group of finite rank. Again,
using module theory, it follows that subgroups of such a group must also be free Abelian
and of smaller or equal rank. Notice the distinction between free Abelian groups and absolutely
free groups (see Chapter 14). In the free group case, a non-Abelian free group of finite
rank contains free subgroups of all possible countable ranks. In the free Abelian case,
however, the subgroups have smaller or equal rank. We summarize these comments as
follows:
Theorem 19.2.3. Let G ≠ {0} be a finitely generated torsion-free Abelian group. Then G is
a free Abelian group of finite rank r; that is, G ≅ ℤr . Furthermore, if H is a subgroup of G,
then H is also free Abelian and the rank of H is smaller than or equal to the rank of G.
+ : G × G → G, (x, y) → x + y.
We also write ng instead of g^n , and use 0 as the symbol for the identity element in G;
that is, 0 + g = g for all g ∈ G. That G = ⟨g1 , . . . , gt ⟩, 0 ≤ t < ∞, that is, that G is (finitely)
generated by g1 , . . . , gt , is equivalent to the fact that each g ∈ G can be written in the
form g = n1 g1 + n2 g2 + ⋅ ⋅ ⋅ + nt gt , ni ∈ ℤ. A relation between the gi with coefficients
n1 , . . . , nt is an equation of the form n1 g1 + ⋅ ⋅ ⋅ + nt gt = 0. A relation is called
nontrivial if ni ≠ 0 for at least one i. A system R of relations in G is called a system of
defining relations if each relation in G is a consequence of R. The elements g1 , . . . , gt are
called integrally linearly independent if there are no nontrivial relations between them.
19.3 The Fundamental Theorem: Elementary Divisors
Ui ∩ ( ∏_{j=1, j≠i}^{s} Uj ) = {0}.
To emphasize the small difference between Abelian groups and ℤ-modules, here we use the notation "direct product" instead of "direct sum". Considered as ℤ-modules, for finite index sets I = {1, . . . , s}, we have in any case

∏_{i=1}^{s} Ui = ⨁_{i=1}^{s} Ui .
Theorem 19.3.1 (Basis theorem for finitely generated Abelian groups). Let G ≠ {0} be a
finitely generated Abelian group. Then G is a direct product
G ≅ Zk1 × ⋅ ⋅ ⋅ × Zkr × U1 × ⋅ ⋅ ⋅ × Us ,
Lemma 19.3.3. Let G be a finitely generated Abelian group. Among all nontrivial relations
between elements of minimal generating systems of G, we choose one relation,
m1 g1 + ⋅ ⋅ ⋅ + mt gt = 0 (⋆)
with smallest possible positive coefficient, and let this smallest coefficient be m1 . Let
n1 g1 + ⋅ ⋅ ⋅ + nt gt = 0 (⋆⋆)
Proof. For (1), assume m1 ∤ n1 . Then n1 = qm1 + m1′ with 0 < m1′ < m1 . If we multiply the relation (⋆) by q and subtract the resulting relation from the relation (⋆⋆), then we get a relation with a coefficient m1′ < m1 , contradicting the choice of m1 . Hence, m1 | n1 .
For (2), assume m1 ∤ m2 . Then m2 = qm1 + m2′ with 0 < m2′ < m1 . {g1 + qg2 , g2 , . . . , gt } is a minimal generating system, which satisfies the relation

m1 (g1 + qg2 ) + m2′ g2 + m3 g3 + ⋅ ⋅ ⋅ + mt gt = 0,

and this relation has a coefficient m2′ < m1 . This again contradicts the choice of m1 . Hence, m1 | m2 , and furthermore, m1 | mi for i = 1, . . . , t.
Lemma 19.3.4 (Invariant characterization of kr for finite Abelian groups G). Consider the
group G = Zk1 × ⋅ ⋅ ⋅ × Zkr with Zki finite cyclic of order ki ≥ 2, i = 1, . . . , r and ki |ki+1 for
i = 1, . . . , r − 1. Then kr is the smallest natural number n such that ng = 0 for all g ∈ G. kr
is called the exponent or the maximal order of G.
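Lemma 19.3.4 can be checked by brute force on small examples. The sketch below (our own code, not from the text) searches for the smallest n with ng = 0 for all g in Zk1 × ⋅ ⋅ ⋅ × Zkr and confirms that, when ki | ki+1, it equals the last invariant factor kr:

```python
from itertools import product

def exponent(orders):
    """Smallest n >= 1 with n*g = 0 for every g in Z_k1 x ... x Z_kr (brute force)."""
    n = 1
    while True:
        if all(all((n * gi) % ki == 0 for gi, ki in zip(g, orders))
               for g in product(*(range(k) for k in orders))):
            return n
        n += 1

# With k_i | k_{i+1}, the exponent is the last invariant factor k_r:
print(exponent([2, 4, 8]))   # 8
```

In general the exponent is lcm(k1, . . . , kr), which reduces to kr exactly under the divisibility chain ki | ki+1.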
n1 g1 ∈ U1 ∩ (U2 × ⋅ ⋅ ⋅ × Us ) = {0}.
∑_{i=1}^{s+1} xi ( ∑_{j=1}^{s} nij bj ) = ∑_{j=1}^{s} ( ∑_{i=1}^{s+1} nij xi ) bj = 0.
The system ∑_{i=1}^{s+1} nij xi = 0, j = 1, . . . , s, of linear equations has at least one nontrivial rational solution (x1 , . . . , xs+1 ), because we have more unknowns than equations. Multiplication with the common denominator gives a nontrivial integral solution (x1 , . . . , xs+1 ) ∈ ℤ^{s+1} . For this solution, we get

∑_{i=1}^{s+1} xi gi = 0.
Case 2: mij aj arbitrary. Let k ≠ 0 be a common multiple of the orders kj of the cyclic
groups Zkj , j = 1, . . . , r. Then
kgi = (mi1 ka1 + ⋅ ⋅ ⋅ + mir kar ) + ni1 kb1 + ⋅ ⋅ ⋅ + nis kbs = ni1 kb1 + ⋅ ⋅ ⋅ + nis kbs , since ka1 = ⋅ ⋅ ⋅ = kar = 0,
for i = 1, . . . , s + 1. By case 1, the kg1 , . . . , kgs+1 are integrally linearly dependent; that is, we have integers x1 , . . . , xs+1 , not all 0, with ∑_{i=1}^{s+1} xi (kgi ) = 0 = ∑_{i=1}^{s+1} (xi k)gi , and the xi k are not all 0. Hence, also g1 , . . . , gs+1 are integrally linearly dependent.
Lemma 19.3.6. Let G := Zk1 × ⋅ ⋅ ⋅ × Zkr ≅ Zk1′ × ⋅ ⋅ ⋅ × Zkr′ ′ =: G′ , with the Zki , Zkj′ cyclic groups of orders ki ≠ 1 and kj′ ≠ 1, respectively, and ki | ki+1 for i = 1, . . . , r − 1 and kj′ | kj+1′ for j = 1, . . . , r ′ − 1. Then r = r ′ , and k1 = k1′ , k2 = k2′ , . . . , kr = kr′ .
Proof. We prove this lemma by induction on the group order |G| = |G′ |. Certainly, Lemma 19.3.6 holds if |G| ≤ 2, because then, either G = {0}, and here r = r ′ = 0, or G ≅ ℤ2 , and here r = r ′ = 1. Now let |G| > 2. Then, in particular, r ≥ 1. Inductively, we assume that Lemma 19.3.6 holds for all finite Abelian groups of order less than |G|. By Lemma 19.3.4, the number kr is invariantly characterized; that is, from G ≅ G′ it follows that kr = kr′ ′ , and especially Zkr ≅ Zkr′ ′ . Then G/Zkr ≅ G′ /Zkr′ ′ ; that is,
We can now present the main result, which we state again, and its proof.
Theorem 19.3.7 (Basis theorem for finitely generated Abelian groups). Let G ≠ {0} be a
finitely generated Abelian group. Then G is a direct product
G ≅ Zk1 × ⋅ ⋅ ⋅ × Zkr × U1 × ⋅ ⋅ ⋅ × Us , r ≥ 0, s ≥ 0,
Proof. We first prove the existence of the given decomposition. Let G ≠ {0} be a finitely
generated Abelian group. Let t, 0 < t < ∞, be the number of elements in a minimal
generating system of G. We have to show that G is decomposable as a direct product of
t cyclic groups with the given description. We prove this by induction on t. If t = 1, then
the basis theorem is correct. Now let t ≥ 2, and assume that the assertion holds for all
Abelian groups with fewer than t generators.
Case 1: There does not exist a minimal generating system of G, which satisfies a
nontrivial relation. Let {g1 , . . . , gt } be an arbitrary minimal generating system for G. Let
Ui = ⟨gi ⟩. Then all Ui are infinite cyclic, and we have G = U1 × ⋅ ⋅ ⋅ × Ut , because if, for
instance, U1 ∩ (U2 + ⋅ ⋅ ⋅ + Ut ) ≠ {0}, then we must have a nontrivial relation between the
g1 , . . . , gt .
Case 2: There exist minimal generating systems of G, which satisfy nontrivial rela-
tions. Among all nontrivial relations between elements of minimal generating systems
of G, we choose one relation,
m1 g1 + ⋅ ⋅ ⋅ + mt gt = 0 (⋆)
with smallest possible positive coefficient. Without loss of generality, let m1 be this coef-
ficient. By Lemma 19.3.3, we get m2 = q2 m1 , . . . , mt = qt m1 . Now,
{g1 + ∑_{i=2}^{t} qi gi , g2 , . . . , gt }
m1 h1 + k2 h2 = 0, with m1 h1 = 0 and k2 h2 = 0,
since k2 ≠ 0. Again m1 |k2 by Lemma 19.3.3. This gives the desired decomposition.
G = Zk1 × ⋅ ⋅ ⋅ × Zkr × U1 × ⋅ ⋅ ⋅ × Us ,
Theorem 19.3.8. Let {0} ≠ G = (G, +) be a finitely generated Abelian group. Then there
exist prime numbers p1 , . . . , pr , 0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 such that
G ≅ ℤ/(p1^{k1} ℤ) ⊕ ⋅ ⋅ ⋅ ⊕ ℤ/(pr^{kr} ℤ) ⊕ ℤ ⊕ ⋅ ⋅ ⋅ ⊕ ℤ (t times),

and G is, up to isomorphism, uniquely determined by (p1^{k1}, . . . , pr^{kr}, t).
Proof. For the existence, we only have to show that ℤmn ≅ ℤm × ℤn if gcd(m, n) = 1.
For this, we write Un = ⟨m + mnℤ⟩ < ℤmn , Um = ⟨n + nmℤ⟩ < ℤmn , and Un ∩ Um =
{mnℤ}, because gcd(m, n) = 1. Furthermore, there are h, k ∈ ℤ with 1 = hm + kn. Hence,
l + mnℤ = hlm + mnℤ + kln + mnℤ, and therefore ℤmn = Un × Um ≅ ℤn × ℤm .
For the uniqueness statement, we may reduce the problem to the case |G| = pk for a
prime number p and k ∈ ℕ. But here the result follows directly from Lemma 19.3.6.
From this proof, we automatically get the Chinese remainder theorem for the case
ℤn = ℤ/nℤ.
Proof. By Theorem 19.3.1, we get that π is an additive group isomorphism, which can be
extended directly to a ring isomorphism via
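The isomorphism ℤmn ≅ ℤm × ℤn behind the Chinese remainder theorem can be made concrete: the map l + mnℤ ↦ (l + mℤ, l + nℤ) is a bijection exactly when gcd(m, n) = 1. A small Python check (our own sketch, not from the text):

```python
def crt_map(m, n):
    """The reduction map Z_mn -> Z_m x Z_n, l + mnZ |-> (l + mZ, l + nZ)."""
    return {l: (l % m, l % n) for l in range(m * n)}

m, n = 4, 9   # gcd(4, 9) = 1
image = crt_map(m, n)
# m*n inputs hit m*n distinct pairs, so the map is a bijection:
print(len(set(image.values())) == m * n)   # True

# For m, n not coprime the map is not surjective:
print(len(set(crt_map(4, 6).values())))    # 12, not 24
```

In the non-coprime case only the pairs (a, b) with a ≡ b (mod gcd(m, n)) occur, which is why ℤ4 × ℤ6 is not cyclic.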
Let A(n) be the number of nonisomorphic finite Abelian groups that have order n = p1^{k1} ⋅ ⋅ ⋅ pr^{kr} , r ≥ 1, with pairwise different primes p1 , . . . , pr and k1 , . . . , kr ∈ ℕ. By Theorem 19.2.2, we have A(n) = A(p1^{k1}) ⋅ ⋅ ⋅ A(pr^{kr}). Hence, to calculate A(n), we have to calculate A(p^m ) for a prime number p and a natural number m ∈ ℕ. Again, by Theorem 19.2.2, we get G ≅ ℤ_{p^{m1}} × ⋅ ⋅ ⋅ × ℤ_{p^{mk}} , all mi ≥ 1, if G is Abelian of order p^m . If we compare the orders, we get m = m1 + ⋅ ⋅ ⋅ + mk . We may order the mi by size. A k-tuple (m1 , . . . , mk ) with 0 < m1 ≤ m2 ≤ ⋅ ⋅ ⋅ ≤ mk and m1 + m2 + ⋅ ⋅ ⋅ + mk = m is called a partition of m. From above, each Abelian group of order p^m gives a partition (m1 , . . . , mk ) of m for some k with 1 ≤ k ≤ m. On the other hand, each partition (m1 , . . . , mk ) of m gives an Abelian group of order p^m , namely ℤ_{p^{m1}} × ⋅ ⋅ ⋅ × ℤ_{p^{mk}} . Theorem 19.2.2 shows that different partitions give nonisomorphic groups. If we define p(m) to be the number of partitions of m, then we get the following: A(p^m ) = p(m), and A(p1^{k1} ⋅ ⋅ ⋅ pr^{kr}) = p(k1 ) ⋅ ⋅ ⋅ p(kr ).
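The formula A(n) = p(k1) ⋅ ⋅ ⋅ p(kr) is easy to evaluate, since p(m) can be computed by a standard dynamic program. The sketch below (our own code and function names, not from the text) reproduces A(60) = 2 and A(180) = 4:

```python
def num_partitions(m):
    """Count partitions of m: dynamic programming over the largest allowed part."""
    table = [1] + [0] * m   # table[s] = partitions of s using parts added so far
    for part in range(1, m + 1):
        for s in range(part, m + 1):
            table[s] += table[s - part]
    return table[m]

def factorize(n):
    """Prime factorization of n as a dict {p: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def A(n):
    """Number of Abelian groups of order n up to isomorphism: A(n) = prod p(k_i)."""
    result = 1
    for k in factorize(n).values():
        result *= num_partitions(k)
    return result

print(A(60), A(180), A(1024))   # 2 4 42
```

For example, A(1024) = A(2^10) = p(10) = 42: there are 42 Abelian groups of order 1024.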
19.4 Exercises
1. Let H be a finitely generated Abelian group, which is the homomorphic image of a torsion-free Abelian group of finite rank n. Show that H is the direct sum of at most n cyclic groups.
2. Determine (up to isomorphism) all groups of order p2 (p prime) and all Abelian
groups of order ≤ 15.
3. Let G be an Abelian group with generating elements a1 , . . . , a4 and defining relations
5. Let p be a prime and G a finite Abelian p-group; that is, the order of all elements of
G is finite and a power of p. Show that G is cyclic, if G has exactly one subgroup of
order p. Is the statement still correct if G is not Abelian?
20 Integral and Transcendental Extensions
20.1 The Ring of Algebraic Integers
Recall that a complex number α is an algebraic number if it is algebraic over the rational
numbers ℚ. That is, α is a zero of a polynomial p(x) ∈ ℚ[x]. If α ∈ ℂ is not algebraic,
then it is a transcendental number.
We will let 𝒜 denote the totality of algebraic numbers within the complex num-
bers ℂ, and 𝒯 the set of transcendentals, so that ℂ = 𝒜 ∪ 𝒯 . The set 𝒜 is the algebraic
closure of ℚ within ℂ.
The set 𝒜 of algebraic numbers forms a subfield of ℂ (see Chapter 5), and the subset
𝒜′ = 𝒜 ∩ ℝ of real algebraic numbers forms a subfield of ℝ. The field 𝒜 is an algebraic
extension of the rationals ℚ. However, the degree is infinite.
Since each rational is algebraic, it is clear that there are algebraic numbers. Fur-
thermore, there are irrational algebraic numbers, √2 for example, since it is a zero of
the irreducible polynomial x 2 − 2 over ℚ. In Chapter 5, we proved that there are un-
countably infinitely many transcendental numbers (Theorem 5.5.3). However, it is very
difficult to prove that any particular real or complex number is actually transcendental.
In Theorem 5.5.4, we showed that the real number
c = ∑_{j=1}^{∞} 1/10^{j!}
is transcendental.
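What makes c transcendental is that its partial sums are rationals p/q, q = 10^{j!}, that approximate it to within 1/q^j — far better than is possible for algebraic irrationals. Exact rational arithmetic makes this visible; the sketch below is our own illustration (we use a longer partial sum as a stand-in for c itself, which only strengthens the inequality being tested):

```python
from fractions import Fraction
from math import factorial

def liouville_partial(j):
    """The j-th partial sum of c = sum_{i>=1} 10^(-i!), as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(i)) for i in range(1, j + 1))

# The partial sum p/q with q = 10^(j!) approximates c to within 1/q^j:
for j in range(2, 5):
    q = 10 ** factorial(j)
    tail = liouville_partial(j + 2) - liouville_partial(j)  # proxy for c - p/q
    print(tail < Fraction(1, q ** j))   # True
```

The denominators grow so fast (q = 10^2, 10^6, 10^24, …) that the approximation quality beats every fixed power of 1/q, which is exactly the behavior Liouville-type arguments exploit.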
In this section, we examine a special type of algebraic number called an algebraic
integer. These are the algebraic numbers that are zeros of monic integral polynomials.
The set of all such algebraic integers forms a subring of ℂ. The proofs in this section can
be found in [53].
After we do this, we extend the concept of an algebraic integer to a general con-
text and define integral ring extensions. We then consider field extensions that are
nonalgebraic—transcendental field extensions. Finally, we will prove that the familiar
numbers e and π are transcendental.
Definition 20.1.1. An algebraic integer is a complex number α that is a zero of a monic integral polynomial. That is, α ∈ ℂ is an algebraic integer if there exists f (x) ∈ ℤ[x] with
f (x) = x^n + bn−1 x^{n−1} + ⋅ ⋅ ⋅ + b0 , bi ∈ ℤ, n ≥ 1, and f (α) = 0.
https://doi.org/10.1515/9783111142524-020
To prove the converse of this lemma, we need the concept of a primitive integral
polynomial. This is a polynomial p(x) ∈ ℤ[x] such that the GCD of all its coefficients is 1.
The following can be proved (see exercises or Chapter 4):
(1) If f (x) and g(x) are primitive, then so is f (x)g(x).
(2) If f (x) ∈ ℤ[x] is monic, then it is primitive.
(3) If f (x) ∈ ℚ[x], then there exists a rational number c such that f (x) = cf1 (x) with
f1 (x) primitive.
Now suppose f (x) ∈ ℤ[x] is a monic polynomial with f (α) = 0. Let p(x) = mα (x). Then
p(x) divides f (x) so f (x) = p(x)q(x).
Let p(x) = c1 p1 (x) with p1 (x) primitive, and let q(x) = c2 q1 (x) with q1 (x) primitive.
Then
Lemma 20.1.4. If α is an algebraic integer and also rational, then it is a rational inte-
ger.
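Lemma 20.1.4 has a computational counterpart: a rational zero of a monic polynomial in ℤ[x] must be an integer, and (by looking at the constant term) an integer dividing it, so all rational roots can be enumerated. The following sketch is our own code; the function name is ours:

```python
def rational_roots_monic(coeffs):
    """All rational roots of the monic integer polynomial
    x^n + coeffs[1] x^(n-1) + ... + coeffs[n].

    By Lemma 20.1.4 (the monic case of the rational root theorem), every
    rational root is an integer dividing the constant term.
    """
    assert coeffs[0] == 1, "polynomial must be monic"
    c0 = coeffs[-1]
    if c0 == 0:
        return [0]  # simplification: 0 is a root; factor x out for the rest
    candidates = {d for d in range(1, abs(c0) + 1) if c0 % d == 0}
    candidates |= {-d for d in candidates}

    def value(x):
        v = 0               # Horner evaluation with exact integers
        for c in coeffs:
            v = v * x + c
        return v

    return sorted(x for x in candidates if value(x) == 0)

# x^2 - 2 is monic with no integer roots, so sqrt(2) is irrational:
print(rational_roots_monic([1, 0, -2]))   # []
print(rational_roots_monic([1, -3, 2]))   # [1, 2]
```

The first call certifies the irrationality of √2 exactly as in the discussion above: any rational root of x² − 2 would be an integer dividing 2, and none of ±1, ±2 works.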
We saw that the set 𝒜 of all algebraic numbers is a subfield of ℂ. In the same manner,
the set ℐ of all algebraic integers forms a subring of 𝒜. First, an extension of the following
result on algebraic numbers.
Lemma 20.1.6. Suppose α1 , . . . , αn form the set of conjugates over ℚ of an algebraic inte-
ger α. Then any integral symmetric function of α1 , . . . , αn is a rational integer.
We note that 𝒜, the field of algebraic numbers, is precisely the quotient field of the
ring of algebraic integers.
An algebraic number field is a finite extension of ℚ within ℂ. Since any finite exten-
sion of ℚ is a simple extension, each algebraic number field has the form K = ℚ(θ) for
some algebraic number θ.
Let K = ℚ(θ) be an algebraic number field, and let RK = K ∩ ℐ . Then RK forms a
subring of K called the algebraic integers, or integers of K. An analysis of the proof of
Theorem 20.1.5 shows that each β ∈ K can be written as
β = α/r
with α ∈ RK and r ∈ ℤ.
These rings of algebraic integers share many properties with the rational integers.
Whereas there may not be unique factorization into primes, there is always prime fac-
torization.
Theorem 20.1.8. Let K be an algebraic number field and RK its ring of integers. Then each
α ∈ RK is either 0, a unit, or can be factored into a product of primes.
We stress again that the prime factorization need not be unique. However, from the
existence of a prime factorization, we can extend Euclid’s original proof of the infinitude
of primes (see [53]) to obtain the following:
Corollary 20.1.9. There exist infinitely many primes in RK for any algebraic number
ring RK .
Just as any algebraic number field is finite-dimensional over ℚ, we will see that each
RK is of finite degree over ℚ. That is, if K has degree n over ℚ, we show that there exist
ω1 , . . . , ωn in RK such that each α ∈ RK is expressible as
α = m1 ω1 + ⋅ ⋅ ⋅ + mn ωn ,
where m1 , . . . , mn ∈ ℤ.
α = m1 ω1 + ⋅ ⋅ ⋅ + mt ωt ,
where m1 , . . . , mt ∈ ℤ.
The finite degree comes from the following result that shows there does exist an
integral basis (see [53]):
Theorem 20.1.11. Let RK be the ring of integers in the algebraic number field K of degree
n over ℚ. Then there exists at least one integral basis for RK .
Example 20.2.3. 1. Let E|K be a field extension. a ∈ E is integral over K if and only if a is algebraic over K. If K is the quotient field of an integral domain R, and a ∈ E is algebraic over K, then there exists an α ∈ R with αa integral over R, because if 0 = αn a^n + ⋅ ⋅ ⋅ + α0 , then 0 = (αn a)^n + ⋅ ⋅ ⋅ + αn^{n−1} α0 .
2. The elements of ℂ, which are integral over ℤ are precisely the algebraic integers
over ℤ, that is, the zeros of monic polynomials over ℤ.
∑_{k=1}^{n} (αkj − δjk a) bk = 0 (⋆⋆)

for j = 1, . . . , n, where δjk = 0 if j ≠ k, and δjk = 1 if j = k.
Define γjk := αkj − δjk a and C = (γjk )j,k . C is an (n × n)-matrix over the commutative ring R[a]. Recall that R[a] has an identity element. Let C̃ = (γ̃jk )j,k be the complementary matrix of C (see for instance [9]). Then C̃C = CC̃ = (det C)En . From (⋆⋆), we get

0 = ∑_{j=1}^{n} γ̃ij ( ∑_{k=1}^{n} γjk bk ) = ∑_{k=1}^{n} ( ∑_{j=1}^{n} γ̃ij γjk ) bk = ∑_{k=1}^{n} (det C) δik bk = (det C) bi
for all 1 ≤ i ≤ n. Since b1 = 1, we have necessarily that det C = det(αjk − δjk a)j,k = 0
(recall that δjk = δkj ). Hence, a is a zero of the monic polynomial f (x) = det(δjk x − αjk )
in R[x] of degree n ≥ 1. Therefore, a is integral over R.
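The determinant trick is constructive: a is a root of the characteristic polynomial det(δjk x − αjk), which is monic with coefficients in R. As a concrete instance (our own illustration, not from the text), multiplication by a = 1 + √2 on ℤ[√2] with basis {1, √2} has an integer matrix, and a satisfies its characteristic polynomial:

```python
import math

# Multiplication by a = 1 + sqrt(2) on Z[sqrt(2)], basis {1, sqrt(2)}:
#   a * 1       = 1*1 + 1*sqrt(2)
#   a * sqrt(2) = 2*1 + 1*sqrt(2)
# so the matrix of multiplication by a is:
M = [[1, 2], [1, 1]]

# Characteristic polynomial of a 2x2 matrix [[p, q], [r, s]]:
#   x^2 - (p + s) x + (p s - q r)  -- monic with integer coefficients.
trace = M[0][0] + M[1][1]                       # 2
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]     # -1

a = 1 + math.sqrt(2)
value = a * a - trace * a + det                 # a^2 - 2a - 1
print(abs(value) < 1e-12)   # True: a is integral over Z
```

So 1 + √2 is a zero of the monic polynomial x² − 2x − 1 ∈ ℤ[x], exactly as Theorem 20.2.4 predicts.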
Definition 20.2.5. A ring extension A|R is called an integral extension if each element of A is integral over R. A ring extension A|R is called finite if A, as an R-module, is finitely generated.
Recall that finite field extensions are algebraic extensions. As an immediate conse-
quence of Theorem 20.2.4, we get the corresponding result for ring extensions.
Proof. (1) implies (2): We have R[a] = {g(a) : g ∈ R[x]}. Let f (a) = 0 be an integral
equation of a over R. Since f is monic, by the division algorithm, for each g ∈ R[x], there
Theorem 20.2.8. Let A|R and B|A be finite ring extensions. Then also B|R is finite.
Proof. From A = Re1 + ⋅ ⋅ ⋅ + Rem , and B = Af1 + ⋅ ⋅ ⋅ + Afn , we get B = Re1 f1 + ⋅ ⋅ ⋅ + Rem fn .
Theorem 20.2.9. Let A|R be a ring extension. Then the following are equivalent:
(1) There are finitely many elements a1 , . . . , am in A, each integral over R, such that
A = R[a1 , . . . , am ].
C = {a ∈ A : a is integral over R} ⊂ A
Theorem 20.2.11. Let A|R be a ring extension. Then the integral closure of R in A is a subring of A containing R.
Definition 20.2.12. Let A|R be a ring extension. R is called integrally closed in A if R itself is its integral closure in A; that is, R = C, the integral closure of R in A.
Theorem 20.2.13. For each ring extension A|R, the integral closure C of R in A, is inte-
grally closed in A.
Theorem 20.2.14. Let A|R and B|A be ring extensions. If A|R and B|A are integral exten-
sions, then also B|R is an integral extension (and certainly vice versa).
Theorem 20.2.17. Let R be an integral domain and K its quotient field. Let E|K be a finite field extension. Let R be integrally closed (in K), and let α ∈ E be integral over R. Then the minimal polynomial g ∈ K[x] of α over K has all its coefficients in R.
Proof. Let g ∈ K[x] be the minimal polynomial of α over K (recall that g is monic by
definition). Let Ē be an algebraic closure of E. Then g(x) = (x − α1 ) ⋅ ⋅ ⋅ (x − αn ) with α1 = α
over Ē. There are K-isomorphisms σi : K(α) → Ē with σi (α) = αi . Hence, all αi are also
integral over R. Since all coefficients of g are polynomial expressions Cj (α1 , . . . , αn ) in
the αi , we get that all coefficients of g are integral over R (see Theorem 20.2.11). Now
g ∈ R[x], because g ∈ K[x], and R is integrally closed.
Theorem 20.2.18. Let R be an integrally closed integral domain and K its quotient field.
Let f , g, h ∈ K[x] be monic polynomials over K with f = gh.
If f ∈ R[x], then also g, h ∈ R[x].
Theorem 20.2.19. Let E|R be an integral ring extension. If E is a field, then also R is a field.
Proof. Let α ∈ R \ {0}. The element 1/α ∈ E satisfies an integral equation

(1/α)^n + an−1 (1/α)^{n−1} + ⋅ ⋅ ⋅ + a0 = 0

with coefficients ai ∈ R. Multiplication by α^{n−1} gives

1/α = −an−1 − an−2 α − ⋅ ⋅ ⋅ − a0 α^{n−1} ∈ R.
Hence, R is a field.
(4) M is algebraically dependent if and only if there is a finite subset in M, which is alge-
braically dependent.
(5) M is algebraically independent if and only if each finite subset of M is algebraically
independent.
(6) M is algebraically independent if and only if the following holds: If α1 , . . . , αn are
finitely many, pairwise different elements of M, then the canonical homomorphism
ϕ : K[x1 , . . . , xn ] → E, f (x1 , . . . , xn ) → f (α1 , . . . , αn ) is injective; or in other words,
for all f ∈ K[x1 , . . . , xn ], we have that f = 0 if f (α1 , . . . , αn ) = 0. That is, there is no
nontrivial algebraic relation between the α1 , . . . , αn over K.
(7) Let M ⊂ E, α ∈ E. If M is algebraically independent and M ∪ {α} algebraically depen-
dent, then α ∈ H(M); that is, α is algebraically dependent on M.
(8) Let M ⊂ E, B ⊂ M. If B is maximal algebraically independent, that is, if α ∈ M \ B,
then B ∪ {α} is algebraically dependent, thus M ⊂ H(B). That is, each element of M is
algebraic over K(B).
We will show that any field extension can be decomposed into a transcendental
extension over an algebraic extension. We need the idea of a transcendence basis.
Definition 20.3.3. B ⊂ E is called a transcendence basis of the field extension E|K if the
following two conditions are satisfied:
1. E = H(B), that is, the extension E|K(B) is algebraic.
2. B is algebraically independent over K.
Proof. (1) implies (2): Let α ∈ M \ B. We have to show that B ∪ {α} is algebraically depen-
dent. But this is clear, because α ∈ H(B) = E.
(2) implies (3): We just take M = E.
(3) implies (1): We have to show that H(B) = E. Certainly, M ⊂ H(B).
Hence, E = H(M) ⊂ H(H(B)) = H(B) ⊂ E.
We next show that any field extension does have a transcendence basis:
Theorem 20.3.5. Each field extension E|K has a transcendence basis. More concretely, if
there is a subset M ⊂ E such that E|K(M) is algebraic and if there is a subset C ⊂ M,
which is algebraically independent, then there exists a transcendence basis B of E|K with
C ⊂ B ⊂ M.
Theorem 20.3.6. Let E|K be a field extension and M a subset of E, for which E|K(M) is
algebraic. Let C be an arbitrary subset of E, which is algebraically independent over K. Then
there exists a subset M ′ ⊂ M with C ∩ M ′ = 0 such that C ∪ M ′ is a transcendence basis
of E|K.
Theorem 20.3.7. Let B, B′ be two transcendence bases of the field extension E|K. Then
there is a bijection ϕ : B → B′ . In other words, any two transcendence bases of E|K have
the same cardinal number.
Proof. (a) If B is a transcendence basis of E|K and M is a subset of E such that E|K(M)
is algebraic, then we may write B = ⋃α∈M Bα with finite sets Bα . In particular, if B is
infinite, then the cardinal number of B is not bigger than the cardinal number of M.
(b) Let B and B′ be two transcendence bases of E|K. If B and B′ are both infinite,
then B and B′ have the same cardinal number by (a) and the theorem by Schroeder–
Bernstein [10]. We now prove Theorem 20.3.7 for the case that E|K has a finite transcen-
dence basis. Let B be finite with n elements. Let C be an arbitrary algebraically inde-
pendent subset in E over K with m elements. We show that m ≤ n. Let C = {α1 , . . . , αm }
with m ≥ n. We show, by induction, that for each integer k, 0 ≤ k ≤ n, there are subsets
B ⫌ B1 ⫌ ⋅ ⋅ ⋅ ⫌ Bk of B such that {α1 , . . . , αk } ∪ Bk is a transcendence basis of E|K, and
{α1 , . . . , αk } ∩ Bk = 0. For k = 0, we take B0 = B, and the statement holds. Assume now
that the statement is correct for 0 ≤ k < n. By Theorems 20.3.4 and 20.3.5, there is a
subset Bk+1 of {α1 , . . . , αk } ∪ Bk such that {α1 , . . . , αk+1 } ∪ Bk+1 is a transcendence basis of
E|K, and {α1 , . . . , αk+1 } ∩ Bk+1 = 0. Then necessarily, Bk+1 ⊂ Bk . Assume Bk = Bk+1 . Then, on the one hand, Bk ∪ {α1 , . . . , αk+1 } is algebraically independent because Bk = Bk+1 . On the other hand, Bk ∪ {α1 , . . . , αk } ∪ {αk+1 } is algebraically dependent, which gives a contradiction. Hence, Bk+1 ⫋ Bk . Now Bk has at most n − k elements. Therefore, Bn = 0; that
is, {α1 , . . . , αn } = {α1 , . . . , αn }∪Bn is a transcendence basis of E|K. Because C = {α1 , . . . , αm }
is algebraically independent, we cannot have m > n. Thus, m ≤ n, and B and B′ have the
same number of elements, because B′ must also be finite.
Since the cardinality of any transcendence basis for a field extension E|K is the
same, we can define the transcendence degree.
Definition 20.3.8. The transcendence degree trgd(E|K) of a field extension is the cardi-
nal number of one (and hence of each) transcendence basis of E|K. A field extension E|K
is called purely transcendental, if E|K has a transcendence basis B with E = K(B).
Theorem 20.3.9. Let E|K be a field extension and F an arbitrary intermediate field, that is, K ⊂ F ⊂ E. Let B be a transcendence basis of F|K and B′ a transcendence basis of E|F.
Then B ∩ B′ = 0, and B ∪ B′ is a transcendence basis of E|K.
In particular, trgd(E|K) = trgd(E|F) + trgd(F|K).
Proof. (1) Assume α ∈ B ∩ B′ . As an element of F, α is algebraic over F(B′ \ {α}). But this gives a contradiction, because α ∈ B′ , and B′ is algebraically independent over F.
(2) F|K(B) is an algebraic extension, and also F(B′ )|K(B ∪ B′ ) = K(B)(B′ ). Since the
relation “algebraic extension” is transitive, we have that E|K(B ∪ B′ ) is algebraic.
(3) Finally, we have to show that B ∪ B′ is algebraically independent over K. By The-
orems 20.3.5 and 20.3.6, there is a subset B′′ of B ∪ B′ with B ∩ B′′ = 0 such that B ∪ B′′ is
a transcendence basis of E|K. We have B′′ ⊂ B′ , and have to show that B′ ⊂ B′′ . Assume
that there is an α ∈ B′ with α ∉ B′′ . Then α is algebraic over K(B ∪ B′′ ) = K(B)(B′′ ), and
hence algebraic over F(B′′ ). Since B′′ ⊂ B′ , we have that α is algebraically independent
over F, which gives a contradiction. Hence, B′′ = B′ .
a ∈ A \ K[u1 , . . . , um ]
of degree n ≥ 1 with
Proof. Without loss of generality, let the a1 , . . . , an be pairwise different. We prove the
theorem by induction on n. If n = 1, then there is nothing to show. Now, let n ≥ 2,
and assume that the statement holds for n − 1. If there is no nontrivial algebraic re-
lation f (a1 , . . . , an ) = 0 over K between the a1 , . . . , an , then there is nothing to show.
Hence, let there exist a polynomial f ∈ K[x1 , . . . , xn ] with f ≠ 0 and f (a1 , . . . , an ) = 0. Let f = ∑_{ν=(ν1 ,...,νn )} cν x1^{ν1} ⋅ ⋅ ⋅ xn^{νn} . Let μ2 , μ3 , . . . , μn be natural numbers, which we specify later. Define b2 = a2 − a1^{μ2} , b3 = a3 − a1^{μ3} , . . . , bn = an − a1^{μn} . Then ai = bi + a1^{μi} for 2 ≤ i ≤ n; hence, f (a1 , b2 + a1^{μ2} , . . . , bn + a1^{μn} ) = 0. We write R := K[x1 , . . . , xn ] and consider the polynomial ring R[y2 , . . . , yn ] of the n − 1 independent indeterminates y2 , . . . , yn over R. In R[y2 , . . . , yn ], we consider the polynomial f (x1 , y2 + x1^{μ2} , . . . , yn + x1^{μn} ). We may rewrite this polynomial as

∑_{ν=(ν1 ,...,νn )} cν x1^{ν1 + μ2 ν2 + ⋅⋅⋅ + μn νn} + g(x1 , y2 , . . . , yn )
Therefore, R is a field, which proves the claim. This is possible only for m = 0, and then
E|K is integral; here, that is algebraic.
By (∫_0^{z1})_γ , we mean the integral from 0 to z1 along γ.
Let |f |(x) be the polynomial we get if we replace the coefficients of f (x) by their absolute
values. Since |e^{z1 −z}| ≤ e^{|z1 −z|} ≤ e^{|z1 |} , we get
(2) |I(z1 )| ≤ |z1 | e^{|z1 |} |f |(|z1 |).
For a detailed proof of these facts see for instance [52]. We consider now the polynomial f (x) = x^{p−1} (x − 1)^p ⋅ ⋅ ⋅ (x − n)^p with p a sufficiently large prime number, and we consider I(z1 ) with respect to this polynomial. Let J = q0 I(0) + q1 I(1) + ⋅ ⋅ ⋅ + qn I(n).
From (1) and (3), we get that

J = − ∑_{j=0}^{m} ∑_{k=0}^{n} qk f^{(j)}(k),

where m = (n + 1)p − 1 denotes the degree of f.
Now, f^{(j)}(k) = 0 if j < p and k > 0, and also if j < p − 1 and k = 0. Hence, f^{(j)}(k) is an integer that is divisible by p! for all j, k, except for j = p − 1, k = 0. Furthermore, f^{(p−1)}(0) = (p − 1)!(−1)^{np}(n!)^p. Hence, if p > n, then f^{(p−1)}(0) is an integer divisible by (p − 1)!, but not by p!. It follows that J is a nonzero integer that is divisible by (p − 1)! if p > |q0| and p > n. So let p > n, p > |q0|, so that |J| ≥ (p − 1)!. Now, |f|(k) ≤ (2n)^m. Together with (2), we then get that |J| ≤ |q1| e |f|(1) + ⋯ + |qn| n e^n |f|(n) ≤ c^p for a number c independent of p. It follows that
(p − 1)! ≤ |J| ≤ c^p;

that is,

1 ≤ |J|/(p − 1)! ≤ c ⋅ c^{p−1}/(p − 1)!.

This gives a contradiction, since c^{p−1}/(p − 1)! → 0 as p → ∞. Therefore, e is transcendental.
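How fast this quotient dies off can be checked numerically; a small sketch (Python; the value c = 10 is an arbitrary stand-in for the constant above):

```python
from math import factorial

# For any fixed c, c**(p-1)/(p-1)! tends to 0 as p grows
c = 10
vals = [c**(p - 1) / factorial(p - 1) for p in (5, 20, 50)]

# strictly decreasing along these sample points, and eventually tiny
assert vals[0] > vals[1] > vals[2]
assert vals[2] < 1e-9
```

The factorial in the denominator eventually outgrows any fixed exponential, which is the whole point of letting the prime p become large.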
Proof.

a_n^{n−1} f(x) = a_n^n x^n + a_n^{n−1} a_{n−1} x^{n−1} + ⋯ + a_n^{n−1} a_0
= (a_n x)^n + a_{n−1}(a_n x)^{n−1} + ⋯ + a_n^{n−1} a_0
= g(a_n x) = g(y) ∈ ℤ[y],
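The computation can be verified for a concrete polynomial; a sketch (sympy) with the arbitrary choice f(x) = 3x² + 2x + 5, for which g(y) = y² + 2y + 15:

```python
from sympy import symbols, simplify, solve

x, y = symbols('x y')
f = 3*x**2 + 2*x + 5      # suppose f(alpha) = 0, leading coefficient a_n = 3
g = y**2 + 2*y + 15       # g(y) = a_n**(n-1) * f(y/a_n) is monic in Z[y]

for alpha in solve(f, x):
    # a_n * alpha is a root of the monic integer polynomial g,
    # so 3*alpha is an algebraic integer
    assert simplify(g.subs(y, 3*alpha)) == 0
```

Multiplying a root of f by the leading coefficient thus always produces a root of a monic polynomial with integer coefficients.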
Proof. Assume that π is an algebraic number. Then θ = iπ is also algebraic. Consider the
conjugates θ1 = θ, θ2, …, θd of θ. Since e^{θ1} = e^{iπ} = −1, we have

(1 + e^{θ1})(1 + e^{θ2}) ⋯ (1 + e^{θd}) = 0.

The product on the left side can be written as a sum of 2^d terms e^ϕ, where

ϕ = ϵ1 θ1 + ⋯ + ϵd θd

with ϵi ∈ {0, 1}.
Let α1, …, αn be those of the exponents ϕ that are nonzero. Then

q + e^{α1} + ⋯ + e^{αn} = 0

with q = 2^d − n > 0. Recall that all tαi are algebraic integers, where t is the leading coefficient of the minimal polynomial of θ, and we consider the polynomial

f(x) = t^{np} x^{p−1} (x − α1)^p ⋯ (x − αn)^p
with p a sufficiently large prime number. We have f(x) ∈ ℚ[x], since the αi are algebraic numbers, and the elementary symmetric polynomials in α1, …, αn are rational numbers.
Let I(z1) be defined as in the proof of Theorem 20.4.1, and now let

J = I(α1) + ⋯ + I(αn).

As in the proof of Theorem 20.4.1, we get

J = −q ∑_{j=0}^{m} f^{(j)}(0) − ∑_{j=0}^{m} ∑_{k=1}^{n} f^{(j)}(αk)

with m = (n + 1)p − 1.
Now, ∑_{k=1}^{n} f^{(j)}(αk) is a symmetric polynomial in tα1, …, tαn with integer coefficients, since the tαi are algebraic integers. It follows from the main theorem on symmetric polynomials that ∑_{j=0}^{m} ∑_{k=1}^{n} f^{(j)}(αk) is an integer. Furthermore, f^{(j)}(αk) = 0 for j < p. Hence, ∑_{j=0}^{m} ∑_{k=1}^{n} f^{(j)}(αk) is an integer divisible by p!. Now, f^{(j)}(0) is an integer divisible by p! if j ≠ p − 1, and f^{(p−1)}(0) = (p − 1)!(−1)^{np}(tα1 ⋯ tαn)^p is divisible by (p − 1)!, but not by p! for p sufficiently large. Hence, for p sufficiently large, J is a nonzero integer with |J| ≥ (p − 1)!. On the other hand, by the estimate (2), |J| ≤ |α1| e^{|α1|} |f|(|α1|) + ⋯ + |αn| e^{|αn|} |f|(|αn|) ≤ c^p for a number c independent of p. It follows that
(p − 1)! ≤ |J| ≤ c^p;

that is,

1 ≤ |J|/(p − 1)! ≤ c ⋅ c^{p−1}/(p − 1)!.

This, as before, gives a contradiction, since c^{p−1}/(p − 1)! → 0 as p → ∞. Therefore, π is transcendental.
20.5 Exercises
1. A polynomial p(x) ∈ ℤ[x] is primitive if the GCD of all its coefficients is 1. Prove the
following:
(i) If f (x) and g(x) are primitive, then so is f (x)g(x).
(ii) If f (x) ∈ ℤ[x] is monic, then it is primitive.
(iii) If f (x) ∈ ℚ[x], then there exists a rational number c such that f (x) = cf1 (x) with
f1 (x) ∈ ℤ[x] primitive.
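For experimenting with parts (i)–(iii), sympy's Poly.primitive splits off the content (the GCD of the coefficients) of an integer polynomial; a brief sketch with an arbitrary example polynomial:

```python
from sympy import symbols, Poly

x = symbols('x')
f = Poly(6*x**2 + 4*x + 2, x)    # gcd of coefficients is 2, so f is not primitive
c, f1 = f.primitive()            # f = c * f1 with f1 primitive

assert c == 2
assert f1 == Poly(3*x**2 + 2*x + 1, x)
```

Writing f = c·f1 with f1 primitive is exactly the decomposition asked for in (iii), restricted here to integer coefficients.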
2. Let d be a square-free integer and K = ℚ(√d) be a quadratic field. Let RK be the
subring of K of the algebraic integers of K. Show the following:
(i) RK = {m + n√d : m, n ∈ ℤ} if d ≡ 2 (mod 4) or d ≡ 3 (mod 4). {1, √d} is an
integral basis for RK .
(ii) RK = {m + n(1 + √d)/2 : m, n ∈ ℤ} if d ≡ 1 (mod 4). {1, (1 + √d)/2} is an integral basis for RK .
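Both cases can be probed with sympy's minimal_polynomial (the sample values d = 5 and d = 3 are arbitrary choices):

```python
from sympy import sqrt, symbols, minimal_polynomial

x = symbols('x')

# d = 5 ≡ 1 (mod 4): (1 + √5)/2 satisfies a monic integer polynomial,
# so it is an algebraic integer
assert minimal_polynomial((1 + sqrt(5))/2, x) == x**2 - x - 1

# d = 3 ≡ 3 (mod 4): here the integral basis is {1, √3}
assert minimal_polynomial(sqrt(3), x) == x**2 - 3
```

The monic minimal polynomial x² − x − 1 is what makes the golden ratio (1 + √5)/2 an algebraic integer, even though it is not of the form m + n√5.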
(ii) trgd(E|K) < ∞ and [E : K(B)] < ∞ for each transcendence basis B of E|K.
(iii) There is a finite transcendence basis B of E|K with [E : K(B)] < ∞.
(iv) There are finitely many x1 , . . . , xn ∈ E with E = K(x1 , . . . , xn ).
7. Let E|K be a field extension. If E|K is purely transcendental, then K is algebraically
closed in E.
21 The Hilbert Basis Theorem and the Nullstellensatz
21.1 Algebraic Geometry
An extremely important application of abstract algebra and an application central to
all of mathematics is the subject of algebraic geometry. As the name suggests, this is
the branch of mathematics that uses the techniques of abstract algebra to study geo-
metric problems. Classically, algebraic geometry involved the study of algebraic curves,
which roughly are the sets of zeros of a polynomial or set of polynomials in several vari-
ables over a field. For example, in two variables a real algebraic plane curve is the set
of zeros in ℝ2 of a polynomial p(x, y) ∈ ℝ[x, y]. The common planar curves, such as
parabolas and the other conic sections, are all plane algebraic curves. In actual prac-
tice, plane algebraic curves are usually considered over the complex numbers and are
projectivized.
The algebraic theory that deals most directly with algebraic geometry is called com-
mutative algebra. This is the study of commutative rings, ideals in commutative rings,
and modules over commutative rings. A large portion of this book has dealt with com-
mutative algebra.
Although we will not consider the geometric aspects of algebraic geometry in gen-
eral, we will close the book by introducing some of the basic algebraic ideas that are
crucial to the subject. These include the concept of an algebraic variety or algebraic set
and its radical. We also state and prove two of the cornerstones of the theory as applied
to commutative algebra—the Hilbert basis theorem and the nullstellensatz.
In this chapter, we also often consider a fixed field extension C|K and the polynomial
ring K[x1 , . . . , xn ] of the n independent indeterminates x1 , . . . , xn . Again, in this chapter,
we often use letters a, b, m, p, P, A, Q, . . . for ideals in rings.
For a subset M ⊂ K[x1, …, xn], we consider its zero set in C^n:

𝒩(M) = {(α1, …, αn) ∈ C^n : f(α1, …, αn) = 0 ∀ f ∈ M}.
For any subset N of C^n, we can reverse the procedure and consider the set I(N) of all polynomials f ∈ K[x1, …, xn] with f(α1, …, αn) = 0 for all (α1, …, αn) ∈ N.
Instead of f ∈ I(N), we also say that f vanishes on N (over K). If we want to mention K,
then we write I(N) = IK (N).
What is important is that the set I(N) forms an ideal. The proof is straightforward.
Theorem 21.2.3. For any subset N ⊂ C^n, the set I(N) is an ideal in K[x1, …, xn]; it is called the vanishing ideal of N ⊂ C^n in K[x1, …, xn].
The following result examines the relationship between subsets of C^n and their vanishing ideals.
Proof. The proofs are straightforward. Hence, we prove only (7), (8), and (9). The rest can be left as an exercise for the reader.
Proof of (7): Since ab ⊂ a ∩ b ⊂ a, b, we have, by (1), the inclusions 𝒩(a) ∪ 𝒩(b) ⊂ 𝒩(a ∩ b) ⊂ 𝒩(ab) for ideals a, b ⊲ K[x1, …, xn].
If a ⊲ K[x1, …, xn], then we do not have a = I𝒩(a) in general. That is, a is, in general, not equal to the vanishing ideal of its zero set in C^n. The reason for this is that not every ideal a occurs as the vanishing ideal of some N ⊂ C^n. If a = I(N), then we must have that f^m ∈ a for some m ≥ 1 implies f ∈ a.
Hence, for instance, if a = (x1^2, …, xn^2) ⊲ K[x1, …, xn], then a is not of the form a = I(N) for some N ⊂ C^n. We now define the radical of an ideal:
√a = {f ∈ R : f^m ∈ a for some m ∈ ℕ}
We note that √0 is called the nil radical of R; it contains exactly the nilpotent elements of R, that is, the elements a ∈ R with a^m = 0 for some m ∈ ℕ.
Let a ⊲ R be an ideal in R and π : R → R/a the canonical mapping. Then √a is exactly the preimage of the nil radical of R/a.
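For a concrete ring, the nil radical can be computed by brute force; a quick sketch for ℤ/8ℤ (an arbitrary choice):

```python
# Nil radical of Z/8Z: the elements a with a**m ≡ 0 (mod 8) for some m.
# Since 8 = 2**3, exponents up to 3 suffice.
nil = [a for a in range(8) if any(pow(a, m, 8) == 0 for m in range(1, 4))]
assert nil == [0, 2, 4, 6]
```

Here √0 is the ideal generated by 2, matching the fact that the nilpotents of ℤ/8ℤ are exactly the classes of even numbers.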
Theorem 21.3.2. Let R be a noetherian ring. Then the polynomial ring R[x] over R is also
noetherian.
Proof. For 0 ≠ f ∈ R[x], we denote the degree of f by deg(f). Let a ⊲ R[x] be an ideal in R[x]. Assume that a is not finitely generated. Then, in particular, a ≠ {0}. We construct a sequence of polynomials fk ∈ a such that their highest coefficients ak generate an ideal in R that is not finitely generated. This then produces a contradiction; hence, a is in fact finitely generated. Choose f1 ∈ a, f1 ≠ 0, so that deg(f1) = n1 is minimal.
Theorem 21.3.3 (Hilbert basis theorem). Let K be a field. Then any ideal a ⊲ K[x1 , . . . , xn ]
is finitely generated; that is, a = (f1 , . . . , fm ) for finitely many f1 , . . . , fm ∈ K[x1 , . . . , xn ].
Corollary 21.3.4. If C|K is a field extension, then each algebraic K-set V of C^n is already the zero set of finitely many polynomials f1, …, fm ∈ K[x1, …, xn].
An ideal a ⊲ K[x1, …, xn] is called reduced if

f^m ∈ a, m ≥ 1 ⇒ f ∈ a.
Theorem 21.4.1 (Hilbert’s nullstellensatz, first form). Let C|K be a field extension with C
algebraically closed. If a ⊲ K[x1 , . . . , xn ], then I 𝒩 (a) = √a. Moreover, if a is reduced, that
is, a = √a, then I 𝒩 (a) = a. Therefore, 𝒩 defines a bijective map between the set of reduced
ideals in K[x1, …, xn] and the set of algebraic K-sets in C^n, and I defines the inverse
map.
Theorem 21.4.2 (Hilbert’s nullstellensatz, second form). Let C|K be a field extension with
C algebraically closed. Let a ⊲ K[x1, …, xn] with a ≠ K[x1, …, xn]. Then there exists an α = (α1, …, αn) ∈ C^n with f(α) = 0 for all f ∈ a; that is, 𝒩C(a) ≠ ∅.
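For a concrete proper ideal, such a common zero can be found explicitly with sympy; the ideal (x + y − 2, xy − 1) below is an illustrative choice:

```python
from sympy import symbols, solve

x, y = symbols('x y')

# The proper ideal a = (x + y - 2, x*y - 1) in Q[x, y]
# has a common zero in C**2, as the second form predicts
sols = solve([x + y - 2, x*y - 1], [x, y], dict=True)
assert {x: 1, y: 1} in sols
```

Here the common zero even lies in ℚ²; in general, the theorem only guarantees a zero in C^n for C algebraically closed.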
Proof of Theorem 21.4.1. Let a ⊲ K[x1, …, xn], and let f ∈ I𝒩(a). We have to show that f^m ∈ a for some m ∈ ℕ. If f = 0, then there is nothing to show.
Now, let f ≠ 0. We consider K[x1 , . . . , xn ] as a subring of K[x1 , . . . , xn , xn+1 ] of the n + 1
independent indeterminates x1 , . . . , xn , xn+1 . In K[x1 , . . . , xn , xn+1 ], we consider the ideal
ā = (a, 1 − xn+1 f ) ⊲ K[x1 , . . . , xn , xn+1 ], generated by a and 1 − xn+1 f .
Case 1: ā ≠ K[x1, …, xn, xn+1]. Then ā has a zero (β1, …, βn, βn+1) in C^{n+1} by Theorem 21.4.2. Hence, for (β1, …, βn, βn+1) ∈ 𝒩(ā), we have the equations:
(1) g(β1 , . . . , βn ) = 0 for all g ∈ a, and
(2) f (β1 , . . . , βn )βn+1 = 1.
From (1), we get (β1 , . . . , βn ) ∈ 𝒩 (a). In particular, f (β1 , . . . , βn ) = 0 for our f ∈ I 𝒩 (a).
But this contradicts (2). Therefore, Case 1 is not possible. Thus, we have Case 2: ā = K[x1, …, xn, xn+1]; that is, 1 ∈ ā. Then there exists a relation of the form
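The membership test 1 ∈ ā that drives this proof is effective: Gröbner bases decide it. A sketch (sympy), with the illustrative ideal a = (x²) and the test polynomials f = x and f = y:

```python
from sympy import symbols, groebner

x, y, t = symbols('x y t')

# Radical membership via the trick above: f ∈ √a iff 1 ∈ (a, 1 - t*f)
# in the polynomial ring with one extra indeterminate t.
gb = groebner([x**2, 1 - t*x], x, y, t, order='lex')
assert list(gb.exprs) == [1]      # the ideal is the whole ring, so x ∈ √(x**2)

gb2 = groebner([x**2, 1 - t*y], x, y, t, order='lex')
assert list(gb2.exprs) != [1]     # proper ideal, so y ∉ √(x**2)
```

A Gröbner basis of the unit ideal reduces to [1], so the auxiliary ideal (a, 1 − tf) detects radical membership exactly as in the proof.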
V1 ⊃ V2 ⊃ ⋅ ⋅ ⋅ ⊃ Vm ⊃ Vm+1 ⊃ ⋅ ⋅ ⋅ (21.1)
Proof. We apply the operator I; that is, we pass to the vanishing ideals. This gives an ascending chain of ideals

I(V1) ⊂ I(V2) ⊂ ⋯ ⊂ I(Vm) ⊂ I(Vm+1) ⊂ ⋯.
The union of the I(Vi ) is an ideal in K[x1 , . . . , xn ], and hence, by Theorem 21.3.3,
finitely generated. Therefore, there is an m with I(Vm ) = I(Vm+1 ) = I(Vm+2 ) = ⋅ ⋅ ⋅ .
Now we apply the operator 𝒩 and get the desired result, because Vi = 𝒩 I(Vi ) by
Theorem 21.2.4 (10).
Proof. (1) Let V be irreducible. Let fg ∈ I(V ). Then V = 𝒩 I(V ) ⊂ 𝒩 (fg) = 𝒩 (f ) ∪ 𝒩 (g);
hence, V = V1 ∪ V2 with the algebraic K-sets V1 = 𝒩 (f ) ∩ V and V2 = 𝒩 (g) ∩ V . Now
V is irreducible; hence, V = V1 , or V = V2 , say V = V1 . Then V ⊂ 𝒩 (f ). Therefore,
f ∈ I𝒩(f) ⊂ I(V). Since V ≠ ∅, we have further 1 ∉ I(V); that is, I(V) ≠ R.
(2) Let I(V) ⊲ R with I(V) ≠ R be a prime ideal. Let V = V1 ∪ V2, V1 ≠ V, with algebraic K-sets Vi in C^n. First,

I(V1)I(V2) ⊂ I(V1) ∩ I(V2) = I(V1 ∪ V2) = I(V), (⋆)

where I(V1)I(V2) is the ideal generated by all products fg with f ∈ I(V1), g ∈ I(V2).
We have I(V1 ) ≠ I(V ), because otherwise V1 = 𝒩 I(V1 ) = 𝒩 I(V ) = V contradicting
V1 ≠ V. Hence, there is an f ∈ I(V1) with f ∉ I(V). Now, I(V) ≠ R is a prime ideal; hence,
necessarily I(V2 ) ⊂ I(V ) by (⋆). It follows that V ⊂ V2 . Therefore, V is irreducible.
Note that the affine space K^n is, as the zero set of the zero polynomial 0, itself an algebraic K-set in K^n. If K is infinite, then I(K^n) = {0}. Hence, K^n is irreducible by Theorem 21.5.3. Moreover, if K is infinite, then K^n cannot be written as a union of finitely many proper algebraic K-subsets. If K is finite, then K^n is not irreducible.
Furthermore, each algebraic K-set V in C^n is also an algebraic C-set in C^n. If V is an irreducible algebraic K-set in C^n, then it is, in general, not an irreducible algebraic C-set in C^n.
Theorem 21.5.4. Let V be an algebraic K-set in C^n. Then V can be written as a finite union V = V1 ∪ V2 ∪ ⋯ ∪ Vr of irreducible algebraic K-sets Vi in C^n. If here Vi ⊈ Vk for all pairs
(i, k) with i ≠ k, then this presentation is unique, up to the ordering of the Vi , and then the
Vi are called the irreducible K-components of V .
Proof. Let a be the set of all algebraic K-sets in C^n that cannot be presented as a finite union of irreducible algebraic K-sets in C^n.
Assume that a ≠ ∅. Since the algebraic K-sets in C^n satisfy the descending chain condition, there is a minimal element V in a. This V is not irreducible; otherwise, we would have a presentation as desired. Hence, there exists a presentation V = V1 ∪ V2 with algebraic K-sets Vi, which are strictly smaller than V. By the minimality of V, both V1 and V2 have a presentation as desired; hence, V also has one, which gives a contradiction. Hence, a = ∅.
Now suppose that V = V1 ∪ ⋅ ⋅ ⋅ ∪ Vr = W1 ∪ ⋅ ⋅ ⋅ ∪ Ws are two presentations of the
desired form. For each Vi , we have a presentation Vi = (Vi ∩ W1 ) ∪ ⋅ ⋅ ⋅ ∪ (Vi ∩ Ws ). Each
Vi ∩ Wj is a K-algebraic set (see Theorem 21.2.4). Since Vi is irreducible, we get that there
is a Wj with Vi = Vi ∩ Wj , that is, Vi ⊂ Wj . Analogously, for this Wj , there is a Vk with
Wj ⊂ Vk . Altogether, Vi ⊂ Wj ⊂ Vk . But Vp ⊈ Vq if p ≠ q. Hence, from Vi ⊂ Wj ⊂ Vk ,
we get i = k. Therefore, Vi = Wj ; that means, for each Vi there is a Wj with Vi = Wj .
Analogously, for each Wk , there is a Vl with Wk = Vl . This proves the theorem.
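For a hypersurface, this decomposition is visible from the factorization of a single polynomial; a small sketch (sympy) with the arbitrary polynomial x²y − y, whose zero set decomposes as 𝒩(y) ∪ 𝒩(x − 1) ∪ 𝒩(x + 1):

```python
from sympy import symbols, factor_list

x, y = symbols('x y')

# The irreducible factors of x**2*y - y give the irreducible
# components of its zero set: N(y), N(x - 1), N(x + 1)
_, factors = factor_list(x**2*y - y)
assert {str(f) for f, _ in factors} == {'y', 'x - 1', 'x + 1'}
```

Each irreducible factor generates a prime ideal, and its zero set is one irreducible component of the hypersurface.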
Definition 21.5.6. Let V be an algebraic K-set in C^n. Then the residue class ring K[V] := K[x1, …, xn]/I(V) is called the coordinate ring of V.
K[V ] can be identified with the ring of all those functions V → C, which are given
by polynomials from K[x1 , . . . , xn ]. As a homomorphic image of K[x1 , . . . , xn ], we get that
K[V ] can be described in the form K[V ] = K[α1 , . . . , αn ]; therefore, a K-algebra of the
form K[α1, …, αn] is often called an affine K-algebra. If the algebraic K-set V in C^n is irreducible (we can now call V an (affine) K-variety in C^n), then K[V] is an integral domain with an identity, because I(V) is then a prime ideal with I(V) ≠ R by Theorem 21.5.3. The quotient field K(V) = Quot K[V] is called the field of rational functions on the K-variety V.
We note the following:
1. If C is algebraically closed, then V = C^n is a K-variety, and K(V) is the field K(x1, …, xn) of the rational functions in n variables over K.
2. Let the affine K-algebra A = K[α1 , . . . , αn ] be an integral domain with an identity
1 ≠ 0. Then A ≅ K[x1 , . . . , xn ]/p for some prime ideal p ≠ K[x1 , . . . , xn ]. Hence, if C
is algebraically closed, then A is isomorphic to the coordinate ring of the K-variety V = 𝒩(p) in C^n (see Hilbert's nullstellensatz, first form, Theorem 21.4.1).
Example 21.5.7. Let ω1, ω2 ∈ ℂ be two elements that are linearly independent over ℝ. An element ω = m1ω1 + m2ω2 with m1, m2 ∈ ℤ is called a period. The periods form an Abelian group Ω = {m1ω1 + m2ω2 : m1, m2 ∈ ℤ} ≅ ℤ ⊕ ℤ and a lattice in ℂ.
The Weierstrass ℘-function

℘(z) = 1/z^2 + ∑_{0 ≠ w ∈ Ω} (1/(z − w)^2 − 1/w^2)

is an elliptic function.
With g2 = 60 ∑_{0 ≠ w ∈ Ω} 1/w^4 and g3 = 140 ∑_{0 ≠ w ∈ Ω} 1/w^6, we get the differential equation

℘′(z)^2 − 4℘(z)^3 + g2 ℘(z) + g3 = 0.

The set of elliptic functions is a field E, and each elliptic function is a rational function in ℘ and ℘′ (for details see, for instance, [44]).
The polynomial f(t) = t^2 − 4s^3 + g2 s + g3 ∈ ℂ(s)[t] is irreducible over ℂ(s). For the corresponding algebraic ℂ(s)-set V, we get K(V) = ℂ(s)[t]/(t^2 − 4s^3 + g2 s + g3) ≅ E with respect to t → ℘′, s → ℘.
21.6 Dimensions
From now on, we assume that C is algebraically closed.
(2) Let A be a commutative ring with an identity 1 ≠ 0. The height h(p) of a prime ideal p ≠ A of A is defined to be the supremum of all integers m for which there exists a strictly ascending chain p0 ⊊ p1 ⊊ ⋯ ⊊ pm = p of prime ideals pi of A with pi ≠ A.
The dimension (Krull dimension) dim(A) of A is the supremum of the heights of all
prime ideals ≠ A in A.
Proof. By Theorem 21.2.4 and Theorem 21.4.2, we have a bijective map between the
K-varieties W with W ⊂ V and the prime ideals ≠ R = K[x1 , . . . , xn ] of R, which con-
tain I(V ) (the bijective map reverses the inclusion). But these prime ideals correspond
exactly with the prime ideals ≠ K[V ] of K[V ] = K[x1 , . . . , xn ]/I(V ), which gives the state-
ment.
Theorem 21.6.3. Let A = K[α1 , . . . , αn ] be an affine K-algebra, and let A be also an integral
domain. Let {0} = p0 ⊊ p1 ⊊ ⋅ ⋅ ⋅ ⊊ pm be a maximal strictly ascending chain of prime ideals
in A (such a chain exists since A is noetherian). Then m = trgd(A|K) = dim(A). In other words:
All maximal ideals of A have the same height, and this height is equal to the transcen-
dence degree of A over K.
Lemma 21.6.5. Let R be a unique factorization domain. Then each prime ideal p with
height h(p) = 1 is a principal ideal.
Lemma 21.6.6. Let R = K[y1 , . . . , yr ] be the polynomial ring of the r independent indeter-
minates y1 , . . . , yr over the field K (recall that R is a unique factorization domain). If p is
a prime ideal in R with height h(p) = 1, then the residue class ring R̄ = R/p has transcen-
dence degree r − 1 over K.
Proof. By Lemma 21.6.5, we have that p = (p) for some nonconstant polynomial
p ∈ K[y1, …, yr]. Let the indeterminate y = yr occur in p, that is, degy(p) ≥ 1, where degy denotes the degree in y. If f ≠ 0 is a multiple of p, then also degy(f) ≥ 1. Hence, p ∩ K[y1, …, yr−1] = {0}.
Therefore, the residue class mapping R → R̄ = K[ȳ1 , . . . , ȳr ] induces an isomorphism
K[y1 , . . . , yr−1 ] → K[ȳ1 , . . . , ȳr−1 ] of the subring K[y1 , . . . , yr−1 ]; that is, ȳ1 , . . . , ȳr−1 are al-
gebraically independent over K. On the other hand, p(ȳ1 , . . . , ȳr−1 , ȳr ) = 0 is a nontrivial
algebraic relation for ȳr over K(ȳ1 , . . . , ȳr−1 ).
Hence, altogether trgd(R̄|K) = trgd(K(ȳ1, …, ȳr)|K) = r − 1 by Theorem 20.3.9.
Before we describe the last technical lemma, we need some preparatory theoretical
material.
Let R, A be integral domains (with identity 1 ≠ 0), and let A|R be a ring extension.
We first consider only R.
(1) A subset S ⊂ R \ {0} is called a multiplicative subset of R if 1 ∈ S for the identity 1
of R, and if s, t ∈ S, then also st ∈ S. (x, s) ∼ (y, t) :⇔ xt − ys = 0 defines an equivalence relation on M = R × S. Let x/s be the equivalence class of (x, s) and S^{−1}R the set of all equivalence classes. We call x/s a fraction. If we add and multiply fractions as usual, we get that S^{−1}R becomes an integral domain; it is called the ring of fractions of R with respect to S. If, in particular, S = R \ {0}, then S^{−1}R = Quot(R), the quotient field of R.
Now, back to the general situation. i : R → S^{−1}R, i(r) = r/1, defines an embedding of R into S^{−1}R. Hence, we may consider R as a subring of S^{−1}R. For each s ∈ S ⊂ R \ {0}, we have that i(s) is a unit in S^{−1}R. That is, i(s) is invertible, and each element of S^{−1}R has the form i(s)^{−1} i(r) with r ∈ R, s ∈ S. Therefore, S^{−1}R is uniquely determined up to isomorphism, and we have the following universal property:
If ϕ : R → R′ is a ring homomorphism (of integral domains) with ϕ(s) invertible for each s ∈ S, then there exists exactly one ring homomorphism λ : S^{−1}R → R′ with λ ∘ i = ϕ. If a ⊲ R is an ideal in R, then we write S^{−1}a for the ideal in S^{−1}R generated by i(a). S^{−1}a is the set of all elements of the form a/s with a ∈ a and s ∈ S. Furthermore, S^{−1}a = (1) ⇔ a ∩ S ≠ ∅.
Conversely, if A ⊲ S^{−1}R is an ideal in S^{−1}R, then we also denote the ideal i^{−1}(A) ⊲ R by A ∩ R. An ideal a ⊲ R is of the form a = i^{−1}(A) if and only if there is no s ∈ S whose image in R/a under the canonical map R → R/a is a proper zero divisor in R/a. Under the mappings P → P ∩ R and p → S^{−1}p, the prime ideals in S^{−1}R correspond exactly to the prime ideals in R that do not contain an element of S.
We now identify R with i(R):
(2) Now, let p ⊲ R be a prime ideal in R. Then S = R \ p is multiplicative. In this case, we write Rp instead of S^{−1}R, and call Rp the quotient ring of R with respect to p, or the localization of R at p. Put m = pRp = S^{−1}p. Then 1 ∉ m. Each element of Rp \ m is a unit in Rp and vice versa. In other words, each ideal a ≠ (1) in Rp is contained in m, or equivalently, m is the only maximal ideal in Rp. A commutative ring with an identity
1 ≠ 0, which has exactly one maximal ideal, is called a local ring. Hence, Rp is a local ring. From part (1), we additionally get that the prime ideals of the local ring Rp correspond bijectively to the prime ideals of R that are contained in p.
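The localization ℤ_(p) of ℤ at the prime ideal (p) gives a concrete model of these notions; a sketch (Python, with the arbitrary choice p = 5):

```python
from fractions import Fraction

p = 5   # an arbitrary prime; Z_(5) = { x/s : 5 does not divide s }

def in_localization(q: Fraction) -> bool:
    # Fraction reduces to lowest terms, so check the denominator
    return q.denominator % p != 0

def is_unit(q: Fraction) -> bool:
    # units of Z_(5) are the fractions with 5 dividing neither part
    return in_localization(q) and q.numerator % p != 0

assert in_localization(Fraction(3, 4)) and is_unit(Fraction(3, 4))
assert not in_localization(Fraction(1, 5))
# 5/4 is a non-unit of Z_(5): it lies in the maximal ideal m = 5 * Z_(5)
assert in_localization(Fraction(5, 4)) and not is_unit(Fraction(5, 4))
```

Every non-unit of ℤ_(5) lies in the single maximal ideal 5ℤ_(5), which is exactly the local-ring property described above.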
(3) Now we consider our ring extension A|R as above. Let q be a prime ideal in R.
Claim: If qA ∩ R = q, then there exists a prime ideal Q ⊲ A with Q ∩ R = q (and vice
versa).
Proof of the claim: If S = R \ q, then qA ∩ S = ∅. Hence, qS^{−1}A is a proper ideal in S^{−1}A, and hence contained in a maximal ideal m in S^{−1}A. Here, qS^{−1}A is the ideal in S^{−1}A, which is generated by q. Define Q = m ∩ A; Q is a prime ideal in A, and Q ∩ R = q by part (1), because Q ∩ S = ∅, where S = R \ q.
(4) Now let A|R be an integral extension (A, R integral domains as above). Assume
that R is integrally closed in its quotient field K. Let P ⊲ A be a prime ideal in A and
p = P ∩ R.
Claim: If q ⊲ R is a prime ideal in R with q ⊂ p, then qAp ∩ R = q.
Proof of the claim: An arbitrary β ∈ qAp has the form β = α/s with α ∈ qA (the ideal in A generated by q) and s ∈ S = A \ p. An integral equation for α ∈ qA over K is given by an equation of the form α^n + a_{n−1}α^{n−1} + ⋯ + a0 = 0 with ai ∈ q. This can be seen as follows: we certainly have a representation α = b1α1 + ⋯ + bmαm with bi ∈ q and αi ∈ A. The subring A′ = R[α1, …, αm] is, as an R-module, finitely generated, and αA′ ⊂ qA′. Now, ai ∈ q follows with the same type of arguments as in the proof of Theorem 20.2.4.
Now, in addition, let β ∈ R. Then, for s = α/β, we have an equation

s^n + (a_{n−1}/β) s^{n−1} + ⋯ + a0/β^n = 0

over K. But s is integral over R; hence, all ai/β^{n−i} ∈ R.
We are now prepared to prove the last preliminary lemma, which we need for the
proof of Theorem 21.6.3.
Lemma 21.6.7 (Krull’s going up lemma). Let A|R be an integral ring extension of integral
domains, and let R be integrally closed in its quotient field. Let p and q be prime ideals in
R with q ⊂ p. Furthermore, let P be a prime ideal in A with P ∩ R = p. Then there exists a
prime ideal Q in A with Q ∩ R = q, and Q ⊂ P.
Proof. It is enough to show that there exists a prime ideal Q in Ap with Q ∩ R = q. This
can be seen from the preceding preparations. By parts (1) and (2), such a Q has the form
Q = Q′ Ap with a prime ideal Q′ in A with Q′ ⊂ P, and Q ∩ A = Q′ . It follows that
q = Q′ ∩ R ⊂ P ∩ R = p. And the existence of such a Q follows from parts (3) and (4).
Proof of Theorem 21.6.3. First, let m = 0. Then {0} is a maximal ideal in A; hence, A = K[α1, …, αn] is a field. By Corollary 20.3.11, A|K is then algebraic; therefore, trgd(A|K) = 0. So, Theorem 21.6.3 holds for m = 0.
Now, let m ≥ 1. We use Noether’s normalization theorem. A has a polynomial ring
R = K[y1, …, yr] of the r independent indeterminates y1, …, yr as a subring, and A|R is integral. We write the given maximal chain of prime ideals of A as
{0} = P0 ⊊ P1 ⊊ ⋯ ⊊ Pm (21.3)

and consider the chain

{0} = p0 ⊂ p1 ⊂ ⋯ ⊂ pm (21.4)
of prime ideals pi = Pi ∩ R of R. Since A|R is integral, the chain (21.4) is also a strictly
ascending chain. This follows from Krull’s going up lemma (Lemma 21.6.7), because if
pi = pj , then Pi = Pj . If Pm is a maximal ideal in A, then also pm is a maximal ideal in
R, because A|R is integral (consider A/Pm and use Theorem 20.2.19). If the chain (21.3) is maximal and strictly ascending, then so is the chain (21.4).
Now, let the chain (21.3) be maximal and strictly ascending. If we pass to the residue class rings
Ā = A/P1 and R̄ = R/p1, then we get the corresponding chains of prime ideals for the affine K-algebras Ā and R̄, respectively, but with length one less. By induction, we may assume that already trgd(Ā|K) = m − 1 = trgd(R̄|K). On the other hand, by
construction, we have trgd(A|K) = trgd(R|K) = r. Finally, to prove Theorem 21.6.3, we have to show that r = m. If we compare both equations, then r = m follows if trgd(R̄|K) = r − 1. But this holds by Lemma 21.6.6.
Proof. (1) Let V be a K-variety in C^n with dim(V) = n − 1. The corresponding ideal (in the sense of Theorem 21.2.4) is by Theorem 21.5.3 a prime ideal p in K[x1, …, xn]. By Theorem 21.6.3, we get h(p) = 1 for the height of p, because dim(V) =
n − 1 (see also Theorem 21.3.2). Since K[x1 , . . . , xn ] is a unique factorization domain, we
get that p = (f ) is a principal ideal by Lemma 21.6.5.
(2) Now let f ∈ K[x1 , . . . , xn ] be irreducible. We have to show that V = 𝒩 (f ) has
dimension n − 1. For that, by Theorem 21.6.3, we have to show that the prime ideal p =
(f) has the height h(p) = 1. Assume that this is not the case. Then there exists a prime ideal q ≠ p with {0} ≠ q ⊂ p. Choose g ∈ q, g ≠ 0. Let g = u f^{e1} π2^{e2} ⋯ πr^{er} be its prime factorization in K[x1, …, xn], with u a unit and the πi primes not associated to f. Now, g ∈ q and f ∉ q, because q ≠ p. Hence, there is a πi in q ⊊ p = (f), which is impossible. Therefore, h(p) = 1.
21.7 Exercises
1. Let A = K[a1 , . . . , an ] and C|K be a field extension with C algebraically closed. Show
that there is a K-algebra homomorphism K[a1 , . . . , an ] → C.
2. Let K[x1 , . . . , xn ] be the polynomial ring of the n independent indeterminates
x1 , . . . , xn over the algebraically closed field K. The maximal ideals of K[x1 , . . . , xn ]
are exactly the ideals of the form m(α) = (x1 − α1 , x2 − α2 , . . . , xn − αn ) with
α = (α1 , . . . , αn ) ∈ K n .
3. The nil radical √0 of A = K[a1, …, an] coincides with the Jacobson radical of A, that is, the intersection of all maximal ideals of A.
4. Let R be a commutative ring with 1 ≠ 0. If each prime ideal of R is finitely generated,
then R is noetherian.
5. Prove the theoretical preparations for Krull’s going up lemma in detail.
6. Let K[x1 , . . . , xn ] be the polynomial ring of the n independent indeterminates
x1 , . . . , xn . For each ideal a of K[x1 , . . . , xn ], there exists a natural number m with the
following property: if f ∈ K[x1 , . . . , xn ] vanishes on the zero set of a, then f m ∈ a.
7. Let K be a field with char K ≠ 2 and a, b ∈ K⋆. We consider a polynomial f(x, y) in K[x, y], the polynomial ring of the independent indeterminates x and y. Let C be the algebraic closure of K(x) and β ∈ C with f(x, β) = 0. Show the following:
(i) f is irreducible over the algebraic closure C0 of K (in C).
(ii) trgd(K(x, β)|K) = 1, [K(x, β) : K(x)] = 2, and K is algebraically closed in K(x, β).
22 Algebras and Group Representations
22.1 Group Representations
In Chapter 13, we spoke about group actions. These are homomorphisms from a group G into a group of permutations of a set S. The way a group G acts on a set S can often be used
to study the structure of the group G, and, in Chapter 13, we used group actions to prove
the important Sylow theorems.
In this chapter, we discuss a very important type of group action called a group
representation or linear representation. This is a homomorphism of a group G into the group of invertible linear transformations of a vector space V over a field K. It is a finite-dimensional representation if V is a finite-dimensional vector space over K, and infinite-dimensional
otherwise. For an n-dimensional representation, each element of the group G can be
represented by an (n × n)-matrix over K, and the group operation can be represented
by matrix multiplication. As with general group actions, much information about the
structure of the group G can be obtained from representations. In particular, in this chapter, we will present an important theorem of Burnside, which shows that any finite group whose order is divisible by at most two primes must be solvable.
Representations of groups are important in many areas of mathematics. Group rep-
resentations allow many group-theoretic problems to be reduced to problems in linear
algebra, which is well understood. They are also important in physics and the study of
physical structure, because they describe how the symmetry group of a physical system
affects the solutions of equations describing that system.
The theory of group representations can be divided into several areas depending on
the kind of group being represented. The various areas can be quite different in detail,
though the basic definitions and concepts are the same. The most important areas are:
(1) The theory of finite group representations. Group representations constitute a cru-
cial tool in the study of finite groups. They also arise in applications of finite group
theory to crystallography and to geometry.
(2) Group representations of compact and locally compact groups. Using integration
theory and Haar measure, many of the results on representations of finite groups
can be extended to infinite locally compact groups. The resulting theory is a cen-
tral part of the area of mathematics called harmonic analysis. Pontryagin dual-
ity describes the theory for commutative groups as a generalized Fourier trans-
form.
(3) Representations of Lie Groups. Lie groups are continuous groups with a differen-
tiable structure. Most of the groups that arise in physics and chemistry are Lie
groups, and their representation theory is important to the application of group
theory in those fields.
(4) Linear algebraic groups are the analogues of Lie groups, but over more general
fields than just the reals or complexes. Their representation theory is more compli-
cated than that of Lie groups.
For this chapter, we will consider solely the representation theory of finite groups, and
for the remainder of this chapter, when we say group, we mean finite group.
Recall that group actions correspond to group homomorphisms into symmetric groups.
For linear actions on a vector space V , we have a stronger result.
Theorem 22.2.1. There is a bijective correspondence between the set of linear actions of
a group G on a K-vector space V and the set of homomorphisms from G into GL(V ), the
group of all invertible linear transformations of V , which is called the general linear group
over V .
From Theorem 22.2.1, it follows that the study of group representations is equivalent
to the study of linear actions of groups. This area of study, with emphasis on finite groups
and finite-dimensional vector spaces, has many applications to finite group theory.
The modern approach to the representation theory of finite groups involves another
equivalent concept, namely that of finitely generated modules over group rings.
In Chapter 18, we considered R-modules over commutative rings R, and used this
study to prove the fundamental theorem of finitely generated modules over principal
ideal domains. In particular, we used the same study to prove the fundamental theorem
of finitely generated Abelian groups. Here we must extend the concepts and allow R to
be a general ring with identity.
Definition 22.2.3. Let R be a ring with identity 1, and let M be an Abelian group written additively. M is called a left R-module if there is a map R × M → M, written as (r, m) → rm, such that the following hold:
(1) 1 ⋅ m = m;
(2) r(m + n) = rm + rn;
(3) (r + s)m = rm + sm;
(4) r(sm) = (rs)m for all r, s ∈ R and m, n ∈ M.
Finite minimal generating sets for a given module may have different numbers of elements. This is in contrast to the situation for free R-modules over a commutative ring R with
identity, where any two finite bases have the same number of elements (Theorem 18.4.6).
In the following, we review the module theory that is necessary for the study of
group representations. The facts we use are straightforward extensions of the respective
facts for modules over commutative rings or for groups.
Example 22.2.6. The R-submodules of R, considered as a left module over itself, are exactly the left ideals of R (see Chapter 1). Every R-module M has at least two submodules, namely, M itself and the zero
submodule {0}.
Definition 22.2.7. A simple R-module is an R-module M ≠ {0}, which has only M and {0}
as submodules.
If N is a submodule of M, then we may construct the factor group M/N (recall that
M is Abelian). We may give the factor group M/N an R-module structure by defining
r(m + N) = rm + N for every r ∈ R and m + N ∈ M/N. We call M/N the factor R-module, or just factor module, of M by N.
328 � 22 Algebras and Group Representations
If N1 and N2 are submodules of M, then their sum

N1 + N2 = {x + y | x ∈ N1 , y ∈ N2 } ⊂ M

is again a submodule of M. For a submodule N and k ∈ ℕ, we write kN for the direct sum

N ⊕ N ⊕ ⋅ ⋅ ⋅ ⊕ N

of k copies of N.
As for groups, we also have the external notion of a direct sum. If M and N are
R-modules, then we give the Cartesian product M × N an R-module structure by setting
r(m, n) = (rm, rn), and we write M ⊕ N instead of M × N.
The notions of internal and external direct sums can be extended to any finite num-
ber of submodules and modules, respectively.
A composition series of M is a chain

M = M0 ⊃ M1 ⊃ ⋅ ⋅ ⋅ ⊃ Mk = {0}

of finitely many submodules Mi of M, beginning with M and ending with {0}, in which the inclusions are proper and each successive factor module Mi /Mi+1 is a simple module. We call k the length of the composition series.
Therefore, we can speak in a well-defined manner about the factor modules of a
composition series. If an R-module M has a composition series, then each submodule N
and each factor module M/N also has a composition series.
If the submodule N and the factor module M/N each have a composition series,
then the module M also has one (see Chapter 13 for the respective proofs for groups).
Theorem 22.2.15 (Schur’s lemma). Let M and N be simple R-modules, and let ϕ : M → N
be a nonzero R-module homomorphism. Then ϕ is an R-module isomorphism.
Proof. Since M is simple, we must have either ker(ϕ) = M or ker(ϕ) = {0}. If ker(ϕ) = M, then ϕ = 0, the zero homomorphism. Since ϕ ≠ 0, it follows that ker(ϕ) = {0}; moreover, Im(ϕ) is a nonzero submodule of the simple module N, so Im(ϕ) = N. Therefore, ϕ is an R-module isomorphism.
Definition 22.2.16. Let R be a ring and G a group. Then the group ring of G over R, denoted by RG, consists of all finite R-linear combinations of elements of G; that is, of all formal sums ∑_{g∈G} αg g with αg ∈ R and αg = 0 for all but finitely many g ∈ G. Addition is defined componentwise:

∑_{g∈G} αg g + ∑_{g∈G} βg g = ∑_{g∈G} (αg + βg )g.
Multiplication extends the group multiplication R-bilinearly:

( ∑_{g∈G} αg g)( ∑_{h∈G} βh h) = ∑_{g∈G} ∑_{h∈G} αg βh gh = ∑_{x∈G} ( ∑_{g∈G} αg β_{g⁻¹x} )x.
The group ring RG has an identity element, which coincides with the identity element
of G. We usually denote this by just 1.
From the viewpoint of abstract group theory, it is of interest to consider the case where the underlying ring is an integral domain. In this connection, we mention the famous zero divisor conjecture of Higman and Kaplansky, which asks whether every group ring RG of a torsion-free group G over an integral domain R, or over a field K, is free of zero divisors.
The conjecture has been proved only for a fairly restricted class of torsion-free
groups.
In this chapter, we will primarily consider the case where R = K is a field and the
group G is finite, in which case the group ring KG is not only a ring, but also a finite-
dimensional K-vector space having G as a basis. In this case, KG is called the group alge-
bra.
In mathematics, in general, an algebra over a field K is a K-vector space with a bilinear product that makes it a ring. That is, an algebra over K is an algebraic structure A carrying both a ring structure and a K-vector space structure that are compatible, in the sense that α(ab) = (αa)b = a(αb) for all α ∈ K and a, b ∈ A. An algebra is finite-dimensional if it has finite dimension as a K-vector space.
Example 22.2.17. (1) The matrix ring M(n, K) is a finite-dimensional K-algebra for any
natural number n.
(2) The group ring KG is a finite-dimensional K-algebra when the group G is finite.
Modules over a group algebra KG can also be considered as K-vector spaces with
α ∈ K acting as α ⋅ 1 ∈ KG.
Lemma 22.2.19. If K is a field, and G is a finite group, then a KG-module is finitely gener-
ated if and only if it is finite-dimensional as a K-vector space.
We now describe the fundamental connections between modules over group alge-
bras and group representation theory.
Theorem 22.2.20. If K is a field and G is a finite group, then there is a one-to-one cor-
respondence between finitely generated KG-modules and linear actions of G on finite-
dimensional K-vector spaces V , and hence with the homomorphisms ρ : G → GL(V )
for finite dimensional K-vector spaces V .
Proof. If V is a finitely generated KG-module, then dimK (V ) < ∞ by Lemma 22.2.19, and the map from G × V to V obtained by restricting the module structure map from KG × V to V is a linear action.
Conversely, let V be a finite-dimensional K-vector space on which G acts linearly. Then we place a KG-module structure on V by defining

( ∑_{g∈G} αg g)v = ∑_{g∈G} αg (gv) for ∑_{g∈G} αg g ∈ KG and v ∈ V.
Example 22.2.21. (1) The field K can always be considered as a KG-module by defining
gλ = λ for all g ∈ G and λ ∈ K. This module is called the trivial module.
(2) Let G act on the finite set X = {x1 , . . . , xn }. Let KX be the set

{∑_{i=1}^{n} ci xi | ci ∈ K for i = 1, . . . , n}

of formal K-linear combinations of the elements of X. Then KX becomes a KG-module, the permutation module of the action, via

g(∑_{i=1}^{n} ci xi ) = ∑_{i=1}^{n} ci (gxi ).
(4) Let U, V be KG-modules, and let HomKG (U, V ) be the set of all KG-module homomorphisms from U to V . For ϕ, ψ ∈ HomKG (U, V ) define ϕ + ψ ∈ HomKG (U, V ) by (ϕ + ψ)(u) = ϕ(u) + ψ(u) for u ∈ U. With this definition, HomKG (U, V ) is an Abelian group. Furthermore, HomKG (U, V ) is a K-vector space with (λϕ)(u) = λϕ(u) for λ ∈ K, u ∈ U, and ϕ ∈ HomKG (U, V ). Note that this K-vector space has finite dimension. The K-vector space HomKG (U, V ) also admits a natural KG-module structure. For g ∈ G and ϕ ∈ HomKG (U, V ), we define

(gϕ)(u) = gϕ(g⁻¹u) for u ∈ U.

Then for g1 , g2 ∈ G we have (g1 g2 )ϕ = g1 (g2 ϕ). It follows that HomKG (U, V ) has a KG-module structure. G acts on HomKG (U, V ), and we write U ⋆ for HomKG (U, K), where K is the trivial module. U ⋆ is called the dual module of U, and here the action reads (gϕ)(u) = ϕ(g⁻¹u).
Theorem 22.2.22 (Maschke’s Theorem). Let G be a finite group, and suppose that the char-
acteristic of K is either 0 or co-prime to |G|; that is, gcd(char(K), |G|) = 1. If U is a KG-
module and V is a KG-submodule of U, then V is a direct summand of U as KG-modules.
Proof. Since V is a K-subspace of U, we may choose a K-linear projection π : U → V with π(v) = v for all v ∈ V (such a π need not be a KG-homomorphism). Define

π′ : U → U

by

π′(u) = (1/|G|) ∑_{g∈G} gπ(g⁻¹u) for u ∈ U.
Since char(K) = 0 or gcd(char(K), |G|) = 1, we have |G| ≠ 0 in K; hence 1/|G| exists in K, and the definition of π′ makes sense.
Now let x ∈ G and u ∈ U. Then

π′(xu) = (1/|G|) ∑_{g∈G} gπ(g⁻¹xu) = (1/|G|) ∑_{g∈G} xx⁻¹gπ(g⁻¹xu) = x((1/|G|) ∑_{g∈G} x⁻¹gπ(g⁻¹xu)).

Setting y = x⁻¹g, the element y runs over G as g does, and y⁻¹ = g⁻¹x; hence

π′(xu) = x((1/|G|) ∑_{y∈G} yπ(y⁻¹u)) = xπ′(u),

as required.
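The averaging trick in this proof is easy to watch in action. The following Python sketch (an illustration, not from the text) takes G = {e, s} with s swapping the coordinates of U = ℚ², the diagonal submodule V = {(t, t)}, and a projection π onto V that is not G-equivariant, and averages it into one that is:

```python
# Sketch of the averaging trick in Maschke's theorem for G = {e, s},
# where s swaps the coordinates of U = Q^2 and V = {(t, t)} is the
# diagonal submodule.  We start from a projection pi onto V that is NOT
# G-equivariant and average it into one that is.
from fractions import Fraction as F

def s(v):                      # the nontrivial group element acting on U
    x, y = v
    return (y, x)

def pi(v):                     # a K-linear projection onto V (not equivariant)
    x, y = v
    return (y, y)

def pi_avg(v):                 # pi'(u) = (1/|G|) sum_g g pi(g^{-1} u)
    e_part = pi(v)
    s_part = s(pi(s(v)))       # here s = s^{-1}
    return tuple((a + b) / 2 for a, b in zip(e_part, s_part))

u = (F(3), F(7))
assert pi_avg(u) == (F(5), F(5))              # lands in the diagonal V
assert pi_avg(s(u)) == s(pi_avg(u))           # pi' commutes with the G-action
assert pi_avg((F(4), F(4))) == (F(4), F(4))   # pi' restricts to id on V
assert pi(s(u)) != s(pi(u))                   # the original pi was not equivariant
```

The image of π′ is V and its kernel is the antidiagonal {(t, −t)}, giving the direct summand promised by the theorem.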
Corollary 22.2.24. Let G be a finite group and K a field. Suppose that either char(K) = 0
or char(K) is relatively prime to |G|. Then every nonzero KG-module is semisimple.
Theorem 22.2.26 (Maschke’s theorem). Let G be a finite group and K a field. Suppose that either char(K) = 0 or char(K) is relatively prime to |G|. Let V be a finite-dimensional K-vector space. Then every representation ρ : G → GL(V ) is fully reducible.
Proof. By Theorem 22.2.1, we may consider V as a KG-module. Then this version of Maschke’s theorem follows from the module version, because the KG-submodules of V are precisely the G-invariant subspaces of V .
The theory of KG-modules in the case char(K) = p > 0 with p dividing |G|, in which arbitrary KG-modules need not be semisimple, is called modular representation theory. The earliest work on modular representations was done by Dickson, and many of the main developments are due to Brauer. More details and a good overview may be found in [1], [4], [5], and [18].
Proof. The implication (1) ⇒ (2) follows in the same manner as Corollary 22.2.24. The
implication (2) ⇒ (3) is direct.
Finally, we must show the implication (3) ⇒ (1). Suppose that (3) holds, and let N be a submodule of M. Let V be a submodule of M that is maximal among all submodules of M intersecting N trivially; such a V exists by Zorn’s lemma. We wish to show that N + V = M. Suppose that N + V ≠ M (certainly we have N + V ⊂ M). If
Lemma 22.3.2. Submodules and factor modules of semisimple modules are also semi-
simple.
Proof. Let M be a semisimple A-module. By the previous lemma and the isomorphism
theorem for modules, we get that every submodule of M is isomorphic to a factor module
of M. Therefore, it suffices to show that factor modules of M are semisimple. Let M/N
be an arbitrary factor module, and let η : M → M/N with m → m + N be the canonical
map. Since M is semisimple, we have M = S1 + ⋅ ⋅ ⋅ + Sn with n ∈ ℕ, and each Si a simple
module. Then M/N = η(M) = η(S1 ) + ⋅ ⋅ ⋅ + η(Sn ). But each η(Si ) is isomorphic to a factor
module of Si , and hence each η(Si ) is either {0} or a simple module. Therefore, M/N is a
sum of simple modules, and hence semisimple by Lemma 22.3.1.
Note that if G is a finite group, and either char(K) = 0 or gcd(char(K), |G|) = 1, then
KG is semisimple.
We now give some fundamental results on semisimple algebras.
Lemma 22.3.4. The algebra A is semisimple if and only if the A-module A is semisimple.
Proof. Suppose that the A-module A is semisimple, and let M be an A-module generated
by {m1 , . . . , mr }.
Let Ar denote the direct sum of r copies of A. Then (a1 , . . . , ar ) ↦ a1 m1 + ⋅ ⋅ ⋅ + ar mr defines a map from Ar to M, which is an A-module epimorphism. Thus, M is isomorphic to a
factor module of the semisimple module Ar , and hence semisimple by Lemma 22.3.2. It
follows that A is a semisimple algebra.
The converse is clear.
Suppose that the A-module A decomposes as

A ≅ S1 ⊕ ⋅ ⋅ ⋅ ⊕ Sr , r ∈ ℕ,

where the Si are simple submodules of A. Then any simple A-module is isomorphic to some Si .
M ≅ m1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ mr Sr
Definition 22.3.7. An algebra D is a division algebra or skew field if the nonzero elements of D form a group under multiplication. Equivalently, it is a ring in which every nonzero element has a multiplicative inverse; this is exactly the definition of a field without requiring commutativity.
Any field K is a division algebra over itself, but there may be division algebras that
are noncommutative. If the interest is on the ring structure of D, one often speaks about
division rings (see Chapter 7).
Theorem 22.3.8. Let D be a division algebra and n ∈ ℕ. Then any simple M(n, D)-module
is isomorphic to Dn , and M(n, D) is an M(n, D)-module isomorphic to the direct sum of n
copies of Dn . In particular, M(n, D) is a semisimple algebra.
Proof. A nonzero submodule of Dn must contain some nonzero vector, which must have
a nonzero entry x in the j-th place for some j. This x is invertible in D.
By premultiplying this vector by Ejj (x −1 ), we see that the submodule contains the j-th
canonical basis vector. By premultiplying this basis vector by appropriate permutation
matrices, we get that the submodule contains every canonical basis vector, and hence
contains every vector.
It follows that Dn is the only nonzero M(n, D)-submodule of Dn , and hence Dn is
simple. Now for each 1 ≤ k ≤ n, let Ck be the submodule of M(n, D) consisting of those matrices whose only nonzero entries appear in the k-th column. Then we have

M(n, D) ≅ ⨁_{k=1}^{n} Ck

as M(n, D)-modules, and each Ck is isomorphic to Dn .
Definition 22.3.9. A nonzero algebra is simple if its only (two-sided) ideals (as a ring)
are itself and the zero ideal.
Proof. Let A be a simple algebra, and let Σ be the sum of all simple submodules of A. Let
S be a simple submodule of A, and let a ∈ A. Then the map ϕ : S → Sa, given by s → sa,
is a module epimorphism; hence Sa is simple or Sa = {0}. In either case, we have Sa ⊂ Σ for any simple submodule S and any a ∈ A.
It follows that Σ is a right ideal in A; since Σ is also a left ideal, being a sum of submodules, Σ is a two-sided ideal. However, A is simple, and Σ ≠ {0}, so we must have Σ = A. Therefore, A is the sum of simple A-modules, and from Lemmas 22.3.1 and 22.3.4, it follows that A is a semisimple algebra.
Theorem 22.3.11. Let D be a division algebra, and let n ∈ ℕ. Then M(n, D) is a simple
algebra.
Proof. Let M ∈ M(n, D) with M ≠ 0. We must show that the principal two-sided ideal J of M(n, D) generated by M is equal to M(n, D).
It suffices to show that J contains each Eij (1), since these matrices generate M(n, D) as an M(n, D)-module. Since M ≠ 0, there exists some 1 ≤ r, s ≤ n such that the (r, s)-entry of M is nonzero. We call this entry x. By calculation, we have

Eir (1) M Esj (x⁻¹) = Eij (1) ∈ J for all i, j,

as required.
Let B1 , . . . , Br be algebras. The external direct sum B = B1 ⊕B2 ⊕⋅ ⋅ ⋅⊕Br is the algebra,
whose underlying set is the Cartesian product, and whose addition, multiplication, and
scalar multiplication are defined componentwise.
If M is a Bi -module for some i, then M has a B-module structure given by
(b1 , . . . , br )m = bi m.
Conversely, suppose that the algebra B contains ideals B1 , . . . , Br such that every b ∈ B can be written uniquely as

b = b1 + ⋅ ⋅ ⋅ + br with bi ∈ Bi ,

so that b ↦ (b1 , . . . , br ) identifies B with B1 ⊕ ⋅ ⋅ ⋅ ⊕ Br . The algebra B is then the internal direct sum as algebras of the Bi . This can be seen as follows. If i ≠ j and bi ∈ Bi , bj ∈ Bj , then we must have bi bj ∈ Bi ∩ Bj = {0}, since Bi and Bj are ideals. Therefore, the product in B of b1 + ⋅ ⋅ ⋅ + br and b′1 + ⋅ ⋅ ⋅ + b′r is just b1 b′1 + ⋅ ⋅ ⋅ + br b′r .
Proof. Let J be a (two-sided) ideal of B, and let Ji = J ∩ Bi for each i. Certainly, ⨁_{i=1}^{r} Ji ⊂ J.
Let b ∈ J; then b = b1 + ⋅ ⋅ ⋅ + br with bi ∈ Bi for each i. For each i, consider ei = (0, . . . , 0, 1, 0, . . . , 0), the element of B whose only nonzero entry is the identity element of Bi . Then bi = bei ∈ J ∩ Bi = Ji . Therefore, b ∈ ⨁_{i=1}^{r} Ji , which shows that J = J1 ⊕ ⋅ ⋅ ⋅ ⊕ Jr .
The converse is clear.
Proof. For each i, we write Bi = Ci1 ⊕ ⋅ ⋅ ⋅ ⊕ Cin using Theorem 22.3.8, where the Cij are
mutually isomorphic Bi -modules. As we saw above, each Cij is also simple as a B-module.
Therefore, as B-modules, we have B ≅ ⨁i,j Cij , and hence B is a semisimple algebra by
Lemma 22.3.4. From Theorem 22.3.5, we get that any simple B-module is isomorphic to some Cij , but Cij ≅ Ckl if and only if i = k. Hence, there are exactly r isomorphism classes of simple B-modules. The final statement is a straightforward consequence of Theorem 22.3.11 and Lemma 22.3.12.
We saw that a direct sum of matrix algebras over division algebras is semisimple. We now start to show that the converse is also true; that is, any semisimple algebra is isomorphic to a direct sum of matrix algebras over division algebras. This is Wedderburn’s theorem.
Definition 22.3.14. If M is an A-module, then let EndA (M) = HomA (M, M) denote the set of all A-module endomorphisms of M. In a more general context, we have seen that EndA (M) is a K-vector space via

(ϕ + ψ)(m) = ϕ(m) + ψ(m) and (λϕ)(m) = λϕ(m)

for all ϕ, ψ ∈ EndA (M), λ ∈ K, and m ∈ M. Composition of mappings gives a multiplication in EndA (M), and hence EndA (M) is a K-algebra, called the endomorphism algebra of M.
Definition 22.3.15. The opposite algebra of B, denoted Bop , is the set B together with the usual addition and scalar multiplication, but with the multiplication reversed: the product of a and b in Bop is ba, computed in B.
Proof. Let ϕ ∈ EndB (B), and let a = ϕ(1). Then ϕ(b) = bϕ(1) = ba for any b ∈ B; hence, ϕ is equal to the endomorphism ψa given by right multiplication by a. Therefore, EndB (B) = {ψa : a ∈ B}; hence, EndB (B) and B are in one-to-one correspondence. To finish the proof, we must show that ψa ψb = ψa⋅b for any a, b ∈ B, where a ⋅ b = ba denotes the multiplication of Bop .
Let a, b ∈ B. Then ψa ψb (x) = ψa (xb) = xba = ψba (x) = ψa⋅b (x), as required.
Lemma 22.3.17. Let S1 , . . . , Sr be the r distinct simple A-modules of Theorem 22.3.6. For each i, let Ui be a direct sum of copies of Si , and let U = U1 ⊕ ⋅ ⋅ ⋅ ⊕ Ur . Then

EndA (U) ≅ EndA (U1 ) ⊕ ⋅ ⋅ ⋅ ⊕ EndA (Ur ).
Proof. Let ϕ ∈ EndA (U). Fix some i. Then every composition factor of Ui is isomorphic to Si . Therefore, by the Jordan–Hölder theorem for modules (Theorem 22.3.10), the same is true for ϕ(Ui ), since ϕ(Ui ) is isomorphic to a quotient of Ui . Assume that ϕ(Ui ) is not contained in Ui . Then the image of ϕ(Ui ) in U/Ui under the canonical map is a nonzero submodule having Si as a composition factor. However, the composition factors of U/Ui are exactly those Sj with j ≠ i, so a submodule of U/Ui cannot have Si as a composition factor. This gives a contradiction. It follows that ϕ(Ui ) ⊂ Ui . For each i, we can therefore define ϕi = ϕ|Ui , and we have ϕi ∈ EndA (Ui ). In this way, we define a map

EndA (U) → EndA (U1 ) ⊕ ⋅ ⋅ ⋅ ⊕ EndA (Ur )

by setting ϕ ↦ (ϕ1 , . . . , ϕr ), and one checks that this map is an algebra isomorphism.
Lemma 22.3.18. If S is a simple A-module, then EndA (nS) ≅ M(n, EndA (S)) for n ∈ ℕ.
Proof. We regard the elements of nS as column vectors of length n with entries from S. Let Φ = (ϕij ) ∈ M(n, EndA (S)). We now define the map

Γ(Φ) : nS → nS

by letting Γ(Φ)(s⃗) be the column vector whose i-th entry is ∑_{j=1}^{n} ϕij (sj ), for s⃗ = (s1 , . . . , sn )t ∈ nS, in analogy with matrix-vector multiplication. Then

Γ(Φ)(a s⃗ + t⃗ ) = aΓ(Φ)(s⃗) + Γ(Φ)(t⃗ )

for a ∈ A and s⃗, t⃗ ∈ nS, so Γ(Φ) ∈ EndA (nS). A direct computation shows that the map

Φ ↦ Γ(Φ)

is an algebra monomorphism.
Now let ψ ∈ EndA (nS). For each 1 ≤ i, j ≤ n, we define ψij : S → S implicitly by

ψ(0, . . . , 0, s, 0, . . . , 0)t = (ψ1j (s), ψ2j (s), . . . , ψnj (s))t ,

where s appears in the j-th place; that is, ψij (s) is the i-th entry of the image under ψ of the vector with s in the j-th place and 0 elsewhere. We get that each ψij ∈ EndA (S). Now let Ψ = (ψij ) ∈ M(n, EndA (S)). Then Γ(Ψ) = ψ, showing that Γ is also surjective, and hence an isomorphism.
If S is a simple A-module, then EndA (S) is a division algebra by Schur’s lemma (The-
orem 22.2.15). If the ground field K is algebraically closed, then more specific results can
be stated about the structure of EndA (S).
Lemma 22.3.19. Suppose that K is algebraically closed, and let S be a simple A-module.
Then EndA (S) ≅ K.
Proof. Let ϕ ∈ EndA (S) with ϕ ≠ 0, and consider ϕ as an invertible K-linear map of the finite-dimensional K-vector space S onto itself (ϕ is invertible by Schur’s lemma). Since K is algebraically closed, ϕ has a nonzero eigenvalue λϕ ∈ K.
If I is the identity element of EndA (S), then (ϕ − λϕ I) ∈ EndA (S) has a nonzero kernel, and therefore is not invertible. From this, it follows that ϕ = λϕ I, since EndA (S) is a division algebra. The map ϕ ↦ λϕ , extended by 0 ↦ 0, is then an isomorphism from EndA (S) to K.
Lemma 22.3.20. Let B be an algebra. Then (M(n, B))op ≅ M(n, Bop ) for any n ∈ ℕ.
Proof. Define the map ψ : (M(n, B))op → M(n, Bop ) by ψ(X) = X t , where X t is the trans-
pose of the matrix X. This map is bijective.
Let X = (xij ) and Y = (yij ) be elements of (M(n, B))op . Then for any i and j, computing the product in M(n, Bop ), we have

(ψ(X)ψ(Y ))ij = ∑_{k=1}^{n} ψ(X)ik ⋅ ψ(Y )kj = ∑_{k=1}^{n} (X t )ik ⋅ (Y t )kj = ∑_{k=1}^{n} Xki ⋅ Yjk = ∑_{k=1}^{n} Yjk Xki = (YX)ji = ((YX)t )ij ,

where ⋅ denotes the multiplication of Bop and the unadorned products are taken in B. Since YX is the product of X and Y in (M(n, B))op , this shows that ψ is multiplicative, and hence an algebra isomorphism.
Since the endomorphism algebra of a simple module is a division algebra, and the
opposite algebra of a division algebra is also a division algebra, it follows that a semisim-
ple algebra is isomorphic to a direct sum of matrix algebras over division algebras. The
converse is a direct consequence of Theorem 22.3.13.
Theorem 22.3.22. The algebra A is simple if and only if it is isomorphic to a matrix alge-
bra over a division ring.
We see that an algebra is semisimple if and only if it is a direct sum of simple alge-
bras. This affirms the consistency of the choice of terminology.
Theorem 22.3.23. Suppose that the field K is algebraically closed. Then any semisimple
algebra is isomorphic to a direct sum of matrix algebras over K.
Proof. This follows directly from Lemma 22.3.19 and Theorem 22.3.21.
this case, the representation theory of groups is called ordinary representation theory. Recall that ℂ has characteristic 0 and is algebraically closed. For this section, G will denote a finite group, and all ℂG-modules are finitely generated, or equivalently have finite dimension as ℂ-vector spaces. From Theorem 22.3.21, we see that every nonzero ℂG-module is semisimple. It follows, from Wedderburn’s theorem, that we have very specific information about the nature of the group algebra ℂG.
Theorem 22.4.1. There are r ∈ ℕ and f1 , . . . , fr ∈ ℕ such that

ℂG ≅ M(f1 , ℂ) ⊕ ⋅ ⋅ ⋅ ⊕ M(fr , ℂ)

as algebras, and

ℂG ≅ f1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ fr Sr

as ℂG-modules, where S1 , . . . , Sr are the distinct simple ℂG-modules and dimℂ Si = fi for each i. Any ℂG-module can be written uniquely in the form a1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ ar Sr where all ai ∈ ℕ ∪ {0}.
Proof. The theorem follows from our results on the classification of simple and semisim-
ple algebras. The first statement follows from Corollary 22.2.24 and Theorem 22.3.23. The
second statement follows from Theorems 22.3.8 and 22.3.13, where we take Si as the space
of column vectors of length fi with the canonical module structure over the ith summand
M(fi , ℂ).
The final statement follows from Theorem 22.3.6.
Definition 22.4.2. The ℂ-dimensions f1 , . . . , fr of the r simple ℂG-modules are called the
degrees of the representations of G.
Comparing ℂ-dimensions in Theorem 22.4.1 gives

|G| = dimℂ (ℂG) = dimℂ (⨁_{i=1}^{r} M(fi , ℂ)) = ∑_{i=1}^{r} dimℂ M(fi , ℂ) = ∑_{i=1}^{r} fi².
We note that the degrees of G divide |G|. We do not need this fact. For a proof see
the appendix in the book [1].
Theorem 22.4.4. The number r of simple ℂG-modules is equal to the number of conjugacy classes of G.
Proof. Let Z be the center of ℂG; that is, the subalgebra of ℂG consisting of all elements that commute with every element of ℂG. From Theorem 22.4.1, it follows that Z is isomorphic to the center of M(f1 , ℂ) ⊕ ⋅ ⋅ ⋅ ⊕ M(fr , ℂ), and therefore is isomorphic to the direct sum of the centers of the M(fi , ℂ). It is straightforward that the center of M(fi , ℂ) is equal to the set of scalar matrices λI with λ ∈ ℂ; hence, dimℂ Z = r.
On the other hand, an element ∑_{g∈G} λg g of ℂG lies in Z if and only if for every h ∈ G we have

( ∑_{g∈G} λg g)h = h( ∑_{g∈G} λg g),

which leads to

∑_{g∈G} λg g = ∑_{g∈G} λg h⁻¹gh = ∑_{g∈G} λ_{hgh⁻¹} g.

Therefore, λg = λ_{hgh⁻¹} for all g, h ∈ G; that is, the coefficients are constant on the conjugacy classes of G. Hence, dimℂ Z equals the number of conjugacy classes of G, and comparing with dimℂ Z = r proves the theorem.
We note that for any ℂG-module U, we have χU (1) = dimℂ (U), since the identity element of G induces the identity transformation of U. Furthermore, if ρ : G → GL(U) is the representation corresponding to U, then χU (g) is just the trace of the map ρ(g). Thus, isomorphic ℂG-modules have equal characters.
If g, h ∈ G, then the linear transformations of U, defined by g and hgh−1 , have the
same trace. These linear transformations are called similar. Therefore, any character
is constant on each conjugacy class of G; that is, the value of the character on any two
conjugate elements is the same.
Example 22.4.6. Let U = ℂG and g ∈ G. By considering the matrix of the linear trans-
formation defined by g with respect to the basis G of ℂG, we get that χU (g) is equal to
the number of elements x ∈ G, for which gx = x. Therefore, we have χU (1) = |G| and
χU (g) = 0 for every g ∈ G with g ≠ 1. This character is called the regular character of G.
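The fixed-point description of the regular character is easy to verify computationally. The following Python sketch (an illustration, not from the text) checks it for G = S3 by counting fixed points of left multiplication:

```python
# Sketch: the regular character of G counts fixed points of left
# multiplication; chi(1) = |G| and chi(g) = 0 otherwise.  Checked for S3.
from itertools import permutations

G = list(permutations(range(3)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

identity = (0, 1, 2)

def regular_character(g):
    # trace of the permutation matrix of x -> g*x equals #{x : g*x = x}
    return sum(1 for x in G if compose(g, x) == x)

assert regular_character(identity) == 6
assert all(regular_character(g) == 0 for g in G if g != identity)
```

Indeed, gx = x forces g = 1, so only the identity contributes fixed points.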
Since one-dimensional modules are simple, we get that all linear characters are irreducible. Let χ be the linear character arising from the one-dimensional ℂG-module U, and let g, h ∈ G. Since U is one-dimensional, for any u ∈ U we have gu = χ(g)u and hu = χ(h)u. Then χ(gh)u = (gh)u = χ(g)χ(h)u. Hence, χ is a homomorphism from G to the multiplicative group ℂ⋆ = ℂ \ {0}. On the other hand, given a homomorphism ϕ : G → ℂ⋆ , we can define a one-dimensional ℂG-module U by gu = ϕ(g)u for g ∈ G and u ∈ U. Then χU = ϕ. It follows that the linear characters of G are precisely the homomorphisms from G to ℂ⋆ .
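For a cyclic group this correspondence is completely explicit. The following Python sketch (an illustration, not from the text) lists the linear characters of ℤ/nℤ as powers of a primitive n-th root of unity and verifies the homomorphism property numerically:

```python
# Sketch: the linear characters of the cyclic group Z/nZ are exactly
# chi_m(k) = exp(2*pi*i*m*k/n) for m = 0, ..., n-1; each is a
# homomorphism into the multiplicative group of nonzero complex numbers.
import cmath

n = 5

def chi(m, k):
    return cmath.exp(2j * cmath.pi * m * k / n)

for m in range(n):
    for a in range(n):
        for b in range(n):
            # chi_m(a + b) = chi_m(a) * chi_m(b): the homomorphism property
            assert abs(chi(m, (a + b) % n) - chi(m, a) * chi(m, b)) < 1e-9
```

Floating-point equality is tested only up to a small tolerance, since the roots of unity are computed numerically.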
eigenvalues are precisely the zeros of the minimal polynomial of ρ(g), which divides Xⁿ − 1. Consequently, these roots are nth roots of unity, which proves (iii), since χU (1) = dimℂ (U). Each eigenvector of ρ(g) is also an eigenvector for ρ(g⁻¹), with the eigenvalue for ρ(g⁻¹) being the inverse of the eigenvalue for ρ(g). Since the eigenvalues are roots of unity, and the inverse of a root of unity is its complex conjugate, it follows that χU (g⁻¹) is the complex conjugate of χU (g). From this we obtain (iv).
Now (v) follows directly from (iii). We have already seen that χU (g) is the sum of its
χU (1) eigenvalues, each of which is a root of unity. If the sum is equal to χU (1), then it
follows that each of these eigenvalues must be 1, in which case ρ(g) must be the identity
map. Conversely, if ρ(g) is the identity map, then χU (g) = dimℂ (U) = χU (1). Therefore,
{x ∈ G : χU (x) = χU (1)} = ker(ρ), and hence is a normal subgroup of G.
Proof. By considering a ℂ-basis for U ⊕ V , whose first dimℂ (U) elements form a ℂ-basis
for U ⊕ {0}, and whose remaining elements form a ℂ-basis for {0} ⊕ V , we get that
χU⊕V (g) = χU (g) + χV (g) for any g ∈ G.
Proof. The first statement follows directly from Lemma 22.4.10. Now, suppose that
χU = χV for some ℂG-modules U and V .
Since ℂG is semisimple, we can write U ≅ a1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ ar Sr and V ≅ b1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ br Sr
with ai , bi ∈ ℕ ∪ {0}. By taking characters, we have
0 = χU − χV = ∑_{i=1}^{r} (ai − bi )χi .

Since the irreducible characters χ1 , . . . , χr are linearly independent, it follows that ai = bi for all i, and hence U ≅ V .
Theorem 22.4.13. The irreducible characters for G form a basis for the ℂ-vector space of
class functions on G.
Proof. By Theorem 22.4.9, the irreducible characters of G are linearly independent el-
ements of the space of class functions. Their number equals the number of conjugacy
classes of G by Theorem 22.4.4, and this number is equal to the dimension of the space
of class functions.
Definition 22.4.14. If α, β are class functions on G, then their inner product is the complex number

⟨α, β⟩ = (1/|G|) ∑_{g∈G} α(g)β̅(g),

where β̅(g) denotes the complex conjugate of β(g).
This inner product is a traditional complex inner product on the space of class functions. Therefore, we have the following properties:
(1) ⟨α, α⟩ ≥ 0, and ⟨α, α⟩ = 0 if and only if α = 0;
(2) ⟨α, β⟩ is the complex conjugate of ⟨β, α⟩;
(3) ⟨λα, β⟩ = λ⟨α, β⟩ for all λ ∈ ℂ;
(4) ⟨α1 + α2 , β⟩ = ⟨α1 , β⟩ + ⟨α2 , β⟩.
For a ℂG-module U, let U G = {u ∈ U : gu = u for all g ∈ G} denote the subspace of G-fixed points. Then

dimℂ (U G ) = (1/|G|) ∑_{g∈G} χU (g).
Proof. Let a = (1/|G|) ∑_{g∈G} g ∈ ℂG. Clearly, ga = a for any g ∈ G, and hence a² = a. If T is the linear transformation of U defined by a, then T satisfies the equation X² − X = 0, and consequently, T is diagonalizable and the only eigenvalues of T are 0 and 1.
Let U1 ⊂ U be the eigenspace of T corresponding to the eigenvalue 1. If u ∈ U1 , then gu = gau = au = u for any g ∈ G; therefore, u ∈ U G . Conversely, suppose that u ∈ U G . Then

au = (1/|G|) ∑_{g∈G} gu = u,

and hence u ∈ U1 . It follows that U G = U1 . However, the trace of T is equal to the dimension of U1 , and since the trace of T is (1/|G|) ∑_{g∈G} χU (g), the result follows from the linearity of the trace map.
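The fixed-point formula can be tested directly on the permutation module of S3. The following Python sketch (an illustration, not from the text) averages the fixed-point counts χU (g) over G and recovers dim U^G = 1, the dimension of the span of (1, 1, 1):

```python
# Sketch: checking dim U^G = (1/|G|) sum_g chi_U(g) for the permutation
# module U = C^3 of G = S3, where chi_U(g) is the number of fixed points
# of g.  The fixed subspace U^G is spanned by (1, 1, 1), so it has
# dimension 1.
from itertools import permutations
from fractions import Fraction as F

G = list(permutations(range(3)))

def chi_U(g):                       # trace of the permutation matrix of g
    return sum(1 for i in range(3) if g[i] == i)

dim_fixed = sum(F(chi_U(g)) for g in G) / len(G)
assert dim_fixed == 1
```

The sum is 3 + 1 + 1 + 1 + 0 + 0 = 6, and 6/|S3| = 1, as the formula predicts.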
Recall that HomℂG (U, V ) is a ℂ-vector space with (ϕ + ψ)(u) = ϕ(u) + ψ(u) and (λϕ)(u) = λϕ(u) for any λ ∈ ℂ, u ∈ U, and ϕ, ψ ∈ HomℂG (U, V ).
Proof. We observe that HomℂG (U, V ) is a subspace of the ℂG-module Homℂ (U, V ). If ϕ ∈ HomℂG (U, V ) and g ∈ G, then (gϕ)(u) = gϕ(g⁻¹u) = gg⁻¹ϕ(u) = ϕ(u) for any u ∈ U. Hence, gϕ = ϕ for all g ∈ G. This implies that ϕ ∈ Homℂ (U, V )G . By reversing the argument, we get HomℂG (U, V ) = Homℂ (U, V )G .
Therefore,

dimℂ HomℂG (U, V ) = (1/|G|) ∑_{g∈G} χ_{Homℂ(U,V)} (g) = (1/|G|) ∑_{g∈G} χV (g)χ̅U (g) = ⟨ χV , χU ⟩,

using the fixed-point formula above and the fact that the character of the ℂG-module Homℂ (U, V ) is g ↦ χ̅U (g)χV (g).
Theorem 22.4.18 (First orthogonality relation). Let χ1 , . . . , χr be the set of irreducible char-
acters of G. Then
⟨χi , χj ⟩ = (1/|G|) ∑_{g∈G} χi (g)χ̅j (g) = 1 if i = j, and 0 if i ≠ j.
In other words, the irreducible characters form an orthonormal set with respect to
the defined inner product.
Proof. Let S1 , . . . , Sr be the distinct simple ℂG-modules that go with the irreducible characters. From the previous theorem, we have

⟨χi , χj ⟩ = dimℂ HomℂG (Sj , Si )

for any i, j. We further have HomℂG (Si , Si ) ≅ ℂ, and by Schur’s lemma HomℂG (Si , Sj ) = {0} for i ≠ j, proving the theorem.
Corollary 22.4.19. The set of irreducible characters forms an orthonormal basis for the vector space of class functions.
Proof. The irreducible characters form a basis for the space of class functions (Theorem 22.4.13), and from the orthogonality result they are an orthonormal set relative to the inner product.
The second orthogonality relation says that the columns of the character table are
also a set of orthogonal vectors. That is, the irreducible characters of a set of conju-
gacy class representatives also forms an orthogonal set with respect to the defined inner
product.
More precisely, with g1 , . . . , gr a set of conjugacy class representatives and ki the number of elements in the class of gi :

∑_{s=1}^{r} χs (gi )χ̅s (gj ) = 0 if i ≠ j, and = |G|/ki if i = j.
Proof. Let χ = (χi (gj ))1≤i,j≤r be the character table of G, and let K be the r × r diagonal matrix with k1 , . . . , kr on its main diagonal. Then we have (χK)ij = χi (gj )kj for any i, j, and hence

(χK χ̅ᵗ)ij = ∑_{ℓ=1}^{r} kℓ χi (gℓ )χ̅j (gℓ ) = ∑_{g∈G} χi (g)χ̅j (g) = |G|δij

by the first orthogonality relation, where χ̅ᵗ denotes the conjugate transpose of χ. Therefore, χK χ̅ᵗ = |G|I, so χ is invertible and also χ̅ᵗχK = |G|I. Comparing entries gives

∑_{ℓ=1}^{r} χ̅ℓ (gi )χℓ (gj )kj = |G|δij ,

and hence

0 = ∑_{ℓ=1}^{r} χℓ (gj )χ̅ℓ (gi ) for i ≠ j, and ∑_{ℓ=1}^{r} χℓ (gi )χ̅ℓ (gi ) = |G|/ki ,

as claimed.
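Both orthogonality relations can be checked numerically on a concrete character table. The following Python sketch (an illustration, not from the text) does so for S3, whose three characters are real, so complex conjugation can be omitted:

```python
# Sketch: both orthogonality relations checked on the (real-valued)
# character table of S3.  Rows are the irreducible characters, columns
# the classes of e, a transposition, and a 3-cycle, of sizes 1, 3, 2.
from fractions import Fraction as F

sizes = [1, 3, 2]
table = [
    [1, 1, 1],     # trivial character
    [1, -1, 1],    # sign character
    [2, 0, -1],    # degree-2 character
]
order = sum(sizes)   # |S3| = 6

# First orthogonality: <chi_i, chi_j> = delta_ij
for i in range(3):
    for j in range(3):
        inner = sum(F(k * table[i][c] * table[j][c], order)
                    for c, k in enumerate(sizes))
        assert inner == (1 if i == j else 0)

# Second orthogonality: sum_s chi_s(g_i) chi_s(g_j) = delta_ij * |G| / k_i
for a in range(3):
    for b in range(3):
        col = sum(table[s][a] * table[s][b] for s in range(3))
        assert col == (F(order, sizes[a]) if a == b else 0)
```

The row check averages over the group with class-size weights; the column check confirms, for example, that the column of the identity gives 1 + 1 + 4 = 6 = |G|/1.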
As mentioned before, more information about character tables and their conse-
quences can be found in [1].
Lemma 22.5.1. Let χ be a character of G. The value χ(g) for any g ∈ G is an algebraic
integer.
Proof. For any g ∈ G, the value χ(g) is a sum of roots of unity. Any root of unity satisfies the monic integral polynomial Xⁿ − 1 for some n ∈ ℕ, and hence is an algebraic integer. Since the algebraic integers form a ring, any sum of roots of unity is an algebraic integer.
Lemma 22.5.2. Let χ be an irreducible character of G. Let g ∈ G and CG (g) the centralizer of g in G. Then

|G : CG (g)|χ(g)/χ(1)

is an algebraic integer.
Proof. Let C be the conjugacy class of g, let α = ∑_{x∈C} x ∈ ℂG, which is a central element of ℂG since C is closed under conjugation, and set

λ = |G : CG (g)|χ(g)/χ(1).

Let τ : ℂG → ℂG, τ(z) = zα for z ∈ ℂG. We get τ ∈ EndℂG (ℂG) by the proof of Lemma 22.3.6. Let S be the simple ℂG-module affording χ. We may consider S as a submodule of ℂG, and for 0 ≠ s ∈ S ⊂ ℂG, we have τ(s) = sα = αs = λs, since α is a central element.
Since χ is irreducible, the first orthogonality relation gives

|G|/χ(1) = (1/χ(1)) ∑_{i=1}^{r} |G : CG (gi )| χ(gi )χ̅(gi ) = ∑_{i=1}^{r} (|G : CG (gi )|χ(gi )/χ(1)) χ̅(gi ),

where g1 , . . . , gr are conjugacy class representatives. By Lemmas 22.5.1 and 22.5.2, the right-hand side is an algebraic integer; hence |G|/χ(1) is a rational algebraic integer, that is, χ(1) divides |G|.
Theorem 22.5.5. If G has a conjugacy class of nontrivial prime power order, then G is not
simple.
Proof. Suppose that G is simple and that the conjugacy class of 1 ≠ g ∈ G has order pⁿ with p a prime number and n ∈ ℕ. From the second orthogonality relation, applied to the classes of g and 1, we get

0 = 0/p = (1/p) ∑_{i=1}^{r} χi (g)χi (1) = 1/p + (1/p) ∑_{i=2}^{r} χi (g)χi (1),
where χ1 , χ2 , . . . , χr are the irreducible characters of G (recall that χ1 is the principal char-
acter).
Since −1/p is not an algebraic integer, it follows that χi (g)χi (1)/p is not an algebraic integer for some 2 ≤ i ≤ r. As χi (g) is an algebraic integer, this implies that p ∤ χi (1) and χi (g) ≠ 0. Now |G : CG (g)| = pⁿ is relatively prime to χi (1).
Therefore,
Zi /Ki = Z(G/Ki ),
Theorem 22.5.6 (Burnside’s Theorem). If |G| = pa qb , where p and q are prime numbers
and a, b ∈ ℕ, then G is solvable.
pᵃqᵇ = |G| = 1 + h2 + h3 + ⋅ ⋅ ⋅ + hr ,

where h2 , . . . , hr are the numbers of elements in the nontrivial conjugacy classes of G (the class equation).
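The class equation is easy to compute for a small group of order pᵃqᵇ. The following Python sketch (an illustration, not from the text) derives it for S3, whose order 6 = 2 · 3 has exactly this form:

```python
# Sketch: the class equation for G = S3, whose order 6 = 2 * 3 has the
# form p^a q^b of Burnside's theorem.  The class sizes are 1, 2, 3 and
# 6 = 1 + 2 + 3.
from itertools import permutations

G = list(permutations(range(3)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

classes = {frozenset(compose(h, compose(g, inverse(h))) for h in G) for g in G}
sizes = sorted(len(c) for c in classes)
assert sizes == [1, 2, 3]
assert sum(sizes) == len(G) == 6
```

Of course S3 is visibly solvable; Burnside's theorem guarantees the same conclusion for every group of order pᵃqᵇ.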
22.6 Exercises
1. Let K be a field, and let G be a finite group. Let U and V be KG-modules having the
same dimension n, and let ρ : G → GL(U) and τ : G → GL(V ) be the corresponding
representations.
By fixing K-bases for U and V , consider ρ and τ as homomorphisms from G to
GL(n, K). Show that U and V are KG-module isomorphic if and only if there exists
some M ∈ GL(n, K) such that ρ(g)M = Mτ(g) for every g ∈ G.
2. Let K be a field, and let G be a finite group. Let x = ∑g∈G g ∈ KG.
(i) Show that the subspace Kx of KG is the unique submodule of KG that is isomorphic to the trivial module.
(ii) Let ϵ : KG → K be the KG-module epimorphism defined by ϵ(g) = 1 for all
g ∈ G.
Show that ker(ϵ) is the unique KG-submodule of KG, whose quotient is isomor-
phic to the trivial module. This kernel is called the augmentation ideal of KG.
(iii) Suppose that char(K) = p, with p dividing |G|. Show that Kx ⊂ ker(ϵ), the augmentation ideal of KG. Show that ker(ϵ) is not a direct summand of KG, and hence that the KG-module KG is not semisimple.
3. Show that the converse of Corollary 22.2.24 is true.
4. Let U be a finite-dimensional K-vector space and let G be a finite group with fully
reducible representation ρ : G → GL(U). Show that ρ gives a direct decomposition
U = V1 ⊕ ⋅ ⋅ ⋅ ⊕ Vk
Cryptography refers to the science of sending and receiving coded messages. Coding and
hidden ciphering is an old endeavor used by governments and military, and between
private individuals from ancient times. Recently, it has become even more prominent
because of the necessity of sending secure and private information, such as credit card
numbers and passwords, over essentially open communication systems.
Traditionally, cryptography deals with devising and implementing secret codes or cryptosystems. Cryptanalysis is the science of breaking cryptosystems, while cryptology refers to the whole field of cryptography plus cryptanalysis.
A cryptosystem or code is an algorithm to change a plain message, called the plain
text message, into a coded message, called the ciphertext message. In general, both the
plaintext message (uncoded message) and the ciphertext message (coded message) are
written in some N-letter alphabet which is usually the same for both plaintext and code.
The method of coding, or the encoding algorithm, is then a transformation of the N let-
ters. The most common way to perform this transformation is to consider the N letters
as N integers modulo N and then apply a number theoretical function to them. There-
fore, many encoding algorithms use modular arithmetic and hence cryptography is tied
to number theory and Abelian groups.
Modern cryptography is usually separated into classical cryptography, called sym-
metric key cryptography, and public key cryptography. In the former, both the encoding
and decoding algorithms are supposedly known only to the sender and receiver, usually
referred to as Bob and Alice. In the latter, the encryption method is public knowledge
but only the receiver knows how to decode.
The message that one wants to send is written in plaintext and then converted into
code. The coded message is written in ciphertext. The plaintext message and the cipher-
text message are written in some alphabets that are usually the same. The process of
putting the plaintext into code is called enciphering or encryption while the reverse pro-
cess is called deciphering or decryption.
Encryption algorithms break the plaintext and ciphertext message into message
units. These are single letters or, more generally, k-vectors of letters. The transforma-
tions are done in these message units and the encryption algorithm is a mapping from
the set of plaintext message units to the set of ciphertext message units.
Putting this into a mathematical formulation, we let 𝒫 be the set of all plaintext
message units and 𝒞 be the set of all ciphertext message units. The encryption algorithm
is then the application of an injective map f : 𝒫 → 𝒞 . The map f is the encryption map. The
left inverse map g: 𝒞 → 𝒫 is the decryption or deciphering map. The collection {𝒫 , 𝒞 , f , g}
is called a basic cryptosystem.
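As an illustrative toy instance of a basic cryptosystem {𝒫 , 𝒞 , f , g}, the following sketch takes 𝒫 = 𝒞 = ℤ26 and uses a shift by 3 as the encryption map f; the shift amount is an assumed key, not from the text.

```python
# Toy basic cryptosystem {P, C, f, g}: P = C = Z_26, f(x) = x + 3 mod 26,
# and g is the left inverse of f, so g(f(x)) = x for every plaintext unit.

N = 26  # size of the alphabet

def f(x):          # encryption map f : P -> C (injective)
    return (x + 3) % N

def g(y):          # decryption map g : C -> P, left inverse of f
    return (y - 3) % N

plaintext = [ord(ch) - ord('A') for ch in "CAESAR"]
ciphertext = [f(x) for x in plaintext]
recovered = [g(y) for y in ciphertext]

assert recovered == plaintext
assert ''.join(chr(y + ord('A')) for y in ciphertext) == "FDHVDU"
```

Here the key (the shift 3) must be kept secret by both parties, which is the defining feature of a symmetric key system.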
We may place this in a more general context. We call this wider model a (general)
cryptosystem, indexed by a set 𝒦, called the key space. Formally, a cryptosystem is a tuple
(𝒫 , 𝒞 , 𝒦, ℰ , 𝒟) where 𝒫 is the set of plaintext message units, called the plaintext space, 𝒞
is the set of ciphertext message units, called the ciphertext space, the elements k ∈ 𝒦 are
called keys, ℰ is a set of injective maps fk : 𝒫 → 𝒞 indexed by the key space. This is called
the set of encryption maps. Hence, for each k ∈ 𝒦, there is an injective map fk : 𝒫 → 𝒞 .
The set 𝒟 consists of maps gk : 𝒞 → 𝒫 , also indexed by the key space. This is called the
set of decryption maps.
The central property of a cryptosystem is that, for each k ∈ 𝒦, there exists a corresponding key k ′ ∈ 𝒦 and a decryption map gk ′ : 𝒞 → 𝒫 such that gk ′ is the left inverse
of fk . In our previous language this means that for each k ∈ 𝒦 we have a basic cryptosys-
tem {𝒫 , 𝒞 , fk , gk ′ } with k the encryption key and k ′ the decryption key.
Using this model, we can easily distinguish symmetric from asymmetric cryptosys-
tems. In a symmetric key cryptosystem, if the encryption key k is given, it is easy to
find the corresponding decryption key k ′ . In fact, most of the time we have k = k ′ . In
an asymmetric or public key cryptosystem, even if the encryption key k is known, it is
infeasible to find or to compute the corresponding decryption key k ′ .
In the following, we describe some cryptosystems and start with the symmetric key
cryptosystems.
The simplest symmetric key cryptosystem is a shift cipher: fix an integer b and encrypt each message unit a ∈ ℤN by a ↦ a + b (mod N); decryption is the shift by −b.
This is often known as a Caesar code after Julius Caesar, who supposedly invented it.
Any permutation encryption algorithm is very simple to attack using statistical analysis. Polyalphabetic ciphers are an attempt to thwart statistical attacks. One variation of
the basic Caesar code, where message units are t-vectors, is the following. It is actually
a type of polyalphabetic cipher called a Vigenère code. In this code, message units are
considered as t-vectors of integers modulo N from an N-letter alphabet. Let (b1 , . . . , bt )
be a fixed t-vector in ℤ_N^t. This Vigenère code then takes a message unit (a1 , . . . , at ) to
(a1 + b1 , . . . , at + bt ) (mod N). For a long period of time, polyalphabetic ciphers were
considered unbreakable. In 1920, the Friedman test was developed. Given a sequence
of letters of length m representing a Vigenère-encrypted ciphertext, the Friedman test
calculates the length t of the key word (b1 , . . . , bt ); see, for instance, [66]. A statistical
analysis then allows one to break the Vigenère code.
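The Vigenère encryption of message units described above can be sketched as follows; the key word is an illustrative choice.

```python
# Vigenere cipher over Z_26: the fixed key vector (b_1, ..., b_t) is added
# componentwise modulo N to consecutive blocks of the message.

N = 26

def vigenere_encrypt(message, key):
    t = len(key)
    return [(a + key[i % t]) % N for i, a in enumerate(message)]

def vigenere_decrypt(cipher, key):
    t = len(key)
    return [(c - key[i % t]) % N for i, c in enumerate(cipher)]

msg = [ord(ch) - ord('A') for ch in "ATTACKATDAWN"]
key = [ord(ch) - ord('A') for ch in "LEMON"]

cipher = vigenere_encrypt(msg, key)
assert vigenere_decrypt(cipher, key) == msg
assert ''.join(chr(c + ord('A')) for c in cipher) == "LXFOPVEFRNHR"
```

Note that letters repeated in the plaintext need not encrypt to the same ciphertext letter, which is exactly what defeats a single-letter frequency analysis.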
Shannon, see [98], proved that the one-time pad, a Vigenère-type cipher whose key
vector is random and as long as the message itself, is perfectly secure, as long as the
keys are randomly chosen and used only once.
Theorem 23.1.2. A one-time pad has perfect security if the keys are randomly chosen from
the uniform distribution of keys and a key is used only once.
Although the one-time pad is theoretically secure there are many problems with
its practical use because of the assumptions described above. For these reasons the one-
time pad, while important theoretically, is not used to a great extent in encryption. How-
ever, a stream cipher is a method to attempt to mimic the important properties of the
one-time pad. A stream cipher is a symmetric key cipher where plaintext characters are
combined with a pseudo-random character sequence called the key stream. In a stream cipher
the plaintext characters are encrypted one at a time and the encryption of successive
characters varies during the encryption.
Stream ciphers require sequences of pseudo-random digits. These are sequences
that behave as if they are random. Here we will discuss a procedure to generate pseudo-
random sequences and hence stream cipher key generation. First we need the concept
of a linear congruence generator. For a given natural number n we denote by ℤn the
ring of integers modulo n. Elements of ℤn are residue classes of integers modulo n. If a
is an integer, we will denote the corresponding residue class in ℤn by ā.
A linear congruence generator is a function f : ℤn → ℤn , x ↦ ax + b, with gcd(a, n) = 1.
The parameters a and b should be chosen such that f has no fixed point in ℤn . Then b ≠ 0, for otherwise 0 is
a fixed point. Hence, let b ≠ 0. If a = 1, then f has no fixed point, but then the function is
just a linear shift, which is insecure. Therefore, let a ≠ 1. Then f has a fixed point in ℤn if
gcd(a − 1, n) = 1, because then there exists a d ∈ ℤ with d(a − 1) ≡ 1 (mod n), and x = −db is
a fixed point in ℤn . Therefore, altogether, for a linear congruence generator we should
choose a and b such that gcd(a − 1, n) > 1, a ≠ 1, and b ≠ 0.
Using the idea of a linear congruence generator, we now give a procedure for the
generation of a stream cipher.
1. Choose a seed s ∈ ℤ by key agreement or as a random number.
2. Let n ∈ ℕ, a, b ∈ ℤ, and f : ℤn → ℤn , x ↦ ax + b, be a linear congruence generator.
Define the sequence x0 = s, x1 ≡ f (x0 ) (mod n), x2 ≡ f (x1 ) (mod n), . . . .
3. Transform the sequence of plaintext units into a sequence of residue classes
m0 , m1 , . . . in ℤn .
4. Encrypt the mi into ci = mi + xi ∈ ℤn . The secret key is s ∈ ℤn .
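The four steps above can be sketched as follows; the modulus n = 2^8, the parameters a and b, and the seed are illustrative choices satisfying the conditions derived above (gcd(a − 1, n) > 1, a ≠ 1, b ≠ 0).

```python
# Stream cipher from a linear congruence generator f(x) = a*x + b mod n.
# The key stream x_1, x_2, ... is added to the plaintext units m_i.

n, a, b = 2**8, 5, 3      # a odd, a = 1 (mod 4), b odd: maximal period 2^8
seed = 17                 # the secret key s

def key_stream(s, length):
    xs, x = [], s
    for _ in range(length):
        x = (a * x + b) % n        # x_{k+1} = f(x_k)
        xs.append(x)
    return xs

def encrypt(ms, s):
    return [(m + x) % n for m, x in zip(ms, key_stream(s, len(ms)))]

def decrypt(cs, s):
    return [(c - x) % n for c, x in zip(cs, key_stream(s, len(cs)))]

message = [72, 101, 108, 108, 111]            # plaintext units in Z_256
assert decrypt(encrypt(message, seed), seed) == message
```

With these parameters the generator attains the maximal period 2^8 = 256 by Theorem 23.1.4 below, so the key stream visits every residue class before repeating.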
Theorem 23.1.4 (Maximal period length for n ≥ 2). Let n ∈ ℕ with n = 2^m, m ≥ 1, and let
a, b ∈ ℤ be such that f : ℤn → ℤn , x ↦ ax + b, is a linear congruence generator. Further, let
s ∈ {0, 1, . . . , n − 1} be given, x0 = s, x1 = f (x0 ), . . . . Then the sequence x0 , x1 , . . . is periodic
with the maximal period length n = 2^m if and only if the following hold:
(1) a is odd.
(2) If m ≥ 2, then a ≡ 1 (mod 4).
(3) b is odd.
Proof. We show that (1), (2), and (3) hold if the period length is maximal. First, we must
have gcd(a, n) = 1 since f is a linear congruence generator; in particular, a is odd. Further, f has no fixed point
because the period length is maximal. Since the period length is maximal, we may assume
that x0 = 0. Then x1 = b, and recursively

xk = (1 + a + ⋅ ⋅ ⋅ + a^{k−1})b

for k ≥ 1. All elements in the sequence are multiples of b. There
is an xi = 1, and therefore b is invertible in ℤn ; hence b is odd.
We show that a ≡ 3 (mod 4) is not possible if m ≥ 2. Suppose that a ≡ 3 (mod 4) and m ≥ 2.
Then a + 1 ≡ 0 (mod 4), and it follows that

x2k = (1 + a)(1 + a² + ⋅ ⋅ ⋅ + a^{2k−2})b ≡ 0 (mod 4)

for k ≥ 1, while the elements x2k+1 are odd. Hence no element congruent to 2 modulo 4
occurs in the sequence, contradicting the maximal period length n = 2^m ≥ 4. Therefore
a ≡ 1 (mod 4) if m ≥ 2.
Now, assume that (1), (2), and (3) are satisfied. The theorem follows directly if n = 2,
since then if x0 = 0 we have x1 = 1, and if x0 = 1 we have x1 = 0. Now suppose that m ≥ 2,
so that n ≥ 4. We show that we may obtain the maximal period length n = 2^m for x0 = 0, which
proves the theorem.
Let x0 = 0. Then, as before, we obtain recursively xk = (1 + a + ⋅ ⋅ ⋅ + a^{k−1})b for k ≥ 1.
Since b is odd, we have xk = 0 if and only if 1 + a + ⋅ ⋅ ⋅ + a^{k−1} = 0 in ℤn .
We write k = 2^r t with r ≥ 0 and t odd. Then

1 + a + ⋅ ⋅ ⋅ + a^{k−1} = (1 + a + ⋅ ⋅ ⋅ + a^{2^r −1})(1 + a^{2^r} + (a^{2^r})² + ⋅ ⋅ ⋅ + (a^{2^r})^{t−1}).

The second factor is congruent to 1 modulo 2, and hence 2^m | (1 + a + ⋅ ⋅ ⋅ + a^{k−1}) if and only
if 2^m | (1 + a + ⋅ ⋅ ⋅ + a^{2^r −1}). The integer 1 + a + ⋅ ⋅ ⋅ + a^{2^r −1} is divisible by 2^r, since it is the
sum of 2^r odd numbers, but not divisible by 2^{r+1}. It follows that xk = 0 if and only if r ≥ m,
which holds if and only if 2^m | k.
Therefore, xk = 0 occurs for k ≥ 1 for the first time when k = n = 2^m.
We now describe some of the current public key cryptosystems. We start with the
RSA cryptosystem named after R. Rivest, A. Shamir, and L. Adleman.
Alice chooses two large distinct primes p and q and computes n = pq. For the Euler φ-function, we have

φ(n) = pq − p − q + 1
= (p − 1)(q − 1).
Now Alice computes two numbers e, s ≥ 3 such that es ≡ 1 (mod φ(n)). The number s
should be large; otherwise, the private key (n, s) is insecure due to an attack by Wiener,
see [104]. Assume that the plaintext message is given by an integer x ∈ {0, 1, . . . , n − 1}.
The public key is the pair (n, e), and the encryption is done by x ↦ x^e (mod n). Alice
decrypts by y ↦ y^s (mod n).
Now, let y ≡ x^e (mod n). If es = 1 + (p − 1)k, then

y^s ≡ x^{es} ≡ x ⋅ (x^{p−1})^k ≡ x (mod p),

since (x^{p−1})^k ≡ 1 (mod p) by Fermat's theorem if p ∤ x, while both sides are ≡ 0 (mod p)
if p | x. Similarly, y^s ≡ x (mod q), and hence y^s ≡ x (mod n), so Alice recovers the
plaintext x.
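A numerical sketch of these RSA steps with toy primes (far too small for real security):

```python
# RSA with toy primes: key generation, encryption x -> x^e mod n,
# decryption y -> y^s mod n.

p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, gcd(e, phi) = 1
s = pow(e, -1, phi)            # private exponent with e*s = 1 (mod phi)

x = 1234                       # plaintext in {0, 1, ..., n-1}
y = pow(x, e, n)               # encryption with the public key (n, e)
assert pow(y, s, n) == x       # decryption with the private key (n, s)
```

The security rests on the difficulty of factoring n: knowing p and q immediately gives φ(n), and hence s.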
We next describe the ElGamal cryptosystem. One chooses a large prime p, a generator g of the cyclic group ℤ∗p , and a standard, publicly known way
to encrypt plaintext into residue classes within ℤ∗p . For each message transmission, the
user's public key is (p, g, A), where A = g^a for some secret integer a.
The encryption works as follows. Suppose that Bob wants to send a message to Alice.
Alice's public key is (p, g, A) as above. The message is m and, as above, is encrypted in
some workable efficient manner within ℤ∗p , that is, the message is encrypted in a manner known to all users as an integer in {0, 1, . . . , p − 1}. Bob now randomly chooses an
integer b and computes B = g^b. He now sends to Alice (B, mC), where C = g^{ab}. To decrypt,
Alice first uses B to determine the common shared key C: since B = g^b and she knows
her secret exponent a, she can compute C = B^a = g^{ab} modulo p. Hence, she can compute the inverse C^{−1} = g^{−ab}
to obtain the message m.
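A numerical sketch of the ElGamal encryption with toy parameters; p = 23 with generator g = 5 is an illustrative choice.

```python
# ElGamal over Z_p^* with toy parameters p = 23, g = 5 (5 generates Z_23^*).

p, g = 23, 5
a = 6                          # Alice's secret exponent
A = pow(g, a, p)               # Alice's public key is (p, g, A)

m = 20                         # message, encoded as an element of Z_p^*
b = 15                         # Bob's random exponent
B = pow(g, b, p)               # Bob sends (B, m*C) with C = g^(a*b)
C = pow(A, b, p)
ciphertext = (B, (m * C) % p)

# Alice recovers C = B^a and multiplies by its inverse modulo p
B_recv, mC = ciphertext
C_alice = pow(B_recv, a, p)
assert (mC * pow(C_alice, -1, p)) % p == m
```

Bob must choose a fresh random b for every transmission; reusing b leaks the ratio of two messages.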
The security certificate of the ElGamal cryptosystem is based on the difficulty of the
Computational Diffie–Hellman problem (CDH) for ℤ∗p : given a prime p, a generator g of
ℤ∗p , g^a modulo p, and g^b modulo p, determine g^{ab} modulo p. Certainly, the CDH can be
formulated for each cyclic group G = ⟨g⟩: the CDH is the problem to find g^{ab} given the two
elements g^a and g^b. At present, the only known solution of the CDH is to solve the discrete
logarithm problem (DLP): for G = ⟨g⟩ a cyclic group and h ∈ G, find a ∈ ℤ such
that h = g^a.
The DLP appears to be very hard for large orders |G| of G. Solving the DLP for ℤ∗p
breaks the ElGamal cryptosystem, as does solving the CDH. It is not known whether the
CDH can be solved without solving the DLP. The ElGamal encryption becomes the basis
for elliptic curve cryptography which we discuss briefly.
Recall that an elliptic curve over a field K of characteristic different from 2 and 3 is given by an equation y² = x³ + ax + b with a, b ∈ K and 4a³ + 27b² ≠ 0. We write E(K) for the set of solutions (x, y) ∈ K × K of this equation together with an additional point at infinity, denoted 𝒪.
The important thing about elliptic curves from the viewpoint of cryptography is that a
group structure can be placed on E(K). In particular, we define the operation + on E(K)
by:
1. 𝒪 + P = P for any point P ∈ E(K).
2. If P = (x, y), then −P = (x, −y) and −𝒪 = 𝒪.
3. P + (−P) = 𝒪 for any point P ∈ E(K).
4. If P1 = (x1 , y1 ) and P2 = (x2 , y2 ) such that P1 ≠ −P2 , then P1 + P2 = (x3 , y3 ) with
x3 = m² − (x1 + x2 ) and y3 = −m(x3 − x1 ) − y1 , where m = (y2 − y1 )/(x2 − x1 )
if x2 ≠ x1 and m = (3x1² + a)/(2y1 ) if x2 = x1 .
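The group law above can be sketched over a prime field ℤp, where the slope m is computed with modular inverses; the curve y² = x³ + x + 6 over ℤ11 is an illustrative choice.

```python
# Addition on an elliptic curve y^2 = x^3 + a*x + b over Z_p, following
# rules 1-4 above. The point at infinity O is represented by None.

p, a, b = 11, 1, 6                 # the curve y^2 = x^3 + x + 6 over Z_11

def neg(P):
    if P is None:
        return None                # -O = O
    x, y = P
    return (x, (-y) % p)

def add(P, Q):
    if P is None:
        return Q                   # O + Q = Q
    if Q is None:
        return P                   # P + O = P
    if Q == neg(P):
        return None                # P + (-P) = O
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:                   # doubling: m = (3*x1^2 + a) / (2*y1)
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                          # chord: m = (y2 - y1) / (x2 - x1)
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (-m * (x3 - x1) - y1) % p
    return (x3, y3)

def mult(k, P):                    # k-fold sum P + ... + P
    R = None
    for _ in range(k):
        R = add(R, P)
    return R

P = (2, 7)                         # on the curve: 7^2 = 5 = 2^3 + 2 + 6 in Z_11
assert mult(2, P) == (5, 2)
assert mult(13, P) is None         # P has order 13 in E(Z_11)
```

This is the same curve as in Exercise 9(b) below, whose group E(ℤ11) is cyclic of order 13.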
Theorem 23.1.5. If K is a finite field of order p^k, then the group E(K) is either cyclic or is
isomorphic to ℤm1 × ℤm2 with m1 | m2 and m1 | (p^k − 1).
Theorem 23.1.6 (Hasse’s Theorem). Let I = [p+1−2√p, p+1+2√p]∩ℕ. Then there exists
for each k ∈ I at least one elliptic curve with |E(ℤp )| = k.
For an elliptic curve public key cryptosystem, one fixes an elliptic curve E over ℤp ,
a point P ∈ E(ℤp ) of large order, a secret integer d, and a publicly known injective
encoding ρ of the set ℳ of plaintext message units into E(ℤp ) \ {𝒪}.
The public key is (P, dP) and the elliptic curve itself. The secret key is d.
For encryption, let m ∈ ℳ be a plaintext message unit. Calculate Q = ρ(m). Choose
a random integer k and define c = (kP, Q + k(dP)) ∈ 𝒞 , where 𝒞 is the set of ciphertext
units. This is the encrypted message unit.
For decryption, let C = (c1 , c2 ) ∈ 𝒞 be a ciphertext unit. Calculate Q = c2 − dc1
and m = ρ^{−1}(Q), the preimage of Q. Recall that Q ∈ E(ℤp ) \ {𝒪} if Q = ρ(m) and (c1 , c2 ) =
(kP, Q + k(dP)). The elliptic curve public key cryptosystem provides a valid cryptosystem:
if (c1 , c2 ) = (kP, Q + k(dP)), then c2 − dc1 = Q = ρ(m). The security certificate of the elliptic
curve public key cryptosystem is also based on the difficulty of the Computational Diffie–
Hellman problem for E(ℤp ). For this, care should be taken that the discrete logarithm
problem in E(ℤp ) is difficult. Elliptic curve public key cryptosystems are at present the
most important commutative alternatives to the use of the RSA algorithm. There are
several reasons for that. They are more efficient in many cases than RSA and keys in
elliptic curve systems are much smaller than keys in RSA.
Besides secure confidential message transmission, there are many other tasks that are
important in cryptography, both symmetric key and public key. Although it is not entirely precise, we say that a cryptographic task is a situation in which one or more parties
must communicate with some degree of secrecy. The set of algorithms and procedures needed to
accomplish a cryptographic task is called a cryptographic protocol. A cryptosystem is just
one type of a cryptographic protocol. More formally, suppose that several parties want
to manage a cryptographic task. Then they must communicate with each other and co-
operate. Hence, each party must follow certain rules and implement a certain algorithm
that they agreed upon.
We now discuss some cryptographic tasks that we will occasionally refer to in this
book; many more can be found in detail in the book [66].
In Shamir's (t, n) secret sharing scheme, a dealer wants to distribute a secret S among
n participants so that any t of them can recover it. The dealer chooses a finite field K
with more than n elements and a polynomial

p(x) = a0 + a1 x + ⋅ ⋅ ⋅ + at−1 x^{t−1},

where a0 = S is the secret and a1 , . . . , at−1 ∈ K. The dealer chooses pairwise distinct
xi ∈ K \ {0}, i = 1, . . . , n, which are stored in a public area. The dealer calculates yi =
p(xi ), i = 1, . . . , n, and distributes them to the n participants via a secure channel so that each
participant pi gets one share yi .
For the secret recovery we use the Lagrange interpolation. We can construct the
Lagrange interpolating polynomial with respect to (x1 , y1 ), . . . , (xn , yn ), all xi ∈ K \ {0}
pairwise distinct, as
p(x) = ∑_{i=1}^{t} yi li (x),

where li (x) = ∏_{j=1, j≠i}^{t} (x − xj )/(xi − xj ). Clearly, p(x) is a polynomial of degree at most t − 1. In particular, the secret a0 will be

a0 = p(0) = ∑_{i=1}^{t} yi ∏_{j=1, j≠i}^{t} (−xj )/(xi − xj ).
This scheme is perfect in the sense that for t − 1 participants any secret S ∈ K is equally
likely.
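A sketch of the scheme over a prime field K = ℤp; the prime, the threshold, and the secret are illustrative choices.

```python
# Shamir (t, n) threshold scheme over Z_p: shares are (x_i, p(x_i)) and
# any t of them recover a_0 = p(0) by Lagrange interpolation at 0.

import random

p = 257                        # field Z_p with |K| > n
t, n = 3, 5                    # threshold t, number of participants n

secret = 42
coeffs = [secret] + [random.randrange(p) for _ in range(t - 1)]

def poly(x):
    return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p

shares = [(x, poly(x)) for x in range(1, n + 1)]   # distinct nonzero x_i

def recover(subset):           # subset: any t shares (x_i, y_i)
    total = 0
    for i, (xi, yi) in enumerate(subset):
        li = 1                 # Lagrange coefficient l_i(0) modulo p
        for j, (xj, _) in enumerate(subset):
            if j != i:
                li = li * (-xj) * pow(xi - xj, -1, p) % p
        total = (total + yi * li) % p
    return total

assert recover(shares[:t]) == secret
assert recover(shares[2:2 + t]) == secret
```

Any t − 1 shares give no information: for each candidate secret there is exactly one polynomial of degree at most t − 1 through them, which is the perfectness stated above.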
We now describe a geometric alternative scheme which depends on the closest vec-
tor theorem. Let W be a real inner product space and V be a subspace of finite dimen-
sion t. Suppose that w⃗ ∈ W and {e⃗1 , . . . , e⃗t } is an orthonormal basis of V . Note that, given
any basis for the subspace V , the Gram–Schmidt orthonormalization procedure can be
used to find an orthonormal basis for V . Suppose that w⃗ ∈ W is not in V . Then the unique
vector w⃗ ∗ ∈ V closest to w⃗ is given by

w⃗ ∗ = ∑_{i=1}^{t} ⟨w⃗ , e⃗i ⟩ e⃗i .
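This orthogonal projection can be sketched in plain Python; the subspace and the vectors are illustrative choices.

```python
# Closest vector in a subspace V: given an orthonormal basis e_1, ..., e_t
# of V, the vector of V closest to w is w* = sum_i <w, e_i> e_i.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest(w, basis):
    wstar = [0.0] * len(w)
    for e in basis:
        c = dot(w, e)                       # coefficient <w, e_i>
        wstar = [x + c * y for x, y in zip(wstar, e)]
    return wstar

# V = the xy-plane in R^3, with orthonormal basis e1, e2
e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
w = [3.0, 4.0, 5.0]
assert closest(w, [e1, e2]) == [3.0, 4.0, 0.0]
```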
Exercises
1. Let f : ℤ24 → ℤ24 be given by x ↦ 5x + 3. Calculate the period length for x0 = 0.
2. We use the standard allocation A = 01, B = 02, . . . , Z = 26. Calculate the plaintext
number M for the plaintext message ‘Louisa is born on Christmas Day.’
3. Distribute the secret 42 using the Shamir secret sharing scheme among three
people such that any two of them together can recover the secret.
4. The company Ruin Invest has two directors, seven department managers, and 87
further employees. A valuable customer file is protected by a secret key. Develop a
procedure to distribute the information about the key so that exactly the following
groups of authorized people can recover it:
(1) both directors,
(2) one director and all seven department managers together, and
(3) one director, at least four department managers, and also at least 11 employees.
5. Let p and q be prime numbers with q < p and n = pq. For an RSA cryptosystem,
assume that p − q is very small. Show that n can be factorized using the following
procedure:
(1) Let t ∈ ℕ be the smallest number with t ≥ √n.
(2) If t² − n is a square, that is, t² − n = s² for some s ∈ ℕ, then p = t + s and q = t − s
provides the factorization.
(3) Otherwise, take the next integer t ≥ √n and go back to (2).
Use the procedure to factorize n = 9898828507.
Use the procedure to factorize n = 9898828507.
6. Let (n, e) = (2047, 179) be the public RSA key. A plaintext alphabet has the 26 letters
A, B, . . . , Z and the empty sign 0 between words. The plaintext message, with 0 between words, will be subdivided into double blocks, with 0 at the end if necessary.
By the assignment A → 00, B → 01, . . . , Z → 25, 0 → 26, each double block gives a
block with 4 digits. We consider the four-digit numbers as residue classes modulo
2047. Encryption with the public key (2047, 179) gives the ciphertext message 1054,
92, 1141, 1571, 92, 832 in the form of residue classes modulo 2047.
(a) Break the encryption by factoring 2047.
(b) Why is the number 2047, besides its small size, a particularly unfavorable
choice? Is it possible to break the encryption without factoring 2047?
7. Alice and Bob agree on the following public key cryptosystem:
(1) Alice chooses a, b ∈ ℤ with ab ≠ 1 and calculates M = ab − 1. Then Alice chooses
two integers a′ , b′ and calculates e = a′ M + a and d = b′ M + b. She then calculates
n = (ed − 1)/M.
(2) Alice publishes the pair (n, e). The secret key is d.
(3) Bob wants to send a message m ∈ {0, 1, . . . , n − 1} to Alice.
He calculates c ≡ em (mod n) and sends c to Alice.
(4) She decrypts the message by calculating cd modulo n.
Show that this is a valid cryptosystem, that is, Alice gets the message.
8. Show that breaking the ElGamal encryption scheme and breaking the Diffie–
Hellman key exchange protocol are equally difficult.
9. (a) Let K = ℤ5 and y² = x³ + x. This equation defines an elliptic curve over ℤ5 .
Show that E(ℤ5 ) ≅ ℤ2 × ℤ2 .
(b) Let K = ℤ11 and y² = x³ + x + 6 be a curve over ℤ11 . Show that y² = x³ + x + 6 is
an elliptic curve over ℤ11 and that E(ℤ11 ) is cyclic of order 13.
10. Determine all possible groups E(ℤ5 ) for elliptic curves over ℤ5 . Give all possible
orders for a group E(ℤ5 ).
24 Non-Commutative Group Based Cryptography
24.1 Group Based Methods
The public key cryptosystems and public key exchange protocols that we have discussed,
such as the RSA algorithm, or the Diffie–Hellman, ElGamal and elliptic curve meth-
ods, are number theory based, and thus depend on the structure of Abelian groups. As
computing machinery has gotten stronger, and computational techniques have become
more sophisticated and improved, there have been successful attacks on both RSA and
Diffie–Hellman for smaller and specialized parameters (RSA and Diffie–Hellman mod-
uli). Furthermore, there exist quantum algorithms that specifically break both RSA and
Diffie–Hellman. As a consequence, if and when a workable quantum computer is
realized, these cryptographic methods will have to be altered.
Because of these attacks there is a feeling that these number theoretic techniques
are theoretically susceptible to attack. Somehow the relatively simple structure of
Abelian groups opens up the possibility of weaknesses in cryptographic protocols. As
a result there has been an active line of research to develop cryptosystems and key
exchange protocols using noncommutative cryptographic platforms which is called
noncommutative algebraic cryptography. Since most of the cryptographic platforms are
groups this is also known as group based cryptography.
The main sources for non-Abelian groups are combinatorial group theory and linear group theory, that is, matrix groups. Braid group cryptography, where encryption is
done within the classical braid groups, is one prominent example. The one-way functions in braid group systems are based on the difficulty of solving group theoretic decision problems, such as the conjugacy problem and the conjugator search problem. Recall
that a one-way function is a function which is easy to implement but very hard to invert.
Although braid group cryptography had initial spectacular success, various potential
attacks have been identified. Borovik, Myasnikov, Shpilrain, see [70], and others have
studied the statistical aspects of these attacks and have identified what is termed black
holes in the platform groups, the outsides of which present cryptographic problems.
The extension of the cryptographic ideas to noncommutative platforms involves the
following ideas:
1. general algebraic techniques for developing cryptosystems;
2. potential algebraic platforms (specific groups, rings, etc.) for implementing the tech-
niques; and
3. cryptanalysis and security analysis of the resulting systems.
The basic idea in using combinatorial group theory for cryptography is that elements
of groups can be expressed as words in some alphabet. If there is an easy method to
rewrite group elements in terms of these words, and further the technique used in this
rewriting process can be supplied by a secret key, then a cryptosystem can be created.
Suppose that the platform group G has a finite presentation

G = ⟨X; R⟩ = ⟨x1 , . . . , xn ; r1 = ⋅ ⋅ ⋅ = rm = 1⟩,

and that the protocol security is based on a group theoretic problem that we denote by 𝒫 .
The first necessity is that there is an efficient way to uniquely represent and then multiply the elements of G. In most cases this requires a normal form for elements g ∈ G, that
is, a unique representation in terms of the generators {x1 , . . . , xn }. In particular, reduced
words provide normal forms for elements of free groups. Normal forms provide an ef-
fective method of disguising group elements. Without this, one can determine a secret
key simply by inspection of group elements. The existence of a normal form in a group
implies a solvable word problem, which is also essential for these protocols. For g ∈ G we
will denote its normal form, in terms of the set of generators X, by NFX (g).
To be useful in cryptography, given g ∈ G, expressed as a word in x1 , . . . , xn , the
process of moving between the word and the unique normal form must be efficiently
computable. Usually we require at most polynomial time in the input length of g.
In addition to the platform group having normal forms, ideally, it would also exhibit
exponential growth. That is, the growth function for G, γ : ℕ → ℝ, defined by γ(n) =
#{w ∈ G : l(w) ≤ n}, has an exponential growth rate; see also [93]. In the definition, l(w)
stands for the minimal number of letters needed to express w as a word in x1 , . . . , xn .
Exponential growth is a necessity that ensures that the group will provide a large key
space.
Further, the normal form must exhibit good diffusion in determining the normal
forms of products. This means that in finding the normal forms of products it is compu-
tationally difficult to rediscover the factors, that is if we know NFX (g1 g2 ) it is computa-
tionally difficult to discover g1 , g2 or NFX (g1 ), NFX (g2 ).
Other necessities for a platform group depend on the particular protocol. If the secu-
rity is based on the group problem 𝒫 , such as the word problem or conjugacy problem,
we have to assume that in G, the solution to 𝒫 is computationally hard (NP-hard) or un-
solvable. However, what we really want is generic hardness, that is, hardness on most inputs.
The solution to 𝒫 might be unsolvable but have polynomial average case complexity. In
this case, if care is not taken in choosing the inputs, the solution to 𝒫 is easy and the cryp-
tographic protocol is broken. This does not eliminate a group G as a possible platform
group but indicates that one must take great care in choosing cryptographic inputs.
Among the first attempts to use non-Abelian groups as platforms for public key cryp-
tosystems were the schemes [62] by Anshel, Anshel and Goldfeld, and the schemes [85]
by Ko, Lee et al. The first protocol was developed by I. Anshel, M. Anshel and D. Goldfeld.
The original version of the Ko–Lee protocol was published by K. H. Ko, S. J. Lee, J. H. Han,
J. Kang and C. Park. We will refer to the second protocol as Ko–Lee. Both sets of authors,
at about the same time, proposed using non-Abelian groups and combinatorial group
theory for public key exchange.
The Anshel–Anshel–Goldfeld and Ko–Lee methods can be considered as group theoretic analogs of the number theory based Diffie–Hellman method. The basic underlying
idea is the following. If G is a group and g, h ∈ G, we let g^h denote the conjugate of g by h,
that is, g^h = h^{−1}gh. The simple observation is that (g^{h1})^{h2} = g^{h1 h2}. Therefore, writing conjugation in this exponential manner behaves like ordinary exponentiation. From this
straightforward idea one can almost exactly mimic the Diffie–Hellman protocol, now
within a non-Abelian group.
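The exponential behavior of conjugation can be checked directly with matrices; the matrices below are illustrative choices.

```python
# Conjugation in a matrix group behaves like exponentiation:
# (g^{h1})^{h2} = g^{h1 h2}, where g^h = h^{-1} g h.

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(X):
    # inverse of an integer 2x2 matrix with determinant 1
    a, b = X[0]
    c, d = X[1]
    return [[d, -b], [-c, a]]

def conj(g, h):
    # the conjugate g^h = h^{-1} g h
    return mul(mul(inv(h), g), h)

g = [[1, 2], [0, 1]]
h1 = [[1, 0], [3, 1]]
h2 = [[2, 1], [1, 1]]

# (g^{h1})^{h2} equals g^{h1 h2}
assert conj(conj(g, h1), h2) == conj(g, mul(h1, h2))
```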
A prominent proposed platform is the braid group Bn with the presentation

Bn = ⟨σ1 , . . . , σn−1 ; σi σi+1 σi = σi+1 σi σi+1 for 1 ≤ i ≤ n − 2, σi σj = σj σi for |i − j| ≥ 2⟩,

which is now called the Artin presentation. We remark that there are several possibilities for normal forms for elements of Bn , see [24].
We describe both protocols in a most general context, that is, with a general platform
group. This platform group must have a finite presentation with efficiently computable
normal forms, exponential growth, and good diffusion in determining the normal form
of products. For the following Ko–Lee protocol and the Anshel–Anshel–Goldfeld pro-
tocols, the platform group must also contain an abundant collection of subgroups that
commute elementwise and that can be efficiently described.
24.2 Initial Group Theoretic Cryptosystems—The Magnus Method
The Magnus method works with the modular group

M = { ±( a b ; c d ) : ad − bc = 1, a, b, c, d ∈ ℤ },

whose elements correspond to the linear fractional transformations

z′ = (az + b)/(cz + d) with ad − bc = 1 and a, b, c, d ∈ ℤ.

Theorem 24.2.1. The matrices

±( 1 1 ; 1 2 ) and ±( 1 + 4t² 2t ; 2t 1 ), t = 1, 2, 3, . . . ,

freely generate a free subgroup F of infinite index in M. Further, distinct elements of F have
distinct first columns (up to sign). The group F is of infinite rank.
Proof. Without loss of generality, we first work in the homogeneous modular group

Γ = { ( a b ; c d ) : a, b, c, d ∈ ℤ, ad − bc = 1 } = SL(2, ℤ).
B. H. Neumann, see [40], constructed infinitely many subgroups N of Γ with the fol-
lowing properties:
(i) N contains the matrix T = ( 0 −1 ; 1 0 ).
(ii) Let a and c be any pair of coprime integers. Then N contains exactly one matrix in
which the first column consists of the ordered pair (a, c).
We remark that Neumann showed that such an N has properties (i) and (ii) if it contains
T and has exactly all the elements U^n, n = 0, ±1, ±2, . . . , as right coset representatives in
Γ, where U = ( 1 1 ; 0 1 ).
To prove Theorem 24.2.1, we do not need the whole procedure, nor the additional
remark (for the complete construction, see [40]). We just use the single construction for
the special group given in Theorem 24.2.1. We consider the bijective map f : ℤ → ℤ given
by f (f (n)) = n, f (0) = 0, f (−1) = −1, and for any positive integer k we have f (2k) = 2k,
f (6k − 1) = −3k − 1, f (6k − 3) = −3k, f (6k − 5) = 1 − 3k.
We define the subgroup N generated by the elements

γn = ( n −1 − nf (n) ; 1 −f (n) ).
We now consider N as a subgroup of the modular group M and use the Reidemeister–Schreier method in combination with Tietze transformations, see Chapter 14.
We see that N is generated by the elements γ−1 , γ0 , and γ2k , k = 1, 2, 3, . . . .
This shows that the elements A = γ0^{−1} γ−1 and B2k = γ2k γ0^{−1}, k = 1, 2, 3, . . . , freely generate a free subgroup F of infinite rank in N using the Reidemeister–Schreier method.
This, in fact, also follows if we consider F acting on the upper half plane.
We have A = ±( 1 1 ; 1 2 ) and B2k = ±( 1 + 4k² 2k ; 2k 1 ), k = 1, 2, 3, . . . . The group F does not
contain any power U^t, t ∈ ℤ \ {0}. Now let C = ±( a b ; c d ) be an element of M. All the
elements CU^{−t}, t ∈ ℤ, have
the same first column as C, up to the sign, and if t runs through the integers, we get all
elements of M with the same first column. This we can see as follows.
Let D = ±( a g ; c h ) be any element of M with the same first column. Then

1 = ad − bc = ah − gc

from the determinant. It follows that a(d − h) = c(b − g). Since gcd(a, c) = 1, we get
c | (d − h), that is, there exists a t ∈ ℤ with ct = d − h, and therefore h = d − ct. We get
with this that ad − bc = a(d − ct) − gc, that is, g = b − at.
Hence, D = ±( a b − at ; c d − ct ). Now consider CU^{−t} ∈ M, t ∈ ℤ; then

CU^{−t} = ±( a b ; c d )( 1 −t ; 0 1 ) = ±( a b − at ; c d − ct ).
This shows that distinct elements of F have distinct first columns, up to sign.
Magnus, see [89], had the idea to use this for cryptographic protocols. Since the en-
tries in the generating matrices are positive we can do the following.
Choose a set T1 , . . . , Tn of projective matrices from the set above with n large enough
to encode a desired plaintext alphabet 𝒜. Any message would be encoded by a word
w(T1 , . . . , Tn ) with nonnegative exponents. This represents an element g of F. The two
elements in the first column determine w and therefore g. Receiving w then determines
the message uniquely. Pure free group cryptography, as Magnus proposed it, is subject to many
attacks. We will discuss this further in Section 24.3.
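A toy sketch of the Magnus-style encoding; the three-letter alphabet mapped to the generators A, B1, B2 is our own illustrative choice. The brute-force decoder also hints at why pure free group cryptography with short words is vulnerable.

```python
# Encode words in the free generators A, B_t of Theorem 24.2.1 as matrix
# products; the first column of the product determines the word uniquely.

from itertools import product

def mat_mul(X, Y):
    # 2x2 integer matrices stored row-major as 4-tuples
    return (X[0]*Y[0] + X[1]*Y[2], X[0]*Y[1] + X[1]*Y[3],
            X[2]*Y[0] + X[3]*Y[2], X[2]*Y[1] + X[3]*Y[3])

A = (1, 1, 1, 2)                       # A = ( 1 1 ; 1 2 )
def B(t):                              # B_t = ( 1+4t^2 2t ; 2t 1 )
    return (1 + 4*t*t, 2*t, 2*t, 1)

gens = {'a': A, 'b': B(1), 'c': B(2)}  # toy three-letter alphabet

def encode(word):
    M = (1, 0, 0, 1)
    for ch in word:
        M = mat_mul(M, gens[ch])
    return M

def decode(first_col, max_len=5):
    # brute-force search over short positive words: distinct words give
    # distinct first columns, so the search recovers the message uniquely
    for n in range(1, max_len + 1):
        for w in product('abc', repeat=n):
            M = encode(w)
            if (M[0], M[2]) == first_col:
                return ''.join(w)
    return None

M = encode('bac')
assert decode((M[0], M[2])) == 'bac'
```

The exhaustive search succeeds here precisely because the message is a short positive word, which illustrates the kind of attack such a scheme must withstand.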
The idea of using the difficulty of group theory decision problems in devising hard one-
way functions for cryptographic purposes was first developed by Magyarik and Wagner
in 1985, see [103]. They devised a public key protocol based on the difficulty of the solution to the word problem. Although this was a seminal idea, their basic cryptosystem
was really unworkable and not secure in the form in which they presented it.
Wagner and Magyarik outlined a conceptual public key cryptosystem based on the
hardness of the word problem for finitely presented groups. At the same time, they gave
a specific example of such a system. González Vasco and Steinwandt, see [78], proved
that their approach is vulnerable to so-called reaction attacks. In particular, for the pro-
posed instance it is possible to retrieve the private key just by watching the performance
of a legitimate recipient.
The general scheme of the Wagner and Magyarik public key cryptosystem is as fol-
lows. Let X be a finite set of generators, and let R and S be finite sets of relators on X.
Consider the two groups G and G0 with presentations

G = ⟨X; R⟩ and G0 = ⟨X; R ∪ S⟩.

The group G0 is then a homomorphic image of G. We assume first that G has a hard word
problem so that the word problem in G is not solvable in polynomial time. We next
assume that the homomorphic image G0 has a word problem solvable in polynomial
time, that is an easy word problem.
Choose two words w0 and w1 which are not equivalent in G0 (and hence not equiva-
lent in G since G0 is a homomorphic image of G). The public key is the presentation ⟨X; R⟩
and the chosen words w0 and w1 . To encrypt a single bit i ∈ {0, 1}, pick wi and transform it
into a ciphertext word w by repeatedly and randomly applying Tietze transformations
to the presentation ⟨X; R⟩. To decrypt a word w, run the algorithm for the word problem
of G0 in order to decide which of the words wi w^{−1} is equivalent to the empty word for the presentation ⟨X; R ∪ S⟩. The private key is the set S. As pointed out by González Vasco and
Steinwandt, this is not sufficient and Wagner and Magyarik are not clear on this point.
The public key should be a deterministic polynomial-time algorithm for the word prob-
lem of G0 = ⟨X; R ∪ S⟩. Just knowing S does not automatically and explicitly give us an
efficient algorithm (even if such an algorithm exists).
Although the Wagner–Magyarik protocol was not workable as a public key system,
the idea opened the door for using similar types of encryption involving group theoretic
decision problems.
In a free group cryptosystem, one fixes a free group F with free generators x1 , . . . , xr
and a subgroup H of F with a set of generators w1 , . . . , wk constructed, via the Reidemeister–Schreier process, from a Schreier transversal. A plaintext alphabet 𝒜 is then
encoded by an injective map

𝒜 → {w1 , . . . , wk }.

Given the Schreier transversal from which the set of generators for H was constructed, the Reidemeister–Schreier rewriting process allows us to algorithmically rewrite an element of H. Given
such an element expressed as a word w = w(x1 , . . . , xr ) in the generators of F, this
algorithm rewrites w as a word w⋆ (w1 , . . . , wk ) in the generators of H.
Pure free group cryptosystems are subject to various attacks and can often be broken easily. However, a public key free group cryptosystem using a free group representation in the modular group was developed by Baumslag, Fine and Xu, see [67] and [68].
The most successful attacks on free group cryptosystems are called length based attacks.
The general idea in a length based attack is that an attacker repeatedly multiplies a ciphertext word by generators, keeping any product that is shorter, until a word that could possibly be decoded is reached. We refer
to [76] for more on length based attacks.
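A minimal length based attack can be sketched in the free group on a, b (capital letters denote inverses). The "ciphertext" is our own toy example, a conjugate g⁻¹wg of a secret cyclically reduced word w; greedy conjugation by single generators, keeping any strictly shorter result, peels g off.

```python
# A minimal length based attack in F(a, b): greedily conjugate the
# ciphertext by single generators, keeping any strictly shorter word,
# until no conjugation shortens it further.
INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def reduce_free(w):
    out = []
    for ch in w:
        if out and out[-1] == INV[ch]:
            out.pop()
        else:
            out.append(ch)
    return ''.join(out)

def conj(w, g):   # g^-1 w g, freely reduced
    return reduce_free(INV[g] + w + g)

def length_attack(c):
    improved = True
    while improved:
        improved = False
        for g in 'abAB':
            cand = conj(c, g)
            if len(cand) < len(c):    # keep only strictly shorter words
                c, improved = cand, True
                break
    return c

w = 'abAB'                               # secret cyclically reduced word
c = reduce_free(INV['a'] + INV['B'] + w + 'Ba')   # c = (Ba)^-1 w (Ba)
print(c, '->', length_attack(c))         # the attack recovers w
```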
Baumslag, Fine and Xu in [67] described the following general encryption scheme
using free group cryptography. A further enhancement was discussed in the paper [68].
We start with a finitely presented group

G = ⟨X; R⟩, X = {x1 , . . . , xn },

and a faithful representation

ρ : G → G̅.

G̅ can be any one of several different kinds of objects: a linear group, a permutation group, a power series ring, etc.
We assume that there is an algorithm to re-express an element of ρ(G) ⊂ G̅ in terms of the generators of G. That is, if g = w(x1 , . . . , xn ) ∈ G, where w is a word in these generators, and we are given ρ(g) ∈ G̅, we can algorithmically find g and its expression as the word w(x1 , . . . , xn ).
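As a concrete illustration of such an algorithm (our choice of example, not the implementation of [67]): the Sanov matrices A = (1 2; 0 1) and B = (1 0; 2 1) freely generate a rank-2 free subgroup of SL(2, ℤ), and an element of this subgroup can be rewritten as a word in A and B by a Euclidean-style peeling of syllables.

```python
# Recovering the word from a matrix: the Sanov matrices A, B freely
# generate a rank-2 free subgroup of SL(2, Z); a matrix in that subgroup
# is peeled back to its freely reduced word, syllable by syllable.
from fractions import Fraction

A = ((1, 2), (0, 1))
B = ((1, 0), (2, 1))
I = ((1, 0), (0, 1))

def mul(M, N):
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def power(M, n):   # M^n for n in Z, using the SL2 inverse formula
    if n < 0:
        (a, b), (c, d) = M
        M, n = ((d, -b), (-c, a)), -n
    R = I
    for _ in range(n):
        R = mul(R, M)
    return R

def word_to_matrix(word):            # word: list of ('A'|'B', exponent)
    R = I
    for g, n in word:
        R = mul(R, power(A if g == 'A' else B, n))
    return R

def matrix_to_word(M):
    # If |top-left| > |bottom-left| the word starts with a power of A,
    # otherwise with a power of B; the exponent is the integer n that
    # minimizes the surviving entry (no ties occur, by parity mod 2).
    word = []
    while M != I:
        (a, b), (c, d) = M
        if c == 0:                   # only a pure power of A remains
            word.append(('A', b // 2))
            break
        if abs(a) > abs(c):
            n = round(Fraction(a, 2 * c))
            word.append(('A', n))
            M = mul(power(A, -n), M)
        else:
            n = round(Fraction(c, 2 * a))
            word.append(('B', n))
            M = mul(power(B, -n), M)
    return word

w = [('A', 2), ('B', -1), ('A', 1)]
assert matrix_to_word(word_to_matrix(w)) == w
```

The input word must be freely reduced (alternating nonzero powers of A and B); the peeling then terminates because each step strictly shortens the remaining word.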
Once we have G, we assume that we have two free subgroups K, H with
H ⊂ K ⊂ G.
We assume that we have fixed Schreier transversals for K in G and for H in K both of
which are held in secret by the communicating parties Bob and Alice. Now based on the
fixed Schreier transversals we have sets of Schreier generators constructed from the
Reidemeister–Schreier process for K and for H:
k1 , . . . , km , . . . for K
and
h1 , . . . , ht , . . . for H.
Notice that the generators for K will be given as words in x1 , . . . , xn , the generators
of G, while the generators for H will be given as words in the generators k1 , k2 , . . . for K.
We note further that H and K may coincide and that H and K need not in general be free
but only have a unique set of normal forms so that the representation of an element in
terms of the given Schreier generators is unique.
We will encode within H, or more precisely within ρ(H). We assume that the number of generators for H is larger than the number of characters in our plaintext alphabet.
Let 𝒜 = {a, b, c, . . . } be our plaintext alphabet. At the simplest level we choose a starting
point i, within the generators of H, and encode
a → hi , b → hi+1 , . . . , etc.
Suppose that Bob wants to communicate the message w(a, b, c, . . . ) to Alice where
w is a word in the plaintext alphabet. Recall that both Bob and Alice know the var-
ious Schreier transversals which are kept secret between them. Bob then encodes
w(hi , hi+1 , . . . ) and computes the element w(ρ(hi ), ρ(hi+1 ), . . . ) in G̅, which he sends to Alice. This is sent as a matrix if G̅ is a linear group, as a permutation if G̅ is a permutation group, and so on.
Alice uses the algorithm for G̅ relative to G to rewrite w(ρ(hi ), ρ(hi+1 ), . . . ) as a word
w⋆ (x1 , . . . , xn ) in the generators of G. She then uses the Schreier transversal for K in G to rewrite w⋆ , using the Reidemeister–Schreier process, as a word w⋆⋆ (k1 , . . . , ks ) in
the generators of K. Since K is free or has unique normal forms this expression for the
element of K is unique. Once she has the word written in the generators of K she uses
the transversal for H in K to rewrite again, using the Reidemeister–Schreier process,
in terms of the generators for H. She then has a word w⋆⋆⋆ (hi , hi+1 , . . . ) and using the
allocation hi → a, hi+1 → b, . . . decodes the message.
In an actual implementation an additional random noise factor is added. This is
explained in more detail below.
We now describe an implementation of this process using for the base group G
the classical modular group M = PSL(2, ℤ). Further, this implementation uses a polyalphabetic cipher, which strengthens it against statistical attacks. This was introduced originally in [67] and [68].
The system in the modular group M works as follows. A list of finitely generated
free subgroups H1 , . . . , Hm of M is public and presented by their systems of generators
(presented as matrices). In a full practical implementation it is assumed that m is large.
For each Hi we have a Schreier transversal

h1, i , . . . , ht(i), i

and a corresponding system of free group generators

w1, i , . . . , wm(i), i

obtained by the Reidemeister–Schreier process. Bob and Alice know these subgroups in terms of free group generators; what is made public are the generating systems given in terms of matrices.
The subgroups on this list and their corresponding Schreier transversals can be cho-
sen in a variety of ways. For example the commutator subgroup of the modular group is
free of rank 2 and some of the subgroups Hi can be determined from homomorphisms
of this subgroup onto a set of finite groups.
Suppose that Bob wants to send a message to Alice. Bob first chooses three integers
(m, q, t) where m is the choice of the subgroup Hm , q is the choice of the starting point
among the generators of Hm for the substitution of the plaintext alphabet, and t is the
choice of the size of the message unit.
We clarify the meanings of q and t. Once Bob chooses m, the meaning of q is that he makes the substitution

a → wq, m , b → wq+1, m , . . .

for the plaintext alphabet. Again the assumption is that m(i) ≫ l, where l is the size of the plaintext alphabet, so that starting almost anywhere in the sequence of generators of Hm will allow this substitution. The message unit size t is the number
of coded letters that Bob will place into each coded integral matrix.
Once Bob has chosen (m, q, t), he takes his plaintext message w(a, b, . . . ) and groups it into blocks of t letters. He then makes the substitution given above to form the corresponding
matrices in the modular group:
T1 , . . . , Ts .
We now introduce a random noise factor. After forming T1 , . . . , Ts Bob then multiplies
on the right each Ti by a random matrix in M say RTi (different for each Ti ). The only
restriction on this random matrix RTi is that there is no free cancellation in forming the
product Ti RTi . This can be easily checked and ensures that the freely reduced form for
Ti RTi is just the concatenation of the expressions for Ti and RTi . Next he sends Alice the
integral key (m, q, t) by some public key method (RSA, Anshel–Goldfeld, etc.). He then
sends the message as the s matrices

T1 RT1 , . . . , Ts RTs .

Hence what is actually being sent out are not elements of the chosen subgroup Hm
but rather elements of random right cosets of Hm in M. The purpose of sending coset
elements is two-fold. The first is to hinder any geometric attack by masking the sub-
group. The second is that it makes the resulting words in the modular group generators
longer—effectively hindering a brute force attack.
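The "no free cancellation" restriction on RTi is easy to test at the level of words; a sketch with formal generators (capitals denote inverses, our notation):

```python
# Checking the "no free cancellation" restriction on the noise factor at
# the level of words: the product t*r freely reduces to the plain
# concatenation iff the last letter of t is not the inverse of the first
# letter of r.
INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def reduce_free(w):
    out = []
    for ch in w:
        if out and out[-1] == INV[ch]:
            out.pop()
        else:
            out.append(ch)
    return ''.join(out)

def no_cancellation(t, r):
    t, r = reduce_free(t), reduce_free(r)
    return not (t and r and t[-1] == INV[r[0]])

t = 'abA'
assert no_cancellation(t, 'bba')        # T_i R_{T_i} is just concatenation
assert not no_cancellation(t, 'abb')    # here 'A' + 'a' would cancel
assert reduce_free(t + 'bba') == t + 'bba'
```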
To decode the message Alice first uses public key decryption to obtain the integral
keys (m, q, t). She then knows the subgroup Hm , the ciphertext substitution from the gen-
erators of Hm and how many letters t each matrix encodes. She next uses the algorithms
described in Section 24.2 to express each Ti RTi in terms of the free group generators of M
say wTi (y1 , . . . , yn ). She has knowledge of the Schreier transversal, which is held secretly
by Bob and Alice, so now uses the Reidemeister–Schreier rewriting process to start ex-
pressing this freely reduced word in terms of the generators of Hm . The Reidemeister–
Schreier rewriting is done letter by letter from left to right. Hence when she reaches
t of the free generators she stops. Notice that the string that she is rewriting is longer
than what she needs to rewrite in order to decode as a result of the random matrix RTi .
This is due to the fact that she is actually rewriting not an element of the subgroup but
an element in a right coset. This presents a further difficulty to an attacker. Since these
are random right cosets it makes it difficult to pick up statistical patterns in the genera-
tors even if more than one message is intercepted. In practice the subgroups should be
changed with each message.
The initial key (m, q, t) is changed frequently. Hence as mentioned above this
method becomes a type of polyalphabetic cipher which is difficult to decode.
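The bookkeeping around the integral key (m, q, t) can be sketched as follows, with the matrix layer abstracted away: a message unit is represented simply by the tuple of generator indices it would encode. All parameter values are illustrative, and we assume the message length is a multiple of t.

```python
# Bookkeeping of the integral key (m, q, t) in a stripped-down model: the
# modular group matrices are replaced by tuples of generator indices, so
# only the polyalphabetic bookkeeping survives.
import random
rng = random.Random(3)

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '

def encrypt(msg, m, q, t, noise=4):
    # m would select the subgroup H_m (unused in this abstraction), q
    # shifts the substitution letter -> generator index, t letters per unit
    units = []
    for i in range(0, len(msg), t):
        block = [q + ALPHABET.index(ch) for ch in msg[i:i + t]]
        # random right coset factor: a few extra random "generators"
        block += [q + rng.randrange(len(ALPHABET)) for _ in range(noise)]
        units.append(tuple(block))
    return units

def decrypt(units, m, q, t):
    out = []
    for u in units:
        out += [ALPHABET[j - q] for j in u[:t]]  # keep t letters, drop noise
    return ''.join(out)

key = (5, 17, 7)                 # (m, q, t); changed frequently in practice
c = encrypt('attack at dawn', *key)
assert decrypt(c, *key) == 'attack at dawn'
```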
24.4 Non-Abelian Digital Signatures

– Key Generation: Alice wants to sign and send a message, m, to Bob. Alice begins by choosing two conjugate elements u, v ∈ G with conjugator a, so that v = u^a. The conjugate pair (u, v) is public information while the conjugator a is Alice's secret key.
– Signature Generation: Alice chooses an arbitrary b ∈ G and computes α = u^b and y = h(mα). Then a signature σ on the message m is the triple (α, β, γ), where β = y^b and γ = y^(a⁻¹b). She sends this to Bob for verification and acceptance.
– Verification: Upon receiving the signature, Bob checks whether or not the following
hold:
(1) There exists c1 ∈ G such that u = α^c1 .
(2) There exist c2 , c3 ∈ G such that γ = β^c2 and y = γ^c3 .
(3) There exists c4 ∈ G such that uy = (αβ)^c4 .
(4) There exists c5 ∈ G such that vy = (αγ)^c5 .
Bob accepts the signature if and only if conditions (1)–(4) hold.
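The scheme can be simulated end-to-end in a small permutation group, where each of the existence conditions (1)–(4) reduces to a conjugacy test (equality of cycle types). This is purely illustrative, since in Sn the conjugacy search problem is also easy and the scheme would offer no security; all concrete choices below are ours.

```python
# Toy run of the conjugacy-based signature scheme in S_8, where conjugacy
# of two permutations is easy to *test* (equal cycle type).  Illustrative
# only: in S_8 conjugacy *search* is also easy.
import hashlib
import random

N = 8

def compose(p, q):                # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(N))

def inverse(p):
    r = [0] * N
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def conj(x, g):                   # x^g = g^-1 x g
    return compose(inverse(g), compose(x, g))

def cycle_type(p):
    seen, lens = set(), []
    for i in range(N):
        if i not in seen:
            j, L = i, 0
            while j not in seen:
                seen.add(j)
                j, L = p[j], L + 1
            lens.append(L)
    return tuple(sorted(lens))

def conjugate_in_Sn(x, y):        # conjugacy test: same cycle type
    return cycle_type(x) == cycle_type(y)

def h(m, alpha):                  # hash message and alpha to a permutation
    seed = hashlib.sha256((m + str(alpha)).encode()).digest()
    p = list(range(N))
    random.Random(seed).shuffle(p)
    return tuple(p)

rng = random.Random(1)
def rand_perm():
    p = list(range(N))
    rng.shuffle(p)
    return tuple(p)

# key generation: public conjugate pair (u, v), secret conjugator a
a, u = rand_perm(), rand_perm()
v = conj(u, a)
# signature generation on message m with a fresh random b
m, b = 'hello', rand_perm()
alpha = conj(u, b)
y = h(m, alpha)
beta = conj(y, b)
gamma = conj(y, compose(inverse(a), b))
# verification: conditions (1)-(4), each reduced to a conjugacy test
ok = (conjugate_in_Sn(u, alpha)
      and conjugate_in_Sn(gamma, beta) and conjugate_in_Sn(y, gamma)
      and conjugate_in_Sn(compose(u, y), compose(alpha, beta))
      and conjugate_in_Sn(compose(v, y), compose(alpha, gamma)))
assert ok   # a well-formed signature passes all four checks
```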
The security of this scheme lies in the assumption that, given a pair of conjugate elements u, v ∈ G, finding elements α, β, γ such that (1)–(4) above hold is infeasible. If the conjugator a can be found, then (α, β, γ) = (u^b , y^b , y^(a⁻¹b)) satisfies properties (1)–(4) for any prover.

24.5 Password Authentication Using Combinatorial Group Theory

To each user, in conjunction with a standard password, there will be assigned a finitely presented group with a solvable word problem. We call this the challenge group.
This will be done randomly by the group randomizer system and will be held in secret
by the prover and the verifier.
Cryptographically, we assume the adversary can steal the encrypted form of the
group theoretic responses. Probabilistically this does not present a problem. Each chal-
lenge response set of questions forms a virtual one time key pad as we will explain.
Therefore the adversary must steal three things: the original password, the challenge
group and the group randomizer. Hence there is almost total security in the challenge
response system.
Further there is an infinite supply of finitely presented groups to use as challenge
groups and an infinite supply of challenge response questions that never have to be
duplicated. We will explain these in the section on this protocol’s security. Finally the
method is symmetric between the verifier and the prover, so while the verifier verifies
the prover’s password simultaneously the prover verifies that he or she is dealing with
the verifier.
The theoretical security of the system is provided by several results in asymptotic
group theory which we discuss in Section 24.6. In particular, a result of Lysenok and
Myasnikov, see [91], implies that stealing the challenge group is NP-hard, while a result of Jitsukawa, see [81], implies that a homomorphism attack on the group randomizer protocol succeeds only with asymptotic density zero.
The whole password protocol depends upon the group randomizer system. This is
a computer program that can handle several elementary tasks involving finitely pre-
sented groups. The scope of the particular group randomizer system will depend on
the type of login protocol or cryptographic protocol desired. At the most basic level the
group randomizer system has the ability to do the following things:
1. To recognize a finite presentation of a finitely presented group with a solvable word
problem and manipulate arbitrary words in the alphabet of generators according
to the rewriting rules of the presentation. In particular, if the group has a normal
form for each element, the group randomizer can rewrite an arbitrary word in the
generators in terms of its group normal form.
2. Given a finite presentation of a group with a solvable word problem, to recognize
whether two free group words have the same value in the given group when con-
sidered in terms of the given generators of the group.
3. To randomly generate free group words on an alphabet of any finite size.
4. To recognize and store sets of free group words w1 , . . . , wk on an alphabet x1 , . . . , xn
and rewrite words w(w1 , . . . , wk ) as the corresponding word in x1 , . . . , xn .
5. Given a free group F of finite rank on x1 , . . . , xn and a set of words w1 , . . . , wk on the alphabet x1 , . . . , xn , to solve the membership problem in F relative to H = ⟨w1 , . . . , wk ⟩, the subgroup of F generated by w1 , . . . , wk .
6. Given a stored finitely presented group or a stored set of free group words, the ran-
domizer can accept a random free group word and rewrite it as a normal form in
the finitely presented group in the former case or as a word in the ambient free
group in the latter case.
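Capability 4, the workhorse of the protocols below, can be sketched directly (our representation: generators of the ambient free group as lowercase letters, inverses as capitals):

```python
# Capability 4 as code: store w1, ..., wk and rewrite a word in the w_i's
# as a freely reduced word in the ambient free group generators.
import string
LOWER = string.ascii_lowercase
INV = {**{c: c.upper() for c in LOWER}, **{c.upper(): c for c in LOWER}}

def reduce_free(w):
    out = []
    for ch in w:
        if out and out[-1] == INV[ch]:
            out.pop()
        else:
            out.append(ch)
    return ''.join(out)

def inverse(w):
    return ''.join(INV[ch] for ch in reversed(w))

def expand(yword, W):
    # yword: list of (index, ±1); W = {i: w_i} are the stored words
    return reduce_free(''.join(W[i] if e > 0 else inverse(W[i])
                               for i, e in yword))

W = {1: 'ab', 2: 'Ba'}                  # stored free group words (toy)
assert expand([(1, 1), (2, -1)], W) == 'abAb'   # w1 * w2^-1, reduced
```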
We now present several variations on secure password authentication using the group
randomizer. First we give an overall outline of the protocol.
This is a symmetric key cryptographic authentication protocol. Both the prover and ver-
ifier use a single private key to both encrypt and decrypt within the authentication pro-
cess. At the first step the prover and verifier must communicate directly, either face-to-
face or by a public key method, to set the private shared secret. This is the model now
used for most password/password back-up schemes. We assume that both the prover
and verifier have a group randomizer system. For security analysis we assume that an
adversary or eavesdropper has access to the encrypted form of the transmission but is
passive in that the adversary will not change any transmissions.
1. The prover and verifier communicate directly to set up a common shared secret
(P, G) where P is a standard password and G is a challenge group. Each prover’s
challenge group is unique to that prover. The challenge group is a finitely presented
group with a solvable word problem and satisfying the strong generic free group
property which we discuss in Section 24.6. The password is chosen by the prover
while the challenge group is randomly chosen by the group randomizer system.
2. The prover presents the password to the verifier. The group randomizer of the ver-
ifier presents a group theoretic “question” concerning the challenge group G to the
prover. The assumption is that this “question” is difficult in the sense that it is in-
feasible to answer it if the group G is unknown. The question is then answered by
the group randomizer. This is repeated a finite number of times. If the answers are
correct, the prover (and the password) is verified.
3. The protocol is then repeated from the viewpoint of the prover, authenticating the
verifier to the prover.
We assume that both the prover and the verifier have a group randomizer. Each prover
has a standard password. Suppose that F is a free group on {x1 , . . . , xn }. The prover’s
password is linked to a finitely generated subgroup of a free group given as words in
the generators, that is, the prover’s password is linked to w1 , . . . , wk where each wi is a
word in x1 , . . . , xn . The group G = ⟨w1 , . . . , wk ⟩ is called the challenge group. In general
we have k ≠ n. The prover does not need to know the generators. The randomizer can
randomly choose words from this subgroup and then freely reduce them. The prover
has the challenge group or subgroup also stored in its randomizer.
The prover submits his or her standard password to the verifier. This activates the
verifier’s randomizer to the prover’s set of words. The verifier now submits a random
free group word on y1 , . . . , yk to the prover’s randomizer say w(y1 , . . . , yk ). The prover’s
randomizer treats this as w(w1 , . . . , wk ) and then reduces it in terms of the free group
generators x1 , . . . , xn and rewrites it as w⋆ (x1 , . . . , xn ). The verifier checks that this is cor-
rect, that is, w(w1 , . . . , wk ) = w⋆ (x1 , . . . , xn ) on the free group on x1 , . . . , xn . If it is, the
verifier continues and repeats this three times (or some other finite number of times). There is
one proviso. The verifier submits a word to the prover only once, so that a submitted
word can never be reused. The prover’s randomizer will recognize if it has (this is a
verification to the prover of the verifier).
To verify that the verifier is legitimate, the process is repeated from the prover’s
randomizer to the verifier.
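A single challenge-response round, including the one-use proviso, might be sketched as follows (the stored subgroup words are toy choices of ours):

```python
# One authentication round plus replay protection.  The shared secret is
# the stored list of words W; challenges are words in y1, ..., yk and may
# never be reused.
INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b', 'c': 'C', 'C': 'c'}

def reduce_free(w):
    out = []
    for ch in w:
        if out and out[-1] == INV[ch]:
            out.pop()
        else:
            out.append(ch)
    return ''.join(out)

def inverse(w):
    return ''.join(INV[ch] for ch in reversed(w))

W = {1: 'ab', 2: 'aC', 3: 'cb'}        # shared secret generators (toy)

def respond(challenge):                 # prover: expand and freely reduce
    return reduce_free(''.join(W[i] if e > 0 else inverse(W[i])
                               for i, e in challenge))

used = set()
def verify_round(challenge):            # verifier side
    key = tuple(challenge)
    if key in used:                     # a reused word red-flags the attempt
        return 'replay!'
    used.add(key)
    expected = reduce_free(''.join(W[i] if e > 0 else inverse(W[i])
                                   for i, e in challenge))
    return respond(challenge) == expected

ch = [(1, 1), (3, -1), (2, 1)]
assert verify_round(ch) is True
assert verify_round(ch) == 'replay!'    # the same word is never accepted twice
```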
An attacker only has access to the transmitted words. Given a series of free group
words there is essentially zero probability of reconstructing the subgroup. To prevent an
attacker using an already used word to gain access, the group randomizer system allows
a free group word, submitted as a challenge word, to be used only once. If an attacker
gets access to the verifier and submits an already submitted word or vice versa from the
prover, this will red flag the attempt. We also suggest that if there is a previously used
word, indicating perhaps an attack, the group randomizer should change the prover’s
group. The beauty of this system is that this can be done extremely easily, for example by changing several of the words. In effect, this presents a fresh one-time key pad each time the prover presents the password. The map yi → wi is a homomorphism, and an attacker can manipulate various equations in an attempt to solve for the wi . Presumably, if there are enough equations, the words w1 , . . . , wk can be discovered. However, in Section 24.6
we present a security proof based on several results in asymptotic group theory showing
that this cannot happen with asymptotic density one.
We suggest a noise/diffusion enhancement. The prover's challenge group generator words w1 , . . . , wk are indexed. With each use the randomizer applies a random permutation ϕ on {1, . . . , k} to scramble the indices. These permutations are coded and stored in both the prover's and the verifier's randomizers. This prevents a length based attack by an eavesdropper, since discovering, for example, what w37 is, is of no use since
it will be indexed differently for the next use. The coded permutation is sent as part of
the challenge.
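The scrambling step itself is elementary; in the sketch below a shared session seed stands in for the coded permutation that is sent as part of the challenge (our simplification):

```python
# Per session, a random permutation phi of {1, ..., k} re-indexes the
# generator words, so an index observed in one session (say 37) generally
# refers to a different generator in the next.
import random

def make_phi(k, session_seed):
    rng = random.Random(session_seed)   # stands in for the coded permutation
    perm = list(range(1, k + 1))
    rng.shuffle(perm)
    return {i + 1: perm[i] for i in range(k)}

k = 50
phi = make_phi(k, session_seed=2024)
inv_phi = {v: i for i, v in phi.items()}

assert sorted(phi.values()) == list(range(1, k + 1))   # phi is a bijection
assert all(inv_phi[phi[i]] == i for i in phi)          # prover can unscramble
phi2 = make_phi(k, session_seed=2025)
print(phi[37], phi2[37])   # index 37 across two sessions
```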
This is essentially the same method; however, rather than working with an ambient free group, we work with a given finitely presented group with a solvable word problem. Let G = ⟨X; R⟩ be the group. As before, we assume that both the prover and the verifier have a group randomizer. Each prover has a standard password. Suppose that X = {x1 , . . . , xn }
and F is a free group on {x1 , . . . , xn }. The prover’s password is linked to a finitely gen-
erated subgroup of G again given as words in the generators X, that is, the prover’s
password is linked to w1 , . . . , wk where each wi is a word in x1 , . . . , xn . As before, in general k ≠ n. The randomizer can randomly choose words from this subgroup and then reduce
them via the finite presentation. The verifier has the group and subgroup also stored in
its randomizer.
The remainder of the procedure is exactly the same as in the free group case. The
prover submits his or her standard password to the verifier. This activates the verifier’s
randomizer to the prover’s set of words. The verifier now submits a random free group
word on y1 , . . . , yk to the prover’s randomizer, say, w(y1 , . . . , yk ). The prover’s randomizer
treats this as w(w1 , . . . , wk ) and rewrites it as w⋆ (x1 , . . . , xn ). The verifier checks that this
is correct, that is, w(w1 , . . . , wk ) = w⋆ (x1 , . . . , xn ), however, this time in the group G. If it is,
the verifier continues and repeats this three times (or some other finite number of times). There
is one proviso. The verifier submits a word to the prover only once so that a submitted
word can never be reused. The prover’s randomizer will recognize if it has (this is a
verification to the prover of the verifier).
To verify that the verifier is legitimate, the process is repeated from the prover’s
randomizer to the verifier.
As in the free group method, an attacker only has access to the transmitted words.
Given a series of group words there is essentially zero probability of reconstructing the group; however, as in the free group method, a given challenge response word is to be used only once.
24.6 The Strong Generic Free Group Property

Let G be a group with a finite generating system X, and let Bn denote the ball of radius n in the Cayley graph of G with respect to X. For a subset S of G, the asymptotic density of S is

lim_{n→∞} |S ∩ Bn | / |Bn |,
provided this limit exists. We say that the property 𝒫 is generic if the asymptotic density
of the set S of elements satisfying 𝒫 equals 1.
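Asymptotic densities can be explored experimentally by enumerating balls in the Cayley graph. The sketch below does this in the free group F(a, b) for the toy property "total exponent sum zero" (our example); its density visibly decays, so the complementary property is a candidate generic property.

```python
# Enumerate the ball B_n in F(a, b) and measure the density of a sample
# property (exponent sum zero), whose density decays as n grows.
INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def sphere(n):
    words = ['']
    for _ in range(n):
        words = [w + x for w in words for x in 'abAB'
                 if not w or x != INV[w[-1]]]          # keep words reduced
    return words

def ball(n):
    out = []
    for r in range(n + 1):
        out += sphere(r)
    return out

def density(n, prop):
    b = ball(n)
    return sum(prop(w) for w in b) / len(b)

zero_sum = lambda w: (w.count('a') + w.count('b')
                      - w.count('A') - w.count('B')) == 0
assert len(ball(2)) == 17 and len(ball(3)) == 53
print(density(2, zero_sum), density(8, zero_sum))  # the density decays
```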
This concept can be easily extended to properties of finitely generated subgroups.
We consider the asymptotic density of finite sets of elements that generate subgroups
that have a considered property. For example, to say that a group has the generic free
group property we mean that
lim_{m,n→∞} |Sm ∩ Bm,n | / |Bm,n | = 1,
where Sm is the collection of finite sets of elements of size m that generate a free sub-
group while Bm,n are all the m-element subsets within the n-ball. We refer to the pa-
per [70] and the book [93] for terminology and further definitions.
We say that a group G has the generic free group property if a finitely generated sub-
group is generically a free group. For example, a result by Epstein, see [25], says that the
group GL(n, ℝ) satisfies the generic free group property. A group G has the strong generic free group property if randomly chosen elements g1 , . . . , gn in G generically form a free basis for the free subgroup they generate. Jitsukawa, see [81], proved
that free groups have the strong generic free group property. That is, given k random
elements w1 , . . . , wk in the free group on y1 , . . . , yn , then with asymptotic density one the
elements w1 , . . . , wk are a free basis for the subgroup they generate. Compare this with the Nielsen–Schreier theorem, which says only that w1 , . . . , wk generate a free group, possibly of smaller rank. In
the context of the group randomizer protocols, the strong generic free group property
implies that if v1 (y1 , . . . , ym ), . . . , vk (y1 , . . . , ym ) have already been presented as challenge
words then the probability is approximately zero that a new challenge word v(y1 , . . . , ym )
lies in the subgroup generated by v1 , . . . , vk , and hence a homomorphism attack is nulli-
fied.
The strong generic free group property has been extended to many classes of groups
including surface groups by Fine, Myasnikov and Rosenberger, see [29]. Let us mention
some further results. Gilman, Myasnikov and Osin, see [77], showed that torsion-free
hyperbolic groups have the generic free group property. Myasnikov and Ushakov, see
[94], showed that pure braid groups Pn with n ≥ 3 also have the strong generic free
group property. We will show that all Fuchsian groups of finite co-volume and all braid
groups Bn with n ≥ 3 have the strong generic free group property.
Extremely useful in proving that a group has the generic or strong generic free
group property is the following, see Exercise 6.
Theorem 24.6.1. Let G be a group and N a normal subgroup. If the quotient G/N satisfies
the strong generic free group property then G also satisfies the strong generic free group
property.
For example, the orientable surface group of genus g ≥ 2,

⟨a1 , . . . , ag , b1 , . . . , bg ; ∏_{i=1}^{g} [ai , bi ] = 1⟩,

maps onto the non-Abelian free group on a1 , . . . , ag (kill the generators b1 , . . . , bg ), and hence satisfies the strong generic free group property by Theorem 24.6.1.
Corollary 24.6.3. The strong generic free group property is suitable in any finitely gener-
ated group G which has a non-Abelian free quotient.
We remark that in a group with the strong generic free group property the conjugacy problem and the root extraction problem are generic problems.
In [23] it was shown that there is an interesting connection between the strong
generic free group property of a group G and its subgroups of finite index. The main
result of that paper is that a finitely generated group which has a non-Abelian free quo-
tient satisfies the strong generic free group property if and only if each subgroup of finite
index satisfies the strong generic free group property. As a consequence of this and The-
orem 24.6.4, it follows that many important classes of groups, such as finitely generated
Fuchsian groups with finite co-volume and the braid groups Bn for n ≥ 3 satisfy the
strong generic free group property.
Theorem 24.6.4 (Inheritance Theorem). Let G be a finitely generated group and let H ⊂ G be a subgroup of finite index. (1) If the strong generic free group property holds and is suitable in H, then it holds in G. (2) If it holds and is suitable in G, then it holds in H.

Proof. Let X be a finite generating system for G. Since H has finite index in the finitely generated group G, it follows that H is finitely generated. Let Y be a finite generating system for H. Let 𝒫
be the strong generic free group property and suppose that 𝒫 is a suitable and generic
property in H. Let Sm be the collection of m element subsets that generate a free sub-
group of G.
Let Bk (G) be the ball of radius k in the Cayley graph of G (with respect to X). Since
H is a subgroup of finite index n in G, there exists a complete system of representatives
a1 , . . . , an ∈ G for the left cosets of H in G. We consider the elements of H as vertices in
the Cayley graph of G. Let Bk (H) be the set of vertices in Bk (G) which belong to H. For all i
let ai Bk (H) denote the displaced Bk (H) around the representative ai in the Cayley graph
of G, that is the set of all elements of the form ai h, where h ∈ H is of length ≤ k. Define
Bk′ (H) = ⋃ni=1 ai Bk (H) as the (disjoint) union of these Bk (H). We have |Bk′ (H)| = n⋅|Bk (H)|,
since the cosets ai H and also the ai Bk (H) with them are pairwise disjoint. Let t ∈ ℕ be
the length of the longest geodesic in the Cayley graph of G from the identity element 1 to
one of the representatives ai . With this t we have
B′k−t (H) ⊂ Bk (G) ⊂ B′k+t (H). (1)
Now let Bm,k (G) and ai Bm,k (H) be the collections of m-element subsets within Bk (G) and ai Bk (H), respectively, for i = 1, . . . , n. Let A be any m-element subset within Bk (G). Then A splits into the disjoint union ⋃ni=1 Ai of mi -element subsets Ai within ai Bk+t (H) for i = 1, . . . , n, and we have 0 ≤ mi ≤ m for all i (some of the mi may be zero).
In this sense, if we define B′m,k (H) = ⋃ni=1 ai Bmi ,k (H), m = m1 + ⋅ ⋅ ⋅ + mn , then we get the inclusions

B′m,k−t (H) ⊂ Bm,k (G) ⊂ B′m,k+t (H). (1′)
Here, we consider a disjoint union ⋃ni=1 Ai of mi -element subsets Ai in ai Bk−t (H) with m1 + ⋅ ⋅ ⋅ + mn = m as an m-element subset of Bk (G). If A is a free generating system for a
free subgroup of G, then each Ai is a free generating system for a free subgroup of G.
Then intersecting with Sm leads to
Sm ∩ B′m,k−t (H) ⊂ Sm ∩ Bm,k (G) ⊂ Sm ∩ B′m,k+t (H). (2)
On the other hand, if some Ai ⊂ ai Bk (H) contains a subset which generates a free subgroup of G, then also Aj = aj ai⁻¹ Ai ⊂ aj Bk (H) contains a subset which generates a free
subgroup of G. More concretely, if Ai freely generates a free subgroup of rank p, then
⟨Aj ⟩ has a p generating system which contains a basis for a free subgroup of rank at
least p − 1.
This shows that for k large enough the sets Sm ∩ ai Bm,k (H) are of the same order of
magnitude in m. Applying this we get approximately the equality
|Sm ∩ B′m,k (H)| / |B′m,k (H)| = |Sm ∩ Bm,k (H)| / |Bm,k (H)|. (3)
Assume that 𝒫 holds and is suitable in H. Then there exists a constant integer s > 0 such that the length of each y ∈ Y written as a word in X is less than s. Therefore the fraction on the right hand side of (3) converges to 1 as k → ∞ and m → ∞. Hence, by the inclusions (1′) and (2), the corresponding fraction for G converges to 1 as well, completing the proof of (1). The proof for (2) follows in an entirely analogous manner.
Corollary 24.6.5. Let G be a finitely generated group and H ⊂ G a subgroup of finite index.
Assume that both G and H have non-Abelian free quotients. Then G has the strong generic
free group property if and only if H has the strong generic free group property.
We now show the strong generic free group property for braid groups.
Theorem 24.6.6. The braid group Bn , n ≥ 3, has the strong generic free group property.
Proof. Denote by σi,i+1 the transposition (i, i + 1) in the symmetric group Sn . The map
σi → σi,i+1 , i = 1, . . . , n − 1, defines a canonical epimorphism π: Bn → Sn . The kernel of π is a subgroup of index n! in Bn , called the pure braid group PBn . The group PBn ,
n ≥ 3, maps onto the group PB3 , and the group PB3 is isomorphic to F2 × ℤ, where F2 is
the free group of rank 2. Hence, PBn , n ≥ 3, maps onto F2 . Now, the result follows from
Corollary 24.6.3 and the Inheritance Theorem 24.6.4.
The main problem with the cryptographic protocols based on braid groups turns
out to be the key generation.
Public and secret keys are so far chosen at random, and this often implies that the protocols are insecure against algorithms which are generically fast. The importance and the future of braid group cryptography depend on finding a suitable key generation procedure or, in popular words, on finding so-called suitable black holes.
Another promising possibility is to look for nongeneric properties of braid groups which
could be used for cryptographic protocols.
In order to analyze the security of the group randomizer password protocols, we make
the security assumption that an adversary has access to the coded group theoretic responses. A strength of the proposed protocol is that an attacker must steal three things: the original password, the group randomizer and the challenge group. There is no access without all three. This immediately nullifies man-in-the-middle attacks. If the adversary pretends to be the verifier to obtain the group words, the attack is thwarted by the fact that the prover can verify the verifier; further, if the attacker just relays transmissions from the middle, nothing can be stolen, since each time through a new challenge word must be used. Further, the group randomizer has an infinite supply of both subgroups and
challenge responses that are done randomly. In addition, since a challenge word can
be used only once the protocol nullifies replay attacks. Since challenge responses are
machine to machine there is essentially zero probability of an incorrect response. The
protocol shuts down with an incorrect response and hence repeat attacks are harmless.
This is in distinction to answer-driven challenge–response systems where a
prover often forgets or misspells a response. In these systems a prover is usually per-
mitted several opportunities to answer making it susceptible to both man-in-the-middle
and repeat attacks.
There are two theoretical attacks that must be dealt with. Relative to these the se-
curity of the system, and hence a security proof for the protocol, is provided by several
results in asymptotic group theory.
The most straightforward attack is for the adversary to collect enough challenge
words and responses. This provides a system of equations in a free group (or a finitely
presented group)
zi = wi (x1 , . . . , xn )
However, a result by Lysenok and Myasnikov, see [91], shows that solving such sys-
tems of equations in free groups (and in most finitely presented groups) is NP-hard.
Hence this method of attack is impractical in most cases.
A second method of attack is based on the following. The mapping yi → wi is a ho-
momorphism. If a challenge word appears in the subgroup generated by previous chal-
lenge words then an attacker can use this to answer a challenge without ever solving for
the challenge group. However, the probability of succeeding with this approach is essen-
tially zero due to Jitsukawa’s result mentioned in the previous section. Each challenge
word lies in a free group which has the strong generic free group property. Hence as ex-
plained in the previous section the probability is essentially zero that a new challenge
word is in the subgroup generated by previous challenge words.
A (t, n)-threshold secret sharing scheme is one with n total participants in which any t participants can combine their shares to recover the secret, but fewer than t cannot. The number t is called the threshold. The scheme is called a secure secret sharing scheme if, given fewer shares than the
threshold, there is no chance to recover the secret.
Panagopoulos, see [96], devised a secret sharing scheme based on the word problem
in finitely presented groups. It is a (t, n)-threshold scheme and its main advantage over
many other secret sharing schemes is that it does not require the secret message to be de-
termined before each individual person receives his share of the secret. For this scheme
it is assumed that the secret is given in the form of a binary sequence. The scheme is as
follows.
1. A finitely presented group G = ⟨x_1 , x_2 , . . . , x_k ; r_1 = ⋅ ⋅ ⋅ = r_m = 1⟩ is chosen. It is assumed that the word problem is solvable for the presentation and that m = C(n, t−1), the number of (t−1)-element subsets of the n participants.
Then any t of the n persons can obtain the sequence a1 , . . . , ak by taking the union of
the subsets of the relations of G that they possess. Thus they obtain the presentation
G = ⟨x1 , x2 , . . . , xk ; r1 = ⋅ ⋅ ⋅ = rm = 1⟩ and can solve the word problem wi = 1 in G for
i = 1, . . . , k. A collection of fewer than t persons cannot decode the message correctly,
since the union of fewer than t of the sets R1 , . . . , Rn contains some but not all of the
relations r1 , . . . , rm .
Such a collection leads to a group presentation

G̃ = ⟨x_1 , x_2 , . . . , x_k ; r_{j_1} = ⋅ ⋅ ⋅ = r_{j_p} = 1⟩
sage. Then any t persons may check whether this predetermined sequence is contained
in the encoded message and thus validate it.
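The combinatorial heart of such a scheme can be checked directly. One natural way to realize the count m = C(n, t−1) is to index the relators by the (t−1)-element subsets S of the n participants and to hand the relator r_S to every participant not in S; then any t participants jointly hold all relators, while any t−1 participants miss exactly one. The following sketch (the function name is ours, not from [96]) verifies this for n = 5, t = 3:

```python
from itertools import combinations

def distribute_relators(n, t):
    """Relator r_S is indexed by a (t-1)-subset S of the participants
    and is handed to every participant NOT in S."""
    shares = {p: set() for p in range(n)}
    for S in combinations(range(n), t - 1):
        for p in range(n):
            if p not in S:
                shares[p].add(S)
    return shares

n, t = 5, 3
shares = distribute_relators(n, t)
all_relators = set(combinations(range(n), t - 1))  # m = C(5, 2) = 10 relators

# Any t participants together hold all m relators ...
for group in combinations(range(n), t):
    assert set().union(*(shares[p] for p in group)) == all_relators

# ... while any t-1 participants miss exactly the relator indexed by themselves.
for group in combinations(range(n), t - 1):
    held = set().union(*(shares[p] for p in group))
    assert all_relators - held == {group}
```

Any t−1 participants miss precisely the relator indexed by their own subset, which is why the union of fewer than t shares never yields the full presentation.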
In [96], Panagopoulos also describes some methods for attacking this scheme and makes some suggestions for possible types of group presentations to use.
Moldenhauer [90] proposed a modification of Panagopoulos’ (t, n)-threshold scheme
using Nielsen transformations. We need the following.
T_j = ( −r_j    −1 + r_j^2 )
      (  1       −r_j      )

where the r_j are integers with r_1 ≥ 2 and r_{j+1} − r_j ≥ 3. Then T_1 , T_2 , . . . form a basis of a free group of countable rank.
Proof. The isometric circle of T_j is given by |z − r_j | = 1 and that of T_j^{−1} is given by |z + r_j | = 1. The respective isometric disks are pairwise disjoint because of the restriction on the r_j . Let F be the group generated by {T_1 , T_2 , . . . }. Clearly, F is a subgroup of SL(2, ℤ). Let S_k ⋅ ⋅ ⋅ S_1 be a reduced word in F; each S_i is some T_j or T_j^{−1} . It may happen that S_{i+1} = S_i , but never S_{i+1} = S_i^{−1} . Choose a point P that lies outside every isometric disk K(T_j ), K(T_j^{−1} ), j = 1, 2, . . . ; such a P exists because the disks are pairwise disjoint. Then S_1 (P) lies inside K(S_1^{−1} ). Since the word is reduced, S_2 ≠ S_1^{−1} , so S_1 (P) lies outside K(S_2 ); this is true even if S_1 = S_2 . Hence S_2 S_1 (P) lies inside K(S_2^{−1} ). Continuing in this way, we conclude that Q = S_k ⋅ ⋅ ⋅ S_1 (P) lies inside K(S_k^{−1} ), so Q ≠ P. Hence S_k ⋅ ⋅ ⋅ S_1 ≠ 1 (= E_2 ). This shows that F is free on {T_1 , T_2 , . . . }.
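The conclusion can be tested numerically: no product over a reduced word in the T_j returns to the identity. A small sketch of such a check (our own illustration, not part of the proof), with r_1 = 2, r_2 = 5, r_3 = 8:

```python
def T(r):
    # T_j = [[-r, -1 + r^2], [1, -r]]; note det = r^2 - (r^2 - 1) = 1
    return ((-r, r * r - 1), (1, -r))

def mul(A, B):
    return ((A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]),
            (A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]))

def inv(A):
    # inverse of a determinant-1 matrix: [[d, -b], [-c, a]]
    return ((A[1][1], -A[0][1]), (-A[1][0], A[0][0]))

I2 = ((1, 0), (0, 1))
T1, T2, T3 = T(2), T(5), T(8)
assert mul(T1, inv(T1)) == I2

# a few reduced words: none of them multiplies out to the identity
words = [
    [T1, T2],
    [T1, T1, T2],                 # repeated letters are allowed in a reduced word
    [inv(T2), T1, T3, inv(T1)],
]
for w in words:
    P = I2
    for S in w:
        P = mul(S, P)
    assert P != I2
```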
We now describe the modified (t, n)-threshold scheme. We write N_{r_j} instead of T_j and choose a large number m of the form m = 2^n , n ≥ 64. This allows us to use the idea
of linear congruence generators (modulo m) to get a stream cipher. The dealer performs
the following to distribute the secret among n participants:
1. Start with a set (x_1 , N_{r_1} ), . . . , (x_m , N_{r_m} ), where x_1 , . . . , x_m are the generators of the free group F(x_1 , . . . , x_m ) and N_{r_1} , . . . , N_{r_m} are matrices in SL(2, ℤ) of the form

N_{r_i} = ( −r_i    −1 + r_i^2 )
          (  1       −r_i      )

satisfying r_1 ≥ 2 and r_{i+1} ≥ r_i + 3 (more generally, any free generating set for a free subgroup in SL(2, ℚ)). The secret is a rational number

∑_{i=1}^m 1/|tr(N_{r_i})| = ∑_{i=1}^m 1/(2|r_i|).
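Since tr(N_{r_i}) = −2r_i, the secret can be evaluated with exact rational arithmetic. A minimal sketch, with sample parameters r_i = 2, 5, 8, 11 of our own choosing (a real instance would use m = 2^n matrices):

```python
from fractions import Fraction

def N(r):
    # N_{r_i} = [[-r, -1 + r^2], [1, -r]], an element of SL(2, Z)
    return ((-r, r * r - 1), (1, -r))

def trace(M):
    return M[0][0] + M[1][1]

rs = [2, 5, 8, 11]            # r_1 >= 2 and r_{i+1} >= r_i + 3
mats = [N(r) for r in rs]

secret = sum(Fraction(1, abs(trace(M))) for M in mats)

# trace(N_{r_i}) = -2 r_i, so the secret equals the sum of 1/(2 r_i)
assert secret == sum(Fraction(1, 2 * r) for r in rs)
print(secret)  # prints 403/880 for these sample parameters
```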
24.8 The Ko–Lee and AAG Protocols · 393
(ν1 , M1 ), . . . , (νm , Mm ).
3. Distribute subsets in Panagopoulos’ scheme. To recover the secret, perform the fol-
lowing:
4. Take the union of their shares. In the case that t participants gather, they are able to recover the set (ν_1 , M_1 ), . . . , (ν_m , M_m ).
5. Apply a sequence of Nielsen transformations to the obtained set of pairs in order
to Nielsen-reduce the first components and obtain the set x1 , . . . , xm in the first
components. As a result, in the second components, we get the original matrices
N_{r_1} , . . . , N_{r_m} . Compute the sum ∑_{i=1}^m 1/|tr(N_{r_i})|.
Kotov, Panteleev and Ushakov, see [87], analyzed this secret sharing protocol. They
could reduce it to a system of polynomial equations over the free group F({x1 , . . . , xm } ∪
{a1 , . . . , am−1 }) where xi stands for an unknown matrix Nri and ai stands for the ma-
trix Mi . Replacing xi with an unknown matrix Nri and ai with Mi and performing matrix
multiplication, we obtain a system of polynomial equations which can then be fed to any computer algebra system that can solve polynomial equations, for instance CoCoA.
The solution of the systems provides the original matrices M1 , . . . , Mm . The attack
reconstructs the original data generated by the dealer and does not depend on the function of M_1 , . . . , M_m used to calculate the shared secret. It seems unlikely that their attack is successful if m ≥ 2^64 . In any case, for chosen matrices N_{r_1} , . . . , N_{r_m} we may still pick in each round m new matrices from the countably many available and/or use the stream cipher for a one-time pad.
Moreover, increasing the key lengths and the number of Nielsen transformations increases the sizes of the polynomials, and this seems to be a successful countermeasure against their attack. Another possibility to repel such attacks is to change tactics and to work
with more general matrices Nr1 , Nr2 , . . . which form a free generating set of a free sub-
group in SL(k, ℝ), k ≥ 2.
Hence a solution to the conjugacy problem is usually associated with a particular class
of group presentations. For example, the conjugacy problem is solvable in free groups
and in torsion-free hyperbolic groups.
Relevant to the Ko–Lee protocol is the conjugator search problem. This is: given a group presentation for G and two elements g_1 , g_2 in G that are known to be conjugate, determine algorithmically a conjugator, that is, an element h ∈ G with g_1 = hg_2 h^{−1} . As with the decision conjugacy problem, it is known that the conjugator search problem is undecidable in general.
Ko, Lee et al., see [85], developed a public key exchange system that is a direct translation
of the Diffie–Hellman protocol to a non-Abelian group theoretic setting. Its security is
based on the difficulty of the conjugacy problem. We assume that the platform group has nice unique normal forms that are easy to compute for a given group element, but from which it is hard to recover the individual factors of a product.
Recall from Section 24.1 that by this we mean that if G = ⟨X; R⟩ is a finite presenta-
tion for the group G and g ∈ G then there is a unique expression NFX (g) called a normal
form as a word in the generators X. Further, given any g ∈ G it is computationally easy
to find NFX (g). On the other hand, given g1 , g2 ∈ G and given the normal form NFX (g1 g2 ),
it is computationally difficult to recover g1 and g2 . We say that there is good diffusion in
terms of normal forms in forming products.
In any group G and for g, h ∈ G the notation g^h indicates the conjugate of g by h, that is, g^h = h^{−1} gh. What is important for both the Ko–Lee and Anshel–Anshel–Goldfeld protocols is that, relative to this notation, group conjugation behaves exactly like ordinary exponentiation. That is, for group elements g, h_1 , h_2 ∈ G we have (g^{h_1})^{h_2} = g^{h_1 h_2} . That this is true is a straightforward computation:
(g^{h_1})^{h_2} = h_2^{−1} g^{h_1} h_2 = h_2^{−1} h_1^{−1} g h_1 h_2 = (h_1 h_2 )^{−1} g(h_1 h_2 ) = g^{h_1 h_2} .
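The exponent law can be checked mechanically in any concrete group; the following sketch uses 2 × 2 integer matrices of determinant 1 as a stand-in, with arbitrarily chosen sample matrices:

```python
def mul(A, B):
    return ((A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]),
            (A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]))

def inv(A):
    # inverse for determinant-1 matrices
    return ((A[1][1], -A[0][1]), (-A[1][0], A[0][0]))

def conj(g, h):
    # g^h = h^{-1} g h
    return mul(mul(inv(h), g), h)

g  = ((1, 2), (0, 1))
h1 = ((1, 0), (3, 1))
h2 = ((2, 1), (1, 1))   # all three have determinant 1

# (g^{h1})^{h2} == g^{h1 h2}
assert conj(conj(g, h1), h2) == conj(g, mul(h1, h2))
```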
With this observation, the Ko–Lee protocol exactly mimics, using group conjuga-
tion, the traditional Diffie–Hellman protocol. We first start with a platform group G sat-
isfying the necessary requirements on normal forms. We assume further that the plat-
form group G has a collection of large (noncyclic) subgroups that commute elementwise.
That is, if A, B are two of these subgroups and a ∈ A and b ∈ B, then ab = ba. It is not
necessary that the subgroups themselves be Abelian.
Alice and Bob choose a pair of these commuting subgroups A and B of the platform
group G. A is Alice’s subgroup while Bob’s subgroup is B and these are secret. By assump-
tion each element of A commutes with each element of B. Further, it is not assumed that
A and/or B are themselves Abelian. Now the method completely mimics the classical
Diffie–Hellman technique.
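Under these assumptions the exchange can be sketched concretely. As a toy platform (our own choice, not from [85], and with no normal forms modeled) take 4 × 4 integer matrices, with A the block matrices diag(M, I) and B the block matrices diag(I, N); then A and B commute elementwise although neither need be Abelian:

```python
def mul(A, B):
    n = len(A)
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def block_diag(M, N):
    # 4x4 matrix diag(M, N) built from two 2x2 blocks
    Z = ((0, 0), (0, 0))
    return tuple(M[i] + Z[i] for i in range(2)) + tuple(Z[i] + N[i] for i in range(2))

def inv2(M):
    # inverse of a 2x2 determinant-1 matrix
    return ((M[1][1], -M[0][1]), (-M[1][0], M[0][0]))

I2 = ((1, 0), (0, 1))
g = ((1, 2, 0, 1), (0, 1, 1, 0), (1, 0, 1, 1), (0, 1, 0, 1))  # public element

Ma = ((1, 3), (1, 4))                      # Alice's secret block, det = 1
Nb = ((2, 1), (3, 2))                      # Bob's secret block, det = 1
a, a_inv = block_diag(Ma, I2), block_diag(inv2(Ma), I2)
b, b_inv = block_diag(I2, Nb), block_diag(I2, inv2(Nb))
assert mul(a, b) == mul(b, a)              # A and B commute elementwise

u = mul(mul(a_inv, g), a)                  # Alice publishes g^a
v = mul(mul(b_inv, g), b)                  # Bob publishes g^b

key_alice = mul(mul(a_inv, v), a)          # (g^b)^a
key_bob   = mul(mul(b_inv, u), b)          # (g^a)^b
assert key_alice == key_bob                # shared key g^{ab} = g^{ba}
```

Alice publishes g^a, Bob publishes g^b, and both arrive at g^{ab} = g^{ba}; the final assertion checks the agreement.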
with Alice all public information and communication is done in terms of the normal
forms of these elements.
3. The secret shared key is g^{ab} .
A = {a1 , . . . , an }, B = {b1 , . . . , bm },
and make them public. The subgroup A is Alice’s subgroup while the subgroup B is Bob’s
subgroup.
Alice chooses a secret group word a = w(a1 , . . . , an ) in her subgroup while Bob
chooses a secret group word b = v(b1 , . . . , bm ) in his subgroup. As before, for an ele-
ment g ∈ G we denote by NFX (g) the normal form for g. Alice knows her secret word a
and knows the generators bi of Bob’s subgroup. She can then form the conjugates of the
generators of Bob’s subgroup B by her secret element a ∈ A. That is, she can compute b_i^a = a^{−1} b_i a for each b_i . She then makes public the normal forms of these conjugates NF_X (b_i^a ), i = 1, . . . , m.
Bob does the analogous thing. He knows his secret word b and the generators ai ,
i = 1, . . . , n of Alice’s subgroup A and hence can compute the conjugates a_i^b = b^{−1} a_i b for i = 1, . . . , n. He then makes public the normal forms of the conjugates

NF_X (a_j^b ), j = 1, . . . , n.
Notice that the commutator [a, b] = a^{−1} b^{−1} ab is known to both Alice and Bob. Alice knows a^b = b^{−1} ab since she knows a in terms of the generators a_i of her subgroup and she knows the conjugates by b, since Bob has made the conjugates of the generators of A by b public. That is, Alice knows a = w(a_1 , . . . , a_n ) and a^b = b^{−1} ab = w(b^{−1} a_1 b, . . . , b^{−1} a_n b) = w(a_1^b , . . . , a_n^b ). Since Alice knows a^b , she knows

[a, b] = a^{−1} (b^{−1} ab) = a^{−1} a^b .
In an analogous manner Bob knows [a, b] = (b^a )^{−1} b, since he knows his secret element b in terms of the generators b_j , j = 1, . . . , m, of his subgroup B and Alice has made public the conjugates of each of his generators by her secret element a. Hence b = v(b_1 , . . . , b_m ), so that b^a = v(b_1^a , . . . , b_m^a ), and this is known to Bob. Since Bob knows b^a and b, he knows
[a, b] = a^{−1} b^{−1} ab = (a^{−1} b^{−1} a)b = ((b^{−1})^a ) b = (b^a )^{−1} b.
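With 2 × 2 determinant-1 matrices as a toy platform (our choice; normal forms are again not modeled), one can check that Alice's value a^{−1}·a^b and Bob's value (b^a)^{−1}·b both equal the commutator [a, b]:

```python
def mul(A, B):
    return ((A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]),
            (A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]))

def inv(A):
    # inverse for determinant-1 matrices
    return ((A[1][1], -A[0][1]), (-A[1][0], A[0][0]))

def conj(g, h):
    return mul(mul(inv(h), g), h)   # g^h = h^{-1} g h

a = ((1, 2), (0, 1))
b = ((1, 0), (3, 1))

alice = mul(inv(a), conj(a, b))     # Alice: a^{-1} * a^b
bob   = mul(inv(conj(b, a)), b)     # Bob:   (b^a)^{-1} * b
comm  = mul(mul(mul(inv(a), inv(b)), a), b)   # [a, b] = a^{-1} b^{-1} a b

assert alice == bob == comm
```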
Notice that in this system there is no requirement that the chosen subgroups A and
B commute.
An attacker would have to know the corresponding conjugator, that is, the element that conjugates each of the generators. This is the conjugator search problem: given elements g, h in a group G, where it is known that g^k = k^{−1} gk = h, determine the conjugator k. It is known that this problem is undecidable in general, that is, there are groups
398 · 24 Non-Commutative Group Based Cryptography
where the conjugator cannot be determined algorithmically. On the other hand there
are groups where the conjugator search problem is solvable but difficult, that is, the
complexity of solving the conjugator search problem is hard. Such groups become the
ideal platform groups for the Anshel–Anshel–Goldfeld protocol.
The security in this system is then in the computational difficulty of the conjuga-
tor search problem. Anshel, Anshel, Goldfeld suggested, as did Ko, Lee et al., the braid
groups, Bn , as potential platforms. The braid groups are a class of infinite, finitely pre-
sented groups that arise in many different contexts. The braid group Bn has a standard
presentation with n − 1 generators.
The necessary parameters that must be decided in using the braid groups as plat-
forms for either the Ko–Lee protocol or the Anshel–Anshel–Goldfeld protocol are then
the number of generators of the braid groups used and the number of generators for
the chosen subgroups. For example, B_200 , the braid group on 200 strands, with 12 or more generators in the chosen subgroups, might be used. It has been shown that the larger
the number of strands, the harder it is to attack the protocol. The suggested use of the
braid groups by both Anshel, Anshel and Goldfeld and Ko and Lee led to the develop-
ment of braid group cryptography. There have been various attacks on the braid group
cryptosystems.
We now summarize the formal setup for the Anshel–Anshel–Goldfeld Key Exchange
Protocol. After this we will show how to use the ElGamal method to construct a public
key encryption system from this.
A = {a1 , . . . , an }, B = {b1 , . . . , bm },
and make them public. The subgroup A is Alice’s subgroup while the subgroup B is
Bob’s subgroup.
NF_X (b_i^a ), i = 1, . . . , m.
2. Bob chooses a secret group word b = w(b_1 , . . . , b_m ) in his subgroup. Bob knows his secret word b and knows the generators a_i of Alice’s subgroup. He can then form the conjugates of the generators of Alice’s subgroup A by his secret element b ∈ B. That is, he can compute a_i^b = b^{−1} a_i b for each a_i . He then makes public the normal forms of these conjugates

NF_X (a_i^b ), i = 1, . . . , n.
Exercises
1. Bob has a backup authentication security system as described in Section 24.5. His basic words are w_1 = x_1^{−1} x_2^2 x_3^{−2} , w_2 = x_1^5 x_2^3 , and w_3 = x_2^5 x_1^3 x_2^{−2} x_3^4 . The bank sends him w = y_2 y_3^3 y_1 . What must the group randomizer send back?
2. Let M = PSL(2, ℤ) be the modular group. Let 𝒜 = {a, b, c, d, e, f , g} be a 7-letter
plaintext alphabet. Choose a free subgroup of the modular group to encrypt these.
(a) Using your basic encryption and message units of size 3, what would be the
encryption matrices for the message abbdceffgcba?
(b) Using your basic encryption and the algorithm given in Problem 1, what is the
plaintext message for ( 85 35 ) and ( 73 49 )?
3. The following protocol is based on the factorization search problem which is: Given
two subgroups A, B of a group G and w ∈ G, to find a ∈ A, b ∈ B with w = ab. This
protocol is described in [93]. For this problem you must show and explain that the
protocol works.
The requirements for the protocol are as follows: a public group and two public
subgroups A, B that commute elementwise. Alice randomly chooses two private el-
ements a1 ∈ A and b1 ∈ B and sends a1 b1 to Bob. Bob does the same choosing a2 ∈ A
and b2 ∈ B and sends a2 b2 to Alice.
The common shared secret is K = a2 a1 b1 b2 .
4. Prove Epstein’s theorem: Given a random finitely generated subgroup of GL(n, ℝ), with probability 1 it is a free group. The probability is standard measure on ℝ^{n^2} .
Hint: Given a finite set of matrices in GL(n, ℝ), think what a relation between them
would mean algebraically on the coefficients and where this would place the matri-
ces topologically.
5. Let G = H1 ∗ ⋅ ⋅ ⋅ ∗ Hn with n ≥ 2 be the free product of finitely many nontrivial
groups. Suppose that |H_1 | ≥ 3 if n = 2. Show that G has the strong generic free group
property.
6. Let G be a group and N be a normal subgroup. Show: If the quotient G/N satisfies
the strong generic free group property then G also satisfies the strong generic free
group property.
7. Show that a group with a generating set X is an epimorphic image of F(X). Moreover,
every map X → G with G a group can be extended to a unique homomorphism
f : F(X) → G.
8. Let F be a free group on {x1 , . . . , xn }. Show that each conjugation xi → gxi g −1 with
g ∈ F can be written as a sequence of elementary Nielsen transformations.
9. Let F be a free group on {x1 , . . . , xn }. Show that the automorphism group Aut(F) is
generated by the elementary Nielsen transformations (N1) and (N2).
10. Let PBn stand for the pure braid group, n ≥ 3. Using the Reidemeister–Schreier
method, show that this group has a presentation with generators A_{ij} , 1 ≤ i < j ≤ n, and defining relations

A_{rs} A_{ij} A_{rs}^{−1} =
    A_{ij}                                          if s < i or j < r,
    A_{is} A_{ij} A_{is}^{−1}                       if i < j = r < s,
    A_{ij}^{−1} A_{ir}^{−1} A_{ij} A_{ir} A_{ij}    if i < r < j = s,
    A_{is}^{−1} A_{ir}^{−1} A_{is} A_{ir} A_{ij} A_{ir}^{−1} A_{is}^{−1} A_{ir} A_{is}    if i < r < j < s.
11. Show that the pure braid group PB3 is isomorphic to the direct product F2 × ℤ.
12. Let Fn be the free group of rank n on the free generating system X = {x1 , . . . , xn }
and let β ∈ Aut(Fn ). Show that β ∈ Bn if and only if β satisfies the following two
conditions:
(1) each β(x_i ) is conjugate to some generator x_j .
(2) β(x1 ⋅ ⋅ ⋅ xn ) = x1 ⋅ ⋅ ⋅ xn .
Exercises � 401
13. Let G be B20 , the braid group on 19 generators σ1 , . . . , σ19 . Let A be the subgroup
generated by σ1 , . . . , σ5 and B the subgroup generated by σ16 , . . . , σ19 .
Let g = σ_7^3 σ_1^2 σ_3^{−1} σ_5^{−2} σ_{10} , a = σ_2^4 σ_3^2 σ_1 , and b = σ_{17}^4 σ_{18}^{−1} σ_{17} .
(a) What is the secret shared key using the Ko–Lee protocol?
(b) What is the secret shared key using the Anshel–Anshel–Goldfeld protocol?
Bibliography
General Abstract Algebra
https://doi.org/10.1515/9783111142524-025
Number Theory
Cryptography
[62] I. Anshel, M. Anshel, and D. Goldfeld, An algebraic method for public key cryptography, Math. Res. Lett.,
6, 287–291, 1999.
[63] G. Baumslag, Y. Brjukhov, B. Fine, and G. Rosenberger, Some cryptoprimitives for noncommutative
algebraic cryptography, in Aspects of Infinite Groups, 26–44, World Scientific Press, 2009.
[64] G. Baumslag, Y. Brjukhov, B. Fine, and D. Troeger, Challenge response password security using
combinatorial group theory, Groups Complex. Cryptol., 2, 67–81, 2010.
[65] G. Baumslag, T. Camps, B. Fine, G. Rosenberger, and X. Xu, Designing key transport protocols using
combinatorial group theory, Contemp. Math., 418, 35–43, 2006.
[66] G. Baumslag, B. Fine, M. Kreuzer, and G. Rosenberger, A Course in Mathematical Cryptography,
De Gruyter, 2015.
[67] G. Baumslag, B. Fine, and X. Xu, Cryptosystems using linear groups, Appl. Algebra Eng. Commun.
Comput., 17, 205–217, 2006.
[68] G. Baumslag, B. Fine, and X. Xu, A proposed public key cryptosystem using the modular group,
Contemp. Math., 421, 35–44, 2007.
[69] J. Birman, Braids, Links and Mapping Class Groups, Annals of Math Studies, 82, Princeton University
Press, 1975.
[70] A. V. Borovik, A. G. Myasnikov, and V. Shpilrain, Measuring sets in infinite groups, in Computational and
Statistical Group Theory, Contemp. Math., 298, 21–42, 2002.
[71] J. A. Buchmann, Introduction to Cryptography, Springer, 2004.
[72] T. Camps, Surface braid groups as platform groups and applications in cryptography, Ph. D. thesis,
Universität Dortmund, 2009.
[73] R. E. Crandall and C. Pomerance, Prime Numbers. A Computational Perspective, 2nd ed.,
Springer-Verlag, 2005.
[74] P. Dehornoy, Braid-based cryptography, Contemp. Math., 360, 5–34, 2004.
[75] B. Eick and D. Kahrobaei, Polycyclic groups: A new platform for cryptology?, arXiv:math/0411077, 1–7,
2004.
[76] D. Garber, Braid group cryptography, World Scientific Review Volume, arXiv:0711.3941, 2008.
[77] R. Gilman, A. G. Myasnikov, and D. Osin, Exponentially generic subsets of groups, Ill. J. Math., 54,
371–388, 2010.
[78] M. I. Gonzalez Vasco and R. Steinwandt, Group Theoretic Cryptography, Chapman & Hall, 2015.
[79] D. Grigoriev and I. Ponomarenko, Homomorphic public-key cryptosystems over groups and rings, Quad.
Mat., 2005.
[80] P. Hoffman, Archimedes’ Revenge, W. W. Norton & Company, 1988.
[81] T. Jitsukawa, Malnormal subgroups of free groups, Contemp. Math., 298, 83–96, 2002.
[82] D. Kahrobaei and B. Khan, A non-commutative generalization of the El-Gamal key exchange using
polycyclic groups, in Proceedings of IEEE, 1–5, 2006.
[83] I. Kapovich and A. Myasnikov, Stallings foldings and subgroups of free groups, J. Algebra, 248, 608–668,
2002.
[84] K. H. Ko, D. Choi, M. Cho, and J. Lee, New signature scheme using conjugacy problem, IACR Cryptology
ePrint Archive, 168, 1–13, 2002.
[85] K. H. Ko, S. J. Lee, J. H. Cheon, J. H. Han, J. S. Kang, and C. Park, New public-key cryptosystems using
Braid groups, in Advances in Cryptography, Proceedings of Crypto 2000, Lecture Notes in Computer
Science, 1880, 166–183, 2000.
[86] N. Koblitz, Algebraic Methods of Cryptography, Springer, 1998.
[87] M. Kotov, D. Panteleev, and A. Ushakov, Analysis of secret sharing schemes based on Nielsen transformations, Groups Complex. Cryptol., 10, 1–8, 2018.
[88] S. Lal and A. Chaturvedi, Authentication schemes using braid groups, arXiv:cs/0507066, 2005.
[89] W. Magnus, Rational Representations of Fuchsian Groups and Non-Parabolic Subgroups of the Modular
Group, Nachrichten der Akad. Göttingen, 179–189, 1973.
[90] A. Moldenhauer, Cryptographic protocols based on inner product spaces and group theory with a special
focus on the use of Nielsen transformations, Ph. D. thesis, University of Hamburg, 2016.
[91] I. G. Lysenok and A. G. Myasnikov, A polynomial bound on solutions of quadratic equations in free
groups, Proc. Steklov Inst. Math., 274, 136–173, 2011.
[92] A. G. Myasnikov, V. Shpilrain, and A. Ushakov, A practical attack on some braid group based
cryptographic protocols, in CRYPTO 2005, Lecture Notes in Computer Science, 3621, 86–96, 2005.
[93] A. G. Myasnikov, V. Shpilrain, and A. Ushakov, Group-Based Cryptography, Advanced Courses in
Mathematics, CRM, Barcelona, 2007.
[94] A. D. Myasnikov and A. Ushakov, Length based attack and braid groups: Cryptanalysis of
Anshel–Anshel–Goldfeld key exchange protocol, Lect. Notes Comput. Sci., 4450, 76–88, 2007.
[95] G. Petrides, Cryptanalysis of the public key cryptosystem based on the word problem on the Grigorchuk groups, in Cryptography and Coding, Lecture Notes in Computer Science, 2898, 234–244, 2003.
[96] D. Panagopoulos, A secret sharing scheme using groups, arXiv:1009.0026, 2010.
[97] J.-J. Quisquater, L. C. Guillou, and T. A. Berson, How to explain zero-knowledge protocols to your children, in Advances in Cryptology – CRYPTO ’89 Proceedings, Lecture Notes in Computer Science, 435, 628–631, 1990.
[98] C. E. Shannon, Communication theory of secrecy systems, Bell Syst. Tech. J., 28, 656–715, 1949.
[99] V. Shpilrain and A. Ushakov, The conjugacy search problem in public key cryptography: unnecessary and insufficient, Appl. Algebra Eng. Commun. Comput., 17, 285–289, 2006.
[100] V. Shpilrain and A. Zapata, Using the subgroup membership problem in public key cryptography, Contemp. Math., 418, 169–179, 2006.
[101] R. Steinwandt, Loopholes in two public key cryptosystems using the modular groups, preprint, University
of Karlsruhe, 2000.
[102] D. R. Stinson, Cryptography: Theory and Practice, Chapman and Hall, 2002.
[103] N. R. Wagner and M. R. Magyarik, A public-key cryptosystem based on the word problem, in Advances in
Cryptology, 19–36, 1985.
[104] M. J. Wiener, Cryptanalysis of short RSA secret exponents, IEEE Trans. Inf. Theory, 36, 553–558, 1990.
[105] X. Xu, Cryptography and infinite group theory, Ph. D. thesis, CUNY, 2006.
[106] A. Yamamura, Public key cryptosystems using the modular group, in Public Key Cryptography, Lecture
Notes in Computer Sciences, 1431, 203–216, 1998.
Index
Abelian group 2, 97
Abelianization 177
affine coordinate ring 318
algebra 330
algebraic closure 72, 88, 92
algebraic extension 68
algebraic geometry 312
algebraic integer 295
algebraic number field 297
algebraic numbers 66, 73
algebraic variety 312
algebraically closed 88, 91
alternating group 166
annihilator 271
associates 34
attack
– length based 376
automorphism 11
axiom of choice 25
axiom of well-ordering 25
basis theorem for finite Abelian groups 150, 286
Betti number 288
Burnside’s theorem 350
conjugacy search problem 379
conjugation in groups 140
conjugator search problem 370
constructible number 78
construction of a regular n-gon 82
coset 17, 127
cryptosystem
– free group 372
cyclic group 121
cyclotomic field 253
decision conjugacy problem 379
Dedekind domain 48
degree of a representation 343
derived series 177
dihedral groups 154
dimension of an algebraic set 319
direct summand 328
divisibility 28
division algorithm 29
division ring 107, 336
doubling the cube 81
dual module 332
Dyck’s theorem 217
https://doi.org/10.1515/9783111142524-026