Abstract Algebra

The document is a comprehensive introduction to abstract algebra, covering essential topics such as groups, rings, fields, and Galois theory, aimed at advanced undergraduate and beginning graduate students. The third edition includes updated material on skew field extensions, group representations, and cryptography, while maintaining a structured approach to teaching algebraic concepts. It serves as a foundational text for understanding advanced mathematics and its applications in various fields.
Gerhard Rosenberger, Annika Schürenberg, Leonard Wienke

Abstract Algebra
Gerhard Rosenberger, Annika Schürenberg,
Leonard Wienke

Abstract Algebra


With Applications to Galois Theory, Algebraic Geometry,
Representation Theory and Cryptography

3rd edition
Mathematics Subject Classification 2020
Primary: 11-01, 12-01, 13-01, 14-01, 16-01, 20-01, 20C15; Secondary: 01-01, 08-01, 94-01

Authors
Prof. Dr. Gerhard Rosenberger
University of Hamburg
Bundesstr. 55
20146 Hamburg
Germany

Dr. Leonard Wienke
University of Bremen
Bibliothekstr. 5
28359 Bremen
Germany

Annika Schürenberg
Grundschule Hoheluft
Wrangelstr. 80
20253 Hamburg
Germany

ISBN 978-3-11-113951-7
e-ISBN (PDF) 978-3-11-114252-4
e-ISBN (EPUB) 978-3-11-114284-5

Library of Congress Control Number: 2024933441

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2024 Walter de Gruyter GmbH, Berlin/Boston


Cover image: Comstock / Stockbyte / Getty Images and sakkmesterke / iStock / Getty Images Plus
Typesetting: VTeX UAB, Lithuania
Printing and binding: CPI books GmbH, Leck

www.degruyter.com
Preface
Traditionally, mathematics has been separated into three main areas: algebra, analysis,
and geometry. Of course, there is a great deal of overlap between these areas. In gen-
eral, algebraic methods and symbolism pervade all of mathematics, and it is essential
for anyone learning any advanced mathematics to be familiar with the concepts and
methods in abstract algebra.
This is an introductory text on abstract algebra. It grew out of courses given to ad-
vanced undergraduate and beginning graduate students in the United States, and to
mathematics students and teachers in Germany. We assume that the reader is famil-
iar with calculus and with some linear algebra, primarily matrix algebra and the basic
concepts of vector spaces, bases, and dimensions. All other necessary material is intro-
duced and explained in the book. Our expectation is that the material in this text can be
completed in a full year’s course.
We present the material sequentially, so that polynomials and field extensions pre-
cede an in-depth look at advanced topics in group theory and Galois theory. This text
follows the new approach of conveying abstract algebra starting with rings and fields,
rather than with groups. Our teaching experience shows that examples of groups seem
rather abstract and require a certain formal framework and mathematical maturity that
would distract a course from its main objectives. The idea is that the integers provide
the most natural example of an algebraic structure that students know from school.
A student who goes through ring theory first will attain a solid background in abstract
algebra and will be able to move on to more advanced topics.
The centerpiece of our book is the development of Galois theory and its important
applications, especially the insolvability of the quintic polynomial. After introducing the
basic algebraic structures, groups, rings, and fields, we begin with the theory of polyno-
mials and polynomial equations over fields. We then develop the main ideas of field
extensions and adjoining elements to fields. Since the second edition, we include added
material on skew field extensions of ℂ and Frobenius’s theorem.
After this, we present the necessary material from group theory needed to complete
both the insolvability of the quintic polynomial and solvability by radicals in general.
Hence, the middle part of the book, Chapters 9 through 14, is concerned with group
theory, including permutation groups, solvable groups, Abelian groups, and group ac-
tions. Chapter 14 is somewhat off to the side of the main theme of the book. Here, we give
a brief introduction to free groups, group presentations, and combinatorial group the-
ory. In this third edition, we have extended Chapter 14 to include a primer on hyperbolic
groups. With the group theory material at hand, we return to Galois theory and study
general normal and separable extensions, and the fundamental theorem of Galois the-
ory. Using this approach, we present several major applications of the theory, including
solvability by radicals and the insolvability of the quintic, the fundamental theorem of
algebra, the construction of regular n-gons, and the famous impossibilities: squaring the
circle, doubling the cube, and trisecting an angle.

https://doi.org/10.1515/9783111142524-201

We continue with the theory of modules and prove the fundamental theorem for
finitely generated modules over principal ideal domains. We then consider transcendental
field extensions and prove Noether’s normalization theorem as preparation for
algebraic geometry based on Hilbert’s basis theorem and the nullstellensatz, and de-
scribe several applications. Since the second edition, we include a new chapter on al-
gebras and group representations. We finish in a slightly different direction, giving an
introduction to algebraic and noncommutative group-based cryptography. In this third
edition, we have devoted a modernized chapter to each of these topics including recent
developments and results.
In the bibliography we choose to mention some interesting books and papers which
are not used explicitly in our exposition but are very much related to the topics of the
present book and could be helpful for additional reading.
We were very pleased with the response to the second edition of this book, and we
were very happy to do a third edition. In this third edition, we have added the extensions
mentioned above, cleaned up various typos pointed out by readers, and have incorpo-
rated their suggestions. Here, we have to give a special thank you to Ahmad Mirzay and
O-joung Kwon. We would also like to thank Anja Rosenberger, who helped tremendously
with editing and LaTeX, and who made some invaluable suggestions about the contents.
Last but not least, we thank De Gruyter for publishing our book.

June 2024 Gerhard Rosenberger


Annika Schürenberg
Leonard Wienke
Contents
Preface

1 Groups, Rings and Fields
1.1 Abstract Algebra
1.2 Rings
1.3 Integral Domains and Fields
1.4 Subrings and Ideals
1.5 Factor Rings and Ring Homomorphisms
1.6 Fields of Fractions
1.7 Characteristic and Prime Rings
1.8 Groups
1.9 Exercises

2 Maximal and Prime Ideals
2.1 Maximal and Prime Ideals of the Integers
2.2 Prime Ideals and Integral Domains
2.3 Maximal Ideals and Fields
2.4 The Existence of Maximal Ideals
2.5 Principal Ideals and Principal Ideal Domains
2.6 Exercises

3 Prime Elements and Unique Factorization Domains
3.1 The Fundamental Theorem of Arithmetic
3.2 Prime Elements, Units and Irreducibles
3.3 Unique Factorization Domains
3.4 Principal Ideal Domains and Unique Factorization
3.5 Euclidean Domains
3.6 Overview of Integral Domains
3.7 Exercises

4 Polynomials and Polynomial Rings
4.1 Degrees, Reducibility and Roots
4.2 Polynomial Rings over Fields
4.3 Polynomial Rings over Integral Domains
4.4 Polynomial Rings over Unique Factorization Domains
4.5 Exercises

5 Field Extensions
5.1 Extension Fields and Finite Extensions
5.2 Finite and Algebraic Extensions
5.3 Minimal Polynomials and Simple Extensions
5.4 Algebraic Closures
5.5 Algebraic and Transcendental Numbers
5.6 Exercises

6 Field Extensions and Compass and Straightedge Constructions
6.1 Geometric Constructions
6.2 Constructible Numbers and Field Extensions
6.3 Four Classical Construction Problems
6.3.1 Squaring the Circle
6.3.2 The Doubling of the Cube
6.3.3 The Trisection of an Angle
6.3.4 Construction of a Regular n-Gon
6.4 Exercises

7 Kronecker’s Theorem and Algebraic Closures
7.1 Kronecker’s Theorem
7.2 Algebraic Closures and Algebraically Closed Fields
7.3 The Fundamental Theorem of Algebra
7.3.1 Splitting Fields
7.3.2 Permutations and Symmetric Polynomials
7.4 The Fundamental Theorem of Symmetric Polynomials
7.5 Skew Field Extensions of ℂ and the Frobenius Theorem
7.6 Exercises

8 Splitting Fields and Normal Extensions
8.1 Splitting Fields
8.2 Normal Extensions
8.3 Exercises

9 Groups, Subgroups and Examples
9.1 Groups, Subgroups and Isomorphisms
9.2 Examples of Groups
9.3 Permutation Groups
9.4 Cosets and Lagrange’s Theorem
9.5 Generators and Cyclic Groups
9.6 Exercises

10 Normal Subgroups, Factor Groups and Direct Products
10.1 Normal Subgroups and Factor Groups
10.2 The Group Isomorphism Theorems
10.3 Direct Products of Groups
10.4 Finite Abelian Groups
10.5 Some Properties of Finite Groups
10.6 Automorphisms of a Group
10.7 Exercises

11 Symmetric and Alternating Groups
11.1 Symmetric Groups and Cycle Decomposition
11.2 Parity and the Alternating Groups
11.3 The Conjugation in Sn
11.4 The Simplicity of An
11.5 Exercises

12 Solvable Groups
12.1 Solvability and Solvable Groups
12.2 The Derived Series
12.3 Composition Series and the Jordan–Hölder Theorem
12.4 Exercises

13 Group Actions and the Sylow Theorems
13.1 Group Actions
13.2 Conjugacy Classes and the Class Equation
13.3 The Sylow Theorems
13.4 Some Applications of the Sylow Theorems
13.5 Exercises

14 Free Groups and Group Presentations
14.1 Group Presentations and Combinatorial Group Theory
14.2 Free Groups
14.3 Group Presentations
14.3.1 The Modular Group
14.4 Presentations of Subgroups
14.5 Geometric Interpretation
14.6 Presentations of Factor Groups
14.7 Decision Problems
14.8 Group Amalgams: Free Products and Direct Products
14.9 Exercises

15 Finite Galois Extensions
15.1 Galois Theory and the Solvability of Polynomial Equations
15.2 Automorphism Groups of Field Extensions
15.3 Finite Galois Extensions
15.4 The Fundamental Theorem of Galois Theory
15.5 Exercises

16 Separable Field Extensions
16.1 Separability of Fields and Polynomials
16.2 Perfect Fields
16.3 Finite Fields
16.4 Separable Extensions
16.5 Separability and Galois Extensions
16.6 The Primitive Element Theorem
16.7 Exercises

17 Applications of Galois Theory
17.1 Field Extensions by Radicals
17.2 Cyclotomic Extensions
17.3 Solvability and Galois Extensions
17.4 The Insolvability of the Quintic Polynomial
17.5 Constructibility of Regular n-Gons
17.6 The Fundamental Theorem of Algebra
17.7 Exercises

18 The Theory of Modules
18.1 Modules over Rings
18.2 Annihilators and Torsion
18.3 Direct Products and Direct Sums of Modules
18.4 Free Modules
18.5 Modules over Principal Ideal Domains
18.6 The Fundamental Theorem for Finitely Generated Modules
18.7 Exercises

19 Finitely Generated Abelian Groups
19.1 Finite Abelian Groups
19.2 The Fundamental Theorem: p-Primary Components
19.3 The Fundamental Theorem: Elementary Divisors
19.4 Exercises

20 Integral and Transcendental Extensions
20.1 The Ring of Algebraic Integers
20.2 Integral Ring Extensions
20.3 Transcendental Field Extensions
20.4 The Transcendence of e and π
20.5 Exercises

21 The Hilbert Basis Theorem and the Nullstellensatz
21.1 Algebraic Geometry
21.2 Algebraic Varieties and Radicals
21.3 The Hilbert Basis Theorem
21.4 The Nullstellensatz
21.5 Applications and Consequences of Hilbert’s Theorems
21.6 Dimensions
21.7 Exercises

22 Algebras and Group Representations
22.1 Group Representations
22.2 Representations and Modules
22.3 Semisimple Algebras and Wedderburn’s Theorem
22.4 Ordinary Representations, Characters and Character Theory
22.5 Burnside’s Theorem
22.6 Exercises

23 Algebraic Cryptography
23.1 Basic Algebraic Cryptography
23.1.1 Cryptosystems Tied to Abelian Groups
23.1.2 Cryptographic Protocols

24 Non-Commutative Group Based Cryptography
24.1 Group Based Methods
24.2 Initial Group Theoretic Cryptosystems—The Magnus Method
24.2.1 The Wagner–Magyarik Method
24.3 Free Group Cryptosystems
24.4 Non-Abelian Digital Signature Procedure
24.5 Password Authentication Using Combinatorial Group Theory
24.5.1 General Outline of the Authentication Protocol
24.5.2 Free Subgroup Method
24.5.3 General Finitely Presented Group Method
24.6 The Strong Generic Free Group Property
24.6.1 Security Analysis of the Group Randomizer Protocols
24.6.2 Implementation of a Group Randomizer System Protocol
24.7 A Secret Sharing Scheme Using Combinatorial Group Theory
24.8 Ko–Lee and Anshel–Anshel–Goldfeld Protocols
24.8.1 The Ko–Lee Protocol
24.8.2 The Anshel–Anshel–Goldfeld Protocol

Bibliography

Index

1 Groups, Rings and Fields
1.1 Abstract Algebra
Abstract algebra or modern algebra can be best described as the theory of algebraic
structures. Briefly, an algebraic structure is a set together with one or more binary oper-
ations on it satisfying axioms governing the operations. There are many algebraic struc-
tures, but the most commonly studied structures are groups, rings, fields, and vector
spaces. Modules and algebras are also widely used. In this first chapter, we will look at
some basic preliminaries concerning groups, rings, and fields. We will only briefly touch
on groups here; a more extensive treatment will be done later in the book.
Mathematics traditionally has been subdivided into three main areas—analysis, al-
gebra, and geometry. These areas overlap in many places so that it is often difficult, for
example, to determine whether a topic is one in geometry or in analysis. Algebra and
algebraic methods permeate all these disciplines and most of mathematics has been al-
gebraicized; that is, uses the methods and language of algebra. Groups, rings, and fields
play a major role in the study of analysis, topology, geometry, and even applied mathe-
matics. We will see these connections in examples throughout the book.
Abstract algebra has its origins in two main areas and questions that arose in these
areas—the theory of numbers and the theory of equations. The theory of numbers deals
with the properties of the basic number systems—integers, rationals, and reals, whereas
the theory of equations, as the name indicates, deals with solving equations, in partic-
ular, polynomial equations. Both are subjects that date back to classical times. A whole
section of Euclid’s Elements is dedicated to number theory. The foundations for the modern
study of number theory were laid by Fermat in the 1600s, and then by Gauss in the
1800s. In an attempt to prove Fermat’s last theorem, Gauss introduced the complex integers
a + bi, where a and b are integers, and showed that this set has unique factorization.
These ideas were extended by Dedekind and Kronecker, who developed a wide ranging
theory of algebraic number fields and algebraic integers. A large portion of the termi-
nology used in abstract algebra, such as rings, ideals, and factorization, comes from the
study of algebraic number fields. This has evolved into the modern discipline of alge-
braic number theory.
The second origin of modern abstract algebra was the problem of trying to deter-
mine a formula for finding the solutions in terms of radicals of a fifth degree polynomial.
It was proved first by Ruffini in 1800, and then by Abel that it is impossible to find a for-
mula in terms of radicals for such a solution. Galois in 1820 extended this and showed
that such a formula is impossible for any degree five or greater. In proving this, he laid
the groundwork for much of the development of modern abstract algebra, especially
field theory and finite group theory. Earlier, in 1800, Gauss proved the fundamental the-
orem of algebra, which says that any nonconstant complex polynomial equation must
have a solution. One of the goals of this book is to present a comprehensive treatment
of Galois theory and a proof of the results mentioned above.

https://doi.org/10.1515/9783111142524-001

The locus of real points (x, y), which satisfy a polynomial equation f (x, y) = 0, is
called an algebraic plane curve. Algebraic geometry deals with the study of algebraic
plane curves and extensions to loci in a higher number of variables. Algebraic geometry
is intricately tied to abstract algebra and especially commutative algebra. We will touch
on this in the book also.
Finally, linear algebra, although a part of abstract algebra, arose in a somewhat different
context. Historically, it grew out of the study of solution sets of systems of linear
equations and the study of the geometry of real n-dimensional spaces. It began to be
developed formally in the early 1800s with work of Jordan and Gauss, and then later in
the century by Cayley, Hamilton, and Sylvester.

1.2 Rings
The primary motivating examples for algebraic structures are the basic number sys-
tems: the integers ℤ, the rational numbers ℚ, the real numbers ℝ, and the complex
numbers ℂ. Each of these has two basic operations, addition and multiplication, and
forms what is called a ring. We formally define this.

Definition 1.2.1. A ring is a set R with two binary operations defined on it: addition,
denoted by +, and multiplication, denoted by ⋅, or just by juxtaposition, satisfying the
following six axioms:
(1) Addition is commutative: a + b = b + a for each pair a, b in R.
(2) Addition is associative: a + (b + c) = (a + b) + c for a, b, c ∈ R.
(3) There exists an additive identity, denoted by 0, such that a + 0 = a for each a ∈ R.
(4) For each a ∈ R, there exists an additive inverse, denoted by −a, such that a + (−a) = 0.
(5) Multiplication is associative: a(bc) = (ab)c for a, b, c ∈ R.
(6) Multiplication is left and right distributive over addition: a(b + c) = ab + ac, and
(b + c)a = ba + ca for a, b, c ∈ R.

The ring R is commutative if


(7) Multiplication is commutative: ab = ba for a, b in R.

We call R a ring with identity if


(8) There exists a multiplicative identity denoted by 1 such that a ⋅ 1 = a and 1 ⋅ a = a
for each a in R.

If R satisfies (1) through (8), then R is a commutative ring with identity.

A set G with one operation, +, on it satisfying axioms (1) through (4) is called an
Abelian group. We will discuss these further later in the chapter.
The number systems ℤ, ℚ, ℝ, ℂ are commutative rings with identity.
A ring R with only one element is called trivial. A ring R with identity is trivial if and
only if 0 = 1. A finite ring is a ring R with only finitely many elements in it. Otherwise, R is
an infinite ring. ℤ, ℚ, ℝ, ℂ are all infinite rings. Examples of finite rings are given by the
integers modulo n, ℤn , with n > 1. The ring ℤn consists of the elements 0, 1, 2, . . . , n − 1
with addition and multiplication done modulo n. That is, for example 4 ⋅ 3 = 12 = 2
modulo 5. Hence, in ℤ5 , we have 4 ⋅ 3 = 2. The rings ℤn are all finite commutative rings
with identity.
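
The arithmetic of ℤn described above can be sketched in a few lines of Python. This is only an illustration: the function names add_mod and mul_mod are our own and do not appear in the text, and each element of ℤn is represented by its remainder in 0, 1, ..., n − 1.

```python
def add_mod(a: int, b: int, n: int) -> int:
    """Add a and b in Z_n, returning the representative in 0..n-1."""
    return (a + b) % n


def mul_mod(a: int, b: int, n: int) -> int:
    """Multiply a and b in Z_n, returning the representative in 0..n-1."""
    return (a * b) % n


# The example from the text: 4 * 3 = 12 = 2 modulo 5.
print(mul_mod(4, 3, 5))
```

Reducing after every operation keeps each result inside the finite set {0, 1, ..., n − 1}, which is why ℤn is a finite ring.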
To give examples of rings without an identity, consider the set nℤ = {nz : z ∈ ℤ}
consisting of all multiples of the fixed integer n. It is an easy verification (see exercises)
that this forms a ring under the same addition and multiplication as in ℤ, but that there
is no identity for multiplication. Hence, for each n ∈ ℤ with n > 1, we get an infinite
commutative ring without an identity.
To obtain examples of noncommutative rings, we consider matrices. Let M(2, ℤ) be
the set of (2 × 2)-matrices with integral entries. Addition of matrices is done component-
wise; that is,

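\[
\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}
+ \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}
= \begin{pmatrix} a_1 + a_2 & b_1 + b_2 \\ c_1 + c_2 & d_1 + d_2 \end{pmatrix},
\]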

whereas multiplication is matrix multiplication

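\[
\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}
\cdot \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}
= \begin{pmatrix} a_1 a_2 + b_1 c_2 & a_1 b_2 + b_1 d_2 \\ c_1 a_2 + d_1 c_2 & c_1 b_2 + d_1 d_2 \end{pmatrix}.
\]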

Then again, it is an easy verification (see exercises) that M(2, ℤ) forms a ring. Further,
since matrix multiplication is noncommutative, this forms a noncommutative ring.
However, the identity matrix does form a multiplicative identity for it. M(2, nℤ) with
n > 1 provides an example of an infinite noncommutative ring without an identity.
Finally, M(2, ℤn) for n > 1 will give an example of a finite noncommutative ring.
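
To see the noncommutativity of M(2, ℤ) concretely, the two matrix operations can be sketched in Python. This is a minimal illustration under our own conventions: a matrix is stored as a pair of rows, and the names mat_add, mat_mul, A, and B are hypothetical choices, not notation from the text.

```python
def mat_add(x, y):
    """Componentwise sum of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = x
    (a2, b2), (c2, d2) = y
    return ((a1 + a2, b1 + b2), (c1 + c2, d1 + d2))


def mat_mul(x, y):
    """Matrix product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = x
    (a2, b2), (c2, d2) = y
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))
IDENTITY = ((1, 0), (0, 1))

print(mat_mul(A, B))  # ((2, 1), (1, 1))
print(mat_mul(B, A))  # ((1, 1), (1, 2))
```

Since the two products differ, multiplication in M(2, ℤ) is not commutative, while IDENTITY acts as a multiplicative identity on both sides.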

1.3 Integral Domains and Fields


Our basic number systems have the property that if ab = 0, then either a = 0, or b = 0.
However, this is not necessarily true in the modular rings. For example, 2 ⋅ 3 = 0 in ℤ6 .

Definition 1.3.1. A zero divisor in a ring R is an element a ∈ R with a ≠ 0 such that there
exists an element b ≠ 0 with ab = 0. A commutative ring with an identity 1 ≠ 0 and with
no zero divisors is called an integral domain.

Notice that having no zero divisors is equivalent to the fact that if ab = 0 in R, then
either a = 0, or b = 0.
Hence, ℤ, ℚ, ℝ, ℂ are all integral domains, but from the example above, ℤ6 is not.
In general, we have the following:

Theorem 1.3.2. ℤn is an integral domain if and only if n is a prime.



Proof. First of all, notice that under multiplication modulo n, an element m is 0 if and
only if n divides m. We will make this precise shortly. Recall further Euclid’s lemma
(see Chapter 2), which says that if a prime p divides a product ab, then p divides a, or p
divides b.
Now suppose that n is a prime and ab = 0 in ℤn . Then n divides ab. From Euclid’s
lemma it follows that n divides a, or n divides b. In the first case, a = 0 in ℤn , whereas
in the second, b = 0 in ℤn . It follows that there are no zero divisors in ℤn , and since ℤn
is a commutative ring with an identity, it is an integral domain.
Conversely, suppose ℤn is an integral domain. Suppose that n is not prime. Then n =
ab with 1 < a < n, 1 < b < n. It follows that ab = 0 in ℤn with neither a nor b being zero.
Therefore, they are zero divisors, which is a contradiction. Hence, n must be prime.
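
Theorem 1.3.2 can be checked by brute force for small n. The following Python sketch lists the zero divisors of ℤn directly from Definition 1.3.1; the function names zero_divisors and is_integral_domain are our own, and the search is only practical for small moduli.

```python
def zero_divisors(n: int) -> list[int]:
    """Return all a in 1..n-1 with a*b = 0 in Z_n for some b in 1..n-1."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


def is_integral_domain(n: int) -> bool:
    """Z_n (n > 1) is an integral domain exactly when it has no zero divisors."""
    return n > 1 and not zero_divisors(n)


print(zero_divisors(6))  # [2, 3, 4]
print([n for n in range(2, 13) if is_integral_domain(n)])  # [2, 3, 5, 7, 11]
```

For each modulus tested, ℤn has no zero divisors exactly when n is prime, as the theorem asserts.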

In ℚ, every nonzero element has a multiplicative inverse. This is not true in ℤ,
where only the elements −1, 1 have multiplicative inverses within ℤ.

Definition 1.3.3. A unit in a ring R with identity 1 ≠ 0 is an element a ∈ R that has a
multiplicative inverse; that is, an element b ∈ R such that ab = ba = 1. If a is a unit in R,
we denote its inverse by a⁻¹. We denote the set of units of R by R⋆.

Hence, every nonzero element of ℚ and of ℝ and of ℂ is a unit, but in ℤ, the
only units are ±1. In M(2, ℝ), the units are precisely those matrices that have nonzero
determinant, whereas in M(2, ℤ), the units are those integral matrices that have
determinant ±1.
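
The description of the units of M(2, ℤ) can be illustrated with a short Python sketch. It assumes the standard adjugate formula for the inverse of a 2 × 2 matrix; the function names and the pair-of-rows representation are our own choices, not taken from the text.

```python
def mat_mul(x, y):
    """Matrix product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = x
    (a2, b2), (c2, d2) = y
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def inverse_in_m2z(m):
    """Invert m inside M(2, Z); possible only when det(m) is 1 or -1."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("not a unit of M(2, Z): determinant is not 1 or -1")
    # For det = 1 or -1 we have 1/det = det, so every entry stays an integer.
    return ((d * det, -b * det), (-c * det, a * det))


M = ((2, 1), (1, 1))
print(inverse_in_m2z(M))  # ((1, -1), (-1, 2))
```

A matrix such as ((2, 0), (0, 1)) has determinant 2; it is a unit in M(2, ℝ) but not in M(2, ℤ), and the sketch raises an error for it.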

Definition 1.3.4. A field K is a commutative ring with an identity 1 ≠ 0, where every
nonzero element is a unit.

Hence, a field K always contains at least two elements, a zero element 0 and an
identity 1 ≠ 0.
The rationals ℚ, the reals ℝ, and the complexes ℂ are all fields. If we relax the com-
mutativity requirement and just require that in the ring R with identity, each nonzero
element is a unit, then we get a skew field or division ring.

Lemma 1.3.5. If K is a field, then K is an integral domain.

Proof. Since a field K is already a commutative ring with an identity, we must only show
that there are no zero divisors in K.
Suppose that ab = 0 with a ≠ 0. Since K is a field and a is nonzero, it has an
inverse a⁻¹. Hence,

a⁻¹(ab) = a⁻¹ ⋅ 0 = 0 ⟹ (a⁻¹a)b = 0 ⟹ b = 0.

Therefore, K has no zero divisors and must be an integral domain.


Recall that ℤn was an integral domain only when n was a prime. This turns out to
also be necessary and sufficient for ℤn to be a field.

Theorem 1.3.6. ℤn is a field if and only if n is a prime.

Proof. First suppose that ℤn is a field. Then from Lemma 1.3.5, it is an integral domain.
Therefore, from Theorem 1.3.2, n must be a prime.
Conversely, suppose that n is a prime. We must show that ℤn is a field. Since we
already know that ℤn is an integral domain, we must only show that each nonzero ele-
ment of ℤn is a unit. Here, we need some elementary facts from number theory. If a, b
are integers, we use the notation a|b to indicate that a divides b.
Recall that given nonzero integers a, b, their greatest common divisor or GCD d > 0
is a positive integer, which is a common divisor; that is, d|a and d|b, and if d1 is any
other common divisor, then d1 |d. We denote the greatest common divisor of a, b by either
gcd(a, b) or (a, b). It can be proved that given nonzero integers a, b their GCD exists, is
unique and can be characterized as the least positive linear combination of a and b. If
the GCD of a and b is 1, then we say that a and b are relatively prime or coprime. This is
equivalent to being able to express 1 as a linear combination of a and b (see Chapter 3
for proofs and more details).
Now let a ∈ ℤn with n prime and a ≠ 0. Since a ≠ 0, we have that n does not divide a.
Since n is prime, it follows that a and n must be relatively prime, (a, n) = 1. From the
number theoretic remarks above, we then have that there exist x, y with

ax + ny = 1.

However, in ℤn , the element ny = 0. Therefore, in ℤn , we have

ax = 1.

Therefore, a has a multiplicative inverse in ℤn and is, hence, a unit. Since a was an
arbitrary nonzero element, we conclude that ℤn is a field.
The theorem above is actually a special case of a more general result from which
Theorem 1.3.6 could also be obtained.
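
The proof of Theorem 1.3.6 is constructive: once x and y with ax + ny = 1 are known, x is the inverse of a in ℤn. One standard way to find such x and y is the extended Euclidean algorithm, sketched here in Python. The algorithm itself is not described in this section, and the function names extended_gcd and inverse_mod are our own.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """For nonnegative a, b return (d, x, y) with d = gcd(a, b) and a*x + b*y == d."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(a: int, n: int) -> int:
    """Return the inverse of a in Z_n, or raise ValueError if a is not a unit."""
    d, x, _ = extended_gcd(a % n, n)
    if d != 1:
        raise ValueError(f"{a} is not a unit in Z_{n}")
    return x % n  # a*x + n*y = 1 means a*x = 1 in Z_n


print(inverse_mod(3, 7))  # 5, since 3 * 5 = 15 = 1 modulo 7
```

When n is prime, every a that is nonzero in ℤn is coprime to n, so inverse_mod succeeds for all of them, which is exactly the statement that ℤn is a field.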

Theorem 1.3.7. Each finite integral domain is a field.

Proof. Let K be a finite integral domain. We must show that K is a field. It is clearly
sufficient to show that each nonzero element of K is a unit. Let

{0, 1, r₁, …, rₙ}

be the elements of K. Let rᵢ be a fixed nonzero element and multiply each element of K
by rᵢ on the left. Now

if rᵢrⱼ = rᵢrₖ, then rᵢ(rⱼ − rₖ) = 0.

Since rᵢ ≠ 0 and K has no zero divisors, it follows that rⱼ − rₖ = 0, or rⱼ = rₖ. Therefore,
all the products rᵢrⱼ are distinct, so the set rᵢK has as many elements as the finite set K.
Hence,

K = {0, 1, r₁, …, rₙ} = rᵢK = {0, rᵢ, rᵢr₁, …, rᵢrₙ}.

Therefore, the identity element 1 must be in the right-hand list; that is, there is an rⱼ such
that rᵢrⱼ = 1. Therefore, rᵢ has a multiplicative inverse and is, hence, a unit. Therefore,
K is a field.
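
The counting argument in this proof can be observed in ℤn. The Python sketch below, with a function name of our own choosing, lists the products of a fixed element r with every element of ℤn.

```python
def multiples(r: int, n: int) -> list[int]:
    """Return the list [r*s mod n for s = 0, 1, ..., n-1]."""
    return [(r * s) % n for s in range(n)]


print(multiples(3, 7))  # [0, 3, 6, 2, 5, 1, 4]
print(multiples(2, 6))  # [0, 2, 4, 0, 2, 4]
```

In the integral domain ℤ7, the products are all distinct, so 1 appears among them; it sits at position 5, and indeed 3 ⋅ 5 = 1 in ℤ7. In ℤ6, which has zero divisors, the products repeat and 1 never appears, so 2 is not a unit there.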

1.4 Subrings and Ideals


A very important concept in algebra is that of a substructure, that is, a subset having the
same structure as the superset.

Definition 1.4.1. A subring of a ring R is a nonempty subset S that is also a ring under
the same operations as R. If R is a field and S also a field, then it is a subfield.

If S ⊂ R, then S satisfies the same basic axioms, associativity and commutativity
of addition, for example. Therefore, S will be a subring if it is nonempty and closed
under the operations; that is, closed under addition, multiplication, and taking additive
inverses.

Lemma 1.4.2. A subset S of a ring R is a subring if and only if S is nonempty, and whenever
a, b ∈ S, we have a + b ∈ S, a − b ∈ S and ab ∈ S.

Example 1.4.3. Show that if n > 1, the set nℤ is a subring of ℤ. Here, clearly nℤ is
nonempty. Suppose a = nz₁, b = nz₂ are two elements of nℤ. Then

a + b = nz₁ + nz₂ = n(z₁ + z₂) ∈ nℤ,
a − b = nz₁ − nz₂ = n(z₁ − z₂) ∈ nℤ,
ab = nz₁ ⋅ nz₂ = n(nz₁z₂) ∈ nℤ.

Therefore, nℤ is a subring.

Example 1.4.4. Show that the set of real numbers of the form

S = {u + v√2 : u, v ∈ ℚ}

is a subring of ℝ. Here, 1 + √2 ∈ S; therefore, S is nonempty. Suppose a = u₁ + v₁√2,
b = u₂ + v₂√2 are two elements of S. Then

a + b = (u₁ + v₁√2) + (u₂ + v₂√2) = (u₁ + u₂) + (v₁ + v₂)√2 ∈ S,
a − b = (u₁ + v₁√2) − (u₂ + v₂√2) = (u₁ − u₂) + (v₁ − v₂)√2 ∈ S,
a ⋅ b = (u₁ + v₁√2) ⋅ (u₂ + v₂√2) = (u₁u₂ + 2v₁v₂) + (u₁v₂ + v₁u₂)√2 ∈ S.

Therefore, S is a subring.

In fact, S is a field because

\[
\frac{1}{u + v\sqrt{2}} = \frac{u}{u^2 - 2v^2} - \frac{v}{u^2 - 2v^2}\sqrt{2}
\]

if (u, v) ≠ (0, 0). In the following,
we are especially interested in special types of subrings called ideals.
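
The closure computations of Example 1.4.4 and the inverse formula for u + v√2 can be checked with exact rational arithmetic. The following Python sketch uses a class name, QSqrt2, that is our own invention; it assumes both coordinates are passed as Fraction values so that no rounding occurs.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """The number u + v*sqrt(2), with u and v given as Fractions."""
    u: Fraction
    v: Fraction

    def __add__(self, other):
        return QSqrt2(self.u + other.u, self.v + other.v)

    def __sub__(self, other):
        return QSqrt2(self.u - other.u, self.v - other.v)

    def __mul__(self, other):
        # (u1 + v1*sqrt2)(u2 + v2*sqrt2) = (u1*u2 + 2*v1*v2) + (u1*v2 + v1*u2)*sqrt2
        return QSqrt2(self.u * other.u + 2 * self.v * other.v,
                      self.u * other.v + self.v * other.u)

    def inverse(self):
        # u^2 - 2*v^2 vanishes for rational u, v only when u = v = 0.
        norm = self.u ** 2 - 2 * self.v ** 2
        if norm == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return QSqrt2(self.u / norm, -self.v / norm)


x = QSqrt2(Fraction(1), Fraction(1))  # 1 + sqrt(2)
print(x * x.inverse())                # the identity 1 + 0*sqrt(2)
```

The denominator u² − 2v² is nonzero for every nonzero element because √2 is irrational, which is what makes the inverse formula valid.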

Definition 1.4.5. Let R be a ring and I ⊂ R. Then I is a (two-sided) ideal if the following
properties hold:
(1) I is nonempty.
(2) If a, b ∈ I, then a ± b ∈ I.
(3) If a ∈ I and r is any element of R, then ra ∈ I, and ar ∈ I.

We denote the fact that I forms an ideal in R by I ⊲ R.

Notice that if a, b ∈ I, then from (3), we have ab ∈ I, and ba ∈ I. Hence, I forms a
subring; that is, each ideal is also a subring. The set {0} and the whole ring R are trivial
ideals of R.
If we assume that in (3), only ra ∈ I, then I is called a left ideal. Analogously, we
define a right ideal.

Lemma 1.4.6. Let R be a commutative ring and a ∈ R. Then the set

⟨a⟩ = aR = {ar : r ∈ R}

is an ideal of R.

This ideal is called the principal ideal generated by a.

Proof. We must verify the three properties of the definition. Since a ∈ R, we have that
aR is nonempty. If u = ar1 , v = ar2 are two elements of aR, then

u ± v = ar1 ± ar2 = a(r1 ± r2 ) ∈ aR.

Therefore, (2) is satisfied.


Finally, let u = ar1 ∈ aR and r ∈ R. Then

ru = rar1 = a(rr1 ) ∈ aR, and ur = ar1 r = a(r1 r) ∈ aR.

Recall that a ∈ ⟨a⟩ if R has an identity.


Notice that if n ∈ ℤ, then the principal ideal generated by n is precisely the ring nℤ,
which we have already examined. Hence, for each n > 1, the subring nℤ is actually an
ideal. We can show more.

Theorem 1.4.7. Any subring of ℤ is of the form nℤ for some n. Hence, each subring of ℤ
is actually a principal ideal.

Proof. Let S be a subring of ℤ. If S = {0}, then S = 0ℤ, so we may assume that S has
nonzero elements. Since S is a subring, it contains the additive inverse of each of its
elements, so it must have positive elements.

Let S + be the set of positive elements in S. From the remarks above, this is a
nonempty set, and so, there must be a least positive element n. We claim that S = nℤ.
Let m be a positive element in S. By the division algorithm

m = qn + r,

where either r = 0, or 0 < r < n (see Chapter 3). Suppose that r ≠ 0. Then

r = m − qn.

Now m ∈ S, and n ∈ S. Since S is a subring, it is closed under addition and additive
inverses, so that qn ∈ S and, therefore, m − qn ∈ S. It follows that r ∈ S. But this is a
contradiction since 0 < r < n, and n was the least positive element in S. Therefore, r = 0,
and m = qn. Hence, each positive element in S is a multiple of n.
Now let m be a negative element of S. Then −m ∈ S, and −m is positive. Hence,
−m = qn, and thus, m = (−q)n. Therefore, every element of S is a multiple of n, and so,
S = nℤ. It follows that every subring of ℤ is of this form and, therefore, every subring
of ℤ is an ideal.
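The key step of the proof, that a remainder r = m − qn stays inside S, is the same step that drives the Euclidean algorithm. The following Python sketch is an illustration under the assumption that S is given by finitely many generators; the function name subring_generator is hypothetical. It returns the number n with S = nℤ.

```python
from functools import reduce
from math import gcd


def subring_generator(elements):
    """Return n >= 0 such that the subring of Z generated by `elements` is nZ."""
    n = 0
    for m in elements:
        m = abs(m)  # S contains -m whenever it contains m
        while m != 0:
            # Replace (n, m) by (m, n mod m); the remainder n - q*m lies in S
            # because S is closed under addition and additive inverses.
            n, m = m, n % m
    return n


elements = [12, 18, -30]
n = subring_generator(elements)
```

For the sample generators the result agrees with the greatest common divisor computed by the standard library, and every generator is a multiple of n, as the theorem predicts.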

We mention that this property is special to ℤ; it does not hold in general. For example, ℤ is a subring
of ℚ, but not an ideal. An extension of the proof of Lemma 1.4.6 gives the following. We
leave the proof as an exercise.

Lemma 1.4.8. Let R be a commutative ring and a1 , . . . , an ∈ R be a finite set of elements


in R. Then the set

⟨a1 , . . . , an ⟩ = {r1 a1 + r2 a2 + ⋅ ⋅ ⋅ + rn an : ri ∈ R}

is an ideal of R.

This ideal is called the ideal generated by a1 , . . . , an . Recall that a1 , . . . , an are in


⟨a1 , . . . , an ⟩ if R has an identity.

Theorem 1.4.9. Let R be a commutative ring with an identity 1 ≠ 0. Then R is a field if and
only if the only ideals in R are {0} and R.

Proof. Suppose that R is a field and I ⊲ R is an ideal. We must show that either I = {0},
or I = R. Suppose that I ≠ {0}, then we must show that I = R.
Since I ≠ {0}, there exists an element a ∈ I with a ≠ 0. Since R is a field, this element
a has an inverse a⁻¹. Since I is an ideal, it follows that a⁻¹a = 1 ∈ I. Let r ∈ R, then, since
1 ∈ I, we have r ⋅ 1 = r ∈ I. Hence, R ⊂ I and, therefore, R = I.
Conversely, suppose that R is a commutative ring with an identity, whose only ideals
are {0} and R. We must show that R is a field, or equivalently, that every nonzero element
of R has a multiplicative inverse.

Let a ∈ R with a ≠ 0. Since R is a commutative ring, the principal ideal aR is an ideal
in R, and aR ≠ {0} because a = a ⋅ 1 ∈ aR. Hence, aR = R. Therefore, the multiplicative identity 1 ∈ aR.
It follows that there exists an r ∈ R with ar = 1. Hence, a has a multiplicative inverse,
and R must be a field.
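Theorem 1.4.9 can be checked by brute force in the rings ℤn. The sketch below is illustrative only; the function names are hypothetical, and it relies on the fact that every ideal of ℤn is principal, so listing the sets aℤn lists all ideals.

```python
def principal_ideals(n):
    """Distinct principal ideals a*Z_n of Z_n, each stored as a frozenset."""
    return {frozenset((a * r) % n for r in range(n)) for a in range(n)}


def has_only_trivial_ideals(n):
    """True if the only ideals of Z_n are {0} and Z_n itself."""
    zero, whole = frozenset({0}), frozenset(range(n))
    return principal_ideals(n) <= {zero, whole}


def is_field(n):
    """Z_n is a field if every nonzero class has a multiplicative inverse."""
    return n > 1 and all(
        any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n)
    )


# Compare the two conditions of the theorem for small moduli.
agreement = all(has_only_trivial_ideals(n) == is_field(n) for n in range(2, 40))
```

For example, ℤ12 has one ideal for each divisor of 12, while ℤ7 has only the two trivial ideals.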

1.5 Factor Rings and Ring Homomorphisms


Given an ideal I in a ring R, we can build a new ring called the factor ring or quotient
ring of R modulo I. The special condition on the subring I that makes it an ideal, namely
rI ⊂ I and Ir ⊂ I for all r ∈ R, is exactly what allows this construction to yield a ring.

Definition 1.5.1. Let I be an ideal in a ring R. Then a coset of I is a subset of R of the


form

r + I = {r + i : i ∈ I}

with r a fixed element of R.

Lemma 1.5.2. Let I be an ideal in a ring R. Then the cosets of I partition R; that is, any
two cosets either coincide or are disjoint.

We leave the proof to the exercises. Now, on the set of all cosets of an ideal, we will
build a new ring.

Theorem 1.5.3. Let I be an ideal in a ring R. Let R/I = {r + I : r ∈ R} be the set of all cosets
of I in R. We define addition and multiplication on R/I in the following manner:

(r1 + I) + (r2 + I) = (r1 + r2 ) + I


(r1 + I) ⋅ (r2 + I) = (r1 ⋅ r2 ) + I.

Then R/I forms a ring called the factor ring of R modulo I. The zero element of R/I is
0 + I and the additive inverse of r + I is −r + I. Further, if R is commutative, then R/I is
commutative, and if R has an identity, then R/I has an identity 1 + I.

Proof. The proof that R/I satisfies the ring axioms under the definitions above is
straightforward. For example,

(r1 + I) + (r2 + I) = (r1 + r2 ) + I = (r2 + r1 ) + I = (r2 + I) + (r1 + I),

and so, addition is commutative. What must be shown is that both addition and multi-
plication are well defined. That is, if

r1 + I = r1′ + I, and r2 + I = r2′ + I

then

(r1 + I) + (r2 + I) = (r1′ + I) + (r2′ + I),

and

(r1 + I) ⋅ (r2 + I) = (r1′ + I) ⋅ (r2′ + I).

Now if r1 + I = r1′ + I, then r1 ∈ r1′ + I, and so, r1 = r1′ + i1 for some i1 ∈ I. Similarly, if
r2 + I = r2′ + I, then r2 ∈ r2′ + I, and so, r2 = r2′ + i2 for some i2 ∈ I. Then

(r1 + I) + (r2 + I) = (r1′ + i1 + I) + (r2′ + i2 + I) = (r1′ + I) + (r2′ + I)

since i1 + I = I and i2 + I = I. Similarly,

r1 ⋅ r2 = (r1′ + i1 ) ⋅ (r2′ + i2 ) = r1′ ⋅ r2′ + r1′ ⋅ i2 + i1 ⋅ r2′ + i1 ⋅ i2 ,

and the last three summands lie in I because I is an ideal. Hence,

(r1 + I) ⋅ (r2 + I) = (r1 ⋅ r2 ) + I = (r1′ ⋅ r2′ ) + I = (r1′ + I) ⋅ (r2′ + I).

This shows that addition and multiplication are well defined. It also shows why the
ideal property is necessary.

As an example, let R be the integers ℤ. As we have seen, each subring is an ideal and
of the form nℤ for some natural number n. The factor ring ℤ/nℤ is called the residue
class ring modulo n, denoted ℤn . Notice that we can take as cosets

0 + nℤ, 1 + nℤ, . . . , (n − 1) + nℤ.

Addition and multiplication of cosets are then just addition and multiplication modulo n.
As we can see, this is just a formalization of the ring ℤn , which we have already looked
at. Recall that ℤn is an integral domain if and only if n is prime and ℤn is a field for
precisely the same n. If n = 0, then ℤ/nℤ is the same as ℤ.
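The well-definedness argument of Theorem 1.5.3 can be made concrete for ℤ/nℤ. In this small Python sketch (function names hypothetical, not from the text) a coset r + nℤ is stored by its representative in {0, ..., n − 1}, and two different representatives of the same cosets are shown to give the same sum and product.

```python
def coset(r, n):
    """Canonical representative of the coset r + nZ in {0, ..., n-1}."""
    return r % n


def add(r1, r2, n):
    """(r1 + nZ) + (r2 + nZ) = (r1 + r2) + nZ."""
    return coset(r1 + r2, n)


def mul(r1, r2, n):
    """(r1 + nZ) * (r2 + nZ) = (r1 * r2) + nZ."""
    return coset(r1 * r2, n)


n = 6
r1, r1p = 4, 4 + 5 * n  # two representatives of the coset 4 + 6Z
r2, r2p = 5, 5 - 2 * n  # two representatives of the coset 5 + 6Z

same_sum = add(r1, r2, n) == add(r1p, r2p, n)
same_product = mul(r1, r2, n) == mul(r1p, r2p, n)
zero_divisor_example = mul(2, 3, n)  # nonzero cosets whose product is the zero coset
```

The last line illustrates why ℤ6 is not an integral domain: the cosets of 2 and 3 are nonzero, yet their product is the zero coset.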
We now show that ideals and factor rings are closely related to certain mappings
between rings.

Definition 1.5.4. Let R and S be rings. Then a mapping f : R → S is a ring homomor-


phism if

f (r1 + r2 ) = f (r1 ) + f (r2 ) for any r1 , r2 ∈ R


f (r1 ⋅ r2 ) = f (r1 ) ⋅ f (r2 ) for any r1 , r2 ∈ R.

In addition,
(1) f is an epimorphism if it is surjective.
(2) f is a monomorphism if it is injective.
(3) f is an isomorphism if it is bijective; that is, both surjective and injective. In this case,
R and S are said to be isomorphic rings, which we denote by R ≅ S.

(4) f is an endomorphism if R = S; that is, a ring homomorphism from a ring to itself.


(5) f is an automorphism if R = S and f is an isomorphism.

Lemma 1.5.5. Let R and S be rings, and let f : R → S be a ring homomorphism. Then
(1) f (0) = 0, where the first and second 0 are the zero elements of R and S, respectively.
(2) f (−r) = −f (r) for any r ∈ R.

Proof. We obtain f (0) = 0 from the equation f (0) = f (0 + 0) = f (0) + f (0). Hence,
0 = f (0) = f (r − r) = f (r + (−r)) = f (r) + f (−r); that is, f (−r) = −f (r).

Definition 1.5.6. Let R and S be rings, and let f : R → S be a ring homomorphism. Then
the kernel of f is

ker(f ) = {r ∈ R : f (r) = 0}.

The image of f , denoted im(f ), is the range of f within S. That is,

im(f ) = {s ∈ S : there exists r ∈ R with f (r) = s}.

Theorem 1.5.7 (Ring isomorphism theorem). Let R and S be rings, and let

f :R→S

be a ring homomorphism. Then


(1) ker(f ) is an ideal in R, im(f ) is a subring of S, and

R/ ker(f ) ≅ im(f ).

(2) Conversely, suppose that I is an ideal in a ring R. Then the map f : R → R/I, given by
f (r) = r + I for r ∈ R, is a ring homomorphism, whose kernel is I, and whose image
is R/I.

The theorem says that the concepts of ideal of a ring and kernel of a ring homomor-
phism coincide; that is, each ideal is the kernel of a homomorphism and the kernel of
each ring homomorphism is an ideal.

Proof. If s1 , s2 ∈ im(f ), then there exist r1 , r2 ∈ R, such that f (r1 ) = s1 , and f (r2 ) = s2 .
Then s1 ± s2 = f (r1 ± r2 ) and s1 ⋅ s2 = f (r1 ⋅ r2 ) are again in im(f ) by Definition 1.5.4 and
Lemma 1.5.5, so im(f ) is a subring of S. Now, let
I = ker(f ). We show first that I is an ideal. If r1 , r2 ∈ I, then f (r1 ) = f (r2 ) = 0. It follows
from the homomorphism property that

f (r1 ± r2 ) = f (r1 ) ± f (r2 ) = 0 ± 0 = 0


f (r1 ⋅ r2 ) = f (r1 ) ⋅ f (r2 ) = 0 ⋅ 0 = 0.

Therefore, I is a subring.

Now let i ∈ I and r ∈ R. Then

f (r ⋅ i) = f (r) ⋅ f (i) = f (r) ⋅ 0 = 0 and f (i ⋅ r) = f (i) ⋅ f (r) = 0 ⋅ f (r) = 0

and, hence, I is an ideal.


Consider the factor ring R/I. Let f ∗ : R/I → im(f ) by f ∗ (r + I) = f (r). We show that
f ∗ is an isomorphism.
First, we show that it is well defined. Suppose r1 + I = r2 + I, then r1 − r2 ∈ I = ker(f ).
It follows that f (r1 − r2 ) = 0, so f (r1 ) = f (r2 ). Hence, f ∗ (r1 + I) = f ∗ (r2 + I), and the map
f ∗ is well defined.
Now

f ∗ ((r1 + I) + (r2 + I)) = f ∗ ((r1 + r2 ) + I) = f (r1 + r2 )


= f (r1 ) + f (r2 ) = f ∗ (r1 + I) + f ∗ (r2 + I),

and

f ∗ ((r1 + I) ⋅ (r2 + I)) = f ∗ ((r1 ⋅ r2 ) + I) = f (r1 ⋅ r2 )


= f (r1 ) ⋅ f (r2 ) = f ∗ (r1 + I) ⋅ f ∗ (r2 + I).

Hence, f ∗ is a homomorphism. We must now show that it is injective and surjective.


Suppose that f ∗ (r1 + I) = f ∗ (r2 + I). Then f (r1 ) = f (r2 ) so that f (r1 − r2 ) = 0. Hence,
r1 − r2 ∈ ker(f ) = I. Therefore, r1 ∈ r2 + I, and thus, r1 + I = r2 + I, and the map f ∗ is
injective.
Finally, let s ∈ im(f ). Then there exists r ∈ R such that f (r) = s. Then f ∗ (r + I) = s,
and the map f ∗ is surjective and, hence, an isomorphism. This proves the first part of
the theorem.
To prove the second part, let I be an ideal in R and R/I the factor ring. Consider the
map f : R → R/I, given by f (r) = r + I. From the definition of addition and multiplication
in the factor ring R/I, it is clear that this is a homomorphism, and it is surjective since
every coset has the form r + I = f (r). Consider the kernel of f . We have r ∈ ker(f ) if and
only if f (r) = r + I = 0 + I; that is, if and only if r ∈ I. Hence, the kernel
of this map is exactly the ideal I, completing the proof.
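The statement R/ker(f) ≅ im(f) can be traced through a small finite example. The sketch below is an assumed illustration, not taken from the text: it uses the map f(r) = 4r on ℤ12, which respects multiplication because 4 ⋅ 4 = 16 ≡ 4 (mod 12), and it checks that the induced map on cosets of the kernel is well defined and bijective onto the image.

```python
N = 12
R = range(N)


def f(r):
    """Assumed example homomorphism Z_12 -> Z_12, r -> 4r mod 12."""
    return (4 * r) % N


# f respects addition and multiplication modulo 12.
is_hom = all(
    f((a + b) % N) == (f(a) + f(b)) % N and f((a * b) % N) == (f(a) * f(b)) % N
    for a in R for b in R
)

kernel = {r for r in R if f(r) == 0}
image = {f(r) for r in R}

# Cosets r + ker(f), each stored as a frozenset of elements of Z_12.
cosets = {frozenset((r + k) % N for k in kernel) for r in R}

# Induced map f*: it is well defined if f takes a single value on each coset.
f_star = {c: {f(r) for r in c} for c in cosets}
well_defined = all(len(values) == 1 for values in f_star.values())

# Bijective onto the image: the single values are exactly the image elements.
bijective = well_defined and sorted(
    next(iter(values)) for values in f_star.values()
) == sorted(image)
```

Here the kernel is {0, 3, 6, 9}, the image is {0, 4, 8}, and the three cosets of the kernel correspond one-to-one to the three image elements.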

Theorem 1.5.7 is called the ring isomorphism theorem or the first ring isomorphism
theorem. We mention that there is an analogous theorem for each algebraic structure,
in particular, for groups and vector spaces. We will mention the result for groups in
Section 1.8.

1.6 Fields of Fractions


The integers are an integral domain, and the rationals ℚ are a field that contains the
integers. First, we show that ℚ is the smallest field containing ℤ.

Theorem 1.6.1. The rationals ℚ are the smallest field containing the integers ℤ. That is,
if ℤ ⊂ K ⊂ ℚ with K a subfield of ℚ, then K = ℚ.

Proof. Since ℤ ⊂ K, we have m, n ∈ K for any two integers m, n with n ≠ 0. Since K is
a subfield, it is closed under division; that is, under multiplication and taking
multiplicative inverses of nonzero elements and, hence, the fraction m/n ∈ K. Since each
element of ℚ is such a fraction, it follows that ℚ ⊂ K. Since K ⊂ ℚ, it follows that K = ℚ.
Notice that to construct the rationals from the integers, we form all fractions m/n with
n ≠ 0, and where m1/n1 = m2/n2 if m1 n2 = n1 m2 . We then do the standard operations on
fractions. If we start with any integral domain D, we can mimic this construction to
build a field of fractions from D; that is, the smallest field containing D.

Theorem 1.6.2. Let D be an integral domain. Then there is a field K containing D, called
the field of fractions for D, such that each element of K is a fraction from D; that is, an
element of the form d1 d2⁻¹ with d1 , d2 ∈ D, d2 ≠ 0. Further, K is unique up to isomorphism and is
the smallest field containing D.

Proof. The proof is just the mimicking of the construction of the rationals from the in-
tegers. Let

K ′ = {(d1 , d2 ) : d1 , d2 ∈ D, d2 ≠ 0}.

Define on K ′ the equivalence relation

(d1 , d2 ) ∼ (d1′ , d2′ ) if d1 d2′ = d2 d1′ .

Let K be the set of equivalence classes, and define addition and multiplication in the
usual manner as for fractions, where the result is the equivalence class:

(d1 , d2 ) + (d3 , d4 ) = (d1 d4 + d2 d3 , d2 d4 )


(d1 , d2 ) ⋅ (d3 , d4 ) = (d1 d3 , d2 d4 ).

It is now straightforward to verify the ring axioms for K. The inverse of (d1 , 1) is (1, d1 )
for d1 ≠ 0 in D. As with ℤ, we identify the elements of K as fractions d1/d2 . The proof that
K is the smallest field containing D is the same as for ℚ from ℤ.
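The construction in the proof can be sketched in code. The following Python class is an assumed illustration, with the hypothetical name Frac and with D = ℤ represented by Python integers: a fraction is a pair (d1, d2) with d2 ≠ 0, equality is the equivalence relation d1 d2′ = d2 d1′, and the operations are those of the proof.

```python
class Frac:
    """Pair (d1, d2) with d2 != 0 over the integral domain D = Z (illustration)."""

    def __init__(self, d1, d2=1):
        if d2 == 0:
            raise ZeroDivisionError("second component must be nonzero")
        self.d1, self.d2 = d1, d2

    def __eq__(self, other):
        # (d1, d2) and (d1', d2') are equivalent if d1*d2' = d2*d1'
        return self.d1 * other.d2 == self.d2 * other.d1

    def __add__(self, other):
        # (d1, d2) + (d3, d4) = (d1*d4 + d2*d3, d2*d4)
        return Frac(self.d1 * other.d2 + self.d2 * other.d1, self.d2 * other.d2)

    def __mul__(self, other):
        # (d1, d2) * (d3, d4) = (d1*d3, d2*d4)
        return Frac(self.d1 * other.d1, self.d2 * other.d2)

    def inverse(self):
        # Swapping the components inverts a nonzero class; raises if d1 == 0.
        return Frac(self.d2, self.d1)

    def __repr__(self):
        return f"({self.d1}, {self.d2})"


half_plus_third = Frac(1, 2) + Frac(1, 3)
same_class_sum = Frac(2, 4) + Frac(2, 6)  # other representatives of 1/2 and 1/3
three_times_inverse = Frac(3) * Frac(3).inverse()
```

No reduction to lowest terms is performed; equality of classes is decided only through the equivalence relation, which mirrors the proof and works in any integral domain whose elements support multiplication and comparison.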

As examples, we have that ℚ is the field of fractions for ℤ. A familiar, but less com-
mon, example is the following:
Let ℝ[x] be the set of polynomials over the real numbers ℝ. It can be shown that
ℝ[x] forms an integral domain (see Chapter 3). The field of fractions consists of all
formal functions f (x)/g(x), where f (x), g(x) are real polynomials with g(x) ≠ 0. The
corresponding field of fractions is called the field of rational functions over ℝ and is
denoted ℝ(x).

1.7 Characteristic and Prime Rings


We saw in the last section that ℚ is the smallest field containing the integers. Since any
subfield of ℚ must contain the identity, it follows that any nontrivial subfield of ℚ must
contain the integers and, hence, be all of ℚ. Therefore, ℚ has no nontrivial subfields.
We say that ℚ is a prime field.

Definition 1.7.1. A field K is a prime field if K contains no nontrivial subfields.

Lemma 1.7.2. Let K be any field. Then K contains a prime field K̄ as a subfield.

Proof. Let K1 , K2 be subfields of K. If k1 , k2 ∈ K1 ∩ K2 , then k1 ± k2 ∈ K1 since K1 is a
subfield, and k1 ± k2 ∈ K2 since K2 is a subfield. Therefore, k1 ± k2 ∈ K1 ∩ K2 . Similarly,
k1 k2⁻¹ ∈ K1 ∩ K2 for k2 ≠ 0. It follows that K1 ∩ K2 is again a subfield.
Now, let K̄ be the intersection of all subfields of K. By the same argument, K̄ is a
subfield. Since K̄ is contained in every subfield of K, the only subfield of K̄ is K̄ itself.
Hence, K̄ is a prime field.

Definition 1.7.3. Let R be a commutative ring with an identity 1 ≠ 0. The smallest posi-
tive integer n such that n ⋅ 1 = 1 + 1 + ⋅ ⋅ ⋅ + 1 = 0 is called the characteristic of R. If there
is no such n, then R has characteristic 0. We denote the characteristic by char(R).

First, notice that 0 is the characteristic of ℤ, ℚ, ℝ. Further the characteristic of ℤn


is n.

Theorem 1.7.4. Let R be an integral domain. Then the characteristic of R is either 0 or a


prime. In particular, the characteristic of a field is zero or a prime.

Proof. Suppose that R is an integral domain and char(R) = n ≠ 0. Suppose that n = mk


with 1 < m < n, 1 < k < n. Then n ⋅ 1 = 0 = (m ⋅ 1)(k ⋅ 1). Since R is an integral domain, we
have no zero divisors and, hence, m ⋅ 1 = 0, or k ⋅ 1 = 0. However, this is a contradiction
since n is the least positive integer such that n ⋅ 1 = 0. Therefore, n must be a prime.
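The argument of Theorem 1.7.4 can be observed in the rings ℤn, whose characteristic is n. In this assumed Python illustration (function names hypothetical), a composite n yields zero divisors, exactly as the factorization n = mk gives (m ⋅ 1)(k ⋅ 1) = 0 in the proof, while a prime n yields none.

```python
def characteristic(n):
    """Characteristic of Z_n for n >= 2: least k > 0 with k*1 = 0 in Z_n."""
    k, total = 1, 1 % n
    while total != 0:
        k += 1
        total = (total + 1) % n  # add the identity once more
    return k


def zero_divisor_pairs(n):
    """Pairs (a, b) with 1 <= a <= b < n and a*b = 0 in Z_n."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]


char_6 = characteristic(6)
pairs_6 = zero_divisor_pairs(6)  # contains (2, 3), matching 6 = 2 * 3
pairs_7 = zero_divisor_pairs(7)  # empty, since 7 is prime
```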

We have seen that every field contains a prime field. We extend this.

Definition 1.7.5. A commutative ring R with an identity 1 ≠ 0 is a prime ring if the only
subring containing the identity is the whole ring.

Clearly both the integers ℤ and the modular integers ℤn are prime rings. In fact, up
to isomorphism, they are the only prime rings.

Theorem 1.7.6. Let R be a prime ring. Then char(R) = 0 implies R ≅ ℤ, whereas char(R) =
n > 0 implies R ≅ ℤn .

Proof. Suppose that char(R) = 0. Let S = {r = m ⋅ 1 : r ∈ R, m ∈ ℤ}. Then S is a subring


of R containing the identity and, hence, S = R. However, the map m ⋅ 1 → m gives an
isomorphism from S to ℤ. It follows that R is isomorphic to ℤ.
If char(R) = n > 0, the proof is identical. Since n ⋅ 1 = 0, the subring S of R, defined
above, is all of R and isomorphic to ℤn .

Theorem 1.7.6 can be extended to fields, with ℚ taking the place of ℤ, and with ℤp ,
p a prime, taking the place of ℤn .

Theorem 1.7.7. Let K be a prime field. If K has characteristic 0, then K ≅ ℚ, whereas if K


has characteristic p, then K ≅ ℤp .

Proof. The proof is identical to that of Theorem 1.7.6; however, we consider the smallest
subfield K1 of K containing S.
We mention that there can be infinite fields of characteristic p. Consider, for ex-
ample, the field of fractions of the polynomial ring ℤp [x]. This is the field of rational
functions with coefficients in ℤp .
We give a theorem on fields of characteristic p that will be important much later
when we look at Galois theory.

Theorem 1.7.8. Let K be a field of characteristic p. Then the mapping ϕ : K → K, given
by ϕ(k) = k^p , is an injective endomorphism of K. In particular, (a + b)^p = a^p + b^p for any
a, b ∈ K.
This mapping is called the Frobenius homomorphism of K. Further, if K is finite, ϕ is
an automorphism.

Proof. We first show that ϕ is a homomorphism. Now

ϕ(ab) = (ab)^p = a^p b^p = ϕ(a)ϕ(b).

We need a little more work for addition:

\[ \phi(a + b) = (a + b)^p = \sum_{i=0}^{p} \binom{p}{i} a^i b^{p-i} = a^p + \sum_{i=1}^{p-1} \binom{p}{i} a^i b^{p-i} + b^p \]

by the binomial expansion, which holds in any commutative ring. However,

\[ \binom{p}{i} = \frac{p(p-1) \cdots (p-i+1)}{i \cdot (i-1) \cdots 1}, \]

and for 1 ≤ i ≤ p − 1, the prime p divides the numerator but none of the factors of the
denominator, so p divides \binom{p}{i}. Hence, in K, we have \binom{p}{i} ⋅ 1 = 0, and so, we have

ϕ(a + b) = (a + b)^p = a^p + b^p = ϕ(a) + ϕ(b).

Therefore, ϕ is a homomorphism.
Further, ϕ is always injective. To see this, suppose that ϕ(x) = ϕ(y). Then

ϕ(x − y) = 0 ⟹ (x − y)^p = 0.

But K is a field, so there are no zero divisors. Therefore, we must have x − y = 0, or x = y.


If K is finite and ϕ is injective, it must also be surjective and, hence, an automorphism
of K.
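The two facts used in the proof, divisibility of the inner binomial coefficients by p and the resulting identity (a + b)^p = a^p + b^p, can be checked numerically in ℤp. This is a hedged sketch with hypothetical function names; note that in ℤp itself the Frobenius map is the identity by Fermat's little theorem, so the check illustrates the identity rather than an interesting automorphism.

```python
from math import comb


def inner_binomials_divisible(p):
    """True if p divides C(p, i) for every 1 <= i <= p - 1."""
    return all(comb(p, i) % p == 0 for i in range(1, p))


def additive_identity_holds(n):
    """Check (a + b)^n = a^n + b^n in Z_n for all residues a, b."""
    return all(
        pow(a + b, n, n) == (pow(a, n, n) + pow(b, n, n)) % n
        for a in range(n) for b in range(n)
    )


prime_case = inner_binomials_divisible(7) and additive_identity_holds(7)
composite_case = inner_binomials_divisible(4) or additive_identity_holds(4)
```

For the composite modulus 4 both checks fail: C(4, 2) = 6 is not divisible by 4, and (1 + 1)^4 = 16 ≡ 0 while 1^4 + 1^4 = 2.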

1.8 Groups
We close this first chapter by introducing some basic definitions and results from
group theory that mirror the results presented for rings and fields. We
will look at group theory in more detail later in the book. Proofs will be given at that
point.

Definition 1.8.1. A group G is a set with one binary operation (which we will denote by
multiplication) such that
(1) the operation is associative;
(2) there exists an identity for this operation; and
(3) each g ∈ G has an inverse for this operation.

If, in addition, the operation is commutative, the group G is called an Abelian group. The
order of G is the number of elements in G, denoted by |G|. If |G| < ∞, G is a finite group;
otherwise G is an infinite group.

Groups most often arise from invertible mappings of a set onto itself. Such mappings
are called permutations.

Theorem 1.8.2. The set of all permutations on a set A forms a group under composition,
called the symmetric group on A, which we denote by SA . If A has more than 2 elements,
then SA is non-Abelian.

Definition 1.8.3. Let G1 and G2 be groups. Then a mapping f : G1 → G2 is a (group)


homomorphism if

f (g1 g2 ) = f (g1 )f (g2 ) for any g1 , g2 ∈ G1 .

As with rings, we have, in addition,


(1) f is an epimorphism if it is surjective.
(2) f is a monomorphism if it is injective.
(3) f is an isomorphism if it is bijective; that is, both surjective and injective. In this case,
G1 and G2 are said to be isomorphic groups, which we denote by G1 ≅ G2 .
(4) f is an endomorphism if G1 = G2 ; that is, a homomorphism from a group to itself.
(5) f is an automorphism if G1 = G2 , and f is an isomorphism.

Lemma 1.8.4. Let G1 and G2 be groups, and let f : G1 → G2 be a homomorphism. Then


1. f (1) = 1, where the first 1 is the identity element of G1 , and the second is the identity
element of G2 .
2. f (g⁻¹) = (f (g))⁻¹ for any g ∈ G1 .

If A is a set, |A| denotes the size of A.



Theorem 1.8.5. If A1 and A2 are sets with |A1 | = |A2 |, then SA1 ≅ SA2 . If |A| = n with n
finite, we call SA the symmetric group on n elements, which we denote by Sn . Further, we
have |Sn | = n!.
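The symmetric groups Sn for small n can be enumerated directly. In the following Python sketch (an illustration with hypothetical function names) a permutation of {0, ..., n − 1} is a tuple p with p[i] the image of i, and the group operation is composition of mappings.

```python
from itertools import permutations
from math import factorial


def symmetric_group(n):
    """All permutations of {0, ..., n-1}; p[i] is the image of i."""
    return list(permutations(range(n)))


def compose(p, q):
    """Composition of mappings: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def is_abelian(group):
    """True if any two elements of the group commute."""
    return all(compose(p, q) == compose(q, p) for p in group for q in group)


order_s4 = len(symmetric_group(4))
s2_abelian = is_abelian(symmetric_group(2))
s3_abelian = is_abelian(symmetric_group(3))
```

The computed order of S4 equals 4!, and S3 is the smallest symmetric group that fails to be Abelian, in line with Theorem 1.8.2.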

Subgroups are defined in an analogous manner to subrings. Special types of sub-


groups, called normal subgroups, take the place in group theory that ideals play in ring
theory.

Definition 1.8.6. A subset H of a group G is a subgroup if H ≠ ∅ and H forms a group
under the same operation as G. Equivalently, H is a subgroup if H ≠ ∅, and H is closed
under the operation and inverses.

Definition 1.8.7. If H is a subgroup of a group G, then a left coset of H is a subset of G of


the form gH = {gh : h ∈ H}. A right coset of H is a subset of G of the form Hg = {hg : h ∈ H}.

As with rings, the cosets of a subgroup partition a group. We call the number of right
cosets of a subgroup H in a group G the index of H in G, denoted |G : H|. One can prove
that the number of right cosets is equal to the number of left cosets. For finite groups,
we have the following beautiful result called Lagrange’s theorem.

Theorem 1.8.8 (Lagrange’s theorem). Let G be a finite group and H a subgroup. Then the
order of H divides the order of G. In particular,

|G| = |H||G : H|.

Normal subgroups take the place of ideals in group theory.

Definition 1.8.9. A subgroup H of a group G is a normal subgroup, denoted H ⊲ G, if


every left coset of H is also a right coset; that is, gH = Hg for each g ∈ G. Note that this
does not say that g and H commute elementwise, just that the subsets gH and Hg are
the same. Equivalently, H is normal if g⁻¹Hg = H for any g ∈ G.
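Cosets, Lagrange's theorem, and normality can all be inspected in S3. The sketch below is an assumed illustration with hypothetical names: it computes the left and right cosets of a subgroup of order 2 and of the subgroup of order 3, and tests whether gH = Hg holds for every g.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S_3; p[i] is the image of i


def compose(p, q):
    """Composition of mappings: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def left_cosets(G, H):
    return {frozenset(compose(g, h) for h in H) for g in G}


def right_cosets(G, H):
    return {frozenset(compose(h, g) for h in H) for g in G}


def is_normal(G, H):
    """True if gH = Hg as sets for every g in G."""
    return all(
        frozenset(compose(g, h) for h in H) == frozenset(compose(h, g) for h in H)
        for g in G
    )


H = [(0, 1, 2), (1, 0, 2)]              # identity and the swap of 0 and 1
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # identity and the two 3-cycles

index_H = len(left_cosets(G, H))
```

The subgroup H has index 3 and |G| = |H| ⋅ |G : H| = 2 ⋅ 3, yet H is not normal, whereas the subgroup of order 3 is normal.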

Normal subgroups allow us to construct factor groups, just as ideals allowed us to


construct factor rings.

Theorem 1.8.10. Let H be a normal subgroup of a group G. Let G/H be the set of all cosets
of H in G; that is,

G/H = {gH : g ∈ G}.

We define multiplication on G/H in the following manner:

(g1 H)(g2 H) = g1 g2 H.

Then G/H forms a group called the factor group or quotient group of G modulo H.
The identity element of G/H is 1H, and the inverse of gH is g⁻¹H. Further, if G is Abelian,
then G/H is also Abelian.

Finally, as with rings, normal subgroups and factor groups are closely tied to
homomorphisms.

Definition 1.8.11. Let G1 and G2 be groups, and let f : G1 → G2 be a homomorphism.


Then the kernel of f , denoted ker(f ), is

ker(f ) = {g ∈ G1 : f (g) = 1}.

The image of f , denoted im(f ), is the range of f within G2 . That is,

im(f ) = {h ∈ G2 : there exists g ∈ G1 with f (g) = h}.

Theorem 1.8.12 (Group isomorphism theorem). Let f : G1 → G2 be a homomorphism of


groups G1 and G2 . Then
(1) ker(f ) is a normal subgroup in G1 , im(f ) is a subgroup of G2 , and

G1 / ker(f ) ≅ im(f ).

(2) Conversely, suppose that H is a normal subgroup of a group G. Then f : G → G/H,


given by f (g) = gH for g ∈ G is a homomorphism, whose kernel is H and whose image
is G/H.
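As an illustration of the group isomorphism theorem, consider the sign map from S3 to the multiplicative group {1, −1}. The sign map is not defined in the text above; it is used here as an assumed example and computed via inversion counts. The sketch checks that it is a homomorphism, that its kernel has index 2, and that it is constant on each coset of the kernel.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S_3; p[i] is the image of i


def compose(p, q):
    """Composition of mappings: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def sign(p):
    """Sign of a permutation: +1 for an even number of inversions, else -1."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


is_hom = all(sign(compose(p, q)) == sign(p) * sign(q) for p in G for q in G)
kernel = [p for p in G if sign(p) == 1]
image = {sign(p) for p in G}

# Cosets g ker(f); the induced map sends each coset to the common sign of its elements.
cosets = {frozenset(compose(g, k) for k in kernel) for g in G}
constant_on_cosets = all(len({sign(p) for p in c}) == 1 for c in cosets)
```

The number of cosets of the kernel equals the number of elements in the image, as G1 / ker(f) ≅ im(f) requires.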

1.9 Exercises
1. Let ϕ : K → R be a homomorphism from a field K to a ring R. Show that either
ϕ(a) = 0 for all a ∈ K, or ϕ is a monomorphism.
2. Let R be a ring and M ≠ ∅ an arbitrary set. Show that the following are equivalent:
(i) The ring of all mappings from M to R is a field.
(ii) M contains only one element and R is a field.
3. Let π be a set of prime numbers. Define

ℚπ = {a/b : all prime divisors of b are in π}.

(i) Show that ℚπ is a subring of ℚ.


(ii) Let R be a subring of ℚ and let a/b ∈ R with coprime integers a, b. Show that
1/b ∈ R.
(iii) Determine all subrings R of ℚ.
(Hint: Consider the set of all prime divisors of denominators of reduced ele-
ments of R.)
4. Prove Lemma 1.5.2.
5. Let R be a commutative ring with an identity 1 ∈ R. Let A, B and C be ideals in R.
Define A + B := {a + b : a ∈ A, b ∈ B} and AB := ({ab : a ∈ A, b ∈ B}), where (X) denotes the ideal generated by a subset X of R. Show:

(i) A + B ⊲ R, A + B = (A ∪ B).
(ii) AB = {a1 b1 + ⋅ ⋅ ⋅ + an bn : n ∈ ℕ, ai ∈ A, bi ∈ B}, AB ⊂ A ∩ B.
(iii) A(B + C) = AB + AC, (A + B)C = AC + BC, (AB)C = A(BC).
(iv) A = R ⇔ A ∩ R∗ ≠ ∅, where R∗ denotes the set of units of R.
(v) a, b ∈ R ⇒ ⟨a⟩ + ⟨b⟩ = {xa + yb : x, y ∈ R}.
(vi) a, b ∈ R ⇒ ⟨a⟩⟨b⟩ = ⟨ab⟩. Here, ⟨a⟩ = Ra = {xa : x ∈ R}.
6. Solve the following congruence:

3x ≡ 5 (mod 7).

Is this congruence also solvable modulo 17?


7. Show that the set of (2 × 2)-matrices over a ring R forms a ring.
8. Prove Lemma 1.4.8.
9. Prove that if R is a ring with identity and S = {r = m ⋅ 1 : r ∈ R, m ∈ ℤ} then S is a
subring of R containing the identity.
2 Maximal and Prime Ideals
In this chapter we use polynomials over integral domains with one or two indetermi-
nates in an elementary fashion. We will consider polynomial rings in detail in later chap-
ters.

2.1 Maximal and Prime Ideals of the Integers


In the first chapter, we defined ideals I in a ring R, and then the factor ring R/I of R
modulo the ideal I. We saw, furthermore, that if R is commutative, then R/I is also com-
mutative, and if R has an identity, then so does R/I. This raises further questions concern-
ing the structure of factor rings. In particular, we can ask under what conditions does
R/I form an integral domain, and under what conditions does R/I form a field. These
questions lead us to define certain special properties of ideals, called prime ideals and
maximal ideals.
Let us look back at the integers ℤ. Recall that each nonzero proper ideal in ℤ has the form
nℤ for some n > 1, and the resulting factor ring ℤ/nℤ is isomorphic to ℤn . We proved
the following result:

Theorem 2.1.1. The factor ring ℤn = ℤ/nℤ is an integral domain if and only if n = p is a
prime. Furthermore, ℤn is a field again if and only if n = p is a prime.

Hence, for the integers ℤ, a factor ring is a field if and only if it is an integral domain.
We will see later that this is not true in general. However, what is clear is that special
ideals nℤ lead to integral domains and fields when n is a prime. We look at the ideals
pℤ with p a prime in two different ways, and then use these in subsequent sections to
give the general definitions. We first need a famous result, Euclid’s lemma, from number
theory. For integers a, b, the notation a|b means that a divides b.

Lemma 2.1.2 (Euclid). If p is a prime and p|ab, then p|a or p|b.

Proof. Recall that the greatest common divisor or GCD of two integers a, b is an integer
d > 0 such that d is a common divisor of both a and b, and if d1 is another common
divisor of a and b, then d1 |d. We express the GCD of a, b by d = (a, b). It is known that
for any two integers a, b, their GCD exists and is unique, and is the least positive linear
combination of a and b; that is, the least positive integer of the form ax + by for integers
x, y. The integers a, b are relatively prime if their GCD is 1, (a, b) = 1. In this case, 1 is a
linear combination of a and b (see Chapter 3 for proofs and more details).
Now suppose p|ab, where p is a prime. If p does not divide a, then since the only
positive divisors of p are 1 and p, it follows that (a, p) = 1. Hence, 1 is expressible as a
linear combination of a and p. That is, ax+py = 1 for some integers x, y. Multiply through
by b, so that

https://doi.org/10.1515/9783111142524-002

abx + pby = b.

Now p|ab, so p|abx and p|pby. Therefore, p|abx + pby; that is, p|b.
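The proof of Euclid's lemma is constructive once the coefficients x, y with ax + py = 1 are known. A minimal Python sketch, assuming the extended Euclidean algorithm as the way to find them (the function name extended_gcd and the sample numbers are illustrative choices, not taken from the text):

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y for positive a, b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r  # remainder step of the division algorithm
        old_x, x = x, old_x - q * x  # keep old_r = a*old_x + b*old_y invariant
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


p, a, b = 7, 10, 21  # p divides ab = 210, but p does not divide a
d, x, y = extended_gcd(a, p)

# Multiplying a*x + p*y = 1 through by b expresses b as a sum of two multiples of p.
b_as_combination = a * b * x + p * b * y
```

Since p divides both abx and pby, it divides their sum, which is b.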

We now recast this lemma in two different ways in terms of the ideal pℤ. Notice
that pℤ consists precisely of all the multiples of p.
Hence, p|ab is equivalent to ab ∈ pℤ.

Lemma 2.1.3. If p is a prime and ab ∈ pℤ, then a ∈ pℤ, or b ∈ pℤ.

This conclusion will be taken as a motivation for the definition of a prime ideal in
the next section.

Lemma 2.1.4. If p is a prime and pℤ ⊂ nℤ, then n = 1, or n = p. That is, every ideal in ℤ
containing pℤ with p a prime is either all of ℤ or pℤ.

Proof. Suppose that pℤ ⊂ nℤ. Then p ∈ nℤ; therefore, p is a multiple of n. Since p is a


prime, it follows easily that either n = 1, or n = p.

In Section 2.3, the conclusion of this lemma will be taken as a motivation for the
definition of a maximal ideal.

2.2 Prime Ideals and Integral Domains


Motivated by Lemma 2.1.3, we make the following general definition for commutative
rings R with identity:

Definition 2.2.1. Let R be a commutative ring. An ideal P in R with P ≠ R is a prime ideal


if whenever ab ∈ P with a, b ∈ R, then either a ∈ P, or b ∈ P.

This property of an ideal is precisely what is necessary and sufficient to make the
factor ring R/P an integral domain.

Theorem 2.2.2. Let R be a commutative ring with an identity 1 ≠ 0, and let P be a non-
trivial ideal in R. Then P is a prime ideal if and only if the factor ring R/P is an integral
domain.

Proof. Let R be a commutative ring with an identity 1 ≠ 0, and let P be a prime ideal. We
show that R/P is an integral domain. From the results in the last chapter, we have that
R/P is again a commutative ring with an identity. Therefore, we must show that there
are no zero divisors in R/P. Suppose that (a + P)(b + P) = 0 in R/P. The zero element in
R/P is 0 + P and, hence,

(a + P)(b + P) = 0 = 0 + P ⟹ ab + P = 0 + P ⟹ ab ∈ P.

However, P is a prime ideal; therefore, we must have a ∈ P, or b ∈ P. If a ∈ P, then


a + P = P = 0 + P so a + P = 0 in R/P. The identical argument works if b ∈ P. Therefore,
there are no zero divisors in R/P and, hence, R/P is an integral domain.
Conversely, suppose that R/P is an integral domain. We must show that P is a prime
ideal. Suppose that ab ∈ P. Then (a + P)(b + P) = ab + P = 0 + P. Hence, in R/P, we have

(a + P)(b + P) = 0.

However, R/P is an integral domain, so it has no zero divisors. It follows that either
a + P = 0 and, hence, a ∈ P, or b + P = 0 and, hence, b ∈ P.
Therefore, P is a prime ideal.

In a commutative ring R, we can define a multiplication of ideals. We then obtain


an exact analog of Euclid’s lemma. Since R is commutative, each ideal is 2-sided.

Definition 2.2.3. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. Define

AB = {a1 b1 + ⋅ ⋅ ⋅ + an bn : ai ∈ A, bi ∈ B, n ∈ ℕ}.

That is, AB is the set of finite sums of products ab with a ∈ A and b ∈ B.

Lemma 2.2.4. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. Then AB is an ideal.

Proof. We must verify that AB is a subring, and that it is closed under multiplication
from R. Let r1 , r2 ∈ AB. Then

r1 = a1 b1 + ⋅ ⋅ ⋅ + an bn for some ai ∈ A, bi ∈ B,

and

r2 = a1′ b′1 + ⋅ ⋅ ⋅ + am′ b′m for some ai′ ∈ A, b′i ∈ B.

Then

r1 ± r2 = a1 b1 + ⋅ ⋅ ⋅ + an bn ± a1′ b′1 ± ⋅ ⋅ ⋅ ± am′ b′m ,

which is clearly in AB. Furthermore,

r1 ⋅ r2 = a1 b1 a1′ b′1 + ⋅ ⋅ ⋅ + an bn am′ b′m .

Consider, for example, the first term a1 b1 a1′ b′1 . Since R is commutative, this is equal to

(a1 a1′ )(b1 b′1 ).



Now a1 a1′ ∈ A since A is a subring, and b1 b′1 ∈ B since B is a subring. Hence, this term
is in AB. Similarly, for each of the other terms. Therefore, r1 r2 ∈ AB and, hence, AB is a
subring.
Now let r ∈ R, and consider rr1 . This is then

rr1 = ra1 b1 + ⋅ ⋅ ⋅ + ran bn .

Now rai ∈ A for each i since A is an ideal. Hence, each summand is in AB, and then
rr1 ∈ AB. Therefore, AB is an ideal.

Lemma 2.2.5. Let R be a commutative ring with an identity 1 ≠ 0, and let A and B be
ideals in R. If P is a prime ideal in R, then AB ⊂ P implies that A ⊂ P or B ⊂ P.

Proof. Suppose that AB ⊂ P with P a prime ideal, and suppose that B is not contained
in P. We show that A ⊂ P. Since AB ⊂ P, each product ai bj ∈ P. Choose a b ∈ B with b ∉ P,
and let a be an arbitrary element of A. Then ab ∈ P. Since P is a prime ideal, this implies
either a ∈ P, or b ∈ P. But by assumption b ∉ P, so a ∈ P. Since a was arbitrary, we have
A ⊂ P.
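Products of ideals and the prime ideal property can be explored in a finite ring. The following Python sketch is an assumed illustration in ℤ12 with hypothetical function names; ideals are stored as sets of residues, and the product AB is built as the additive closure of all products ab.

```python
N = 12


def principal(a):
    """The principal ideal a*Z_12 as a frozenset of residues."""
    return frozenset((a * r) % N for r in range(N))


def ideal_product(A, B):
    """AB: all finite sums of products ab with a in A and b in B (mod 12)."""
    generators = {(a * b) % N for a in A for b in B}
    S = {0}
    while True:
        bigger = S | {(s + g) % N for s in S for g in generators}
        if bigger == S:  # closed under adding further products
            return frozenset(S)
        S = bigger


def is_prime_ideal(P):
    """P != Z_12, and ab in P implies a in P or b in P."""
    if len(P) == N:
        return False
    return all(
        a in P or b in P
        for a in range(N) for b in range(N) if (a * b) % N in P
    )


A, B = principal(2), principal(3)
AB = ideal_product(A, B)   # equals the ideal generated by 6
AA = ideal_product(A, A)   # equals the ideal generated by 4
```

The ideal 2ℤ12 is prime and contains AB together with A, as Lemma 2.2.5 predicts. The ideal 4ℤ12 is not prime: it contains AA, but it does not contain A.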

2.3 Maximal Ideals and Fields


Now, motivated by Lemma 2.1.4, we define a maximal ideal.

Definition 2.3.1. Let R be a ring and I an ideal in R. Then I is a maximal ideal if I ≠ R,


and if J is an ideal in R with I ⊂ J, then I = J, or J = R.

If R is a commutative ring with an identity, this property of an ideal I is precisely
what is necessary and sufficient for R/I to be a field.

Theorem 2.3.2. Let R be a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R. Then I is a maximal ideal if and only if the factor ring R/I is a field.

Proof. Suppose that R is a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R. Suppose first that I is a maximal ideal, and we show that the factor ring R/I is a field.
Since R is a commutative ring with an identity, the factor ring R/I is also a commu-
tative ring with an identity. We must show then that each nonzero element of R/I has a
multiplicative inverse. Suppose then that r + I ∈ R/I is a nonzero element of R/I. It
follows that r ∉ I. Consider the set ⟨r, I⟩ = {rx + i : x ∈ R, i ∈ I}. This is also an ideal (see
exercises) called the ideal generated by r and I, denoted ⟨r, I⟩. Clearly, I ⊂ ⟨r, I⟩, and
since r ∉ I, and r = r ⋅ 1 + 0 ∈ ⟨r, I⟩, it follows that ⟨r, I⟩ ≠ I. Since I is a maximal ideal,
it follows that ⟨r, I⟩ = R the whole ring. Hence, the identity element 1 ∈ ⟨r, I⟩, and so,
there exist elements x ∈ R and i ∈ I such that 1 = rx + i. But then 1 ∈ rx + I, and
so, 1 + I = rx + I = (r + I)(x + I). Since 1 + I is the multiplicative identity of R/I, it follows that
x + I is the multiplicative inverse of r + I in R/I. Since r + I was an arbitrary nonzero
element of R/I, it follows that R/I is a field.
Now suppose that R/I is a field for an ideal I. We show that I must be maximal.
Suppose then that I1 is an ideal with I ⊂ I1 and I ≠ I1 . We must show that I1 is all of R.
Since I ≠ I1 , there exists an r ∈ I1 with r ∉ I. Therefore, the element r + I is nonzero in
the factor ring R/I, and since R/I is a field, it must have a multiplicative inverse x + I.
Hence, (r + I)(x + I) = rx + I = 1 + I and, therefore, there is an i ∈ I with 1 = rx + i.
Since r ∈ I1 , and I1 is an ideal, we get that rx ∈ I1 . In addition, since I ⊂ I1 , it follows that
rx + i ∈ I1 , and so, 1 ∈ I1 . If r1 is an arbitrary element of R, then r1 ⋅ 1 = r1 ∈ I1 . Hence,
R ⊂ I1 , and so, R = I1 . Therefore, I is a maximal ideal.
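
As a small computational illustration of Theorem 2.3.2, the following Python sketch works in R = ℤ, where an ideal mℤ contains nℤ exactly when m divides n. It compares maximality of nℤ with the field property of ℤ/nℤ by brute force. The function names are hypothetical and chosen only for this illustration.

```python
def is_maximal_in_Z(n):
    """Check whether nZ is a maximal ideal of Z, for n >= 2.

    mZ contains nZ exactly when m divides n, so nZ is maximal when the
    only positive divisors of n are 1 (giving Z) and n (giving nZ itself).
    """
    divisors = [m for m in range(1, n + 1) if n % m == 0]
    return divisors == [1, n]


def quotient_is_field(n):
    """Check whether every nonzero class r + nZ has an inverse x + nZ."""
    return all(
        any((r * x) % n == 1 for x in range(1, n))
        for r in range(1, n)
    )


for n in range(2, 13):
    # The two columns agree for every n, as the theorem predicts.
    print(n, is_maximal_in_Z(n), quotient_is_field(n))
```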

Recall that a field is already an integral domain. Combining this with the ideas of
prime and maximal ideals we obtain:

Theorem 2.3.3. Let R be a commutative ring with an identity 1 ≠ 0. Then each maximal
ideal is a prime ideal.

Proof. Suppose that R is a commutative ring with an identity and I is a maximal ideal
in R. Then from Theorem 2.3.2, we have that the factor ring R/I is a field. But a field is an
integral domain, so R/I is an integral domain. Therefore, from Theorem 2.2.2, we have
that I must be a prime ideal.

The converse is not true in general. That is, there are prime ideals that are not max-
imal. Consider, for example, R = ℤ the integers and I = {0}. Then I is an ideal, and
R/I = ℤ/{0} ≅ ℤ is an integral domain. Hence, {0} is a prime ideal. However, ℤ is not
a field, so {0} is not maximal. Note, however, that in the integers ℤ, a nonzero proper ideal is
maximal if and only if it is a prime ideal.

2.4 The Existence of Maximal Ideals


In this section, we prove that in any ring R with an identity, there do exist maximal ideals.
Furthermore, given an ideal I ≠ R, then there exists a maximal ideal I0 such that I ⊂ I0 .
To prove this, we need three important equivalent results from logic and set theory.
First, recall that a partial order ≤ on a set S is a reflexive, antisymmetric, transitive
relation on S. That is, a ≤ a for all a ∈ S; if a ≤ b and b ≤ a, then a = b; and if a ≤ b, b ≤ c,
then a ≤ c. This is a “partial” order since there may exist elements a, b ∈ S, where neither
a ≤ b, nor b ≤ a. If A is any set, then it is
clear that containment of subsets is a partial order on the power set 𝒫 (A).
If ≤ is a partial order on a set M, then a chain on M is a subset K ⊂ M such that
a, b ∈ K implies that a ≤ b or b ≤ a. A chain on M is bounded if there exists an m ∈ M
such that k ≤ m for all k ∈ K. The element m is called an upper bound for K. An element
m0 ∈ M is maximal if whenever m ∈ M with m0 ≤ m, then m = m0 . We now state the
three important results from logic.

Zorn’s lemma. If each chain of M has an upper bound in M, then there is at least one
maximal element in M.

Axiom of well-ordering. Each set M can be well-ordered, such that each nonempty sub-
set of M contains a least element.

Axiom of choice. Let {Mi : i ∈ I} be a nonempty collection of nonempty sets. Then there
is a mapping f : I → ⋃i∈I Mi with f (i) ∈ Mi for all i ∈ I.

The following can be proved.

Theorem 2.4.1. Zorn’s lemma, the axiom of well-ordering and the axiom of choice are all
equivalent.

We now show the existence of maximal ideals in commutative rings with identity.

Theorem 2.4.2. Let R be a commutative ring with an identity 1 ≠ 0, and let I be an ideal
in R with I ≠ R. Then there exists a maximal ideal I0 in R with I ⊂ I0 . In particular, a ring
with an identity contains maximal ideals.

Proof. Let I be an ideal in the commutative ring R. We must show that there exists a
maximal ideal I0 in R with I ⊂ I0 .
Let

M = {X : X is an ideal with I ⊂ X ≠ R}.

Then M is partially ordered by containment, and M is nonempty since I ∈ M. We want to
show first that each chain in M has an upper bound in M. The empty chain has the upper
bound I. If K = {Xj : Xj ∈ M, j ∈ J} is a nonempty chain, let

X′ = ⋃j∈J Xj .

If a, b ∈ X′ , then there exist i, j ∈ J with a ∈ Xi , b ∈ Xj . Since K is a chain, either
Xi ⊂ Xj or Xj ⊂ Xi . Without loss of generality, suppose that Xi ⊂ Xj so that a, b ∈ Xj .
Then a ± b ∈ Xj ⊂ X ′ , and ab ∈ Xj ⊂ X ′ , since Xj is an ideal. Furthermore, if r ∈ R, then
ra ∈ Xj ⊂ X ′ , since Xj is an ideal. Therefore, X ′ is an ideal in R.
Since Xj ≠ R, it follows that 1 ∉ Xj for all j ∈ J. Therefore, 1 ∉ X′ , and so X′ ≠ R. Since
also I ⊂ X′ , we have X′ ∈ M, and under the partial order of containment X′ is an upper
bound for K.
We now use Zorn’s lemma. From the argument above, we have that each chain in M has
an upper bound in M. Hence, the set M has a maximal element I0 . If J is an ideal with
I0 ⊂ J ≠ R, then J ∈ M, and so J = I0 by the maximality of I0 in M. Therefore, I0 is a
maximal ideal containing I.

2.5 Principal Ideals and Principal Ideal Domains


Recall again that in the integers ℤ, each ideal I is of the form nℤ for some integer n.
Hence, in ℤ, each ideal can be generated by a single element.

Lemma 2.5.1. Let R be a commutative ring and a1 , . . . , an be elements of R. Then the set

⟨a1 , . . . , an ⟩ = {r1 a1 + ⋅ ⋅ ⋅ + rn an : ri ∈ R}

forms an ideal in R called the ideal generated by a1 , . . . , an .

Proof. The proof is straightforward. Let

a = r1 a1 + ⋅ ⋅ ⋅ + rn an , b = s1 a1 + ⋅ ⋅ ⋅ + sn an

with r1 , . . . , rn , s1 , . . . , sn elements of R, be two elements of ⟨a1 , . . . , an ⟩. Then

a ± b = (r1 ± s1 )a1 + ⋅ ⋅ ⋅ + (rn ± sn )an ∈ ⟨a1 , . . . , an ⟩


ab = (r1 s1 a1 )a1 + (r1 s2 a1 )a2 + ⋅ ⋅ ⋅ + (rn sn an )an ∈ ⟨a1 , . . . , an ⟩,

so ⟨a1 , . . . , an ⟩ forms a subring. Furthermore, if r ∈ R, we have

ra = (rr1 )a1 + ⋅ ⋅ ⋅ + (rrn )an ∈ ⟨a1 , . . . , an ⟩,

and so ⟨a1 , . . . , an ⟩ is an ideal.

Definition 2.5.2. Let R be a commutative ring. An ideal I ⊂ R is a principal ideal if it has
a single generator. That is,

I = ⟨a⟩ = aR for some a ∈ R.

We now restate Theorem 1.4.7 of Chapter 1.

Theorem 2.5.3. Every nonzero ideal in ℤ is a principal ideal.

Proof. Every ideal I in ℤ is of the form nℤ. This is the principal ideal generated by n.

Definition 2.5.4. A principal ideal domain or PID is an integral domain, in which every
ideal is principal.

Corollary 2.5.5. The integers ℤ are a principal ideal domain.

We mention that the set of polynomials K[x] with coefficients from a field K is also
a principal ideal domain. We will return to this in the next chapter.
Not every integral domain is a PID. Consider K[x, y] = (K[x])[y], the set of polyno-
mials over K in two variables x, y (see Chapter 4). Let I consist of all the polynomials
with zero constant term.

Lemma 2.5.6. The set I in K[x, y] as defined above is an ideal, but not a principal ideal.

Proof. We leave the proof that I forms an ideal to the exercises. To show that it is not
a principal ideal, suppose I = ⟨p(x, y)⟩. Now the polynomial q(x) = x has zero constant
term, so q(x) ∈ I. Hence, p(x, y) cannot be a constant polynomial. In addition, if p(x, y)
had any terms with y in them, there would be no way to multiply p(x, y) by a polynomial
h(x, y) and obtain just x. Therefore, p(x, y) can contain no terms with y in them. But the
same argument, using s(y) = y, shows that p(x, y) cannot have any terms with x in them.
Therefore, there can be no such p(x, y) generating I, and so, I is not principal, and K[x, y]
is not a principal ideal domain.

2.6 Exercises
1. Consider the set ⟨r, I⟩ = {rx + i : x ∈ R, i ∈ I}, where I is an ideal. Prove that this is
also an ideal called the ideal generated by r and I, denoted ⟨r, I⟩.
2. Let R and S be commutative rings, and let ϕ : R → S be a ring epimorphism. Let
M be a maximal ideal in R. Show that ϕ(M) is a maximal ideal in S if and only if
ker(ϕ) ⊂ M. Is ϕ(M) always a prime ideal of S?
3. Let A1 , . . . , At be ideals of a commutative ring R. Let P be a prime ideal of R. Show:
(i) A1 ∩ ⋅ ⋅ ⋅ ∩ At ⊂ P implies Aj ⊂ P for at least one index j.
(ii) A1 ∩ ⋅ ⋅ ⋅ ∩ At = P implies Aj = P for at least one index j.
4. Which of the following ideals A are prime ideals of R? Which are maximal ideals?
(i) A = ⟨x⟩, R = ℤ[x].
(ii) A = ⟨x²⟩, R = ℤ[x].
(iii) A = ⟨1 + √5⟩, R = ℤ[√5] = {a + b√5 : a, b ∈ ℤ}.
(iv) A = ⟨x, y⟩, R = ℚ[x, y].
5. Let w = ½(1 + √−3). Show that ⟨2⟩ is a prime ideal and even a maximal ideal of ℤ[w],
but ⟨2⟩ is neither a prime ideal nor a maximal ideal of ℤ[i], i = √−1 ∈ ℂ.
6. Let R = {a/b : a, b ∈ ℤ, b odd}. Show that R is a subring of ℚ, and that there is only
one maximal ideal M in R.
7. Let R be a commutative ring with an identity. Let x, y ∈ R and x ≠ 0 not be a zero di-
visor. Furthermore, let ⟨x⟩ be a prime ideal with ⟨x⟩ ⊂ ⟨y⟩ ≠ R. Show that ⟨x⟩ = ⟨y⟩.
8. Consider K[x, y] the set of polynomials over K in two variables x, y. Let I consist of
all the polynomials with zero constant term. Prove that the set I is an ideal.
3 Prime Elements and Unique Factorization Domains
In this chapter we use again polynomials over integral domains with one or two indeter-
minates in an elementary fashion. We will consider polynomial rings in detail in later
chapters.

3.1 The Fundamental Theorem of Arithmetic


The integers ℤ have served as much of our motivation for properties of integral do-
mains. In the last chapter, we saw that ℤ is a principal ideal domain, and furthermore,
that prime ideals ≠ {0} are maximal. From the viewpoint of the multiplicative structure
of ℤ and the viewpoint of classical number theory, the most important property of ℤ
is the fundamental theorem of arithmetic. This states that any integer n ≠ 0 is uniquely
expressible as a product of primes, where uniqueness is up to ordering and the intro-
duction of ±1; that is, units. In this chapter, we show that this property is not unique to
the integers, and there are many other integral domains, where this also holds. These
are called unique factorization domains, and we will present several examples. First, we
review the fundamental theorem of arithmetic, its proof and several other ideas from
classical number theory.

Theorem 3.1.1 (Fundamental theorem of arithmetic). Given any integer n ≠ 0, there is a
factorization

n = cp1 p2 ⋅ ⋅ ⋅ pk ,

where c = ±1 and p1 , . . . , pk are primes. Furthermore, this factorization is unique up to
the ordering of the factors.

There are two main ingredients that go into the proof: induction and Euclid’s lemma.
We presented the latter in the last chapter. In turn, however, Euclid’s lemma depends upon
the existence of greatest common divisors and their linear expressibility. Therefore, to
begin, we present several basic ideas from number theory.
The starting point for the theory of numbers is divisibility.

Definition 3.1.2. If a, b are integers, we say that a divides b, or that a is a factor or divisor
of b, if there exists an integer q such that b = aq. We denote this by a|b. b is then a multiple
of a. If b > 1 is an integer whose only factors are ±1, ±b, then b is a prime, otherwise, b > 1
is composite.

The following properties of divisibility are straightforward consequences of the
definition.

Lemma 3.1.3. The following properties hold:


(1) a|b ⇒ a|bc for any integer c.

https://doi.org/10.1515/9783111142524-003

(2) a|b and b|c implies a|c.


(3) a|b and a|c implies that a|(bx + cy) for any integers x, y.
(4) a|b and b|a implies that a = ±b.
(5) If a|b and a > 0, b > 0, then a ≤ b.
(6) a|b if and only if ca|cb for any integer c ≠ 0.
(7) a|0 for all a ∈ ℤ, and 0|a only for a = 0.
(8) a| ± 1 only for a = ±1.
(9) a1 |b1 and a2 |b2 implies that a1 a2 |b1 b2 .

If b, c, x, y are integers, then an integer bx + cy is called a linear combination of b, c.
Thus, part (3) of Lemma 3.1.3 says that if a is a common divisor of b, c, then a divides any
linear combination of b and c.
Furthermore, note that if b > 1 is a composite, then there exists x > 0 and y > 0 such
that b = xy, and from part (5), we must have 1 < x < b, 1 < y < b.
In ordinary arithmetic, given a, b, we can always attempt to divide a into b. The
next result, called the division algorithm, says that if a > 0, either a will divide b, or the
remainder of the division of b by a will be less than a.

Theorem 3.1.4 (Division algorithm). Given integers a, b with a > 0, then there exist unique
integers q and r such that b = qa + r, where either r = 0 or 0 < r < a.

One may think of q and r as the quotient and remainder, respectively, when dividing
b by a.

Proof. Given a, b with a > 0, consider the set

S = {b − qa ≥ 0 : q ∈ ℤ}.

If b > 0, then b + a ≥ 0, and the sum is in S. If b ≤ 0, then there exists a q > 0 with
−qa < b. Then b + qa > 0 and is in S. Therefore, in either case, S is nonempty. Hence, S
is a nonempty subset of ℕ ∪ {0} and, therefore, has a least element r = b − qa for some
q ∈ ℤ. If r ≠ 0, we must show that 0 < r < a. Suppose r ≥ a, then r = a + x with x ≥ 0,
and x < r since a > 0. Then b − qa = r = a + x ⇒ b − (q + 1)a = x. This means that x ∈ S.
Since x < r, this contradicts the minimality of r. Therefore, if r ≠ 0, it follows
that 0 < r < a.
The only thing left is to show the uniqueness of q and r. Suppose b = q1 a + r1 also, with
0 ≤ r1 < a. Then r1 = b − q1 a ∈ S, so r ≤ r1 by the minimality of r. Moreover,
r1 − r = (q − q1 )a is a multiple of a with 0 ≤ r1 − r < a, which forces r1 − r = 0, so
r = r1 . Now

b − qa = b − q1 a ⟹ (q1 − q)a = 0,

but since a > 0, it follows that q1 − q = 0 so that q = q1 .
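
The proof above can be followed step by step in code. The Python sketch below walks through the set S by adding or subtracting a until it reaches the least nonnegative value b − qa. The function name `division_algorithm` is an assumption for this illustration; Python's built-in `divmod` returns the same pair when a > 0.

```python
def division_algorithm(b, a):
    """Return (q, r) with b = q*a + r and 0 <= r < a, for an integer a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    q, r = 0, b
    # If r is negative, move up into S = {b - q*a >= 0}.
    while r < 0:
        q -= 1
        r += a
    # While r >= a, the value r - a is a smaller element of S.
    while r >= a:
        q += 1
        r -= a
    return q, r


print(division_algorithm(2412, 270))  # (8, 252)
print(division_algorithm(-7, 3))      # (-3, 2)
```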


The next idea that is necessary is the concept of greatest common divisor.

Definition 3.1.5. Given nonzero integers a, b, their greatest common divisor or GCD
d > 0 is a positive integer such that it is their common divisor, that is, d|a and d|b, and
if d1 is any other common divisor, then d1 |d. We denote the greatest common divisor of
a, b by either gcd(a, b) or (a, b).

Certainly, if a, b are nonzero integers with a > 0 and a|b, then a = gcd(a, b).
The next result says that given any nonzero integers, they do have a greatest com-
mon divisor, and it is unique.

Theorem 3.1.6. Given nonzero integers a, b, their GCD exists, is unique, and can be char-
acterized as the least positive linear combination of a and b.

Proof. Given nonzero a, b, consider the set

S = {ax + by > 0 : x, y ∈ ℤ}.

Now, a² + b² > 0, so S is a nonempty subset of ℕ and, hence, has a least element, d > 0.
We show that d is the GCD.
First we must show that d is a common divisor. Now d = ax + by and is the least
such positive linear combination. By the division algorithm, a = qd + r with 0 ≤ r < d.
Suppose r ≠ 0. Then r = a − qd = a − q(ax + by) = (1 − qx)a − qby > 0. Hence, r is a
positive linear combination of a and b, and therefore in S. But then r < d, contradicting
the minimality of d in S. It follows that r = 0, and so, a = qd, and d|a. An identical
argument shows that d|b, and so, d is a common divisor of a and b. Let d1 be any other
common divisor of a and b. Then d1 divides any linear combination of a and b, and so
d1 |d. Therefore, d is the GCD of a and b.
Finally, we must show that d is unique. Suppose d1 is another GCD of a and b. Then
d1 > 0, and d1 is a common divisor of a, b. Then d1 |d since d is a GCD. Identically, d|d1
since d1 is a GCD. Therefore, d = ±d1 , and then d = d1 since they are both positive.
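
The characterization in Theorem 3.1.6 can be checked numerically for small inputs. The Python sketch below searches a finite box of coefficients for the least positive value of ax + by and compares it with `math.gcd`. The function name and the search bound max(|a|, |b|) are assumptions for this illustration; the bound is large enough to contain a pair of coefficients realizing the GCD.

```python
from math import gcd


def least_positive_combination(a, b):
    """Return (d, x, y) where d = a*x + b*y is the least positive value found."""
    bound = max(abs(a), abs(b))
    best = None
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            value = a * x + b * y
            if value > 0 and (best is None or value < best[0]):
                best = (value, x, y)
    return best


d, x, y = least_positive_combination(12, 42)
print(d, 12 * x + 42 * y, gcd(12, 42))  # 6 6 6
```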

If (a, b) = 1, then we say that a, b are relatively prime. It follows that a and b are
relatively prime if and only if 1 is expressible as a linear combination of a and b. We
need the following three results:

Lemma 3.1.7. If d = (a, b), then a = a1 d and b = b1 d with (a1 , b1 ) = 1.

Proof. If d = (a, b), then d|a, and d|b. Hence, a = a1 d, and b = b1 d. We have

d = ax + by = a1 dx + b1 dy.

Dividing both sides of the equation by d, we obtain

1 = a1 x + b1 y.

Therefore, (a1 , b1 ) = 1.

Lemma 3.1.8. For any integer c, we have that (a, b) = (a, b + ac).

Proof. Suppose (a, b) = d and (a, b + ac) = d1 . Now d is the least positive linear combi-
nation of a and b. Suppose d = ax + by. d1 is a linear combination of a, b + ac so that

d1 = ar + (b + ac)s = a(cs + r) + bs.

Hence, d1 is also a linear combination of a and b; therefore, d1 ≥ d. On the other hand,
d1 |a, and d1 |(b + ac), and so, d1 |b. Therefore, d1 |d, so d1 ≤ d. Combining these, we must
have d1 = d.

The next result, called the Euclidean algorithm, provides a technique for both find-
ing the GCD of two integers and expressing the GCD as a linear combination.

Theorem 3.1.9 (Euclidean algorithm). Given integers b and a > 0 with a ∤ b, the following
repeated divisions are formed:

b = q1 a + r1 , 0 < r1 < a
a = q2 r1 + r2 , 0 < r2 < r1
⋮
rn−2 = qn rn−1 + rn , 0 < rn < rn−1
rn−1 = qn+1 rn .

The last nonzero remainder rn is the GCD of a, b. Furthermore, rn can be expressed as
a linear combination of a and b by successively eliminating the ri ’s in the intermediate
equations.

Proof. In taking the successive divisions as outlined in the statement of the theorem,
each remainder ri gets strictly smaller and still nonnegative. Hence, it must finally end
with a zero remainder. Therefore, there is a last nonzero remainder rn . We must show
that this is the GCD.
Now from Lemma 3.1.8, the gcd (a, b) = (a, b − q1 a) = (a, r1 ) = (r1 , a − q2 r1 ) = (r1 , r2 ).
Continuing in this manner, we have then that (a, b) = (rn−1 , rn ) = rn since rn divides rn−1 .
This shows that rn is the GCD.
To express rn as a linear combination of a and b, first notice that

rn = rn−2 − qn rn−1 .

Substituting this in the immediately preceding division, we get

rn = rn−2 − qn (rn−3 − qn−1 rn−2 ) = (1 + qn qn−1 )rn−2 − qn rn−3 .

Doing this successively, we ultimately express rn as a linear combination of a and b.
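
The chain of divisions in Theorem 3.1.9 can be sketched in Python as follows. The function `euclid_steps` is a hypothetical helper written for this illustration; it records each division b = qa + r and stops when the remainder is zero, so the divisor of the last step is the last nonzero remainder. The inputs 1071 and 462 are an arbitrary choice.

```python
def euclid_steps(b, a):
    """Return the divisions (dividend, quotient, divisor, remainder) for a > 0."""
    steps = []
    while a != 0:
        q, r = divmod(b, a)
        steps.append((b, q, a, r))
        b, a = a, r  # continue with the divisor and the remainder
    return steps


steps = euclid_steps(1071, 462)
for dividend, q, divisor, r in steps:
    print(f"{dividend} = {q} * {divisor} + {r}")
# 1071 = 2 * 462 + 147
# 462 = 3 * 147 + 21
# 147 = 7 * 21 + 0

print(steps[-1][2])  # 21
```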



Example 3.1.10. Find the GCD of 270 and 2412, and express it as a linear combination of
270 and 2412.
We apply the Euclidean algorithm

2412 = 8 ⋅ 270 + 252
270 = 1 ⋅ 252 + 18
252 = 14 ⋅ 18.

Therefore, the last nonzero remainder is 18, which is the GCD. We now must express 18
as a linear combination of 270 and 2412.
From the first equation

252 = 2412 − 8 ⋅ 270,

which gives in the second equation

270 = 2412 − 8 ⋅ 270 + 18 ⟹ 18 = −1 ⋅ 2412 + 9 ⋅ 270,

which is the desired linear combination.
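
The elimination of the intermediate remainders can also be carried out while the divisions are performed. The Python sketch below uses the standard iterative form of the extended Euclidean algorithm, which is a variant of the back-substitution used in the example rather than a literal transcription of it. The name `extended_gcd` is an assumption for this illustration.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for integers a, b >= 0."""
    # Invariants: old_r = a*old_x + b*old_y and r = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


d, x, y = extended_gcd(270, 2412)
print(d, x, y)  # 18 9 -1
```

This reproduces the combination 18 = 9 ⋅ 270 − 1 ⋅ 2412 found in Example 3.1.10.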

The next result that we need is Euclid’s lemma. We stated and proved this in the last
chapter, but we restate it here.

Lemma 3.1.11 (Euclid’s lemma). If p is a prime and p|ab, then p|a, or p|b.

We can now prove the fundamental theorem of arithmetic. Induction suffices to
show that there always exists such a decomposition into prime factors.

Lemma 3.1.12. Any integer n > 1 can be expressed as a product of primes, perhaps with
only one factor.

Proof. The proof is by induction. n = 2 is prime. Therefore, it is true at the lowest level.
Suppose that any integer 2 ≤ k < n can be decomposed into prime factors, we must
show that n then also has a prime factorization.
If n is prime, then we are done. Suppose then that n is composite. Hence, n = m1 m2
with 1 < m1 < n, 1 < m2 < n. By the inductive hypothesis, both m1 and m2 can be
expressed as products of primes. Therefore, so can n, using the primes from m1 and
m2 , completing the proof.
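
The induction in this proof translates directly into a recursive procedure. The Python sketch below splits a composite n as m1 m2 and factors both parts; the function name `prime_factors` and the use of trial division to find a divisor are assumptions for this illustration.

```python
from math import isqrt


def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    for m1 in range(2, isqrt(n) + 1):
        if n % m1 == 0:
            # n = m1 * m2 is composite: factor both parts, as in the proof.
            return prime_factors(m1) + prime_factors(n // m1)
    # No divisor between 2 and sqrt(n): n is prime.
    return [n]


print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
print(prime_factors(2412))  # [2, 2, 3, 3, 67]
```
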
Before we continue to the fundamental theorem, we mention that the existence of
a prime decomposition, unique or otherwise, can be used to prove that the set of primes
is infinite. The proof we give goes back to Euclid and is quite straightforward.

Theorem 3.1.13. There are infinitely many primes.

Proof. Suppose that there are only finitely many primes p1 , . . . , pn . Each of these is pos-
itive, so we can form the positive integer

N = p1 p2 ⋅ ⋅ ⋅ pn + 1.

From Lemma 3.1.12, N has a prime decomposition. In particular, there is a prime p, which
divides N. Then

p|(p1 p2 ⋅ ⋅ ⋅ pn + 1).

Since the only primes are assumed to be p1 , p2 , . . . , pn , it follows that p = pi for some
i = 1, . . . , n. But then p|p1 p2 ⋅ ⋅ ⋅ pi ⋅ ⋅ ⋅ pn , so p divides the difference N − p1 ⋅ ⋅ ⋅ pn = 1,
which is a contradiction. Therefore, the list of primes must be
endless.
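
Euclid's construction can be tried on a concrete list. The Python sketch below forms N from the first six primes and finds its smallest prime factor, which is not in the list. The helper `smallest_prime_factor` is a hypothetical name for this illustration. Note that N itself need not be prime.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime dividing n, for an integer n > 1."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


primes = [2, 3, 5, 7, 11, 13]
N = prod(primes) + 1
p = smallest_prime_factor(N)
print(N, p, N // p)  # 30031 59 509
print(p in primes)   # False
```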

We can now prove the fundamental theorem of arithmetic.

Proof. We assume that n ≥ 1. If n ≤ −1, we use c = −1 and apply the same proof to −n. The
statement certainly holds for n = 1 with k = 0. Now suppose n > 1. From Lemma 3.1.12,
n has a prime decomposition:

n = p1 p2 ⋅ ⋅ ⋅ pm .

We must show that this is unique up to the ordering of the factors. Suppose then that n
has another such factorization n = q1 q2 ⋅ ⋅ ⋅ qk with the qi all prime. We must show that
m = k, and that, the primes are the same. Now we have

n = p1 p2 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qk .

Assume that k ≥ m. From

n = p1 p2 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qk ,

it follows that p1 |q1 q2 ⋅ ⋅ ⋅ qk . From Lemma 3.1.11 then, we must have that p1 |qi for some i.
But qi is prime, and p1 > 1, so it follows that p1 = qi . Therefore, we can eliminate p1 and
qi from both sides of the factorization to obtain

p2 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qi−1 qi+1 ⋅ ⋅ ⋅ qk .

Continuing in this manner, we can eliminate all the pi from the left side of the factoriza-
tion to obtain

1 = qm+1 ⋅ ⋅ ⋅ qk .

Since each qi is a prime and, hence, greater than 1, this is impossible unless the product
on the right is empty. Therefore, m = k, and each prime pi was included in the primes
q1 , . . . , qm . Therefore, the factorizations differ only in the
order of the factors, proving the theorem.

3.2 Prime Elements, Units and Irreducibles


We now let R be an arbitrary integral domain and attempt to mimic the divisibility def-
initions and properties.

Definition 3.2.1. Let R be an integral domain.


(1) Suppose that a, b ∈ R. Then a is a factor or divisor of b if there exists a c ∈ R with
b = ac. We denote this, as in the integers, by a|b. If a is a factor of b, then b is called
a multiple of a.
(2) An element a ∈ R is a unit if a has a multiplicative inverse within R; that is, there
exists an element a⁻¹ ∈ R with aa⁻¹ = 1.
(3) A prime element of R is an element p ≠ 0 such that p is not a unit, and if p|ab, then
p|a or p|b.
(4) An irreducible element in R is an element c ≠ 0 such that c is not a unit, and if c = ab,
then a or b must be a unit.
(5) a and b in R are associates if there exists a unit e ∈ R with a = eb.

Notice that in the integers ℤ, the units are just ±1. The set of prime elements co-
incides with the set of irreducible elements. In ℤ, these are precisely the set of prime
numbers. On the other hand, if K is a field, every nonzero element is a unit. Therefore,
in K, there are no prime elements and no irreducible elements.
Recall that the modular rings ℤn are fields (and integral domains) when n is a prime.
In general, if n is not a prime then ℤn is a commutative ring with an identity, and a unit
is still an invertible element. We can characterize the units within ℤn .

Lemma 3.2.2. a ∈ ℤn is a unit if and only if (a, n) = 1.

Proof. Suppose (a, n) = 1. Then there exist x, y ∈ ℤ such that ax + ny = 1. This implies
that ax ≡ 1 (mod n), which in turn implies that ax = 1 in ℤn and, therefore, a is a unit.
Conversely, suppose a is a unit in ℤn . Then there is an x ∈ ℤn with ax = 1. In terms
of congruence then

ax ≡ 1 (mod n) ⟹ n|(ax − 1) ⟹ ax − 1 = ny ⟹ ax − ny = 1.

Therefore, 1 is a linear combination of a and n and so (a, n) = 1.
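
Lemma 3.2.2 is easy to test for small moduli. The Python sketch below lists the units of ℤn in two ways, once through the GCD criterion and once by searching for inverses directly. Both function names are assumptions for this illustration.

```python
from math import gcd


def units_by_gcd(n):
    """Residues a in Z_n with gcd(a, n) = 1."""
    return [a for a in range(n) if gcd(a, n) == 1]


def units_by_search(n):
    """Residues a in Z_n for which some x satisfies a*x = 1 in Z_n."""
    return [a for a in range(n) if any((a * x) % n == 1 for x in range(n))]


print(units_by_gcd(12))     # [1, 5, 7, 11]
print(units_by_search(12))  # [1, 5, 7, 11]
```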

If R is an integral domain, then the set of units within R will form a group.

Lemma 3.2.3. If R is a commutative ring with an identity, then the set of units in R forms
an Abelian group under ring multiplication. This is called the unit group of R, denoted
U(R).

Proof. The commutativity and associativity of U(R) follow from the ring properties. The
identity of U(R) is the multiplicative identity of R, whereas the ring multiplicative in-
verse for each unit is the group inverse. We must show that U(R) is closed under ring
multiplication. If a ∈ R is a unit, we denote its multiplicative inverse by a⁻¹. Now suppose
a, b ∈ U(R). Then a⁻¹, b⁻¹ exist. It follows that

(ab)(b⁻¹a⁻¹) = a(bb⁻¹)a⁻¹ = aa⁻¹ = 1.

Hence, ab has an inverse, namely b⁻¹a⁻¹ (= a⁻¹b⁻¹ in a commutative ring) and, hence,
ab is also a unit. Therefore, U(R) is closed under ring multiplication.

In general, irreducible elements need not be prime. Consider for example the subring of
the complex numbers (see exercises) given by

R = ℤ[i√5] = {x + iy√5 : x, y ∈ ℤ}.

This is a subring of the complex numbers ℂ and, hence, can have no zero divisors. There-
fore, R is an integral domain.
For an element x + iy√5 ∈ R, define its norm by

N(x + iy√5) = |x + iy√5|² = x² + 5y².

Since x, y ∈ ℤ, it is clear that the norm of an element in R is a nonnegative integer.
Furthermore, if a ∈ R with N(a) = 0, then a = 0.
We have the following result concerning the norm:

Lemma 3.2.4. Let R and N be as above. Then


(1) N(ab) = N(a)N(b) for any elements a, b ∈ R.
(2) The units of R are those a ∈ R with N(a) = 1. In R, the only units are ±1.

Proof. The fact that the norm is multiplicative is straightforward and left to the exer-
cises. If a ∈ R is a unit, then there exists a multiplicative inverse b ∈ R with ab = 1. Then
N(ab) = N(a)N(b) = 1. Since both N(a) and N(b) are nonnegative integers, we must
have N(a) = N(b) = 1.
Conversely, suppose that N(a) = 1. If a = x + iy√5, then x² + 5y² = 1. Since x, y ∈ ℤ,
we must have y = 0 and x² = 1. Then a = x = ±1.
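
Both parts of Lemma 3.2.4 can be checked on a finite sample of elements. In the Python sketch below, an element x + iy√5 of R is stored as the pair (x, y); this representation and the names `mul` and `norm` are assumptions for this illustration. A finite check is of course no substitute for the proof.

```python
def mul(a, b):
    """Multiply a = (x1, y1) and b = (x2, y2), where (x, y) means x + i*y*sqrt(5)."""
    x1, y1 = a
    x2, y2 = b
    # (i*sqrt(5))^2 = -5
    return (x1 * x2 - 5 * y1 * y2, x1 * y2 + x2 * y1)


def norm(a):
    """N(x + i*y*sqrt(5)) = x^2 + 5*y^2."""
    x, y = a
    return x * x + 5 * y * y


sample = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]

# Part (1): the norm is multiplicative on the sample.
print(all(norm(mul(a, b)) == norm(a) * norm(b) for a in sample for b in sample))  # True

# Part (2): the only elements of norm 1 are 1 and -1.
print([a for a in sample if norm(a) == 1])  # [(-1, 0), (1, 0)]
```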

Using this lemma we can show that R possesses irreducible elements that are not
prime.

Lemma 3.2.5. Let R be as above. Then 3 = 3 + i0√5 is an irreducible element in R, but 3 is
not prime.

Proof. Suppose that 3 = ab with a, b ∈ R and a, b nonunits. Then N(3) = 9 = N(a)N(b)
with neither N(a) = 1, nor N(b) = 1. Hence, N(a) = 3, and N(b) = 3. Let a = x + iy√5. It
follows that x² + 5y² = 3. Since x, y ∈ ℤ, this is impossible. Therefore, one of a or b must
be a unit, and 3 is an irreducible element.

We show that 3 is not prime in R. Let a = 2 + i√5 and b = 2 − i√5. Then ab = 9 and,
hence, 3|ab. Suppose 3|a so that a = 3c for some c ∈ R. Then

9 = N(a) = N(3)N(c) = 9N(c) ⟹ N(c) = 1.

Therefore, c is a unit in R, and from Lemma 3.2.4, we get c = ±1. Hence, a = ±3. This
is a contradiction, so 3 does not divide a. An identical argument shows that 3 does not
divide b. Therefore, 3 is not a prime element in R.
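
The two halves of this proof can be verified by a short computation. The Python sketch below again stores x + iy√5 as the pair (x, y), an assumption made for this illustration, and uses the fact that 3 divides x + iy√5 in R exactly when 3 divides both x and y.

```python
def mul(a, b):
    """Multiply a = (x1, y1) and b = (x2, y2), where (x, y) means x + i*y*sqrt(5)."""
    x1, y1 = a
    x2, y2 = b
    return (x1 * x2 - 5 * y1 * y2, x1 * y2 + x2 * y1)


def norm(a):
    x, y = a
    return x * x + 5 * y * y


def divisible_by_3(a):
    """3*(u + i*v*sqrt(5)) = 3u + i*3v*sqrt(5), so test both coordinates."""
    x, y = a
    return x % 3 == 0 and y % 3 == 0


# Irreducibility: a factorization 3 = ab into nonunits needs N(a) = 3.
# x^2 + 5y^2 = 3 forces |x| <= 1 and y = 0, so this small search is exhaustive.
print([(x, y) for x in range(-2, 3) for y in range(-1, 2) if norm((x, y)) == 3])  # []

# Not prime: 3 divides the product ab = 9 but neither factor.
a, b = (2, 1), (2, -1)
print(mul(a, b))                                                     # (9, 0)
print(divisible_by_3(mul(a, b)), divisible_by_3(a), divisible_by_3(b))  # True False False
```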

We now examine the relationship between prime elements and irreducibles.

Theorem 3.2.6. Let R be an integral domain. Then


(1) Each prime element of R is irreducible.
(2) p ∈ R is a prime element if and only if p ≠ 0, and ⟨p⟩ = pR is a prime ideal.
(3) p ∈ R is irreducible if and only if p ≠ 0, and ⟨p⟩ = pR is maximal in the set of all
principal ideals of R, which are not equal to R.

Proof. (1) Suppose that p ∈ R is a prime element, and p = ab. We must show that either
a or b must be a unit. Now p|ab, so either p|a, or p|b. Without loss of generality, we may
assume that p|a, so a = pr for some r ∈ R. Hence, p = ab = (pr)b = p(rb). However, R is
an integral domain, so p − prb = p(1 − rb) = 0 implies that 1 − rb = 0 and, hence, rb = 1.
Therefore, b is a unit and, hence, p is irreducible.
(2) Suppose that p is a prime element. Then p ≠ 0. Consider the ideal pR, and suppose
that ab ∈ pR. Then ab is a multiple of p and, hence, p|ab. Since p is prime, it follows that
p|a or p|b. If p|a, then a ∈ pR, whereas if p|b, then b ∈ pR. Therefore, pR is a prime ideal.
Conversely, suppose that p ≠ 0 and pR is a prime ideal. Then pR ≠ R, so p is not a unit.
Suppose that p|ab. Then ab ∈ pR,
so a ∈ pR, or b ∈ pR. If a ∈ pR, then p|a, and if b ∈ pR, then p|b. Therefore, p is prime.
(3) Let p be irreducible, then p ≠ 0. Suppose that pR ⊂ aR, where a ∈ R. Then p = ra
for some r ∈ R. Since p is irreducible, it follows that either a is a unit, or r is a unit. If
r is a unit, we have pR = raR = aR ≠ R since p is not a unit. If a is a unit, then aR = R,
and pR = rR ≠ R. Therefore, pR is maximal in the set of principal ideals not equal to R.
Conversely, suppose p ≠ 0 and pR is a maximal ideal in the set of principal ideals ≠ R. Let
p = ab with a not a unit. We must show that b is a unit. Since aR ≠ R, and pR ⊂ aR, from
the maximality we must have pR = aR. Hence, a = rp for some r ∈ R. Then p = ab = rpb
and, as before, we must have rb = 1 and b a unit.

Theorem 3.2.7. Let R be a principal ideal domain. Then we have the following:
(1) An element p ∈ R is irreducible if and only if it is a prime element.
(2) A nonzero ideal of R is a maximal ideal if and only if it is a prime ideal.
(3) The maximal ideals of R are precisely those ideals pR, where p is a prime element.

Proof. First note that {0} is a prime ideal, but it is not maximal unless R is a field.
(1) We already know that prime elements are irreducible. To show the converse,
suppose that p is irreducible. Since R is a principal ideal domain, from Theorem 3.2.6 we
have that pR is a maximal ideal, and each maximal ideal is also a prime ideal. Therefore,
from Theorem 3.2.6, we have that p is a prime element.
(2) We already know that each maximal ideal is a prime ideal. To show the converse,
suppose that I ≠ {0} is a prime ideal. Then I = pR, where p is a prime element with
p ≠ 0. Therefore, p is irreducible from part (1) and, hence, pR is a maximal ideal from
Theorem 3.2.6.
(3) This follows directly from the proof in part (2) and Theorem 3.2.6.

In particular, this theorem explains the remark at the end of Section 2.3: in
the principal ideal domain ℤ, a nonzero proper ideal is maximal if and only if it is a prime ideal.

3.3 Unique Factorization Domains


We now consider integral domains, where there is unique factorization into primes. If R
is an integral domain and a, b ∈ R, then we say that a and b are associates if there exists
a unit ϵ ∈ R with a = ϵb.

Definition 3.3.1. An integral domain D is a unique factorization domain or UFD if for
each d ∈ D either d = 0, d is a unit, or d has a factorization into primes, which is unique
up to ordering and unit factors. This means that if

d = p1 ⋅ ⋅ ⋅ pm = q1 ⋅ ⋅ ⋅ qk ,

then m = k, and each pi is an associate of some qj .

There are several relationships in integral domains that are equivalent to unique
factorization.

Definition 3.3.2. Let R be an integral domain.


(1) R has property (A) if and only if for each nonunit a ≠ 0 there are irreducible ele-
ments q1 , . . . , qr ∈ R, satisfying a = q1 ⋅ ⋅ ⋅ qr .
(2) R has property (A′ ) if and only if for each nonunit a ≠ 0 there are prime elements
p1 , . . . , pr ∈ R, satisfying a = p1 ⋅ ⋅ ⋅ pr .
(3) R has property (B) if and only if whenever q1 , . . . , qr and q1′ , . . . , qs′ are irreducible
elements of R with q1 ⋅ ⋅ ⋅ qr = q1′ ⋅ ⋅ ⋅ qs′ , then r = s, and there is a permutation π ∈ Sr
such that for each i ∈ {1, . . . , r} the elements qi and q′π(i) are associates (uniqueness
up to ordering and unit factors).
(4) R has property (C) if and only if each irreducible element of R is a prime element.

Notice that properties (A) and (C) together are equivalent to what we defined as
unique factorization. Hence, an integral domain satisfying (A) and (C) is a UFD. Next, we
show that there are other equivalent formulations.

Theorem 3.3.3. In an integral domain R, the following are equivalent:



(1) R is a UFD.
(2) R satisfies properties (A) and (B).
(3) R satisfies properties (A) and (C).
(4) R satisfies property (A′ ).

Proof. As remarked before the statement of the theorem, properties (A) and (C) together are,
by definition, equivalent to unique factorization; that is, (1) and (3) are equivalent. We show here that (2), (3), and (4) are equivalent.
First, we show that (2) implies (3).
Suppose that R satisfies properties (A) and (B). We must show that it also satisfies (C);
that is, we must show that if q ∈ R is irreducible, then q is prime. Suppose that q ∈ R
is irreducible and q|ab with a, b ∈ R. Then we have ab = cq for some c ∈ R. If a is a
unit, then from ab = cq we get that b = a⁻¹cq, and q|b. The argument is identical if b is a unit.
Therefore, we may assume that neither a nor b is a unit.
If c = 0, then since R is an integral domain, either a = 0, or b = 0, and q|a, or q|b.
We may assume then that c ≠ 0.
If c is a unit, then q = c⁻¹ab, and since q is irreducible, either c⁻¹a or b is a unit. If
c⁻¹a is a unit, then a is also a unit. Therefore, if c is a unit, either a or b is a unit, contrary
to our assumption.
Therefore, we may assume that c ≠ 0, and c is not a unit. From (A) we have

a = q1 ⋅ ⋅ ⋅ qr
b = q1′ ⋅ ⋅ ⋅ qs′
c = q1′′ ⋅ ⋅ ⋅ qt′′ ,

where q1 , . . . qr , q1′ , . . . , qs′ , q1′′ , . . . qt′′ are all irreducibles. Hence,

q1 ⋅ ⋅ ⋅ qr q1′ ⋅ ⋅ ⋅ qs′ = q1′′ ⋅ ⋅ ⋅ qt′′ ⋅ q.

From (B), q is an associate of some qi or qj′ . Hence, q|qi or q|qj′ . It follows that q|a, or
q|b and, therefore, q is a prime element.
That (3) implies (4) is direct.
We show that (4) implies (2). Suppose that R satisfies (A′ ). We must show that it satis-
fies both (A) and (B). Property (A) follows at once from (A′ ), since prime elements are
irreducible. To prepare for (B), we show that irreducible elements are prime. Suppose
that q is irreducible. Then from (A′ ), we have

q = p1 ⋅ ⋅ ⋅ pr

with each pi prime. Since q is irreducible and p1 is a nonunit, the product p2 ⋅ ⋅ ⋅ pr must be a
unit and, hence, pi |1 for i = 2, . . . , r. Because prime elements are nonunits, this forces r = 1.
Thus, q = p1 , and q is prime.
We now show that (B) holds. Let

q1 ⋅ ⋅ ⋅ qr = q1′ ⋅ ⋅ ⋅ qs′ ,

where qi , qj′ are all irreducibles; hence primes. Then

q1′ |q1 ⋅ ⋅ ⋅ qr ,

and so, q1′ |qi for some i. Without loss of generality, suppose q1′ |q1 . Then q1 = aq1′ . Since q1
is irreducible, it follows that a is a unit, and q1 and q1′ are associates. It follows then that

aq2 ⋅ ⋅ ⋅ qr = q2′ ⋅ ⋅ ⋅ qs′

since R has no zero divisors. Property (B) holds then by induction, and the theorem is
proved.

Note that in our new terminology, ℤ is a UFD. In the next section, we will present
other examples of UFDs. However, not every integral domain is a unique factorization
domain.
As in the last section, let R be the following subring of ℂ:

R = ℤ[i√5] = {x + iy√5 : x, y ∈ ℤ}.

R is an integral domain, and we showed, using the norm, that 3 is an irreducible in R.
Analogously, we can show that the elements 2 + i√5, 2 − i√5 are also irreducibles in R,
and furthermore, 3 is not an associate of either 2 + i√5 or 2 − i√5. Then

9 = 3 ⋅ 3 = (2 + i√5)(2 − i√5)

gives two different decompositions of an element in terms of irreducible elements. The
fact that R is not a UFD also follows from the fact that 3 is an irreducible element, which
is not prime.
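
The two factorizations of 9 can be checked by a short computation. The following Python sketch is an illustration only and is not part of the original text; it assumes that an element x + iy√5 is stored as the integer pair (x, y), and the helper names `mul`, `norm`, and `has_norm` are hypothetical. Since the norm is multiplicative and the elements of norm 1 are units, a proper factor of an element of norm 9 would have norm 3, so it suffices to see that no element has norm 3.

```python
# Elements x + i*y*sqrt(5) of Z[i*sqrt(5)] are stored as integer pairs (x, y).

def mul(a, b):
    """Product of two elements, using (i*sqrt(5))^2 = -5."""
    (x1, y1), (x2, y2) = a, b
    return (x1 * x2 - 5 * y1 * y2, x1 * y2 + x2 * y1)

def norm(a):
    """Norm N(x + i*y*sqrt(5)) = x^2 + 5*y^2."""
    x, y = a
    return x * x + 5 * y * y

def has_norm(n):
    """Return True if some element of the ring has norm exactly n."""
    bound = int(n ** 0.5) + 1
    return any(x * x + 5 * y * y == n
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1))

three = (3, 0)
u, v = (2, 1), (2, -1)                 # 2 + i*sqrt(5) and 2 - i*sqrt(5)

print(mul(three, three))               # (9, 0)
print(mul(u, v))                       # (9, 0)
print(norm(three), norm(u), norm(v))   # 9 9 9
print(has_norm(3))                     # False
```

Because the only elements of norm 1 are ±1, the same computation shows that 3 is not an associate of 2 ± i√5.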
Unique factorization is tied to the famous solution of Fermat’s last theorem. In 1995,
Wiles and Taylor proved the following:

Theorem 3.3.4. The equation $x^p + y^p = z^p$ has no integral solutions with xyz ≠ 0 for any
prime p ≥ 3.

Kummer tried to prove this theorem by attempting to factor $x^p = z^p - y^p$. We call
the statement of Theorem 3.3.4 in an integral domain R property $(F_p)$. Let $\epsilon = e^{2\pi i/p}$. Then

$$z^p - y^p = \prod_{j=0}^{p-1} (z - \epsilon^j y).$$

View this equation in the ring:

$$R = \mathbb{Z}[\epsilon] = \Big\{ \sum_{j=0}^{p-1} a_j \epsilon^j : a_j \in \mathbb{Z} \Big\}.$$

Kummer proved that if R is a UFD, then property $(F_p)$ holds. However, Uchida and
Montgomery showed independently (1971) that R is a UFD only if p ≤ 19 (see [59]).

3.4 Principal Ideal Domains and Unique Factorization


In this section, we prove that every principal ideal domain (PID) is a unique factorization
domain (UFD). We say that an ascending chain of ideals in R

I1 ⊂ I2 ⊂ ⋅ ⋅ ⋅ ⊂ In ⊂ ⋅ ⋅ ⋅

becomes stationary if there exists an m such that Ir = Im for all r ≥ m.

Theorem 3.4.1. Let R be an integral domain. If each ascending chain of principal ideals
in R becomes stationary, then R satisfies property (A).

Proof. Suppose that a ≠ 0 is not a unit in R. Suppose that a is not a product of ir-
reducible elements. Clearly then, a cannot itself be irreducible. Hence, a = a1 b1 with
a1 , b1 ∈ R, and a1 , b1 are not units. If both a1 and b1 can be expressed as a product of irre-
ducible elements, then so can a. Without loss of generality then, suppose that a1 is not a
product of irreducible elements.
Since a1 |a, we have the inclusion of ideals aR ⊆ a1 R. If a1 R = aR, then a1 ∈ aR, and
a1 = ar = a1 b1 r, which implies that b1 is a unit contrary to our assumption. Therefore,
aR ≠ a1 R, and the inclusion is proper. By iteration then, we obtain a strictly increasing
chain of ideals

aR ⊂ a1 R ⊂ ⋅ ⋅ ⋅ ⊂ an R ⊂ ⋅ ⋅ ⋅ .

From our hypothesis on R, this must become stationary, contradicting the argument
above that the inclusion is proper. Therefore, a must be a product of irreducibles.
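
In ℤ, the splitting step of this proof can be carried out explicitly. The following Python sketch is an illustration under the assumption R = ℤ, with the hypothetical helper name `factor`; it repeatedly writes a = d ⋅ (a/d) with both factors nonunits and recurses, which terminates because each split passes to a strictly larger principal ideal.

```python
from math import isqrt

def factor(a):
    """Split a nonzero nonunit a in Z into irreducibles whose product is a."""
    if a in (0, 1, -1):
        raise ValueError("a must be a nonzero nonunit")
    for d in range(2, isqrt(abs(a)) + 1):
        if a % d == 0:
            # a = d * (a // d) with neither factor a unit, so aZ is properly
            # contained in dZ and in (a // d)Z; recurse on both factors.
            return factor(d) + factor(a // d)
    return [a]          # no proper divisor: a is irreducible

print(factor(360))      # [2, 2, 2, 3, 3, 5]
print(factor(-12))      # [2, 2, -3]
```

The sign of a negative input ends up on the last factor, which reflects that factorizations in ℤ are only determined up to the units ±1.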

Theorem 3.4.2. Each principal ideal domain R is a unique factorization domain.

Proof. Suppose that R is a principal ideal domain. R satisfies property (C) by Theo-
rem 3.2.7(1). Therefore, to show that it is a unique factorization domain, we must show
that it also satisfies property (A). From the previous theorem, it suffices to show that
each ascending chain of principal ideals becomes stationary. Consider such an ascend-
ing chain

a1 R ⊂ a2 R ⊂ ⋅ ⋅ ⋅ ⊂ an R ⊂ ⋅ ⋅ ⋅ .

Now let

$$I = \bigcup_{i=1}^{\infty} a_i R.$$

Now I is an ideal in R; hence a principal ideal. Therefore, I = aR for some a ∈ R. Since
I is a union, there exists an m such that a ∈ am R. Therefore, I = aR ⊂ am R and, hence,
I = am R, and ai R = am R for all i ≥ m. Therefore, the chain becomes stationary and, from
Theorem 3.4.1, R satisfies property (A).
Since we showed that the integers ℤ are a PID, we can recover the fundamental
theorem of arithmetic from Theorem 3.4.2. We now present another important example
of a PID; hence a UFD. In the next chapter, we will look in detail at polynomials with
coefficients in an integral domain. Below, we consider polynomials with coefficients in
a field, and for the present leave out many of the details.
If K is a field and n is a nonnegative integer, then a polynomial of degree n over K is
a formal sum of the form

$$P(x) = a_0 + a_1 x + \cdots + a_n x^n$$

with ai ∈ K for i = 0, . . . , n, an ≠ 0, and x an indeterminate. A polynomial P(x) over K
is either a polynomial of some degree or the expression P(x) = 0, which is called the
zero polynomial, and has degree −∞. We denote the degree of P(x) by deg P(x). A poly-
nomial of zero degree has the form P(x) = a0 and is called a constant polynomial, and
can be identified with the corresponding element of K. The elements ai ∈ K are called
the coefficients of P(x); an is the leading coefficient. If an = 1, P(x) is called a monic poly-
nomial. Two nonzero polynomials are equal if and only if they have the same degree
and exactly the same coefficients. A polynomial of degree 1 is called a linear polynomial,
whereas one of degree two is a quadratic polynomial.
We denote by K[x] the set of all polynomials over K, and we will show that K[x]
is a principal ideal domain; hence a unique factorization domain. We first define
addition, subtraction, and multiplication on K[x] by algebraic manipulation. That
is, suppose $P(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $Q(x) = b_0 + b_1 x + \cdots + b_m x^m$. Then

$$P(x) \pm Q(x) = (a_0 \pm b_0) + (a_1 \pm b_1)x + \cdots ;$$

that is, the coefficient of $x^i$ in P(x) ± Q(x) is $a_i \pm b_i$, where $a_i = 0$ for i > n, and $b_j = 0$ for
j > m. Multiplication is given by

$$P(x)Q(x) = a_0 b_0 + (a_1 b_0 + a_0 b_1)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots + a_n b_m x^{n+m};$$

that is, the coefficient of $x^i$ in P(x)Q(x) is $a_0 b_i + a_1 b_{i-1} + \cdots + a_i b_0$.

Example 3.4.3. Let $P(x) = 3x^2 + 4x - 6$ and $Q(x) = 2x + 7$ be in ℚ[x]. Then

$$P(x) + Q(x) = 3x^2 + 6x + 1$$

and

$$P(x)Q(x) = (3x^2 + 4x - 6)(2x + 7) = 6x^3 + 29x^2 + 16x - 42.$$
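
The coefficient formulas for sums and products translate directly into code. The following Python sketch is an illustration only; it assumes that a polynomial a0 + a1 x + ⋅ ⋅ ⋅ + an xⁿ over ℚ is stored as the coefficient list [a0, a1, ..., an], and the helper names `trim`, `poly_add`, and `poly_mul` are hypothetical. It reproduces the sum and product of Example 3.4.3.

```python
from fractions import Fraction
from itertools import zip_longest

# A polynomial a0 + a1*x + ... + an*x^n is stored as the list [a0, a1, ..., an].

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_add(p, q):
    """Coefficient of x^i in P + Q is a_i + b_i (missing coefficients are 0)."""
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=Fraction(0)))

def poly_mul(p, q):
    """Coefficient of x^i in P*Q is a_0*b_i + a_1*b_(i-1) + ... + a_i*b_0."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

P = [Fraction(c) for c in (-6, 4, 3)]   # 3x^2 + 4x - 6
Q = [Fraction(c) for c in (7, 2)]       # 2x + 7

print(poly_add(P, Q))   # coefficients of 3x^2 + 6x + 1, lowest degree first
print(poly_mul(P, Q))   # coefficients of 6x^3 + 29x^2 + 16x - 42, lowest degree first
```

Storing the constant term first makes the index of a coefficient equal to the exponent of x, so the double loop in `poly_mul` is exactly the sum over i + j = k.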



From the definitions, the following degree relationships are clear. The proofs are in
the exercises.

Lemma 3.4.4. Let 0 ≠ P(x), 0 ≠ Q(x) in K[x]. Then the following hold:
(1) deg P(x)Q(x) = deg P(x) + deg Q(x).
(2) deg(P(x) ± Q(x)) ≤ max(deg P(x), deg Q(x)) if P(x) ± Q(x) ≠ 0.

We next obtain the following:

Theorem 3.4.5. If K is a field, then K[x] forms an integral domain. K can be naturally
embedded into K[x] by identifying each element of K with the corresponding constant
polynomial. The only units in K[x] are the nonzero elements of K.

Proof. Verification of the basic ring properties is solely computational and is left to the
exercises. Since deg P(x)Q(x) = deg P(x) + deg Q(x), it follows that if P(x) ≠ 0
and Q(x) ≠ 0, then P(x)Q(x) ≠ 0 and, therefore, K[x] is an integral domain.
If G(x) is a unit in K[x], then there exists an H(x) ∈ K[x] with G(x)H(x) = 1. From
the degrees, we have deg G(x) + deg H(x) = 0. Since deg G(x) ≥ 0 and deg H(x) ≥ 0, this
is possible only if deg G(x) = deg H(x) = 0. Therefore, G(x) ∈ K. Conversely, each nonzero
element of K is a unit in K since K is a field and, hence, a unit in K[x].

Now that we have K[x] as an integral domain, we proceed to show that K[x] is a
principal ideal domain and, hence, there is unique factorization into primes.
We first repeat the definition of a prime in K[x]. If 0 ≠ f (x) has no nontrivial,
nonunit factors (it cannot be factorized into polynomials of lower degree), then f (x) is
a prime in K[x] or a prime polynomial. A prime polynomial is also called an irreducible
polynomial. Clearly, if deg g(x) = 1, then g(x) is irreducible.
The fact that K[x] is a principal ideal domain follows from the division algorithm
for polynomials, which is entirely analogous to the division algorithm for integers.

Lemma 3.4.6 (Division algorithm in K[x]). If 0 ≠ f (x), 0 ≠ g(x) ∈ K[x], then there exist
unique polynomials q(x), r(x) ∈ K[x] such that f (x) = q(x)g(x) + r(x), where r(x) = 0 or
deg r(x) < deg g(x).
(The polynomials q(x) and r(x) are called, respectively, the quotient and remainder.)

We give a formal proof in Chapter 4 on polynomials and polynomial rings. For now,
we content ourselves with two computations in ℚ[x] in the following example.

Example 3.4.7. (1) Let $f(x) = 3x^4 - 6x^2 + 8x - 6$, $g(x) = 2x^2 + 4$. Then

$$\frac{3x^4 - 6x^2 + 8x - 6}{2x^2 + 4} = \frac{3}{2}x^2 - 6 \quad \text{with remainder } 8x + 18.$$

Thus, here, $q(x) = \frac{3}{2}x^2 - 6$, $r(x) = 8x + 18$.



(2) Let $f(x) = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x$, $g(x) = x^2 + x$. Then

$$\frac{2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x}{x^2 + x} = 2x^3 + 6x + 4.$$

Thus, here, $q(x) = 2x^3 + 6x + 4$, and $r(x) = 0$.
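
Both computations can be reproduced by the usual long division. The following Python sketch is an illustration only; it assumes coefficient lists [a0, a1, ..., an] over ℚ, and the function name `poly_divmod` is hypothetical. In each step, the leading term of the current remainder is cancelled by a suitable multiple of g(x), so the degree of the remainder drops until it is smaller than deg g(x).

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g in Q[x]; polynomials are coefficient lists [a0, a1, ..., an].

    Returns (q, r) with f = q*g + r and r == [] or deg r < deg g.
    """
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in f]
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)          # exponent of the next quotient term
        coeff = r[-1] / g[-1]            # quotient of the leading coefficients
        q[shift] = coeff
        for i, b in enumerate(g):        # subtract coeff * x^shift * g(x)
            r[i + shift] -= coeff * b
        while r and r[-1] == 0:          # the leading term has been cancelled
            r.pop()
    return q, r

# Example 3.4.7 (1): f = 3x^4 - 6x^2 + 8x - 6, g = 2x^2 + 4
q1, r1 = poly_divmod([-6, 8, -6, 0, 3], [4, 0, 2])
print(q1, r1)    # q = (3/2)x^2 - 6, r = 8x + 18

# Example 3.4.7 (2): f = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x, g = x^2 + x
q2, r2 = poly_divmod([0, 4, 10, 6, 2, 2], [0, 1, 1])
print(q2, r2)    # q = 2x^3 + 6x + 4, r = 0 (empty list)
```

The division by the leading coefficient of g(x) is the only place where the field property of K is used.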

Theorem 3.4.8. Let K be a field. Then the polynomial ring K[x] is a principal ideal do-
main; hence a unique factorization domain.

Proof. The proof is essentially analogous to the proof in the integers. Let I be an ideal
in K[x]. The ideals {0} = ⟨0⟩ and K[x] = ⟨1⟩ are principal, so we may assume that
{0} ≠ I ≠ K[x]. Let f (x) be a nonzero polynomial in I of minimal degree. We claim that
I = ⟨f (x)⟩, the principal ideal generated by f (x). Let g(x) ∈ I. We must show that g(x) is
a multiple of f (x). By the division algorithm in K[x], we have

g(x) = q(x)f (x) + r(x),

where r(x) = 0, or deg(r(x)) < deg(f (x)). If r(x) ≠ 0, then deg(r(x)) < deg(f (x)). How-
ever, r(x) = g(x)−q(x)f (x) ∈ I since I is an ideal, and g(x), f (x) ∈ I. This is a contradiction
since f (x) was assumed to be a polynomial in I of minimal degree. Therefore, r(x) = 0
and, hence, g(x) = q(x)f (x) is a multiple of f (x). Therefore, each element of I is a multi-
ple of f (x) and, hence, I = ⟨f (x)⟩.
Therefore, K[x] is a principal ideal domain and, from Theorem 3.4.2, a unique fac-
torization domain.

We proved that in a principal ideal domain, every ascending chain of ideals becomes
stationary. In general, a ring R (commutative or not) satisfies the ascending chain con-
dition or ACC if every ascending chain of left (or right) ideals in R becomes stationary.
A ring satisfying the ACC is called a Noetherian ring.

3.5 Euclidean Domains


In analyzing the proof of unique factorization in both ℤ and K[x], it is clear that it de-
pends primarily on the division algorithm. In ℤ, the division algorithm depended on
the fact that the positive integers could be ordered, and in K[x], on the fact that the de-
grees of nonzero polynomials are nonnegative integers and, hence, could be ordered.
This basic idea can be generalized in the following way.

Definition 3.5.1. An integral domain D is a Euclidean domain if there exists a function
N from D⋆ = D \ {0} to the nonnegative integers such that:
(1) N(r1 ) ≤ N(r1 r2 ) for any r1 , r2 ∈ D⋆ .
(2) For all r1 , r2 ∈ D with r1 ≠ 0, there exist q, r ∈ D such that

r2 = qr1 + r,

where either r = 0, or N(r) < N(r1 ).

The function N is called a Euclidean norm on D.

Therefore, Euclidean domains are precisely those integral domains that allow a
division algorithm. In the integers ℤ, define N(z) = |z|. Then N is a Euclidean norm on
ℤ and, hence, ℤ is a Euclidean domain. On K[x], define N(p(x)) = deg(p(x)) if p(x) ≠ 0.
Then N is also a Euclidean norm on K[x] so that K[x] is also a Euclidean domain. In any
Euclidean domain, we can mimic the proofs of unique factorization in both ℤ and K[x]
to obtain the following:

Theorem 3.5.2. Every Euclidean domain is a principal ideal domain; hence a unique fac-
torization domain.

Before proving this theorem, we must develop some results on the number theory
of general Euclidean domains. First, some properties of the norm.

Lemma 3.5.3. If R is a Euclidean domain then the following hold:


(a) N(1) is minimal among {N(r) : r ∈ R⋆ }.
(b) N(u) = N(1) if and only if u is a unit.
(c) N(a) = N(b) for a, b ∈ R⋆ if a, b are associates.
(d) N(a) < N(ab) unless b is a unit.

Proof. (a) From property (1) of Euclidean norms, we have

N(1) ≤ N(1 ⋅ r) = N(r) for any r ∈ R⋆ .

(b) Suppose u is a unit. Then there exists u−1 with u ⋅ u−1 = 1. Then

N(u) ≤ N(u ⋅ u−1 ) = N(1).

From the minimality of N(1), it follows that N(u) = N(1).


Conversely, suppose N(u) = N(1). Apply the division algorithm to get

1 = qu + r.

If r ≠ 0, then N(r) < N(u) = N(1), contradicting the minimality of N(1). Therefore, r = 0,
and 1 = qu. Then u has a multiplicative inverse and, hence, is a unit.
(c) Suppose a, b ∈ R⋆ are associates. Then a = ub with u a unit. Then

N(b) ≤ N(ub) = N(a).

On the other hand, b = u−1 a. Therefore,

N(a) ≤ N(u−1 a) = N(b).



Since N(a) ≤ N(b), and N(b) ≤ N(a), it follows that N(a) = N(b).
(d) Suppose N(a) = N(ab). Apply the division algorithm

a = q(ab) + r,

where r = 0, or N(r) < N(ab). If r ≠ 0, then

r = a − qab = a(1 − qb) ⟹ N(ab) = N(a) ≤ N(a(1 − qb)) = N(r),

contradicting that N(r) < N(ab). Hence, r = 0, and a = q(ab) = (qb)a. Then

a = (qb)a = 1 ⋅ a ⟹ qb = 1

since there are no zero divisors in an integral domain. Hence, b is a unit.


Since N(a) ≤ N(ab), it follows that if b is not a unit, we must have N(a) < N(ab).

We can now prove Theorem 3.5.2.

Proof. Let D be a Euclidean domain. We show that each ideal I ≠ D in D is principal;
the ideal D itself is generated by 1.
Let I ≠ D be an ideal in D. If I = {0}, then I = ⟨0⟩, and I is principal. Therefore, we
may assume that there are nonzero elements in I. Hence, there are elements x ∈ I with
strictly positive norm. Let a be a nonzero element of I of minimal norm. We claim that I = ⟨a⟩.
Let b ∈ I. We must show that b is a multiple of a. Now by the division algorithm

b = qa + r,

where either r = 0, or N(r) < N(a). As in ℤ and K[x], we have a contradiction if r ≠ 0. In
this case, N(r) < N(a), but r = b − qa ∈ I since I is an ideal, contradicting the minimality
of N(a). Therefore, r = 0, and b = qa and, hence, I = ⟨a⟩.
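
For the Euclidean domain ℤ with N(z) = |z|, a generator of the ideal aℤ + bℤ can be computed by repeated division with remainder. The following Python sketch is an illustration only, with the hypothetical function name `extended_gcd`; it returns a generator g together with coefficients s, t satisfying g = sa + tb, which shows that g lies in the ideal, while g dividing both a and b shows that it generates the ideal.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = s*a + t*b and g >= 0 generating the ideal aZ + bZ."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r                      # division algorithm: old_r = q*r + remainder
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:                           # normalise by the unit -1
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

g, s, t = extended_gcd(84, 30)
print(g, s, t)                              # 6 -1 3
print(s * 84 + t * 30)                      # 6
```

Here 84ℤ + 30ℤ = 6ℤ, and 6 is a nonzero element of minimal norm in this ideal, as in the proof above.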

As a final example of a Euclidean domain, we consider the Gaussian integers

ℤ[i] = {a + bi : a, b ∈ ℤ}.

It was first observed by Gauss that this set permits unique factorization. To show this,
we need a Euclidean norm on ℤ[i].

Definition 3.5.4. If z = a + bi ∈ ℤ[i], then its norm N(z) is defined by

N(a + bi) = a2 + b2 .

The basic properties of this norm follow directly from the definition (see exercises).

Lemma 3.5.5. If α, β ∈ ℤ[i] then we have the following:


(1) N(α) is an integer for all α ∈ ℤ[i].
(2) N(α) ≥ 0 for all α ∈ ℤ[i].
(3) N(α) = 0 if and only if α = 0.


(4) N(α) ≥ 1 for all α ≠ 0.
(5) N(αβ) = N(α)N(β); that is, the norm is multiplicative.

From the multiplicativity of the norm, we have the following concerning primes
and units in ℤ[i].

Lemma 3.5.6. (1) u ∈ ℤ[i] is a unit if and only if N(u) = 1.


(2) If π ∈ ℤ[i] and N(π) = p, where p is an ordinary prime in ℤ, then π is a prime in ℤ[i].

Proof. Certainly u is a unit if and only if N(u) = N(1). But in ℤ[i], we have N(1) = 1.
Therefore, the first part follows.
Suppose next that π ∈ ℤ[i] with N(π) = p for some rational prime p. Suppose that π = π1 π2 .
From the multiplicativity of the norm, we have

N(π) = p = N(π1 )N(π2 ).

Since each norm is a positive ordinary integer, and p is a prime, it follows that either
N(π1 ) = 1, or N(π2 ) = 1. Hence, either π1 or π2 is a unit. Therefore, π is a prime in ℤ[i].

Armed with this norm, we can show that ℤ[i] is a Euclidean domain.

Theorem 3.5.7. The Gaussian integers ℤ[i] form a Euclidean domain.

Proof. That ℤ[i] forms a commutative ring with an identity can be verified directly and
easily. If αβ = 0, then N(α)N(β) = 0, and since there are no zero divisors in ℤ, we must
have N(α) = 0, or N(β) = 0. But then either α = 0, or β = 0 and, hence, ℤ[i] is an integral
domain. To complete the proof, we show that the norm N is a Euclidean norm.
From the multiplicativity of the norm, we have, if α, β ≠ 0

N(αβ) = N(α)N(β) ≥ N(α) since N(β) ≥ 1.

Therefore, property (1) of Euclidean norms is satisfied. We must now show that the di-
vision algorithm holds.
Let α = a + bi and β = c + di be Gaussian integers with β ≠ 0. Recall that the inverse for a
nonzero complex number z = x + iy is

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} = \frac{x - iy}{x^2 + y^2}.$$

Therefore, as a complex number

$$\frac{\alpha}{\beta} = \alpha \frac{\bar{\beta}}{|\beta|^2} = (a + bi)\frac{c - di}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\, i = u + iv.$$

Now since a, b, c, d are integers, u and v must be rationals. The set

{u + iv : u, v ∈ ℚ}

is called the set of the Gaussian rationals.


If u, v ∈ ℤ, then u + iv ∈ ℤ[i], α = qβ with q = u + iv, and we are done. Otherwise,
choose ordinary integers m, n satisfying |u − m| ≤ 1/2 and |v − n| ≤ 1/2, and let q = m + in.
Then q ∈ ℤ[i]. Let r = α − qβ. We must show that N(r) < N(β).
Working with complex absolute value, we get

$$|r| = |\alpha - q\beta| = |\beta| \left| \frac{\alpha}{\beta} - q \right|.$$

Now

$$\left| \frac{\alpha}{\beta} - q \right| = |(u - m) + i(v - n)| = \sqrt{(u - m)^2 + (v - n)^2} \le \sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} < 1.$$

Therefore,

|r| < |β| ⟹ |r|² < |β|² ⟹ N(r) < N(β),

completing the proof.
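
The rounding argument of this proof is constructive. The following Python sketch is an illustration only; it assumes that a Gaussian integer a + bi is stored as the integer pair (a, b), and the names `nearest_int`, `gauss_divmod`, and `norm` are hypothetical. It computes u and v as exact fractions with common denominator N(β), rounds each to a nearest integer, and returns the quotient and remainder.

```python
def nearest_int(p, q):
    """An integer nearest to the rational p/q (q > 0); the error is at most 1/2."""
    return (2 * p + q) // (2 * q)

def norm(z):
    """Norm N(a + bi) = a^2 + b^2."""
    return z[0] ** 2 + z[1] ** 2

def gauss_divmod(alpha, beta):
    """Division with remainder in Z[i]; a + bi is stored as the pair (a, b).

    Returns (q, r) with alpha = q*beta + r and N(r) < N(beta).
    """
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d                      # N(beta), must be nonzero
    if n == 0:
        raise ZeroDivisionError("beta must be nonzero")
    # alpha / beta = u + iv with u = (ac + bd)/n and v = (bc - ad)/n
    m = nearest_int(a * c + b * d, n)
    k = nearest_int(b * c - a * d, n)
    # r = alpha - q*beta with q = m + ki
    r = (a - (m * c - k * d), b - (m * d + k * c))
    return (m, k), r

q, r = gauss_divmod((27, 23), (8, 1))
print(q, r)                                # (4, 2) (-3, 3)
print(norm(r), norm((8, 1)))               # 18 65
```

Since both rounding errors are at most 1/2, the remainder satisfies N(r) ≤ N(β)/2, which is slightly stronger than the bound N(r) < N(β) required of a Euclidean norm.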

Since ℤ[i] forms a Euclidean domain, it follows from our previous results that ℤ[i]
must be a principal ideal domain; hence a unique factorization domain.

Corollary 3.5.8. The Gaussian integers are a UFD.

Since we will now be dealing with many kinds of integers, we will refer to the ordi-
nary integers ℤ as the rational integers and the ordinary primes p as the rational primes.
It is clear that ℤ can be embedded into ℤ[i]. However, not every rational prime is also
prime in ℤ[i]. The primes in ℤ[i] are called the Gaussian primes. For example, we can
show that both 1 + i and 1 − i are Gaussian primes; that is, primes in ℤ[i]. However,
(1 + i)(1 − i) = 2. Therefore, the rational prime 2 is not a prime in ℤ[i]. Using the multi-
plicativity of the Euclidean norm in ℤ[i], we can describe all the units and primes in ℤ[i].

Theorem 3.5.9. (1) The only units in ℤ[i] are ±1, ±i.
(2) Suppose π is a Gaussian prime. Then π is one of the following:
(a) a positive rational prime p ≡ 3 (mod 4), or an associate of such a rational prime.
(b) 1 + i, or an associate of 1 + i.
(c) a + bi, or a − bi, where a > 0, b > 0, a is even, and N(π) = a2 + b2 = p with p a
rational prime congruent to 1 modulo 4, or an associate of a + bi, or a − bi.

Proof. (1) Suppose u = x + iy ∈ ℤ[i] is a unit. Then, from Lemma 3.5.6, N(u) = x 2 + y2 = 1,
implying that (x, y) = (0, ±1) or (x, y) = (±1, 0). Hence, u = ±1 or u = ±i.

(2) Now suppose that π is a Gaussian prime. Since $N(\pi) = \pi\bar{\pi}$, and $\bar{\pi}$ ∈ ℤ[i], it
follows that π|N(π). N(π) is a rational integer, so N(π) = p1 ⋅ ⋅ ⋅ pk , where the pi ’s are
rational primes. By Euclid’s lemma π|pi for some pi and, hence, a Gaussian prime must
divide at least one rational prime. On the other hand, suppose π|p and π|q, where p, q
are different primes. Then (p, q) = 1 and, hence, there exist x, y ∈ ℤ such that 1 = px + qy.
It follows that π|1, which is a contradiction. Therefore, a Gaussian prime divides one and only
one rational prime.
Let p be the rational prime that π divides. Then N(π)|N(p) = p². Since N(π) is a
rational integer, it follows that N(π) = p, or N(π) = p². If π = a + bi, then a² + b² = p, or
a² + b² = p².
If p = 2, then a² + b² = 2, or a² + b² = 4. It follows that π = ±2, ±2i, or π = 1 + i, or an
associate of 1 + i. Since (1 + i)(1 − i) = 2, and neither 1 + i, nor 1 − i are units, it follows that
neither 2, nor any of its associates are primes. Then π = 1 + i, or an associate of 1 + i. To
see that 1 + i is prime, suppose 1 + i = αβ. Then N(1 + i) = 2 = N(α)N(β). It follows that
either N(α) = 1, or N(β) = 1, and either α or β is a unit.
If p ≠ 2, then either p ≡ 3 (mod 4), or p ≡ 1 (mod 4). First suppose p ≡ 3 (mod 4).
Then a² + b² = p would imply (Fermat’s two-square theorem, see [53]) p ≡ 1 (mod 4).
Therefore, from the remarks above a² + b² = p², and N(π) = N(p). Since π|p, we have
p = απ with α ∈ ℤ[i]. From N(π) = N(p), we get that N(α) = 1, and α is a unit. Therefore,
π and p are associates. Hence, in this case, π is an associate of a rational prime congruent
to 3 modulo 4.
Finally, suppose p ≡ 1 (mod 4). From the remarks above, either N(π) = p, or
N(π) = p². If N(π) = p², then a² + b² = p². Since p ≡ 1 (mod 4), from Fermat’s two-square
theorem, there exist m, n ∈ ℤ with m² + n² = p. Let u = m + in, then the norm N(u) = p.
Since p is a rational prime, it follows that u is a Gaussian prime. Similarly, its conjugate
$\bar{u}$ is also a Gaussian prime. Now $u\bar{u} = p$, and since π|p, it follows that $\pi | u\bar{u}$, and
from Euclid’s lemma, either π|u, or $\pi | \bar{u}$. If π|u, they are associates since both are primes.
But this is a contradiction since N(π) ≠ N(u). The same is true if $\pi | \bar{u}$.
It follows that if p ≡ 1 (mod 4), then N(π) ≠ p². Therefore, N(π) = p = a² + b². An
associate of π has both a, b > 0 (see exercises). Furthermore, since a² + b² = p, one of
a or b must be even. If a is odd, then b is even; then iπ is an associate of π with a even,
completing the proof.
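
The classification can be turned into a simple test. The following Python sketch is an illustration only, with the hypothetical names `is_rational_prime`, `is_gaussian_prime`, and `two_squares`. It assumes, in addition to the theorem, the standard converse facts that a rational prime p ≡ 3 (mod 4) remains prime in ℤ[i] and, by Lemma 3.5.6, that every element whose norm is a rational prime is a Gaussian prime.

```python
from math import isqrt

def is_rational_prime(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    """Decide whether a + bi is a Gaussian prime, following Theorem 3.5.9."""
    if a == 0 or b == 0:
        # a + bi is an associate of the rational integer |a| + |b|:
        # prime exactly when this is a rational prime p with p = 3 (mod 4).
        p = abs(a) + abs(b)
        return is_rational_prime(p) and p % 4 == 3
    # Otherwise a + bi is prime exactly when its norm is a rational prime
    # (norm 2: associates of 1 + i; norm p = 1 (mod 4): case (c)).
    return is_rational_prime(a * a + b * b)

def two_squares(p):
    """Return (a, b) with a > 0 even, b > 0 and a^2 + b^2 = p, or None."""
    for a in range(2, isqrt(p) + 1, 2):
        b = isqrt(p - a * a)
        if b > 0 and a * a + b * b == p:
            return a, b
    return None

print(is_gaussian_prime(1, 1), is_gaussian_prime(2, 0), is_gaussian_prime(3, 0))  # True False True
print(two_squares(13), two_squares(7))                                            # (2, 3) None
```

For example, 13 ≡ 1 (mod 4) splits as 13 = (2 + 3i)(2 − 3i), whereas 7 ≡ 3 (mod 4) is not a sum of two squares and stays prime.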

Finally, we mention that the methods used in ℤ[i] cannot be applied to all quadratic
integers. For example, we have seen that there is not unique factorization in ℤ[√−5].

3.6 Overview of Integral Domains


Here we present some additional definitions for special types of integral domains.

Definition 3.6.1. (1) A Dedekind domain D is an integral domain such that each nonzero
proper ideal A ({0} ≠ A ≠ D) can be written as a product of prime ideals

A = P1 ⋅ ⋅ ⋅ Pr

with each Pi being a prime ideal and the factorization being unique up to ordering.
(2) A Prüfer ring R is an integral domain such that

A ⋅ (B ∩ C) = AB ∩ AC

for all ideals A, B, C in R.

Dedekind domains arise naturally in algebraic number theory. It can be proved that
the rings of algebraic integers in any algebraic number field are Dedekind domains
(see [53]). If R is a Dedekind domain, it is also a Prüfer ring. If R is a Prüfer ring and
a unique factorization domain, then R is a principal ideal domain. In the next chapter,
we will prove a theorem of Gauss, which states that if R is a UFD, then the polynomial
ring R[x] is also a UFD. If K is a field, we have already seen that K[x] is a UFD. Hence,
by induction on n, the polynomial ring in several variables K[x1 , . . . , xn ] is also a UFD. This fact plays an
important role in algebraic geometry.

3.7 Exercises
1. Let R be an integral domain, and let π ∈ R \ (U(R) ∪ {0}). Show the following:
(i) If for each a ∈ R with π ∤ a, there exist λ, μ ∈ R with λπ + μa = 1, then π is a
prime element of R.
(ii) Give an example for a prime element π in a UFD R, which does not satisfy the
conditions of (i).
2. Let R be a UFD, and let a1 , . . . , at be pairwise coprime elements of R. If a1 ⋅ ⋅ ⋅ at is an
m-th power (m ∈ ℕ), then all factors ai are associates of an m-th power. Is each ai
necessarily an m-th power?
3. Decide if the unit group of ℤ[√k] = {a + b√k : a, b ∈ ℤ}, k = 3, 5, 7, is finite or infinite.
For which a ∈ ℤ are (1 − √5) and (a + √5) associates in ℤ[√5]?
4. Let k ∈ ℤ and k ≠ x 2 for all x ∈ ℤ. Let α = a + b√k and β = c + d √k be elements of
ℤ[√k], and N(α) = a2 − kb2 , N(β) = c2 − kd 2 . Show the following:
(i) The equality of the absolute values of N(α) and N(β) is necessary for the asso-
ciation of α and β in ℤ[√k]. Is this constraint also sufficient?
(ii) Sufficient for the irreducibility of α in ℤ[√k] is the irreducibility of N(α) in ℤ.
Is this also necessary?
5. In general, irreducible elements are not prime. Consider the set of complex numbers
given by

R = ℤ[i√5] = {x + iy√5 : x, y ∈ ℤ}.

Show that they form a subring of ℂ.



6. For an element x + iy√5 ∈ R define its norm by

$$N(x + iy\sqrt{5}) = |x + iy\sqrt{5}|^2 = x^2 + 5y^2.$$

Prove that the norm is multiplicative, that is N(ab) = N(a)N(b).


7. Prove Lemma 3.4.4.
8. Prove that the set of polynomials R[x] with coefficients in a ring R forms a ring.
9. Prove the basic properties of the norm of the Gaussian integers. If α, β ∈ ℤ[i], then:
(i) N(α) is an integer for all α ∈ ℤ[i].
(ii) N(α) ≥ 0 for all α ∈ ℤ[i].
(iii) N(α) = 0 if and only if α = 0.
(iv) N(α) ≥ 1 for all α ≠ 0.
(v) N(αβ) = N(α)N(β), that is the norm is multiplicative.

4 Polynomials and Polynomial Rings

4.1 Degrees, Reducibility and Roots

In the last chapter, we saw that if K is a field, then the set of polynomials with coefficients
in K, which we denoted K[x], forms a unique factorization domain. In this chapter, we
take a more detailed look at polynomials over a general ring R. We then prove that if
R is a UFD, then the polynomial ring R[x] is also a UFD. We first take a formal look at
polynomials.
Let R be a commutative ring with an identity. Consider the set R̃ of functions f from
the nonnegative integers N = ℕ∪{0} into R with only a finite number of values nonzero.
That is,

R̃ = { f : N → R : f (n) ≠ 0 for only finitely many n}.

On R̃, we define the following addition and multiplication:

$$(f + g)(n) = f(n) + g(n)$$

$$(f \cdot g)(n) = \sum_{i+j=n} f(i)g(j).$$

If we let x = (0, 1, 0, . . .) and identify (r, 0, . . .) with r ∈ R, then

$$x^0 = (1, 0, \ldots) = 1, \quad \text{and} \quad x^{i+1} = x \cdot x^i.$$

Now if f = (r0 , r1 , r2 , . . .), then f can be written as

$$f = \sum_{i=0}^{\infty} r_i x^i = \sum_{i=0}^{m} r_i x^i$$

for some m ≥ 0 since ri ≠ 0 for only finitely many i. Furthermore, this presentation
is unique. We now call x an indeterminate over R, and write each element of R̃ as
$f(x) = \sum_{i=0}^{m} r_i x^i$ with f (x) = 0 or rm ≠ 0. We also now write R[x] for R̃. Each element of
R[x] is called a polynomial over R. The elements r0 , . . . , rm are called the coefficients of
f (x) with rm the leading coefficient. If rm ≠ 0, the non-negative integer m is called the
degree of f (x), which we denote by deg f (x). We say that f (x) = 0 has degree −∞. The
uniqueness of the representation of a polynomial implies that two nonzero polynomi-
als are equal if and only if they have the same degree and exactly the same coefficients.
A polynomial of degree 1 is called a linear polynomial, whereas one of degree two is a
quadratic polynomial. The set of polynomials of degree 0, together with 0, form a ring
isomorphic to R and, hence, can be identified with R, the constant polynomials. Thus, the
ring R embeds in the set of polynomials R[x]. The following results are straightforward
concerning degree:

https://doi.org/10.1515/9783111142524-004

Lemma 4.1.1. Let f (x) ≠ 0, g(x) ≠ 0 ∈ R[x]. Then the following hold:
(a) deg f (x)g(x) ≤ deg f (x) + deg g(x).
(b) deg(f (x) ± g(x)) ≤ max(deg f (x), deg g(x)).

If R is an integral domain, then we have equality in (a).
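
The inequality in (a) can be strict when R has zero divisors. The following Python sketch is an illustration only; it assumes R = ℤ/nℤ with residues stored as integers 0, ..., n − 1 and polynomials stored as coefficient lists [r0, r1, ...], and the names `poly_mul_mod` and `degree` are hypothetical. It multiplies 2x + 1 and 3x + 1 over ℤ/6ℤ, where the leading term 6x² vanishes, and over the field ℤ/7ℤ, where the degrees add.

```python
def poly_mul_mod(p, q, n):
    """Multiply coefficient lists p, q over Z/nZ via (f*g)(k) = sum over i+j=k of f(i)g(j)."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % n
    while out and out[-1] == 0:      # drop vanishing leading coefficients
        out.pop()
    return out

def degree(p):
    """Degree of a trimmed coefficient list; the zero polynomial gets -infinity."""
    return len(p) - 1 if p else float("-inf")

f = [1, 2]      # 2x + 1
g = [1, 3]      # 3x + 1
print(poly_mul_mod(f, g, 6))   # [1, 5]: the x^2 term 6x^2 vanishes in Z/6Z
print(poly_mul_mod(f, g, 7))   # [1, 5, 6]: Z/7Z is a field, so degrees add
```

Over ℤ/6ℤ, the product (2x)(3x) is even the zero polynomial, so R[x] is not an integral domain in this case.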

Theorem 4.1.2. Let R be a commutative ring with an identity. Then the set of polynomials
R[x] forms a ring called the ring of polynomials over R. The ring R, identified with the
constant polynomials (0 together with the polynomials of degree 0), naturally embeds into R[x]. R[x] is commutative. Furthermore,
R[x] is uniquely determined by R and x.

Proof. Set $f(x) = \sum_{i=0}^{n} r_i x^i$ and $g(x) = \sum_{j=0}^{m} s_j x^j$. The ring properties follow directly by
computation. The identification of r ∈ R with the polynomial r(x) = r provides the
embedding of R into R[x]. From the definition of multiplication in R[x], if R is commutative,
then R[x] is commutative. Note that if R has a multiplicative identity 1 ≠ 0, then this is
also the multiplicative identity of R[x].
Finally, if S is a ring that contains R and α ∈ S, then

$$R[\alpha] = \Big\{ \sum_{i \ge 0} r_i \alpha^i : r_i \in R, \text{ and } r_i \ne 0 \text{ for only a finite number of } i \Big\}$$

is a homomorphic image of R[x] via the map

$$\sum_{i \ge 0} r_i x^i \mapsto \sum_{i \ge 0} r_i \alpha^i.$$

Hence, R[x] is uniquely determined by R and x. We remark that R[α] must be commuta-
tive.

If R is an integral domain, then irreducible polynomials are defined as irreducibles
in the ring R[x]. If R is a field, then f (x) is an irreducible polynomial if f (x) is nonconstant
and there is no factorization f (x) = g(x)h(x), where g(x) and h(x) are polynomials of lower degree than
f (x). Otherwise, f (x) is called reducible. In elementary mathematics, polynomials are
considered as functions. We recover that idea via the concept of evaluation.

Definition 4.1.3. Let $f(x) = r_0 + r_1 x + \cdots + r_n x^n$ be a polynomial over a commutative
ring R with an identity, and let c ∈ R. Then the element

$$f(c) = r_0 + r_1 c + \cdots + r_n c^n \in R$$

is called the evaluation of f (x) at c.

Definition 4.1.4. If f (x) ∈ R[x] and f (c) = 0 for c ∈ R, then c is called a zero or a root of
f (x) in R.
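
Evaluation and the search for zeros are easy to carry out for R = ℤ. The following Python sketch is an illustration only; it assumes coefficient lists [r0, r1, ..., rn] and uses Horner's rule, which is one common way to compute f (c) and is not prescribed by the text. The function name `evaluate` is hypothetical.

```python
def evaluate(coeffs, c):
    """Evaluate r0 + r1*c + ... + rn*c^n by Horner's rule; coeffs = [r0, ..., rn]."""
    value = 0
    for r in reversed(coeffs):
        value = value * c + r
    return value

f = [-6, 4, 3]                       # 3x^2 + 4x - 6
print(evaluate(f, 2))                # 14

g = [6, -5, 1]                       # x^2 - 5x + 6 = (x - 2)(x - 3)
roots = [c for c in range(-5, 6) if evaluate(g, c) == 0]
print(roots)                         # [2, 3]
```

Horner's rule rewrites f (c) as (⋅ ⋅ ⋅ ((rn c + rn−1 )c + rn−2 )c + ⋅ ⋅ ⋅ )c + r0 , so only n multiplications in R are needed.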

4.2 Polynomial Rings over Fields


We now restate some of the results of the last chapter for K[x], where K is a field. We
then consider some consequences of these results for zeros of polynomials.

Theorem 4.2.1. If K is a field, then K[x] forms an integral domain. K can be naturally
embedded into K[x] by identifying each element of K with the corresponding constant
polynomial. The only units in K[x] are the nonzero elements of K.

Proof. Verification of the basic ring properties is solely computational and is left to the
exercises. Since deg P(x)Q(x) = deg P(x) + deg Q(x), it follows that if P(x) ≠ 0
and Q(x) ≠ 0, then P(x)Q(x) ≠ 0. Therefore, K[x] is an integral domain.
If G(x) is a unit in K[x], then there exists an H(x) ∈ K[x] with G(x)H(x) = 1.
From the degrees, we have deg G(x) + deg H(x) = 0. Since deg G(x) ≥ 0 and
deg H(x) ≥ 0, this is possible only if deg G(x) = deg H(x) = 0. Therefore, G(x) ∈ K.
Conversely, each nonzero element of K is a unit in K since K is a field and, hence, a unit in K[x].

Now that we have K[x] as an integral domain, we proceed to show that K[x] is a
principal ideal domain and, hence, there is unique factorization into primes. We first
repeat the definition of a prime in K[x]. If 0 ≠ f (x) has no nontrivial, nonunit factors (it
cannot be factorized into polynomials of lower degree), then f (x) is a prime in K[x] or
a prime polynomial. A prime polynomial is also called an irreducible polynomial over K.
Clearly, if deg g(x) = 1, then g(x) is irreducible.
The fact that K[x] is a principal ideal domain follows from the division algorithm
for polynomials, which is entirely analogous to the division algorithm for integers.

Theorem 4.2.2 (Division algorithm in K[x]). If 0 ≠ f (x), 0 ≠ g(x) ∈ K[x], then there exist
unique polynomials q(x), r(x) ∈ K[x] such that f (x) = q(x)g(x) + r(x), where r(x) = 0, or
deg r(x) < deg g(x). (The polynomials q(x) and r(x) are called respectively the quotient
and remainder.)

Proof. If deg f(x) = 0 and deg g(x) ≥ 1, then we just choose q(x) = 0, and r(x) = f(x). If deg f(x) = 0 = deg g(x), then f(x) = f ∈ K, and g(x) = g ∈ K, and we choose q(x) = f/g and r(x) = 0. Hence, Theorem 4.2.2, including the uniqueness statement, is proved for deg f(x) = 0.
Now, let n > 0 and Theorem 4.2.2 be proved for all f(x) ∈ K[x] with deg f(x) < n. Let

\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad a_n \ne 0, \]
\[ g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0, \quad b_m \ne 0,\ m \ge 0. \]

If m > n, then just choose q(x) = 0 and r(x) = f(x).

Now, finally, let 0 ≤ m ≤ n. We define

\[ h(x) = f(x) - \frac{a_n}{b_m} x^{n-m} g(x). \]

We have deg h(x) < n. Hence, by induction assumption, there are q_1(x) and r(x) with h(x) = q_1(x)g(x) + r(x) and deg r(x) < deg g(x). Then

\[ \begin{aligned}
f(x) &= h(x) + \frac{a_n}{b_m} x^{n-m} g(x) \\
     &= \Big( \frac{a_n}{b_m} x^{n-m} + q_1(x) \Big) g(x) + r(x) \\
     &= q(x) g(x) + r(x) \quad \text{with } q(x) = \frac{a_n}{b_m} x^{n-m} + q_1(x),
\end{aligned} \]

which proves the existence.


We now show the uniqueness. Let

f (x) = q1 (x)g(x) + r1 (x)


= q2 (x)g(x) + r2 (x),

with

deg r1 (x) < deg g(x), and deg r2 (x) < deg g(x).

Assume r1 (x) ≠ r2 (x). Let deg r1 (x) ≥ deg r2 (x). We get

(q2 (x) − q1 (x))g(x) = r1 (x) − r2 (x),

which gives a contradiction because deg(r1 (x) − r2 (x)) < deg g(x), and q2 (x) − q1 (x) ≠ 0
if r1 (x) ≠ r2 (x). Therefore, r1 (x) = r2 (x), and furthermore q1 (x) = q2 (x) because K[x] is
an integral domain.

Example 4.2.3. Let f(x) = 2x^3 + x^2 − 5x + 3, g(x) = x^2 + x + 1. Then

\[ \frac{2x^3 + x^2 - 5x + 3}{x^2 + x + 1} = 2x - 1 \quad \text{with remainder } -6x + 4. \]

Hence, q(x) = 2x − 1, r(x) = −6x + 4, and

\[ 2x^3 + x^2 - 5x + 3 = (2x - 1)(x^2 + x + 1) + (-6x + 4). \]
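
The existence part of the proof of Theorem 4.2.2 is constructive: one repeatedly subtracts (a_n/b_m)x^{n−m}g(x) to lower the degree. The following sketch follows that idea for K = ℚ, written iteratively rather than by induction. It is only an illustration under our own conventions: polynomials are coefficient lists with the lowest degree first, the empty list stands for the zero polynomial, and the name `poly_divmod` is hypothetical.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] or deg r < deg g.

    Polynomials are coefficient lists over Q, lowest degree first.
    """
    g = [Fraction(b) for b in g]
    while g and g[-1] == 0:          # drop leading zero coefficients
        g.pop()
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(a) for a in f]
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # this is n - m in the proof
        factor = r[-1] / g[-1]       # this is a_n / b_m in the proof
        q[shift] = factor
        for i, b in enumerate(g):    # subtract factor * x^shift * g(x)
            r[shift + i] -= factor * b
        while r and r[-1] == 0:
            r.pop()
    return q, r


# Example 4.2.3: f(x) = 2x^3 + x^2 - 5x + 3, g(x) = x^2 + x + 1.
q, r = poly_divmod([3, -5, 1, 2], [1, 1, 1])
print(q, r)
```

For the data of Example 4.2.3, the lists returned correspond to q(x) = 2x − 1 and r(x) = −6x + 4.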

Theorem 4.2.4. Let K be a field. Then the polynomial ring K[x] is a principal ideal domain,
and hence a unique factorization domain.

We now give some consequences relative to zeros of polynomials in K[x].

Theorem 4.2.5. If f (x) ∈ K[x] and c ∈ K with f (c) = 0, then

f (x) = (x − c)h(x),

where deg h(x) < deg f (x).



Proof. Divide f (x) by x − c. Then by the division algorithm, we have

f (x) = (x − c)h(x) + r(x),

where r(x) = 0, or deg r(x) < deg(x − c) = 1. Hence, if r(x) ≠ 0, then r(x) is a polynomial
of degree 0, that is, a constant polynomial, and thus r(x) = r for r ∈ K. Hence, we have

f (x) = (x − c)h(x) + r.

This implies that

\[ 0 = f(c) = 0 \cdot h(c) + r = r \]

and, therefore, r = 0, and f (x) = (x − c)h(x). Since deg(x − c) = 1, we must have that
deg h(x) < deg f (x).

If f(x) = (x − c)^k h(x) for some k ≥ 1 with h(c) ≠ 0, then c is called a zero of order k.

Theorem 4.2.6. Let f (x) ∈ K[x] with degree 2 or 3. Then f is irreducible if and only if f (x)
does not have a zero in K.

Proof. Suppose that f(x) is irreducible of degree 2 or 3. If f(x) has a zero c ∈ K, then from Theorem 4.2.5, we have f(x) = (x − c)h(x) with h(x) of degree 1 or 2. Therefore, f(x) is reducible, a contradiction. Hence, f(x) cannot have a zero in K.
Conversely, suppose that f(x) is reducible. Then f(x) = g(x)h(x) with deg g(x) ≥ 1 and deg h(x) ≥ 1. Since deg g(x) + deg h(x) = deg f(x) ≤ 3, one of the factors has degree 1; say, g(x) = ax + b with a ≠ 0. Then −b/a ∈ K is a zero of g(x) and, hence, f(x) has a zero in K.
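
Over a finite field, Theorem 4.2.6 gives a finite test. The sketch below assumes K = ℤ_p for a prime number p (the integers modulo p, which are used later in this chapter) and simply tries every c ∈ ℤ_p. The function names are hypothetical, and p is assumed to be prime without being checked.

```python
def has_zero_mod_p(coeffs, p):
    """Return True if the polynomial has a zero in Z_p (p assumed prime).

    coeffs lists the coefficients, lowest degree first.
    """
    for c in range(p):
        value = sum(a * pow(c, i, p) for i, a in enumerate(coeffs)) % p
        if value == 0:
            return True
    return False


def is_irreducible_deg_2_or_3(coeffs, p):
    """Apply the zero criterion; valid only for degree 2 or 3."""
    degree = len(coeffs) - 1
    if degree not in (2, 3) or coeffs[-1] % p == 0:
        raise ValueError("the criterion needs degree 2 or 3 over Z_p")
    return not has_zero_mod_p(coeffs, p)


print(is_irreducible_deg_2_or_3([1, 1, 1], 2))     # x^2 + x + 1 over Z_2
print(is_irreducible_deg_2_or_3([1, 0, 1], 2))     # x^2 + 1 over Z_2
print(is_irreducible_deg_2_or_3([1, 1, 0, 1], 2))  # x^3 + x + 1 over Z_2
```

The restriction to degree 2 or 3 matters: a polynomial of degree 4 without zeros may still be a product of two quadratic factors.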

4.3 Polynomial Rings over Integral Domains


Here we consider R[x] where R is an integral domain.

Definition 4.3.1. Let R be an integral domain. Then a_1, a_2, . . . , a_n ∈ R are coprime if the set of all common divisors of a_1, a_2, . . . , a_n consists only of units.

Notice that this concept depends on the ring R. For example, 6 and 9 are not coprime over the integers ℤ since 3|6 and 3|9 and 3 is not a unit. However, 6 and 9 are coprime over the rationals ℚ, where 3 is a unit.

Definition 4.3.2. Let f(x) = ∑_{i=0}^{n} r_i x^i ∈ R[x], where R is an integral domain. Then f(x) is a primitive polynomial or just primitive if r_0, r_1, . . . , r_n are coprime in R.

Theorem 4.3.3. Let R be an integral domain. Then the following hold:


(a) The units of R[x] are the units of R.
(b) If p is a prime element of R, then p is a prime element of R[x].

Proof. If r ∈ R is a unit, then since R embeds into R[x], it follows that r is also a unit in R[x]. Conversely, suppose that h(x) ∈ R[x] is a unit. Then there is a g(x) ∈ R[x] such that h(x)g(x) = 1. Hence, deg h(x) + deg g(x) = deg 1 = 0. Since degrees are nonnegative integers, it follows that deg h(x) = deg g(x) = 0 and, hence, h(x) = h and g(x) = g are elements of R with hg = 1. Therefore, h is a unit in R.
Now suppose that p is a prime element of R. Then p ≠ 0, and pR is a prime ideal in R.
We must show that pR[x] is a prime ideal in R[x]. Consider the map

\[ \tau : R[x] \to (R/pR)[x] \quad \text{given by} \quad \tau\Big( \sum_{i=0}^{n} r_i x^i \Big) = \sum_{i=0}^{n} (r_i + pR) x^i. \]

Then τ is an epimorphism with kernel pR[x]. Since pR is a prime ideal, we know that
R/pR is an integral domain. It follows that (R/pR)[x] is also an integral domain. Hence,
pR[x] must be a prime ideal in R[x], and therefore p is also a prime element of R[x].

Recall that each integral domain R can be embedded into a unique field of frac-
tions K. We can use results on K[x] to deduce some results in R[x].

Lemma 4.3.4. If K is a field, then each nonzero f(x) ∈ K[x] is primitive.

Proof. Since K is a field, each nonzero element of K is a unit. Therefore, the only com-
mon divisors of the coefficients of f (x) are units and, hence, f (x) ∈ K[x] is primitive.

Theorem 4.3.5. Let R be an integral domain. Then each irreducible f (x) ∈ R[x] of degree
> 0 is primitive.

Proof. Let f (x) be an irreducible polynomial in R[x], and let r ∈ R be a common divisor
of the coefficients of f (x). Then f (x) = rg(x), where g(x) ∈ R[x].
Then deg f (x) = deg g(x) > 0, so g(x) ∉ R. Since the units of R[x] are the units of R,
it follows that g(x) is not a unit in R[x]. Since f (x) is irreducible, it follows that r must
be a unit in R[x] and, hence, r is a unit in R. Therefore, f (x) is primitive.

Theorem 4.3.6. Let R be an integral domain and K its field of fractions. If f (x) ∈ R[x] is
primitive and irreducible in K[x], then f (x) is irreducible in R[x].

Proof. Suppose that f (x) ∈ R[x] is primitive and irreducible in K[x], and suppose that
f (x) = g(x)h(x), where g(x), h(x) ∈ R[x] ⊂ K[x]. Since f (x) is irreducible in K[x], either
g(x) or h(x) must be a unit in K[x]. Without loss of generality, suppose that g(x) is a unit
in K[x]. Then g(x) = g ∈ K. But g(x) ∈ R[x], and K ∩ R[x] = R.
Hence, g ∈ R. Then g is a divisor of the coefficients of f(x), and as f(x) is primitive, g must be a unit in R and, therefore, also a unit in R[x]. Therefore, f(x) is irreducible in R[x].

4.4 Polynomial Rings over Unique Factorization Domains


In this section, we prove that if R is a UFD, then the polynomial ring R[x] is also a UFD.
We first need the following due to Gauss:

Theorem 4.4.1 (Gauss’ lemma). Let R be a UFD and f (x), g(x) primitive polynomials
in R[x]. Then their product f (x)g(x) is also primitive.

Proof. Let R be a UFD and f (x), g(x) primitive polynomials in R[x]. Suppose that f (x)g(x)
is not primitive. Then there is a prime element p ∈ R that divides each of the coefficients
of f (x)g(x). Then p|f (x)g(x). Since prime elements of R are also prime elements of R[x],
it follows that p is also a prime element of R[x] and, hence, p|f (x), or p|g(x). Therefore,
either f (x) or g(x) is not primitive, giving a contradiction.
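
Gauss' lemma can be checked numerically for R = ℤ. The sketch below is only an illustration with hypothetical names: `content` computes a greatest common divisor of the coefficients, so that a polynomial is primitive exactly when its content is 1. The last line also checks, on one example, the standard consequence that the content of a product is the product of the contents; this consequence is not stated in the text above.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """Nonnegative greatest common divisor of the integer coefficients."""
    return reduce(gcd, coeffs, 0)


def is_primitive(coeffs):
    return content(coeffs) == 1


def multiply(f, g):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] += a * b
    return product


f = [3, 2]        # 2x + 3, primitive
g = [5, 2, 3]     # 3x^2 + 2x + 5, primitive
print(multiply(f, g), is_primitive(multiply(f, g)))

h = [4, 6]        # 6x + 4, content 2
print(content(multiply(h, g)) == content(h) * content(g))
```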

Theorem 4.4.2. Let R be a UFD and K its field of fractions.


(a) If g(x) ∈ K[x] is nonzero, then there is a nonzero a ∈ K such that ag(x) ∈ R[x] is
primitive.
(b) Let f (x), g(x) ∈ R[x] with g(x) primitive and f (x) = ag(x) for some a ∈ K. Then a ∈ R.
(c) If f (x) ∈ R[x] is nonzero, then there is a b ∈ R and a primitive g(x) ∈ R[x] such that
f (x) = bg(x).

Proof. (a) Suppose that g(x) = ∑_{i=0}^{n} a_i x^i with a_i = r_i/s_i, where r_i, s_i ∈ R and s_i ≠ 0. Set s = s_0 s_1 ⋅ ⋅ ⋅ s_n. Then sg(x) is a nonzero element of R[x]. Let d be a greatest common divisor of the coefficients of sg(x). If we set a = s/d, then ag(x) is primitive.
(b) For a ∈ K, there are coprime r, s ∈ R satisfying a = r/s. Suppose that a ∉ R. Then there is a prime element p ∈ R dividing s. Since g(x) is primitive, p does not divide all the coefficients of g(x). However, we also have f(x) = ag(x) = (r/s)g(x). Hence, sf(x) = rg(x), where p|s and p does not divide r. Therefore, p divides all the coefficients of g(x), a contradiction. Hence, a ∈ R.
(c) From part (a), there is a nonzero a ∈ K such that af(x) is primitive in R[x]. Then f(x) = a^{−1}(af(x)). From part (b), we must have a^{−1} ∈ R. Set g(x) = af(x) and b = a^{−1}.

Theorem 4.4.3. Let R be a UFD and K its field of fractions. Let f (x) ∈ R[x] be a polynomial
of degree ≥ 1.
(a) If f (x) is primitive and f (x)|g(x) in K[x], then f (x) divides g(x) also in R[x].
(b) If f (x) is irreducible in R[x], then it is also irreducible in K[x].
(c) If f (x) is primitive and a prime element of K[x], then f (x) is also a prime element
of R[x].

Proof. (a) Suppose that g(x) = f(x)h(x) with h(x) ∈ K[x]. From Theorem 4.4.2 part (a), there is a nonzero a ∈ K such that h_1(x) = ah(x) is primitive in R[x]. Hence, g(x) = (1/a)(f(x)h_1(x)). From Gauss’ lemma, f(x)h_1(x) is primitive in R[x]. Therefore, from Theorem 4.4.2 part (b), we have 1/a ∈ R. It follows that f(x)|g(x) in R[x].

(b) Suppose that g(x) ∈ K[x] is a factor of f (x). From Theorem 4.4.2 part (a), there
is a nonzero a ∈ K with g1 (x) = ag(x) primitive in R[x]. Since a is a unit in K, it follows
that

g(x)|f (x) in K[x] implies g1 (x)|f (x) in K[x]

and, hence, since g1 (x) is primitive

g1 (x)|f (x) in R[x].

However, by assumption, f (x) is irreducible in R[x]. This implies that either g1 (x) is a
unit in R, or g1 (x) is an associate of f (x).
If g1 (x) is a unit, then g1 ∈ K, and g1 = ga. Hence, g ∈ K; that is, g = g(x) is a unit.
If g1 (x) is an associate of f (x), then f (x) = bg(x), where b ∈ K since g1 (x) = ag(x)
with a ∈ K. Combining these, it follows that f (x) has only trivial factors in K[x], and
since—by assumption—f (x) is nonconstant, it follows that f (x) is irreducible in K[x].
(c) Suppose that f (x)|g(x)h(x) with g(x), h(x) ∈ R[x]. Since f (x) is a prime element
in K[x], we have that f (x)|g(x) or f (x)|h(x) in K[x]. From part (a), we have f (x)|g(x) or
f (x)|h(x) in R[x] implying that f (x) is a prime element in R[x].

We can now state and prove our main result.

Theorem 4.4.4 (Gauss). Let R be a UFD. Then the polynomial ring R[x] is also a UFD.

Proof. Since R is an integral domain, so is R[x]. By induction on the degree, we show that each nonzero nonunit f(x) ∈ R[x] is a product of prime elements. The fact that R[x] is a UFD then follows from Theorem 3.3.3.
If deg f (x) = 0, then f (x) = f is a nonunit in R. Since R is a UFD, f is a product of
prime elements in R. However, from Theorem 4.3.3, each prime factor is then also prime
in R[x]. Therefore, f (x) is a product of prime elements.
Now suppose n > 0 and that the claim is true for all polynomials f (x) of degree < n.
Let f (x) be a polynomial of degree n > 0. From Theorem 4.4.2 (c), there is an a ∈ R and a
primitive h(x) ∈ R[x] satisfying f (x) = ah(x). Since R is a UFD, the element a is a product
of prime elements in R, or a is a unit in R. Since the units in R[x] are the units in R, and a
prime element in R is also a prime element in R[x], it follows that a is a product of prime
elements in R[x], or a is a unit in R[x]. Let K be the field of fractions of R. Then K[x] is a
UFD. Hence, h(x) is a product of prime elements of K[x].
Let p(x) ∈ K[x] be a prime divisor of h(x). From Theorem 4.4.2 (a), we can assume, after multiplication by a field element, that p(x) ∈ R[x] and that p(x) is primitive. From Theorem 4.4.3 (c), it follows that p(x) is a prime element of R[x]. Furthermore, from Theorem 4.4.3 (a), p(x) is a divisor of h(x) in R[x]. Therefore,

f (x) = ah(x) = ap(x)g(x) ∈ R[x],



where the following hold:


(1) a is a product of prime elements of R[x], or a is a unit in R[x],
(2) deg p(x) > 0, since p(x) is a prime element in K[x],
(3) p(x) is a prime element in R[x], and
(4) deg g(x) < deg f (x) since deg p(x) > 0.

By our inductive hypothesis, we have then that g(x) is a product of prime elements in
R[x], or g(x) is a unit in R[x]. Therefore, the claim holds for f (x), and therefore holds for
all f (x) by induction.

If R[x] is a polynomial ring over R, we can form a polynomial ring in a new indeter-
minate y over this ring to form (R[x])[y]. It is straightforward that (R[x])[y] is isomor-
phic to (R[y])[x]. We denote both of these rings by R[x, y] and consider this as the ring
of polynomials in two commuting variables x, y with coefficients in R.
If R is a UFD, then from Theorem 4.4.4, R[x] is also a UFD. Hence, R[x, y] is also a
UFD. Inductively then, the ring of polynomials in n commuting variables R[x1 , x2 , . . . , xn ]
is also a UFD.
Here, the ring R[x1 , . . . , xn ] is inductively given by R[x1 , . . . , xn ] = (R[x1 , . . . , xn−1 ])[xn ]
if n > 2.

Corollary 4.4.5. If R is a UFD, then the polynomial ring in n commuting variables R[x_1, . . . , x_n] is also a UFD.

We now give a condition for a polynomial in R[x] to have a zero in K, where K is the field of fractions of R.

Theorem 4.4.6. Let R be a UFD and K its field of fractions. Let

\[ f(x) = x^n + r_{n-1} x^{n-1} + \cdots + r_0 \in R[x]. \]

Suppose that β ∈ K is a zero of f (x). Then β is in R and is a divisor of r0 .

Proof. Let β = r/s, where s ≠ 0, r, s ∈ R, and r, s are coprime. Now

\[ f\Big(\frac{r}{s}\Big) = 0 = \frac{r^n}{s^n} + r_{n-1} \frac{r^{n-1}}{s^{n-1}} + \cdots + r_0. \]

Multiplying by s^n gives r^n = −s(r_{n−1} r^{n−1} + r_{n−2} r^{n−2} s + ⋅ ⋅ ⋅ + r_0 s^{n−1}). Hence, it follows that s must divide r^n. Since r and s are coprime, s must be a unit, and then, without loss of generality, we may assume that s = 1. Then β = r ∈ R, and

\[ r\,(r^{n-1} + r_{n-1} r^{n-2} + \cdots + r_1) = -r_0, \]

and so r | r_0.
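
For R = ℤ and K = ℚ, Theorem 4.4.6 reduces the search for rational zeros of a monic integer polynomial to finitely many candidates, namely the divisors of r_0. The following sketch illustrates this. The function names are our own, and the case r_0 = 0 is handled by first factoring out x, which is a convenience of this sketch and not part of the theorem.

```python
def evaluate(coeffs, c):
    """Horner evaluation; coeffs are listed lowest degree first."""
    result = 0
    for r in reversed(coeffs):
        result = result * c + r
    return result


def rational_zeros_of_monic(coeffs):
    """All zeros in Q of a monic polynomial with integer coefficients.

    By Theorem 4.4.6 every such zero is an integer dividing r0.
    """
    if coeffs[-1] != 1:
        raise ValueError("the polynomial must be monic")
    zeros = set()
    while coeffs[0] == 0:        # x divides f(x), so 0 is a zero
        zeros.add(0)
        coeffs = coeffs[1:]
    r0 = abs(coeffs[0])
    for d in range(1, r0 + 1):
        if r0 % d == 0:
            for candidate in (d, -d):
                if evaluate(coeffs, candidate) == 0:
                    zeros.add(candidate)
    return sorted(zeros)


# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
print(rational_zeros_of_monic([-6, 11, -6, 1]))
# x^2 - 2 has no zero in Q, so the square root of 2 is irrational
print(rational_zeros_of_monic([-2, 0, 1]))
```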

Note that since ℤ is a UFD, Gauss’ theorem implies that ℤ[x] is also a UFD. However,
ℤ[x] is not a principal ideal domain. For example, the set of integral polynomials with
even constant term is an ideal, but not principal. We leave the verification to the exercises. On the other hand, we saw that if K is a field, K[x] is a PID. The question arises as
to when R[x] actually is a principal ideal domain. It turns out to be precisely when R is
a field.

Theorem 4.4.7. Let R be a commutative ring with an identity. Then the following are
equivalent:
(a) R is a field.
(b) R[x] is Euclidean.
(c) R[x] is a principal ideal domain.

Proof. From Section 4.2, we know that (a) implies (b), which in turn implies (c). There-
fore, we must show that (c) implies (a). Assume then that R[x] is a principal ideal domain.
Define the map

τ : R[x] → R

by

τ( f (x)) = f (0).

It is easy to see that τ is a surjective ring homomorphism, so that R[x]/ker(τ) ≅ R. Therefore, ker(τ) ≠ R[x]. Since R[x] is a principal ideal domain, it is an integral domain, and so is its subring R. It follows that ker(τ) must be a prime ideal since the quotient ring is an integral domain. Moreover, ker(τ) ≠ {0} because x ∈ ker(τ). However, since R[x] is a principal ideal domain, nonzero prime ideals are maximal ideals; hence, ker(τ) is a maximal ideal by Theorem 3.2.7. Therefore, R ≅ R[x]/ker(τ) is a field.

We now consider the relationship between irreducibles in R[x] for a general integral
domain and irreducibles in K[x], where K is its field of fractions. This is handled by the
next result called Eisenstein’s criterion.

Theorem 4.4.8 (Eisenstein’s criterion). Let R be an integral domain and K its field of fractions. Let f(x) = ∑_{i=0}^{n} a_i x^i ∈ R[x] be of degree n > 0. Let p be a prime element of R satisfying the following:
(1) p|a_i for i = 0, . . . , n − 1.
(2) p does not divide a_n.
(3) p^2 does not divide a_0.

Then the following hold:


(a) If f (x) is primitive, then f (x) is irreducible in R[x].
(b) Suppose that R is a UFD. Then f (x) is also irreducible in K[x].

Proof. (a) Suppose that f(x) = g(x)h(x) with g(x), h(x) ∈ R[x]. Suppose that

\[ g(x) = \sum_{i=0}^{k} b_i x^i,\ b_k \ne 0 \quad \text{and} \quad h(x) = \sum_{j=0}^{l} c_j x^j,\ c_l \ne 0. \]

Then a_0 = b_0 c_0. Now p|a_0, but p^2 does not divide a_0. Since p is a prime element, this implies that p divides exactly one of b_0 and c_0. Without loss of generality, assume that p|b_0 and p does not divide c_0.
Since an = bk cl , and p does not divide an , it follows that p does not divide bk . Let bj
be the first coefficient of g(x), which is not divisible by p. Consider

aj = bj c0 + ⋅ ⋅ ⋅ + b0 cj ,

where everything after the first term is divisible by p. Since p is a prime element that divides neither b_j nor c_0, it follows that p does not divide b_j c_0. Therefore, p does not divide a_j, which implies that j = n. Then from j ≤ k ≤ n, it follows that k = n.
Therefore, deg g(x) = deg f (x) and, hence, deg h(x) = 0. Thus, h(x) = h ∈ R. Then
from f (x) = hg(x) with f primitive, it follows that h is a unit and, therefore, f (x) is
irreducible.
(b) Suppose that f (x) = g(x)h(x) with g(x), h(x) ∈ R[x]. The fact that f (x) was prim-
itive was only used in the final part of part (a). Therefore, by the same arguments as in
part (a), we may assume—without loss of generality—that h ∈ R ⊂ K. Therefore, f (x) is
irreducible in K[x].

Following are some examples:

Example 4.4.9. Let R = ℤ and p a prime number. Suppose that n, m are integers such that n ≥ 1 and p does not divide m. Then x^n ± pm is irreducible in ℤ[x] and ℚ[x]. In particular, for n ≥ 2, (pm)^{1/n} is irrational.

Example 4.4.10. Let R = ℤ and p a prime number. Consider the polynomial

\[ \Phi_p(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \cdots + 1. \]

Since all the coefficients of Φ_p(x) are equal to 1, Eisenstein’s criterion is not directly applicable. However, for any integer a, the polynomial Φ_p(x) is irreducible in ℤ[x] if and only if Φ_p(x + a) is irreducible in ℤ[x]. We have

\[ \Phi_p(x+1) = \frac{(x+1)^p - 1}{(x+1) - 1}
 = \frac{x^p + \binom{p}{1} x^{p-1} + \cdots + \binom{p}{p-1} x + 1 - 1}{x}
 = x^{p-1} + \binom{p}{1} x^{p-2} + \cdots + \binom{p}{p-1}. \]

Now p | \binom{p}{i} for 1 ≤ i ≤ p − 1 (see exercises) and, moreover, \binom{p}{p-1} = p is not divisible by p^2. Therefore, we can apply the Eisenstein criterion to Φ_p(x + 1) and conclude that Φ_p(x) is irreducible in ℤ[x] and ℚ[x].
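
The three conditions of Eisenstein’s criterion are easy to test mechanically for R = ℤ. The sketch below checks them for the polynomials of Examples 4.4.9 and 4.4.10. The function names are hypothetical, the integer p is assumed to be prime without being checked, and a result of False only means that the criterion does not apply for this p, not that the polynomial is reducible.

```python
from math import comb


def eisenstein_applies(coeffs, p):
    """Check conditions (1)-(3) of Eisenstein's criterion over Z.

    coeffs lists a0, a1, ..., an (lowest degree first); p is assumed prime.
    """
    if len(coeffs) < 2:
        return False
    *lower, leading = coeffs
    return (all(a % p == 0 for a in lower)      # (1) p | a_i for i < n
            and leading % p != 0                # (2) p does not divide a_n
            and coeffs[0] % (p * p) != 0)       # (3) p^2 does not divide a_0


def shifted_cyclotomic(p):
    """Coefficients of Phi_p(x + 1): the coefficient of x^k is binom(p, k + 1)."""
    return [comb(p, k + 1) for k in range(p)]


print(eisenstein_applies([-6, 0, 0, 0, 1], 3))          # x^4 - 6 with p = 3
print(shifted_cyclotomic(5))                            # Phi_5(x + 1)
print(eisenstein_applies(shifted_cyclotomic(5), 5))
print(eisenstein_applies([1, 1, 1, 1, 1], 5))           # Phi_5(x) itself
```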

Theorem 4.4.11. Let R be a UFD and K its field of fractions. Let f(x) = ∑_{i=0}^{n} a_i x^i ∈ R[x] be a polynomial of degree n ≥ 1. Let P be a prime ideal in R with a_n ∉ P. Let R̄ = R/P, and let α : R[x] → R̄[x] be defined by

\[ \alpha\Big( \sum_{i=0}^{m} r_i x^i \Big) = \sum_{i=0}^{m} (r_i + P) x^i. \]

Then α is an epimorphism, and if α(f(x)) is irreducible in R̄[x], then f(x) is irreducible in K[x].

Proof. By Theorem 4.4.2 (c), there exist an a ∈ R and a primitive g(x) ∈ R[x] satisfying f(x) = ag(x). Since a_n ∉ P, we have that α(a) ≠ 0. Furthermore, the highest coefficient of g(x) is also not an element of P. If α(g(x)) is reducible, then α(f(x)) is also reducible. Thus, α(g(x)) is irreducible. Moreover, if g(x) is irreducible in K[x], then f(x) = ag(x) is also irreducible in K[x] because a is a unit in K. Therefore, to prove the theorem, it suffices to consider the case where f(x) is primitive in R[x].
Now suppose that f(x) is primitive. We show that f(x) is irreducible in R[x]; by Theorem 4.4.3 (b), it is then also irreducible in K[x].
Suppose that f (x) = g(x)h(x), g(x), h(x) ∈ R[x] with h(x), g(x) nonunits in R[x].
Since f (x) is primitive, g, h ∉ R. Therefore, deg g(x) < deg f (x), and deg h(x) < deg f (x).
Now we have α(f(x)) = α(g(x))α(h(x)). Since P is a prime ideal, R̄ = R/P is an integral domain. Therefore, in R̄[x] we have

deg α(g(x)) + deg α(h(x)) = deg α( f (x)) = deg f (x)

since an ∉ P. Since R is a UFD, it has no zero divisors. Therefore,

deg f (x) = deg g(x) + deg h(x).

Now

deg α(g(x)) ≤ deg g(x)


deg α(h(x)) ≤ deg h(x).

Therefore, deg α(g(x)) = deg g(x), and deg α(h(x)) = deg h(x). Therefore, α(f (x)) is re-
ducible, and we have a contradiction.
It is important to note that α(f(x)) being reducible does not imply that f(x) is reducible. For example, f(x) = x^2 + 1 is irreducible in ℤ[x]. However, in ℤ_2[x], we have

\[ x^2 + 1 = (x + 1)^2 \]

and, hence, α(f(x)) is reducible in ℤ_2[x].

Example 4.4.12. Let f(x) = x^5 − x^2 + 1 ∈ ℤ[x]. Choose P = 2ℤ so that

\[ \alpha(f(x)) = x^5 + x^2 + 1 \in \mathbb{Z}_2[x]. \]

Suppose that in ℤ_2[x], we have α(f(x)) = g(x)h(x) with g(x) and h(x) nonconstant. Without loss of generality, we may assume that g(x) is of degree 1 or 2.
If deg g(x) = 1, then α(f(x)) has a zero c in ℤ_2. The two possibilities for c are c = 0, or c = 1. Then the following hold:

If c = 0, then 0 + 0 + 1 = 1 ≠ 0.
If c = 1, then 1 + 1 + 1 = 1 ≠ 0.

Hence, the degree of g(x) cannot be 1.


Suppose deg g(x) = 2. The polynomials of degree 2 over ℤ_2 have the form

\[ x^2 + x + 1, \quad x^2 + x, \quad x^2 + 1, \quad x^2. \]

The last three, x^2 + x, x^2 + 1, x^2, all have zeros in ℤ_2. Therefore, they cannot divide α(f(x)). Therefore, g(x) must be x^2 + x + 1. Applying the division algorithm, we obtain

\[ \alpha(f(x)) = (x^3 + x^2)(x^2 + x + 1) + 1 \]

and, therefore, x 2 + x + 1 does not divide α(f (x)). It follows that α(f (x)) is irreducible,
and from the previous theorem, f (x) must be irreducible in ℚ[x].
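
The case analysis of Example 4.4.12 can be automated by trial division. The sketch below reduces the coefficients modulo a prime p and tries every monic polynomial of degree at most half the degree of f(x) as a divisor. This is a brute-force illustration with hypothetical function names, practical only for small p and small degree; p is assumed to be prime and the leading coefficient is assumed not to be divisible by p.

```python
from itertools import product


def remainder_mod_p(f, g, p):
    """Remainder of f on division by a monic g, coefficients taken in Z_p."""
    r = [a % p for a in f]
    while r and r[-1] == 0:
        r.pop()
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = r[-1]               # g is monic, so no inverse is needed
        for i, b in enumerate(g):
            r[shift + i] = (r[shift + i] - factor * b) % p
        while r and r[-1] == 0:
            r.pop()
    return r


def is_irreducible_mod_p(f, p):
    """Trial division by all monic polynomials of degree 1, ..., deg(f) // 2."""
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for lower in product(range(p), repeat=d):
            g = list(lower) + [1]
            if not remainder_mod_p(f, g, p):
                return False
    return True


f = [1, 0, -1, 0, 0, 1]                        # x^5 - x^2 + 1 in Z[x]
print(remainder_mod_p(f, [1, 1, 1], 2))        # remainder modulo x^2 + x + 1
print(is_irreducible_mod_p(f, 2))
print(is_irreducible_mod_p([1, 0, 1], 2))      # x^2 + 1 = (x + 1)^2 over Z_2
```

Restricting to monic divisors loses nothing over a field, since every nonzero polynomial is a unit multiple of a monic one.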

4.5 Exercises
1. For which a, b ∈ ℤ does the polynomial x 2 + 3x + 1 divide the polynomial

x 3 + x 2 + ax + b?

2. Let a + bi ∈ ℂ be a zero of f (x) ∈ ℝ[x]. Show that also a − ib is a zero of f (x).


3. Determine all quadratic irreducible polynomials over ℝ.
4. Let R be an integral domain, I ⊲ R an ideal, and f ∈ R[x] a monic polynomial. Define f̄ ∈ (R/I)[x] by the mapping R[x] → (R/I)[x], f = ∑ a_i x^i ↦ f̄ = ∑ ā_i x^i, where ā := a + I. Show that if f̄ ∈ (R/I)[x] is irreducible, then f ∈ R[x] is also irreducible.
5. Decide if the following polynomials f ∈ R[x] are irreducible:
(i) f (x) = x 3 + 2x 2 + 3, R = ℤ.
(ii) f (x) = x 5 − 2x + 1, R = ℚ.
(iii) f (x) = 3x 4 + 7x 2 + 14x + 7, R = ℚ.
(iv) f (x) = x 7 + (3 − i)x 2 + (3 + 4i)x + 4 + 2i, R = ℤ[i].
(v) f (x) = x 4 + 3x 3 + 2x 2 + 3x + 4, R = ℚ.
(vi) f (x) = 8x 3 − 4x 2 + 2x − 1, R = ℤ.
6. Let R be an integral domain with characteristic 0, let k ≥ 1 and α ∈ R. In R[x], define the derivatives f^{(k)}(x), k = 0, 1, 2, . . . , of a polynomial f(x) ∈ R[x] by

\[ f^{(0)}(x) := f(x), \qquad f^{(k)}(x) := \big(f^{(k-1)}\big)'(x). \]

Show that α is a zero of order k of the polynomial f(x) ∈ R[x] if f^{(j)}(α) = 0 for 0 ≤ j ≤ k − 1, but f^{(k)}(α) ≠ 0.
7. Prove that the set of integral polynomials with even constant term is an ideal, but
not principal.
8. Prove that p | \binom{p}{i} for 1 ≤ i ≤ p − 1, where p is a prime number.
5 Field Extensions
5.1 Extension Fields and Finite Extensions
Much of algebra in general arose from the theory of equations, specifically polynomial
equations. As discovered by Galois and Abel, the solution of polynomial equations over fields is intimately tied to the theory of field extensions. This theory eventually blossoms into Galois Theory. In this chapter, we discuss the basic material concerning field
extensions.
Recall that if L is a field and K ⊂ L is also a field under the same operations as L,
then K is called a subfield of L. If we view this situation from the viewpoint of K, we say
that L is an extension field or field extension of K. If K, L are fields with K ⊂ L, we always
assume that K is a subfield of L.

Definition 5.1.1. If K, L are fields with K ⊂ L, then we say that L is a field extension or
extension field of K. We denote this by L|K.
Note that this is equivalent to having a field monomorphism

i:K →L

and then identifying K and i(K).

As examples, we have that ℝ is an extension field of ℚ, and ℂ is an extension field of both ℝ and ℚ. If K is any field, then the ring of polynomials K[x] over K is an integral domain. Let K(x) be the field of fractions of K[x]. This is called the field of rational functions over K. Since K can be considered as part of K[x], it follows that K ⊂ K(x) and, hence, K(x) is an extension field of K.
A crucial concept is that of the degree of a field extension. Recall that a vector space
V over a field K consists of an Abelian group V together with scalar multiplication from
K satisfying the following:
(1) fv ∈ V if f ∈ K, v ∈ V .
(2) f (u + v) = fu + fv for f ∈ K, u, v ∈ V .
(3) (f + g)v = fv + gv for f , g ∈ K, v ∈ V .
(4) (fg)v = f (gv) for f , g ∈ K, v ∈ V .
(5) 1v = v for v ∈ V .

Notice that if K is a subfield of L, then products of elements of L with elements of K are still in L. Since L is an Abelian group under addition, L can be considered as a vector space over K. Thus, any extension field is a vector space over any of its subfields. Using this, we define the degree |L : K| of an extension K ⊂ L as the dimension dim_K(L) of L as a vector space over K. We call L a finite extension of K if |L : K| < ∞.

https://doi.org/10.1515/9783111142524-005

Definition 5.1.2. If L is an extension field of K, then the degree of the extension L|K is
defined as the dimension, dimK (L), of L, as a vector space over K. We denote the degree
by |L : K|. The field extension L|K is a finite extension if the degree |L : K| is finite.

Lemma 5.1.3. |ℂ : ℝ| = 2, but |ℝ : ℚ| = ∞.

Proof. Every complex number can be written uniquely as a + ib, where a, b ∈ ℝ. Hence,
the elements 1, i constitute a basis for ℂ over ℝ and, therefore, the dimension is 2. That
is, |ℂ : ℝ| = 2.
The fact that |ℝ : ℚ| = ∞ depends on the existence of transcendental numbers.
An element r ∈ ℝ is algebraic (over ℚ) if it satisfies some nonzero polynomial with
coefficients from ℚ. That is, P(r) = 0, where

0 ≠ P(x) = a0 + a1 x + ⋅ ⋅ ⋅ + an x n with ai ∈ ℚ.

Any q ∈ ℚ is algebraic since if P(x) = x − q, then P(q) = 0. However, many irrationals are
also algebraic. For example, √2 is algebraic since x 2 − 2 = 0 has √2 as a zero. An element
r ∈ ℝ is transcendental if it is not algebraic.
In general, it is very difficult to show that a particular element is transcendental.
However, there are uncountably many transcendental elements (see exercises). Specific
examples are e and π. We will give a proof of their transcendence in Chapter 20.
Since e is transcendental, for any natural number n, the set {1, e, e^2, . . . , e^n} must be
independent over ℚ, for otherwise there would be a polynomial that e would satisfy.
Therefore, we have infinitely many independent vectors in ℝ over ℚ, which would be
impossible if ℝ had finite degree over ℚ.

Lemma 5.1.4. If K is any field, then |K(x) : K| = ∞.

Proof. For any n, the elements 1, x, x^2, . . . , x^n are independent over K. Therefore, as in the proof of Lemma 5.1.3, K(x) must be infinite-dimensional over K.

If L|K and L1 |K1 are field extensions, then they are isomorphic field extensions if
there exists a field isomorphism f : L → L1 such that f|K is an isomorphism from K to K1 .
Suppose that K ⊂ L ⊂ M are fields. Below we show that the degrees multiply. In this
situation, where K ⊂ L ⊂ M, we call L an intermediate field.

Theorem 5.1.5. Let K, L, M be fields with K ⊂ L ⊂ M. Then

|M : K| = |M : L||L : K|.

Note that |M : K| = ∞ if and only if either |M : L| = ∞, or |L : K| = ∞.

Proof. Let {xi : i ∈ I} be a basis for L as a vector space over K, and let {yj : j ∈ J} be a basis
for M as a vector space over L. To prove the result, it is sufficient to show that the set

B = {xi yj : i ∈ I, j ∈ J}

is a basis for M as a vector space over K. To show this, we must show that B is a linearly
independent set over K, and that B spans M.
Suppose that

\[ \sum_{i,j} k_{ij} x_i y_j = 0 \quad \text{where } k_{ij} \in K. \]

We can then write this sum as

\[ \sum_{j} \Big( \sum_{i} k_{ij} x_i \Big) y_j = 0. \]

But ∑i kij xi ∈ L. Since {yj : j ∈ J} is a basis for M over L, the yj are independent over L;
hence, for each j, we get ∑i kij xi = 0. Now since {xi : i ∈ I} is a basis for L over K, it follows
that the xi are linearly independent, and since for each j we have ∑i kij xi = 0, it must be
that kij = 0 for all i and for all j. Therefore, the set B is linearly independent over K.
Now suppose that m ∈ M. Then since {yj : j ∈ J} spans M over L, we have

\[ m = \sum_{j} c_j y_j \quad \text{with } c_j \in L. \]

However, {x_i : i ∈ I} spans L over K, and so for each c_j, we have

\[ c_j = \sum_{i} k_{ij} x_i \quad \text{with } k_{ij} \in K. \]

Combining these two sums, we have

\[ m = \sum_{i,j} k_{ij} x_i y_j \]

and, hence, B spans M over K. Therefore, B is a basis for M over K, and the result is
proved.

Corollary 5.1.6. (a) If |L : K| is a prime number, then there exists no proper intermediate
field between L and K.
(b) If K ⊂ L and |L : K| = 1, then L = K.

Let L|K be a field extension, and suppose that A ⊂ L. Then certainly there are sub-
rings of L containing both A and K, for example L. We denote by K[A] the intersection of
all subrings of L containing both K and A. Since the intersection of subrings is a subring,
it follows that K[A] is a subring containing both K and A and the smallest such subring.
We call K[A] the ring adjunction of A to K.
In an analogous manner, we let K(A) be the intersection of all subfields of L contain-
ing both K and A. This is then a subfield of L, and the smallest subfield of L containing
both K and A. The subfield K(A) is called the field adjunction of A to K.

Clearly, K[A] ⊂ K(A). If A = {a1 , . . . , an }, then we write

K[A] = K[a1 , . . . , an ] and K(A) = K(a1 , . . . , an ).

Definition 5.1.7. The field extension L|K is finitely generated if there exist elements
a1 , . . . , an ∈ L such that L = K(a1 , . . . , an ). The extension L|K is a simple extension if there
is an a ∈ L with L = K(a). In this case, a is called a primitive element of L|K.

In Chapter 7, we will look at an alternative way to view the adjunction constructions in terms of polynomials.

5.2 Finite and Algebraic Extensions


We now turn to the relationship between field extensions and the solution of polynomial
equations.

Definition 5.2.1. Let L|K be a field extension. An element a ∈ L is algebraic over K if there exists a nonzero polynomial p(x) ∈ K[x] with p(a) = 0. L is an algebraic extension of K if each element of L is algebraic over K. An element a ∈ L that is not algebraic over K is called transcendental over K. L is a transcendental extension of K if L contains elements that are transcendental over K.

For the remainder of this section, we assume that L|K is a field extension.

Lemma 5.2.2. Each element of K is algebraic over K.

Proof. Let k ∈ K. Then k is a zero of the polynomial p(x) = x − k ∈ K[x].

We now tie algebraic extensions to finite extensions.

Theorem 5.2.3. If L|K is a finite extension, then L|K is an algebraic extension.

Proof. Suppose that L|K is a finite extension and a ∈ L. We must show that a is algebraic
over K. Suppose that |L : K| = n < ∞, then dimK (L) = n. It follows that any n+1 elements
of L are linearly dependent over K.
Now consider the n + 1 elements 1, a, a^2, . . . , a^n in L. They are linearly dependent over K. Hence, there exist c_0, . . . , c_n ∈ K not all zero such that

\[ c_0 + c_1 a + \cdots + c_n a^n = 0. \]

Let p(x) = c_0 + c_1 x + ⋅ ⋅ ⋅ + c_n x^n. Then p(x) ∈ K[x] is nonzero, and p(a) = 0. Therefore, a is algebraic over K. Since a was arbitrary, it follows that L is an algebraic extension of K.
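
The proof of Theorem 5.2.3 is effective: a linear dependence among 1, a, . . . , a^n yields a polynomial with a as a zero. The sketch below carries this out in the smallest interesting case. It assumes K = ℚ and L = ℚ(√2) with basis 1, √2 (so n = 2), an example that is our own choice and not taken from the text; an element u + v√2 is stored as the pair (u, v), and all function names are hypothetical.

```python
from fractions import Fraction


def mul(s, t):
    """Product of u + v*sqrt(2) and u' + v'*sqrt(2), stored as pairs."""
    return (s[0] * t[0] + 2 * s[1] * t[1], s[0] * t[1] + s[1] * t[0])


def dependence_relation(a):
    """Return (c0, c1, c2), not all zero, with c0 + c1*a + c2*a^2 = 0."""
    u, v = Fraction(a[0]), Fraction(a[1])
    if v == 0:                       # a lies in Q, so x - a already works
        return (-u, Fraction(1), Fraction(0))
    a2 = mul((u, v), (u, v))         # coordinates of a^2 in the basis 1, sqrt(2)
    c1 = a2[1] / v                   # match the sqrt(2) coordinate: a^2 = c0 + c1*a
    c0 = a2[0] - c1 * u              # match the rational coordinate
    return (-c0, -c1, Fraction(1))


def evaluate_at(coeffs, a):
    """Evaluate c0 + c1*a + c2*a^2 + ... inside Q(sqrt 2) by Horner's rule."""
    result = (Fraction(0), Fraction(0))
    for c in reversed(coeffs):
        result = mul(result, a)
        result = (result[0] + c, result[1])
    return result


a = (1, 1)                           # a = 1 + sqrt(2)
relation = dependence_relation(a)
print(relation)                      # coefficients of x^2 - 2x - 1
print(evaluate_at(relation, a))      # the zero element of Q(sqrt 2)
```

In higher dimension one would solve a linear system instead of matching two coordinates by hand; the principle is the same.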

From the previous theorem, it follows that every finite extension is algebraic. The
converse is not true; that is, there are algebraic extensions that are not finite. We will
give examples in Section 5.4.

The following lemma gives some examples of algebraic and transcendental exten-
sions.

Lemma 5.2.4. ℂ|ℝ is algebraic, but ℝ|ℚ and ℂ|ℚ are transcendental. If K is any field,
then K(x)|K is transcendental.

Proof. Since 1, i constitute a basis for ℂ over ℝ, we have |ℂ : ℝ| = 2. Hence, ℂ is a finite
extension of ℝ and therefore, by Theorem 5.2.3, an algebraic extension. More directly, if
α = a + ib ∈ ℂ, then α is a zero of $x^2 - 2ax + (a^2 + b^2)$ ∈ ℝ[x].
The existence of transcendental numbers (we will discuss these more fully in Section 5.5)
shows that both ℝ|ℚ and ℂ|ℚ are transcendental extensions.
Finally, the element x ∈ K(x) is not a zero of any nonzero polynomial in K[x]. Therefore, x is
a transcendental element, so the extension K(x)|K is transcendental.
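
The direct argument for ℂ|ℝ is easy to check numerically. The short Python sketch below is an illustration only; the function name `quadratic_value` is hypothetical, and small integer inputs are assumed so that the complex arithmetic is exact. It evaluates $x^2 - 2ax + (a^2 + b^2)$ at x = a + ib.

```python
def quadratic_value(a, b):
    """Value of x^2 - 2ax + (a^2 + b^2) at x = a + ib."""
    alpha = complex(a, b)
    return alpha * alpha - 2 * a * alpha + (a * a + b * b)

# Two sample points with integer real and imaginary parts.
values = [quadratic_value(3, 4), quadratic_value(-2, 5)]
```

The polynomial is the product (x − α)(x − ᾱ), which is why its coefficients are real and why α is one of its zeros.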

5.3 Minimal Polynomials and Simple Extensions


If L|K is a field extension and a ∈ L is algebraic over K, then p(a) = 0 for some polyno-
mial p(x) ∈ K[x]. In this section, we consider the smallest such polynomial and tie it to
a simple extension of K.

Definition 5.3.1. Suppose that L|K is a field extension and a ∈ L is algebraic over K. The
polynomial $m_a(x)$ ∈ K[x] is the minimal polynomial of a over K if the following hold:
(1) $m_a(x)$ has leading coefficient 1; that is, it is a monic polynomial.
(2) $m_a(a) = 0$.
(3) If f(x) ∈ K[x] with f(a) = 0, then $m_a(x) \mid f(x)$.

Hence, $m_a(x)$ is the monic polynomial of minimal degree that has a as a zero.

We prove next that every algebraic element has such a minimal polynomial.

Theorem 5.3.2. Suppose that L|K is a field extension and a ∈ L is algebraic over K. Then
we have:
(1) The minimal polynomial $m_a(x)$ ∈ K[x] exists and is irreducible over K.
(2) $K[a] \cong K(a) \cong K[x]/(m_a(x))$, where $(m_a(x))$ is the principal ideal in K[x] generated
by $m_a(x)$.
(3) $|K(a) : K| = \deg(m_a(x))$. Therefore, K(a)|K is a finite extension.

Proof. (1) Suppose that a ∈ L is algebraic over K. Let

I = { f (x) ∈ K[x] : f (a) = 0}.

Since a is algebraic, I ≠ {0}. It is straightforward to show (see exercises) that I is an ideal
in K[x]. Since K is a field, we have that K[x] is a principal ideal domain.

Hence, there exists g(x) ∈ K[x] with I = (g(x)). Let b be the leading coefficient of
g(x). Then $m_a(x) = b^{-1} g(x)$ is a monic polynomial. We claim that $m_a(x)$ is the minimal
polynomial of a and that $m_a(x)$ is irreducible. First, it is clear that $I = (g(x)) = (m_a(x))$. If
f(x) ∈ K[x] with f(a) = 0, then $f(x) = h(x) m_a(x)$ for some h(x) ∈ K[x]. Therefore, $m_a(x)$ divides
any polynomial that has a as a zero. It follows that $m_a(x)$ is the minimal polynomial.
Suppose that $m_a(x) = g_1(x) g_2(x)$. Then, since $m_a(a) = 0$ and L has no zero divisors, it follows
that either $g_1(a) = 0$ or $g_2(a) = 0$. Suppose $g_1(a) = 0$. Then from above, $m_a(x) \mid g_1(x)$, and since $g_1(x) \mid m_a(x)$,
we must then have that $g_2(x)$ is a unit. Therefore, $m_a(x)$ is irreducible.
(2) Consider the map τ : K[x] → K[a] given by

$$\tau\Bigl(\sum_i k_i x^i\Bigr) = \sum_i k_i a^i.$$

Then τ is a ring epimorphism (see exercises), and

$$\ker(\tau) = \{ f(x) \in K[x] : f(a) = 0 \} = (m_a(x))$$

from the argument in the proof of part (1). It follows that

$$K[x]/(m_a(x)) \cong K[a].$$

Since $m_a(x)$ is irreducible, we have that $K[x]/(m_a(x))$ is a field and, therefore, K[a] = K(a).
(3) Let $n = \deg(m_a(x))$. We claim that the elements $1, a, \dots, a^{n-1}$ are a basis for K[a] =
K(a) over K. First suppose that

$$\sum_{i=0}^{n-1} c_i a^i = 0$$

with $c_i \in K$ and not all $c_i = 0$. Then h(a) = 0, where $h(x) = \sum_{i=0}^{n-1} c_i x^i$ is a nonzero
polynomial of degree less than n. But this contradicts
the fact that $m_a(x)$ has minimal degree among all nonzero polynomials in K[x] that have a as a
zero. Therefore, the set $1, a, \dots, a^{n-1}$ is linearly independent over K.
Now let $b \in K[a] \cong K[x]/(m_a(x))$. Then there is a g(x) ∈ K[x] with b = g(a). By the
division algorithm

$$g(x) = h(x) m_a(x) + r(x),$$

where r(x) = 0 or $\deg(r(x)) < \deg(m_a(x))$. Now

$$r(a) = g(a) - h(a) m_a(a) = g(a) = b.$$

If r(x) = 0, then b = 0. If r(x) ≠ 0, then since deg(r(x)) < n, we have

$$r(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$$

with $c_i \in K$, where some, but not all, of the $c_i$ may be zero. This implies that

$$b = r(a) = c_0 + c_1 a + \cdots + c_{n-1} a^{n-1}$$

and, hence, b is a linear combination over K of $1, a, \dots, a^{n-1}$. Hence, $1, a, \dots, a^{n-1}$ spans
K[a] over K and, hence, forms a basis.
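
The division step in part (3) is exactly how one computes in K(a) in practice: every element g(a) is replaced by r(a), where r is the remainder of g modulo the minimal polynomial. The Python sketch below is an illustration only. It assumes K = ℚ and a = 2^{1/3} with minimal polynomial $x^3 - 2$; the function name `poly_rem` and the coefficient-list representation (lowest degree first) are hypothetical choices.

```python
from fractions import Fraction

def poly_rem(g, m):
    """Remainder of g on division by the monic polynomial m.

    Polynomials are coefficient lists, lowest degree first.
    """
    n = len(m) - 1                       # n = deg(m)
    r = [Fraction(c) for c in g]
    while len(r) > n:
        lead = r.pop()                   # leading coefficient of the current r
        shift = len(r) - n               # subtract lead * x^shift * m(x)
        for i in range(n):
            r[shift + i] -= lead * m[i]
    return r + [Fraction(0)] * (n - len(r))

m_a = [-2, 0, 0, 1]                      # x^3 - 2 (assumed minimal polynomial)
g = [1, 1, 0, 0, 0, 1]                   # g(x) = x^5 + x + 1
r = poly_rem(g, m_a)                     # coordinates of g(a) in the basis 1, a, a^2
```

Here the remainder is $1 + x + 2x^2$, so $g(a) = 1 + a + 2a^2$, in agreement with $a^5 = a^2 \cdot a^3 = 2a^2$.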

Theorem 5.3.3. Suppose that L|K is a field extension and a ∈ L is algebraic over K. Sup-
pose that f (x) ∈ K[x] is a monic polynomial with f (a) = 0. Then f (x) is the minimal
polynomial if and only if f (x) is irreducible in K[x].

Proof. Suppose that f (x) is the minimal polynomial of a. Then f (x) is irreducible from
the previous theorem.
Conversely, suppose that f(x) is monic, irreducible and f(a) = 0. From the previous
theorem $m_a(x) \mid f(x)$. Since f(x) is irreducible and $m_a(x)$ is not a unit, we have $f(x) = c\,m_a(x)$ with c ∈ K. However,
since both f(x) and $m_a(x)$ are monic, we must have c = 1, and $f(x) = m_a(x)$.

We now show that a finite extension of K is actually finitely generated over K. In
addition, it is generated by finitely many algebraic elements.

Theorem 5.3.4. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a finite extension.
(2) L|K is an algebraic extension and there exist a1 , . . . , an ∈ L with L = K(a1 , . . . , an ).
(3) There exist algebraic elements a1 , . . . , an ∈ L such that L = K(a1 , . . . , an ).

Proof. (1) ⇒ (2). We have seen in Theorem 5.2.3 that a finite extension is algebraic.
Suppose that a1 , . . . , an are a basis for L over K. Then clearly L = K(a1 , . . . , an ).
(2) ⇒ (3). If L|K is an algebraic extension and L = K(a1 , . . . , an ), then each ai is
algebraic over K.
(3) ⇒ (1). Suppose that there exist algebraic elements a1 , . . . , an ∈ L such that
L = K(a1 , . . . , an ). We show that L|K is a finite extension. We do this by induction on n.
If n = 1, then L = K(a) for some algebraic element a, and the result follows from Theo-
rem 5.3.2. Suppose now that n ≥ 2. We assume then that an extension K(a1 , . . . , an−1 )
with a1 , . . . , an−1 algebraic elements is a finite extension. Now suppose that we have
L = K(a1 , . . . , an ) with a1 , . . . , an algebraic elements.
Then

$$|K(a_1, \dots, a_n) : K| = |K(a_1, \dots, a_{n-1})(a_n) : K(a_1, \dots, a_{n-1})| \cdot |K(a_1, \dots, a_{n-1}) : K|.$$

The second term |K(a1 , . . . , an−1 ) : K| is finite from the inductive hypothesis. The first
term |K(a1 , . . . , an−1 )(an ) : K(a1 , . . . , an−1 )| is also finite from Theorem 5.3.2 since it is
a simple extension of the field K(a1 , . . . , an−1 ) by the algebraic element an . Therefore,
|K(a1 , . . . , an ) : K| is finite.

Theorem 5.3.5. Suppose that K is a field and R is an integral domain with K ⊂ R. Then R
can be viewed as a vector space over K. If dimK (R) < ∞, then R is a field.

Proof. Let r0 ∈ R with r0 ≠ 0. Define the map from R to R given by

τ(r) = rr0 .

It is easy to show (see exercises) that this is a linear transformation from R to R, consid-
ered as a vector space over K.
Suppose that τ(r) = 0. Then rr0 = 0 and, hence, r = 0 since r0 ≠ 0 and R is an
integral domain. It follows that τ is an injective map. Since R is a finite-dimensional
vector space over K, and τ is an injective linear transformation, it follows that τ must
also be surjective. This implies that there exists an r1 with τ(r1 ) = 1. Then r1 r0 = 1 and,
hence, r0 has an inverse within R. Since r0 was an arbitrary nonzero element of R, it
follows that R is a field.

Theorem 5.3.6. Suppose that K ⊂ L ⊂ M is a chain of field extensions. Then M|K is
algebraic if and only if M|L is algebraic and L|K is algebraic.

Proof. If M|K is algebraic, then certainly M|L and L|K are algebraic.
Now suppose that M|L and L|K are algebraic. We show that M|K is algebraic. Let
a ∈ M. Then since a is algebraic over L, there exist $b_0, b_1, \dots, b_n \in L$, not all zero, with

$$b_0 + b_1 a + \cdots + b_n a^n = 0.$$

Each $b_i$ is algebraic over K and, hence, by Theorem 5.3.4, $K(b_0, \dots, b_n)$ is finite-dimensional over K.
Since a is algebraic over $K(b_0, \dots, b_n)$, Theorem 5.3.2 shows that $K(b_0, \dots, b_n)(a) = K(b_0, \dots, b_n, a)$
is finite-dimensional over $K(b_0, \dots, b_n)$ and, hence, also over K. Therefore,
$K(b_0, \dots, b_n, a)$ is a finite extension of K and, hence, an algebraic extension of K. Since
$a \in K(b_0, \dots, b_n, a)$, it follows that a is algebraic over K and, therefore, M is algebraic
over K.

5.4 Algebraic Closures


As before, suppose that L|K is a field extension. Since each element of K is algebraic over
K, there are certainly algebraic elements over K within L. Let 𝒜K denote the set of all
elements of L that are algebraic over K. We prove that 𝒜K is actually a subfield of L. It
is called the algebraic closure of K within L.

Theorem 5.4.1. Suppose that L|K is a field extension, and let 𝒜K denote the set of all ele-
ments of L that are algebraic over K. Then 𝒜K is a subfield of L. 𝒜K is called the algebraic
closure of K in L.

Proof. Since K ⊂ 𝒜K, we have that 𝒜K ≠ ∅. Let a, b ∈ 𝒜K. Since a, b are both algebraic
over K, Theorem 5.3.4 shows that K(a, b) is a finite extension of K. Therefore,
K(a, b) is an algebraic extension of K and, hence, each element of K(a, b) is algebraic
over K. Now a, b ∈ K(a, b), and K(a, b) is a field. Therefore, a ± b, ab, and, if b ≠ 0, a/b are
all in K(a, b) and, hence, all algebraic over K. Therefore, a ± b, ab, and a/b, if b ≠ 0, are all
in 𝒜K. It follows that 𝒜K is a subfield of L.

In Section 5.2, we showed that every finite extension is an algebraic extension. We
mentioned that the converse is not necessarily true; that is, there are algebraic extensions
that are not finite. Here we give an example.

Theorem 5.4.2. Let 𝒜 be the algebraic closure of the rational numbers ℚ within the com-
plex numbers ℂ. Then 𝒜 is an algebraic extension of ℚ, but |𝒜 : ℚ| = ∞.

Proof. From the previous theorem, 𝒜 is an algebraic extension of ℚ. We show that it
cannot be a finite extension.
By Eisenstein's criterion, the rational polynomial $f(x) = x^p + p$ is irreducible over ℚ
for any prime p. Let a be a zero in ℂ of f(x). Then a ∈ 𝒜, and |ℚ(a) : ℚ| = p by Theorem 5.3.2.
Since ℚ(a) ⊂ 𝒜, it follows that
|𝒜 : ℚ| ≥ p for all primes p. Since there are infinitely many primes, this implies that
|𝒜 : ℚ| = ∞.
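
The appeal to Eisenstein's criterion can be checked mechanically for any particular prime. The Python sketch below is an illustration only; the function names `eisenstein_applies` and `x_p_plus_p` are hypothetical, polynomials are integer coefficient lists with the lowest degree first, and the argument p is assumed to be prime (the code does not test primality).

```python
def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Eisenstein's criterion for the prime p.

    coeffs are integer coefficients, lowest degree first; p is assumed prime.
    """
    *lower, lead = coeffs
    return (
        lead % p != 0                          # p does not divide the leading coefficient
        and all(c % p == 0 for c in lower)     # p divides every other coefficient
        and lower[0] % (p * p) != 0            # p^2 does not divide the constant term
    )

def x_p_plus_p(p):
    """Coefficient list of x^p + p."""
    return [p] + [0] * (p - 1) + [1]

# The criterion applies to x^p + p for each of these primes.
checks = [eisenstein_applies(x_p_plus_p(p), p) for p in (2, 3, 5, 7, 11)]
```

The criterion is only a sufficient condition: when `eisenstein_applies` returns False for a given prime, nothing can be concluded about irreducibility.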

5.5 Algebraic and Transcendental Numbers


In this section, we consider the string of field extensions ℚ ⊂ ℝ ⊂ ℂ.

Definition 5.5.1. An algebraic number α is an element of ℂ that is algebraic over ℚ.
Hence, an algebraic number is an α ∈ ℂ such that f(α) = 0 for some nonzero f(x) ∈ ℚ[x]. If α ∈ ℂ
is not algebraic, it is transcendental.

We will let 𝒜 denote the totality of algebraic numbers within the complex numbers ℂ,
and 𝒯 the set of transcendentals, so that ℂ = 𝒜 ∪ 𝒯. In the language of the last
section, 𝒜 is the algebraic closure of ℚ within ℂ. As in the general case, if α ∈ ℂ is
algebraic, we will let $m_\alpha(x)$ denote the minimal polynomial of α over ℚ.
We now examine the sets 𝒜 and 𝒯 more closely. Since 𝒜 is precisely the algebraic
closure of ℚ in ℂ, we have from our general result that 𝒜 actually forms a subfield
of ℂ. Furthermore, since the intersection of subfields is again a subfield, it follows that
𝒜′ = 𝒜 ∩ ℝ, the set of real algebraic numbers, forms a subfield of the reals.

Theorem 5.5.2. The set 𝒜 of algebraic numbers forms a subfield of ℂ.
The subset 𝒜′ = 𝒜 ∩ ℝ of real algebraic numbers forms a subfield of ℝ.

Since each rational is algebraic, it is clear that there are algebraic numbers. Furthermore,
there are irrational algebraic numbers, √2 for example, since it is a zero of the
irreducible polynomial $x^2 - 2$ over ℚ. On the other hand, we have not examined the
question of whether transcendental numbers really exist. To show that any particular
complex number is transcendental is, in general, quite difficult. However, it is relatively
easy to show that there are uncountably infinitely many transcendentals.

Theorem 5.5.3. The set 𝒜 of algebraic numbers is countably infinite. Therefore, 𝒯 , the
set of transcendental numbers, and 𝒯 ′ = 𝒯 ∩ ℝ, the real transcendental numbers, are
uncountably infinite.

Proof. Let

$$\mathcal{P}_n = \{ f(x) \in \mathbb{Q}[x] : \deg(f(x)) \le n \}.$$

Since each $f(x) \in \mathcal{P}_n$ has the form $f(x) = q_0 + q_1 x + \cdots + q_n x^n$ with $q_i \in \mathbb{Q}$, we can identify a polynomial
of degree ≤ n with an (n + 1)-tuple $(q_0, q_1, \dots, q_n)$ of rational numbers. Therefore, the set
$\mathcal{P}_n$ has the same size as the (n + 1)-fold Cartesian product of ℚ:

$$\mathbb{Q}^{n+1} = \mathbb{Q} \times \mathbb{Q} \times \cdots \times \mathbb{Q}.$$

Since a finite Cartesian product of countable sets is still countable, it follows that $\mathcal{P}_n$ is
a countable set.
Now let

$$\mathcal{B}_n = \bigcup_{p(x) \in \mathcal{P}_n,\ p(x) \neq 0} \{\text{zeros of } p(x)\};$$

that is, $\mathcal{B}_n$ is the union of all zeros in ℂ of all nonzero rational polynomials of degree ≤ n. Since
each such p(x) has a maximum of n zeros, and since $\mathcal{P}_n$ is countable, it follows that $\mathcal{B}_n$
is a countable union of finite sets and, hence, is still countable. Now

$$\mathcal{A} = \bigcup_{n=1}^{\infty} \mathcal{B}_n,$$

so 𝒜 is a countable union of countable sets and is, therefore, countable.


Since both ℝ and ℂ are uncountably infinite, the second assertions follow directly
from the countability of 𝒜. If say 𝒯 were countable, then ℂ = 𝒜 ∪ 𝒯 would also be
countable, which is a contradiction.

From Theorem 5.5.3, we know that there exist infinitely many transcendental num-
bers. Liouville, in 1851, gave the first proof of the existence of transcendentals by exhibit-
ing a few. He gave the following as one example:

Theorem 5.5.4. The real number

$$c = \sum_{j=1}^{\infty} \frac{1}{10^{j!}}$$

is transcendental.

Proof. First of all, since $\frac{1}{10^{j!}} \le \frac{1}{10^j}$ for all j ≥ 1, and $\sum_{j=1}^{\infty} \frac{1}{10^j}$ is a convergent geometric series, it follows
from the comparison test that the infinite series defining c converges and defines
a real number. Furthermore, since $\sum_{j=1}^{\infty} \frac{1}{10^j} = \frac{1}{9}$, it follows that $c < \frac{1}{9} < 1$. Suppose that
c is algebraic so that g(c) = 0 for some nonzero rational polynomial g(x). Multiplying
through by the least common multiple of all the denominators in g(x), we may suppose
that f(c) = 0 for some integral polynomial $f(x) = \sum_{j=0}^{n} m_j x^j$ of degree n. Then c satisfies

$$\sum_{j=0}^{n} m_j c^j = 0$$

for some integers $m_0, \dots, m_n$.


If 0 < x < 1, then by the triangle inequality

$$|f'(x)| = \Bigl|\sum_{j=1}^{n} j m_j x^{j-1}\Bigr| \le \sum_{j=1}^{n} |j m_j| = B,$$

where B is a real constant depending only on the coefficients of f(x).
Now let

$$c_k = \sum_{j=1}^{k} \frac{1}{10^{j!}}$$

be the k-th partial sum for c. Then

$$|c - c_k| = \sum_{j=k+1}^{\infty} \frac{1}{10^{j!}} < 2 \cdot \frac{1}{10^{(k+1)!}}.$$

Apply the mean value theorem to f(x) at c and $c_k$ to obtain

$$|f(c) - f(c_k)| = |c - c_k|\,|f'(\zeta)|$$

for some ζ with $c_k < \zeta < c < 1$. Now since 0 < ζ < 1, we have

$$|c - c_k|\,|f'(\zeta)| < 2B \cdot \frac{1}{10^{(k+1)!}}.$$
On the other hand, since f(x) can have at most n zeros, it follows that for all k large
enough, we would have $f(c_k) \neq 0$. Since f(c) = 0, we have

$$|f(c) - f(c_k)| = |f(c_k)| = \Bigl|\sum_{j=0}^{n} m_j c_k^j\Bigr| \ge \frac{1}{10^{n k!}},$$

since for each j, $m_j c_k^j$ is a rational number with denominator $10^{j k!}$, so that $f(c_k)$ is a
nonzero rational number with denominator $10^{n k!}$. However, if k is chosen
sufficiently large and n is fixed, we have

$$\frac{1}{10^{n k!}} > \frac{2B}{10^{(k+1)!}},$$

contradicting the equality from the mean value theorem. Therefore, c is transcendental.
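
The two facts about the partial sums used in this proof can be observed with exact rational arithmetic. The Python sketch below is an illustration only; the function name `partial_sum` is hypothetical, and a later partial sum is used as a stand-in for c itself, which is an assumption made because c cannot be represented exactly.

```python
from fractions import Fraction
from math import factorial

def partial_sum(k):
    """Exact k-th partial sum c_k of the series for c."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, k + 1))

for k in range(1, 5):
    c_k = partial_sum(k)
    # c_k is a rational number whose denominator divides 10^(k!).
    assert 10 ** factorial(k) % c_k.denominator == 0
    # A later partial sum stands in for c; the tail stays below 2 / 10^((k+1)!).
    tail = partial_sum(k + 2) - c_k
    assert 0 < tail < Fraction(2, 10 ** factorial(k + 1))
```

The point of the proof is the mismatch in growth rates: the denominator of $f(c_k)$ grows like $10^{n k!}$, while the distance from $c_k$ to c shrinks like $10^{-(k+1)!}$, and (k + 1)! eventually exceeds n · k! for any fixed n.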

In 1873, Hermite proved that e is transcendental, whereas, in 1882, Lindemann
showed that π is transcendental. Schneider, in 1934, showed that $a^b$ is transcendental if
a and b are algebraic, a ≠ 0, 1, and b is irrational. In Chapter 20, we will prove that both e
and π are transcendental. An interesting open question is the following:
Is π transcendental over ℚ(e)?
To close this section, we show that in general if a ∈ L is transcendental over K, then
K(a)|K is isomorphic to the field of rational functions over K.

Theorem 5.5.5. Suppose that L|K is a field extension and a ∈ L is transcendental over K.
Then K(a)|K is isomorphic to K(x)|K. Here the isomorphism μ : K(x) → K(a) can be
chosen such that μ(x) = a.

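Proof. Define the map μ : K(x) → K(a) by

$$\mu\Bigl(\frac{f(x)}{g(x)}\Bigr) = \frac{f(a)}{g(a)}$$

for f(x), g(x) ∈ K[x] with g(x) ≠ 0. Since a is transcendental over K, we have g(a) ≠ 0, so μ is
well defined. Then μ is a homomorphism, and μ(x) = a. Since
μ ≠ 0 and K(x) is a field, μ is injective; since its image is a subfield of K(a) containing K and a,
μ is also surjective. It follows that μ is an isomorphism.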

5.6 Exercises
1. Let a ∈ ℂ with $a^3 - 2a + 2 = 0$ and $b = a^2 - a$. Compute the minimal polynomial
$m_b(x)$ of b over ℚ and compute the inverse of b in ℚ(a).
2. Determine the algebraic closure of ℝ in ℂ(x).
3. Let $a_n := \sqrt[2^n]{2} \in \mathbb{R}$, n = 1, 2, 3, . . . and $A := \{a_n : n \in \mathbb{N}\}$ and E := ℚ(A). Show the
following:
(i) $|\mathbb{Q}(a_n) : \mathbb{Q}| = 2^n$.
(ii) |E : ℚ| = ∞.
(iii) $E = \bigcup_{n=1}^{\infty} \mathbb{Q}(a_n)$.
(iv) E is algebraic over ℚ.
4. Determine |E : ℚ| for
(i) E = ℚ(√2, √−2).
(ii) E = ℚ(√3, √3 + √3).
3

(iii) $E = \mathbb{Q}\bigl(\frac{1+i}{\sqrt{2}}, \frac{-1+i}{\sqrt{2}}\bigr)$.
5. Show that ℚ(√2, √3) = {a + b√2 + c√3 + d √6 : a, b, c, d ∈ ℚ}. Determine the degree
of ℚ(√2, √3) over ℚ. Further show that ℚ(√2, √3) = ℚ(√2 + √3).
6. Let K, E be fields and a ∈ E be transcendental over K.
Show the following:
(i) Each element of K(a)|K, which is not in K, is transcendental over K.
(ii) $a^n$ is transcendental over K for each n > 1.
(iii) If $L := K\bigl(\frac{a^3}{a+1}\bigr)$, then a is algebraic over L. Determine the minimal polynomial
$m_a(x)$ of a over L.

7. Let K be a field and a ∈ K(x) \ K. Show the following:


(i) x is algebraic over K(a).
(ii) If L is a field with K ⊂ L ⊆ K(x) and if a ∈ L, then |K(x) : L| < ∞.
(iii) a is transcendental over K.
8. Suppose that a ∈ L is algebraic over K. Let

I = { f (x) ∈ K[x] : f (a) = 0}.

Since a is algebraic I ≠ 0. Prove that I is an ideal in K[x].


9. Prove that there are uncountably many transcendental numbers by showing
that the set 𝒜 of algebraic numbers is countable. To do this:
(i) Show that $\mathbb{Q}_n[x]$, the set of rational polynomials of degree ≤ n, is countable
(finite Cartesian product of countable sets).
(ii) Let $\mathcal{B}_n$ = {zeros of nonzero polynomials in $\mathbb{Q}_n[x]$}. Show that $\mathcal{B}_n$ is countable.
(iii) Show that $\mathcal{A} = \bigcup_{n=1}^{\infty} \mathcal{B}_n$ and conclude that 𝒜 is countable.
(iv) Show that the transcendental numbers are uncountable.
10. Consider the map τ : K[x] → K[a] given by

$$\tau\Bigl(\sum_i k_i x^i\Bigr) = \sum_i k_i a^i.$$

Show that τ is a ring epimorphism.


11. Suppose that K is a field and R is an integral domain with K ⊂ R. Then R can be
viewed as a vector space over K. Let r0 ∈ R with r0 ≠ 0. Define the map from R to R
given by

τ(r) = rr0 .

Show that this is a linear transformation from R to R, considered as a vector space


over K.
6 Field Extensions and Compass and Straightedge
Constructions
6.1 Geometric Constructions
Greek mathematicians in the classical period posed the problem of constructing certain
geometric figures in the Euclidean plane using only a straightedge and a compass. These
are known as geometric construction problems.
Recall from elementary geometry that using a straightedge and compass, it is pos-
sible to draw a line parallel to a given line segment through a given point, to extend a
given line segment, and to erect a perpendicular to a given line at a given point on that
line. There were other geometric construction problems for which the Greeks could not
find straightedge and compass solutions but, on the other hand, were never able to
prove that such constructions were impossible. In particular, there were four famous
insolvable (to the Greeks) construction problems. The first is the squaring of the circle. This
problem is, given a circle, to construct using straightedge and compass a square having
an area equal to that of the given circle. The second is the doubling of the cube. This problem
is, given a cube of given side length, to construct using a straightedge and compass,
a side of a cube having double the volume of the original cube. The third problem is the
trisection of an angle. This problem is to trisect a given angle using only a straightedge
and compass. The final problem is the construction of a regular n-gon. This problem
asks which regular n-gons can be constructed using only straightedge and compass.
By translating each of these problems into the language of field extensions, we can
show that each of the first three problems is insolvable in general, and we can give the
complete solution to the construction of the regular n-gons.

6.2 Constructible Numbers and Field Extensions


We now translate the geometric construction problems into the language of field exten-
sions. As a first step, we define a constructible number.

Definition 6.2.1. Suppose we are given a line segment of unit length. An α ∈ ℝ is con-
structible if we can construct a line segment of length |α|, in a finite number of steps,
from the unit segment using a straightedge and compass.

Our first result is that the set of all constructible numbers forms a subfield of ℝ.

Theorem 6.2.2. The set 𝒞 of all constructible numbers forms a subfield of ℝ. Furthermore,
ℚ ⊂ 𝒞.

Proof. Let 𝒞 be the set of all constructible numbers. Since the given unit length segment
is constructible, we have 1 ∈ 𝒞. Therefore, 𝒞 ≠ ∅. Thus, to show that it is a field, we must
show that it is closed under the field operations.

https://doi.org/10.1515/9783111142524-006

Suppose α, β are constructible. We must show then that α ± β, αβ, and α/β for β ≠ 0
are constructible. If α, β > 0, construct a line segment of length |α|. At one end of this
line segment, extend it by a segment of length |β|. This will construct a segment of length
α + β. Similarly, if α > β, lay off a segment of length |β| at the beginning of a segment of
length |α|. The remaining piece will be α − β. By considering cases, we can do this in the
same manner if either α or β, or both, are negative. These constructions are pictured in
Figure 6.1. Therefore, α ± β are constructible.

Figure 6.1: Addition of constructible numbers.

In Figure 6.2, we show how to construct αβ. Let the line segment OA have length |α|.
Consider a line L through O not coincident with OA. Let OB have length |β| as in the
diagram. Let P be on ray OB so that OP has length 1. Draw AP and then find Q on ray OA
such that BQ is parallel to AP. From similar triangles, we then have

$$\frac{|OP|}{|OB|} = \frac{|OA|}{|OQ|} \implies \frac{1}{|\beta|} = \frac{|\alpha|}{|OQ|}.$$

Then |OQ| = |α||β|, and so αβ is constructible.

Figure 6.2: Multiplication of constructible numbers.

A similar construction, pictured in Figure 6.3, shows that α/β for β ≠ 0 is con-
structible. Find OA, OB, OP as above. Now, connect A to B, and let PQ be parallel to AB.
From similar triangles again, we have

$$\frac{1}{|\beta|} = \frac{|OQ|}{|\alpha|} \implies \frac{|\alpha|}{|\beta|} = |OQ|.$$

Hence, α/β is constructible.


Therefore, 𝒞 is a subfield of ℝ. Since char 𝒞 = 0, it follows that ℚ ⊂ 𝒞 .

Figure 6.3: Inversion of constructible numbers.

Let us now consider how a constructible number is found in the plane. Starting at
the origin and using the unit length and the constructions above, we can locate any point
in the plane with rational coordinates. That is, we can construct the point $P = (q_1, q_2)$
with $q_1, q_2 \in \mathbb{Q}$. Using only straightedge and compass, any further point in the plane can
be determined in one of the following three ways:
1. The intersection point of two lines, each of which passes through two known points
each having rational coordinates.
2. The intersection point of a line passing through two known points having rational
coordinates and a circle, whose center has rational coordinates, and whose radius
squared is rational.
3. The intersection point of two circles, each of whose centers has rational coordinates,
and each of whose radii is the square root of a rational number.

Analytically, the first case involves the solution of a pair of linear equations, each with
rational coefficients and, thus, only leads to other rational numbers. In cases two and
three, we must solve equations of the form $x^2 + y^2 + ax + by + c = 0$, with a, b, c ∈ ℚ. These
will then be quadratic equations over ℚ and, thus, the solutions will either be in ℚ, or
in a quadratic extension ℚ(√α) of ℚ. Once a real quadratic extension of ℚ is found, the
process can be iterated. Conversely, using the altitude theorem, if α is constructible, so
is √α. A much more detailed description of the constructible numbers can be found in
[52]. We thus can prove the following theorem:

Theorem 6.2.3. If γ is constructible with γ ∉ ℚ, then there exist finitely many elements
$\alpha_1, \dots, \alpha_r \in \mathbb{R}$ with $\alpha_r = \gamma$ such that for i = 1, . . . , r, $\mathbb{Q}(\alpha_1, \dots, \alpha_i)$ is a quadratic
extension of $\mathbb{Q}(\alpha_1, \dots, \alpha_{i-1})$. In particular, $|\mathbb{Q}(\gamma) : \mathbb{Q}| = 2^n$ for some n ≥ 1.

Therefore, the constructible numbers are precisely those real numbers that are con-
tained in repeated quadratic extensions of ℚ. In the next section, we use this idea to
show the impossibility of the first three mentioned construction problems.

6.3 Four Classical Construction Problems


We now consider the aforementioned construction problems. Our main technique will
be to use Theorem 6.2.3. From this result, we have that if γ is constructible with γ ∉ ℚ,
then $|\mathbb{Q}(\gamma) : \mathbb{Q}| = 2^n$ for some n ≥ 1.

6.3.1 Squaring the Circle

Theorem 6.3.1. It is impossible to square the circle. That is, it is impossible in general,
given a circle, to construct using straightedge and compass a square having area equal to
that of the given circle.

Proof. Suppose the given circle has radius 1. It is then constructible and would have
an area of π. A corresponding square would then have to have a side of length √π. To
be constructible, a number α must have $|\mathbb{Q}(\alpha) : \mathbb{Q}| = 2^m < \infty$ and, hence, α must be
algebraic. However, π is transcendental, so √π is also transcendental (see Section 20.4) and
therefore not constructible.

6.3.2 The Doubling of the Cube

Theorem 6.3.2. It is impossible to double the cube. This means that it is impossible in
general, given a cube of given side length, to construct using a straightedge and compass,
a side of a cube having double the volume of the original cube.

Proof. Let the given side length be 1, so that the original volume is also 1. To double this,
we would have to construct a side of length $2^{1/3}$. However, $|\mathbb{Q}(2^{1/3}) : \mathbb{Q}| = 3$ since the
minimal polynomial over ℚ is $m_{2^{1/3}}(x) = x^3 - 2$. This is not a power of 2, so $2^{1/3}$ is not
constructible.

6.3.3 The Trisection of an Angle

Theorem 6.3.3. It is impossible to trisect an angle. This means that it is impossible, in
general, to trisect a given angle using only a straightedge and compass.

Proof. An angle θ is constructible if and only if a segment of length |cos θ| is constructible.
Since cos(π/3) = 1/2, the angle π/3 is constructible. We show that it cannot
be trisected by straightedge and compass.
The following trigonometric identity holds:

$$\cos(3\theta) = 4\cos^3(\theta) - 3\cos(\theta).$$

Let α = cos(π/9). From the above identity, we have $4\alpha^3 - 3\alpha - \frac{1}{2} = 0$.
The polynomial $4x^3 - 3x - \frac{1}{2}$ is irreducible over ℚ and, hence, the minimal polynomial
over ℚ is $m_\alpha(x) = x^3 - \frac{3}{4}x - \frac{1}{8}$. It follows that |ℚ(α) : ℚ| = 3; hence, α is not
constructible. Therefore, the corresponding angle π/9 is not constructible. Therefore,
π/3 is constructible, but it cannot be trisected.
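
The irreducibility claims for the two cubics in the proofs of Theorems 6.3.2 and 6.3.3 can be verified with the rational root test, since a cubic over ℚ without a rational zero has no linear factor and is therefore irreducible. The Python sketch below is an illustration only; the function names `divisors` and `rational_roots` are hypothetical, and the trisection polynomial is multiplied by 2 to obtain integer coefficients, which does not change its zeros.

```python
from fractions import Fraction
from math import cos, pi

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational zeros of an integer polynomial (lowest degree first, nonzero constant term)."""
    const, lead = coeffs[0], coeffs[-1]
    # Any rational zero in lowest terms is +-d/e with d | const and e | lead.
    candidates = {
        Fraction(sign * d, e)
        for d in divisors(const) for e in divisors(lead) for sign in (1, -1)
    }
    return sorted(
        r for r in candidates
        if sum(c * r ** i for i, c in enumerate(coeffs)) == 0
    )

doubling = [-2, 0, 0, 1]        # x^3 - 2
trisection = [-1, -6, 0, 8]     # 8x^3 - 6x - 1 = 2 * (4x^3 - 3x - 1/2)

# Floating-point check that cos(pi/9) is a zero of the trisection polynomial.
alpha = cos(pi / 9)
residual = 8 * alpha ** 3 - 6 * alpha - 1
```

Both calls to `rational_roots` return an empty list, so each cubic is irreducible over ℚ and the corresponding extension has degree 3, which is not a power of 2.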

6.3.4 Construction of a Regular n-Gon

The final construction problem we consider is the construction of regular n-gons. The
algebraic study of the constructibility of regular n-gons was initiated by Gauss in the
early part of the nineteenth century.
Notice first that a regular n-gon will be constructible for n ≥ 3 if and only if the
angle $\frac{2\pi}{n}$ is constructible, which is the case if and only if the length $\cos\frac{2\pi}{n}$ is a constructible
number. From our techniques, if $\cos\frac{2\pi}{n}$ is a constructible number, then necessarily
$|\mathbb{Q}(\cos(\frac{2\pi}{n})) : \mathbb{Q}| = 2^m$ for some m. After we discuss Galois theory, we will see that
this condition is also sufficient. Therefore, $\cos\frac{2\pi}{n}$ is a constructible number if and only
if $|\mathbb{Q}(\cos(\frac{2\pi}{n})) : \mathbb{Q}| = 2^m$ for some m.
The solution of this problem, that is, the determination of when $|\mathbb{Q}(\cos(\frac{2\pi}{n})) : \mathbb{Q}| = 2^m$,
involves two concepts from number theory: the Euler phi-function and Fermat primes.

Definition 6.3.4. For any natural number n, the Euler phi-function is defined by

ϕ(n) = number of positive integers less than or equal to n and relatively prime to n.

Example 6.3.5. ϕ(6) = 2 since among 1, 2, 3, 4, 5, 6 only 1, 5 are relatively prime to 6.
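
The definition translates directly into a count. The Python sketch below is an illustration only; the function name `phi` is a hypothetical choice, and the brute-force count is meant for small n rather than as an efficient method.

```python
from math import gcd

def phi(n):
    """Number of integers a with 1 <= a <= n and gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

# The integers counted by phi(6).
coprime_to_6 = [a for a in range(1, 7) if gcd(a, 6) == 1]
```

For n = 6 the list contains exactly the two integers named in Example 6.3.5.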

It is fairly straightforward to develop a formula for ϕ(n). A formula is first determined
for primes and for prime powers, and then pasted back together via the fundamental
theorem of arithmetic.

Lemma 6.3.6. For any prime p and m > 0,

$$\phi(p^m) = p^m - p^{m-1} = p^m\Bigl(1 - \frac{1}{p}\Bigr).$$

Proof. If 1 ≤ a ≤ p, then either a = p, or (a, p) = 1. It follows that the positive integers
less than or equal to $p^m$ that are not relatively prime to $p^m$ are precisely the multiples
of p; that is, $p, 2p, 3p, \dots, p^{m-1} \cdot p$. There are $p^{m-1}$ of these, and all other positive $a < p^m$
are relatively prime to $p^m$.
Hence, the number of positive integers less than or equal to $p^m$ that are relatively prime to $p^m$ is

$$p^m - p^{m-1}.$$

Lemma 6.3.7. If (a, b) = 1, then ϕ(ab) = ϕ(a)ϕ(b).

Proof. Given a natural number n, a reduced residue system modulo n is a set of integers
$x_1, \dots, x_k$ such that each $x_i$ is relatively prime to n, $x_i \not\equiv x_j$ modulo n unless i = j, and if
(x, n) = 1 for some integer x, then $x \equiv x_i \pmod{n}$ for some i. Clearly, ϕ(n) is the size of a
reduced residue system modulo n.
Let $R_a = \{x_1, \dots, x_{\phi(a)}\}$ be a reduced residue system modulo a, $R_b = \{y_1, \dots, y_{\phi(b)}\}$ be
a reduced residue system modulo b, and let

$$S = \{a y_i + b x_j : i = 1, \dots, \phi(b),\ j = 1, \dots, \phi(a)\}.$$



We claim that S is a reduced residue system modulo ab. Since S has ϕ(a)ϕ(b) elements,
it will follow that ϕ(ab) = ϕ(a)ϕ(b).
To show that S is a reduced residue system modulo ab, we must show three things:
first that each x ∈ S is relatively prime to ab; second that the elements of S are distinct;
and, finally, that given any integer n with (n, ab) = 1, then n ≡ s (mod ab) for some s ∈ S.
Let $x = a y_i + b x_j$. Then since $(x_j, a) = 1$ and (a, b) = 1, it follows that (x, a) = 1.
Analogously, (x, b) = 1. Since x is relatively prime to both a and b, we have (x, ab) = 1.
This shows that each element of S is relatively prime to ab.
Next suppose that

$$a y_i + b x_j \equiv a y_k + b x_l \pmod{ab}.$$

Then

$$ab \mid (a y_i + b x_j) - (a y_k + b x_l) \implies a y_i \equiv a y_k \pmod{b}.$$

Since (a, b) = 1, it follows that $y_i \equiv y_k \pmod{b}$. But then $y_i = y_k$ since $R_b$ is a reduced residue
system. Similarly, $x_j = x_l$. This shows that the elements of S are distinct modulo ab.
Finally, suppose (n, ab) = 1. Since (a, b) = 1, there exist integers x, y with ax + by = 1. Then

$$anx + bny = n.$$

Since (x, b) = 1 and (n, b) = 1, it follows that (nx, b) = 1. Therefore, there is a $y_i \in R_b$ with
$nx = y_i + tb$ for some integer t. In the same manner, (ny, a) = 1, and so there is an $x_j \in R_a$
with $ny = x_j + ua$ for some integer u. Then

$$a(y_i + tb) + b(x_j + ua) = n \implies n = a y_i + b x_j + (t + u)ab \implies n \equiv a y_i + b x_j \pmod{ab},$$

and we are done.
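
The construction of S in this proof can be tested on small cases. The Python sketch below is an illustration only; the function names `reduced_residues` and `combined_system` are hypothetical, and the reduced residue systems are taken to be the standard representatives between 1 and n, which is one choice among many.

```python
from math import gcd

def reduced_residues(n):
    """Standard reduced residue system modulo n: 1 <= x <= n with gcd(x, n) = 1."""
    return [x for x in range(1, n + 1) if gcd(x, n) == 1]

def combined_system(a, b):
    """Residues modulo ab of the elements a*y + b*x with y in R_b and x in R_a."""
    return {
        (a * y + b * x) % (a * b)
        for y in reduced_residues(b)
        for x in reduced_residues(a)
    }

# With (a, b) = (4, 9) the set S should be a reduced residue system modulo 36.
S = combined_system(4, 9)
```

For coprime a and b the set has ϕ(a)ϕ(b) distinct residues and coincides with the reduced residues modulo ab. When a and b share a factor, the construction fails, which shows where the hypothesis (a, b) = 1 is used.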

We now give the general formula for ϕ(n).

Theorem 6.3.8. Suppose $n = p_1^{e_1} \cdots p_k^{e_k}$ with distinct primes $p_1, \dots, p_k$; then

$$\phi(n) = (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}).$$

Proof. From the previous two lemmas, we have

$$\begin{aligned}
\phi(n) &= \phi(p_1^{e_1}) \phi(p_2^{e_2}) \cdots \phi(p_k^{e_k}) \\
&= (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}) \\
&= p_1^{e_1}(1 - 1/p_1) \cdots p_k^{e_k}(1 - 1/p_k) = p_1^{e_1} \cdots p_k^{e_k} \cdot (1 - 1/p_1) \cdots (1 - 1/p_k) \\
&= n \prod_i (1 - 1/p_i).
\end{aligned}$$

Example 6.3.9. Determine ϕ(126). Now

$$126 = 2 \cdot 3^2 \cdot 7 \implies \phi(126) = \phi(2)\phi(3^2)\phi(7) = (1)(3^2 - 3)(6) = 36.$$

Hence, there are 36 units in $\mathbb{Z}_{126}$.
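
The product formula of Theorem 6.3.8 gives a much faster way to compute ϕ(n) than counting. The Python sketch below is an illustration only; the function names `prime_factorization`, `phi_formula`, and `phi_count` are hypothetical, and trial division is assumed to be adequate for the small inputs used here.

```python
from math import gcd

def prime_factorization(n):
    """Return {p: e} with n equal to the product of p**e, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                              # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi_formula(n):
    """phi(n) as the product of p^e - p^(e-1) over the prime powers dividing n."""
    result = 1
    for p, e in prime_factorization(n).items():
        result *= p ** e - p ** (e - 1)
    return result

def phi_count(n):
    """phi(n) directly from the definition, for comparison."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

value = phi_formula(126)                   # uses 126 = 2 * 3^2 * 7
```

The two functions agree on every input tried, and `phi_formula(126)` reproduces the value 36 of Example 6.3.9.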

An interesting result with many generalizations in number theory is the following:

Theorem 6.3.10. For n > 1, summing over all divisors d ≥ 1 of n,

$$\sum_{d \mid n} \phi(d) = n.$$

Proof. We first prove the theorem for prime powers and then paste together via the
fundamental theorem of arithmetic.
Suppose that $n = p^e$ for p a prime. Then the divisors of n are $1, p, p^2, \dots, p^e$, so

$$\sum_{d \mid n} \phi(d) = \phi(1) + \phi(p) + \phi(p^2) + \cdots + \phi(p^e) = 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1}).$$

Notice that this sum telescopes; that is, 1 + (p − 1) = p, $p + (p^2 - p) = p^2$, and so on.
Hence, the sum is just $p^e$, and the result is proved for n a prime power.
We now do an induction on the number of distinct prime factors of n. The above
argument shows that the result is true if n has only one distinct prime factor. Assume
that the result is true whenever an integer has fewer than k distinct prime factors, and
suppose $n = p_1^{e_1} \cdots p_k^{e_k}$ has k distinct prime factors. Then $n = p^e c$, where $p = p_1$, $e = e_1$,
and c has fewer than k distinct prime factors. By the inductive hypothesis

\[ \sum_{d \mid c} \phi(d) = c. \]

Since (c, p) = 1, the divisors of n are all of the form $p^{\alpha} d_1$, where $d_1 \mid c$, and
$\alpha = 0, 1, \ldots, e$. It follows that

\[ \sum_{d \mid n} \phi(d) = \sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e d_1). \]

Since $(d_1, p^{\alpha}) = 1$ for any divisor $d_1$ of c, this sum equals

\begin{align*}
&\sum_{d_1 \mid c} \phi(d_1) + \sum_{d_1 \mid c} \phi(p)\phi(d_1) + \cdots + \sum_{d_1 \mid c} \phi(p^e)\phi(d_1) \\
&\quad = \sum_{d_1 \mid c} \phi(d_1) + (p - 1) \sum_{d_1 \mid c} \phi(d_1) + \cdots + (p^e - p^{e-1}) \sum_{d_1 \mid c} \phi(d_1) \\
&\quad = c + (p - 1)c + (p^2 - p)c + \cdots + (p^e - p^{e-1})c.
\end{align*}

As in the case of prime powers, this sum telescopes, giving a final result

\[ \sum_{d \mid n} \phi(d) = p^e c = n. \]

Example 6.3.11. Consider n = 10. The divisors are 1, 2, 5, 10. Then ϕ(1) = 1, ϕ(2) = 1,
ϕ(5) = 4, and ϕ(10) = 4. Then

ϕ(1) + ϕ(2) + ϕ(5) + ϕ(10) = 1 + 1 + 4 + 4 = 10.
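The divisor-sum identity of Theorem 6.3.10 is easy to test by brute force. The sketch below assumes nothing beyond the definition of ϕ; the helper names `phi` and `phi_divisor_sum` are illustrative only.

```python
from math import gcd

def phi(n):
    """Euler phi-function: the number of a in 1..n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_divisor_sum(n):
    """Sum of phi(d) over all positive divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)

print([phi(d) for d in (1, 2, 5, 10)])  # [1, 1, 4, 4], as in Example 6.3.11
print(phi_divisor_sum(10))              # 10
```

A finite check of this kind is of course not a proof; it only confirms the statement for the values tried.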

We will see later in the book that the Euler phi-function plays an important role in
the structure theory of Abelian groups.
We now turn to Fermat primes.

Definition 6.3.12. The Fermat numbers are the sequence $(F_n)$ of positive integers defined by

\[ F_n = 2^{2^n} + 1, \quad n = 0, 1, 2, 3, \ldots. \]

If a particular $F_n$ is prime, it is called a Fermat prime.

Fermat believed that all the numbers in this sequence were primes. In fact, $F_0$, $F_1$,
$F_2$, $F_3$, $F_4$ are all primes, but $F_5$ is composite and divisible by 641 (see exercises). It is still
an open question whether or not there are infinitely many Fermat primes. It has been
conjectured that there are only finitely many. On the other hand, if a number of the form
$2^n + 1$ is a prime for some integer n, then it must be a Fermat prime.
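The first few Fermat numbers are small enough to inspect directly. The following sketch uses plain trial division; `fermat_number` and `is_prime` are hypothetical helper names, and the check is a numerical illustration rather than the argument asked for in the exercises.

```python
def fermat_number(n):
    """Return F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Primality test by trial division (adequate for the small cases below)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

for n in range(6):
    print(n, fermat_number(n), is_prime(fermat_number(n)))

# F_5 factors as 641 * 6700417.
print(fermat_number(5) == 641 * 6700417)  # True
```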

Theorem 6.3.13. If a ≥ 2 and $a^n + 1$ is a prime for some n ≥ 1, then a is even, and $n = 2^m$
for some nonnegative integer m. In particular, if $p = 2^k + 1$ is a prime for some k ≥ 1, then
$k = 2^n$ for some n, and p is a Fermat prime.

Proof. If a is odd then $a^n + 1$ is even and greater than 2 and, hence, not a prime. Suppose then that a is even
and n = kl with k odd and k ≥ 3. Then

\[ \frac{a^{kl} + 1}{a^l + 1} = a^{(k-1)l} - a^{(k-2)l} + \cdots + 1. \]

Therefore, $a^l + 1$ is a proper divisor of $a^{kl} + 1$ if k ≥ 3. Hence, if $a^n + 1$ is a prime, n can have
no odd factor k ≥ 3, and we must have $n = 2^m$.
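The factorization used in this proof can be spot-checked with exact integer arithmetic. This is only a sketch; `alternating_quotient` is a name introduced here for the alternating sum on the right-hand side.

```python
def alternating_quotient(a, k, l):
    """Return a^((k-1)l) - a^((k-2)l) + ... + 1 for odd k."""
    return sum((-1) ** j * a ** ((k - 1 - j) * l) for j in range(k))

# Example: n = 6 = 3 * 2 has the odd factor k = 3, so 2^2 + 1 = 5 divides 2^6 + 1 = 65.
print(alternating_quotient(2, 3, 2))                         # 13
print((2 ** 2 + 1) * alternating_quotient(2, 3, 2) == 2 ** 6 + 1)  # True
```

The identity holds only for odd k; for even k the last term of the alternating sum would be −1 and the product would equal $a^{kl} - 1$ instead.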

We can now state the solution to the constructibility of regular n-gons.

Theorem 6.3.14. A regular n-gon is constructible with a straightedge and compass if and
only if $n = 2^m p_1 \cdots p_k$, where $p_1, \ldots, p_k$ are distinct Fermat primes.

Before proving the theorem, notice, for example, that a regular 20-gon is constructible
since $20 = 2^2 \cdot 5$, and 5 is a Fermat prime. On the other hand, a regular 11-gon
is not constructible, since 11 is not a Fermat prime.
Proof. Let $\mu = e^{2\pi i/n}$ be a primitive n-th root of unity. Since

\[ e^{2\pi i/n} = \cos\Big(\frac{2\pi}{n}\Big) + i \sin\Big(\frac{2\pi}{n}\Big), \]

it is easy to compute that (see exercises)

\[ \mu + \frac{1}{\mu} = 2\cos\Big(\frac{2\pi}{n}\Big). \]

Therefore, $\mathbb{Q}(\mu + \frac{1}{\mu}) = \mathbb{Q}(\cos(\frac{2\pi}{n}))$. After we discuss Galois theory in more detail, we will
prove that

\[ \Big|\mathbb{Q}\Big(\mu + \frac{1}{\mu}\Big) : \mathbb{Q}\Big| = \frac{\phi(n)}{2}, \]

where ϕ(n) is the Euler phi-function. Therefore, $\cos(\frac{2\pi}{n})$ is constructible if and only if
$\frac{\phi(n)}{2}$ and, hence, ϕ(n) is a power of 2.
Suppose that $n = 2^m p_1^{e_1} \cdots p_k^{e_k}$, all $p_i$ odd primes. Then from Theorem 6.3.8,

\[ \phi(n) = 2^{m-1} \cdot (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}), \]

where the factor $2^{m-1}$ is omitted if m = 0. If this is a power of 2, each factor must also be a power of 2. Now

\[ p_i^{e_i} - p_i^{e_i - 1} = p_i^{e_i - 1}(p_i - 1). \]

If this is to be a power of 2, we must have $e_i = 1$ and $p_i - 1 = 2^{k_i}$ for some $k_i$. Therefore,
each prime occurs only to the first power, and $p_i = 2^{k_i} + 1$ is a Fermat prime, proving the
theorem.
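The two conditions appearing in this proof, "ϕ(n) is a power of 2" and "n is a power of 2 times distinct Fermat primes", can be compared numerically for small n. The sketch below assumes the list of the five known Fermat primes; the names `FERMAT_PRIMES`, `is_power_of_two`, and `has_fermat_form` are illustrative.

```python
from math import gcd

FERMAT_PRIMES = [3, 5, 17, 257, 65537]  # the Fermat primes F_0, ..., F_4

def phi(n):
    """Euler phi-function by direct counting."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def is_power_of_two(m):
    """True if m = 2^j for some j >= 0."""
    return m >= 1 and m & (m - 1) == 0

def has_fermat_form(n):
    """True if n = 2^m * p_1 ... p_k with distinct primes p_i from FERMAT_PRIMES."""
    for p in FERMAT_PRIMES:
        if n % p == 0:
            n //= p  # remove p once; a repeated factor p remains and fails below
    return is_power_of_two(n)

# Regular n-gons with 3 <= n <= 20 that are constructible by Theorem 6.3.14.
print([n for n in range(3, 21) if has_fermat_form(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```

For every n in the tested range the two criteria agree, which matches the equivalence established in the proof.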

6.4 Exercises
1. Let ϕ be a given angle. In which of the following cases is the angle ψ constructible
from the angle ϕ by compass and straightedge?
(a) $\phi = \frac{\pi}{13}$ and $\psi = \frac{\pi}{26}$.
(b) $\phi = \frac{\pi}{33}$ and $\psi = \frac{\pi}{11}$.
(c) $\phi = \frac{\pi}{7}$ and $\psi = \frac{\pi}{12}$.
2. (The golden section) In the plane, let AB be a given segment from A to B with length a.
The segment AB should be divided such that the proportion of AB to the length of the
bigger subsegment is equal to the proportion of the length of the bigger subsegment
to the length of the smaller subsegment:

\[ \frac{a}{b} = \frac{b}{a - b}, \]

where b is the length of the bigger subsegment. Such a division is called division by
the golden section. If we write b = ax, 0 < x < 1, then $\frac{1}{x} = \frac{x}{1 - x}$, that is, $x^2 = 1 - x$. Do
the following:
(a) Show that $\frac{1}{x} = \frac{1 + \sqrt{5}}{2} = \alpha$.


(b) Construct the division of AB by the golden section with compass and straight-
edge.
(c) If we divide the radius r > 0 of a circle by the golden section, then the bigger
part of the so divided radius is the side of the regular 10-gon with its 10 vertices
on the circle.
3. Given a regular 10-gon such that the 10 vertices are on the circle with radius R > 0.
Show that the length of each side is equal to the bigger part of the radius divided by
the golden section. Describe the procedure of the construction of the regular 10-gon
and 5-gon.
4. Construct the regular 17-gon with compass and straightedge.
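Hint: We have to construct the number $\frac{1}{2}(\omega + \omega^{-1}) = \cos\frac{2\pi}{17}$, where $\omega = e^{2\pi i/17}$. First,
construct the positive zero $\omega_1$ of the polynomial $x^2 + x - 4$; we get

\[ \omega_1 = \frac{1}{2}(\sqrt{17} - 1) = \omega + \omega^{-1} + \omega^2 + \omega^{-2} + \omega^4 + \omega^{-4} + \omega^8 + \omega^{-8}. \]

Then, construct the positive zero $\omega_2$ of the polynomial $x^2 - \omega_1 x - 1$; we get

\[ \omega_2 = \frac{1}{4}\Big(\sqrt{17} - 1 + \sqrt{34 - 2\sqrt{17}}\Big) = \omega + \omega^{-1} + \omega^4 + \omega^{-4}. \]

From $\omega_1$ and $\omega_2$, construct $\beta = \frac{1}{2}(\omega_2^2 - \omega_1 + \omega_2 - 4)$. Then $\omega_3 = 2\cos\frac{2\pi}{17}$ is the bigger
of the two positive zeros of the polynomial $x^2 - \omega_2 x + \beta$.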
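5. The Fibonacci numbers $f_n$ are defined by $f_0 = 0$, $f_1 = 1$ and $f_{n+2} = f_{n+1} + f_n$ for
n ∈ ℕ ∪ {0}. Show the following:
(a) $f_n = \frac{\alpha^n - \beta^n}{\alpha - \beta}$ with $\alpha = \frac{1 + \sqrt{5}}{2}$, $\beta = \frac{1 - \sqrt{5}}{2}$.
(b) $\big(\frac{f_{n+1}}{f_n}\big)_{n \in \mathbb{N}}$ converges and $\lim_{n \to \infty} \frac{f_{n+1}}{f_n} = \frac{1 + \sqrt{5}}{2} = \alpha$.
(c) $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n = \begin{pmatrix} f_{n-1} & f_n \\ f_n & f_{n+1} \end{pmatrix}$, n ∈ ℕ.
(d) $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$, n ≥ 1.
(e) $f_{n-1} f_{n+1} - f_n^2 = (-1)^n$, n ∈ ℕ.
(f) $f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}$, n ∈ ℕ.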
6. Show: The Fermat numbers $F_0$, $F_1$, $F_2$, $F_3$, $F_4$ are all prime but $F_5$ is composite and
divisible by 641.
7. Let $\mu = e^{2\pi i/n}$ be a primitive n-th root of unity. Using

\[ e^{2\pi i/n} = \cos\Big(\frac{2\pi}{n}\Big) + i \sin\Big(\frac{2\pi}{n}\Big), \]

show that

\[ \mu + \frac{1}{\mu} = 2\cos\Big(\frac{2\pi}{n}\Big). \]
7 Kronecker’s Theorem and Algebraic Closures
7.1 Kronecker’s Theorem
In the last chapter, we proved that if L|K is a field extension, then there exists an inter-
mediate field K ⊂ 𝒜 ⊂ L such that 𝒜 is algebraic over K, and contains all the elements of
L that are algebraic over K. We call 𝒜 the algebraic closure of K within L. In this chapter,
we prove that starting with any field K, we can construct an extension field K̄ that is
algebraic over K and is algebraically closed. By this, we mean that there are no proper algebraic
extensions of K̄ or, equivalently, that there are no irreducible nonlinear polynomials in
K̄[x]. In the final section of this chapter, we will give a proof of the famous fundamental
theorem of algebra, which in the language of this chapter says that the field ℂ of com-
plex numbers is algebraically closed. We will present another proof of this important
result later in the book after we discuss Galois theory.
First, we need the following crucial result of Kronecker, which says that given a
polynomial f (x) in K[x], where K is a field, we can construct an extension field L of K, in
which f (x) has a zero α. We say that L has been constructed by adjoining α to K. Recall
that if f (x) ∈ K[x] is irreducible of degree greater than one, then f (x) can have no zeros in K. We first need the
following concept:

Definition 7.1.1. Let L|K and L′ |K be field extensions. Then a K-isomorphism is an
isomorphism τ : L → L′ that is the identity map on K; that is, it fixes each element of K.

Theorem 7.1.2 (Kronecker’s theorem). Let K be a field and f (x) ∈ K[x] a nonconstant polynomial. Then there exists
a finite extension K ′ of K, in which f (x) has a zero.

Proof. Suppose that f (x) ∈ K[x]. We know that f (x) factors into irreducible polynomials.
Let p(x) be an irreducible factor of f (x). From the material in Chapter 4, we know that
since p(x) is irreducible, the principal ideal ⟨p(x)⟩ in K[x] is a maximal ideal. To see this,
suppose that g(x) ∉ ⟨p(x)⟩, so that g(x) is not a multiple of p(x). Since p(x) is irreducible,
it follows that (p(x), g(x)) = 1. Thus, there exist h(x), k(x) ∈ K[x] with

h(x)p(x) + k(x)g(x) = 1.

The element on the left is in the ideal ⟨g(x), p(x)⟩, so the identity, 1, is in this ideal.
Therefore, this ideal is the whole ring K[x]. Since g(x) was arbitrary, this implies that the
principal ideal ⟨p(x)⟩ is maximal.
Now let K ′ = K[x]/⟨p(x)⟩. Since ⟨p(x)⟩ is a maximal ideal, it follows that K ′ is a field.
We show that K can be embedded in K ′ , and that p(x) has a zero in K ′ .
First, consider the map α : K[x] → K ′ by α(f (x)) = f (x) + ⟨p(x)⟩. This is a homo-
morphism. Since the identity element 1 ∈ K is not in ⟨p(x)⟩, it follows that α restricted
to K is nontrivial. Therefore, α restricted to K is a monomorphism since if ker(α|K ) ≠ K
then ker(α|K ) = {0}. Therefore, K can be embedded into α(K), which is contained in K ′ .
Therefore, K ′ can be considered as an extension field of K. Consider the element a =

https://doi.org/10.1515/9783111142524-007

x + ⟨p(x)⟩ ∈ K ′ . Then p(a) = p(x) + ⟨p(x)⟩ = 0 + ⟨p(x)⟩ since p(x) ∈ ⟨p(x)⟩. But 0 + ⟨p(x)⟩
is the zero element 0 of the factor ring K[x]/⟨p(x)⟩. Therefore, in K ′ , we have p(a) = 0;
hence, p(x) has a zero in K ′ . Since p(x) divides f (x), we must have f (a) = 0 in K ′ also.
Therefore, we have constructed an extension field of K, in which f (x) has a zero.
In conformity to Chapter 5, we write K(a) for the field adjunction of a = x + ⟨p(x)⟩
to K. We now outline an intuitive construction, from which we say that the field K ′ is
constructed by adjoining the zero α to K. We remark that this construction is not a
formally correct proof like the one given for Theorem 7.1.2.
We can assume that f (x) is irreducible. Suppose that $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ with
$a_n \neq 0$. Define α to satisfy

\[ a_0 + a_1 \alpha + \cdots + a_n \alpha^n = 0. \]

Now, define K ′ = K(α) in the following manner. We let

\[ K(\alpha) = \{c_0 + c_1 \alpha + \cdots + c_{n-1} \alpha^{n-1} : c_i \in K\}. \]

Then on K(α), define addition and subtraction componentwise, and define multiplication
by algebraic manipulation, replacing the power $\alpha^n$ and all higher powers of α by using

\[ \alpha^n = \frac{-a_0 - a_1 \alpha - \cdots - a_{n-1} \alpha^{n-1}}{a_n}. \]

We claim that K ′ = K(α) then forms a field of finite degree over K. The basic
ring properties follow easily by computation (see exercises) using the definitions. We
must show then that every nonzero element of K(α) has a multiplicative inverse. Let
g(α) ∈ K(α) be nonzero. Then the corresponding polynomial g(x) ∈ K[x] is a nonzero polynomial of
degree ≤ n − 1. Since f (x) is irreducible of degree n, it follows that f (x) and g(x) must be
relatively prime; that is, (f (x), g(x)) = 1. Hence, there exist a(x), b(x) ∈ K[x] with

a(x)f (x) + b(x)g(x) = 1.

Evaluate these polynomials at α to get

a(α)f (α) + b(α)g(α) = 1.

Since by definition we have f (α) = 0, this becomes

b(α)g(α) = 1.

Now b(α) might have degree higher than n − 1 in α. However, using the relation that
f (α) = 0, we can rewrite b(α) as b̄(α), where b̄(α) now has degree ≤ n − 1 in α and, hence,
is in K(α). Therefore,

b̄(α)g(α) = 1;

hence, g(α) has a multiplicative inverse. It follows that K(α) is a field and, by definition,
f (α) = 0. The elements $1, \alpha, \ldots, \alpha^{n-1}$ form a basis for K(α) over K and, hence,

\[ |K(\alpha) : K| = n. \]

Example 7.1.3. Let $f(x) = x^2 + 1 \in \mathbb{R}[x]$. This is irreducible over ℝ. We construct the
field, in which this has a zero. Let $K' = \mathbb{R}[x]/\langle x^2 + 1 \rangle$, and let α ∈ K ′ with f (α) = 0. The
extension field ℝ(α) then has the form

\[ K' = \mathbb{R}(\alpha) = \{x + \alpha y : x, y \in \mathbb{R},\ \alpha^2 = -1\}. \]

It is clear that this field is ℝ-isomorphic to the complex numbers ℂ; ℝ(α) ≅ ℝ(i) ≅ ℂ.
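The intuitive construction of K(α) can be made concrete in a few lines. The sketch below takes K = ℚ (modelled with Python's `Fraction`) as an assumption, stores an element $c_0 + c_1 \alpha + \cdots + c_{n-1} \alpha^{n-1}$ as the coefficient list `[c_0, ..., c_{n-1}]`, and multiplies by reducing with f (α) = 0. The names `reduce_mod` and `multiply` are illustrative and do not come from the text.

```python
from fractions import Fraction

def reduce_mod(poly, f):
    """Reduce poly (coefficients, lowest degree first) using the relation f(alpha) = 0."""
    poly = [Fraction(c) for c in poly]
    n = len(f) - 1  # degree of f; f[n] is the nonzero leading coefficient a_n
    for k in range(len(poly) - 1, n - 1, -1):
        coeff = poly[k] / f[n]
        # Subtract coeff * alpha^(k-n) * f(alpha), which is zero, to remove alpha^k.
        for i in range(n + 1):
            poly[k - n + i] -= coeff * f[i]
    result = poly[:n]
    return result + [Fraction(0)] * (n - len(result))

def multiply(g, h, f):
    """Multiply two elements of K(alpha) and reduce the product modulo f."""
    prod = [Fraction(0)] * (len(g) + len(h) - 1)
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            prod[i + j] += Fraction(gi) * Fraction(hj)
    return reduce_mod(prod, f)

f = [1, 0, 1]  # f(x) = x^2 + 1, so alpha^2 = -1 as in Example 7.1.3

# (1 + 2*alpha)(3 + 4*alpha) = -5 + 10*alpha, matching (1 + 2i)(3 + 4i) in the complex numbers.
print(multiply([1, 2], [3, 4], f) == [-5, 10])  # True

# The inverse of 1 + 2*alpha is (1 - 2*alpha)/5.
print(multiply([1, 2], [Fraction(1, 5), Fraction(-2, 5)], f) == [1, 0])  # True
```

The same two functions work for any polynomial f of degree n ≥ 1 over ℚ; they produce a field only when f is irreducible, which is exactly the hypothesis used in the inverse argument above.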

Theorem 7.1.4. Let p(x) ∈ K[x] be an irreducible polynomial, and let K ′ = K(α) be the
extension field of K constructed in Kronecker’s theorem, in which p(x) has a zero α. Let L
be an extension field of K, and suppose that a ∈ L is algebraic with minimal polynomial
$m_a(x) = p(x)$. Then K(α) is K-isomorphic to K(a).

Proof. If L|K is a field extension and a ∈ L with p(a) = 0 and if deg(p(x)) = n, then the
elements $1, a, \ldots, a^{n-1}$ constitute a basis for K(a) over K, and the elements $1, \alpha, \ldots, \alpha^{n-1}$
constitute a basis for K(α) over K. The mapping

τ : K(a) → K(α)

defined by τ(k) = k if k ∈ K and τ(a) = α, and then extended by linearity, is easily shown
to be a K-isomorphism.

Theorem 7.1.5. Let K be a field. Then the following are equivalent:


(1) Each nonconstant polynomial in K[x] has a zero in K.
(2) Each nonconstant polynomial in K[x] factors into linear factors over K. That is, for
each nonconstant f (x) ∈ K[x], there exist elements $a_1, \ldots, a_n, b \in K$ with

\[ f(x) = b(x - a_1) \cdots (x - a_n). \]

(3) An element of K[x] is irreducible if and only if it is of degree one.


(4) If L|K is an algebraic extension, then L = K.

Proof. Suppose that each nonconstant polynomial in K[x] has a zero in K.


Let f (x) ∈ K[x] with deg(f (x)) = n ≥ 1. Suppose that $a_1$ is a zero of f (x), then

\[ f(x) = (x - a_1)h(x), \]

where the degree of h(x) is n − 1. If n ≥ 2, then h(x) has a zero $a_2$ in K so that

\[ f(x) = (x - a_1)(x - a_2)g(x) \]



with deg(g(x)) = n − 2. Continue in this manner, and f (x) factors completely into linear
factors. Hence, (1) implies (2).
Now suppose (2); that is, that each nonconstant polynomial in K[x] factors into lin-
ear factors over K. Suppose that f (x) is irreducible. If deg(f (x)) > 1, then f (x) factors
into linear factors and, hence, is not irreducible. Therefore, f (x) must be of degree 1,
and (2) implies (3).
Now suppose that an element of K[x] is irreducible if and only if it is of degree one,
and suppose that L|K is an algebraic extension. Let a ∈ L. Then a is algebraic over K.
Its minimal polynomial $m_a(x)$ is monic and irreducible over K and, hence, from (3), is
linear. Therefore, $m_a(x) = x - a \in K[x]$. It follows that a ∈ K and, hence, K = L. Therefore,
(3) implies (4).
Finally, suppose that whenever L|K is an algebraic extension, then L = K. Suppose
that f (x) is a nonconstant polynomial in K[x]. From Kronecker’s theorem, there exists
a finite field extension L of K, and a ∈ L with f (a) = 0. Since L|K is finite, it is an algebraic extension.
Therefore, by supposition, K = L. Therefore, a ∈ K, and f (x) has a zero in K. Therefore,
(4) implies (1), completing the proof.

In the next section, we will prove that given a field K, we can always find an extension
field K̄ with the properties of the last theorem.

7.2 Algebraic Closures and Algebraically Closed Fields


A field K is termed algebraically closed if K has no algebraic extensions other than K
itself. This is equivalent to any one of the conditions of Theorem 7.1.5.

Definition 7.2.1. A field K is algebraically closed if every nonconstant polynomial f (x) ∈


K[x] has a zero in K.

The following theorem is just a restatement of Theorem 7.1.5.

Theorem 7.2.2. A field K is algebraically closed if and only if it satisfies any one of the
following conditions:
(1) Each nonconstant polynomial in K[x] has a zero in K.
(2) Each nonconstant polynomial in K[x] factors into linear factors over K. That is, for
each nonconstant f (x) ∈ K[x], there exist elements $a_1, \ldots, a_n, b \in K$ with

\[ f(x) = b(x - a_1) \cdots (x - a_n). \]

(3) An element of K[x] is irreducible if and only if it is of degree one.


(4) If L|K is an algebraic extension, then L = K.

The prime example of an algebraically closed field is the field ℂ of complex num-
bers. The fundamental theorem of algebra says that any nonconstant complex polyno-
mial has a complex zero.

We now show that the algebraic closure of one field within an algebraically closed
field is algebraically closed. First, we define a general algebraic closure.

Definition 7.2.3. An extension field K̄ of a field K is an algebraic closure of K if K̄ is
algebraically closed and K̄|K is algebraic.

Theorem 7.2.4. Let K be a field and L|K an extension of K with L algebraically closed. Let
K̄ = $\mathcal{A}_K$ be the algebraic closure of K within L. Then K̄ is an algebraic closure of K.

Proof. Let K̄ = $\mathcal{A}_K$ be the algebraic closure of K within L. We know that K̄|K is algebraic.
Therefore, we must show that K̄ is algebraically closed.
Let f (x) be a nonconstant polynomial in K̄[x]. Then f (x) ∈ L[x]. Since L is algebraically
closed, f (x) has a zero a in L. Since f (a) = 0 and f (x) ∈ K̄[x], it follows that a is
algebraic over K̄. However, K̄ is algebraic over K. Therefore, a is also algebraic over K.
Hence, a ∈ K̄, and f (x) has a zero in K̄. Therefore, K̄ is algebraically closed.

We want to note the distinction between being algebraically closed and being an
algebraic closure.

Lemma 7.2.5. The complex numbers ℂ are an algebraic closure of ℝ, but not an algebraic
closure of ℚ. An algebraic closure of ℚ is 𝒜, the field of algebraic numbers within ℂ.

Proof. ℂ is algebraically closed (the fundamental theorem of algebra), and since
|ℂ : ℝ| = 2, it is algebraic over ℝ. Therefore, ℂ is an algebraic closure of ℝ. Although
ℂ is algebraically closed and contains the rational numbers ℚ, it is not an algebraic
closure of ℚ since it is not algebraic over ℚ as there exist transcendental elements.
On the other hand, 𝒜, the field of algebraic numbers within ℂ, is an algebraic closure
of ℚ from Theorem 7.2.4.

We now show that every field has an algebraic closure. To do this, we first show that
any field can be embedded into an algebraically closed field.

Theorem 7.2.6. Let K be a field. Then K can be embedded into an algebraically closed
field.

Proof. We show first that there is an extension field L of K, in which each nonconstant
polynomial f (x) ∈ K[x] has a zero.
Assign to each nonconstant f (x) ∈ K[x] the symbol $y_f$, and consider

\[ R = K[\, y_f : f(x) \in K[x] \text{ nonconstant} \,], \]

the polynomial ring over K in the variables $y_f$. Let

\[ I = \Big\{ \sum_{j=1}^{n} f_j(y_{f_j})\, r_j : r_j \in R,\ f_j(x) \in K[x] \text{ nonconstant} \Big\}. \]

It is straightforward that I is an ideal in R. Suppose that I = R. Then 1 ∈ I. Hence, there
is a linear combination

\[ 1 = g_1 f_1(y_{f_1}) + \cdots + g_n f_n(y_{f_n}), \]

where $g_i \in R$.
In the n polynomials $g_1, \ldots, g_n$, there are only a finite number of variables, say for
example,

\[ y_{f_1}, \ldots, y_{f_n}, \ldots, y_{f_m}. \]

Hence,

\[ 1 = \sum_{i=1}^{n} g_i(y_{f_1}, \ldots, y_{f_m})\, f_i(y_{f_i}). \tag{$*$} \]

Successive applications of Kronecker’s theorem lead us to construct an extension field
P of K, in which each $f_i$ has a zero $a_i$. Substituting $a_i$ for $y_{f_i}$ for i = 1, . . . , n, and 0 for the
remaining variables, in (∗) above, we get that 1 = 0,
a contradiction. Therefore, I ≠ R.
Since I is an ideal not equal to the whole ring R, it follows that I is contained in a
maximal ideal M of R. Set L = R/M. Since M is maximal, L is a field. Now K ∩ M = {0}. If
not, suppose that a ∈ K ∩ M with a ≠ 0. Then $a^{-1} a = 1 \in M$, and then M = R, a contradiction. Now define
τ : K → L by τ(k) = k + M. Since K ∩ M = {0}, it follows that ker(τ) = {0}. Therefore, τ is a
monomorphism. This allows us to identify K and τ(K), and shows that K embeds into L.
Now suppose that f (x) is a nonconstant polynomial in K[x]. Then

\[ f(y_f + M) = f(y_f) + M. \]

However, by the construction $f(y_f) \in M$, so that

\[ f(y_f + M) = M = \text{the zero element of } L. \]

Therefore, $y_f + M$ is a zero of f (x).
Therefore, we have constructed a field L, in which every nonconstant polynomial
in K[x] has a zero.
We now iterate this procedure to form a chain of fields

\[ K \subset K_1 (= L) \subset K_2 \subset \cdots \]

such that each nonconstant polynomial of $K_i[x]$ has a zero in $K_{i+1}$.
Now let K̂ = $\bigcup_i K_i$. It is easy to show (see exercises) that K̂ is a field. If f (x) is a
nonconstant polynomial in K̂[x], then there is some i with $f(x) \in K_i[x]$. Therefore, f (x)
has a zero in $K_{i+1}$ ⊂ K̂. Hence, f (x) has a zero in K̂, and K̂ is algebraically closed.

Theorem 7.2.7. Let K be a field. Then K has an algebraic closure.

Proof. Let K̂ be an algebraically closed field containing K, which exists from
Theorem 7.2.6. Now let K̄ = $\mathcal{A}_K$ be the set of elements of K̂ that are algebraic over K. From
Theorem 7.2.4, K̄ is an algebraic closure of K.

The following lemma is straightforward. We leave the proof to the exercises.

Lemma 7.2.8. Let K, K ′ be fields and ϕ : K → K ′ a homomorphism. Then
ϕ̃ : K[x] → K ′ [x], given by

\[ \tilde{\phi}\Big( \sum_{i=0}^{n} k_i x^i \Big) = \sum_{i=0}^{n} \phi(k_i)\, x^i, \]

is also a homomorphism. By convention, we identify ϕ and ϕ̃ and write ϕ = ϕ̃. If ϕ is an
isomorphism, then so is ϕ̃.

Lemma 7.2.9. Let K, K ′ be fields and ϕ : K → K ′ an isomorphism. Let f (x) ∈ K[x] be


irreducible. Let K ⊂ K(a) and K ′ ⊂ K ′ (a′ ), where a is a zero of f (x) and a′ is a zero of
ϕ(f (x)). Then there is an isomorphism ψ : K(a) → K ′ (a′ ) with ψ|K = ϕ and ψ(a) = a′ .
Furthermore, ψ is uniquely determined.

Proof. This is a generalized version of Theorem 7.1.4. If b ∈ K(a), then from the con-
struction of K(a), there is a polynomial g(x) ∈ K[x] with b = g(a). Define a map

ψ : K(a) → K ′ (a′ )

by

ψ(b) = ϕ(g(x))(a′ ).

We show that ψ is an isomorphism.


First, ψ is well defined. Suppose that b = g(a) = h(a) with h(x) ∈ K[x]. Then (g −
h)(a) = 0. Since f (x) is irreducible with zero a, this implies that $f(x) = c\,m_a(x)$ for some nonzero c ∈ K, and since a is a zero
of (g − h)(x), then f (x)|(g − h)(x). Then

\[ \phi(f(x)) \mid (\phi(g(x)) - \phi(h(x))). \]

Since ϕ(f (x))(a′ ) = 0, this implies that ϕ(g(x))(a′ ) = ϕ(h(x))(a′ ); hence, the map ψ is
well defined.
It is easy to show that ψ is a homomorphism. Let $b_1 = g_1(a)$, $b_2 = g_2(a)$. Then $b_1 b_2 = g_1 g_2(a)$. Hence,

\[ \psi(b_1 b_2) = (\phi(g_1 g_2))(a') = \phi(g_1)(a')\,\phi(g_2)(a') = \psi(b_1)\psi(b_2). \]



In the same manner, we have $\psi(b_1 + b_2) = \psi(b_1) + \psi(b_2)$. Now suppose that k ∈ K so that
k ∈ K[x] is a constant polynomial. Then ψ(k) = (ϕ(k))(a′ ) = ϕ(k). Therefore, ψ restricted
to K is precisely ϕ. As ψ is not the zero mapping and K(a) is a field, it follows that ψ is a monomorphism.
Its image is a subfield of K ′ (a′ ) that contains K ′ = ϕ(K) and a′ = ψ(a); hence, ψ is onto and,
therefore, an isomorphism.
Finally, since K(a) is generated from K and a, and ψ restricted to K is ϕ, it follows
that ψ is uniquely determined by ϕ and ψ(a) = a′ . Hence, ψ is unique.

Theorem 7.2.10. Let L|K be an algebraic extension. Suppose that $L_1$ is an algebraically
closed field and ϕ is an isomorphism from K to $K_1 \subset L_1$. Then there exists a monomorphism
ψ from L to $L_1$ with $\psi|_K = \phi$.

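Before we give the proof, we note that the theorem gives a commutative diagram: the
inclusions K ⊂ L and $K_1 \subset L_1$ are the vertical maps, ϕ : K → $K_1$ and ψ : L → $L_1$ are the
horizontal maps, and ψ restricted to K agrees with ϕ.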

In particular, the theorem can be applied to monomorphisms of a field K within
an algebraic closure K̄ of K. Specifically, suppose that K ⊂ K̄, where K̄ is an algebraic
closure of K, and let α : K → K̄ be a monomorphism with α(K) = K. Then there exists
an automorphism $\alpha^*$ of K̄ with $\alpha^*|_K = \alpha$.

Proof of Theorem 7.2.10. Consider the set

\[ \mathcal{M} = \{(M, \tau) : M \text{ is a field with } K \subset M \subset L, \text{ and } \tau : M \to L_1 \text{ is a monomorphism with } \tau|_K = \phi\}. \]

Now the set ℳ is nonempty since (K, ϕ) ∈ ℳ. Order ℳ by $(M_1, \tau_1) < (M_2, \tau_2)$ if
$M_1 \subset M_2$ and $(\tau_2)|_{M_1} = \tau_1$. Let

\[ \mathcal{K} = \{(M_i, \tau_i) : i \in I\} \]

be a chain in ℳ. Let (M, τ) be defined by

\[ M = \bigcup_{i \in I} M_i \quad \text{with} \quad \tau(a) = \tau_i(a) \text{ for all } a \in M_i. \]

It is clear that (M, τ) is an upper bound for the chain 𝒦. Since each chain has an upper bound
it follows from Zorn’s lemma that ℳ has a maximal element (N, ρ). We show that N = L.
Suppose that N ⊊ L. Let a ∈ L \ N. Then a is algebraic over K, since L|K is algebraic, and
hence also algebraic over N. Let $m_a(x) \in N[x]$ be the minimal polynomial of a relative
to N. Since $L_1$ is algebraically closed, $\rho(m_a(x))$ has a zero a′ ∈ $L_1$. Therefore, by Lemma 7.2.9, there is a
monomorphism ρ′ : N(a) → $L_1$ with ρ′ restricted to N the same as ρ. It follows that
(N, ρ) < (N(a), ρ′ ) since a ∉ N. This contradicts the maximality of (N, ρ). Therefore, N = L,
completing the proof.

Combining the previous two theorems, we can now prove that algebraic
closures of a field K are unique up to K-isomorphism; that is, up to an isomorphism
that is the identity on K.

Theorem 7.2.11. Let $L_1$ and $L_2$ be algebraic closures of the field K. Then there is a
K-isomorphism $\tau : L_1 \to L_2$. Again by K-isomorphism, we mean that τ is the identity
on K.

Proof. From Theorem 7.2.10, there is a monomorphism $\tau : L_1 \to L_2$ with τ the identity
on K. However, since $L_1$ is algebraically closed, so is $\tau(L_1)$. Moreover, $L_2|\tau(L_1)$ is an algebraic
extension, since $L_2$ is algebraic over K and K ⊂ $\tau(L_1)$. Therefore, since $\tau(L_1)$ is algebraically
closed, we must have $L_2 = \tau(L_1)$. Therefore, τ is also surjective and, hence, an isomorphism.

The following corollary is immediate.

Corollary 7.2.12. Let L|K and L′ |K be field extensions with a ∈ L and a′ ∈ L′ algebraic
elements over K. Then K(a) is K-isomorphic to K(a′ ) if and only if |K(a) : K| = |K(a′ ) : K|,
and there is an element a′′ ∈ K(a′ ) with $m_a(x) = m_{a''}(x)$.

7.3 The Fundamental Theorem of Algebra


The fundamental theorem of algebra is one of the most important algebraic results. This
says that any nonconstant complex polynomial must have a complex zero. In the lan-
guage of field extensions, this says that the field of complex numbers ℂ is algebraically
closed. There are many distinct and completely different proofs of this result. In [7],
twelve proofs were given covering a wide area of mathematics. In this section we pro-
vide an elementary proof of the fundamental theorem of algebra. Before doing this, we
briefly mention some of the history surrounding this theorem.
The first mention of the fundamental theorem of algebra, in the form that every
polynomial equation of degree n has exactly n zeros, was given by Peter Roth of Nürnberg
in 1608. However, its conjecture is generally credited to Girard, who also stated
the result in 1629. It was then more clearly stated in 1637 by Descartes, who also distin-
guished between real and imaginary zeros. The first published proof of the fundamental
theorem of algebra was then given by D’Alembert in 1746. However, there were gaps in
D’Alembert’s proof, and the first fully accepted proof was that given by Gauss in 1797 in
his Ph.D. thesis. This was published in 1799. Interestingly enough, in reviewing Gauss’
original proof, modern scholars tend to agree that there are as many holes in this proof
as in D’Alembert’s proof. Gauss, however, published three other proofs with no such
holes. He published second and third proofs in 1816, while his final proof, which was
essentially another version of the first, was presented in 1849.
First, we need the concept of a splitting field for a polynomial.

7.3.1 Splitting Fields

We have just seen that given an irreducible polynomial over a field K, we could always
find a field extension, in which this polynomial has a zero. We now push this further to
obtain field extensions, where a given polynomial has all its zeros.

Definition 7.3.1. If K is a field and 0 ≠ f (x) ∈ K[x], and K ′ is an extension field of K, then
f (x) splits in K ′ (K ′ may be K), if f (x) factors into linear factors in K ′ [x]. Equivalently,
this means that all the zeros of f (x) are in K ′ .
K ′ is a splitting field for f (x) over K if K ′ is the smallest extension field of K, in which
f (x) splits. (A splitting field for f (x) is the smallest extension field, in which f (x) has all
its possible zeros.)
K ′ is a splitting field over K if it is the splitting field for some finite set of polynomials
over K.

Theorem 7.3.2. If K is a field and 0 ≠ f (x) ∈ K[x], then there exists a splitting field for
f (x) over K.

Proof. The splitting field is constructed by repeated adjoining of zeros. Suppose, with-
out loss of generality, that f (x) is irreducible of degree n over K. From Theorem 7.1.2,
there exists a field K ′ containing α with f (α) = 0. Then f (x) = (x − α)g(x) ∈ K ′ [x] with
deg g(x) = n − 1. By an inductive argument, g(x) has a splitting field; therefore, so does
f (x).

7.3.2 Permutations and Symmetric Polynomials

To obtain a proof of the fundamental theorem of algebra, we need to go a bit outside


of our main discussions of rings and fields and introduce symmetric polynomials. To
introduce this concept, we first review some basic ideas from elementary group theory,
which we will look at in detail later in the book.

Definition 7.3.3. A group G is a set with one binary operation, which we will denote by
multiplication, such that the following hold:
(1) The operation is associative; that is, $(g_1 g_2) g_3 = g_1 (g_2 g_3)$ for all $g_1, g_2, g_3 \in G$.
(2) There exists an identity for this operation; that is, an element 1 such that 1g = g1 = g for
each g ∈ G.
(3) Each g ∈ G has an inverse for this operation; that is, for each g, there exists a $g^{-1}$
with the property that $g g^{-1} = g^{-1} g = 1$.

If in addition the operation is commutative ($g_1 g_2 = g_2 g_1$ for all $g_1, g_2 \in G$), the group G
is called an Abelian group. The order of G is the number of elements in G, denoted |G|. If
|G| < ∞, G is a finite group. H ⊂ G is a subgroup if H is also a group under the same
operation as G. Equivalently, H is a subgroup if H ≠ ∅, and H is closed under the operation
and inverses.

Groups most often arise from invertible mappings of a set onto itself. Such mappings
are called permutations.

Definition 7.3.4. If T is a set, a permutation on T is a one-to-one mapping of T onto itself.
We denote the set of all permutations on T by $S_T$.

Theorem 7.3.5. For any set T, $S_T$ forms a group under composition called the symmetric
group on T. If T, $T_1$ have the same cardinality (size), then $S_T \cong S_{T_1}$. If T is a finite set with
|T| = n, then $S_T$ is a finite group, and $|S_T| = n!$.

Proof. If $S_T$ is the set of all permutations on the set T, we must show that composition
is an operation on $S_T$ that is associative and has an identity and inverses.
Let f , g ∈ $S_T$. Then f , g are one-to-one mappings of T onto itself.
Consider f ∘ g : T → T. If $f \circ g(t_1) = f \circ g(t_2)$, then $f(g(t_1)) = f(g(t_2))$, and $g(t_1) = g(t_2)$,
since f is one-to-one. But then $t_1 = t_2$ since g is one-to-one.
If t ∈ T, there exists $t_1 \in T$ with $f(t_1) = t$ since f is onto. Then there exists $t_2 \in T$
with $g(t_2) = t_1$ since g is onto. Putting these together, $f(g(t_2)) = t$; therefore, f ∘ g is onto.
Therefore, f ∘ g is also a permutation, and composition gives a valid binary operation
on $S_T$.
The identity function 1(t) = t for all t ∈ T will serve as the identity for $S_T$, whereas
the inverse function for each permutation will be the inverse. Such unique inverse
functions exist since each permutation is a bijection.
Finally, composition of functions is always associative; therefore, $S_T$ forms a group.
If T, $T_1$ have the same cardinality, then there exists a bijection σ : T → $T_1$. Define a
map $F : S_T \to S_{T_1}$ in the following manner: if f ∈ $S_T$, let F(f ) be the permutation on $T_1$
given by $F(f)(t_1) = \sigma(f(\sigma^{-1}(t_1)))$. It is straightforward to verify that F is an isomorphism
(see the exercises).
Finally, suppose |T| = n < ∞. Then $T = \{t_1, \ldots, t_n\}$. Each f ∈ $S_T$ can be pictured as

\[ f = \begin{pmatrix} t_1 & \cdots & t_n \\ f(t_1) & \cdots & f(t_n) \end{pmatrix}. \]

For $t_1$, there are n choices for $f(t_1)$. For $t_2$, there are only n − 1 choices since f is one-to-one.
This continues down to only one choice for $t_n$. Using the multiplication principle,
the number of choices for f and, therefore, the size of $S_T$ is

\[ n(n - 1) \cdots 1 = n!. \]
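For small n the statements of Theorem 7.3.5 can be verified exhaustively. In the sketch below a permutation of {1, . . . , n} is stored as the tuple of images of 1, . . . , n, which is the bottom row of the two-row picture above; the function names `symmetric_group`, `compose`, and `inverse` are our own.

```python
from itertools import permutations
from math import factorial

def symmetric_group(n):
    """All permutations of {1, ..., n}; p[i - 1] is the image of i."""
    return list(permutations(range(1, n + 1)))

def compose(f, g):
    """Return the composition f∘g (apply g first, then f)."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for i, image in enumerate(f, start=1):
        inv[image - 1] = i
    return tuple(inv)

for n in range(1, 6):
    group = symmetric_group(n)
    elements = set(group)
    identity = tuple(range(1, n + 1))
    assert len(group) == factorial(n)                                    # |S_n| = n!
    assert all(compose(f, g) in elements for f in group for g in group)  # closure
    assert all(compose(f, inverse(f)) == identity for f in group)        # inverses
print("checked n = 1, ..., 5")
```

Associativity is not tested separately, since it holds for composition of arbitrary functions.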

For a set with n elements, we denote $S_T$ by $S_n$, called the symmetric group on n symbols.

Example 7.3.6. Write down the six elements of S3 , and give the multiplication table for
the group.

Name the three elements 1, 2, 3 of T. The six elements of S3 are then:

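\[
1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
a = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix},
\]
\[
c = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad
d = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad
e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}.
\]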

The multiplication table for S3 can be written down directly by doing the required
composition. For example,

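\[
ac = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = d.
\]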

To see this, note that a : 1 → 2, 2 → 3, 3 → 1 and c : 1 → 2, 2 → 1, 3 → 3. The product ac is the composition a ∘ c, so c is applied first and then a. Hence,

\[
ac : 1 \to 3, \quad 2 \to 2, \quad 3 \to 1.
\]

It is somewhat easier to construct the multiplication table if we make some observations. First, $a^2 = b$, and $a^3 = 1$. Next, $c^2 = 1$, $d = ac$, $e = a^2c$ and, finally, $ac = ca^2$.
From these relations, the following multiplication table can be constructed (the entry in row x and column y is the product xy):

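\[
\begin{array}{c|cccccc}
       & 1      & a      & a^2    & c      & ac     & a^2c \\ \hline
1      & 1      & a      & a^2    & c      & ac     & a^2c \\
a      & a      & a^2    & 1      & ac     & a^2c   & c    \\
a^2    & a^2    & 1      & a      & a^2c   & c      & ac   \\
c      & c      & a^2c   & ac     & 1      & a^2    & a    \\
ac     & ac     & c      & a^2c   & a      & 1      & a^2  \\
a^2c   & a^2c   & ac     & c      & a^2    & a      & 1
\end{array}
\]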

To see this, consider, for example, $(ac)a^2 = a(ca^2) = a(ac) = a^2c$.

More generally, we can say that $S_3$ has a presentation given by

\[
S_3 = \langle a, c;\ a^3 = c^2 = 1,\ ac = ca^2 \rangle.
\]

By this, we mean that $S_3$ is generated by a, c, or that $S_3$ has generators a, c. Thus, the whole group and its multiplication table can be generated by using the relations $a^3 = c^2 = 1$, $ac = ca^2$.
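The table and the defining relations can be recomputed mechanically. The sketch below assumes the six permutations named 1, a, b, c, d, e as in Example 7.3.6 and the convention that a product xy means "apply y first, then x"; the helper names are illustrative.

```python
# Permutations of {1, 2, 3} stored as dicts t -> image of t.
ELEMENTS = {
    "1": {1: 1, 2: 2, 3: 3},
    "a": {1: 2, 2: 3, 3: 1},
    "b": {1: 3, 2: 1, 3: 2},
    "c": {1: 2, 2: 1, 3: 3},
    "d": {1: 3, 2: 2, 3: 1},
    "e": {1: 1, 2: 3, 3: 2},
}


def mul(x, y):
    """Product xy = x o y: apply y first, then x."""
    return {t: x[y[t]] for t in (1, 2, 3)}


def name_of(p):
    """Look up the name of a permutation of {1, 2, 3}."""
    return next(name for name, q in ELEMENTS.items() if q == p)


def cayley_table():
    """Map each pair of names (row, column) to the name of the product."""
    names = list(ELEMENTS)
    return {
        (r, s): name_of(mul(ELEMENTS[r], ELEMENTS[s]))
        for r in names
        for s in names
    }


table = cayley_table()

if __name__ == "__main__":
    names = list(ELEMENTS)
    print("  " + " ".join(names))
    for r in names:
        print(r + " " + " ".join(table[(r, s)] for s in names))
```

Here b, d and e play the roles of $a^2$, $ac$ and $a^2c$, so the printed table should agree with the one above after renaming.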

An important result, the form of which we will see later in our work on extension
fields, is the following:

Lemma 7.3.7. Let T be a set and T1 ⊂ T a subset. Let H be the subset of ST that fixes each
element of T1 ; that is, f ∈ H if f (t) = t for all t ∈ T1 . Then H is a subgroup.

Proof. We have $H \neq \emptyset$ since $1 \in H$. Now suppose $h_1, h_2 \in H$. Let $t_1 \in T_1$, and consider $h_1 \circ h_2(t_1) = h_1(h_2(t_1))$. Now $h_2(t_1) = t_1$ since $h_2 \in H$, but then $h_1(t_1) = t_1$ since $h_1 \in H$. Therefore, $h_1 \circ h_2 \in H$, and H is closed under composition. If $h_1$ fixes $t_1$, then applying $h_1^{-1}$ to $h_1(t_1) = t_1$ shows that $h_1^{-1}$ also fixes $t_1$. Thus, H is also closed under inverses and is, therefore, a subgroup.

We now apply these ideas of permutations to certain polynomial rings in independent indeterminates over a field. We will look at these in detail in Chapter 11.

Definition 7.3.8. Let $y_1, \ldots, y_n$ be (independent) indeterminates over a field K. A polynomial $f(y_1, \ldots, y_n) \in K[y_1, \ldots, y_n]$ is a symmetric polynomial in $y_1, \ldots, y_n$ if $f(y_1, \ldots, y_n)$ is unchanged by any permutation σ of $\{y_1, \ldots, y_n\}$; that is, $f(y_1, \ldots, y_n) = f(\sigma(y_1), \ldots, \sigma(y_n))$.
If $K \subset K'$ are fields and $\alpha_1, \ldots, \alpha_n$ are in $K'$, then we call a polynomial $f(\alpha_1, \ldots, \alpha_n)$ with coefficients in K symmetric in $\alpha_1, \ldots, \alpha_n$ if $f(\alpha_1, \ldots, \alpha_n)$ is unchanged by any permutation σ of $\{\alpha_1, \ldots, \alpha_n\}$.

Example 7.3.9. Let K be a field and k0 , k1 ∈ K. Let h(y1 , y2 ) = k0 (y1 + y2 ) + k1 (y1 y2 ). There
are two permutations on {y1 , y2 }, namely, σ1 : y1 → y1 , y2 → y2 and σ2 : y1 → y2 , y2 → y1 .
Applying either one of these two to {y1 , y2 } leaves h(y1 , y2 ) invariant. Therefore, h(y1 , y2 )
is a symmetric polynomial.

Definition 7.3.10. Let $x, y_1, \ldots, y_n$ be indeterminates over a field K (or elements of an extension field $K'$ of K). Form the polynomial $p(x, y_1, \ldots, y_n) = (x - y_1) \cdots (x - y_n)$. The i-th elementary symmetric polynomial $s_i$ in $y_1, \ldots, y_n$ for $i = 1, \ldots, n$, is $(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in $p(x, y_1, \ldots, y_n)$.

Example 7.3.11. Consider y1 , y2 , y3 . Then

\[
\begin{aligned}
p(x, y_1, y_2, y_3) &= (x - y_1)(x - y_2)(x - y_3) \\
&= x^3 - (y_1 + y_2 + y_3)x^2 + (y_1 y_2 + y_1 y_3 + y_2 y_3)x - y_1 y_2 y_3.
\end{aligned}
\]

Therefore, the three elementary symmetric polynomials in $y_1, y_2, y_3$ over any field are
(1) $s_1 = y_1 + y_2 + y_3$.
(2) $s_2 = y_1 y_2 + y_1 y_3 + y_2 y_3$.
(3) $s_3 = y_1 y_2 y_3$.

In general, the pattern of the last example holds for $y_1, \ldots, y_n$. That is,

\[
\begin{aligned}
s_1 &= y_1 + y_2 + \cdots + y_n \\
s_2 &= y_1 y_2 + y_1 y_3 + \cdots + y_{n-1} y_n \\
s_3 &= y_1 y_2 y_3 + y_1 y_2 y_4 + \cdots + y_{n-2} y_{n-1} y_n \\
&\ \ \vdots \\
s_n &= y_1 \cdots y_n.
\end{aligned}
\]
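The relation between the coefficients of $(x - y_1) \cdots (x - y_n)$ and the elementary symmetric polynomials can be checked numerically. This is a small sketch that evaluates $s_k$ at concrete numbers; the function names are illustrative assumptions, not notation from the text.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """Value of s_k at the given numbers: sum of all products of k distinct entries."""
    return sum(prod(c) for c in combinations(values, k))


def expand_product(values):
    """Coefficients of (x - y_1)...(x - y_n), highest power of x first."""
    coeffs = [1]
    for y in values:
        # Multiply the current polynomial by (x - y).
        coeffs = [a - y * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


if __name__ == "__main__":
    ys = [2, 3, 5]
    coeffs = expand_product(ys)
    for i in range(1, len(ys) + 1):
        # s_i should equal (-1)^i times the coefficient of x^(n-i).
        print(i, elementary_symmetric(ys, i), (-1) ** i * coeffs[i])
```

With the sample values 2, 3, 5 the product is $x^3 - 10x^2 + 31x - 30$, matching $s_1 = 10$, $s_2 = 31$, $s_3 = 30$.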

The importance of the elementary symmetric polynomials is that any symmetric polynomial can be built up from the elementary symmetric polynomials. We make this precise in the next theorem called the fundamental theorem of symmetric polynomials. We will use this important result several times, and we will give a complete proof in Section 7.4.

Theorem 7.3.12 (Fundamental theorem of symmetric polynomials). If P is a symmetric polynomial in the indeterminates $y_1, \ldots, y_n$ over a field K; that is, $P \in K[y_1, \ldots, y_n]$ and P is symmetric, then there exists a unique $g \in K[y_1, \ldots, y_n]$ with $P(y_1, \ldots, y_n) = g(s_1, \ldots, s_n)$. That is, any symmetric polynomial in $y_1, \ldots, y_n$ is a polynomial expression in the elementary symmetric polynomials in $y_1, \ldots, y_n$.

From this theorem, we obtain the following two lemmas, which will be crucial in
our proof of the fundamental theorem of algebra.

Lemma 7.3.13. Let p(x) ∈ K[x], and suppose p(x) has the zeros α1 , . . . , αn in the splitting
field K ′ . Then the elementary symmetric polynomials in α1 , . . . , αn are in K.

Proof. Suppose $p(x) = c_0 + c_1 x + \cdots + c_n x^n \in K[x]$ with $c_n \neq 0$. Since p(x) splits in $K'[x]$, with zeros $\alpha_1, \ldots, \alpha_n$, we have that, in $K'[x]$,

\[
p(x) = c_n (x - \alpha_1) \cdots (x - \alpha_n).
\]

The coefficient of $x^{n-i}$ is then $c_n (-1)^i s_i(\alpha_1, \ldots, \alpha_n)$, where the $s_i(\alpha_1, \ldots, \alpha_n)$ are the elementary symmetric polynomials in $\alpha_1, \ldots, \alpha_n$. However, $p(x) \in K[x]$, so each coefficient is in K. It follows then that for each i, $c_n (-1)^i s_i(\alpha_1, \ldots, \alpha_n) \in K$; hence, $s_i(\alpha_1, \ldots, \alpha_n) \in K$ since $c_n \in K$ and $c_n \neq 0$.

Lemma 7.3.14. Let p(x) ∈ K[x], and suppose p(x) has the zeros α1 , . . . , αn in the split-
ting field K ′ . Suppose further that g(x) = g(x, α1 , . . . , αn ) ∈ K ′ [x]. If g(x) is a symmetric
polynomial in α1 , . . . , αn , then g(x) ∈ K[x].

Proof. If $g(x) = g(x, \alpha_1, \ldots, \alpha_n)$ is symmetric in $\alpha_1, \ldots, \alpha_n$, then from Theorem 7.3.12, each coefficient of g(x) is a polynomial in the elementary symmetric polynomials in $\alpha_1, \ldots, \alpha_n$. From Lemma 7.3.13, these are in the ground field K, so the coefficients of g(x) are in K. Therefore, $g(x) \in K[x]$.

We now present a proof of the fundamental theorem of algebra.

Theorem 7.3.15 (Fundamental theorem of algebra). Any nonconstant complex polynomial has a complex zero. In other words, the complex number field ℂ is algebraically closed.

The proof depends on the following sequence of lemmas. The crucial one now is the
last, which says that any real polynomial must have a complex zero.

Lemma 7.3.16. Any odd-degree real polynomial must have a real zero.

Proof. This is a consequence of the intermediate value theorem from analysis.
Suppose $P(x) \in ℝ[x]$ with $\deg P(x) = n = 2k + 1$, and suppose the leading coefficient $a_n > 0$ (the proof is almost identical if $a_n < 0$). Then

\[
P(x) = a_n x^n + (\text{lower terms}),
\]


and n is odd. Then,
(1) $\lim_{x \to \infty} P(x) = \lim_{x \to \infty} a_n x^n = \infty$ since $a_n > 0$.
(2) $\lim_{x \to -\infty} P(x) = \lim_{x \to -\infty} a_n x^n = -\infty$ since $a_n > 0$ and n is odd.

From (1), P(x) gets arbitrarily large positively, so there exists an x1 with P(x1 ) > 0. Simi-
larly, from (2) there exists an x2 with P(x2 ) < 0.
A real polynomial is a continuous real-valued function for all x ∈ ℝ. Since
P(x1 )P(x2 ) < 0, it follows from the intermediate value theorem that there exists an
x3 , between x1 and x2 , such that P(x3 ) = 0.

Lemma 7.3.17. Any degree-two complex polynomial must have a complex zero.

Proof. This is a consequence of the quadratic formula and of the fact that any complex
number has a square root.
If $P(x) = ax^2 + bx + c$, $a \neq 0$, then the zeros formally are

\[
x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.
\]

From DeMoivre's theorem, every complex number has a square root; hence, $x_1, x_2$ exist in ℂ. They of course are the same if $b^2 - 4ac = 0$.
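The quadratic formula works verbatim with complex coefficients, because a complex square root is always available. A minimal sketch using Python's standard `cmath` module follows; the function names and the sample coefficients are illustrative.

```python
import cmath


def quadratic_zeros(a, b, c):
    """Both zeros of a*x^2 + b*x + c for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("leading coefficient must be nonzero")
    root = cmath.sqrt(b * b - 4 * a * c)  # one of the two complex square roots
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


def evaluate(a, b, c, x):
    """Value of a*x^2 + b*x + c at x."""
    return a * x * x + b * x + c


if __name__ == "__main__":
    coefficients = (1j, 2, 3 - 1j)  # sample polynomial with nonreal coefficients
    for z in quadratic_zeros(*coefficients):
        print(z, abs(evaluate(*coefficients, z)))
```

The printed residuals should be zero up to floating-point rounding.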

To go further, we need the concept of the conjugate of a polynomial and some straightforward consequences of this idea.

Definition 7.3.18. If $P(x) = a_0 + \cdots + a_n x^n$ is a complex polynomial, then its conjugate is the polynomial $\overline{P}(x) = \overline{a_0} + \cdots + \overline{a_n} x^n$. That is, the conjugate is the polynomial whose coefficients are the complex conjugates of those of P(x).

Lemma 7.3.19. For any $P(x) \in ℂ[x]$, we have the following:
(1) $\overline{P(z)} = \overline{P}(\overline{z})$ if $z \in ℂ$.
(2) P(x) is a real polynomial if and only if $P(x) = \overline{P}(x)$.
(3) If $P(x)Q(x) = H(x)$, then $\overline{H}(x) = \overline{P}(x)\,\overline{Q}(x)$.

Proof. (1) Suppose $z \in ℂ$ and $P(z) = a_0 + \cdots + a_n z^n$. Then

\[
\overline{P(z)} = \overline{a_0 + \cdots + a_n z^n} = \overline{a_0} + \overline{a_1}\,\overline{z} + \cdots + \overline{a_n}\,\overline{z}^n = \overline{P}(\overline{z}).
\]

(2) Suppose P(x) is real, then $a_i = \overline{a_i}$ for all its coefficients; hence, $P(x) = \overline{P}(x)$. Conversely, suppose $P(x) = \overline{P}(x)$. Then $a_i = \overline{a_i}$ for all its coefficients; hence, $a_i \in ℝ$ for each $a_i$; therefore, P(x) is a real polynomial.
(3) The proof is a computation and left to the exercises.

Lemma 7.3.20. Suppose $G(x) \in ℂ[x]$. Then $H(x) = G(x)\overline{G}(x) \in ℝ[x]$.

Proof. $\overline{H}(x) = \overline{G(x)\overline{G}(x)} = \overline{G}(x)\,\overline{\overline{G}}(x) = \overline{G}(x)G(x) = G(x)\overline{G}(x) = H(x)$. Therefore, by Lemma 7.3.19, H(x) is a real polynomial.
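Lemma 7.3.20 is easy to observe on an example: multiplying a complex polynomial by its conjugate yields coefficients with vanishing imaginary part. The sketch below stores a polynomial as a coefficient list with the constant term first; the sample polynomial G and the helper names are arbitrary choices for illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_conjugate(p):
    """Conjugate polynomial: conjugate every coefficient."""
    return [complex(a).conjugate() for a in p]


# G(x) = (1 + 2i) + (3 - i) x + 2i x^2, an arbitrary sample polynomial.
G = [1 + 2j, 3 - 1j, 2j]
H = poly_mul(G, poly_conjugate(G))

if __name__ == "__main__":
    for degree, coefficient in enumerate(H):
        print(degree, coefficient)
```

For this G, every printed coefficient of H should have imaginary part zero.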

Lemma 7.3.21. If every nonconstant real polynomial has a complex zero, then every non-
constant complex polynomial has a complex zero.

Proof. Let $P(x) \in ℂ[x]$ be nonconstant, and suppose that every nonconstant real polynomial has at least one complex zero. Let $H(x) = P(x)\overline{P}(x)$. From Lemma 7.3.20, $H(x) \in ℝ[x]$. By supposition there exists a $z_0 \in ℂ$ with $H(z_0) = 0$. Then $P(z_0)\overline{P}(z_0) = 0$, and since ℂ is a field it has no zero divisors.
Hence, either $P(z_0) = 0$, or $\overline{P}(z_0) = 0$. In the first case, $z_0$ is a zero of P(x). In the second case, $\overline{P}(z_0) = 0$. Then from Lemma 7.3.19, $\overline{\overline{P}(z_0)} = \overline{\overline{P}}(\overline{z_0}) = P(\overline{z_0}) = 0$. Therefore, $\overline{z_0}$ is a zero of P(x).

Now we come to the crucial lemma.

Lemma 7.3.22. Any nonconstant real polynomial has a complex zero.

Proof. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in ℝ[x]$ with $n \geq 1$, $a_n \neq 0$. The proof is an induction on the degree n of f(x).
Suppose $n = 2^m q$, where q is odd. We do the induction on m. If $m = 0$, then f(x) has odd degree, and the statement is true from Lemma 7.3.16. Assume then that the statement is true for all degrees $d = 2^k q'$, where $k < m$ and $q'$ is odd. Now assume that the degree of f(x) is $n = 2^m q$.
Suppose K ′ is the splitting field for f (x) over ℝ, in which the zeros are α1 , . . . , αn . We
show that at least one of these zeros must be in ℂ. (In fact, all are in ℂ, but to prove the
lemma, we need only show at least one.)
Let $h \in ℤ$, and form the polynomial

\[
H(x) = \prod_{i<j} \bigl(x - (\alpha_i + \alpha_j + h\alpha_i\alpha_j)\bigr).
\]

This is in $K'[x]$. In forming H(x), we chose pairs of zeros $\{\alpha_i, \alpha_j\}$, so the number of such pairs is the number of ways of choosing two elements out of $n = 2^m q$ elements. This is given by

\[
\frac{(2^m q)(2^m q - 1)}{2} = 2^{m-1} q (2^m q - 1) = 2^{m-1} q'
\]

with $q'$ odd. Therefore, the degree of H(x) is $2^{m-1} q'$.
H(x) is a symmetric polynomial in the zeros α1 , . . . , αn . Since α1 , . . . , αn are the zeros
of a real polynomial, from Lemma 7.3.14, any polynomial in the splitting field symmetric
in these zeros must be a real polynomial.
Therefore, $H(x) \in ℝ[x]$ with degree $2^{m-1} q'$. By the inductive hypothesis, then, H(x) must have a complex zero. This implies that there exists a pair $\{\alpha_i, \alpha_j\}$ with

\[
\alpha_i + \alpha_j + h\alpha_i\alpha_j \in ℂ.
\]

Since h was an arbitrary integer, for any integer h1 , there must exist such a pair
{αi , αj } with

αi + αj + h1 αi αj ∈ ℂ.

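Now let $h_1$ vary over the integers. Since there are only finitely many such pairs $\{\alpha_i, \alpha_j\}$ but infinitely many integers, it follows that there must be at least two different integers $h_1, h_2$ and one and the same pair $\{\alpha_i, \alpha_j\}$ such that

\[
z_1 = \alpha_i + \alpha_j + h_1\alpha_i\alpha_j \in ℂ, \quad \text{and} \quad z_2 = \alpha_i + \alpha_j + h_2\alpha_i\alpha_j \in ℂ.
\]

Then $z_1 - z_2 = (h_1 - h_2)\alpha_i\alpha_j \in ℂ$, and since $h_1, h_2 \in ℤ \subset ℂ$ with $h_1 \neq h_2$, it follows that $\alpha_i\alpha_j \in ℂ$. But then $h_1\alpha_i\alpha_j \in ℂ$, from which it follows that $\alpha_i + \alpha_j \in ℂ$. Then,

\[
p(x) = (x - \alpha_i)(x - \alpha_j) = x^2 - (\alpha_i + \alpha_j)x + \alpha_i\alpha_j \in ℂ[x].
\]

However, p(x) is then a degree-two complex polynomial, and so from Lemma 7.3.17, its zeros are complex. Therefore, $\alpha_i, \alpha_j \in ℂ$; thus, f(x) has a complex zero.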
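The induction in this proof rests on a single arithmetic fact: if n is even, the number of pairs n(n-1)/2 contains exactly one factor 2 fewer than n. A short sketch that checks this for a range of even degrees follows; the function names are illustrative.

```python
from math import comb


def two_adic_valuation(n):
    """Largest m such that 2**m divides n, for n >= 1."""
    m = 0
    while n % 2 == 0:
        n //= 2
        m += 1
    return m


def pair_count_drops_one(n):
    """For even n, check that the number of pairs has one factor 2 fewer than n."""
    return two_adic_valuation(comb(n, 2)) == two_adic_valuation(n) - 1


if __name__ == "__main__":
    for n in (2, 4, 6, 12, 40, 96):
        print(n, two_adic_valuation(n), comb(n, 2), two_adic_valuation(comb(n, 2)))
```

For instance, n = 12 = 2^2 * 3 gives 66 = 2 * 33 pairs, so the exponent of 2 drops from 2 to 1.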

It is now easy to give a proof of the fundamental theorem of algebra. From Lemma 7.3.22, every nonconstant real polynomial has a complex zero. From Lemma 7.3.21, if every nonconstant real polynomial has a complex zero, then every nonconstant complex polynomial has a complex zero, proving the fundamental theorem.

Theorem 7.3.23. If E is a finite-dimensional field extension of ℂ, then E = ℂ.

Proof. Let $a \in E$. Consider the elements $1, a, a^2, \ldots$. Since E is finite-dimensional over ℂ, these elements are linearly dependent over ℂ, and we get a nonconstant polynomial over ℂ with zero a. By the fundamental theorem of algebra, this polynomial splits into linear factors over ℂ, so we know that $a \in ℂ$.

Corollary 7.3.24. If E is a finite-dimensional field extension of ℝ, then E = ℝ, or E = ℂ.

We refer to Section 17.6 where we revisit the fundamental theorem of algebra and
provide a Galois theoretic proof.

7.4 The Fundamental Theorem of Symmetric Polynomials


In the proof of the fundamental theorem of algebra that was given in the previous sec-
tion, we used the fact that any symmetric polynomial in n indeterminates is a polyno-
mial in the elementary symmetric polynomials in these indeterminates. In this section,
we give a proof of this theorem.
Let R be an integral domain with $x_1, \ldots, x_n$ (independent) indeterminates over R, and let $R[x_1, \ldots, x_n]$ be the polynomial ring in these indeterminates. Any polynomial $f(x_1, \ldots, x_n) \in R[x_1, \ldots, x_n]$ is composed of a sum of pieces of the form $a x_1^{i_1} \cdots x_n^{i_n}$ with $a \in R$. We first put an order on these pieces of a polynomial.
The piece $a x_1^{i_1} \cdots x_n^{i_n}$ with $a \neq 0$ is called higher than the piece $b x_1^{j_1} \cdots x_n^{j_n}$ with $b \neq 0$, if the first one of the differences $i_1 - j_1, i_2 - j_2, \ldots, i_n - j_n$ that differs from zero is in fact positive. The highest piece of a polynomial $f(x_1, \ldots, x_n)$ is denoted by HG(f).

Lemma 7.4.1. For f (x1 , . . . , xn ), g(x1 , . . . , xn ) ∈ R[x1 , . . . , xn ], we have

HG(fg) = HG(f ) HG(g).

Proof. We use an induction on n, the number of indeterminates. It is clearly true for $n = 1$, and now assume that the statement holds for all polynomials in k indeterminates with $k < n$ and $n \geq 2$. Order the polynomials via exponents on the first indeterminate $x_1$ so that

\[
\begin{aligned}
f(x_1, \ldots, x_n) &= x_1^r \phi_r(x_2, \ldots, x_n) + x_1^{r-1} \phi_{r-1}(x_2, \ldots, x_n) + \cdots + \phi_0(x_2, \ldots, x_n), \\
g(x_1, \ldots, x_n) &= x_1^s \psi_s(x_2, \ldots, x_n) + x_1^{s-1} \psi_{s-1}(x_2, \ldots, x_n) + \cdots + \psi_0(x_2, \ldots, x_n),
\end{aligned}
\]

with $\phi_r \neq 0$ and $\psi_s \neq 0$. Then $HG(fg) = x_1^{r+s} HG(\phi_r \psi_s)$. By the inductive hypothesis

\[
HG(\phi_r \psi_s) = HG(\phi_r)\, HG(\psi_s).
\]

Hence,

\[
\begin{aligned}
HG(fg) &= x_1^{r+s} HG(\phi_r)\, HG(\psi_s) \\
&= (x_1^r HG(\phi_r))(x_1^s HG(\psi_s)) = HG(f)\, HG(g).
\end{aligned}
\]

The elementary symmetric polynomials in n indeterminates x1 , . . . , xn are:

s1 = x1 + x2 + ⋅ ⋅ ⋅ + xn
s2 = x1 x2 + x1 x3 + ⋅ ⋅ ⋅ + xn−1 xn
s3 = x1 x2 x3 + x1 x2 x4 + ⋅ ⋅ ⋅ + xn−2 xn−1 xn
..
.
sn = x1 ⋅ ⋅ ⋅ xn .

These were found by forming the polynomial $p(x, x_1, \ldots, x_n) = (x - x_1) \cdots (x - x_n)$. The i-th elementary symmetric polynomial $s_i$ in $x_1, \ldots, x_n$ is then $(-1)^i a_i$, where $a_i$ is the coefficient of $x^{n-i}$ in $p(x, x_1, \ldots, x_n)$.
In general,

\[
s_k = \sum_{i_1 < i_2 < \cdots < i_k} x_{i_1} x_{i_2} \cdots x_{i_k}, \quad 1 \leq k \leq n,
\]

where the sum is taken over all the $\binom{n}{k}$ different systems of indices $i_1, \ldots, i_k$ with $i_1 < i_2 < \cdots < i_k$. Furthermore, a polynomial $s(x_1, \ldots, x_n)$ is a symmetric polynomial if $s(x_1, \ldots, x_n)$ is unchanged by any permutation σ of $\{x_1, \ldots, x_n\}$, that is, $s(x_1, \ldots, x_n) = s(\sigma(x_1), \ldots, \sigma(x_n))$.
Lemma 7.4.2. In the highest piece $a x_1^{k_1} \cdots x_n^{k_n}$ with $a \neq 0$ of a symmetric polynomial $s(x_1, \ldots, x_n)$, we have $k_1 \geq k_2 \geq \cdots \geq k_n$.

Proof. Assume that $k_i < k_j$ for some $i < j$. As a symmetric polynomial, $s(x_1, \ldots, x_n)$ also must then contain the piece $a x_1^{k_1} \cdots x_i^{k_j} \cdots x_j^{k_i} \cdots x_n^{k_n}$, which is higher than $a x_1^{k_1} \cdots x_i^{k_i} \cdots x_j^{k_j} \cdots x_n^{k_n}$, giving a contradiction.

Lemma 7.4.3. The product $s_1^{k_1 - k_2} s_2^{k_2 - k_3} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}$ with $k_1 \geq k_2 \geq \cdots \geq k_n$ has the highest piece $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$.

Proof. From the definition of the elementary symmetric polynomials, we have that

\[
HG(s_k^t) = (x_1 x_2 \cdots x_k)^t, \quad 1 \leq k \leq n, \ t \geq 1.
\]

From Lemma 7.4.1,

\[
\begin{aligned}
HG(s_1^{k_1 - k_2} s_2^{k_2 - k_3} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n})
&= x_1^{k_1 - k_2} (x_1 x_2)^{k_2 - k_3} \cdots (x_1 \cdots x_{n-1})^{k_{n-1} - k_n} (x_1 \cdots x_n)^{k_n} \\
&= x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}.
\end{aligned}
\]

Theorem 7.4.4. Let $s(x_1, \ldots, x_n) \in R[x_1, \ldots, x_n]$ be a symmetric polynomial. Then $s(x_1, \ldots, x_n)$ can be uniquely expressed as a polynomial $f(s_1, \ldots, s_n)$ in the elementary symmetric polynomials $s_1, \ldots, s_n$ with coefficients from R.

Proof. We prove the existence of the polynomial f by induction on the size of the highest pieces. If in the highest piece of a symmetric polynomial all exponents are zero, then it is constant, that is, an element of R. Therefore, there is nothing to prove.
Now we assume that each symmetric polynomial with the highest piece smaller than that of $s(x_1, \ldots, x_n)$ can be written as a polynomial in the elementary symmetric polynomials. Let $a x_1^{k_1} \cdots x_n^{k_n}$, $a \neq 0$, be the highest piece of $s(x_1, \ldots, x_n)$. By Lemma 7.4.2, $k_1 \geq k_2 \geq \cdots \geq k_n$. Let

\[
t(x_1, \ldots, x_n) = s(x_1, \ldots, x_n) - a s_1^{k_1 - k_2} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}.
\]

Clearly, $t(x_1, \ldots, x_n)$ is another symmetric polynomial, and from Lemma 7.4.3, the highest piece of $t(x_1, \ldots, x_n)$ is smaller than that of $s(x_1, \ldots, x_n)$. Therefore, by the inductive hypothesis, $t(x_1, \ldots, x_n)$ can be written as a polynomial in $s_1, \ldots, s_n$. Hence, $s(x_1, \ldots, x_n) = t(x_1, \ldots, x_n) + a s_1^{k_1 - k_2} \cdots s_{n-1}^{k_{n-1} - k_n} s_n^{k_n}$ can be written as a polynomial in $s_1, \ldots, s_n$.
To prove the uniqueness of this expression, assume that

\[
s(x_1, \ldots, x_n) = f(s_1, \ldots, s_n) = g(s_1, \ldots, s_n).
\]

Then

\[
f(s_1, \ldots, s_n) - g(s_1, \ldots, s_n) = h(s_1, \ldots, s_n) = \phi(x_1, \ldots, x_n)
\]

is the zero polynomial in $x_1, \ldots, x_n$. Hence, if we write $h(s_1, \ldots, s_n)$ as a sum of products of powers of the $s_1, \ldots, s_n$, all coefficients disappear because two different products of powers in the $s_1, \ldots, s_n$ have different highest pieces. This follows from the previous set of lemmas. Therefore, f and g are the same, proving the theorem.
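The existence part of this proof is constructive: repeatedly subtract the product of elementary symmetric polynomials that matches the highest piece. The following is a minimal sketch of that procedure, assuming polynomials are stored as dictionaries from exponent tuples to integer coefficients; the representation and all function names are illustrative choices, not part of the text.

```python
from itertools import combinations


def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def poly_sub(p, q):
    """Return p - q, dropping zero coefficients."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}


def elementary(n, k):
    """s_k in n indeterminates."""
    return {
        tuple(1 if i in idx else 0 for i in range(n)): 1
        for idx in combinations(range(n), k)
    }


def to_elementary(s, n):
    """Write a symmetric polynomial s as {exponents of (s_1, ..., s_n): coefficient}."""
    result = {}
    while s:
        k = max(s)  # highest piece: tuples compare lexicographically
        a = s[k]
        if any(k[i] < k[i + 1] for i in range(n - 1)):
            raise ValueError("input is not symmetric (see Lemma 7.4.2)")
        # Exponents of s_1, ..., s_n from Lemma 7.4.3.
        d = tuple(k[i] - k[i + 1] for i in range(n - 1)) + (k[n - 1],)
        term = {(0,) * n: a}
        for i, power in enumerate(d):
            for _ in range(power):
                term = poly_mul(term, elementary(n, i + 1))
        result[d] = result.get(d, 0) + a
        s = poly_sub(s, term)  # the highest piece cancels, so the loop ends
    return result


if __name__ == "__main__":
    power_sum = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}
    print(to_elementary(power_sum, 3))
```

For $x_1^2 + x_2^2 + x_3^2$ the result encodes $s_1^2 - 2s_2$, with the key (2, 0, 0) standing for $s_1^2$ and (0, 1, 0) for $s_2$.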

7.5 Skew Field Extensions of ℂ and the Frobenius Theorem


Let V be an ℝ-vector space with $\dim_ℝ(V) = n < \infty$. We have already seen, as a consequence of the fundamental theorem of algebra, that only for $n = 1$ and $n = 2$ may we provide V with a multiplication such that V becomes a field with respect to the addition in V and this multiplication. Up to isomorphism, we get $V = ℝ$ if $n = 1$ and $V = ℂ$ if $n = 2$.
If we want a suitable multiplication for n ≥ 3, we have to give up some of the rules
of a field. If all the axioms of a field hold except for the commutativity of multiplication,
then we have a skew field or division ring. Hence, a division ring is a noncommutative
ring with identity, in which every nonzero element has a multiplicative inverse.
Hamilton described for n = 4 a multiplication in V in such a way that V becomes
a skew field. In his honor, we talk about the Hamiltonian skew field. This skew field is
denoted by ℍ and is called the quaternions.
In this section, we want first to describe the skew field ℍ of Hamilton’s quaternions
and then to prove that if n ≥ 3, only for n = 4 can we provide V with a multiplication
such that V becomes a skew field.
We start with the construction and description of ℍ. Let {1, i, j, k} be a basis of V . The
addition will be the usual addition in the vector space. We also take scalar multiplication
by ℝ. The basis element 1 shall be the unit element for the multiplication (as already
mentioned in the case of the complex numbers, this is not a restriction because any
nonzero vector in V is a member of a basis). The basis element 1 then should generate
the embedding of ℝ.
For i, j, k, we define a multiplication by the following rules of Hamilton:

\[
\begin{aligned}
& i^2 = j^2 = k^2 = -1, \\
& ij = k, \quad jk = i, \quad ki = j, \\
& ji = -k, \quad kj = -i, \quad ik = -j.
\end{aligned}
\]

For

\[
x = x_0 + x_1 i + x_2 j + x_3 k \quad \text{and} \quad y = y_0 + y_1 i + y_2 j + y_3 k,
\]

we determine the addition and multiplication in V by the following basic algebraic manipulation:

\[
\begin{aligned}
x + y &:= (x_0 + y_0) + (x_1 + y_1)i + (x_2 + y_2)j + (x_3 + y_3)k, \\
x \cdot y &:= (x_0 y_0 - x_1 y_1 - x_2 y_2 - x_3 y_3) + (x_0 y_1 + x_1 y_0 + x_2 y_3 - x_3 y_2)i \\
&\quad + (x_0 y_2 - x_1 y_3 + x_2 y_0 + x_3 y_1)j + (x_0 y_3 + x_1 y_2 - x_2 y_1 + x_3 y_0)k.
\end{aligned}
\]

Together with this addition and multiplication, V becomes a noncommutative ring with
unit element 1. For each quaternion

x = x0 + x1 i + x2 j + x3 k,

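we define the conjugate quaternion by

\[
\overline{x} := x_0 - x_1 i - x_2 j - x_3 k.
\]

We have the rules

\[
\overline{\overline{x}} = x, \quad \overline{x + y} = \overline{x} + \overline{y}, \quad \overline{\lambda x} = \lambda \overline{x} \ \text{for } \lambda \in ℝ, \quad \text{and} \quad \overline{xy} = \overline{y} \cdot \overline{x}.
\]

Note the reversed order of the factors in the last rule, which reflects the noncommutativity of the multiplication. With the help of the conjugation, we may now define the norm and the length of a quaternion

\[
x = x_0 + x_1 i + x_2 j + x_3 k
\]

by

\[
n(x) = x\overline{x} = \overline{x}x = x_0^2 + x_1^2 + x_2^2 + x_3^2 \quad \text{and} \quad |x| = \sqrt{x_0^2 + x_1^2 + x_2^2 + x_3^2},
\]

respectively, in analogy to the complex numbers. If $x \neq 0$, then we get the multiplicative inverse $x^{-1}$ by $x^{-1} = \frac{\overline{x}}{x\overline{x}}$, because

\[
x x^{-1} = x \frac{\overline{x}}{x\overline{x}} = 1 = \frac{\overline{x}}{x\overline{x}} x.
\]

Hence, together with the addition and multiplication, V becomes a skew field, in which ℝ can be embedded via $r \mapsto r \cdot 1$ for $r \in ℝ$.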
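The multiplication formula, the conjugate, the norm and the inverse translate directly into code. This is a minimal sketch that stores a quaternion as a 4-tuple $(x_0, x_1, x_2, x_3)$ with integer or rational components; the function names are illustrative, and exact rational arithmetic via the standard `fractions` module is an implementation choice.

```python
from fractions import Fraction


def qmul(x, y):
    """Product of quaternions x0 + x1 i + x2 j + x3 k given as 4-tuples."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (
        x0 * y0 - x1 * y1 - x2 * y2 - x3 * y3,
        x0 * y1 + x1 * y0 + x2 * y3 - x3 * y2,
        x0 * y2 - x1 * y3 + x2 * y0 + x3 * y1,
        x0 * y3 + x1 * y2 - x2 * y1 + x3 * y0,
    )


def conj(x):
    """Conjugate quaternion."""
    return (x[0], -x[1], -x[2], -x[3])


def norm(x):
    """n(x) = x0^2 + x1^2 + x2^2 + x3^2."""
    return sum(c * c for c in x)


def inverse(x):
    """Multiplicative inverse conj(x) / n(x) of a nonzero quaternion."""
    n = norm(x)
    if n == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    return tuple(Fraction(c) / n for c in conj(x))


ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

if __name__ == "__main__":
    x = (1, 2, 3, 4)
    print(qmul(I, J), qmul(J, I))   # ij and ji differ by a sign
    print(qmul(x, inverse(x)))      # should represent the unit element
```

The example prints ij and ji to show the noncommutativity, and the product of x with its inverse.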

Theorem 7.5.1. The set of quaternions ℍ is a skew field, which contains both the reals
and the complexes as subfields. It has dimension 4 as a vector space over ℝ. Furthermore,
rx = xr for all x ∈ ℍ, and all r ∈ ℝ (considered as elements of ℍ).

In ℍ, there is an important multiplicative rule for the norm and the length:

n(xy) = n(x)n(y) and |xy| = |x||y| for x, y ∈ ℍ.

This can be shown by an easy calculation.



This result on norms in the quaternions provides the general equation in ℝ on sums of four squares:

\[
\begin{aligned}
(x_0^2 + x_1^2 + x_2^2 + x_3^2)(y_0^2 + y_1^2 + y_2^2 + y_3^2) &= (x_0 y_0 - x_1 y_1 - x_2 y_2 - x_3 y_3)^2 \\
&\quad + (x_0 y_1 + x_1 y_0 + x_2 y_3 - x_3 y_2)^2 \\
&\quad + (x_0 y_2 - x_1 y_3 + x_2 y_0 + x_3 y_1)^2 \\
&\quad + (x_0 y_3 + x_1 y_2 - x_2 y_1 + x_3 y_0)^2.
\end{aligned}
\]

This equation is one of the bases for the Theorem of Lagrange.

Theorem 7.5.2 (Theorem of Lagrange). Each natural number n can be written as a sum

\[
n = a^2 + b^2 + c^2 + d^2
\]

of four squares with a, b, c, d ∈ ℤ.

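Hint: By the four-square equation above, a product of sums of four squares is again a sum of four squares, so it suffices to consider prime numbers. Since $2 = 1^2 + 1^2$ and each prime number $p \equiv 1 \pmod 4$ is a sum of two squares, we have only to show (see [53, Chapter 3.2]) that if p is a prime number with $p \equiv 3 \pmod 4$, then $p = a^2 + b^2 + c^2 + d^2$ for some $a, b, c, d \in ℤ$. A proof of this can be found in [53].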
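Both ingredients of the hint can be tried out numerically. The sketch below combines two four-square representations using the four-square equation and, independently, finds a representation by exhaustive search; the search is only a brute-force illustration for small n, not the method of proof, and the function names are illustrative.

```python
from math import isqrt


def four_square_product(x, y):
    """Combine representations of m and n into one of m*n via the four-square equation."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (
        x0 * y0 - x1 * y1 - x2 * y2 - x3 * y3,
        x0 * y1 + x1 * y0 + x2 * y3 - x3 * y2,
        x0 * y2 - x1 * y3 + x2 * y0 + x3 * y1,
        x0 * y3 + x1 * y2 - x2 * y1 + x3 * y0,
    )


def four_squares(n):
    """Search for a >= b >= c >= d >= 0 with a^2 + b^2 + c^2 + d^2 == n."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d <= c and d * d == rest:
                    return (a, b, c, d)
    return None


if __name__ == "__main__":
    three, seven = (1, 1, 1, 0), (2, 1, 1, 1)   # 3 = 1+1+1+0 and 7 = 4+1+1+1
    combined = four_square_product(three, seven)
    print(combined, sum(c * c for c in combined))
    print(four_squares(2023))
```

The first line of output gives four integers whose squares sum to 21 = 3 * 7.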
We remark that the skew field ℍ of the quaternions can be embedded into M(2, ℂ)
via

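\[
1 \mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
i \mapsto \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad
j \mapsto \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad
k \mapsto \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}.
\]

Using this map, a quaternion $x = x_0 + x_1 i + x_2 j + x_3 k$ can be considered as a matrix

\[
\begin{pmatrix} x_0 + x_1 i & x_2 + x_3 i \\ -x_2 + x_3 i & x_0 - x_1 i \end{pmatrix}
= \begin{pmatrix} w & z \\ -\overline{z} & \overline{w} \end{pmatrix}
\]

with $w = x_0 + x_1 i \in ℂ$ and $z = x_2 + x_3 i \in ℂ$.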
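That these four matrices satisfy Hamilton's rules can be confirmed by direct multiplication. The sketch below represents 2×2 complex matrices as nested tuples and uses Python's built-in complex numbers; the helper names are illustrative.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


def det(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


ONE = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))
J = ((0, 1), (-1, 0))
K = ((0, 1j), (1j, 0))


def embed(x0, x1, x2, x3):
    """Matrix of the quaternion x0 + x1 i + x2 j + x3 k."""
    w, z = complex(x0, x1), complex(x2, x3)
    return ((w, z), (-z.conjugate(), w.conjugate()))


if __name__ == "__main__":
    print(mat_mul(I, J))          # should equal K
    print(mat_mul(J, I))          # should equal -K
    print(det(embed(1, 2, 3, 4))) # equals the norm 1 + 4 + 9 + 16
```

The determinant of the embedded matrix equals $w\overline{w} + z\overline{z}$, which is the norm of the quaternion.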
We have shown that the quaternions form a skew field of degree 4 over the real
numbers. We ask whether there can be other finite degree skew field extensions of ℝ.
Let V be an ℝ-vector space with $\dim_ℝ(V) = n < \infty$. For which n may we provide V with a multiplication such that V, with the vector addition and this multiplication, becomes a field or a skew field?
We remark that some nonzero vector in V has to be the unit element 1; therefore,
we automatically have an embedding ℝ → V .
Let n ≥ 2. Since the irreducible polynomials from ℝ[x] have degree 1 or 2, then
under the existence of such a multiplication, each element α ∈ V , which is not in ℝ
(considered as a subset of V ), must be a zero of a quadratic polynomial from ℝ[x].

We now assume that we have in V a multiplication such that V , together with the
addition in V and this multiplication, is a field or a skew field.
If n = 2, we get the field ℂ of the complex numbers.
Now, let $n = 3$. Using analogous thoughts as for the implementation of ℂ, we may construct in two steps a basis $\{1, i, j\}$ of V such that 1 is the unit element of V, and $i^2 = j^2 = -1$. Recall that a two-dimensional subspace of V that is closed under the multiplication and contains 1 has to be isomorphic to ℂ as a subfield of V.
Let $k = ij$. Since $\dim_ℝ(V) = 3$, we must have $k = a_1 + b_1 i + c_1 j$ with $a_1, b_1, c_1 \in ℝ$. Multiplication from the left with i results in

\[
-j = a_1 i - b_1 + c_1 k = a_1 i - b_1 + c_1(a_1 + b_1 i + c_1 j),
\]

and since 1, i, j are linearly independent, comparing the coefficients of j gives $c_1^2 = -1$, which is impossible in ℝ. Therefore, the case $n = 3$ is not possible.
If $n = 4$, we may construct in V three linearly independent elements 1, i, j such that 1 is the unit element of V, and $i^2 = j^2 = -1$. Certainly ij is linearly independent from 1, i and j, because otherwise, we get a contradiction as in the case $n = 3$. Also ji is linearly independent from 1, i and j. Now $i + j$ and $i - j$ are both zeros of quadratic polynomials over ℝ; that is, there exist $r_1, s_1, r_2, s_2 \in ℝ$ with

\[
(i + j)^2 + r_1(i + j) + s_1 = 0 \quad \text{and} \quad (i - j)^2 + r_2(i - j) + s_2 = 0.
\]

If we add these equations, we see that $r_1 = r_2 = 0$; therefore, we get from the first equation that $ij + ji = c \in ℝ$. Here, we used that 1, i and j are linearly independent.
Now, we may replace j by $j + \frac{c}{2} i$, which gives

\[
i\left(j + \frac{c}{2} i\right) + \left(j + \frac{c}{2} i\right) i = 0.
\]

Since the subspace of V generated by 1 and $j + \frac{c}{2} i$ must, as a field, be isomorphic to ℂ, we may normalize $j + \frac{c}{2} i$ to $j_1$ with $j_1^2 = -1$.
We now define $k = i j_1$. Then automatically

\[
k = i j_1 = -j_1 i \quad \text{and} \quad k^2 = -1.
\]

So altogether, writing j again for $j_1$, we may construct a basis $\{1, i, j, k\}$ of V such that 1 is the unit element of V, and $i^2 = j^2 = k^2 = -1$, $k = ij = -ji$. Thereby, V is isomorphic to the skew field ℍ of the quaternions.
Finally, let $n \geq 5$. Analogously as for the case $n = 4$ and the general observation for the subfield isomorphic to ℂ, we may construct a basis $\{1, i, j, k, l, \ldots\}$ such that

\[
i^2 = j^2 = k^2 = -1, \quad k = ij = -ji \quad \text{and} \quad l^2 = -1.
\]

Analogously, as in the case $n = 4$, we have that $i + l$ and $i - l$ are both zeros of quadratic polynomials over ℝ.
Therefore, as in the case $n = 4$,

\[
il + li = a_2 \in ℝ.
\]

In the same manner, we get

\[
jl + lj = b_2 \in ℝ \quad \text{and} \quad kl + lk = c_2 \in ℝ.
\]

We calculate

\[
\begin{aligned}
lk = l(ij) &= a_2 j - ilj = a_2 j - i(b_2 - jl) \\
&= a_2 j - b_2 i + ijl = a_2 j - b_2 i + kl \\
&= a_2 j - b_2 i + c_2 - lk.
\end{aligned}
\]

From this, we get

\[
2lk = a_2 j - b_2 i + c_2.
\]

Multiplication with k from the right gives

\[
-2l = a_2 i + b_2 j + c_2 k,
\]

because $jk = i$, and $ik = -j$.

This means that l is a linear combination of i, j, k, which contradicts the linear independence of the basis $\{1, i, j, k, l, \ldots\}$. This contradiction shows that $n \geq 5$ is not possible.
Altogether, we have proven the following theorem:

Theorem 7.5.3 (Frobenius Theorem). Let V be an ℝ-vector space, $\dim_ℝ(V) = n < \infty$. Let V be provided in addition with a multiplication, such that V together with the vector addition and the multiplication is a field or a skew field.
Then $n = 1$, 2 or 4. In particular, if $n = 1$, then V is isomorphic to ℝ; if $n = 2$, then V is isomorphic to ℂ; and if $n = 4$, then V is isomorphic to ℍ.

7.6 Exercises
1. Let f , g ∈ K[x] be irreducible polynomials of degree 2 over the field K. Let α1 , α2
(respectively, β1 , β2 ) be zeros of f and g. For 1 ≤ i, j ≤ 2, let νij = αi + βj . Show the
following:
(a) |K(νij ) : K| ∈ {1, 2, 3, 4}.
(b) For fixed f , g, there are at most two different degrees in (a).

(c) Decide which sets of combinations of degrees in (b) (with f , g variable) are pos-
sible, and give an example in each case.
2. Let L|K be a field extension; let ν ∈ L and f (x) ∈ L[x], a polynomial of degree ≥ 1.
Let all coefficients of f (x) be algebraic over K. If f (ν) = 0, then ν is algebraic over K.
3. Let L|K be a field extension, and let M be an intermediate field. The extension M|K
is algebraic. For ν ∈ L, the following are equivalent:
(a) ν is algebraic over M.
(b) ν is algebraic over K.
4. Let L|K be a field extension and ν1 , ν2 ∈ L. Then the following are equivalent:
(a) ν1 and ν2 are algebraic over K.
(b) ν1 + ν2 and ν1 ν2 are algebraic over K.
5. Let L|K be a simple field extension. Then there is an extension field L′ of L of the
form L′ = K(ν1 , ν2 ) with the following:
(a) ν1 and ν2 are transcendental over K.
(b) The set of all elements of L′ that are algebraic over K is L.
6. In the proof of Theorem 7.1.4, show that the mapping

τ : K(a) → K(α),

defined by τ(k) = k if k ∈ K and τ(a) = α, and then extended by linearity, is a K-isomorphism.
7. Prove Lemma 7.2.8.
8. If T, T1 are sets with the same cardinality, then there exists a bijection σ : T → T1 .
Define a map F : ST → ST1 in the following manner: if f ∈ ST , let F(f ) be the
permutation on T1 given by F(f )(t1 ) = σ(f (σ −1 (t1 ))). Prove that F is an isomorphism.
9. Let $P(x), Q(x), H(x) \in ℂ[x]$. Show that $P(x)Q(x) = H(x)$ implies $\overline{H}(x) = \overline{P}(x)\,\overline{Q}(x)$.
10. Show the multiplicative rule for the norm and the length for the quaternions:

n(xy) = n(x)n(y) and |xy| = |x||y| for x, y ∈ ℍ.

11. Determine all irreducible polynomials over ℝ. Factorize f (x) ∈ ℝ[x] in irreducible
polynomials.
8 Splitting Fields and Normal Extensions
8.1 Splitting Fields
In the last chapter, we introduced splitting fields and used this idea to present a proof of
the fundamental theorem of algebra. The concept of a splitting field is essential to the
Galois theory of equations. Therefore, in this chapter, we look more deeply at this idea.

Definition 8.1.1. Let K be a field and f (x) a nonconstant polynomial in K[x]. An exten-
sion field L of K is a splitting field for f (x) over K if the following hold:
(a) f (x) splits into linear factors in L[x].
(b) If M is a field with K ⊂ M ⊂ L and M ≠ L, then f(x) does not split into linear factors in M[x].

From part (b) in the definition, the following is clear:

Lemma 8.1.2. L is a splitting field for f (x) ∈ K[x] if and only if f (x) splits into linear
factors in L[x], and if f (x) = b(x − a1 ) ⋅ ⋅ ⋅ (x − an ) with b ∈ K, then L = K(a1 , . . . , an ).

Example 8.1.3. The field ℂ of complex numbers is a splitting field for the polynomial
p(x) = x 2 + 1 in ℝ[x]. In fact, since ℂ is algebraically closed, it is a splitting field for any
real polynomial f (x) ∈ ℝ[x], which has at least one nonreal zero.
The field ℚ(i), obtained by adjoining i to ℚ, is a splitting field for $x^2 + 1$ over ℚ.

The next result was used in the previous chapter. We restate and reprove it here.

Theorem 8.1.4. Let K be a field. Then each nonconstant polynomial in K[x] has a splitting
field.

Proof. Let $f(x) \in K[x]$ be nonconstant, and let $\overline{K}$ be an algebraic closure of K. Then f(x) splits in $\overline{K}[x]$; that is, $f(x) = b(x - a_1) \cdots (x - a_n)$ with $b \in K$ and $a_i \in \overline{K}$. Let $L = K(a_1, \ldots, a_n)$. Then L is a splitting field for f(x) over K.

We next show that the splitting field over K of a given polynomial is unique up to K-isomorphism.

Theorem 8.1.5. Let $K, K'$ be fields and $\phi : K \to K'$ an isomorphism. Let f(x) be a nonconstant polynomial in K[x] and $f'(x) = \phi(f(x))$ its image in $K'[x]$. Suppose that L is a splitting field for f(x) over K, and $L'$ is a splitting field for $f'(x)$ over $K'$.
(a) Suppose that $L' \subset L''$. Then, if $\psi : L \to L''$ is a monomorphism with $\psi|_K = \phi$, then ψ is an isomorphism from L onto $L'$. Moreover, ψ maps the set of zeros of f(x) in L onto the set of zeros of $f'(x)$ in $L'$. The map ψ is uniquely determined by its values on the zeros of f(x).
(b) If g(x) is an irreducible factor of f(x) in K[x], a is a zero of g(x) in L, and $a'$ is a zero of $g'(x) = \phi(g(x))$ in $L'$, then there is an isomorphism ψ from L to $L'$ with $\psi|_K = \phi$ and $\psi(a) = a'$.

https://doi.org/10.1515/9783111142524-008

Before giving the proof of this theorem, we note that the following important result
is a direct consequence of it:

Theorem 8.1.6. A splitting field for f (x) ∈ K[x] is unique up to K-isomorphism.

Proof of Theorem 8.1.5. Suppose that $f(x) = b(x - a_1) \cdots (x - a_n) \in L[x]$ and suppose that
$f'(x) = b'(x - a_1') \cdots (x - a_n') \in L'[x]$. Then, in $L''[x]$,

\[ f'(x) = \phi(f(x)) = \psi(f(x)) = \psi(b)(x - \psi(a_1)) \cdots (x - \psi(a_n)). \]

We have proved that polynomials have unique factorization over fields. Since $L' \subset L''$,
it follows that the list of zeros $(\psi(a_1), \ldots, \psi(a_n))$ is a permutation of the list of zeros
$(a_1', \ldots, a_n')$. In particular, this implies that $\psi(a_i) \in L'$; thus,

\[ \operatorname{im}(\psi) = K'(\psi(a_1), \ldots, \psi(a_n)) = K'(a_1', \ldots, a_n') = L'. \]

Since $L = K(a_1, \ldots, a_n)$ and $\psi|_K = \phi$ is prescribed, it is clear that $\psi$ is uniquely
determined by the images $\psi(a_i)$. This proves part (a).
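For part (b), embed $L'$ in an algebraic closure $L''$. Since $a$ is a zero of the irreducible
polynomial $g(x)$ and $a'$ is a zero of $g'(x) = \phi(g(x))$, there is a monomorphism

\[ \phi' : K(a) \to L'' \]

with $\phi'|_K = \phi$ and $\phi'(a) = a'$. Since $L$ is algebraic over $K(a)$ and $L''$ is algebraically closed,
$\phi'$ extends to a monomorphism $\psi : L \to L''$ with $\psi|_{K(a)} = \phi'$.
Then from part (a), it follows that $\psi : L \to L'$ is an isomorphism.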

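Example 8.1.7. Let $f(x) = x^3 - 7 \in \mathbb{Q}[x]$. This has no zeros in $\mathbb{Q}$, and since it is of degree 3,
it follows that it must be irreducible in $\mathbb{Q}[x]$.
Let $\omega = -\frac{1}{2} + \frac{\sqrt{3}}{2} i \in \mathbb{C}$. Then it is easy to show by computation that
$\omega^2 = -\frac{1}{2} - \frac{\sqrt{3}}{2} i$ and $\omega^3 = 1$. Therefore, the three zeros of $f(x)$ in $\mathbb{C}$ are as follows:

\[ a_1 = 7^{1/3}, \qquad a_2 = \omega \cdot 7^{1/3}, \qquad a_3 = \omega^2 \cdot 7^{1/3}. \]

Hence, $L = \mathbb{Q}(a_1, a_2, a_3)$ is the splitting field of $f(x)$. Since the minimal polynomial of
all three zeros over $\mathbb{Q}$ is the same $f(x)$, it follows that

\[ \mathbb{Q}(a_1) \cong \mathbb{Q}(a_2) \cong \mathbb{Q}(a_3). \]

Since $\mathbb{Q}(a_1) \subset \mathbb{R}$ and $a_2, a_3$ are nonreal, it is clear that $a_2, a_3 \notin \mathbb{Q}(a_1)$. Suppose that
$\mathbb{Q}(a_2) = \mathbb{Q}(a_3)$. Then $\omega = a_3 a_2^{-1} \in \mathbb{Q}(a_2)$, and so $7^{1/3} = \omega^{-1} a_2 \in \mathbb{Q}(a_2)$. Hence, $\mathbb{Q}(a_1) \subset \mathbb{Q}(a_2)$;
therefore, $\mathbb{Q}(a_1) = \mathbb{Q}(a_2)$ since they have the same degree over $\mathbb{Q}$. This contradiction
shows that $\mathbb{Q}(a_2)$ and $\mathbb{Q}(a_3)$ are distinct.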

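By computation, we have $a_3 = a_1^{-1} a_2^2$; hence,

\[ L = \mathbb{Q}(a_1, a_2, a_3) = \mathbb{Q}(a_1, a_2) = \mathbb{Q}(7^{1/3}, \omega). \]

Now the degree of $L$ over $\mathbb{Q}$ is

\[ |L : \mathbb{Q}| = |\mathbb{Q}(7^{1/3}, \omega) : \mathbb{Q}(\omega)| \, |\mathbb{Q}(\omega) : \mathbb{Q}|. \]

Now $|\mathbb{Q}(\omega) : \mathbb{Q}| = 2$ since the minimal polynomial of $\omega$ over $\mathbb{Q}$ is $x^2 + x + 1$. Since no zero
of $f(x)$ lies in $\mathbb{Q}(\omega)$, and the degree of $f(x)$ is 3, it follows that $f(x)$ is irreducible over
$\mathbb{Q}(\omega)$. Therefore, we have that the degree of $L$ over $\mathbb{Q}(\omega)$ is 3. Hence, $|L : \mathbb{Q}| = 2 \cdot 3 = 6$.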
We now have the following lattice diagram of fields and subfields:

We do not know however if there are any more intermediate fields. There could,
for example, be infinitely many. However, as we will see when we do the Galois theory,
there are no others.
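
The computational claims of Example 8.1.7 can be checked numerically. The following is only a minimal sketch in Python using floating-point complex arithmetic, so every equality is tested up to a rounding tolerance; the names `omega`, `a1`, `a2`, `a3`, and `TOL` are chosen here for illustration. Such a check says nothing about irreducibility or degrees, which still require the arguments given above.

```python
# Numerical sanity check for Example 8.1.7 (floating point, not a proof).
TOL = 1e-9

omega = complex(-0.5, 3 ** 0.5 / 2)   # omega = -1/2 + (sqrt(3)/2) i
a1 = 7 ** (1 / 3)                     # the real cube root of 7
a2 = omega * a1
a3 = omega ** 2 * a1


def f(x):
    """The polynomial f(x) = x^3 - 7."""
    return x ** 3 - 7


# omega is a zero of x^2 + x + 1 and satisfies omega^3 = 1.
omega_checks = abs(omega ** 2 + omega + 1) < TOL and abs(omega ** 3 - 1) < TOL

# a1, a2, a3 are zeros of f, and a3 = a1^{-1} a2^2.
zeros_ok = all(abs(f(z)) < TOL for z in (a1, a2, a3))
relation_ok = abs(a3 - a2 ** 2 / a1) < TOL

print(omega_checks, zeros_ok, relation_ok)
```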

8.2 Normal Extensions


We now consider algebraic field extensions L of K, which have the property that if f (x) ∈
K[x] has a zero in L, then f (x) must split in L. In particular, we show that if L is a splitting
field of finite degree for some g(x) ∈ K[x], then L has this property.

Definition 8.2.1. A field extension L of a field K is a normal extension if the following


hold:
(a) L|K is algebraic.
(b) Each irreducible polynomial f (x) ∈ K[x] that has a zero in L splits into linear factors
in L[x].

Note that in Example 8.1.7 the extension fields $\mathbb{Q}(a_i)|\mathbb{Q}$ are not normal extensions.
Although $f(x)$ has a zero in $\mathbb{Q}(a_i)$, the polynomial $f(x)$ does not split into linear factors in
$\mathbb{Q}(a_i)[x]$.

We now show that L|K is a finite normal extension if and only if L is the splitting
field for some f (x) ∈ K[x].

Theorem 8.2.2. Let L|K be a finite extension. Then the following are equivalent:
(a) L|K is a normal extension.
(b) L|K is a splitting field for some f (x) ∈ K[x].
(c) If $L \subset L'$ and $\psi : L \to L'$ is a monomorphism with $\psi|_K$ the identity map on $K$, then $\psi$
is an automorphism of $L$; that is, $\psi(L) = L$.

Proof. Suppose that $L|K$ is a finite normal extension. Since $L|K$ is a finite extension, $L$ is
algebraic over $K$, and since it is of finite degree, we have $L = K(a_1, \ldots, a_n)$ with $a_i$ algebraic
over $K$.
Let $f_i(x) \in K[x]$ be the minimal polynomial of $a_i$. Since $L|K$ is a normal extension,
$f_i(x)$ splits in $L[x]$. This is true for each $i = 1, \ldots, n$. Let $f(x) = f_1(x) f_2(x) \cdots f_n(x)$. Then
$f(x)$ splits into linear factors in $L[x]$. Since $L = K(a_1, \ldots, a_n)$ and each $a_i$ is a zero of $f(x)$,
the polynomial $f(x)$ cannot have all its zeros in any proper intermediate field between $K$ and $L$.
Therefore, $L$ is the splitting field for $f(x)$. Hence, (a) implies (b).
Now suppose that (b) holds, so that $L$ is a splitting field for some $f(x) \in K[x]$. Suppose that
$L \subset L'$ and $\psi : L \to L'$ is a monomorphism with $\psi|_K$ the identity
map on $K$. Then the extension field $\psi(L)$ of $K$ is also a splitting field for $f(x)$ since $\psi|_K$
is the identity on $K$. Hence, $\psi$ maps the zeros of $f(x)$ in $L \subset L'$ onto the zeros of $f(x)$ in
$\psi(L) \subset L'$. Since both fields are generated over $K$ by the zeros of $f(x)$ in $L'$, it follows
that $\psi(L) = L$. Hence, (b) implies (c).
Finally, suppose (c). Hence, we assume that if $L \subset L'$ and $\psi : L \to L'$ is a monomorphism
with $\psi|_K$ the identity map on $K$, then $\psi$ is an automorphism of $L$; that is, $\psi(L) = L$.
As before, $L|K$ is algebraic since $L|K$ is finite. Suppose that $f(x) \in K[x]$ is irreducible
and that $a \in L$ is a zero of $f(x)$. There are algebraic elements $a_1, \ldots, a_n \in L$ with
$L = K(a_1, \ldots, a_n)$ since $L|K$ is finite. For $i = 1, \ldots, n$, let $f_i(x) \in K[x]$ be the minimal polynomial
of $a_i$, and let $g(x) = f(x) f_1(x) \cdots f_n(x)$. Let $L'$ be the splitting field of $g(x)$. Clearly, $L \subset L'$.
Let $b \in L'$ be a zero of $f(x)$. From Theorem 8.1.5, there is an automorphism $\psi$ of $L'$ with
$\psi(a) = b$ and $\psi|_K$ the identity on $K$. Hence, by our assumption, $\psi|_L$ is an automorphism
of $L$. It follows that $b = \psi(a) \in L$. Since $f(x)$ splits in $L'[x]$ and each of its zeros lies in $L$,
the polynomial $f(x)$ splits in $L[x]$. Therefore, (c) implies (a), completing
the proof.

To give simple examples of normal extensions, we have the following:

Lemma 8.2.3. If L is an extension of K with |L : K| = 2, then L is a normal extension of K.

Proof. Suppose that |L : K| = 2. Then L|K is algebraic since it is finite.


Let $f(x) \in K[x]$ be irreducible with leading coefficient 1, and suppose that it has a zero
in $L$. Let $a$ be one zero. Then $f(x)$ must be the minimal polynomial $m_a(x)$ of $a$. However,
$\deg(m_a(x)) \le |L : K| = 2$; hence, $f(x)$ is of degree 1 or 2. Since $f(x)$ has the zero $a$ in $L$, we can
write $f(x) = (x - a)h(x)$ in $L[x]$ with $\deg(h(x)) \le 1$. It follows that $f(x)$ splits into linear
factors in $L[x]$; therefore, $L$ is a normal extension.

Later, we will tie this result to group theory when we prove that a subgroup of in-
dex 2 must be a normal subgroup.

Example 8.2.4. As a first example of the lemma, consider the polynomial $f(x) = x^2 - 2$. In
$\mathbb{R}$, this splits as $(x - \sqrt{2})(x + \sqrt{2})$; hence, the field $\mathbb{Q}(\sqrt{2})$ is the splitting field of $f(x) = x^2 - 2$
over $\mathbb{Q}$. Therefore, $\mathbb{Q}(\sqrt{2})$ is a normal extension of $\mathbb{Q}$.

Example 8.2.5. As a second example, consider the polynomial $x^4 - 2$ in $\mathbb{Q}[x]$. The zeros
in $\mathbb{C}$ are

\[ 2^{1/4}, \quad 2^{1/4} i, \quad 2^{1/4} i^2, \quad 2^{1/4} i^3. \]

Hence,

\[ L = \mathbb{Q}(2^{1/4}, 2^{1/4} i, 2^{1/4} i^2, 2^{1/4} i^3) \]

is the splitting field of $x^4 - 2$ over $\mathbb{Q}$.
Now

\[ L = \mathbb{Q}(2^{1/4}, 2^{1/4} i, 2^{1/4} i^2, 2^{1/4} i^3) = \mathbb{Q}(2^{1/4}, i). \]

Therefore, we have

\[ |L : \mathbb{Q}| = |L : \mathbb{Q}(2^{1/4})| \, |\mathbb{Q}(2^{1/4}) : \mathbb{Q}|. \]

Since $x^4 - 2$ is irreducible over $\mathbb{Q}$, we have $|\mathbb{Q}(2^{1/4}) : \mathbb{Q}| = 4$. Since $i$ has degree 2 over
any real field, we have $|L : \mathbb{Q}(2^{1/4})| = 2$. Therefore, $L$ is a normal extension of $\mathbb{Q}(2^{1/4})$,
and $x^2 - \sqrt{2} \in \mathbb{Q}(\sqrt{2})[x]$ has the splitting field $\mathbb{Q}(2^{1/4})$.
Altogether, we have that $L|\mathbb{Q}(2^{1/4})$, $\mathbb{Q}(2^{1/4})|\mathbb{Q}(2^{1/2})$, $\mathbb{Q}(2^{1/2})|\mathbb{Q}$, and $L|\mathbb{Q}$ are normal
extensions. However, $\mathbb{Q}(2^{1/4})|\mathbb{Q}$ is not normal since $2^{1/4}$ is a zero of $x^4 - 2$, but $\mathbb{Q}(2^{1/4})$
does not contain all the zeros of $x^4 - 2$.
Hence, we get the following Figure 8.1.

Figure 8.1: Normal extensions.



8.3 Exercises
1. Determine the splitting field of $f(x) \in \mathbb{Q}[x]$ and its degree over $\mathbb{Q}$ in the following
cases:
(a) $f(x) = x^4 - p$, where $p$ is a prime.
(b) $f(x) = x^p - 2$, where $p$ is a prime.
2. Determine the degree of the splitting field of the polynomial $x^4 + 4$ over $\mathbb{Q}$. Determine
the splitting field of $x^6 + 4x^4 + 4x^2 + 3$ over $\mathbb{Q}$.
3. For each $a \in \mathbb{Z}$, let $f_a(x) = x^3 - ax^2 + (a - 3)x + 1 \in \mathbb{Q}[x]$ be given:
(a) $f_a$ is irreducible over $\mathbb{Q}$ for each $a \in \mathbb{Z}$.
(b) If $b \in \mathbb{R}$ is a zero of $f_a$, then also $(1 - b)^{-1}$ and $(b - 1)b^{-1}$ are zeros of $f_a$.
(c) Determine the splitting field $L$ of $f_a(x)$ over $\mathbb{Q}$ and its degree $|L : \mathbb{Q}|$.
4. Let $K$ be a field and $f(x) \in K[x]$ a polynomial of degree $n$. Let $L$ be a splitting field
of $f(x)$. Show the following:
(a) If $a_1, \ldots, a_n \in L$ are the zeros of $f$, then $|K(a_1, \ldots, a_t) : K| \le n \cdot (n - 1) \cdots (n - t + 1)$
for each $t$ with $1 \le t \le n$.
(b) $L$ over $K$ is of degree at most $n!$.
(c) If $f(x)$ is irreducible over $K$, then $n$ divides $|L : K|$.
9 Groups, Subgroups and Examples
9.1 Groups, Subgroups and Isomorphisms
Recall from Chapter 1 that the three most commonly studied algebraic structures are
groups, rings and fields. We have now looked rather extensively at rings and fields. In
this chapter, we consider the basic concepts of group theory. Groups arise in many differ-
ent areas of mathematics. For example they arise in geometry as groups of congruence
motions, and in topology as groups of various types of continuous functions. Later in
this book, they will appear in Galois theory as groups of automorphisms of fields. First,
we recall the definition of a group given previously in Chapter 1.

Definition 9.1.1. A group G is a set with one binary operation, which we will denote by
multiplication, such that
(1) The operation is associative; that is, (g1 g2 )g3 = g1 (g2 g3 ) for all g1 , g2 , g3 ∈ G.
(2) There exists an identity for this operation; that is, an element 1 such that 1g = g and
g1 = g for each g ∈ G.
(3) Each $g \in G$ has an inverse for this operation; that is, for each $g$, there exists a $g^{-1}$
with the property that $gg^{-1} = 1$ and $g^{-1}g = 1$.

If, in addition, the operation is commutative; that is, g1 g2 = g2 g1 for all g1 , g2 ∈ G, the
group G is called an Abelian group.
The order of G, denoted |G|, is the number of elements in the group G. If |G| < ∞,
G is a finite group, otherwise, it is an infinite group.

It follows easily from the definition that the identity is unique, and that each element
has a unique inverse.

Lemma 9.1.2. If $G$ is a group, then there is a unique identity. Furthermore, if $g \in G$, its
inverse is unique. Finally, if $g_1, g_2 \in G$, then $(g_1 g_2)^{-1} = g_2^{-1} g_1^{-1}$.

Proof. Suppose that 1 and e are both identities for G. Then 1e = e since 1 is an identity,
and 1e = 1 since e is an identity. Therefore, 1 = e, and there is only one identity.
Next suppose that $g \in G$ and that $g_1$ and $g_2$ are both inverses for $g$. Then

g1 gg2 = (g1 g)g2 = 1g2 = g2

since g1 g = 1. On the other hand,

g1 gg2 = g1 (gg2 ) = g1 1 = g1

since gg2 = 1. It follows that g1 = g2 , and g has a unique inverse.


Finally, consider

\[ (g_1 g_2)(g_2^{-1} g_1^{-1}) = g_1 (g_2 g_2^{-1}) g_1^{-1} = g_1 1 g_1^{-1} = g_1 g_1^{-1} = 1. \]

https://doi.org/10.1515/9783111142524-009

Therefore, $g_2^{-1} g_1^{-1}$ is an inverse for $g_1 g_2$, and since inverses are unique, it is the inverse
of the product.

Groups most often arise as permutations on a set. We will see this, as well as other
specific examples of groups, in the next sections.
Finite groups can be completely described by their group tables or multiplication
tables. These are sometimes called Cayley tables. In general, let G = {g1 , . . . , gn } be a
group, then the multiplication table of G is

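\[
\begin{array}{c|cccccc}
       & g_1 & g_2 & \cdots & g_j & \cdots & g_n \\ \hline
g_1    &     &     &        & \vdots &      &     \\
g_2    &     &     &        & \vdots &      &     \\
\vdots &     &     &        & \vdots &      &     \\
g_i    & \cdots & \cdots & \cdots & g_i g_j & &   \\
\vdots &     &     &        &     &        &     \\
g_n    &     &     &        &     &        &
\end{array}
\]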

The entry in the row of gi ∈ G and column of gj ∈ G is the product (in that order)
gi gj in G.
Groups satisfy the cancellation law for multiplication.

Lemma 9.1.3. If G is a group and a, b, c ∈ G with ab = ac or ba = ca, then b = c.

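Proof. Suppose that $ab = ac$. Then $a$ has an inverse $a^{-1}$, so we have

\[ a^{-1}(ab) = a^{-1}(ac). \]

From the associativity of the group operation, we then have

\[ (a^{-1}a)b = (a^{-1}a)c \implies 1 \cdot b = 1 \cdot c \implies b = c. \]

The case $ba = ca$ is handled in the same way, multiplying by $a^{-1}$ on the right.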

A consequence of Lemma 9.1.3 is that each row and each column in a group table is
just a permutation of the group elements. That is, each group element appears exactly
once in each row and each column.
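
The permutation property of rows and columns is easy to test by machine. The following sketch is illustrative only: the helper names `cayley_table` and `is_latin_square` are introduced here and are not from the text. It builds the table of $(\mathbb{Z}_6, +)$ and, for contrast, the table of multiplication modulo 6, which is not a group operation on $\{0, \ldots, 5\}$.

```python
def cayley_table(elements, op):
    """Return the table whose (i, j) entry is op(elements[i], elements[j])."""
    return [[op(g, h) for h in elements] for g in elements]


def is_latin_square(table, elements):
    """Check that every row and every column is a permutation of elements."""
    target = sorted(elements)
    rows_ok = all(sorted(row) == target for row in table)
    cols_ok = all(sorted(col) == target for col in zip(*table))
    return rows_ok and cols_ok


n = 6
Zn = list(range(n))

# (Z_6, +) is a group, so each element appears once per row and column.
add_table = cayley_table(Zn, lambda a, b: (a + b) % n)
print(is_latin_square(add_table, Zn))   # True

# Multiplication mod 6 on {0, ..., 5} is not a group operation (0 has no inverse).
mul_table = cayley_table(Zn, lambda a, b: (a * b) % n)
print(is_latin_square(mul_table, Zn))   # False
```

The converse does not hold in general: a table with this row and column property need not come from an associative operation.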
A subset H ⊂ G is a subgroup of G if H is also a group under the same operation
as G. As for rings and fields, a subset of a group is a subgroup if it is nonempty and
closed under both the group operation and inverses.

Lemma 9.1.4. 1. A subset $H \subset G$ is a subgroup if $H \neq \emptyset$ and $H$ is closed under the
operation and inverses. That is, if $a, b \in H$, then $ab \in H$ and $a^{-1}, b^{-1} \in H$.
2. A nonempty subset $H$ of a group $G$ is a subgroup if and only if $ab^{-1} \in H$ for all $a, b \in H$.
In addition, if $G$ is finite, then a nonempty subset $H$ is a subgroup if and only if $ab \in H$ for all $a, b \in H$.

We leave the proof of this to the exercises.
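
For a finite group given by explicit operations, the one-step criterion of part 2 translates directly into a test. The sketch below assumes the group is $(\mathbb{Z}_{12}, +)$; the function name `is_subgroup` and its arguments are illustrative choices, not notation from the text.

```python
def is_subgroup(H, op, inverse):
    """One-step test: H is nonempty and op(a, inverse(b)) lies in H for all a, b in H."""
    H = set(H)
    return bool(H) and all(op(a, inverse(b)) in H for a in H for b in H)


n = 12


def add(a, b):
    """Addition in Z_12."""
    return (a + b) % n


def neg(a):
    """Additive inverse in Z_12."""
    return (-a) % n


print(is_subgroup({0, 3, 6, 9}, add, neg))   # True
print(is_subgroup({0, 3, 5}, add, neg))      # False, since 3 - 5 = 10 is missing
print(is_subgroup(set(), add, neg))          # False, the empty set is excluded
```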



Let $G$ be a group and $g \in G$; as with numbers, we denote by $g^n$, $n \in \mathbb{N}$, the product
of $g$ taken $n$ times. A negative exponent will indicate the inverse of the positive exponent;
that is, $g^{-n} = (g^n)^{-1}$. As usual, let $g^0 = 1$. Clearly, group exponentiation will satisfy the standard laws of
exponents. Now consider the set

\[ H = \{1 = g^0, g, g^{-1}, g^2, g^{-2}, \ldots\} \]

of all powers of $g$. We will denote this by $\langle g \rangle$.

Lemma 9.1.5. If G is a group and g ∈ G, then ⟨g⟩ forms a subgroup of G called the cyclic
subgroup generated by g. ⟨g⟩ is Abelian, even if G is not.

Proof. If $g \in G$, then $g \in \langle g \rangle$; hence, $\langle g \rangle$ is nonempty. Suppose then that $a = g^n$, $b = g^m$
are elements of $\langle g \rangle$. Then $ab = g^n g^m = g^{n+m} \in \langle g \rangle$, so $\langle g \rangle$ is closed under the
group operation. Furthermore, $a^{-1} = (g^n)^{-1} = g^{-n} \in \langle g \rangle$, so $\langle g \rangle$ is closed under inverses.
Therefore, $\langle g \rangle$ is a subgroup.
Finally, $ab = g^n g^m = g^{n+m} = g^{m+n} = g^m g^n = ba$; hence, $\langle g \rangle$ is Abelian.

Suppose that $g \in G$ and $g^m = 1$ for some positive integer $m$. Then let $n$ be the smallest
positive integer such that $g^n = 1$. It follows that the elements $1, g, g^2, \ldots, g^{n-1}$ are
all distinct, but for any other power $g^k$, we have $g^k = g^t$ for some $t = 0, 1, \ldots, n - 1$ (see
exercises). The cyclic subgroup generated by $g$ then has order $n$, and we say that $g$ has
order $n$, which we denote by $o(g) = n$. If no such $n$ exists, we say that $g$ has infinite order.
We will look more deeply at cyclic groups and subgroups in Section 9.5.
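
The order of an element of a finite group can be found by computing successive powers until the identity reappears. The following sketch works in the multiplicative group of nonzero residues modulo 7; the helper names `element_order` and `cyclic_subgroup` are introduced here for illustration, and both loops terminate only under the assumption that $g$ has finite order.

```python
def element_order(g, op, identity):
    """Smallest n >= 1 with g^n = identity (assumes g has finite order)."""
    power, n = g, 1
    while power != identity:
        power = op(power, g)
        n += 1
    return n


def cyclic_subgroup(g, op, identity):
    """List 1, g, g^2, ..., g^(n-1), where n is the order of g."""
    powers, power = [identity], g
    while power != identity:
        powers.append(power)
        power = op(power, g)
    return powers


def mul7(a, b):
    """Multiplication of residues modulo 7."""
    return (a * b) % 7


print(element_order(2, mul7, 1))     # 3, since 2^3 = 8 = 1 mod 7
print(cyclic_subgroup(3, mul7, 1))   # [1, 3, 2, 6, 4, 5]
```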
We introduce one more concept before looking at examples.

Definition 9.1.6. If $G$ and $H$ are groups, then a mapping $f : G \to H$ is a (group) homomorphism
if $f(g_1 g_2) = f(g_1) f(g_2)$ for any $g_1, g_2 \in G$. If $f$ is also a bijection, then it is an
isomorphism.

As with rings and fields, we say that two groups G and H are isomorphic, denoted
by G ≅ H, if there exists an isomorphism f : G → H. This means that, abstractly, G and
H have exactly the same algebraic structure.
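
As a concrete instance of the definition, the sketch below checks that $k \mapsto 3^k \bmod 7$ is an isomorphism from the additive group $\mathbb{Z}_6$ onto the multiplicative group of nonzero residues modulo 7. The choice of the base 3 and the names `f`, `is_homomorphism`, and `is_bijection` are assumptions made for this example only.

```python
def f(k):
    """The map k -> 3^k mod 7, from (Z_6, +) to the nonzero residues mod 7."""
    return pow(3, k, 7)


domain = range(6)

# Homomorphism property: f(j + k) = f(j) f(k), with + taken mod 6 and the product mod 7.
is_homomorphism = all(f((j + k) % 6) == (f(j) * f(k)) % 7 for j in domain for k in domain)

# Bijection: the six images are exactly the residues 1, ..., 6.
is_bijection = sorted(f(k) for k in domain) == [1, 2, 3, 4, 5, 6]

print(is_homomorphism, is_bijection)   # True True
```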

9.2 Examples of Groups


As already mentioned, groups arise in many diverse areas of mathematics. In this sec-
tion and the next, we present specific examples of groups.
First of all, any ring or field under addition forms an Abelian group. Hence, for
example, (ℤ, +), (ℚ, +), (ℝ, +), (ℂ, +), where ℤ, ℚ, ℝ, ℂ are respectively the integers, the
rationals, the reals, and the complex numbers; all are infinite Abelian groups. If ℤn is
the modular ring ℤ/nℤ, then for any natural number n, (ℤn , +) forms a finite Abelian
group. In Abelian groups, the group operation is often denoted by + and the identity
element by 0 (zero).

In a field $K$, the nonzero elements are all invertible and form a group under multiplication.
This is called the multiplicative group of the field $K$ and is usually denoted
by $K^*$. Since multiplication in a field is commutative, the multiplicative group of a field
is an Abelian group. Hence, $\mathbb{Q}^*$, $\mathbb{R}^*$, $\mathbb{C}^*$ are all infinite Abelian groups, whereas if $p$ is
a prime, $\mathbb{Z}_p^*$ forms a finite Abelian group. Recall that if $p$ is a prime, then the modular
ring $\mathbb{Z}_p$ is a field.
Within $\mathbb{Q}^*$, $\mathbb{R}^*$, $\mathbb{C}^*$, there are certain multiplicative subgroups. Since the positive
rationals $\mathbb{Q}^+$ and the positive reals $\mathbb{R}^+$ are closed under multiplication and inverse, they
form subgroups of $\mathbb{Q}^*$ and $\mathbb{R}^*$, respectively. In $\mathbb{C}^*$, if we consider the set of all complex
numbers $z$ with $|z| = 1$, these form a multiplicative subgroup. Further, within this subgroup,
if we consider the set of $n$-th roots of unity $z$ (that is, $z^n = 1$) for a fixed $n$, this
forms a subgroup, this time of finite order.
The multiplicative group of a field is a special case of the unit group of a ring. If R
is a ring with identity, recall that a unit is an element of R with a multiplicative inverse.
Hence, in ℤ, the only units are ±1, whereas in any field every nonzero element is a unit.

Lemma 9.2.1. If R is a ring with identity, then the set of units in R forms a group under
multiplication called the unit group of R, and is denoted by U(R). If R is a field, then
U(R) = R∗ .

Proof. Let $R$ be a ring with identity. Then the identity 1 itself is a unit, so $1 \in U(R)$; hence,
$U(R)$ is nonempty. If $e \in R$ is a unit, then it has a multiplicative inverse $e^{-1}$. Clearly then,
the multiplicative inverse $e^{-1}$ itself has an inverse, namely $e$, so $e^{-1} \in U(R)$ whenever $e \in U(R)$.
Since multiplication in $R$ is associative, to show that $U(R)$ is a group, it remains to show that it is
closed under product.
Let $e_1, e_2 \in U(R)$. Then there exist $e_1^{-1}, e_2^{-1}$. It follows that $e_2^{-1} e_1^{-1}$ is an inverse for $e_1 e_2$.
Hence, $e_1 e_2$ is also a unit, and $U(R)$ is closed under product. Therefore, for any ring $R$
with identity, $U(R)$ forms a multiplicative group.
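
For the modular rings $\mathbb{Z}_n$, the unit group can be listed by brute force straight from the definition of a unit. This is a small sketch with illustrative helper names (`unit_group`, `is_closed_under_product`), not an efficient algorithm.

```python
def unit_group(n):
    """Units of Z_n: residues a for which some b satisfies a*b = 1 (mod n)."""
    return [a for a in range(n) if any((a * b) % n == 1 for b in range(n))]


def is_closed_under_product(units, n):
    """Check that the product of any two listed units is again a listed unit."""
    return all((a * b) % n in units for a in units for b in units)


print(unit_group(12))   # [1, 5, 7, 11]
print(unit_group(7))    # [1, 2, 3, 4, 5, 6]
print(is_closed_under_product(unit_group(12), 12))   # True
```

For the prime modulus 7 every nonzero residue is a unit, in line with the statement that $U(R) = R^*$ when $R$ is a field.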

To present examples of non-Abelian groups, we turn to matrices. If K is a field, we


let

GL(n, K) = {n × n matrices over K with nonzero determinant}

and

SL(n, K) = {n × n matrices over K with determinant one}.

Lemma 9.2.2. If K is a field, then for n ≥ 2, GL(n, K) forms a non-Abelian group under
matrix multiplication, and SL(n, K) forms a subgroup.
GL(n, K) is called the n-dimensional general linear group over K, whereas SL(n, K) is
called the n-dimensional special linear group over K.

Proof. Recall that for two n × n matrices A and B with n ≥ 2 over a field, we have
det(AB) = det(A) det(B) where det is the determinant.

Now for any field, the $n \times n$ identity matrix $I$ has determinant 1; hence, $I \in GL(n, K)$.
Since the determinant is multiplicative, the product of two matrices with nonzero determinant
has nonzero determinant, so $GL(n, K)$ is closed under product. Furthermore,
over a field $K$, if $A$ is an invertible matrix, then $\det(A^{-1}) = \frac{1}{\det(A)}$.
Therefore, if $A$ has nonzero determinant, so does its inverse. It follows that $GL(n, K)$
contains the inverse of each of its elements. Since matrix multiplication is associative, it follows
that $GL(n, K)$ forms a group. It is non-Abelian since, for $n \ge 2$, matrix multiplication
is noncommutative in general. $SL(n, K)$ forms a subgroup of $GL(n, K)$ because $\det(I) = 1$,
$\det(AB) = 1$ if $\det(A) = \det(B) = 1$, and $\det(A^{-1}) = 1$ if
$\det(A) = 1$.
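
Over a finite field the groups $GL(2, K)$ and $SL(2, K)$ are small enough to enumerate. The sketch below takes $K = \mathbb{Z}_3$ as an assumed example and stores a matrix as a pair of rows; the names `P`, `det`, and `matmul` are local to this illustration.

```python
from itertools import product

P = 3  # work over the field Z_3; this choice is only an example


def det(A):
    """Determinant of a 2x2 matrix over Z_P."""
    (a, b), (c, d) = A
    return (a * d - b * c) % P


def matmul(A, B):
    """Product of two 2x2 matrices over Z_P."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return (((a * e + b * g) % P, (a * f + b * h) % P),
            ((c * e + d * g) % P, (c * f + d * h) % P))


matrices = [((a, b), (c, d)) for a, b, c, d in product(range(P), repeat=4)]
GL = [A for A in matrices if det(A) != 0]
SL = [A for A in GL if det(A) == 1]

det_multiplicative = all(det(matmul(A, B)) == (det(A) * det(B)) % P for A in GL for B in GL)
GL_closed = all(det(matmul(A, B)) != 0 for A in GL for B in GL)
SL_closed = all(det(matmul(A, B)) == 1 for A in SL for B in SL)
non_abelian = any(matmul(A, B) != matmul(B, A) for A in GL for B in GL)

print(len(GL), len(SL))   # 48 24
print(det_multiplicative, GL_closed, SL_closed, non_abelian)   # True True True True
```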

Groups play an important role in geometry. In any metric geometry, an isometry is
a mapping that preserves distance. To understand a geometry, one must understand the
group of isometries. We look briefly at the Euclidean geometry of the plane $\mathcal{E}^2$.
An isometry or congruence motion of $\mathcal{E}^2$ is a transformation or bijection $T$ of $\mathcal{E}^2$ that
preserves distance; that is, $d(a, b) = d(T(a), T(b))$ for all points $a, b \in \mathcal{E}^2$.

Theorem 9.2.3. The set of congruence motions of $\mathcal{E}^2$ forms a group called the Euclidean
group. We denote the Euclidean group by $\mathcal{E}$.

Proof. The identity map I is clearly an isometry, and since composition of mappings is
associative, we need only to show that the product of isometries is an isometry, and that
the inverse of an isometry is an isometry.
Let T, U be isometries. Then d(a, b) = d(T(a), T(b)) and d(a, b) = d(U(a), U(b)) for
any points a, b. Now consider

d(TU(a), TU(b)) = d(T(U(a)), T(U(b))) = d(U(a), U(b))

since T is an isometry. However,

d(U(a), U(b)) = d(a, b)

since U is an isometry. Combining these, we have that TU is also an isometry.


Consider $T^{-1}$ and points $a, b$. Then

\[ d(T^{-1}(a), T^{-1}(b)) = d(TT^{-1}(a), TT^{-1}(b)) \]

since $T$ is an isometry. But $TT^{-1} = I$; hence,

\[ d(T^{-1}(a), T^{-1}(b)) = d(TT^{-1}(a), TT^{-1}(b)) = d(a, b). \]

Therefore, $T^{-1}$ is also an isometry; hence, $\mathcal{E}$ is a group.

One of the major results concerning ℰ is the following. We refer to [41], [42], [27],
and [35] for a more thorough treatment.

Theorem 9.2.4. If $T \in \mathcal{E}$, then $T$ is either a translation, rotation, reflection, or glide reflection.
The set of translations and rotations forms a subgroup.

Proof. We outline a brief proof. If T is an isometry and T fixes the origin (0, 0), then T is
a linear mapping. It follows that T is a rotation or a reflection. If T does not fix the origin,
then there is a translation T0 such that T0 T fixes the origin. This gives translations and
glide reflections. In the exercises, we expand out more of the proof.

If $D$ is a geometric figure in $\mathcal{E}^2$, such as a triangle or square, then a symmetry of
$D$ is a congruence motion $T : \mathcal{E}^2 \to \mathcal{E}^2$ that leaves $D$ in place. However, it may move
the individual elements of $D$. For example, a rotation about the center of a circle is a
symmetry of the circle.

Lemma 9.2.5. If $D$ is a geometric figure in $\mathcal{E}^2$, then the set of symmetries of $D$ forms a
subgroup of $\mathcal{E}$ called the symmetry group of $D$, denoted by $\mathrm{Sym}(D)$.

Proof. We show that Sym(D) is a subgroup of ℰ . The identity map I fixes D, that is, I ∈
Sym(D), and thus Sym(D) is nonempty. Let T, U ∈ Sym(D). Then T maps D to D, and so
does U. It follows directly that so does the composition TU; hence, TU ∈ Sym(D). If T
maps D to D, then certainly the inverse does.

Example 9.2.6. Let T be an equilateral triangle. Then there are exactly six symmetries
of T (see exercises). These are as follows:
– $I$ is the identity,
– $r$ is a rotation of $120^\circ$ around the center of $T$,
– $r^2$ is a rotation of $240^\circ$ around the center of $T$,
– $f$ is a reflection over the perpendicular bisector of one of the sides,
– $fr$ is the composition of $f$ and $r$, and
– $fr^2$ is the composition of $f$ and $r^2$.

The group Sym(T) is called the dihedral group D3 . In the next section, we will see that it
is isomorphic to S3 , the symmetric group on 3 symbols.

9.3 Permutation Groups


Groups most often appear as groups of transformations or permutations on a set. In this
section, we will take a short look at permutation groups, and then examine them more
deeply in Chapter 11. We recall some ideas, first introduced in Chapter 7, in relation to
the proof of the fundamental theorem of algebra.

Definition 9.3.1. If A is a set, a permutation on A is a one-to-one mapping of A onto itself.


We denote the set of all permutations on A by SA .

Theorem 9.3.2. For any set A, SA forms a group under composition, called the symmetric
group on A. If |A| > 2, then SA is non-Abelian. Furthermore, if A, B have the same cardi-
nality, then SA ≅ SB .

Proof. If SA is the set of all permutations on the set A, we must show that composition
is an operation on SA that is associative, and has an identity and inverses. Let f , g ∈ SA .
Then f , g are one-to-one mappings of A onto itself.
Consider f ∘ g : A → A. If f ∘ g(a1 ) = f ∘ g(a2 ), then f (g(a1 )) = f (g(a2 )), and g(a1 ) =
g(a2 ), since f is one-to-one. But then a1 = a2 since g is one-to-one.
If a ∈ A, there exists a1 ∈ A with f (a1 ) = a since f is onto. Then there exists a2 ∈ A
with g(a2 ) = a1 since g is onto. Putting these together, f (g(a2 )) = a; therefore, f ∘g is onto.
Therefore, f ∘ g is also a permutation, and composition gives a valid binary operation
on SA .
The identity function 1(a) = a for all a ∈ A will serve as the identity for SA , whereas
the inverse function for each permutation will be the inverse. Such unique inverse func-
tions exist since each permutation is a bijection.
Finally, composition of functions is always associative; therefore, SA forms a group.
Suppose that $|A| > 2$. Then $A$ has at least 3 elements. Call them $a_1, a_2, a_3$. Consider
the two permutations $f$ and $g$, which fix (leave unchanged) all of $A$ except $a_1, a_2, a_3$, and on
these three elements:

\[ f(a_1) = a_2, \quad f(a_2) = a_3, \quad f(a_3) = a_1, \]
\[ g(a_1) = a_2, \quad g(a_2) = a_1, \quad g(a_3) = a_3. \]

Then under composition

f (g(a1 )) = a3 , f (g(a2 )) = a2 , f (g(a3 )) = a1 ,

whereas

g( f (a1 )) = a1 , g( f (a2 )) = a3 , g( f (a3 )) = a2 .

Therefore, f ∘ g ≠ g ∘ f ; hence, SA is not Abelian.


If A, B have the same cardinality, then there exists a bijection σ : A → B. Define a
map F : SA → SB in the following manner: if f ∈ SA , let F(f ) be the permutation on B,
given by F(f )(b) = σ(f (σ −1 (b))). It is straightforward to verify that F is an isomorphism
(see the exercises).
If A1 ⊂ A, then those permutations on A that map A1 to A1 form a subgroup of SA
called the stabilizer of A1 , denoted as stab(A1 ). We leave the proof to the exercises.

Lemma 9.3.3. If A1 ⊂ A, then stab(A1 ) = { f ∈ SA : f : A1 → A1 } forms a subgroup of SA .

A permutation group is any subgroup of SA for some set A. We now look at finite
permutation groups. Let A be a finite set, say A = {a1 , a2 , . . . , an }. Then each f ∈ SA can
be pictured as

\[ f = \begin{pmatrix} a_1 & \cdots & a_n \\ f(a_1) & \cdots & f(a_n) \end{pmatrix}. \]

For $a_1$, there are $n$ choices for $f(a_1)$. For $a_2$, there are only $n - 1$ choices since $f$ is one-to-one.
This continues down to only one choice for $a_n$. Using the multiplication principle,
the number of choices for $f$, and therefore the size of $S_A$, is

\[ n(n - 1) \cdots 1 = n!. \]

We have thus proved the following theorem.

Theorem 9.3.4. If |A| = n then |SA | = n!.

For a set $A$ with $n$ elements, we denote $S_A$ by $S_n$, called the symmetric group on $n$
symbols.
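
Both the count $|S_n| = n!$ and the failure of commutativity for $n \ge 3$ can be confirmed by enumeration for small $n$. In the sketch below a permutation of $\{0, \ldots, n-1\}$ is stored as the tuple of its images; this encoding and the helper names are choices made for the example.

```python
from itertools import permutations
from math import factorial


def symmetric_group(n):
    """All permutations of {0, ..., n-1}, each stored as a tuple of images."""
    return list(permutations(range(n)))


def compose(f, g):
    """Return f o g, that is, i -> f(g(i))."""
    return tuple(f[g[i]] for i in range(len(g)))


def is_abelian(group):
    """Check whether every pair of elements commutes under composition."""
    return all(compose(f, g) == compose(g, f) for f in group for g in group)


for n in range(1, 6):
    group = symmetric_group(n)
    print(n, len(group), factorial(n), is_abelian(group))
```

The loop prints matching counts in each row and reports `True` for commutativity only when $n \le 2$.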

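Example 9.3.5. Write down the six elements of $S_3$ and give the multiplication table for
the group.
Name the three elements 1, 2, 3. The six elements of $S_3$ are then as follows:

\[
1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
a = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix},
\]
\[
c = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad
d = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad
e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}.
\]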

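The multiplication table for $S_3$ can be written down directly by doing the required
composition. For example,

\[
ac = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = d.
\]

To see this, note that $a : 1 \to 2, 2 \to 3, 3 \to 1$; $c : 1 \to 2, 2 \to 1, 3 \to 3$, and so, applying $c$ first
and then $a$, we get $ac : 1 \to 3, 2 \to 2, 3 \to 1$.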
It is somewhat easier to construct the multiplication table if we make some observations.
First, $a^2 = b$ and $a^3 = 1$. Next, $c^2 = 1$, $d = ac$, $e = a^2 c$ and, finally, $ac = ca^2$.
From these relations, the following multiplication table can be constructed:

\[
\begin{array}{c|cccccc}
      & 1     & a     & a^2   & c     & ac    & a^2 c \\ \hline
1     & 1     & a     & a^2   & c     & ac    & a^2 c \\
a     & a     & a^2   & 1     & ac    & a^2 c & c     \\
a^2   & a^2   & 1     & a     & a^2 c & c     & ac    \\
c     & c     & a^2 c & ac    & 1     & a^2   & a     \\
ac    & ac    & c     & a^2 c & a     & 1     & a^2   \\
a^2 c & a^2 c & ac    & c     & a^2   & a     & 1
\end{array}
\]

To see this, consider, for example, $(ac)a^2 = a(ca^2) = a(ac) = a^2 c$.


More generally, we can say that $S_3$ has a presentation given by

\[ S_3 = \langle a, c \,;\, a^3 = c^2 = 1, \; ac = ca^2 \rangle. \]

By this, we mean that $S_3$ is generated by $a, c$, or that $S_3$ has generators $a, c$, and
the whole group and its multiplication table can be generated by using the relations
$a^3 = c^2 = 1$, $ac = ca^2$.
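
The relations and the resulting table for $S_3$ can be reproduced by direct composition. In this sketch the symbols 1, 2, 3 are stored as 0, 1, 2 and a product such as $ac$ means "apply $c$ first, then $a$", matching the computation of $ac$ in the example; the helper names `compose` and `power` are illustrative.

```python
def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


def power(f, k):
    """Return the k-fold composition of f with itself (the identity for k = 0)."""
    result = tuple(range(len(f)))
    for _ in range(k):
        result = compose(result, f)
    return result


# Symbols 1, 2, 3 are stored as 0, 1, 2.
one = (0, 1, 2)
a = (1, 2, 0)      # 1 -> 2, 2 -> 3, 3 -> 1
c = (1, 0, 2)      # 1 -> 2, 2 -> 1, 3 -> 3

a2 = power(a, 2)
ac = compose(a, c)
a2c = compose(a2, c)

# The defining relations a^3 = c^2 = 1 and ac = ca^2.
relations_hold = power(a, 3) == one and power(c, 2) == one and ac == compose(c, a2)

elements = [one, a, a2, c, ac, a2c]
names = ["1", "a", "a^2", "c", "ac", "a^2c"]
table = [[names[elements.index(compose(x, y))] for y in elements] for x in elements]

print(relations_hold)
for name, row in zip(names, table):
    print(name.ljust(5), " ".join(entry.ljust(5) for entry in row))
```

Each printed row lists the products of the row label with $1, a, a^2, c, ac, a^2c$, in that order.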

A theorem of Cayley actually shows that every group is a permutation group.
More precisely, a group $G$ is a permutation group on the group $G$ itself considered as a set. This result,
however, does not give much information about the group.

Theorem 9.3.6 (Cayley’s theorem). Let G be a group. Consider the set of elements of G.
Then the group G is a permutation group on the set G; that is, G is a subgroup of SG .

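Proof. We show that to each $g \in G$, we can associate a permutation of the set $G$. If $g \in G$,
let $\pi_g$ be the map given by

\[ \pi_g : g_1 \mapsto g g_1 \quad \text{for each } g_1 \in G. \]

It is straightforward to show that each $\pi_g$ is a permutation on $G$; its inverse is $\pi_{g^{-1}}$.
Moreover, $\pi_{gh} = \pi_g \circ \pi_h$ for all $g, h \in G$ by associativity, and $\pi_g(1) = g$ shows that
$g \mapsto \pi_g$ is one-to-one. Hence, $g \mapsto \pi_g$ is an isomorphism of $G$ onto a subgroup of $S_G$.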
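
The construction in the proof can be carried out explicitly for a small group. The sketch below takes $S_3$ as the sample group (an assumption for this example) and records each $\pi_g$ as a permutation of the positions $0, \ldots, 5$ of a fixed listing of the group elements.

```python
from itertools import permutations


def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


G = list(permutations(range(3)))           # S_3 as the sample group
index = {g: i for i, g in enumerate(G)}    # position of each element in the list G


def pi(g):
    """Left multiplication by g, as a permutation of the positions 0, ..., |G|-1."""
    return tuple(index[compose(g, h)] for h in G)


images = [pi(g) for g in G]

# Each pi_g is a permutation of the six positions.
each_is_permutation = all(sorted(p) == list(range(len(G))) for p in images)

# g -> pi_g is one-to-one and respects the group operation.
injective = len(set(images)) == len(G)
homomorphism = all(pi(compose(g, h)) == compose(pi(g), pi(h)) for g in G for h in G)

print(each_is_permutation, injective, homomorphism)   # True True True
```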

9.4 Cosets and Lagrange’s Theorem


In this section, given a group G and a subgroup H, we define an equivalence relation
on G. The equivalence classes all have the same size and are called the (left) or (right)
cosets of H in G.

Definition 9.4.1. Let $G$ be a group and $H \subset G$ a subgroup. For $a, b \in G$, define $a \sim b$ if
$a^{-1} b \in H$.

Lemma 9.4.2. Let G be a group and H ⊂ G a subgroup. Then the relation defined above is
an equivalence relation on G. The equivalence classes all have the form aH for a ∈ G and
are called the left cosets of H in G. Clearly, G is a disjoint union of its left cosets.

Proof. Let us show, first of all, that this is an equivalence relation. Now $a \sim a$ since
$a^{-1} a = e \in H$. Therefore, the relation is reflexive. Furthermore, $a \sim b$ implies $a^{-1} b \in H$,
but since $H$ is a subgroup of $G$, we have $b^{-1} a = (a^{-1} b)^{-1} \in H$. Thus, $b \sim a$. Therefore,
the relation is symmetric. Finally, suppose that $a \sim b$ and $b \sim c$. Then $a^{-1} b \in H$, and
$b^{-1} c \in H$. Since $H$ is a subgroup, $a^{-1} b \cdot b^{-1} c = a^{-1} c \in H$; hence, $a \sim c$. Therefore, the
relation is transitive and, hence, is an equivalence relation.
For $a \in G$, the equivalence class is

\[ [a] = \{g \in G : a \sim g\} = \{g \in G : a^{-1} g \in H\}. \]



If $a^{-1} g = h \in H$, then $g = ah \in aH$; conversely, each $ah$ with $h \in H$ satisfies
$a^{-1}(ah) = h \in H$. It follows that the equivalence class for $a \in G$ is precisely the set

\[ aH = \{g \in G : g = ah \text{ for some } h \in H\}. \]

These classes, $aH$, are called left cosets of $H$, and since they are equivalence classes,
they partition $G$. This means that every element of $G$ is in one and only one left coset. In
particular, $bH = H = eH$ if and only if $b \in H$.

If $aH$ is a left coset, then we call the element $a$ a coset representative. A subset
$R \subset G$ that contains exactly one representative from each of the distinct left cosets of $H$
is called a (left) transversal of $H$ in $G$.


One could define another equivalence relation by defining $a \sim b$ if and only if
$ba^{-1} \in H$. Again, this can be shown to be an equivalence relation on $G$, and the equivalence
classes here are sets of the form

\[ Ha = \{g \in G : g = ha \text{ for some } h \in H\}, \]

called right cosets of $H$. Also, of course, $G$ is the (disjoint) union of distinct right cosets.
It is easy to see that any two left (right) cosets have the same order (number of
elements). To demonstrate this, consider the mapping $aH \to bH$ via $ah \mapsto bh$, where
$h \in H$. It is not hard to show that this mapping is 1–1 and onto (see exercises). Thus, we
have $|aH| = |bH|$. (This is also true for right cosets and can be established in a similar
manner.) Letting $b \in H$ in the above discussion, so that $bH = H$, we see $|aH| = |H|$ for any $a \in G$. That
is, each left or right coset has exactly as many elements as the subgroup $H$.
One can also see that the collection $\{aH\}$ of all distinct left cosets has the same number
of elements as the collection $\{Ha\}$ of all distinct right cosets. In other words, the
number of left cosets equals the number of right cosets (this number may be infinite).
For example, consider the map $f : aH \mapsto Ha^{-1}$ from the set of left cosets to the set of right cosets.
This mapping is well defined; for if $aH = bH$, then $b = ah$, where $h \in H$. Thus,
$f(bH) = Hb^{-1} = Hh^{-1}a^{-1} = Ha^{-1} = f(aH)$. It is not
hard to show that this mapping is 1–1 and onto (see exercises). Hence, the number of left
cosets equals the number of right cosets.
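
The statements about cosets can be observed in a small non-Abelian example. The sketch below takes $G = S_3$ and the two-element subgroup $H$ generated by a transposition; these choices, and the helper names, are assumptions made for illustration. It shows that left and right cosets are equally numerous and equally large, although the two partitions of $G$ need not coincide.

```python
from itertools import permutations


def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


G = list(permutations(range(3)))     # S_3
H = [(0, 1, 2), (1, 0, 2)]           # subgroup generated by one transposition


def left_coset(a):
    """The left coset aH as a frozenset."""
    return frozenset(compose(a, h) for h in H)


def right_coset(a):
    """The right coset Ha as a frozenset."""
    return frozenset(compose(h, a) for h in H)


left_cosets = {left_coset(a) for a in G}
right_cosets = {right_coset(a) for a in G}

# Every coset has |H| elements, and the left cosets partition G.
sizes_ok = all(len(C) == len(H) for C in left_cosets | right_cosets)
partition_ok = sum(len(C) for C in left_cosets) == len(G) and set().union(*left_cosets) == set(G)

print(len(left_cosets), len(right_cosets))   # 3 3
print(sizes_ok, partition_ok)                # True True
print(left_cosets == right_cosets)           # False for this choice of H
```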

Definition 9.4.3. Let G be a group and H ⊂ G a subgroup. The number of distinct left
cosets, which is the same as the number of distinct right cosets, is called the index of H
in G, denoted by [G : H].

Now let us consider the case where the group G is finite. Each left coset has the
same size as the subgroup H; here, both are finite. Hence, |aH| = |H| for each coset. In
addition, the group G is a disjoint union of the left cosets; that is,

G = H ∪ g1 H ∪ ⋅ ⋅ ⋅ ∪ gn H.

Since this is a disjoint union, we have

|G| = |H| + |g1 H| + ⋅ ⋅ ⋅ + |gn H| = |H| + |H| + ⋅ ⋅ ⋅ + |H| = |H|[G : H].

This establishes the following extremely important theorem:

Theorem 9.4.4 (Lagrange’s theorem). Let G be a group and H ⊂ G a subgroup. Then

|G| = |H|[G : H].

If G is a finite group, this implies that both the order of a subgroup and the index of a
subgroup are divisors of the order of the group.

This theorem plays a crucial role in the structure theory of finite groups since it
greatly restricts the size of subgroups. For example, in a group of order 10, there can be
proper subgroups only of orders 1, 2, and 5.
As an immediate corollary, we have the following result:

Corollary 9.4.5. The order of any element $g \in G$, where $G$ is a finite group, divides the
order of the group. In particular, if $|G| = n$ and $g \in G$, then $o(g) \mid n$ and $g^n = 1$.

Proof. Let $g \in G$ and $o(g) = m$. Then $m$ is the size of the cyclic subgroup generated by $g$;
hence, $m$ divides $n$ by Lagrange's theorem. Then $n = mk$, and so

\[ g^n = g^{mk} = (g^m)^k = 1^k = 1. \]
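
The corollary can be observed in the multiplicative group $\mathbb{Z}_{11}^*$, which has order 10. The modulus 11 and the names used below are choices made for this sketch.

```python
n_mod = 11
G = list(range(1, n_mod))     # the nonzero residues mod 11, a group of order 10
order_of_G = len(G)


def element_order(g):
    """Smallest k >= 1 with g^k = 1 (mod 11)."""
    k, power = 1, g % n_mod
    while power != 1:
        power = (power * g) % n_mod
        k += 1
    return k


orders = {g: element_order(g) for g in G}

# Every element order divides |G| = 10, and g^10 = 1 for every g.
orders_divide = all(order_of_G % o == 0 for o in orders.values())
nth_power_is_one = all(pow(g, order_of_G, n_mod) == 1 for g in G)

print(orders)
print(orders_divide, nth_power_is_one)   # True True
```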

Before leaving this section, we consider some results concerning general subsets of
a group.
Suppose that $G$ is a group and $S$ is an arbitrary nonempty subset of $G$; that is, $S \subset G$ and
$S \neq \emptyset$. Such a set $S$ is usually called a complex of $G$.
If $U$ and $V$ are two complexes of $G$, the product $UV$ is defined as follows:

\[ UV = \{uv \in G : u \in U, v \in V\}. \]

Now suppose that U, V are subgroups of G. When is the complex UV again a sub-
group of G?

Theorem 9.4.6. The product UV of two subgroups U, V of a group G is itself a subgroup
if and only if U and V commute; that is, if and only if UV = VU.

Proof. We note first that when we say U and V commute, we do not demand that this
is so elementwise. In other words, it is not required that uv = vu for all u ∈ U and all
v ∈ V. All that is required is that for any u ∈ U and v ∈ V, we have uv = v1 u1 for some elements
u1 ∈ U and v1 ∈ V.
Assume that UV is a subgroup of G. Let u ∈ U and v ∈ V . Then u ∈ U ⋅ 1 ⊂ UV and
v ∈ 1 ⋅ V ⊂ UV . But since UV is assumed itself to be a subgroup, it follows that vu ∈ UV .
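Hence, each product vu ∈ UV, and so VU ⊂ UV. For the reverse inclusion, let uv ∈ UV.
Since UV is a subgroup, (uv)^{−1} ∈ UV, say (uv)^{−1} = u′v′ with u′ ∈ U and v′ ∈ V. Then
uv = (u′v′)^{−1} = v′^{−1}u′^{−1} ∈ VU. Hence, UV ⊂ VU, and so UV = VU.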
Conversely, suppose that UV = VU. Let g1 = u1 v1 ∈ UV , g2 = u2 v2 ∈ UV . Then

g1 g2 = (u1 v1)(u2 v2) = u1 (v1 u2) v2 = u1 u3 v3 v2 = (u1 u3)(v3 v2) ∈ UV

since v1 u2 = u3 v3 for some u3 ∈ U and v3 ∈ V . Furthermore,

g1^{−1} = (u1 v1)^{−1} = v1^{−1} u1^{−1} = u4 v4 ∈ UV

for some u4 ∈ U and v4 ∈ V, again because VU = UV.

It follows that UV is a subgroup.
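
To illustrate Theorem 9.4.6, the following sketch compares two pairs of subgroups of S_3. Permutations are encoded as tuples of images, and the names compose, product, and is_subgroup are hypothetical helpers chosen for this example.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def product(U, V):
    """The complex UV = {uv : u in U, v in V}."""
    return {compose(u, v) for u in U for v in V}

def is_subgroup(S):
    """A nonempty finite subset closed under the operation is a subgroup."""
    return all(compose(a, b) in S for a in S for b in S)

e = (0, 1, 2)
U = {e, (1, 0, 2)}              # generated by the transposition (0 1)
V = {e, (0, 2, 1)}              # generated by the transposition (1 2)
A3 = {e, (1, 2, 0), (2, 0, 1)}  # the rotations, a subgroup of order 3

print(product(U, V) == product(V, U), is_subgroup(product(U, V)))    # False False
print(product(U, A3) == product(A3, U), is_subgroup(product(U, A3)))  # True True
```

The two transposition subgroups do not commute as sets, and their product has 4 elements, which cannot be the order of a subgroup of S_3. The product of U with A3 is all of S_3.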

Theorem 9.4.7 (Product formula). Let U, V be subgroups of G, and let R be a left transver-
sal of the intersection U ∩ V in U. Then

UV = ⋃ rV ,
r∈R

where this is a disjoint union.


In particular, if U, V are finite, then

|UV| = |U||V| / |U ∩ V|.

Proof. Since R ⊂ U, we have that

⋃ rV ⊂ UV .
r∈R

In the other direction, let uv ∈ UV . Then

U = ⋃ r(U ∩ V ).
r∈R

It follows that u = rv′ with r ∈ R, and v′ ∈ U ∩ V . Hence,

uv = rv′ v ∈ rV .

Therefore, UV ⊂ ⋃_{r∈R} rV, proving the equality. The union is disjoint: if r1 V = r2 V
with r1, r2 ∈ R, then r1^{−1} r2 ∈ U ∩ V, so r1 (U ∩ V) = r2 (U ∩ V), and hence r1 = r2.


Now suppose that |U| and |V | are finite. Then we have

|UV| = |R||V| = [U : U ∩ V]|V| = (|U| / |U ∩ V|)|V| = |U||V| / |U ∩ V|.
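
The product formula can be verified on a concrete case. The sketch below assumes the additive group ℤ_12, so the complex UV is written additively as U + V; the variable names are our own.

```python
n = 12                                   # work in (Z_12, +); "products" are sums
U = {x for x in range(n) if x % 2 == 0}  # subgroup generated by 2, order 6
V = {x for x in range(n) if x % 3 == 0}  # subgroup generated by 3, order 4
W = U & V                                # the intersection, generated by 6

# The complex UV, written additively as U + V
UV = {(u + v) % n for u in U for v in V}

# Build a left transversal R of W in U: one representative per coset r + W
R, seen = [], set()
for u in sorted(U):
    if u not in seen:
        R.append(u)
        seen |= {(u + w) % n for w in W}

# The cosets r + V for r in R should partition U + V
cosets = [frozenset((r + v) % n for v in V) for r in R]
print(R, len(UV), len(U) * len(V) // len(W))  # [0, 2, 4] 12 12
```

Here |R| = [U : U ∩ V] = 3, and the three cosets r + V are pairwise disjoint with union U + V.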

We now show that index is multiplicative. Later, we will see how this fact is related
to the multiplicativity of the degree of field extensions.

Theorem 9.4.8. Suppose G is a group and U and V are subgroups with U ⊂ V ⊂ G. Then
if G is the disjoint union

G = ⋃ rV ,
r∈R

R a left transversal of V in G, and V is the disjoint union

V = ⋃ sU,
s∈S

S a left transversal of U in V , then we get a disjoint union for G as

G = ⋃ rsU.
r∈R,s∈S

In particular, if [G : V ] and [V : U] are finite, then

[G : U] = [G : V ][V : U].

Proof. Now

G = ⋃ rV = ⋃ (⋃ sU) = ⋃ rsU.
r∈R r∈R s∈S r∈R,s∈S

Suppose that r1 s1 U = r2 s2 U. Then r1 s1 UV = r2 s2 UV. But s1 UV = V and s2 UV = V, since
U ⊂ V and s1, s2 ∈ V; so r1 V = r2 V, which implies that r1 = r2. Then s1 U = s2 U, which
implies that s1 = s2.
Therefore, the union is disjoint.
The index formula now follows directly.
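
As a quick check of the index formula, the following sketch counts cosets for a chain U ⊂ V ⊂ G inside the additive group ℤ_12. The choice of groups and the helper name cosets are assumptions made for illustration.

```python
def cosets(G, H, n):
    """Distinct cosets g + H of the subgroup H inside G, both subsets of Z_n."""
    return {frozenset((g + h) % n for h in H) for g in G}

n = 12
G = set(range(n))
V = {x for x in G if x % 2 == 0}   # subgroup of order 6
U = {0, 6}                         # subgroup of order 2, contained in V

idx_G_V = len(cosets(G, V, n))     # [G : V]
idx_V_U = len(cosets(V, U, n))     # [V : U]
idx_G_U = len(cosets(G, U, n))     # [G : U]
print(idx_G_V, idx_V_U, idx_G_U)   # 2 3 6
```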

The next result says that the intersection of subgroups of finite index must again be
of finite index.

Theorem 9.4.9 (Poincaré). Suppose that U, V are subgroups of finite index in G. Then U ∩V
is also of finite index. Furthermore,

[G : U ∩ V ] ≤ [G : U][G : V ].

If [G : U], [G : V ] are relatively prime then equality holds.

Proof. Let r be the number of left cosets of U in G that are contained in VU. Then r is finite
since the index [G : U] is finite. From Theorem 9.4.7, applied with the roles of U and V
interchanged, we then have

[V : U ∩ V] = r ≤ [G : U].

Then from Theorem 9.4.8,

[G : U ∩ V ] = [G : V ][V : U ∩ V ] ≤ [G : V ][G : U].

Since both [G : U] and [G : V ] are finite, so is [G : U ∩ V ].


Now [G : U] | [G : U ∩ V] and [G : V] | [G : U ∩ V] by Theorem 9.4.8. If [G : U] and [G : V] are
relatively prime, then

[G : U][G : V] | [G : U ∩ V] ⟹ [G : U][G : V] ≤ [G : U ∩ V].

Therefore, we must have equality.

Corollary 9.4.10. Suppose that [G : U] and [G : V ] are finite and relatively prime. Then
G = UV .

Proof. From Theorem 9.4.9, we have

[G : U ∩ V ] = [G : U][G : V ].

From Theorem 9.4.8

[G : U ∩ V ] = [G : V ][V : U ∩ V ].

Combining these, we have

[V : U ∩ V ] = [G : U].

Hence, the number of left cosets of U in G that are contained in VU is equal to the number of
all left cosets of U in G. It follows that G = VU, and therefore G = UV by Theorem 9.4.6.

9.5 Generators and Cyclic Groups


We saw that if G is any group and g ∈ G, then the powers of g generate a subgroup
of G, called the cyclic subgroup generated by g. Here, we explore more fully the idea of
generating a group or subgroup. We first need the following:

Lemma 9.5.1. If U and V are subgroups of a group G, then their intersection U ∩ V is also
a subgroup.

Proof. Since the identity of G is in both U and V, we have that U ∩ V is nonempty. Suppose
that g1, g2 ∈ U ∩ V. Then g1, g2 ∈ U; hence, g1^{−1} g2 ∈ U since U is a subgroup. Analogously,
g1^{−1} g2 ∈ V. Hence, g1^{−1} g2 ∈ U ∩ V; therefore, U ∩ V is a subgroup.

Now let S be a subset of a group G. The subset S is certainly contained in at least
one subgroup of G, namely G itself. Let {Uα} be the collection of all subgroups of G
containing S. Then ⋂α Uα is again a subgroup of G from Lemma 9.5.1. Furthermore, it is the
smallest subgroup of G containing S (see the exercises). We call ⋂α Uα the subgroup of G
generated by S, and denote it by ⟨S⟩, or grp(S). We call the set S a set of generators for ⟨S⟩.
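
For a finite group, ⟨S⟩ can be computed by closing the identity under multiplication by the elements of S, since in a finite group every inverse is a positive power. The sketch below does this for permutations encoded as tuples; the function generated_subgroup and the sample generators are illustrative assumptions.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_subgroup(S, identity):
    """Smallest subgroup containing S inside a finite permutation group."""
    elements = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for s in S:
            y = compose(x, s)
            if y not in elements:   # a new word in the generators
                elements.add(y)
                frontier.append(y)
    return elements

e = (0, 1, 2, 3)
swap = (1, 0, 2, 3)    # the transposition exchanging 0 and 1
cycle = (1, 2, 3, 0)   # the 4-cycle 0 -> 1 -> 2 -> 3 -> 0

print(len(generated_subgroup([cycle], e)))        # 4
print(len(generated_subgroup([swap, cycle], e)))  # 24
```

The 4-cycle alone generates a cyclic subgroup of order 4, while together with the transposition it generates all of S_4.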

Definition 9.5.2. A subset M of a group G is a set of generators for G if G = ⟨M⟩; that is,
the smallest subgroup of G containing M is all of G. We say that G is generated by M, and
that M is a set of generators for G.

Notice that any group G has at least one set of generators, namely G itself. If we have
G = ⟨M⟩ and M is a finite set, then G is called finitely generated. Clearly, any finite group
is finitely generated. Shortly, we will give an example of a finitely generated infinite
group.

Example 9.5.3. The set of all reflections forms a set of generators for the Euclidean
group ℰ . Recall that any T ∈ ℰ is either a translation, a rotation, a reflection, or a glide
reflection. It can be shown (see exercises) that any one of these can be expressed as a
product of three or fewer reflections.

We now consider the case where a group G has a single generator.

Definition 9.5.4. A group G is cyclic if there exists a g ∈ G such that G = ⟨g⟩.

In this case, G = {g^n : n ∈ ℤ}; that is, G consists of all the powers of the element g.
If there exists a nonzero integer m such that g^m = 1, then there exists a smallest such positive
integer, say n. It follows that g^k = g^l if and only if k ≡ l (mod n). In this situation, the
distinct powers of g are precisely

{1 = g^0, g, g^2, . . . , g^{n−1}}.

It follows that |G| = n. We then call G a finite cyclic group. If no such power exists, then
all the powers of g are distinct and G is an infinite cyclic group.
We show next that any two cyclic groups of the same order are isomorphic.

Theorem 9.5.5. (a) If G = ⟨g⟩ is an infinite cyclic group, then G ≅ (ℤ, +); that is, the
integers under addition.
(b) If G = ⟨g⟩ is a finite cyclic group of order n, then G ≅ (ℤn , +); that is, the integers
modulo n under addition.
It follows that for a given order there is only one cyclic group up to isomorphism.

Proof. Let G be an infinite cyclic group with generator g. Map g onto 1 ∈ (ℤ, +). Since g
generates G and 1 generates ℤ under addition, this can be extended to a homomorphism.
It is straightforward to show that this defines an isomorphism.
Now let G be a finite cyclic group of order n with generator g. As above, map g to
1 ∈ ℤn and extend to a homomorphism. Again it is straightforward to show that this
defines an isomorphism.
Now let G and H be two cyclic groups of the same order. If both are infinite, then
both are isomorphic to (ℤ, +) and, hence, isomorphic to each other. If both are finite of
order n, then both are isomorphic to (ℤn , +) and, hence, isomorphic to each other.

Theorem 9.5.6. Let G = ⟨g⟩ be a finite cyclic group of order n. Then every subgroup of G
is also cyclic. Furthermore, if d|n, there exists a unique subgroup of G of order d.

Proof. Let G = ⟨g⟩ be a finite cyclic group of order n, and suppose that H is a subgroup
of G. Notice that if g^m ∈ H, then g^{−m} is also in H since H is a subgroup. Hence, H must
contain positive powers of the generator g. Let t be the smallest positive integer such
that g^t ∈ H. We claim that H = ⟨g^t⟩, the cyclic subgroup of G generated by g^t. Let h ∈ H;
then h = g^m for some positive integer m ≥ t. Divide m by t to get

m = qt + r, where r = 0 or 0 < r < t.

If r ≠ 0, then r = m − qt > 0. Now g^m ∈ H and g^t ∈ H, so g^{−qt} ∈ H for any q since H is a
subgroup. It follows that g^m g^{−qt} = g^{m−qt} ∈ H. This implies that g^r ∈ H. However, this
is a contradiction since r < t and t is the least positive power in H. It follows that r = 0,
so m = qt. This implies that g^m = g^{qt} = (g^t)^q; that is, g^m is a power of g^t. Therefore,
every element of H is a power of g^t; thus, g^t generates H and, hence, H is cyclic.
Now suppose that d|n so that n = kd. Let H = ⟨g^k⟩; that is, the subgroup of G generated
by g^k. We claim that H has order d and that any other subgroup H1 of G with order
d coincides with H. Now (g^k)^d = g^{kd} = g^n = 1, so the order of g^k divides d, hence is
≤ d. Suppose that (g^k)^{d1} = g^{k d1} = 1 with 0 < d1 < d. Then since the order of g is n, we have
n = kd | k d1 with 0 < d1 < d, which is impossible. Therefore, the order of g^k is d, and H = ⟨g^k⟩
is a subgroup of G of order d.
Now let H1 be a subgroup of G of order d. We must show that H1 = H. Let h ∈ H1,
so h = g^t for some positive integer t; hence, g^{td} = 1. It follows that n | td, and so kd | td;
hence k | t. That is, t = qk for some positive integer q. Therefore, g^t = (g^k)^q ∈ H.
Therefore, H1 ⊂ H, and since they are of the same size, H = H1.
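
Theorem 9.5.6 can be observed directly in a small case. The sketch below assumes the additive cyclic group ℤ_12 (so powers g^k become multiples kg) and collects all cyclic subgroups; the helper cyclic_subgroup is our own.

```python
def cyclic_subgroup(g, n):
    """Subgroup of (Z_n, +) generated by g."""
    H, x = {0}, g % n
    while x != 0:
        H.add(x)
        x = (x + g) % n
    return frozenset(H)

n = 12
# Every subgroup of Z_n is cyclic, so this set contains all subgroups
subgroups = {cyclic_subgroup(g, n) for g in range(n)}
orders = sorted(len(H) for H in subgroups)
divisors = [d for d in range(1, n + 1) if n % d == 0]
print(orders)    # [1, 2, 3, 4, 6, 12]
print(divisors)  # [1, 2, 3, 4, 6, 12]
```

Exactly one subgroup appears for each divisor d of 12, and the subgroup of order d is generated by n/d, matching the choice H = ⟨g^k⟩ with n = kd in the proof.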

Theorem 9.5.7. Let G = ⟨g⟩ be an infinite cyclic group. Then every nontrivial subgroup H is
of the form H = ⟨g^t⟩ for a positive integer t. Furthermore, if t1, t2 are positive integers with
t1 ≠ t2, then ⟨g^{t1}⟩ and ⟨g^{t2}⟩ are distinct.

Proof. Let G = ⟨g⟩ be an infinite cyclic group and H ≠ {1} a subgroup of G. As in the proof of
Theorem 9.5.6, H must contain positive powers of the generator g. Let t be the smallest
positive integer such that g^t ∈ H. We claim that H = ⟨g^t⟩, the cyclic subgroup of G
generated by g^t. Let h ∈ H with h ≠ 1. Replacing h by h^{−1} if necessary, we may write
h = g^m for some positive integer m ≥ t. Divide m by t to get

m = qt + r, where r = 0 or 0 < r < t.

If r ≠ 0, then r = m − qt > 0. Now g^m ∈ H and g^t ∈ H, so g^{−qt} ∈ H for any q since H is a
subgroup. It follows that g^m g^{−qt} = g^{m−qt} ∈ H. This implies that g^r ∈ H. However, this is
a contradiction since r < t and t is the least positive power in H. It follows that r = 0,
so m = qt. This implies that g^m = g^{qt} = (g^t)^q; that is, g^m is a power of g^t. Therefore,
every element of H is a power of g^t and, therefore, g^t generates H; hence, H = ⟨g^t⟩.
From the proof above in the subgroup ⟨g t ⟩, the integer t is the smallest positive
power of g in ⟨g t ⟩. Therefore, if t1 , t2 are positive integers with t1 ≠ t2 , then ⟨g t1 ⟩ and
⟨g t2 ⟩ are distinct.

Theorem 9.5.8. Let G = ⟨g⟩ be a cyclic group. Then the following hold:
(a) If G = ⟨g⟩ is finite of order n, then g^k is also a generator if and only if (k, n) = 1. That
is, the generators of G are precisely those powers g^k, where k is relatively prime to n.
(b) If G = ⟨g⟩ is infinite, then the only generators are g and g^{−1}.

Proof. (a) Let G = ⟨g⟩ be a finite cyclic group of order n, and suppose that (k, n) = 1.
Then there exist integers x, y with kx + ny = 1. It follows that

g = g^{kx+ny} = (g^k)^x (g^n)^y = (g^k)^x

since g^n = 1. Hence, g is a power of g^k, which implies that every element of G is also a power
of g^k. Therefore, g^k is also a generator.
Conversely, suppose that g^k is also a generator. Then g is a power of g^k, so there exists
an x such that g = g^{kx}. It follows that kx ≡ 1 (mod n), and so there exists a y such that

kx + ny = 1.

This then implies that (k, n) = 1.
(b) If G = ⟨g⟩ is infinite, then any power of g other than g and g^{−1} generates a proper
subgroup. Indeed, if g is a power of g^n for some n ≠ ±1, so that g = g^{nx}, then g^{nx−1} = 1
with nx − 1 ≠ 0; thus, g has finite order, contradicting that G is infinite cyclic.

Recall that for positive integers n, the Euler phi-function is defined as follows:

Definition 9.5.9. For any n > 0, let

ϕ(n) = number of positive integers less than or equal to n that are relatively prime to n.

Example 9.5.10. ϕ(6) = 2 since among 1, 2, 3, 4, 5, 6 only 1, 5 are relatively prime to 6.

Corollary 9.5.11. If G = ⟨g⟩ is finite of order n, then there are ϕ(n) generators for G, where
ϕ is the Euler phi-function.

Proof. From Theorem 9.5.8, the generators of G are precisely the powers g k , where
(k, n) = 1. The numbers relatively prime to n are counted by the Euler phi-function.
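
The count of generators can be confirmed by brute force. In the sketch below the cyclic group of order n is modeled as (ℤ_n, +), so a generator is an element k whose multiples exhaust ℤ_n; the function names generators and phi are our own.

```python
from math import gcd

def generators(n):
    """Elements k of (Z_n, +) whose multiples exhaust Z_n."""
    return [k for k in range(n) if len({(k * j) % n for j in range(n)}) == n]

def phi(n):
    """Euler phi-function, computed directly from Definition 9.5.9."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print(generators(12), phi(12))  # [1, 5, 7, 11] 4
print(generators(6), phi(6))    # [1, 5] 2
```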

Recall that in an arbitrary group G, if g ∈ G, then the order of g, denoted o(g), is the
order of the cyclic subgroup generated by g. Given two elements g, h ∈ G, in general,
there is no relationship between o(g), o(h) and the order of the product gh. However, if
they commute, there is a very direct relationship.

Lemma 9.5.12. Let G be an arbitrary group and g, h ∈ G both of finite order o(g), o(h). If
g and h commute; that is, gh = hg, then o(gh) divides lcm(o(g), o(h)). In particular, if G is
an Abelian group, then o(gh)| lcm(o(g), o(h)) for all g, h ∈ G of finite order. Furthermore,
if ⟨g⟩ ∩ ⟨h⟩ = {1}, then o(gh) = lcm(o(g), o(h)).

Proof. Suppose o(g) = n and o(h) = m are finite. If g, h commute, then for any k, we
have (gh)^k = g^k h^k. Let t = lcm(n, m); then t = k1 m and t = k2 n. Hence,

(gh)^t = g^t h^t = (g^n)^{k2} (h^m)^{k1} = 1.

Therefore, the order of gh is finite and divides t. Suppose that ⟨g⟩ ∩ ⟨h⟩ = {1}; that is, the
cyclic subgroup generated by g intersects trivially with the cyclic subgroup generated
by h. Let k = o(gh), which we know is finite from the first part of the lemma.
We then have (gh)^k = g^k h^k = 1, which implies that g^k = h^{−k}.
Since the cyclic subgroups have only trivial intersection, this implies that g^k = 1 and
h^k = 1. But then n | k and m | k; hence t | k, where t = lcm(n, m). Since k | t, it follows that k = t.
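
Both parts of Lemma 9.5.12 can be seen in the Abelian group (ℤ_12, +), where the product gh is written as the sum g + h. The sketch is an illustration under that assumption; order and lcm are helper names of our own.

```python
from math import gcd

def order(g, n):
    """Order of g in (Z_n, +)."""
    return n // gcd(g, n)

def lcm(a, b):
    return a * b // gcd(a, b)

n = 12
# Nontrivial intersection: <2> and <10> coincide, and 2 + 10 = 0 has order 1
print(order(2, n), order(10, n), order((2 + 10) % n, n))  # 6 6 1
# Trivial intersection: <4> = {0, 4, 8} and <3> = {0, 3, 6, 9}
print(order(4, n), order(3, n), order((4 + 3) % n, n))    # 3 4 12
```

In the first case the order of the sum properly divides the least common multiple; in the second it equals lcm(3, 4) = 12.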

Recall that if m and n are relatively prime, then lcm(m, n) = mn. Furthermore, if the
orders of g and h are relatively prime, it follows from Lagrange’s theorem that ⟨g⟩∩⟨h⟩ =
{1}. We then get the following:

Corollary 9.5.13. If g, h commute and o(g) and o(h) are finite and relatively prime, then
o(gh) = o(g)o(h).

Definition 9.5.14. If G is a finite Abelian group, then the exponent of G is the lcm of the
orders of all elements of G. That is,

exp(G) = lcm{o(g) : g ∈ G}.

As a consequence of Lemma 9.5.12, we obtain

Lemma 9.5.15. Let G be a finite Abelian group. Then G contains an element of order
exp(G).

Proof. Suppose that exp(G) = p1^{e1} ⋯ pk^{ek} with pi distinct primes. By the definition of
exp(G), there is a gi ∈ G with o(gi) = pi^{ei} ri with pi and ri relatively prime. Let hi = gi^{ri}.
Then from Lemma 9.5.12, we get o(hi) = pi^{ei}. Now let g = h1 h2 ⋯ hk. From the corollary
to Lemma 9.5.12, we have o(g) = p1^{e1} ⋯ pk^{ek} = exp(G).

If K is a field, then the nonzero elements of K form an Abelian group K⋆ under
multiplication. The above results lead to the fact that a finite subgroup of K⋆ must actually
be cyclic.

Theorem 9.5.16. Let K be a field. Then any finite subgroup of K ⋆ is cyclic.

Proof. Let A be a finite subgroup of K⋆ with |A| = n. Suppose that m = exp(A). Consider the
polynomial f(x) = x^m − 1 ∈ K[x]. Since the order of each element in A divides m, it follows that
a^m = 1 for all a ∈ A; hence, each a ∈ A is a zero of the polynomial f(x). Hence, f(x) has
at least n zeros. Since a polynomial of degree m over a field can have at most m zeros, it
follows that n ≤ m. From Lemma 9.5.15, there is an element a ∈ A with o(a) = m. Since
|A| = n, it follows that m | n; hence, m ≤ n. Therefore, m = n; hence, A = ⟨a⟩, showing that
A is cyclic.
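
As an illustration of Theorem 9.5.16, take K = ℤ_13 and A = K⋆, the group of nonzero residues modulo 13. The sketch below, with helper names of our own choosing, computes exp(A) and lists the elements whose order equals it.

```python
from math import gcd
from functools import reduce

def order(a, p):
    """Multiplicative order of a modulo p, assuming gcd(a, p) = 1."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def lcm(a, b):
    return a * b // gcd(a, b)

p = 13
A = range(1, p)                             # the group K* for K = Z_13
orders = {a: order(a, p) for a in A}
exponent = reduce(lcm, orders.values())     # exp(A)
generators = [a for a in A if orders[a] == exponent]
print(exponent, generators)  # 12 [2, 6, 7, 11]
```

The exponent equals |A| = 12, so an element of order exp(A) generates the whole group, as in the proof.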
We close this section with two other results concerning cyclic groups. The first
proves, using group theory, a very interesting number theoretic result concerning the
Euler phi-function.

Theorem 9.5.17. For n > 1,

∑_{d|n} ϕ(d) = n,

where the sum runs over all divisors d ≥ 1 of n.

Proof. Consider a cyclic group G of order n. For each d|n, d ≥ 1, there is a unique cyclic
subgroup H of order d. H then has ϕ(d) generators. Each element in G generates its
own cyclic subgroup H1, say of order d and, hence, must be included in the ϕ(d) generators
of H1. Therefore, ∑_{d|n} ϕ(d) is the sum of the numbers of generators of the cyclic
subgroups of G. Since every element of G is a generator of exactly one cyclic subgroup,
this sum counts each element of G exactly once; hence, this sum is n.
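
The counting argument in this proof can be replayed numerically. The sketch below models the cyclic group of order n as (ℤ_n, +), where the order of k is n / gcd(k, n), and sorts the elements by their order; the names phi and by_order are our own.

```python
from math import gcd
from collections import Counter

def phi(n):
    """Euler phi-function, computed directly from its definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

n = 12
# Count the elements of (Z_n, +) by their order n / gcd(k, n)
by_order = Counter(n // gcd(k, n) for k in range(n))
divisors = [d for d in range(1, n + 1) if n % d == 0]

for d in divisors:
    print(d, by_order[d], phi(d))
print(sum(phi(d) for d in divisors))  # 12
```

For each divisor d there are exactly ϕ(d) elements of order d, and the counts add up to n.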

We shall make use of the above theorem directly in the following theorem.

Theorem 9.5.18. If |G| = n and if for each positive d such that d|n, G has at most one cyclic
subgroup of order d, then G is cyclic (and, consequently, has exactly one cyclic subgroup
of order d).

Proof. For each d|n, d > 0, let ψ(d) denote the number of elements of G of order d. Then

∑ ψ(d) = n.
d|n

Now suppose that ψ(d) ≠ 0 for a given d|n. Then there exists an a ∈ G of order d, which
generates a cyclic subgroup, ⟨a⟩, of order d of G. We claim that all elements of G of
order d are in ⟨a⟩. Indeed, if b ∈ G with o(b) = d and b ∉ ⟨a⟩, then ⟨b⟩ is a second cyclic
subgroup of order d, distinct from ⟨a⟩. This contradicts the hypothesis, so the claim is
proved. Thus, if ψ(d) ≠ 0, then ψ(d) = ϕ(d). In general, we have ψ(d) ≤ ϕ(d), for all
positive d|n. But n = ∑_{d|n} ψ(d) ≤ ∑_{d|n} ϕ(d) = n, by the previous theorem. It follows
from this that ψ(d) = ϕ(d) for all d|n. In particular, ψ(n) = ϕ(n) ≥ 1. Hence, there exists
at least one element of G of order n; hence, G is cyclic. This completes the proof.

Corollary 9.5.19. If in a group G of order n, for each d|n, the equation x^d = 1 has at most
d solutions in G, then G is cyclic.

Proof. The hypothesis clearly implies that G can have at most one cyclic subgroup of
order d since all elements of such a subgroup satisfy the equation. So Theorem 9.5.18
applies to give our result.

If H is a subgroup of a group G, then G operates as a group of permutations on the
set {aH : a ∈ R} of left cosets of H in G, where R is a left transversal of H in G. We
can use this to show that a finitely generated group has only finitely many subgroups of a
given finite index.

Theorem 9.5.20. Let G be a finitely generated group. The number of subgroups of index
n < ∞ is finite.

Proof. Let H be a subgroup of index n. We choose a left transversal {c1, . . . , cn} for H in
G, where c1 = 1 represents H. G permutes the set of cosets ci H by multiplication from
the left. This induces a homomorphism ψH from G to Sn as follows. For each g ∈ G, let
ψH(g) be the permutation that maps i to j if g ci H = cj H. Then ψH(g) fixes the number 1 if
and only if g ∈ H, because c1 H = H. Now, let H and L be two different subgroups of index
n in G. Since H and L have the same finite index, H is not contained in L, so there exists
g ∈ H with g ∉ L. Then ψH(g) fixes 1, whereas ψL(g) does not; hence, ψH(g) ≠ ψL(g), and
ψH and ψL are different. Since G is finitely generated and a homomorphism is determined by
the images of the generators, there are only finitely many homomorphisms
from G to Sn. Therefore, the number of subgroups of index n < ∞ is finite.

9.6 Exercises
1. Prove Lemma 9.1.4.
2. Let G be a group and H a nonempty subset. H is a subgroup of G if and only if
ab−1 ∈ H for all a, b ∈ H.
3. Suppose that g ∈ G and g^m = 1 for some positive integer m. Let n be the smallest
positive integer such that g^n = 1.
Show that the elements 1, g, g^2, . . . , g^{n−1} are all distinct, but for any other
power g^k we have g^k = g^t for some t = 0, 1, . . . , n − 1.
4. Let G be a group and U1 , U2 be finite subgroups of G. If |U1 | and |U2 | are relatively
prime, then U1 ∩ U2 = {e}.
5. Let A, B be subgroups of a finite group G. If |A| ⋅ |B| > |G| then A ∩ B ≠ {e}.
6. Let G be the set of all real matrices of the form \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, where a^2 + b^2 ≠ 0. Show:
(a) G is a group.
(b) For each n ∈ ℕ there is at least one element of order n in G.
7. Let p be a prime, and let G = SL(2, p) = SL(2, ℤp). Show: G has at least 2p − 2 elements
of order p.
8. Let p be a prime and a ∈ ℤ. Show that a^p ≡ a (mod p).
9. Here we outline a proof that every planar Euclidean congruence motion is either a
rotation, translation, reflection or glide reflection. An isometry in this problem is a
planar Euclidean congruence motion. Show:
(a) If T is an isometry then it is completely determined by its action on a triangle—
equivalent to showing that if T fixes three noncollinear points then it must be
the identity.
(b) If an isometry T has exactly one fixed point then it must be a rotation with that
point as center.
(c) If an isometry T has two fixed points then it fixes the line joining them. Then
show that if T is not the identity it must be a reflection through this line.
(d) If an isometry T has no fixed point but preserves orientation then it must be a
translation.
(e) If an isometry T has no fixed point but reverses orientation then it must be a
glide reflection.
10. Let Pn be a regular n-gon and Dn its group of symmetries. Show that |Dn | = 2n.
(Hint: First show that |Dn | ≤ 2n and then exhibit 2n distinct symmetries.)
11. If A, B have the same cardinality, then there exists a bijection σ : A → B. Define a
map F : SA → SB in the following manner: if f ∈ SA , let F(f ) be the permutation on
B given by F(f )(b) = σ(f (σ −1 (b))). Show that F is an isomorphism.
12. Prove Lemma 9.3.3.

10 Normal Subgroups, Factor Groups and Direct Products

10.1 Normal Subgroups and Factor Groups

In rings, we saw that there were certain special types of subrings, called ideals, which
allowed us to define factor rings. The analogous object for groups is called a normal
subgroup, which we will define and investigate in this section.

Definition 10.1.1. Let G be an arbitrary group and suppose that H1 and H2 are subgroups
of G. We say that H2 is conjugate to H1 if there exists an element a ∈ G such that H2 =
a^{−1}H1 a. H1, H2 are then called conjugate subgroups of G.

Lemma 10.1.2. Let G be an arbitrary group. Then the relation of conjugacy is an equiva-
lence relation on the set of subgroups of G.

Proof. We must show that conjugacy is reflexive, symmetric, and transitive. If H is a
subgroup of G, then 1^{−1}H1 = H; hence, H is conjugate to itself and, therefore, the relation
is reflexive.
Suppose that H1 is conjugate to H2 . Then there exists a g ∈ G with g −1 H1 g = H2 .
This implies that gH2 g −1 = H1 . However, (g −1 )−1 = g; hence, letting g −1 = g1 , we have
g1−1 H2 g1 = H1 . Therefore, H2 is conjugate to H1 and conjugacy is symmetric.
Finally, suppose that H1 is conjugate to H2 and H2 is conjugate to H3 . Then there exist
g1 , g2 ∈ G with H2 = g1−1 H1 g1 and H3 = g2−1 H2 g2 . Then

H3 = g2−1 g1−1 H1 g1 g2 = (g1 g2 )−1 H1 (g1 g2 ).

Therefore, H3 is conjugate to H1 and conjugacy is transitive.

Lemma 10.1.3. Let G be an arbitrary group. Then for g ∈ G, the map a ↦ g^{−1}ag is an
automorphism of G.

Proof. For a fixed g ∈ G, define the map f : G → G by f(a) = g^{−1}ag for a ∈ G. We must
show that this is a homomorphism, and that it is one-to-one and onto.
Let a1 , a2 ∈ G. Then

f (a1 a2 ) = g −1 a1 a2 g = (g −1 a1 g)(g −1 a2 g) = f (a1 )f (a2 ).

Hence, f is a homomorphism.
If f (a1 ) = f (a2 ), then g −1 a1 g = g −1 a2 g. Clearly, by the cancellation law, we then have
a1 = a2 ; hence, f is one-to-one.
Finally, let a ∈ G, and let a1 = gag −1 . Then a = g −1 a1 g; hence, f (a1 ) = a. It follows
that f is onto; therefore, f is an automorphism on G.

https://doi.org/10.1515/9783111142524-010

In general, a subgroup H of a group G may have many different conjugates. However,
in certain situations, the only conjugate of a subgroup H is H itself. If this is the
case, we say that H is a normal subgroup. We will see shortly that this is precisely the
analog for groups of the concept of an ideal in rings.

Definition 10.1.4. Let G be an arbitrary group. A subgroup H is a normal subgroup of G,
which we denote by H ⊲ G, if g^{−1}Hg = H for all g ∈ G.

If g^{−1}Hg ⊂ H for all g ∈ G, then applying this inclusion to g^{−1} in place of g gives
gHg^{−1} ⊂ H; that is, H ⊂ g^{−1}Hg. It follows that g^{−1}Hg = H for all g ∈ G. Hence, in
order to show that a subgroup is normal, we need only show inclusion.

Lemma 10.1.5. Let N be a subgroup of a group G. Then if a−1 Na ⊂ N for all a ∈ G, then
a−1 Na = N. In particular, a−1 Na ⊂ N for all a ∈ G implies that N is a normal subgroup.

Notice that if g^{−1}Hg = H, then Hg = gH. That is, as sets, the left coset gH is equal to
the right coset Hg. Hence, for each h1 ∈ H, there is an h2 ∈ H with gh1 = h2 g. If H ⊲ G,
this is true for all g ∈ G. Furthermore, if H is normal, then for the product of two cosets
g1 H and g2 H, we have

(g1 H)(g2 H) = g1 (Hg2 )H = g1 g2 (HH) = g1 g2 H.

If (g1 H)(g2 H) = (g1 g2)H for all g1, g2 ∈ G, we necessarily have g^{−1}Hg = H for all g ∈ G.
Hence, we have proved the following:

Lemma 10.1.6. Let H be a subgroup of a group G. Then the following are equivalent:
(1) H is a normal subgroup of G.
(2) g −1 Hg = H for all g ∈ G.
(3) gH = Hg for all g ∈ G.
(4) (g1 H)(g2 H) = (g1 g2 )H for all g1 , g2 ∈ G.

This is precisely the condition needed to construct factor groups. First we give some
examples of normal subgroups.

Lemma 10.1.7. Every subgroup of an Abelian group is normal.

Proof. Let G be Abelian and H a subgroup of G. Suppose g ∈ G; then gh = hg for all
h ∈ H since G is Abelian. It follows that gH = Hg. Since this is true for every g ∈ G, it
follows that H is normal.

Lemma 10.1.8. Let H ⊂ G be a subgroup of index 2; that is, [G : H] = 2. Then H is normal
in G.

Proof. Suppose that [G : H] = 2. We must show that gH = Hg for all g ∈ G. If g ∈ H,
then clearly H = gH = Hg. Therefore, we may assume that g is not in H. Then there are
only 2 left cosets and 2 right cosets. That is,

G = H ∪ gH = H ∪ Hg.

Since the union is a disjoint union, we must have gH = Hg; hence, H is normal.
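
Normality is easy to test by machine in a small group. The sketch below checks the inclusion g^{−1}Hg ⊂ H for all g in S_3, once for the index-2 subgroup of rotations and once for a subgroup of order 2. Permutations are tuples of images, and the helpers compose, inverse, and is_normal are assumptions of this illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def is_normal(H, G):
    """Check that g^{-1} h g lies in H for all g in G and h in H."""
    return all(compose(compose(inverse(g), h), g) in H for g in G for h in H)

G = list(permutations(range(3)))        # S_3
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # index 2 in S_3
T = {(0, 1, 2), (1, 0, 2)}              # generated by one transposition

print(is_normal(A3, G), is_normal(T, G))  # True False
```

The subgroup of index 2 is normal, in line with Lemma 10.1.8, whereas the subgroup of order 2 has other conjugates.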

Lemma 10.1.9. Let K be any field. Then the group SL(n, K) is a normal subgroup of
GL(n, K) for any positive integer n.

Proof. Recall that GL(n, K) is the group of n × n matrices over the field K with nonzero
determinant, whereas SL(n, K) is the subgroup of n × n matrices over the field K with
determinant equal to 1. Let U ∈ SL(n, K) and T ∈ GL(n, K). Consider T −1 UT. Then

det(T^{−1}UT) = det(T^{−1}) det(U) det(T) = det(U) det(T^{−1}T)
= det(U) det(I) = det(U) = 1.

Hence, T −1 UT ∈ SL(n, K) for any U ∈ SL(n, K), and any T ∈ GL(n, K). It follows that
T −1 SL(n, K)T ⊂ SL(n, K); therefore, SL(n, K) is normal in GL(n, K).

The intersection of normal subgroups is again normal, and the product of normal
subgroups is normal.

Lemma 10.1.10. Let N1 , N2 be normal subgroups of the group G. Then the following hold:
(1) N1 ∩ N2 is a normal subgroup of G.
(2) N1 N2 is a normal subgroup of G.
(3) If H is any subgroup of G, then N1 ∩ H is a normal subgroup of H, and N1 H = HN1 .

Proof. We first show (1). Let n ∈ N1 ∩ N2 and g ∈ G. Then g^{−1}ng ∈ N1 since N1 is normal.
Similarly, g^{−1}ng ∈ N2 since N2 is normal. Hence, g^{−1}ng ∈ N1 ∩ N2. It follows that
g^{−1}(N1 ∩ N2)g ⊂ N1 ∩ N2; therefore, N1 ∩ N2 is normal.
We now show (2). Let n1 ∈ N1 , n2 ∈ N2 . Since N1 , N2 are both normal N1 N2 = N2 N1 as
sets, and the complex N1 N2 forms a subgroup of G. Let g ∈ G and n1 n2 ∈ N1 N2 . Then

g −1 (n1 n2 )g = (g −1 n1 g)(g −1 n2 g) ∈ N1 N2

since g^{−1}n1 g ∈ N1 and g^{−1}n2 g ∈ N2. Therefore, N1 N2 is normal in G.
We finally show (3). Let h ∈ H and n ∈ N1 ∩ H. Then as in part (1), h^{−1}nh ∈ N1 ∩ H;
therefore, N1 ∩ H is a normal subgroup of H. If nh ∈ N1 H, n ∈ N1, h ∈ H, then nh = hn′
with n′ = h^{−1}nh ∈ N1; hence, N1 H ⊂ HN1. Similarly, hn = (hnh^{−1})h ∈ N1 H, so HN1 ⊂ N1 H.
Hence, N1 H = HN1.

We now construct factor groups or quotient groups of a group modulo a normal
subgroup.

Definition 10.1.11. Let G be an arbitrary group and H a normal subgroup of G. Let G/H
denote the set of distinct left (and hence also right) cosets of H in G. On G/H, define the
multiplication (g1 H)(g2 H) = g1 g2 H for any elements g1 H, g2 H in G/H.

Theorem 10.1.12. Let G be a group and H a normal subgroup of G. Then G/H under the
operation defined above forms a group. This group is called the factor group or quotient
group of G modulo H. The identity element is the coset 1H = H, and the inverse of a coset
gH is g −1 H.

Proof. Write N for the normal subgroup H. We first show that the operation on G/N is well
defined. Suppose that a′N = aN and b′N = bN; then b′ ∈ bN, and so b′ = b n1. Similarly,
a′ = a n2, where n1, n2 ∈ N. Therefore,

a′b′N = a n2 b n1 N = a n2 bN

since n1 ∈ N. But b^{−1} n2 b = n3 ∈ N, since N is normal; that is, n2 b = b n3. Therefore, the
right-hand side of the equation can be written as

a n2 bN = a b n3 N = abN.

Thus, we have shown that if N ⊲ G, then a′ b′ N = abN, and the operation on G/N is
indeed well defined.
The associative law is true, because coset multiplication as defined above uses the
ordinary group operation, which is by definition associative.
The coset N serves as the identity element of G/N. Notice that

aN ⋅ N = aN^2 = aN,

and

N ⋅ aN = aN^2 = aN.

The inverse of aN is a^{−1}N since

aN a^{−1}N = aa^{−1}N^2 = N.
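
The role of normality in the well-definedness argument can be tested directly. The following sketch checks, inside S_3, whether the coset of a product depends on the chosen representatives; the helper names coset and well_defined are hypothetical and serve only this illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def coset(g, N):
    """The left coset gN as a frozenset."""
    return frozenset(compose(g, x) for x in N)

def well_defined(G, N):
    """True if a'N = aN and b'N = bN always imply (a'b')N = (ab)N."""
    for a in G:
        for b in G:
            target = coset(compose(a, b), N)
            for a2 in coset(a, N):        # every representative of aN
                for b2 in coset(b, N):    # every representative of bN
                    if coset(compose(a2, b2), N) != target:
                        return False
    return True

G = list(permutations(range(3)))        # S_3
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # a normal subgroup
T = {(0, 1, 2), (1, 0, 2)}              # not normal in S_3

print(well_defined(G, A3), well_defined(G, T))  # True False
print(len({coset(g, A3) for g in G}))           # 2
```

For the normal subgroup A3 the multiplication of cosets is independent of representatives and the factor group has [G : A3] = 2 elements; for the non-normal subgroup T the rule (aT)(bT) = abT is ambiguous.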

We emphasize that the elements of G/N are cosets; thus, subsets of G. If |G| < ∞,
then |G/N| = [G : N], the number of cosets of N in G. It is also to be emphasized that for
G/N to be a group, N must be a normal subgroup of G.
In some cases, properties of G are preserved in factor groups.

Lemma 10.1.13. If G is Abelian, then any factor group of G is also Abelian. If G is cyclic,
then any factor group of G is also cyclic.

Proof. Suppose that G is Abelian and H is a subgroup of G. H is necessarily normal from
Lemma 10.1.7 so that we can form the factor group G/H. Let g1 H, g2 H ∈ G/H. Since G is
Abelian, we have g1 g2 = g2 g1. Then in G/H,

(g1 H)(g2 H) = (g1 g2)H = (g2 g1)H = (g2 H)(g1 H).

Therefore, G/H is Abelian.
We leave the proof of the second part to the exercises.

An extremely important concept has to do with when a group contains no proper
normal subgroups other than the identity subgroup {1}.

Definition 10.1.14. A group G ≠ {1} is simple, provided that N⊲G implies N = G or N = {1}.

One of the most outstanding problems in group theory has been to give a complete
classification of all finite simple groups. In other words, this is the program to discover
all finite simple groups, and to prove that there are no more to be found. This was ac-
complished through the efforts of many mathematicians. The proof of this magnificent
result took thousands of pages. We refer the reader to [30] for a complete discussion of
this. We give one elementary example:

Lemma 10.1.15. Any finite group of prime order is simple and cyclic.

Proof. Suppose that G is a finite group and |G| = p, where p is a prime. Let g ∈ G with
g ≠ 1. Then ⟨g⟩ is a nontrivial subgroup of G, so its order divides the order of G by
Lagrange’s theorem. Since g ≠ 1, and p is a prime, we must have |⟨g⟩| = p. Therefore,
⟨g⟩ is all of G; that is, G = ⟨g⟩; hence, G is cyclic.
The argument above shows that G has no nontrivial proper subgroups and, there-
fore, no nontrivial normal subgroups. Therefore, G is simple.

In the next chapter, we will examine certain other finite simple groups.

10.2 The Group Isomorphism Theorems


In Chapter 1, we saw that there was a close relationship between ring homomorphisms
and factor rings. In particular to each ideal, and consequently to each factor ring, there
is a ring homomorphism that has that ideal as its kernel. Conversely, to each ring homo-
morphism, its kernel is an ideal, and the corresponding factor ring is isomorphic to the
image of the homomorphism. This was formalized in Theorem 1.5.7, which we called the
ring isomorphism theorem. We now look at the group theoretical analog of this result,
called the group isomorphism theorem. We will then examine some consequences of this
result that will be crucial in the Galois theory of fields.

Definition 10.2.1. If G1 and G2 are groups and f : G1 → G2 is a group homomorphism,
then the kernel of f, denoted ker(f), is defined as

ker(f) = {g ∈ G1 : f(g) = 1}.

That is, the kernel is the set of the elements of G1 that map to the identity of G2. The
image of f, denoted im(f), is the set of elements of G2 that are images under f of elements
of G1. That is,

im(f) = {g2 ∈ G2 : f(g1) = g2 for some g1 ∈ G1}.

Note that if f is a surjection, then im(f ) = G2 .

As with ring homomorphisms, the kernel measures how far a homomorphism is
from being an injection, that is, a one-to-one mapping.

Lemma 10.2.2. Let G1 and G2 be groups and f : G1 → G2 a group homomorphism. Then
f is injective if and only if ker(f) = {1}.

Proof. Suppose that f is injective. Since f (1) = 1, we always have 1 ∈ ker(f ). Suppose
that g ∈ ker(f ). Then f (g) = f (1). Since f is injective, this implies that g = 1; hence,
ker(f ) = {1}.
Conversely, suppose that ker(f) = {1} and f(g1) = f(g2). Then

f(g1)(f(g2))^{-1} = 1 ⟹ f(g1 g2^{-1}) = 1 ⟹ g1 g2^{-1} ∈ ker(f).

Then since ker(f) = {1}, we have g1 g2^{-1} = 1; hence, g1 = g2. Therefore, f is injective.

We now state the group isomorphism theorem. This is entirely analogous to the ring
isomorphism theorem replacing ideals by normal subgroups. We note that this theorem
is sometimes called the first group isomorphism theorem.

Theorem 10.2.3 (Group isomorphism theorem). (a) Let G1 and G2 be groups and f : G1 →
G2 a group homomorphism. Then ker(f ) is a normal subgroup of G1 , im(f ) is a sub-
group of G2 , and

G1/ker(f) ≅ im(f).

(b) Conversely, suppose that N is a normal subgroup of a group G. Then there exists a
group H and a homomorphism f : G → H such that ker(f ) = N, and im(f ) = H.

Proof. We first show (a). Since 1 ∈ ker(f), the kernel is nonempty. Now suppose that
g1, g2 ∈ ker(f). Then f(g1) = f(g2) = 1. It follows that f(g1 g2^{-1}) = f(g1)(f(g2))^{-1} = 1. Hence,
g1 g2^{-1} ∈ ker(f); therefore, ker(f) is a subgroup of G1. Furthermore, for g ∈ G1 and
g1 ∈ ker(f), we have

f(g^{-1} g1 g) = (f(g))^{-1} f(g1) f(g) = (f(g))^{-1} ⋅ 1 ⋅ f(g) = f(g^{-1} g) = f(1) = 1.

Hence, g^{-1} g1 g ∈ ker(f) and ker(f) is a normal subgroup. It is straightforward to show
that im(f) is a subgroup of G2. Consider the map f̂ : G1/ker(f) → im(f) defined by

f ̂(g ker(f )) = f (g).

We show that this is an isomorphism.



Suppose that g1 ker(f) = g2 ker(f), then g1 g2^{-1} ∈ ker(f) so that f(g1 g2^{-1}) = 1. This
implies that f(g1) = f(g2); hence, the map f̂ is well defined. Now,

f ̂(g1 ker(f )g2 ker(f )) = f ̂(g1 g2 ker(f )) = f (g1 g2 )


= f (g1 )f (g2 ) = f ̂(g1 ker(f ))f ̂(g2 ker(f ));

therefore, f ̂ is a homomorphism. Suppose that f ̂(g1 ker(f )) = f ̂(g2 ker(f )), then it follows
that f (g1 ) = f (g2 ); and hence, g1 ker(f ) = g2 ker(f ). It follows that f ̂ is injective.
Finally, suppose that h ∈ im(f ). Then there exists a g ∈ G1 with f (g) = h. Then
f ̂(g ker(f )) = h, and f ̂ is a surjection onto im(f ). Therefore, f ̂ is an isomorphism com-
pleting the proof of part (a).
Conversely, suppose that N is a normal subgroup of G. Define the map f : G → G/N
by f(g) = gN for g ∈ G. By the definition of the product in the quotient group G/N, it is
clear that f is a homomorphism with im(f) = G/N. If g ∈ ker(f), then f(g) = gN = N
since N is the identity in G/N, and this implies that g ∈ N. Conversely, if g ∈ N, then
f(g) = gN = N. Hence, ker(f) = N, completing the proof of part (b) with H = G/N.
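As a concrete illustration of Theorem 10.2.3, the following minimal sketch works in the additive group ℤ12. The homomorphism x ↦ 3x mod 12 is an assumed example chosen only for illustration; it does not appear in the text.

```python
n = 12
G = range(n)                      # the additive group Z_12

def f(x):
    """Illustrative homomorphism Z_12 -> Z_12, x -> 3x mod 12 (an assumption)."""
    return (3 * x) % n

kernel = {g for g in G if f(g) == 0}
image = {f(g) for g in G}

# Cosets g + ker(f), stored as frozensets so that they can be collected in a set.
cosets = {frozenset((g + k) % n for k in kernel) for g in G}

# f is constant on each coset, so the induced map on cosets is well defined.
well_defined = all(len({f(g) for g in c}) == 1 for c in cosets)

# The induced map coset -> f(representative) is a bijection onto im(f).
induced = {c: f(next(iter(c))) for c in cosets}
bijective = len(set(induced.values())) == len(cosets) == len(image)
```

Here the kernel has 3 elements, so there are 12/3 = 4 cosets, matching the 4 elements of the image.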

There are two related theorems that are called the second isomorphism theorem
and the third isomorphism theorem.

Theorem 10.2.4 (Second isomorphism theorem). Let N be a normal subgroup of a group
G and U a subgroup of G. Then U ∩ N is normal in U, and

(UN)/N ≅ U/(U ∩ N).

Proof. From Lemma 10.1.10, we know that U ∩ N is normal in U. We define the map
α : UN → U/(U ∩ N) by α(un) = u(U ∩ N). If un = u′n′, then u′^{-1} u = n′n^{-1} ∈ U ∩ N.
Therefore, u′(U ∩ N) = u(U ∩ N); hence, the map α is well defined.
Suppose that un, u′n′ ∈ UN. Since N is normal in G, we have that unu′n′ ∈ uu′N.
Hence, unu′n′ = uu′n′′ with n′′ ∈ N. Then

α(unu′n′) = α(uu′n′′) = uu′(U ∩ N).

However, U ∩ N is normal in U, so

uu′(U ∩ N) = u(U ∩ N)u′(U ∩ N) = α(un)α(u′n′).

Therefore, α is a homomorphism. We have im(α) = U/(U ∩ N) by definition. Suppose
that un ∈ ker(α). Then α(un) = u(U ∩ N) = U ∩ N, which implies u ∈ U ∩ N ⊂ N; hence,
un ∈ N. Conversely, every n ∈ N satisfies α(n) = α(1 ⋅ n) = U ∩ N. Therefore, ker(α) = N.
From the group isomorphism theorem, we then have

UN/N ≅ U/(U ∩ N),

proving the theorem.
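The index equality behind the second isomorphism theorem can be checked on a small case. This is a minimal sketch in the additive group ℤ12, where UN is written additively as U + N; the subgroups ⟨2⟩ and ⟨3⟩ are assumed examples, and only the orders of the two quotients are compared, not the isomorphism itself.

```python
n = 12

def generated(d):
    """Subgroup of (Z_12, +) generated by d."""
    return {(d * k) % n for k in range(n)}

U, N = generated(2), generated(3)
UN = {(u + v) % n for u in U for v in N}   # the product UN, written additively
U_cap_N = U & N

index_left = len(UN) // len(N)             # |UN / N|
index_right = len(U) // len(U_cap_N)       # |U / (U ∩ N)|
```

Both quotients have order 3 in this example.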



Theorem 10.2.5 (Third isomorphism theorem). Let N and M be normal subgroups of a
group G with N a subgroup of M. Then M/N is a normal subgroup in G/N, and

(G/N)/(M/N) ≅ G/M.

Proof. Define the map β : G/N → G/M by

β(gN) = gM.

It is straightforward that β is well defined and a homomorphism. If gN ∈ ker(β), then
β(gN) = gM = M; hence, g ∈ M. It follows that ker(β) = M/N. In particular, this shows
that M/N is normal in G/N. From the group isomorphism theorem then,

(G/N)/(M/N) ≅ G/M.

For a normal subgroup N in G, the homomorphism f : G → G/N provides a
one-to-one correspondence between subgroups of G containing N and the subgroups of
G/N. This correspondence will play a fundamental role in the study of subfields of a
field.

Theorem 10.2.6 (Correspondence Theorem). Let N be a normal subgroup of a group G,
and let f be the corresponding homomorphism f : G → G/N. Then the mapping

ϕ : H ↦ f(H),

where H is a subgroup of G containing N, provides a one-to-one correspondence between
all the subgroups of G/N and the subgroups of G containing N.

Proof. We first show that the mapping ϕ is surjective. Let H1 be a subgroup of G/N, and
let

H = {g ∈ G : f (g) ∈ H1 }.

We show that H is a subgroup of G, and that N ⊂ H.
If g1, g2 ∈ H, then f(g1) ∈ H1, and f(g2) ∈ H1. Therefore, f(g1)f(g2) ∈ H1; hence,
f(g1 g2) ∈ H1. Therefore, g1 g2 ∈ H. In an identical fashion, g1^{-1} ∈ H. Therefore, H is a
subgroup of G. If n ∈ N, then f(n) = 1 ∈ H1; hence, n ∈ H. Therefore, N ⊂ H. Since f is
surjective, f(H) = H1; that is, ϕ(H) = H1, showing that the map ϕ is surjective.
Suppose that ϕ(H1) = ϕ(H2), where H1 and H2 are subgroups of G containing N.
This implies that f(H1) = f(H2). Let g1 ∈ H1. Then f(g1) = f(g2) for some g2 ∈ H2. Then
g1 g2^{-1} ∈ ker(f) = N ⊂ H2. It follows that g1 g2^{-1} ∈ H2 so that g1 ∈ H2. Hence, H1 ⊂ H2. In a
similar fashion, H2 ⊂ H1; therefore, H1 = H2. It follows that ϕ is injective.
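The correspondence can be made concrete in a small cyclic group. The sketch below assumes G = (ℤ12, +) and N = ⟨4⟩, and uses the standard fact that the subgroups of ℤn are exactly the ⟨d⟩ for d dividing n; the helper names are illustrative only.

```python
def subgroups_of_Zn(n):
    """All subgroups of (Z_n, +); each is generated by a divisor d of n."""
    return [frozenset((d * k) % n for k in range(n))
            for d in range(1, n + 1) if n % d == 0]

n = 12
N = frozenset({0, 4, 8})                   # the subgroup generated by 4
containing_N = [H for H in subgroups_of_Zn(n) if N <= H]

def image_in_quotient(H):
    """f(H): the set of cosets h + N for h in H."""
    return frozenset(frozenset((h + x) % n for x in N) for h in H)

images = {image_in_quotient(H) for H in containing_N}
quotient_order = n // len(N)               # Z_12 / N is cyclic of order 4
expected_count = len(subgroups_of_Zn(quotient_order))
```

Three subgroups of ℤ12 contain N, they have three distinct images in the quotient, and a cyclic group of order 4 has exactly three subgroups.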

10.3 Direct Products of Groups


In this section, we look at a very important construction, the direct product, which al-
lows us to build new groups out of existing groups. This construction is the analog for
groups of the direct sum of rings. As an application of this construction, in the next sec-
tion, we present a theorem, which completely describes the structure of finite Abelian
groups.
Let G1 , G2 be groups and let G be the Cartesian product of G1 and G2 . That is,

G = G1 × G2 = {(a, b) : a ∈ G1 , b ∈ G2 }.

On G, define

(a1 , b1 ) ⋅ (a2 , b2 ) = (a1 a2 , b1 b2 ).

With this operation, it is straightforward to verify the group axioms for G; hence, G becomes a
group.

Theorem 10.3.1. Let G1, G2 be groups and G the Cartesian product G1 × G2 with the
operation defined above. Then G forms a group called the direct product of G1 and G2. The
identity element is (1, 1), and (g, h)^{-1} = (g^{-1}, h^{-1}).

This can be iterated to any finite number of groups G1, . . . , Gn (and also to infinitely
many, which we will not consider here) to form the direct product G1 × G2 × ⋅ ⋅ ⋅ × Gn.

Theorem 10.3.2. For groups G1 and G2, we have G1 × G2 ≅ G2 × G1, and G1 × G2 is Abelian
if and only if each Gi, i = 1, 2, is Abelian.

Proof. The map (a, b) → (b, a), where a ∈ G1 , b ∈ G2 provides an isomorphism G1 ×G2 →
G2 × G1 .
Suppose that both G1 , G2 are Abelian. Then if a1 , a2 ∈ G1 , b1 , b2 ∈ G2 , we have

(a1 , b1 )(a2 , b2 ) = (a1 a2 , b1 b2 ) = (a2 a1 , b2 b1 ) = (a2 , b2 )(a1 , b1 );

hence, G1 × G2 is Abelian.
Conversely, suppose G1 × G2 is Abelian, and suppose that a1 , a2 ∈ G1 . Then for the
identity 1 ∈ G2 , we have

(a1 a2 , 1) = (a1 , 1)(a2 , 1) = (a2 , 1)(a1 , 1) = (a2 a1 , 1).

Therefore, a1 a2 = a2 a1 , and G1 is Abelian. Similarly, G2 is Abelian.

We show next that in G1 × G2, there are normal subgroups H1, H2 with H1 ≅ G1 and
H2 ≅ G2.

Theorem 10.3.3. Let G = G1 × G2. Let H1 = {(a, 1) : a ∈ G1} and H2 = {(1, b) : b ∈ G2}. Then
both H1 and H2 are normal subgroups of G with G = H1H2 and H1 ∩ H2 = {1}. Furthermore,
H1 ≅ G1, H2 ≅ G2, G/H1 ≅ G2, and G/H2 ≅ G1.

Proof. Map G1 × G2 onto G2 by (a, b) → b. It is clear that this map is a homomorphism, and
that the kernel is H1 = {(a, 1) : a ∈ G1}. This establishes that H1 is a normal subgroup of G,
and that G/H1 ≅ G2. In an identical fashion, we get that H2 is normal and G/H2 ≅ G1. The
map (a, 1) → a provides the isomorphism from H1 onto G1, and similarly (1, b) → b maps
H2 isomorphically onto G2. Finally, (a, b) = (a, 1)(1, b) shows that G = H1H2, and
H1 ∩ H2 = {(1, 1)} is immediate.

If the factors are finite, it is easy to find the order of G1 × G2. The size of the Cartesian
product is just the product of the sizes of the factors.

Lemma 10.3.4. If |G1 | and |G2 | are finite, then |G1 × G2 | = |G1 ||G2 |.

Now suppose that G is a group with normal subgroups G1, G2 such that G = G1G2
and G1 ∩ G2 = {1}. Then we will show that G is isomorphic to the direct product G1 × G2.
In this case, we say that G is the internal direct product of its subgroups, and that G1, G2
are direct factors of G.

Theorem 10.3.5. Suppose that G is a group with normal subgroups G1, G2 with G = G1G2,
and G1 ∩ G2 = {1}. Then G is isomorphic to the direct product G1 × G2.

Proof. Since G = G1G2, each element of G has the form ab with a ∈ G1, b ∈ G2. This
representation as ab is unique as G1 ∩ G2 = {1}. We first show that each a ∈ G1 commutes with
each b ∈ G2. Consider the element aba^{-1}b^{-1}. Since G1 is normal, ba^{-1}b^{-1} ∈ G1, which
implies that aba^{-1}b^{-1} ∈ G1. Since G2 is normal, aba^{-1} ∈ G2, which implies that aba^{-1}b^{-1} ∈ G2.
Therefore, aba^{-1}b^{-1} ∈ G1 ∩ G2 = {1}; hence, aba^{-1}b^{-1} = 1, so that ab = ba.
Now map G onto G1 × G2 by f(ab) = (a, b). We claim that this is an isomorphism. It
is clearly onto. Now

f((a1b1)(a2b2)) = f(a1a2b1b2) = (a1a2, b1b2) = (a1, b1)(a2, b2) = f(a1b1)f(a2b2),

so that f is a homomorphism. If f(ab) = (1, 1), then a = b = 1; hence, the kernel is trivial,
and so f is an isomorphism.


Although the end resulting groups are isomorphic, we call G1 × G2 an external direct
product if we started with the groups G1 , G2 and constructed G1 × G2 , and we call G1 × G2
an internal direct product if we started with a group G having normal subgroups, as in
the theorem.
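An internal direct product can be verified by hand on a small example. The following is a minimal sketch, assuming G = (ℤ6, +) with the subgroups G1 = {0, 3} and G2 = {0, 2, 4}; these choices are illustrative and the group operation is written additively.

```python
from itertools import product

n = 6
G1 = {0, 3}            # subgroup of order 2 in (Z_6, +)
G2 = {0, 2, 4}         # subgroup of order 3 in (Z_6, +)

# Map each pair (a, b) in G1 x G2 to the element a + b of Z_6.
sums = {(a, b): (a + b) % n for a, b in product(G1, G2)}

trivial_intersection = (G1 & G2 == {0})
covers_group = set(sums.values()) == set(range(n))          # G = G1 G2
unique_representation = len(set(sums.values())) == len(sums)
```

All three conditions hold, which is the numerical counterpart of ℤ6 ≅ ℤ2 × ℤ3.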

10.4 Finite Abelian Groups


We now use the results of the last section to present a theorem that completely describes
the structure of finite Abelian groups. This theorem is a special case of a general result
on modules that we will examine in detail in Chapter 19.

Theorem 10.4.1 (Basis theorem for finite Abelian groups). Let G be a finite Abelian group.
Then G is a direct product of cyclic groups of prime power order.

Before giving the proof, we give two examples showing how this theorem leads to
the classification of finite Abelian groups.
Since all cyclic groups of order n are isomorphic to (ℤn , +), we will denote a cyclic
group of order n by ℤn .

Example 10.4.2. Classify all Abelian groups of order 60. Let G be an Abelian group of
order 60. From Theorem 10.4.1, G must be a direct product of cyclic groups of prime
power order. Now 60 = 2^2 ⋅ 3 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order either 2, 4, 3,
or 5 (by Lagrange’s theorem, they must be divisors of 60). Therefore, G must be of the
form

G ≅ ℤ4 × ℤ3 × ℤ5, or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ5.

Hence, up to isomorphism, there are only two Abelian groups of order 60.

Example 10.4.3. Classify all Abelian groups of order 180. Now 180 = 2^2 ⋅ 3^2 ⋅ 5, so the only
primes involved are 2, 3, and 5. Hence, the cyclic groups involved in the direct product
decomposition of G have order either 2, 4, 3, 9, or 5 (by Lagrange’s theorem, they must
be divisors of 180). Therefore, G must be of one of the forms

G ≅ ℤ4 × ℤ9 × ℤ5
G ≅ ℤ2 × ℤ2 × ℤ9 × ℤ5
G ≅ ℤ4 × ℤ3 × ℤ3 × ℤ5
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ3 × ℤ5 .

Hence, up to isomorphism, there are four Abelian groups of order 180.
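The counting in Examples 10.4.2 and 10.4.3 can be mechanized: for each prime power p^e dividing the order exactly, every partition of e gives one choice of cyclic factors. The following is a minimal sketch of that enumeration; the function names are illustrative, and the code takes the basis theorem (with uniqueness of the decomposition) for granted rather than proving it.

```python
from itertools import product

def prime_factorization(n):
    """Return {prime: exponent} for n >= 2 by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def partitions(e, largest=None):
    """All partitions of the integer e into non-increasing positive parts."""
    if largest is None:
        largest = e
    if e == 0:
        return [[]]
    result = []
    for part in range(min(e, largest), 0, -1):
        for rest in partitions(e - part, part):
            result.append([part] + rest)
    return result

def abelian_groups(n):
    """Abelian groups of order n, each given as a list of cyclic-factor orders."""
    per_prime = [
        [[p ** part for part in parts] for parts in partitions(e)]
        for p, e in prime_factorization(n).items()
    ]
    return [sum(choice, []) for choice in product(*per_prime)]

groups_60 = abelian_groups(60)
groups_180 = abelian_groups(180)
```

Each returned list such as [2, 2, 3, 5] stands for ℤ2 × ℤ2 × ℤ3 × ℤ5.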

The proof of Theorem 10.4.1 involves the following lemmas:

Lemma 10.4.4. Let G be a finite Abelian group, and let p | |G|, where p is a prime. Then
all the elements of G, whose orders are a power of p, form a normal subgroup of G. This
subgroup is called the p-primary component of G, which we will denote by Gp.

Proof. Let p be a prime with p | |G|, and let a and b be two elements of G of order a power
of p. Since G is Abelian, the order of ab divides the lcm of the orders of a and b, and hence
is again a power of p. Therefore, ab ∈ Gp. The order of a^{-1} is the same as the order of a,
so a^{-1} ∈ Gp; therefore, Gp is a subgroup. It is normal because G is Abelian.

Lemma 10.4.5. Let G be a finite Abelian group of order n. Suppose that n = p1^{e1} ⋅ ⋅ ⋅ pk^{ek} with
p1, . . . , pk distinct primes. Then

G ≅ Gp1 × ⋅ ⋅ ⋅ × Gpk,

where Gpi is the pi-primary component of G.

Proof. Each Gpi is normal since G is Abelian, and since distinct primes are relatively
prime, the intersection of each Gpi with the product of the others is the identity. Therefore,
Lemma 10.4.5 will follow by showing that each element of G is a product of elements in the Gpi.
Let g ∈ G. Then the order of g is p1^{f1} ⋅ ⋅ ⋅ pk^{fk}. For each i, we write this as pi^{fi} qi with
(qi, pi) = 1. Then g^{qi} has order pi^{fi} and, hence, is in Gpi. Now since q1, . . . , qk have
greatest common divisor 1, there exist integers m1, . . . , mk with

m1 q1 + ⋅ ⋅ ⋅ + mk qk = 1;

hence,

g = (g^{q1})^{m1} ⋅ ⋅ ⋅ (g^{qk})^{mk}.

Therefore, g is a product of elements in the Gpi.
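The decomposition of an element into its primary components can be carried out explicitly. The sketch below assumes the additive group ℤ60 and the element g = 7, whose order is 60 = 4 ⋅ 3 ⋅ 5; the cofactors q_i are the order of g divided by the p_i-part, and the Bézout coefficients are found by a brute-force search rather than the extended Euclidean algorithm. All names are illustrative.

```python
from math import gcd
from itertools import product

def order(g, n):
    """Order of g in the additive group Z_n."""
    return n // gcd(g, n)

n, g = 60, 7                           # illustrative choice: g = 7 in Z_60
prime_powers = {2: 4, 3: 3, 5: 5}      # order(g) = 60 = 4 * 3 * 5
ord_g = order(g, n)
q = {p: ord_g // pf for p, pf in prime_powers.items()}   # cofactors q_i

# Search for integers m_i with sum(m_i * q_i) = 1; they exist because
# the cofactors q_i have greatest common divisor 1.
primes = sorted(q)
coeffs = next(
    m for m in product(range(-5, 6), repeat=len(primes))
    if sum(mi * q[p] for mi, p in zip(m, primes)) == 1
)

# Additively, g = sum_i m_i * (q_i * g); the i-th summand lies in G_{p_i}.
components = {p: (mi * q[p] * g) % n for mi, p in zip(coeffs, primes)}
```

The components add up to g, and the order of each one divides the corresponding prime power.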


We next need the concept of a basis. Let G be any finitely generated Abelian group
(finite or infinite), and let g1 , . . . , gn be a set of generators for G. The generators g1 , . . . , gn
form a basis if

G = ⟨g1 ⟩ × ⋅ ⋅ ⋅ × ⟨gn ⟩;

that is, G is the direct product of the cyclic subgroups generated by the gi . The basis
theorem for finite Abelian groups says that any finite Abelian group has a basis. Suppose
that G is a finite Abelian group with a basis g1 , . . . , gk so that G = ⟨g1 ⟩ × ⋅ ⋅ ⋅ × ⟨gk ⟩. Since
G is finite, each gi has finite order, say mi . It follows then, from the fact that G is a direct
product, that each g ∈ G can be expressed as

g = g1^{n1} ⋅ ⋅ ⋅ gk^{nk}

and, furthermore, each integer ni is unique modulo the order mi of gi. Hence, each
integer ni can be chosen in the range 0, 1, . . . , mi − 1, and within this range for the element
g, the integer ni is unique.
From the previous lemma, each finite Abelian group splits into a direct product of
its p-primary components for different primes p. Hence, to complete the proof of the
basis theorem, we must show that any finite Abelian group of order p^m for some prime
p has a basis. We call an Abelian group of order p^m an Abelian p-group.
Consider an Abelian group G of order p^m for a prime p. It is somewhat easier to
complete the proof if we consider the group using additive notation. That is, the operation is
considered +, the identity as 0, and powers are given by multiples. Hence, if an element
g ∈ G has order p^k, then in additive notation, p^k g = 0.

A set of elements g1, . . . , gk is then a basis for G if each g ∈ G can be expressed
uniquely as g = m1 g1 + ⋅ ⋅ ⋅ + mk gk, where the mi are unique modulo the order of gi. We
say that the g1, . . . , gk are independent, and this is equivalent to the fact that whenever
m1 g1 + ⋅ ⋅ ⋅ + mk gk = 0, then mi ≡ 0 modulo the order of gi. We now prove that any Abelian
p-group has a basis.

Lemma 10.4.6. Let G be a finite Abelian group of prime power order p^n for some prime p.
Then G is a direct product of cyclic groups.

Notice that in the group G, we have p^n g = 0 for all g ∈ G as a consequence of
Lagrange’s theorem. Furthermore, every element has as its order a power of p. The smallest
power of p, say p^r, such that p^r g = 0 for all g ∈ G, is called the exponent of G. Any finite
Abelian p-group must have some exponent p^r.

Proof. The proof of this lemma is by induction on the exponent.
The lowest possible exponent is p. So, first, suppose that pg = 0 for all g ∈ G.
Since G is finite it has a finite system of generators. Let S = {g1 , . . . , gk } be a minimal
set of generators for G. We claim that this is a basis. Since this is a set of generators, to
show that it is a basis, we must show that they are independent. Hence, suppose that we
have

m1 g1 + ⋅ ⋅ ⋅ + mk gk = 0 (10.1)

for some set of integers mi . Since the order of each gi is p, as explained above, we may
assume that 0 ≤ mi < p for i = 1, . . . , k. Suppose that one mi ≠ 0.
Then (mi , p) = 1; hence, there exists an xi with mi xi ≡ 1 (mod p) (see Chapter 4).
Multiplying the equation (10.1) by xi , we get modulo p,

m1 xi g1 + ⋅ ⋅ ⋅ + gi + ⋅ ⋅ ⋅ + mk xi gk = 0,

and rearranging

gi = −m1 xi g1 − ⋅ ⋅ ⋅ − mk xi gk,

where the sum on the right runs over the indices j ≠ i. But then gi can be expressed in
terms of the other gj; therefore, the set {g1, . . . , gk} is not minimal, a contradiction.
Hence, every mi = 0. It follows that g1, . . . , gk constitute a basis, and the lemma is true
for the exponent p.
Now suppose that any finite Abelian p-group of exponent at most p^{n−1} has a basis, and
assume that G has exponent p^n. Consider the set Ḡ = pG = {pg : g ∈ G}. It is straightforward
that this forms a subgroup (see exercises). Since p^n g = 0 for all g ∈ G, it follows that
p^{n−1} h = 0 for all h ∈ Ḡ, and so the exponent of Ḡ is at most p^{n−1}. By the inductive
hypothesis, Ḡ has a basis

S = {pg1, . . . , pgk}.

Consider the set {g1, . . . , gk}, and adjoin to this set the set of all elements h ∈ G, satisfying
ph = 0. Call this set S1, so that we have

S1 = {g1, . . . , gk, h1, . . . , ht}.

We claim that S1 is a set of generators for G. Let g ∈ G. Then pg ∈ Ḡ, which has the basis
pg1, . . . , pgk, so that

pg = m1 pg1 + ⋅ ⋅ ⋅ + mk pgk .

This implies that

p(g − m1 g1 − ⋅ ⋅ ⋅ − mk gk ) = 0,

so that g − m1 g1 − ⋅ ⋅ ⋅ − mk gk must be one of the hi. Hence,

g − m1 g1 − ⋅ ⋅ ⋅ − mk gk = hi , so that g = m1 g1 + ⋅ ⋅ ⋅ + mk gk + hi ,

proving the claim.


Now S1 is finite, so there is a minimal subset of S1 that is still a generating system
for G. Call this S0 , and suppose that S0 , renumbering if necessary, is

S0 = {g1 , . . . , gr , h1 , . . . , hs } with phi = 0 for i = 1, . . . , s.

The subgroup generated by h1, . . . , hs has exponent p. Therefore, by the inductive
hypothesis, it has a basis. We may assume then that h1, . . . , hs is a basis for this subgroup and,
hence, is independent. We claim now that g1, . . . , gr, h1, . . . , hs are independent and,
hence, form a basis for G.
Suppose that

m1 g1 + ⋅ ⋅ ⋅ + mr gr + n1 h1 + ⋅ ⋅ ⋅ + ns hs = 0 (10.2)

for some integers m1, . . . , mr, n1, . . . , ns. Each mi, ni must be divisible by p. Suppose, for
example, that an mi is not. Then (mi, p) = 1, and then (mi, p^n) = 1. This implies that there
exists an xi with mi xi ≡ 1 (mod p^n). Multiplying through by xi and rearranging, we then
obtain

gi = −m1 xi g1 − ⋅ ⋅ ⋅ − ns xi hs .

Therefore, gi can be expressed in terms of the remaining elements of S0, contradicting
the minimality of S0. An identical argument works if an ni is not divisible by p.
Therefore, the relation (10.2) takes the form

a1 pg1 + ⋅ ⋅ ⋅ + ar pgr + b1 ph1 + ⋅ ⋅ ⋅ + bs phs = 0. (10.3)



Each of the terms phi = 0, so that (10.3) becomes

a1 pg1 + ⋅ ⋅ ⋅ + ar pgr = 0.

The elements pg1, . . . , pgr belong to the basis S = {pg1, . . . , pgk} of pG and are, hence,
independent. Therefore, ai pgi = 0, that is, mi gi = 0, for each i. Now (10.2)
becomes

n1 h1 + ⋅ ⋅ ⋅ + ns hs = 0.

However, h1, . . . , hs are independent, so each ni hi = 0, completing the claim.


Therefore, the whole group G has a basis proving the lemma by induction.

For more details see the proof of the general result on modules over principal ideal
domains later in the book. There is also an additional elementary proof for the basis
theorem for finitely generated Abelian groups.

10.5 Some Properties of Finite Groups


Classification is an extremely important concept in algebra. A large part of the theory is
devoted to classifying all structures of a given type, for example all UFD’s. In most cases,
this is not possible. Since for a given finite n, there are only finitely many group tables, it
is theoretically possible to classify all groups of order n. However, even for small n, this
becomes impractical. We close the chapter by looking at some further results on finite
groups, and then using these to classify all the finite groups up to order 10.
Before stating the classification, we give some further examples of groups that are
needed.

Example 10.5.1. In Example 9.2.6, we saw that the symmetry group of an equilateral
triangle had 6 elements, and is generated by elements r and f, which satisfy the relations
r^3 = f^2 = 1, f^{-1}rf = r^{-1}, where r is a rotation of 120° about the center of the triangle, and
f is a reflection through an altitude. This was called the dihedral group D3 of order 6.
This can be generalized to any regular n-gon, n > 2. If D is a regular n-gon, then
the symmetry group Dn has 2n elements, and is called the dihedral group of order 2n. It
is generated by elements r and f, which satisfy the relations r^n = f^2 = 1, f^{-1}rf = r^{n−1},
where r is a rotation of 2π/n about the center of the n-gon, and f is a reflection.
Hence, D4, the symmetries of a square, has order 8 and D5, the symmetries of a
regular pentagon, has order 10.
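The dihedral relations can be verified by realizing Dn as permutations of the vertices 0, . . . , n − 1 of the n-gon. This is a minimal sketch under that assumption; the specific reflection (the one fixing vertex 0) and all function names are illustrative choices.

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def power(p, k):
    """The k-fold composition of the permutation p with itself."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(p, result)
    return result

def dihedral(n):
    """Symmetries of a regular n-gon as permutations of the vertices 0..n-1."""
    identity = tuple(range(n))
    r = tuple((i + 1) % n for i in range(n))   # rotation by 2*pi/n
    f = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    elements, frontier = {identity}, [identity]
    while frontier:                            # closure under r and f
        x = frontier.pop()
        for gen in (r, f):
            y = compose(gen, x)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements, r, f

D4, r4, f4 = dihedral(4)
D5, r5, f5 = dihedral(5)
```

Since f is its own inverse here, the relation f^{-1}rf = r^{n−1} reads compose(f, compose(r, f)) == power(r, n - 1).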

Example 10.5.2. Let i, j, k be the generators of the quaternions. Then we have

i^2 = j^2 = k^2 = −1, (−1)^2 = 1, and ijk = −1.

The elements ±1, ±i, ±j, ±k then form a group of order 8 called the quaternion group denoted by Q.
Since ijk = −1, we have ij = k = −ji, and the generators i and j satisfy the relations i^4 = j^4 = 1,
i^2 = j^2, ij = i^2 ji.
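The quaternion relations can be checked with exact integer arithmetic. The sketch below assumes the usual representation of a quaternion a + bi + cj + dk as the 4-tuple (a, b, c, d) with the Hamilton product; the names qmul, ONE, I, J, K are illustrative.

```python
def qmul(x, y):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

# Generate the group from i and j by repeated left multiplication.
Q, frontier = {ONE}, [ONE]
while frontier:
    x = frontier.pop()
    for gen in (I, J):
        y = qmul(gen, x)
        if y not in Q:
            Q.add(y)
            frontier.append(y)
```

The closure has 8 elements, and the products confirm i^2 = j^2 = k^2 = ijk = −1 together with ij = i^2 ji.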

We now state the main classification, and then prove it in a series of lemmas.

Theorem 10.5.3. Let G be a finite group.


(a) If |G| = 2, then G ≅ ℤ2 .
(b) If |G| = 3, then G ≅ ℤ3 .
(c) If |G| = 4, then G ≅ ℤ4 , or G ≅ ℤ2 × ℤ2 .
(d) If |G| = 5, then G ≅ ℤ5 .
(e) If |G| = 6, then G ≅ ℤ6 ≅ ℤ2 × ℤ3 , or G ≅ D3 , the dihedral group with 6 elements.
(Note D3 ≅ S3 the symmetric group on 3 symbols.)
(f) If |G| = 7, then G ≅ ℤ7 .
(g) If |G| = 8, then G ≅ ℤ8 , or G ≅ ℤ4 × ℤ2 , or G ≅ ℤ2 × ℤ2 × ℤ2 , or G ≅ D4 , the dihedral
group of order 8, or G ≅ Q, the quaternion group.
(h) If |G| = 9, then G ≅ ℤ9 , or G ≅ ℤ3 × ℤ3 .
(i) If |G| = 10, then G ≅ ℤ10 ≅ ℤ2 × ℤ5 , or G ≅ D5 , the dihedral group with 10 elements.

Recall from Section 10.1, that a finite group of prime order must be cyclic. Hence, in
the theorem, the cases |G| = 2, 3, 5, 7 are handled. We next consider the case, where G
has order p^2, and where p is a prime.

Definition 10.5.4. If G is a group, then its center denoted Z(G), is the set of elements in G,
which commute with everything in G. That is,

Z(G) = {g ∈ G : gh = hg for any h ∈ G}.

Lemma 10.5.5. For any group G the following hold:


(a) The center Z(G) is a normal subgroup.
(b) G = Z(G) if and only if G is Abelian.
(c) If G/Z(G) is cyclic, then G is Abelian.

Proof. (a) and (b) are direct, and we leave them to the exercises. Consider the case,
where G/Z(G) is cyclic, generated by the coset gZ(G) with g ∈ G. Then each coset of Z(G)
has the form g^m Z(G).
Let a, b ∈ G. Then since a, b are in cosets of the center, we have a = g^m u and b = g^n v
with u, v ∈ Z(G). Then

ab = (g^m u)(g^n v) = (g^m g^n)(uv) = (g^n g^m)(vu) = (g^n v)(g^m u) = ba

since u, v commute with everything. Therefore, G is Abelian.

A p-group is any finite group of prime power order p^k. We need the following lemma,
whose proof is based on what is called the class equation, which we will prove in
Chapter 13.

Lemma 10.5.6. A finite p-group has a nontrivial center of order at least p.
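Lemma 10.5.6 can be observed on a small 2-group. The following minimal sketch computes the center of the dihedral group of order 8 by brute force, assuming its realization as permutations of the four vertices of a square; the generators r and f and the helper names are illustrative.

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def closure(generators):
    """The group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(g, x)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def center(G):
    """Elements of G commuting with every element of G."""
    return {g for g in G if all(compose(g, h) == compose(h, g) for h in G)}

r = (1, 2, 3, 0)     # rotation of the square by a quarter turn
f = (0, 3, 2, 1)     # reflection fixing vertex 0
D4 = closure([r, f])
Z = center(D4)
```

The center consists of the identity and the half turn r^2, so it has order 2, in line with the lemma for p = 2.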



Lemma 10.5.7. If |G| = p^2 with p a prime, then G is Abelian; hence we have G ≅ ℤ_{p^2}, or
G ≅ ℤp × ℤp.

Proof. Suppose that |G| = p^2. Then from the previous lemma, G has a nontrivial center;
hence, |Z(G)| = p, or |Z(G)| = p^2. If |Z(G)| = p^2, then G = Z(G), and G is Abelian. If
|Z(G)| = p, then |G/Z(G)| = p. Since p is a prime this implies that G/Z(G) is cyclic; hence,
from Lemma 10.5.5, G is Abelian.
Lemma 10.5.7 handles the cases n = 4 and n = 9. Therefore, if |G| = 4, we must have
G ≅ ℤ4 , or G ≅ ℤ2 × ℤ2 , and if |G| = 9, we must have G ≅ ℤ9 , or G ≅ ℤ3 × ℤ3 .
This leaves n = 6, 8, 10. We next handle the cases 6 and 10.

Lemma 10.5.8. If G is any group, where every nontrivial element has order 2, then G is
Abelian.

Proof. Suppose that g^2 = 1 for all g ∈ G. This implies that g = g^{-1} for all g ∈ G. Let a, b
be arbitrary elements of G. Then

(ab)^2 = 1 ⟹ abab = 1 ⟹ ab = b^{-1}a^{-1} = ba.

Therefore, a, b commute, and G is Abelian.

Lemma 10.5.9. If |G| = 6, then G ≅ ℤ6 , or G ≅ D3 .

Proof. Since 6 = 2 ⋅ 3, if G is Abelian, then G ≅ ℤ2 × ℤ3. Notice that if an Abelian
group has an element of order m and an element of order n with (n, m) = 1, then it has
an element of order mn. Therefore, for 6 if G is Abelian, there is an element of order 6;
hence, G ≅ ℤ2 × ℤ3 ≅ ℤ6.
Now suppose that G is non-Abelian. The nontrivial elements of G have orders 2, 3,
or 6. If there is an element of order 6, then G is cyclic, and hence Abelian. If every nontrivial
element has order 2, then G is Abelian. Therefore, there is an element of order 3, say g ∈ G. The
cyclic subgroup ⟨g⟩ = {1, g, g^2} then has index 2 in G and is, therefore, normal. Let h ∈ G
with h ∉ ⟨g⟩. Since g, g^2 both generate ⟨g⟩, we must have ⟨g⟩ ∩ ⟨h⟩ = {1}. If h also had
order 3, then |⟨g, h⟩| = |⟨g⟩||⟨h⟩| / |⟨g⟩ ∩ ⟨h⟩| = 9, which is impossible. Therefore, h must
have order 2.
Since ⟨g⟩ is normal, we have h^{-1}gh = g^t for t = 1, 2. If h^{-1}gh = g, then g, h commute,
and the group G is Abelian. Therefore, h^{-1}gh = g^2 = g^{-1}. It follows that g, h generate a
subgroup of G, satisfying

g^3 = h^2 = 1, h^{-1}gh = g^{-1}.

This defines a subgroup of order 6 isomorphic to D3 and, hence, must be all of G.

Lemma 10.5.10. If |G| = 10, then G ≅ ℤ10 , or G ≅ D5 .

Proof. The proof is almost identical to that for n = 6. Since 10 = 2 ⋅ 5, if G were Abelian,
G ≅ ℤ2 × ℤ5 ≅ ℤ10.
Now suppose that G is non-Abelian. As for n = 6, G must contain a normal cyclic
subgroup of order 5, say ⟨g⟩ = {1, g, g^2, g^3, g^4}. If h ∉ ⟨g⟩, then exactly as for n = 6, it
follows that h must have order 2, and h^{-1}gh = g^t for t = 1, 2, 3, 4. If h^{-1}gh = g, then g, h
commute, and G is Abelian. Notice that h^{-1} = h. Suppose that h^{-1}gh = hgh = g^2. Then

(hgh)^3 = (g^2)^3 = g^6 = g ⟹ g = h^2 g h^2 = h g^2 h = g^4 ⟹ g = 1,

which is a contradiction. Similarly, hgh = g^3 leads to a contradiction. Therefore, h^{-1}gh =
g^4 = g^{-1}, and g, h generate a subgroup of order 10, satisfying

g^5 = h^2 = 1; h^{-1}gh = g^{-1}.

Therefore, this is all of G, and is isomorphic to D5 .

This leaves the case n = 8, the most difficult. If |G| = 8, and G is Abelian, then clearly,
G ≅ ℤ8 , or G ≅ ℤ4 ×ℤ2 , or G ≅ ℤ2 ×ℤ2 ×ℤ2 . The proof of Theorem 10.5.3 is then completed
with the following:

Lemma 10.5.11. If G is a non-Abelian group of order 8, then G ≅ D4 , or G ≅ Q.

Proof. The nontrivial elements of G have orders 2, 4, or 8. If there is an element of
order 8, then G is cyclic, and hence Abelian, whereas if every nontrivial element has order 2, then
G is Abelian. Hence, we may assume that G has an element of order 4, say g. Then ⟨g⟩
has index 2 and is a normal subgroup. First, suppose that G has an element h ∉ ⟨g⟩ of
order 2. Then

h^{-1}gh = g^t for some t = 1, 2, 3.

If h^{-1}gh = g, then as in the cases 6 and 10, ⟨g, h⟩ defines an Abelian subgroup of order 8;
hence, G is Abelian. If h^{-1}gh = g^2, then

(h^{-1}gh)^2 = (g^2)^2 = g^4 = 1 ⟹ g = h^{-2} g h^2 = h^{-1} g^2 h = g^4 ⟹ g^3 = 1,

contradicting the fact that g has order 4. Therefore, h^{-1}gh = g^3 = g^{-1}. It follows that g,
h define a subgroup of order 8, isomorphic to D4. Since |G| = 8, this must be all of G and
G ≅ D4.
Therefore, we may now assume that every element h ∈ G with h ∉ ⟨g⟩ has order 4.
Let h be such an element. Then h^2 has order 2, so h^2 ∈ ⟨g⟩, which implies that h^2 = g^2.
This further implies that g^2 is central; that is, commutes with everything. Identifying g
with i, h with j, and g^2 with −1, we get that G is isomorphic to Q, completing Lemma 10.5.11
and the proof of Theorem 10.5.3.
In principle, this type of analysis can be used to determine the structure of any finite
group, although it quickly becomes impractical. A major tool in this classification is the
following important result known as the Sylow theorem, which we just state. We will
prove this theorem in Chapter 13. If |G| = p^m n with p a prime and (n, p) = 1, then a
subgroup of G of order p^m is called a p-Sylow subgroup. It is not clear at first that a
group will contain p-Sylow subgroups.

Theorem 10.5.12 (Sylow theorem). Let |G| = p^m n with p a prime and (n, p) = 1.
(a) G contains a p-Sylow subgroup.
(b) All p-Sylow subgroups of G are conjugate.
(c) Any p-subgroup of G is contained in a p-Sylow subgroup.
(d) The number of p-Sylow subgroups of G is of the form 1 + pk for some integer k ≥ 0 and divides n.
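Part (d) of the Sylow theorem already restricts the possible number of p-Sylow subgroups by pure arithmetic. The following minimal sketch lists the candidates permitted by part (d) for a given group order; the function name sylow_counts is illustrative, and the code only lists candidates, it does not decide which of them actually occurs in a particular group.

```python
def sylow_counts(order, p):
    """Candidates for the number of p-Sylow subgroups allowed by part (d)."""
    n = order
    while n % p == 0:       # strip the full power of p, leaving n with (n, p) = 1
        n //= p
    # Keep the divisors d of n that are congruent to 1 modulo p.
    return [d for d in range(1, n + 1) if n % d == 0 and d % p == 1]

candidates_6_3 = sylow_counts(6, 3)
candidates_6_2 = sylow_counts(6, 2)
candidates_10_5 = sylow_counts(10, 5)
```

For orders 6 and 10 the only candidate for the larger prime is 1, which is consistent with the normal cyclic subgroups of order 3 and 5 used in Lemmas 10.5.9 and 10.5.10.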

10.6 Automorphisms of a Group


Let G be a group. A homomorphism f : G → G is called an automorphism of G if f is
bijective. Let Aut(G) be the set of all automorphisms of G.

Theorem 10.6.1. Aut(G) is a group.

Proof. The identity map 1 is the identity of Aut(G), and composition of maps is associative.
Let f, g ∈ Aut(G).
Then certainly fg ∈ Aut(G). Now

f^{-1}(ab) = f^{-1}(f f^{-1}(a) f f^{-1}(b)) = f^{-1}(f(f^{-1}(a) f^{-1}(b))) = f^{-1}(a) f^{-1}(b)

for a, b ∈ G, because f ∈ Aut(G).
Hence, f^{-1} ∈ Aut(G).

A special automorphism of G is as follows: Let a ∈ G, and

i_a : G → G, i_a(x) = axa^{-1}.

By Lemma 10.1.3, we have that i_a ∈ Aut(G).

Definition 10.6.2. i_a is called an inner automorphism of G by a.
Let Inn(G) be the set of all inner automorphisms of G.

Theorem 10.6.3. The map φ : G → Aut(G), a ↦ i_a, is a homomorphism with image Inn(G);
that is, φ is an epimorphism (a surjective homomorphism) onto Inn(G).

Proof. Certainly φ(G) = Inn(G). We have the following:

φ(a)φ(b)(x) = i_a(i_b(x)) = i_a(bxb^{-1}) = abxb^{-1}a^{-1} = (ab)x(ab)^{-1} = i_{ab}(x) = φ(ab)(x),

that is, φ(ab) = φ(a)φ(b).



Theorem 10.6.4. Inn(G) is a normal subgroup of Aut(G); that is, Inn(G) ⊲ Aut(G).

Proof. From Theorem 10.6.3, Inn(G) is a homomorphic image φ(G) of G. Therefore,
Inn(G) < Aut(G). Let f ∈ Aut(G). Then

f i_a f^{-1}(x) = f(a f^{-1}(x) a^{-1}) = f(a) f f^{-1}(x) f(a^{-1}) = f(a) x (f(a))^{-1} = i_{f(a)}(x),

that is, f i_a f^{-1} = i_{f(a)} ∈ Inn(G).

We now consider the kernel ker(φ) of the map φ : G → Aut(G), a ↦ i_a.
We have

ker(φ) = {a ∈ G : i_a(x) = x for all x ∈ G} = {a ∈ G : axa^{-1} = x for all x ∈ G}.

Hence, ker(φ) = Z(G), the center of G. Now, from Theorem 10.2.3, we get the following:

Theorem 10.6.5. For a group G we have Inn(G) ≅ G/Z(G).
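The order count implied by Theorem 10.6.5 can be checked by brute force. The sketch below assumes G = S3, realized as the permutations of {0, 1, 2}, and stores each inner automorphism as the tuple of images of all group elements; the helper names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))

def inner(a):
    """The inner automorphism x -> a x a^{-1}, stored as the tuple of its values on S3."""
    a_inv = inverse(a)
    return tuple(compose(compose(a, x), a_inv) for x in S3)

inner_automorphisms = {inner(a) for a in S3}
center = [a for a in S3 if all(compose(a, x) == compose(x, a) for x in S3)]
```

The center of S3 is trivial, and there are 6 = |S3| / |Z(S3)| distinct inner automorphisms.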

Let G be a group and f ∈ Aut(G). If a ∈ G has order n, then f (a) also has order n; if
a ∈ G has infinite order then f (a) also has infinite order.

Example 10.6.6. Let V ≅ ℤ2 × ℤ2; that is, V has four elements 1, a, b and ab with a^2 =
b^2 = (ab)^2 = 1.
V is often called the Klein four group. An automorphism of V permutes the three
elements a, b and ab of order 2, and each permutation of {a, b, ab} defines an automor-
phism of V . Hence, Aut(V ) ≅ S3 .
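The claim that every permutation of the three elements of order 2 is an automorphism of V can be confirmed by exhaustive search. This is a minimal sketch, assuming V is modeled additively as pairs of bits; the names are illustrative.

```python
from itertools import permutations, product

V = list(product((0, 1), repeat=2))            # Z_2 x Z_2, written additively

def add(x, y):
    """Componentwise addition modulo 2."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

identity = (0, 0)
non_identity = [v for v in V if v != identity]

automorphisms = []
for images in permutations(non_identity):
    phi = dict(zip(non_identity, images))
    phi[identity] = identity                   # a homomorphism fixes the identity
    if all(phi[add(x, y)] == add(phi[x], phi[y]) for x in V for y in V):
        automorphisms.append(phi)
```

All 3! = 6 candidate bijections pass the homomorphism test, matching |S3| = 6.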

Example 10.6.7. We have S3 ≅ Inn(S3) = Aut(S3). By Theorem 10.6.5, S3 ≅ Inn(S3),
because Z(S3) = {1}. Now, let f ∈ Aut(S3). Analogously, as in Example 10.6.6, the
automorphism f permutes the three transpositions (1, 2), (1, 3), and (2, 3).
This gives | Aut(S3 )| ≤ |S3 | = 6, because S3 is generated by these transpositions. From
S3 ≅ Inn(S3 ) ⊲ Aut(S3 ), we have | Aut(S3 )| ≥ 6.
Hence, Aut(S3 ) ≅ Inn(S3 ) ≅ S3 .

Example 10.6.8. Let Gn = ⟨g⟩ ≅ (ℤn , +), n ∈ ℕ, be a cyclic group of order n.


If f ∈ Aut(Gn), then Gn = ⟨f(g)⟩ = ⟨g^k⟩, and (k, n) = 1 by Theorem 9.5.8. Hence, Aut(Gn) ≅ ℤ⋆n, the group of units of the ring ℤn = ℤ/nℤ.
In particular, |Aut(Gn)| = φ(n). If n = p is a prime number, then Aut(Gp) ≅ ℤ⋆p is cyclic by Theorem 9.5.16.
In general, Aut(Gn) is not cyclic. If, for instance, n = 8, then φ(8) = 4. The four automorphisms of G8 are given by f1(g) = g, f2(g) = g^3, f3(g) = g^5, and f4(g) = g^7. We have fi^2(g) = g for i = 1, 2, 3, 4. Hence, Aut(G8) ≅ ℤ2 × ℤ2. We remark that Aut(ℤ, +) ≅ ℤ2, because f(1) = 1 or f(1) = −1 for f ∈ Aut(ℤ, +).
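
The description of Aut(Gn) in Example 10.6.8 can be checked numerically. The following Python sketch is an illustration and not part of the text; the function names units, order_mod, and unit_group_is_cyclic are our own, and the automorphism g ↦ g^k is identified with the residue k modulo n.

```python
from math import gcd

def units(n):
    """Residues k mod n with (k, n) = 1; k stands for the automorphism g -> g^k."""
    return [k for k in range(1, n) if gcd(k, n) == 1]

def order_mod(k, n):
    """Multiplicative order of the unit k modulo n (n >= 2 assumed)."""
    power, order = k % n, 1
    while power != 1:
        power = (power * k) % n
        order += 1
    return order

def unit_group_is_cyclic(n):
    """True if some unit has order equal to the size of the unit group."""
    us = units(n)
    return any(order_mod(k, n) == len(us) for k in us)

print(units(8))                          # the exponents 1, 3, 5, 7 of the example
print([order_mod(k, 8) for k in units(8)])
print(unit_group_is_cyclic(8), unit_group_is_cyclic(7))
```

Every unit modulo 8 squares to 1, so no element has order 4 and the unit group is not cyclic, whereas modulo the prime 7 a generator exists.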

10.7 Exercises
1. Prove that if G is cyclic, then any factor group of G is also cyclic.
2. Prove that for any group G, the center Z(G) is a normal subgroup, and G = Z(G) if
and only if G is Abelian.
3. Let U1 and U2 be subgroups of a group G. Let x, y ∈ G. Show the following:
(i) If xU1 = yU2 , then U1 = U2 .
(ii) Give an example showing that xU1 = U2 x does not imply U1 = U2.
4. Let U, V be subgroups of a group G. Let x, y ∈ G. If UxV ∩ UyV ≠ ∅, then UxV = UyV.
5. Let N be a cyclic normal subgroup of the group G. Then all subgroups of N are
normal subgroups of G. Give an example to show that the statement is not correct
if N is not cyclic.
6. Let N1 and N2 be normal subgroups of G. Show the following:
(i) If all elements in N1 and N2 have finite order, then so do all elements of N1 N2.
(ii) Let e1, e2 ∈ ℕ. If ni^{ei} = 1 for all ni ∈ Ni (i = 1, 2), then x^{e1 e2} = 1 for all x ∈ N1 N2.
7. Find groups N1 , N2 and G with N1 ⊲ N2 ⊲ G, but N1 is not a normal subgroup of G.
8. Let G be a group generated by a and b, and let bab^{-1} = a^r and a^n = 1 for suitable r ∈ ℤ, n ∈ ℕ. Show the following:
(i) The subgroup A := ⟨a⟩ is a normal subgroup of G.
(ii) G/A = ⟨bA⟩.
(iii) G = {b^j a^i : i, j ∈ ℤ}.
9. Prove that no group of order 24 is simple.
10. Let G be a group with subgroups G1 , G2 . Then the following are equivalent:
(i) G ≅ G1 × G2 .
(ii) G1 ⊲ G, G2 ⊲ G, G = G1 G2 , and G1 ∩ G2 = {1}.
(iii) Every g ∈ G has a unique expression g = g1 g2 , where g1 ∈ G1 , g2 ∈ G2 , and
g1 g2 = g2 g1 for each g1 ∈ G1 , g2 ∈ G2 .
11. Suppose that G is a finite group with normal subgroups G1 , G2 such that
(|G1 |, |G2 |) = 1. If |G| = |G1 ||G2 |, then G ≅ G1 × G2 .
12. Let G be a group with normal subgroups G1 and G2 such that G = G1 G2 . Then

G/(G1 ∩ G2 ) ≅ G1 /(G1 ∩ G2 ) × G2 /(G1 ∩ G2 ).


11 Symmetric and Alternating Groups
11.1 Symmetric Groups and Cycle Decomposition
Groups most often appear as groups of transformations or permutations on a set. In
Galois Theory, groups will appear as permutation groups on the zeros of a polynomial.
In Section 9.3, we introduced permutation groups and the symmetric group Sn . In this
chapter, we look more carefully at the structure of Sn , and for each n introduce a very
important normal subgroup, An of Sn , called the alternating group on n symbols.
Recall that if A is a set, a permutation on A is a one-to-one mapping of A onto itself.
The set SA of all permutations on A forms a group under composition called the sym-
metric group on A. If |A| > 2, then SA is non-Abelian. Furthermore, if A, B have the same
cardinality, then SA ≅ SB .
If |A| = n, then |SA | = n! and, in this case, we denote SA by Sn , called the symmetric
group on n symbols. For example, |S3 | = 6. In Example 9.3.5, we showed that the six
elements of S3 can be given by the following:

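1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, a = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, b = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix},
c = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, d = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}.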

In addition, we saw that S3 has a presentation given by

S3 = ⟨a, c; a^3 = c^2 = 1, ac = ca^2⟩.

By this, we mean that S3 is generated by a, c, or that S3 has generators a, c, and the whole group and its multiplication table can be generated by using the relations a^3 = c^2 = 1, ac = ca^2.
In general, a permutation group is any subgroup of SA for a set A.
For the remainder of this chapter, we will only consider finite symmetric groups Sn
and always consider the set A as A = {1, 2, 3, . . . , n}.

Definition 11.1.1. Suppose that f is a permutation of A = {1, 2, . . . , n}, which has the following effect on the elements of A: There exist distinct elements a1, . . . , ak ∈ A with f(a1) = a2, f(a2) = a3, . . . , f(ak−1) = ak, f(ak) = a1, and f leaves all other elements (if there are any) of A fixed; that is, f(a) = a for every a ∈ A with a ≠ ai, i = 1, 2, . . . , k. Such a permutation f is called a cycle or a k-cycle.

We use the following notation for a k-cycle, f , as given above:

f = (a1 , a2 , . . . , ak ).

https://doi.org/10.1515/9783111142524-011

The cycle notation is read from left to right. It says f takes a1 into a2 , a2 into a3 , et
cetera, and finally ak , the last symbol, into a1 , the first symbol. Moreover, f leaves all the
other elements not appearing in the representation above fixed.
Note that one can write the same cycle in many ways using this type of notation; for
example, f = (a2 , a3 , . . . , ak , a1 ). In fact, any cyclic rearrangement of the symbols gives
the same cycle. The integer k is the length of the cycle. Note we allow a cycle to have
length 1, that is, f = (a1 ), for instance. This is just the identity map. For this reason, we
will usually designate the identity of Sn by (1), or just 1. (Of course, it also could be written
as (ai ), where ai ∈ A.)
If f and g are two cycles, they are called disjoint cycles if the elements moved by
one are left fixed by the other; that is, their representations contain different elements
of the set A (their representations are disjoint as sets).

Lemma 11.1.2. If f and g are disjoint cycles, then they must commute; that is, fg = gf .

Proof. Since the cycles f and g are disjoint, each element moved by f is fixed by g, and vice versa. First, suppose f(ai) ≠ ai. This implies that g(ai) = ai, and f^2(ai) ≠ f(ai), since f is injective. Hence, f moves f(ai), and so g(f(ai)) = f(ai). Thus, (fg)(ai) = f(g(ai)) = f(ai), whereas (gf)(ai) = g(f(ai)) = f(ai). Similarly, if g(aj) ≠ aj, then (fg)(aj) = (gf)(aj). Finally, if f(ak) = ak and g(ak) = ak, clearly then, (fg)(ak) = ak = (gf)(ak). Thus, gf = fg.

Before proceeding further with the theory, let us consider a specific example. Let
A = {1, 2, . . . , 8}, and let

f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 6 & 5 & 1 & 7 & 3 & 8 \end{pmatrix}.

We pick an arbitrary number from the set A, say 1. Then f (1) = 2, f (2) = 4, f (4) = 5,
f (5) = 1. Now select an element from A not in the set {1, 2, 4, 5}, say 3. Then f (3) = 6,
f (6) = 7, f (7) = 3.
Next select any element of A that does not occur in the set {1, 2, 4, 5} ∪ {3, 6, 7}. The
only element left is 8, and f (8) = 8. It is clear that we can now write the permutation f
as a product of cycles:

f = (1, 2, 4, 5)(3, 6, 7)(8),

where the order of the cycles is immaterial since they are disjoint and, therefore, com-
mute. It is customary to omit such cycles as (8) and write f simply as

f = (1, 2, 4, 5)(3, 6, 7)

with the understanding that the elements of A not appearing are left fixed by f .
It is not difficult to generalize what was done here for a specific example, and show
that any permutation f can be written uniquely, except for order, as a product of disjoint

cycles. Thus, let f be a permutation on the set A = {1, 2, . . . , n}, and let a1 ∈ A. Let f(a1) = a2, f^2(a1) = f(a2) = a3, et cetera, and continue until a repetition is obtained. We claim that this first occurs for a1; that is, the first repetition is, say

f^k(a1) = f(ak) = ak+1 = a1.

For suppose the first repetition occurs at the k-th iterate of f and

f^k(a1) = f(ak) = ak+1,

and ak+1 = aj, where j ≤ k. Then

f^k(a1) = f^{j−1}(a1),

and so f^{k−j+1}(a1) = a1. However, k − j + 1 < k if j ≠ 1, and we assumed that the first repetition occurred for k. Thus, j = 1, and so f does cyclically permute the set {a1, a2, . . . , ak}.
If k < n, then there exists b1 ∈ A such that b1 ∉ {a1 , a2 , . . . , ak }, and we may proceed
similarly with b1 . We continue in this manner until all the elements of A are accounted
for. It is then seen that f can be written in the form

f = (a1 , . . . , ak )(b1 , . . . , bℓ )(c1 , . . . , cm ) ⋅ ⋅ ⋅ (h1 , . . . , ht ).

Note that all powers f^i(a1) belong to the set

{a1 = f^0(a1) = f^k(a1), a2 = f^1(a1), . . . , ak = f^{k−1}(a1)};

all powers f^i(b1) belong to the set

{b1 = f^0(b1) = f^ℓ(b1), b2 = f^1(b1), . . . , bℓ = f^{ℓ−1}(b1)};

and so on. Here, by definition, b1 is the smallest element in {1, 2, . . . , n} that does not belong to {a1, a2, . . . , ak}; c1 is the smallest element in {1, 2, . . . , n} that does not belong to

{a1, a2, . . . , ak} ∪ {b1, b2, . . . , bℓ}.

Therefore, by construction, all the cycles are disjoint.


From this, it follows that k + ℓ + m + ⋅ ⋅ ⋅ + t = n. It is clear that this factorization is
unique, except for the order of the factors, since it tells explicitly what effect f has on
each element of A.
In summary, we have proven the following result.

Theorem 11.1.3. Every permutation in Sn can be written uniquely as a product of disjoint cycles (up to order).
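
The construction used to prove Theorem 11.1.3 is effectively an algorithm. A minimal Python sketch, assuming a permutation is stored as a dictionary {x: f(x)} (the name cycle_decomposition and this representation are our own choices), follows each orbit starting from the smallest element not yet used and omits 1-cycles, as is customary.

```python
def cycle_decomposition(f):
    """Disjoint cycles of a permutation given as a dict {x: f(x)}, fixed points omitted."""
    seen = set()
    cycles = []
    for start in sorted(f):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        x = f[start]
        while x != start:          # follow start, f(start), f^2(start), ... until it closes
            cycle.append(x)
            seen.add(x)
            x = f[x]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

# The permutation f of {1, ..., 8} from the worked example above.
f = {1: 2, 2: 4, 3: 6, 4: 5, 5: 1, 6: 7, 7: 3, 8: 8}
print(cycle_decomposition(f))  # [(1, 2, 4, 5), (3, 6, 7)]
```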

Example 11.1.4. The elements of S3 can be written in cycle notation as 1 = (1), (1, 2), (1, 3),
(2, 3), (1, 2, 3), (1, 3, 2). This is the largest symmetric group, which consists entirely of cy-
cles.
In S4 , for example, the element (1, 2)(3, 4) is not a cycle, but a product of cycles. Sup-
pose we multiply two elements of S3 , say (1, 2) and (1, 3). In forming the product or com-
position here, we read from right to left. Thus, to compute (1, 2)(1, 3): We note the per-
mutation (1, 3) takes 1 into 3, and then the permutation (1, 2) takes 3 into 3. Therefore,
the composite (1, 2)(1, 3) takes 1 into 3. Continuing the permutation, (1, 3) takes 3 into 1,
and then the permutation (1, 2) takes 1 into 2. Therefore, the composite (1, 2)(1, 3) takes 3
into 2. Finally, (1, 3) takes 2 into 2, and then (1, 2) takes 2 into 1. So (1, 2)(1, 3) takes 2 into 1.
Thus, we see (1, 2)(1, 3) = (1, 3, 2).
As another example of this cycle multiplication consider (1, 2)(2, 4, 5)(1, 3)(1, 2, 5)
in S5 :
Reading from right to left 1 ↦ 2 ↦ 2 ↦ 4 ↦ 4 so 1 ↦ 4. Now 4 ↦ 4 ↦ 4 ↦ 5 ↦ 5 so 4 ↦ 5. Next 5 ↦ 1 ↦ 3 ↦ 3 ↦ 3 so 5 ↦ 3. Then 3 ↦ 3 ↦ 1 ↦ 1 ↦ 2 so 3 ↦ 2. Finally, 2 ↦ 5 ↦ 5 ↦ 2 ↦ 1, so 2 ↦ 1. Since all the elements of A = {1, 2, 3, 4, 5} have been accounted for, we have (1, 2)(2, 4, 5)(1, 3)(1, 2, 5) = (1, 4, 5, 3, 2).
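
The right-to-left multiplication of cycles in Example 11.1.4 can be mechanized. The following Python sketch is an illustration under our own conventions: a permutation of {1, ..., n} is a dictionary, and the helper names cycle_to_map and product are hypothetical.

```python
def cycle_to_map(cycle, n):
    """Mapping {x: image} on {1, ..., n} for a single cycle (a1, ..., ak)."""
    f = {x: x for x in range(1, n + 1)}
    for i, a in enumerate(cycle):
        f[a] = cycle[(i + 1) % len(cycle)]
    return f

def product(cycles, n):
    """Composite of the listed cycles, applying the rightmost cycle first."""
    maps = [cycle_to_map(c, n) for c in cycles]
    result = {}
    for x in range(1, n + 1):
        y = x
        for f in reversed(maps):   # rightmost factor acts first
            y = f[y]
        result[x] = y
    return result

print(product([(1, 2), (1, 3)], 3) == cycle_to_map((1, 3, 2), 3))
print(product([(1, 2), (2, 4, 5), (1, 3), (1, 2, 5)], 5) == cycle_to_map((1, 4, 5, 3, 2), 5))
```

Both comparisons print True, in agreement with the two products computed by hand in the example.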

Let f ∈ Sn . If f is a cycle of length 2, that is, f = (a1 , a2 ), where a1 , a2 ∈ A, then f is


called a transposition. Any cycle can be written as a product of transpositions, namely,

(a1 , . . . , ak ) = (a1 , ak )(a1 , ak−1 ) ⋅ ⋅ ⋅ (a1 , a2 ).

From Theorem 11.1.3, any permutation can be written in terms of cycles, but from the
above, any cycle can be written as a product of transpositions. Thus, we have the follow-
ing result:

Theorem 11.1.5. Let f ∈ Sn be any permutation. Then f can be written as a product of


transpositions.

11.2 Parity and the Alternating Groups


If f is a permutation with a cycle decomposition

(a1 , . . . , ak )(b1 , . . . , bj ) ⋅ ⋅ ⋅ (m1 , . . . , mt ),

then f can be written as a product of

W (f ) = (k − 1) + (j − 1) + ⋅ ⋅ ⋅ + (t − 1)

transpositions. The number W (f ) is uniquely associated with the permutation f since f


is uniquely represented (up to order) as a product of disjoint cycles. However, there is

nothing unique about the number of transpositions occurring in an arbitrary represen-


tation of f as a product of transpositions. For example, in S3 ,

(1, 3, 2) = (1, 2)(1, 3) = (1, 2)(1, 3)(1, 2)(1, 2),

since (1, 2)(1, 2) = (1), the identity permutation of S3 .


Although the number of transpositions is not unique in the representation of a per-
mutation f as a product of transpositions, we will show that the parity (evenness or
oddness) of that number is unique. Moreover, this depends solely on the number W (f )
uniquely associated with the representation of f . More explicitly, we have the following
result:

Theorem 11.2.1. If f is a permutation written as a product of disjoint cycles, and if W (f )


is the associated integer given above, then if W (f ) is even (odd), any representation of f ,
as a product of transpositions, must contain an even (odd) number of transpositions.

Proof. We first observe the following:

(a, b)(b, c1 , . . . , ct )(a, b1 , . . . , bk ) = (a, b1 , . . . , bk , b, c1 , . . . , ct ),


(a, b)(a, b1 , . . . , bk , b, c1 , . . . , ct ) = (a, b1 , . . . , bk )(b, c1 , . . . , ct ).

Suppose now that f is represented as a product of disjoint cycles, where we include all
the 1-cycles of elements of A, which f fixes, if any. If a and b occur in the same cycle in
this representation for f ,

f = ⋅ ⋅ ⋅ (a, b1 , . . . , bk , b, c1 , . . . , ct ) ⋅ ⋅ ⋅ ,

then, in the computation of W (f ), this cycle contributes k + t + 1. Now consider (a, b)f .
Since the cycles are disjoint and disjoint cycles commute,

(a, b)f = ⋅ ⋅ ⋅ (a, b)(a, b1 , . . . , bk , b, c1 , . . . , ct ) ⋅ ⋅ ⋅

since neither a nor b can occur in any factor of f other than

(a, b1 , . . . , bk , b, c1 , . . . , ct ).

So that (a, b) cancels out, and we find that

(a, b)f = ⋅ ⋅ ⋅ (b, c1 , . . . , ct )(a, b1 , . . . , bk ) ⋅ ⋅ ⋅ .

Since W ((b, c1 , . . . , ct )(a, b1 , . . . , bk )) = k + t, but W (a, b1 , . . . , bk , b, c1 , . . . , ct ) = k + t + 1,


we have W ((a, b)f ) = W (f ) − 1.
A similar analysis shows that in the case, where a and b occur in different cycles in
the representation of f , then W ((a, b)f ) = W (f ) + 1. Combining both cases, we have

W ((a, b)f ) = W (f ) ± 1.

Now let f be written as a product of m transpositions, say

f = (a1 , b1 )(a2 , b2 ) ⋅ ⋅ ⋅ (am , bm ).

Then

(am , bm ) ⋅ ⋅ ⋅ (a2 , b2 )(a1 , b1 )f = 1.

Iterating this, together with the fact that W(1) = 0, shows that

W(f) + (±1) + (±1) + ⋅ ⋅ ⋅ + (±1) = 0,

where there are m terms of the form ±1. Thus,

W(f) = (±1) + (±1) + ⋅ ⋅ ⋅ + (±1),

again with m terms. Note, if exactly p of these terms are +1 and q = m − p are −1, then m = p + q, and W(f) = p − q. Hence, m ≡ W(f) (mod 2). Thus, W(f) is even if and only if m is even, and this completes the proof.
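
The two facts used in this proof, W((a, b)f) = W(f) ± 1 and m ≡ W(f) (mod 2), can be tested experimentally. The following Python sketch is an illustration only; the dictionary representation and the helper names W, apply_transposition, and random_product_of_transpositions are our own assumptions.

```python
import random

def W(f):
    """Sum of (length - 1) over the disjoint cycles of f, where f is a dict {x: f(x)}."""
    seen, total = set(), 0
    for start in f:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:       # walk once around the cycle through start
            seen.add(x)
            x = f[x]
            length += 1
        total += length - 1
    return total

def apply_transposition(a, b, f):
    """Return (a, b)f: first apply f, then interchange a and b."""
    swap = {a: b, b: a}
    return {x: swap.get(y, y) for x, y in f.items()}

def random_product_of_transpositions(n, m, rng):
    """A product of m randomly chosen transpositions in S_n."""
    f = {x: x for x in range(1, n + 1)}
    for _ in range(m):
        a, b = rng.sample(range(1, n + 1), 2)
        f = apply_transposition(a, b, f)
    return f

rng = random.Random(0)
for m in range(12):
    f = random_product_of_transpositions(6, m, rng)
    print(m, W(f), (m - W(f)) % 2 == 0)
```

Whatever transpositions are drawn, the last column should always read True, since m and W(f) have the same parity.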

It now makes sense to state the following definition since we know that the parity
is indeed unique:

Definition 11.2.2. A permutation f ∈ Sn is said to be even if it can be written as a product


of an even number of transpositions. Similarly, f is called odd if it can be written as a
product of an odd number of transpositions.

Definition 11.2.3. For n ≥ 2 we define the sign function sgn : Sn → (ℤ2 , +) by setting
sgn(π) = 0 if π is an even permutation and sgn(π) = 1 if π is an odd permutation.

We note that if f and g are even permutations, then so are fg and f −1 and also the
identity permutation is even. Furthermore, if f is even and g is odd, it is clear that fg is
odd. From this it is straightforward to establish the following:

Lemma 11.2.4. The map sgn is a homomorphism from Sn , for n ≥ 2, onto (ℤ2 , +).

We now let

An = {π ∈ Sn : sgn(π) = 0}.

That is, An is precisely the set of even permutations in Sn .

Theorem 11.2.5. For each n ∈ ℕ, n ≥ 2, the set An forms a normal subgroup of index 2 in Sn, called the alternating group on n symbols. Furthermore, |An| = n!/2.

Proof. By Lemma 11.2.4, sgn : Sn → (ℤ2, +) is a homomorphism. Then ker(sgn) = An; therefore, An is a normal subgroup of Sn. Since im(sgn) = ℤ2, we have |im(sgn)| = 2; hence, |Sn/An| = 2. Therefore, [Sn : An] = 2. Since |Sn| = n!, then |An| = n!/2 follows from Lagrange’s theorem.
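
For small n, both the homomorphism property of sgn and the order of An can be confirmed by enumeration. The Python sketch below is an illustration with our own conventions: permutations are tuples on {0, ..., n − 1}, and sgn is computed from the cycle lengths as W(f) mod 2.

```python
from itertools import permutations
from math import factorial

def sgn(p):
    """Element of Z_2 = {0, 1}: the number W(p) reduced mod 2; p is a tuple of images."""
    seen, w = set(), 0
    for start in range(len(p)):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        w += max(length - 1, 0)    # nothing is added when start was already visited
    return w % 2

def compose(p, q):
    """Composite p after q, i.e. x -> p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))

def alternating_group(n):
    return [p for p in permutations(range(n)) if sgn(p) == 0]

for n in range(2, 6):
    print(n, len(alternating_group(n)), factorial(n) // 2)
```

The two counts in each row agree, as Theorem 11.2.5 predicts.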

11.3 The Conjugation in Sn


Recall that in a group G, two elements x, y ∈ G are conjugates if there exists a g ∈ G with g^{-1}xg = y. Conjugacy is an equivalence relation on G. In the symmetric groups Sn, it is
easy to determine if two elements are conjugates. We say that two permutations in Sn
have the same cycle structure if they have the same number of cycles and the lengths
are the same. Hence, for example in S8 the permutations

π1 = (1, 3, 6, 7)(2, 5) and π2 = (2, 3, 5, 6)(1, 8)

have the same cycle structure. In particular, if π1 , π2 are two permutations in Sn , then
π1 , π2 are conjugates if and only if they have the same cycle structure. Therefore, in S8 ,
the permutations

π1 = (1, 3, 6, 7)(2, 5) and π2 = (2, 3, 5, 6)(1, 8)

are conjugates.

Lemma 11.3.1. Let

π = (a11 , a12 , . . . , a1k1 ) ⋅ ⋅ ⋅ (as1 , as2 , . . . , asks )

be the cycle decomposition of π ∈ Sn. Let τ ∈ Sn, and denote the image of a_{ij} under τ by a_{ij}^τ. Then

τπτ^{-1} = (a_{11}^τ, a_{12}^τ, . . . , a_{1k_1}^τ) ⋅ ⋅ ⋅ (a_{s1}^τ, a_{s2}^τ, . . . , a_{sk_s}^τ).

Proof. Consider a_{11}; then, operating on the left like functions, we have

τπτ^{-1}(a_{11}^τ) = τπ(a_{11}) = τ(a_{12}) = a_{12}^τ.

The same computation then follows for all the symbols a_{ij}, proving the lemma.

Theorem 11.3.2. Two permutations π1 , π2 ∈ Sn are conjugates if and only if they are of
the same cycle structure.

Proof. Suppose that π2 = τπ1 τ −1 . Then, from Lemma 11.3.1, we have that π1 and π2 are
of the same cycle structure.

Conversely, suppose that π1 and π2 are of the same cycle structure. Let

π1 = (a11 , a12 , . . . , a1k1 ) ⋅ ⋅ ⋅ (as1 , as2 , . . . , asks )


π2 = (b11 , b12 , . . . , b1k1 ) ⋅ ⋅ ⋅ (bs1 , bs2 , . . . , bsks ),

where we place the cycles of the same length under each other, 1-cycles included. Let τ be the permutation in Sn that maps each symbol in π1 to the symbol below it in π2. Then, from Lemma 11.3.1, we have τπ1τ^{-1} = π2; hence, π1 and π2 are conjugate.
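
The converse direction of this proof is constructive, and the construction can be sketched in Python. The code below is an illustration only: permutations are dictionaries on {1, ..., n}, and the helper names from_cycles, full_cycles, conjugator, and conjugate are our own.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from disjoint cycles."""
    f = {x: x for x in range(1, n + 1)}
    for c in cycles:
        for i, a in enumerate(c):
            f[a] = c[(i + 1) % len(c)]
    return f

def full_cycles(f):
    """All disjoint cycles of f, including 1-cycles, sorted by length."""
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        cycles.append(cycle)
    return sorted(cycles, key=len)

def conjugator(p1, p2):
    """A permutation tau with tau p1 tau^{-1} = p2, or None if the cycle structures differ."""
    c1, c2 = full_cycles(p1), full_cycles(p2)
    if [len(c) for c in c1] != [len(c) for c in c2]:
        return None
    tau = {}
    for upper, lower in zip(c1, c2):   # cycles of equal length placed under each other
        for a, b in zip(upper, lower):
            tau[a] = b
    return tau

def conjugate(tau, p):
    """The permutation tau p tau^{-1}."""
    inv = {v: k for k, v in tau.items()}
    return {x: tau[p[inv[x]]] for x in p}

pi1 = from_cycles([(1, 3, 6, 7), (2, 5)], 8)
pi2 = from_cycles([(2, 3, 5, 6), (1, 8)], 8)
tau = conjugator(pi1, pi2)
print(conjugate(tau, pi1) == pi2)
```

The conjugating permutation is not unique; this sketch simply returns the one determined by the chosen ordering of the cycles.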

11.4 The Simplicity of An


A simple group is a group G with no nontrivial proper normal subgroups. Up to this
point, the only examples we have of simple groups are cyclic groups of prime order. In
this section, we prove that if n ≥ 5, each alternating group An is a simple group.

Theorem 11.4.1. For each n ≥ 3 each π ∈ An is a product of cycles of length 3.

Proof. Let π ∈ An. Since π is a product of an even number of transpositions, to prove the theorem, it suffices to show that if τ1, τ2 are transpositions, then τ1τ2 is a product of 3-cycles.
The statement holds certainly for n = 3. Now, let n ≥ 4.
Suppose that a, b, c, d are different digits in {1, . . . , n}. There are three cases to con-
sider. First:

Case (1): (a, b)(a, b) = 1 = (1, 2, 3)^0;

hence, it is true here.


Next:

Case (2): (a, b)(b, c) = (c, a, b);

hence, it is also true here.


Finally:

Case (3): (a, b)(c, d) = (a, b)(b, c)(b, c)(c, d) = (c, a, b)(c, d, b)

since (b, c)(b, c) = 1. Therefore, it is also true here, proving the theorem.
Now our main result:

Theorem 11.4.2. For n ≥ 5, the alternating group An is a simple non-Abelian group.

Proof. Suppose that N is a nontrivial normal subgroup of An with n ≥ 5. We show that


N = An ; hence, An is simple.

We claim first that N must contain a 3-cycle. Let 1 ≠ π ∈ N, then π is not a transposi-
tion since π ∈ An . Therefore, π moves at least 3 digits. If π moves exactly 3 digits, then it
is a 3-cycle, and we are done. Suppose then that π moves at least 4 digits. Let π = τ1 ⋅ ⋅ ⋅ τr
with τi disjoint cycles.
Case (1): There is a τi = (. . . , a, b, c, d). Set σ = (a, b, c) ∈ An. Then

πσπ^{-1} = τi σ τi^{-1} = (b, c, d),

since, from Lemma 11.3.1, (b, c, d) = (a^{τi}, b^{τi}, c^{τi}). Furthermore, since π ∈ N and N is normal, we have

π(σπ^{-1}σ^{-1}) = (b, c, d)(a, c, b) = (a, d, b) ∈ N.

Therefore, in this case, N contains a 3-cycle.


Case (2): There is a τi , which is a 3-cycle. Then

π = (a, b, c)(d, e, . . .).

Now, set σ = (a, b, d) ∈ An , and then

πσπ^{-1} = (b, c, e) = (a^π, b^π, d^π),

and

σ^{-1}πσπ^{-1} = (a, d, b)(b, c, e) = (b, c, e, a, d) ∈ N.

Now, use Case (1). Therefore, in this case, N has a 3-cycle.


In the final case, π is a disjoint product of transpositions.
Case (3): π = (a, b)(c, d) ⋅ ⋅ ⋅. Since n ≥ 5, there exists an e ≠ a, b, c, d. We now set σ = (a, c, e) ∈ An. Then πσπ^{-1} = (b, d, e1) with e1 = e^π ≠ b, d, since, from Lemma 11.3.1, πσπ^{-1} = (a^π, c^π, e^π). Let γ = (σ^{-1}πσ)π^{-1}. This is in N since N is normal. If e = e1, then γ = (e, c, a)(b, d, e) = (a, e, b, d, c), and we can use Case (1) to get that N contains a 3-cycle. If e ≠ e1, then γ = (e, c, a)(b, d, e1) ∈ N, and then we can use Case (2) to obtain that N contains a 3-cycle.
These three cases show that N must contain a 3-cycle.
If N is normal in An, then from the argument above, N contains a 3-cycle τ. However, from Theorem 11.3.2, any two 3-cycles in Sn are conjugate. Hence, any other 3-cycle in Sn has the form ρτρ^{-1} with ρ ∈ Sn. If ρ is odd, choose two digits x, y not moved by τ, which is possible since n ≥ 5; then ρ(x, y) ∈ An, and since (x, y) commutes with τ, conjugating τ by ρ(x, y) again gives ρτρ^{-1}. We may therefore assume that ρ ∈ An. Since N is normal in An and τ ∈ N, each of these conjugates must also be in N. Therefore, N contains all 3-cycles in Sn. From Theorem 11.4.1, each element of An is a product of 3-cycles. It follows then that each element of An is in N. However, since N ⊂ An, this is only possible if N = An, completing the proof.
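
For n = 5 the statement of Theorem 11.4.2 can be confirmed by brute force: the smallest normal subgroup of A5 containing any given nonidentity element is all of A5. The following Python sketch is an illustration and not a substitute for the proof; permutations are tuples on {0, ..., 4}, parity is computed by counting inversions, and all helper names are our own.

```python
from itertools import permutations

def compose(p, q):
    """Composite p after q, i.e. x -> p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def is_even(p):
    """A permutation is even exactly when its number of inversions is even."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

def generated_subgroup(gens, n):
    """All finite products of the generators (a subgroup, since the group is finite)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def normal_closure(x, G):
    """Subgroup generated by all conjugates g x g^{-1} with g in G."""
    conjugates = [compose(compose(g, x), inverse(g)) for g in G]
    return generated_subgroup(conjugates, len(x))

def alternating_group(n):
    return [p for p in permutations(range(n)) if is_even(p)]

A5 = alternating_group(5)
identity = tuple(range(5))
print(all(len(normal_closure(x, A5)) == 60 for x in A5 if x != identity))

A4 = alternating_group(4)
print(len(normal_closure((1, 0, 3, 2), A4)))
```

The first line printed is True. The second is 4: in A4 the double transpositions generate a proper normal subgroup of order 4, which shows why the hypothesis n ≥ 5 is needed.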

Theorem 11.4.3. Let n ∈ ℕ and U ⊂ Sn a subgroup. Let τ = (1, 2) be a transposition and


α = (1, 2, a3 , . . . , an ) an n-cycle with α, τ ∈ U. Then U = Sn .

Proof. Let

π = \begin{pmatrix} 1 & 2 & a_3 & \cdots & a_n \\ 1 & 2 & 3 & \cdots & n \end{pmatrix}.

Then, from Lemma 11.3.1, we have

παπ^{-1} = (1, 2, . . . , n).

Furthermore, π(1, 2)π^{-1} = (1, 2). Hence, U1 = πUπ^{-1} contains (1, 2) and (1, 2, . . . , n). Now we have

(1, 2, . . . , n)(1, 2)(1, 2, . . . , n)^{-1} = (2, 3) ∈ U1.

Analogously,

(1, 2, . . . , n)(2, 3)(1, 2, . . . , n)^{-1} = (3, 4) ∈ U1,

and so on until

(1, 2, . . . , n)(n − 2, n − 1)(1, 2, . . . , n)^{-1} = (n − 1, n) ∈ U1.

Hence, the transpositions (1, 2), (2, 3), . . . , (n − 1, n) ∈ U1 . Moreover,

(1, 2)(2, 3)(1, 2) = (1, 3) ∈ U1 .

In an identical fashion, each (1, k) ∈ U1 . Then for any digits s, t, we have

(1, s)(1, t)(1, s) = (s, t) ∈ U1 .

Therefore, U1 contains all the transpositions of Sn; hence, U1 = Sn. Since U = π^{-1}U1π, we must also have U = Sn.
We end this chapter with the following corollary.

Corollary 11.4.4. Let p be a prime number and U ⊂ Sp a subgroup. Let τ be a transposition


and α be a p-cycle with α, τ ∈ U. Then U = Sp .

Proof. Suppose, without loss of generality, that τ = (1, 2). Since α, . . . , α^{p−1} are p-cycles with no fixed points (recall that p is a prime number), there exists an i with α^i(1) = 2. Replacing α by α^i, we may assume that α = (1, 2, a3, . . . , ap). Now the result follows from Theorem 11.4.3.
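
Theorem 11.4.3 and Corollary 11.4.4 can be checked for small n by generating the subgroup explicitly. The Python sketch below is an illustration with our own helper names; the symbols are 0, ..., n − 1 rather than 1, ..., n, so the transposition (1, 2) of the text appears as the cycle (0, 1).

```python
from math import factorial

def compose(p, q):
    """Composite p after q, i.e. x -> p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))

def from_cycle(n, cyc):
    """The cycle cyc as a tuple of images on {0, ..., n - 1}."""
    p = list(range(n))
    for i, a in enumerate(cyc):
        p[a] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def generated_subgroup(gens, n):
    """All finite products of the generators (a subgroup, since the group is finite)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# Theorem 11.4.3: a transposition of two neighbours in an n-cycle, with that n-cycle.
for n in range(3, 7):
    U = generated_subgroup([from_cycle(n, (0, 1)), from_cycle(n, tuple(range(n)))], n)
    print(n, len(U), factorial(n))

# Corollary 11.4.4 with p = 5: any transposition together with a 5-cycle.
print(len(generated_subgroup([from_cycle(5, (0, 2)), from_cycle(5, (0, 1, 2, 3, 4))], 5)))

# For the non-prime n = 4, a transposition of non-neighbours and a 4-cycle give less.
print(len(generated_subgroup([from_cycle(4, (0, 2)), from_cycle(4, (0, 1, 2, 3))], 4)))
```

The last value is 8 rather than 24, which illustrates that the corollary depends on p being prime.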

11.5 Exercises
1. Show that for n ≥ 3, the group An is generated by {(1, 2, k) : k ≥ 3}.
2. Let σ ∈ Sn be a product of disjoint cycles of lengths k1, . . . , ks. Show that the order of σ is the least common multiple of k1, . . . , ks. Compute the order of τ = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 2 & 6 & 5 & 1 & 3 & 4 & 7 \end{pmatrix} ∈ S7.
3. Let G = S4 .
(i) Determine a noncyclic subgroup H of order 4 of G.
(ii) Show that H is normal.
(iii) Show that f(g)(h) := ghg^{-1} defines an epimorphism f : G → Aut(H) for g ∈ G and h ∈ H. Determine its kernel.
4. Show that all subgroups of order 6 of S4 are conjugate.
5. Let σ1 = (1, 2)(3, 4) and σ2 = (1, 3)(2, 4) ∈ S4. Determine τ ∈ S4 such that τσ1τ^{-1} = σ2.
6. Let σ = (a1, . . . , ak) ∈ Sn. Describe σ^{-1}.
12 Solvable Groups
12.1 Solvability and Solvable Groups
The original motivation for Galois theory grew out of a famous problem in the theory of
equations. This problem was to determine the solvability or insolvability of a polynomial
equation of degree 5 or higher in terms of a formula involving the coefficients of the
polynomial and only using algebraic operations and radicals. This question arose out of
the well-known quadratic formula.
The ability to solve quadratic equations and, in essence, the quadratic formula was
known to the Babylonians some 3600 years ago. With the discovery of imaginary num-
bers, the quadratic formula then says that any second degree polynomial over ℂ can
be solved by radicals in terms of the coefficients. In the sixteenth century, the Italian
mathematician, Niccolo Tartaglia, discovered a similar formula in terms of radicals to
solve cubic equations. This cubic formula is now known erroneously as Cardano’s for-
mula in honor of Cardano, who first published it in 1545. An earlier special version of
this formula was discovered by Scipione del Ferro. Cardano’s student, Ferrari, extended
the formula to solutions by radicals for fourth degree polynomials. The combination of
these formulas says that polynomial equations of degree four or less over the complex
numbers can be solved by radicals.
From Cardano’s work until the very early nineteenth century, attempts were made
to find similar formulas for degree five polynomials. In 1805, Ruffini proved that fifth de-
gree polynomial equations are insolvable by radicals in general. Therefore, there exists
no comparable formula for degree 5. Abel (in 1825–1826) and Galois (in 1831) extended
Ruffini’s result and proved the insolubility by radicals for all degrees five or greater. In
doing this, Galois developed a general theory of field extensions and its relationship to
group theory. This has come to be known as Galois theory and is really the main focus
of this book.
The solution of the insolvability of the quintic and higher polynomials involved a
translation of the problem into a group theory setting. For a polynomial equation to
be solvable by radicals, its corresponding Galois group (a concept we will introduce in
Chapter 16) must be a solvable group. This is a group with a certain defined structure. In
this chapter, we introduce and discuss this class of groups.
A normal series for a group G is a finite chain of subgroups beginning with G and
ending with the identity subgroup {1}

G = G0 ⊃ G1 ⊃ G2 ⊃ ⋅ ⋅ ⋅ ⊃ Gn−1 ⊃ Gn = {1},

in which each Gi+1 is a proper normal subgroup of Gi . The factor groups Gi /Gi+1 are
called the factors of the series, and n is the length of the series.

https://doi.org/10.1515/9783111142524-012

Definition 12.1.1. A group G is solvable if it has a normal series with Abelian factors;
that is, Gi /Gi+1 is Abelian for all i = 0, 1, . . . , n − 1. Such a normal series is called a solvable
series.

If G is an Abelian group, then G = G0 ⊃ {1} provides a solvable series. Hence, any Abelian group is solvable. Furthermore, the symmetric group S3 on 3 symbols is also solvable, although it is non-Abelian. Consider the series

S3 ⊃ A3 ⊃ {1}.

Since |S3 | = 6, we have |A3 | = 3; hence, A3 is cyclic and therefore Abelian. Furthermore,
|S3 /A3 | = 2; hence, the factor group S3 /A3 is also cyclic, thus Abelian. Therefore, the
series above gives a solvable series for S3 .

Lemma 12.1.2. If G is a finite solvable group, then G has a normal series with cyclic fac-
tors.

Proof. If G is a finite solvable group, then by definition, it has a normal series with
Abelian factors. Hence, to prove the lemma, it suffices to show that a finite Abelian group
has a normal series with cyclic factors. Let A be a nontrivial finite Abelian group. We do an induction on the order of A. If |A| = 2, then A itself is cyclic, and the result follows. Suppose that |A| > 2. Choose a ∈ A with a ≠ 1, and let N = ⟨a⟩, so that N is cyclic. If N = A, then A is cyclic, and we are done. Otherwise, we have the normal series A ⊃ N ⊃ {1} with A/N Abelian. Moreover, A/N has order less than |A|, so by induction A/N has a normal series with cyclic factors. Taking the preimages of its terms in A and appending N ⊃ {1} gives a normal series for A with cyclic factors, and the result follows.

Solvability is preserved under subgroups and factor groups.

Theorem 12.1.3. Let G be a solvable group. Then the following hold:


(1) Any subgroup H of G is also solvable.
(2) Any factor group G/N of G is also solvable.

Proof. (1) Let G be a solvable group, and suppose that

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}

is a solvable series for G. Hence, Gi+1 is a normal subgroup of Gi for each i, and the factor
group Gi /Gi+1 is Abelian.
Now let H be a subgroup of G, and consider the chain of subgroups

H = H ∩ G0 ⊃ H ∩ G1 ⊃ ⋅ ⋅ ⋅ ⊃ H ∩ Gr = {1}.

Since Gi+1 is normal in Gi , we know that H ∩ Gi+1 is normal in H ∩ Gi ; this gives a finite
normal series for H. Furthermore, from the second isomorphism theorem, we have

(H ∩ Gi )/(H ∩ Gi+1 ) = (H ∩ Gi )/((H ∩ Gi ) ∩ Gi+1 )


≅ (H ∩ Gi )Gi+1 /Gi+1 ⊂ Gi /Gi+1

for each i. However, Gi /Gi+1 is Abelian, so each factor in the normal series for H is
Abelian. Therefore, the above series is a solvable series for H; hence, H is also solvable.
(2) Let N be a normal subgroup of G. Then from (1) N is also solvable. As above, let

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}

be a solvable series for G. Consider the chain of subgroups

G/N = G0 N/N ⊃ G1 N/N ⊃ ⋅ ⋅ ⋅ ⊃ Gr N/N = N/N = {1}.

Let m ∈ Gi−1 , n ∈ N. Then since N is normal in G,

(mn)^{-1} Gi N (mn) = n^{-1} m^{-1} Gi m n N = n^{-1} Gi n N
= n^{-1} N Gi = N Gi = Gi N.

It follows that Gi+1 N is normal in Gi N for each i; therefore, the series for G/N is a normal
series.
Again, from the isomorphism theorems,

(Gi N/N)/(Gi+1 N/N) ≅ Gi /(Gi ∩ Gi+1 N)


≅ (Gi /Gi+1 )/((Gi ∩ Gi+1 N)/Gi+1 ).

However, the last group (Gi /Gi+1 )/((Gi ∩ Gi+1 N)/Gi+1 ) is a factor group of the group
Gi /Gi+1 , which is Abelian. Hence, this last group is also Abelian; therefore, each factor
in the normal series for G/N is Abelian. Hence, this series is a solvable series, and G/N
is solvable.
The following is a type of converse of the above theorem:

Theorem 12.1.4. Let G be a group and N a normal subgroup of G. If both N and G/N are
solvable, then G is solvable.

Proof. Suppose that

N = N0 ⊃ N1 ⊃ ⋅ ⋅ ⋅ ⊃ Nr = {1}
G/N = G0 /N ⊃ G1 /N ⊃ ⋅ ⋅ ⋅ ⊃ Gs /N = N/N = {1}

are solvable series for N and G/N, respectively. Then

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs = N ⊃ N1 ⊃ ⋅ ⋅ ⋅ ⊃ Nr = {1}

gives a normal series for G. Furthermore, from the isomorphism theorems again,

Gi /Gi+1 ≅ (Gi /N)/(Gi+1 /N);

hence, each factor is Abelian. Therefore, this is a solvable series for G; hence, G is solv-
able.

This theorem allows us to prove that solvability is preserved under direct products.

Corollary 12.1.5. Let G and H be solvable groups. Then their direct product G × H is also
solvable.

Proof. Suppose that G and H are solvable groups and K = G × H. Recall from Chapter 10
that G can be considered as a normal subgroup of K with K/G ≅ H. Therefore, G is a solv-
able subgroup of K, and K/G is a solvable quotient. It follows then, from Theorem 12.1.4,
that K is solvable.
We saw that the symmetric group S3 is solvable. However, the following theorem
shows that the symmetric group Sn is not solvable for n ≥ 5. This result will be crucial
to the proof of the insolvability of the quintic and higher polynomials.

Theorem 12.1.6. For n ≥ 5, the symmetric group Sn is not solvable.

Proof. For n ≥ 5, we saw that the alternating group An is simple. Furthermore, An is non-
Abelian. Hence, An cannot have a nontrivial normal series, and so no solvable series.
Therefore, An is not solvable. If Sn were solvable for n ≥ 5, then from Theorem 12.1.3,
An would also be solvable. Therefore, Sn must also be nonsolvable for n ≥ 5.

In general, for a simple, solvable group we have the following:

Lemma 12.1.7. If a group G is both simple and solvable, then G is cyclic of prime order.

Proof. Suppose that G is a nontrivial simple, solvable group. Since G is simple, the only
normal series for G is G = G0 ⊃ {1}. Since G is solvable, the factors are Abelian; hence,
G is Abelian. Again, since G is simple, G must be cyclic. If G were infinite, then G ≅
(ℤ, +). However, then 2ℤ is a proper normal subgroup, a contradiction. Therefore, G
must be finite cyclic. If the order were not prime, then for each proper divisor of the
order, there would be a nontrivial proper normal subgroup. Therefore, G must be of
prime order.
In general, a finite p-group is solvable.

Theorem 12.1.8. A finite p-group G is solvable.

Proof. Suppose that |G| = p^n. We proceed by induction on n. If n = 1, then |G| = p, and G is
cyclic, hence Abelian and therefore solvable. Suppose that n > 1 and that every p-group of
order less than p^n is solvable. Then, as used previously,
G has a nontrivial center Z(G). If Z(G) = G, then G is Abelian; hence solvable. If Z(G) ≠
G, then Z(G) is a finite p-group of order less than p^n. From our inductive hypothesis,
Z(G) must be solvable. Furthermore, G/Z(G) is then also a finite p-group of order less
than p^n, so it is also solvable. Hence, Z(G) and G/Z(G) are both solvable. Therefore, from
Theorem 12.1.4, G is solvable.

12.2 The Derived Series


Let G be a group, and let a, b ∈ G. The product aba^{-1}b^{-1} is called the commutator of a
and b. We write [a, b] = aba^{-1}b^{-1}.
Clearly, [a, b] = 1 if and only if a and b commute.

Definition 12.2.1. Let G′ be the subgroup of G, which is generated by the set of all com-
mutators

G′ = gp({[x, y] : x, y ∈ G}).

G′ is called the commutator or (derived) subgroup of G. We sometimes write G′ = [G, G].

Theorem 12.2.2. For any group G, the commutator subgroup G′ is a normal subgroup of
G, and G/G′ is Abelian. Furthermore, if H is a normal subgroup of G, then G/H is Abelian
if and only if G′ ⊂ H.

Proof. The commutator subgroup G′ consists of all finite products of commutators and
inverses of commutators. However,

[a, b]^{-1} = (aba^{-1}b^{-1})^{-1} = bab^{-1}a^{-1} = [b, a],

and so the inverse of a commutator is once again a commutator. It then follows that G′ is
precisely the set of all finite products of commutators; that is, G′ is the set of all elements
of the form

h1 h2 ⋅ ⋅ ⋅ hn ,

where each hi is a commutator of elements of G.


If h = [a, b] for a, b ∈ G, then for x ∈ G, xhx^{-1} = [xax^{-1}, xbx^{-1}] is again a commutator
of elements of G. Now from our previous comments, an arbitrary element of G′ has the
form h1 h2 ⋅ ⋅ ⋅ hn , where each hi is a commutator.
Thus, x(h1 h2 ⋅ ⋅ ⋅ hn)x^{-1} = (xh1 x^{-1})(xh2 x^{-1}) ⋅ ⋅ ⋅ (xhn x^{-1}) and, since by the above each
xhi x^{-1} is a commutator, x(h1 h2 ⋅ ⋅ ⋅ hn)x^{-1} ∈ G′ . It follows that G′ is a normal subgroup
of G.
Consider the factor group G/G′ . Let aG′ and bG′ be any two elements of G/G′ . Then

[aG′ , bG′ ] = aG′ ⋅ bG′ ⋅ (aG′)^{-1} ⋅ (bG′)^{-1}
= aG′ ⋅ bG′ ⋅ a^{-1}G′ ⋅ b^{-1}G′ = aba^{-1}b^{-1}G′ = G′

since [a, b] ∈ G′ . In other words, any two elements of G/G′ commute; therefore, G/G′ is
Abelian.
Now let N be a normal subgroup of G with G/N Abelian. Let a, b ∈ G, then aN and
bN commute since G/N is Abelian. Therefore,

[aN, bN] = aN ⋅ bN ⋅ a^{-1}N ⋅ b^{-1}N = aba^{-1}b^{-1}N = N.

It follows that [a, b] ∈ N. Therefore, all commutators of elements in G lie in N; thus,


G′ ⊂ N.

From the second part of Theorem 12.2.2, we see that G′ is the minimal normal subgroup
of G such that G/G′ is Abelian. We call G/G′ = G^{ab} the Abelianization of G.
We consider next the following inductively defined sequence of subgroups of an
arbitrary group G called the derived series:

Definition 12.2.3. For an arbitrary group G, define G(0) = G and G(1) = G′ , and then,
inductively, G(n+1) = (G(n) )′ . That is, G(n+1) is the commutator subgroup or derived group
of G(n) . The chain of subgroups

G = G(0) ⊃ G(1) ⊃ ⋅ ⋅ ⋅ ⊃ G(n) ⊃ ⋅ ⋅ ⋅

is called the derived series for G.

Notice that since G(i+1) is the commutator subgroup of G(i) , we have G(i) /G(i+1) is
Abelian. If the derived series reaches {1} after finitely many steps, then G has a normal
series with Abelian factors; hence G is solvable. The converse is also true and characterizes
solvable groups in terms of the derived series.

Theorem 12.2.4. A group G is solvable if and only if its derived series is finite. That is,
there exists an n such that G(n) = {1}.

Proof. If G(n) = {1} for some n, then as explained above, the derived series provides a
solvable series for G; hence, G is solvable. Conversely, suppose that G is solvable, and let

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gr = {1}

be a solvable series for G. We claim first that Gi ⊃ G(i) for all i. We do this by induction
on i. If i = 0, then G0 = G = G(0) . Suppose that Gi ⊃ G(i) . Then Gi′ ⊃ (G(i) )′ = G(i+1) . Since
Gi /Gi+1 is Abelian, it follows, from Theorem 12.2.2, that Gi+1 ⊃ Gi′ . Therefore, Gi+1 ⊃ G(i+1) ,
establishing the claim. In particular, Gr ⊃ G(r) .
However, Gr = {1}; therefore, G(r) = {1}, proving the theorem.

For a solvable group G, the smallest n with G(n) = {1} is called the solvability length of G.
The class of solvable groups of class c consists of those solvable groups of solvability
length c, or less.
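The derived series can be computed by brute force for small permutation groups. The following is a minimal sketch, not an efficient algorithm: permutations of {0, ..., n−1} are stored as tuples of images, and the helper names (compose, inverse, generate, derived_subgroup, derived_series_orders) are our own illustrative choices. It lists the orders of the terms of the derived series of Sn and stops as soon as the series no longer changes.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p (permutations are tuples of images)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens (finite, so closing under products suffices)."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def derived_subgroup(group, n):
    """Subgroup generated by all commutators [a, b] = a b a^{-1} b^{-1}."""
    commutators = {compose(compose(a, b), compose(inverse(a), inverse(b)))
                   for a in group for b in group}
    return generate(commutators, n)

def derived_series_orders(n):
    """Orders of G^(0), G^(1), ... for G = S_n, until the series stabilises."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        nxt = derived_subgroup(group, n)
        if len(nxt) == len(group):   # no further change: the series has stabilised
            return orders
        group = nxt
        orders.append(len(group))

for n in (3, 4, 5):
    print(n, derived_series_orders(n))
```

For S3 and S4 the list ends in 1, in agreement with Theorem 12.2.4, whereas for S5 the series stabilises at the alternating group of order 60 and never reaches {1}.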

12.3 Composition Series and the Jordan–Hölder Theorem


The concept of a normal series is extremely important in the structure theory of groups.
This is especially true for finite groups. If

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs = {1} and G = H0 ⊃ H1 ⊃ ⋅ ⋅ ⋅ ⊃ Ht = {1}

are two normal series for the group G, then the second is a refinement of the first if
all the terms of the first occur in the second series. Furthermore, two normal series
are called equivalent (or isomorphic) if there exists a 1–1 correspondence between the
factors (hence the length must be the same) of the two series such that the corresponding
factors are isomorphic.

Theorem 12.3.1 (Schreier’s theorem). Any two normal series for a group G have equiva-
lent refinements.

Proof. Consider two normal series for G:

G = G0 ⊃ G1 ⊃ ⋅ ⋅ ⋅ ⊃ Gs−1 ⊃ Gs = {1},
G = H0 ⊃ H1 ⊃ ⋅ ⋅ ⋅ ⊃ Ht−1 ⊃ Ht = {1}.

Now define

Gij = (Gi ∩ Hj )Gi+1 , j = 0, 1, 2, . . . , t,


Hji = (Gi ∩ Hj )Hj+1 , i = 0, 1, 2, . . . , s.

Then we have

G = G00 ⊃ G01 ⊃ ⋅ ⋅ ⋅ ⊃ G0t = G1


= G10 ⊃ ⋅ ⋅ ⋅ ⊃ G1t = G2 ⊃ ⋅ ⋅ ⋅ ⊃ Gst = {e},

and

G = H00 ⊃ H01 ⊃ ⋅ ⋅ ⋅ ⊃ H0s = H1


= H10 ⊃ ⋅ ⋅ ⋅ ⊃ H1s = H2 ⊃ ⋅ ⋅ ⋅ ⊃ Hts = {e}.

Now, applying the third isomorphism theorem to the groups Gi , Hj , Gi+1 , Hj+1 , we have
that Gi(j+1) = (Gi ∩ Hj+1 )Gi+1 is a normal subgroup of Gij = (Gi ∩ Hj )Gi+1 , and also that
Hj(i+1) = (Gi+1 ∩ Hj )Hj+1 is a normal subgroup of Hji = (Gi ∩ Hj )Hj+1 . Furthermore,

Gij /Gi(j+1) ≅ Hji /Hj(i+1) .

Thus, the above two are normal series, which are refinements of the two given series,
and they are equivalent.

A proper normal subgroup N of a group G is called maximal in G, if there does not
exist any normal subgroup M of G with N ⊂ M ⊂ G and all inclusions proper. This is the group
theoretic analog of a maximal ideal. An alternative characterization is the following: N
is a maximal normal subgroup of G if and only if G/N is simple.

A normal series in which each factor is simple can have no proper refinements.

Definition 12.3.2. A composition series for a group G is a normal series, where all the
inclusions are proper and such that Gi+1 is maximal in Gi . Equivalently, it is a normal series
in which each factor is simple.

It is possible that an arbitrary group does not have a composition series, or even if
it does have one, a subgroup of it may not have one. Of course, a finite group does have
a composition series.
In the case in which a group G does have a composition series, the following impor-
tant theorem, called the Jordan–Hölder theorem, provides a type of unique factoriza-
tion.

Theorem 12.3.3 (Jordan–Hölder theorem). If a group G has a composition series, then any
two composition series are equivalent; that is, the composition factors are unique.

Proof. Suppose we are given two composition series. Applying Theorem 12.3.1, we get
that the two composition series have equivalent refinements. But the only refinement
of a composition series is one obtained by introducing repetitions. If in the 1–1 corre-
spondence between the factors of these refinements, the paired factors equal to {e} are
disregarded; that is, if we drop the repetitions, clearly, we get that the original composi-
tion series are equivalent.
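For a finite cyclic group the Jordan–Hölder theorem can be checked directly. The sketch below rests on the standard fact that the subgroups of ℤn correspond to the divisors of n, so that a composition series is a chain of divisors in which each step divides by a prime. The function names are hypothetical and only serve this illustration.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    factors, d = [], 2
    while n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors

def composition_factor_lists(n):
    """All composition series of the cyclic group Z_n, each recorded as the list
    of orders of its factors. A maximal subgroup of the subgroup of order d has
    order d/q for a prime q dividing d, and the factor is cyclic of order q."""
    if n == 1:
        return [[]]
    series = []
    for q in sorted(set(prime_factors(n))):
        for rest in composition_factor_lists(n // q):
            series.append([q] + rest)
    return series

for factors in composition_factor_lists(12):
    print(factors)
```

Every list printed for n = 12 is a rearrangement of 2, 2, 3, which is exactly the uniqueness of composition factors asserted by the theorem in this special case.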

We remarked in Chapter 10 that the simple groups are important, because they play
a role in finite group theory somewhat analogous to that of the primes in number theory.
In particular, an arbitrary finite group G can be broken down into simple components.
These uniquely determined simple components are, according to the Jordan–Hölder the-
orem, the factors of a composition series for G.

12.4 Exercises
1. Let K be a field and

\[ G = \left\{ \begin{pmatrix} a & x & y \\ 0 & b & z \\ 0 & 0 & c \end{pmatrix} : a, b, c, x, y, z \in K,\ abc \neq 0 \right\}. \]

Show that G is solvable.


2. A group G is called polycyclic if it has a normal series with cyclic factors. Show:
(i) Each subgroup and each factor group of a polycyclic group is polycyclic.
(ii) In a polycyclic group, each normal series has the same number of infinite cyclic
factors.
3. Let G be a group. Show the following:
(i) If G is finite and solvable, then G is polycyclic.

(ii) If G is polycyclic, then G is finitely generated.


(iii) The group (ℚ, +) is solvable, but not polycyclic.
4. Let N1 and N2 be normal subgroups of G. Show the following:
(i) If N1 and N2 are solvable, then also N1 N2 is a solvable normal subgroup of G.
(ii) Is (i) still true, if we replace “solvable” by “Abelian”?
5. Let N1 , . . . , Nt be normal subgroups of a group G. If all factor groups G/Ni are solv-
able, then also G/(N1 ∩ ⋅ ⋅ ⋅ ∩ Nt ) is solvable.
13 Group Actions and the Sylow Theorems
13.1 Group Actions
A group action of a group G on a set A is a homomorphism from G into SA , the symmetric
group on A. We say that G acts on A. Hence, G acts on A if to each g ∈ G corresponds a
permutation

πg : A → A

such that
(1) πg1 (πg2 (a)) = πg1 g2 (a) for all g1 , g2 ∈ G and for all a ∈ A,
(2) π1 (a) = a for all a ∈ A.

For the remainder of this chapter, if g ∈ G and a ∈ A, we will write ga for πg (a). Group
actions are an extremely important idea, and we use this idea in the present chapter to
prove several fundamental results in group theory. If G acts on the set A, then we say
that two elements a1 , a2 ∈ A are congruent under G if there exists a g ∈ G with ga1 = a2 .
The set

Ga = {a1 ∈ A : a1 = ga for some g ∈ G}

is called the orbit of a. It consists of elements congruent to a under G.

Lemma 13.1.1. If G acts on A, then congruence under G is an equivalence relation on A.

Proof. Any element a ∈ A is congruent to itself via the identity map; hence, the relation
is reflexive. If a1 ∼ a2 so that ga1 = a2 for some g ∈ G, then g −1 a2 = a1 , and so a2 ∼ a1 ,
and the relation is symmetric. Finally, if g1 a1 = a2 and g2 a2 = a3 , then g2 g1 a1 = a3 , and
the relation is transitive.

Recall that the equivalence classes under an equivalence relation partition a set.
For a given a ∈ A, its equivalence class under this relation is precisely its orbit Ga , as
defined above.

Corollary 13.1.2. If G acts on the set A, then the orbits under G partition the set A.

We say that G acts transitively on A if any two elements of A are congruent under G.
That is, the action is transitive if for any a1 , a2 ∈ A there is some g ∈ G such that ga1 = a2 .
If a ∈ A, the stabilizer of a consists of those g ∈ G that fix a. Hence,

StabG (a) = {g ∈ G : ga = a}.

The following lemma is easily proved and left to the exercises.

Lemma 13.1.3. If G acts on A, then for any a ∈ A, the stabilizer StabG (a) is a subgroup of G.

https://doi.org/10.1515/9783111142524-013

We now prove the crucial theorem concerning group actions.

Theorem 13.1.4. Suppose that G acts on A and a ∈ A. Let Ga be the orbit of a under G and
StabG (a) its stabilizer. Then

|G : StabG (a)| = |Ga |.

That is, the size of the orbit of a is the index of its stabilizer in G.

Proof. Suppose that g1 , g2 ∈ G with g1 StabG (a) = g2 StabG (a); that is, they define the
same left coset of the stabilizer. Then g2^{-1} g1 ∈ StabG (a). This implies that g2^{-1} g1 a = a so
that g2 a = g1 a. Hence, any two elements in the same left coset of the stabilizer produce
the same image of a in Ga . Conversely, if g1 a = g2 a, then g1 , g2 define the same left coset
of StabG (a). This shows that there is a one-to-one correspondence between left cosets of
StabG (a) and elements of Ga . It follows that the size of Ga is precisely the index of the
stabilizer.
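Theorem 13.1.4 can be checked numerically on a small example. In the following sketch, which is only an illustration with names of our own choosing, a permutation group on {0, 1, 2, 3} acts on the six 2-element subsets. We take both the full symmetric group S4 and the cyclic group C4 of rotations, and compare orbit sizes with stabilizer indices.

```python
from itertools import combinations, permutations

def act(g, subset):
    """Image of a subset under the permutation g (a tuple of images)."""
    return frozenset(g[i] for i in subset)

def orbit(group, a):
    return {act(g, a) for g in group}

def stabilizer(group, a):
    return [g for g in group if act(g, a) == a]

def orbits(group, points):
    """The distinct orbits; they partition the set of points."""
    return {frozenset(orbit(group, a)) for a in points}

points = [frozenset(c) for c in combinations(range(4), 2)]
S4 = list(permutations(range(4)))
C4 = [tuple((i + k) % 4 for i in range(4)) for k in range(4)]  # the four rotations

for name, group in (("S4", S4), ("C4", C4)):
    for a in points:
        index = len(group) // len(stabilizer(group, a))
        assert index == len(orbit(group, a))        # |G : Stab_G(a)| = |Ga|
    print(name, sorted(len(o) for o in orbits(group, points)))
```

The action of S4 is transitive, so there is a single orbit. Under C4 the six subsets split into one orbit of the four edges of the square and one orbit of the two diagonals.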

We will use this theorem repeatedly with different group actions to obtain impor-
tant group theoretic results.

13.2 Conjugacy Classes and the Class Equation


In Section 10.5, we introduced the center of a group

Z(G) = {g ∈ G : gg1 = g1 g for all g1 ∈ G},

and showed that it is a normal subgroup of G. We use this normal subgroup in conjunc-
tion with what we call the class equation to show that any finite p-group has a nontrivial
center. In this section, we use group actions to derive the class equation and prove the
result for finite p-groups.
Recall that if G is a group, then two elements g1 , g2 ∈ G are conjugate if there exists a
g ∈ G with g^{-1} g1 g = g2 . We saw that conjugacy is an equivalence relation on G. The
equivalence class of g ∈ G is called its conjugacy class, which we will denote by Cl(g).
Thus,

Cl(g) = {g1 ∈ G : g1 is conjugate to g}.

If g ∈ G, then its centralizer CG (g) is the set of elements in G that commute with g:

CG (g) = {g1 ∈ G : gg1 = g1 g}.

Theorem 13.2.1. Let G be a finite group and g ∈ G. Then the centralizer of g is a subgroup
of G, and

|G : CG (g)| = |Cl(g)|.

That is, the index of the centralizer of g is the size of its conjugacy class.
In particular, for a finite group the size of each conjugacy class divides the order of
the group.

Proof. Let the group G act on itself by conjugation. That is, g(g1 ) = g −1 g1 g. It is easy
to show that this is an action on the set G (see exercises). The orbit of g ∈ G under this
action is precisely its conjugacy class Cl(g), and the stabilizer is its centralizer CG (g). The
statements in the theorem then follow directly from Theorem 13.1.4.

For any group G, since conjugacy is an equivalence relation, the conjugacy classes
partition G. Hence,

G = ⋃̇ Cl(g),
g∈G

where this union is taken over the distinct conjugacy classes. It follows that

|G| = ∑ |Cl(g)|,
g∈G

where this sum is taken over distinct conjugacy classes.


If Cl(g) = {g}; that is, the conjugacy class of g is g alone, then CG (g) = G so that g
commutes with all of G. Therefore, in this case, g ∈ Z(G). This is true for every element
of the center; therefore,

G = Z(G) ∪ ⋃̇ Cl(g),
g∉Z(G)

where again the second union is taken over the distinct conjugacy classes Cl(g) with
g ∉ Z(G). The size of G is then the sum of these disjoint pieces, so

|G| = |Z(G)| + ∑ |Cl(g)|,
g∉Z(G)

where the sum is taken over the distinct conjugacy classes Cl(g) with g ∉ Z(G). However,
from Theorem 13.2.1, |Cl(g)| = |G : CG (g)|, so the equation above becomes

|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)

where the sum is taken over one representative g of each conjugacy class with g ∉ Z(G).
This is known as the class equation.

Theorem 13.2.2 (Class equation). Let G be a finite group. Then

|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)

where the sum is taken over one representative g of each conjugacy class with g ∉ Z(G).
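The class equation is easy to verify by machine for small groups. The sketch below is a brute-force illustration with helper names of our own choosing. It computes the conjugacy classes, the center, and the centralizers for the symmetric group S4 and for the dihedral group of order 8, realized as permutations of the four vertices of a square.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def conjugacy_classes(group):
    remaining, classes = set(group), []
    while remaining:
        g = next(iter(remaining))
        cls = {compose(compose(inverse(x), g), x) for x in group}  # all x^{-1} g x
        classes.append(cls)
        remaining -= cls
    return classes

def center(group):
    return {g for g in group if all(compose(g, x) == compose(x, g) for x in group)}

def centralizer(group, g):
    return {x for x in group if compose(g, x) == compose(x, g)}

def class_equation(group):
    """Return (|G|, |Z(G)|, sorted sizes of the non-central conjugacy classes)."""
    z, sizes = center(group), []
    for cls in conjugacy_classes(group):
        g = next(iter(cls))
        # Theorem 13.2.1: |Cl(g)| = |G : C_G(g)|
        assert len(cls) == len(group) // len(centralizer(group, g))
        if g not in z:
            sizes.append(len(cls))
    return len(group), len(z), sorted(sizes)

S4 = set(permutations(range(4)))
D4 = generate([(1, 2, 3, 0), (0, 3, 2, 1)], 4)   # a rotation and a reflection of the square
for group in (S4, D4):
    order, z, sizes = class_equation(group)
    print(order, "=", z, "+", " + ".join(map(str, sizes)))
```

For S4 this gives 24 = 1 + 3 + 6 + 6 + 8, and for the dihedral group of order 8 it gives 8 = 2 + 2 + 2 + 2, where the first summand is the order of the center in each case.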

As a first application, we prove the result that finite p-groups have nontrivial centers
(see Lemma 10.5.6).

Theorem 13.2.3. Let G be a finite p-group. Then G has a nontrivial center.

Proof. Let G be a finite p-group so that |G| = p^n for some n ≥ 1, and consider the class
equation

|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)

where the sum is taken over one representative g of each conjugacy class with g ∉ Z(G).
For each such g, the centralizer CG (g) is a proper subgroup of G, so the index |G : CG (g)| is
greater than 1 and divides |G| = p^n; hence, p divides |G : CG (g)|. Furthermore, p divides |G|.
Therefore, p must divide |Z(G)|; hence, |Z(G)| = p^m for some m ≥ 1. Therefore, Z(G) is nontrivial.

The idea of conjugacy and the centralizer of an element can be extended to sub-
groups. If H1 , H2 are subgroups of a group G, then H1 , H2 are conjugate if there exists a
g ∈ G such that g −1 H1 g = H2 . As for elements, conjugacy is an equivalence relation on
the set of subgroups of G.
If H ⊂ G is a subgroup, then its conjugacy class consists of all the subgroups of G
conjugate to it. The normalizer of H is

NG (H) = {g ∈ G : g −1 Hg = H}.

As for elements, let G act on the set of subgroups of G by conjugation. That is, for
g ∈ G, the map is given by H ↦ g^{-1}Hg. For H ⊂ G, the stabilizer under this action is
precisely the normalizer. Hence, exactly as for elements, we obtain the following theorem:

Theorem 13.2.4. Let G be a group and H ⊂ G a subgroup. Then the normalizer NG (H) of
H is a subgroup of G, H is normal in NG (H), and

|G : NG (H)| = number of conjugates of H in G.

13.3 The Sylow Theorems


If G is a finite group and H ⊂ G is a subgroup, then Lagrange’s theorem guarantees
that the order of H divides the order of G. However, the converse of Lagrange’s theorem
is false. That is, if G is a finite group of order n and if d|n, then G need not contain a

subgroup of order d. If d is a prime p or a prime power p^e, however, then we shall
see that G must contain subgroups of that order. In particular, we shall see that if p^d
is the highest power of p that divides n, then all subgroups of that order are actually
conjugate, and we shall finally get a formula concerning the number of such subgroups.
These theorems constitute the Sylow theorems, which we will examine in this section.
First, we give an example, where the converse of Lagrange’s theorem is false.

Lemma 13.3.1. The alternating group on 4 symbols A4 has order 12, but has no subgroup
of order 6.

Proof. Suppose that there exists a subgroup U ⊂ A4 with |U| = 6. Then |A4 : U| = 2 since
|A4 | = 12; hence, U is normal in A4 .
Now id, (1, 2)(3, 4), (1, 3)(2, 4), (1, 4)(2, 3) are in A4 . These each have order 2 and com-
mute, so they form a normal subgroup V ⊂ A4 of order 4. This subgroup V is isomorphic
to ℤ2 × ℤ2 . Then

12 = |A4 | ≥ |VU| = |V ||U| / |V ∩ U| = (4 ⋅ 6) / |V ∩ U|.

It follows that V ∩U ≠ {1}, and since U is normal, we have that V ∩U is also normal in A4 .
Now (1, 2)(3, 4) ∈ V , and by renaming the entries in V , if necessary, we may assume
that it is also in U, so that (1, 2)(3, 4) ∈ V ∩ U. Since (1, 2, 3) ∈ A4 , we have

(3, 2, 1)(1, 2)(3, 4)(1, 2, 3) = (1, 3)(2, 4) ∈ V ∩ U,

and then

(3, 2, 1)(1, 4)(2, 3)(1, 2, 3) = (1, 2)(3, 4) ∈ V ∩ U.

But then V ⊂ V ∩ U, and so V ⊂ U. But this is impossible since |V | = 4, which does not
divide |U| = 6.
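Since A4 has only twelve elements, Lemma 13.3.1 can also be confirmed by exhaustive search. The following sketch, with function names of our own choosing, uses the fact that a nonempty finite subset of a group that is closed under multiplication is a subgroup. It tests every subset of A4 that contains the identity and whose size divides 12.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

def is_even(p):
    """A permutation is even if it has an even number of inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

def subgroup_orders(group):
    """Set of all orders that occur as the order of some subgroup of `group`."""
    group = list(group)
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    orders = set()
    for size in range(1, len(group) + 1):
        if len(group) % size:          # Lagrange: only divisors of |G| can occur
            continue
        for rest in combinations(others, size - 1):
            subset = {identity, *rest}
            if all(compose(a, b) in subset for a in subset for b in subset):
                orders.add(size)       # one closed subset of this size is enough
                break
    return orders

A4 = [p for p in permutations(range(4)) if is_even(p)]
print(sorted(subgroup_orders(A4)))
```

The search finds subgroups of orders 1, 2, 3, 4, and 12, but none of order 6, although 6 divides 12.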

Definition 13.3.2. Let G be a finite group with |G| = n, and let p be a prime such that
p^a | n, but no higher power of p divides n. A subgroup of G of order p^a is called a p-Sylow
subgroup.

It is not clear that a p-Sylow subgroup must exist. We will prove that for each prime p
dividing n a p-Sylow subgroup exists.
We first consider and prove a very special case.

Theorem 13.3.3. Let G be a finite Abelian group, and let p be a prime such that p||G|. Then
G contains at least one element of order p.

Proof. Suppose that G is a finite Abelian group of order pn (that is, p times n). We use
induction on n. If n = 1, then G has order p, and hence is cyclic. Therefore, it has an element
of order p. Suppose that the theorem is true for all Abelian groups of order pm with m < n, and

suppose that G has order pn. Let g ∈ G with g ≠ 1. If the order of g is pt for some integer t,
then g^t ≠ 1, and g^t has order p, proving the theorem in this case. Hence, we may suppose
that g ∈ G has order prime to p, and we show that there must be an element, whose order
is a multiple of p, and then use the above argument to get an element of exact order p.
Hence, we have g ∈ G with order m > 1, where (m, p) = 1. Since m divides |G| = pn, we must
have m | n. Since G is Abelian, ⟨g⟩ is normal, and the factor group G/⟨g⟩ is Abelian of
order p(n/m) < pn. By the inductive hypothesis, G/⟨g⟩ has an element h⟨g⟩ of order p,
h ∈ G; hence, h^p = g^k for some k. The element g^k has order m1 with m1 | m; therefore, h has
order pm1 . Now, as above, h^{m1} has order p, proving the theorem.

Therefore, if G is a finite Abelian group of order n, and if p|n, then G contains a subgroup
of order p, the cyclic subgroup of order p generated by an element a ∈ G of order p, whose
existence is guaranteed by the above theorem. We now present the first Sylow theorem:

Theorem 13.3.4 (First Sylow theorem). Let G be a finite group, and let p be a prime with
p | |G|. Then G contains a p-Sylow subgroup; that is, a p-Sylow subgroup exists.

Proof. Let G be a finite group of order pn (that is, p times n); as above, we do induction on n. If
n = 1, then G is cyclic, and G is its own maximal p-subgroup; hence, all of G is a p-Sylow
subgroup. We assume then that if |G| = pm with m < n, then G has a p-Sylow subgroup.
Assume that |G| = p^t m with (m, p) = 1. We must show that G contains a subgroup
of order p^t. If H is a proper subgroup, whose index is prime to p, then |H| = p^t m1 with
m1 < m. Therefore, by the inductive hypothesis, H has a p-Sylow subgroup of order p^t.
This will also be a subgroup of G, hence a p-Sylow subgroup of G.
Therefore, we may assume that the index of any proper subgroup H of G must be
divisible by p. Now consider the class equation for G,

|G| = |Z(G)| + ∑ |G : CG (g)|,
g∉Z(G)

where the sum is taken over one representative g of each conjugacy class with g ∉ Z(G).
By assumption, each of the indices is divisible by p and also p | |G|. Therefore, p | |Z(G)|.
It follows that Z(G) is a finite Abelian
group, whose order is divisible by p. From Theorem 13.3.3, there exists an element g ∈
Z(G) ⊂ G of order p. Since g ∈ Z(G), we must have ⟨g⟩ normal in G. The factor group
G/⟨g⟩ then has order p^{t−1} m, and, by the inductive hypothesis, must have a p-Sylow
subgroup K̄ of order p^{t−1}, hence of index m. By the Correspondence Theorem 10.2.6,
there is a subgroup K of G with ⟨g⟩ ⊂ K such that K/⟨g⟩ ≅ K̄. Therefore, |K| = p^t, and K
is a p-Sylow subgroup of G.

On the basis of this theorem, we can now strengthen the result obtained in Theo-
rem 13.3.3.

Theorem 13.3.5 (Cauchy). If G is a finite group, and if p is a prime such that p||G|, then G
contains at least one element of order p.

Proof. Let P be a p-Sylow subgroup of G, and let |P| = p^t. If g ∈ P, g ≠ 1, then the order
of g is p^{t1} for some t1 ≥ 1. Then g^{p^{t1−1}} has order p.

We have seen that p-Sylow subgroups exist. We now wish to show that any two
p-Sylow subgroups are conjugate. This is the content of the second Sylow theorem:

Theorem 13.3.6 (Second Sylow theorem). Let G be a finite group and p a prime such that
p||G|. Then any p-subgroup H of G is contained in a p-Sylow subgroup. Furthermore, all
p-Sylow subgroups of G are conjugate. That is, if P1 and P2 are any two p-Sylow subgroups
of G, then there exists an a ∈ G such that P1 = aP2 a−1 .

Proof. Let Ω be the set of p-Sylow subgroups of G, and let G act on Ω by conjugation. This
action will, of course, partition Ω into disjoint orbits. Let P be a fixed p-Sylow subgroup
and ΩP be its orbit under the conjugation action. The size of the orbit is the index of its
stabilizer; that is, |ΩP | = |G : StabG (P)|. Now P ⊂ StabG (P), and P is a maximal p-subgroup
of G. It follows that the index of StabG (P) must be prime to p, and so the number of
p-Sylow subgroups conjugate to P is prime to p.
Now let H be a p-subgroup of G, and let H act on ΩP by conjugation. ΩP will itself
decompose into disjoint orbits under this action. Furthermore, the size of each orbit is
an index of a subgroup of H, hence must be a power of p. On the other hand, the size of
ΩP is prime to p. Therefore, there must be at least one orbit that has size exactly 1.
This orbit contains a p-Sylow subgroup P′ , and P′ is fixed by H under conjugation; that
is, H normalizes P′ . It follows that HP′ is a subgroup of G, and P′ is normal in HP′ . From
the second isomorphism theorem, we then obtain

HP′ /P′ ≅ H/(H ∩ P′ ).

Since H is a p-group, the size of H/(H ∩ P′ ) is a power of p; therefore, so is the size of
HP′ /P′ . But P′ is also a p-group, so it follows that HP′ also has order a power of p. Now
P′ ⊂ HP′ , but P′ is a maximal p-subgroup of G. Hence, HP′ = P′ . This is possible only
if H ⊂ P′ , proving the first assertion in the theorem. Therefore, any p-subgroup of G is
contained in a p-Sylow subgroup.
Now let H be a p-Sylow subgroup P1 , and let P1 act on ΩP . Exactly as in the argument
above, P1 ⊂ P′ , where P′ is a conjugate of P. Since P1 and P′ are both p-Sylow subgroups,
they have the same size; hence, P1 = P′ . This implies that P1 is a conjugate of P. Since P1
and P are arbitrary p-Sylow subgroups, it follows that all p-Sylow subgroups are conju-
gate.

We come now to the last of the three Sylow theorems. This one gives us information
concerning the number of p-Sylow subgroups.

Theorem 13.3.7 (Third Sylow theorem). Let G be a finite group and p a prime such that
p||G|. Then the number of p-Sylow subgroups of G is of the form 1 + pk and divides the
order of G. It follows that if |G| = p^a m with (p, m) = 1, then the number of p-Sylow
subgroups divides m.

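Proof. Let P be a p-Sylow subgroup, and let P act on Ω, the set of all p-Sylow subgroups,
by conjugation. Now P normalizes itself, so there is one orbit, namely, {P}, having exactly
size 1. If {P′ } were another orbit of size 1, then P would normalize P′ , and the argument in
the proof of the second Sylow theorem would give P ⊂ P′ , hence P = P′ . Every other orbit
therefore has size greater than 1; since this size is the index of a proper
subgroup of P, it must be divisible by p. Hence, the size of Ω is 1 + pk. Finally, all p-Sylow
subgroups are conjugate by the second Sylow theorem, so by Theorem 13.2.4, |Ω| = |G : NG (P)|,
which divides |G|.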
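The statements of the Sylow theorems can be observed in a small group. The sketch below is an illustration only: it finds the p-Sylow subgroups of S4 by generating subgroups from pairs of elements, which relies on the assumption, true for S4, that every Sylow subgroup can be generated by at most two elements. All helper names are our own.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def sylow_subgroups(group, n, p):
    """All subgroups of order p^a (the full p-part of |G|) that are generated
    by two elements. Assumption: every p-Sylow subgroup is 2-generated."""
    target = 1
    while len(group) % (target * p) == 0:
        target *= p
    found = set()
    for a in group:
        for b in group:
            h = frozenset(generate([a, b], n))
            if len(h) == target:
                found.add(h)
    return found

def conjugate(h, x):
    """The conjugate subgroup x h x^{-1}."""
    return frozenset(compose(compose(x, y), inverse(x)) for y in h)

S4 = list(permutations(range(4)))
for p in (2, 3):
    sylows = sylow_subgroups(S4, 4, p)
    first = next(iter(sylows))
    all_conjugate = {conjugate(first, x) for x in S4} == sylows
    print(p, len(sylows), all_conjugate)
```

In S4 there are three 2-Sylow subgroups of order 8 and four 3-Sylow subgroups of order 3. Both counts are congruent to 1 modulo p, divide the index of the Sylow subgroup, and in each case the conjugates of one Sylow subgroup give all of them.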

13.4 Some Applications of the Sylow Theorems


We now give some applications of the Sylow theorems. First, we show that the converse
of Lagrange’s theorem is true for both finite p-groups and finite Abelian groups.

Theorem 13.4.1. Let G be a group of order p^n, p a prime number. Then G contains at least
one normal subgroup of order p^m for each m such that 0 ≤ m ≤ n.

Proof. We use induction on n. For n = 1, the theorem is trivial. By Lemma 10.5.7, any
group of order p^2 is Abelian. This, together with Theorem 13.3.3, establishes the claim
for n = 2.
We now assume the theorem is true for all groups G of order p^k, where 1 ≤ k < n,
where n > 2. Let G be a group of order p^n. From Theorem 13.2.3, G has a nontrivial center
of order at least p, hence an element g ∈ Z(G) of order p. Let N = ⟨g⟩. Since g ∈ Z(G),
it follows that N is a normal subgroup of order p. Then G/N is of order p^{n−1}, therefore
contains (by the induction hypothesis) normal subgroups of orders p^{m−1}, for 0 ≤ m − 1 ≤
n − 1. These groups are of the form H/N, where the normal subgroup H ⊂ G contains N
and is of order p^m, 1 ≤ m ≤ n, because |H| = |N|[H : N] = |N| ⋅ |H/N|.

On the basis of the first Sylow theorem, we see that if G is a finite group, and if p^k | |G|,
then G must contain a subgroup of order p^k. One can actually show that, as in the case
of Sylow p-groups, the number of such subgroups is of the form 1 + pt, but we shall not
prove this here.

Theorem 13.4.2. Let G be a finite Abelian group of order n. Suppose that d|n. Then G
contains a subgroup of order d.
Proof. Suppose that n = p_1^{e_1} ⋅ ⋅ ⋅ p_k^{e_k} is the prime factorization of n. Then
d = p_1^{f_1} ⋅ ⋅ ⋅ p_k^{f_k} for some nonnegative f_1 , . . . , f_k with f_i ≤ e_i . Now G has a
p_1-Sylow subgroup H_1 of order p_1^{e_1}. Hence, from Theorem 13.4.1, H_1 has a subgroup K_1
of order p_1^{f_1}. Similarly, there are subgroups K_2 , . . . , K_k of G of respective orders
p_2^{f_2}, . . . , p_k^{f_k}. Moreover, since the orders are pairwise coprime, K_i ∩ K_j = {1} if i ≠ j,
and thus, since G is Abelian, ⟨K_1 , K_2 , . . . , K_k ⟩ has order
|K_1 ||K_2 | ⋅ ⋅ ⋅ |K_k | = p_1^{f_1} ⋅ ⋅ ⋅ p_k^{f_k} = d.
In Section 10.5, we examined the classification of finite groups of small orders. Here,
we use the Sylow theorems to extend some of this material further.

Theorem 13.4.3. Let p, q be distinct primes with p < q and q not congruent to 1 modulo p.
Then any group of order pq is cyclic. For example, any group of order 15 must be cyclic.

Proof. Suppose that |G| = pq with p < q and q not congruent to 1 modulo p. The number
of q-Sylow subgroups is of the form 1 + qk and divides p. Since q is greater than p, this
implies that there can be only one; hence, there is a normal q-Sylow subgroup H. Since
q is a prime, H is cyclic of order q; therefore, there is an element g of order q.
The number of p-Sylow subgroups is of the form 1 + pk and divides q. Since q is not
congruent to 1 modulo p, this implies that there also can be only one p-Sylow subgroup;
hence, there is a normal p-Sylow subgroup K. Since p is a prime K is cyclic of order p;
therefore, there is an element h of order p.
Since p, q are distinct primes H ∩ K = {1}. Consider the element g −1 h−1 gh. Since
K is normal, g −1 hg ∈ K. Then g −1 h−1 gh = (g −1 h−1 g)h ∈ K. But H is also normal, so
h−1 gh ∈ H. This then implies that g −1 h−1 gh = g −1 (h−1 gh) ∈ H; and therefore we have
g −1 h−1 gh ∈ K ∩ H. It follows then that g −1 h−1 gh = 1 or gh = hg. Since g, h commute, the
order of gh is the lcm of the orders of g and h, which is pq. Therefore, G has an element
of order pq. Since |G| = pq, this implies that G is cyclic.
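The counting step used in this proof is purely arithmetic: by the third Sylow theorem, the number of p-Sylow subgroups of a group of order p^a m, with (p, m) = 1, is a divisor of m that is congruent to 1 modulo p. A minimal sketch of this filter, with a function name of our own choosing, is the following.

```python
def sylow_count_candidates(order, p):
    """Values allowed by the third Sylow theorem for the number of p-Sylow
    subgroups of a group of the given order: divisors d of m with d ≡ 1 (mod p),
    where order = p^a * m and p does not divide m."""
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

for order, primes in ((15, (3, 5)), (21, (3, 7)), (60, (2, 3, 5))):
    for p in primes:
        print(order, p, sylow_count_candidates(order, p))
```

For order 15 the only candidate is 1 for both primes, which is the situation of Theorem 13.4.3. For order 21 the prime 3 admits the two candidates 1 and 7, since 7 is congruent to 1 modulo 3. The filter only lists the values that the theorem permits; it does not decide which of them actually occur.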

In the above theorem, we assumed that q is not congruent to 1 modulo p. Since every odd
prime is congruent to 1 modulo 2, this forces p ≠ 2. In the case where p = 2, we get another
possibility.

Theorem 13.4.4. Let p be an odd prime and G a finite group of order 2p. Then either G is
cyclic, or G is isomorphic to the dihedral group of order 2p; that is, the group of symmetries
of a regular p-gon. In this latter case, G is generated by two elements, g and h, which satisfy
the relations g^p = h^2 = (gh)^2 = 1.

Proof. As in the proof of Theorem 13.4.3, G must have a normal cyclic subgroup of or-
der p, say ⟨g⟩. Since 2||G|, the group G must have an element of order 2, say h. Consider
the order of gh. By Lagrange’s theorem, this element can have order 1, 2, p, 2p. If the
order is 1, then gh = 1 or g = h−1 = h. This is impossible since g has order p, and h
has order 2. If the order of gh is p, then from the second Sylow theorem, gh ∈ ⟨g⟩. But
this implies that h ∈ ⟨g⟩, which is impossible since every nontrivial element of ⟨g⟩ has
order p. Therefore, the order of gh is either 2 or 2p.
If the order of gh is 2p, then since G has order 2p, it must be cyclic.
If the order of gh is 2, then within G, we have the relations g^p = h^2 = (gh)^2 = 1. Let
H = ⟨g, h⟩ be the subgroup of G generated by g and h. The relations g^p = h^2 = (gh)^2 = 1
imply that H has order 2p. Since |G| = 2p, we get that H = G. G is isomorphic to the
dihedral group Dp of order 2p (see exercises).
In the above description, g represents a rotation by 2π/p of a regular p-gon about its
center, whereas h represents any reflection across a line of symmetry of the regular
p-gon.

Example 13.4.5 (The groups of order 21). Let G be a group of order 21. The number of
7-Sylow subgroups of G is 1, because it is of the form 1 + 7k and divides 3. Hence, the
7-Sylow subgroup K is normal and cyclic; that is, K ⊲ G and K = ⟨a⟩ with a of order 7.
The number of 3-Sylow subgroups is analogously 1 or 7. If it is 1, then we have exactly
two elements of order 3 in G, and if it is 7, there are 14 elements of order 3 in G.

Let b be an element of order 3. Then $bab^{-1} = a^r$ for some r with 1 ≤ r ≤ 6. Now,
$a = b^3 a b^{-3} = a^{r^3}$; hence, $r^3 \equiv 1 \pmod{7}$, which implies r = 1, 2, or 4. The map b ↦ b, a ↦ $a^2$
defines an automorphism of G, because $a^{2^3} = a^8 = a$. Moreover, if $bab^{-1} = a^2$, then
$b^2 a b^{-2} = a^4$, so replacing the generator b by $b^2$ turns the case r = 2 into the case r = 4. Hence, up to isomorphism, there are
exactly two groups of order 21. If r = 1, then G is Abelian.
In fact, G = ⟨ab⟩ is cyclic of order 21. The group for r = 2 can be realized as a
subgroup of $S_7$. Let a = (1, 2, 3, 4, 5, 6, 7) and b = (2, 3, 5)(4, 7, 6). Then $bab^{-1} = a^2$ and
⟨a, b⟩ has order 21.
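The permutation realization in Example 13.4.5 can be checked by direct computation. The following is a minimal sketch, assuming that permutations are composed right to left (so that $bab^{-1}$ means "apply $b^{-1}$ first"); the helper names `compose`, `inverse`, `from_cycles`, and `generated_group` are illustrative and not taken from the text.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p); tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} (stored 0-based) from disjoint cycles."""
    perm = list(range(n))
    for cycle in cycles:
        for k, point in enumerate(cycle):
            perm[point - 1] = cycle[(k + 1) % len(cycle)] - 1
    return tuple(perm)


def generated_group(generators):
    """Close the generators under multiplication; in a finite group this is the subgroup."""
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


a = from_cycles([(1, 2, 3, 4, 5, 6, 7)], 7)
b = from_cycles([(2, 3, 5), (4, 7, 6)], 7)
conjugate = compose(compose(b, a), inverse(b))   # b a b^{-1}
a_squared = compose(a, a)
group_order = len(generated_group([a, b]))
```

With the opposite (left-to-right) composition convention, the same permutations would satisfy $b^{-1}ab = a^2$ instead, so the convention matters when reading the example.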

We have looked at the finite fields $\mathbb{Z}_p$. We give an example of a p-Sylow subgroup of
a matrix group over $\mathbb{Z}_p$.

Example 13.4.6. Consider GL(n, p), the group of n × n invertible matrices over $\mathbb{Z}_p$. If
$\{v_1, \ldots, v_n\}$ is a basis for $(\mathbb{Z}_p)^n$ over $\mathbb{Z}_p$, then the size of GL(n, p) is the number of
independent images $\{w_1, \ldots, w_n\}$ of $\{v_1, \ldots, v_n\}$. For $w_1$ there are $p^n - 1$ choices; for $w_2$ there
are $p^n - p$ choices and so on. It follows that
\[
|GL(n, p)| = (p^n - 1)(p^n - p) \cdots (p^n - p^{n-1}) = p^{1+2+\cdots+(n-1)} m = p^{\frac{n(n-1)}{2}} m
\]
with (p, m) = 1. Therefore, a p-Sylow subgroup must have size $p^{\frac{n(n-1)}{2}}$.
Let P be the subgroup of upper triangular matrices with 1's on the diagonal. Then P
has size $p^{1+2+\cdots+(n-1)} = p^{\frac{n(n-1)}{2}}$, and is therefore a p-Sylow subgroup of GL(n, p).
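The counting formula for |GL(n, p)| can be compared against a brute-force count for small n and p. The sketch below is only an illustration under that restriction (the enumeration grows like $p^{n^2}$); the function names are hypothetical, and `pow(x, -1, p)` assumes Python 3.8 or later.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank of a matrix over Z_p (p prime), computed by Gaussian elimination."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)            # modular inverse of the pivot
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % p:
                factor = m[r][col]
                m[r] = [(x - factor * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def gl_order_brute_force(n, p):
    """Count invertible n x n matrices over Z_p by enumerating all matrices."""
    count = 0
    for entries in product(range(p), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if rank_mod_p(rows, p) == n:
            count += 1
    return count


def gl_order_formula(n, p):
    """(p^n - 1)(p^n - p) ... (p^n - p^(n-1))."""
    order = 1
    for k in range(n):
        order *= p**n - p**k
    return order


def p_part(value, p):
    """Largest power of p dividing value, i.e. the order of a p-Sylow subgroup."""
    power = 1
    while value % p == 0:
        value //= p
        power *= p
    return power
```

For instance, `p_part(gl_order_formula(3, 2), 2)` should agree with $2^{3 \cdot 2 / 2} = 8$, the number of upper unitriangular 3 × 3 matrices over $\mathbb{Z}_2$.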

The final example is a bit more difficult. We mentioned that a major result on finite
groups is the classification of the finite simple groups. This classification showed that
any finite simple group is either cyclic of prime order, in one of several classes of groups
such as the $A_n$, n > 4, or one of a number of special examples called sporadic groups.
One of the major tools in this classification is the following famous result, called the
Feit–Thompson theorem, which showed that any finite group G of odd order is solvable
and, in addition, if G is not cyclic, then G is nonsimple.

Theorem 13.4.7 (Feit–Thompson theorem). Any finite group of odd order is solvable.

The proof of this theorem, one of the major results in algebra in the twentieth cen-
tury, is way beyond the scope of this book. The proof is actually hundreds of pages in
length, when one counts the results used. However, we look at the smallest non-Abelian
simple group.

Theorem 13.4.8. Suppose that G is a simple group of order 60. Then G is isomorphic to A5 .
Moreover, A5 is the smallest non-Abelian finite simple group.

Proof. Suppose that G is a simple group of order $60 = 2^2 \cdot 3 \cdot 5$. The number of 5-Sylow
subgroups is of the form 1 + 5k and divides 12. Hence, there are 1 or 6. Since G is assumed
simple, and all 5-Sylow subgroups are conjugate, there cannot be only one. Hence, there
are 6. Since each of these is cyclic of order 5, they intersect only in the identity. Hence,
these 6 subgroups cover 24 distinct elements.
The number of 3-Sylow subgroups is of the form 1 + 3k and divides 20. Hence, there
are 1, 4, 10. We claim that there are 10. There cannot be only 1, since G is simple. Suppose
there were 4. Let G act on the set of 3-Sylow subgroups by conjugation. Since an action
is a permutation, this gives a homomorphism f from G into S4 . By the first isomorphism
theorem, G/ ker(f ) ≅ im(f ).
However, since G is simple, the kernel must be trivial, and this implies that G would
imbed into S4 . This is impossible, since |G| = 60 > 24 = |S4 |. Therefore, there are 10
3-Sylow subgroups. Since each of these is cyclic of order 3, they intersect only in the
identity. Therefore, these 10 subgroups cover 20 distinct elements. Hence, together with
the elements in the 5-Sylow subgroups, we have 44 nontrivial elements.
The number of 2-Sylow subgroups is of the form 1 + 2k and divides 15. Hence, there
are 1, 3, 5, 15. We claim that there are 5. As before, there cannot be only 1, since G is sim-
ple. There cannot be 3, since as for the case of 3-Sylow subgroups, this would imply an
imbedding of G into S3 , which is impossible, given |S3 | = 6. Suppose that there were 15
2-Sylow subgroups, each of order 4. The intersections would have a maximum of 2 ele-
ments. Therefore, each of these would contribute at least 2 distinct elements. This gives
a minimum of 30 distinct elements. However, we already have 44 nontrivial elements
from the 3-Sylow and 5-Sylow subgroups. Since |G| = 60, this is too many. Therefore, G
must have 5 2-Sylow subgroups.
Now let G act on the set of 2-Sylow subgroups. This then, as above, implies an imbed-
ding of G into S5 , so we may consider G as a subgroup of S5 . However, the only subgroup
of S5 of order 60 is A5 ; therefore, G ≅ A5 .
The proof that A5 is the smallest non-Abelian simple group is actually brute force.
We show that any group G of order less than 60 either has prime order, or is nonsimple.
There are strong tools that we can use. By the Feit–Thompson theorem, we must only
consider groups of even order. From Theorem 13.4.4, we do not have to consider or-
ders 2p. The rest can be done by an analysis using Sylow theory. For example, we show
that any group of order 20 is nonsimple. Since $20 = 2^2 \cdot 5$, the number of 5-Sylow subgroups
is of the form 1 + 5k and divides 4. Hence, there is only one; therefore, it must be normal, and
so G is nonsimple. There is a strong theorem by Burnside, whose proof is usually done
with representation theory (see Chapter 22), which says that any group whose order is
divisible by only two primes is solvable. Therefore, for |G| < 60, we only have to show
that groups of order 30 = 2 ⋅ 3 ⋅ 5 and 42 = 2 ⋅ 3 ⋅ 7 are nonsimple. This is done in the same
manner as the first part of this proof. Suppose |G| = 30. The number of 5-Sylow subgroups
is of the form 1 + 5k and divides 6. Hence, there are 1 or 6. If G were simple there
would have to be 6 covering 24 distinct elements. The number of 3-Sylow subgroups is
of the form 1 + 3k and divides 10; hence, there are 1 or 10. If there were 10 these would
cover an additional 20 distinct elements, which is impossible, since we already have 24
and G has order 30. Therefore, there is only one, hence a normal 3-Sylow subgroup. It fol-
lows that G cannot be simple. The case |G| = 42 is even simpler. There must be a normal
7-Sylow subgroup.
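The Sylow counting used throughout this proof is mechanical: for $|G| = p^a m$ with p not dividing m, the number of p-Sylow subgroups divides m and is congruent to 1 modulo p. The following sketch lists the candidates; the function name `sylow_counts` is an assumption made for illustration.

```python
def sylow_counts(order, p):
    """Candidates for the number of p-Sylow subgroups of a group of the given order.

    Writes order = p^a * m with p not dividing m and returns all divisors d of m
    with d congruent to 1 modulo p, as permitted by the Sylow theorems.
    """
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]


# The cases treated in the proof above.
counts_60 = {p: sylow_counts(60, p) for p in (2, 3, 5)}
counts_30 = {p: sylow_counts(30, p) for p in (3, 5)}
```

Whenever the returned list is `[1]`, the p-Sylow subgroup is normal and the group cannot be simple unless it has prime order. The list only gives necessary conditions; ruling out the remaining candidates still needs the element-counting and embedding arguments of the proof.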
13.5 Exercises
1. Prove Lemma 13.1.3.
2. Let the group G act on itself by conjugation; that is, $g(g_1) = g^{-1} g_1 g$. Prove that this
is an action on the set G.
3. Show that the dihedral group $D_n$ of order 2n has the presentation

$\langle r, f;\ r^n = f^2 = (rf)^2 = 1 \rangle$

(see Chapter 14 for group presentations).


4. Show that each group of order ≤ 59 is solvable.
5. Show that there is no simple group of order 84.
6. Let P1 and P2 be two different p-Sylow subgroups of a finite group G. Show that P1 P2
is not a subgroup of G.
7. Let P and Q be two p-Sylow subgroups of the finite group G. If Z(P) is a normal
subgroup of Q, then Z(P) = Z(Q).
8. Let G be a finite group. For a prime p the following are equivalent:
(i) G has exactly one p-Sylow subgroup.
(ii) The product of any two elements of order p has some order $p^k$.
9. Let p be a prime and G = SL(2, p). Let P = ⟨a⟩, where $a = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.
(i) Determine the normalizer NG (P) and the number of p-Sylow subgroups of G.
(ii) Determine the centralizer CG (a). How many elements of order p does G have?
In how many conjugacy classes can they be decomposed?
(iii) Show that all subgroups of G of order p(p − 1) are conjugate.
(iv) Show that G has no elements of order p(p − 1) for p ≥ 5.
10. Let G be a finite group and N a normal subgroup such that |N| is a power of p. Show
that N is contained in every p-Sylow subgroup of G.
11. Let p be a prime number, and let P and Q be two p-Sylow subgroups of the finite
group G such that P is contained in $N_G(Q)$. Show that P = Q.
14 Free Groups and Group Presentations
14.1 Group Presentations and Combinatorial Group Theory
In discussing the symmetric group on 3 symbols and then the various dihedral groups
in Chapters 9, 10, and 11, we came across the concept of a group presentation. Roughly,
for a group G, a presentation consists of a set of generators X for G, so that G = ⟨X⟩,
and a set of relations between the elements of X, from which—in principle—the whole
group table can be constructed. In this chapter, we make this concept precise. As we will
see, every group G has a presentation, but it is mainly in the case where the group is
finite or countably infinite that presentations are most useful. Historically, the idea of
group presentations arose out of the attempt to describe the countably infinite funda-
mental groups that came out of low dimensional topology. The study of groups using
group presentations is called combinatorial group theory.
Before looking at group presentations in general, we revisit two examples of finite
groups and then a class of infinite groups.
Consider the symmetric group on 3 symbols, S3 . We saw that it has the following 6
elements:

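\[
1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
a = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix},
\]
\[
c = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad
d = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad
e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}.
\]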

Notice that $a^3 = 1$, $c^2 = 1$, and that $ac = ca^2$. We claim that

\[ \langle a, c;\ a^3 = c^2 = (ac)^2 = 1 \rangle \]

is a presentation for $S_3$. First, it is easy to show that $S_3 = \langle a, c \rangle$. Indeed,

\[ 1 = 1, \quad a = a, \quad b = a^2, \quad c = c, \quad d = ac, \quad e = a^2 c, \]

and so a, c generate $S_3$.
Now from $(ac)^2 = acac = 1$, we get that $ca = a^2 c$. This implies that if we write any
sequence (or word in our later language) in a and c, we can also rearrange it so that
the only nontrivial powers of a are a and $a^2$; the only nontrivial power of c is c, and all a terms
precede c terms. For example,

\[ aca^2cac = aca(acac) = a(ca) = a(a^2 c) = (a^3)c = c. \]

Therefore, using the three relations from the presentation above, each element of $S_3$ can
be written as $a^\alpha c^\beta$ with α = 0, 1, 2 and β = 0, 1. From this the multiplication of any two
elements can be determined.
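The rearrangement described above can be sketched as a small rewriting routine. It reads a word in the letters a and c from left to right and keeps track of the exponents (α, β) of the normal form $a^\alpha c^\beta$, using only $a^3 = 1$, $c^2 = 1$, and $ca = a^2 c$. The function name `normal_form` and the encoding of words as strings are illustrative assumptions.

```python
def normal_form(word):
    """Rewrite a word in 'a' and 'c' to (alpha, beta) with word = a^alpha c^beta in S3.

    Only the relations a^3 = 1, c^2 = 1 and ca = a^2 c are used.
    """
    alpha, beta = 0, 0
    for letter in word:
        if letter == "a":
            # a^alpha c^beta * a: if beta = 1, use ca = a^2 c to move a to the left.
            alpha = (alpha + (2 if beta else 1)) % 3
        elif letter == "c":
            beta = (beta + 1) % 2
        else:
            raise ValueError(f"unexpected letter {letter!r}")
    return alpha, beta


# The worked example a c a^2 c a c from the text, written letter by letter.
example = normal_form("acaacac")
```

Here `example` is the pair (0, 1), which corresponds to the element c, in agreement with the computation in the text.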

https://doi.org/10.1515/9783111142524-014
This type of argument exactly applies to all the dihedral groups $D_n$. We saw that, in
general, $|D_n| = 2n$. Since these are the symmetry groups of a regular n-gon, we always
have a rotation r of angle $2\pi/n$ about the center of the n-gon. This element r would have
order n. Let f be a reflection about any line of symmetry. Then $f^2 = 1$, and rf is a reflection
about the rotated line, which is also a line of symmetry. Therefore, $(rf)^2 = 1$. Exactly
as for $S_3$, the relation $(rf)^2 = 1$ implies that $fr = r^{-1}f = r^{n-1}f$. This allows us to always
place r terms in front of f terms in any word on r and f. Therefore, the elements of $D_n$
are always of the form

\[ r^\alpha f^\beta, \quad \alpha = 0, 1, 2, \ldots, n - 1, \quad \beta = 0, 1. \]

Moreover, the relations $r^n = f^2 = (rf)^2 = 1$ allow us to rearrange any word in r and f
into this form. It follows that |⟨r, f⟩| = 2n; hence, $D_n$ = ⟨r, f⟩ together with the relations
above. Hence, we obtain the following:

Theorem 14.1.1. If $D_n$ is the symmetry group of a regular n-gon, then a presentation for
$D_n$ is given by

\[ D_n = \langle r, f;\ r^n = f^2 = (rf)^2 = 1 \rangle. \]

(See Section 14.3 for the concept of group presentations.)
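The relations of Theorem 14.1.1 can be checked in a concrete model. The sketch below assumes that the vertices of the n-gon are labeled 0, ..., n − 1, that r is the rotation i ↦ i + 1 (mod n), and that f is the reflection i ↦ −i (mod n) fixing vertex 0; these labelings and the helper names are illustrative choices, and n ≥ 3 is assumed.

```python
def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def power(p, k):
    """Return p^k for a nonnegative integer k."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


def dihedral_generators(n):
    """Rotation r and reflection f of a regular n-gon as vertex permutations."""
    r = tuple((i + 1) % n for i in range(n))   # rotation by 2*pi/n
    f = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    return r, f


def normal_form_elements(n):
    """All products r^alpha f^beta with 0 <= alpha < n and beta in {0, 1}."""
    r, f = dihedral_generators(n)
    return {compose(power(r, alpha), power(f, beta))
            for alpha in range(n) for beta in range(2)}
```

For each n ≥ 3 the set returned by `normal_form_elements(n)` has 2n distinct elements and is closed under multiplication by r and f, which is the counting statement |⟨r, f⟩| = 2n in this model.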

We now give one class of infinite examples. If G is an infinite cyclic group, so that
G ≅ ℤ, then G = ⟨g; ⟩ is a presentation for G. That is, G has a single generator with no
relations.
A direct product of n copies of ℤ is called a free Abelian group of rank n. We will
denote this by $\mathbb{Z}^n$. A presentation for $\mathbb{Z}^n$ is then given by

\[ \mathbb{Z}^n = \langle x_1, x_2, \ldots, x_n;\ x_i x_j = x_j x_i \text{ for all } i, j = 1, \ldots, n \rangle. \]

14.2 Free Groups


Crucial to the concept of a group presentation is the idea of a free group.

Definition 14.2.1 (Universal mapping property). A group F is free on a subset X if every
map f : X → G with G a group can be extended to a unique homomorphism f : F → G.
X is called a free basis for F. In general, a group F is a free group if it is free on some
subset X. If X is a free basis for a free group F, we write F = F(X).

We first show that given any set X, there does exist a free group with free basis X.
Let $X = \{x_i\}_{i \in I}$ be a set (possibly empty). We will construct a group F(X), which is free
with free basis X. First, let $X^{-1}$ be a set disjoint from X, but bijective to X. If $x_i \in X$, then
we denote as $x_i^{-1}$ the corresponding element of $X^{-1}$ under the bijection, and say that $x_i$
and $x_i^{-1}$ are associated. The set $X^{-1}$ is called the set of formal inverses from X, and we
call $X \cup X^{-1}$ the alphabet. Elements of the alphabet are called letters. Hence, a letter has
the form $x_i^{\epsilon_i}$, where $\epsilon_i = \pm 1$. A word in X is a finite sequence of letters from the alphabet.
That is, a word has the form
\[ w = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}, \]
where $x_{i_j} \in X$, and $\epsilon_{i_j} = \pm 1$. If n = 0, we call it the empty word, which we will denote as e.
The integer n is called the length of the word. Words of the form $x_i x_i^{-1}$ or $x_i^{-1} x_i$ are called
trivial words. We let W(X) be the set of all words on X.
If $w_1, w_2 \in W(X)$, we say that $w_1$ is equivalent to $w_2$, denoted as $w_1 \sim w_2$, if $w_1$ can
be converted to $w_2$ by a finite string of insertions and deletions of trivial words. For
example, if $w_1 = x_3 x_4 x_4^{-1} x_2 x_2$ and $w_2 = x_3 x_2 x_2$, then $w_1 \sim w_2$. It is straightforward to
verify that this is an equivalence relation on W(X) (see exercises). Let F(X) denote the
set of equivalence classes in W(X) under this relation; hence, F(X) is a set of equivalence
classes of words from X.
A word w ∈ W(X) is said to be freely reduced or reduced if it has no trivial subwords
(a subword is a connected sequence within a word). Hence, in the example above,
$w_2 = x_3 x_2 x_2$ is reduced, but $w_1 = x_3 x_4 x_4^{-1} x_2 x_2$ is not reduced. There is a unique element of
minimal length in each equivalence class in F(X). Furthermore, this element must be
reduced or else it would be equivalent to something of smaller length. Two reduced
words in W(X) are either equal or not in the same equivalence class in F(X). Hence,
F(X) can also be considered as the set of all reduced words from W(X).
Given a word $w = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}$, we can find the unique reduced word $\bar{w}$ equivalent
to w via the following free reduction process. Beginning from the left side of w, we cancel
each occurrence of a trivial subword. After all these possible cancellations, we have a
word w′. Now we repeat the process again, starting from the left side. Since w has finite
length, eventually the resulting word will either be empty or reduced. The final reduced
$\bar{w}$ is the free reduction of w.
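The free reduction process can be sketched in a few lines. The version below does not literally repeat passes from the left; it uses a stack and a single pass, which yields the same reduced word because the reduced form of a word is unique. The encoding of a letter $x^{\epsilon}$ as a pair `(name, eps)` and the name `free_reduce` are assumptions made for illustration.

```python
def free_reduce(word):
    """Return the freely reduced form of a word.

    A word is a sequence of letters (name, eps) with eps = +1 or -1,
    where (name, -1) stands for the formal inverse of (name, +1).
    """
    reduced = []
    for name, eps in word:
        if reduced and reduced[-1] == (name, -eps):
            reduced.pop()                  # cancel a trivial subword x x^-1 or x^-1 x
        else:
            reduced.append((name, eps))
    return reduced


# The example from the text: w1 = x3 x4 x4^-1 x2 x2 reduces to x3 x2 x2.
w1 = [("x3", 1), ("x4", 1), ("x4", -1), ("x2", 1), ("x2", 1)]
w2 = free_reduce(w1)
```

Cancelling one trivial subword can expose another one (as in $x_1 x_2 x_2^{-1} x_1^{-1}$); the stack handles this automatically because the next letter is always compared with the last surviving letter.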
Now we build a multiplication on F(X). If
\[ w_1 = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}, \quad w_2 = x_{j_1}^{\epsilon_{j_1}} x_{j_2}^{\epsilon_{j_2}} \cdots x_{j_m}^{\epsilon_{j_m}} \]
are two words in W(X), then their concatenation $w_1 \star w_2$ is simply placing $w_2$ after $w_1$,
\[ w_1 \star w_2 = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}} x_{j_1}^{\epsilon_{j_1}} x_{j_2}^{\epsilon_{j_2}} \cdots x_{j_m}^{\epsilon_{j_m}}. \]
If $w_1, w_2 \in F(X)$, then we define their product as

\[ w_1 w_2 = \text{equivalence class of } w_1 \star w_2. \]

That is, we concatenate $w_1$ and $w_2$, and the product is the equivalence class of the resulting
word. It is easy to show that if $w_1 \sim w_1'$ and $w_2 \sim w_2'$, then $w_1 \star w_2 \sim w_1' \star w_2'$ so that
the above multiplication is well defined. Equivalently, we can think of this product in
the following way. If $w_1, w_2$ are reduced words, then to find $w_1 w_2$, first concatenate, and
then freely reduce. Notice that if $x_{i_n}^{\epsilon_{i_n}} x_{j_1}^{\epsilon_{j_1}}$ is a trivial word, then it is cancelled when the
concatenation is formed. We say then that there is cancellation in forming the product
$w_1 w_2$. Otherwise, the product is formed without cancellation.
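The product "concatenate, then freely reduce" can be written down directly. The sketch below assumes the same hypothetical encoding of letters as pairs `(name, eps)`; `multiply` and `inverse` are illustrative names. Associativity, which is proved by induction in the text, can at least be spot-checked on short words.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of letters (name, eps), eps = +1 or -1."""
    reduced = []
    for name, eps in word:
        if reduced and reduced[-1] == (name, -eps):
            reduced.pop()
        else:
            reduced.append((name, eps))
    return reduced


def multiply(w1, w2):
    """Product in F(X): concatenate the two words, then freely reduce."""
    return free_reduce(list(w1) + list(w2))


def inverse(word):
    """Inverse word: reverse the order of the letters and negate each exponent."""
    return [(name, -eps) for name, eps in reversed(word)]


u = [("x", 1), ("y", 1)]
v = [("y", -1), ("x", 1)]
uv = multiply(u, v)     # the middle pair y y^-1 cancels, leaving x x
```

In the product `uv` there is cancellation in the sense of the text, since the last letter of `u` and the first letter of `v` form a trivial word.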

Theorem 14.2.2. Let X be a set, and let F(X) be as above. Then F(X) is a free
group with free basis X. Furthermore, if X = ∅, then F(X) = {1}; if |X| = 1, then F(X) ≅ ℤ,
and if |X| ≥ 2, then F(X) is non-Abelian.

Proof. We first show that F(X) is a group, and then show that it satisfies the universal
mapping property on X. We consider F(X) as the set of reduced words in W (X) with
the multiplication defined above. Clearly, the empty word acts as the identity element 1.
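If $w = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}$ and $w_1 = x_{i_n}^{-\epsilon_{i_n}} x_{i_{n-1}}^{-\epsilon_{i_{n-1}}} \cdots x_{i_1}^{-\epsilon_{i_1}}$, then both $w \star w_1$ and $w_1 \star w$ freely
reduce to the empty word, and so $w_1$ is the inverse of w. Therefore, each element of
F(X) has an inverse. Hence, to show that F(X) forms a group, it remains to show that the
multiplication is associative. Let
\[ w_1 = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}, \quad w_2 = x_{j_1}^{\epsilon_{j_1}} x_{j_2}^{\epsilon_{j_2}} \cdots x_{j_m}^{\epsilon_{j_m}}, \quad w_3 = x_{k_1}^{\epsilon_{k_1}} x_{k_2}^{\epsilon_{k_2}} \cdots x_{k_p}^{\epsilon_{k_p}} \]
be three freely reduced words in F(X). We must show that

\[ (w_1 w_2) w_3 = w_1 (w_2 w_3). \]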

To prove this, we use induction on m, the length of $w_2$. If m = 0, then $w_2$ is the
empty word, hence the identity, and it is certainly true. Now suppose that m = 1 so that
$w_2 = x_{j_1}^{\epsilon_{j_1}}$. We must consider exactly four cases.
Case (1): There is no cancellation in forming either $w_1 w_2$ or $w_2 w_3$. Put differently,
$x_{j_1}^{\epsilon_{j_1}} \neq x_{i_n}^{-\epsilon_{i_n}}$, and $x_{j_1}^{\epsilon_{j_1}} \neq x_{k_1}^{-\epsilon_{k_1}}$. Then the product $w_1 w_2$ is just the concatenation of the words,
and so is $(w_1 w_2) w_3$. The same is true for $w_1 (w_2 w_3)$. Therefore, $w_1 (w_2 w_3) = (w_1 w_2) w_3$.
Case (2): There is cancellation in forming w1 w2 , but not in forming w2 w3 . Then if we
concatenate all three words, the only cancellation occurs between w1 and w2 in either
w1 (w2 w3 ) or in (w1 w2 )w3 ; hence, they are equal. Therefore, w1 (w2 w3 ) = (w1 w2 )w3 .
Case (3): There is cancellation in forming w2 w3 , but not in forming w1 w2 . This is
entirely analogous to Case (2). Therefore, w1 (w2 w3 ) = (w1 w2 )w3 .
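Case (4): There is cancellation in forming $w_1 w_2$ and also in forming $w_2 w_3$. Then
$x_{j_1}^{\epsilon_{j_1}} = x_{i_n}^{-\epsilon_{i_n}}$ and $x_{j_1}^{\epsilon_{j_1}} = x_{k_1}^{-\epsilon_{k_1}}$. Here,

\[ (w_1 w_2) w_3 = x_{i_1}^{\epsilon_{i_1}} \cdots x_{i_{n-1}}^{\epsilon_{i_{n-1}}} x_{k_1}^{\epsilon_{k_1}} x_{k_2}^{\epsilon_{k_2}} \cdots x_{k_p}^{\epsilon_{k_p}}. \]

On the other hand,

\[ w_1 (w_2 w_3) = x_{i_1}^{\epsilon_{i_1}} \cdots x_{i_n}^{\epsilon_{i_n}} x_{k_2}^{\epsilon_{k_2}} \cdots x_{k_p}^{\epsilon_{k_p}}. \]

However, these are equal since $x_{i_n}^{\epsilon_{i_n}} = x_{k_1}^{\epsilon_{k_1}}$. Therefore, $w_1 (w_2 w_3) = (w_1 w_2) w_3$.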
It follows, inductively, from these four cases, that the associative law holds in F(X);
therefore, F(X) forms a group.
Now suppose that f : X → G is a map from X into a group G. By the construction of
F(X) as a set of reduced words, this can be extended to a unique homomorphism. If w ∈ F(X)
with $w = x_{i_1}^{\epsilon_{i_1}} \cdots x_{i_n}^{\epsilon_{i_n}}$, then define $f(w) = f(x_{i_1})^{\epsilon_{i_1}} \cdots f(x_{i_n})^{\epsilon_{i_n}}$. Since multiplication in F(X)
is concatenation followed by free reduction, and f sends every trivial word to the identity of G,
this defines a homomorphism and, again from the construction of F(X),
it is the only one extending f. This is analogous to constructing a linear transformation
from one vector space to another by specifying the images of a basis. Therefore, F(X)
satisfies the universal mapping property of Definition 14.2.1. Hence, F(X) is a free group
with free basis X.
The final parts of Theorem 14.2.2 are straightforward. If X is empty, the only reduced
word is the empty word; hence, the group is just the identity. If X has a single letter, then
F(X) has a single generator, and is therefore cyclic. It is easy to see that it must be torsion-
free. Therefore, F(X) is infinite cyclic; that is, F(X) ≅ ℤ. Finally, if |X| ≥ 2, let x1 , x2 ∈ X.
Then x1 x2 ≠ x2 x1 , and both are reduced. Therefore, F(X) is non-Abelian.
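The extension of a map on the basis to a homomorphism, as used in the proof above, amounts to evaluating a word letter by letter in the target group. The sketch below takes the target group to be permutations of {0, 1, 2}; the encoding of letters as pairs `(name, eps)`, the chosen images, and all function names are assumptions for illustration.

```python
def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def perm_inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def free_reduce(word):
    """Freely reduce a word given as a list of letters (name, eps), eps = +1 or -1."""
    reduced = []
    for name, eps in word:
        if reduced and reduced[-1] == (name, -eps):
            reduced.pop()
        else:
            reduced.append((name, eps))
    return reduced


def extend(images, degree):
    """Extend a map on the basis (a dict name -> permutation) to all words."""
    def hom(word):
        result = tuple(range(degree))
        for name, eps in word:
            g = images[name] if eps == 1 else perm_inverse(images[name])
            result = compose(result, g)
        return result
    return hom


# Hypothetical choice of images: x1 -> a 3-cycle, x2 -> a transposition.
hom = extend({"x1": (1, 2, 0), "x2": (1, 0, 2)}, degree=3)
w1 = [("x1", 1), ("x2", 1)]
w2 = [("x2", -1), ("x1", 1), ("x1", 1)]
```

Because trivial words are sent to the identity, `hom` takes the same value on a word and on its free reduction, which is why the value of the product in F(X) equals the product of the values.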

The proof of Theorem 14.2.2 provides another way to look at free groups.

Theorem 14.2.3. F is a free group if and only if there is a generating set X such that every
element of F has a unique representation as a freely reduced word on X.

The structure of a free group is entirely dependent on the cardinality of a free basis.
In particular, the cardinality of a free basis X for a free group F is unique, and is called
the rank of F. If |X| < ∞, F is of finite rank. If F has rank n and X = {x1 , x2 , . . . , xn }, we
say that F is free on {x1 , x2 , . . . , xn }. We denote this by F(x1 , x2 , . . . , xn ).

Theorem 14.2.4. If X and Y are sets with the same cardinality, that is, |X| = |Y|, then
the resulting free groups are isomorphic: F(X) ≅ F(Y). Furthermore, if F(X) ≅ F(Y), then
|X| = |Y|.

Proof. Suppose that f : X → Y is a bijection from X onto Y. Now Y ⊂ F(Y), so there is
a unique homomorphism ϕ : F(X) → F(Y) extending f. Since f is a bijection, it has an
inverse $f^{-1}$ : Y → X, and since F(Y) is free, there is a unique homomorphism $\phi_1$ from
F(Y) to F(X) extending $f^{-1}$. Then $\phi\phi_1$ is the identity map on F(Y), and $\phi_1\phi$ is the identity
map on F(X). Therefore, $\phi, \phi_1$ are isomorphisms with $\phi = \phi_1^{-1}$.
Conversely, suppose that F(X) ≅ F(Y). In F(X), let N(X) be the subgroup generated
by all squares in F(X); that is,

\[ N(X) = \langle \{ g^2 : g \in F(X) \} \rangle. \]

Then N(X) is a normal subgroup, and the factor group F(X)/N(X) is Abelian, where
every nontrivial element has order 2 (see exercises). Therefore, F(X)/N(X) can be considered
as a vector space over $\mathbb{Z}_2$, the finite field of order 2, with X as a vector space
basis. Hence, |X| is the dimension of this vector space. Let N(Y) be the corresponding
subgroup of F(Y ). Since F(X) ≅ F(Y ), we would have F(X)/N(X) ≅ F(Y )/N(Y ); therefore,
|Y | is the dimension of the vector space F(Y )/N(Y ). Thus, |X| = |Y | from the uniqueness
of dimension of vector spaces.

Expressing elements of F(X) as a reduced word gives a normal form for elements in
a free group F. As we will see in Section 14.5, this solves what is termed the word problem
for free groups. Another important concept is the following: a freely reduced word
$W = x_{v_1}^{e_1} x_{v_2}^{e_2} \cdots x_{v_n}^{e_n}$ is cyclically reduced if $v_1 \neq v_n$, or if $v_1 = v_n$, then $e_1 \neq -e_n$. Clearly then,
every element of a free group is conjugate to an element given by a cyclically reduced
word. This provides a method to determine conjugacy in free groups.

Theorem 14.2.5. In a free group F, two elements g1 , g2 are conjugate if and only if a cycli-
cally reduced word for g1 is a cyclic permutation of a cyclically reduced word for g2 .
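The conjugacy criterion of Theorem 14.2.5 translates into a short procedure: cyclically reduce both words, then test whether one is a cyclic permutation of the other. The following is a sketch under the assumption that letters are encoded as pairs `(name, eps)`; the function names are illustrative.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of letters (name, eps), eps = +1 or -1."""
    reduced = []
    for name, eps in word:
        if reduced and reduced[-1] == (name, -eps):
            reduced.pop()
        else:
            reduced.append((name, eps))
    return reduced


def cyclically_reduce(word):
    """Freely reduce, then strip mutually inverse first and last letters."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == (w[-1][0], -w[-1][1]):
        w = w[1:-1]        # conjugate by the first letter; the result stays reduced
    return w


def are_conjugate(u, v):
    """Test conjugacy in a free group via cyclic permutations of cyclic reductions."""
    cu, cv = cyclically_reduce(u), cyclically_reduce(v)
    if len(cu) != len(cv):
        return False
    if not cu:
        return True
    return any(cu[k:] + cu[:k] == cv for k in range(len(cu)))


a, b = ("a", 1), ("b", 1)
a_inv, b_inv = ("a", -1), ("b", -1)
```

For example, `are_conjugate([a, b, a_inv], [b])` holds because $aba^{-1}$ cyclically reduces to b, while `are_conjugate([a, b], [a, b_inv])` fails because no rotation of ab equals $ab^{-1}$.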

The theory of free groups has a large and extensive literature. We close this section
by stating several important properties. Proofs for these results can be found in [37], [36]
or [21].

Theorem 14.2.6. A free group is torsion-free.

From Theorem 14.2.4, we can deduce:

Theorem 14.2.7. An Abelian subgroup of a free group must be cyclic.

Finally, a celebrated theorem of Nielsen and Schreier states that a subgroup of a free
group must be free.

Theorem 14.2.8 (Nielsen–Schreier). A subgroup of a free group is itself a free group.

Combinatorially, F is free on X if X is a set of generators for F, and there are no
nontrivial relations among them. In particular, the properties stated in Theorems 14.2.6–14.2.8 hold.
There are several different proofs of the Nielsen–Schreier theorem, see [37], with the most
straightforward being topological in nature. We give an outline of a simple topological proof in
Section 14.4.
About 1920, Nielsen, using a technique now called Nielsen transformations in his
honor, first proved this theorem for finitely generated subgroups. Schreier, shortly after,
found a combinatorial method to extend this to arbitrary subgroups. A complete version
of the original combinatorial proof appears in [37], and in the notes by Johnson [31].
Schreier’s combinatorial proof also allows for a description of the free basis for the
subgroup. In particular, let F be free on X, and H ⊂ F a subgroup. Let $T = \{t_\alpha\}$ be a
complete set of right coset representatives for F modulo H with the property that if
$t_\alpha = x_{v_1}^{e_1} x_{v_2}^{e_2} \cdots x_{v_n}^{e_n} \in T$, with $e_i = \pm 1$, then all the initial segments $1, x_{v_1}^{e_1}, x_{v_1}^{e_1} x_{v_2}^{e_2}$, et cetera are also
in T. Such a system of coset representatives can always be found, and is called a Schreier
system or Schreier transversal for H. If g ∈ F, let $\bar{g}$ represent its coset representative in
T, and further define for g ∈ F and t ∈ T, $S_{tg} = tg(\overline{tg})^{-1}$. Notice that $S_{tg} \in H$ for all t, g.
We then have the following:
Theorem 14.2.9 (Explicit form of Nielsen–Schreier). Let F be free on X and H a subgroup
of F. If T is a Schreier transversal for F modulo H, then H is free on the set

\[ \{ S_{tx} : t \in T,\ x \in X,\ S_{tx} \neq 1 \}. \]

Example 14.2.10. Let F be free on {a, b} and $H = F(X^2)$ the normal subgroup of F generated
by all squares in F.
Then $F/F(X^2) = \langle a, b;\ a^2 = b^2 = (ab)^2 = 1 \rangle = \mathbb{Z}_2 \times \mathbb{Z}_2$ (see Section 14.3 for the concept
of group presentations). It follows that a Schreier system for F modulo H is {1, a, b, ab}
with $\bar{a} = a$, $\bar{b} = b$ and $\overline{ba} = ab$. From this it can be shown that H is free on the generating
set

\[ x_1 = a^2, \quad x_2 = bab^{-1}a^{-1}, \quad x_3 = b^2, \quad x_4 = abab^{-1}, \quad x_5 = ab^2a^{-1}. \]

The theorem also allows for a computation of the rank of H, given the rank of F and
the index. Specifically:

Corollary 14.2.11. Suppose F is free of rank n and |F : H| = k. Then H is free of rank
nk − k + 1.

From the example, we see that F is free of rank 2, H has index 4, so H is free of rank
2 ⋅ 4 − 4 + 1 = 5.
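The generators in Example 14.2.10 can be recomputed from the formula $S_{tx} = tx(\overline{tx})^{-1}$. The sketch below assumes, as in the example, that the coset of a word modulo H is determined by the parities of the exponent sums of a and b (because $F/H \cong \mathbb{Z}_2 \times \mathbb{Z}_2$); the encoding of letters as pairs `(name, eps)` and all names in the code are illustrative.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of letters (name, eps), eps = +1 or -1."""
    reduced = []
    for name, eps in word:
        if reduced and reduced[-1] == (name, -eps):
            reduced.pop()
        else:
            reduced.append((name, eps))
    return reduced


def inverse(word):
    """Inverse word: reverse the letters and negate the exponents."""
    return [(name, -eps) for name, eps in reversed(word)]


# Schreier transversal {1, a, b, ab}, indexed by exponent-sum parities (a, b).
TRANSVERSAL = {
    (0, 0): [],
    (1, 0): [("a", 1)],
    (0, 1): [("b", 1)],
    (1, 1): [("a", 1), ("b", 1)],
}


def coset_representative(word):
    """Representative in the transversal of the coset of the word modulo H."""
    a_parity = sum(eps for name, eps in word if name == "a") % 2
    b_parity = sum(eps for name, eps in word if name == "b") % 2
    return TRANSVERSAL[(a_parity, b_parity)]


def schreier_generators():
    """All nontrivial S_tx = t x (rep of tx)^-1 for t in T and x in {a, b}."""
    generators = []
    for t in TRANSVERSAL.values():
        for x in ("a", "b"):
            tx = t + [(x, 1)]
            s = free_reduce(tx + inverse(coset_representative(tx)))
            if s:
                generators.append(s)
    return generators


def show(word):
    """Render a word as a readable string such as 'b a b^-1 a^-1'."""
    return " ".join(name if eps == 1 else name + "^-1" for name, eps in word)


generator_strings = [show(s) for s in schreier_generators()]
```

The number of nontrivial generators found this way matches the rank nk − k + 1 = 2 ⋅ 4 − 4 + 1 = 5 from Corollary 14.2.11.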

14.3 Group Presentations


The significance of free groups stems from the following result, which is easily deduced
from the definition and will lead us directly to a formal definition of a group presenta-
tion. Let G be any group and F the free group on the elements of G considered as a set.
The identity map f : G → G can be extended to a homomorphism of F onto G. Therefore,
we have the following:

Theorem 14.3.1. Every group G is a homomorphic image of a free group. That is, let G be
any group. Then G ≅ F/N, where F is a free group and N is a normal subgroup of F.

In the above theorem, instead of taking all the elements of G, we can consider just
a set X of generators for G. Then G is a factor group of F(X), G ≅ F(X)/N. The normal
subgroup N is the kernel of the homomorphism from F(X) onto G. We use Theorem 14.3.1
to formally define a group presentation.
If H is a subset of a group G, then the normal closure of H denoted by N(H) is the
smallest normal subgroup of G containing H. This can be described alternatively in the
following manner. The normal closure of H is the subgroup of G generated by all conju-
gates of elements of H.
Now suppose that G is a group with X, a set of generators for G. We also call X a gen-
erating system for G. Now let G = F(X)/N as in Theorem 14.3.1 and the comments after
it. N is the kernel of the homomorphism f : F(X) → G. It follows that if r is a free group
word with r ∈ N, then r = 1 in G (under the homomorphism). We then call r a relator
in G, and the equation r = 1 a relation in G. Suppose that R is a subset of N such that
N = N(R), then R is called a set of defining relators for G. The equations r = 1, r ∈ R, are
a set of defining relations for G. It follows that any relator in G is a product of conjugates
of elements of R. Equivalently, r ∈ F(X) is a relator in G if and only if r can be reduced
to the empty word by insertions and deletions of elements of R, and trivial words.

Definition 14.3.2. Let G be a group. Then a group presentation for G consists of a set of
generators X for G and a set R of defining relators. In this case, we write G = ⟨X; R⟩. We
could also write the presentation in terms of defining relations as G = ⟨X; r = 1, r ∈ R⟩.

From Theorem 14.3.1, it follows immediately that every group has a presentation.
However, in general, there are many presentations for the same group. If $R \subset R_1$ and every
element of $R_1$ is a relator in G, then $R_1$ is also a set of defining relators.

Lemma 14.3.3. Let G be a group. Then G has a presentation.

If G = ⟨X; R⟩ and X is finite, then G is said to be finitely generated. If R is finite, G is
finitely related. If both X and R are finite, G is finitely presented.
Using group presentations, we get another characterization of free groups.

Theorem 14.3.4. F is a free group if and only if F has a presentation of the form F = ⟨X; ⟩.

Mimicking the construction of a free group from a set X, we can show that to each
presentation corresponds a group. Suppose that we are given a supposed presentation
⟨X; R⟩, where R is given as a set of words in X. Consider the free group F(X) on X. Define
two words w1 , w2 on X to be equivalent if w1 can be transformed into w2 using insertions
and deletions of elements of R and trivial words. As in the free group case, this is an
equivalence relation. Let G be the set of equivalence classes. If we define multiplication
as before, as concatenation followed by the appropriate equivalence class, then G is a
group. Furthermore, each r ∈ R must equal the identity in G so that G = ⟨X; R⟩. Notice
that here there may be no unique reduced word for an element of G.

Theorem 14.3.5. Let X be a set and R a set of words on X. Then there
exists a group G with presentation ⟨X; R⟩.

We now give some examples of group presentations:

Example 14.3.6. A free group of rank n has a presentation

\[ F_n = \langle x_1, \ldots, x_n;\ \rangle. \]

Example 14.3.7. A free Abelian group of rank n has a presentation

\[ \mathbb{Z}^n = \langle x_1, \ldots, x_n;\ x_i x_j x_i^{-1} x_j^{-1},\ i = 1, \ldots, n,\ j = 1, \ldots, n \rangle. \]


Example 14.3.8. A cyclic group of order n has a presentation

\[ \mathbb{Z}_n = \langle x;\ x^n = 1 \rangle. \]

Example 14.3.9. The dihedral group of order 2n, representing the symmetry group of
a regular n-gon, has a presentation

\[ \langle r, f;\ r^n = 1,\ f^2 = 1,\ (rf)^2 = 1 \rangle. \]

14.3.1 The Modular Group

In this section, we give a more complicated example, and then a nice application to num-
ber theory.
If R is a commutative ring with identity, then the set of invertible (n × n)-matrices
with entries from R forms a group under matrix multiplication called the n-dimen-
sional general linear group over R, see [41]. This group is denoted by GL(n, R). Since
det(A) det(B) = det(AB) for square matrices A, B, it follows that the subset of GL(n, R),
consisting of those matrices of determinant 1, forms a subgroup. This subgroup is called
the special linear group over R and is denoted by SL(n, R). In this section, we concentrate
on SL(2, ℤ), or more specifically, a quotient of it, PSL(2, ℤ), and find presentations for
them. The group SL(2, ℤ) then consists of (2 × 2)-matrices of determinant 1 with integral
entries:

\[ SL(2, \mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1 \right\}. \]

The group SL(2, ℤ) is called the homogeneous modular group, and an element of SL(2, ℤ)
is called a unimodular matrix. If G is any group, recall that its center Z(G) consists of those
elements of G, which commute with all elements of G:

Z(G) = {g ∈ G : gh = hg, ∀h ∈ G}.

The group Z(G) is a normal subgroup of G. Hence, we can form the factor group G/Z(G).
For G = SL(2, ℤ), the only unimodular matrices that commute with all others are
$\pm I = \pm \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Therefore, Z(SL(2, ℤ)) = {I, −I}. The quotient

SL(2, ℤ)/Z(SL(2, ℤ)) = SL(2, ℤ)/{I, −I}

is denoted by PSL(2, ℤ) and is called the projective special linear group or inhomogeneous
modular group. More commonly, PSL(2, ℤ) is just called the modular group, and denoted
by M.
M arises in many different areas of mathematics, including number theory, com-
plex analysis, and Riemann surface theory and the theory of automorphic forms and
functions. M is perhaps the most widely studied single finitely presented group. Com-
plete discussions of M and its structure can be found in the books Integral Matrices by
M. Newman, see [56], and Algebraic Theory of the Bianchi Groups by B. Fine, see [51].
Since M = PSL(2, ℤ) = SL(2, ℤ)/{I, −I}, it follows that each element of M can be
considered as ±A, where A is a unimodular matrix. A projective unimodular matrix is
then

\[ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1. \]

The elements of M can also be considered as linear fractional transformations over the
complex numbers

\[ z' = \frac{az + b}{cz + d}, \quad a, b, c, d \in \mathbb{Z},\ ad - bc = 1, \text{ where } z \in \mathbb{C}. \]

Thought of in this way, M forms a Fuchsian group, which is a discrete group of isometries
of the non-Euclidean hyperbolic plane. The book by Katok, see [33], gives a solid and clear
introduction to such groups. This material can also be found in condensed form in [53].
We now determine presentations for both SL(2, ℤ) and M = PSL(2, ℤ).

Theorem 14.3.10. The group SL(2, ℤ) is generated by the elements

\[ X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}. \]

Furthermore, a complete set of defining relations for the group in terms of these generators
is given by

\[ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I. \]

It follows that SL(2, ℤ) has the presentation

\[ \langle X, Y;\ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I \rangle. \]
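That the matrices X and Y satisfy the stated relations is a finite computation. The sketch below represents 2 × 2 integer matrices as nested tuples; the helper names are illustrative. It checks only that the relations hold, not that they are a complete set of defining relations, which is the substance of the theorem.

```python
def mat_mul(A, B):
    """Product of two 2 x 2 integer matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_pow(A, k):
    """A^k for a nonnegative integer k."""
    result = ((1, 0), (0, 1))
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def mat_inv(A):
    """Inverse of a matrix of determinant 1."""
    (a, b), (c, d) = A
    return ((d, -b), (-c, a))


I2 = ((1, 0), (0, 1))
X = ((0, -1), (1, 0))
Y = ((0, 1), (-1, -1))

X2 = mat_pow(X, 2)
# The third relator Y X^2 Y^-1 X^-2.
commutator_relator = mat_mul(mat_mul(mat_mul(Y, X2), mat_inv(Y)), mat_inv(X2))
```

Since $X^2 = -I$ is central, the third relation simply records that $X^2$ commutes with Y.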

Proof. We first show that SL(2, ℤ) is generated by X and Y ; that is, every matrix A in the
group can be written as a product of powers of X and Y .
Let

\[ U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]

Then a direct multiplication shows that U = XY, and we show that SL(2, ℤ) is generated
by X and U, which implies that it is also generated by X and Y. Furthermore,

\[ U^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}; \]

therefore, U has infinite order.

Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{Z})$. Then we have

\[ XA = \begin{pmatrix} -c & -d \\ a & b \end{pmatrix}, \quad\text{and}\quad U^kA = \begin{pmatrix} a + kc & b + kd \\ c & d \end{pmatrix} \]

for any k ∈ ℤ. We may assume that |c| ≤ |a|; otherwise, start with XA rather than A. If
c = 0, then $A = \pm U^q$ for some q. If $A = U^q$, then certainly A is in the group generated
by X and U. If $A = -U^q$, then $A = X^2U^q$ since $X^2 = -I$. It follows that here also A is in
the group generated by X and U.
Now suppose c ≠ 0. Apply the Euclidean algorithm to a and c in the following mod-
ified way:

\[
\begin{aligned}
a &= q_0 c + r_1 \\
-c &= q_1 r_1 + r_2 \\
r_1 &= q_2 r_2 + r_3 \\
&\ \ \vdots \\
(-1)^n r_{n-1} &= q_n r_n + 0,
\end{aligned}
\]

where $r_n = \pm 1$ since (a, c) = 1. Then

\[ XU^{-q_n} \cdots XU^{-q_0} A = \pm U^{q_{n+1}} \quad\text{with } q_{n+1} \in \mathbb{Z}. \]

Therefore,

\[ A = X^m U^{q_0} X U^{q_1} \cdots X U^{q_n} X U^{q_{n+1}} \]

with m = 0, 1, 2, 3; $q_0, q_1, \ldots, q_{n+1} \in \mathbb{Z}$ and $q_0, \ldots, q_n \ne 0$. Thus, X and U, and
hence X and Y, generate SL(2, ℤ).
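
The generation argument is constructive, and it can be sketched in a few lines of Python. The version below is a simplified variant and not the exact modified Euclidean scheme of the proof: it uses ordinary floor division and the inverse $X^{-1} = X^3$, so the word it returns may differ from the one in the proof, and it may contain trivial factors $U^0$. The names `decompose`, `evaluate` and `u_pow` are our own.

```python
I2 = ((1, 0), (0, 1))
X = ((0, -1), (1, 0))

def mul(P, Q):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def u_pow(k):
    """U^k for any integer k, where U = [[1, 1], [0, 1]]."""
    return ((1, k), (0, 1))

def decompose(A):
    """Write A in SL(2, Z) as a word in X and U.

    The result is a list of pairs (letter, exponent), read left to right.
    """
    (a, b), (c, d) = A
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    word, M = [], A
    while M[1][0] != 0:
        q = M[0][0] // M[1][0]             # Euclidean step on the first column
        M = mul(X, mul(u_pow(-q), M))      # lower-left entry shrinks in absolute value
        word += [("U", q), ("X", 3)]       # inverse of X U^{-q} is U^{q} X^{3}
    sign, t = M[0][0], M[0][1]             # now M = +U^t or M = -U^{-t}
    if sign == 1:
        word.append(("U", t))
    else:
        word += [("X", 2), ("U", -t)]      # -I = X^2
    return word

def evaluate(word):
    """Multiply out a word produced by decompose."""
    R = I2
    for letter, e in word:
        if letter == "U":
            R = mul(R, u_pow(e))
        else:
            for _ in range(e % 4):
                R = mul(R, X)
    return R
```

For example, `decompose(((7, 3), (2, 1)))` returns a word whose evaluation is again the matrix with rows (7, 3) and (2, 1).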
We must now show that

\[ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I \]

form a complete set of defining relations for SL(2, ℤ), or that every relation on these
generators is derivable from these. It is straightforward to see that X and Y do satisfy
these relations. Assume then that we have a relation

\[ S = X^{\epsilon_1} Y^{\alpha_1} X^{\epsilon_2} Y^{\alpha_2} \cdots Y^{\alpha_n} X^{\epsilon_{n+1}} = I \]

with all $\epsilon_i, \alpha_j \in \mathbb{Z}$. Using the set of relations

\[ X^4 = Y^3 = YX^2Y^{-1}X^{-2} = I, \]

we may transform S so that

\[ S = X^{\epsilon_1} Y^{\alpha_1} X Y^{\alpha_2} \cdots Y^{\alpha_m} X^{\epsilon_{m+1}} \]

with $\epsilon_1, \epsilon_{m+1} = 0, 1, 2$ or 3 and $\alpha_i = 1$ or 2 for i = 1, . . . , m and m ≥ 0. Multiplying by a
suitable power of X, we obtain

\[ Y^{\alpha_1} X \cdots Y^{\alpha_m} X = X^{\alpha} = S_1 \]

with m ≥ 0 and α = 0, 1, 2 or 3. Assume that m ≥ 1, and let

\[ S_1 = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}. \]

We show by induction that

a, b, c, d ≥ 0, b + c > 0,

or

a, b, c, d ≤ 0, b + c < 0.

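This claim for the entries of $S_1$ is true for

\[ YX = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \quad\text{and}\quad Y^2X = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}. \]

Suppose it is correct for $S_2 = \begin{pmatrix} a_1 & -b_1 \\ -c_1 & d_1 \end{pmatrix}$. Then

\[ YXS_2 = \begin{pmatrix} a_1 & -b_1 \\ -(a_1 + c_1) & b_1 + d_1 \end{pmatrix} \quad\text{and}\quad Y^2XS_2 = \begin{pmatrix} -a_1 - c_1 & b_1 + d_1 \\ c_1 & -d_1 \end{pmatrix}. \]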

Therefore, the claim is correct for all $S_1$ with m ≥ 1. This gives a contradiction, for the
entries of $X^{\alpha}$ with α = 0, 1, 2 or 3 do not satisfy the claim. Hence, m = 0, and S can be
reduced to a trivial relation by the given set of relations. Therefore, they are a complete
set of defining relations, and the theorem is proved.

Corollary 14.3.11. The modular group M = PSL(2, ℤ) has the presentation

\[ M = \langle x, y;\ x^2 = y^3 = 1 \rangle. \]

Furthermore, x, y can be taken as the linear fractional transformations

\[ x : z' = -\frac{1}{z}, \quad\text{and}\quad y : z' = -\frac{1}{z + 1}. \]

Proof. The center of SL(2, ℤ) is {±I}. Since $X^2 = -I$, setting $X^2 = I$ in the presentation for
SL(2, ℤ) gives the presentation for M. Writing the projective matrices as linear fractional
transformations gives the second statement.

This corollary says that M is the free product of a cyclic group of order 2 and a cyclic
group of order 3, a concept we will introduce in Section 14.8.
We note that there is an elementary alternative proof of Corollary 14.3.11 as far as
showing that $X^2 = Y^3 = 1$ form a complete set of defining relations. As linear fractional
transformations, we have

\[ X(z) = -\frac{1}{z}, \quad Y(z) = -\frac{1}{z + 1}, \quad Y^2(z) = -\frac{z + 1}{z}. \]
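Now let

\[ \mathbb{R}^+ = \{x \in \mathbb{R} : x > 0\} \quad\text{and}\quad \mathbb{R}^- = \{x \in \mathbb{R} : x < 0\}. \]

Then

\[ X(\mathbb{R}^-) \subset \mathbb{R}^+, \quad\text{and}\quad Y^{\alpha}(\mathbb{R}^+) \subset \mathbb{R}^-, \quad \alpha = 1, 2. \]

Let S ∈ M. Using the relations $X^2 = Y^3 = 1$ and a suitable conjugation, we may assume
that either S = 1 is a consequence of these relations, or that

\[ S = Y^{\alpha_1} X Y^{\alpha_2} \cdots X Y^{\alpha_n} \]

with $1 \le \alpha_i \le 2$ and $\alpha_1 = \alpha_n$.
In this second case, if $x \in \mathbb{R}^+$, then $S(x) \in \mathbb{R}^-$; hence, S ≠ 1.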
This type of ping-pong argument can be used in many examples, see [36], [21]
and [31]. As another example, consider the unimodular matrices

\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix}. \]

Let A, B denote the corresponding linear fractional transformations in the modular
group M. We have

\[ A^n = \begin{pmatrix} -n + 1 & n \\ -n & n + 1 \end{pmatrix}, \quad B^n = \begin{pmatrix} -n + 1 & -n \\ n & n + 1 \end{pmatrix} \quad\text{for } n \in \mathbb{Z}. \]

In particular, A and B have infinite order. Now

\[ A^n(\mathbb{R}^-) \subset \mathbb{R}^+ \quad\text{and}\quad B^n(\mathbb{R}^+) \subset \mathbb{R}^- \]

for all n ≠ 0. The ping-pong argument used for any element of the type

\[ S = A^{n_1} B^{m_1} \cdots B^{m_k} A^{n_{k+1}} \]

with all $n_i, m_i \ne 0$ and $n_1 + n_{k+1} \ne 0$ shows that $S(x) \in \mathbb{R}^+$ if $x \in \mathbb{R}^-$. It follows that there
are no nontrivial relations on A and B; therefore, the subgroup of M generated by A, B
must be a free group of rank 2.
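
The two ingredients of this ping-pong argument, the closed forms for the powers of A and B and the way these powers move the two half-lines, can be spot-checked numerically. The following Python sketch uses exact rational arithmetic on a finite sample of points and exponents, so it illustrates the statement rather than proving it. The function names and the sample word are our own choices.

```python
from fractions import Fraction

def mul(P, Q):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def a_pow(n):
    """Closed form for A^n as stated in the text, for any integer n."""
    return ((1 - n, n), (-n, n + 1))

def b_pow(n):
    """Closed form for B^n as stated in the text, for any integer n."""
    return ((1 - n, -n), (n, n + 1))

def mobius(M, z):
    """Apply the linear fractional transformation of M to z."""
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

# Consistency of the closed forms with multiplication: P(n) P(1) = P(n + 1).
closed_forms_ok = all(
    mul(p(n), p(1)) == p(n + 1) for p in (a_pow, b_pow) for n in range(-6, 6)
)

samples = [Fraction(k, 7) for k in range(1, 60)]       # some positive rationals
exponents = [n for n in range(-5, 6) if n != 0]

a_maps_neg_to_pos = all(mobius(a_pow(n), -z) > 0 for n in exponents for z in samples)
b_maps_pos_to_neg = all(mobius(b_pow(n), z) < 0 for n in exponents for z in samples)

def apply_word(word, z):
    """Apply a word such as [('A', 2), ('B', -1), ('A', 3)] to z, rightmost factor first."""
    for letter, n in reversed(word):
        z = mobius(a_pow(n) if letter == "A" else b_pow(n), z)
    return z

# The word A^2 B^{-1} A^3 sends the negative point -1/2 to a positive point.
image = apply_word([("A", 2), ("B", -1), ("A", 3)], Fraction(-1, 2))
```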
To close this section, we present a significant number-theoretic application of
the modular group. First, we need the following corollary to Corollary 14.3.11:

Corollary 14.3.12. Let $M = \langle X, Y;\ X^2 = Y^3 = 1 \rangle$ be the modular group. If A is an element
of order 2, then A is conjugate to X. If B is an element of order 3, then B is conjugate to
either Y or $Y^2$.

Definition 14.3.13. Let a, n be relatively prime integers with a ≠ 0, n ≥ 1. Then a is
a quadratic residue modulo n if there exists an x ∈ ℤ with $x^2 \equiv a \pmod{n}$; that is,
$a = x^2 + kn$ for some k ∈ ℤ.

The following is called Fermat’s two-square theorem.

Theorem 14.3.14 (Fermat’s two-square theorem). Let n > 0 be a natural number. Then
$n = a^2 + b^2$ with (a, b) = 1 if and only if −1 is a quadratic residue modulo n.

Proof. Suppose −1 is a quadratic residue modulo n. Then there exists an x such that
$x^2 \equiv -1 \pmod{n}$, that is, $x^2 = -1 - mn$ for some m ∈ ℤ. This implies that $-x^2 - mn = 1$,
so that there must exist a projective unimodular matrix

\[ A = \pm\begin{pmatrix} x & n \\ m & -x \end{pmatrix}. \]

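It is straightforward that $A^2 = 1$. Therefore, by Corollary 14.3.12, A is conjugate within M
to X. Now consider conjugates of X within M. Let $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then

\[ T^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \]

and

\[ TXT^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \pm\begin{pmatrix} -(bd + ac) & a^2 + b^2 \\ -(c^2 + d^2) & bd + ac \end{pmatrix}. \tag{$\ast$} \]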

Therefore, any conjugate of X must have the form (∗), and thus A also must have the
form (∗). Therefore, $n = a^2 + b^2$. Furthermore, (a, b) = 1 since in finding the form (∗), we
had ad − bc = 1.

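Conversely, suppose $n = a^2 + b^2$ with (a, b) = 1. Then there exist c, d ∈ ℤ with
ad − bc = 1; hence, there exists a projective unimodular matrix

\[ T = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

Then

\[ TXT^{-1} = \pm\begin{pmatrix} \alpha & a^2 + b^2 \\ \gamma & -\alpha \end{pmatrix} = \pm\begin{pmatrix} \alpha & n \\ \gamma & -\alpha \end{pmatrix}. \]

This has determinant one, so

\[ -\alpha^2 - n\gamma = 1 \implies \alpha^2 = -1 - n\gamma \implies \alpha^2 \equiv -1 \pmod{n}. \]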

Therefore, −1 is a quadratic residue modulo n.
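
The statement of Theorem 14.3.14 is easy to test by brute force for small n. The following Python sketch compares both sides of the equivalence for 1 ≤ n ≤ 300; it is a numerical check under our own naming (`minus_one_is_residue`, `coprime_two_squares`) and is unrelated to the group-theoretic proof above.

```python
from math import gcd, isqrt

def minus_one_is_residue(n):
    """True if x^2 = -1 (mod n) has a solution x."""
    return any((x * x + 1) % n == 0 for x in range(n))

def coprime_two_squares(n):
    """Return (a, b) with a^2 + b^2 = n and gcd(a, b) = 1, or None if none exists."""
    for a in range(isqrt(n) + 1):
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest and gcd(a, b) == 1:
            return (a, b)
    return None

# Both conditions agree for every n in the tested range.
agreement = all(
    minus_one_is_residue(n) == (coprime_two_squares(n) is not None)
    for n in range(1, 301)
)
```

For instance, `coprime_two_squares(65)` finds the representation 65 = 1 + 64, while `coprime_two_squares(4)` returns `None` because 4 = 0 + 4 is not a representation by coprime integers.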

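This type of group theoretical proof can be extended in several directions. Kern-Isberner
and Rosenberger, see [34], considered groups of matrices of the form

\[ U = \begin{pmatrix} a & b\sqrt{N} \\ c\sqrt{N} & d \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ ad - Nbc = 1, \]

or

\[ U = \begin{pmatrix} a\sqrt{N} & b \\ c & d\sqrt{N} \end{pmatrix}, \quad a, b, c, d, N \in \mathbb{Z},\ Nad - bc = 1. \]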

They then proved that if

N ∈ {1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 16, 18, 22, 25, 28, 37, 58}

and n ∈ ℕ with (n, N) = 1, then the following hold:


(1) If −N is a quadratic residue modulo n and n is a quadratic residue modulo N, then
    n can be written as $n = x^2 + Ny^2$ with x, y ∈ ℤ.
(2) Conversely, if $n = x^2 + Ny^2$ with x, y ∈ ℤ and (x, y) = 1, then −N is a quadratic residue
    modulo n, and n is a quadratic residue modulo N.

The proof of the above results depends on the class number of $\mathbb{Q}(\sqrt{-N})$ (see [34]).
In another direction, Fine, see [49] and [50], showed that the Fermat two-square property
is actually a property satisfied by many rings R. These are called sum of squares rings.
For example, if p ≡ 3 (mod 4), then $\mathbb{Z}_{p^n}$ for n > 1 is a sum of squares ring.

14.4 Presentations of Subgroups


Given a group presentation G = ⟨X; R⟩, it is possible to find a presentation for a subgroup
H of G. The procedure to do this is called the Reidemeister–Schreier process and is a
consequence of the explicit version of the Nielsen–Schreier theorem (Theorem 14.2.9).
We give a brief description. A complete description and a verification of its correctness
is found in [37], or in [21].
Let G be a group with the presentation ⟨a1 , . . . , an ; R1 , . . . , Rk ⟩. Let H be a subgroup
of G and T a Schreier system for G modulo H, defined analogously as above.

Reidemeister–Schreier process
Let G, H and T be as above. Then H is generated by the set

\[ \{ S_{t a_v} : t \in T,\ a_v \in \{a_1, \ldots, a_n\},\ S_{t a_v} \ne 1 \} \]

with a complete set of defining relations given by conjugates of the original relators
rewritten in terms of the subgroup generating set.
To actually rewrite the relators in terms of the new generators, we use a mapping τ
on words on the generators of G called the Reidemeister rewriting process. This map is
defined as follows: If

\[ W = a_{v_1}^{e_1} a_{v_2}^{e_2} \cdots a_{v_j}^{e_j} \quad\text{with } e_i = \pm 1 \text{ defines an element of } H, \]

then

\[ \tau(W) = S_{t_1, a_{v_1}}^{e_1} S_{t_2, a_{v_2}}^{e_2} \cdots S_{t_j, a_{v_j}}^{e_j}, \]

where $t_i$ is the coset representative of the initial segment of W preceding $a_{v_i}$ if $e_i = 1$,
and $t_i$ is the representative of the initial segment of W up to and including $a_{v_i}^{-1}$ if $e_i = -1$.
The complete set of relators rewritten in terms of the subgroup generators is then given
by

\[ \{ \tau(t R_i t^{-1}) \} \quad\text{with } t \in T, \text{ where } R_i \text{ runs over all relators in } G. \]

We present two examples; one with a finite group, and then an important example
with a free group, which shows that a countable free group contains free subgroups of
arbitrary ranks.

Example 14.4.1. Let $G = A_4$ be the alternating group on 4 symbols. Then a presentation
for G is

\[ G = A_4 = \langle a, b;\ a^2 = b^3 = (ab)^3 = 1 \rangle. \]

Let $H = A_4'$ be the commutator subgroup. We use the above method to find a presentation
for H. Now

\[ G/H = A_4/A_4' = \langle a, b;\ a^2 = b^3 = (ab)^3 = [a, b] = 1 \rangle = \langle b;\ b^3 = 1 \rangle. \]

Therefore, $|A_4 : A_4'| = 3$. A Schreier system is then $\{1, b, b^2\}$. The generators for $A_4'$ are
then

\[ X_1 = S_{1a} = a, \quad X_2 = S_{ba} = bab^{-1}, \quad X_3 = S_{b^2a} = b^2ab, \]

whereas the relations are the following:

1. $\tau(aa) = S_{1a}S_{1a} = X_1^2$
2. $\tau(baab^{-1}) = X_2^2$
3. $\tau(b^2aab^{-2}) = X_3^2$
4. $\tau(bbb) = 1$
5. $\tau(bbbbb^{-1}) = 1$
6. $\tau(b^2bbbb^{-2}) = 1$
7. $\tau(ababab) = S_{1a}S_{ba}S_{b^2a} = X_1X_2X_3$
8. $\tau(babababb^{-1}) = S_{ba}S_{b^2a}S_{1a} = X_2X_3X_1$
9. $\tau(b^2abababb^{-2}) = S_{b^2a}S_{1a}S_{ba} = X_3X_1X_2$

Therefore, after eliminating redundant relations and using $X_3 = X_1X_2$, we get as a
presentation for $A_4'$,

\[ \langle X_1, X_2;\ X_1^2 = X_2^2 = (X_1X_2)^2 = 1 \rangle. \]
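
The nine rewritten relators of Example 14.4.1 can be reproduced mechanically. The Python sketch below implements the rewriting map τ for this one example only, under two assumptions taken from the example itself: the coset of a word modulo H is determined by the exponent sum of b modulo 3 (since G/H = ⟨b; b³ = 1⟩), and the symbols $S_{b^k, b}$ are trivial in G and are therefore dropped. Writing `A` and `B` for the inverses of a and b is our own convention.

```python
def parse(text):
    """Turn a string such as 'baaB' into a list of (generator, exponent) pairs.

    Lower-case letters are generators, upper-case letters their inverses.
    """
    return [(ch.lower(), 1 if ch.islower() else -1) for ch in text]

def tau(word):
    """Reidemeister rewriting for H = A_4' with Schreier system {1, b, b^2}.

    Returns the rewritten word as a list of (X_i, exponent) pairs.
    """
    k = 0                      # current coset representative is b^k
    out = []
    for gen, e in word:
        if gen == "a":
            # a lies in H, so the coset is the same before and after the letter;
            # the letter contributes S_{b^k, a}^{e} = X_{k+1}^{e}
            out.append((f"X{k + 1}", e))
        else:
            # S_{b^k, b} is trivial in G; only the coset changes
            k = (k + e) % 3
    if k != 0:
        raise ValueError("word does not represent an element of H")
    return out

def conjugated_relators():
    """Rewrite t R t^{-1} for every relator R and every t in {1, b, b^2}."""
    result = {}
    for relator in ["aa", "bbb", "ababab"]:
        for k in range(3):
            w = "b" * k + relator + "B" * k
            result[w] = tau(parse(w))
    return result
```

Calling `tau(parse("ababab"))` yields the word X1 X2 X3, in agreement with relation 7 of the example.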

Example 14.4.2. Let F = ⟨x, y; ⟩ be the free group of rank 2. Let H be the commutator
subgroup. Then

\[ F/H = \langle x, y;\ [x, y] = 1 \rangle = \mathbb{Z} \times \mathbb{Z}, \]

a free Abelian group of rank 2. It follows that H has infinite index in F. As Schreier coset
representatives, we can take

\[ t_{m,n} = x^m y^n, \quad m = 0, \pm 1, \pm 2, \ldots,\ n = 0, \pm 1, \pm 2, \ldots. \]

The corresponding Schreier generators for H are

\[ x_{m,n} = x^m y^n x^{-m} y^{-n}, \quad m = 0, \pm 1, \pm 2, \ldots,\ n = 0, \pm 1, \pm 2, \ldots. \]

The relations are only trivial; therefore, H is free on the countably infinitely many
generators above. It follows that a free group of rank 2 contains as a subgroup a free group
of countably infinite rank. Since a free group of countably infinite rank contains as
subgroups free groups of all finite ranks, it follows that a free group of rank 2 contains
a free subgroup of arbitrary finite rank.

Theorem 14.4.3. Let F be free of rank 2. Then the commutator subgroup $F'$ is free of
countably infinite rank. In particular, a free group of rank 2 contains as a subgroup a free
group of any finite rank n.

Corollary 14.4.4. Let n, m be any pair of positive integers n, m ≥ 2 and $F_n$, $F_m$ free groups
of ranks n, m, respectively. Then $F_n$ can be embedded into $F_m$, and $F_m$ can be embedded
into $F_n$.

14.5 Geometric Interpretation


Combinatorial group theory has its origins in topology and complex analysis. Especially
important in the development is the theory of the fundamental group. This connection
is so deep that many people consider combinatorial group theory as the study of the
fundamental group—especially the fundamental group of a low-dimensional complex.
This connection proceeds in both directions. The fundamental group provides methods
and insights to study the topology. In the other direction, the topology can be used to
study the groups.
Recall that if X is a topological space, then its fundamental group based at a point
x0 , denoted by π(X, x0 ), is the group of all homotopy classes of closed paths at x0 . If X
is path-connected, then the fundamental groups at different points are all isomorphic,
and we can speak of the fundamental group of X, which we will denote by π(X). Histori-
cally, group presentations were developed to handle the fundamental groups of spaces,
which allowed simplicial or cellular decompositions. In these cases, the presentation
of the fundamental group can be read off from the combinatorial decomposition of the
space.
An (abstract) simplicial complex or cell complex K is a topological space consisting
of a set of points called the vertices, which we will denote by V (K), and collections of
subsets of vertices called simplexes or cells, which have the property that the intersec-
tion of any two simplices is again a simplex. If n is the number of vertices in a cell, then
n − 1 is called its dimension. Hence, the set of vertices are the 0-dimensional cells, and
a simplex {v1 , . . . , vn } is an (n − 1)-dimensional cell. The 1-dimensional cells are called
edges. These have the form {u, v}, where u and v are vertices. One should think of the
cells in a geometric manner so that the edges are really edges, the 2-cells are filled trian-
gles (which are equivalent to disks), and so on. The maximum dimension of any cell in a
complex K is called the dimension of K. From now on, we will assume that our simplicial
complexes are path-connected.
A graph Γ is just a 1-dimensional simplicial complex. Hence, Γ consists of just vertices
and edges. If K is any complex, then the set of vertices and edges is called the 1-skeleton
of K. Similarly, all the cells of dimension less than or equal to 2 comprise the 2-skeleton.
A connected graph with no closed paths in it is called a tree. If K is any complex, then a
maximal tree in K is a tree that can be contained in no other tree within K.
From the viewpoint of combinatorial group theory, what is relevant is that if K is
a complex, then a presentation of its fundamental group can be determined from its
2-skeleton and read off directly. In particular, the following holds:

Theorem 14.5.1. Suppose that K is a connected cell complex. Suppose that T is a maximal
tree within the 1-skeleton of K. Then a presentation for π(K) can be determined in the
following manner:
Generators: all edges outside of the maximal tree T.
Relations: (a) {u, v} = 1 if {u, v} is an edge in T.
(b) {u, v}{v, w} = {u, w} if u, v, w lie in a simplex of K.

From this the following is obvious:

Corollary 14.5.2. The fundamental group of a connected graph is free. Furthermore, its
rank is the number of edges outside a maximal tree.
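
Corollary 14.5.2 translates directly into a computation: build a maximal (spanning) tree and count the edges left over. The following Python sketch does this with a breadth-first search; the function name `free_rank` and the input format (a list of vertices and a list of vertex pairs) are our own assumptions.

```python
from collections import deque

def free_rank(vertices, edges):
    """Rank of the fundamental group of a connected graph.

    A spanning tree is grown by breadth-first search; the rank is the
    number of edges that do not belong to this tree.
    """
    adjacency = {v: [] for v in vertices}
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))

    start = next(iter(vertices))
    seen = {start}
    tree_edges = set()
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, index in adjacency[u]:
            if v not in seen:
                seen.add(v)
                tree_edges.add(index)     # this edge joins the spanning tree
                queue.append(v)

    if len(seen) != len(adjacency):
        raise ValueError("graph is not connected")
    return len(edges) - len(tree_edges)

triangle = free_rank([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
path = free_rank([0, 1, 2], [(0, 1), (1, 2)])
complete_k4 = free_rank([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
```

For a connected graph with V vertices and E edges, a spanning tree has V − 1 edges, so the value returned equals E − V + 1.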

A connected graph is homotopic to a wedge or bouquet of circles. If there are n
circles in a bouquet of circles, then the fundamental group is free of rank n. The converse
is also true. A free group can be realized as the fundamental group of a wedge of circles.
An important concept in applying combinatorial group theory is that of a covering
complex.

Definition 14.5.3. Suppose that K is a complex. Then a complex K1 is a covering complex
for K if there exists a surjection p : K1 → K, called a covering map, with the property that
for any cell s ∈ K the inverse image $p^{-1}(s)$ is a union of pairwise disjoint cells in K1, and
p restricted to any of the preimage cells is a homeomorphism.
That is, for each simplex S in K, we have

\[ p^{-1}(S) = \bigcup_i S_i \]

and $p : S_i \to S$ is a bijection for each i.

The following then becomes clear:

Lemma 14.5.4. If K1 is a connected covering complex for K, then K1 and K have the same
dimension.

What is crucial in using covering complexes to study the fundamental group is that
there is a Galois theory of covering complexes and maps. The covering map p induces a
homomorphism of the fundamental group, which we will also call p. Then we have the
following:

Theorem 14.5.5. Let K1 be a covering complex of K with covering map p. Then p(π(K1 )) is
a subgroup of π(K). Conversely, to each subgroup H of π(K), there is a covering complex
K1 with π(K1 ) = H. Hence, there is a one-to-one correspondence between subgroups of the
fundamental group of a complex K and covers of K.

We will see the analog of this theorem in regard to algebraic field extensions in
Chapter 15.
A topological space X is simply connected if π(X) = {1}. Hence, the covering com-
plex of K corresponding to the identity in π(K) is simply connected. This is called the
universal cover of K since it covers any other cover of K.
Based on Theorem 14.5.1, we get a very simple proof of the Nielsen–Schreier theo-
rem.

Theorem 14.5.6 (Nielsen–Schreier). Any subgroup of a free group is free.

Proof. Let F be a free group. Then F = π(K), where K is a connected graph. Let H be a
subgroup of F. Then H corresponds to a cover K1 of K. But a cover is also 1-dimensional;
hence, H = π(K1 ), where K1 is a connected graph. Therefore, H is also free.

The fact that a presentation of the fundamental group of a simplicial complex is
determined by its 2-skeleton works in the other direction as well. That is, given an arbitrary
presentation, there exists a 2-dimensional complex whose fundamental group has that
presentation. Essentially, given a presentation ⟨X; R⟩, we consider a wedge of circles
with cardinality |X|. We then paste on a 2-cell for each relator W in R bounded by the
path corresponding to the word W.

Theorem 14.5.7. Given an arbitrary presentation ⟨X; R⟩, there exists a connected 2-com-
plex K with π(K) = ⟨X; R⟩.

We note that the books by Rotman, see [43], and Fine, Moldenhauer, Rosenberger,
and Wienke, see [26], have significantly detailed and accessible descriptions of groups
and complexes. Cayley, and then Dehn, introduced for each group G a graph, now called
the Cayley graph, as a tool to apply complexes to the study of G. The Cayley graph is actually
tied to a presentation, and not to the group itself. Gromov reversed the procedure and
showed that by considering the geometry of the Cayley graph, one could get information
about the group. This led to the development of the theory of hyperbolic groups.
In the following, we need a special kind of generating systems for finitely presented
groups G = ⟨X; R⟩. Let S ⊂ G be a generating system for G. Then S is called a valid
generating system if it has the following two properties:
(a) 1 ∉ S where 1 is the neutral element of G.
(b) the set S is a symmetric generating system, that is, if γ ∈ S then also γ−1 ∈ S.

In the following, the pair (G, S) denotes a finitely presented group G together with a valid
generating system S. Given such a pair, we define a metric on G with respect to S in the
following way. Define $l_S : G \to [0, \infty)$ as follows: If γ ∈ G, then $l_S(\gamma) = 0$ if γ = 1,
and if γ ≠ 1, then $l_S(\gamma)$ is the minimal length of a word in elements of S that
represents γ. This length is also called the S-length.
We now define the desired metric $d_S : G \times G \to [0, \infty)$ via $d_S(\gamma_1, \gamma_2) = l_S(\gamma_1^{-1}\gamma_2)$ and
check that $d_S$ is indeed a metric:
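1. The equivalence $l_S(\gamma) = 0$ if and only if γ = 1 implies the equivalence $d_S(\gamma_1, \gamma_2) = 0$
   if and only if $\gamma_1 = \gamma_2$.
2. We have $d_S(\gamma_1, \gamma_2) = l_S(\gamma_1^{-1}\gamma_2) = l_S(\gamma_2^{-1}\gamma_1) = d_S(\gamma_2, \gamma_1)$, because S is symmetric.
3. We have $d_S(\gamma_1, \gamma_2) \le d_S(\gamma_1, \beta) + d_S(\beta, \gamma_2)$ for all $\gamma_1, \gamma_2, \beta \in G$, as $\gamma_1^{-1}\gamma_2 = \gamma_1^{-1}\beta\beta^{-1}\gamma_2$.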

We give the following remarks.


1. The metric structure on (G, S) depends on the choice of S. Say G = ℤ and S = {±1},
   then $d_S(0, 1) = 1$, and if S′ = {±2, ±3}, then $d_{S'}(0, 1) = 2$.
2. The metric structure on (G, S) is induced by the natural metric structure of the
   Cayley graph with respect to (G, S):
   The vertices are elements of G, and two vertices $\gamma_1$ and $\gamma_2$ are connected by an edge
   if and only if there exists a σ ∈ S with $\gamma_1\sigma = \gamma_2$. Since $\gamma_1 = \gamma_2\sigma^{-1}$ and $\sigma^{-1} \in S$, we get
   in fact an undirected graph called the Cayley graph with respect to (G, S).
If we parametrize in such a way that any edge of the Cayley graph of (G, S) has length
1, then the metric of (G, S) is induced from that of the Cayley graph of (G, S). Here
we extend the metric for the Cayley graph in the usual way for all pairs of points of
edges by transforming any edge to an interval of length 1. In this manner the Cayley
graph becomes a geodesic metric space. We always consider the Cayley graph in
this way which should not lead to misunderstandings. Any closed path represents
a relation. If G = ⟨X; R⟩ is finitely presented with 1 ∉ X then we may consider
S = X ∪ X −1 and may call (G, S) the Cayley graph of G without misunderstandings.
If we insert a 2-cell for any closed path in the Cayley graph then we obtain a simply
connected 2-dimensional complex, the Cayley complex.
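
The dependence of the word metric on the generating system, as in Remark 1, can be made concrete with a breadth-first search in the Cayley graph. The following Python sketch computes S-lengths in G = ℤ (written additively). Because the Cayley graph of ℤ is infinite, the search is restricted to integers of absolute value at most `bound`; this cutoff is our own assumption and is harmless only for targets that are small compared with it.

```python
from collections import deque

def word_length(target, S, bound=50):
    """S-length of target in the additive group Z.

    Breadth-first search from 0 in the Cayley graph of (Z, S), restricted
    to integers n with |n| <= bound. Returns None if target is not reached.
    """
    distance = {0: 0}
    queue = deque([0])
    while queue:
        n = queue.popleft()
        if n == target:
            return distance[n]
        for s in S:
            m = n + s
            if abs(m) <= bound and m not in distance:
                distance[m] = distance[n] + 1
                queue.append(m)
    return None

def word_metric(g1, g2, S):
    """d_S(g1, g2) = l_S(g1^{-1} g2), which is l_S(g2 - g1) in additive notation."""
    return word_length(g2 - g1, S)

S = (1, -1)
S_prime = (2, -2, 3, -3)
d_S = word_metric(0, 1, S)
d_S_prime = word_metric(0, 1, S_prime)
```

With S = {±1} the element 1 is a generator, whereas with S′ = {±2, ±3} it is not a generator but can be written as 3 − 2, which accounts for the two values in Remark 1.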

The construction of the Cayley graph depends on the choice of S, and so does the metric
on (G, S). We would like to have an equivalence relation that allows us to relate the
different metric spaces for G if we alter S.

Definition 14.5.8. Let (X, d) and (X′, d′) be metric spaces. Then (X, d) and (X′, d′) are
quasi-isometric if there are functions f : X → X′ and g : X′ → X together with constants
λ > 0 and C ≥ 0, such that
(a) d′(f(x), f(y)) ≤ λd(x, y) + C for all x, y ∈ X,
(b) d(g(x′), g(y′)) ≤ λd′(x′, y′) + C for all x′, y′ ∈ X′,
(c) d(g(f(x)), x) ≤ C for all x ∈ X, and
(d) d′(f(g(x′)), x′) ≤ C for all x′ ∈ X′.

Theorem 14.5.9. Quasi-isometry is an equivalence relation in the class of metric spaces.

Proof. Of course quasi-isometry is reflexive and symmetric. We show transitivity. Let
(X, d) and (X′, d′) as well as (X′, d′) and (X′′, d′′) be quasi-isometric. Thus we have
functions $f : X \to X'$, $g : X' \to X$ and $f' : X' \to X''$, $g' : X'' \to X'$, and constants λ, C and
λ′, C′, respectively, such that the conditions (a)–(d) are satisfied. We look for functions
$f'' : X \to X''$ and $g'' : X'' \to X$ and constants λ′′, C′′, such that conditions (a)–(d) are
satisfied again. Set $f'' = f' \circ f$ and $g'' = g \circ g'$, $\lambda'' = \lambda\lambda'$ and
$C'' = 2C + 2C' + \lambda'C + \lambda C'$. We check the conditions step by step:


(a) Let x, y ∈ X. Then

\[
\begin{aligned}
d''(f''(x), f''(y)) &= d''(f'(f(x)), f'(f(y))) \\
&\le \lambda' d'(f(x), f(y)) + C' \\
&\le \lambda'(\lambda d(x, y) + C) + C' \\
&= \lambda'\lambda d(x, y) + \lambda' C + C' \\
&\le \lambda'' d(x, y) + C''.
\end{aligned}
\]

(b) This is analogous.

(c) Let x ∈ X. Then (according to our assumption)

\[ d'(g' \circ f'(f(x)), f(x)) \le C', \]

and hence, because of (b),

\[ d(g(g' \circ f'(f(x))), g(f(x))) \le \lambda C' + C, \]

which gives

\[
\begin{aligned}
d(g'' \circ f''(x), x) &\le d(g'' \circ f''(x), g(f(x))) + d(g(f(x)), x) \\
&\le (\lambda C' + C) + C = \lambda C' + 2C \\
&\le C''.
\end{aligned}
\]

(d) This is analogous.

Theorem 14.5.10. Let G be a finitely presented group with finite valid generating
systems S and S′. Then the metric spaces (G, S) and (G, S′) are quasi-isometric.

Proof. We look for suitable f, g, λ, and C. Take $f = \mathrm{id}_{(G,S)}$, $g = \mathrm{id}_{(G,S')}$, C = 0, and
$\lambda = \max(\{l_{S'}(\gamma) : \gamma \in S\} \cup \{l_S(\gamma') : \gamma' \in S'\})$.
We verify condition (a). Let x, y ∈ (G, S). Then

\[ d_{S'}(f(x), f(y)) = l_{S'}(f(x)^{-1}f(y)) = l_{S'}(x^{-1}y). \]

Our definition of λ gives $l_{S'}(x^{-1}y) \le \lambda\, l_S(x^{-1}y)$ because, if we write $x^{-1}y$ as a
product of elements of S with length k, then we can surely write $x^{-1}y$ as a product (of
elements of S′) of length ≤ λk. Hence

\[ d_{S'}(f(x), f(y)) \le \lambda\, l_S(x^{-1}y) = \lambda\, d_S(x, y) + C. \]

The proof of (b) is analogous and that of (c) and (d) is obvious because f and g are
inverse to each other.

We observe: The quasi-isometry class of the metric spaces for (G, S), S finite, is an
invariant of the group G and does not depend on the finite generating set S.
We ask: Is this invariant suitable in order to study group theoretical properties of G
and to what extent does quasi-isometry preserve group theoretic properties?
We call two finitely presented groups G1 and G2 quasi-isometric, if the metric spaces
for (G1 , S1 ), S1 a valid generating set for G1 , and (G2 , S2 ), S2 a valid generating set for G2 ,
are quasi-isometric.
Aiming at the motivation of hyperbolic groups we first have to describe a hyperbolic
metric space.

Definition 14.5.11. Let (X, d) be a metric space.

1. Let $x_0, x_1 \in X$ with $a = d(x_0, x_1)$. A geodesic segment in X starting at $x_0$ and ending at
   $x_1$ is an isometry g : [0, a] → X with $g(0) = x_0$ and $g(a) = x_1$ (recall that an isometry is
   by definition distance preserving). We say that X is a geodesic space if for all $x_0, x_1 \in X$
   there is a geodesic segment in X starting at $x_0$ and ending at $x_1$.
2. A geodesic triangle in X with x, y, z ∈ X as vertices is the union of three geodesic
   segments with (pairwise) x, y and z as end points.

Note that the definition explicitly allows degenerate triangles; for instance, take
y = z with different geodesic segments from x to y and from x to z.
An example of a geodesic space is the Cayley graph for a finitely presented group. If
the Cayley graph is not a tree, then it contains a circle (or embedded loop). Hence, there
may be more than one geodesic segment between the same pair of points.
We fix the following notation: Let $x_0, x_1 \in X$ for a geodesic space X. Although several
geodesic segments in X with start point $x_0$ and end point $x_1$ are allowed, we denote by
$[x_0, x_1]$ a given geodesic segment with $x_0$ and $x_1$ as start and end points.

Definition 14.5.12. Let δ ≥ 0. We say that a geodesic space X satisfies the Rips condition
for the constant δ if for every geodesic triangle [x, y] ∪ [y, z] ∪ [z, x] in X and for every
u ∈ [x, y] the following holds: d(u, [y, z] ∪ [z, x]) ≤ δ, see Figure 14.1. We call a geodesic
space X hyperbolic if it satisfies the Rips condition for a constant δ ≥ 0.

Figure 14.1: Geodesic triangle.

Theorem 14.5.13. Let X1 and X2 be geodesic spaces that are quasi-isometric. If X1 is hy-
perbolic then also X2 is hyperbolic.

A proof is given in [26]. Hence, quasi-isometries respect hyperbolicity.

Definition 14.5.14. Let Γ be a group of finite type. Γ is called a hyperbolic group if there is
a finite generating system S such that the metric space for (Γ, S), or the Cayley graph
for (Γ, S), is a hyperbolic space.

According to Theorem 14.5.13 the definition of a hyperbolic group is independent of
the choice of a finite generating system S.

Theorem 14.5.15.
1. Let $G_1$ be a subgroup of a hyperbolic group $G_2$ with finite index. Then $G_1$ is also
   hyperbolic.
2. Let $1 \to \Delta \to G_1 \to G_2 \to 1$ be a short exact sequence with Δ finite and $G_2$ hyperbolic,
   that is, $G_2 \cong G_1/\Delta$. Then $G_1$ is also hyperbolic.

A proof is given in [26]. Hyperbolic groups have many other important properties
(see, for instance, [26]). We end this section with a collection of examples of hyperbolic
groups.

Example 14.5.16. The following groups are hyperbolic. For proofs see [26].
1. Finite groups and infinite cyclic groups.
2. Fundamental groups of compact, connected Riemannian manifolds of negative sectional
   curvature. Especially, cocompact Fuchsian and Kleinian groups.
3. One-relator groups with torsion.
4. Free products of finitely many hyperbolic groups, see Section 14.8.
5. A group G of F-type is a group with a presentation

   \[ G = \langle a_1, \ldots, a_n;\ a_1^{r_1} = \cdots = a_n^{r_n} = u(a_1, \ldots, a_p)\, v(a_{p+1}, \ldots, a_n) = 1 \rangle \]

   where n ≥ 2, $r_i = 0$ or $r_i \ge 2$, 1 ≤ p ≤ n − 1, $u(a_1, \ldots, a_p)$ is a cyclically reduced word
   in the free product on $a_1, \ldots, a_p$ which is of infinite order, and $v(a_{p+1}, \ldots, a_n)$ is a
   cyclically reduced word in the free product on $a_{p+1}, \ldots, a_n$ which is of infinite order
   (see Section 14.8). The group G is hyperbolic unless $u(a_1, \ldots, a_p)$ is a proper power
   or a product of two elements of order 2 and $v(a_{p+1}, \ldots, a_n)$ also is a proper power or
   a product of two elements of order 2.
   Especially, free groups of finite rank, oriented surface groups of genus g ≥ 2
   and nonoriented surface groups of genus g ≥ 3 are of F-type. We remark that
   a group of F-type is hyperbolic if and only if it has a faithful representation in
   PSL(2, ℝ).

14.6 Presentations of Factor Groups


Let G be a group with a presentation G = ⟨X; R⟩. Suppose that H is a factor group of G;
that is, H ≅ G/N for some normal subgroup N of G. We show that a presentation for H
is then H = ⟨X; R ∪ R1 ⟩, where R1 is a, perhaps additional, system of relators.

Theorem 14.6.1 (Dyck’s theorem). Let G = ⟨X; R⟩, and suppose that H ≅ G/N, where N is
a normal subgroup of G. Then a presentation for H is $\langle X; R \cup R_1 \rangle$ for some set of words $R_1$
on X. Conversely, the presentation $\langle X; R \cup R_1 \rangle$ defines a group that is a factor group of G.

Proof. Since each element of H is a coset of N, they have the form gN for g ∈ G. It is clear
then that the images of X generate H. Furthermore, since H is a homomorphic image of
G, each relator in R is a relator in H. Let N1 be a set of elements that generate N, and
let R1 be the corresponding words in the free group on X. Then R1 is an additional set of
relators in H. Hence, R ∪ R1 is a set of relators for H. Any relator in H is either a relator
in G, hence a consequence of R, or can be realized as an element of G that lies in N, and
therefore a consequence of R1 . Therefore, R ∪ R1 is a complete set of defining relators for
H, and H has the presentation H = ⟨X; R ∪ R1 ⟩.
Conversely, let G = ⟨X; R⟩ and $G_1 = \langle X; R \cup R_1 \rangle$. Then $G = F(X)/N_1$, where $N_1 = N(R)$, and
$G_1 = F(X)/N_2$, where $N_2 = N(R \cup R_1)$. Hence, $N_1 \subset N_2$. The normal subgroup $N_2/N_1$ of
$F(X)/N_1$ corresponds to a normal subgroup H of G, and therefore, by the isomorphism
theorem,

\[ G/H \cong (F(X)/N_1)/(N_2/N_1) \cong F(X)/N_2 \cong G_1. \]

14.7 Decision Problems


We have seen that given any group G, there exists a presentation for it, G = ⟨X; R⟩. In
the other direction, given any presentation ⟨X; R⟩, we have seen that there is a group
with that presentation. In principle, every question about a group can be answered via
a presentation. However, things are not that simple. Max Dehn in his pioneering work
on combinatorial group theory about 1910 introduced the following three fundamental
group decision problems:
(1) Word Problem: Suppose G is a group given by a finite presentation. Is there an algo-
rithm to determine if an arbitrary word w in the generators of G defines the identity
element of G?
(2) Conjugacy Problem: Suppose G is a group given by a finite presentation. Is there
an algorithm to determine if an arbitrary pair of words u, v in the generators of G
define conjugate elements of G?
(3) Isomorphism Problem: Is there an algorithm to determine, given two arbitrary finite
presentations, whether the groups they present are isomorphic or not?

All three of these problems have negative answers in general. That is, for each of these
problems one can find a finite presentation, for which these questions cannot be an-
swered algorithmically (see [36]). Attempts for solutions, and for solutions in restricted
cases, have been of central importance in combinatorial group theory. For this reason
combinatorial group theory has always searched for and studied classes of groups, in
which these decision problems are solvable.
For finitely generated free groups, there are simple and elegant solutions to all three
problems. If F is a free group on x1 , . . . , xn and W is a freely reduced word in x1 , . . . , xn ,
then W ≠ 1 if and only if L(W ) ≥ 1 for L(W ) the length of W . Since freely reducing
any word to a freely reduced word is algorithmic, this provides a solution to the word
problem. Furthermore, a freely reduced word $W = x_{v_1}^{e_1} x_{v_2}^{e_2} \cdots x_{v_n}^{e_n}$ is cyclically reduced if
v1 ≠ vn , or if v1 = vn , then e1 ≠ −en . Clearly then, every element of a free group is
conjugate to an element given by a cyclically reduced word, called a cyclic reduction.
This leads to a solution to the conjugacy problem. Suppose V and W are two words in
the generators of F, and $\overline{V}$, $\overline{W}$ are respective cyclic reductions. Then V is conjugate to W
if and only if $\overline{V}$ is a cyclic permutation of $\overline{W}$. Finally, two finitely generated free groups
are isomorphic if and only if they have the same rank.
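The solutions to the word and conjugacy problems in a free group described above can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: a letter $x_v^{e}$ is encoded as a pair (v, e) with e = ±1, and the function names are hypothetical, not taken from the text.

```python
# A word is a list of letters; the letter x_v^e is the pair (v, e) with e = +1 or -1.

def free_reduce(word):
    """Cancel adjacent pairs x^e x^(-e) until none remain."""
    stack = []
    for gen, exp in word:
        if stack and stack[-1] == (gen, -exp):
            stack.pop()              # the new letter cancels the previous one
        else:
            stack.append((gen, exp))
    return stack

def cyclic_reduce(word):
    """Freely reduce, then strip matching inverse letters from both ends."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == (w[-1][0], -w[-1][1]):
        w = w[1:-1]
    return w

def is_identity(word):
    """Word problem: W = 1 if and only if its free reduction has length 0."""
    return len(free_reduce(word)) == 0

def are_conjugate(u, v):
    """Conjugacy problem: compare cyclic reductions up to cyclic permutation."""
    cu, cv = cyclic_reduce(u), cyclic_reduce(v)
    if len(cu) != len(cv):
        return False
    if not cu:
        return True
    return any(cu[i:] + cu[:i] == cv for i in range(len(cu)))

x, X, y, Y = (1, 1), (1, -1), (2, 1), (2, -1)
print(is_identity([x, y, Y, X]))         # x y y^-1 x^-1
print(are_conjugate([x, y, X], [y]))     # x y x^-1 versus y
print(are_conjugate([x], [y]))
```

The stack-based reduction runs in time linear in the length of the word, which makes the remark that free reduction "is algorithmic" concrete.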

14.8 Group Amalgams: Free Products and Direct Products


Closely related to free groups in both form and properties are free products of groups.
Let A = ⟨a1 , . . . ; R1 , . . .⟩ and B = ⟨b1 , . . . ; S1 , . . .⟩ be two groups. We consider A and B to be
disjoint. Then we have the following:

Definition 14.8.1. The free product of A and B, denoted by A ∗ B, is the group G with
the presentation ⟨a1 , . . . , b1 , . . . ; R1 , . . . , S1 , . . .⟩; that is, the generators of G consist of the
disjoint union of the generators of A and B with relators taken as the disjoint union of
the relators Ri of A and Sj of B. A and B are called the factors of G.

In an analogous manner, the concept of a free product can be extended to an arbitrary
collection of groups.

Definition 14.8.2. If Aα = ⟨gens Aα ; rels Aα ⟩, α ∈ ℐ , is a collection of groups, then their
free product G = ∗Aα is the group whose generators consist of the disjoint union of the
generators of the Aα , and whose relators are the disjoint union of the relators of the Aα .

Free products exist and are nontrivial. In that regard, we have the following:

Theorem 14.8.3. Let G = A ∗ B. Then the maps A → G and B → G are injections. The
subgroup of G generated by the generators of A has the presentation ⟨generators of A;
relators of A⟩, that is, is isomorphic to A. Similarly for B. Thus, A and B can be considered
as subgroups of G. In particular, A ∗ B is nontrivial if A and B are.

Free products share many properties with free groups. First of all there is a categor-
ical formulation of free products. Specifically we have the following:

Theorem 14.8.4. A group G is the free product of its subgroups A and B if A and B generate
G, and given homomorphisms f1 : A → H, f2 : B → H into a group H, there exists a unique
homomorphism f : G → H, extending f1 and f2 .

Secondly, each element of a free product has a normal form related to the reduced
words of free groups. If G = A ∗ B, then a reduced sequence or reduced word in G is a
sequence g1 g2 . . . gn , n ≥ 0, with gi ≠ 1, each gi in either A or B and gi , gi+1 not both in the
same factor. Then the following hold:

Theorem 14.8.5. Each element g ∈ G = A ∗ B has a unique representation as a reduced
sequence. The length n is unique and is called the syllable length. The case n = 0 is reserved
for the identity.

A reduced word g1 . . . gn ∈ G = A ∗ B is called cyclically reduced if either n ≤ 1 or
n ≥ 2 and g1 and gn are from different factors. Certainly, every element of G is conjugate
to a cyclically reduced word.
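The reduction to a reduced sequence can be sketched as follows. As a concrete assumption made only for this illustration, we take A = ⟨x; x^2⟩ and B = ⟨y; y^3⟩, so a syllable is a pair (factor, exponent) with the exponent read modulo the order of the factor; the names `ORDERS`, `reduce_sequence`, and so on are hypothetical.

```python
# Assumed example: A is cyclic of order 2, B is cyclic of order 3.
ORDERS = {"A": 2, "B": 3}

def reduce_sequence(syllables):
    """Merge neighbouring syllables from the same factor and drop trivial ones."""
    stack = []
    for factor, exp in syllables:
        exp %= ORDERS[factor]
        if exp == 0:
            continue                          # trivial syllable, skip it
        if stack and stack[-1][0] == factor:
            merged = (stack[-1][1] + exp) % ORDERS[factor]
            stack.pop()
            if merged != 0:
                stack.append((factor, merged))
        else:
            stack.append((factor, exp))
    return stack

def syllable_length(syllables):
    return len(reduce_sequence(syllables))

def is_cyclically_reduced(syllables):
    """n <= 1, or the first and last syllables lie in different factors."""
    w = reduce_sequence(syllables)
    return len(w) <= 1 or w[0][0] != w[-1][0]

print(reduce_sequence([("A", 1), ("B", 1), ("B", 2), ("A", 1)]))   # x y y^2 x
print(syllable_length([("A", 1), ("B", 1), ("A", 1), ("B", 2)]))   # x y x y^2
print(is_cyclically_reduced([("A", 1), ("B", 1), ("A", 1)]))       # x y x
```

For general factors one would replace the arithmetic modulo the order by a solution of the word problem in each factor; the stack logic stays the same.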
From this, we obtain several important properties of free products, which are anal-
ogous to properties in free groups.

Theorem 14.8.6. An element of finite order in a free product is conjugate to an element of
finite order in a factor. In particular, a finite subgroup of a free product is entirely contained
in a conjugate of a factor.

Theorem 14.8.7. If two elements of a free product commute, then they are both powers
of a single element or are contained in a conjugate of an Abelian subgroup of a factor.

Finally, a theorem of Kurosh extends the Nielsen–Schreier theorem to free products.

Theorem 14.8.8 (Kurosh). A subgroup of a free product is also a free product. Explicitly,
if G = A ∗ B and H ⊂ G, then

H = F ∗ (∗Aα ) ∗ (∗Bβ ),

where F is a free group, (∗Aα ) is a free product of conjugates of subgroups of A, and (∗Bβ )
is a free product of conjugates of subgroups of B.

We note that the rank of F and the number of the other factors can be computed.
A complete discussion of these is in [37], [36] and [21].
If A and B are disjoint groups, then we now have two types of products forming new
groups out of them: the free product and the direct product. In both these products, the
original factors inject. In the free product, there are no relations between elements of A
and elements of B, whereas in a direct product, each element of A commutes with each
element of B. If a ∈ A and b ∈ B, a cross commutator is $[a, b] = aba^{-1}b^{-1}$. The direct
product is a factor group of the free product, and the kernel is precisely the normal
subgroup generated by all the cross commutators.

Theorem 14.8.9. Suppose that A and B are disjoint groups. Then

A × B = (A ∗ B)/H,

where H is the normal closure in A ∗ B of all the cross commutators. In particular, a
presentation for A × B is given by

A × B = ⟨gens A, gens B; rels A, rels B, [a, b] for all a ∈ A, b ∈ B⟩.

This coincides with the concept in Section 10.3.

14.9 Exercises
1. Let $X^{-1}$ be a set disjoint from X, but bijective to X. A word in X is a finite sequence
of letters from the alphabet $X \cup X^{-1}$. That is, a word has the form

$w = x_{i_1}^{\epsilon_{i_1}} x_{i_2}^{\epsilon_{i_2}} \cdots x_{i_n}^{\epsilon_{i_n}}$,

where $x_{i_j} \in X$, and $\epsilon_{i_j} = \pm 1$. Let W (X) be the set of all words on X.
If w1 , w2 ∈ W (X), we say that w1 is equivalent to w2 , denoted by w1 ∼ w2 , if w1 can be
converted to w2 by a finite string of insertions and deletions of trivial words. Verify
that this is an equivalence relation on W (X).
2. In F(X), let N(X) be the subgroup generated by all squares in F(X); that is,

$N(X) = \langle \{ g^2 : g \in F(X) \} \rangle$.

Show that N(X) is a normal subgroup, and that the factor group F(X)/N(X) is
Abelian, where every nontrivial element has order 2.
3. Show that a free group F is torsion-free.
4. Let F be a free group, and a, b ∈ F. Show: If $a^k = b^k$, k ≠ 0, then a = b.
5. Let F = ⟨a, b; ⟩ be a free group with basis {a, b}. Let $c_i = a^{-i} b a^{i}$, i ∈ ℤ. Show that
then G = ⟨ci , i ∈ ℤ⟩ is free with basis {ci | i ∈ ℤ}.
6. Show that $\langle x, y;\ x^2 y^3,\ x^3 y^4 \rangle \cong \langle x;\ x \rangle = \{1\}$.
7. Let $G = \langle v_1, \ldots, v_n;\ v_1^2 \cdots v_n^2 \rangle$, n ≥ 1, and $\alpha : G \to \mathbb{Z}_2$ be the epimorphism with
α(vi ) = −1 for all i. Let U be the kernel of α. Show that then U has a presentation

$U = \langle x_1, \ldots, x_{n-1}, y_1, \ldots, y_{n-1};\ y_1 x_1 \cdots y_{n-1} x_{n-1} y_{n-1}^{-1} x_{n-1}^{-1} \cdots y_1^{-1} x_1^{-1} \rangle$.

8. Let $M = \langle x, y;\ x^2, y^3 \rangle \cong \mathrm{PSL}(2, \mathbb{Z})$ be the modular group. Let M ′ be the commutator
subgroup. Show that M ′ is a free group of rank 2 with a basis $\{[x, y], [x, y^2]\}$.
15 Finite Galois Extensions
15.1 Galois Theory and the Solvability of Polynomial Equations
As we mentioned in Chapter 1, one of the origins of abstract algebra was the problem
of trying to determine a formula for finding the solutions in terms of radicals of a fifth
degree polynomial. It was proved first by Ruffini in 1800 and then by Abel that, in
general, it is impossible to find a formula in terms of radicals for such a solution. Around
1830, Galois extended this and showed that such a formula is impossible for any degree
five or greater. In proving this, he laid the groundwork for much of the development
of modern abstract algebra, especially field theory and finite group theory. One of the
goals of this book has been to present a comprehensive treatment of Galois theory and
a proof of the results mentioned above. At this point, we have covered enough general
algebra and group theory to discuss Galois extensions and general Galois theory.
In modern terms, Galois theory is that branch of mathematics, which deals with the
interplay of the algebraic theory of fields, the theory of equations, and finite group the-
ory. This theory was introduced by Evariste Galois about 1830 in his study of the insolv-
ability by radicals of quintic (degree 5) polynomials, a result proved somewhat earlier
by Ruffini, and independently by Abel. Galois was the first to see the close connection
between field extensions and permutation groups. In doing so, he initiated the study of
finite groups. He was the first to use the term group as an abstract concept, although his
definition was really just for a closed set of permutations.
The method Galois developed not only facilitated the proof of the insolvability of the
quintic and higher powers, but led to other applications, and to a much larger theory.
The main idea of Galois theory is to associate to certain special types of algebraic
field extensions called Galois extensions, a group called the Galois group. The properties
of the field extension will be reflected in the properties of the group, which are some-
what easier to examine. Thus, for example, solvability by radicals can be translated into
solvability of groups, which was discussed in Chapter 12. Showing that for every degree
five or greater, there exists a polynomial whose splitting field has a Galois group that is not
solvable proves that there cannot be a general formula for solvability by radicals.
The tie-in to the theory of equations is as follows: If f (x) = 0 is a polynomial equation
over some field K, we can form the splitting field L of f (x) over K. This is usually a Galois extension,
and therefore has a Galois group called the Galois group of the equation. As before, prop-
erties of this group will reflect properties of this equation.

15.2 Automorphism Groups of Field Extensions


To define the Galois group, we must first consider the automorphism group of a field ex-
tension. In this section, K, L, M will always be (commutative) fields with additive iden-
tity 0 and multiplicative identity 1.

https://doi.org/10.1515/9783111142524-015

Definition 15.2.1. Let L|K be a field extension. Then the set

Aut(L|K) = {α ∈ Aut(L) : α|K = the identity on K}

is called the set of automorphisms of L over K. Notice that if α ∈ Aut(L|K), then α(k) = k
for all k ∈ K.

Lemma 15.2.2. Let L|K be a field extension. Then Aut(L|K) forms a group called the Galois
group of L|K.

Proof. Aut(L|K) ⊂ Aut(L). Hence, to show that Aut(L|K) is a group, we only have to show
that it is a subgroup of Aut(L). Now the identity map on L is certainly the identity map on
K, so 1 ∈ Aut(L|K); hence, Aut(L|K) is nonempty. If α, β ∈ Aut(L|K), then consider $\alpha^{-1}\beta$.
If k ∈ K, then β(k) = k, and α(k) = k, so $\alpha^{-1}(k) = k$.
Therefore, $\alpha^{-1}\beta(k) = k$ for all k ∈ K, and hence $\alpha^{-1}\beta \in \operatorname{Aut}(L|K)$. It follows that
Aut(L|K) is a subgroup of Aut(L), and therefore a group.

If f (x) ∈ K[x] \ K and L is the splitting field of f (x) over K, then Aut(L|K) is also
called the Galois group of f (x).

Theorem 15.2.3. If P is the prime field of L, then Aut(L|P) = Aut(L).

Proof. We must show that every automorphism of L restricts to the identity on the prime
field P. Now if α ∈ Aut(L), then α(1) = 1, and so α(n ⋅ 1) = n ⋅ 1 for every integer n.
Therefore, in P, α fixes all integer multiples of the identity. However, every element of P
can be written as a quotient $\frac{m \cdot 1}{n \cdot 1}$ of integer multiples of the identity.
Since α is a field homomorphism and α fixes both the numerator and the denominator,
it follows that α will fix every element of this form, and hence fix each element of P.

For splitting fields, the Galois group is a permutation group on the zeros of the defin-
ing polynomial.

Theorem 15.2.4. Let f (x) ∈ K[x] and L the splitting field of f (x) over K. Suppose that f (x)
has zeros α1 , . . . , αn ∈ L.
(a) Then each ϕ ∈ Aut(L|K) is a permutation on the zeros. In particular, Aut(L|K) is
isomorphic to a subgroup of Sn and uniquely determined by the zeros of f (x).
(b) If f (x) is irreducible, then Aut(L|K) operates transitively on {α1 , . . . , αn }. Hence, for
each i, j, there is a ϕ ∈ Aut(L|K) such that ϕ(αi ) = αj .
(c) If f (x) = b(x − α1 ) ⋅ ⋅ ⋅ (x − αn ) with α1 , . . . , αn pairwise distinct and Aut(L|K) operates
transitively on α1 , . . . , αn , then f (x) is irreducible.

Proof. For the proofs, we use the results of Chapter 8.
For (a), let ϕ ∈ Aut(L|K). Then, from Theorem 8.1.5, we obtain that ϕ permutes the
zeros α1 , . . . , αn . Hence, the restriction $\phi|_{\{\alpha_1, \ldots, \alpha_n\}}$ lies in Sn . This map then defines a homomorphism

$\tau : \operatorname{Aut}(L|K) \to S_n$ by $\tau(\phi) = \phi|_{\{\alpha_1, \ldots, \alpha_n\}}$.

Furthermore, ϕ is uniquely determined by the images ϕ(αi ). It follows that τ is a
monomorphism.
We now prove (b). If f (x) is irreducible, then Aut(L|K) operates transitively on the
set {α1 , . . . , αn }, again following from Theorem 8.1.5.
Finally, for (c), suppose that f (x) = b(x − α1 ) ⋅ ⋅ ⋅ (x − αn ) with α1 , . . . , αn distinct and
Aut(L|K) operates transitively on α1 , . . . , αn . Now, assume that f (x) = g(x)h(x) with
g(x), h(x) ∈ K[x] \ K. Without loss of generality, let α1 be a zero of g(x) and αn be a zero
of h(x). Since the zeros of f (x) are pairwise distinct, αn is not a zero of g(x).
Let ϕ ∈ Aut(L|K) with ϕ(α1 ) = αn . Since ϕ fixes the coefficients of g(x), we have
ϕ(g(x)) = g(x); that is, ϕ(α1 ) = αn is a zero of g(x), which gives a contradiction.
Therefore, f (x) must be irreducible.

Example 15.2.5. Let $f(x) = (x^2 - 2)(x^2 - 3) \in \mathbb{Q}[x]$. The field L = ℚ(√2, √3) is the splitting
field of f (x).
Over L, we have

f (x) = (x + √2)(x − √2)(x + √3)(x − √3).

We want to determine the Galois group Aut(L|ℚ) = Aut(L) = G.

Lemma 15.2.6. The Galois group G above is the Klein 4-group.

Proof. First, we show that |Aut(L)| ≤ 4. Let α ∈ Aut(L). Then α is uniquely determined
by α(√2) and α(√3), and

$2 = \alpha(2) = \alpha((\sqrt{2})^2) = (\alpha(\sqrt{2}))^2$.

Hence, α(√2) = ±√2. Analogously, α(√3) = ±√3. From this it follows that |Aut(L)| ≤ 4.
Furthermore, $\alpha^2 = 1$ for any α ∈ G.
Next we show that the polynomial $g(x) = x^2 - 3$ is irreducible over K = ℚ(√2).
Assume that $x^2 - 3$ were reducible over K. Then √3 ∈ K. This implies that $\sqrt{3} = \frac{a}{b} + \frac{c}{d}\sqrt{2}$
with a, b, c, d ∈ ℤ and b ≠ 0 ≠ d, and gcd(c, d) = 1. Then $bd\sqrt{3} = ad + bc\sqrt{2}$, hence
$3b^2d^2 = a^2d^2 + 2b^2c^2 + 2\sqrt{2}\,adbc$. Since √2 is irrational and bd ≠ 0, this implies that we must have ac = 0.
If c = 0, then $\sqrt{3} = \frac{a}{b} \in \mathbb{Q}$, a contradiction. If a = 0, then $\sqrt{3} = \frac{c}{d}\sqrt{2}$, which implies
$3d^2 = 2c^2$. It follows from this that 3 | gcd(c, d) = 1, again a contradiction.
Hence $g(x) = x^2 - 3$ is irreducible over K = ℚ(√2).
Since L is the splitting field of g(x) over K and g(x) is irreducible over K, there exists
an automorphism α ∈ Aut(L) with α(√3) = −√3 and α|K = IK ; that is, α(√2) = √2.
Analogously, there is a β ∈ Aut(L) with β(√2) = −√2 and β(√3) = √3.
Clearly, α ≠ β, αβ = βα and α ≠ αβ ≠ β. It follows that Aut(L) = {1, α, β, αβ}, complet-
ing the proof.
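The four automorphisms can be checked by direct computation. The following sketch makes assumptions of its own: an element a + b√2 + c√3 + d√6 of L is stored as the tuple (a, b, c, d), the function names are hypothetical, and multiplicativity is only tested on a finite set of sample elements rather than proved.

```python
from fractions import Fraction
from itertools import product

# The element a + b*sqrt(2) + c*sqrt(3) + d*sqrt(6) is stored as (a, b, c, d).

def mul(s, t):
    """Multiply two elements, using sqrt2*sqrt3 = sqrt6, sqrt2*sqrt6 = 2*sqrt3, sqrt3*sqrt6 = 3*sqrt2."""
    a1, b1, c1, d1 = s
    a2, b2, c2, d2 = t
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,
    )

def identity(t):
    return t

def alpha(t):            # sqrt(3) -> -sqrt(3), sqrt(2) fixed
    a, b, c, d = t
    return (a, b, -c, -d)

def beta(t):             # sqrt(2) -> -sqrt(2), sqrt(3) fixed
    a, b, c, d = t
    return (a, -b, c, -d)

def alpha_beta(t):
    return alpha(beta(t))

G = {"1": identity, "alpha": alpha, "beta": beta, "alpha*beta": alpha_beta}

samples = [tuple(Fraction(n, 3) for n in v) for v in product((-1, 0, 2), repeat=4)]

def is_multiplicative(sigma):
    return all(sigma(mul(s, t)) == mul(sigma(s), sigma(t)) for s in samples for t in samples)

for name, sigma in G.items():
    squares_to_one = all(sigma(sigma(s)) == s for s in samples)
    print(name, is_multiplicative(sigma), squares_to_one)

# Each map acts by signs on the basis 1, sqrt2, sqrt3, sqrt6, so the fix field
# is spanned by the basis vectors that all four maps leave unchanged.
BASIS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print([e for e in BASIS if all(sigma(e) == e for sigma in G.values())])
```

Every non-identity element has order 2 and the maps commute, which is exactly the structure of the Klein 4-group.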

15.3 Finite Galois Extensions


We now define (finite) Galois extensions. First, we introduce the concept of a fix field.
Let K be a field and G a subgroup of Aut(K). Define the set

Fix(K, G) = {k ∈ K : g(k) = k for all g ∈ G}.

Theorem 15.3.1. For a subgroup G ⊂ Aut(K), the set Fix(K, G) is a subfield of K called the fix field
of G over K.

Proof. 1 ∈ K is in Fix(K, G), so Fix(K, G) is not empty. Let k1 , k2 ∈ Fix(K, G), and let g ∈ G.
Then g(k1 ± k2 ) = g(k1 ) ± g(k2 ) since g is an automorphism.
Then g(k1 ) ± g(k2 ) = k1 ± k2 , and it follows that k1 ± k2 ∈ Fix(K, G). In an analogous
manner, $k_1 k_2^{-1} \in \operatorname{Fix}(K, G)$ if k2 ≠ 0; therefore, Fix(K, G) is a subfield of K.

Using the concept of a fix field, we define a finite Galois extension.

Definition 15.3.2. The extension L|K is a (finite) Galois extension if there exists a finite
subgroup G ⊂ Aut(L) such that K = Fix(L, G).

We now give some examples of finite Galois extensions:

Lemma 15.3.3. Let L = ℚ(√2, √3) and K = ℚ. Then L|K is a Galois extension.

Proof. Let G = Aut(L|K). From the example in the previous section, there are automor-
phisms α, β ∈ G with

α(√3) = −√3, α(√2) = √2 and β(√2) = −√2, β(√3) = √3.

We have

ℚ(√2, √3) = {c + d √3 : c, d ∈ ℚ(√2)}.

Let t = a1 + b1 √2 + (a2 + b2 √2)√3 ∈ Fix(L, G).


Then applying β, we have

t = β(t) = a1 − b1 √2 + (a2 − b2 √2)√3.

It follows that b1 + b2 √3 = 0; that is, b1 = b2 = 0 since √3 ∉ ℚ. Therefore, t = a1 + a2 √3.


Applying α, we have α(t) = a1 − a2 √3, and hence a2 = 0. Therefore, t = a1 ∈ ℚ. Hence
ℚ = Fix(L, G), and L|K is a Galois extension.
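Lemma 15.3.4. Let $L = \mathbb{Q}(2^{1/4})$ and K = ℚ. Then L|K is not a Galois extension.

Proof. Suppose that α ∈ Aut(L) and $a = 2^{1/4}$. Then a is a zero of $x^4 - 2$, and hence α(a) is
one of the four zeros $2^{1/4}$, $-2^{1/4}$, $i\,2^{1/4}$, $-i\,2^{1/4}$ of $x^4 - 2$. The last two do not lie in L
since i ∉ L. Hence, α(a) = ±a. In particular, $\alpha(\sqrt{2}) = \alpha(a^2) = a^2 = \sqrt{2}$; therefore,

Fix(L, Aut(L)) = ℚ(√2) ≠ ℚ.

Since Fix(L, G) ⊃ Fix(L, Aut(L)) for every subgroup G ⊂ Aut(L), no subgroup of Aut(L) has
fix field ℚ.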



15.4 The Fundamental Theorem of Galois Theory


We now state the fundamental theorem of Galois theory. This theorem describes the
interplay between the Galois group and Galois extensions. In particular, the result ties
together subgroups of the Galois group and intermediate fields between L and K.

Theorem 15.4.1 (Fundamental theorem of Galois theory). Let L|K be a Galois extension
with Galois group G = Aut(L|K). For each intermediate field E, let τ(E) be the subgroup
of G fixing E. Then the following hold:
(1) τ is a bijection between intermediate fields containing K and subgroups of G.
(2) L|K is a finite extension, and if M is an intermediate field, then |L : M| = |Aut(L|M)|
and |M : K| = |Aut(L|K) : Aut(L|M)|.
(3) If M is an intermediate field, then the following hold:
(a) L|M is always a Galois extension.
(b) M|K is a Galois extension if and only if Aut(L|M) is a normal subgroup of
Aut(L|K).
(4) If M is an intermediate field and M|K is a Galois extension we have the following:
(a) α(M) = M for all α ∈ Aut(L|K).
(b) The map ϕ : Aut(L|K) → Aut(M|K) with ϕ(α) = α|M = β is an epimorphism.
(c) Aut(M|K) = Aut(L|K)/ Aut(L|M).
(5) The lattice of subfields of L containing K is the inverted lattice of subgroups of
Aut(L|K).
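As a small illustration of parts (1), (2) and (5), consider the Galois extension L = ℚ(√2, √3) of ℚ from Lemma 15.3.3 with G = {1, α, β, αβ}. The table below is a sketch worked out from the definitions of α and β in Example 15.2.5. The field ℚ(√6) is not mentioned in the text above; it is obtained here by direct computation, since αβ maps √6 = √2 ⋅ √3 to (−√2)(−√3) = √6.

```latex
\begin{array}{lll}
H \subset G & \operatorname{Fix}(L, H) & |\operatorname{Fix}(L, H) : \mathbb{Q}| = |G : H| \\
\hline
\{1\}              & \mathbb{Q}(\sqrt{2}, \sqrt{3}) & 4 \\
\{1, \alpha\}      & \mathbb{Q}(\sqrt{2})           & 2 \\
\{1, \beta\}       & \mathbb{Q}(\sqrt{3})           & 2 \\
\{1, \alpha\beta\} & \mathbb{Q}(\sqrt{6})           & 2 \\
G                  & \mathbb{Q}                     & 1
\end{array}
```

Larger subgroups correspond to smaller fix fields, which is the inversion of the lattice in part (5). Since G is Abelian, every subgroup is normal, so by part (3) every intermediate field in this example is a Galois extension of ℚ.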

We will prove this main result via a series of theorems, and then combine them all.

Theorem 15.4.2. Let G be a group, K a field, and α1 , . . . , αn pairwise distinct group ho-
momorphisms from G to $K^{*}$, the multiplicative group of K. Then α1 , . . . , αn are linearly
independent elements of the K-vector space of all maps from G to K.

Proof. We use induction on n. If n = 1 and kα1 = 0 with k ∈ K, then 0 = kα1 (1) = k ⋅ 1,
and hence k = 0. Now suppose that n ≥ 2, and suppose that each n − 1 of the α1 , . . . , αn
are linearly independent over K. If

$\sum_{i=1}^{n} k_i \alpha_i = 0, \quad k_i \in K,$ (∗)

then we must show that all ki = 0. Since α1 ≠ αn , there exists an a ∈ G such that α1 (a) ≠
αn (a). Let g ∈ G and apply the sum above to ag. We get

$\sum_{i=1}^{n} k_i\, \alpha_i(a)\, \alpha_i(g) = 0.$ (∗∗)

Now evaluate equation (∗) at g and multiply by αn (a) ∈ K to get

$\sum_{i=1}^{n} k_i\, \alpha_n(a)\, \alpha_i(g) = 0.$ (∗∗∗)

If we subtract equation (∗∗∗) from equation (∗∗), then the last term vanishes and we
have an equation in the n − 1 homomorphisms α1 , . . . , αn−1 that holds for all g ∈ G. Since
these are linearly independent, we obtain

$k_1 \alpha_1(a) - k_1 \alpha_n(a) = 0$

for the coefficient of α1 . Since α1 (a) ≠ αn (a), we must have k1 = 0. Now α2 , . . . , αn are
by assumption linearly independent, so k2 = ⋅ ⋅ ⋅ = kn = 0 also. Hence, all the coefficients
must be zero, and therefore the mappings are independent.

Theorem 15.4.3. Let α1 , . . . , αn be pairwise distinct monomorphisms from the field K into
the field K ′ . Let

L = {k ∈ K : α1 (k) = α2 (k) = ⋅ ⋅ ⋅ = αn (k)}.

Then L is a subfield of K with |K : L| ≥ n.

Proof. Certainly L is a field. Assume that r = |K : L| < n, and let {a1 , . . . , ar } be a basis
of the L-vector space K. We consider the following system of linear equations with r
equations and n unknowns:

$\alpha_1(a_1)\,x_1 + \cdots + \alpha_n(a_1)\,x_n = 0$
$\vdots$
$\alpha_1(a_r)\,x_1 + \cdots + \alpha_n(a_r)\,x_n = 0.$

Since r < n, there exists a nontrivial solution $(x_1, \ldots, x_n) \in (K')^n$.
Let a ∈ K. Then

$a = \sum_{j=1}^{r} l_j a_j$ with $l_j \in L$.

From the definition of L, we have

α1 (lj ) = αi (lj ) for i = 2, . . . , n.

Then with our nontrivial solution (x1 , . . . , xn ), we have

$\sum_{i=1}^{n} x_i\, \alpha_i(a) = \sum_{i=1}^{n} x_i \Bigl( \sum_{j=1}^{r} \alpha_i(l_j)\, \alpha_i(a_j) \Bigr) = \sum_{j=1}^{r} \alpha_1(l_j) \sum_{i=1}^{n} x_i\, \alpha_i(a_j) = 0$

since α1 (lj ) = αi (lj ) for i = 2, . . . , n. This holds for all a ∈ K, and hence $\sum_{i=1}^{n} x_i \alpha_i = 0$,
contradicting Theorem 15.4.2. Therefore, our assumption that |K : L| < n must be false,
and hence |K : L| ≥ n.

Definition 15.4.4. Let K be a field and G a finite subgroup of Aut(K). The map
trG : K → K, given by

$\operatorname{tr}_G(k) = \sum_{\alpha \in G} \alpha(k)$,

is called the G-trace of K.

Theorem 15.4.5. Let K be a field and G a finite subgroup of Aut(K). Then

{0} ≠ trG (K) ⊂ Fix(K, G).

Proof. Let β ∈ G. Then

$\beta(\operatorname{tr}_G(k)) = \sum_{\alpha \in G} \beta\alpha(k) = \sum_{\alpha \in G} \alpha(k) = \operatorname{tr}_G(k)$,

since βα runs over all of G as α does. Therefore, trG (K) ⊂ Fix(K, G).
Now assume that trG (k) = 0 for all k ∈ K. Then $\sum_{\alpha \in G} \alpha(k) = 0$ for all k ∈ K. It
follows that $\sum_{\alpha \in G} \alpha$ is the zero map; hence, the elements of G are linearly dependent as
elements of the K-vector space of all maps from K to K. This contradicts Theorem 15.4.2,
and hence the trace cannot be the zero map.
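To see the G-trace in a concrete case, the following sketch computes it for K = ℚ(√2, √3) and the Klein 4-group G = {1, α, β, αβ} of Lemma 15.2.6. The encoding is an assumption made for this illustration: an element a + b√2 + c√3 + d√6 is the tuple (a, b, c, d), and each automorphism acts by a sign pattern on these coordinates; `SIGNS`, `apply`, and `trace` are hypothetical names.

```python
from fractions import Fraction

# Sign patterns of 1, alpha, beta, alpha*beta on the basis 1, sqrt2, sqrt3, sqrt6.
SIGNS = [
    (1, 1, 1, 1),      # identity
    (1, 1, -1, -1),    # alpha: sqrt3 -> -sqrt3
    (1, -1, 1, -1),    # beta:  sqrt2 -> -sqrt2
    (1, -1, -1, 1),    # alpha*beta
]

def apply(sign, t):
    return tuple(s * x for s, x in zip(sign, t))

def trace(t):
    """tr_G(t): the sum of the images of t under all elements of G."""
    images = [apply(sign, t) for sign in SIGNS]
    return tuple(sum(column) for column in zip(*images))

t = (Fraction(1, 2), Fraction(3), Fraction(-5, 7), Fraction(2))
tr = trace(t)
print(tr)                                            # only the rational coordinate survives
print(all(apply(sign, tr) == tr for sign in SIGNS))  # the trace lies in Fix(K, G)
print(trace((1, 0, 0, 0)))                           # the trace of 1 is |G|, so tr_G is not zero
```

In this example tr_G(a + b√2 + c√3 + d√6) = 4a, so the image of the trace is all of ℚ = Fix(K, G).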

Theorem 15.4.6. Let K be a field and G a finite subgroup of Aut(K). Then

|K : Fix(K, G)| = |G|.

Proof. Let L = Fix(K, G), and suppose that |G| = n. From Theorem 15.4.3, we know that
|K : L| ≥ n. We must show that |K : L| ≤ n.
Suppose that G = {α1 , . . . , αn }. To prove the result, we show that if m > n and
a1 , . . . , am ∈ K, then a1 , . . . , am are linearly dependent over L.
We consider the system of equations

$\alpha_1^{-1}(a_1)\,x_1 + \cdots + \alpha_1^{-1}(a_m)\,x_m = 0$
$\vdots$
$\alpha_n^{-1}(a_1)\,x_1 + \cdots + \alpha_n^{-1}(a_m)\,x_m = 0.$

Since m > n, there exists a nontrivial solution $(y_1, \ldots, y_m) \in K^m$. Suppose that $y_l \neq 0$.
Using Theorem 15.4.5, we can choose k ∈ K with trG (k) ≠ 0. Define

$(x_1, \ldots, x_m) = k\, y_l^{-1} (y_1, \ldots, y_m)$.

This m-tuple (x1 , . . . , xm ) is then also a nontrivial solution of the system of equations
considered above.
Then we have

$\operatorname{tr}_G(x_l) = \operatorname{tr}_G(k)$ since $x_l = k$.

Now we apply αi to the i-th equation to obtain

$a_1\, \alpha_1(x_1) + \cdots + a_m\, \alpha_1(x_m) = 0$
$\vdots$
$a_1\, \alpha_n(x_1) + \cdots + a_m\, \alpha_n(x_m) = 0.$

Summation leads to

$0 = \sum_{j=1}^{m} a_j \sum_{i=1}^{n} \alpha_i(x_j) = \sum_{j=1}^{m} \operatorname{tr}_G(x_j)\, a_j$

by definition of the G-trace. Hence, a1 , . . . , am are linearly dependent over L, since each
$\operatorname{tr}_G(x_j)$ lies in L by Theorem 15.4.5 and $\operatorname{tr}_G(x_l) \neq 0$. Therefore, |K : L| ≤ n. Combining this with Theorem 15.4.3, we get that
|K : L| = n = |G|.

Theorem 15.4.7. Let K be a field and G a finite subgroup of Aut(K). Then

Aut(K|Fix(K, G)) = G.

Proof. We have G ⊂ Aut(K|Fix(K, G)), since if g ∈ G, then g ∈ Aut(K), and g fixes
Fix(K, G) by definition. Therefore, we must show that Aut(K|Fix(K, G)) ⊂ G.
Assume then that there exists an α ∈ Aut(K| Fix(K, G)) with α ∉ G. Suppose, as in
the previous proof, |G| = n and G = {α1 , . . . , αn } with α1 = 1. Now

Fix(K, G) = {a ∈ K : a = α2 (a) = ⋅ ⋅ ⋅ = αn (a)}
= {a ∈ K : α(a) = a = α2 (a) = ⋅ ⋅ ⋅ = αn (a)}.

From Theorem 15.4.3, applied to the n + 1 pairwise distinct automorphisms α, α1 , . . . , αn ,
we have that |K : Fix(K, G)| ≥ n + 1. However, from Theorem 15.4.6,
|K : Fix(K, G)| = n, a contradiction.

Suppose that L|K is a Galois extension. We now establish that the map τ between
intermediate fields K ⊂ E ⊂ L and subgroups of Aut(L|K) is a bijection.

Theorem 15.4.8. Let L|K be a Galois extension. Then we have the following:
(1) Aut(L|K) is finite and

Fix(L, Aut(L|K)) = K.

(2) If H ⊂ Aut(L|K), then

Aut(L|Fix(L, H)) = H.

Proof. If L|K is a Galois extension, there exists a finite subgroup G of Aut(L) with K =
Fix(L, G). From Theorem 15.4.7, we have G = Aut(L|K). In particular, Aut(L|K) is finite,
and K = Fix(L, Aut(L|K)).
Now, let H ⊂ Aut(L|K). From the first part, H is finite, and then Aut(L|Fix(L, H)) = H
from Theorem 15.4.7.

Theorem 15.4.9. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a Galois extension.
(2) |L : K| = |Aut(L|K)| < ∞.
(3) |Aut(L|K)| < ∞, and K = Fix(L, Aut(L|K)).

Proof. (1) ⇒ (2): From Theorem 15.4.8, |Aut(L|K)| < ∞, and Fix(L, Aut(L|K)) = K.
Therefore, from Theorem 15.4.6, |L : K| = |Aut(L|K)|.
(2) ⇒ (3): Let G = Aut(L|K). Then K ⊂ Fix(L, G) ⊂ L. From Theorem 15.4.6, we have

|L : Fix(L, G)| = |G| = |L : K|.

By the degree formula, |Fix(L, G) : K| = 1, and hence K = Fix(L, G).
(3) ⇒ (1) follows directly from the definition, completing the proof.
We now show that if L|K is a Galois extension, then L|M is also a Galois extension
for any intermediate field M.

Theorem 15.4.10. Let L|K be a Galois extension and K ⊂ M ⊂ L be an intermediate field.
Then L|M is always a Galois extension, and

|M : K| = |Aut(L|K) : Aut(L|M)|.

Proof. Let G = Aut(L|K). Then, from Theorem 15.4.9, |G| < ∞, and K = Fix(L, G). Define
H = Aut(L|M) and M ′ = Fix(L, H). We must show that M ′ = M for then L|M is a Galois
extension.
Since the elements of H fix M, we have M ⊂ M ′ . Let $G = \bigcup_{i=1}^{r} \alpha_i H$, a disjoint union
of the cosets of H. Let α1 = 1, and define $\beta_i = \alpha_i|_M$. The β1 , . . . , βr are pairwise distinct, for
if βi = βj , that is, $\alpha_i|_M = \alpha_j|_M$, then $\alpha_j^{-1}\alpha_i \in H$, so αi and αj are in the same coset.
We claim that

{a ∈ M : β1 (a) = ⋅ ⋅ ⋅ = βr (a)} = M ∩ Fix(L, G).

Moreover, from Theorem 15.4.9, we know that

M ∩ Fix(L, G) = M ∩ K = K.

To establish the claim, it is clear that

M ∩ Fix(L, G) ⊂ {a ∈ M : β1 (a) = ⋅ ⋅ ⋅ = βr (a)},

since

a = βi (a) = αi (a) for αi ∈ G, a ∈ K.



Hence, we must show that

{a ∈ M : β1 (a) = ⋅ ⋅ ⋅ = βr (a)} ⊂ M ∩ Fix(L, G).

To do this, let b ∈ M with β1 (b) = ⋅ ⋅ ⋅ = βr (b). Since β1 is the identity on M, we have
βi (b) = b for all i. We must show that α(b) = b for all α ∈ G. We have α ∈ αi H for
some i, and hence α = αi γ for γ ∈ H. We obtain then

α(b) = αi (γ(b)) = αi (b) = βi (b) = b,

proving the inclusion and establishing the claim.
Now, from Theorem 15.4.3, |M : K| ≥ r. From the degree formula, we get

|L : M ′ | |M ′ : M| |M : K| = |L : K| = |G| = |G : H| |H| = r |L : M ′ |,

since |L : K| = |G| by Theorem 15.4.9 and |H| = |L : M ′ | by Theorem 15.4.6. Therefore,
|M ′ : M| |M : K| = r. Since |M : K| ≥ r, it follows that |M ′ : M| = 1 and |M : K| = r.
Hence, M = M ′ . Now

|M : K| = |G : H| = |Aut(L|K) : Aut(L|M)|,

completing the proof.

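Lemma 15.4.11. Let L|K be a field extension and K ⊂ M ⊂ L be an intermediate field. If
α ∈ Aut(L|K), then

$\operatorname{Aut}(L|\alpha(M)) = \alpha \operatorname{Aut}(L|M)\, \alpha^{-1}$.

Proof. Now, β ∈ Aut(L|α(M)) if and only if β(α(a)) = α(a) for all a ∈ M. This occurs if
and only if $\alpha^{-1}\beta\alpha(a) = a$ for all a ∈ M, which is true if and only if
$\alpha^{-1}\beta\alpha \in \operatorname{Aut}(L|M)$, that is, $\beta \in \alpha \operatorname{Aut}(L|M)\, \alpha^{-1}$.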

Lemma 15.4.12. Let L|K be a Galois extension and K ⊂ M ⊂ L be an intermediate field.


Suppose that α(M) = M for all α ∈ Aut(L|K). Then

ϕ : Aut(L|K) → Aut(M|K) with ϕ(α) = α|M

is an epimorphism with kernel ker(ϕ) = Aut(L|M).

Proof. It is clear that ϕ is a homomorphism with ker(ϕ) = Aut(L|M) (see exercises). We
must show that it is an epimorphism.
Let G = im(ϕ). Since L|K is a Galois extension, we get that

Fix(M, G) = Fix(L, Aut(L|K)) ∩ M = K ∩ M = K.

Since G is a finite subgroup of Aut(M), Theorem 15.4.7 gives

Aut(M|K) = Aut(M|Fix(M, G)) = G,

and therefore ϕ is an epimorphism.



Theorem 15.4.13. Let L|K be a Galois extension and K ⊂ M ⊂ L be an intermediate field.


Then the following are equivalent:
(1) M|K is a Galois extension.
(2) If α ∈ Aut(L|K), then α(M) = M.
(3) Aut(L|M) is a normal subgroup of Aut(L|K).

Proof. (1) ⇒ (2): Suppose that M|K is a Galois extension. Let Aut(M|K) = {α1 , . . . , αr }.
Consider the αi as monomorphisms from M into L. Let $\alpha_{r+1} : M \to L$ be a monomorphism
with $\alpha_{r+1}|_K = 1$. Then

$\{a \in M : \alpha_1(a) = \alpha_2(a) = \cdots = \alpha_r(a) = \alpha_{r+1}(a)\} = K$,

since M|K is a Galois extension. Therefore, from Theorem 15.4.3, we have that if the
$\alpha_1, \ldots, \alpha_r, \alpha_{r+1}$ are distinct, then

|M : K| ≥ r + 1 > r = |Aut(M|K)| = |M : K|,

giving a contradiction. Hence, if $\alpha_{r+1} \in \operatorname{Aut}(L|K)$ is arbitrary, then $\alpha_{r+1}|_M \in \{\alpha_1, \ldots, \alpha_r\}$;
that is, $\alpha_{r+1}(M) = M$.
(2) ⇒ (1): Suppose that if α ∈ Aut(L|K), then α(M) = M. The map ϕ : Aut(L|K) →
Aut(M|K) with ϕ(α) = α|M is surjective by Lemma 15.4.12. Since L|K is a Galois extension, Aut(L|K) is
finite. Therefore, H = Aut(M|K) is also finite. To prove (1) then, it is sufficient to show
that K = Fix(M, H).
We have K ⊂ Fix(M, H) from the definition of the fix field. Hence, we must show
that Fix(M, H) ⊂ K. Assume that there exists an a ∈ Fix(M, H) with a ∉ K. Recall that
L|K is a Galois extension, and therefore Fix(L, Aut(L|K)) = K. Hence, there exists an
α ∈ Aut(L|K) with α(a) ≠ a.
Define β = α|M . Then β ∈ H, since α(M) = M by our original assumption. Then β(a) ≠ a,
contradicting a ∈ Fix(M, H). Therefore, K = Fix(M, H), and M|K is a Galois extension.
(2) ⇒ (3): Suppose that if α ∈ Aut(L|K), then α(M) = M. Then Aut(L|M) is a normal
subgroup of Aut(L|K) follows from Lemma 15.4.12, since Aut(L|M) is the kernel of ϕ.
(3) ⇒ (2): Suppose that Aut(L|M) is a normal subgroup of Aut(L|K). Let α ∈ Aut(L|K),
then from our assumption and Lemma 15.4.11, we get that

Aut(L|α(M)) = Aut(L|M).

Now L|M and L|α(M) are Galois extensions by Theorem 15.4.10. Therefore,

α(M) = Fix(L, Aut(L|α(M))) = Fix(L, Aut(L|M)) = M,

completing the proof.

We now combine all of these results to give the proof of Theorem 15.4.1, the funda-
mental theorem of Galois theory.

Proof of Theorem 15.4.1. Let L|K be a Galois extension.


For (1), let G ⊂ Aut(L|K). Both G and Aut(L|K) are finite from Theorem 15.4.8. Fur-
thermore, G = Aut(L|Fix(L, G)) from Theorem 15.4.7. Now let M be an intermediate field
of L|K. Then L|M is a Galois extension from Theorem 15.4.10, and then Fix(L, Aut(L|M)) =
M from Theorem 15.4.8.
For (2), let M be an intermediate field of L|K. From Theorem 15.4.10, L|M is a Ga-
lois extension. From Theorem 15.4.9, we have |L : M| = |Aut(L|M)|. Applying Theo-
rem 15.4.10, we get the result on indices

|M : K| = |Aut(L|K) : Aut(L|M)|.

For (3), let M be an intermediate field of L|K. From Theorem 15.4.10, we have that
L|M is a Galois extension, hence (a) holds. From Theorem 15.4.13, M|K is a Galois exten-
sion if and only if Aut(L|M) is a normal subgroup of Aut(L|K), that is, (b) holds.
For (4), let M|K be a Galois extension. Assertion (a) holds because α(M) = M for all
α ∈ Aut(L|K) by Theorem 15.4.13. The map ϕ : Aut(L|K) → Aut(M|K) with ϕ(α) = α|M = β
is an epimorphism by Lemma 15.4.12 and Theorem 15.4.13, hence (b) holds. Assertion (c),
that is, Aut(M|K) = Aut(L|K)/ Aut(L|M), follows directly from the group isomorphism
theorem.
That the lattice of subfields of L containing K is the inverted lattice of subgroups
of Aut(L|K) follows directly from the previous results. This shows (5) and finishes the
proof.
In Chapter 8, we looked at Example 8.1.7. Here, we analyze it further using Galois
theory.

Example 15.4.14. Let $f(x) = x^3 - 7 \in \mathbb{Q}[x]$. This has no zeros in ℚ, and since it is of
degree 3, it follows that it must be irreducible in ℚ[x].
Let $\omega = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i \in \mathbb{C}$. Then it is easy to show by computation that

$\omega^2 = -\frac{1}{2} - \frac{\sqrt{3}}{2}\, i$ and $\omega^3 = 1$.

Therefore, the three zeros of f (x) in ℂ are

$a_1 = 7^{1/3}, \quad a_2 = \omega\, 7^{1/3}, \quad a_3 = \omega^2\, 7^{1/3}$.

Hence, L = ℚ(a1 , a2 , a3 ) is the splitting field of f (x). Since the minimal polynomial
of all three zeros over ℚ is the same f (x), it follows that

ℚ(a1 ) ≅ ℚ(a2 ) ≅ ℚ(a3 ).

Since ℚ(a1 ) ⊂ ℝ and a2 , a3 are nonreal, it is clear that a2 , a3 ∉ ℚ(a1 ).
Suppose that ℚ(a2 ) = ℚ(a3 ). Then $\omega = a_3 a_2^{-1} \in \mathbb{Q}(a_2)$, and so $7^{1/3} = \omega^{-1} a_2 \in \mathbb{Q}(a_2)$.
Hence, ℚ(a1 ) ⊂ ℚ(a2 ); therefore, ℚ(a1 ) = ℚ(a2 ) since they are of the same degree over ℚ.
This contradicts a2 ∉ ℚ(a1 ) and shows that ℚ(a2 ) and ℚ(a3 ) are distinct.

By computation, we have $a_3 = a_1^{-1} a_2^2$, and hence

$L = \mathbb{Q}(a_1, a_2, a_3) = \mathbb{Q}(a_1, a_2) = \mathbb{Q}(7^{1/3}, \omega)$.
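The computations in this example can be sanity-checked numerically. The sketch below uses floating-point complex numbers, so it only confirms the identities up to rounding error and is no substitute for the algebraic argument; the helper `close` and its tolerance are assumptions of this illustration.

```python
import math

a1 = 7 ** (1 / 3)                          # the real cube root of 7
omega = complex(-0.5, math.sqrt(3) / 2)    # primitive third root of unity
a2 = omega * a1
a3 = omega ** 2 * a1

def close(z, w, tol=1e-9):
    """Compare two complex numbers up to a small rounding tolerance."""
    return abs(z - w) < tol

print(all(close(a ** 3, 7) for a in (a1, a2, a3)))   # all three are zeros of x^3 - 7
print(close(omega ** 2 + omega + 1, 0))              # omega is a zero of x^2 + x + 1
print(close(a3, a2 ** 2 / a1))                       # a3 = a1^(-1) * a2^2
print(close(omega, a3 / a2))                         # omega = a3 * a2^(-1)
```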

Now the degree of L over ℚ is

$|L : \mathbb{Q}| = |\mathbb{Q}(7^{1/3}, \omega) : \mathbb{Q}(\omega)| \, |\mathbb{Q}(\omega) : \mathbb{Q}|$.

Now |ℚ(ω) : ℚ| = 2, since the minimal polynomial of ω over ℚ is $x^2 + x + 1$. Since no
zero of f (x) lies in ℚ(ω), and the degree of f (x) is 3, it follows that f (x) is irreducible over
ℚ(ω). Therefore, we have that the degree of L over ℚ(ω) is 3. Hence, |L : ℚ| = (2)(3) = 6.
Clearly then, we have the following lattice of intermediate fields: ℚ at the bottom, L at
the top, and the fields ℚ(ω), ℚ(a1 ), ℚ(a2 ), and ℚ(a3 ) in between.

The question then arises as to whether these are all the intermediate fields. The
answer is yes, which we now prove.
Let G = Aut(L|ℚ) = Aut(L). (Aut(L|ℚ) = Aut(L), since ℚ is a prime field.) Each element
of G permutes the zeros a1, a2, a3 and is determined by this permutation, so G is isomorphic
to a subgroup of S3. G acts transitively on {a1, a2, a3}, since f is irreducible. Let δ : ℂ → ℂ be the
automorphism of ℂ taking each element to its complex conjugate; that is, δ(z) = z̄. Then
δ(f) = f and δ|L ∈ G (Theorem 8.2.2). Since a1 ∈ ℝ, we get that δ|{a1,a2,a3} = (a2, a3), the
2-cycle that maps a2 to a3 and a3 to a2. Since G is transitive on {a1, a2, a3}, there is a τ ∈ G
with τ(a1) = a2.
Case 1: τ(a3) = a3. Then τ = (a1, a2), and (a1, a2)(a2, a3) = (a1, a2, a3) ∈ G.
Case 2: τ(a3) ≠ a3. Then τ is a 3-cycle. In either case, G contains a transposition
and a 3-cycle, which together generate S3. Hence, G is all of S3. Then L|ℚ is a Galois extension
from Theorem 15.4.9, since |G| = |L : ℚ|.
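
The claim that a transposition together with either choice of τ already gives all of S3 can be checked by brute force. The following is a minimal sketch, not part of the original text: the zeros a1, a2, a3 are encoded as the indices 0, 1, 2, and the helper names `compose` and `generate` are chosen here only for illustration.

```python
def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate(generators):
    """Return the subgroup generated by the given permutations.

    In a finite group, closing under multiplication by the generators
    suffices, since inverses are powers of the elements themselves.
    """
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


# Indices 0, 1, 2 stand for a1, a2, a3 (an encoding assumed for this sketch).
delta = (0, 2, 1)        # complex conjugation: swaps a2 and a3
tau_case_1 = (1, 0, 2)   # Case 1: the transposition (a1, a2)
tau_case_2 = (1, 2, 0)   # Case 2: the 3-cycle a1 -> a2 -> a3 -> a1

group_case_1 = generate([delta, tau_case_1])
group_case_2 = generate([delta, tau_case_2])
print(len(group_case_1), len(group_case_2))
```

In both cases the generated group has 6 = |S3| elements, in agreement with |L : ℚ| = 6.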
The subgroups of S3 are as follows:

Hence, the above lattice of fields is complete. L|ℚ, ℚ|ℚ, ℚ(ω)|ℚ and L|ℚ(ai ) are Ga-
lois extensions, whereas ℚ(ai )|ℚ with i = 1, 2, 3 are not Galois extensions.

15.5 Exercises
1. Let K ⊂ M ⊂ L be a chain of fields, and let ϕ : Aut(L|K) → Aut(M|K) be defined by
ϕ(α) = α|M . Show that ϕ is an epimorphism with kernel ker(ϕ) = Aut(L|M).
2. Show that ℚ(5^(1/4))|ℚ(√5) and ℚ(√5)|ℚ are Galois extensions, and ℚ(5^(1/4))|ℚ is not a
Galois extension.
3. Let L|K be a field extension and u, v ∈ L algebraic over K with |K(u) : K| = m and
|K(v) : K| = n. Show that if m and n are coprime, then |K(u, v) : K| = n ⋅ m.
4. Let p, q be prime numbers with p ≠ q. Let L = ℚ(√p, q^(1/3)). Show that L = ℚ(√p ⋅ q^(1/3)).
Determine a basis of L over ℚ and the minimal polynomial of √p ⋅ q^(1/3).
5. Let K = ℚ(2^(1/n)) with n ≥ 2.
(i) Determine the number of ℚ-embeddings σ : K → ℝ. Show that for each such
embedding, we have σ(K) = K.
(ii) Determine Aut(K|ℚ).
6. Let α = √(5 + 2√5).
(i) Determine the minimal polynomial of α over ℚ.
(ii) Show that ℚ(α)|ℚ is a Galois extension.
(iii) Determine Aut(ℚ(α)|ℚ).
7. Let K be a field of prime characteristic p, and let f(x) = x^p − x + a ∈ K[x] be an
irreducible polynomial. Let L = K(v), where v is a zero of f (x).
(i) If α is a zero of f (x), then also α + 1 is.
(ii) L|K is a Galois extension.
(iii) There is exactly one K-automorphism σ of L with σ(v) = v + 1.
(iv) The Galois group Aut(L|K) is cyclic with generating element σ.
16 Separable Field Extensions
16.1 Separability of Fields and Polynomials
In the previous chapter, we introduced and examined Galois extensions. Recall that L|K
is a Galois extension if there exists a finite subgroup G ⊂ Aut(L) with K = Fix(L, G). The
following questions logically arise:
(1) Under what conditions is a field extension L|K a Galois extension?
(2) Is L|K a Galois extension when L is the splitting field of a polynomial f(x) ∈ K[x]?

In this chapter, we consider these questions and completely characterize Galois exten-
sions. To do this, we must introduce separable extensions.

Definition 16.1.1. Let K be a field. Then a nonconstant polynomial f (x) ∈ K[x] is called
separable over K if each irreducible factor of f (x) has only simple zeros in its splitting
field.

We now extend this definition to field extensions.

Definition 16.1.2. Let L|K be a field extension and a ∈ L. Then a is separable over K if a
is a zero of a separable polynomial. The field extension L|K is a separable field extension,
or just separable if all a ∈ L are separable over K. In particular, a separable extension is
an algebraic extension.

Finally, we consider fields, where every nonconstant polynomial is separable.

Definition 16.1.3. A field K is perfect if each nonconstant polynomial in K[x] is separable
over K.

The following is straightforward from the definitions: An element a is separable
over K if and only if its minimal polynomial ma(x) is separable.
If f(x) ∈ K[x], then f(x) = ∑_{i=0}^{n} ki x^i with ki ∈ K. The formal derivative of f(x) is then
f′(x) = ∑_{i=1}^{n} i·ki x^(i−1). As in ordinary calculus, we have the usual differentiation rules

(f(x) + g(x))′ = f′(x) + g′(x)

and

(f(x)g(x))′ = f′(x)g(x) + f(x)g′(x)

for f(x), g(x) ∈ K[x].

Lemma 16.1.4. Let K be a field and f (x) an irreducible nonconstant polynomial in K[x].
Then f (x) is separable if and only if its formal derivative is nonzero.

https://doi.org/10.1515/9783111142524-016

Proof. Let L be the splitting field of f(x) over K. Let f(x) = (x − a)^r g(x), where (x − a)
does not divide g(x). Then

f′(x) = (x − a)^(r−1) (r·g(x) + (x − a)g′(x)).

If f′(x) ≠ 0, then a is a zero of f(x) in L of multiplicity m ≥ 2 if and only if
(x − a)|f(x), and also (x − a)|f′(x).
Let f(x) be a separable polynomial over K, and let a be a zero of f(x) in L. Then
if f(x) = (x − a)^r g(x) with (x − a) not dividing g(x), we must have r = 1. Then

f′(x) = g(x) + (x − a)g′(x).

If g′(x) = 0, then f′(x) = g(x) ≠ 0. Now suppose that g′(x) ≠ 0. Assume that f′(x) = 0;
then, necessarily, (x − a)|g(x) giving a contradiction. Therefore, f′(x) ≠ 0.
Conversely, suppose that f′(x) ≠ 0. Assume that f(x) is not separable. Then both f(x)
and f′(x) have a common zero a ∈ L. Let ma(x) be the minimal polynomial of a in K[x].
Then ma(x)|f(x), and ma(x)|f′(x). Since f(x) is irreducible, the degree of ma(x) must
equal the degree of f(x). But the degree of ma(x) is also at most the degree of f′(x), which
is less than that of f(x), giving a contradiction. Therefore, f(x) must be separable.

We now consider the following example of a nonseparable polynomial over the fi-
nite field ℤp of p elements. We will denote this field now as GF(p), the Galois field of p
elements.

Example 16.1.5. Let K = GF(p) and L = K(t), the field of rational functions in t over K.
Consider the polynomial f(x) = x^p − t ∈ L[x].
Now K[t]/tK[t] ≅ K. Since K is a field, this implies that tK[t] is a maximal ideal,
and hence a prime ideal in K[t] with prime element t ∈ K[t] (see Theorem 3.2.7). By
the Eisenstein criterion, f(x) is an irreducible polynomial in L[x] (see Theorem 4.4.8).
However, f′(x) = p·x^(p−1) = 0, since char(K) = p. Therefore, f(x) is not separable.
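
To see concretely why f(x) fails to be separable, one can factor it in a splitting field. The following is a sketch; the symbol s is introduced here only for illustration and denotes a zero of f(x) in a splitting field, so that s^p = t:

```latex
\[
f(x) = x^{p} - t = x^{p} - s^{p} = (x - s)^{p} .
\]
```

Hence s is the only zero of f(x), and it has multiplicity p.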

16.2 Perfect Fields


We now consider when a field K is perfect. First, we show that, in general, any field
of characteristic 0 is perfect. In particular, the rationals ℚ are perfect, and hence any
algebraic extension of the rationals is separable.

Theorem 16.2.1. Each field K of characteristic zero is perfect.

Proof. Suppose that K is a field with char(K) = 0. Suppose that f (x) is a nonconstant
polynomial in K[x]. Then f ′ (x) ≠ 0. If f (x) is irreducible, then f (x) is separable from
Lemma 16.1.4. Therefore, by definition, each nonconstant polynomial f (x) ∈ K[x] is sep-
arable.

We remark that in the original motivation for Galois theory, the ground field was
the rationals ℚ. Since this has characteristic zero, it is perfect and all its algebraic
extensions are separable. Hence, the question of separability did not arise until the
question of extensions of fields of prime characteristic arose.

Corollary 16.2.2. Any finite extension of the rationals ℚ is separable.

We now consider the case of prime characteristic.

Theorem 16.2.3. Let K be a field with char(K) = p ≠ 0. If f(x) is a nonconstant polynomial
in K[x], then the following are equivalent:
(1) f′(x) = 0.
(2) f(x) is a polynomial in x^p; that is, there is a g(x) ∈ K[x] with f(x) = g(x^p).

If in (1) and (2) f(x) is irreducible, then f(x) is not separable over K if and only if f(x) is a
polynomial in x^p.

Proof. Let f(x) = ∑_{i=0}^{n} ai x^i. Then f′(x) = 0 if and only if p|i for all i with ai ≠ 0. But this
is equivalent to

f(x) = a0 + ap x^p + ⋯ + a_{mp} x^(mp).

If f(x) is irreducible, then f(x) is not separable if and only if f′(x) = 0 from
Lemma 16.1.4.
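
The equivalence of (1) and (2) is easy to test on examples over GF(p). The following is a minimal sketch, assuming polynomials are stored as coefficient lists with integer entries reduced modulo p; the function names are chosen here for illustration only.

```python
def formal_derivative_mod_p(coeffs, p):
    """Formal derivative over GF(p); coeffs[i] is the coefficient of x**i."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]


def is_polynomial_in_x_to_p(coeffs, p):
    """True if every nonzero coefficient (mod p) sits at an exponent divisible by p."""
    return all(c % p == 0 for i, c in enumerate(coeffs) if i % p != 0)


p = 5
f = [2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1]   # 2 + 3x^5 + x^10, a polynomial in x^5
g = [1, 0, 1, 0, 0, 1]                  # 1 + x^2 + x^5, not a polynomial in x^5

f_prime = formal_derivative_mod_p(f, p)
g_prime = formal_derivative_mod_p(g, p)

print(all(c == 0 for c in f_prime), is_polynomial_in_x_to_p(f, p))
print(all(c == 0 for c in g_prime), is_polynomial_in_x_to_p(g, p))
```

For f both tests agree that the derivative vanishes, while for g the term x^2 contributes the nonzero derivative 2x.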

Theorem 16.2.4. Let K be a field with char(K) = p ≠ 0. Then the following are equivalent:
(1) K is perfect.
(2) Each element in K has a p-th root in K.
(3) The Frobenius homomorphism τ : x ↦ x^p is an automorphism of K.

Proof. First we show that (1) implies (2). Suppose that K is perfect, and a ∈ K. Then
x^p − a is separable over K. Let g(x) ∈ K[x] be an irreducible factor of x^p − a. Let L be
the splitting field of g(x) over K, and b a zero of g(x) in L. Then b^p = a. Furthermore,
x^p − b^p = (x − b)^p ∈ L[x], since the characteristic of K is p. Hence, g(x) = (x − b)^s, and
then s must equal 1, since g(x) is irreducible and separable and therefore has only simple
zeros. Therefore, b ∈ K, and b is a p-th root of a.
Now we show that (2) implies (3). Recall that the Frobenius homomorphism τ is
injective (see Theorem 1.8.8). We must show that it is also surjective. Let a ∈ K, and let
b be a p-th root of a so that a = b^p. Then τ(b) = b^p = a, and τ is surjective.
Finally, we show that (3) implies (1). Let τ : x ↦ x^p be surjective. It follows that each
a ∈ K has a p-th root in K. Now let f(x) ∈ K[x] be irreducible. Assume that f(x) is not
separable. From Theorem 16.2.3, there is a g(x) ∈ K[x] with f(x) = g(x^p); that is,

f(x) = a0 + a1 x^p + ⋯ + am x^(mp).

Let bi ∈ K with ai = bi^p. Then

f(x) = b0^p + b1^p x^p + ⋯ + bm^p x^(mp) = (b0 + b1 x + ⋯ + bm x^m)^p.

However, this is a contradiction since f(x) is irreducible. Therefore, f(x) is separable,
completing the proof.

Theorem 16.2.5. Let K be a field with char(K) = p ≠ 0. Then each element of K has at
most one p-th root in K.

Proof. Suppose that b1, b2 ∈ K with b1^p = b2^p = a. Then

0 = b1^p − b2^p = (b1 − b2)^p.

Since K has no zero divisors, it follows that b1 = b2.
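
The identity b1^p − b2^p = (b1 − b2)^p used in this proof rests on the fact that the binomial coefficients C(p, k) with 0 < k < p are divisible by p. The following sketch checks this, and the resulting additivity of x ↦ x^p, in the prime field GF(p) only (represented here as the integers modulo p); the function names are illustrative.

```python
from math import comb


def middle_binomials_vanish(p):
    """True if p divides C(p, k) for every 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))


def frobenius_is_additive(p):
    """True if (a + b)^p = a^p + b^p holds for all a, b in the integers mod p."""
    return all(
        pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
        for a in range(p)
        for b in range(p)
    )


for p in (2, 3, 5, 7, 11):
    print(p, middle_binomials_vanish(p), frobenius_is_additive(p))

# For a composite modulus the divisibility fails, e.g. C(4, 2) = 6 is not divisible by 4.
print(middle_binomials_vanish(4))
```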

16.3 Finite Fields


In this section, we consider finite fields. In particular, we show that if K is a finite field,
then |K| = p^m for some prime p and natural number m > 0. Moreover, we show that if
K1 , K2 are finite fields with |K1 | = |K2 |, then K1 ≅ K2 . Hence, there is a unique finite field
for each possible order.
Notice that if K is a finite field, then by necessity char K = p ≠ 0. We first show that,
in this case, K is always perfect.

Theorem 16.3.1. A finite field is perfect.

Proof. Let K be a finite field of characteristic p > 0. Then the Frobenius map τ is surjec-
tive since it is injective and K is finite. Therefore, K is perfect from Theorem 16.2.4.

Next we show that each finite field has order p^m for some prime p and natural number
m > 0.

Lemma 16.3.2. Let K be a finite field. Then |K| = p^m for some prime p and natural number
m > 0.

Proof. Let K be a finite field with characteristic p > 0. Then K can be considered as a
vector space over its prime field GF(p), and hence of finite dimension since |K| < ∞. If α1, …, αm
is a basis, then each f ∈ K can be written uniquely as f = c1 α1 + ⋯ + cm αm with each ci ∈ GF(p).
Hence, there are p choices for each ci, and therefore p^m choices for f.

In Theorem 9.5.16, we proved that any finite subgroup of the multiplicative group
of a field is cyclic. If K is a finite field, then its multiplicative group K⋆ is finite, and
hence cyclic.

Lemma 16.3.3. Let K be a finite field. Then its multiplicative group K⋆ is cyclic.

If K is a finite field with order p^m, then its multiplicative group K⋆ has order
p^m − 1. Then, from Lagrange's theorem, each nonzero element to the power p^m − 1 is the
identity. Therefore, we have the result.

Lemma 16.3.4. Let K be a field of order p^m. Then each α ∈ K is a zero of the polynomial
x^(p^m) − x. In particular, if α ≠ 0, then α is a zero of x^(p^m − 1) − 1.
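
Lemmas 16.3.3 and 16.3.4 can be checked by hand in a small case. The following sketch models the field of order 9 as GF(3)[x]/(x² + 1), which is a field because −1 is not a square modulo 3. The pair (a, b) stands for a + b·i with i² = −1; this encoding and the function names are assumptions made for the illustration.

```python
P = 3  # characteristic


def mul(u, v):
    """Multiply a + b*i and c + d*i, where i*i = -1 and coefficients are mod P."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, n):
    """Compute u**n by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result


def order(u):
    """Multiplicative order of a nonzero element u."""
    k, v = 1, u
    while v != (1, 0):
        v = mul(v, u)
        k += 1
    return k


elements = [(a, b) for a in range(P) for b in range(P)]

# Lemma 16.3.4: every element is a zero of x^9 - x.
all_roots = all(power(u, 9) == u for u in elements)

# Lemma 16.3.3: the multiplicative group has an element of order 8, so it is cyclic.
generators = [u for u in elements if u != (0, 0) and order(u) == 8]

print(all_roots, generators)
```

The list of generators has φ(8) = 4 entries, one of which is 1 + i.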

If K is a finite field of order p^m, it is a finite extension of GF(p). Since the multiplicative
group is cyclic, we must have K = GF(p)(α) for some α ∈ K. From this, we obtain
that for a given possible finite order, there is only one finite field up to isomorphism.

Theorem 16.3.5. Let K1, K2 be finite fields with |K1| = |K2|. Then K1 ≅ K2.

Proof. Let |K1| = |K2| = p^m. From the remarks above, K1 = GF(p)(α), where α has order
p^m − 1 in K1⋆. Similarly, K2 = GF(p)(β), where β also has order p^m − 1 in K2⋆. Hence,
GF(p)(α) ≅ GF(p)(β), and therefore K1 ≅ K2.

In Lemma 16.3.2, we saw that if K is a finite field, then |K| = p^n for some prime p and
positive integer n. We now show that given a prime power p^n, there does exist a finite
field of that order.

Theorem 16.3.6. Let p be a prime and n > 0 a natural number. Then there exists a field K
of order p^n.
Proof. Given a prime p, consider the polynomial g(x) = x^(p^n) − x ∈ GF(p)[x]. Let K be the
splitting field of this polynomial over GF(p). Since a finite field is perfect, K is a separable
extension; moreover, g′(x) = −1 ≠ 0, so g(x) has no multiple zeros, and hence all the zeros
of g(x) are distinct in K.
Let F be the set of p^n distinct zeros of g(x) within K. Let a, b ∈ F. Since

(a ± b)^(p^n) = a^(p^n) ± b^(p^n) and (ab)^(p^n) = a^(p^n) b^(p^n),

it follows that F forms a subfield of K. However, F contains all the zeros of g(x), and
since K is the smallest extension of GF(p) containing all the zeros of g(x), we must have
K = F. Since F has p^n elements, it follows that the order of K is p^n.

Combining Theorems 16.3.5 and 16.3.6, we get the following summary result, indicating
that up to isomorphism there exists one and only one finite field of order p^n.

Theorem 16.3.7. Let p be a prime and n > 0 a natural number. Then up to isomorphism,
there exists a unique finite field of order p^n.

16.4 Separable Extensions


In this section, we consider some properties of separable extensions.

Theorem 16.4.1. Let K be a field with K ⊂ L and L algebraically closed. Let α : K → L be a
monomorphism. Then the number of monomorphisms β : K(a) → L with β|K = α is equal
to the number of pairwise distinct zeros in L of the minimal polynomial ma of a over K.

Proof. Let β be as in the statement of the theorem. Then β is uniquely determined by
β(a), and β(a) is a zero of the polynomial β(ma(x)) = α(ma(x)). Now let a′ be a zero of
α(ma (x)) in L. Then there exists a β : K(a) → L with β(a) = a′ from Theorem 7.1.4.
Therefore, α has exactly as many extensions β as α(ma (x)) has pairwise distinct zeros
in L. The number of pairwise distinct zeros of α(ma (x)) is equal to the number of pair-
wise distinct zeros of ma (x). This can be seen as follows: Let L0 be a splitting field of
ma (x) and L1 ⊂ L a splitting field of α(ma (x)). From Theorems 8.1.5 and 8.1.6, there is an
isomorphism ψ : L0 → L1 , which maps the zeros of ma (x) onto the zeros of α(ma (x)).

Lemma 16.4.2. Let L|K be a finite extension with L ⊂ L̄, and L̄ algebraically closed. In
particular, L = K(a1, …, an), where the ai are algebraic over K. Let pi be the number of
pairwise distinct zeros of the minimal polynomial mai of ai over K(a1, …, ai−1) in L̄. Then
there are exactly p1 ⋯ pn monomorphisms β : L → L̄ with β|K = 1K.

Proof. From Theorem 16.4.1, there are exactly p1 monomorphisms α : K(a1) → L̄ with
α|K equal to the identity on K. Each such α has exactly p2 extensions of the identity on K
to K(a1, a2). We now continue in this manner.

Theorem 16.4.3. Let L|K be a field extension with M an intermediate field. If a ∈ L is
separable over K, then it is also separable over M.

Proof. This follows directly from the fact that the minimal polynomial of a over M di-
vides the minimal polynomial of a over K.

Theorem 16.4.4. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is finite and separable.
(2) There are finitely many separable elements a1, …, an over K with L = K(a1, …, an).
(3) L|K is finite, and if L ⊂ L̄ with L̄ algebraically closed, then there are exactly [L : K]
monomorphisms α : L → L̄ with α|K = 1K.

Proof. That (1) implies (2) follows directly from the definitions. We show then that (2)
implies (3). Let L = K(a1, …, an), where a1, …, an are separable elements over K. The
extension L|K is finite (see Theorem 5.3.4).
Let pi be the number of pairwise distinct zeros in L̄ of the minimal polynomial
mai(x) = fi(x) of ai over K(a1, …, ai−1). Then

pi ≤ deg(fi) = |K(a1, …, ai) : K(a1, …, ai−1)|.

Hence, pi = deg(fi(x)) since ai is separable over K(a1, …, ai−1) from Theorem 16.4.3.
Therefore, [L : K] = p1 ⋯ pn is equal to the number of monomorphisms α : L → L̄ with
α|K the identity on K.

Finally, we show that (3) implies (1). Suppose then the conditions of (3). Since L|K is
finite, there are finitely many a1, …, an ∈ L with L = K(a1, …, an). Let pi and fi(x) be as
in the proof above, and hence pi ≤ deg(fi(x)). By assumption we have

[L : K] = p1 ⋯ pn

equal to the number of monomorphisms α : L → L̄ with α|K the identity on K. Also

[L : K] = p1 ⋯ pn ≤ deg(f1(x)) ⋯ deg(fn(x)) = [L : K].

Hence, pi = deg(fi(x)). Therefore, by definition, each ai is separable over K.


To complete the proof, we must show that L|K is separable. Inductively, it suffices
to prove that K(a1)|K is separable whenever a1 is separable over K, and not in K.
This is clear if char(K) = 0, because K is perfect.
Suppose then that char(K) = p > 0. First, we show that K(a1^p) = K(a1). Certainly,
K(a1^p) ⊂ K(a1). Assume that a1 ∉ K(a1^p). Then g(x) = x^p − a1^p is the minimal polynomial
of a1 over K(a1^p). This follows from the fact that x^p − a1^p = (x − a1)^p, and hence there can be
no irreducible factor of x^p − a1^p of the form (x − a1)^m with m < p and m|p.
However, it follows then, in this case, that g′(x) = 0, contradicting the separability
of a1 over K (and hence over K(a1^p), by Theorem 16.4.3). Therefore, K(a1^p) = K(a1).
Let E = K(a1), then also E = K(E^p), where E^p is the field generated by the p-th
powers of E. Now let b ∈ E = K(a1). We must show that the minimal polynomial of b,
say mb(x), is separable over K. Assume that mb(x) is not separable over K. Then

mb(x) = ∑_{i=0}^{k} bi x^(pi), bi ∈ K, bk = 1

from Theorem 16.2.3. We have

b0 + b1 b^p + ⋯ + bk b^(pk) = 0.

Therefore, the elements 1, b^p, …, b^(pk) are linearly dependent over K.
Since K(a1) = E = K(E^p), we find that 1, b, …, b^k are linearly dependent also, since if
they were independent the p-th powers would also be independent. However, this is not
possible, since k < deg(mb(x)). Therefore, mb(x) is separable over K, and hence K(a1)|K
is separable. Altogether L|K is then finite and separable, completing the proof.

Theorem 16.4.5. Let L|K be a field extension, and let M be an intermediate field. Then the
following are equivalent:
(1) L|K is separable.
(2) L|M and M|K are separable.

Proof. We first show that (1) implies (2): If L|K is separable then L|M is separable by
Theorem 16.4.3, and M|K is separable since M ⊂ L.
Now suppose (2), and let M|K and L|M be separable. Let a ∈ L, and let

ma(x) = f(x) = b0 + ⋯ + bn−1 x^(n−1) + x^n

be the minimal polynomial of a over M. Then f(x) is separable. Let

M′ = K(b0, …, bn−1).

We have K ⊂ M′ ⊂ M, and hence M′|K is separable, since M|K is separable. Furthermore,
a is separable over M′, since f(x) is separable, and f(x) ∈ M′[x]. From Theorem 16.4.1,
each monomorphism α : M′ → M̄, with M̄ the algebraic closure of M′, has
n = deg(f(x)) = [M′(a) : M′] extensions to M′(a). Since M′|K is separable and finite, there
are [M′ : K] monomorphisms α : M′ → M̄ with α|K the identity on K, from Theorem 16.4.4.
Altogether, there are [M′(a) : K] monomorphisms α : M′(a) → M̄ with α|K the identity on K.
Therefore, M′(a)|K is separable from Theorem 16.4.4. Hence, a is separable over K, and
then L|K is separable. Therefore, (2) implies (1).

Theorem 16.4.6. Let L|K be a field extension, and let S ⊂ L such that all elements of S are
separable over K. Then K(S)|K is separable, and K[S] = K(S).

Proof. Let W be the set of finite subsets of S. Let T ∈ W . From Theorem 16.4.4, we
obtain that K(T)|K is separable. Since each element of K(S) is contained in some K(T),
we have that K(S)|K is separable. Since all elements of S are algebraic, we have that
K[S] = K(S).

Theorem 16.4.7. Let L|K be a field extension. Then there exists in L a uniquely determined
maximal field M with the property that M|K is separable. If a ∈ L is separable over M,
then a ∈ M. M is called the separable hull of K in L.

Proof. Let S be the set of all elements in L, which are separable over K. We now define
M = K(S). Then M|K is separable from Theorem 16.4.6. Now, let a ∈ L be separable
over M. Then M(a)|M is separable from Theorem 16.4.4. Furthermore, M(a)|K is sepa-
rable from Theorem 16.4.5. It follows that a ∈ M.

16.5 Separability and Galois Extensions


We now completely characterize Galois extensions L|K as finite, normal, separable ex-
tensions.

Theorem 16.5.1. Let L|K be a field extension. Then the following are equivalent:
(1) L|K is a Galois extension.
(2) L is the splitting field of a separable polynomial in K[x].
(3) L|K is finite, normal, and separable.

Therefore, we may characterize Galois extensions of a field K as finite, normal, and sepa-
rable extensions of K.

Proof. Recall from Theorem 8.2.2 that an extension L|K is normal if the following hold:
(1) L|K is algebraic, and
(2) each irreducible polynomial f (x) ∈ K[x] that has a zero in L splits into linear factors
in L[x].

Now suppose that L|K is a Galois extension. Then L|K is finite from Theorem 15.4.1.
Let L = K(b1 , . . . , bm ) and mbi (x) = fi (x) be the minimal polynomial of bi over K. Let
ai1 , . . . , ain be the pairwise distinct elements from

Hi = {α(bi ) : α ∈ Aut(L|K)}.

Define

gi (x) = (x − ai1 ) ⋅ ⋅ ⋅ (x − ain ) ∈ L[x].

If α ∈ Aut(L|K), then α(gi) = gi, since α permutes the elements of Hi. This means that the
coefficients of gi(x) are in Fix(L, Aut(L|K)) = K; that is, gi(x) ∈ K[x]. Furthermore,
fi(x)|gi(x), because bi is one of the aij. The group Aut(L|K) acts transitively on {ai1, …, ain} by the
choice of ai1 , . . . , ain . Therefore, each gi (x) is irreducible (see Theorem 15.2.4). It follows
that fi (x) = gi (x). Now, fi (x) has only simple zeros in L; that is, no zero has multiplicity
≥ 2, and hence fi (x) splits over L. Thus, L is a splitting field of f (x) = f1 (x) ⋅ ⋅ ⋅ fm (x), and
f (x) is separable by definition. Hence, (1) implies (2).
Now suppose that L is a splitting field of the separable polynomial f(x) ∈ K[x]. Then
L|K is finite. From Theorem 16.4.4, we get that L|K is separable, since L = K(a1, …, an)
with each ai separable over K. Moreover, L|K is normal from Definition 8.2.1. Hence,
(2) implies (3).
Finally, suppose that L|K is finite, normal, and separable. Since L|K is finite and
separable, from Theorem 16.4.4 there exist exactly [L : K] monomorphisms α : L → L̄,
with L̄ the algebraic closure of L and α|K the identity on K. Since L|K is normal, these
monomorphisms are already automorphisms of L from Theorem 8.2.2.
Hence, [L : K] ≤ |Aut(L|K)|. Furthermore, |L : K| ≥ |Aut(L|K)| from Theorem 15.4.3.
Combining these, we have [L : K] = |Aut(L|K)|, and hence L|K is a Galois extension from
Theorem 15.4.9. Therefore, (3) implies (1), completing the proof.

Recall that any field of characteristic 0 is perfect, and therefore any finite extension
is separable. Applying this to ℚ implies that the Galois extensions of the rationals are
precisely the splitting fields of polynomials.

Corollary 16.5.2. The Galois extensions of the rationals are precisely the splitting fields
of polynomials in ℚ[x].

Theorem 16.5.3. Let L|K be a finite, separable field extension. Then there exists an exten-
sion field M of L such that M|K is a Galois extension.

Proof. Let L = K(a1 , . . . , an ) with all ai separable over K. Let fi (x) be the minimal poly-
nomial of ai over K. Then each fi (x), and hence also f (x) = f1 (x) ⋅ ⋅ ⋅ fn (x), is separable
over K. Let M be the splitting field of f (x) over K. Then M|K is a Galois extension from
Theorem 16.5.1.

Example 16.5.4. Let K = ℚ be the rationals, and let f(x) = x⁴ − 2 ∈ ℚ[x]. From Chapter 8,
we know that L = ℚ(∜2, i) is a splitting field of f(x). By the Eisenstein criterion, f(x) is
irreducible, and [L : ℚ] = 8. Moreover,

∜2, i∜2, −∜2, −i∜2

are the zeros of f(x). Since the rationals are perfect, f(x) is separable. L|K is a Galois
extension by Theorem 16.5.1. From the calculations in Chapter 15, we have

|Aut(L|K)| = |Aut(L)| = [L : K] = 8.

Let

G = Aut(L|K) = Aut(L|ℚ) = Aut(L).

We want to determine the subgroup lattice of the Galois group G. We show G ≅ D4 , the
dihedral group of order 8. Since there are 4 zeros of f (x), and G permutes these, G must
be a subgroup of S4 , and since the order is 8, G is a 2-Sylow subgroup of S4 . From this,
we have that

G = ⟨(2, 4), (1, 2, 3, 4)⟩.

If we let τ = (2, 4) and σ = (1, 2, 3, 4), we get the isomorphism between G and D4. From
Theorem 14.1.1, we know that D4 = ⟨r, f ; r⁴ = f² = (rf)² = 1⟩.
This can also be seen in the following manner. Let

a1 = ∜2, a2 = i∜2, a3 = −∜2, a4 = −i∜2.

Let α ∈ G. α is determined if we know α(∜2) and α(i). The possibilities for α(i) are i or
−i; that is, the zeros of x² + 1.
The possibilities for α(∜2) are the 4 zeros of f(x) = x⁴ − 2. Hence, we have 8 possibilities
for α. These are exactly the elements of the group G. We have δ, τ ∈ G with

δ(∜2) = i∜2, δ(i) = i

and

τ(∜2) = ∜2, τ(i) = −i.

It is straightforward to show that δ has order 4, τ has order 2, and δτ has order 2. These
define a group of order 8 isomorphic to D4 , and since G has 8 elements, this must be all
of G.

We now look at the subgroup lattice of G, and then the corresponding field lattice.
Let δ and τ be as above. Then G has 5 subgroups of order 2

{1, δ²}, {1, τ}, {1, δτ}, {1, δ²τ}, {1, δ³τ}.

Of these only {1, δ²} is normal in G.
G has 3 subgroups of order 4

{1, δ, δ², δ³}, {1, δ², τ, τδ²}, {1, δ², δτ, δ³τ},

and all are normal since they all have index 2.
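
The subgroup counts above can be confirmed by enumeration. The following is a minimal sketch, not part of the original text: δ and τ are encoded as permutations of the four zeros a1, …, a4 (indices 0 to 3), and all helper names are chosen here for illustration.

```python
from itertools import combinations


def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(generators):
    """Return the subgroup generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def is_subgroup(subset):
    """A finite nonempty subset closed under composition is a subgroup."""
    return all(compose(a, b) in subset for a in subset for b in subset)


def is_normal(subset, group):
    """True if g h g^(-1) lies in subset for all g in group and h in subset."""
    return all(
        compose(compose(g, h), inverse(g)) in subset for g in group for h in subset
    )


identity = (0, 1, 2, 3)
delta = (1, 2, 3, 0)   # a1 -> a2 -> a3 -> a4 -> a1
tau = (0, 3, 2, 1)     # complex conjugation: swaps a2 and a4

G = generate([delta, tau])
others = sorted(G - {identity})

subgroups = {2: [], 4: []}
for k in (2, 4):
    for rest in combinations(others, k - 1):
        H = frozenset(rest) | {identity}
        if is_subgroup(H):
            subgroups[k].append(H)

normal_order_2 = [H for H in subgroups[2] if is_normal(H, G)]

print(len(G), len(subgroups[2]), len(subgroups[4]), len(normal_order_2))
```

The enumeration finds 8 group elements, 5 subgroups of order 2 (exactly one of them normal, namely {1, δ²}), and 3 subgroups of order 4.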


Hence, we have the following subgroup lattice:

From this we construct the lattice of fields and intermediate fields. Since G has
10 subgroups (including {1} and G itself), by the fundamental theorem of Galois theory there
are 10 intermediate fields in L|ℚ (including ℚ and L), namely, the fix fields Fix(L, H), where
H is a subgroup of G. In the identification, the extension field corresponding to the whole group G is the
ground field ℚ (recall that the lattice of fields is the inverted lattice of the subgroups),
whereas the extension field corresponding to the identity is the whole field L. We now
consider the other proper subgroups. Let δ, τ be as before.
(1) Let M1 = Fix(L, {1, τ}). Now, {1, τ} fixes ℚ(∜2) elementwise such that ℚ(∜2) ⊂ M1.
Furthermore, [L : M1] = |{1, τ}| = 2, and also [L : ℚ(∜2)] = 2. Hence, M1 = ℚ(∜2).
(2) Consider M2 = Fix(L, {1, τδ}). We have the following:

τδ(∜2) = τ(i∜2) = −i∜2
τδ(i∜2) = τ(−∜2) = −∜2
τδ(−∜2) = τ(−i∜2) = i∜2
τδ(−i∜2) = τ(∜2) = ∜2.

It follows that τδ fixes (1 − i)∜2, and hence M2 = ℚ((1 − i)∜2).
(3) Consider M3 = Fix(L, {1, τδ²}). The map τδ² interchanges a1 and a3 and fixes a2
and a4. Therefore, M3 = ℚ(i∜2).
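
The claim in (2) that τδ fixes (1 − i)∜2 can be verified in one line. The sketch below uses only τδ(∜2) = −i∜2 from the list in (2), together with τδ(i) = τ(δ(i)) = τ(i) = −i:

```latex
\[
\tau\delta\bigl((1 - i)\sqrt[4]{2}\bigr)
  = \bigl(1 - \tau\delta(i)\bigr)\,\tau\delta\bigl(\sqrt[4]{2}\bigr)
  = (1 + i)\,\bigl(-i\sqrt[4]{2}\bigr)
  = (1 - i)\sqrt[4]{2}.
\]
```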

In an analogous manner, we can then consider the other 5 proper subgroups and corre-
sponding intermediate fields. We get the following lattice of fields and subfields:

16.6 The Primitive Element Theorem


In this section, we describe finite separable field extensions as simple extensions. It fol-
lows that a Galois extension is always a simple extension.

Theorem 16.6.1 (Primitive element theorem). Let L = K(γ1 , . . . , γn ), and suppose that
each γi is separable over K. Then there exists a γ0 ∈ L such that L = K(γ0 ). The element
γ0 is called a primitive element.

Proof. Suppose first that K is a finite field. Then L is also a finite field, and therefore
L⋆ = ⟨γ0 ⟩ is cyclic. Therefore, L = K(γ0 ), and the theorem is proved if K is a finite field.
Now suppose that K is infinite. Inductively, it suffices to prove the theorem for n = 2.
Hence, let α, β ∈ L be separable over K. We must show that there exists a γ ∈ L with
K(α, β) = K(γ).
Let L̄ be the splitting field of the polynomial mα(x)mβ(x) over L, where mα(x), mβ(x)
are, respectively, the minimal polynomials of α, β over K. In L̄[x], we have the following:

mα (x) = (x − α1 )(x − α2 ) ⋅ ⋅ ⋅ (x − αs ) with α = α1


mβ (x) = (x − β1 )(x − β2 ) ⋅ ⋅ ⋅ (x − βt ) with β = β1 .

By assumption the αi and the βj are, respectively, pairwise distinct.


For each pair (i, j) with 1 ≤ i ≤ s, 2 ≤ j ≤ t, the equation

α1 + zβ1 = αi + zβj

has exactly one solution z ∈ L̄, since βj − β1 ≠ 0 if j ≥ 2. Since K is infinite, there exists a
c ∈ K with

α1 + cβ1 ≠ αi + cβj

for all i, j with 1 ≤ i ≤ s, 2 ≤ j ≤ t. With such a value c ∈ K, we define

γ = α + cβ = α1 + cβ1 .

We claim that K(α, β) = K(γ) holds. It suffices to show that β ∈ K(γ), for then α =
γ − cβ ∈ K(γ). This implies that K(α, β) ⊂ K(γ), and since γ ∈ K(α, β), it follows that
K(α, β) = K(γ). To show that β ∈ K(γ), we first define f (x) = mα (γ − cx), and let d(x) =
gcd(f (x), mβ (x)). We may assume that d(x) is monic. We show that d(x) = x − β. Then
β ∈ K(γ), since d(x) ∈ K(γ)[x].
Assume first that d(x) = 1. Then gcd(f(x), mβ(x)) = 1, and f(x) and mβ(x) are also
relatively prime in L̄[x]. This is a contradiction, since f(x) and mβ(x) have the common
zero β ∈ L̄, and hence the common divisor x − β.
Therefore, d(x) ≠ 1, so deg(d(x)) ≥ 1.
The polynomial d(x) is a divisor of mβ(x), and hence d(x) splits into linear factors
of the form x − βj, 1 ≤ j ≤ t in L̄[x]. The proof is completed if we can show that no linear
factor of the form x − βj with 2 ≤ j ≤ t is a divisor of f(x). That is, we must show that
f(βj) ≠ 0 in L̄ if j ≥ 2.
Now f(βj) = mα(γ − cβj) = mα(α1 + cβ1 − cβj). Suppose that f(βj) = 0 for some j ≥ 2.
This would imply that αi = α1 + cβ1 − cβj for some i; that is, α1 + cβ1 = αi + cβj with j ≥ 2.
This contradicts the choice of the value c. Therefore, f(βj) ≠ 0 if j ≥ 2, completing the proof.
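
The choice of c in this proof is constructive: only finitely many values have to be avoided. The following sketch illustrates the choice numerically for α = ∜2 and β = i, the generators from Example 16.5.4. It works with floating-point approximations of the conjugates, so it illustrates the argument rather than proving anything; the function name `primitive_shift` and the tolerance are assumptions made for this illustration.

```python
def primitive_shift(alpha_conjugates, beta_conjugates, tol=1e-9, max_c=100):
    """Return the smallest positive integer c with
    alpha_1 + c*beta_1 != alpha_i + c*beta_j for all i and all j >= 2.

    alpha_conjugates[0] and beta_conjugates[0] are alpha and beta themselves.
    """
    a1, b1 = alpha_conjugates[0], beta_conjugates[0]
    for c in range(1, max_c + 1):
        gamma = a1 + c * b1
        if all(
            abs(gamma - (ai + c * bj)) > tol
            for ai in alpha_conjugates
            for bj in beta_conjugates[1:]
        ):
            return c
    raise ValueError("no suitable c found up to max_c")


r = 2 ** 0.25
alpha_conjugates = [r, 1j * r, -r, -1j * r]   # zeros of x^4 - 2
beta_conjugates = [1j, -1j]                   # zeros of x^2 + 1

c = primitive_shift(alpha_conjugates, beta_conjugates)
gamma = alpha_conjugates[0] + c * beta_conjugates[0]

# The values alpha_i + c*beta_j are the candidate conjugates of gamma.
images = [ai + c * bj for ai in alpha_conjugates for bj in beta_conjugates]
distinct = all(
    abs(u - v) > 1e-9 for k, u in enumerate(images) for v in images[k + 1:]
)

print(c, gamma, distinct)
```

Here c = 1 already works, which suggests γ = ∜2 + i as a primitive element of ℚ(∜2, i) over ℚ. The eight values αi + cβj are pairwise distinct, consistent with [ℚ(∜2, i) : ℚ] = 8.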

In the above theorem, it is sufficient to assume that n − 1 of γ1, …, γn are separable
over K. The proof is similar. We only need that the β1, …, βt are pairwise distinct if β is
separable over K to show that K(α, β) = K(γ) for some γ ∈ L.
If K is a perfect field, then every finite extension is separable. Therefore, we get the
following corollary:

Corollary 16.6.2. Let L|K be a finite extension with K a perfect field. Then L = K(γ) for
some γ ∈ L.

Corollary 16.6.3. Let L|K be a finite extension with K a perfect field. Then there exist only
finitely many intermediate fields E with K ⊂ E ⊂ L.

Proof. Since K is a perfect field, we have L = K(γ) for some γ ∈ L. Let mγ(x) ∈ K[x]
be the minimal polynomial of γ over K, and let L̄ be the splitting field of mγ(x) over K.
Then L̄|K is a Galois extension; hence, there are only finitely many intermediate fields
between K and L̄. Therefore, there are also only finitely many fields between K and L.

Suppose that L|K is algebraic. Then, in general, L = K(γ) for some γ ∈ L if and only
if there exist only finitely many intermediate fields E with K ⊂ E ⊂ L.

This condition on intermediate fields implies that L|K is finite if L|K is algebraic.
Hence, we have proved this result, in the case that K is perfect. The general case is dis-
cussed in the book of S. Lang [13].

16.7 Exercises
1. Let f(x) = x⁴ − 8x³ + 24x² − 32x + 14 ∈ ℚ[x], and let v ∈ ℂ be a zero of f. Let α := v(4 − v),
and K a splitting field of f over ℚ. Show the following:
(i) f is irreducible over ℚ, and f (x) = f (4 − x).
(ii) There is exactly one automorphism σ of ℚ(v) with σ(v) = 4 − v.
(iii) L := ℚ(α) is the Fix field of σ and |L : ℚ| = 2.
(iv) Determine the minimal polynomial of α over ℚ and determine α.
(v) |ℚ(v) : L| = 2, and determine the minimal polynomial of v over L; also deter-
mine v and all other zeros of f (x).
(vi) Determine the degree of |K : ℚ|.
(vii) Determine the structure of Aut(K|ℚ).
2. Let L|K be a field extension and f ∈ K[x] a separable polynomial. Let Z be a splitting
field of f over L and Z0 a splitting field of f over K. Show that Aut(Z|L) is isomorphic
to a subgroup of Aut(Z0 |K).
3. Let L|K be a field extension and v ∈ L. Show that K(v + c) = K(v) for each element c ∈ K,
and that K(cv) = K(v) for c ≠ 0.
4. Let v = √2 + √3 and let K = ℚ(v). Show that √2 and √3 can be written as ℚ-linear
combinations of 1, v, v², v³. Conclude that K = ℚ(√2, √3).
5. Let L be the splitting field of x 3 − 5 over ℚ in ℂ. Determine a primitive element t of
L over ℚ.
17 Applications of Galois Theory
As we mentioned in Chapter 1, Galois theory was originally developed as part of the
proof that polynomial equations of degree 5 or higher over the rationals cannot be
solved by formulas in terms of radicals. In this chapter, we do this first and prove the in-
solvability of the quintic polynomials by radicals. To do this, we must examine in detail
what we call radical extensions.
We then return to some geometric material we started in Chapter 6. There, using
general field extensions, we proved the impossibility of certain geometric compass and
straightedge constructions. Here, we use Galois theory to consider constructible n-gons.
Finally, we will use Galois theory to present a proof of the fundamental theorem of
algebra, which says, essentially, that the complex number field ℂ is algebraically closed.
In Chapter 17, we always assume that K is a field of characteristic 0; in particular, K
is perfect. We remark that some parts of Sections 17.1–17.4 go through for finite fields of
characteristic p > 3.

17.1 Field Extensions by Radicals


We would like to use Galois theory to prove the insolvability by radicals of polynomial
equations of degree 5 or higher. To do this we must introduce extensions by radicals and
solvability by radicals.

Definition 17.1.1. Let L|K be a field extension.


(1) Each zero of a polynomial $x^n - a \in K[x]$ in L is called a radical (over K). We denote
it by $\sqrt[n]{a}$ (if a more detailed identification is not necessary).
(2) L is called a simple extension of K by a radical if $L = K(\sqrt[n]{a})$ for some a ∈ K.
(3) L is called an extension of K by radicals if there is a chain of fields

K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lm = L

such that $L_i$ is a simple extension of $L_{i-1}$ by a radical for each i = 1, . . . , m.


(4) Let f (x) ∈ K[x]. Then the equation f (x) = 0 is solvable by radicals, or just solvable,
if the splitting field of f (x) over K is contained in an extension of K by radicals.

In proving the insolvability of the quintic polynomial, we will look for necessary
and sufficient conditions for the solvability of polynomial equations. Our main result
will be that if f(x) ∈ K[x], then f(x) = 0 is solvable by radicals over K if and only if the Galois group of the
splitting field of f(x) over K is a solvable group (see Chapter 11).
In the remainder of this section, we assume that all fields have characteristic zero.
The next theorem gives a characterization of simple extensions by radicals:

https://doi.org/10.1515/9783111142524-017

Theorem 17.1.2. Let L|K be a field extension and n ∈ ℕ. Assume that the polynomial $x^n - 1$
splits into linear factors in K[x] so that K contains all the n-th roots of unity.
Then $L = K(\sqrt[n]{a})$ for some a ∈ K if and only if L is a Galois extension of K and
Aut(L|K) ≅ ℤ/mℤ for some m ∈ ℕ with m | n.

Proof. The n-th roots of unity, that is, the zeros of the polynomial x n − 1 ∈ K[x], form a
cyclic multiplicative group ℱ ⊂ K ⋆ of order n, since each finite subgroup of the multi-
plicative group K ⋆ of K is cyclic, and |ℱ | = n. We call an n-th root of unity ω primitive if
ℱ = ⟨ω⟩.
Now let $L = K(\sqrt[n]{a})$ with a ∈ K; that is, L = K(β) with $\beta^n = a \in K$. Let ω be a
primitive n-th root of unity. With this β, the elements $\omega\beta, \omega^2\beta, \ldots, \omega^n\beta = \beta$ are zeros of
$x^n - a$. Hence, the polynomial $x^n - a$ splits into linear factors over L, and L = K(β) is
a splitting field of $x^n - a$ over K. It follows that L|K is a Galois extension.
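The description of the zeros of $x^n - a$ as $\omega^k\beta$ can be checked numerically. The following is only a floating-point sketch over ℂ, not part of the proof; the function name `zeros_of_x_n_minus_a` and the sample values a = 2, n = 6 are illustrative assumptions.

```python
import cmath

def zeros_of_x_n_minus_a(a, n):
    """Return the n complex zeros of x^n - a in the form omega^k * beta."""
    beta = complex(a) ** (1.0 / n)        # one fixed n-th root of a
    omega = cmath.exp(2j * cmath.pi / n)  # a primitive n-th root of unity
    return [omega ** k * beta for k in range(n)]

zeros = zeros_of_x_n_minus_a(2, 6)
for z in zeros:
    # each residual |z^6 - 2| should vanish up to rounding error
    print(z, abs(z ** 6 - 2))
```

All six values are pairwise distinct, which reflects that β together with the n-th roots of unity already yields every zero of the polynomial.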
Let σ ∈ Aut(L|K). Then $\sigma(\beta) = \omega^{\nu}\beta$ for some 0 < ν ≤ n. The element $\omega^{\nu}$ is uniquely
determined by σ, and we may write $\omega^{\nu} = \omega_{\sigma}$.
Consider the map ϕ : Aut(L|K) → ℱ given by $\sigma \mapsto \omega_{\sigma}$, where $\omega_{\sigma}$ is defined as above
by $\sigma(\beta) = \omega_{\sigma}\beta$. If τ, σ ∈ Aut(L|K), then

\[ \sigma\tau(\beta) = \sigma(\omega_{\tau})\,\sigma(\beta) = \omega_{\tau}\,\omega_{\sigma}\,\beta, \]

because $\omega_{\tau} \in K$.
Therefore, ϕ(στ) = ϕ(σ)ϕ(τ); hence, ϕ is a homomorphism. The kernel ker(ϕ) consists of
all the K-automorphisms σ of L for which σ(β) = β. However, since L = K(β), it
follows that ker(ϕ) contains only the identity. The Galois group Aut(L|K) is, therefore,
isomorphic to a subgroup of ℱ. Since ℱ is cyclic of order n, we have that Aut(L|K) is
cyclic of order m for some m | n, completing one direction of the theorem.
Conversely, first suppose that L|K is a Galois extension with Aut(L|K) = ℤn , a cyclic
group of order n. Let σ be a generator of Aut(L|K). This is equivalent to

Aut(L|K) = {σ, σ 2 , . . . , σ n = 1}.

Let ω be a primitive n-th root of unity. Then, by assumption, ω ∈ K, σ(ω) = ω, and
$\mathcal{F} = \{\omega, \omega^2, \ldots, \omega^n = 1\}$. Furthermore, the pairwise distinct automorphisms $\sigma^{\nu}$, ν = 1, 2, . . . , n,
of L are linearly independent; in particular, there exists an η ∈ L such that

\[ \omega \star \eta = \sum_{\nu=1}^{n} \omega^{\nu}\sigma^{\nu}(\eta) \neq 0. \]

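The element ω ⋆ η is called the Lagrange resolvent of ω by η. We fix such an element
η ∈ L. Then we get, since σ(ω) = ω,

\[
\begin{aligned}
\sigma(\omega \star \eta) &= \sum_{\nu=1}^{n} \omega^{\nu}\sigma^{\nu+1}(\eta) = \omega^{-1}\sum_{\nu=1}^{n} \omega^{\nu+1}\sigma^{\nu+1}(\eta) = \omega^{-1}\sum_{\nu=2}^{n+1} \omega^{\nu}\sigma^{\nu}(\eta) \\
&= \omega^{-1}\sum_{\nu=1}^{n} \omega^{\nu}\sigma^{\nu}(\eta) = \omega^{-1}(\omega \star \eta).
\end{aligned}
\]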

Moreover, $\sigma^{\mu}(\omega \star \eta) = \omega^{-\mu}(\omega \star \eta)$, μ = 1, 2, . . . , n. Hence, the only K-automorphism of L
that fixes ω ⋆ η is the identity. Therefore, Aut(L|K(ω ⋆ η)) = {1}; hence, L = K(ω ⋆ η)
by the fundamental theorem of Galois theory.
Furthermore,

\[ \sigma\big((\omega \star \eta)^n\big) = \big(\sigma(\omega \star \eta)\big)^n = \big(\omega^{-1}(\omega \star \eta)\big)^n = \omega^{-n}(\omega \star \eta)^n = (\omega \star \eta)^n. \]

Therefore, $(\omega \star \eta)^n \in \mathrm{Fix}(L, \mathrm{Aut}(L|K)) = K$, again from the fundamental theorem of
Galois theory. If we set $a = (\omega \star \eta)^n$, then first a ∈ K, and second $L = K(\sqrt[n]{a}) = K(\omega \star \eta)$.
This proves the result in the case where m = n. We now use this to prove it in general.
Finally, suppose that L|K is a Galois extension with Aut(L|K) = ℤm, a cyclic group
of order m, where n = qm for some q ≥ 1. Since m | n, the field K also contains all the m-th roots
of unity, so $L = K(\sqrt[m]{b})$ for some b ∈ K by
the above argument. Hence, L = K(β) with $\beta^m \in K$. Then certainly, $a = \beta^n = (\beta^m)^q \in K$;
therefore, $L = K(\beta) = K(\sqrt[n]{a})$ for some a ∈ K, completing the general case.

We next show that every extension by radicals is contained in a Galois extension by


radicals.

Theorem 17.1.3. Each extension L of K by radicals is contained in a Galois extension L̃ of
K by radicals. This means that there is an extension L̃ of K by radicals with L ⊂ L̃, and L̃|K
is a Galois extension.

Proof. We use induction on the degree m = [L : K]. Suppose that m = 1. If $L = K(\sqrt[n]{a})$,
then if ω is a primitive n-th root of unity, define K̃ = K(ω) and $\tilde{L} = \tilde{K}(\sqrt[n]{a})$. We then get
the chain K ⊂ K̃ ⊂ L̃ with L ⊂ L̃, and L̃|K is a Galois extension. This last statement is due
to the fact that L̃ is the splitting field of the polynomial $x^n - a \in K[x]$ over K. Hence, the
theorem is true if m = 1.
Now suppose that m ≥ 2, and suppose that the theorem is true for all extensions F
of K by radicals with [F : K] < m.
Since m ≥ 2 by the definition of extension by radicals, there exists a simple extension
L|E by a radical. That is, there exists a field E with

K ⊂ E ⊂ L, [L : E] ≥ 2

and $L = E(\sqrt[n]{a})$ for some a ∈ E, n ∈ ℕ. Now [E : K] < m. Therefore, by the inductive
hypothesis, there exists a Galois extension by radicals Ẽ of K with E ⊂ Ẽ.
Let G = Aut(Ẽ|K), and let L̃ be the splitting field of the polynomial $f(x) = m_a(x^n) \in K[x]$
over Ẽ, where $m_a(x)$ is the minimal polynomial of a over K. We show that L̃ has the
desired properties.
Now $\sqrt[n]{a} \in L$ is a zero of the polynomial f(x), and E ⊂ Ẽ ⊂ L̃. Therefore, L̃ contains
an E-isomorphic image of $L = E(\sqrt[n]{a})$; hence, we may consider L̃ as an extension of L.

Since Ẽ is a Galois extension of K, the polynomial f(x) may be factored as

\[ f(x) = (x^n - \alpha_1) \cdots (x^n - \alpha_s) \]

with $\alpha_i \in \tilde{E}$ for i = 1, . . . , s. All zeros of f(x) in L̃ are radicals over Ẽ. Therefore, L̃ is an
extension by radicals of Ẽ. Since Ẽ is also an extension by radicals of K, we obtain that
L̃ is an extension by radicals of K.
Since Ẽ is a Galois extension of K, we have that Ẽ is a splitting field of a polynomial
g(x) ∈ K[x]. Furthermore, L̃ is a splitting field of f(x) ∈ K[x] over Ẽ. Altogether then, we
have that L̃ is a splitting field of f(x)g(x) ∈ K[x] over K. Therefore, L̃ is a Galois extension
of K, completing the proof.

We will eventually show that a polynomial equation is solvable by radicals if and
only if the corresponding Galois group is a solvable group. We now begin to find conditions
under which the Galois group is solvable.

Lemma 17.1.4. Let K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr = L be a chain of fields such that the following


hold:
(i) L is a Galois extension of K.
(ii) Lj is a Galois extension of Lj−1 for j = 1, . . . , r.
(iii) Gj = Aut(Lj |Lj−1 ) is Abelian for j = 1, . . . , r.

Then G = Aut(L|K) is solvable.

Proof. We prove the lemma by induction on r. If r = 0, then G = {1}, and there is nothing
to prove. Suppose then that r ≥ 1, and assume that the lemma holds for all such chains
of fields with a length r′ < r. Since L1|K is a Galois extension, Aut(L|L1) is a normal
subgroup of G by the fundamental theorem of Galois theory. Moreover,

G1 = Aut(L1|K) ≅ G/Aut(L|L1).

Since G1 is an Abelian group, it is solvable, and by the inductive hypothesis, applied to the
chain L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr = L, the group Aut(L|L1) is solvable.
Therefore, G is solvable (see Theorem 12.1.4).

Lemma 17.1.5. Let L|K be a field extension. Let K̃ and L̃ be the splitting fields of the
polynomial $x^n - 1 \in K[x]$ over K and L, respectively. Since K ⊂ L, we have K̃ ⊂ L̃. Then the
following hold:
(1) If σ ∈ Aut(L̃|L), then σ|K̃ ∈ Aut(K̃|K), and the map

Aut(L̃|L) → Aut(K̃|K), given by σ ↦ σ|K̃,

is an injective homomorphism.
(2) Suppose that in addition L|K is a Galois extension. Then L̃|K is also a Galois extension.
If furthermore, σ ∈ Aut(L̃|K̃), then σ|L ∈ Aut(L|K), and

Aut(L̃|K̃) → Aut(L|K), given by σ ↦ σ|L,

is an injective homomorphism.

Proof. (1) Let ω be a primitive nth root of unity. Then K̃ = K(ω), and L̃ = L(ω). Each
σ ∈ Aut(L̃|L) maps ω onto a primitive nth root of unity, and fixes K ⊂ L elementwise.
Hence, from σ ∈ Aut(L̃|L), we get that σ|K̃ ∈ Aut(K̃|K). Certainly, the map σ ↦ σ|K̃
defines a homomorphism Aut(L̃|L) → Aut(K̃|K). Let σ|K̃ = 1 with σ ∈ Aut(L̃|L). Then
σ(ω) = ω; therefore, we have already that σ = 1, since L̃ = L(ω).
(2) If L is the splitting field of a polynomial g(x) over K, then L̃ is the splitting field of
$g(x)(x^n - 1)$ over K. Hence, L̃|K is a Galois extension. Therefore, K ⊂ L ⊂ L̃, and L̃|K, L̃|L,
and L|K are all Galois extensions. Therefore, from the fundamental theorem of Galois
theory

Aut(L|K) = {σ|L : σ ∈ Aut(L̃|K)}.

In particular, σ|L ∈ Aut(L|K) if σ ∈ Aut(L̃|K̃). Certainly, the map Aut(L̃|K̃) → Aut(L|K),
given by σ ↦ σ|L, is a homomorphism. From σ ∈ Aut(L̃|K̃), we get that σ(ω) = ω,
where, as above, ω is a primitive nth root of unity. Therefore, if σ|L = 1, then already,
σ = 1, since L̃ = L(ω). Hence, the map is injective.

17.2 Cyclotomic Extensions


Very important in the solvability by radicals problem are the splitting fields of the
polynomials $x^n - 1$ over ℚ. These are called cyclotomic fields.

Definition 17.2.1. The splitting field of the polynomial $x^n - 1 \in \mathbb{Q}[x]$ with n ≥ 2 is called
the nth cyclotomic field, denoted by $k_n$.

We have $k_n = \mathbb{Q}(\omega)$, where ω is a primitive nth root of unity; for example, consider
$\omega = e^{2\pi i/n}$ over ℚ. The extension $k_n|\mathbb{Q}$ is a Galois extension, and the Galois group $\mathrm{Aut}(k_n|\mathbb{Q})$ is the set of
automorphisms $\sigma_m : \omega \mapsto \omega^m$ with 1 ≤ m ≤ n and gcd(m, n) = 1.
To understand this group G, we need the following concept: A prime residue class
modulo n is a residue class a + nℤ with gcd(a, n) = 1. The set of the prime residue classes
modulo n is just the set of invertible elements with respect to multiplication of the ring ℤ/nℤ.
This forms a multiplicative group that we denote by $(\mathbb{Z}/n\mathbb{Z})^{\star} = P_n$. We have $|P_n| = \phi(n)$,
where ϕ(n) is the Euler phi-function. If $G = \mathrm{Aut}(k_n|\mathbb{Q})$, then $G \cong P_n$ under the map
$\sigma_m \mapsto m + n\mathbb{Z}$. If n = p is a prime number, then $G = \mathrm{Aut}(k_p|\mathbb{Q})$ is cyclic with |G| = p − 1.
If $n = p^2$, then $|G| = |\mathrm{Aut}(k_{p^2}|\mathbb{Q})| = p(p - 1)$, since the primitive $p^2$-th roots of unity are the zeros of

\[ \frac{x^{p^2} - 1}{x - 1} \cdot \frac{x - 1}{x^{p} - 1} = \frac{x^{p^2} - 1}{x^{p} - 1} = x^{p(p-1)} + x^{p(p-2)} + \cdots + x^{p} + 1. \]
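The group $P_n$ of prime residue classes is easy to explore by direct computation. The following is a minimal sketch using only the Python standard library; the function names `prime_residues`, `element_order`, and `is_cyclic_unit_group` are illustrative choices, not notation from the text.

```python
from math import gcd

def prime_residues(n):
    """Representatives a with 1 <= a < n and gcd(a, n) = 1, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [a for a in range(1, n) if gcd(a, n) == 1]

def element_order(a, n):
    """Multiplicative order of the prime residue class a + nZ."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

def is_cyclic_unit_group(n):
    """True if some prime residue class generates all of P_n."""
    units = prime_residues(n)
    return any(element_order(a, n) == len(units) for a in units)

for n in (7, 8, 9, 25):
    print(n, len(prime_residues(n)), is_cyclic_unit_group(n))
```

For n = 7 the group has p − 1 = 6 elements and is cyclic, for n = 9 and n = 25 the order is p(p − 1), and n = 8 gives a group of order 4 that is not cyclic.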

Lemma 17.2.2. Let K be a field and K̃ be the splitting field of $x^n - 1$ over K. Then Aut(K̃|K)
is Abelian.

Proof. We apply Lemma 17.1.5 for the field extension K|ℚ. This can be done since the
characteristic of K is zero, and ℚ is the prime field of K. It follows that Aut(K̃|K) is
isomorphic to a subgroup of Aut(ℚ̃|ℚ) from part (1) of Lemma 17.1.5. But $\tilde{\mathbb{Q}} = k_n$, and hence
Aut(ℚ̃|ℚ) is Abelian. Therefore, Aut(K̃|K) is Abelian.

17.3 Solvability and Galois Extensions


In this section, we prove that solvability by radicals is equivalent to the solvability of the
Galois group.

Theorem 17.3.1. Let L|K be a Galois extension of K by radicals. Then G = Aut(L|K) is a


solvable group.

Proof. Suppose that L|K is a Galois extension by radicals. Then we have a chain of fields

K = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Lr = L

such that $L_j = L_{j-1}(\sqrt[n_j]{a_j})$ for some $a_j \in L_{j-1}$. Let $n = n_1 \cdots n_r$, and let $\tilde{L}_j$ be the splitting field
of the polynomial $x^n - 1 \in K[x]$ over $L_j$ for each j = 0, 1, . . . , r. Then $\tilde{L}_j = \tilde{L}_{j-1}(\sqrt[n_j]{a_j})$, and
we get the chain

\[ K \subset \tilde{K} = \tilde{L}_0 \subset \tilde{L}_1 \subset \cdots \subset \tilde{L}_r = \tilde{L}. \]

From part (2) of Lemma 17.1.5, we get that L̃|K is a Galois extension. Furthermore,
$\tilde{L}_j|\tilde{L}_{j-1}$ is a Galois extension with $\mathrm{Aut}(\tilde{L}_j|\tilde{L}_{j-1})$ cyclic from Theorem 17.1.2. In particular,
$\mathrm{Aut}(\tilde{L}_j|\tilde{L}_{j-1})$ is Abelian. The group Aut(K̃|K) is Abelian from Lemma 17.2.2. Therefore,
we may apply Lemma 17.1.4 to the chain

\[ K \subset \tilde{K} = \tilde{L}_0 \subset \cdots \subset \tilde{L}_r = \tilde{L}. \]

Therefore, G̃ = Aut(L̃|K) is solvable. The group G = Aut(L|K) is a homomorphic image
of G̃ from the fundamental theorem of Galois theory. Since homomorphic images of
solvable groups are still solvable (see Theorem 12.1.3), it follows that G is solvable.

Lemma 17.3.2. Let L|K be a Galois extension, and suppose that G = Aut(L|K) is solv-
able. Assume further that K contains all q-th roots of unity for each prime divisor q of
m = [L : K]. Then L is an extension of K by radicals.

Proof. Let L|K be a Galois extension, and suppose that G = Aut(L|K) is solvable; also
assume that K contains all the q-th roots of unity for each prime divisor q of m = [L : K].
We prove the result by induction on m.

If m = 1, then L = K, and the result is clear. Now suppose that m ≥ 2, and assume
that the result holds for all Galois extensions L′|K′ with [L′ : K′] < m. Now
G = Aut(L|K) is solvable, and G is nontrivial since m ≥ 2.
From Lemma 12.1.2 and Theorem 13.3.5, it follows that there is a prime divisor q of m and a normal subgroup H of
G with G/H cyclic of order q. Let E = Fix(L, H). From the fundamental theorem of Galois
theory, E|K is a Galois extension with Aut(E|K) ≅ G/H, and hence Aut(E|K) is cyclic of
order q. Since K contains all the q-th roots of unity, Theorem 17.1.2 shows that E|K is a simple extension of K by a radical. The proof is
completed if we can show that L is an extension of E by radicals.
The extension L|E is a Galois extension, and the group Aut(L|E) is solvable, since it
is a subgroup of G = Aut(L|K). Each prime divisor p of [L : E] is also a prime divisor of
m = [L : K] by the degree formula. Hence, as an extension of K, the field E contains all
the p-th roots of unity. Finally,

\[ [L : E] = \frac{[L : K]}{[E : K]} = \frac{m}{q} < m. \]

Therefore, L|E is an extension of E by radicals from the inductive assumption, completing
the proof.

17.4 The Insolvability of the Quintic Polynomial


We are now able to prove the insolvability of the quintic polynomial. This is one of the
most important applications of Galois theory. As aforementioned, we do this by equating
the solvability of a polynomial equation by radicals to the solvability of the Galois group
of the splitting field of this polynomial.

Theorem 17.4.1. Let K be a field of characteristic 0, and let f (x) ∈ K[x]. Suppose that L
is the splitting field of f (x) over K. Then the polynomial equation f (x) = 0 is solvable by
radicals if and only if Aut(L|K) is solvable.

Proof. Suppose first that f(x) = 0 is solvable by radicals. Then L is contained in an
extension L′ of K by radicals. Hence, L is contained in a Galois extension L̃ of K by radicals
from Theorem 17.1.3. The group G̃ = Aut(L̃|K) is solvable from Theorem 17.3.1. Furthermore,
L|K is a Galois extension. Therefore, the Galois group Aut(L|K) is solvable as a
homomorphic image of G̃ (see Theorem 12.1.3).
Conversely, suppose that the group Aut(L|K) is solvable. Let $q_1, \ldots, q_r$ be the prime
divisors of m = [L : K], and let $n = q_1 \cdots q_r$. Let K̃ and L̃ be the splitting fields of the
polynomial $x^n - 1 \in K[x]$ over K and L, respectively. We have K̃ ⊂ L̃. From part (2) of
Lemma 17.1.5, we have that L̃|K is a Galois extension, and Aut(L̃|K̃) is isomorphic to a
subgroup of Aut(L|K). From this, we first obtain that [L̃ : K̃] = |Aut(L̃|K̃)| is a divisor of
[L : K] = |Aut(L|K)|. Hence, each prime divisor q of [L̃ : K̃] is also a prime divisor of
[L : K]. Moreover, Aut(L̃|K̃) is solvable as a subgroup of a solvable group, and K̃ contains
all the q-th roots of unity for these primes q. Therefore, L̃ is an extension by radicals of K̃ by Lemma 17.3.2. Since K̃ = K(ω),
where ω is a primitive n-th root of unity, we obtain that L̃ is also an extension of K by
radicals. Therefore, L is contained in an extension L̃ of K by radicals; therefore, f(x) = 0
is solvable by radicals.

Corollary 17.4.2. Let K be a field of characteristic 0, and let f (x) ∈ K[x] be a polynomial
of degree m with 1 ≤ m ≤ 4. Then the equation f (x) = 0 is solvable by radicals.

Proof. Let L be the splitting field of f(x) over K. The Galois group Aut(L|K) is isomorphic
to a subgroup of the symmetric group $S_m$. Now the group $S_4$ is solvable via the chain

{1} ⊂ ℤ2 ⊂ D2 ⊂ A4 ⊂ S4 ,

where ℤ2 is the cyclic group of order 2, and D2 is the Klein 4-group, which is isomorphic
to ℤ2 × ℤ2 . Because Sm ⊂ S4 for 1 ≤ m ≤ 4, it follows that Aut(L|K) is solvable. From
Theorem 17.4.1, the equation f (x) = 0 is solvable by radicals.
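The solvability of $S_4$ can also be confirmed by brute force. The sketch below computes the derived series (iterated commutator subgroups), which is a different chain from the one displayed above but reaches {1} exactly when the group is solvable. Permutations are stored as tuples of images; all function names are illustrative assumptions.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(generators, degree):
    """Closure of the generators under multiplication (finite group)."""
    identity = tuple(range(degree))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def derived_subgroup(group, degree):
    """Subgroup generated by all commutators g h g^-1 h^-1."""
    commutators = {
        compose(compose(g, h), compose(inverse(g), inverse(h)))
        for g in group for h in group
    }
    return generated_subgroup(commutators, degree)

def derived_series_orders(degree):
    """Orders of the derived series of S_degree until it stabilizes."""
    group = set(permutations(range(degree)))
    orders = [len(group)]
    while True:
        nxt = derived_subgroup(group, degree)
        if len(nxt) == len(group):
            return orders
        orders.append(len(nxt))
        group = nxt

print(derived_series_orders(4))  # [24, 12, 4, 1]
print(derived_series_orders(5))  # [120, 60]
```

For $S_4$ the orders 24, 12, 4, 1 correspond to $S_4$, $A_4$, the Klein 4-group, and {1}. For $S_5$ the series stalls at order 60, in line with the fact used later that $S_5$ is not solvable.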

Corollary 17.4.2 uses the general theory to show that any polynomial equation of
degree less than or equal to 4 is solvable by radicals. This, however, does not provide
explicit formulas for the solutions. We present these below:
Let K be a field of characteristic 0, and let f (x) ∈ K[x] be a polynomial of degree
m with 1 ≤ m ≤ 4. As mentioned above, we assume that K is the splitting field of the
respective polynomial.
Case (1): If deg(f(x)) = 1, then f(x) = ax + b with a, b ∈ K and a ≠ 0. A zero is then
given by $k = -\frac{b}{a}$.
Case (2): If deg(f(x)) = 2, then $f(x) = ax^2 + bx + c$ with a, b, c ∈ K and a ≠ 0. The
zeros are then given by the quadratic formula

\[ k = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]

We note that the quadratic formula holds over any field of characteristic not equal to 2.
Whether there is a solution within the field K then depends on whether $b^2 - 4ac$ has a
square root within K.
For the cases of degrees 3 and 4, we have the general forms of what are known as
Cardano’s formulas.
Case (3): If deg(f(x)) = 3, then $f(x) = ax^3 + bx^2 + cx + d$ with a, b, c, d ∈ K and a ≠ 0.
Dividing through by a, we may assume, without loss of generality, that a = 1.
By a substitution $x = y - \frac{b}{3}$, the polynomial is transformed into

\[ g(y) = y^3 + py + q \in K[y]. \]

Let L be the splitting field of g(y) over K, and let α ∈ L be a zero of g(y) so that

\[ \alpha^3 + p\alpha + q = 0. \]

If p = 0, then $\alpha = \sqrt[3]{-q}$ so that g(y) has the three zeros

\[ \sqrt[3]{-q}, \quad \omega\sqrt[3]{-q}, \quad \omega^2\sqrt[3]{-q}, \]

where ω is a primitive third root of unity, that is, $\omega^3 = 1$ with ω ≠ 1.


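Now let p ≠ 0, and let β be a zero of $x^2 - \alpha x - \frac{p}{3}$ in a suitable extension L′ of L.
We have β ≠ 0, since p ≠ 0. Hence, $\alpha = \beta - \frac{p}{3\beta}$. Putting this into the transformed cubic
equation

\[ \alpha^3 + p\alpha + q = 0, \]

we get

\[ \beta^3 - \frac{p^3}{27\beta^3} + q = 0. \]

Define $\gamma = \beta^3$ and $\delta = \left(\frac{-p}{3\beta}\right)^3$ so that

\[ \gamma + \delta + q = 0. \]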

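Then

\[ \gamma^2 + q\gamma - \left(\frac{p}{3}\right)^3 = 0 \quad\text{and}\quad -\frac{p^3}{27\delta} + \delta + q = 0 \quad\text{and}\quad \delta^2 + q\delta - \left(\frac{p}{3}\right)^3 = 0. \]

Hence, the zeros of the polynomial

\[ x^2 + qx - \left(\frac{p}{3}\right)^3 \]

are

\[ \gamma, \delta = -\frac{q}{2} \pm \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3}. \]

If we have γ = δ, then both are equal to $-\frac{q}{2}$, and

\[ \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3} = 0. \]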

Then from the definitions of γ, δ, we have $\gamma = \beta^3$, and $\delta = \left(\frac{-p}{3\beta}\right)^3$. From above, $\alpha = \beta - \frac{p}{3\beta}$.
Therefore, we get α by finding the cube roots of γ and δ.
There are certain possibilities and combinations with these cube roots, but because
of the conditions, the cube roots of γ and δ are not independent. We must satisfy the
condition

\[ \sqrt[3]{\gamma}\,\sqrt[3]{\delta} = \beta \cdot \frac{-p}{3\beta} = -\frac{p}{3}. \]

Therefore, we get the final result:


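The zeros of $g(y) = y^3 + py + q$ with p ≠ 0 are

\[ u + v, \quad \omega u + \omega^2 v, \quad \omega^2 u + \omega v, \]

where ω is a primitive third root of unity, and

\[ u = \sqrt[3]{-\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3}} \quad\text{and}\quad v = \sqrt[3]{-\frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3}}, \]

with the cube roots chosen so that $uv = -\frac{p}{3}$.

The above is known as the cubic formula, or Cardano’s formula.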
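Cardano's formula translates directly into a short numerical routine. The following is a floating-point sketch over ℂ, so results are only approximate; the names `cardano_depressed` and `solve_monic_cubic` are illustrative assumptions. The coupling condition on the cube roots is enforced by computing v from u rather than taking an independent cube root.

```python
import cmath

def cardano_depressed(p, q):
    """Zeros of y^3 + p*y + q following the case distinction in the text."""
    omega = cmath.exp(2j * cmath.pi / 3)   # primitive third root of unity
    if p == 0:
        u = complex(-q) ** (1 / 3)         # one cube root of -q
        return [u, omega * u, omega ** 2 * u]
    disc = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    gamma = -q / 2 + disc
    delta = -q / 2 - disc
    if abs(delta) > abs(gamma):            # use the larger one for stability
        gamma = delta
    u = gamma ** (1 / 3)                   # nonzero, since gamma*delta = -(p/3)^3
    v = -p / (3 * u)                       # forces u*v = -p/3
    return [u + v, omega * u + omega ** 2 * v, omega ** 2 * u + omega * v]

def solve_monic_cubic(b, c, d):
    """Zeros of x^3 + b*x^2 + c*x + d via the substitution x = y - b/3."""
    p = c - b * b / 3
    q = 2 * b ** 3 / 27 - b * c / 3 + d
    return [y - b / 3 for y in cardano_depressed(p, q)]

# y^3 - 7y + 6 = (y - 1)(y - 2)(y + 3)
print(cardano_depressed(-7, 6))
# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
print(solve_monic_cubic(-6, 11, -6))
```

Both sample polynomials have three real zeros, yet the intermediate square root is purely imaginary, so the computation passes through nonreal numbers even though the final imaginary parts vanish up to rounding.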


Case (4): If deg(f(x)) = 4, then $f(x) = ax^4 + bx^3 + cx^2 + dx + e$ with a, b, c, d, e ∈ K and
a ≠ 0. Dividing through by a, we may assume without loss of generality that a = 1.
By a substitution $x = y - \frac{b}{4}$, the polynomial f(x) is transformed into

\[ g(y) = y^4 + py^2 + qy + r. \]

We have to find the zeros of g(y). Let $x_1, x_2, x_3, x_4$ be the solutions in the splitting field of
the polynomial equation

\[ y^4 + py^2 + qy + r = 0. \]

Then

\[ y^4 + py^2 + qy + r = (y - x_1)(y - x_2)(y - x_3)(y - x_4). \]

If we compare the coefficients, we get the following:

0 = x1 + x2 + x3 + x4 ,
p = x1 x2 + x1 x3 + x1 x4 + x2 x3 + x2 x4 + x3 x4 ,
−q = x1 x2 x3 + x1 x2 x4 + x1 x3 x4 + x2 x3 x4 ,
r = x1 x2 x3 x4 .

We define

y1 = (x1 + x2 )(x3 + x4 ),
y2 = (x1 + x3 )(x2 + x4 ),
y3 = (x1 + x4 )(x2 + x3 ).

From $x_1 + x_2 + x_3 + x_4 = 0$, we get

\[
\begin{aligned}
y_1 &= -(x_1 + x_2)^2 = -(x_3 + x_4)^2, && \text{because } x_1 + x_2 = -(x_3 + x_4), \\
y_2 &= -(x_1 + x_3)^2 = -(x_2 + x_4)^2, && \text{because } x_1 + x_3 = -(x_2 + x_4), \\
y_3 &= -(x_1 + x_4)^2 = -(x_2 + x_3)^2, && \text{because } x_1 + x_4 = -(x_2 + x_3).
\end{aligned}
\]

Let $y^3 + fy^2 + gy + h = 0$ be the cubic equation with the solutions $y_1$, $y_2$, and $y_3$. This
polynomial $y^3 + fy^2 + gy + h$ is called the cubic resolvent of the equation of degree four.
If we compare the coefficients, we get the following:

f = −y1 − y2 − y3 ,
g = y1 y2 + y1 y3 + y2 y3 ,
h = −y1 y2 y3 .

A direct calculation leads to

\[ f = -2p, \qquad g = p^2 - 4r, \qquad h = q^2. \]

Hence, the equation

\[ y^3 - 2py^2 + (p^2 - 4r)y + q^2 = 0 \]

is the resolvent of $y^4 + py^2 + qy + r = 0$. We now calculate the solutions $y_1, y_2, y_3$ of
$y^3 - 2py^2 + (p^2 - 4r)y + q^2 = 0$ using Cardano’s formula.
Then we substitute backwards, and get the following:

x1 + x2 = −(x3 + x4 ) = ±√−y1 ,
x1 + x3 = −(x2 + x4 ) = ±√−y2 ,
x1 + x4 = −(x2 + x3 ) = ±√−y3 .

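We add these equations, and get

\[ 3x_1 + x_2 + x_3 + x_4 = 2x_1 = \pm\sqrt{-y_1} \pm \sqrt{-y_2} \pm \sqrt{-y_3} \;\Rightarrow\; x_1 = \frac{\pm\sqrt{-y_1} \pm \sqrt{-y_2} \pm \sqrt{-y_3}}{2}. \]

The formulas for $x_2$, $x_3$, and $x_4$ follow analogously, and are of the same type as that for $x_1$.
By variation of the signs we get eight numbers $\pm x_1$, $\pm x_2$, $\pm x_3$ and $\pm x_4$. Four of them
are the solutions of the equation

\[ y^4 + py^2 + qy + r = 0. \]

The correct ones we get by substituting into the equation. Equivalently, the signs of the square
roots must be chosen so that $\sqrt{-y_1}\,\sqrt{-y_2}\,\sqrt{-y_3} = -q$, because
$(x_1 + x_2)(x_1 + x_3)(x_1 + x_4) = -q$. They are as follows:

\[
\begin{aligned}
x_1 &= \tfrac{1}{2}\left(\sqrt{-y_1} + \sqrt{-y_2} + \sqrt{-y_3}\right), \\
x_2 &= \tfrac{1}{2}\left(\sqrt{-y_1} - \sqrt{-y_2} - \sqrt{-y_3}\right), \\
x_3 &= \tfrac{1}{2}\left(-\sqrt{-y_1} + \sqrt{-y_2} - \sqrt{-y_3}\right), \\
x_4 &= \tfrac{1}{2}\left(-\sqrt{-y_1} - \sqrt{-y_2} + \sqrt{-y_3}\right).
\end{aligned}
\]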
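The resolvent method can be sketched numerically as follows. This is a floating-point illustration over ℂ, not an exact algorithm; the names `cubic_zeros` and `quartic_depressed` are illustrative assumptions, and the sign of the third square root is fixed by the condition that the product of the three square roots equals −q, which follows from $(x_1 + x_2)(x_1 + x_3)(x_1 + x_4) = -q$.

```python
import cmath

def cubic_zeros(b, c, d):
    """Zeros of y^3 + b*y^2 + c*y + d via Cardano's formula."""
    p = c - b * b / 3
    q = 2 * b ** 3 / 27 - b * c / 3 + d
    omega = cmath.exp(2j * cmath.pi / 3)
    root = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    gamma = max(-q / 2 + root, -q / 2 - root, key=abs)
    u = gamma ** (1 / 3)
    v = -p / (3 * u) if u != 0 else 0      # keeps u*v = -p/3
    return [omega ** k * u + omega ** (-k) * v - b / 3 for k in range(3)]

def quartic_depressed(p, q, r):
    """Zeros of y^4 + p*y^2 + q*y + r via the cubic resolvent."""
    # resolvent: y^3 - 2p*y^2 + (p^2 - 4r)*y + q^2
    y1, y2, y3 = cubic_zeros(-2 * p, p * p - 4 * r, q * q)
    s1, s2, s3 = cmath.sqrt(-y1), cmath.sqrt(-y2), cmath.sqrt(-y3)
    # (s1*s2*s3)^2 = q^2, so the product is +q or -q; we need -q
    if abs(s1 * s2 * s3 + q) > abs(s1 * s2 * s3 - q):
        s3 = -s3
    return [
        (s1 + s2 + s3) / 2,
        (s1 - s2 - s3) / 2,
        (-s1 + s2 - s3) / 2,
        (-s1 - s2 + s3) / 2,
    ]

# y^4 - 7y^2 + 6y = y(y - 1)(y - 2)(y + 3)
for z in quartic_depressed(-7, 6, 0):
    print(z, abs(z ** 4 - 7 * z ** 2 + 6 * z))
```

In this example the resolvent $y^3 + 14y^2 + 49y + 36$ has the zeros −1, −4, −9, which are the values $-(x_1 + x_j)^2$ for the pairings of the four zeros 0, 1, 2, −3. Accuracy can degrade when a zero of the resolvent is close to 0, since the square root amplifies rounding error there.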

The following theorem is due to Abel; it shows the insolvability of the general de-
gree 5 polynomial over the rationals ℚ.

Theorem 17.4.3. Let L be the splitting field of the polynomial $f(x) = x^5 - 2x^4 + 2 \in \mathbb{Q}[x]$
over ℚ. Then Aut(L|ℚ) ≅ S5, the symmetric group on 5 letters. Since S5 is not solvable, the
equation f(x) = 0 is not solvable by radicals.

Proof. The polynomial f (x) is irreducible over ℚ by the Eisenstein criterion. Further-
more, f (x) has five zeros in the complex numbers ℂ by the fundamental theorem of al-
gebra (see Section 17.6). We claim that f (x) has exactly 3 real zeros and 2 nonreal zeros,
which then necessarily are complex conjugates. In particular, the 5 zeros are pairwise
distinct.
To see the claim, notice first that f(x) has at least 3 real zeros from the intermediate
value theorem. As a real function, f(x) is continuous, f(−1) = −1 < 0, f(0) = 2 > 0, so
it must have a real zero between −1 and 0. Furthermore, we have $f(\frac{3}{2}) = -\frac{17}{32} < 0$ and
f(2) = 2 > 0. Hence, there must be distinct real zeros between 0 and $\frac{3}{2}$, and between $\frac{3}{2}$
and 2. Suppose that f(x) has more than 3 real zeros. Then $f'(x) = x^3(5x - 8)$ has at least 3
pairwise distinct real zeros from Rolle’s theorem. But f′(x) clearly has only 2 real zeros,
so this is not the case. Therefore, f(x) has exactly 3 real zeros, and hence 2 nonreal zeros
that are complex conjugates.
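The sign changes used in this argument can be verified with exact rational arithmetic. The following is a small sketch; the names `f`, `f_prime`, `points`, and `sign_changes` are illustrative.

```python
from fractions import Fraction

def f(x):
    return x ** 5 - 2 * x ** 4 + 2

def f_prime(x):
    return x ** 3 * (5 * x - 8)

# evaluate f exactly at the points used in the proof
points = [Fraction(-1), Fraction(0), Fraction(3, 2), Fraction(2)]
values = [f(x) for x in points]
sign_changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

print(values)        # exact values of f at -1, 0, 3/2, 2
print(sign_changes)  # number of sign changes between consecutive points
print(f_prime(Fraction(0)), f_prime(Fraction(8, 5)))  # the two real zeros of f'
```

Three sign changes give at least three real zeros by the intermediate value theorem, and the two real zeros 0 and 8/5 of the derivative rule out a fourth one by Rolle's theorem.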
Let L be the splitting field of f(x). The field L lies in ℂ, and the restriction of the complex
conjugation map $\delta : z \mapsto \bar{z}$ of ℂ to L maps the set of zeros of f(x) onto itself. Therefore, δ restricts to an
automorphism of L. The map δ fixes the 3 real zeros and transposes the 2 nonreal zeros.
From this, we now show that Aut(L|ℚ) = Aut L = G = S5, the full symmetric group on 5
symbols. Clearly, G ⊂ S5, since G acts as a permutation group on the 5 zeros of f(x).
Since δ transposes the 2 nonreal zeros, G (as a permutation group) contains at least
one transposition. Since f (x) is irreducible, G acts transitively on the zeros of f (x). Let
x0 be one of the zeros of f (x), and let Gx0 be the stabilizer of x0 .
Since G acts transitively, x0 has five images under G; therefore, the index of the
stabilizer must be 5 (see Chapter 10):

5 = [G : Gx0 ],

which, by Lagrange’s theorem, must divide the order of G. Therefore, from the Sylow
theorems, G contains an element of order 5. Hence, G contains a 5-cycle and a transposition;
therefore, by Theorem 11.4.3, it follows that G = S5. Since S5 is not solvable, it
follows that f(x) = 0 cannot be solved by radicals.
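The group-theoretic step, that a 5-cycle together with a transposition generates all of $S_5$, can be checked by enumeration. The sketch below stores permutations as tuples of images; the helper names are illustrative assumptions, and the degree 4 example at the end is included only to show that the statement depends on the degree being prime.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_subgroup(generators):
    """Closure of the generators under multiplication (finite group)."""
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def transposition(i, j, degree):
    p = list(range(degree))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

five_cycle = (1, 2, 3, 4, 0)  # the cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0
orders = [
    len(generated_subgroup([transposition(i, j, 5), five_cycle]))
    for i in range(5) for j in range(i + 1, 5)
]
print(orders)  # one group order per transposition

# contrast in degree 4: a 4-cycle and the transposition (0 2)
four_cycle = (1, 2, 3, 0)
print(len(generated_subgroup([transposition(0, 2, 4), four_cycle])))
```

Every one of the ten transpositions yields a group of order 120 = |S5|, whereas the degree 4 pair only generates a dihedral group of order 8.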

Since Abel’s theorem shows that there exists a degree 5 polynomial that cannot be
solved by radicals, it follows that there can be no formula like Cardano’s formula in
terms of radicals for degree 5.

Corollary 17.4.4. There is no general formula for solving by radicals a fifth degree poly-
nomial over the rationals.

We now show that this result can be further extended to any degree greater than 5.

Theorem 17.4.5. For each n ≥ 5, there exist polynomials f (x) ∈ ℚ[x] of degree n, for
which the equation f (x) = 0 is not solvable by radicals.

Proof. Let $f(x) = x^{n-5}(x^5 - 2x^4 + 2)$, and let L be the splitting field of f(x) over ℚ. Then
Aut(L|ℚ) = Aut(L) contains a subgroup that is isomorphic to S5. It follows that Aut(L) is
not solvable; therefore, the equation f(x) = 0 is not solvable by radicals.

This immediately implies the following:

Corollary 17.4.6. There is no general formula for solving by radicals polynomial equa-
tions over the rationals of degree 5 or greater.

17.5 Constructibility of Regular n-Gons


In Chapter 6, we considered certain geometric material related to field extensions.
There, using general field extensions, we proved the impossibility of certain geometric
compass and straightedge constructions. In particular, there were four famous insolv-
able (to the Greeks) construction problems. The first is the squaring of the circle. This
problem is, given a circle, to construct using straightedge and compass a square having
an area equal to that of the given circle. The second is the doubling of the cube. This
problem is, given a cube of given side length, to construct, using a straightedge and
compass, a side of a cube having double the volume of the original cube. The third
problem is the trisection of an angle. This problem is to trisect a given angle using only a
straightedge and compass. The final problem is the construction of a regular n-gon. This
problem asks which regular n-gons can be constructed using only a straightedge and
compass. In Chapter 6, we proved the impossibility of the first 3 problems. Here, we use
Galois theory to consider constructible n-gons.
Recall that a Fermat number is a positive integer of the form

\[ F_n = 2^{2^n} + 1, \quad n = 0, 1, 2, 3, \ldots. \]

If a particular $F_m$ is prime, it is called a Fermat prime.



Fermat believed that all the numbers in this sequence were primes. In fact, $F_0, F_1, F_2, F_3, F_4$
are all prime, but $F_5$ is composite and divisible by 641 (see exercises). It is still
an open question whether or not there are infinitely many Fermat primes. It has been
conjectured that there are only finitely many. On the other hand, if a number of the form
$2^n + 1$ is a prime for some integer n ≥ 1, then it must be a Fermat prime; that is, n must be a
power of 2.
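These statements about the first Fermat numbers are small enough to verify by trial division. The following is a minimal sketch; the function names `fermat_number` and `smallest_prime_factor` are illustrative.

```python
def fermat_number(n):
    """Return F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def smallest_prime_factor(k):
    """Smallest prime factor of k >= 2, found by trial division."""
    d = 2
    while d * d <= k:
        if k % d == 0:
            return d
        d += 1
    return k

for n in range(6):
    F = fermat_number(n)
    factor = smallest_prime_factor(F)
    print(n, F, "prime" if factor == F else f"divisible by {factor}")

# primes of the form 2^n + 1 with 1 <= n <= 20: the exponent is a power of 2
exponents = [n for n in range(1, 21)
             if smallest_prime_factor(2 ** n + 1) == 2 ** n + 1]
print(exponents)
```

The loop reports $F_0, \ldots, F_4$ as prime and finds the factor 641 of $F_5$, and the exponents found in the last step are 1, 2, 4, 8, 16.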
We first need the following:

Theorem 17.5.1. Let $p = 2^n + 1$, $n = 2^s$ with s ≥ 0, be a Fermat prime. Then there exists a


chain of fields

ℚ = L0 ⊂ L1 ⊂ ⋅ ⋅ ⋅ ⊂ Ln = kp ,

where kp is the p-th cyclotomic field such that

[Lj : Lj−1 ] = 2

for j = 1, . . . , n.

Proof. The extension $k_p|\mathbb{Q}$ is a Galois extension, and $[k_p : \mathbb{Q}] = p - 1$. Furthermore,
$\mathrm{Aut}(k_p)$ is cyclic of order $p - 1 = 2^n$. Hence, there is a chain of subgroups

{1} = Un ⊂ Un−1 ⊂ ⋅ ⋅ ⋅ ⊂ U0 = Aut(kp )

with [Uj−1 : Uj ] = 2 for j = 1, . . . , n. From the fundamental theorem of Galois theory, the
fields Lj = Fix(kp , Uj ) with j = 0, . . . , n have the desired properties.

The following corollaries describe completely the constructible n-gons, tying them
to Fermat primes.

Corollary 17.5.2. Consider the numbers 0, 1, that is, a unit line segment or a unit circle.
A regular p-gon with p ≥ 3 prime is constructible from {0, 1} using a straightedge and
compass if and only if $p = 2^{2^s} + 1$, s ≥ 0, is a Fermat prime.

Proof. From Theorem 6.3.13, we have that if a regular p-gon is constructible with a
straightedge and compass, then p must be a Fermat prime. The sufficiency follows from
Theorem 17.5.1.

We now extend this to general n-gons. Let m, n ∈ ℕ. Assume that we may construct
from {0, 1} a regular n-gon and a regular m-gon. In particular, this means that we may
construct the real numbers $\cos(\frac{2\pi}{n})$, $\sin(\frac{2\pi}{n})$, $\cos(\frac{2\pi}{m})$, and $\sin(\frac{2\pi}{m})$. If gcd(m, n) = 1,
then we may construct from {0, 1} a regular mn-gon.
To see this, notice that

\[ \cos\left(\frac{2\pi}{n} + \frac{2\pi}{m}\right) = \cos\left(\frac{2(n+m)\pi}{nm}\right) = \cos\left(\frac{2\pi}{n}\right)\cos\left(\frac{2\pi}{m}\right) - \sin\left(\frac{2\pi}{n}\right)\sin\left(\frac{2\pi}{m}\right), \]

and

\[ \sin\left(\frac{2\pi}{n} + \frac{2\pi}{m}\right) = \sin\left(\frac{2(n+m)\pi}{nm}\right) = \sin\left(\frac{2\pi}{n}\right)\cos\left(\frac{2\pi}{m}\right) + \cos\left(\frac{2\pi}{n}\right)\sin\left(\frac{2\pi}{m}\right). \]

Therefore, we may construct from {0, 1} the numbers $\cos(\frac{2\pi}{mn})$ and $\sin(\frac{2\pi}{mn})$, because
gcd(n + m, mn) = 1. Therefore, we may construct from {0, 1} a regular mn-gon.
Now let p ≥ 3 be a prime. Then $[k_{p^2} : \mathbb{Q}] = p(p - 1)$, which is not a power of 2.
Therefore, from {0, 1} it is not possible to construct a regular $p^2$-gon. Hence, altogether
we have the following:

Corollary 17.5.3. Consider the numbers 0, 1, that is, a unit line segment or a unit circle.
A regular n-gon with n ∈ ℕ is constructible from {0, 1} using a straightedge and compass
if and only if
(i) $n = 2^m$, m ≥ 0 or
(ii) $n = 2^m p_1 p_2 \cdots p_r$, m ≥ 0, and the $p_i$ are pairwise distinct Fermat primes.

Proof. Certainly we may construct a $2^m$-gon. Furthermore, if r, s ∈ ℕ with gcd(r, s) = 1,
and if we can construct a regular rs-gon, then clearly, we may construct a regular r-gon
and a regular s-gon.
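The criterion of Corollary 17.5.3 is straightforward to test for a given n. The sketch below relies on an explicit assumption: it only knows the five Fermat primes 3, 5, 17, 257, 65537, which are the only ones known at present, so it would misclassify multiples of any further Fermat prime should one exist. The names `KNOWN_FERMAT_PRIMES` and `is_constructible` are illustrative.

```python
# Assumption: only the five currently known Fermat primes are used.
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)

def is_constructible(n):
    """Test whether n = 2^m * (product of distinct known Fermat primes)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    while n % 2 == 0:              # strip the power of 2
        n //= 2
    for p in KNOWN_FERMAT_PRIMES:  # each Fermat prime may occur at most once
        if n % p == 0:
            n //= p
    return n == 1

print([n for n in range(3, 21) if is_constructible(n)])
```

Within the range 3 to 20 this excludes 7, 9, 11, 13, 14, 18, and 19; the cases 9 and 18 fail because the Fermat prime 3 occurs twice.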

17.6 The Fundamental Theorem of Algebra


In this section we present a Galois theoretic proof of the fundamental theorem of
algebra, which we first studied in Section 7.3.

Theorem 17.6.1. Each nonconstant polynomial f (x) ∈ ℂ[x], where ℂ is the field of com-
plex numbers, has a zero in ℂ. Therefore, ℂ is an algebraically closed field.

Proof. Let f (x) ∈ ℂ[x] be a nonconstant polynomial, and let K be the splitting field of
f (x) over ℂ. Since the characteristic of the complex numbers ℂ is zero, this will be a
Galois extension of ℂ. Since ℂ is a finite extension of ℝ, this field K would also be a
Galois extension of ℝ. The fundamental theorem of algebra asserts that K must be ℂ
itself, and hence the fundamental theorem of algebra is equivalent to the fact that any
finite Galois extension of ℂ must be ℂ itself.
Let K be any finite extension of ℝ with |K : ℝ| = 2m q, (2, q) = 1. If m = 0, then K is
an odd-degree extension of ℝ. Since K is separable over ℝ, from the primitive element
theorem, it is a simple extension, and hence K = ℝ(α), where the minimal polynomial
mα (x) over ℝ has odd degree. However, odd-degree real polynomials always have a real
zero, and therefore mα (x) is irreducible only if its degree is one. But then, α ∈ ℝ, and
K = ℝ. Therefore, if K is a nontrivial finite extension of ℝ of degree 2^m q, we must have
m > 0. This shows more generally that there are no nontrivial odd-degree finite extensions of ℝ.
Suppose that K is a degree 2 extension of ℂ. Then K = ℂ(α) with deg m_α(x) = 2,
where m_α(x) is the minimal polynomial of α over ℂ. But by the quadratic formula,
complex quadratic polynomials always have zeros in ℂ, so m_α(x) cannot be irreducible,
a contradiction. Therefore, ℂ has no degree 2 extensions.
Now, let K be a finite Galois extension of ℂ. Then K is also Galois over ℝ. Suppose that
|K : ℝ| = 2^m q, (2, q) = 1. From the argument above, we must have m > 0. Consider the Galois
group G = Gal(K/ℝ). Then |G| = 2^m q, m > 0, (2, q) = 1. Thus, G has a 2-Sylow subgroup
of order 2^m and index q (see Theorem 13.3.4). This would correspond to an intermediate
field E with |K : E| = 2^m and |E : ℝ| = q. However, then E is an odd-degree finite
extension of ℝ. It follows that q = 1 and E = ℝ. Therefore, |K : ℝ| = 2^m, and |G| = 2^m.
Now, |K : ℂ| = 2^{m−1}; let G_1 = Gal(K/ℂ). This is a 2-group. If it were not
trivial, then from Theorem 13.4.1 there would exist a subgroup of order 2^{m−2} and index 2.
This would correspond to an intermediate field E of degree 2 over ℂ. However, from the
argument above, ℂ has no degree 2 extensions. It follows then that G1 is trivial; that is,
|G1 | = 1, so |K : ℂ| = 1, and K = ℂ, completing the proof.

The fact that ℂ is algebraically closed limits the possible algebraic extensions of the
reals.

Corollary 17.6.2. Let K be a finite field extension of the real numbers ℝ. Then K = ℝ or
K = ℂ.

Proof. Since |K : ℝ| < ∞, by the primitive element theorem K = ℝ(α) for some α ∈ K.
Then the minimal polynomial m_α(x) of α over ℝ is in ℝ[x], and hence in ℂ[x]. Therefore,
from the fundamental theorem of algebra it has a zero in ℂ, and we may regard α as an
element of ℂ. If α ∈ ℝ, then K = ℝ; if not, then K = ℝ(α) = ℂ, because |ℂ : ℝ| = 2.

17.7 Exercises
1. For f(x) ∈ ℚ[x] with

f(x) = x^6 − 12x^4 + 36x^2 − 50

(f(x) = 4x^4 − 12x^2 + 20x − 3)

determine for each complex zero α of f(x) a finite number of radicals γ_i = β_i^{1/m_i},
i = 1, . . . , r, and a presentation of α as a rational function in γ_1, . . . , γ_r over ℚ such
that γ_{i+1} is irreducible over ℚ(γ_1, . . . , γ_i), and β_{i+1} ∈ ℚ(γ_1, . . . , γ_i) for i = 0, . . . , r − 1.
2. Let K be a field of prime characteristic p. Let n ∈ ℕ and Kn be the splitting field of
x^n − 1 over K. Show that Aut(K_n|K) is cyclic.
3. Let f(x) = x^4 − x + 1 ∈ ℤ[x]. Show the following:
(i) f has a real zero.
(ii) f is irreducible over ℚ.
(iii) If u + iv (u, v ∈ ℝ) is a zero of f in ℂ, then g = x^3 − 4x − 1 is the minimal polynomial
of 4u^2 over ℚ.
(iv) The Galois group of f over ℚ has an element of order 3.
(v) No zero a ∈ ℂ of f is constructible from the points 0 and 1 with straightedge
and compass.
4. Show that each polynomial f(x) over ℝ decomposes into linear factors and quadratic
factors (f(x) = d(x − a_1)(x − a_2) ⋅ ⋅ ⋅ (x^2 + b_1 x + c_1)(x^2 + b_2 x + c_2) ⋅ ⋅ ⋅, d ∈ ℝ).
5. Let E be a finite (commutative) field extension of ℝ. Then E ≅ ℝ, or E ≅ ℂ.
6. (Vieta) Show that y^3 − py = q reduces to the form 4z^3 − 3z = c by a suitable substitution
y = mz.
7. Suppose that |a + ib| = |c + id| and (a + ib)^3 = c + id. Show that the relation between
a and c is 4a^3 − 3a = c.
8. Show the identity of Bombelli:

\[
\sqrt[3]{2 \pm \sqrt{-121}} = 2 \pm \sqrt{-1},
\]

and apply it to the equation x^3 = 15x + 4.


9. Solve the following equations:
(a) x^3 − 2x + 3 = 0.
(b) x^4 + 2x^3 + 3x^2 − x − 2 = 0.
10. Let n ≥ 1 be a natural number and x an indeterminate over ℂ. Consider the polynomial
x^n − 1 ∈ ℤ[x]. In ℂ[x] it decomposes into linear factors:

x^n − 1 = (x − ξ_1)(x − ξ_2) ⋅ ⋅ ⋅ (x − ξ_n),

where the complex numbers

\[
\xi_\nu = e^{2\pi i \nu/n} = \cos\frac{2\pi\nu}{n} + i \sin\frac{2\pi\nu}{n}, \quad 1 \le \nu \le n,
\]

are all the (distinct) n-th roots of unity; in particular, ξ_n = 1. These ξ_ν form a
multiplicative cyclic group G = {ξ_1, ξ_2, . . . , ξ_n} generated by ξ_1, and ξ_ν = ξ_1^ν.
An n-th root of unity ξν is called a primitive n-th root of unity, if ξν is not an m-th root
of unity for any m < n.
Show that the following are equivalent:
(i) ξν is a primitive n-th root of unity.
(ii) ξν is a generating element of G.
(iii) gcd(ν, n) = 1.
11. The polynomial ϕ_n(x) ∈ ℂ[x], whose zeros are exactly the primitive n-th roots of
unity, is called the n-th cyclotomic polynomial. By Exercise 10, we have

\[
\phi_n(x) = \prod_{\substack{1 \le \nu \le n \\ \gcd(\nu,n)=1}} (x - \xi_\nu) = \prod_{\substack{1 \le \nu \le n \\ \gcd(\nu,n)=1}} \bigl(x - e^{2\pi i \nu/n}\bigr).
\]

The degree of ϕ_n(x) is the number of integers in {1, . . . , n} that are coprime to n.
Show the following:
(i) x^n − 1 = ∏_{d ≥ 1, d | n} ϕ_d(x).
(ii) ϕ_n(x) ∈ ℤ[x] for all n ≥ 1.
(iii) ϕ_n(x) is irreducible over ℚ (and therefore also over ℤ) for all n ≥ 1.
12. Show that the Fermat numbers F_0, F_1, F_2, F_3, F_4 are all prime but F_5 is composite and
divisible by 641.
18 The Theory of Modules
18.1 Modules over Rings
Recall that a vector space V over a field K is an Abelian group V with a scalar multipli-
cation ⋅ : K × V → V , satisfying the following:
(1) f (v1 + v2 ) = fv1 + fv2 for f ∈ K and v1 , v2 ∈ V .
(2) (f1 + f2 )v = f1 v + f2 v for f1 , f2 ∈ K and v ∈ V .
(3) (f1 f2 )v = f1 (f2 v) for f1 , f2 ∈ K and v ∈ V .
(4) 1v = v for v ∈ V .

Vector spaces are the fundamental algebraic structures in linear algebra, and the study
of linear equations. Vector spaces have been crucial in our study of fields and Galois
theory, since any field extension is a vector space over any subfield. In this context, the
degree of a field extension is just the dimension of the extension field as a vector space
over the base field. If we modify the definition of a vector space to allow scalar multipli-
cation from an arbitrary ring, we obtain a more general structure called a module. We
will formally define this below. Modules generalize vector spaces, but the fact that the
scalars do not necessarily have inverses makes the study of modules much more com-
plicated. Modules will play an important role in both the study of rings and the study
of Abelian groups. In fact, any Abelian group is a module over the integers ℤ so that
modules, besides being generalizations of vector spaces, can also be considered as gen-
eralizations of Abelian groups.
In this chapter, we will introduce the theory of modules. In particular, we will ex-
tend to modules the basic algebraic properties such as the isomorphism theorems, which
have been introduced earlier in presenting groups, rings, and fields. We restrict our-
selves to commutative rings, so that throughout R is always a commutative ring. If R has
an identity 1, then we always consider only the case that 1 ≠ 0. Throughout this chapter,
we use letters a, b, c, m, . . . for ideals in R. For principal ideals, we write ⟨a⟩ or aR for
the ideal generated by a ∈ R. We note, however, that the definition can be extended to
include modules over noncommutative rings (see Chapter 22). In this case, we would
speak of left modules and right modules.

Definition 18.1.1. Let R = (R, +, ⋅) be a commutative ring and M = (M, +) an Abelian group.
M together with a scalar multiplication ⋅ : R × M → M, (α, x) ↦ αx, is called an R-module
or module over R if the following axioms hold:
(M1) (α + β)x = αx + βx,
(M2) α(x + y) = αx + αy, and
(M3) (αβ)x = α(βx) for all α, β ∈ R and x, y ∈ M.

If R has an identity 1, then M is called a unitary R-module if, in addition,
(M4) 1 ⋅ x = x for all x ∈ M holds.

https://doi.org/10.1515/9783111142524-018

In the following, R is always a commutative ring. If R contains an identity 1, then M
is always a unitary R-module. If R has an identity 1, then we always assume 1 ≠ 0.
As usual, we have the rules:

0 ⋅ x = 0, α ⋅ 0 = 0, −(αx) = (−α)x = α(−x),

for all α ∈ R and for all x ∈ M.


We next present a series of examples of modules.

Example 18.1.2. (1) If R = K is a field, then a K-module is a K-vector space.


(2) Let G = (G, +) be an Abelian group. If n ∈ ℤ and x ∈ G, then nx is defined as usual:

\[
nx = \begin{cases} 0 & \text{if } n = 0, \\ \underbrace{x + \cdots + x}_{n\text{-times}} & \text{if } n > 0, \\ (-n)(-x) & \text{if } n < 0. \end{cases}
\]

Then G is a unitary ℤ-module via the scalar multiplication

⋅ : ℤ × G → G, (n, x) ↦ nx.

(3) Let S be a subring of R. Then, via (s, r) ↦ sr, the ring R itself becomes an S-module.
(4) Let K be a field, V a K-vector space, and f : V → V a linear map of V.
Let p = ∑_i α_i t^i ∈ K[t]. Then p(f) := ∑_i α_i f^i defines a linear map of V, and V is a
unitary K[t]-module via the scalar multiplication

K[t] × V → V, (p, v) ↦ pv := p(f)(v).

(5) If R is a commutative ring and a is an ideal in R, then a is a module over R.
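
To make Example 18.1.2 (4) concrete, the following sketch lets a polynomial act on a vector through a fixed linear map. It is an illustration only: the helper names `mat_vec`, `act`, and `poly_mul` are our own, the map f is given by an integer matrix A, and a polynomial is stored as its coefficient list [α_0, α_1, . . .].

```python
def mat_vec(A, v):
    """Apply the linear map f given by the matrix A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def act(p, A, v):
    """Return p(f)(v), where p[i] is the coefficient of t^i."""
    result = [0] * len(v)
    for coeff in reversed(p):
        # Horner scheme: result <- f(result) + coeff * v
        result = [fr + coeff * x for fr, x in zip(mat_vec(A, result), v)]
    return result

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    prod = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    return prod

A = [[0, 1], [-1, 0]]      # f(v1, v2) = (v2, -v1), hence f∘f = -id
v = [3, 5]
p, q = [1, 2], [3, 0, 1]   # p = 1 + 2t, q = 3 + t^2

print(act([0, 1], A, v))   # t·v = f(v)
print(act([1, 0, 1], A, v))  # (t^2 + 1)·v
print(act(poly_mul(p, q), A, v) == act(p, A, act(q, A, v)))  # axiom (M3)
```

In this example t^2 + 1 annihilates every vector, which shows that a K[t]-module can have nonzero scalars acting as zero, in contrast to a vector space.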

Basic to all algebraic theory is the concept of substructures. Next we define submod-
ules.

Definition 18.1.3. Let M be an R-module. A subset ∅ ≠ U ⊂ M is called a submodule of M if
(UMI) (U, +) < (M, +) and
(UMII) α ∈ R, u ∈ U ⇒ αu ∈ U; that is, RU ⊂ U.

Example 18.1.4. (1) In an Abelian group G, considered as a ℤ-module, the subgroups
are precisely the submodules.
(2) The submodules of R, considered as an R-module, are precisely the ideals.
(3) Rx := {αx : α ∈ R} is a submodule of M for each x ∈ M.
(4) Let K be a field, V a K-vector space, and f : V → V a linear map of V . Let U be a
submodule of V , considered as a K[t]-module as above.
Then the following holds:
(a) U < V .
(b) pU = p(f)U ⊂ U for all p ∈ K[t]. In particular, αU ⊂ U for p = α ∈ K and
tU = f(U) ⊂ U for p = t; that is, U is an f-invariant subspace.
Conversely, p(f)U ⊂ U for all p ∈ K[t] if U is an f-invariant subspace.

We next extend to modules the concept of a generating system. For a single genera-
tor, as with groups, this is called cyclic.

Definition 18.1.5. A submodule U of the R-module M is called cyclic if there exists an
x ∈ M with U = Rx.

Example 18.1.4.(3) above is an example of a cyclic submodule.

As in vector spaces, groups, and rings, the following standard constructions lead us
to generating systems.
(1) Let M be an R-module and {U_i : i ∈ I} a family of submodules. Then ⋂_{i∈I} U_i is a
submodule of M.
(2) Let M be an R-module. If A ⊂ M, then we define

⟨A⟩ := ⋂{U : U submodule of M with A ⊂ U}.

⟨A⟩ is the smallest submodule of M that contains A. If R has an identity 1, then
⟨A⟩ is the set of all linear combinations ∑_i α_i a_i with all α_i ∈ R, all a_i ∈ A. This holds
because M is unitary, and na = n(1 ⋅ a) = (n ⋅ 1)a for n ∈ ℤ and a ∈ A; that is, we may
consider the integer multiple na as a genuine scalar product in the module. In particular,
if R has an identity 1, then Ra = ⟨{a}⟩ =: ⟨a⟩.

Definition 18.1.6. Let R have an identity 1. If M = ⟨A⟩, then A is called a generating
system of M. M is called finitely generated if there are a_1, . . . , a_n ∈ M with
M = ⟨{a_1, . . . , a_n}⟩ =: ⟨a_1, . . . , a_n⟩.

The following is clear:

Lemma 18.1.7. Let U_i be submodules of M, i ∈ I, I an index set. Then

\[
\Bigl\langle \bigcup_{i \in I} U_i \Bigr\rangle = \Bigl\{ \sum_{i \in L} a_i : a_i \in U_i,\ L \subset I \text{ finite} \Bigr\}.
\]

We write ⟨⋃_{i∈I} U_i⟩ =: ∑_{i∈I} U_i and call this submodule the sum of the U_i. A sum ∑_{i∈I} U_i
is called a direct sum if for each representation of 0 as a finite sum 0 = ∑ a_i, a_i ∈ U_i, it
follows that all a_i = 0. This is equivalent to U_i ∩ ∑_{j≠i} U_j = {0} for all i ∈ I.
Notation: ⨁_{i∈I} U_i; and if I = {1, . . . , n}, then we also write U_1 ⊕ ⋅ ⋅ ⋅ ⊕ U_n.
In analogy with our previously defined algebraic structures, we extend to modules
the concepts of quotient modules and module homomorphisms.

Definition 18.1.8. Let U be a submodule of the R-module M. Let M/U be the factor
group. We define a (well defined) scalar multiplication:

R × M/U → M/U, α(x + U) := αx + U.

With this, M/U is an R-module, the factor module or quotient module of M by U. In M/U,
we have the operations

(x + U) + (y + U) = (x + y) + U,

and

α(x + U) = αx + U.

A module M over a ring R can also be considered as a module over a quotient ring
of R. The following is straightforward to verify (see exercises):

Lemma 18.1.9. Let a ⊲ R be an ideal in R and M an R-module. The set of all finite sums of the
form ∑ α_i x_i, α_i ∈ a, x_i ∈ M, is a submodule of M, which we denote by aM. The factor group
M/aM becomes an R/a-module via the well defined scalar multiplication

(α + a)(m + aM) = αm + aM.

If here R has an identity 1 and a is a maximal ideal, then M/aM becomes a vector space
over the field K = R/a.

We next define module homomorphisms:

Definition 18.1.10. Let R be a ring and M, N be R-modules. A map f : M → N is called an
R-module homomorphism (or R-linear) if

f (x + y) = f (x) + f (y) and f (αx) = αf (x)

for all α ∈ R and all x, y ∈ M. Endo-, epi-, mono-, iso- and automorphisms are defined
analogously via the corresponding properties of the maps. If f : M → N and g : N → P
are module homomorphisms, then g ∘ f : M → P is also a module homomorphism. If
f : M → N is an isomorphism, then so is f^{−1} : N → M.

We define kernel and image in the usual way:

ker(f ) := {x ∈ M : f (x) = 0},

and

im(f ) := f (M) = { f (x) : x ∈ M}.

The set ker(f) is a submodule of M, and im(f) is a submodule of N. As usual, f is injective
if and only if ker(f) = {0}.
If U is a submodule of M, then the map x ↦ x + U defines a module epimorphism
(the canonical epimorphism) from M onto M/U with kernel U.
There are module isomorphism theorems. The proofs are straightforward exten-
sions of the corresponding proofs for groups and rings.

Theorem 18.1.11 (Module isomorphism theorems). Let M, N be R-modules.


(1) If f : M → N is a module homomorphism, then

f (M) ≅ M/ ker(f ).

(2) If U, V are submodules of the R-module M, then

U/(U ∩ V ) ≅ (U + V )/V .

(3) If U and V are submodules of the R-module M with U ⊂ V ⊂ M, then

(M/U)/(V /U) ≅ M/V .

For the proofs, proceed as for groups. For example, for (2) consider the map
f : U + V → U/(U ∩ V), u + v ↦ u + (U ∩ V). It is well defined, because u + v = u′ + v′
implies u − u′ = v′ − v ∈ U ∩ V; it is surjective with ker(f) = V, so (2) follows from (1).
Note that α ↦ αρ, ρ ∈ R fixed, defines a module homomorphism R → R if we
consider R itself as an R-module.
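
As a small illustration of Theorem 18.1.11 (2), the following sketch works inside the ℤ-module ℤ/24ℤ with U = ⟨4⟩ and V = ⟨6⟩. The choice of module and the helper names are our own assumptions for the example; submodules are stored as sets of residues and quotients as sets of cosets.

```python
def cyclic_submodule(x, n):
    """The submodule of Z/nZ generated by x."""
    return frozenset((k * x) % n for k in range(n))

def module_sum(U, V, n):
    """The sum U + V inside Z/nZ."""
    return frozenset((u + v) % n for u in U for v in V)

def quotient(W, U, n):
    """The set of cosets w + U, w in W, for a submodule U of W."""
    return {frozenset((w + u) % n for u in U) for w in W}

n = 24
U = cyclic_submodule(4, n)
V = cyclic_submodule(6, n)

left = quotient(U, U & V, n)                   # U/(U ∩ V)
right = quotient(module_sum(U, V, n), V, n)    # (U + V)/V

# The induced map u + (U ∩ V) -> u + V, evaluated on one representative per coset.
induced = {coset: frozenset((min(coset) + v) % n for v in V) for coset in left}

print(len(left), len(right))
print(set(induced.values()) == right)
```

Both quotients have three elements here, and the induced map hits every coset of V in U + V exactly once.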

18.2 Annihilators and Torsion


In this section, we define torsion for an R-module and a very important ideal of R
called the annihilator.

Definition 18.2.1. Let M be an R-module. For a fixed a ∈ M, consider the module homo-
morphism λa : R → M, λa (α) := αa where we consider R as an R-module. We call ker(λa )
the annihilator of a denoted by Ann(a); that is,

Ann(a) = {α ∈ R : αa = 0}.

Lemma 18.2.2. The annihilator Ann(a) is a submodule of R and the module isomorphism
theorem (1) gives R/ Ann(a) ≅ Ra.
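
For the ℤ-module ℤ/nℤ the annihilator of an element can be found by brute force. The sketch below is our own illustration (function names are assumptions): it computes the positive generator of Ann(a) ⊂ ℤ and compares the size of ℤ/Ann(a) with the size of the cyclic submodule ℤa, as predicted by Lemma 18.2.2.

```python
from math import gcd

def annihilator_generator(a, n):
    """Smallest d >= 1 with d*a ≡ 0 (mod n); then Ann(a) = dZ."""
    return next(d for d in range(1, n + 1) if (d * a) % n == 0)

def cyclic_submodule(a, n):
    """The submodule Za of Z/nZ."""
    return {(k * a) % n for k in range(n)}

n = 12
for a in range(n):
    d = annihilator_generator(a, n)
    # |Z/Ann(a)| = d must equal |Za|
    print(a, d, len(cyclic_submodule(a, n)))
```

In each row the last two numbers agree, and both equal n/gcd(a, n).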

We next extend the annihilator to whole submodules of M:

Definition 18.2.3. Let U be a submodule of the R-module M. The annihilator Ann(U) is
defined to be

Ann(U) := {α ∈ R : αu = 0 for all u ∈ U}.

As for single elements, since Ann(U) = ⋂_{u∈U} Ann(u), the set Ann(U) is a submodule
of R. If ρ ∈ R, u ∈ U, then ρu ∈ U; that means, if α ∈ Ann(U), then also αρ ∈ Ann(U),
because (αρ)u = α(ρu) = 0. Hence, Ann(U) is an ideal in R.
Suppose that G is an Abelian group. Then as aforementioned, G is a ℤ-module. An
element g ∈ G is a torsion element, or has finite order if ng = 0 for some n ∈ ℕ. The
set Tor(G) consists of all the torsion elements in G. An Abelian group is torsion-free if
Tor(G) = {0}.

Lemma 18.2.4. Let G be an Abelian group. Then Tor(G) is a subgroup of G, and the factor
group G/ Tor(G) is torsion-free.

We extend this concept now to general modules:

Definition 18.2.5. The R-module M is called faithful if Ann(M) = {0}. We call an element
a ∈ M a torsion element, or element of finite order, if Ann(a) ≠ {0}. A module without
torsion elements ≠ 0 is called torsion-free. If an R-module M ≠ {0} is torsion-free, then R has
no zero divisors ≠ 0.

Theorem 18.2.6. Let R be an integral domain and M an R-module (by our agreement M
is unitary). Let Tor(M) = T(M) be the set of torsion elements of M. Then Tor(M) is a
submodule of M, and M/ Tor(M) is torsion-free.

Proof. If m ∈ Tor(M), α ∈ Ann(m), α ≠ 0, and β ∈ R, then we get

α(βm) = (αβ)m = (βα)m = β(αm) = 0;

that is, α ∈ Ann(βm) with α ≠ 0, and hence βm ∈ Tor(M). Let m′ be another
element of Tor(M) and 0 ≠ α′ ∈ Ann(m′). Then αα′ ≠ 0 (R is an integral domain), and

αα′ (m + m′ ) = αα′ m + αα′ m′ = α′ (αm) + α(α′ m′ ) = 0;

that is, m + m′ ∈ Tor(M). Therefore, Tor(M) is a submodule.


Now, let m + Tor(M) be a torsion element in M/ Tor(M). Let α ∈ R, α ≠ 0 with
α(m + Tor(M)) = αm + Tor(M) = Tor(M). Then αm ∈ Tor(M). Hence, there exists a β ∈ R,
β ≠ 0, with 0 = β(αm) = (βα)m. Since βα ≠ 0, we get that m ∈ Tor(M), and the torsion
element m + Tor(M) is trivial.
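
The assumption that R is an integral domain cannot be dropped in Theorem 18.2.6. The following sketch, which is our own illustration, takes R = M = ℤ/nℤ and computes the torsion elements in the sense of Definition 18.2.5 (Ann(a) ≠ {0}). For n = 6 the set is not closed under addition, while for the field ℤ/7ℤ it is.

```python
def annihilator(a, n):
    """Ann(a) for a in the Z/nZ-module Z/nZ."""
    return {r for r in range(n) if (r * a) % n == 0}

def torsion_elements(n):
    """All a in Z/nZ with Ann(a) != {0}."""
    return {a for a in range(n) if annihilator(a, n) != {0}}

def is_closed_under_addition(subset, n):
    return all((a + b) % n in subset for a in subset for b in subset)

print(torsion_elements(6), is_closed_under_addition(torsion_elements(6), 6))
print(torsion_elements(7), is_closed_under_addition(torsion_elements(7), 7))
```

In ℤ/6ℤ the elements 2 and 3 are torsion elements, but 2 + 3 = 5 is a unit and therefore not a torsion element.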

18.3 Direct Products and Direct Sums of Modules


Let M_i, i ∈ I ≠ ∅, be a family of R-modules. On the direct product

\[
P = \prod_{i \in I} M_i = \Bigl\{ f : I \to \bigcup_{i \in I} M_i : f(i) \in M_i \text{ for all } i \in I \Bigr\},
\]

we define the module operations

+ : P × P → P and ⋅ : R × P → P

via

(f + g)(i) := f(i) + g(i) and (αf)(i) := αf(i).

Together with these operations, P = ∏_{i∈I} M_i is an R-module, the direct product of
the M_i. If we identify f with the I-tuple of the images f = (f_i)_{i∈I}, then the sum and the
scalar multiplication are componentwise. If I = {1, . . . , n} and M_i = M for all i ∈ I, then
we write, as usual, M^n = ∏_{i∈I} M_i.
We make the agreement that ∏_{i∈I} M_i := {0} if I = ∅.
The set ⨁_{i∈I} M_i := {f ∈ ∏_{i∈I} M_i : f(i) = 0 for almost all i} (“for almost all i” means that
there are at most finitely many i with f(i) ≠ 0) is a submodule of the direct product,
called the direct sum of the M_i. If I = {1, . . . , n}, then we write ⨁_{i∈I} M_i = M_1 ⊕ ⋅ ⋅ ⋅ ⊕ M_n.
Here, ∏_{i=1}^n M_i = ⨁_{i=1}^n M_i for finite I.

Theorem 18.3.1. (1) If π ∈ S_I is a permutation of I, then

∏_{i∈I} M_i ≅ ∏_{i∈I} M_{π(i)},

and

⨁_{i∈I} M_i ≅ ⨁_{i∈I} M_{π(i)}.

(2) If I = ⋃̇_{j∈J} I_j, the disjoint union, then

∏_{i∈I} M_i ≅ ∏_{j∈J} (∏_{i∈I_j} M_i),

and

⨁_{i∈I} M_i ≅ ⨁_{j∈J} (⨁_{i∈I_j} M_i).

Proof. For (1), consider the map f ↦ f ∘ π.
For (2), consider the map f ↦ ⋃_{j∈J} f_j, where f_j ∈ ∏_{i∈I_j} M_i is the restriction of f to
I_j, and ⋃_{j∈J} f_j is the map on J defined by (⋃_{j∈J} f_j)(k) := f_k for k ∈ J.
Let I ≠ ∅. If M = ∏_{i∈I} M_i, then we get in a natural manner module homomorphisms
π_i : M → M_i via f ↦ f(i); π_i is called the projection onto the ith component. Dually,
we define module homomorphisms δ_i : M_i → ⨁_{i∈I} M_i ⊂ ∏_{i∈I} M_i via δ_i(m_i) = (n_j)_{j∈I},
where n_j = 0 if i ≠ j and n_i = m_i. δ_i is called the ith canonical injection. If I = {1, . . . , n},
then π_i(a_1, . . . , a_i, . . . , a_n) = a_i, and δ_i(m_i) = (0, . . . , 0, m_i, 0, . . . , 0).
We now consider universal properties.
Theorem 18.3.2 (Universal properties). Let A, M_i, i ∈ I ≠ ∅, be R-modules.
(1) If ϕ_i : A → M_i, i ∈ I, are module homomorphisms, then there exists exactly one
module homomorphism ϕ : A → ∏_{i∈I} M_i such that ϕ_j = π_j ∘ ϕ for each j ∈ I, where
π_j is the jth projection.
(2) If Ψ_i : M_i → A, i ∈ I, are module homomorphisms, then there exists exactly one
module homomorphism Ψ : ⨁_{i∈I} M_i → A such that Ψ_j = Ψ ∘ δ_j for each j ∈ I, where
δ_j is the jth canonical injection.

Proof. We first consider (1). If there is such a ϕ, then the jth component of ϕ(a) is equal
to ϕ_j(a), because π_j ∘ ϕ = ϕ_j. Hence, define ϕ(a) ∈ ∏_{i∈I} M_i via ϕ(a)(i) := ϕ_i(a), and ϕ is the
desired map.
We now prove (2). If there is such a Ψ with Ψ ∘ δ_j = Ψ_j, then

Ψ(x) = Ψ((x_i)) = Ψ(∑_{i∈I} δ_i(x_i)) = ∑_{i∈I} Ψ ∘ δ_i(x_i) = ∑_{i∈I} Ψ_i(x_i).

Hence, define Ψ((x_i)) = ∑_{i∈I} Ψ_i(x_i), and Ψ is the desired map (recall that the sum is well
defined, because only finitely many x_i are nonzero).
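
For a finite index set the two constructions in the proof can be written down directly. The sketch below is an illustration under our own naming (`product_map`, `sum_map`); module elements are plain integers and the homomorphisms are ordinary Python functions.

```python
def product_map(phis):
    """Given phi_i : A -> M_i, return phi : A -> M_1 x ... x M_n with phi(a)(i) = phi_i(a)."""
    return lambda a: tuple(phi(a) for phi in phis)

def sum_map(psis):
    """Given psi_i : M_i -> A, return psi : M_1 (+) ... (+) M_n -> A, psi((x_i)) = sum of psi_i(x_i)."""
    return lambda xs: sum(psi(x) for psi, x in zip(psis, xs))

# (1) A = Z, M_1 = Z/2Z, M_2 = Z/3Z with the reduction maps.
phi = product_map([lambda a: a % 2, lambda a: a % 3])
print(phi(5))

# (2) M_1 = M_2 = A = Z with psi_1(x) = 2x and psi_2(x) = 3x.
psi = sum_map([lambda x: 2 * x, lambda x: 3 * x])
print(psi((4, 5)))
print(psi((4, 0)), psi((0, 5)))   # psi ∘ delta_1 and psi ∘ delta_2
```

Composing `phi` with a projection recovers the corresponding ϕ_j, and composing `psi` with a canonical injection recovers the corresponding Ψ_j.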

18.4 Free Modules


If V is a vector space over a field K, then V always has a basis over K, which may be infi-
nite. Despite the similarity to vector spaces, because the scalars may not have inverses,
this is not necessarily true for modules.
We now define a basis for a module. Those modules that actually have a basis are
called free modules.
Let R be a ring with identity 1, M be a unitary R-module, and S ⊂ M. Each finite sum
∑ α_i s_i, with α_i ∈ R and s_i ∈ S, is called a linear combination in S. Since M is unitary,
if S ≠ ∅, then ⟨S⟩ is exactly the set of all linear combinations in S. In the following, we
assume that S ≠ ∅. If S = ∅, then ⟨S⟩ = ⟨∅⟩ = {0}, and this case is not interesting. By
convention, in the following, we always assume m_i ≠ m_j if i ≠ j in a finite sum ∑ α_i m_i
with all α_i ∈ R and all m_i ∈ M.

Definition 18.4.1. A finite set {m_1, . . . , m_n} ⊂ M is called linearly independent or free
(over R) if a representation 0 = ∑_{i=1}^n α_i m_i always implies α_i = 0 for all i ∈ {1, . . . , n};
that is, 0 can be represented only trivially on {m_1, . . . , m_n}. A nonempty subset S ⊂ M is
called free (over R) if each finite subset of S is free.

Definition 18.4.2. Let M be an R-module (as above).


(1) S ⊂ M is called a basis of M if
(a) M = ⟨S⟩, and
(b) S is free (over R).
(2) If M has a basis, then M is called a free R-module. If S is a basis of M, then M is called
free on S, or free with basis S.

In this sense, we can consider {0} as a free module with basis ∅.

Example 18.4.3. 1. R × R = R^2, as an R-module, is free with basis {(1, 0), (0, 1)}.
2. More generally, let I ≠ ∅. Then ⨁_{i∈I} R_i with R_i = R for all i ∈ I is free with basis
{ϵ_i : I → R : ϵ_i(j) = δ_ij, i, j ∈ I}, where

\[
\delta_{ij} = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}
\]

In particular, if I = {1, . . . , n}, then R^n = {(a_1, . . . , a_n) : a_i ∈ R} is free with basis
{ϵ_i = (0, . . . , 0, 1, 0, . . . , 0) : 1 ≤ i ≤ n}, where the entry 1 is preceded by i − 1 zeros.
3. Let G be an Abelian group. If G, as a ℤ-module, is free on S ⊂ G, then G is called a
free Abelian group with basis S. If |S| = n < ∞, then G ≅ ℤ^n.
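
Over ℤ, linear independence alone does not give a basis. The sketch below is our own illustration for the free ℤ-module ℤ^2; it uses the standard fact (not stated in the text) that two integer vectors form a basis of ℤ^2 exactly when their determinant is a unit of ℤ, that is, ±1, and it computes the unique coordinates by Cramer's rule.

```python
def det2(v, w):
    """Determinant of the 2x2 matrix with columns v and w."""
    return v[0] * w[1] - v[1] * w[0]

def is_basis_of_Z2(v, w):
    """v, w form a basis of Z^2 iff the determinant is a unit in Z."""
    return det2(v, w) in (1, -1)

def coordinates(m, v, w):
    """Return (a, b) with m = a*v + b*w, assuming det2(v, w) = ±1."""
    d = det2(v, w)
    # Cramer's rule; dividing by d = ±1 is the same as multiplying by d
    return det2(m, w) * d, det2(v, m) * d

print(is_basis_of_Z2((2, 1), (1, 1)))
print(coordinates((5, 7), (2, 1), (1, 1)))
print(is_basis_of_Z2((2, 0), (0, 1)))
```

The pair (2, 0), (0, 1) is free over ℤ, but it does not generate ℤ^2, because (1, 0) is not an integral linear combination of the two vectors.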

Theorem 18.4.4. The R-module M is free on S if and only if each m ∈ M can be written
uniquely in the form ∑ αi si with αi ∈ R, si ∈ S. This is exactly the case, where M = ⨁s∈S Rs
is the direct sum of the cyclic submodules Rs, and each Rs is module isomorphic to R.

Proof. If S is a basis, then each m ∈ M can be written as m = ∑ α_i s_i, because M = ⟨S⟩.
This representation is unique, because if ∑ α_i s_i = ∑ β_i s_i, then ∑(α_i − β_i)s_i = 0; that is,
α_i − β_i = 0 for all i. Conversely, if we assume that the representation is unique,
then we get from ∑ α_i s_i = 0 = ∑ 0 ⋅ s_i that all α_i = 0, and therefore M is free on S.
The rest of the theorem is essentially a rewriting of the definition. If each m ∈ M can
be written as m = ∑ α_i s_i, then M = ∑_{s∈S} Rs. If x ∈ Rs′ ∩ ∑_{s∈S, s≠s′} Rs with s′ ∈ S, then
x = α′s′ = ∑_{s_i≠s′, s_i∈S} α_i s_i, and 0 = α′s′ − ∑_{s_i≠s′, s_i∈S} α_i s_i. Therefore, α′ = 0, and α_i = 0 for
all i. This gives M = ⨁_{s∈S} Rs. The cyclic modules Rs are isomorphic to R/Ann(s), and
Ann(s) = {0} in free modules. Conversely, such modules are free on S.

Corollary 18.4.5. (1) M is free on S ⇔ M ≅ ⨁_{s∈S} R_s, R_s = R for all s ∈ S.
(2) If M is finitely generated and free, then there exists an n ∈ ℕ_0 such that

M ≅ R^n = R ⊕ ⋅ ⋅ ⋅ ⊕ R (n times).

Proof. Part (1) is clear. For (2), let M = ⟨x_1, . . . , x_r⟩ and S a basis of M. Each x_i is uniquely
representable on S, as x_i = ∑_j α_ij s_j with finitely many s_j ∈ S. Since the x_i generate M,
we have m = ∑_i β_i x_i = ∑_{i,j} β_i α_ij s_j for arbitrary m ∈ M, and we need only finitely many s_j
to generate M. Hence, S is finite.

Theorem 18.4.6. Let R be a commutative ring with identity 1, and M a free R-module.
Then any two bases of M have the same cardinality.

Proof. The ring R contains a maximal ideal m, and R/m is a field (see Theorems 2.3.2
and 2.4.2). Then M/mM is a vector space over R/m. From M ≅ ⨁s∈S Rs with basis S, we
get mM ≅ ⨁s∈S ms; hence,

M/mM ≅ (⨁_{s∈S} Rs)/mM ≅ ⨁_{s∈S} (Rs/ms) ≅ ⨁_{s∈S} R/m.

Therefore, the R/m-vector space M/mM has a basis of the cardinality of S. This gives the
result.

Let R be a commutative ring with identity 1, and M a free R-module. The cardinality
of a basis is an invariant of M, called the rank of M or dimension of M.
If rank(M) = n < ∞, then this means M ≅ Rn .

Theorem 18.4.7. Each R-module is a (module-)homomorphic image of a free R-module.

Proof. Let M be an R-module. We consider F := ⨁_{m∈M} R_m with R_m = R for all m ∈ M. F is
a free R-module. The map f : F → M, f((α_m)_{m∈M}) = ∑ α_m m, defines a surjective module
homomorphism.

Theorem 18.4.8. Let F, M be R-modules, and let F be free. Let f : M → F be a module
epimorphism. Then there exists a module homomorphism g : F → M with f ∘ g = id_F,
and we have M = ker(f) ⊕ g(F).

Proof. Let S be a basis of F. By the axiom of choice, there exists for each s ∈ S an element
m_s ∈ M with f(m_s) = s (f is surjective). We define the map g : F → M via s ↦ m_s
linearly; that is, g(∑_{s_i∈S} α_i s_i) = ∑_{s_i∈S} α_i m_{s_i}. Since F is free, the map g is well defined.
Obviously, f ∘ g(s) = f(m_s) = s for s ∈ S; that means f ∘ g = id_F, because F is free on S. For
each m ∈ M, we also have m = g ∘ f(m) + (m − g ∘ f(m)), where g ∘ f(m) = g(f(m)) ∈ g(F).
Since f ∘ g = id_F, the elements of the form m − g ∘ f(m) are in the kernel of f. Therefore,
M = g(F) + ker(f). Now let x ∈ g(F) ∩ ker(f). Then x = g(y) for some y ∈ F and 0 = f(x) =
f ∘ g(y) = y, and hence x = 0. Therefore, the sum is direct: M = g(F) ⊕ ker(f).
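
The decomposition m = g ∘ f(m) + (m − g ∘ f(m)) from the proof can be traced in a small example. In the sketch below, which is our own illustration, M = ℤ^2, F = ℤ, the epimorphism is f(a, b) = a + 2b, and the section g sends the basis element 1 of F to the preimage (1, 0).

```python
def f(m):
    """Epimorphism f : Z^2 -> Z, (a, b) -> a + 2b."""
    a, b = m
    return a + 2 * b

def g(k):
    """Section g : Z -> Z^2 with f(g(k)) = k, defined by 1 -> (1, 0)."""
    return (k, 0)

def decompose(m):
    """Split m into its components in ker(f) and in g(F)."""
    image_part = g(f(m))
    kernel_part = (m[0] - image_part[0], m[1] - image_part[1])
    return kernel_part, image_part

kernel_part, image_part = decompose((3, 4))
print(kernel_part, image_part, f(kernel_part))
```

The first component lies in ker(f), the second in g(F), and their sum is the original element (3, 4).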

Corollary 18.4.9. Let M be an R-module and N a submodule such that M/N is free. Then
there is a submodule N ′ of M with M = N ⊕ N ′ .

Proof. Apply the above theorem for the canonical map π : M → M/N with ker(π) = N.

18.5 Modules over Principal Ideal Domains


We now specialize to the case of modules over principal ideal domains. For the remain-
der of this section, R is always a principal ideal domain ≠ {0}. We now use the notation
(α) := αR, α ∈ R, for the principal ideal αR.

Theorem 18.5.1. Let M be a free R-module of finite rank over the principal ideal domain R.
Then each submodule U is free of finite rank, and rank(U) ≤ rank(M).

Proof. We prove the theorem by induction on n = rank(M). The theorem certainly holds
if n = 0. Now let n ≥ 1, and assume that the theorem holds for all free R-modules of
rank < n. Let M be a free R-module of rank n with basis {x_1, . . . , x_n}. Let U be a submodule
of M. We represent the elements of U as linear combinations of the basis elements
x_1, . . . , x_n, and we consider the set of coefficients of x_1 for the elements of U:

a = {β ∈ R : βx_1 + ∑_{i=2}^n β_i x_i ∈ U for some β_2, . . . , β_n ∈ R}.

Certainly a is an ideal in R. Since R is a principal ideal domain, we have a = (α_1) for some
α_1 ∈ R. Let u ∈ U be an element in U, which has α_1 as its first coefficient; that is,

u = α_1 x_1 + ∑_{i=2}^n α_i x_i ∈ U.

Let v ∈ U be arbitrary. Then, for suitable ρ, ρ_2, . . . , ρ_n ∈ R,

v = ρ(α_1 x_1) + ∑_{i=2}^n ρ_i x_i.

Hence, v − ρu ∈ U′ := U ∩ M′, where M′ is the free R-module with basis {x_2, . . . , x_n}.
By induction, U′ is a free submodule of M′ with a basis {y_1, . . . , y_t}, t ≤ n − 1. If α_1 = 0,
then a = (0), and U = U′, and there is nothing to prove. Now let α_1 ≠ 0. We show that
{u, y_1, . . . , y_t} is a basis of U. v − ρu is a linear combination of the basis elements of U′;
that is, v − ρu = ∑_{i=1}^t η_i y_i uniquely. Hence, v = ρu + ∑_{i=1}^t η_i y_i, and U = ⟨u, y_1, . . . , y_t⟩.
Now let 0 = γu + ∑_{i=1}^t μ_i y_i. We write u and the y_i as linear combinations of the basis
elements x_1, . . . , x_n of M. Only γu contributes to the coefficient of x_1. Hence,

0 = γα_1 x_1 + ∑_{i=2}^n μ′_i x_i.

Therefore, first γα_1 x_1 = 0; that is, γ = 0, because R has no zero divisors ≠ 0 and α_1 ≠ 0,
and furthermore, μ′_2 = ⋅ ⋅ ⋅ = μ′_n = 0. That means 0 = ∑_{i=1}^t μ_i y_i, and hence
μ_1 = ⋅ ⋅ ⋅ = μ_t = 0, because {y_1, . . . , y_t} is a basis of U′.
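
For R = ℤ the induction in the proof of Theorem 18.5.1 can be turned into a procedure. The following sketch is our own illustration, not the authors' algorithm: given finitely many generators of a submodule U ⊂ ℤ^n, it applies the Euclidean algorithm column by column. In each column the surviving pivot entry generates the ideal a of the proof (up to sign), and the remaining rows generate U′ = U ∩ M′.

```python
def submodule_basis(generators):
    """Return a basis (in echelon form) of the submodule of Z^n spanned by the generators."""
    rows = [list(g) for g in generators if any(g)]
    ncols = len(generators[0]) if generators else 0
    basis = []
    for col in range(ncols):
        while True:
            nonzero = [r for r in rows if r[col] != 0]
            if len(nonzero) <= 1:
                break
            # Euclidean step: reduce all other rows modulo the smallest pivot
            pivot = min(nonzero, key=lambda r: abs(r[col]))
            for r in nonzero:
                if r is not pivot:
                    q = r[col] // pivot[col]
                    for k in range(len(r)):
                        r[k] -= q * pivot[k]
        nonzero = [r for r in rows if r[col] != 0]
        if nonzero:
            basis.append(nonzero[0])   # plays the role of u in the proof
            rows = [r for r in rows if r is not nonzero[0]]
        rows = [r for r in rows if any(r)]   # discard zero rows
    return basis

print(submodule_basis([(2, 4), (4, 2), (6, 6)]))
print(submodule_basis([(2, 0), (3, 0)]))
```

The number of basis vectors returned never exceeds n, in agreement with rank(U) ≤ rank(M).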

Let R be a principal ideal domain. Then the annihilator Ann(x) in R-modules M has
certain further properties. Let x ∈ M. By definition

Ann(x) = {α ∈ R : αx = 0} ⊲ R, an ideal in R,

hence Ann(x) = (δ_x) for some δ_x ∈ R. If x = 0, then (δ_x) = R. δ_x is called the order of x
and (δ_x) the order ideal of x. δ_x is uniquely determined up to units in R (that is, up to
elements η with ηη′ = 1 for some η′ ∈ R). For a submodule U of M, we call
Ann(U) = ⋂_{u∈U} (δ_u) = (μ) the order ideal of U.
In an Abelian group G, considered as a ℤ-module, this order for elements corre-
sponds exactly to the order as group elements if we choose δx ≥ 0 for x ∈ G.

Theorem 18.5.2. Let R be a principal ideal domain and M be a finitely generated torsion-
free R-module. Then M is free.

Proof. Let M = ⟨x_1, . . . , x_n⟩ be torsion-free and R a principal ideal domain. Each submodule
⟨x_i⟩ = Rx_i is free, because M is torsion-free. We call a subset S ⊂ {x_1, . . . , x_n} free if the
submodule ⟨S⟩ is free with basis S. Since ⟨x_i⟩ is free, there exist such nonempty subsets. Among
all free subsets S ⊂ {x_1, . . . , x_n}, we choose one with a maximal number of elements. We
may assume, after possible renaming, that {x_1, . . . , x_s}, 1 ≤ s ≤ n, is such a maximal set.
If s = n, then the theorem holds. Now, let s < n. By the choice of s, the sets {x_1, . . . , x_s, x_j}
with s < j ≤ n are not free. Hence, for each such j there are α_j ∈ R and α_i ∈ R, 1 ≤ i ≤ s, with

α_j x_j = ∑_{i=1}^s α_i x_i, α_j ≠ 0, s < j ≤ n

(here α_j ≠ 0, because {x_1, . . . , x_s} is free).
For the product α := α_{s+1} ⋅ ⋅ ⋅ α_n ≠ 0, we get αx_j ∈ Rx_1 ⊕ ⋅ ⋅ ⋅ ⊕ Rx_s =: F, s < j ≤ n, and
also αx_i ∈ F for 1 ≤ i ≤ s. Altogether, we get αM ⊂ F. αM is a submodule of the free R-module
F of rank s. By Theorem 18.5.1, we have that αM is free. Since α ≠ 0, and M is torsion-free,
the map M → αM, x ↦ αx, defines a (module) isomorphism; that is, M ≅ αM.
Therefore, M is also free.

Recall that for an integral domain R, the set

Tor(M) = T(M) = {x ∈ M : ∃α ∈ R, α ≠ 0, with αx = 0}

of the torsion elements of an R-module M is a submodule with torsion-free factor module
M/T(M).

Corollary 18.5.3. Let R be a principal ideal domain and M be a finitely generated R-mod-
ule. Then M = T(M) ⊕ F with a free submodule F ≅ M/T(M).

Proof. M/T(M) is a finitely generated, torsion-free R-module, and hence free. By Corol-
lary 18.4.9, we have M = T(M) ⊕ F, F ≅ M/T(M).

From now on, we are interested in the case where M ≠ {0} is a torsion R-module;
that is, M = T(M). Let R be a principal ideal domain and M = T(M) an R-module. Let
M ≠ {0} and finitely generated. As above, let δx be the order of x ∈ M, unique up to units
in R, and let (δx ) = {α ∈ R : αx = 0} be the order ideal of x.
Let (μ) = ⋂_{x∈M} (δ_x) be the order ideal of M. Since (μ) ⊂ (δ_x), we have δ_x | μ for
all x ∈ M. Since principal ideal domains are unique factorization domains, if μ ≠ 0,
then there are only finitely many essentially different orders (that means, different up to
units). Since M ≠ {0} is finitely generated, we have in any case μ ≠ 0: if
M = ⟨x_1, . . . , x_n⟩ and α_i x_i = 0 with α_i ≠ 0, then αM = {0} for α := α_1 ⋅ ⋅ ⋅ α_n ≠ 0.

Lemma 18.5.4. Let R be a principal ideal domain and M ≠ {0} be an R-module with M =
T(M).
(1) If the orders δx and δy of x, y ∈ M are relatively prime; that is, gcd(δx , δy ) = 1, then
(δx+y ) = (δx δy ).
(2) Let δz be the order of z ∈ M, z ≠ 0. If δz = αβ with gcd(α, β) = 1, then there exist
x, y ∈ M with z = x + y and (δx ) = (α), (δy ) = (β).

Proof. (1) Since δx δy (x + y) = δx δy x + δx δy y = δy (δx x) + δx (δy y) = 0, we get
(δx δy) ⊂ (δ_{x+y}).
On the other hand, from δx x = 0 and δ_{x+y} (x + y) = 0, we get 0 = δx δ_{x+y} (x + y) = δx δ_{x+y} y;
that means, δx δ_{x+y} ∈ (δy), and hence δy | δx δ_{x+y}. Since gcd(δx, δy) = 1, we have δy | δ_{x+y}.
Analogously, δx | δ_{x+y}. Hence, δx δy | δ_{x+y}, and (δ_{x+y}) ⊂ (δx δy).
(2) Let δz = αβ with gcd(α, β) = 1. Then there are ρ, σ ∈ R with 1 = ρα + σβ. Therefore,
we get

z = 1 ⋅ z = ραz + σβz = y + x = x + y, where y := ραz and x := σβz.

Since αx = ασβz = σδz z = 0, we get α ∈ (δx); that means, δx | α. On the other hand, from
0 = δx x = σβδx z, we get δz | σβδx, and hence αβ | σβδx, because δz = αβ. Therefore, α | σδx.
From gcd(α, σ) = 1, we get α | δx. Therefore, α is associated to δx; that is, α = δx ϵ with ϵ a
unit in R, and furthermore, (α) = (δx). Analogously, (β) = (δy).
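The splitting in part (2) can be tried out for R = ℤ and M = ℤ/nℤ, where the order of z is n/gcd(z, n). The following is a minimal sketch under these assumptions; the function names order_in_Zn, ext_gcd, and split are ours and do not come from the text.

```python
from math import gcd


def order_in_Zn(z, n):
    """Order of z in the Z-module Z/nZ: the smallest d > 0 with d*z = 0 (mod n)."""
    return n // gcd(z % n, n)


def ext_gcd(a, b):
    """Return (g, rho, sigma) with rho*a + sigma*b == g == gcd(a, b) for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t


def split(z, n, alpha, beta):
    """Write z = x + y in Z/nZ with order(x) = alpha and order(y) = beta.

    Assumes gcd(alpha, beta) = 1 and alpha * beta = order of z.
    """
    g, rho, sigma = ext_gcd(alpha, beta)
    assert g == 1 and alpha * beta == order_in_Zn(z, n)
    y = (rho * alpha * z) % n   # y := rho*alpha*z, as in the proof
    x = (sigma * beta * z) % n  # x := sigma*beta*z, as in the proof
    return x, y


x, y = split(1, 12, 4, 3)
print(x, y, order_in_Zn(x, 12), order_in_Zn(y, 12))  # 9 4 4 3
```

Here z = 1 has order 12 = 4 ⋅ 3 in ℤ/12ℤ, and the Bézout coefficients ρ = 1, σ = −1 give x = 9 of order 4 and y = 4 of order 3.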

In Lemma 18.5.4, we do not need M = T(M). We only need x, y, z ∈ M with δx ≠ 0,
δy ≠ 0 and δz ≠ 0, respectively.
280 � 18 The Theory of Modules

Corollary 18.5.5. Let R be a principal ideal domain and M ≠ {0} be an R-module with
M = T(M).
1. Let x1, …, xn ∈ M be pairwise different elements with pairwise relatively prime orders
δxi = αi. Then y = x1 + ⋯ + xn has order α := α1 ⋯ αn.
2. Let 0 ≠ x ∈ M and δx = ϵπ1^{k1} ⋯ πn^{kn} be a prime decomposition of the order δx of x (ϵ a
unit in R and the πi pairwise nonassociate prime elements in R), where n > 0, ki > 0.
Then there exist xi ∈ M, i = 1, …, n, with δxi associated with πi^{ki} and x = x1 + ⋯ + xn.

This is Exercise 7.

18.6 The Fundamental Theorem for Finitely Generated Modules


In Section 10.4, we described the following result called the basis theorem for finite
Abelian groups. In the following, we give a complete proof in detail; an elementary proof
is given in Chapter 19:

Theorem 18.6.1 (Theorem 10.4.1, basis theorem for finite Abelian groups). Let G be a fi-
nite Abelian group. Then G is a direct product of cyclic groups of prime power order.

This allowed us, for a given finite order n, to present a complete classification of
Abelian groups of order n. In this section, we extend this result to general modules
over principal ideal domains. As a consequence, we obtain the fundamental decom-
position theorem for finitely generated (not necessarily finite) Abelian groups, which
finally proves Theorem 10.4.1. In the next chapter, we present a separate proof of this in
a slightly different format.

Definition 18.6.2. Let R be a principal ideal domain and M be an R-module. Let π ∈ R
be a prime element. Mπ := {x ∈ M : ∃k ≥ 0 with π^k x = 0} is called the π-primary
component of M. If M = Mπ for some prime element π ∈ R, then M is called π-primary.

We have the following:
1. Mπ is a submodule of M.
2. The primary components correspond to the p-subgroups in Abelian groups.

Theorem 18.6.3. Let R be a principal ideal domain and M ≠ {0} be an R-module with
M = T(M). Then M is the direct sum of its π-primary components.
Proof. Each x ∈ M has an order δx ≠ 0. Let δx = ϵπ1^{k1} ⋯ πn^{kn} be a prime decomposition of δx. By
Corollary 18.5.5, we have that x = ∑ xi with xi ∈ Mπi. That means, M = ∑_{π∈P} Mπ, where P
is a set of representatives of the prime elements of R up to units. Let y ∈ Mπ ∩ ∑_{σ∈P, σ≠π} Mσ;
that is, δy = π^k for some k ≥ 0 and y = ∑ xi with xi ∈ Mσi, σi ≠ π. That means, δxi = σi^{li} for
some li ≥ 0. By Corollary 18.5.5, we get that y has the order ∏_{σi≠π} σi^{li}; that means, π^k is
associated to ∏_{σi≠π} σi^{li}. Therefore, k = li = 0 for all i; that is, y = 0, and the sum is direct.

If R is a principal ideal domain and {0} ≠ M = T(M) a finitely generated torsion
R-module, then there are only finitely many nontrivial π-primary components; namely,
those for the prime elements π with π | μ, where (μ) is the order ideal of M.

Corollary 18.6.4. Let R be a principal ideal domain and {0} ≠ M be a finitely gener-
ated torsion R-module. Then M has only finitely many nontrivial primary components
Mπ1, …, Mπn, and we have

M = ⨁_{i=1}^{n} Mπi.

Hence, we have a reduction of the decomposition problem to the primary components.
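For R = ℤ and M = ℤ/nℤ, the primary components can be listed by brute force. The sketch below assumes this special case; prime_factors and primary_component are illustrative names of ours. It uses that an element of p-power order is already killed by the largest power of p dividing n.

```python
def prime_factors(n):
    """Prime factorization of n > 0 as a dict {p: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def primary_component(n, p):
    """All x in Z/nZ with p^k * x = 0 for some k >= 0."""
    pk = p ** prime_factors(n).get(p, 0)  # largest power of p dividing n
    return sorted(x for x in range(n) if (pk * x) % n == 0)


M2 = primary_component(12, 2)
M3 = primary_component(12, 3)
print(M2, M3)  # [0, 3, 6, 9] [0, 4, 8]
# Every element of Z/12Z is a sum a + b with a in M2 and b in M3:
print(len({(a + b) % 12 for a in M2 for b in M3}))  # 12
```

Since the 4 ⋅ 3 = 12 sums are pairwise different, the decomposition ℤ/12ℤ = M2 ⊕ M3 is direct in this example.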

Theorem 18.6.5. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0}
an R-module with π^k M = {0}; furthermore, let m ∈ M with (δm) = (π^k). Then there exists a
submodule N ⊂ M with M = Rm ⊕ N.

Proof. The set {U : U submodule of M and U ∩ Rm = {0}} is nonempty, because it
contains {0}; by Zorn’s lemma, it has a maximal element N. We consider M′ := N ⊕ Rm ⊂ M,
and have to show that M′ = M. Assume that M′ ≠ M. Then there exists an x ∈ M with
x ∉ M′; in particular, x ∉ N. Then N is properly contained in the submodule Rx + N = ⟨x, N⟩.
By our choice of N, we get A := (Rx + N) ∩ Rm ≠ {0}. If z ∈ A, z ≠ 0, then z = ρm = αx + n
with ρ, α ∈ R and n ∈ N. Since z ≠ 0, we have ρm ≠ 0; also α ≠ 0, because otherwise
z = n ∈ Rm ∩ N = {0}; and α is not a unit in R, because otherwise x = α^{−1}(ρm − n) ∈ M′.
Replacing n by −n, we have: If x ∈ M, x ∉ M′, then there exist α ∈ R, α ≠ 0, α not a unit
in R, ρ ∈ R with ρm ≠ 0, and n ∈ N such that
αx = ρm + n. (⋆)

In particular, αx ∈ M′.
Now let α = ϵπ1 ⋯ πr be a prime decomposition. We consider one after the other
the elements x, πr x, πr−1 πr x, …, ϵπ1 ⋯ πr x = αx. We have x ∉ M′, but αx ∈ M′; hence,
there exist a y ∉ M′ and an index i with πi y ∈ M′ = N + Rm.
1. πi is not associated to π, the prime element in the statement of the theorem. Then
we have gcd(πi, π^k) = 1; hence, there are σ, σ′ ∈ R with σπi + σ′π^k = 1, and we get

Rm = (Rπi + Rπ^k)m = πi Rm,

because π^k m = 0. Therefore, πi y ∈ M′ = N ⊕ Rm = N + πi Rm.
2. πi is associated to π; without loss of generality, πi = π. Then we write πy as
πy = n + λm with n ∈ N and λ ∈ R. This is possible, because πy ∈ M′. Since π^k M = {0},
we get 0 = π^{k−1} ⋅ πy = π^{k−1} n + π^{k−1} λm. Therefore, π^{k−1} n = π^{k−1} λm = 0, because
N ∩ Rm = {0}. In particular, we get π^{k−1} λ ∈ (δm); that is, π^k | π^{k−1} λ, and hence π | λ.
Therefore, πy = n + λm = n + πλ′m ∈ N + πRm with λ = πλ′, λ′ ∈ R.

Hence, in any case, we have πi y ∈ N + πi Rm; that is, πi y = n + πi z with n ∈ N and z ∈ Rm.
It follows that πi (y − z) = n ∈ N.

y − z is not an element of M′, because y ∉ M′ and z ∈ Rm ⊂ M′. By (⋆), we have, therefore,
α, β ∈ R, β ≠ 0 not a unit in R, with β(y − z) = n′ + αm, αm ≠ 0, n′ ∈ N. We write z′ = αm;
then z′ ∈ Rm, z′ ≠ 0. So, we have the two equations

β(y − z) = n′ + z′, z′ ≠ 0, and πi(y − z) = n. (⋆⋆)

We have gcd(β, πi) = 1, because otherwise πi | β and, hence, β(y − z) ∈ N and z′ = 0,
because N ∩ Rm = {0}. Then there exist γ, γ′ ∈ R with γπi + γ′β = 1. In (⋆⋆), we multiply the
first equation with γ′ and the second with γ.
Addition gives y − z ∈ N ⊕ Rm = M′, and hence y ∈ M′, which contradicts y ∉ M′.
Therefore, M = M′.

Theorem 18.6.6. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0}
a finitely generated π-primary R-module. Then there exist finitely many m1, …, ms ∈ M
with M = ⨁_{i=1}^{s} Rmi.

Proof. Let M = ⟨x1, …, xn⟩. Each xi has an order π^{ki}. We may assume that k1 =
max{k1, k2, …, kn}, possibly after renaming. We have π^{ki} xi = 0 for all i. Since
π^{k1} xi = π^{k1−ki}(π^{ki} xi) = 0 for all i, we also have π^{k1} M = {0}, and (δx1) = (π^{k1}).
Then M = Rx1 ⊕ N for some
submodule N ⊂ M by Theorem 18.6.5. Now N ≅ M/Rx1, and M/Rx1 is generated by the
elements x2 + Rx1, …, xn + Rx1. Hence, N is finitely generated by n − 1 elements, and
certainly N is π-primary. This proves the result by induction.

Since Rmi ≅ R/Ann(mi), and Ann(mi) = (δmi) = (π^{ki}), we get the following extension
of Theorem 18.6.6:

Theorem 18.6.7. Let R be a principal ideal domain, π ∈ R a prime element, and M ≠ {0} a
finitely generated π-primary R-module. Then there exist finitely many k1, …, ks ∈ ℕ with

M ≅ ⨁_{i=1}^{s} R/(π^{ki}),

and M is, up to isomorphism, uniquely determined by (k1, …, ks).

Proof. The first part, that is, a description as M ≅ ⨁_{i=1}^{s} R/(π^{ki}), follows directly from
Theorem 18.6.6. Now, let

M ≅ ⨁_{i=1}^{n} R/(π^{ki}) ≅ ⨁_{i=1}^{m} R/(π^{li}).

We may assume that k1 ≥ k2 ≥ ⋯ ≥ kn > 0, and l1 ≥ l2 ≥ ⋯ ≥ lm > 0. We consider first
the submodule N := {x ∈ M : πx = 0}. Let M = ⨁_{i=1}^{n} R/(π^{ki}).
If we then write x = ∑ (ri + (π^{ki})), we have πx = 0 if and only if ri ∈ (π^{ki−1}) for all i; that is,
N ≅ ⨁_{i=1}^{n} (π^{ki−1})/(π^{ki}) ≅ ⨁_{i=1}^{n} R/(π), because π^{k−1}R/π^k R ≅ R/πR.

Since (α + (π))x = αx if πx = 0, we get that N is an R/(π)-module, and hence a vector
space over the field R/(π). From the decompositions

N ≅ ⨁_{i=1}^{n} R/(π) and, analogously, N ≅ ⨁_{i=1}^{m} R/(π),

we get

n = dim_{R/(π)} N = m. (⋆⋆⋆)

Assume that there is an i with ki < li or li < ki. Without loss of generality, assume that
there is an i with ki < li.
Let j be the smallest index for which kj < lj. Then (because of the ordering of the ki)

M′ := π^{kj} M ≅ ⨁_{i=1}^{n} π^{kj}R/π^{ki}R ≅ ⨁_{i=1}^{j−1} π^{kj}R/π^{ki}R,

because if i ≥ j, then ki ≤ kj and π^{kj}R/π^{ki}R = {0}.
We now consider M′ = π^{kj} M with respect to the second decomposition; that is,
M′ ≅ ⨁_{i=1}^{m} π^{kj}R/π^{li}R. By our choice of j, we have kj < lj ≤ li for 1 ≤ i ≤ j.
Therefore, in this second decomposition, the first j summands π^{kj}R/π^{li}R are not
{0}; that is, π^{kj}R/π^{li}R ≠ {0} if 1 ≤ i ≤ j. The remaining summands are {0}, or of the
form R/π^s R. Hence, altogether, on the one hand, M′ is a direct sum of at most j − 1
nontrivial cyclic submodules, and, on the other hand, a direct sum of t ≥ j nontrivial
cyclic submodules. But this contradicts the above result (⋆⋆⋆) about the number of direct
summands for finitely generated π-primary modules, because, certainly, M′ is also finitely
generated and π-primary.
Therefore, ki = li for i = 1, …, n. This proves the theorem.
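The counting idea behind this proof can be checked numerically for R = ℤ and π = p. In ℤ/p^{k1} × ⋯ × ℤ/p^{kn}, the elements killed by p^j form a subgroup, and the quotient of its size by the size of the subgroup killed by p^{j−1} is p^c, where c is the number of indices i with ki ≥ j. The brute-force sketch below assumes this setting; the function names are ours.

```python
from itertools import product


def count_killed(p, exps, j):
    """Number of x in Z/p^k1 x ... x Z/p^kn with p^j * x = 0, by brute force."""
    moduli = [p ** k for k in exps]
    return sum(
        all((p ** j * c) % m == 0 for c, m in zip(x, moduli))
        for x in product(*(range(m) for m in moduli))
    )


def log_base(p, q):
    """Exponent e with p**e == q; q is assumed to be a power of p."""
    e = 0
    while q > 1:
        q //= p
        e += 1
    return e


def summands_with_exponent_at_least(p, exps, j):
    """Number of indices i with k_i >= j, read off from two kernel sizes."""
    return log_base(p, count_killed(p, exps, j) // count_killed(p, exps, j - 1))


# For Z/8 x Z/2 x Z/2 (exponents 3, 1, 1) and j = 1, 2, 3, 4:
print([summands_with_exponent_at_least(2, [3, 1, 1], j) for j in range(1, 5)])  # [3, 1, 1, 0]
```

The list [3, 1, 1, 0] determines the exponents (3, 1, 1), which illustrates why they depend only on the isomorphism type of the module.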

Theorem 18.6.8 (Fundamental theorem for finitely generated modules over principal ideal
domains). Let R be a principal ideal domain and M ≠ {0} be a finitely generated (unitary)
R-module. Then there exist prime elements π1, …, πr ∈ R, 0 ≤ r < ∞, and numbers
k1, …, kr ∈ ℕ, t ∈ ℕ0 such that

M ≅ R/(π1^{k1}) ⊕ R/(π2^{k2}) ⊕ ⋯ ⊕ R/(πr^{kr}) ⊕ R ⊕ ⋯ ⊕ R (with t summands R),

and M is, up to isomorphism, uniquely determined by (π1^{k1}, …, πr^{kr}, t).

The prime elements πi are not necessarily pairwise different (up to units in R); that
means, we may have πi = ϵπj for i ≠ j, where ϵ is a unit in R.

Proof. The proof is a combination of the preceding results. The free part of M is isomor-
phic to M/T(M), and the rank of M/T(M), which we call here t, is uniquely determined,
because two bases of M/T(M) have the same cardinality. Therefore, we may restrict our-
selves to torsion modules. Here, we have a reduction to π-primary modules, because in
a decomposition M = ⨁_i R/(πi^{ki}), the π-primary component of M is Mπ = ⨁_{πi=π} R/(πi^{ki})
(an isomorphism certainly maps a π-primary component onto a π-primary component).
Therefore, it is only necessary, now, to consider π-primary modules M. The uniqueness
statement now follows from Theorem 18.6.7.

Since Abelian groups can be considered as ℤ-modules, and ℤ is a principal ideal
domain, we get the following corollary. We will restate this result in the next chapter
and prove a different version of it.

Theorem 18.6.9 (Fundamental theorem for finitely generated Abelian groups). Let {0} ≠
G = (G, +) be a finitely generated Abelian group. Then there exist prime numbers p1 , . . . , pr ,
0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 such that

G ≅ ℤ/(p1^{k1}ℤ) ⊕ ⋯ ⊕ ℤ/(pr^{kr}ℤ) ⊕ ℤ ⊕ ⋯ ⊕ ℤ (with t summands ℤ),

and G is, up to isomorphism, uniquely determined by (p1^{k1}, …, pr^{kr}, t).

18.7 Exercises
1. Let M and N be isomorphic modules over a commutative ring R. Show that EndR(M)
and EndR(N) are isomorphic rings. (EndR(M) is the set of all R-module endomorphisms
of M.)
2. Let R be an integral domain and M an R-module with M = Tor(M) (torsion module).
Show that HomR (M, R) = 0. (HomR (M, R) is the set of all R-module homomorphisms
from M to R.)
3. Prove the isomorphism theorems for modules (1), (2), and (3) in Theorem 18.1.11.
4. Let M, M ′ , N be R-modules, R a commutative ring. Show the following:
(i) HomR (M ⊕ M ′ , N) ≅ HomR (M, N) × HomR (M ′ , N).
(ii) HomR (N, M × M ′ ) ≅ HomR (N, M) ⊕ HomR (N, M ′ ).
5. Show that two free R-modules having bases whose cardinalities are equal are iso-
morphic.
6. Let M be a unitary R-module (R a commutative ring), and let {m1, …, ms} be a finite
subset of M. Show that the following are equivalent:
(i) {m1 , . . . , ms } generates M freely.
(ii) {m1 , . . . , ms } is linearly independent and generates M.
(iii) Every element m ∈ M is uniquely expressible in the form m = ∑si=1 ri mi with
ri ∈ R.
(iv) Each Rmi is torsion-free, and M = Rm1 ⊕ ⋅ ⋅ ⋅ ⊕ Rms .
7. Let R be a principal ideal domain and M ≠ {0} be an R-module with M = T(M).
(i) Let x1, …, xn ∈ M be pairwise different elements with pairwise relatively prime
orders δxi = αi. Show that y = x1 + ⋯ + xn has order α := α1 ⋯ αn.
(ii) Let 0 ≠ x ∈ M and δx = ϵπ1^{k1} ⋯ πn^{kn} be a prime decomposition of the order δx of
x (ϵ a unit in R and the πi pairwise nonassociate prime elements in R), where
n > 0, ki > 0. Show that there exist xi ∈ M, i = 1, …, n, with δxi associated with πi^{ki} and
x = x1 + ⋯ + xn.
19 Finitely Generated Abelian Groups
19.1 Finite Abelian Groups
In Chapter 10, we stated the theorem below, which completely describes the structure
of finite Abelian groups. As we saw in Chapter 18, this result is a special case of a general
result on modules over principal ideal domains.

Theorem 19.1.1 (Theorem 10.4.1, basis theorem for finite Abelian groups). Let G be a finite
Abelian group. Then G is a direct product of cyclic groups of prime power order.

We review two examples that show how this theorem leads to the classification of
finite Abelian groups. In particular, this theorem allows us, for a given finite order n, to
present a complete classification of Abelian groups of order n.
Since all cyclic groups of order n are isomorphic to (ℤn , +), ℤn = ℤ/nℤ, we will
denote a cyclic group of order n by ℤn .

Example 19.1.2. Classify all Abelian groups of order 60. Let G be an Abelian group of
order 60. From Theorem 10.4.1, G must be a direct product of cyclic groups of prime
power order. Now 60 = 2^2 ⋅ 3 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order either 2, 4,
3, or 5 (by Lagrange’s theorem they must be divisors of 60). Therefore, G must be of the
form

G ≅ ℤ4 × ℤ3 × ℤ5 ,

or

G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ5 .

Hence, up to isomorphism, there are only two Abelian groups of order 60.

Example 19.1.3. Classify all Abelian groups of order 180. Let G be an Abelian group of
order 180. Now 180 = 2^2 ⋅ 3^2 ⋅ 5, so the only primes involved are 2, 3, and 5. Hence, the
cyclic groups involved in the direct product decomposition of G have order either 2, 4,
3, 9, or 5 (by Lagrange’s theorem they must be divisors of 180). Therefore, G must be of
the form

G ≅ ℤ4 × ℤ9 × ℤ5,
G ≅ ℤ2 × ℤ2 × ℤ9 × ℤ5,
G ≅ ℤ4 × ℤ3 × ℤ3 × ℤ5, or
G ≅ ℤ2 × ℤ2 × ℤ3 × ℤ3 × ℤ5.

Therefore, up to isomorphism, there are four Abelian groups of order 180.
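The procedure of the two examples can be mechanized: factor n, choose for each prime p with exponent k a partition of k, and take one cyclic factor of order p^e for every part e. The following is a minimal sketch of this enumeration; the helper names prime_factors, partitions, and abelian_groups are ours, and each group is represented simply by the sorted list of the orders of its cyclic factors.

```python
from itertools import product


def prime_factors(n):
    """Prime factorization of n > 0 as a dict {p: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def partitions(m, largest=None):
    """All partitions of m as non-increasing tuples of positive integers."""
    if largest is None:
        largest = m
    if m == 0:
        return [()]
    result = []
    for first in range(min(m, largest), 0, -1):
        for rest in partitions(m - first, first):
            result.append((first,) + rest)
    return result


def abelian_groups(n):
    """One list of prime-power orders of cyclic factors per Abelian group of order n."""
    per_prime = [
        [[p ** e for e in part] for part in partitions(k)]
        for p, k in sorted(prime_factors(n).items())
    ]
    return [sorted(sum(choice, [])) for choice in product(*per_prime)]


print(abelian_groups(60))        # [[3, 4, 5], [2, 2, 3, 5]]
print(len(abelian_groups(180)))  # 4
```

The two lists for n = 60 correspond to ℤ4 × ℤ3 × ℤ5 and ℤ2 × ℤ2 × ℤ3 × ℤ5, in agreement with Example 19.1.2.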

https://doi.org/10.1515/9783111142524-019

The proof of Theorem 19.1.1 involves the lemmas that follow. We refer back to Chap-
ter 10 or Chapter 18 for the proofs. Notice how these lemmas mirror the results for
finitely generated modules over principal ideal domains considered in the last chap-
ter.

Lemma 19.1.4. Let G be a finite Abelian group, and let p | |G|, where p is a prime. Then
all the elements of G whose orders are a power of p form a normal subgroup of G. This
subgroup is called the p-primary component of G, which we will denote by Gp.
Lemma 19.1.5. Let G be a finite Abelian group of order n. Suppose that n = p1^{e1} ⋯ pk^{ek} with
p1, …, pk distinct primes.
Then

G ≅ Gp1 × ⋅ ⋅ ⋅ × Gpk ,

where Gpi is the pi -primary component of G.

Theorem 19.1.6 (Basis theorem for finite Abelian groups). Let G be a finite Abelian group.
Then G is a direct product of cyclic groups of prime power order.

19.2 The Fundamental Theorem: p-Primary Components


In this section, we use the fundamental theorem for finitely generated modules over
principal ideal domains to extend the basis theorem for finite Abelian groups to the more
general case of finitely generated Abelian groups. We also consider the decomposition
into p-primary components, mirroring our result in the finite case. In the next section,
we present a different form of the basis theorem with a more elementary proof.
In Chapter 18, we proved the following:

Theorem 19.2.1 (Fundamental theorem for finitely generated modules over principal ideal
domains). Let R be a principal ideal domain and M ≠ {0} be a finitely generated (uni-
tary) R-module. Then there exist prime elements π1 , . . . , πr ∈ R, 0 ≤ r < ∞ and numbers
k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 , such that

M ≅ R/(π1^{k1}) ⊕ R/(π2^{k2}) ⊕ ⋯ ⊕ R/(πr^{kr}) ⊕ R ⊕ ⋯ ⊕ R (with t summands R),

and M is, up to isomorphism, uniquely determined by (π1^{k1}, …, πr^{kr}, t).

The prime elements πi are not necessarily pairwise different (up to units in R); that
means, we may have πi = ϵπj for i ≠ j, where ϵ is a unit in R.
Since Abelian groups can be considered as ℤ-modules, and ℤ is a principal ideal
domain, we get the following corollary, which is extremely important in its own right.

Theorem 19.2.2 (Fundamental theorem for finitely generated Abelian groups). Suppose
{0} ≠ G = (G, +) is a finitely generated Abelian group. Then there exist prime numbers
p1 , . . . , pr , 0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 , such that

G ≅ ℤ/(p1^{k1}ℤ) ⊕ ⋯ ⊕ ℤ/(pr^{kr}ℤ) ⊕ ℤ ⊕ ⋯ ⊕ ℤ (with t summands ℤ),

and G is, up to isomorphism, uniquely determined by (p1^{k1}, …, pr^{kr}, t).

Notice that the number t of infinite components is unique. This is called the rank or
Betti number of the Abelian group G. This number plays an important role in the study
of homology and cohomology groups in topology.
If G = ℤ × ℤ × ⋯ × ℤ = ℤ^r for some r, we call G a free Abelian group of rank r.
Notice that if a finitely generated Abelian group G is torsion-free, then its p-primary
components are all trivial. It follows that, in this case, G is a free Abelian group of finite
rank. Again, using module theory, it follows that subgroups of such a group must also be
free Abelian and of smaller or equal rank. Notice the distinction between free Abelian groups and absolutely
free groups (see Chapter 14). In the free group case, a non-Abelian free group of finite
rank contains free subgroups of all possible countable ranks. In the free Abelian case,
however, the subgroups have smaller or equal rank. We summarize these comments as
follows:

Theorem 19.2.3. Let G ≠ {0} be a finitely generated torsion-free Abelian group. Then G is
a free Abelian group of finite rank r; that is, G ≅ ℤ^r. Furthermore, if H is a subgroup of G,
then H is also free Abelian and the rank of H is smaller than or equal to the rank of G.

19.3 The Fundamental Theorem: Elementary Divisors


In this section, we present the fundamental theorem of finitely generated Abelian
groups in a slightly different form, and present an elementary proof of it.
In the following, G is always a finitely generated Abelian group. We use the addition
“+” for the binary operation; that is,

+ : G × G → G, (x, y) ↦ x + y.

We also write ng instead of g^n, and use 0 as the symbol for the identity element in G;
that is, 0 + g = g for all g ∈ G. That G = ⟨g1, …, gt⟩, 0 ≤ t < ∞, that is, that G is (finitely)
generated by g1, …, gt, is equivalent to the fact that each g ∈ G can be written in the
form g = n1g1 + n2g2 + ⋯ + ntgt, ni ∈ ℤ. A relation between the gi with coefficients
n1, …, nt is an equation of the form n1g1 + ⋯ + ntgt = 0. A relation is called
nontrivial if ni ≠ 0 for at least one i. A system R of relations in G is called a system of
defining relations if each relation in G is a consequence of R. The elements g1, …, gt are
called integrally linear independent if there are no nontrivial relations between them.
A finite generating system {g1, …, gt} of G is called a minimal generating system if there
is no generating system with t − 1 elements.
Certainly, each finitely generated group has a minimal generating system. In what
follows, we always assume that our finitely generated Abelian group G is not equal to {0};
that is, G is nontrivial.
As above, we may consider G as a finitely generated ℤ-module, and in this sense,
the subgroups of G are precisely the submodules. Hence, it is clear what we mean if we
call G a direct product G = U1 × ⋅ ⋅ ⋅ × Us of its subgroups U1 , . . . , Us ; namely, each g ∈ G
can be written as g = u1 + u2 + ⋅ ⋅ ⋅ + us with ui ∈ Ui and

Ui ∩ (∏_{j=1, j≠i}^{s} Uj) = {0} for i = 1, …, s.

To emphasize the slight difference between Abelian groups and ℤ-modules, here we
use the notation “direct product” instead of “direct sum”. Considered as ℤ-modules, for
finite index sets I = {1, …, s}, we have anyway

∏_{i=1}^{s} Ui = ⨁_{i=1}^{s} Ui.

Finally, we use the notation ℤn instead of ℤ/nℤ, n ∈ ℕ. In general, we use Zn to denote
a cyclic group of order n.
The aim in this section is to prove the following:

Theorem 19.3.1 (Basis theorem for finitely generated Abelian groups). Let G ≠ {0} be a
finitely generated Abelian group. Then G is a direct product

G ≅ Zk1 × ⋅ ⋅ ⋅ × Zkr × U1 × ⋅ ⋅ ⋅ × Us ,

r ≥ 0, s ≥ 0, of cyclic subgroups with |Zki| = ki for i = 1, …, r, ki | ki+1 for i = 1, …, r − 1,
and Uj ≅ ℤ for j = 1, …, s. Here, the numbers k1, …, kr, r, and s are uniquely determined
by G; that means, if k1′, …, k′_{r′}, r′, and s′ are the respective numbers for a second analogous
decomposition of G, then r = r′, k1 = k1′, …, kr = kr′, and s = s′.

The numbers ki are called the elementary divisors of G.


We can have r = 0, or s = 0 (but not both, because G ≠ {0}). If s > 0, r = 0, then G is a
free Abelian group of rank s (exactly the same rank if you consider G as a free ℤ-module
of rank s). If s = 0, then G is finite. In fact, s = 0 if and only if G is finite.
We first prove some preliminary results:

Lemma 19.3.2. Let G = ⟨g1, …, gt⟩, t ≥ 2, be an Abelian group.
Then also G = ⟨g1 + ∑_{i=2}^{t} mi gi, g2, …, gt⟩ for arbitrary m2, …, mt ∈ ℤ.

Lemma 19.3.3. Let G be a finitely generated Abelian group. Among all nontrivial relations
between elements of minimal generating systems of G, we choose one relation,

m1 g1 + ⋅ ⋅ ⋅ + mt gt = 0 (⋆)

with smallest possible positive coefficient, and let this smallest coefficient be m1 . Let

n1 g1 + ⋅ ⋅ ⋅ + nt gt = 0 (⋆⋆)

be another relation between the same generators g1 , . . . , gt . Then


(1) m1 |n1 , and
(2) m1 |mi for i = 1, 2, . . . , t.

Proof. For (1), assume m1 ∤ n1. Then n1 = qm1 + m1′ with 0 < m1′ < m1. If we multiply the
relation (⋆) with q and subtract the resulting relation from the relation (⋆⋆), then we get
a nontrivial relation with a coefficient m1′ < m1, contradicting the choice of m1. Hence, m1 | n1.
For (2), assume m1 ∤ m2. Then m2 = qm1 + m2′ with 0 < m2′ < m1. By Lemma 19.3.2,
{g1 + qg2, g2, …, gt} is a minimal generating system, which satisfies the relation

m1(g1 + qg2) + m2′g2 + m3g3 + ⋯ + mtgt = 0,

and this relation has a coefficient m2′ < m1. This again contradicts the choice of m1.
Hence, m1 | m2, and analogously, m1 | mi for i = 1, …, t.

Lemma 19.3.4 (Invariant characterization of kr for finite Abelian groups G). Consider the
group G = Zk1 × ⋅ ⋅ ⋅ × Zkr with Zki finite cyclic of order ki ≥ 2, i = 1, . . . , r and ki |ki+1 for
i = 1, . . . , r − 1. Then kr is the smallest natural number n such that ng = 0 for all g ∈ G. kr
is called the exponent or the maximal order of G.

Proof. Let g ∈ G be arbitrary; that is, g = n1g1 + ⋯ + nrgr with gi ∈ Zki. Then ki gi = 0 for
i = 1, …, r by the theorem of Fermat. Since ki | kr, we get kr g = n1kr g1 + ⋯ + nrkr gr = 0. Let
a ∈ G with Zkr = ⟨a⟩. Then the order of a is kr and, hence, na ≠ 0 for all 0 < n < kr.
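For small groups, the characterization of kr as the exponent can be confirmed by brute force. The sketch below models Zk1 × ⋯ × Zkr as tuples of residues; the function name exponent is ours.

```python
from itertools import product


def exponent(orders):
    """Smallest n >= 1 with n*g = 0 for all g in Z_k1 x ... x Z_kr, by brute force."""
    elements = list(product(*(range(k) for k in orders)))
    n = 1
    while not all((n * c) % k == 0 for g in elements for c, k in zip(g, orders)):
        n += 1
    return n


print(exponent([2, 6, 12]))  # 12, the last factor of the chain 2 | 6 | 12
print(exponent([4, 6]))      # 12, here 4 does not divide 6
```

The second call shows that the divisibility condition ki | ki+1 is needed: without it, the exponent is the least common multiple of the ki and not necessarily the last order.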

Lemma 19.3.5 (Invariant characterization of s). Let G = Zk1 × ⋯ × Zkr × U1 × ⋯ × Us, s > 0,
where the Zki are finite cyclic groups of order ki, and the Uj are infinite cyclic groups. Then,
s is the maximal number of integrally linear independent elements of G; s is called the rank
of G.

Proof. Let gi ∈ Ui, gi ≠ 0, for i = 1, …, s. Then the g1, …, gs are integrally linear inde-
pendent, because from n1g1 + ⋯ + nsgs = 0, the ni ∈ ℤ, we get

n1g1 ∈ U1 ∩ (U2 × ⋯ × Us) = {0}.

Hence, n1g1 = 0; that is, n1 = 0, because g1 has infinite order. Analogously, we get n2 =
⋯ = ns = 0.
Let g1, …, g_{s+1} ∈ G. We look for integers x1, …, x_{s+1}, not all 0, such that a relation
∑_{i=1}^{s+1} xi gi = 0 holds. Let Zki = ⟨ai⟩, Uj = ⟨bj⟩. Then we may write each gi as

gi = mi1 a1 + ⋯ + mir ar + ni1 b1 + ⋯ + nis bs

for i = 1, …, s + 1, where mij aj ∈ Zkj, and nil bl ∈ Ul.
Case 1: all mij aj = 0. Then ∑_{i=1}^{s+1} xi gi = 0 is equivalent to

∑_{i=1}^{s+1} xi (∑_{j=1}^{s} nij bj) = ∑_{j=1}^{s} (∑_{i=1}^{s+1} nij xi) bj = 0.

The system ∑_{i=1}^{s+1} nij xi = 0, j = 1, …, s, of linear equations has at least one nontrivial ra-
tional solution (x1, …, x_{s+1}), because we have more unknowns than equations. Multipli-
cation with the common denominator gives a nontrivial integral solution (x1, …, x_{s+1}) ∈
ℤ^{s+1}. For this solution, we get

∑_{i=1}^{s+1} xi gi = 0.

Case 2: mij aj arbitrary. Let k ≠ 0 be a common multiple of the orders kj of the cyclic
groups Zkj, j = 1, …, r. Then

kgi = mi1 ka1 + ⋯ + mir kar + ni1 kb1 + ⋯ + nis kbs = ni1 kb1 + ⋯ + nis kbs, since mij kaj = 0 for all j,

for i = 1, …, s + 1. By case 1, the kg1, …, kg_{s+1} are integrally linear dependent; that is, we
have integers x1, …, x_{s+1}, not all 0, with ∑_{i=1}^{s+1} xi (kgi) = 0 = ∑_{i=1}^{s+1} (xi k)gi, and the xi k are
not all 0. Hence, also g1, …, g_{s+1} are integrally linear dependent.

Lemma 19.3.6. Let G := Zk1 × ⋯ × Zkr ≅ Zk1′ × ⋯ × Zk′_{r′} =: G′, the Zki, Zkj′ cyclic groups
of orders ki ≠ 1 and kj′ ≠ 1, respectively, and ki | ki+1 for i = 1, …, r − 1 and kj′ | k′_{j+1} for
j = 1, …, r′ − 1. Then r = r′, and k1 = k1′, k2 = k2′, …, kr = kr′.

Proof. We prove this lemma by induction on the group order |G| = |G′|. Certainly,
Lemma 19.3.6 holds if |G| ≤ 2, because then, either G = {0}, and here r = r′ = 0, or
G ≅ ℤ2, and here r = r′ = 1. Now let |G| > 2. Then, in particular, r ≥ 1. Inductively we
assume that Lemma 19.3.6 holds for all finite Abelian groups of order less than |G|. By
Lemma 19.3.4, the number kr is invariantly characterized; that is, from G ≅ G′ follows
kr = k′_{r′} and, in particular, Zkr ≅ Zk′_{r′}. Then G/Zkr ≅ G′/Zk′_{r′}; that is,

Zk1 × ⋯ × Zkr−1 ≅ Zk1′ × ⋯ × Zk′_{r′−1}.

Inductively, r − 1 = r′ − 1; that is, r = r′, and k1 = k1′, …, kr−1 = k′_{r′−1}.

We can now present the main result, which we state again, and its proof.

Theorem 19.3.7 (Basis theorem for finitely generated Abelian groups). Let G ≠ {0} be a
finitely generated Abelian group. Then G is a direct product

G ≅ Zk1 × ⋯ × Zkr × U1 × ⋯ × Us, r ≥ 0, s ≥ 0,

of cyclic subgroups with |Zki| = ki for i = 1, …, r, ki | ki+1 for i = 1, …, r − 1, and Uj ≅ ℤ for
j = 1, …, s. Here, the numbers k1, …, kr, r, and s are uniquely determined by G; that means,
if k1′, …, k′_{r′}, r′, and s′ are the respective numbers for a second analogous decomposition
of G, then r = r′, k1 = k1′, …, kr = kr′, and s = s′.

Proof. We first prove the existence of the given decomposition. Let G ≠ {0} be a finitely
generated Abelian group. Let t, 0 < t < ∞, be the number of elements in a minimal
generating system of G. We have to show that G is decomposable as a direct product of
t cyclic groups with the given description. We prove this by induction on t. If t = 1, then
G is cyclic, and the basis theorem holds. Now let t ≥ 2, and assume that the assertion
holds for all Abelian groups with fewer than t generators.
Case 1: There does not exist a minimal generating system of G, which satisfies a
nontrivial relation. Let {g1 , . . . , gt } be an arbitrary minimal generating system for G. Let
Ui = ⟨gi ⟩. Then all Ui are infinite cyclic, and we have G = U1 × ⋅ ⋅ ⋅ × Ut , because if, for
instance, U1 ∩ (U2 + ⋅ ⋅ ⋅ + Ut ) ≠ {0}, then we must have a nontrivial relation between the
g1 , . . . , gt .
Case 2: There exist minimal generating systems of G, which satisfy nontrivial rela-
tions. Among all nontrivial relations between elements of minimal generating systems
of G, we choose one relation,

m1g1 + ⋯ + mtgt = 0 (⋆)

with smallest possible positive coefficient. Without loss of generality, let m1 be this coef-
ficient. By Lemma 19.3.3, we get m2 = q2m1, …, mt = qtm1. Now,

{g1 + ∑_{i=2}^{t} qi gi, g2, …, gt}

is a minimal generating system of G by Lemma 19.3.2. Define h1 = g1 + ∑_{i=2}^{t} qi gi; then
m1h1 = 0. If n1h1 + n2g2 + ⋯ + ntgt = 0 is an arbitrary relation between h1, g2, …, gt,
then m1 | n1 by Lemma 19.3.3; hence, n1h1 = 0. Define H1 := ⟨h1⟩, and G′ = ⟨g2, …, gt⟩.
Then G = H1 × G′. This we can see as follows: First, each g ∈ G can be written as g =
n1h1 + n2g2 + ⋯ + ntgt = n1h1 + g′ with g′ ∈ G′. Also H1 ∩ G′ = {0}, because n1h1 = g′ ∈ G′
implies a relation n1h1 + n2g2 + ⋯ + ntgt = 0, and from this we get, as above, n1h1 = g′ = 0.
Now, inductively, G′ = Zk2 × ⋯ × Zkr × U1 × ⋯ × Us with Zki a cyclic group of order ki,
i = 2, …, r, ki | ki+1 for i = 2, …, r − 1, Uj ≅ ℤ for j = 1, …, s, and (r − 1) + s = t − 1; that is,
r + s = t. Furthermore, G = H1 × G′, where H1 is cyclic of order m1. If r ≥ 2 and Zk2 = ⟨h2⟩,
then we get a nontrivial relation

m1h1 + k2h2 = 0, in which m1h1 = 0 and k2h2 = 0,

since k2 ≠ 0. Again m1 | k2 by Lemma 19.3.3. With k1 := m1, this gives the desired decomposition.

We now prove the uniqueness statement.
Case 1: G is finite Abelian. Then the claim follows from Lemma 19.3.6.
Case 2: G is arbitrary finitely generated and Abelian. Let T := {x ∈ G : |x| < ∞}; that
is, the set of elements of G of finite order. Since G is Abelian, T is a subgroup of G, the
so-called torsion subgroup of G. If, as above,

G = Zk1 × ⋯ × Zkr × U1 × ⋯ × Us,

then T = Zk1 × ⋯ × Zkr, because an element b1 + ⋯ + br + c1 + ⋯ + cs, bi ∈ Zki,
cj ∈ Uj, has finite order if and only if all cj = 0. That means: Zk1 × ⋯ × Zkr is indepen-
dent of the special decomposition and uniquely determined by G; hence, so are the numbers
r, k1, …, kr by Lemma 19.3.6. Finally, the number s, the rank of G, is uniquely determined
by Lemma 19.3.5.
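The existence proof above is not phrased as an algorithm, because it starts from a relation with smallest possible coefficient. A standard algorithmic counterpart, which is not taken from this text, is the diagonalization of an integer relation matrix (Smith normal form): row operations combine relations, and column operations change generators in the spirit of Lemma 19.3.2. The following is a minimal sketch under the assumption that G is given by t generators and finitely many defining relations; the function names are ours.

```python
def smith_diagonal(relations):
    """Nonzero diagonal entries d1 | d2 | ... of a Smith normal form.

    Each row of `relations` holds the coefficients (n1, ..., nt) of one
    relation n1*g1 + ... + nt*gt = 0. Only integer row and column
    operations that are invertible over Z are used.
    """
    A = [list(row) for row in relations]
    rows, cols = len(A), len(A[0])
    diag, t = [], 0
    while t < min(rows, cols):
        # Choose a nonzero entry of smallest absolute value in the remaining block.
        entries = [(abs(A[i][j]), i, j)
                   for i in range(t, rows) for j in range(t, cols) if A[i][j] != 0]
        if not entries:
            break
        _, i, j = min(entries)
        A[t], A[i] = A[i], A[t]          # move this pivot to position (t, t)
        for row in A:
            row[t], row[j] = row[j], row[t]
        p = A[t][t]
        # Division with remainder in column t and in row t.
        for i in range(t + 1, rows):
            q = A[i][t] // p
            A[i] = [a - q * b for a, b in zip(A[i], A[t])]
        for j in range(t + 1, cols):
            q = A[t][j] // p
            for row in A:
                row[j] -= q * row[t]
        if any(A[i][t] for i in range(t + 1, rows)) or any(A[t][j] for j in range(t + 1, cols)):
            continue                      # a smaller remainder appeared; choose a new pivot
        # Ensure that the pivot divides all remaining entries.
        bad = [i for i in range(t + 1, rows)
               if any(A[i][j] % p for j in range(t + 1, cols))]
        if bad:
            A[t] = [a + b for a, b in zip(A[t], A[bad[0]])]
            continue
        diag.append(abs(p))
        t += 1
    return diag


def group_from_relations(relations):
    """Return ([k1, ..., kr], s): orders of the finite cyclic factors and the number of factors Z."""
    diag = smith_diagonal(relations)
    return [d for d in diag if d > 1], len(relations[0]) - len(diag)


print(group_from_relations([[2, 0], [0, 3]]))  # ([6], 0)
print(group_from_relations([[2, 4], [6, 8]]))  # ([2, 4], 0)
print(group_from_relations([[4, 6]]))          # ([2], 1)
```

The first call reflects ℤ2 × ℤ3 ≅ ℤ6; the last one describes the group with two generators and the single relation 4g1 + 6g2 = 0, which is isomorphic to ℤ2 × ℤ.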

As a corollary, we get the fundamental theorem for finitely generated Abelian
groups as given in Theorem 19.2.2.

Theorem 19.3.8. Let {0} ≠ G = (G, +) be a finitely generated Abelian group. Then there
exist prime numbers p1 , . . . , pr , 0 ≤ r < ∞, and numbers k1 , . . . , kr ∈ ℕ, t ∈ ℕ0 such that

G ≅ ℤ_{p1^{k1}} × ⋯ × ℤ_{pr^{kr}} × ℤ × ⋯ × ℤ (with t factors ℤ),

and G is, up to isomorphism, uniquely determined by (p1^{k1}, …, pr^{kr}, t).

Proof. For the existence, we only have to show that ℤmn ≅ ℤm × ℤn if gcd(m, n) = 1.
For this, we write Un = ⟨m + mnℤ⟩ < ℤmn and Um = ⟨n + mnℤ⟩ < ℤmn. Then Un is cyclic
of order n, Um is cyclic of order m, and Un ∩ Um = {mnℤ}, because gcd(m, n) = 1.
Furthermore, there are h, k ∈ ℤ with 1 = hm + kn. Hence,
l + mnℤ = (hlm + mnℤ) + (kln + mnℤ) for all l ∈ ℤ, and therefore ℤmn = Un × Um ≅ ℤn × ℤm.
For the uniqueness statement, we may reduce the problem to the case |G| = p^k for a
prime number p and k ∈ ℕ. But here the result follows directly from Lemma 19.3.6.

From this proof, we automatically get the Chinese remainder theorem for the case
ℤn = ℤ/nℤ.
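The isomorphism ℤmn ≅ ℤm × ℤn for gcd(m, n) = 1 also allows us to pass from a decomposition into cyclic groups of prime power order to a chain k1 | k2 | ⋯ | kr as in Theorem 19.3.7: for each prime, sort its prime powers in decreasing order and multiply the largest ones, then the second largest ones, and so on. The following is a minimal sketch of this bookkeeping; the function names are ours.

```python
from collections import defaultdict


def smallest_prime_factor(q):
    """Smallest prime dividing q >= 2; for a prime power p^e this is p."""
    p = 2
    while q % p != 0:
        p += 1
    return p


def divisor_chain(prime_powers):
    """Combine prime-power orders into orders k1 | k2 | ... | kr."""
    by_prime = defaultdict(list)
    for q in prime_powers:
        by_prime[smallest_prime_factor(q)].append(q)
    columns = [sorted(qs, reverse=True) for qs in by_prime.values()]
    r = max((len(c) for c in columns), default=0)
    chain = []
    for i in range(r):
        k = 1
        for c in columns:
            if i < len(c):
                k *= c[i]  # coprime orders: Z_a x Z_b is cyclic of order a*b
        chain.append(k)
    return chain[::-1]


print(divisor_chain([2, 2, 3, 5]))     # [2, 30]
print(divisor_chain([4, 3, 5]))        # [60]
print(divisor_chain([2, 2, 3, 3, 5]))  # [6, 30]
```

For instance, ℤ2 × ℤ2 × ℤ3 × ℤ5 ≅ ℤ2 × ℤ30 with 2 | 30.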

Theorem 19.3.9 (Chinese remainder theorem). Let m_1, …, m_r ∈ ℕ with r ≥ 2 and
gcd(m_i, m_j) = 1 for i ≠ j. Define m := m_1 ⋯ m_r.
(1) π : ℤ_m → ℤ_{m_1} × ⋯ × ℤ_{m_r}, a + mℤ ↦ (a + m_1ℤ, …, a + m_rℤ), defines a ring
isomorphism.
(2) The restriction of π to the multiplicative group of the prime residue classes defines a
group isomorphism ℤ_m^⋆ → ℤ_{m_1}^⋆ × ⋯ × ℤ_{m_r}^⋆.
(3) For a_1, …, a_r ∈ ℤ, there exists modulo m exactly one x ∈ ℤ with x ≡ a_i (mod m_i) for
i = 1, …, r.

Recall that for k ∈ ℕ, a prime residue class is defined by a + kℤ with gcd(a, k) = 1.
The set of prime residue classes modulo k is certainly a multiplicative group.

Proof. By Theorem 19.3.1, we get that π is an additive group isomorphism, which can be
extended directly to a ring isomorphism via

(a + mℤ)(b + mℤ) ↦ (ab + m_1ℤ, …, ab + m_rℤ).

The remaining statements are now obvious.
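
Statement (3) of the Chinese remainder theorem can be made concrete with a short computation. The following Python sketch is only an illustration and not part of the original text; the function name crt is hypothetical. It assumes pairwise coprime moduli, solves the congruences one modulus at a time, and then checks in the small case m_1 = 3, m_2 = 5 that the map π hits every pair of residues exactly once.

```python
from math import gcd


def crt(residues, moduli):
    """Return the unique x with 0 <= x < m and x ≡ residues[i] (mod moduli[i]).

    Assumes the moduli are pairwise coprime; m is their product.
    """
    x, m = 0, 1
    for a_i, m_i in zip(residues, moduli):
        if gcd(m, m_i) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # Find t with x + m*t ≡ a_i (mod m_i); m is invertible modulo m_i.
        t = ((a_i - x) * pow(m, -1, m_i)) % m_i
        x += m * t
        m *= m_i
    return x


# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
x = crt([2, 3, 2], [3, 5, 7])
print(x, x % 3, x % 5, x % 7)

# The map a + 15ℤ ↦ (a + 3ℤ, a + 5ℤ) hits every pair exactly once.
images = {(a % 3, a % 5) for a in range(15)}
print(len(images) == 15)
```

The call pow(m, -1, m_i) computes a modular inverse and needs Python 3.8 or later.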

Let A(n) be the number of nonisomorphic finite Abelian groups that have order
n = p_1^{k_1} ⋯ p_r^{k_r}, r ≥ 1, with pairwise different primes p_1, …, p_r and k_1, …, k_r ∈ ℕ. By
Theorem 19.2.2, we have A(n) = A(p_1^{k_1}) ⋯ A(p_r^{k_r}). Hence, to calculate A(n), we have to
calculate A(p^m) for a prime number p and a natural number m ∈ ℕ. Again, by Theorem 19.2.2,
we get G ≅ ℤ_{p^{m_1}} × ⋯ × ℤ_{p^{m_k}}, all m_i ≥ 1, if G is Abelian of order p^m. If we
compare the orders, we get m = m_1 + ⋯ + m_k. We may order the m_i by size. A k-tuple
(m_1, …, m_k) with 0 < m_1 ≤ m_2 ≤ ⋯ ≤ m_k and m_1 + m_2 + ⋯ + m_k = m is called a partition
of m. From above, each Abelian group of order p^m gives a partition (m_1, …, m_k) of m for
some k with 1 ≤ k ≤ m. On the other hand, each partition (m_1, …, m_k) of m gives an
Abelian group of order p^m, namely ℤ_{p^{m_1}} × ⋯ × ℤ_{p^{m_k}}. Theorem 19.2.2 shows that different
partitions give nonisomorphic groups. If we define p(m) to be the number of partitions
of m, then we get the following: A(p^m) = p(m), and A(p_1^{k_1} ⋯ p_r^{k_r}) = p(k_1) ⋯ p(k_r).
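
The formula A(p_1^{k_1} ⋯ p_r^{k_r}) = p(k_1) ⋯ p(k_r) is easy to evaluate by machine. The following Python sketch is an illustration only; the function names partitions, prime_exponents, and abelian_group_count are hypothetical and not from the text. It counts partitions recursively and factors n by trial division, which is adequate for small n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions(m, largest=None):
    """Number of partitions of m into parts of size at most `largest`."""
    if largest is None:
        largest = m
    if m == 0:
        return 1
    # Choose the largest part first, then partition the rest with smaller parts.
    return sum(partitions(m - part, part)
               for part in range(1, min(m, largest) + 1))


def prime_exponents(n):
    """Exponents k_1, ..., k_r in n = p_1^{k_1} ... p_r^{k_r} (trial division)."""
    exponents = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            k = 0
            while n % d == 0:
                n //= d
                k += 1
            exponents.append(k)
        d += 1
    if n > 1:
        exponents.append(1)
    return exponents


def abelian_group_count(n):
    """A(n): number of Abelian groups of order n up to isomorphism."""
    count = 1
    for k in prime_exponents(n):
        count *= partitions(k)
    return count


print(abelian_group_count(16))  # 16 = 2^4, and p(4) = 5
print(abelian_group_count(72))  # 72 = 2^3 * 3^2, and p(3) * p(2) = 3 * 2 = 6
```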

19.4 Exercises
1. Let H be a finitely generated Abelian group, which is the homomorphic image of a
torsion-free Abelian group of finite rank n. Show that H is the direct sum of ≤ n
cyclic groups.
2. Determine (up to isomorphism) all groups of order p^2 (p prime) and all Abelian
groups of order ≤ 15.
3. Let G be an Abelian group with generating elements a1 , . . . , a4 and defining relations

5a1 + 4a2 + a3 + 5a4 = 0


7a1 + 6a2 + 5a3 + 11a4 = 0
2a1 + 2a2 + 10a3 + 12a4 = 0
10a1 + 8a2 − 4a3 + 4a4 = 0.

Express G as a direct product of cyclic groups.


4. Let G be a finite Abelian group and u = ∏_{g∈G} g, the product of all elements of G.
Show: If G has exactly one element a of order 2, then u = a, otherwise u = e.
Conclude from this the theorem of Wilson:

(p − 1)! ≡ −1 (mod p) for each prime p.

5. Let p be a prime and G a finite Abelian p-group; that is, the order of all elements of
G is finite and a power of p. Show that G is cyclic, if G has exactly one subgroup of
order p. Is the statement still correct if G is not Abelian?
20 Integral and Transcendental Extensions
20.1 The Ring of Algebraic Integers
Recall that a complex number α is an algebraic number if it is algebraic over the rational
numbers ℚ. That is, α is a zero of a polynomial p(x) ∈ ℚ[x]. If α ∈ ℂ is not algebraic,
then it is a transcendental number.
We will let 𝒜 denote the totality of algebraic numbers within the complex num-
bers ℂ, and 𝒯 the set of transcendentals, so that ℂ = 𝒜 ∪ 𝒯 . The set 𝒜 is the algebraic
closure of ℚ within ℂ.
The set 𝒜 of algebraic numbers forms a subfield of ℂ (see Chapter 5), and the subset
𝒜′ = 𝒜 ∩ ℝ of real algebraic numbers forms a subfield of ℝ. The field 𝒜 is an algebraic
extension of the rationals ℚ. However, the degree is infinite.
Since each rational is algebraic, it is clear that there are algebraic numbers. Furthermore,
there are irrational algebraic numbers, √2 for example, since it is a zero of
the irreducible polynomial x^2 − 2 over ℚ. In Chapter 5, we proved that there are
uncountably many transcendental numbers (Theorem 5.5.3). However, it is very
difficult to prove that any particular real or complex number is actually transcendental.
In Theorem 5.5.4, we showed that the real number

c = ∑_{j=1}^{∞} 1/10^{j!}

is transcendental.
In this section, we examine a special type of algebraic number called an algebraic
integer. These are the algebraic numbers that are zeros of monic integral polynomials.
The set of all such algebraic integers forms a subring of ℂ. The proofs in this section can
be found in [53].
After we do this, we extend the concept of an algebraic integer to a general con-
text and define integral ring extensions. We then consider field extensions that are
nonalgebraic—transcendental field extensions. Finally, we will prove that the familiar
numbers e and π are transcendental.

Definition 20.1.1. An algebraic integer is a complex number α that is a zero of a monic
integral polynomial. That is, α ∈ ℂ is an algebraic integer if there exists f(x) ∈ ℤ[x] with
f(x) = x^n + b_{n−1}x^{n−1} + ⋯ + b_0, b_i ∈ ℤ, n ≥ 1, and f(α) = 0.

An algebraic integer is clearly an algebraic number. The following are clear:

Lemma 20.1.2. If α ∈ ℂ is an algebraic integer, then all its conjugates, α_1, …, α_n, over ℚ
are also algebraic integers.

Lemma 20.1.3. α ∈ ℂ is an algebraic integer if and only if m_α ∈ ℤ[x], where m_α denotes
the minimal polynomial of α over ℚ.

https://doi.org/10.1515/9783111142524-020

To prove the converse of this lemma, we need the concept of a primitive integral
polynomial. This is a polynomial p(x) ∈ ℤ[x] such that the GCD of all its coefficients is 1.
The following can be proved (see exercises or Chapter 4):
(1) If f (x) and g(x) are primitive, then so is f (x)g(x).
(2) If f (x) ∈ ℤ[x] is monic, then it is primitive.
(3) If f (x) ∈ ℚ[x], then there exists a rational number c such that f (x) = cf1 (x) with
f1 (x) primitive.

Now suppose f(x) ∈ ℤ[x] is a monic polynomial with f(α) = 0. Let p(x) = m_α(x). Then
p(x) divides f(x), so f(x) = p(x)q(x) with q(x) ∈ ℚ[x].
Let p(x) = c_1 p_1(x) with p_1(x) primitive, and let q(x) = c_2 q_1(x) with q_1(x) primitive.
Then

f(x) = c p_1(x) q_1(x), where c = c_1 c_2.

By (1), p_1(x)q_1(x) is primitive. Since f(x) is monic, it is primitive by (2); hence c = ±1,
and replacing p_1 by −p_1 if necessary, we may take c = 1, so f(x) = p_1(x)q_1(x).
Since p_1(x) and q_1(x) are integral and their product is monic, their leading coefficients
are both 1 or both −1; replacing both by their negatives if necessary, they are both
monic. Since p(x) = c_1 p_1(x), and they are both monic, it follows that c_1 = 1. Hence,
p(x) = p_1(x). Therefore, p(x) = m_α(x) is integral.
When we speak of algebraic integers, we will refer to the ordinary integers as ratio-
nal integers. The next lemma shows the close ties between algebraic integers and ratio-
nal integers.

Lemma 20.1.4. If α is an algebraic integer and also rational, then it is a rational inte-
ger.

The following ties algebraic numbers in general to corresponding algebraic integers.
Notice that if q ∈ ℚ, then there exists a rational integer n ≠ 0 such that nq ∈ ℤ. This result
generalizes this simple idea.

Theorem 20.1.5. If θ is an algebraic number, then there exists a rational integer r ≠ 0
such that rθ is an algebraic integer.
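
One standard way to find such an r is to clear denominators so that θ is a zero of a_n x^n + ⋯ + a_0 with a_i ∈ ℤ, a_n ≠ 0, and then to multiply this equation by a_n^{n−1}; the result is a monic integral equation for a_n θ. The following Python sketch illustrates this under that assumption. The function names monic_for_scaled_root and evaluate are hypothetical and not from the text.

```python
from fractions import Fraction


def monic_for_scaled_root(coeffs):
    """Given integers coeffs = [a_0, ..., a_n] with a_n != 0, return (r, monic).

    If θ is a zero of a_n x^n + ... + a_0, then r*θ with r = a_n is a zero of
    the monic integer polynomial with coefficient list `monic` (lowest degree first).
    """
    n = len(coeffs) - 1
    r = coeffs[n]
    # a_n^(n-1) * (a_n x^n + ... + a_0) written in the variable y = a_n * x:
    # y^n + a_{n-1} y^(n-1) + a_n a_{n-2} y^(n-2) + ... + a_n^(n-1) a_0
    monic = [coeffs[i] * r ** (n - 1 - i) for i in range(n)] + [1]
    return r, monic


def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficients at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))


# θ = 1/2 is a zero of 6x^2 - 5x + 1 = (2x - 1)(3x - 1).
theta = Fraction(1, 2)
r, monic = monic_for_scaled_root([1, -5, 6])
print(r, monic)                    # r = 6 and y^2 - 5y + 6
print(evaluate(monic, r * theta))  # r*θ = 3 is a zero of the monic polynomial
```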

We saw that the set 𝒜 of all algebraic numbers is a subfield of ℂ. In the same manner,
the set ℐ of all algebraic integers forms a subring of 𝒜. First, we need the following
extension of a result on algebraic numbers.

Lemma 20.1.6. Suppose α1 , . . . , αn form the set of conjugates over ℚ of an algebraic inte-
ger α. Then any integral symmetric function of α1 , . . . , αn is a rational integer.

Theorem 20.1.7. The set ℐ of all algebraic integers forms a subring of 𝒜.

We note that 𝒜, the field of algebraic numbers, is precisely the quotient field of the
ring of algebraic integers.

An algebraic number field is a finite extension of ℚ within ℂ. Since any finite exten-
sion of ℚ is a simple extension, each algebraic number field has the form K = ℚ(θ) for
some algebraic number θ.
Let K = ℚ(θ) be an algebraic number field, and let R_K = K ∩ ℐ. Then R_K forms a
subring of K called the algebraic integers, or integers of K. An analysis of the proof of
Theorem 20.1.5 shows that each β ∈ K can be written as

β = α/r

with α ∈ R_K and r ∈ ℤ, r ≠ 0.
These rings of algebraic integers share many properties with the rational integers.
Whereas there may not be unique factorization into primes, there is always prime fac-
torization.

Theorem 20.1.8. Let K be an algebraic number field and RK its ring of integers. Then each
α ∈ RK is either 0, a unit, or can be factored into a product of primes.

We stress again that the prime factorization need not be unique. However, from the
existence of a prime factorization, we can extend Euclid’s original proof of the infinitude
of primes (see [53]) to obtain the following:

Corollary 20.1.9. There exist infinitely many primes in RK for any algebraic number
ring RK .

Just as any algebraic number field is finite-dimensional over ℚ, we will see that each
R_K is of finite degree over ℚ. That is, if K has degree n over ℚ, we show that there exist
ω_1, …, ω_n in R_K such that each α ∈ R_K is expressible as

α = m_1ω_1 + ⋯ + m_nω_n,

where m_1, …, m_n ∈ ℤ.

Definition 20.1.10. An integral basis for R_K is a set of integers ω_1, …, ω_t ∈ R_K such that
each α ∈ R_K can be expressed uniquely as

α = m_1ω_1 + ⋯ + m_tω_t,

where m_1, …, m_t ∈ ℤ.

The finite degree comes from the following result that shows there does exist an
integral basis (see [53]):

Theorem 20.1.11. Let RK be the ring of integers in the algebraic number field K of degree
n over ℚ. Then there exists at least one integral basis for RK .

20.2 Integral Ring Extensions


We now extend the concept of an algebraic integer to general ring extensions. We first
need the idea of an R-algebra, where R is a commutative ring with identity 1 ≠ 0.

Definition 20.2.1. Let R be a commutative ring with an identity 1 ≠ 0. An R-algebra or
algebra over R is a unitary R-module A, in which there is an additional multiplication
such that the following hold:
(1) A is a ring with respect to the addition and this multiplication.
(2) (rx)y = x(ry) = r(xy) for all r ∈ R and x, y ∈ A.

As examples of R-algebras, first consider R = K, where K is a field, and set A = M(n, K),
the set of all (n × n)-matrices over K. Then M(n, K) is a K-algebra. Furthermore, the set
of polynomials K[x] is also a K-algebra.
We now define ring extensions. Let A be a ring, not necessarily commutative, with
an identity 1 ≠ 0, and R be a commutative subring of A, which contains 1. Assume that
R is contained in the center of A; that is, rx = xr for all r ∈ R and x ∈ A. We then call A
a ring extension of R and write A|R. If A|R is a ring extension, then A is an R-algebra in a
natural manner.
Let A be an R-algebra with an identity 1 ≠ 0. Then we have the canonical ring
homomorphism ϕ : R → A, r ↦ r ⋅ 1. The image R′ := ϕ(R) is a subring of the center of A, and
R′ contains the identity element of A. Then A|R′ is a ring extension (in the above sense).
Hence, if A is an R-algebra with an identity 1 ≠ 0, then we may consider R as a subring of
A and A|R as a ring extension.
We now will extend to the general context of ring extensions the ideas of integral
elements and integral extensions. As above, let R be a commutative ring with an identity
1 ≠ 0, and let A be an R-algebra.

Definition 20.2.2. An element a ∈ A is said to be integral over R, or integrally dependent
over R, if there is a monic polynomial f(x) = x^n + α_{n−1}x^{n−1} + ⋯ + α_0 ∈ R[x] of degree
n ≥ 1 over R with f(a) = a^n + α_{n−1}a^{n−1} + ⋯ + α_0 = 0. That is, a is integral over R if it is a
zero of a monic polynomial of degree ≥ 1 over R.
An equation that an integral element satisfies is called an integral equation of a over R.
If A has an identity 1 ≠ 0, then we may write a^0 = 1 and f(a) = ∑_{i=0}^{n} α_i a^i with α_n = 1.

Example 20.2.3. 1. Let E|K be a field extension. Then a ∈ E is integral over K if and only if
a is algebraic over K. If K is the quotient field of an integral domain R and a ∈ E
is algebraic over K, then there exists an α ∈ R, α ≠ 0, with αa integral over R: if
0 = α_n a^n + ⋯ + α_0 with α_i ∈ R and α_n ≠ 0, then multiplication with α_n^{n−1} gives
0 = (α_n a)^n + ⋯ + α_n^{n−1}α_0.
2. The elements of ℂ that are integral over ℤ are precisely the algebraic integers,
that is, the zeros of monic polynomials over ℤ.

Theorem 20.2.4. Let R be as above and A an R-algebra with an identity 1 ≠ 0. If A is, as
an R-module, finitely generated, then each element of A is integral over R.

Proof. Let {b_1, …, b_n} be a finite generating system of A as an R-module. We may assume
that b_1 = 1; otherwise, add 1 to the system. As explained in the preliminaries, without
loss of generality, we may assume that R ⊂ A. Let a ∈ A. For each 1 ≤ j ≤ n, we have an
equation a b_j = ∑_{k=1}^{n} α_{kj} b_k for some α_{kj} ∈ R. In other words,

∑_{k=1}^{n} (α_{kj} − δ_{jk} a) b_k = 0   (⋆⋆)

for j = 1, …, n, where

δ_{jk} = 0 if j ≠ k, and δ_{jk} = 1 if j = k.

Define γ_{jk} := α_{kj} − δ_{jk} a and C = (γ_{jk})_{j,k}. C is an (n × n)-matrix over the commutative ring
R[a]. Recall that R[a] has an identity element. Let C̃ = (γ̃_{jk})_{j,k} be the complementary
matrix of C (see for instance [9]). Then C̃C = (det C)E_n. From (⋆⋆), we get

0 = ∑_{j=1}^{n} γ̃_{ij} (∑_{k=1}^{n} γ_{jk} b_k) = ∑_{k=1}^{n} ∑_{j=1}^{n} γ̃_{ij} γ_{jk} b_k = ∑_{k=1}^{n} (det C) δ_{ik} b_k = (det C) b_i

for all 1 ≤ i ≤ n. Since b_1 = 1, we have necessarily that det C = det(α_{jk} − δ_{jk} a)_{j,k} = 0
(recall that δ_{jk} = δ_{kj}). Hence, a is a zero of the monic polynomial f(x) = det(δ_{jk} x − α_{jk})
in R[x] of degree n ≥ 1. Therefore, a is integral over R.
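
The proof is constructive: the integral equation is the characteristic polynomial of the matrix (α_{kj}) that describes multiplication by a on the generators. The following Python sketch works this out for an example that is not in the text, namely R = ℤ, A = ℤ[√2, √3] with generating system (1, √2, √3, √6), and a = √2 + √3. The helper names matmul and char_poly are hypothetical, and the determinant is expanded with the Faddeev-LeVerrier recursion, which is a choice made here for brevity rather than the method of the proof.

```python
from math import sqrt


def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(x*E_n - A), by Faddeev-LeVerrier."""
    n = len(A)
    coeffs = [0] * n + [1]
    M = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for k in range(1, n + 1):
        AM = matmul(A, M)
        trace = sum(AM[i][i] for i in range(n))
        c = -trace // k  # exact: the trace is divisible by k for integer A
        coeffs[n - k] = c
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs


# Column j holds the coordinates α_{kj} of a*b_j in the generating system
# (b_1, b_2, b_3, b_4) = (1, √2, √3, √6), for a = √2 + √3:
#   a*1 = √2 + √3,  a*√2 = 2 + √6,  a*√3 = 3 + √6,  a*√6 = 3√2 + 2√3.
alpha = [
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
]

f = char_poly(alpha)
print(f)  # coefficients of x^4 - 10x^2 + 1, lowest degree first

a = sqrt(2) + sqrt(3)
print(abs(sum(c * a**i for i, c in enumerate(f))) < 1e-9)
```

The last line only checks the integral equation numerically in floating point; an exact check would require arithmetic in ℤ[√2, √3] itself.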

Definition 20.2.5. A ring extension A|R is called an integral extension if each element of
A is integral over R. A ring extension A|R is called finite if A, as an R-module, is finitely
generated.

Recall that finite field extensions are algebraic extensions. As an immediate conse-
quence of Theorem 20.2.4, we get the corresponding result for ring extensions.

Theorem 20.2.6. Each finite ring extension A|R is an integral extension.

Theorem 20.2.7. Let A be an R-algebra with an identity 1 ≠ 0. If a ∈ A, then the following
are equivalent:
(1) a is integral over R.
(2) The subalgebra R[a] is, as an R-module, finitely generated.
(3) There exists a subalgebra A′ of A, which contains a, and which is, as an R-module,
finitely generated.

A subalgebra of an algebra over R is a submodule, which is also a subring.

Proof. (1) implies (2): We have R[a] = {g(a) : g ∈ R[x]}. Let f(a) = 0 be an integral
equation of a over R. Since f is monic, by the division algorithm, for each g ∈ R[x], there
are h, r ∈ R[x] with g = h ⋅ f + r and r = 0, or r ≠ 0 and deg(r) < deg(f) =: n. In either
case, g(a) = r(a), so {1, a, …, a^{n−1}} is a generating system for the R-module R[a].
(2) implies (3): Take A′ = R[a].
(3) implies (1): Use Theorem 20.2.4 for A′.
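
The step (1) implies (2) rests on division with remainder by a monic polynomial, which works over any commutative ring because no coefficient ever has to be inverted. The following Python sketch illustrates this for R = ℤ; the function name divmod_monic and the example f = x^2 − 2, a = √2 are illustrative assumptions and not from the text.

```python
def divmod_monic(g, f):
    """Divide g by the monic polynomial f; coefficient lists, lowest degree first.

    Returns (h, r) with g = h*f + r and r of degree < deg(f). Only ring
    operations on the coefficients are used, so integer input stays integral.
    """
    if f[-1] != 1:
        raise ValueError("f must be monic")
    n = len(f) - 1
    r = list(g)
    h = [0] * max(len(g) - n, 1)
    for shift in range(len(r) - 1 - n, -1, -1):
        q = r[shift + n]          # current leading coefficient to eliminate
        h[shift] = q
        for i, c in enumerate(f):
            r[shift + i] -= q * c
    return h, r[:n]


# g = x^5, f = x^2 - 2: the remainder 4x shows that (√2)^5 = 4√2,
# an R-linear combination of 1 and a = √2.
h, r = divmod_monic([0, 0, 0, 0, 0, 1], [-2, 0, 1])
print(h, r)
```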
For the remainder of this chapter, all rings are commutative with an identity 1 ≠ 0.

Theorem 20.2.8. Let A|R and B|A be finite ring extensions. Then also B|R is finite.

Proof. From A = Re1 + ⋅ ⋅ ⋅ + Rem , and B = Af1 + ⋅ ⋅ ⋅ + Afn , we get B = Re1 f1 + ⋅ ⋅ ⋅ + Rem fn .

Theorem 20.2.9. Let A|R be a ring extension. Then the following are equivalent:
(1) There are finitely many elements a_1, …, a_m in A, each integral over R, such that

A = R[a_1, …, a_m].

(2) A|R is finite.

Proof. (2) ⇒ (1): We only need to take for a_1, …, a_m a generating system of A as an
R-module, and the result holds, because A = Ra_1 + ⋯ + Ra_m, and each a_i is integral
over R by Theorem 20.2.4.
(1) ⇒ (2): We use induction on m. If m = 0, then there is nothing to prove. Now let
m ≥ 1, and assume that (1) holds. Define A′ = R[a_1, …, a_{m−1}]. Then A = A′[a_m], and a_m
is integral over A′. A|A′ is finite by Theorem 20.2.7. By the induction assumption, A′|R is
finite. Then A|R is finite by Theorem 20.2.8.

Definition 20.2.10. Let A|R be a ring extension. Then the subset

C = {a ∈ A : a is integral over R} ⊂ A

is called the integral closure of R in A.

Theorem 20.2.11. Let A|R be a ring extension. Then the integral closure C of R in A is a
subring of A with R ⊂ C.

Proof. R ⊂ C, because α ∈ R is a zero of the polynomial x − α. Let a, b ∈ C. We consider
the subalgebra R[a, b] of the R-algebra A. R[a, b]|R is finite by Theorem 20.2.9. Hence,
by Theorem 20.2.4, all elements from R[a, b] are integral over R; that is, R[a, b] ⊂ C. In
particular, a + b, a − b, and ab are in C.
We extend to ring extensions the idea of a closure:

Definition 20.2.12. Let A|R be a ring extension. R is called integrally closed in A, if R itself
is its integral closure in A; that is, R = C, the integral closure of R in A.

Theorem 20.2.13. For each ring extension A|R, the integral closure C of R in A, is inte-
grally closed in A.

Proof. Let a ∈ A be integral over C. Then a^n + α_{n−1}a^{n−1} + ⋯ + α_0 = 0 for some α_i ∈ C,
n ≥ 1. Then a is also integral over the R-subalgebra A′ = R[α_0, …, α_{n−1}] of C, and A′|R
is finite. Furthermore, A′[a]|A′ is finite. Hence, A′[a]|R is finite. By Theorem 20.2.4, then
a ∈ A′[a] is already integral over R, that is, a ∈ C.

Theorem 20.2.14. Let A|R and B|A be ring extensions. If A|R and B|A are integral exten-
sions, then also B|R is an integral extension (and certainly vice versa).

Proof. Let C be the integral closure of R in B. We have A ⊂ C, since A|R is integral.
Together with B|A, we also have that B|C is integral. By Theorem 20.2.13, we get that C is
integrally closed in B. Hence, B = C.

We now consider integrally closed integral domains.

Definition 20.2.15. An integral domain R is called integrally closed if R is integrally
closed in its quotient field K.

Theorem 20.2.16. Each unique factorization domain R is integrally closed.

Proof. Let α ∈ K and α = a/b with a, b ∈ R, a ≠ 0, b ≠ 0. Since R is a unique factorization
domain, we may assume that a and b are relatively prime. Let α be integral over R. Then we
have over R an integral equation α^n + a_{n−1}α^{n−1} + ⋯ + a_0 = 0 for α. Multiplication with
b^n gives a^n + b a_{n−1}a^{n−1} + ⋯ + b^n a_0 = 0. Hence, b is a divisor of a^n. Since a and b are
relatively prime in R, we have that b is a unit in R. Hence, α = a/b ∈ R.
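
For R = ℤ, the argument above is the familiar rational root test: a rational zero a/b in lowest terms of an integral polynomial must have b dividing the leading coefficient, so for a monic polynomial every rational zero is an integer. The following Python sketch illustrates this by brute force over the finitely many candidates. The function names divisors and rational_roots and both example polynomials are assumptions made for illustration.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational zeros of the integer polynomial [a_0, ..., a_n], a_0 != 0 != a_n."""
    a0, an = coeffs[0], coeffs[-1]
    # A zero p/q in lowest terms has p dividing a_0 and q dividing a_n.
    candidates = {Fraction(sign * p, q)
                  for p in divisors(a0) for q in divisors(an) for sign in (1, -1)}
    return sorted(x for x in candidates
                  if sum(c * x**i for i, c in enumerate(coeffs)) == 0)


monic_roots = rational_roots([6, -7, 0, 1])   # x^3 - 7x + 6, monic
print(monic_roots, all(x.denominator == 1 for x in monic_roots))

nonmonic_roots = rational_roots([1, -3, 2])   # 2x^2 - 3x + 1, not monic
print(nonmonic_roots)                         # contains the non-integer zero 1/2
```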

Theorem 20.2.17. Let R be an integral domain and K its quotient field. Let E|K be a finite
field extension. Let R be integrally closed and α ∈ E integral over R. Then the minimal
polynomial g ∈ K[x] of α over K has only coefficients of R.

Proof. Let g ∈ K[x] be the minimal polynomial of α over K (recall that g is monic by
definition). Let Ē be an algebraic closure of E. Then g(x) = (x − α_1) ⋯ (x − α_n) with α_1 = α
over Ē. There are K-isomorphisms σ_i : K(α) → Ē with σ_i(α) = α_i. Hence, all α_i are also
integral over R. Since all coefficients of g are polynomial expressions C_j(α_1, …, α_n) in
the α_i, we get that all coefficients of g are integral over R (see Theorem 20.2.11). Now
g ∈ R[x], because g ∈ K[x], and R is integrally closed.

Theorem 20.2.18. Let R be an integrally closed integral domain and K its quotient field.
Let f , g, h ∈ K[x] be monic polynomials over K with f = gh.
If f ∈ R[x], then also g, h ∈ R[x].

Proof. Let E be the splitting field of f over K. Over E, we have f(x) = (x − α_1) ⋯ (x − α_n).
Since f is monic, all α_k are integral over R (see the proof of Theorem 20.2.17). Since
f = gh, there are I, J ⊂ {1, …, n} with g(x) = ∏_{i∈I} (x − α_i) and h(x) = ∏_{j∈J} (x − α_j). As
polynomial expressions in the α_i, i ∈ I, and α_j, j ∈ J, respectively, the coefficients of g
and h, respectively, are integral over R. On the other hand, all these coefficients are in K,
and R is integrally closed. Hence, g, h ∈ R[x].

Theorem 20.2.19. Let E|R be an integral ring extension. If E is a field, then also R is a field.

Proof. Let α ∈ R \ {0}. The element 1/α ∈ E satisfies an integral equation

(1/α)^n + a_{n−1}(1/α)^{n−1} + ⋯ + a_0 = 0

over R. Multiplication with α^{n−1} gives

1/α = −a_{n−1} − a_{n−2}α − ⋯ − a_0 α^{n−1} ∈ R.

Hence, R is a field.

20.3 Transcendental Field Extensions


Recall that a transcendental number is an element of ℂ that is not algebraic over ℚ.
More generally, if E|K is a field extension, then an element α ∈ E is transcendental over
K if it is not algebraic; that is, it is not a zero of any polynomial f (x) ∈ K[x]. Since fi-
nite extensions are algebraic, clearly E|K will contain transcendental elements only if
[E : K] = ∞. However, this is not sufficient. The field 𝒜 of algebraic numbers is algebraic
over ℚ, but infinite dimensional over ℚ. We now extend the idea of a transcendental
number to that of a transcendental extension.
Let K ⊂ E be fields; that is, E|K is a field extension. Let M be a subset of E. The
algebraic cover of M in E is defined to be the algebraic closure H(M) of K(M) in E; that
is, H_{K,E}(M) = H(M) = {α ∈ E : α algebraic over K(M)}.
H(M) is a field with K ⊂ K(M) ⊂ H(M) ⊂ E. α ∈ E is called algebraically dependent
on M (over K) if α ∈ H(M); that is, if α is algebraic over K(M).
The following are clear:
1. M ⊂ H(M),
2. M ⊂ M ′ implies H(M) ⊂ H(M ′ ), and
3. H(H(M)) = H(M).

Definition 20.3.1. (a) M is said to be algebraically independent (over K) if α ∉ H(M \ {α})
for all α ∈ M; that is, if each α ∈ M is transcendental over K(M \ {α}).
(b) M is said to be algebraically dependent (over K) if M is not algebraically independent.

The proofs of the statements in the following lemma are straightforward:

Lemma 20.3.2. (1) M is algebraically dependent if and only if there exists an α ∈ M,
which is algebraic over K(M \ {α}).
(2) Let α ∈ M. Then α ∈ H(M \ {α}) ⇔ H(M) = H(M \ {α}).
(3) If α ∉ M and α is algebraic over K(M), then M ∪ {α} is algebraically dependent.

(4) M is algebraically dependent if and only if there is a finite subset in M, which is alge-
braically dependent.
(5) M is algebraically independent if and only if each finite subset of M is algebraically
independent.
(6) M is algebraically independent if and only if the following holds: If α_1, …, α_n are
finitely many, pairwise different elements of M, then the canonical homomorphism
ϕ : K[x_1, …, x_n] → E, f(x_1, …, x_n) ↦ f(α_1, …, α_n) is injective; or in other words,
for all f ∈ K[x_1, …, x_n], we have that f = 0 if f(α_1, …, α_n) = 0. That is, there is no
nontrivial algebraic relation between the α_1, …, α_n over K.
(7) Let M ⊂ E, α ∈ E. If M is algebraically independent and M ∪ {α} algebraically depen-
dent, then α ∈ H(M); that is, α is algebraically dependent on M.
(8) Let M ⊂ E, B ⊂ M. If B is a maximal algebraically independent subset of M, that is,
if B is algebraically independent and B ∪ {α} is algebraically dependent for each
α ∈ M \ B, then M ⊂ H(B). That is, each element of M is algebraic over K(B).

We will show that any field extension can be decomposed into an algebraic
extension over a purely transcendental extension. We need the idea of a transcendence basis.

Definition 20.3.3. B ⊂ E is called a transcendence basis of the field extension E|K if the
following two conditions are satisfied:
1. E = H(B), that is, the extension E|K(B) is algebraic.
2. B is algebraically independent over K.

Theorem 20.3.4. If B ⊂ E, then the following are equivalent:
(1) B is a transcendence basis of E|K.
(2) If B ⊂ M ⊂ E with H(M) = E, then B is a maximal algebraically independent subset
of M.
(3) There exists a subset M ⊂ E with H(M) = E, which contains B as a maximal alge-
braically independent subset.

Proof. (1) implies (2): Let α ∈ M \ B. We have to show that B ∪ {α} is algebraically depen-
dent. But this is clear, because α ∈ H(B) = E.
(2) implies (3): We just take M = E.
(3) implies (1): We have to show that H(B) = E. Certainly, M ⊂ H(B).
Hence, E = H(M) ⊂ H(H(B)) = H(B) ⊂ E.

We next show that any field extension does have a transcendence basis:

Theorem 20.3.5. Each field extension E|K has a transcendence basis. More concretely, if
there is a subset M ⊂ E such that E|K(M) is algebraic and if there is a subset C ⊂ M,
which is algebraically independent, then there exists a transcendence basis B of E|K with
C ⊂ B ⊂ M.

Proof. We have to extend C to a maximal algebraically independent subset B of M. By
Theorem 20.3.4, such a B is a transcendence basis of E|K. If M is finite, then such a B
certainly exists. Now let M be not finite. We argue analogously as for the existence of a
basis of a vector space, for instance, with Zorn’s lemma: If a partially ordered, nonempty
set S is inductive, then there exist maximal elements in S. Here, a partially ordered,
nonempty set S is said to be inductive if every totally ordered subset of S has an upper
bound in S. The set N of all algebraically independent subsets of M, which contain C,
is partially ordered with respect to “⊂”, and N ≠ ∅, because C ∈ N. Let 𝒦 ≠ ∅ be an
ascending chain in N, for instance ∅ ≠ Y_1 ⊂ Y_2 ⊂ ⋯ in N. The union U = ⋃_{Y∈𝒦} Y is also
algebraically independent by Lemma 20.3.2 (5), since each finite subset of U lies in some
Y ∈ 𝒦. Thus, U ∈ N is an upper bound of 𝒦, and N is inductive. Hence, there exists a maximal
algebraically independent subset B ⊂ M with C ⊂ B.

Theorem 20.3.6. Let E|K be a field extension and M a subset of E, for which E|K(M) is
algebraic. Let C be an arbitrary subset of E, which is algebraically independent over K. Then
there exists a subset M′ ⊂ M with C ∩ M′ = ∅ such that C ∪ M′ is a transcendence basis
of E|K.

Proof. Apply Theorem 20.3.5 to M ∪ C in place of M, and define M′ := B \ C.

Theorem 20.3.7. Let B, B′ be two transcendence bases of the field extension E|K. Then
there is a bijection ϕ : B → B′ . In other words, any two transcendence bases of E|K have
the same cardinal number.

Proof. (a) If B is a transcendence basis of E|K and M is a subset of E such that E|K(M)
is algebraic, then we may write B = ⋃_{α∈M} B_α with finite sets B_α. In particular, if B is
infinite, then the cardinal number of B is not bigger than the cardinal number of M.
(b) Let B and B′ be two transcendence bases of E|K. If B and B′ are both infinite,
then B and B′ have the same cardinal number by (a) and the theorem by Schroeder–Bernstein [10].
We now prove Theorem 20.3.7 for the case that E|K has a finite transcendence
basis. Let B be finite with n elements. Let C be an arbitrary algebraically independent
subset in E over K with m elements. We show that m ≤ n. Suppose C = {α_1, …, α_m}
with m ≥ n. We show, by induction, that for each integer k, 0 ≤ k ≤ n, there are subsets
B ⫌ B_1 ⫌ ⋯ ⫌ B_k of B such that {α_1, …, α_k} ∪ B_k is a transcendence basis of E|K, and
{α_1, …, α_k} ∩ B_k = ∅. For k = 0, we take B_0 = B, and the statement holds. Assume now
that the statement is correct for 0 ≤ k < n. By Theorems 20.3.4 and 20.3.5, there is a
subset B_{k+1} of {α_1, …, α_k} ∪ B_k such that {α_1, …, α_{k+1}} ∪ B_{k+1} is a transcendence basis of
E|K, and {α_1, …, α_{k+1}} ∩ B_{k+1} = ∅. Then necessarily, B_{k+1} ⊂ B_k. Assume B_k = B_{k+1}. Then
on the one hand, B_k ∪ {α_1, …, α_{k+1}} is algebraically independent because B_k = B_{k+1}. On the
other hand, B_k ∪ {α_1, …, α_k} ∪ {α_{k+1}} is algebraically dependent, because B_k ∪ {α_1, …, α_k}
is already a transcendence basis, which gives a contradiction. Hence, B_{k+1} ⫋ B_k. Now B_k has
at most n − k elements. Therefore, B_n = ∅; that is, {α_1, …, α_n} = {α_1, …, α_n} ∪ B_n is a
transcendence basis of E|K. Because C = {α_1, …, α_m} is algebraically independent, we cannot
have m > n. Thus, m ≤ n, and B and B′ have the same number of elements, because B′ must
also be finite.

Since the cardinality of any transcendence basis for a field extension E|K is the
same, we can define the transcendence degree.

Definition 20.3.8. The transcendence degree trgd(E|K) of a field extension is the cardi-
nal number of one (and hence of each) transcendence basis of E|K. A field extension E|K
is called purely transcendental, if E|K has a transcendence basis B with E = K(B).

We note the following facts:

(1) If E|K is purely transcendental and B = {α_1, …, α_n} is a transcendence basis of E|K,
then E is K-isomorphic to the quotient field of the polynomial ring K[x_1, …, x_n] of
the independent indeterminates x_1, …, x_n.
(2) K is algebraically closed in E if E|K is purely transcendental.
(3) By Theorem 20.3.4, the field extension E|K has an intermediate field F, K ⊂ F ⊂
E, such that F|K is purely transcendental, and E|F is algebraic. Certainly F is not
uniquely determined.
For example, take ℚ ⊂ F ⊂ ℚ(i, π), and for F, we may take F = ℚ(π), and also
F = ℚ(iπ), for instance.
(4) trgd(ℝ|ℚ) = trgd(ℂ|ℚ) = card ℝ, the cardinal number of ℝ. This holds, because the
set of the algebraic numbers (over ℚ) is countable.

Theorem 20.3.9. Let E|K be a field extension and F an arbitrary intermediate field, that
is, K ⊂ F ⊂ E. Let B be a transcendence basis of F|K and B′ a transcendence basis of E|F.
Then B ∩ B′ = ∅, and B ∪ B′ is a transcendence basis of E|K.
In particular, trgd(E|K) = trgd(E|F) + trgd(F|K).

Proof. (1) Assume α ∈ B ∩ B′. As an element of F, α is algebraic over F(B′ \ {α}). But
this gives a contradiction, because α ∈ B′, and B′ is algebraically independent over F.
(2) F|K(B) is an algebraic extension, and hence so is F(B′)|K(B ∪ B′), where
K(B ∪ B′) = K(B)(B′). Since E|F(B′) is algebraic and the relation “algebraic extension” is
transitive, we have that E|K(B ∪ B′) is algebraic.
(3) Finally, we have to show that B ∪ B′ is algebraically independent over K. By
Theorems 20.3.5 and 20.3.6, there is a subset B′′ of B ∪ B′ with B ∩ B′′ = ∅ such that B ∪ B′′ is
a transcendence basis of E|K. We have B′′ ⊂ B′, and have to show that B′ ⊂ B′′. Assume
that there is an α ∈ B′ with α ∉ B′′. Then α is algebraic over K(B ∪ B′′) = K(B)(B′′), and
hence algebraic over F(B′′). Since B′′ ⊂ B′ \ {α} and B′ is algebraically independent
over F, α is transcendental over F(B′′), which gives a contradiction. Hence, B′′ = B′.

Theorem 20.3.10 (Noether’s normalization theorem). Let K be a field and A = K[a_1, …, a_n].
Then there exist elements u_1, …, u_m, 0 ≤ m ≤ n, in A with the following properties:
(1) K[u_1, …, u_m] is K-isomorphic to the polynomial ring K[x_1, …, x_m] of the independent
indeterminates x_1, …, x_m.
(2) The ring extension A|K[u_1, …, u_m] is an integral extension, that is, for each
a ∈ A \ K[u_1, …, u_m], there exists a monic polynomial

f(x) = x^n + α_{n−1}x^{n−1} + ⋯ + α_0 ∈ K[u_1, …, u_m][x]

of degree n ≥ 1 with

f(a) = a^n + α_{n−1}a^{n−1} + ⋯ + α_0 = 0.

In particular, A|K[u_1, …, u_m] is finite.

Proof. Without loss of generality, let the a1 , . . . , an be pairwise different. We prove the
theorem by induction on n. If n = 1, then there is nothing to show. Now, let n ≥ 2,
and assume that the statement holds for n − 1. If there is no nontrivial algebraic re-
lation f (a1 , . . . , an ) = 0 over K between the a1 , . . . , an , then there is nothing to show.
Hence, let there exist a polynomial $f \in K[x_1, \ldots, x_n]$ with $f \neq 0$ and $f(a_1, \ldots, a_n) = 0$. Let

$$f = \sum_{\nu = (\nu_1, \ldots, \nu_n)} c_\nu x_1^{\nu_1} \cdots x_n^{\nu_n}.$$

Let $\mu_2, \mu_3, \ldots, \mu_n$ be natural numbers, which we specify later. Define $b_2 = a_2 - a_1^{\mu_2}$, $b_3 = a_3 - a_1^{\mu_3}$, ..., $b_n = a_n - a_1^{\mu_n}$. Then $a_i = b_i + a_1^{\mu_i}$ for $2 \le i \le n$; hence, $f(a_1, b_2 + a_1^{\mu_2}, \ldots, b_n + a_1^{\mu_n}) = 0$. We write $R := K[x_1, \ldots, x_n]$ and consider the polynomial ring $R[y_2, \ldots, y_n]$ of the $n - 1$ independent indeterminates $y_2, \ldots, y_n$ over $R$. In $R[y_2, \ldots, y_n]$, we consider the polynomial $f(x_1, y_2 + x_1^{\mu_2}, \ldots, y_n + x_1^{\mu_n})$. We may rewrite this polynomial as

$$\sum_{\nu = (\nu_1, \ldots, \nu_n)} c_\nu x_1^{\nu_1 + \mu_2 \nu_2 + \cdots + \mu_n \nu_n} + g(x_1, y_2, \ldots, y_n)$$

with a polynomial $g(x_1, y_2, \ldots, y_n)$, for which, as a polynomial in $x_1$ over $K[y_2, \ldots, y_n]$, the degree in $x_1$ is smaller than the degree of $\sum_{\nu = (\nu_1, \ldots, \nu_n)} c_\nu x_1^{\nu_1 + \mu_2 \nu_2 + \cdots + \mu_n \nu_n}$, provided that we may choose the $\mu_2, \ldots, \mu_n$ in such a way that this really holds. We now specify the $\mu_2, \ldots, \mu_n$. We write $\mu := (1, \mu_2, \ldots, \mu_n)$, and define the scalar product $\mu\nu = 1 \cdot \nu_1 + \mu_2 \nu_2 + \cdots + \mu_n \nu_n$. Choose $p \in \mathbb{N}$ with $p > \deg(f) = \max\{\nu_1 + \cdots + \nu_n : c_\nu \neq 0\}$. We now take $\mu = (1, p, p^2, \ldots, p^{n-1})$. If $\nu = (\nu_1, \ldots, \nu_n)$ with $c_\nu \neq 0$ and $\nu' = (\nu_1', \ldots, \nu_n')$ with $c_{\nu'} \neq 0$ are different $n$-tuples, then indeed $\mu\nu \neq \mu\nu'$ because $\nu_i, \nu_i' < p$ for all $i$, $1 \le i \le n$. This follows from the uniqueness of the $p$-adic expression of a natural number. Hence, we may choose $\mu_2, \ldots, \mu_n$ such that $f(x_1, y_2 + x_1^{\mu_2}, \ldots, y_n + x_1^{\mu_n}) = c x_1^N + h(x_1, y_2, \ldots, y_n)$ with $c \in K$, $c \neq 0$, and $h \in K[y_2, \ldots, y_n][x_1]$ has in $x_1$ a degree $< N$. If we divide by $c$ and take $a_1, b_2, \ldots, b_n$ for $x_1, y_2, \ldots, y_n$, then we get an integral equation of $a_1$ over $K[b_2, \ldots, b_n]$. Therefore, the ring extension $A = K[a_1, \ldots, a_n] | K[b_2, \ldots, b_n]$ is integral (see Theorem 20.2.9), since $a_i = b_i + a_1^{\mu_i}$ for $2 \le i \le n$. By induction, there exist elements $u_1, \ldots, u_m$ in $K[b_2, \ldots, b_n]$ with the following
properties:
1. K[u1 , . . . , um ] is a polynomial ring of the m independent indeterminates u1 , . . . , um ,
and
2. K[b2 , . . . , bn ]|K[u1 , . . . , um ] is integral.

Hence, also A|K[u1 , . . . , um ] is integral by Theorem 20.2.14.
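The key combinatorial step of the proof, the choice μ = (1, p, ..., p^(n−1)), can be checked on a small case. The following is a minimal sketch; the exponent tuples belong to a hypothetical polynomial f = x1·x2 + x2²·x3 + x1³ chosen only for illustration, and the helper names are not from the text.

```python
from itertools import product

def weights(p, n):
    """Return mu = (1, p, p**2, ..., p**(n-1))."""
    return [p ** i for i in range(n)]

def weighted_degree(mu, nu):
    """Scalar product mu*nu = nu_1 + mu_2*nu_2 + ... + mu_n*nu_n."""
    return sum(m * v for m, v in zip(mu, nu))

# Exponent tuples nu with c_nu != 0 for the illustrative polynomial
# f = x1*x2 + x2**2*x3 + x1**3 in K[x1, x2, x3].
exponents = [(1, 1, 0), (0, 2, 1), (3, 0, 0)]

total_degree = max(sum(nu) for nu in exponents)  # deg(f)
p = total_degree + 1                             # any p > deg(f) works
mu = weights(p, 3)

# x1-degrees of the monomials after substituting x_i -> y_i + x1**mu_i
degrees = [weighted_degree(mu, nu) for nu in exponents]
N = max(degrees)  # degree of the leading term c * x1**N

# Uniqueness of base-p digits: all tuples with entries < p get distinct weights.
all_weights = {weighted_degree(mu, nu) for nu in product(range(p), repeat=3)}
```

Because the weighted degrees are pairwise distinct, exactly one monomial of f contributes the top power of x1, which is why the leading coefficient c lies in K.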



Corollary 20.3.11. Let E|K be a field extension. If E = K[a1 , . . . , an ] for a1 , . . . , an ∈ E, then
E|K is algebraic.

Proof. By Theorem 20.3.10, we have that $E$ contains a polynomial ring $K[u_1, \ldots, u_m]$, $0 \le m \le n$, of the $m$ independent indeterminates $u_1, \ldots, u_m$ as a subring, for which $E | K[u_1, \ldots, u_m]$ is integral. We claim that then already $K[u_1, \ldots, u_m]$ is a field. To prove that, let $a \in K[u_1, \ldots, u_m]$, $a \neq 0$. The element $a^{-1} \in E$ satisfies an integral equation $(a^{-1})^n + \alpha_{n-1} (a^{-1})^{n-1} + \cdots + \alpha_0 = 0$ over $K[u_1, \ldots, u_m] =: R$. Multiplying by $a^{n-1}$, we get

$$a^{-1} = -\alpha_{n-1} - \alpha_{n-2} a - \cdots - \alpha_0 a^{n-1} \in R.$$

Therefore, R is a field, which proves the claim. This is possible only for m = 0, because for
m ≥ 1 the indeterminate u1 has no inverse in the polynomial ring K[u1 , . . . , um ]. Then
E|K is integral, which here means algebraic.

20.4 The Transcendence of e and π


Although we have shown that within ℂ, there are uncountably many transcendental
numbers, we have only shown that one particular number is transcendental. In this
section, we prove that the numbers e and π are transcendental. We start with e.

Theorem 20.4.1. e is a transcendental number, that is, transcendental over ℚ.

Proof. Let f (x) ∈ ℝ[x] with the degree of f (x) = m ≥ 1.


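Let $z_1 \in \mathbb{C}$, $z_1 \neq 0$, and $\gamma : [0, 1] \to \mathbb{C}$, $\gamma(t) = t z_1$. Let

$$I(z_1) = \int_\gamma e^{z_1 - z} f(z)\,dz = \Big(\int_0^{z_1}\Big)_\gamma e^{z_1 - z} f(z)\,dz.$$

By $\big(\int_0^{z_1}\big)_\gamma$, we mean the integral from $0$ to $z_1$ along $\gamma$. Recall that

$$\Big(\int_0^{z_1}\Big)_\gamma e^{z_1 - z} f(z)\,dz = -f(z_1) + e^{z_1} f(0) + \Big(\int_0^{z_1}\Big)_\gamma e^{z_1 - z} f'(z)\,dz.$$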

It follows then by repeated partial integration that

(1) $I(z_1) = e^{z_1} \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z_1)$.

Let $|f|(x)$ be the polynomial we get if we replace the coefficients of $f(x)$ by their absolute values. Since $|e^{z_1 - z}| \le e^{|z_1 - z|} \le e^{|z_1|}$ for $z$ on $\gamma$, we get

(2) $|I(z_1)| \le |z_1| e^{|z_1|} |f|(|z_1|)$.
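Identity (1) and the bound (2) can be sanity-checked numerically. The sketch below assumes a real endpoint z1 = 1 and the illustrative polynomial f(x) = x², and approximates the integral with a composite Simpson rule; these choices and all helper names are assumptions made for the example.

```python
import math

def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + ... at x by Horner's rule (coeffs = [c0, c1, ...])."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def sum_of_derivatives(coeffs, x):
    """Return f(x) + f'(x) + ... + f^(m)(x)."""
    total = 0.0
    while coeffs:
        total += poly_eval(coeffs, x)
        coeffs = derivative(coeffs)
    return total

def integral_I(coeffs, z1, steps=2000):
    """Simpson approximation of the integral of e^(z1 - z) f(z) over [0, z1]."""
    h = z1 / steps
    g = lambda z: math.exp(z1 - z) * poly_eval(coeffs, z)
    s = g(0.0) + g(z1)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

f = [0, 0, 1]  # f(x) = x^2
z1 = 1.0
lhs = integral_I(f, z1)
rhs = math.exp(z1) * sum_of_derivatives(f, 0.0) - sum_of_derivatives(f, z1)
bound = abs(z1) * math.exp(abs(z1)) * poly_eval([abs(c) for c in f], abs(z1))
```

For this choice the right-hand side of (1) equals 2e − 5, and the numerical integral agrees with it to high accuracy while staying below the bound from (2).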

Now assume that e is an algebraic number; that is,

(3) $q_0 + q_1 e + \cdots + q_n e^n = 0$ for $n \ge 1$ and integers $q_0 \neq 0, q_1, \ldots, q_n$, and the greatest common divisor of $q_0, q_1, \ldots, q_n$ is equal to 1.

For a detailed proof of these facts see for instance [52]. We consider now the polynomial $f(x) = x^{p-1} (x - 1)^p \cdots (x - n)^p$ with $p$ a sufficiently large prime number, and we consider $I(z_1)$ with respect to this polynomial. Let $J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n)$.
From (1) and (3), we get that

$$J = -\sum_{j=0}^{m} \sum_{k=0}^{n} q_k f^{(j)}(k),$$

where $m = (n + 1)p - 1$, since $(q_0 + q_1 e + \cdots + q_n e^n)\big(\sum_{j=0}^{m} f^{(j)}(0)\big) = 0$.

Now, $f^{(j)}(k) = 0$ if $j < p$, $k > 0$, and also if $j < p - 1$, $k = 0$. Hence, $f^{(j)}(k)$ is an integer that is divisible by $p!$ for all $j, k$, except for $j = p - 1$, $k = 0$. Furthermore, $f^{(p-1)}(0) = (p - 1)!\,(-1)^{np} (n!)^p$. Hence, if $p > n$, then $f^{(p-1)}(0)$ is an integer divisible by $(p - 1)!$, but not by $p!$. It follows that $J$ is a nonzero integer that is divisible by $(p - 1)!$ if $p > |q_0|$ and $p > n$. So let $p > n$, $p > |q_0|$, so that $|J| \ge (p - 1)!$. Now, $|f|(k) \le (2n)^m$ for $0 \le k \le n$. Together with (2), we then get that $|J| \le |q_1| e |f|(1) + \cdots + |q_n| n e^n |f|(n) \le c^p$ for a number $c$ independent of $p$. It follows that

$$(p - 1)! \le |J| \le c^p;$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le c\,\frac{c^{p-1}}{(p - 1)!}.$$

This gives a contradiction, since $\frac{c^{p-1}}{(p - 1)!} \to 0$ as $p \to \infty$.
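The divisibility claims about the derivatives of f(x) = x^(p−1)(x − 1)^p ⋯ (x − n)^p can be verified with exact integer arithmetic for small parameters. The sketch below uses n = 2 and p = 5, values chosen only to keep the computation small; the helper functions are illustrative and not part of the text.

```python
from math import factorial

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def build_f(n, p):
    """Coefficients of f(x) = x^(p-1) (x-1)^p ... (x-n)^p."""
    coeffs = [0] * (p - 1) + [1]
    for k in range(1, n + 1):
        for _ in range(p):
            coeffs = poly_mul(coeffs, [-k, 1])
    return coeffs

def derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Exact evaluation by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative_values(coeffs, x):
    """Return [f(x), f'(x), ..., f^(m)(x)] as exact integers."""
    values = []
    while coeffs:
        values.append(evaluate(coeffs, x))
        coeffs = derivative(coeffs)
    return values

n, p = 2, 5  # small illustrative values; p is a prime with p > n
f = build_f(n, p)
at_zero = derivative_values(f, 0)
special = at_zero[p - 1]  # the one value not divisible by p!
```

Here the degree is m = (n + 1)p − 1 = 14, the value f^(p−1)(0) equals (p − 1)!(−1)^(np)(n!)^p, and every other f^(j)(k) with 0 ≤ k ≤ n is a multiple of p!.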

We now move on to the transcendence of π. We first need the following lemma:

Lemma 20.4.2. Suppose $\alpha \in \mathbb{C}$ is an algebraic number and $f(x) = a_n x^n + \cdots + a_0$, $n \ge 1$, $a_n \neq 0$, and all $a_i \in \mathbb{Z}$ ($f(x) \in \mathbb{Z}[x]$) with $f(\alpha) = 0$. Then $a_n \alpha$ is an algebraic integer.

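Proof.

$$\begin{aligned}
a_n^{n-1} f(x) &= a_n^n x^n + a_n^{n-1} a_{n-1} x^{n-1} + \cdots + a_n^{n-1} a_0 \\
&= (a_n x)^n + a_{n-1} (a_n x)^{n-1} + \cdots + a_n^{n-1} a_0 \\
&= g(a_n x) = g(y) \in \mathbb{Z}[y],
\end{aligned}$$

where $y = a_n x$, and $g(y)$ is monic. Then $g(a_n \alpha) = a_n^{n-1} f(\alpha) = 0$; hence, $a_n \alpha$ is an algebraic integer.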
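The construction of g in the proof is purely mechanical and can be written down directly. The following sketch computes the coefficients of g from those of f; the function name and the sample polynomials are illustrative assumptions.

```python
def monic_integer_polynomial(coeffs):
    """Given f = a_0 + a_1 x + ... + a_n x^n as [a_0, ..., a_n] with a_n != 0,
    return the monic g with g(a_n x) = a_n^(n-1) f(x), as [g_0, ..., g_n]."""
    n = len(coeffs) - 1
    a_n = coeffs[n]
    # Coefficient of y^i in g is a_i * a_n^(n-1-i) for i < n, and 1 for i = n.
    return [coeffs[i] * a_n ** (n - 1 - i) for i in range(n)] + [1]

# f(x) = 2x^2 - 1 has the zero alpha = 1/sqrt(2); then a_n * alpha = sqrt(2).
g = monic_integer_polynomial([-1, 0, 2])

# f(x) = 3x^3 + x + 5 gives g(y) = y^3 + 3y + 45.
g2 = monic_integer_polynomial([5, 1, 0, 3])
```

In the first example g(y) = y² − 2, whose zero √2 is indeed an algebraic integer.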

Theorem 20.4.3. π is a transcendental number, that is, transcendental over ℚ.

Proof. Assume that π is an algebraic number. Then θ = iπ is also algebraic. Consider the
conjugates θ1 = θ, θ2 , . . . , θd of θ. Suppose

$$p(x) = q_0 + q_1 x + \cdots + q_d x^d \in \mathbb{Z}[x], \quad q_d > 0, \quad \gcd(q_0, \ldots, q_d) = 1$$

is the entire minimal polynomial of $\theta$ over $\mathbb{Q}$. Then $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ are the zeros of this polynomial. Let $t = q_d$. Then from Lemma 20.4.2, $t\theta_i$ is an algebraic integer for all $i$. From $e^{i\pi} + 1 = 0$, and from $\theta_1 = i\pi$, we get that

$$(1 + e^{\theta_1})(1 + e^{\theta_2}) \cdots (1 + e^{\theta_d}) = 0.$$

The product on the left side can be written as a sum of $2^d$ terms $e^{\phi}$, where

$$\phi = \epsilon_1 \theta_1 + \cdots + \epsilon_d \theta_d,$$

$\epsilon_j = 0$ or $1$. Let $n$ be the number of terms $\epsilon_1 \theta_1 + \cdots + \epsilon_d \theta_d$ that are nonzero. Call these $\alpha_1, \ldots, \alpha_n$. We then have an equation

(4) $q + e^{\alpha_1} + \cdots + e^{\alpha_n} = 0$

with $q = 2^d - n > 0$. Recall that all $t\alpha_i$ are algebraic integers, and we consider the polynomial

$$f(x) = t^{np} x^{p-1} (x - \alpha_1)^p \cdots (x - \alpha_n)^p$$

with p a sufficiently large prime integer. We have f (x) ∈ ℝ[x], since the αi are algebraic
numbers, and the elementary symmetric polynomials in α1 , . . . , αn are rational numbers.
Let I(z1 ) be defined as in the proof of Theorem 20.4.1, and now let

J = I(α1 ) + ⋅ ⋅ ⋅ + I(αn ).

From (1) in the proof of Theorem 20.4.1 and (4), we get


$$J = -q \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k),$$

with $m = (n + 1)p - 1$.
Now, $\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is a symmetric polynomial in $t\alpha_1, \ldots, t\alpha_n$ with integer coefficients, since the $t\alpha_i$ are algebraic integers. It follows from the main theorem on symmetric polynomials that $\sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer. Furthermore, $f^{(j)}(\alpha_k) = 0$ for $j < p$. Hence, $\sum_{j=0}^{m} \sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer divisible by $p!$. Now, $f^{(j)}(0)$ is an integer divisible by $p!$ if $j \neq p - 1$, and $f^{(p-1)}(0) = (p - 1)!\,(-t)^{np} (\alpha_1 \cdots \alpha_n)^p$ is an integer divisible by $(p - 1)!$, but not divisible by $p!$ if $p$ is sufficiently large. In particular, this is true if $p > |t^n (\alpha_1 \cdots \alpha_n)|$ and also $p > q$.
From (2) in the proof of Theorem 20.4.1, we get that

$$|J| \le |\alpha_1| e^{|\alpha_1|} |f|(|\alpha_1|) + \cdots + |\alpha_n| e^{|\alpha_n|} |f|(|\alpha_n|) \le c^p$$

for some number c independent of p.



As in the proof of Theorem 20.4.1, this gives us

$$(p - 1)! \le |J| \le c^p;$$

that is,

$$1 \le \frac{|J|}{(p - 1)!} \le c\,\frac{c^{p-1}}{(p - 1)!}.$$

This, as before, gives a contradiction, since $\frac{c^{p-1}}{(p - 1)!} \to 0$ as $p \to \infty$. Therefore, π is
transcendental.

20.5 Exercises
1. A polynomial p(x) ∈ ℤ[x] is primitive if the GCD of all its coefficients is 1. Prove the
following:
(i) If f (x) and g(x) are primitive, then so is f (x)g(x).
(ii) If f (x) ∈ ℤ[x] is monic, then it is primitive.
(iii) If f (x) ∈ ℚ[x], then there exists a rational number c such that f (x) = cf1 (x) with
f1 (x) primitive.
2. Let d be a square-free integer and K = ℚ(√d) be a quadratic field. Let RK be the
subring of K of the algebraic integers of K. Show the following:
(i) RK = {m + n√d : m, n ∈ ℤ} if d ≡ 2 (mod 4) or d ≡ 3 (mod 4). {1, √d} is an
integral basis for RK .
(ii) $R_K = \{m + n \frac{1 + \sqrt{d}}{2} : m, n \in \mathbb{Z}\}$ if $d \equiv 1 \pmod 4$. $\{1, \frac{1 + \sqrt{d}}{2}\}$ is an integral basis for $R_K$.
(iii) If d < 0, then there are only finitely many units in RK .
(iv) If d > 0, then there are infinitely many units in RK .
3. Let K = ℚ(α) with α³ + α + 1 = 0 and RK the subring of the algebraic integers in K.
Show that:
(i) {1, α, α²} is an integral basis for RK .
(ii) RK = ℤ[α].
4. Let A|R be an integral ring extension. If A is an integral domain and R a field, then
A is also a field.
5. Let A|R be an integral extension. Let 𝒫 be a prime ideal of A and p be a prime ideal
of R such that 𝒫 ∩ R = p. Show that:
(i) If p is maximal in R, then 𝒫 is maximal in A.
(Hint: consider A/𝒫 .)
(ii) If 𝒫0 is another prime ideal of A with 𝒫0 ∩ R = p and 𝒫0 ⊂ 𝒫 , then 𝒫 = 𝒫0 .
(Hint: we may assume that A is an integral domain, and 𝒫 ∩ R = {0}, otherwise
go to A/𝒫 .)
6. Show that for a field extension E|K, the following are equivalent:
(i) [E : K(B)] < ∞ for each transcendence basis B of E|K.

(ii) trgd(E|K) < ∞ and [E : K(B)] < ∞ for each transcendence basis B of E|K.
(iii) There is a finite transcendence basis B of E|K with [E : K(B)] < ∞.
(iv) There are finitely many x1 , . . . , xn ∈ E with E = K(x1 , . . . , xn ).
7. Let E|K be a field extension. If E|K is purely transcendental, then K is algebraically
closed in E.
21 The Hilbert Basis Theorem and the Nullstellensatz
21.1 Algebraic Geometry
An extremely important application of abstract algebra and an application central to
all of mathematics is the subject of algebraic geometry. As the name suggests this is
the branch of mathematics that uses the techniques of abstract algebra to study geo-
metric problems. Classically, algebraic geometry involved the study of algebraic curves,
which roughly are the sets of zeros of a polynomial or set of polynomials in several vari-
ables over a field. For example, in two variables a real algebraic plane curve is the set
of zeros in ℝ² of a polynomial p(x, y) ∈ ℝ[x, y]. The common planar curves, such as
parabolas and the other conic sections, are all plane algebraic curves. In actual prac-
tice, plane algebraic curves are usually considered over the complex numbers and are
projectivized.
The algebraic theory that deals most directly with algebraic geometry is called com-
mutative algebra. This is the study of commutative rings, ideals in commutative rings,
and modules over commutative rings. A large portion of this book has dealt with com-
mutative algebra.
Although we will not consider the geometric aspects of algebraic geometry in gen-
eral, we will close the book by introducing some of the basic algebraic ideas that are
crucial to the subject. These include the concept of an algebraic variety or algebraic set
and its radical. We also state and prove two of the cornerstones of the theory as applied
to commutative algebra—the Hilbert basis theorem and the nullstellensatz.
In this chapter, we also often consider a fixed field extension C|K and the polynomial
ring K[x1 , . . . , xn ] of the n independent indeterminates x1 , . . . , xn . Again, in this chapter,
we often use letters a, b, m, p, P, A, Q, . . . for ideals in rings.

21.2 Algebraic Varieties and Radicals


We first define the concept of an algebraic variety:

Definition 21.2.1. If $M \subset K[x_1, \ldots, x_n]$, then we define

$$\mathcal{N}(M) = \{(\alpha_1, \ldots, \alpha_n) \in C^n : f(\alpha_1, \ldots, \alpha_n) = 0 \ \forall f \in M\}.$$

$\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathcal{N}(M)$ is called a zero (Nullstelle) of $M$ in $C^n$, and $\mathcal{N}(M)$ is called the zero set of $M$ in $C^n$. If we want to mention $C$, then we write $\mathcal{N}(M) = \mathcal{N}_C(M)$. A subset $V \subset C^n$ of the form $V = \mathcal{N}(M)$ for some $M \subset K[x_1, \ldots, x_n]$ is called an algebraic variety or (affine) algebraic set of $C^n$ over $K$, or just an algebraic $K$-set of $C^n$.

For any subset N of C^n, we can reverse the procedure and consider the set of all
polynomials that vanish on N.

https://doi.org/10.1515/9783111142524-021

Definition 21.2.2. Suppose that N ⊂ C n . Then

I(N) = { f ∈ K[x1 , . . . , xn ] : f (α1 , . . . , αn ) = 0 ∀(α1 , . . . , αn ) ∈ N}.

Instead of f ∈ I(N), we also say that f vanishes on N (over K). If we want to mention K,
then we write I(N) = IK (N).

What is important is that the set I(N) forms an ideal. The proof is straightforward.

Theorem 21.2.3. For any subset N ⊂ C n , the set I(N) is an ideal in K[x1 , . . . , xn ]; it is called
the vanishing ideal of N ⊂ C n in K[x1 , . . . , xn ].

The following result examines the relationship between subsets in C n and their van-
ishing ideals.

Theorem 21.2.4. The following properties hold:


(1) M ⊂ M ′ ⇒ 𝒩 (M ′ ) ⊂ 𝒩 (M);
(2) If a = (M) is the ideal in K[x1 , . . . , xn ] generated by M, then 𝒩 (M) = 𝒩 (a);
(3) N ⊂ N ′ ⇒ I(N ′ ) ⊂ I(N);
(4) M ⊂ I 𝒩 (M) for all M ⊂ K[x1 , . . . , xn ];
(5) N ⊂ 𝒩 I(N) for all N ⊂ C n ;
(6) If (ai )i∈I is a family of ideals in K[x1 , . . . , xn ], then ⋂i∈I 𝒩 (ai ) = 𝒩 (∑i∈I ai ). Here
∑i∈I ai is the ideal in K[x1 , . . . , xn ], generated by the union ⋃i∈I ai ;
(7) If a, b are ideals in K[x1 , . . . , xn ], then 𝒩 (a) ∪ 𝒩 (b) = 𝒩 (ab) = 𝒩 (a ∩ b). Here ab is
the ideal in K[x1 , . . . , xn ] generated by all products fg, where f ∈ a and g ∈ b;
(8) 𝒩 (M) = 𝒩 I 𝒩 (M) for all M ⊂ K[x1 , . . . , xn ];
(9) V = 𝒩 I(V ) for all algebraic K-sets V ;
(10) I(N) = I 𝒩 I(N) for all N ⊂ C n .

Proof. The proofs are straightforward. Hence, we prove only (7), (8), and (9). The rest
can be left as exercise for the reader.
Proof of (7): Since ab ⊂ a ∩ b ⊂ a, b, we have, by (1), the inclusion

𝒩 (a) ∪ 𝒩 (b) ⊂ 𝒩 (a ∩ b) ⊂ 𝒩 (ab).

Hence, we have to show that 𝒩 (ab) ⊂ 𝒩 (a) ∪ 𝒩 (b).


Let α = (α1 , . . . , αn ) ∈ C n be a zero of ab, but not a zero of a. Then there is an f ∈ a
with f (α) ≠ 0; hence, for all g ∈ b, we get f (α)g(α) = (fg)(α) = 0. Thus, g(α) = 0.
Therefore, α ∈ 𝒩 (b).
Proof of (8) and (9): Let M ⊂ K[x1 , . . . , xn ]. Then, on the one hand, M ⊂ I 𝒩 (M) by
(4), and further 𝒩 I 𝒩 (M) ⊂ 𝒩 (M) by (1). On the other hand, 𝒩 (M) ⊂ 𝒩 I 𝒩 (M) by (5).
Therefore, 𝒩 (M) = 𝒩 I 𝒩 (M) for all M ⊂ K[x1 , . . . , xn ].
Now, the algebraic K-sets of C n are precisely the sets of the form V = 𝒩 (M). Hence,
V = 𝒩 I(V ).
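Property (7) can be checked by brute force when the field is finite. The sketch below assumes K = C = F_5 and n = 2, and uses two hypothetical ideals a = (x, y − 1) and b = (x − y); by property (2) it suffices to test generators. None of these choices come from the text.

```python
from itertools import product

P = 5  # we take K = C = F_5, the field with 5 elements (illustrative choice)

def zero_set(polys, n=2):
    """All points of F_P^n on which every polynomial in polys vanishes."""
    return {pt for pt in product(range(P), repeat=n)
            if all(f(*pt) % P == 0 for f in polys)}

# Hypothetical generators: a = (x, y - 1), b = (x - y)
a = [lambda x, y: x, lambda x, y: y - 1]
b = [lambda x, y: x - y]

# The product ideal ab is generated by all products f*g with f in a, g in b.
ab = [lambda x, y, f=f, g=g: f(x, y) * g(x, y) for f in a for g in b]

union = zero_set(a) | zero_set(b)
zeros_of_product = zero_set(ab)
```

The zero set of a is the single point (0, 1), the zero set of b is the diagonal, and the zero set of ab is exactly their union, as (7) predicts.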

We make the following agreement: if a is an ideal in K[x1 , . . . , xn ], then we write

a ⊲ K[x1 , . . . , xn ].

If a ⊲ K[x1 , . . . , xn ], then we do not have a = I 𝒩 (a) in general. That is, a is, in general,
not equal to the vanishing ideal of its zero set in C n . The reason for this is that not each
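ideal $\mathfrak{a}$ occurs as a vanishing ideal of some $N \subset C^n$. If $\mathfrak{a} = I(N)$, then we must have that $f^m \in \mathfrak{a}$ for $m \ge 1$ implies $f \in \mathfrak{a}$.
Hence, for instance, if $\mathfrak{a} = (x_1^2, \ldots, x_n^2) \triangleleft K[x_1, \ldots, x_n]$, then $\mathfrak{a}$ is not of the form $\mathfrak{a} = I(N)$ for some $N \subset C^n$, since $x_1^2 \in \mathfrak{a}$ but $x_1 \notin \mathfrak{a}$. We now define the radical of an ideal: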

Definition 21.2.5. Let R be a commutative ring and a ⊲ R an ideal in R. Then

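$$\sqrt{\mathfrak{a}} = \{ f \in R : f^m \in \mathfrak{a} \text{ for some } m \in \mathbb{N} \}$$

is an ideal in $R$. $\sqrt{\mathfrak{a}}$ is called the radical of $\mathfrak{a}$ (in $R$). $\mathfrak{a}$ is said to be reduced if $\sqrt{\mathfrak{a}} = \mathfrak{a}$.

We note that $\sqrt{0}$ is called the nil radical of $R$; it contains exactly the nilpotent elements of $R$; that is, the elements $a \in R$ with $a^m = 0$ for some $m \in \mathbb{N}$.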
Let a ⊲ R be an ideal in R and π : R → R/a the canonical mapping. Then √a is exactly
the preimage of the nil radical of R/a.
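For R = ℤ and a = (n), the description of the radical as the preimage of the nil radical of R/a can be made concrete. The sketch below finds the nilpotent elements of ℤ/nℤ by brute force; the function name and the sample moduli are illustrative assumptions.

```python
def nilradical(n):
    """Nilpotent elements of Z/nZ, found by brute force."""
    nil = []
    for a in range(n):
        power = a % n
        # a^m for m = 1, ..., n is enough to detect nilpotency modulo n.
        for _ in range(n):
            if power == 0:
                nil.append(a)
                break
            power = (power * a) % n
    return nil

nil_12 = nilradical(12)  # multiples of 6, so the radical of (12) in Z is (6)
nil_8 = nilradical(8)    # multiples of 2, so the radical of (8) in Z is (2)
nil_30 = nilradical(30)  # 30 is square-free, so (30) is reduced
```

The ideal (n) in ℤ is reduced exactly when the only nilpotent element of ℤ/nℤ is 0, which happens for square-free n such as 30.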

21.3 The Hilbert Basis Theorem


In this section, we show that if K is a field, then each ideal a ⊲ K[x1 , . . . , xn ] is finitely
generated. This is the content of the Hilbert basis theorem. It has an important
consequence: any algebraic variety of C^n is the zero set of only finitely many polynomials.
The Hilbert basis theorem follows directly from the following Theorem 21.3.2. Before
we state this theorem, we need a definition.

Definition 21.3.1. Let R be a commutative ring with an identity 1 ≠ 0. R is said to be
Noetherian if each ideal in R is generated by finitely many elements; that is, each ideal
in R is finitely generated.

Theorem 21.3.2. Let R be a noetherian ring. Then the polynomial ring R[x] over R is also
noetherian.

Proof. Let 0 ≠ fk ∈ R[x]. We denote the degree of fk with deg(fk ). Let a ⊲ R[x] be an ideal
in R[x]. Assume that a is not finitely generated. Then, particularly, a ≠ 0. We construct a
sequence of polynomials fk ∈ a such that the highest coefficients ak generate an ideal in
R, which is not finitely generated. This produces then a contradiction; hence, a is in fact
finitely generated. Choose f1 ∈ a, f1 ≠ 0, so that deg(f1 ) = n1 is minimal.

If $k \ge 1$, then choose $f_{k+1} \in \mathfrak{a}$, $f_{k+1} \notin (f_1, \ldots, f_k)$ so that $\deg(f_{k+1}) = n_{k+1}$ is minimal for the polynomials in $\mathfrak{a} \setminus (f_1, \ldots, f_k)$. This is possible, because we assume that $\mathfrak{a}$ is not finitely generated.
We have $n_k \le n_{k+1}$ by construction. Furthermore, $(a_1, \ldots, a_k) \subsetneq (a_1, \ldots, a_k, a_{k+1})$.
To see this, assume that $(a_1, \ldots, a_k) = (a_1, \ldots, a_k, a_{k+1})$. Then $a_{k+1} \in (a_1, \ldots, a_k)$. Hence, there are $b_i \in R$ with $a_{k+1} = \sum_{i=1}^{k} a_i b_i$. Let $g(x) = \sum_{i=1}^{k} b_i f_i(x) x^{n_{k+1} - n_i}$; hence, $g \in (f_1, \ldots, f_k)$, and $g = a_{k+1} x^{n_{k+1}} + \cdots$.
Therefore, $\deg(f_{k+1} - g) < n_{k+1}$, and $f_{k+1} - g \notin (f_1, \ldots, f_k)$, which contradicts the choice of $f_{k+1}$. This proves the claim. Hence, $(a_1) \subsetneq (a_1, a_2) \subsetneq \cdots$ is a strictly ascending chain of ideals in $R$, so the ideal generated by all the $a_k$ is not finitely generated, which contradicts the fact that $R$ is Noetherian. Hence, $\mathfrak{a}$ is finitely generated.
We now have the Hilbert basis theorem:

Theorem 21.3.3 (Hilbert basis theorem). Let K be a field. Then any ideal a ⊲ K[x1 , . . . , xn ]
is finitely generated; that is, a = (f1 , . . . , fm ) for finitely many f1 , . . . , fm ∈ K[x1 , . . . , xn ].

Corollary 21.3.4. If C|K is a field extension, then each algebraic K-set V of C n is already
the zero set of only finitely many polynomials f1 , . . . , fm ∈ K[x1 , . . . , xn ]:

V = {(α1 , . . . , αn ) ∈ C n : fi (α1 , . . . , αn ) = 0 for i = 1, . . . , m}.

Furthermore, we write V = 𝒩 (f1 , . . . , fm ).

21.4 The Nullstellensatz


Vanishing ideals of subsets of $C^n$ are necessarily reduced. For an arbitrary field $C$, however, the condition

$$f^m \in \mathfrak{a},\ m \ge 1 \implies f \in \mathfrak{a}$$

is, in general, not sufficient for $\mathfrak{a} \triangleleft K[x_1, \ldots, x_n]$ to be a vanishing ideal of a subset of $C^n$.


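For example, let $n \ge 2$, $K = C = \mathbb{R}$ and $\mathfrak{a} = (x_1^2 + \cdots + x_n^2) \triangleleft \mathbb{R}[x_1, \ldots, x_n]$. $\mathfrak{a}$ is a prime ideal in $\mathbb{R}[x_1, \ldots, x_n]$, because $x_1^2 + \cdots + x_n^2$ is a prime element in $\mathbb{R}[x_1, \ldots, x_n]$. Hence, $\mathfrak{a}$ is reduced. But, on the other hand, $\mathcal{N}(\mathfrak{a}) = \{0\}$, and $I(\{0\}) = (x_1, \ldots, x_n)$. Therefore, $\mathfrak{a}$ is not of the form $I(N)$ for some $N \subset C^n$. If this were the case, then $\mathfrak{a} = I(N) = I\mathcal{N}I(N) = I\mathcal{N}(\mathfrak{a}) = I(\{0\}) = (x_1, \ldots, x_n)$, because of Theorem 21.2.4 (10), which gives a contradiction.
The nullstellensatz by Hilbert, which we give in two forms, shows that if $C$ is algebraically closed and $\mathfrak{a}$ is reduced, that is, $\mathfrak{a} = \sqrt{\mathfrak{a}}$, then $I\mathcal{N}(\mathfrak{a}) = \mathfrak{a}$.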

Theorem 21.4.1 (Hilbert’s nullstellensatz, first form). Let C|K be a field extension with C
algebraically closed. If a ⊲ K[x1 , . . . , xn ], then I 𝒩 (a) = √a. Moreover, if a is reduced, that
is, a = √a, then I 𝒩 (a) = a. Therefore, 𝒩 defines a bijective map between the set of reduced
ideals in K[x1 , . . . , xn ] and the set of the algebraic K-sets in C n , and I defines the inverse
map.

The proof follows from the following:

Theorem 21.4.2 (Hilbert’s nullstellensatz, second form). Let C|K be a field extension with
C algebraically closed. Let a ⊲ K[x1 , . . . , xn ] with a ≠ K[x1 , . . . , xn ]. Then there exists an
α = (α1 , . . . , αn ) ∈ C^n with f (α) = 0 for all f ∈ a; that is, 𝒩C (a) ≠ ∅.

Proof. Since a ≠ K[x1 , . . . , xn ], there exists a maximal ideal m ⊲ K[x1 , . . . , xn ] with a ⊂ m.


We consider the canonical map π : K[x1 , . . . , xn ] → K[x1 , . . . , xn ]/m. Let βi = π(xi ) for
i = 1, . . . , n. Then K[x1 , . . . , xn ]/m = K[β1 , . . . , βn ] =: E. Since m is maximal, E is a field.
Moreover, E|K is algebraic by Corollary 20.3.11. Hence, there exists a K-homomorphism
σ : K[β1 , . . . , βn ] → C (C is algebraically closed). Let αi = σ(βi ). As a result we have
f (α1 , . . . , αn ) = 0 for all f ∈ m. Since a ⊂ m this holds also for all f ∈ a. Hence, we get a
zero (α1 , . . . , αn ) of a in C n .

Proof of Theorem 21.4.1. Let a ⊲ K[x1 , . . . , xn ], and let f ∈ I 𝒩 (a). We have to show that
f m ∈ a for some m ∈ ℕ. If f = 0, then there is nothing to show.
Now, let f ≠ 0. We consider K[x1 , . . . , xn ] as a subring of K[x1 , . . . , xn , xn+1 ] of the n + 1
independent indeterminates x1 , . . . , xn , xn+1 . In K[x1 , . . . , xn , xn+1 ], we consider the ideal
ā = (a, 1 − xn+1 f ) ⊲ K[x1 , . . . , xn , xn+1 ], generated by a and 1 − xn+1 f .
Case 1: $\bar{\mathfrak{a}} \neq K[x_1, \ldots, x_n, x_{n+1}]$. Then $\bar{\mathfrak{a}}$ has a zero $(\beta_1, \ldots, \beta_n, \beta_{n+1})$ in $C^{n+1}$ by Theorem 21.4.2. Hence, for $(\beta_1, \ldots, \beta_n, \beta_{n+1}) \in \mathcal{N}(\bar{\mathfrak{a}})$, we have the equations:
(1) $g(\beta_1, \ldots, \beta_n) = 0$ for all $g \in \mathfrak{a}$, and
(2) $f(\beta_1, \ldots, \beta_n)\beta_{n+1} = 1$.

From (1), we get $(\beta_1, \ldots, \beta_n) \in \mathcal{N}(\mathfrak{a})$. In particular, $f(\beta_1, \ldots, \beta_n) = 0$ for our $f \in I\mathcal{N}(\mathfrak{a})$. But this contradicts (2). Therefore, $\bar{\mathfrak{a}} \neq K[x_1, \ldots, x_n, x_{n+1}]$ is not possible. Thus, we have
Case 2: $\bar{\mathfrak{a}} = K[x_1, \ldots, x_n, x_{n+1}]$, that is, $1 \in \bar{\mathfrak{a}}$. Then there exists a relation of the form

(3) $1 = \sum_i h_i g_i + h(1 - x_{n+1} f)$ for some $g_i \in \mathfrak{a}$ and $h_i, h \in K[x_1, \ldots, x_n, x_{n+1}]$.

The map given by $x_i \mapsto x_i$ for $1 \le i \le n$ and $x_{n+1} \mapsto \frac{1}{f}$ defines a homomorphism $\phi : K[x_1, \ldots, x_n, x_{n+1}] \to K(x_1, \ldots, x_n)$, the quotient field of $K[x_1, \ldots, x_n]$. From (3), we get a relation $1 = \sum_i h_i(x_1, \ldots, x_n, \frac{1}{f}) g_i(x_1, \ldots, x_n)$ in $K(x_1, \ldots, x_n)$. If we multiply this with a suitable power $f^m$ of $f$, we get $f^m = \sum_i \tilde{h}_i(x_1, \ldots, x_n) g_i(x_1, \ldots, x_n)$ for some polynomials $\tilde{h}_i \in K[x_1, \ldots, x_n]$. Since $g_i \in \mathfrak{a}$, we get $f^m \in \mathfrak{a}$.
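To see Case 2 at work in the smallest possible instance, one may take $n = 1$, $\mathfrak{a} = (x_1^2)$ and $f = x_1$, so that $f \in I\mathcal{N}(\mathfrak{a})$. This example is an illustration chosen here, not one from the text; the extra indeterminate $x_{n+1}$ is $x_2$.

```latex
\begin{align*}
1 &= x_2^{2} \cdot x_1^{2} + (1 + x_1 x_2)(1 - x_2 x_1)
  && \text{in } K[x_1, x_2], \text{ with } g_1 = x_1^{2} \in \mathfrak{a},\
     h_1 = x_2^{2},\ h = 1 + x_1 x_2, \\
1 &= \frac{1}{x_1^{2}} \cdot x_1^{2}
  && \text{after applying } x_2 \mapsto \tfrac{1}{f} = \tfrac{1}{x_1}, \\
f^{2} &= 1 \cdot x_1^{2} \in \mathfrak{a}
  && \text{after multiplying by } f^{2}.
\end{align*}
```

The first line is an identity because $(1 + x_1 x_2)(1 - x_1 x_2) = 1 - x_1^2 x_2^2$; the last line exhibits the power $m = 2$ with $f^m \in \mathfrak{a}$.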

21.5 Applications and Consequences of Hilbert’s Theorems


Theorem 21.5.1. Each nonempty set of algebraic K-sets in C n contains a minimal element.
In other words, for each descending chain

V1 ⊃ V2 ⊃ ⋅ ⋅ ⋅ ⊃ Vm ⊃ Vm+1 ⊃ ⋅ ⋅ ⋅ (21.1)

of algebraic $K$-sets $V_i$ in $C^n$, there exists an integer $m$ such that $V_m = V_{m+1} = V_{m+2} = \cdots$, or equivalently, every strictly descending chain $V_1 \supsetneq V_2 \supsetneq \cdots$ of algebraic $K$-sets $V_i$ in $C^n$ is finite.

Proof. We apply the operator I; that is, we pass to the vanishing ideals. This gives an
ascending chain of ideals

I(V1 ) ⊂ I(V2 ) ⊂ ⋅ ⋅ ⋅ ⊂ I(Vm ) ⊂ I(Vm+1 ) ⊂ ⋅ ⋅ ⋅ . (21.2)

The union of the I(Vi ) is an ideal in K[x1 , . . . , xn ], and hence, by Theorem 21.3.3,
finitely generated. Therefore, there is an m with I(Vm ) = I(Vm+1 ) = I(Vm+2 ) = ⋅ ⋅ ⋅ .
Now we apply the operator 𝒩 and get the desired result, because Vi = 𝒩 I(Vi ) by
Theorem 21.2.4 (9).

Definition 21.5.2. An algebraic K-set V ≠ ∅ in C^n is called irreducible if it is not describable
as a union V = V1 ∪ V2 of two algebraic K-sets Vi ≠ ∅ in C^n with Vi ≠ V for i = 1, 2.
An irreducible algebraic K-set in C^n is also called a K-variety in C^n.

Theorem 21.5.3. An algebraic $K$-set $V \neq \emptyset$ in $C^n$ is irreducible if and only if its vanishing ideal $I_K(V) = I(V)$ is a prime ideal of $R = K[x_1, \ldots, x_n]$ with $I(V) \neq R$.

Proof. (1) Let V be irreducible. Let fg ∈ I(V ). Then V = 𝒩 I(V ) ⊂ 𝒩 (fg) = 𝒩 (f ) ∪ 𝒩 (g);
hence, V = V1 ∪ V2 with the algebraic K-sets V1 = 𝒩 (f ) ∩ V and V2 = 𝒩 (g) ∩ V . Now
V is irreducible; hence, V = V1 , or V = V2 , say V = V1 . Then V ⊂ 𝒩 (f ). Therefore,
f ∈ I 𝒩 (f ) ⊂ I(V ). Since V ≠ ∅, we have further 1 ∉ I(V ); that is, I(V ) ≠ R.
(2) Let I(V )⊲R with I(V ) ≠ R be a prime ideal. Let V = V1 ∪V2 , V1 ≠ V , with algebraic
K-sets Vi in C n . First,

I(V ) = I(V1 ∪ V2 ) = I(V1 ) ∩ I(V2 ) ⊃ I(V1 )I(V2 ), (⋆)

where I(V1 )I(V2 ) is the ideal generated by all products fg with f ∈ I(V1 ), g ∈ I(V2 ).
We have I(V1 ) ≠ I(V ), because otherwise V1 = 𝒩 I(V1 ) = 𝒩 I(V ) = V contradicting
V1 ≠ V . Hence, there is an f ∈ I(V1 ) with f ∉ I(V ). For each g ∈ I(V2 ), we have fg ∈ I(V ) by (⋆).
Now, I(V ) ≠ R is a prime ideal; hence, necessarily I(V2 ) ⊂ I(V ). It follows that V ⊂ V2 , that is, V = V2 . Therefore, V is irreducible.

Note that the affine space K n is, as the zero set of the zero polynomial 0, itself an
algebraic K-set in K^n. If K is infinite, then I(K^n) = {0}. Hence, K^n is irreducible by
Theorem 21.5.3. Moreover, if K is infinite, then K^n cannot be written as a union of finitely
many proper algebraic K-subsets. If K is finite and n ≥ 1, then K^n is not irreducible.
Furthermore, each algebraic K-set V in C n is also an algebraic C-set in C n . If V is
an irreducible algebraic K-set in C n , then—in general—it is not an irreducible algebraic
C-set in C n .

Theorem 21.5.4. Let V be an algebraic K-set in C n . Then V can be written as a finite union
V = V1 ∪ V2 ∪ ⋅ ⋅ ⋅ ∪ Vr of irreducible algebraic K-sets Vi in C n . If here Vi ⊈ Vk for all pairs

(i, k) with i ≠ k, then this presentation is unique, up to the ordering of the Vi , and then the
Vi are called the irreducible K-components of V .

Proof. Let a be the set of all algebraic K-sets in C^n, which cannot be presented as a finite
union of irreducible algebraic K-sets in C^n.
Assume that a ≠ ∅. By Theorem 21.5.1, there is a minimal element V in a. This V
is not irreducible, otherwise we have a presentation as desired. Hence, there exists a
presentation V = V1 ∪ V2 with algebraic K-sets Vi , which are strictly smaller than V . By
the minimality of V , both V1 and V2 have a presentation as desired; hence, V also has one, which
gives a contradiction. Hence, a = ∅.
Now suppose that V = V1 ∪ ⋅ ⋅ ⋅ ∪ Vr = W1 ∪ ⋅ ⋅ ⋅ ∪ Ws are two presentations of the
desired form. For each Vi , we have a presentation Vi = (Vi ∩ W1 ) ∪ ⋅ ⋅ ⋅ ∪ (Vi ∩ Ws ). Each
Vi ∩ Wj is a K-algebraic set (see Theorem 21.2.4). Since Vi is irreducible, we get that there
is a Wj with Vi = Vi ∩ Wj , that is, Vi ⊂ Wj . Analogously, for this Wj , there is a Vk with
Wj ⊂ Vk . Altogether, Vi ⊂ Wj ⊂ Vk . But Vp ⊈ Vq if p ≠ q. Hence, from Vi ⊂ Wj ⊂ Vk ,
we get i = k. Therefore, Vi = Wj ; that means, for each Vi there is a Wj with Vi = Wj .
Analogously, for each Wk , there is a Vl with Wk = Vl . This proves the theorem.

Example 21.5.5. 1. Let $M = \{gf\} \subset \mathbb{R}[x, y]$ with $g(x, y) = x^2 + y^2 - 1$ and $f(x, y) = x^2 + y^2 - 2$. Then we have $\mathcal{N}(M) = V = V_1 \cup V_2$, where $V_1 = \mathcal{N}(g)$, and $V_2 = \mathcal{N}(f)$; $V$ is not irreducible.
2. Let M = { f } ⊂ ℝ[x, y] with f (x, y) = xy − 1; f is irreducible in ℝ[x, y]. Therefore, the
ideal (f ) is a prime ideal in ℝ[x, y]. Hence, V = 𝒩 (f ) is irreducible.

Definition 21.5.6. Let V be an algebraic K-set in C n . Then the residue class ring

K[V ] = K[x1 , . . . , xn ]/I(V )

is called the (affine) coordinate ring of V .

K[V ] can be identified with the ring of all those functions V → C, which are given
by polynomials from K[x1 , . . . , xn ]. As a homomorphic image of K[x1 , . . . , xn ], we get that
K[V ] can be described in the form K[V ] = K[α1 , . . . , αn ]; therefore, a K-algebra of the
form K[α1 , . . . , αn ] is often called an affine K-algebra. If the algebraic K-set V in C n is
irreducible—we can call V now an (affine) K-variety in C n —then K[V ] is an integral
domain with an identity, because I(V ) is then a prime ideal with I(V ) ≠ R by Theorem 21.5.3.
The quotient field K(V ) = Quot K[V ] is called the field of rational functions
on the K-variety V .
We note the following:
1. If C is algebraically closed, then V = C n is a K-variety, and K(V ) is the field
K(x1 , . . . , xn ) of the rational functions in n variables over K.
2. Let the affine K-algebra A = K[α1 , . . . , αn ] be an integral domain with an identity
1 ≠ 0. Then A ≅ K[x1 , . . . , xn ]/p for some prime ideal p ≠ K[x1 , . . . , xn ]. Hence, if C
is algebraically closed, then A is isomorphic to the coordinate ring of the K-variety
V = 𝒩 (p) in C n (see Hilbert’s nullstellensatz, first form, Theorem 21.4.1).

3. If the affine K-algebra A = K[α1 , . . . , αn ] is an integral domain with an identity 1 ≠ 0,
then we define the transcendence degree trgd(A|K) to be the transcendence degree
of the field extension Quot(A)|K; that is, trgd(A|K) = trgd(Quot(A)|K), Quot(A) the
quotient field of A.
In this sense, trgd(K[x1 , . . . , xn ]|K) = n. Since Quot(A) = K(α1 , . . . , αn ), we get
trgd(A|K) ≤ n by Noether’s normalization theorem (Theorem 20.3.10).
4. An arbitrary affine K-algebra K[α1 , . . . , αn ] is, as a homomorphic image of the polynomial
ring K[x1 , . . . , xn ], noetherian (see Theorem 21.3.2 and Theorem 21.3.3).

Example 21.5.7. Let ω1 , ω2 ∈ ℂ be two elements which are linearly independent over ℝ. An
element ω = m1 ω1 + m2 ω2 with m1 , m2 ∈ ℤ, is called a period. The periods form an
Abelian group Ω = {m1 ω1 + m2 ω2 : m1 , m2 ∈ ℤ} ≅ ℤ ⊕ ℤ and give a lattice in ℂ.

An elliptic function $f$ (with respect to $\Omega$) is a meromorphic function with period group $\Omega$, that is, $f(z + w) = f(z)$ for all $z \in \mathbb{C}$ and all $w \in \Omega$. The Weierstrass $\wp$-function,

$$\wp(z) = \frac{1}{z^2} + \sum_{0 \neq w \in \Omega} \Big( \frac{1}{(z - w)^2} - \frac{1}{w^2} \Big),$$

is an elliptic function.
With $g_2 = 60 \sum_{0 \neq w \in \Omega} \frac{1}{w^4}$, and $g_3 = 140 \sum_{0 \neq w \in \Omega} \frac{1}{w^6}$, we get the differential equation $\wp'(z)^2 - 4\wp(z)^3 + g_2 \wp(z) + g_3 = 0$. The set of elliptic functions is a field $E$, and each elliptic function is a rational function in $\wp$ and $\wp'$ (for details see, for instance, [44]).
The polynomial $f(t) = t^2 - 4s^3 + g_2 s + g_3 \in \mathbb{C}(s)[t]$ is irreducible over $\mathbb{C}(s)$. For the corresponding algebraic $\mathbb{C}(s)$-set $V$, we get $K(V) = \mathbb{C}(s)[t]/(t^2 - 4s^3 + g_2 s + g_3) \cong E$ with respect to $t \mapsto \wp'$, $s \mapsto \wp$.

21.6 Dimensions
From now on, we assume that C is algebraically closed.

Definition 21.6.1. (1) The dimension dim(V ) of an algebraic K-set V in C^n is said to be
the supremum of all integers m, for which there exists a strictly descending chain
V0 ⊋ V1 ⊋ ⋅ ⋅ ⋅ ⊋ Vm of K-varieties Vi in C^n with Vi ⊂ V for all i.

(2) Let A be a commutative ring with an identity 1 ≠ 0. The height h(p) of a prime ideal
p ≠ A of A is said to be the supremum of all integers m, for which there exists a
strictly ascending chain p0 ⊊ p1 ⊊ ⋅ ⋅ ⋅ ⊊ pm = p of prime ideals pi of A with pi ≠ A.
The dimension (Krull dimension) dim(A) of A is the supremum of the heights of all
prime ideals ≠ A in A.

Theorem 21.6.2. Let V be an algebraic K-set in C n . Then dim(V ) = dim(K[V ]).

Proof. By Theorem 21.4.1 and Theorem 21.5.3, we have a bijective map between the
K-varieties W with W ⊂ V and the prime ideals ≠ R = K[x1 , . . . , xn ] of R, which con-
tain I(V ) (the bijective map reverses the inclusion). But these prime ideals correspond
exactly with the prime ideals ≠ K[V ] of K[V ] = K[x1 , . . . , xn ]/I(V ), which gives the state-
ment.

Suppose that V is an algebraic K-set in C^n, and let V1 , . . . , Vr be the irreducible components
of V . Then dim(V ) = max{dim(V1 ), . . . , dim(Vr )}, because if V ′ is a K-variety with
V ′ ⊂ V , then V ′ = (V ′ ∩ V1 ) ∪ ⋅ ⋅ ⋅ ∪ (V ′ ∩ Vr ), and hence V ′ ⊂ Vi for some i, since V ′ is
irreducible. Hence, we may restrict ourselves to K-varieties V .
We consider the special case of the K-variety V = C¹ = C (recall that C is algebraically
closed, and, hence, in particular, C is infinite). Then K[V ] = K[x], the polynomial
ring in one indeterminate x. Now, K[x] is a principal ideal domain, and hence,
each prime ideal ≠ K[x] is either a maximal ideal or the zero ideal {0} of K[x]. The only
K-varieties in V = C are therefore V itself and the zero set of irreducible polynomials in
K[x]. Hence, if V = C, then dim(V ) = dim K[V ] = 1 = trgd(K[V ]|K).

Theorem 21.6.3. Let A = K[α1 , . . . , αn ] be an affine K-algebra, and let A be also an integral
domain. Let {0} = p0 ⊊ p1 ⊊ ⋅ ⋅ ⋅ ⊊ pm be a maximal strictly ascending chain of prime ideals
in A (such a chain exists since A is noetherian). Then m = trgd(A|K) = dim(A). In other
words:
All maximal ideals of A have the same height, and this height is equal to the transcen-
dence degree of A over K.

Corollary 21.6.4. Let V be a K-variety in C n . Then dim(V ) = trgd(K[V ]|K).

We prove Theorem 21.6.3 in several steps.

Lemma 21.6.5. Let R be a unique factorization domain. Then each prime ideal p with
height h(p) = 1 is a principal ideal.

Proof. p ≠ {0}, since h(p) = 1. Hence, there is an f ∈ p, f ≠ 0. Since R is a unique
factorization domain, f has a decomposition f = p1 ⋅ ⋅ ⋅ ps with prime elements pi ∈ R.
Now, p is a prime ideal; hence, some pi ∈ p, because f ∈ p, say p1 ∈ p. Then we have the
chain {0} ⊊ (p1 ) ⊂ p, and (p1 ) is a prime ideal of R. Since h(p) = 1, we get (p1 ) = p.

Lemma 21.6.6. Let R = K[y1 , . . . , yr ] be the polynomial ring of the r independent indeter-
minates y1 , . . . , yr over the field K (recall that R is a unique factorization domain). If p is
a prime ideal in R with height h(p) = 1, then the residue class ring R̄ = R/p has transcen-
dence degree r − 1 over K.

Proof. By Lemma 21.6.5, we have that p = (p) for some nonconstant polynomial
p ∈ K[y1 , . . . , yr ]. Let the indeterminate y = yr occur in p, that is, degy (p) ≥ 1, the
degree in y. If f ≠ 0 is a multiple of p, then also degy (f ) ≥ 1. Hence,
p ∩ K[y1 , . . . , yr−1 ] = {0}.
Therefore, the residue class mapping R → R̄ = K[ȳ1 , . . . , ȳr ] induces an isomorphism
K[y1 , . . . , yr−1 ] → K[ȳ1 , . . . , ȳr−1 ] of the subring K[y1 , . . . , yr−1 ]; that is,
ȳ1 , . . . , ȳr−1 are algebraically independent over K. On the other hand,
p(ȳ1 , . . . , ȳr−1 , ȳr ) = 0 is a nontrivial algebraic relation for ȳr over K(ȳ1 , . . . , ȳr−1 ).
Hence, altogether trgd(R̄|K) = trgd(K(ȳ1 , . . . , ȳr )|K) = r − 1 by Theorem 20.3.9.
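
As a hedged illustration of Lemma 21.6.6, take r = 2 and the principal prime ideal generated by the irreducible polynomial y2² − y1³. This particular polynomial is an example chosen here; it does not come from the text.

```latex
\[
R = K[y_1, y_2], \qquad \mathfrak{p} = (y_2^2 - y_1^3), \qquad
\bar R = R/\mathfrak{p} = K[\bar y_1, \bar y_2].
\]
\[
\mathfrak{p} \cap K[y_1] = \{0\}
\quad\Longrightarrow\quad
\bar y_1 \text{ is algebraically independent over } K,
\]
\[
\bar y_2^{\,2} - \bar y_1^{\,3} = 0
\quad\Longrightarrow\quad
\bar y_2 \text{ is algebraic over } K(\bar y_1),
\]
\[
\operatorname{trgd}(\bar R \mid K) = 1 = r - 1 .
\]
```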

Before we describe the last technical lemma, we need some preparatory theoretical
material.
Let R, A be integral domains (with identity 1 ≠ 0), and let A|R be a ring extension.
We first consider only R.
(1) A subset S ⊂ R \ {0} is called a multiplicative subset of R if 1 ∈ S for the identity 1
of R, and if s, t ∈ S, then also st ∈ S. The rule (x, s) ∼ (y, t) :⇔ xt − ys = 0 defines an
equivalence relation on M = R × S. Let x/s be the equivalence class of (x, s), and let S⁻¹R
be the set of all equivalence classes. We call x/s a fraction. If we add and multiply
fractions as usual, then S⁻¹R becomes an integral domain; it is called the ring of fractions
of R with respect to S. If, in particular, S = R \ {0}, then S⁻¹R = Quot(R), the quotient
field of R.
Now, back to the general situation. The map i : R → S⁻¹R, i(r) = r/1, defines an embedding
of R into S⁻¹R. Hence, we may consider R as a subring of S⁻¹R. For each s ∈ S ⊂ R \ {0},
we have that i(s) is a unit in S⁻¹R, that is, i(s) is invertible, and each element of S⁻¹R
has the form i(s)⁻¹ i(r) with r ∈ R, s ∈ S. Therefore, S⁻¹R is uniquely determined up to
isomorphism, and we have the following universal property:
If ϕ : R → R′ is a ring homomorphism (of integral domains) with ϕ(s) invertible
for each s ∈ S, then there exists exactly one ring homomorphism λ : S⁻¹R → R′ with
λ ∘ i = ϕ.
If a ⊲ R is an ideal in R, then we write S⁻¹a for the ideal in S⁻¹R generated
by i(a). S⁻¹a is the set of all elements of the form a/s with a ∈ a and s ∈ S. Furthermore,
S⁻¹a = (1) ⇔ a ∩ S ≠ ∅.
Conversely, if A ⊲ S⁻¹R is an ideal in S⁻¹R, then we also denote the ideal i⁻¹(A) ⊲ R
by A ∩ R. An ideal a ⊲ R is of the form a = i⁻¹(A) if and only if there is no s ∈ S such that
its image in R/a under the canonical map R → R/a is a proper zero divisor in R/a. Under
the mappings P ↦ P ∩ R and p ↦ S⁻¹p, the prime ideals in S⁻¹R correspond exactly to
the prime ideals in R which do not contain an element of S.
We now identify R with i(R).
(2) Now, let p ⊲ R be a prime ideal in R. Then S = R \ p is multiplicative. In this
case, we write Rp instead of S⁻¹R, and call Rp the quotient ring of R with respect to p,
or the localization of R at p. Put m = pRp = S⁻¹p. Then 1 ∉ m. Each element of Rp \ m is
a unit in Rp , and vice versa. In other words, each ideal a ≠ (1) in Rp is contained in m,
or equivalently, m is the only maximal ideal in Rp . A commutative ring with an identity
1 ≠ 0 which has exactly one maximal ideal is called a local ring. Hence, Rp is a local
ring. From part (1), we additionally get that the prime ideals of the local ring Rp correspond
bijectively to the prime ideals of R which are contained in p.
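
The local ring in part (2) can be sketched concretely for R = ℤ and p = (5), where Rp consists of the rational numbers whose reduced denominator is not divisible by 5. The following is a minimal sketch under that assumption; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def in_localization(q: Fraction, p: int) -> bool:
    """True if q lies in Z_(p): its reduced denominator is not divisible by p."""
    return q.denominator % p != 0

def in_maximal_ideal(q: Fraction, p: int) -> bool:
    """True if q lies in m = p Z_(p): q is in Z_(p) and p divides its numerator."""
    return in_localization(q, p) and q.numerator % p == 0

def is_unit(q: Fraction, p: int) -> bool:
    """True if q is invertible in Z_(p): both q and 1/q lie in Z_(p)."""
    return q != 0 and in_localization(q, p) and in_localization(1 / q, p)

p = 5
samples = [Fraction(a, b) for a in range(-12, 13) for b in range(1, 13)]
local = [q for q in samples if in_localization(q, p)]

# Every sampled element of Z_(p) is either a unit or lies in m, never both.
dichotomy = all(is_unit(q, p) != in_maximal_ideal(q, p) for q in local)
```

On this sample the non-units are exactly the elements of m, which is the statement that m is the only maximal ideal.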
(3) Now we consider our ring extension A|R as above. Let q be a prime ideal in R.
Claim: If qA ∩ R = q, then there exists a prime ideal Q ⊲ A with Q ∩ R = q (and vice
versa).
Proof of the claim: If S = R \ q, then qA ∩ S = ∅. Hence, qS⁻¹A is a proper ideal in
S⁻¹A, and hence contained in a maximal ideal m in S⁻¹A. Here, qS⁻¹A is the ideal in S⁻¹A
which is generated by q. Define Q = m ∩ A; then Q is a prime ideal in A, and Q ∩ R = q by
part (1), because Q ∩ S = ∅, where S = R \ q.
(4) Now let A|R be an integral extension (A, R integral domains as above). Assume
that R is integrally closed in its quotient field K. Let P ⊲ A be a prime ideal in A and
p = P ∩ R.
Claim: If q ⊲ R is a prime ideal in R with q ⊂ p, then qAP ∩ R = q, where AP denotes the
localization of A at P.
Proof of the claim: An arbitrary β ∈ qAP has the form β = α/s with α ∈ qA (the
ideal in A generated by q) and s ∈ S = A \ P. An integral equation for α ∈ qA over K
has the form αⁿ + aₙ₋₁αⁿ⁻¹ + ⋅ ⋅ ⋅ + a₀ = 0 with ai ∈ q. This can be seen as follows:
we certainly have α = b1 α1 + ⋅ ⋅ ⋅ + bm αm with bi ∈ q and αi ∈ A. The subring
A′ = R[α1 , . . . , αm ] is finitely generated as an R-module, and αA′ ⊂ qA′ . Now, ai ∈ q
follows with the same type of arguments as in the proof of Theorem 20.2.4.
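Now, in addition, let β ∈ R, β ≠ 0. Then, for s = α/β, we have an equation

\[ s^n + \frac{a_{n-1}}{\beta}\, s^{n-1} + \cdots + \frac{a_0}{\beta^n} = 0 \]

over K. But s is integral over R; hence, all aₙ₋ᵢ/βⁱ ∈ R. Since βⁱ ⋅ (aₙ₋ᵢ/βⁱ) = aₙ₋ᵢ ∈ q
and q is a prime ideal, β ∉ q would force all aₙ₋ᵢ/βⁱ ∈ q. Then sⁿ ∈ qA ⊂ pA ⊂ P, and
so s ∈ P, a contradiction. Hence, β ∈ q.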
We are now prepared to prove the last preliminary lemma, which we need for the
proof of Theorem 21.6.3.

Lemma 21.6.7 (Krull’s going up lemma). Let A|R be an integral ring extension of integral
domains, and let R be integrally closed in its quotient field. Let p and q be prime ideals in
R with q ⊂ p. Furthermore, let P be a prime ideal in A with P ∩ R = p. Then there exists a
prime ideal Q in A with Q ∩ R = q, and Q ⊂ P.

Proof. It is enough to show that there exists a prime ideal Q in AP (the localization of
A at P) with Q ∩ R = q. This can be seen from the preceding preparations. By parts (1)
and (2), such a Q has the form Q = Q′ AP with a prime ideal Q′ in A with Q′ ⊂ P, and
Q ∩ A = Q′ . It follows that q = Q′ ∩ R ⊂ P ∩ R = p, so Q′ is the required prime ideal of A.
And the existence of such a Q follows from parts (3) and (4).

Proof of Theorem 21.6.3. First, let m = 0. Then {0} is a maximal ideal in A, and
hence, A = K[α1 , . . . , αn ] is a field. By Corollary 20.3.11, A|K is then algebraic; therefore,
trgd(A|K) = 0. So, Theorem 21.6.3 holds for m = 0.
Now, let m ≥ 1. We use Noether’s normalization theorem. A has a polynomial ring
R = K[y1 , . . . , yr ] of the r independent indeterminates y1 , . . . , yr as a subring, and A|R is
an integral extension. As a polynomial ring over K, the ring R is a unique factorization
domain, and hence, certainly, integrally closed (in its quotient field).
Now, let

{0} = P0 ⊊ P1 ⊊ ⋅ ⋅ ⋅ ⊊ Pm (21.3)

be a maximal strictly ascending chain of prime ideals in A. If we intersect with R, we get
a chain

{0} = p0 ⊂ p1 ⊂ ⋅ ⋅ ⋅ ⊂ pm (21.4)

of prime ideals pi = Pi ∩ R of R. Since A|R is integral, the chain (21.4) is also a strictly
ascending chain. This follows from Krull’s going up lemma (Lemma 21.6.7), because if
pi = pj , then Pi = Pj . If Pm is a maximal ideal in A, then also pm is a maximal ideal in
R, because A|R is integral (consider A/Pm and use Theorem 20.2.19). If the chain (21.3) is
maximal and strictly ascending, then so is the chain (21.4).
Now, let the chain (21.3) be maximal and strictly ascending. If we pass to the residue
class rings Ā = A/P1 and R̄ = R/p1 , then we get the chains of prime ideals

{0} = P̄1 ⊂ P̄2 ⊂ ⋅ ⋅ ⋅ ⊂ P̄m and {0} = p̄1 ⊂ p̄2 ⊂ ⋅ ⋅ ⋅ ⊂ p̄m

for the affine K-algebras Ā and R̄, respectively, each of length one less. By induction,
we may assume that already trgd(Ā|K) = m − 1 = trgd(R̄|K). On the other hand, by
construction, we have trgd(A|K) = trgd(R|K) = r. Finally, to prove Theorem 21.6.3, we
have to show that r = m. If we compare both equations, then r = m follows if
trgd(R̄|K) = r − 1. But this holds by Lemma 21.6.6, since h(p1 ) = 1 by the maximality of
the chain (21.4).

Theorem 21.6.8. Let V be a K-variety in C^n . Then dim(V ) = n − 1 if and only if V = 𝒩 (f ) for
some irreducible polynomial f ∈ K[x1 , . . . , xn ].

Proof. (1) Let V be a K-variety in C^n with dim(V ) = n − 1. The corresponding ideal (in
the sense of Theorem 21.2.4) is by Theorem 21.4.2 a prime ideal p in K[x1 , . . . , xn ]. By
Theorem 21.6.3 and Corollary 21.6.4, we get h(p) = 1 for the height of p, because dim(V ) =
n − 1 (see also Theorem 21.3.2). Since K[x1 , . . . , xn ] is a unique factorization domain, we
get that p = (f ) is a principal ideal by Lemma 21.6.5.
(2) Now let f ∈ K[x1 , . . . , xn ] be irreducible. We have to show that V = 𝒩 (f ) has
dimension n − 1. For that, by Theorem 21.6.3, we have to show that the prime ideal p =
(f ) has the height h(p) = 1. Assume that this is not the case. Then there exists a prime
ideal q ≠ p with {0} ≠ q ⊂ p. Choose g ∈ q, g ≠ 0. Let g = u f^{e_1} π_2^{e_2} ⋅ ⋅ ⋅ π_r^{e_r} be its
prime factorization in K[x1 , . . . , xn ]. Now g ∈ q and f ∉ q, because q ≠ p. Hence, there is
a πi in q ⊊ p = (f ), which is impossible. Therefore h(p) = 1.
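
A small worked instance of Theorem 21.6.8 may help. The polynomial below is an example chosen here, not one from the text; it is irreducible over every field K because it has degree 1 in y.

```latex
\[
f = y - x^2 \in K[x, y], \qquad n = 2, \qquad
V = \mathcal{N}(f) = \{ (t, t^2) \mid t \in C \} \subset C^2 .
\]
\[
K[V] = K[x, y]/(y - x^2) \cong K[x], \qquad
\dim(V) = \operatorname{trgd}(K[x] \mid K) = 1 = n - 1 .
\]
```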

21.7 Exercises
1. Let A = K[a1 , . . . , an ] and C|K be a field extension with C algebraically closed. Show
that there is a K-algebra homomorphism K[a1 , . . . , an ] → C.
2. Let K[x1 , . . . , xn ] be the polynomial ring of the n independent indeterminates
x1 , . . . , xn over the algebraically closed field K. The maximal ideals of K[x1 , . . . , xn ]
are exactly the ideals of the form m(α) = (x1 − α1 , x2 − α2 , . . . , xn − αn ) with
α = (α1 , . . . , αn ) ∈ K n .
3. The nil radical √0 of A = K[a1 , . . . , an ] coincides with the Jacobson radical of A,
   that is, the intersection of all maximal ideals of A.
4. Let R be a commutative ring with 1 ≠ 0. If each prime ideal of R is finitely generated,
then R is noetherian.
5. Prove the theoretical preparations for Krull’s going up lemma in detail.
6. Let K[x1 , . . . , xn ] be the polynomial ring of the n independent indeterminates
x1 , . . . , xn . For each ideal a of K[x1 , . . . , xn ], there exists a natural number m with the
   following property: if f ∈ K[x1 , . . . , xn ] vanishes on the zero set of a, then f^m ∈ a.
7. Let K be a field with char K ≠ 2 and a, b ∈ K⋆ . We consider the polynomial

   f (x, y) = ax² + by² − 1 ∈ K[x, y]

   in the polynomial ring K[x, y] of the independent indeterminates x and y. Let C be the
   algebraic closure of K(x) and β ∈ C with f (x, β) = 0. Show the following:
(i) f is irreducible over the algebraic closure C0 of K (in C).
(ii) trgd(K(x, β)|K) = 1, [K(x, β) : K(x)] = 2, and K is algebraically closed in K(x, β).
22 Algebras and Group Representations
22.1 Group Representations
In Chapter 13, we spoke about group actions. These are homomorphisms from a group G
into the group of permutations of a set S. The way a group G acts on a set S can often be used
to study the structure of the group G, and, in Chapter 13, we used group actions to prove
the important Sylow theorems.
In this chapter, we discuss a very important type of group action called a group
representation or linear representation. This is a homomorphism of a group G into the
set of linear transformations of a vector space V over a field K. It is a finite-dimensional
representation if V is a finite dimensional vector space over K, and infinite-dimensional
otherwise. For an n-dimensional representation, each element of the group G can be
represented by an (n × n)-matrix over K, and the group operation can be represented
by matrix multiplication. As with general group actions, much information about the
structure of the group G can be obtained from representations. In particular, in this
chapter, we will present an important theorem of Burnside, which shows that any finite
group whose order is divisible by only two primes must be solvable.
Representations of groups are important in many areas of mathematics. Group rep-
resentations allow many group-theoretic problems to be reduced to problems in linear
algebra, which is well understood. They are also important in physics and the study of
physical structure, because they describe how the symmetry group of a physical system
affects the solutions of equations describing that system.
The theory of group representations can be divided into several areas depending on
the kind of group being represented. The various areas can be quite different in detail,
though the basic definitions and concepts are the same. The most important areas are:
(1) The theory of finite group representations. Group representations constitute a cru-
cial tool in the study of finite groups. They also arise in applications of finite group
theory to crystallography and to geometry.
(2) Group representations of compact and locally compact groups. Using integration
theory and Haar measure, many of the results on representations of finite groups
can be extended to infinite locally compact groups. The resulting theory is a cen-
tral part of the area of mathematics called harmonic analysis. Pontryagin dual-
ity describes the theory for commutative groups as a generalized Fourier trans-
form.
(3) Representations of Lie Groups. Lie groups are continuous groups with a differen-
tiable structure. Most of the groups that arise in physics and chemistry are Lie
groups, and their representation theory is important to the application of group
theory in those fields.
(4) Linear algebraic groups are the analogues of Lie groups, but over more general
fields than just the reals or complexes. Their representation theory is more compli-
cated than that of Lie groups.

https://doi.org/10.1515/9783111142524-022

For this chapter, we will consider solely the representation theory of finite groups, and
for the remainder of this chapter, when we say group, we mean finite group.

22.2 Representations and Modules


A group representation is a group action on a vector space that respects the vector space
structure. In this section, we examine the basic definitions of group representations
and the ties to general modules over rings, both commutative and non-commutative.
The main reference for this chapter is the book entitled Groups and Representations by
J. L. Alperin and R. B. Bell [1]. We follow the main lines of this book. As we mentioned in
the previous section, throughout the remainder of the chapter, group refers to a finite
group.
Let K be a field, and let G be a group acting on a K-vector space V . We denote this
action by gv for g ∈ G and v ∈ V . The action is called linear if the following hold:
(1) g(v + w) = gv + gw for all g ∈ G, and v, w ∈ V .
(2) g(αv) = α(gv) for all g ∈ G, α ∈ K, and v ∈ V .

Recall that group actions correspond to group homomorphisms into symmetric groups.
For linear actions on a vector space V , we have a stronger result.

Theorem 22.2.1. There is a bijective correspondence between the set of linear actions of
a group G on a K-vector space V and the set of homomorphisms from G into GL(V ), the
group of all invertible linear transformations of V , which is called the general linear group
over V .

Proof. Suppose that ρ : G → GL(V ) is a homomorphism. Then an action of G on V is
defined by setting gv = ρ(g)(v), and it is clear that this action is linear.
Conversely, if we have a linear action of G on V , then we can define a homomorphism
ρ : G → GL(V ) by ρ(g)v = gv. These processes are mutually inverse, which gives
the desired correspondence.
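
The correspondence in Theorem 22.2.1 can be sketched for a small case. The example below is an assumption made for illustration: the cyclic group ℤ/4 (written additively) is represented on ℚ² by powers of a 90-degree rotation matrix, and the induced action gv = ρ(g)(v) is checked to be a homomorphism and additive. The names `rho` and `act` are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as tuples of row tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def apply_matrix(m, v):
    """Apply the matrix m to the column vector v."""
    return tuple(sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m)))

IDENTITY = ((1, 0), (0, 1))
ROTATION = ((0, -1), (1, 0))  # rotation by 90 degrees, an element of order 4

def rho(g):
    """Representation of Z/4 on Q^2 (illustrative choice): g -> ROTATION**g."""
    m = IDENTITY
    for _ in range(g % 4):
        m = matmul(ROTATION, m)
    return m

def act(g, v):
    """The linear action g v := rho(g)(v) induced by rho."""
    return apply_matrix(rho(g), v)

G = range(4)
# rho is a homomorphism: rho(g + h) = rho(g) rho(h).
homomorphism = all(rho((g + h) % 4) == matmul(rho(g), rho(h)) for g in G for h in G)

# The induced action is additive: g(v + w) = gv + gw.
v, w = (1, 2), (3, -1)
additive = all(
    act(g, (v[0] + w[0], v[1] + w[1]))
    == tuple(x + y for x, y in zip(act(g, v), act(g, w)))
    for g in G
)
```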

Definition 22.2.2. A homomorphism ρ : G → GL(V ), where G is a group and V is a
K-vector space, is called a linear representation or group representation of G in V .

From Theorem 22.2.1, it follows that the study of group representations is equivalent
to the study of linear actions of groups. This area of study, with emphasis on finite groups
and finite-dimensional vector spaces, has many applications to finite group theory.
The modern approach to the representation theory of finite groups involves another
equivalent concept, namely that of finitely generated modules over group rings.
In Chapter 18, we considered R-modules over commutative rings R, and used this
study to prove the fundamental theorem of finitely generated modules over principal
ideal domains. In particular, we used the same study to prove the fundamental theorem
of finitely generated Abelian groups. Here we must extend the concepts and allow R to
be a general ring with identity.

Definition 22.2.3. Let R be a ring with identity 1, and let M be an Abelian group written
additively. M is called a left R-module if there is a map R × M → M written as (r, m) ↦ rm
such that the following hold:
(1) 1 ⋅ m = m;
(2) r(m + n) = rm + rn;
(3) (r + s)m = rm + sm;
(4) r(sm) = (rs)m;

for all r, s ∈ R and m, n ∈ M.


We can similarly define the notion of a right R-module via a map from M × R to M
sending (m, r) to mr, which satisfies the analogous properties to those above. If R is com-
mutative, then every left module can in an obvious manner be given a right R-module
structure; hence, it is not necessary in the commutative case to distinguish between left
and right R-modules.
We always use the wording R-module to denote left R-module, unless otherwise
specified.

Definition 22.2.4. An R-module M is finitely generated if there is a finite subset
{m1 , . . . , mk } of M such that every element m of M can be written as an R-linear
combination m = r1 m1 + ⋅ ⋅ ⋅ + rk mk .

Finite minimal generating sets for a given module may have different numbers of elements.
This is in contrast to the situation in free R-modules over a commutative ring R with
identity, where any two finite bases have the same number of elements (Theorem 18.4.6).
In the following, we review the module theory that is necessary for the study of
group representations. The facts we use are straightforward extensions of the respective
facts for modules over commutative rings or for groups.

Definition 22.2.5. Let M be an R-module, and let N be a subgroup of M. Then N is an
R-submodule (or just a submodule) if rn ∈ N for every r ∈ R and n ∈ N.

Example 22.2.6. The R-submodules of a ring R are exactly the left ideals of R (see Chap-
ter 1). Every R-module M has at least two submodules, namely, M itself and the zero
submodule {0}.

Definition 22.2.7. A simple R-module is an R-module M ≠ {0}, which has only M and {0}
as submodules.

If N is a submodule of M, then we may construct the factor group M/N (recall that
M is Abelian). We may give the factor group M/N an R-module structure by defining
r(m + N) = rm + N for every r ∈ R and m + N ∈ M/N. We call M/N the factor R-module,
or just factor module, of M by N.

Definition 22.2.8. Let N1 , N2 be submodules of an R-module M. Then we define the
module sum N1 + N2 by

N1 + N2 = {x + y | x ∈ N1 , y ∈ N2 } ⊂ M.

The sum N1 + N2 and the intersection N1 ∩ N2 are submodules of M. If N1 ∩ N2 = {0}, then
we call the sum N1 + N2 a direct sum and write N1 ⊕ N2 instead of N1 + N2 .
We say that a submodule N of M is a direct summand if there is some other submod-
ule N ′ of M such that M = N ⊕ N ′ . In general, we write kN or N k to denote the direct
sum

N ⊕ N ⊕ ⋅⋅⋅ ⊕ N

of k copies of N.

As for groups, we also have the external notion of a direct sum. If M and N are
R-modules, then we give the Cartesian product M × N an R-module structure by setting
r(m, n) = (rm, rn), and we write M ⊕ N instead of M × N.
The notions of internal and external direct sums can be extended to any finite num-
ber of submodules and modules, respectively.

Definition 22.2.9. A composition series of an R-module M ≠ {0} is a descending series

M = M0 ⊃ M1 ⊃ ⋅ ⋅ ⋅ ⊃ Mk = {0}

of finitely many submodules Mi of M beginning with M and ending with {0}, where the
inclusions are proper, and in which each successive factor module Mi /Mi+1 is a simple
module. We call k the length of the composition series.

Notice the following:


(1) A module need not have a composition series. For example, an infinite Abelian
group, considered as a ℤ-module, does not have a composition series (see Chap-
ter 12).
(2) The analog of the Jordan–Hölder theorem for groups (see Theorem 12.3.3) holds for
modules that have composition series.

Theorem 22.2.10 (Jordan–Hölder theorem for R-modules). If an R-module M ≠ {0} has a
composition series, then any two composition series are equivalent; that is, there exists
a one-to-one correspondence between their respective factor modules. Hence, the factor
modules are unique, and, in particular, the length must be the same.
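
The Jordan–Hölder statement can be checked by brute force on a small example. The sketch below assumes the standard fact that the submodules of the ℤ-module ℤ/nℤ are the subgroups dℤ/nℤ for the divisors d of n, so a composition series corresponds to a chain of divisors with prime ratios. The function names are hypothetical.

```python
from collections import Counter

def prime_divisors(n):
    """Prime divisors of n, by trial division (fine for small n)."""
    return [p for p in range(2, n + 1)
            if n % p == 0 and all(p % q for q in range(2, p))]

def composition_series(n):
    """All chains 1 = d_0 | d_1 | ... | d_k = n with prime ratios d_{i+1}/d_i.

    The divisor d stands for the submodule dZ/nZ of Z/nZ, so each chain
    encodes a composition series Z/nZ = M_0 > M_1 > ... > M_k = {0}.
    """
    if n == 1:
        return [[1]]
    series = []
    for p in prime_divisors(n):
        for tail in composition_series(n // p):
            series.append(tail + [n])
    return series

def factor_orders(chain):
    """Multiset of the orders of the simple factors M_i / M_{i+1}."""
    return Counter(b // a for a, b in zip(chain, chain[1:]))

all_series = composition_series(12)
lengths = {len(chain) - 1 for chain in all_series}
factor_multisets = {frozenset(factor_orders(chain).items()) for chain in all_series}
```

For n = 12 there are three different composition series, but `lengths` and `factor_multisets` each contain a single value, in line with Theorem 22.2.10.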

Therefore, we can speak in a well defined manner about the factor modules of a
composition series. If an R-module M has a composition series, then each submodule N
and each factor module M/N also has a composition series.
If the submodule N and the factor module M/N each have a composition series,
then the module M also has one (see Chapter 13 for the respective proofs for groups).

Definition 22.2.11. Let M and N be R-modules, and let ϕ : M → N be a group homomorphism.
Then ϕ is an R-module homomorphism if ϕ(rm) = rϕ(m) for any r ∈ R and m ∈ M.
As for all other structures, we define monomorphism, epimorphism, isomorphism,
and automorphism of R-modules in analogy with the definition for groups.

In analogy with groups, we have the following results:

Theorem 22.2.12 (First isomorphism theorem). Let M and N be R-modules, and ϕ : M → N
an R-module homomorphism.
(1) The kernel ker(ϕ) = {m ∈ M | ϕ(m) = 0} of ϕ is a submodule of M.
(2) The image Im(ϕ) = {n ∈ N | ϕ(m) = n for some m ∈ M} of ϕ is a submodule of N.
(3) The R-modules M/ker(ϕ) and Im(ϕ) are isomorphic via the map induced by ϕ.

If the R-modules M and N are R-module isomorphic, then we write M ≅ N.

Corollary 22.2.13. An R-module homomorphism ϕ : M → N is injective if and only if
ker(ϕ) = {0}.

Theorem 22.2.14 (Second isomorphism theorem). Let N1 , N2 be submodules of an
R-module M. Then
(N1 + N2 )/N2 ≅ N1 /(N1 ∩ N2 ).
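
For orientation, here is a worked instance of the second isomorphism theorem for ℤ-modules. The choice of submodules is an illustration made here and is not taken from the text.

```latex
\[
M = \mathbb{Z}, \quad N_1 = 4\mathbb{Z}, \quad N_2 = 6\mathbb{Z}, \qquad
N_1 + N_2 = 2\mathbb{Z}, \quad N_1 \cap N_2 = 12\mathbb{Z}.
\]
\[
(N_1 + N_2)/N_2 = 2\mathbb{Z}/6\mathbb{Z} \;\cong\; \mathbb{Z}/3\mathbb{Z}
\;\cong\; 4\mathbb{Z}/12\mathbb{Z} = N_1/(N_1 \cap N_2).
\]
```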

Theorem 22.2.15 (Schur’s lemma). Let M and N be simple R-modules, and let ϕ : M → N
be a nonzero R-module homomorphism. Then ϕ is an R-module isomorphism.

Proof. Since M is simple, we must have either ker(ϕ) = M or ker(ϕ) = {0}.
If ker(ϕ) = M, then ϕ = 0, the zero homomorphism. Since ϕ ≠ 0, we get ker(ϕ) = {0}, so ϕ
is injective. Moreover, Im(ϕ) is a nonzero submodule of the simple module N; hence,
Im(ϕ) = N. Therefore, ϕ is an R-module isomorphism.

Group Rings and Modules over Group Rings


We now introduce the class of rings, whose modules we will study for group represen-
tations. They form the class of group algebras.

Definition 22.2.16. Let R be a ring and G a group. Then the group ring of G over R,
denoted by RG, consists of all finite R-linear combinations of elements of G. This is the set
of linear combinations of the form

\[ \Bigl\{ \sum_{g \in G} \alpha_g g \;\Big|\; \text{all } \alpha_g \in R \Bigr\}. \]

For addition in RG, we take the rule

\[ \sum_{g \in G} \alpha_g g + \sum_{g \in G} \beta_g g = \sum_{g \in G} (\alpha_g + \beta_g) g. \]

Multiplication in RG is defined by extending the multiplication in G:

\[
\Bigl( \sum_{g \in G} \alpha_g g \Bigr) \Bigl( \sum_{g \in G} \beta_g g \Bigr)
= \sum_{g \in G} \sum_{h \in G} \alpha_g \beta_h \, gh
= \sum_{x \in G} \Bigl( \sum_{g \in G} \alpha_g \beta_{g^{-1}x} \Bigr) x.
\]

The group ring RG has an identity element, which coincides with the identity element
of G. We usually denote this by just 1.

From the viewpoint of abstract group theory, it is of interest to consider the case,
where the underlying ring is an integral domain. In this connection, we mention the
famous zero divisor conjecture by Higman and Kaplansky, which poses the question
whether every group ring RG of a torsion-free group G over an integral domain R or
over a field K has no zero divisors.
The conjecture has been proved only for a fairly restricted class of torsion-free
groups.
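
The multiplication rule of Definition 22.2.16 can be sketched in a few lines. The representation of group ring elements as dictionaries and the choice G = S3 with rational coefficients are assumptions made for this illustration; the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def compose(g, h):
    """Product gh in S_3, with permutations as tuples: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(h)))

def add(a, b):
    """Add two group ring elements given as dicts {group element: coefficient}."""
    out = defaultdict(Fraction)
    for elem in (a, b):
        for g, c in elem.items():
            out[g] += c
    return {g: c for g, c in out.items() if c != 0}

def multiply(a, b):
    """Extend the group law bilinearly: (sum a_g g)(sum b_h h) = sum a_g b_h gh."""
    out = defaultdict(Fraction)
    for g, cg in a.items():
        for h, ch in b.items():
            out[compose(g, h)] += cg * ch
    return {x: c for x, c in out.items() if c != 0}

e = (0, 1, 2)                            # identity permutation
s = (1, 0, 2)                            # a transposition, so s*s = e
one = {e: Fraction(1)}                   # the identity 1 of QG
a = {e: Fraction(1), s: Fraction(1)}     # 1 + s
b = {e: Fraction(1), s: Fraction(-1)}    # 1 - s

product = multiply(a, b)                 # (1 + s)(1 - s) = 1 - s*s = 0
```

Here `product` is the empty dictionary, that is, the zero element. Since s has finite order, this shows why the zero divisor conjecture is only posed for torsion-free groups.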
In this chapter, we will primarily consider the case where R = K is a field and the
group G is finite, in which case the group ring KG is not only a ring, but also a finite-
dimensional K-vector space having G as a basis. In this case, KG is called the group alge-
bra.
In mathematics, in general, an algebra over a field K is a K-vector space with a
bilinear product that makes it a ring. That is, an algebra over K is an algebraic structure
A with both a ring structure and a K-vector space structure that are compatible, meaning
that α(ab) = (αa)b = a(αb) for any α ∈ K and a, b ∈ A. An algebra is finite-dimensional if it
has finite dimension as a K-vector space.

Example 22.2.17. (1) The matrix ring M(n, K) is a finite-dimensional K-algebra for any
natural number n.
(2) The group ring KG is a finite-dimensional K-algebra when the group G is finite.

Definition 22.2.18. A homomorphism of K-algebras is a ring homomorphism that is
also a K-linear transformation.

Modules over a group algebra KG can also be considered as K-vector spaces with
α ∈ K acting as α ⋅ 1 ∈ KG.

Lemma 22.2.19. If K is a field, and G is a finite group, then a KG-module is finitely gener-
ated if and only if it is finite-dimensional as a K-vector space.

Proof. If V is generated as a KG-module by {v1 , . . . , vk }, then V is generated as a K-vector
space by the finite set {gvi | g ∈ G, 1 ≤ i ≤ k}, and hence has finite dimension as a
K-vector space. The converse is clear.

We now describe the fundamental connections between modules over group alge-
bras and group representation theory.

Theorem 22.2.20. If K is a field and G is a finite group, then there is a one-to-one cor-
respondence between finitely generated KG-modules and linear actions of G on finite-
dimensional K-vector spaces V , and hence with the homomorphisms ρ : G → GL(V )
for finite dimensional K-vector spaces V .

Proof. If V is a finitely generated KG-module, then dimK (V ) < ∞ by Lemma 22.2.19,
and the map from G × V to V obtained by restricting the module structure map from
KG × V to V is a linear action.
Conversely, let V be a finite-dimensional K-vector space, on which G acts linearly.
Then we place a KG-module structure on V by defining

\[
\Bigl( \sum_{g \in G} \alpha_g g \Bigr) v = \sum_{g \in G} \alpha_g (g v)
\quad \text{for } \sum_{g \in G} \alpha_g g \in KG \text{ and } v \in V.
\]

The processes are inverses of each other.

To define a KG-module structure on a K-vector space V , it suffices to stipulate the
action of the elements of G on V . The action of arbitrary elements of KG on V is then
defined by extending linearly.
As indicated, for the remainder of this section, G will denote a finite group, and K will
denote a field. All K-vector spaces will be finite-dimensional, and all KG-modules will be
finitely generated and hence of finite dimension as K-vector spaces. Our attention will
primarily be on KG-modules, although on occasion it will be convenient to work with
the linear representation ρ : G → GL(V ) with ρ(g)(v) = gv for g ∈ G, v ∈ V arising from a
given KG-module V .

Example 22.2.21. (1) The field K can always be considered as a KG-module by defining
gλ = λ for all g ∈ G and λ ∈ K. This module is called the trivial module.
(2) Let G act on the finite set X = {x1 , . . . , xn }. Let KX be the set

\[ \Bigl\{ \sum_{i=1}^{n} c_i x_i \;\Big|\; c_i \in K,\ x_i \in X \text{ for } i = 1, \dots, n \Bigr\} \]

    of all formal K-linear combinations of elements of X. This then has a
    K-vector space structure with basis X. On KX, we may define a KG-module structure in
    the following manner: If g ∈ G and ∑_{i=1}^{n} ci xi ∈ KX, then

\[ g \Bigl( \sum_{i=1}^{n} c_i x_i \Bigr) = \sum_{i=1}^{n} c_i (g x_i). \]

These modules are called the permutation modules.


(3) Let U, V be KG-modules. Then the (external) direct sum U ⊕ V has a KG-module
structure given by

g(u, v) = (gu, gv).

(4) Let U, V be KG-modules, and let HomK (U, V ) be the set of all K-linear maps
    from U to V . For ϕ, ψ ∈ HomK (U, V ) define ϕ + ψ ∈ HomK (U, V ) by

    (ϕ + ψ)(u) = ϕ(u) + ψ(u).

    With this definition HomK (U, V ) is an Abelian group. Furthermore, HomK (U, V )
    is a K-vector space with (λϕ)(u) = λϕ(u) for λ ∈ K, u ∈ U and ϕ ∈ HomK (U, V ).
    Note that this K-vector space has finite dimension. The K-vector space HomK (U, V )
    also admits a natural KG-module structure. For g ∈ G and ϕ ∈ HomK (U, V ), we
    define

    gϕ : U → V by (gϕ)(u) = g(ϕ(g⁻¹u)).

    It is clear that gϕ ∈ HomK (U, V ).

For g1 , g2 ∈ G and ϕ ∈ HomK (U, V ), we have

\[
((g_1 g_2)\phi)(u) = g_1 g_2\, \phi\bigl((g_1 g_2)^{-1} u\bigr)
= g_1 \bigl( g_2\, \phi(g_2^{-1}(g_1^{-1} u)) \bigr)
= g_1 \bigl( (g_2 \phi)(g_1^{-1} u) \bigr)
= (g_1 (g_2 \phi))(u).
\]

Therefore, (g1 g2 )ϕ = g1 (g2 ϕ). It follows that HomK (U, V ) has a KG-module structure.
We write U⋆ for HomK (U, K), where K is the trivial
module. U⋆ is called the dual module of U, and here we have (gϕ)(u) = ϕ(g⁻¹u).
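
The permutation module from item (2) of Example 22.2.21 can be sketched as follows. The sketch assumes G = S3 acting on X = {x0, x1, x2} by permuting indices, and stores an element of KX as its tuple of coefficients; these conventions and the function names are choices made for the illustration.

```python
from itertools import permutations

def compose(g, h):
    """Product gh in S_3, with permutations as tuples: (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(h)))

def act(g, c):
    """g(sum c_i x_i) = sum c_i x_{g(i)}: the coefficient of x_{g(i)} becomes c_i."""
    out = [0] * len(c)
    for i, ci in enumerate(c):
        out[g[i]] = ci
    return tuple(out)

G = list(permutations(range(3)))
v = (5, -2, 7)

# Module axiom (gh)v = g(hv), checked for all pairs of group elements.
is_action = all(act(compose(g, h), v) == act(g, act(h, v)) for g in G for h in G)

# The vector x0 + x1 + x2 is fixed by G, so it spans a trivial submodule of KX.
fixed = all(act(g, (1, 1, 1)) == (1, 1, 1) for g in G)
```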

Theorem 22.2.22 (Maschke’s Theorem). Let G be a finite group, and suppose that the
characteristic of K is either 0 or coprime to |G|; that is, char(K) = 0 or
gcd(char(K), |G|) = 1. If U is a KG-module and V is a KG-submodule of U, then V is a
direct summand of U as KG-modules.

Proof. U is, in particular, a finite-dimensional K-vector space, and V is a K-subspace.
Any basis for V can be extended to a basis of U. Hence, there is some subspace W of U
such that U = V ⊕ W as K-vector spaces. However, W may not be a KG-submodule of U.
Let π : U → V be the projection of U onto V in terms of the vector space decomposition,
so that π is the unique linear transformation that is the identity on V and zero
on W . We now define a linear transformation

π′ : U → U

by

\[ \pi'(u) = \frac{1}{|G|} \sum_{g \in G} g\,\pi(g^{-1} u) \quad \text{for } u \in U. \]

Since char(K) = 0, or gcd(char(K), |G|) = 1, it follows that |G| ≠ 0 in K; hence, 1/|G|
exists in K. Therefore, the definition of π′ makes sense.

We have gv ∈ V for any g ∈ G and v ∈ V , because V is a KG-submodule of U.
Therefore, the map π′ maps U into V . Moreover, since π is the identity on V , we have
that gπ(g⁻¹v) = gg⁻¹(v) = v for any g ∈ G and v ∈ V . Therefore, the restriction of π′ to V
is the identity. It also follows that U = V ⊕ ker(π′) as K-vector spaces. It remains to show
that ker(π′) is a KG-submodule of U.
To show this, it is sufficient to show that π′ is a KG-module homomorphism; that is,
we must show that π′(xu) = xπ′(u) for any x ∈ G and u ∈ U. We have

\[
\begin{aligned}
\pi'(xu) &= \frac{1}{|G|} \sum_{g \in G} g\,\pi(g^{-1} x u) \\
         &= \frac{1}{|G|} \sum_{g \in G} x x^{-1} g\,\pi(g^{-1} x u) \\
         &= x \Bigl( \frac{1}{|G|} \sum_{g \in G} x^{-1} g\,\pi(g^{-1} x u) \Bigr).
\end{aligned}
\]

But as g varies through G, so does y = x⁻¹g for fixed x ∈ G.
Therefore,

\[ \pi'(xu) = x \Bigl( \frac{1}{|G|} \sum_{y \in G} y\,\pi(y^{-1} u) \Bigr) = x\,\pi'(u) \]

as required.
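
The averaging construction in the proof can be traced on a two-dimensional example. The sketch below assumes G = ℤ/2 acting on U = ℚ² by swapping coordinates, V = span{(1, 1)}, and the non-invariant vector space complement W = span{(1, 0)}; all of these are illustrative choices, and the function names are hypothetical.

```python
from fractions import Fraction

def identity(u):
    """The identity element of G = Z/2 acting on Q^2."""
    return u

def swap(u):
    """The nontrivial element of G = Z/2, acting by swapping coordinates."""
    return (u[1], u[0])

G = [identity, swap]  # every element of Z/2 is its own inverse

def pi(u):
    """Projection onto V = span{(1,1)} along W = span{(1,0)}.

    Writing (a, b) = b*(1, 1) + (a - b)*(1, 0) gives pi(a, b) = (b, b).
    """
    a, b = u
    return (b, b)

def pi_prime(u):
    """Averaged projection pi'(u) = (1/|G|) * sum over g of g pi(g^{-1} u)."""
    total = (Fraction(0), Fraction(0))
    for g in G:
        image = g(pi(g(u)))  # g^{-1} = g in Z/2
        total = (total[0] + image[0], total[1] + image[1])
    return (total[0] / len(G), total[1] / len(G))
```

In this example π does not commute with the swap, while π′ does, and ker(π′) = span{(1, −1)} is a G-invariant complement of V.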

Definition 22.2.23. A module U is semisimple if it is a direct sum of simple modules. If
U = {0}, then the sum is the empty sum.

Corollary 22.2.24. Let G be a finite group and K a field. Suppose that either char(K) = 0
or char(K) is relatively prime to |G|. Then every nonzero KG-module is semisimple.

Proof. Let U be a nonzero KG-module. We use induction on dimK (U). If U is simple, we
are done. This includes the case where dimK (U) = 1. Suppose that dimK (U) > 1, and
assume that U is not simple. Then U must have a nonzero proper KG-submodule V . By
Maschke’s theorem, we have U = V ⊕ W for some nonzero proper KG-submodule W
of U. Then both V and W have dimension strictly less than dimK (U). By the induction
hypothesis, both V and W are semisimple; therefore, U is semisimple.

We now present a version of Maschke’s theorem for linear group representations
ρ : G → GL(U), where ρ(g)(u) = gu for g ∈ G, u ∈ U, which arise from a given
KG-module U. To formulate Maschke’s result, we need some additional definitions and
notation.

Definition 22.2.25. (1) A K-vector subspace V of U is a G-invariant subspace if gv ∈ V
    for all g ∈ G and v ∈ V .
(2) Let U be nonzero. A representation ρ : G → GL(U) is irreducible if {0} and U are the
only G-invariant subspaces of U.
(3) Let U be nonzero. A representation ρ : G → GL(U) is fully reducible if each
G-invariant subspace V of U has a G-invariant complement W in U; that is, U =
V ⊕ W as K-vector spaces.

Theorem 22.2.26 (Maschke’s theorem). Let G be a finite group and K a field. Suppose that
either char(K) = 0 or char(K) is relatively prime to |G|. Let U be a finite-dimensional
K-vector space. Then each representation ρ : G → GL(U) is fully reducible.

Proof. By Theorem 22.2.1, we may consider U as a KG-module. Then the above version
of Maschke’s theorem follows from the proof for modules, because the KG-submodules
of U together with the respective definitions for group representations represent the
G-invariant subspaces of U.
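The averaging argument behind Maschke’s theorem can be checked on a small example. The following is a minimal sketch, assuming G = S_3 acting on ℚ^3 by permuting coordinates and V = span{(1, 1, 1)}; the helper names (perm_matrix, averaged_projection, and so on) are ours and purely illustrative. Starting from a projection π onto V that does not commute with G, it forms π′ = (1/|G|) ∑ g π g⁻¹ and checks that π′ is still a projection and now commutes with every group element.

```python
from fractions import Fraction
from itertools import permutations


def perm_matrix(p):
    """Matrix of the permutation p, sending the basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[Fraction(int(p[j] == i)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_perm(p):
    """Inverse of the permutation p (a tuple of images)."""
    q = [0] * len(p)
    for j, image in enumerate(p):
        q[image] = j
    return tuple(q)


def averaged_projection(group, pi):
    """Return (1/|G|) * sum over g in G of rho(g) pi rho(g)^(-1)."""
    n = len(pi)
    total = [[Fraction(0)] * n for _ in range(n)]
    for p in group:
        term = matmul(matmul(perm_matrix(p), pi), perm_matrix(inverse_perm(p)))
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(group) for entry in row] for row in total]


# Assumed example: G = S_3 permuting the coordinates of Q^3.
group = list(permutations(range(3)))
# pi(v) = v_1 * (1, 1, 1) projects onto V = span{(1, 1, 1)} but is not G-equivariant.
pi = [[Fraction(1), Fraction(0), Fraction(0)] for _ in range(3)]
pi_avg = averaged_projection(group, pi)

commutes = all(matmul(perm_matrix(p), pi_avg) == matmul(pi_avg, perm_matrix(p))
               for p in group)
print(pi_avg == matmul(pi_avg, pi_avg), commutes)
```

The kernel of the averaged projection is then a G-invariant complement of V, which is the content of full reducibility in this example.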

The theory of KG-modules when char(K) = p > 0 and p divides |G|, in which case
arbitrary KG-modules need not be semisimple, is called modular representation theory.
The earliest work on modular representations was done by Dickson, and many of
the main developments are due to Brauer. More details and a good overview may be
found in [1], [4], [5], and [18].

22.3 Semisimple Algebras and Wedderburn’s Theorem


In this section, K will denote a field and all algebras will be finite dimensional K-algebras
and, unless explicitly stated otherwise, will be algebras with an identity element. All
modules and algebras are assumed to be finitely generated or equivalently finite-
dimensional as K-vector spaces. All direct sums of modules will be assumed to be
finite. Let A be an algebra. We are interested in semisimple A-modules, and want to
determine conditions on A so that every A-module is semisimple.

Lemma 22.3.1. Let M be an A-module. Then the following are equivalent:


(1) Any submodule of M is a direct summand of M.
(2) M is semisimple.
(3) M is a sum of simple submodules.

Proof. The implication (1) ⟹ (2) follows in the same manner as Corollary 22.2.24. The
implication (2) ⟹ (3) is direct.
Finally, we must show the implication (3) ⟹ (1). Suppose that (3) holds, and let
N be a submodule of M. Let V be a submodule of M that is maximal among all
submodules of M that intersect N trivially. Such a submodule V exists by Zorn’s lemma.
We wish to show that N + V = M. Suppose that N + V ≠ M (certainly we have N + V ⊂ M). If
every simple submodule of M were contained in N + V, then as M can be written as a sum
of simple submodules, we would have M ⊂ N + V. This is not the case, since N + V ≠ M.
Hence, there is some simple submodule S of M that is not contained in N + V. Since
S ∩ (N + V) is a proper submodule of the simple module S, we must have S ∩ (N + V) = {0}.
In particular, S ∩ V = {0}, so V is properly contained in V + S. Let n ∈ N ∩ (V + S).
Then n = s + v for some v ∈ V and s ∈ S. This gives s = n − v ∈ S ∩ (N + V), and
therefore s = 0. Hence, n = v, which forces n to be 0, because N ∩ V = {0}. It follows
that N ∩ (V + S) = {0}, which contradicts the maximality of V. Hence, we have M = N + V.
Furthermore, since N ∩ V = {0}, the sum is direct and M = N ⊕ V. Therefore, N is a
direct summand of M, which proves the implication (3) ⟹ (1) and completes the proof
of the lemma.

Lemma 22.3.2. Submodules and factor modules of semisimple modules are also semi-
simple.

Proof. Let M be a semisimple A-module. By the previous lemma and the isomorphism
theorem for modules, we get that every submodule of M is isomorphic to a factor module
of M. Therefore, it suffices to show that factor modules of M are semisimple. Let M/N
be an arbitrary factor module, and let η : M → M/N with m ↦ m + N be the canonical
map. Since M is semisimple, we have M = S1 + ⋅ ⋅ ⋅ + Sn with n ∈ ℕ, and each Si a simple
module. Then M/N = η(M) = η(S1 ) + ⋅ ⋅ ⋅ + η(Sn ). But each η(Si ) is isomorphic to a factor
module of Si , and hence each η(Si ) is either {0} or a simple module. Therefore, M/N is a
sum of simple modules, and hence semisimple by Lemma 22.3.1.

Definition 22.3.3. An algebra A is semisimple if all nonzero A-modules are semisimple.

Note that if G is a finite group, and either char(K) = 0 or gcd(char(K), |G|) = 1, then
KG is semisimple.
We now give some fundamental results on semisimple algebras.

Lemma 22.3.4. The algebra A is semisimple if and only if the A-module A is semisimple.

Proof. Suppose that the A-module A is semisimple, and let M be an A-module generated
by {m1 , . . . , mr }.
Let A^r denote the direct sum of r copies of A. Then (a1, . . . , ar) ↦ a1 m1 + ⋅ ⋅ ⋅ + ar mr defines
a map from A^r to M, which is an A-module epimorphism. Thus, M is isomorphic to a
factor module of the semisimple module A^r, and hence semisimple by Lemma 22.3.2. It
follows that A is a semisimple algebra.
The converse is clear.

Theorem 22.3.5. Let A be a semisimple algebra, and suppose that as an A-module, we
have

A ≅ S1 ⊕ ⋅ ⋅ ⋅ ⊕ Sr, r ∈ ℕ,

where the Si are simple submodules of A. Then any simple A-module is isomorphic to
some Si.

Proof. Let S be a simple A-module and s ∈ S with s ≠ 0. We define an A-module
homomorphism ϕ : A → S by ϕ(a) = as for a ∈ A. Since S is simple and ϕ(1) = s ≠ 0,
the map ϕ is surjective.
For each i, let ϕi = ϕ|Si be the restriction of ϕ to Si. If ϕi = 0 for all i, then we would
have ϕ = 0. Hence, ϕi is nonzero for some i, and it follows from Schur’s lemma that
ϕi : Si → S is an isomorphism for such an i.

Theorem 22.3.6. Suppose that A is a semisimple algebra, and let S1, . . . , Sr be a collection
of simple A-modules such that every simple A-module is isomorphic to exactly one Si.
Let M be an A-module, and let

M ≅ m1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ mr Sr

for some integers mi ∈ ℕ ∪ {0}. Then the mi are uniquely determined.

Proof. There is a composition series of m1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ mr Sr having m1 + ⋅ ⋅ ⋅ + mr terms,
in which Si appears mi times as a composition factor. The result then follows from the
Jordan–Hölder theorem for modules (Theorem 22.2.10).

Whenever modules are written in the form m1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ mr Sr, as in the previous
theorem, we will say that the Si are the distinct simple A-modules; in particular, the Si
are pairwise nonisomorphic.
We want to classify all semisimple algebras. We start by showing the semisimplicity
of a certain class of algebras, and then show that all semisimple algebras fall in this
class. We will introduce this class in steps.
Let D be a finite-dimensional K-algebra. Then for any n ∈ ℕ, the set M(n, D) of
(n × n)-matrices with entries in D is a finite-dimensional K-algebra of dimension n^2 dimK(D).
Algebras of this form are called matrix algebras over D.
For 1 ≤ i, j ≤ n, and α ∈ D, let Eij(α) be the matrix whose (i, j)-th entry is α and all
of whose other entries are 0.
Let D^n be the set of column vectors of length n with entries from D. Then D^n forms
an M(n, D)-module under matrix multiplication.

Definition 22.3.7. An algebra D is a division algebra or skew field if the nonzero elements
of D form a group under multiplication. Equivalently, it is a ring in which every nonzero
element has a multiplicative inverse. It is exactly the definition of a field without
requiring commutativity of the multiplication.

Any field K is a division algebra over itself, but there may be division algebras that
are noncommutative. If the interest is on the ring structure of D, one often speaks about
division rings (see Chapter 7).

Theorem 22.3.8. Let D be a division algebra and n ∈ ℕ. Then any simple M(n, D)-module
is isomorphic to D^n, and M(n, D) is an M(n, D)-module isomorphic to the direct sum of n
copies of D^n. In particular, M(n, D) is a semisimple algebra.

Proof. A nonzero submodule of D^n must contain some nonzero vector, which must have
a nonzero entry x in the j-th place for some j. This x is invertible in D.
By premultiplying this vector by Ejj(x⁻¹), we see that the submodule contains the j-th
canonical basis vector. By premultiplying this basis vector by appropriate permutation
matrices, we get that the submodule contains every canonical basis vector, and hence
contains every vector.
It follows that D^n is the only nonzero M(n, D)-submodule of D^n, and hence D^n is
simple. Now for each 1 ≤ k ≤ n, let Ck be the submodule of M(n, D) consisting of those
matrices whose only nonzero entries appear in the k-th column. Then we have

\[
M(n, D) \cong \bigoplus_{k=1}^{n} C_k
\]

as M(n, D)-modules. But each Ck is isomorphic as an M(n, D)-module to D^n.
It follows that M(n, D) is a semisimple algebra by Lemma 22.3.4, and then D^n is, up to
isomorphism, the unique simple M(n, D)-module by Theorem 22.3.5.

Definition 22.3.9. A nonzero algebra is simple if its only (two-sided) ideals (as a ring)
are itself and the zero ideal.

Lemma 22.3.10. Simple algebras are semisimple.

Proof. Let A be a simple algebra, and let Σ be the sum of all simple submodules of A. Let
S be a simple submodule of A, and let a ∈ A. Then the map ϕ : S → Sa, given by s ↦ sa,
is a module epimorphism. Therefore, Sa is simple, or Sa = {0}. In either case, we have
Sa ⊂ Σ for any simple submodule S and any a ∈ A.
It follows that Σ is a right ideal in A. Since Σ is also a left ideal, being a sum of
submodules of A, it is a two-sided ideal. However, A is simple, and Σ ≠ {0} because the
nonzero finite-dimensional module A contains a simple submodule, so we must have
Σ = A. Therefore, A is the sum of simple A-modules, and from Lemmas 22.3.1 and 22.3.4,
it follows that A is a semisimple algebra.

Theorem 22.3.11. Let D be a division algebra, and let n ∈ ℕ. Then M(n, D) is a simple
algebra.

Proof. Let M ∈ M(n, D) with M ≠ 0. We must show that the principal two-sided ideal
J of M(n, D) generated by M is equal to M(n, D).
It suffices to show that J contains each Eij(1), since these matrices generate M(n, D)
as an M(n, D)-module. Since M ≠ 0, there exist some 1 ≤ r, s ≤ n such that the
(r, s)-entry of M is nonzero. We call this entry x. By calculation, we have

Ess(1) = Esr(x⁻¹) M Ess(1) ∈ J.

Now let 1 ≤ i, j ≤ n, and let w, w′ be the permutation matrices corresponding to the
transpositions (i, s) and (s, j), respectively. Then Eij(1) = w Ess(1) w′ ∈ J.
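The two matrix identities used in this proof are easy to verify by direct computation. The following is a small sketch, assuming D = ℚ and n = 3 purely for illustration; the function names (unit, transposition, matrix_units_from) are hypothetical helpers and indices are 0-based. Starting from an arbitrary nonzero matrix M, it recovers every matrix unit Eij(1) inside the two-sided ideal generated by M.

```python
from fractions import Fraction


def unit(n, i, j, alpha):
    """E_ij(alpha): the n x n matrix with alpha in position (i, j) and 0 elsewhere."""
    m = [[Fraction(0)] * n for _ in range(n)]
    m[i][j] = Fraction(alpha)
    return m


def transposition(n, i, j):
    """Permutation matrix of the transposition (i j); the identity if i == j."""
    m = [[Fraction(int(a == b)) for b in range(n)] for a in range(n)]
    m[i], m[j] = m[j], m[i]
    return m


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_units_from(M):
    """Build every E_ij(1) from a nonzero matrix M, following the proof."""
    n = len(M)
    # Pick a nonzero entry x = M[r][s].
    r, s = next((r, s) for r in range(n) for s in range(n) if M[r][s] != 0)
    x = M[r][s]
    # E_ss(1) = E_sr(x^-1) M E_ss(1)
    e_ss = matmul(matmul(unit(n, s, r, 1 / x), M), unit(n, s, s, 1))
    # E_ij(1) = w E_ss(1) w' with w, w' the transpositions (i s) and (s j)
    return {(i, j): matmul(matmul(transposition(n, i, s), e_ss),
                           transposition(n, s, j))
            for i in range(n) for j in range(n)}


# Assumed sample matrix over Q.
M = [[Fraction(0), Fraction(2), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(0)],
     [Fraction(3), Fraction(0), Fraction(1)]]
units = matrix_units_from(M)
print(all(units[(i, j)] == unit(3, i, j, 1) for i in range(3) for j in range(3)))
```

Over a noncommutative division algebra the same computation goes through, provided the inverse x⁻¹ is applied on the left as in the displayed formula.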

Let B1 , . . . , Br be algebras. The external direct sum B = B1 ⊕B2 ⊕⋅ ⋅ ⋅⊕Br is the algebra,
whose underlying set is the Cartesian product, and whose addition, multiplication, and
scalar multiplication are defined componentwise.
If M is a Bi -module for some i, then M has a B-module structure given by

(b1 , . . . , br )m = bi m.

If M is simple (respectively semisimple) as a Bi-module, then M is also simple
(respectively semisimple) as a B-module. For each i, the set of elements of B whose only
nonzero entry is in the i-th component of B is an ideal in B, and this ideal is B-module
isomorphic to Bi.
Now suppose that B is an algebra having ideals B1 , . . . , Br such that, as vector spaces,
B is the direct sum of the Bi . Then B is isomorphic to the external direct sum B1 ⊕ ⋅ ⋅ ⋅ ⊕ Br
by the map

b = b1 + ⋅ ⋅ ⋅ + br ↦ (b1, . . . , br).

The algebra B is the internal direct sum, as algebras, of the Bi. This can be seen as follows.
If i ≠ j and bi ∈ Bi, bj ∈ Bj, then we must have bi bj ∈ Bi ∩ Bj = {0}, since Bi and Bj are
ideals. Therefore, the product in B of b1 + ⋅ ⋅ ⋅ + br and b′1 + ⋅ ⋅ ⋅ + b′r is just b1 b′1 + ⋅ ⋅ ⋅ + br b′r.

Lemma 22.3.12. Let B = B1 ⊕ ⋅ ⋅ ⋅ ⊕ Br be a direct sum of algebras. Then the (two-sided)
ideals of B are precisely the sets of the form J1 ⊕ ⋅ ⋅ ⋅ ⊕ Jr, where Ji is a (two-sided) ideal of
Bi for each i.

Proof. Let J be a (two-sided) ideal of B, and let Ji = J ∩ Bi for each i. Certainly,
J1 ⊕ ⋅ ⋅ ⋅ ⊕ Jr ⊂ J. Let b ∈ J, then b = b1 + ⋅ ⋅ ⋅ + br with bi ∈ Bi for each i. For each i,
consider ei = (0, . . . , 0, 1, 0, . . . , 0); that is, the element of B whose only nonzero entry
is the identity element of Bi. Then bi = b ei ∈ J ∩ Bi = Ji. Therefore, b ∈ J1 ⊕ ⋅ ⋅ ⋅ ⊕ Jr,
which shows that J = J1 ⊕ ⋅ ⋅ ⋅ ⊕ Jr.
The converse is clear.

Theorem 22.3.13. Let r ∈ ℕ. For each 1 ≤ i ≤ r, let Di be a division algebra over K.
Let ni ∈ ℕ, and let Bi = M(ni, Di). Let B be the external direct sum of the Bi. Then B is a
semisimple algebra having exactly r isomorphism classes of simple modules and exactly
2^r (two-sided) ideals, namely, every sum of the form ⨁_{j∈J} Bj, where J is a subset of {1, . . . , r}.

Proof. For each i, we write Bi = Ci1 ⊕ ⋅ ⋅ ⋅ ⊕ Cin with n = ni using Theorem 22.3.8, where
the Cij are mutually isomorphic simple Bi-modules. As we saw above, each Cij is also
simple as a B-module. Therefore, as B-modules, we have B ≅ ⨁_{i,j} Cij, and hence B is a
semisimple algebra by Lemma 22.3.4. From Theorem 22.3.5, we get that any simple
B-module is isomorphic to some Cij, but Cij ≅ Ckl if and only if i = k. Hence, there are
exactly r isomorphism classes of simple B-modules. The final statement is a
straightforward consequence of Theorem 22.3.11 and Lemma 22.3.12.

We saw that a direct sum of matrix algebras over division algebras is semisimple.
We now start to show that the converse is also true; that is, any semisimple algebra is
isomorphic to a direct sum of matrix algebras over division algebras. This is Wedderburn’s
theorem.

Definition 22.3.14. If M is an A-module, then let EndA(M) = HomA(M, M) denote the
set of all A-module endomorphisms of M. In a more general context, we have seen that
such sets of homomorphisms carry a module structure; here EndA(M) is a K-vector space via

(ϕ + ψ)(m) = ϕ(m) + ψ(m),
(λϕ)(m) = ϕ(λm)

for all ϕ, ψ ∈ EndA(M), λ ∈ K, and m ∈ M. The composition of mappings gives a
multiplication in EndA(M), and hence EndA(M) is a K-algebra, called the endomorphism
algebra of M.

Definition 22.3.15. The opposite algebra of B, denoted B^op, is the set B together with the
usual addition and scalar multiplication, but with the opposite multiplication, that is,
the multiplication rule of B reversed.

Given a, b ∈ B, we use ab to denote their product in B, and a ⋅ b to denote their
product in B^op. Hence, a ⋅ b = ba. We certainly have (B^op)^op = B. If B is a division algebra,
then so is B^op. The opposite of a direct sum of algebras is the direct sum of the opposite
algebras, because the multiplication in the direct sum is defined componentwise.
Endomorphism algebras and opposite algebras are closely related.

Lemma 22.3.16. Let B be an algebra. Then B^op ≅ EndB(B).

Proof. Let ϕ ∈ EndB(B), and let a = ϕ(1). Then ϕ(b) = bϕ(1) = ba for any b ∈ B; hence, ϕ
is equal to the endomorphism ψa given by right multiplication by a. Therefore, EndB(B) =
{ψa : a ∈ B}; hence, EndB(B) and B are in one-to-one correspondence via a ↦ ψa. To
finish the proof, we must show that ψa ψb = ψa⋅b for any a, b ∈ B.
Let a, b ∈ B and x ∈ B. Then ψa ψb(x) = ψa(xb) = xba = ψba(x) = ψa⋅b(x), as required.
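The order reversal in this proof can be seen concretely. The following is a minimal sketch, assuming B is the algebra of 2 × 2 integer matrices (any noncommutative algebra would do); psi is a hypothetical helper returning right multiplication by a. Composing ψa after ψb agrees with right multiplication by ba, and in general not by ab.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def psi(a):
    """Right multiplication by a, that is, the map x -> x a."""
    return lambda x: matmul(x, a)


# Assumed sample elements of B with ab != ba.
a = [[1, 2], [0, 1]]
b = [[0, 1], [1, 1]]
x = [[2, -1], [3, 5]]

composed = psi(a)(psi(b)(x))        # psi_a(psi_b(x)) = (x b) a
via_ba = psi(matmul(b, a))(x)       # x (b a), the product a . b in B^op
via_ab = psi(matmul(a, b))(x)       # x (a b), which differs here

print(composed == via_ba, composed == via_ab)
```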

Lemma 22.3.17. Let S1 , . . . , Sr be the r distinct simple A-modules of Theorem 22.3.6. For
each i, let Ui be a direct sum of copies of Si , and let U = U1 ⊕ ⋅ ⋅ ⋅ ⊕ Ur . Then

EndA (U) ≅ EndA (U1 ) ⊕ ⋅ ⋅ ⋅ ⊕ EndA (Ur ).

Proof. Let ϕ ∈ EndA(U). Fix some i. Then every composition factor of Ui is isomorphic
to Si. Therefore, by the Jordan–Hölder theorem for modules (Theorem 22.2.10), we see
that the same is true for ϕ(Ui), since ϕ(Ui) is isomorphic to a quotient of Ui. Assume that
ϕ(Ui) is not contained in Ui. Then the image of ϕ(Ui) in U/Ui under the canonical map
is a nonzero submodule having Si as a composition factor. However, the composition
factors of U/Ui are exactly those Sj for j ≠ i, so a submodule of U/Ui cannot have Si as a
composition factor. This gives a contradiction. It follows that ϕ(Ui) ⊂ Ui. For each i,
we can define ϕi = ϕ|Ui, the restriction of ϕ to Ui, and we have ϕi ∈ EndA(Ui). In this
way, we define a map

Γ : EndA(U) → EndA(U1) ⊕ ⋅ ⋅ ⋅ ⊕ EndA(Ur)

by setting

Γ(ϕ) = (ϕ1, . . . , ϕr) ∈ EndA(U1) ⊕ ⋅ ⋅ ⋅ ⊕ EndA(Ur).

It is straightforward that Γ is an algebra monomorphism.

Now let (ϕ1, . . . , ϕr) ∈ EndA(U1) ⊕ ⋅ ⋅ ⋅ ⊕ EndA(Ur). We define ϕ ∈ EndA(U) as follows:
Given x ∈ U with x = x1 + ⋅ ⋅ ⋅ + xr, and xi ∈ Ui for each i, then

ϕ(x) = ϕ1(x1) + ⋅ ⋅ ⋅ + ϕr(xr).

We then have (ϕ1, . . . , ϕr) = Γ(ϕ), which shows that Γ is surjective, and hence an
isomorphism.

Lemma 22.3.18. If S is a simple A-module, then EndA (nS) ≅ M(n, EndA (S)) for n ∈ ℕ.

Proof. We regard the elements of nS as being column vectors of length n with entries
from S. Let Φ = (ϕij) ∈ M(n, EndA(S)). We now define the map

Γ(Φ) : nS → nS

by

\[
\Gamma(\Phi) \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix}
= \begin{pmatrix} \phi_{11} & \cdots & \phi_{1n} \\ \vdots & \ddots & \vdots \\ \phi_{n1} & \cdots & \phi_{nn} \end{pmatrix}
\begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix}
= \begin{pmatrix} \phi_{11}(s_1) + \cdots + \phi_{1n}(s_n) \\ \vdots \\ \phi_{n1}(s_1) + \cdots + \phi_{nn}(s_n) \end{pmatrix}.
\]

We write

\[
\vec{s} = \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} \in nS.
\]

Then

\[
\Gamma(\Phi)(a\vec{s} + \vec{t}) = a\,\Gamma(\Phi)(\vec{s}) + \Gamma(\Phi)(\vec{t})
\]

for any a ∈ A and any column vectors s⃗, t⃗ ∈ nS, because each ϕij is an A-module
homomorphism. Therefore, Γ(Φ) ∈ EndA(nS), and we easily obtain that

Γ : M(n, EndA(S)) → EndA(nS)

given by

Φ ↦ Γ(Φ)

is an algebra monomorphism.
Now let ψ ∈ EndA(nS). For each 1 ≤ i, j ≤ n, we define ψij : S → S implicitly by

\[
\psi \begin{pmatrix} s \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \begin{pmatrix} \psi_{11}(s) \\ \vdots \\ \psi_{n1}(s) \end{pmatrix},
\quad \ldots, \quad
\psi \begin{pmatrix} 0 \\ \vdots \\ 0 \\ s \end{pmatrix}
= \begin{pmatrix} \psi_{1n}(s) \\ \vdots \\ \psi_{nn}(s) \end{pmatrix}.
\]

We get that each ψij ∈ EndA(S). Now let Ψ = (ψij) ∈ M(n, EndA(S)). Then Γ(Ψ) = ψ,
showing that Γ is also surjective, and hence an isomorphism.

If S is a simple A-module, then EndA (S) is a division algebra by Schur’s lemma (The-
orem 22.2.15). If the ground field K is algebraically closed, then more specific results can
be stated about the structure of EndA (S).

Lemma 22.3.19. Suppose that K is algebraically closed, and let S be a simple A-module.
Then EndA (S) ≅ K.

Proof. Let ϕ ∈ EndA(S). Consider ϕ as a K-linear map of the finite-dimensional
K-vector space S into itself. Since K is algebraically closed, ϕ has an eigenvalue
λϕ ∈ K.
If I is the identity element of EndA(S), then ϕ − λϕ I ∈ EndA(S) has a nonzero kernel,
and therefore is not invertible. From this, it follows that ϕ − λϕ I = 0, that is, ϕ = λϕ I,
since EndA(S) is a division algebra. The map ϕ ↦ λϕ is then an isomorphism from
EndA(S) to K.

Lemma 22.3.20. Let B be an algebra. Then (M(n, B))^op ≅ M(n, B^op) for any n ∈ ℕ.

Proof. Define the map ψ : (M(n, B))^op → M(n, B^op) by ψ(X) = X^t, where X^t is the
transpose of the matrix X. This map is bijective.
Let X = (xij) and Y = (yij) be elements of (M(n, B))^op. Then for any i and j we have

\[
\begin{aligned}
(\psi(X)\psi(Y))_{ij} &= \sum_{k=1}^{n} \psi(X)_{ik} \cdot \psi(Y)_{kj} = \sum_{k=1}^{n} (X^t)_{ik} \cdot (Y^t)_{kj} \\
&= \sum_{k=1}^{n} X_{ki} \cdot Y_{jk} = \sum_{k=1}^{n} Y_{jk} X_{ki} = (YX)_{ji} \\
&= ((YX)^t)_{ij} = ((X \cdot Y)^t)_{ij} = \psi(X \cdot Y)_{ij}.
\end{aligned}
\]

Therefore, ψ(X ⋅ Y) = ψ(X)ψ(Y), so ψ is an algebra homomorphism and, since it is
bijective, an algebra isomorphism.

We are now at the point of stating Wedderburn’s theorem.

Theorem 22.3.21 (Wedderburn). The algebra A is semisimple if and only if it is isomorphic
to a direct sum of matrix algebras over division algebras.

Proof. Suppose that the algebra A is semisimple. Then A is of the form A = U1 ⊕ ⋅ ⋅ ⋅ ⊕ Ur,
where each Ui is the direct sum of ni copies of a simple A-module Si, and no two of the
distinct Si are isomorphic.
We have A^op ≅ EndA(A) by Lemma 22.3.16, and A^op ≅ EndA(U1) ⊕ ⋅ ⋅ ⋅ ⊕ EndA(Ur)
by Lemma 22.3.17. Therefore, A^op ≅ EndA(n1 S1) ⊕ ⋅ ⋅ ⋅ ⊕ EndA(nr Sr), and then by
Lemma 22.3.18, A^op ≅ M(n1, EndA(S1)) ⊕ ⋅ ⋅ ⋅ ⊕ M(nr, EndA(Sr)). Now, taking opposite
algebras and using Lemma 22.3.20,

\[
\begin{aligned}
A &\cong \bigl( M(n_1, \mathrm{End}_A(S_1)) \oplus \cdots \oplus M(n_r, \mathrm{End}_A(S_r)) \bigr)^{\mathrm{op}} \\
&\cong M(n_1, \mathrm{End}_A(S_1))^{\mathrm{op}} \oplus \cdots \oplus M(n_r, \mathrm{End}_A(S_r))^{\mathrm{op}} \\
&\cong M(n_1, \mathrm{End}_A(S_1)^{\mathrm{op}}) \oplus \cdots \oplus M(n_r, \mathrm{End}_A(S_r)^{\mathrm{op}}).
\end{aligned}
\]

Since the endomorphism algebra of a simple module is a division algebra, and the
opposite algebra of a division algebra is also a division algebra, it follows that a
semisimple algebra is isomorphic to a direct sum of matrix algebras over division
algebras. The converse is a direct consequence of Theorem 22.3.13.

Theorem 22.3.22. The algebra A is simple if and only if it is isomorphic to a matrix
algebra over a division algebra.

Proof. Suppose that A is a simple algebra. Then by Lemma 22.3.10, A is semisimple;
hence, by Theorem 22.3.21, A is isomorphic to a direct sum of r matrix algebras over
division algebras. From Theorem 22.3.13, we have that A has exactly 2^r ideals. However,
A is simple, and hence has only 2 ideals. Therefore, r = 1, and any simple algebra is
isomorphic to a matrix algebra over a division algebra. The converse follows from
Theorem 22.3.11.

We see that an algebra is semisimple if and only if it is a direct sum of simple alge-
bras. This affirms the consistency of the choice of terminology.

Theorem 22.3.23. Suppose that the field K is algebraically closed. Then any semisimple
algebra is isomorphic to a direct sum of matrix algebras over K.

Proof. This follows directly from Lemma 22.3.19 and Theorem 22.3.21.

22.4 Ordinary Representations, Characters and Character Theory


In this section, we look at a concept, the character of a representation, which gives more
information than one might expect at first glance. Throughout this section, we will be
concerned with the case where the ground field K is ℂ, the field of complex numbers. In
this case, the representation theory of groups is called ordinary representation theory.
Recall that ℂ has characteristic 0 and is algebraically closed. For this section, G will
denote a finite group, and all ℂG-modules are finitely generated, or equivalently have
finite dimension as ℂ-vector spaces. From Corollary 22.2.24, we see that every nonzero
ℂG-module is semisimple for any finite group G. It follows, from Wedderburn’s theorem,
that we have very specific information about the nature of the group algebra ℂG.

Theorem 22.4.1. There exist some r ∈ ℕ and some f1, . . . , fr ∈ ℕ such that

ℂG ≅ M(f1, ℂ) ⊕ ⋅ ⋅ ⋅ ⊕ M(fr, ℂ)

as ℂ-algebras. Furthermore, there are exactly r isomorphism classes of simple
ℂG-modules, and if we let S1, . . . , Sr be representatives of these r classes, then we can
order the Si so that

ℂG ≅ f1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ fr Sr

as ℂG-modules, where dimℂ(Si) = fi for each i. Any ℂG-module can be written uniquely,
up to isomorphism, in the form a1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ ar Sr where all ai ∈ ℕ ∪ {0}.

Proof. The theorem follows from our results on the classification of simple and semisim-
ple algebras. The first statement follows from Corollary 22.2.24 and Theorem 22.3.23. The
second statement follows from Theorems 22.3.8 and 22.3.13, where we take Si as the space
of column vectors of length fi with the canonical module structure over the ith summand
M(fi , ℂ).
The final statement follows from Theorem 22.3.6.

Definition 22.4.2. The ℂ-dimensions f1 , . . . , fr of the r simple ℂG-modules are called the
degrees of the representations of G.

The trivial ℂG-module ℂ is one-dimensional, and hence simple. Therefore, G will
always have at least one representation of degree 1. By convention, we let f1 = 1. The
sizes of the degrees are constrained by the order of the group G.

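Corollary 22.4.3. We have

\[
\sum_{i=1}^{r} f_i^2 = |G|.
\]

Proof. Theorem 22.4.1 gives

\[
|G| = \dim_{\mathbb{C}}(\mathbb{C}G) = \dim_{\mathbb{C}}\Bigl(\bigoplus_{i=1}^{r} M(f_i, \mathbb{C})\Bigr)
= \sum_{i=1}^{r} \dim_{\mathbb{C}} M(f_i, \mathbb{C}) = \sum_{i=1}^{r} f_i^2.
\]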

We note that the degrees of G divide |G|. We do not need this fact. For a proof see
the appendix in the book [1].

Theorem 22.4.4. The number r of isomorphism classes of simple ℂG-modules is equal to
the number of conjugacy classes of G.

Proof. Let Z be the center of ℂG; that is, the subalgebra of ℂG consisting of all elements
that commute with every element of ℂG. From Theorem 22.4.1, it follows that Z is
isomorphic to the center of M(f1, ℂ) ⊕ ⋅ ⋅ ⋅ ⊕ M(fr, ℂ), and therefore is isomorphic to the
direct sum of the centers of the M(fi, ℂ). It is straightforward that the center of M(fi, ℂ)
is equal to the set of scalar matrices

{αI : I is the identity matrix in M(fi, ℂ), α ∈ ℂ}.

Hence, the center of M(fi, ℂ) is isomorphic to ℂ, and therefore Z ≅ ℂ^r, which implies
that dimℂ(Z) = r.
We now consider an element ∑_{g∈G} λ_g g of Z. For any h ∈ G, we have

\[
\Bigl( \sum_{g \in G} \lambda_g g \Bigr) h = h \Bigl( \sum_{g \in G} \lambda_g g \Bigr),
\]

which leads to

\[
\sum_{g \in G} \lambda_g g = \sum_{g \in G} \lambda_g h^{-1} g h = \sum_{g \in G} \lambda_{hgh^{-1}} g.
\]

It follows that we must have λ_g = λ_{hgh⁻¹} for all g, h ∈ G.
It also follows then that the coefficients of elements of the center Z are constant on
conjugacy classes of G, and that a basis for Z is the set of class sums, which are the sums
of the form ∑_{g∈C} g, where C is a conjugacy class of G. Thus, dimℂ(Z) is equal to the
number of conjugacy classes of G.
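For small groups, both the class count of Theorem 22.4.4 and the constraint of Corollary 22.4.3 can be explored by brute force. The following sketch assumes G = S_3, realized as permutations of {0, 1, 2}; the function names are ours. It computes the conjugacy classes, checks that each class sum commutes with every group element, and lists all nondecreasing tuples (f1, . . . , fr) with f1 = 1 and f1^2 + ⋅ ⋅ ⋅ + fr^2 = |G|.

```python
from itertools import combinations_with_replacement, permutations
from math import isqrt


def compose(p, q):
    """Composition p o q of permutations given as tuples of images."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse of the permutation p."""
    q = [0] * len(p)
    for j, image in enumerate(p):
        q[image] = j
    return tuple(q)


def conjugacy_classes(group):
    """Partition the group into classes {h g h^-1 : h in G}."""
    remaining = set(group)
    classes = []
    while remaining:
        g = next(iter(remaining))
        cls = frozenset(compose(compose(h, g), inverse(h)) for h in group)
        classes.append(cls)
        remaining -= cls
    return classes


def class_sum_is_central(group, cls):
    """Check h * (class sum) == (class sum) * h for all h by comparing supports."""
    return all({compose(h, g) for g in cls} == {compose(g, h) for g in cls}
               for h in group)


def degree_candidates(order, r):
    """Nondecreasing tuples (f_1, ..., f_r) with f_1 = 1 and sum of squares = order."""
    bound = isqrt(order)
    return [f for f in combinations_with_replacement(range(1, bound + 1), r)
            if f[0] == 1 and sum(d * d for d in f) == order]


s3 = list(permutations(range(3)))
classes = conjugacy_classes(s3)
print(len(classes), degree_candidates(len(s3), len(classes)))
```

For S_3 the only candidate is (1, 1, 2). For larger groups the equation alone need not determine the degrees uniquely, so this is a consistency check rather than a method for computing them.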

Characters and Character Theory


We now define and study the characters of an ordinary representation.

Definition 22.4.5. If U is a ℂG-module, then each g ∈ G defines an invertible linear
transformation of U via u ↦ gu for u ∈ U. The character of U is the function χU : G → ℂ
for which χU(g) is the trace of the linear transformation of U defined by g.

We note that for any representation U, we have χU (1) = dimℂ (U), since the identity
element of G induces the identity transformation of U. Furthermore, if ρ : G → GL(U)
is the representation corresponding to U, then χU (g) is just the trace of the map ρ(g).
Thus, isomorphic ℂG-modules have equal characters.
If g, h ∈ G, then the linear transformations of U defined by g and hgh⁻¹ are similar,
being conjugate by the invertible transformation defined by h, and hence have the
same trace. Therefore, any character is constant on each conjugacy class of G; that is,
the value of the character on any two conjugate elements is the same.

Example 22.4.6. Let U = ℂG and g ∈ G. By considering the matrix of the linear trans-
formation defined by g with respect to the basis G of ℂG, we get that χU (g) is equal to
the number of elements x ∈ G, for which gx = x. Therefore, we have χU (1) = |G| and
χU (g) = 0 for every g ∈ G with g ≠ 1. This character is called the regular character of G.
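The fixed-point count in this example can be reproduced directly. The following is a small sketch, assuming G = S_3 realized as permutations of {0, 1, 2}; regular_character is a hypothetical helper that counts the x ∈ G with gx = x, which is the trace of left multiplication by g with respect to the basis G of ℂG.

```python
from itertools import permutations


def compose(p, q):
    """Composition p o q of permutations given as tuples of images."""
    return tuple(p[i] for i in q)


def regular_character(group, g):
    """Number of x in G with g x = x, i.e. the trace of g acting on CG."""
    return sum(1 for x in group if compose(g, x) == x)


s3 = list(permutations(range(3)))
identity = tuple(range(3))
values = {g: regular_character(s3, g) for g in s3}
print(values[identity], sorted(set(values.values())))
```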

The theory of characters was introduced by Frobenius. In connection with number
theory, he defined characters as being functions from G to ℂ satisfying certain
properties. However, it turned out that his characters were exactly the trace functions of
finitely generated ℂG-modules. In what follows, we describe the properties of characters.
We first consider the characters of the r simple ℂG-modules. We denote these by
χ1, . . . , χr. These are called the irreducible characters of G.
Whenever S1, . . . , Sr are the distinct (up to isomorphism) simple ℂG-modules,
we order them so that χSi = χi for each i. Taking S1 to be the trivial ℂG-module ℂ,
we let χ1 be the character of the trivial representation, and call χ1 the principal character
of G. We then have χ1(g) = 1 for all g ∈ G.

Definition 22.4.7. A character of a one-dimensional ℂG-module is called
a linear character.

Since one-dimensional modules are simple, we get that all linear characters are
irreducible. Let χ be the linear character arising from the ℂG-module U, and let g, h ∈ G.
Since U is one-dimensional, for any u ∈ U we have gu = χ(g)u and hu = χ(h)u. Then
χ(gh)u = (gh)u = g(hu) = χ(g)χ(h)u. Hence, χ is a homomorphism from G to the
multiplicative group ℂ⋆ = ℂ \ {0}. On the other hand, given a homomorphism
ϕ : G → ℂ⋆, we can define a one-dimensional ℂG-module U by gu = ϕ(g)u for g ∈ G and
u ∈ U. Then χU = ϕ. It follows that the linear characters of G are precisely the group
homomorphisms from G to ℂ⋆.
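As an illustration, the linear characters of a cyclic group can be written down explicitly. The following sketch assumes G = ℤ/nℤ with n = 6, written additively, and uses the maps m ↦ exp(2πimk/n) for k = 0, . . . , n − 1; the names linear_character and is_close are ours, and comparisons are made up to a floating-point tolerance. It checks numerically that each map is a homomorphism into ℂ⋆, takes values of absolute value 1, and sends inverses to complex conjugates.

```python
import cmath


def linear_character(n, k):
    """The map m -> exp(2*pi*i*m*k/n) on the cyclic group Z/nZ (written additively)."""
    def chi(m):
        return cmath.exp(2j * cmath.pi * (m % n) * k / n)
    return chi


def is_close(z, w, tol=1e-9):
    """Floating-point comparison of two complex numbers."""
    return abs(z - w) < tol


n = 6  # assumed example order
characters = [linear_character(n, k) for k in range(n)]

# chi(a + b) = chi(a) chi(b): each chi is a group homomorphism into C*.
multiplicative = all(is_close(chi((a + b) % n), chi(a) * chi(b))
                     for chi in characters for a in range(n) for b in range(n))
# chi(-a) is the complex conjugate of chi(a).
conjugate_on_inverses = all(is_close(chi(-a), chi(a).conjugate())
                            for chi in characters for a in range(n))
# Every value is a root of unity, hence of absolute value 1.
unit_modulus = all(is_close(abs(chi(a)), 1.0)
                   for chi in characters for a in range(n))
print(multiplicative, conjugate_on_inverses, unit_modulus)
```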

Theorem 22.4.8. Let U be a ℂG-module, and let ρ : G → GL(U) be the representation
corresponding to U. Let g ∈ G be of order n. Then the following hold:
(i) ρ(g) is diagonalizable.
(ii) χU(g) equals the sum (with multiplicities) of the eigenvalues of ρ(g).
(iii) χU(g) is a sum of χU(1) n-th roots of unity.
(iv) χU(g⁻¹) is the complex conjugate of χU(g).
(v) |χU(g)| ≤ χU(1).
(vi) The set {x ∈ G | χU(x) = χU(1)} is a normal subgroup of G.

Proof. Since g^n = 1, we get that ρ(g) is a zero of the polynomial X^n − 1. However, X^n − 1
splits into distinct linear factors in ℂ[X], and so it follows that the minimal polynomial
of ρ(g) does also. Hence, ρ(g) is diagonalizable, which proves (i). From this, we have
that the trace of ρ(g) is the sum (with multiplicities) of the eigenvalues, proving (ii). The
eigenvalues are precisely the zeros of the minimal polynomial of ρ(g), which divides
X^n − 1. Consequently, the eigenvalues are n-th roots of unity, which proves (iii), since
the number of eigenvalues counted with multiplicities is dimℂ(U) = χU(1). Each
eigenvector of ρ(g) is also an eigenvector for ρ(g⁻¹) with the eigenvalue for ρ(g⁻¹) being
the inverse of the eigenvalue for ρ(g). Since the eigenvalues are roots of unity, their
inverses are their complex conjugates, and it follows that χU(g⁻¹) is the complex
conjugate of χU(g). This is (iv).
Now (v) follows directly from (iii) and the triangle inequality. For (vi), we have already
seen that χU(g) is the sum of χU(1) eigenvalues, each of which is a root of unity. If the
sum is equal to χU(1), then it follows that each of these eigenvalues must be 1, in which
case the diagonalizable map ρ(g) must be the identity map. Conversely, if ρ(g) is the
identity map, then χU(g) = dimℂ(U) = χU(1). Therefore,
{x ∈ G : χU(x) = χU(1)} = ker(ρ), and hence is a normal subgroup of G.

Suppose that χ and ψ are characters of G. We define new functions χ + ψ and χψ
from G to ℂ by (χ + ψ)(g) = χ(g) + ψ(g) and (χψ)(g) = χ(g)ψ(g) for g ∈ G. These new
functions are not a priori characters themselves. Given a scalar λ ∈ ℂ, define a new
function λχ : G → ℂ by (λχ)(g) = λχ(g). Consequently, we can view the characters of G
as elements of a ℂ-vector space of functions from G to ℂ.

Theorem 22.4.9. The irreducible characters of G are, as functions from G to ℂ, linearly
independent over ℂ.

Proof. We have ℂG ≅ M(f1, ℂ) ⊕ ⋅ ⋅ ⋅ ⊕ M(fr, ℂ) by Theorem 22.4.1. Let S1, . . . , Sr be the
distinct simple ℂG-modules. For each i, let ei be the identity element of M(fi, ℂ),
regarded as an element of ℂG. We extend each character linearly to a function on ℂG
and fix some i.
Recall that χi(g) is the trace of the linear transformation on Si defined by g ∈ G. The
linear transformation on Si given by ei is the identity. Hence, χi(ei) = dimℂ(Si) = fi.
Moreover, if j ≠ i, then the linear transformation on Sj given by ei is the zero map, and
hence χj(ei) = 0 for j ≠ i. Now suppose that λ1, . . . , λr ∈ ℂ are such that
∑_{j=1}^{r} λj χj = 0. From above, we see that 0 = ∑_{j=1}^{r} λj χj(ei) = λi fi for each i.
It follows that λi = 0 for all i; therefore, the characters are linearly independent.

Lemma 22.4.10. χU⊕V = χU + χV for any ℂG-modules U and V .

Proof. By considering a ℂ-basis for U ⊕ V , whose first dimℂ (U) elements form a ℂ-basis
for U ⊕ {0}, and whose remaining elements form a ℂ-basis for {0} ⊕ V , we get that
χU⊕V (g) = χU (g) + χV (g) for any g ∈ G.

Theorem 22.4.11. If S1, . . . , Sr are the distinct (up to isomorphism) simple ℂG-modules,
then the character of the ℂG-module a1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ ar Sr with ai ∈ ℕ ∪ {0} is a1 χ1 + ⋅ ⋅ ⋅ + ar χr.
Consequently, two ℂG-modules are isomorphic if and only if their characters are equal.

Proof. The first statement follows directly from Lemma 22.4.10. Now, suppose that
χU = χV for some ℂG-modules U and V.
Since ℂG is semisimple, we can write U ≅ a1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ ar Sr and V ≅ b1 S1 ⊕ ⋅ ⋅ ⋅ ⊕ br Sr
with ai, bi ∈ ℕ ∪ {0}. By taking characters, we have

\[
0 = \chi_U - \chi_V = \sum_{i=1}^{r} (a_i - b_i) \chi_i.
\]

By Theorem 22.4.9, this forces ai = bi for all i, and therefore U ≅ V.

Definition 22.4.12. A class function on G is a function from G to ℂ whose value within
any conjugacy class is constant.

For example, characters of ℂG-modules are class functions.


The set of all class functions on G forms a ℂ-vector space of dimension r, where r is
the number of conjugacy classes within G. An obvious basis for this vector space is the
set of class functions on G that have the value 1 on a single conjugacy class, and 0 on all
other conjugacy classes.

Theorem 22.4.13. The irreducible characters for G form a basis for the ℂ-vector space of
class functions on G.

Proof. By Theorem 22.4.9, the irreducible characters of G are linearly independent el-
ements of the space of class functions. Their number equals the number of conjugacy
classes of G by Theorem 22.4.4, and this number is equal to the dimension of the space
of class functions.

Definition 22.4.14. If α, β are class functions on G, then their inner product is the complex
number

\[
\langle \alpha, \beta \rangle = \frac{1}{|G|} \sum_{g \in G} \alpha(g) \overline{\beta(g)}.
\]

This inner product is a standard complex (Hermitian) inner product on the space of class
functions. Therefore, we have the following properties:
(1) ⟨α, α⟩ ≥ 0, and ⟨α, α⟩ = 0 if and only if α = 0;
(2) ⟨α, β⟩ is the complex conjugate of ⟨β, α⟩;
(3) ⟨λα, β⟩ = λ⟨α, β⟩ for all λ ∈ ℂ;
(4) ⟨α1 + α2, β⟩ = ⟨α1, β⟩ + ⟨α2, β⟩.

From these basic properties we further have
(5) ⟨α, λβ⟩ = λ̄⟨α, β⟩, where λ̄ is the complex conjugate of λ,
(6) ⟨α, β1 + β2⟩ = ⟨α, β1⟩ + ⟨α, β2⟩,

for all class functions α, β, α1, α2, β1, β2, and all λ ∈ ℂ.

Definition 22.4.15. If U is a ℂG-module, then

\[ U^G = \{u \in U : gu = u \text{ for all } g \in G\}. \]



Lemma 22.4.16. If U is a ℂG-module, then

\[ \dim_{\mathbb{C}}(U^G) = \frac{1}{|G|} \sum_{g \in G} \chi_U(g). \]

Proof. Let a = (1/|G|) ∑g∈G g ∈ ℂG. Clearly, ga = a for any g ∈ G, and hence a^2 = a. If T is the
linear transformation of U defined by a, then T satisfies the equation X^2 − X = 0.
Consequently, T is diagonalizable, and the only possible eigenvalues of T are 0 and 1.
Let U1 ⊂ U be the eigenspace of T corresponding to the eigenvalue 1. If u ∈ U1, then
gu = gau = au = u for any g ∈ G. Therefore, u ∈ U^G. Conversely, suppose that u ∈ U^G.
Then

\[ |G|\,au = \Bigl(\sum_{g \in G} g\Bigr)u = \sum_{g \in G} gu = \sum_{g \in G} u = |G|\,u, \]

and hence au = u, that is, u ∈ U1. It follows that U^G = U1. However, the trace of T is equal
to the dimension of U1, and the trace of T equals (1/|G|) ∑g∈G χU(g) by the linearity of the
trace map. This gives the result.
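As a small numerical illustration of Lemma 22.4.16, the following Python sketch averages a character over a group. The example is an assumption made here and is not taken from the text: a permutation group acts on ℂ^3 by permuting coordinates, so that χU(g) is the number of fixed points of g. The helper names fixed_points and invariant_dimension are hypothetical.

```python
from itertools import permutations

def fixed_points(perm):
    """Trace of the permutation matrix of perm, i.e. its number of fixed points."""
    return sum(1 for i, image in enumerate(perm) if i == image)

def invariant_dimension(group):
    """Average the character over the group, as in Lemma 22.4.16."""
    total = sum(fixed_points(g) for g in group)
    # The average is the dimension of the fixed space, hence an integer.
    assert total % len(group) == 0
    return total // len(group)

# S3 acting on C^3 by permuting the three coordinates.
s3 = list(permutations(range(3)))
dim_fixed_s3 = invariant_dimension(s3)

# The subgroup generated by the transposition that swaps coordinates 0 and 1.
subgroup = [(0, 1, 2), (1, 0, 2)]
dim_fixed_subgroup = invariant_dimension(subgroup)

print(dim_fixed_s3, dim_fixed_subgroup)
```

For S3 the invariant vectors are the multiples of (1, 1, 1), and for the subgroup they are the vectors whose first two coordinates agree, so the averages should be 1 and 2, respectively.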

Theorem 22.4.17. We have ⟨ χU , χV ⟩ = dimℂ (HomℂG (U, V )) for any ℂG-modules U, V .

Recall that HomℂG(U, V) is a ℂ-vector space with (ϕ + ψ)(u) = ϕ(u) + ψ(u), and
(λϕ)(u) = λϕ(u) for any λ ∈ ℂ, u ∈ U and ϕ, ψ ∈ HomℂG(U, V).

Proof. We observe that HomℂG(U, V) is a subspace of the ℂG-module Homℂ(U, V). If
ϕ ∈ HomℂG(U, V) and g ∈ G, then (gϕ)(u) = gϕ(g^−1 u) = gg^−1 ϕ(u) = ϕ(u) for any u ∈ U.
Hence, gϕ = ϕ for all g ∈ G. This implies that ϕ ∈ Homℂ(U, V)^G. By reversing the
argument, we get HomℂG(U, V) = Homℂ(U, V)^G.
Therefore,

\[
\begin{aligned}
\dim_{\mathbb{C}}(\operatorname{Hom}_{\mathbb{C}G}(U, V)) &= \dim_{\mathbb{C}}(\operatorname{Hom}_{\mathbb{C}}(U, V)^G) \\
&= \frac{1}{|G|} \sum_{g \in G} \chi_{\operatorname{Hom}_{\mathbb{C}}(U, V)}(g) \\
&= \frac{1}{|G|} \sum_{g \in G} \overline{\chi_U(g)}\,\chi_V(g) \\
&= \langle \chi_V, \chi_U \rangle
\end{aligned}
\]

by Lemma 22.4.16, and part (iii) of Theorem 22.4.8.
This implies that

\[ \langle \chi_U, \chi_V \rangle = \overline{\langle \chi_V, \chi_U \rangle} = \langle \chi_V, \chi_U \rangle = \dim_{\mathbb{C}}(\operatorname{Hom}_{\mathbb{C}G}(U, V)), \]

since we know that ⟨χV, χU⟩ is real.



The Character Table and Orthogonality Relations


We have seen that the number r of conjugacy classes in a finite group G is the same
as the number of irreducible characters. Furthermore, the set of irreducible characters
forms a basis for the space of class functions on G. If χ1, . . . , χr are the irreducible
characters, and g1, . . . , gr are a complete set of conjugacy class representatives, then the
r × r-matrix χ = (χi(gj)) is called the character table for G.
We close this section by showing that the rows and columns of the character table
are orthogonal vectors relative to the defined inner product. These results are called the
orthogonality relations. As a consequence of these relations, we obtain the fact that the
irreducible characters form an orthonormal basis for the space of class functions. There is
a great deal of other information that can be obtained from the character table. We refer
to the book by Alperin and Bell [1] for further discussion.

Theorem 22.4.18 (First orthogonality relation). Let χ1 , . . . , χr be the set of irreducible char-
acters of G. Then

\[ \frac{1}{|G|} \sum_{g \in G} \chi_i(g)\,\overline{\chi_j(g)} = \begin{cases} 0, & \text{if } i \neq j, \\ 1, & \text{if } i = j. \end{cases} \]

In other words, the irreducible characters form an orthonormal set with respect to
the defined inner product.

Proof. Let S1 , . . . , Sr be the distinct simple ℂG-modules that go with the irreducible char-
acters. From the previous theorem, we have

⟨ χi , χj ⟩ = dimℂ (HomℂG (Si , Sj ))

for any i, j. We further have HomℂG (Si , Si ) ≅ ℂ, and by Schur’s lemma HomℂG (Si , Sj ) = 0
for i ≠ j, proving the theorem.

Corollary 22.4.19. The set of irreducible characters forms an orthonormal basis for the
vector space of class functions.

Proof. The irreducible characters form a basis for the space of class functions by
Theorem 22.4.13, and from the orthogonality result they are an orthonormal set relative
to the inner product.
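The first orthogonality relation can be checked numerically on a small example. The sketch below uses the character table of S3, a standard example that is assumed here and does not appear in the text; the names CLASS_SIZES, TABLE and inner_product are hypothetical. It also uses the inner product to read off the multiplicities of the simple modules in the permutation module ℂ^3, in the spirit of Theorem 22.4.11.

```python
# Character table of S3 (assumed example). Columns: class of the identity,
# class of the transpositions, class of the 3-cycles.
CLASS_SIZES = [1, 3, 2]
TABLE = [
    [1, 1, 1],    # principal character
    [1, -1, 1],   # sign character
    [2, 0, -1],   # character of the 2-dimensional simple module
]
GROUP_ORDER = sum(CLASS_SIZES)  # |S3| = 6

def inner_product(alpha, beta):
    """<alpha, beta> = (1/|G|) * sum over g of alpha(g) * conjugate(beta(g)).

    Class functions are given by their values on the classes, so each value
    is weighted by the size of its conjugacy class.
    """
    total = sum(k * complex(a) * complex(b).conjugate()
                for k, a, b in zip(CLASS_SIZES, alpha, beta))
    return total / GROUP_ORDER

# Gram matrix of the irreducible characters.
gram = [[inner_product(chi_i, chi_j) for chi_j in TABLE] for chi_i in TABLE]

# The permutation module C^3 has the fixed-point character (3, 1, 0).
permutation_character = [3, 1, 0]
multiplicities = [inner_product(permutation_character, chi) for chi in TABLE]
```

Under these assumptions gram should be the 3 × 3 identity matrix, and the multiplicities say that the permutation module is the direct sum of the trivial module and the 2-dimensional simple module.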

The second orthogonality relation says that the columns of the character table are
also a set of orthogonal vectors. That is, the column vectors (χ1(gi), . . . , χr(gi)), taken
over a set of conjugacy class representatives gi, are pairwise orthogonal with respect to
the standard inner product on ℂ^r.

Theorem 22.4.20 (Second orthogonality relation). Let χ1, . . . , χr be the set of irreducible
characters of G, and suppose that g1, . . . , gr are a set of conjugacy class representatives,
and k1, . . . , kr are the orders of the conjugacy classes. Then for any 1 ≤ i, j ≤ r, we have

\[ \sum_{s=1}^{r} \chi_s(g_i)\,\overline{\chi_s(g_j)} = \begin{cases} 0, & \text{if } i \neq j, \\ \dfrac{|G|}{k_i}, & \text{if } i = j. \end{cases} \]

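Proof. Let χ = (χi(gj))1≤i,j≤r be the character table for G, and let K be the r × r diagonal
matrix with k1, . . . , kr on its main diagonal. Then we have (χK)ij = χi(gj)kj for
any i, j, and therefore

\[ (\chi K \overline{\chi}^{\,t})_{ij} = \sum_{\ell=1}^{r} k_\ell\,\chi_i(g_\ell)\,\overline{\chi_j(g_\ell)} = \sum_{g \in G} \chi_i(g)\,\overline{\chi_j(g)} = |G|\,\langle \chi_i, \chi_j \rangle, \]

which equals |G| for i = j and 0 for i ≠ j by the first orthogonality relation.
Hence, χ K χ̄^t = |G| I, where I is the identity matrix and χ̄^t is the conjugate transpose
of χ. In particular, χ is invertible with inverse (1/|G|) K χ̄^t. Since a matrix commutes
with its inverse, we also have K χ̄^t χ = |G| I. Comparing the entries in position (j, i), it
follows that for any i, j, we have

\[ |G| = \sum_{\ell=1}^{r} k_j\,\chi_\ell(g_j)\,\overline{\chi_\ell(g_j)}, \]

and

\[ 0 = \sum_{\ell=1}^{r} k_j\,\chi_\ell(g_i)\,\overline{\chi_\ell(g_j)} \quad \text{for } i \neq j, \]

completing the proof.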
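The role of the complex conjugate in the second orthogonality relation is easy to see on a table with non-real entries. The following sketch uses the character table of the cyclic group of order 4, which is an assumed example that is not taken from the text; the names TABLE, CLASS_SIZES and column_product are hypothetical.

```python
# Character table of the cyclic group of order 4 (assumed example): the entry
# in row k and column j is i^(k*j), and every conjugacy class has size 1.
TABLE = [
    [1, 1, 1, 1],
    [1, 1j, -1, -1j],
    [1, -1, 1, -1],
    [1, -1j, -1, 1j],
]
CLASS_SIZES = [1, 1, 1, 1]
GROUP_ORDER = sum(CLASS_SIZES)

def column_product(i, j, conjugate=True):
    """Sum over all irreducible characters chi_s of chi_s(g_i) * conj(chi_s(g_j)).

    With conjugate=False the second factor is not conjugated, which is only
    meant to show why the conjugate is needed.
    """
    total = 0
    for row in TABLE:
        second = complex(row[j])
        if conjugate:
            second = second.conjugate()
        total += complex(row[i]) * second
    return total

column_gram = [[column_product(i, j) for j in range(4)] for i in range(4)]
expected = [[GROUP_ORDER // CLASS_SIZES[i] if i == j else 0 for j in range(4)]
            for i in range(4)]
```

Here column_gram should agree with expected, that is, with |G|/ki on the diagonal and 0 elsewhere. Dropping the conjugate breaks the relation: for instance, column_product(1, 1, conjugate=False) is 0 rather than 4.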

As mentioned before, more information about character tables and their conse-
quences can be found in [1].

22.5 Burnside’s Theorem


We conclude this chapter by presenting a very important result in finite group theory,
whose proof uses representation theory. This is Burnside's Theorem, which asserts that
any group of order p^a q^b with p, q distinct primes must be solvable. Burnside's result was
important in the proof of the famous Feit–Thompson theorem, which asserts that any
group of odd order must be solvable. This was crucial in the classification of finite simple
groups.
Recall that a group G is solvable if it has a normal series with Abelian factors. Solv-
able groups play a crucial role in the proof of the insolvability of the quintic polynomial,
and we discussed solvable groups in detail in Chapter 12. For the proof, we need the fol-
lowing two facts about solvable groups:
1. If a group G has a normal solvable subgroup N with G/N solvable, then G is solvable
(Theorem 12.1.3).
2. Any finite group of prime power order is solvable (Theorem 12.1.8).

We start with several lemmas that depend on representation theory.



Let G be a finite group, and suppose it has r irreducible characters χ1, χ2, . . . , χr
of respective degrees m1, m2, . . . , mr. Suppose the respective orders of the r conjugacy
classes are h1, h2, . . . , hr. The statements in the lemmas depend on some mild facts on
algebraic integers. An algebraic integer is a complex number that is a zero of a monic
integral polynomial. Here we just need the following two facts:
1. The set of algebraic integers forms a subring of ℂ.
2. If an algebraic integer is a rational number, then it is an ordinary integer.

For more information about algebraic integers see Chapter 21.

Lemma 22.5.1. Let χ be a character of G. The value χ(g) for any g ∈ G is an algebraic
integer.

Proof. For any g ∈ G, the value χ(g) is a sum of roots of unity. However, any root of unity
is a zero of a monic integral polynomial X^n − 1, and hence is an algebraic integer. Since
the algebraic integers form a ring, any sum of roots of unity is an algebraic integer.

Lemma 22.5.2. Let χ be an irreducible character of G. Let g ∈ G and CG(g) the centralizer
of g in G. Then

\[ \frac{|G : C_G(g)|}{\chi(1)}\,\chi(g) \]

is an algebraic integer.

Proof. Let S be the simple ℂG-module having character χ.
Let g ∈ G, and let C be the conjugacy class of g in G. By Theorem 13.2.1, we have
|C| = |G : CG(g)|.
Let α ∈ ℂG, α = ∑x∈C x, be the class sum of C. We consider the map φ : S → S, φ(s) =
αs, for s ∈ S. From Theorem 22.4.4 and its proof, we get that α is in the center of ℂG.
This gives φ ∈ EndℂG(S), and there exists a λ ∈ ℂ with αs = λs for all s ∈ S by
Schur's lemma. We obtain

\[ \lambda\chi(1) = \sum_{x \in C} \chi(x) = |C|\,\chi(g) = |G : C_G(g)|\,\chi(g) \]

by taking traces. Therefore,

\[ \lambda = \frac{|G : C_G(g)|}{\chi(1)}\,\chi(g). \]

Let τ : ℂG → ℂG, τ(z) = zα for z ∈ ℂG. We get τ ∈ EndℂG(ℂG) by the proof of
Lemma 22.3.6. Since S is a simple ℂG-module, we may consider S as a submodule
of ℂG, and for 0 ≠ s ∈ S ⊂ ℂG, we have τ(s) = sα = αs = λs, since α is a central element.

Therefore, λ is an eigenvalue of τ, and so det(λI − A) = 0, where I is the identity
matrix, and A the matrix of τ with respect to the ℂ-basis G for ℂG. Each entry of A is
either 0 or 1, which means that, in particular, f(X) = det(XI − A) is a monic polynomial
in X with integer coefficients. Since f(λ) = 0, we get that λ is an algebraic integer.

Lemma 22.5.3. Let χ be an irreducible character of G. Then χ(1) divides |G|.

Proof. Let g1, g2, . . . , gr be a set of representatives of the conjugacy classes of G. We know
that

\[ \frac{|G : C_G(g_i)|\,\chi(g_i)}{\chi(1)} \quad \text{and} \quad \overline{\chi(g_i)} = \chi(g_i^{-1}) \]

are algebraic integers. By the first orthogonality relation

\[ \frac{|G|}{\chi(1)} = \frac{1}{\chi(1)} \sum_{i=1}^{r} |G : C_G(g_i)|\,\chi(g_i)\,\overline{\chi(g_i)} = \sum_{i=1}^{r} \frac{|G : C_G(g_i)|\,\chi(g_i)}{\chi(1)}\,\overline{\chi(g_i)}, \]

which is an algebraic integer and a rational number, and hence an ordinary integer.
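Lemmas 22.5.2 and 22.5.3 can be observed on a concrete character table. The sketch below uses the character table of S4, a standard example assumed here and not taken from the text; the names CLASS_SIZES, TABLE and central_values are hypothetical. All characters of S4 are integer-valued, so in this example the algebraic integers of Lemma 22.5.2 are ordinary integers, which makes them easy to test with exact fractions.

```python
from fractions import Fraction

# Character table of S4 (assumed example). Columns: identity, transpositions,
# double transpositions, 3-cycles, 4-cycles.
CLASS_SIZES = [1, 6, 3, 8, 6]
TABLE = [
    [1, 1, 1, 1, 1],
    [1, -1, 1, 1, -1],
    [2, 0, 2, -1, 0],
    [3, 1, -1, 0, -1],
    [3, -1, -1, 0, 1],
]
GROUP_ORDER = sum(CLASS_SIZES)  # |S4| = 24

def central_values(chi):
    """The numbers |G : C_G(g)| * chi(g) / chi(1), one per conjugacy class."""
    degree = chi[0]
    return [Fraction(k * value, degree) for k, value in zip(CLASS_SIZES, chi)]

# Lemma 22.5.2: every such value is an algebraic integer (here: an integer).
all_integral = all(v.denominator == 1
                   for chi in TABLE for v in central_values(chi))

# Lemma 22.5.3: every degree chi(1) divides |G|.
degrees = [chi[0] for chi in TABLE]
degrees_divide_order = all(GROUP_ORDER % d == 0 for d in degrees)
```

For a group with non-rational character values the same check would need exact arithmetic in a cyclotomic field, which this sketch does not attempt.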


Lemma 22.5.4. Let χ be a character of G, g ∈ G and γ = χ(g)/χ(1).
If γ is a nonzero algebraic integer, then |γ| = 1.

Proof. From Theorem 22.4.8, we know that |γ| ≤ 1.
Suppose that 0 < |γ| < 1, and assume that γ is an algebraic integer.
Now, γ is an average of complex roots of unity. The same will be true for all σ(γ)
with σ ∈ Aut(K | ℚ) =: H, where K is the splitting field of the minimal polynomial of γ
over ℚ.
In particular, |σ(γ)| ≤ 1 for all σ ∈ H. Hence, p := |∏σ∈H σ(γ)| < 1.
On the other hand, p ∈ ℤ by Theorems 7.3.12 and 16.5.1 (recall that γ is a zero of an
irreducible, monic polynomial with integer coefficients, see Theorem 4.4.3).
This implies p = 0, and therefore the constant term of the minimal polynomial of γ
over ℚ must be zero, which gives a contradiction.
Hence, γ cannot be an algebraic integer.

Theorem 22.5.5. If G has a conjugacy class of nontrivial prime power order, then G is not
simple.

Proof. Suppose that G is simple and that the conjugacy class of 1 ≠ g ∈ G has order p^n
with p a prime number, and n ∈ ℕ. From the second orthogonality relation, we get

\[ 0 = \frac{0}{p} = \frac{1}{p} \sum_{i=1}^{r} \chi_i(g)\chi_i(1) = \frac{1}{p} + \frac{1}{p} \sum_{i=2}^{r} \chi_i(g)\chi_i(1), \]

where χ1, χ2, . . . , χr are the irreducible characters of G (recall that χ1 is the principal
character).

Since −1/p is not an algebraic integer, it follows that χi(g)χi(1)/p is not an algebraic integer
for some 2 ≤ i ≤ r. As χi(g) is an algebraic integer, this implies that p ∤ χi(1), and χi(g) ≠ 0.
Now |G : CG(g)| = p^n is relatively prime to χi(1).
Therefore,

\[ a\,|G : C_G(g)| + b\,\chi_i(1) = 1 \]

for some a, b ∈ ℤ (see Theorem 3.1.9).
Thus,

\[ \frac{\chi_i(g)}{\chi_i(1)} = \frac{a\,|G : C_G(g)|\,\chi_i(g)}{\chi_i(1)} + b\,\chi_i(g), \]

which is a nonzero algebraic integer by Lemmas 22.5.1 and 22.5.2, and therefore
|χi(g)| = χi(1) by Lemma 22.5.4.


Consequently,

\[ g \in Z_i = \{x \in G : |\chi_i(x)| = \chi_i(1)\}. \]

We show that Zi is a subgroup of G.
First of all, if g ∈ Zi, then g^−1 ∈ Zi. From Theorem 22.4.8, we also get |χi(g)| = χi(1) if
and only if g has exactly one eigenvalue. If g ∈ Zi, let this eigenvalue be λ(g), so that, if
U is the ℂG-module corresponding to χi, then we have gu = λ(g)u for all u ∈ U. We now
see that for g, h ∈ Zi, we have (gh)u = λ(g)λ(h)u for all u ∈ U. Hence, χi(gh) = χi(1)λ(g)λ(h),
and thus |χi(gh)| = χi(1), which gives gh ∈ Zi. Therefore, Zi is a subgroup of G.
Now, let Ki = {x ∈ G : χi(x) = χi(1)}. Ki is a normal subgroup of G, and it is contained in Zi.
We now want to show that

Zi /Ki = Z(G/Ki ),

the center of G/Ki. If ρ : G → GL(U) is the representation corresponding to χi, then for
any g ∈ Zi, the matrix of ρ(g) (with respect to any ℂ-basis of U) will be scalar, and hence
ρ(g) ∈ Z(ρ(G)). Since ρ(G) ≅ G/Ki, it follows that Zi/Ki is a subgroup of Z(G/Ki). Now,
we apply that χi is irreducible. If gKi ∈ Z(G/Ki), then ρ(g) commutes with ρ(x) for every
x ∈ G. Consequently, the map defined by u ↦ gu, u ∈ U, is a ℂG-endomorphism of U.
But U is simple, so we have EndℂG(U) ≅ ℂ by Schur's lemma.
Therefore, there is a complex root of unity μ such that gu = μu for all u ∈ U. We
now have χi(g) = μχi(1), so |χi(g)| = χi(1), and hence g ∈ Zi. Therefore, Zi/Ki = Z(G/Ki).
Since G is simple and χi is not the principal character, the normal subgroup Ki is
trivial, and hence Zi = Z(G). Moreover, G is non-Abelian, because it has a conjugacy class
with more than one element. Consequently, Zi = Z(G) = {1}. But this contradicts
1 ≠ g ∈ Zi.

Theorem 22.5.6 (Burnside's Theorem). If |G| = p^a q^b, where p and q are prime numbers
and a, b ∈ ℕ, then G is solvable.

Proof. We use induction on a + b, where we allow a or b to be zero. If a + b = 1, then G
has a prime order, and hence is solvable. We now assume that a + b ≥ 2, and that any
group of order p^r q^s, r, s ∈ ℕ ∪ {0}, is solvable whenever r + s < a + b.
First of all, if the center Z(G) is nontrivial, then G is solvable, because Z(G) is solvable
and G/Z(G) is solvable by the inductive hypothesis.
Now, let Z(G) = {1}.
Then we may take h1 = 1 for the conjugacy class of 1.
By the class equation (see Theorem 13.2.2), we then have

\[ p^a q^b = |G| = 1 + h_2 + h_3 + \cdots + h_r. \]

It follows that pq cannot divide each of h2, h3, . . . , hr. Hence, hi is a power of either p
or q for some i ≥ 2. If hi is a nontrivial prime power, then from Theorem 22.5.5 it follows
that G is not simple.
If hi = 1 for some i ≥ 2, then G has at least two representations into ℂ. The number
of these representations is given by the Abelianizations, which is given by |G : G′ |, where
G′ is the commutator subgroup of G. Then |G : G′ | > 1, and since G′ is non-Abelian, G′
is a proper normal subgroup. Hence, G is not simple. So, in any case, G is not simple.
Therefore, G contains a proper nontrivial normal subgroup N. Since |N| divides |G|, we
have |N| = p^a1 q^b1 with a1 + b1 < a + b, since N is a proper subgroup.
By the inductive hypothesis, N is solvable. Furthermore, |G/N| also divides |G|. So,
for the same reason, G/N is solvable. Therefore, both N and G/N are solvable, so G is
solvable by Theorem 12.1.3.

22.6 Exercises

1. Let K be a field, and let G be a finite group. Let U and V be KG-modules having the
same dimension n, and let ρ : G → GL(U) and τ : G → GL(V ) be the corresponding
representations.
By fixing K-bases for U and V , consider ρ and τ as homomorphisms from G to
GL(n, K). Show that U and V are KG-module isomorphic if and only if there exists
some M ∈ GL(n, K) such that ρ(g)M = Mτ(g) for every g ∈ G.
2. Let K be a field, and let G be a finite group. Let x = ∑g∈G g ∈ KG.
(i) Show that the subspace Kx of KG is the unique submodule of KG that is
isomorphic to the trivial module.
(ii) Let ϵ : KG → K be the KG-module epimorphism defined by ϵ(g) = 1 for all
g ∈ G.
Show that ker(ϵ) is the unique KG-submodule of KG, whose quotient is isomor-
phic to the trivial module. This kernel is called the augmentation ideal of KG.

(iii) Suppose that char(K) = p, with p dividing |G|. Show that Kx ⊂ ker(ϵ), the
augmentation ideal of KG. Show that ker(ϵ) is not a direct summand of KG, and
hence that the KG-module KG is not semisimple.
3. Show that the converse of Corollary 22.2.24 is true.
4. Let U be a finite-dimensional K-vector space and let G be a finite group with fully
reducible representation ρ : G → GL(U). Show that ρ gives a direct decomposition

U = V1 ⊕ ⋅ ⋅ ⋅ ⊕ Vk

of U with all Vi , i = 1, . . . , k, irreducible G-invariant subspaces of U.


5. Show that A is a simple A-module if and only if A is a division algebra.
6. Let n ∈ ℕ, and let Tn (K) be the algebra of upper triangular n × n matrices over K.
(i) Show that the set Vn(K) of column vectors over K of length n is a Tn(K)-module that
has a unique composition series, in which every simple Tn (K)-module appears
exactly once as a composition factor.
(ii) Show that the Tn (K)-module Tn (K) is isomorphic to the direct sum of all nonzero
submodules of Vn (K).
7. Let U be an A-module, let n ∈ ℕ, and let U^n be the set of column vectors of length
n with entries from U, considered in the obvious way as an M(n, A)-module. Show
that U is a simple A-module if and only if U^n is a simple M(n, A)-module.
8. Let χ be an irreducible character of G. Let λ be any |G|th root of unity. Show that the
set {x ∈ G : χ(x) = λχ(1)} is a normal subgroup of G.
9. Prove that the set of algebraic integers forms a subring of ℂ.
10. Prove that if an algebraic integer is rational, then it is an ordinary integer.
11. Prove that G is simple if and only if the only irreducible character χi , for which
χi (g) = χi (1) for some 1 ≠ g ∈ G is the principal character χ1 .
23 Algebraic Cryptography
23.1 Basic Algebraic Cryptography
23.1.1 Cryptosystems Tied to Abelian Groups

Cryptography refers to the science of sending and receiving coded messages. Coding and
hidden ciphering is an old endeavor used by governments and military, and between
private individuals from ancient times. Recently, it has become even more prominent
because of the necessity of sending secure and private information, such as credit card
numbers and passwords, over essentially open communication systems.
Traditionally, cryptography deals with devising and implementing secret codes or
cryptosystems. Cryptanalysis is the science of breaking cryptosystems while cryptology
refers to the whole field of cryptography plus cryptanalysis.
A cryptosystem or code is an algorithm to change a plain message, called the plaintext
message, into a coded message, called the ciphertext message. In general, both the
plaintext message (uncoded message) and the ciphertext message (coded message) are
written in some N-letter alphabet which is usually the same for both plaintext and code.
The method of coding, or the encoding algorithm, is then a transformation of the N let-
ters. The most common way to perform this transformation is to consider the N letters
as N integers modulo N and then apply a number theoretical function to them. There-
fore, many encoding algorithms use modular arithmetic and hence cryptography is tied
to number theory and Abelian groups.
Modern cryptography is usually separated into classical cryptography, called sym-
metric key cryptography, and public key cryptography. In the former, both the encoding
and decoding algorithms are supposedly known only to the sender and receiver, usually
referred to as Bob and Alice. In the latter, the encryption method is public knowledge
but only the receiver knows how to decode.
The message that one wants to send is written in plaintext and then converted into
code. The coded message is written in ciphertext. The plaintext message and the cipher-
text message are written in some alphabets that are usually the same. The process of
putting the plaintext into code is called enciphering or encryption while the reverse pro-
cess is called deciphering or decryption.
Encryption algorithms break the plaintext and ciphertext message into message
units. These are single letters or, more generally, k-vectors of letters. The transforma-
tions are done in these message units and the encryption algorithm is a mapping from
the set of plaintext message units to the set of ciphertext message units.
Putting this into a mathematical formulation, we let 𝒫 be the set of all plaintext
message units and 𝒞 be the set of all ciphertext message units. The encryption algorithm
is then the application of an injective map f : 𝒫 → 𝒞 . The map f is the encryption map. The
left inverse map g: 𝒞 → 𝒫 is the decryption or deciphering map. The collection {𝒫 , 𝒞 , f , g}
is called a basic cryptosystem.

https://doi.org/10.1515/9783111142524-023

We may place this in a more general context. We call this wider model a (general)
cryptosystem, indexed by a set 𝒦, called the key space. Formally, a cryptosystem is a tuple
(𝒫, 𝒞, 𝒦, ℰ, 𝒟) where 𝒫 is the set of plaintext message units, called the plaintext space, 𝒞
is the set of ciphertext message units, called the ciphertext space, the elements k ∈ 𝒦 are
called keys, and ℰ is a set of injective maps fk : 𝒫 → 𝒞 indexed by the key space. This is called
the set of encryption maps. Hence, for each k ∈ 𝒦, there is an injective map fk : 𝒫 → 𝒞.
The set 𝒟 consists of maps gk : 𝒞 → 𝒫, also indexed by the key space. This is called the
set of decryption maps.
The central property of a cryptosystem is that, for each k ∈ 𝒦, there exists a
corresponding key k′ ∈ 𝒦 and a decryption map gk′ : 𝒞 → 𝒫 such that gk′ is a left inverse
of fk. In our previous language this means that for each k ∈ 𝒦 we have a basic
cryptosystem {𝒫, 𝒞, fk, gk′} with k the encryption key and k′ the decryption key.
Using this model, we can easily distinguish symmetric from asymmetric cryptosys-
tems. In a symmetric key cryptosystem, if the encryption key k is given, it is easy to
find the corresponding decryption key k ′ . In fact, most of the time we have k = k ′ . In
an asymmetric or public key cryptosystem, even if the encryption key k is known, it is
infeasible to find or to compute the corresponding decryption key k ′ .
In the following, we describe some cryptosystems and start with the symmetric key
cryptosystems.

23.1.1.1 Permutation Cipher


The simplest type of encryption algorithm is a permutation cipher. Here, the letters of
the plaintext alphabet are permuted and the plaintext message is sent in the permuted
letters. A very straightforward example of a permutation encryption algorithm is a shift
algorithm.
Here, we consider the plaintext alphabet as the integers 0, 1, . . . , N − 1 (mod N). We
choose a fixed integer k and the encryption algorithm is

f (m) ≡ (m + k) (mod N).

This is often known as a Caesar code after Julius Caesar who supposedly invented it.
Any permutation encryption algorithm is very simple to attack using statistical
analysis. Polyalphabetic ciphers are an attempt to thwart statistical attacks. One variation of
the basic Caesar code is the following where message units are t-vectors. It is actually
a type of polyalphabetic cipher called a Vigenère code. In this code, message units are
considered as t-vectors of integers modulo N from an N letter alphabet. Let (b1, . . . , bt)
be a fixed t-vector in (ℤN)^t. This Vigenère code then takes a message unit (a1, . . . , at) to
(a1 + b1, . . . , at + bt) (mod N). For a long period of time polyalphabetic ciphers were
considered unbreakable. In 1920, the Friedman test was developed. Given a sequence
of letters of length m representing a Vigenère encrypted ciphertext, the Friedman test
estimates the length t of the key word (b1, . . . , bt), see for instance [66]. A statistical
analysis then allows one to break the Vigenère code.
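The shift cipher and the Vigenère code described above can be sketched in a few lines of Python. The sketch assumes the 26-letter alphabet A, . . . , Z identified with 0, . . . , 25, messages consisting of these letters only, and hypothetical function names; none of this is prescribed by the text.

```python
import string

ALPHABET = string.ascii_uppercase  # N = 26 letters, identified with 0, ..., 25
N = len(ALPHABET)

def shift_encrypt(message, k):
    """Caesar code: apply f(m) = (m + k) mod N to every letter."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % N] for ch in message)

def shift_decrypt(cipher, k):
    """Undo the shift by k, that is, shift by -k."""
    return shift_encrypt(cipher, -k)

def vigenere_encrypt(message, key):
    """Add the key vector (b1, ..., bt) to successive blocks of t letters."""
    t = len(key)
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + ALPHABET.index(key[i % t])) % N]
        for i, ch in enumerate(message)
    )

def vigenere_decrypt(cipher, key):
    """Subtract the key vector from successive blocks of t letters."""
    t = len(key)
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - ALPHABET.index(key[i % t])) % N]
        for i, ch in enumerate(cipher)
    )

caesar_cipher = shift_encrypt("HELLO", 3)
vigenere_cipher = vigenere_encrypt("ATTACKATDAWN", "LEMON")
```

With the key word LEMON the key length is t = 5, so the same letter of the plaintext is in general encrypted differently at different positions, which is what defeats a simple letter-frequency count.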

23.1.1.2 One-Time Pad


We now describe the one-time pad which has perfect security. Here, let 𝒫 be the set of
plaintext messages, 𝒞 the set of ciphertext messages, and 𝒦 the set of keys for a
cryptosystem ℰ. Then ℰ has perfect security if for any given plaintext message P ∈ 𝒫 and
corresponding ciphertext message C ∈ 𝒞 we have that the conditional probability Prob(P|C) of
the plaintext message P, given knowledge of the ciphertext message C, is
exactly the same as the unconditional probability Prob(P) of the plaintext P.

Definition 23.1.1. Suppose the sets 𝒫 of plaintext messages, 𝒞 of ciphertext messages
and 𝒦 of keys are all given by elements of {0, 1}^n. That is, plaintext messages, ciphertext
messages and keys are all random bit strings of fixed length n. For a given k ∈ 𝒦 the
encryption function is given by Fk(p) = p ⊕ k for p ∈ 𝒫. Here, ⊕ denotes the XOR
operation on each pair of corresponding bits. This is simply the operation on bits {0, 1}, that is,
addition modulo 2. We assume that the distribution on all three sets is the uniform
distribution and a key k is only used once. The resulting cryptosystem is called a one-time pad.

Shannon, see [98], proved that the one-time pad, under the assumptions provided
in the definition, is perfectly secure, as long as the keys are randomly chosen and used
only once.

Theorem 23.1.2. A one-time pad has perfect security if the keys are randomly chosen from
the uniform distribution of keys and a key is used only once.
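The encryption function Fk(p) = p ⊕ k is its own inverse, which the following Python sketch makes concrete. It works on bytes rather than on single bits, draws the key with the standard secrets module, and uses the hypothetical names random_key and xor_bytes; these choices are assumptions made for the illustration.

```python
import secrets

def random_key(n_bytes):
    """Draw a key uniformly at random; it must be used for one message only."""
    return secrets.token_bytes(n_bytes)

def xor_bytes(data, key):
    """F_k(p) = p XOR k, bit by bit; requires len(key) == len(data)."""
    if len(data) != len(key):
        raise ValueError("key and message must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"ATTACK AT DAWN"
key = random_key(len(plaintext))
ciphertext = xor_bytes(plaintext, key)   # encryption
recovered = xor_bytes(ciphertext, key)   # decryption is the same operation
```

The sketch also shows the practical problem mentioned in the text: the key is as long as the message and has to be shared in advance for every single message.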

Although the one-time pad is theoretically secure there are many problems with
its practical use because of the assumptions described above. For these reasons the one-
time pad, while important theoretically, is not used to a great extent in encryption. How-
ever, a stream cipher is a method to attempt to mimic the important properties of the
one-time pad. A stream cipher is a symmetric key cipher where plaintext characters are
combined with a pseudo-random sequence of key digits, called a key stream. In a stream cipher
the plaintext characters are encrypted one at a time and the encryption of successive
characters varies during the encryption.
Stream ciphers require sequences of pseudo-random digits. These are sequences
that behave as if they are random. Here we will discuss a procedure to generate
pseudo-random sequences and hence stream cipher key generation. First we need the concept
of a linear congruence generator. For a given natural number n we denote by ℤn the
ring of integers modulo n. Elements of ℤn are residue classes of integers modulo n. If a
is an integer, we will denote the corresponding residue class in ℤn by ā.

Definition 23.1.3. Let n ∈ ℕ and a, b ∈ ℤn. A bijective map f : ℤn → ℤn given by x ↦
ax + b is called a linear congruence generator.

Notice that the map x ↦ ax + b is bijective if and only if gcd(a, n) = 1. If we choose
a large modulus n, linear congruence generators are used to generate pseudo-random
integers. In using a linear congruence generator f : x ↦ ax + b the integers a, b should be
chosen such that the function f has no fixed point in ℤn. Then b ≠ 0, for otherwise 0 is
a fixed point. Hence, let b ≠ 0. If a = 1, then f has no fixed point, but then the function is
just a linear shift, which is insecure. Therefore, let a ≠ 1. Then f has a fixed point in ℤn if
gcd(a − 1, n) = 1, because then there exists a d ∈ ℤ with d(a − 1) ≡ 1 (mod n), and then
x = −db is a fixed point in ℤn. Therefore, altogether for a linear congruence generator we
should choose a and b such that gcd(a − 1, n) > 1, a ≠ 1, and b ≠ 0.
Using the idea of a linear congruence generator, we now give a procedure for the
generation of a stream cipher.
1. Choose a seed s ∈ ℤ by key agreement or as a random number.
2. Let n ∈ ℕ, a, b ∈ ℤ and f : ℤn → ℤn, x ↦ ax + b be a linear congruence generator.
Define the sequence x̄0 = s̄, x̄1 = f(x̄0), x̄2 = f(x̄1), . . . in ℤn.
3. Transform the sequence of plaintext units into a sequence of residue classes
m0, m1, . . . in ℤn.
4. Encrypt the mi into ci = mi + x̄i ∈ ℤn. The secret key is s̄ ∈ ℤn.

We give the following remarks.
1. The integer n should be very large and the residue classes should occur with the
same probability. Further the function f should not have a fixed point. To accomplish
this we must choose f and s̄ ∈ ℤn such that the period length of the sequence
x̄0, x̄1, . . . is as large as possible. Best would be the maximal length n.
2. If we know sufficiently many plaintext units which follow each other and we know
the linear congruence generator used, then we may calculate s.
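The four steps of the procedure can be sketched as follows. The parameters n = 256, a = 5, b = 3 and the seed 42 are toy values chosen here for illustration, message units are taken to be bytes, and the function names are hypothetical. As the second remark indicates, such a key stream is not secure against an attacker who knows enough plaintext.

```python
def lcg_stream(seed, a, b, n):
    """Yield x_0 = seed mod n, x_1 = f(x_0), ... for f(x) = (a*x + b) mod n."""
    x = seed % n
    while True:
        yield x
        x = (a * x + b) % n

def stream_encrypt(units, seed, a, b, n):
    """Step 4: c_i = m_i + x_i in Z_n."""
    return [(m + x) % n for m, x in zip(units, lcg_stream(seed, a, b, n))]

def stream_decrypt(units, seed, a, b, n):
    """Inverse of step 4: m_i = c_i - x_i in Z_n."""
    return [(c - x) % n for c, x in zip(units, lcg_stream(seed, a, b, n))]

# Toy parameters (assumptions, far too small for real use).
N, A, B, SEED = 256, 5, 3, 42

message = list(b"STREAM")                       # step 3: units in Z_256
cipher = stream_encrypt(message, SEED, A, B, N)
decoded = stream_decrypt(cipher, SEED, A, B, N)
```

Sender and receiver only need to share the seed and the generator; both regenerate the same key stream x̄0, x̄1, . . . and add or subtract it unit by unit.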

Theorem 23.1.4 (Maximal period length for n ≥ 2). Let n ∈ ℕ with n = 2^m, m ≥ 1, and let
a, b ∈ ℤ such that f : ℤn → ℤn, x ↦ ax + b is a linear congruence generator. Further let
s ∈ {0, 1, . . . , n − 1} be given, x̄0 = s̄, x̄1 = f(x̄0), . . . . Then the sequence x̄0, x̄1, . . . is periodic
with the maximal period length n = 2^m if and only if the following holds:
(1) a is odd.
(2) If m ≥ 2 then a ≡ 1 (mod 4).
(3) b is odd.

Proof. We show that (1), (2), and (3) hold if the period length is maximal. First we must
have gcd(a, n) = 1 since f is a linear congruence generator, so a is odd. Further f has no
fixed point because the period length is maximal. We show that a ≡ 3 (mod 4) is not
possible if m ≥ 2. Suppose that a ≡ 3 (mod 4) and m ≥ 2.
Then a + 1 ≡ 0 (mod 4) and it follows that

\[ 1 + a + a^2 + \cdots + a^{2i-1} = (1 + a^2 + a^4 + \cdots + a^{2i-2})(1 + a) \equiv 0 \pmod 4. \tag{$*$} \]

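We now consider

\[ \overline{x}_{i+1} - \overline{x}_i = f(\overline{x}_i) - f(\overline{x}_{i-1}) = (a\overline{x}_i + b) - (a\overline{x}_{i-1} + b) = a(\overline{x}_i - \overline{x}_{i-1}) \]

for i ≥ 1. It then follows recursively that

\[
\begin{aligned}
\overline{x}_k - \overline{x}_0 &= (\overline{x}_k - \overline{x}_{k-1}) + (\overline{x}_{k-1} - \overline{x}_{k-2}) + \cdots + (\overline{x}_1 - \overline{x}_0) \\
&= a^{k-1}(\overline{x}_1 - \overline{x}_0) + a^{k-2}(\overline{x}_1 - \overline{x}_0) + \cdots + (\overline{x}_1 - \overline{x}_0) \\
&= (\overline{x}_1 - \overline{x}_0)(1 + a + a^2 + \cdots + a^{k-1})
\end{aligned}
\]

for k ≥ 1. Therefore

\[ x_{2i} \equiv x_0 \pmod 4 \quad \text{and} \quad x_{2i+1} \equiv x_1 \pmod 4 \]

from relation (∗) above.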


Hence half of the elements in the sequence have the same residue class as x0 mod-
ulo 4 and the other half the same as x1 modulo 4 which gives a contradiction to the
maximality of the period length. Therefore a ≡ 1 (mod 4) if m ≥ 2. To show (3) notice
that in a sequence with maximal period length the residue class 0 must occur.
Hence, without loss of generality, we may assume that x̄0 = 0. Then x̄1 = b̄ and
recursively we have

\[ \overline{x}_k = (1 + a + \cdots + a^{k-1})\,b \]

for k ≥ 1 since x̄0 = 0 and x̄1 = b̄. All elements in the sequence are multiples of b̄. There
is an x̄i = 1 and therefore b̄ is invertible in ℤn and hence b is odd.
Now, assume that (1), (2), and (3) are satisfied. The theorem follows directly if n = 2
since then if x̄0 = 0 we have x̄1 = 1 and if x̄0 = 1 we have x̄1 = 0. Now suppose that m ≥ 2,
so that n ≥ 4. We show that we obtain the maximal length n = 2^m for x̄0 = 0, which
proves the theorem.
Let x̄0 = 0. Then as before we obtain recursively x̄k = (1 + a + ⋅ ⋅ ⋅ + a^(k−1))b for k ≥ 1.
Since b is odd we have x̄k = 0 if and only if 1 + a + ⋅ ⋅ ⋅ + a^(k−1) = 0 in ℤn.
We write k = 2^r t with r ≥ 0 and t odd. Then

\[ 1 + a + \cdots + a^{k-1} = (1 + a + \cdots + a^{2^r - 1})\,(1 + a^{2^r} + (a^{2^r})^2 + \cdots + (a^{2^r})^{t-1}). \]

The second factor is a sum of t odd numbers with t odd, so it is congruent to 1 modulo 2.
Hence 2^m | (1 + a + ⋅ ⋅ ⋅ + a^(k−1)) if and only if 2^m | (1 + a + ⋅ ⋅ ⋅ + a^(2^r −1)). The integer
1 + a + ⋅ ⋅ ⋅ + a^(2^r −1) is divisible by 2^r but not by 2^(r+1): it equals the product
(1 + a)(1 + a^2)(1 + a^4) ⋅ ⋅ ⋅ (1 + a^(2^(r−1))) of r factors, and each factor is congruent to 2
modulo 4 because a ≡ 1 (mod 4). It follows that 2^m divides 1 + a + ⋅ ⋅ ⋅ + a^(k−1) if and only
if r ≥ m, that is, if and only if 2^m | k.
Therefore x̄k = 0 occurs for k ≥ 1 for the first time when k = n = 2^m.
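Theorem 23.1.4 can be confirmed by brute force for small exponents m. The following Python sketch does this for m = 1, . . . , 6; the function names are hypothetical, and only odd values of a are tested because the map x ↦ ax + b is bijective on ℤn with n = 2^m exactly for odd a.

```python
from itertools import product

def period_from_zero(a, b, n):
    """Smallest k >= 1 with x_k = 0, where x_0 = 0 and x_{i+1} = (a*x_i + b) mod n.

    Assumes gcd(a, n) = 1, so the map is a bijection and 0 is reached again.
    """
    x, k = b % n, 1
    while x != 0:
        x = (a * x + b) % n
        k += 1
    return k

def satisfies_conditions(a, b, m):
    """Conditions (1), (2), (3) of Theorem 23.1.4 for n = 2**m."""
    return a % 2 == 1 and b % 2 == 1 and (m < 2 or a % 4 == 1)

def check_theorem(m):
    """Compare 'maximal period' with the three conditions for all odd a and all b."""
    n = 2 ** m
    for a, b in product(range(1, n, 2), range(n)):
        has_full_period = period_from_zero(a, b, n) == n
        if has_full_period != satisfies_conditions(a, b, m):
            return False
    return True

results = {m: check_theorem(m) for m in range(1, 7)}
```

Since a sequence of maximal period length runs through all residue classes, it suffices to start at x̄0 = 0, as in the proof. For example, a = 5, b = 3 gives the full period 256 modulo 256, whereas a = 3, b = 1 returns to 0 after only 4 steps modulo 8.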

We now describe some of the current public key cryptosystems. We start with the
RSA cryptosystem named after R. Rivest, A. Shamir, and L. Adleman.

23.1.1.3 RSA Cryptosystem


Alice chooses two distinct primes p and q and computes the product n = pq; n must be
chosen large enough. For the Euler φ-function we have

\[
\begin{aligned}
\varphi(n) &= |\{a \in \mathbb{N} \mid 1 \le a \le n \text{ and } \gcd(a, n) = 1\}| \\
&= pq - p - q + 1 \\
&= (p - 1)(q - 1).
\end{aligned}
\]

Now Alice computes two numbers e, s ≥ 3 such that es ≡ 1 (mod φ(n)). The number s
should be large; otherwise, the private key (n, s) is insecure due to an attack by Wiener,
see [104]. Assume that the plaintext message is given by an integer x ∈ {0, 1, . . . , n − 1}.
The public key is the pair (n, e), and the encryption is done by x ↦ x^e (mod n). Alice
decrypts by y ↦ y^s (mod n).
Now, let y = x^e (mod n). If es = 1 + (p − 1)k, then

\[ y^s \equiv x^{es} \equiv x \cdot (x^{p-1})^k \equiv \begin{cases} 0 & \text{if } p \mid x, \\ x \cdot 1^k & \text{otherwise} \end{cases} \equiv x \pmod p \]

by Fermat's Little Theorem.
Analogously, y^s ≡ x (mod q). In other words, both p and q divide y^s − x. Since p and
q are coprime, n = pq divides y^s − x, and hence we have y^s ≡ x (mod n). In particular,
x = y^s mod n if x ∈ {0, 1, . . . , n − 1}. Thus, every encrypted message is decrypted correctly.
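The correctness argument can be tried out with toy numbers. The sketch below assumes the small primes p = 61, q = 53 and the public exponent e = 17, which are illustrative values only, and uses hypothetical function names. The modular inverse is computed with Python's built-in pow, which accepts a negative exponent together with a modulus from Python 3.8 on.

```python
from math import gcd

def make_keys(p, q, e):
    """Return the public key (n, e) and the private key (n, s) with e*s = 1 mod phi(n)."""
    n, phi = p * q, (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    s = pow(e, -1, phi)  # modular inverse of e modulo phi(n)
    return (n, e), (n, s)

def encrypt(x, public_key):
    """x -> x^e mod n."""
    n, e = public_key
    return pow(x, e, n)

def decrypt(y, private_key):
    """y -> y^s mod n."""
    n, s = private_key
    return pow(y, s, n)

# Toy parameters (assumptions, far too small to be secure).
public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
plaintext = decrypt(ciphertext, private_key)
```

Decryption also works for messages x that are divisible by p or q, which corresponds to the case p | x in the computation above.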
The security certificate of the RSA cryptosystem is based on the assumption that
factoring large numbers into prime factors is difficult. It is not known
how difficult the factorization problem really is. It is possible that there exists an easy
solution to the factorization problem that is not yet known. At the present time we can
say that the factorization problem is in the complexity class NP.
Recall that a mathematical problem Π belongs to NP if there exists a polynomial
time algorithm which can check whether a proposed solution is correct or not. The factorization
problem for an integer $n \ge 1$ is in NP because it can be checked with the division algorithm
whether a proposed divisor is or is not a divisor of $n$. Finding a divisor by testing
candidates one after another, however, requires a number of tests that is exponential in the
bit length of $n$. We now discuss the ElGamal encryption.

23.1.1.4 ElGamal Encryption


The basic scheme for an ElGamal encryption system is the following. All users agree on
a common large prime $p$ and a generator $g$ for the cyclic group $\mathbb{Z}_p^* = \mathbb{Z}_p \setminus \{0\}$, the unit
group within $\mathbb{Z}_p$. Given a large prime $p$ there is a fixed efficiently invertible procedure
to encode plaintext into residue classes within $\mathbb{Z}_p^*$. Each
user's public key is $(p, g, A)$ where $A = g^a$ for some integer $a$ that the user keeps secret.
The encryption works as follows. Suppose that Bob wants to send a message to Alice.
Alice's public key is $(p, g, A)$ as above. The message is $m$, and as above, is encoded in
some workable efficient manner within $\mathbb{Z}_p^*$, that is, the message is encoded in a manner
known to all users as an integer in $\{0, 1, \ldots, p - 1\}$. Bob now randomly chooses an
integer $b$ and computes $B = g^b$. He now sends to Alice $(B, mC)$ where $C = A^b = g^{ab}$. To decrypt,
Alice first uses $B$ to determine the common shared key $C$. Since $B = g^b$, and she knows
her secret exponent $a$, she can compute $C = B^a = g^{ab}$ modulo $p$. Hence, she can compute the inverse $C^{-1} = g^{-ab}$
to obtain the message $m = (mC)C^{-1}$.
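As a rough illustration of the scheme just described, here is a minimal sketch. The toy parameters p = 23 and g = 5 and all function names are assumptions made for this example; the encoding of plaintext into residue classes is skipped, so messages are taken directly as integers in {1, ..., p - 1}.

```python
import secrets

def elgamal_keygen(p, g):
    """Return (a, A): secret exponent a and public value A = g^a mod p."""
    a = secrets.randbelow(p - 3) + 2  # 2 <= a <= p - 2
    return a, pow(g, a, p)

def elgamal_encrypt(m, p, g, A):
    """Bob's side: return the pair (B, m*C) with B = g^b and C = A^b."""
    b = secrets.randbelow(p - 3) + 2
    B = pow(g, b, p)
    C = pow(A, b, p)
    return B, (m * C) % p

def elgamal_decrypt(B, mC, p, a):
    """Alice's side: recover C = B^a and multiply by its inverse."""
    C = pow(B, a, p)
    return (mC * pow(C, -1, p)) % p

# Toy parameters (assumed): 5 generates the unit group modulo 23.
p, g = 23, 5
a, A = elgamal_keygen(p, g)
B, mC = elgamal_encrypt(17, p, g, A)
assert elgamal_decrypt(B, mC, p, a) == 17
```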
The security certificate of the ElGamal cryptosystem is based on the difficulty of the
Computational Diffie–Hellman problem (CDH) for $\mathbb{Z}_p^*$: given a prime $p$, a generator $g$ of
$\mathbb{Z}_p^*$, $g^a$ modulo $p$ and $g^b$ modulo $p$, determine $g^{ab}$ modulo $p$. The CDH can be
formulated for each cyclic group $G = \langle g \rangle$: it is the problem of finding $g^{ab}$ from the two
elements $g^a$ and $g^b$. At present, the only known way to solve the CDH is to solve the discrete
logarithm problem (DLP): for $G = \langle g \rangle$ being a cyclic group and $h \in G$, find $a \in \mathbb{Z}$ such
that $h = g^a$.
The DLP appears to be very hard for large orders $|G|$ of $G$. Solving the DLP for $\mathbb{Z}_p^*$
breaks the ElGamal cryptosystem, as does solving the CDH. It is not known whether the
CDH can be solved without solving the DLP. The ElGamal encryption is the basis
for elliptic curve cryptography, which we discuss briefly.

23.1.1.5 Elliptic Curve Cryptography


A very powerful approach which has wide ranging applications in cryptography is to
use elliptic curves. If $K$ is a finite field of characteristic not equal to 2 or 3 then an elliptic
curve over $K$ (in Weierstrass form) is the locus of points $(x, y) \in K \times K$ satisfying the
equation $y^2 = x^3 + ax + b$ with $4a^3 + 27b^2 \ne 0$. We denote by $\mathcal{O}$ a single point at infinity
and let

\[ E(K) = \{(x, y) \in K \times K : y^2 = x^3 + ax + b\} \cup \{\mathcal{O}\}. \]

The important thing about elliptic curves from the viewpoint of cryptography is that a
group structure can be placed on E(K). In particular, we define the operation + on E(K)
by:
1. 𝒪 + P = P for any point P ∈ E(K).
2. If P = (x, y), then −P = (x, −y) and −𝒪 = 𝒪.
3. P + (−P) = 𝒪 for any point P ∈ E(K).
4. If $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ such that $P_1 \ne -P_2$, then $P_1 + P_2 = (x_3, y_3)$ with
   $x_3 = m^2 - (x_1 + x_2)$ and $y_3 = -m(x_3 - x_1) - y_1$, where $m = \frac{y_2 - y_1}{x_2 - x_1}$
   if $x_2 \ne x_1$ and $m = \frac{3x_1^2 + a}{2y_1}$ if $x_2 = x_1$.
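The four rules above translate almost line by line into code when K = ℤp. The following is a sketch under stated assumptions: the point at infinity is represented by `None`, the function names are hypothetical, and the sample curve y² = x³ + x + 6 over ℤ11 with the point (2, 7) is borrowed from Exercise 9(b) purely for illustration.

```python
O = None  # representation of the point at infinity (an assumption of this sketch)

def ec_neg(P, p):
    """Rule 2: -(x, y) = (x, -y) and -O = O."""
    if P is O:
        return O
    x, y = P
    return (x, (-y) % p)

def ec_add(P1, P2, a, p):
    """Add two points of y^2 = x^3 + a*x + b over Z_p (b is not needed)."""
    if P1 is O:          # rule 1
        return P2
    if P2 is O:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:   # rule 3: P + (-P) = O
        return O
    if x1 != x2:         # rule 4, chord
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:                # rule 4, tangent
        m = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (-m * (x3 - x1) - y1) % p
    return (x3, y3)

# Sample data (assumed): y^2 = x^3 + x + 6 over Z_11, P = (2, 7).
a, p = 1, 11
P = (2, 7)
two_P = ec_add(P, P, a, p)
three_P = ec_add(two_P, P, a, p)
```

Modular inverses replace the divisions in the formula for m; `pow(v, -1, p)` requires Python 3.8 or later.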

This operation has a very nice geometric interpretation if $K = \mathbb{R}$. It is known as the
chord and tangent method. If $P_1 \ne P_2$ are two points on the curve then the line through
$P_1$ and $P_2$ intersects the curve at another point $P_3$. If we reflect $P_3$ through the x-axis we
get $P_1 + P_2$. If $P_1 = P_2$ we take the tangent line at $P_1$. With this operation $E(K)$ becomes
an Abelian group. A very detailed proof can be found in [6]. The structure of the group
can be worked out.

Theorem 23.1.5. If $K$ is a finite field of order $p^k$ then the group $E(K)$ is either cyclic or is
isomorphic to $\mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2}$ with $m_1 \mid m_2$ and $m_1 \mid (p^k - 1)$.

A proof of this result is given in [60].


We now consider the case $K = \mathbb{Z}_p$, $p \ge 5$, and write $a$ and $b$ instead of $\bar{a}$ and $\bar{b}$
for the residue classes in $\mathbb{Z}_p$. Let $f(x) = x^3 + ax + b$. We have $p$ elements $x$ in $\mathbb{Z}_p$. If
$f(x) = 0$, then we have exactly one point $(x, 0)$ in $E(\mathbb{Z}_p)$. If $f(x)$ is a nonzero square in
$\mathbb{Z}_p$, that is, $f(x)^{(p-1)/2} = 1$, then for $x$ there are two points $(x, y)$ and $(x, -y)$ in $E(\mathbb{Z}_p)$. If
$f(x)$ is not a square in $\mathbb{Z}_p$, then for $x$ there is no point in $E(\mathbb{Z}_p)$. Finally we have to add 1
for the element $\mathcal{O}$. Hence, $|E(\mathbb{Z}_p)| = 1 + s + 2t$ where $s$ is the number of $x$ with $f(x) = 0$
and $t$ is the number of $x$ for which $f(x)$ is a nonzero square in $\mathbb{Z}_p$.
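The counting formula can be evaluated directly for small primes. The sketch below uses the criterion f(x)^((p-1)/2) = 1 to detect nonzero squares; the function name `count_points` is hypothetical, and the brute-force loop over all x is only sensible for toy values of p.

```python
def count_points(a, b, p):
    """Return |E(Z_p)| = 1 + s + 2t for y^2 = x^3 + a*x + b, p >= 5 prime."""
    if (4 * a**3 + 27 * b**2) % p == 0:
        raise ValueError("4a^3 + 27b^2 = 0: not an elliptic curve")
    s = 0  # number of x with f(x) = 0
    t = 0  # number of x for which f(x) is a nonzero square
    for x in range(p):
        fx = (x**3 + a * x + b) % p
        if fx == 0:
            s += 1
        elif pow(fx, (p - 1) // 2, p) == 1:
            t += 1
    return 1 + s + 2 * t

# The two curves of Exercise 9, used here only as sample inputs.
order_z5 = count_points(1, 0, 5)     # y^2 = x^3 + x over Z_5
order_z11 = count_points(1, 6, 11)   # y^2 = x^3 + x + 6 over Z_11
```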
We now give a version of Hasse’s Theorem for ℤp , p ≥ 5.

Theorem 23.1.6 (Hasse’s Theorem). Let I = [p+1−2√p, p+1+2√p]∩ℕ. Then there exists
for each k ∈ I at least one elliptic curve with |E(ℤp )| = k.

A proof is given in the book [61].


Efficient probabilistic algorithms to calculate points on
$E(\mathbb{Z}_p) \setminus \{\mathcal{O}\}$ and to construct an injective, efficiently invertible map $\mathcal{M} \to E(\mathbb{Z}_p) \setminus \{\mathcal{O}\}$,
$p \ge 5$ prime, where $\mathcal{M}$ is the set of plain text units, are described in [66]. Using these, we may describe the
elliptic curve public key system for $E(\mathbb{Z}_p)$ as follows:
(1) Choose a large prime $p \ge 5$ and $a, b \in \mathbb{Z}_p$ such that $y^2 = x^3 + ax + b$ is an elliptic
    curve.
(2) Choose an injective efficiently invertible (on the image) map $\rho \colon \mathcal{M} \to E(\mathbb{Z}_p) \setminus \{\mathcal{O}\}$,
    where $\mathcal{M}$ is the set of plain text units.
(3) Choose a point $P \in E(\mathbb{Z}_p) \setminus \{\mathcal{O}\}$.
(4) Choose a secret integer $d \in \mathbb{Z}$ and calculate $dP \in E(\mathbb{Z}_p)$.

The public key is (P, dP) and the elliptic curve itself. The secret key is d.
For encryption, let $m \in \mathcal{M}$ be a plain text message unit. Calculate $Q = \rho(m)$. Choose
a random integer $k$ and define $c = (kP, Q + k(dP)) \in \mathcal{C}$, where $\mathcal{C}$ is the set of cipher text
units. This is the encrypted message unit.
For decryption, let $c = (c_1, c_2) \in \mathcal{C}$ be a ciphertext unit. Calculate $Q = c_2 - dc_1$
and $m = \rho^{-1}(Q)$, the preimage of $Q$. Recall that $Q \in E(\mathbb{Z}_p) \setminus \{\mathcal{O}\}$ if $Q = \rho(m)$ and $(c_1, c_2) =
(kP, Q + k(dP))$. The elliptic curve public key cryptosystem provides a valid cryptosystem:
if $(c_1, c_2) = (kP, Q + k(dP))$, then $c_2 - dc_1 = Q + k(dP) - d(kP) = Q = \rho(m)$. The security certificate of the elliptic
curve public key cryptosystem is also based on the difficulty of the Computational Diffie–
Hellman problem for $E(\mathbb{Z}_p)$. For this, care should be taken that the discrete logarithm
problem in $E(\mathbb{Z}_p)$ is difficult. Elliptic curve public key cryptosystems are at present the
most important commutative alternatives to the use of the RSA algorithm. There are
several reasons for that: they are more efficient than RSA in many cases, and keys in
elliptic curve systems are much smaller than keys in RSA.
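The encryption and decryption steps can be sketched as follows. This is an illustration under several assumptions: the map ρ is omitted, so the message is taken to be a point Q on the curve already; the toy curve y² = x³ + x + 6 over ℤ11 with P = (2, 7) and the secret d = 7 are sample values; and all function names are hypothetical.

```python
import secrets

O = None  # point at infinity (representation assumed for this sketch)

def add(P1, P2, a, p):
    """Chord and tangent addition on y^2 = x^3 + a*x + b over Z_p."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if x1 != x2:
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:
        m = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (-m * (x3 - x1) - y1) % p)

def neg(P, p):
    return O if P is O else (P[0], (-P[1]) % p)

def mul(k, P, a, p):
    """Compute kP for k >= 0 by double-and-add."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P, a, p)
        P = add(P, P, a, p)
        k >>= 1
    return R

def encrypt(Q, P, dP, a, p, bound):
    """Return c = (kP, Q + k(dP)) for a random k with 1 <= k < bound."""
    k = secrets.randbelow(bound - 1) + 1
    return mul(k, P, a, p), add(Q, mul(k, dP, a, p), a, p)

def decrypt(c1, c2, d, a, p):
    """Return Q = c2 - d*c1."""
    return add(c2, neg(mul(d, c1, a, p), p), a, p)

# Toy data (assumed): the group has order 13 and P = (2, 7) generates it.
a, p = 1, 11
P, d = (2, 7), 7
dP = mul(d, P, a, p)
Q = (10, 9)
c1, c2 = encrypt(Q, P, dP, a, p, bound=13)
assert decrypt(c1, c2, d, a, p) == Q
```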

23.1.2 Cryptographic Protocols

Besides secure confidential message transmission there are many other tasks that are
important in cryptography, both symmetric key and public key. Although it is not entirely
precise, we say that a cryptographic task is one in which one or more people must
communicate with some degree of secrecy. The set of algorithms and procedures needed to
accomplish a cryptographic task is called a cryptographic protocol. A cryptosystem is just
one type of cryptographic protocol. More formally, suppose that several parties want
to manage a cryptographic task. Then they must communicate with each other and
cooperate. Hence, each party must follow certain rules and implement a certain algorithm
that they agreed upon.
We now discuss some cryptographic tasks that we will occasionally refer to in this
book but many more can be found in detail in the book [66].

23.1.2.1 Secret Sharing


Given a secret S, a (t, n)-secret sharing threshold scheme is a cryptographical primitive
in which a secret is split into pieces (shares) and distributed among a collection of n
participants {p1 , . . . , pn } so that any group of t or more participants, with t ≤ n, can
recover the secret. Meanwhile, any group of t−1 or fewer participants cannot recover the
secret. Shamir solved the secret sharing problem in a very simple but beautiful manner
using polynomial interpolation.
The general idea in a Shamir $(t, n)$-secret sharing threshold scheme is the following.
Let $K$ be any field and $(x_1, y_1), \ldots, (x_n, y_n)$ be $n$ points in $K^2$ with pairwise distinct $x_i$.
A polynomial $p(x)$ over $K$ interpolates these points if $p(x_i) = y_i$ for $i = 1, \ldots, n$. The
polynomial $p(x)$ is called the interpolating polynomial for the given points. The crucial
theoretical result is that for any $n$ points $(x_1, y_1), \ldots, (x_n, y_n)$ with distinct $x_i$ there always
exists a unique interpolating polynomial of degree $\le n - 1$.
We now present a more explicit version of the Shamir scheme using the finite field
$K = GF(q)$ where $q = p^k$ with $k \ge 1$, $p$ is a large prime, and $q$ is much larger than $n$.
Let $S$ be the secret. The dealer generates a polynomial $p(x)$ of degree at most $t - 1$ over $K$
as follows:

\[ p(x) = a_0 + a_1 x + \cdots + a_{t-1} x^{t-1} \]

where $a_0 = S$ is the secret and $a_1, \ldots, a_{t-1} \in K$ are chosen at random. The dealer chooses pairwise distinct
$x_i \in K \setminus \{0\}$, $i = 1, \ldots, n$, which are stored in a public area. The dealer calculates $y_i =
p(x_i)$, $i = 1, \ldots, n$, and distributes these values to the $n$ participants via a secure channel so that each
participant $p_i$ gets one share $y_i$.
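For the secret recovery we use the Lagrange interpolation. Suppose that $t$ participants,
say those holding the shares belonging to $(x_1, y_1), \ldots, (x_t, y_t)$, all $x_i \in K \setminus \{0\}$
pairwise distinct, pool their shares. We can construct the Lagrange interpolating polynomial
with respect to these points as

\[ p(x) = \sum_{i=1}^{t} y_i\, l_i(x) \]

where $l_i(x) = \prod_{j=1,\, j \ne i}^{t} \frac{x - x_j}{x_i - x_j}$. Clearly, $p(x)$ is a polynomial of degree at most $t - 1$. In particular,
the secret $a_0$ will be

\[ a_0 = p(0) = \sum_{i=1}^{t} y_i \prod_{j=1,\, j \ne i}^{t} \frac{-x_j}{x_i - x_j}. \]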

This scheme is perfect in the sense that for t − 1 participants any secret S ∈ K is equally
likely.
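The dealing and recovery steps can be sketched for the special case K = GF(q) with q prime. Everything concrete here is an assumption for illustration: the prime q = 101, the public points x_i = 1, ..., n, the secret 42 with threshold (2, 3) as in Exercise 3, and the function names.

```python
import secrets

def make_shares(secret, t, n, q):
    """Deal n shares (x_i, p(x_i)) of a random polynomial of degree <= t-1
    over GF(q), q prime and q > n, with constant term equal to the secret."""
    coeffs = [secret % q] + [secrets.randbelow(q) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):          # public, pairwise distinct, nonzero x_i
        y = 0
        for c in reversed(coeffs):     # Horner evaluation of p(x)
            y = (y * x + c) % q
        shares.append((x, y))
    return shares

def recover(shares, q):
    """Evaluate the Lagrange interpolating polynomial at 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % q
                den = den * (xi - xj) % q
        secret = (secret + yi * num * pow(den, -1, q)) % q
    return secret

q = 101
shares = make_shares(42, t=2, n=3, q=q)
assert recover(shares[:2], q) == 42
```

Any two of the three shares recover the secret, while a single share is consistent with every possible value of a_0.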
We now describe a geometric alternative scheme which depends on the closest vector
theorem. Let $W$ be a real inner product space and $V$ be a subspace of finite dimension $t$.
Suppose that $\{\vec{e}_1, \ldots, \vec{e}_t\}$ is an orthonormal basis of $V$. Note that, given
any basis for the subspace $V$, the Gram–Schmidt orthonormalization procedure can be
used to find an orthonormal basis for $V$. Suppose that $\vec{w} \in W$ is not in $V$. Then the unique
vector $\vec{w}^* \in V$ closest to $\vec{w}$ is given by

\[ \vec{w}^* = \langle \vec{w}, \vec{e}_1 \rangle \vec{e}_1 + \cdots + \langle \vec{w}, \vec{e}_t \rangle \vec{e}_t \]

where $\langle\ ,\ \rangle$ is the inner product on $W$.


We now describe the secret sharing scheme. We start with an inner product space
$W$ of dimension $m$ and an access control group of size $n$. We assume that $m$ is much
greater than $n$. Within $W$ there is a hidden subspace $V$ of dimension $t < n$. The secret
to be shared is given as an element in this hidden subspace, that is, the secret is a
vector $\vec{v} \in V$. The dealer gives the $i$-th participant two vectors $\vec{v}_i$ and $\vec{w}$, where $\vec{v}_i \in V$ and $\vec{w}$ is a vector
in $W \setminus V$ chosen such that $\vec{v} \in V$ is the vector closest to $\vec{w}$.
In general, the vector $\vec{w}$ can be given publicly. The set $\{\vec{v}_1, \ldots, \vec{v}_n\}$ has the property
that any subset of size $t$ is linearly independent. Hence, any subset of size $t$ determines a
basis for $V$. Suppose $t$ valid users get together. They can determine an orthonormal basis
of $V$. Since $\vec{w}$ is given, they can determine $\vec{v}$ by the closest vector theorem and recover
the secret. Given a subset of size less than $t$, the given vectors generate a subspace of $V$ of
dimension less than $t$, and in $W$ there are infinitely many extensions of this subspace to subspaces
of dimension $t$. This implies that determining $V$ with fewer than $t$ elements of a basis has
zero probability.
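A small numerical illustration of the recovery step follows. It is a sketch with assumptions: W = ℝ⁴, a hidden subspace V of dimension t = 2, three sample share vectors, and exact rational arithmetic. To avoid square roots it uses an orthogonal rather than orthonormal basis and divides by ⟨b, b⟩, which yields the same projection as the formula of the closest vector theorem.

```python
from fractions import Fraction
from itertools import combinations

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonal (not normalized) basis of the span of independent vectors."""
    basis = []
    for v in vectors:
        u = [Fraction(c) for c in v]
        for b in basis:
            coef = dot(u, b) / dot(b, b)
            u = [ui - coef * bi for ui, bi in zip(u, b)]
        basis.append(u)
    return basis

def closest_vector(w, vectors):
    """Orthogonal projection of w onto the span of the given vectors."""
    proj = [Fraction(0)] * len(w)
    for b in gram_schmidt(vectors):
        coef = dot(w, b) / dot(b, b)
        proj = [pi + coef * bi for pi, bi in zip(proj, b)]
    return proj

# Sample data (assumed): any two of the three share vectors span V.
share_vectors = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 2, 0, 0)]
w = (3, 5, 7, 2)                       # public vector outside V
recovered = [closest_vector(w, pair) for pair in combinations(share_vectors, 2)]
```

Every pair of participants obtains the same vector, the hidden secret (3, 5, 0, 0), whereas a single share vector only determines a line inside V.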

23.1.2.2 Key Exchange and Key Transport


In a key exchange two people, usually called Bob and Alice, exchange a secret shared
key to be used in some encryption. In a key transport one party transports to another
a secret key that is to be used. We briefly describe the Diffie–Hellman key exchange
protocol.
Bob and Alice choose a large prime $p$ and a generator $g$ of the cyclic multiplicative
group $\mathbb{Z}_p^*$. The element $g$ is public to all. Alice chooses an $a$ with $1 < a < p - 1$. Her public
information or public key is $g^a$ given modulo $p$. This is open to all. Her private information
or (secret) private key is $a$. Bob chooses a $b$ with $1 < b < p - 1$. His public information
or public key is $g^b$ given modulo $p$. This is open to all. His private information or (secret)
private key is $b$.
Communication: The shared secret key is $g^{ab}$. This can be computed easily by both
Bob and Alice using their private keys. Alice knows her private key $a$ and the value $g^b$ is
public from Bob. Hence, she can compute $g^{ab} = (g^b)^a$. The analogous situation holds for
Bob. The security certificate of the Diffie–Hellman key exchange protocol is again the
Computational Diffie–Hellman problem for $\mathbb{Z}_p^*$.
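The exchange can be sketched in a few lines. The parameters p = 23 and g = 5 are toy values assumed for illustration, and the function name `diffie_hellman` is hypothetical.

```python
import secrets

def diffie_hellman(p, g):
    """Run one exchange and return (key computed by Alice, key computed by Bob)."""
    a = secrets.randbelow(p - 3) + 2   # Alice's private key, 1 < a < p - 1
    b = secrets.randbelow(p - 3) + 2   # Bob's private key,   1 < b < p - 1
    A = pow(g, a, p)                   # Alice's public key g^a mod p
    B = pow(g, b, p)                   # Bob's public key   g^b mod p
    key_alice = pow(B, a, p)           # (g^b)^a
    key_bob = pow(A, b, p)             # (g^a)^b
    return key_alice, key_bob

key_alice, key_bob = diffie_hellman(23, 5)
assert key_alice == key_bob
```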

23.1.2.3 Authentication Protocols and Zero-Knowledge Proof Protocols


There are two more important cryptographic protocols which are discussed in detail in
[66] and also to some extent in Chapter 24 on noncommutative group based cryptogra-
phy: the authentication protocols and the zero-knowledge proof protocols.
When a confidential message is transmitted there are several aspects that must be
verified. First, there must be a verification to the receiver that the sender is who he
claims to be. Secondly, there must be a verification to the sender that the receiver is also
who he claims to be. Next there should be a verification that the message has not been
altered in any way. Finally, there should be in many message transmissions some form of
undeniability, that is, a procedure that makes it impossible for the sender to deny having
sent the message. All of these verifications are handled by an authentication protocol. In
Section 24.5 we discuss a password-authentication protocol using combinatorial group
theory.
Now, a zero-knowledge proof protocol is a method by which one party (the prover)
can prove to another party (the verifier) that a given statement is true while the prover
avoids conveying any additional information apart from the fact that the statement is
indeed true. The essence of zero-knowledge proofs is that it is trivial to prove that one
possesses knowledge of certain information by simply revealing it; the challenge is to
prove such possession without revealing the information itself or any additional infor-
mation. For a classical prototype of a zero-knowledge proof we mention the Ali Baba
cave problem with a magic secret door, see [97].

Exercises
1. Let F: ℤ24 → ℤ24 be given by x ↦ 5x + 3. Calculate the period length for $x_0 = 0$.
2. We use the standard allocation A = 01, B = 02, . . . , Z = 26. Calculate the plaintext
number M for the plaintext message ‘Louisa is born on Christmas Day.’
3. Distribute the secret 42 using the Shamir secret sharing scheme evenly among three
people such that any two can put together the secret.
4. The company Ruin Invest has two directors, seven department managers, and 87
further employees. A valuable customer file is protected by a secret key. Develop a
procedure for distributing the information about the key among the following groups of
authorized people:
(1) both directors,
(2) one director and all seven department managers together, and
(3) one director, at least four department managers, and also at least 11 employees.
5. Given are prime numbers p and q with q < p and n = pq. For an RSA cryptosystem
assume that p − q is very small. Show that n can be factorized using the following
procedure:
(1) Let $t \in \mathbb{N}$ be the smallest number with $t \ge \sqrt{n}$.
(2) If $t^2 - n$ is a square, that is, $t^2 - n = s^2$ for some $s \in \mathbb{N}$, then $p = t + s$ and $q = t - s$
    provides the factorization.
(3) Otherwise replace $t$ by $t + 1$ and go back to (2).
Use the procedure to factorize n = 9898828507.
6. Let (n, e) = (2047, 179) be the public RSA key. A plaintext alphabet has the 26 letters
A, B, . . . , Z and the empty sign 0 between words. The plaintext message c with 0 between
words will be subdivided into double blocks, padded with 0 at the end if necessary.
By the assignment A ↦ 00, B ↦ 01, . . . , Z ↦ 25, 0 ↦ 26 each double block gives a
block with 4 digits. We consider the four-digit numbers as residue classes modulo
2047. Encryption with the public key (2047, 179) gives the ciphertext message 1054,
92, 1141, 1571, 92, 832 in the form of residue classes modulo 2047.
(a) Break the encryption by factoring 2047.
(b) Why is the number 2047, besides its small size, a particularly unfavorable
choice?
Is it possible to break the encryption without factoring 2047?
7. Alice and Bob agree on the following public key cryptosystem:
(1) Alice chooses $a, b \in \mathbb{Z}$ with $ab \ne 1$ and calculates $M = ab - 1$. Then Alice chooses
two integers $a', b'$ and calculates $e = a'M + a$ and $d = b'M + b$. She then calculates
$n = \frac{ed - 1}{M}$.
(2) Alice publishes the pair (n, e). The secret key is d.
(3) Bob wants to send a message $m \in \{0, 1, \ldots, n - 1\}$ to Alice.
He calculates the product $c \equiv e \cdot m \pmod{n}$ and sends $c$ to Alice.
(4) She decrypts the message by calculating the product $c \cdot d$ modulo $n$.
Show that this is a valid cryptosystem, that is, Alice gets the message.

8. Show that breaking the ElGamal encryption scheme and breaking the Diffie–
Hellman key exchange protocol are equally difficult.
9. (a) Let $K = \mathbb{Z}_5$ and $y^2 = x^3 + x$. This equation defines an elliptic curve over $\mathbb{Z}_5$.
Show that $E(\mathbb{Z}_5) \cong \mathbb{Z}_2 \times \mathbb{Z}_2$.
(b) Let $K = \mathbb{Z}_{11}$ and $y^2 = x^3 + x + 6$ be a curve over $\mathbb{Z}_{11}$. Show that $y^2 = x^3 + x + 6$ is
an elliptic curve over $\mathbb{Z}_{11}$ and that $E(\mathbb{Z}_{11})$ is cyclic of order 13.
10. Determine all possible groups E(ℤ5 ) for elliptic curves over ℤ5 . Give all possible
orders for a group E(ℤ5 ).
24 Non-Commutative Group Based Cryptography
24.1 Group Based Methods
The public key cryptosystems and public key exchange protocols that we have discussed,
such as the RSA algorithm, or the Diffie–Hellman, ElGamal and elliptic curve meth-
ods, are number theory based, and thus depend on the structure of Abelian groups. As
computing machinery has gotten stronger, and computational techniques have become
more sophisticated and improved, there have been successful attacks on both RSA and
Diffie–Hellman for smaller and specialized parameters (RSA and Diffie–Hellman mod-
uli). Furthermore, there exist quantum algorithms that specifically break both RSA and
Diffie–Hellman. As a consequence, if and when a workable quantum computer is
realized, these cryptographic methods will have to be altered.
Because of these attacks there is a feeling that these number theoretic techniques
are theoretically susceptible to attack. Somehow the relatively simple structure of
Abelian groups opens up the possibility of weaknesses in cryptographic protocols. As
a result there has been an active line of research to develop cryptosystems and key
exchange protocols using noncommutative cryptographic platforms which is called
noncommutative algebraic cryptography. Since most of the cryptographic platforms are
groups this is also known as group based cryptography.
The main sources for non-Abelian groups are combinatorial group theory and linear
group theory, that is, matrix groups. Braid group cryptography, where encryption is
done within the classical braid groups, is one prominent example. The one-way functions
in braid group systems are based on the difficulty of solving group theoretic decision
problems such as the conjugacy problem and the conjugator search problem. Recall
that a one-way function is a function which is easy to compute but very hard to invert.
Although braid group cryptography had initial spectacular success, various potential
attacks have been identified. Borovik, Myasnikov, Shpilrain, see [70], and others have
studied the statistical aspects of these attacks and have identified what is termed black
holes in the platform groups, the outsides of which present cryptographic problems.
The extension of the cryptographic ideas to noncommutative platforms involves the
following ideas:
1. general algebraic techniques for developing cryptosystems;
2. potential algebraic platforms (specific groups, rings, etc.) for implementing the tech-
niques; and
3. cryptanalysis and security analysis of the resulting systems.

The basic idea in using combinatorial group theory for cryptography is that elements
of groups can be expressed as words in some alphabet. If there is an easy method to
rewrite group elements in terms of these words, and further the technique used in this
rewriting process can be supplied by a secret key, then a cryptosystem can be created.

https://doi.org/10.1515/9783111142524-024

In Section 14.7 we discussed group presentations and fundamental group decision
problems. Given a group $G$ there exists a presentation $G = \langle X; R \rangle$ and vice versa. We
recall that the three fundamental group decision problems by Dehn, that is, the word
problem, the conjugacy problem, and the isomorphism problem, have negative answers
in general but have simple and elegant solutions for finitely generated free groups.
These three problems are only the basic decision problems and other algorithmic
problems concerning presentations can be considered. The conjugacy problem asks to
algorithmically determine if two elements given in terms of the generators are conju-
gate. The conjugator search problem asks: given a group presentation for G and two
elements $g_1, g_2$ in $G$ that are known to be conjugate, determine algorithmically a conjugator,
that is, an element $h$ such that $h^{-1} g_1 h = g_2$. It is known, as with the conjugacy
problem itself, that the conjugator search problem is undecidable in general.
There are several other group theoretical decision problems. We just mention two.
For a subgroup H of a group G, where H has generating set {x1 , . . . , xn } ⊂ H, the mem-
bership problem asks whether a given element g ∈ G lies in H, and the constructional
membership problem asks whether a given element g ∈ G lies in H, and if so, how to ex-
press g as a word in the generators x1 , . . . , xn . Michailova, see [38], showed that in general
the constructional membership problem is undecidable for infinite matrix groups, also
see [39].
The second is the root extraction problem in a group $G$. Given an element $g \in G$,
and a number $k \in \mathbb{N}$, find an $h \in G$ such that $h^k = g$. Many cryptosystems such as
authentication schemes and digital signatures are based on the root extraction problem.
We mention that the root extraction problem is solvable in free groups.
The computational difficulty of solving various group decision problems will play
the role of a hard problem used to construct a one-way function in several non-Abelian
group based cryptosystems.
The book [93] by Myasnikov, Shpilrain and Ushakov has discussions of the complex-
ity of many of these group decision problems.
If a cryptographic protocol is based on an algebraic object, e. g., group, ring, lat-
tice, or finite field, then this object is called the (cryptographic) platform. In group based
cryptography this is then a platform group for the cryptographic protocol. The security
of the cryptographic protocol is then dependent upon the difficulty, computational or
theoretic, of solving a group theoretic problem within the platform group.
To be a reasonable platform group for a group based cryptographic protocol, a
group G must possess certain properties that make the protocol both efficient to im-
plement and secure.
We assume that the group G has a finite presentation

G = ⟨X; R⟩ = ⟨x1 , . . . , xn ; r1 = ⋅ ⋅ ⋅ = rm = 1⟩

and that the protocol security is based on a group theoretic problem that we denote by 𝒫 .
The first necessity is that there is an efficient way to uniquely represent and then multi-
ply the elements of G. In most cases this requires a normal form for elements g ∈ G, that
is, a unique representation in terms of the generators {x1 , . . . , xn }. In particular, reduced
words provide normal forms for elements of free groups. Normal forms provide an ef-
fective method of disguising group elements. Without this, one can determine a secret
key simply by inspection of group elements. The existence of a normal form in a group
implies a solvable word problem, which is also essential for these protocols. For $g \in G$ we
will denote its normal form, in terms of the set of generators $X$, by $NF_X(g)$.
To be useful in cryptography, given g ∈ G, expressed as a word in x1 , . . . , xn , the
process of moving between the word and the unique normal form must be efficiently
computable. Usually we require at most polynomial time in the input length of g.
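For free groups the passage from a word to its normal form is simply free reduction, which runs in linear time. The sketch below assumes a hypothetical encoding in which the integer i stands for the generator x_i and -i for its inverse.

```python
def free_reduce(word):
    """Return the reduced word (normal form in a free group).

    word is a list of nonzero integers: i means x_i, -i means x_i^{-1}.
    """
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()            # cancel x x^{-1} or x^{-1} x
        else:
            stack.append(letter)
    return stack

# x1 x2 x2^{-1} x1^{-1} x3 reduces to x3.
reduced = free_reduce([1, 2, -2, -1, 3])
```

Two words represent the same element of the free group exactly when their reduced forms coincide, which also settles the word problem in this case.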
In addition to the platform group having normal forms, ideally, it would also exhibit
exponential growth. That is, the growth function for $G$, $\gamma \colon \mathbb{N} \to \mathbb{R}$, defined by $\gamma(n) =
\#\{w \in G : l(w) \le n\}$, has an exponential growth rate, also see [93]. In the definition $l(w)$
stands for the minimal number of letters needed to express $w$ as a word in $x_1, \ldots, x_n$.
Exponential growth is a necessity that ensures that the group will provide a large key
space.
Further, the normal form must exhibit good diffusion in determining the normal
forms of products. This means that in finding the normal forms of products it is computationally
difficult to rediscover the factors, that is, if we know $NF_X(g_1 g_2)$ it is computationally
difficult to discover $g_1, g_2$ or $NF_X(g_1), NF_X(g_2)$.
Other necessities for a platform group depend on the particular protocol. If the secu-
rity is based on the group problem 𝒫 , such as the word problem or conjugacy problem,
we have to assume that in G, the solution to 𝒫 is computationally hard (NP-hard) or un-
solvable. However, what we really want is generically hard, that is, hard on most inputs.
The solution to 𝒫 might be unsolvable but have polynomial average case complexity. In
this case, if care is not taken in choosing the inputs, the solution to 𝒫 is easy and the cryp-
tographic protocol is broken. This does not eliminate a group G as a possible platform
group but indicates that one must take great care in choosing cryptographic inputs.
Among the first attempts to use non-Abelian groups as platforms for public key cryp-
tosystems were the schemes [62] by Anshel, Anshel and Goldfeld, and the schemes [85]
by Ko, Lee et al. The first protocol was developed by I. Anshel, M. Anshel and D. Goldfeld.
The original version of the Ko–Lee protocol was published by K. H. Ko, S. J. Lee, J. H. Han,
J. Kang and C. Park. We will refer to the second protocol as Ko–Lee. Both sets of authors,
at about the same time, proposed using non-Abelian groups and combinatorial group
theory for public key exchange.
The Anshel–Anshel–Goldfeld and Ko–Lee methods can be considered as group theo-
retic analogs of the number theory based Diffie–Hellman method. The basic underlying
idea is the following. If $G$ is a group and $g, h \in G$ we let $g^h$ denote the conjugate of $g$ by $h$,
that is $g^h = h^{-1}gh$. The simple observation is that $(g^{h_1})^{h_2} = g^{h_1 h_2}$. Therefore writing
conjugation in this exponential manner behaves like ordinary exponentiation. From this
straightforward idea one can almost exactly mimic the Diffie–Hellman protocol, now
within a non-Abelian group.
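The exponent law for conjugation can be checked numerically in a concrete non-Abelian group. As an illustrative assumption, the sketch below uses 2 × 2 integer matrices of determinant 1; the helper names and the three sample matrices are hypothetical choices.

```python
def mul(X, Y):
    """Product of two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def inv(X):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = X
    return ((d, -b), (-c, a))

def conj(g, h):
    """The conjugate g^h = h^{-1} g h."""
    return mul(mul(inv(h), g), h)

g = ((1, 1), (0, 1))
h1 = ((0, -1), (1, 0))
h2 = ((1, 1), (1, 2))

# (g^{h1})^{h2} = g^{h1 h2}
assert conj(conj(g, h1), h2) == conj(g, mul(h1, h2))
# The group is not commutative: conjugation really changes g.
assert conj(g, h1) != g
```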
In Section 24.8, we examine the Ko–Lee and Anshel–Anshel–Goldfeld protocols.


Both sets of developers originally suggested using braid groups as the basic and most
appropriate group theoretic platform. Here, we just give a presentation for the braid
group Bn , n ≥ 3, in the form

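\[ B_n = \langle \sigma_1, \ldots, \sigma_{n-1};\ [\sigma_i, \sigma_j] = 1 \text{ if } |i - j| > 1,\ \sigma_{i+1}\sigma_i\sigma_{i+1} = \sigma_i\sigma_{i+1}\sigma_i \text{ for } i = 1, \ldots, n - 2 \rangle \]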

which is now called the Artin presentation. We remark that there are several possibili-
ties for normal forms for elements of Bn , see [24].
We describe both protocols in a most general context, that is, with a general platform
group. This platform group must have a finite presentation with efficiently computable
normal forms, exponential growth, and good diffusion in determining the normal form
of products. For the following Ko–Lee protocol and the Anshel–Anshel–Goldfeld pro-
tocols, the platform group must also contain an abundant collection of subgroups that
commute elementwise and that can be efficiently described.

24.2 Initial Group Theoretic Cryptosystems—The Magnus Method


One of the earliest descriptions of using a non-Abelian group in cryptography appeared
in a paper by Magnus in the early 1970’s, see [89]. This was what is now called a free
group cryptosystem. The seminal idea of using the difficulty of group theory decision
problems in infinite non-Abelian groups as one-way functions in cryptography was first
developed by Magyarik and Wagner in 1985. Neither of these two methods proved suc-
cessful as workable encryption methods yet their introduction ushered in a subsequent
complete theory and other ideas. In this section we describe Magnus’ idea and in the
next subsection the Wagner–Magyarik method.
In [89], Magnus studied rational representations of Fuchsian groups and nonparabolic
subgroups of the classical modular group $M$. Recall that $M = PSL(2, \mathbb{Z})$. That
is, $M$ consists of the $2 \times 2$ projective integral matrices

\[ M = \left\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} : ad - bc = 1,\ a, b, c, d \in \mathbb{Z} \right\}. \]

Equivalently, $M$ can be considered as the set of integral linear fractional transformations
with determinant 1:

\[ z' = \frac{az + b}{cz + d} \quad \text{with } ad - bc = 1 \text{ and } a, b, c, d \in \mathbb{Z}. \]

Theorem 24.2.1 (B. H. Neumann). The matrices

\[ \pm \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad \pm \begin{pmatrix} 1 + 4t^2 & 2t \\ 2t & 1 \end{pmatrix}, \quad t = 1, 2, 3, \ldots, \]

freely generate a free subgroup $F$ of infinite index in $M$. Further, distinct elements of $F$ have
distinct first columns (up to sign). The group $F$ is of infinite rank.

Proof. Without loss of generality we first work in the homogeneous modular group

\[ \Gamma = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1 \right\} = SL(2, \mathbb{Z}). \]

B. H. Neumann, see [40], constructed infinitely many subgroups $N$ of $\Gamma$ with the following
properties:
(i) $N$ contains the matrix $T = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.
(ii) Let $a$ and $c$ be any pair of coprime integers. Then $N$ contains exactly one matrix in
which the first column consists of the ordered pair $(a, c)$.

We remark that Neumann showed that such an $N$ has properties (i) and (ii) if it contains
$T$ and has exactly all the elements $U^n$, $n = 0, \pm 1, \pm 2, \ldots$, as right coset representatives in
$\Gamma$ where $U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.
To prove Theorem 24.2.1 we need neither the whole procedure nor the additional
remark (for the complete construction see [40]). We only carry out the procedure for
the special group given in Theorem 24.2.1. We consider the bijective map $f \colon \mathbb{Z} \to \mathbb{Z}$ given
by $f(f(n)) = n$, $f(0) = 0$, $f(-1) = -1$, and for any positive integer $k$ we have $f(2k) = 2k$,
$f(6k - 1) = -3k - 1$, $f(6k - 3) = -3k$, $f(6k - 5) = 1 - 3k$.
We define the subgroup $N$ generated by the elements

\[ \gamma_n = \begin{pmatrix} n & -1 - nf(n) \\ 1 & -f(n) \end{pmatrix}. \]

We now consider $N$ as a subgroup of the modular group $M$ and use the Reidemeister–Schreier
method in combination with Tietze transformations, see Chapter 14.
We see that $N$ is generated by the elements

\[ \gamma_{-1} \text{ and } \gamma_{2k}, \quad k = 1, 2, 3, \ldots \]

with the defining relations

\[ \gamma_{-1}^2 = \gamma_{2k}^2, \quad k = 1, 2, 3, \ldots. \]

This shows that the elements $A = \gamma_0^{-1}\gamma_{-1}$ and $B_{2k} = \gamma_{2k}\gamma_0^{-1}$, $k = 1, 2, 3, \ldots$, freely generate
a free subgroup $F$ of infinite rank in $N$ using the Reidemeister–Schreier method.
This, in fact, also follows if we consider $F$ acting on the upper half plane.
We have $A = \pm\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ and $B_{2k} = \pm\begin{pmatrix} 1 + 4k^2 & 2k \\ 2k & 1 \end{pmatrix}$, $k = 1, 2, 3, \ldots$. The group $F$ does not
contain any power $U^t$, $t \in \mathbb{Z} \setminus \{0\}$. In fact, all the elements $U^n$, $n = 0, \pm 1, \pm 2, \pm 3, \ldots$, are
right coset representatives of $F$ in $\Gamma$ because $f$ is bijective. If $C = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is any nontrivial element
of $F$, then $c \ne 0$ because no nontrivial power of $U$ is in $F$, and the elements $CU^t \in M$, $t \in \mathbb{Z}$, have
the same first column as $C$, up to the sign, and if $t$ runs through the integers, we get all
elements of $M$ with the same first column. This we can see as follows.
Let $D = \pm\begin{pmatrix} a & g \\ c & h \end{pmatrix}$ be any element of M with the same first column. Then

1 = ad − bc = ah − gc

from the determinant. It follows that a(d − h) = c(b − g). Since gcd(a, c) = 1 we get
c|(d − h), that is, there exists a t ∈ ℤ with ct = d − h, and therefore h = d − ct. We get
with this that ad − bc = a(d − ct) − gc, that is, g = b − at.
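Hence, $D = \pm\begin{pmatrix} a & b-at \\ c & d-ct \end{pmatrix}$. Now consider $CU^{-t}$ ∈ M, t ∈ ℤ, then

$$CU^{-t} = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix} = \pm\begin{pmatrix} a & b-at \\ c & d-ct \end{pmatrix}.$$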

This shows that distinct elements of F have distinct first columns, up to sign.
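
The computations in this argument are easy to check by machine. The sketch below, a small illustration with helper names of our own choosing, recomputes A and B_2k from the matrices γ_0, γ_−1, γ_2k (for which f(n) = n) and confirms for one sample element C = A·B_2 that multiplying on the right by a power of U leaves the first column unchanged.

```python
def mul(x, y):
    """Product of two 2x2 integer matrices given as pairs of rows."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(x):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = x
    return ((d, -b), (-c, a))


def gamma_fixed(n):
    """gamma_n for n in {0, -1, 2k}, where f(n) = n."""
    return ((n, -1 - n * n), (1, -n))


def U_power(t):
    return ((1, t), (0, 1))


A = mul(inv(gamma_fixed(0)), gamma_fixed(-1))


def B(k):
    return mul(gamma_fixed(2 * k), inv(gamma_fixed(0)))


def first_column(x):
    return (x[0][0], x[1][0])


if __name__ == "__main__":
    C = mul(A, B(1))
    print(A, B(1), C)
    print({first_column(mul(C, U_power(-t))) for t in range(-5, 6)})
```

The last line prints a set with a single element, the common first column of all the matrices C·U^(−t) tested.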
Magnus, see [89], had the idea to use this for cryptographic protocols. Since the en-
tries in the generating matrices are positive we can do the following.
Choose a set T1 , . . . , Tn of projective matrices from the set above with n large enough
to encode a desired plaintext alphabet 𝒜. Any message would be encoded by a word
w(T1 , . . . , Tn ) with nonnegative exponents. This represents an element g of F. The two
elements in the first column determine w and therefore g. Receiving w then determines
the message uniquely. Pure free cryptography as Magnus proposed is subject to many
attacks. We will discuss this further in Section 24.3.
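
To make the encoding concrete, here is a minimal Python sketch that uses A and the matrices B_2k as the code matrices. It encodes a sequence of letter indices as a product with nonnegative exponents and recovers the sequence from the first column alone. The greedy decoder is our own illustration and is not taken from [89]. It relies on the observation that a product starting with A has a first column (a, c) with a ≤ c, while a product starting with B_2k has 2kc < a < (2k + 1)c.

```python
def mul(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def generator(i):
    """Index 0 stands for A, index k >= 1 for B_2k."""
    if i == 0:
        return ((1, 1), (1, 2))
    return ((1 + 4 * i * i, 2 * i), (2 * i, 1))


def encode(indices):
    """Multiply the chosen generators and return only the first column."""
    g = ((1, 0), (0, 1))
    for i in indices:
        g = mul(g, generator(i))
    return (g[0][0], g[1][0])


def decode(column):
    """Peel off the leftmost generator until the column (1, 0) is reached."""
    a, c = column
    indices = []
    while c != 0:
        if a <= c:                        # leftmost factor is A
            indices.append(0)
            a, c = 2 * a - c, c - a       # apply A^-1 to the column
        else:                             # leftmost factor is B_2k
            k = (a // c) // 2
            indices.append(k)
            a, c = a - 2 * k * c, (1 + 4 * k * k) * c - 2 * k * a
    return indices


if __name__ == "__main__":
    message = [0, 1, 2, 0, 3]             # letter indices of a toy plaintext
    column = encode(message)
    print(column, decode(column))
```

Only words with nonnegative exponents are handled, exactly as in the description above.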

24.2.1 The Wagner–Magyarik Method

The idea of using the difficulty of group theory decision problems in devising hard one-
way functions for cryptographic purposes was first developed by Magyarik and Wagner
in 1985, see [103]. They devised a public key protocol based on the difficulty of the so-
lution to the word problem. Although this was a seminal idea, their basic cryptosystem
was really unworkable and not secure in the form they presented.
Wagner and Magyarik outlined a conceptual public key cryptosystem based on the
hardness of the word problem for finitely presented groups. At the same time, they gave
a specific example of such a system. González Vasco and Steinwandt, see [78], proved
that their approach is vulnerable to so-called reaction attacks. In particular, for the pro-
posed instance it is possible to retrieve the private key just by watching the performance
of a legitimate recipient.
The general scheme of the Wagner and Magyarik public key cryptosystem is as fol-
lows. Let X be a finite set of generators, and let R and S be finite sets of relators on X.
Consider the two groups G and G0 with presentations

G = ⟨X; R⟩ and G0 = ⟨X; R ∪ S⟩.


The group G0 is then a homomorphic image of G. We assume first that G has a hard word
problem so that the word problem in G is not solvable in polynomial time. We next
assume that the homomorphic image G0 has a word problem solvable in polynomial
time, that is an easy word problem.
Choose two words w0 and w1 which are not equivalent in G0 (and hence not equivalent
in G since G0 is a homomorphic image of G). The public key is the presentation ⟨X; R⟩
and the chosen words w0 and w1 . To encrypt a single bit i ∈ {0, 1}, pick wi and transform it
into a ciphertext word w by repeatedly and randomly applying Tietze transformations
to the presentation ⟨X; R⟩. To decrypt a word w, run the algorithm for the word problem
of G0 in order to decide which of $w_0 w^{-1}$ and $w_1 w^{-1}$ is equivalent to the empty word for the
presentation ⟨X; R ∪ S⟩. The private key is the set S. As pointed out by González Vasco and
Steinwandt, this is not sufficient and Wagner and Magyarik are not clear on this point.
The private key should be a deterministic polynomial-time algorithm for the word
problem of G0 = ⟨X; R ∪ S⟩. Just knowing S does not automatically and explicitly give us an
efficient algorithm (even if such an algorithm exists).
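
The shape of the scheme can be illustrated by a deliberately insecure toy instance. In the Python sketch below all concrete choices are our own assumptions: X = {x, y}, R consists of the single relator [x, y]², S = {[x, y]}, so that G0 is free abelian of rank 2, and w0 = x, w1 = y. Encryption is simplified to inserting relators of G and trivial pairs at random positions, and decryption uses exponent sums, which solve the word problem in G0. Uppercase letters denote inverses.

```python
import random

RELATORS = ["xyXYxyXY"]     # R: public relators of the toy group G
WORDS = {0: "x", 1: "y"}    # public words w0, w1 (inequivalent in G0)


def inverse(word):
    return word[::-1].swapcase()


def encrypt(bit, rounds=20, rng=random):
    """Disguise w_bit by inserting relators of G and trivial pairs."""
    w = WORDS[bit]
    for _ in range(rounds):
        pos = rng.randrange(len(w) + 1)
        if rng.random() < 0.5:
            r = rng.choice(RELATORS)
            piece = r if rng.random() < 0.5 else inverse(r)
        else:
            g = rng.choice("xyXY")
            piece = g + g.swapcase()
        w = w[:pos] + piece + w[pos:]
    return w


def exponent_sums(word):
    return (word.count("x") - word.count("X"),
            word.count("y") - word.count("Y"))


def decrypt(w):
    """Word problem in G0 (free abelian): compare exponent sums."""
    for bit, wi in WORDS.items():
        if exponent_sums(wi + inverse(w)) == (0, 0):
            return bit
    raise ValueError("ciphertext is equivalent to neither w0 nor w1")


if __name__ == "__main__":
    c = encrypt(1)
    print(c, decrypt(c))
```

An eavesdropper can of course compute exponent sums as well, so this instance offers no security; it only shows how the public data ⟨X; R⟩, w0, w1 and the private ability to solve the word problem in G0 fit together.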
Although the Wagner–Magyarik protocol was not workable as a public key system,
the idea opened the door for using similar types of encryption involving group theoretic
decision problems.

24.3 Free Group Cryptosystems


The simplest example of a non-Abelian group based cryptosystem is perhaps a free group
cryptosystem. This can be described in the following manner.
Consider a free group F on free generators x1 , . . . , xr . Then each element g in F has
a unique expression as a reduced word w(x1 , . . . , xr ). Let w1 , . . . , wk , where each wi =
wi (x1 , . . . , xr ), be a set of words in the generators x1 , . . . , xr of the free group F. At the most
basic level, to construct a cryptosystem, suppose that we have a plaintext alphabet 𝒜.
For example, suppose 𝒜 = {a, b, . . . } are the symbols needed to construct meaningful
messages in English. To encrypt, use a substitution cipher

𝒜 ↦ {w1 , . . . , wk }

given by a ↦ w1 , b ↦ w2 , . . . . Then, for a word w(a, b, . . . ) in the plaintext alphabet,
form the free group word w(w1 , w2 , . . . ). This represents an element g in F. Send out g
as the secret message.
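
A minimal Python sketch of this substitution follows. The words chosen here, w_i = x^i y x^(−i) in the free group on x, y, are a hypothetical choice made only because they freely generate a free subgroup and are easy to decode by hand; the text does not prescribe them. Uppercase letters denote inverses.

```python
ALPHABET = "abc"


def free_reduce(word):
    """Cancel adjacent inverse pairs such as xX or Yy."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def code_word(letter):
    """Hypothetical substitution: the i-th letter goes to x^i y x^-i."""
    i = ALPHABET.index(letter) + 1
    return "x" * i + "y" + "X" * i


def encrypt(message):
    return free_reduce("".join(code_word(ch) for ch in message))


def decrypt(word):
    """Each y was contributed by one letter; the x-exponent read so far
    tells which one."""
    exponent, message = 0, ""
    for ch in word:
        if ch == "x":
            exponent += 1
        elif ch == "X":
            exponent -= 1
        else:
            message += ALPHABET[exponent - 1]
    return message


if __name__ == "__main__":
    g = encrypt("cabba")
    print(g, decrypt(g))
```

The decoder works only for this particular family of words; for general generators one needs the rewriting process discussed next.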
In order to implement this scheme we need a concrete representation of g and then
for decryption a way to rewrite g back in terms of w1 , . . . , wk . This concrete representa-
tion is the idea behind homomorphic cryptosystems.
The decryption algorithm in a free group cryptosystem then depends on the
Reidemeister–Schreier rewriting process, see Section 14.4. Let F be a free group on
{x1 , . . . , xn } and let H be a subgroup of F. The Reidemeister–Schreier process allows one
to construct a set of generators w1 , . . . , wk for H by using a Schreier transversal. Further,
given the Schreier transversal from which the set of generators for H was constructed, the
Reidemeister–Schreier rewriting process allows us to algorithmically rewrite an element
of H. Given such an element expressed as a word w = w(x1 , . . . , xn ) in the generators of F,
this algorithm rewrites w as a word w⋆ (w1 , . . . , wk ) in the generators of H.
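
The rewriting process can be sketched in a few lines for a very small example. We assume, purely for illustration, that F is free on x, y and that H is the kernel of the map F → ℤ/2 with x ↦ 1, y ↦ 0, with Schreier transversal {1, x}. The Schreier generators are then y, x² and x y x^(−1). The code reads a word letter by letter, keeps track of the current coset, and emits one Schreier generator (or its inverse) per letter, dropping the trivial ones.

```python
TRANSVERSAL = {0: "", 1: "x"}     # coset of H -> Schreier representative


def free_reduce(word):
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def inverse(word):
    return word[::-1].swapcase()


def step(coset, letter):
    """Right action of a letter on the two cosets: x and X swap them."""
    return coset ^ 1 if letter in "xX" else coset


def schreier_generator(coset, g):
    """t * g * rep(t g)^-1, where t represents the given coset."""
    t = TRANSVERSAL[coset]
    return free_reduce(t + g + inverse(TRANSVERSAL[step(coset, g)]))


def rewrite(word):
    """Rewrite a word of H as a list of (coset, generator, exponent)."""
    coset, factors = 0, []
    for letter in word:
        g = letter.lower()
        if letter.islower():
            factor = (coset, g, 1)
            coset = step(coset, letter)
        else:                         # inverse letter: use the target coset
            coset = step(coset, letter)
            factor = (coset, g, -1)
        if schreier_generator(factor[0], g):
            factors.append(factor)
    if coset != 0:
        raise ValueError("word does not lie in H")
    return factors


def expand(factors):
    """Multiply the Schreier generators back out as a word in x, y."""
    word = ""
    for coset, g, e in factors:
        s = schreier_generator(coset, g)
        word += s if e == 1 else inverse(s)
    return free_reduce(word)


if __name__ == "__main__":
    print(rewrite("xyXyxx"))
    print(expand(rewrite("xyXyxx")))
```

For the word x y x^(−1) y x² the output lists the three factors x y x^(−1), y and x², in that order.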
Pure free group cryptosystems are subject to various attacks and can often be broken
easily. However, a public key free group cryptosystem using a free group representation
in the modular group was developed by Baumslag, Fine and Xu, see [67] and [68].
The most successful attacks on free group cryptosystems are called length based attacks.
The general idea in a length based attack is that an attacker multiplies a word in cipher-
text by a generator to get a shorter word which then could possibly be decoded. We refer
to [76] for more on length based attacks.
Baumslag, Fine and Xu in [67] described the following general encryption scheme
using free group cryptography. A further enhancement was discussed in the paper [68].
We start with a finitely presented group

G = ⟨X; R⟩,

where X = {x1 , . . . , xn }, and a faithful representation

ρ : G → Ḡ.

Here Ḡ can be any one of several different kinds of objects: a linear group, a permutation
group, a power series ring, etc.
We assume that there is an algorithm to re-express an element of ρ(G) in Ḡ in terms
of the generators of G. That is, if g = w(x1 , . . . , xn ) ∈ G, where w is a word in these
generators, and we are given ρ(g) ∈ Ḡ, we can algorithmically find g and its expression
as the word w(x1 , . . . , xn ).
Once we have G, we assume that we have two free subgroups K, H with

H ⊂ K ⊂ G.

We assume that we have fixed Schreier transversals for K in G and for H in K both of
which are held in secret by the communicating parties Bob and Alice. Now based on the
fixed Schreier transversals we have sets of Schreier generators constructed from the
Reidemeister–Schreier process for K and for H:

k1 , . . . , km , . . . for K
and
h1 , . . . , ht , . . . for H.

Notice that the generators for K will be given as words in x1 , . . . , xn , the generators
of G, while the generators for H will be given as words in the generators k1 , k2 , . . . for K.
We note further that H and K may coincide and that H and K need not in general be free
but only have a unique set of normal forms so that the representation of an element in
terms of the given Schreier generators is unique.
We will encode within H, or more precisely within ρ(H). We assume that the number
of generators for H is larger than the number of characters in our plaintext alphabet.
Let 𝒜 = {a, b, c, . . . } be our plaintext alphabet. At the simplest level we choose a starting
point i, within the generators of H, and encode

a ↦ hi , b ↦ hi+1 , . . . , etc.

Suppose that Bob wants to communicate the message w(a, b, c, . . . ) to Alice where
w is a word in the plaintext alphabet. Recall that both Bob and Alice know the various
Schreier transversals which are kept secret between them. Bob then encodes
w(hi , hi+1 , . . . ) and computes the element w(ρ(hi ), ρ(hi+1 ), . . . ) in Ḡ, the target of the
representation ρ, which he sends to Alice. This is sent as a matrix if Ḡ is a linear group or as
a permutation if Ḡ is a permutation group and so on.
Alice uses the algorithm for Ḡ relative to G to rewrite w(ρ(hi ), ρ(hi+1 ), . . . ) as a word
w⋆ (x1 , . . . , xn ) in the generators of G. She then uses the Schreier transversal for K in
G to rewrite, using the Reidemeister–Schreier process, w⋆ as a word w⋆⋆ (k1 , . . . , ks ) in
the generators of K. Since K is free or has unique normal forms this expression for the
element of K is unique. Once she has the word written in the generators of K she uses
the transversal for H in K to rewrite again, using the Reidemeister–Schreier process,
in terms of the generators for H. She then has a word w⋆⋆⋆ (hi , hi+1 , . . . ) and, using the
allocation hi ↦ a, hi+1 ↦ b, . . . , decodes the message.
In an actual implementation an additional random noise factor is added. This is
explained in more detail below.
We now describe an implementation of this process using for the base group G
the classical modular group M = PSL(2, ℤ). Further, this implementation uses a poly-
alphabetic cipher which is secure. This was introduced originally in [67] and [68].
The system in the modular group M works as follows. A list of finitely generated
free subgroups H1 , . . . , Hm of M is public and presented by their systems of generators
(presented as matrices). In a full practical implementation it is assumed that m is large.
For each Hi we have a Schreier transversal

h1, i , . . . , ht(i), i

and a corresponding ordered set of generators

w1, i , . . . , wm(i), i

constructed from the Schreier transversal by the Reidemeister–Schreier process. It is
assumed that each m(i) ≫ l where l is the size of the plaintext alphabet, that is, each
subgroup has many more generators than the size of the plaintext alphabet. Although
Bob and Alice know these subgroups in terms of free group generators, what is made
public are generating systems given in terms of matrices.
The subgroups on this list and their corresponding Schreier transversals can be cho-
sen in a variety of ways. For example the commutator subgroup of the modular group is
free of rank 2 and some of the subgroups Hi can be determined from homomorphisms
of this subgroup onto a set of finite groups.
Suppose that Bob wants to send a message to Alice. Bob first chooses three integers
(m, q, t) where m is the choice of the subgroup Hm , q is the choice of the starting point
among the generators of Hm for the substitution of the plaintext alphabet, and t is the
choice of the size of the message unit.
We clarify the meanings of q and t. Once Bob has chosen m, he makes the substitution

a ↦ wm, q , b ↦ wm, q+1 , . . . .

Again the assumption is that m(i) ≫ l so that starting almost anywhere in the sequence
of generators of Hm will allow this substitution. The message unit size t is the number
of coded letters that Bob will place into each coded integral matrix.
Once Bob has chosen (m, q, t) he takes his plaintext message w(a, b, . . . ) and groups
blocks of t letters. He then makes the given substitution above to form the corresponding
matrices in the modular group:

T1 , . . . , Ts .

We now introduce a random noise factor. After forming T1 , . . . , Ts Bob then multiplies
on the right each Ti by a random matrix in M say RTi (different for each Ti ). The only
restriction on this random matrix RTi is that there is no free cancellation in forming the
product Ti RTi . This can be easily checked and ensures that the freely reduced form for
Ti RTi is just the concatenation of the expressions for Ti and RTi . Next he sends Alice the
integral key (m, q, t) by some public key method (RSA, Anshel–Goldfeld, etc.). He then
sends the message as s random matrices

T1 RT1 , T2 RT2 , . . . , Ts RTs .

Hence what is actually being sent out are not elements of the chosen subgroup Hm
but rather elements of random right cosets of Hm in M. The purpose of sending coset
elements is two-fold. The first is to hinder any geometric attack by masking the sub-
group. The second is that it makes the resulting words in the modular group generators
longer—effectively hindering a brute force attack.
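
The blocking and noise steps can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of [67] and [68]: instead of matrices in M we work with reduced words in a free group on x, y, a single subgroup with hypothetical free generators w_j = x^j y x^(−j) replaces the list H_1, . . . , H_m, the noise word has fixed length 8, and the message length must be a multiple of t. The decoder reads each block from left to right and stops after t generators, so the noise is never interpreted.

```python
import random


def free_reduce(word):
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def generator(j):
    """Hypothetical free generators w_j = x^j y x^-j (uppercase = inverse)."""
    return "x" * j + "y" + "X" * j


def random_noise(last_letter, length, rng):
    """Random reduced word that does not cancel against last_letter."""
    word, prev = "", last_letter
    for _ in range(length):
        ch = rng.choice([c for c in "xyXY" if c != prev.swapcase()])
        word += ch
        prev = ch
    return word


def encode(message, q, t, rng):
    """Blocks of t letters, substitution starting at w_q, noise on the right."""
    if len(message) % t:
        raise ValueError("pad the message to a multiple of t letters")
    blocks = []
    for start in range(0, len(message), t):
        letters = message[start:start + t]
        T = free_reduce("".join(generator(q + ord(c) - ord("a"))
                                for c in letters))
        blocks.append(T + random_noise(T[-1], 8, rng))
    return blocks


def decode(blocks, q, t):
    message = ""
    for word in blocks:
        exponent, found = 0, 0
        for ch in word:
            if found == t:            # the rest of the block is noise
                break
            if ch == "x":
                exponent += 1
            elif ch == "X":
                exponent -= 1
            elif ch == "y":
                message += chr(ord("a") + exponent - q)
                found += 1
    return message


if __name__ == "__main__":
    rng = random.Random()
    blocks = encode("abcabc", q=3, t=2, rng=rng)
    print(blocks, decode(blocks, q=3, t=2))
```

Because the noise is chosen without free cancellation, each transmitted word is already freely reduced and begins with the reduced form of the corresponding T_i.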
To decode the message Alice first uses public key decryption to obtain the integral
keys (m, q, t). She then knows the subgroup Hm , the ciphertext substitution from the gen-
erators of Hm and how many letters t each matrix encodes. She next uses the algorithms
described in Section 24.2 to express each Ti RTi in terms of the free group generators of M,
say wTi (y1 , . . . , yn ). She has knowledge of the Schreier transversal, which is held secretly
by Bob and Alice, so now uses the Reidemeister–Schreier rewriting process to start ex-
pressing this freely reduced word in terms of the generators of Hm . The Reidemeister–
Schreier rewriting is done letter by letter from left to right. Hence when she reaches
t of the free generators she stops. Notice that the string that she is rewriting is longer
than what she needs to rewrite in order to decode as a result of the random matrix RTi .
This is due to the fact that she is actually rewriting not an element of the subgroup but
an element in a right coset. This presents a further difficulty to an attacker. Since these
are random right cosets it makes it difficult to pick up statistical patterns in the genera-
tors even if more than one message is intercepted. In practice the subgroups should be
changed with each message.
The initial key (m, q, t) is changed frequently. Hence as mentioned above this
method becomes a type of polyalphabetic cipher which is difficult to decode.

24.4 Non-Abelian Digital Signature Procedure


We present a digital signature procedure based on non-Abelian groups developed by
Ko, Lee et al., see [84]. In describing this protocol we must first introduce additional
group theoretic decision problems. In Section 14.7 we discussed the three basic group
decision problems for a finitely presented group G: the word problem, the conjugacy
problem, and the isomorphism problem. Recall that in a finitely presented group G the
conjugacy problem asks whether there exists an algorithm to decide whether or not arbitrary
words u and v in the generators of G are conjugate, that is, whether there is an x ∈ G such that
$x^{-1}ux = v$. To distinguish this from certain other decision problems using conjugacy we
call this the decision conjugacy problem. For a finitely presented group G the conjugator
search problem is the following: given u, v ∈ G that we know to be conjugate, is there an
algorithm to find z ∈ G satisfying $z^{-1}uz = v$?
In the following we use the notation $u^z$ for $z^{-1}uz$.
Let G be a non-Abelian group in which the conjugator search problem is infeasible
and the decision conjugacy problem is solvable. Let $\{0, 1\}^*$ be the set of all 0, 1 sequences
and let $h : \{0, 1\}^* \to G$ be a hash function. Recall that a (cryptographic) hash function is
a deterministic function $h : S \to \{0, 1\}^n$ which returns, for each arbitrary block of data,
called a message, a bit string of fixed size. It should have the property that a change in
the data will change the hash value.
An ideal hash function has the following properties:
(i) It is easy to compute the hash value for any given message.
(ii) It is infeasible to find a message that has a given hash value (preimage resistant).
(iii) It is infeasible to modify a message without changing its hash value.
(iv) It is infeasible to find two different messages with the same hash (collision resistant).

With these ideas here is the Ko–Lee digital signature scheme.


– Key Generation: Alice wants to sign and send a message, m, to Bob. Alice begins by
choosing two conjugate elements u, v ∈ G with conjugator a. The conjugate pair
(u, v) is public information while the conjugator a is Alice’s secret key.
– Signature Generation: Alice chooses arbitrary b ∈ G, and computes $\alpha = u^b$ and
$y = h(m\alpha)$. Then a signature σ on the message m is the triple (α, β, γ) where $\beta = y^b$
and $\gamma = y^{a^{-1}b}$. She sends this to Bob for verification and acceptance.
– Verification: Upon receiving the signature, Bob checks whether or not the following
hold:
(1) There exists c1 ∈ G such that $u = \alpha^{c_1}$.
(2) There exist c2 , c3 ∈ G such that $\gamma = \beta^{c_2}$ and $y = \gamma^{c_3}$.
(3) There exists c4 ∈ G such that $uy = (\alpha\beta)^{c_4}$.
(4) There exists c5 ∈ G such that $vy = (\alpha\gamma)^{c_5}$.
Bob accepts the signature if and only if conditions (1)–(4) hold.

The security of this scheme lies in the assumption that, given a pair of conjugate
elements u, v ∈ G, finding elements α, β, γ such that (1)–(4) above hold is infeasible. If the
conjugator a can be found, then $(\alpha, \beta, \gamma) = (u^b, y^b, y^{a^{-1}b})$ satisfies properties (1)–(4) for any
b ∈ G. Hence the conjugator search problem has to be infeasible.
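
The algebra of the scheme can be exercised in a toy platform group. In the Python sketch below we take G to be the symmetric group S_8, which is our own assumption and is not secure, since conjugators in symmetric groups are easy to find. It does, however, have a simple decision procedure for conjugacy (equal cycle type), which is all that the verification needs. The hash into G is a hypothetical construction that seeds a shuffle with SHA-256, and m‖α is modelled as byte concatenation.

```python
import hashlib
import random

N = 8   # toy platform group: the symmetric group S_8 (an assumption)


def mul(p, q):
    """Composition (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def conj(u, z):
    """u^z = z^-1 u z."""
    return mul(mul(inv(z), u), z)


def cycle_type(p):
    seen, lengths = set(), []
    for i in range(len(p)):
        n = 0
        while i not in seen:
            seen.add(i)
            i = p[i]
            n += 1
        if n:
            lengths.append(n)
    return sorted(lengths)


def conjugate(p, q):
    """Decision conjugacy problem in S_n: equal cycle types."""
    return cycle_type(p) == cycle_type(q)


def random_perm(rng):
    perm = list(range(N))
    rng.shuffle(perm)
    return tuple(perm)


def h(data):
    """Hypothetical hash into G: a SHA-256 digest seeds a shuffle."""
    seed = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return random_perm(random.Random(seed))


def sign(m, u, a, rng):
    b = random_perm(rng)
    alpha = conj(u, b)
    y = h(m + bytes(alpha))
    return alpha, conj(y, b), conj(y, mul(inv(a), b))


def verify(m, signature, u, v):
    alpha, beta, gamma = signature
    y = h(m + bytes(alpha))
    return (conjugate(u, alpha)
            and conjugate(gamma, beta) and conjugate(y, gamma)
            and conjugate(mul(u, y), mul(alpha, beta))
            and conjugate(mul(v, y), mul(alpha, gamma)))


if __name__ == "__main__":
    rng = random.Random()
    u, a = random_perm(rng), random_perm(rng)
    v = conj(u, a)                      # public pair (u, v), secret a
    sig = sign(b"message", u, a, rng)
    print(verify(b"message", sig, u, v))
```

A correctly formed signature always passes, because $\alpha\beta = (uy)^b$ and $\alpha\gamma = (vy)^{a^{-1}b}$.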

24.5 Password Authentication Using Combinatorial Group Theory


Closely related to digital signatures is the problem of secure password authentication.
With the increased use of online credit card transactions there is at present more than
ever a need for secure password identification. For many online purchases, this is be-
ing carried out by a challenge response system accompanying the password. In the sim-
plest systems this takes the form of secondary password questions such as the user’s
mother’s maiden name or place of birth. There are inherent difficulties with these types
of challenge response systems. First of all there is the trivial problem of the users re-
membering their responses. More critical is the problem that this type of information
for many people is readily available and easily found or guessed by would-be attackers
or eavesdroppers.
Challenge response systems are also subject to man-in-the-middle attacks and re-
play attacks. In this section we present an alternative method for challenge response
password authentication using combinatorial group theory. In particular this method
depends upon the difficulty of solving the word problem within a given finitely pre-
sented group without knowing the presentation and the difficulty of solving systems of
equations within free groups. This latter problem has been proved to be NP-hard.
These group theoretic techniques have several major advantages over other challenge
response systems. We will call the password presenter the prover and the password
presentee the verifier. The methods we present can be used for two-way authentication,
that is, both to verify the prover to the verifier and to verify the verifier to the
prover. To each user in conjunction with a standard password there will be assigned a
finitely presented group with a solvable word problem. We call this the challenge group.
This will be done randomly by the group randomizer system and will be held in secret
by the prover and the verifier.
Cryptographically, we assume the adversary can steal the encrypted form of the
group theoretic responses. Probabilistically this does not present a problem. Each chal-
lenge response set of questions forms a virtual one time key pad as we will explain.
Therefore the adversary must steal three things: the original password, the challenge
group and the group randomizer. Hence there is almost total security in the challenge
response system.
Further there is an infinite supply of finitely presented groups to use as challenge
groups and an infinite supply of challenge response questions that never have to be
duplicated. We will explain these in the section on this protocol’s security. Finally the
method is symmetric between the verifier and the prover, so while the verifier verifies
the prover’s password simultaneously the prover verifies that he or she is dealing with
the verifier.
The theoretical security of the system is provided by several results in asymptotic
group theory which we discuss in Section 24.6. In particular, a result of Lysenok and
Myasnikov, see [91], implies that stealing the challenge group is NP-hard while a result
of Jitsukawa, see [81], says that the asymptotic density of using homomorphisms to attack
the group randomizer protocol is zero.
The whole password protocol depends upon the group randomizer system. This is
a computer program that can handle several elementary tasks involving finitely pre-
sented groups. The scope of the particular group randomizer system will depend on
the type of login protocol or cryptographic protocol desired. At the most basic level the
group randomizer system has the ability to do the following things:
1. To recognize a finite presentation of a finitely presented group with a solvable word
problem and manipulate arbitrary words in the alphabet of generators according
to the rewriting rules of the presentation. In particular, if the group has a normal
form for each element, the group randomizer can rewrite an arbitrary word in the
generators in terms of its group normal form.
2. Given a finite presentation of a group with a solvable word problem, to recognize
whether two free group words have the same value in the given group when con-
sidered in terms of the given generators of the group.
3. To randomly generate free group words on an alphabet of any finite size.
4. To recognize and store sets of free group words w1 , . . . , wk on an alphabet x1 , . . . , xn
and rewrite words w(w1 , . . . , wk ) as the corresponding word in x1 , . . . , xn .
5. Given a free group F of finite rank on x1 , . . . , xn and a set of words w1 , . . . , wk on the
alphabet x1 , . . . , xn , to solve the membership problem in F relative to H = ⟨w1 , . . . , wk ⟩,
the subgroup of F generated by w1 , . . . , wk .
6. Given a stored finitely presented group or a stored set of free group words, to accept
a random free group word and rewrite it as a normal form in the finitely presented
group in the former case or as a word in the ambient free group in the latter case.

We now present several variations on secure password authentication using the group
randomizer. First we give an overall outline of the protocol.

24.5.1 General Outline of the Authentication Protocol

This is a symmetric key cryptographic authentication protocol. Both the prover and ver-
ifier use a single private key to both encrypt and decrypt within the authentication pro-
cess. At the first step the prover and verifier must communicate directly, either face-to-
face or by a public key method, to set the private shared secret. This is the model now
used for most password/password back-up schemes. We assume that both the prover
and verifier have a group randomizer system. For security analysis we assume that an
adversary or eavesdropper has access to the encrypted form of the transmission but is
passive in that the adversary will not change any transmissions.
1. The prover and verifier communicate directly to set up a common shared secret
(P, G) where P is a standard password and G is a challenge group. Each prover’s
challenge group is unique to that prover. The challenge group is a finitely presented
group with a solvable word problem and satisfying the strong generic free group
property which we discuss in Section 24.6. The password is chosen by the prover
while the challenge group is randomly chosen by the group randomizer system.
2. The prover presents the password to the verifier. The group randomizer of the ver-
ifier presents a group theoretic “question” concerning the challenge group G to the
prover. The assumption is that this “question” is difficult in the sense that it is in-
feasible to answer it if the group G is unknown. The question is then answered by
the group randomizer. This is repeated a finite number of times. If the answers are
correct, the prover (and the password) is verified.
3. The protocol is then repeated from the viewpoint of the prover, authenticating the
verifier to the prover.

24.5.2 Free Subgroup Method

We assume that both the prover and the verifier have a group randomizer. Each prover
has a standard password. Suppose that F is a free group on {x1 , . . . , xn }. The prover’s
password is linked to a finitely generated subgroup of a free group given as words in
the generators, that is, the prover’s password is linked to w1 , . . . , wk where each wi is a
word in x1 , . . . , xn . The group G = ⟨w1 , . . . , wk ⟩ is called the challenge group. In general
we have k ≠ n. The prover does not need to know the generators. The randomizer can
randomly choose words from this subgroup and then freely reduce them. The prover
has the challenge group or subgroup also stored in its randomizer.
The prover submits his or her standard password to the verifier. This activates the
verifier’s randomizer to the prover’s set of words. The verifier now submits a random
free group word on y1 , . . . , yk to the prover’s randomizer say w(y1 , . . . , yk ). The prover’s
randomizer treats this as w(w1 , . . . , wk ) and then reduces it in terms of the free group
generators x1 , . . . , xn and rewrites it as w⋆ (x1 , . . . , xn ). The verifier checks that this is
correct, that is, w(w1 , . . . , wk ) = w⋆ (x1 , . . . , xn ) in the free group on x1 , . . . , xn . If it is, the
verifier continues and repeats this three times (or some other finite number of times). There
is one proviso. The verifier submits a word to the prover only once, so that a submitted
word can never be reused. The prover’s randomizer will recognize if a word has been
reused (this is a verification to the prover of the verifier).
To verify that the verifier is legitimate, the process is repeated from the prover’s
randomizer to the verifier.
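
One round of this exchange can be sketched in Python as follows. The secret words w1, w2, w3 in the free group on x, y, the challenge length and the class layout are all hypothetical choices made for illustration. A challenge is a word in y1, . . . , yk, stored as a tuple of (index, exponent) pairs, and the verifier never issues the same challenge twice.

```python
import random

SECRET = ["xxy", "yXy", "xyyX"]   # hypothetical w1, w2, w3; uppercase = inverse


def free_reduce(word):
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def inverse(word):
    return word[::-1].swapcase()


def substitute(challenge, words):
    """Evaluate w(y1, ..., yk) at yi = wi and freely reduce."""
    out = ""
    for i, e in challenge:
        out += words[i] if e == 1 else inverse(words[i])
    return free_reduce(out)


class Verifier:
    def __init__(self, words, rng):
        self.words, self.rng, self.used = words, rng, set()

    def new_challenge(self, length=6):
        """Random word in y1..yk that has not been issued before."""
        while True:
            challenge = tuple(
                (self.rng.randrange(len(self.words)), self.rng.choice((1, -1)))
                for _ in range(length))
            if challenge not in self.used:
                self.used.add(challenge)
                return challenge

    def check(self, challenge, response):
        return response == substitute(challenge, self.words)


def prover_response(challenge, words):
    return substitute(challenge, words)


if __name__ == "__main__":
    verifier = Verifier(SECRET, random.Random())
    for _ in range(3):                  # three rounds, as in the text
        ch = verifier.new_challenge()
        print(ch, verifier.check(ch, prover_response(ch, SECRET)))
```

A prover holding different words fails the check unless the two responses happen to coincide as reduced words.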
An attacker only has access to the transmitted words. Given a series of free group
words there is essentially zero probability of reconstructing the subgroup. To prevent an
attacker using an already used word to gain access, the group randomizer system allows
a free group word, submitted as a challenge word, to be used only once. If an attacker
gets access to the verifier and submits an already submitted word or vice versa from the
prover, this will red flag the attempt. We also suggest that if there is a previously used
word, indicating perhaps an attack, the group randomizer should change the prover’s
group. The beauty of this system is that this can be done extremely easily; change several
of the words for example. In effect this presents a one-time key pad each
time the prover presents the password. The map yi → wi is a homomorphism and an
attacker can manipulate various equations in an attempt to solve. Presumably, if there
are enough equations, the words w1 , . . . , wk can be discovered. However, in Section 24.6
we present a security proof based on several results in asymptotic group theory showing
that this cannot happen with asymptotic density one.
We suggest a noise/diffusion enhancement. The prover’s challenge group generator
words w1 , . . . , wk are indexed. With each use the randomizer applies a random permu-
tation ϕ on {1, . . . , k} to scramble the indices. These permutations are coded and stored
both in the prover’s randomizer and the verifier’s one. This prevents a length based at-
tack by an eavesdropper since discovering, for example, what w37 is, is of no use since
it will be indexed differently for the next use. The coded permutation is sent as part of
the challenge.

24.5.3 General Finitely Presented Group Method

This is essentially the same method, however, rather than working with an ambient free
group we work with a given finitely presented group with a solvable word problem. Let
G = ⟨X; R⟩ be the group. As before we assume that both the prover and the verifier have a
group randomizer. Each prover has a standard password. Suppose that X = {x1 , . . . , xn }
and F is a free group on {x1 , . . . , xn }. The prover’s password is linked to a finitely
generated subgroup of G again given as words in the generators X, that is, the prover’s
password is linked to w1 , . . . , wk where each wi is a word in x1 , . . . , xn . As before, we let
k ≠ n. The randomizer can randomly choose words from this subgroup and then reduce
them via the finite presentation. The verifier has the group and subgroup also stored in
its randomizer.
The remainder of the procedure is exactly the same as in the free group case. The
prover submits his or her standard password to the verifier. This activates the verifier’s
randomizer to the prover’s set of words. The verifier now submits a random free group
word on y1 , . . . , yk to the prover’s randomizer, say, w(y1 , . . . , yk ). The prover’s randomizer
treats this as w(w1 , . . . , wk ) and rewrites it as w⋆ (x1 , . . . , xn ). The verifier checks that this
is correct, that is, w(w1 , . . . , wk ) = w⋆ (x1 , . . . , xn ), this time, however, in the group G. If it is,
the verifier continues and repeats this three times (or some other finite number of times). There
is one proviso. The verifier submits a word to the prover only once so that a submitted
word can never be reused. The prover’s randomizer will recognize if a word has been
reused (this is a verification to the prover of the verifier).
To verify that the verifier is legitimate, the process is repeated from the prover’s
randomizer to the verifier.
As in the free group method, an attacker only has access to the transmitted words.
Given a series of group words there is zero probability of reconstructing the group, how-
ever, as in the free group method a given challenge response word is to be used only
once.

24.6 The Strong Generic Free Group Property


Part of the theoretical security of the group randomizer protocols depends on the
strong generic free group property and asymptotic density. Asymptotic density is a gen-
eral method to compute densities and/or probabilities on infinite discrete sets where
each individual outcome is tacitly assumed to be equally likely. The origin of asymptotic
density lies in the attempt to compute probabilities on the whole set of integers where
each integer is considered equally likely. The method can also be used where some
probability distribution is assumed on the elements. It has been effectively applied to
determining densities within infinite finitely generated groups where random elements
are considered as being generated from random walks on the Cayley graph of the group.
The paper [70] by Borovik, Myasnikov and Shpilrain provides a good general descrip-
tion of the probability method in group theory. Let 𝒫 be a group property and let G be
a finitely generated group. We want to determine the measure of the set of elements
which satisfy 𝒫 . For each positive integer n let Bn denote the n-ball in G. Let |Bn | denote
the actual size of Bn (which is an integer since G is finitely generated) or the measure of
Bn if a distribution has been placed on the elements of G. Let S be the set of elements
in G satisfying 𝒫 . The asymptotic density of S is then

$$\lim_{n\to\infty} \frac{|S \cap B_n|}{|B_n|}$$

provided this limit exists. We say that the property 𝒫 is generic if the asymptotic density
of the set S of elements satisfying 𝒫 equals 1.
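
A small numerical illustration, in the simplest case G = ℤ with generator 1, where the n-ball is {−n, . . . , n}: the Python sketch below computes the ratio |S ∩ B_n| / |B_n| for two sample properties of our own choosing. The even integers have asymptotic density 1/2, while the property of being nonzero has density 1 and is therefore generic.

```python
from fractions import Fraction


def density_in_ball(predicate, n):
    """|S ∩ B_n| / |B_n| for the n-ball B_n = {-n, ..., n} of the integers."""
    ball = range(-n, n + 1)
    return Fraction(sum(1 for g in ball if predicate(g)), len(ball))


def is_even(g):
    return g % 2 == 0


def is_nonzero(g):
    return g != 0


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, density_in_ball(is_even, n), density_in_ball(is_nonzero, n))
```

The printed ratios approach 1/2 and 1 respectively as n grows; the limit itself is of course not computed, only approximated.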
This concept can be easily extended to properties of finitely generated subgroups.
We consider the asymptotic density of finite sets of elements that generate subgroups
that have a considered property. For example, to say that a group has the generic free
group property we mean that

\[
\lim_{m,n \to \infty} \frac{|S_m \cap B_{m,n}|}{|B_{m,n}|} = 1
\]

where Sm is the collection of finite sets of elements of size m that generate a free sub-
group while Bm,n are all the m-element subsets within the n-ball. We refer to the pa-
per [70] and the book [93] for terminology and further definitions.
We say that a group G has the generic free group property if a finitely generated sub-
group is generically a free group. For example, a result by Epstein, see [25], says that the
group GL(n, ℝ) satisfies the generic free group property. A group G has the strong generic
free group property if given randomly chosen elements g1 , . . . , gn in G then generically
they are a free basis for the free subgroup they generate. Jitsukawa, see [81], proved
that free groups have the strong generic free group property. That is, given k random
elements w1 , . . . , wk in the free group on y1 , . . . , yn , then with asymptotic density one the
elements w1 , . . . , wk are a free basis for the subgroup they generate. We compare this
with the Nielsen–Schreier theorem that says that w1 , . . . , wk generate a free group. In
the context of the group randomizer protocols, the strong generic free group property
implies that if v1 (y1 , . . . , ym ), . . . , vk (y1 , . . . , ym ) have already been presented as challenge
words then the probability is approximately zero that a new challenge word v(y1 , . . . , ym )
lies in the subgroup generated by v1 , . . . , vk , and hence a homomorphism attack is nulli-
fied.
The strong generic free group property has been extended to many classes of groups
including surface groups by Fine, Myasnikov and Rosenberger, see [29]. Let us mention
some further results. Gilman, Myasnikov and Osin, see [77], showed that torsion-free
hyperbolic groups have the generic free group property. Myasnikov and Ushakov, see
[94], showed that pure braid groups Pn with n ≥ 3 also have the strong generic free
group property. We will show that all Fuchsian groups of finite co-volume and all braid
groups Bn with n ≥ 3 have the strong generic free group property.
Extremely useful in proving that a group has the generic or strong generic free
group property is the following, see Exercise 6.

Theorem 24.6.1. Let G be a group and N a normal subgroup. If the quotient G/N satisfies
the strong generic free group property then G also satisfies the strong generic free group
property.

Corollary 24.6.2. Any orientable surface group

\[
\langle a_1, \dots, a_g, b_1, \dots, b_g ;\ \prod_{i=1}^{g} [a_i, b_i] = 1 \rangle
\]

of genus g ≥ 2 and any nonorientable surface group

\[
\langle a_1, \dots, a_g ;\ a_1^2 \cdots a_g^2 = 1 \rangle
\]

of genus g ≥ 4 satisfies the strong generic free group property.

In general, asymptotic density is not independent of finite generating systems. Indeed,
it is possible for a group property to be generic with respect to one finite generating
system and negligible with respect to another, see [32]. We call a group property 𝒫 suitable
for a finitely generated group G if it is preserved under isomorphisms and its asymptotic
density is independent of finite generating systems, and supersuitable for G if it is
suitable both for G and all subgroups of finite index in G. It is clear that the strong generic
free group property is suitable in any group G which has a non-Abelian free quotient.

Corollary 24.6.3. The strong generic free group property is suitable in any finitely gener-
ated group G which has a non-Abelian free quotient.

We remark that in a strong generic free group the conjugacy problem and the root
extraction problem are generic problems.
In [23] it was shown that there is an interesting connection between the strong
generic free group property of a group G and its subgroups of finite index. The main
result of that paper is that a finitely generated group which has a non-Abelian free quo-
tient satisfies the strong generic free group property if and only if each subgroup of finite
index satisfies the strong generic free group property. As a consequence of this and The-
orem 24.6.4, it follows that many important classes of groups, such as finitely generated
Fuchsian groups with finite co-volume and the braid groups Bn for n ≥ 3 satisfy the
strong generic free group property.

Theorem 24.6.4 (Inheritance Theorem). Let G be a finitely generated group and H ⊂ G a
subgroup of finite index [G : H] = n < ∞. Let 𝒫 be the strong generic free group property.
Then:
1. If 𝒫 is a suitable and generic property in H then it is also suitable and generic in G.
2. If 𝒫 is a suitable and generic property in G then it is also suitable and generic in H.

Proof. Let X be a finite generating system for G. Since G is finitely generated and H has
finite index in G, the subgroup H is also finitely generated. Let Y be a finite generating system for H. Let 𝒫
be the strong generic free group property and suppose that 𝒫 is a suitable and generic
property in H. Let Sm be the collection of m element subsets that generate a free sub-
group of G.

Let Bk (G) be the ball of radius k in the Cayley graph of G (with respect to X). Since
H is a subgroup of finite index n in G, there exists a complete system of representatives
a1 , . . . , an ∈ G for the left cosets of H in G. We consider the elements of H as vertices in
the Cayley graph of G. Let Bk (H) be the set of vertices in Bk (G) which belong to H. For all i
let ai Bk (H) denote the displaced Bk (H) around the representative ai in the Cayley graph
of G, that is the set of all elements of the form $a_i h$, where h ∈ H is of length ≤ k. Define
$B'_k(H) = \bigcup_{i=1}^{n} a_i B_k(H)$ as the (disjoint) union of these $a_i B_k(H)$. We have $|B'_k(H)| = n \cdot |B_k(H)|$,
since the cosets $a_i H$ and also the $a_i B_k(H)$ with them are pairwise disjoint. Let t ∈ ℕ be
the length of the longest geodesic in the Cayley graph of G from the identity element 1 to
one of the representatives ai . With this t we have

\[
B'_{k-t}(H) \subset B_k(G) \subset B'_{k+t}(H). \tag{1}
\]

Now let $B_{m,k}(G)$ and $a_i B_{m,k}(H)$ be the collections of m-element subsets within $B_k(G)$ and
$a_i B_k(H)$, respectively, for i = 1, . . . , n. Let A be any m-element subset within $B_k(G)$. Then
A splits into the disjoint union $\bigcup_{i=1}^{n} A_i$ of $m_i$-element subsets $A_i$ within $a_i B_{k+t}(H)$ for
i = 1, . . . , n, and we have $0 \le m_i \le m$ for all i (some of the $m_i$ may be zero).
In this sense, if we define $B'_{m,k}(H) = \bigcup_{i=1}^{n} a_i B_{m_i,k}(H)$, $m = m_1 + \cdots + m_n$, then we get
the inclusions

\[
B'_{m,k-t}(H) \subset B_{m,k}(G) \subset B'_{m,k+t}(H). \tag{1'}
\]

Here, we consider a disjoint union $\bigcup_{i=1}^{n} A_i$ of $m_i$-element subsets $A_i$ in $a_i B_{k-t}(H)$ with
$m_1 + \cdots + m_n = m$ as an m-element subset of $B_k(G)$. If A is a free generating system for a
free subgroup of G, then each $A_i$ is a free generating system for a free subgroup of G.
Then intersecting with $S_m$ leads to

\[
S_m \cap B'_{m,k-t}(H) \subset S_m \cap B_{m,k}(G) \subset S_m \cap B'_{m,k+t}(H). \tag{2}
\]

On the other hand, if some $A_i \subset a_i B_k(H)$ contains a subset which generates a free subgroup
of G, then also $A_j = a_j a_i^{-1} A_i \subset a_j B_k(H)$ contains a subset which generates a free
subgroup of G. More concretely, if $A_i$ freely generates a free subgroup of rank p, then
$\langle A_j \rangle$ has a p-element generating system which contains a basis for a free subgroup of rank at
least p − 1.
This shows that for k large enough the sets $S_m \cap a_i B_{m,k}(H)$ are of the same order of
magnitude in m. Applying this we get approximately the equality

\[
\frac{|S_m \cap B'_{m,k}(H)|}{|B'_{m,k}(H)|} = \frac{|S_m \cap B_{m,k}(H)|}{|B_{m,k}(H)|} \tag{3}
\]

for k and m large enough.



Assume that 𝒫 holds and is suitable in H. Then there exists a constant integer s > 0
such that the length of each y ∈ Y written as a word in X is less than s. Therefore the
fraction on the right-hand side of (3) converges to 1 as k → ∞ and m → ∞. Therefore

\[
\lim_{m,k \to \infty} \frac{|S_m \cap B_{m,k}(G)|}{|B_{m,k}(G)|} = 1
\]

from the inclusions (1’) and (2), completing the proof of part 1 of the theorem. The proof of
part 2 follows in an entirely analogous manner.

As mentioned above, if a finitely generated group G has a non-Abelian free quotient
then the strong generic free group property holds in G and is suitable. Therefore we
have the following corollary.

Corollary 24.6.5. Let G be a finitely generated group and H ⊂ G a subgroup of finite index.
Assume that both G and H have non-Abelian free quotients. Then G has the strong generic
free group property if and only if H has the strong generic free group property.

We now show the strong generic free group property for braid groups.

Theorem 24.6.6. The braid group Bn , n ≥ 3, has the strong generic free group property.

Proof. Denote by σi,i+1 the transposition (i, i + 1) in the symmetric group Sn . The map
σi ↦ σi,i+1 , i = 1, . . . , n − 1, defines a canonical epimorphism π: Bn → Sn . The kernel
of π is a subgroup of index n! in Bn , called the pure braid group PBn . The group PBn ,
n ≥ 3, maps onto the group PB3 , and the group PB3 is isomorphic to F2 × ℤ, where F2 is
the free group of rank 2. Hence, PBn , n ≥ 3, maps onto F2 . Now, the result follows from
Corollary 24.6.3 and the Inheritance Theorem 24.6.4.

Corollary 24.6.7. The root extraction problem is a generic problem in Bn , n ≥ 3.

We now describe an authentication scheme based on the root extraction problem
as given in [88]. Let $B_n$, n ≥ 3, be the braid group generated by $\sigma_1, \dots, \sigma_n$ with n even.
Write $LB_n$ for the braid group generated by $\sigma_1, \dots, \sigma_{\frac{n}{2}-1}$ and $UB_n$ for the group generated
by $\sigma_{\frac{n}{2}+1}, \dots, \sigma_n$.
Alice chooses two integers r, s ≥ 2, and two elements $a \in LB_n$ and $c \in B_n$. Then
$B_n$, $LB_n$, $UB_n$, $X = a^r c a^s$, c, r, s are public and a is secret. The authentication is as follows.
Bob chooses an element $b \in UB_n$, and sends to Alice $Y = b^r c b^s$. Alice computes $Z = a^r Y a^s$
and sends it to Bob. Finally, Bob verifies that $Z = b^r X b^s$. The security is based on the difficulty
of finding a root x in $B_n$ when $x^m$, m ≥ 2, is given.
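The verification step works because every element of LBn commutes with every element of UBn. The following sketch checks the algebra of the three messages in a toy stand-in for the braid group: 4×4 integer matrices, with Alice's secret acting on the upper 2×2 block and Bob's on the lower block. The matrix platform, the concrete entries and the helper names are assumptions for illustration; they offer no security and are not part of the scheme in [88].

```python
def matmul(p, q):
    """Product of two square integer matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(p, e):
    """p raised to the non-negative integer power e."""
    result = [[int(i == j) for j in range(len(p))] for i in range(len(p))]
    for _ in range(e):
        result = matmul(result, p)
    return result


def block_diag(upper, lower):
    """4x4 block-diagonal matrix built from two 2x2 blocks."""
    return [upper[0] + [0, 0], upper[1] + [0, 0], [0, 0] + lower[0], [0, 0] + lower[1]]


def sandwich(x, r, middle, s):
    """Return x^r * middle * x^s."""
    return matmul(matmul(matpow(x, r), middle), matpow(x, s))


I2 = [[1, 0], [0, 1]]

# Public data: exponents r, s and the element c.
r, s = 2, 3
c = [[1, 2, 0, 1], [0, 1, 1, 0], [1, 0, 1, 3], [2, 0, 0, 1]]

# Alice's secret a lives in the "lower" commuting subgroup (upper block only).
a = block_diag([[1, 1], [0, 1]], I2)
X = sandwich(a, r, c, s)          # published by Alice

# Bob's challenge uses b from the "upper" commuting subgroup (lower block only).
b = block_diag(I2, [[2, 1], [1, 1]])
Y = sandwich(b, r, c, s)          # sent to Alice

Z = sandwich(a, r, Y, s)          # Alice's response
assert Z == sandwich(b, r, X, s)  # Bob's check: a^r b^r c b^s a^s = b^r a^r c a^s b^s
```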
In the protocol a secret braid x is chosen at random, and the braid $y = x^m$ is made
public. Hence, we are dealing with braids for which an mth root is known to exist. This
means generically we may find the mth root of y very fast. The interest of braid groups
for cryptography has decreased due to the appearance of algorithms which solve, for
instance, the conjugacy problem and the root extraction problem, fast in the generic
case. The main problem with the cryptographic protocols based on braid groups turns
out to be the key generation.
Public and secret keys are so far chosen at random, and this often implies that the
protocols are insecure against algorithms which are generically fast. The
importance and the future of braid group cryptography depend on finding a suitable
key generation procedure, or in popular words, on finding so-called suitable black holes.
Another promising possibility is to look for nongeneric properties of braid groups which
could be used for cryptographic protocols.

24.6.1 Security Analysis of the Group Randomizer Protocols

In order to analyze the security of the group randomizer password protocols, we make
the security assumption that an adversary has access to the coded group theoretic
responses. The strengths of the proposed protocol include that an attacker must steal three
things: the original password, the group randomizer and the challenge group. There is
no access without all three. This immediately nullifies middleman attacks. If the adver-
sary pretends to be the verifier to obtain the group words the attack is thwarted by the
facts that the prover can verify the verifier and further if the attacker just transmits from
the middle, nothing can be stolen since each time through a new challenge word must
be used. Further, the group randomizer has an infinite supply of both subgroups and
challenge responses that are done randomly. In addition, since a challenge word can
be used only once the protocol nullifies replay attacks. Since challenge responses are
machine to machine there is essentially zero probability of an incorrect response. The
protocol shuts down with an incorrect response and hence repeat attacks are harmless.
These are in distinction to answer-driven challenge–response systems where a
prover often forgets or misspells a response. In these systems a prover is usually per-
mitted several opportunities to answer making it susceptible to both man-in-the-middle
and repeat attacks.
There are two theoretical attacks that must be dealt with. Relative to these the se-
curity of the system, and hence a security proof for the protocol, is provided by several
results in asymptotic group theory.
The most straightforward attack is for the adversary to collect enough challenge
words and responses. This provides a system of equations in a free group (or a finitely
presented group)

\[
y_{i_1} \cdots y_{i_t} = w_i(x_1, \dots, x_n), \quad i = 1, \dots, m.
\]

An adversary can then break the protocol by solving the system

\[
z_i = w_i(x_1, \dots, x_n)
\]

to obtain the challenge group.



However, a result by Lysenok and Myasnikov, see [91], shows that solving such sys-
tems of equations in free groups (and in most finitely presented groups) is NP-hard.
Hence this method of attack is impractical in most cases.
A second method of attack is based on the following. The mapping yi → wi is a ho-
momorphism. If a challenge word appears in the subgroup generated by previous chal-
lenge words then an attacker can use this to answer a challenge without ever solving for
the challenge group. However, the probability of succeeding with this approach is essen-
tially zero due to Jitsukawa’s result mentioned in the previous section. Each challenge
word lies in a free group which has the strong generic free group property. Hence as ex-
plained in the previous section the probability is essentially zero that a new challenge
word is in the subgroup generated by previous challenge words.

24.6.2 Implementation of a Group Randomizer System Protocol

The actual implementation of a workable group randomizer system protocol involves
several choices of parameters and subprograms. These include the following choices.
1. The choice of the rank of the ambient free group in the group randomizer systems
A and B.
2. An enhancement program which takes randomly chosen words w1 , . . . , wk in a free
group F and finds a new set of words v1 , . . . , vk generating the same subgroup for
which the words formed in v1 , . . . , vk have a great deal of free cancellation. This
involves Nielsen transformations, see Section 14.3.
3. The choice of parameter sizes for the lengths of the randomly chosen words. In an
actual implementation all words in the generators will have lengths between a and
b where a and b are to be determined. All words used as test logins will have lengths
between c and d with c and d to be determined.
The determination of the optimal values of a, b, c, d is being studied.
4. The implementation of a coded permutation system on {1, . . . , k} where k is the rank
of the challenge group and which can be sent with each challenge word.
5. The development of an automatic reset protocol for the challenge group. In an ideal
situation this can be done without actually communicating the changes between
verifier and prover. That is, each randomizer system does the same protocol auto-
matically when reset is called for.
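Choices 1 and 3 above can be made concrete with a small sketch that draws freely reduced words of bounded length in a free group of a given rank. The encoding of letters as nonzero integers, the function names and the sampling rule (length first, then letters) are assumptions for illustration; the text does not prescribe a particular sampling method, and this one is not claimed to be uniform over all admissible words.

```python
import random


def free_reduce(word):
    """Cancel adjacent inverse pairs; letters are nonzero ints and -i is the inverse of i."""
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()
        else:
            out.append(letter)
    return out


def random_reduced_word(rank, min_len, max_len, rng):
    """Random freely reduced word in the free group of the given rank, min_len <= length <= max_len."""
    length = rng.randint(min_len, max_len)
    letters = [i for i in range(-rank, rank + 1) if i != 0]
    word = []
    while len(word) < length:
        letter = rng.choice(letters)
        if word and word[-1] == -letter:
            continue  # would cancel against the previous letter, draw again
        word.append(letter)
    return word


def multiply(u, v):
    """Product of two words, freely reduced."""
    return free_reduce(u + v)


def inverse(u):
    """Inverse word: reverse the order and invert every letter."""
    return [-x for x in reversed(u)]


rng = random.Random(0)
# Four generators of a hypothetical challenge subgroup in a free group of rank 3,
# with word lengths between a = 5 and b = 10.
generators = [random_reduced_word(3, 5, 10, rng) for _ in range(4)]
print(generators)
```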

24.7 A Secret Sharing Scheme Using Combinatorial Group Theory


Recall that the secret sharing problem is the following. We have a secret K and a group of
n participants. This group is called the access control group. A dealer allocates shares to
each participant under given conditions. If a sufficient number of participants combine
their shares then the secret can be recovered. If t ≤ n then a (t, n)-threshold scheme
is one with n total participants in which any t participants can combine their
shares to recover the secret but not fewer than t. The number t is called the thresh-
old. The scheme is called a secure secret sharing scheme if, given fewer shares than the
threshold, there is no chance to recover the secret.
Panagopoulos, see [96], devised a secret sharing scheme based on the word problem
in finitely presented groups. It is a (t, n)-threshold scheme and its main advantage over
many other secret sharing schemes is that it does not require the secret message to be de-
termined before each individual person receives his share of the secret. For this scheme
it is assumed that the secret is given in the form of a binary sequence. The scheme is as
follows.
1. A finitely presented group G = ⟨x1 , x2 , . . . , xk ; r1 = ⋅ ⋅ ⋅ = rm = 1⟩ is chosen. It is
assumed that the word problem is solvable for the presentation and that $m = \binom{n}{t-1}$.
2. Let A1 , . . . , Am be an enumeration of the subsets of {1, . . . , n} with t − 1 elements.
Define n subsets R1 , . . . , Rn of {r1 , . . . , rm } such that rj ∈ Ri if and only if i ∉ Aj for
i = 1, . . . , n and j = 1, . . . , m. Then for every j ∈ {1, . . . , m}, the word rj is not contained
in exactly t − 1 of the subsets R1 , . . . , Rn . It follows that rj is contained in any union
of t of them, whereas if we take any t − 1 of the sets R1 , . . . , Rn , there exists an index j
such that rj is not contained in their union.
3. Distribute to each of the n persons one of the sets R1 , . . . , Rn . The set {x1 , . . . , xk } is
known to all participants.
4. If the binary sequence to be distributed is a1 , . . . , ak , construct and distribute a se-
quence of elements w1 , . . . , wk of G such that we have wi = 1 in G if and only if
ai = 1 for i = 1, . . . , k. The word wi must involve most of the relations r1 = 1, . . . ,
rm = 1 if wi = 1. Furthermore, all of the relations must be used at some point in the
construction of some element.
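The combinatorial part of Step 2 can be checked directly. The sketch below builds the sets R1, …, Rn for small parameters and confirms the two threshold properties; relators are represented only by their indices (numbered from 0 here), and the function name and the parameters n = 5, t = 3 are assumptions for this example.

```python
from itertools import combinations


def share_sets(n, t):
    """Return (A, R): A lists the (t-1)-subsets of {1..n}; R[i] = {j : i not in A[j]}."""
    A = list(combinations(range(1, n + 1), t - 1))
    R = {i: {j for j, subset in enumerate(A) if i not in subset} for i in range(1, n + 1)}
    return A, R


n, t = 5, 3
A, R = share_sets(n, t)
m = len(A)                      # m equals the binomial coefficient C(n, t-1)
all_relators = set(range(m))

# Any t participants together hold every relator index.
assert all(set().union(*(R[i] for i in group)) == all_relators
           for group in combinations(R, t))

# Any t-1 participants miss at least one relator index.
assert all(set().union(*(R[i] for i in group)) != all_relators
           for group in combinations(R, t - 1))
```

For a group of t − 1 participants, the missing relator is exactly the one whose index set Aj equals that group.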

Then any t of the n persons can obtain the sequence a1 , . . . , ak by taking the union of
the subsets of the relations of G that they possess. Thus they obtain the presentation
G = ⟨x1 , x2 , . . . , xk ; r1 = ⋅ ⋅ ⋅ = rm = 1⟩ and can solve the word problem wi = 1 in G for
i = 1, . . . , k. A collection of fewer than t persons cannot decode the message correctly,
since the union of fewer than t of the sets R1 , . . . , Rn contains some but not all of the
relations r1 , . . . , rm .
Such a collection leads to a group presentation

\[
\tilde{G} = \langle x_1, x_2, \dots, x_k ;\ r_{j_1} = \cdots = r_{j_p} = 1 \rangle
\]

with p < m and $G \neq \tilde{G}$, where $w_i = 1$ in G is, in general, not equivalent to $w_i = 1$ in $\tilde{G}$.
Notice that the secret sequence to be shared is not needed until the final step. It is
possible for someone to distribute the sets R1 , . . . , Rn and decide at a later time what the
sequence a1 , . . . , ak would be. In that way the scheme can also be used so that t of the n
persons can verify the authenticity of the message. In particular, the binary sequence in
Step 4 may contain a predetermined subsequence (signature) along with the actual message.
Then any t persons may check whether this predetermined sequence is contained
in the encoded message and thus validate it.
In [96], Panagopoulos also describes some methods for attacking
this scheme and makes some suggestions for possible group presentation types to use.
Moldenhauer [90] proposed a modification of Panagopoulos’ (t, n)-threshold scheme
using Nielsen transformations. We need the following.

Theorem 24.7.1. Let T1 , T2 , . . . be a countable number of matrices of the form

\[
T_j = \begin{pmatrix} -r_j & -1 + r_j^2 \\ 1 & -r_j \end{pmatrix}
\]

where the $r_j$ are integers with $r_{j+1} - r_j \ge 3$ and $r_1 \ge 2$. Then T1 , T2 , . . . form a basis of a free
group of countable rank.

Proof. The isometric circle of $T_j$ is given by $|z - r_j| = 1$ and that of $T_j^{-1}$ is given by $|z + r_j| = 1$.
The respective isometric disks

\[
K(T_1), K(T_1^{-1}), K(T_2), K(T_2^{-1}), \dots
\]

are pairwise disjoint because of the restriction on the $r_j$. Let F be the group generated by
$\{T_1, T_2, \dots\}$. Clearly, F is a subgroup of SL(2, ℤ). Let $S_k \cdots S_1$ be a reduced word in F. Each
$S_i$ is a $T_j$ or $T_j^{-1}$. It may happen that $S_{i+1} = S_i$. Suppose P lies outside every isometric disk
$K(T_j)$, $K(T_j^{-1})$, j = 1, 2, . . . . Such a P exists because F is a subgroup of SL(2, ℤ). Then
$S_1(P)$ lies inside $K(S_1^{-1})$. Since $S_1(P)$ lies outside $K(S_2)$ (this is true even if $S_1 = S_2$), it is seen
that $S_2 S_1(P)$ is inside $K(S_2^{-1})$. We conclude that $Q = S_k \cdots S_1(P)$ is inside $K(S_k^{-1})$. Hence,
$S_k \cdots S_1 \neq 1\ (= E_2)$. This shows that F is free on $\{T_1, T_2, \dots\}$.
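As a numerical sanity check of the theorem (not a substitute for the proof), the following sketch builds the matrices for the assumed sample values r = 2, 5, 8 and confirms that no nonempty freely reduced word of length at most 5 in them and their inverses evaluates to the identity matrix. The helper names, the sample values and the length bound are choices made for this example.

```python
from itertools import product


def T(r):
    """The matrix T_j of Theorem 24.7.1 for the integer r = r_j."""
    return ((-r, -1 + r * r), (1, -r))


def inv(M):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = M
    return ((d, -b), (-c, a))


def mul(M, N):
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def det(M):
    (a, b), (c, d) = M
    return a * d - b * c


IDENTITY = ((1, 0), (0, 1))


def reduced_words(num_generators, max_len):
    """Yield all nonempty freely reduced words of length <= max_len; letter k means generator k, -k its inverse."""
    letters = [i for i in range(-num_generators, num_generators + 1) if i != 0]
    for length in range(1, max_len + 1):
        for word in product(letters, repeat=length):
            if all(x != -y for x, y in zip(word, word[1:])):
                yield word


def evaluate(word, gens):
    """Multiply out a word in the given generators."""
    M = IDENTITY
    for letter in word:
        g = gens[abs(letter) - 1]
        M = mul(M, g if letter > 0 else inv(g))
    return M


rs = [2, 5, 8]                      # r_1 >= 2 and consecutive gaps >= 3
gens = [T(r) for r in rs]
assert all(det(g) == 1 for g in gens)

no_relation_found = all(evaluate(w, gens) != IDENTITY for w in reduced_words(len(gens), 5))
print(no_relation_found)
```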

We now describe the modified (t, n)-threshold scheme. We write $N_{r_j}$ instead of $T_j$
and choose a large number m of the form $m = 2^n$, n ≥ 64. This allows us to use the idea
of linear congruential generators (modulo m) to get a stream cipher. The dealer performs
the following to distribute the secret among n participants:
1. Start with a set $(x_1, N_{r_1}), \dots, (x_m, N_{r_m})$, where $x_1, \dots, x_m$ are the generators of the free
group $F(x_1, \dots, x_m)$ and $N_{r_1}, \dots, N_{r_m}$ are matrices in SL(2, ℤ) of the form

\[
N_{r_i} = \begin{pmatrix} -r_i & -1 + r_i^2 \\ 1 & -r_i \end{pmatrix}
\]

satisfying $r_1 \ge 2$ and $r_{i+1} \ge r_i + 3$ (more generally, any free generating set for a free
subgroup in SL(2, ℚ)). The secret is a rational number

\[
\sum_{i=1}^{m} \frac{1}{|\operatorname{tr}(N_{r_i})|} = \sum_{i=1}^{m} \frac{1}{2|r_i|}.
\]

2. Apply a sequence of Nielsen transformations to the set of pairs above to obtain a
new set

\[
(\nu_1, M_1), \dots, (\nu_m, M_m).
\]

3. Distribute subsets of this set to the participants as in Panagopoulos’ scheme.

To recover the secret, perform the following:
4. The participants take the union of their shares. In the case that t participants gather, they are able to
recover the set $(\nu_1, M_1), \dots, (\nu_m, M_m)$.
5. Apply a sequence of Nielsen transformations to the obtained set of pairs in order
to Nielsen-reduce the first components and obtain the set $x_1, \dots, x_m$ in the first
components. As a result, in the second components, we get the original matrices
$N_{r_1}, \dots, N_{r_m}$. Compute the sum $\sum_{i=1}^{m} \frac{1}{|\operatorname{tr}(N_{r_i})|}$.
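The dealer's side of this scheme can be sketched for tiny parameters. The code below builds the pairs, scrambles them with a few elementary Nielsen transformations, and computes the secret. For simplicity the recovery step replays the recorded moves backwards instead of performing a genuine Nielsen reduction of the first components, which is what Step 5 requires; this shortcut, the sample values r = 2, 5, 9, the move encoding and all helper names are assumptions for illustration.

```python
from fractions import Fraction


def N(r):
    """The matrix N_r from Step 1."""
    return ((-r, -1 + r * r), (1, -r))


def mat_mul(P, Q):
    (a, b), (c, d) = P
    (e, f), (g, h) = Q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_inv(P):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = P
    return ((d, -b), (-c, a))


def free_reduce(word):
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()
        else:
            out.append(letter)
    return tuple(out)


def pair_mul(p, q):
    """Multiply two (word, matrix) pairs componentwise."""
    return (free_reduce(p[0] + q[0]), mat_mul(p[1], q[1]))


def pair_inv(p):
    return (tuple(-x for x in reversed(p[0])), mat_inv(p[1]))


def apply_move(pairs, move, undo=False):
    """('inv', i) inverts entry i; ('mul', i, j) with i != j replaces entry i by entry_i * entry_j."""
    pairs = list(pairs)
    if move[0] == "inv":
        pairs[move[1]] = pair_inv(pairs[move[1]])
    else:
        _, i, j = move
        factor = pair_inv(pairs[j]) if undo else pairs[j]
        pairs[i] = pair_mul(pairs[i], factor)
    return pairs


def secret(matrices):
    """Sum of 1/|trace| over the given matrices, as an exact rational number."""
    return sum(Fraction(1, abs(M[0][0] + M[1][1])) for M in matrices)


rs = [2, 5, 9]
original = [((i + 1,), N(r)) for i, r in enumerate(rs)]
moves = [("mul", 0, 1), ("inv", 2), ("mul", 2, 0), ("mul", 1, 2)]

scrambled = original
for move in moves:
    scrambled = apply_move(scrambled, move)

recovered = scrambled
for move in reversed(moves):
    recovered = apply_move(recovered, move, undo=True)

assert recovered == original
print(secret(matrix for _, matrix in recovered))   # equals 1/4 + 1/10 + 1/18
```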

Kotov, Panteleev and Ushakov, see [87], analyzed this secret sharing protocol. They
could reduce it to a system of polynomial equations over the free group F({x1 , . . . , xm } ∪
{a1 , . . . , am−1 }) where xi stands for an unknown matrix Nri and ai stands for the ma-
trix Mi . Replacing xi with an unknown matrix Nri and ai with Mi and performing matrix
multiplication, we obtain a system of polynomial equations which can then be fed to
any computer algebra system that can solve polynomial equations, for instance CoCoA.
The solution of the systems provides the original matrices M1 , . . . , Mm . The attack
reconstructs the original data generated by the dealer and does not depend on the func-
tion of M1 , . . . , Mm used to calculate the shared secret. It seems unlikely that their attack
is successful if $m \ge 2^{64}$. If so, for chosen matrices Nr1 , . . . , Nrm we still may collect in each
round m new matrices from the countably many and/or may use the stream cipher for
a one-time pad.
Moreover, increasing the length of the keys and the number of Nielsen transformations
increases the sizes of the polynomials and seems to be a successful countermeasure against
their attack. Another possibility to repel such attacks is to change the tactic and to work
with more general matrices Nr1 , Nr2 , . . . which form a free generating set of a free sub-
group in SL(k, ℝ), k ≥ 2.

24.8 Ko–Lee and Anshel–Anshel–Goldfeld Protocols


All of the non-Abelian group based protocols depend on the difficulty of solving certain
group decision problems and group theoretical computational problems. Recall that the
conjugacy problem, also called the decision conjugacy problem, for a group G, or more
precisely for a group presentation for G, is the following: given g, h ∈ G, determine
algorithmically if they are conjugate.
The conjugacy problem is unsolvable in general, that is, there exist group presen-
tations for which there does not exist an algorithm that solves the conjugacy problem.
Hence a solution to the conjugacy problem is usually associated with a particular class
of group presentations. For example, the conjugacy problem is solvable in free groups
and in torsion-free hyperbolic groups.
Relevant to the Ko–Lee protocol is the conjugator search problem. This is, given a
group presentation for G, and two elements g1 , g2 in G, that are known to be conjugate,
to determine algorithmically a conjugator, that is, an element h ∈ G with $g_1 = h g_2 h^{-1}$. It
is known, as with the decision conjugacy problem, that the conjugator search problem
is undecidable in general.

24.8.1 The Ko–Lee Protocol

Ko, Lee et al., see [85], developed a public key exchange system that is a direct translation
of the Diffie–Hellman protocol to a non-Abelian group theoretic setting. Its security is
based on the difficulty of the conjugacy problem. We assume that the platform group
has unique normal forms that are easy to compute for a given group element, but from
which it is hard to recover the individual factors of a product.
Recall from Section 24.1 that by this we mean that if G = ⟨X; R⟩ is a finite presenta-
tion for the group G and g ∈ G then there is a unique expression NFX (g) called a normal
form as a word in the generators X. Further, given any g ∈ G it is computationally easy
to find NFX (g). On the other hand, given g1 , g2 ∈ G and given the normal form NFX (g1 g2 ),
it is computationally difficult to recover g1 and g2 . We say that there is good diffusion in
terms of normal forms in forming products.
In any group G and for g, h ∈ G the notation $g^h$ indicates the conjugate of g by h,
that is, $g^h = h^{-1} g h$. What is important for both the Ko–Lee and Anshel–Anshel–Goldfeld
protocols is that relative to this notation, group conjugation behaves exactly as ordinary
exponentiation. That is, for group elements $g, h_1, h_2 \in G$ we have $(g^{h_1})^{h_2} = g^{h_1 h_2}$. That
this is true is a straightforward computation:

\[
(g^{h_1})^{h_2} = h_2^{-1} g^{h_1} h_2 = h_2^{-1} h_1^{-1} g h_1 h_2 = (h_1 h_2)^{-1} g (h_1 h_2) = g^{h_1 h_2}.
\]

With this observation, the Ko–Lee protocol exactly mimics, using group conjuga-
tion, the traditional Diffie–Hellman protocol. We first start with a platform group G sat-
isfying the necessary requirements on normal forms. We assume further that the plat-
form group G has a collection of large (noncyclic) subgroups that commute elementwise.
That is, if A, B are two of these subgroups and a ∈ A and b ∈ B, then ab = ba. It is not
necessary that the subgroups themselves be Abelian.
Alice and Bob choose a pair of these commuting subgroups A and B of the platform
group G. A is Alice’s subgroup while Bob’s subgroup is B and these are secret. By assump-
tion each element of A commutes with each element of B. Further, it is not assumed that
A and/or B are themselves Abelian. Now the method completely mimics the classical
Diffie–Hellman technique.
There is a public element g ∈ G. Alice chooses a random secret element a ∈ A and
makes public $g^a$, the conjugate of g by a.
Bob chooses a random secret element b ∈ B and makes public $g^b$, the conjugate of g
by b. The secret shared key is $g^{ab}$. Notice that ab = ba since the subgroups commute. It
follows then that

\[
(g^a)^b = g^{ab} = g^{ba} = (g^b)^a
\]

just as if these were ordinary exponents.


It follows, as in the number theoretic based Diffie–Hellman protocol, that both Bob
and Alice can determine the common secret. Alice knows her secret key a and Bob’s
public key $g^b$. Hence she knows $(g^b)^a = g^{ba}$. Bob knows his secret key b and $g^a$ is public.
Hence Bob knows $(g^a)^b = g^{ab}$. However, as explained, $g^{ab} = g^{ba}$. The difficulty of breaking
the protocol is that of the decision conjugacy problem.
It is known that both the decision conjugacy problem and the conjugator search
problem are undecidable in general. However, there are groups where both are solvable
but hard, that is the problems are solvable but are not solvable in polynomial time. These
groups then become the target platform groups for the Ko–Lee protocol. Ko and Lee
in their initial work suggest the use of the braid groups. We will discuss braid group
cryptography later in this chapter.
We now summarize the formal setup for the Ko–Lee Key Exchange Protocol. After
this we will show how to use the ElGamal method to construct a public key encryption
system from this.

24.8.1.1 Ko–Lee Preparation


1. We start with a platform group G. We assume that G has a finite presentation with
efficiently computable normal forms that have good diffusion. Further the group G
must have a large collection of subgroups that commute elementwise.
2. We choose an element g ∈ G.
3. We assume that Alice wants to share a common key with Bob. Alice and Bob choose
subgroups A and B that elementwise commute. A is Alice’s subgroup and B is Bob’s
subgroup. These subgroups are kept secret and known only to Alice and Bob,
respectively.

Ko–Lee Key Exchange


1. Alice randomly chooses an a ∈ A. This element a will be her secret key. Her public
key is $(g, g^a)$ where $g^a = a^{-1} g a$ is the conjugate of g by her secret key a. All public
information and communication is done in terms of the normal forms of these
elements.
2. Bob randomly chooses an element b ∈ B. This element b will be his secret key. His
public key is $(g, g^b)$ where $g^b = b^{-1} g b$ is the conjugate of g by his secret key b. As
with Alice all public information and communication is done in terms of the normal
forms of these elements.
3. The secret shared key is $g^{ab}$.
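The key exchange can be traced in a toy platform group. The sketch below uses the symmetric group on 8 points, with Alice's subgroup moving only the points 0 to 3 and Bob's only the points 4 to 7, so the two subgroups commute elementwise. This platform is an assumption chosen so that the example runs; it has none of the hardness properties the protocol needs, and the helper names are hypothetical.

```python
import random


def compose(p, q):
    """Product p*q of permutations stored as tuples of images: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conj(g, h):
    """The conjugate g^h = h^{-1} g h."""
    return compose(compose(inverse(h), g), h)


def random_perm_on(support, degree, rng):
    """Random permutation of range(degree) that moves only points in support."""
    images = list(support)
    rng.shuffle(images)
    p = list(range(degree))
    for src, dst in zip(support, images):
        p[src] = dst
    return tuple(p)


rng = random.Random(2024)
DEGREE = 8
g = random_perm_on(range(DEGREE), DEGREE, rng)   # public element
a = random_perm_on(range(0, 4), DEGREE, rng)     # Alice's secret key in A
b = random_perm_on(range(4, 8), DEGREE, rng)     # Bob's secret key in B

g_a = conj(g, a)   # Alice's public value
g_b = conj(g, b)   # Bob's public value

alice_key = conj(g_b, a)   # (g^b)^a
bob_key = conj(g_a, b)     # (g^a)^b
assert alice_key == bob_key == conj(g, compose(a, b))
```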

24.8.1.2 ElGamal Encryption Using the Ko–Lee Protocol


As with the standard Diffie–Hellman key exchange protocol using number theory, the
Ko–Lee protocol can be changed to an encryption system via the ElGamal method. There
are several different variants of noncommutative ElGamal systems. At the simplest level
we assume that we have a group G appropriate for the Ko–Lee key exchange and that
Alice and Bob want to communicate secretly. The element g ∈ G is public and Alice and
Bob, respectively, have chosen their appropriate commuting subgroups A and B. Bob
has made public $g^b$ for b ∈ B in normal form and Alice has made public $g^a$ for a ∈ A
also in normal form. The secret shared key is then $g^{ab}$. We assume that Alice wants to
send an encrypted message to Bob and further we assume the message can be
encoded as h ∈ G, that is, as an element of the group G. Alice then sends to Bob the normal
form of $h g^{ab}$. Bob can determine the common shared secret $g^{ab}$. He then multiplies $h g^{ab}$
by $(g^{ab})^{-1}$ to obtain the secret h.
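The encryption and decryption steps can be followed in the same kind of toy setting. The sketch below uses permutations of six points with fixed, hand-picked values for g, a, b and the message h; these values, the permutation platform and the helper names are assumptions for illustration and provide no security.

```python
def compose(p, q):
    """Product p*q of permutations stored as tuples of images: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conj(g, h):
    """The conjugate g^h = h^{-1} g h."""
    return compose(compose(inverse(h), g), h)


g = (1, 2, 3, 4, 5, 0)    # public element
a = (1, 2, 0, 3, 4, 5)    # Alice's secret, moves only the points 0, 1, 2
b = (0, 1, 2, 4, 5, 3)    # Bob's secret, moves only the points 3, 4, 5
g_a, g_b = conj(g, a), conj(g, b)   # public values

h = (5, 4, 3, 2, 1, 0)    # message encoded as a group element

# Alice encrypts with the shared key computed from Bob's public value.
ciphertext = compose(h, conj(g_b, a))           # h * g^{ab}

# Bob recomputes the shared key from Alice's public value and strips it off.
shared_key = conj(g_a, b)                       # g^{ab}
recovered = compose(ciphertext, inverse(shared_key))
assert recovered == h
```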
As with the number theoretic based public key cryptosystems, the Ko–Lee method
can be used to provide methods for other protocols, especially authentication and digital
signature protocols.

24.8.2 The Anshel–Anshel–Goldfeld Protocol

We now describe another non-Abelian group-based public key exchange protocol. It is
somewhat similar to the Ko–Lee protocol and was developed at approximately the same
time. This is the Anshel–Anshel–Goldfeld public key exchange protocol.
As in the Ko–Lee protocol we start with a group G given by a finite presentation
G = ⟨X; R⟩. We further assume as before that there are efficiently computable normal
forms relative to the presentation ⟨X; R⟩. The Ko–Lee protocol required two large com-
muting subgroups. For communication, the Anshel–Anshel–Goldfeld protocol requires
a choice of subgroups of G, but they need not commute. While the difficulty of the deci-
sion conjugacy problem provides the security for the Ko–Lee method, it is the difficulty
of the conjugator search problem that provides the hard problem, and hence the secu-
rity, in the Anshel–Anshel–Goldfeld protocol.
Once we have our platform group G, we assume that Alice and Bob want to obtain
a common shared secret or a common shared secret key. We assume that this secret key
can be expressed as a group element g ∈ G. The first step is for Alice and Bob to choose
random finitely generated subgroups of G by giving a set of generators for each,

$$A = \{a_1, \dots, a_n\}, \quad B = \{b_1, \dots, b_m\},$$

and make them public. The subgroup A is Alice’s subgroup while the subgroup B is Bob’s
subgroup.
Alice chooses a secret group word $a = w(a_1, \dots, a_n)$ in her subgroup while Bob
chooses a secret group word $b = v(b_1, \dots, b_m)$ in his subgroup. As before, for an element
$g \in G$ we denote by $NF_X(g)$ the normal form for $g$. Alice knows her secret word $a$
and knows the generators $b_i$ of Bob’s subgroup. She can then form the conjugates of the
generators of Bob’s subgroup $B$ by her secret element $a \in A$. That is, she can compute
$b_i^a = a^{-1} b_i a$ for each $b_i$. She then makes public the normal forms of these conjugates

$$NF_X(b_i^a), \quad i = 1, \dots, m.$$

Bob does the analogous thing. He knows his secret word $b$ and the generators $a_j$,
$j = 1, \dots, n$, of Alice’s subgroup $A$ and hence can compute the conjugates $a_j^b = b^{-1} a_j b$ for
$j = 1, \dots, n$. He then makes public the normal forms of the conjugates

$$NF_X(a_j^b), \quad j = 1, \dots, n.$$

The common shared secret is the commutator

$$[a, b] = a^{-1} b^{-1} a b = a^{-1} a^b = (b^a)^{-1} b.$$

Notice that this is known to both Alice and Bob. Alice knows $a^b = b^{-1} a b$ since she
knows $a$ in terms of the generators $a_i$ of her subgroup and she knows their conjugates by
$b$, since Bob has made the conjugates of the generators of $A$ by $b$ public. That is, Alice
knows $a = w(a_1, \dots, a_n)$ and $a^b = b^{-1} a b = w(b^{-1} a_1 b, \dots, b^{-1} a_n b) = w(a_1^b, \dots, a_n^b)$. Since
Alice knows $a^b$, she knows

$$[a, b] = a^{-1} b^{-1} a b = a^{-1} a^b.$$

In an analogous manner Bob knows $[a, b] = (b^a)^{-1} b$, since he knows his secret element
$b$ in terms of the generators $b_j$, $j = 1, \dots, m$, of his subgroup $B$ and Alice has
made public the conjugates of each of his generators by her secret element $a$. Hence
$b = v(b_1, \dots, b_m)$ so that $b^a = v(b_1^a, \dots, b_m^a)$ and this is known to Bob. Since Bob knows $b^a$
and $b$, he knows

$$[a, b] = a^{-1} b^{-1} a b = (a^{-1} b^{-1} a) b = (b^{-1})^a b = (b^a)^{-1} b.$$

Notice that in this system there is no requirement that the chosen subgroups A and
B commute.
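
The key agreement just described can be traced step by step on a toy platform. The sketch below is an illustration under stated assumptions rather than the published scheme: permutations of six points replace the platform group, tuple equality plays the role of the normal form, and words are encoded as lists of `(generator index, exponent)` pairs with exponent `+1` or `-1`. The names `evaluate`, `A_gens`, `B_gens` and the specific words are hypothetical.

```python
def mul(p, q):
    """Product p*q of two permutations stored as tuples (x is sent to p[q[x]])."""
    return tuple(p[i] for i in q)

def inv(p):
    """Inverse permutation of p."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def conj(x, k):
    """Conjugate x^k = k^{-1} x k."""
    return mul(mul(inv(k), x), k)

def evaluate(word, gens):
    """Evaluate a word, a list of (index, exponent) pairs with exponent +1 or -1."""
    result = tuple(range(len(gens[0])))          # identity permutation
    for index, exponent in word:
        factor = gens[index] if exponent == 1 else inv(gens[index])
        result = mul(result, factor)
    return result

# Public generating sets of Alice's and Bob's subgroups (they need not commute).
A_gens = [(1, 0, 2, 3, 4, 5), (0, 2, 1, 3, 4, 5), (0, 1, 2, 4, 5, 3)]
B_gens = [(1, 2, 3, 4, 5, 0), (0, 1, 3, 2, 4, 5)]

w = [(0, 1), (2, -1), (1, 1), (2, 1)]   # Alice's secret word w
v = [(0, 1), (1, 1), (0, -1)]           # Bob's secret word v
a = evaluate(w, A_gens)
b = evaluate(v, B_gens)

B_conj = [conj(x, a) for x in B_gens]   # published by Alice: b_i^a
A_conj = [conj(x, b) for x in A_gens]   # published by Bob:   a_j^b

# Alice: a^b = w(a_1^b, ..., a_n^b), then [a, b] = a^{-1} a^b.
key_alice = mul(inv(a), evaluate(w, A_conj))
# Bob: b^a = v(b_1^a, ..., b_m^a), then [a, b] = (b^a)^{-1} b.
key_bob = mul(inv(evaluate(v, B_conj)), b)
```

Both parties evaluate their own secret word on the conjugated generators published by the other side. This works because conjugation by a fixed element is a homomorphism, so it can be applied letter by letter.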
An attacker would have to find a corresponding conjugator, that is, an element
that conjugates each of the generators to its published image. This is the conjugator search
problem: Given elements $g, h$ in a group $G$, where it is known that $g^k = k^{-1} g k = h$ for some
$k \in G$, determine such a conjugator $k$. It is known that this problem is undecidable in general, that is, there are groups
where the conjugator cannot be determined algorithmically. On the other hand there
are groups where the conjugator search problem is solvable but computationally
hard. Such groups become the
ideal platform groups for the Anshel–Anshel–Goldfeld protocol.
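
In a small finite group the conjugator search problem can be solved by exhaustive search, which shows what an attacker is looking for and why the platform group has to be large and complicated. The following sketch assumes permutations of five points as the group and searches for a simultaneous conjugator of several pairs $(g, h)$; the function name `find_conjugator` and the sample data are illustrative assumptions, and the method is plain brute force rather than any attack from the literature.

```python
from itertools import permutations

def mul(p, q):
    """Product p*q of two permutations stored as tuples (x is sent to p[q[x]])."""
    return tuple(p[i] for i in q)

def inv(p):
    """Inverse permutation of p."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def conj(x, k):
    """Conjugate x^k = k^{-1} x k."""
    return mul(mul(inv(k), x), k)

def find_conjugator(pairs, n):
    """Return some k in S_n with k^{-1} g k == h for every (g, h) in pairs, else None."""
    for k in permutations(range(n)):
        if all(conj(g, k) == h for g, h in pairs):
            return k
    return None

secret = (2, 0, 3, 1, 4)                          # the hidden conjugator
gens = [(1, 0, 2, 3, 4), (0, 2, 3, 4, 1)]         # public elements
published = [(g, conj(g, secret)) for g in gens]  # public pairs (g, g^secret)

found = find_conjugator(published, 5)
```

The element returned need not equal the hidden one; any element conjugating every public generator to its published image solves the search problem as stated. The loop visits up to $n!$ candidates, so it is hopeless already for moderate $n$.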
The security in this system then rests on the computational difficulty of the conjugator
search problem. Anshel, Anshel, and Goldfeld suggested, as did Ko, Lee et al., the braid
groups $B_n$ as potential platforms. The braid groups are a class of infinite, finitely presented
groups that arise in many different contexts. The braid group $B_n$ has a standard
presentation with $n - 1$ generators.
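
For reference, the standard presentation mentioned here is the classical Artin presentation. It is written out below as background; the generator names $\sigma_1, \dots, \sigma_{n-1}$ follow the usual convention, which is also the one used in the exercises.

```latex
B_n = \bigl\langle \sigma_1, \dots, \sigma_{n-1} \;\big|\;
  \sigma_i \sigma_j = \sigma_j \sigma_i \ \ (|i - j| \ge 2), \quad
  \sigma_i \sigma_{i+1} \sigma_i = \sigma_{i+1} \sigma_i \sigma_{i+1} \ \ (1 \le i \le n - 2)
\bigr\rangle
```

The first family of relations says that generators acting on disjoint pairs of strands commute; the second is the braid relation between neighboring generators.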
The parameters that must be chosen when using the braid groups as platforms
for either the Ko–Lee protocol or the Anshel–Anshel–Goldfeld protocol are then
the number of generators of the braid group used and the number of generators for
the chosen subgroups. For example, $B_{200}$, the braid group on 200 strands, with 12 or more
generators in the chosen subgroups might be used. It has been shown that the larger
the number of strands, the harder it is to attack the protocol. The suggested use of the
braid groups by both Anshel, Anshel, and Goldfeld and Ko and Lee led to the development
of braid group cryptography. There have been various attacks on the braid group
cryptosystems.
We now summarize the formal setup for the Anshel–Anshel–Goldfeld Key Exchange
Protocol. After this we will show how to use the ElGamal method to construct a public
key encryption system from this.

24.8.2.1 Anshel–Anshel–Goldfeld Preparation


1. We start with a platform group G. We assume that G has a finite presentation with
efficiently computable normal forms that have good diffusion. Further, there is a
large collection of efficiently computable subgroups.
2. We assume that Alice wants to share a common key with Bob. Alice and Bob choose
random finitely generated subgroups of G by giving a set of generators for each,

$$A = \{a_1, \dots, a_n\}, \quad B = \{b_1, \dots, b_m\},$$

and make them public. The subgroup A is Alice’s subgroup while the subgroup B is
Bob’s subgroup.

24.8.2.2 Anshel–Anshel–Goldfeld Key Exchange


1. Alice chooses a secret group word $a = w(a_1, \dots, a_n)$ in her subgroup. Alice knows
her secret word $a$ and knows the generators $b_i$ of Bob’s subgroup. She can then form
the conjugates of the generators of Bob’s subgroup $B$ by her secret element $a \in A$.
That is, she can compute $b_i^a = a^{-1} b_i a$ for each $b_i$. She then makes public the normal
forms of these conjugates

$$NF_X(b_i^a), \quad i = 1, \dots, m.$$

2. Bob chooses a secret group word $b = v(b_1, \dots, b_m)$ in his subgroup. Bob knows his
secret word $b$ and knows the generators $a_j$ of Alice’s subgroup. He can then form
the conjugates of the generators of Alice’s subgroup $A$ by his secret element $b \in B$.
That is, he can compute $a_j^b = b^{-1} a_j b$ for each $a_j$. He then makes public the normal
forms of these conjugates

$$NF_X(a_j^b), \quad j = 1, \dots, n.$$

3. The secret shared key is the commutator

$$[a, b] = a^{-1} b^{-1} a b = a^{-1} a^b = (b^a)^{-1} b.$$

24.8.2.3 ElGamal Encryption using the Anshel–Anshel–Goldfeld Protocol


As with all public key exchange protocols, the Anshel–Anshel–Goldfeld key exchange
can be developed into a cryptosystem by the ElGamal method. This works essentially
in the same manner as for the Ko–Lee protocol. We assume that we have a group G
appropriate for the Anshel–Anshel–Goldfeld key exchange and that Alice and Bob want
to communicate secretly.
Alice and Bob, respectively, have chosen their appropriate subgroups A and B whose
generators have been made public. Bob has made public the conjugates of the generators
of A by his secret element b ∈ B in normal form and Alice has made public the conjugates
of the generators of B by her secret element a ∈ A, also in normal form. The secret shared
key is then the commutator [a, b].
We assume that Alice wants to send an encrypted message to Bob, and further we
assume that the message can be encoded as $h \in G$, that is, as an element of
the group $G$. Alice then sends to Bob the normal form of $h[a, b]$. Bob can determine the
common shared key $[a, b]$. He then multiplies $h[a, b]$ by $[a, b]^{-1}$ to obtain the secret $h$.

Exercises
1. Bob has a backup authentication security system as described in Section 24.5. His
basic words are $w_1 = x_1^{-1} x_2^2 x_3^{-2}$, $w_2 = x_1^5 x_2^3$, and $w_3 = x_2^5 x_1^3 x_2^{-2} x_3^4$. The bank sends him
w = y21 y33 y1 . What must the group randomizer send back?
2. Let M = PSL(2, ℤ) be the modular group. Let 𝒜 = {a, b, c, d, e, f , g} be a 7 letter
plaintext alphabet. Choose a free subgroup of the modular group to encrypt these.
(a) Using your basic encryption and message units of size 3, what would be the
encryption matrices for the message abbdceffgcba?
(b) Using your basic encryption and the algorithm given in Problem 1, what is the
plaintext message for ( 85 35 ) and ( 73 49 )?
3. The following protocol is based on the factorization search problem which is: Given
two subgroups A, B of a group G and w ∈ G, to find a ∈ A, b ∈ B with w = ab. This
protocol is described in [93]. For this problem you must show and explain that the
protocol works.
The requirements for the protocol are as follows: a public group and two public
subgroups $A, B$ that commute elementwise. Alice randomly chooses two private elements
$a_1 \in A$ and $b_1 \in B$ and sends $a_1 b_1$ to Bob. Bob does the same, choosing $a_2 \in A$
and $b_2 \in B$, and sends $a_2 b_2$ to Alice.
The common shared secret is $K = a_2 a_1 b_1 b_2$.
4. Prove Epstein’s theorem: Given a random finitely generated subgroup of $GL(n, \mathbb{R})$,
with probability 1 it is a free group. The probability is standard measure on $\mathbb{R}^{n^2}$.
Hint: Given a finite set of matrices in $GL(n, \mathbb{R})$, think what a relation between them
would mean algebraically on the coefficients and where this would place the matrices
topologically.
5. Let $G = H_1 * \cdots * H_n$ with $n \ge 2$ be the free product of finitely many nontrivial
groups. Suppose that $|H_1| \ge 3$ if $n = 2$. Show that $G$ has the strong generic free group
property.
6. Let G be a group and N be a normal subgroup. Show: If the quotient G/N satisfies
the strong generic free group property then G also satisfies the strong generic free
group property.
7. Show that a group with a generating set X is an epimorphic image of F(X). Moreover,
every map X → G with G a group can be extended to a unique homomorphism
f : F(X) → G.
8. Let $F$ be a free group on $\{x_1, \dots, x_n\}$. Show that each conjugation $x_i \mapsto g x_i g^{-1}$ with
$g \in F$ can be written as a sequence of elementary Nielsen transformations.
9. Let $F$ be a free group on $\{x_1, \dots, x_n\}$. Show that the automorphism group $\mathrm{Aut}(F)$ is
generated by the elementary Nielsen transformations (N1) and (N2).
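10. Let $PB_n$ stand for the pure braid group, $n \ge 3$. Using the Reidemeister–Schreier
method, show that this group has a presentation with generators

$$A_{ij} = \sigma_{j-1} \sigma_{j-2} \cdots \sigma_{i+1} \sigma_i^2 \sigma_{i+1}^{-1} \cdots \sigma_{j-2}^{-1} \sigma_{j-1}^{-1},$$

where $1 \le i < j \le n$ and relations

$$A_{rs} A_{ij} A_{rs}^{-1} =
\begin{cases}
A_{ij} & \text{if } s < i \text{ or } j > r, \\
A_{is} A_{ij} A_{is}^{-1} & \text{if } i < j = r < s, \\
A_{ij} A_{ir} A_{ij} A_{ir}^{-1} A_{ij}^{-1} & \text{if } i < r < j = s, \\
A_{is} A_{ir} A_{is}^{-1} A_{ir}^{-1} A_{ij} A_{ir} A_{is} A_{ir}^{-1} A_{is}^{-1} & \text{if } i < r, j, s.
\end{cases}$$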

11. Show that the pure braid group $PB_3$ is isomorphic to the direct product $F_2 \times \mathbb{Z}$.
12. Let $F_n$ be the free group of rank $n$ on the free generating system $X = \{x_1, \dots, x_n\}$
and let $\beta \in \mathrm{Aut}(F_n)$. Show that $\beta \in B_n$ if and only if $\beta$ satisfies the following two
conditions:
(1) $\beta(x_i)$ is conjugate to another generator.
(2) $\beta(x_1 \cdots x_n) = x_1 \cdots x_n$.
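13. Let $G$ be $B_{20}$, the braid group on 19 generators $\sigma_1, \dots, \sigma_{19}$. Let $A$ be the subgroup
generated by $\sigma_1, \dots, \sigma_5$ and $B$ the subgroup generated by $\sigma_{16}, \dots, \sigma_{19}$.
Let $g = \sigma_7^3 \sigma_1^2 \sigma_3^{-1} \sigma_5^{-2} \sigma_{10}$, $a = \sigma_2^4 \sigma_3^2 \sigma_1$, and $b = \sigma_{17}^4 \sigma_{18} \sigma_{17}^{-1}$.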
(a) What is the secret shared key using the Ko–Lee protocol?
(b) What is the secret shared key using the Anshel–Anshel–Goldfeld protocol?
Bibliography
General Abstract Algebra

[1] J. L. Alperin and R. B. Bell, Groups and Representations, Springer-Verlag, 1995.


[2] M. Artin, Algebra, Prentice-Hall, 1991.
[3] C. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras, Wiley
Interscience, 1966.
[4] C. Curtis and I. Reiner, Methods of Representation Theory I, Wiley Interscience, 1982.
[5] C. Curtis and I. Reiner, Methods of Representation Theory II, Wiley Interscience, 1986.
[6] V. Diekert, M. Kufleitner, G. Rosenberger, and U. Hertrampf, Discrete Algebraic Methods, De Gruyter,
2016.
[7] B. Fine and G. Rosenberger, The Fundamental Theorem of Algebra, Springer-Verlag, 2000.
[8] J. Fraleigh, A First Course in Abstract Algebra, 7th ed., Addison-Wesley, 2003.
[9] E. G. Hafner, Lineare Algebra, Wiley-VCH, 2018.
[10] P. R. Halmos, Naive Set Theory, Springer-Verlag, 1998.
[11] I. Herstein, Topics in Algebra, Blaisdell, 1964.
[12] M. Kreuzer and L. Robbiano, Computational Commutative Algebra I and II, Springer-Verlag, 1999.
[13] S. Lang, Algebra, Addison-Wesley, 1965.
[14] S. MacLane and G. Birkhoff, Algebra, Macmillan, 1967.
[15] N. McCoy, Introduction to Modern Algebra, Allyn and Bacon, 1960.
[16] N. McCoy, The Theory of Rings, Macmillan, 1964.
[17] G. Stroth, Algebra. Einführung in die Galoistheorie, De Gruyter, 1998.
[18] A. Zimmermann, Representation Theory, Springer-Verlag, 2014.

Group Theory and Related Topics

[19] G. Baumslag, Topics in Combinatorial Group Theory, Birkhäuser, 1993.


[20] O. Bogopolski, Introduction to Group Theory, European Mathematical Society, 2008.
[21] T. Camps, V. Große Rebel, and G. Rosenberger, Einführung in die kombinatorische und die geometrische
Gruppentheorie, Heldermann Verlag, 2008.
[22] T. Camps, S. Kühling, and G. Rosenberger, Einführung in die mengentheoretische und die algebraische
Topologie, Heldermann Verlag, 2006.
[23] C. Carstensen, B. Fine, and G. Rosenberger, On asymptotic densities and generic properties in finitely
generated groups, Groups Complex. Cryptol., 2, 212–225, 2010.
[24] P. Dehornoy, Braids and Self-Distributivity, Birkhäuser, 2000.
[25] D. B. A. Epstein, Almost all subgroups of Lie groups are free, J. Algebra, 19, 261–262, 1971.
[26] B. Fine, A. Moldenhauer, G. Rosenberger, and L. Wienke, Topics in Infinite Group Theory, De Gruyter,
2021.
[27] B. Fine, A. Moldenhauer, G. Rosenberger, A. Schürenberg, and L. Wienke, Geometry and Discrete
Mathematics: A Selection of Highlights, 2nd ed., De Gruyter, 2022.
[28] B. Fine and G. Rosenberger, Algebraic Generalizations of Discrete Groups, Marcel Dekker, 2001.
[29] B. Fine, A. Myasnikov, and G. Rosenberger, Generic subgroups of amalgams, Groups Complex. Cryptol.,
1, 51–61, 2009.
[30] D. Gorenstein, Finite Simple Groups. An Introduction to Their Classification, Plenum Press, 1982.
[31] D. Johnson, Presentations of Groups, Cambridge University Press, 1990.
[32] I. Kapovich, V. Kaimanovich, and P. Schupp, The Subadditive Ergodic Theorem and generic stretching
factors for free group automorphisms, Isr. J. Math., 157, 1–46, 2007.

https://doi.org/10.1515/9783111142524-025

[33] S. Katok, Fuchsian Groups, Univ. of Chicago Press, 1992.


[34] G. Kern-Isberner and G. Rosenberger, A note on numbers of the form $x^2 + Ny^2$, Arch. Math., 43, 148–155,
1986.
[35] R. C. Lyndon, Groups and Geometry, LMS Lecture Note Series 101, Cambridge University Press, 1985.
[36] R. C. Lyndon and P. Schupp, Combinatorial Group Theory, Springer-Verlag, 1977.
[37] W. Magnus, A. Karrass, and D. Solitar, Combinatorial Group Theory, Wiley, 1966.
[38] K. A. Mihailova, The occurrence problem for direct products of groups, Dokl. Akad. Nauk SSSR, 119,
1103–1105, 1958.
[39] C. F. Miller, On Group-Theoretic Decision Problems and Their Classification, Princeton University Press,
1971.
[40] B. H. Neumann, Über ein gruppentheoretisch-arithmetisches Problem, Sitz.ber. Preuss. Akad. Wiss. Phys.
Math. Kl., 429–444, 1933.
[41] D. J. S. Robinson, A Course in the Theory of Groups, Springer-Verlag, 1982.
[42] J. Rotman, Group Theory, 3rd ed., Wm. C. Brown, 1988.
[43] J. Rotman, An Introduction to the Theory of Groups, Springer-Verlag, 1999.

Number Theory

[44] L. Ahlfors, Introduction to Complex Analysis, Springer-Verlag, 1968.


[45] T. M. Apostol, Introduction to Analytic Number Theory, Springer-Verlag, 1976.
[46] A. Baker, Transcendental Number Theory, Cambridge University Press, 1975.
[47] H. Cohn, A Classical Invitation to Algebraic Numbers and Class Fields, Springer-Verlag, 1978.
[48] L. E. Dickson, History of the Theory of Numbers, Chelsea, 1950.
[49] B. Fine, A note on the two-square theorem, Can. Math. Bull., 20, 93–94, 1977.
[50] B. Fine, Sums of squares rings, Can. J. Math., 29, 155–160, 1977.
[51] B. Fine, The Algebraic Theory of the Bianchi Groups, Marcel Dekker, 1989.
[52] B. Fine, A. Gaglione, A. Moldenhauer, G. Rosenberger, and D. Spellman, Algebra and Number Theory:
A Selection of Highlights, De Gruyter, 2017.
[53] B. Fine and G. Rosenberger, Number Theory: An Introduction via the Distribution of Primes, 2nd ed.,
Birkhäuser, 2016.
[54] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, 5th ed., Clarendon Press, 1979.
[55] E. Landau, Elementary Number Theory, Chelsea, 1958.
[56] M. Newman, Integral Matrices, Academic Press, 1972.
[57] I. Niven and H. S. Zuckerman, The Theory of Numbers, 4th ed., John Wiley, 1980.
[58] O. Ore, Number Theory and its History, McGraw-Hill, 1949.
[59] H. Pollard and H. Diamond, The Theory of Algebraic Numbers, Carus Mathematical Monographs 9,
Math. Assoc. of America, 1975.
[60] J. H. Silverman, The Arithmetic of Elliptic Curves, Springer-Verlag, 1986.
[61] L. C. Washington, Elliptic Curves: Number Theory and Cryptography, Chapman and Hall, 2003.

Cryptography

[62] I. Anshel, M. Anshel, and D. Goldfeld, An algebraic method for public key cryptography, Math. Res. Lett.,
6, 287–291, 1999.
[63] G. Baumslag, Y. Brjukhov, B. Fine, and G. Rosenberger, Some cryptoprimitives for noncommutative
algebraic cryptography, in Aspects of Infinite Groups, 26–44, World Scientific Press, 2009.
[64] G. Baumslag, Y. Brjukhov, B. Fine, and D. Troeger, Challenge response password security using
combinatorial group theory, Groups Complex. Cryptol., 2, 67–81, 2010.
[65] G. Baumslag, T. Camps, B. Fine, G. Rosenberger, and X. Xu, Designing key transport protocols using
combinatorial group theory, Contemp. Math., 418, 35–43, 2006.
[66] G. Baumslag, B. Fine, M. Kreuzer, and G. Rosenberger, A Course in Mathematical Cryptography,
De Gruyter, 2015.
[67] G. Baumslag, B. Fine, and X. Xu, Cryptosystems using linear groups, Appl. Algebra Eng. Commun.
Comput., 17, 205–217, 2006.
[68] G. Baumslag, B. Fine, and X. Xu, A proposed public key cryptosystem using the modular group,
Contemp. Math., 421, 35–44, 2007.
[69] J. Birman, Braids, Links and Mapping Class Groups, Annals of Math Studies, 82, Princeton University
Press, 1975.
[70] A. V. Borovik, A. G. Myasnikov, and V. Shpilrain, Measuring sets in infinite groups, in Computational and
Statistical Group Theory, Contemp. Math., 298, 21–42, 2002.
[71] J. A. Buchmann, Introduction to Cryptography, Springer, 2004.
[72] T. Camps, Surface braid groups as platform groups and applications in cryptography, Ph. D. thesis,
Universität Dortmund, 2009.
[73] R. E. Crandall and C. Pomerance, Prime Numbers. A Computational Perspective, 2nd ed.,
Springer-Verlag, 2005.
[74] P. Dehornoy, Braid-based cryptography, Contemp. Math., 360, 5–34, 2004.
[75] B. Eick and D. Kahrobaei, Polycyclic groups: A new platform for cryptology?, arXiv:math/0411077, 1–7,
2004.
[76] D. Garber, Braid group cryptography, World Scientific Review Volume, arXiv:0711.3941, 2008.
[77] R. Gilman, A. G. Myasnikov, and D. Osin, Exponentially generic subsets of groups, Ill. J. Math., 54,
371–388, 2010.
[78] M. I. Gonzalez Vasco and R. Steinwandt, Group Theoretic Cryptography, Chapman & Hall, 2015.
[79] D. Grigoriev and I. Ponomarenko, Homomorphic public-key cryptosystems over groups and rings, Quad.
Mat., 2005.
[80] P. Hoffman, Archimedes’ Revenge, W. W. Norton & Company, 1988.
[81] T. Jitsukawa, Malnormal subgroups of free groups, Contemp. Math., 298, 83–96, 2002.
[82] D. Kahrobaei and B. Khan, A non-commutative generalization of the El-Gamal key exchange using
polycyclic groups, in Proceedings of IEEE, 1–5, 2006.
[83] I. Kapovich and A. Myasnikov, Stallings foldings and subgroups of free groups, J. Algebra, 248, 608–668,
2002.
[84] K. H. Ko, D. Choi, M. Cho, and J. Lee, New signature scheme using conjugacy problem, IACR Cryptology
ePrint Archive, 168, 1–13, 2002.
[85] K. H. Ko, S. J. Lee, J. H. Cheon, J. H. Han, J. S. Kang, and C. Park, New public-key cryptosystems using
braid groups, in Advances in Cryptology, Proceedings of Crypto 2000, Lecture Notes in Computer
Science, 1880, 166–183, 2000.
[86] N. Koblitz, Algebraic Aspects of Cryptography, Springer, 1998.
[87] M. Kotov, D. Panteleev, and A. Ushakov, Analysis of the secret schemes based on Nielsen transformations,
Groups Complex. Cryptol., 10, 1–8, 2018.
[88] S. Lal and A. Chaturvedi, Authentication schemes using braid groups, arXiv:cs/0507066, 2005.
[89] W. Magnus, Rational Representations of Fuchsian Groups and Non-Parabolic Subgroups of the Modular
Group, Nachrichten der Akad. Göttingen, 179–189, 1973.
[90] A. Moldenhauer, Cryptographic protocols based on inner product spaces and group theory with a special
focus on the use of Nielsen transformations, Ph. D. thesis, University of Hamburg, 2016.
[91] I. G. Lysenok and A. G. Myasnikov, A polynomial bound on solutions of quadratic equations in free
groups, Proc. Steklov Inst. Math., 274, 136–173, 2011.
[92] A. G. Myasnikov, V. Shpilrain, and A. Ushakov, A practical attack on some braid group based
cryptographic protocols, in CRYPTO 2005, Lecture Notes in Computer Science, 3621, 86–96, 2005.
[93] A. G. Myasnikov, V. Shpilrain, and A. Ushakov, Group-Based Cryptography, Advanced Courses in
Mathematics, CRM, Barcelona, 2007.
[94] A. D. Myasnikov and A. Ushakov, Length based attack and braid groups: Cryptanalysis of
Anshel–Anshel–Goldfeld key exchange protocol, Lect. Notes Comput. Sci., 4450, 76–88, 2007.
[95] G. Petrides, Cryptanalysis of the public key cryptosystem based on the word problem on the Grigorchuk
groups, in Cryptography and Coding, Lecture Notes in Computer Science, 2898, 234–244, 2003.
[96] D. Panagopoulos, A secret sharing scheme using groups, arXiv:1009.0026, 2010.
[97] J. J. Quisquater, L. C. Guillou, and T. A. Berson, How to explain zero-knowledge protocols to your
children, in Advances in Cryptology – CRYPTO ’89 Proceedings, Lecture Notes in Computer Science,
435, 628–631, 1990.
[98] C. E. Shannon, Communication theory of secrecy systems, Bell Syst. Tech. J., 28, 656–715, 1949.
[99] V. Shpilrain and A. Ushakov, The conjugacy search problem in public key cryptography; unnecessary and
insufficient, Appl. Algebra Eng. Commun. Comput., 17, 285–289, 2006.
[100] V. Shpilrain and A. Zapata, Using the subgroup membership problem in public key cryptography,
Contemp. Math., 418, 169–179, 2006.
[101] R. Steinwandt, Loopholes in two public key cryptosystems using the modular groups, preprint, University
of Karlsruhe, 2000.
[102] D. R. Stinson, Cryptography: Theory and Practice, Chapman and Hall, 2002.
[103] N. R. Wagner and M. R. Magyarik, A public-key cryptosystem based on the word problem, in Advances in
Cryptology, 19–36, 1985.
[104] M. J. Wiener, Cryptanalysis of short RSA secret exponents, IEEE Trans. Inf. Theory, 36, 553–558, 1990.
[105] X. Xu, Cryptography and infinite group theory, Ph. D. thesis, CUNY, 2006.
[106] A. Yamamura, Public key cryptosystems using the modular group, in Public Key Cryptography, Lecture
Notes in Computer Science, 1431, 203–216, 1998.
